7.5 KiB
7.5 KiB
8. API Client — claude.ts
Streaming, retries, caching, and model fallback — how Claude Code talks to the Anthropic API.
Request Lifecycle
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#28a745', 'actorTextColor': '#e0e0e0', 'actorBorder': '#28a745', 'signalColor': '#28a745', 'noteBkgColor': '#16213e', 'noteTextColor': '#e0e0e0', 'activationBkgColor': '#1b3a1b', 'activationBorderColor': '#28a745'}}}%%
sequenceDiagram
participant Q as query.ts
participant C as claude.ts
participant R as withRetry
participant K as AnthropicClient
participant A as Anthropic API
Q->>C: queryModel(messages, tools, options)
activate C
C->>C: resolve model — runtime override, plan-mode swap
C->>C: normalize messages — strip internal fields
C->>C: build tool schemas — filter by deny, defer via ToolSearch
C->>C: configure betas, cache_control, effort, task_budget
C->>C: add prompt cache breakpoints
C->>C: compute metadata — user_id, session_id, device_id
C->>R: withRetry(clientFactory, requestFn)
activate R
loop Retry on 429, 529, timeouts
R->>K: getAnthropicClient(apiKey, model)
K->>A: beta.messages.stream(params)
activate A
alt 200 OK
A-->>R: SSE event stream
else 429 Rate Limited
R->>R: exponential backoff
else 529 Overloaded
R->>R: backoff + optional model fallback
else 401 Auth Error
R-->>C: CannotRetryError — abort
end
deactivate A
end
deactivate R
C->>C: parse stream into AssistantMessage
C->>C: update usage tracking and cost
C->>C: detect prompt cache breaks
C-->>Q: yield AssistantMessage + StreamEvents
deactivate C
Request Building
Before each API call, claude.ts builds the request through several steps:
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#28a745', 'primaryBorderColor': '#28a745'}}}%%
flowchart TD
subgraph ModelRes["1. Model Resolution"]
RUNTIME["Runtime override<br/>from AppState"]
PLAN_SWAP["Plan-mode model<br/>swap for 200K+ contexts"]
FALLBACK_M["Fallback model<br/>on 529 overload"]
end
subgraph MsgNorm["2. Message Normalization"]
STRIP["Strip internal fields<br/>uuid, timestamp, etc."]
THINKING["Preserve thinking blocks<br/>within trajectory boundaries"]
SIGNS["Strip signature blocks"]
end
subgraph ToolBuild["3. Tool Schema Building"]
FILTER_DENY["Filter denied tools"]
DEFER["Defer tools via<br/>ToolSearch deferred loading"]
EAGER["Eager tools always<br/>in prompt"]
end
subgraph Config["4. Request Configuration"]
BETAS["Beta features<br/>prompt caching, token counting"]
CACHE_CTL["cache_control breakpoints<br/>system prompt caching"]
EFFORT_V["effort parameter<br/>controls thinking depth"]
TASK_BUD["task_budget<br/>agentic turn spend limit"]
METADATA["metadata<br/>user_id, session_id"]
end
ModelRes --> MsgNorm --> ToolBuild --> Config
API_REQ["POST /v1/messages<br/>SSE stream"]:::api
Config --> API_REQ
classDef api fill:#1b3a1b,stroke:#28a745,color:#e0e0e0,stroke-width:2px
Retry Logic — withRetry
The retry wrapper handles transient API failures:
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#dc3545', 'primaryBorderColor': '#dc3545'}}}%%
flowchart TD
REQUEST["API Request"]:::start
RESPONSE{"Response<br/>status?"}:::check
OK["200 OK<br/>Stream response"]:::success
RATE["429 Rate Limited"]:::error
OVER["529 Overloaded"]:::error
AUTH["401 Auth Error"]:::fatal
TIMEOUT["Timeout"]:::error
BACKOFF["Exponential backoff<br/>wait and retry"]:::retry
FALLBACK_SWITCH["Switch to fallback model<br/>if configured"]:::retry
ABORT["CannotRetryError<br/>abort immediately"]:::fatal
REQUEST --> RESPONSE
RESPONSE -->|"200"| OK
RESPONSE -->|"429"| RATE --> BACKOFF --> REQUEST
RESPONSE -->|"529"| OVER --> FALLBACK_SWITCH --> REQUEST
RESPONSE -->|"401"| AUTH --> ABORT
RESPONSE -->|"timeout"| TIMEOUT --> BACKOFF
classDef start fill:#1a2d4a,stroke:#4a9eff,color:#e0e0e0,stroke-width:2px
classDef check fill:#2d2d0d,stroke:#ffc107,color:#e0e0e0,stroke-width:2px
classDef success fill:#1b3a1b,stroke:#28a745,color:#e0e0e0,stroke-width:2px
classDef error fill:#3d2b00,stroke:#fd7e14,color:#e0e0e0,stroke-width:2px
classDef fatal fill:#4a1a1a,stroke:#dc3545,color:#e0e0e0,stroke-width:2px
classDef retry fill:#2d1b4e,stroke:#6f42c1,color:#e0e0e0,stroke-width:2px
Streaming Fallback
A unique feature: if the model is overloaded mid-stream (529), Claude Code can:
- Tombstone the partial assistant messages
- Switch to a fallback model
- Restart the stream from scratch
- The user sees no interruption — orphaned messages are removed from UI
Prompt Caching
Claude Code uses Anthropic's prompt cache to avoid re-processing unchanged context:
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'primaryBorderColor': '#4a9eff'}}}%%
flowchart LR
SYS["System prompt<br/>cache_control breakpoint"]:::cached
TOOLS["Tool schemas<br/>cache_control breakpoint"]:::cached
HISTORY["Conversation history<br/>bytes must match exactly"]:::uncached
API["API Request"]:::api
HIT["Cache HIT<br/>~90% cheaper<br/>~10x faster"]:::hit
MISS["Cache MISS<br/>full processing<br/>new cache created"]:::miss
SYS --> API
TOOLS --> API
HISTORY --> API
API --> HIT
API --> MISS
classDef cached fill:#1b3a1b,stroke:#28a745,color:#e0e0e0,stroke-width:2px
classDef uncached fill:#333,stroke:#888,color:#e0e0e0,stroke-width:1px
classDef api fill:#1a2d4a,stroke:#4a9eff,color:#e0e0e0,stroke-width:2px
classDef hit fill:#0d4f4f,stroke:#17a2b8,color:#e0e0e0,stroke-width:2px
classDef miss fill:#3d2b00,stroke:#fd7e14,color:#e0e0e0,stroke-width:2px
Cache breaks are detected and logged. The backfillObservableInput() pattern exists specifically to avoid breaking the cache — the original API-bound input is never mutated.
SSE Stream Events
The API returns Server-Sent Events in this order:
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'primaryBorderColor': '#4a9eff'}}}%%
sequenceDiagram
participant API as Anthropic API
participant C as claude.ts
API->>C: message_start — model, usage, id
loop For each content block
API->>C: content_block_start — type, index
loop Delta events
API->>C: content_block_delta — text / thinking / tool_use JSON
end
API->>C: content_block_stop
end
API->>C: message_delta — stop_reason, final usage
API->>C: message_stop
Note over C: Parse into AssistantMessage<br/>Track usage + cost<br/>Yield to query.ts
Cost Tracking
Every API call's usage is tracked in cost-tracker.ts:
- Input tokens (including cache reads/writes)
- Output tokens
- Per-model pricing
- Session totals exposed via
/costcommand
Previous: ← Extension Model · Next: UI Architecture →