# 8. API Client — `claude.ts`

> Streaming, retries, caching, and model fallback — how Claude Code talks to the Anthropic API.

---

## Request Lifecycle

```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#28a745', 'actorTextColor': '#e0e0e0', 'actorBorder': '#28a745', 'signalColor': '#28a745', 'noteBkgColor': '#16213e', 'noteTextColor': '#e0e0e0', 'activationBkgColor': '#1b3a1b', 'activationBorderColor': '#28a745'}}}%%
sequenceDiagram
    participant Q as query.ts
    participant C as claude.ts
    participant R as withRetry
    participant K as AnthropicClient
    participant A as Anthropic API

    Q->>C: queryModel(messages, tools, options)
    activate C

    C->>C: resolve model — runtime override, plan-mode swap
    C->>C: normalize messages — strip internal fields
    C->>C: build tool schemas — filter by deny, defer via ToolSearch
    C->>C: configure betas, cache_control, effort, task_budget
    C->>C: add prompt cache breakpoints
    C->>C: compute metadata — user_id, session_id, device_id

    C->>R: withRetry(clientFactory, requestFn)
    activate R

    loop Retry on 429, 529, timeouts
        R->>K: getAnthropicClient(apiKey, model)
        K->>A: beta.messages.stream(params)
        activate A

        alt 200 OK
            A-->>R: SSE event stream
        else 429 Rate Limited
            R->>R: exponential backoff
        else 529 Overloaded
            R->>R: backoff + optional model fallback
        else 401 Auth Error
            R-->>C: CannotRetryError — abort
        end
        deactivate A
    end

    deactivate R

    C->>C: parse stream into AssistantMessage
    C->>C: update usage tracking and cost
    C->>C: detect prompt cache breaks
    C-->>Q: yield AssistantMessage + StreamEvents
    deactivate C
```

---

## Request Building

Before each API call, `claude.ts` builds the request in four stages, each feeding into the next:

```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#28a745', 'primaryBorderColor': '#28a745'}}}%%
flowchart TD
    subgraph ModelRes["1. Model Resolution"]
        RUNTIME["Runtime override<br/>from AppState"]
        PLAN_SWAP["Plan-mode model<br/>swap for 200K+ contexts"]
        FALLBACK_M["Fallback model<br/>on 529 overload"]
    end

    subgraph MsgNorm["2. Message Normalization"]
        STRIP["Strip internal fields<br/>uuid, timestamp, etc."]
        THINKING["Preserve thinking blocks<br/>within trajectory boundaries"]
        SIGNS["Strip signature blocks"]
    end

    subgraph ToolBuild["3. Tool Schema Building"]
        FILTER_DENY["Filter denied tools"]
        DEFER["Defer tools via<br/>ToolSearch deferred loading"]
        EAGER["Eager tools always<br/>in prompt"]
    end

    subgraph Config["4. Request Configuration"]
        BETAS["Beta features<br/>prompt caching, token counting"]
        CACHE_CTL["cache_control breakpoints<br/>system prompt caching"]
        EFFORT_V["effort parameter<br/>controls thinking depth"]
        TASK_BUD["task_budget<br/>agentic turn spend limit"]
        METADATA["metadata<br/>user_id, session_id"]
    end

    ModelRes --> MsgNorm --> ToolBuild --> Config

    API_REQ["POST /v1/messages<br/>SSE stream"]:::api
    Config --> API_REQ

    classDef api fill:#1b3a1b,stroke:#28a745,color:#e0e0e0,stroke-width:2px
```
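
Stages 2 and 3 can be sketched as small pure functions. This is a minimal illustration under stated assumptions, not the actual `claude.ts` code: the `InternalMessage` and `ToolSpec` shapes, the `deferred` flag, and the function names `normalizeMessages` and `selectPromptTools` are all hypothetical.

```typescript
// Hypothetical internal message shape: API fields plus bookkeeping fields.
interface InternalMessage {
  role: "user" | "assistant";
  content: string;
  uuid: string; // internal only, never sent to the API
  timestamp: number; // internal only, never sent to the API
}

// What is sent to the API in this sketch.
interface ApiMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical tool description; `deferred` marks tools that are
// loaded later (for example via ToolSearch) instead of up front.
interface ToolSpec {
  name: string;
  description: string;
  deferred: boolean;
}

// Stage 2: copy only the API-facing fields, leaving the input untouched.
function normalizeMessages(messages: readonly InternalMessage[]): ApiMessage[] {
  return messages.map(({ role, content }) => ({ role, content }));
}

// Stage 3: drop denied tools and keep deferred tools out of the initial prompt.
function selectPromptTools(
  tools: readonly ToolSpec[],
  denied: ReadonlySet<string>,
): ToolSpec[] {
  return tools.filter((tool) => !denied.has(tool.name) && !tool.deferred);
}
```

Both functions return new arrays rather than editing their inputs, which keeps the internal transcript intact for the UI while the API sees only the stripped form.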

---

## Retry Logic — `withRetry`

`withRetry` retries transient API failures (429, 529, and timeouts) and aborts immediately on non-retryable errors such as 401:

```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#dc3545', 'primaryBorderColor': '#dc3545'}}}%%
flowchart TD
    REQUEST["API Request"]:::start

    RESPONSE{"Response<br/>status?"}:::check

    OK["200 OK<br/>Stream response"]:::success
    RATE["429 Rate Limited"]:::error
    OVER["529 Overloaded"]:::error
    AUTH["401 Auth Error"]:::fatal
    TIMEOUT["Timeout"]:::error

    BACKOFF["Exponential backoff<br/>wait and retry"]:::retry
    FALLBACK_SWITCH["Switch to fallback model<br/>if configured"]:::retry
    ABORT["CannotRetryError<br/>abort immediately"]:::fatal

    REQUEST --> RESPONSE
    RESPONSE -->|"200"| OK
    RESPONSE -->|"429"| RATE --> BACKOFF --> REQUEST
    RESPONSE -->|"529"| OVER --> FALLBACK_SWITCH --> BACKOFF
    RESPONSE -->|"401"| AUTH --> ABORT
    RESPONSE -->|"timeout"| TIMEOUT --> BACKOFF

    classDef start fill:#1a2d4a,stroke:#4a9eff,color:#e0e0e0,stroke-width:2px
    classDef check fill:#2d2d0d,stroke:#ffc107,color:#e0e0e0,stroke-width:2px
    classDef success fill:#1b3a1b,stroke:#28a745,color:#e0e0e0,stroke-width:2px
    classDef error fill:#3d2b00,stroke:#fd7e14,color:#e0e0e0,stroke-width:2px
    classDef fatal fill:#4a1a1a,stroke:#dc3545,color:#e0e0e0,stroke-width:2px
    classDef retry fill:#2d1b4e,stroke:#6f42c1,color:#e0e0e0,stroke-width:2px
```
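
The decision logic in the diagram can be sketched as a pure function plus a small driver loop. This is an illustration under stated assumptions, not the real `withRetry`: the names `decide` and `retryWithBackoff`, the `RetryOptions` fields, and the injected `sleep` function are hypothetical, and the `CannotRetryError` here is a stand-in for the class of the same name. On 529 the sketch follows the lifecycle diagram: back off and, if a fallback model is configured, switch to it.

```typescript
type Status = number | "timeout";

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  hasFallbackModel: boolean;
}

type RetryDecision =
  | { kind: "retry"; delayMs: number; switchToFallback: boolean }
  | { kind: "abort" };

// Stand-in for the real error class; only the name is taken from the source.
class CannotRetryError extends Error {
  readonly status: Status;
  constructor(message: string, status: Status) {
    super(message);
    this.name = "CannotRetryError";
    this.status = status;
  }
}

// Decide what to do after a failed attempt (attempt numbers start at 1).
function decide(status: Status, attempt: number, opts: RetryOptions): RetryDecision {
  const retryable = status === 429 || status === 529 || status === "timeout";
  if (!retryable || attempt >= opts.maxAttempts) return { kind: "abort" };
  // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
  const delayMs = Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt - 1));
  return {
    kind: "retry",
    delayMs,
    switchToFallback: status === 529 && opts.hasFallbackModel,
  };
}

interface AttemptResult<T> {
  status: Status;
  value?: T;
}

// Driver loop: once the fallback model is selected, it stays selected.
async function retryWithBackoff<T>(
  request: (useFallback: boolean) => Promise<AttemptResult<T>>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let useFallback = false;
  for (let attempt = 1; ; attempt++) {
    const result = await request(useFallback);
    if (result.status === 200 && result.value !== undefined) return result.value;
    const decision = decide(result.status, attempt, opts);
    if (decision.kind === "abort") {
      throw new CannotRetryError(
        `Giving up after attempt ${attempt} (status ${result.status})`,
        result.status,
      );
    }
    useFallback = useFallback || decision.switchToFallback;
    await sleep(decision.delayMs);
  }
}
```

Keeping `decide` free of I/O makes the policy easy to test on its own, and injecting `sleep` lets tests run without real delays. A production version would typically add jitter to the delay and honor any `retry-after` header.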

### Streaming Fallback

If the API reports overload (529) after streaming has already begun, Claude Code recovers without surfacing the failure:
1. **Tombstone** the partial assistant messages.
2. Switch to a fallback model.
3. Restart the stream from scratch.
4. Remove the orphaned messages from the UI, so the user sees no interruption.
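
One way to picture the tombstoning step is as a flag on each message that the UI filters on. The sketch below is an assumption about how such a scheme could look; the `UiMessage` shape, the `tombstoned` flag, and both function names are hypothetical and are not taken from the Claude Code source.

```typescript
// Hypothetical UI-side message record.
interface UiMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
  tombstoned: boolean;
}

// Mark the partial messages from the failed stream as tombstoned.
// Returns a new array; the input history is not modified.
function tombstonePartial(
  messages: readonly UiMessage[],
  partialIds: ReadonlySet<string>,
): UiMessage[] {
  return messages.map((m) => (partialIds.has(m.id) ? { ...m, tombstoned: true } : m));
}

// The UI renders only messages that have not been tombstoned.
function visibleMessages(messages: readonly UiMessage[]): UiMessage[] {
  return messages.filter((m) => !m.tombstoned);
}
```

After the partial messages are tombstoned, the restarted stream from the fallback model appends fresh assistant messages, and the filtered view never shows the abandoned ones.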

---

## Prompt Caching

Claude Code uses Anthropic's prompt cache to avoid re-processing unchanged context:

```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'primaryBorderColor': '#4a9eff'}}}%%
flowchart LR
    SYS["System prompt<br/>cache_control breakpoint"]:::cached
    TOOLS["Tool schemas<br/>cache_control breakpoint"]:::cached
    HISTORY["Conversation history<br/>bytes must match exactly"]:::uncached

    API["API Request"]:::api

    HIT["Cache HIT<br/>~90% cheaper<br/>~10x faster"]:::hit
    MISS["Cache MISS<br/>full processing<br/>new cache created"]:::miss

    SYS --> API
    TOOLS --> API
    HISTORY --> API

    API --> HIT
    API --> MISS

    classDef cached fill:#1b3a1b,stroke:#28a745,color:#e0e0e0,stroke-width:2px
    classDef uncached fill:#333,stroke:#888,color:#e0e0e0,stroke-width:1px
    classDef api fill:#1a2d4a,stroke:#4a9eff,color:#e0e0e0,stroke-width:2px
    classDef hit fill:#0d4f4f,stroke:#17a2b8,color:#e0e0e0,stroke-width:2px
    classDef miss fill:#3d2b00,stroke:#fd7e14,color:#e0e0e0,stroke-width:2px
```

Cache breaks are detected and logged. Because a cache hit requires the request prefix to match byte for byte, the `backfillObservableInput()` pattern exists specifically to avoid breaking the cache: the original API-bound input is never mutated.
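
A short sketch of both ideas follows: placing a `cache_control` breakpoint without mutating the input, and spotting a cache break from usage numbers. The `cache_control: { type: "ephemeral" }` marker and the `cache_read_input_tokens` usage field follow Anthropic's public prompt-caching API. The helper names `markCacheBreakpoint` and `isCacheBreak`, and the detection heuristic itself, are assumptions for illustration rather than what `claude.ts` does.

```typescript
// Simplified text block as used in the `system` array of a Messages request.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Put a cache breakpoint on the last block. Every block is copied, so the
// caller's array and objects are left exactly as they were.
function markCacheBreakpoint(blocks: readonly TextBlock[]): TextBlock[] {
  return blocks.map((block, i): TextBlock =>
    i === blocks.length - 1
      ? { ...block, cache_control: { type: "ephemeral" } }
      : { ...block },
  );
}

// Subset of the usage object returned by the API (simplified to non-null numbers).
interface CacheUsage {
  input_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Hypothetical heuristic: the previous call read from the cache,
// but the current call read nothing back.
function isCacheBreak(prev: CacheUsage | undefined, curr: CacheUsage): boolean {
  return (
    prev !== undefined &&
    prev.cache_read_input_tokens > 0 &&
    curr.cache_read_input_tokens === 0
  );
}
```

Copying instead of mutating matters here because any in-place edit to an earlier block changes the serialized prefix, and the next request would then miss the cache.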

---

## SSE Stream Events

The API returns Server-Sent Events in this order:

```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'primaryBorderColor': '#4a9eff'}}}%%
sequenceDiagram
    participant API as Anthropic API
    participant C as claude.ts

    API->>C: message_start — model, usage, id
    
    loop For each content block
        API->>C: content_block_start — type, index
        loop Delta events
            API->>C: content_block_delta — text / thinking / tool_use JSON
        end
        API->>C: content_block_stop
    end

    API->>C: message_delta — stop_reason, final usage
    API->>C: message_stop

    Note over C: Parse into AssistantMessage<br/>Track usage + cost<br/>Yield to query.ts
```
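
The event sequence above can be folded into a single message with a small reducer. The event and delta type names below match the public streaming format, but the shapes are simplified (for example, `tool_use` blocks omit `id` and `name`), and the `Assembled` type and `assembleMessage` function are hypothetical names for this sketch rather than the parser used in `claude.ts`.

```typescript
type BlockType = "text" | "tool_use" | "thinking";

type Delta =
  | { type: "text_delta"; text: string }
  | { type: "thinking_delta"; thinking: string }
  | { type: "input_json_delta"; partial_json: string };

type StreamEvent =
  | { type: "message_start"; message: { id: string; model: string } }
  | { type: "content_block_start"; index: number; content_block: { type: BlockType } }
  | { type: "content_block_delta"; index: number; delta: Delta }
  | { type: "content_block_stop"; index: number }
  | { type: "message_delta"; delta: { stop_reason: string | null }; usage: { output_tokens: number } }
  | { type: "message_stop" };

interface AssembledBlock {
  type: BlockType;
  buffer: string; // accumulated text, thinking, or raw tool-input JSON
  input?: unknown; // parsed tool input, set when a tool_use block stops
}

interface Assembled {
  id: string;
  model: string;
  blocks: AssembledBlock[];
  stopReason: string | null;
  outputTokens: number;
  done: boolean;
}

function assembleMessage(events: readonly StreamEvent[]): Assembled {
  const msg: Assembled = { id: "", model: "", blocks: [], stopReason: null, outputTokens: 0, done: false };
  for (const ev of events) {
    switch (ev.type) {
      case "message_start":
        msg.id = ev.message.id;
        msg.model = ev.message.model;
        break;
      case "content_block_start":
        msg.blocks[ev.index] = { type: ev.content_block.type, buffer: "" };
        break;
      case "content_block_delta": {
        const block = msg.blocks[ev.index];
        if (!block) break; // delta for an unknown block: ignore in this sketch
        const delta = ev.delta;
        if (delta.type === "text_delta") block.buffer += delta.text;
        else if (delta.type === "thinking_delta") block.buffer += delta.thinking;
        else block.buffer += delta.partial_json;
        break;
      }
      case "content_block_stop": {
        const block = msg.blocks[ev.index];
        // Tool input arrives as JSON fragments and is only parseable once complete.
        if (block && block.type === "tool_use") {
          block.input = block.buffer === "" ? {} : JSON.parse(block.buffer);
        }
        break;
      }
      case "message_delta":
        msg.stopReason = ev.delta.stop_reason;
        msg.outputTokens = ev.usage.output_tokens;
        break;
      case "message_stop":
        msg.done = true;
        break;
    }
  }
  return msg;
}
```

Tool input is buffered as a string and parsed only at `content_block_stop`, because individual `input_json_delta` fragments are not valid JSON on their own.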

---

## Cost Tracking

`cost-tracker.ts` records usage for every API call and prices it per model:
- Input tokens, including cache reads and writes
- Output tokens
- Per-model pricing
- Session totals, exposed through the `/cost` command
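
The arithmetic behind such a tracker is simple: multiply each token count by its per-model rate and accumulate. The sketch below is not the code in `cost-tracker.ts`; the class name, field names, and especially the prices are placeholders invented for the example. It keeps cache reads and writes as separate counts from uncached input tokens, each with its own rate.

```typescript
// Prices in USD per million tokens. The values used with this type
// in any example are placeholders, not real model prices.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheWritePerMTok: number;
  cacheReadPerMTok: number;
}

interface CallUsage {
  model: string;
  inputTokens: number; // uncached input tokens
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

class SessionCostTracker {
  private readonly pricing: Record<string, ModelPricing>;
  private totalUsd = 0;

  constructor(pricing: Record<string, ModelPricing>) {
    this.pricing = pricing;
  }

  // Record one API call and return its cost in USD.
  record(usage: CallUsage): number {
    const price = this.pricing[usage.model];
    if (!price) throw new Error(`No pricing configured for model ${usage.model}`);
    const costUsd =
      (usage.inputTokens * price.inputPerMTok +
        usage.outputTokens * price.outputPerMTok +
        usage.cacheWriteTokens * price.cacheWritePerMTok +
        usage.cacheReadTokens * price.cacheReadPerMTok) /
      1_000_000;
    this.totalUsd += costUsd;
    return costUsd;
  }

  // Running total for the session, the kind of figure a `/cost` command would show.
  get sessionTotalUsd(): number {
    return this.totalUsd;
  }
}
```

Looking up pricing by model name on every call means a mid-session switch to a fallback model is priced correctly without any special handling.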

---

**Previous:** [← Extension Model](./07-extension-model.md) · **Next:** [UI Architecture →](./09-ui-architecture.md)