# 2. The Agentic Loop
> How Claude Code cycles between the model and tools — the core of the entire system.
---
## What Is the "Agentic Loop"?
The agentic loop is the mechanism that makes Claude Code more than a chatbot. Instead of:
```
User → Model → Response (done)
```
Claude Code does:
```
User → Model → Tool Call → Model → Tool Call → ... → Response (done)
```
The model can call tools, see their results, and decide what to do next, repeating this **loop** until it decides it's finished. This loop lives in **`query.ts`** (1,730 lines), the most important file in the entire codebase.

---
## The Full Sequence
```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'actorTextColor': '#e0e0e0', 'actorBorder': '#4a9eff', 'signalColor': '#4a9eff', 'noteBkgColor': '#16213e', 'noteTextColor': '#e0e0e0', 'activationBkgColor': '#2d1b4e', 'activationBorderColor': '#e83e8c'}}}%%
sequenceDiagram
participant U as User
participant QE as QueryEngine
participant PI as processUserInput
participant Q as query.ts
participant CP as Compaction
participant C as claude.ts
participant API as Anthropic API
participant T as Tool Executor
participant H as Hooks
U->>QE: submitMessage(prompt)
activate QE
QE->>PI: parse slash commands, @mentions, attachments
PI-->>QE: messages[], shouldQuery
alt Slash command — no API call needed
QE-->>U: return local result
else Model query required
QE->>QE: persist transcript to disk
QE->>Q: query(messages, systemPrompt, tools)
activate Q
loop Agentic Loop — until end_turn or max_turns
Q->>CP: snip compact
CP->>CP: micro compact
CP->>CP: context collapse
CP->>CP: auto compact
CP-->>Q: compacted messages
Q->>C: queryModel(messages, tools, thinking)
activate C
C->>C: build request: betas, cache, effort, budget
C->>API: POST /v1/messages — SSE stream
activate API
API-->>C: message_start — usage
API-->>C: content_block — thinking / text / tool_use
API-->>C: message_delta — stop_reason, final usage
deactivate API
C-->>Q: yield AssistantMessage + StreamEvents
deactivate C
alt stop_reason = end_turn
Q->>H: postSamplingHooks
Q->>Q: handleStopHooks
Q-->>QE: return Terminal result
else stop_reason = tool_use
Q->>T: runTools(toolUseBlocks)
activate T
T->>T: validate input against schema
T->>H: PreToolUse hooks
T->>T: check permissions
T->>T: call(input, context)
T->>H: PostToolUse hooks
T-->>Q: yield tool_result messages
deactivate T
Q->>Q: inject CLAUDE.md attachments
Note over Q: push tool_results, continue loop
else stop_reason = max_tokens
Q->>Q: truncation retry — up to 3x
Q->>CP: reactive compact — emergency
end
end
deactivate Q
QE->>QE: accumulate usage, persist transcript
QE-->>U: yield SDKMessage stream
end
deactivate QE
```
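Stripped of compaction, hooks, and streaming, the diagram reduces to a small loop. The following is a minimal, self-contained sketch, not the real `query.ts`: `ModelTurn`, `callModel`, `runTool`, and `maxTurns` are hypothetical stand-ins, and the model is whatever async function you pass in (a scripted stub when testing).

```typescript
// Hypothetical sketch of an agentic loop; not the real query.ts.
type ToolUse = { id: string; name: string; input: string }
type ModelTurn =
  | { stopReason: 'end_turn'; text: string }
  | { stopReason: 'tool_use'; toolUses: ToolUse[] }
type Msg = { role: 'user' | 'assistant' | 'tool'; content: string }
type Terminal = { reason: 'end_turn' | 'max_turns'; turns: number }

async function* agenticLoop(
  messages: Msg[],
  callModel: (history: Msg[]) => Promise<ModelTurn>,
  runTool: (use: ToolUse) => Promise<string>,
  maxTurns: number,
): AsyncGenerator<Msg, Terminal> {
  for (let turn = 1; turn <= maxTurns; turn++) {
    const response = await callModel(messages)

    if (response.stopReason === 'end_turn') {
      // Model is done: record the final answer and exit with a terminal result.
      const reply: Msg = { role: 'assistant', content: response.text }
      messages.push(reply)
      yield reply
      return { reason: 'end_turn', turns: turn }
    }

    // Model asked for tools: record the request, run each tool, feed results back.
    const request: Msg = {
      role: 'assistant',
      content: response.toolUses.map((u) => `${u.name}(${u.input})`).join(', '),
    }
    messages.push(request)
    yield request
    for (const use of response.toolUses) {
      const result: Msg = { role: 'tool', content: await runTool(use) }
      messages.push(result)
      yield result
    }
  }
  // Turn cap reached without an end_turn.
  return { reason: 'max_turns', turns: maxTurns }
}
```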
---
## Anatomy of `query.ts`
The file exports a single async generator function:
```typescript
export async function* query(
  params: QueryParams,
): AsyncGenerator<StreamEvent | Message | ToolUseSummaryMessage, Terminal> {
  // ... 1,730 lines of agentic loop
}
```
### Why an Async Generator?
This is a crucial design decision. An async generator (`async function*`) lets query.ts:
1. **Yield messages as they arrive** — The consumer (REPL or SDK) sees each message in real-time
2. **Maintain backpressure** — The loop pauses if the consumer isn't ready
3. **Support cancellation** via `.return()` on the generator, which cleanly tears down the loop
4. **Compose generators** with `yield*`, which delegates to sub-generators seamlessly
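Points 2 and 3 can be seen in isolation. In this standalone sketch (hypothetical names, not Claude Code code), the generator body only advances when the consumer asks for another value, and breaking out of `for await` calls `.return()`, which runs the generator's `finally` block:

```typescript
// Hypothetical demo: records what the generator did so the consumer's effect is visible.
type Trace = { produced: number; cleanedUp: boolean }

async function* numbered(trace: Trace): AsyncGenerator<number> {
  try {
    for (let i = 1; ; i++) {
      trace.produced = i // runs only when the consumer requests the next value (backpressure)
      yield i
    }
  } finally {
    trace.cleanedUp = true // runs when the consumer calls .return(), e.g. via break
  }
}

async function takeTwo(trace: Trace): Promise<number[]> {
  const out: number[] = []
  for await (const n of numbered(trace)) {
    out.push(n)
    if (out.length === 2) break // implicitly calls .return() on the generator
  }
  return out
}
```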
### The Loop State
Each loop iteration carries mutable state:
```typescript
type State = {
  messages: Message[]                             // Conversation history
  toolUseContext: ToolUseContext                  // Tool execution context
  autoCompactTracking: AutoCompactTrackingState   // Compact progress
  maxOutputTokensRecoveryCount: number            // Truncation retry counter
  hasAttemptedReactiveCompact: boolean            // Emergency compact flag
  turnCount: number                               // Loop iteration counter
  pendingToolUseSummary: Promise<...>             // Async summary generation
  transition: Continue | undefined                // Why we're in this iteration
}
```
---
## Phase 1: Pre-Processing — Before the API Call
Before every API call, query.ts runs a **compaction pipeline** on the message history:
```typescript
// 1. Apply per-message tool result budgets
messagesForQuery = await applyToolResultBudget(messagesForQuery, ...)

// 2. Snip compact: sliding window over old turns
if (feature('HISTORY_SNIP')) {
  const snipResult = snipModule.snipCompactIfNeeded(messagesForQuery)
  messagesForQuery = snipResult.messages
}

// 3. Micro compact: truncate oversized tool results
const microcompactResult = await deps.microcompact(messagesForQuery, ...)
messagesForQuery = microcompactResult.messages

// 4. Context collapse: read-time projection
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
  const collapseResult = await contextCollapse.applyCollapsesIfNeeded(...)
  messagesForQuery = collapseResult.messages
}

// 5. Auto compact: full summarization via API
const { compactionResult } = await deps.autocompact(messagesForQuery, ...)
```
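Snip and context collapse are gated behind build-time feature flags (`HISTORY_SNIP`, `CONTEXT_COLLAPSE`), while micro compact and auto compact are called through the injected `deps`. The stages run **in sequence**, each receiving the previous stage's output, so they compose: snip trims old turns, micro compact truncates oversized tool results, context collapse projects older history into collapsed views, and auto compact summarizes the whole conversation with an extra API call.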
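A minimal sketch of that composition, assuming each stage is a plain `Msg[] -> Msg[]` function and a `Set` of enabled flag names stands in for `feature()`. Only two stages are modeled, and their bodies (keep the last four messages; cap each message at 20 characters) are hypothetical toy rules, not Claude Code's thresholds:

```typescript
// Hypothetical sketch of a feature-gated compaction pipeline (toy rules, not Claude Code's).
type Msg = { role: 'user' | 'assistant'; text: string }
type Stage = { name: string; flag?: string; run: (msgs: Msg[]) => Msg[] }

const stages: Stage[] = [
  // "Snip": keep only the four most recent messages. Gated by a flag.
  { name: 'snip', flag: 'HISTORY_SNIP', run: (m) => m.slice(-4) },
  // "Micro compact": truncate any message longer than 20 characters. Always on.
  {
    name: 'micro',
    run: (m) => m.map((x) => (x.text.length > 20 ? { ...x, text: x.text.slice(0, 20) + '...' } : x)),
  },
]

function runPipeline(msgs: Msg[], pipeline: Stage[], enabled: Set<string>): Msg[] {
  // Each stage receives the previous stage's output; gated stages are skipped when their flag is off.
  return pipeline.reduce(
    (acc, stage) => (stage.flag !== undefined && !enabled.has(stage.flag) ? acc : stage.run(acc)),
    msgs,
  )
}
```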
### Blocking Limit Check
After compaction, the loop checks if we're at the **blocking limit** (>98% context used):
```typescript
const { isAtBlockingLimit } = calculateTokenWarningState(
  tokenCountWithEstimation(messagesForQuery),
  toolUseContext.options.mainLoopModel,
)
if (isAtBlockingLimit) {
  yield createAssistantAPIErrorMessage({ content: PROMPT_TOO_LONG_ERROR_MESSAGE })
  return { reason: 'blocking_limit' }
}
```
This skips the API call entirely when the request is already known to be too long to succeed.

---
## Phase 2: The API Call
The actual model call uses `claude.ts`:
```typescript
for await (const message of deps.callModel({
  messages: prependUserContext(messagesForQuery, userContext),
  systemPrompt: fullSystemPrompt,
  thinkingConfig: toolUseContext.options.thinkingConfig,
  tools: toolUseContext.options.tools,
  signal: toolUseContext.abortController.signal,
  options: {
    model: currentModel,
    fallbackModel,
    effortValue: appState.effortValue,
    taskBudget: params.taskBudget,
    // ... 20+ more options
  },
})) {
  // Process each streamed message
}
```
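Inside that `for await`, the content blocks of each assistant message need to be routed by type. One way to express the three-way dispatch is a discriminated union on the block's `type` field. The shapes below follow the Anthropic Messages API's `thinking`, `text`, and `tool_use` blocks trimmed to a few fields, and `routeBlocks` is a hypothetical helper:

```typescript
// Hypothetical helper; block shapes trimmed from the Messages API content blocks
// (real blocks carry more fields, e.g. a signature on thinking blocks).
type ContentBlock =
  | { type: 'thinking'; thinking: string }
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }

type Routed = { rendered: string[]; pendingTools: { id: string; name: string }[] }

function routeBlocks(blocks: ContentBlock[]): Routed {
  const out: Routed = { rendered: [], pendingTools: [] }
  for (const block of blocks) {
    switch (block.type) {
      case 'thinking':
        out.rendered.push(`[thinking] ${block.thinking}`) // displayed to the user
        break
      case 'text':
        out.rendered.push(block.text) // displayed as markdown in the terminal UI
        break
      case 'tool_use':
        out.pendingTools.push({ id: block.id, name: block.name }) // handed to tool execution
        break
    }
  }
  return out
}
```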
Responses stream in as SSE events. The loop processes three content block types:
| Block Type | What Happens |
|-----------|-------------|
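| `thinking` | Rendered in UI; kept in history because the API requires thinking blocks to be passed back unmodified when a turn continues with tool results |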
| `text` | Rendered as markdown in terminal |
| `tool_use` | Triggers tool execution (next phase) |
---
## Phase 3: Tool Execution
When the model responds with `tool_use` blocks, the loop executes them:
```typescript
// Parallel tool execution
const toolResults = yield* runTools(toolUseBlocks, canUseTool, toolUseContext)
```
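A minimal sketch of parallel execution, assuming each tool is an async function in a hypothetical name-keyed registry: every call starts at once under `Promise.all`, results come back in the same order as the `tool_use` blocks, and each result is tagged with its `tool_use_id` so the model can match it to the request. Failures become error results instead of rejecting the whole batch. Claude Code's real `runTools` may schedule tools differently; this only illustrates the idea.

```typescript
// Hypothetical sketch; ToolFn and the registry are illustrative, not Claude Code's API.
type ToolUseBlock = { id: string; name: string; input: string }
type ToolResult = { tool_use_id: string; content: string; is_error?: boolean }
type ToolFn = (input: string) => Promise<string>

async function runToolsInParallel(
  blocks: ToolUseBlock[],
  registry: Record<string, ToolFn>,
): Promise<ToolResult[]> {
  // Promise.all starts every call immediately and preserves input order in its result.
  return Promise.all(
    blocks.map(async (block): Promise<ToolResult> => {
      const tool = registry[block.name]
      if (!tool) return { tool_use_id: block.id, content: `Unknown tool: ${block.name}`, is_error: true }
      try {
        return { tool_use_id: block.id, content: await tool(block.input) }
      } catch (err) {
        // Report the failure to the model instead of aborting the other calls.
        return { tool_use_id: block.id, content: String(err), is_error: true }
      }
    }),
  )
}
```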
### Streaming Tool Execution
A performance optimization: tools can begin executing **while the model is still streaming**:
```typescript
const useStreamingToolExecution = config.gates.streamingToolExecution
let streamingToolExecutor = useStreamingToolExecution
  ? new StreamingToolExecutor(tools, canUseTool, toolUseContext)
  : null
```
The `StreamingToolExecutor` starts validating, permission-checking, and running each tool call as soon as its `tool_use` block is complete, while the rest of the response is still streaming.
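The eager-start idea can be sketched without a real API: consume a stream of events, start each tool the moment its block is complete, and await all started calls only after the stream ends. The event shapes and `startTool` callback are hypothetical simplifications. In the real Messages API a `tool_use` block's input arrives as partial JSON deltas, so a call cannot start until its block closes.

```typescript
// Hypothetical sketch of eager tool start during streaming; event shapes are simplified.
type StreamEvent =
  | { kind: 'text'; text: string }
  | { kind: 'tool_use_complete'; id: string; name: string; input: string }
  | { kind: 'message_stop' }

async function consumeWithEagerTools(
  events: AsyncIterable<StreamEvent>,
  startTool: (name: string, input: string) => Promise<string>,
  log: string[],
): Promise<string[]> {
  const inFlight: Promise<string>[] = []
  for await (const ev of events) {
    if (ev.kind === 'tool_use_complete') {
      log.push(`start ${ev.id}`)
      // Not awaited: the tool runs while the rest of the response keeps streaming.
      // Errors become strings so no promise is left rejected without a handler.
      inFlight.push(startTool(ev.name, ev.input).catch((err: unknown) => `error: ${String(err)}`))
    } else if (ev.kind === 'text') {
      log.push(`text ${ev.text}`)
    } else {
      log.push('stop')
    }
  }
  // Only now wait for the tools; results keep block order.
  return Promise.all(inFlight)
}
```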
### Tool Lifecycle
Each tool goes through:
1. **Schema Validation** — Zod validates the input against `tool.inputSchema`
2. **PreToolUse Hooks** — User-defined scripts can approve, deny, or modify the input
3. **Permission Check** — Deny rules → Allow rules → Tool-specific check → Classifier → User dialog
4. **Execution**`tool.call(input, context)` runs the actual operation
5. **PostToolUse Hooks** — Scripts run after execution with the result
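The five steps chain naturally, and any of the first three can stop the call. In this sketch the hypothetical `validate` function stands in for a Zod schema check, hooks and the permission check are plain callbacks, and the deny/allow/classifier/dialog chain is collapsed into one `checkPermission` function; none of these names are Claude Code's:

```typescript
// Hypothetical sketch of a tool-call lifecycle; names are illustrative, not Claude Code's.
type Outcome = { ok: true; output: string } | { ok: false; reason: string }

interface ToolDef<I> {
  name: string
  validate: (raw: unknown) => I | undefined // stand-in for a Zod schema parse
  call: (input: I) => Promise<string>
}

interface Lifecycle<I> {
  preToolUse?: (input: I) => { decision: 'allow' | 'deny'; input: I }
  checkPermission: (toolName: string, input: I) => boolean
  postToolUse?: (output: string) => void
}

async function executeTool<I>(tool: ToolDef<I>, raw: unknown, hooks: Lifecycle<I>): Promise<Outcome> {
  // 1. Schema validation
  const parsed = tool.validate(raw)
  if (parsed === undefined) return { ok: false, reason: 'invalid input' }
  let input: I = parsed

  // 2. PreToolUse hook: may deny the call or rewrite its input
  if (hooks.preToolUse) {
    const pre = hooks.preToolUse(input)
    if (pre.decision === 'deny') return { ok: false, reason: 'denied by hook' }
    input = pre.input
  }

  // 3. Permission check
  if (!hooks.checkPermission(tool.name, input)) return { ok: false, reason: 'permission denied' }

  // 4. Execution
  const output = await tool.call(input)

  // 5. PostToolUse hook sees the result
  hooks.postToolUse?.(output)
  return { ok: true, output }
}
```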
---
## Phase 4: Loop Continuation
Once the model's response has finished streaming, the loop decides what to do next based on its `stop_reason`:
### `end_turn` — Model is done
```typescript
if (stop_reason === 'end_turn') {
  // Run post-sampling hooks
  await executePostSamplingHooks(assistantMessage, toolUseContext)
  // Check stop hooks (can force continuation)
  const stopResult = await handleStopHooks(assistantMessage, messages)
  if (stopResult.shouldContinue) {
    // Inject hook feedback and continue loop
  } else {
    return { reason: 'end_turn' } // Terminal: loop exits
  }
}
```
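What makes stop hooks interesting is that they can veto the end of a turn. A minimal sketch, assuming a hypothetical `StopHook` that returns feedback text to force another turn (or `null` to allow stopping) and a hypothetical `maxForcedTurns` cap; without some cap, a hook that always objects produces the "Loop never stops" symptom from the debugging table at the end of this page.

```typescript
// Hypothetical sketch; StopHook and maxForcedTurns are illustrative, not Claude Code's API.
type Msg = { role: 'user' | 'assistant'; content: string }
// Return feedback text to force another turn, or null to let the turn end.
type StopHook = (last: Msg) => string | null

async function runUntilHooksSatisfied(
  callModel: (history: Msg[]) => Promise<Msg>,
  history: Msg[],
  stopHook: StopHook,
  maxForcedTurns: number,
): Promise<{ history: Msg[]; forced: number }> {
  let forced = 0
  while (true) {
    const reply = await callModel(history)
    history.push(reply)
    const feedback = stopHook(reply)
    if (feedback === null || forced >= maxForcedTurns) return { history, forced }
    // Inject the hook's feedback as a user message and run the model again.
    history.push({ role: 'user', content: feedback })
    forced++
  }
}
```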
### `tool_use` — Model wants to use tools
The tool results are pushed to messages and the loop continues:
```typescript
messages.push(...toolResults)
// Inject CLAUDE.md attachments for newly-discovered memory files
const attachments = await getAttachmentMessages(messages, toolUseContext)
messages.push(...attachments)
// Continue to next iteration (back to Phase 1)
```
### `max_tokens` — Response was truncated
```typescript
if (state.maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
  // Retry with increased max_tokens
  state.maxOutputTokensRecoveryCount++
  continue
} else {
  // Trigger reactive compact as last resort
  state.hasAttemptedReactiveCompact = true
}
```
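The recovery logic is a bounded counter. A standalone sketch of the same shape, using the limit of 3 from the debugging table at the end of this page; `generate` and the `needsFallback` flag are hypothetical, and the real code's `max_tokens` escalation and reactive-compact fallback are reduced to a retry and a returned flag:

```typescript
// Hypothetical sketch; only the constant's name and value come from this page.
const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3

type Attempt = { stopReason: 'end_turn' | 'max_tokens'; text: string }

async function generateWithRecovery(
  generate: (retryNumber: number) => Promise<Attempt>,
): Promise<{ text: string; retries: number; needsFallback: boolean }> {
  let retries = 0
  while (true) {
    const attempt = await generate(retries)
    if (attempt.stopReason !== 'max_tokens') {
      return { text: attempt.text, retries, needsFallback: false }
    }
    if (retries >= MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
      // Retries exhausted: signal the caller to use a last-resort strategy.
      return { text: attempt.text, retries, needsFallback: true }
    }
    retries++ // the bounded counter is what guarantees termination
  }
}
```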
---
## QueryEngine: The Session Wrapper
`QueryEngine.ts` wraps `query()` in a session lifecycle:
```typescript
class QueryEngine {
  private mutableMessages: Message[]
  private totalUsage: NonNullableUsage
  private readFileState: FileStateCache

  async *submitMessage(prompt): AsyncGenerator<SDKMessage> {
    // 1. Parse user input (slash commands, @mentions)
    const { messages, shouldQuery } = await processUserInput({ input: prompt })
    // 2. Persist transcript to disk
    await recordTranscript(messages)
    // 3. Run the agentic loop
    if (shouldQuery) {
      for await (const message of query({ messages, systemPrompt, tools })) {
        // 4. Map internal messages to SDK format
        yield normalizedSDKMessage(message)
        // 5. Persist each message
        await recordTranscript(messages)
      }
    }
    // 6. Yield the final result with usage stats
    yield { type: 'result', total_cost_usd, usage, duration_ms }
  }
}
```
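What the wrapper adds over `query()` is state that outlives a single turn. A reduced sketch, assuming a hypothetical `runTurn` callback that returns a reply plus token usage; `MiniSession` and its event shapes only loosely mirror the excerpt above, and cost is omitted because it depends on per-model pricing:

```typescript
// Hypothetical session wrapper: owns history and usage across many submitMessage calls.
type Usage = { input_tokens: number; output_tokens: number }
type TurnResult = { reply: string; usage: Usage }
type SessionEvent =
  | { type: 'assistant'; text: string }
  | { type: 'result'; usage: Usage; turns: number }

class MiniSession {
  private readonly runTurn: (history: string[]) => Promise<TurnResult>
  private readonly messages: string[] = []
  private totalUsage: Usage = { input_tokens: 0, output_tokens: 0 }
  private turns = 0

  constructor(runTurn: (history: string[]) => Promise<TurnResult>) {
    this.runTurn = runTurn
  }

  async *submitMessage(prompt: string): AsyncGenerator<SessionEvent> {
    this.messages.push(prompt)
    const { reply, usage } = await this.runTurn(this.messages)
    this.messages.push(reply)
    this.turns++
    // Totals accumulate across every submitMessage call on this session.
    this.totalUsage = {
      input_tokens: this.totalUsage.input_tokens + usage.input_tokens,
      output_tokens: this.totalUsage.output_tokens + usage.output_tokens,
    }
    yield { type: 'assistant', text: reply }
    yield { type: 'result', usage: { ...this.totalUsage }, turns: this.turns }
  }
}
```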
---
## Key Patterns to Understand
### 1. Generator Composition
The codebase uses `yield*` heavily to compose generators:
```typescript
// query.ts delegates to sub-generators
yield* runTools(toolUseBlocks, canUseTool, toolUseContext)

// QueryEngine iterates query() and delegates per-message normalization
for await (const message of query(params)) {
  yield* normalizeMessage(message)
}
```
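One detail makes `yield*` especially useful: it forwards everything the inner generator yields to the outer consumer, then evaluates to the inner generator's *return* value. That is how `const toolResults = yield* runTools(...)` in Phase 3 can stream progress and still capture a result. A standalone illustration with hypothetical names:

```typescript
// Hypothetical demo. Inner generator: yields progress events, then returns a summary value.
async function* runSteps(steps: string[]): AsyncGenerator<string, number> {
  for (const step of steps) {
    yield `running ${step}`
  }
  return steps.length
}

// Outer generator: re-yields everything from runSteps and captures its return value.
async function* outer(): AsyncGenerator<string, string> {
  const count = yield* runSteps(['lint', 'test'])
  yield `finished ${count} steps`
  return 'done'
}
```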
### 2. Feature-Gated Loading
Code paths are gated by build-time feature flags:
```typescript
const reactiveCompact = feature('REACTIVE_COMPACT')
  ? require('./services/compact/reactiveCompact.js')
  : null

// Later...
if (reactiveCompact?.isReactiveCompactEnabled()) {
  // This entire branch is eliminated in builds where REACTIVE_COMPACT is false
}
```
### 3. Tombstone Messages
When a streaming fallback occurs (model switch mid-stream), orphaned messages are tombstoned:
```typescript
for (const msg of assistantMessages) {
  yield { type: 'tombstone', message: msg } // Removed from UI + transcript
}
assistantMessages.length = 0 // Reset
```
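On the consuming side, a tombstone means retracting a message that was already shown. A minimal sketch of a consumer that honors it, assuming each message carries a stable `uuid` (an assumption for this example) and the UI keeps a plain array:

```typescript
// Hypothetical consumer; ChatMessage, LoopEvent, and the uuid field are illustrative.
type ChatMessage = { uuid: string; text: string }
type LoopEvent =
  | { type: 'message'; message: ChatMessage }
  | { type: 'tombstone'; message: ChatMessage }

// Apply loop events to the visible transcript, dropping tombstoned messages.
function applyEvents(events: LoopEvent[]): ChatMessage[] {
  const visible: ChatMessage[] = []
  for (const ev of events) {
    if (ev.type === 'message') {
      visible.push(ev.message)
    } else {
      const i = visible.findIndex((m) => m.uuid === ev.message.uuid)
      if (i !== -1) visible.splice(i, 1) // retract the orphaned message
    }
  }
  return visible
}
```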
### 4. Task Budget Tracking
API-level task budgets track spend across compaction boundaries:
```typescript
if (params.taskBudget) {
  const preCompactContext = finalContextTokensFromLastResponse(messagesForQuery)
  taskBudgetRemaining = Math.max(
    0,
    (taskBudgetRemaining ?? params.taskBudget.total) - preCompactContext,
  )
}
```
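Worked through with made-up numbers: with a 100,000-token budget, a first compaction at 30,000 tokens of context leaves 70,000, a second at 25,000 leaves 45,000, and the `Math.max(0, ...)` clamp holds the remainder at 0 once spend exceeds what is left. The same arithmetic as a standalone function (`nextBudgetRemaining` is a hypothetical name):

```typescript
// Hypothetical helper mirroring the update above: the first compaction starts from the total,
// later ones from the running remainder; the pre-compaction context size is subtracted and
// the result is clamped at zero.
function nextBudgetRemaining(total: number, remaining: number | undefined, preCompactContext: number): number {
  return Math.max(0, (remaining ?? total) - preCompactContext)
}

let remaining: number | undefined
remaining = nextBudgetRemaining(100_000, remaining, 30_000) // 70_000
remaining = nextBudgetRemaining(100_000, remaining, 25_000) // 45_000
remaining = nextBudgetRemaining(100_000, remaining, 60_000) // 0 (clamped)
```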
---
## Common Debugging Scenarios
| Symptom | Where to Look |
|---------|--------------|
| Loop never stops | Check `maxTurns` limit, `handleStopHooks` |
| Tool not executing | Permission system — check deny rules, hooks, classifier |
| Context too large | Compaction pipeline — which stage is failing? |
| Model fallback | `withRetry` in claude.ts; 529 (overloaded) errors trigger the fallback |
| Truncation errors | `MAX_OUTPUT_TOKENS_RECOVERY_LIMIT` (3 retries) |
---
**Previous:** [← System Overview](./01-system-overview.md) · **Next:** [Tool System →](./03-tool-system.md)