414 lines
12 KiB
Markdown
414 lines
12 KiB
Markdown
# 2. The Agentic Loop
|
|
|
|
> How Claude Code cycles between the model and tools — the core of the entire system.
|
|
|
|
---
|
|
|
|
## What Is the "Agentic Loop"?
|
|
|
|
The agentic loop is the mechanism that makes Claude Code more than a chatbot. Instead of:
|
|
|
|
```
|
|
User → Model → Response (done)
|
|
```
|
|
|
|
Claude Code does:
|
|
|
|
```
|
|
User → Model → Tool Call → Model → Tool Call → ... → Response (done)
|
|
```
|
|
|
|
The model can call tools, see results, and decide what to do next — in a **loop** — until it decides it's finished. This loop lives in **`query.ts`** (1,730 lines), and it's the most important file in the entire codebase.
|
|
|
|
---
|
|
|
|
## The Full Sequence
|
|
|
|
```mermaid
|
|
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'actorTextColor': '#e0e0e0', 'actorBorder': '#4a9eff', 'signalColor': '#4a9eff', 'noteBkgColor': '#16213e', 'noteTextColor': '#e0e0e0', 'activationBkgColor': '#2d1b4e', 'activationBorderColor': '#e83e8c'}}}%%
|
|
sequenceDiagram
|
|
participant U as User
|
|
participant QE as QueryEngine
|
|
participant PI as processUserInput
|
|
participant Q as query.ts
|
|
participant CP as Compaction
|
|
participant C as claude.ts
|
|
participant API as Anthropic API
|
|
participant T as Tool Executor
|
|
participant H as Hooks
|
|
|
|
U->>QE: submitMessage(prompt)
|
|
activate QE
|
|
QE->>PI: parse slash commands, @mentions, attachments
|
|
PI-->>QE: messages[], shouldQuery
|
|
|
|
alt Slash command — no API call needed
|
|
QE-->>U: return local result
|
|
else Model query required
|
|
QE->>QE: persist transcript to disk
|
|
QE->>Q: query(messages, systemPrompt, tools)
|
|
activate Q
|
|
|
|
loop Agentic Loop — until end_turn or max_turns
|
|
Q->>CP: snip compact
|
|
CP->>CP: micro compact
|
|
CP->>CP: auto compact
|
|
CP->>CP: context collapse
|
|
CP-->>Q: compacted messages
|
|
|
|
Q->>C: queryModel(messages, tools, thinking)
|
|
activate C
|
|
C->>C: build request: betas, cache, effort, budget
|
|
C->>API: POST /v1/messages — SSE stream
|
|
activate API
|
|
|
|
API-->>C: message_start — usage
|
|
API-->>C: content_block — thinking / text / tool_use
|
|
API-->>C: message_delta — stop_reason, final usage
|
|
deactivate API
|
|
|
|
C-->>Q: yield AssistantMessage + StreamEvents
|
|
deactivate C
|
|
|
|
alt stop_reason = end_turn
|
|
Q->>H: postSamplingHooks
|
|
Q->>Q: handleStopHooks
|
|
Q-->>QE: return Terminal result
|
|
else stop_reason = tool_use
|
|
Q->>T: runTools(toolUseBlocks)
|
|
activate T
|
|
|
|
T->>T: validate input against schema
|
|
T->>H: PreToolUse hooks
|
|
T->>T: check permissions
|
|
T->>T: call(input, context)
|
|
T->>H: PostToolUse hooks
|
|
T-->>Q: yield tool_result messages
|
|
deactivate T
|
|
|
|
Q->>Q: inject CLAUDE.md attachments
|
|
Note over Q: push tool_results, continue loop
|
|
else stop_reason = max_tokens
|
|
Q->>Q: truncation retry — up to 3x
|
|
Q->>CP: reactive compact — emergency
|
|
end
|
|
end
|
|
|
|
deactivate Q
|
|
QE->>QE: accumulate usage, persist transcript
|
|
QE-->>U: yield SDKMessage stream
|
|
end
|
|
deactivate QE
|
|
```
|
|
|
|
---
|
|
|
|
## Anatomy of `query.ts`
|
|
|
|
The file exports a single async generator function:
|
|
|
|
```typescript
|
|
export async function* query(params: QueryParams):
|
|
AsyncGenerator<StreamEvent | Message | ToolUseSummaryMessage, Terminal> {
|
|
// ... 1,730 lines of agentic loop
|
|
}
|
|
```
|
|
|
|
### Why an Async Generator?
|
|
|
|
This is a crucial design decision. An async generator (`async function*`) lets query.ts:
|
|
|
|
1. **Yield messages as they arrive** — The consumer (REPL or SDK) sees each message in real-time
|
|
2. **Maintain backpressure** — The loop pauses if the consumer isn't ready
|
|
3. **Support cancellation** — `.return()` on the generator cleanly tears down the loop
|
|
4. **Compose generators** — `yield*` delegates to sub-generators seamlessly
|
|
|
|
### The Loop State
|
|
|
|
Each loop iteration carries mutable state:
|
|
|
|
```typescript
|
|
type State = {
|
|
messages: Message[] // Conversation history
|
|
toolUseContext: ToolUseContext // Tool execution context
|
|
autoCompactTracking: AutoCompactTrackingState // Compact progress
|
|
maxOutputTokensRecoveryCount: number // Truncation retry counter
|
|
hasAttemptedReactiveCompact: boolean // Emergency compact flag
|
|
turnCount: number // Loop iteration counter
|
|
pendingToolUseSummary: Promise<...> // Async summary generation
|
|
transition: Continue | undefined // Why we're in this iteration
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Phase 1: Pre-Processing — Before the API Call
|
|
|
|
Before every API call, query.ts runs a **compaction pipeline** on the message history:
|
|
|
|
```typescript
|
|
// 1. Apply per-message tool result budgets
|
|
messagesForQuery = await applyToolResultBudget(messagesForQuery, ...)
|
|
|
|
// 2. Snip compact — sliding window over old turns
|
|
if (feature('HISTORY_SNIP')) {
|
|
const snipResult = snipModule.snipCompactIfNeeded(messagesForQuery)
|
|
messagesForQuery = snipResult.messages
|
|
}
|
|
|
|
// 3. Micro compact — truncate oversized tool results
|
|
const microcompactResult = await deps.microcompact(messagesForQuery, ...)
|
|
messagesForQuery = microcompactResult.messages
|
|
|
|
// 4. Context collapse — read-time projection
|
|
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
|
|
const collapseResult = await contextCollapse.applyCollapsesIfNeeded(...)
|
|
messagesForQuery = collapseResult.messages
|
|
}
|
|
|
|
// 5. Auto compact — full summarization via API
|
|
const { compactionResult } = await deps.autocompact(messagesForQuery, ...)
|
|
```
|
|
|
|
Each stage is feature-gated and runs **independently**. They compose — snip reduces history, micro truncates individual results, auto summarizes the whole thing, collapse archives old views.
|
|
|
|
### Blocking Limit Check
|
|
|
|
After compaction, the loop checks if we're at the **blocking limit** (>98% context used):
|
|
|
|
```typescript
|
|
const { isAtBlockingLimit } = calculateTokenWarningState(
|
|
tokenCountWithEstimation(messagesForQuery),
|
|
toolUseContext.options.mainLoopModel,
|
|
)
|
|
if (isAtBlockingLimit) {
|
|
yield createAssistantAPIErrorMessage({ content: PROMPT_TOO_LONG_ERROR_MESSAGE })
|
|
return { reason: 'blocking_limit' }
|
|
}
|
|
```
|
|
|
|
This prevents the API call entirely if we know it'll fail.
|
|
|
|
---
|
|
|
|
## Phase 2: The API Call
|
|
|
|
The actual model call uses `claude.ts`:
|
|
|
|
```typescript
|
|
for await (const message of deps.callModel({
|
|
messages: prependUserContext(messagesForQuery, userContext),
|
|
systemPrompt: fullSystemPrompt,
|
|
thinkingConfig: toolUseContext.options.thinkingConfig,
|
|
tools: toolUseContext.options.tools,
|
|
signal: toolUseContext.abortController.signal,
|
|
options: {
|
|
model: currentModel,
|
|
fallbackModel,
|
|
effortValue: appState.effortValue,
|
|
taskBudget: params.taskBudget,
|
|
// ... 20+ more options
|
|
},
|
|
})) {
|
|
// Process each streamed message
|
|
}
|
|
```
|
|
|
|
Responses stream in as SSE events. The loop processes three content block types:
|
|
|
|
| Block Type | What Happens |
|
|
|-----------|-------------|
|
|
| `thinking` | Rendered in UI, not sent back to model |
|
|
| `text` | Rendered as markdown in terminal |
|
|
| `tool_use` | Triggers tool execution (next phase) |
|
|
|
|
---
|
|
|
|
## Phase 3: Tool Execution
|
|
|
|
When the model responds with `tool_use` blocks, the loop executes them:
|
|
|
|
```typescript
|
|
// Parallel tool execution
|
|
const toolResults = yield* runTools(toolUseBlocks, canUseTool, toolUseContext)
|
|
```
|
|
|
|
### Streaming Tool Execution
|
|
|
|
A performance optimization: tools can begin executing **while the model is still streaming**:
|
|
|
|
```typescript
|
|
const useStreamingToolExecution = config.gates.streamingToolExecution
|
|
let streamingToolExecutor = useStreamingToolExecution
|
|
? new StreamingToolExecutor(tools, canUseTool, toolUseContext)
|
|
: null
|
|
```
|
|
|
|
The `StreamingToolExecutor` starts validating and permission-checking tool calls as their blocks arrive, before the full response is complete.
|
|
|
|
### Tool Lifecycle
|
|
|
|
Each tool goes through:
|
|
|
|
1. **Schema Validation** — Zod validates the input against `tool.inputSchema`
|
|
2. **PreToolUse Hooks** — User-defined scripts can approve, deny, or modify the input
|
|
3. **Permission Check** — Deny rules → Allow rules → Tool-specific check → Classifier → User dialog
|
|
4. **Execution** — `tool.call(input, context)` runs the actual operation
|
|
5. **PostToolUse Hooks** — Scripts run after execution with the result
|
|
|
|
---
|
|
|
|
## Phase 4: Loop Continuation
|
|
|
|
After tools execute, the loop decides what to do next based on the `stop_reason`:
|
|
|
|
### `end_turn` — Model is done
|
|
|
|
```typescript
|
|
if (stop_reason === 'end_turn') {
|
|
// Run post-sampling hooks
|
|
await executePostSamplingHooks(assistantMessage, toolUseContext)
|
|
// Check stop hooks (can force continuation)
|
|
const stopResult = await handleStopHooks(assistantMessage, messages)
|
|
if (stopResult.shouldContinue) {
|
|
// Inject hook feedback and continue loop
|
|
} else {
|
|
return { reason: 'end_turn' } // Terminal — loop exits
|
|
}
|
|
}
|
|
```
|
|
|
|
### `tool_use` — Model wants to use tools
|
|
|
|
The tool results are pushed to messages and the loop continues:
|
|
|
|
```typescript
|
|
messages.push(...toolResults)
|
|
// Inject CLAUDE.md attachments for newly-discovered memory files
|
|
const attachments = await getAttachmentMessages(messages, toolUseContext)
|
|
messages.push(...attachments)
|
|
// Continue to next iteration (back to Phase 1)
|
|
```
|
|
|
|
### `max_tokens` — Response was truncated
|
|
|
|
```typescript
|
|
if (maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
|
|
// Retry with increased max_tokens
|
|
state.maxOutputTokensRecoveryCount++
|
|
continue
|
|
} else {
|
|
// Trigger reactive compact as last resort
|
|
state.hasAttemptedReactiveCompact = true
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## QueryEngine: The Session Wrapper
|
|
|
|
`QueryEngine.ts` wraps `query()` in a session lifecycle:
|
|
|
|
```typescript
|
|
class QueryEngine {
|
|
private mutableMessages: Message[]
|
|
private totalUsage: NonNullableUsage
|
|
private readFileState: FileStateCache
|
|
|
|
async *submitMessage(prompt): AsyncGenerator<SDKMessage> {
|
|
// 1. Parse user input (slash commands, @mentions)
|
|
const { messages, shouldQuery } = await processUserInput({ input: prompt })
|
|
|
|
// 2. Persist transcript to disk
|
|
await recordTranscript(messages)
|
|
|
|
// 3. Run the agentic loop
|
|
if (shouldQuery) {
|
|
for await (const message of query({ messages, systemPrompt, tools })) {
|
|
// 4. Map internal messages to SDK format
|
|
yield normalizedSDKMessage(message)
|
|
// 5. Persist each message
|
|
await recordTranscript(messages)
|
|
}
|
|
}
|
|
|
|
// 6. Return final result with usage stats
|
|
yield { type: 'result', total_cost_usd, usage, duration_ms }
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Key Patterns to Understand
|
|
|
|
### 1. Generator Composition
|
|
|
|
The codebase uses `yield*` heavily to compose generators:
|
|
|
|
```typescript
|
|
// query.ts delegates to sub-generators
|
|
yield* runTools(toolUseBlocks, canUseTool, toolUseContext)
|
|
|
|
// QueryEngine delegates to query
|
|
for await (const message of query(params)) {
|
|
yield* normalizeMessage(message)
|
|
}
|
|
```
|
|
|
|
### 2. Feature-Gated Loading
|
|
|
|
Code paths are gated by build-time feature flags:
|
|
|
|
```typescript
|
|
const reactiveCompact = feature('REACTIVE_COMPACT')
|
|
? require('./services/compact/reactiveCompact.js')
|
|
: null
|
|
|
|
// Later...
|
|
if (reactiveCompact?.isReactiveCompactEnabled()) {
|
|
// This entire branch is eliminated in builds where REACTIVE_COMPACT is false
|
|
}
|
|
```
|
|
|
|
### 3. Tombstone Messages
|
|
|
|
When a streaming fallback occurs (model switch mid-stream), orphaned messages are tombstoned:
|
|
|
|
```typescript
|
|
for (const msg of assistantMessages) {
|
|
yield { type: 'tombstone', message: msg } // Removed from UI + transcript
|
|
}
|
|
assistantMessages.length = 0 // Reset
|
|
```
|
|
|
|
### 4. Task Budget Tracking
|
|
|
|
API-level task budgets track spend across compaction boundaries:
|
|
|
|
```typescript
|
|
if (params.taskBudget) {
|
|
const preCompactContext = finalContextTokensFromLastResponse(messagesForQuery)
|
|
taskBudgetRemaining = Math.max(
|
|
0,
|
|
(taskBudgetRemaining ?? params.taskBudget.total) - preCompactContext,
|
|
)
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Common Debugging Scenarios
|
|
|
|
| Symptom | Where to Look |
|
|
|---------|--------------|
|
|
| Loop never stops | Check `maxTurns` limit, `handleStopHooks` |
|
|
| Tool not executing | Permission system — check deny rules, hooks, classifier |
|
|
| Context too large | Compaction pipeline — which stage is failing? |
|
|
| Model fallback | `withRetry` in claude.ts — 529 overloaded triggers |
|
|
| Truncation errors | `MAX_OUTPUT_TOKENS_RECOVERY_LIMIT` (3 retries) |
|
|
|
|
---
|
|
|
|
**Previous:** [← System Overview](./01-system-overview.md) · **Next:** [Tool System →](./03-tool-system.md)
|