2. The Agentic Loop
How Claude Code cycles between the model and tools — the core of the entire system.
What Is the "Agentic Loop"?
The agentic loop is the mechanism that makes Claude Code more than a chatbot. Instead of:
User → Model → Response (done)
Claude Code does:
User → Model → Tool Call → Model → Tool Call → ... → Response (done)
The model can call tools, see results, and decide what to do next — in a loop — until it decides it's finished. This loop lives in query.ts (1,730 lines), and it's the most important file in the entire codebase.
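The cycle described above can be sketched in a few lines. This is a minimal illustration, not the code in query.ts: `fakeModel`, `toolTable`, `runAgenticLoop`, and the `Turn` and `ModelReply` types are hypothetical names invented for this example, and the "model" is a stub that requests one tool call and then finishes.

```typescript
// Hypothetical types for illustration; none of these names come from the Claude Code source.
type ToolCall = { tool: string; input: string };
type ModelReply =
  | { stopReason: "end_turn"; text: string }
  | { stopReason: "tool_use"; calls: ToolCall[] };
type Turn = { role: "user" | "assistant" | "tool"; content: string };

// Stand-in model: asks for one tool call, then finishes once it sees a tool result.
function fakeModel(history: Turn[]): ModelReply {
  const lastTool = [...history].reverse().find((t) => t.role === "tool");
  if (!lastTool) {
    return { stopReason: "tool_use", calls: [{ tool: "echo", input: "hello" }] };
  }
  return { stopReason: "end_turn", text: `Tool said: ${lastTool.content}` };
}

// Stand-in tool registry.
const toolTable: Record<string, (input: string) => string> = {
  echo: (input) => input.toUpperCase(),
};

function runAgenticLoop(prompt: string, maxTurns = 10): string {
  const history: Turn[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = fakeModel(history);
    if (reply.stopReason === "end_turn") return reply.text; // model decided it is done
    for (const call of reply.calls) {
      const run = toolTable[call.tool];
      const result = run ? run(call.input) : `unknown tool: ${call.tool}`;
      history.push({ role: "tool", content: result }); // feed the result back to the model
    }
  }
  return "stopped: max turns reached";
}
```

The turn cap plays the role of the max_turns guard: without it, a model that keeps requesting tools would never exit the loop.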
The Full Sequence
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'actorTextColor': '#e0e0e0', 'actorBorder': '#4a9eff', 'signalColor': '#4a9eff', 'noteBkgColor': '#16213e', 'noteTextColor': '#e0e0e0', 'activationBkgColor': '#2d1b4e', 'activationBorderColor': '#e83e8c'}}}%%
sequenceDiagram
participant U as User
participant QE as QueryEngine
participant PI as processUserInput
participant Q as query.ts
participant CP as Compaction
participant C as claude.ts
participant API as Anthropic API
participant T as Tool Executor
participant H as Hooks
U->>QE: submitMessage(prompt)
activate QE
QE->>PI: parse slash commands, @mentions, attachments
PI-->>QE: messages[], shouldQuery
alt Slash command — no API call needed
QE-->>U: return local result
else Model query required
QE->>QE: persist transcript to disk
QE->>Q: query(messages, systemPrompt, tools)
activate Q
loop Agentic Loop — until end_turn or max_turns
Q->>CP: snip compact
CP->>CP: micro compact
CP->>CP: auto compact
CP->>CP: context collapse
CP-->>Q: compacted messages
Q->>C: queryModel(messages, tools, thinking)
activate C
C->>C: build request: betas, cache, effort, budget
C->>API: POST /v1/messages — SSE stream
activate API
API-->>C: message_start — usage
API-->>C: content_block — thinking / text / tool_use
API-->>C: message_delta — stop_reason, final usage
deactivate API
C-->>Q: yield AssistantMessage + StreamEvents
deactivate C
alt stop_reason = end_turn
Q->>H: postSamplingHooks
Q->>Q: handleStopHooks
Q-->>QE: return Terminal result
else stop_reason = tool_use
Q->>T: runTools(toolUseBlocks)
activate T
T->>T: validate input against schema
T->>H: PreToolUse hooks
T->>T: check permissions
T->>T: call(input, context)
T->>H: PostToolUse hooks
T-->>Q: yield tool_result messages
deactivate T
Q->>Q: inject CLAUDE.md attachments
Note over Q: push tool_results, continue loop
else stop_reason = max_tokens
Q->>Q: truncation retry — up to 3x
Q->>CP: reactive compact — emergency
end
end
deactivate Q
QE->>QE: accumulate usage, persist transcript
QE-->>U: yield SDKMessage stream
end
deactivate QE
Anatomy of query.ts
The file exports a single async generator function:
export async function* query(params: QueryParams):
AsyncGenerator<StreamEvent | Message | ToolUseSummaryMessage, Terminal> {
// ... 1,730 lines of agentic loop
}
Why an Async Generator?
This is a crucial design decision. An async generator (async function*) lets query.ts:
- Yield messages as they arrive: the consumer (REPL or SDK) sees each message in real time
- Maintain backpressure: the loop pauses if the consumer isn't ready
- Support cancellation: `.return()` on the generator cleanly tears down the loop
- Compose generators: `yield*` delegates to sub-generators seamlessly
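The properties listed above can be seen in a small standalone sketch. The names `runToolsSketch`, `loopSketch`, `collect`, and `teardownLog` are hypothetical and exist only for this example; the real query() yields richer message types.

```typescript
// Hypothetical sub-generator standing in for a tool runner.
async function* runToolsSketch(names: string[]): AsyncGenerator<string, number> {
  for (const n of names) yield `tool_result:${n}`;
  return names.length; // the return value is what `yield*` evaluates to
}

const teardownLog: string[] = [];

async function* loopSketch(): AsyncGenerator<string, string> {
  try {
    yield "assistant:thinking";
    const ran = yield* runToolsSketch(["read", "grep"]); // delegate, then resume here
    yield `assistant:done after ${ran} tools`;
    return "end_turn";
  } finally {
    teardownLog.push("cleanup"); // runs on normal exit and when the consumer cancels
  }
}

// Consumer: pulls messages one at a time and stops early after `limit` messages.
async function collect(limit: number): Promise<string[]> {
  const seen: string[] = [];
  for await (const msg of loopSketch()) {
    seen.push(msg);
    if (seen.length === limit) break; // `break` calls .return() on the generator
  }
  return seen;
}
```

Because the generator body only advances when the consumer asks for the next value, a slow consumer naturally slows the producer, and breaking out of the `for await` loop triggers the `finally` block even while the generator is suspended inside the delegated sub-generator.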
The Loop State
Each loop iteration carries mutable state:
type State = {
messages: Message[] // Conversation history
toolUseContext: ToolUseContext // Tool execution context
autoCompactTracking: AutoCompactTrackingState // Compact progress
maxOutputTokensRecoveryCount: number // Truncation retry counter
hasAttemptedReactiveCompact: boolean // Emergency compact flag
turnCount: number // Loop iteration counter
pendingToolUseSummary: Promise<...> // Async summary generation
transition: Continue | undefined // Why we're in this iteration
}
Phase 1: Pre-Processing — Before the API Call
Before every API call, query.ts runs a compaction pipeline on the message history:
// 1. Apply per-message tool result budgets
messagesForQuery = await applyToolResultBudget(messagesForQuery, ...)
// 2. Snip compact — sliding window over old turns
if (feature('HISTORY_SNIP')) {
const snipResult = snipModule.snipCompactIfNeeded(messagesForQuery)
messagesForQuery = snipResult.messages
}
// 3. Micro compact — truncate oversized tool results
const microcompactResult = await deps.microcompact(messagesForQuery, ...)
messagesForQuery = microcompactResult.messages
// 4. Context collapse — read-time projection
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
const collapseResult = await contextCollapse.applyCollapsesIfNeeded(...)
messagesForQuery = collapseResult.messages
}
// 5. Auto compact — full summarization via API
const { compactionResult } = await deps.autocompact(messagesForQuery, ...)
The stages run independently, and some of them (snip and context collapse in the excerpt above) sit behind feature flags. They compose: snip reduces history, micro truncates individual results, collapse archives old views, and auto summarizes the whole thing.
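One way to picture how such stages compose is as a list of functions applied in order, where a disabled stage is simply skipped. The sketch below is an assumption-laden illustration: `HistoryItem`, `CompactStage`, `snipStage`, `microStage`, and `runPipeline` are hypothetical names, and the stage behaviour is simplified from the descriptions above rather than taken from the real implementations.

```typescript
type HistoryItem = { role: "user" | "assistant" | "tool"; text: string };
type CompactStage = (items: HistoryItem[]) => HistoryItem[];

// Assumed behaviour: keep only the most recent `keep` items (a sliding window).
const snipStage = (keep: number): CompactStage => (items) => items.slice(-keep);

// Assumed behaviour: truncate tool results longer than `maxChars`.
const microStage = (maxChars: number): CompactStage => (items) =>
  items.map((m) =>
    m.role === "tool" && m.text.length > maxChars
      ? { ...m, text: m.text.slice(0, maxChars) + "[truncated]" }
      : m,
  );

// A null entry models a stage whose feature flag is off.
function runPipeline(items: HistoryItem[], stages: Array<CompactStage | null>): HistoryItem[] {
  return stages.reduce<HistoryItem[]>((acc, stage) => (stage ? stage(acc) : acc), items);
}

const compacted = runPipeline(
  [
    { role: "user", text: "a" },
    { role: "assistant", text: "b" },
    { role: "tool", text: "0123456789" },
    { role: "assistant", text: "c" },
  ],
  [snipStage(3), null, microStage(5)],
);
```

Each stage takes and returns a message list, so stages can be added, removed, or reordered without touching the others.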
Blocking Limit Check
After compaction, the loop checks if we're at the blocking limit (>98% context used):
const { isAtBlockingLimit } = calculateTokenWarningState(
tokenCountWithEstimation(messagesForQuery),
toolUseContext.options.mainLoopModel,
)
if (isAtBlockingLimit) {
yield createAssistantAPIErrorMessage({ content: PROMPT_TOO_LONG_ERROR_MESSAGE })
return { reason: 'blocking_limit' }
}
This prevents the API call entirely if we know it'll fail.
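The check amounts to comparing estimated usage against a fraction of the context window. The sketch below assumes a crude four-characters-per-token estimate and uses hypothetical names (`estimateTokens`, `tokenWarningSketch`); only the 98% figure comes from the text above, and the real calculateTokenWarningState may compute its thresholds differently.

```typescript
const BLOCKING_THRESHOLD = 0.98; // from the ">98% context used" figure above

// Assumption for illustration: roughly four characters per token.
function estimateTokens(texts: string[]): number {
  return texts.reduce((sum, t) => sum + Math.ceil(t.length / 4), 0);
}

function tokenWarningSketch(usedTokens: number, contextWindow: number) {
  const ratio = usedTokens / contextWindow;
  return { ratio, isAtBlockingLimit: ratio > BLOCKING_THRESHOLD };
}

// Usage: decide whether to skip the API call before sending anything.
const usage = tokenWarningSketch(estimateTokens(["abcd", "abcde"]), 1000);
const shouldBlock = usage.isAtBlockingLimit;
```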
Phase 2: The API Call
The actual model call uses claude.ts:
for await (const message of deps.callModel({
messages: prependUserContext(messagesForQuery, userContext),
systemPrompt: fullSystemPrompt,
thinkingConfig: toolUseContext.options.thinkingConfig,
tools: toolUseContext.options.tools,
signal: toolUseContext.abortController.signal,
options: {
model: currentModel,
fallbackModel,
effortValue: appState.effortValue,
taskBudget: params.taskBudget,
// ... 20+ more options
},
})) {
// Process each streamed message
}
Responses stream in as SSE events. The loop processes three content block types:
| Block Type | What Happens |
|---|---|
| `thinking` | Rendered in UI, not sent back to model |
| `text` | Rendered as markdown in terminal |
| `tool_use` | Triggers tool execution (next phase) |
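A discriminated union with an exhaustive switch is one natural way to route these block types. The sketch below is illustrative only: `ContentBlockSketch` and `routeBlocks` are hypothetical names, and the block shapes are simplified to the fields needed here.

```typescript
// Simplified block shapes for illustration.
type ContentBlockSketch =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

function routeBlocks(blocks: ContentBlockSketch[]) {
  const display: string[] = []; // what the UI would render
  const pendingTools: Array<{ id: string; name: string }> = []; // what Phase 3 would run
  for (const block of blocks) {
    switch (block.type) {
      case "thinking":
        display.push(`[thinking] ${block.thinking}`);
        break;
      case "text":
        display.push(block.text);
        break;
      case "tool_use":
        pendingTools.push({ id: block.id, name: block.name });
        break;
      default: {
        // Compile-time exhaustiveness check: adding a new block type fails to type-check here.
        const unreachable: never = block;
        throw new Error(`unhandled block: ${JSON.stringify(unreachable)}`);
      }
    }
  }
  return { display, pendingTools };
}
```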
Phase 3: Tool Execution
When the model responds with tool_use blocks, the loop executes them:
// Parallel tool execution
const toolResults = yield* runTools(toolUseBlocks, canUseTool, toolUseContext)
Streaming Tool Execution
As a performance optimization, work on a tool call can begin while the model is still streaming:
const useStreamingToolExecution = config.gates.streamingToolExecution
let streamingToolExecutor = useStreamingToolExecution
? new StreamingToolExecutor(tools, canUseTool, toolUseContext)
: null
The StreamingToolExecutor starts validating and permission-checking tool calls as their blocks arrive, before the full response is complete.
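The idea of starting work per block, rather than waiting for the whole response, can be sketched as follows. `EagerExecutorSketch`, `fakeBlockStream`, and `streamAndExecute` are hypothetical names, and the "tool" here resolves immediately; the real StreamingToolExecutor has a different interface and does considerably more.

```typescript
type ToolUseSketch = { id: string; name: string };

// Hypothetical stream: yields tool_use blocks one at a time, like a model response.
async function* fakeBlockStream(blocks: ToolUseSketch[]): AsyncGenerator<ToolUseSketch> {
  for (const b of blocks) {
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated network gap
    yield b;
  }
}

class EagerExecutorSketch {
  readonly timeline: string[] = [];
  private inFlight: Array<Promise<string>> = [];

  // Called as soon as a block arrives; does not wait for the stream to end.
  add(block: ToolUseSketch): void {
    this.timeline.push(`started ${block.id}`);
    this.inFlight.push(Promise.resolve(`${block.name}:ok`));
  }

  // Called after the stream completes; results keep arrival order.
  drain(): Promise<string[]> {
    return Promise.all(this.inFlight);
  }
}

async function streamAndExecute(blocks: ToolUseSketch[]) {
  const executor = new EagerExecutorSketch();
  for await (const block of fakeBlockStream(blocks)) {
    executor.add(block); // work begins while later blocks are still in flight
  }
  executor.timeline.push("stream finished");
  return { timeline: executor.timeline, results: await executor.drain() };
}
```

In the recorded timeline, every "started" entry precedes "stream finished", which is the overlap the optimization is after.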
Tool Lifecycle
Each tool goes through:
- Schema Validation: Zod validates the input against `tool.inputSchema`
- PreToolUse Hooks: user-defined scripts can approve, deny, or modify the input
- Permission Check: deny rules → allow rules → tool-specific check → classifier → user dialog
- Execution: `tool.call(input, context)` runs the actual operation
- PostToolUse Hooks: scripts run after execution with the result
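The lifecycle above is essentially a sequence of gates, any of which can stop the call before execution. The sketch below uses a hand-written type guard in place of Zod and hypothetical names throughout (`ToolSketch`, `HooksSketch`, `runToolLifecycle`, `readTool`); it also omits input modification by hooks and the multi-step permission chain.

```typescript
type Outcome =
  | { ok: true; output: string; trace: string[] }
  | { ok: false; reason: string; trace: string[] };

type ToolSketch = {
  validate: (input: unknown) => input is { path: string };
  call: (input: { path: string }) => string;
};

type HooksSketch = {
  preToolUse?: (input: { path: string }) => "allow" | "deny";
  postToolUse?: (output: string) => void;
};

function runToolLifecycle(
  tool: ToolSketch,
  rawInput: unknown,
  isPermitted: (input: { path: string }) => boolean,
  hooks: HooksSketch = {},
): Outcome {
  const trace: string[] = ["validate"];
  if (!tool.validate(rawInput)) return { ok: false, reason: "invalid input", trace };
  trace.push("pre-hook");
  if (hooks.preToolUse?.(rawInput) === "deny") return { ok: false, reason: "denied by hook", trace };
  trace.push("permission");
  if (!isPermitted(rawInput)) return { ok: false, reason: "permission denied", trace };
  trace.push("call");
  const output = tool.call(rawInput);
  trace.push("post-hook");
  hooks.postToolUse?.(output);
  return { ok: true, output, trace };
}

// Usage with a hypothetical read-only tool.
const readTool: ToolSketch = {
  validate: (input): input is { path: string } =>
    typeof input === "object" && input !== null &&
    typeof (input as { path?: unknown }).path === "string",
  call: (input) => `contents of ${input.path}`,
};
```

Returning the trace alongside the result makes it easy to see which gate stopped a call, which is the first question when a tool does not execute.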
Phase 4: Loop Continuation
After each model response, the loop decides what to do next based on the stop_reason:
end_turn — Model is done
if (stop_reason === 'end_turn') {
// Run post-sampling hooks
await executePostSamplingHooks(assistantMessage, toolUseContext)
// Check stop hooks (can force continuation)
const stopResult = await handleStopHooks(assistantMessage, messages)
if (stopResult.shouldContinue) {
// Inject hook feedback and continue loop
} else {
return { reason: 'end_turn' } // Terminal — loop exits
}
}
tool_use — Model wants to use tools
The tool results are pushed to messages and the loop continues:
messages.push(...toolResults)
// Inject CLAUDE.md attachments for newly-discovered memory files
const attachments = await getAttachmentMessages(messages, toolUseContext)
messages.push(...attachments)
// Continue to next iteration (back to Phase 1)
max_tokens — Response was truncated
if (maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
// Retry with increased max_tokens
state.maxOutputTokensRecoveryCount++
continue
} else {
// Trigger reactive compact as last resort
state.hasAttemptedReactiveCompact = true
}
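The recovery logic is a bounded retry followed by a one-time fallback. In the sketch below, the limit of 3 matches the retry count given for MAX_OUTPUT_TOKENS_RECOVERY_LIMIT elsewhere in this document; the names `RecoveryState`, `NextStep`, and `onTruncated`, and the final "give_up" outcome, are assumptions made for illustration.

```typescript
const RECOVERY_LIMIT = 3; // the document cites 3 retries for MAX_OUTPUT_TOKENS_RECOVERY_LIMIT

type RecoveryState = { recoveryCount: number; hasAttemptedReactiveCompact: boolean };
type NextStep = "retry" | "reactive_compact" | "give_up";

// Decide what to do after a truncated (max_tokens) response, mutating the state like the loop does.
function onTruncated(state: RecoveryState): NextStep {
  if (state.recoveryCount < RECOVERY_LIMIT) {
    state.recoveryCount++;
    return "retry";
  }
  if (!state.hasAttemptedReactiveCompact) {
    state.hasAttemptedReactiveCompact = true; // emergency compaction is tried only once
    return "reactive_compact";
  }
  return "give_up"; // assumed terminal outcome; the excerpt does not show this branch
}

// Usage: five consecutive truncations on the same state.
const recoveryState: RecoveryState = { recoveryCount: 0, hasAttemptedReactiveCompact: false };
const steps: NextStep[] = [];
for (let i = 0; i < 5; i++) steps.push(onTruncated(recoveryState));
```

Keeping both counters in the loop state is what prevents an endless retry cycle when every response comes back truncated.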
QueryEngine: The Session Wrapper
QueryEngine.ts wraps query() in a session lifecycle:
class QueryEngine {
private mutableMessages: Message[]
private totalUsage: NonNullableUsage
private readFileState: FileStateCache
async *submitMessage(prompt): AsyncGenerator<SDKMessage> {
// 1. Parse user input (slash commands, @mentions)
const { messages, shouldQuery } = await processUserInput({ input: prompt })
// 2. Persist transcript to disk
await recordTranscript(messages)
// 3. Run the agentic loop
if (shouldQuery) {
for await (const message of query({ messages, systemPrompt, tools })) {
// 4. Map internal messages to SDK format
yield normalizedSDKMessage(message)
// 5. Persist each message
await recordTranscript(messages)
}
}
// 6. Return final result with usage stats
yield { type: 'result', total_cost_usd, usage, duration_ms }
}
}
Key Patterns to Understand
1. Generator Composition
The codebase uses yield* heavily to compose generators:
// query.ts delegates to sub-generators
yield* runTools(toolUseBlocks, canUseTool, toolUseContext)
// QueryEngine delegates to query
for await (const message of query(params)) {
yield* normalizeMessage(message)
}
2. Feature-Gated Loading
Code paths are gated by build-time feature flags:
const reactiveCompact = feature('REACTIVE_COMPACT')
? require('./services/compact/reactiveCompact.js')
: null
// Later...
if (reactiveCompact?.isReactiveCompactEnabled()) {
// This entire branch is eliminated in builds where REACTIVE_COMPACT is false
}
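The runtime half of this pattern can be sketched without a bundler: a flag lookup decides whether a module is loaded at all, and callers use optional chaining on the possibly-null result. The flag values and the names `BUILD_FLAGS`, `featureSketch`, `CompactModuleSketch`, and `loadIfEnabled` are hypothetical; actual dead-code elimination depends on the build tool treating the flag as a compile-time constant, which this sketch does not demonstrate.

```typescript
// Hypothetical flag values; in a real build these would be fixed at compile time.
const BUILD_FLAGS = { REACTIVE_COMPACT: false, HISTORY_SNIP: true } as const;
type FlagName = keyof typeof BUILD_FLAGS;

function featureSketch(flag: FlagName): boolean {
  return BUILD_FLAGS[flag];
}

type CompactModuleSketch = { isEnabled(): boolean };

// The loader callback stands in for require(); it is never invoked when the flag is off.
function loadIfEnabled(flag: FlagName, load: () => CompactModuleSketch): CompactModuleSketch | null {
  return featureSketch(flag) ? load() : null;
}

let loadCount = 0;
const loader = (): CompactModuleSketch => {
  loadCount++;
  return { isEnabled: () => true };
};

const gatedOff = loadIfEnabled("REACTIVE_COMPACT", loader);
const gatedOn = loadIfEnabled("HISTORY_SNIP", loader);
const offBranchTaken = gatedOff?.isEnabled() ?? false; // optional chaining skips the null module
```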
3. Tombstone Messages
When a streaming fallback occurs (model switch mid-stream), orphaned messages are tombstoned:
for (const msg of assistantMessages) {
yield { type: 'tombstone', message: msg } // Removed from UI + transcript
}
assistantMessages.length = 0 // Reset
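On the consuming side, a tombstone is an instruction to retract something already shown. A minimal sketch of a consumer that honours tombstones follows; `TranscriptEntry`, `StreamItem`, and `applyStream` are hypothetical names, and matching by `id` is an assumption about how messages are identified.

```typescript
type TranscriptEntry = { id: string; text: string };
type StreamItem =
  | { type: "message"; message: TranscriptEntry }
  | { type: "tombstone"; message: TranscriptEntry };

// Fold a stream of items into the list of entries that should remain visible.
function applyStream(items: StreamItem[]): TranscriptEntry[] {
  let visible: TranscriptEntry[] = [];
  for (const item of items) {
    if (item.type === "tombstone") {
      visible = visible.filter((m) => m.id !== item.message.id); // retract the orphaned message
    } else {
      visible.push(item.message);
    }
  }
  return visible;
}

// Usage: a partial reply from the first model is retracted, then the fallback model answers.
const partialReply: TranscriptEntry = { id: "m1", text: "partial answer from first model" };
const finalTranscript = applyStream([
  { type: "message", message: partialReply },
  { type: "tombstone", message: partialReply },
  { type: "message", message: { id: "m2", text: "answer from fallback model" } },
]);
```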
4. Task Budget Tracking
API-level task budgets track spend across compaction boundaries:
if (params.taskBudget) {
const preCompactContext = finalContextTokensFromLastResponse(messagesForQuery)
taskBudgetRemaining = Math.max(
0,
(taskBudgetRemaining ?? params.taskBudget.total) - preCompactContext,
)
}
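The arithmetic in that snippet is a running subtraction clamped at zero, seeded from the total on first use. A worked sketch with made-up numbers, where `remainingBudget` is a hypothetical helper that isolates the expression:

```typescript
// Mirrors the expression above: start from the total on first use, never go below zero.
function remainingBudget(
  total: number,
  remaining: number | undefined,
  preCompactContext: number,
): number {
  return Math.max(0, (remaining ?? total) - preCompactContext);
}

// First compaction: nothing tracked yet, so the total (100,000) is the starting point.
const afterFirst = remainingBudget(100_000, undefined, 30_000);

// Second compaction: the context consumed exceeds what is left, so the result clamps to 0.
const afterSecond = remainingBudget(100_000, afterFirst, 80_000);
```

Here `afterFirst` is 70,000 and `afterSecond` is 0, since 70,000 minus 80,000 would otherwise be negative.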
Common Debugging Scenarios
| Symptom | Where to Look |
|---|---|
| Loop never stops | Check maxTurns limit, handleStopHooks |
| Tool not executing | Permission system — check deny rules, hooks, classifier |
| Context too large | Compaction pipeline — which stage is failing? |
| Model fallback | withRetry in claude.ts — 529 overloaded triggers |
| Truncation errors | MAX_OUTPUT_TOKENS_RECOVERY_LIMIT (3 retries) |