# 2. The Agentic Loop

> How Claude Code cycles between the model and tools — the core of the entire system.

---

## What Is the "Agentic Loop"?

The agentic loop is the mechanism that makes Claude Code more than a chatbot. Instead of:

```
User → Model → Response (done)
```

Claude Code does:

```
User → Model → Tool Call → Model → Tool Call → ... → Response (done)
```

The model can call tools, see results, and decide what to do next — in a **loop** — until it decides it's finished.

This loop lives in **`query.ts`** (1,730 lines), and it's the most important file in the entire codebase.

---

## The Full Sequence

```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1a2e', 'primaryTextColor': '#e0e0e0', 'lineColor': '#4a9eff', 'actorTextColor': '#e0e0e0', 'actorBorder': '#4a9eff', 'signalColor': '#4a9eff', 'noteBkgColor': '#16213e', 'noteTextColor': '#e0e0e0', 'activationBkgColor': '#2d1b4e', 'activationBorderColor': '#e83e8c'}}}%%
sequenceDiagram
    participant U as User
    participant QE as QueryEngine
    participant PI as processUserInput
    participant Q as query.ts
    participant CP as Compaction
    participant C as claude.ts
    participant API as Anthropic API
    participant T as Tool Executor
    participant H as Hooks

    U->>QE: submitMessage(prompt)
    activate QE
    QE->>PI: parse slash commands, @mentions, attachments
    PI-->>QE: messages[], shouldQuery

    alt Slash command — no API call needed
        QE-->>U: return local result
    else Model query required
        QE->>QE: persist transcript to disk
        QE->>Q: query(messages, systemPrompt, tools)
        activate Q

        loop Agentic Loop — until end_turn or max_turns
            Q->>CP: snip compact
            CP->>CP: micro compact
            CP->>CP: context collapse
            CP->>CP: auto compact
            CP-->>Q: compacted messages

            Q->>C: queryModel(messages, tools, thinking)
            activate C
            C->>C: build request: betas, cache, effort, budget
            C->>API: POST /v1/messages — SSE stream
            activate API
            API-->>C: message_start — usage
            API-->>C: content_block — thinking / text / tool_use
            API-->>C: message_delta — stop_reason, final usage
            deactivate API
            C-->>Q: yield AssistantMessage + StreamEvents
            deactivate C

            alt stop_reason = end_turn
                Q->>H: postSamplingHooks
                Q->>Q: handleStopHooks
                Q-->>QE: return Terminal result
            else stop_reason = tool_use
                Q->>T: runTools(toolUseBlocks)
                activate T
                T->>T: validate input against schema
                T->>H: PreToolUse hooks
                T->>T: check permissions
                T->>T: call(input, context)
                T->>H: PostToolUse hooks
                T-->>Q: yield tool_result messages
                deactivate T
                Q->>Q: inject CLAUDE.md attachments
                Note over Q: push tool_results, continue loop
            else stop_reason = max_tokens
                Q->>Q: truncation retry — up to 3x
                Q->>CP: reactive compact — emergency
            end
        end

        deactivate Q
        QE->>QE: accumulate usage, persist transcript
        QE-->>U: yield SDKMessage stream
    end
    deactivate QE
```

---

## Anatomy of `query.ts`

The file exports a single async generator function:

```typescript
export async function* query(params: QueryParams): AsyncGenerator {
  // ... 1,730 lines of agentic loop
}
```

### Why an Async Generator?

This is a crucial design decision. An async generator (`async function*`) lets `query.ts`:

1. **Yield messages as they arrive** — The consumer (REPL or SDK) sees each message in real time
2. **Maintain backpressure** — The loop pauses if the consumer isn't ready
3. **Support cancellation** — `.return()` on the generator cleanly tears down the loop
4. **Compose generators** — `yield*` delegates to sub-generators seamlessly

### The Loop State

Each loop iteration carries mutable state:

```typescript
type State = {
  messages: Message[]                            // Conversation history
  toolUseContext: ToolUseContext                 // Tool execution context
  autoCompactTracking: AutoCompactTrackingState  // Compact progress
  maxOutputTokensRecoveryCount: number           // Truncation retry counter
  hasAttemptedReactiveCompact: boolean           // Emergency compact flag
  turnCount: number                              // Loop iteration counter
  pendingToolUseSummary: Promise<...>            // Async summary generation
  transition: Continue | undefined               // Why we're in this iteration
}
```

---

## Phase 1: Pre-Processing — Before the API Call

Before every API call, `query.ts` runs a **compaction pipeline** on the message history:

```typescript
// 1. Apply per-message tool result budgets
messagesForQuery = await applyToolResultBudget(messagesForQuery, ...)

// 2. Snip compact — sliding window over old turns
if (feature('HISTORY_SNIP')) {
  const snipResult = snipModule.snipCompactIfNeeded(messagesForQuery)
  messagesForQuery = snipResult.messages
}

// 3. Micro compact — truncate oversized tool results
const microcompactResult = await deps.microcompact(messagesForQuery, ...)
messagesForQuery = microcompactResult.messages

// 4. Context collapse — read-time projection
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
  const collapseResult = await contextCollapse.applyCollapsesIfNeeded(...)
  messagesForQuery = collapseResult.messages
}

// 5. Auto compact — full summarization via API
const { compactionResult } = await deps.autocompact(messagesForQuery, ...)
```

The stages run **independently**, and some (snip, context collapse) are feature-gated. They compose in order: snip reduces history, micro truncates individual results, collapse archives old views, and auto summarizes the whole thing.
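
To make the composition idea concrete, here is a minimal sketch of independent stages chained over a message array. Every name in it (`Msg`, `Stage`, `snip`, `micro`, `runPipeline`) is hypothetical and chosen for illustration; it is synchronous and far simpler than the real pipeline, which awaits several stages and may call the API.

```typescript
// Hypothetical sketch: none of these names come from query.ts.
type Msg = { role: 'user' | 'assistant' | 'tool'; content: string }

type Stage = {
  name: string
  enabled: boolean                 // stands in for a feature gate
  run: (messages: Msg[]) => Msg[]  // returns a new array, never mutates
}

// Keep only the most recent `keep` messages (a stand-in for snip compact).
const snip = (keep: number): Stage => ({
  name: 'snip',
  enabled: true,
  run: msgs => (keep > 0 ? msgs.slice(-keep) : []),
})

// Truncate oversized tool results (a stand-in for micro compact).
const micro = (maxChars: number): Stage => ({
  name: 'micro',
  enabled: true,
  run: msgs =>
    msgs.map(m =>
      m.role === 'tool' && m.content.length > maxChars
        ? { ...m, content: m.content.slice(0, maxChars) + '...[truncated]' }
        : m,
    ),
})

// Feed each enabled stage the output of the previous one.
function runPipeline(messages: Msg[], stages: Stage[]): Msg[] {
  return stages
    .filter(stage => stage.enabled)
    .reduce((acc, stage) => stage.run(acc), messages)
}

const history: Msg[] = [
  { role: 'user', content: 'old question' },
  { role: 'assistant', content: 'old answer' },
  { role: 'user', content: 'read the file' },
  { role: 'tool', content: 'x'.repeat(50) },
]

// snip keeps the last two messages, then micro shortens the tool result.
const compacted = runPipeline(history, [snip(2), micro(10)])
```

Because each stage only takes and returns a message array, a disabled stage drops out of the chain without the others needing to know, which is the property the feature gates rely on.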

### Blocking Limit Check

After compaction, the loop checks whether the conversation is at the **blocking limit** (>98% context used):

```typescript
const { isAtBlockingLimit } = calculateTokenWarningState(
  tokenCountWithEstimation(messagesForQuery),
  toolUseContext.options.mainLoopModel,
)

if (isAtBlockingLimit) {
  yield createAssistantAPIErrorMessage({ content: PROMPT_TOO_LONG_ERROR_MESSAGE })
  return { reason: 'blocking_limit' }
}
```

This skips the API call entirely when the request is already known to fail.

---

## Phase 2: The API Call

The actual model call goes through `claude.ts`:

```typescript
for await (const message of deps.callModel({
  messages: prependUserContext(messagesForQuery, userContext),
  systemPrompt: fullSystemPrompt,
  thinkingConfig: toolUseContext.options.thinkingConfig,
  tools: toolUseContext.options.tools,
  signal: toolUseContext.abortController.signal,
  options: {
    model: currentModel,
    fallbackModel,
    effortValue: appState.effortValue,
    taskBudget: params.taskBudget,
    // ... 20+ more options
  },
})) {
  // Process each streamed message
}
```

Responses stream in as SSE events. The loop processes three content block types:

| Block Type | What Happens |
|-----------|-------------|
| `thinking` | Rendered in UI, not sent back to model |
| `text` | Rendered as markdown in terminal |
| `tool_use` | Triggers tool execution (next phase) |

---

## Phase 3: Tool Execution

When the model responds with `tool_use` blocks, the loop executes them:

```typescript
// Parallel tool execution
const toolResults = yield* runTools(toolUseBlocks, canUseTool, toolUseContext)
```

### Streaming Tool Execution

A performance optimization: tools can begin executing **while the model is still streaming**:

```typescript
const useStreamingToolExecution = config.gates.streamingToolExecution

let streamingToolExecutor = useStreamingToolExecution
  ? new StreamingToolExecutor(tools, canUseTool, toolUseContext)
  : null
```

The `StreamingToolExecutor` starts validating and permission-checking tool calls as their blocks arrive, before the full response is complete.

### Tool Lifecycle

Each tool goes through:

1. **Schema Validation** — Zod validates the input against `tool.inputSchema`
2. **PreToolUse Hooks** — User-defined scripts can approve, deny, or modify the input
3. **Permission Check** — Deny rules → Allow rules → Tool-specific check → Classifier → User dialog
4. **Execution** — `tool.call(input, context)` runs the actual operation
5. **PostToolUse Hooks** — Scripts run after execution with the result

---

## Phase 4: Loop Continuation

Once the response is complete (and any requested tools have run), the loop decides what to do next based on the `stop_reason`:

### `end_turn` — Model is done

```typescript
if (stop_reason === 'end_turn') {
  // Run post-sampling hooks
  await executePostSamplingHooks(assistantMessage, toolUseContext)

  // Check stop hooks (can force continuation)
  const stopResult = await handleStopHooks(assistantMessage, messages)
  if (stopResult.shouldContinue) {
    // Inject hook feedback and continue loop
  } else {
    return { reason: 'end_turn' } // Terminal — loop exits
  }
}
```

### `tool_use` — Model wants to use tools

The tool results are pushed to `messages` and the loop continues:

```typescript
messages.push(...toolResults)

// Inject CLAUDE.md attachments for newly-discovered memory files
const attachments = await getAttachmentMessages(messages, toolUseContext)
messages.push(...attachments)

// Continue to next iteration (back to Phase 1)
```

### `max_tokens` — Response was truncated

```typescript
if (state.maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
  // Retry with increased max_tokens
  state.maxOutputTokensRecoveryCount++
  continue
} else {
  // Trigger reactive compact as last resort
  state.hasAttemptedReactiveCompact = true
}
```

---

## QueryEngine: The Session Wrapper

`QueryEngine.ts` wraps `query()` in a session lifecycle:

```typescript
class QueryEngine {
  private mutableMessages: Message[]
  private totalUsage: NonNullableUsage
  private readFileState: FileStateCache

  async *submitMessage(prompt): AsyncGenerator {
    // 1. Parse user input (slash commands, @mentions)
    const { messages, shouldQuery } = await processUserInput({ input: prompt })

    // 2. Persist transcript to disk
    await recordTranscript(messages)

    // 3. Run the agentic loop
    if (shouldQuery) {
      for await (const message of query({ messages, systemPrompt, tools })) {
        // 4. Map internal messages to SDK format
        yield normalizedSDKMessage(message)

        // 5. Persist each message
        await recordTranscript(messages)
      }
    }

    // 6. Return final result with usage stats
    yield { type: 'result', total_cost_usd, usage, duration_ms }
  }
}
```

---

## Key Patterns to Understand

### 1. Generator Composition

The codebase uses `yield*` heavily to compose generators:

```typescript
// query.ts delegates to sub-generators
yield* runTools(toolUseBlocks, canUseTool, toolUseContext)

// QueryEngine delegates to query
for await (const message of query(params)) {
  yield* normalizeMessage(message)
}
```

### 2. Feature-Gated Loading

Code paths are gated by build-time feature flags:

```typescript
const reactiveCompact = feature('REACTIVE_COMPACT')
  ? require('./services/compact/reactiveCompact.js')
  : null

// Later...
if (reactiveCompact?.isReactiveCompactEnabled()) {
  // This entire branch is eliminated in builds where REACTIVE_COMPACT is false
}
```

### 3. Tombstone Messages

When a streaming fallback occurs (model switch mid-stream), orphaned messages are tombstoned:

```typescript
for (const msg of assistantMessages) {
  yield { type: 'tombstone', message: msg } // Removed from UI + transcript
}
assistantMessages.length = 0 // Reset
```

### 4. Task Budget Tracking

API-level task budgets track spend across compaction boundaries:

```typescript
if (params.taskBudget) {
  const preCompactContext = finalContextTokensFromLastResponse(messagesForQuery)
  taskBudgetRemaining = Math.max(
    0,
    (taskBudgetRemaining ?? params.taskBudget.total) - preCompactContext,
  )
}
```

---

## Common Debugging Scenarios

| Symptom | Where to Look |
|---------|--------------|
| Loop never stops | Check `maxTurns` limit, `handleStopHooks` |
| Tool not executing | Permission system — check deny rules, hooks, classifier |
| Context too large | Compaction pipeline — which stage is failing? |
| Model fallback | `withRetry` in claude.ts — 529 overloaded triggers |
| Truncation errors | `MAX_OUTPUT_TOKENS_RECOVERY_LIMIT` (3 retries) |

---

**Previous:** [← System Overview](./01-system-overview.md) · **Next:** [Tool System →](./03-tool-system.md)