---
title: 'Claude tool_use Loops: Fixing AI Workflow Failures'
date: '2026-07-04'
description: "We built multi-step AI workflows with Claude's tool_use API. Here's how we fixed tool call loops, error recovery failures, and token budget blowouts in production."
author: 'Matthew J. Whitney'
tags: ['artificial intelligence', 'llm', 'ai integration', 'machine learning']
category: 'ai_ml'
published: true
---
The Claude tool use API was supposed to make our agent smarter. Instead, it made our bill $400 larger in a single afternoon.
We were building an orchestration layer for Crowdia — a workflow where Claude would call a sequence of tools: validate input, fetch external data, transform it, write to a database, then confirm the write succeeded. Five tools, clean chain, straightforward. The first three runs worked perfectly. Then we pushed a minor change to our error-handling middleware and watched the agent enter a loop. It called `fetch_external_data` forty-seven times in eleven minutes before our circuit breaker finally caught it. The model wasn't confused. It was doing exactly what we'd told it to do — which was the problem.
That experience forced me to actually understand what's happening inside a `tool_use` conversation turn rather than just trusting the happy path. What I found is that there are three distinct failure modes that will silently destroy your agent in production, and almost none of the getting-started documentation warns you about them. This is what I wish I'd known before that afternoon.
---
## How the AI Integration Loop Actually Works
Before fixing anything, you need a precise mental model. The [Claude Messages API](https://docs.anthropic.com/en/api/messages) doesn't execute tools — it returns a `tool_use` block in the response, and *your code* is responsible for executing the tool and feeding the result back. This means the loop is yours to control.
A minimal working loop looks like this:
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_order_status",
        "description": "Retrieves current status for a given order ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order ID to look up"}
            },
            "required": ["order_id"]
        }
    }
]

messages = [{"role": "user", "content": "What's the status of order #CRW-8821?"}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

    if response.stop_reason == "end_turn":
        print(response.content[0].text)
        break

    if response.stop_reason == "tool_use":
        # Minimal version: handles only the first tool_use block in the response
        tool_use_block = next(b for b in response.content if b.type == "tool_use")

        # YOUR code runs the tool here
        tool_result = run_tool(tool_use_block.name, tool_use_block.input)

        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_block.id,
                "content": tool_result
            }]
        })
```
This is the skeleton from the official tool use documentation. It works. But notice what it doesn't have: any loop counter, any error state tracking, any token budget awareness. Those omissions are where the three failure modes live.
## Failure Mode 1: The Infinite Tool Call Loop
The loop in the example above has no exit condition except `stop_reason == "end_turn"`. If your tool consistently returns something the model interprets as "I need to call this tool again," you get an unbounded loop at API prices.
This is what happened to us on Crowdia. Our `fetch_external_data` tool was returning a paginated response. The model, correctly interpreting the `has_more: true` field, kept calling the tool to get the next page, but our result formatter was accidentally stripping the data and only returning the pagination metadata. From the model's perspective, it had fetched a page but gotten nothing useful, so it tried again.
The fix has two parts. First, a hard iteration cap:
```python
MAX_TOOL_ITERATIONS = 10
iteration = 0

while iteration < MAX_TOOL_ITERATIONS:
    iteration += 1
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        tools=tools,
        messages=messages
    )

    if response.stop_reason == "end_turn":
        break

    if response.stop_reason == "tool_use":
        # ... handle tool call
        pass
else:
    # Runs only when the loop ends without `break`, so a successful
    # final iteration is not reported as a failure
    raise RuntimeError(
        f"Agent exceeded max iterations ({MAX_TOOL_ITERATIONS}). "
        f"Last stop_reason: {response.stop_reason}"
    )
```
Second, and more importantly, fix the tool result. The model is making decisions based entirely on what you return from `run_tool()`. If your tool result is ambiguous or malformed, the model will do something reasonable with bad data, and that something will be wrong.
A good heuristic: every tool result should be unambiguous about whether it succeeded and what the agent should do next. If a paginated call returns the last page, say so explicitly in the result string.
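As a minimal sketch of that heuristic, a result formatter for a paginated tool might look like the following. The function name `format_page_result`, the `next_step` field, and the cursor parameter are illustrative assumptions, not part of the Crowdia code or the Anthropic API; the point is only that the payload and the "what now" signal travel together.

```python
import json
from typing import Optional


def format_page_result(items: list, has_more: bool, next_cursor: Optional[str] = None) -> str:
    """Build a tool_result string that states the outcome and the next step explicitly.

    Hypothetical helper: field names are illustrative, not an API requirement.
    """
    if has_more:
        next_step = f"More pages exist. Call the tool again with cursor={next_cursor!r}."
    else:
        next_step = "This is the last page. Do not call this tool again for this query."
    return json.dumps({
        "status": "success",
        "item_count": len(items),
        "data": items,  # the payload itself, not just the pagination metadata
        "has_more": has_more,
        "next_step": next_step,
    })
```

Including `item_count` alongside `data` also makes a stripped payload easy to catch in logs: a page that claims success with zero items and `has_more` set is worth a second look.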
## Failure Mode 2: Broken Error Recovery in LLM Workflows
This one is subtler and more dangerous because it looks like it's working.
When a tool call fails (network timeout, validation error, downstream API down), you have to return something as the `tool_result`. The question is what. Most developers do one of two things: they return an empty string, or they raise an exception that kills the whole agent. Both are wrong.
If you return an empty string, the model has no idea an error occurred. It will proceed as if the tool succeeded with no data, generating confident-sounding output based on nothing. I've seen this produce database writes with null fields because the model assumed a lookup had returned empty results rather than failed.
If you raise an exception and kill the agent, you lose all the work done in previous tool calls. For a five-step chain, failing on step four and restarting from scratch is expensive and often unnecessary.
The right pattern is to return a structured error in the tool result and let the model decide how to recover:
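```python
import json

import requests


def run_tool(tool_name: str, tool_input: dict) -> str:
    try:
        if tool_name == "fetch_external_data":
            # external_api stands in for your own client wrapper
            result = external_api.fetch(tool_input["query"])
            return json.dumps({"status": "success", "data": result})
        # Unknown tools get an explicit error instead of an implicit None
        return json.dumps({
            "status": "error",
            "error_type": "unknown_tool",
            "message": f"No tool named {tool_name!r} is registered.",
            "retryable": False
        })
    except requests.exceptions.Timeout:
        return json.dumps({
            "status": "error",
            "error_type": "timeout",
            "message": "The external API timed out after 30s. You may retry once or proceed without this data.",
            "retryable": True
        })
    except ValueError as e:
        return json.dumps({
            "status": "error",
            "error_type": "validation",
            "message": f"Invalid input: {e}",
            "retryable": False
        })
```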
The `retryable` field is critical. When you give the model that signal, it can make an intelligent decision: retry a transient failure, or skip and complete the task with a caveat. Without it, the model either retries everything (causing loops) or gives up on things it could have retried.
This mirrors patterns from event-driven system design — the same principle that makes distributed systems resilient applies here. Your tool result is a message on a queue. Make it carry enough information for the consumer to act correctly without additional context.
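One way to keep a `retryable: True` signal from turning back into a loop is to enforce a retry budget on your side of the interface. The wrapper below is a sketch under stated assumptions: `with_retry_budget` and `max_failures` are hypothetical names, and it assumes the tool runner returns JSON strings shaped like the structured errors above.

```python
import json
from collections import Counter
from typing import Callable

ToolRunner = Callable[[str, dict], str]


def with_retry_budget(tool_runner: ToolRunner, max_failures: int = 2) -> ToolRunner:
    """Wrap a tool runner so repeated failures of one tool stop being advertised as retryable."""
    failures: Counter = Counter()

    def wrapped(tool_name: str, tool_input: dict) -> str:
        raw = tool_runner(tool_name, tool_input)
        try:
            result = json.loads(raw)
        except (TypeError, ValueError):
            return raw  # not JSON: pass through untouched
        if not isinstance(result, dict) or result.get("status") != "error":
            return raw  # successes are never rewritten
        failures[tool_name] += 1
        if result.get("retryable") and failures[tool_name] >= max_failures:
            # Budget spent: tell the model plainly to stop retrying
            result["retryable"] = False
            result["message"] = (
                f"{result.get('message', '')} "
                f"Retry budget exhausted after {failures[tool_name]} failures; "
                "proceed without this data."
            ).strip()
        return json.dumps(result)

    return wrapped
```

Passing `with_retry_budget(run_tool)` wherever the loop expects a tool runner keeps the policy out of the individual tools. The counter here is per tool name for the lifetime of the wrapper, which is a deliberate simplification; a per-input counter may suit tools that are called with many different arguments.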
## Failure Mode 3: Token Budget Exhaustion When Chaining 5+ Tools
This is the one that kills you slowly. Each iteration of the tool loop appends to `messages`. By the time you've completed five tool calls, your message history contains: the original user message, five assistant responses (each with `tool_use` blocks), and five user messages (each with `tool_result` blocks). On `claude-opus-4-5` with complex tool schemas, I've seen that push 15,000–20,000 tokens before the model has written a single word of final output.
If your `max_tokens` is set to 1024 (a value many quick-start examples use, including the loop above), the model runs out of output budget right when it needs to synthesize everything it just learned. You get a truncated response with `stop_reason: "max_tokens"`, and the worst part is this doesn't throw an exception. It silently returns whatever the model managed to write before hitting the wall.
Two fixes here.
First, set `max_tokens` appropriately for multi-tool workflows. For chains of 5+ tools on `claude-opus-4-5`, I now start at 8192 and work down based on profiling. The model's context window is 200K tokens, so you have room. Use it.
Second, track your input token consumption across iterations and bail out gracefully before you hit the wall. Every call re-sends the full message history, so the running total measures cumulative billed input rather than the size of any single request:
```python
TOKEN_BUDGET = 150_000  # Cap on cumulative input tokens billed across all calls
total_input_tokens = 0

while iteration < MAX_TOOL_ITERATIONS:
    iteration += 1
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=8192,
        tools=tools,
        messages=messages
    )

    # Each call's input_tokens already includes the whole message history
    total_input_tokens += response.usage.input_tokens

    if total_input_tokens > TOKEN_BUDGET:
        # Fail loudly instead of continuing to spend
        raise RuntimeError(
            f"Token budget exceeded: {total_input_tokens} input tokens consumed. "
            "Consider splitting this workflow into smaller sub-agents."
        )

    if response.stop_reason == "max_tokens":
        # Don't silently accept a truncated response
        raise RuntimeError(
            "Model hit max_tokens limit mid-response. Increase max_tokens or reduce tool chain length. "
            f"Input tokens this call: {response.usage.input_tokens}"
        )

    # ... rest of loop
```
The `response.usage` object is your friend. Log it on every iteration. The first time you see input tokens jumping 3,000+ per iteration, you have a tool result that's too verbose and needs trimming.
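Trimming can be as simple as capping each result before it goes into `messages`. The sketch below is an assumption-laden illustration: `trim_tool_result` is a hypothetical helper, and it uses character count as a rough stand-in for tokens rather than a real tokenizer. What matters is that the model is told the result was cut, so it does not mistake a partial result for a complete one.

```python
import json


def trim_tool_result(result: str, max_chars: int = 4000) -> str:
    """Cap a tool result's size and say so explicitly when it was cut.

    Hypothetical helper; max_chars is a crude proxy for a token limit.
    """
    if len(result) <= max_chars:
        return result
    return json.dumps({
        "truncated": True,
        "original_chars": len(result),
        "preview": result[:max_chars],
        "note": "Result was truncated to fit the token budget. "
                "Request a narrower query if more detail is needed.",
    })
```

A blunt prefix cut like this can slice through the middle of a JSON payload, so for structured data it is usually better to drop fields or rows inside the tool itself and keep this as a last-resort guard.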
## Putting It Together: A Production-Ready Loop
Here's the pattern I actually use now on projects like KRAIN and Crowdia, combining all three fixes:
```python
import logging
from typing import Callable

import anthropic

logger = logging.getLogger(__name__)


def run_agent(user_message: str, tools: list, tool_runner: Callable[[str, dict], str]) -> str:
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]

    MAX_ITERATIONS = 10
    TOKEN_BUDGET = 150_000
    total_input_tokens = 0
    iteration = 0

    while iteration < MAX_ITERATIONS:
        iteration += 1

        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=8192,
            tools=tools,
            messages=messages
        )

        total_input_tokens += response.usage.input_tokens
        logger.info(f"Iteration {iteration}: stop_reason={response.stop_reason}, "
                    f"input_tokens={response.usage.input_tokens}, "
                    f"total_input_tokens={total_input_tokens}")

        if response.stop_reason == "max_tokens":
            raise RuntimeError(
                f"Hit max_tokens at iteration {iteration}. "
                f"Input tokens this call: {response.usage.input_tokens}"
            )

        if total_input_tokens > TOKEN_BUDGET:
            raise RuntimeError(
                f"Token budget exceeded: {total_input_tokens} tokens consumed across {iteration} iterations."
            )

        if response.stop_reason == "end_turn":
            text_blocks = [b for b in response.content if b.type == "text"]
            return text_blocks[0].text if text_blocks else ""

        if response.stop_reason == "tool_use":
            tool_use_blocks = [b for b in response.content if b.type == "tool_use"]
            messages.append({"role": "assistant", "content": response.content})

            # Every tool_use block in the response gets a matching tool_result
            tool_results = []
            for tool_block in tool_use_blocks:
                logger.info(f"Executing tool: {tool_block.name} with input: {tool_block.input}")
                result = tool_runner(tool_block.name, tool_block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_block.id,
                    "content": result
                })

            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop_reason would otherwise re-send the same messages unchanged
        raise RuntimeError(f"Unexpected stop_reason at iteration {iteration}: {response.stop_reason}")

    raise RuntimeError(f"Agent exceeded MAX_ITERATIONS ({MAX_ITERATIONS}) without reaching end_turn.")
```
This handles parallel tool calls (multiple `tool_use` blocks in a single response, which Claude 3+ supports), logs enough to debug production failures, and fails loudly instead of silently.
## What Dan Luu Got Right About Agentic Loops
Dan Luu's notes on agentic coding touched on something I've independently found true: the failure modes in agentic loops aren't model failures, they're interface failures. The model is doing something reasonable given what it sees. The bug is almost always in what you're showing it — malformed tool results, ambiguous success states, truncated context — not in the model's reasoning.
That reframe matters because it tells you where to look when things break. Don't start by tweaking your system prompt or switching models. Start by logging every tool result and every `response.usage` reading. The bug is almost always visible in that data within two or three failed runs.
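For a sense of what that logging could capture, here is a small sketch. `ToolCallLog` and `call_fingerprint` are hypothetical names invented for illustration; the idea is to record each tool call next to the input token count for that iteration and to fingerprint calls so that identical repeats, the signature of a loop, stand out immediately.

```python
import hashlib
import json
from collections import Counter


def call_fingerprint(tool_name: str, tool_input: dict) -> str:
    """Stable short hash of a tool call, so identical calls are easy to spot in logs."""
    payload = json.dumps({"name": tool_name, "input": tool_input}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class ToolCallLog:
    """Collects one record per tool call and counts repeated identical calls."""

    def __init__(self) -> None:
        self.records: list = []
        self.seen: Counter = Counter()

    def record(self, iteration: int, tool_name: str, tool_input: dict,
               tool_result: str, input_tokens: int) -> dict:
        fingerprint = call_fingerprint(tool_name, tool_input)
        self.seen[fingerprint] += 1
        entry = {
            "iteration": iteration,
            "tool": tool_name,
            "fingerprint": fingerprint,
            "repeat_count": self.seen[fingerprint],  # 1 means first time seen
            "result_chars": len(tool_result),
            "input_tokens": input_tokens,  # e.g. response.usage.input_tokens
            "result": tool_result,
        }
        self.records.append(entry)
        return entry
```

A `repeat_count` above 1 paired with a small `result_chars` is roughly the shape the Crowdia pagination bug would have taken in such a log: the same call, over and over, with almost nothing coming back.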
## The Unglamorous Reality
Building multi-step AI workflows with the Claude tool use API is not hard in the sense of requiring deep ML knowledge. It's hard in the sense that the failure modes are quiet, the happy path is deceptively smooth, and the production edge cases don't show up in tutorials.
The three things that will actually bite you — infinite loops, silent error swallowing, and token exhaustion — are all fixable with straightforward engineering patterns. Iteration caps, structured error returns, and token budget tracking aren't clever tricks. They're the same discipline you'd apply to any stateful loop talking to external services.
The $400 afternoon on Crowdia was annoying. It was also the best possible way to learn that the orchestration layer is your responsibility, not the model's. Claude will do exactly what you tell it to do with the information you give it. Make sure both of those things are correct.
Matthew J. Whitney is a Principal Software Engineer and fractional CTO at Bedda.tech, specializing in AI/ML integration and cloud architecture. If you're building production AI agents and hitting walls, that's what we do.