Tool Calling Patterns That Actually Work
Registry pattern, Pydantic validation, timeout handling, and the retry strategies that matter in production.
Tool calling is where agents go from “impressive demo” to “useful software.” It is also where most agent projects fall apart. The model hallucinates parameters, tools time out silently, errors get swallowed, and the agent loops forever calling the same broken tool. This guide covers the patterns that survive contact with real users.
We will build up from raw tool definitions to a full registry with validation, timeouts, retries, and proper error formatting. Every code example is production Python — no pseudocode, no “exercise for the reader.”
Why tool calling is the hardest part to get right
When Claude makes a tool call, three things can go wrong that do not happen in normal API usage:
- The model sends bad input. Claude is good at following schemas, but it is not perfect. It might send a string where you expect an integer, omit a required field, or pass an invalid enum value. If your tool function just crashes, the agent has no way to recover.
- The tool itself fails. Network timeouts, database errors, rate limits, file not found — all the normal failure modes of software. But now they are happening inside an automated loop where nobody is watching.
- The loop never terminates. The model calls a tool, gets an error, tries again with the same input, gets the same error, and repeats until you hit your token budget. Without explicit loop control, this will happen.
Getting tool calling right means handling all three failure modes explicitly.
The registry pattern
The naive approach is a big if/elif chain:
if tool_name == "search":
    result = search(tool_input["query"])
elif tool_name == "get_user":
    result = get_user(tool_input["user_id"])
elif tool_name == "send_email":
    result = send_email(tool_input["to"], tool_input["subject"], tool_input["body"])
This works for two tools. At ten tools it is unmaintainable. The registry pattern solves this by storing tools as data:
from core.tools import ToolRegistry
registry = ToolRegistry()
@registry.register(
    name="search_knowledge_base",
    description="Search the knowledge base for articles relevant to the user's question.",
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"]
    }
)
def search_kb(query: str) -> str:
    results = search(query)
    if not results:
        return "No relevant articles found."
    return "\n\n".join(f"**{r['title']}**\n{r['content']}" for r in results)
The ToolRegistry stores each tool’s name, description, function reference, schema, and configuration. When it is time to send tools to Claude, one call gives you everything:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful support agent.",
    tools=registry.to_claude_tools(),
    messages=messages,
)
to_claude_tools() iterates over all registered tools and formats them using each tool’s to_claude_schema() method, producing the exact list-of-dicts format Claude expects. No manual schema wrangling.
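For reference, here is roughly what that call returns for the tool registered above. The list-of-dicts shape with name, description, and input_schema is the standard Anthropic tools format; the exact output depends on how your registry builds each schema.

# Roughly what registry.to_claude_tools() returns for the tool above (illustrative):
[
    {
        "name": "search_knowledge_base",
        "description": "Search the knowledge base for articles relevant to the user's question.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"],
        },
    }
]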
You can also skip the decorator and register functions directly:
registry.register_function(
    func=search_kb,
    name="search_knowledge_base",
    description="Search the knowledge base",
    input_schema={...},
    timeout_seconds=10.0,
    max_retries=2,
)
This is useful when you are registering third-party functions or building tools dynamically from configuration.
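A minimal sketch of the config-driven case, assuming a hypothetical TOOL_CONFIG list that names the module and function for each tool (the config shape is illustrative, not part of the registry):

import importlib

# Hypothetical configuration; adapt the fields to whatever your config store holds.
TOOL_CONFIG = [
    {
        "name": "search_knowledge_base",
        "module": "tools.search",
        "attr": "search_kb",
        "description": "Search the knowledge base",
        "input_schema": {"type": "object", "properties": {}, "required": []},
        "timeout_seconds": 10.0,
    },
]

for cfg in TOOL_CONFIG:
    module = importlib.import_module(cfg["module"])
    registry.register_function(
        func=getattr(module, cfg["attr"]),
        name=cfg["name"],
        description=cfg["description"],
        input_schema=cfg["input_schema"],
        timeout_seconds=cfg["timeout_seconds"],
    )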
Input validation with Pydantic models
Raw dicts break in production. The model sends {"max_results": "five"} instead of {"max_results": 5}, your function throws a TypeError deep in business logic, and the error message is useless to both the model and your logs.
Pydantic validation catches these problems at the boundary, before your tool code runs:
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="Search query based on the user's question")
    max_results: int = Field(default=5, ge=1, le=20)
    category: str | None = Field(default=None, description="Optional category filter")

@registry.register(
    name="search_knowledge_base",
    description="Search the knowledge base for relevant articles.",
    input_model=SearchInput,
)
def search_kb(query: str, max_results: int = 5, category: str | None = None) -> str:
    # By the time we get here, inputs are guaranteed valid
    results = search(query, max_results=max_results, category=category)
    return format_results(results)
When you pass input_model instead of input_schema, the registry auto-generates the JSON Schema from your Pydantic model using model_json_schema(). One source of truth for both validation and the schema sent to Claude.
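You can inspect that schema yourself with the same call. model_json_schema() is the standard Pydantic v2 API; the raw output also includes title keys, which a registry may strip before sending to the API.

# Standard Pydantic v2 call; the registry presumably does something equivalent.
schema = SearchInput.model_json_schema()
# schema["required"] == ["query"]; max_results carries its default plus its
# minimum/maximum bounds, so Claude sees the same constraints you validate with.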
Here is what happens inside the registry when a tool is called with an input_model:
def _validate_input(self, tool: Tool, tool_input: dict) -> dict | ToolResult:
    if tool.input_model is not None:
        try:
            validated = tool.input_model(**tool_input)
            return validated.model_dump()
        except PydanticValidationError as e:
            error_msgs = "; ".join(
                f"{'.'.join(str(loc) for loc in err['loc'])}: {err['msg']}"
                for err in e.errors()
            )
            return ToolResult(
                tool_use_id="",
                content=f"Validation error: {error_msgs}. Please fix the input and try again.",
                is_error=True,
            )
    return tool_input
If validation fails, the tool never executes. Instead, the registry returns a ToolResult with is_error=True and a clear message telling Claude what went wrong. Claude can then fix its input and retry — this is self-healing behavior you get for free.
The key insight: Pydantic gives you default values, type coercion, range checks, and readable error messages. Raw dicts give you KeyError: 'query' at 3am.
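A quick illustration of what that buys you. The output comments are paraphrased; the exact error wording depends on your Pydantic version.

from pydantic import ValidationError

ok = SearchInput(query="refund policy", max_results="3")
print(ok.max_results)  # 3, coerced from the string "3"; category defaults to None

try:
    SearchInput(query="refund policy", max_results=50)  # violates le=20
except ValidationError as e:
    print(e.errors()[0]["msg"])  # e.g. "Input should be less than or equal to 20"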
Timeout handling
Some tools talk to external services. APIs go down, databases hang, DNS resolves slowly. Without a timeout, your agent loop blocks indefinitely on a single tool call.
The registry uses SIGALRM on Unix systems for clean timeout enforcement:
def _execute_with_timeout(self, tool: Tool, validated_input: dict) -> Any:
    try:
        old_handler = signal.signal(signal.SIGALRM, self._timeout_handler)
        signal.alarm(int(tool.timeout_seconds))
        try:
            result = tool.function(**validated_input)
        finally:
            signal.alarm(0)
            signal.signal(signal.SIGALRM, old_handler)
    except AttributeError:
        # SIGALRM not available (Windows) — run without timeout
        result = tool.function(**validated_input)
    return result

@staticmethod
def _timeout_handler(signum, frame):
    raise TimeoutError("Tool execution timed out")
A few things to note:
- Always restore the old handler. The finally block ensures you do not corrupt signal handling for the rest of the process, even if the tool raises an exception.
- Graceful fallback on Windows. SIGALRM is Unix-only. On Windows, the tool runs without a timeout rather than crashing. For cross-platform production code, you would use concurrent.futures with a thread pool instead; a sketch appears at the end of this section.
- Set timeouts per tool. A knowledge base search might need 10 seconds. An email send might need 30. A local computation needs 2. Configure this when you register each tool:
@registry.register(
    name="search_knowledge_base",
    description="Search the knowledge base",
    input_schema={...},
    timeout_seconds=10.0,  # Fast fail for search
)
def search_kb(query: str) -> str:
    ...

@registry.register(
    name="send_notification",
    description="Send an email notification",
    input_schema={...},
    timeout_seconds=30.0,  # External API, needs more time
)
def send_notification(to: str, subject: str, body: str) -> str:
    ...
When a timeout fires, it becomes a TimeoutError that the retry logic can catch.
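For the cross-platform case mentioned above, a thread-pool version might look like the sketch below. It is not the registry's implementation: a future lets you stop waiting after the deadline, but the worker thread itself keeps running until the tool returns.

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError

_executor = ThreadPoolExecutor(max_workers=4)

def execute_with_deadline(func, kwargs: dict, timeout_seconds: float):
    # Run the tool in a worker thread and give up waiting after the deadline.
    future = _executor.submit(func, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        raise TimeoutError("Tool execution timed out") from None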
Retry strategies
Not all errors deserve a retry. A validation error will fail the same way every time. A TimeoutError or a ConnectionError might succeed on the next attempt.
The registry implements linear backoff with a configurable retry count:
start = time.monotonic()
last_error = None
for attempt in range(tool.max_retries + 1):
    try:
        result = self._execute_with_timeout(tool, validated_input)
        elapsed = (time.monotonic() - start) * 1000
        return ToolResult(
            tool_use_id=tool_use_id,
            content=result,
            execution_time_ms=elapsed,
        )
    except Exception as e:
        last_error = e
        if attempt < tool.max_retries:
            logger.warning(
                f"Tool '{tool_name}' attempt {attempt + 1} failed: {e}. Retrying..."
            )
            time.sleep(0.5 * (attempt + 1))  # Linear backoff

# All retries exhausted
elapsed = (time.monotonic() - start) * 1000
error_msg = f"Tool '{tool_name}' failed after {tool.max_retries + 1} attempts: {last_error}"
return ToolResult(
    tool_use_id=tool_use_id,
    content=f"Error: {error_msg}. Please try a different approach.",
    is_error=True,
    execution_time_ms=elapsed,
)
The backoff is 0.5 * (attempt + 1) seconds — 0.5s, 1.0s, 1.5s. Simple and predictable. For production systems hitting rate-limited APIs, you might want exponential backoff instead:
import random
delay = min(2 ** attempt + random.uniform(0, 1), 30)
time.sleep(delay)
The jitter (random component) prevents thundering herd problems when multiple agent instances retry the same service simultaneously.
Guidelines for retry configuration:
- Network calls (API requests, database queries): max_retries=2 or 3
- Local computation (parsing, formatting): max_retries=0; if it fails, it will fail again
- External writes (sending emails, creating records): max_retries=1 with idempotency checks (see the sketch below)
- Timeouts: always worth one retry; transient network issues are common
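The idempotency point deserves a concrete shape. A sketch, assuming a hypothetical email_client whose send call accepts an idempotency key (many providers do; check yours):

import uuid

def send_notification(to: str, subject: str, body: str) -> str:
    # Derive a stable key from the message itself, so a retry of the same
    # logical send reuses the key and the provider can deduplicate it.
    idempotency_key = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{to}|{subject}|{body}"))
    email_client.send(  # hypothetical client; substitute your provider's SDK
        to=to,
        subject=subject,
        body=body,
        idempotency_key=idempotency_key,
    )
    return f"Notification sent to {to}."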
Formatting results for Claude
When Claude calls a tool, it includes an id field in the tool_use content block. Your result must reference this ID, or the API will reject it. The ToolResult dataclass handles this:
@dataclass
class ToolResult:
    tool_use_id: str
    content: str
    is_error: bool = False
    execution_time_ms: float = 0.0
The registry formats these into the exact structure Claude expects:
def format_results_for_claude(self, results: list[ToolResult]) -> list[dict]:
    return [
        {
            "type": "tool_result",
            "tool_use_id": r.tool_use_id,
            "content": r.content,
            **({"is_error": True} if r.is_error else {}),
        }
        for r in results
    ]
The is_error flag matters. When set to True, Claude knows the tool failed and will either try a different approach or explain the failure to the user. Without it, Claude might try to interpret an error message as a successful result.
Always return strings from your tool functions. If your tool naturally returns a dict or list, the registry serializes it with json.dumps. But explicit string formatting gives Claude better context:
# Bad — Claude gets raw JSON
def search_kb(query: str) -> dict:
    return {"results": [...], "count": 3}

# Good — Claude gets readable text
def search_kb(query: str) -> str:
    results = search(query)
    if not results:
        return "No articles found matching that query."
    formatted = "\n\n".join(
        f"**{r['title']}** (ID: {r['id']})\n{r['content']}"
        for r in results
    )
    return f"Found {len(results)} article(s):\n\n{formatted}"
The agent loop pattern
Every agent follows the same core loop. Understanding it deeply is more valuable than any framework:
def run_agent(user_message: str, messages: list, registry: ToolRegistry) -> str:
    messages.append({"role": "user", "content": user_message})
    max_iterations = 10  # Safety valve

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system="You are a helpful support agent.",
            tools=registry.to_claude_tools(),
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Model is done — extract final text
            text_blocks = [b.text for b in response.content if b.type == "text"]
            return "\n".join(text_blocks)

        # Extract tool calls from response
        tool_calls = [
            {"name": b.name, "input": b.input, "id": b.id}
            for b in response.content
            if b.type == "tool_use"
        ]
        if not tool_calls:
            # No tool_use blocks (e.g. stop_reason == "max_tokens"): stop cleanly
            break

        # Execute all tool calls
        results = registry.execute_many(tool_calls)

        # Format and send results back
        tool_results = registry.format_results_for_claude(results)
        messages.append({"role": "user", "content": tool_results})

    return "I've reached my maximum number of steps. Please try rephrasing your question."
The critical elements:
- max_iterations as a safety valve. Without this, a confused model can loop forever. Ten iterations is generous for most agents — if it has not answered after ten tool calls, something is wrong.
- Append the full response.content to messages. Not just the text — the tool use blocks too. Claude needs to see its own tool calls in the conversation history to maintain coherence.
- Send tool results as a “user” message. This is Claude’s API convention. Tool results are wrapped in {"role": "user", "content": [tool_results]}.
- Handle multiple tool calls per turn. Claude can request several tools at once. execute_many handles them all and returns a list of results.
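A usage sketch for the loop above, reusing the same messages list across turns so the conversation accumulates history:

messages: list[dict] = []
answer = run_agent("How do I reset my password?", messages, registry)
print(answer)

# messages now holds the user turn, Claude's tool calls, the tool results,
# and the final answer, so a follow-up call continues the same conversation:
followup = run_agent("And if I no longer have access to that email?", messages, registry)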
Common mistakes
Not handling tool errors. If your tool throws an unhandled exception, the agent loop crashes and the user gets nothing. Every tool should either return a string or let the registry catch and format the error.
Missing timeouts. A tool that calls an external API without a timeout will block your agent indefinitely. Set timeout_seconds on every tool that does I/O.
Infinite loops. Always use max_iterations in your agent loop. Also watch for “retry loops” where the model keeps calling the same tool with the same broken input. If you see the same tool called three times in a row with identical parameters, break out and return an error.
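One way to catch that, as a sketch you could drop into the agent loop above (the helper and its three-call threshold are illustrative):

import json
from collections import deque

_recent_calls: deque[str] = deque(maxlen=3)

def looks_stuck(tool_calls: list[dict]) -> bool:
    # Fingerprint this batch of calls; three identical batches in a row means
    # the model is not adapting to the error it keeps getting back.
    signature = json.dumps(
        [[c["name"], c["input"]] for c in tool_calls], sort_keys=True
    )
    _recent_calls.append(signature)
    return len(_recent_calls) == 3 and len(set(_recent_calls)) == 1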
Swallowing the is_error flag. If you format tool results without is_error: True on failures, Claude has no signal that something went wrong. It will cheerfully treat an error traceback as search results.
Not logging tool execution. When something goes wrong at 2am, you need to know which tool was called, with what input, how long it took, and what it returned. Log every execution:
logger.info(
    f"Tool '{tool_name}' | input={tool_input} | "
    f"result_length={len(result.content)} | "
    f"is_error={result.is_error} | "
    f"time={result.execution_time_ms:.0f}ms"
)
Too many tools. Claude handles 5-10 tools well. At 20+ tools, it starts making worse decisions about which tool to call. If you have many tools, consider grouping them or using a two-stage approach where Claude first picks a category, then picks a tool.
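A sketch of the two-stage idea, assuming one registry per category (the category names and per-category registries here are hypothetical):

# Stage 1: expose only a lightweight "pick_category" tool. Stage 2: once the
# model has picked, expose that category's own tools for the rest of the turn.
registries = {
    "billing": billing_registry,            # hypothetical per-category registries
    "knowledge": knowledge_registry,
    "notifications": notifications_registry,
}

def tools_for(category: str | None) -> list[dict]:
    if category is None:
        return picker_registry.to_claude_tools()  # contains only pick_category
    return registries[category].to_claude_tools()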
What’s next
Tool calling is the mechanism. The real challenge is designing the right tools for your use case and testing that Claude uses them correctly. The agent evaluation guide covers how to write test cases that verify tool usage, and the cost tracking guide shows how to keep your agent loops from burning through your budget.
The StartToAgent starter kit has all of these patterns built out with the full ToolRegistry, Pydantic validation, timeout handling, and retry logic ready to use. If you want to skip building the infrastructure and focus on your agent’s actual tools, check out the kit.