
Building Your First AI Agent with Claude

From zero to a working agent in 30 minutes. Tool calling, conversation management, and deployment.



Most tutorials on AI agents drown you in theory. This one gets you to a working agent in 30 minutes. You will build a customer support agent that can look up information, hold a conversation, and stay within budget. Along the way you will learn the three patterns that underpin every production agent: tool calling, conversation management, and cost tracking.

What is an AI agent?

An AI agent is a program that uses an LLM to decide what actions to take. Instead of hardcoding “if the user says X, do Y,” you describe available tools to the model and let it figure out which ones to call and in what order. Claude is particularly well-suited for this because its tool calling is reliable, its instruction following is precise, and it handles multi-step reasoning without excessive prompting.

We will build a simple customer support agent. A user asks a question, the agent searches a knowledge base, and returns a grounded answer. Simple enough to understand in one sitting, complex enough to teach you real patterns.

Prerequisites

  • Python 3.11+ (for modern typing syntax)
  • An Anthropic API key — grab one at console.anthropic.com
  • Basic Python knowledge — functions, dicts, loops

Step 1: Setting up

Install the SDK and create a minimal project structure.

pip install anthropic python-dotenv

Create your project:

my-agent/
  .env
  agent.py
  knowledge_base.py

Add your API key to .env:

ANTHROPIC_API_KEY=sk-ant-...

In knowledge_base.py, create a fake knowledge base we can search against. In a real project this would be a vector database or search index.

ARTICLES = [
    {
        "id": 1,
        "title": "How to reset your password",
        "content": "Go to Settings > Security > Reset Password. You will receive an email with a reset link valid for 24 hours.",
    },
    {
        "id": 2,
        "title": "Billing cycle explanation",
        "content": "We bill on the 1st of each month. Pro plans are $29/mo, Team plans are $79/mo. You can cancel anytime from Settings > Billing.",
    },
    {
        "id": 3,
        "title": "How to export your data",
        "content": "Navigate to Settings > Data > Export. Choose JSON or CSV format. Exports include all your projects and history. Large exports may take up to 1 hour.",
    },
]


def search(query: str) -> list[dict]:
    """Simple keyword search. Replace with vector search in production."""
    query_lower = query.lower()
    results = []
    for article in ARTICLES:
        if any(word in article["title"].lower() or word in article["content"].lower()
               for word in query_lower.split()):
            results.append(article)
    return results
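A quick sanity check of the matcher, re-declaring a trimmed-down article list so the snippet runs on its own:

```python
# Standalone check of the keyword matcher -- same logic as knowledge_base.search,
# with a shortened ARTICLES list so this snippet is self-contained.
ARTICLES = [
    {"id": 1, "title": "How to reset your password",
     "content": "Go to Settings > Security > Reset Password."},
    {"id": 2, "title": "Billing cycle explanation",
     "content": "We bill on the 1st of each month."},
]

def search(query: str) -> list[dict]:
    query_lower = query.lower()
    return [a for a in ARTICLES
            if any(w in a["title"].lower() or w in a["content"].lower()
                   for w in query_lower.split())]

print([a["id"] for a in search("reset my password")])  # [1]
```

Note that this is substring matching, so short query words like "my" or "to" can over-match; that is fine for a demo and one more reason production systems use real search.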

Step 2: Your first Claude call

Start with a plain conversation — no tools yet. This establishes the foundation everything else builds on.

import anthropic
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful customer support agent for Acme Corp. Be concise and friendly.",
    messages=[
        {"role": "user", "content": "How do I reset my password?"}
    ],
)

print(response.content[0].text)

A few things to notice about the response object:

  • response.content is a list of content blocks (text, tool calls, etc.)
  • response.stop_reason tells you why the model stopped: "end_turn" means it finished naturally, "tool_use" means it wants to call a tool, and "max_tokens" means it hit your token limit
  • response.usage contains input_tokens and output_tokens — you will need these later for cost tracking

Right now the agent is making up answers. It has no access to your actual knowledge base. That is what tools fix.

Step 3: Adding tool calling

Tool calling is the core mechanic that turns an LLM into an agent. You describe functions the model can invoke, and it decides when and how to use them.

First, define your tool schema. This tells Claude what the tool does and what parameters it accepts:

tools = [
    {
        "name": "search_knowledge_base",
        "description": "Search the customer support knowledge base for articles relevant to the user's question. Use this before answering any product question.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query based on the user's question",
                }
            },
            "required": ["query"],
        },
    }
]

Now send the request with tools included:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful customer support agent for Acme Corp. Always search the knowledge base before answering product questions.",
    tools=tools,
    messages=[
        {"role": "user", "content": "How do I reset my password?"}
    ],
)

When Claude decides to use a tool, response.stop_reason will be "tool_use" and the content blocks will include a tool use block. You need to execute the tool and send the result back. This creates the agentic loop — the most important pattern in agent development:

from knowledge_base import search

def handle_tool_call(tool_name: str, tool_input: dict) -> str:
    if tool_name == "search_knowledge_base":
        results = search(tool_input["query"])
        if not results:
            return "No relevant articles found."
        return "\n\n".join(
            f"**{r['title']}**\n{r['content']}" for r in results
        )
    return f"Unknown tool: {tool_name}"


def run_agent(user_message: str, messages: list | None = None) -> str:
    if messages is None:
        messages = []

    messages.append({"role": "user", "content": user_message})

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system="You are a helpful customer support agent for Acme Corp. Always search the knowledge base before answering product questions.",
            tools=tools,
            messages=messages,
        )

        # Append the assistant's full response to history
        messages.append({"role": "assistant", "content": response.content})

        # If the model is not asking for a tool, return the text.
        # (Checking != "tool_use" also covers stop reasons like "max_tokens",
        # which would otherwise leave the loop spinning with no tool results.)
        if response.stop_reason != "tool_use":
            return response.content[0].text

        # Otherwise, process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = handle_tool_call(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        # Send tool results back to Claude
        messages.append({"role": "user", "content": tool_results})

The loop is: send message, check stop reason, execute tools, send results, repeat. Every agent you will ever build follows this pattern. The only things that change are the tools and the logic around them.

Step 4: Conversation management

The messages list is your agent’s memory. Every turn gets appended, so Claude has full context of the conversation. But this creates a problem: conversations grow, tokens accumulate, and eventually you hit the context window limit.

The simplest solution is a sliding window. Keep the system prompt and the last N exchanges:

def trim_messages(messages: list, max_turns: int = 20) -> list:
    """Keep the most recent turns to stay within context limits."""
    if len(messages) <= max_turns * 2:
        return messages

    # Always keep the first user message for context
    trimmed = messages[:1]

    # Then keep the most recent turns
    trimmed.extend(messages[-(max_turns * 2):])
    return trimmed

Integrate this into the loop by calling messages = trim_messages(messages) before each API call. For production agents you will want something smarter — summarizing older turns, storing them in a database, or using a retrieval layer. But the sliding window gets you surprisingly far.

There is a subtlety here: when you trim messages, make sure you do not cut in the middle of a tool call sequence. Every tool_use block must have a matching tool_result in the next message, or the API will return an error.
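One way to guard against that, sketched here under the assumption that tool results are plain dicts of the shape built in the loop above (assistant messages hold SDK objects, so they will never match this check): after slicing, drop any leading message whose content starts with a tool_result block.

```python
def starts_with_tool_result(message: dict) -> bool:
    """True if this message's content begins with a tool_result block."""
    content = message.get("content")
    return (isinstance(content, list) and len(content) > 0
            and isinstance(content[0], dict)
            and content[0].get("type") == "tool_result")


def trim_messages_safely(messages: list, max_turns: int = 20) -> list:
    if len(messages) <= max_turns * 2:
        return messages
    tail = messages[-(max_turns * 2):]
    # If the cut landed right after a tool_use, the orphaned tool_result
    # would be the first kept message -- drop it so no pair is broken.
    while tail and starts_with_tool_result(tail[0]):
        tail = tail[1:]
    return messages[:1] + tail
```

This errs on the side of dropping one extra message rather than sending the API an unmatched tool_result.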

Step 5: Adding cost tracking

If you skip this step, you will regret it. A runaway agent loop or a popular bot can burn through hundreds of dollars before you notice. Build cost awareness in from the start.

# Pricing per million tokens (check anthropic.com/pricing for current rates)
PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
}


class CostTracker:
    def __init__(self, budget_usd: float = 1.00):
        self.budget_usd = budget_usd
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self.total_cost_usd = 0.0

    def track(self, usage, model: str) -> None:
        pricing = PRICING[model]
        input_cost = (usage.input_tokens / 1_000_000) * pricing["input"]
        output_cost = (usage.output_tokens / 1_000_000) * pricing["output"]

        self.total_input_tokens += usage.input_tokens
        self.total_output_tokens += usage.output_tokens
        self.total_cost_usd += input_cost + output_cost

    def check_budget(self) -> bool:
        """Returns True if we are still within budget."""
        return self.total_cost_usd < self.budget_usd

    def summary(self) -> str:
        return (
            f"Tokens: {self.total_input_tokens:,} in / {self.total_output_tokens:,} out | "
            f"Cost: ${self.total_cost_usd:.4f} / ${self.budget_usd:.2f} budget"
        )

Add tracker.track(response.usage, model) after every API call in your loop, and check tracker.check_budget() before making the next call. When the budget is exceeded, return a polite message to the user and end the conversation.
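To see the arithmetic concretely: a turn that consumes 10,000 input tokens and 2,000 output tokens at the Sonnet rates above costs six cents.

```python
# Worked example of the cost formula, using the rates from the PRICING table
input_cost = (10_000 / 1_000_000) * 3.00    # $0.03
output_cost = (2_000 / 1_000_000) * 15.00   # $0.03
total_cost = input_cost + output_cost
print(f"${total_cost:.4f}")  # $0.0600
```

At that rate, a $0.50 budget covers on the order of eight such turns, which is why per-conversation budgets are worth enforcing.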

Step 6: Putting it together

Here is the complete agent combining everything above:

import anthropic
from dotenv import load_dotenv
from knowledge_base import search

load_dotenv()

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"
SYSTEM_PROMPT = """You are a helpful customer support agent for Acme Corp.
Always search the knowledge base before answering product questions.
If no relevant articles are found, say so honestly -- do not make up answers."""

TOOLS = [
    {
        "name": "search_knowledge_base",
        "description": "Search the customer support knowledge base for articles relevant to the user's question.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query based on the user's question",
                }
            },
            "required": ["query"],
        },
    }
]

PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
}


class CostTracker:
    def __init__(self, budget_usd: float = 1.00):
        self.budget_usd = budget_usd
        self.total_cost_usd = 0.0
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def track(self, usage) -> None:
        pricing = PRICING[MODEL]
        self.total_input_tokens += usage.input_tokens
        self.total_output_tokens += usage.output_tokens
        self.total_cost_usd += (usage.input_tokens / 1e6) * pricing["input"]
        self.total_cost_usd += (usage.output_tokens / 1e6) * pricing["output"]

    def within_budget(self) -> bool:
        return self.total_cost_usd < self.budget_usd


def handle_tool_call(name: str, tool_input: dict) -> str:
    if name == "search_knowledge_base":
        results = search(tool_input["query"])
        if not results:
            return "No relevant articles found."
        return "\n\n".join(f"**{r['title']}**\n{r['content']}" for r in results)
    return f"Unknown tool: {name}"


def trim_messages(messages: list, max_turns: int = 20) -> list:
    if len(messages) <= max_turns * 2:
        return messages
    return messages[:1] + messages[-(max_turns * 2):]


def chat(user_input: str, messages: list, tracker: CostTracker) -> str:
    messages.append({"role": "user", "content": user_input})

    while True:
        if not tracker.within_budget():
            return "I have reached my usage limit for this conversation. Please start a new session."

        messages_to_send = trim_messages(messages)

        response = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages_to_send,
        )

        tracker.track(response.usage)
        messages.append({"role": "assistant", "content": response.content})

        # Anything other than a tool request means we are done this turn
        if response.stop_reason != "tool_use":
            return response.content[0].text

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = handle_tool_call(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        messages.append({"role": "user", "content": tool_results})


def main():
    messages = []
    tracker = CostTracker(budget_usd=0.50)

    print("Acme Corp Support Agent (type 'quit' to exit)\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit"):
            print(f"\n{tracker.total_input_tokens:,} input tokens, "
                  f"{tracker.total_output_tokens:,} output tokens, "
                  f"${tracker.total_cost_usd:.4f} total cost")
            break

        reply = chat(user_input, messages, tracker)
        print(f"\nAgent: {reply}\n")


if __name__ == "__main__":
    main()

Run it with python agent.py and try asking questions like “How do I export my data?” or “What does the Team plan cost?” The agent will search the knowledge base and give grounded answers.

What’s next

You now have a working agent with tool calling, conversation management, and cost tracking. These three patterns are the foundation — every production agent builds on them.

From here, the interesting problems are: how do you add more tools without the agent getting confused? How do you test agent behavior reliably? How do you handle errors and retries in production? How do you evaluate whether your agent is actually helping users?

The StartToAgent starter kit has all of this built out with production-ready patterns: structured tool registries, automatic retries with exponential backoff, evaluation frameworks, and deployment configs. If you want to skip the boilerplate and start building your actual product, check out the kit.
