Building Your First AI Agent with Claude
From zero to a working agent in 30 minutes. Tool calling, conversation management, and deployment.
Most tutorials on AI agents drown you in theory. This one gets you to a working agent in 30 minutes. You will build a customer support agent that can look up information, hold a conversation, and stay within budget. Along the way you will learn the three patterns that underpin every production agent: tool calling, conversation management, and cost tracking.
What is an AI agent?
An AI agent is a program that uses an LLM to decide what actions to take. Instead of hardcoding “if the user says X, do Y,” you describe available tools to the model and let it figure out which ones to call and in what order. Claude is particularly well-suited for this because its tool calling is reliable, its instruction following is precise, and it handles multi-step reasoning without excessive prompting.
We will build a simple customer support agent. A user asks a question, the agent searches a knowledge base, and returns a grounded answer. Simple enough to understand in one sitting, complex enough to teach you real patterns.
Prerequisites
- Python 3.11+ (for modern typing syntax)
- An Anthropic API key — grab one at console.anthropic.com
- Basic Python knowledge — functions, dicts, loops
Step 1: Setting up
Install the SDK and create a minimal project structure.
```bash
pip install anthropic python-dotenv
```
Create your project:
```
my-agent/
  .env
  agent.py
  knowledge_base.py
```
Add your API key to .env:
```
ANTHROPIC_API_KEY=sk-ant-...
```
In knowledge_base.py, create a fake knowledge base we can search against. In a real project this would be a vector database or search index.
```python
ARTICLES = [
    {
        "id": 1,
        "title": "How to reset your password",
        "content": "Go to Settings > Security > Reset Password. You will receive an email with a reset link valid for 24 hours.",
    },
    {
        "id": 2,
        "title": "Billing cycle explanation",
        "content": "We bill on the 1st of each month. Pro plans are $29/mo, Team plans are $79/mo. You can cancel anytime from Settings > Billing.",
    },
    {
        "id": 3,
        "title": "How to export your data",
        "content": "Navigate to Settings > Data > Export. Choose JSON or CSV format. Exports include all your projects and history. Large exports may take up to 1 hour.",
    },
]


def search(query: str) -> list[dict]:
    """Simple keyword search. Replace with vector search in production."""
    query_lower = query.lower()
    results = []
    for article in ARTICLES:
        if any(word in article["title"].lower() or word in article["content"].lower()
               for word in query_lower.split()):
            results.append(article)
    return results
```
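A quick sanity check of the matcher helps before wiring it into the agent. This snippet repeats a trimmed-down copy of the data and function above so it runs on its own:

```python
# Trimmed-down copy of the knowledge base and search() from above,
# repeated here so this snippet runs on its own.
ARTICLES = [
    {"id": 1, "title": "How to reset your password",
     "content": "Go to Settings > Security > Reset Password."},
    {"id": 2, "title": "Billing cycle explanation",
     "content": "We bill on the 1st of each month."},
]

def search(query: str) -> list[dict]:
    query_lower = query.lower()
    return [
        a for a in ARTICLES
        if any(word in a["title"].lower() or word in a["content"].lower()
               for word in query_lower.split())
    ]

print([a["id"] for a in search("reset password")])    # [1]
print([a["id"] for a in search("quarterly invoice")])  # []
```

Note that this is substring matching, not word matching, so short query words over-match: a query containing "a" or "to" will hit nearly every article. That is fine for a tutorial, and another reason to swap in real search in production.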
Step 2: Your first Claude call
Start with a plain conversation — no tools yet. This establishes the foundation everything else builds on.
```python
import anthropic
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful customer support agent for Acme Corp. Be concise and friendly.",
    messages=[
        {"role": "user", "content": "How do I reset my password?"}
    ],
)
print(response.content[0].text)
```
A few things to notice about the response object:
- `response.content` is a list of content blocks (text, tool calls, etc.)
- `response.stop_reason` tells you why the model stopped: `"end_turn"` means it finished naturally, `"tool_use"` means it wants to call a tool
- `response.usage` contains `input_tokens` and `output_tokens`, which you will need later for cost tracking
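To make the branching on `stop_reason` concrete, here is a minimal sketch. The `FakeResponse` and `FakeBlock` classes below are stand-ins that mimic only the fields we care about; they are not part of the anthropic SDK:

```python
from dataclasses import dataclass, field

@dataclass
class FakeBlock:
    # Stand-in for one content block on the response
    type: str
    text: str = ""

@dataclass
class FakeResponse:
    # Stand-in mimicking the two response fields we branch on
    stop_reason: str
    content: list = field(default_factory=list)

def dispatch(response) -> str:
    if response.stop_reason == "end_turn":
        return "done: " + response.content[0].text
    if response.stop_reason == "tool_use":
        return "execute tools, then send results back"
    return f"unhandled stop reason: {response.stop_reason}"

print(dispatch(FakeResponse("end_turn", [FakeBlock("text", "Hi!")])))  # done: Hi!
```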
Right now the agent is making up answers. It has no access to your actual knowledge base. That is what tools fix.
Step 3: Adding tool calling
Tool calling is the core mechanic that turns an LLM into an agent. You describe functions the model can invoke, and it decides when and how to use them.
First, define your tool schema. This tells Claude what the tool does and what parameters it accepts:
```python
tools = [
    {
        "name": "search_knowledge_base",
        "description": "Search the customer support knowledge base for articles relevant to the user's question. Use this before answering any product question.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query based on the user's question",
                }
            },
            "required": ["query"],
        },
    }
]
```
Now send the request with tools included:
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful customer support agent for Acme Corp. Always search the knowledge base before answering product questions.",
    tools=tools,
    messages=[
        {"role": "user", "content": "How do I reset my password?"}
    ],
)
```
When Claude decides to use a tool, `response.stop_reason` will be `"tool_use"` and the content blocks will include a `tool_use` block. You need to execute the tool and send the result back. This creates the agentic loop, the most important pattern in agent development:
```python
from knowledge_base import search


def handle_tool_call(tool_name: str, tool_input: dict) -> str:
    if tool_name == "search_knowledge_base":
        results = search(tool_input["query"])
        if not results:
            return "No relevant articles found."
        return "\n\n".join(
            f"**{r['title']}**\n{r['content']}" for r in results
        )
    return f"Unknown tool: {tool_name}"


def run_agent(user_message: str, messages: list | None = None) -> str:
    if messages is None:
        messages = []
    messages.append({"role": "user", "content": user_message})

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system="You are a helpful customer support agent for Acme Corp. Always search the knowledge base before answering product questions.",
            tools=tools,
            messages=messages,
        )
        # Append the assistant's full response to history
        messages.append({"role": "assistant", "content": response.content})

        # Anything other than a tool call ("end_turn", "max_tokens", ...)
        # ends the loop; checking only for "end_turn" here would loop
        # forever on other stop reasons
        if response.stop_reason != "tool_use":
            return response.content[0].text

        # Otherwise, process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = handle_tool_call(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        # Send tool results back to Claude
        messages.append({"role": "user", "content": tool_results})
```
The loop is: send message, check stop reason, execute tools, send results, repeat. Every agent you will ever build follows this pattern. The only things that change are the tools and the logic around them.
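To see the shape of the loop without making any API calls, here is the same pattern driven by a scripted stand-in for the client (the fake below is not the anthropic SDK): it "requests" one tool call, receives the result, then finishes.

```python
from types import SimpleNamespace as NS

# Scripted stand-in for the API: first turn asks for a tool,
# second turn ends the conversation. Not the real SDK.
responses = iter([
    NS(stop_reason="tool_use",
       content=[NS(type="tool_use", id="t1", name="search_knowledge_base",
                   input={"query": "reset password"})]),
    NS(stop_reason="end_turn",
       content=[NS(type="text", text="Use Settings > Security > Reset Password.")]),
])

def fake_create(**kwargs):
    return next(responses)

def handle_tool_call(name, tool_input):
    return f"(stub result for {name}: {tool_input['query']})"

messages = [{"role": "user", "content": "How do I reset my password?"}]
while True:
    response = fake_create(messages=messages)
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason == "end_turn":
        final = response.content[0].text
        break
    tool_results = [
        {"type": "tool_result", "tool_use_id": b.id,
         "content": handle_tool_call(b.name, b.input)}
        for b in response.content if b.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})

print(final)          # Use Settings > Security > Reset Password.
print(len(messages))  # 4: user, assistant (tool_use), user (tool_result), assistant
```

Swapping `fake_create` for `client.messages.create` turns this skeleton back into the real agent; the message choreography is identical.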
Step 4: Conversation management
The messages list is your agent’s memory. Every turn gets appended, so Claude has full context of the conversation. But this creates a problem: conversations grow, tokens accumulate, and eventually you hit the context window limit.
The simplest solution is a sliding window. Keep the system prompt and the last N exchanges:
```python
def trim_messages(messages: list, max_turns: int = 20) -> list:
    """Keep the most recent turns to stay within context limits."""
    if len(messages) <= max_turns * 2:
        return messages
    # Always keep the first user message for context
    trimmed = messages[:1]
    # Then keep the most recent turns
    trimmed.extend(messages[-(max_turns * 2):])
    return trimmed
```
Integrate this into the loop by calling `messages = trim_messages(messages)` before each API call. For production agents you will want something smarter: summarizing older turns, storing them in a database, or using a retrieval layer. But the sliding window gets you surprisingly far.
There is a subtlety here: when you trim messages, make sure you do not cut in the middle of a tool call sequence. Every `tool_use` block must have a matching `tool_result` in the next message, or the API will return an error.
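One way to guard against that, sketched here under the assumption that tool results in your history are plain dicts (as in the code above): after taking the window, drop leading messages until it opens on an ordinary user turn, so no `tool_result` survives without its `tool_use`.

```python
def is_tool_result_message(message: dict) -> bool:
    """True if this user message carries tool results rather than user text."""
    content = message.get("content")
    return (isinstance(content, list)
            and any(isinstance(b, dict) and b.get("type") == "tool_result"
                    for b in content))

def trim_messages_safely(messages: list, max_turns: int = 20) -> list:
    if len(messages) <= max_turns * 2:
        return messages
    tail = messages[-(max_turns * 2):]
    # Drop leading messages until the window opens on a plain user turn,
    # so no tool_result is left without its matching tool_use.
    while tail and (tail[0]["role"] != "user" or is_tool_result_message(tail[0])):
        tail = tail[1:]
    # Keep the first user message for context, as before
    return messages[:1] + tail
```

The trade-off is that the window can end up slightly smaller than `max_turns` exchanges, which is a fine price for never sending the API an orphaned tool result.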
Step 5: Adding cost tracking
If you skip this step, you will regret it. A runaway agent loop or a popular bot can burn through hundreds of dollars before you notice. Build cost awareness in from the start.
```python
# Pricing per million tokens (check anthropic.com/pricing for current rates)
PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
}


class CostTracker:
    def __init__(self, budget_usd: float = 1.00):
        self.budget_usd = budget_usd
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self.total_cost_usd = 0.0

    def track(self, usage, model: str) -> None:
        pricing = PRICING[model]
        input_cost = (usage.input_tokens / 1_000_000) * pricing["input"]
        output_cost = (usage.output_tokens / 1_000_000) * pricing["output"]
        self.total_input_tokens += usage.input_tokens
        self.total_output_tokens += usage.output_tokens
        self.total_cost_usd += input_cost + output_cost

    def check_budget(self) -> bool:
        """Returns True if we are still within budget."""
        return self.total_cost_usd < self.budget_usd

    def summary(self) -> str:
        return (
            f"Tokens: {self.total_input_tokens:,} in / {self.total_output_tokens:,} out | "
            f"Cost: ${self.total_cost_usd:.4f} / ${self.budget_usd:.2f} budget"
        )
```
Add `tracker.track(response.usage, model)` after every API call in your loop, and check `tracker.check_budget()` before making the next call. When the budget is exceeded, return a polite message to the user and end the conversation.
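A quick check of the arithmetic with a stand-in usage object (the `SimpleNamespace` below mimics `response.usage`; the real object exposes the same two fields). The tracker is repeated in compact form so the snippet runs on its own:

```python
from types import SimpleNamespace

# Compact copy of the tracker above so this snippet runs on its own.
PRICING = {"claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00}}

class CostTracker:
    def __init__(self, budget_usd: float = 1.00):
        self.budget_usd = budget_usd
        self.total_cost_usd = 0.0

    def track(self, usage, model: str) -> None:
        pricing = PRICING[model]
        self.total_cost_usd += (usage.input_tokens / 1_000_000) * pricing["input"]
        self.total_cost_usd += (usage.output_tokens / 1_000_000) * pricing["output"]

    def check_budget(self) -> bool:
        return self.total_cost_usd < self.budget_usd

tracker = CostTracker(budget_usd=0.50)
# Stand-in for response.usage: 200k input tokens, 20k output tokens
usage = SimpleNamespace(input_tokens=200_000, output_tokens=20_000)
tracker.track(usage, "claude-sonnet-4-20250514")
print(f"{tracker.total_cost_usd:.2f}")  # 0.90 -- already over the $0.50 budget
```

Note how quickly a single heavy call blows through a small budget: input tokens dominate once the conversation history grows, which is exactly why trimming and tracking belong together.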
Step 6: Putting it together
Here is the complete agent combining everything above:
```python
import anthropic
from dotenv import load_dotenv

from knowledge_base import search

load_dotenv()
client = anthropic.Anthropic()

MODEL = "claude-sonnet-4-20250514"

SYSTEM_PROMPT = """You are a helpful customer support agent for Acme Corp.
Always search the knowledge base before answering product questions.
If no relevant articles are found, say so honestly -- do not make up answers."""

TOOLS = [
    {
        "name": "search_knowledge_base",
        "description": "Search the customer support knowledge base for articles relevant to the user's question.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query based on the user's question",
                }
            },
            "required": ["query"],
        },
    }
]

PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
}


class CostTracker:
    def __init__(self, budget_usd: float = 1.00):
        self.budget_usd = budget_usd
        self.total_cost_usd = 0.0
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def track(self, usage) -> None:
        pricing = PRICING[MODEL]
        self.total_input_tokens += usage.input_tokens
        self.total_output_tokens += usage.output_tokens
        self.total_cost_usd += (usage.input_tokens / 1e6) * pricing["input"]
        self.total_cost_usd += (usage.output_tokens / 1e6) * pricing["output"]

    def within_budget(self) -> bool:
        return self.total_cost_usd < self.budget_usd


def handle_tool_call(name: str, tool_input: dict) -> str:
    if name == "search_knowledge_base":
        results = search(tool_input["query"])
        if not results:
            return "No relevant articles found."
        return "\n\n".join(f"**{r['title']}**\n{r['content']}" for r in results)
    return f"Unknown tool: {name}"


def trim_messages(messages: list, max_turns: int = 20) -> list:
    if len(messages) <= max_turns * 2:
        return messages
    return messages[:1] + messages[-(max_turns * 2):]


def chat(user_input: str, messages: list, tracker: CostTracker) -> str:
    messages.append({"role": "user", "content": user_input})
    while True:
        if not tracker.within_budget():
            return "I have reached my usage limit for this conversation. Please start a new session."
        messages_to_send = trim_messages(messages)
        response = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages_to_send,
        )
        tracker.track(response.usage)
        messages.append({"role": "assistant", "content": response.content})
        # Anything other than a tool call ("end_turn", "max_tokens", ...)
        # ends the loop
        if response.stop_reason != "tool_use":
            return response.content[0].text
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = handle_tool_call(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})


def main():
    messages = []
    tracker = CostTracker(budget_usd=0.50)
    print("Acme Corp Support Agent (type 'quit' to exit)\n")
    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit"):
            print(f"\n{tracker.total_input_tokens:,} input tokens, "
                  f"{tracker.total_output_tokens:,} output tokens, "
                  f"${tracker.total_cost_usd:.4f} total cost")
            break
        reply = chat(user_input, messages, tracker)
        print(f"\nAgent: {reply}\n")


if __name__ == "__main__":
    main()
```
Run it with `python agent.py` and try asking questions like “How do I export my data?” or “What does the Team plan cost?” The agent will search the knowledge base and give grounded answers.
What’s next
You now have a working agent with tool calling, conversation management, and cost tracking. These three patterns are the foundation — every production agent builds on them.
From here, the interesting problems are: how do you add more tools without the agent getting confused? How do you test agent behavior reliably? How do you handle errors and retries in production? How do you evaluate whether your agent is actually helping users?
The StartToAgent starter kit has all of this built out with production-ready patterns: structured tool registries, automatic retries with exponential backoff, evaluation frameworks, and deployment configs. If you want to skip the boilerplate and start building your actual product, check out the kit.