Cost Tracking: The Feature You'll Wish You Built First
Per-conversation budgets, token logging, and the alerting system that prevents $500 surprise bills.
Here is a story that plays out every month. A developer builds an agent, ships it to beta users, and goes to bed feeling good. At 3am, a user finds an edge case that makes the agent loop — calling tools, getting errors, retrying, calling more tools. By morning, the Anthropic invoice has a line item for $500 that nobody budgeted for.
This is not hypothetical. It is the most common production incident for agent-based applications. The fix is not complicated, but it has to be built in from the start. Bolting cost tracking onto an existing agent is like adding seatbelts after the car is on the highway.
Understanding Claude’s pricing
Before you can track costs, you need to understand the pricing model. Claude charges per token, with different rates for input and output:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Opus 4 | $15.00 | $75.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
Output tokens are 5x more expensive than input tokens. This matters because agent loops are output-heavy: every tool call decision, every intermediate reasoning step, every reformulation of a search query costs output tokens. A single agent conversation with 5 tool calls might use 2,000 input tokens but 8,000 output tokens.
The other thing to internalize: tokens accumulate across turns. Each time you send the conversation history back to Claude, you are paying for all previous messages again as input. A 20-turn conversation with a 4,000-token system prompt means you are paying for that system prompt 20 times. Conversation management and cost tracking are deeply linked.
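To make that compounding concrete, here is a back-of-the-envelope calculation at the Sonnet input rate, using the numbers from the example above:

```python
# Cost of re-sending a 4,000-token system prompt as input on every turn
# of a 20-turn conversation, at Sonnet input pricing ($3 per 1M tokens)
INPUT_RATE_PER_TOKEN = 3.0 / 1_000_000

system_prompt_tokens = 4_000
turns = 20

# The prompt rides along as input on every turn, so its cost scales with turns
prompt_cost = system_prompt_tokens * turns * INPUT_RATE_PER_TOKEN
print(f"System prompt alone: ${prompt_cost:.2f} across {turns} turns")  # $0.24
```

That is just the system prompt; the accumulated user and assistant turns compound the same way.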
Building a cost tracker
The core abstraction is an APICallRecord — a log entry for every API call your agent makes:
from dataclasses import dataclass, field
from typing import Any
import time
@dataclass
class APICallRecord:
"""Record of a single API call."""
timestamp: float
model: str
input_tokens: int
output_tokens: int
cost_usd: float
conversation_id: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
Every field here earns its place. timestamp lets you build time-series dashboards. model matters because pricing varies by model. conversation_id enables per-conversation budgets. metadata is your escape hatch for anything else — the tool that was called, the user ID, the deployment environment.
The CostTracker class wraps a list of these records and provides the math:
from core.errors import BudgetExceededError
# Pricing per million tokens — update when Anthropic changes pricing
MODEL_PRICING = {
"claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
"claude-opus-4-20250514": {"input": 15.0, "output": 75.0},
    "claude-haiku-4-5-20251001": {"input": 1.0, "output": 5.0},
}
DEFAULT_PRICING = {"input": 3.0, "output": 15.0}
class CostTracker:
def __init__(
self,
budget_per_conversation: float = 0.0,
budget_per_session: float = 0.0,
):
self.budget_per_conversation = budget_per_conversation
self.budget_per_session = budget_per_session
self._records: list[APICallRecord] = []
self._conversation_costs: dict[str, float] = {}
A budget of 0.0 means no limit. In development that is fine. In production, always set both budgets. A reasonable starting point for a support agent: budget_per_conversation=0.50 and budget_per_session=25.0.
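To sanity-check a budget figure, translate dollars back into tokens. Illustrative arithmetic at Sonnet rates; the token mix below is an assumption, not a measurement:

```python
# What a $0.50 per-conversation budget buys at Sonnet rates
budget = 0.50
input_rate, output_rate = 3.0, 15.0  # dollars per 1M tokens

# Hypothetical mix for one long, output-heavy conversation
input_tokens, output_tokens = 50_000, 23_000
total = (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate
print(f"${total:.3f} of the ${budget:.2f} budget")  # $0.495 of the $0.50 budget
```

In other words, $0.50 comfortably covers dozens of turns of normal use while still capping a runaway loop.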
Recording API calls
After every client.messages.create() call, record the usage:
tracker = CostTracker(
budget_per_conversation=0.50,
budget_per_session=25.0,
)
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
tools=tools,
messages=messages,
)
# Record immediately after the call
tracker.record(
model="claude-sonnet-4-20250514",
input_tokens=response.usage.input_tokens,
output_tokens=response.usage.output_tokens,
conversation_id="conv-123",
)
Inside record(), the cost calculation is straightforward:
@staticmethod
def _calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
pricing = MODEL_PRICING.get(model, DEFAULT_PRICING)
input_cost = (input_tokens / 1_000_000) * pricing["input"]
output_cost = (output_tokens / 1_000_000) * pricing["output"]
return input_cost + output_cost
Notice the fallback to DEFAULT_PRICING. When Anthropic releases a new model and you upgrade before updating your pricing table, the tracker does not crash. It uses Sonnet-level pricing as a conservative estimate. This is a small design decision that prevents outages.
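The `record()` method itself is not shown above; one plausible implementation is sketched below, with minimal stand-ins for `APICallRecord`, `MODEL_PRICING`, and `_check_budgets` (all shown elsewhere in this guide) so it runs in isolation:

```python
import time
from dataclasses import dataclass, field
from typing import Any

# Stand-ins so this sketch is self-contained; use the full versions
# from earlier in the guide in real code.
MODEL_PRICING = {"claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0}}
DEFAULT_PRICING = {"input": 3.0, "output": 15.0}

@dataclass
class APICallRecord:
    timestamp: float
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    conversation_id: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)

class CostTracker:
    def __init__(self) -> None:
        self._records: list[APICallRecord] = []
        self._conversation_costs: dict[str, float] = {}

    @staticmethod
    def _calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        pricing = MODEL_PRICING.get(model, DEFAULT_PRICING)
        return (input_tokens / 1_000_000) * pricing["input"] + (
            output_tokens / 1_000_000
        ) * pricing["output"]

    def _check_budgets(self, conversation_id: str = "") -> None:
        pass  # budget enforcement as shown in the next section

    def record(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int,
        conversation_id: str = "",
        **metadata: Any,
    ) -> APICallRecord:
        """Compute the cost, store the record, then enforce budgets."""
        cost = self._calculate_cost(model, input_tokens, output_tokens)
        rec = APICallRecord(
            timestamp=time.time(),
            model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            cost_usd=cost,
            conversation_id=conversation_id,
            metadata=metadata,
        )
        self._records.append(rec)
        # Keep a running per-conversation total for budget checks
        if conversation_id:
            self._conversation_costs[conversation_id] = (
                self._conversation_costs.get(conversation_id, 0.0) + cost
            )
        # Check after recording so the overspending call is still logged
        self._check_budgets(conversation_id)
        return rec
```

Checking budgets after appending the record is a deliberate choice: the call that blew the budget still shows up in your exports.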
Per-conversation and per-session budgets
The tracker checks budgets after every recorded call:
def _check_budgets(self, conversation_id: str = "") -> None:
if self.budget_per_session > 0 and self.session_cost > self.budget_per_session:
raise BudgetExceededError(
budget=self.budget_per_session,
spent=self.session_cost,
scope="session",
)
if (
self.budget_per_conversation > 0
and conversation_id
and self.conversation_cost(conversation_id) > self.budget_per_conversation
):
raise BudgetExceededError(
budget=self.budget_per_conversation,
spent=self.conversation_cost(conversation_id),
scope="conversation",
conversation_id=conversation_id,
)
BudgetExceededError is a custom exception. Catch it in your agent loop and return a graceful message:
from core.errors import BudgetExceededError
def agent_loop(user_input: str, messages: list, tracker: CostTracker) -> str:
messages.append({"role": "user", "content": user_input})
while True:
try:
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
tools=tools,
messages=messages,
)
tracker.record(
model="claude-sonnet-4-20250514",
input_tokens=response.usage.input_tokens,
output_tokens=response.usage.output_tokens,
conversation_id="conv-123",
)
except BudgetExceededError as e:
return (
f"I've reached my usage limit for this {e.scope}. "
"Please start a new conversation or contact support."
)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "end_turn":
            # Join text blocks explicitly; content can include non-text blocks
            return "".join(
                block.text for block in response.content if block.type == "text"
            )
        # ... handle tool calls ...
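`core.errors` here is part of the starter kit. If you are building from scratch, a minimal `BudgetExceededError` supporting the handler above might look like this (field names match the `_check_budgets` call sites):

```python
class BudgetExceededError(Exception):
    """Raised when a recorded call pushes spend past a configured budget."""

    def __init__(
        self,
        budget: float,
        spent: float,
        scope: str,
        conversation_id: str = "",
    ) -> None:
        self.budget = budget
        self.spent = spent
        self.scope = scope  # "conversation" or "session"
        self.conversation_id = conversation_id
        super().__init__(
            f"{scope} budget ${budget:.2f} exceeded: spent ${spent:.4f}"
        )
```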
Two levels of budget give you different protections. Per-conversation budgets prevent individual runaway loops. Per-session budgets protect against a flood of normal-cost conversations that add up — a bot going viral on social media, for instance.
The pricing table approach
Hard-coding prices is fragile. The MODEL_PRICING dict centralizes pricing and makes updates trivial:
MODEL_PRICING: dict[str, dict[str, float]] = {
"claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},
"claude-opus-4-20250514": {"input": 15.0, "output": 75.0},
    "claude-haiku-4-5-20251001": {"input": 1.0, "output": 5.0},
}
DEFAULT_PRICING = {"input": 3.0, "output": 15.0}
When Anthropic ships a new model, add one line. When prices change, update the numbers. The DEFAULT_PRICING fallback means your agent keeps running even if you forget. In a team setting, consider loading this from a config file or environment variable so deployments do not require code changes.
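A sketch of that config-file approach. The `MODEL_PRICING_FILE` variable and the file shape are illustrative, not part of any SDK:

```python
import json
import os
from pathlib import Path

def load_pricing(default: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Overlay the built-in pricing table with a JSON file, if one is configured.

    Expects a file like: {"claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0}}
    named by the (illustrative) MODEL_PRICING_FILE environment variable.
    """
    path = os.environ.get("MODEL_PRICING_FILE")
    if not path or not Path(path).exists():
        return default
    loaded = json.loads(Path(path).read_text())
    # Merge so models missing from the file keep their built-in prices
    return {**default, **loaded}
```

Because the file overlays rather than replaces the dict, a partial config file cannot accidentally drop pricing for a model you still use.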
A practical pattern: check anthropic.com/pricing monthly and update the dict. If you manage many agents, write a script that scrapes the pricing page and updates your config. When in doubt, round prices up: a table that is slightly too high fails safe, while one that is too low lets spend slip past your budgets.
Querying costs at runtime
The tracker provides several ways to inspect costs while your agent is running:
# Total cost for the entire session
print(f"Session cost: ${tracker.session_cost:.4f}")
# Token breakdown
tokens = tracker.session_tokens
print(f"Input: {tokens['input']:,} | Output: {tokens['output']:,} | Total: {tokens['total']:,}")
# Cost for a specific conversation
conv_cost = tracker.conversation_cost("conv-123")
print(f"Conversation cost: ${conv_cost:.4f}")
# Total API calls
print(f"API calls: {tracker.total_calls}")
# Full summary as a dict (good for logging)
import json
print(json.dumps(tracker.summary(), indent=2))
The summary() method returns a dict with everything you need for dashboards:
{
"total_calls": 12,
"total_cost_usd": 0.0847,
"total_input_tokens": 15230,
"total_output_tokens": 3891,
"total_tokens": 19121,
"conversations": {
"conv-123": 0.0412,
"conv-456": 0.0435
},
"budget_per_conversation": 0.5,
"budget_per_session": 25.0
}
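The accessors behind those numbers are simple aggregations over `_records`. A sketch, assuming `__init__` and `record()` as shown earlier; only the read paths are spelled out here:

```python
# Read-only accessors for CostTracker; __init__ and record() as shown earlier
class CostTracker:
    @property
    def session_cost(self) -> float:
        return sum(r.cost_usd for r in self._records)

    @property
    def total_calls(self) -> int:
        return len(self._records)

    @property
    def session_tokens(self) -> dict[str, int]:
        inp = sum(r.input_tokens for r in self._records)
        out = sum(r.output_tokens for r in self._records)
        return {"input": inp, "output": out, "total": inp + out}

    def conversation_cost(self, conversation_id: str) -> float:
        return self._conversation_costs.get(conversation_id, 0.0)

    def summary(self) -> dict:
        tokens = self.session_tokens
        return {
            "total_calls": self.total_calls,
            "total_cost_usd": round(self.session_cost, 6),
            "total_input_tokens": tokens["input"],
            "total_output_tokens": tokens["output"],
            "total_tokens": tokens["total"],
            "conversations": {
                cid: round(c, 6) for cid, c in self._conversation_costs.items()
            },
            "budget_per_conversation": self.budget_per_conversation,
            "budget_per_session": self.budget_per_session,
        }
```

Recomputing sums on every access is fine at this scale; switch to running totals only if you log millions of calls per process.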
Exporting usage data
For billing reconciliation, internal reporting, or debugging, export your records:
# JSON — complete records with metadata
tracker.export_json("usage_2026_03.json")
# CSV — tabular format for spreadsheets and data tools
tracker.export_csv("usage_2026_03.csv")
The JSON export includes the full summary plus every individual record:
def export_json(self, path: str | Path) -> None:
data = {
"summary": self.summary(),
"records": [
{
"timestamp": r.timestamp,
"model": r.model,
"input_tokens": r.input_tokens,
"output_tokens": r.output_tokens,
"cost_usd": round(r.cost_usd, 6),
"conversation_id": r.conversation_id,
"metadata": r.metadata,
}
for r in self._records
],
}
Path(path).write_text(json.dumps(data, indent=2))
The CSV export is intentionally simple — timestamp, model, tokens, cost, conversation ID. Load it into pandas, pipe it into a dashboard, or open it in a spreadsheet:
import pandas as pd

df = pd.read_csv("usage_2026_03.csv")
daily_cost = (
    df.assign(date=pd.to_datetime(df["timestamp"], unit="s").dt.date)
    .groupby("date")["cost_usd"]
    .sum()
)
print(daily_cost)
What to monitor in production
Once you have cost data flowing, here is what to watch:
Cost per conversation. This is your primary metric. If your median conversation costs $0.03 and you suddenly see conversations costing $0.50, something changed — a new edge case, a broken tool, a prompt regression. Set an alert at 5x your median.
Token efficiency ratio. Calculate output_tokens / input_tokens per conversation. A healthy ratio for a support agent is 0.3-0.5 (the agent reads more than it writes). If this ratio spikes above 1.0, the agent is generating too much text or looping.
Cost by model tier. If you use multiple models (Haiku for simple routing, Sonnet for complex reasoning), track the cost split. You might discover that 80% of conversations could be handled by Haiku, saving 75% of your costs.
Budget hit rate. What percentage of conversations hit the budget limit? If it is above 2%, your budget is too low or your agent has a systematic problem. If it is exactly 0%, your budget might be too high to catch runaway loops before they get expensive.
Daily and weekly totals. Plot these over time. Costs should correlate with traffic. If costs grow faster than traffic, investigate.
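The 5x-median alert from the first metric falls straight out of the tracker's per-conversation totals. `flag_outliers` below is an illustrative helper, not part of the tracker:

```python
import statistics

def flag_outliers(conversation_costs: dict[str, float]) -> list[str]:
    """Return conversation IDs costing more than 5x the median cost."""
    if not conversation_costs:
        return []
    threshold = 5 * statistics.median(conversation_costs.values())
    return [cid for cid, cost in conversation_costs.items() if cost > threshold]
```

Feed it `tracker.summary()["conversations"]` on a schedule and alert on any non-empty result.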
A simple alerting setup for a production agent:
import logging
logger = logging.getLogger(__name__)
def post_conversation_check(tracker: CostTracker, conversation_id: str) -> None:
cost = tracker.conversation_cost(conversation_id)
if cost > 0.20:
logger.warning(
f"High-cost conversation {conversation_id}: ${cost:.4f}"
)
if tracker.session_cost > tracker.budget_per_session * 0.8:
logger.warning(
f"Session approaching budget: ${tracker.session_cost:.4f} / "
f"${tracker.budget_per_session:.2f}"
)
Call this after every conversation ends. Pipe the warnings to Slack, PagerDuty, or wherever your team watches for alerts.
What’s next
Cost tracking is defensive infrastructure. It does not make your agent smarter, but it makes sure a dumb mistake does not cost you hundreds of dollars. Build it first, before you have users, before you have traffic.
For the tool patterns that generate these costs, see the tool calling patterns guide. For testing that your agent stays within budget consistently, see the evaluation guide.
The StartToAgent starter kit includes the full CostTracker with per-conversation budgets, JSON/CSV export, and budget enforcement out of the box. No need to build this from scratch when you could be building your actual product. Check out the kit.