Why We Chose the Anthropic SDK Over LangChain
Less abstraction, more control. Here's the reasoning behind our stack decision.
When we started building StartToAgent, the first technical decision we had to make was the most consequential: do we build on LangChain or on the raw Anthropic SDK?
We chose the Anthropic SDK. Not because LangChain is bad — it is a well-maintained project that solves real problems. We chose the SDK because for production agents built on a single provider, the abstractions that LangChain provides are not just unnecessary. They actively get in the way.
This post explains our reasoning. It is opinionated, but we have tried to be fair. If your situation is different from ours, LangChain might be exactly right for you.
The thin wrapper philosophy
The Anthropic Python SDK is a thin wrapper around the REST API. When you call client.messages.create(), you know exactly what HTTP request is being made, what the request body looks like, and what you will get back. The response object maps directly to the API response.
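A complete call fits in a few lines. As a quick sketch (the model name here is illustrative; any current Claude model works):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize our retry policy."}],
)

# Every field here mirrors the documented API response body.
print(response.content[0].text)
print(response.usage.input_tokens, response.usage.output_tokens)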
This matters more than it sounds. When you are debugging an agent that made a weird tool call at 3 AM, you want to be able to look at the raw request and response. You want to paste the request body into the API playground and reproduce the issue. You want to read the API documentation and have it match your code.
With LangChain, there is a translation layer between you and the API. Your tool definition goes through a decorator, gets converted to a LangChain Tool object, which gets serialized to the Anthropic tool schema at call time. The response comes back, gets deserialized into LangChain message types, and tool calls get wrapped in their own objects. Every step in that chain is a place where something can go wrong, and every step is a place where the debugging story gets harder.
For a weekend project, this does not matter much. For a production agent that handles thousands of conversations, it matters a lot.
Where LangChain abstractions get in the way
Here are specific areas where we found LangChain’s abstractions created more work than they saved.
Tool calling
With the Anthropic SDK, a tool definition is a dictionary:
tools = [{
    "name": "search_docs",
    "description": "Search the documentation",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"]
    }
}]
It maps directly to what the API expects. You can copy this from the Anthropic documentation, paste it in, and it works.
With LangChain, you typically define tools using decorators or Pydantic models:
@tool
def search_docs(query: str) -> str:
    """Search the documentation"""
    ...
This is more concise, sure. But the moment you need something the decorator does not support — complex nested input schemas, dynamic tool descriptions, conditional tool availability — you are fighting the abstraction instead of just writing a dictionary.
We found ourselves writing more code to work around LangChain’s tool abstractions than we would have written to just use the raw SDK. And crucially, that workaround code was harder to understand and debug than the straightforward SDK version.
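For contrast, here is a sketch of what that flexibility looks like with plain dictionaries. The user object and its fields are hypothetical; the point is that dicts compose with ordinary Python:

def build_tools(user) -> list[dict]:
    # `user` is a hypothetical request context; dynamic descriptions
    # and conditional availability are just ordinary Python.
    tools = [{
        "name": "search_docs",
        "description": f"Search the documentation (index updated {user.index_date})",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    }]
    if user.is_admin:
        tools.append({
            "name": "delete_doc",
            "description": "Delete a document from the index",
            "input_schema": {
                "type": "object",
                "properties": {"doc_id": {"type": "string"}},
                "required": ["doc_id"]
            }
        })
    return tools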
Error handling and retries
The Anthropic SDK raises specific exceptions: RateLimitError, APIStatusError, APIConnectionError. You can catch each one and handle it differently. Rate limited? Back off and retry. Server error? Retry with exponential backoff. Bad request? Log it and bail.
# Inside the agent's retry loop:
try:
    response = client.messages.create(...)
except anthropic.RateLimitError:
    time.sleep(backoff)
    continue
except anthropic.BadRequestError as e:
    logger.error(f"Bad request: {e}")
    break
When LangChain wraps these calls, the exception types change. Sometimes you get a LangChain exception wrapping an Anthropic exception. Sometimes the retry logic is handled internally by LangChain’s callback system. Sometimes the error gets swallowed by a chain and surfaces as a different error three layers up.
We spent more time debugging LangChain’s error handling behavior than building our own. Our error handler in StartToAgent is about 60 lines of straightforward try/except logic. It handles every edge case we have encountered in production. We understand every line of it because we wrote every line of it.
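For reference, here is a condensed sketch of that shape. It is not the actual StartToAgent handler, and the backoff values are illustrative:

import time
import anthropic

client = anthropic.Anthropic()

def create_with_retries(max_attempts: int = 5, **request):
    # Order matters: RateLimitError is a subclass of APIStatusError,
    # so it has to be caught first.
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**request)
        except anthropic.RateLimitError:
            time.sleep(2 ** attempt)      # rate limited: back off and retry
        except anthropic.APIConnectionError:
            time.sleep(2 ** attempt)      # transient network error: retry
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                time.sleep(2 ** attempt)  # server error: retry with backoff
            else:
                raise                     # bad request: surface it and bail
    raise RuntimeError("retries exhausted")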
Cost tracking
This is the one that really sealed the decision. The Anthropic API returns usage.input_tokens and usage.output_tokens with every response. Multiplying these by the per-token price gives you exact cost per call. Simple.
In LangChain, token usage is available through callbacks. You set up a callback handler, attach it to your chain, and collect the token counts as they flow through. This works, but it is a fundamentally different model. Your cost tracking logic is now event-driven instead of sequential. It is harder to reason about, harder to test, and harder to get right when you have agents that make tool calls that trigger sub-chains.
We wanted cost tracking that was dead simple: after every API call, record the cost. After every conversation, know the total. After every day, know the burn rate. The SDK makes this trivial. LangChain makes it possible but not trivial.
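A minimal sketch of that pattern, with placeholder prices (check Anthropic's current pricing page for real numbers):

# Placeholder prices in dollars per million tokens; substitute current rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def record_cost(response, ledger: list) -> float:
    # Exact cost for one call, straight from the usage block.
    cost = (
        response.usage.input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
        + response.usage.output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    )
    ledger.append(cost)
    return cost

Sum the ledger per conversation for the total, per day for the burn rate. No callbacks, no event handlers, just arithmetic after each call.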
Debugging agentic loops
The core of every agent is a loop: send messages, check if the model wants to call tools, execute tools, send results back, repeat. With the Anthropic SDK, this loop is explicit in your code:
while True:
    response = client.messages.create(...)
    if response.stop_reason == "end_turn":
        break
    # handle tool calls
You can log every iteration. You can set breakpoints. You can inspect the exact messages being sent at each step. The control flow is your control flow.
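Filled in, the loop looks roughly like this. Here execute_tool is a hypothetical dispatcher that runs one tool and returns its result as a string, and the model name is illustrative:

import anthropic

client = anthropic.Anthropic()

def run_agent(messages: list[dict], tools: list[dict]) -> str:
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response.content[0].text  # end_turn: final answer

        # Echo the assistant turn, then answer each tool call.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_tool(block.name, block.input),  # hypothetical dispatcher
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})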
LangChain’s AgentExecutor handles this loop for you. Which is convenient until something goes wrong. When an agent gets stuck in a loop, or makes an unexpected tool call, or produces output you do not understand, you need to trace through LangChain’s execution engine to figure out what happened. The agent loop — the most critical piece of logic in your entire system — is inside a library you do not control.
When LangChain makes sense
We are not saying LangChain is bad. Here are scenarios where it genuinely makes sense:
Multi-provider support. If your product needs to work with OpenAI, Anthropic, Google, and local models, LangChain’s provider abstraction saves you from writing four different integrations. The abstraction cost is worth it when you are actually abstracting over meaningful differences.
Rapid prototyping. If you are exploring an idea and need to get something working in an afternoon, LangChain's batteries-included approach lets you move fast. You can always rewrite on the raw SDK later if the idea works.
Complex RAG pipelines. LangChain’s document loaders, text splitters, and vector store integrations are genuinely useful if you are building a retrieval pipeline with many moving parts. The ecosystem of integrations saves real work.
Teams with mixed LLM experience. If your team includes people who are new to LLM development, LangChain’s higher-level abstractions can make the codebase more approachable. The decorator-based tool definitions are easier to read than raw JSON schemas.
When it does not make sense
Single-provider production agents. If you are building with Claude and only Claude, the provider abstraction is pure overhead. You are paying the complexity cost without getting the portability benefit.
Performance-sensitive applications. Every abstraction layer adds latency and memory overhead. For agents that need to respond fast and handle high throughput, fewer layers means better performance.
Teams that need to deeply understand their agent behavior. If you are in a domain where agent decisions have real consequences — customer support, financial analysis, medical information — you want to be as close to the model as possible. Every layer between you and the API is a layer that could behave in ways you do not expect.
Long-lived production systems. Framework dependencies are upgrade risks. When LangChain ships a major version with breaking changes, you have to update your code. When you own your SDK integration, you update on your own schedule.
Our recommendation
If you are building production agents with Claude, start with the Anthropic SDK. Write your agent loop explicitly. Handle your own errors. Track your own costs. You will understand every line of your system, and when things go wrong at 3 AM, you will be able to fix them.
If you later find that you genuinely need multi-provider support or complex RAG pipelines, you can add LangChain for those specific pieces. But do not start with a framework and try to simplify later. Start simple and add complexity only when you have a concrete reason.
The StartToAgent kit demonstrates this approach: production-ready agents with clean infrastructure, built entirely on the Anthropic SDK. Every module is readable. Every decision is traceable. Every cost is accounted for. That is the kind of foundation you want to build production systems on.