Launching StartToAgent

Why we built a starter kit for AI agents, what's inside, and where we're headed next.

There is a strange gap in the AI agent ecosystem right now. On one side, you have tutorials that walk you through a 50-line chatbot and call it an “agent.” On the other side, you have enterprise platforms that want five figures a year and a three-month onboarding. If you are a developer or a small team that wants to build real agents with Claude — agents that handle errors, track costs, and actually work in production — you have been mostly on your own.

That is why we built StartToAgent.

The problem we kept running into

Over the past year, we built agents for clients across customer support, research automation, and document processing. Every single project started the same way: we would scaffold the same infrastructure from scratch. Tool registries. Conversation management. Cost tracking. Error handling with retries. Evaluation harnesses.

The agent logic itself — the interesting part, the part unique to each use case — was maybe 20% of the code. The other 80% was plumbing. Good plumbing, hard-won plumbing, plumbing that took us months to get right. But plumbing nonetheless.

We looked for starter kits that would give us this foundation. What we found was either too simple (just the Anthropic SDK with a while loop) or too heavy (full frameworks that abstracted away the parts we needed to control). Nothing hit the sweet spot of “production-ready infrastructure without opinions about my agent’s logic.”

So we built it.

What is in the kit

StartToAgent is a Python starter kit for building AI agents with the Anthropic SDK. It is not a framework. It does not have a plugin system, a configuration language, or a dependency graph that takes a PhD to understand. It is a codebase you own and modify.

Here is what you get:

Three working agents that demonstrate different patterns:

  • A customer support agent that searches a knowledge base, handles multi-turn conversations, and escalates when it cannot help. This is the workflow pattern — deterministic steps with LLM decision-making at key junctures.
  • A research agent that breaks complex questions into sub-tasks, searches multiple sources, synthesizes findings, and produces structured reports. This is the multi-step reasoning pattern — the agent plans, executes, and iterates.
  • A document extraction agent that takes unstructured documents and produces clean, validated JSON conforming to your schema. This is the structured output pattern — constrained generation with validation loops.
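
To make that last pattern concrete, here is a rough sketch of a validation loop using the plain Anthropic SDK and Pydantic. The schema and function names are illustrative, not the kit's actual module, but the shape is the same: ask for JSON, validate, and feed any validation error back for another attempt.

```python
import json

from anthropic import Anthropic
from pydantic import BaseModel, ValidationError

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment


class Invoice(BaseModel):
    """Illustrative target schema; swap in whatever your documents contain."""
    vendor: str
    total: float
    currency: str


def extract_invoice(document: str, max_attempts: int = 3) -> Invoice:
    schema = json.dumps(Invoice.model_json_schema())
    messages = [{
        "role": "user",
        "content": (
            "Extract the invoice from the document below. "
            f"Respond with JSON only, matching this schema:\n{schema}\n\n"
            f"Document:\n{document}"
        ),
    }]
    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # use whichever Claude model you target
            max_tokens=1024,
            messages=messages,
        )
        raw = response.content[0].text
        try:
            return Invoice.model_validate_json(raw)
        except ValidationError as err:
            # Feed the validation error back so the next attempt can self-correct.
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That output failed validation:\n{err}\n\nReturn corrected JSON only.",
            })
    raise ValueError(f"No valid output after {max_attempts} attempts")
```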

Five core infrastructure modules that every agent shares:

  • Tool registry — Define tools once, get schema generation, validation, and dispatch for free. No decorators, no magic. Just a dictionary and a function (see the sketch after this list).
  • Conversation manager — Handles message history, sliding window trimming, and the critical invariant that every tool_use block has a matching tool_result.
  • Cost tracker — Per-call and cumulative cost tracking with configurable budgets. Know exactly what every conversation costs before your invoice arrives.
  • Error handler — Retries with exponential backoff, rate limit detection, and graceful degradation. The stuff that separates a demo from a product.
  • Structured output parser — JSON extraction with Pydantic validation, automatic retry on malformed output, and schema-guided prompting.
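
To show how light the registry idea is, here is a minimal sketch with illustrative names (not the kit's actual module): a plain dictionary maps each tool name to an Anthropic-style schema and the Python function that handles it, and two small helpers cover schema export and dispatch.

```python
import json


def search_kb(query: str) -> str:
    """Placeholder knowledge-base search handler."""
    return json.dumps({"results": [f"Article matching '{query}'"]})


TOOLS = {
    "search_kb": {
        "schema": {
            "name": "search_kb",
            "description": "Search the support knowledge base.",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
        "handler": search_kb,
    },
}


def tool_schemas() -> list[dict]:
    """Schemas in the shape the messages API expects for its `tools` parameter."""
    return [entry["schema"] for entry in TOOLS.values()]


def dispatch(name: str, tool_input: dict) -> str:
    """Route a tool_use block from the model to the registered handler."""
    return TOOLS[name]["handler"](**tool_input)
```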

An evaluation framework for testing agent behavior:

  • Define test cases as input/expected-output pairs (a minimal sketch follows this list)
  • Run evals against your agent and get pass/fail with detailed diffs
  • Track eval results over time to catch regressions
  • Works with both deterministic checks (did the agent call the right tool?) and LLM-graded checks (is this response helpful?)
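
A minimal sketch of the deterministic flavor, with hypothetical names: cases are plain input/expected pairs, and the runner reports pass or fail for each one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    user_input: str
    expected_tool: str  # deterministic check: which tool should be called first?


def run_evals(agent: Callable[[str], list[str]], cases: list[EvalCase]) -> None:
    """`agent` takes a user message and returns the ordered tool names it called."""
    failures = 0
    for case in cases:
        called = agent(case.user_input)
        if called and called[0] == case.expected_tool:
            print(f"PASS {case.name}")
        else:
            failures += 1
            print(f"FAIL {case.name}: expected {case.expected_tool!r} first, got {called!r}")
    print(f"{len(cases) - failures}/{len(cases)} passed")


cases = [
    EvalCase("refund question", "How do I get a refund?", expected_tool="search_kb"),
    EvalCase("angry customer", "This is the third time I have asked!", expected_tool="escalate"),
]
```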

The philosophy

We made some opinionated decisions. Here is the reasoning behind the big ones.

Anthropic SDK only. No LangChain, no LlamaIndex, no orchestration frameworks. The Anthropic SDK is clean, well-documented, and gives you full control over every API call. Frameworks add indirection that makes debugging harder and limits what you can do. When you are building agents that need to handle edge cases gracefully, you want to see every token flowing through your system. We wrote a whole post about this reasoning.
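
For a sense of what that control looks like, here is roughly the bare SDK call everything builds on. The `tool_schemas` and `dispatch` helpers come from the registry sketch earlier in this post and are illustrative, not the kit's actual code; the point is that every message, token limit, and tool schema sits in code you can read.

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # use whichever Claude model you target
    max_tokens=1024,
    tools=tool_schemas(),  # schemas from the registry sketch above
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)

# Nothing is hidden: inspect every block the model returned and route any
# tool calls through your own dispatch function.
for block in response.content:
    if block.type == "tool_use":
        print(dispatch(block.name, block.input))
    elif block.type == "text":
        print(block.text)
```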

You own the code. StartToAgent is not a library you install from pip. It is a codebase you clone, read, and modify. When you need to change how tool dispatch works, you change it. When you need to add a custom retry strategy, you add it. There is no upgrade path to worry about, no breaking changes, no waiting for a maintainer to merge your PR.

Opinionated defaults, easy overrides. We picked sensible defaults for everything — model selection, retry counts, budget limits, context window management. But every default is a constant or a parameter, not buried in a framework. Change it in one place and move on.
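
As a toy illustration of what “a constant or a parameter” means in practice (hypothetical names and values, not the kit's actual config):

```python
# Plain module-level defaults: read them in one file, change them in one place.
DEFAULT_MODEL = "claude-sonnet-4-20250514"   # which Claude model agents call by default
MAX_RETRIES = 3                              # attempts before the error handler gives up
RETRY_BASE_DELAY_SECONDS = 1.0               # doubled on each retry (exponential backoff)
MAX_HISTORY_MESSAGES = 40                    # sliding-window trimming threshold
BUDGET_USD_PER_CONVERSATION = 0.50           # hard stop enforced by the cost tracker
```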

Tests and evals are first-class. Every module has tests. Every agent has evals. The eval framework is not an afterthought bolted on later — it is part of the kit from day one. Because agent development without evals is just vibes-based programming, and vibes do not scale.

Who this is for

StartToAgent is for developers and small teams who want to build production AI agents without starting from scratch or adopting a heavy framework. Specifically:

  • Solo developers building AI-powered features into their products. You know Python, you have an Anthropic API key, and you want to skip the month of infrastructure work.
  • Small teams that need to ship agent-powered workflows and want a shared foundation that everyone understands. The codebase is small enough to read in an afternoon.
  • Agency developers building agents for clients across different domains. The three agent patterns cover most use cases, and the shared infrastructure means you are not reinventing the wheel for each project.

If you are building a massive multi-model orchestration system with dozens of providers and hundreds of agents, this is probably too small for you. If you are just trying to make a simple chatbot, this is probably too much. StartToAgent lives in the middle — serious agent development with Claude, without the overhead.

What is coming next

This launch is the beginning, not the end. Here is what we are working on:

More agent patterns. We are building out agents for code generation, data analysis, and email triage. Each one demonstrates a different pattern and adds to the shared infrastructure.

More guides. The guides section already has a walkthrough for building your first agent. We are adding guides on tool design, prompt engineering for agents, evaluation strategies, and deployment patterns.

Community input. We want to hear what you are building and what infrastructure you wish you had. The agents and modules in the kit should reflect what real developers actually need, not what we think is cool.

Deeper eval tooling. The current eval framework handles the basics. We are building out support for multi-turn conversation evals, cost-aware eval budgets, and integration with CI/CD pipelines.

Get started

The kit is available now. You can grab it from the kit page, clone the repo, and have a working agent running in under 10 minutes.

If you want to learn the fundamentals first, start with the Building Your First Agent guide. It walks you through tool calling, conversation management, and cost tracking step by step.

And if you just want to follow along as we build, join the newsletter. We send updates when we ship new agents, modules, and guides. No spam, no fluff, just the useful stuff.

We built StartToAgent because we needed it. We hope you do too.
