Building Your First AI Agent for Production: The Steps Nobody Shows You
AI agent tutorials all stop at the same point: the prototype that works in a Jupyter notebook. Here I cover what comes next — building an agent that runs in production with real users.
What an AI Agent Really Is (In Practice)
An AI agent is a loop. The model receives context and decides on an action, your code executes the corresponding tool and returns the result to the model, and the cycle repeats until the model produces a final answer.
Minimal architecture:
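// Tool, Message, callClaude, parseAgentResponse and executeTool are app-specific (not shown).
const MAX_STEPS = 10; // example hard cap on loop iterations; tune per agent

interface AgentStep {
  thought: string;
  tool?: string;
  toolInput?: Record<string, unknown>;
  observation?: string;
  finalAnswer?: string;
}

async function runAgent(userQuery: string, tools: Tool[]): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: userQuery }];

  for (let i = 0; i < MAX_STEPS; i++) {
    // Ask the model for the next step, given the conversation so far
    const response = await callClaude({ messages, tools });
    const step: AgentStep = parseAgentResponse(response);

    if (step.finalAnswer) return step.finalAnswer;
    if (!step.tool) throw new Error('Agent stuck: no tool, no answer');

    // Execute the requested tool, then feed its result back to the model.
    // (Simplified: with native tool use, the Messages API expects a tool_result
    // block referencing the tool_use id rather than plain text.)
    const result = await executeTool(step.tool, step.toolInput ?? {});
    messages.push(
      { role: 'assistant', content: response.content },
      { role: 'user', content: `Tool result: ${result}` }
    );
  }

  throw new Error(`Agent exceeded MAX_STEPS (${MAX_STEPS})`);
}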
MAX_STEPS is non-negotiable. Without it, an agent can loop indefinitely.
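To watch the loop and its MAX_STEPS guard work in isolation, here is a self-contained sketch that replaces Claude with a scripted fake model. All names in it (ModelTurn, Model, ToolFn, runAgentLoop, get_time) and the cap of 5 steps are hypothetical; it mirrors the structure of runAgent rather than any real SDK.

```typescript
// A model turn either requests a tool call or returns a final answer.
type ModelTurn =
  | { kind: "tool"; tool: string; input: Record<string, unknown> }
  | { kind: "final"; answer: string };

type Message = { role: "user" | "assistant"; content: string };
type Model = (messages: Message[]) => Promise<ModelTurn>;
type ToolFn = (input: Record<string, unknown>) => Promise<string>;

const MAX_STEPS = 5; // hypothetical cap for this sketch

async function runAgentLoop(
  query: string,
  model: Model,
  tools: Record<string, ToolFn>,
): Promise<{ answer: string; steps: number }> {
  const messages: Message[] = [{ role: "user", content: query }];
  for (let step = 1; step <= MAX_STEPS; step++) {
    const turn = await model(messages);
    if (turn.kind === "final") return { answer: turn.answer, steps: step };

    const tool = tools[turn.tool];
    if (!tool) throw new Error(`Unknown tool: ${turn.tool}`);
    const result = await tool(turn.input);

    // Record the request and its result so the next model call sees both
    messages.push(
      { role: "assistant", content: `Calling ${turn.tool}` },
      { role: "user", content: `Tool result: ${result}` },
    );
  }
  throw new Error(`Agent exceeded MAX_STEPS (${MAX_STEPS})`);
}

// Scripted fake model: calls get_time once, then answers from the tool result.
const scriptedModel: Model = async (messages) => {
  const last = messages[messages.length - 1].content;
  if (last.startsWith("Tool result: ")) {
    return { kind: "final", answer: `It is ${last.slice("Tool result: ".length)}` };
  }
  return { kind: "tool", tool: "get_time", input: {} };
};

// A broken model that never produces a final answer.
const loopingModel: Model = async () => ({ kind: "tool", tool: "get_time", input: {} });

const tools: Record<string, ToolFn> = { get_time: async () => "12:00" };
```

With scriptedModel, runAgentLoop("What time is it?", scriptedModel, tools) resolves to { answer: "It is 12:00", steps: 2 }; with loopingModel, the same call rejects with the MAX_STEPS error after five tool calls instead of spinning forever.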
Define Tools with Precision
The quality of your tool definitions is what separates a useful agent from one that hallucinates.
Each tool needs:
- An explicit name (search_database, not search)
- A description that says when to use it AND when not to
- A strict JSON schema for its parameters
const searchTool = {
  name: "search_client_database",
  description: `Search the client database by name, SIRET, or email.
Use when: the user asks about a specific client or wants client details.
Do NOT use for: general statistics or reports; use get_stats instead.`,
  input_schema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Name, SIRET, or email" },
      limit: { type: "number", description: "Max results, default 5" }
    },
    required: ["query"]
  }
};
The more precise the description, the fewer unnecessary calls the agent makes.
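A strict schema only protects you if the runtime enforces it, since the model can still produce malformed arguments. The sketch below checks tool input against the small subset of JSON Schema that searchTool uses (an object, typed properties, required fields) and treats the schema as closed, rejecting undeclared fields. SimpleSchema and validateToolInput are hypothetical names; a full JSON Schema validator such as Ajv is the more robust choice in production.

```typescript
// Subset of JSON Schema covered by this sketch.
interface SimpleSchema {
  type: "object";
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required?: string[];
}

// Returns a list of problems; an empty list means the input is acceptable.
function validateToolInput(schema: SimpleSchema, input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["input must be a JSON object"];
  }
  const obj = input as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of schema.required ?? []) {
    if (!(key in obj)) errors.push(`missing required field "${key}"`);
  }
  for (const [key, value] of Object.entries(obj)) {
    const prop = schema.properties[key];
    if (!prop) {
      // Closed schema: reject fields the schema does not declare
      errors.push(`unexpected field "${key}"`);
      continue;
    }
    if (typeof value !== prop.type) {
      errors.push(`field "${key}" must be of type ${prop.type}`);
    }
  }
  return errors;
}

// Mirrors the input_schema of searchTool.
const searchSchema: SimpleSchema = {
  type: "object",
  properties: {
    query: { type: "string" },
    limit: { type: "number" },
  },
  required: ["query"],
};
```

For example, { limit: "5" } yields two errors: the missing query field and a string where a number is expected. Returning these messages to the model as the tool result, instead of calling the tool, gives it a chance to fix its arguments on the next step.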
State Management and Persistence
A production agent needs state that persists between sessions. Here is the simple pattern I use:
interface AgentSession {
  id: string;
  userId: string;
  messages: Message[];
  context: Record<string, unknown>;
  createdAt: string;
  updatedAt: string;
}
In practice: Redis for active sessions (30-minute TTL), PostgreSQL for long-term history.
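The Redis half of that setup amounts to "drop a session 30 minutes after its last write". This sketch models that behaviour with an in-memory map so it runs without a server; SessionStore, the injectable clock and the simplified message type are assumptions. Against real Redis, the equivalent is a SET with the EX option on every write (which resets the expiry), with finished sessions copied to PostgreSQL.

```typescript
interface AgentSession {
  id: string;
  userId: string;
  messages: { role: string; content: string }[]; // simplified message type
  context: Record<string, unknown>;
  createdAt: string;
  updatedAt: string;
}

const SESSION_TTL_MS = 30 * 60 * 1000; // 30 minutes

// In-memory stand-in for Redis: every save resets the session's expiry,
// like re-issuing SET with the EX option.
class SessionStore {
  private readonly entries = new Map<string, { session: AgentSession; expiresAt: number }>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting 30 minutes
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  save(session: AgentSession): void {
    const t = this.now();
    const stored = { ...session, updatedAt: new Date(t).toISOString() };
    this.entries.set(session.id, { session: stored, expiresAt: t + SESSION_TTL_MS });
  }

  get(id: string): AgentSession | undefined {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(id); // expired, like a Redis key past its TTL
      return undefined;
    }
    return entry.session;
  }
}
```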
Tests Before Production
What I systematically test:
- Happy path: the agent completes the task in ≤ N steps
- Tool failure: a tool returns an error, and the agent must degrade gracefully
- Infinite loop: MAX_STEPS is reached, and the agent must exit cleanly
- Malicious input: prompt injection, context overflow
- Average cost: the cost per session stays within budget
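The cost check is arithmetic: tokens used by every model call in a session, multiplied by per-token prices, averaged over a test run and compared with a budget. In this sketch the prices are parameters rather than real rates (check your provider's current pricing), and TokenUsage, Pricing, sessionCost and averageCost are hypothetical.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Prices per million tokens, the unit providers commonly quote.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Cost of one session = sum over every model call made in that session.
function sessionCost(calls: TokenUsage[], pricing: Pricing): number {
  return calls.reduce(
    (total, c) =>
      total +
      (c.inputTokens * pricing.inputPerMTok + c.outputTokens * pricing.outputPerMTok) / 1_000_000,
    0,
  );
}

// Average cost across sessions; 0 when there is nothing to average.
function averageCost(sessions: TokenUsage[][], pricing: Pricing): number {
  if (sessions.length === 0) return 0;
  const total = sessions.reduce((sum, calls) => sum + sessionCost(calls, pricing), 0);
  return total / sessions.length;
}
```

In a Jest suite this turns into an assertion such as expect(averageCost(recorded, pricing)).toBeLessThanOrEqual(BUDGET), where recorded comes from the usage field of each Messages API response (input_tokens and output_tokens) and BUDGET is your own threshold.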
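// Fixtures (not shown): mockTools is the Tool[] given to the agent, toolImpls holds
// the jest.fn() behind each tool, and mockCallClaude is callClaude replaced by a scripted jest mock.
describe('AgentCore', () => {
  beforeEach(() => jest.clearAllMocks());

  it('completes booking in ≤5 steps', async () => {
    const result = await runAgent('Book a meeting for next Tuesday', mockTools);
    expect(result).toContain('confirmed');
    // runAgent makes one model call per step, so the call count is the step count
    expect(mockCallClaude.mock.calls.length).toBeLessThanOrEqual(5);
  });

  it('handles tool failure gracefully', async () => {
    // Requires runAgent to catch tool errors and pass them back to the model
    toolImpls.search_calendar.mockRejectedValue(new Error('timeout'));
    const result = await runAgent('Check my schedule', mockTools);
    expect(result).toContain('unavailable');
  });
});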
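The second test only passes if tool errors reach the model instead of escaping the loop: in runAgent as written above, a rejected executeTool promise propagates straight out of the function. One way to get the graceful behaviour is to wrap each tool call so that failures, including timeouts, come back as text. safeExecuteTool, ToolFn and the 10-second TOOL_TIMEOUT_MS are assumptions for this sketch.

```typescript
type ToolFn = (input: Record<string, unknown>) => Promise<string>;

const TOOL_TIMEOUT_MS = 10_000; // hypothetical default: give up after 10 seconds

// Runs a tool but never throws: failures come back as text the model can read.
async function safeExecuteTool(
  tool: ToolFn,
  input: Record<string, unknown>,
  timeoutMs: number = TOOL_TIMEOUT_MS,
): Promise<{ ok: boolean; content: string }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the tool's result or the timeout
    const content = await Promise.race([tool(input), timeout]);
    return { ok: true, content };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, content: `Tool error: ${reason}` };
  } finally {
    clearTimeout(timer); // avoid a stray timer once the race is decided
  }
}
```

In the loop, the tool result becomes (await safeExecuteTool(...)).content, pushed to messages like any other result, so the model can tell the user the service is unavailable; the ok flag is useful for logging and metrics.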
Production Monitoring
Two critical metrics to watch in real time:
- Average steps per session: if it rises, a tool is probably broken or the prompt is ambiguous
- Share of sessions that hit MAX_STEPS: above 2%, there’s a bug in the agent logic
Without monitoring, you discover problems when users complain. With it, you see them first.
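Both metrics can be derived from one record per finished session. This sketch assumes each session is logged with its step count and whether it ended on the MAX_STEPS error; SessionRecord, computeHealth and the "rise" rule (25% above a baseline average) are assumptions, while the 2% threshold comes from the text.

```typescript
interface SessionRecord {
  steps: number;        // model calls made in the session
  hitMaxSteps: boolean; // true if the session ended with the MAX_STEPS error
}

interface AgentHealth {
  avgSteps: number;
  maxStepsRate: number; // fraction of sessions, between 0 and 1
  alerts: string[];
}

const MAX_STEPS_RATE_THRESHOLD = 0.02; // 2%, as in the text

function computeHealth(sessions: SessionRecord[], baselineAvgSteps: number): AgentHealth {
  if (sessions.length === 0) return { avgSteps: 0, maxStepsRate: 0, alerts: [] };

  const avgSteps = sessions.reduce((sum, s) => sum + s.steps, 0) / sessions.length;
  const maxStepsRate = sessions.filter((s) => s.hitMaxSteps).length / sessions.length;

  const alerts: string[] = [];
  // Hypothetical rule: treat 25% above the baseline as "rising"
  if (avgSteps > baselineAvgSteps * 1.25) {
    alerts.push(`avg steps ${avgSteps.toFixed(2)} above baseline ${baselineAvgSteps}`);
  }
  if (maxStepsRate > MAX_STEPS_RATE_THRESHOLD) {
    alerts.push(`MAX_STEPS rate ${(maxStepsRate * 100).toFixed(1)}% exceeds 2%`);
  }
  return { avgSteps, maxStepsRate, alerts };
}
```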
What We Don’t Do
- Never deploy without a regression test suite
- Never exceed 10 tools per agent (beyond that, the model loses track)
- Never let an agent modify critical data without human confirmation
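The last two rules hold better when enforced in code than when left to the prompt. In this sketch, buildToolRegistry, RegisteredTool, the critical flag and ConfirmFn are hypothetical: the registry refuses more than 10 tools, and any tool flagged as critical waits for a human decision before running.

```typescript
type ToolFn = (input: Record<string, unknown>) => Promise<string>;

interface RegisteredTool {
  name: string;
  run: ToolFn;
  critical: boolean; // modifies critical data, so a human must confirm
}

// Asks a human to approve an action; the channel (UI, chat, email) is app-specific.
type ConfirmFn = (toolName: string, input: Record<string, unknown>) => Promise<boolean>;

const MAX_TOOLS = 10;

function buildToolRegistry(tools: RegisteredTool[], confirm: ConfirmFn): Record<string, ToolFn> {
  if (tools.length > MAX_TOOLS) {
    throw new Error(`${tools.length} tools registered; the limit is ${MAX_TOOLS}`);
  }
  const registry: Record<string, ToolFn> = {};
  for (const tool of tools) {
    registry[tool.name] = tool.critical
      ? async (input) => {
          // Wait for a human; a refusal is reported back to the model as text
          const approved = await confirm(tool.name, input);
          return approved ? tool.run(input) : `Action ${tool.name} was rejected by the user.`;
        }
      : tool.run;
  }
  return registry;
}
```

A rejection is returned to the model as an ordinary tool result, so the agent can explain to the user that nothing was changed.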
Stéphanie Caumont
AI Product Owner