Building Your First AI Agent for Production: The Steps Nobody Shows You

Jul 2, 2026 · 8 min

Most AI agent tutorials stop at the same point: the prototype that works in a Jupyter notebook. This article covers what comes next: building an agent that runs in production with real users.

What an AI Agent Really Is (In Practice)

An AI agent is a loop: the model receives context, decides on an action, executes the corresponding tool, gets the result back, and repeats until it has a final answer.

Minimal architecture:

interface AgentStep {
  thought: string;
  tool?: string;
  toolInput?: Record<string, unknown>;
  observation?: string;
  finalAnswer?: string;
}

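// Hard cap on loop iterations (example value; tune it per use case).
const MAX_STEPS = 10;

// Message, Tool, callClaude, parseAgentResponse and executeTool are
// application helpers defined elsewhere.
async function runAgent(userQuery: string, tools: Tool[]): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: userQuery }];

  for (let i = 0; i < MAX_STEPS; i++) {
    // Ask the model for its next step given the conversation so far.
    const response = await callClaude({ messages, tools });
    const step = parseAgentResponse(response);

    // Stop as soon as the model commits to a final answer (even an empty one).
    if (step.finalAnswer !== undefined) return step.finalAnswer;
    if (!step.tool) throw new Error('Agent stuck: no tool, no answer');

    // Run the requested tool and feed its result back as the next observation.
    const result = await executeTool(step.tool, step.toolInput);
    messages.push(
      { role: 'assistant', content: response.content },
      { role: 'user', content: `Tool result: ${result}` }
    );
  }
  throw new Error(`Agent exceeded MAX_STEPS (${MAX_STEPS})`);
}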

MAX_STEPS is non-negotiable. Without it, an agent can loop indefinitely.
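In the loop above, an exception thrown by executeTool ends the whole run. One way to let the agent recover is to turn tool failures into observations the model can read and react to. The sketch below assumes tools are registered as async handlers in a Map; safeExecuteTool and ToolHandler are hypothetical names, not part of the code above.

```typescript
// Hypothetical handler shape: every tool is an async function returning text.
type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

// Runs a tool and never throws: failures come back as a readable observation
// so the model can apologize, retry, or pick another tool.
async function safeExecuteTool(
  handlers: Map<string, ToolHandler>,
  name: string,
  input: Record<string, unknown> = {},
): Promise<string> {
  const handler = handlers.get(name);
  if (handler === undefined) {
    return `Tool error: unknown tool "${name}"`;
  }
  try {
    return await handler(input);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return `Tool error: ${name} failed (${reason})`;
  }
}
```

With a wrapper like this, the error text lands in the next user message instead of crashing the loop, and MAX_STEPS still bounds how many times the agent can retry.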

Define Tools with Precision

Tool quality is what separates a useful agent from one that hallucinates.

Each tool needs:

An explicit name (search_database, not search)
A description that says when to use it AND when not to
A strict JSON schema for parameters

const searchTool = {
  name: "search_client_database",
  description: `Search the client database by name, SIRET, or email.
    Use when: the user asks about a specific client or wants client details.
    Do NOT use for: general statistics or reports — use get_stats instead.`,
  input_schema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Name, SIRET, or email" },
      limit: { type: "number", description: "Max results, default 5" }
    },
    required: ["query"]
  }
};

The more precise the description, the fewer unnecessary calls the agent makes.
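A strict schema only helps if the model's arguments are checked before the tool runs. The following is a minimal hand-rolled sketch that covers only required keys and primitive types on a flat schema; a full JSON Schema validator library would be the more robust choice. validateToolInput and ToolInputSchema are hypothetical names.

```typescript
type JsonType = "string" | "number" | "boolean";

// Simplified view of a tool's input_schema: flat object, primitive properties.
interface ToolInputSchema {
  type: "object";
  properties: Record<string, { type: JsonType; description?: string }>;
  required?: string[];
}

// Returns a list of problems; an empty list means the input is acceptable.
function validateToolInput(
  schema: ToolInputSchema,
  input: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in input)) errors.push(`missing required parameter "${key}"`);
  }
  for (const [key, value] of Object.entries(input)) {
    if (!(key in schema.properties)) {
      errors.push(`unknown parameter "${key}"`);
      continue;
    }
    const expected = schema.properties[key].type;
    if (typeof value !== expected) {
      errors.push(`parameter "${key}" should be ${expected}, got ${typeof value}`);
    }
  }
  return errors;
}

// Same shape as the input_schema of search_client_database above.
const searchSchema: ToolInputSchema = {
  type: "object",
  properties: {
    query: { type: "string", description: "Name, SIRET, or email" },
    limit: { type: "number", description: "Max results, default 5" },
  },
  required: ["query"],
};
```

Returning the error list to the model as the tool result, rather than throwing, gives it a chance to correct its own call on the next step.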

State Management and Persistence

A production agent needs state that persists between sessions. The pattern I use is simple:

interface AgentSession {
  id: string;
  userId: string;
  messages: Message[];
  context: Record<string, unknown>;
  createdAt: string;
  updatedAt: string;
}

In practice: Redis for active sessions (30-minute TTL) and PostgreSQL for long-term history.
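To make the two-tier idea concrete, here is an in-memory sketch: one Map stands in for Redis (entries expire after the TTL) and another stands in for PostgreSQL (entries are kept). The SessionStore class and its methods are hypothetical, and no real Redis or PostgreSQL client is involved.

```typescript
interface Message { role: "user" | "assistant"; content: string }

interface AgentSession {
  id: string;
  userId: string;
  messages: Message[];
  context: Record<string, unknown>;
  createdAt: string;
  updatedAt: string;
}

const SESSION_TTL_MS = 30 * 60 * 1000; // 30-minute TTL, as in the text

class SessionStore {
  // Stand-in for Redis: hot sessions with an expiry timestamp.
  private active = new Map<string, { session: AgentSession; expiresAt: number }>();
  // Stand-in for PostgreSQL: long-term history, never expired here.
  private archive = new Map<string, AgentSession>();
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  save(session: AgentSession): void {
    const updated = { ...session, updatedAt: new Date(this.now()).toISOString() };
    this.active.set(updated.id, { session: updated, expiresAt: this.now() + SESSION_TTL_MS });
    this.archive.set(updated.id, updated);
  }

  isActive(id: string): boolean {
    const entry = this.active.get(id);
    return entry !== undefined && entry.expiresAt > this.now();
  }

  // Serves from the hot tier while the TTL holds, then falls back to history.
  load(id: string): AgentSession | undefined {
    const entry = this.active.get(id);
    if (entry !== undefined && entry.expiresAt > this.now()) return entry.session;
    this.active.delete(id);
    return this.archive.get(id);
  }
}
```

Every save refreshes the TTL, so a session stays hot as long as the user keeps talking and falls back to the archive once they go quiet.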

Tests Before Production

What I systematically test:

Happy path: agent completes the task in ≤ N steps
Tool failure: a tool returns an error — agent must degrade gracefully
Infinite loop: MAX_STEPS reached — agent must exit cleanly
Malicious input: prompt injection, context overflow
Average cost: validate that cost per session stays within budget

describe('AgentCore', () => {
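  it('completes booking in ≤5 steps', async () => {
    const result = await runAgent('Book a meeting for next Tuesday', mockTools);
    // Assumes every entry of mockTools is a jest.fn(); one tool call = one step.
    const stepCount = Object.values(mockTools)
      .reduce((total, tool) => total + tool.mock.calls.length, 0);
    expect(result).toContain('confirmed');
    expect(stepCount).toBeLessThanOrEqual(5);
  });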

  it('handles tool failure gracefully', async () => {
    mockTools.search_calendar.mockRejectedValue(new Error('timeout'));
    const result = await runAgent('Check my schedule', mockTools);
    expect(result).toContain('unavailable');
  });
});
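The cost check from the list above is plain arithmetic once token usage is recorded per step. A sketch, assuming the model API reports input and output token counts for each call; the prices passed in are placeholders to replace with your provider's current rates, and all names here are hypothetical.

```typescript
interface StepUsage { inputTokens: number; outputTokens: number }

// Prices in your billing currency per million tokens (placeholders, not real rates).
interface Pricing { inputPerMillion: number; outputPerMillion: number }

// Cost of one session: sum over every model call made during the run.
function sessionCost(steps: StepUsage[], pricing: Pricing): number {
  return steps.reduce(
    (total, s) =>
      total +
      (s.inputTokens * pricing.inputPerMillion +
        s.outputTokens * pricing.outputPerMillion) / 1e6,
    0,
  );
}

// Average cost across sessions; 0 for an empty sample.
function averageSessionCost(sessions: StepUsage[][], pricing: Pricing): number {
  if (sessions.length === 0) return 0;
  const total = sessions.reduce((sum, steps) => sum + sessionCost(steps, pricing), 0);
  return total / sessions.length;
}

function withinBudget(sessions: StepUsage[][], pricing: Pricing, budgetPerSession: number): boolean {
  return averageSessionCost(sessions, pricing) <= budgetPerSession;
}
```

Because the full history is resent on every step, input tokens grow with each iteration, so a rising step count pushes cost up faster than linearly.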

Production Monitoring

Two critical metrics to watch in real time:

Average steps per session: if it rises, a tool is broken or the prompt is ambiguous
Share of sessions that hit MAX_STEPS: if it exceeds 2%, there’s a bug in the agent logic

Without monitoring, you discover problems when users complain. With it, you see them first.
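Both metrics can be derived from one small record written at the end of each session. A sketch, assuming the agent logs its step count and whether it stopped on MAX_STEPS; SessionRecord, computeMetrics, and shouldAlert are hypothetical names.

```typescript
// One record per finished session, written by the agent runner.
interface SessionRecord { steps: number; hitMaxSteps: boolean }

interface AgentMetrics { averageSteps: number; maxStepsRate: number }

function computeMetrics(records: SessionRecord[]): AgentMetrics {
  if (records.length === 0) return { averageSteps: 0, maxStepsRate: 0 };
  const totalSteps = records.reduce((sum, r) => sum + r.steps, 0);
  const capped = records.filter((r) => r.hitMaxSteps).length;
  return {
    averageSteps: totalSteps / records.length,
    maxStepsRate: capped / records.length,
  };
}

// The 2% threshold from the text, expressed as a fraction.
const MAX_STEPS_RATE_ALERT = 0.02;

function shouldAlert(metrics: AgentMetrics): boolean {
  return metrics.maxStepsRate > MAX_STEPS_RATE_ALERT;
}
```

Computing these over a sliding window (for example the last hour of sessions) rather than all-time keeps a fresh regression from being diluted by old healthy traffic.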

What We Don’t Do

Never deploy without a regression test suite
Never exceed 10 tools per agent (beyond that, the model loses track)
Never let an agent modify critical data without human confirmation
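The last rule can be enforced in code rather than left to the prompt. A sketch of a gate that sits between the model's decision and tool execution; the tool names in CRITICAL_TOOLS are invented for illustration, and how the human confirmation is collected (UI prompt, approval queue) is left out.

```typescript
interface ToolCall { tool: string; input: Record<string, unknown> }

type GateDecision =
  | { status: "run" }
  | { status: "needs_confirmation"; reason: string };

// Hypothetical names: list every tool that writes to critical data.
const CRITICAL_TOOLS = new Set(["delete_client", "update_client_record"]);

// Read-only tools run immediately; critical ones wait for a human.
function gateToolCall(call: ToolCall, confirmedByHuman: boolean): GateDecision {
  if (!CRITICAL_TOOLS.has(call.tool) || confirmedByHuman) {
    return { status: "run" };
  }
  return {
    status: "needs_confirmation",
    reason: `"${call.tool}" modifies critical data and requires human approval`,
  };
}
```

Keeping the list of critical tools in code means a prompt injection cannot talk the agent past the check.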

Stéphanie Caumont

AI Product Owner