If you've built chatbots or simple LLM wrappers, you know the drill: send a prompt, get a response, done. But real AI agents are different. They loop, they call tools, they make decisions, they recover from errors. They do actual work.
The Claude API gives you everything you need to build these kinds of agents in TypeScript. The @anthropic-ai/sdk package handles the API calls, and you build the agentic loop on top: tool definitions, execution logic, error recovery. Not toy demos, but production systems with proper error handling, context management, and human-in-the-loop controls.
In this article, we'll build a production-grade AI agent that processes customer support tickets. It reads tickets, classifies them, looks up relevant documentation, drafts responses, and escalates when it's unsure. By the end, you'll have a solid understanding of how to structure agentic systems with Claude.
What makes an agent different from a chatbot
A chatbot takes input and produces output. An agent takes input, decides what to do, takes action, observes the result, and repeats until the task is done. This is the agentic loop.
The key differences:
- Tool use — The agent can call external functions (search a database, send an email, update a record)
- Multi-step reasoning — The agent breaks complex tasks into steps and executes them sequentially
- Decision making — The agent decides which tools to use and when, based on the current state
- Error recovery — When something fails, the agent can retry, try a different approach, or escalate
The Claude API gives you the building blocks for this loop. You define tools, build the execution logic, and let it run.
Setting up the project
Let's start with a fresh TypeScript project. We need the Anthropic SDK and Zod for input validation.
mkdir support-agent
cd support-agent
npm init -y
npm install @anthropic-ai/sdk zod
npm install -D typescript @types/node
npx tsc --init
Update tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "Node16",
"moduleResolution": "Node16",
"strict": true,
"esModuleInterop": true,
"outDir": "./dist",
"rootDir": "./src"
},
"include": ["src/**/*"]
}
Project structure:
src/
├── agent.ts # Agent configuration and loop
├── tools/
│ ├── classify-ticket.ts
│ ├── search-docs.ts
│ ├── draft-response.ts
│ └── escalate.ts
├── types.ts
└── index.ts # Entry point
Defining the types
// src/types.ts
export interface SupportTicket {
id: string;
subject: string;
body: string;
customerEmail: string;
priority: 'low' | 'medium' | 'high' | 'urgent';
category?: string;
}
export interface TicketClassification {
category: string;
sentiment: 'positive' | 'neutral' | 'negative' | 'angry';
requiresHuman: boolean;
confidence: number;
}
export interface DocumentResult {
title: string;
content: string;
relevanceScore: number;
}
export interface AgentResult {
ticketId: string;
classification: TicketClassification;
draftResponse: string;
escalated: boolean;
toolCalls: string[];
}
Building the tool definitions
Each tool is a function the agent can call. In the Claude API, tools are defined with a name, description, and a JSON schema for the input parameters. The description is critical because Claude reads it to decide when and how to use the tool.
// src/tools/classify-ticket.ts
import Anthropic from '@anthropic-ai/sdk';
import type { SupportTicket, TicketClassification } from '../types.js';
export const classifyTicketTool: Anthropic.Messages.Tool = {
name: 'classify_ticket',
description:
'Classify a support ticket into a category and assess sentiment. Use this as the first step when processing any new ticket.',
input_schema: {
type: 'object' as const,
properties: {
category: {
type: 'string',
enum: ['billing', 'technical', 'account', 'feature_request', 'bug_report', 'general'],
description: 'The primary category of the support ticket.',
},
sentiment: {
type: 'string',
enum: ['positive', 'neutral', 'negative', 'angry'],
description: 'The emotional tone of the customer message.',
},
requiresHuman: {
type: 'boolean',
description:
'Whether this ticket needs human review. Set to true for angry customers, legal mentions, or requests you cannot confidently handle.',
},
confidence: {
type: 'number',
description: 'How confident you are in this classification, from 0 to 1.',
},
},
required: ['category', 'sentiment', 'requiresHuman', 'confidence'],
},
};
export function handleClassifyTicket(input: Record<string, unknown>): TicketClassification {
return {
category: input.category as string,
sentiment: input.sentiment as TicketClassification['sentiment'],
requiresHuman: input.requiresHuman as boolean,
confidence: input.confidence as number,
};
}
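Notice that handleClassifyTicket trusts Claude's output completely: the raw casts will happily pass through a confidence of 42 or a misspelled sentiment. The project installs Zod precisely so you can validate tool inputs with a schema; as a dependency-free sketch of the same idea, a runtime guard looks like this (parseClassification is a hypothetical helper, not part of the SDK):

```typescript
// Runtime validation of the classify_ticket tool input. In the real project
// a zod schema would express this more compactly; this dependency-free
// version shows the shape of the check.
type Sentiment = 'positive' | 'neutral' | 'negative' | 'angry';

interface ClassificationInput {
  category: string;
  sentiment: Sentiment;
  requiresHuman: boolean;
  confidence: number;
}

const SENTIMENTS = ['positive', 'neutral', 'negative', 'angry'];

function parseClassification(input: Record<string, unknown>): ClassificationInput {
  const { category, sentiment, requiresHuman, confidence } = input;
  if (typeof category !== 'string') throw new Error('category must be a string');
  if (typeof sentiment !== 'string' || !SENTIMENTS.includes(sentiment))
    throw new Error('invalid sentiment');
  if (typeof requiresHuman !== 'boolean') throw new Error('requiresHuman must be a boolean');
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1)
    throw new Error('confidence must be a number between 0 and 1');
  return { category, sentiment: sentiment as Sentiment, requiresHuman, confidence };
}
```

If the guard throws, don't crash the loop: return the error message as the tool result so Claude can see what was wrong and correct its call on the next iteration.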
// src/tools/search-docs.ts
import Anthropic from '@anthropic-ai/sdk';
import type { DocumentResult } from '../types.js';
export const searchDocsTool: Anthropic.Messages.Tool = {
name: 'search_documentation',
description:
'Search the knowledge base for relevant documentation. Use this after classifying a ticket to find information that helps draft an accurate response.',
input_schema: {
type: 'object' as const,
properties: {
query: {
type: 'string',
description: 'The search query. Be specific — include the product area and the problem.',
},
category: {
type: 'string',
description: 'The ticket category to narrow the search scope.',
},
},
required: ['query'],
},
};
export function handleSearchDocs(input: Record<string, unknown>): DocumentResult[] {
const query = input.query as string;
// In production, this would hit your actual search index (OpenSearch, Pinecone, etc.)
// For this example, we return mock results
return [
{
title: `Documentation: ${query}`,
content: `Relevant documentation content for "${query}". In a real system, this comes from your knowledge base.`,
relevanceScore: 0.92,
},
];
}
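When you swap the mock for a real index, it usually pays to drop weak matches and cap document length before handing results back to the model, so low-relevance hits don't pollute the context. A sketch (the 0.5 threshold and 2000-character cap are illustrative defaults, not recommendations):

```typescript
interface DocumentResult {
  title: string;
  content: string;
  relevanceScore: number;
}

// Filter out weak matches, rank the rest, and truncate long documents so a
// single search doesn't flood the context window. Thresholds are assumptions.
function prepareDocs(
  results: DocumentResult[],
  minScore = 0.5,
  maxChars = 2000
): DocumentResult[] {
  return results
    .filter((doc) => doc.relevanceScore >= minScore)
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .map((doc) => ({ ...doc, content: doc.content.slice(0, maxChars) }));
}
```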
// src/tools/draft-response.ts
import Anthropic from '@anthropic-ai/sdk';
export const draftResponseTool: Anthropic.Messages.Tool = {
name: 'draft_response',
description:
'Draft a response to send to the customer. Use this after searching documentation. The response should be helpful, empathetic, and accurate.',
input_schema: {
type: 'object' as const,
properties: {
response: {
type: 'string',
description: 'The full draft response to send to the customer.',
},
internalNotes: {
type: 'string',
description: 'Internal notes for the support team, not visible to the customer.',
},
},
required: ['response'],
},
};
// src/tools/escalate.ts
import Anthropic from '@anthropic-ai/sdk';
export const escalateTool: Anthropic.Messages.Tool = {
name: 'escalate_to_human',
description:
'Escalate the ticket to a human agent. Use this when the ticket requires human judgment: angry customers, legal issues, refund requests over $100, or when your confidence is below 0.7.',
input_schema: {
type: 'object' as const,
properties: {
reason: {
type: 'string',
description: 'Why this ticket needs human review.',
},
suggestedTeam: {
type: 'string',
enum: ['billing', 'engineering', 'legal', 'management'],
description: 'Which team should handle this escalation.',
},
},
required: ['reason', 'suggestedTeam'],
},
};
The agentic loop
This is the core of the agent. We send the ticket to Claude along with the available tools, then keep looping as long as Claude wants to call more tools. Each tool result gets fed back into the conversation, and Claude decides what to do next.
// src/agent.ts
import Anthropic from '@anthropic-ai/sdk';
import type { SupportTicket, AgentResult, TicketClassification } from './types.js';
import { classifyTicketTool, handleClassifyTicket } from './tools/classify-ticket.js';
import { searchDocsTool, handleSearchDocs } from './tools/search-docs.js';
import { draftResponseTool } from './tools/draft-response.js';
import { escalateTool } from './tools/escalate.js';
const MAX_ITERATIONS = 10;
const SYSTEM_PROMPT = `You are a customer support agent for a SaaS product. Your job is to process support tickets efficiently and accurately.
For each ticket, follow this workflow:
1. Classify the ticket (category, sentiment, whether it needs human review)
2. If the ticket needs human review, escalate immediately
3. Search the documentation for relevant information
4. Draft a response based on the documentation and ticket context
Be empathetic but concise. If you're not confident in your answer (below 0.7), escalate to a human rather than guessing.`;
export async function processTicket(ticket: SupportTicket): Promise<AgentResult> {
const client = new Anthropic();
const tools = [classifyTicketTool, searchDocsTool, draftResponseTool, escalateTool];
const toolCallLog: string[] = [];
let classification: TicketClassification | null = null;
let draftResponse = '';
let escalated = false;
const messages: Anthropic.Messages.MessageParam[] = [
{
role: 'user',
content: `Process this support ticket:\n\nTicket ID: ${ticket.id}\nSubject: ${ticket.subject}\nPriority: ${ticket.priority}\n\n${ticket.body}`,
},
];
for (let iteration = 0; iteration < MAX_ITERATIONS; iteration++) {
const response = await client.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 4096,
system: SYSTEM_PROMPT,
tools,
messages,
});
// If Claude is done (no more tool calls), break
if (response.stop_reason !== 'tool_use') {
break;
}
// Process tool calls
const assistantContent = response.content;
messages.push({ role: 'assistant', content: assistantContent });
const toolResults: Anthropic.Messages.ToolResultBlockParam[] = [];
for (const block of assistantContent) {
if (block.type !== 'tool_use') continue;
toolCallLog.push(block.name);
let result: string;
switch (block.name) {
case 'classify_ticket': {
classification = handleClassifyTicket(block.input as Record<string, unknown>);
result = JSON.stringify(classification);
break;
}
case 'search_documentation': {
const docs = handleSearchDocs(block.input as Record<string, unknown>);
result = JSON.stringify(docs);
break;
}
case 'draft_response': {
const input = block.input as Record<string, unknown>;
draftResponse = input.response as string;
result = JSON.stringify({ status: 'draft_saved', length: draftResponse.length });
break;
}
case 'escalate_to_human': {
escalated = true;
const input = block.input as Record<string, unknown>;
result = JSON.stringify({
status: 'escalated',
reason: input.reason,
team: input.suggestedTeam,
});
break;
}
default: {
result = JSON.stringify({ error: `Unknown tool: ${block.name}` });
}
}
toolResults.push({
type: 'tool_result',
tool_use_id: block.id,
content: result,
});
}
messages.push({ role: 'user', content: toolResults });
}
return {
ticketId: ticket.id,
classification: classification ?? {
category: 'general',
sentiment: 'neutral',
requiresHuman: true,
confidence: 0,
},
draftResponse,
escalated,
toolCalls: toolCallLog,
};
}
A few things to notice here:
- The loop has a hard cap (MAX_ITERATIONS). Without this, a confused agent could loop forever, burning tokens. In production, set this to something reasonable for your use case.
- Each tool call gets logged. The toolCallLog array gives you an audit trail of every decision the agent made.
- The agent decides the workflow. We don't hardcode "classify then search then draft." Claude reads the system prompt and figures out the right sequence. If the ticket is clearly angry, it might escalate immediately without searching docs.
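The iteration cap is a blunt instrument; you can often detect a stuck agent earlier by watching the tool-call log for repetition. A sketch (looksStuck and its threshold are assumptions to tune per workflow):

```typescript
// Flag an agent that calls the same tool several times in a row — usually a
// sign it's spinning rather than making progress.
function looksStuck(toolCallLog: string[], repeats = 3): boolean {
  if (toolCallLog.length < repeats) return false;
  const tail = toolCallLog.slice(-repeats);
  return tail.every((name) => name === tail[0]);
}
```

Check this after each iteration and break out of the loop (or escalate to a human) when it fires, rather than waiting for MAX_ITERATIONS to run out.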
Running the agent
// src/index.ts
import { processTicket } from './agent.js';
import type { SupportTicket } from './types.js';
const testTicket: SupportTicket = {
id: 'TICKET-1234',
subject: 'Cannot access my account after password reset',
body: `Hi, I reset my password yesterday but now I can't log in at all.
I've tried the "forgot password" flow three times and the reset email
never arrives. I have a presentation tomorrow and I need access to my
dashboard urgently. Please help!`,
customerEmail: 'customer@example.com',
priority: 'high',
};
async function main(): Promise<void> {
console.log('Processing ticket:', testTicket.id);
const result = await processTicket(testTicket);
console.log('\n--- Agent Result ---');
console.log('Classification:', JSON.stringify(result.classification, null, 2));
console.log('Escalated:', result.escalated);
console.log('Tool calls:', result.toolCalls);
console.log('\nDraft response:\n', result.draftResponse);
}
main().catch(console.error);
When you run this, Claude will typically:
- Call classify_ticket — categorizes as "account", sentiment "negative", confidence ~0.9
- Call search_documentation — searches for "password reset email not arriving"
- Call draft_response — writes a helpful response with troubleshooting steps
The entire workflow happens automatically. Claude decides the order, the search queries, and the response content.
Error handling in production
The example above works for demos, but production agents need proper error handling. Here's how to make the agentic loop resilient:
async function processTicketWithRetry(
ticket: SupportTicket,
maxRetries: number = 3
): Promise<AgentResult> {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await processTicket(ticket);
} catch (error: unknown) {
const isRateLimited =
error instanceof Anthropic.RateLimitError;
const isOverloaded =
error instanceof Anthropic.InternalServerError;
if ((isRateLimited || isOverloaded) && attempt < maxRetries) {
const backoffMs = Math.min(1000 * Math.pow(2, attempt), 30000);
console.warn(
`Attempt ${attempt} failed (${isRateLimited ? 'rate limited' : 'overloaded'}), retrying in ${backoffMs}ms`
);
await new Promise((resolve) => setTimeout(resolve, backoffMs));
continue;
}
throw error;
}
}
throw new Error('Exhausted all retry attempts');
}
Key patterns:
- Exponential backoff for rate limits and server errors. Claude's API returns 429 (rate limited) and 529 (overloaded) — both are transient and worth retrying.
- Fail fast on other errors. A 400 (bad request) or 401 (auth error) won't get better with retries. Let them propagate.
- Cap the backoff. Without the Math.min, exponential backoff can reach absurd wait times.
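One refinement worth adding is jitter: if several ticket workers hit a rate limit at the same moment, deterministic backoff makes them all retry in lockstep and collide again. A capped full-jitter variant is a small change (constants here are assumptions):

```typescript
// Capped exponential backoff with full jitter: a random delay in
// [0, min(base * 2^attempt, cap)). Spreads retries across concurrent workers
// instead of having them all wake up at the same instant.
function backoffMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  const ceiling = Math.min(baseMs * 2 ** attempt, capMs);
  return Math.floor(Math.random() * ceiling);
}
```

Note that the SDK also retries transient errors on its own (the maxRetries client option), so tune your application-level retries and the SDK's together rather than stacking them blindly.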
Context management
As the conversation grows (each tool call adds to the message history), you'll eventually hit Claude's context window limit. For long-running agents, you need a strategy:
Summarization. After every N tool calls, ask Claude to summarize the conversation so far, then replace the message history with the summary. This keeps the context focused.
async function summarizeContext(
client: Anthropic,
messages: Anthropic.Messages.MessageParam[]
): Promise<string> {
const response = await client.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
{
role: 'user',
content: `Summarize this agent conversation in 2-3 paragraphs. Focus on what was done, what was learned, and what still needs to happen:\n\n${JSON.stringify(messages)}`,
},
],
});
const textBlock = response.content.find(
(block): block is Anthropic.Messages.TextBlock => block.type === 'text'
);
return textBlock?.text ?? '';
}
Sliding window. Keep only the last N messages plus the original system prompt. Simple but effective for tasks where earlier context becomes irrelevant.
Selective context. Before each API call, filter the message history to include only messages relevant to the current step. This requires more logic but gives the best results for complex workflows.
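A sliding window can be as small as the sketch below. The subtlety is that a tool_result block is only valid if the assistant message containing the matching tool_use also survives the cut, so the window should start on an assistant message rather than slicing blindly (slidingWindow is a hypothetical helper; Msg stands in for the SDK's MessageParam type):

```typescript
type Msg = { role: 'user' | 'assistant'; content: unknown };

// Keep the first message (the original task) plus roughly the last `keep`
// messages. The tail is extended backwards until it starts on an assistant
// message, so tool_use/tool_result pairs are never split apart.
function slidingWindow(messages: Msg[], keep: number): Msg[] {
  if (messages.length <= keep + 1) return messages;
  let start = messages.length - keep;
  while (start > 1 && messages[start].role !== 'assistant') start--;
  return [messages[0], ...messages.slice(start)];
}
```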
Human-in-the-loop controls
Not every decision should be automated. Production agents need breakpoints where a human can review, approve, or override.
interface HumanReviewRequest {
ticketId: string;
reason: string;
agentDraft: string;
classification: TicketClassification;
}
function requiresHumanApproval(result: AgentResult): boolean {
if (result.escalated) return true;
if (result.classification.confidence < 0.7) return true;
if (result.classification.sentiment === 'angry') return true;
return false;
}
In a real system, this feeds into a review queue (a simple database table or a tool like Linear/Jira). The human approves, edits, or rejects the draft, and the system sends the final response.
The important thing is that these checks happen after the agent runs, not before. Let the agent do its work, then gate the output. This gives you the speed of automation with the safety of human oversight.
Monitoring and observability
You can't improve what you can't measure. Track these metrics for your agent:
- Tool call sequence — What tools did the agent call, in what order? This reveals workflow patterns and anomalies.
- Token usage per ticket — How much is each ticket costing you? Flag tickets that consume unusually many tokens.
- Classification confidence distribution — If confidence is consistently low, your tools or system prompt need work.
- Escalation rate — If the agent escalates too often, it's not useful. If it never escalates, it might be overconfident.
- Iteration count — Agents that take many iterations to complete are either stuck or working on genuinely complex tasks. Both are worth investigating.
Log all of these as structured JSON and ship them to your observability stack (Datadog, Grafana, even a simple PostgreSQL table).
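A concrete shape for those structured logs might look like this (field names are assumptions; adapt them to your stack):

```typescript
interface AgentRunLog {
  ticketId: string;
  toolCalls: string[];
  iterationCount: number;
  inputTokens: number;
  outputTokens: number;
  confidence: number;
  escalated: boolean;
  durationMs: number;
}

// Emit one JSON line per processed ticket — easy to ship to Datadog, Grafana
// Loki, or a plain PostgreSQL jsonb column.
function toLogLine(log: AgentRunLog): string {
  return JSON.stringify({
    event: 'agent_run',
    timestamp: new Date().toISOString(),
    ...log,
  });
}
```

The per-call token counts come from usage.input_tokens and usage.output_tokens on each Messages API response; sum them across iterations of the loop to get the per-ticket totals.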
Conclusion
Building production AI agents with Claude is about more than just prompt engineering. It's about building a system: tools with clear contracts, an agentic loop with hard limits, error handling that respects API realities, context management that scales, and human oversight that catches mistakes.
The Claude API gives you the building blocks. The patterns in this article give you the scaffolding. The rest is your domain logic.
Start with a simple agent that does one thing well. Add tools incrementally. Monitor everything. And remember: the best agent is one that knows when to ask for help.




