LLM Tool Calling in Production: The Complete Developer Guide

From Chatbot to Actual Agent

A language model that only generates text is a chatbot. An LLM with tool calling is an agent.

Tool calling (also called function calling) lets the model decide to invoke a real function in your code — search a database, update a CRM, send an email, call an API — instead of just generating text.

This guide covers the basic pattern, response handling, argument validation, and the design principles that keep tool calling reliable in production.

The Basic Pattern

You define tools as JSON schemas and pass them to the API. The model decides when to use them.

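import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();

const tools = [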
  {
    type: "function",
    function: {
      name: "reschedule_call",
      description: "Reschedule the call when the lead asks to be called later",
      parameters: {
        type: "object",
        properties: {
          leadId: { type: "string", description: "The lead's unique ID" },
          datetime: { type: "string", description: "ISO 8601 datetime for the reschedule" },
          reason: { type: "string", enum: ["busy", "not_interested", "requested_later"] },
        },
        required: ["leadId", "datetime"],
      },
    },
  },
];

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages,
  tools,
  tool_choice: "auto", // Let the model decide when to call tools
});
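Besides "auto", the Chat Completions API also accepts a `tool_choice` object that names one function, which forces the model to call it. That can help when your application already knows which action is required. The helper below is a minimal sketch of building either request; `buildRequest` is a hypothetical name, not part of the OpenAI SDK.

```javascript
// Hypothetical helper: builds Chat Completions request params.
// Pass a tool name to force that tool; omit it to let the model decide.
function buildRequest(messages, tools, forcedToolName = null) {
  const request = { model: "gpt-4-turbo", messages, tools };

  if (forcedToolName === null) {
    request.tool_choice = "auto";
    return request;
  }

  // Fail early if the forced tool was never defined in `tools`.
  const exists = tools.some((tool) => tool.function.name === forcedToolName);
  if (!exists) {
    throw new Error(`Cannot force unknown tool: ${forcedToolName}`);
  }

  request.tool_choice = { type: "function", function: { name: forcedToolName } };
  return request;
}
```

The returned object can be passed straight to `openai.chat.completions.create(...)`.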

Handling the Tool Call Response

The response won't always be a text message. Check for tool calls first:

const message = response.choices[0].message;

if (message.tool_calls) {
  // Add the assistant's tool call request to the history once, before any results
  messages.push(message);

  for (const toolCall of message.tool_calls) {
    const toolName = toolCall.function.name;
    const toolArgs = JSON.parse(toolCall.function.arguments);

    // Execute the real function
    const result = await executeTool(toolName, toolArgs);

    // Add the result back to the conversation history, keyed to its tool call
    messages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    });
  }

  // Get the model's response after seeing the tool results
  const followUp = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages,
    tools,
  });
}

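This is one round of the tool execution loop, the foundation of every AI agent. The follow-up response can itself contain tool calls, so in production you repeat the round until the model replies with plain text.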
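Repeating that round until the model stops requesting tools might look like the sketch below. It assumes two injected functions, `callModel` and `executeTool`, plus a `maxRounds` cap; all three names are illustrative and not part of the OpenAI SDK. Injecting them keeps the loop testable without network access.

```javascript
// Minimal agent loop sketch. `callModel(messages)` is assumed to return the
// assistant message object; `executeTool(name, args)` returns a JSON-serializable result.
async function runAgentLoop({ callModel, executeTool, messages, maxRounds = 5 }) {
  for (let round = 0; round < maxRounds; round++) {
    const message = await callModel(messages);
    messages.push(message); // record the assistant turn exactly once

    // No tool calls means the model produced its final text answer.
    if (!message.tool_calls || message.tool_calls.length === 0) {
      return message.content;
    }

    // Answer every tool call before asking the model again.
    for (const toolCall of message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      const result = await executeTool(toolCall.function.name, args);
      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }
  }

  // The cap guards against a model that keeps calling tools forever.
  throw new Error(`Agent did not finish within ${maxRounds} rounds`);
}
```

In a real service, `callModel` would wrap `openai.chat.completions.create({ model, messages, tools })` and return `response.choices[0].message`.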

Validating Tool Arguments with Zod

Never trust the model's JSON arguments directly. Always validate:

import { z } from "zod";

const RescheduleSchema = z.object({
  leadId: z.string().min(1),
  datetime: z.string().datetime({ offset: true }), // accepts "Z" and offsets like +05:30
  reason: z.enum(["busy", "not_interested", "requested_later"]).optional(),
});

async function executeTool(toolName, rawArgs) {
  if (toolName === "reschedule_call") {
    const parsed = RescheduleSchema.safeParse(rawArgs);
    if (!parsed.success) {
      // Return a structured error instead of throwing
      return { success: false, error: "Invalid arguments", details: parsed.error.issues };
    }
    return await rescheduleCall(parsed.data);
  }

  // Unknown tool name: report it instead of silently returning undefined
  return { success: false, error: `Unknown tool: ${toolName}` };
}

Returning errors as tool results (instead of throwing) keeps the agent loop running. The model sees the error and can self-correct.
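The same idea extends to failures outside schema validation: the model can emit arguments that are not valid JSON, and the tool itself can throw. A minimal sketch of a wrapper that turns both into structured results follows. The name `safeExecute` and the error messages are assumptions for illustration.

```javascript
// Hypothetical wrapper: converts parse failures and thrown exceptions
// into structured tool results so the agent loop keeps running.
async function safeExecute(executeTool, toolName, rawArguments) {
  let args;
  try {
    args = JSON.parse(rawArguments); // the model may emit invalid JSON
  } catch (err) {
    return { success: false, error: "Arguments were not valid JSON", details: err.message };
  }

  try {
    return await executeTool(toolName, args);
  } catch (err) {
    // Unexpected failure inside the tool (network, database, bug).
    return { success: false, error: "Tool execution failed", details: err.message };
  }
}
```

Calling `safeExecute(executeTool, toolCall.function.name, toolCall.function.arguments)` in place of a bare `JSON.parse` plus `executeTool` means every outcome can be serialized back to the model as a tool message.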

5 Production Design Principles

1. Keep tools narrow and focused
One function = one clear action. Never build a do_everything tool.

2. Write descriptions like they're for a confused junior dev
The model uses your description to decide when and how to call the tool. Be explicit about when to use it AND when NOT to.

"description": "Reschedule the call ONLY when the lead explicitly asks for a callback at a later time. Do NOT use this if the lead is simply hesitant or asking for more info."

3. Validate all arguments with Zod before execution
The model generates the arguments, so they can be malformed, missing, or the wrong type.

4. Log every tool call with full arguments and results
This is your audit trail and debugging lifeline. You cannot debug an agent without these logs.

5. Handle tool failures gracefully
Return structured errors. Give the model a chance to retry or choose a different path. Don't crash the agent loop on a single tool failure.
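For principle 4, one option is to wrap the executor so every call is recorded with its arguments, result or error, and duration. This is a sketch under assumptions: `withToolLogging` is a hypothetical name, and `log` can be any function that accepts one record object (`console.log` by default; a real system would likely send records to a structured logger).

```javascript
// Hypothetical logging wrapper around any `executeTool(toolName, args)` function.
function withToolLogging(executeTool, log = console.log) {
  return async function loggedExecuteTool(toolName, args) {
    const startedAt = Date.now();
    const record = { tool: toolName, args };
    try {
      record.result = await executeTool(toolName, args);
      return record.result;
    } catch (err) {
      record.error = err.message;
      throw err; // error handling stays with the caller
    } finally {
      // Runs on both success and failure, so no call goes unlogged.
      record.durationMs = Date.now() - startedAt;
      log(record);
    }
  };
}
```

Because the wrapper keeps the same signature as the function it wraps, it can be dropped in anywhere `executeTool` is used.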

Real Example: Calling Agent Tool Suite

In my AI voice platform, the agent has exactly 4 tools:

Tool	Trigger	Action
`mark_qualified`	Lead shows clear interest	Store score + flag for sales team
`mark_not_interested`	Lead declines	End call politely
`reschedule_call`	Lead asks for callback	Re-queue with datetime
`transfer_to_human`	Lead is hot or frustrated	Hand off via Twilio

Four tools. One job each. Zero ambiguity.
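A suite like this can be wired up as a dispatch table that maps each tool name to a single handler. The sketch below uses the four tool names from the table. The handler bodies, the `action` labels, and the `score` argument are placeholders for illustration, not the platform's actual implementation.

```javascript
// Placeholder handlers: real ones would update the CRM, the call queue, or Twilio.
const toolHandlers = new Map([
  ["mark_qualified", async ({ leadId, score }) =>
    ({ success: true, action: "flagged_for_sales", leadId, score })],
  ["mark_not_interested", async ({ leadId }) =>
    ({ success: true, action: "call_ended", leadId })],
  ["reschedule_call", async ({ leadId, datetime }) =>
    ({ success: true, action: "requeued", leadId, datetime })],
  ["transfer_to_human", async ({ leadId }) =>
    ({ success: true, action: "transferred", leadId })],
]);

// One lookup, one handler: unknown names come back as structured errors.
async function dispatchTool(toolName, args) {
  const handler = toolHandlers.get(toolName);
  if (!handler) {
    return { success: false, error: `Unknown tool: ${toolName}` };
  }
  return handler(args);
}
```

Adding a fifth tool then means adding one entry to the map, which keeps each tool's responsibility visible in one place.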

My AI engineering projects: buildbysandeep.dev/projects