Prompt Engineering Patterns That Actually Work in Production

Stop Winging Your Prompts

Most developers treat prompt engineering as trial and error. Write something, see if it works, tweak it, repeat. That's fine for a demo.

In production — where the prompt runs 10,000 times a day and your business depends on it — you need patterns.

Here are the 7 patterns I use across every AI system I've built.

Pattern 1: The Three-Layer Prompt

Never put everything in one system message. Separate concerns:

const messages = [
  {
    role: "system",
    content: AGENT_PERSONA, // Who the agent is, its personality, rules
  },
  {
    role: "system",
    content: buildContextPrompt(lead, campaign), // Dynamic data for this request
  },
  ...conversationHistory, // The actual conversation
  {
    role: "user",
    content: currentUserMessage,
  },
];

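Why it works: Each layer has a clear job. The persona is stable and rarely changes. The context is rebuilt for every request. The history is append-only. When a response goes wrong, you can inspect each layer on its own and see which one caused it, which makes debugging much easier.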
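
The snippet above calls buildContextPrompt(lead, campaign) without showing it. Here is a minimal sketch of what such a helper might look like. The field names (lead.name, lead.company, campaign.product, campaign.goal) are assumptions for illustration, not the actual data model.

```javascript
// Hypothetical sketch: the field names below are illustrative assumptions.
function buildContextPrompt(lead, campaign) {
  const lines = [
    "CONTEXT FOR THIS CALL",
    `Lead name: ${lead.name ?? "unknown"}`,
    `Lead company: ${lead.company ?? "unknown"}`,
    `Product: ${campaign.product}`,
    `Call goal: ${campaign.goal}`,
  ];
  return lines.join("\n");
}

// Example values are made up for the demo
const contextPrompt = buildContextPrompt(
  { name: "Priya", company: undefined },
  { product: "Acme CRM", goal: "Book a 15-minute demo" }
);
console.log(contextPrompt);
```

Missing fields render as "unknown" rather than "undefined", so the model never sees a JavaScript artifact in its context.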

Pattern 2: Chain of Thought for Complex Decisions

For decisions with multiple conditions (should I qualify this lead, mark them as not interested, or reschedule?), ask the model to reason before answering:

const DECISION_PROMPT = `
Analyze this conversation and make a qualification decision.

Think step by step:
1. What signals of interest did the lead show?
2. Were there any objections? How were they handled?
3. Did the lead give a clear intent signal?

Then respond with JSON: { "reasoning": "...", "decision": "qualified|not_interested|reschedule", "score": 1-10 }
`;

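The reasoning field tends to improve decision accuracy even if you never show it to users. Keep it first in the JSON: the model generates tokens in order, so reasoning written before the decision can inform it, while reasoning written after the decision can only justify it.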
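
One way to consume that response, as a sketch: parse the JSON, check the decision against the three values the prompt allows, and return null when anything is off so the caller can decide what to do. The name parseDecision and the null-on-failure convention are assumptions.

```javascript
const ALLOWED_DECISIONS = ["qualified", "not_interested", "reschedule"];

// Returns the parsed decision object, or null if the model output is unusable.
function parseDecision(rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch {
    return null; // Not valid JSON
  }
  if (data === null || typeof data !== "object") return null;

  const { reasoning, decision, score } = data;
  const validScore = Number.isInteger(score) && score >= 1 && score <= 10;
  if (typeof reasoning !== "string") return null;
  if (!ALLOWED_DECISIONS.includes(decision)) return null;
  if (!validScore) return null;

  return { reasoning, decision, score };
}

const ok = parseDecision(
  '{"reasoning":"Asked for pricing and a demo.","decision":"qualified","score":8}'
);
const bad = parseDecision('{"decision":"maybe","score":11}');
console.log(ok.decision, bad); // qualified null
```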

Pattern 3: Force Structured Output with Zod

Never parse free-form text when you need structured data. Force JSON output and validate it:

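import OpenAI from "openai";
import { z } from "zod";

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

const schema = z.object({
  response: z.string().min(1).max(300),
  leadScore: z.number().int().min(1).max(10),
  nextAction: z.enum(["continue", "qualify", "reschedule", "transfer", "end"]),
});

// getAgentResponse is an illustrative wrapper name
async function getAgentResponse(messages) {
  // JSON mode requires the word "JSON" to appear somewhere in the messages
  const raw = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages,
    response_format: { type: "json_object" }, // Force JSON mode
  });

  let data;
  try {
    data = JSON.parse(raw.choices[0].message.content);
  } catch {
    return getDefaultAgentResponse(); // Truncated or malformed JSON
  }

  const parsed = schema.safeParse(data);
  if (!parsed.success) {
    return getDefaultAgentResponse(); // Graceful fallback
  }
  return parsed.data;
}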
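
Falling back on the first failure can be more conservative than necessary. A sketch of one alternative is to retry a bounded number of times before giving up. Here callModel and validate are injected, hypothetical functions (in the setup above they would wrap the OpenAI call and schema.safeParse). The name withRetry and the default of 2 attempts are assumptions.

```javascript
// Hypothetical helper: retries a model call until validate() accepts the output.
// callModel: async () => string (raw model text)
// validate: (data) => { success, data }, the same shape Zod's safeParse returns
async function withRetry(callModel, validate, fallback, maxAttempts = 2) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = validate(JSON.parse(await callModel()));
      if (result.success) return result.data;
    } catch {
      // Malformed JSON or a failed request: fall through and try again
    }
  }
  return fallback; // Every attempt failed
}

// Usage with a stubbed model that fails once, then returns valid JSON
const replies = ["oops, not JSON", '{"nextAction":"continue"}'];
const fakeModel = async () => replies.shift();
const check = (d) =>
  typeof d.nextAction === "string" ? { success: true, data: d } : { success: false };

withRetry(fakeModel, check, { nextAction: "transfer" }).then((r) => console.log(r));
```

Each retry costs another model call, so keep maxAttempts small on latency-sensitive paths such as live calls.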

Pattern 4: Persona Anchoring with DO / DON'T

Vague personas produce inconsistent agents. Be explicit with concrete examples:

You are Alex, a friendly and professional sales development representative for [Company].

DO:
- Start every call by introducing yourself clearly: "Hi, this is Alex from [Company]..."
- Use the lead's name when you have it
- Ask one question at a time
- Sound natural and human — use contractions ("I'm" not "I am")

DON'T:
- Reveal that you are an AI unless directly asked
- Make promises about pricing or delivery times
- Keep talking if the lead says "not interested" — acknowledge and end professionally
- Use filler phrases like "Certainly!" or "Absolutely!" — they sound robotic

DO/DON'T lists with concrete examples tend to outperform abstract instructions such as "be professional".
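
If the rules live in arrays rather than in one long string, individual rules are easier to review and diff. A sketch, assuming a hypothetical buildPersonaPrompt helper that renders the same layout as the persona above:

```javascript
// Hypothetical helper: renders a persona plus DO / DON'T rules into prompt text.
function buildPersonaPrompt({ intro, dos, donts }) {
  const bullets = (items) => items.map((item) => `- ${item}`).join("\n");
  return `${intro}\n\nDO:\n${bullets(dos)}\n\nDON'T:\n${bullets(donts)}`;
}

const AGENT_PERSONA = buildPersonaPrompt({
  intro:
    "You are Alex, a friendly and professional sales development representative for [Company].",
  dos: ["Use the lead's name when you have it", "Ask one question at a time"],
  donts: ["Make promises about pricing or delivery times"],
});
console.log(AGENT_PERSONA);
```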

Pattern 5: Context Compression (Sliding Window)

Conversation history grows with every turn. At 50 tokens per turn and 20 turns, that's 1,000 tokens of history resent with every request, and it keeps growing.

My solution: Keep the last 4 turns verbatim, and compress earlier turns into a summary:

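// summarize() is assumed to be your own helper (for example, a cheap LLM call)
// that turns an array of messages into a short text summary.
async function compressHistory(history, maxRecentTurns = 4) {
  // Note: this counts messages, so 4 means the last 4 entries in the array
  if (history.length <= maxRecentTurns) return history;

  const splitAt = history.length - maxRecentTurns;
  const old = history.slice(0, splitAt);
  const recent = history.slice(splitAt); // Also correct when maxRecentTurns is 0

  // await works whether summarize() is synchronous or returns a Promise
  const summary = `[Earlier conversation summary: ${await summarize(old)}]`;

  return [
    { role: "system", content: summary },
    ...recent,
  ];
}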

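This keeps context cost bounded while preserving the gist of the earlier conversation. The summary is lossy, so make sure it carries forward anything the agent must not forget, such as the lead's name or an objection they raised.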
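
To see the bound, it can help to estimate tokens before and after compression. A rough sketch: the 4-characters-per-token ratio is a common rule of thumb for English text, not an exact count, and estimateTokens is a hypothetical helper. Use a real tokenizer if you need precise numbers.

```javascript
// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// 20 messages of 200 characters each, roughly 50 tokens per message
const history = Array.from({ length: 20 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: "x".repeat(200),
}));

// Stand-in for the compressed form: a 200-character summary plus the last 4 messages
const compressed = [
  { role: "system", content: "s".repeat(200) },
  ...history.slice(-4),
];

console.log(estimateTokens(history));    // 1000
console.log(estimateTokens(compressed)); // 250
```

However long the call runs, the compressed form stays near the size of one summary plus four messages.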

Pattern 6: Temperature Tuning Table

Temperature isn't a vibe — it's a parameter that controls output randomness. Use it intentionally:

Use Case	Temperature	Reasoning
Lead qualification decisions	0.1	Needs consistency
Script-following AI agent	0.2	Some flexibility, mostly consistent
Customer support responses	0.4	Natural variation but accurate
Creative email subject lines	0.8	Explore diverse options
Brainstorming, ideation	1.0	Max creativity

In Calling Agent, the agent runs at 0.2 during calls. The post-call summary runs at 0.5 for more natural language.
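
One way to keep these settings intentional is to put them in a single lookup table instead of scattering literals across call sites. A sketch: the task keys and the temperatureFor helper are hypothetical names, while the values come from the table above.

```javascript
// Values from the table above; the keys are illustrative names.
const TEMPERATURE_BY_TASK = Object.freeze({
  leadQualification: 0.1,
  scriptedAgent: 0.2,
  customerSupport: 0.4,
  emailSubjectLines: 0.8,
  brainstorming: 1.0,
});

// Throws on unknown tasks so a typo never silently falls back to a default.
function temperatureFor(task) {
  if (!Object.prototype.hasOwnProperty.call(TEMPERATURE_BY_TASK, task)) {
    throw new Error(`No temperature configured for task "${task}"`);
  }
  return TEMPERATURE_BY_TASK[task];
}

console.log(temperatureFor("scriptedAgent")); // 0.2
```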

Pattern 7: Prompt Versioning in the Database

Never hardcode prompts in your source code. Store them in your database:

import mongoose from "mongoose";
const { Schema } = mongoose;

// Mongoose schema (MongoDB)
const PromptSchema = new Schema({
  name: String,          // "agent_system_prompt"
  version: Number,       // 14
  content: String,       // The actual prompt text
  isActive: Boolean,     // Only one version active at a time (your code must enforce this)
  createdAt: Date,
  metrics: {
    qualificationRate: Number,
    avgScore: Number,
    callsRun: Number,
  },
});

const Prompt = mongoose.model("Prompt", PromptSchema);

// Fetch active prompt at runtime
const prompt = await Prompt.findOne({ name: "agent_system_prompt", isActive: true });

Benefits:

- A/B test prompts without deploying code
- Roll back a bad prompt in seconds
- Track performance metrics per prompt version
- Non-technical teammates can update prompts
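
Rolling back amounts to flipping isActive so that exactly one version of a given prompt is active. Here is a sketch of that logic on a plain in-memory array, standing in for the database. activateVersion is a hypothetical name. Against MongoDB, the same change would be made with update queries.

```javascript
// In-memory stand-in for the prompts collection (assumption for illustration).
// Returns a new array in which only the requested version of `name` is active.
function activateVersion(prompts, name, version) {
  const exists = prompts.some((p) => p.name === name && p.version === version);
  if (!exists) throw new Error(`No version ${version} of "${name}"`);

  return prompts.map((p) =>
    p.name === name ? { ...p, isActive: p.version === version } : p
  );
}

const prompts = [
  { name: "agent_system_prompt", version: 13, isActive: false },
  { name: "agent_system_prompt", version: 14, isActive: true },
];

// Version 14 misbehaves: roll back to 13
const rolledBack = activateVersion(prompts, "agent_system_prompt", 13);
console.log(rolledBack.filter((p) => p.isActive).map((p) => p.version)); // [ 13 ]
```

Checking that the target version exists before changing anything avoids ending up with no active prompt at all.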

All production AI patterns from my portfolio: buildbysandeep.dev