Prompt Engineering Patterns That Actually Work in Production

7 prompt engineering patterns I use in production AI systems — three-layer prompts, chain of thought, Zod output constraints, persona anchoring, context compression, temperature tuning, and prompt versioning.

Sandeep Prajapati

Full Stack Developer · Ambit Global

July 30, 2025 · 9 min read

Stop Winging Your Prompts

Most developers treat prompt engineering as trial and error. Write something, see if it works, tweak it, repeat. That's fine for a demo.

In production — where the prompt runs 10,000 times a day and your business depends on it — you need patterns.

Here are the 7 patterns I use across every AI system I've built.


Pattern 1: The Three-Layer Prompt

Never put everything in one system message. Separate concerns:

const messages = [
  {
    role: "system",
    content: AGENT_PERSONA, // Who the agent is, its personality, rules
  },
  {
    role: "system",
    content: buildContextPrompt(lead, campaign), // Dynamic data for this request
  },
  ...conversationHistory, // The actual conversation
  {
    role: "user",
    content: currentUserMessage,
  },
];

Why it works: Each layer has a clear job. Persona is stable (rarely changes). Context is dynamic. History is append-only. When a response goes wrong, you can inspect each layer in isolation instead of untangling one giant prompt.
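The snippet above calls buildContextPrompt without showing it. Here is a minimal sketch of what such a helper might look like. The field names on lead and campaign (name, company, goal) are assumptions for illustration, not the real data model:

```javascript
// Hypothetical helper: the fields on `lead` and `campaign` are assumed for illustration.
function buildContextPrompt(lead, campaign) {
  const lines = [
    "CONTEXT FOR THIS CALL",
    `Lead name: ${lead.name ?? "unknown"}`, // Fall back when a field is missing
    `Company: ${lead.company ?? "unknown"}`,
    `Campaign: ${campaign.name}`,
    `Goal: ${campaign.goal}`,
  ];
  return lines.join("\n");
}

// Example usage with made-up data
const contextPrompt = buildContextPrompt(
  { name: "Priya", company: "Acme" },
  { name: "Q3 outreach", goal: "Book a demo" }
);
console.log(contextPrompt);
```

Keeping this in its own function means the dynamic layer can be logged and unit-tested without touching the persona text.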


Pattern 2: Chain of Thought for Complex Decisions

For decisions with multiple conditions (should I qualify this lead, transfer, or reschedule?), ask the model to reason before answering:

const DECISION_PROMPT = `
Analyze this conversation and make a qualification decision.

Think step by step:
1. What signals of interest did the lead show?
2. Were there any objections? How were they handled?
3. Did the lead give a clear intent signal?

Then respond with JSON: { "reasoning": "...", "decision": "qualified|not_interested|reschedule", "score": 1-10 }
`;

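The reasoning field tends to improve decision accuracy, even if you never show it to users. Keep "reasoning" as the first key in the JSON so the model writes out its analysis before it commits to a decision.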
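The model's reply still has to be checked before you act on it. Below is a dependency-free sketch of that check, assuming the reply arrives as a raw JSON string. The name parseDecision is hypothetical, and the rules simply mirror the format requested in DECISION_PROMPT:

```javascript
const ALLOWED_DECISIONS = ["qualified", "not_interested", "reschedule"];

// Returns the validated decision object, or null if the model output is unusable.
function parseDecision(rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch {
    return null; // Not valid JSON
  }
  if (data === null || typeof data !== "object") return null;

  const validDecision = ALLOWED_DECISIONS.includes(data.decision);
  const validScore =
    Number.isInteger(data.score) && data.score >= 1 && data.score <= 10;
  if (typeof data.reasoning !== "string" || !validDecision || !validScore) {
    return null;
  }
  return data;
}

// Example usage with a hand-written reply
const decision = parseDecision(
  '{"reasoning": "Asked for pricing twice", "decision": "qualified", "score": 8}'
);
console.log(decision ? decision.decision : "fallback");
```

Returning null rather than throwing lets the caller decide on a fallback, such as retrying or routing the lead to a human.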


Pattern 3: Force Structured Output with Zod

Never parse free-form text when you need structured data. Force JSON output and validate it:

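import OpenAI from "openai";
import { z } from "zod";

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

const schema = z.object({
  response: z.string().min(1).max(300),
  leadScore: z.number().int().min(1).max(10),
  nextAction: z.enum(["continue", "qualify", "reschedule", "transfer", "end"]),
});

async function getAgentResponse(messages) {
  // JSON mode requires the word "JSON" to appear somewhere in the messages
  const raw = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages,
    response_format: { type: "json_object" }, // Force JSON mode
  });

  try {
    // JSON.parse can throw (for example on truncated output), so keep it inside the try
    const parsed = schema.safeParse(JSON.parse(raw.choices[0].message.content));
    if (parsed.success) return parsed.data;
  } catch {
    // Fall through to the fallback below
  }
  return getDefaultAgentResponse(); // Graceful fallback
}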

Pattern 4: Persona Anchoring with DO / DON'T

Vague personas produce inconsistent agents. Be explicit with concrete examples:

You are Alex, a friendly and professional sales development representative for [Company].

DO:
- Start every call by introducing yourself clearly: "Hi, this is Alex from [Company]..."
- Use the lead's name when you have it
- Ask one question at a time
- Sound natural and human — use contractions ("I'm" not "I am")

DON'T:
- Reveal that you are an AI unless directly asked
- Make promises about pricing or delivery times
- Keep talking if the lead says "not interested" — acknowledge and end professionally
- Use filler phrases like "Certainly!" or "Absolutely!" — they sound robotic

In my experience, DO/DON'T lists with concrete examples produce more consistent behavior than abstract instructions.
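If you maintain several personas, it can help to assemble the prompt from data instead of hand-editing one long string. This is a minimal sketch; the function name buildPersonaPrompt and its argument shape are assumptions for illustration:

```javascript
// Hypothetical helper: builds a persona prompt in the DO / DON'T layout shown above.
function buildPersonaPrompt({ intro, dos, donts }) {
  const bullet = (items) => items.map((item) => `- ${item}`).join("\n");
  return `${intro}\n\nDO:\n${bullet(dos)}\n\nDON'T:\n${bullet(donts)}`;
}

// Example usage with two rules taken from the persona above
const personaPrompt = buildPersonaPrompt({
  intro: "You are Alex, a friendly and professional sales development representative for [Company].",
  dos: ["Use the lead's name when you have it", "Ask one question at a time"],
  donts: ["Make promises about pricing or delivery times"],
});
console.log(personaPrompt);
```

Storing the rules as arrays also makes it easy to add or remove a single rule and compare results between versions.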


Pattern 5: Context Compression (Sliding Window)

Conversation history grows with every turn. At 50 tokens per turn and 20 turns, you're sending 1,000 tokens of history with a single request, and because the full history is resent on every request, the total keeps climbing.

My solution: Keep the last 4 turns verbatim, and compress earlier turns into a summary:

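// Each element of `history` is one message ({ role, content }).
// summarize() is assumed to be a synchronous helper that returns a short string;
// if yours calls a model, make this function async and await it.
function compressHistory(history, maxRecentTurns = 4) {
  if (history.length <= maxRecentTurns) return history;

  const old = history.slice(0, history.length - maxRecentTurns);
  const recent = history.slice(-maxRecentTurns);

  const summary = `[Earlier conversation summary: ${summarize(old)}]`;

  return [
    { role: "system", content: summary },
    ...recent,
  ];
}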

This keeps context cost bounded while preserving the gist of the earlier conversation. The summary is lossy, so critical details have to survive summarization.
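To see what the window saves, here is a rough calculation using the 50-tokens-per-turn figure from above. The 100-token summary size is an assumption, and the sketch ignores the cost of the summarization calls themselves:

```javascript
const TOKENS_PER_TURN = 50; // Figure used in the example above
const SUMMARY_TOKENS = 100; // Assumed size of the compressed summary

// History tokens sent with a single request after `turns` turns.
function historyTokens(turns, maxRecentTurns = Infinity, summaryTokens = 0) {
  const verbatim = Math.min(turns, maxRecentTurns) * TOKENS_PER_TURN;
  const summary = turns > maxRecentTurns ? summaryTokens : 0;
  return verbatim + summary;
}

// Total history tokens sent across every request in a conversation.
function totalOverCall(totalTurns, maxRecentTurns, summaryTokens) {
  let total = 0;
  for (let turn = 1; turn <= totalTurns; turn++) {
    total += historyTokens(turn, maxRecentTurns, summaryTokens);
  }
  return total;
}

const uncompressed = totalOverCall(20); // 50 * (1 + 2 + ... + 20) = 10500
const compressed = totalOverCall(20, 4, SUMMARY_TOKENS); // 500 + 16 * 300 = 5300
console.log({ uncompressed, compressed });
```

Without compression the per-request history keeps growing (1,000 tokens at turn 20). With a window of 4 it never exceeds 300 tokens under these assumptions, however long the call runs.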


Pattern 6: Temperature Tuning Table

Temperature isn't a vibe — it's a parameter that controls output randomness. Use it intentionally:

Use Case                      Temperature  Reasoning
Lead qualification decisions  0.1          Needs consistency
Script-following AI agent     0.2          Some flexibility, mostly consistent
Customer support responses    0.4          Natural variation but accurate
Creative email subject lines  0.8          Explore diverse options
Brainstorming, ideation       1.0          Max creativity

In Calling Agent, the agent runs at 0.2 during calls. The post-call summary runs at 0.5 for more natural language.
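One way to keep these values from drifting is to hold them in a single lookup instead of scattering literals across API calls. This is a sketch; the task keys and the temperatureFor helper are hypothetical names, and the values come from the table above:

```javascript
// Temperatures from the table above, keyed by hypothetical task names.
const TEMPERATURE_BY_TASK = new Map([
  ["lead_qualification", 0.1],
  ["script_agent", 0.2],
  ["customer_support", 0.4],
  ["email_subject_lines", 0.8],
  ["brainstorming", 1.0],
]);

// Fails loudly on unknown tasks instead of silently using a default.
function temperatureFor(task) {
  if (!TEMPERATURE_BY_TASK.has(task)) {
    throw new Error(`No temperature configured for task "${task}"`);
  }
  return TEMPERATURE_BY_TASK.get(task);
}

// Example usage: pass the result as the `temperature` option of a completion request
const requestOptions = { temperature: temperatureFor("script_agent") };
console.log(requestOptions);
```

Throwing on an unknown task is a deliberate choice here: a typo in a task name surfaces in testing instead of quietly running at the wrong temperature.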


Pattern 7: Prompt Versioning in the Database

Never hardcode prompts in your source code. Store them in your database:

// MongoDB schema (Mongoose)
import mongoose from "mongoose";
const { Schema } = mongoose;

const PromptSchema = new Schema({
  name: String,          // "agent_system_prompt"
  version: Number,       // 14
  content: String,       // The actual prompt text
  isActive: Boolean,     // Only one version active at a time
  createdAt: Date,
  metrics: {
    qualificationRate: Number,
    avgScore: Number,
    callsRun: Number,
  },
});

const Prompt = mongoose.model("Prompt", PromptSchema);

// Fetch active prompt at runtime (inside an async function or an ES module)
const prompt = await Prompt.findOne({ name: "agent_system_prompt", isActive: true });

Benefits:

  • A/B test prompts without deploying code
  • Roll back a bad prompt in seconds
  • Track performance metrics per prompt version
  • Non-technical teammates can update prompts
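Rolling back means flipping isActive so that exactly one version of a prompt is live. The sketch below shows that rule on plain in-memory objects. The helper name activateVersion is hypothetical, and a real MongoDB setup would do the same with updates against the collection, ideally applied atomically:

```javascript
// Returns a new list in which only the requested version of `name` is active.
// Prompts with other names are left untouched.
function activateVersion(prompts, name, version) {
  const exists = prompts.some((p) => p.name === name && p.version === version);
  if (!exists) throw new Error(`${name} v${version} not found`);
  return prompts.map((p) =>
    p.name === name ? { ...p, isActive: p.version === version } : p
  );
}

// Example: version 14 misbehaves, so roll back to version 13
const before = [
  { name: "agent_system_prompt", version: 13, isActive: false },
  { name: "agent_system_prompt", version: 14, isActive: true },
  { name: "summary_prompt", version: 2, isActive: true },
];
const after = activateVersion(before, "agent_system_prompt", 13);
console.log(after.filter((p) => p.isActive).map((p) => `${p.name} v${p.version}`));
```

Checking that the target version exists before changing anything avoids the failure mode where a typo deactivates every version of a prompt.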

All production AI patterns from my portfolio: buildbysandeep.dev

Written by

Sandeep Prajapati

Full Stack Developer with 3+ years experience. Building enterprise AI systems, real-time platforms, and mobile apps. Currently at Ambit Global Solution.