Stop Winging Your Prompts
Most developers treat prompt engineering as trial and error. Write something, see if it works, tweak it, repeat. That's fine for a demo.
In production — where the prompt runs 10,000 times a day and your business depends on it — you need patterns.
Here are the 7 patterns I use across every AI system I've built.
Pattern 1: The Three-Layer Prompt
Never put everything in one system message. Separate concerns:
const messages = [
  {
    role: "system",
    content: AGENT_PERSONA, // Who the agent is, its personality, rules
  },
  {
    role: "system",
    content: buildContextPrompt(lead, campaign), // Dynamic data for this request
  },
  ...conversationHistory, // The actual conversation
  {
    role: "user",
    content: currentUserMessage,
  },
];
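Why it works: Each layer has a clear job. The persona is stable and rarely changes, the context is rebuilt for every request, and the history is append-only. When a response goes wrong, you can tell which layer caused it instead of digging through one giant prompt.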
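To make the dynamic layer concrete, here is a minimal sketch of what buildContextPrompt might look like. The field names (lead.name, lead.company, campaign.name, campaign.goal) and the sample values are assumptions for illustration, not the shape of any real data model.

```javascript
// Hypothetical sketch of buildContextPrompt; every field name here is assumed.
function buildContextPrompt(lead, campaign) {
  const lines = [
    "Context for this call:",
    `- Lead name: ${lead.name ?? "unknown"}`,
    `- Company: ${lead.company ?? "unknown"}`,
    `- Campaign: ${campaign.name}`,
    `- Goal: ${campaign.goal}`,
  ];
  return lines.join("\n");
}

// Example call with made-up data; missing fields fall back to "unknown".
const contextPrompt = buildContextPrompt(
  { name: "Priya", company: null },
  { name: "Q3 outreach", goal: "Book a 15-minute demo" }
);
console.log(contextPrompt);
```

Because the function only formats data, you can unit test it without calling a model at all.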
Pattern 2: Chain of Thought for Complex Decisions
For decisions with multiple conditions (should I qualify this lead, transfer, or reschedule?), ask the model to reason before answering:
const DECISION_PROMPT = `
Analyze this conversation and make a qualification decision.
Think step by step:
1. What signals of interest did the lead show?
2. Were there any objections? How were they handled?
3. Did the lead give a clear intent signal?
Then respond with JSON: { "reasoning": "...", "decision": "qualified|not_interested|reschedule", "score": 1-10 }
`;
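Keep the reasoning field first in the JSON so the model writes out its reasoning before it commits to a decision. Even if you never show that field to users, in my experience it makes the decisions noticeably more accurate, and it gives you something to inspect when a call is misclassified.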
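The model's reply still has to be checked before you act on it. The following is a dependency-free sketch of that check, assuming the reply arrives as a raw string; parseDecision and ALLOWED_DECISIONS are hypothetical names introduced here.

```javascript
// The three decisions requested in DECISION_PROMPT.
const ALLOWED_DECISIONS = ["qualified", "not_interested", "reschedule"];

// Returns the parsed decision, or null if the model's output does not match
// the shape requested in DECISION_PROMPT.
function parseDecision(rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch {
    return null; // Not valid JSON
  }
  if (data === null || typeof data !== "object") return null;
  const { reasoning, decision, score } = data;
  if (typeof reasoning !== "string") return null;
  if (!ALLOWED_DECISIONS.includes(decision)) return null;
  if (!Number.isInteger(score) || score < 1 || score > 10) return null;
  return { reasoning, decision, score };
}

// Example with a made-up reply.
const sample = parseDecision(
  '{"reasoning":"Asked about pricing twice.","decision":"qualified","score":8}'
);
console.log(sample.decision);
```

Returning null rather than throwing leaves the caller to decide whether to retry, fall back, or flag the call for review.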
Pattern 3: Force Structured Output with Zod
Never parse free-form text when you need structured data. Force JSON output and validate it:
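import OpenAI from "openai";
import { z } from "zod";
const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment
const schema = z.object({
  response: z.string().min(1).max(300),
  leadScore: z.number().int().min(1).max(10),
  nextAction: z.enum(["continue", "qualify", "reschedule", "transfer", "end"]),
});
async function getAgentResponse(messages) {
  const raw = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages, // JSON mode requires the word "JSON" to appear somewhere in the messages
    response_format: { type: "json_object" }, // Force JSON mode
  });
  let json;
  try {
    json = JSON.parse(raw.choices[0].message.content);
  } catch {
    return getDefaultAgentResponse(); // Truncated or malformed JSON
  }
  const parsed = schema.safeParse(json);
  if (!parsed.success) {
    return getDefaultAgentResponse(); // Graceful fallback
  }
  return parsed.data; // Typed, validated result
}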
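The fallback is worth a moment of thought too. Below is one possible shape for getDefaultAgentResponse, assuming you want the fallback to satisfy the same schema so downstream code never has to special-case it. The wording and values are assumptions, not the original implementation.

```javascript
// Hypothetical fallback; the wording and values are assumptions.
function getDefaultAgentResponse() {
  return {
    response: "Sorry, could you say that again?",
    leadScore: 1, // Lowest score, so a failed parse never inflates a lead
    nextAction: "continue", // Keep the conversation going and try again
  };
}

const fallback = getDefaultAgentResponse();
console.log(fallback.nextAction);
```

It is also worth logging every fallback, since a rising fallback rate is often the first sign that a prompt change broke the output format.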
Pattern 4: Persona Anchoring with DO / DON'T
Vague personas produce inconsistent agents. Be explicit with concrete examples:
You are Alex, a friendly and professional sales development representative for [Company].
DO:
- Start every call by introducing yourself clearly: "Hi, this is Alex from [Company]..."
- Use the lead's name when you have it
- Ask one question at a time
- Sound natural and human — use contractions ("I'm" not "I am")
DON'T:
- Reveal that you are an AI unless directly asked
- Make promises about pricing or delivery times
- Keep talking if the lead says "not interested" — acknowledge and end professionally
- Use filler phrases like "Certainly!" or "Absolutely!" — they sound robotic
In my experience, DO/DON'T lists with concrete examples produce more consistent behavior than abstract instructions like "be professional."
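If the rules live in arrays rather than one long string, they are easier to review and diff. Here is a small sketch of assembling the persona from parts; buildPersonaPrompt is a hypothetical helper, and the rules are a shortened copy of the list above.

```javascript
// Hypothetical helper: renders an intro plus DO / DON'T bullet lists.
function buildPersonaPrompt({ intro, dos, donts }) {
  const bullets = (items) => items.map((item) => `- ${item}`).join("\n");
  return [intro, "DO:", bullets(dos), "DON'T:", bullets(donts)].join("\n");
}

const AGENT_PERSONA = buildPersonaPrompt({
  intro:
    "You are Alex, a friendly and professional sales development representative for [Company].",
  dos: ["Use the lead's name when you have it", "Ask one question at a time"],
  donts: ["Make promises about pricing or delivery times"],
});
console.log(AGENT_PERSONA);
```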
Pattern 5: Context Compression (Sliding Window)
Conversation history grows with every turn, and you resend all of it on every request. At 50 tokens per turn, a 20-turn conversation costs 1,000 tokens of history per request, and that number keeps growing.
My solution: Keep the last 4 turns verbatim, and compress earlier turns into a summary:
// summarize() is a helper you supply (for example, a cheap model call).
// If it is async, make this function async and await the result.
function compressHistory(history, maxRecentTurns = 4) {
  if (history.length <= maxRecentTurns) return history;
  const old = history.slice(0, history.length - maxRecentTurns);
  const recent = history.slice(-maxRecentTurns);
  const summary = `[Earlier conversation summary: ${summarize(old)}]`;
  return [
    { role: "system", content: summary },
    ...recent,
  ];
}
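This keeps context cost roughly bounded, as long as the summary itself has a length cap. Summaries are lossy, so make sure the summarizer preserves what the agent still needs: names, objections, and commitments already made.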
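To see the effect on a concrete history, here is a self-contained version you can run. The summarize function below is a naive stand-in that joins and truncates the old turns; it is an assumption made only so the example runs on its own, and a real summarizer would more likely be a model call.

```javascript
// Naive stand-in for summarize(): joins the old turns and caps the length.
function summarize(turns, maxChars = 200) {
  const text = turns.map((t) => `${t.role}: ${t.content}`).join(" | ");
  return text.length > maxChars ? text.slice(0, maxChars - 3) + "..." : text;
}

// Same sliding-window logic as above: summary first, recent turns verbatim.
function compressHistory(history, maxRecentTurns = 4) {
  if (history.length <= maxRecentTurns) return history;
  const old = history.slice(0, history.length - maxRecentTurns);
  const recent = history.slice(-maxRecentTurns);
  return [
    { role: "system", content: `[Earlier conversation summary: ${summarize(old)}]` },
    ...recent,
  ];
}

// Build a 20-message history and compress it.
const history = Array.from({ length: 20 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i + 1}`,
}));
const compressed = compressHistory(history);
console.log(compressed.length); // 5: one summary plus the last four messages
```

However long the conversation gets, the model sees at most five messages, and the summary never exceeds the cap set by maxChars.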
Pattern 6: Temperature Tuning Table
Temperature isn't a vibe — it's a parameter that controls output randomness. Use it intentionally:
| Use Case | Temperature | Reasoning |
|---|---|---|
| Lead qualification decisions | 0.1 | Needs consistency |
| Script-following AI agent | 0.2 | Some flexibility, mostly consistent |
| Customer support responses | 0.4 | Natural variation but accurate |
| Creative email subject lines | 0.8 | Explore diverse options |
| Brainstorming, ideation | 1.0 | Max creativity |
In Calling Agent, the agent runs at 0.2 during calls. The post-call summary runs at 0.5 for more natural language.
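One way to keep these values from drifting is to store them in a single lookup instead of scattering literals across API calls. A minimal sketch follows; the task keys are hypothetical names, and the values mirror the table.

```javascript
// Values mirror the table above; the task keys are hypothetical names.
const TEMPERATURE_BY_TASK = new Map([
  ["lead_qualification", 0.1],
  ["script_agent", 0.2],
  ["customer_support", 0.4],
  ["email_subject_lines", 0.8],
  ["brainstorming", 1.0],
]);

// Fails loudly on an unknown task instead of silently using a default.
function temperatureFor(task) {
  if (!TEMPERATURE_BY_TASK.has(task)) {
    throw new Error(`No temperature configured for task "${task}"`);
  }
  return TEMPERATURE_BY_TASK.get(task);
}

console.log(temperatureFor("script_agent")); // 0.2
```

Throwing on an unknown task is a deliberate choice here: a missing entry surfaces during testing rather than as odd model behavior in production.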
Pattern 7: Prompt Versioning in the Database
Never hardcode prompts in your source code. Store them in your database:
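// Mongoose schema (MongoDB)
import mongoose from "mongoose";
const { Schema, model } = mongoose;
const PromptSchema = new Schema({
  name: String, // "agent_system_prompt"
  version: Number, // 14
  content: String, // The actual prompt text
  isActive: Boolean, // Only one version active at a time
  createdAt: Date,
  metrics: {
    qualificationRate: Number,
    avgScore: Number,
    callsRun: Number,
  },
});
// Optional: enforce "one active version per name" at the database level
PromptSchema.index(
  { name: 1 },
  { unique: true, partialFilterExpression: { isActive: true } }
);
const Prompt = model("Prompt", PromptSchema);
// Fetch active prompt at runtime
const prompt = await Prompt.findOne({ name: "agent_system_prompt", isActive: true });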
Benefits:
- A/B test prompts without deploying code
- Roll back a bad prompt in seconds
- Track performance metrics per prompt version
- Non-technical teammates can update prompts
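The rollback benefit comes down to flipping one flag. The sketch below shows that logic against an in-memory array so it runs without a database; activateVersion, getActivePrompt, and the sample records are hypothetical, and with a real database you would apply the same two steps as updates.

```javascript
// In-memory stand-in for the prompts collection; the records are illustrative.
const prompts = [
  { name: "agent_system_prompt", version: 13, content: "v13 text", isActive: false },
  { name: "agent_system_prompt", version: 14, content: "v14 text", isActive: true },
];

// Activates one version and deactivates every other version with the same name.
function activateVersion(store, name, version) {
  const target = store.find((p) => p.name === name && p.version === version);
  if (!target) throw new Error(`Unknown prompt ${name} v${version}`);
  for (const p of store) {
    if (p.name === name) p.isActive = p === target;
  }
  return target;
}

// Returns the active record for a prompt name, or null if none is active.
function getActivePrompt(store, name) {
  return store.find((p) => p.name === name && p.isActive) ?? null;
}

activateVersion(prompts, "agent_system_prompt", 13); // Roll back from v14 to v13
console.log(getActivePrompt(prompts, "agent_system_prompt").version); // 13
```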
All production AI patterns from my portfolio: buildbysandeep.dev