Generative AI in Production: Lessons From Building 3 AI Products

5 hard lessons from shipping 3 AI products in 18 months — prompt versioning, latency, cost control, hallucination mitigation, and observability.

Sandeep Prajapati

Full Stack Developer · Ambit Global

October 28, 2025 · 9 min read

[Figure: Generative AI system architecture with LLM APIs, caching, and monitoring layers]

Everyone's Building AI. Few Are Running It in Production.

There's a massive gap between "I made a ChatGPT wrapper" and "I run an AI product serving real users."

I've built three AI products in the last 18 months:

  1. Calling Agent — AI voice campaign platform (1000+ leads/day)
  2. AI Agent Builder — Drag-and-drop platform to deploy custom AI agents
  3. AI Integration Layer — LLM-powered customer interaction system

Here's everything I've learned that nobody writes about.


Lesson 1: Prompt Engineering Is Software Engineering

Your prompt is not a note you leave for the AI. It's code. Treat it that way.

What this means practically:

  • Version control your prompts — I store prompts in the database with version IDs
  • Test prompts like unit tests — Run the same inputs against new prompt versions before deploying
  • Separate concerns — System prompt (persona/rules), context prompt (dynamic data), user message
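
// Keep persona/rules, dynamic context, and conversation history in separate fields.
// compressHistory and MAX_TOKENS are assumed to be defined elsewhere in the app.
const buildPrompt = (agent, lead, history) => ({
  system: `${agent.persona}\n\nRules:\n${agent.rules.join("\n")}`,
  context: `Lead Info: ${JSON.stringify(lead)}\nObjective: ${agent.objective}`,
  history: compressHistory(history, MAX_TOKENS),
});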
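
The versioning and testing bullets above could be sketched roughly as follows. This is a minimal illustration, not the schema used in these products: the `Map` stands in for the database, and `savePromptVersion`, `getPrompt`, `runRegression`, and the stubbed `fakeModel` are all hypothetical names introduced for the example.

```javascript
// In-memory stand-in for a prompts table (assumption: the real store is a database).
const promptStore = new Map(); // key: prompt name, value: array of versions

// Save a new immutable version and return its version ID (1-based).
function savePromptVersion(name, template) {
  const versions = promptStore.get(name) ?? [];
  versions.push({ version: versions.length + 1, template, createdAt: new Date() });
  promptStore.set(name, versions);
  return versions.length;
}

// Fetch a specific version, or the latest when no version is given.
function getPrompt(name, version) {
  const versions = promptStore.get(name) ?? [];
  const found = version ? versions[version - 1] : versions[versions.length - 1];
  if (!found) throw new Error(`No prompt "${name}" at version ${version ?? "latest"}`);
  return found;
}

// Run fixed test cases against one prompt version and collect the failures.
// callModel is injected so the same cases can run against a stub or a real LLM.
async function runRegression(name, version, cases, callModel) {
  const { template } = getPrompt(name, version);
  const failures = [];
  for (const { input, mustInclude } of cases) {
    const output = await callModel(template, input);
    if (!output.toLowerCase().includes(mustInclude.toLowerCase())) {
      failures.push({ input, mustInclude, output });
    }
  }
  return failures;
}

// Hypothetical stub: echoes the template so the example runs without an API key.
const fakeModel = async (template, input) => `${template} | user said: ${input}`;
```

Running the same `cases` array against the current version and a candidate version before deploying gives a cheap regression gate: an empty failures array means the candidate still satisfies every recorded expectation.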

Lesson 2: Latency Will Kill Your UX

GPT-4 Turbo averages 2–4 seconds for a full completion. In a voice call, 4 seconds of silence feels like a dropped call.

What I did:

  • Streaming responses — Start speaking as tokens arrive, don't wait for the full completion
  • Parallel API calls — Run sentiment analysis and lead scoring in parallel, not sequentially
  • Cache common responses — Greetings, objection handlers, closing lines — cache them in Redis
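
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Assumes `prompt` comes from buildPrompt and `prompt.history` is an array of { role, content } messages.
const messages = [
  { role: "system", content: `${prompt.system}\n\n${prompt.context}` },
  ...prompt.history,
];

// Streaming reduces perceived latency dramatically
const stream = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages,
  stream: true,
});

for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content;
  if (token) sendToTwilioStream(token); // Speak tokens as they arrive (app-specific helper)
}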
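
The parallel-call and caching bullets above could look something like the sketch below. It rests on several assumptions: `analyzeSentiment` and `scoreLead` are hypothetical stand-ins for real API calls, a `Map` stands in for Redis, and `analyzeTurn` and `getOrGenerate` are names invented for the example.

```javascript
// Hypothetical stand-ins for two independent API calls, each taking about 50 ms.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const analyzeSentiment = async (text) => {
  await sleep(50);
  return text.includes("not") ? "negative" : "positive";
};
const scoreLead = async (lead) => {
  await sleep(50);
  return lead.budget > 1000 ? 0.9 : 0.3;
};

// Run both calls concurrently: the wait is roughly the slower call, not the sum.
async function analyzeTurn(text, lead) {
  const [sentiment, score] = await Promise.all([analyzeSentiment(text), scoreLead(lead)]);
  return { sentiment, score };
}

// Cache-aside lookup. A Map stands in for Redis here (assumption).
const cache = new Map();
async function getOrGenerate(key, generate) {
  if (cache.has(key)) return cache.get(key);
  const value = await generate();
  cache.set(key, value);
  return value;
}
```

`Promise.all` only helps when the calls do not depend on each other's results. With a real Redis client, the same cache-aside shape would typically also set an expiry so that stale greetings or objection handlers age out.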

Lesson 3: Cost Will Surprise You

GPT-4 Turbo at $0.01/1K input tokens sounds cheap. At 1,000 calls/day with 500 tokens per turn and 5 turns per call, that's 2.5 million input tokens, or $25/day in input tokens alone, before output tokens, Twilio, Redis, or hosting.
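
The arithmetic behind that figure, as a small sketch. The function and parameter names are illustrative; the numbers are the ones quoted above, and only input tokens are counted.

```javascript
// Estimate daily input-token spend in USD.
function dailyInputCostUsd({ callsPerDay, turnsPerCall, tokensPerTurn, usdPer1kInputTokens }) {
  const tokensPerDay = callsPerDay * turnsPerCall * tokensPerTurn;
  return (tokensPerDay / 1000) * usdPer1kInputTokens;
}

const cost = dailyInputCostUsd({
  callsPerDay: 1000,
  turnsPerCall: 5,
  tokensPerTurn: 500,
  usdPer1kInputTokens: 0.01,
});
console.log(cost.toFixed(2)); // prints "25.00"
```

If each turn re-sends the full conversation history, `tokensPerTurn` grows as the call progresses, so the real figure can be higher than this flat estimate.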

Optimizations I implemented:

  • GPT-3.5 for simple turns (greetings, confirmations), GPT-4 only for complex reasoning
  • Token budgeting — Hard limit on context window per call
  • Batching — Group non-urgent API calls

Cost dropped 60% without a noticeable quality difference.
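
A rough sketch of the first two optimizations, model routing and token budgeting. The turn-type labels, the `pickModel` and `trimToBudget` names, and the 4-characters-per-token estimate are assumptions made for illustration; a real tokenizer would give more accurate counts.

```javascript
// Turn types treated as "simple" here are an assumption for illustration.
const SIMPLE_TURNS = new Set(["greeting", "confirmation"]);

// Route cheap, predictable turns to the smaller model.
function pickModel(turnType) {
  return SIMPLE_TURNS.has(turnType) ? "gpt-3.5-turbo" : "gpt-4-turbo";
}

// Rough token estimate (about 4 characters per token for English text).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Keep the most recent messages that fit within the token budget.
function trimToBudget(messages, maxTokens) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break; // older messages are dropped
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

Dropping the oldest turns first is the simplest policy. Summarizing them instead would preserve more context at the cost of an extra model call.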


Lesson 4: Hallucinations Are a Product Problem, Not Just an AI Problem

When an AI agent gives wrong information to a real customer, that's a business risk.

My mitigation strategy:

  • Constrain the output — Force JSON responses for structured data
  • Verification layer — Run a second prompt to check if the first response contradicts known facts
  • Confidence scoring — If the model's logprobs indicate uncertainty, fall back to a safe default
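
The output constraint and confidence fallback could be combined along these lines. This is a sketch under assumptions: the `intent`/`reply` fields, the `SAFE_DEFAULT` wording, and the 0.8 threshold are hypothetical, and the `logprobs` argument is taken to be the array of per-token log probabilities that the OpenAI Chat Completions API returns when `logprobs: true` is requested.

```javascript
// Fallback used whenever the model output cannot be trusted (hypothetical shape).
const SAFE_DEFAULT = {
  intent: "unknown",
  reply: "Let me connect you with a colleague who can help.",
};

// Parse the model's JSON and check required string fields; return null when invalid.
function parseStructured(raw, requiredKeys) {
  try {
    const data = JSON.parse(raw);
    const ok =
      data !== null &&
      typeof data === "object" &&
      requiredKeys.every((key) => typeof data[key] === "string");
    return ok ? data : null;
  } catch {
    return null;
  }
}

// Geometric-mean token probability from per-token log probabilities.
function meanTokenProbability(logprobs) {
  if (logprobs.length === 0) return 0;
  const avg = logprobs.reduce((sum, lp) => sum + lp, 0) / logprobs.length;
  return Math.exp(avg);
}

// Accept the response only if it parses and the model looked confident.
function acceptOrFallback(raw, logprobs, minConfidence = 0.8) {
  const parsed = parseStructured(raw, ["intent", "reply"]);
  if (!parsed || meanTokenProbability(logprobs) < minConfidence) return SAFE_DEFAULT;
  return parsed;
}
```

Token-level probability is a proxy for how sure the model was about its wording, not proof that the content is true, so it probably works best alongside the verification prompt rather than in place of it.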

Lesson 5: Observability Is Non-Negotiable

You cannot debug an AI product without logs. I built a logging layer that captured:

  • Every prompt sent (with version ID)
  • Every completion received
  • Latency per API call
  • Token usage per session
  • Final outcome (qualified / not qualified / transferred)

This data revealed that 40% of failed calls happened because the agent didn't handle "call me later" variants correctly — a fixable prompt issue invisible without logging.
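
A logging layer of this kind could be as small as a wrapper around every LLM call. The sketch below is illustrative only: `loggedCompletion`, `tokensForSession`, the record fields, and the `{ text, totalTokens }` return shape are assumptions, and the in-memory array stands in for a real database or log pipeline.

```javascript
// Collected records; a real system would write to a database or log pipeline.
const callLog = [];

// Wrap any LLM call so each invocation is timed and recorded.
// callModel must resolve to { text, totalTokens } (a shape assumed for this sketch).
async function loggedCompletion({ sessionId, promptVersion, prompt }, callModel) {
  const startedAt = Date.now();
  const record = {
    sessionId,
    promptVersion,
    prompt,
    completion: null,
    totalTokens: 0,
    latencyMs: 0,
    error: null,
  };
  try {
    const { text, totalTokens } = await callModel(prompt);
    record.completion = text;
    record.totalTokens = totalTokens;
    return text;
  } catch (err) {
    record.error = String(err);
    throw err;
  } finally {
    // Runs on success and failure, so failed calls are logged too.
    record.latencyMs = Date.now() - startedAt;
    callLog.push(record);
  }
}

// Sum token usage for one session.
const tokensForSession = (sessionId) =>
  callLog
    .filter((r) => r.sessionId === sessionId)
    .reduce((sum, r) => sum + r.totalTokens, 0);
```

Storing the prompt version next to each completion is what makes it possible to trace a cluster of failures back to a specific prompt change.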


The Stack That Works

LLM:         OpenAI GPT-4 Turbo (reasoning) + GPT-3.5 (simple turns)
Voice:       Twilio Programmable Voice + Deepgram STT
Queue:       Bull on Redis (rate limiting + retries)
Real-time:   Socket.io (WebSocket events to dashboard)
Storage:     MongoDB (conversations) + S3 (recordings)
Frontend:    Next.js + Tailwind CSS
Monitoring:  Custom dashboard + Sentry

See all my AI projects: buildbysandeep.dev/projects

Written by

Sandeep Prajapati

Full Stack Developer with 3+ years of experience. Building enterprise AI systems, real-time platforms, and mobile apps. Currently at Ambit Global Solution.