How to Architect a Multi-Tenant AI SaaS Without Losing Your Mind

A technical deep-dive into multi-tenant AI SaaS architecture — shared DB with tenant isolation, Redis Bull with tenant namespacing, and per-tenant LLM rate limiting.

Sandeep Prajapati

Full Stack Developer · Ambit Global

September 20, 2025 · 8 min read

The Question Nobody Asks Until It's Too Late

"How do we add a second customer?"

Building for one client is straightforward. Adding a second means every assumption you made — about databases, queues, rate limits, billing — gets stress-tested simultaneously.

When I built the AI campaign SaaS at my company, I had to design for multi-tenancy from day one. Here's exactly how I did it.


Option 1: Database Per Tenant

One MongoDB database per client. Clean separation, easy to back up individually, and a much smaller risk of cross-tenant data leaks.

Problem: At 10 clients, you're managing 10 connection pools. At 50 clients, your Atlas bill triples. And migrations become a nightmare.


Option 2: Shared DB with a Tenant Field (What I Chose)

Single database, every collection has a tenantId field. Every query is wrapped with a tenant filter.

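// Utility: tenantQuery, always scopes queries to the correct tenant
const tenantQuery = (tenantId, additionalFilter = {}) => {
  if (!tenantId) throw new Error("tenantQuery: tenantId is required");
  // Spread the caller's filter first so it can never override tenantId
  return { ...additionalFilter, tenantId };
};

// Usage in API routes (req.tenantId is set by the tenant middleware)
const campaigns = await Campaign.find(
  tenantQuery(req.tenantId, { status: "active" })
);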

I created a middleware that extracts tenantId from the JWT and attaches it to every request:

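export function tenantMiddleware(req, res, next) {
  try {
    const token = verifyJWT(req.headers.authorization);
    if (!token?.tenantId) throw new Error("Token has no tenantId claim");
    req.tenantId = token.tenantId;
  } catch (err) {
    return res.status(401).json({ error: "Unauthorized" }); // reject bad or tenant-less tokens
  }
  next();
}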

Advantage: One DB, clean queries, easy horizontal scaling.

Risk: Forgetting the tenant filter. Mitigated with ESLint rules that flag any Collection.find() without tenantId.
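
A lint rule only catches mistakes at review time, so a runtime guard can back it up. The sketch below is one possible shape, not the guard used in this project: `assertTenantScoped` is a hypothetical helper that rejects any filter that does not pin `tenantId` to a single concrete value (a missing field, `null`, or an operator object such as `{ $ne: ... }` all fail).

```javascript
// Hypothetical guard: throws unless the filter pins tenantId to one concrete value.
function assertTenantScoped(filter, operation = "query") {
  const value = filter ? filter.tenantId : undefined;

  // Operator objects like { $ne: "x" } or { $in: [...] } can match several tenants.
  const isOperatorObject =
    value !== null &&
    typeof value === "object" &&
    Object.keys(value).some((key) => key.startsWith("$"));

  if (value === undefined || value === null || isOperatorObject) {
    throw new Error(`${operation} is not scoped to a single tenant`);
  }
  return filter;
}

// Passes: the filter names exactly one tenant.
assertTenantScoped({ tenantId: "tenant-a", status: "active" }, "Campaign.find");

// Throws: no tenantId at all.
try {
  assertTenantScoped({ status: "active" }, "Campaign.find");
} catch (err) {
  console.log(err.message);
}
```

With Mongoose, one place to call a guard like this might be a query middleware (`schema.pre("find", ...)`), where `this.getFilter()` returns the filter about to be executed. That wiring is an assumption about how the models are set up, not something shown in this article.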


Redis Bull Queues: Namespacing Per Tenant

Each tenant needs its own job queue — so a burst of 1000 leads from Tenant A doesn't starve Tenant B's queue.

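import Bull from "bull";

const queueCache = {}; // queue name → Bull instance, one per tenant

function getCampaignQueue(tenantId) {
  // Creates/reuses a queue scoped to this tenant
  const queueKey = `campaign:${tenantId}`;
  if (!queueCache[queueKey]) {
    queueCache[queueKey] = new Bull(queueKey, { redis: redisConfig });
    // Up to 10 concurrent jobs per tenant queue in this process
    queueCache[queueKey].process(10, processCampaignJob);
  }
  return queueCache[queueKey];
}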

// When a campaign starts:
const queue = getCampaignQueue(tenantId);
await queue.add({ leadId, campaignId, tenantId });

Each tenant's queue gets its own concurrency of 10. Tenant A's 1000-lead burst waits behind Tenant A's own jobs and doesn't delay Tenant B's 50-lead campaign.
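
Since the job payload also carries `tenantId`, the processor can cheaply confirm that a job really belongs to the queue it arrived on. The helpers below are a sketch under assumptions: `queueNameFor`, `assertJobTenant`, and `withTenantCheck` are hypothetical names, and tenant IDs are assumed to be plain strings without a `:` in them.

```javascript
const QUEUE_PREFIX = "campaign:";

// Builds the per-tenant queue name, rejecting IDs that would make it ambiguous.
function queueNameFor(tenantId) {
  if (typeof tenantId !== "string" || tenantId.length === 0 || tenantId.includes(":")) {
    throw new Error("invalid tenantId for queue name");
  }
  return `${QUEUE_PREFIX}${tenantId}`;
}

// Throws if the job payload names a different tenant than the queue owner.
function assertJobTenant(expectedTenantId, job) {
  const actual = job && job.data ? job.data.tenantId : undefined;
  if (actual !== expectedTenantId) {
    throw new Error(`job ${job && job.id} carries tenantId ${actual}, expected ${expectedTenantId}`);
  }
}

// Wraps a job handler so mismatched jobs fail before any tenant data is touched.
function withTenantCheck(tenantId, handler) {
  return async (job) => {
    assertJobTenant(tenantId, job);
    return handler(job);
  };
}

// Example: the check passes for a matching payload.
assertJobTenant("tenant-a", { id: 1, data: { leadId: "l1", tenantId: "tenant-a" } });
console.log(queueNameFor("tenant-a"));
```

A wrapped handler could then be registered in place of the raw one, for example `queue.process(10, withTenantCheck(tenantId, processCampaignJob))`.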


Per-Tenant LLM Rate Limiting

OpenAI's rate limits apply to your organization as a whole, not to each of your tenants. If Tenant A blasts requests and hits the limit, Tenant B's calls fail too.

Solution: A rate limiter at the application layer, enforced per tenant before any OpenAI call:

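const tenantLimiter = new Map(); // tenantId → token bucket

async function callLLM(tenantId, messages) {
  const limiter = getOrCreateLimiter(tenantId, {
    tokensPerMinute: 90000, // Per-tenant soft limit
  });

  // Block until this tenant's bucket has room for the estimated prompt size
  await limiter.waitForCapacity(estimateTokens(messages));
  return openai.chat.completions.create({
    model: process.env.OPENAI_MODEL, // required by the API; set to whichever model you use
    messages,
  });
}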

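This gave each tenant a bounded share of the organization-wide token budget and prevented runaway tenants from causing cascading failures. It only holds if the per-tenant limits add up to no more than the organization's limit; otherwise several busy tenants can still exhaust it together.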
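
The snippet above calls `getOrCreateLimiter` and `waitForCapacity` without showing them. A minimal in-process token bucket along those lines might look like the sketch below. The `TokenBucket` class, its `tryRemove` method, and the injectable `now` clock are my assumptions for illustration, not the production code. Because the state lives in memory, each Node process would keep its own buckets; sharing limits across processes would need a shared store such as Redis.

```javascript
// Hypothetical token bucket: refills continuously at tokensPerMinute, capped at one minute's worth.
class TokenBucket {
  constructor(tokensPerMinute, now = () => Date.now()) {
    this.capacity = tokensPerMinute;
    this.tokens = tokensPerMinute; // start full
    this.refillPerMs = tokensPerMinute / 60000;
    this.now = now;
    this.last = now();
  }

  // Credit the tokens earned since the last call, never exceeding capacity.
  refill() {
    const t = this.now();
    this.tokens = Math.min(this.capacity, this.tokens + (t - this.last) * this.refillPerMs);
    this.last = t;
  }

  // Take `cost` tokens if they are available right now; returns true on success.
  tryRemove(cost) {
    this.refill();
    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }

  // Resolve once `cost` tokens have been taken, sleeping until enough have refilled.
  async waitForCapacity(cost) {
    if (cost > this.capacity) throw new Error("request exceeds bucket capacity");
    while (!this.tryRemove(cost)) {
      const waitMs = Math.ceil((cost - this.tokens) / this.refillPerMs);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

const tenantLimiter = new Map(); // tenantId → TokenBucket

// Returns the tenant's bucket, creating it on first use.
function getOrCreateLimiter(tenantId, { tokensPerMinute }, now) {
  if (!tenantLimiter.has(tenantId)) {
    tenantLimiter.set(tenantId, new TokenBucket(tokensPerMinute, now));
  }
  return tenantLimiter.get(tenantId);
}
```

Each tenant draws from its own bucket, so one tenant draining its budget leaves the others untouched. The options passed on later calls for the same tenant are ignored in this sketch; changing a tenant's limit would mean replacing its bucket.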


AI Prompt Isolation

The most subtle multi-tenant concern: prompt bleed. If Tenant A's system prompt somehow leaks into Tenant B's completions, that's a catastrophic data privacy issue.

My rules:

  1. System prompts are always fetched fresh per-request — never cached globally
  2. Conversation history is always fetched with tenantId filter
  3. No shared in-memory conversation state between tenants

async function buildAgentMessages(tenantId, campaignId, callHistory) {
  // Everything scoped to tenantId
  const campaign = await Campaign.findOne({ _id: campaignId, tenantId });
  // A miss means the campaign doesn't exist or belongs to another tenant
  if (!campaign) throw new Error("Campaign not found for this tenant");
  const systemPrompt = campaign.agentConfig.systemPrompt;
  const history = await CallHistory.find({ callSid: callHistory.sid, tenantId });

  return [
    { role: "system", content: systemPrompt },
    ...history.map(formatHistoryEntry),
  ];
}

Multi-Tenancy Checklist

When adding any new feature, I run through this checklist:

  • Does every DB query include tenantId?
  • Does every Redis key include tenantId?
  • Does every queued job include tenantId in its job data?
  • Does every log line include tenantId for debugging?
  • Does this feature respect per-tenant billing limits?
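
The Redis-key and logging items are easier to keep when nobody builds keys or log lines by hand. A sketch of two small helpers follows; `tenantKey`, `tenantLogLine`, and the `tenant:<id>:...` key layout are hypothetical names chosen for illustration.

```javascript
// Builds a Redis key that always starts with the tenant, e.g. "tenant:t1:campaign:42".
function tenantKey(tenantId, ...parts) {
  if (!tenantId) throw new Error("tenantKey: tenantId is required");
  return ["tenant", String(tenantId), ...parts.map(String)].join(":");
}

// Produces a JSON log line that always carries tenantId, whatever extra fields are passed.
function tenantLogLine(tenantId, message, fields = {}) {
  if (!tenantId) throw new Error("tenantLogLine: tenantId is required");
  // tenantId and message come last so caller-supplied fields cannot overwrite them.
  return JSON.stringify({ ...fields, tenantId: String(tenantId), message });
}

console.log(tenantKey("t1", "campaign", 42));
console.log(tenantLogLine("t1", "call started", { leadId: "lead-7" }));
```

Both helpers throw on a missing tenant, so a forgotten `tenantId` fails fast instead of silently creating a shared key or an untraceable log line.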

See the full AI platform architecture at buildbysandeep.dev

Written by

Sandeep Prajapati

Full Stack Developer with 3+ years experience. Building enterprise AI systems, real-time platforms, and mobile apps. Currently at Ambit Global Solution.