How to Architect a Multi-Tenant AI SaaS Without Losing Your Mind


The Question Nobody Asks Until It's Too Late

"How do we add a second customer?"

Building for one client is straightforward. Adding a second means every assumption you made — about databases, queues, rate limits, billing — gets stress-tested simultaneously.

When I built the AI campaign SaaS at my company, I had to design for multi-tenancy from day one. Here's exactly how I did it.

Option 1: Database Per Tenant

One MongoDB database per client. Clean separation, easy to back up individually, and a much smaller risk of cross-tenant data leaks.

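Problem: At 10 clients, you're managing 10 connection pools. At 50 clients, your Atlas bill triples. And migrations become a nightmare: every schema change has to run once per database, and a failure partway through leaves tenants on different versions.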
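To put rough numbers on the connection-pool problem: if each tenant database gets its own pool (say, one mongoose.createConnection per tenant), worst-case connections grow as tenants × pool size × app instances. The pool size of 10 and the four app instances below are assumptions for illustration, not figures from my deployment.

```javascript
// Worst-case open connections: every pool on every app instance filled to its max.
// The inputs below are illustrative assumptions, not measurements.
function worstCaseConnections({ pools, maxPoolSize, appInstances }) {
  return pools * maxPoolSize * appInstances;
}

const sharedDb = worstCaseConnections({ pools: 1, maxPoolSize: 10, appInstances: 4 });
const dbPerTenant = worstCaseConnections({ pools: 50, maxPoolSize: 10, appInstances: 4 });
console.log(sharedDb, dbPerTenant); // 40 2000
```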

Option 2: Shared DB with Tenant Column (What I Chose)

Single database, every collection has a tenantId field. Every query is wrapped with a tenant filter.

// Utility: tenantQuery scopes every query to the caller's tenant.
// tenantId is spread last, so a caller-supplied filter can never override it.
const tenantQuery = (tenantId, additionalFilter = {}) => ({
  ...additionalFilter,
  tenantId,
});

// Usage in API routes
const campaigns = await Campaign.find(
  tenantQuery(req.tenantId, { status: "active" })
);
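Why the spread order inside tenantQuery matters: in a JavaScript object literal, a later key overwrites an earlier one. This small self-contained demo compares both orderings against a filter that tries to smuggle in its own tenantId (for example, one assembled from request query parameters). The two helper names are hypothetical.

```javascript
// Later keys in an object literal overwrite earlier ones.
const unsafeTenantQuery = (tenantId, extra = {}) => ({ tenantId, ...extra }); // caller can override
const safeTenantQuery = (tenantId, extra = {}) => ({ ...extra, tenantId });   // scope always wins

// A filter built from untrusted input that tries to widen the scope
const injected = { status: "active", tenantId: { $ne: null } };

console.log(JSON.stringify(unsafeTenantQuery("tenant-a", injected).tenantId)); // {"$ne":null}
console.log(JSON.stringify(safeTenantQuery("tenant-a", injected).tenantId));   // "tenant-a"
```

With tenantId first, the injected { $ne: null } operator replaces the tenant scope and the query matches every tenant's documents; with tenantId last, the scope always wins.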

I created a middleware that extracts tenantId from the JWT and attaches it to every request:

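export function tenantMiddleware(req, res, next) {
  let claims;
  try {
    claims = verifyJWT(req.headers.authorization); // assumed to throw on a bad or expired token
  } catch {
    return res.status(401).json({ error: "Invalid or missing token" });
  }
  if (!claims.tenantId) return res.status(403).json({ error: "Token has no tenantId" });
  req.tenantId = claims.tenantId; // downstream routes scope every query with this
  next();
}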

Advantage: One database and one connection pool, simple queries, and a clear path to horizontal scaling (for example, sharding collections on tenantId).

Risk: Forgetting the tenant filter. I mitigated this with a custom ESLint rule that flags any Collection.find() call whose filter doesn't include tenantId.
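Here is a minimal sketch of what such a rule could look like. It is not the exact rule I run: the name require-tenant-filter, the list of query methods, and the assumption that Mongoose models are PascalCase identifiers (Campaign, CallHistory) are all illustrative. It accepts a filter only if it is a tenantQuery(...) call or an object literal with a tenantId key.

```javascript
// Hypothetical custom ESLint rule ("require-tenant-filter"), a sketch rather than a drop-in rule.
// Assumes Mongoose models are PascalCase identifiers (Campaign.find, CallHistory.find).
const QUERY_METHODS = new Set(["find", "findOne", "countDocuments", "updateMany", "deleteMany"]);

// True if an object literal has a non-computed tenantId key ({ tenantId } or { "tenantId": x }).
function hasTenantKey(objectExpr) {
  return objectExpr.properties.some(
    (p) =>
      p.type === "Property" &&
      !p.computed &&
      ((p.key.type === "Identifier" && p.key.name === "tenantId") ||
        (p.key.type === "Literal" && p.key.value === "tenantId"))
  );
}

// True if the first argument is visibly tenant-scoped.
function isScopedFilter(arg) {
  if (!arg) return false; // e.g. Campaign.find() with no filter at all
  if (arg.type === "ObjectExpression") return hasTenantKey(arg);
  return (
    arg.type === "CallExpression" &&
    arg.callee.type === "Identifier" &&
    arg.callee.name === "tenantQuery"
  );
}

const requireTenantFilter = {
  meta: {
    type: "problem",
    schema: [],
    messages: { missingTenant: "Scope this query with tenantId (e.g. via tenantQuery)." },
  },
  create(context) {
    return {
      CallExpression(node) {
        const callee = node.callee;
        // Only Model.method(...) calls; skips arrays like leads.find(fn)
        const isModelQuery =
          callee.type === "MemberExpression" &&
          !callee.computed &&
          callee.object.type === "Identifier" &&
          /^[A-Z]/.test(callee.object.name) &&
          QUERY_METHODS.has(callee.property.name);
        if (isModelQuery && !isScopedFilter(node.arguments[0])) {
          context.report({ node, messageId: "missingTenant" });
        }
      },
    };
  },
};
```

To enable it with ESLint's flat config, register the object as a rule inside a local plugin (plugins: { tenant: { rules: { "require-tenant-filter": requireTenantFilter } } }) and turn on "tenant/require-tenant-filter": "error". The rule is deliberately strict: a filter passed in as a variable is flagged too, because static analysis can't see what's inside it.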

Redis Bull Queues: Namespacing Per Tenant

Each tenant needs its own job queue — so a burst of 1000 leads from Tenant A doesn't starve Tenant B's queue.

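import Bull from "bull";

const queueCache = {}; // queueKey -> Bull queue, one per tenant

function getCampaignQueue(tenantId) {
  // Creates the tenant's queue on first use, then reuses it
  const queueKey = `campaign:${tenantId}`;
  if (!queueCache[queueKey]) {
    // redisConfig and processCampaignJob are defined elsewhere in the app
    queueCache[queueKey] = new Bull(queueKey, { redis: redisConfig });
    // Up to 10 concurrent jobs for this tenant, per worker process
    queueCache[queueKey].process(10, processCampaignJob);
  }
  return queueCache[queueKey];
}

// When a campaign starts:
const queue = getCampaignQueue(tenantId);
await queue.add({ leadId, campaignId, tenantId });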

Each tenant gets its own concurrency budget (10 jobs at a time per worker process). Tenant A's 1000-lead burst waits in Tenant A's queue and doesn't delay Tenant B's 50-lead campaign.
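The isolation claim is easy to check without Redis. The sketch below is a hypothetical in-memory stand-in, not Bull: a TenantQueues class that gives each tenant its own FIFO and its own cap of 10 in-flight jobs, mirroring queue.process(10, ...) above. Each job just sleeps for 2 ms in place of real work.

```javascript
// Hypothetical in-memory stand-in for per-tenant queues: each tenant has its own
// FIFO and its own concurrency cap, so one tenant's backlog never queues ahead of another's.
class TenantQueues {
  constructor(concurrency, processJob) {
    this.concurrency = concurrency; // max in-flight jobs per tenant
    this.processJob = processJob;   // async (data) => result
    this.tenants = new Map();       // tenantId -> { pending: [], active: number }
  }

  add(tenantId, data) {
    if (!this.tenants.has(tenantId)) this.tenants.set(tenantId, { pending: [], active: 0 });
    return new Promise((resolve, reject) => {
      this.tenants.get(tenantId).pending.push({ data, resolve, reject });
      this.drain(tenantId);
    });
  }

  // Start pending jobs for this tenant until its concurrency cap is reached.
  drain(tenantId) {
    const t = this.tenants.get(tenantId);
    while (t.active < this.concurrency && t.pending.length > 0) {
      const job = t.pending.shift();
      t.active++;
      Promise.resolve()
        .then(() => this.processJob(job.data))
        .then(job.resolve, job.reject)
        .finally(() => {
          t.active--;
          this.drain(tenantId); // pull the next job for the same tenant
        });
    }
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Tenant A enqueues 1000 leads, then Tenant B enqueues 50; each job "works" for 2 ms.
async function runBurstDemo() {
  const queues = new TenantQueues(10, () => sleep(2));
  const start = Date.now();
  const burstA = Array.from({ length: 1000 }, (_, i) => queues.add("tenant-a", { leadId: i }));
  const smallB = Array.from({ length: 50 }, (_, i) => queues.add("tenant-b", { leadId: i }));
  await Promise.all(smallB);
  const bDoneMs = Date.now() - start;
  await Promise.all(burstA);
  const aDoneMs = Date.now() - start;
  return { bDoneMs, aDoneMs };
}

runBurstDemo().then((timings) => console.log(timings));
```

Tenant B's 50 jobs finish after a handful of rounds while Tenant A is still working through its backlog; with a single shared FIFO, B's jobs would sit behind all 1000 of A's.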

Per-Tenant LLM Rate Limiting

OpenAI enforces rate limits per organization (and project), not per end customer. Every tenant shares your API key, so if Tenant A blasts requests and hits the limit, Tenant B's calls fail too.

Solution: A rate limiter at the application layer, enforced per tenant before any OpenAI call:

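import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const tenantLimiter = new Map(); // tenantId → token bucket, used by getOrCreateLimiter

async function callLLM(tenantId, messages) {
  const limiter = getOrCreateLimiter(tenantId, {
    tokensPerMinute: 90000, // Per-tenant soft limit
  });

  // Wait until this tenant's bucket covers the estimated request size
  await limiter.waitForCapacity(estimateTokens(messages));
  return openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed; use the model you actually deploy
    messages,
  });
}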

This capped how much of the shared token budget any one tenant could consume and kept a runaway tenant from triggering rate-limit errors for everyone else.
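The code above leaves getOrCreateLimiter, waitForCapacity, and estimateTokens undefined. One way to fill them in is a token bucket per tenant, sketched below; the TokenBucket class, the FIFO serialization of waiters, and the 4-characters-per-token estimate are my assumptions, not a library API.

```javascript
// Hypothetical token bucket behind getOrCreateLimiter / waitForCapacity.
// Tokens refill continuously at tokensPerMinute / 60000 per millisecond.
class TokenBucket {
  constructor({ tokensPerMinute }) {
    this.capacity = tokensPerMinute;
    this.tokens = tokensPerMinute; // start full
    this.refillPerMs = tokensPerMinute / 60000;
    this.last = Date.now();
    this.queue = Promise.resolve(); // serializes waiters so they are served FIFO
  }

  refill() {
    const now = Date.now();
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.last) * this.refillPerMs);
    this.last = now;
  }

  // Resolves once `cost` tokens have been reserved from this tenant's bucket.
  waitForCapacity(cost) {
    const amount = Math.min(cost, this.capacity); // a request larger than the bucket could never fit
    const turn = this.queue.then(async () => {
      this.refill();
      while (this.tokens < amount) {
        const waitMs = Math.ceil((amount - this.tokens) / this.refillPerMs);
        await new Promise((resolve) => setTimeout(resolve, waitMs));
        this.refill();
      }
      this.tokens -= amount;
    });
    this.queue = turn;
    return turn;
  }
}

const tenantLimiter = new Map(); // tenantId -> TokenBucket

function getOrCreateLimiter(tenantId, options) {
  if (!tenantLimiter.has(tenantId)) tenantLimiter.set(tenantId, new TokenBucket(options));
  return tenantLimiter.get(tenantId);
}

// Rough heuristic (an assumption): ~4 characters per token, string content only.
function estimateTokens(messages) {
  const chars = messages.reduce((n, m) => n + String(m.content ?? "").length, 0);
  return Math.ceil(chars / 4);
}
```

Because the estimate is approximate, a refinement is to reconcile each bucket against the usage.total_tokens field returned with the completion.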

AI Prompt Isolation

The most subtle multi-tenant concern: prompt bleed. If Tenant A's system prompt somehow leaks into Tenant B's completions — that's a catastrophic data privacy issue.

My rules:

System prompts are always fetched fresh per-request — never cached globally
Conversation history is always fetched with tenantId filter
No shared in-memory conversation state between tenants

async function buildAgentMessages(tenantId, campaignId, callHistory) {
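  // Everything is scoped to tenantId: another tenant's IDs resolve to nothing
  const campaign = await Campaign.findOne({ _id: campaignId, tenantId });
  if (!campaign) {
    throw new Error(`Campaign ${campaignId} not found for tenant ${tenantId}`);
  }
  const systemPrompt = campaign.agentConfig.systemPrompt;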
  const history = await CallHistory.find({ callSid: callHistory.sid, tenantId });

  return [
    { role: "system", content: systemPrompt },
    ...history.map(formatHistoryEntry),
  ];
}
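Isolation rules like these deserve a regression test. A minimal sketch, using a hypothetical in-memory stand-in for Campaign.findOne so it runs without MongoDB: a lookup with the right campaign ID but the wrong tenantId must fail rather than return another tenant's system prompt.

```javascript
// Hypothetical in-memory stand-in for the campaigns collection.
const campaigns = [
  { _id: "c1", tenantId: "tenant-a", agentConfig: { systemPrompt: "Prompt for tenant A" } },
  { _id: "c2", tenantId: "tenant-b", agentConfig: { systemPrompt: "Prompt for tenant B" } },
];

// Mirrors Campaign.findOne({ _id, tenantId }): both fields must match.
async function findCampaign({ _id, tenantId }) {
  return campaigns.find((c) => c._id === _id && c.tenantId === tenantId) ?? null;
}

async function getSystemPrompt(tenantId, campaignId) {
  const campaign = await findCampaign({ _id: campaignId, tenantId });
  if (!campaign) throw new Error(`Campaign ${campaignId} not found for tenant ${tenantId}`);
  return campaign.agentConfig.systemPrompt;
}

// Returns tenant A's own prompt and whatever a cross-tenant lookup yields (should be null).
async function checkIsolation() {
  const own = await getSystemPrompt("tenant-a", "c1");
  let leaked = null;
  try {
    leaked = await getSystemPrompt("tenant-a", "c2"); // valid ID, wrong tenant
  } catch {
    // expected: the tenant-scoped lookup finds nothing and throws
  }
  return { own, leaked };
}

checkIsolation().then((result) => console.log(result));
```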

Multi-Tenancy Checklist

When adding any new feature, I run through this checklist:

Does every DB query include tenantId?
Does every Redis key include tenantId?
Does every job queue include tenantId in job data?
Does every log line include tenantId for debugging?
Does this feature respect per-tenant billing limits?
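Two of those checks get easier when the helpers make tenantId mandatory. A minimal sketch with hypothetical tenantKey and tenantLogger helpers: Redis keys and structured log lines are built from tenantId, and both helpers throw if it is missing.

```javascript
// Hypothetical helpers: every Redis key and every log line is built from a tenantId.
function tenantKey(tenantId, ...parts) {
  if (!tenantId) throw new Error("tenantKey requires a tenantId");
  return ["tenant", tenantId, ...parts].join(":");
}

function tenantLogger(tenantId, sink = console.log) {
  if (!tenantId) throw new Error("tenantLogger requires a tenantId");
  // tenantId is written after the caller's fields so it can't be overridden
  const log = (level, msg, fields = {}) =>
    sink(JSON.stringify({ level, msg, ...fields, tenantId, ts: new Date().toISOString() }));
  return {
    info: (msg, fields) => log("info", msg, fields),
    error: (msg, fields) => log("error", msg, fields),
  };
}

// Usage
const statsKey = tenantKey("tenant-a", "campaign", "c1", "stats"); // "tenant:tenant-a:campaign:c1:stats"
tenantLogger("tenant-a").info("campaign started", { campaignId: "c1" });
```

As with tenantQuery, tenantId is written after the caller's fields, so a stray field can't relabel a log line as another tenant's.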

See the full AI platform architecture at buildbysandeep.dev