Postgres Schema Design for SaaS: Patterns That Age Well

Q: Should I use UUIDs or auto-incrementing integers for SaaS primary keys?

Use bigint identity columns for internal tables, since they are compact and index tightly. Use UUIDv7 for anything exposed in URLs or APIs because it is opaque, non-enumerable, and time-ordered to preserve index locality. Avoid random UUIDv4 as a primary key on high-write tables, as random inserts fragment B-tree indexes and increase write amplification.

Q: Are soft deletes worth the complexity?

For aggregate roots like accounts and documents, yes, they enable undo, support recovery, and audit trails. Use a nullable deleted_at timestamptz, not a boolean, and add partial unique indexes filtered on deleted_at IS NULL. Hard-delete cheap leaf rows instead. Pair soft deletes with a separate purge job so retention deadlines and erasure requests are honored.

Q: How do I change a Postgres schema without downtime?

Use the expand, migrate, contract pattern across three deploys: add the new shape as nullable, backfill in batches while dual-writing, then drop the old shape. Create indexes CONCURRENTLY, add constraints as NOT VALID then VALIDATE separately, and set a lock_timeout so a stuck migration fails fast instead of freezing the table.

Most SaaS schemas don't fail on day one. They fail in year two, when a "quick" column rename locks a hot table for nine minutes, when a deleted account turns out to have left orphaned rows everywhere, or when a tenant sees another tenant's invoice. Postgres is forgiving enough to let you ship a sloppy schema fast, and unforgiving enough to make you pay interest on it forever. This post collects the patterns we reach for at CodeAustral when we design a schema we expect to live for a decade — pragmatic defaults, not academic purity.

Choose IDs You Won't Regret

Your primary key shows up in URLs, logs, foreign keys, and every join. Changing it later is a migration nightmare, so decide deliberately.

Bigint identity (GENERATED ALWAYS AS IDENTITY) is the boring, correct default for internal tables. It's compact, sorts naturally, and indexes tightly. Use it when the ID never leaves your trust boundary.
UUIDv7 is our default for anything exposed externally (API resources, public URLs, sync clients). Unlike random UUIDv4, v7 is time-ordered, so it doesn't shred B-tree locality or bloat your indexes with random inserts. Postgres 18 ships uuidv7() natively; on older versions use an extension or generate it in the app.
Avoid exposing sequential bigints in public URLs — they leak volume and enable enumeration. Avoid UUIDv4 as a primary key on high-write tables for the index-locality reasons above.

A common compromise: a bigint id as the internal PK and a separate public_id uuid (v7) for the API. You get tight internal joins and an opaque external handle.

create table invoices (
  id          bigint generated always as identity primary key,
  public_id   uuid not null default uuidv7(),
  account_id  bigint not null references accounts(id),
  ...
);
create unique index invoices_public_id_key on invoices (public_id);

Soft Deletes Without the Footguns

Most SaaS products need "deleted but recoverable" semantics — for undo, support recovery, and audit. The naive is_deleted boolean is the wrong shape. Use a nullable timestamp instead:

deleted_at timestamptz

deleted_at IS NULL means live; a value means when it died. This gives you the retention window for free and reads more honestly.

The real footgun is that soft deletes quietly break uniqueness and foreign keys. Two patterns keep you honest:

create unique index users_email_live on users (email) where deleted_at is null;

Partial unique indexes so a "deleted" row doesn't block re-creating a live one:
A single filtered view or a query helper that every read path goes through, so nobody forgets the deleted_at is null predicate. Forgetting it once in a billing query is how customers get charged for cancelled subscriptions.

Be disciplined about scope: soft-delete the aggregate roots (accounts, projects, documents), hard-delete the cheap leaf rows. Soft-deleting everything turns every table into a graveyard you must filter forever. And when a real deletion deadline arrives (GDPR, contracts), have a separate hard-delete job that purges past the retention window — soft delete is not erasure.

Audit Columns Every Table Should Have

Add these to essentially every table from the start. Retrofitting them later means backfilling nulls and guessing at history.

created_at timestamptz not null default now()
updated_at timestamptz not null default now() — kept current by a trigger, not by trusting the app
created_by / updated_by referencing your actor (user or service), where it matters

Always use timestamptz, never timestamp. timestamp without time zone is a trap that silently stores wall-clock values and produces off-by-hours bugs the first time someone connects from a different zone. Store everything in UTC at the database boundary.

A one-time trigger keeps updated_at truthful regardless of which code path wrote the row:

create or replace function set_updated_at() returns trigger
language plpgsql as $$
begin
  new.updated_at = now();
  return new;
end $$;

create trigger trg_set_updated_at
  before update on invoices
  for each row execute function set_updated_at();

For tables where you need full change history — money movements, permission changes, anything compliance cares about — go beyond audit columns to an append-only audit log table or a temporal pattern. Don't try to reconstruct history from updated_at alone; it only ever tells you about the most recent write.

JSONB vs Columns: A Decision List

JSONB is the most overused feature in SaaS schemas. It's excellent in a narrow band and a liability outside it. Decide per field, not per table.

Reach for real columns when:

The field is queried, filtered, sorted, or joined on.
It has a constraint, a foreign key, or a default that matters.
It's part of the domain you'd describe out loud ("an invoice has a status and a total").
You'd want a migration to be visible in code review when its shape changes.

Reach for JSONB when:

The shape is genuinely open-ended or per-tenant (custom fields, form responses, third-party webhook payloads you store verbatim).
You're capturing a raw external document for audit and may never query inside it.
The data is read as a whole blob, not sliced.

The trap is using JSONB to dodge migrations on core domain data. Six months later you're writing data->>'status' = 'paid' everywhere, you can't add a foreign key, your indexes are awkward GIN indexes, and a typo'd key fails silently instead of erroring. If you do query inside JSONB, index it deliberately — a GIN index for containment queries, or an expression index on the specific extracted path you filter on:

create index invoices_meta_source on invoices ((metadata->>'source'));

A healthy schema is mostly typed columns with one or two metadata jsonb escape hatches for the genuinely unstructured parts.

Indexes That Match How You Actually Query

Indexes are where good schemas earn their keep. A few rules that age well:

Index every foreign key column. Postgres does not do this automatically, and an unindexed FK turns parent deletes and joins into sequential scans. This is the single most common missing index we find in audits.
Match composite index column order to your query predicates. An index on (account_id, created_at) serves "this account's recent rows" beautifully; reversed, it doesn't.
Use partial indexes for skewed queries. If 99% of rows are status = 'archived' and you only query the active 1%, where status <> 'archived' makes the index tiny and fast.
Add covering indexes (`INCLUDE`) for hot read paths so the query returns from the index without touching the heap.
Always create indexes `CONCURRENTLY` in production. A plain CREATE INDEX takes an exclusive lock and blocks writes for the duration on a large table.

Resist the urge to index everything. Every index slows writes and consumes space. Index for the queries you measurably run, drop the ones pg_stat_user_indexes shows are never scanned, and revisit after real traffic rather than guessing up front.

Migration Discipline That Survives Growth

Schema quality is mostly a process problem, not a design problem. The patterns below keep migrations from becoming outages.

Expand, migrate, contract

Never do a breaking schema change in one step on a live system. Split it into three deploys:

Expand — add the new column/table, nullable, with no constraint. Backwards compatible.
Migrate — backfill data in batches and start dual-writing from the app.
Contract — once everything reads the new shape, drop the old column and tighten constraints.

This is how you rename a column or change a type without a maintenance window.

Know which operations lock

Postgres has made many DDL operations cheap, but the sharp edges remain:

Adding a column with a volatile default or rewriting a type still rewrites the whole table under a heavy lock. Adding a nullable column, or one with a constant default, is fast (metadata-only since PG 11).
Adding a NOT NULL constraint scans the table. Add it as NOT VALID first, then VALIDATE CONSTRAINT separately, which takes a weaker lock.
Set lock_timeout in your migration sessions so a migration that can't get its lock fails fast instead of queueing behind it and freezing the whole table.

Treat migrations as forward-only, reviewed code

Migrations live in version control, run in CI against a production-like copy, and are never edited after they ship. We don't rely on down migrations in production — rolling forward with a new migration is safer than reversing one. Keep each migration small and single-purpose so a failure is easy to reason about.

Multi-Tenancy and Row-Level Security

For B2B SaaS, the most expensive bug class is cross-tenant data leakage. Decide your isolation model early.

Shared schema, `tenant_id` column — simplest, scales well, and what we use by default. Every tenant-scoped table carries a tenant_id, and it's the leading column of most indexes.
Schema-per-tenant — stronger isolation, painful past a few hundred tenants (migrations fan out across every schema).
Database-per-tenant — for heavily regulated or very large customers only.

With the shared-schema model, Row-Level Security (RLS) is your safety net, not your only line of defense. The application should still scope queries by tenant_id; RLS guarantees that a forgotten WHERE clause can't leak data anyway.

alter table invoices enable row level security;

create policy tenant_isolation on invoices
  using (tenant_id = current_setting('app.tenant_id')::bigint);

Your connection layer sets app.tenant_id per request (SET LOCAL inside the transaction). Two cautions: the role that owns the table bypasses RLS unless you also set FORCE ROW LEVEL SECURITY, so your application should connect as a non-owner role; and test the policies as part of your suite, because a wrong policy fails open in ways that are easy to miss.

A Schema Checklist That Ages Well

Before a new table ships, we run it past a short list:

IDs decided: bigint internal, UUIDv7 for anything public.
created_at / updated_at (timestamptz) present, updated_at trigger attached.
deleted_at where recoverability matters; partial unique indexes account for it.
Every foreign key column indexed; composite indexes ordered to real queries.
JSONB used only for genuinely unstructured data, with a deliberate index if queried.
tenant_id present and RLS policy written for tenant-scoped tables.
The migration is expand/contract-safe and uses CONCURRENTLY / NOT VALID where needed.

None of this is exotic. It's the difference between a schema you extend calmly for years and one you're afraid to touch.

Frequently Asked Questions

Should I use UUIDs or auto-incrementing integers for SaaS primary keys?

Use bigint identity columns for internal tables — they're compact and index tightly. Use UUIDv7 for anything exposed in URLs or APIs, since it's opaque, non-enumerable, and time-ordered to preserve index locality. Avoid random UUIDv4 as a primary key on high-write tables, as random inserts fragment B-tree indexes and increase write amplification.

Are soft deletes worth the complexity?

For aggregate roots like accounts and documents, yes — they enable undo, support recovery, and audit trails. Use a nullable deleted_at timestamptz, not a boolean, and add partial unique indexes filtered on deleted_at IS NULL. Hard-delete cheap leaf rows instead. Pair soft deletes with a separate purge job so retention deadlines and erasure requests are actually honored.

When should I store data in JSONB instead of columns?

Use JSONB only for genuinely open-ended data: per-tenant custom fields, raw webhook payloads, or form responses read as a whole. Use typed columns for anything you filter, sort, join, or constrain on. Using JSONB to avoid migrations on core domain data backfires — you lose foreign keys, get awkward indexing, and replace loud schema errors with silent typos.

How do I change a Postgres schema without downtime?

Use the expand/migrate/contract pattern across three deploys: add the new shape as nullable, backfill in batches while dual-writing, then drop the old shape. Create indexes CONCURRENTLY, add constraints as NOT VALID then VALIDATE separately, and set a lock_timeout so a stuck migration fails fast instead of freezing the table.

Is row-level security enough for multi-tenant isolation?

RLS is a safety net, not your only defense. Keep scoping queries by tenant_id in the application; RLS ensures a forgotten WHERE clause can't leak across tenants. Connect as a non-owner role (owners bypass RLS unless you FORCE ROW LEVEL SECURITY), set the tenant context with SET LOCAL per transaction, and test policies, since a wrong policy fails open.

Working with CodeAustral

We design and build SaaS platforms where the schema has to survive real growth — multi-tenant billing systems, AI products, and restaurant tech among them. If you're starting a new product or wrestling with a schema that's becoming hard to change, send us a brief at codeaustral.com/contact and we'll help you get the foundations right before they calcify.