Flux v2 architecture

This page explains the architecture of Flux v2 shared (pooled) projects. It is the deepest page in the docs.

If you only want to make your first successful request, start with:

Return here once you want the deeper system model — the invariants the gateway protects, what the bridge JWT actually carries, where tenant isolation begins and ends, and what Flux deliberately does not promise.

What you will learn

  • The mental model: pooled infrastructure with logically isolated tenants
  • What your application sees, and what it does not (no static keys on v2)
  • How a request travels from your app to a tenant schema, step by step
  • Where the trust pivot is, and why one component is a security control
  • The operational behavior — rate limits, scaling, PgBouncer constraints — that shapes how v2 behaves under load
  • What v2 explicitly does not promise, and when v1 dedicated is the better fit

1. Mental model

Flux v2 is:

  • Pooled infrastructure — many projects share one PostgreSQL cluster and one PostgREST pool.
  • Schema-isolated tenants — each project owns a deterministic Postgres schema and a deterministic Postgres role; nothing in those names is derived from your slug.
  • JWT-authenticated requests — there are no anonymous or service keys exposed on the public API. Your application sends a JWT signed with the project's secret; the Flux gateway verifies it.
  • PostgREST-backed APIs — your tables become HTTP resources via PostgREST. The contract is HTTP and PostgreSQL, not a Flux-proprietary client SDK.
  • Operationally shared, logically isolated — the cluster and the pool are shared. The data, the role, and the schema are not.

The whole architecture exists to make those five sentences true at the same time.

What this page leaves for later

Sections 1–5 name PgBouncer and Redis only in passing and do not explain Traefik, internal pool sharding, Redis cache keys, or operational scaling. Those exist and matter, but the reader who needs them will have read the mental model first. They are covered from §6 Internal architecture onward.

2. What your app sees

Your app has three points of contact with v2:

  1. The Service URL. A public HTTPS hostname for your project, of the form https://api--<slug>--<hash>.<base-domain>. This is where every request goes. See Service URLs.
  2. A JWT in the Authorization header. Your app signs an HS256 JWT with the project's jwt_secret (or asks an IdP like Clerk or Auth.js to sign one with that secret). Every request carries Authorization: Bearer <token>. See JWT authentication.
  3. JSON in and out, via PostgREST. Tables become resources at /<table>; query parameters filter, sort, and project. The shape is PostgREST's published interface, not something Flux wraps.

What your app does not see:

  • No static anon or service keys on the public internet. Pooled stacks do not expose long-lived database-equivalent secrets to browsers. The trust pivot is the gateway, not a key embedded in your bundle.
  • No PostgREST hostname. PostgREST is not publicly reachable on v2. Only the gateway has a network path to it.
  • No internal JWT. The token your app signs is the only one your app handles. The gateway issues a separate, short-lived bridge JWT for the internal hop to PostgREST; you never see it.

The first of these differences is the one most readers coming from other BaaS platforms get wrong. There is no anon key to leak.

3. Authentication flow

A complete v2 request, end to end:

txt
Your app
  ↓ Authorization: Bearer <project JWT, HS256 with jwt_secret>
Flux gateway
  ├─ resolves Host header to a tenant_id (system DB; Redis as read-through cache)
  ├─ verifies the project JWT against the project's jwt_secret
  ├─ enforces the per-tenant rate limit (Redis, fail-open)
  └─ mints a short-lived bridge JWT { role, tenant_id, exp }
  ↓ private network only
PostgREST pool
  ├─ verifies the bridge JWT against PGRST_JWT_SECRET
  └─ SET ROLE <role from the bridge JWT> on the connection
PgBouncer (transaction pooling)
PostgreSQL shared cluster
  └─ search_path is t_<shortid>_api; only that schema is visible to this role

Three signing keys are in play:

| Signs | Key material | Held by |
| --- | --- | --- |
| Project JWT (your app's token) | The project's jwt_secret (per-project, in flux-system) | Your app, the gateway |
| Bridge JWT (gateway → PostgREST) | The pool's PGRST_JWT_SECRET | The gateway, PostgREST pool |
| Admin / service tokens (control plane) | A separate signing key | Control plane only — never on the public path |

Apps sign with the per-project secret. Apps never see the pool secret. The two are deliberately separate so that compromising one tenant's secret does not let an attacker forge tokens accepted by the pool, and so that the pool can be rotated independently of every tenant.
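The separation between the two secrets can be sketched as follows. This is a minimal illustration using Node's built-in crypto module, not the gateway's actual code: the function names, the TenantRecord shape, and the 60-second default TTL are assumptions made for the example. The point it shows is that the bridge token's role and tenant_id come from the resolved tenant record, never from claims the client supplied.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shape of what the gateway resolves for a hostname.
type TenantRecord = { tenantId: string; role: string; jwtSecret: string };

function hs256(secret: string, input: string): Buffer {
  return createHmac("sha256", secret).update(input).digest();
}

function encode(value: object): string {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

// Produce a compact HS256 JWT for the given payload.
function sign(secret: string, payload: object): string {
  const input = `${encode({ alg: "HS256", typ: "JWT" })}.${encode(payload)}`;
  return `${input}.${hs256(secret, input).toString("base64url")}`;
}

// Return the payload if the signature and expiry check out, otherwise null.
function verify(secret: string, token: string, now: number): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [h, p, s] = parts;
  try {
    const header = JSON.parse(Buffer.from(h, "base64url").toString("utf8"));
    if (header.alg !== "HS256") return null; // reject "none" and any other algorithm
    const expected = hs256(secret, `${h}.${p}`);
    const actual = Buffer.from(s, "base64url");
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
    const payload = JSON.parse(Buffer.from(p, "base64url").toString("utf8"));
    if (typeof payload.exp !== "number" || payload.exp <= now) return null;
    return payload;
  } catch {
    return null; // malformed token
  }
}

// Verify with the per-project secret, then mint with the pool secret.
// Client-supplied claims are deliberately not copied into the bridge token.
function exchangeForBridgeJwt(
  tenant: TenantRecord,
  projectToken: string,
  poolSecret: string,
  now: number,
  ttlSeconds: number = 60, // assumed default
): string | null {
  if (verify(tenant.jwtSecret, projectToken, now) === null) return null;
  return sign(poolSecret, { role: tenant.role, tenant_id: tenant.tenantId, exp: now + ttlSeconds });
}
```

In this sketch a project token that claims role "postgres" still yields a bridge token carrying the tenant's own role, and a bridge token does not verify against any project secret.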

For the security posture this implies, see Authentication model and Bridge JWTs.

4. Tenant isolation

Every v2 project has:

  • A tenant_id — a UUID, immutable, never exposed on the public path.
  • A shortid — the first 12 hex characters of tenant_id with hyphens removed. Deterministic; immutable.
  • A schema named t_<shortid>_api, e.g. t_5ecfa3ab72d1_api.
  • A role named t_<shortid>_role, e.g. t_5ecfa3ab72d1_role.
  • A search_path for that role set to its schema.
  • Grants that give the role USAGE on its schema and table privileges only there.

The slug — the human-readable name shown in the dashboard and in the Service URL — is never embedded in a schema or role name. Slugs can change; identifiers must not.
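As a rough illustration of the list above, the per-tenant objects could be created with statements like the ones this helper generates. This is a sketch under assumptions: the function name is hypothetical, and the exact set of table privileges Flux grants is not stated on this page, so the SELECT/INSERT/UPDATE/DELETE grant is only a plausible example.

```typescript
// Build illustrative provisioning statements for one tenant.
// The shortid is validated first so that only 12 lowercase hex characters
// can ever be interpolated into an identifier.
function tenantProvisioningSql(shortid: string): string[] {
  if (!/^[0-9a-f]{12}$/.test(shortid)) {
    throw new Error(`invalid shortid: ${shortid}`);
  }
  const schema = `t_${shortid}_api`;
  const role = `t_${shortid}_role`;
  return [
    `CREATE SCHEMA ${schema};`,
    `CREATE ROLE ${role};`,
    `ALTER ROLE ${role} SET search_path = ${schema};`,
    `GRANT USAGE ON SCHEMA ${schema} TO ${role};`,
    // Assumed privilege set; adjust to what the platform actually grants.
    `GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ${schema} TO ${role};`,
  ];
}
```

Because no statement mentions any other schema, the role ends up with no path to another tenant's tables, which is the property the isolation model depends on.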

Where the math lives

Twelve hex characters is 48 bits of entropy. By the birthday approximation, the probability of a shortid collision across 2,000 tenants on a single cluster is roughly 2000² / (2 × 2^48) ≈ 7.1 × 10⁻⁹. The full identifier t_<12 hex>_api is 18 bytes, well inside PostgreSQL's 63-byte limit.

txt
tenant_id  = "5ecfa3ab-72d1-4b2a-9a1d-0f1e2d3c4b5a"
uuid_hex   = "5ecfa3ab72d14b2a9a1d0f1e2d3c4b5a"
shortid    = "5ecfa3ab72d1"                      first 12 hex chars
schema     = "t_5ecfa3ab72d1_api"
role       = "t_5ecfa3ab72d1_role"
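The derivation above is simple enough to check in a few lines. The sketch below uses hypothetical function names; it derives the shortid, schema, and role from a tenant_id and evaluates the birthday approximation n² / (2 · 2^bits) for any tenant count.

```typescript
// Derive the deterministic identifiers from a tenant_id (UUID string).
function deriveTenantNames(tenantId: string) {
  const hex = tenantId.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("tenant_id must be a UUID");
  }
  const shortid = hex.slice(0, 12); // first 12 hex characters = 48 bits
  return { shortid, schema: `t_${shortid}_api`, role: `t_${shortid}_role` };
}

// Birthday-bound approximation for the chance that any two of `tenants`
// random identifiers of `bits` bits collide: p ≈ n² / (2 · 2^bits).
function collisionProbability(tenants: number, bits: number): number {
  return (tenants * tenants) / (2 * Math.pow(2, bits));
}

const names = deriveTenantNames("5ecfa3ab-72d1-4b2a-9a1d-0f1e2d3c4b5a");
console.log(names.schema, names.role, names.schema.length);
console.log(collisionProbability(2000, 48));
```

The approximation is only valid while the result is far below 1, which holds comfortably for the per-cluster tenant counts discussed on this page.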

What you are protected from on v2

  • Cross-tenant SQL access. A request that arrives with one tenant's bridge JWT cannot read another tenant's tables. The role's USAGE and SELECT grants exist only on its own schema.
  • Slug-based collisions. Renaming a project does not move data. Schema names trace to the immutable tenant_id, not the mutable slug.
  • Direct PostgREST access. PostgREST is not on the public internet on v2. The only network path is through the gateway.

What you are not protected from on v2

  • Cluster-level blast radius. v2 deliberately accepts that a misbehaving tenant can stress the shared cluster. Operational controls (rate limits, connection caps, statement timeouts) reduce this; they do not eliminate it.
  • Hard CPU isolation. v2 does not promise per-tenant CPU pinning. If you need that, you want v1 dedicated.
  • RLS by default. v2's baseline isolation is schema and role, not row-level security. RLS is an opt-in additional control your app adds; it is not the load-bearing boundary.

For the threat-model framing, see Threat model. For the structural layout, see Tenant isolation (architecture).

5. Why no static keys

Most BaaS platforms expose a static anon key (and sometimes a service key) in the browser. Flux v2 does not. This is deliberate; the alternative was considered and rejected.

A static key on the public internet is a credential. Once it is in a browser bundle, every reader has it; rotating it breaks every deployed client; and the role it grants is bounded only by the policies attached to it. PostgREST and PostgreSQL can carry that model, but it concentrates the trust decision in a piece of string sitting in your repo and your CDN.

v2 moves the trust pivot to the gateway instead:

  • The token your app sends is signed with your project's secret. Compromising one project's secret does not leak any other project's data.
  • The token is verified server-side, on every request, by code that owns the tenant resolution. Forgery requires the project secret, not just access to network traffic.
  • The role assigned downstream is chosen by the gateway from the verified token, not by a client-supplied claim PostgREST trusts blindly.
  • The internal step uses a short-lived bridge JWT that never leaves the gateway → PostgREST hop. There is no long-lived database-equivalent credential sitting on the public internet.

This is the no-shim policy in concrete form: Flux fixes the capability gap (per-tenant verification at the edge) at the Flux layer, instead of asking every application to ship around an exposed key. You can still reach for RLS when your data model needs row-by-row checks; the platform-level boundary is one layer up and does not depend on your policies being correct.

6. Internal architecture

This section names the parts the previous five did not. The reader who only needs to ship can stop here and come back later; the reader who needs to debug, plan capacity, or evaluate trust will want what follows.

Control plane vs data plane

The system splits cleanly:

| Plane | What it owns | Components |
| --- | --- | --- |
| Control plane | Decides what exists. Project records, tenant identifiers, secrets, schema migrations, lifecycle. Read-only from the request hot path. | flux-system Postgres, apps/dashboard (UI + API), @flux/core (engine registry), @flux/cli |
| Data plane | Serves application traffic. | @flux/gateway, PostgREST pool, PgBouncer, the shared PostgreSQL cluster, Redis (cache and best-effort telemetry) |

The control plane writes to flux-system. The gateway reads from it (with Redis as a read-through cache). The gateway never writes to flux-system and never participates in lifecycle.

Engine abstraction

v1 dedicated and v2 shared coexist in one codebase. The choice is not a fork. A project record carries an explicit mode field (v1_dedicated or v2_shared); the engine registry in @flux/core selects the implementation by mode.

A canonical project row:

json
{
  "tenant_id": "5ecfa3ab-72d1-4b2a-9a1d-0f1e2d3c4b5a",
  "slug": "bloom-atelier",
  "mode": "v2_shared",
  "plan": "free",
  "status": "active"
}

mode is always explicit. Provisioning may default new projects to v2_shared, but logic must not infer mode from plan alone. Changing a project's mode is an audited migration action (Pooled → dedicated migrate), not an implicit flag flip.

Both engines share an interface — provisionProject, deleteProject, suspendProject, getApiUrl, getCredentials, setEnv, listEnv, importSql. v2 adds createTenantSchema, createTenantRole, assignTenantMetadata. v1 adds container and Docker-network management. The dashboard, the CLI, and the gateway depend only on the interface; neither plane needs to know which engine a project uses unless it has engine-specific work to do.
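A registry that selects by mode might look like the sketch below. The method names provisionProject and getApiUrl come from the interface listed above, but their signatures, the ProjectRecord type, and the EngineRegistry class are assumptions for illustration; the real @flux/core types may differ.

```typescript
type Mode = "v1_dedicated" | "v2_shared";

// Mirrors the canonical project row shown earlier.
interface ProjectRecord {
  tenant_id: string;
  slug: string;
  mode: Mode;
  plan: string;
  status: string;
}

// A subset of the shared engine interface; signatures are assumed.
interface Engine {
  readonly mode: Mode;
  provisionProject(project: ProjectRecord): Promise<void>;
  getApiUrl(project: ProjectRecord): string;
}

class EngineRegistry {
  private readonly engines = new Map<Mode, Engine>();

  register(engine: Engine): void {
    this.engines.set(engine.mode, engine);
  }

  // Selection reads project.mode only. The plan field is never consulted,
  // so an "enterprise" plan cannot silently switch engines.
  forProject(project: ProjectRecord): Engine {
    const engine = this.engines.get(project.mode);
    if (engine === undefined) {
      throw new Error(`no engine registered for mode ${project.mode}`);
    }
    return engine;
  }
}
```

Callers such as the dashboard or CLI would hold only the Engine type, which is what lets both engines live in one codebase without branching on mode at every call site.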

Gateway responsibility boundary

The gateway is small on purpose. It:

  • Resolves hostname → tenant_id (system DB; Redis as cache).
  • Verifies the project JWT against the project's jwt_secret.
  • Enforces the per-tenant rate limit; returns HTTP 429 without forwarding when the limit is exceeded.
  • Mints the short-lived bridge JWT and proxies the request to a PostgREST instance on the private network.
  • Best-effort INCR activity:{tenant_id} for activity tracking.

It does not:

  • Hold business logic.
  • Write to any database.
  • Manage tenant lifecycle.
  • Become a "mini control plane."

The reason for that discipline is in §5 Why no static keys and in the failure-boundary discussion below: the gateway is a security control. The smaller and more focused it is, the easier it is to test, audit, and reason about.

Custom domains

Tenants can map custom hostnames onto their project. The control plane stores (hostname → tenant_id) rows in flux-system.domains; the gateway resolves a request's Host header against that table (with Redis caching). On domain create, update, or delete, the control plane explicitly evicts hostname:{old_host} from Redis — TTL expiry alone produces stale routing for the TTL window, which is the wrong default for a routing decision.
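The read-through lookup and the explicit eviction can be sketched together. Everything here is illustrative: the Cache interface stands in for a Redis client, DomainLookup stands in for a query against flux-system.domains, and the 60-second TTL is an assumed value.

```typescript
// Minimal cache surface; a Redis client would be adapted to this shape.
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

// Stand-in for the authoritative lookup in flux-system.domains.
type DomainLookup = (host: string) => Promise<string | null>;

// Gateway side: try the cache, fall back to the database on miss or error.
async function resolveTenantId(
  host: string,
  cache: Cache,
  lookup: DomainLookup,
  ttlSeconds: number = 60, // assumed TTL
): Promise<string | null> {
  const normalized = host.toLowerCase();
  const key = `hostname:${normalized}`;
  try {
    const cached = await cache.get(key);
    if (cached !== null) return cached;
  } catch {
    // Cache unavailable: continue to the authoritative lookup.
  }
  const tenantId = await lookup(normalized);
  if (tenantId !== null) {
    try {
      await cache.set(key, tenantId, ttlSeconds);
    } catch {
      // Populating the cache is best-effort.
    }
  }
  return tenantId;
}

// Control-plane side: evict immediately after a domain row changes,
// instead of waiting for the TTL to expire.
async function evictHostname(oldHost: string, cache: Cache): Promise<void> {
  await cache.del(`hostname:${oldHost.toLowerCase()}`);
}
```

Without the eviction call, a remapped hostname would keep routing to the previous tenant until the cached entry expired.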

Redis is cache and telemetry, never authoritative

| Key | Purpose | Authoritative? |
| --- | --- | --- |
| hostname:{host} | Tenant resolution cache | No — flux-system.domains is truth on miss |
| rate:{tenant_id} | Per-tenant rate-limit counter | No — limit is policy; counter is best-effort |
| activity:{tenant_id} | Heartbeat for last_accessed_at flush | No — Postgres holds the durable value |

Wiping Redis must not affect data correctness. The gateway must never block on Redis; if Redis is unavailable, the gateway falls back to the database (for resolution) or fails open (for rate limiting). The non-blocking Redis pattern is a load-bearing rule in the data-plane code:

typescript
try {
  await redis.incr(key);
} catch {
  // swallow; log a metric if needed; continue with the primary request
}

This applies everywhere Redis touches the request path. A Redis timeout that propagates into a request handler is a bug.
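One way to make the rule hard to violate is to route every Redis call on the request path through a small wrapper that swallows errors and also bounds the wait. The helper below is a sketch, not part of Flux: the name bestEffort, the timeout parameter, and the onError hook are all assumptions.

```typescript
// Run a cache or telemetry operation without letting it fail or stall the caller.
// Resolves to the operation's value, or undefined on error or timeout.
async function bestEffort<T>(
  op: () => Promise<T>,
  timeoutMs: number,
  onError: (err: unknown) => void = () => {},
): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late rejection from op() is still
    // observed by Promise.race, so it does not surface as unhandled.
    return await Promise.race([op(), timeout]);
  } catch (err) {
    onError(err); // e.g. increment a metric; never rethrow
    return undefined;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A call site would then read something like `await bestEffort(() => redis.incr(key), 50)`, where the 50 ms budget is an arbitrary example value.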

Engine selection by tier

| Tier | Engine | Isolation | Compliance posture |
| --- | --- | --- | --- |
| Free | v2_shared | schema + role | not a compliance boundary |
| Pro | v2_shared | schema + role | not a compliance boundary |
| Enterprise | v1_dedicated | container | SOC 2 / HIPAA-ready |

Tiers map to engines, but the docs and the catalog use the engine name when precision matters. Marketing tiers can rename; the engine vocabulary is stable.

7. Operational behavior

This is the section that determines how v2 behaves under load and what you should expect during a busy weekend.

Rate limiting

  • Algorithm. Fixed window per tenant per minute.
  • Default limits. 20 req/min on Free, 100 req/min on Pro. Configurable per project.
  • Over-limit response. HTTP 429 from the gateway; the request is not forwarded to PostgREST.
  • Redis unavailable. Default is fail-open — allow the request, log a metric. Availability over strict enforcement during cache outages. A future Pro+ option may opt into fail-closed or hybrid enforcement; the default does not.
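The behavior in the list above can be sketched as a fixed-window check with a fail-open fallback. The Counter interface is a hypothetical stand-in for a Redis INCR that sets a TTL when the key is first created; the window length and the Decision shape are assumptions for the example.

```typescript
// Stand-in for Redis: atomically increments the key, applying the TTL
// only when the key is first created, and returns the new count.
interface Counter {
  incrWithTtl(key: string, ttlSeconds: number): Promise<number>;
}

type Decision = {
  allowed: boolean;
  reason: "under_limit" | "over_limit" | "fail_open";
};

// Fixed window: the counter key lives for one window and then expires,
// which resets the count for the next window.
async function checkRateLimit(
  counter: Counter,
  tenantId: string,
  limit: number,
  windowSeconds: number = 60,
): Promise<Decision> {
  const key = `rate:${tenantId}`;
  try {
    const count = await counter.incrWithTtl(key, windowSeconds);
    return count <= limit
      ? { allowed: true, reason: "under_limit" }
      : { allowed: false, reason: "over_limit" }; // gateway answers HTTP 429
  } catch {
    // Counter unavailable: allow the request and let the caller log a metric.
    return { allowed: true, reason: "fail_open" };
  }
}
```

A fixed window permits short bursts around a window boundary, which is the usual price of its simplicity compared with sliding-window or token-bucket schemes.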

Activity tracking is lossy by design

The system needs to know whether a tenant was active in the last few minutes; it does not need exact per-request timestamps. Each request does a best-effort INCR activity:{tenant_id} with a 60-second TTL; a flush worker every 30–60 seconds reads the active set and updates projects.last_accessed_at. Redis loss costs at most one window of activity; Postgres remains the durable record.

The downstream consumer is the idle-reaping job (reapIdleProjects), which suspends or reclaims resources from projects inactive for a configured threshold. That feeds cost control on the shared tier.

PgBouncer transaction pooling — what it breaks

PostgREST connects through PgBouncer in transaction pooling mode. A given Postgres backend is held only for the duration of a single transaction, then returned to the pool. Some Postgres patterns assume session continuity and break:

| Pattern | Broken by transaction pooling? |
| --- | --- |
| CREATE TEMP TABLE across requests | yes |
| Advisory locks across requests | yes |
| Session-level SET persisting beyond a transaction | yes; SET LOCAL, which is scoped to the current transaction, is unaffected |
| Named prepared statements | yes — use unnamed |
| LISTEN / NOTIFY | yes — use a dedicated non-pooled connection if you need it |

If you write SQL that depends on session-scoped state across requests on v2, the symptoms are non-deterministic and hard to reproduce. Document this for your application's authors.

Per-role guardrails at provisioning time

Every tenant role is provisioned with cluster-friendly limits:

sql
ALTER ROLE t_5ecfa3ab72d1_role SET statement_timeout = '5s';
ALTER ROLE t_5ecfa3ab72d1_role CONNECTION LIMIT 5;

The values are adjusted by tier. These are cheap controls that prevent one tenant from monopolizing the cluster. Treat them as defaults, not absolutes: a Pro project that legitimately needs longer queries can have its limit raised.

Portable tenant backups

flux backup create on v2 runs pg_dump -Fc --schema=t_<shortid>_api --no-owner --no-acl against the shared cluster. The artifact contains only that tenant's schema — no other tenants, no cluster-global objects.
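For readers who want to reproduce a comparable export by hand, the argument list can be assembled as below. The helper name and the output-file parameter are assumptions; the first four flags are the ones quoted above, and --file is pg_dump's standard output option. Connection settings are left to the usual libpq environment variables.

```typescript
// Build the pg_dump argument list for a single-tenant, custom-format export.
function tenantExportArgs(shortid: string, outputFile: string): string[] {
  if (!/^[0-9a-f]{12}$/.test(shortid)) {
    throw new Error(`invalid shortid: ${shortid}`);
  }
  return [
    "-Fc",                          // custom format, restorable with pg_restore
    `--schema=t_${shortid}_api`,    // only this tenant's schema
    "--no-owner",                   // do not emit ownership commands
    "--no-acl",                     // do not emit GRANT/REVOKE
    `--file=${outputFile}`,
  ];
}
```

Passing the result as an argument array to a process spawner, rather than joining it into a shell string, avoids quoting problems with the output path.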

This is a portable tenant export, restorable into any Postgres for migrations or off-platform analysis. It is not a substitute for cluster-level disaster recovery (physical backups, WAL archiving, PITR), which is a platform operations concern. Verification (flux backup verify) runs the same pg_restore smoke test in a disposable Postgres container that v1 uses.

Catalog rows for these backups carry kind = tenant_export, distinct from the project_db rows v1 produces. The full backup contract — trust states, what backups guarantee, how the destructive-action gate works — lives in Backups.

Scaling

  • Per-cluster planning range. 500–2,000 tenants per shared Postgres cluster. Beyond that, pg_namespace catalog growth, migration fan-out time, and per-role overhead start to bite.
  • Horizontal scaling. When a cluster approaches capacity, provision a new shared cluster and assign new tenants there via flux-system.projects.cluster_id. Existing tenants stay on their assigned cluster indefinitely.
  • PostgREST pool sharding. When a single PostgREST pool becomes a bottleneck, shard by tenant hash or by tier (Free vs Pro on separate pools). The gateway's upstream lookup chooses the pool; this is an operational change, not an application-visible one.
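Sharding by tenant hash can be as small as the sketch below. The choice of FNV-1a and the plain modulo are assumptions made for illustration; the page does not say which hash the gateway's upstream lookup uses.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
// For ASCII input such as a UUID this matches the byte-wise definition.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

// Map a tenant to one of the configured PostgREST pools, deterministically.
function choosePool(tenantId: string, pools: readonly string[]): string {
  if (pools.length === 0) {
    throw new Error("no PostgREST pools configured");
  }
  return pools[fnv1a(tenantId) % pools.length];
}
```

Plain modulo reassigns many tenants when the number of pools changes; if pools are added often, consistent hashing or an explicit per-tenant assignment column would be a steadier choice.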

Vertical scaling of a single cluster beyond the planning range is not the answer. The model is many right-sized clusters, not a few enormous ones.

8. Failure boundaries

The honest version of "what can go wrong on v2" and "what the platform does not promise."

The top risk: JWT mis-issuance

The single most consequential failure mode on v2 is a gateway-issued bridge JWT carrying the wrong role or tenant_id. PostgREST will trust that JWT and SET ROLE accordingly; PostgreSQL will then serve whatever that role is allowed to see. There is no further database-level guard unless an application has added RLS for its own reasons — and v2's baseline does not.

The defense is gateway correctness:

  • The gateway is the only code path that issues runtime JWTs (invariant 3 below). No other service mints a token PostgREST will accept on this path.
  • tenant_id is validated against flux-system on every resolution; the bridge JWT reflects the database record, not a value the client supplied.
  • Bridge JWT TTL is short (1–5 minutes). A mis-issued token has a small window to do damage before it expires.
  • The gateway resolution path and host parser are tested against both flattened and dotted hostnames, and against tenant-renaming flows.

This is the failure mode that justifies treating the gateway as a security control rather than a proxy.

What v2 does not promise

  • RLS on by default. Schema and role separation are the load-bearing boundary on v2. RLS is an opt-in tool your application adds when its data model needs row-by-row checks.
  • Hard CPU or memory isolation. v2 is shared; cluster-level blast radius is the deliberate tradeoff. If you need hardware isolation, v1 dedicated is the right mode.
  • Compliance posture on shared infrastructure. SOC 2 and HIPAA workloads belong on v1_dedicated. v2 is not a compliance boundary.
  • Protection from compromised IdP keys. Your identity provider's signing keys remain in scope. If those keys leak, an attacker can mint tokens your gateway accepts; rotating the per-project jwt_secret is the response.
  • Strict rate limiting during Redis outages. The default is fail-open (availability over enforcement). A future Pro+ option may opt into fail-closed; today, an extended Redis outage makes rate limits advisory.

For the equivalent reader-facing framing, see Threat model.

Operational risks tracked in the open

| Risk | Severity | Note |
| --- | --- | --- |
| pg_namespace catalog bloat above 2,000 schemas per cluster | medium | Hard cluster cap; horizontal scaling plan above |
| Migration fan-out across many tenant schemas | medium | DDL fan-out is slow at scale; tooling is incremental |
| PostgREST pool reload at scale | low–medium | Rolling restart strategy needed to avoid dropped in-flight requests |
| Custom-domain cache eviction gaps | low | Mitigated by explicit eviction on domain ops |
| RLS opt-in for Pro+ | future | Not in v2 scope today; tracked as a future tier enhancement |
| Enterprise on shared infrastructure | future | Requires compliance review and explicit migration; not an automatic flag |

These are tracked openly because hiding them would not make them go away.

9. Migration to dedicated

If your project outgrows pooled or your compliance posture changes, flux migrate orchestrates a move from v2_shared to v1_dedicated. The control plane provisions the dedicated stack, dumps the tenant schema with pg_dump, restores it into the new database, and (on full cutover) flips the mode field. The Service URL shape is preserved; your app's environment variables may need updating if you rotated secrets during the move.

The migration is staged, dry-runnable, and explicit. See Pooled → dedicated migrate for the full procedure, the success checklist, and the troubleshooting notes.

A future tool may allow v1_dedicated → v2_shared migration. It is not available today and is not on a timeline.


Reference

The sections below are stable contracts the rest of this page builds on. They are written for the reader who wants to verify a claim without re-reading the prose, and for the Flux maintainers who treat them as code-review gates.

Invariants

These rules are unconditional. The Flux team treats violations as blocking review issues.

| # | Invariant |
| --- | --- |
| 1 | tenant_id (UUID) is immutable. Slug is UI-only and may change. |
| 2 | Postgres schema and role names are derived only from tenant_id via the deterministic shortid. Slug is never embedded in a schema or role identifier. |
| 3 | The gateway is the only issuer of bridge JWTs for tenant API traffic. |
| 4 | PostgREST is not publicly reachable. Only the gateway has a network path to it. |
| 5 | Redis is never authoritative. It is cache and best-effort telemetry. Wiping Redis must not affect data correctness. |
| 6 | Per-tenant schemas are never enumerated in PGRST_DB_SCHEMAS. Access is controlled by role grants, search_path, and JWT role. Listing tenant schemas there widens introspection surface and makes isolation depend on JWT claims alone. |

What Flux deliberately does not do

These are not bugs or future work. They are absences chosen on purpose.

| Pattern | Why Flux does not adopt it |
| --- | --- |
| Static anon / service keys on the public internet | One leaked key compromises every request that key authorizes; the gateway-verified JWT model is the correction. See §5 Why no static keys. |
| RLS enabled by default on every tenant schema | Speculative complexity; hurts query planning; not the load-bearing boundary on v2. RLS remains available where applications need it (RLS). |
| Extra abstraction layers (TenantService, RoutingService, etc.) inside @flux/core | Premature indirection without payoff at this scale. The engine interface is small; new abstractions earn their place. |
| Async job queues introduced before they are needed | Operational complexity before product-market fit. The control plane is imperative today; a reconcile loop will arrive when the operational model justifies it. |
| Redis errors propagating into the gateway hot path | Violates the fail-open contract; blocks legitimate requests for a cache outage. The non-blocking pattern is enforced. |
| mode inferred solely from plan | mode is the source of truth. plan may default new provisioning, but the engine must never be chosen by an implicit if (plan === "enterprise") check. |
| Gateway logic in @flux/core or DB logic in @flux/cli | Violates the control-plane / data-plane boundary. @flux/cli is a thin wrapper; @flux/gateway owns the request path. |

This is the no-shim policy made concrete: Flux refuses to add layers, defaults, or escape hatches that would let the platform appear smaller while making applications larger.

Repository layout

| Package / app | Layer | v2 role |
| --- | --- | --- |
| packages/core (@flux/core) | control plane | engine registry, canonical types, project orchestration |
| packages/engine-v1 (@flux/engine-v1) | engine | dedicated container execution strategy (v1_dedicated) |
| packages/engine-v2 (@flux/engine-v2) | engine | shared cluster execution strategy (v2_shared) |
| packages/gateway (@flux/gateway) | data plane | resolve → verify → rate-limit → bridge JWT → proxy |
| packages/cli (@flux/cli) | control plane | CLI surface; wraps @flux/core; no DB or gateway logic |
| packages/sdk (@flux/sdk) | client | tenant API client |
| apps/dashboard | control plane | UI + API routes; reads from flux-system |

This map is contributor-facing. App builders never need to install or import any of these packages.

Next steps