Flux v2 architecture
This page explains the architecture of Flux v2 shared (pooled) projects. It is the deepest page in the docs.
If you only want to make your first successful request, this page is the wrong place to start. Return here once you want the deeper system model — the invariants the gateway protects, what the bridge JWT actually carries, where tenant isolation begins and ends, and what Flux deliberately does not promise.
What you will learn
- The mental model: pooled infrastructure with logically isolated tenants
- What your application sees, and what it does not (no static keys on v2)
- How a request travels from your app to a tenant schema, step by step
- Where the trust pivot is, and why one component is a security control
- The operational behavior — rate limits, scaling, PgBouncer constraints — that shapes how v2 behaves under load
- What v2 explicitly does not promise, and when v1 dedicated is the better fit
1. Mental model
Flux v2 is:
- Pooled infrastructure — many projects share one PostgreSQL cluster and one PostgREST pool.
- Schema-isolated tenants — each project owns a deterministic Postgres schema and a deterministic Postgres role; nothing in those names is derived from your slug.
- JWT-authenticated requests — there are no anonymous or service keys exposed on the public API. Your application sends a JWT signed with the project's secret; the Flux gateway verifies it.
- PostgREST-backed APIs — your tables become HTTP resources via PostgREST. The contract is HTTP and PostgreSQL, not a Flux-proprietary client SDK.
- Operationally shared, logically isolated — the cluster and the pool are shared. The data, the role, and the schema are not.
The whole architecture exists to make those five sentences true at the same time.
What this page leaves for later
The first half of this page does not mention PgBouncer, Traefik, internal pool sharding, Redis cache keys, or operational scaling. Those exist and matter, but the reader who needs them will have read the mental model first. They appear from §6 Internal architecture onward.
2. What your app sees
Your app has three points of contact with v2:
- The Service URL. A public HTTPS hostname for your project, of the form `https://api--<slug>--<hash>.<base-domain>`. This is where every request goes. See Service URLs.
- A JWT in the `Authorization` header. Your app signs an HS256 JWT with the project's `jwt_secret` (or asks an IdP like Clerk or Auth.js to sign one with that secret). Every request carries `Authorization: Bearer <token>`; a minimal signing sketch appears at the end of this section. See JWT authentication.
- JSON in and out, via PostgREST. Tables become resources at `/<table>`; query parameters filter, sort, and project. The shape is PostgREST's published interface, not something Flux wraps.
What your app does not see:
- No static `anon` or `service` keys on the public internet. Pooled stacks do not expose long-lived database-equivalent secrets to browsers. The trust pivot is the gateway, not a key embedded in your bundle.
- No PostgREST hostname. PostgREST is not publicly reachable on v2. Only the gateway has a network path to it.
- No internal JWT. The token your app signs is the only one your app handles. The gateway issues a separate, short-lived bridge JWT for the internal hop to PostgREST; you never see it.
That last difference is the one most readers coming from other BaaS platforms get wrong. There is no anon key to leak.
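A minimal client sketch under stated assumptions — the service URL, the env var holding the secret, and the `role` claim value are illustrative, not Flux contracts; see JWT authentication for the claims your project actually requires:

```ts
import { SignJWT } from "jose";

// Assumptions: a hypothetical Service URL and an env var holding the
// project's jwt_secret. The "role" claim value is illustrative.
const SERVICE_URL = "https://api--bloom-atelier--abc123.flux.example";
const JWT_SECRET = new TextEncoder().encode(process.env.FLUX_JWT_SECRET!);

async function listRows(table: string) {
  // Sign a short-lived project JWT with the per-project secret.
  const token = await new SignJWT({ role: "authenticated" })
    .setProtectedHeader({ alg: "HS256" })
    .setExpirationTime("15m")
    .sign(JWT_SECRET);

  // PostgREST query parameters do the filtering and projection.
  const res = await fetch(`${SERVICE_URL}/${table}?select=*&limit=10`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}
```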
3. Authentication flow
A complete v2 request, end to end:
```
Your app
  ↓  Authorization: Bearer <project JWT, HS256 with jwt_secret>
Flux gateway
  ├─ resolves Host header to a tenant_id (system DB; Redis as read-through cache)
  ├─ verifies the project JWT against the project's jwt_secret
  ├─ enforces the per-tenant rate limit (Redis, fail-open)
  └─ mints a short-lived bridge JWT { role, tenant_id, exp }
  ↓  private network only
PostgREST pool
  ├─ verifies the bridge JWT against PGRST_JWT_SECRET
  └─ SET ROLE <role from the bridge JWT> on the connection
  ↓
PgBouncer (transaction pooling)
  ↓
PostgreSQL shared cluster
  └─ search_path is t_<shortid>_api; only that schema is visible to this role
```
Three signing keys are in play:
| Token | Key material | Held by |
|---|---|---|
| Project JWT (your app's token) | The project's jwt_secret (per-project, in flux-system) | Your app, the gateway |
| Bridge JWT (gateway → PostgREST) | The pool's PGRST_JWT_SECRET | The gateway, PostgREST pool |
| Admin / service tokens (control plane) | A separate signing key | Control plane only — never on the public path |
Apps sign with the per-project secret. Apps never see the pool secret. The two are deliberately separate so that compromising one tenant's secret does not let an attacker forge tokens accepted by the pool, and so that the pool can be rotated independently of every tenant.
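A sketch of that two-key handoff inside the gateway, assuming a `jose`-style API; everything beyond the `role`, `tenant_id`, and `exp` claims is illustrative:

```ts
import { SignJWT, jwtVerify } from "jose";

// Key 1: the per-project secret verifies what the app sent.
async function verifyProjectJwt(token: string, projectSecret: Uint8Array) {
  return jwtVerify(token, projectSecret, { algorithms: ["HS256"] });
}

// Key 2: the pool secret signs the internal hop. Apps never hold this key,
// so a leaked project secret cannot forge tokens the pool accepts.
async function mintBridgeJwt(role: string, tenantId: string, poolSecret: Uint8Array) {
  return new SignJWT({ role, tenant_id: tenantId })
    .setProtectedHeader({ alg: "HS256" })
    .setExpirationTime("2m") // short TTL; see §8 on mis-issuance
    .sign(poolSecret);
}
```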
For the security posture this implies, see Authentication model and Bridge JWTs.
4. Tenant isolation
Every v2 project has:
- A `tenant_id` — a UUID, immutable, never exposed on the public path.
- A `shortid` — the first 12 hex characters of `tenant_id` with hyphens removed. Deterministic; immutable.
- A schema named `t_<shortid>_api`, e.g. `t_5ecfa3ab72d1_api`.
- A role named `t_<shortid>_role`, e.g. `t_5ecfa3ab72d1_role`.
- A `search_path` for that role set to its schema.
- Grants that give the role `USAGE` on its schema and table privileges only there (see the DDL sketch after this list).
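A sketch of the provisioning DDL that list implies — identifiers follow the documented pattern, but the exact statements Flux runs are not published here, so treat this as illustrative:

```sql
-- Hypothetical provisioning for shortid 5ecfa3ab72d1.
CREATE ROLE t_5ecfa3ab72d1_role NOLOGIN;
CREATE SCHEMA t_5ecfa3ab72d1_api;

-- Confine the role to its own schema.
ALTER ROLE t_5ecfa3ab72d1_role SET search_path = t_5ecfa3ab72d1_api;
GRANT USAGE ON SCHEMA t_5ecfa3ab72d1_api TO t_5ecfa3ab72d1_role;
GRANT SELECT, INSERT, UPDATE, DELETE
  ON ALL TABLES IN SCHEMA t_5ecfa3ab72d1_api TO t_5ecfa3ab72d1_role;
ALTER DEFAULT PRIVILEGES IN SCHEMA t_5ecfa3ab72d1_api
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO t_5ecfa3ab72d1_role;
```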
The slug — the human-readable name shown in the dashboard and in the Service URL — is never embedded in a schema or role name. Slugs can change; identifiers must not.
Where the math lives
Twelve hex characters is 48 bits of entropy. By the birthday approximation, the probability of a shortid collision across 2,000 tenants on a single cluster is roughly 2000² / (2 × 2^48) ≈ 7.1 × 10⁻⁹. The full identifier t_<12 hex>_api is 18 bytes — well inside PostgreSQL's 63-byte limit.
```
tenant_id = "5ecfa3ab-72d1-4b2a-9a1d-0f1e2d3c4b5a"
uuid_hex  = "5ecfa3ab72d14b2a9a1d0f1e2d3c4b5a"   # hyphens removed
shortid   = "5ecfa3ab72d1"                        # first 12 hex chars
schema    = "t_5ecfa3ab72d1_api"
role      = "t_5ecfa3ab72d1_role"
```
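The same derivation as runnable code, shown in TypeScript for concreteness:

```ts
// Deterministic identifiers from the immutable tenant_id; the slug never appears.
function tenantIdentifiers(tenantId: string) {
  const shortid = tenantId.replace(/-/g, "").slice(0, 12); // first 12 hex chars
  return { shortid, schema: `t_${shortid}_api`, role: `t_${shortid}_role` };
}
```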
What you are protected from on v2
- Cross-tenant SQL access. A request that arrives with one tenant's bridge JWT cannot read another tenant's tables. The role's `USAGE` and `SELECT` grants exist only on its own schema.
- Slug-based collisions. Renaming a project does not move data. Schema names trace to the immutable `tenant_id`, not the mutable slug.
- Direct PostgREST access. PostgREST is not on the public internet on v2. The only network path is through the gateway.
What you are not protected from on v2
- Cluster-level blast radius. v2 deliberately accepts that a misbehaving tenant can stress the shared cluster. Operational controls (rate limits, connection caps, statement timeouts) reduce this; they do not eliminate it.
- Hard CPU isolation. v2 does not promise per-tenant CPU pinning. If you need that, you want v1 dedicated.
- RLS by default. v2's baseline isolation is schema and role, not row-level security. RLS is an opt-in additional control your app adds; it is not the load-bearing boundary.
For the threat-model framing, see Threat model. For the structural layout, see Tenant isolation (architecture).
5. Why no static keys
Most BaaS platforms expose a static anon key (and sometimes a service key) in the browser. Flux v2 does not. This is deliberate; the alternative was considered and rejected.
A static key on the public internet is a credential. Once it is in a browser bundle, every reader has it; rotating it breaks every deployed client; and the role it grants is bounded only by the policies attached to it. PostgREST and PostgreSQL can carry that model, but it concentrates the trust decision in a piece of string sitting in your repo and your CDN.
v2 moves the trust pivot to the gateway instead:
- The token your app sends is signed with your project's secret. Compromising one project's secret does not leak any other project's data.
- The token is verified server-side, on every request, by code that owns the tenant resolution. Forgery requires the project secret, not just access to network traffic.
- The role assigned downstream is chosen by the gateway from the verified token, not by a client-supplied claim PostgREST trusts blindly.
- The internal step uses a short-lived bridge JWT that never leaves the gateway → PostgREST hop. There is no long-lived database-equivalent credential sitting on the public internet.
This is the no-shim policy in concrete form: Flux fixes the capability gap (per-tenant verification at the edge) at the Flux layer, instead of asking every application to ship around an exposed key. You can still reach for RLS when your data model needs row-by-row checks; the platform-level boundary is one layer up and does not depend on your policies being correct.
6. Internal architecture
This section names the parts the previous five did not. The reader who only needs to ship can stop here and come back later; the reader who needs to debug, plan capacity, or evaluate trust will want what follows.
Control plane vs data plane
The system splits cleanly:
| Plane | What it owns | Components |
|---|---|---|
| Control plane | Decides what exists. Project records, tenant identifiers, secrets, schema migrations, lifecycle. Read-only from the request hot path. | flux-system Postgres, apps/dashboard (UI + API), @flux/core (engine registry), @flux/cli |
| Data plane | Serves application traffic. | @flux/gateway, PostgREST pool, PgBouncer, the shared PostgreSQL cluster, Redis (cache and best-effort telemetry) |
The control plane writes to flux-system. The gateway reads from it (with Redis as a read-through cache). The gateway never writes to flux-system and never participates in lifecycle.
Engine abstraction
v1 dedicated and v2 shared coexist in one codebase. The choice is not a fork. A project record carries an explicit mode field (v1_dedicated or v2_shared); the engine registry in @flux/core selects the implementation by mode.
A canonical project row:
{ "tenant_id": "5ecfa3ab-72d1-4b2a-9a1d-0f1e2d3c4b5a", "slug": "bloom-atelier", "mode": "v2_shared", "plan": "free", "status": "active"}
mode is always explicit. Provisioning may default new projects to v2_shared, but logic must not infer mode from plan alone. Changing a project's mode is an audited migration action (Pooled → dedicated migrate), not an implicit flag flip.
Both engines share an interface — provisionProject, deleteProject, suspendProject, getApiUrl, getCredentials, setEnv, listEnv, importSql. v2 adds createTenantSchema, createTenantRole, assignTenantMetadata. v1 adds container and Docker-network management. The dashboard, the CLI, and the gateway depend only on the interface; neither plane needs to know which engine a project uses unless it has engine-specific work to do.
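A hypothetical TypeScript shape for that interface, inferred from the method names above; the real `@flux/core` definitions may differ:

```ts
interface ProjectRecord {
  tenant_id: string;
  slug: string;
  mode: "v1_dedicated" | "v2_shared";
  plan: string;
  status: string;
}

// Shared contract both engines implement; callers depend on this interface,
// not on mode, unless they have engine-specific work to do.
interface FluxEngine {
  provisionProject(project: ProjectRecord): Promise<void>;
  deleteProject(project: ProjectRecord): Promise<void>;
  suspendProject(project: ProjectRecord): Promise<void>;
  getApiUrl(project: ProjectRecord): Promise<string>;
  getCredentials(project: ProjectRecord): Promise<Record<string, string>>;
  setEnv(project: ProjectRecord, key: string, value: string): Promise<void>;
  listEnv(project: ProjectRecord): Promise<Record<string, string>>;
  importSql(project: ProjectRecord, sql: string): Promise<void>;
}
```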
Gateway responsibility boundary
The gateway is small on purpose. It:
- Resolves hostname → `tenant_id` (system DB; Redis as cache).
- Verifies the project JWT against the project's `jwt_secret`.
- Enforces the per-tenant rate limit; returns HTTP 429 without forwarding when the limit is exceeded.
- Mints the short-lived bridge JWT and proxies the request to a PostgREST instance on the private network.
- Performs a best-effort `INCR activity:{tenant_id}` for activity tracking.
It does not:
- Hold business logic.
- Write to any database.
- Manage tenant lifecycle.
- Become a "mini control plane."
The reason for that discipline is in §5 Why no static keys and in the failure-boundary discussion below: the gateway is a security control. The smaller and more focused it is, the easier it is to test, audit, and reason about.
Custom domains
Tenants can map custom hostnames onto their project. The control plane stores (hostname → tenant_id) rows in flux-system.domains; the gateway resolves a request's Host header against that table (with Redis caching). On domain create, update, or delete, the control plane explicitly evicts hostname:{old_host} from Redis — TTL expiry alone produces stale routing for the TTL window, which is the wrong default for a routing decision.
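A sketch of the write-then-evict rule, assuming an ioredis-style client and a hypothetical `db` helper:

```ts
// Update the authoritative row first, then evict the stale cache entry
// explicitly — TTL expiry alone is too slow for a routing decision.
async function updateCustomDomain(tenantId: string, oldHost: string, newHost: string) {
  await db.query(
    "UPDATE domains SET hostname = $1 WHERE tenant_id = $2 AND hostname = $3",
    [newHost, tenantId, oldHost],
  );
  try {
    await redis.del(`hostname:${oldHost}`);
  } catch {
    // the non-blocking Redis rule still applies; log a metric and move on
  }
}
```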
Redis is cache and telemetry, never authoritative
| Key | Purpose | Authoritative? |
|---|---|---|
| `hostname:{host}` | Tenant resolution cache | No — `flux-system.domains` is truth on miss |
| `rate:{tenant_id}` | Per-tenant rate-limit counter | No — the limit is policy; the counter is best-effort |
| `activity:{tenant_id}` | Heartbeat for the `last_accessed_at` flush | No — Postgres holds the durable value |
Wiping Redis must not affect data correctness. The gateway must never block on Redis; if Redis is unavailable, the gateway falls back to the database (for resolution) or fails open (for rate limiting). The non-blocking Redis pattern is a load-bearing rule in the data-plane code:
```ts
try {
  await redis.incr(key);
} catch {
  // swallow; log a metric if needed; continue with the primary request
}
```
This applies everywhere Redis touches the request path. A Redis timeout that propagates into a request handler is a bug.
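Put together, tenant resolution looks roughly like this — Redis as read-through cache, the database as truth, every Redis failure swallowed (the `db` helper, client shape, and TTL are assumptions):

```ts
async function resolveTenant(host: string): Promise<string | null> {
  const key = `hostname:${host}`;

  // Best-effort cache read; any Redis failure falls through to the database.
  try {
    const cached = await redis.get(key);
    if (cached) return cached;
  } catch { /* ignore: Redis is cache, not truth */ }

  const row = await db.queryOne(
    "SELECT tenant_id FROM domains WHERE hostname = $1",
    [host],
  );
  if (!row) return null;

  // Best-effort cache fill; failure here is also non-fatal.
  try { await redis.set(key, row.tenant_id, "EX", 300); } catch { /* ignore */ }
  return row.tenant_id;
}
```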
Engine selection by tier
| Tier | Engine | Isolation | Compliance posture |
|---|---|---|---|
| Free | v2_shared | schema + role | not a compliance boundary |
| Pro | v2_shared | schema + role | not a compliance boundary |
| Enterprise | v1_dedicated | container | SOC 2 / HIPAA-ready |
Tiers map to engines, but the docs and the catalog use the engine name when precision matters. Marketing tiers can rename; the engine vocabulary is stable.
7. Operational behavior
This is the section that determines how v2 behaves under load and what you should expect during a busy weekend.
Rate limiting
- Algorithm. Fixed window per tenant, sized to the limit's period — per minute for the defaults below. (A counter sketch follows this list.)
- Default limits. 20 req/min on Free, 100 req/min on Pro. Configurable per project.
- Over-limit response. HTTP 429 from the gateway; the request is not forwarded to PostgREST.
- Redis unavailable. Default is fail-open — allow the request, log a metric. Availability over strict enforcement during cache outages. A future Pro+ option may opt into fail-closed or hybrid enforcement; the default does not.
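A minimal fixed-window counter under those rules — `redis` is an ioredis-style client; the 60-second window matches the per-minute defaults above:

```ts
async function allowRequest(tenantId: string, limit: number): Promise<boolean> {
  const key = `rate:${tenantId}`;
  try {
    const count = await redis.incr(key);
    if (count === 1) await redis.expire(key, 60); // first hit opens a 60 s window
    return count <= limit;
  } catch {
    return true; // fail-open: availability over strict enforcement
  }
}
```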
Activity tracking is lossy by design
The system needs to know whether a tenant was active in the last few minutes; it does not need exact per-request timestamps. Each request does a best-effort INCR activity:{tenant_id} with a 60-second TTL; a flush worker every 30–60 seconds reads the active set and updates projects.last_accessed_at. Redis loss costs at most one window of activity; Postgres remains the durable record.
The downstream consumer is the idle-reaping job (reapIdleProjects), which suspends or reclaims resources from projects inactive for a configured threshold. That feeds cost control on the shared tier.
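Both halves in sketch form, assuming an ioredis-style client and a hypothetical `db` helper; key and column names follow the text:

```ts
// Hot path (gateway): best-effort heartbeat with a 60-second TTL.
async function recordActivity(tenantId: string) {
  try {
    const key = `activity:${tenantId}`;
    await redis.incr(key);
    await redis.expire(key, 60);
  } catch { /* lossy by design */ }
}

// Flush worker (every 30–60 s): persist the active set to Postgres, the durable record.
async function flushActivity() {
  const active: string[] = [];
  for await (const keys of redis.scanStream({ match: "activity:*", count: 100 })) {
    for (const key of keys as string[]) active.push(key.split(":")[1]);
  }
  if (active.length > 0) {
    await db.query(
      "UPDATE projects SET last_accessed_at = now() WHERE tenant_id = ANY($1)",
      [active],
    );
  }
}
```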
PgBouncer transaction pooling — what it breaks
PostgREST connects through PgBouncer in transaction pooling mode. A given Postgres backend is held only for the duration of a single transaction, then returned to the pool. Some Postgres patterns assume session continuity and break:
| Pattern | Broken by transaction pooling? |
|---|---|
| `CREATE TEMP TABLE` across requests | yes |
| Advisory locks across requests | yes |
| `SET` / `SET LOCAL` persisting beyond a transaction | yes |
| Named prepared statements | yes — use unnamed statements |
| `LISTEN` / `NOTIFY` | yes — use a dedicated non-pooled connection if you need it |
If you write SQL that depends on session-scoped state across requests on v2, the symptoms are non-deterministic and hard to reproduce. Document this for your application's authors.
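The pool-safe counterpart is worth showing: session state confined to a single transaction is fine, because transaction pooling holds the backend for exactly that long. A sketch using a node-postgres-style client:

```ts
import type { PoolClient } from "pg";

// SET LOCAL dies with the transaction, before PgBouncer hands the backend
// to another request — so it is safe under transaction pooling.
async function withStatementTimeout<T>(
  client: PoolClient,
  ms: number,
  fn: () => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    await client.query(`SET LOCAL statement_timeout = '${ms}ms'`);
    const result = await fn();
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```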
Per-role guardrails at provisioning time
Every tenant role is provisioned with cluster-friendly limits:
```sql
ALTER ROLE t_5ecfa3ab72d1_role SET statement_timeout = '5s';
ALTER ROLE t_5ecfa3ab72d1_role CONNECTION LIMIT 5;
```
Adjusted by tier. These are cheap controls that prevent one tenant from monopolizing the cluster. Treat them as defaults, not absolutes — a Pro project that legitimately needs longer queries can have its limit raised.
Portable tenant backups
flux backup create on v2 runs pg_dump -Fc --schema=t_<shortid>_api --no-owner --no-acl against the shared cluster. The artifact contains only that tenant's schema — no other tenants, no cluster-global objects.
This is a portable tenant export, restorable into any Postgres for migrations or off-platform analysis. It is not a substitute for cluster-level disaster recovery (physical backups, WAL archiving, PITR), which is a platform operations concern. Verification (flux backup verify) runs the same pg_restore smoke test in a disposable Postgres container that v1 uses.
Catalog rows for these backups carry kind = tenant_export, distinct from the project_db rows v1 produces. The full backup contract — trust states, what backups guarantee, how the destructive-action gate works — lives in Backups.
Scaling
- Per-cluster planning range. 500–2,000 tenants per shared Postgres cluster. Beyond that, `pg_namespace` catalog growth, migration fan-out time, and per-role overhead start to bite.
- Horizontal scaling. When a cluster approaches capacity, provision a new shared cluster and assign new tenants there via `flux-system.projects.cluster_id`. Existing tenants stay on their assigned cluster indefinitely.
- PostgREST pool sharding. When a single PostgREST pool becomes a bottleneck, shard by tenant hash or by tier (Free vs Pro on separate pools). The gateway's upstream lookup chooses the pool; this is an operational change, not an application-visible one.
Vertical scaling of a single cluster beyond the planning range is not the answer. The model is many right-sized clusters, not a few enormous ones.
8. Failure boundaries
The honest version of "what can go wrong on v2" and "what the platform does not promise."
The top risk: JWT mis-issuance
The single most consequential failure mode on v2 is a gateway-issued bridge JWT carrying the wrong role or tenant_id. PostgREST will trust that JWT and SET ROLE accordingly; PostgreSQL will then serve whatever that role is allowed to see. There is no further database-level guard unless an application has added RLS for its own reasons — and v2's baseline does not.
The defense is gateway correctness:
- The gateway is the only code path that issues runtime JWTs (invariant 3 below). No other service mints a token PostgREST will accept on this path.
- `tenant_id` is validated against `flux-system` on every resolution; the bridge JWT reflects the database record, not a value the client supplied.
- Bridge JWT TTL is short (1–5 minutes). A mis-issued token has a small window to do damage before it expires.
- The gateway resolution path and host parser are tested against both flattened and dotted hostnames, and against tenant-renaming flows.
This is the failure mode that justifies treating the gateway as a security control rather than a proxy.
What v2 does not promise
- RLS on by default. Schema and role separation are the load-bearing boundary on v2. RLS is an opt-in tool your application adds when its data model needs row-by-row checks.
- Hard CPU or memory isolation. v2 is shared; cluster-level blast radius is the deliberate tradeoff. If you need hardware isolation, v1 dedicated is the right mode.
- Compliance posture on shared infrastructure. SOC 2 and HIPAA workloads belong on `v1_dedicated`. v2 is not a compliance boundary.
- Protection from compromised IdP keys. Your identity provider's signing keys remain in scope. If those keys leak, an attacker can mint tokens your gateway accepts; rotating the per-project `jwt_secret` is the response.
- Strict rate limiting during Redis outages. The default is fail-open (availability over enforcement). A future Pro+ option may opt into fail-closed; today, an extended Redis outage makes rate limits advisory.
For the equivalent reader-facing framing, see Threat model.
Operational risks tracked in the open
| Risk | Severity | Note |
|---|---|---|
| `pg_namespace` catalog bloat above 2,000 schemas per cluster | medium | Hard cluster cap; horizontal scaling plan above |
| Migration fan-out across many tenant schemas | medium | DDL fan-out is slow at scale; tooling is incremental |
| PostgREST pool reload at scale | low–medium | Rolling restart strategy needed to avoid dropped in-flight requests |
| Custom-domain cache eviction gaps | low | Mitigated by explicit eviction on domain ops |
| RLS opt-in for Pro+ | future | Not in v2 scope today; tracked as a future tier enhancement |
| Enterprise on shared infrastructure | future | Requires compliance review and explicit migration; not an automatic flag |
These are tracked openly because hiding them would not make them go away.
9. Migration to dedicated
If your project outgrows pooled or your compliance posture changes, flux migrate orchestrates a move from v2_shared to v1_dedicated. The control plane provisions the dedicated stack, dumps the tenant schema with pg_dump, restores into the new database, and (on full cutover) flips the mode field. The Service URL shape is preserved; your app's env may need to update if you rotated secrets during the move.
The migration is staged, dry-runnable, and explicit. See Pooled → dedicated migrate for the full procedure, the success checklist, and the troubleshooting notes.
A future tool may allow v1_dedicated → v2_shared migration. It is not available today and is not on a timeline.
Reference
The sections below are stable contracts the rest of this page builds on. They are written for the reader who wants to verify a claim without re-reading the prose, and for the Flux maintainers who treat them as code-review gates.
Invariants
These rules are unconditional. The Flux team treats violations as blocking review issues.
| # | Invariant |
|---|---|
| 1 | tenant_id (UUID) is immutable. Slug is UI-only and may change. |
| 2 | Postgres schema and role names are derived only from tenant_id via the deterministic shortid. Slug is never embedded in a schema or role identifier. |
| 3 | The gateway is the only issuer of bridge JWTs for tenant API traffic. |
| 4 | PostgREST is not publicly reachable. Only the gateway has a network path to it. |
| 5 | Redis is never authoritative. It is cache and best-effort telemetry. Wiping Redis must not affect data correctness. |
| 6 | Per-tenant schemas are never enumerated in PGRST_DB_SCHEMAS. Access is controlled by role grants, search_path, and JWT role. Listing tenant schemas there widens introspection surface and makes isolation depend on JWT claims alone. |
What Flux deliberately does not do
These are not bugs or future work. They are absences chosen on purpose.
| Pattern | Why Flux does not adopt it |
|---|---|
| Static `anon` / `service` keys on the public internet | One leaked key compromises every request that key authorizes; the gateway-verified JWT model is the correction. See §5 Why no static keys. |
| RLS enabled by default on every tenant schema | Speculative complexity; hurts query planning; not the load-bearing boundary on v2. RLS remains available where applications need it (RLS). |
| Extra abstraction layers (`TenantService`, `RoutingService`, etc.) inside `@flux/core` | Premature indirection without payoff at this scale. The engine interface is small; new abstractions earn their place. |
| Async job queues introduced before they are needed | Operational complexity before product-market fit. The control plane is imperative today; a reconcile loop will arrive when the operational model justifies it. |
| Redis errors propagating into the gateway hot path | Violates the fail-open contract; blocks legitimate requests for a cache outage. The non-blocking pattern is enforced. |
| `mode` inferred solely from `plan` | `mode` is the source of truth. `plan` may default new provisioning, but the engine must never be chosen by an implicit `if (plan === "enterprise")` check. |
| Gateway logic in `@flux/core` or DB logic in `@flux/cli` | Violates the control-plane / data-plane boundary. `@flux/cli` is a thin wrapper; `@flux/gateway` owns the request path. |
This is the no-shim policy made concrete: Flux refuses to add layers, defaults, or escape hatches that would let the platform appear smaller while making applications larger.
Repository layout
| Package / app | Layer | v2 role |
|---|---|---|
| `packages/core` (`@flux/core`) | control plane | engine registry, canonical types, project orchestration |
| `packages/engine-v1` (`@flux/engine-v1`) | engine | dedicated container execution strategy (`v1_dedicated`) |
| `packages/engine-v2` (`@flux/engine-v2`) | engine | shared cluster execution strategy (`v2_shared`) |
| `packages/gateway` (`@flux/gateway`) | data plane | resolve → verify → rate-limit → bridge JWT → proxy |
| `packages/cli` (`@flux/cli`) | control plane | CLI surface; wraps `@flux/core`; no DB or gateway logic |
| `packages/sdk` (`@flux/sdk`) | client | tenant API client |
| `apps/dashboard` | control plane | UI + API routes; reads from `flux-system` |
This map is contributor-facing. App builders never need to install or import any of these packages.
Next steps
- Mental model — the same picture, one screen, no internals.
- Gateway — the resolve → verify → bridge → proxy hop in more detail.
- Bridge JWTs — what the internal token carries and why.
- Threat model — the failure boundaries above, framed for the security reader.
- Pooled → dedicated migrate — when v2 stops being the right answer.