Source: docs/operations/staging-rls-hardening-plan.md
# Staging RLS Hardening Plan (Worker + Async Paths)
Date: 2026-02-17
Owner: Platform engineering
Scope: non-request async execution paths that, when this plan was written, used direct Prisma calls without a transaction-scoped tenant GUC.
## Implementation status
- Queue/job lifecycle hardening: implemented.
- Worker handler hardening (`wayfair-upload`, `sima-validation`): implemented.
- Notification async stack hardening (scheduler/engine/hooks/templates): implemented.
- FORCE RLS re-enable migration: `apps/api/prisma/migrations/20260302000000_reenable_force_rls_tenant_tables`.
- Verification checklist: `docs/operations/staging-force-rls-verification-checklist.md`.
## Why this existed
Staging previously had `FORCE ROW LEVEL SECURITY` disabled on non-guest tables because worker and async paths did not run inside `withTenant()` transactions. The gaps below have been remediated (see Implementation status above). The inventory is preserved for audit trail.
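To make "transaction-scoped tenant GUC" concrete, here is a minimal sketch of what a `withTenant()`-style helper might look like. It is not the repository's implementation: the `Db` and `Tx` interfaces are hypothetical stand-ins for a Prisma client, the name `withTenantSketch` is made up, and the GUC name `app.tenant_id` is an assumption. The only Postgres-specific piece is `set_config(name, value, is_local)`, where passing `true` as the third argument limits the setting to the current transaction.

```typescript
// Hypothetical minimal interfaces standing in for a Prisma client.
interface Tx {
  exec(sql: string, params: unknown[]): Promise<void>;
}
interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

// Assumed GUC name; the setting read by the real RLS policies may differ.
const TENANT_GUC = "app.tenant_id";

// Runs `fn` inside a single transaction after setting the tenant GUC,
// so every query issued through `tx` is evaluated with that tenant.
async function withTenantSketch<T>(
  db: Db,
  tenantId: string,
  fn: (tx: Tx) => Promise<T>,
): Promise<T> {
  if (tenantId.trim() === "") {
    throw new Error("tenantId is required");
  }
  return db.transaction(async (tx) => {
    // is_local = true: the value reverts when the transaction ends.
    await tx.exec("SELECT set_config($1, $2, true)", [TENANT_GUC, tenantId]);
    return fn(tx);
  });
}
```

Because the setting is local to the transaction, a query issued outside the callback (for example a bare `prisma.*` call in a worker) carries no tenant value, which is the kind of gap inventoried below.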
## Gap inventory (resolved)
### Job claiming and lifecycle
- `apps/api/src/workers/job-processor.ts:70`
- `apps/api/src/workers/job-processor.ts:114`
- `apps/api/src/workers/job-processor.ts:128`
- `apps/api/src/lib/jobs.ts:57`
- `apps/api/src/lib/jobs.ts:73`
- `apps/api/src/lib/jobs.ts:82`
- `apps/api/src/lib/jobs.ts:94`
- `apps/api/src/lib/jobs.ts:109`
Risk: the queue cannot claim or update jobs under FORCE RLS.
### Core ingest handlers (worker-executed)
- `apps/api/src/workers/handlers/wayfair-upload.ts:71`
- `apps/api/src/workers/handlers/wayfair-upload.ts:103`
- `apps/api/src/workers/handlers/wayfair-upload.ts:123`
- `apps/api/src/workers/handlers/wayfair-upload.ts:152`
- `apps/api/src/workers/handlers/wayfair-upload.ts:389`
- `apps/api/src/workers/handlers/sima-validation.ts:104`
- `apps/api/src/workers/handlers/sima-validation.ts:110`
- `apps/api/src/workers/handlers/sima-validation.ts:135`
- `apps/api/src/workers/handlers/sima-validation.ts:153`
- `apps/api/src/workers/handlers/sima-validation.ts:182`
- `apps/api/src/workers/handlers/sima-validation.ts:381`
Risk: async catalog/SIMA lanes fail on their first DB call under FORCE RLS.
### SHIP / ORDERS async libs called by worker
- `apps/api/src/lib/ship/ingest.ts:78`
- `apps/api/src/lib/ship/ingest.ts:196`
- `apps/api/src/lib/orders/ingest.ts:133`
- `apps/api/src/lib/orders/ingest.ts:209`
- `apps/api/src/lib/orders/linkage.ts:23`
- `apps/api/src/lib/orders/linkage.ts:58`
- `apps/api/src/lib/orders/linkage.ts:91`
- `apps/api/src/lib/orders/linkage.ts:143`
- `apps/api/src/lib/orders/linkage.ts:179`
- `apps/api/src/lib/orders/linkage.ts:213`
- `apps/api/src/lib/ship/carrier-agreement-import.ts:90`
Risk: the SHIP/Orders ingest path breaks under FORCE RLS.
### Shared config loaders used during processing
- `apps/api/src/lib/trade/detection.ts:20`
- `apps/api/src/lib/ship/tenant-carrier-overrides.ts:24`
Risk: workers fail before business logic runs.
### Notifications async stack (job processor + schedulers + templates)
- `apps/api/src/lib/notifications/engine.ts:39`
- `apps/api/src/lib/notifications/engine.ts:80`
- `apps/api/src/lib/notifications/scheduler.ts:50`
- `apps/api/src/lib/notifications/scheduler.ts:71`
- `apps/api/src/lib/notifications/spike-detection.ts:39`
- `apps/api/src/lib/notifications/caught-order.ts:26`
- `apps/api/src/lib/notifications/exception-alert.ts:29`
- `apps/api/src/lib/notifications/templates/trade-digest.ts:47`
- `apps/api/src/lib/notifications/templates/trade-caught.ts:35`
- `apps/api/src/lib/notifications/templates/trade-spike.ts:40`
- `apps/api/src/lib/notifications/templates/ship-recon.ts:47`
- `apps/api/src/lib/notifications/templates/ship-exception.ts:35`
- `apps/api/src/lib/notifications/templates/ship-stale.ts:46`
Risk: digest/event jobs and schedule fan-out fail under FORCE RLS.
## Remediation sequence
1. Harden queue primitives first
   - Make job claim/reclaim and job lifecycle updates tenant-scoped with `withTenant()`.
   - Keep claim semantics atomic (guarded update) while scoping per tenant.
2. Harden worker handlers next
   - Ensure each worker-executed DB path runs inside tenant-scoped transactions.
   - Eliminate direct `prisma.*` in handlers unless inside `withTenant(prisma, tenantId, ...)`.
3. Harden notifications async stack
   - Wrap engine, scheduler, and template queries in tenant context.
   - Keep idempotency/rate-limit semantics unchanged.
4. Re-enable FORCE RLS (DB)
   - Add migration to set `FORCE ROW LEVEL SECURITY` back on staging-deferred tables.
   - Run smoke flows and parity workflow post-migration.
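The "guarded update" in step 1 can be sketched as follows. A claim is an update whose `where` clause includes the expected current status, so at most one worker's update matches the row. The `JobTable` interface imitates the shape of Prisma's `updateMany` (which resolves to a `count`), but the field names, the status values `PENDING` and `PROCESSING`, and the function name `claimJob` are assumptions, not the schema or code in `apps/api`.

```typescript
// Assumed status values; the real job schema may use different names.
type JobStatus = "PENDING" | "PROCESSING";

// Hypothetical slice of a Prisma-like model delegate.
interface JobTable {
  updateMany(args: {
    where: { id: string; tenantId: string; status: JobStatus };
    data: { status: JobStatus; claimedBy: string };
  }): Promise<{ count: number }>;
}

// Attempts to claim one job for one tenant. Returns true only if this
// call performed the PENDING -> PROCESSING transition.
async function claimJob(
  jobs: JobTable,
  tenantId: string,
  jobId: string,
  workerId: string,
): Promise<boolean> {
  const { count } = await jobs.updateMany({
    // The status guard is what keeps the claim atomic: a second worker's
    // update matches zero rows once the first has flipped the status.
    where: { id: jobId, tenantId, status: "PENDING" },
    data: { status: "PROCESSING", claimedBy: workerId },
  });
  return count === 1;
}
```

In the hardened path, a call like this would be made with the transaction client passed into `withTenant(prisma, tenantId, ...)`, so the row is subject to both the status guard and the tenant's RLS policy.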
## Verification gates
- `pnpm --filter @rgl8r/api test`
- Worker smoke: enqueue and process one job for each lane:
  - `ship_upload`
  - `order_upload`
  - `sima_validation`
  - `catalog_upload` (legacy `wayfair_upload` rows count under this lane)
  - `notification_event`
  - `notification_digest`
- The FORCE RLS re-enable migration applies cleanly in staging.
- The parity workflow publishes its artifact even on failures.
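One possible way to confirm that the re-enable migration took effect is to read the `relrowsecurity` and `relforcerowsecurity` flags from Postgres's `pg_class` catalog and compare them against the tables expected to be forced. The sketch below covers only the comparison step. The function name `tablesNotForced` is hypothetical, and how the rows are fetched and which tables are expected are left to the verification checklist.

```typescript
// Shape of one row from a catalog query such as:
//   SELECT relname, relrowsecurity, relforcerowsecurity
//   FROM pg_class WHERE relkind = 'r';
interface RlsFlags {
  relname: string;
  relrowsecurity: boolean;
  relforcerowsecurity: boolean;
}

// Returns the expected tables that are absent from the catalog rows or
// that do not have both RLS and FORCE RLS enabled.
function tablesNotForced(rows: RlsFlags[], expected: string[]): string[] {
  const byName = new Map<string, RlsFlags>();
  for (const row of rows) {
    byName.set(row.relname, row);
  }
  return expected.filter((name) => {
    const row = byName.get(name);
    return !row || !row.relrowsecurity || !row.relforcerowsecurity;
  });
}
```

An empty result means every expected table reports both flags as true; any returned name is a table to investigate before treating the gate as passed.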
## Notes
- The guest flow already uses `withTenant()` and is not the blocker.
- This plan intentionally avoids `BYPASSRLS` as a runtime dependency.