Staging Force Rls Runbook
Source: docs/operations/staging-force-rls-runbook.md
# Staging FORCE RLS Rollout Runbook
Date: 2026-02-18
Owner: Platform engineering
This runbook is the execution order for restoring `FORCE ROW LEVEL SECURITY` in staging.
## Fast path (single command)
```bash
DATABASE_URL=postgresql://... scripts/run-staging-force-rls-rollout.sh --since-hours=24
```
Verification-only mode (skip migration):
```bash
DATABASE_URL=postgresql://... scripts/run-staging-force-rls-rollout.sh --skip-migrate --since-hours=24
```
With required async lanes (recommended once staging traffic baseline is known):
```bash
DATABASE_URL=postgresql://... \
REQUIRED_ASYNC_LANES=ship_upload,sima_validation \
scripts/run-staging-force-rls-rollout.sh --since-hours=24
```
## Prerequisites
- Async hardening PRs are merged and deployed first:
- worker/job lifecycle hardening
- notification async hardening
- Staging API + worker are healthy.
- Local tools:
- `psql`
- `pnpm`
- `DATABASE_URL` exported to the staging DB.
## Step 1: Apply migration
```bash
pnpm --filter @rgl8r/api exec prisma migrate deploy
```
Expected: migration `20260302000000_reenable_force_rls_tenant_tables` applies cleanly.
## Step 2: Verify RLS + FORCE RLS flags
```bash
scripts/verify-force-rls-staging.sh
```
Expected: PASS for all tenant-scoped tables with at least one policy per table.
## Step 3: Validate async lane health
Run after at least one processing cycle (or after test jobs are queued):
```bash
scripts/smoke-async-lanes.sh --since-hours=24
```
Expected:
- each **required** lane has at least one `COMPLETED` job
- lanes with no required designation can warn without failing the run
Available lanes:
- `ship_upload`
- `order_upload`
- `sima_validation`
- `catalog_upload` (includes legacy `wayfair_upload` rows)
- `notification_event`
- `notification_digest`
Set required lanes via env:
```bash
REQUIRED_ASYNC_LANES=ship_upload,sima_validation scripts/smoke-async-lanes.sh --since-hours=24
```
## Step 4: Check parity + logs
- Run/confirm SHIP parity workflow artifact.
- Confirm there are no repeated tenant-context/RLS errors in staging logs.
## Optional: Run checks in GitHub
Use workflow **Staging FORCE RLS Checks** (`.github/workflows/staging-force-rls-checks.yml`) with:
- `since_hours=24` (or your desired lookback).
This runs both scripts against `STAGING_DATABASE_URL` and uploads artifacts:
- `force-rls-check.out/.err`
- `async-lane-smoke.out/.err`
The workflow also runs daily at 14:00 UTC via cron (defaults to a 24-hour lookback window).
Required lane policy can be configured using repo variable `STAGING_REQUIRED_ASYNC_LANES`.
Optional alerting:
- Set `STAGING_CHECKS_SLACK_WEBHOOK_URL` to receive Slack alerts when checks fail.
## Emergency rollback
Use only if staging becomes unhealthy after migration:
```bash
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f \
apps/api/prisma/migrations/20260302000000_reenable_force_rls_tenant_tables/rollback.sql
```
Then rerun:
```bash
scripts/smoke-async-lanes.sh --since-hours=24
```