

Source: docs/operations/p7-e-a7-cnus-corridor-activation-runbook.md

# P7-E-A7: CN→US Corridor Activation Runbook

Operational runbook for the CN→US trade corridor. Documents rollback controls, on-call procedures, observability, and activation rehearsal evidence for the trade detection pipeline.

**Plan linkage:** `P7-E` Gate `G6` (Operational Readiness)

---

## Scope

This runbook documents operational readiness for the CN→US corridor:

- Tiered rollback plan for the trade detection pipeline
- On-call owner and escalation procedures for the activation window
- Existing observability and monitoring surface
- Activation rehearsal evidence for rollback controls

---

## Current State

- **G1–G5 gates complete:** rules seeded, tests pass, corridor isolation proven (PRs #379, #381, #384)
- **Runtime wiring complete:** CN→US corridor rules are evaluated in the trade detection path (PR #388):
  - `loadTradeCorridorRules()` loads enabled non-`SIMA_%` TRADE rules.
  - Workers preload corridor rules for `screeningAuthority='US'` when `TRADE_DETECTOR_MODE !== legacy`.
  - `getFinalTradeOutput()` evaluates corridor rules with `originCountry` + `destinationCountry` context.
  - `dual` mode logs corridor matches; `new` mode applies corridor matches.
- Runtime still requires customer trigger and go/no-go approval for production activation.

---

## Rollback Plan

Three independent rollback tiers, from narrowest to broadest:

| Tier | Mechanism | Scope | Time-to-revert | How |
|------|-----------|-------|----------------|-----|
| 1. Tenant-level | `TenantDetectorConfig` | Disable `SIMA_EXPOSURE` for one tenant (disables all trade detection for that tenant, not just corridor rules) | < 2 min | `POST /api/admin/tenants/:id/detectors` with `{ "detectors": [{"detectorCode": "SIMA_EXPOSURE", "enabled": false}] }` |
| 2. Rule-level | `rule_definitions.enabled` | Disable specific rule codes globally | < 2 min | Look up rule: `GET /api/admin/ref/rule-definitions?ruleCode=SECT301_CHECK` (note the `id` and `updatedAt` from response), then `PUT /api/admin/ref/rule-definitions/:id` with `{"enabled": false, "updatedAt": "<updatedAt from GET>"}`; or SQL: `UPDATE app.rule_definitions SET enabled = false WHERE rule_code IN ('SECT301_CHECK', 'US_232_CHECK')` |
| 3. Global | `TRADE_DETECTOR_MODE=legacy` | Bypass entire new detector pipeline | < 5 min | Env var change + service restart on Render |

**Data cleanup:** Not needed. Corridor rules only flag findings; they do not mutate trade data. Disabling rules prevents new flags; existing flags remain for audit trail.

### Rollback Rehearsal Evidence (Automated)

- **Tier 1 (tenant-level):** `apps/api/src/__tests__/trade/detector-selector-cre.test.ts` verifies corridor rules are skipped when `SIMA_EXPOSURE` is disabled in tenant detector config.
- **Tier 2 (rule-level):** `apps/api/src/__tests__/rules/loader.integration.test.ts` verifies disabled corridor rules are excluded by `loadTradeCorridorRules()`.
- **Tier 3 (global):** `apps/api/src/__tests__/trade/detector-selector-cre.test.ts` verifies `TRADE_DETECTOR_MODE=legacy` bypasses corridor-rule application.

---

## On-Call Procedures

- **On-call owner:** Engineering lead (Dan) for initial activation window
- **Escalation:** Slack webhook (configured via `STAGING_CHECKS_SLACK_WEBHOOK_URL`, fed by `staging-health-checks.yml` workflow)
- **Monitoring:** Render log dashboard; filter by `detectorCode` or `screeningAuthority` in structured JSON logs
- **Incident response:**
  1. Tier 1 disable (tenant-level) → investigate root cause
  2. Fix and re-enable
  3. Escalate to Tier 3 (global `TRADE_DETECTOR_MODE=legacy`) only if issue is systemic

---

## Observability (What Exists Today)

### Health check

`GET /health` endpoint returns `{ status: 'ok', service: 'rgl8r-api', timestamp: '<ISO>' }`.
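As a rough illustration of what a probe could assert against the `/health` payload shown above, here is a minimal sketch. The `HealthResponse` type and the `isHealthyResponse` function are hypothetical names introduced for this example; they are not taken from the codebase or from the `staging-health-checks.yml` workflow, and the field checks assume the documented response shape is complete.

```typescript
// Hypothetical type mirroring the documented /health response shape.
interface HealthResponse {
  status: string;
  service: string;
  timestamp: string;
}

// Returns true only when the parsed body matches the documented healthy payload:
// status 'ok', service 'rgl8r-api', and a timestamp that parses as a date.
function isHealthyResponse(body: unknown): body is HealthResponse {
  if (typeof body !== "object" || body === null) return false;
  const candidate = body as Record<string, unknown>;
  return (
    candidate.status === "ok" &&
    candidate.service === "rgl8r-api" &&
    typeof candidate.timestamp === "string" &&
    !Number.isNaN(Date.parse(candidate.timestamp))
  );
}
```

A caller would parse the response body as JSON and pass the result to `isHealthyResponse`, treating `false` as a failed check.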
An automated cron workflow (`staging-health-checks.yml`) runs every 30 minutes and alerts Slack on failure.

### Detection logging

`detector-selector.ts` logs on specific code paths:

- **WARN** on fallback to legacy (no detector output): `{detectorCode, sku}`
- **WARN** on parity mismatch in `dual` mode: `{sku, legacy, detector}`
- **DEBUG** on detector selection (every run): `{sku, screeningAuthority, tradeDetectorMode, selectedSource, detectorOutputPresent, selectedStatus, selectedMeasureCode}`
- **INFO/DEBUG** on CRE refinement: `{sku, refinement}` or `{sku, rulesEvaluated, rulesMatched}`

### Rule readiness check

- `GET /api/admin/ref/rule-definitions?ruleCode=SECT301_CHECK`: confirms the rule is seeded and shows its enabled/disabled state
- `GET /api/admin/ref/rule-definitions?ruleCode=US_232_CHECK`: same

### Rule loading logging

- `loadRulesForContext` emits DEBUG-level log: `{origin, destination, module, enabledOnly, ruleCount, ruleCodes}`
- `loadSimaCRERules` emits DEBUG-level log: `{module, ruleCodePrefix, ruleCount, ruleCodes}`

### Known observability gaps

- Scope metadata (corridor, origin, destination) is not logged during individual rule evaluation
- Per-detection structured log with full rule-evaluation trace remains a follow-on improvement

---

## Pre-Activation Checklist

For customer go/no-go:

- [ ] Run `scripts/run-cnus-corridor-dry-run.sh` and archive summary artifact (`docs/operations/p7-e-a11-cnus-staging-dry-run-signoff.md`)
- [ ] Verify rules seeded: `GET /api/admin/ref/rule-definitions?ruleCode=SECT301_CHECK` returns enabled rule
- [ ] Verify rules seeded: `GET /api/admin/ref/rule-definitions?ruleCode=US_232_CHECK` returns enabled rule
- [ ] Confirm `TRADE_DETECTOR_MODE` is set to `dual` (not `new`) for initial observation period
- [ ] Upload test file with CN-origin products to staging → verify expected behavior in logs
- [ ] Monitor logs for 24h activation window
- [ ] If clean: promote to `new` mode, proceed to broader rollout per S2 (risk signoff)

---

## Exit Criteria (G6)

- [x] Rollback plan documented with tiered controls (this runbook)
- [x] On-call owner identified for activation window
- [x] Existing observability confirmed (health check + detection logging + rule readiness API)
- [x] Activation rehearsal evidence captured for all rollback tiers
- [x] Known gaps documented (scope metadata logging depth)
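
For reference during an incident, the Tier 1 and Tier 2 admin calls from the Rollback Plan can be sketched as plain request builders. This is a minimal sketch, not the project's client code: `AdminRequest`, `RuleDefinitionRef`, `buildTenantDisableRequest`, and `buildRuleDisableRequest` are hypothetical names, the `id` is assumed to be usable as a string path segment, and only the paths and payloads come from the runbook. The `updatedAt` value is echoed back from the earlier `GET`, presumably as a concurrency check on the server side.

```typescript
// Hypothetical description of an admin API call; sending it is left to the caller.
interface AdminRequest {
  method: "POST" | "PUT";
  path: string;
  body: Record<string, unknown>;
}

// Tier 1: disable SIMA_EXPOSURE for a single tenant.
// Note this turns off all trade detection for that tenant, not just corridor rules.
function buildTenantDisableRequest(tenantId: string): AdminRequest {
  return {
    method: "POST",
    path: `/api/admin/tenants/${encodeURIComponent(tenantId)}/detectors`,
    body: { detectors: [{ detectorCode: "SIMA_EXPOSURE", enabled: false }] },
  };
}

// Assumed subset of the rule-definitions lookup response that the PUT needs.
interface RuleDefinitionRef {
  id: string;
  updatedAt: string;
}

// Tier 2: disable one rule globally, passing back updatedAt from the prior GET.
function buildRuleDisableRequest(rule: RuleDefinitionRef): AdminRequest {
  return {
    method: "PUT",
    path: `/api/admin/ref/rule-definitions/${encodeURIComponent(rule.id)}`,
    body: { enabled: false, updatedAt: rule.updatedAt },
  };
}
```

Keeping the builders free of network calls makes them easy to check in a unit test before an activation window; authentication headers and the base URL are deliberately left out because the runbook does not specify them.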