Source: docs/operations/p7-e-a7-cnus-corridor-activation-runbook.md

# P7-E-A7: CN→US Corridor Activation Runbook
 
Operational runbook for the CN→US trade corridor. Documents rollback controls, on-call procedures, observability, and activation rehearsal evidence for the trade detection pipeline.
 
**Plan linkage:** `P7-E` Gate `G6` (Operational Readiness)
 
---
 
## Scope
 
This runbook documents operational readiness for the CN→US corridor:
 
- Tiered rollback plan for the trade detection pipeline
- On-call owner and escalation procedures for the activation window
- Existing observability and monitoring surface
- Activation rehearsal evidence for rollback controls
 
---
 
## Current State
 
- **G1–G5 gates complete** — rules seeded, tests pass, corridor isolation proven (PRs #379, #381, #384)
- **Runtime wiring complete** — CN→US corridor rules are evaluated in the trade detection path (PR #388):
  - `loadTradeCorridorRules()` loads enabled non-`SIMA_%` TRADE rules.
  - Workers preload corridor rules for `screeningAuthority='US'` when `TRADE_DETECTOR_MODE` is anything other than `legacy`.
  - `getFinalTradeOutput()` evaluates corridor rules with `originCountry` + `destinationCountry` context.
  - `dual` mode logs corridor matches; `new` mode applies corridor matches.
- Production activation still requires a customer trigger and go/no-go approval.
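
The mode behavior described above can be summarized in a small lookup. This is a minimal sketch, not the pipeline's actual code: the names `CorridorEffect`, `CORRIDOR_EFFECT`, and `handleCorridorMatches` are hypothetical, and the sketch assumes `legacy` mode neither logs nor applies corridor matches because corridor rules are not preloaded in that mode.

```typescript
// Hypothetical illustration of the three TRADE_DETECTOR_MODE values.
type TradeDetectorMode = "legacy" | "dual" | "new";
type CorridorEffect = "none" | "log" | "apply";

// What happens to corridor rule matches in each mode (per this runbook).
const CORRIDOR_EFFECT: Record<TradeDetectorMode, CorridorEffect> = {
  legacy: "none", // corridor rules are bypassed
  dual: "log",    // matches are logged for observation only
  new: "apply",   // matches are applied to the trade output
};

interface CorridorHandling {
  logged: string[];  // rule codes that would only be logged
  applied: string[]; // rule codes that would affect the output
}

// Split matched rule codes by the effect the current mode gives them.
function handleCorridorMatches(
  mode: TradeDetectorMode,
  matchedRuleCodes: string[],
): CorridorHandling {
  const effect = CORRIDOR_EFFECT[mode];
  return {
    logged: effect === "log" ? matchedRuleCodes.slice() : [],
    applied: effect === "apply" ? matchedRuleCodes.slice() : [],
  };
}
```

Under these assumptions, `handleCorridorMatches("dual", ["SECT301_CHECK"])` puts the rule code in `logged` and leaves `applied` empty, which is why `dual` is the safe mode for the initial observation period.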
 
---
 
## Rollback Plan
 
Three independent rollback tiers, from narrowest to broadest:
 
| Tier | Mechanism | Scope | Time-to-revert | How |
|------|-----------|-------|----------------|-----|
| 1. Tenant-level | `TenantDetectorConfig` | Disable `SIMA_EXPOSURE` for one tenant (disables all trade detection for that tenant, not just corridor rules) | < 2 min | `POST /api/admin/tenants/:id/detectors` with `{ "detectors": [{"detectorCode": "SIMA_EXPOSURE", "enabled": false}] }` |
| 2. Rule-level | `rule_definitions.enabled` | Disable specific rule codes globally | < 2 min | Look up the rule with `GET /api/admin/ref/rule-definitions?ruleCode=SECT301_CHECK` and note the `id` and `updatedAt` in the response, then send `PUT /api/admin/ref/rule-definitions/:id` with `{"enabled": false, "updatedAt": "<updatedAt from GET>"}`. SQL alternative: `UPDATE app.rule_definitions SET enabled = false WHERE rule_code IN ('SECT301_CHECK', 'US_232_CHECK');` |
| 3. Global | `TRADE_DETECTOR_MODE=legacy` | Bypass entire new detector pipeline | < 5 min | Env var change + service restart on Render |
 
**Data cleanup:** Not needed. Corridor rules only flag findings; they do not mutate trade data. Disabling rules prevents new flags, and existing flags remain for the audit trail.
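
For operators who prefer to script Tier 1 and Tier 2, the requests in the table can be assembled as plain data before sending. This is a sketch under assumptions: `AdminRequest`, `buildTier1Disable`, and `buildTier2Disable` are hypothetical helper names, the `id` is treated as a string, and authentication and the HTTP client are deliberately left out.

```typescript
// Hypothetical request descriptor; sending it (and auth) is out of scope here.
interface AdminRequest {
  method: "POST" | "PUT";
  path: string;
  body: Record<string, unknown>;
}

// Tier 1: disable SIMA_EXPOSURE for one tenant.
// Note: this disables all trade detection for that tenant, not just corridor rules.
function buildTier1Disable(tenantId: string): AdminRequest {
  return {
    method: "POST",
    path: "/api/admin/tenants/" + encodeURIComponent(tenantId) + "/detectors",
    body: { detectors: [{ detectorCode: "SIMA_EXPOSURE", enabled: false }] },
  };
}

// Tier 2: disable one rule globally. `id` and `updatedAt` must come from a
// prior GET /api/admin/ref/rule-definitions?ruleCode=... lookup.
function buildTier2Disable(rule: { id: string; updatedAt: string }): AdminRequest {
  return {
    method: "PUT",
    path: "/api/admin/ref/rule-definitions/" + encodeURIComponent(rule.id),
    body: { enabled: false, updatedAt: rule.updatedAt },
  };
}
```

Building the request separately from sending it makes it easy to review the exact payload during an incident before anything changes in production.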
 
### Rollback Rehearsal Evidence (Automated)
 
- **Tier 1 (tenant-level):** `apps/api/src/__tests__/trade/detector-selector-cre.test.ts` verifies corridor rules are skipped when `SIMA_EXPOSURE` is disabled in tenant detector config.
- **Tier 2 (rule-level):** `apps/api/src/__tests__/rules/loader.integration.test.ts` verifies disabled corridor rules are excluded by `loadTradeCorridorRules()`.
- **Tier 3 (global):** `apps/api/src/__tests__/trade/detector-selector-cre.test.ts` verifies `TRADE_DETECTOR_MODE=legacy` bypasses corridor-rule application.
 
---
 
## On-Call Procedures
 
- **On-call owner:** Engineering lead (Dan) for initial activation window
- **Escalation:** Slack webhook (configured via `STAGING_CHECKS_SLACK_WEBHOOK_URL`, fed by `staging-health-checks.yml` workflow)
- **Monitoring:** Render log dashboard — filter by `detectorCode` or `screeningAuthority` in structured JSON logs
- **Incident response:**
  1. Tier 1 disable (tenant-level) → investigate root cause
  2. Fix and re-enable
  3. Escalate to Tier 3 (global `TRADE_DETECTOR_MODE=legacy`) only if issue is systemic
 
---
 
## Observability (What Exists Today)
 
### Health check
 
The `GET /health` endpoint returns `{ status: 'ok', service: 'rgl8r-api', timestamp: '<ISO>' }`. An automated cron workflow (`staging-health-checks.yml`) runs every 30 minutes and sends a Slack alert on failure.
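
A manual spot check can validate the response shape described above. The following is a minimal sketch; `isHealthy` is a hypothetical helper, not part of the API or the workflow, and it assumes the payload has already been fetched and parsed as JSON.

```typescript
// Hypothetical validator for an already-parsed /health response body.
function isHealthy(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) {
    return false;
  }
  const body = payload as Record<string, unknown>;
  const timestamp = body.timestamp;
  return (
    body.status === "ok" &&
    body.service === "rgl8r-api" &&
    typeof timestamp === "string" &&
    // Date.parse returns NaN for strings it cannot interpret as a date.
    !isNaN(Date.parse(timestamp))
  );
}
```

For example, `isHealthy({ status: "ok", service: "rgl8r-api", timestamp: "2025-01-01T00:00:00.000Z" })` evaluates to `true`, while a payload with any other `status` evaluates to `false`.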
 
### Detection logging
 
`detector-selector.ts` logs on specific code paths:
 
- **WARN** on fallback to legacy (no detector output): `{detectorCode, sku}`
- **WARN** on parity mismatch in `dual` mode: `{sku, legacy, detector}`
- **DEBUG** on detector selection (every run): `{sku, screeningAuthority, tradeDetectorMode, selectedSource, detectorOutputPresent, selectedStatus, selectedMeasureCode}`
- **INFO/DEBUG** on CRE refinement: `{sku, refinement}` or `{sku, rulesEvaluated, rulesMatched}`
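
When reviewing exported logs offline, the structured JSON lines can be filtered by any of the fields listed above. This sketch assumes one JSON object per line; `parseLogLines` and `filterByField` are hypothetical helpers, and any field beyond those named in this runbook (for example a `level` field) is an assumption about the logger's output.

```typescript
type LogRecord = Record<string, unknown>;

// Parse newline-delimited JSON logs, skipping blank and non-JSON lines.
function parseLogLines(raw: string): LogRecord[] {
  const records: LogRecord[] = [];
  const lines = raw.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const trimmed = lines[i].trim();
    if (trimmed === "") {
      continue;
    }
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        records.push(parsed as LogRecord);
      }
    } catch (err) {
      // Not a JSON line (for example, plain-text platform output); ignore it.
    }
  }
  return records;
}

// Keep only records whose `field` strictly equals `value`.
function filterByField(records: LogRecord[], field: string, value: string): LogRecord[] {
  return records.filter((record) => record[field] === value);
}
```

A typical use would be `filterByField(parseLogLines(exported), "screeningAuthority", "US")` to isolate US-authority detection runs during the activation window.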
 
### Rule readiness check
 
- `GET /api/admin/ref/rule-definitions?ruleCode=SECT301_CHECK` — confirms rule is seeded and shows enabled/disabled state
- `GET /api/admin/ref/rule-definitions?ruleCode=US_232_CHECK`: the same check for the `US_232_CHECK` rule
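
The two lookups can be folded into one readiness summary. This is a sketch only: the response shape is not documented in this runbook, so `RuleDefinitionSummary` (with `ruleCode` and `enabled` fields) and `checkRuleReadiness` are assumptions for illustration.

```typescript
// Assumed minimal shape of a rule definition returned by the admin API.
interface RuleDefinitionSummary {
  ruleCode: string;
  enabled: boolean;
}

interface ReadinessReport {
  ready: boolean;     // true only if every required rule is seeded and enabled
  missing: string[];  // required rule codes not found at all
  disabled: string[]; // required rule codes found but disabled
}

const REQUIRED_CORRIDOR_RULES = ["SECT301_CHECK", "US_232_CHECK"];

function checkRuleReadiness(
  rules: RuleDefinitionSummary[],
  required: string[] = REQUIRED_CORRIDOR_RULES,
): ReadinessReport {
  const missing: string[] = [];
  const disabled: string[] = [];
  for (let i = 0; i < required.length; i++) {
    const code = required[i];
    const matches = rules.filter((rule) => rule.ruleCode === code);
    if (matches.length === 0) {
      missing.push(code);
    } else if (!matches.some((rule) => rule.enabled)) {
      disabled.push(code);
    }
  }
  return { ready: missing.length === 0 && disabled.length === 0, missing, disabled };
}
```

Separating `missing` from `disabled` distinguishes a seeding problem from a rule that was switched off, for example by an earlier Tier 2 rollback.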
 
### Rule loading logging
 
- `loadRulesForContext` emits DEBUG-level log: `{origin, destination, module, enabledOnly, ruleCount, ruleCodes}`
- `loadSimaCRERules` emits DEBUG-level log: `{module, ruleCodePrefix, ruleCount, ruleCodes}`
 
### Known observability gaps
 
- Scope metadata (corridor, origin, destination) is not logged during individual rule evaluation
- Per-detection structured log with full rule-evaluation trace remains a follow-on improvement
 
---
 
## Pre-Activation Checklist
 
For customer go/no-go:
 
- [ ] Run `scripts/run-cnus-corridor-dry-run.sh` and archive summary artifact (`docs/operations/p7-e-a11-cnus-staging-dry-run-signoff.md`)
- [ ] Verify rules seeded: `GET /api/admin/ref/rule-definitions?ruleCode=SECT301_CHECK` returns enabled rule
- [ ] Verify rules seeded: `GET /api/admin/ref/rule-definitions?ruleCode=US_232_CHECK` returns enabled rule
- [ ] Confirm `TRADE_DETECTOR_MODE` is set to `dual` (not `new`) for initial observation period
- [ ] Upload test file with CN-origin products to staging → verify expected behavior in logs
- [ ] Monitor logs for the 24-hour activation window
- [ ] If the window is clean, promote to `new` mode and proceed to broader rollout per S2 (risk signoff)
 
---
 
## Exit Criteria (G6)
 
- [x] Rollback plan documented with tiered controls (this runbook)
- [x] On-call owner identified for activation window
- [x] Existing observability confirmed (health check + detection logging + rule readiness API)
- [x] Activation rehearsal evidence captured for all rollback tiers
- [x] Known gaps documented (scope metadata logging depth)