Skip to Content
InternalDocsProposalsRef Schema Proposal

Ref Schema Proposal

Source: docs/proposals/ref-schema-proposal.md

# Reference Data Schema Proposal (`ref.*`) **Author:** Claude Code **Date:** February 2026 **Status:** Draft **Source:** Adapted from `_archived/services/regulatory-ingestion-service/README.md` --- ## Overview This document proposes a `ref` schema for regulatory reference data, separate from the `app` schema for tenant data. The design is based on the archived RIS (Regulatory Ingestion Service) provenance model. ### Addendum: ID Strategy and Cross-Schema FKs (2026-02-26) - Legacy `ref.*` tables retain `cuid()` IDs for seed determinism. - New API-facing `ref.*` tables may use UUID IDs when route-level UUID validation is required (for example, `ref.compliance_obligations`). - Explicit cross-schema foreign keys from tenant-scoped `app.*` tables to `ref.*` are permitted when the app entity represents tenant state bound to global reference records (for example, `app.obligation_evidence -> ref.compliance_obligations`). --- ## Design Principles 1. **Separation of concerns:** Reference data (`ref.*`) is global; tenant data (`app.*`) is isolated 2. **Full provenance:** Every record tracks source, retrieval date, and change history 3. **Temporal validity:** `effective_from` / `effective_to` on all regulatory data 4. **Determinism:** Screening decisions pin to a `DataRelease` version 5. **Coverage transparency:** `coverage_tier` indicates data quality level 6. **Source-specific validation:** Enforce required fields per jurisdiction at ingestion time (not via DB hard requirements) --- ## Schema Overview ``` ref.data_sources -- Where data comes from ref.data_releases -- Versioned bundles (e.g., CA-2026.02) ref.data_change_log -- Audit trail for all changes ref.trade_remedies -- AD/CVD orders (SIMA, US 337, etc.) ref.scope_rulings -- Scope determinations for trade remedies ref.tariff_schedules -- HTS line items with duty rates ref.carrier_rules -- Claim windows, surcharges, service guarantees ref.fta_rules -- FTA rules of origin (Phase 2+) ref.sanctions_entries -- Denied party lists (Phase 2+) ref.golden_test_cases -- Regression test suite ``` --- ## Core Tables ### ref.data_sources Tracks where reference data originates. **Proposed source codes (initial):** | Code | Description | |---|---| | `CBSA_SIMA` | Canada SIMA measures (CBSA) | | `US_DOC` | US Department of Commerce AD/CVD | | `USITC_HTS` | USITC Harmonized Tariff Schedule | | `EU_TARIC` | EU TARIC tariff schedules | | `UK_CDS` | UK Customs Declaration Service | | `OFAC_SDN` | OFAC sanctions list | ```prisma model DataSource { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid name String @db.VarChar(255) // "CBSA SIMA Measures in Force" code String @db.VarChar(50) // CBSA_SIMA, US_DOC, EU_TARIC, etc. url String? @db.Text // Source URL sourceType String @db.VarChar(50) // government, commercial, manual jurisdiction String? @db.VarChar(10) // CA, US, EU, etc. fetchFrequency String? @db.VarChar(20) // daily, weekly, monthly lastFetchedAt DateTime? lastChangeHash String? @db.VarChar(64) // SHA-256 of last fetch for change detection isActive Boolean @default(true) createdAt DateTime @default(now()) updatedAt DateTime @updatedAt @@unique([code]) @@map("data_sources") @@schema("ref") } ``` ### ref.data_releases Versioned bundles that pin screening decisions to a specific data state. ```prisma model DataRelease { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid releaseCode String @unique @db.VarChar(50) // "CA-2026.02" jurisdiction String @db.VarChar(10) // CA, US, ALL effectiveFrom DateTime effectiveTo DateTime? releaseNotes String? @db.Text checksums Json // { trade_remedies: "sha256", tariff_schedules: "sha256", ... } status String @default("draft") @db.VarChar(20) // draft, active, superseded createdAt DateTime @default(now()) publishedAt DateTime? publishedBy String? @db.VarChar(255) @@index([jurisdiction, status]) @@map("data_releases") @@schema("ref") } ``` ### ref.data_change_log Immutable audit trail for all reference data changes. ```prisma model DataChangeLog { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid tableName String @db.VarChar(100) recordId String @db.Uuid changeType String @db.VarChar(20) // insert, update, delete oldValue Json? newValue Json? changeSummary String? @db.Text // Human-readable: "duty_rate: 5% -> 7%" dataHash String? @db.VarChar(64) // SHA-256 of new value changedBy String @db.VarChar(255) // user_id or "system:ai_pipeline" changedAt DateTime @default(now()) reviewedBy String? @db.VarChar(255) reviewedAt DateTime? releaseId String? @db.Uuid // Which release included this change @@index([tableName, recordId]) @@index([changedAt]) @@index([releaseId]) @@map("data_change_log") @@schema("ref") } ``` --- ## Reference Data Tables ### ref.trade_remedies AD/CVD orders (SIMA, US Section 337, EU trade defense, etc.) ```prisma enum CoverageTier { ALPHA // Automated ingest only BETA // Automated + partial SME review VERIFIED // Full SME review + cross-source checks } model TradeRemedy { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid // Identity // Identity hierarchy: // - sourceCaseId: Raw ID from source (e.g., "A-570-965") // - caseNumber: Canonical normalized ID (may equal sourceCaseId) // - measureCode: Optional jurisdiction-specific code (e.g., "UDS" for CBSA SIMA) caseNumber String @db.VarChar(50) // Canonical case ID (may be derived from sourceCaseId) sourceSystem String @db.VarChar(30) // Controlled via ref.data_sources.code sourceCaseId String @db.VarChar(100) // Raw per-source identifier measureCode String? @db.VarChar(20) // Optional; required only for sources that define it (e.g., SIMA) name String @db.VarChar(255) // "Upholstered Domestic Seating" // Jurisdiction imposingCountry String @db.VarChar(10) // CA, US, EU originCountries String[] // Countries subject to order // Scope caseType String @db.VarChar(30) // antidumping, countervailing, safeguard productScope String @db.Text // Scope description hsCodes String[] // Covered HS codes (10-digit) hsPrefixes String[] // 4-digit prefixes for fast matching materials String[] // Relevant materials (optional) // Rates dutyType String? @db.VarChar(30) // ad_valorem, specific, combination dutyRate Decimal? @db.Decimal(10,4) // Rate if ad valorem dutyAmount Decimal? @db.Decimal(10,2) // Amount if specific estimatedPerUnit Decimal? @db.Decimal(10,2) // Estimated cost per unit // Temporal effectiveFrom DateTime effectiveTo DateTime? // null = still active sunsetDate DateTime? // Scheduled review date status String @db.VarChar(20) // active, suspended, revoked, sunset_pending // Provenance (RIS model) sourceId String @db.Uuid source DataSource @relation(fields: [sourceId], references: [id]) sourceUrl String? @db.Text // Direct link to source document sourcePublishedAt DateTime? // When source published this retrievedAt DateTime @default(now()) // When we fetched it dataHash String? @db.VarChar(64) // SHA-256 for change detection previousVersionId String? @db.Uuid // Link to previous version // Quality coverageTier CoverageTier @default(ALPHA) confidenceScore Int? // 0-100 reviewedBy String? @db.VarChar(255) reviewedAt DateTime? // Release dataReleaseId String? @db.Uuid // Metadata references String[] // Related case numbers, Federal Register citations notes String? @db.Text createdAt DateTime @default(now()) updatedAt DateTime @updatedAt scopeRulings ScopeRuling[] @@unique([imposingCountry, caseNumber, effectiveFrom]) @@unique([sourceSystem, sourceCaseId, effectiveFrom]) @@index([sourceSystem, sourceCaseId]) @@index([imposingCountry, status]) // NOTE: Prisma does not support array indexes in schema. // Create GIN indexes for array fields via raw SQL migration: // CREATE INDEX trade_remedies_hs_prefixes_gin ON ref.trade_remedies USING GIN (hs_prefixes); // CREATE INDEX trade_remedies_origin_countries_gin ON ref.trade_remedies USING GIN (origin_countries); @@index([measureCode]) @@index([dataReleaseId]) @@map("trade_remedies") @@schema("ref") } ``` ### ref.scope_rulings Scope determinations that clarify whether specific products are in/out of scope. ```prisma model ScopeRuling { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid remedyId String @db.Uuid remedy TradeRemedy @relation(fields: [remedyId], references: [id]) rulingNumber String @db.VarChar(50) rulingDate DateTime productDesc String @db.Text // Product description in ruling determination String @db.VarChar(20) // in_scope, out_of_scope, partial reasoning String? @db.Text // Summary of reasoning // Provenance sourceUrl String? @db.Text retrievedAt DateTime @default(now()) createdAt DateTime @default(now()) @@index([remedyId]) @@index([rulingDate]) @@map("scope_rulings") @@schema("ref") } ``` ### ref.carrier_rules Carrier-specific rules for claims, surcharges, service guarantees. ```prisma model CarrierRule { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid // Identity carrier String @db.VarChar(50) // UPS, FEDEX, DHL, etc. ruleType String @db.VarChar(50) // claim_window, surcharge, service_guarantee claimType String? @db.VarChar(50) // AMOUNT_VARIANCE, DIM_VARIANCE, etc. // Scope geography String? @db.VarChar(10) // null = global, or country code serviceLevel String? @db.VarChar(50) // null = all, or specific service // Parameters parameters Json // Rule-specific params // For claim_window: { filing_window_days: 15, start_anchor: "invoice_date" } // For surcharge: { surcharge_type: "fuel", rate: 0.15 } // For service_guarantee: { refund_eligible: true, claim_window_days: 15 } // Temporal effectiveFrom DateTime effectiveTo DateTime? // Provenance sourceId String? @db.Uuid sourceUrl String? @db.Text retrievedAt DateTime @default(now()) // Quality coverageTier CoverageTier @default(ALPHA) reviewedBy String? @db.VarChar(255) reviewedAt DateTime? // Release dataReleaseId String? @db.Uuid createdAt DateTime @default(now()) updatedAt DateTime @updatedAt @@unique([carrier, ruleType, claimType, geography, serviceLevel, effectiveFrom]) @@index([carrier, ruleType]) @@index([dataReleaseId]) @@map("carrier_rules") @@schema("ref") } ``` ### ref.tariff_schedules (Phase 2+) HTS line items with duty rates. ```prisma model TariffSchedule { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid // Identity country String @db.VarChar(10) // CA, US, EU hsCode String @db.VarChar(15) // 10-digit HS code // Description description String @db.Text parentHsCode String? @db.VarChar(15) // For hierarchy navigation // Rates mfnRate Decimal? @db.Decimal(10,4) // Most Favored Nation rate rateType String @db.VarChar(20) // ad_valorem, specific, compound rateExpression String? @db.VarChar(100) // e.g., "5% + $0.50/kg" unitOfMeasure String? @db.VarChar(20) // Temporal effectiveFrom DateTime effectiveTo DateTime? // Provenance sourceId String @db.Uuid retrievedAt DateTime @default(now()) dataHash String? @db.VarChar(64) // Quality coverageTier CoverageTier @default(ALPHA) // Release dataReleaseId String? @db.Uuid createdAt DateTime @default(now()) updatedAt DateTime @updatedAt @@unique([country, hsCode, effectiveFrom]) @@index([country, hsCode]) @@index([dataReleaseId]) @@map("tariff_schedules") @@schema("ref") } ``` ### ref.golden_test_cases Regression test suite for data releases. ```prisma model GoldenTestCase { id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid // Test input testName String @db.VarChar(255) sku String @db.VarChar(100) hsCode String @db.VarChar(15) origin String @db.VarChar(10) destination String @db.VarChar(10) productCategory String? @db.VarChar(50) attributes Json? // Additional test attributes // Expected output expectedOutcome String @db.VarChar(30) // AT_RISK, CLEARED, NEEDS_REVIEW expectedMeasure String? @db.VarChar(20) // Expected measure code if AT_RISK expectedReason String? @db.VarChar(100) // Expected reason code // Metadata notes String? @db.Text addedBy String? @db.VarChar(255) addedAt DateTime @default(now()) lastVerifiedAt DateTime? lastVerifiedRelease String? @db.VarChar(50) // Release code when last verified @@map("golden_test_cases") @@schema("ref") } ``` --- ## Migration Plan ### Step 1: Create ref schema ```sql CREATE SCHEMA IF NOT EXISTS ref; ``` ### Step 2: Add schema to Prisma ```prisma // In schema.prisma generator client { provider = "prisma-client-js" previewFeatures = ["multiSchema"] } datasource db { provider = "postgresql" url = env("DATABASE_URL") schemas = ["app", "ref"] } ``` ### Step 3: Create migration ```bash npx prisma migrate dev --name add_ref_schema ``` ### Step 4: Migrate SIMA data ```typescript // apps/api/scripts/migrate-sima-to-ref.ts import { SIMA_MEASURES } from '@rgl8r/trade' import { prisma } from '../src/lib/db' async function migrateSima() { // Create data source const source = await prisma.dataSource.create({ data: { name: 'CBSA SIMA Measures in Force', code: 'CBSA_SIMA', url: 'https://cbsa-asfc.gc.ca/sima-lmsi/mif-mev/menu-eng.html', sourceType: 'government', jurisdiction: 'CA', fetchFrequency: 'monthly', } }) // Migrate each measure for (const [code, measure] of Object.entries(SIMA_MEASURES)) { await prisma.tradeRemedy.create({ data: { caseNumber: code, sourceSystem: source.code, sourceCaseId: code, measureCode: code, name: measure.name, imposingCountry: 'CA', originCountries: measure.origins, caseType: 'antidumping', // Most SIMA are AD productScope: measure.description, hsCodes: measure.codes, hsPrefixes: measure.prefixes, materials: measure.materials || [], estimatedPerUnit: measure.estimatedAddPerUnit, effectiveFrom: new Date(), // TODO: Get actual effective date status: 'active', sourceId: source.id, coverageTier: 'BETA', // We've reviewed these } }) } } ``` --- ## Linking to Tenant Data Screening decisions in `app.*` should reference the release: ```prisma // In app schema model SIMAExposure { // ... existing fields dataReleaseId String? @db.Uuid // Pin to specific release // Allows: "This AT_RISK was determined using CA-2026.02 data" } model Finding { // ... existing fields dataReleaseId String? @db.Uuid // Pin to specific release } ``` --- ## API Endpoints (Internal) For Phase 1, simple CRUD via admin CLI or internal API: ``` GET /internal/ref/trade-remedies GET /internal/ref/trade-remedies/:id POST /internal/ref/trade-remedies PUT /internal/ref/trade-remedies/:id DELETE /internal/ref/trade-remedies/:id (soft delete) GET /internal/ref/releases POST /internal/ref/releases POST /internal/ref/releases/:id/publish GET /internal/ref/golden-tests POST /internal/ref/golden-tests/run ``` --- ## Definition of Done (Phase 1) - [ ] `ref` schema created in PostgreSQL - [ ] Prisma models for `DataSource`, `DataRelease`, `DataChangeLog`, `TradeRemedy`, `CarrierRule` - [ ] SIMA measures migrated from TypeScript to `ref.trade_remedies` - [ ] Carrier claim configs migrated from `app.carrier_claim_config` to `ref.carrier_rules` - [ ] Detection code reads from `ref.*` instead of hardcoded TypeScript - [ ] No hardcoded regulatory data remains in TypeScript - [ ] Basic admin CLI for CRUD operations - [ ] At least one golden test case passing --- ## Addendum: ID Strategy and Cross-Schema Foreign Keys (P13-0) **Date:** 2026-02-25 **Context:** P13-0 (Compliance Navigator thin-slice spike, PR #471) introduced `ref.compliance_obligations` and `app.obligation_evidence`, establishing two new patterns for the `ref` schema. ### Decision 1: New `ref` models use UUID for API-facing IDs **Existing pattern:** Original `ref` models (`DataSource`, `DataRelease`, `DataChangeLog`) use `@default(dbgenerated("gen_random_uuid()")) @db.Uuid`. `TradeRemedy` and `CarrierRule` also use UUID. `RuleDefinition` uses `@default(cuid())`. **New pattern:** `ComplianceObligation` uses UUID (`@db.Uuid`) to align with the `createProxyHandler` factory in `apps/web/src/lib/proxy.ts`, which validates route params as UUID by default (`params: { id: { validate: 'uuid' } }`). Using cuid would require every proxy route to skip UUID validation or use `params: { id: {} }`. **Migration policy:** Existing `ref` models are NOT migrated from cuid to UUID — the cost outweighs the benefit for models that already have working proxy routes. New `ref` models MUST use UUID. The proxy helper comment block notes that ref models may use either format depending on vintage. ### Decision 2: Cross-schema foreign keys (`app` → `ref`) are permitted **Pattern:** `app.obligation_evidence.obligationId` references `ref.compliance_obligations.id` via a cross-schema FK with `ON DELETE CASCADE`. Prisma supports this with the `multiSchema` preview feature. **Rationale:** Obligation evidence is tenant-scoped (lives in `app` with RLS) but references global obligation definitions (lives in `ref`, no RLS). The FK ensures referential integrity — evidence cannot reference a nonexistent obligation. CASCADE ensures cleanup when obligations are retired. **Guard:** The `schemas` array in `prisma/schema.prisma` generator must list ALL schemas (`["app", "ref"]`). If a schema is missing, `prisma format` strips models annotated with that schema. This is already documented in CLAUDE.md under "Prisma Gotchas." --- ## References - **RIS design:** `_archived/services/regulatory-ingestion-service/README.md` - **Data foundation strategy:** `docs/proposals/data-foundation-strategy.md` - **Current SIMA measures:** `packages/modules/trade/src/config/sima-measures.ts` - **P13-0 spike:** PR #471 (`codex/p13-0-compliance-spike`)