Ref Schema Proposal
Source: docs/proposals/ref-schema-proposal.md
# Reference Data Schema Proposal (`ref.*`)
**Author:** Claude Code
**Date:** February 2026
**Status:** Draft
**Source:** Adapted from `_archived/services/regulatory-ingestion-service/README.md`
---
## Overview
This document proposes a `ref` schema for regulatory reference data, separate from the `app` schema for tenant data. The design is based on the archived RIS (Regulatory Ingestion Service) provenance model.
### Addendum: ID Strategy and Cross-Schema FKs (2026-02-26)
- Legacy `ref.*` tables retain `cuid()` IDs for seed determinism.
- New API-facing `ref.*` tables may use UUID IDs when route-level UUID validation is required (for example, `ref.compliance_obligations`).
- Explicit cross-schema foreign keys from tenant-scoped `app.*` tables to `ref.*` are permitted when the app entity represents tenant state bound to global reference records (for example, `app.obligation_evidence -> ref.compliance_obligations`).
---
## Design Principles
1. **Separation of concerns:** Reference data (`ref.*`) is global; tenant data (`app.*`) is isolated
2. **Full provenance:** Every record tracks source, retrieval date, and change history
3. **Temporal validity:** `effective_from` / `effective_to` on all regulatory data
4. **Determinism:** Screening decisions pin to a `DataRelease` version
5. **Coverage transparency:** `coverage_tier` indicates data quality level
6. **Source-specific validation:** Enforce required fields per jurisdiction at ingestion time (not via DB hard requirements)
---
## Schema Overview
```
ref.data_sources -- Where data comes from
ref.data_releases -- Versioned bundles (e.g., CA-2026.02)
ref.data_change_log -- Audit trail for all changes
ref.trade_remedies -- AD/CVD orders (SIMA, US 337, etc.)
ref.scope_rulings -- Scope determinations for trade remedies
ref.tariff_schedules -- HTS line items with duty rates
ref.carrier_rules -- Claim windows, surcharges, service guarantees
ref.fta_rules -- FTA rules of origin (Phase 2+)
ref.sanctions_entries -- Denied party lists (Phase 2+)
ref.golden_test_cases -- Regression test suite
```
---
## Core Tables
### ref.data_sources
Tracks where reference data originates.
**Proposed source codes (initial):**
| Code | Description |
|---|---|
| `CBSA_SIMA` | Canada SIMA measures (CBSA) |
| `US_DOC` | US Department of Commerce AD/CVD |
| `USITC_HTS` | USITC Harmonized Tariff Schedule |
| `EU_TARIC` | EU TARIC tariff schedules |
| `UK_CDS` | UK Customs Declaration Service |
| `OFAC_SDN` | OFAC sanctions list |
```prisma
model DataSource {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
name String @db.VarChar(255) // "CBSA SIMA Measures in Force"
code String @db.VarChar(50) // CBSA_SIMA, US_DOC, EU_TARIC, etc.
url String? @db.Text // Source URL
sourceType String @db.VarChar(50) // government, commercial, manual
jurisdiction String? @db.VarChar(10) // CA, US, EU, etc.
fetchFrequency String? @db.VarChar(20) // daily, weekly, monthly
lastFetchedAt DateTime?
lastChangeHash String? @db.VarChar(64) // SHA-256 of last fetch for change detection
isActive Boolean @default(true)
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@unique([code])
@@map("data_sources")
@@schema("ref")
}
```
### ref.data_releases
Versioned bundles that pin screening decisions to a specific data state.
```prisma
model DataRelease {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
releaseCode String @unique @db.VarChar(50) // "CA-2026.02"
jurisdiction String @db.VarChar(10) // CA, US, ALL
effectiveFrom DateTime
effectiveTo DateTime?
releaseNotes String? @db.Text
checksums Json // { trade_remedies: "sha256", tariff_schedules: "sha256", ... }
status String @default("draft") @db.VarChar(20) // draft, active, superseded
createdAt DateTime @default(now())
publishedAt DateTime?
publishedBy String? @db.VarChar(255)
@@index([jurisdiction, status])
@@map("data_releases")
@@schema("ref")
}
```
### ref.data_change_log
Immutable audit trail for all reference data changes.
```prisma
model DataChangeLog {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
tableName String @db.VarChar(100)
recordId String @db.Uuid
changeType String @db.VarChar(20) // insert, update, delete
oldValue Json?
newValue Json?
changeSummary String? @db.Text // Human-readable: "duty_rate: 5% -> 7%"
dataHash String? @db.VarChar(64) // SHA-256 of new value
changedBy String @db.VarChar(255) // user_id or "system:ai_pipeline"
changedAt DateTime @default(now())
reviewedBy String? @db.VarChar(255)
reviewedAt DateTime?
releaseId String? @db.Uuid // Which release included this change
@@index([tableName, recordId])
@@index([changedAt])
@@index([releaseId])
@@map("data_change_log")
@@schema("ref")
}
```
---
## Reference Data Tables
### ref.trade_remedies
AD/CVD orders (SIMA, US Section 337, EU trade defense, etc.)
```prisma
enum CoverageTier {
ALPHA // Automated ingest only
BETA // Automated + partial SME review
VERIFIED // Full SME review + cross-source checks
}
model TradeRemedy {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
// Identity
// Identity hierarchy:
// - sourceCaseId: Raw ID from source (e.g., "A-570-965")
// - caseNumber: Canonical normalized ID (may equal sourceCaseId)
// - measureCode: Optional jurisdiction-specific code (e.g., "UDS" for CBSA SIMA)
caseNumber String @db.VarChar(50) // Canonical case ID (may be derived from sourceCaseId)
sourceSystem String @db.VarChar(30) // Controlled via ref.data_sources.code
sourceCaseId String @db.VarChar(100) // Raw per-source identifier
measureCode String? @db.VarChar(20) // Optional; required only for sources that define it (e.g., SIMA)
name String @db.VarChar(255) // "Upholstered Domestic Seating"
// Jurisdiction
imposingCountry String @db.VarChar(10) // CA, US, EU
originCountries String[] // Countries subject to order
// Scope
caseType String @db.VarChar(30) // antidumping, countervailing, safeguard
productScope String @db.Text // Scope description
hsCodes String[] // Covered HS codes (10-digit)
hsPrefixes String[] // 4-digit prefixes for fast matching
materials String[] // Relevant materials (optional)
// Rates
dutyType String? @db.VarChar(30) // ad_valorem, specific, combination
dutyRate Decimal? @db.Decimal(10,4) // Rate if ad valorem
dutyAmount Decimal? @db.Decimal(10,2) // Amount if specific
estimatedPerUnit Decimal? @db.Decimal(10,2) // Estimated cost per unit
// Temporal
effectiveFrom DateTime
effectiveTo DateTime? // null = still active
sunsetDate DateTime? // Scheduled review date
status String @db.VarChar(20) // active, suspended, revoked, sunset_pending
// Provenance (RIS model)
sourceId String @db.Uuid
source DataSource @relation(fields: [sourceId], references: [id])
sourceUrl String? @db.Text // Direct link to source document
sourcePublishedAt DateTime? // When source published this
retrievedAt DateTime @default(now()) // When we fetched it
dataHash String? @db.VarChar(64) // SHA-256 for change detection
previousVersionId String? @db.Uuid // Link to previous version
// Quality
coverageTier CoverageTier @default(ALPHA)
confidenceScore Int? // 0-100
reviewedBy String? @db.VarChar(255)
reviewedAt DateTime?
// Release
dataReleaseId String? @db.Uuid
// Metadata
references String[] // Related case numbers, Federal Register citations
notes String? @db.Text
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
scopeRulings ScopeRuling[]
@@unique([imposingCountry, caseNumber, effectiveFrom])
@@unique([sourceSystem, sourceCaseId, effectiveFrom])
@@index([sourceSystem, sourceCaseId])
@@index([imposingCountry, status])
// NOTE: Prisma does not support array indexes in schema.
// Create GIN indexes for array fields via raw SQL migration:
// CREATE INDEX trade_remedies_hs_prefixes_gin ON ref.trade_remedies USING GIN (hs_prefixes);
// CREATE INDEX trade_remedies_origin_countries_gin ON ref.trade_remedies USING GIN (origin_countries);
@@index([measureCode])
@@index([dataReleaseId])
@@map("trade_remedies")
@@schema("ref")
}
```
### ref.scope_rulings
Scope determinations that clarify whether specific products are in/out of scope.
```prisma
model ScopeRuling {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
remedyId String @db.Uuid
remedy TradeRemedy @relation(fields: [remedyId], references: [id])
rulingNumber String @db.VarChar(50)
rulingDate DateTime
productDesc String @db.Text // Product description in ruling
determination String @db.VarChar(20) // in_scope, out_of_scope, partial
reasoning String? @db.Text // Summary of reasoning
// Provenance
sourceUrl String? @db.Text
retrievedAt DateTime @default(now())
createdAt DateTime @default(now())
@@index([remedyId])
@@index([rulingDate])
@@map("scope_rulings")
@@schema("ref")
}
```
### ref.carrier_rules
Carrier-specific rules for claims, surcharges, service guarantees.
```prisma
model CarrierRule {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
// Identity
carrier String @db.VarChar(50) // UPS, FEDEX, DHL, etc.
ruleType String @db.VarChar(50) // claim_window, surcharge, service_guarantee
claimType String? @db.VarChar(50) // AMOUNT_VARIANCE, DIM_VARIANCE, etc.
// Scope
geography String? @db.VarChar(10) // null = global, or country code
serviceLevel String? @db.VarChar(50) // null = all, or specific service
// Parameters
parameters Json // Rule-specific params
// For claim_window: { filing_window_days: 15, start_anchor: "invoice_date" }
// For surcharge: { surcharge_type: "fuel", rate: 0.15 }
// For service_guarantee: { refund_eligible: true, claim_window_days: 15 }
// Temporal
effectiveFrom DateTime
effectiveTo DateTime?
// Provenance
sourceId String? @db.Uuid
sourceUrl String? @db.Text
retrievedAt DateTime @default(now())
// Quality
coverageTier CoverageTier @default(ALPHA)
reviewedBy String? @db.VarChar(255)
reviewedAt DateTime?
// Release
dataReleaseId String? @db.Uuid
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@unique([carrier, ruleType, claimType, geography, serviceLevel, effectiveFrom])
@@index([carrier, ruleType])
@@index([dataReleaseId])
@@map("carrier_rules")
@@schema("ref")
}
```
### ref.tariff_schedules (Phase 2+)
HTS line items with duty rates.
```prisma
model TariffSchedule {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
// Identity
country String @db.VarChar(10) // CA, US, EU
hsCode String @db.VarChar(15) // 10-digit HS code
// Description
description String @db.Text
parentHsCode String? @db.VarChar(15) // For hierarchy navigation
// Rates
mfnRate Decimal? @db.Decimal(10,4) // Most Favored Nation rate
rateType String @db.VarChar(20) // ad_valorem, specific, compound
rateExpression String? @db.VarChar(100) // e.g., "5% + $0.50/kg"
unitOfMeasure String? @db.VarChar(20)
// Temporal
effectiveFrom DateTime
effectiveTo DateTime?
// Provenance
sourceId String @db.Uuid
retrievedAt DateTime @default(now())
dataHash String? @db.VarChar(64)
// Quality
coverageTier CoverageTier @default(ALPHA)
// Release
dataReleaseId String? @db.Uuid
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@unique([country, hsCode, effectiveFrom])
@@index([country, hsCode])
@@index([dataReleaseId])
@@map("tariff_schedules")
@@schema("ref")
}
```
### ref.golden_test_cases
Regression test suite for data releases.
```prisma
model GoldenTestCase {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
// Test input
testName String @db.VarChar(255)
sku String @db.VarChar(100)
hsCode String @db.VarChar(15)
origin String @db.VarChar(10)
destination String @db.VarChar(10)
productCategory String? @db.VarChar(50)
attributes Json? // Additional test attributes
// Expected output
expectedOutcome String @db.VarChar(30) // AT_RISK, CLEARED, NEEDS_REVIEW
expectedMeasure String? @db.VarChar(20) // Expected measure code if AT_RISK
expectedReason String? @db.VarChar(100) // Expected reason code
// Metadata
notes String? @db.Text
addedBy String? @db.VarChar(255)
addedAt DateTime @default(now())
lastVerifiedAt DateTime?
lastVerifiedRelease String? @db.VarChar(50) // Release code when last verified
@@map("golden_test_cases")
@@schema("ref")
}
```
---
## Migration Plan
### Step 1: Create ref schema
```sql
CREATE SCHEMA IF NOT EXISTS ref;
```
### Step 2: Add schema to Prisma
```prisma
// In schema.prisma
generator client {
provider = "prisma-client-js"
previewFeatures = ["multiSchema"]
}
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
schemas = ["app", "ref"]
}
```
### Step 3: Create migration
```bash
npx prisma migrate dev --name add_ref_schema
```
### Step 4: Migrate SIMA data
```typescript
// apps/api/scripts/migrate-sima-to-ref.ts
import { SIMA_MEASURES } from '@rgl8r/trade'
import { prisma } from '../src/lib/db'
async function migrateSima() {
// Create data source
const source = await prisma.dataSource.create({
data: {
name: 'CBSA SIMA Measures in Force',
code: 'CBSA_SIMA',
url: 'https://cbsa-asfc.gc.ca/sima-lmsi/mif-mev/menu-eng.html',
sourceType: 'government',
jurisdiction: 'CA',
fetchFrequency: 'monthly',
}
})
// Migrate each measure
for (const [code, measure] of Object.entries(SIMA_MEASURES)) {
await prisma.tradeRemedy.create({
data: {
caseNumber: code,
sourceSystem: source.code,
sourceCaseId: code,
measureCode: code,
name: measure.name,
imposingCountry: 'CA',
originCountries: measure.origins,
caseType: 'antidumping', // Most SIMA are AD
productScope: measure.description,
hsCodes: measure.codes,
hsPrefixes: measure.prefixes,
materials: measure.materials || [],
estimatedPerUnit: measure.estimatedAddPerUnit,
effectiveFrom: new Date(), // TODO: Get actual effective date
status: 'active',
sourceId: source.id,
coverageTier: 'BETA', // We've reviewed these
}
})
}
}
```
---
## Linking to Tenant Data
Screening decisions in `app.*` should reference the release:
```prisma
// In app schema
model SIMAExposure {
// ... existing fields
dataReleaseId String? @db.Uuid // Pin to specific release
// Allows: "This AT_RISK was determined using CA-2026.02 data"
}
model Finding {
// ... existing fields
dataReleaseId String? @db.Uuid // Pin to specific release
}
```
---
## API Endpoints (Internal)
For Phase 1, simple CRUD via admin CLI or internal API:
```
GET /internal/ref/trade-remedies
GET /internal/ref/trade-remedies/:id
POST /internal/ref/trade-remedies
PUT /internal/ref/trade-remedies/:id
DELETE /internal/ref/trade-remedies/:id (soft delete)
GET /internal/ref/releases
POST /internal/ref/releases
POST /internal/ref/releases/:id/publish
GET /internal/ref/golden-tests
POST /internal/ref/golden-tests/run
```
---
## Definition of Done (Phase 1)
- [ ] `ref` schema created in PostgreSQL
- [ ] Prisma models for `DataSource`, `DataRelease`, `DataChangeLog`, `TradeRemedy`, `CarrierRule`
- [ ] SIMA measures migrated from TypeScript to `ref.trade_remedies`
- [ ] Carrier claim configs migrated from `app.carrier_claim_config` to `ref.carrier_rules`
- [ ] Detection code reads from `ref.*` instead of hardcoded TypeScript
- [ ] No hardcoded regulatory data remains in TypeScript
- [ ] Basic admin CLI for CRUD operations
- [ ] At least one golden test case passing
---
## Addendum: ID Strategy and Cross-Schema Foreign Keys (P13-0)
**Date:** 2026-02-25
**Context:** P13-0 (Compliance Navigator thin-slice spike, PR #471) introduced `ref.compliance_obligations` and `app.obligation_evidence`, establishing two new patterns for the `ref` schema.
### Decision 1: New `ref` models use UUID for API-facing IDs
**Existing pattern:** Original `ref` models (`DataSource`, `DataRelease`, `DataChangeLog`) use `@default(dbgenerated("gen_random_uuid()")) @db.Uuid`. `TradeRemedy` and `CarrierRule` also use UUID. `RuleDefinition` uses `@default(cuid())`.
**New pattern:** `ComplianceObligation` uses UUID (`@db.Uuid`) to align with the `createProxyHandler` factory in `apps/web/src/lib/proxy.ts`, which validates route params as UUID by default (`params: { id: { validate: 'uuid' } }`). Using cuid would require every proxy route to skip UUID validation or use `params: { id: {} }`.
**Migration policy:** Existing `ref` models are NOT migrated from cuid to UUID — the cost outweighs the benefit for models that already have working proxy routes. New `ref` models MUST use UUID. The proxy helper comment block notes that ref models may use either format depending on vintage.
### Decision 2: Cross-schema foreign keys (`app` → `ref`) are permitted
**Pattern:** `app.obligation_evidence.obligationId` references `ref.compliance_obligations.id` via a cross-schema FK with `ON DELETE CASCADE`. Prisma supports this with the `multiSchema` preview feature.
**Rationale:** Obligation evidence is tenant-scoped (lives in `app` with RLS) but references global obligation definitions (lives in `ref`, no RLS). The FK ensures referential integrity — evidence cannot reference a nonexistent obligation. CASCADE ensures cleanup when obligations are retired.
**Guard:** The `schemas` array in `prisma/schema.prisma` generator must list ALL schemas (`["app", "ref"]`). If a schema is missing, `prisma format` strips models annotated with that schema. This is already documented in CLAUDE.md under "Prisma Gotchas."
---
## References
- **RIS design:** `_archived/services/regulatory-ingestion-service/README.md`
- **Data foundation strategy:** `docs/proposals/data-foundation-strategy.md`
- **Current SIMA measures:** `packages/modules/trade/src/config/sima-measures.ts`
- **P13-0 spike:** PR #471 (`codex/p13-0-compliance-spike`)