Wayfair Altana Ingestion Spike
Source: docs/spikes/wayfair-altana-ingestion-spike.md
# Spike: Wayfair Altana Ingestion Format (P6-B-SPIKE)
**Status:** Open (awaiting sample data from Wayfair)
**Date:** 2026-02-11
**Author:** Dan + Claude
---
## 1. Summary
P6-B requires implementing two capabilities for SIMA compliance that depend on richer attribute data than what the current Wayfair Excel upload provides:
1. **True components detection** -- Identifying when a parent SKU (e.g., "Bedroom Set") contains a component subject to SIMA measures (e.g., a mattress from China). This requires parent-child relationship data that does not exist in the current Wayfair Excel format.
2. **Commercial/residential attribute logic** -- The UDS (Upholstered Domestic Seating) SIMA measure only applies to *domestic* seating. If a product is classified as commercial use, it can be cleared. This requires an `intendedUse` attribute.
Wayfair has an **Altana dataset** that provides 300-400 attributes per SKU, which likely contains the data needed for both capabilities. This spike documents what we know, what we need, and provides a framework for ingesting the Altana data when a sample arrives.
---
## 2. What We Know
### Current SIMA Validation Pipeline
The existing pipeline works as follows:
1. **Upload:** Wayfair Excel file uploaded via `POST /api/upload`, processed by `processWayfairUpload` handler (`apps/api/src/workers/handlers/wayfair-upload.ts`).
2. **Parse:** `parseWayfairExcelFromBuffer` (`packages/modules/trade/src/wayfair-parser.ts`) parses a multi-row-per-SKU Excel format. Each row contains one attribute (Schema Tag pattern). Column mappings are defined in `packages/modules/trade/src/config/wayfair-columns.ts`.
3. **Validate:** `validateProductAttributes` (`packages/modules/trade/src/validation.ts`) validates three key attributes:
- `countryOfOrigin` -- Normalized to ISO 2-letter code
- `material` -- Classified into categories (e.g., "upholstered", "foam", "stainless_steel")
- `hsCode` -- Validated against known SIMA-relevant HS prefixes with misclassification detection
4. **SIMA Check:** `validateSIMAExposure` / `getSIMAOutcome` (`packages/modules/trade/src/sima.ts`) determines exposure against 8 active CBSA SIMA measures defined in `packages/modules/trade/src/config/sima-measures.ts`.
### Current Attributes Used
From `wayfair-columns.ts`, the parser currently extracts:
| Attribute | Source Column(s) | Used For |
|-----------|-----------------|----------|
| `countryOfOrigin` | "Country of Origin", "Country of Origin - Additional Details" | SIMA origin check |
| `material` | "Material", "Primary Material", plus 10 other material columns | SIMA material scope check |
| `hsCode` | "Wayfair HS codes" | SIMA HS code coverage |
| `consensusHsCode` | "Consensus HS Code" | HS confidence cross-reference |
| `topPickHsCode` | "Top Pick Code" | HS confidence cross-reference |
| `className` | "Class_Name" | Product categorization |
| `name` | "Product_Name" | Keyword analysis for HS validation |
### What Is Missing
The current pipeline has no concept of:
- **Parent-child SKU relationships** -- Whether a SKU is a kit/set containing sub-components
- **Intended use** -- Whether a product is for commercial or residential use
- **Component inclusion status** -- Whether a component is included in a set or sold separately
- **Component-level HS/origin/material** -- Per-component trade attributes for kits
These are the exact gaps that the Altana dataset is expected to fill.
### About Altana
Altana is a supply chain intelligence platform that enriches product data with trade compliance attributes. Their typical export format includes:
- Product identity (SKU, GTIN, name)
- Detailed material composition (often multi-level)
- Country of origin with supply chain provenance
- HS code suggestions with confidence scores
- Product taxonomy/categorization
- Supplier and manufacturer data
- Bill of materials / component relationships (when available)
Typical export: CSV with one row per SKU, 300-400+ columns.
---
## 3. Required Attribute Mapping
The table below maps RGL8R attribute needs to likely Altana field names. **These are educated guesses** -- actual field names will be confirmed when we receive a sample file.
| RGL8R Attribute | Purpose | Likely Altana Field(s) | Confidence | Notes |
|----------------|---------|----------------------|------------|-------|
| `componentOf` / `parentSkuId` | True components detection | `parent_sku`, `kit_id`, `bom_parent`, `set_id` | LOW | Altana may not expose this directly -- could be a Wayfair-side enrichment |
| `intendedUse` | UDS commercial/residential classification | `intended_use`, `market_segment`, `product_use`, `application_type` | MEDIUM | Critical for UDS measure -- "COMMERCIAL" clears the SKU |
| `materialComposition` | Material-based SIMA scope (multi-material) | `material_composition`, `primary_material`, `material_detail` | HIGH | Altana typically has detailed material data |
| `countryOfOrigin` | SIMA origin check | `country_of_origin`, `coo`, `manufacturing_country` | HIGH | Already used; Altana may provide higher confidence |
| `hsCode` | SIMA HS code coverage | `hs_code`, `hts_code`, `tariff_code`, `hs_6`, `hs_10` | HIGH | Altana may provide multiple HS suggestions with confidence |
| `hsConfidence` | HS confidence scoring | `hs_confidence`, `classification_confidence` | MEDIUM | Could improve our HS validation confidence threshold |
| `supplierCountry` | Supply chain provenance | `supplier_country`, `vendor_country`, `ship_from_country` | MEDIUM | Useful for origin validation cross-check |
| `productCategory` | Category-specific attribute requirements | `product_type`, `category`, `taxonomy` | HIGH | Enables category-based attribute completeness rules |
| `swivelIndicator` | UDS sub-classification | `swivel`, `swivel_type`, `rotation` | LOW | Very Wayfair-specific; may not exist in Altana |
| `inclusionStatus` | Component inclusion (INCLUDED/NOT_INCLUDED) | `included_in_set`, `kit_inclusion`, `bundle_component` | LOW | Likely requires Wayfair-side data, not Altana |
### Attribute Priority for P6-B
1. **Must Have:** `countryOfOrigin`, `hsCode`, `materialComposition` (improved versions of existing)
2. **Must Have for True Components:** `componentOf` / `parentSkuId` (or equivalent BOM data)
3. **Must Have for Commercial/Residential:** `intendedUse`
4. **Nice to Have:** `hsConfidence`, `supplierCountry`, `productCategory`
---
## 4. Expected File Format
### Likely Format
Altana typically exports as **CSV** (comma-delimited), though Excel exports are also common. Key format characteristics to confirm:
| Property | Expected | Need to Confirm |
|----------|----------|-----------------|
| File type | CSV (.csv) or Excel (.xlsx) | Yes |
| Encoding | UTF-8 | Yes |
| Delimiter | Comma (CSV) | Yes |
| Header row | Row 1 | Yes |
| Row structure | One row per SKU | Yes -- critical difference from current Wayfair format |
| Column count | 300-400+ | Yes |
| SKU identifier | SKU or GTIN column | Yes -- must match Wayfair SKU format |
| Null handling | Empty string or "N/A" | Yes |
### Key Format Difference from Current Pipeline
The current Wayfair Excel parser (`wayfair-parser.ts`) handles a **multi-row-per-SKU** format where each row is one attribute (the Schema Tag pattern). The Altana export is expected to be a **flat one-row-per-SKU** format with attributes as columns.
This is a fundamental structural difference that affects parser design.
### Information Needed from Wayfair
Before implementation can begin, we need:
1. **A sample Altana export file** (even a small subset of 10-20 SKUs with all columns)
2. **Column name mapping documentation** (if Wayfair has customized Altana's default column names)
3. **SKU identifier format** -- Does the Altana export use the same SKU values as the existing Wayfair Excel?
4. **Update frequency** -- How often does Wayfair refresh the Altana dataset? (daily, weekly, ad hoc)
5. **Delivery mechanism** -- File drop, API call, or manual upload?
6. **Subset or full catalog** -- Will we receive the full 1.2M SKU catalog or a filtered subset?
---
## 5. Ingestion Path Options
### Option A: Extend Existing Upload Pipeline (Recommended Starting Point)
Use the existing `POST /api/upload` endpoint with an Altana-specific parser.
**How it works:**
1. User uploads Altana CSV/Excel via the existing upload endpoint
2. Job processor detects the file format (Altana vs. Wayfair Schema Tag) based on column headers
3. New `parseAltanaExport()` parser transforms the flat format into `NormalizedProduct[]`
4. Existing validation + SIMA pipeline processes the products as before
**Changes required:**
- New parser: `packages/modules/trade/src/altana-parser.ts`
- New column config: `packages/modules/trade/src/config/altana-columns.ts`
- Format detection logic in the upload handler (sniff column headers to auto-detect)
- Extended `NormalizedProduct` interface to include new attributes (`intendedUse`, `componentOf`, etc.)
**Pros:**
- Reuses existing job queue, progress tracking, deduplication, and outcome persistence
- Minimal API surface change
- Single upload path for users
**Cons:**
- Altana's 300-400 column format may stress the XLSX parser memory for large files
- Need robust format auto-detection
### Option B: Dedicated Altana Adapter
Create a separate ingestion endpoint and handler for Altana data.
**How it works:**
1. New endpoint: `POST /api/altana/upload` (or `POST /api/catalog/upload` with source param)
2. Dedicated handler: `apps/api/src/workers/handlers/altana-upload.ts`
3. Parser optimized for flat CSV (streaming, lower memory footprint)
4. Writes to a staging table first, then merges with existing SIMA results
**Pros:**
- Clean separation of concerns
- Can optimize for CSV streaming (important for 1.2M SKU files)
- No risk of breaking existing Wayfair upload path
**Cons:**
- More code to maintain
- Separate API surface to document and test
- Need to handle data merging (Altana enrichment + existing Wayfair outcomes)
### Option C: Hybrid -- Enrichment Layer
Treat Altana as an **enrichment source** rather than a primary upload.
**How it works:**
1. Wayfair Excel remains the primary upload (SKU identity, HS, basic attributes)
2. Altana data is uploaded separately as an enrichment pass
3. Enrichment updates existing `SkuAttribute` records with new columns
4. SIMA reprocessing triggered for enriched SKUs
**Pros:**
- Cleanest data model (primary source + enrichment layers)
- Supports multiple enrichment sources (Altana, Zonos, manual)
- Aligns with the `classification_source` field in the Phase 1.5 spec
**Cons:**
- Two-step process for users (upload Wayfair, then upload Altana)
- More complex orchestration
### Recommendation
**Start with Option A** unless the Altana format proves incompatible with the existing pipeline (e.g., file too large for in-memory parsing, or SKU identifiers don't match). If scale is an issue (1.2M SKUs in a single file), pivot to **Option B** with streaming CSV support. If the data model clearly separates primary and enrichment sources, evolve toward **Option C** over time.
---
## 6. Gaps and Risks
### Risks
| Risk | Severity | Mitigation |
|------|----------|------------|
| **Altana may not expose parent-child relationships** | HIGH | True components detection is the primary P6-B deliverable. If Altana doesn't include BOM/kit data, we need a fallback: (a) Wayfair provides a separate parent-child mapping file, or (b) we infer components from product names and category taxonomy. |
| **`intendedUse` may not exist in Altana** | HIGH | Commercial/residential classification for UDS is critical. If Altana doesn't have this, we need: (a) Wayfair-side enrichment, (b) inference from product category/name (e.g., "Office Chair" -> COMMERCIAL), or (c) manual enrichment workflow. |
| **Altana field names may differ from expectations** | MEDIUM | We've mapped likely names but haven't seen actual data. Parser must be configurable (column mapping config, not hardcoded). |
| **SKU identifier mismatch** | MEDIUM | Altana may use a different SKU format (GTIN, UPC, or Altana internal ID) than Wayfair. Need a join key strategy. |
| **File size at scale** | MEDIUM | A 1.2M SKU Altana export with 400 columns could be 1-2 GB. The current in-memory XLSX parser won't handle this. Need streaming CSV support for production use. |
| **Multiple HS code suggestions** | LOW | Altana may provide multiple HS codes with confidence scores. Current pipeline expects a single HS code. Need to decide: use highest confidence, or store all and let the user choose? |
| **Data freshness mismatch** | LOW | Altana data may be updated on a different cadence than Wayfair uploads. Need to handle stale enrichment gracefully. |
### Gaps
| Gap | Impact | Resolution Path |
|-----|--------|-----------------|
| No sample Altana file available | Blocks all parser implementation | Waiting on Wayfair to provide sample |
| No confirmed field names | Cannot build column mappings | Resolved when sample arrives |
| No confirmed delivery mechanism | Cannot design automation | Ask Wayfair about file delivery (SFTP, API, manual) |
| No parent-child relationship confirmation | Cannot design true components ingestion | Ask Wayfair specifically about BOM/kit data in Altana |
| UDS commercial/residential logic not yet implemented | Blocked on `intendedUse` attribute availability | Implement the logic once we confirm the attribute exists |
| No `parsers/` directory exists in trade module | Minor -- parsers currently live at module root | Will organize into `parsers/` when adding Altana parser |
---
## 7. Next Steps (When Sample Data Arrives)
### Immediate (Day 1 with sample)
1. **Inspect the file format** -- CSV vs. Excel, delimiter, encoding, row count, column count
2. **Identify the SKU column** -- Confirm it matches Wayfair SKU format (join key)
3. **Map actual field names** to RGL8R attributes (update the mapping table in Section 3)
4. **Confirm presence of critical fields:**
- Parent-child / BOM / kit relationship data
- Intended use / market segment / application type
- Material composition detail
- HS code(s) with confidence
### Short-Term (Week 1)
5. **Build Altana parser prototype** -- `packages/modules/trade/src/altana-parser.ts`
6. **Test with existing pipeline** -- Feed parsed Altana data through `validateProductAttributes` and `getSIMAOutcome`
7. **Measure delta** -- Compare SIMA outcomes between Wayfair-only vs. Wayfair + Altana enrichment
8. **Evaluate Option A vs. B** -- Does the file fit in memory? Does the SKU join work?
### Medium-Term (Week 2-3)
9. **Implement true components ingestion** -- If parent-child data exists, build the component relationship parser
10. **Implement commercial/residential logic** -- If `intendedUse` data exists, add UDS classification logic
11. **Extend `NormalizedProduct` interface** -- Add new attribute fields
12. **Update validation pipeline** -- Add new attribute validators
13. **Write integration tests** -- Using anonymized sample data
---
## 8. Go/No-Go Criteria
### Go Criteria (All Must Be True)
Before proceeding with true components and commercial/residential implementation:
| # | Criterion | How to Verify |
|---|-----------|---------------|
| 1 | **Sample Altana file received** from Wayfair | File exists and is parseable |
| 2 | **SKU join key confirmed** -- Altana SKUs match Wayfair SKUs | Run a join between sample Altana and existing Wayfair upload; >90% match rate |
| 3 | **Parent-child data exists** in Altana export | At least one column contains BOM/kit/set relationship data |
| 4 | **Intended use data exists** (or a viable proxy) | A column indicates commercial vs. residential use, or product category taxonomy allows inference |
| 5 | **Material composition is richer** than current Wayfair data | Altana provides more granular material classification than the Schema Tag attributes |
| 6 | **File can be ingested** through existing or new pipeline | Parser prototype successfully processes the sample file |
### No-Go Conditions (Any Triggers Pivot)
| Condition | Pivot Action |
|-----------|-------------|
| Altana has no parent-child data | Request separate BOM file from Wayfair, or implement name-based component inference |
| Altana has no intended use data | Implement category-based inference rules (e.g., "Office" category -> COMMERCIAL) |
| SKU match rate below 80% | Investigate ID mapping; may need Wayfair to provide a crosswalk file |
| File too large for single upload | Implement streaming CSV parser (Option B) |
| Wayfair cannot provide sample within 2 weeks | Proceed with name/category-based heuristics for components and intended use; retrofit Altana later |
---
## References
- `packages/modules/trade/src/sima.ts` -- Current SIMA validation logic
- `packages/modules/trade/src/config/sima-measures.ts` -- 8 active CBSA SIMA measures
- `packages/modules/trade/src/wayfair-parser.ts` -- Current Wayfair Excel parser
- `packages/modules/trade/src/config/wayfair-columns.ts` -- Current column mappings
- `packages/modules/trade/src/validation.ts` -- Attribute validation (country, material, HS)
- `apps/api/src/workers/handlers/wayfair-upload.ts` -- Upload job handler
- `specs/rgl8r-phase1.5-wayfair-addendum.md` -- Phase 1.5 Wayfair requirements (true components, attribute completeness)
- `docs/proposals/data-foundation-strategy.md` -- Data platform roadmap