Skip to Content
InternalDocsSpikesWayfair Altana Ingestion Spike

Wayfair Altana Ingestion Spike

Source: docs/spikes/wayfair-altana-ingestion-spike.md

# Spike: Wayfair Altana Ingestion Format (P6-B-SPIKE) **Status:** Open (awaiting sample data from Wayfair) **Date:** 2026-02-11 **Author:** Dan + Claude --- ## 1. Summary P6-B requires implementing two capabilities for SIMA compliance that depend on richer attribute data than what the current Wayfair Excel upload provides: 1. **True components detection** -- Identifying when a parent SKU (e.g., "Bedroom Set") contains a component subject to SIMA measures (e.g., a mattress from China). This requires parent-child relationship data that does not exist in the current Wayfair Excel format. 2. **Commercial/residential attribute logic** -- The UDS (Upholstered Domestic Seating) SIMA measure only applies to *domestic* seating. If a product is classified as commercial use, it can be cleared. This requires an `intendedUse` attribute. Wayfair has an **Altana dataset** that provides 300-400 attributes per SKU, which likely contains the data needed for both capabilities. This spike documents what we know, what we need, and provides a framework for ingesting the Altana data when a sample arrives. --- ## 2. What We Know ### Current SIMA Validation Pipeline The existing pipeline works as follows: 1. **Upload:** Wayfair Excel file uploaded via `POST /api/upload`, processed by `processWayfairUpload` handler (`apps/api/src/workers/handlers/wayfair-upload.ts`). 2. **Parse:** `parseWayfairExcelFromBuffer` (`packages/modules/trade/src/wayfair-parser.ts`) parses a multi-row-per-SKU Excel format. Each row contains one attribute (Schema Tag pattern). Column mappings are defined in `packages/modules/trade/src/config/wayfair-columns.ts`. 3. **Validate:** `validateProductAttributes` (`packages/modules/trade/src/validation.ts`) validates three key attributes: - `countryOfOrigin` -- Normalized to ISO 2-letter code - `material` -- Classified into categories (e.g., "upholstered", "foam", "stainless_steel") - `hsCode` -- Validated against known SIMA-relevant HS prefixes with misclassification detection 4. **SIMA Check:** `validateSIMAExposure` / `getSIMAOutcome` (`packages/modules/trade/src/sima.ts`) determines exposure against 8 active CBSA SIMA measures defined in `packages/modules/trade/src/config/sima-measures.ts`. ### Current Attributes Used From `wayfair-columns.ts`, the parser currently extracts: | Attribute | Source Column(s) | Used For | |-----------|-----------------|----------| | `countryOfOrigin` | "Country of Origin", "Country of Origin - Additional Details" | SIMA origin check | | `material` | "Material", "Primary Material", plus 10 other material columns | SIMA material scope check | | `hsCode` | "Wayfair HS codes" | SIMA HS code coverage | | `consensusHsCode` | "Consensus HS Code" | HS confidence cross-reference | | `topPickHsCode` | "Top Pick Code" | HS confidence cross-reference | | `className` | "Class_Name" | Product categorization | | `name` | "Product_Name" | Keyword analysis for HS validation | ### What Is Missing The current pipeline has no concept of: - **Parent-child SKU relationships** -- Whether a SKU is a kit/set containing sub-components - **Intended use** -- Whether a product is for commercial or residential use - **Component inclusion status** -- Whether a component is included in a set or sold separately - **Component-level HS/origin/material** -- Per-component trade attributes for kits These are the exact gaps that the Altana dataset is expected to fill. ### About Altana Altana is a supply chain intelligence platform that enriches product data with trade compliance attributes. Their typical export format includes: - Product identity (SKU, GTIN, name) - Detailed material composition (often multi-level) - Country of origin with supply chain provenance - HS code suggestions with confidence scores - Product taxonomy/categorization - Supplier and manufacturer data - Bill of materials / component relationships (when available) Typical export: CSV with one row per SKU, 300-400+ columns. --- ## 3. Required Attribute Mapping The table below maps RGL8R attribute needs to likely Altana field names. **These are educated guesses** -- actual field names will be confirmed when we receive a sample file. | RGL8R Attribute | Purpose | Likely Altana Field(s) | Confidence | Notes | |----------------|---------|----------------------|------------|-------| | `componentOf` / `parentSkuId` | True components detection | `parent_sku`, `kit_id`, `bom_parent`, `set_id` | LOW | Altana may not expose this directly -- could be a Wayfair-side enrichment | | `intendedUse` | UDS commercial/residential classification | `intended_use`, `market_segment`, `product_use`, `application_type` | MEDIUM | Critical for UDS measure -- "COMMERCIAL" clears the SKU | | `materialComposition` | Material-based SIMA scope (multi-material) | `material_composition`, `primary_material`, `material_detail` | HIGH | Altana typically has detailed material data | | `countryOfOrigin` | SIMA origin check | `country_of_origin`, `coo`, `manufacturing_country` | HIGH | Already used; Altana may provide higher confidence | | `hsCode` | SIMA HS code coverage | `hs_code`, `hts_code`, `tariff_code`, `hs_6`, `hs_10` | HIGH | Altana may provide multiple HS suggestions with confidence | | `hsConfidence` | HS confidence scoring | `hs_confidence`, `classification_confidence` | MEDIUM | Could improve our HS validation confidence threshold | | `supplierCountry` | Supply chain provenance | `supplier_country`, `vendor_country`, `ship_from_country` | MEDIUM | Useful for origin validation cross-check | | `productCategory` | Category-specific attribute requirements | `product_type`, `category`, `taxonomy` | HIGH | Enables category-based attribute completeness rules | | `swivelIndicator` | UDS sub-classification | `swivel`, `swivel_type`, `rotation` | LOW | Very Wayfair-specific; may not exist in Altana | | `inclusionStatus` | Component inclusion (INCLUDED/NOT_INCLUDED) | `included_in_set`, `kit_inclusion`, `bundle_component` | LOW | Likely requires Wayfair-side data, not Altana | ### Attribute Priority for P6-B 1. **Must Have:** `countryOfOrigin`, `hsCode`, `materialComposition` (improved versions of existing) 2. **Must Have for True Components:** `componentOf` / `parentSkuId` (or equivalent BOM data) 3. **Must Have for Commercial/Residential:** `intendedUse` 4. **Nice to Have:** `hsConfidence`, `supplierCountry`, `productCategory` --- ## 4. Expected File Format ### Likely Format Altana typically exports as **CSV** (comma-delimited), though Excel exports are also common. Key format characteristics to confirm: | Property | Expected | Need to Confirm | |----------|----------|-----------------| | File type | CSV (.csv) or Excel (.xlsx) | Yes | | Encoding | UTF-8 | Yes | | Delimiter | Comma (CSV) | Yes | | Header row | Row 1 | Yes | | Row structure | One row per SKU | Yes -- critical difference from current Wayfair format | | Column count | 300-400+ | Yes | | SKU identifier | SKU or GTIN column | Yes -- must match Wayfair SKU format | | Null handling | Empty string or "N/A" | Yes | ### Key Format Difference from Current Pipeline The current Wayfair Excel parser (`wayfair-parser.ts`) handles a **multi-row-per-SKU** format where each row is one attribute (the Schema Tag pattern). The Altana export is expected to be a **flat one-row-per-SKU** format with attributes as columns. This is a fundamental structural difference that affects parser design. ### Information Needed from Wayfair Before implementation can begin, we need: 1. **A sample Altana export file** (even a small subset of 10-20 SKUs with all columns) 2. **Column name mapping documentation** (if Wayfair has customized Altana's default column names) 3. **SKU identifier format** -- Does the Altana export use the same SKU values as the existing Wayfair Excel? 4. **Update frequency** -- How often does Wayfair refresh the Altana dataset? (daily, weekly, ad hoc) 5. **Delivery mechanism** -- File drop, API call, or manual upload? 6. **Subset or full catalog** -- Will we receive the full 1.2M SKU catalog or a filtered subset? --- ## 5. Ingestion Path Options ### Option A: Extend Existing Upload Pipeline (Recommended Starting Point) Use the existing `POST /api/upload` endpoint with an Altana-specific parser. **How it works:** 1. User uploads Altana CSV/Excel via the existing upload endpoint 2. Job processor detects the file format (Altana vs. Wayfair Schema Tag) based on column headers 3. New `parseAltanaExport()` parser transforms the flat format into `NormalizedProduct[]` 4. Existing validation + SIMA pipeline processes the products as before **Changes required:** - New parser: `packages/modules/trade/src/altana-parser.ts` - New column config: `packages/modules/trade/src/config/altana-columns.ts` - Format detection logic in the upload handler (sniff column headers to auto-detect) - Extended `NormalizedProduct` interface to include new attributes (`intendedUse`, `componentOf`, etc.) **Pros:** - Reuses existing job queue, progress tracking, deduplication, and outcome persistence - Minimal API surface change - Single upload path for users **Cons:** - Altana's 300-400 column format may stress the XLSX parser memory for large files - Need robust format auto-detection ### Option B: Dedicated Altana Adapter Create a separate ingestion endpoint and handler for Altana data. **How it works:** 1. New endpoint: `POST /api/altana/upload` (or `POST /api/catalog/upload` with source param) 2. Dedicated handler: `apps/api/src/workers/handlers/altana-upload.ts` 3. Parser optimized for flat CSV (streaming, lower memory footprint) 4. Writes to a staging table first, then merges with existing SIMA results **Pros:** - Clean separation of concerns - Can optimize for CSV streaming (important for 1.2M SKU files) - No risk of breaking existing Wayfair upload path **Cons:** - More code to maintain - Separate API surface to document and test - Need to handle data merging (Altana enrichment + existing Wayfair outcomes) ### Option C: Hybrid -- Enrichment Layer Treat Altana as an **enrichment source** rather than a primary upload. **How it works:** 1. Wayfair Excel remains the primary upload (SKU identity, HS, basic attributes) 2. Altana data is uploaded separately as an enrichment pass 3. Enrichment updates existing `SkuAttribute` records with new columns 4. SIMA reprocessing triggered for enriched SKUs **Pros:** - Cleanest data model (primary source + enrichment layers) - Supports multiple enrichment sources (Altana, Zonos, manual) - Aligns with the `classification_source` field in the Phase 1.5 spec **Cons:** - Two-step process for users (upload Wayfair, then upload Altana) - More complex orchestration ### Recommendation **Start with Option A** unless the Altana format proves incompatible with the existing pipeline (e.g., file too large for in-memory parsing, or SKU identifiers don't match). If scale is an issue (1.2M SKUs in a single file), pivot to **Option B** with streaming CSV support. If the data model clearly separates primary and enrichment sources, evolve toward **Option C** over time. --- ## 6. Gaps and Risks ### Risks | Risk | Severity | Mitigation | |------|----------|------------| | **Altana may not expose parent-child relationships** | HIGH | True components detection is the primary P6-B deliverable. If Altana doesn't include BOM/kit data, we need a fallback: (a) Wayfair provides a separate parent-child mapping file, or (b) we infer components from product names and category taxonomy. | | **`intendedUse` may not exist in Altana** | HIGH | Commercial/residential classification for UDS is critical. If Altana doesn't have this, we need: (a) Wayfair-side enrichment, (b) inference from product category/name (e.g., "Office Chair" -> COMMERCIAL), or (c) manual enrichment workflow. | | **Altana field names may differ from expectations** | MEDIUM | We've mapped likely names but haven't seen actual data. Parser must be configurable (column mapping config, not hardcoded). | | **SKU identifier mismatch** | MEDIUM | Altana may use a different SKU format (GTIN, UPC, or Altana internal ID) than Wayfair. Need a join key strategy. | | **File size at scale** | MEDIUM | A 1.2M SKU Altana export with 400 columns could be 1-2 GB. The current in-memory XLSX parser won't handle this. Need streaming CSV support for production use. | | **Multiple HS code suggestions** | LOW | Altana may provide multiple HS codes with confidence scores. Current pipeline expects a single HS code. Need to decide: use highest confidence, or store all and let the user choose? | | **Data freshness mismatch** | LOW | Altana data may be updated on a different cadence than Wayfair uploads. Need to handle stale enrichment gracefully. | ### Gaps | Gap | Impact | Resolution Path | |-----|--------|-----------------| | No sample Altana file available | Blocks all parser implementation | Waiting on Wayfair to provide sample | | No confirmed field names | Cannot build column mappings | Resolved when sample arrives | | No confirmed delivery mechanism | Cannot design automation | Ask Wayfair about file delivery (SFTP, API, manual) | | No parent-child relationship confirmation | Cannot design true components ingestion | Ask Wayfair specifically about BOM/kit data in Altana | | UDS commercial/residential logic not yet implemented | Blocked on `intendedUse` attribute availability | Implement the logic once we confirm the attribute exists | | No `parsers/` directory exists in trade module | Minor -- parsers currently live at module root | Will organize into `parsers/` when adding Altana parser | --- ## 7. Next Steps (When Sample Data Arrives) ### Immediate (Day 1 with sample) 1. **Inspect the file format** -- CSV vs. Excel, delimiter, encoding, row count, column count 2. **Identify the SKU column** -- Confirm it matches Wayfair SKU format (join key) 3. **Map actual field names** to RGL8R attributes (update the mapping table in Section 3) 4. **Confirm presence of critical fields:** - Parent-child / BOM / kit relationship data - Intended use / market segment / application type - Material composition detail - HS code(s) with confidence ### Short-Term (Week 1) 5. **Build Altana parser prototype** -- `packages/modules/trade/src/altana-parser.ts` 6. **Test with existing pipeline** -- Feed parsed Altana data through `validateProductAttributes` and `getSIMAOutcome` 7. **Measure delta** -- Compare SIMA outcomes between Wayfair-only vs. Wayfair + Altana enrichment 8. **Evaluate Option A vs. B** -- Does the file fit in memory? Does the SKU join work? ### Medium-Term (Week 2-3) 9. **Implement true components ingestion** -- If parent-child data exists, build the component relationship parser 10. **Implement commercial/residential logic** -- If `intendedUse` data exists, add UDS classification logic 11. **Extend `NormalizedProduct` interface** -- Add new attribute fields 12. **Update validation pipeline** -- Add new attribute validators 13. **Write integration tests** -- Using anonymized sample data --- ## 8. Go/No-Go Criteria ### Go Criteria (All Must Be True) Before proceeding with true components and commercial/residential implementation: | # | Criterion | How to Verify | |---|-----------|---------------| | 1 | **Sample Altana file received** from Wayfair | File exists and is parseable | | 2 | **SKU join key confirmed** -- Altana SKUs match Wayfair SKUs | Run a join between sample Altana and existing Wayfair upload; >90% match rate | | 3 | **Parent-child data exists** in Altana export | At least one column contains BOM/kit/set relationship data | | 4 | **Intended use data exists** (or a viable proxy) | A column indicates commercial vs. residential use, or product category taxonomy allows inference | | 5 | **Material composition is richer** than current Wayfair data | Altana provides more granular material classification than the Schema Tag attributes | | 6 | **File can be ingested** through existing or new pipeline | Parser prototype successfully processes the sample file | ### No-Go Conditions (Any Triggers Pivot) | Condition | Pivot Action | |-----------|-------------| | Altana has no parent-child data | Request separate BOM file from Wayfair, or implement name-based component inference | | Altana has no intended use data | Implement category-based inference rules (e.g., "Office" category -> COMMERCIAL) | | SKU match rate below 80% | Investigate ID mapping; may need Wayfair to provide a crosswalk file | | File too large for single upload | Implement streaming CSV parser (Option B) | | Wayfair cannot provide sample within 2 weeks | Proceed with name/category-based heuristics for components and intended use; retrofit Altana later | --- ## References - `packages/modules/trade/src/sima.ts` -- Current SIMA validation logic - `packages/modules/trade/src/config/sima-measures.ts` -- 8 active CBSA SIMA measures - `packages/modules/trade/src/wayfair-parser.ts` -- Current Wayfair Excel parser - `packages/modules/trade/src/config/wayfair-columns.ts` -- Current column mappings - `packages/modules/trade/src/validation.ts` -- Attribute validation (country, material, HS) - `apps/api/src/workers/handlers/wayfair-upload.ts` -- Upload job handler - `specs/rgl8r-phase1.5-wayfair-addendum.md` -- Phase 1.5 Wayfair requirements (true components, attribute completeness) - `docs/proposals/data-foundation-strategy.md` -- Data platform roadmap