# SHIP CSV Ingestion Performance Benchmark
## Purpose
Compare CSV parsing performance between `main` (sync/in-memory parser) and PR #172 (`codex/p2a-c-ingestion-streaming`, streaming parser). The streaming parser (`parseShipmentCSVFromFile`) reads from disk row-by-row instead of loading the entire file into memory, targeting reduced peak memory usage at scale.
## What Changed (PR #172)
- Added `parseShipmentCSVFromFile()` using `csv-parse` async iterator (stream mode)
- Refactored `processShipUploadJob` in `ingest.ts` to call the file-based parser directly (no `readFile` call)
- Extracted shared `parseShipmentRow()` helper to keep behavior parity between sync and stream parsers
## Test Methodology
### Parser-Only Benchmark
The benchmark measures the CSV parser in isolation (no DB writes, API server, or detection logic), so the numbers reflect parsing and memory-allocation behavior only.
1. **Data generation:** `scripts/generate-ship-csv.ts` creates realistic CSV files with configurable row counts. Columns match the expected SHIP CSV format. ~2% of rows contain intentional validation errors (missing fields, bad dates) to exercise error paths.
2. **Measurement approach:**
- **Wall time:** `performance.now()` around parser call
- **Heap delta:** `process.memoryUsage().heapUsed` before/after parsing
- **Peak RSS:** `/usr/bin/time -l` (macOS) or `/usr/bin/time -v` (Linux)
- **Throughput:** rows parsed / wall time
3. **Branch comparison:** The script checks out each branch, installs dependencies, and runs the parser. On `main`, it uses `parseShipmentCSV` (sync, full-file read). On the streaming branch, it uses `parseShipmentCSVFromFile` (stream from file).
4. **Repetition:** Each configuration runs 3 times (configurable via `RUNS` env var). Median values are reported.
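The wall-time, heap-delta, and median steps above can be sketched as follows. This is a simplified harness, not the actual benchmark script; `parserFn` stands in for either `parseShipmentCSV` or `parseShipmentCSVFromFile`:

```typescript
import { performance } from "node:perf_hooks";

// One benchmark sample, matching the measurement approach above.
interface Sample { wallMs: number; heapDeltaMB: number; rows: number }

// Wall time via performance.now(), heap delta via process.memoryUsage().
async function measure(
  parserFn: () => Promise<{ shipments: unknown[] }>
): Promise<Sample> {
  (globalThis as any).gc?.(); // stabilize heap if run with --expose-gc
  const heapBefore = process.memoryUsage().heapUsed;
  const t0 = performance.now();
  const result = await parserFn();
  const wallMs = performance.now() - t0;
  const heapDeltaMB =
    (process.memoryUsage().heapUsed - heapBefore) / 1024 / 1024;
  return { wallMs, heapDeltaMB, rows: result.shipments.length };
}

// Median across repeated runs, as reported in the summary tables.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}
```

Peak RSS is not visible from inside the process, which is why the methodology shells out to `/usr/bin/time` for that metric.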
### Running the Benchmark
```bash
# Default: 100k + 1M rows, 3 runs each, main vs streaming branch
./scripts/bench-ship-ingest.sh
# Custom row counts
ROW_COUNTS="100000 500000 1000000" ./scripts/bench-ship-ingest.sh
# Custom branches
BRANCHES="main my-feature-branch" ./scripts/bench-ship-ingest.sh
# More runs for statistical significance
RUNS=5 ./scripts/bench-ship-ingest.sh
```
### Generating Test Data Only
```bash
# 100k rows
npx tsx scripts/generate-ship-csv.ts 100000 /tmp/ship-100k.csv
# 1M rows
npx tsx scripts/generate-ship-csv.ts 1000000 /tmp/ship-1m.csv
# Custom error rate
ERROR_RATE=0.05 npx tsx scripts/generate-ship-csv.ts 100000 /tmp/ship-100k-5pct-errors.csv
```
## Final Validation Results (2026-02-13)
This is the final P2A-C done-metric validation run after PR #185 merged. It validates current `main` at 1M rows.
Command:
```bash
BRANCHES="main" ROW_COUNTS="1000000" RUNS=3 ./scripts/bench-ship-ingest.sh
```
### Environment
| Property | Value |
|----------|-------|
| Machine | arm64 MacBook (Darwin 25.2.0) |
| CPU | Apple M4 |
| RAM | 24 GB |
| Node.js | v24.10.0 |
| OS | macOS (Darwin Kernel 25.2.0) |
| Date | 2026-02-13 20:39 UTC |
### Raw Runs (1M rows, `main`)
| Run | Wall Time (ms) | Heap Delta (MB) | Peak RSS (MB) | Rows Parsed | Invalid Rows | Rows/sec |
|-----|----------------|-----------------|---------------|-------------|--------------|----------|
| 1 | 6,028 | 1,652.3 | 1,526.9 | 979,991 | 20,009 | 162,573 |
| 2 | 5,930 | 1,677.4 | 1,721.4 | 979,991 | 20,009 | 165,259 |
| 3 | 5,898 | 1,666.3 | 1,567.2 | 979,991 | 20,009 | 166,156 |
### Median Summary (3 runs)
| Branch | Row Count | Median Wall (ms) | Median Heap Delta (MB) | Median Peak RSS (MB) | Median Rows/sec |
|--------|-----------|------------------|------------------------|----------------------|-----------------|
| `main` | 1,000,000 | 5,930 | 1,666.3 | 1,567.2 | 165,259 |
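As a sanity check, the `Rows/sec` column is just rows parsed divided by wall time. The table values are consistent with truncating to a whole number, though truncation vs. rounding is an assumption here:

```typescript
// Throughput as reported: rows parsed per second of wall time,
// truncated to a whole number (assumed rounding mode).
function rowsPerSec(rowsParsed: number, wallMs: number): number {
  return Math.floor(rowsParsed / (wallMs / 1000));
}
```

For example, the median run gives `rowsPerSec(979991, 5930)` = 165,259, matching the summary row.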
### Notes
- This run is parser-only (no DB writes, no API server, no detection pipeline), matching the benchmark method.
- `Invalid Rows` is the parser's skipped/error row count for the run.
- CSV had ~2% intentional invalid rows; parsed row count and error count were stable across runs.
- Prior branch-to-branch comparison work was captured during PR #174; this final run is the closeout validation on merged `main`.
## Key Metrics to Watch
| Metric | Why It Matters |
|--------|---------------|
| **Peak RSS at 1M rows** | Main goal of streaming: avoid loading entire file into memory |
| **Wall time parity** | Streaming should not be significantly slower |
| **Rows/sec consistency** | Throughput should stay roughly constant as row count grows (i.e. wall time scales linearly) |
| **Error count parity** | Both parsers should report identical error counts |
## Caveats
- This benchmarks the parser only. Full ingestion (DB upserts, detection, rollup) adds significant time.
- Peak RSS includes Node.js runtime + tsx transpilation overhead (~50-80 MB baseline).
- The streaming parser still accumulates all `ParsedShipment` objects in memory (it does not stream them to DB). The memory savings come from not loading the raw CSV string into memory.
- For true streaming-to-DB, a future iteration would need to yield batches from the parser and write them incrementally.
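The batched approach that last caveat describes could look like the sketch below: an async generator that yields fixed-size batches off the row stream, so the caller can write each batch before the next is materialized, capping memory at roughly one batch. This is a hypothetical future shape, not code from the PR:

```typescript
// Yield batches of rows as they stream off disk. `rows` would be the
// async-iterable row stream inside parseShipmentCSVFromFile; the
// batch size and any writeBatch() consumer are hypothetical.
async function* parseInBatches<T>(
  rows: AsyncIterable<T>,
  batchSize = 1000
): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const row of rows) {
    batch.push(row);
    if (batch.length >= batchSize) {
      yield batch;
      batch = []; // previous batch is now eligible for GC
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}
```

The caller would `for await` over the generator and upsert each batch, which also makes incremental progress reporting and partial-failure handling natural.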
## Files
| File | Purpose |
|------|---------|
| `scripts/generate-ship-csv.ts` | CSV test data generator |
| `scripts/bench-ship-ingest.sh` | Benchmark orchestrator |
| `docs/benchmarks/ship-ingest-perf.md` | This file (results template) |