# SHIP CSV Ingestion Performance Benchmark
## Purpose
Compare CSV parsing performance between `main` (sync/in-memory parser) and PR #172 (`codex/p2a-c-ingestion-streaming`, streaming parser). The streaming parser (`parseShipmentCSVFromFile`) reads from disk row-by-row instead of loading the entire file into memory, targeting reduced peak memory usage at scale.
## What Changed (PR #172)
- Added `parseShipmentCSVFromFile()` using `csv-parse` async iterator (stream mode)
- Refactored `processShipUploadJob` in `ingest.ts` to call the file-based parser directly (no `readFile` call)
- Extracted shared `parseShipmentRow()` helper to keep behavior parity between sync and stream parsers
## Test Methodology
### Parser-Only Benchmark
The benchmark measures the CSV parser in isolation (no DB writes, API server, or detection logic), so the numbers reflect parsing and memory-allocation behavior only.
1. **Data generation:** `scripts/generate-ship-csv.ts` creates realistic CSV files with configurable row counts. Columns match the expected SHIP CSV format. ~2% of rows contain intentional validation errors (missing fields, bad dates) to exercise error paths.
2. **Measurement approach:**
- **Wall time:** `performance.now()` around parser call
- **Heap delta:** `process.memoryUsage().heapUsed` before/after parsing
- **Peak RSS:** `/usr/bin/time -l` (macOS) or `/usr/bin/time -v` (Linux)
- **Throughput:** rows parsed / wall time
3. **Branch comparison:** The script checks out each branch, installs dependencies, and runs the parser. On `main`, it uses `parseShipmentCSV` (sync, full-file read). On the streaming branch, it uses `parseShipmentCSVFromFile` (stream from file).
4. **Repetition:** Each configuration runs 3 times (configurable via `RUNS` env var). Median values are reported.
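The wall-time, heap-delta, and median steps above can be sketched as follows. This is a simplified harness, not the actual benchmark script; `parserFn` stands in for either `parseShipmentCSV` or `parseShipmentCSVFromFile`:

```typescript
import { performance } from "node:perf_hooks";

// One benchmark sample, matching the measurement approach above.
interface Sample { wallMs: number; heapDeltaMB: number; rows: number }

// Wall time via performance.now(), heap delta via process.memoryUsage().
async function measure(
  parserFn: () => Promise<{ shipments: unknown[] }>
): Promise<Sample> {
  (globalThis as any).gc?.(); // stabilize heap if run with --expose-gc
  const heapBefore = process.memoryUsage().heapUsed;
  const t0 = performance.now();
  const result = await parserFn();
  const wallMs = performance.now() - t0;
  const heapDeltaMB =
    (process.memoryUsage().heapUsed - heapBefore) / 1024 / 1024;
  return { wallMs, heapDeltaMB, rows: result.shipments.length };
}

// Median across repeated runs, as reported in the summary tables.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}
```

Peak RSS is not visible from inside the process, which is why the methodology shells out to `/usr/bin/time` for that metric.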
### Running the Benchmark
```bash
# Default: 100k + 1M rows, 3 runs each, main vs streaming branch
./scripts/bench-ship-ingest.sh
# Custom row counts
ROW_COUNTS="100000 500000 1000000" ./scripts/bench-ship-ingest.sh
# Custom branches
BRANCHES="main my-feature-branch" ./scripts/bench-ship-ingest.sh
# More runs for statistical significance
RUNS=5 ./scripts/bench-ship-ingest.sh
```
### Generating Test Data Only
```bash
# 100k rows
npx tsx scripts/generate-ship-csv.ts 100000 /tmp/ship-100k.csv
# 1M rows
npx tsx scripts/generate-ship-csv.ts 1000000 /tmp/ship-1m.csv
# Custom error rate
ERROR_RATE=0.05 npx tsx scripts/generate-ship-csv.ts 100000 /tmp/ship-100k-5pct-errors.csv
```
## Final Validation Results (2026-02-13)
This is the final P2A-C done-metric validation run after PR #185 merged. It validates current `main` at 1M rows.
Command:
```bash
BRANCHES="main" ROW_COUNTS="1000000" RUNS=3 ./scripts/bench-ship-ingest.sh
```
### Environment
| Property | Value |
|----------|-------|
| Machine | arm64 MacBook (Darwin 25.2.0) |
| CPU | Apple M4 |
| RAM | 24 GB |
| Node.js | v24.10.0 |
| OS | macOS (Darwin Kernel 25.2.0) |
| Date | 2026-02-13 20:39 UTC |
### Raw Runs (1M rows, `main`)
| Run | Wall Time (ms) | Heap Delta (MB) | Peak RSS (MB) | Rows Parsed | Invalid Rows | Rows/sec |
|-----|----------------|-----------------|---------------|-------------|--------------|----------|
| 1 | 6,028 | 1,652.3 | 1,526.9 | 979,991 | 20,009 | 162,573 |
| 2 | 5,930 | 1,677.4 | 1,721.4 | 979,991 | 20,009 | 165,259 |
| 3 | 5,898 | 1,666.3 | 1,567.2 | 979,991 | 20,009 | 166,156 |
### Median Summary (3 runs)
| Branch | Row Count | Median Wall (ms) | Median Heap Delta (MB) | Median Peak RSS (MB) | Median Rows/sec |
|--------|-----------|------------------|------------------------|----------------------|-----------------|
| `main` | 1,000,000 | 5,930 | 1,666.3 | 1,567.2 | 165,259 |
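As a sanity check, the `Rows/sec` column is just rows parsed divided by wall time. The table values are consistent with truncating to a whole number, though truncation vs. rounding is an assumption here:

```typescript
// Throughput as reported: rows parsed per second of wall time,
// truncated to a whole number (assumed rounding mode).
function rowsPerSec(rowsParsed: number, wallMs: number): number {
  return Math.floor(rowsParsed / (wallMs / 1000));
}
```

For example, the median run gives `rowsPerSec(979991, 5930)` = 165,259, matching the summary row.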
### Notes
- This run is parser-only (no DB writes, no API server, no detection pipeline), matching the benchmark method.
- `Invalid Rows` is the parser's skipped/error row count for the run.
- CSV had ~2% intentional invalid rows; parsed row count and error count were stable across runs.
- Prior branch-to-branch comparison work was captured during PR #174; this final run is the closeout validation on merged `main`.
## Key Metrics to Watch
| Metric | Why It Matters |
|--------|---------------|
| **Peak RSS at 1M rows** | Main goal of streaming: avoid loading entire file into memory |
| **Wall time parity** | Streaming should not be significantly slower |
| **Rows/sec consistency** | Throughput should stay roughly constant as row count grows (i.e. wall time scales linearly) |
| **Error count parity** | Both parsers should report identical error counts |
## Caveats
- This benchmarks the parser only. Full ingestion (DB upserts, detection, rollup) adds significant time.
- Peak RSS includes Node.js runtime + tsx transpilation overhead (~50-80 MB baseline).
- The streaming parser still accumulates all `ParsedShipment` objects in memory (it does not stream them to DB). The memory savings come from not loading the raw CSV string into memory.
- For true streaming-to-DB, a future iteration would need to yield batches from the parser and write them incrementally.
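The batched approach that last caveat describes could look like the sketch below: an async generator that yields fixed-size batches off the row stream, so the caller can write each batch before the next is materialized, capping memory at roughly one batch. This is a hypothetical future shape, not code from the PR:

```typescript
// Yield batches of rows as they stream off disk. `rows` would be the
// async-iterable row stream inside parseShipmentCSVFromFile; the
// batch size and any writeBatch() consumer are hypothetical.
async function* parseInBatches<T>(
  rows: AsyncIterable<T>,
  batchSize = 1000
): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const row of rows) {
    batch.push(row);
    if (batch.length >= batchSize) {
      yield batch;
      batch = []; // previous batch is now eligible for GC
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}
```

The caller would `for await` over the generator and upsert each batch, which also makes incremental progress reporting and partial-failure handling natural.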
## Files
| File | Purpose |
|------|---------|
| `scripts/generate-ship-csv.ts` | CSV test data generator |
| `scripts/bench-ship-ingest.sh` | Benchmark orchestrator |
| `docs/benchmarks/ship-ingest-perf.md` | This file (results template) |