Upload Pipeline & Chunking
The pipeline streams and chunks uploads so that large files (e.g., 1M rows) are never loaded whole and cannot exhaust the Node.js heap.
Upload Flow
Upload API Idempotency & Hashing
- The API Gateway generates a SHA-256 checksum during upload initialization.
- The `fileHash` is stored in PostgreSQL to detect duplicate uploads natively and to support integrity verification for audit workflows.
- All orchestration APIs require an `Idempotency-Key` header (24h Redis TTL).
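The hashing and idempotency checks above could look roughly like the following sketch. It assumes the checksum is computed incrementally with Node's built-in `crypto` module. `hashChunks`, `IdempotencyStore`, and the in-memory `Map` standing in for Redis are hypothetical names for illustration, not the pipeline's actual code.

```typescript
import { createHash } from "node:crypto";

// Hash an upload incrementally so the whole file never sits in memory.
// `hashChunks` is a hypothetical helper name, not part of the pipeline's API.
function hashChunks(chunks: Iterable<Uint8Array | string>): string {
  const hash = createHash("sha256");
  for (const chunk of chunks) hash.update(chunk);
  return hash.digest("hex");
}

const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // 24h, matching the Redis TTL

// In-memory stand-in for Redis (assumption: production would use an atomic
// set-if-absent with expiry rather than a Map).
class IdempotencyStore {
  private seen = new Map<string, number>(); // key -> expiry timestamp in ms

  // Returns true the first time a key is seen within the TTL, false on a replay.
  claim(key: string, now: number = Date.now()): boolean {
    const expiresAt = this.seen.get(key);
    if (expiresAt !== undefined && expiresAt > now) return false;
    this.seen.set(key, now + IDEMPOTENCY_TTL_MS);
    return true;
  }
}
```

A handler would call `claim` with the `Idempotency-Key` header value and replay the stored response when it returns `false`. Comparing the resulting digest against the stored `fileHash` values is what flags a duplicate upload.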
Contact Upload Queue Isolation
The API Gateway must NEVER parse large uploads directly. The flow guarantees isolation:
Upload API -> Malware Scan Worker -> Upload Queue (Chunker) -> Validation Worker -> Persistence Worker
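One way to picture the isolation rule is that the gateway only hands a reference to the next stage and never touches the file contents. The sketch below is illustrative only: the stage identifiers, the `UploadJob` shape (including `storageKey`, which assumes the raw file is stored somewhere the workers can read), and the `enqueue` callback are all hypothetical.

```typescript
// Stage names mirror the flow above; the identifiers themselves are illustrative.
const STAGES = [
  "upload-api",
  "malware-scan",
  "upload-queue",
  "validation",
  "persistence",
] as const;
type Stage = (typeof STAGES)[number];

// What the gateway hands off: a pointer to the stored file, never its bytes.
interface UploadJob {
  jobId: string;
  storageKey: string; // hypothetical location of the raw upload
  fileHash: string;
}

// Returns the stage that follows `stage`, or null after the last one.
function nextStage(stage: Stage): Stage | null {
  return STAGES[STAGES.indexOf(stage) + 1] ?? null;
}

// Gateway side: enqueue a reference for the next worker and return immediately.
function acceptUpload(
  job: UploadJob,
  enqueue: (stage: Stage, job: UploadJob) => void,
): void {
  const first = nextStage("upload-api");
  if (first !== null) enqueue(first, job);
}
```

Because `acceptUpload` receives only metadata, parsing cost stays entirely on the worker side regardless of file size.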
Chunking & Parsing Flow
Import Cancellation Support
Users can safely cancel an import that is queued or actively processing.
- Stop Future Chunks: Upload workers check if the job is cancelled before pushing more chunks.
- Preserve Valid Data: Any valid contacts already successfully persisted to the DB remain intact.
- Queue Cleanup: Removes the remaining `process-chunk` jobs for the specific `jobId` from BullMQ, preventing orphaned chunks from being processed.
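The three cancellation steps above can be sketched with a small in-memory model. This is not BullMQ code: `ImportCoordinator` and its methods are hypothetical, and the array stands in for the `process-chunk` queue, so treat it as an outline of the logic under those assumptions.

```typescript
interface ChunkJob {
  jobId: string;
  chunkIndex: number;
}

// In-memory stand-in for the `process-chunk` queue (assumption for illustration).
class ImportCoordinator {
  private cancelled = new Set<string>();
  private pending: ChunkJob[] = [];

  // Stop Future Chunks: the producer refuses to enqueue once the import is cancelled.
  enqueueChunk(job: ChunkJob): boolean {
    if (this.cancelled.has(job.jobId)) return false;
    this.pending.push(job);
    return true;
  }

  // Queue Cleanup: mark the import cancelled and drop its queued chunks.
  // Preserve Valid Data: nothing here touches rows that were already persisted.
  // Returns the number of queued chunks that were removed.
  cancel(jobId: string): number {
    this.cancelled.add(jobId);
    const before = this.pending.length;
    this.pending = this.pending.filter((job) => job.jobId !== jobId);
    return before - this.pending.length;
  }

  // Number of chunks still queued for a given import.
  pendingFor(jobId: string): number {
    return this.pending.filter((job) => job.jobId === jobId).length;
  }
}
```

Setting the cancelled flag before removing queued jobs matters: it closes the window in which a worker could push a new chunk between the cleanup and the flag update.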
Parsing Strategies
- CSV: Uses Node.js `fs.createReadStream` piped into `csv-parser`.
- XLSX: Uses a streaming XLSX parser (like `xlsx` or `exceljs` streaming API) to avoid buffering the entire workbook in memory.
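Whichever parser produces the rows, the chunking step can stay parser-agnostic. The following is a minimal sketch under that assumption: `RowBatcher` and `chunkRows` are hypothetical names, and the chunk size is a free parameter because the source does not state one.

```typescript
// Collects parsed rows into fixed-size chunks so only one chunk is held in memory.
// `RowBatcher` and `chunkRows` are hypothetical names for illustration.
class RowBatcher<T> {
  private buffer: T[] = [];
  private readonly size: number;

  constructor(size: number) {
    if (!Number.isInteger(size) || size < 1) {
      throw new RangeError("size must be a positive integer");
    }
    this.size = size;
  }

  // Add one row; returns a full chunk once the buffer reaches `size`, else null.
  push(row: T): T[] | null {
    this.buffer.push(row);
    if (this.buffer.length < this.size) return null;
    return this.drain();
  }

  // Return the trailing partial chunk at end of file, or null if nothing is buffered.
  flush(): T[] | null {
    return this.buffer.length > 0 ? this.drain() : null;
  }

  private drain(): T[] {
    const chunk = this.buffer;
    this.buffer = [];
    return chunk;
  }
}

// Works with any async row source. For CSV that would be something like
// fs.createReadStream(path).pipe(csv()) with csv-parser, since Node readable
// streams are async iterable. Awaiting onChunk holds back the source until
// the chunk has been handed off.
async function chunkRows<T>(
  rows: AsyncIterable<T>,
  size: number,
  onChunk: (chunk: T[], index: number) => Promise<void>,
): Promise<number> {
  const batcher = new RowBatcher<T>(size);
  let index = 0;
  for await (const row of rows) {
    const chunk = batcher.push(row);
    if (chunk) await onChunk(chunk, index++);
  }
  const tail = batcher.flush();
  if (tail) await onChunk(tail, index++);
  return index; // number of chunks emitted
}
```

In this shape, `onChunk` is where a chunk job would be queued, and it is also the natural place to check for cancellation before pushing more work.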