
Upload Pipeline & Chunking

The pipeline guarantees that large files (e.g., 1M rows) never crash the Node.js memory heap: files are streamed and processed in chunks rather than loaded into memory whole.

Upload Flow

Upload API Idempotency & Hashing

  • The API Gateway computes a SHA-256 checksum of the file during upload initialization.
  • The resulting fileHash is stored in PostgreSQL, so duplicate uploads can be detected directly in the database and file integrity can be verified in audit workflows.
  • All orchestration APIs require an Idempotency-Key header; keys are kept in Redis with a 24-hour TTL.
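The hashing and idempotency rules above can be sketched as follows. This is a minimal in-memory illustration, not the pipeline's actual code: sha256OfChunks, UploadRegistry, and IdempotencyStore are hypothetical names, a Map stands in for both the PostgreSQL table (assumed to carry a unique constraint on the file hash) and Redis, and a real Redis implementation should claim the key atomically (for example with SET key value NX EX 86400).

```typescript
import { createHash } from "crypto";

// Hash the upload incrementally as its chunks arrive, so the whole file
// never has to be buffered just to compute the checksum.
function sha256OfChunks(chunks: Uint8Array[]): string {
  const hash = createHash("sha256");
  for (const chunk of chunks) hash.update(chunk);
  return hash.digest("hex");
}

// Hypothetical stand-in for a PostgreSQL table with a UNIQUE constraint on file_hash.
class UploadRegistry {
  private readonly uploadIdByHash = new Map<string, string>();

  // Returns the earlier uploadId when the same content was already uploaded.
  register(fileHash: string, uploadId: string): string | undefined {
    const existing = this.uploadIdByHash.get(fileHash);
    if (existing !== undefined) return existing;
    this.uploadIdByHash.set(fileHash, uploadId);
    return undefined;
  }
}

// 24-hour TTL for Idempotency-Key entries.
const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical in-memory stand-in for the Redis idempotency store.
// A real implementation should claim the key atomically (e.g., SET key value NX EX 86400).
class IdempotencyStore {
  private readonly entries = new Map<string, { response: string; expiresAt: number }>();

  // Replays the stored response for a repeated, unexpired key; otherwise runs the handler once.
  run(key: string, handler: () => string, now: number = Date.now()): string {
    const cached = this.entries.get(key);
    if (cached !== undefined && cached.expiresAt > now) return cached.response;
    const response = handler();
    this.entries.set(key, { response, expiresAt: now + IDEMPOTENCY_TTL_MS });
    return response;
  }
}
```

Hashing the same bytes with different chunk boundaries yields the same fileHash, which is what lets duplicate detection ignore how the upload happened to be split.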

Contact Upload Queue Isolation

The API Gateway must NEVER parse large uploads directly. Instead, the flow isolates each stage:

Upload API -> Malware Scan Worker -> Upload Queue (Chunker) -> Validation Worker -> Persistence Worker
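The isolation idea can be illustrated with a small in-process model in which the Upload API only enqueues a file reference and every stage drains its own queue. This is a hedged sketch, not the real topology: the stage names, UploadJob, fileKey, and IsolatedPipeline are hypothetical, and in production each queue would be a separate BullMQ queue served by its own worker process rather than a synchronous loop.

```typescript
// Hypothetical stage names mirroring the flow above (the Upload API only enqueues).
type Stage = "malware-scan" | "chunk" | "validate" | "persist";

const PIPELINE: Stage[] = ["malware-scan", "chunk", "validate", "persist"];

// A job carries a reference to the stored file, never the file contents.
interface UploadJob {
  jobId: string;
  fileKey: string; // hypothetical object-storage key
}

// A stage worker returns false to stop the job (e.g., the malware scan fails).
type StageWorker = (job: UploadJob) => boolean;

class IsolatedPipeline {
  private readonly queues = new Map<Stage, UploadJob[]>();
  private readonly workers: Record<Stage, StageWorker>;
  readonly log: string[] = [];

  constructor(workers: Record<Stage, StageWorker>) {
    this.workers = workers;
    for (const stage of PIPELINE) this.queues.set(stage, []);
  }

  // What the Upload API does: enqueue a reference and return immediately.
  submit(job: UploadJob): void {
    this.queues.get("malware-scan")!.push(job);
  }

  // Each stage drains only its own queue and forwards surviving jobs to the next one.
  drain(): void {
    PIPELINE.forEach((stage, i) => {
      const queue = this.queues.get(stage)!;
      while (queue.length > 0) {
        const job = queue.shift()!;
        const ok = this.workers[stage](job);
        this.log.push(`${stage}:${job.jobId}:${ok ? "ok" : "stopped"}`);
        const next = PIPELINE[i + 1];
        if (ok && next !== undefined) this.queues.get(next)!.push(job);
      }
    });
  }
}
```

Submitting a clean file (j1) and an infected one (j2), with a scan worker that rejects the infected key, produces the log malware-scan:j1:ok, malware-scan:j2:stopped, chunk:j1:ok, validate:j1:ok, persist:j1:ok. The infected upload never reaches the chunker.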

Chunking & Parsing Flow

Import Cancellation Support

Users can safely cancel an import that is queued or actively being processed.

  • Stop Future Chunks: Upload workers check whether the job has been cancelled before pushing more chunks onto the queue.
  • Preserve Valid Data: Valid contacts that were already persisted to the database remain intact.
  • Queue Cleanup: The remaining process-chunk jobs for that jobId are removed from BullMQ, so orphaned chunks are never processed.
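The three cancellation rules can be modelled in memory as below. This is a sketch under stated assumptions: ImportController, ChunkJob, and the pending and persisted arrays are hypothetical stand-ins for the BullMQ process-chunk queue and the contacts table. With BullMQ itself, the cleanup step would likely list waiting jobs (for example via Queue.getJobs with the waiting and delayed states) and call remove() on those whose data carries the cancelled jobId; check this against the BullMQ version in use.

```typescript
// Hypothetical in-memory model of import cancellation. In production the
// cancelled flag would live in shared storage and `pending` would be the
// BullMQ process-chunk queue.
interface ChunkJob {
  jobId: string;
  chunkIndex: number;
}

class ImportController {
  private readonly cancelled = new Set<string>();
  readonly pending: ChunkJob[] = []; // stand-in for queued process-chunk jobs
  readonly persisted: ChunkJob[] = []; // stand-in for contacts already in the DB

  isCancelled(jobId: string): boolean {
    return this.cancelled.has(jobId);
  }

  // Mark the import cancelled and remove its remaining queued chunks.
  // Already-persisted chunks are deliberately left untouched.
  cancel(jobId: string): void {
    this.cancelled.add(jobId);
    for (let i = this.pending.length - 1; i >= 0; i--) {
      if (this.pending[i].jobId === jobId) this.pending.splice(i, 1);
    }
  }

  // Chunker loop: check the flag before enqueuing each new chunk.
  enqueueChunks(jobId: string, totalChunks: number, afterEach?: (i: number) => void): number {
    let enqueued = 0;
    for (let i = 0; i < totalChunks; i++) {
      if (this.isCancelled(jobId)) break; // stop future chunks
      this.pending.push({ jobId, chunkIndex: i });
      enqueued++;
      afterEach?.(i);
    }
    return enqueued;
  }

  // Persistence worker: process the oldest queued chunk, skipping cancelled imports.
  processNext(): void {
    const job = this.pending.shift();
    if (job !== undefined && !this.isCancelled(job.jobId)) this.persisted.push(job);
  }
}
```

If a 10-chunk import is cancelled right after chunk 3 is enqueued and chunk 0 has been persisted, enqueueChunks returns 4, the queue ends up empty, and chunk 0 stays persisted.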

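Parsing Strategies

  • CSV: Streams the file with Node.js fs.createReadStream piped into csv-parser.
  • XLSX: Uses a streaming XLSX reader (e.g., the ExcelJS streaming API, ExcelJS.stream.xlsx.WorkbookReader) to avoid buffering the entire workbook in memory. The SheetJS xlsx package parses the whole workbook in memory when reading, so it does not meet this requirement on its own.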
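The streaming-plus-chunking approach can be sketched with Node's built-in readline over a readable stream, batching rows into fixed-size chunks the way the chunker would. This is a hedged stand-in rather than the pipeline's parser: the document's CSV path uses csv-parser, whereas this sketch splits on commas and does not handle quoted fields; streamCsvInChunks, streamCsvFileInChunks, chunkSize, and the handler signature are hypothetical.

```typescript
import { createReadStream } from "fs";
import { createInterface } from "readline";
import { Readable } from "stream";

type Row = Record<string, string>;
type ChunkHandler = (rows: Row[], chunkIndex: number) => Promise<void> | void;

// Read CSV text line by line and hand rows to onChunk in batches of chunkSize
// instead of collecting the whole file first.
// Assumption: naive comma splitting stands in for csv-parser; it does NOT
// handle quoted fields, escaped commas, or embedded newlines.
async function streamCsvInChunks(input: Readable, chunkSize: number, onChunk: ChunkHandler): Promise<number> {
  const lines = createInterface({ input, crlfDelay: Infinity });
  let header: string[] | undefined;
  let batch: Row[] = [];
  let chunkIndex = 0;

  for await (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines
    const cells = line.split(",");
    if (header === undefined) {
      header = cells; // first non-blank line is the header row
      continue;
    }
    const row: Row = {};
    header.forEach((column, i) => {
      row[column] = cells[i] ?? "";
    });
    batch.push(row);
    if (batch.length === chunkSize) {
      await onChunk(batch, chunkIndex++); // wait for the handler before taking the next line
      batch = [];
    }
  }
  if (batch.length > 0) await onChunk(batch, chunkIndex++); // final partial chunk
  return chunkIndex; // number of chunks emitted
}

// Convenience wrapper for an upload already written to disk.
function streamCsvFileInChunks(path: string, chunkSize: number, onChunk: ChunkHandler): Promise<number> {
  return streamCsvInChunks(createReadStream(path), chunkSize, onChunk);
}
```

The same batching loop could consume rows from a streaming XLSX reader instead of readline; that variant is not shown here.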