
Contact Upsert & Deduplication Strategy

The platform maintains a tenant-scoped contact database in which each contact is unique. The database-level constraint UNIQUE(vendorId, mobile) enforces that a given normalized E.164 number identifies exactly one contact within a tenant (vendorId).
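To illustrate the identity rule, the following TypeScript sketch shows a simplified normalizer and the in-memory key that mirrors UNIQUE(vendorId, mobile). The helper names normalizeE164 and contactKey are hypothetical. The normalizer assumes numbers already carry a "+<country code>" prefix; real ingestion would typically rely on a dedicated phone-number parsing library.

```typescript
// Simplified sketch: assumes input already carries an explicit "+<country code>" prefix.
// A production pipeline would typically use a full phone-number parsing library instead.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/; // "+", then up to 15 digits, no leading zero

/** Removes spaces, hyphens, dots and parentheses, then validates the result as E.164. */
function normalizeE164(raw: string): string | null {
  const stripped = raw.trim().replace(/[\s\-().]/g, "");
  return E164_PATTERN.test(stripped) ? stripped : null;
}

/** In-memory equivalent of the UNIQUE(vendorId, mobile) identity. */
function contactKey(vendorId: string, mobile: string): string {
  return `${vendorId}:${mobile}`;
}
```

Rows such as "+91 98765-43210" and "+919876543210" under the same vendor produce the same key. This is the same collision that the database constraint resolves.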

Duplicate Handling Modes (Merge Strategy)

When initiating an import, the user can select a merge strategy. MERGE_LABELS is the default.

  1. SKIP_DUPLICATES: If a contact exists, ignore the new row entirely.
  2. MERGE_LABELS (Default): Preserve existing contact data and merge in any new labels. Labels that are already attached are not duplicated.
  3. UPDATE_METADATA: Update the display name and attributes according to the Metadata Conflict Resolution Rules.
  4. FULL_REPLACE (Restricted): Replace all existing metadata with the incoming values. Because this is destructive, it requires Admin/Advanced user privileges and explicit confirmation.
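The strategy selection rules above can be sketched in TypeScript as follows: apply the MERGE_LABELS default, guard FULL_REPLACE, and perform a duplicate-free label merge. The function names and the ImportRequester shape (including the role values) are hypothetical. The platform's actual privilege model is not described here.

```typescript
type MergeStrategy = "SKIP_DUPLICATES" | "MERGE_LABELS" | "UPDATE_METADATA" | "FULL_REPLACE";

// Hypothetical requester shape; the platform's actual role model may differ.
interface ImportRequester {
  role: "ADMIN" | "ADVANCED" | "STANDARD";
  confirmedFullReplace: boolean;
}

const DEFAULT_MERGE_STRATEGY: MergeStrategy = "MERGE_LABELS";

/** Applies the default and enforces the FULL_REPLACE restriction. */
function resolveMergeStrategy(
  requested: MergeStrategy | undefined,
  requester: ImportRequester,
): MergeStrategy {
  const strategy = requested ?? DEFAULT_MERGE_STRATEGY;
  if (strategy === "FULL_REPLACE") {
    const privileged = requester.role === "ADMIN" || requester.role === "ADVANCED";
    if (!privileged || !requester.confirmedFullReplace) {
      throw new Error("FULL_REPLACE requires Admin/Advanced privileges and explicit confirmation");
    }
  }
  return strategy;
}

/** MERGE_LABELS: keeps existing labels first and appends only labels not already present. */
function mergeLabels(existing: string[], incoming: string[]): string[] {
  return Array.from(new Set([...existing, ...incoming]));
}
```

Because a Set preserves insertion order, existing labels keep their order and only genuinely new labels are appended.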

Metadata Conflict Resolution Rules

To prevent lower-quality imports from overwriting richer data, the UPDATE_METADATA strategy applies the following deterministic precedence rules:

IF existing.source = 'MANUAL'
    preserve existing (do not overwrite)
ELSE IF length(incoming.name) > length(COALESCE(existing.name, '')) AND incoming.name != ''
    update existing
ELSE
    preserve existing

Precedence Priority

  1. Priority 1 (Manual Superiority): MANUAL sources are fully protected from automated import overwrites.
  2. Priority 2 (String Quality): If both are non-manual (e.g., IMPORT or API), the longer, richer string wins (e.g., "Rajiv Kumar" overwrites "Rajiv").
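The precedence rules above can also be mirrored in application code, for example to unit-test them or to preview an import before it reaches the database. Below is a minimal TypeScript sketch. resolveName and ExistingContact are hypothetical names, and the ContactSource values come from the Contact Source Attribution list.

```typescript
type ContactSource = "IMPORT" | "MANUAL" | "API" | "SYNC";

interface ExistingContact {
  name: string | null;
  source: ContactSource;
}

/**
 * UPDATE_METADATA name rule.
 * Priority 1: MANUAL contacts keep their name.
 * Priority 2: otherwise a non-empty incoming name replaces the current one only if strictly longer.
 */
function resolveName(existing: ExistingContact, incomingName: string | null): string | null {
  if (existing.source === "MANUAL") return existing.name;
  const incoming = incomingName ?? "";
  const current = existing.name ?? "";
  if (incoming !== "" && incoming.length > current.length) return incoming;
  return existing.name;
}
```

One caveat: JavaScript's `length` counts UTF-16 code units, while PostgreSQL's `length()` counts characters. Names containing characters outside the Basic Multilingual Plane (for example emoji) can therefore compare differently in application code and in SQL.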

Optional Metadata Extensions (Future Foundation)

The schema already includes the following optional attributes as groundwork for future features:

  • normalizedName: A lowercased, whitespace-stripped copy of the name, stored to optimize search and to support future fuzzy deduplication (e.g. "Rajiv Kumar" -> "rajivkumar").
  • lastImportedAt: Timestamp tracking the last successful import merge for stale-contact cleanup and engagement analytics.
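Here is a possible derivation of normalizedName, as a minimal sketch. The document specifies only lowercasing and stripping ("Rajiv Kumar" -> "rajivkumar"). The exact characters removed here (whitespace and a few common ASCII punctuation marks) and the NFC normalization step are assumptions, and toNormalizedName is a hypothetical name.

```typescript
/**
 * Hypothetical normalizedName derivation.
 * Assumption: NFC-normalize, lowercase, then drop whitespace and . , ' " - _ characters.
 */
function toNormalizedName(name: string): string {
  return name
    .normalize("NFC") // make composed/decomposed Unicode forms compare equal
    .toLowerCase()
    .replace(/[\s.,'"\-_]+/g, "");
}
```

With this rule, "Rajiv Kumar" and "rajiv  kumar" map to the same value, which is what makes the column useful as a fuzzy-matching key.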

Contact Source Attribution

Every contact tracks its origin via the source field.

  • IMPORT: Created via bulk file upload.
  • MANUAL: Created directly in the UI.
  • API: Synced via an external integration.
  • SYNC: Synced via third-party providers.

Import Conflict Preview

During the PENDING_REVIEW phase, the /api/v1/contacts/import/:id/preview API generates a conflict summary:

{
  "jobStatus": "PENDING_REVIEW",
  "conflictSummary": {
    "totalNewContacts": 850,
    "totalExistingContacts": 150,
    "contactsWithUpdatedNames": 20,
    "contactsWithNewLabels": 35
  },
  "validSample": [...],
  "errorSample": [...]
}

The ValidationWorker computes this summary by querying the database for E.164 numbers that already exist under the tenant before it pushes the chunk payload.
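The following sketch shows how the ValidationWorker might derive the conflictSummary counters for one chunk. It assumes the incoming rows are already normalized and deduplicated, and that the tenant's existing contacts have been loaded into a map keyed by mobile. The function and type names are hypothetical. "Updated name" is assumed to mean that the UPDATE_METADATA name rule would change the stored name.

```typescript
type ContactSource = "IMPORT" | "MANUAL" | "API" | "SYNC";

interface IncomingRow {
  mobile: string; // normalized E.164
  name: string | null;
  labels: string[];
}

interface StoredContact {
  name: string | null;
  source: ContactSource;
  labels: string[];
}

interface ConflictSummary {
  totalNewContacts: number;
  totalExistingContacts: number;
  contactsWithUpdatedNames: number;
  contactsWithNewLabels: number;
}

// Hypothetical helper: existingByMobile is assumed to come from the worker's lookup
// of E.164 numbers that already exist under the tenant.
function summarizeConflicts(
  rows: IncomingRow[],
  existingByMobile: Map<string, StoredContact>,
): ConflictSummary {
  const summary: ConflictSummary = {
    totalNewContacts: 0,
    totalExistingContacts: 0,
    contactsWithUpdatedNames: 0,
    contactsWithNewLabels: 0,
  };
  for (const row of rows) {
    const existing = existingByMobile.get(row.mobile);
    if (existing === undefined) {
      summary.totalNewContacts++;
      continue;
    }
    summary.totalExistingContacts++;
    // Same precedence as UPDATE_METADATA: MANUAL is protected, a longer non-empty name wins.
    const incomingName = row.name ?? "";
    if (existing.source !== "MANUAL" && incomingName.length > (existing.name ?? "").length) {
      summary.contactsWithUpdatedNames++;
    }
    const currentLabels = new Set(existing.labels);
    if (row.labels.some((label) => !currentLabels.has(label))) {
      summary.contactsWithNewLabels++;
    }
  }
  return summary;
}
```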

Parameterized PostgreSQL Bulk UPSERT Strategy

To handle very large imports, the PersistenceWorker writes contacts with PostgreSQL bulk UPSERTs (INSERT ... ON CONFLICT ... DO UPDATE). These are issued as parameterized raw SQL through the Prisma Client's $executeRaw method with the Prisma.sql tagged template. $executeRawUnsafe and string concatenation are strictly prohibited, so values from uploaded CSV files can never be interpreted as SQL.

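// Example using Prisma.sql and Prisma.join: every value is sent as a bound parameter.
// Rows in `contacts` must be unique on (vendorId, mobile) within one statement.
if (contacts.length === 0) return; // Prisma.join throws on an empty array
const query = Prisma.sql`
  INSERT INTO "Contact" ("id", "vendorId", "mobile", "name", "source", "lastImportedAt", "updatedAt")
  VALUES ${Prisma.join(
    contacts.map(
      (c) => Prisma.sql`(${c.id}, ${c.vendorId}, ${c.mobile}, ${c.name}, ${c.source}, NOW(), NOW())`,
    ),
  )}
  ON CONFLICT ("vendorId", "mobile")
  DO UPDATE SET
    "name" = CASE
      -- Priority 1: MANUAL contacts are never overwritten by imports
      WHEN "Contact"."source" = 'MANUAL' THEN "Contact"."name"
      -- Priority 2: the longer incoming name wins (NULL or '' never does)
      WHEN length(EXCLUDED."name") > length(COALESCE("Contact"."name", '')) THEN EXCLUDED."name"
      ELSE "Contact"."name"
    END,
    "lastImportedAt" = NOW(),
    -- Raw SQL bypasses Prisma's @updatedAt handling, so set it explicitly on insert and update
    "updatedAt" = NOW()
`;
await this.prisma.$executeRaw(query);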

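Each chunk is written by a single INSERT ... ON CONFLICT statement, so it is applied atomically. Concurrent imports that touch the same (vendorId, mobile) resolve to an insert or an update instead of a duplicate-key error, and binding every value as a parameter keeps CSV content from being executed as SQL.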
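Two PostgreSQL constraints shape how chunks are prepared before the upsert runs. First, a single INSERT ... ON CONFLICT DO UPDATE cannot update the same row twice, and PostgreSQL rejects such a statement with "ON CONFLICT DO UPDATE command cannot affect row a second time". Second, one statement may bind at most 65535 parameters. The helpers below are a hypothetical sketch of both steps. Keeping the row with the longer name when a file repeats a number is an assumption, chosen to mirror the name precedence rule.

```typescript
// PostgreSQL limits one statement to 65535 bind parameters.
const PG_MAX_BIND_PARAMS = 65535;
// The upsert above binds five values per row: id, vendorId, mobile, name, source.
const PARAMS_PER_ROW = 5;

interface ContactRow {
  id: string;
  vendorId: string;
  mobile: string;
  name: string | null;
  source: string;
}

/**
 * Keeps one row per (vendorId, mobile). Assumption: when a file repeats a number,
 * the row with the longer name is kept, mirroring the name precedence rule.
 */
function dedupeByIdentity(rows: ContactRow[]): ContactRow[] {
  const byKey = new Map<string, ContactRow>();
  for (const row of rows) {
    const key = `${row.vendorId}:${row.mobile}`;
    const kept = byKey.get(key);
    if (kept === undefined || (row.name ?? "").length > (kept.name ?? "").length) {
      byKey.set(key, row);
    }
  }
  return Array.from(byKey.values());
}

/** Splits rows into chunks that stay under the bind-parameter limit. */
function chunkForUpsert<T>(rows: T[], requestedSize: number): T[][] {
  const maxRows = Math.floor(PG_MAX_BIND_PARAMS / PARAMS_PER_ROW); // 13107 rows
  const size = Math.max(1, Math.min(requestedSize, maxRows));
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}
```

Deduplicating before chunking also ensures that two copies of one number never land in different chunks of the same job, where they would race against each other.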

Soft Delete Lifecycle

Contacts are not hard-deleted immediately, so campaign history and message logs stay intact.

  • Contacts feature a deletedAt timestamp.
  • Deleted contacts are filtered out of broadcast campaigns but remain visible in historical message logs.
  • Users can execute a recovery flow to restore soft-deleted contacts.
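The lifecycle above reduces to a nullable deletedAt column plus two kinds of read paths. The in-memory TypeScript sketch below illustrates this with hypothetical function names. In a Prisma query, the broadcast filter would roughly correspond to a condition such as `where: { deletedAt: null }`.

```typescript
interface SoftDeletableContact {
  id: string;
  mobile: string;
  deletedAt: Date | null; // null means the contact is active
}

/** Marks a contact as deleted without removing the row. */
function softDelete(contact: SoftDeletableContact, now: Date = new Date()): SoftDeletableContact {
  return { ...contact, deletedAt: now };
}

/** Recovery flow: clearing deletedAt makes the contact active again. */
function restoreContact(contact: SoftDeletableContact): SoftDeletableContact {
  return { ...contact, deletedAt: null };
}

/** Broadcast campaigns only target active contacts. */
function broadcastRecipients(contacts: SoftDeletableContact[]): SoftDeletableContact[] {
  return contacts.filter((c) => c.deletedAt === null);
}

/** Message-log views resolve contacts by id whether or not they are soft-deleted. */
function findForMessageLog(
  contacts: SoftDeletableContact[],
  id: string,
): SoftDeletableContact | undefined {
  return contacts.find((c) => c.id === id);
}
```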

Auditability & History

All modifications trigger an audit event to track:

  • Previous vs. New Display Name
  • Merged Labels
  • Import Source & Timestamp

These logs are stored in ContactAuditLog. Retention policy: audit logs are retained for 90 days, and a scheduled cleanup job archives and then purges entries older than that.
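The sketch below shows one possible shape for an audit event covering the tracked fields, along with the 90-day cutoff a cleanup job could compare against. The type and function names are hypothetical, and returning null when neither the name nor the labels changed is an assumption about what counts as a modification.

```typescript
interface ContactAuditEvent {
  contactId: string;
  previousName: string | null;
  newName: string | null;
  mergedLabels: string[]; // labels added by this modification
  importSource: string;
  occurredAt: Date;
}

interface NameAndLabels {
  name: string | null;
  labels: string[];
}

const AUDIT_RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Returns null when nothing tracked by the audit log changed. */
function buildAuditEvent(
  contactId: string,
  before: NameAndLabels,
  after: NameAndLabels,
  importSource: string,
  occurredAt: Date,
): ContactAuditEvent | null {
  const previousLabels = new Set(before.labels);
  const mergedLabels = after.labels.filter((label) => !previousLabels.has(label));
  if (before.name === after.name && mergedLabels.length === 0) return null;
  return { contactId, previousName: before.name, newName: after.name, mergedLabels, importSource, occurredAt };
}

/** Entries older than this instant are due for archiving and purging. */
function retentionCutoff(now: Date): Date {
  return new Date(now.getTime() - AUDIT_RETENTION_DAYS * MS_PER_DAY);
}

function isExpired(event: ContactAuditEvent, now: Date): boolean {
  return event.occurredAt.getTime() < retentionCutoff(now).getTime();
}
```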

Future Import Versioning (Architectural Foundation)

Import lineage is not yet fully implemented, but the architecture is designed to support it. Future iterations will link contacts to specific import batch versions (e.g., "Wedding Import v1" vs. "v2") through a lightweight mapping table. This will let users trace exactly which import job created or modified a given contact attribute.