The nightmare scenario is always the same.
You spend weeks writing migration scripts. You map every field. You run your tests on a small sample file, and everything looks perfect.
Then, on Go-Live weekend, you run the script on the full production database. It runs for 4 hours… and crashes at 99%.
The culprit? A single row in the Price column containing the text “TBD” instead of a number. Or maybe a duplicate ID in a field that was supposed to be unique.
This is the “Dirty Data” problem. And if you don’t audit your data before you write your scripts, you are building on quicksand.
In this guide, we’ll look at the three “Silent Killers” of migration projects and how to spot them instantly without uploading your sensitive data to the cloud.
Killer #1: The “Placeholder” Date
Legacy systems hate empty fields. Instead of leaving a birthday NULL, old software often auto-fills a “Placeholder Date” to satisfy database constraints.
Common offenders include:
- `1900-01-01`
- `1970-01-01` (the Unix Epoch)
- `9999-12-31` (the "Forever" date)
The Risk: If you migrate these values into a modern CRM (like Salesforce or HubSpot), you will trigger thousands of “Happy Birthday” emails to customers who were born in 1900.
The Fix: Check the Value Distribution of your date columns. If 40% of your rows have the exact same date, you need to write a transformation rule to convert that specific date to NULL.
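If you'd rather script this check than eyeball it, a few lines of pandas will do. A minimal sketch, assuming a hypothetical export file `legacy_export.csv` with a `birth_date` column; the dates are read as plain strings because `9999-12-31` overflows pandas' native datetime range:

```python
import pandas as pd

# Hypothetical export; read dates as strings so "9999-12-31" can't overflow
# pandas' datetime64 range (which ends in the year 2262).
df = pd.read_csv("legacy_export.csv", dtype={"birth_date": "string"})

# A genuine birthday column has thousands of distinct values. A placeholder
# shows up as one giant spike in the distribution.
print(df["birth_date"].value_counts().head())

# Convert the known placeholders to real NULLs before migrating.
placeholders = {"1900-01-01", "1970-01-01", "9999-12-31"}
df.loc[df["birth_date"].isin(placeholders), "birth_date"] = pd.NA
```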
Killer #2: Low Cardinality in “Unique” Fields
Cardinality refers to the uniqueness of data in a column.
- High Cardinality: Values are unique (e.g., Social Security Numbers, User IDs).
- Low Cardinality: Values repeat often (e.g., Country, Status, Gender).
The Risk: You assume the Email column is unique, so you use it as a Primary Key in your new system. But your legacy data contains 50 rows where the email is unknown@example.com or test@test.com.
When you try to import this, your new database will throw a Duplicate Entry Error and reject the entire batch.
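You can catch this before the import does. A minimal pandas sketch, assuming the same hypothetical `legacy_export.csv` with an `email` column:

```python
import pandas as pd

df = pd.read_csv("legacy_export.csv")  # hypothetical file name

# Cardinality ratio: a column you intend to use as a primary key
# should be at (or very near) 100% unique.
print(f"email cardinality: {df['email'].nunique() / len(df):.1%}")

# List the repeat offenders before the import rejects the batch for you.
dupes = df[df["email"].duplicated(keep=False)]
print(dupes["email"].value_counts().head())
```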
Killer #3: The “Mixed Type” Column
Excel is notorious for this. You have a column called Phone_Number. It looks like numbers. But row 4,050 contains a note: “Call after 5pm”.
The Risk:
- SQL Database: `VARCHAR` cannot become `INT`. Crash.
- JSON API: Sends a string instead of a number, breaking the frontend sorting logic.
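The audit trick here is to attempt the conversion yourself and inspect what fails. A hedged sketch, again with hypothetical names (`legacy_export.csv`, `Phone_Number`):

```python
import pandas as pd

df = pd.read_csv("legacy_export.csv", dtype={"Phone_Number": "string"})

# errors="coerce" turns anything non-numeric into NaN: "Call after 5pm"
# fails the conversion, while plain digit strings survive. (Formatted values
# like "555-1234" will flag too, which is exactly what an audit should surface.)
as_numbers = pd.to_numeric(df["Phone_Number"], errors="coerce")

# Rows that were non-empty but refused to convert are the mixed-type culprits.
culprits = df[as_numbers.isna() & df["Phone_Number"].notna()]
print(culprits[["Phone_Number"]].head())
```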
Tutorial: How to Audit Data (Privately)
Most data profiling tools are cloud-based. But you can’t just upload your customer’s PII (Personally Identifiable Information) to a random website to check for nulls. That’s a GDPR violation waiting to happen.
We built a Local Data Profiler that runs entirely in your browser. It gives you an X-Ray of your CSV without the data ever leaving your device.
Step 1: Upload & Scan
Drag your CSV export into the Profiler. The tool calculates statistics on the fly using Web Workers.
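If you ever need the same trick in a script (say, on a server where a browser isn't an option), the key idea is identical: compute statistics incrementally so the full file never has to fit in memory. A rough Python equivalent, file name hypothetical:

```python
import pandas as pd

# Stream the export in 100k-row chunks and accumulate null counts,
# so even a multi-GB file is profiled without loading it whole.
nulls, total = None, 0
for chunk in pd.read_csv("legacy_export.csv", chunksize=100_000):
    counts = chunk.isna().sum()
    nulls = counts if nulls is None else nulls.add(counts, fill_value=0)
    total += len(chunk)

print((nulls / total).sort_values(ascending=False))  # null rate per column
```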
Step 2: Check “Completeness”
Look at the Null Rate for every column.
- Scenario: Your target system requires `Last Name`.
- Audit: The Profiler shows `Last Name` is only 85% Complete.
- Action: You now know you must write a script to fill the missing 15% with a placeholder like "Unknown" (a quick sketch follows below).
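The backfill itself is a one-liner once the audit tells you where to look. A sketch, assuming the column is literally named `Last Name`:

```python
import pandas as pd

df = pd.read_csv("legacy_export.csv")  # hypothetical file name

# Completeness report: percentage of non-null values per column.
completeness = df.notna().mean().mul(100).round(1)
print(completeness.sort_values())

# If Last Name comes back 85% complete, backfill the gap before importing.
df["Last Name"] = df["Last Name"].fillna("Unknown")
```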
Step 3: Check “Top 5 Values”
Look at the distribution card for your Phone column.
- Audit: The tool shows the most common value is `N/A` (appearing 500 times).
- Action: Add a logic step: `IF phone == 'N/A' THEN set NULL` (see the sketch below).
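That rule translates directly into code. One wrinkle worth a comment: pandas silently treats the string "N/A" as a null on load, so you have to disable that default to count the placeholder before replacing it. A minimal sketch with hypothetical names:

```python
import pandas as pd

# keep_default_na=False stops pandas from converting "N/A" to NaN on read,
# so the audit can actually see and count the placeholder strings.
df = pd.read_csv("legacy_export.csv", dtype={"Phone": "string"},
                 keep_default_na=False)

# Top 5 values: a healthy phone column has no single dominant value.
print(df["Phone"].value_counts().head(5))

# The IF phone == 'N/A' THEN set NULL rule, plus a few common spellings.
df["Phone"] = df["Phone"].replace({"N/A": pd.NA, "n/a": pd.NA, "NA": pd.NA})
```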
[Try the Enterprise Data Profiler]
- Spot Nulls & Duplicates
- Identify “Placeholder” Data
- 100% Client-Side Privacy
Don’t let a single bad row ruin your Go-Live weekend. Audit first, migrate second.