Data fidelity

The silent data loss in COBOL migrations — and how to prove it didn't happen

IronParse · field notes

The most dangerous bug in a mainframe migration is the one that doesn't crash. A script that converts a COBOL copybook to JSON can run cleanly, produce well-formed output, pass a casual eyeball check — and still have quietly altered the data. No exception, no log line, no failed job. The numbers just stop matching the source, and you usually find out in production, from a regulator or a reconciliation report.

Here are the failure modes that hide, and why each one slips past a naive conversion.

Packed decimals become strings

A field declared PIC S9(7)V99 COMP-3 is a signed, packed-BCD number with two implied decimal places, stored in five bytes, not text: two digits per byte, with the sign in the final nibble. A generic byte-to-string conversion reads those bytes as characters and emits garbage, or "helpfully" stringifies them. Now a premium that was 0012345.67 is a meaningless token, and every downstream calculation built on it is wrong.
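
To make the byte layout concrete, here is a minimal Python sketch of a packed-decimal decoder. The function name decode_comp3 and its scale parameter are illustrative, not part of any particular tool; the sample bytes are the packed form of the 0012345.67 premium above.

```python
from decimal import Decimal

def decode_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a packed-decimal (COMP-3) field; scale = digits after the V."""
    # Each byte holds two 4-bit nibbles; the final nibble is the sign.
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid digit nibble in {raw.hex()}")
    if sign < 0x0A:
        raise ValueError(f"invalid sign nibble in {raw.hex()}")
    unscaled = int("".join(str(d) for d in digits))
    if sign in (0x0B, 0x0D):  # B and D mark a negative value
        unscaled = -unscaled
    # Apply the implied decimal point without going through float.
    return Decimal(unscaled).scaleb(-scale)

raw = bytes.fromhex("001234567C")      # PIC S9(7)V99 COMP-3, five bytes
print(decode_comp3(raw, scale=2))      # 12345.67
print(decode_comp3(bytes.fromhex("001234567D"), scale=2))  # -12345.67
```

Using Decimal rather than float keeps the two implied decimal places exact, which matters once the value feeds a reconciliation.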

Variable-length arrays get flattened

An OCCURS 1 TO 12 DEPENDING ON CVG-CNT clause means the record holds a variable number of coverage entries, governed by a count field elsewhere in the record. A converter that doesn't read the controlling count will grab a fixed slice — often one element — and silently drop the rest. A policy with eight coverages migrates with one. Nothing errors.
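
A sketch of reading the controlling count first, in Python. The layout here is an assumption made for illustration: CVG-CNT is taken to be an unsigned two-digit DISPLAY field in EBCDIC (code page 037), and each coverage entry is taken to be a four-character code. The offsets and the name read_coverages are hypothetical.

```python
ENTRY_LEN = 4  # assumed: each coverage entry is PIC X(4)

def read_coverages(record: bytes, count_offset: int, table_offset: int) -> list[str]:
    """Read an OCCURS 1 TO 12 DEPENDING ON table, honouring the count field."""
    count = int(record[count_offset:count_offset + 2].decode("cp037"))
    if not 1 <= count <= 12:
        raise ValueError(f"CVG-CNT out of range: {count}")
    end = table_offset + count * ENTRY_LEN
    if end > len(record):
        # Fail loudly instead of returning a short table.
        raise ValueError("record is shorter than CVG-CNT implies")
    return [
        record[i:i + ENTRY_LEN].decode("cp037")
        for i in range(table_offset, end, ENTRY_LEN)
    ]

record = "03".encode("cp037") + "AUTOHOMELIFE".encode("cp037")
print(read_coverages(record, count_offset=0, table_offset=2))
# ['AUTO', 'HOME', 'LIFE']
```

The two range checks are the point: a count outside 1 to 12, or a record too short for its own count, raises instead of producing a plausible but truncated list.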

REDEFINES picks the wrong view

REDEFINES overlays two interpretations on the same bytes; which one is valid depends on context the bytes themselves don't carry. Resolve it wrong and you've read a date as a code, or a balance as a flag — type-valid, completely incorrect.
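
One common way to supply the missing context is a discriminator field elsewhere in the record. The Python sketch below assumes a hypothetical layout: a one-byte record type ("D" or "C") followed by an eight-byte area that is either a YYYYMMDD date or a space-padded code. The layout and the name decode_variant are illustrative only.

```python
from datetime import date

def decode_variant(record: bytes) -> dict:
    """Pick the REDEFINES view from an explicit record-type byte."""
    rec_type = record[0:1].decode("cp037")
    payload = record[1:9].decode("cp037")
    if rec_type == "D":
        # View 1: PIC 9(8) date, YYYYMMDD. date() rejects impossible dates.
        parsed = date(int(payload[0:4]), int(payload[4:6]), int(payload[6:8]))
        return {"type": "date", "value": parsed.isoformat()}
    if rec_type == "C":
        # View 2: PIC X(8) code, space-padded on the right.
        return {"type": "code", "value": payload.rstrip()}
    # Unknown discriminator: refuse to guess a view.
    raise ValueError(f"unknown record type {rec_type!r}")

print(decode_variant("D20240131".encode("cp037")))
# {'type': 'date', 'value': '2024-01-31'}
print(decode_variant("CAB12    ".encode("cp037")))
# {'type': 'code', 'value': 'AB12'}
```

The design choice worth copying is the final raise: when the discriminator is not one the rule knows about, the decoder stops rather than defaulting to whichever view happens to parse.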

The quiet ones: signs, implied decimals, and encoding

Sign nibbles in packed and zoned fields, the implied decimal point in a V clause, and EBCDIC-versus-ASCII collation all live below the surface. Each is invisible to a conversion that only checks structure, and each corrupts numbers or ordering in ways that pass schema validation but fail arithmetic and reconciliation.
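
Two of these are easy to show in a few lines of Python. The sketch assumes EBCDIC code page 037 and a zoned field declared PIC S9(5)V99 with the sign carried in the zone nibble of the last byte; decode_zoned is an illustrative name, not a library function.

```python
from decimal import Decimal

def decode_zoned(raw: bytes, scale: int) -> Decimal:
    """Decode a signed zoned-decimal (DISPLAY) field stored in EBCDIC."""
    digits = [b & 0x0F for b in raw]
    zones = [b >> 4 for b in raw]
    if not raw or any(d > 9 for d in digits) or any(z != 0xF for z in zones[:-1]):
        raise ValueError(f"not a zoned-decimal field: {raw.hex()}")
    sign_zone = zones[-1]  # C or F = positive, D = negative
    if sign_zone not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid sign zone in {raw.hex()}")
    unscaled = int("".join(str(d) for d in digits))
    if sign_zone == 0xD:
        unscaled = -unscaled
    return Decimal(unscaled).scaleb(-scale)

raw = bytes.fromhex("F1F2F3F4F5F6D7")   # -12345.67 as PIC S9(5)V99
print(raw.decode("cp037"))               # 123456P  (sign and decimal point lost)
print(decode_zoned(raw, scale=2))        # -12345.67

# Collation: EBCDIC orders lowercase < uppercase < digits; ASCII is the reverse.
keys = ["a1", "B2", "3C"]
print(sorted(keys))                                   # ['3C', 'B2', 'a1']
print(sorted(keys, key=lambda k: k.encode("cp037")))  # ['a1', 'B2', '3C']
```

The plain text decode is the trap: it yields a well-formed string, so nothing downstream complains until the arithmetic or the sort order is compared against the source.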

Why "it ran without errors" proves nothing

The common thread: every one of these produces valid-looking output. The JSON parses. The types check. The job exits zero. Absence of an error is not evidence of fidelity — it's just the absence of a crash. To actually know the migration preserved the data, you have to test the one thing that can't be faked.

The test that can't be faked: byte-identical round-trip

Re-encode the migrated representation back into the original record format and compare bytes. If sha256(re-encoded) == sha256(original), no byte of any field (packed decimals, sign nibbles, array boundaries, the lot) was lost or altered in the conversion, because a single wrong bit anywhere breaks the hash. This is the load-bearing guarantee behind a parity proof, alongside checks that the structure parses, the field count matches the source, every PIC clause decodes to a concrete type, and the emitted schema compiles.
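
The round-trip check can be sketched in Python on a toy record with a single PIC S9(7)V99 COMP-3 field. This is an illustration of the technique under that assumption, not IronParse's implementation; the names round_trips, decode_comp3, and encode_comp3 are hypothetical.

```python
import hashlib
from decimal import Decimal

def decode_comp3(raw: bytes, scale: int) -> Decimal:
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    *digits, sign = nibbles
    unscaled = int("".join(str(d) for d in digits))
    return Decimal(-unscaled if sign in (0x0B, 0x0D) else unscaled).scaleb(-scale)

def encode_comp3(value: Decimal, digits: int, scale: int) -> bytes:
    unscaled = int(value.scaleb(scale))
    text = str(abs(unscaled)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit the PIC clause")
    nibbles = [int(c) for c in text] + [0x0D if unscaled < 0 else 0x0C]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad to a whole number of bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))

def round_trips(original: bytes, decode, encode) -> bool:
    """True only if decode-then-encode reproduces the original bytes exactly."""
    re_encoded = encode(decode(original))
    return hashlib.sha256(re_encoded).digest() == hashlib.sha256(original).digest()

original = bytes.fromhex("001234567D")  # -12345.67

faithful = round_trips(original,
                       lambda raw: decode_comp3(raw, 2),
                       lambda val: encode_comp3(val, 9, 2))
# A decoder that silently drops the sign still yields a valid-looking number.
lossy = round_trips(original,
                    lambda raw: abs(decode_comp3(raw, 2)),
                    lambda val: encode_comp3(val, 9, 2))
print(faithful, lossy)  # True False
```

One consequence of comparing bytes rather than values: this toy encoder always writes sign nibble C or D, so a source field that used F for an unsigned value would fail the check unless the migrated representation also records which sign nibble was present.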

Run all five and you can make a precise claim: not "it looked fine," but "every field is provably preserved, here is the signed receipt." That's the difference between hoping a migration worked and proving it.

See it on a real record

IronParse runs these checks deterministically and signs a parity receipt. Generate one against a sample ACORD record, or read the spec.

