COMP-3, REDEFINES, OCCURS DEPENDING ON: a field guide to the clauses that break migrations
A COBOL copybook is not a schema in the modern sense — it is a byte-layout map. Every clause says exactly where a field sits, how many bytes it spans, and how to decode those bytes back into a number, a string, or a structure. A converter that treats the copybook as a loose list of typed fields, instead of an exact physical layout, will read the right bytes the wrong way. The output looks structured and parses cleanly. It is also wrong.
This is a clause-by-clause guide to the constructs that most often survive a migration in name but not in value. For each one: what it means, how it is physically stored, the specific way a naive converter mangles it, and how to verify it actually came through. For the broader argument that error-free output proves nothing, see the silent data loss in COBOL migrations. The exact checks IronParse runs are in the spec.
PIC S9(7)V99 COMP-3 — packed decimal
What it means. A signed numeric with seven digits before an implied decimal point and two after, stored as COMP-3 (packed BCD).
How it is stored. Each nibble (half-byte) holds one decimal digit; the low nibble of the final byte holds the sign. There are nine digits plus one sign nibble, so bytes = floor(9/2) + 1 = 5. The value 12345.67 with a positive sign packs as the five bytes 00 12 34 56 7C: the leading zero nibbles pad the value to nine digits, and C is the positive sign.
How it gets mangled. A byte-to-text converter reads 0x12 0x34 0x56 0x7C as characters and emits control-character garbage, or stringifies the raw bytes. The implied decimal is dropped, the sign nibble is read as an extra digit, or the value is silently truncated to the printable bytes.
How to verify. Re-pack the decoded number to BCD and compare bytes; confirm the decoded value is the number 12345.67, not 1234567, 123456.7, or a string.
05 PREMIUM-AMT PIC S9(7)V99 COMP-3.
raw bytes : 00 12 34 56 7C (5 bytes; leading 00 pads to nine digits; C = positive sign)
correct : +12345.67
common-wrong : "\x00\x124V|" (read as ASCII text) or 1234567 (decimal dropped)
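The decode-and-re-pack check described above can be sketched in a few lines of Python. This is a minimal illustration, not IronParse's implementation: the function names `unpack_comp3` and `pack_comp3` are hypothetical, and the sketch assumes the only valid sign codes are the three this guide names (C, D, F).

```python
from decimal import Decimal

# Hypothetical helpers for illustration; not an IronParse or library API.


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal: one digit per nibble, sign in the last low nibble."""
    hex_text = raw.hex().upper()          # each hex character is one nibble
    digits, sign = hex_text[:-1], hex_text[-1]
    if sign not in "CDF":
        raise ValueError(f"unexpected sign nibble {sign!r}")
    if not digits.isdigit():
        raise ValueError(f"non-decimal digit nibble in {digits!r}")
    unscaled = int(digits)
    if sign == "D":
        unscaled = -unscaled
    # The scale comes from the PIC clause (V99 -> 2), never from the bytes.
    return Decimal(unscaled).scaleb(-scale)


def pack_comp3(value: Decimal, total_digits: int, scale: int) -> bytes:
    """Re-encode a decoded value so it can be compared with the source bytes."""
    unscaled = int(value.scaleb(scale))
    sign = "D" if unscaled < 0 else "C"   # assumption: positives re-encode as C
    digits = str(abs(unscaled)).rjust(total_digits, "0")
    if len(digits) % 2 == 0:
        digits = "0" + digits             # digits + sign must fill whole bytes
    return bytes.fromhex(digits + sign)


raw = bytes.fromhex("001234567C")
value = unpack_comp3(raw, scale=2)
assert value == Decimal("12345.67")
assert pack_comp3(value, total_digits=9, scale=2) == raw
```

Because this sketch always re-encodes positives with C, a source field that used F would not reproduce byte for byte; an encoder meant for round-trip comparison would need to carry the original sign code alongside the value.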
REDEFINES — overlaid views on the same bytes
What it means. Two or more field definitions describing the same storage. Which definition is valid is decided by context the bytes do not carry, usually a record-type flag elsewhere.
How it is stored. There is one set of bytes. POLICY-DATA and CLAIM-DATA occupy the identical offset and length; only one interpretation is meaningful for a given record.
How it gets mangled. A converter that emits every redefinition produces duplicate, contradictory fields, and a downstream consumer picks the wrong one. Worse, a converter that always picks the first definition reads a claim record through the policy layout: a date becomes a code, a balance becomes a flag. Type-valid, completely incorrect.
How to verify. Confirm the chosen view is driven by the discriminator field, and round-trip: the bytes you re-encode from the selected view must equal the original bytes exactly.
05 REC-TYPE PIC X.
05 POLICY-DATA PIC X(40).
05 CLAIM-DATA REDEFINES POLICY-DATA PIC X(40).
REC-TYPE = 'C' -> CLAIM-DATA is the valid view
common-wrong : decode as POLICY-DATA anyway -> 40 bytes read with wrong layout
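A discriminator-driven selection might look like the following sketch. The function name `select_view` and the mapping are hypothetical, and the 'P' code for policy records is an assumption made for illustration: the sample above only defines 'C'. The record is assumed to be the 41 bytes of the sample layout, encoded in cp037.

```python
# Hypothetical sketch; names and the 'P' record type are assumptions.
REC_LEN = 41  # REC-TYPE (1 byte) + 40 bytes shared by both views

VIEW_BY_REC_TYPE = {
    "P": "POLICY-DATA",  # assumed code, not taken from the copybook sample
    "C": "CLAIM-DATA",   # from the sample: REC-TYPE = 'C'
}


def select_view(record: bytes) -> tuple[str, bytes]:
    """Return the one valid view name and the 40 overlaid bytes."""
    if len(record) != REC_LEN:
        raise ValueError(f"expected {REC_LEN} bytes, got {len(record)}")
    rec_type = record[0:1].decode("cp037")
    if rec_type not in VIEW_BY_REC_TYPE:
        # Fail loudly instead of falling back to the first definition.
        raise ValueError(f"unknown REC-TYPE {rec_type!r}")
    return VIEW_BY_REC_TYPE[rec_type], record[1:REC_LEN]


claim_record = "C".encode("cp037") + bytes(40)
view, payload = select_view(claim_record)
assert view == "CLAIM-DATA"
assert len(payload) == 40
```

Raising on an unknown discriminator is a deliberate choice in this sketch: defaulting to the first definition is exactly the failure the section describes.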
OCCURS 1 TO 12 DEPENDING ON — variable-length array
What it means. A repeating group whose element count is not fixed: it ranges from one to twelve, and the actual count lives in another field.
How it is stored. The record holds exactly CVG-CNT elements back to back; everything after the array shifts by CVG-CNT × element-length. The array's size, and the position of every field that follows it, depend on a value read at runtime.
How it gets mangled. A converter that ignores the controlling count grabs a fixed slice (often one element, sometimes the maximum twelve) and either drops real coverages or reads trailing bytes as phantom ones. Because the array length also sets the offset of subsequent fields, getting the count wrong corrupts the entire rest of the record, not just the array.
How to verify. Confirm the decoded element count equals the controlling field, and that the first byte after the array aligns with the next defined field. Round-trip catches the rest: a misplaced boundary breaks the hash.
05 CVG-CNT PIC 9(2).
05 COVERAGE OCCURS 1 TO 12 DEPENDING ON CVG-CNT.
10 CVG-CODE PIC X(4).
10 CVG-LIMIT PIC S9(9) COMP-3.
CVG-CNT = 08 -> 8 coverage elements present
common-wrong : read 1 element, drop 7 ; or read 12, invent 4 from later bytes
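The count-then-offset logic can be sketched as below. The function name `decode_coverages` is hypothetical, and the sketch assumes cp037 text, a CVG-CNT stored as two zoned digits, and a 9-byte element (4 bytes of CVG-CODE plus 5 bytes for S9(9) COMP-3). It returns the offset of the first byte after the array so the caller can check alignment with the next field.

```python
# Hypothetical sketch of the sample layout; not an IronParse API.
ELEMENT_LEN = 4 + 5  # CVG-CODE X(4) + CVG-LIMIT S9(9) COMP-3 (5 bytes)


def decode_coverages(record: bytes, offset: int = 0):
    """Decode CVG-CNT and exactly that many COVERAGE elements."""
    count = int(record[offset:offset + 2].decode("cp037"))
    if not 1 <= count <= 12:
        raise ValueError(f"CVG-CNT {count} outside OCCURS 1 TO 12")
    pos = offset + 2
    end = pos + count * ELEMENT_LEN
    if end > len(record):
        raise ValueError("record too short for declared CVG-CNT")
    coverages = []
    while pos < end:
        code = record[pos:pos + 4].decode("cp037")
        packed = record[pos + 4:pos + ELEMENT_LEN].hex()  # lowercase nibbles
        limit = int(packed[:-1])          # digit nibbles
        if packed[-1] == "d":             # trailing sign nibble D = negative
            limit = -limit
        coverages.append((code, limit))
        pos += ELEMENT_LEN
    return coverages, end                 # end = offset of the next field


record = (
    bytes.fromhex("F0F2")                                   # CVG-CNT = 02
    + "AUTO".encode("cp037") + bytes.fromhex("000050000C")  # element 1
    + "HOME".encode("cp037") + bytes.fromhex("000000100D")  # element 2
    + b"\xE7"                                               # next field
)
items, next_offset = decode_coverages(record)
assert items == [("AUTO", 50000), ("HOME", -100)]
assert next_offset == 20
```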
Sign handling — zoned and packed sign nibbles (overpunch)
What it means. Unless a field is declared SIGN IS SEPARATE, COBOL does not store a separate - character. In packed (COMP-3) fields the sign is the trailing low nibble: C positive, F unsigned (read as positive), D negative. In zoned-decimal (DISPLAY) fields the sign is overpunched onto the high nibble of the last digit byte.
How it is stored. A zoned -123 stores 1, 2, then a byte whose high nibble encodes "negative" and whose low nibble holds the digit 3. In EBCDIC that byte is 0xD3, which renders as the letter L.
How it gets mangled. A converter that strips the "non-numeric" last character loses the sign and the final digit, and one that keeps the digit but ignores the zone emits 123 instead of -123; a refund becomes a charge. Or it reads the overpunch byte as a literal letter and corrupts the value entirely.
How to verify. Decode the sign nibble explicitly and assert the value's sign; round-trip the field so the exact overpunch byte is reproduced.
05 ADJUST-AMT PIC S9(3) DISPLAY. *> zoned decimal
value -123 in EBCDIC : F1 F2 D3 (D3 = overpunched negative 3, prints 'L')
correct : -123
common-wrong : 123 (sign dropped) or "12L" (overpunch read as a letter)
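A zoned-decimal decoder that reads the sign explicitly might look like this sketch. The function name `decode_zoned` is hypothetical. It assumes EBCDIC input with a trailing overpunched sign (zone C positive, D negative, F unsigned) and does not handle SIGN IS LEADING or SIGN IS SEPARATE.

```python
from decimal import Decimal

# Hypothetical sketch; assumes EBCDIC zoned decimal with a trailing overpunch.


def decode_zoned(raw: bytes, scale: int = 0) -> Decimal:
    """Decode zoned decimal: low nibbles are digits, last high nibble is the sign."""
    if any(byte >> 4 != 0xF for byte in raw[:-1]):
        raise ValueError("non-digit zone before the final byte")
    digits = ""
    for byte in raw:
        digit = byte & 0x0F
        if digit > 9:
            raise ValueError(f"invalid digit nibble in byte {byte:#04x}")
        digits += str(digit)
    zone = raw[-1] >> 4
    if zone not in (0xC, 0xD, 0xF):
        raise ValueError(f"unexpected sign zone {zone:X}")
    unscaled = -int(digits) if zone == 0xD else int(digits)
    return Decimal(unscaled).scaleb(-scale)


field = bytes.fromhex("F1F2D3")
assert decode_zoned(field) == Decimal("-123")
# Treating the same bytes as text is how the letter L appears:
assert field.decode("cp037") == "12L"
```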
The implied decimal (V) and sign (S) — bytes that don't exist
What it means. In PIC S9(7)V99 the V marks where the decimal point sits and the S marks the field as signed. Neither consumes a byte.
How it is stored. It isn't: both are pure metadata in the copybook. The bytes hold only digits and, for the sign, a nibble: a dedicated trailing nibble in COMP-3, the zone nibble of the last digit in zoned DISPLAY. The decimal point and the concept of signedness exist solely in the layout description.
How it gets mangled. A converter that decodes "the bytes" without consulting the PIC clause cannot know the point is two places from the right. It emits 1234567 for a value that means 12345.67, off by a factor of 100 on every monetary field in the file. Likewise, a converter that ignores S reads the field as unsigned and turns the sign nibble into a stray digit.
How to verify. Assert the scale: a field declared with Vnn must decode with exactly nn fractional digits. Confirm the number of decoded digits equals the number of 9 positions in the PIC (nine for S9(7)V99), with no extra position absorbed from the point or the sign.
05 BALANCE PIC S9(7)V99 COMP-3. *> V and S occupy ZERO bytes
stored digits represent : 1234567 (7 + 2 = 9 digit positions)
correct : 12345.67 (V places the point 2 from the right)
common-wrong : 1234567 (point ignored: off by 100x)
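One way to keep the scale tied to the copybook rather than the bytes is to derive it from the PIC string itself. The sketch below is deliberately narrow: `parse_pic` and `apply_scale` are hypothetical names, and the parser handles only the simple numeric forms used in this guide (an optional S, runs of 9 or 9(n), and an optional V). Real PICTURE strings have more symbols than this.

```python
import re
from decimal import Decimal

# Hypothetical sketch; handles only forms like S9(7)V99, S9(9), 9(2).
_NINES = re.compile(r"9(?:\((\d+)\))?")


def _count_nines(part: str) -> int:
    """Count digit positions: 9(7) counts as 7, a bare 9 counts as 1."""
    return sum(int(m.group(1)) if m.group(1) else 1
               for m in _NINES.finditer(part))


def parse_pic(pic: str) -> dict:
    signed = pic.startswith("S")          # S is metadata, it adds no byte
    body = pic[1:] if signed else pic
    integer_part, _, fraction_part = body.partition("V")
    scale = _count_nines(fraction_part)   # V is metadata, it adds no byte
    digits = _count_nines(integer_part) + scale
    return {
        "signed": signed,
        "digits": digits,
        "scale": scale,
        "comp3_bytes": digits // 2 + 1,   # digit nibbles + one sign nibble
    }


def apply_scale(unscaled: int, pic: str) -> Decimal:
    """Place the implied point using the PIC clause, not the byte contents."""
    return Decimal(unscaled).scaleb(-parse_pic(pic)["scale"])


assert parse_pic("S9(7)V99")["comp3_bytes"] == 5
assert apply_scale(1234567, "S9(7)V99") == Decimal("12345.67")
```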
EBCDIC vs ASCII — character translation and collation
What it means. Mainframe text is EBCDIC, not ASCII. The two encodings map characters to different byte values, and they sort in a different order.
How it is stored. An A is 0xC1 in EBCDIC and 0x41 in ASCII; a space is 0x40 versus 0x20. In EBCDIC, letters sort before digits, the opposite of ASCII.
How it gets mangled. A converter that copies bytes without translating emits mojibake for every text field, and zoned fields decode wrong because EBCDIC digit and sign bytes (0xF1 for 1, 0xD3 for an overpunched negative 3) are not the ASCII ones. Packed fields fail the other way: they are binary, so a converter that runs every byte through a code-page table corrupts them. A converter that translates text but assumes ASCII collation re-sorts keys into a different order, breaking range scans, control-break logic, and any downstream join on a sorted key.
How to verify. Confirm declared DISPLAY text decodes to the expected characters under the source code page, and that any sort-dependent logic is validated against EBCDIC collation, not the platform default.
05 POLICY-NO PIC X(6) DISPLAY.
EBCDIC bytes : C1 C2 F1 F2 40 40
correct (cp037 -> text) : "AB12  "
common-wrong : "\xC1\xC2\xF1\xF2@@" (no translation)
collation : EBCDIC sorts letters < digits ; ASCII sorts digits < letters
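Both halves of this section, translation and collation, can be checked with Python's built-in `cp037` codec. The helper names below are hypothetical, and cp037 is assumed only because the sample names it; the actual source code page has to be confirmed for each file.

```python
# Hypothetical helpers; cp037 is assumed because the sample above uses it.


def decode_text(raw: bytes, codepage: str = "cp037") -> str:
    """Translate a DISPLAY text field from the source code page."""
    return raw.decode(codepage)


def sort_like_mainframe(keys, codepage: str = "cp037"):
    """Order keys by their EBCDIC byte values, not by ASCII/Unicode order."""
    return sorted(keys, key=lambda key: key.encode(codepage))


policy_no = bytes.fromhex("C1C2F1F24040")
assert decode_text(policy_no) == "AB12  "   # two trailing spaces preserved

keys = ["1A", "A1"]
assert sorted(keys) == ["1A", "A1"]                # ASCII: digits first
assert sort_like_mainframe(keys) == ["A1", "1A"]   # EBCDIC: letters first
```

Sorting on the encoded bytes is one simple way to reproduce the source ordering; whether downstream logic should keep EBCDIC order or be revalidated under the new order is a migration decision this sketch does not make.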
A checklist you can run against your own migration
For any copybook you are converting, walk the field list and confirm each of these. A single "no" is a field that may be silently corrupted.
- Every COMP-3 field decodes to a number with the correct sign and the correct number of fractional digits: not a string, not an integer 100× too large.
- Every REDEFINES view is selected by its discriminator field, and unused views are not emitted as contradictory duplicates.
- Every OCCURS ... DEPENDING ON array decodes to exactly the count in its controlling field, and the next field after the array lands on the right byte.
- Every signed field, whether packed nibble or zoned overpunch, reproduces its exact sign byte, including EBCDIC overpunch values.
- Every V scale and S sign is honored from the PIC clause, not inferred from byte contents.
- All DISPLAY text is translated under the correct source code page, and any sort-order logic is checked against EBCDIC collation.
- Above all: re-encode the decoded record and compare bytes to the original. If sha256(re-encoded) == sha256(original), every clause above decoded correctly, because one wrong nibble anywhere breaks the hash.
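The round-trip comparison in the last item can be sketched generically. The names `round_trip_ok`, `faithful_decode`, `faithful_encode`, and `lossy_decode` are hypothetical, and the toy codec covers a single cp037 text field rather than a full record; it is meant only to show the shape of the check.

```python
import hashlib

# Hypothetical sketch of the round-trip check; the codec is a toy example.


def round_trip_ok(original: bytes, decode, encode) -> bool:
    """True when re-encoding the decoded value reproduces the source bytes."""
    re_encoded = encode(decode(original))
    return (hashlib.sha256(re_encoded).hexdigest()
            == hashlib.sha256(original).hexdigest())


def faithful_decode(raw: bytes) -> str:
    return raw.decode("cp037")


def faithful_encode(text: str) -> bytes:
    return text.encode("cp037")


def lossy_decode(raw: bytes) -> str:
    # Drops the trailing padding, so the original length cannot be rebuilt.
    return raw.decode("cp037").rstrip()


source = bytes.fromhex("C1C2F1F24040")
assert round_trip_ok(source, faithful_decode, faithful_encode)
assert not round_trip_ok(source, lossy_decode, faithful_encode)
```

A matching hash shows that the decode and encode pair loses nothing. It is worth keeping the per-field assertions on scale, sign, and selected view alongside it, since a decoder and encoder that share the same wrong assumption could still reproduce the bytes.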
Stop trusting the field list. Test the bytes.
IronParse decodes each of these clauses deterministically, round-trips the record, and signs a parity receipt. Generate one against a sample record, or read the spec.