v1.0.x carryforward of ADR-034 T0 score-match wiring + ADR-039 gate-3 invariant-scaffold unskip; narrow supersession with explicit v1.1.x landing condition
Superseded on one or more axes by ADR-058. The body below retains its original prose per the ADR-073 immutability rule; the corrected position lives in the superseding ADR. See the Decisions index to navigate.
ADR-051: v1.0.x carryforward of ADR-034 T0 + ADR-039 gate 3; narrow supersession
Status
Accepted (2026-05-18). Block A closed at v1.0.9 via ADR-058. Block B (unskipping the 38 invariant-test stubs per ADR-039 gate 3) remains carried forward to v1.1.x.
Context
REPO_AUDIT_2026-05-18 (committed at 9ed7dd9 on 2026-05-18 morning) declared the repo “not submission-ready” against ADR-039 / Phase 5 gates. The audit’s P0 + P1 sections flagged 8 blocking surfaces. The v1.0.0 + v1.0.1 tags closed 6 of those surfaces by direct implementation:
- WRITEUP/EVIDENCE/HYPERPARAMETER_DISCLOSURE/SPEC_SHEET/NEXT_STEPS/THREAT_MODEL/REPRODUCIBILITY placeholder sweep.
- Quarto 7-spoke split + render allowlist + secrets-free render.
- CI Python 3.13 bump.
- Single-class slice filter at source (per ADR-005 + WRITEUP §Methodology caveats).
- HF Hub publish (canonical fold0/seed42 per rung; ADR-032 model card discipline; live at v1.0.1).
- ADR-049/050 frontmatter closure metadata + SPEC_SHEET rung language alignment.
- Push + green CI + green Publish + v1.0.0 tag + reviewer URLs at HTTP 200 + v0.9.0-rc1/rc2/rc3 rehearsal-tag trail.
The two remaining gaps are governance-grade: the audit explicitly said “implement OR supersede via ADR”. v1.0.0 + v1.0.1 documented carryforward intent in WRITEUP/reproducibility.md + tests/test_invariants.py module docstrings but did not write the supersession ADR. This ADR closes that loop.
Decision
Narrow supersession of ADR-034 (T0 tier) and ADR-039 (gate 3) with explicit v1.1.x landing conditions.
Block A — ADR-034 T0 score-match wiring → v1.1.x
The HF Hub publish half of T0 lands at v1.0.1 (per ADR-032 deliverable). The script body of scripts/eval_from_hub.py remains a scaffold that exits 2 with a clear message pointing at this ADR + the WRITEUP/reproducibility.md T0 maintainer note.
v1.1.x landing condition: make eval-from-hub RUNG=frozen-probe + RUNG=lora exit 0 with score-match summary within 1e-4 absolute tolerance per ADR-034 §Tier T0 §Score-match contract. The wiring is ~100 LOC across scripts/eval_from_hub.py (load + inference + score-match) and tests under tests/smoke/.
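The score-match check in the landing condition can be sketched minimally. The sketch assumes the expected scores and the scores recomputed from the Hub checkpoint are both available as mappings from metric name to float. Only the 1e-4 absolute tolerance and exit 0 on success come from this ADR. The class and function names, the metric names in the test data, and exit code 1 on mismatch are hypothetical.

```python
import math
from dataclasses import dataclass

# Absolute tolerance from ADR-034 §Tier T0 §Score-match contract, as quoted in this ADR.
ABS_TOL = 1e-4


@dataclass
class ScoreMatch:
    metric: str
    expected: float  # score recorded with the published artifact (assumed input)
    observed: float  # score recomputed from the Hub checkpoint (assumed input)

    @property
    def abs_diff(self) -> float:
        return abs(self.observed - self.expected)

    @property
    def ok(self) -> bool:
        # rel_tol=0.0 turns math.isclose into a purely absolute comparison.
        return math.isclose(self.observed, self.expected, rel_tol=0.0, abs_tol=ABS_TOL)


def score_match(
    expected: dict[str, float], observed: dict[str, float]
) -> tuple[int, list[ScoreMatch]]:
    """Compare every expected metric; return (exit_code, per-metric results).

    Exit code 1 on any mismatch is an assumption; the ADR only fixes exit 0 on success.
    """
    missing = sorted(set(expected) - set(observed))
    if missing:
        raise KeyError(f"observed scores missing metrics: {missing}")
    results = [ScoreMatch(m, expected[m], observed[m]) for m in sorted(expected)]
    exit_code = 0 if all(r.ok for r in results) else 1
    return exit_code, results
```

With rel_tol=0.0, each check reads exactly as |observed - expected| <= 1e-4, which is what an absolute-tolerance contract states.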
T1 + T3 tiers of ADR-034 unchanged. make test-smoke (T1; laptop, no GPU, no network, ~1 min) and make headline-cloud (T3; A100 80GB; ~$28; full LODO matrix re-train + re-eval) ship unchanged at v1.0.x.
Block B — ADR-039 gate 3 invariant scaffolds → v1.1.x
tests/test_invariants.py ships at v1.0.1 with:
- 10 implemented invariants (all green; sourced from Phase 1-4 artifacts: data balance, source disjointness, dedup calibration, leakage report cleanness, contamination scan cleanness, reference-scorer schema uniformity, calibration battery output shape, etc.).
- 38 scaffold stubs marked @pytest.mark.skip(reason="v1.0.0 carryforward stub — see module docstring; deferred to v1.1.x").
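To show what an implemented invariant looks like, the source-disjointness check can be written as a plain function over a mapping from split name to source IDs. This is a hypothetical sketch, not the body in tests/test_invariants.py. The input shape and the function names are assumptions.

```python
from collections.abc import Iterable, Mapping


def source_overlap(splits: Mapping[str, Iterable[str]]) -> dict[tuple[str, str], set[str]]:
    """Return every pair of splits that shares at least one source, with the shared sources."""
    names = sorted(splits)
    sources = {name: set(splits[name]) for name in names}
    overlaps: dict[tuple[str, str], set[str]] = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = sources[a] & sources[b]
            if shared:
                overlaps[(a, b)] = shared
    return overlaps


def assert_source_disjoint(splits: Mapping[str, Iterable[str]]) -> None:
    """Invariant form: fail with the offending pairs if any two splits share a source."""
    overlaps = source_overlap(splits)
    assert not overlaps, f"splits share sources: {overlaps}"
```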
The module docstring catalogues the 38 stubs into 3 buckets:
- Spec-invariant scaffolds (~20 stubs). The invariant is true by code construction (e.g., test_hyperparameter_immutability would assert the config hash matches the committed value; src/utils/config_hash.py already enforces this at runtime). The executable assertion is the better-discipline form, but the underlying invariant is already enforced.
- Reporting invariants (~5 stubs). E.g., test_reporting_completeness_assumptions_in_caveats would assert every severity ≥ medium assumption appears in WRITEUP §Methodology caveats. A manual review at v1.0.0 confirmed this; the no_emoji_check.py pre-commit hook handles the related “no emoji” rule. Executable test deferred.
- ADR-050-orphaned invariants (~13 stubs). E.g., test_full_ft_ood_predictions_complete would assert full-FT OOD predictions exist for every (fold, seed, slice) cell, but full-FT OOD was dropped per the ADR-050 X11 FUSE crash. The invariant describes a target that no longer exists; in v1.1.x these stubs either get superseded by ADR-050-aware versions or removed.
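The first two buckets can be illustrated with small pure-Python checks. This sketch is hypothetical throughout. The SHA-256-over-canonical-JSON scheme is not necessarily what src/utils/config_hash.py does. The severity scale, the Assumption type, and the ID-substring match are assumptions about how WRITEUP §Methodology caveats might be checked.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical severity scale; this ADR only names the "medium" threshold.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def config_hash(config: dict) -> str:
    """Hypothetical scheme: SHA-256 of canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hyperparameters_unchanged(config: dict, committed_hash: str) -> bool:
    """Spec-invariant shape: the live config must hash to the committed value."""
    return config_hash(config) == committed_hash


@dataclass(frozen=True)
class Assumption:
    id: str  # hypothetical ID format, e.g. "A-02"
    severity: str  # one of SEVERITY_RANK's keys


def missing_from_caveats(assumptions: list[Assumption], caveats_text: str) -> list[str]:
    """IDs of severity >= medium assumptions not mentioned in the caveats text.

    Uses a naive substring match, so "A-1" would also match inside "A-10".
    """
    threshold = SEVERITY_RANK["medium"]
    return [
        a.id
        for a in assumptions
        if SEVERITY_RANK[a.severity] >= threshold and a.id not in caveats_text
    ]
```

Sorting keys before hashing keeps the hash independent of dict insertion order, so only a real value change trips the invariant.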
v1.1.x landing condition: pytest -m unit tests/test_invariants.py returns 48 passed / 0 skipped (or N skipped, each with an explicit ADR-numbered exemption reason that survives audit).
Gates 1 + 2 + 4 + 5 + 6 of ADR-039 unchanged. All five remain valid acceptance criteria for the v1.0.x submission tag (zero [OPEN] in SPEC_SHEET; zero open rows in SPEC_GREENFIELD ledger; SUBMISSION_AUDIT.md regenerates clean; v0.9.0-rc rehearsal tag fired before v1.0.0; all three reviewer URLs return 200 — all confirmed at v1.0.0 + v1.0.1 close).
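Two of these acceptance criteria are mechanical enough to sketch: zero [OPEN] markers in SPEC_SHEET, and reviewer URLs returning HTTP 200. The functions below are hypothetical. The HEAD request and the injectable fetch function are design assumptions that keep the check testable offline.

```python
import urllib.error
import urllib.request
from collections.abc import Callable, Iterable


def open_marker_lines(text: str, marker: str = "[OPEN]") -> list[int]:
    """1-based line numbers containing the marker; an empty list means the criterion holds."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if marker in line]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Status code of a HEAD request; 4xx/5xx responses are returned, not raised."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def non_200_urls(
    urls: Iterable[str], fetch: Callable[[str], int] = http_status
) -> dict[str, int]:
    """Map each URL whose status is not 200 to that status."""
    failures: dict[str, int] = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures[url] = status
    return failures
```

urlopen follows redirects, so a URL that redirects to a 200 page counts as passing. Connection failures (URLError) propagate rather than being reported as a status.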
Consequences
- Governance: explicit, immutable record of the two carryforwards. ADR-034 + ADR-039 are not “violated” — they are narrowly superseded on the two specific axes with explicit landing conditions. The rest of both ADRs is unchanged.
- Reviewer-facing: WRITEUP/reproducibility.md T0 maintainer note (already drafted at v1.0.1) cross-references this ADR by name. A reviewer who runs make eval-from-hub and gets exit 2 sees the script’s stderr message pointing at this ADR + the maintainer note.
- Implementation: zero code or methodology changes ship with ADR-051; it is governance-only. The v1.0.2 tag = ADR-051 + decisions/README.md index update + CHANGELOG + the superseded_by field additions to the ADR-034 and ADR-039 frontmatter.
- Audit-trail: SUBMISSION_AUDIT.md regenerates via scripts/regenerate_audit.py with ADR-051 included; the closure metadata fields (closing_commit, supersedes, superseded_by) are populated post-tag in a v1.0.2 final commit (or in this ADR itself if the closing_commit is the ADR-051 commit).
Alternatives Considered
Retroactively documented per ADR-072 (2026-05-20 frontmatter + structural backfill). The three alternatives surfaced at 2026-05-18 lock time:
- Drop both T0 + invariant commitments outright. Rejected: ADR-034 + ADR-039 are methodology contracts; silently dropping them violates the SDD discipline that ADR-005 + AGENTS.md establish. The contracts need an explicit superseding record even if the implementation defers.
- Fix-forward inline before v1.0.0 tag. Rejected: the implementation surface for both blocks was substantial — Block A is ~100 LOC of T0 score-match wiring (HF Hub download + AutoModelForSequenceClassification load + score-match within 1e-4 tolerance per ADR-034 §Tier T0); Block B is 38 invariant test bodies covering class-balance + source-disjoint splits + leakage + calibration + reporting-completeness invariants per ADR-039 gate 3. Both implementations exceeded the v1.0.0 rehearsal-tag window’s available time.
- No carryforward ADR; let ADR-034 + ADR-039 stand unmet. Rejected: violates the immutability discipline ADR-005 + ADR-067 codify. An unmet methodology contract without an explicit supersession record creates an undocumented “methodology debt” that’s hard to audit. ADR-051’s existence makes the debt explicit + tracked.
The chosen path (narrow supersession + explicit v1.1.x landing conditions) preserves the methodology contract by binding it to a future close while keeping the audit trail honest.
Linked ADRs
- Superseded (narrow): ADR-034 (T0 score-match wiring axis only); ADR-039 (gate 3 axis only).
- Referenced: ADR-032 (HF Hub publication; v1.0.1 deliverable that closes the publish half of T0); ADR-046 (Phase 4 analysis bundle; defined many of the invariants now scaffolded).
- Source: decisions/audits/REPO_AUDIT_2026-05-18.md (the explicit audit finding that invited the supersession-or-implementation decision); WRITEUP/reproducibility.md (T0 maintainer note); tests/test_invariants.py (module docstring with the v1.0.0-carryforward catalog).
Transcript
Decisions surfaced during the 2026-05-18 post-v1.0.1 audit re-examination conversation. Two /exploring-options batches (4 questions each) locked the supersession scope (single ADR covering both axes; immediate v1.0.2 tag rather than v1.1.0 defer). No transcript file required — the conversation history in the v1.0.1 → v1.0.2 commit-message bodies is the audit trail.