# Migrating to v1.0

The v1.0 release is a **stability-contract activation**, not a code delta from v0.51. v1.0 ships the same fixes that landed in v0.51; what is new at v1.0 is that the [ADR 0003 Tier 1 / Tier 2 / Tier 3 stability contract](../adr/0003-stability-contract-and-gate3-methodology.md) becomes load-bearing. After v1.0, breaking changes to Tier-1 surfaces require a major bump (`v2.0`).

If you're jumping from v0.50 or earlier, **read [`v0.51.md`](v0.51.md) first**; that is where the actual migration steps live. This document recaps what is locked at v1.0 and lists what is deferred to v1.0.1.

## What v1.0 locks (ADR 0003 activation)

Per the [stability contract](../adr/0003-stability-contract-and-gate3-methodology.md), after v1.0:

- **Tier 1 STRICT** — public-API signatures captured in `tests/golden/public_api/snapshot.json`. Any signature drift requires a bump to `v2.0`.
- **Tier 2 ADDITIVE** — the 9 strict Protocols (`Scorer`, `LeakageCheck`, `Splitter`, `ThresholdSelector`, `DatasetLoader`, `MetricSpec`, `MetaLearner`, `Probe`, `TextTransform`) plus 1 opt-in Protocol (`Versioned`). Existing method shapes are frozen; new subprotocols and new Protocols may be added.
- **Tier 3 FREE** — internal modules (`_rng`, `_sweep`, `_parallel`, and the other `_*.py` files). Refactors don't need a major bump.

The four v1.0 gates (real consumer in production, Protocol-shape review cycle, multi-model methodology cross-review, Croissant interop end-to-end) are all closed; see [`roadmap.md`](../roadmap.md#v100-path-long-term-gated) for the gate ledger.

## Round 8 + Round 9 audit closure (recap)

v1.0 ships with the Round 8 + Round 9 multi-LLM cross-review batch closed:

- **Round 8** (verified against v0.50.0): 13 confirmed findings, fixed in v0.51; 3 refuted (Gemini over-confidence pattern); 2 deferred to v1.x as Tier-2 additive (custom exceptions; joblib memory-aware capping).
- **Round 9** (verified against the v0.51 RC): 6 confirmed of 10 source items, plus 3 third-audit findings that the Claude verification pass caught in modules neither auditor cited (`_sweep.py`, `bootstrap.py`, `metrics.py`). 2 candidate-blocker items are fixed in this RC (F-sweep-1: NaN/inf scorer-output validation; F-bootstrap-1: BCa degeneracy warning plus an `mde_from_ci` NaN-width guard). 4 minor findings are deferred to v1.0.1.

The full ledger is in the Round 8 + Round 9 sections of [`audit_findings.md`](../audit_findings.md).

## Carried-over deprecations

The R8-C1 `DeprecationWarning`, emitted for multi-seed `evaluate_folded(seeds=...)` calls that do not pass an explicit `reseed_splitter` callback, **persists past v1.0 by design**. The pre-v1.0 deprecation window was only one minor release (v0.51 → v1.0), while `DEPRECATION.md` requires at least 2 minors to close a deprecation cycle. The warning therefore becomes a permanent docstring and runtime nudge: single-seed callers see no change, and multi-seed callers should pass `reseed_splitter` to get true seed variance.

## Deferred to v1.0.1

The following items are filed in the [`v1.0.1 cleanup tracking issue`](https://github.com/brandon-behring/eval-toolkit/issues) (`gh issue list --label tracked --label improvement`) and will be picked up in v1.0.1. All are **Tier-2 ADDITIVE or Tier-3 FREE**; the v1.0 Tier-1 contract is not affected.

- **RC2** — `SimilarityStrategy` contract reconciliation: it is demoted in `extending.md` and the README to a "pre-v0.7 internal interface" but is still pinned in `__init__.py:__all__` and the public-API snapshot. Resolve it to a single canonical tier (3-internal vs 2-additive) and align all surfaces.
- **RC3** — harden the R8-C1 `reseed_splitter` regression test in `tests/test_harness_folded.py`: the current count-only assertions should compare fold-row indices across seeds.
- **RC4** — v0.51 documentation count ambiguity: the "13 confirmed / 3 refuted / 2 deferred" tallies appear with minor variance across `audit_findings.md`, `migration/v0.51.md`, and CHANGELOG headers. Reconcile them to a single canonical tally.
- **F-metrics-1** — `brier_score` docstring precision pass.
- **F-metrics-3** — ECE behavior on uniform / uninformative scores: clarify the docs or add boundary-condition validation.
- **F-metrics-4** — `brier_score` single-class edge-case docstring ambiguity.

Carried forward from earlier rounds:

- **R8-G3** — custom exception hierarchy beyond `ValueError` (additive Tier-2). Deferred until a downstream consumer requests it.
- **R8-G4** — joblib memory-aware `n_jobs` capping. The hazard is documented in `_parallel.py` and left to the caller; mitigation is non-trivial (RAM measurement plus DataFrame-size accounting), so it is deferred indefinitely.

## What you should do on upgrade

If your consumer is on v0.51, **no changes are needed**: v1.0 is bit-equivalent to v0.51 for all public behavior.

If your consumer is on v0.50 or earlier, follow [`v0.51.md`](v0.51.md) for the actual migration steps. The migration sequence is v0.49 → v0.50 → v0.51 → v1.0, and each step has its own guide.

If you depend on `eval-toolkit` in a downstream project, pin `>=1.0,<2.0` to opt into the stability contract. Tier-1 breaking changes after v1.0 will only land in `v2.0`.

## See also

- [`roadmap.md`](../roadmap.md) — v1.0 gate ledger and post-v1.0 forward look.
- [`audit_findings.md`](../audit_findings.md) — full Round 5 → Round 9 audit history.
- [`adr/0003-stability-contract-and-gate3-methodology.md`](../adr/0003-stability-contract-and-gate3-methodology.md) — the contract this tag locks in.
- [`MIGRATION.md`](../MIGRATION.md) — version-to-version migration index.
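Both the carried-over R8-C1 deprecation and the deferred RC3 test item turn on one point: fold sizes alone cannot show whether multi-seed runs actually re-split the data. The following is a minimal, self-contained sketch of that point, assuming a simple shuffled k-fold splitter. The helpers `shuffled_kfold` and `folds_per_seed` are hypothetical stand-ins; they are not eval-toolkit APIs, and the real `reseed_splitter` callback signature is not modeled here.

```python
# Hypothetical sketch: why multi-seed folded evaluation needs a reseeded splitter.
# `shuffled_kfold` and `folds_per_seed` are illustrative stand-ins, NOT
# eval-toolkit's Splitter Protocol or the real `reseed_splitter` callback.
import random


def shuffled_kfold(n_rows: int, k: int, seed: int) -> list[list[int]]:
    """Shuffle row indices with `seed`, then deal them round-robin into k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]


def folds_per_seed(
    seeds: list[int], n_rows: int, k: int, reseed: bool
) -> dict[int, list[list[int]]]:
    """Return the folds each seed would evaluate on.

    reseed=False: the splitter is built once (fixed seed 0) and reused for
    every seed, so only model randomness could vary between seeds.
    reseed=True: the splitter is rebuilt from each seed, which is the role
    the migration guide describes for a `reseed_splitter` callback.
    """
    return {s: shuffled_kfold(n_rows, k, seed=s if reseed else 0) for s in seeds}


seeds = [1, 2, 3]
fixed = folds_per_seed(seeds, n_rows=20, k=4, reseed=False)
reseeded = folds_per_seed(seeds, n_rows=20, k=4, reseed=True)

# A count-only check passes in both cases: every fold holds 5 rows.
for folds in [*fixed.values(), *reseeded.values()]:
    assert [len(f) for f in folds] == [5, 5, 5, 5]

# Comparing fold-row indices across seeds is what tells the two apart.
assert fixed[1] == fixed[2] == fixed[3]  # identical splits: no split variance
assert reseeded[1] != reseeded[2]  # splits actually change with the seed
```

A regression test in this spirit asserts on the fold-row indices rather than only on fold sizes; the real harness test may be structured differently.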