# Migrating to v0.51

The v0.51 release is the **Round 8 audit rectification batch**: it
closes out all 18 findings from the multi-LLM cross-review of v0.50.0
in a single BREAKING-allowed minor release before v1.0 is tagged. The
audit verification report `audit-verification-codex-gemini-v0.50.0.md`
is a **gitignored** local record at the repo root. It is not
committed, so there is no link. The report:

- confirmed 13 of the 18 claims;
- refuted 3 (R8-G2, R8-G5, and the paired R8-V1+R8-V2);
- deferred 2 (R8-G3, R8-G4) to v1.x as Tier-2 additive work.

v0.51 ships fixes for the 13 confirmed claims. The deferred items are
filed in the Round 8 ledger in `audit_findings.md`.

A Round 9 multi-LLM cross-review will verify the v0.51 release
candidate (RC) before v1.0 is tagged. If Round 9 comes back clean,
v1.0 follows v0.51 directly.

If you're jumping from v0.49 (or earlier) and have not migrated
through v0.50, read `migration/v0.50.md` first.

## What's BREAKING at v0.51

### 1. R8-C3: `recall_at_fpr` fallback sentinel

When no threshold satisfies `target_fpr`, the fallback now returns
the sentinel `RecallAtFprResult(threshold=np.inf, recall=0.0,
actual_fpr=0.0, fp=0, tn=n_val_neg)`. An infinite threshold predicts
no positives, so the FPR ceiling cannot be violated.

Pre-v0.51 the fallback returned `threshold=1.0` and then computed
`y_pred = (y_score >= 1.0)`. This silently classified every
negative-class sample with score 1.0 as positive. The reported
`actual_fpr` could therefore exceed `target_fpr`; in the example
below it is 1.0.

**Before v0.51 (buggy):**

```text
recall_at_fpr(y=[0,1], scores=[1.0,1.0], target_fpr=0.0)
# returned actual_fpr=1.0, fp=1 (violates ceiling)
```

**v0.51 (correct):**

```text
recall_at_fpr(y=[0,1], scores=[1.0,1.0], target_fpr=0.0)
# returns RecallAtFprResult(threshold=np.inf, actual_fpr=0.0, fp=0, ...)
```

Migration: any caller that filters on `result.threshold` should add
an `np.isinf(result.threshold)` branch. Pre-v0.51 the unsatisfiable
case was signalled by `threshold=1.0`. From the result alone, that
value looks the same as a genuine threshold of 1.0.
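The sketch below shows the fallback rule and the caller-side branch together. It is a minimal sketch, not the library's implementation. Several parts are assumptions:

- `recall_at_fpr_sketch` and `usable_threshold`;
- the `RecallAtFprResult` stand-in;
- using the unique observed scores as candidate thresholds.

Only the sentinel's field values come from this guide.

```python
from typing import NamedTuple

import numpy as np


class RecallAtFprResult(NamedTuple):
    """Hypothetical stand-in with the field names used in this guide."""

    threshold: float
    recall: float
    actual_fpr: float
    fp: int
    tn: int


def recall_at_fpr_sketch(y, scores, target_fpr):
    """Best recall over candidate thresholds whose FPR stays <= target_fpr."""
    y = np.asarray(y, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    best = None
    for t in np.unique(scores):  # assumption: candidates are the observed scores
        pred = scores >= t
        fp, tp = int((pred & ~y).sum()), int((pred & y).sum())
        fpr = fp / n_neg if n_neg else 0.0
        recall = tp / n_pos if n_pos else 0.0
        if fpr <= target_fpr and (best is None or recall > best.recall):
            best = RecallAtFprResult(float(t), recall, fpr, fp, n_neg - fp)
    if best is None:
        # v0.51-style sentinel: +inf predicts no positives, so fp == 0.
        return RecallAtFprResult(np.inf, 0.0, 0.0, 0, n_neg)
    return best


def usable_threshold(result):
    """Caller-side branch: None means target_fpr was unsatisfiable."""
    if np.isinf(result.threshold):
        return None
    return result.threshold


res = recall_at_fpr_sketch([0, 1], [1.0, 1.0], target_fpr=0.0)
print(res)  # RecallAtFprResult(threshold=inf, recall=0.0, actual_fpr=0.0, fp=0, tn=1)
print(usable_threshold(res))  # None
```

Returning `None` for the sentinel is one possible caller policy, not a library convention. The key point is that `threshold == 1.0` no longer means "unsatisfiable".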

### 2. R8-C4a: Generator-rng parallel stability in `_score_all_slices`

`harness.evaluate(..., rng=np.random.default_rng(N), n_jobs=2)` now
produces bit-identical results to `n_jobs=1`.

Pre-v0.51 the same rng object was attached to every `(slice, scorer)`
work unit. Each joblib worker received a copy at the SAME generator
state, so every work unit drew an identical bootstrap sample stream.
The result was silent non-independence across pairs and divergence
from sequential mode.

v0.51 spawns one independent `SeedSequence` per work unit at the
dispatch boundary.

Integer-`rng` callers (the common case) are unaffected. Callers
passing `Generator` instances now get reproducible results across
`n_jobs` per the SPEC 7 contract.
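You can reproduce both the failure mode and the fix with NumPy alone. This sketch runs the "workers" sequentially instead of through joblib, and `copy.deepcopy` stands in for the copy each worker process receives. The names `bootstrap_means`, `child_rngs`, `run_units`, and the work-unit tuples are hypothetical stand-ins.

```python
import copy

import numpy as np


def bootstrap_means(data, rng, n_boot=4):
    """Bootstrap means of `data`, drawing resample indices from `rng`."""
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    return data[idx].mean(axis=1)


def child_rngs(rng, n):
    """Hypothetical dispatch helper: one independent child Generator per work unit."""
    entropy = rng.integers(0, 2**63 - 1, size=n)
    return [np.random.default_rng(np.random.SeedSequence(int(e))) for e in entropy]


def run_units(data, n_units, order, master_seed=7):
    """Spawn child seeds once, before dispatch, then run units in any order."""
    rngs = child_rngs(np.random.default_rng(master_seed), n_units)
    return {i: bootstrap_means(data, rngs[i]) for i in order}


data = np.arange(20, dtype=float)
units = [("slice_a", "auroc"), ("slice_a", "ece"), ("slice_b", "auroc")]

# Pre-v0.51 pattern: each worker gets a copy of the SAME generator state.
master = np.random.default_rng(7)
buggy = [bootstrap_means(data, copy.deepcopy(master)) for _ in units]
print(all(np.array_equal(buggy[0], b) for b in buggy))  # True: identical streams

# v0.51 pattern: per-unit seeds are fixed before dispatch, so the schedule is irrelevant.
in_order = run_units(data, len(units), order=[0, 1, 2])
reordered = run_units(data, len(units), order=[2, 0, 1])  # a different "worker schedule"
print(all(np.array_equal(in_order[i], reordered[i]) for i in in_order))  # True
```

The key design point is that seeding happens at dispatch, indexed by work unit. As a result, the mapping from unit to random stream does not depend on how many workers run or in what order they finish.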

### 3. R8-C4b: `spawn_seed_sequences` respects Generator state

`_rng.spawn_seed_sequences(rng, n)` now draws fresh entropy from
the generator via `rng.integers(0, 2**63 - 1, size=n)` and wraps
each draw in a `SeedSequence`. Each call advances the generator's
state, so repeated calls on the same instance yield different
children.

Pre-v0.51 the function extracted `bit_generator.seed_seq` and called
`.spawn(n)` on it. As a result, any prior advancement of the
Generator was ignored.

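Most callers (those passing fresh generators) see no semantic change.
However, the concrete child seeds, and therefore the exact bootstrap
draws, differ from v0.50. Callers who advanced the rng before passing
it now get the semantically correct, different children.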
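You can check the v0.51 semantics in isolation. `spawn_seed_sequences_sketch` is a hypothetical restatement of the behavior described above, not the private `_rng` function itself.

```python
import numpy as np


def spawn_seed_sequences_sketch(rng, n):
    """Draw fresh entropy from the Generator (advancing its state) and
    wrap each draw in a SeedSequence."""
    draws = rng.integers(0, 2**63 - 1, size=n)
    return [np.random.SeedSequence(int(d)) for d in draws]


rng = np.random.default_rng(42)
first = spawn_seed_sequences_sketch(rng, 3)
second = spawn_seed_sequences_sketch(rng, 3)  # state has advanced: new children

fresh = spawn_seed_sequences_sketch(np.random.default_rng(42), 3)
advanced_rng = np.random.default_rng(42)
advanced_rng.random()  # caller consumes a draw before passing the generator on
advanced = spawn_seed_sequences_sketch(advanced_rng, 3)

print([s.entropy for s in fresh] == [s.entropy for s in first])  # True: reproducible
```

The same seed and the same generator state give the same children. Any advancement of the generator, including a previous spawn, changes them.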

### 4. R8-C2: `SourceDisjointKFoldSplitter` k-cap

`iter_folds(...)` now caps the fold count at `min(self.k,
n_sources)` (matching `get_n_splits(...)`). Pre-v0.51 the loop ran
`range(self.k)` and yielded EMPTY test partitions when
`k > n_sources` while `get_n_splits` returned `min(k, n_sources)` —
the two methods silently disagreed. A `UserWarning` is emitted when
the cap fires.

Callers that consumed surplus empty-test folds (which was the bug)
will see fewer iterations now.
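The corrected contract is that `iter_folds` and `get_n_splits` agree. The sketch below illustrates that contract and is not the library's implementation. It assumes folds are built by shuffling the unique source IDs and splitting them with `np.array_split`. The class name and the fold-assignment strategy are hypothetical.

```python
import warnings

import numpy as np


class SourceDisjointKFoldSketch:
    """Hypothetical stand-in for the splitter's fold-count logic."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed

    def get_n_splits(self, sources):
        return min(self.k, len(np.unique(sources)))

    def iter_folds(self, sources):
        sources = np.asarray(sources)
        unique = np.unique(sources)
        n_folds = self.get_n_splits(sources)  # same cap as get_n_splits
        if n_folds < self.k:
            warnings.warn(
                f"k={self.k} exceeds n_sources={len(unique)}; using {n_folds} folds",
                UserWarning,
            )
        shuffled = np.random.default_rng(self.seed).permutation(unique)
        # Each fold's test set is a disjoint group of whole sources.
        for test_sources in np.array_split(shuffled, n_folds):
            test_mask = np.isin(sources, test_sources)
            yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


sources = ["a", "a", "b", "c", "c", "c"]
splitter = SourceDisjointKFoldSketch(k=5)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    folds = list(splitter.iter_folds(sources))
print(len(folds), splitter.get_n_splits(sources))  # 3 3
print(all(len(test) > 0 for _, test in folds))  # True
```

After upgrading, a loop that expected `self.k` iterations should compare against `get_n_splits(...)` instead.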

## What's Added at v0.51 (additive)

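### R8-C1: `reseed_splitter` callback on `evaluate_folded`

`evaluate_folded` gains an optional `reseed_splitter(splitter, seed)`
callback. It returns a re-seeded splitter for each entry in `seeds`,
so each seed can draw its own fold assignment:
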
```text
from dataclasses import replace
evaluate_folded(
    scorers, splitter, slice_,
    seeds=(1, 2, 3),
    reseed_splitter=lambda sp, s: replace(sp, seed=s),
    ...
)
```

The default `None` preserves the historical replay-folds behavior
(every seed reuses the same folds). It also emits a
`DeprecationWarning` whenever `len(seeds) > 1`.

Note: the warning persists past v1.0. The pre-v1.0 deprecation window
spans only one minor release, and `DEPRECATION.md` requires at least
two to close a deprecation cycle.
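You can illustrate the difference between the default replay behavior and a reseeding callback without the harness. Everything below is hypothetical:

- `ToySplitter` assumes the real splitter is a dataclass with a `seed` field, which the `replace(sp, seed=s)` callback above requires.
- `folds_per_seed` only mimics the per-seed loop and warning described in this section.

```python
import warnings
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class ToySplitter:
    """Hypothetical splitter: a frozen dataclass with a `seed` field."""

    n_folds: int = 3
    seed: int = 0

    def fold_ids(self, n_items):
        # Assign each item to a fold using only this splitter's own seed.
        return np.random.default_rng(self.seed).integers(0, self.n_folds, size=n_items)


def folds_per_seed(splitter, n_items, seeds, reseed_splitter=None):
    """Hypothetical per-seed loop mimicking the behavior described above."""
    if reseed_splitter is None and len(seeds) > 1:
        warnings.warn(
            "every seed replays the same folds; pass reseed_splitter to vary them",
            DeprecationWarning,
            stacklevel=2,
        )
    out = {}
    for s in seeds:
        sp = splitter if reseed_splitter is None else reseed_splitter(splitter, s)
        out[s] = sp.fold_ids(n_items)
    return out


base = ToySplitter(seed=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)  # silence for the demo
    replayed = folds_per_seed(base, 20, seeds=(1, 2, 3))
reseeded = folds_per_seed(
    base, 20, seeds=(1, 2, 3), reseed_splitter=lambda sp, s: replace(sp, seed=s)
)
print(np.array_equal(replayed[1], replayed[2]))  # True: folds replayed
print(np.array_equal(reseeded[1], reseeded[2]))
```

With `replace`, the callback stays a one-liner and leaves the caller's original splitter untouched.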

### R8-C6 / F1 / F2 / F3: validation rigor

These changes are additive: invalid inputs now fail earlier, with
clearer diagnostics.

- `calibration.reliability_curve` and `maximum_calibration_error`
  validate `y_score ∈ [0, 1]`, matching the ECE validation already
  done in `metrics.py`.
- `calibration.fit_temperature` validates the `bounds` tuple
  (finite, positive, and `lo < hi`).
- `losses.RecallAtLowFPR` validates `pos_weight > 0` at construction.
- `metric_specs.ece(n_bins=)` validates `n_bins` eagerly at spec
  construction, matching the existing eager validation of `strategy`.
- `analysis.CsvPredictionReader` detects missing CSV columns at read
  time. It raises an actionable `ValueError` instead of letting a
  cryptic dtype error surface downstream.
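Each bullet above follows the same pattern: check at the entry point, then raise a `ValueError` that names the offending input. The sketch below shows the kinds of checks described and is hypothetical throughout: the helper names, the messages, and the `y_true`/`y_score` column names. The library's exact rules may differ, for example on whether `n_bins` must be an `int`.

```python
import csv
import io
import math

import numpy as np


def check_unit_interval(y_score):
    """Reject scores outside [0, 1]; NaN also fails both comparisons."""
    arr = np.asarray(y_score, dtype=float)
    if not np.all((arr >= 0.0) & (arr <= 1.0)):
        raise ValueError("y_score must lie in [0, 1]")
    return arr


def check_bounds(bounds):
    """Require a (lo, hi) pair that is finite, positive, and has lo < hi."""
    lo, hi = bounds
    if not (math.isfinite(lo) and math.isfinite(hi)) or lo <= 0 or lo >= hi:
        raise ValueError(f"bounds must be finite and positive with lo < hi; got {bounds!r}")
    return lo, hi


def check_positive(name, value):
    """Eager check for a strictly positive parameter such as pos_weight."""
    if not value > 0:  # also rejects NaN
        raise ValueError(f"{name} must be > 0; got {value!r}")
    return value


def check_n_bins(n_bins):
    """Assumed rule: n_bins must be an int >= 1 (bool excluded)."""
    if isinstance(n_bins, bool) or not isinstance(n_bins, int) or n_bins < 1:
        raise ValueError(f"n_bins must be a positive int; got {n_bins!r}")
    return n_bins


def check_columns(csv_text, required):
    """Fail at read time, naming every missing column."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [c for c in required if c not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    return header


try:
    check_columns("y_true,score\n1,0.9\n", required=("y_true", "y_score"))
except ValueError as exc:
    print(exc)  # CSV is missing required columns: ['y_score']
```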

## What's Fixed at v0.51 (docs / structure)

- **R8-C5**: README links repointed from `docs/` (broken) to
  `docs/source/` (correct). The migration toctree now includes the
  v0.49, v0.50, and v0.51 entries.
- **R8-C8**: `SimilarityStrategy` is demoted in the README and
  `extending.md` from "Tier-2 strict" to "pre-v0.7 internal
  interface". This matches the list of nine strict protocols in
  `docs/source/api/strict_tier2_protocols.md`.
- **R8-G1**: `repo-strategy.md` gains a supersession note pointing at
  ADR 0001 (flat-module-layout).
- **R8-C9**: the `claims.GateResult.to_dict` docstring now documents
  its JSON-safety contract. Strict-JSON output requires
  `artifacts.write_json_strict` or an explicit `sanitize_for_json`
  call.
- **R8-C10**: `.gitignore` audit-artifact patterns extended to cover
  `codex-comprehensive-audit-*`, `audit-verification-*`, and the
  per-LLM report aliases.
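The R8-C9 contract matters because Python's standard `json` module emits `NaN` and `Infinity` by default, and strict JSON parsers reject them. The sketch below demonstrates the hazard. `sanitize_for_json_sketch` is a hypothetical stand-in, and mapping non-finite floats to `null` is an assumed policy, not necessarily what `sanitize_for_json` does.

```python
import json
import math


def sanitize_for_json_sketch(obj):
    """Hypothetical sanitizer: replace non-finite floats with None, recursively."""
    # np.float64 subclasses float, so NumPy scalars are covered too.
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: sanitize_for_json_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize_for_json_sketch(v) for v in obj]
    return obj


payload = {"metric": "recall_at_fpr", "threshold": float("inf"), "ci": [0.8, float("nan")]}

print(json.dumps(payload))  # {"metric": "recall_at_fpr", "threshold": Infinity, "ci": [0.8, NaN]}
try:
    json.dumps(payload, allow_nan=False)  # strict mode refuses non-finite floats
except ValueError as exc:
    print("strict encoding failed:", exc)
print(json.dumps(sanitize_for_json_sketch(payload), allow_nan=False))
# {"metric": "recall_at_fpr", "threshold": null, "ci": [0.8, null]}
```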

## Deferred to v1.x

- **R8-G3** (custom exception hierarchy beyond `ValueError`): additive
  Tier-2 change; deferred until a downstream consumer requests it.
- **R8-G4** (memory-aware `n_jobs` capping in the joblib path): the
  hazard is documented at `_parallel.py:55-59`. Mitigating it requires
  measuring available RAM and accounting for dataframe sizes, which is
  non-trivial. Until then, callers are responsible for sizing `n_jobs`
  to fit their RAM budget.
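Until R8-G4 lands, a caller can size `n_jobs` with a rough budget calculation. The sketch below is a hypothetical heuristic, not library behavior. It assumes each worker holds its own copy of the inputs, and it inflates that footprint by a fudge factor to cover intermediate copies. For a pandas DataFrame, `df.memory_usage(deep=True).sum()` gives a comparable per-copy estimate.

```python
import os

import numpy as np


def suggest_n_jobs(per_worker_bytes, ram_budget_bytes, overhead=2.0, max_jobs=None):
    """Hypothetical heuristic: cap workers so that n_jobs * per-worker footprint
    (inflated by `overhead` for intermediate copies) fits in the RAM budget."""
    max_jobs = max_jobs or os.cpu_count() or 1
    per_worker = max(1, int(per_worker_bytes * overhead))
    return max(1, min(max_jobs, ram_budget_bytes // per_worker))


# Assumption: each worker receives its own copy of the prediction arrays.
y_score = np.zeros(5_000_000, dtype=np.float64)  # 40,000,000 bytes
y_true = np.zeros(5_000_000, dtype=np.int8)      # 5,000,000 bytes
footprint = y_score.nbytes + y_true.nbytes
print(suggest_n_jobs(footprint, ram_budget_bytes=200 * 1024**2, max_jobs=8))  # 2
```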

## Notes

- Six tests in `test_bootstrap_calibration_mc.py` already fail on
  `origin/main`. The failures come from a v0.50 SPEC-7 migration gap
  in the `_generate_population` helper and are unrelated to the v0.51
  audit batch. They will be picked up in Round 9 prep.
- The Round 9 multi-LLM cross-review will run against the v0.51 RC.
  Its briefing, `gate3-audit-round-9.md`, will be authored at
  release-prep time.