# v0.6.x → v0.7.x migration

v0.7.0 is a **BREAKING release**: the `select_threshold` string API
was removed in favor of a `ThresholdSelector` Protocol. Five new
extension Protocols ship for downstream projects.

This guide lists every change with copy-pastable before/after snippets.

## At a glance

| Change | Type |
|---|---|
| `select_threshold(criterion=str)` removed | BREAKING |
| `OperatingPoint` Literal alias removed | BREAKING (only affects callers using the alias as a type hint) |
| `select_threshold` moved from `eval_toolkit.metrics` to `eval_toolkit.thresholds` | Re-export preserves `from eval_toolkit import select_threshold`; only direct submodule imports break |
| 5 new Protocol surfaces (`ThresholdSelector`, `LeakageCheck`, `Splitter`, `DatasetLoader`, `Versioned`) | Additive |
| `evaluate(...)` gains `leakage_checks` / `on_leakage` / `on_scorer_error` parameters | Additive (defaults preserve old behavior) |
| `RunResult` gains `by_fold` / `fold_summary` / `schema_version` fields | Additive (default-empty / `"v1"`) |

## 1. The `select_threshold` migration

This is the most impactful change in the release. Every call site updates mechanically, following the table below.

| v0.6 | v0.7 |
|---|---|
| `criterion="max_f1"` | `criterion=MaxF1Selector()` |
| `criterion="recall_0.90"` | `criterion=TargetRecallSelector(0.90)` |
| `criterion="recall_0.95"` | `criterion=TargetRecallSelector(0.95)` |
| `criterion="precision@0.90"` *(local fork in some consumers)* | `criterion=TargetPrecisionSelector(0.90)` |
| `criterion="recall@0.90"` *(local fork in some consumers)* | `criterion=TargetRecallSelector(0.90)` |
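
Because the mapping is mechanical, it can be scripted. The following is a minimal sketch of a text-level rewrite, not a tool shipped with the toolkit: `legacy_to_selector` and `rewrite_source` are hypothetical helper names, and the regular expressions cover only the string forms listed in the table above. It does not add the selector imports, and it rewrites any `criterion="..."` it recognizes, so review the resulting diff by hand.

```python
import re
from typing import Optional

# Legacy prefixes from the table above, mapped to the v0.7 selector class name.
_PREFIX_TO_CLASS = {
    "recall_": "TargetRecallSelector",
    "recall@": "TargetRecallSelector",
    "precision@": "TargetPrecisionSelector",
}
_TARGET = re.compile(r"(?P<prefix>recall_|recall@|precision@)(?P<target>\d*\.?\d+)")

# Matches criterion="..." or criterion='...' (same quote on both sides).
_CALL = re.compile(r"""criterion\s*=\s*(["'])(?P<value>[^"']+)\1""")


def legacy_to_selector(value: str) -> Optional[str]:
    """Return v0.7 constructor source for a legacy string, or None if unknown."""
    if value == "max_f1":
        return "MaxF1Selector()"
    match = _TARGET.fullmatch(value)
    if match is None:
        return None
    return f"{_PREFIX_TO_CLASS[match.group('prefix')]}({match.group('target')})"


def rewrite_source(source: str) -> str:
    """Rewrite recognized criterion="..." arguments; leave everything else alone."""

    def _sub(m: "re.Match[str]") -> str:
        replacement = legacy_to_selector(m.group("value"))
        return m.group(0) if replacement is None else f"criterion={replacement}"

    return _CALL.sub(_sub, source)


before = 'tr = select_threshold(y, s, criterion="recall_0.90")'
after = rewrite_source(before)
print(after)
```

Unrecognized strings are left untouched on purpose, so a call site the script cannot map still fails loudly with the `TypeError` on v0.7 instead of being silently changed.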

### Decoding the v0.7 `TypeError`

If you call `select_threshold(y, s, criterion="max_f1")` on v0.7, you'll get:

```text
TypeError: select_threshold requires a ThresholdSelector instance (v0.7.0+);
got str='max_f1'.
Migration:
  'max_f1'      -> MaxF1Selector()
  'recall_0.90' -> TargetRecallSelector(0.90)
  'recall_0.95' -> TargetRecallSelector(0.95)
  'precision@p' -> TargetPrecisionSelector(p)
  'recall@p'    -> TargetRecallSelector(p)
See CHANGELOG v0.7.0 for the full guide.
```

### Worked example

```python
import numpy as np
from eval_toolkit import select_threshold, MaxF1Selector, TargetRecallSelector

y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.2, 0.7, 0.9, 0.3, 0.8])

# v0.6 (now broken):
# tr = select_threshold(y, s, criterion="max_f1")  # TypeError!

# v0.7 (works):
tr = select_threshold(y, s, criterion=MaxF1Selector())
print(f"max-F1 threshold = {tr.threshold:.3f}")

tr_r = select_threshold(y, s, criterion=TargetRecallSelector(0.95))
print(f"recall>=0.95 threshold = {tr_r.threshold:.3f}")
```

### Naming-only difference

If you used `criterion="recall@0.90"` (the at-sign separator from
some local forks), write `TargetRecallSelector(0.90)` instead. The
change is mostly one of naming, with one caveat: the `recall@p`
*semantics* were also standardized, so a fork that followed a
different convention may now get a different threshold. See the
[thresholds.md Pitfall](../methodology/thresholds.md#thresholds-pitfalls)
on the smallest-vs-highest-threshold-meeting-floor convention.
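
To see why the convention matters, the sketch below lists every observed score that satisfies a recall floor and picks both ends of that list. It is a standalone illustration in plain Python, not the toolkit's implementation: the helper names are hypothetical, the `score >= threshold` decision rule is an assumption, and the linked Pitfall section remains the authority on which end v0.7 chooses.

```python
def recall_at(y_true, scores, threshold):
    """Recall when predicting positive for scores >= threshold (assumed rule)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def thresholds_meeting_floor(y_true, scores, floor):
    """Observed score values whose recall is at least `floor`, ascending."""
    return [t for t in sorted(set(scores)) if recall_at(y_true, scores, t) >= floor]


# Same toy data as the worked example above.
y = [0, 0, 1, 1, 0, 1]
s = [0.1, 0.2, 0.7, 0.9, 0.3, 0.8]

candidates = thresholds_meeting_floor(y, s, floor=0.6)
smallest, highest = candidates[0], candidates[-1]
print(candidates)         # [0.1, 0.2, 0.3, 0.7, 0.8]
print(smallest, highest)  # 0.1 0.8
```

Both thresholds meet the floor, but at 0.1 every sample is flagged positive while at 0.8 only two are, so code that assumed one convention can behave very differently under the other.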

## 2. `OperatingPoint` Literal alias removed

v0.6 exposed `OperatingPoint = Literal["max_f1", "recall_0.90", "recall_0.95"]`
as a type hint. v0.7 removes it because the new
`ThresholdSelector` instance API doesn't take strings.

| v0.6 | v0.7 |
|---|---|
| `def my_fn(crit: OperatingPoint = "max_f1") -> ...` | `def my_fn(crit: ThresholdSelector = MaxF1Selector()) -> ...` |
| `from eval_toolkit.metrics import OperatingPoint` | drop the import; use `ThresholdSelector` from `eval_toolkit.thresholds` |

## 3. `select_threshold` module location

| Import | Status in v0.7 |
|---|---|
| `from eval_toolkit.metrics import select_threshold` | works (forwarded) |
| `from eval_toolkit import select_threshold` | works (always did) |
| `from eval_toolkit.thresholds import select_threshold` | works (canonical home) |

The function moved to `eval_toolkit.thresholds`. `eval_toolkit.metrics`
no longer defines it, but the package-level re-export
(`from eval_toolkit import select_threshold`) is preserved.
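
To find imports worth reviewing in a codebase, a small scanner built on the standard-library `ast` module is usually enough. This is a sketch under assumptions: `find_metrics_imports` is a hypothetical helper, the watched names are the two this guide mentions, and `other_name` in the sample is a placeholder, not a real toolkit symbol.

```python
import ast
from typing import Iterator, Tuple

# Names this guide says moved out of, or were removed from, eval_toolkit.metrics.
WATCHED = {"select_threshold", "OperatingPoint"}


def find_metrics_imports(source: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, name) for watched names imported from eval_toolkit.metrics."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.ImportFrom)
            and node.level == 0  # absolute imports only
            and node.module == "eval_toolkit.metrics"
        ):
            for alias in node.names:
                if alias.name in WATCHED:
                    yield node.lineno, alias.name


sample = (
    "from eval_toolkit.metrics import select_threshold, other_name\n"
    "from eval_toolkit import select_threshold\n"
    "from eval_toolkit.metrics import OperatingPoint\n"
)
hits = sorted(find_metrics_imports(sample))
print(hits)  # [(1, 'select_threshold'), (3, 'OperatingPoint')]
```

Switching flagged `select_threshold` imports to the package-level form keeps them independent of where the function lives.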

## 4. Adopting the new Protocols (optional but recommended)

v0.7 adds five extension surfaces that the four `prompt_injection_*`
consumer repos collectively migrated to:

- [`ThresholdSelector`](../methodology/thresholds.md) — pluggable
  threshold rules.
- [`LeakageCheck`](../methodology/leakage.md) — pluggable leakage
  validators (incl. the new `NormalizedFormLeakageCheck` for
  encoding-obfuscated dupes).
- [`Splitter`](../methodology/splits.md) — pluggable train/test
  splitting (incl. `SourceDisjointKFoldSplitter`).
- [`DatasetLoader`](../extending.md#dataset-loader) — pluggable
  named-splits dataset loading.
- [`Versioned`](../methodology/versioning.md) — opt-in per-object
  version capture for `RunManifest`.

See [`docs/extending.md`](../extending.md) for end-to-end recipes.
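
If these surfaces are `typing.Protocol` classes, as their name suggests, a downstream class satisfies one by shape alone, with no inheritance from a toolkit base class. The sketch below shows that mechanism with a hypothetical stand-in: `HasVersion` and its `version()` method are invented for illustration and are not the real `Versioned` signature, which is documented in the linked pages.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasVersion(Protocol):
    """Hypothetical stand-in for an extension Protocol (not the toolkit's own)."""

    def version(self) -> str: ...


class MyTokenizer:
    """Satisfies HasVersion structurally: it never subclasses the Protocol."""

    def version(self) -> str:
        return "2024.1"


class Unversioned:
    """Has no version() method, so it does not satisfy HasVersion."""


def describe(obj: object) -> str:
    # isinstance() against a runtime_checkable Protocol only checks that the
    # method exists; it does not validate the signature or return type.
    return obj.version() if isinstance(obj, HasVersion) else "unversioned"


print(describe(MyTokenizer()))  # 2024.1
print(describe(Unversioned()))  # unversioned
```

Static type checkers perform the stricter signature check, so annotating parameters with the Protocol type catches mismatches that the runtime `isinstance` test would miss.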

## 5. Adopting the new `evaluate(...)` parameters

`evaluate(...)` and `evaluate_folded(...)` gained three optional
parameters in v0.7. None break v0.6 callers; defaults preserve old
behavior.

```python
import numpy as np
import pandas as pd

from eval_toolkit import (
    evaluate, EvalSlice,
    NormalizedFormLeakageCheck, LabelConflictCheck,
)

class _Scorer:
    """Stub scorer that returns a constant 0.5 score for every row."""

    def predict_proba(self, X):
        return np.full(len(X), 0.5)

df = pd.DataFrame({"text": ["a", "b"], "label": [0, 1]})
slice_ = EvalSlice(name="test", df=df)

# v0.7 — inline leakage validation:
result = evaluate(
    {"s": _Scorer()},
    [slice_],
    run_id="r",
    leakage_checks=[NormalizedFormLeakageCheck(), LabelConflictCheck()],
    on_leakage="record",     # or "raise" (default) / "skip"
    on_scorer_error="raise",  # or "record"
)
```

## 6. New `RunResult` fields

`RunResult` gains three additive fields:

- `by_fold: dict[str, RunResult]` — populated by `evaluate_folded`;
  empty for non-folded runs.
- `fold_summary: dict` — auto-CV-CI summary; empty for non-folded.
- `schema_version: str = "v1"` — for downstream JSON parsers.

Existing code that reads `result.by_slice[name]["by_scorer"][...]`
keeps working unchanged.
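
A downstream JSON parser can use the same defaults to read both older and newer payloads. The sketch below is an illustration only: `load_run_result` is a hypothetical helper, the assumption that the JSON keys mirror the attribute names above is not confirmed by this guide, and treating a payload without `schema_version` as `"v1"` is likewise an assumption based on the field's default.

```python
import json


def load_run_result(payload: str) -> dict:
    """Parse a serialized run result, filling in the fields added in v0.7."""
    data = json.loads(payload)
    # Assumption: JSON keys mirror the RunResult attribute names.
    data.setdefault("schema_version", "v1")
    data.setdefault("by_fold", {})
    data.setdefault("fold_summary", {})
    if data["schema_version"] != "v1":
        # Fail loudly on schemas this parser was not written for.
        raise ValueError(f"unsupported schema_version: {data['schema_version']!r}")
    return data


old = load_run_result('{"by_slice": {"test": {"by_scorer": {}}}}')
print(old["schema_version"], old["by_fold"], old["fold_summary"])  # v1 {} {}
```

Checking `schema_version` up front means a future schema change surfaces as one clear error instead of a `KeyError` deep inside report code.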

## See also

- [`docs/migration/v0.8.md`](v0.8.md) — v0.7 → v0.8.
- The four `prompt_injection_*` repos' v0.7 migration commits — real
  worked examples of the threshold migration applied at scale.