# v0.6.x → v0.7.x migration v0.7.0 is a **BREAKING release**: the `select_threshold` string API was removed in favor of a `ThresholdSelector` Protocol. Five new extension Protocols ship for downstream projects. This guide lists every change with copy-pastable before/after. ## At a glance | Change | Type | |---|---| | `select_threshold(criterion=str)` removed | BREAKING | | `OperatingPoint` Literal alias removed | BREAKING (only affects callers using the alias as a type hint) | | `select_threshold` moved from `eval_toolkit.metrics` to `eval_toolkit.thresholds` | Re-export preserves `from eval_toolkit import select_threshold`; only direct submodule imports break | | 5 new Protocol surfaces (`ThresholdSelector`, `LeakageCheck`, `Splitter`, `DatasetLoader`, `Versioned`) | Additive | | `evaluate(...)` gains `leakage_checks` / `on_leakage` / `on_scorer_error` parameters | Additive (defaults preserve old behavior) | | `RunResult` gains `by_fold` / `fold_summary` / `schema_version` fields | Additive (default-empty / `"v1"`) | ## 1. The `select_threshold` migration The single most-impactful change. Every call site updates mechanically. | v0.6 | v0.7 | |---|---| | `criterion="max_f1"` | `criterion=MaxF1Selector()` | | `criterion="recall_0.90"` | `criterion=TargetRecallSelector(0.90)` | | `criterion="recall_0.95"` | `criterion=TargetRecallSelector(0.95)` | | `criterion="precision@0.90"` *(local fork in some consumers)* | `criterion=TargetPrecisionSelector(0.90)` | | `criterion="recall@0.90"` *(local fork in some consumers)* | `criterion=TargetRecallSelector(0.90)` | ### Decoding the v0.7 `TypeError` If you call `select_threshold(y, s, criterion="max_f1")` on v0.7, you'll get: ```text TypeError: select_threshold requires a ThresholdSelector instance (v0.7.0+); got str='max_f1'. Migration: 'max_f1' -> MaxF1Selector() 'recall_0.90' -> TargetRecallSelector(0.90) 'recall_0.95' -> TargetRecallSelector(0.95) 'precision@p' -> TargetPrecisionSelector(p) 'recall@p' -> TargetRecallSelector(p) See CHANGELOG v0.7.0 for the full guide. ``` ### Worked example ```python import numpy as np from eval_toolkit import select_threshold, MaxF1Selector, TargetRecallSelector y = np.array([0, 0, 1, 1, 0, 1]) s = np.array([0.1, 0.2, 0.7, 0.9, 0.3, 0.8]) # v0.6 (now broken): # tr = select_threshold(y, s, criterion="max_f1") # TypeError! # v0.7 (works): tr = select_threshold(y, s, criterion=MaxF1Selector()) print(f"max-F1 threshold = {tr.threshold:.3f}") tr_r = select_threshold(y, s, criterion=TargetRecallSelector(0.95)) print(f"recall>=0.95 threshold = {tr_r.threshold:.3f}") ``` ### Naming-only difference If you used `criterion="recall@0.90"` (an at-sign separator from some local forks), it's now `TargetRecallSelector(0.90)` — same behavior under a normalized API. Note the `recall@p` *semantics* also standardized — see the [thresholds.md Pitfall](../methodology/thresholds.md#thresholds-pitfalls) on the smallest-vs-highest-threshold-meeting-floor convention. ## 2. `OperatingPoint` Literal alias removed v0.6 exposed `OperatingPoint = Literal["max_f1", "recall_0.90", "recall_0.95"]` as a type hint. v0.7 removes it because the new `ThresholdSelector` instance API doesn't take strings. | v0.6 | v0.7 | |---|---| | `def my_fn(crit: OperatingPoint = "max_f1") -> ...` | `def my_fn(crit: ThresholdSelector = MaxF1Selector()) -> ...` | | `from eval_toolkit.metrics import OperatingPoint` | drop the import; use `ThresholdSelector` from `eval_toolkit.thresholds` | ## 3. `select_threshold` module location | v0.6 | v0.7 | |---|---| | `from eval_toolkit.metrics import select_threshold` | works (forwarded) | | `from eval_toolkit import select_threshold` | works (always did) | | `from eval_toolkit.thresholds import select_threshold` | works (canonical home) | The function moved to `eval_toolkit.thresholds`; `eval_toolkit.metrics` no longer defines it but the package-level re-export (`from eval_toolkit import select_threshold`) is preserved. ## 4. Adopting the new Protocols (optional but recommended) v0.7 adds five extension surfaces that the four `prompt_injection_*` consumer repos collectively migrated to: - [`ThresholdSelector`](../methodology/thresholds.md) — pluggable threshold rules. - [`LeakageCheck`](../methodology/leakage.md) — pluggable leakage validators (incl. the new `NormalizedFormLeakageCheck` for encoding-obfuscated dupes). - [`Splitter`](../methodology/splits.md) — pluggable train/test splitting (incl. `SourceDisjointKFoldSplitter`). - [`DatasetLoader`](../extending.md#dataset-loader) — pluggable named-splits dataset loading. - [`Versioned`](../methodology/versioning.md) — opt-in per-object version capture for `RunManifest`. See [`docs/extending.md`](../extending.md) for end-to-end recipes. ## 5. Adopting the new `evaluate(...)` parameters `evaluate(...)` and `evaluate_folded(...)` gained three optional parameters in v0.7. None break v0.6 callers; defaults preserve old behavior. ```python from eval_toolkit import ( evaluate, EvalSlice, NormalizedFormLeakageCheck, LabelConflictCheck, ) import pandas as pd, numpy as np class _Scorer: def predict_proba(self, X): return np.full(len(X), 0.5) df = pd.DataFrame({"text": ["a", "b"], "label": [0, 1]}) slice_ = EvalSlice(name="test", df=df) # v0.7 — inline leakage validation: result = evaluate( {"s": _Scorer()}, [slice_], run_id="r", leakage_checks=[NormalizedFormLeakageCheck(), LabelConflictCheck()], on_leakage="record", # or "raise" (default) / "skip" on_scorer_error="raise", # or "record" ) ``` ## 6. New `RunResult` fields `RunResult` gains three additive fields: - `by_fold: dict[str, RunResult]` — populated by `evaluate_folded`; empty for non-folded runs. - `fold_summary: dict` — auto-CV-CI summary; empty for non-folded. - `schema_version: str = "v1"` — for downstream JSON parsers. Old code that does `result.by_slice[name]["by_scorer"][...]` is unchanged. ## See also - [`docs/migration/v0.8.md`](v0.8.md) — v0.7 → v0.8. - The four `prompt_injection_*` repos' v0.7 migration commits — real worked examples of the threshold migration applied at scale.