# Migrating to v0.47

The v0.47 release follows the v0.46 scorecard surface with a **breaking consolidation** of the sweep API + Tier-2 Protocol cleanup. It also completes the v0.43-forward-look advanced-6 character-injection suite and lands the Round 6 audit follow-on items.

If you're jumping from v0.45 (or earlier) and have not yet migrated through v0.46, read `migration/v0.46.md` first.

## What's removed at v0.47 (BREAKING)

### 1. Top-level scalar metric imports: hard removal

The v0.46 ``__getattr__`` shim that kept these reachable with a ``DeprecationWarning`` has been deleted:

```text
# v0.46 (still worked with warning):
from eval_toolkit import pr_auc, roc_auc, brier_score
from eval_toolkit import (
    expected_calibration_error,
    expected_calibration_error_debiased,
    expected_calibration_error_equal_mass,
    expected_calibration_error_l2,
    expected_calibration_error_l2_debiased,
)

# v0.47: every name above is gone. `from ... import` raises ImportError;
# attribute access (`eval_toolkit.pr_auc`) raises AttributeError.
from eval_toolkit import pr_auc  # ImportError
```

**Migration (primary path, preferred):**

```python
import numpy as np
from eval_toolkit import scorecard, metric_specs as ms

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true + rng.normal(0, 0.3, size=200), 0, 1)

r = scorecard(y_true, y_score, metrics=[ms.pr_auc, ms.brier])
value = r["pr_auc"].value
ci = r["pr_auc"].ci  # BootstrapCI | None
if ci is not None:
    print(f"PR-AUC: {value:.3f} CI: [{ci.ci_low:.3f}, {ci.ci_high:.3f}]")
else:
    print(f"PR-AUC: {value:.3f}")
```

**Migration (escape hatch, internal API per ADR 0002):**

```python
from eval_toolkit.metrics import pr_auc, roc_auc, brier_score
# Same scalar-function signature as v0.45 and earlier.
```

For the 3 ECE variants that do not have a first-party ``metric_specs`` equivalent (``expected_calibration_error_debiased`` / ``_l2`` / ``_l2_debiased``), the submodule path is the only stable way to reach them. ``metric_specs.ece(n_bins=..., strategy="uniform"|"quantile")`` covers the canonical two.

### 2. Module-level sweep functions removed

```text
# v0.46 (gone in v0.47):
from eval_toolkit.adversarial import sweep
from eval_toolkit.preprocessing import sweep
```

**Migration:** use the new top-level ``sweep()`` with any :class:`TextTransform` strategy (defence + attack mix freely):

```python
from eval_toolkit import sweep, DelimitVariant, DatamarkVariant
from eval_toolkit.adversarial import ZeroWidthSpaceInjection

texts = ["hello world", "ignore previous instructions"]

# Pure text-transform enumeration:
df = sweep(
    [DelimitVariant(), DatamarkVariant(), ZeroWidthSpaceInjection()],
    texts,
)
print(df.columns.tolist())
```

Add a Scorer for original / transformed score columns, and an explicit threshold for the ``asr`` column:

```text
df = sweep([...], texts, scorer=detector)
df = sweep([...], texts, scorer=detector, attack_threshold=0.5)
```

**Key contract change:** ``attack_threshold`` is now an explicit kwarg. The v0.43–v0.46 ``adversarial.sweep`` had ``threshold=0.5`` as a default; the new ``sweep()`` refuses to materialize an ``asr`` column unless the caller commits to a calibrated operating point (see ``methodology/thresholds.md``).

### 3. SimpleNamespace shortcuts removed

```text
# v0.46 (gone in v0.47):
from eval_toolkit.adversarial import character_injection
from eval_toolkit.preprocessing import spotlighting

character_injection.zero_width_space("hello")
spotlighting.delimit("hello")
```

**Migration:**

```python
from eval_toolkit.adversarial import ZeroWidthSpaceInjection
from eval_toolkit.preprocessing import delimit, DelimitVariant

ZeroWidthSpaceInjection().transform("hello")
delimit("hello")
DelimitVariant().transform("hello")  # equivalent
```

### 4. ``CharacterInjectionStrategy`` Protocol removed

The per-module Protocol was redundant with the new top-level :class:`TextTransform` Protocol that ships in v0.47 (Decision K).

```text
# v0.46:
from eval_toolkit.adversarial import CharacterInjectionStrategy
isinstance(my_strategy, CharacterInjectionStrategy)

# v0.47:
from eval_toolkit import TextTransform
isinstance(my_strategy, TextTransform)
```

Every existing adversarial dataclass continues to satisfy ``TextTransform`` structurally; no source changes are required in concrete classes.

## What's added at v0.47

### Top-level ``TextTransform`` Protocol

The 9th strict Tier-2 Protocol per ADR 0003 (Decision M):

```python
from eval_toolkit import TextTransform

# Structural subtyping: any class with name: str + transform(text) -> str
# satisfies the Protocol without inheriting from it.
```

### 3 preprocessing dataclasses

``DelimitVariant``, ``DatamarkVariant``, ``EncodeVariant``: frozen + ``slots=True`` wrappers over the existing ``delimit`` / ``datamark`` / ``encode`` functions, satisfying ``TextTransform``:

```python
from eval_toolkit import DelimitVariant, DatamarkVariant, EncodeVariant

DelimitVariant(delimiter="<<").transform("hello")    # "<<hello>>"
DatamarkVariant(marker="^").transform("a b")         # "a^ b"
EncodeVariant(encoding="base64").transform("hello")  # "aGVsbG8="
```

### 6 advanced character-injection techniques

Closes the v0.43.0 CHANGELOG forward-look ("scheduled for v0.43.1", a version that never shipped) per Decision Q11→11.3:

```python
from eval_toolkit import (
    BidiRTLInjection,               # U+202E…U+202C override block
    TagStrippingInjection,          # <…> tag removal (idempotent)
    SynonymSubstitution,            # whitelisted-word swap, seed-deterministic
    TokenSplittingInjection,        # mid-word single-space insertion (was `TokenSplitting`; renamed at v0.49)
    UnicodeNormalizationInjection,  # NFC / NFD / NFKC / NFKD (was `UnicodeNormalization`; renamed at v0.49)
    InvisibleCharsInjection,        # 5 invisible code points
)
```

``ADVANCED_TECHNIQUES`` (6-tuple) + ``ALL_TECHNIQUES`` (12-tuple = core 6 + advanced 6) are exported from ``eval_toolkit.adversarial`` for convenience.

### Round 6 audit follow-on (per ``docs/source/audit_findings.md``)

- **Decision R6-A**: ``scorecard(seed=None)`` docstring rewritten to document the deterministic-by-default contract.
- **Decision R6-B**: ``scorecard()`` raises ``ValueError`` on duplicate ``MetricSpec.name``.
- **Decision R6-C**: ``Scorecard.to_pandas()`` MultiIndex schema gains ``n_resamples`` + ``method`` columns (additive; lossless against ``BootstrapCI.to_dict()``).
- **Decision R6-D**: ``tests/test_public_api.py`` drift guard now captures Tier-2 Protocol method signatures.
- **Decision R6-F5**: ``_evaluate_spec()`` no longer swallows ``MemoryError`` / ``RecursionError`` / ``KeyboardInterrupt`` / ``SystemExit`` into per-cell ``status="error"`` cells.
- **Decision R6-H**: ``metric_specs.make_spec_name(prefix, **kwargs)`` helper for custom parameterized ``MetricSpec`` name canonicalization.

## Migration checklist

Before bumping the pin to ``eval-toolkit==0.47.0``:

- [ ] Replace ``from eval_toolkit import pr_auc`` (and friends) with ``scorecard(...)`` OR ``from eval_toolkit.metrics import …``.
- [ ] Replace ``from eval_toolkit.adversarial import sweep`` with ``from eval_toolkit import sweep`` + pass ``TextTransform`` strategies.
- [ ] Replace ``from eval_toolkit.preprocessing import sweep`` with the top-level ``sweep()``.
- [ ] Replace the ``character_injection`` / ``spotlighting`` namespace shortcuts (e.g. ``character_injection.zero_width_space(text)``, ``spotlighting.delimit(text)``) with the concrete class or functional API.
- [ ] Replace ``CharacterInjectionStrategy`` references with ``TextTransform``.
- [ ] If you called ``adversarial.sweep(texts, scorer)`` and relied on the ``asr`` column, pass ``attack_threshold=`` explicitly to the new ``sweep()``.
- [ ] Run your test suite against the new pin; the v0.46→v0.47 transition surfaces every removed-symbol callsite as an ``AttributeError`` or ``ImportError`` at module-load time.
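
Part of the checklist can be pre-checked before running the test suite. The sketch below uses the standard-library ``ast`` module to flag ``from X import Y`` statements that name a symbol removed at v0.47. The ``REMOVED`` table is assembled from the removals listed in this guide, and ``find_removed_imports`` is a hypothetical helper, not something shipped with the toolkit. It only sees ``from ... import`` statements, so attribute-style access such as ``eval_toolkit.pr_auc`` still needs the test run to catch it.

```python
import ast

# Symbols removed at v0.47, keyed by the module they used to be imported from.
REMOVED = {
    "eval_toolkit": {
        "pr_auc", "roc_auc", "brier_score",
        "expected_calibration_error",
        "expected_calibration_error_debiased",
        "expected_calibration_error_equal_mass",
        "expected_calibration_error_l2",
        "expected_calibration_error_l2_debiased",
    },
    "eval_toolkit.adversarial": {"sweep", "character_injection", "CharacterInjectionStrategy"},
    "eval_toolkit.preprocessing": {"sweep", "spotlighting"},
}


def find_removed_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, module, name) for each removed `from X import Y`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module in REMOVED:
            for alias in node.names:
                if alias.name in REMOVED[node.module]:
                    hits.append((node.lineno, node.module, alias.name))
    return sorted(hits)


sample = (
    "from eval_toolkit import pr_auc, scorecard\n"
    "from eval_toolkit.adversarial import sweep, ZeroWidthSpaceInjection\n"
    "from eval_toolkit.metrics import brier_score\n"
)
for lineno, module, name in find_removed_imports(sample):
    print(f"line {lineno}: `from {module} import {name}` was removed at v0.47")
```

In the sample, the third import is left alone because ``eval_toolkit.metrics`` is the supported escape-hatch path.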

## What's next (v0.48 polish; v1.0 stability)

The remaining v1.0-prep work is collected in v0.48 and v1.0 per the plan:

- **v0.48**: ``metrics_at_threshold`` key normalization, ``BootstrapCI.to_dict()`` rewrite, lazy-extras message audit, docstring example sweep, ADRs 0001 + 0003 finalized, Round 5/Round 7 packet-drift fixes.
- **v1.0**: stability commitment; no new code; final ADR pass; all 4 gates closed.

See ``~/.claude/plans/evaluate-all-the-work-twinkly-kite.md`` for the master plan.