What’s new
A digestible summary of recent releases — for the full per-line detail, see CHANGELOG.md.
0.29.0 — best-practice infrastructure (May 2026)
Theme: forward-compatibility + observability + perf-regression detection. No new user-facing API surface; better support for the existing one.
Highlights
- `@deprecated` decorator for forward-compat planning. Mark public symbols with a string deadline: `@deprecated("0.31.0", reason="…", use_instead="…")`. Validated at decoration time; emits `DeprecationWarning` on every call. Policy in `docs/DEPRECATION.md`: minimum two-minor-versions warning before removal.
- Structured library logging. `eval_toolkit.{harness,leakage,bootstrap,loaders}` each emit DEBUG records (silent by default via `NullHandler` at the root). Filter granularly: `logging.getLogger("eval_toolkit.harness").setLevel(logging.DEBUG)`.
- pytest-benchmark on math kernels. Weekly cron in `nightly-benchmarks.yml` records per-kernel timings; catches perf regressions before release. Excluded from PR CI to keep iteration fast.
- `docs/RELEASING.md` runbook. Concrete release recipe with every gotcha + recovery from the v0.27.x and v0.28.x cycles.
- `.editorconfig` + `CODEOWNERS`. Repository hygiene.
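To make the decorator behaviour above concrete, here is a minimal sketch of how a decorator of this shape could work. It is a stand-in written for illustration, not eval_toolkit's implementation: the parameter name `remove_in`, the `X.Y.Z` validation rule, the warning text, and the `old_metric` function are all assumptions. Only the call shape `@deprecated("0.31.0", reason=..., use_instead=...)` and the two behaviours (validation at decoration time, a `DeprecationWarning` on every call) come from the notes above.

```python
import functools
import re
import warnings

# Assumed rule: the deadline must look like "X.Y.Z".
_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def deprecated(remove_in, *, reason, use_instead=None):
    """Hypothetical stand-in for a deadline-based deprecation decorator."""
    # Validate when the decorator is created, so a malformed deadline
    # fails at import time instead of at the first call.
    if not isinstance(remove_in, str) or not _VERSION_RE.match(remove_in):
        raise ValueError(f"remove_in must look like 'X.Y.Z', got {remove_in!r}")

    def decorator(func):
        message = (
            f"{func.__name__} is deprecated and scheduled for removal "
            f"in {remove_in}: {reason}."
        )
        if use_instead:
            message += f" Use {use_instead} instead."

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # warnings.warn runs on every call; whether each one is shown
            # depends on the active warning filters.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("0.31.0", reason="superseded", use_instead="new_metric")
def old_metric(x):
    return x * 2
```

Python's default filters hide most `DeprecationWarning`s outside `__main__`, so a test suite would typically run with `-W error::DeprecationWarning` or `warnings.simplefilter("always")` to surface them.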
Migration
No migration needed — v0.29.0 is fully backward-compatible with v0.28.x. Logging is opt-in; deprecation decorator is internal tooling, not yet applied to any public symbols.
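Opting in to the DEBUG records might look like the sketch below. It uses only the standard `logging` module. The `enable_debug` helper, the in-memory stream, and the stand-in log messages are illustrative assumptions; only the logger names come from these notes.

```python
import io
import logging


def enable_debug(logger_name, stream):
    """Attach a handler to one named logger and lower its level to DEBUG.

    Hypothetical helper for illustration; not part of eval_toolkit.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # swap in sys.stderr to print to the console
harness_log = enable_debug("eval_toolkit.harness", buffer)

# Stand-in records: in real use the library emits these, not user code.
harness_log.debug("stand-in harness record")
logging.getLogger("eval_toolkit.leakage").debug("stand-in leakage record")

captured = buffer.getvalue()
```

Because the handler sits on `eval_toolkit.harness` only, records from sibling loggers such as `eval_toolkit.leakage` never reach it, which is the granular filtering the release describes.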
0.28.1 — security patch (May 2026)
Theme: supply-chain security signal.
- CodeQL workflow (push/PR/weekly cron): static security analysis with the `security-extended` query suite. Findings populate the Security → Code scanning tab.
- pip-audit gate in `test-base-install`: fails PR CI on any known CVE in runtime deps (numpy/scipy/scikit-learn/jsonschema). Dev-extras vulns are surfaced via Dependabot.
No source-code behavior change.
0.28.0 — temporalcv cross-pollination bundle (May 2026)
Theme: cross-pollinate from sibling temporalcv + hosted docs.
- `PurgedKFoldSplitter` (`from eval_toolkit import PurgedKFoldSplitter`): time-aware k-fold with explicit purge gap + post-test embargo, preventing label-window leakage when labels have a forward horizon. Per López de Prado 2018 Ch. 7.
- `compute_label_overlap(t_train, t_test, horizon)`: standalone helper for auditing arbitrary train/test label overlap.
- Nightly Monte Carlo bootstrap CI calibration: validates empirical coverage of nominal 95% CIs across pr_auc / roc_auc / imbalanced cases. Catches “math is wrong but self-consistent” bugs that goldens can’t.
- 6-example documentation gallery: minimal sybil-validated Markdown examples covering metrics + bootstrap, evaluate harness, calibration, leakage detection, claims/gates, paired comparison.
- Hosted mkdocs-material docs site at brandon-behring.github.io/eval-toolkit. Auto-generated API reference from NumPy-style docstrings via mkdocstrings. Full LaTeX + TikZ rendering via MathJax v3 + tikzjax.
- Public-repo polish: `SECURITY.md` disclosure policy, `CITATION.cff` academic citation, 5 README badges.
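The idea behind the purge gap and embargo can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the `PurgedKFoldSplitter` or `compute_label_overlap` API: the function names, their parameters, integer time indices, and the overlap rule (two samples overlap when their indices differ by at most `horizon`) are all hypothetical.

```python
def purged_kfold_indices(n_samples, n_splits, purge_gap, embargo):
    """Yield (train, test) index lists for time-ordered samples.

    Hypothetical sketch; contiguous test blocks, purge before, embargo after.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        # Purge: drop the purge_gap samples just before the test block,
        # whose forward-looking labels would reach into it.
        before = range(0, max(0, start - purge_gap))
        # Embargo: drop the embargo samples just after the test block.
        after = range(min(n_samples, stop + embargo), n_samples)
        yield list(before) + list(after), test
        start = stop


def count_label_overlap(train, test, horizon):
    """Count training samples within `horizon` steps of any test sample."""
    return sum(1 for i in train if any(abs(i - j) <= horizon for j in test))


horizon = 2
splits = list(purged_kfold_indices(12, 3, purge_gap=horizon, embargo=horizon))
naive = list(purged_kfold_indices(12, 3, purge_gap=0, embargo=0))

purged_overlaps = [count_label_overlap(tr, te, horizon) for tr, te in splits]
naive_overlaps = [count_label_overlap(tr, te, horizon) for tr, te in naive]
```

With both gaps set to the label horizon, every fold in this toy example reports zero overlapping training samples, while the ungapped split leaks on each fold.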
0.27.2 — base-install fix (May 2026)
Theme: silent bug fix. `pip install eval-toolkit` was broken for users who didn’t also install `[dataframe]`.
- Fixed `import pandas as pd` at module top level in four modules (`harness`, `loaders`, `leakage`, `splits`) so base install works for the headline `from eval_toolkit import evaluate` path.
- Added a regression-guard CI job (`test-base-install`) that installs without extras and verifies imports.
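These notes do not say how the top-level import was removed. One common pattern for keeping an optional dependency out of module import time is to defer the import to the function that needs it, as sketched below. The helper `require_optional`, the function `to_frame`, and the error wording are hypothetical; only the `[dataframe]` extra name comes from this page.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency on demand, with an actionable error.

    Hypothetical helper for illustration; not part of eval_toolkit.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install 'eval-toolkit[{extra}]'"
        ) from exc


def to_frame(records):
    # pandas is imported only when this function is called, so importing
    # the enclosing module works on a base install without extras.
    pd = require_optional("pandas", "dataframe")
    return pd.DataFrame(records)
```

A regression guard in the spirit of `test-base-install` would then only need to import the package in an environment without extras and confirm that nothing raises.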
0.27.1 — first PyPI release (May 2026)
Theme: eval-toolkit goes public.
- First release on PyPI: `pip install eval-toolkit`.
- Auto-publish on `v*` tag via PyPI Trusted Publishing (OIDC).
- Bumped from v0.27.0 because the internal milestone tag of that version pre-dated the PyPI publishing infrastructure.
Prior releases
For the full v0.26 → v0.27 → v0.28 progression and individual release CHANGELOG entries, see CHANGELOG.md.