Worked example: plot_pareto_frontier — cost vs performance#
What this shows. Cost-vs-performance scatter with the Pareto frontier (non-dominated points) overlaid. Dominated points are shown in a muted color. Use case: multi-rung evaluation where each rung has a different (compute_cost, AUPRC) tradeoff and you need to surface the defensible operating points. Shipped in v0.33.0 (closes upstream issue #15).
Runtime: <1 s. Requires the [plotting] extra.
Setup#
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from eval_toolkit import plot_pareto_frontier
Synthetic data: rungs with cost/performance tradeoff#
Build a scenario where 8 model rungs trade off compute cost against AUPRC — some dominated, some on the frontier:
# (cost, auprc) per rung — cost in arbitrary units.
cost = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 10.0, 15.0, 25.0])
auprc = np.array([0.60, 0.70, 0.71, 0.78, 0.82, 0.81, 0.85, 0.86])
labels = [f"rung_{i}" for i in range(len(cost))]
Only rung 5 (cost=10, auprc=0.81) is strictly dominated: rung 4 (cost=8, auprc=0.82) gives higher AUPRC at lower cost. Rung 2 (cost=3, auprc=0.71) stays on the frontier under strict Pareto dominance, because rung 1 (cost=2, auprc=0.70) is cheaper but scores 0.01 lower; rung 2 is a weak operating point rather than a dominated one. The Pareto frontier highlights this.
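To check which rungs are dominated without relying on the plot, here is a minimal sketch of a strict Pareto-dominance test in plain NumPy. The helper name pareto_mask is hypothetical and is not part of eval_toolkit; whether the library applies exactly this definition (for example, with no tolerance) is an assumption.

```python
import numpy as np

def pareto_mask(cost, metric, higher_metric_is_better=True):
    """Hypothetical helper: True where a point is non-dominated."""
    cost = np.asarray(cost, dtype=float)
    metric = np.asarray(metric, dtype=float)
    # Flip the metric when lower is better, so "larger score wins" always holds.
    score = metric if higher_metric_is_better else -metric
    mask = np.ones(len(cost), dtype=bool)
    for i in range(len(cost)):
        # Point j dominates i if it is no worse on both axes
        # and strictly better on at least one of them.
        no_worse = (cost <= cost[i]) & (score >= score[i])
        strictly_better = (cost < cost[i]) | (score > score[i])
        mask[i] = not np.any(no_worse & strictly_better)
    return mask

cost = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 10.0, 15.0, 25.0])
auprc = np.array([0.60, 0.70, 0.71, 0.78, 0.82, 0.81, 0.85, 0.86])

mask = pareto_mask(cost, auprc)
dominated = np.flatnonzero(~mask).tolist()
print(dominated)  # [5]
```

Under this strict definition, a point that is merely "nearly as good" at lower cost does not dominate; it must be at least as good on both axes and strictly better on one.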
Basic frontier#
fig = plot_pareto_frontier(
    cost, auprc,
    point_labels=labels,
    title="AUPRC vs compute cost (per-rung)",
)
plt.close(fig)
The frontier line connects the non-dominated points; dominated points are
plotted in PALETTE["baseline"] (muted), so the visual contrast makes the
defensible points obvious at a glance.
With caller-managed ax#
fig, ax = plt.subplots(figsize=(7, 5))
plot_pareto_frontier(
    cost, auprc,
    point_labels=labels,
    ax=ax,
    title="Cost-vs-AUPRC with frontier overlay",
)
plt.close(fig)
Lower-metric-is-better case#
By default higher_metric_is_better=True (higher metric = preferred).
For metrics where lower is better (latency, calibration error, etc.) flip
the flag:
latency_ms = np.array([10.0, 15.0, 25.0, 50.0, 80.0])
ece = np.array([0.08, 0.05, 0.04, 0.03, 0.04]) # lower ECE = better calibration
fig = plot_pareto_frontier(
    latency_ms, ece,
    point_labels=[f"calib_v{i}" for i in range(5)],
    higher_metric_is_better=False,  # lower ECE = better
    title="ECE vs latency (lower = better both axes)",
)
plt.close(fig)
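As a sanity check on the ECE example, the frontier for a lower-is-better metric can be sketched as a sort-and-sweep: order points by cost, then keep each point whose metric beats the best seen so far. This is an illustrative NumPy sketch, not the library's implementation.

```python
import numpy as np

latency_ms = np.array([10.0, 15.0, 25.0, 50.0, 80.0])
ece = np.array([0.08, 0.05, 0.04, 0.03, 0.04])

# np.lexsort uses the LAST key as the primary key: sort by latency,
# breaking latency ties by the lower (better) ECE.
order = np.lexsort((ece, latency_ms))

best_so_far = np.inf
frontier = []
for idx in order:
    # Keep a point only if it strictly improves on every cheaper point.
    if ece[idx] < best_so_far:
        frontier.append(int(idx))
        best_so_far = ece[idx]

print(frontier)  # [0, 1, 2, 3]
```

Here calib_v4 (80 ms, ECE 0.04) drops out because calib_v2 reaches the same ECE at 25 ms. Because the comparison is strict, this sweep keeps only the first of two identical points, which is one way a first-wins tie-break can arise.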
Common pitfalls#
Cost is always lower-is-better: the function’s signature is (cost, metric); cost is always treated as “lower is better”. To invert the cost axis, negate the input array.
point_labels length must match arrays: a ValueError is raised if len(point_labels) != len(cost). Pass None to omit annotations.
Tied points: when two rungs have identical (cost, metric), the frontier-membership tie-break is index order (first-wins). Visually imperceptible; semantically: both are on the frontier.
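The first two pitfalls can be caught on the caller side before plotting. The sketch below uses a hypothetical helper, prepare_inputs, which is not part of eval_toolkit: it negates an x-axis quantity where higher is better (throughput here, an invented example) and checks the label length up front.

```python
import numpy as np

def prepare_inputs(x, metric, point_labels=None, x_lower_is_better=True):
    """Hypothetical caller-side helper; not part of eval_toolkit."""
    x = np.asarray(x, dtype=float)
    metric = np.asarray(metric, dtype=float)
    if x.shape != metric.shape:
        raise ValueError("x and metric must have the same shape")
    if point_labels is not None and len(point_labels) != len(x):
        raise ValueError("point_labels length must match the arrays")
    # The cost argument is treated as lower-is-better, so negate
    # any quantity where higher is better.
    cost = x if x_lower_is_better else -x
    return cost, metric, point_labels

throughput = np.array([100.0, 250.0, 400.0])  # higher is better
auprc = np.array([0.86, 0.82, 0.70])

cost, metric, labels = prepare_inputs(
    throughput, auprc, ["a", "b", "c"], x_lower_is_better=False
)
print(cost.tolist())  # [-100.0, -250.0, -400.0]
```

The returned arrays can then be passed as the (cost, metric) pair. Negated values will presumably appear as negative numbers on the x-axis, so relabeling the axis may be needed.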
See also#
plot_metric_bars() for the per-rung headline comparison (when cost is not the secondary axis)
The consumer’s Phase 4 F1 figure (per ADR-046 Q6) is the prototypical use case driving this primitive