Worked example: plot_pareto_frontier — cost vs performance#

What this shows. Cost-vs-performance scatter with the Pareto frontier (non-dominated points) overlaid. Dominated points are shown in a muted color. Use case: multi-rung evaluation where each rung has a different (compute_cost, AUPRC) tradeoff and you need to surface the defensible operating points. Shipped in v0.33.0 (closes upstream issue #15).

Runtime: <1 s. Requires the [plotting] extra.

Setup#

import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from eval_toolkit import plot_pareto_frontier

Synthetic data: rungs with cost/performance tradeoff#

Build a scenario where 8 model rungs trade off compute cost against AUPRC — some dominated, some on the frontier:

# (cost, auprc) per rung — cost in arbitrary units.
cost = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 10.0, 15.0, 25.0])
auprc = np.array([0.60, 0.70, 0.71, 0.78, 0.82, 0.81, 0.85, 0.86])
labels = [f"rung_{i}" for i in range(len(cost))]

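Under strict Pareto dominance, only rung 5 (cost=10, auprc=0.81) is dominated: rung 4 (cost=8, auprc=0.82) gives higher AUPRC at lower cost. Rung 2 (cost=3, auprc=0.71) is a near miss: rung 1 (cost=2, auprc=0.70) gives nearly the same AUPRC at lower cost, but rung 2's AUPRC is still 0.01 higher, so it is not strictly dominated. The Pareto frontier makes the first case visible; the second remains a judgment call about whether the marginal gain justifies the extra cost.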
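
To check which rungs are dominated without plotting, a strict dominance test can be sketched in plain NumPy. The helper `pareto_mask` below is hypothetical and is not part of eval_toolkit; it assumes the textbook definition (another point is no worse on both axes and strictly better on at least one), and whether plot_pareto_frontier applies exactly this rule is not confirmed here.

```python
import numpy as np


def pareto_mask(cost, metric, higher_metric_is_better=True):
    """Return a boolean mask that is True for non-dominated points.

    Hypothetical helper for illustration; not part of eval_toolkit.
    """
    cost = np.asarray(cost, dtype=float)
    metric = np.asarray(metric, dtype=float)
    # Orient the metric so that larger is always better.
    score = metric if higher_metric_is_better else -metric
    # no_worse[i, j]: point j is at least as good as point i on both axes.
    no_worse = (cost[None, :] <= cost[:, None]) & (score[None, :] >= score[:, None])
    # better[i, j]: point j is strictly better than point i on at least one axis.
    better = (cost[None, :] < cost[:, None]) | (score[None, :] > score[:, None])
    # Point i is dominated if any j satisfies both conditions.
    dominated = (no_worse & better).any(axis=1)
    return ~dominated


cost = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 10.0, 15.0, 25.0])
auprc = np.array([0.60, 0.70, 0.71, 0.78, 0.82, 0.81, 0.85, 0.86])

mask = pareto_mask(cost, auprc)
dominated_rungs = np.flatnonzero(~mask).tolist()
print(dominated_rungs)  # [5]
```

Under this strict definition only rung 5 is flagged. Rung 2 stays non-dominated because its AUPRC (0.71) is marginally above rung 1's (0.70), even though it costs more.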

Basic frontier#

fig = plot_pareto_frontier(
    cost, auprc,
    point_labels=labels,
    title="AUPRC vs compute cost (per-rung)",
)
plt.close(fig)

The frontier line connects the non-dominated points; dominated points are plotted in PALETTE["baseline"] (muted), so the visual contrast makes the defensible points obvious at a glance.

With caller-managed ax#

fig, ax = plt.subplots(figsize=(7, 5))
plot_pareto_frontier(
    cost, auprc,
    point_labels=labels,
    ax=ax,
    title="Cost-vs-AUPRC with frontier overlay",
)
plt.close(fig)

Lower-metric-is-better case#

By default, higher_metric_is_better=True, so a higher metric value is preferred. For metrics where lower is better (calibration error, loss, etc.), set the flag to False:

latency_ms = np.array([10.0, 15.0, 25.0, 50.0, 80.0])
ece = np.array([0.08, 0.05, 0.04, 0.03, 0.04])  # lower ECE = better calibration

fig = plot_pareto_frontier(
    latency_ms, ece,
    point_labels=[f"calib_v{i}" for i in range(5)],
    higher_metric_is_better=False,  # lower ECE = better
    title="ECE vs latency (lower = better both axes)",
)
plt.close(fig)
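
To see which points the lower-is-better case should leave on the frontier, the selection can be sketched as a sort-and-sweep over the points. The helper `frontier_indices` is hypothetical and not eval_toolkit API; it only illustrates one common way to compute a frontier when both axes are minimized, and it assumes a first-wins rule for exact duplicates.

```python
import numpy as np


def frontier_indices(cost, metric, higher_metric_is_better=True):
    """Return indices of frontier points, ordered by increasing cost.

    Hypothetical helper for illustration; not part of eval_toolkit.
    """
    cost = np.asarray(cost, dtype=float)
    metric = np.asarray(metric, dtype=float)
    # Convert the metric to a loss so the sweep always minimizes.
    loss = -metric if higher_metric_is_better else metric
    # Sort by cost, breaking cost ties by loss (lexsort uses the last key first).
    order = np.lexsort((loss, cost))
    keep = []
    best = np.inf
    for i in order:
        # Keep a point only if it strictly beats every point seen so far,
        # all of which cost no more than it does.
        if loss[i] < best:
            keep.append(int(i))
            best = loss[i]
    return keep


latency_ms = np.array([10.0, 15.0, 25.0, 50.0, 80.0])
ece = np.array([0.08, 0.05, 0.04, 0.03, 0.04])

frontier = frontier_indices(latency_ms, ece, higher_metric_is_better=False)
print(frontier)  # [0, 1, 2, 3]
```

Here calib_v4 (80 ms, ECE 0.04) drops out because calib_v3 is both faster and better calibrated. The sweep runs in O(n log n), and because of the strict comparison it keeps only the first of two identical points.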

Common pitfalls#

  • Cost is always lower-is-better: the function’s signature is (cost, metric), and the first argument is always treated as “lower is better”. If the x-axis quantity is higher-is-better (e.g. throughput), negate the input array.

  • point_labels length must match arrays: a ValueError is raised if len(point_labels) != len(cost). Pass None to omit annotations.

  • Tied points: when two rungs have identical (cost, metric), frontier membership is tie-broken by index order (first wins). The tie-break is visually imperceptible because the markers coincide; semantically, both points belong to the frontier.
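
The first two pitfalls can be caught on the caller's side before plotting. The sketch below uses a hypothetical helper, `prepare_pareto_inputs`, which is not part of eval_toolkit; it mirrors the documented label-length check and negates a higher-is-better x quantity so that it can be passed as the cost argument.

```python
import numpy as np


def prepare_pareto_inputs(x, metric, point_labels=None, x_higher_is_better=False):
    """Validate inputs and orient the x quantity as a lower-is-better cost.

    Hypothetical caller-side helper; not part of eval_toolkit.
    """
    x = np.asarray(x, dtype=float)
    metric = np.asarray(metric, dtype=float)
    if x.shape != metric.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {metric.shape}")
    if point_labels is not None and len(point_labels) != len(x):
        raise ValueError(
            f"expected {len(x)} point labels, got {len(point_labels)}"
        )
    # Negating a higher-is-better quantity turns it into a lower-is-better cost.
    cost = -x if x_higher_is_better else x
    return cost, metric, point_labels


# Illustrative values only: throughput is higher-is-better, so it is negated.
throughput_qps = np.array([100.0, 250.0, 400.0])
auprc = np.array([0.90, 0.85, 0.80])
labels = ["small", "medium", "large"]

cost, metric, labels = prepare_pareto_inputs(
    throughput_qps, auprc, labels, x_higher_is_better=True
)
print(cost.tolist())  # [-100.0, -250.0, -400.0]
```

The returned arrays can then be passed as the (cost, metric) arguments. Note that a negated array carries its sign through to the data, so any axis built from it will show negative values.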

See also#

  • plot_metric_bars() for the per-rung headline comparison (when cost is not the secondary axis)

  • The consumer’s Phase 4 F1 figure (per ADR-046 Q6) is the prototypical use case driving this primitive