---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Worked example: `plot_pareto_frontier` — cost vs performance

> **What this shows.** Cost-vs-performance scatter with the Pareto
> frontier (non-dominated points) overlaid. Dominated points are shown
> in a muted color. Use case: multi-rung evaluation where each rung has a
> different (compute_cost, AUPRC) tradeoff and you need to surface the
> defensible operating points. Shipped in v0.33.0 (closes upstream
> issue #15).
>
> **Runtime:** <1 s. Requires `[plotting]` extra.

## Setup

```{code-cell}
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from eval_toolkit import plot_pareto_frontier
```

## Synthetic data: rungs with cost/performance tradeoff

Build a scenario where 8 model rungs trade off compute cost against AUPRC,
with most rungs on the frontier and the rest dominated:

```{code-cell}
# (cost, auprc) per rung — cost in arbitrary units.
cost = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 10.0, 15.0, 25.0])
auprc = np.array([0.60, 0.70, 0.71, 0.78, 0.82, 0.81, 0.85, 0.86])
labels = [f"rung_{i}" for i in range(len(cost))]
```

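Only rung 5 (cost=10, auprc=0.81) is strictly *dominated*: rung 4 (cost=8,
auprc=0.82) gives higher AUPRC at lower cost. Rung 2 (cost=3, auprc=0.71) is
a marginal case: it costs 50% more than rung 1 (cost=2, auprc=0.70) for a
gain of only 0.01 AUPRC, but under the standard (strict) definition of Pareto
dominance it stays on the frontier, because no cheaper rung matches its AUPRC.
The Pareto frontier makes this distinction visible.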
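
The dominance relation can be checked by hand. The sketch below is a
plain-Python illustration of strict Pareto dominance (lower cost, higher
metric). `dominated_indices` is a hypothetical helper written for this page,
not part of `eval_toolkit`, and the library's own implementation may differ.

```python
# Hypothetical helper for illustration; not part of eval_toolkit.
def dominated_indices(cost, metric):
    """Return indices of points strictly dominated by some other point.

    Point j dominates point i when j is no worse on both axes
    (cost_j <= cost_i and metric_j >= metric_i) and strictly better on one.
    """
    dominated = []
    for i, (ci, mi) in enumerate(zip(cost, metric)):
        for j, (cj, mj) in enumerate(zip(cost, metric)):
            no_worse = cj <= ci and mj >= mi
            strictly_better = cj < ci or mj > mi
            if j != i and no_worse and strictly_better:
                dominated.append(i)
                break  # one dominating point is enough
    return dominated


# Same values as the synthetic rungs above, as plain lists.
cost = [1.0, 2.0, 3.0, 5.0, 8.0, 10.0, 15.0, 25.0]
auprc = [0.60, 0.70, 0.71, 0.78, 0.82, 0.81, 0.85, 0.86]

print(dominated_indices(cost, auprc))  # [5]
```

Under this definition only rung 5 is flagged. The pairwise check is O(n²),
which is fine for a handful of rungs.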

## Basic frontier

```{code-cell}
fig = plot_pareto_frontier(
    cost, auprc,
    point_labels=labels,
    title="AUPRC vs compute cost (per-rung)",
)
plt.close(fig)
```

The frontier line connects the non-dominated points; dominated points are
plotted in `PALETTE["baseline"]` (muted), so the visual contrast makes the
defensible operating points obvious at a glance.

## With caller-managed `ax`

```{code-cell}
fig, ax = plt.subplots(figsize=(7, 5))
plot_pareto_frontier(
    cost, auprc,
    point_labels=labels,
    ax=ax,
    title="Cost-vs-AUPRC with frontier overlay",
)
plt.close(fig)
```

## Lower-metric-is-better case

By default, `higher_metric_is_better=True`, so higher metric values are
preferred. For metrics where lower is better (latency, calibration error,
etc.), set the flag to `False`:

```{code-cell}
latency_ms = np.array([10.0, 15.0, 25.0, 50.0, 80.0])
ece = np.array([0.08, 0.05, 0.04, 0.03, 0.04])  # lower ECE = better calibration

fig = plot_pareto_frontier(
    latency_ms, ece,
    point_labels=[f"calib_v{i}" for i in range(5)],
    higher_metric_is_better=False,  # lower ECE = better
    title="ECE vs latency (lower = better both axes)",
)
plt.close(fig)
```
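
To see which points should end up on the frontier when the metric is
minimized, the selection can be sketched as a single sweep over the points in
order of increasing cost. `frontier_indices` and its `higher_is_better`
parameter are hypothetical names for this illustration, not the
`eval_toolkit` API, and the library may compute the frontier differently.

```python
# Hypothetical helper for illustration; not part of eval_toolkit.
def frontier_indices(cost, metric, higher_is_better=True):
    """Return indices of non-dominated points, ordered by increasing cost."""
    # Flip the sign so that "larger score is better" holds in both modes.
    sign = 1.0 if higher_is_better else -1.0
    # Sort by cost; among equal costs, visit the best score first.
    order = sorted(range(len(cost)), key=lambda i: (cost[i], -sign * metric[i]))
    frontier = []
    best = float("-inf")
    for i in order:
        score = sign * metric[i]
        if score > best:  # strictly improves on every point seen so far
            frontier.append(i)
            best = score
    return frontier


latency_ms = [10.0, 15.0, 25.0, 50.0, 80.0]
ece = [0.08, 0.05, 0.04, 0.03, 0.04]

print(frontier_indices(latency_ms, ece, higher_is_better=False))  # [0, 1, 2, 3]
```

In this data `calib_v4` (80 ms, ECE 0.04) drops out because `calib_v3` is both
faster and better calibrated.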

## Common pitfalls

- **Cost is always lower-is-better**: the function's signature is
  `(cost, metric)`, and the first argument is always minimized; only the
  metric direction is configurable. To use a higher-is-better quantity on
  the cost axis, negate the input array.
- **`point_labels` length must match arrays**: a `ValueError` is raised
  if `len(point_labels) != len(cost)`. Pass `None` to omit annotations.
- **Tied points**: when two rungs have identical `(cost, metric)`, frontier
  membership is tie-broken by index order (first wins). The difference is
  visually imperceptible because the markers coincide; semantically, both
  points are on the frontier.
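
The tie case in the last bullet follows from the usual definition of strict
dominance: two identical points are never strictly better than each other, so
neither dominates the other. A minimal sketch under that definition is shown
below. `is_dominated` is a hypothetical helper, and it says nothing about how
`plot_pareto_frontier` orders or draws tied points.

```python
# Hypothetical helper for illustration; not part of eval_toolkit.
def is_dominated(i, cost, metric):
    """True if some point is no worse on both axes and strictly better on one."""
    return any(
        cost[j] <= cost[i]
        and metric[j] >= metric[i]
        and (cost[j] < cost[i] or metric[j] > metric[i])
        for j in range(len(cost))
    )


# Points 1 and 2 are exact duplicates; point 3 costs more and scores less.
cost = [1.0, 4.0, 4.0, 9.0]
metric = [0.60, 0.80, 0.80, 0.75]

print([is_dominated(i, cost, metric) for i in range(len(cost))])
# [False, False, False, True]
```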

## See also

- {func}`~eval_toolkit.plotting.plot_metric_bars` for the per-rung
  headline comparison (when cost is not the secondary axis)
- The consumer's Phase 4 F1 figure (per ADR-046 Q6) is the prototypical
  use case driving this primitive