ADR 0006: Pairing rules for cross-detector list-grammar in audit validators
Status: Accepted at v1.3.0 — applies to all future audit validators
in the eval_toolkit.audit_* flat-module family.
Date: 2026-05-26
Deciders: Brandon Behring (author), /exploring-options 2-round
review during #81 implementation, consumer-feedback audit Round 14.
Supersedes: N/A. Superseded by: N/A.
Context
ADR 0005 introduced a two-layer correctness model for audit validators:
Layer 1: Identity (BindingKey frozen dataclass), v1.1.0.
Layer 2: Scope (content-type + context-keyword filters via scope="narrative"), v1.1.0 + v1.2.0.
The v1.2.0 release shipped four context-aware filters (T1 delta,
T2 floor, T3 consume-on-match per-sentence, T4 sentence-boundary
detector-pair reject) under scope="narrative", achieving 93%
total noise reduction (95 → 7) on the consumer’s writeup. ADR 0005’s
“Future work (deferred)” section explicitly named two remaining
failure modes — sentence-boundary unawareness (closed by v1.2.0’s
T4) and multi-detector list parsing in dense prose (still
deferred). Consumer adoption at prompt-injection-detection-submission@v1.3.12
reduced their dogfood to 4 residual warnings, all in this
deferred category.
Issue #81 documented the 4 residuals as three distinct prose patterns:
Pattern A (“for X” postfix): "versus 0.364 [...] for the frozen probe and 0.291 [...] for TF-IDF + LR". The validator’s proximity-based detector pairing mis-attributes 0.291 to the nearer prior detector mention; the "for TF-IDF + LR" postfix is the authoritative binding signal.
Pattern B (possessive 's): "LoRA's pooled OOD AUROC is 0.383 against frozen probe's 0.515". The 's construction isn’t part of detector-alias regex matching; cross-detector confusion follows.
Pattern C (group subject): "... 0.38 AUROC, ~0.6 drop for the trained detectors; frozen probe's gap is 0.91 → 0.515". The value 0.38 belongs to “trained detectors” (a multi-detector group), not the next-mentioned single detector.
A fourth pattern emerged during v1.3.0 dogfood:
Pattern D (metric-axis confusion): "than the AUPRC delta suggests: LoRA's pooled OOD AUROC is 0.383". The proximity-based metric check finds “AUPRC” from a delta clause earlier in the prose, even though AUROC is the metric semantically owning 0.383. This is symmetric to detector-axis pairing: the same positional heuristic fails for metric-axis in dense prose.
These four patterns are pairing-rule problems, not identity or scope problems. They require a third correctness layer that operates on top of identity + scope.
Decision
Introduce Layer 3 — pairing rules as the third correctness layer for audit validators:
| Layer | Correctness dimension | Mechanism | Release |
|---|---|---|---|
| 1 | Identity | Structured key with named fields | v1.1.0 |
| 2 | Scope | Content-type + context-keyword filters | v1.1.0 + v1.2.0 |
| 3 | Pairing | Override or suppress proximity-based detector / metric pairing under explicit grammar cues | v1.3.0 |
Layer 3 ships under the existing scope="narrative" bundle (no new
public kwargs). It is Tier-1 ADDITIVE per ADR 0003:
scope="all" callers see zero behavior change.
Four pairing rules
Pattern A — "for {detector}" postfix override.
When a candidate value is followed (within the next 50 chars) by a "for {detector_alias}" construct AND no other value pattern lies
between the value and the postfix (excluding values in CI brackets
per v1.1.0’s scope="narrative" content-type filter), the postfix is
authoritative:
If the postfix names THIS binding’s detector → confirm pairing (bypass proximity check).
If it names a DIFFERENT canonical detector → skip (the other detector’s loop iteration will claim the value).
If unresolved → fall through to proximity.
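The three outcomes above can be sketched as a small decision function. This is a minimal illustration, not the code in audit_value_bindings.py: the alias table, the regexes, and the name postfix_decision are hypothetical, and treating an intervening value as "fall through to proximity" is an assumption.

```python
import re

# Hypothetical alias table (alias text -> canonical detector); not the toolkit's config.
ALIASES = {"frozen probe": "frozen_probe", "TF-IDF + LR": "tfidf_lr", "LoRA": "lora"}
_ALT = "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
POSTFIX_RE = re.compile(r"\bfor\s+(?:the\s+)?(" + _ALT + r")")
VALUE_RE = re.compile(r"\d+\.\d+")
BRACKET_RE = re.compile(r"\[[^\]]*\]")  # CI brackets are excluded from the value scan


def postfix_decision(text: str, value_end: int, detector: str, window: int = 50) -> str:
    """Return "confirm", "skip", or "proximity" for the value ending at value_end."""
    tail = text[value_end:value_end + window]
    match = POSTFIX_RE.search(tail)
    if match is None:
        return "proximity"  # unresolved: fall through to the proximity check
    # Blank out bracketed spans, then look for another value before the postfix.
    between = BRACKET_RE.sub(lambda m: " " * len(m.group()), tail[:match.start()])
    if VALUE_RE.search(between):
        return "proximity"  # assumption: an intervening value voids the override
    named = ALIASES[match.group(1)]
    return "confirm" if named == detector else "skip"


text = "versus 0.364 [...] for the frozen probe and 0.291 [...] for TF-IDF + LR"
end_291 = text.index("0.291") + len("0.291")
print(postfix_decision(text, end_291, "tfidf_lr"))      # confirm
print(postfix_decision(text, end_291, "frozen_probe"))  # skip
```

With the Pattern A prose from Issue #81, 0.291 is confirmed for the TF-IDF + LR binding and skipped for the frozen-probe binding, regardless of which detector mention is nearer.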
Pattern B — "{detector}'s" possessive override.
Same mechanics as Pattern A, but scanning the 80 chars before the
value for "{alias}'s". The LAST possessive in the pre-window is
authoritative IF its end position is within 30 chars of the value
start (covers both immediate "frozen probe's 0.515" and
short-clause "LoRA's ... AUROC is 0.383"). Last-match, not
first, is critical: an earlier possessive belonging to a different
preceding value must not bleed into a later value’s check.
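The last-match rule can be illustrated with a short sketch. The alias table, the regex, and the name possessive_decision are hypothetical; only the 80-char window, the 30-char gap, and the last-match behavior come from the rule described above.

```python
import re

# Hypothetical alias table (alias text -> canonical detector).
ALIASES = {"LoRA": "lora", "frozen probe": "frozen_probe"}
_ALT = "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
# Accept both the straight and the typographic apostrophe.
POSSESSIVE_RE = re.compile(r"(" + _ALT + r")['\u2019]s\b")


def possessive_decision(text: str, value_start: int, detector: str,
                        window: int = 80, max_gap: int = 30) -> str:
    """Return "confirm", "skip", or "proximity" for the value at value_start."""
    lo = max(0, value_start - window)
    matches = list(POSSESSIVE_RE.finditer(text, lo, value_start))
    if not matches:
        return "proximity"
    last = matches[-1]  # last match, so earlier possessives cannot bleed forward
    if value_start - last.end() > max_gap:
        return "proximity"  # possessive is too far from the value to be authoritative
    return "confirm" if ALIASES[last.group(1)] == detector else "skip"


text = "LoRA's pooled OOD AUROC is 0.383 against frozen probe's 0.515"
start_515 = text.index("0.515")
print(possessive_decision(text, start_515, "frozen_probe"))  # confirm
print(possessive_decision(text, start_515, "lora"))          # skip
```

Both possessives fall inside the 80-char pre-window of 0.515; taking the first one ("LoRA's") would bind the value to the wrong detector, which is why the last match is used.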
Pattern C — group-subject suppression.
When prose contains "for the {trained|frozen|baseline|all|both|other} detectors" within ±60 chars of the value AND on the same side of
any sentence boundary, the value refers to a multi-detector group
statement that doesn’t bind to a single canonical detector. The
candidate is SUPPRESSED (no override). Multi-detector inference is
deferred to v1.4.0+.
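A minimal sketch of the suppression check follows. The function name is hypothetical, and the sentence-boundary regex is a simplified stand-in (assumed here to be ".", "!" or "?" followed by whitespace) for the sentence-positions infrastructure that the validator reuses from v1.2.0.

```python
import re

GROUP_RE = re.compile(
    r"\bfor the (?:trained|frozen|baseline|all|both|other) detectors\b"
)
# Simplified stand-in for sentence detection: terminal punctuation plus whitespace.
# Decimal points inside values such as 0.38 do not match.
BOUNDARY_RE = re.compile(r"[.!?](?=\s)")


def is_group_suppressed(text: str, value_start: int, value_end: int,
                        window: int = 60) -> bool:
    """True if a group-subject phrase shares a sentence with the value."""
    lo = max(0, value_start - window)
    hi = min(len(text), value_end + window)
    for m in GROUP_RE.finditer(text, lo, hi):
        left, right = sorted((value_start, m.start()))
        # Suppress only when no sentence boundary separates value and phrase.
        if BOUNDARY_RE.search(text, left, right) is None:
            return True
    return False


text = ("... 0.38 AUROC, ~0.6 drop for the trained detectors; "
        "frozen probe's gap is 0.91 → 0.515")
start = text.index("0.38")
print(is_group_suppressed(text, start, start + len("0.38")))  # True
```

The check only reports that a group cue is present; it makes no attempt to decide which detectors in the group own the value.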
Pattern D — metric-axis nearest-pairing.
Symmetric to detector-axis pairing. The rule pre-collects ALL metric
positions per file (across consumer-supplied metric_aliases,
not just metrics tied to canonical bindings). It requires the NEAREST
metric mention to the value (by text order: the last mention before the
value or the first mention after it) to be THIS binding’s canonical
metric. This catches prose with multiple metrics in close proximity
where the v1.2.0 window-based proximity check picks up the wrong metric.
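As a rough sketch of nearest-metric pairing, the function below scans every supplied alias and returns the canonical metric whose mention lies closest to the value. The name nearest_metric, the shape of metric_aliases (alias text mapped to a canonical name), and the tie-breaking are assumptions for illustration; this is not the validator's own helper.

```python
import re
from typing import Mapping, Optional


def nearest_metric(text: str, value_start: int, value_end: int,
                   metric_aliases: Mapping[str, str]) -> Optional[str]:
    """Return the canonical metric mentioned closest to the value, or None."""
    best = None  # (distance, canonical); the first mention found wins ties
    for alias, canonical in metric_aliases.items():
        for m in re.finditer(r"\b" + re.escape(alias) + r"\b", text):
            if m.end() <= value_start:
                dist = value_start - m.end()   # mention before the value
            elif m.start() >= value_end:
                dist = m.start() - value_end   # mention after the value
            else:
                continue                       # overlaps the value itself
            if best is None or dist < best[0]:
                best = (dist, canonical)
    return None if best is None else best[1]


text = "than the AUPRC delta suggests: LoRA's pooled OOD AUROC is 0.383"
start = text.index("0.383")
end = start + len("0.383")
print(nearest_metric(text, start, end, {"AUPRC": "auprc", "AUROC": "auroc"}))  # auroc
print(nearest_metric(text, start, end, {"AUPRC": "auprc"}))                    # auprc
```

A binding whose canonical metric is AUPRC would reject 0.383 in the first call because the nearest mention is AUROC. The second call shows why unbound metrics still need an alias: without one for AUROC, the only mention the scan can see is AUPRC.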
Why suppression (Pattern C) rather than inference
ADR 0005’s deferred-work section framed multi-detector list parsing
as a 200+ LOC parser-level problem. v1.3.0 takes a simpler path:
when prose explicitly names a multi-detector group ("for the trained detectors"), the validator SUPPRESSES the candidate
rather than trying to infer which detectors own the value. This
matches the architectural pattern of v1.2.0’s T1/T2 (recognize a
context cue, skip the candidate) and avoids the high-risk
multi-detector iteration path (~250 LOC, MODERATE-HIGH risk per
the Round 12 Explore analysis). Inference can be added as a
Layer 3 extension (multi-detector iteration) in v1.4.0+ if
consumer demand emerges.
Scope of this ADR
Applies to the audit_* flat-module family only. Other parts
of the codebase (e.g., MetricSpec, harness.evaluate) are NOT
retroactively forced to adopt pairing-rule mechanics. Audit
validators are a coherent subfamily that shares the closed-config
pattern and the consumer-prose-aware mission.
Future pairing-rule additions (e.g., enumeration parsing for the
"X scored Y, Z, and W for A, B, C respectively" pattern, or
multi-detector inference replacing Pattern C’s suppression) join
Layer 3 as additional rule families under the same architectural
slot.
Consequences
Positive
Closes #81’s 4 residuals. Consumer-side dogfood reaches 0 warnings (down from 4); HARD-gate promotion of audit_value_bindings becomes credible. Combined with v1.1.0 + v1.2.0, 100% reduction vs the pre-fix v1.0.5 baseline on the consumer’s writeup.
Architectural consistency. Layer 3 is opt-in via the existing scope="narrative" bundle (no new kwargs); backward-compat preserved for scope="all" callers.
Symmetric metric-axis pairing. Pattern D extends the positional pairing model from detector-axis to metric-axis, using the existing _nearest_canonical_key helper. Establishes “axis-by-axis nearest-pairing” as a reusable Layer 3 building block.
Bypass + confirm semantics. Pattern A/B overrides are authoritative: they CONFIRM pairing (bypass proximity check) when they match THIS binding’s detector. Avoids the bug where override + proximity disagree and the value is wrongly rejected.
Negative
Layer 3 adds ~150 LOC. Pattern helpers, per-call regex builds, inner-loop wiring. Within the flat-module maintainability bar (comparable to v1.2.0’s T1-T4 = ~150 LOC).
Pattern A intervening-value check + Pattern C sentence-boundary check reuse the existing exclusion-ranges and sentence-positions infrastructure from v1.1.0/v1.2.0. Adds cross-layer coupling: changes to bracket-exclusion logic or sentence-detection logic now also affect Layer 3 correctness. Mitigation: unit tests pin each rule’s behavior under known prose patterns.
Pattern D requires metric_aliases for unbound metrics. When prose mentions a metric (e.g., AUROC) that has no canonical binding, the consumer must still pass it in metric_aliases for Pattern D to recognize it. Without the alias, Pattern D falls through to the binding’s own metric (legacy v1.2.0 behavior). Documented in v1.3.0 CHANGELOG.
Future work (post-v1.3.0)
Multi-detector inference for Pattern C. Replace suppression with multi-detector ownership iteration: when “for the trained detectors” is found, iterate the value-comparison block once per detector in the implied group. ~250 LOC; MODERATE-HIGH risk. Track as v1.4.0+ if consumer demand emerges.
Enumeration parsing. Prose like "X scored Y, Z, and W for A, B, C respectively" requires positional alignment between two lists. Not addressed by v1.3.0. Track as v1.4.0+ if needed.
Markdown AST parsing (ADR 0005 §A4). v2.0 territory.
Alternatives considered
A1: Markdown AST parsing
Rejected for v1.x per ADR 0005 §A4. Too heavy, fragile to markdown dialects, ~1000+ LOC dependency. Stays v2.0 territory.
A2: Pattern A + B only (defer C and D)
Closes 3 of 4 residuals. ~100 LOC. Rejected because the consumer’s HARD-gate promotion is blocked on ALL 4 warnings; a 75% close-rate release would not unblock the consumer-side workflow that motivated this work.
A3: Multi-detector inference for Pattern C (instead of suppression)
~250 LOC; replaces the simple “for the trained detectors” suppression with explicit iteration over implied group detectors. Rejected for v1.3.0 because suppression closes the same false positives at ~30 LOC; inference’s marginal value isn’t worth the complexity until consumer prose surfaces a case where suppression hides a real bug.
A4: Public kwargs for pairing rules
Add list_connectives: Sequence[str] | None = None,
possessive_patterns: ..., etc. — let consumers extend the
hardcoded sets. Rejected per ADR 0005 §4 reasoning: YAGNI without
concrete consumer demand. The hardcoded frozensets cover the
consumer’s actual prose patterns; runtime extension can be added
in a future v1.3.x patch if needed.
A5: Layer 3 as a separate scope="strict" tier
A new scope value that’s narrative + pairing rules.
Rejected because it creates an ordering relationship consumers
must remember (all ⊂ narrative ⊂ strict) and grows the API
surface. The existing scope="narrative" bundle already
represents “opt-in narrative-prose-aware correctness mode”; Layer
3 fits within that mental model.
Cross-references
ADR 0001: flat-module layout still applies; Layer 3 helpers live in audit_value_bindings.py alongside the v1.2.0 helpers, not in a subpackage.
ADR 0003: Tier-1 ADDITIVE classification. No new public kwargs; no signature drift; only __version__ and the inner-loop logic change.
ADR 0005: introduces Layer 1 + 2 and explicitly defers Layer 3 to v1.3.0+. This ADR is the formal closure of that deferred work.
Round 14 audit findings: captures the v1.3.0 cycle dogfood + the four-pattern taxonomy.
Issue #81: consumer-filed signal that triggered this ADR.