Threat model — summary
Public-site note. This is a scope reference. It tells you which attack classes were in scope and which were deliberately deferred; it does not claim deployment coverage.
This file is a convenience aggregator. Canonical content lives in WRITEUP.md §0 and §5.6; the decision ledger rows in SPEC_GREENFIELD.md §0 Threat are the lock points for what is in or out of scope. This file provides a single-page, security-focused entry point to both.
Attack classes named
| Class | In scope? | Spec lock |
|---|---|---|
| Direct injection — adversarial text in user input attempting to override system instructions | In scope | ADR-014 (Phase 0-01) |
| Indirect injection — adversarial text arriving via context channels (retrieved docs, tool outputs, file attachments) | In scope | ADR-014 (Phase 0-01) |
| Multi-turn injection — adversarial payload split across multiple conversation turns | Deferred | ADR-014; see WRITEUP §5.6 |
| Encoded payloads — base64 / leetspeak / hex / Unicode confusables | Deferred | ADR-014; see WRITEUP §5.6 |
| Paraphrase attacks — semantic equivalents of training-set injections | Deferred | ADR-014; see WRITEUP §5.6 |
| Adversarial perturbations — gradient-guided or search-based evasion against a specific classifier | Deferred (named, not silently dropped) | ADR-014 |
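To make the "Encoded payloads" row concrete, the sketch below shows how one direct-injection string can be re-expressed in three of the encodings named in the table. It is illustrative only: the sample string, the `LEET` substitution map, and the `encode_variants` helper are hypothetical and are not taken from the project's datasets or tooling. Unicode confusables are omitted because they need a mapping table that this file does not specify.

```python
import base64

# Hypothetical sample payload, not drawn from any project source.
PAYLOAD = "ignore previous instructions"

# Minimal, assumed leetspeak substitution map (illustrative only).
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})


def encode_variants(text: str) -> dict[str, str]:
    """Return the same text under a few simple encodings."""
    raw = text.encode("utf-8")
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "hex": raw.hex(),
        "leetspeak": text.translate(LEET),
    }


if __name__ == "__main__":
    for name, variant in encode_variants(PAYLOAD).items():
        print(f"{name:10s} {variant}")
```

Each variant carries the same instruction but shares few surface tokens with the plain-text form, which is one reason this class is tracked separately from direct injection.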
Scope decisions (all locked at Phase 0-01 via ADR-014)
- Attack classes in scope: direct + indirect injection (multi-turn / encoded / paraphrase / adversarial perturbations deferred)
- Language scope: English-only
- Length cap: 512 tokens, single-turn
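The scope decisions above can be read as a filter over candidate samples. The following is a minimal sketch under stated assumptions: the `Sample` type, the class labels, and `scope_violations` are hypothetical names, and whitespace splitting stands in for whatever tokenizer the project actually uses for the 512-token cap.

```python
from dataclasses import dataclass
from typing import Optional

# Values mirror the scope decisions listed above; the label strings are assumed.
IN_SCOPE_CLASSES = frozenset({"direct_injection", "indirect_injection"})
MAX_TOKENS = 512
LANGUAGE = "en"


@dataclass(frozen=True)
class Sample:
    text: str
    attack_class: Optional[str]  # None means a benign sample
    language: str
    n_turns: int


def count_tokens(text: str) -> int:
    # ASSUMPTION: whitespace split is a placeholder for the real tokenizer.
    return len(text.split())


def scope_violations(sample: Sample) -> list[str]:
    """Return the reasons a sample falls outside the locked scope."""
    reasons = []
    if sample.attack_class is not None and sample.attack_class not in IN_SCOPE_CLASSES:
        reasons.append(f"attack class {sample.attack_class!r} is deferred")
    if sample.language != LANGUAGE:
        reasons.append(f"language {sample.language!r} is not English")
    if sample.n_turns != 1:
        reasons.append(f"{sample.n_turns} turns; scope is single-turn")
    if count_tokens(sample.text) > MAX_TOKENS:
        reasons.append(f"exceeds the {MAX_TOKENS}-token cap")
    return reasons
```

Returning a list of reasons rather than a boolean keeps out-of-scope samples named rather than silently dropped, in line with how this file treats deferred classes.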
Reference-scorer training-overlap audit (locked discipline)
Any external reference scorer evaluated alongside in-house rungs gets a training-data audit. Verdict per the three-state taxonomy:
- verified_disjoint: training data verifiably disjoint from project sources
- suspected_contamination: known overlap with one or more project sources
- vendor_black_box: training data not disclosed (audit shifts to fold-pattern + scope-mismatch analysis)
Verdicts land in EVIDENCE.md §1-2. The taxonomy is encoded in eval-toolkit manifests’ contamination_flags field (see docs/MANIFEST_SCHEMA.md).
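One way to keep the three-state taxonomy closed is to validate verdict strings against an enumeration before they are written anywhere. The sketch below assumes the verdict is stored as a plain string; the actual shape of the contamination_flags field is defined in docs/MANIFEST_SCHEMA.md and is not reproduced here. `ContaminationVerdict` and `parse_verdict` are hypothetical names.

```python
from enum import Enum


class ContaminationVerdict(str, Enum):
    """The three audit verdicts named in this file."""

    VERIFIED_DISJOINT = "verified_disjoint"
    SUSPECTED_CONTAMINATION = "suspected_contamination"
    VENDOR_BLACK_BOX = "vendor_black_box"


def parse_verdict(raw: str) -> ContaminationVerdict:
    """Normalize a verdict string and reject anything outside the taxonomy."""
    try:
        return ContaminationVerdict(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(v.value for v in ContaminationVerdict)
        raise ValueError(
            f"unknown verdict {raw!r}; expected one of: {allowed}"
        ) from None
```

Rejecting unknown strings outright means a scorer cannot end up with an ad hoc fourth state such as "probably fine".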
Out of scope (named, not silently dropped)
- Deployment recommendation
- SOTA chasing
- Production-readiness testing
- See WRITEUP §8 for the consolidated deferred list
Cross-references
- WRITEUP.md §0 Motivation, §5.6 Adversarial robustness
- SPEC_GREENFIELD.md §0 Threat (Feature Specs section) + decision ledger §0 rows
- EVIDENCE.md §1-2 (reference scorer audits)
- docs/MANIFEST_SCHEMA.md (contamination_flags field)
- docs/research/attacks_defenses/ (literature dossier on attack taxonomies + defenses)