Submission deadline 2026-05-18 morning with infra-leveraged Long-scope attempt and fallback ladder
ADR-001: Submission deadline 2026-05-18 morning with infra-leveraged Long-scope attempt and fallback ladder
Status
Accepted (2026-05-15)
Context
The brief mandates a submission delivered on the morning of 2026-05-18. Today is 2026-05-15. Available working window: Friday afternoon → Saturday → Sunday → Monday morning submission, ≈ 2.5 working days, weekend-heavy. By calendar alone this falls in the “Tight” bucket of the time-budget framework (≤ 5 working days), at the extreme low end.
A naive Tight-bucket interpretation would scope to a single backbone × single training rung minimal demo. However, the project’s three load-bearing libraries (eval-toolkit, runpod-deploy, research_toolkit) provide prebuilt primitives for the full workload: GPU rental + per-row prediction persistence, bootstrap CI machinery, calibration battery, paired comparison, splits + leakage scans. Library-first discipline (CLAUDE.md) means the project is not bottlenecked by hand-rolled scaffolding.
This decision encodes the bet: heavy library infrastructure compresses what would normally be a 2-week 2×3-grid workload into 2.5 working days. It also encodes the contingency: an explicit fallback ladder so scope-shrink under deadline pressure is pre-decided, not improvised.
Decision
Submission deadline: 2026-05-18, morning.
Time budget: ≈ 2.5 working days.
Scope ambition: Long-bucket — full 2×3 trained-rung grid (DeBERTa-v3 + ModernBERT × {frozen-probe, LoRA, full-FT}), multi-seed protocol (3 seeds paired across rungs, escalating to 5 if budget permits), full OOD slate, paired-bootstrap rung-vs-rung differences, written methodology analysis depth.
Fallback ladder (activates if mid-Phase-2 checkpoint shows full grid infeasible):
- Full 2×3 grid → drop full-FT → 2×2 (frozen + LoRA, both backbones)
- 2×2 → drop second backbone → 1×2 (single backbone, frozen + LoRA)
- 1×2 → drop to 1×1 minimal demo
- Multi-seed protocol degrades 3 → 2 → 1 seed in lockstep with each rung drop
Reporting discipline: the writeup reports what was achieved, not what was attempted. Honest reduction is preferred over fabrication.
Consequences
Positive:
- Pre-committed fallback ladder removes scope-shrink panic at the deadline. Triggering rule (mid-Phase-2 checkpoint) is concrete.
- Library-first leverage is the mechanism that makes Long-scope feasible — also forces the project to use library primitives correctly (anti-hand-rolling discipline reinforced).
- Phase checklists (per
docs/ROADMAP.md) are work-completed, not metric thresholds — failing-forward through the ladder is built into the spec, not an apology.
Negative / cost:
- The bet may fail. If infrastructure surprises emerge (RunPod queue contention; library bugs in eval-toolkit’s bootstrap or splits primitives; HF dataset rate-limits), the ladder activates and the headline narrative shrinks.
- Weekend-heavy schedule means human-availability risk is high; one bad night of sleep can compress remaining hours significantly.
- Multi-seed escalation (3 → 5) is unlikely to fit; the realistic operating point is 3 seeds.
Neutral:
- The Long-scope-attempt framing matches reviewer expectations for a methodology-first submission (Q4 locks ADR-004). A1+A2 audience reads “attempted full grid, achieved X under time constraints” as competent scoping, not as failure.
Alternatives Considered
- Tight-bucket-minimal (1×1 single-rung demo): Conservative; guaranteed to complete; flat narrative; doesn’t exercise the library infrastructure. Rejected because it under-uses the library leverage that distinguishes this submission, and the methodology-first framing demands enough rungs to compare.
- Long-scope without fallback ladder: Optimistic; commits the project to the full grid with no documented retreat path. Rejected because unmitigated deadline-pressure scope-shrink is the most common source of methodology corner-cutting; pre-committing the ladder removes that pressure.
- Defer the deadline / negotiate extension: Not the project’s call. The brief is authoritative.
References
docs/ROADMAP.md§Phase 0 close criterion + §Replanning checkpoint (per-phase)SPEC_STRATEGY.md— strategy doc on scope-shrink under pressureCLAUDE.md— library-first discipline; anti-pattern rule on hand-rolling