# Migration notes (v0.8.2 and prompt-injection-v3)

## Lifecycle policy: `stop:` → `lifecycle:` (v0.8.2)

The YAML `stop:` block is renamed to `lifecycle:` with three-valued actions instead of booleans, and the defaults change so that **successful runs release their volume disk by default**.

### Motivation

On 2026-05-17 the repo's RunPod account held 76 stale EXITED pods totaling 3,930 GB of preserved volume disk: **$1.10/hr (~$26/day, ~$393/month)** of idle storage burn. The leak existed because `runpodctl pod stop` (which `runpod-deploy` issued under the old `on_success: true` default) *pauses* a pod but **keeps the volume disk allocated indefinitely** at RunPod's $0.10/GB·month rate. Operators reasonably assumed "stop" meant "terminated"; the documentation at `lifecycle.md:214-222` literally said so.

The schema change makes the action space explicit and the cost trade-off visible at config-edit time.

### New schema

```yaml
lifecycle:
  on_success: delete   # NEW default: releases volume disk on success
  on_failure: stop     # NEW default: preserves paused pod for SSH forensics
```

Each field accepts one of three strings (plus a fourth on `on_success` only):

| value      | runpodctl call | volume disk after |
| ---------- | -------------- | ----------------- |
| `preserve` | _(none)_       | continues at full rate (compute + disk) |
| `stop`     | `pod stop `    | **continues at ~$0.10/GB·month indefinitely** |
| `delete`   | `pod delete `  | released |
| `recycle`  | `pod stop `    | continues at ~$0.10/GB·month; next run resumes this pod (`on_success` only) |

See [`lifecycle.md` §7](lifecycle.md#7-lifecycle-action-cleanup) for the full table and [`lifecycle.md` §7b](lifecycle.md#7b-cost-discipline-cleaning-up-after-forensics) for the cleanup-after-forensics workflow.

### Legacy `stop:` block: bool shim

Existing configs using the old `stop: {on_success: bool, on_failure: bool}` block continue to parse; a single `[deprecated]` WARNING is emitted per parse.
The shim maps:

| old form                 | new equivalent                   |
| ------------------------ | -------------------------------- |
| `stop.on_success: true`  | `lifecycle.on_success: delete`   |
| `stop.on_success: false` | `lifecycle.on_success: preserve` |
| `stop.on_failure: true`  | `lifecycle.on_failure: stop`     |
| `stop.on_failure: false` | `lifecycle.on_failure: preserve` |

**v0.8.3 removed the bool shim.** A YAML config containing `stop:` now raises `ValueError` with a message naming the v0.8.3 removal and pointing at this doc. Consumers pinned to v0.8.2 or earlier continue to parse the legacy form with a `[deprecated]` WARNING; pinning to `runpod-deploy>=0.8.3` requires migrating to the `lifecycle:` block first.

### CLI changes

| old command | new command |
| ----------- | ----------- |
| `runpod-deploy stop --state-file ` | `runpod-deploy cleanup --state-file --mode stop` |
| _(no equivalent; was a manual `xargs` invocation)_ | `runpod-deploy cleanup --all-stopped [--yes]` |
| _(no equivalent)_ | `runpod-deploy ls-stale [--json]` |

The `stop` subcommand remains as a deprecated alias.

### Python API changes (breaking for direct importers)

```python notest
# Before
from runpod_deploy import StopPolicySpec
from runpod_deploy.provider import stop_pod

# After
from runpod_deploy import LifecyclePolicySpec, LIFECYCLE_ACTIONS, StalePod
from runpod_deploy.provider import cleanup_pod, list_stale_pods, bulk_delete_pods
```

`RunpodJobSpec.stop` is renamed to `RunpodJobSpec.lifecycle`.

### What you need to do

1. **Now** (on v0.8.2): nothing required. Your existing configs and any in-flight runs continue to work via the bool shim. Watch the `[deprecated]` warnings to gauge your migration backlog.
2. **Next sweep / next config edit**: rename the `stop:` block to `lifecycle:` and replace booleans with string values. The migration is mechanical; the shim table above is the full mapping. Do this before pinning to `runpod-deploy>=0.8.3`, which rejects `stop:` outright.
3. **Audit**: run `runpod-deploy ls-stale` to find any historical pods that the old code left behind; bulk-release with `runpod-deploy cleanup --all-stopped --yes`.
4. **Hygiene**: wire `runpod-deploy ls-stale` into a weekly cron or CI job to detect drift. See [`recipes/stale-pod-audit.md`](recipes/stale-pod-audit.md).

---

## prompt-injection-v3 Migration

This document walks `prompt-injection-v3` consumers through replacing v3's hand-rolled deploy commands (`uv run reviewer-runpod`, `uv run v3-1-runpod`, `uv run v3-1-runpod-ephemeral`) with thin wrappers around `runpod-deploy run`.

If you're migrating a different consumer (e.g., a fresh project), skip this doc and go straight to [`quickstart.md`](quickstart.md).

## Why migrate

`prompt-injection-v3` (the project) pre-dates `runpod-deploy` (the tool). The v3-era deploy scripts were hand-rolled bash that duplicated GPU/DC failover, staging, and artifact-pull logic. Every sweep maintenance change required editing six different scripts in parallel.

`runpod-deploy` absorbs those primitives:

- **GPU/DC failover**: `pod.gpu_order` + `pod.datacenters` iterate the matrix automatically; v3 had to encode this in bash per script.
- **Staging excludes**: `staging[].excludes_default` + standard rsync excludes replace the `--exclude` flag stacks v3 maintained inline.
- **Cost capping**: `budget.cost_cap_usd` both enforces the per-invocation budget *and* derives the implicit runtime ceiling; v3 had cost caps only via `timeout` on `runpodctl pod create`.
- **Deploy metadata capture**: git SHA + lockfile hash land in `runpod_deploy_pull_manifest.json` automatically; v3 hand-rolled `GIT_SHA=$(git rev-parse HEAD)` injection.
- **Artifact pull manifest**: `runpod_deploy_pull_manifest.json` records what was pulled, when, and at what cost; v3 had an ad-hoc `pulled_log.txt`.

The migration is mechanical: each v3 deploy command becomes a YAML config + a Makefile target that invokes `runpod-deploy run`.
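
To make the GPU/DC failover bullet above concrete, here is a minimal sketch of what "iterating the matrix" could look like. It is not `runpod-deploy`'s actual implementation: the function name `failover_candidates` is hypothetical, and the iteration order (GPU preference outermost, datacenters innermost) is an assumption that the real tool may not share.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def failover_candidates(
    gpu_order: Sequence[str], datacenters: Sequence[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (gpu, datacenter) pairs in preference order.

    Assumption: the most-preferred GPU is tried in every datacenter
    before falling back to the next GPU. The real ordering used by
    runpod-deploy is not stated in this document.
    """
    # product() varies the last argument fastest, so datacenters cycle
    # inside each GPU choice.
    yield from product(gpu_order, datacenters)
```

A caller would try to create a pod for each pair in turn and stop at the first success, which is the loop the v3 scripts had to spell out in bash once per script.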
## One-time setup

In the `prompt-injection-v3` repo:

```sh
# Add runpod-deploy as an optional dependency
# (in pyproject.toml, under [project.optional-dependencies]):
#   cloud = ["runpod-deploy>=0.8.1"]
uv sync --extra cloud

# runpod-deploy is now at .venv/bin/runpod-deploy
.venv/bin/runpod-deploy --help
```

This is the recommended "consumer-owned configs" pattern (see the runpod-deploy README's "Consumer-owned configs" section).

## Per-job migration

### Step 1: write the YAML config

Create `configs/runpod/.yaml` in your v3 repo. Use [`quickstart.md`](quickstart.md) as the template; reference [`config-reference.md`](config-reference.md) for field semantics.

The v3-era environment variables and command-line flags map to YAML sections as follows:

| v3 hand-rolled | runpod-deploy YAML |
|---|---|
| `--gpu-type`, `--gpu-type-fallback` | `pod.gpu_order` (ordered list) |
| `--datacenter`, `--datacenter-fallback` | `pod.datacenters` (ordered list) |
| `--cost-cap-usd` (per-script) | `budget.cost_cap_usd` |
| `--timeout-minutes` | `budget.max_runtime_minutes` |
| `--cloud-type` | `pod.cloud_type` (`SECURE` or `COMMUNITY`) |
| Hand-rolled `rsync --exclude=foo` | `staging[].excludes_extra: [foo]` |
| Hand-rolled `git rev-parse HEAD` | Auto-captured in manifest |
| Inline bash run script | `run.body` (multi-line YAML string) |

### Step 2: validate

```sh
runpod-deploy validate --config configs/runpod/.yaml --all
```

The `--all` flag runs every opt-in check: schema validation, local path existence, GPU availability against the configured datacenters, and the consumer pyproject scan. Fix anything it flags before paying for a pod.

### Step 3: dry-run

```sh
runpod-deploy run --config configs/runpod/.yaml --offline-dry-run
```

`--offline-dry-run` walks the command shape without hitting the network: no `runpodctl` calls, no SSH, no rsync. It confirms that the orchestrator state machine accepts your config end-to-end.
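
If you migrate several jobs, Steps 2 and 3 can be chained in a small helper. The sketch below is illustrative only: `preflight_commands` and `run_preflight` are hypothetical names, and it assumes that `runpod-deploy` exits non-zero when validation or the dry-run fails, which this document does not state.

```python
import subprocess
from typing import Callable, List


def preflight_commands(config_path: str) -> List[List[str]]:
    """Return the Step 2 and Step 3 commands, in the order they should run."""
    return [
        ["runpod-deploy", "validate", "--config", config_path, "--all"],
        ["runpod-deploy", "run", "--config", config_path, "--offline-dry-run"],
    ]


def run_preflight(config_path: str, runner: Callable = subprocess.run) -> int:
    """Run validate, then the offline dry-run; stop at the first failure.

    Assumption: a non-zero exit code signals a failed check.
    `runner` is injectable so the helper can be exercised without the CLI.
    """
    for command in preflight_commands(config_path):
        result = runner(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_preflight` with the path to one of your configs returns `0` only when both steps pass, so it can gate the real run in Step 4.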
### Step 4: real run

```sh
runpod-deploy run --config configs/runpod/.yaml
```

On success, your artifacts land under `artifacts/runpod//` along with `runpod_deploy_pull_manifest.json`. Before v0.8.2, the pod is then stopped automatically per `stop.on_success: true`; from v0.8.2 the default is `lifecycle.on_success: delete`, which deletes the pod and releases its volume disk.

### Step 5: keep the v3 command name (optional)

If you want `uv run reviewer-runpod` to keep working as a thin shim, add a one-line wrapper entry to `pyproject.toml`:

```toml
[project.scripts]
reviewer-runpod = "your_v3_module.cli:reviewer_runpod_main"
```

Here `reviewer_runpod_main` is a 3-line Python function that calls `subprocess.run(["runpod-deploy", "run", "--config", "runpod/reviewer.yaml", *sys.argv[1:]])`. This lets you keep your existing tooling (`uv run reviewer-runpod --dry-run`) while the underlying execution is delegated.

## Regression testing

Before retiring the old v3 hand-rolled deploy scripts, run both in parallel for one billing cycle:

1. Run the v3 hand-rolled script: `uv run reviewer-runpod`. Note pod-id, wall time, cost.
2. Run the runpod-deploy equivalent: `runpod-deploy run --config runpod/reviewer.yaml`. Note pod-id, wall time, cost.
3. Compare the pulled artifacts byte-for-byte (`diff -r artifacts/v3-script-output/ artifacts/runpod//`).

If the artifacts diverge, do NOT retire the v3 script until you've diagnosed the cause. Common causes: missing files in `staging[]` (check `excludes_default` semantics), different environment variables (check `remote_env.exports` + `secrets`), or different `gpu_order` producing different GPU classes per shard.

## Backwards-compat timeline

- **v3.x with hand-rolled scripts**: keeps working as-is. No runpod-deploy dependency.
- **v3.x with runpod-deploy >= 0.8.1**: add the wrapper per Step 5; both invocation styles work side-by-side.
- **v4.x (planned)**: hand-rolled scripts removed; runpod-deploy is the only path. Migration deadline TBD; will be announced in v3's CHANGELOG when set.
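
For reference, the Step 5 wrapper mentioned in the timeline could look roughly like the sketch below. The module path `your_v3_module.cli` and the config path `runpod/reviewer.yaml` come from Step 5; the `build_command` helper and the choice to return the child's exit code are assumptions added here for illustration.

```python
import subprocess
import sys
from typing import List, Sequence

# Config path as given in Step 5; adjust to wherever your YAML lives.
CONFIG_PATH = "runpod/reviewer.yaml"


def build_command(extra_args: Sequence[str]) -> List[str]:
    """Build the delegated runpod-deploy invocation, forwarding extra CLI args."""
    return ["runpod-deploy", "run", "--config", CONFIG_PATH, *extra_args]


def reviewer_runpod_main() -> int:
    """Entry point for `uv run reviewer-runpod`; delegates to runpod-deploy."""
    completed = subprocess.run(build_command(sys.argv[1:]))
    # Returning the exit code lets the generated console script propagate it.
    return completed.returncode
```

With this in `your_v3_module/cli.py`, any flags passed to `uv run reviewer-runpod` (for example `--offline-dry-run`) are forwarded unchanged to `runpod-deploy run`.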
## See also

- [`quickstart.md`](quickstart.md): the canonical onboarding path for fresh consumers.
- [`config-reference.md`](config-reference.md): full YAML schema.
- [`extending.md`](extending.md): the three-tier extension story (consumers / library users / contributors).
- [`recipes/local-preflight-then-run.md`](recipes/local-preflight-then-run.md): the canonical "audit then deploy" pattern; replaces ad-hoc pre-run hooks in v3-era scripts.