Artifact Runtime · SKILL

Canonical runtime-backed view of the artifact, independent of metadata governance.

Resolved Artifact

{
  "frontmatter": {
    "description": "Autonomous LLM research skill — drives iterative training experiments on train.py, evaluating val_bpb improvements and maintaining a results log.",
    "name": "load_skill_autoresearch"
  },
  "kind": "skill",
  "playbook": "# AutoResearch Skill — Autonomous LLM Training Loop\n\nThis skill enables Hera to drive autonomous machine learning research experiments.\nIt follows Karpathy's autoresearch protocol: modify code → train → evaluate → keep/discard → repeat.\n\n## Context\n\n- **Workspace**: `/home/paulo/Programs/apps/OS/Autoresearch/`\n- **Editable file**: `train.py` — contains GPT model, optimizer, training loop\n- **Read-only file**: `prepare.py` — data loading, tokenizer, evaluation (DO NOT MODIFY)\n- **Results log**: `results.tsv` — tab-separated experiment log\n- **Time budget**: Each training run is exactly 5 minutes (wall clock)\n- **Metric**: `val_bpb` (validation bits per byte) — lower is better\n- **GPU**: RTX 3090 (24GB VRAM, Ampere architecture)\n\n## Protocol\n\n### Step 1: Load Context\n\nBefore proposing changes, read:\n1. Current `train.py` to understand the baseline\n2. `results.tsv` to see what has been tried (if exists)\n3. Recent git log to understand experiment history\n\n### Step 2: Propose Experiment\n\nUse the `ml_researcher` agent persona to reason about what to try next.\nOutput a structured proposal with:\n- What to change in `train.py`\n- Why it might help\n- Expected impact on VRAM\n\n### Step 3: Apply Changes\n\nUse the `write_file` tool to modify `train.py` with the proposed changes.\nThen commit: `git add train.py && git commit -m \"experiment: \"`\n\n### Step 4: Run Training\n\nExecute: `CUDA_VISIBLE_DEVICES=1 uv run train.py > run.log 2>&1`\n\n**Timeout**: If training exceeds 10 minutes total, kill it and treat as failure.\n\n### Step 5: Evaluate Results\n\nParse output: `grep \"^val_bpb:\\|^peak_vram_mb:\" run.log`\n\n- If grep is empty → crash. Run `tail -n 50 run.log` for stack trace.\n- If val_bpb improved → KEEP (advance branch)\n- If val_bpb is equal or worse → DISCARD (`git reset --hard HEAD~1`)\n\n### Step 6: Log Results\n\nAppend to `results.tsv`:\n```\n\\t\\t\\t\\t\n```\n\n### Step 7: Loop\n\nGo back to Step 2. **NEVER STOP** unless manually interrupted.\n\n## Key Rules\n\n1. Only modify `train.py` — everything else is read-only\n2. One change per experiment — isolate variables\n3. Simpler is better — removing code for equal results is a win\n4. Don't add new dependencies — use only what's in `pyproject.toml`\n5. Monitor VRAM — stay under 23GB peak on RTX 3090\n6. If stuck, try more radical architecture changes rather than stopping\n",
  "summary": {
    "description": "Autonomous LLM research skill — drives iterative training experiments on train.py, evaluating val_bpb improvements and maintaining a results log.",
    "id": "autoresearch",
    "kind": "skill",
    "path": "Skills/autoresearch/SKILL.md",
    "title": "load_skill_autoresearch"
  },
  "validation_issues": []
}
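The Step 5 evaluation logic in the playbook above (grep for metric lines, then keep or discard on a strict `val_bpb` improvement) can be sketched in Python. This is a minimal illustration, not part of the artifact: the function names `parse_metrics` and `decide` are invented here, and the `metric: number` line format is inferred from the playbook's grep pattern.

```python
import re

def parse_metrics(log_text: str) -> dict:
    """Extract val_bpb and peak_vram_mb from a run.log dump.

    Mirrors the playbook's grep pattern `^val_bpb:\\|^peak_vram_mb:`.
    A metric that never appears stays None (i.e. the run crashed).
    """
    metrics = {"val_bpb": None, "peak_vram_mb": None}
    for line in log_text.splitlines():
        m = re.match(r"(val_bpb|peak_vram_mb):\s*([0-9.]+)", line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics

def decide(new_bpb, best_bpb) -> str:
    """Playbook rule: keep only on strict improvement (lower is better)."""
    if new_bpb is None:
        return "CRASH"     # empty grep -> inspect tail of run.log
    return "KEEP" if new_bpb < best_bpb else "DISCARD"
```

For example, `decide(parse_metrics(log)["val_bpb"], best_so_far)` yields the KEEP/DISCARD verdict that Step 5 then maps onto `git` (advance branch, or `git reset --hard HEAD~1`).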

Validation

No validation issues detected.
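The Step 4 rule in the playbook (run training, kill it past the 10-minute hard limit, treat a timeout or crash as a failed experiment) can be sketched with the standard library. The command string comes straight from the playbook; the `cmd` parameter, the `run_training` name, and the boolean return convention are assumptions made here for illustration.

```python
import subprocess

def run_training(cmd: str = "CUDA_VISIBLE_DEVICES=1 uv run train.py > run.log 2>&1",
                 timeout_s: int = 600) -> bool:
    """Run one training experiment; False means failure (non-zero exit or timeout)."""
    try:
        # check=True raises on a non-zero exit; timeout kills a hung run.
        subprocess.run(cmd, shell=True, timeout=timeout_s, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False
```

A failed run then feeds the same DISCARD path as a worse `val_bpb`: the experiment commit is dropped with `git reset --hard HEAD~1` before the loop returns to Step 2.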