# Paper Replication Agent Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a Paper Replication Agent System that automates ML/DL paper reproduction with PyTorch code generation and TDD-driven validation.

**Architecture:** Primary Agent (paper-director) orchestrates 4 subagents (paper-analyzer, paper-image-extractor, code-writer, test-runner) through file-based context handoff. Skills provide domain-specific guidance. Commands provide entry points.

**Tech Stack:** OpenCode agents (Markdown), Skills (Markdown), Commands (Markdown), JSON config

**Spec:** `docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md`

---

## File Structure

| File | Responsibility | Action |
|------|----------------|--------|
| `.opencode/agents/paper-director.md` | Primary agent - orchestrates workflow, manages checkpoints | Create |
| `.opencode/agents/paper-analyzer.md` | Subagent - parses paper text, creates replication plan | Create |
| `.opencode/agents/paper-image-extractor.md` | Subagent - extracts and understands paper images | Create |
| `.opencode/agents/code-writer.md` | Subagent - generates PyTorch code with TDD | Create |
| `.opencode/agents/test-runner.md` | Subagent - runs tests, creates replication report | Create |
| `.opencode/skills/paper-parsing/SKILL.md` | Skill - paper analysis methodology | Create |
| `.opencode/skills/code-generation/SKILL.md` | Skill - code generation from paper | Create |
| `.opencode/skills/pytorch-patterns/SKILL.md` | Skill - PyTorch best practices | Create |
| `.opencode/skills/verification/SKILL.md` | Skill - result verification methodology | Create |
| `.opencode/skills/environment-management/SKILL.md` | Skill - Conda + uv environment setup | Create |
| `.opencode/commands/replicate.md` | Command - /replicate entry point | Create |
| `.opencode/commands/verify.md` | Command - /verify entry point | Create |
| `opencode.json` | Project configuration | Create |
| `workspace/.gitkeep` | Workspace directory placeholder | Create |

---

### Task 1: Create paper-director Agent (Primary)

**Files:**
- Create: `.opencode/agents/paper-director.md`

- [ ] **Step 1: Create agents directory**

```bash
mkdir -p .opencode/agents
```

- [ ] **Step 2: Write paper-director.md**

Create `.opencode/agents/paper-director.md`:

```markdown
---
name: paper-director
description: |
  Primary agent for ML/DL paper replication. Orchestrates the complete workflow:
  1. Creates workspace directories
  2. Dispatches paper-image-extractor to analyze images
  3. Dispatches paper-analyzer to parse paper and create replication plan
  4. Presents human checkpoint for approval
  5. Generates tests and dispatches code-writer
  6. Dispatches test-runner for final verification
  Use when: the user wants to replicate a paper or runs the /replicate command.
mode: primary
model: inherit
---

# Paper Replication Director

You are the orchestrator for ML/DL paper replication projects. Your role is to manage the complete workflow from paper analysis to working PyTorch code.

## Core Responsibilities

1. **Workspace Management**: Create and organize project directories
2. **Workflow Orchestration**: Dispatch subagents in correct sequence
3. **Quality Control**: Ensure outputs meet standards before proceeding
4. **Human Checkpoint**: Present analysis results for user approval
5. **Error Recovery**: Handle failures gracefully

## Workflow

### Phase 1: Paper Analysis

When given a paper (Markdown file or text):

1. **Create workspace directory**:
   ```
   workspace/{paper_name}/
   ├── analysis/
   ├── src/
   │   ├── models/
   │   ├── training/
   │   └── utils/
   ├── tests/
   ├── docs/
   └── reports/
   ```

2. **Dispatch @paper-image-extractor**:
   - Input: Paper file path
   - Output: `analysis/image_understanding.md`
   - Wait for completion before proceeding

3. **Dispatch @paper-analyzer**:
   - Input: Paper file + `analysis/image_understanding.md`
   - Output: `analysis/paper_structure.md` + `analysis/replication_plan.md`
   - Wait for completion before proceeding

4. **Human Checkpoint** - Present to user:
   ```
   ## Paper Analysis Complete

   ### Basic Information
   - Title: {title}
   - Core contribution: {summary}

   ### Model Architecture
   {architecture_description}

   ### Replication Targets
   {list_of_figures_to_replicate}

   ### Implementation Plan
   {planned_modules}

   ### Risks and Limitations
   {identified_risks}

   ---
   Please review and confirm to proceed, or provide corrections.
   ```

### Phase 2: Code Generation (TDD Mode)

After user approval:

1. **Load Skills**:
   - Load `code-generation` skill
   - Load `pytorch-patterns` skill
   - Load `environment-management` skill

2. **Generate Test Cases**:
   - Create test files based on replication plan
   - Tests should verify model architecture, forward pass, and loss computation

3. **Dispatch @code-writer** iteratively:
   - For each module in replication plan:
     - Provide: Analysis docs + relevant test files
     - Expect: Implementation that passes tests
   - Iterate until all tests pass (max 3 retries per module)

4. **Generate Documentation**:
   - Create `docs/README.md` with usage instructions

### Phase 3: Verification

1. **Dispatch @test-runner**:
   - Run complete test suite
   - Compare with paper's expected results
   - Generate `reports/replication_report.md`

2. **Present Final Report** to user

## Error Handling

| Error | Action |
|-------|--------|
| Paper file not found | Ask user to provide correct path |
| Image extraction fails | Mark images as "unable to parse", continue |
| Test fails after 3 retries | Mark module as "needs manual intervention", continue with others |
| Missing dependencies | Suggest installation commands |

## Output Format

Always structure your responses clearly:
- Use headers for phases
- Show progress indicators
- Highlight decisions requiring user input
- Summarize completed work before asking for confirmation
```

- [ ] **Step 3: Verify file creation**

```bash
cat .opencode/agents/paper-director.md
```

Expected: File contents match the markdown above.

- [ ] **Step 4: Commit**

```bash
git add .opencode/agents/paper-director.md
git commit -m "feat(agents): add paper-director primary agent

Orchestrates ML/DL paper replication workflow with human checkpoint."
```
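
For reference, the workspace layout that paper-director is asked to create in Phase 1 can be sketched in Python. This is a minimal illustration and not part of the agent file: the helper name `create_workspace` and its `root` argument are assumptions introduced here; only the directory names come from the layout in the agent definition.

```python
from pathlib import Path

# Subdirectories taken from the paper-director workspace layout.
WORKSPACE_DIRS = (
    "analysis",
    "src/models",
    "src/training",
    "src/utils",
    "tests",
    "docs",
    "reports",
)


def create_workspace(root: Path, paper_name: str) -> Path:
    """Create workspace/{paper_name}/ with the planned subdirectories.

    Hypothetical helper for illustration; it is not an OpenCode API.
    """
    base = root / "workspace" / paper_name
    for rel in WORKSPACE_DIRS:
        # exist_ok=True makes the call safe to repeat for the same paper.
        (base / rel).mkdir(parents=True, exist_ok=True)
    return base
```

Calling `create_workspace(Path("."), "my_paper")` from the project root would produce the tree under `workspace/my_paper/`. Because existing directories are left alone, re-running it after a failed step should not disturb earlier outputs.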

---

### Task 2: Create paper-analyzer Agent (Subagent)

**Files:**
- Create: `.opencode/agents/paper-analyzer.md`

- [ ] **Step 1: Write paper-analyzer.md**

Create `.opencode/agents/paper-analyzer.md`:

```markdown
---
name: paper-analyzer
description: |
  Subagent that parses ML/DL paper text content and creates structured analysis.
  Produces paper_structure.md (what the paper contains) and replication_plan.md (what to implement).
  Requires image_understanding.md as input for complete analysis.
mode: subagent
model: inherit
permission:
  edit: allow
  bash: deny
---

# Paper Analyzer

You analyze ML/DL papers and produce structured documentation for replication.

## Required Inputs

1. **Paper content**: Markdown file or plain text
2. **Image understanding**: `image_understanding.md` from paper-image-extractor

## Required Outputs

### 1. paper_structure.md

```markdown
# Paper Structure Analysis

## Basic Information
- **Title**:
- **Authors**:
- **Year**:
- **Venue**:

## Abstract Summary
{2-3 sentence summary of core contribution}

## Problem Statement
{What problem does this paper solve?}

## Key Contributions
1. {contribution 1}
2. {contribution 2}
...

## Method Overview

### Architecture
{Text description of model architecture}
{Reference to architecture diagrams from image_understanding.md}

### Key Components
| Component | Description | Implementation Priority |
|-----------|-------------|------------------------|
| {name} | {what it does} | {high/medium/low} |

### Mathematical Formulation
{Key equations in LaTeX}

$$
L = L_{task} + \lambda L_{reg}
$$

### Training Details
- **Optimizer**:
- **Learning rate**:
- **Batch size**:
- **Epochs**:
- **Hardware**:

## Experiments

### Datasets
| Dataset | Size | Purpose |
|---------|------|---------|
| {name} | {size} | {train/eval/test} |

### Metrics
- {metric 1}: {description}
- {metric 2}: {description}

### Key Results
{Reference to result figures from image_understanding.md}
{Numerical results to reproduce}

## Appendix Notes
{Any supplementary material findings}
```

### 2. replication_plan.md

```markdown
# Replication Plan

## Scope
{What will be replicated vs. what is out of scope}

## Implementation Order

### Module 1: {name}
- **File**: `src/models/{filename}.py`
- **Dependencies**: None
- **Test file**: `tests/test_{filename}.py`
- **Acceptance criteria**:
  - [ ] Forward pass produces correct output shape
  - [ ] Gradient flow verified
  - [ ] {specific behavior from paper}

### Module 2: {name}
...

## Replication Targets

### Figure X: {description}
- **Type**: {architecture diagram / training curve / comparison table}
- **Data source**: {what computation produces this}
- **Priority**: {high/medium/low}
- **Expected values**: {numerical ranges if applicable}

## Environment Requirements
- Python >= 3.10
- PyTorch >= 2.0
- {other dependencies}

## Estimated Effort
- Core model: {X hours}
- Training pipeline: {X hours}
- Evaluation: {X hours}

## Known Challenges
1. {challenge}: {mitigation strategy}
```

## Analysis Methodology

When analyzing a paper:

1. **First pass**: Extract basic info (title, authors, abstract)
2. **Method pass**: Understand architecture and algorithms
3. **Experiment pass**: Identify what needs to be reproduced
4. **Integration pass**: Combine with image_understanding.md
5. **Planning pass**: Create actionable replication plan

## Quality Checklist

Before completing:
- [ ] All sections of paper_structure.md filled
- [ ] Image descriptions integrated from image_understanding.md
- [ ] Replication plan has clear module boundaries
- [ ] Each module has testable acceptance criteria
- [ ] Dependencies between modules identified
- [ ] Numerical targets extracted where available
```

- [ ] **Step 2: Verify file creation**

```bash
cat .opencode/agents/paper-analyzer.md
```

Expected: File contents match the markdown above.

- [ ] **Step 3: Commit**

```bash
git add .opencode/agents/paper-analyzer.md
git commit -m "feat(agents): add paper-analyzer subagent

Parses paper text and creates replication plan with testable criteria."
```
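
The first quality-checklist item for paper-analyzer (all sections of `paper_structure.md` present) could be checked mechanically. The sketch below is one possible approach, assuming the level-2 headings from the template are used verbatim; the function name `missing_sections` is hypothetical, and the check only confirms that a heading exists, not that its section is filled in.

```python
import re

# Level-2 headings from the paper_structure.md template.
REQUIRED_SECTIONS = (
    "Basic Information",
    "Abstract Summary",
    "Problem Statement",
    "Key Contributions",
    "Method Overview",
    "Experiments",
    "Appendix Notes",
)


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required level-2 headings that do not appear in the text."""
    found = {
        match.group(1).strip()
        for match in re.finditer(
            r"^## +(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE
        )
    }
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty return value means every expected heading is present; level-3 headings such as `### Architecture` are ignored by design.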

---

### Task 3: Create paper-image-extractor Agent (Subagent)

**Files:**
- Create: `.opencode/agents/paper-image-extractor.md`

- [ ] **Step 1: Write paper-image-extractor.md**

Create `.opencode/agents/paper-image-extractor.md`:

```markdown
---
name: paper-image-extractor
description: |
  Subagent that extracts and understands images from ML/DL papers.
  Analyzes architecture diagrams, experiment plots, algorithm pseudocode, and equations.
  Output is used by paper-analyzer to create complete replication plan.
mode: subagent
model: inherit
permission:
  edit: allow
  bash:
    "*": deny
    "ls *": allow
---

# Paper Image Extractor

You extract and analyze images from ML/DL papers, producing detailed text descriptions that enable code replication.

## Required Input

- Paper file path (Markdown with image references)

## Required Output

`image_understanding.md` in the analysis directory.

## Output Format

```markdown
# Image Understanding

## Summary
- Total images found: {N}
- Architecture diagrams: {N}
- Experiment figures: {N}
- Algorithm/pseudocode: {N}
- Equations/tables: {N}

---

## Image 1: {caption or identifier}

**Type**: Architecture Diagram | Experiment Plot | Algorithm | Equation | Table | Other

**Location**: {file path or URL}

**Description**:
{Detailed text description of what the image shows}

### For Architecture Diagrams:

**Components**:
| Layer/Block | Input Shape | Output Shape | Parameters |
|-------------|-------------|--------------|------------|
| {name} | {shape} | {shape} | {count if shown} |

**Data Flow**:
1. Input → {first operation}
2. {intermediate steps}
3. → Output

**Key Details**:
- {notable architectural choices}
- {skip connections, attention mechanisms, etc.}

### For Experiment Plots:

**Axes**:
- X-axis: {label} (range: {min}-{max})
- Y-axis: {label} (range: {min}-{max})

**Data Series**:
| Series | Description | Key Points |
|--------|-------------|------------|
| {name/color} | {what it represents} | {peak value, convergence point, etc.} |

**Numerical Extraction**:
- At x={value}: y≈{value}
- Final value: {value}
- Best result: {value}

**Trends**:
- {observed patterns}

### For Algorithm/Pseudocode:

**Algorithm Name**: {name}

**Inputs**: {list}
**Outputs**: {list}

**Steps**:
1. {step 1}
2. {step 2}
...

**Python Translation Hint**:
```python
# Suggested structure
def algorithm_name(inputs):
    # step 1
    # step 2
    return outputs
```

### For Equations:

**Equation**:
$$
{LaTeX representation}
$$

**Variables**:
- {symbol}: {meaning}

**Implementation Notes**:
- {how to compute this in PyTorch}

---

## Image 2: ...
```

## Analysis Guidelines

### Architecture Diagrams
- Identify all layers/blocks and their connections
- Note input/output shapes when visible
- Capture skip connections, residual paths
- Identify attention mechanisms, normalization layers
- Note any dimension annotations

### Experiment Plots
- Extract actual numerical values where possible
- Identify which curve corresponds to the paper's method
- Note baseline comparisons
- Capture convergence behavior
- Identify error bars or confidence intervals

### Algorithm Pseudocode
- Convert to structured steps
- Identify loops, conditions
- Note any hyperparameters mentioned
- Suggest PyTorch equivalents

### Equations
- Transcribe to LaTeX
- Define all variables
- Note how to implement in code

## Replication Priority

Mark each image with replication priority:
- **HIGH**: Core architecture, main results to reproduce
- **MEDIUM**: Training curves, ablation studies
- **LOW**: Conceptual diagrams, background figures

## Quality Checklist

Before completing:
- [ ] All images in paper cataloged
- [ ] Architecture diagrams have layer-by-layer breakdown
- [ ] Experiment figures have numerical values extracted
- [ ] Equations transcribed to LaTeX
- [ ] Replication priorities assigned
- [ ] Output enables paper-analyzer to create complete plan
```

- [ ] **Step 2: Verify file creation**

```bash
cat .opencode/agents/paper-image-extractor.md
```

Expected: File contents match the markdown above.

- [ ] **Step 3: Commit**

```bash
git add .opencode/agents/paper-image-extractor.md
git commit -m "feat(agents): add paper-image-extractor subagent

Analyzes paper images to extract architecture details and numerical results."
```
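
To support the checklist item "All images in paper cataloged", the image references in a Markdown paper can be listed before analysis begins. The following is a rough sketch under stated assumptions: it only recognizes inline `![alt](target)` syntax (not reference-style images or HTML `<img>` tags), and the names `IMAGE_PATTERN` and `list_image_references` are introduced here for illustration.

```python
import re

# Inline Markdown image syntax: ![alt text](target "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)(?:\s+\"[^\"]*\")?\s*\)")


def list_image_references(markdown_text: str) -> list[tuple[str, str]]:
    """Return (alt_text, target) pairs in document order."""
    return [
        (match.group(1), match.group(2))
        for match in IMAGE_PATTERN.finditer(markdown_text)
    ]
```

The length of the returned list gives a lower bound for "Total images found" in the summary, which can then be compared against the number of `## Image N` entries in `image_understanding.md`.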

---

### Task 4: Create code-writer Agent (Subagent)

**Files:**
- Create: `.opencode/agents/code-writer.md`

- [ ] **Step 1: Write code-writer.md**

Create `.opencode/agents/code-writer.md`:

```markdown
---
name: code-writer
description: |
  Subagent that generates PyTorch code based on paper analysis.
  Works in TDD mode: receives test files, writes code to pass tests.
  Also manages project environment using Conda + uv.
mode: subagent
model: inherit
permission:
  edit: allow
  bash:
    "*": allow
---

# Code Writer

You generate PyTorch code to replicate ML/DL papers, working in strict TDD mode.

## Required Inputs

1. `paper_structure.md` - Paper analysis
2. `image_understanding.md` - Image analysis
3. `replication_plan.md` - Implementation plan
4. Test files for the module to implement

## Working Mode: TDD

**Iron Rule**: Write code ONLY to make failing tests pass.

1. Receive test file
2. Run test to verify it fails
3. Write minimal code to pass
4. Run test to verify it passes
5. Refactor if needed (keeping tests green)

## Environment Setup

Before writing any code, ensure the environment is ready:

### Step 1: Check/Create Conda Base

```bash
# Check if ai_base exists
conda env list | grep ai_base

# If not exists, create it
conda create -n ai_base python=3.10 -y
```

### Step 2: Create Project Environment

```bash
cd workspace/{paper_name}

# Create uv virtual environment using Conda's Python
uv venv --python "$(conda run -n ai_base which python)"

# On Windows:
# uv venv --python $(conda run -n ai_base python -c "import sys; print(sys.executable)")
```

### Step 3: Create pyproject.toml

```toml
[project]
name = "{paper_name}"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "torch>=2.0.0",
    "numpy>=1.24.0",
    "matplotlib>=3.7.0",
    "tqdm>=4.65.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "pytest-cov>=4.0.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```

### Step 4: Install Dependencies

```bash
# Activate and install
source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate   # Windows

uv pip install -e ".[dev]"
```

## Code Generation Guidelines

### Model Architecture

```python
"""
{module_name}.py

Implements {component} from "{paper_title}"
Reference: Section {X}, Figure {Y}
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional, Tuple


class {ComponentName}(nn.Module):
    """
    {Brief description from paper}

    Args:
        {param}: {description}

    Paper reference:
        - Architecture: Figure {X}
        - Equation: ({Y})
    """

    def __init__(self, {params}):
        super().__init__()
        # Initialize layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass.

        Args:
            x: Input tensor of shape {expected_shape}

        Returns:
            Output tensor of shape {output_shape}
        """
        # Implementation
        return output
```

### Training Scripts

```python
"""
train.py

Training script for {paper_title} replication.
"""

import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

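def train_epoch(model, dataloader, optimizer, criterion, device):
    """Single training epoch; returns the mean loss per batch."""
    model.train()
    total_loss = 0.0

    # Assumes each batch is an (inputs, targets) pair; adapt to the paper's data.
    for inputs, targets in tqdm(dataloader, desc="Training"):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    return total_loss / len(dataloader)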


def main():
    # Configuration from paper
    config = {
        "lr": 1e-4,  # Section X
        "batch_size": 32,  # Section X
        "epochs": 100,
    }

    # Setup
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Model, optimizer, criterion
    # ...

    # Training loop
    for epoch in range(config["epochs"]):
        loss = train_epoch(model, train_loader, optimizer, criterion, device)
        print(f"Epoch {epoch+1}: Loss = {loss:.4f}")


if __name__ == "__main__":
    main()
```

## File Organization

```
src/
├── __init__.py
├── models/
│   ├── __init__.py
│   ├── {main_model}.py
│   └── {component}.py
├── training/
│   ├── __init__.py
│   ├── train.py
│   ├── losses.py
│   └── optimizers.py
└── utils/
    ├── __init__.py
    ├── data.py
    └── metrics.py
```

## Quality Checklist

Before completing each module:
- [ ] All tests pass
- [ ] Type hints on all public functions
- [ ] Docstrings with paper references
- [ ] Input/output shapes documented
- [ ] No hardcoded magic numbers (use config)
- [ ] Device-agnostic (CPU/GPU)
```

- [ ] **Step 2: Verify file creation**

```bash
cat .opencode/agents/code-writer.md
```

Expected: File contents match the markdown above.

- [ ] **Step 3: Commit**

```bash
git add .opencode/agents/code-writer.md
git commit -m "feat(agents): add code-writer subagent

Generates PyTorch code in TDD mode with environment management."
```
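
The TDD working mode of code-writer, combined with the director's limit of three retries per module, can be summarized as a small control loop. This is a sketch only: `tdd_cycle`, `run_tests`, and `write_code` are hypothetical names, the two callables stand in for whatever actually runs pytest and edits the code, and "3 retries" is interpreted here as three implementation attempts in total.

```python
from typing import Callable


def tdd_cycle(
    run_tests: Callable[[], bool],
    write_code: Callable[[int], None],
    max_retries: int = 3,
) -> str:
    """Drive one module through the fail-then-pass loop.

    Returns "invalid-test" if the tests pass before any code is written,
    "passed" once the tests succeed, or "needs manual intervention" when
    the attempt budget is exhausted.
    """
    # The test must fail first; otherwise it does not exercise new code.
    if run_tests():
        return "invalid-test"
    for attempt in range(1, max_retries + 1):
        write_code(attempt)  # write or revise the minimal implementation
        if run_tests():      # verify the tests now pass
            return "passed"
    return "needs manual intervention"
```

The final return value mirrors the error-handling rule in paper-director, where a module that still fails after the retry limit is marked for manual intervention and the workflow continues with the remaining modules.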

---

### Task 5: Create test-runner Agent (Subagent)

**Files:**
- Create: `.opencode/agents/test-runner.md`

- [ ] **Step 1: Write test-runner.md**

Create `.opencode/agents/test-runner.md`:

```markdown
---
name: test-runner
description: |
  Subagent that runs tests, verifies code correctness, and generates replication reports.
  Compares results with paper's expected values and documents any differences.
mode: subagent
model: inherit
permission:
  edit: allow
  bash:
    "*": allow
---

# Test Runner

You run tests, verify replication correctness, and generate comprehensive reports.

## Required Inputs

1. Generated code in `src/`
2. Test files in `tests/`
3. `replication_plan.md` with expected results

## Required Outputs

1. Test execution results
2. `reports/replication_report.md`

## Workflow

### Step 1: Run Test Suite

```bash
cd workspace/{paper_name}
source .venv/bin/activate

# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=term-missing
```

### Step 2: Verify Replication Targets

For each target in replication_plan.md:

1. Run the relevant computation
2. Compare with expected values
3. Calculate deviation

### Step 3: Generate Report

## Report Format

```markdown
# Replication Report: {Paper Title}

**Date**: {date}
**Status**: {Complete | Partial | Failed}

## Summary

| Metric | Status |
|--------|--------|
| Tests Passing | {X}/{Y} |
| Code Coverage | {X}% |
| Replication Accuracy | {qualitative} |

## Test Results

### Unit Tests

| Test | Status | Time |
|------|--------|------|
| test_model_forward | PASS | 0.1s |
| test_loss_computation | PASS | 0.05s |
| ... | ... | ... |

### Failed Tests (if any)

#### {test_name}
- **Error**: {error message}
- **Expected**: {expected}
- **Actual**: {actual}
- **Likely cause**: {analysis}

## Replication Targets

### Figure X: {description}

**Status**: Replicated | Partially Replicated | Not Replicated

**Paper Values**:
| Metric | Paper | Ours | Deviation |
|--------|-------|------|-----------|
| {metric} | {value} | {value} | {%} |

**Analysis**:
{explanation of any differences}

### Table Y: {description}

...

## Code Quality

- **Type Safety**: {assessment}
- **Documentation**: {assessment}
- **Test Coverage**: {percentage}

## Reproducibility Checklist

- [ ] Environment setup documented
- [ ] Random seeds set
- [ ] Hyperparameters match paper
- [ ] Data preprocessing matches paper
- [ ] Evaluation metrics match paper

## Known Differences from Paper

1. **{difference}**: {explanation and justification}

## Recommendations

1. {recommendation for improvement}

## Appendix: Full Test Output

```
{pytest output}
```
```

## Deviation Thresholds

| Deviation | Classification |
|-----------|----------------|
| < 1% | Excellent match |
| 1-5% | Acceptable |
| 5-10% | Needs investigation |
| > 10% | Significant difference |

## Analysis Guidelines

When results differ from paper:

1. Check implementation against paper equations
2. Verify hyperparameters
3. Check data preprocessing
4. Consider numerical precision differences
5. Note if paper has known errata

## Quality Checklist

Before completing:
- [ ] All tests executed
- [ ] Coverage report generated
- [ ] Each replication target evaluated
- [ ] Deviations analyzed and explained
- [ ] Recommendations provided
- [ ] Report is self-contained
```

- [ ] **Step 2: Verify file creation**

```bash
cat .opencode/agents/test-runner.md
```

Expected: File contents match the markdown above.

- [ ] **Step 3: Commit**

```bash
git add .opencode/agents/test-runner.md
git commit -m "feat(agents): add test-runner subagent

Runs tests and generates comprehensive replication reports."
```
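
The deviation thresholds in the test-runner definition can be applied with a short helper. The sketch below assumes that "deviation" means the absolute difference relative to the paper's value, expressed as a percentage, and that a value exactly on a boundary (5% or 10%) falls into the lower band; both are assumptions made here, as are the function names.

```python
def relative_deviation_percent(paper_value: float, our_value: float) -> float:
    """Absolute deviation from the paper value, as a percentage of it."""
    if paper_value == 0:
        raise ValueError("relative deviation is undefined for a paper value of 0")
    return abs(our_value - paper_value) / abs(paper_value) * 100.0


def classify_deviation(deviation_percent: float) -> str:
    """Map a deviation percentage to the classification labels in the table."""
    if deviation_percent < 1.0:
        return "Excellent match"
    if deviation_percent <= 5.0:
        return "Acceptable"
    if deviation_percent <= 10.0:
        return "Needs investigation"
    return "Significant difference"
```

For metrics whose paper value can be zero, or where an absolute gap is more meaningful than a relative one, a different deviation measure would be needed.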

---

### Task 6: Create paper-parsing Skill

**Files:**
- Create: `.opencode/skills/paper-parsing/SKILL.md`

- [ ] **Step 1: Create skills directory**

```bash
mkdir -p .opencode/skills/paper-parsing
```

- [ ] **Step 2: Write SKILL.md**

Create `.opencode/skills/paper-parsing/SKILL.md`:

```markdown
---
name: paper-parsing
description: Use when analyzing ML/DL papers to ensure comprehensive extraction of all relevant information
---

# Paper Parsing Methodology

## Overview

Systematic approach to parsing ML/DL papers for replication. Emphasizes **completeness** and **openness** to avoid missing critical details.

**Announce at start:** "I'm using the paper-parsing skill to ensure comprehensive paper analysis."

## Core Philosophy

1. **Completeness over speed**: Better to extract too much than miss something
2. **Open-ended discovery**: Papers contain unique insights; don't force into templates
3. **Cross-reference**: Information appears in multiple places; cross-check
4. **Explicit uncertainty**: Mark unclear items rather than guessing

## Paper Sections Checklist

### Abstract
- [ ] Core contribution identified
- [ ] Key results/numbers extracted
- [ ] Problem domain understood

### Introduction
- [ ] Problem motivation clear
- [ ] Gap in existing work identified
- [ ] Proposed solution summarized
- [ ] Claimed contributions listed

### Related Work
- [ ] Key prior methods identified
- [ ] Differences from this work noted
- [ ] Potential baselines for comparison

### Method / Approach
- [ ] Architecture fully described
- [ ] All components identified
- [ ] Mathematical formulation complete
- [ ] Training procedure detailed
- [ ] Loss functions specified
- [ ] Hyperparameters listed

### Experiments
- [ ] Datasets listed with sizes
- [ ] Evaluation metrics defined
- [ ] Baseline comparisons noted
- [ ] Ablation studies cataloged
- [ ] Key numerical results extracted

### Appendix / Supplementary
- [ ] Additional implementation details
- [ ] Extended results
- [ ] Proofs or derivations
- [ ] Code references

## Information Extraction Patterns

### Architecture Details

Look for:
- Layer types and configurations
- Activation functions
- Normalization methods
- Attention mechanisms
- Skip connections
- Input/output dimensions

Common locations:
- Method section figures
- Architecture diagrams
- Table of hyperparameters
- Appendix implementation details

### Training Configuration

| Parameter | Typical Locations |
|-----------|-------------------|
| Learning rate | Experiments, Appendix |
| Batch size | Experiments, Appendix |
| Optimizer | Method, Appendix |
| Epochs | Experiments |
| Hardware | Experiments, Appendix |
| Training time | Experiments |

### Numerical Results

Extract from:
- Main results tables
- Comparison figures
- Ablation tables
- Training curves (approximate values)

Format as:
| Metric | Dataset | Value | Conditions |
|--------|---------|-------|------------|
| Accuracy | CIFAR-10 | 95.2% | ResNet-50 backbone |

## Common Omissions to Watch For

1. **Initialization**: Often in appendix or not mentioned
2. **Data augmentation**: May be standard but unspecified
3. **Early stopping criteria**: Often implied
4. **Evaluation protocol**: Train/val/test split details
5. **Random seeds**: Reproducibility details
6. **Software versions**: PyTorch, CUDA versions

## Quality Verification

Before completing analysis:

1. **Coverage check**: Every section reviewed?
2. **Consistency check**: Numbers match across sections?
3. **Completeness check**: Could someone implement from this?
4. **Ambiguity check**: Unclear items marked?

## Output Quality Markers

Good analysis:
- Specific numbers, not "good performance"
- Exact layer configs, not "standard ResNet"
- Explicit uncertainty markers
- Cross-references between sections

Poor analysis:
- Vague descriptions
- Missing hyperparameters
- No numerical targets
- Assumptions without noting them

## Red Flags

If you notice:
- "Implementation details in code" → Check GitHub link
- "Standard settings" → Look up the standard
- "Following [citation]" → May need to read that paper
- Inconsistent numbers → Note the discrepancy
```

- [ ] **Step 3: Verify file creation**

```bash
cat .opencode/skills/paper-parsing/SKILL.md
```

Expected: File contents match the markdown above.

- [ ] **Step 4: Commit**

```bash
git add .opencode/skills/paper-parsing/SKILL.md
git commit -m "feat(skills): add paper-parsing skill

Comprehensive methodology for ML/DL paper analysis."
```
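
The consistency check in the skill ("Numbers match across sections?") lends itself to a simple grouping pass over the extracted results. The following is an illustrative sketch: the `ReportedValue` record and `find_inconsistencies` function are hypothetical, and the fields loosely follow the Metric / Dataset / Value columns of the results table, with `source` added to record where each number was found.

```python
from collections import defaultdict
from typing import NamedTuple


class ReportedValue(NamedTuple):
    metric: str
    dataset: str
    value: float
    source: str  # where in the paper the number was found


def find_inconsistencies(
    values: list[ReportedValue], tolerance: float = 1e-9
) -> dict[tuple[str, str], list[ReportedValue]]:
    """Group values by (metric, dataset) and keep groups whose numbers disagree."""
    groups: dict[tuple[str, str], list[ReportedValue]] = defaultdict(list)
    for item in values:
        groups[(item.metric, item.dataset)].append(item)
    return {
        key: items
        for key, items in groups.items()
        if max(i.value for i in items) - min(i.value for i in items) > tolerance
    }
```

Each flagged group keeps its sources, so the discrepancy can be noted in the analysis rather than silently resolved, in line with the "explicit uncertainty" principle.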

---

### Task 7: Create code-generation Skill

**Files:**
- Create: `.opencode/skills/code-generation/SKILL.md`

- [ ] **Step 1: Create skill directory**

```bash
mkdir -p .opencode/skills/code-generation
```

- [ ] **Step 2: Write SKILL.md**

Create `.opencode/skills/code-generation/SKILL.md`:

```markdown
---
name: code-generation
description: Use when generating PyTorch code from paper analysis to ensure correct mapping from paper to code
---

# Code Generation from Papers

## Overview

Guidelines for translating paper descriptions into working PyTorch code.

**Announce at start:** "I'm using the code-generation skill to ensure accurate paper-to-code translation."

## Core Principles

1. **Traceability**: Every code block should reference paper section/equation
2. **Testability**: Write code that can be unit tested
3. **Readability**: Prefer clarity over cleverness
4. **Modularity**: One component per file

## Paper-to-Code Mapping

### Architecture Diagrams → nn.Module

| Diagram Element | PyTorch Equivalent |
|-----------------|-------------------|
| Box/Block | nn.Module subclass |
| Arrow | forward() call chain |
| Split | Multiple outputs / tuple |
| Merge | torch.cat / torch.add |
| Skip connection | Residual addition |

### Equations → Tensor Operations

| Notation | PyTorch |
|----------|---------|
| $Wx + b$ | `nn.Linear(in_features, out_features)` |
| $\sigma(x)$ | `torch.sigmoid(x)` or `nn.Sigmoid()` |
| $\text{softmax}(x)$ | `F.softmax(x, dim=-1)` |
| $\|x\|_2$ | `torch.linalg.vector_norm(x, ord=2)` |
| $x \odot y$ | `x * y` (element-wise) |
| $x^T y$ | `torch.matmul(x.T, y)` or `x.T @ y` |
| $\sum_i$ | `torch.sum(x, dim=i)` |
| $\mathbb{E}[x]$ | `torch.mean(x)` |

### Loss Functions

| Paper Description | PyTorch |
|-------------------|---------|
| Cross-entropy | `nn.CrossEntropyLoss()` |
| MSE / L2 | `nn.MSELoss()` |
| L1 | `nn.L1Loss()` |
| BCE | `nn.BCEWithLogitsLoss()` |
| KL divergence | `nn.KLDivLoss(reduction="batchmean")` (input must be log-probabilities) |
| Custom | Subclass or functional |

## Code Structure Template

```python
"""
{component_name}.py

Implements {what} from "{paper_title}" ({year})

Paper Reference:
- Section: {section_number}
- Equation: ({equation_number})
- Figure: {figure_number}

Author: Auto-generated for paper replication
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional, Tuple, List


class {ComponentName}(nn.Module):
    """
    {One-line description}

    From paper: "{exact quote or paraphrase}"

    Args:
        {param1}: {description} (paper: {where specified})
        {param2}: {description}

    Shape:
        - Input: {shape description}
        - Output: {shape description}

    Example:
        >>> layer = {ComponentName}(dim=512)
        >>> x = torch.randn(32, 100, 512)
        >>> out = layer(x)
        >>> out.shape
        torch.Size([32, 100, 512])
    """

    def __init__(
        self,
        {param1}: {type},
        {param2}: {type} = {default},
    ):
        super().__init__()

        # Paper Section X.Y: "{description}"
        self.layer1 = nn.Linear(...)

        # Equation (N): ...
        self.layer2 = nn.LayerNorm(...)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass implementing Equation (N).

        Args:
            x: Input tensor of shape (batch, seq, dim)

        Returns:
            Output tensor of shape (batch, seq, dim)
        """
        # Step 1: ... (Eq. N, first term)
        h = self.layer1(x)

        # Step 2: ... (Eq. N, second term)
        out = self.layer2(h)

        return out
```

## Common Patterns

### Residual Connection

```python
# Paper: "We add a residual connection"
out = self.sublayer(x) + x
```

### Layer Normalization

```python
# Paper: "Pre-LN Transformer" (normalize inside the residual branch)
x = x + self.attention(self.norm(x))

# Paper: "Post-LN Transformer"
x = x + self.attention(x)
x = self.norm(x)
```

### Multi-Head Attention

```python
# Paper: "Standard multi-head attention with h heads"
self.attention = nn.MultiheadAttention(
    embed_dim=d_model,
    num_heads=h,
    dropout=dropout,
    batch_first=True,
)
```
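
As a quick shape check for the layer above, the following sketch runs self-attention on a random batch. The sizes (`d_model = 64`, `h = 8`, batch of 4, sequence length 10) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

d_model, h = 64, 8  # Illustrative sizes, not taken from a specific paper

attention = nn.MultiheadAttention(
    embed_dim=d_model,
    num_heads=h,
    dropout=0.0,
    batch_first=True,
)

# With batch_first=True the layer expects (batch, seq, d_model)
x = torch.randn(4, 10, d_model)

# Self-attention: query, key, and value are the same tensor
out, weights = attention(x, x, x)

out_shape = tuple(out.shape)          # (4, 10, 64)
weights_shape = tuple(weights.shape)  # (4, 10, 10), averaged over heads by default
```

If the paper needs per-head attention maps, `average_attn_weights=False` can be passed to the forward call; the returned weights then gain a head dimension.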

### Custom Activation

```python
# Paper: "We use GELU activation"
x = F.gelu(x)

# Paper: "We use Swish/SiLU activation"
x = F.silu(x)
```

## Handling Ambiguity

When the paper is unclear:

1. **Check code repository** if available
2. **Follow common practice** for the architecture type
3. **Document assumption** in code comment
4. **Add TODO** for verification

```python
# TODO: Paper unclear on initialization. Using PyTorch default.
# See: https://github.com/paper/repo for reference implementation
self.linear = nn.Linear(in_dim, out_dim)
```

## Verification Checklist

Before completing a module:

- [ ] All equations implemented
- [ ] Shapes documented and verified
- [ ] Paper references in comments
- [ ] Type hints complete
- [ ] Example in docstring works
- [ ] No hardcoded dimensions (use params)
- [ ] Gradient flow verified (no in-place ops breaking autograd)
```

- [ ] **Step 3: Verify file creation**

```bash
cat .opencode/skills/code-generation/SKILL.md
```

Expected: File contents match the markdown above.

- [ ] **Step 4: Commit**

```bash
git add .opencode/skills/code-generation/SKILL.md
git commit -m "feat(skills): add code-generation skill

Paper-to-PyTorch code translation guidelines."
```

---

### Task 8: Create pytorch-patterns Skill

**Files:**
- Create: `.opencode/skills/pytorch-patterns/SKILL.md`

- [ ] **Step 1: Create skill directory**

```bash
mkdir -p .opencode/skills/pytorch-patterns
```

- [ ] **Step 2: Write SKILL.md**

Create `.opencode/skills/pytorch-patterns/SKILL.md`:

```markdown
---
name: pytorch-patterns
description: Use when writing PyTorch code to follow best practices and common patterns
---

# PyTorch Best Practices

## Overview

Established patterns for writing clean, efficient, and maintainable PyTorch code.

**Announce at start:** "I'm using the pytorch-patterns skill for best practice code."

## Model Definition

### Basic Module

```python
import torch
import torch.nn as nn
from typing import Optional


class MyModel(nn.Module):
    def __init__(self, config: dict):
        super().__init__()
        self.config = config

        # Define layers
        self.encoder = nn.Linear(config["input_dim"], config["hidden_dim"])
        self.decoder = nn.Linear(config["hidden_dim"], config["output_dim"])

        # Initialize weights
        self._init_weights()

    def _init_weights(self):
        """Initialize weights following paper's specification."""
        for module in self.modules():
            if isinstance(module, nn.Linear):
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)
        h = torch.relu(h)
        out = self.decoder(h)
        return out
```

### Model with Multiple Outputs

```python
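from typing import NamedTuple, Optional

import torch
import torch.nn as nn


class ModelOutput(NamedTuple):
    logits: torch.Tensor
    hidden_states: torch.Tensor
    attention_weights: Optional[torch.Tensor] = None


class MultiOutputModel(nn.Module):
    def __init__(self, dim: int, num_classes: int, return_attention: bool = False):
        super().__init__()
        self.return_attention = return_attention
        # Illustrative layers; replace with the paper's architecture
        self.attention = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> ModelOutput:
        # Self-attention over (batch, seq, dim); attn is (batch, seq, seq)
        hidden, attn = self.attention(x, x, x)
        logits = self.classifier(hidden.mean(dim=1))
        return ModelOutput(
            logits=logits,
            hidden_states=hidden,
            attention_weights=attn if self.return_attention else None,
        )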
```

## Device Management

### Automatic Device Handling

```python
class DeviceAwareModel(nn.Module):
    @property
    def device(self) -> torch.device:
        """Get model's device from first parameter."""
        return next(self.parameters()).device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The caller is responsible for moving inputs to the model's device.
        # Tensors created inside forward must be placed there explicitly:
        mask = torch.ones(x.shape, device=self.device)
        return x * mask
```

### Training Script Device Setup

```python
def get_device() -> torch.device:
    """Get best available device."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    elif torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = get_device()
model = MyModel(config).to(device)

# Move each batch to the model's device inside the loop
for batch in dataloader:
    inputs = batch["inputs"].to(device)
    targets = batch["targets"].to(device)
```

## Training Loop

### Standard Pattern

```python
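from typing import Optional, Tuple

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm


def train_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: torch.device,
    scheduler: Optional[torch.optim.lr_scheduler.LRScheduler] = None,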
) -> float:
    """Train for one epoch."""
    model.train()
    total_loss = 0.0
    num_batches = 0

    for batch in tqdm(dataloader, desc="Training"):
        # Move to device
        inputs = batch["inputs"].to(device)
        targets = batch["targets"].to(device)

        # Forward pass
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # Backward pass
        loss.backward()

        # Gradient clipping (if needed)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        # Update
        optimizer.step()
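        if scheduler is not None:
            # Per-batch schedule (e.g. warmup); call once per epoch instead
            # if the paper specifies an epoch-based schedule
            scheduler.step()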

        total_loss += loss.item()
        num_batches += 1

    return total_loss / num_batches


@torch.no_grad()
def evaluate(
    model: nn.Module,
    dataloader: DataLoader,
    criterion: nn.Module,
    device: torch.device,
) -> Tuple[float, float]:
    """Evaluate model."""
    model.eval()
    total_loss = 0.0
    correct = 0
    total = 0

    for batch in dataloader:
        inputs = batch["inputs"].to(device)
        targets = batch["targets"].to(device)

        outputs = model(inputs)
        loss = criterion(outputs, targets)

        total_loss += loss.item()
        preds = outputs.argmax(dim=-1)
        correct += (preds == targets).sum().item()
        total += targets.size(0)

    return total_loss / len(dataloader), correct / total
```

## Data Loading

### Custom Dataset

```python
from torch.utils.data import Dataset, DataLoader


class PaperDataset(Dataset):
    def __init__(self, data_path: str, transform=None):
        self.data = self._load_data(data_path)
        self.transform = transform

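    def _load_data(self, path: str) -> list:
        # Load samples from disk; each sample should be a dict such as
        # {"inputs": tensor, "targets": tensor} to match the training loop
        raise NotImplementedError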

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> dict:
        item = self.data[idx]
        if self.transform:
            item = self.transform(item)
        return item


def get_dataloader(
    dataset: Dataset,
    batch_size: int,
    shuffle: bool = True,
    num_workers: int = 4,
    drop_last: bool = True,
) -> DataLoader:
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        pin_memory=True,      # Faster host-to-GPU copies; no benefit on CPU-only runs
        drop_last=drop_last,  # True keeps batch sizes uniform; pass False for evaluation so no samples are skipped
    )
```

## Checkpointing

### Save and Load

```python
def save_checkpoint(
    model: nn.Module,
    optimizer: torch.optim.Optimizer,
    epoch: int,
    loss: float,
    path: str,
):
    """Save training checkpoint."""
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "loss": loss,
    }, path)


def load_checkpoint(
    path: str,
    model: nn.Module,
    optimizer: Optional[torch.optim.Optimizer] = None,
) -> dict:
    """Load training checkpoint."""
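    # map_location="cpu" lets a GPU-saved checkpoint load on any machine;
    # load_state_dict then copies the values onto the model's own device
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)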
    model.load_state_dict(checkpoint["model_state_dict"])
    if optimizer is not None:
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint
```
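
A short round trip shows that this checkpoint layout survives `weights_only=True`, which only permits tensors and plain Python containers. The model size, the SGD optimizer, and the epoch and loss values below are placeholders chosen for the sketch.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pt")

    # Same dictionary layout as save_checkpoint above
    torch.save(
        {
            "epoch": 3,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            "loss": 0.42,
        },
        path,
    )

    # Restore into a freshly initialized model to show the weights round-trip
    restored = nn.Linear(4, 2)
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    restored.load_state_dict(checkpoint["model_state_dict"])

weights_match = torch.equal(model.weight, restored.weight)
```

Storing arbitrary Python objects (for example a custom config class) in the checkpoint would make `weights_only=True` fail, so keeping metadata to numbers, strings, lists, and dicts is the simpler choice.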

## Reproducibility

### Set Seeds

```python
import random
import numpy as np
import torch


def set_seed(seed: int = 42):
    """Set all random seeds for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # For deterministic behavior (may impact performance)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

## Common Gotchas

### In-place Operations

```python
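# RISKY: In-place ops raise a RuntimeError when they modify a tensor that
# autograd still needs for backward (or a leaf tensor that requires grad)
x += 1
x[:, 0] = 0

# SAFER: Out-of-place ops create new tensors
x = x + 1
x = torch.cat([torch.zeros_like(x[:, :1]), x[:, 1:]], dim=1)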
```

### Detaching for Metrics

```python
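# BAD: Stores a device tensor every step; for differentiable values such as
# a loss, this also keeps the whole computation graph alive
accuracy = (preds == targets).float().mean()
all_accs.append(accuracy)  # Memory grows with every step

# GOOD: Convert to a Python float for logging
accuracy = (preds == targets).float().mean().item()
all_accs.append(accuracy)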
```

### Mixed Precision

```python
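# torch.cuda.amp is deprecated since PyTorch 2.4; newer releases expose the
# same functionality as torch.amp.autocast("cuda") and torch.amp.GradScaler("cuda")
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for batch in dataloader:
    inputs = batch["inputs"].to(device)
    targets = batch["targets"].to(device)
    optimizer.zero_grad()

    # Run the forward pass in reduced precision where it is safe to do so
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    scaler.scale(loss).backward()  # Scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)         # Unscales gradients; skips the step on inf/NaN
    scaler.update()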
```
```

- [ ] **Step 3: Verify file creation**

```bash
cat .opencode/skills/pytorch-patterns/SKILL.md
```

Expected: File contents match the markdown above.

- [ ] **Step 4: Commit**

```bash
git add .opencode/skills/pytorch-patterns/SKILL.md
git commit -m "feat(skills): add pytorch-patterns skill

PyTorch best practices and common patterns."
```

---

### Task 9: Create verification Skill

**Files:**
- Create: `.opencode/skills/verification/SKILL.md`

- [ ] **Step 1: Create skill directory**

```bash
mkdir -p .opencode/skills/verification
```

- [ ] **Step 2: Write SKILL.md**

Create `.opencode/skills/verification/SKILL.md`:

```markdown
---
name: verification
description: Use when verifying replication results against paper's reported values
---

# Replication Verification

## Overview

Systematic approach to verifying that replicated code produces results matching the original paper.

**Announce at start:** "I'm using the verification skill to validate replication accuracy."

## Verification Levels

### Level 1: Code Correctness
- Unit tests pass
- No runtime errors
- Gradient flow works

### Level 2: Behavioral Match
- Output shapes correct
- Value ranges reasonable
- Edge cases handled

### Level 3: Numerical Match
- Results within tolerance of paper
- Trends match (even if absolute values differ)
- Statistical significance considered

## Test Design for Replication

### Shape Tests

```python
def test_model_output_shape():
    """Verify model produces correct output shape per paper."""
    model = MyModel(config)
    x = torch.randn(batch_size, seq_len, input_dim)
    out = model(x)

    # Paper Section 3.2: "Output dimension is 512"
    assert out.shape == (batch_size, seq_len, 512)
```

### Value Range Tests

```python
def test_attention_weights_sum():
    """Attention weights should sum to 1 (paper Eq. 3)."""
    model = AttentionLayer(config)
    x = torch.randn(batch_size, seq_len, dim)
    _, attn_weights = model(x, return_attention=True)

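    # Softmax output sums to 1 over the key dimension; ones_like keeps the
    # check valid whether or not the weights carry a per-head dimension
    sums = attn_weights.sum(dim=-1)
    assert torch.allclose(sums, torch.ones_like(sums))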
```

### Gradient Tests

```python
def test_gradient_flow():
    """Verify gradients flow through all parameters."""
    model = MyModel(config)
    x = torch.randn(batch_size, input_dim, requires_grad=True)
    out = model(x)
    loss = out.sum()
    loss.backward()

    for name, param in model.named_parameters():
        assert param.grad is not None, f"No gradient for {name}"
        assert not torch.isnan(param.grad).any(), f"NaN gradient for {name}"
```

### Numerical Match Tests

```python
def test_loss_value_reasonable():
    """Loss should be in expected range per paper Figure 2."""
    model = MyModel(config)
    # ... setup ...

    loss = compute_loss(model, data)

    # Paper reports initial loss ~2.3 (cross-entropy on 10 classes)
    assert 2.0 < loss.item() < 3.0, f"Initial loss {loss.item()} outside expected range"
```

## Comparison Methodology

### Absolute Comparison

```python
def compare_absolute(paper_value: float, our_value: float, tolerance: float = 0.01):
    """Compare with absolute tolerance."""
    diff = abs(paper_value - our_value)
    return diff <= tolerance, diff
```

### Relative Comparison

```python
def compare_relative(paper_value: float, our_value: float, tolerance: float = 0.05):
    """Compare with relative tolerance (5% default)."""
    if paper_value == 0:
        return our_value == 0, abs(our_value)
    relative_diff = abs(paper_value - our_value) / abs(paper_value)
    return relative_diff <= tolerance, relative_diff
```
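
The two checks can disagree on the same pair of numbers, which is why the tolerance type should be chosen per metric. The sketch below repeats both helpers so it runs on its own; the BLEU values are made-up numbers for illustration, not results from any paper.

```python
def compare_absolute(paper_value: float, our_value: float, tolerance: float = 0.01):
    """Compare with absolute tolerance."""
    diff = abs(paper_value - our_value)
    return diff <= tolerance, diff


def compare_relative(paper_value: float, our_value: float, tolerance: float = 0.05):
    """Compare with relative tolerance (5% default)."""
    if paper_value == 0:
        return our_value == 0, abs(our_value)
    relative_diff = abs(paper_value - our_value) / abs(paper_value)
    return relative_diff <= tolerance, relative_diff


# Hypothetical values: the paper reports 27.3, the replication reaches 26.9
paper_bleu, our_bleu = 27.3, 26.9

abs_ok, abs_diff = compare_absolute(paper_bleu, our_bleu)  # fails: 0.4 > 0.01
rel_ok, rel_diff = compare_relative(paper_bleu, our_bleu)  # passes: about 1.5% <= 5%
```

An absolute tolerance of 0.01 suits metrics on a 0 to 1 scale such as accuracy, while metrics on larger scales usually need either a relative tolerance or an absolute tolerance chosen for that scale.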

### Statistical Comparison

```python
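from statistics import NormalDist
from typing import List, Tuple

import numpy as np


def compare_with_variance(
    paper_mean: float,
    paper_std: float,
    our_values: List[float],
    confidence: float = 0.95,
) -> Tuple[bool, float]:
    """Compare considering paper's reported variance."""
    our_mean = float(np.mean(our_values))
    our_std = float(np.std(our_values))

    # Two-sided critical z-value for the requested confidence (about 1.96 at 95%)
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)

    # Spread of the difference between the two means
    combined_std = float(np.sqrt(paper_std**2 + our_std**2))
    if combined_std == 0:
        # No variance on either side: only an exact match passes
        return our_mean == paper_mean, 0.0 if our_mean == paper_mean else float("inf")
    z_score = abs(paper_mean - our_mean) / combined_std

    return z_score < z_critical, z_score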
```

## Common Difference Sources

### Acceptable Differences

| Source | Typical Impact | Mitigation |
|--------|---------------|------------|
| Random seed | 1-2% | Run multiple seeds |
| Floating point | < 0.1% | Use float64 for verification |
| Framework differences | 1-3% | Document and accept |
| Hardware differences | 0.5-1% | Note in report |

### Concerning Differences

| Source | Typical Impact | Action |
|--------|---------------|--------|
| Wrong architecture | > 10% | Review code vs paper |
| Wrong hyperparameters | 5-20% | Verify all settings |
| Data preprocessing | Variable | Match paper exactly |
| Evaluation protocol | Variable | Check train/val/test split |

## Verification Checklist

### Before Comparison

- [ ] Seeds set for reproducibility
- [ ] Same evaluation data as paper
- [ ] Same preprocessing pipeline
- [ ] Same evaluation metrics

### During Comparison

- [ ] Run multiple times with different seeds
- [ ] Record mean and standard deviation
- [ ] Compare trends, not just final values
- [ ] Check intermediate checkpoints if available

### After Comparison

- [ ] Document all differences
- [ ] Explain likely causes
- [ ] Determine if differences are acceptable
- [ ] Suggest improvements if needed
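
The "During Comparison" items on running several seeds and recording mean and standard deviation can be sketched with the standard library alone. The function name `summarize_runs` and the accuracy values are hypothetical.

```python
import statistics
from typing import Dict, List


def summarize_runs(metric_by_seed: Dict[int, float]) -> Dict[str, float]:
    """Aggregate one metric over several seeded runs."""
    values: List[float] = list(metric_by_seed.values())
    mean = statistics.mean(values)
    # Sample standard deviation needs at least two runs
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"mean": mean, "std": std, "n": len(values)}


# Hypothetical accuracies from three seeds
summary = summarize_runs({0: 0.912, 1: 0.905, 2: 0.918})
```

The resulting mean and standard deviation are the numbers that belong in the "Our Value" line of the report.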

## Report Template

```markdown
## Verification Result: {Metric Name}

**Paper Value**: {value} ± {std}
**Our Value**: {value} ± {std}
**Difference**: {absolute} ({relative}%)

**Status**: MATCH | ACCEPTABLE | INVESTIGATE | MISMATCH

**Analysis**:
{explanation of difference}

**Confidence**: {HIGH | MEDIUM | LOW}
{reasoning for confidence level}
```
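
Filling the template programmatically keeps reports consistent across metrics. The sketch below covers the numeric fields only; `render_report`, the number formatting, and the example values are assumptions made for illustration.

```python
REPORT_TEMPLATE = """## Verification Result: {metric}

**Paper Value**: {paper_value} ± {paper_std}
**Our Value**: {our_value} ± {our_std}
**Difference**: {absolute:.3f} ({relative:.1f}%)

**Status**: {status}
"""


def render_report(
    metric: str,
    paper_value: float,
    paper_std: float,
    our_value: float,
    our_std: float,
    status: str,
) -> str:
    """Fill the numeric part of the report template."""
    absolute = abs(paper_value - our_value)
    # Relative difference is undefined when the paper value is zero
    relative = 100.0 * absolute / abs(paper_value) if paper_value else float("nan")
    return REPORT_TEMPLATE.format(
        metric=metric,
        paper_value=paper_value,
        paper_std=paper_std,
        our_value=our_value,
        our_std=our_std,
        absolute=absolute,
        relative=relative,
        status=status,
    )


# Hypothetical values for illustration
report = render_report("Top-1 accuracy", 0.761, 0.002, 0.755, 0.004, "ACCEPTABLE")
```

The Analysis and Confidence sections are left to the agent, since they require reasoning about likely causes and cannot be derived from the numbers alone.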
```

- [ ] **Step 3: Verify file creation**

```bash
cat .opencode/skills/verification/SKILL.md
```

Expected: File contents match the markdown above.

- [ ] **Step 4: Commit**

```bash
git add .opencode/skills/verification/SKILL.md
git commit -m "feat(skills): add verification skill

Replication result verification methodology."
```

---

### Task 10: Create environment-management Skill

**Files:**
- Create: `.opencode/skills/environment-management/SKILL.md`

- [ ] **Step 1: Create skill directory**

```bash
mkdir -p .opencode/skills/environment-management
```

- [ ] **Step 2: Write SKILL.md**

Create `.opencode/skills/environment-management/SKILL.md`:

```markdown
---
name: environment-management
description: Use when setting up Python environment for ML/DL paper replication using Conda + uv
---

# Environment Management (Conda + uv)

## Overview

Hybrid approach using Conda for system-level dependencies and uv for project isolation.

**Announce at start:** "I'm using the environment-management skill for Conda + uv setup."

## Architecture

```
┌─────────────────────────────────────────┐
│           Conda (System Base)           │
│  - Python interpreter                    │
│  - CUDA toolkit                          │
│  - System-level C++ libraries            │
└─────────────────────────────────────────┘
                    │
                    │ provides Python
                    ▼
┌─────────────────────────────────────────┐
│        uv (Project Isolation)           │
│  - Per-project .venv                     │
│  - Fast dependency resolution            │
│  - Reproducible installs                 │
└─────────────────────────────────────────┘
```

## Setup Commands

### Step 1: Conda Base Environment

Check if base exists:
```bash
conda env list | grep ai_base
```

Create if needed:
```bash
# Linux (the cuda-toolkit package is published on the nvidia channel)
conda create -n ai_base -c nvidia python=3.10 cuda-toolkit=11.8 -y

# macOS (no CUDA support) and Windows (CUDA from NVIDIA, not conda)
conda create -n ai_base python=3.10 -y
```

### Step 2: Project Environment

```bash
cd workspace/{paper_name}

# Get Conda Python path
# Linux/Mac:
PYTHON_PATH=$(conda run -n ai_base which python)

# Windows:
# PYTHON_PATH=$(conda run -n ai_base python -c "import sys; print(sys.executable)")

# Create uv venv
uv venv --python "$PYTHON_PATH"
```

### Step 3: Activate and Install

```bash
# Linux/Mac
source .venv/bin/activate

# Windows
.venv\Scripts\activate

# Install dependencies
uv pip install -e ".[dev]"
```

## pyproject.toml Template

```toml
[project]
name = "{paper_name}"
version = "0.1.0"
description = "Replication of {paper_title}"
requires-python = ">=3.10"

dependencies = [
    # Core ML
    "torch>=2.0.0",
    "numpy>=1.24.0",

    # Visualization
    "matplotlib>=3.7.0",
    "seaborn>=0.12.0",

    # Utilities
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "pytest-cov>=4.0.0",
    "black>=23.0.0",
    "ruff>=0.0.260",
]

# Add based on paper requirements
vision = [
    "torchvision>=0.15.0",
    "pillow>=9.5.0",
]

nlp = [
    "transformers>=4.30.0",
    "tokenizers>=0.13.0",
    "datasets>=2.12.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short"

[tool.black]
line-length = 88
target-version = ["py310"]

[tool.ruff]
line-length = 88
select = ["E", "F", "I", "N", "W"]
```

## PyTorch + CUDA Compatibility

| CUDA Version | PyTorch Version | Install Command |
|--------------|-----------------|-----------------|
| 11.8 | 2.0+ | `uv pip install torch --index-url https://download.pytorch.org/whl/cu118` |
| 12.1 | 2.1+ | `uv pip install torch --index-url https://download.pytorch.org/whl/cu121` |
| CPU only | Any | `uv pip install torch --index-url https://download.pytorch.org/whl/cpu` |
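
For scripted setups, the table can be encoded as a small lookup. The helper name `torch_install_command` is hypothetical; the URLs are the ones listed in the table, and any CUDA version not listed there is rejected without guessing.

```python
from typing import Optional

# Index URLs copied from the compatibility table; None means CPU only
TORCH_INDEX_URLS = {
    "11.8": "https://download.pytorch.org/whl/cu118",
    "12.1": "https://download.pytorch.org/whl/cu121",
    None: "https://download.pytorch.org/whl/cpu",
}


def torch_install_command(cuda_version: Optional[str]) -> str:
    """Return the uv install command for a CUDA version from the table."""
    if cuda_version not in TORCH_INDEX_URLS:
        raise ValueError(f"No known PyTorch index for CUDA {cuda_version!r}")
    return f"uv pip install torch --index-url {TORCH_INDEX_URLS[cuda_version]}"


gpu_command = torch_install_command("12.1")
cpu_command = torch_install_command(None)
```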

## Environment Verification

```bash
# Check Python
python --version

# Check PyTorch
python -c "import torch; print(f'PyTorch {torch.__version__}')"

# Check CUDA
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python -c "import torch; print(f'CUDA version: {torch.version.cuda}')"

# Check GPU
python -c "import torch; print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')"
```
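
The same checks can be collected into one Python function, which is convenient when the result should be written into a report. This is a sketch: `environment_report` is a hypothetical helper, and it tolerates a missing PyTorch install so it can run before dependencies are set up.

```python
import importlib.util
import platform
import sys


def environment_report() -> dict:
    """Collect interpreter and PyTorch details without failing if torch is absent."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```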

## Troubleshooting

### CUDA Not Found

```bash
# Check NVIDIA driver
nvidia-smi

# Reinstall PyTorch with correct CUDA
uv pip install torch --index-url https://download.pytorch.org/whl/cu118 --force-reinstall
```

### Dependency Conflicts

```bash
# Clear cache and reinstall
uv cache clean
uv pip install -e ".[dev]" --force-reinstall
```

### Permission Errors (Windows)

```powershell
# Run as Administrator or:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

## Best Practices

1. **One environment per paper**: Don't mix dependencies
2. **Pin versions in pyproject.toml**: Replace the `>=` lower bounds with exact versions once the replication works, for reproducibility
3. **Use dev dependencies**: Keep test tools separate
4. **Document CUDA version**: In README.md
5. **Commit pyproject.toml**: Not .venv/

## Quick Reference

```bash
# Full setup sequence (Linux; on macOS drop the nvidia channel and cuda-toolkit)
conda activate ai_base || conda create -n ai_base -c nvidia python=3.10 cuda-toolkit=11.8 -y && conda activate ai_base
cd workspace/{paper_name}
uv venv --python $(which python)
source .venv/bin/activate
uv pip install -e ".[dev]"
pytest tests/ -v
```
```

- [ ] **Step 3: Verify file creation**

```bash
cat .opencode/skills/environment-management/SKILL.md
```

Expected: File contents match the markdown above.

- [ ] **Step 4: Commit**

```bash
git add .opencode/skills/environment-management/SKILL.md
git commit -m "feat(skills): add environment-management skill

Conda + uv hybrid environment setup for ML projects."
```

---

### Task 11: Create /replicate Command

**Files:**
- Create: `.opencode/commands/replicate.md`

- [ ] **Step 1: Create commands directory**

```bash
mkdir -p .opencode/commands
```

- [ ] **Step 2: Write replicate.md**

Create `.opencode/commands/replicate.md`:

```markdown
---
description: Start paper replication workflow
agent: paper-director
---

Start the paper replication workflow for the specified paper.

## Input

Paper file: $ARGUMENTS

If no file is specified, ask the user to provide the path to a paper (Markdown file) or to paste the text directly.

## Workflow

1. Validate paper file exists (if path provided)
2. Extract paper name from filename or ask user
3. Create workspace directory: `workspace/{paper_name}/`
4. Begin Phase 1: Paper Analysis
   - Dispatch @paper-image-extractor
   - Dispatch @paper-analyzer
5. Present Human Checkpoint with analysis summary
6. After approval, begin Phase 2: Code Generation (TDD)
7. Begin Phase 3: Verification
8. Present final replication report

## Example Usage

```
/replicate workspace/attention_is_all_you_need.md
```

Or without arguments:
```
/replicate
> Please provide the path to your paper or paste the content directly.
```
```
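
Workflow steps 1 to 3 of the command above (validate the file, derive the paper name, create the workspace) could look roughly like the following in Python. This sketch is not part of `replicate.md`; the function `prepare_workspace` and the rule for turning a filename into a directory name are assumptions.

```python
import re
import tempfile
from pathlib import Path


def prepare_workspace(paper_file: str, workspace_root: str = "workspace") -> Path:
    """Validate the paper file and create workspace/{paper_name}/."""
    paper_path = Path(paper_file)
    if not paper_path.is_file():
        raise FileNotFoundError(f"Paper file not found: {paper_file}")

    # Assumed convention: lowercase, non-alphanumerics collapsed to underscores
    paper_name = re.sub(r"[^a-z0-9]+", "_", paper_path.stem.lower()).strip("_")

    project_dir = Path(workspace_root) / paper_name
    project_dir.mkdir(parents=True, exist_ok=True)
    return project_dir


# Demonstration inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    paper = Path(tmp) / "Attention Is All You Need.md"
    paper.write_text("# Attention Is All You Need\n", encoding="utf-8")

    project_dir = prepare_workspace(str(paper), str(Path(tmp) / "workspace"))
    project_name = project_dir.name
    created = project_dir.is_dir()
```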

- [ ] **Step 3: Verify file creation**

```bash
cat .opencode/commands/replicate.md
```

Expected: File contents match the markdown above.

- [ ] **Step 4: Commit**

```bash
git add .opencode/commands/replicate.md
git commit -m "feat(commands): add /replicate command

Entry point for paper replication workflow."
```

---

### Task 12: Create /verify Command

**Files:**
- Create: `.opencode/commands/verify.md`

- [ ] **Step 1: Write verify.md**

Create `.opencode/commands/verify.md`:

```markdown
---
description: Verify replication results for a completed project
agent: paper-director
---

Verify the replication results for an existing project.

## Input

Project directory: $ARGUMENTS

If no directory is specified, list the available projects in `workspace/` and ask the user to select one.

## Workflow

1. Validate project directory exists
2. Check required files exist:
   - `analysis/paper_structure.md`
   - `analysis/replication_plan.md`
   - `src/` with code
   - `tests/` with tests
3. Dispatch @test-runner to:
   - Run test suite
   - Compare results with paper
   - Generate/update `reports/replication_report.md`
4. Present verification summary

## Example Usage

```
/verify workspace/attention_is_all_you_need/
```

Or without arguments:
```
/verify
> Available projects:
> 1. attention_is_all_you_need
> 2. resnet
> Please select a project to verify.
```
```
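
The required-file check in workflow step 2 of the command above can be sketched as follows. This is an illustration and not part of `verify.md`; treating an empty `src/` or `tests/` directory as missing is an assumption based on the "with code" and "with tests" wording.

```python
import tempfile
from pathlib import Path
from typing import List

# Paths taken from the workflow list in verify.md
REQUIRED_PATHS = [
    "analysis/paper_structure.md",
    "analysis/replication_plan.md",
    "src",
    "tests",
]


def missing_project_paths(project_dir: str) -> List[str]:
    """Return the required paths that are absent (or empty directories)."""
    root = Path(project_dir)
    missing = []
    for rel in REQUIRED_PATHS:
        target = root / rel
        if rel.endswith(".md"):
            ok = target.is_file()
        else:
            # Directories must exist and contain at least one entry
            ok = target.is_dir() and any(target.iterdir())
        if not ok:
            missing.append(rel)
    return missing


# Demonstration with a partially populated project
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "analysis").mkdir()
    (root / "analysis" / "paper_structure.md").write_text("# Structure\n", encoding="utf-8")
    (root / "src").mkdir()
    (root / "src" / "model.py").write_text("# model code\n", encoding="utf-8")
    (root / "tests").mkdir()  # Present but empty, so it counts as missing

    missing = missing_project_paths(tmp)
```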

- [ ] **Step 2: Verify file creation**

```bash
cat .opencode/commands/verify.md
```

Expected: File contents match the markdown above.

- [ ] **Step 3: Commit**

```bash
git add .opencode/commands/verify.md
git commit -m "feat(commands): add /verify command

Entry point for verification of existing replication projects."
```

---

### Task 13: Create opencode.json Configuration

**Files:**
- Create: `opencode.json`

- [ ] **Step 1: Write opencode.json**

Create `opencode.json`:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "default_agent": "paper-director",
  "agent": {
    "paper-director": {
      "mode": "primary"
    },
    "paper-analyzer": {
      "mode": "subagent"
    },
    "paper-image-extractor": {
      "mode": "subagent"
    },
    "code-writer": {
      "mode": "subagent"
    },
    "test-runner": {
      "mode": "subagent"
    }
  }
}
```

- [ ] **Step 2: Verify file creation**

```bash
cat opencode.json
```

Expected: Valid JSON matching above.
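
Beyond checking that the file parses, a few structural rules can be verified with a short script. The rules below (every agent is `primary` or `subagent`, and `default_agent` names a primary agent) reflect what this plan expects, not the official OpenCode schema; `validate_agent_config` is a hypothetical helper.

```python
import json
from typing import List

# Same content as the opencode.json shown above
CONFIG_TEXT = """
{
  "$schema": "https://opencode.ai/config.json",
  "default_agent": "paper-director",
  "agent": {
    "paper-director": {"mode": "primary"},
    "paper-analyzer": {"mode": "subagent"},
    "paper-image-extractor": {"mode": "subagent"},
    "code-writer": {"mode": "subagent"},
    "test-runner": {"mode": "subagent"}
  }
}
"""


def validate_agent_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    agents = config.get("agent", {})

    for name, settings in agents.items():
        mode = settings.get("mode")
        if mode not in ("primary", "subagent"):
            problems.append(f"{name}: unexpected mode {mode!r}")

    default_agent = config.get("default_agent")
    if default_agent not in agents:
        problems.append(f"default_agent {default_agent!r} is not defined under 'agent'")
    elif agents[default_agent].get("mode") != "primary":
        problems.append(f"default_agent {default_agent!r} is not a primary agent")

    return problems


problems = validate_agent_config(json.loads(CONFIG_TEXT))
```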

- [ ] **Step 3: Commit**

```bash
git add opencode.json
git commit -m "feat: add opencode.json project configuration

Sets paper-director as default agent with subagent definitions."
```

---

### Task 14: Create Workspace Directory

**Files:**
- Create: `workspace/.gitkeep`

- [ ] **Step 1: Create workspace directory**

```bash
mkdir -p workspace
```

- [ ] **Step 2: Create .gitkeep**

```bash
touch workspace/.gitkeep
```

Or on Windows:
```powershell
New-Item -ItemType File -Path workspace/.gitkeep -Force
```

- [ ] **Step 3: Verify directory creation**

```bash
ls -la workspace/
```

Expected: Directory exists with .gitkeep file.

- [ ] **Step 4: Commit**

```bash
git add workspace/.gitkeep
git commit -m "feat: add workspace directory for paper replication projects

Papers placed here will be processed by the replication agents."
```

---

### Task 15: Final Verification

**Files:**
- Read: All created files

- [ ] **Step 1: Verify directory structure**

```bash
find .opencode -type f -name "*.md" | sort
```

Expected output:
```
.opencode/agents/code-writer.md
.opencode/agents/paper-analyzer.md
.opencode/agents/paper-director.md
.opencode/agents/paper-image-extractor.md
.opencode/agents/test-runner.md
.opencode/commands/replicate.md
.opencode/commands/verify.md
.opencode/skills/code-generation/SKILL.md
.opencode/skills/environment-management/SKILL.md
.opencode/skills/paper-parsing/SKILL.md
.opencode/skills/pytorch-patterns/SKILL.md
.opencode/skills/verification/SKILL.md
```
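
Comparing the listing by eye is error-prone with twelve files, so the comparison can also be scripted. In this sketch `diff_opencode_files` is a hypothetical helper, and the demonstration builds a throwaway tree with one file deliberately left out.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Expected files, copied from the listing above
EXPECTED_FILES = {
    ".opencode/agents/code-writer.md",
    ".opencode/agents/paper-analyzer.md",
    ".opencode/agents/paper-director.md",
    ".opencode/agents/paper-image-extractor.md",
    ".opencode/agents/test-runner.md",
    ".opencode/commands/replicate.md",
    ".opencode/commands/verify.md",
    ".opencode/skills/code-generation/SKILL.md",
    ".opencode/skills/environment-management/SKILL.md",
    ".opencode/skills/paper-parsing/SKILL.md",
    ".opencode/skills/pytorch-patterns/SKILL.md",
    ".opencode/skills/verification/SKILL.md",
}


def diff_opencode_files(root: str) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) Markdown files under root/.opencode."""
    found = {
        path.relative_to(root).as_posix()
        for path in Path(root, ".opencode").rglob("*.md")
    }
    return sorted(EXPECTED_FILES - found), sorted(found - EXPECTED_FILES)


# Demonstration: create every expected file except one
with tempfile.TemporaryDirectory() as tmp:
    skipped = ".opencode/skills/verification/SKILL.md"
    for rel in EXPECTED_FILES - {skipped}:
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder\n", encoding="utf-8")

    missing, unexpected = diff_opencode_files(tmp)
```

Run against the real project root (for example `diff_opencode_files(".")`), both lists should be empty once all tasks are complete.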

- [ ] **Step 2: Verify opencode.json**

```bash
python -m json.tool opencode.json
```

Expected: Valid JSON output, no errors.

- [ ] **Step 3: Verify workspace exists**

```bash
ls workspace/
```

Expected: .gitkeep file present.

- [ ] **Step 4: Run OpenCode to verify agents load**

```bash
opencode --help   # Confirms the CLI is installed
opencode          # Starts an interactive session in the project directory
```

Then in OpenCode:
```
/help
```

Verify that `/replicate` and `/verify` commands appear.

- [ ] **Step 5: Test agent switching**

In OpenCode, press Tab to cycle agents. Verify `paper-director` is available.

- [ ] **Step 6: Test subagent mention**

```
@paper-analyzer Can you help me?
```

Verify subagent responds.

- [ ] **Step 7: Final commit summary**

```bash
git log --oneline -15
```

Expected: All feature commits present.

---

## Self-Review Checklist

- [x] **Spec coverage**: All 5 agents, 5 skills, 2 commands, config file defined
- [x] **No placeholders**: All code blocks complete
- [x] **Consistent naming**: Agent/skill names match throughout
- [x] **File paths exact**: All paths specified completely
- [x] **Commits granular**: Each task has a commit step