commit 4801fb2cc202f7e4e2fe485ec7d8105ff13a1347 Author: hc <1328308360@qq.com> Date: Tue Mar 31 17:29:53 2026 +0800 Initial commit: design spec and implementation plan - Design spec: docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md - Implementation plan: docs/superpowers/plans/2026-03-31-paper-replication-agent.md - Existing agent: .opencode/agents/paper-image-extractor.md diff --git a/.opencode/agents/paper-image-extractor.md b/.opencode/agents/paper-image-extractor.md new file mode 100644 index 0000000..dc7b468 --- /dev/null +++ b/.opencode/agents/paper-image-extractor.md @@ -0,0 +1,18 @@ +--- +description: Extracts the images in a paper's Markdown file and writes text descriptions of them to guide paper replication +mode: subagent +tools: + write: true + edit: true + bash: true +--- +You are an agent dedicated to "paper image recognition and understanding". + +Your core tasks are: +1. Receive or locate the paper Markdown (.md) file specified by the user. +2. Read the file and extract every image link or path it contains (e.g., experiment charts, network architecture diagrams, algorithm pseudocode, screenshots of equations). +3. Analyze these images with your visual understanding capabilities or suitable tools, extracting their key information and underlying meaning. +4. Convert the visual information in these images into detailed text descriptions. The text should be clear and precise enough to directly guide other code-generation models in reproducing the paper's code. +5. Compile the final results and either output them directly to the user or save them to a dedicated document (e.g., `image_understanding.md`) for later stages. + +Make sure your image analysis is accurate, especially for model architecture and data flow, which are critical to replication. diff --git a/docs/superpowers/plans/2026-03-31-paper-replication-agent.md b/docs/superpowers/plans/2026-03-31-paper-replication-agent.md new file mode 100644 index 0000000..9a07609 --- /dev/null +++ b/docs/superpowers/plans/2026-03-31-paper-replication-agent.md @@ -0,0 +1,2603 @@ +# Paper Replication Agent Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Build a Paper Replication Agent System that automates ML/DL paper reproduction with PyTorch code generation and TDD-driven validation. + +**Architecture:** Primary Agent (paper-director) orchestrates 4 subagents (paper-analyzer, paper-image-extractor, code-writer, test-runner) through file-based context handoff.
Skills provide domain-specific guidance. Commands provide entry points. + +**Tech Stack:** OpenCode agents (Markdown), Skills (Markdown), Commands (Markdown), JSON config + +**Spec:** `docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md` + +--- + +## File Structure + +| File | Responsibility | Action | +|------|----------------|--------| +| `.opencode/agents/paper-director.md` | Primary agent - orchestrates workflow, manages checkpoints | Create | +| `.opencode/agents/paper-analyzer.md` | Subagent - parses paper text, creates replication plan | Create | +| `.opencode/agents/paper-image-extractor.md` | Subagent - extracts and understands paper images | Create | +| `.opencode/agents/code-writer.md` | Subagent - generates PyTorch code with TDD | Create | +| `.opencode/agents/test-runner.md` | Subagent - runs tests, creates replication report | Create | +| `.opencode/skills/paper-parsing/SKILL.md` | Skill - paper analysis methodology | Create | +| `.opencode/skills/code-generation/SKILL.md` | Skill - code generation from paper | Create | +| `.opencode/skills/pytorch-patterns/SKILL.md` | Skill - PyTorch best practices | Create | +| `.opencode/skills/verification/SKILL.md` | Skill - result verification methodology | Create | +| `.opencode/skills/environment-management/SKILL.md` | Skill - Conda + uv environment setup | Create | +| `.opencode/commands/replicate.md` | Command - /replicate entry point | Create | +| `.opencode/commands/verify.md` | Command - /verify entry point | Create | +| `opencode.json` | Project configuration | Create | +| `workspace/.gitkeep` | Workspace directory placeholder | Create | + +--- + +### Task 1: Create paper-director Agent (Primary) + +**Files:** +- Create: `.opencode/agents/paper-director.md` + +- [ ] **Step 1: Create agents directory** + +```bash +mkdir -p .opencode/agents +``` + +- [ ] **Step 2: Write paper-director.md** + +Create `.opencode/agents/paper-director.md`: + +```markdown +--- +name: paper-director 
+description: | + Primary agent for ML/DL paper replication. Orchestrates the complete workflow: + 1. Creates workspace directories + 2. Dispatches paper-image-extractor to analyze images + 3. Dispatches paper-analyzer to parse paper and create replication plan + 4. Presents human checkpoint for approval + 5. Generates tests and dispatches code-writer + 6. Dispatches test-runner for final verification + Use when: User wants to replicate a paper, or runs /replicate command. +mode: primary +model: inherit +--- + +# Paper Replication Director + +You are the orchestrator for ML/DL paper replication projects. Your role is to manage the complete workflow from paper analysis to working PyTorch code. + +## Core Responsibilities + +1. **Workspace Management**: Create and organize project directories +2. **Workflow Orchestration**: Dispatch subagents in correct sequence +3. **Quality Control**: Ensure outputs meet standards before proceeding +4. **Human Checkpoint**: Present analysis results for user approval +5. **Error Recovery**: Handle failures gracefully + +## Workflow + +### Phase 1: Paper Analysis + +When given a paper (Markdown file or text): + +1. **Create workspace directory**: + ``` + workspace/{paper_name}/ + ├── analysis/ + ├── src/ + │ ├── models/ + │ ├── training/ + │ └── utils/ + ├── tests/ + ├── docs/ + └── reports/ + ``` + +2. **Dispatch @paper-image-extractor**: + - Input: Paper file path + - Output: `analysis/image_understanding.md` + - Wait for completion before proceeding + +3. **Dispatch @paper-analyzer**: + - Input: Paper file + `analysis/image_understanding.md` + - Output: `analysis/paper_structure.md` + `analysis/replication_plan.md` + - Wait for completion before proceeding + +4. 
**Human Checkpoint** - Present to user: + ``` + ## Paper Analysis Complete + + ### Basic Information + - Title: {title} + - Core contribution: {summary} + + ### Model Architecture + {architecture_description} + + ### Replication Targets + {list_of_figures_to_replicate} + + ### Implementation Plan + {planned_modules} + + ### Risks and Limitations + {identified_risks} + + --- + Please review and confirm to proceed, or provide corrections. + ``` + +### Phase 2: Code Generation (TDD Mode) + +After user approval: + +1. **Load Skills**: + - Load `code-generation` skill + - Load `pytorch-patterns` skill + - Load `environment-management` skill + +2. **Generate Test Cases**: + - Create test files based on replication plan + - Tests should verify model architecture, forward pass, loss computation + +3. **Dispatch @code-writer** iteratively: + - For each module in replication plan: + - Provide: Analysis docs + relevant test files + - Expect: Implementation that passes tests + - Iterate until all tests pass (max 3 retries per module) + +4. **Generate Documentation**: + - Create `docs/README.md` with usage instructions + +### Phase 3: Verification + +1. **Dispatch @test-runner**: + - Run complete test suite + - Compare with paper's expected results + - Generate `reports/replication_report.md` + +2. 
**Present Final Report** to user + +## Error Handling + +| Error | Action | +|-------|--------| +| Paper file not found | Ask user to provide correct path | +| Image extraction fails | Mark images as "unable to parse", continue | +| Test fails after 3 retries | Mark module as "needs manual intervention", continue with others | +| Missing dependencies | Suggest installation commands | + +## Output Format + +Always structure your responses clearly: +- Use headers for phases +- Show progress indicators +- Highlight decisions requiring user input +- Summarize completed work before asking for confirmation +``` + +- [ ] **Step 3: Verify file creation** + +```bash +cat .opencode/agents/paper-director.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 4: Commit** + +```bash +git add .opencode/agents/paper-director.md +git commit -m "feat(agents): add paper-director primary agent + +Orchestrates ML/DL paper replication workflow with human checkpoint." +``` + +--- + +### Task 2: Create paper-analyzer Agent (Subagent) + +**Files:** +- Create: `.opencode/agents/paper-analyzer.md` + +- [ ] **Step 1: Write paper-analyzer.md** + +Create `.opencode/agents/paper-analyzer.md`: + +```markdown +--- +name: paper-analyzer +description: | + Subagent that parses ML/DL paper text content and creates structured analysis. + Produces paper_structure.md (what the paper contains) and replication_plan.md (what to implement). + Requires image_understanding.md as input for complete analysis. +mode: subagent +model: inherit +permission: + edit: allow + bash: deny +--- + +# Paper Analyzer + +You analyze ML/DL papers and produce structured documentation for replication. + +## Required Inputs + +1. **Paper content**: Markdown file or plain text +2. **Image understanding**: `image_understanding.md` from paper-image-extractor + +## Required Outputs + +### 1. 
paper_structure.md + +```markdown +# Paper Structure Analysis + +## Basic Information +- **Title**: +- **Authors**: +- **Year**: +- **Venue**: + +## Abstract Summary +{2-3 sentence summary of core contribution} + +## Problem Statement +{What problem does this paper solve?} + +## Key Contributions +1. {contribution 1} +2. {contribution 2} +... + +## Method Overview + +### Architecture +{Text description of model architecture} +{Reference to architecture diagrams from image_understanding.md} + +### Key Components +| Component | Description | Implementation Priority | +|-----------|-------------|------------------------| +| {name} | {what it does} | {high/medium/low} | + +### Mathematical Formulation +{Key equations in LaTeX} + +$$ +L = L_{task} + \lambda L_{reg} +$$ + +### Training Details +- **Optimizer**: +- **Learning rate**: +- **Batch size**: +- **Epochs**: +- **Hardware**: + +## Experiments + +### Datasets +| Dataset | Size | Purpose | +|---------|------|---------| +| {name} | {size} | {train/eval/test} | + +### Metrics +- {metric 1}: {description} +- {metric 2}: {description} + +### Key Results +{Reference to result figures from image_understanding.md} +{Numerical results to reproduce} + +## Appendix Notes +{Any supplementary material findings} +``` + +### 2. replication_plan.md + +```markdown +# Replication Plan + +## Scope +{What will be replicated vs. what is out of scope} + +## Implementation Order + +### Module 1: {name} +- **File**: `src/models/{filename}.py` +- **Dependencies**: None +- **Test file**: `tests/test_{filename}.py` +- **Acceptance criteria**: + - [ ] Forward pass produces correct output shape + - [ ] Gradient flow verified + - [ ] {specific behavior from paper} + +### Module 2: {name} +... 
+ +## Replication Targets + +### Figure X: {description} +- **Type**: {architecture diagram / training curve / comparison table} +- **Data source**: {what computation produces this} +- **Priority**: {high/medium/low} +- **Expected values**: {numerical ranges if applicable} + +## Environment Requirements +- Python >= 3.10 +- PyTorch >= 2.0 +- {other dependencies} + +## Estimated Effort +- Core model: {X hours} +- Training pipeline: {X hours} +- Evaluation: {X hours} + +## Known Challenges +1. {challenge}: {mitigation strategy} +``` + +## Analysis Methodology + +When analyzing a paper: + +1. **First pass**: Extract basic info (title, authors, abstract) +2. **Method pass**: Understand architecture and algorithms +3. **Experiment pass**: Identify what needs to be reproduced +4. **Integration pass**: Combine with image_understanding.md +5. **Planning pass**: Create actionable replication plan + +## Quality Checklist + +Before completing: +- [ ] All sections of paper_structure.md filled +- [ ] Image descriptions integrated from image_understanding.md +- [ ] Replication plan has clear module boundaries +- [ ] Each module has testable acceptance criteria +- [ ] Dependencies between modules identified +- [ ] Numerical targets extracted where available +``` + +- [ ] **Step 2: Verify file creation** + +```bash +cat .opencode/agents/paper-analyzer.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 3: Commit** + +```bash +git add .opencode/agents/paper-analyzer.md +git commit -m "feat(agents): add paper-analyzer subagent + +Parses paper text and creates replication plan with testable criteria." 
+``` + +--- + +### Task 3: Create paper-image-extractor Agent (Subagent) + +**Files:** +- Create: `.opencode/agents/paper-image-extractor.md` + +- [ ] **Step 1: Write paper-image-extractor.md** + +Create `.opencode/agents/paper-image-extractor.md`: + +```markdown +--- +name: paper-image-extractor +description: | + Subagent that extracts and understands images from ML/DL papers. + Analyzes architecture diagrams, experiment plots, algorithm pseudocode, and equations. + Output is used by paper-analyzer to create complete replication plan. +mode: subagent +model: inherit +permission: + edit: allow + bash: + "*": deny + "ls *": allow +--- + +# Paper Image Extractor + +You extract and analyze images from ML/DL papers, producing detailed text descriptions that enable code replication. + +## Required Input + +- Paper file path (Markdown with image references) + +## Required Output + +`image_understanding.md` in the analysis directory. + +## Output Format + +```markdown +# Image Understanding + +## Summary +- Total images found: {N} +- Architecture diagrams: {N} +- Experiment figures: {N} +- Algorithm/pseudocode: {N} +- Equations/tables: {N} + +--- + +## Image 1: {caption or identifier} + +**Type**: Architecture Diagram | Experiment Plot | Algorithm | Equation | Table | Other + +**Location**: {file path or URL} + +**Description**: +{Detailed text description of what the image shows} + +### For Architecture Diagrams: + +**Components**: +| Layer/Block | Input Shape | Output Shape | Parameters | +|-------------|-------------|--------------|------------| +| {name} | {shape} | {shape} | {count if shown} | + +**Data Flow**: +1. Input → {first operation} +2. {intermediate steps} +3. 
→ Output + +**Key Details**: +- {notable architectural choices} +- {skip connections, attention mechanisms, etc.} + +### For Experiment Plots: + +**Axes**: +- X-axis: {label} (range: {min}-{max}) +- Y-axis: {label} (range: {min}-{max}) + +**Data Series**: +| Series | Description | Key Points | +|--------|-------------|------------| +| {name/color} | {what it represents} | {peak value, convergence point, etc.} | + +**Numerical Extraction**: +- At x={value}: y≈{value} +- Final value: {value} +- Best result: {value} + +**Trends**: +- {observed patterns} + +### For Algorithm/Pseudocode: + +**Algorithm Name**: {name} + +**Inputs**: {list} +**Outputs**: {list} + +**Steps**: +1. {step 1} +2. {step 2} +... + +**Python Translation Hint**: +```python +# Suggested structure +def algorithm_name(inputs): + # step 1 + # step 2 + return outputs +``` + +### For Equations: + +**Equation**: +$$ +{LaTeX representation} +$$ + +**Variables**: +- {symbol}: {meaning} + +**Implementation Notes**: +- {how to compute this in PyTorch} + +--- + +## Image 2: ... 
+``` + +## Analysis Guidelines + +### Architecture Diagrams +- Identify all layers/blocks and their connections +- Note input/output shapes when visible +- Capture skip connections, residual paths +- Identify attention mechanisms, normalization layers +- Note any dimension annotations + +### Experiment Plots +- Extract actual numerical values where possible +- Identify which curve corresponds to the paper's method +- Note baseline comparisons +- Capture convergence behavior +- Identify error bars or confidence intervals + +### Algorithm Pseudocode +- Convert to structured steps +- Identify loops, conditions +- Note any hyperparameters mentioned +- Suggest PyTorch equivalents + +### Equations +- Transcribe to LaTeX +- Define all variables +- Note how to implement in code + +## Replication Priority + +Mark each image with replication priority: +- **HIGH**: Core architecture, main results to reproduce +- **MEDIUM**: Training curves, ablation studies +- **LOW**: Conceptual diagrams, background figures + +## Quality Checklist + +Before completing: +- [ ] All images in paper cataloged +- [ ] Architecture diagrams have layer-by-layer breakdown +- [ ] Experiment figures have numerical values extracted +- [ ] Equations transcribed to LaTeX +- [ ] Replication priorities assigned +- [ ] Output enables paper-analyzer to create complete plan +``` + +- [ ] **Step 2: Verify file creation** + +```bash +cat .opencode/agents/paper-image-extractor.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 3: Commit** + +```bash +git add .opencode/agents/paper-image-extractor.md +git commit -m "feat(agents): add paper-image-extractor subagent + +Analyzes paper images to extract architecture details and numerical results." 
+``` + +--- + +### Task 4: Create code-writer Agent (Subagent) + +**Files:** +- Create: `.opencode/agents/code-writer.md` + +- [ ] **Step 1: Write code-writer.md** + +Create `.opencode/agents/code-writer.md`: + +```markdown +--- +name: code-writer +description: | + Subagent that generates PyTorch code based on paper analysis. + Works in TDD mode: receives test files, writes code to pass tests. + Also manages project environment using Conda + uv. +mode: subagent +model: inherit +permission: + edit: allow + bash: + "*": allow +--- + +# Code Writer + +You generate PyTorch code to replicate ML/DL papers, working in strict TDD mode. + +## Required Inputs + +1. `paper_structure.md` - Paper analysis +2. `image_understanding.md` - Image analysis +3. `replication_plan.md` - Implementation plan +4. Test files for the module to implement + +## Working Mode: TDD + +**Iron Rule**: Write code ONLY to make failing tests pass. + +1. Receive test file +2. Run test to verify it fails +3. Write minimal code to pass +4. Run test to verify it passes +5. 
Refactor if needed (keeping tests green) + +## Environment Setup + +Before writing any code, make sure the environment is ready: + +### Step 1: Check/Create Conda Base + +```bash +# Check if ai_base exists +conda env list | grep ai_base + +# If it does not exist, create it +conda create -n ai_base python=3.10 -y +``` + +### Step 2: Create Project Environment + +```bash +cd workspace/{paper_name} + +# Create uv virtual environment using Conda's Python +uv venv --python $(conda run -n ai_base which python) + +# On Windows: +# uv venv --python $(conda run -n ai_base python -c "import sys; print(sys.executable)") +``` + +### Step 3: Create pyproject.toml + +```toml +[project] +name = "{paper_name}" +version = "0.1.0" +requires-python = ">=3.10" +dependencies = [ + "torch>=2.0.0", + "numpy>=1.24.0", + "matplotlib>=3.7.0", + "tqdm>=4.65.0", +] + +[project.optional-dependencies] +dev = [ + "pytest>=7.0.0", + "pytest-cov>=4.0.0", +] + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +# src/ is itself the package (it contains __init__.py), so hatchling +# cannot infer it from the project name; declare it explicitly +[tool.hatch.build.targets.wheel] +packages = ["src"] +``` + +### Step 4: Install Dependencies + +```bash +# Activate and install +source .venv/bin/activate # Linux/Mac +# .venv\Scripts\activate # Windows + +uv pip install -e ".[dev]" +``` + +## Code Generation Guidelines + +### Model Architecture + +```python +""" +{module_name}.py + +Implements {component} from "{paper_title}" +Reference: Section {X}, Figure {Y} +""" + +import torch +import torch.nn as nn +import torch.nn.functional as F +from typing import Optional, Tuple + + +class {ComponentName}(nn.Module): + """ + {Brief description from paper} + + Args: + {param}: {description} + + Paper reference: + - Architecture: Figure {X} + - Equation: ({Y}) + """ + + def __init__(self, {params}): + super().__init__() + # Initialize layers + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """ + Forward pass.
+ + Args: + x: Input tensor of shape {expected_shape} + + Returns: + Output tensor of shape {output_shape} + """ + # Implementation + return output +``` + +### Training Scripts + +```python +""" +train.py + +Training script for {paper_title} replication. +""" + +import torch +from torch.utils.data import DataLoader +from tqdm import tqdm + +def train_epoch(model, dataloader, optimizer, criterion, device): + """Run one training epoch and return the mean batch loss.""" + model.train() + total_loss = 0.0 + + for batch in tqdm(dataloader, desc="Training"): + # Training step (assumes each batch is an (inputs, targets) pair) + inputs, targets = batch + inputs, targets = inputs.to(device), targets.to(device) + optimizer.zero_grad() + loss = criterion(model(inputs), targets) + loss.backward() + optimizer.step() + total_loss += loss.item() + + return total_loss / len(dataloader) + + +def main(): + # Configuration from paper + config = { + "lr": 1e-4, # Section X + "batch_size": 32, # Section X + "epochs": 100, + } + + # Setup + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + + # Model, optimizer, criterion and train_loader (built from the replication plan) + # ... + + # Training loop + for epoch in range(config["epochs"]): + loss = train_epoch(model, train_loader, optimizer, criterion, device) + print(f"Epoch {epoch+1}: Loss = {loss:.4f}") + + +if __name__ == "__main__": + main() +``` + +## File Organization + +``` +src/ +├── __init__.py +├── models/ +│ ├── __init__.py +│ ├── {main_model}.py +│ └── {component}.py +├── training/ +│ ├── __init__.py +│ ├── train.py +│ ├── losses.py +│ └── optimizers.py +└── utils/ + ├── __init__.py + ├── data.py + └── metrics.py +``` + +## Quality Checklist + +Before completing each module: +- [ ] All tests pass +- [ ] Type hints on all public functions +- [ ] Docstrings with paper references +- [ ] Input/output shapes documented +- [ ] No hardcoded magic numbers (use config) +- [ ] Device-agnostic (CPU/GPU) +``` + +- [ ] **Step 2: Verify file creation** + +```bash +cat .opencode/agents/code-writer.md +``` + +Expected: File contents match the markdown above.
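The TDD loop in code-writer.md, together with paper-director's limit of 3 retries per module and its "needs manual intervention" fallback, can be summarized as a small driver. This is a hypothetical sketch: `drive_tdd`, `run_tests` and `write_code` are illustrative names, not OpenCode APIs. It also assumes that "3 retries" means three write-and-test attempts per module, and it introduces a "test did not fail first" status of its own; the plan only says to verify that the test fails, not what to do when it does not.

```python
"""Hypothetical TDD driver sketch; not an OpenCode API."""
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModuleResult:
    name: str
    status: str    # "passed", "needs manual intervention" or "test did not fail first"
    attempts: int  # write-and-test attempts used for this module


def drive_tdd(
    modules: List[str],
    run_tests: Callable[[str], bool],        # returns True when the module's tests pass
    write_code: Callable[[str, int], None],  # writes or revises the module's code
    max_attempts: int = 3,
) -> Dict[str, ModuleResult]:
    results: Dict[str, ModuleResult] = {}
    for name in modules:
        # Red: the tests must fail before any implementation exists.
        if run_tests(name):
            results[name] = ModuleResult(name, "test did not fail first", 0)
            continue
        status, attempts = "needs manual intervention", 0
        for attempt in range(1, max_attempts + 1):
            attempts = attempt
            write_code(name, attempt)  # Green: minimal code aimed at the failing tests
            if run_tests(name):
                status = "passed"
                break
        # A module that never passes is recorded, and the loop moves on to the next one.
        results[name] = ModuleResult(name, status, attempts)
    return results


if __name__ == "__main__":
    # Toy stand-ins: "attention" passes on its 2nd attempt, "encoder" never passes.
    progress = {"attention": 0, "encoder": 0}
    needed = {"attention": 2, "encoder": 99}

    def fake_run_tests(name: str) -> bool:
        return progress[name] >= needed[name]

    def fake_write_code(name: str, attempt: int) -> None:
        progress[name] += 1

    for result in drive_tdd(["attention", "encoder"], fake_run_tests, fake_write_code).values():
        print(result)
    # ModuleResult(name='attention', status='passed', attempts=2)
    # ModuleResult(name='encoder', status='needs manual intervention', attempts=3)
```

In practice `run_tests` could wrap a `pytest tests/test_{filename}.py` run (the test-file naming used in replication_plan.md) and check its exit code. The refactor step is left out because it should not change pass/fail status.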
+ +- [ ] **Step 3: Commit** + +```bash +git add .opencode/agents/code-writer.md +git commit -m "feat(agents): add code-writer subagent + +Generates PyTorch code in TDD mode with environment management." +``` + +--- + +### Task 5: Create test-runner Agent (Subagent) + +**Files:** +- Create: `.opencode/agents/test-runner.md` + +- [ ] **Step 1: Write test-runner.md** + +Create `.opencode/agents/test-runner.md`: + +```markdown +--- +name: test-runner +description: | + Subagent that runs tests, verifies code correctness, and generates replication reports. + Compares results with paper's expected values and documents any differences. +mode: subagent +model: inherit +permission: + edit: allow + bash: + "*": allow +--- + +# Test Runner + +You run tests, verify replication correctness, and generate comprehensive reports. + +## Required Inputs + +1. Generated code in `src/` +2. Test files in `tests/` +3. `replication_plan.md` with expected results + +## Required Outputs + +1. Test execution results +2. `reports/replication_report.md` + +## Workflow + +### Step 1: Run Test Suite + +```bash +cd workspace/{paper_name} +source .venv/bin/activate + +# Run all tests with coverage +pytest tests/ -v --cov=src --cov-report=term-missing +``` + +### Step 2: Verify Replication Targets + +For each target in replication_plan.md: + +1. Run the relevant computation +2. Compare with expected values +3. Calculate deviation + +### Step 3: Generate Report + +## Report Format + +```markdown +# Replication Report: {Paper Title} + +**Date**: {date} +**Status**: {Complete | Partial | Failed} + +## Summary + +| Metric | Status | +|--------|--------| +| Tests Passing | {X}/{Y} | +| Code Coverage | {X}% | +| Replication Accuracy | {qualitative} | + +## Test Results + +### Unit Tests + +| Test | Status | Time | +|------|--------|------| +| test_model_forward | PASS | 0.1s | +| test_loss_computation | PASS | 0.05s | +| ... | ... | ... 
| + +### Failed Tests (if any) + +#### {test_name} +- **Error**: {error message} +- **Expected**: {expected} +- **Actual**: {actual} +- **Likely cause**: {analysis} + +## Replication Targets + +### Figure X: {description} + +**Status**: Replicated | Partially Replicated | Not Replicated + +**Paper Values**: +| Metric | Paper | Ours | Deviation | +|--------|-------|------|-----------| +| {metric} | {value} | {value} | {%} | + +**Analysis**: +{explanation of any differences} + +### Table Y: {description} + +... + +## Code Quality + +- **Type Safety**: {assessment} +- **Documentation**: {assessment} +- **Test Coverage**: {percentage} + +## Reproducibility Checklist + +- [ ] Environment setup documented +- [ ] Random seeds set +- [ ] Hyperparameters match paper +- [ ] Data preprocessing matches paper +- [ ] Evaluation metrics match paper + +## Known Differences from Paper + +1. **{difference}**: {explanation and justification} + +## Recommendations + +1. {recommendation for improvement} + +## Appendix: Full Test Output + +``` +{pytest output} +``` +``` + +## Deviation Thresholds + +| Deviation | Classification | +|-----------|----------------| +| < 1% | Excellent match | +| 1-5% | Acceptable | +| 5-10% | Needs investigation | +| > 10% | Significant difference | + +## Analysis Guidelines + +When results differ from paper: + +1. Check implementation against paper equations +2. Verify hyperparameters +3. Check data preprocessing +4. Consider numerical precision differences +5. Note if paper has known errata + +## Quality Checklist + +Before completing: +- [ ] All tests executed +- [ ] Coverage report generated +- [ ] Each replication target evaluated +- [ ] Deviations analyzed and explained +- [ ] Recommendations provided +- [ ] Report is self-contained +``` + +- [ ] **Step 2: Verify file creation** + +```bash +cat .opencode/agents/test-runner.md +``` + +Expected: File contents match the markdown above. 
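The Deviation Thresholds table in test-runner.md can be applied mechanically when filling in the Deviation column of the replication report. What follows is a minimal sketch with hypothetical function names. The table leaves two choices open, so both are assumptions here: deviation is measured relative to the paper's value rather than in absolute percentage points, and a value that sits exactly on a shared boundary (5% or 10%) falls in the lower band.

```python
"""Hypothetical helper applying test-runner's deviation thresholds."""


def relative_deviation_pct(paper: float, ours: float) -> float:
    """Relative deviation of our value from the paper's value, in percent.

    Assumption: relative to the paper value, not absolute percentage points.
    """
    if paper == 0:
        raise ValueError("relative deviation is undefined when the paper value is 0")
    return abs(ours - paper) / abs(paper) * 100.0


def classify_deviation(dev_pct: float) -> str:
    """Map a deviation in percent to the report's classification.

    Assumption: values exactly at 5% or 10% fall in the lower band.
    """
    if dev_pct < 1.0:
        return "Excellent match"
    if dev_pct <= 5.0:
        return "Acceptable"
    if dev_pct <= 10.0:
        return "Needs investigation"
    return "Significant difference"


if __name__ == "__main__":
    # Hypothetical values: the paper reports 95.2, our run gives 94.1.
    dev = relative_deviation_pct(95.2, 94.1)
    print(f"{dev:.2f}% -> {classify_deviation(dev)}")  # prints: 1.16% -> Acceptable
```

If a paper reports accuracies where a gap in percentage points is the more natural measure, `relative_deviation_pct` is the single place to change. The report should state which convention was used.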
+ +- [ ] **Step 3: Commit** + +```bash +git add .opencode/agents/test-runner.md +git commit -m "feat(agents): add test-runner subagent + +Runs tests and generates comprehensive replication reports." +``` + +--- + +### Task 6: Create paper-parsing Skill + +**Files:** +- Create: `.opencode/skills/paper-parsing/SKILL.md` + +- [ ] **Step 1: Create skills directory** + +```bash +mkdir -p .opencode/skills/paper-parsing +``` + +- [ ] **Step 2: Write SKILL.md** + +Create `.opencode/skills/paper-parsing/SKILL.md`: + +```markdown +--- +name: paper-parsing +description: Use when analyzing ML/DL papers to ensure comprehensive extraction of all relevant information +--- + +# Paper Parsing Methodology + +## Overview + +Systematic approach to parsing ML/DL papers for replication. Emphasizes **completeness** and **openness** to avoid missing critical details. + +**Announce at start:** "I'm using the paper-parsing skill to ensure comprehensive paper analysis." + +## Core Philosophy + +1. **Completeness over speed**: Better to extract too much than miss something +2. **Open-ended discovery**: Papers contain unique insights; don't force into templates +3. **Cross-reference**: Information appears in multiple places; cross-check +4. 
**Explicit uncertainty**: Mark unclear items rather than guessing + +## Paper Sections Checklist + +### Abstract +- [ ] Core contribution identified +- [ ] Key results/numbers extracted +- [ ] Problem domain understood + +### Introduction +- [ ] Problem motivation clear +- [ ] Gap in existing work identified +- [ ] Proposed solution summarized +- [ ] Claimed contributions listed + +### Related Work +- [ ] Key prior methods identified +- [ ] Differences from this work noted +- [ ] Potential baselines for comparison + +### Method / Approach +- [ ] Architecture fully described +- [ ] All components identified +- [ ] Mathematical formulation complete +- [ ] Training procedure detailed +- [ ] Loss functions specified +- [ ] Hyperparameters listed + +### Experiments +- [ ] Datasets listed with sizes +- [ ] Evaluation metrics defined +- [ ] Baseline comparisons noted +- [ ] Ablation studies cataloged +- [ ] Key numerical results extracted + +### Appendix / Supplementary +- [ ] Additional implementation details +- [ ] Extended results +- [ ] Proofs or derivations +- [ ] Code references + +## Information Extraction Patterns + +### Architecture Details + +Look for: +- Layer types and configurations +- Activation functions +- Normalization methods +- Attention mechanisms +- Skip connections +- Input/output dimensions + +Common locations: +- Method section figures +- Architecture diagrams +- Table of hyperparameters +- Appendix implementation details + +### Training Configuration + +| Parameter | Typical Locations | +|-----------|-------------------| +| Learning rate | Experiments, Appendix | +| Batch size | Experiments, Appendix | +| Optimizer | Method, Appendix | +| Epochs | Experiments | +| Hardware | Experiments, Appendix | +| Training time | Experiments | + +### Numerical Results + +Extract from: +- Main results tables +- Comparison figures +- Ablation tables +- Training curves (approximate values) + +Format as: +| Metric | Dataset | Value | Conditions | 
+|--------|---------|-------|------------| +| Accuracy | CIFAR-10 | 95.2% | ResNet-50 backbone | + +## Common Omissions to Watch For + +1. **Initialization**: Often in appendix or not mentioned +2. **Data augmentation**: May be standard but unspecified +3. **Early stopping criteria**: Often implied +4. **Evaluation protocol**: Train/val/test split details +5. **Random seeds**: Reproducibility details +6. **Software versions**: PyTorch, CUDA versions + +## Quality Verification + +Before completing analysis: + +1. **Coverage check**: Every section reviewed? +2. **Consistency check**: Numbers match across sections? +3. **Completeness check**: Could someone implement from this? +4. **Ambiguity check**: Unclear items marked? + +## Output Quality Markers + +Good analysis: +- Specific numbers, not "good performance" +- Exact layer configs, not "standard ResNet" +- Explicit uncertainty markers +- Cross-references between sections + +Poor analysis: +- Vague descriptions +- Missing hyperparameters +- No numerical targets +- Assumptions without noting them + +## Red Flags + +If you notice: +- "Implementation details in code" → Check GitHub link +- "Standard settings" → Look up the standard +- "Following [citation]" → May need to read that paper +- Inconsistent numbers → Note the discrepancy +``` + +- [ ] **Step 3: Verify file creation** + +```bash +cat .opencode/skills/paper-parsing/SKILL.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 4: Commit** + +```bash +git add .opencode/skills/paper-parsing/SKILL.md +git commit -m "feat(skills): add paper-parsing skill + +Comprehensive methodology for ML/DL paper analysis." 
+``` + +--- + +### Task 7: Create code-generation Skill + +**Files:** +- Create: `.opencode/skills/code-generation/SKILL.md` + +- [ ] **Step 1: Create skill directory** + +```bash +mkdir -p .opencode/skills/code-generation +``` + +- [ ] **Step 2: Write SKILL.md** + +Create `.opencode/skills/code-generation/SKILL.md`: + +```markdown +--- +name: code-generation +description: Use when generating PyTorch code from paper analysis to ensure correct mapping from paper to code +--- + +# Code Generation from Papers + +## Overview + +Guidelines for translating paper descriptions into working PyTorch code. + +**Announce at start:** "I'm using the code-generation skill to ensure accurate paper-to-code translation." + +## Core Principles + +1. **Traceability**: Every code block should reference paper section/equation +2. **Testability**: Write code that can be unit tested +3. **Readability**: Prefer clarity over cleverness +4. **Modularity**: One component per file + +## Paper-to-Code Mapping + +### Architecture Diagrams → nn.Module + +| Diagram Element | PyTorch Equivalent | +|-----------------|-------------------| +| Box/Block | nn.Module subclass | +| Arrow | forward() call chain | +| Split | Multiple outputs / tuple | +| Merge | torch.cat / torch.add | +| Skip connection | Residual addition | + +### Equations → Tensor Operations + +| Notation | PyTorch | +|----------|---------| +| $Wx + b$ | `nn.Linear(in, out)` | +| $\sigma(x)$ | `torch.sigmoid(x)` or `nn.Sigmoid()` | +| $\text{softmax}(x)$ | `F.softmax(x, dim=-1)` | +| $\|x\|_2$ | `torch.norm(x, p=2)` | +| $x \odot y$ | `x * y` (element-wise) | +| $x^T y$ | `torch.dot(x, y)` for 1-D vectors; `x.T @ y` for 2-D matrices | +| $\sum_i x_i$ | `torch.sum(x, dim=k)`, where `k` is the tensor dimension indexed by $i$ | +| $\mathbb{E}[x]$ | `torch.mean(x)` (empirical mean over samples) | + +### Loss Functions + +| Paper Description | PyTorch | +|-------------------|---------| +| Cross-entropy | `nn.CrossEntropyLoss()` | +| MSE / L2 | `nn.MSELoss()` | +| L1 | `nn.L1Loss()` | +| BCE | `nn.BCEWithLogitsLoss()` (expects logits) | +| KL divergence |
`nn.KLDivLoss()` | +| Custom | Subclass or functional | + +## Code Structure Template + +```python +""" +{component_name}.py + +Implements {what} from "{paper_title}" ({year}) + +Paper Reference: +- Section: {section_number} +- Equation: ({equation_number}) +- Figure: {figure_number} + +Author: Auto-generated for paper replication +""" + +import torch +import torch.nn as nn +import torch.nn.functional as F +from typing import Optional, Tuple, List + + +class {ComponentName}(nn.Module): + """ + {One-line description} + + From paper: "{exact quote or paraphrase}" + + Args: + {param1}: {description} (paper: {where specified}) + {param2}: {description} + + Shape: + - Input: {shape description} + - Output: {shape description} + + Example: + >>> layer = {ComponentName}(dim=512) + >>> x = torch.randn(32, 100, 512) + >>> out = layer(x) + >>> out.shape + torch.Size([32, 100, 512]) + """ + + def __init__( + self, + {param1}: {type}, + {param2}: {type} = {default}, + ): + super().__init__() + + # Paper Section X.Y: "{description}" + self.layer1 = nn.Linear(...) + + # Equation (N): ... + self.layer2 = nn.LayerNorm(...) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """ + Forward pass implementing Equation (N). + + Args: + x: Input tensor of shape (batch, seq, dim) + + Returns: + Output tensor of shape (batch, seq, dim) + """ + # Step 1: ... (Eq. N, first term) + h = self.layer1(x) + + # Step 2: ... (Eq. 
N, second term) + out = self.layer2(h) + + return out +``` + +## Common Patterns + +### Residual Connection + +```python +# Paper: "We add a residual connection" +out = self.sublayer(x) + x +``` + +### Layer Normalization + +```python +# self.attention stands for any sublayer that maps x to a tensor of the same shape + +# Paper: "Pre-LN Transformer" (normalize the sublayer input, then add the residual) +x = x + self.attention(self.norm(x)) + +# Paper: "Post-LN Transformer" (add the residual, then normalize) +x = self.norm(x + self.attention(x)) +``` + +### Multi-Head Attention + +```python +# Paper: "Standard multi-head attention with h heads" +self.attention = nn.MultiheadAttention( + embed_dim=d_model, + num_heads=h, + dropout=dropout, + batch_first=True, +) + +# Self-attention: query, key and value are all x; the module returns (output, attention weights) +out, attn_weights = self.attention(x, x, x) +``` + +### Custom Activation + +```python +# Paper: "We use GELU activation" +x = F.gelu(x) + +# Paper: "We use Swish/SiLU activation" +x = F.silu(x) +``` + +## Handling Ambiguity + +When the paper is unclear: + +1. **Check the code repository** if one is available +2. **Follow common practice** for the architecture type +3. **Document the assumption** in a code comment +4. **Add a TODO** for later verification + +```python +# TODO: Paper unclear on initialization. Using PyTorch default. +# See: https://github.com/paper/repo for reference implementation +self.linear = nn.Linear(in_dim, out_dim) +``` + +## Verification Checklist + +Before completing a module: + +- [ ] All equations implemented +- [ ] Shapes documented and verified +- [ ] Paper references in comments +- [ ] Type hints complete +- [ ] Example in docstring works +- [ ] No hardcoded dimensions (use params) +- [ ] Gradient flow verified (no in-place ops breaking autograd) +``` + +- [ ] **Step 3: Verify file creation** + +```bash +cat .opencode/skills/code-generation/SKILL.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 4: Commit** + +```bash +git add .opencode/skills/code-generation/SKILL.md +git commit -m "feat(skills): add code-generation skill + +Paper-to-PyTorch code translation guidelines."
+``` + +--- + +### Task 8: Create pytorch-patterns Skill + +**Files:** +- Create: `.opencode/skills/pytorch-patterns/SKILL.md` + +- [ ] **Step 1: Create skill directory** + +```bash +mkdir -p .opencode/skills/pytorch-patterns +``` + +- [ ] **Step 2: Write SKILL.md** + +Create `.opencode/skills/pytorch-patterns/SKILL.md`: + +```markdown +--- +name: pytorch-patterns +description: Use when writing PyTorch code to follow best practices and common patterns +--- + +# PyTorch Best Practices + +## Overview + +Established patterns for writing clean, efficient, and maintainable PyTorch code. + +**Announce at start:** "I'm using the pytorch-patterns skill for best practice code." + +## Model Definition + +### Basic Module + +```python +import torch +import torch.nn as nn +from typing import Optional + + +class MyModel(nn.Module): + def __init__(self, config: dict): + super().__init__() + self.config = config + + # Define layers + self.encoder = nn.Linear(config["input_dim"], config["hidden_dim"]) + self.decoder = nn.Linear(config["hidden_dim"], config["output_dim"]) + + # Initialize weights + self._init_weights() + + def _init_weights(self): + """Initialize weights following paper's specification.""" + for module in self.modules(): + if isinstance(module, nn.Linear): + nn.init.xavier_uniform_(module.weight) + if module.bias is not None: + nn.init.zeros_(module.bias) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + h = self.encoder(x) + h = torch.relu(h) + out = self.decoder(h) + return out +``` + +### Model with Multiple Outputs + +```python +from typing import Tuple, NamedTuple + + +class ModelOutput(NamedTuple): + logits: torch.Tensor + hidden_states: torch.Tensor + attention_weights: Optional[torch.Tensor] = None + + +class MultiOutputModel(nn.Module): + def forward(self, x: torch.Tensor) -> ModelOutput: + # ... computation ... 
+ return ModelOutput( + logits=logits, + hidden_states=hidden, + attention_weights=attn if self.return_attention else None, + ) +``` + +## Device Management + +### Automatic Device Handling + +```python +class DeviceAwareModel(nn.Module): + @property + def device(self) -> torch.device: + """Get model's device from first parameter.""" + return next(self.parameters()).device + + def forward(self, x: torch.Tensor) -> torch.Tensor: + # The caller moves inputs to self.device; tensors created here must be placed explicitly + mask = torch.ones(x.size(0), 1, device=self.device) # (batch, 1) broadcasts over features + return x * mask +``` + +### Training Script Device Setup + +```python +def get_device() -> torch.device: + """Get best available device.""" + if torch.cuda.is_available(): + return torch.device("cuda") + elif torch.backends.mps.is_available(): + return torch.device("mps") + return torch.device("cpu") + + +device = get_device() +model = MyModel(config).to(device) + +# The DataLoader typically yields CPU tensors; move each batch to the device explicitly +for batch in dataloader: + inputs = batch["inputs"].to(device) + targets = batch["targets"].to(device) +``` + +## Training Loop + +### Standard Pattern + +```python +from torch.utils.data import DataLoader +from tqdm import tqdm + + +def train_epoch( + model: nn.Module, + dataloader: DataLoader, + optimizer: torch.optim.Optimizer, + criterion: nn.Module, + device: torch.device, + scheduler: Optional[torch.optim.lr_scheduler._LRScheduler] = None, +) -> float: + """Train for one epoch.""" + model.train() + total_loss = 0.0 + num_batches = 0 + + for batch in tqdm(dataloader, desc="Training"): + # Move to device + inputs = batch["inputs"].to(device) + targets = batch["targets"].to(device) + + # Forward pass + optimizer.zero_grad() + outputs = model(inputs) + loss = criterion(outputs, targets) + + # Backward pass + loss.backward() + + # Gradient clipping: max_norm=1.0 is a placeholder; match the paper's value or remove + torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) + + # Update (stepping the scheduler every batch suits per-step schedules such as warmup) + optimizer.step() + if scheduler is not None: + scheduler.step() + + total_loss += loss.item() + num_batches += 1 + + return
total_loss / num_batches + + +@torch.no_grad() +def evaluate( + model: nn.Module, + dataloader: DataLoader, + criterion: nn.Module, + device: torch.device, +) -> Tuple[float, float]: + """Evaluate model.""" + model.eval() + total_loss = 0.0 + correct = 0 + total = 0 + + for batch in dataloader: + inputs = batch["inputs"].to(device) + targets = batch["targets"].to(device) + + outputs = model(inputs) + loss = criterion(outputs, targets) + + total_loss += loss.item() + preds = outputs.argmax(dim=-1) + correct += (preds == targets).sum().item() + total += targets.size(0) + + return total_loss / len(dataloader), correct / total +``` + +## Data Loading + +### Custom Dataset + +```python +from torch.utils.data import Dataset, DataLoader + + +class PaperDataset(Dataset): + def __init__(self, data_path: str, transform=None): + self.data = self._load_data(data_path) + self.transform = transform + + def _load_data(self, path: str): + # Load from disk + pass + + def __len__(self) -> int: + return len(self.data) + + def __getitem__(self, idx: int) -> dict: + item = self.data[idx] + if self.transform: + item = self.transform(item) + return item + + +def get_dataloader( + dataset: Dataset, + batch_size: int, + shuffle: bool = True, + num_workers: int = 4, +) -> DataLoader: + return DataLoader( + dataset, + batch_size=batch_size, + shuffle=shuffle, + num_workers=num_workers, + pin_memory=True, # Faster GPU transfer + drop_last=True, # Consistent batch sizes + ) +``` + +## Checkpointing + +### Save and Load + +```python +def save_checkpoint( + model: nn.Module, + optimizer: torch.optim.Optimizer, + epoch: int, + loss: float, + path: str, +): + """Save training checkpoint.""" + torch.save({ + "epoch": epoch, + "model_state_dict": model.state_dict(), + "optimizer_state_dict": optimizer.state_dict(), + "loss": loss, + }, path) + + +def load_checkpoint( + path: str, + model: nn.Module, + optimizer: Optional[torch.optim.Optimizer] = None, +) -> dict: + """Load training checkpoint.""" + 
checkpoint = torch.load(path, weights_only=True) + model.load_state_dict(checkpoint["model_state_dict"]) + if optimizer is not None: + optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) + return checkpoint +``` + +## Reproducibility + +### Set Seeds + +```python +import random +import numpy as np +import torch + + +def set_seed(seed: int = 42): + """Set all random seeds for reproducibility.""" + random.seed(seed) + np.random.seed(seed) + torch.manual_seed(seed) + torch.cuda.manual_seed_all(seed) + + # For deterministic behavior (may impact performance) + torch.backends.cudnn.deterministic = True + torch.backends.cudnn.benchmark = False +``` + +## Common Gotchas + +### In-place Operations + +```python +# BAD: In-place edits can break autograd when x is a leaf that requires grad +# or when its original value is needed for the backward pass +x += 1 +x[:, 0] = 0 + +# GOOD: Creates new tensor +x = x + 1 +x = torch.cat([torch.zeros_like(x[:, :1]), x[:, 1:]], dim=1) +``` + +### Detaching for Metrics + +```python +# BAD: Accumulating a tensor that requires grad keeps every batch's graph alive +total_loss += loss # Memory leak! + +# GOOD: Convert to a Python float before logging +total_loss += loss.item() +``` + +### Mixed Precision + +```python +# torch.cuda.amp is deprecated in recent PyTorch releases; newer code uses +# torch.amp.autocast("cuda") and torch.amp.GradScaler("cuda") instead +from torch.cuda.amp import autocast, GradScaler + +scaler = GradScaler() + +for batch in dataloader: + inputs = batch["inputs"].to(device) + targets = batch["targets"].to(device) + optimizer.zero_grad() + + with autocast(): + outputs = model(inputs) + loss = criterion(outputs, targets) + + scaler.scale(loss).backward() + scaler.step(optimizer) + scaler.update() +``` +``` + +- [ ] **Step 3: Verify file creation** + +```bash +cat .opencode/skills/pytorch-patterns/SKILL.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 4: Commit** + +```bash +git add .opencode/skills/pytorch-patterns/SKILL.md +git commit -m "feat(skills): add pytorch-patterns skill + +PyTorch best practices and common patterns."
+``` + +--- + +### Task 9: Create verification Skill + +**Files:** +- Create: `.opencode/skills/verification/SKILL.md` + +- [ ] **Step 1: Create skill directory** + +```bash +mkdir -p .opencode/skills/verification +``` + +- [ ] **Step 2: Write SKILL.md** + +Create `.opencode/skills/verification/SKILL.md`: + +```markdown +--- +name: verification +description: Use when verifying replication results against paper's reported values +--- + +# Replication Verification + +## Overview + +Systematic approach to verifying that replicated code produces results matching the original paper. + +**Announce at start:** "I'm using the verification skill to validate replication accuracy." + +## Verification Levels + +### Level 1: Code Correctness +- Unit tests pass +- No runtime errors +- Gradient flow works + +### Level 2: Behavioral Match +- Output shapes correct +- Value ranges reasonable +- Edge cases handled + +### Level 3: Numerical Match +- Results within tolerance of paper +- Trends match (even if absolute values differ) +- Statistical significance considered + +## Test Design for Replication + +### Shape Tests + +```python +def test_model_output_shape(): + """Verify model produces correct output shape per paper.""" + model = MyModel(config) + x = torch.randn(batch_size, seq_len, input_dim) + out = model(x) + + # Paper Section 3.2: "Output dimension is 512" + assert out.shape == (batch_size, seq_len, 512) +``` + +### Value Range Tests + +```python +def test_attention_weights_sum(): + """Attention weights should sum to 1 (paper Eq. 
3).""" + model = AttentionLayer(config) + x = torch.randn(batch_size, seq_len, dim) + _, attn_weights = model(x, return_attention=True) + + # Softmax output sums to 1 + assert torch.allclose(attn_weights.sum(dim=-1), torch.ones(batch_size, seq_len)) +``` + +### Gradient Tests + +```python +def test_gradient_flow(): + """Verify gradients flow through all parameters.""" + model = MyModel(config) + x = torch.randn(batch_size, input_dim, requires_grad=True) + out = model(x) + loss = out.sum() + loss.backward() + + for name, param in model.named_parameters(): + assert param.grad is not None, f"No gradient for {name}" + assert not torch.isnan(param.grad).any(), f"NaN gradient for {name}" +``` + +### Numerical Match Tests + +```python +def test_loss_value_reasonable(): + """Loss should be in expected range per paper Figure 2.""" + model = MyModel(config) + # ... setup ... + + loss = compute_loss(model, data) + + # Paper reports initial loss ~2.3 (cross-entropy on 10 classes) + assert 2.0 < loss.item() < 3.0, f"Initial loss {loss.item()} outside expected range" +``` + +## Comparison Methodology + +### Absolute Comparison + +```python +def compare_absolute(paper_value: float, our_value: float, tolerance: float = 0.01): + """Compare with absolute tolerance.""" + diff = abs(paper_value - our_value) + return diff <= tolerance, diff +``` + +### Relative Comparison + +```python +def compare_relative(paper_value: float, our_value: float, tolerance: float = 0.05): + """Compare with relative tolerance (5% default).""" + if paper_value == 0: + return our_value == 0, abs(our_value) + relative_diff = abs(paper_value - our_value) / abs(paper_value) + return relative_diff <= tolerance, relative_diff +``` + +### Statistical Comparison + +```python +def compare_with_variance( + paper_mean: float, + paper_std: float, + our_values: List[float], + confidence: float = 0.95, +): + """Compare considering paper's reported variance.""" + our_mean = np.mean(our_values) + our_std = 
np.std(our_values) + + # Check if means are within 2 standard deviations + combined_std = np.sqrt(paper_std**2 + our_std**2) + z_score = abs(paper_mean - our_mean) / combined_std + + return z_score < 2.0, z_score +``` + +## Common Difference Sources + +### Acceptable Differences + +| Source | Typical Impact | Mitigation | +|--------|---------------|------------| +| Random seed | 1-2% | Run multiple seeds | +| Floating point | < 0.1% | Use float64 for verification | +| Framework differences | 1-3% | Document and accept | +| Hardware differences | 0.5-1% | Note in report | + +### Concerning Differences + +| Source | Typical Impact | Action | +|--------|---------------|--------| +| Wrong architecture | > 10% | Review code vs paper | +| Wrong hyperparameters | 5-20% | Verify all settings | +| Data preprocessing | Variable | Match paper exactly | +| Evaluation protocol | Variable | Check train/val/test split | + +## Verification Checklist + +### Before Comparison + +- [ ] Seeds set for reproducibility +- [ ] Same evaluation data as paper +- [ ] Same preprocessing pipeline +- [ ] Same evaluation metrics + +### During Comparison + +- [ ] Run multiple times with different seeds +- [ ] Record mean and standard deviation +- [ ] Compare trends, not just final values +- [ ] Check intermediate checkpoints if available + +### After Comparison + +- [ ] Document all differences +- [ ] Explain likely causes +- [ ] Determine if differences are acceptable +- [ ] Suggest improvements if needed + +## Report Template + +```markdown +## Verification Result: {Metric Name} + +**Paper Value**: {value} ± {std} +**Our Value**: {value} ± {std} +**Difference**: {absolute} ({relative}%) + +**Status**: MATCH | ACCEPTABLE | INVESTIGATE | MISMATCH + +**Analysis**: +{explanation of difference} + +**Confidence**: {HIGH | MEDIUM | LOW} +{reasoning for confidence level} +``` +``` + +- [ ] **Step 3: Verify file creation** + +```bash +cat .opencode/skills/verification/SKILL.md +``` + +Expected: File 
contents match the markdown above. + +- [ ] **Step 4: Commit** + +```bash +git add .opencode/skills/verification/SKILL.md +git commit -m "feat(skills): add verification skill + +Replication result verification methodology." +``` + +--- + +### Task 10: Create environment-management Skill + +**Files:** +- Create: `.opencode/skills/environment-management/SKILL.md` + +- [ ] **Step 1: Create skill directory** + +```bash +mkdir -p .opencode/skills/environment-management +``` + +- [ ] **Step 2: Write SKILL.md** + +Create `.opencode/skills/environment-management/SKILL.md`: + +```markdown +--- +name: environment-management +description: Use when setting up Python environment for ML/DL paper replication using Conda + uv +--- + +# Environment Management (Conda + uv) + +## Overview + +Hybrid approach using Conda for system-level dependencies and uv for project isolation. + +**Announce at start:** "I'm using the environment-management skill for Conda + uv setup." + +## Architecture + +``` +┌─────────────────────────────────────────┐ +│ Conda (System Base) │ +│ - Python interpreter │ +│ - CUDA toolkit │ +│ - System-level C++ libraries │ +└─────────────────────────────────────────┘ + │ + │ provides Python + ▼ +┌─────────────────────────────────────────┐ +│ uv (Project Isolation) │ +│ - Per-project .venv │ +│ - Fast dependency resolution │ +│ - Reproducible installs │ +└─────────────────────────────────────────┘ +``` + +## Setup Commands + +### Step 1: Conda Base Environment + +Check if base exists: +```bash +conda env list | grep ai_base +``` + +Create if needed: +```bash +# Linux/Mac +conda create -n ai_base python=3.10 cuda-toolkit=11.8 -y + +# Windows (CUDA from NVIDIA, not conda) +conda create -n ai_base python=3.10 -y +``` + +### Step 2: Project Environment + +```bash +cd workspace/{paper_name} + +# Get Conda Python path +# Linux/Mac: +PYTHON_PATH=$(conda run -n ai_base which python) + +# Windows: +# PYTHON_PATH=$(conda run -n ai_base python -c "import sys; 
print(sys.executable)") + +# Create uv venv +uv venv --python $PYTHON_PATH +``` + +### Step 3: Activate and Install + +```bash +# Linux/Mac +source .venv/bin/activate + +# Windows +.venv\Scripts\activate + +# Install dependencies +uv pip install -e ".[dev]" +``` + +## pyproject.toml Template + +```toml +[project] +name = "{paper_name}" +version = "0.1.0" +description = "Replication of {paper_title}" +requires-python = ">=3.10" + +dependencies = [ + # Core ML + "torch>=2.0.0", + "numpy>=1.24.0", + + # Visualization + "matplotlib>=3.7.0", + "seaborn>=0.12.0", + + # Utilities + "tqdm>=4.65.0", + "pyyaml>=6.0", +] + +[project.optional-dependencies] +dev = [ + "pytest>=7.0.0", + "pytest-cov>=4.0.0", + "black>=23.0.0", + "ruff>=0.0.260", +] + +# Add based on paper requirements +vision = [ + "torchvision>=0.15.0", + "pillow>=9.5.0", +] + +nlp = [ + "transformers>=4.30.0", + "tokenizers>=0.13.0", + "datasets>=2.12.0", +] + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[tool.pytest.ini_options] +testpaths = ["tests"] +python_files = ["test_*.py"] +addopts = "-v --tb=short" + +[tool.black] +line-length = 88 +target-version = ["py310"] + +[tool.ruff] +line-length = 88 +select = ["E", "F", "I", "N", "W"] +``` + +## PyTorch + CUDA Compatibility + +| CUDA Version | PyTorch Version | Install Command | +|--------------|-----------------|-----------------| +| 11.8 | 2.0+ | `uv pip install torch --index-url https://download.pytorch.org/whl/cu118` | +| 12.1 | 2.1+ | `uv pip install torch --index-url https://download.pytorch.org/whl/cu121` | +| CPU only | Any | `uv pip install torch --index-url https://download.pytorch.org/whl/cpu` | + +## Environment Verification + +```bash +# Check Python +python --version + +# Check PyTorch +python -c "import torch; print(f'PyTorch {torch.__version__}')" + +# Check CUDA +python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')" +python -c "import torch; print(f'CUDA version: 
{torch.version.cuda}')" + +# Check GPU +python -c "import torch; print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')" +``` + +## Troubleshooting + +### CUDA Not Found + +```bash +# Check NVIDIA driver +nvidia-smi + +# Reinstall PyTorch with correct CUDA +uv pip install torch --index-url https://download.pytorch.org/whl/cu118 --force-reinstall +``` + +### Dependency Conflicts + +```bash +# Clear cache and reinstall +uv cache clean +uv pip install -e ".[dev]" --force-reinstall +``` + +### Permission Errors (Windows) + +```powershell +# Run as Administrator or: +Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser +``` + +## Best Practices + +1. **One environment per paper**: Don't mix dependencies +2. **Pin versions in pyproject.toml**: For reproducibility +3. **Use dev dependencies**: Keep test tools separate +4. **Document CUDA version**: In README.md +5. **Commit pyproject.toml**: Not .venv/ + +## Quick Reference + +```bash +# Full setup sequence (Linux/Mac) +conda activate ai_base || conda create -n ai_base python=3.10 cuda-toolkit=11.8 -y && conda activate ai_base +cd workspace/{paper_name} +uv venv --python $(which python) +source .venv/bin/activate +uv pip install -e ".[dev]" +pytest tests/ -v +``` +``` + +- [ ] **Step 3: Verify file creation** + +```bash +cat .opencode/skills/environment-management/SKILL.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 4: Commit** + +```bash +git add .opencode/skills/environment-management/SKILL.md +git commit -m "feat(skills): add environment-management skill + +Conda + uv hybrid environment setup for ML projects." 
+``` + +--- + +### Task 11: Create /replicate Command + +**Files:** +- Create: `.opencode/commands/replicate.md` + +- [ ] **Step 1: Create commands directory** + +```bash +mkdir -p .opencode/commands +``` + +- [ ] **Step 2: Write replicate.md** + +Create `.opencode/commands/replicate.md`: + +```markdown +--- +description: Start paper replication workflow +agent: paper-director +--- + +Start the paper replication workflow for the specified paper. + +## Input + +Paper file: $ARGUMENTS + +If no file specified, ask the user to provide the path to a paper (Markdown file or paste text directly). + +## Workflow + +1. Validate paper file exists (if path provided) +2. Extract paper name from filename or ask user +3. Create workspace directory: `workspace/{paper_name}/` +4. Begin Phase 1: Paper Analysis + - Dispatch @paper-image-extractor + - Dispatch @paper-analyzer +5. Present Human Checkpoint with analysis summary +6. After approval, begin Phase 2: Code Generation (TDD) +7. Begin Phase 3: Verification +8. Present final replication report + +## Example Usage + +``` +/replicate workspace/attention_is_all_you_need.md +``` + +Or without arguments: +``` +/replicate +> Please provide the path to your paper or paste the content directly. +``` +``` + +- [ ] **Step 3: Verify file creation** + +```bash +cat .opencode/commands/replicate.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 4: Commit** + +```bash +git add .opencode/commands/replicate.md +git commit -m "feat(commands): add /replicate command + +Entry point for paper replication workflow." +``` + +--- + +### Task 12: Create /verify Command + +**Files:** +- Create: `.opencode/commands/verify.md` + +- [ ] **Step 1: Write verify.md** + +Create `.opencode/commands/verify.md`: + +```markdown +--- +description: Verify replication results for a completed project +agent: paper-director +--- + +Verify the replication results for an existing project. 
+ +## Input + +Project directory: $ARGUMENTS + +If no directory specified, list available projects in workspace/ and ask user to select. + +## Workflow + +1. Validate project directory exists +2. Check required files exist: + - `analysis/paper_structure.md` + - `analysis/replication_plan.md` + - `src/` with code + - `tests/` with tests +3. Dispatch @test-runner to: + - Run test suite + - Compare results with paper + - Generate/update `reports/replication_report.md` +4. Present verification summary + +## Example Usage + +``` +/verify workspace/attention_is_all_you_need/ +``` + +Or without arguments: +``` +/verify +> Available projects: +> 1. attention_is_all_you_need +> 2. resnet +> Please select a project to verify. +``` +``` + +- [ ] **Step 2: Verify file creation** + +```bash +cat .opencode/commands/verify.md +``` + +Expected: File contents match the markdown above. + +- [ ] **Step 3: Commit** + +```bash +git add .opencode/commands/verify.md +git commit -m "feat(commands): add /verify command + +Entry point for verification of existing replication projects." +``` + +--- + +### Task 13: Create opencode.json Configuration + +**Files:** +- Create: `opencode.json` + +- [ ] **Step 1: Write opencode.json** + +Create `opencode.json`: + +```json +{ + "$schema": "https://opencode.ai/config.json", + "default_agent": "paper-director", + "agent": { + "paper-director": { + "mode": "primary" + }, + "paper-analyzer": { + "mode": "subagent" + }, + "paper-image-extractor": { + "mode": "subagent" + }, + "code-writer": { + "mode": "subagent" + }, + "test-runner": { + "mode": "subagent" + } + } +} +``` + +- [ ] **Step 2: Verify file creation** + +```bash +cat opencode.json +``` + +Expected: Valid JSON matching above. + +- [ ] **Step 3: Commit** + +```bash +git add opencode.json +git commit -m "feat: add opencode.json project configuration + +Sets paper-director as default agent with subagent definitions." 
+``` + +--- + +### Task 14: Create Workspace Directory + +**Files:** +- Create: `workspace/.gitkeep` + +- [ ] **Step 1: Create workspace directory** + +```bash +mkdir -p workspace +``` + +- [ ] **Step 2: Create .gitkeep** + +```bash +touch workspace/.gitkeep +``` + +Or on Windows: +```powershell +New-Item -ItemType File -Path workspace/.gitkeep -Force +``` + +- [ ] **Step 3: Verify directory creation** + +```bash +ls -la workspace/ +``` + +Expected: Directory exists with .gitkeep file. + +- [ ] **Step 4: Commit** + +```bash +git add workspace/.gitkeep +git commit -m "feat: add workspace directory for paper replication projects + +Papers placed here will be processed by the replication agents." +``` + +--- + +### Task 15: Final Verification + +**Files:** +- Read: All created files + +- [ ] **Step 1: Verify directory structure** + +```bash +find .opencode -type f -name "*.md" | sort +``` + +Expected output: +``` +.opencode/agents/code-writer.md +.opencode/agents/paper-analyzer.md +.opencode/agents/paper-director.md +.opencode/agents/paper-image-extractor.md +.opencode/agents/test-runner.md +.opencode/commands/replicate.md +.opencode/commands/verify.md +.opencode/skills/code-generation/SKILL.md +.opencode/skills/environment-management/SKILL.md +.opencode/skills/paper-parsing/SKILL.md +.opencode/skills/pytorch-patterns/SKILL.md +.opencode/skills/verification/SKILL.md +``` + +- [ ] **Step 2: Verify opencode.json** + +```bash +cat opencode.json | python -m json.tool +``` + +Expected: Valid JSON output, no errors. + +- [ ] **Step 3: Verify workspace exists** + +```bash +ls workspace/ +``` + +Expected: .gitkeep file present. + +- [ ] **Step 4: Run OpenCode to verify agents load** + +```bash +opencode --help +``` + +Then in OpenCode: +``` +/help +``` + +Verify that `/replicate` and `/verify` commands appear. + +- [ ] **Step 5: Test agent switching** + +In OpenCode, press Tab to cycle agents. Verify `paper-director` is available. 
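The file and configuration checks in Steps 1 to 3 can also be scripted so they behave the same on Linux, macOS and Windows. This is a minimal sketch under stated assumptions. The helper names `missing_files` and `config_problems` are hypothetical and not part of the plan. The checks are limited to what Tasks 13 to 15 already specify: the file list from Step 1, `opencode.json`, `workspace/.gitkeep`, and the `mode` of each agent in `opencode.json`. It does not check whether OpenCode actually loads the agents. Steps 4 to 6 still cover that.

```python
import json
from pathlib import Path

# Paths from Task 15 Step 1, plus opencode.json (Task 13) and workspace/.gitkeep (Task 14)
EXPECTED_FILES = [
    ".opencode/agents/code-writer.md",
    ".opencode/agents/paper-analyzer.md",
    ".opencode/agents/paper-director.md",
    ".opencode/agents/paper-image-extractor.md",
    ".opencode/agents/test-runner.md",
    ".opencode/commands/replicate.md",
    ".opencode/commands/verify.md",
    ".opencode/skills/code-generation/SKILL.md",
    ".opencode/skills/environment-management/SKILL.md",
    ".opencode/skills/paper-parsing/SKILL.md",
    ".opencode/skills/pytorch-patterns/SKILL.md",
    ".opencode/skills/verification/SKILL.md",
    "opencode.json",
    "workspace/.gitkeep",
]

# Agent modes as declared in opencode.json (Task 13)
EXPECTED_MODES = {
    "paper-director": "primary",
    "paper-analyzer": "subagent",
    "paper-image-extractor": "subagent",
    "code-writer": "subagent",
    "test-runner": "subagent",
}


def missing_files(root: Path) -> list[str]:
    """Return the expected relative paths that are not regular files under root."""
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def config_problems(root: Path) -> list[str]:
    """Parse opencode.json and list agents whose mode differs from the plan."""
    config_path = root / "opencode.json"
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"opencode.json unreadable: {exc}"]
    agents = config.get("agent", {})
    problems = []
    for name, mode in EXPECTED_MODES.items():
        actual = agents.get(name, {}).get("mode")
        if actual != mode:
            problems.append(f"{name}: expected mode {mode!r}, got {actual!r}")
    return problems
```

Run from the repository root, `missing_files(Path(".")) + config_problems(Path("."))` should return an empty list once Tasks 1 to 14 are complete. Each check returns a list of problems instead of raising, so one run reports every problem it finds.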
+ + - [ ] **Step 6: Test subagent mention** + +``` +@paper-analyzer Can you help me? +``` + +Verify that the subagent responds. + +- [ ] **Step 7: Final commit summary** + +```bash +git log --oneline -15 +``` + +Expected: All feature commits present. + +--- + +## Self-Review Checklist + +- [x] **Spec coverage**: All 5 agents, 5 skills, 2 commands, config file defined +- [x] **No placeholders**: All code blocks complete +- [x] **Consistent naming**: Agent/skill names match throughout +- [x] **File paths exact**: All paths specified completely +- [x] **Commits granular**: Each task has a commit step diff --git a/docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md b/docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md new file mode 100644 index 0000000..e8323fe --- /dev/null +++ b/docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md @@ -0,0 +1,757 @@ +# Paper Replication Agent Design Specification + +**Date**: 2026-03-31 +**Status**: Draft +**Author**: OpenCode AI + +--- + +## 1. Overview + +### 1.1 Purpose + +Design a paper replication agent system on the OpenCode platform that automates the replication of machine learning / deep learning papers. The system can: + +- Parse the paper's text and images +- Automatically generate PyTorch replication code +- Run tests to verify that the code is correct +- Generate a replication report that compares results with the original paper + +### 1.2 Scope + +- **Target papers**: Machine learning / deep learning papers +- **Input format**: Markdown files or plain text +- **Output framework**: PyTorch +- **Automation level**: Human review after parsing, then automated execution (TDD mode) + +### 1.3 Success Criteria + +1. Accurately parses the paper's structure and key information +2. Understands the paper's architecture diagrams and experiment figures +3. Generates runnable PyTorch code +4. The code passes its unit tests +5. Produces a replication report that explicitly flags differences from the original paper + +--- + +## 2.
Architecture + +### 2.1 System Architecture + +The system uses an **orchestrating primary agent + specialized subagents** architecture. + +``` +┌─────────────────────────────────────────────────────────┐ +│ paper-director │ +│ (Primary Agent) │ +│ Workflow orchestration / quality control / human review │ +└─────────────────────────────────────────────────────────┘ + │ + ┌─────────────────┼─────────────────┐ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│paper-analyzer │ │paper-image- │ │ code-writer │ +│ (Subagent) │ │extractor │ │ (Subagent) │ +│ Paper parsing │ │ (Subagent) │ │ Code generation │ +└───────────────┘ │ Image understanding │ └───────────────┘ + └───────────────┘ + │ + ▼ + ┌───────────────┐ + │ test-runner │ + │ (Subagent) │ + │ Test verification │ + └───────────────┘ +``` + +### 2.2 File Structure + +``` +PaperTool/ +├── .opencode/ +│ ├── agents/ # Agent definitions +│ │ ├── paper-director.md # Primary agent (orchestrator) +│ │ ├── paper-analyzer.md # Paper parsing subagent +│ │ ├── paper-image-extractor.md # Image understanding subagent +│ │ ├── code-writer.md # Code generation subagent +│ │ └── test-runner.md # Test verification subagent +│ │ +│ ├── skills/ # Project-level skills +│ │ ├── paper-parsing/ # Paper parsing skill +│ │ │ └── SKILL.md +│ │ ├── code-generation/ # Code generation skill +│ │ │ └── SKILL.md +│ │ ├── pytorch-patterns/ # PyTorch best practices +│ │ │ └── SKILL.md +│ │ ├── verification/ # Verification and comparison skill +│ │ │ └── SKILL.md +│ │ └── environment-management/ # Environment management skill (Conda + uv) +│ │ └── SKILL.md +│ │ +│ └── commands/ # Shortcut commands +│ ├── replicate.md # /replicate starts the replication workflow +│ └── verify.md # /verify checks replication results +│ +├── docs/superpowers/specs/ # Design documents +│ +├── workspace/ # Workspace +│ ├── paper_name.md # Paper source file (Markdown) +│ └── {paper_name}/ # One working directory per paper +│ ├── .venv/ # Project virtual environment created by uv +│ ├── pyproject.toml # Project dependency configuration +│ ├── analysis/ # Analysis results +│ │ ├── paper_structure.md # Paper structure analysis +│ │ ├── image_understanding.md # Image understanding +│ │ └── replication_plan.md # Replication plan +│ ├── src/ # Generated code +│ │ ├── models/ # Model definitions +│ │ ├── training/ # Training scripts +│ │ └── utils/ # Utility functions +│ ├── tests/ # Unit tests +│ ├── docs/ # Usage documentation +│ │ └── README.md +│ └── reports/ # Replication reports +│ └── replication_report.md +│ +└── opencode.json # Project configuration +``` + +### 2.3
Environment Management Strategy + +Python environments are managed with a hybrid **Conda + uv** approach: + +#### 2.3.1 Architecture + +``` +┌─────────────────────────────────────────────────────────┐ +│ Conda (system base) │ +│ ├─ ai_base environment │ +│ │ ├─ Python 3.10+ │ +│ │ ├─ cuda-toolkit │ +│ │ └─ Low-level C++ build toolchain │ +│ │ │ +│ └─ No pure-Python project packages installed here │ +└─────────────────────────────────────────────────────────┘ + │ + │ provides the Python interpreter + ▼ +┌─────────────────────────────────────────────────────────┐ +│ uv (per-project isolation) │ +│ ├─ workspace/paper_a/.venv/ │ +│ │ └─ torch, transformers, ... │ +│ │ │ +│ ├─ workspace/paper_b/.venv/ │ +│ │ └─ torch, jax, ... │ +│ │ │ +│ └─ One lightweight, independent virtual environment per project │ +└─────────────────────────────────────────────────────────┘ +``` + +#### 2.3.2 Environment Setup Flow + +``` +Start code generation + │ + ▼ +Does the Conda ai_base environment exist? + │ + ├─ No → create the Conda environment: + │ conda create -n ai_base python=3.10 cuda-toolkit -y + │ + ▼ +Enter the project directory workspace/{paper_name}/ + │ + ▼ +Does .venv exist? + │ + ├─ No → create the uv virtual environment: + │ uv venv --python $(conda run -n ai_base which python) + │ + ▼ +Activate the environment and install dependencies from pyproject.toml: + uv pip install -e ".[dev]" + │ + ▼ +Continue code generation / testing +``` + +#### 2.3.3 Implementation: Skill vs Subagent + +**Recommendation: use a Skill** + +**`environment-management` Skill responsibilities**: + +1. Provide Conda + uv best-practice guidance +2. Provide command templates for detecting and creating environments +3. Provide a pyproject.toml template +4. Provide common dependency configurations (PyTorch + CUDA) + +**`code-writer` Subagent responsibilities**: + +1. Invoke the `environment-management` Skill before generating code +2. Run the environment detection and creation commands +3. Generate the project's pyproject.toml +4. Install dependencies + +--- + +## 3.
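Workflow
+
+### 3.1 Complete Workflow
+
+```
+               User provides a paper (Markdown/text)
+                                 │
+                                 ▼
+┌────────────────────────────────────────────────────────────────┐
+│ Phase 1: Paper analysis                                        │
+│ ├─ paper-director creates the workspace directory              │
+│ ├─ Call @paper-image-extractor to extract and interpret images │
+│ │  └─ Output: image_understanding.md                           │
+│ ├─ Call @paper-analyzer to parse the paper structure           │
+│ │  ├─ Input: paper + image_understanding.md                    │
+│ │  └─ Output: paper_structure.md + replication_plan.md         │
+│ └─ Consolidate the analysis results                            │
+└────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌────────────────────────────────────────────────────────────────┐
+│ 🔴 Human checkpoint                                            │
+│ ├─ Show the paper structure analysis                           │
+│ ├─ Show the image understanding results                        │
+│ ├─ Show the replication plan and expected outputs              │
+│ ├─ Show the list of experiment figures/tables to reproduce     │
+│ └─ Wait for the user to confirm or correct                     │
+└────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌────────────────────────────────────────────────────────────────┐
+│ Phase 2: Code generation (TDD)                                 │
+│ ├─ paper-director writes test cases from the replication plan  │
+│ ├─ Call @code-writer to generate the model code                │
+│ │  ├─ Input: analysis documents + test cases                   │
+│ │  └─ Output: src/models/*.py                                  │
+│ ├─ Run the tests to verify correctness                         │
+│ ├─ If tests fail, iterate on fixes (max 3 rounds)              │
+│ ├─ Call @code-writer to generate the training scripts          │
+│ │  └─ Output: src/training/*.py                                │
+│ └─ Generate the usage guide docs/README.md                     │
+└────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌────────────────────────────────────────────────────────────────┐
+│ Phase 3: Verification and reporting                            │
+│ ├─ Call @test-runner to run the full test suite                │
+│ ├─ Try running the training scripts (check they execute)       │
+│ ├─ Compare against the results expected from the paper         │
+│ ├─ Analyze discrepancies and explain their causes              │
+│ └─ Generate replication_report.md                              │
+└────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+                           Final outputs
+```
+
+### 3.2 Human Checkpoint Details
+
+At the human checkpoint, the following must be presented for the user to confirm:
+
+1. **Basic paper information**
+
+   - Title, authors, publication year
+   - Summary of the core contributions
+
+2. **Understanding of the model architecture**
+
+   - Text description of the architecture diagram
+   - List of key components
+
+3. **Experiment replication targets**
+
+   - List of figures/tables to reproduce
+   - Data source for each figure/table
+   - Expected value ranges
+
+4. **Replication plan**
+
+   - Breakdown of code modules
+   - Implementation order
+   - Estimated effort
+
+5. **Risks and limitations**
+
+   - Parts that may not be reproducible
+   - Required external resources (datasets, etc.)
+
+---
+
+## 4. 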
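Agent Specifications
+
+### 4.1 paper-director (Primary Agent)
+
+**Mode**: `primary`
+
+**Description**: Orchestrator and quality-control owner of the paper replication project
+
+**Responsibilities**:
+
+- Receive the user's paper replication request
+- Create and manage the workspace directory
+- Plan the replication workflow and create a task list
+- Dispatch subagents to carry out specific tasks
+- Pause at the human checkpoint and wait for user confirmation
+- Consolidate results and produce the final replication report
+- Handle exceptions and recover from errors
+
+**Tools**: all enabled
+
+**Model**: inherit (uses the model configured by the user)
+
+### 4.2 paper-analyzer (Subagent)
+
+**Mode**: `subagent`
+
+**Description**: Parses the paper text, extracts structured information, and plans the replication tasks
+
+**Input**:
+
+- The paper as a Markdown file or plain text
+- `image_understanding.md` (image understanding results)
+
+**Output**:
+
+- `paper_structure.md` - paper structure analysis
+- `replication_plan.md` - replication plan
+
+**Output Content**:
+
+- Paper title, authors, abstract
+- Core contributions
+- Methodology overview
+- Model architecture description (text)
+- Loss function and optimizer
+- Experimental setup (datasets, hyperparameters)
+- Key equations (LaTeX)
+- **Experiment replication targets** (based on the image understanding)
+  - Figures/tables to reproduce
+  - Data source for each figure/table
+  - Expected value ranges
+  - Replication priority
+- List of features to implement
+
+**Tools**: read, write, edit
+
+**Model**: inherit
+
+### 4.3 paper-image-extractor (Subagent)
+
+**Mode**: `subagent`
+
+**Description**: Extracts and interprets the images in the paper
+
+**Input**:
+
+- The paper's Markdown file (containing image links/paths)
+
+**Output**:
+
+- `image_understanding.md` - image understanding document
+
+**Output Content** (per image):
+
+- Image type (architecture diagram / experimental plot / algorithm pseudocode / equation)
+- Detailed text description
+- Architecture diagrams: data flow, layer structure, connections
+- Experimental plots: extracted values, trend analysis, key data points
+- Algorithm pseudocode: prose description, step-by-step breakdown
+- Key takeaways
+
+**Tools**: read, write, edit, bash
+
+**Model**: inherit
+
+### 4.4 code-writer (Subagent)
+
+**Mode**: `subagent`
+
+**Description**: Generates PyTorch code from the analysis results and manages the project environment
+
+**Input**:
+
+- `paper_structure.md` - paper structure analysis
+- `image_understanding.md` - image understanding
+- Test cases (TDD mode)
+
+**Output**:
+
+- `.venv/` - project virtual environment created by uv
+- `pyproject.toml` - project dependency configuration
+- `src/models/*.py` - model definitions
+- `src/training/*.py` - training scripts
+- `src/utils/*.py` - utility functions
+- `docs/README.md` - usage documentation
+
+**Working Mode**: TDD-driven, plus environment management
+
+1. **Invoke the `environment-management` Skill**
+2. Detect the Conda base environment and create it if missing
+3. Create the project's uv virtual environment
+4. Generate pyproject.toml
+5. Install dependencies
+6. Receive the test cases
+7. Write code that makes the tests pass
+8. 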
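Iterate on fixes until all tests pass (at most 3 fix rounds; see section 8.2)
+
+**Tools**: read, write, edit, bash
+
+**Model**: inherit
+
+### 4.5 test-runner (Subagent)
+
+**Mode**: `subagent`
+
+**Description**: Runs the tests and verifies the replication results
+
+**Input**:
+
+- The generated code
+- Test cases
+- Results expected from the paper
+
+**Output**:
+
+- Test run results
+- `replication_report.md` - replication report
+
+**Report Content**:
+
+- Test pass rate
+- Verification that the code runs
+- Comparison with the paper's results
+- Explanation of discrepancies and their causes
+- Suggested improvements
+
+**Tools**: read, write, edit, bash
+
+**Model**: inherit
+
+---
+
+## 5. Skills Specifications
+
+### 5.1 paper-parsing
+
+**Purpose**: Guides systematic, comprehensive analysis of ML/DL papers
+
+**Core Philosophy**: Emphasizes **openness** and **completeness**, to avoid missing content or reading the paper one-sidedly
+
+**Content**:
+
+1. **Open-ended prompting framework**
+
+   - Types of information each part of a paper may contain (not a fixed template)
+   - A "may be present" checklist for each part
+   - Encouragement to look for what is unique about the paper
+
+2. **Completeness checklist**
+
+   - Abstract → Were the core contributions extracted?
+   - Introduction → Are the problem background and motivation understood?
+   - Related Work → Were the key differences from existing methods identified?
+   - Method → Are the method details fully understood?
+   - Experiments → Were all experiments that need reproducing identified?
+   - Appendix → Was the supplementary material checked?
+
+3. **Task decomposition guide**
+
+   - How to break a complex paper into executable subtasks
+   - Identifying dependencies
+   - Prioritization principles
+
+4. **Example templates**
+
+   - Analysis examples for different kinds of papers
+   - Reminders of commonly missed points
+
+### 5.2 code-generation
+
+**Purpose**: Guides generating high-quality code from a paper
+
+**Content**:
+
+- Rules for mapping paper descriptions to code
+- Modular design principles
+- Code style conventions
+- Common implementation patterns
+
+### 5.3 pytorch-patterns
+
+**Purpose**: Provides best-practice templates for PyTorch code
+
+**Content**:
+
+- Model definition template (`nn.Module`)
+- Training loop template
+- Reference implementations of common layers
+- Performance optimization tips
+- Device management (CPU/GPU)
+- Data loading best practices
+
+### 5.4 verification
+
+**Purpose**: Guides verification of replication results
+
+**Content**:
+
+- Test case design guidelines
+- Methods for comparing against the paper's results
+- Discrepancy analysis framework
+- Checklist of common causes of discrepancies
+
+### 5.5 environment-management
+
+**Purpose**: Guides managing project environments with the hybrid Conda + uv approach
+
+**Content**:
+
+1. **Environment architecture**
+
+   - Role of Conda as the system base
+   - Role of uv for per-project isolation
+   - How the two work together
+
+2. **Conda base environment setup**
+
+   - Command to create the ai_base environment
+   - Required system-level dependencies (cuda-toolkit, etc.)
+   - Environment detection script
+
+3. **uv project environment setup**
+
+   - Command template for creating .venv
+   - pyproject.toml template
+   - Common ML dependency configurations
+   - PyTorch / CUDA version compatibility
+
+4. 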
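**Environment management command checklist**
+
+   ```bash
+   # Check whether the Conda base environment exists
+   conda env list | grep -w ai_base
+
+   # Create the Conda base environment
+   conda create -n ai_base python=3.10 cuda-toolkit -y
+
+   # Get the path of the Conda Python interpreter
+   conda run -n ai_base which python
+
+   # Create the uv virtual environment on top of that interpreter
+   uv venv --python "$(conda run -n ai_base which python)"
+
+   # Activate (run only the line for your OS) and install dependencies
+   source .venv/bin/activate   # Linux/macOS
+   .venv\Scripts\activate      # Windows
+   uv pip install -e ".[dev]"
+   ```
+
+5. **pyproject.toml template**
+
+   ```toml
+   [project]
+   name = "{paper_name}"
+   version = "0.1.0"
+   requires-python = ">=3.10"
+   dependencies = [
+       "torch>=2.0.0",
+       "numpy>=1.24.0",
+       "matplotlib>=3.7.0",
+   ]
+
+   [project.optional-dependencies]
+   # Test and lint tools; installed with: uv pip install -e ".[dev]"
+   dev = ["pytest>=7.0.0", "black", "ruff"]
+   ```
+
+---
+
+## 6. Commands Specifications
+
+### 6.1 /replicate
+
+**Purpose**: Start the paper replication workflow with a single command
+
+**Usage**:
+
+```
+/replicate path/to/paper.md
+```
+
+**Behavior**:
+
+1. Verify that the input file exists
+2. Derive the paper name from the file name
+3. Create the `workspace/{paper_name}/` directory structure
+4. Switch to the paper-director primary agent
+5. Start the full replication workflow
+
+### 6.2 /verify
+
+**Purpose**: Verify previously generated replication code
+
+**Usage**:
+
+```
+/verify workspace/{paper_name}/
+```
+
+**Behavior**:
+
+1. Check the workspace directory structure
+2. Run all tests
+3. Compare against the paper's results
+4. Generate or update the verification report
+
+---
+
+## 7. 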
Data Flow
+
+### 7.1 Analysis Phase Data Flow
+
+```
+paper.md
+    │
+    ├──────────────────────────────────┐
+    │                                  │
+    ▼                                  ▼
+paper-image-extractor   (waits for image understanding)
+    │                                  │
+    └─> image_understanding.md ────────┤
+                                       ▼
+                                paper-analyzer
+                                       │
+                                       ├─> paper_structure.md
+                                       └─> replication_plan.md
+```
+
+### 7.2 Code Generation Phase Data Flow
+
+```
+paper_structure.md + image_understanding.md
+                     │
+                     ▼
+              paper-director
+          (generates test cases)
+                     │
+                     ▼
+                code-writer
+                     │
+    ┌────────────────┼────────────────┐
+    ▼                ▼                ▼
+src/models/    src/training/     src/utils/
+    │                │                │
+    └────────────────┴────────────────┘
+                     │
+                     ▼
+              Run tests (TDD)
+                     │
+            ┌────────┴────────┐
+            ▼                 ▼
+          Pass              Fail → back to code-writer to fix
+            │
+            ▼
+     docs/README.md
+```
+
+### 7.3 Verification Phase Data Flow
+
+```
+        src/* + tests/* + replication_plan.md
+                          │
+                          ▼
+                     test-runner
+                          │
+      ┌───────────────────┼───────────────────┐
+      ▼                   ▼                   ▼
+  Run tests      Compare with paper  Analyze differences
+      │                   │                   │
+      └───────────────────┴───────────────────┘
+                          │
+                          ▼
+                replication_report.md
+```
+
+---
+
+## 8. Error Handling
+
+### 8.1 Analysis Phase Errors
+
+| Error type | Handling |
+| ---------- | -------- |
+| Paper file does not exist | Ask the user to check the path |
+| Image cannot be accessed | Mark it as unparseable and continue with the other images |
+| Incomplete analysis results | Present them at the human checkpoint and ask the user to fill in the gaps |
+
+### 8.2 Code Generation Phase Errors
+
+| Error type | Handling |
+| ---------- | -------- |
+| Test failure | Iterate on fixes, at most 3 times; if still failing, flag the module for human intervention and continue with the other modules |
+| Missing dependency | Prompt the user to install it |
+| Syntax error | Fix automatically |
+
+### 8.3 Verification Phase Errors
+
+| Error type | Handling |
+| ---------- | -------- |
+| Test run fails | Record the error and note it in the report |
+| Large discrepancy from the paper's results | Analyze the cause and explain it in the report |
+
+---
+
+## 9. Configuration
+
+### 9.1 opencode.json
+
+```json
+{
+  "$schema": "https://opencode.ai/config.json",
+  "default_agent": "paper-director",
+  "agent": {
+    "paper-director": {
+      "mode": "primary"
+    }
+  }
+}
+```
+
+---
+
+## 10. Future Enhancements
+
+1. **Support more input formats**: direct PDF parsing, arXiv URLs
+2. **Support more frameworks**: TensorFlow, JAX
+3. **Automatic dataset preparation**: automatically download and preprocess common datasets
+4. **Result visualization**: automatically generate comparison charts
+5. 
**Incremental replication**: support partial replication and resuming an interrupted run
+
+---
+
+## Appendix A: Glossary
+
+| Term | Description |
+| ---- | ----------- |
+| Primary Agent | The main agent that the user interacts with directly |
+| Subagent | A child agent dispatched by the primary agent to perform a specific task |
+| Skill | Domain-specific guidance provided to an agent |
+| TDD | Test-driven development: write the tests first, then the code |
+| Workspace | The directory that holds the replication results for each paper |
+
+---
+
+## Appendix B: References
+
+- OpenCode Agents Documentation: https://opencode.ai/docs/zh-cn/agents/
+- Superpowers Skills: https://github.com/obra/superpowers
+- PyTorch Documentation: https://pytorch.org/docs/
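The `/replicate` behavior in section 6.1 (verify the input file, derive the paper name from the file name, create the `workspace/{paper_name}/` tree laid out in section 2.2) could be sketched in Python roughly as follows. This is a minimal illustration, not part of the spec: `scaffold_workspace` and `WORKSPACE_SUBDIRS` are hypothetical names, the sketch assumes the paper name is simply the file stem, and it leaves `.venv/` and `pyproject.toml` to the environment-management step.

```python
from pathlib import Path

# Per-paper subdirectories, following the workspace layout in section 2.2.
# .venv/ and pyproject.toml are left to the environment-management step.
WORKSPACE_SUBDIRS = (
    "analysis",
    "src/models",
    "src/training",
    "src/utils",
    "tests",
    "docs",
    "reports",
)


def scaffold_workspace(paper_path: str, workspace_root: str = "workspace") -> Path:
    """Hypothetical helper mirroring /replicate steps 1-3."""
    paper = Path(paper_path)
    # Step 1: verify that the input file exists.
    if not paper.is_file():
        raise FileNotFoundError(f"Paper file not found: {paper}")
    # Step 2: derive the paper name from the file name ("resnet.md" -> "resnet").
    paper_name = paper.stem
    # Step 3: create workspace/{paper_name}/ and its subdirectories;
    # exist_ok=True makes a rerun on the same paper harmless.
    root = Path(workspace_root) / paper_name
    for sub in WORKSPACE_SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```

Keeping the step idempotent means paper-director can call it again after an interrupted run without deleting earlier analysis files.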
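Section 8.2 caps test-failure fixes at three rounds before a module is flagged for human intervention and the workflow moves on to the other modules. A minimal Python sketch of that policy follows, assuming two hypothetical hooks that the spec does not define: `run_tests` (for example, a pytest run over one module) and `request_fix` (for example, re-dispatching @code-writer with the failure output).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Fix-round limit taken from the error-handling table in section 8.2.
MAX_FIX_ROUNDS = 3


@dataclass
class ModuleResult:
    name: str          # module under test, e.g. "src/models"
    passed: bool       # did the final test run pass?
    fix_rounds: int    # how many fix requests were issued
    needs_human: bool  # True when the module was flagged for a human


def build_with_retries(
    name: str,
    run_tests: Callable[[], bool],
    request_fix: Callable[[int], None],
    max_rounds: int = MAX_FIX_ROUNDS,
) -> ModuleResult:
    """Run tests; on failure request a fix and retest, at most max_rounds times."""
    rounds = 0
    while True:
        if run_tests():
            return ModuleResult(name, True, rounds, False)
        if rounds >= max_rounds:
            # Still failing after the last allowed fix: flag it so the
            # caller can continue with the remaining modules.
            return ModuleResult(name, False, rounds, True)
        rounds += 1
        request_fix(rounds)  # hypothetical hook, e.g. re-dispatch @code-writer


def build_all(
    modules: Dict[str, Tuple[Callable[[], bool], Callable[[int], None]]],
) -> List[ModuleResult]:
    """Process every module in order; a flagged module never stops the others."""
    return [build_with_retries(name, run, fix) for name, (run, fix) in modules.items()]
```

With this shape, the `needs_human` entries can be listed directly in `replication_report.md` instead of aborting the whole run.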