---
name: code-generation
description: Use when generating PyTorch code from paper analysis to ensure correct mapping from paper to code
---

# Code Generation from Papers

## Overview

Guidelines for translating paper descriptions into working PyTorch code.

**Announce at start:** "I'm using the code-generation skill to ensure accurate paper-to-code translation."

## Core Principles

1. **Traceability**: Every code block cites the paper section or equation it implements
2. **Testability**: Write code that can be unit tested
3. **Readability**: Prefer clarity over cleverness
4. **Modularity**: One component per file
5. **Independence**: Base the code logic on the paper's methodology, NOT reverse-engineer it from expected outputs

## Critical: Result Independence

The code must implement the **paper's described method**, not be reverse-engineered to match reference values.

### DO NOT:
```python
# WRONG: Using values from reference_plots.py as targets
expected_accuracy = 0.952  # Copied from paper figure
assert abs(accuracy - expected_accuracy) < 0.01  # This defeats the purpose
```

### DO:
```python
# CORRECT: Implement the method, let results be what they are
# Paper Section 4.1: "We use Adam with lr=1e-4"
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Run training, record actual results
accuracy = evaluate(model, test_loader)
# This accuracy is authoritative - compare with paper in report
```

### Reference Values Are For Comparison Only

Values from `image_understanding.md` and `reference_plots.py` should:
- Be used in the **final report** for comparison
- **NOT** be used as assertion targets in tests
- **NOT** influence implementation decisions
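A small helper can keep reference values in the report path and out of the test path. The sketch below is one way to do it; `ComparisonRow`, `build_comparison`, and `format_report` are hypothetical names, and any metric values you pass in come from your own run and from the paper, never from each other.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ComparisonRow:
    """One line of the final report: our measured value next to the paper's."""
    metric: str
    measured: float
    reported: float

    @property
    def diff(self) -> float:
        # Signed gap: negative means our run came in below the paper
        return self.measured - self.reported


def build_comparison(measured: Dict[str, float], reported: Dict[str, float]) -> List[ComparisonRow]:
    """Pair measured metrics with reference values, for reporting only.

    Nothing here asserts on the gap: a difference is a finding to discuss,
    not a test failure. Metrics missing on either side are skipped.
    """
    shared = sorted(measured.keys() & reported.keys())
    return [ComparisonRow(m, measured[m], reported[m]) for m in shared]


def format_report(rows: List[ComparisonRow]) -> str:
    """Render the rows as a Markdown table for the final report."""
    lines = ["| Metric | Ours | Paper | Diff |", "|---|---|---|---|"]
    for r in rows:
        lines.append(f"| {r.metric} | {r.measured:.4f} | {r.reported:.4f} | {r.diff:+.4f} |")
    return "\n".join(lines)
```

For example, `format_report(build_comparison({"accuracy": 0.941}, {"accuracy": 0.952}))` (illustrative numbers) produces a row showing a `-0.0110` gap, which the report then explains instead of a test failing on it.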

## Paper-to-Code Mapping

### Architecture Diagrams → nn.Module

| Diagram Element | PyTorch Equivalent |
|-----------------|-------------------|
| Box/Block | nn.Module subclass |
| Arrow | forward() call chain |
| Split | Multiple outputs / tuple |
| Merge | torch.cat / torch.add |
| Skip connection | Residual addition |
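As a sketch of how these mappings combine, the module below encodes a hypothetical diagram: the input is split in two, each half passes through its own box, the halves are concatenated, and a skip connection adds the input back. The branch layers and the name `TwoBranchBlock` are illustrative assumptions, not taken from any paper.

```python
import torch
import torch.nn as nn


class TwoBranchBlock(nn.Module):
    """Hypothetical diagram: input -> [Split] -> branch A / branch B -> [Merge] -> (+ skip)."""

    def __init__(self, dim: int):
        super().__init__()
        assert dim % 2 == 0, "dim must be even so the input splits into equal halves"
        half = dim // 2
        # Box/Block: each branch is its own module
        self.branch_a = nn.Sequential(nn.Linear(half, half), nn.ReLU())
        self.branch_b = nn.Linear(half, half)
        # Box after the merge: project the concatenated features back to `dim`
        self.fuse = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split: one tensor in, two tensors out
        a, b = torch.chunk(x, 2, dim=-1)
        # Arrows: the order of calls in forward() is the data flow
        a = self.branch_a(a)
        b = self.branch_b(b)
        # Merge: concatenate along the feature axis
        merged = torch.cat([a, b], dim=-1)
        # Skip connection: residual addition (shapes must match)
        return self.fuse(merged) + x
```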

### Equations → Tensor Operations

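| Notation | PyTorch |
|----------|---------|
| $Wx + b$ | `nn.Linear(in, out)` (computes `x @ W.T + b` on row vectors; `W` has shape `(out, in)`) |
| $\sigma(x)$ | `torch.sigmoid(x)` or `nn.Sigmoid()` |
| $\text{softmax}(x)$ | `F.softmax(x, dim=-1)` (set `dim` to the axis being normalized) |
| $\|x\|_2$ | `torch.linalg.vector_norm(x, ord=2, dim=-1)` (per vector; omit `dim` for the whole tensor) |
| $x \odot y$ | `x * y` (element-wise) |
| $x^T y$ | `torch.dot(x, y)` for 1-D vectors; `x.T @ y` for 2-D matrices |
| $\sum_i$ | `torch.sum(x, dim=d)`, where `d` is the axis that $i$ indexes |
| $\mathbb{E}[x]$ | `torch.mean(x, dim=d)` (sample mean over the batch or sample axis `d`) |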
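Several of these mappings hide conventions (weight layout, which axis is reduced) that are easy to get wrong. The checks below are a minimal sketch on toy tensors: they confirm the row-vector convention of `nn.Linear` and show how the reduction axis follows from the index in the notation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Wx + b: nn.Linear stores W as (out, in) and computes x @ W.T + b on row vectors
linear = nn.Linear(4, 3)
x = torch.randn(2, 4)
manual = x @ linear.weight.T + linear.bias
linear_matches = torch.allclose(linear(x), manual, atol=1e-6)

# softmax over the last axis: each row sums to 1
probs = F.softmax(torch.randn(2, 5), dim=-1)
rows_sum_to_one = torch.allclose(probs.sum(dim=-1), torch.ones(2))

# ||x||_2 per vector (dim=-1), not over the whole tensor
v = torch.tensor([[3.0, 4.0], [6.0, 8.0]])
norms = torch.linalg.vector_norm(v, ord=2, dim=-1)  # 5 and 10

# x^T y for 1-D vectors is a dot product
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
dot = torch.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32

# sum_i over the axis that i indexes; here i indexes columns, so dim=1
m = torch.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
row_sums = m.sum(dim=1)  # 3 and 12
```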

### Loss Functions

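| Paper Description | PyTorch | Expected input |
|-------------------|---------|----------------|
| Cross-entropy | `nn.CrossEntropyLoss()` | Raw logits (applies `log_softmax` internally) and class indices |
| MSE / L2 | `nn.MSELoss()` | Predictions and targets of the same shape; use `reduction="sum"` if the paper sums instead of averaging |
| L1 | `nn.L1Loss()` | Predictions and targets of the same shape |
| BCE | `nn.BCEWithLogitsLoss()` | Raw logits; use `nn.BCELoss()` only if the model already outputs probabilities |
| KL divergence | `nn.KLDivLoss(reduction="batchmean")` | Log-probabilities as input, probabilities as target |
| Custom | Subclass `nn.Module` or write a function | Whatever the paper's equation defines |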
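Most mapping errors with these losses come from passing the wrong kind of input. The sketch below uses random toy tensors to check each built-in loss against its written-out formula; if a paper's equation matches the manual form, the built-in call should give the same number.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw scores, no softmax applied
targets = torch.tensor([0, 2, 1, 2])  # class indices

# CrossEntropyLoss = log_softmax + NLL, so it must receive raw logits
ce = nn.CrossEntropyLoss()(logits, targets)
ce_manual = F.nll_loss(F.log_softmax(logits, dim=-1), targets)

# KL(P || Q): KLDivLoss takes log Q as input and P as target;
# reduction="batchmean" sums over classes and averages over the batch
p = F.softmax(torch.randn(4, 3), dim=-1)          # target distribution P
log_q = F.log_softmax(torch.randn(4, 3), dim=-1)  # model distribution Q, in log space
kl = nn.KLDivLoss(reduction="batchmean")(log_q, p)
kl_manual = (p * (p.log() - log_q)).sum(dim=-1).mean()

# BCEWithLogitsLoss expects logits; BCELoss expects probabilities
bin_logits = torch.randn(4)
bin_targets = torch.tensor([0.0, 1.0, 1.0, 0.0])
bce_logits = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)
bce_probs = nn.BCELoss()(torch.sigmoid(bin_logits), bin_targets)
```

Feeding probabilities into `nn.CrossEntropyLoss` or `nn.KLDivLoss` does not raise an error; it silently computes a different quantity, which is why these checks are worth running once per project.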

## Code Structure Template

```python
"""
{component_name}.py

Implements {what} from "{paper_title}" ({year})

Paper Reference:
- Section: {section_number}
- Equation: ({equation_number})
- Figure: {figure_number}

Author: Auto-generated for paper replication
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional, Tuple, List


class {ComponentName}(nn.Module):
    """
    {One-line description}

    From paper: "{exact quote or paraphrase}"

    Args:
        {param1}: {description} (paper: {where specified})
        {param2}: {description}

    Shape:
        - Input: {shape description}
        - Output: {shape description}

    Example:
        >>> layer = {ComponentName}(dim=512)
        >>> x = torch.randn(32, 100, 512)
        >>> out = layer(x)
        >>> out.shape
        torch.Size([32, 100, 512])
    """

    def __init__(
        self,
        {param1}: {type},
        {param2}: {type} = {default},
    ):
        super().__init__()

        # Paper Section X.Y: "{description}"
        self.layer1 = nn.Linear(...)

        # Equation (N): ...
        self.layer2 = nn.LayerNorm(...)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass implementing Equation (N).

        Args:
            x: Input tensor of shape (batch, seq, dim)

        Returns:
            Output tensor of shape (batch, seq, dim)
        """
        # Step 1: ... (Eq. N, first term)
        h = self.layer1(x)

        # Step 2: ... (Eq. N, second term)
        out = self.layer2(h)

        return out
```
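For illustration, here is the template filled in for one concrete component: the position-wise feed-forward network from "Attention Is All You Need" (Vaswani et al., 2017), Section 3.3, Equation (2), $\text{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$. It is a sketch: it omits dropout and the surrounding residual and normalization, which the paper applies outside this equation, and the class name is our own choice.

```python
"""
feed_forward.py

Implements the position-wise feed-forward network from
"Attention Is All You Need" (2017)

Paper Reference:
- Section: 3.3
- Equation: (2)
- Figure: 1
"""

import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionwiseFeedForward(nn.Module):
    """
    Two linear layers with a ReLU in between, applied to each position separately.

    From paper: FFN(x) = max(0, xW1 + b1)W2 + b2

    Args:
        d_model: Input/output dimension (paper: d_model = 512, Section 3.3)
        d_ff: Inner-layer dimension (paper: d_ff = 2048, Section 3.3)

    Shape:
        - Input: (batch, seq, d_model)
        - Output: (batch, seq, d_model)

    Example:
        >>> layer = PositionwiseFeedForward(d_model=512, d_ff=2048)
        >>> x = torch.randn(32, 100, 512)
        >>> layer(x).shape
        torch.Size([32, 100, 512])
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # Eq. (2): xW1 + b1
        self.w1 = nn.Linear(d_model, d_ff)
        # Eq. (2): (...)W2 + b2
        self.w2 = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Step 1: max(0, xW1 + b1) (Eq. 2, inner term)
        h = F.relu(self.w1(x))
        # Step 2: project back to d_model (Eq. 2, outer term)
        return self.w2(h)
```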

## Common Patterns

### Residual Connection

```python
# Paper: "We add a residual connection"
out = self.sublayer(x) + x
```
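A residual add only makes sense when the sublayer preserves the shape, and broadcasting can hide a mismatch (for example, adding a `(batch, 1)` output to a `(batch, dim)` input). A minimal wrapper that fails loudly instead, using a hypothetical `Residual` helper class:

```python
import torch
import torch.nn as nn


class Residual(nn.Module):
    """Wraps any shape-preserving sublayer as x + sublayer(x)."""

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.sublayer(x)
        # A plain `out + x` can silently broadcast mismatched shapes; refuse instead
        if out.shape != x.shape:
            raise ValueError(f"sublayer changed shape {tuple(x.shape)} -> {tuple(out.shape)}")
        return out + x
```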

### Layer Normalization

```python
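# self.attention: any sublayer mapping (batch, seq, dim) -> (batch, seq, dim)

# Paper: "Pre-LN Transformer" (normalize the sublayer input; the residual path stays un-normalized)
x = x + self.attention(self.norm(x))

# Paper: "Post-LN Transformer" (normalize after the residual addition)
x = self.norm(x + self.attention(x))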
```
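To keep the norm placement explicit in one place, one option is a small wrapper with a flag. This is a sketch; `NormResidual` and `pre_norm` are hypothetical names, and the sublayer is assumed to preserve the input shape.

```python
import torch
import torch.nn as nn


class NormResidual(nn.Module):
    """Residual sublayer with LayerNorm before (Pre-LN) or after (Post-LN) the add."""

    def __init__(self, dim: int, sublayer: nn.Module, pre_norm: bool):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(dim)
        self.pre_norm = pre_norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pre_norm:
            # Pre-LN: x + Sublayer(LN(x)); the residual path is never normalized
            return x + self.sublayer(self.norm(x))
        # Post-LN: LN(x + Sublayer(x))
        return self.norm(x + self.sublayer(x))
```

Set `pre_norm` from whatever the paper states; if it does not say, treat the choice as an assumption under "Handling Ambiguity".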

### Multi-Head Attention

```python
# Paper: "Standard multi-head attention with h heads"
self.attention = nn.MultiheadAttention(
    embed_dim=d_model,
    num_heads=h,
    dropout=dropout,
    batch_first=True,
)
```
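Note that `nn.MultiheadAttention` takes separate query, key, and value arguments and returns a tuple, so it cannot be dropped into a `self.attention(x)` call unchanged. A minimal self-attention call, with illustrative sizes that are not from any paper (`embed_dim` must be divisible by `num_heads`):

```python
import torch
import torch.nn as nn

d_model, h, dropout = 64, 8, 0.1  # illustrative values only

attention = nn.MultiheadAttention(
    embed_dim=d_model,
    num_heads=h,
    dropout=dropout,
    batch_first=True,
)

x = torch.randn(2, 10, d_model)  # (batch, seq, d_model) because batch_first=True

# Self-attention: query, key and value are all x; the module returns (output, weights)
out, weights = attention(x, x, x, need_weights=False)
# out has shape (2, 10, d_model); weights is None because need_weights=False
```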

### Custom Activation

```python
# Paper: "We use GELU activation"
x = F.gelu(x)

# Paper: "We use Swish/SiLU activation" (SiLU is Swish with beta = 1; a learnable beta needs a custom module)
x = F.silu(x)
```
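Because papers sometimes use a variant (for example, the tanh approximation of GELU), it is worth checking the activation against the formula the paper actually writes down. A sketch over a range of inputs:

```python
import math

import torch
import torch.nn.functional as F

x = torch.linspace(-4.0, 4.0, steps=101)

# SiLU / Swish with beta = 1: x * sigmoid(x)
silu_matches = torch.allclose(F.silu(x), x * torch.sigmoid(x), atol=1e-6)

# Exact GELU: x * Phi(x), where Phi is the standard normal CDF
gelu_exact = x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
gelu_matches = torch.allclose(F.gelu(x), gelu_exact, atol=1e-6)

# Tanh approximation of GELU: close to, but not identical to, the exact form
gelu_tanh = F.gelu(x, approximate="tanh")
max_gap = (gelu_tanh - F.gelu(x)).abs().max().item()
```

If the paper writes the tanh form, `F.gelu(x, approximate="tanh")` is the matching call; the gap between the two variants is small but may still matter for exact replication.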

## Handling Ambiguity

When the paper is unclear:

1. **Check the authors' code repository**, if one is available
2. **Follow common practice** for the architecture type
3. **Document the assumption** in a code comment
4. **Add a TODO** for verification

```python
# TODO: Paper unclear on initialization. Using PyTorch default.
# See: https://github.com/paper/repo for reference implementation
self.linear = nn.Linear(in_dim, out_dim)
```
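Beyond a TODO, one option is to expose the ambiguous choice as a constructor argument, so the assumption is visible in the config and can be stated in the report. The sketch below uses a hypothetical `ProjectionLayer` and treats Xavier-uniform as one example alternative; it is not a recommendation for any particular paper, and the choice should not be tuned to match reference values.

```python
import torch
import torch.nn as nn


class ProjectionLayer(nn.Module):
    """Hypothetical layer whose initialization the paper does not specify."""

    def __init__(self, in_dim: int, out_dim: int, init: str = "pytorch_default"):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # ASSUMPTION: paper is silent on initialization.
        # "pytorch_default" keeps nn.Linear's built-in init; "xavier" is one
        # common alternative. Whichever is used must be stated in the report.
        if init == "xavier":
            nn.init.xavier_uniform_(self.linear.weight)
            nn.init.zeros_(self.linear.bias)
        elif init != "pytorch_default":
            raise ValueError(f"unknown init scheme: {init!r}")
        # Recorded so the report can state which assumption was used
        self.init = init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)
```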

## Verification Checklist

Before completing a module:

- [ ] All equations implemented
- [ ] Shapes documented and verified
- [ ] Paper references in comments
- [ ] Type hints complete
- [ ] Example in docstring works
- [ ] No hardcoded dimensions (use params)
- [ ] Gradient flow verified (no in-place ops breaking autograd)
- [ ] **No reference values hardcoded as expected outputs**
- [ ] **Implementation based on paper method, not reverse-engineered from results**
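Several checklist items can be automated with a smoke test. The sketch below (`check_module` is a hypothetical helper) checks output shape, finite values, and that every trainable parameter receives a gradient; it intentionally asserts nothing about paper metrics.

```python
import torch
import torch.nn as nn


def check_module(module: nn.Module, example_input: torch.Tensor, expected_shape: tuple) -> None:
    """Structural smoke test: output shape, finite values, gradient flow.

    Compares against no paper metric by design.
    """
    module.train()
    out = module(example_input)
    assert out.shape == torch.Size(expected_shape), f"shape {tuple(out.shape)} != {expected_shape}"
    assert torch.isfinite(out).all(), "non-finite values in output"

    # backward() raises if an in-place op modified a tensor autograd still needs
    out.sum().backward()
    for name, p in module.named_parameters():
        if p.requires_grad:
            assert p.grad is not None, f"no gradient reached {name}"
            assert torch.isfinite(p.grad).all(), f"non-finite gradient in {name}"
```

Parameters that are legitimately unused for a given input (for example, a conditional branch) will fail the gradient check and need to be excluded by hand. Docstring examples can be run with `python -m doctest -v <file>.py`.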