PaperTool/.opencode/skills/code-generation/SKILL.md
hc 5d5aee1f83 refactor: improve verification workflow with visual comparison
Major changes:
- paper-image-extractor: Generate reference_plots.py for visual verification
- paper-director: Add image understanding checkpoint with side-by-side comparison
- paper-analyzer: Add data source labeling with reliability levels
- code-writer: Change from TDD to VDD (Verification-Driven Development)
- test-runner: Generate comparison reports with images and explanations
- verification skill: Add difference classification system
- code-generation skill: Emphasize result independence

Key principles:
- Code results are authoritative, paper values are references
- Differences are expected and documented, not bugs to fix
- Visual comparison prioritized over exact numerical match
- Tests verify sanity (shape, gradient, range), not exact values
2026-03-31 19:55:36 +08:00

| name | description |
| --- | --- |
| code-generation | Use when generating PyTorch code from paper analysis to ensure correct mapping from paper to code |

Code Generation from Papers

Overview

Guidelines for translating paper descriptions into working PyTorch code.

Announce at start: "I'm using the code-generation skill to ensure accurate paper-to-code translation."

Core Principles

  1. Traceability: Every code block should reference a paper section or equation
  2. Testability: Write code that can be unit tested
  3. Readability: Prefer clarity over cleverness
  4. Modularity: One component per file
  5. Independence: Code logic based on paper methodology, NOT reverse-engineered from expected outputs

Critical: Result Independence

The code must implement the paper's described method, not be reverse-engineered to match reference values.

DO NOT:

# WRONG: Using values from reference_plots.py as targets
expected_accuracy = 0.952  # Copied from paper figure
assert abs(accuracy - expected_accuracy) < 0.01  # This defeats the purpose

DO:

# CORRECT: Implement the method, let results be what they are
# Paper Section 4.1: "We use Adam with lr=1e-4"
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Run training, record actual results
accuracy = evaluate(model, test_loader)
# This accuracy is authoritative - compare with paper in report

Reference Values Are For Comparison Only

Values from image_understanding.md and reference_plots.py should:

  • Be used in the final report for comparison
  • NOT be used as assertion targets in tests
  • NOT influence implementation decisions
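One way to keep reference values out of the tests while still reporting them is sketched below. The `Comparison` class, the `sanity_check_accuracy` helper, and the numbers are hypothetical and only illustrate the split between a range check and a report line; the sketch assumes the paper's value is nonzero.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One measured result next to the value reported in the paper."""
    metric: str
    measured: float         # produced by running the implementation
    paper_reference: float  # read from the paper; used for comparison only

    @property
    def relative_difference(self) -> float:
        # Signed difference relative to the paper's value (assumed nonzero).
        return (self.measured - self.paper_reference) / abs(self.paper_reference)

    def report_line(self) -> str:
        return (
            f"{self.metric}: measured={self.measured:.4f}, "
            f"paper={self.paper_reference:.4f}, "
            f"diff={self.relative_difference:+.2%}"
        )


def sanity_check_accuracy(accuracy: float) -> None:
    # Tests assert plausibility only (a valid range), never the paper's number.
    assert 0.0 <= accuracy <= 1.0, f"accuracy out of range: {accuracy}"


# Hypothetical numbers for illustration only.
row = Comparison(metric="accuracy", measured=0.90, paper_reference=0.95)
sanity_check_accuracy(row.measured)
print(row.report_line())
```

The difference appears in the report as a documented observation; nothing in the test suite depends on it.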

Paper-to-Code Mapping

Architecture Diagrams → nn.Module

| Diagram Element | PyTorch Equivalent |
| --- | --- |
| Box/Block | `nn.Module` subclass |
| Arrow | `forward()` call chain |
| Split | Multiple outputs / tuple |
| Merge | `torch.cat` / `torch.add` |
| Skip connection | Residual addition |
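As an illustration of the diagram mapping, the sketch below turns an imagined diagram with a split, a merge, and a skip connection into one module. `TwoBranchBlock` and its layer names are hypothetical and do not come from any particular paper.

```python
import torch
import torch.nn as nn


class TwoBranchBlock(nn.Module):
    """Hypothetical block: split into two branches, merge, add a skip connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.branch_a = nn.Linear(dim, dim)     # box on the first path
        self.branch_b = nn.Linear(dim, dim)     # box on the second path
        self.project = nn.Linear(2 * dim, dim)  # box after the merge

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split: the same input feeds both branches.
        a = self.branch_a(x)
        b = self.branch_b(x)
        # Merge: concatenate along the feature dimension.
        merged = torch.cat([a, b], dim=-1)
        # Skip connection: residual addition with the block input.
        return self.project(merged) + x


block = TwoBranchBlock(dim=16)
out = block(torch.randn(4, 10, 16))
print(out.shape)
```

Concatenation doubles the feature width, which is why the layer after the merge takes `2 * dim` inputs; a merge drawn as addition would keep the width at `dim`.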

Equations → Tensor Operations

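| Notation | PyTorch |
| --- | --- |
| Wx + b | `nn.Linear(in_features, out_features)` |
| \sigma(x) | `torch.sigmoid(x)` or `nn.Sigmoid()` |
| \text{softmax}(x) | `F.softmax(x, dim=-1)` |
| \lVert x \rVert_2 | `torch.linalg.vector_norm(x, ord=2)` (`torch.norm` is deprecated) |
| x \odot y | `x * y` (element-wise) |
| x^T y | `torch.dot(x, y)` for 1-D vectors; `x.T @ y` for 2-D column vectors |
| \sum_i | `torch.sum(x, dim=d)`, where `d` is the tensor axis that index i runs over |
| \mathbb{E}[x] | `torch.mean(x)` (sample mean; pass `dim` to average over one axis only) |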
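The short sketch below applies several of these mappings to small random tensors. The variable names are illustrative; the main point is that a summation index in the paper corresponds to a tensor axis, not to a literal `dim` value.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5)
y = torch.randn(5)

inner = torch.dot(x, y)                  # x^T y for 1-D vectors
hadamard = x * y                         # x \odot y
l2 = torch.linalg.vector_norm(x, ord=2)  # \|x\|_2
probs = F.softmax(x, dim=-1)             # softmax(x)

# For a matrix m[i, j], the index being summed picks the axis.
m = torch.randn(3, 4)
sum_over_j = m.sum(dim=1)  # \sum_j m_{ij}, shape (3,)
sum_over_i = m.sum(dim=0)  # \sum_i m_{ij}, shape (4,)

print(inner.shape, l2.shape, sum_over_j.shape, sum_over_i.shape)
```

Checking the result shape after each reduction is a cheap way to confirm the right axis was chosen.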

Loss Functions

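| Paper Description | PyTorch |
| --- | --- |
| Cross-entropy | `nn.CrossEntropyLoss()` (expects raw logits) |
| MSE / L2 | `nn.MSELoss()` |
| L1 | `nn.L1Loss()` |
| BCE | `nn.BCEWithLogitsLoss()` on raw logits; `nn.BCELoss()` on probabilities |
| KL divergence | `nn.KLDivLoss(reduction="batchmean")` (input must be log-probabilities) |
| Custom | Subclass `nn.Module` or write a function with `torch.nn.functional` |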
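Two of these losses are easy to misuse because of what they expect as input. The sketch below, on random tensors chosen only for illustration, shows that `nn.BCEWithLogitsLoss` on logits agrees with `nn.BCELoss` on sigmoid outputs, and that `nn.KLDivLoss` takes log-probabilities as its first argument.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

# BCEWithLogitsLoss applies the sigmoid internally; BCELoss expects probabilities.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# KLDivLoss(input, target) computes KL(target || exp(input)):
# input holds log-probabilities, target holds probabilities.
q_log_probs = F.log_softmax(torch.randn(8, 5), dim=-1)
p = F.softmax(torch.randn(8, 5), dim=-1)
kl = nn.KLDivLoss(reduction="batchmean")(q_log_probs, p)

print(loss_from_logits.item(), loss_from_probs.item(), kl.item())
```

If the paper's model ends in a sigmoid, either drop the sigmoid and use the logits variant, or keep it and use `nn.BCELoss`; applying the sigmoid twice is a common silent bug.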

Code Structure Template

"""
{component_name}.py

Implements {what} from "{paper_title}" ({year})

Paper Reference:
- Section: {section_number}
- Equation: ({equation_number})
- Figure: {figure_number}

Author: Auto-generated for paper replication
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Optional, Tuple, List


class {ComponentName}(nn.Module):
    """
    {One-line description}
    
    From paper: "{exact quote or paraphrase}"
    
    Args:
        {param1}: {description} (paper: {where specified})
        {param2}: {description}
    
    Shape:
        - Input: {shape description}
        - Output: {shape description}
    
    Example:
        >>> layer = {ComponentName}(dim=512)
        >>> x = torch.randn(32, 100, 512)
        >>> out = layer(x)
        >>> out.shape
        torch.Size([32, 100, 512])
    """
    
    def __init__(
        self,
        {param1}: {type},
        {param2}: {type} = {default},
    ):
        super().__init__()
        
        # Paper Section X.Y: "{description}"
        self.layer1 = nn.Linear(...)
        
        # Equation (N): ...
        self.layer2 = nn.LayerNorm(...)
        
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass implementing Equation (N).
        
        Args:
            x: Input tensor of shape (batch, seq, dim)
            
        Returns:
            Output tensor of shape (batch, seq, dim)
        """
        # Step 1: ... (Eq. N, first term)
        h = self.layer1(x)
        
        # Step 2: ... (Eq. N, second term)
        out = self.layer2(h)
        
        return out

Common Patterns

Residual Connection

# Paper: "We add a residual connection"
out = self.sublayer(x) + x

Layer Normalization

# Paper: "Pre-LN Transformer" (normalize inside the residual branch)
x = x + self.attention(self.norm(x))

# Paper: "Post-LN Transformer" (normalize after the residual addition)
x = x + self.attention(x)
x = self.norm(x)

Multi-Head Attention

# Paper: "Standard multi-head attention with h heads"
self.attention = nn.MultiheadAttention(
    embed_dim=d_model,
    num_heads=h,
    dropout=dropout,
    batch_first=True,
)
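`nn.MultiheadAttention` takes query, key, and value separately and returns an `(output, weights)` tuple, so it usually needs a thin wrapper before it can be called as a single-argument sublayer. The sketch below wraps it as a Pre-LN self-attention sublayer; the class name `PreLNSelfAttention` is hypothetical.

```python
import torch
import torch.nn as nn


class PreLNSelfAttention(nn.Module):
    """Hypothetical Pre-LN self-attention sublayer with a residual connection."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attention = nn.MultiheadAttention(
            embed_dim=d_model,
            num_heads=num_heads,
            dropout=dropout,
            batch_first=True,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        # Self-attention: query, key, and value are all the normalized input.
        # The module returns (output, attention_weights); weights are skipped here.
        attn_out, _ = self.attention(h, h, h, need_weights=False)
        return x + attn_out


layer = PreLNSelfAttention(d_model=32, num_heads=4)
out = layer(torch.randn(2, 7, 32))
print(out.shape)
```

With `batch_first=True` the input layout is `(batch, seq, dim)`; `embed_dim` must be divisible by `num_heads`.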

Custom Activation

# Paper: "We use GELU activation"
x = F.gelu(x)

# Paper: "We use Swish/SiLU activation"
x = F.silu(x)
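When a paper writes an activation out as a formula instead of naming it, comparing the formula against the built-in is a quick check. The sketch below does this for SiLU (Swish with beta = 1) and the exact, erf-based GELU; the sample points are arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)

# SiLU (Swish with beta = 1): x * sigmoid(x)
silu_manual = x * torch.sigmoid(x)

# Exact GELU: x * Phi(x), with Phi the standard normal CDF
gelu_manual = 0.5 * x * (1.0 + torch.erf(x / 2.0 ** 0.5))

print(torch.allclose(silu_manual, F.silu(x), atol=1e-6))
print(torch.allclose(gelu_manual, F.gelu(x), atol=1e-6))
```

If the paper specifies the tanh approximation of GELU, `F.gelu(x, approximate="tanh")` is the matching call in recent PyTorch versions.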

Handling Ambiguity

When the paper is unclear:

  1. Check the official code repository, if available
  2. Follow common practice for the architecture type
  3. Document the assumption in a code comment
  4. Add a TODO for verification

# TODO: Paper unclear on initialization. Using PyTorch default.
# See: https://github.com/paper/repo for reference implementation
self.linear = nn.Linear(in_dim, out_dim)

Verification Checklist

Before completing a module:

  • All equations implemented
  • Shapes documented and verified
  • Paper references in comments
  • Type hints complete
  • Example in docstring works
  • No hardcoded dimensions (use params)
  • Gradient flow verified (no in-place ops breaking autograd)
  • No reference values hardcoded as expected outputs
  • Implementation based on paper method, not reverse-engineered from results
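The shape and gradient items on this checklist can be automated with a small helper. The sketch below is one possible form; `check_module` is a hypothetical name, and the `nn.Sequential` model is only a stand-in for a real component.

```python
import torch
import torch.nn as nn


def check_module(module: nn.Module, example_input: torch.Tensor) -> torch.Tensor:
    """Sanity checks: finite output and a finite gradient for every parameter."""
    out = module(example_input)
    assert torch.isfinite(out).all(), "output contains NaN or Inf"

    # Any scalar works as a stand-in loss for checking gradient flow.
    out.sum().backward()
    for name, param in module.named_parameters():
        assert param.grad is not None, f"no gradient reached {name}"
        assert torch.isfinite(param.grad).all(), f"non-finite gradient in {name}"
    return out


model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))
out = check_module(model, torch.randn(4, 16))
print(out.shape)
```

These checks verify sanity (shape, finiteness, gradient flow) and deliberately assert nothing about the numerical values the paper reports.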