- Use English for structural headers (Role, Workflow, Constraints)
- Use Chinese for business logic and detailed explanations
- Consistent formatting across all 6 agents:
- paper-director.md
- paper-analyzer.md
- paper-image-extractor.md
- code-writer.md
- test-runner.md
- result-verifier.md
Major changes:
- paper-image-extractor: Generate reference_plots.py for visual verification
- paper-director: Add image understanding checkpoint with side-by-side comparison
- paper-analyzer: Add data source labeling with reliability levels
- code-writer: Change from TDD to VDD (Verification-Driven Development)
- test-runner: Generate comparison reports with images and explanations
- verification skill: Add difference classification system
- code-generation skill: Emphasize result independence
Key principles:
- Code results are authoritative, paper values are references
- Differences are expected and documented, not bugs to fix
- Visual comparison prioritized over exact numerical match
- Tests verify sanity (shape, gradient, range), not exact values