Commit Graph

11 Commits

Author SHA1 Message Date
hc
6b78dc47fa style(agents): standardize bilingual format for all agent files
- Use English for structural headers (Role, Workflow, Constraints)
- Use Chinese for business logic and detailed explanations
- Consistent formatting across all 6 agents:
  - paper-director.md
  - paper-analyzer.md
  - paper-image-extractor.md
  - code-writer.md
  - test-runner.md
  - result-verifier.md
2026-04-01 00:42:01 +08:00
hc
ced50ea2b0 feat(agent): add result-verifier for blind visual comparison
Root cause: test-runner was giving overly optimistic results due to:
1. Context bias - knew the implementation, tended to defend it
2. No actual visual comparison - just wrote 'ACCEPTABLE' without looking
3. No structural validation - accepted 35x scale differences as 'acceptable'

Solution:
- New result-verifier agent that performs blind visual comparison
- Strict pass/fail criteria for structural validation
- Updated test-runner to use result-verifier for each figure
- Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE

Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues:
- Wrong X-axis variable (channels vs power)
- Wrong Y-axis scale (5x difference)
- Wrong curve count (5 vs 4)
- etc.
2026-03-31 23:56:36 +08:00
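Note on ced50ea2b0: the rule "structural mismatches = FAIL, not ACCEPTABLE" could be sketched roughly as below. This is a minimal illustration, not the agent's actual logic. `FigureSummary`, `verify_structure`, and the `max_scale_ratio` threshold of 2.0 are hypothetical names and values introduced only for this example, and the sample numbers are illustrative rather than taken from Fig3.

```python
from dataclasses import dataclass


@dataclass
class FigureSummary:
    """Hypothetical structural description of one plot."""
    x_variable: str
    curve_count: int
    y_max: float


def verify_structure(reference, candidate, max_scale_ratio=2.0):
    """Return (verdict, issues). Any structural mismatch yields FAIL.

    max_scale_ratio is an assumed threshold, not a value from the repository.
    """
    issues = []
    if reference.x_variable != candidate.x_variable:
        issues.append(
            f"Wrong X-axis variable ({candidate.x_variable} vs {reference.x_variable})"
        )
    if reference.curve_count != candidate.curve_count:
        issues.append(
            f"Wrong curve count ({candidate.curve_count} vs {reference.curve_count})"
        )
    # Compare Y-axis magnitudes as a ratio so the check is symmetric.
    small, large = sorted([abs(reference.y_max), abs(candidate.y_max)])
    if small == 0:
        if large != 0:
            issues.append("Wrong Y-axis scale (one range is zero)")
    elif large / small > max_scale_ratio:
        issues.append(f"Wrong Y-axis scale ({large / small:.1f}x difference)")
    return ("FAIL" if issues else "PASS"), issues


if __name__ == "__main__":
    reference = FigureSummary(x_variable="power", curve_count=4, y_max=1.0)
    candidate = FigureSummary(x_variable="channels", curve_count=5, y_max=5.0)
    verdict, issues = verify_structure(reference, candidate)
    print(verdict)
    for issue in issues:
        print("-", issue)
```

Returning the list of issues alongside the verdict mirrors the commit's test result, where the verifier reported a FAIL together with specific reasons instead of a bare label.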
hc
3533e15995 fix(agent): require explicit image file reading in paper-image-extractor
The subagent was only reading text descriptions about images instead of
actually using the read tool on image files. This caused poor quality
reproductions based on guessed data rather than visual analysis.

Changes:
- Add CRITICAL instruction to use read tool on each image file
- Add Step 4: verification step comparing generated images against the originals
- Add 'Extracting Data from Images' section with specific guidance
- Update guidelines to emphasize visual over textual extraction
- Allow scipy dependency for interpolation
2026-03-31 20:29:04 +08:00
hc
5d5aee1f83 refactor: improve verification workflow with visual comparison
Major changes:
- paper-image-extractor: Generate reference_plots.py for visual verification
- paper-director: Add image understanding checkpoint with side-by-side comparison
- paper-analyzer: Add data source labeling with reliability levels
- code-writer: Change from TDD to VDD (Verification-Driven Development)
- test-runner: Generate comparison reports with images and explanations
- verification skill: Add difference classification system
- code-generation skill: Emphasize result independence

Key principles:
- Code results are authoritative, paper values are references
- Differences are expected and documented, not bugs to fix
- Visual comparison prioritized over exact numerical match
- Tests verify sanity (shape, gradient, range), not exact values
2026-03-31 19:55:36 +08:00
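Note on 5d5aee1f83: the principle "tests verify sanity (shape, gradient, range), not exact values" might look something like the sketch below. It assumes "gradient" means the overall direction of a curve, which the commit does not spell out. `check_sanity` and its parameters are hypothetical, and the sample curve is made-up data.

```python
def check_sanity(values, expected_len, lower, upper, trend=None):
    """Return a list of sanity problems; an empty list means the series is sane.

    trend may be "increasing", "decreasing", or None (assumed reading of
    the commit's "gradient" check).
    """
    problems = []
    # Shape: the series has the expected number of points.
    if len(values) != expected_len:
        problems.append(f"shape: expected {expected_len} points, got {len(values)}")
    # Range: every value lies inside plausible bounds.
    if any(v < lower or v > upper for v in values):
        problems.append(f"range: values outside [{lower}, {upper}]")
    # Gradient: the curve moves in the expected direction.
    diffs = [b - a for a, b in zip(values, values[1:])]
    if trend == "increasing" and any(d < 0 for d in diffs):
        problems.append("gradient: series is not non-decreasing")
    if trend == "decreasing" and any(d > 0 for d in diffs):
        problems.append("gradient: series is not non-increasing")
    return problems


if __name__ == "__main__":
    # Made-up accuracy curve; no exact paper value is asserted anywhere.
    accuracy = [0.52, 0.61, 0.70, 0.74]
    print(check_sanity(accuracy, expected_len=4, lower=0.0, upper=1.0, trend="increasing"))
```

None of the checks compares against a number from the paper, which is consistent with treating paper values as references and code results as authoritative.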
hc
db731f6745 fix(agents): remove invalid 'model: inherit' configuration
OpenCode requires the model to be either set explicitly to a valid ID or omitted, in which case the agent inherits the default model.
2026-03-31 18:08:10 +08:00
hc
f62129f5d4 feat(agents): add test-runner subagent
2026-03-31 17:36:53 +08:00
hc
59c4a4c5ff feat(agents): add code-writer subagent
2026-03-31 17:35:38 +08:00
hc
f6fff84335 feat(agents): add paper-image-extractor subagent
2026-03-31 17:34:16 +08:00
hc
fb926c6fd3 feat(agents): add paper-analyzer subagent
2026-03-31 17:33:06 +08:00
hc
3691b532fc feat(agents): add paper-director primary agent
Orchestrates ML/DL paper replication workflow with human checkpoint.
2026-03-31 17:31:38 +08:00
hc
4801fb2cc2 Initial commit: design spec and implementation plan
- Design spec: docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md
- Implementation plan: docs/superpowers/plans/2026-03-31-paper-replication-agent.md
- Existing agent: .opencode/agents/paper-image-extractor.md
2026-03-31 17:29:53 +08:00