Commit Graph

17 Commits

Author SHA1 Message Date
hc
ced50ea2b0 feat(agent): add result-verifier for blind visual comparison
Root cause: test-runner gave overly optimistic results because of:
1. Context bias - it knew the implementation and tended to defend it
2. No actual visual comparison - it wrote 'ACCEPTABLE' without looking at the images
3. No structural validation - it accepted 35x scale differences as 'acceptable'

Solution:
- New result-verifier agent that performs blind visual comparison
- Strict pass/fail criteria for structural validation
- Updated test-runner to use result-verifier for each figure
- Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE

Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues:
- Wrong X-axis variable (channels vs power)
- Wrong Y-axis scale (5x difference)
- Wrong curve count (5 vs 4)
- four further issues (not listed here)
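
The strict pass/fail rule described in this commit could be sketched roughly as follows. This is a minimal illustration, not the actual result-verifier agent: the `FigureStructure` fields, the `verify_structure` name, the 2x scale tolerance, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureStructure:
    """Structural summary of one figure (hypothetical fields)."""
    x_variable: str
    y_max: float
    curve_count: int


def verify_structure(original, generated, max_scale_ratio=2.0):
    """Return (verdict, issues). Any structural mismatch yields "FAIL"."""
    issues = []

    if original.x_variable != generated.x_variable:
        issues.append(
            f"Wrong X-axis variable ({generated.x_variable} vs {original.x_variable})"
        )

    # Compare axis scales as a ratio so the check is symmetric in both figures.
    low, high = sorted([abs(original.y_max), abs(generated.y_max)])
    if low == 0:
        ratio = 1.0 if high == 0 else float("inf")
    else:
        ratio = high / low
    if ratio > max_scale_ratio:
        issues.append(f"Wrong Y-axis scale ({ratio:.1f}x difference)")

    if original.curve_count != generated.curve_count:
        issues.append(
            f"Wrong curve count ({generated.curve_count} vs {original.curve_count})"
        )

    # There is no "ACCEPTABLE" middle ground: one issue is enough to fail.
    return ("FAIL" if issues else "PASS"), issues


# Illustrative values only, not data from the repository.
original = FigureStructure(x_variable="power", y_max=10.0, curve_count=4)
generated = FigureStructure(x_variable="channels", y_max=50.0, curve_count=5)
verdict, issues = verify_structure(original, generated)
```

Returning the list of issues alongside the verdict keeps the report specific, in the spirit of the per-issue breakdown listed for Fig3.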
2026-03-31 23:56:36 +08:00
hc
3533e15995 fix(agent): require explicit image file reading in paper-image-extractor
The subagent was only reading text descriptions about images instead of
actually using the read tool on image files. This caused poor-quality
reproductions based on guessed data rather than visual analysis.

Changes:
- Add CRITICAL instruction to use read tool on each image file
- Add Step 4: Verification step to compare generated vs original
- Add 'Extracting Data from Images' section with specific guidance
- Update guidelines to emphasize visual over textual extraction
- Allow scipy dependency for interpolation
2026-03-31 20:29:04 +08:00
hc
5d5aee1f83 refactor: improve verification workflow with visual comparison
Major changes:
- paper-image-extractor: Generate reference_plots.py for visual verification
- paper-director: Add image understanding checkpoint with side-by-side comparison
- paper-analyzer: Add data source labeling with reliability levels
- code-writer: Change from TDD to VDD (Verification-Driven Development)
- test-runner: Generate comparison reports with images and explanations
- verification skill: Add difference classification system
- code-generation skill: Emphasize result independence

Key principles:
- Code results are authoritative, paper values are references
- Differences are expected and documented, not bugs to fix
- Visual comparison prioritized over exact numerical match
- Tests verify sanity (shape, gradient, range), not exact values
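
The last principle, tests that verify sanity rather than exact values, might look like the sketch below. The function name `check_curve_sanity`, its parameters, and the sample numbers are hypothetical; the tests actually generated by this workflow may be organized differently.

```python
def check_curve_sanity(values, expected_len, value_range, trend):
    """Check shape, range, and gradient direction of a 1-D result curve.

    Returns a list of problems; an empty list means the curve looks sane.
    No exact paper values are compared here.
    """
    if trend not in ("increasing", "decreasing"):
        raise ValueError(f"unknown trend: {trend!r}")

    problems = []

    # Shape: the curve has the expected number of points.
    if len(values) != expected_len:
        problems.append(
            f"shape: expected {expected_len} points, got {len(values)}"
        )

    # Range: every value lies inside plausible bounds.
    low, high = value_range
    if any(v < low or v > high for v in values):
        problems.append(f"range: values fall outside [{low}, {high}]")

    # Gradient: only the overall direction (last point vs first) is checked.
    if len(values) >= 2:
        delta = values[-1] - values[0]
        if trend == "increasing" and delta <= 0:
            problems.append("gradient: curve does not increase overall")
        elif trend == "decreasing" and delta >= 0:
            problems.append("gradient: curve does not decrease overall")

    return problems


# Illustrative accuracy curve, not data from any paper.
reproduced = [0.52, 0.61, 0.70, 0.74, 0.79]
problems = check_curve_sanity(
    reproduced, expected_len=5, value_range=(0.0, 1.0), trend="increasing"
)
```

A check like this passes when the reproduced curve has the right length, stays in bounds, and moves in the expected direction, even if individual values differ from the paper.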
2026-03-31 19:55:36 +08:00
hc
db731f6745 fix(agents): remove invalid 'model: inherit' configuration
OpenCode requires models to be either explicitly defined with valid IDs or omitted to inherit the default model.
2026-03-31 18:08:10 +08:00
hc
3372b76f6d feat(commands): add /verify command
Entry point for verification of existing replication projects.
2026-03-31 17:45:12 +08:00
hc
400caf2c00 feat(commands): add /replicate command
Entry point for paper replication workflow.
2026-03-31 17:45:07 +08:00
hc
d376cc113a feat(skills): add environment-management skill
Conda + uv hybrid environment setup for ML projects.
2026-03-31 17:44:26 +08:00
hc
849cfe5409 feat(skills): add verification skill
Replication result verification methodology.
2026-03-31 17:43:41 +08:00
hc
06282c7314 feat(skills): add pytorch-patterns skill 2026-03-31 17:41:36 +08:00
hc
cd6e1ebd27 feat(skills): add code-generation skill 2026-03-31 17:39:57 +08:00
hc
5136723d62 feat(skills): add paper-parsing skill 2026-03-31 17:38:17 +08:00
hc
f62129f5d4 feat(agents): add test-runner subagent 2026-03-31 17:36:53 +08:00
hc
59c4a4c5ff feat(agents): add code-writer subagent 2026-03-31 17:35:38 +08:00
hc
f6fff84335 feat(agents): add paper-image-extractor subagent 2026-03-31 17:34:16 +08:00
hc
fb926c6fd3 feat(agents): add paper-analyzer subagent 2026-03-31 17:33:06 +08:00
hc
3691b532fc feat(agents): add paper-director primary agent
Orchestrates ML/DL paper replication workflow with human checkpoint.
2026-03-31 17:31:38 +08:00
hc
4801fb2cc2 Initial commit: design spec and implementation plan
- Design spec: docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md
- Implementation plan: docs/superpowers/plans/2026-03-31-paper-replication-agent.md
- Existing agent: .opencode/agents/paper-image-extractor.md
2026-03-31 17:29:53 +08:00