PaperTool

Author	SHA1	Message	Date
hc	ced50ea2b0	feat(agent): add result-verifier for blind visual comparison. Root cause: test-runner was giving overly optimistic results due to (1) context bias: it knew the implementation and tended to defend it; (2) no actual visual comparison: it wrote 'ACCEPTABLE' without looking; (3) no structural validation: it accepted 35x scale differences as 'acceptable'. Solution: new result-verifier agent that performs blind visual comparison; strict pass/fail criteria for structural validation; test-runner updated to use result-verifier for each figure; clear guidelines that structural mismatches are FAIL, not ACCEPTABLE. Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues, including wrong X-axis variable (channels vs power), wrong Y-axis scale (5x difference), and wrong curve count (5 vs 4).	2026-03-31 23:56:36 +08:00
hc	3533e15995	fix(agent): require explicit image file reading in paper-image-extractor. The subagent was only reading text descriptions about images instead of actually using the read tool on image files. This caused poor-quality reproductions based on guessed data rather than visual analysis. Changes: add CRITICAL instruction to use read tool on each image file; add Step 4, a verification step to compare generated vs original; add 'Extracting Data from Images' section with specific guidance; update guidelines to emphasize visual over textual extraction; allow scipy dependency for interpolation.	2026-03-31 20:29:04 +08:00
hc	5d5aee1f83	refactor: improve verification workflow with visual comparison. Major changes: paper-image-extractor generates reference_plots.py for visual verification; paper-director adds an image understanding checkpoint with side-by-side comparison; paper-analyzer adds data source labeling with reliability levels; code-writer changes from TDD to VDD (Verification-Driven Development); test-runner generates comparison reports with images and explanations; verification skill adds a difference classification system; code-generation skill emphasizes result independence. Key principles: code results are authoritative, paper values are references; differences are expected and documented, not bugs to fix; visual comparison is prioritized over exact numerical match; tests verify sanity (shape, gradient, range), not exact values.	2026-03-31 19:55:36 +08:00
hc	db731f6745	fix(agents): remove invalid 'model: inherit' configuration. OpenCode requires models to be either explicitly defined with valid IDs or omitted to inherit the default model.	2026-03-31 18:08:10 +08:00
hc	f62129f5d4	feat(agents): add test-runner subagent	2026-03-31 17:36:53 +08:00
hc	59c4a4c5ff	feat(agents): add code-writer subagent	2026-03-31 17:35:38 +08:00
hc	f6fff84335	feat(agents): add paper-image-extractor subagent	2026-03-31 17:34:16 +08:00
hc	fb926c6fd3	feat(agents): add paper-analyzer subagent	2026-03-31 17:33:06 +08:00
hc	3691b532fc	feat(agents): add paper-director primary agent. Orchestrates ML/DL paper replication workflow with human checkpoint.	2026-03-31 17:31:38 +08:00
hc	4801fb2cc2	Initial commit: design spec and implementation plan. Design spec: docs/superpowers/specs/2026-03-31-paper-replication-agent-design.md; implementation plan: docs/superpowers/plans/2026-03-31-paper-replication-agent.md; existing agent: .opencode/agents/paper-image-extractor.md	2026-03-31 17:29:53 +08:00

10 Commits
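
The "structural mismatches = FAIL, not ACCEPTABLE" rule from commit ced50ea2b0 could be sketched roughly as follows. This is an illustrative sketch only, not the repository's actual result-verifier: the `FigureSummary` type, the `verify_structure` function, and the `max_scale_ratio` threshold of 2.0 are all hypothetical names and values chosen for the example, and the sample figures merely echo the kinds of issues listed in the commit message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FigureSummary:
    """Hypothetical structural summary of one figure (not from the repo)."""
    x_variable: str   # name of the quantity on the X axis
    curve_count: int  # number of plotted curves
    y_max: float      # largest Y value shown, used as a scale proxy


def verify_structure(
    original: FigureSummary,
    reproduced: FigureSummary,
    max_scale_ratio: float = 2.0,  # assumed threshold, chosen for illustration
) -> Tuple[str, List[str]]:
    """Return ("PASS" | "FAIL", issues). Any structural mismatch is a FAIL."""
    issues: List[str] = []

    if original.x_variable != reproduced.x_variable:
        issues.append(
            f"wrong X-axis variable ({reproduced.x_variable} vs {original.x_variable})"
        )

    if original.curve_count != reproduced.curve_count:
        issues.append(
            f"wrong curve count ({reproduced.curve_count} vs {original.curve_count})"
        )

    # Compare Y scales as a ratio of magnitudes, guarding against division by zero.
    small, large = sorted([abs(original.y_max), abs(reproduced.y_max)])
    if small == 0.0:
        ratio = float("inf") if large > 0.0 else 1.0
    else:
        ratio = large / small
    if ratio > max_scale_ratio:
        issues.append(f"wrong Y-axis scale ({ratio:g}x difference)")

    return ("FAIL" if issues else "PASS"), issues


if __name__ == "__main__":
    # Made-up example values, loosely modeled on the issue types in the log.
    paper = FigureSummary(x_variable="power", curve_count=4, y_max=1.0)
    ours = FigureSummary(x_variable="channels", curve_count=5, y_max=5.0)
    verdict, problems = verify_structure(paper, ours)
    print(verdict)
    for problem in problems:
        print(" -", problem)
```

The point of the design is that there is no middle "ACCEPTABLE" verdict for structural checks: the function collects every mismatch and fails if the list is non-empty, which matches the strict pass/fail intent described in the log.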