PaperTool

History

hc ced50ea2b0	feat(agent): add result-verifier for blind visual comparison	2026-03-31 23:56:36 +08:00

Root cause: test-runner was giving overly optimistic results due to:
1. Context bias - knew the implementation, tended to defend it
2. No actual visual comparison - just wrote 'ACCEPTABLE' without looking
3. No structural validation - accepted 35x scale differences as 'acceptable'

Solution:
- New result-verifier agent that performs blind visual comparison
- Strict pass/fail criteria for structural validation
- Updated test-runner to use result-verifier for each figure
- Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE

Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues:
- Wrong X-axis variable (channels vs power)
- Wrong Y-axis scale (5x difference)
- Wrong curve count (5 vs 4)
- etc.
..
agents	feat(agent): add result-verifier for blind visual comparison	2026-03-31 23:56:36 +08:00
commands	feat(commands): add /verify command	2026-03-31 17:45:12 +08:00
skills	refactor: improve verification workflow with visual comparison	2026-03-31 19:55:36 +08:00
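
The strict pass/fail rule described in the commit message above (structural mismatches = FAIL, not ACCEPTABLE) could be sketched roughly as follows. This is a minimal illustration and not the repository's actual result-verifier: the `FigureSummary` type, the `verify_structure` function, and the `max_scale_ratio` threshold of 2.0 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureSummary:
    """Hypothetical structural summary of one figure (not the repo's schema)."""
    x_variable: str   # quantity plotted on the X axis
    y_max: float      # largest Y value shown, used as a proxy for axis scale
    curve_count: int  # number of curves drawn


def verify_structure(reference, reproduced, max_scale_ratio=2.0):
    """Return ("PASS" | "FAIL", issues). Any structural mismatch is a FAIL.

    max_scale_ratio is an assumed threshold, not a value from the repository.
    """
    issues = []

    if reference.x_variable != reproduced.x_variable:
        issues.append(
            f"Wrong X-axis variable ({reproduced.x_variable} vs {reference.x_variable})"
        )

    # Compare axis scales as a ratio so the check is symmetric in direction.
    big = max(abs(reference.y_max), abs(reproduced.y_max))
    small = min(abs(reference.y_max), abs(reproduced.y_max))
    if small == 0:
        if big != 0:
            issues.append("Wrong Y-axis scale (one axis is degenerate)")
    elif big / small > max_scale_ratio:
        issues.append(f"Wrong Y-axis scale ({big / small:g}x difference)")

    if reference.curve_count != reproduced.curve_count:
        issues.append(
            f"Wrong curve count ({reproduced.curve_count} vs {reference.curve_count})"
        )

    # There is deliberately no "ACCEPTABLE" middle ground.
    return ("FAIL" if issues else "PASS"), issues


if __name__ == "__main__":
    # Illustrative values only, not data taken from the repository.
    reference = FigureSummary(x_variable="power", y_max=1.0, curve_count=4)
    reproduced = FigureSummary(x_variable="channels", y_max=5.0, curve_count=5)
    verdict, issues = verify_structure(reference, reproduced)
    print(verdict)
    for issue in issues:
        print("-", issue)
```

Returning the list of issues alongside the verdict keeps the report specific, in the spirit of the itemized Fig3 findings, while the two-valued verdict prevents structural mismatches from being waved through.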