PaperTool

hc/PaperTool

Fork 0

Commit Graph

Author	SHA1	Message	Date
hc	6b78dc47fa	style(agents): standardize bilingual format for all agent files - Use English for structural headers (Role, Workflow, Constraints) - Use Chinese for business logic and detailed explanations - Consistent formatting across all 6 agents: - paper-director.md - paper-analyzer.md - paper-image-extractor.md - code-writer.md - test-runner.md - result-verifier.md	2026-04-01 00:42:01 +08:00
hc	ced50ea2b0	feat(agent): add result-verifier for blind visual comparison Root cause: test-runner was giving overly optimistic results due to: 1. Context bias - knew the implementation, tended to defend it 2. No actual visual comparison - just wrote 'ACCEPTABLE' without looking 3. No structural validation - accepted 35x scale differences as 'acceptable' Solution: - New result-verifier agent that performs blind visual comparison - Strict pass/fail criteria for structural validation - Updated test-runner to use result-verifier for each figure - Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues: - Wrong X-axis variable (channels vs power) - Wrong Y-axis scale (5x difference) - Wrong curve count (5 vs 4) - etc.	2026-03-31 23:56:36 +08:00

Author

SHA1

Message

Date

6b78dc47fa

style(agents): standardize bilingual format for all agent files

- Use English for structural headers (Role, Workflow, Constraints)
- Use Chinese for business logic and detailed explanations
- Consistent formatting across all 6 agents:
  - paper-director.md
  - paper-analyzer.md
  - paper-image-extractor.md
  - code-writer.md
  - test-runner.md
  - result-verifier.md

2026-04-01 00:42:01 +08:00

ced50ea2b0

feat(agent): add result-verifier for blind visual comparison

Root cause: test-runner was giving overly optimistic results due to:
1. Context bias - knew the implementation, tended to defend it
2. No actual visual comparison - just wrote 'ACCEPTABLE' without looking
3. No structural validation - accepted 35x scale differences as 'acceptable'

Solution:
- New result-verifier agent that performs blind visual comparison
- Strict pass/fail criteria for structural validation
- Updated test-runner to use result-verifier for each figure
- Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE

Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues:
- Wrong X-axis variable (channels vs power)
- Wrong Y-axis scale (5x difference)
- Wrong curve count (5 vs 4)
- etc.

2026-03-31 23:56:36 +08:00

2 Commits