PaperTool/workspace/resource_allocation/pyproject.toml
hc ced50ea2b0 feat(agent): add result-verifier for blind visual comparison
Root cause: test-runner gave overly optimistic results because of:
1. Context bias - it knew the implementation and tended to defend it
2. No actual visual comparison - it wrote 'ACCEPTABLE' without looking at the figures
3. No structural validation - it accepted 35x scale differences as 'acceptable'

Solution:
- New result-verifier agent that performs blind visual comparison
- Strict pass/fail criteria for structural validation
- Updated test-runner to use result-verifier for each figure
- Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE

Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues, including:
- Wrong X-axis variable (channels vs power)
- Wrong Y-axis scale (5x difference)
- Wrong curve count (5 vs 4)
2026-03-31 23:56:36 +08:00
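
The structural criteria named in the commit message (axis variable, axis scale, curve count) can be illustrated with a minimal sketch. This is not the result-verifier's actual implementation: `FigureSummary`, `structural_issues`, `verdict`, and the `max_scale_ratio=2.0` threshold are all hypothetical names and values chosen for illustration, and the numbers in the usage lines are made up to mirror the kinds of mismatch listed for Fig3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureSummary:
    """Hypothetical structural summary of a single figure."""
    x_variable: str   # quantity plotted on the X axis
    y_max: float      # largest Y value shown
    curve_count: int  # number of plotted curves


def structural_issues(reference, reproduced, max_scale_ratio=2.0):
    """Return a list of structural mismatches between two figures.

    max_scale_ratio is an assumed threshold, not one taken from the repo.
    """
    issues = []

    if reference.x_variable != reproduced.x_variable:
        issues.append(
            f"Wrong X-axis variable "
            f"({reproduced.x_variable} vs {reference.x_variable})"
        )

    # Compare Y scales as a ratio so the check is symmetric.
    hi = max(abs(reference.y_max), abs(reproduced.y_max))
    lo = min(abs(reference.y_max), abs(reproduced.y_max))
    if lo == 0:
        ratio = 1.0 if hi == 0 else float("inf")
    else:
        ratio = hi / lo
    if ratio > max_scale_ratio:
        issues.append(f"Wrong Y-axis scale ({ratio:g}x difference)")

    if reference.curve_count != reproduced.curve_count:
        issues.append(
            f"Wrong curve count "
            f"({reproduced.curve_count} vs {reference.curve_count})"
        )

    return issues


def verdict(issues):
    """Any structural mismatch is a FAIL; there is no ACCEPTABLE tier."""
    return "FAIL" if issues else "PASS"


# Illustrative values only, not measurements from the paper or the repo.
reference = FigureSummary(x_variable="power", y_max=10.0, curve_count=4)
reproduced = FigureSummary(x_variable="channels", y_max=50.0, curve_count=5)
issues = structural_issues(reference, reproduced)
print(verdict(issues), issues)
```

Returning the list of issues instead of a bare boolean keeps the verdict auditable: a FAIL always comes with the specific mismatches that caused it.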

[project]
name = "resource-allocation"
version = "0.1.0"
description = "Replication of semantic-aware resource allocation"
requires-python = ">=3.10"
dependencies = [
    "torch>=2.0.0",
    "numpy>=1.23.0",
    "matplotlib>=3.6.0",
    "scipy>=1.9.0",
    "tqdm>=4.65.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src"]

[tool.pytest.ini_options]
pythonpath = ["."]