Root cause: test-runner was giving overly optimistic results due to:
1. Context bias - knew the implementation, tended to defend it
2. No actual visual comparison - just wrote 'ACCEPTABLE' without looking
3. No structural validation - accepted 35x scale differences as 'acceptable'

Solution:
- New result-verifier agent that performs blind visual comparison
- Strict pass/fail criteria for structural validation
- Updated test-runner to use result-verifier for each figure
- Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE

Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues:
- Wrong X-axis variable (channels vs power)
- Wrong Y-axis scale (5x difference)
- Wrong curve count (5 vs 4)
- etc.
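The "structural mismatches = FAIL" rule could be sketched roughly as below. This is only an illustration, not the actual result-verifier: `FigureStructure`, `verify_structure`, the `max_scale_ratio` threshold of 2.0, and the example values are all hypothetical names and numbers chosen for the sketch. It assumes the structural facts (axis variable, axis maximum, curve count) have already been extracted from both figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureStructure:
    """Hypothetical structural summary of one figure (not from the repo)."""
    x_variable: str
    y_max: float
    curve_count: int


def scale_ratio(a: float, b: float) -> float:
    """Ratio (>= 1) between two axis maxima, ignoring sign."""
    lo, hi = sorted([abs(a), abs(b)])
    if hi == 0:
        return 1.0           # both axes are zero: no scale difference
    if lo == 0:
        return float("inf")  # one axis is zero, the other is not
    return hi / lo


def verify_structure(reference, reproduced, max_scale_ratio=2.0):
    """Return (verdict, issues). Any structural mismatch yields FAIL."""
    issues = []
    if reference.x_variable != reproduced.x_variable:
        issues.append(
            f"Wrong X-axis variable ({reproduced.x_variable} vs {reference.x_variable})"
        )
    ratio = scale_ratio(reference.y_max, reproduced.y_max)
    if ratio > max_scale_ratio:  # threshold is an assumed value
        issues.append(f"Wrong Y-axis scale ({ratio:.0f}x difference)")
    if reference.curve_count != reproduced.curve_count:
        issues.append(
            f"Wrong curve count ({reproduced.curve_count} vs {reference.curve_count})"
        )
    # There is deliberately no 'ACCEPTABLE' middle ground.
    return ("FAIL" if issues else "PASS"), issues


# Illustrative values only; which figure is the reference is an assumption.
reference = FigureStructure(x_variable="power", y_max=1.0, curve_count=4)
reproduced = FigureStructure(x_variable="channels", y_max=5.0, curve_count=5)
verdict, issues = verify_structure(reference, reproduced)
print(verdict)
for issue in issues:
    print("-", issue)
```

Returning the list of issues alongside the verdict keeps each FAIL traceable to specific mismatches, in the spirit of the Fig3 report above.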
28 lines
624 B
Python
"""
tests/test_semantic_model.py
"""

import pytest
import numpy as np
from src.models.semantic_model import SemanticSurrogate


def test_semantic_surrogate():
    surrogate = SemanticSurrogate()

    # Test bounds
    snr_linear = np.array([10.0, 100.0, 1000.0])
    k_n = 4
    sim = surrogate.get_similarity(snr_linear, k_n)

    assert np.all(sim >= 0) and np.all(sim <= 1)

    # Test monotonicity with SNR
    assert sim[0] < sim[1] < sim[2]

    # Test monotonicity with k_n
    sim_k4 = surrogate.get_similarity(snr_linear, 4)
    sim_k6 = surrogate.get_similarity(snr_linear, 6)

    assert np.all(sim_k4 < sim_k6)