PaperTool/workspace/resource_allocation/analysis/replication_plan.md
hc ced50ea2b0 feat(agent): add result-verifier for blind visual comparison
Root cause: test-runner was giving overly optimistic results due to:
1. Context bias - knew the implementation, tended to defend it
2. No actual visual comparison - just wrote 'ACCEPTABLE' without looking
3. No structural validation - accepted 35x scale differences as 'acceptable'

Solution:
- New result-verifier agent that performs blind visual comparison
- Strict pass/fail criteria for structural validation
- Updated test-runner to use result-verifier for each figure
- Clear guidelines: structural mismatches = FAIL, not ACCEPTABLE

Test result: verifier correctly identified Fig3 as FAIL with 7 specific issues:
- Wrong X-axis variable (channels vs power)
- Wrong Y-axis scale (5x difference)
- Wrong curve count (5 vs 4)
- etc.
2026-03-31 23:56:36 +08:00

# Replication Plan
## Scope
The core goal of this replication is to implement the semantic-aware resource allocation algorithm (Hungarian algorithm for channel assignment + exhaustive search for optimal $k_n$) and the transform method for fair comparison.
**Out of scope:** The DeepSC neural network training and NLP text processing. Instead, we will simulate the pre-trained DeepSC behavior using a parameterized surrogate function or look-up table mapping SNR and $k_n$ to semantic similarity ($\xi$). The user explicitly requested NOT to reproduce Figure 2, so the focus will be entirely on Figures 3, 4a, 4b, and 4c.
## Implementation Order
### Module 1: Environment & Channel Simulator
- **File**: `src/models/environment.py`
- **Dependencies**: None
- **Test file**: `tests/test_environment.py`
- **Acceptance criteria**:
- [ ] Generate N users and M channels with specified bandwidth
- [ ] Apply pathloss ($128.1 + 37.6 \log_{10} d$ dB, with $d$ in km) and shadow fading (6 dB)
- [ ] Calculate SNR $\gamma_{n,m}$ based on noise power and Rayleigh fading
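The criteria above could be sketched roughly as follows. This is a minimal illustration, not the planned `environment.py`: the function names, the uniform user placement in a disc, the cell radius, the reading of "6 dB" as the standard deviation of log-normal shadowing, and the single noise power per channel are all assumptions made for the example.

```python
import numpy as np


def pathloss_db(d_km):
    """Pathloss in dB for a distance in km: 128.1 + 37.6 * log10(d)."""
    return 128.1 + 37.6 * np.log10(d_km)


def simulate_snr_db(n_users, n_channels, tx_power_dbm, noise_dbm,
                    cell_radius_km=0.5, min_dist_km=0.01,
                    shadow_std_db=6.0, rng=None):
    """Return an (n_users, n_channels) array of per-link SNR in dB.

    Assumptions (not from the plan): users are dropped uniformly in a disc,
    shadowing is log-normal per user, and Rayleigh fading is independent
    per user-channel pair.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Uniform over the disc area: squared distance is uniform.
    d_km = np.sqrt(rng.uniform(min_dist_km ** 2, cell_radius_km ** 2,
                               size=n_users))
    loss_db = pathloss_db(d_km)
    shadow_db = rng.normal(0.0, shadow_std_db, size=n_users)
    # Rayleigh amplitude means the power gain |h|^2 is unit-mean exponential.
    fading_db = 10.0 * np.log10(rng.exponential(1.0,
                                                size=(n_users, n_channels)))
    rx_dbm = (tx_power_dbm - loss_db[:, None] - shadow_db[:, None]
              + fading_db)
    return rx_dbm - noise_dbm
```

Passing a seeded `np.random.default_rng(seed)` as `rng` keeps test runs reproducible.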
### Module 2: Semantic Similarity Surrogate
- **File**: `src/models/semantic_model.py`
- **Dependencies**: `src/models/environment.py`
- **Test file**: `tests/test_semantic_model.py`
- **Acceptance criteria**:
- [ ] Given SNR and $k_n$, returns a simulated semantic similarity $\xi \in [0, 1]$
- [ ] $\xi$ is strictly increasing in both SNR and $k_n$
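One shape that satisfies both criteria is a logistic function of a weighted sum of SNR and $k_n$. The sketch below is only a hypothetical surrogate: the function name and the parameters `a`, `b`, `snr0_db`, and `k0` are placeholders chosen for illustration, not fitted DeepSC values.

```python
import numpy as np


def semantic_similarity(snr_db, k_n, a=0.3, b=0.5, snr0_db=0.0, k0=4.0):
    """Hypothetical surrogate for xi(SNR, k_n), with values in (0, 1).

    a and b (both > 0) set how fast xi rises with SNR and k_n;
    snr0_db and k0 locate the midpoint. All four are placeholder values.
    """
    z = a * (np.asarray(snr_db, dtype=float) - snr0_db) \
        + b * (np.asarray(k_n, dtype=float) - k0)
    return 1.0 / (1.0 + np.exp(-z))
```

Because `a` and `b` are positive, the output increases with both inputs. For very large arguments the result rounds to exactly 1.0 in floating point, so a strict-monotonicity test should stay within a moderate SNR range.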
### Module 3: Resource Allocation Optimizer
- **File**: `src/models/allocator.py`
- **Dependencies**: `src/models/semantic_model.py`, `src/models/environment.py`
- **Test file**: `tests/test_allocator.py`
- **Acceptance criteria**:
- [ ] Implement exhaustive search over $k_n \in \{1, \dots, K\}$ to find the optimal $\widetilde{\Phi}_{n,m}$ for each user-channel pair
- [ ] Implement Hungarian algorithm for bipartite channel assignment ($\alpha_{n,m}$)
- [ ] Compute overall S-SE for the proposed model and conventional/fixed models
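The two-stage structure above (per-link exhaustive search, then bipartite matching) might look like the sketch below. `sse_fn` is a hypothetical callable supplied by the caller that returns the per-link S-SE for a given SNR matrix and $k$; its formula is deliberately left out because this plan does not state it. The matching step uses `scipy.optimize.linear_sum_assignment` with `maximize=True`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def best_k_per_link(snr_db, K, sse_fn):
    """Exhaustive search over k = 1..K for every (user, channel) pair.

    sse_fn(snr_db, k) is a caller-supplied (hypothetical) function that
    returns an array of per-link S-SE with the same shape as snr_db.
    """
    ks = np.arange(1, K + 1)
    # scores[i, n, m] = S-SE of user n on channel m when k = ks[i]
    scores = np.stack([sse_fn(snr_db, k) for k in ks])
    best_idx = scores.argmax(axis=0)
    return scores.max(axis=0), ks[best_idx]


def allocate(snr_db, K, sse_fn):
    """Return (alpha, best_k, total_sse) for a one-to-one assignment."""
    best_sse, best_k = best_k_per_link(snr_db, K, sse_fn)
    # Hungarian-style matching that maximizes the summed per-link S-SE.
    rows, cols = linear_sum_assignment(best_sse, maximize=True)
    alpha = np.zeros(best_sse.shape, dtype=int)
    alpha[rows, cols] = 1
    return alpha, best_k, float(best_sse[rows, cols].sum())
```

When the numbers of users and channels differ, `linear_sum_assignment` matches only as many pairs as the smaller dimension, so some rows or columns of `alpha` remain all zero.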
### Module 4: Transform Method & Baselines
- **File**: `src/models/baselines.py`
- **Dependencies**: `src/models/environment.py`
- **Test file**: `tests/test_baselines.py`
- **Acceptance criteria**:
- [ ] Implement Ideal Shannon limit SE calculation
- [ ] Implement 4G and 5G CQI to SE mapping lookup
- [ ] Implement transform method: calculate equivalent S-SE given transforming factor $\mu$
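As a rough illustration of the first and third criteria, the sketch below computes the Shannon spectral efficiency and converts a bit-level SE into a word-level rate by dividing by $\mu$ (bits/word). The conversion is an assumption based on units alone: the optional `semantic_units_per_word` factor and both function names are hypothetical, and the exact transform should be taken from the paper.

```python
import numpy as np


def shannon_se(snr_db):
    """Ideal spectral efficiency log2(1 + SNR) in bit/s/Hz, SNR given in dB."""
    snr_linear = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
    return np.log2(1.0 + snr_linear)


def equivalent_sse(se_bits, mu, semantic_units_per_word=1.0):
    """Assumed transform: (bit/s/Hz) / (bits/word) = words/s/Hz.

    semantic_units_per_word is a hypothetical scale factor (default 1.0)
    for converting words into semantic units.
    """
    return np.asarray(se_bits, dtype=float) * semantic_units_per_word / mu
```

For example, an SE of 4 bit/s/Hz with $\mu = 20$ bits/word maps to 0.2 words/s/Hz under this reading.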
### Module 5: Evaluation & Plotting
- **File**: `src/evaluate.py`
- **Dependencies**: All of the above
- **Test file**: None (creates final plots)
- **Acceptance criteria**:
- [ ] Generate outputs corresponding to target Figures 3, 4a, 4b, 4c.
## Replication Targets
### Figure 3: S-SE of the semantic-aware network with different models
- **Type**: Line Plot
- **Data source**: Resource allocation output (Module 3) vs fixed $k_n$ baselines
- **Priority**: High
- **Expected values**: Proposed model S-SE > fixed $k_n$ models. Plateau expected at roughly 1.2 S-SE. (REFERENCE ONLY)
### Figure 4(a): S-SE versus the number of channels
- **Type**: Line Plot
- **Data source**: Evaluation loop varying channels M from 1 to 10
- **Priority**: High
- **Expected values**: Semantic > Ideal > 5G > 4G for $M \geq 5$. (REFERENCE ONLY)
### Figure 4(b): S-SE versus the transmit power
- **Type**: Line Plot
- **Data source**: Evaluation loop varying transmit power (-40 to 23 dBm)
- **Priority**: High
- **Expected values**: Semantic plateaus around 10 dBm, Ideal grows continuously and overtakes Semantic. (REFERENCE ONLY)
### Figure 4(c): S-SE versus the transforming factor
- **Type**: Line Plot
- **Data source**: Evaluation loop varying $\mu$ (bits/word) from 18 to 40
- **Priority**: High
- **Expected values**: Semantic outperforms 5G and 4G for $\mu > 19$, and outperforms Ideal for $\mu > 27$. (HIGH Reliability)
## Environment Requirements
- Python >= 3.10
- NumPy >= 1.23.0
- SciPy >= 1.9.0 (for linear_sum_assignment)
- Matplotlib >= 3.6.0
## Estimated Effort
- Core model: 4 hours
- Optimization loop (no network training is involved): 2 hours
- Evaluation: 2 hours
## Known Challenges
1. DeepSC Simulator Approximation: The DeepSC performance curve is not available in closed form. Mitigation: Fit a parameterized logistic/sigmoid curve to the trends of $\xi$ over SNR and $k_n$ read from Figure 2.
2. 3GPP Tables for 4G/5G: The 4G and 5G baselines depend on the CQI tables in 3GPP TS 36.213 and TS 38.214 and on an SNR threshold for each CQI index. Mitigation: Implement an approximate step function from SNR to SE that matches realistic SE/CQI curves for these specifications.
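The step-function mitigation in item 2 can be prototyped with `np.searchsorted`. In the sketch below the function name is hypothetical, and the thresholds and efficiencies in the usage lines are placeholder numbers for illustration only, not values from the 3GPP tables.

```python
import numpy as np


def cqi_step_se(snr_db, thresholds_db, efficiencies):
    """Map SNR (dB) to spectral efficiency with a right-continuous step.

    thresholds_db must be sorted ascending; efficiencies[i] applies when
    thresholds_db[i] <= snr_db < thresholds_db[i + 1]. Below the lowest
    threshold the link is treated as out of range (SE = 0).
    """
    thresholds_db = np.asarray(thresholds_db, dtype=float)
    table = np.concatenate(([0.0], np.asarray(efficiencies, dtype=float)))
    idx = np.searchsorted(thresholds_db, snr_db, side="right")
    return table[idx]


# Placeholder table for illustration only (NOT taken from 3GPP).
demo_thresholds_db = [-5.0, 0.0, 5.0, 10.0, 15.0]
demo_efficiencies = [0.2, 0.6, 1.5, 2.7, 4.0]
demo_se = cqi_step_se(np.array([-6.0, 7.3, 100.0]),
                      demo_thresholds_db, demo_efficiencies)
```

Keeping the table as plain arrays allows the 4G and 5G variants to share one function and differ only in the data passed in.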