# Co-MADDPG: Cooperative-Competitive Multi-Agent Resource Allocation for Semantic-Traditional Hybrid Wireless Communication

---

## Project Overview

This project implements Co-MADDPG, a multi-agent deep reinforcement learning framework for OFDMA wireless resource allocation in systems where semantic communication coexists with traditional bit-level communication. It combines a Stackelberg game with a dynamic cooperation-competition switching mechanism.

### Key Innovations

1. **Coopetition Game Modeling**: Resource competition between semantic users (Leader) and traditional users (Follower) is modeled as a Stackelberg game.
2. **Dynamic λ(t) Switching**: `λ(t) = sigmoid(β·(QoE_sys - Q_th))` switches adaptively between cooperation and competition according to the system QoE.
3. **Heterogeneous QoE**: Semantic users are scored by SSim plus compression ratio; traditional users are scored by rate satisfaction.
4. **CTDE Architecture**: Centralized training with decentralized execution, using a joint Critic network.

### Target Venue

IEEE Transactions on Communications (TCOM)

---

## Requirements

### Python Version

- Python 3.8+

### Dependencies

```bash
pip install numpy torch pyyaml matplotlib
```

| Library | Version | Purpose |
|---|---|---|
| `numpy` | ≥1.20 | Numerical computation, channel modeling |
| `torch` | ≥1.10 (CPU or GPU) | Neural network training |
| `pyyaml` | ≥5.0 | Configuration file loading |
| `matplotlib` | ≥3.4 | IEEE-style plotting |

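
To confirm that an environment meets these minimums before training, a small check along the following lines may help. It is a sketch that is not part of the repository: `REQUIRED`, `version_tuple`, and `installed_version` are hypothetical names, and only the first two version components are compared, which is enough for the bounds in the table.

```python
import re
from importlib import metadata  # standard library since Python 3.8

# Minimum versions from the dependency table; keys are PyPI distribution names.
REQUIRED = {"numpy": "1.20", "torch": "1.10", "PyYAML": "5.0", "matplotlib": "3.4"}


def version_tuple(version):
    """Turn '1.10.2+cu118' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:2])


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name, minimum in REQUIRED.items():
    found = installed_version(name)
    if found is None:
        status = "missing"
    elif version_tuple(found) >= version_tuple(minimum):
        status = "ok"
    else:
        status = f"too old (need >= {minimum})"
    print(f"{name:<12} {found or '-':<16} {status}")
```

Versions are compared as integer tuples because plain string comparison would rank `1.9` above `1.10`.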
### Hardware Recommendations

| Scenario | Configuration |
|---|---|
| Smoke test | CPU, 2-5 episodes, ~30 s |
| Short training | CPU/GPU, 100-500 episodes, ~10-60 min |
| Full training | GPU (CUDA), 5000 episodes, ~2-8 h |

---

## Quick Start

### 1. Clone

```bash
git clone <repo-url>
cd SemantiCommunication/code
```

### 2. Smoke Test

```bash
# Train Co-MADDPG for 2 episodes (verifies the code logic)
python train.py --algo co_maddpg --episodes 2 --steps 10

# Train all 8 algorithms for 2 episodes each
python train.py --algo all --episodes 2 --steps 10
```

### 3. Full Training

```bash
# Train a single algorithm (running the main algorithm first is recommended)
python train.py --algo co_maddpg --episodes 5000

# Train all 8 algorithms
python train.py --algo all --episodes 5000

# Use a specific config file
python train.py --algo co_maddpg --config configs/default.yaml --episodes 5000
```

### 4. Evaluation & Plotting

```bash
# Run all 8 evaluation scenarios and generate 12+ figures
python evaluate.py

# Use a specific results directory
python evaluate.py --results_dir results/
```

---

## Supported Algorithms

| # | Algorithm | CLI name | λ | Update | Critic type | Purpose |
|---|---|---|---|---|---|---|
| 1 | **Co-MADDPG** | `co_maddpg` | dynamic | Stackelberg | Joint (CTDE) | Proposed method |
| 2 | PureCooperative | `pure_coop` | 1.0 | Simultaneous | Joint | Ablation: no competition |
| 3 | PureCompetitive | `pure_comp` | 0.0 | Simultaneous | Joint | Ablation: no cooperation |
| 4 | FixedLambda | `fixed_lambda` | 0.5 | Stackelberg | Joint | Ablation: no dynamic λ |
| 5 | IDDPG | `iddpg` | 0.0 | Simultaneous | Independent | Ablation: no CTDE |
| 6 | SingleAgentDQN | `single_dqn` | 0.5 | N/A | Centralized | Non-MARL baseline |
| 7 | EqualAllocation | `equal_alloc` | 0.5 | N/A | None | Performance lower bound |
| 8 | SemanticOnly | `semantic_only` | 1.0 | N/A | Single | Ablation: no heterogeneity |

---

## Project Structure

```
SemantiCommunication/code/
│
├── configs/                  # Configuration
│   ├── __init__.py
│   └── default.yaml          # Main config (hyperparameters, environment parameters)
│
├── envs/                     # Environment modules
│   ├── __init__.py
│   ├── channel_model.py      # 3GPP channel model (Eq.5-8)
│   ├── semantic_module.py    # Semantic similarity (SSim)
│   └── wireless_env.py       # Gym-like wireless environment
│
├── agents/                   # Core algorithm
│   ├── __init__.py
│   ├── actor.py              # Actor network FC→Tanh→[0,1]
│   ├── critic.py             # Critic network (joint Q-value)
│   ├── noise.py              # OU exploration noise
│   ├── replay_buffer.py      # 9-field replay buffer
│   └── co_maddpg.py          # Co-MADDPG main algorithm (★)
│
├── baselines/                # 7 baseline algorithms
│   ├── __init__.py
│   ├── pure_coop.py          # λ=1, pure cooperation
│   ├── pure_comp.py          # λ=0, pure competition
│   ├── fixed_lambda.py       # λ=0.5, fixed
│   ├── iddpg.py              # Independent DDPG (no CTDE)
│   ├── single_dqn.py         # Centralized DQN
│   ├── equal_alloc.py        # Equal allocation
│   └── semantic_only.py      # Semantic-only DDPG
│
├── utils/                    # Utility modules
│   ├── __init__.py
│   ├── metrics.py            # Evaluation metrics (Jain fairness, λ, rewards)
│   └── visualization.py      # IEEE-style plotting (12 plot types)
│
├── train.py                  # Training entry point (★)
├── evaluate.py               # Evaluation entry point (★)
├── README.md                 # This file
├── ARCHITECTURE.md           # Architecture document
├── API.md                    # API reference
└── results/                  # Training output directory
```

---

## Configuration

The configuration file is `configs/default.yaml`. It has four main sections:

### env (environment parameters)

| Parameter | Default | Description |
|---|---|---|
| `num_subcarriers` | 64 | Number of OFDMA subcarriers N |
| `bandwidth` | 10.0e+6 | System bandwidth (Hz) |
| `subcarrier_spacing` | 156250.0 | Subcarrier spacing Δf (Hz) |
| `max_power` | 1.0 | Maximum transmit power (W) |
| `noise_psd` | -174 | Noise power spectral density (dBm/Hz) |
| `carrier_freq` | 3.5 | Carrier frequency (GHz) |
| `num_semantic_users` | 3 | Number of semantic users K_s |
| `num_traditional_users` | 3 | Number of traditional users K_b |
| `min_rate_req` | 5.0e+5 | Minimum rate requirement for traditional users (bps) |
| `rho_max` / `rho_min` | 1.0 / 0.05 | Compression ratio range |
| `w1` / `w2` | 0.7 / 0.3 | Semantic QoE weights |

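
Several of these defaults are related to one another: 10 MHz divided over 64 subcarriers gives exactly 156250 Hz, and the two QoE weights sum to 1. When editing the config it may be worth checking that such relations still hold. The sketch below is one way to do that. `check_env` is a hypothetical helper, and both the spacing relation and the weights summing to 1 are assumptions inferred from the default values, not rules the code is known to enforce.

```python
import math

# Defaults copied from the env table.
env = {
    "num_subcarriers": 64,
    "bandwidth": 10.0e6,
    "subcarrier_spacing": 156250.0,
    "rho_max": 1.0,
    "rho_min": 0.05,
    "w1": 0.7,
    "w2": 0.3,
}


def check_env(cfg):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Assumption: the spacing is the bandwidth split evenly over the subcarriers.
    implied = cfg["bandwidth"] / cfg["num_subcarriers"]
    if not math.isclose(implied, cfg["subcarrier_spacing"]):
        problems.append(f"subcarrier_spacing should be {implied} Hz")
    if not 0.0 < cfg["rho_min"] < cfg["rho_max"]:
        problems.append("need 0 < rho_min < rho_max")
    # Assumption: the semantic QoE weights are meant to sum to 1.
    if not math.isclose(cfg["w1"] + cfg["w2"], 1.0):
        problems.append("w1 + w2 should equal 1")
    return problems


print(check_env(env))  # [] for the defaults
```

Changing `num_subcarriers` without updating `subcarrier_spacing` is the kind of slip this would catch.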
### training (training parameters)

| Parameter | Default | Description |
|---|---|---|
| `max_episodes` | 5000 | Maximum number of training episodes |
| `max_steps` | 200 | Maximum steps per episode |
| `batch_size` | 256 | Mini-batch size |
| `buffer_capacity` | 100000 | Replay buffer capacity |
| `actor_lr` / `critic_lr` | 1e-4 / 3e-4 | Learning rates |
| `gamma` | 0.95 | Discount factor |
| `tau` | 0.01 | Soft-update coefficient |
| `beta` | 5.0 | Steepness of the λ(t) sigmoid |
| `q_threshold` | 0.6 | λ(t) switching threshold Q_th |

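
For readers new to DDPG-style training, the roles of `gamma` and `tau` can be illustrated with plain Python lists. This sketch assumes the standard one-step TD target and Polyak (soft) target update used by DDPG and MADDPG; the source does not show the project's own update code, and the function names here are hypothetical.

```python
def td_target(reward, next_q, done, gamma=0.95):
    """One-step TD target: r + gamma * Q'(s', a'), with no bootstrap at episode end."""
    return reward + gamma * (0.0 if done else next_q)


def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Toy parameters: the target network trails the online network.
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(100):
    target = soft_update(target, online)
print(target)  # the first entry has moved about 63% of the way toward 1.0
```

With τ = 0.01 the target parameters close roughly 63% of the gap to the online parameters every 100 updates, which is why small τ values give slowly moving, stable critic targets.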
### network (network architecture)

| Parameter | Default | Description |
|---|---|---|
| `actor_hidden` | [256, 256, 128] | Actor hidden layers |
| `critic_hidden` | [512, 512, 256] | Critic hidden layers |

### reward (reward weights)

| Parameter | Default | Description |
|---|---|---|
| `coop_self` / `coop_other` / `coop_sys` | 0.5 / 0.3 / 0.2 | Cooperative reward weights |
| `comp_self` / `comp_sys` | 0.8 / 0.2 | Competitive reward weights |

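
One plausible reading of these weights is sketched below. The cooperative reward mixes an agent's own QoE, the other group's QoE, and the system QoE, while the competitive reward drops the other group's term. Both are then blended by λ as in the mixed-reward formula `r_i = λ·r_coop + (1-λ)·r_comp`. The weighted-sum form and the function names are assumptions based on the parameter names, so treat this as illustrative rather than as the project's reward code.

```python
def coop_reward(qoe_self, qoe_other, qoe_sys,
                w_self=0.5, w_other=0.3, w_sys=0.2):
    """Assumed cooperative reward: weighted sum of own, other-group and system QoE."""
    return w_self * qoe_self + w_other * qoe_other + w_sys * qoe_sys


def comp_reward(qoe_self, qoe_sys, w_self=0.8, w_sys=0.2):
    """Assumed competitive reward: weighted sum of own and system QoE."""
    return w_self * qoe_self + w_sys * qoe_sys


def mixed_reward(lam, r_coop, r_comp):
    """r_i = lam * r_coop + (1 - lam) * r_comp."""
    return lam * r_coop + (1.0 - lam) * r_comp


# Made-up QoE values, for illustration only.
r_coop = coop_reward(qoe_self=0.8, qoe_other=0.6, qoe_sys=0.7)
r_comp = comp_reward(qoe_self=0.8, qoe_sys=0.7)
for lam in (0.0, 0.5, 1.0):
    print(f"lambda={lam:.1f} -> reward={mixed_reward(lam, r_coop, r_comp):.3f}")
```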
---

## Key Formulas

| Formula | Expression | Paper eq. |
|---|---|---|
| Path loss | `PL(d) = 36.7·log₁₀(d) + 22.7 + 26·log₁₀(fc)` | Eq.(5) |
| Channel gain | `h_{k,n} ~ CN(0, 10^{-PL/10})` | Eq.(6) |
| Noise power | `σ² = 10^{(N₀_dBm-30)/10} · Δf` | Eq.(7) |
| SNR | `γ_{k,n} = p_{k,n} · \|h_{k,n}\|² / σ²` | Eq.(8) |
| Semantic similarity (SSim) | `φ(γ̄,ρ) = 1 - exp(-a(ρ)·γ̄^{b(ρ)})` | n/a |
| Semantic QoE | `QoE_s = 0.7·SSim + 0.3·(1-ρ/ρ_max)` | n/a |
| Traditional QoE | `QoE_b = min(R_k/R_req, 1)` | n/a |
| Dynamic λ | `λ(t) = sigmoid(β·(QoE_sys - Q_th))` | n/a |
| Mixed reward | `r_i = λ·r_coop + (1-λ)·r_comp` | n/a |

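
As a worked example, the link-budget formulas (Eq. 5 to 8) and the two QoE definitions can be evaluated directly with the default parameters. The sketch below makes several assumptions that are not stated in this README: the distance `d` is in meters, `fc` is in GHz (matching `carrier_freq`), `|h|²` is replaced by its mean `10^{-PL/10}` from Eq. (6), and the transmit power is split evenly over the 64 subcarriers. The SSim curve is left out because its coefficients `a(ρ)` and `b(ρ)` are not given here. All function names are hypothetical.

```python
import math


def path_loss_db(d_m, fc_ghz=3.5):
    """Eq. (5): PL(d) = 36.7*log10(d) + 22.7 + 26*log10(fc)."""
    return 36.7 * math.log10(d_m) + 22.7 + 26.0 * math.log10(fc_ghz)


def noise_power_w(noise_psd_dbm_hz=-174.0, delta_f_hz=156250.0):
    """Eq. (7): convert dBm/Hz to W/Hz, then scale by the subcarrier spacing."""
    return 10 ** ((noise_psd_dbm_hz - 30.0) / 10.0) * delta_f_hz


def mean_snr(p_w, d_m):
    """Eq. (8) with |h|^2 replaced by its mean 10^(-PL/10) from Eq. (6)."""
    mean_gain = 10 ** (-path_loss_db(d_m) / 10.0)
    return p_w * mean_gain / noise_power_w()


def semantic_qoe(ssim, rho, rho_max=1.0, w1=0.7, w2=0.3):
    """QoE_s = w1*SSim + w2*(1 - rho/rho_max), with the default weights 0.7 / 0.3."""
    return w1 * ssim + w2 * (1.0 - rho / rho_max)


def traditional_qoe(rate_bps, rate_req_bps=5.0e5):
    """QoE_b = min(R_k / R_req, 1)."""
    return min(rate_bps / rate_req_bps, 1.0)


# Assumed scenario: a user 100 m away, 1 W split evenly over 64 subcarriers.
p_sub = 1.0 / 64
snr_db = 10.0 * math.log10(mean_snr(p_sub, d_m=100.0))
print(f"PL = {path_loss_db(100.0):.1f} dB, mean SNR = {snr_db:.1f} dB")
print(f"QoE_s = {semantic_qoe(ssim=0.9, rho=0.5):.2f}, "
      f"QoE_b = {traditional_qoe(rate_bps=2.5e5):.2f}")
```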
---

## Evaluation Scenarios

`evaluate.py` runs 8 evaluation scenarios, which correspond to the 12 figures in Section VII of the paper:

| # | Scenario | Figure | Description |
|---|---|---|---|
| 1 | Convergence | Fig.2 | Convergence curves of all algorithms |
| 2 | QoE vs SNR | Fig.3 | System QoE at different SNRs |
| 3 | Fairness vs SNR | Fig.4 | Jain fairness at different SNRs |
| 4 | QoE vs Users | Fig.5 | Scalability with the number of users |
| 5 | Rate Satisfaction vs Users | Fig.6 | Rate satisfaction |
| 6 | Lambda Trajectory | Fig.7-8 | λ(t) trajectory and scatter plot |
| 7 | Ablation Study | Fig.10 | Ablation bar chart |
| 8 | Sensitivity | Fig.11-12 | Sensitivity to β and Q_th |

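
The fairness scenario relies on Jain's index, `J = (Σx)² / (n·Σx²)`, which equals 1 when all users receive the same value and 1/n when a single user receives everything. A minimal sketch follows. Applying it to per-user QoE values is an assumption, and `jain_fairness` is a hypothetical name that may differ from what `utils/metrics.py` exposes.

```python
def jain_fairness(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    values = list(values)
    sum_sq = sum(v * v for v in values)
    if not values or sum_sq == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    total = sum(values)
    return total * total / (len(values) * sum_sq)


print(jain_fairness([1.0, 1.0, 1.0, 1.0]))  # 1.0: perfectly fair
print(jain_fairness([1.0, 0.0, 0.0, 0.0]))  # 0.25: one user takes everything
print(jain_fairness([0.9, 0.8, 0.7, 0.3]))  # somewhere in between
```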
---

## Output Files

Files produced by training and evaluation are saved under `results/`:

```
results/
├── <algo_name>/
│   ├── model_s.pth              # Semantic agent model weights
│   ├── model_b.pth              # Traditional agent model weights
│   ├── training_log.json        # Training metrics log
│   └── config_snapshot.yaml     # Snapshot of the config used for training
├── figures/
│   ├── fig02_convergence.png
│   ├── fig03_qoe_vs_snr.png
│   ├── ...
│   └── fig12_qth_sensitivity.png
└── evaluation_results.json      # Aggregated evaluation data
```

---

## Known Issues & Notes

1. **YAML scientific notation**: Write `5.0e+5`, not `500.0e3`. PyYAML only recognizes a float when the exponent carries an explicit sign, so `yaml.safe_load()` returns `500.0e3` as a string.
2. **Smoke test QoE values**: In a 2-episode smoke test all algorithms report similar QoE values (~0.7-0.9) because the networks are barely trained. Clear differences only appear after full training (5000 episodes).
3. **GPU acceleration**: CUDA is detected automatically by default. CPU training is slower but fully functional.
4. **Random seed**: The default is seed=42; it can be changed in the config.

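
The string-parsing pitfall in note 1 can also be guarded against after loading. The sketch below assumes the loaded config is a nested structure of dicts and lists, as `yaml.safe_load()` returns, and converts any string that Python can parse as a float. `coerce_numeric` is a hypothetical helper that is not part of the repository, and the sample dictionary stands in for a loaded config.

```python
def coerce_numeric(value):
    """Recursively convert numeric-looking strings (e.g. '500.0e3') to float.

    Caveat: strings such as 'nan' or 'inf' are converted as well, so apply this
    only to configs whose string fields are not expected to hold such words.
    """
    if isinstance(value, dict):
        return {key: coerce_numeric(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce_numeric(item) for item in value]
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value  # a genuine string, leave it untouched
    return value


# Stand-in for a loaded config in which one value arrived as a string.
loaded = {"env": {"min_rate_req": "500.0e3", "bandwidth": 10.0e6},
          "run": {"name": "co_maddpg"}}
print(coerce_numeric(loaded))
```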
---

## Citation

If you cite this work, please refer to the paper:

> Co-MADDPG: Cooperative-Competitive Multi-Agent Resource Allocation for Semantic-Traditional Hybrid Wireless Communication

The paper files are in the `../paper/` directory.

---

## License

MIT License