Co-MADDPG: Cooperative-Competitive Multi-Agent Resource Allocation for Semantic-Traditional Hybrid Wireless Communication

Project Overview

This project implements Co-MADDPG, a multi-agent deep reinforcement learning framework that combines a Stackelberg game formulation with dynamic cooperation-competition switching. It allocates OFDMA radio resources in systems where semantic communication and traditional bit-level communication coexist.

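Key Innovations

Coopetition Game Modeling: resource competition between semantic users (Leader) and traditional users (Follower) is modeled as a Stackelberg game
Dynamic λ(t) Switching: λ(t) = sigmoid(β·(QoE_sys - Q_th)) switches adaptively between cooperation and competition according to the system QoE
Heterogeneous QoE: semantic users are scored on SSim plus compression ratio; traditional users are scored on rate satisfaction
CTDE Architecture: centralized training with decentralized execution, using a joint Critic network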
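
To make the switching rule concrete, here is a minimal sketch of λ(t) using only the standard library. The function name `dynamic_lambda` is hypothetical (it is not taken from the repository); the default `beta` and `q_threshold` mirror the values listed in the configuration section. Given the reward mix r_i = λ·r_coop + (1-λ)·r_comp, a system QoE above the threshold pushes λ toward 1 (cooperation), and a QoE below it pushes λ toward 0 (competition).

```python
import math


def dynamic_lambda(qoe_sys: float, beta: float = 5.0, q_threshold: float = 0.6) -> float:
    """Hypothetical helper: lambda(t) = sigmoid(beta * (QoE_sys - Q_th))."""
    return 1.0 / (1.0 + math.exp(-beta * (qoe_sys - q_threshold)))


# Sweep a few system QoE values around the threshold.
for qoe in (0.2, 0.6, 0.9):
    print(f"QoE_sys={qoe:.1f} -> lambda={dynamic_lambda(qoe):.3f}")
```

A larger `beta` makes the transition around `q_threshold` sharper; at `qoe_sys == q_threshold` the result is exactly 0.5 regardless of `beta`.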

Target Venue

IEEE Transactions on Communications (TCOM)

Requirements

Python Version

Python 3.8+

Dependencies

pip install numpy torch pyyaml matplotlib

Library	Version	Purpose
`numpy`	≥1.20	Numerical computation, channel modeling
`torch`	≥1.10 (CPU or GPU)	Neural network training
`pyyaml`	≥5.0	Configuration file loading
`matplotlib`	≥3.4	IEEE-style plotting

Hardware Recommendations

Scenario	Configuration
Smoke Test	CPU, 2-5 episodes, ~30 seconds
Short Training	CPU/GPU, 100-500 episodes, ~10-60 minutes
Full Training	GPU (CUDA), 5000 episodes, ~2-8 hours

Quick Start

1. Clone

git clone <repo-url>
cd SemantiCommunication/code

2. Smoke Test

# Train Co-MADDPG for 2 episodes (verifies the code path)
python train.py --algo co_maddpg --episodes 2 --steps 10

# Train all 8 algorithms for 2 episodes each
python train.py --algo all --episodes 2 --steps 10

3. Full Training

# Train a single algorithm (running the main algorithm first is recommended)
python train.py --algo co_maddpg --episodes 5000

# Train all 8 algorithms
python train.py --algo all --episodes 5000

# Use a specific configuration file
python train.py --algo co_maddpg --config configs/default.yaml --episodes 5000

4. Evaluation & Plotting

# Run all 8 evaluation scenarios and generate 12+ figures
python evaluate.py

# Use a specific results directory
python evaluate.py --results_dir results/

Supported Algorithms

#	Algorithm	CLI name	λ	Update	Critic type	Purpose
1	Co-MADDPG	`co_maddpg`	dynamic	Stackelberg	Joint (CTDE)	Proposed
2	PureCooperative	`pure_coop`	1.0	Simultaneous	Joint	Ablate competition
3	PureCompetitive	`pure_comp`	0.0	Simultaneous	Joint	Ablate cooperation
4	FixedLambda	`fixed_lambda`	0.5	Stackelberg	Joint	Ablate dynamic λ
5	IDDPG	`iddpg`	0.0	Simultaneous	Independent	Ablate CTDE
6	SingleAgentDQN	`single_dqn`	0.5	N/A	Centralized	Non-MARL baseline
7	EqualAllocation	`equal_alloc`	0.5	N/A	None	Lower bound
8	SemanticOnly	`semantic_only`	1.0	N/A	Single	Ablate heterogeneity

Project Structure

SemantiCommunication/code/
│
├── configs/                     # Configuration
│   ├── __init__.py
│   └── default.yaml             # Main config (hyperparameters, environment parameters)
│
├── envs/                        # Environment modules
│   ├── __init__.py
│   ├── channel_model.py         # 3GPP channel model (Eq.5-8)
│   ├── semantic_module.py       # Semantic similarity (SSim)
│   └── wireless_env.py          # Gym-like wireless environment
│
├── agents/                      # Core algorithm
│   ├── __init__.py
│   ├── actor.py                 # Actor network FC→Tanh→[0,1]
│   ├── critic.py                # Critic network (joint Q-value)
│   ├── noise.py                 # OU exploration noise
│   ├── replay_buffer.py         # 9-field replay buffer
│   └── co_maddpg.py             # Co-MADDPG main algorithm (★)
│
├── baselines/                   # 7 baseline algorithms
│   ├── __init__.py
│   ├── pure_coop.py             # λ=1, pure cooperation
│   ├── pure_comp.py             # λ=0, pure competition
│   ├── fixed_lambda.py          # λ=0.5, fixed
│   ├── iddpg.py                 # Independent DDPG (no CTDE)
│   ├── single_dqn.py            # Centralized DQN
│   ├── equal_alloc.py           # Equal allocation
│   └── semantic_only.py         # Semantic-only DDPG
│
├── utils/                       # Utility modules
│   ├── __init__.py
│   ├── metrics.py               # Evaluation metrics (Jain fairness, λ, rewards)
│   └── visualization.py         # IEEE-style plotting (12 figure types)
│
├── train.py                     # Training entry point (★)
├── evaluate.py                  # Evaluation entry point (★)
├── README.md                    # This file
├── ARCHITECTURE.md              # Architecture document
├── API.md                       # API reference
└── results/                     # Training output directory

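Configuration

The configuration file is configs/default.yaml. It has four main sections:

env (environment parameters)

Parameter	Default	Description
`num_subcarriers`	64	Number of OFDMA subcarriers N
`bandwidth`	10.0e+6	System bandwidth (Hz)
`subcarrier_spacing`	156250.0	Subcarrier spacing Δf (Hz)
`max_power`	1.0	Maximum transmit power (W)
`noise_psd`	-174	Noise power spectral density (dBm/Hz)
`carrier_freq`	3.5	Carrier frequency (GHz)
`num_semantic_users`	3	Number of semantic users K_s
`num_traditional_users`	3	Number of traditional users K_b
`min_rate_req`	5.0e+5	Minimum rate requirement for traditional users (bps)
`rho_max` / `rho_min`	1.0 / 0.05	Compression ratio range
`w1` / `w2`	0.7 / 0.3	Semantic QoE weights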
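
As a rough illustration of how `w1`, `w2`, `rho_max`, and `min_rate_req` enter the two QoE terms, the sketch below evaluates QoE_s = w1·SSim + w2·(1-ρ/ρ_max) and QoE_b = min(R_k/R_req, 1) as given in the key formulas. The function names are hypothetical, and SSim is passed in as a number because the a(ρ) and b(ρ) fitting functions are not specified in this README.

```python
def semantic_qoe(ssim: float, rho: float, rho_max: float = 1.0,
                 w1: float = 0.7, w2: float = 0.3) -> float:
    """QoE_s = w1 * SSim + w2 * (1 - rho / rho_max); hypothetical helper."""
    return w1 * ssim + w2 * (1.0 - rho / rho_max)


def traditional_qoe(rate_bps: float, min_rate_req: float = 5.0e5) -> float:
    """QoE_b = min(R_k / R_req, 1); hypothetical helper."""
    return min(rate_bps / min_rate_req, 1.0)


# A semantic user with SSim 0.9 at compression ratio 0.5,
# and a traditional user achieving half of its required rate.
print(semantic_qoe(ssim=0.9, rho=0.5))
print(traditional_qoe(rate_bps=2.5e5))
```

Under these formulas a semantic user is rewarded both for fidelity (high SSim) and for compressing harder (low ρ), while a traditional user gains nothing from exceeding its rate requirement.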

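training (training parameters)

Parameter	Default	Description
`max_episodes`	5000	Maximum number of training episodes
`max_steps`	200	Maximum steps per episode
`batch_size`	256	Batch size
`buffer_capacity`	100000	Replay buffer capacity
`actor_lr` / `critic_lr`	1e-4 / 3e-4	Learning rates
`gamma`	0.95	Discount factor
`tau`	0.01	Soft-update coefficient
`beta`	5.0	Steepness of the λ(t) sigmoid
`q_threshold`	0.6	λ(t) switching threshold Q_th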
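
For readers unfamiliar with how `gamma` and `tau` are typically used in DDPG-style methods, the sketch below shows the standard one-step TD target and the Polyak (soft) target-network update on plain Python lists. This is the textbook form, not code from this repository; the function names are hypothetical and the actual update in agents/co_maddpg.py may differ.

```python
from typing import List


def td_target(reward: float, next_q: float, done: bool, gamma: float = 0.95) -> float:
    """Standard one-step target: y = r + gamma * Q'(s', a') unless the episode ended."""
    return reward + gamma * (0.0 if done else next_q)


def soft_update(target: List[float], online: List[float], tau: float = 0.01) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


# With tau = 0.01 the target parameters move 1% of the way toward the online ones per update.
print(soft_update(target=[0.0, 1.0], online=[1.0, 1.0]))
print(td_target(reward=1.0, next_q=2.0, done=False))
```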

network (network architecture)

Parameter	Default	Description
`actor_hidden`	[256, 256, 128]	Actor hidden layers
`critic_hidden`	[512, 512, 256]	Critic hidden layers

reward (reward weights)

Parameter	Default	Description
`coop_self` / `coop_other` / `coop_sys`	0.5 / 0.3 / 0.2	Cooperative reward weights
`comp_self` / `comp_sys`	0.8 / 0.2	Competitive reward weights
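
One plausible reading of these weights is sketched below. The README states only the weights and the mix r_i = λ·r_coop + (1-λ)·r_comp, so the exact composition of r_coop and r_comp here is an assumption: each is taken to be a weighted sum of the agent's own QoE, the other group's QoE, and the system QoE. The function name and argument names are hypothetical.

```python
def hybrid_reward(qoe_self: float, qoe_other: float, qoe_sys: float, lam: float,
                  coop=(0.5, 0.3, 0.2), comp=(0.8, 0.2)) -> float:
    """Mix cooperative and competitive rewards with weight lam (assumed composition)."""
    coop_self, coop_other, coop_sys = coop
    comp_self, comp_sys = comp
    # Assumption: the cooperative reward also credits the other group's QoE.
    r_coop = coop_self * qoe_self + coop_other * qoe_other + coop_sys * qoe_sys
    # Assumption: the competitive reward ignores the other group's QoE.
    r_comp = comp_self * qoe_self + comp_sys * qoe_sys
    # Stated in the key formulas: r_i = lam * r_coop + (1 - lam) * r_comp.
    return lam * r_coop + (1.0 - lam) * r_comp


# lam = 1 is purely cooperative, lam = 0 is purely competitive.
for lam in (0.0, 0.5, 1.0):
    print(lam, hybrid_reward(qoe_self=0.8, qoe_other=0.4, qoe_sys=0.6, lam=lam))
```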

Key Formulas

Formula	Expression	Paper Eq.
Path Loss	`PL(d) = 36.7·log₁₀(d) + 22.7 + 26·log₁₀(fc)`	Eq.(5)
Channel Gain	`h_{k,n} ~ CN(0, 10^{-PL/10})`	Eq.(6)
Noise Power	`σ² = 10^{(N₀_dBm-30)/10} · Δf`	Eq.(7)
SNR	`γ_{k,n} = p_{k,n} · |h_{k,n}|² / σ²`	Eq.(8)
Semantic Similarity (SSim)	`φ(γ̄,ρ) = 1 - exp(-a(ρ)·γ̄^{b(ρ)})`	-
Semantic QoE	`QoE_s = 0.7·SSim + 0.3·(1-ρ/ρ_max)`	-
Traditional QoE	`QoE_b = min(R_k/R_req, 1)`	-
Dynamic λ	`λ(t) = sigmoid(β·(QoE_sys - Q_th))`	-
Hybrid Reward	`r_i = λ·r_coop + (1-λ)·r_comp`	-
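
The link-budget chain in Eq.(5)-(8) can be traced numerically with the standard library alone. The sketch below is an illustration rather than the code in envs/channel_model.py: the function names are hypothetical, the distance is assumed to be in meters and the carrier frequency in GHz, and the transmit power is split equally across subcarriers only for the sake of the example.

```python
import math
import random


def path_loss_db(d_m: float, fc_ghz: float = 3.5) -> float:
    """Eq.(5): PL(d) = 36.7*log10(d) + 22.7 + 26*log10(fc)."""
    return 36.7 * math.log10(d_m) + 22.7 + 26.0 * math.log10(fc_ghz)


def noise_power_w(noise_psd_dbm_hz: float = -174.0, delta_f_hz: float = 156250.0) -> float:
    """Eq.(7): convert the PSD from dBm/Hz to W/Hz, then scale by the subcarrier spacing."""
    return 10.0 ** ((noise_psd_dbm_hz - 30.0) / 10.0) * delta_f_hz


def sample_channel(pl_db: float, rng: random.Random) -> complex:
    """Eq.(6): draw h ~ CN(0, 10^(-PL/10)); each real component gets half the variance."""
    sigma = math.sqrt(10.0 ** (-pl_db / 10.0) / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def snr(power_w: float, h: complex, sigma2: float) -> float:
    """Eq.(8): gamma = p * |h|^2 / sigma^2 (linear scale)."""
    return power_w * abs(h) ** 2 / sigma2


rng = random.Random(42)
pl = path_loss_db(100.0)                 # user assumed to be 100 m from the base station
h = sample_channel(pl, rng)
gamma = snr(1.0 / 64, h, noise_power_w())  # 1 W split equally over 64 subcarriers
print(f"PL = {pl:.2f} dB, SNR = {10.0 * math.log10(gamma):.1f} dB")
```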

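Evaluation Scenarios

evaluate.py contains 8 evaluation scenarios, corresponding to the 12 figures in Section VII of the paper:

#	Scenario	Figure	Description
1	Convergence	Fig.2	Convergence curve comparison
2	QoE vs SNR	Fig.3	System QoE at different SNR levels
3	Fairness vs SNR	Fig.4	Jain fairness at different SNR levels
4	QoE vs Users	Fig.5	Scalability with the number of users
5	Rate Satisfaction vs Users	Fig.6	Rate satisfaction
6	Lambda Trajectory	Fig.7-8	λ(t) trajectory and scatter plot
7	Ablation Study	Fig.10	Ablation bar chart
8	Sensitivity	Fig.11-12	Sensitivity analysis for β and Q_th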
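
The fairness scenario relies on Jain's index. Since utils/metrics.py is not reproduced here, the sketch below uses the standard definition J(x) = (Σx)² / (n·Σx²) applied to per-user QoE values; the function name is hypothetical, and returning 0.0 for empty or all-zero input is a choice made for this example.

```python
from typing import Sequence


def jain_fairness(values: Sequence[float]) -> float:
    """Jain's index: 1.0 when all users are equal, 1/n when one user gets everything."""
    n = len(values)
    sum_sq = sum(v * v for v in values)
    if n == 0 or sum_sq == 0.0:
        return 0.0  # convention chosen here for degenerate input
    total = sum(values)
    return total * total / (n * sum_sq)


print(jain_fairness([0.8, 0.8, 0.8, 0.8]))  # perfectly even allocation
print(jain_fairness([1.0, 0.0, 0.0, 0.0]))  # one user takes everything
```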

Output Files

Files produced by training and evaluation are saved under results/:

results/
├── <algo_name>/
│   ├── model_s.pth              # Semantic agent model weights
│   ├── model_b.pth              # Traditional agent model weights
│   ├── training_log.json        # Training metrics log
│   └── config_snapshot.yaml     # Snapshot of the config used for training
├── figures/
│   ├── fig02_convergence.png
│   ├── fig03_qoe_vs_snr.png
│   ├── ...
│   └── fig12_qth_sensitivity.png
└── evaluation_results.json      # Evaluation summary data

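Known Issues & Notes

YAML scientific notation: write 5.0e+5 (not 500.0e3); otherwise yaml.safe_load() parses the value as a string. PyYAML only recognizes a float when the mantissa contains a decimal point and the exponent carries an explicit sign
Smoke-test QoE values: in a 2-episode smoke test all algorithms report similar QoE (~0.7-0.9) because the networks are barely trained. Clear differences only appear after full training (5000 episodes)
GPU acceleration: CUDA is detected automatically by default. CPU training is slower but fully functional
Random seed: defaults to seed=42 and can be changed in the configuration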
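
If a config value might still arrive as a string because of the YAML notation pitfall, one defensive option is to coerce float-like strings after loading. The helper below is a hypothetical sketch, not part of this repository; the dict literal stands in for what `yaml.safe_load()` could return when `500.0e3` or `1e-4` is written in the file.

```python
import re

# Matches plain decimal or scientific notation such as "500.0e3", "1e-4", "-3.5".
_FLOAT_LIKE = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")


def coerce_numeric(node):
    """Recursively convert float-like strings in a loaded config to float.

    Caveat: every numeric-looking string is converted, so a value that is
    meant to stay a string (for example "42" or "1.10") would also change.
    """
    if isinstance(node, dict):
        return {key: coerce_numeric(value) for key, value in node.items()}
    if isinstance(node, list):
        return [coerce_numeric(value) for value in node]
    if isinstance(node, str) and _FLOAT_LIKE.match(node.strip()):
        return float(node)
    return node


# Stand-in for a loaded config in which two numbers were parsed as strings.
loaded = {"env": {"min_rate_req": "500.0e3", "max_power": 1.0},
          "training": {"actor_lr": "1e-4", "max_steps": 200}}
print(coerce_numeric(loaded))
```

Writing the values in the `5.0e+5` form remains the simpler fix; the helper only guards against configs that were authored elsewhere.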

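Citation

If you cite this work, please refer to the paper:

Co-MADDPG: Cooperative-Competitive Multi-Agent Resource Allocation for Semantic-Traditional Hybrid Wireless Communication

The paper files are in the ../paper/ directory.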

License

MIT License
