2478 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 43, NO. 7, JULY 2025

Hybrid Digital-Analog Semantic Communications

Huiqiang Xie, Member, IEEE, Zhijin Qin, Senior Member, IEEE, Zhu Han, Fellow, IEEE, and Khaled B. Letaief, Fellow, IEEE
Abstract—Digital and analog semantic communications (SemCom) face inherent limitations, such as data security concerns in analog SemCom and leveling-off and cliff-edge effects in digital SemCom. To overcome these challenges, we propose a novel SemCom framework and a corresponding system called HDA-DeepSC, which leverages a hybrid digital-analog approach for multimedia transmission. This is achieved through the introduction of analog-digital allocation and fusion modules. To strike a balance between data rate and distortion, we design new loss functions that take into account long-distance dependencies in the semantic distortion constraint, essential information recovery in the channel distortion constraint, and optimal bit stream generation in the rate constraint. Additionally, we propose denoising diffusion-based signal detection techniques, which involve carefully designed variance schedules and sampling algorithms to refine transmitted signals. Through extensive numerical experiments, we demonstrate that HDA-DeepSC is robust to channel variations and capable of supporting various communication scenarios. Our proposed framework outperforms existing benchmarks in terms of peak signal-to-noise ratio and multi-scale structural similarity, showcasing its superior semantic communication quality.

Index Terms—Semantic communications, multimedia transmission, analog communications, digital communications, hybrid digital-analog communications.

Received 15 May 2024; revised 16 December 2024; accepted 15 January 2025. Date of publication 10 April 2025; date of current version 19 June 2025. This work was supported in part by the National Key Research and Development Program of China under Grant 2023YFB2904300; in part by the National Natural Science Foundation of China (NSFC) under Grant 62401227 and Grant 62293484; in part by Guangzhou Municipal Science and Technology Project under Grant 2025A04J3380; in part by Fundamental Research Funds for the Central Universities under Grant 21624349; in part by the Hong Kong Research Grants Council under the Areas of Excellence Scheme under Grant AoE/E-601/22-R; in part by NSF ECCS-2302469, Toyota; in part by Amazon; and in part by the Japan Science and Technology Agency (JST) Adopting Sustainable Partnerships for Innovative Research Ecosystem (ASPIRE) under Grant JPMJAP2326. An earlier version of this paper was presented in part at the IEEE Globecom Workshop 2024 [1]. (Corresponding author: Zhijin Qin.)

Huiqiang Xie is with the College of Information Science and Technology, Jinan University, Guangzhou 510632, China (e-mail: huiqiangxie@jnu.edu.cn).

Zhijin Qin is with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China, also with the State Key Laboratory of Space Network and Communications, Beijing 100084, China, and also with Beijing National Research Center for Information Science and Technology, Beijing 100084, China (e-mail: qinzhijin@tsinghua.edu.cn).

Zhu Han is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 446-701, South Korea (e-mail: hanzhu22@gmail.com).

Khaled B. Letaief is with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China (e-mail: eekhaled@ust.hk).

Digital Object Identifier 10.1109/JSAC.2025.3559149

© 2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

AS MOBILE communication systems transition from the fifth generation (5G) to the sixth generation (6G), there is a need to address the evolving requirements of seamlessly integrating virtual/augmented reality, remote control robots, and ubiquitous connected intelligence. To meet these demands, target key performance indicators [2] have been proposed, aiming to ensure the seamless integration of these advanced technologies in the next generation of mobile communication systems, e.g., $10^7$ devices/km$^2$ for connectivity, 60 b/s/Hz for spectral efficiency, and 100 µs for end-to-end latency. To materialize this vision, semantic communications [3] have been envisioned as one of the potential technologies due to their low semantic errors, high spectral efficiency, and high transmission rates. By exchanging semantic information at both ends, semantic communications can reconstruct sources or directly perform tasks while tolerating transmission errors. According to the communication paradigm, semantic communications can be divided into two categories: analog semantic communications and digital semantic communications.

Analog semantic communications [4], [5], [6], [7], [8], [9], [10], [11], [12] convey the semantic information using continuous signals. They take advantage of deep learning (DL) to design end-to-end systems and map the source directly to non-fixed-size constellations. Many works address the transmission of different data modalities. Xie et al. [4] have developed a DL-based semantic communication system, named DeepSC, for text transmission, in which the sentences are mapped to embedding vectors and then transformed into learned non-fixed-size constellation points. Yi et al. [5] introduced an explicit knowledge base into DeepSC as side information and integrated the knowledge base into the end-to-end optimization, achieving a higher bilingual evaluation understudy (BLEU) score in the low signal-to-noise ratio (SNR) region. Weng et al. [6] have proposed an end-to-end semantic communication system for speech recognition and speech synthesis tasks, named DeepSC-ST, in which the speech signals are processed by DeepSC-ST and output as continuous constellation points at the transmitter. Grassucci et al. [7] have designed a generative audio semantic communication framework, which transmits continuous embedding vectors to generate the audio at the receiver. Dai et al. [9] have investigated the end-to-end image transmission problem, in which the image is non-linearly transformed into continuous signals of different lengths. Wu et al. [11] have investigated end-to-end image transmission over multiple-input multiple-output (MIMO) channels; similarly, the images are converted into continuous semantic features and adaptively assigned to different subchannels based on the channel state information (CSI). Wang et al. [12] have proposed a video semantic communication system, in which the semantic features of frames are extracted into continuous signals and transmitted using analog communication methods.

The continuous signals in analog semantic communications have two benefits. One is that they allow gradient propagation and enable end-to-end optimization. The other is that continuous signals have a high degree of freedom, which lets the performance vary smoothly with channel conditions and enables better robustness in the low SNR regime. However, continuous signals also have flaws. Commercial encryption algorithms are designed for discrete signals, e.g., bit streams, raising concerns about the data security of continuous signal-based systems. Besides, in scenarios that require accurate transmission at the bit level, analog semantic communications cannot meet the requirement because the candidate set of continuous signals is practically infinite. Therefore, digital semantic communications have attracted the attention of researchers.

Digital semantic communications [13], [14], [15], [16], [17], [18], [19], [20], [21] transmit semantic information in the form of discrete signals, mapping the source to bit streams or fixed-size constellations. Tung et al. [13] have proposed quantized joint source-channel coding for image transmission, named DeepJSCC-Q, which maps the continuous signals to the closest points in fixed-size constellations to be compatible with existing protocols. Similarly, Bo et al. [14] improved quantized joint source-channel coding by learning the transition probability from source data to discrete constellation symbols, in which Gumbel-Max sampling is employed to sample the constellation points from the learned transition probability, thereby avoiding non-differentiable quantization. Guo et al. [16] quantized the semantic information with a learnable non-linear scalar quantizer, which learns to adopt dynamic quantization levels for different semantic values. Fu et al. [18] have proposed a vector quantized semantic communication system, in which the semantic vectors are quantized into bit streams with a learnable vector quantizer and transmitted with digital channel coding and modulation. Gao et al. [20] have developed an adaptive modulation and retransmission scheme by deriving the relationship between the bit-error-rate and the task performance, in which the semantic information is quantized into fixed-length bit streams. Huang et al. [21] have proposed an iterative training algorithm for digital semantic communications, in which the deep source codec is trained according to the chosen channel coding rate.

The above works on digital semantic communication achieve accurate transmission at the bit or symbol level, and some of them can apply encryption algorithms to the bit streams. However, digital semantic communication systems introduce unavoidable quantization errors when quantizing continuous signals into discrete signals, which causes the leveling-off effect. That is, the quality of the decoded source signal is limited by the quantization errors. Besides, digital semantic communications experience the cliff-edge effect under varying channel conditions, which usually results in a drastic performance degradation at lower SNRs. Therefore, it is imperative to adopt a new semantic communication paradigm that can address the limitations of both analog and digital semantic communications. This paradigm should enhance data security and mitigate the leveling-off and cliff-edge effects. However, designing such a semantic communication system poses several challenges that need to be overcome, namely,

Q1: How to enhance data security and alleviate the leveling-off and cliff-edge effects?
Q2: How can it be compatible with purely analog and digital semantic communications?
Q3: How to support various communication environments, e.g., the wide bandwidth scenario or the weak communication scenario?

The concept of hybrid digital-analog (HDA) joint source-channel codes [22] was proposed by Mittal et al. in 2002, who proved that HDA codes are theoretically capable of achieving the Shannon limit (the theoretically optimum distortion) with less severe leveling-off and cliff-edge effects. Since then, HDA codes have attracted much attention from academia and industry [23], [24], [25], [26], [27]. Skoglund et al. [24] have proposed HDA codes for bandwidth compression scenarios, and Köken et al. [26] have analyzed the robustness of HDA codes under bandwidth mismatch. HDA transmission is also adopted in Japanese and Canadian television signal transmission [28], where video and speech signals are transmitted by analog and digital transceivers, respectively. Yu et al. [29] have designed HDA joint source-channel coding for scalable video transmission, named WSVC, which uses the 2D discrete wavelet transform for analog transmission and H.264/AVC for digital transmission. Lan et al. [30] first formulated the video transmission distortions and then proposed a sub-optimal resource allocation scheme that allocates the power and quantization bits. Tan et al. [31] have proposed optimal resource allocation for the Internet-of-things (IoT) scenario, in which three factors are optimized to enhance the quality of the recovered image: the digital bandwidth, the orthogonal power, and the nonorthogonal power of the analog signal. Yahampath [32] has considered imperfect channel state information (CSI) for video transmission, in which the digital power is allocated by considering the CSI errors, and the remaining power is used to transmit superimposed analog QAM symbols. However, these works rely on linear transforms and ignore the semantic information behind data, which makes them unsuitable for non-linear semantic transmission.

Inspired by the concept of HDA codes, we propose a novel framework called DL-based HDA semantic communication. This framework integrates the strengths of both analog and digital semantic communications to effectively tackle the challenges mentioned earlier. Firstly, HDA semantic communication systems can improve data security and alleviate the leveling-off and cliff-edge effects by transmitting part of the information with continuous signals in analog communications (Q1). Besides, analog and digital semantic communications are special cases of HDA semantic communications. By controlling the ratio between the analog and digital components, HDA semantic communications can not only be transformed into purely analog or digital semantic communications (Q2) but also support different communication scenarios (Q3). The main contributions are summarized as follows:

• A novel HDA semantic communication framework is proposed, which takes advantage of analog and digital semantic communications and addresses the limitations inherent in each.
• Based on the HDA semantic communication framework, we propose an HDA semantic communication system,
named HDA-DeepSC, for multimedia transmission, in which new analog-digital allocation and fusion modules are proposed to generate the analog and digital components. Besides, new loss functions are designed to capture the local and global information, alleviate the distortions from channels, and balance the source rate.
• To further improve the quality of the recovered images, we propose a diffusion-based signal detection framework by designing the variance schedule and sampling algorithm.
• Based on extensive simulation results, the proposed HDA-DeepSC outperforms the conventional and DL-based communication systems and improves the system robustness in the low SNR regime.

The rest of this paper is organized as follows. The system model is introduced in Section II. The HDA semantic transmission is proposed in Section III. Section IV details the proposed diffusion-based signal detection. Numerical results are presented in Section V to show the performance of the proposed frameworks. Finally, Section VI concludes this paper.

Notation: Bold-font variables denote matrices or vectors. $\mathbb{C}^{n\times m}$ and $\mathbb{R}^{n\times m}$ represent complex and real matrices of size $n\times m$, respectively. $\mathcal{CN}(\mu,\sigma^2)$ denotes the circularly-symmetric complex Gaussian distribution with mean $\mu$ and covariance $\sigma^2$. $\mathcal{N}(\mu,\sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and covariance $\sigma^2$. $\mathcal{U}(a,b)$ denotes the continuous uniform distribution between $a$ and $b$. $(\cdot)^*$ denotes the conjugate operation. $\mathbf{x}[k]$ represents the $k$-th element of the vector $\mathbf{x}$.

II. SYSTEM MODEL

As shown in Fig. 1, we consider a single-input single-output (SISO) communication system, which aims to send multimedia over the air. The proposed HDA SemCom framework consists of the HDA transmitter, the wireless channel model, and the HDA receiver, and employs both digital semantic transmission and analog semantic transmission.

Fig. 1. The proposed hybrid digital-analog semantic communication framework.

A. The Hybrid Digital-Analog Transmitter

The HDA transmitter consists of a semantic encoder that extracts the semantic information behind images, an analog-digital allocation module that allocates the semantic information to analog and digital transmission, and channel encoders that protect the information over the air.

Given an image $\mathbf{I}\in\mathbb{R}^{3\times H\times W}$, where $H$ and $W$ are the height and width of the image, the semantic information can be extracted by
$$\mathbf{z} = \mathcal{S}(\mathbf{I};\boldsymbol{\alpha}_t), \tag{1}$$
where $\mathbf{z}\in\mathbb{R}^{M\times 1}$ is the semantic information and $\mathcal{S}(\cdot;\boldsymbol{\alpha}_t)$ denotes the semantic encoder with parameters $\boldsymbol{\alpha}_t$. Then, $\mathbf{z}$ is split into two parts by the analog-digital allocation module,
$$[\mathbf{z}_A,\mathbf{z}_D] = \mathcal{A}(\mathbf{z};\boldsymbol{\theta}_t), \tag{2}$$
where $\mathbf{z}_A$ and $\mathbf{z}_D$ are the semantic information transmitted by the analog transmitter and the digital transmitter, respectively, and $\mathcal{A}(\cdot;\boldsymbol{\theta}_t)$ is the analog-digital allocation module with parameters $\boldsymbol{\theta}_t$.

1) Analog Transmitter: The encoded symbols for analog semantic transmission are represented as
$$\mathbf{x}_A = \mathcal{C}_A(\mathbf{z}_A;\boldsymbol{\beta}_t), \tag{3}$$
where $\mathbf{x}_A\in\mathbb{C}^{L_A\times 1}$ is the vector of encoded complex symbols and $\mathcal{C}_A(\cdot;\boldsymbol{\beta}_t)$ denotes the analog channel encoder with parameters $\boldsymbol{\beta}_t$.

2) Digital Transmitter: A quantizer and an entropy coder are first employed to convert $\mathbf{z}_D$ into bit streams,
$$\mathbf{b} = \mathcal{E}\big(\mathcal{Q}(\mathbf{z}_D)\big), \tag{4}$$
where $\mathbf{b}$ is the bit stream, and $\mathcal{Q}(\cdot)$ and $\mathcal{E}(\cdot)$ denote the quantizer and the entropy encoder, respectively. Then, $\mathbf{b}$ is encoded with digital channel encoders (e.g., LDPC codes) and mapped onto fixed-size constellations (e.g., 16-QAM) by
$$\mathbf{x}_D = \mathcal{M}\big(\mathcal{C}_D(\mathbf{b})\big), \tag{5}$$
where $\mathbf{x}_D\in\mathbb{C}^{L_D\times 1}$ is the vector of encoded symbols, $\mathcal{M}(\cdot)$ represents the fixed-size modulation, and $\mathcal{C}_D(\cdot)$ denotes the digital channel encoder.

With the analog and digital symbols, the transmitted symbols are $\mathbf{x} = [\mathbf{x}_A,\mathbf{x}_D]\in\mathbb{C}^{L\times 1}$, where $L = L_A + L_D$. The bandwidth compression ratio is defined as $\eta = \frac{L}{3\times H\times W}$.

B. Wireless Channel Model

When $\mathbf{x}$ is transmitted over block fading channels, the received signal can be given by
$$\mathbf{y} = h\mathbf{x} + \mathbf{n}, \tag{6}$$
where $h$ is the channel coefficient, which remains constant within a channel coherence time, and $\mathbf{n}$ is the additive white Gaussian noise (AWGN) with $\mathbf{n}\sim\mathcal{CN}(\mathbf{0},\sigma_n^2\mathbf{I}_L)$. For the Rayleigh
fading channel, the channel coefficient follows $h\sim\mathcal{CN}(0,1)$; for the Rician fading channel, it follows $h\sim\mathcal{CN}(\mu_h,\sigma_h^2)$ with $\mu_h=\sqrt{r/(r+1)}$ and $\sigma_h=\sqrt{1/(r+1)}$, where $r$ is the Rician coefficient. The SNR is defined as $\mathbb{E}(\|\mathbf{x}\|^2)/\mathbb{E}(\|\mathbf{n}\|^2)$.

C. The Hybrid Digital-Analog Receiver

The receiver comprises signal detection that estimates the transmitted symbols, an analog-digital fusion module that fuses the digital and analog semantic information, channel decoders that alleviate the distortions from the wireless channels, and a semantic decoder that recovers the images from the received semantic information.

With least squares (LS) signal detection, the transmitted symbols can be estimated by
$$\hat{\mathbf{x}} = \frac{h^*}{|h|^2}\mathbf{y} = \mathbf{x} + \frac{h^*}{|h|^2}\mathbf{n}, \tag{7}$$
where $\hat{\mathbf{x}} = [\hat{\mathbf{x}}_A,\hat{\mathbf{x}}_D]$ represents the estimated symbols. We assume that perfect CSI of $h$ is available. After signal detection, the semantic features are recovered by the analog and digital receivers, respectively.

1) Analog Receiver: The semantic features transmitted by analog communications are estimated by
$$\hat{\mathbf{z}}_A = \mathcal{C}_A^{-1}(\hat{\mathbf{x}}_A;\boldsymbol{\beta}_r), \tag{8}$$
where $\hat{\mathbf{z}}_A$ is the estimated semantic features and $\mathcal{C}_A^{-1}(\cdot;\boldsymbol{\beta}_r)$ denotes the analog channel decoder with parameters $\boldsymbol{\beta}_r$.

2) Digital Receiver: For digital semantic transmission, the transmitted bit streams are first recovered by
$$\hat{\mathbf{b}} = \mathcal{C}_D^{-1}\big(\mathcal{M}^{-1}(\hat{\mathbf{x}}_D)\big), \tag{9}$$
where $\mathcal{C}_D^{-1}(\cdot)$ represents the digital channel decoder and $\mathcal{M}^{-1}(\cdot)$ denotes the fixed-size demodulation. Then, the semantic features transmitted with digital semantic transmission are recovered by
$$\hat{\mathbf{z}}_D = \mathcal{Q}^{-1}\big(\mathcal{E}^{-1}(\hat{\mathbf{b}})\big), \tag{10}$$
where $\mathcal{E}^{-1}(\cdot)$ and $\mathcal{Q}^{-1}(\cdot)$ denote the entropy decoder and the dequantizer, respectively.

With $\hat{\mathbf{z}}_A$ and $\hat{\mathbf{z}}_D$, the semantic features are fused by
$$\hat{\mathbf{z}} = \mathcal{A}^{-1}(\hat{\mathbf{z}}_A,\hat{\mathbf{z}}_D;\boldsymbol{\theta}_r), \tag{11}$$
where $\hat{\mathbf{z}}$ is the recovered semantic information and $\mathcal{A}^{-1}(\cdot;\boldsymbol{\theta}_r)$ represents the analog-digital fusion module with parameters $\boldsymbol{\theta}_r$.

Finally, the transmitted image can be reconstructed by
$$\hat{\mathbf{I}} = \mathcal{S}^{-1}(\hat{\mathbf{z}};\boldsymbol{\alpha}_r), \tag{12}$$
where $\mathcal{S}^{-1}(\cdot;\boldsymbol{\alpha}_r)$ represents the semantic decoder with parameters $\boldsymbol{\alpha}_r$.

III. HYBRID DIGITAL-ANALOG SEMANTIC TRANSMISSION

In this section, we design an HDA semantic communication system, named HDA-DeepSC, for heterogeneous wireless communication environments. Then, we develop new loss functions to train HDA-DeepSC with the proposed training algorithm.

A. Model Design

The proposed HDA-DeepSC is shown in Fig. 2. The design of each module is detailed below.

1) Semantic Codec: The semantic encoder comprises a convolutional layer and a residual Swin Transformer block. The first convolutional layer projects the images into vector-shaped tokens, which are used as inputs to the residual Swin Transformer block in a permutation-invariant manner. The residual Swin Transformer block consists of several Swin Transformer layers and a convolutional layer, in which the Swin Transformer layer [33] originates from the Transformer and introduces local attention and a shifted window mechanism to improve visual semantic understanding. Besides, a convolutional layer with spatially invariant filters in the residual block can enhance the translational equivariance. The residual connection allows for aggregation of the shallow and deep semantic features.

Similarly, the semantic decoder consists of the residual Swin Transformer block, convolutional layers, and pixel shuffle. The residual block enhances the visual semantic understanding. The residual connection provides a short connection from the semantic encoder to the semantic decoder, allowing the reconstruction process to fuse varying levels of features. The convolutional layers and pixel shuffle form the reconstruction module, in which the sub-pixel convolutional layer upsamples the features and pixel shuffle reallocates the features to reconstruct the transmitted images.

2) Analog-Digital Allocation and Fusion: At the transmitter, the analog-digital allocation module transforms the original semantic information into essential and auxiliary semantic information. The essential semantic information plays an important role in building the images, while the other parts of the semantic information work to improve the quality of the image. The essential part includes the basic information about the image, e.g., the low-frequency information, and needs to be delivered accurately and with encryption: without the essential part, the image cannot be built. However, analog semantic transmission relies on continuous signals, which are not compatible with discrete encryption algorithms. Therefore, the essential part is transmitted accurately by digital communication systems, in which data encryption methods (e.g., symmetric cryptography and asymmetric cryptography) can be applied to the bit streams to guarantee the data security of the essential part. A hyper codec is proposed to extract the essential part of the original semantic information, which is given by
$$\mathbf{z}_D = \mathcal{H}(\mathbf{z};\boldsymbol{\theta}_t), \tag{13}$$
where $\mathcal{H}(\cdot;\boldsymbol{\theta}_t)$ denotes the hyper encoder. As shown in Fig. 2, the hyper encoder employs two convolutional layers to downsample the original semantic information, which enables a larger receptive field and extracts the essential semantic information.

The auxiliary part helps improve the quality of the recovered image and is transmitted by analog communication systems, which brings the following benefits: analog communication systems do not suffer from the cliff effect and are suitable for optimizing systems in an end-to-end manner. To extract the auxiliary part, we first analyze the entropy of $\mathbf{z}$ conditioned on $\tilde{\mathbf{z}}$, $H(\mathbf{z}|\tilde{\mathbf{z}})$, which
quantifies the uncertainty about $\mathbf{z}$ when $\tilde{\mathbf{z}}$ is known. In other words, it measures the remaining information of $\mathbf{z}$ when $\tilde{\mathbf{z}}$ is known. The lower bound of $H(\mathbf{z}|\tilde{\mathbf{z}})$ is derived by
$$H(\mathbf{z}|\tilde{\mathbf{z}}) = H(\mathbf{z},\tilde{\mathbf{z}}) - H(\tilde{\mathbf{z}}) \ge H(\mathbf{z}) - H(\tilde{\mathbf{z}}), \tag{14}$$
where the equality holds when $\tilde{\mathbf{z}}$ is close to $\mathbf{z}$. Here, $\tilde{\mathbf{z}} = \mathcal{H}^{-1}\big(\mathcal{Q}^{-1}(\mathcal{Q}(\mathbf{z}_D));\boldsymbol{\theta}_r\big)$ is the semantic information recovered from the essential part without considering transmission errors, and $\mathcal{H}^{-1}(\cdot;\boldsymbol{\theta}_r)$ denotes the hyper decoder, in which two convolutional layers are employed to upsample and recover the basic semantic information.

By observing (14), we can obtain the remaining information of $\mathbf{z}$ when $\tilde{\mathbf{z}}$ is known, i.e., the auxiliary part, by
$$\mathbf{z}_A = \mathbf{z} - \tilde{\mathbf{z}}, \tag{15}$$
where $\mathbf{z}_A$ is transmitted by analog communications. The derivation is in Appendix A.

At the receiver, the analog-digital fusion module is employed to obtain the fine semantic information by fusing the essential and auxiliary parts, which is given by
$$\hat{\mathbf{z}} = \mathcal{A}^{-1}(\hat{\mathbf{z}}_A,\hat{\mathbf{z}}_D;\boldsymbol{\theta}_r) = \mathcal{H}^{-1}(\hat{\mathbf{z}}_D;\boldsymbol{\theta}_r) + \hat{\mathbf{z}}_A, \tag{16}$$
where $\mathcal{H}^{-1}(\cdot;\boldsymbol{\theta}_r)$ shares the same weights as the hyper decoder in the transmitter.

Fig. 2. The structure of the proposed hybrid digital-analog semantic communication system.

The design of analog-digital allocation and fusion can also be viewed as coarse-to-fine processing. The digital and analog components transmit the coarse and auxiliary semantic information, i.e., the basics and the supplements of the image, respectively. The receiver fuses the coarse and auxiliary semantic information to obtain fine semantic information, which is used to recover high-fidelity images.

3) Digital Transceiver: The quantizer module rounds the elements of $\mathbf{z}_D$ to the nearest integers, yielding $\tilde{\mathbf{z}}_D$. Then, arithmetic coding, a kind of entropy coding, converts $\tilde{\mathbf{z}}_D$ into bit streams. Entropy coding requires the distribution of $\tilde{\mathbf{z}}_D$ in advance. Similarly to [34], we model $\tilde{\mathbf{z}}_D$ using a non-parametric, fully factorized density model,
$$p(\tilde{\mathbf{z}}_D|\boldsymbol{\psi}) = \prod_i \Big(p_{\tilde{z}_D[i]|\psi[i]}\big(\psi[i]\big) * \mathcal{U}\big(-\tfrac{1}{2},\tfrac{1}{2}\big)\Big)\big(\tilde{\mathbf{z}}_D[i]\big), \tag{17}$$
where $\psi[i]$ denotes the parameters of each univariate distribution $p_{\tilde{z}_D[i]|\psi[i]}$. As is common, we model the quantization errors with the uniform distribution. Therefore, we convolve each non-parametric density with a standard uniform density to better match the prior of $\tilde{\mathbf{z}}_D$.

For the digital channel codec and modulation, we adopt adaptive modulation and coding for different SNRs.
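The allocation in (13) to (15), the fusion in (16), and the rate term built on (17) can be illustrated with a toy, dependency-free Python sketch. It is not the HDA-DeepSC implementation, and several pieces are swapped in as assumptions:

- The learned hyper encoder and decoder are replaced by hypothetical stand-ins. `hyper_encode` performs 2x average pooling, and `hyper_decode` performs nearest-neighbour 2x upsampling.
- The non-parametric density of (17) is replaced by a zero-mean Gaussian with an assumed scale. It is convolved with $\mathcal{U}(-\frac{1}{2},\frac{1}{2})$, so that each integer symbol $k$ receives the Gaussian mass on $[k-\frac{1}{2},k+\frac{1}{2}]$.
- The bit count uses a base-2 logarithm, i.e., (24) expressed in bits.

```python
import math
import random


def hyper_encode(z):
    """Hypothetical stand-in for the hyper encoder H(z; theta_t) in (13):
    2x average pooling replaces the two learned convolutional layers."""
    assert len(z) % 2 == 0, "toy hyper encoder expects an even-length vector"
    return [(z[i] + z[i + 1]) / 2 for i in range(0, len(z), 2)]


def hyper_decode(z_d):
    """Hypothetical stand-in for the hyper decoder H^{-1}(.; theta_r):
    nearest-neighbour 2x upsampling replaces the learned layers."""
    out = []
    for v in z_d:
        out.extend([v, v])
    return out


def gaussian_cdf(x, scale):
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def bits_for_symbol(k, scale):
    """-log2 p(k), where p is an assumed zero-mean Gaussian convolved with
    U(-1/2, 1/2), i.e. the Gaussian mass on [k - 1/2, k + 1/2]."""
    p = gaussian_cdf(k + 0.5, scale) - gaussian_cdf(k - 0.5, scale)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0) far in the tail


def hda_split(z, scale=2.0):
    """Transmitter side: returns (quantized essential part, auxiliary part, bits)."""
    z_d = hyper_encode(z)                      # essential part, cf. (13)
    z_d_tilde = [round(v) for v in z_d]        # quantizer: nearest integer
    z_tilde = hyper_decode(z_d_tilde)          # error-free recovery (z-tilde)
    z_a = [a - b for a, b in zip(z, z_tilde)]  # auxiliary part, cf. (15)
    rate = sum(bits_for_symbol(k, scale) for k in z_d_tilde)  # cf. (24), in bits
    return z_d_tilde, z_a, rate


def hda_fuse(z_d_hat, z_a_hat):
    """Receiver side, cf. (16): H^{-1}(z_D_hat) + z_A_hat."""
    return [a + b for a, b in zip(hyper_decode(z_d_hat), z_a_hat)]


if __name__ == "__main__":
    random.seed(0)
    z = [random.gauss(0.0, 3.0) for _ in range(8)]
    z_d, z_a, rate = hda_split(z)
    # Error-free digital link; the analog residual goes through real-valued AWGN.
    z_a_hat = [v + random.gauss(0.0, 0.1) for v in z_a]
    z_hat = hda_fuse(z_d, z_a_hat)
    print("estimated digital bits:", round(rate, 2))
    print("max |z - z_hat|:", max(abs(a - b) for a, b in zip(z, z_hat)))
```

Because $\mathbf{z}_A$ is defined as the residual $\mathbf{z}-\tilde{\mathbf{z}}$, fusion returns $\mathbf{z}$ (up to floating-point rounding) when both links are error-free. In the toy run, the only remaining error comes from the noise added to the analog part. Symbols far from zero cost more bits under the assumed prior, which is what the rate term (24) penalizes during training.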
4) Analog Transceiver: The analog channel codec aims to compress the semantic features and transmit them effectively over the air. Similarly to previous works [35], the analog channel codec mainly employs fully connected layers to transmit the semantic information so that global semantic information is preserved. Compared with convolutional neural network (CNN) layers, which capture local information, dense layers are good at capturing global information and preserving the entire attributes, which matches the target of the analog channel codec. This can enhance the system's robustness to channel noise.

B. Loss Function Design

The wireless multimedia transmission problem can be viewed as the classical rate-distortion optimization problem, which includes distortion and rate constraints.

1) Loss Function Design for Distortion Constraints: The distortion constraint can be categorized into semantic and channel distortion constraints. For the semantic distortion constraint, in addition to the pixel difference considered in most works, we further introduce the frequency difference of the images. The designed loss function for the semantic distortion constraint is given by
$$\mathcal{L}_{SD} = \mathbb{E}\Big[\|\mathbf{I}-\hat{\mathbf{I}}\|^2 + \lambda_{\mathcal{F}}\big|\mathcal{F}(\mathbf{I})-\mathcal{F}(\hat{\mathbf{I}})\big|\Big], \tag{18}$$
where $\lambda_{\mathcal{F}}$ is the weight and $\mathcal{F}(\cdot)$ represents the Fourier transform. The first term in (18) refers to the pixel difference of the image; we assume that the pixels of the image follow the Gaussian distribution without loss of generality and thus employ the mean-square error (MSE) loss. The second term in (18) refers to the frequency difference of the image; to learn the long-range dependencies of the image, we design the Fourier-based loss function. In detail, we map the images into the frequency domain and compare the difference between the original and transmitted images. The reasons behind the design can be summarized as follows:
• The MSE loss guides the neural networks to recover the local pixels of the images by comparing the pixel difference, which ignores the long-range dependencies of the image.
• The Fourier-based loss can help the neural network learn the long-range dependencies of the image, because each frequency component in the frequency domain depends on pixels at all positions of the image.

For the channel distortion constraint, we consider the distortions from channels and the transmission of essential information. The designed loss function is given by
$$\mathcal{L}_{CD} = \mathbb{E}\big[\|\mathbf{z}-\hat{\mathbf{z}}\|\big] - I(\mathbf{z},\hat{\mathbf{z}}_D), \tag{19}$$
where the first term minimizes the distortions from channels and the second term maximizes the mutual information, which can be lower-bounded as
$$\begin{aligned} I(\mathbf{z},\hat{\mathbf{z}}_D) &\ge \iint p(\mathbf{z},\hat{\mathbf{z}}_D)\log q(\mathbf{z}|\hat{\mathbf{z}}_D,\boldsymbol{\theta}_r)\,d\mathbf{z}\,d\hat{\mathbf{z}}_D - \int p(\mathbf{z})\log p(\mathbf{z})\,d\mathbf{z}\\ &= \iint p(\mathbf{z})p(\hat{\mathbf{z}}_D|\mathbf{z})\log q(\mathbf{z}|\hat{\mathbf{z}}_D,\boldsymbol{\theta}_r)\,d\mathbf{z}\,d\hat{\mathbf{z}}_D + H(\mathbf{z})\\ &= \mathbb{E}_{\mathbf{z}\sim p(\mathbf{z})}\mathbb{E}_{\hat{\mathbf{z}}_D\sim p(\hat{\mathbf{z}}_D|\mathbf{z})}\big[\log q(\mathbf{z}|\hat{\mathbf{z}}_D,\boldsymbol{\theta}_r)\big] + H(\mathbf{z}), \end{aligned} \tag{20}$$
where the inequality follows from $\mathrm{KL}[p(\mathbf{z}|\hat{\mathbf{z}}_D), q(\mathbf{z}|\hat{\mathbf{z}}_D,\boldsymbol{\theta}_r)]\ge 0$, in which $\mathrm{KL}[\cdot,\cdot]$ is the Kullback-Leibler (KL) divergence and $q(\mathbf{z}|\hat{\mathbf{z}}_D,\boldsymbol{\theta}_r)$ is the variational approximation of $p(\mathbf{z}|\hat{\mathbf{z}}_D)$.

For the sake of argument, assume for a moment that the likelihood is given by
$$q(\mathbf{z}|\hat{\mathbf{z}}_D,\boldsymbol{\theta}_r) = \mathcal{N}\big(\widehat{\mathbf{z}},(2\lambda_z)^{-1}\mathbf{I}\big), \tag{21}$$
where $\widehat{\mathbf{z}} = \mathcal{H}^{-1}(\hat{\mathbf{z}}_D;\boldsymbol{\theta}_r)$. The log-likelihood then works out to be the squared difference between $\mathbf{z}$ and $\widehat{\mathbf{z}}$ weighted by $\lambda_z$, plus a constant. Then, $I(\mathbf{z},\hat{\mathbf{z}}_D)$ can be rewritten as
$$I(\mathbf{z},\hat{\mathbf{z}}_D) \ge -\lambda_z\mathbb{E}\big[\|\mathbf{z}-\widehat{\mathbf{z}}\|^2\big] + H(\mathbf{z}) + \text{constant}. \tag{22}$$
Substituting (22) into (19) and omitting the constant, $\mathcal{L}_{CD}$ can be written as
$$\mathcal{L}_{CD} \approx \mathbb{E}\big[\|\mathbf{z}-\hat{\mathbf{z}}\|\big] + \lambda_z\mathbb{E}\big[\|\mathbf{z}-\widehat{\mathbf{z}}\|^2\big] - H(\mathbf{z}). \tag{23}$$
If we freeze the semantic codec during training, $H(\mathbf{z})$ is constant and can be dropped from $\mathcal{L}_{CD}$.

2) Loss Function Design for Rate Constraints: For the rate constraint, the analog transmitter produces a fixed-length output. Therefore, we only consider the rate constraint for the digital transmitter, which is given by
$$\mathcal{L}_{Rate} = \mathbb{E}\big[-\log p(\tilde{\mathbf{z}}_D|\boldsymbol{\psi})\big], \tag{24}$$
where $p(\tilde{\mathbf{z}}_D|\boldsymbol{\psi})$ is given in (17). By minimizing the rate constraint, we can optimize the distribution of $\tilde{\mathbf{z}}_D$ and reduce the number of bits generated by arithmetic coding.

C. Training Details

The proposed training algorithm is shown in Algorithm 1. We adopt a three-stage training method. The first stage trains the semantic codec with $\mathcal{L}_{SD}$, which enables effective semantic extraction. After the semantic codec finishes training, the second stage trains the hybrid transceiver with $\mathcal{L}_{CD}+\lambda_r\mathcal{L}_{Rate}$, which aims to reduce the distortions from physical channels as well as the length of the bit streams. We can drop $H(\mathbf{z})$ from $\mathcal{L}_{CD}$ since the semantic codec is frozen during this stage. The non-differentiable operations, e.g., quantization, entropy coding, and modulation, block gradient back-propagation from the receiver to the transmitter. Therefore, we substitute additive uniform noise for the non-differentiable operations themselves during training,
|
||
between z and zˆ to make zˆ contains more information
|
||
D D i.e., z˜ = z +u in line 10 of Algorithm 1. Besides, we
|
||
D D
|
||
of z. However, directly optimizing the I(z,zˆ ) is hard. We
|
||
D choose the error-free transmission for the z˜ due to two
|
||
D
|
||
derive the lower bound of I(z,zˆ ) by
|
||
D factors, one is that the number of generated bit streams is
|
||
p(z ˆz ) muchsmallerthantheconventionalsourcecoding,e.g.,JPEG;
|
||
I(z,ˆz )= p(z,ˆz )log | D dzdˆz
|
||
D D p(z) D another one is the accurate bit transmission characteristic of
|
||
(cid:90)
|
||
digital communication. Finally, we train the whole network
|
||
≥ p(z,ˆz D )logq(z | ˆz D ,θ r )dzdˆz D with L SD +λ r L Rate to improve the quality of the recovered
|
||
(cid:90)
|
||
|
||
---PAGE BREAK---
|
||
|
||
2484 IEEEJOURNALONSELECTEDAREASINCOMMUNICATIONS,VOL.43,NO.7,JULY2025
|
||
Algorithm 1 HDA-DeepSC Training Algorithm
|
||
1 1
|
||
1
|
||
1
|
||
1
|
||
1
|
||
1 1
|
||
1 1
|
||
2
|
||
2
|
||
2
|
||
2
|
||
2
|
||
2
|
||
2
|
||
2
|
||
2
|
||
2
|
||
3
|
||
3
|
||
3
|
||
3
|
||
3
|
||
1
|
||
2
|
||
3
|
||
4
|
||
5
|
||
6
|
||
7
|
||
8
|
||
9
|
||
0 1
|
||
2
|
||
3
|
||
4
|
||
5
|
||
6 7
|
||
8 9
|
||
0
|
||
1
|
||
2
|
||
3
|
||
4
|
||
5
|
||
6
|
||
7
|
||
8
|
||
9
|
||
0
|
||
1
|
||
2
|
||
3
|
||
4
|
||
F
|
||
F
|
||
F
|
||
u
|
||
u
|
||
u
|
||
n
|
||
n
|
||
n
|
||
)( c e d o C ci t n a m e S ni a r T : n oi t c
|
||
I . t e s a t a d m o rf el p m a S : t u p nI
|
||
α ; I ( = z ) , t
|
||
S
|
||
ˆ 1 α ; z ( ) = I , − r S
|
||
e t u p m h ) 8 t o i 1 w C ( , D S
|
||
L (cid:127) α α , h ti w t n e c s e d t n ei d a r n G i a r T . D S t r L
|
||
1 α ; ) ( ) α ; ( d n a . : n r u t e R − t r S · · S )( r e vi e c s n a r T di r b y H ni a r T : n oi t c
|
||
m e S el p m a s d n a c e d o c ci t n a m e s e z e e r F : t u p nI
|
||
z . s e r u t a ef
|
||
: r e t ti m s n a r T
|
||
n oi t a c oll A l a ti gi D - g ol a n A / /
|
||
θ ; z ( = z ) , D t
|
||
H (cid:127) 1 1 ˜ , u u z = + z , , D D 2 2 U − 1 ˜ ˜ θ z ( ) ; = z , − D r
|
||
M̂ H ˜z z = z . A
|
||
− r e t ti m s n a r T l a ti gi D / /
|
||
˜z si d t n ei d a r g di o v a o t e t e i m rf - s r n o a r r r e T D
|
||
r e t ti m s n a r T g ol a n A / /
|
||
β z = x ( ) ; , A A A t C , n oi t a zil a m r o n r e w o P
|
||
x .ri a t e i h m t s r n e a v r o T A : r e vi e c e R
|
||
˜ y z e vi e h d c t ) e n i 6 w R a ( . D A
|
||
r e vi e c e R l a ti gi D / /
|
||
˜ ˆ z = z , D D
|
||
(cid:127) 1 ˆ = θ z z ( ) ; . − D r H
|
||
r e vi e c e R g ol a n A / /
|
||
ˆx y b n oi t c e t e d t l a e n g g ) o 7 i S t ( , A
|
||
1 ˆ ˆ z β x = ( ) ; . − A A r A C
|
||
n oi s u F l a ti gi D - g ol a n A / /
|
||
(cid:127) ˆ ˆ + z z = z . A
|
||
λ + e t u p m h ) ) d 4 3 t o n i 2 2 w C a ( ( . et D a C R r
|
||
L L(cid:127)
|
||
β β θ θ , , , h ti w t n e c s e d t n ei d a r n G i a r T t r t r
|
||
λ + . et D a C R r
|
||
L L
|
||
1 θ ; ( β β ; ; ( ) ( ) ) d n a , , , : n r u t e R − A t t r A · H C · · C
|
||
1 θ ; ( ) .
|
||
− r
|
||
· H
|
||
)( k r o w t e N el o h W ni a r T : n oi t c
|
||
I . t e s a t a d m o rf el p m a S : t u p nI
|
||
ˆI . t e g o t 3 d n a , 8 2 - 8 , 2 s e nil t a e p e R
|
||
λ + e t u p m h ) ) d 8 4 t o n i 1 2 w C a ( ( . et D a R S r
|
||
L L (cid:127) α β β α θ θ , , , , n e c s e d t n ei d a r n G i a r T , t t r r t r
|
||
λ + .
|
||
et D a R S r
|
||
L L
|
||
. C S p e e D - A D H e h T : n r u t e R
|
||
a
|
||
p a
|
||
t
|
||
n
|
||
p
|
||
w
|
||
ci t
|
||
a e
|
||
h ti
|
||
.r
|
||
Algorithm 2 HDA-DeepSC Inference Algorithm
|
||
image and reduce the number of bit streams in an end-to-end
|
||
manner, which converges to the global optimization.
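A minimal NumPy sketch of the semantic distortion loss in (18) follows. It rests on several assumptions:
• Images are float arrays of shape (H, W) or (H, W, C).
• $\mathcal{F}(\cdot)$ is taken as the 2D FFT over the two spatial axes.
• Expectations are replaced by element-wise means.
• The weight is $\lambda_F=0.1$, the value reported in Section V.
The function name `semantic_distortion_loss` is illustrative and is not the authors' code.

```python
import numpy as np


def semantic_distortion_loss(img, img_hat, lam_f=0.1):
    """Sketch of L_SD in (18): pixel MSE plus a weighted Fourier-domain term.

    Assumptions: arrays of shape (H, W) or (H, W, C); F(.) is the 2D FFT over
    the two spatial axes; expectations are replaced by element-wise means.
    """
    img = np.asarray(img, dtype=np.float64)
    img_hat = np.asarray(img_hat, dtype=np.float64)

    # First term of (18): pixel-domain squared error (MSE).
    pixel_term = np.mean((img - img_hat) ** 2)

    # Second term of (18): every FFT coefficient mixes pixels from all
    # positions, so this term penalizes errors in global image structure.
    freq_diff = np.fft.fft2(img, axes=(0, 1)) - np.fft.fft2(img_hat, axes=(0, 1))
    freq_term = np.mean(np.abs(freq_diff))

    return float(pixel_term + lam_f * freq_term)


# A constant offset of 1 on a 4x4 image: the MSE is 1, and only the DC bin of
# the FFT differs (by 16), so the mean absolute FFT difference is 16 / 16 = 1.
x = np.zeros((4, 4))
print(semantic_distortion_loss(x, x + 1.0))  # approximately 1.1
```

The example shows why the second term matters. A uniform brightness shift is a purely global error, and it lands entirely in one frequency bin.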
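The next sketch contrasts two treatments of $z_D$:
• During training, the substitution $\tilde{z}_D=z_D+u$ with $u\sim\mathcal{U}(-\frac{1}{2},\frac{1}{2})$.
• At inference, hard rounding.

It also estimates a bit count in the spirit of the rate loss (24). The density $p(\tilde{z}_D\mid\psi)$ of (17) is not reproduced here. As a stand-in assumption, each element is modeled by a zero-mean Gaussian of scale `scale`, integrated over a unit-width bin, so the numbers only illustrate the mechanism. The function names are hypothetical.

```python
import math

import numpy as np


def quantize_or_perturb(z_d, training, rng):
    """Training: add U(-1/2, 1/2) noise (differentiable proxy); inference: round."""
    z_d = np.asarray(z_d, dtype=np.float64)
    if training:
        return z_d + rng.uniform(-0.5, 0.5, size=z_d.shape)
    return np.round(z_d)


def _gaussian_cdf(v, scale):
    return 0.5 * (1.0 + math.erf(v / (scale * math.sqrt(2.0))))


def rate_bits(z_tilde, scale=1.0):
    """Sum of -log2 p over elements, using the stand-in Gaussian bin density (assumption)."""
    total = 0.0
    for v in np.asarray(z_tilde, dtype=np.float64).ravel():
        # Probability mass of the unit-width bin centred at v.
        p = _gaussian_cdf(v + 0.5, scale) - _gaussian_cdf(v - 0.5, scale)
        total += -math.log2(max(p, 1e-12))  # clamp to avoid log(0)
    return total


rng = np.random.default_rng(0)
z_d = np.array([0.4, 1.6, -2.3])
print(quantize_or_perturb(z_d, training=False, rng=rng))  # rounded values: 0, 2, -2
print(rate_bits(np.round(z_d)))  # values far from zero cost more bits
```

The noise proxy keeps every element within half a quantization step of $z_D$. Gradients can therefore flow through the addition, whereas `np.round` has zero gradient almost everywhere.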
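The three-stage schedule of Algorithm 1 can be summarized as a table of two things: which parameter groups are updated at each stage, and which objective is minimized. The sketch below encodes that table under these conventions:
• The group names mirror the symbols $\alpha$, $\theta$, $\beta$ in Algorithm 1.
• $\lambda_r=0.0005$ follows Section V.
• The helper names `frozen_groups` and `stage_objective` are hypothetical.

```python
# Parameter groups follow Algorithm 1: alpha (semantic codec S / S^-1),
# theta (H / H^-1), beta (analog channel codec C / C^-1);
# suffix _t = transmitter side, _r = receiver side.
ALL_GROUPS = {"alpha_t", "alpha_r", "theta_t", "theta_r", "beta_t", "beta_r"}

STAGES = [
    ("semantic codec", {"alpha_t", "alpha_r"}),                          # minimize L_SD
    ("hybrid transceiver", {"theta_t", "theta_r", "beta_t", "beta_r"}),  # L_CD + lambda_r * L_Rate
    ("whole network", set(ALL_GROUPS)),                                  # L_SD + lambda_r * L_Rate
]


def frozen_groups(stage):
    """Parameter groups kept fixed in the given stage (0, 1, or 2)."""
    return ALL_GROUPS - STAGES[stage][1]


def stage_objective(stage, l_sd, l_cd, l_rate, lam_r=5e-4):
    """Scalar objective of each stage, given already-computed loss values."""
    if stage == 0:
        return l_sd
    if stage == 1:
        return l_cd + lam_r * l_rate  # H(z) dropped: the semantic codec is frozen
    return l_sd + lam_r * l_rate


print(sorted(frozen_groups(1)))  # ['alpha_r', 'alpha_t']
```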
When the whole network has been trained, we can employ the model to transmit images wirelessly. The inference algorithm is presented in Algorithm 2. Here, the additive uniform noise is removed and replaced with the actual non-differentiable operations.

The three-stage training algorithm ensures that each stage can converge to a local optimum, and it avoids mismatched gradient updates across modules. Besides, approximating quantization with additive uniform noise helps avoid vanishing gradients, which enables end-to-end training. Moreover, the inference algorithm indicates that the digital component can adopt:
• an encryption algorithm to protect the digital bits, and
• adaptive modulation and coding to combat channel distortions.

Algorithm 2 HDA-DeepSC Inference Algorithm
Function: HDA-DeepSC Inference()
1: Input: Sample $I$ from the dataset.
2: Transmitter:
3:   $z=\mathcal{S}(I;\alpha_t)$.
4:   // Analog-Digital Allocation
5:   $z_D=\mathcal{H}(z;\theta_t)$,
6:   $\tilde{z}_D=\mathcal{Q}(z_D)$, $\bar{z}=\mathcal{H}^{-1}\left(\mathcal{Q}^{-1}(\tilde{z}_D);\theta_r\right)$,
7:   $z_A=z-\bar{z}$.
8:   // Digital Transmitter
9:   $b=\mathcal{E}(\tilde{z}_D)$,
10:   $x_D=\mathcal{M}\left(\mathcal{C}_D(b)\right)$.
11:   // Analog Transmitter
12:   $x_A=\mathcal{C}(z_A;\beta_t)$, power normalization,
13:   Transmit $x=[x_D,x_A]$.
14: Receiver:
15:   Receive $y$ with (6).
16:   Signal detection with (7) to get $\hat{x}_D$ and $\hat{x}_A$.
17:   // Digital Receiver
18:   $\hat{b}=\mathcal{C}_D^{-1}\left(\mathcal{M}^{-1}(\hat{x}_D)\right)$,
19:   $\hat{z}_D=\mathcal{Q}^{-1}\left(\mathcal{E}^{-1}(\hat{b})\right)$, $\bar{z}=\mathcal{H}^{-1}(\hat{z}_D;\theta_r)$.
20:   // Analog Receiver
21:   $\hat{z}_A=\mathcal{C}^{-1}(\hat{x}_A;\beta_r)$.
22:   // Analog-Digital Fusion
23:   $\hat{z}=\hat{z}_A+\bar{z}$,
24:   $\hat{I}=\mathcal{S}^{-1}(\hat{z};\alpha_r)$.
25: Return: $\hat{I}$.
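The toy sketch below makes the analog-digital allocation and fusion in Algorithm 2 concrete. None of the following simplifications come from the paper:
• The learned transforms $\mathcal{H}$ and $\mathcal{H}^{-1}$ are identities.
• $\mathcal{Q}$ is rounding.
• The digital link is error-free.
• The analog codec $\mathcal{C}$ reduces to power normalization over an AWGN channel.

Under these assumptions the sketch shows two things. First, $z_A=z-\bar{z}$ carries only the quantization residual. Second, the fusion $\hat{z}=\hat{z}_A+\bar{z}$ recovers $z$ up to the analog channel noise.

```python
import numpy as np


def hda_toy_link(z, snr_db, rng):
    """Toy allocation -> channel -> fusion (H = identity, Q = rounding; assumptions)."""
    z = np.asarray(z, dtype=np.float64)

    # Analog-digital allocation with H = H^-1 = identity and Q = rounding.
    z_tilde_d = np.round(z)   # integers sent over the error-free digital link
    z_bar = z_tilde_d         # H^-1(Q^-1(z~_D)) under the identity assumption
    z_a = z - z_bar           # residual for the analog link, |z_a| <= 0.5

    # Analog transmitter: power normalization to unit average power.
    scale = np.sqrt(np.mean(z_a ** 2)) + 1e-12
    x_a = z_a / scale

    # AWGN channel with unit signal power at the given SNR.
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0))
    y_a = x_a + noise_std * rng.standard_normal(x_a.shape)

    # Analog receiver (undo the normalization) and analog-digital fusion.
    z_hat_a = y_a * scale
    z_hat = z_hat_a + z_bar
    return z_hat, z_a


rng = np.random.default_rng(0)
z = rng.normal(scale=3.0, size=1000)
z_hat, z_a = hda_toy_link(z, snr_db=10.0, rng=rng)
# At 10 dB the fused error is roughly 10% of the residual power mean(z_a**2).
print(np.mean((z - z_hat) ** 2), np.mean(z_a ** 2))
```

In this toy model, the analog link only has to carry a residual bounded by half a quantization step. Its noise therefore perturbs $\hat{z}$ gently instead of causing a cliff-edge failure.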
IV. DIFFUSION FRAMEWORK ENHANCED SIGNAL DETECTION
This section first provides an overview of the de-noising diffusion framework and its background. We then introduce a novel diffusion-based signal detection method called DiffSDNet, which incorporates a carefully designed variance schedule into the training and sampling algorithms. The diffusion-based de-noising module is an optional part of HDA-DeepSC that can further improve its robustness.

A. De-Noising Diffusion Framework
Given random noise as input, the de-noising diffusion framework [36] models the generative process through multiple de-noising steps. Each step iteratively refines the generated result by removing the predicted noise, akin to Langevin dynamics. The de-noising diffusion framework is divided into a forward process and a reverse process.
1) Forward Process: The forward process is fixed to a Markov chain with $T$ steps that gradually adds Gaussian noise to the data according to a variance schedule $\gamma(1),\cdots,\gamma(T)$. It is given by

$$v(0)\rightarrow v(1)\rightarrow v(2)\rightarrow\cdots\rightarrow v(T-1)\rightarrow v(T), \tag{25}$$

where $v(0)$ is the input information, $p\left(v(t)\mid v(t-1)\right)=\mathcal{N}\left(\sqrt{1-\gamma(t)}\,v(t-1),\gamma(t)\mathbf{I}\right)$, and $p\left(v(T)\right)$ is modeled with $\mathcal{N}(\mathbf{0},\mathbf{I})$. Due to the reparameterization of the normal distribution, $v(t)$ can be represented as

$$v(t)=\sqrt{1-\gamma(t)}\,v(t-1)+\sqrt{\gamma(t)}\,\epsilon(t)=\sqrt{1-\bar{\gamma}(t)}\,v(0)+\sqrt{\bar{\gamma}(t)}\,\bar{\epsilon}(t), \tag{26}$$

where $\epsilon(t),\bar{\epsilon}(t)\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ and $\bar{\gamma}(t)=1-\prod_{s=1}^{t}\left(1-\gamma(s)\right)$. As (26) shows, the forward process adds Gaussian noise step by step so that $v(0)$ approaches the normal distribution. This can be viewed as an encoding process without learnable parameters.

2) Reverse Process: The reverse process is also defined as a Markov chain with $T$ steps, starting at $v(T)$. It is given by

$$v(T)\rightarrow v(T-1)\rightarrow v(T-2)\rightarrow\cdots\rightarrow v(1)\rightarrow v(0), \tag{27}$$

where $q\left(v(t-1)\mid v(t)\right)=\mathcal{N}\left(\mu\left(v(t);\omega\right),\sigma(t)\mathbf{I}\right)$. The reverse process generates $v(t-1)$ based on $v(t)$, where the mean of $v(t-1)$ is modeled by a neural network that takes $v(t)$ as input.

From (26), we observe that $v(t-1)$ can be predicted from $v(t)$ and $v(0)$ by removing the added noise. Therefore, $\mu\left(v(t);\omega\right)$ can be modeled as

$$\mu\left(v(t);\omega\right)=\frac{1}{\sqrt{1-\gamma(t)}}\left(v(t)-\frac{\gamma(t)}{\sqrt{\bar{\gamma}(t)}}\,\epsilon\left(v(t);\omega\right)\right), \tag{28}$$

where $\epsilon\left(v(t);\omega\right)$ predicts the noise added to $v(t)$. According to (28), the reverse process predicts the Gaussian noise at each step and then removes it, restoring $v(0)$ from $v(T)$ with learnable parameters. This can be viewed as a decoding process.

The loss function for the diffusion-based model at step $t$ is defined as

$$\mathcal{L}_{Diff}^{(t)}=\mathbb{E}\left[\left\|\bar{\epsilon}(t)-\epsilon\left(\sqrt{1-\bar{\gamma}(t)}\,v(0)+\sqrt{\bar{\gamma}(t)}\,\bar{\epsilon}(t);\omega,t\right)\right\|^{2}\right]. \tag{29}$$

During training, we first sample $t$ and then obtain $v(t)$ from $v(0)$ by adding Gaussian noise with the scheduled variances. Previous de-noising frameworks, e.g., DnCNN, predict the noise in a single step. The de-noising diffusion framework instead predicts the noise over multiple steps, which better matches the noise distribution and achieves better de-noising performance. Therefore, we propose a de-noising diffusion-based signal detection method.

B. The Proposed De-Noising Diffusion-Based Signal Detection
The detected signals in (7) can be rewritten as

$$\hat{x}=x+\tilde{n}, \tag{30}$$

where $\tilde{n}=\frac{h^{*}}{|h|^{2}}n$ is the effective noise after signal detection. We employ the block-fading channel model in (6), where $h$ remains constant. Therefore, $\tilde{n}$ follows a circularly symmetric complex Gaussian distribution with zero mean and scaled variance $\sigma_{\tilde{n}}^{2}=\sigma_{n}^{2}/|h|^{2}$.

The coefficients of $p\left(v(t)\mid v(t-1)\right)$ in (25) satisfy $\left(\sqrt{1-\gamma(t)}\right)^{2}+\gamma(t)=1$. We normalize $\hat{x}$ in the same way and rewrite it as

$$\tilde{x}=\frac{1}{\sqrt{1+\sigma_{\tilde{n}}^{2}}}\,x+\frac{\sigma_{\tilde{n}}}{\sqrt{1+\sigma_{\tilde{n}}^{2}}}\,\epsilon, \tag{31}$$

where $\tilde{x}=\hat{x}/\sqrt{1+\sigma_{\tilde{n}}^{2}}$ and $\tilde{n}=\sigma_{\tilde{n}}\epsilon$, $\epsilon\sim\mathcal{CN}(\mathbf{0},\mathbf{I})$.

Comparing (31) with (26), we find that wireless transmission resembles the forward process: $x$ and $\tilde{x}$ in (31) play the roles of $v(0)$ and $v(t)$ in (26). It is therefore natural to employ the reverse process to refine $\tilde{x}$ and obtain a more accurate $x$. Given $\tilde{x}$ and $\sigma_{\tilde{n}}$, we adopt (27) to remove the noise in $\tilde{x}$ so that it moves closer to $x$. However, the existing variance schedules of $p\left(v(t)\mid v(t-1)\right)$ and the existing sampling algorithms are unsuitable for wireless communications. We therefore design the variance schedule and the sampling algorithm by considering the channel SNR.

1) Variance Schedule Design: A variance schedule specifies how the mean and variance of the added noise change over the course of the diffusion process. They are adjusted at each step, which controls the amount of noise introduced at each stage. The schedule therefore determines how the noise level evolves during the diffusion process, and it affects both the quality of the generated $x$ and the convergence behavior of the model.

The variance schedule should satisfy $1-\bar{\gamma}(T)\rightarrow0$. Based on this constraint, we design the variance schedule with $T=50$ steps, which is given by

$$\gamma(t)=\frac{0.5\,t}{T}, \tag{32}$$

for which $\sqrt{1-\bar{\gamma}(50)}\leq e^{-6.375}\approx0$, i.e., $\bar{\gamma}(50)\approx1$. The designed variance schedule includes 50 different noise levels. The reasons behind this design can be summarized as follows.
• The conventional diffusion-based framework uses 1,000 steps for generative tasks. We empirically find that the de-noising task does not need as many steps because it is less complex.
• We design $\gamma(t)$ as a monotonic function to achieve coarse-to-fine de-noising with unequal SNR intervals, e.g., a small interval in high SNR regions and a large interval in low SNR regions. The unequal intervals speed up de-noising by using fewer steps in low SNR regions.

2) Sampling Algorithm: The sampling algorithm performs the reverse process by sampling the steps. For example, the conventional diffusion-based framework usually samples 1,000 steps from $T\rightarrow0$ [36], or 100 steps with a subsequence of $T\rightarrow0$ [37], in which $v(T)$ is the first sampled step. However, starting from $v(T)$ is unsuitable for signal detection. The detected signals start from different $v(t)$, where $t$ depends on the SNR at the receiver. Therefore, we propose a dynamic sampling algorithm, shown in Algorithm 3.

Algorithm 3 Dynamic Sampling Algorithm
Function: Dynamic Sampling()
1: Input: The detected signal $\tilde{x}$.
2: Initialize $\tilde{x}$ as the starting point $v(\tilde{t})$.
3: Find $\tilde{t}$ by (33) with $\sigma_{\tilde{n}}$.
4: for $t=\tilde{t},\cdots,1$ do
5:   $v(t-1)=\frac{1}{\sqrt{1-\gamma(t)}}\left(v(t)-\frac{\gamma(t)}{\sqrt{\bar{\gamma}(t)}}\,\epsilon\left(v(t);\omega\right)\right)$.
6: Return: $v(0)$.
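The schedule in (32) and the closed-form forward process in (26) can be checked numerically. The sketch reads (32) as $\gamma(t)=0.5t/T$ with $T=50$ and uses $\bar{\gamma}(t)=1-\prod_{s\leq t}(1-\gamma(s))$. Since $1-x\leq e^{-x}$, the signal coefficient $\sqrt{1-\bar{\gamma}(50)}$ is bounded by $e^{-\frac{1}{2}\sum_t\gamma(t)}=e^{-6.375}$, which the last line confirms. The function names are illustrative.

```python
import numpy as np


def variance_schedule(T=50):
    """gamma(t) = 0.5 t / T for t = 1..T, and gamma_bar(t) = 1 - prod_{s<=t} (1 - gamma(s))."""
    t = np.arange(1, T + 1)
    gamma = 0.5 * t / T
    gamma_bar = 1.0 - np.cumprod(1.0 - gamma)
    return gamma, gamma_bar


def forward_sample(v0, t, gamma_bar, rng):
    """Draw v(t) directly from v(0) with (26); also return the noise that was added."""
    gb = gamma_bar[t - 1]
    eps = rng.standard_normal(np.shape(v0))
    return np.sqrt(1.0 - gb) * np.asarray(v0) + np.sqrt(gb) * eps, eps


gamma, gamma_bar = variance_schedule()
print(np.all(np.diff(gamma_bar) > 0))                  # True: gamma_bar increases with t
print(np.sqrt(1.0 - gamma_bar[-1]) <= np.exp(-6.375))  # True: v(T) is essentially pure noise
```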
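A single Monte-Carlo evaluation of the training loss (29) is sketched below. The sketch makes three choices:
• It samples $t$ uniformly from $\{1,\dots,T\}$. The text only states that $t$ is sampled first, so the uniform choice is an assumption.
• It noises $v(0)$ with (26).
• It compares the true noise with the output of a noise predictor `eps_model(v, t)`. This predictor is a hypothetical placeholder for the OpenAI-UNet used in the paper.

The squared norm is averaged over elements, which only rescales (29).

```python
import numpy as np


def variance_schedule(T=50):
    t = np.arange(1, T + 1)
    gamma = 0.5 * t / T                          # (32)
    return gamma, 1.0 - np.cumprod(1.0 - gamma)  # gamma, gamma_bar


def diffusion_loss(eps_model, v0, gamma_bar, rng):
    """One-sample estimate of L_Diff^(t) in (29); returns (loss, t)."""
    T = len(gamma_bar)
    t = int(rng.integers(1, T + 1))              # t ~ Uniform{1, ..., T} (assumption)
    eps_bar = rng.standard_normal(np.shape(v0))
    gb = gamma_bar[t - 1]
    v_t = np.sqrt(1.0 - gb) * v0 + np.sqrt(gb) * eps_bar  # forward process (26)
    residual = eps_bar - eps_model(v_t, t)
    return float(np.mean(residual ** 2)), t     # mean instead of sum: rescaled (29)


def zero_model(v, t):
    """Placeholder predictor that always guesses zero noise."""
    return np.zeros_like(v)


rng = np.random.default_rng(0)
_, gamma_bar = variance_schedule()
loss, t = diffusion_loss(zero_model, np.ones(4096), gamma_bar, rng)
print(t, loss)  # the zero predictor scores about 1, the noise variance
```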
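Algorithm 3 can be sketched as follows. Under the normalization used in (31), the signal coefficient of $\tilde{x}$ is $1/\sqrt{1+\sigma_{\tilde{n}}^{2}}$. The starting step $\tilde{t}$ is chosen so that this coefficient falls between $\sqrt{1-\bar{\gamma}(\tilde{t}+1)}$ and $\sqrt{1-\bar{\gamma}(\tilde{t})}$. Each reverse step then applies the deterministic mean (28) with $\sigma(t)=0$.

The noise predictor `eps_model(v, t)` stands in for the trained DiffSDNet and is hypothetical. The schedule follows (32) with $T=50$. When the noise is weaker than the first diffusion step, $\tilde{t}=0$ and $\tilde{x}$ is returned unchanged.

```python
import numpy as np


def variance_schedule(T=50):
    t = np.arange(1, T + 1)
    gamma = 0.5 * t / T                          # (32)
    return gamma, 1.0 - np.cumprod(1.0 - gamma)  # gamma, gamma_bar


def find_start_step(sigma_n, gamma_bar):
    """t~: the number of steps whose signal coefficient is >= 1/sqrt(1 + sigma^2)."""
    c = 1.0 / np.sqrt(1.0 + sigma_n ** 2)
    coeffs = np.sqrt(1.0 - gamma_bar)            # decreasing in t
    return int(np.sum(coeffs >= c))


def dynamic_sampling(x_hat, sigma_n, eps_model, T=50):
    """Deterministic reverse process of Algorithm 3, started at v(t~) = x~."""
    gamma, gamma_bar = variance_schedule(T)
    t_start = find_start_step(sigma_n, gamma_bar)
    v = np.asarray(x_hat) / np.sqrt(1.0 + sigma_n ** 2)  # x~ as normalized in (31)
    for t in range(t_start, 0, -1):
        g, gb = gamma[t - 1], gamma_bar[t - 1]
        # mu(v(t); omega) from (28), with sigma(t) = 0 (no sampling noise).
        v = (v - g / np.sqrt(gb) * eps_model(v, t)) / np.sqrt(1.0 - g)
    return v


_, gamma_bar = variance_schedule()
for sigma in (0.0, 0.3, 1.0, 3.0):
    print(sigma, find_start_step(sigma, gamma_bar))  # t~ grows with the noise level
```

Because $\sqrt{1-\bar{\gamma}(t)}$ decreases monotonically, counting the steps whose coefficient is still at least $1/\sqrt{1+\sigma_{\tilde{n}}^{2}}$ yields exactly the interval index. Low-SNR receptions therefore start deeper in the chain, while high-SNR receptions need only a few steps or none.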
Firstly, given the known $\sigma_{\tilde{n}}$, we search for the starting point $\tilde{t}$ of the reverse process, which is given by

$$\frac{1}{\sqrt{1+\sigma_{\tilde{n}}^{2}}}\in\left[\sqrt{1-\bar{\gamma}(\tilde{t}+1)},\ \sqrt{1-\bar{\gamma}(\tilde{t})}\right]. \tag{33}$$

Then, since signal detection aims to recover the transmitted signals as accurately as possible, we change random sampling to deterministic sampling. In detail, we remove the randomness of the reverse process by setting $\sigma(t)$ in (27) to zero. As a result, $q\left(v(t-1)\mid v(t)\right)$ changes from $\mathcal{N}\left(\mu(v(t);\omega),\sigma(t)\mathbf{I}\right)$ to the deterministic $\mu\left(v(t);\omega\right)$.

V. NUMERICAL RESULTS
In this section, we compare the proposed HDA-DeepSC with DL-based semantic communication systems and digital communication systems over AWGN and Rician fading channels, assuming perfect CSI for all schemes.

A. Implementation Details
1) The Dataset: We choose the DIV2K dataset [38] for training, which contains 1,000 images of different scenes. The Kodak dataset is used for testing.

2) Training Settings: The semantic codec consists of 6 Swin-Transformer layers, each with 6 heads and a width of 120. The diffusion-based model adopts the OpenAI-UNet structure. $\lambda_F$, $\lambda_z$, and $\lambda_r$ are 0.1, 0.1, and 0.0005, respectively. The learning rate is $2\times10^{-4}$. The simulation device consists of an Intel® Xeon® Platinum 8352V CPU and an NVIDIA GeForce RTX 4090 GPU. The encryption algorithm is AES.

3) Benchmarks and Performance Metrics: We adopt the following benchmarks, detailed below: separate source-channel coding, a DL-based analog semantic communication system, a DL-based digital semantic communication system, conventional HDA transmission, and a one-step denoising network.
• Separate source-channel coding: The source and channel coding are performed separately to transmit the images, using the following technologies:
  • Better Portable Graphics (BPG) for image source coding, the state-of-the-art image compression method.
  • Low-density parity-check (LDPC) coding and capacity-achieving coding for channel coding.
  • Adaptive modulation and coding (AMC) for different SNRs, including rate-1/2 coding with BPSK, rate-1/2 coding with QPSK, rate-3/4 coding with QPSK, rate-1/2 coding with 16QAM, and rate-3/4 coding with 16QAM.
• Analog semantic communication systems: The purely analog semantic communication of HDA-DeepSC, trained with the MSE loss.
• Digital semantic communication systems: DeepJSCC-Q, proposed in [13].
• Conventional HDA transmission systems with the 2D discrete cosine transform and scalar quantization [30].
• The denoising convolutional neural network (DnCNN) as the one-step de-noising benchmark.

The LDPC codes we use are from the IEEE 802.11ad standard, with a block length of 672 bits for both the rate-1/2 and rate-3/4 codes. The coherence time is set to the transmission time of each image in the simulation. We set $r=1$ for the Rician channels and $h=1$ for the AWGN channels. Peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM) are used to measure the local and global quality of images, respectively. MS-SSIM is expressed in dB as

$$\text{MS-SSIM (dB)}=-10\log_{10}\left(1-\text{MS-SSIM}\right). \tag{34}$$

Fig. 3. The PSNR performance comparison between Analog DeepSC and Analog DeepSC with different denoisers on the Kodak dataset.

B. Denoising Networks Comparisons
Fig. 3 presents the PSNR performance of the analog DeepSC with different denoisers. First, the analog DeepSC with a denoiser achieves a higher PSNR than that without a denoiser in the low SNR regimes, which validates the effectiveness of the denoiser in reducing the noise level. In the high SNR regimes, where the noise level is small, the analog DeepSC is capable of restoring the signals by itself. Therefore, all methods achieve a similar PSNR as the SNR increases. Furthermore, the analog DeepSC with DiffSDNet outperforms that with DnCNN by 0.6 dB in PSNR. This suggests that the multiple-step denoiser has a stronger denoising capability than the one-step denoiser.
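The two metrics can be computed as below. PSNR assumes 8-bit images with a peak value of 255, which is an assumption about the data range. The dB mapping uses the convention $-10\log_{10}(1-\text{MS-SSIM})$. Computing MS-SSIM itself requires a multi-scale SSIM implementation that is not reproduced here, so the helper takes the MS-SSIM value as input. Both function names are illustrative.

```python
import numpy as np


def psnr_db(img, img_hat, max_val=255.0):
    """Peak signal-to-noise ratio in dB; assumes pixel values in [0, max_val]."""
    err = np.asarray(img, dtype=np.float64) - np.asarray(img_hat, dtype=np.float64)
    mse = np.mean(err ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))


def ms_ssim_to_db(ms_ssim):
    """Map an MS-SSIM value in [0, 1) to dB as -10 log10(1 - MS-SSIM)."""
    return float(-10.0 * np.log10(1.0 - ms_ssim))


print(ms_ssim_to_db(0.9), ms_ssim_to_db(0.99))  # about 10 dB and 20 dB
```

On the dB scale, each tenfold reduction of $1-\text{MS-SSIM}$ adds 10 dB. This spreads out the differences between high-quality reconstructions that look nearly identical on the linear scale.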
Fig. 4. Comparison between HDA-DeepSC and the Analog DeepSC, DeepJSCC-Q, and BPG with different channel coding on the Kodak dataset over AWGN channels.
Fig. 5. Comparison between HDA-DeepSC and the Analog DeepSC, DeepJSCC-Q, and BPG with different channel coding on the Kodak dataset over Rician channels.

TABLE I
THE PSNR COMPARISON BETWEEN THE ANALOG DEEPSC WITH DIFFERENT DIFFUSION-BASED DENOISERS AT SNR = 0 dB

Table I compares the analog DeepSC with DDPM and with DiffSDNet. The proposed DiffSDNet achieves a higher PSNR with fewer sampling steps than DDPM, which confirms the effectiveness of the designed variance schedule and sampling algorithm. Notably, the PSNR of the analog DeepSC with DDPM decreases as the number of sampling steps increases, due to the high degree of randomness introduced in the reverse process.

C. Communication System Comparisons
Figs. 4 and 5 report the PSNR and MS-SSIM comparisons between the various methods over AWGN and Rician channels with a 1/6 bandwidth compression ratio. For AWGN channels, Fig. 4 shows that HDA-DeepSC outperforms all the benchmarks. This indicates two things. First, the discrete signals of the digital component can accurately deliver the crucial semantic information needed for detail recovery. Second, the continuous signals of the analog component prevent the leveling-off and cliff-edge effects owing to lower quantization errors. Besides, HDA-DeepSC achieves the best MS-SSIM, which means that the images transmitted by HDA-DeepSC have better global quality. This is likely because the Fourier-based loss function makes the model learn long-distance dependencies.

For the Rician channel case shown in Fig. 5, the DL-based analog systems are more robust to channel variations due to the high degree of freedom of continuous signals, and HDA-DeepSC benefits from its analog component. Moreover, because the digital part consumes little bandwidth, we can use low-rate channel coding to deliver it accurately while transmitting only a small number of symbols, which ensures robustness in the low SNR regimes. This is why we assume error-free transmission while training the digital part. Besides, if the communication environment is severely degraded,
such that the digital signals cannot be successfully decoded, this system will experience the cliff-edge effect due to the employed entropy coding. This can be mitigated in several ways. One is to replace the entropy coding module with a learning-based quantization module. Another is to introduce transmission errors during training. Both methods can lead the model to learn to correct errors in digital transmission. Visual examples are presented in Appendix B.

In Table II, we study ablations of the Fourier-based components by considering only the semantic codec. Here, "MSE loss with Fourier-based module" means that we insert the pluggable Fourier-based modules [39] into the semantic codec and train the semantic codec with the MSE loss function. The Fourier-based module or loss improves image quality by more than 2 dB in terms of both PSNR and MS-SSIM, owing to learning long-distance dependencies in the frequency domain. Besides, the Fourier-based loss increases MS-SSIM considerably more than the Fourier-based module does. The reason is that the Fourier-based module introduces additional Fourier-based parameters, which makes it challenging to further improve performance. This suggests that the Fourier-based loss can directly capture the global information of images without additional parameters, and hence it is an attractive loss for improving the global quality of images.

TABLE II
THE ABLATIONS OF FOURIER-BASED COMPONENT: MSE LOSS, MSE LOSS WITH FOURIER-BASED MODULE, AND MSE LOSS WITH FOURIER-BASED LOSS

Fig. 6. PSNR and MS-SSIM performance for different bandwidth compression ratios on the Kodak dataset over AWGN channels.
Fig. 7. PSNR and MS-SSIM performance for different bandwidth compression ratios on the Kodak dataset over Rician channels.

D. Bandwidth Compression Ratio Comparisons
Figs. 6 and 7 show the comparisons for different bandwidth compression ratios over AWGN and Rician channels at SNR = 10 dB. HDA-DeepSC outperforms all the benchmarks in terms of PSNR and MS-SSIM. For example, HDA-DeepSC achieves the same PSNR as separate coding (BPG with 1/2 LDPC and 16QAM) with a 33% improvement in bandwidth compression ratio. This suggests that HDA-DeepSC can provide a higher data transmission
rate than the benchmarks for a given PSNR or MS-SSIM. Besides, the learning-based methods outperform BPG in terms of MS-SSIM. This indicates that the neural networks operate as better content generators and thereby produce images with global consistency.

Fig. 8. PSNR and MS-SSIM performance for different digital-analog ratios on the Kodak dataset over AWGN channels.

E. Digital-Analog Ratio Comparisons
Fig. 8 shows the comparisons across different digital-analog (DA) ratios. These are obtained by changing the ratio between the numbers of transmitted symbols of the digital and analog components, while the total number of transmitted symbols is fixed. A larger DA ratio means that more semantic information is transmitted by the digital transmitter, and vice versa. PSNR and MS-SSIM decrease as the DA ratio increases, which is caused by the unavoidable quantization errors introduced by the digital transmitter. The more semantic information is transmitted through the digital transmitter, the larger the quantization errors introduced into the transmitted information. This suggests that the analog transmitter operates as a continuous-signal-based system, so decreasing the DA ratio effectively reduces quantization errors.

Fig. 9. Visualized examples for different methods transmitted over AWGN channels at SNR = 10 dB: (a) original image; (b) image recovered by BPG with 1/2 LDPC and 16QAM; (c) image recovered by HDA-DeepSC with 0.2 DA ratio using unencrypted bits; (d)-(f) images recovered by HDA-DeepSC with 0.2, 0.87, and 3 DA ratios using encrypted bits, respectively.

TABLE III
THE PSNR PERFORMANCE FOR THE ENCRYPTED AND UNENCRYPTED BITS OVER AWGN CHANNELS AT SNR = 10 dB

F. Data Security
Table III reports the PSNR performance for encrypted and unencrypted bits, that is, for whether or not the encryption algorithm encrypts the bit streams transmitted by the digital transmitter. We assume that the eavesdropper is
---PAGE BREAK---
|
||
|
||
2490 IEEEJOURNALONSELECTEDAREASINCOMMUNICATIONS,VOL.43,NO.7,JULY2025
|
||
TABLEIV methodtoreducethebitbudgetoflearnedconstellations,such
|
||
THERUNNINGTIMEPERIMAGECOMPARISONBETWEEN as achieving low-precision pseudo-analog transmission. The
|
||
THEHDA-DEEPSCANDBPG cost is slight performance degradation.
|
||
VI. CONCLUSION
|
||
Inthispaper,wehaveintroducedaninnovativeHDAseman-
|
||
tic communication framework that combines the strengths
|
||
of analog and digital semantic communications. Our frame-
|
||
incapable of decoding the encrypted bits and only decodes
|
||
work aims to overcome the inherent limitations associated
|
||
thesemanticinformationtransmittedbytheanalogtransmitter,
|
||
with each approach. Building upon the framework, we intro-
|
||
wheretheHDA-DeepSCmodelisknowntotheeavesdropper.
|
||
duced a robust HDA semantic communication system called
|
||
From Table III, the PSNR of encrypted bits is 20dB lower
|
||
HDA-DeepSC, specifically designed for multimedia transmis-
|
||
compared to that of unencrypted bits, indicating the images
|
||
sion. HDA-DeepSC leverages digital communication methods
|
||
recovered by encrypted bits are little like the original ones. In
|
||
to transmit crucial semantic information, ensuring accurate
|
||
other words, the eavesdropper obtains less information from
|
||
delivery and data security. Additionally, it utilizes analog
|
||
thesemanticinformationtransmittedbytheanalogtransmitter.
|
||
communication methods to transmit auxiliary semantic infor-
|
||
Besides, the PSNR of encrypted bits slightly decreases as
|
||
mation, effectively mitigating the leveling-off and cliff-edge
|
||
the DA ratio increases. This suggests that the HDA-DeepSC
|
||
effects associated with traditional approaches. We also intro-
|
||
effectively safeguards data with few bits while achieving the
|
||
ducedanalog-digitalallocationandfusionmodulestoseparate
|
||
high PSNR. Visual examples are presented in Figs. 9(c)-(f),
|
||
and fuse the digital and analog components, respectively.
|
||
where Figs. 9(d)-(f) are the images recovered by encrypted
|
||
Besides, we have designed the Fourier-based loss function to
|
||
bits.Interestingly,theessentialinformationisprotectedbythe
|
||
guide the model in learning the long-distance dependencies
|
||
HDA-DeepSC,e.g.thecolor,thebackground,andthetextures,
|
||
and combined the rate constraint with the non-parametric,
|
||
which proves the effectiveness of the HDA-DeepSC in data
|
||
fully factorized density model. Moreover, we have proposed
|
||
security.
|
||
the diffusion framework enhanced signal detection, named
|
||
DiffSDNet, by multiple denoising steps to reduce the noise
|
||
G. Computational Complexity level at the low SNR regimes, in which we customized the
|
||
The proposed HDA-DeepSC adopts the Swin-Transformer variance schedule and sampling algorithm for wireless com-
|
||
as the semantic codec, in which the window multi-head self-attention (W-MSA) module has high computational complexity. The computational complexity of W-MSA is $\mathcal{O}\left(N \times h \times w \times (4C^2 + 2M^2C)\right)$, in which $N$, $C$, and $M$ are the number of layers, the width of the layer, and the number of patches, respectively. For fixed $N$, $C$, and $M$, this grows linearly with $h \times w$ and hence with the number of pixels. The channel codec consists of several dense layers, whose computational complexity is also linear in the number of pixels. Therefore, the encoding/decoding time of the proposed HDA-DeepSC grows linearly with the number of pixels. To complete our discussion of computational complexity, we have measured the average running time per image, which is shown in Table IV. We observe that HDA-DeepSC runs slightly slower than BPG on the CPU. However, a GPU significantly reduces the running time of HDA-DeepSC, which means that it can effectively support delay-sensitive applications.

H. Discussion of Hardware Implementation

It is now possible to implement analog systems with high-precision digital circuits, which is known as pseudo-analog transmission. For example, the pseudo-analog system SoftCast [40] does not adopt conventional constellations but directly maps the normalized 2D discrete Fourier coefficients onto the transmitted symbols. There have been many follow-up efforts, and some of them [40], [41] have been validated on software radio platforms with orthogonal frequency division multiplexing (OFDM). Therefore, it is feasible to achieve hybrid analog and digital transmission on one hardware platform. For low-precision digital circuits, we can employ the quantization

munication environments. The numerical results have confirmed the effectiveness of DiffSDNet in denoising and demonstrated the superiority of HDA-DeepSC in terms of robustness, transmission rate, and data security, especially in low SNR regimes. Therefore, the proposed HDA semantic communication framework shows great promise as a candidate for the new semantic communication paradigm, offering significant potential for real-world implementations.

APPENDIX A
DERIVATION OF (15)

Assume that $x_i$, $i = 1, 2, \ldots, N$, are $N$ independent Gaussian sources (variables), each following a normal distribution with zero mean and variance $\sigma_i^2$. Then the (differential) entropy of $\mathbf{x} = [x_1, x_2, \ldots, x_N]$, measured in nats with $\log$ denoting the natural logarithm, can be written as

$$
\begin{aligned}
H(\mathbf{x}) &= -\mathbb{E}_{\mathbf{x}\sim p(\mathbf{x})}\left[\log p(\mathbf{x})\right] = -\mathbb{E}_{\mathbf{x}\sim p(\mathbf{x})}\left[\log \prod_{i=1}^{N} p(x_i)\right] \\
&= -\sum_{i=1}^{N} \mathbb{E}_{x_i\sim p(x_i)}\left[-\frac{x_i^2}{2\sigma_i^2} - \frac{1}{2}\log\left(2\pi\sigma_i^2\right)\right] \\
&= \sum_{i=1}^{N} \mathbb{E}_{x_i\sim p(x_i)}\left[\frac{x_i^2}{2\sigma_i^2}\right] + \frac{1}{2}\sum_{i=1}^{N}\log\left(2\pi\sigma_i^2\right).
\end{aligned}
\tag{35}
$$

With (35), we can derive the following relationship:

$$
H(\mathbf{z}) - H(\tilde{\mathbf{z}}) = \sum_{i=1}^{N}\left(\mathbb{E}_{z_i\sim p(z_i)}\left[\frac{z_i^2}{2\sigma_i^2}\right] - \mathbb{E}_{\tilde{z}_i\sim p(\tilde{z}_i)}\left[\frac{\tilde{z}_i^2}{2\tilde{\sigma}_i^2}\right]\right) + \frac{1}{2}\sum_{i=1}^{N}\log\left(\frac{\sigma_i^2}{\tilde{\sigma}_i^2}\right).
$$
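The closed form implied by (35) can be sanity-checked numerically. The following minimal Python sketch is illustrative only and is not part of HDA-DeepSC: it assumes independent zero-mean Gaussian variables with known standard deviations, measures entropy in nats, and uses hypothetical helper names (`entropy_closed_form`, `entropy_monte_carlo`, `entropy_difference`). The Monte Carlo routine replaces the expectation $\mathbb{E}[x_i^2/(2\sigma_i^2)]$ in (35) by a sample mean, in the spirit of the Monte Carlo approximation leading to (37). For exact Gaussians each such expectation equals $1/2$, so $H(\mathbf{x}) = \frac{1}{2}\sum_{i}\log(2\pi e\sigma_i^2)$, and $H(\mathbf{z}) - H(\tilde{\mathbf{z}})$ reduces to the log-variance-ratio term.

```python
import math
import random


def entropy_closed_form(sigmas):
    """Differential entropy (nats) of independent zero-mean Gaussians.

    Substituting E[x_i^2] = sigma_i^2 into the last line of (35) gives
    sum_i 0.5 * log(2 * pi * e * sigma_i^2).
    """
    return sum(0.5 * math.log(2 * math.pi * math.e * s * s) for s in sigmas)


def entropy_monte_carlo(sigmas, num_samples=100_000, seed=0):
    """Monte Carlo estimate of the last line of (35).

    The expectation of sum_i x_i^2 / (2 sigma_i^2) is replaced by a sample
    mean over num_samples draws; the log term is deterministic and exact.
    """
    rng = random.Random(seed)
    log_term = sum(0.5 * math.log(2 * math.pi * s * s) for s in sigmas)
    quad_sum = 0.0
    for _ in range(num_samples):
        for s in sigmas:
            x = rng.gauss(0.0, s)  # draw x_i ~ N(0, sigma_i^2)
            quad_sum += x * x / (2 * s * s)
    return quad_sum / num_samples + log_term


def entropy_difference(sigmas, sigmas_tilde):
    """H(z) - H(z~) for Gaussian z, z~ using the closed form.

    Both quadratic expectations equal N/2 and cancel, leaving
    0.5 * sum_i log(sigma_i^2 / sigma_tilde_i^2).
    """
    return sum(0.5 * math.log((s * s) / (t * t))
               for s, t in zip(sigmas, sigmas_tilde))


if __name__ == "__main__":
    sigmas = [0.5, 1.0, 2.0]
    print("closed form :", entropy_closed_form(sigmas))
    print("Monte Carlo :", entropy_monte_carlo(sigmas))
    print("difference  :", entropy_difference(sigmas, [1.0, 1.0, 1.0]))
```

With `sigmas = [0.5, 1.0, 2.0]`, the closed form equals $1.5\log(2\pi e)$ because the log-variances of 0.5 and 2.0 cancel, and the Monte Carlo estimate should agree with it to within sampling error (typically about 0.01 for the default sample count).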
$$
H(\mathbf{z}) - H(\tilde{\mathbf{z}}) > \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(\mathbb{E}_{z_i\sim p(z_i)}\left[z_i^2\right] - \mathbb{E}_{\tilde{z}_i\sim p(\tilde{z}_i)}\left[\tilde{z}_i^2\right]\right) + \frac{1}{2}\sum_{i=1}^{N}\log\left(\frac{\sigma_i^2}{\tilde{\sigma}_i^2}\right), \tag{36}
$$

where $\sigma_i^2$ and $\tilde{\sigma}_i^2$ are the variances of $z_i$ and $\tilde{z}_i$, respectively, and $\sigma^2$ is the maximum value among all $\sigma_i^2$ and $\tilde{\sigma}_i^2$.

We observe that the second term of (36), which depends only on the variance ratios $\sigma_i^2/\tilde{\sigma}_i^2$, is constant. In particular, when $\tilde{\mathbf{z}}$ is close to $\mathbf{z}$, the second term becomes zero. Therefore, we can drop the second term during training and consider only the first term of (36). With the Monte Carlo method, replacing the expectations by sample values and absorbing the positive factor $1/(2\sigma^2)$, the entropy difference can be written as

$$
H(\mathbf{z}) - H(\tilde{\mathbf{z}}) \approx \|\mathbf{z}\|_2^2 - \|\tilde{\mathbf{z}}\|_2^2. \tag{37}
$$

Considering the computational and training complexity, we relax (37) to the difference between $\mathbf{z}$ and $\tilde{\mathbf{z}}$, which gives (15).

APPENDIX B
VISUALIZED RESULTS

In Fig. 9(a)–(c), we observe that the proposed HDA-DeepSC restores more details than BPG with LDPC and 16QAM, e.g., the mouth and feathers of the parrot, because its digital transmitter delivers the essential semantic information accurately.

REFERENCES

[1] H. Xie, Z. Qin, Z. Han, and K. B. Letaief, “Hybrid digital-analog joint semantic-channel coding for image transmission,” in Proc. IEEE Global Commun. Conf., Cape Town, South Africa, Dec. 2024, pp. 1–6.
[2] C.-X. Wang et al., “On the road to 6G: Visions, requirements, key technologies, and testbeds,” IEEE Commun. Surveys Tuts., vol. 25, no. 2, pp. 905–974, 2nd Quart., 2023.
[3] Z. Qin, X. Tao, J. Lu, W. Tong, and G. Ye Li, “Semantic communications: Principles and challenges,” 2021, arXiv:2201.01389.
[4] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021.
[5] P. Yi, Y. Cao, X. Kang, and Y.-C. Liang, “Deep learning-empowered semantic communication systems with a shared knowledge base,” IEEE Trans. Wireless Commun., vol. 23, no. 6, pp. 6174–6187, Jun. 2024.
[6] Z. Weng, Z. Qin, X. Tao, C. Pan, G. Liu, and G. Y. Li, “Deep learning enabled semantic communications with speech recognition and synthesis,” IEEE Trans. Wireless Commun., vol. 22, no. 9, pp. 6227–6240, Sep. 2023.
[7] E. Grassucci, C. Marinoni, A. Rodriguez, and D. Comminiello, “Diffusion models for audio semantic communication,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Seoul, South Korea, Apr. 2024, p. 13.
[8] T. Han, Q. Yang, Z. Shi, S. He, and Z. Zhang, “Semantic-preserved communication system for highly efficient speech transmission,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 245–259, Jan. 2023.
[9] J. Dai et al., “Nonlinear transform source-channel coding for semantic communications,” IEEE J. Sel. Areas Commun., vol. 40, no. 8, pp. 2300–2316, Aug. 2022.
[10] G. Zhang, Q. Hu, Z. Qin, Y. Cai, G. Yu, and X. Tao, “A unified multi-task semantic communication system for multimodal data,” IEEE Trans. Commun., vol. 72, no. 7, pp. 4101–4116, Jul. 2024.
[11] H. Wu, Y. Shao, E. Ozfatura, K. Mikolajczyk, and D. Gündüz, “Transformer-aided wireless image transmission with channel feedback,” IEEE Trans. Wireless Commun., vol. 23, no. 9, pp. 11904–11919, Sep. 2024.
[12] S. Wang et al., “Wireless deep video semantic transmission,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 214–229, Jan. 2023.
[13] T.-Y. Tung, D. B. Kurka, M. Jankowski, and D. Gündüz, “DeepJSCC-Q: Constellation constrained deep joint source-channel coding,” IEEE J. Sel. Areas Inf. Theory, vol. 3, no. 4, pp. 720–731, Dec. 2022.
[14] Y. Bo, Y. Duan, S. Shao, and M. Tao, “Joint coding-modulation for digital semantic communications via variational autoencoder,” IEEE Trans. Commun., vol. 72, no. 9, pp. 5626–5640, Sep. 2024.
[15] Y. He, G. Yu, and Y. Cai, “Rate-adaptive coding mechanism for semantic communications with multi-modal data,” IEEE Trans. Commun., vol. 72, no. 3, pp. 1385–1400, Mar. 2024.
[16] L. Guo, W. Chen, Y. Sun, and B. Ai, “Device-edge digital semantic communication with trained non-linear quantization,” in Proc. IEEE 97th Veh. Technol. Conf. (VTC-Spring), Jun. 2023, pp. 1–5.
[17] C. Liu, C. Guo, Y. Yang, W. Ni, and T. Q. S. Quek, “OFDM-based digital semantic communication with importance awareness,” 2024, arXiv:2401.02178.
[18] Q. Fu et al., “Vector quantized semantic communication system,” IEEE Wireless Commun. Lett., vol. 12, no. 6, pp. 982–986, Jun. 2023.
[19] Q. Hu, G. Zhang, Z. Qin, Y. Cai, G. Yu, and G. Y. Li, “Robust semantic communications with masked VQ-VAE enabled codebook,” IEEE Trans. Wireless Commun., vol. 22, no. 12, pp. 8707–8722, Dec. 2023.
[20] H. Gao, G. Yu, and Y. Cai, “Adaptive modulation and retransmission scheme for semantic communication systems,” IEEE Trans. Cognit. Commun. Netw., vol. 10, no. 1, pp. 150–163, Feb. 2024.
[21] J. Huang, K. Yuan, C. Huang, and K. Huang, “D2-JSCC: Digital deep joint source-channel coding for semantic communications,” 2024, arXiv:2403.07338.
[22] U. Mittal and N. Phamdo, “Hybrid digital-analog (HDA) joint source-channel codes for broadcasting and robust communications,” IEEE Trans. Inf. Theory, vol. 48, no. 5, pp. 1082–1102, May 2002.
[23] T. Fujihashi, T. Koike-Akino, and T. Watanabe, “Soft delivery: Survey on a new paradigm for wireless and mobile multimedia streaming,” ACM Comput. Surv., vol. 56, no. 2, pp. 1–37, Sep. 2023.
[24] M. Skoglund, N. Phamdo, and F. Alajaji, “Hybrid digital–analog source–channel coding for bandwidth compression/expansion,” IEEE Trans. Inf. Theory, vol. 52, no. 8, pp. 3757–3763, Aug. 2006.
[25] M. Rüngeler, J. Bunte, and P. Vary, “Design and evaluation of hybrid digital-analog transmission outperforming purely digital concepts,” IEEE Trans. Commun., vol. 62, no. 11, pp. 3983–3996, Nov. 2014.
[26] E. Köken and E. Tuncel, “On robustness of hybrid digital/analog source-channel coding with bandwidth mismatch,” IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 4968–4983, Sep. 2015.
[27] T. Fujihashi, T. Koike-Akino, T. Watanabe, and P. V. Orlik, “HoloCast+: Hybrid digital-analog transmission for graceful point cloud delivery with graph Fourier transform,” IEEE Trans. Multimedia, vol. 24, pp. 2179–2191, 2022.
[28] J. A. Hart, The Economics, Technology and Content of Digital TV. Boston, MA, USA: Springer, 2004.
[29] L. Yu, H. Li, and W. Li, “Wireless scalable video coding using a hybrid digital-analog scheme,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 2, pp. 331–345, Feb. 2014.
[30] C. Lan, C. Luo, W. Zeng, and F. Wu, “A practical hybrid digital-analog scheme for wireless video transmission,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 7, pp. 1634–1647, Jul. 2018.
[31] B. Tan, J. Wu, R. Wang, W. Luo, and J. Liu, “An optimal resource allocation for hybrid digital–analog with combined multiplexing,” IEEE Internet Things J., vol. 6, no. 1, pp. 1125–1135, Feb. 2019.
[32] P. Yahampath, “Video coding for OFDM systems with imperfect CSI: A hybrid digital–analog approach,” Signal Process., Image Commun., vol. 87, Sep. 2020, Art. no. 115903.
[33] Z. Liu et al., “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 9992–10002.
[34] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” in Proc. Int. Conf. Learn. Represent., Vancouver, BC, Canada, Apr. 2018.
[35] H. Xie, Z. Qin, X. Tao, and K. B. Letaief, “Task-oriented multi-user semantic communications,” IEEE J. Sel. Areas Commun., vol. 40, no. 9, pp. 2584–2597, Sep. 2022.
[36] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. Adv. Neural Inf. Process. Syst., Dec. 2020, pp. 6840–6851.
[37] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Proc. Int. Conf. Learn. Represent., May 2021.
[38] A. Ignatov et al., “PIRM challenge on perceptual image enhancement on smartphones: Report,” in Proc. Eur. Conf. Comput. Vis., Jan. 2019, pp. 315–333.
[39] L. Chi, B. Jiang, and Y. Mu, “Fast Fourier convolution,” in Proc. Adv. Neural Inf. Process. Syst., Dec. 2020, pp. 4479–4488.
[40] S. Jakubczak and D. Katabi, “SoftCast: One-size-fits-all wireless video,” in Proc. ACM SIGCOMM Conf., New York, NY, USA, Aug. 2010, pp. 449–450.
[41] X. L. Liu, W. Hu, Q. Pu, F. Wu, and Y. Zhang, “ParCast: Soft video delivery in MIMO-OFDM WLANs,” in Proc. 18th Annu. Int. Conf. Mobile Comput. Netw., Istanbul, Turkey, Aug. 2012, pp. 233–244.

Huiqiang Xie (Member, IEEE) received the B.S. degree from Northwestern Polytechnical University, the M.S. degree from Chongqing University, and the Ph.D. degree from Queen Mary University of London in 2023. From 2023 to 2024, he was a Post-Doctoral Research Associate with The Hong Kong University of Science and Technology, Guangzhou Campus. He is currently an Associate Professor with Jinan University. He received the 2023 IEEE ICC Student Travel Grant, the 2023 IEEE ICC Best Paper Award, and the 2023 IEEE Signal Processing Society Best Paper Award. He was also the Organizing Committee Co-Chair of 2024 EIECT. He is an Associate Editor of Journal of Communications and Networks.

Zhijin Qin (Senior Member, IEEE) is currently an Associate Professor with Tsinghua University, Beijing, China. She was with Imperial College London, London, U.K.; Lancaster University, Lancaster, U.K.; and Queen Mary University of London, London, from 2016 to 2022. Her research interests include semantic communications and sparse signal processing. She was a recipient of the 2017 IEEE GLOBECOM Best Paper Award, 2018 IEEE Signal Processing Society Young Author Best Paper Award, 2021 IEEE Communications Society Signal Processing for Communications Committee Early Achievement Award, 2022 IEEE Communications Society Fred W. Ellersick Prize, and 2023 IEEE ICC Best Paper Award. She was a Guest Editor of IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (JSAC) Special Issue on Semantic Communications and an Area Editor of IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS Series. She was also the Symposium Co-Chair of IEEE GLOBECOM 2020 and 2021. She is an Associate Editor of IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE TRANSACTIONS ON COGNITIVE NETWORKING, and IEEE COMMUNICATIONS LETTERS.

Zhu Han (Fellow, IEEE) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland at College Park, College Park, MD, USA, in 1999 and 2003, respectively. From 2000 to 2002, he was a Research and Development Engineer with JDSU, Germantown, MD, USA. From 2003 to 2006, he was a Research Associate with the University of Maryland at College Park. From 2006 to 2008, he was an Assistant Professor with Boise State University, Boise, ID, USA. He is currently a John and Rebecca Moores Professor with the Electrical and Computer Engineering Department and the Computer Science Department, University of Houston, Houston, TX, USA. His main research targets novel game-theory-related concepts critical to enabling efficient and distributive use of wireless networks with limited resources, wireless resource allocation and management, wireless communications and networking, quantum computing, data science, smart grids, carbon neutralization, and security and privacy. He received an NSF CAREER Award in 2010, the Fred W. Ellersick Prize of the IEEE Communications Society in 2011, the Best Paper Award of the EURASIP Journal on Advances in Signal Processing in 2015, the IEEE Leonard G. Abraham Prize in the field of Communications Systems (Best Paper Award in IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS) in 2016, the IEEE Vehicular Technology Society 2022 Best Land Transportation Paper Award, and several best paper awards in IEEE conferences. He was an IEEE Communications Society Distinguished Lecturer from 2015 to 2018 and an ACM Distinguished Speaker from 2022 to 2025. He has been an AAAS Fellow since 2019 and an ACM Fellow since 2024. He has been a 1% Highly Cited Researcher since 2017 according to Web of Science. He is also the Winner of the 2021 IEEE Kiyo Tomiyasu Award (an IEEE Field Award), for outstanding early to mid-career contributions to technologies holding the promise of innovative applications, with the following citation: “For contributions to game theory and distributed management of autonomous communication networks.”

Khaled B. Letaief (Fellow, IEEE) received the B.S. degree (Hons.) in electrical engineering from Purdue University at West Lafayette, IN, USA, in December 1984, the M.S. and Ph.D. degrees in electrical engineering from Purdue University in 1986 and 1990, respectively, and the Ph.D. Honoris Causa degree from the University of Johannesburg, South Africa, in 2022. He is an internationally recognized leader in wireless communications and networks. He is a member of United States National Academy of Engineering, a fellow of Hong Kong Institution of Engineers, a member of India National Academy of Sciences, and a member of Hong Kong Academy of Engineering Sciences. He is also recognized by Thomson Reuters as an ISI Highly Cited Researcher and was listed among the 2020 top 30 of AI 2000 Internet of Things Most Influential Scholars. He was a recipient of many distinguished awards and honors, including the 2022 IEEE Communications Society Edwin Howard Armstrong Achievement Award, 2021 IEEE Communications Society Best Survey Paper Award, 2019 IEEE Communications Society and Information Theory Society Joint Paper Award, and 2016 IEEE Marconi Prize Paper Award in Wireless Communications. He has also been a dedicated teacher committed to excellence in teaching and scholarship. He received the Michael G. Gale Medal for Distinguished Teaching (the highest university-wide teaching award, with only one recipient honored per year for his/her contributions). Since 1993, he has been with The Hong Kong University of Science and Technology (HKUST), where he has held many administrative positions, including the Acting Provost, the Head of the Electronic and Computer Engineering Department, and the Director of Hong Kong Telecom Institute of Information Technology. While at HKUST, he was the Chair Professor and the Dean of Engineering. He is well recognized for his dedicated service to professional societies and IEEE, where he has served in many leadership positions. These include the Founding Editor-in-Chief of the prestigious IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS. He also served as the President of the IEEE Communications Society (2018–2019), the world’s leading organization for communications professionals with headquarters in New York City and members in 162 countries. He also served as a member of the IEEE Board of Directors.