Deep Reinforcement Learning-based Resource
Allocation and Mode Selection for Semantic
Communication
Hyeonho Noh∗, Sojeong Park†, and Hyun Jong Yang∗
∗Department of Electrical and Computer Engineering, Seoul National University, Korea
†Department of Electrical Engineering, Pohang University of Science and Technology, Korea
Abstract—In this paper, we aim to solve the joint resource allocation and mode selection problem, in which an agent adaptively allocates communication users to appropriate resource units and toggles between bit and semantic transmission modes while determining the number of transmitted semantic symbols in semantic communication mode. Specifically, in contrast to the common yet unrealistic assumption of prior research, which posits limitless data transmission over infinite periods, we focus on unsaturated traffic conditions, where users transmit a finite amount of data within restricted time frames. To evaluate the efficiency of data transmission in the semantic domain under unsaturated traffic conditions, we propose a short-term semantic transmission rate (SR) as an evaluation metric of the joint problem. Under these unsaturated traffic scenarios, the challenge is combinatorial: resource allocation, transmission mode selection, and symbol lengths must be optimized simultaneously across the time-frequency axis. The high degree of complexity and the large number of unknown variables make this problem difficult for conventional optimization techniques to solve effectively. In response, we propose a deep reinforcement learning-based method that, in each time step, allocates users to resource units, determines the transmission mode, and selects the data size according to the communication environment and the users’ packet states. Extensive experiments demonstrate superior performance over conventional schemes in terms of semantic transmission performance.
Index Terms—Semantic communication, Resource allocation, Deep reinforcement learning, Semantic rate, Mode selection

I. INTRODUCTION
In beyond 5G and 6G, wireless communication must serve many more user equipments (UEs) with larger amounts of data, resulting in a shortage of frequency spectrum [1], [2]. However, traditional wireless communication has primarily focused on the transmission and reception of data without comprehending its actual content [3], [4]. As a result, the amount of data that can be transmitted is strictly limited by the frequency spectrum in use.
To address the frequency spectrum shortage problem in conventional communication, task-oriented semantic communication, which can surpass the Shannon capacity in terms of performing specific tasks, has been proposed and is under active research [3], [5]–[10]. Semantic communication extracts, compresses, and transmits features relevant to the intended task from data, rather than transmitting the raw data itself. Thus, semantic communication employs lossy data compression, but it excels in task performance efficiency [11].
In the field of text transmission, semantic communication models like DeepSC [11] have demonstrated excellent performance. However, they maintain a fixed transmission symbol size regardless of channel state information (CSI), analogous to keeping the coding rate and modulation fixed in conventional communication. To exploit the benefits of channel diversity, a resource allocation (RA) model that combines channel assignment and transmission volume control of semantic symbols was proposed [12]. Specifically, the authors defined the spectral efficiency of semantic communication under the assumption that infinitely many sentences are transmitted over very long transmission times [12]–[14]. However, this assumption does not align with real-world scenarios, where user traffic tends to be unsaturated, meaning that transmission time and packet lengths are bounded by strict limitations [15].
This paper goes beyond this assumption by addressing the joint RA and mode selection (MS) problem in unsaturated traffic scenarios, where UEs participate in uplink communication while holding data of different sizes and numbers. The main contributions are as follows:
• Building on the definition of semantic spectral efficiency in a long-term perspective, we propose a short-term semantic transmission rate (SR) to evaluate the data transmission rate in unsaturated traffic conditions. The SR reflects more realistic communication scenarios, where the frame length is strictly limited and the length of data varies.
• Under the definition of SR, the performance superiority between bit communication and semantic communication changes depending on the signal-to-noise ratio (SNR) and the data size. Therefore, we propose a joint RA and MS problem that dynamically allocates UEs to resource units (RUs) in the frequency domain, adaptively selects the transmission mode between bit and semantic communication, and determines the number of transmitted semantic symbols for semantic communication.
• To solve the proposed RA and MS optimization problem
ISBN 978-3-903176-65-2 © 2024 IFIP 1
Authorized licensed use limited to: WUHAN UNIVERSITY OF TECHNOLOGY. Downloaded on February 09,2026 at 07:53:50 UTC from IEEE Xplore. Restrictions apply.

while considering both UEs’ SNR and data size, which is an intractable problem due to its combinatorial aspect [16], we propose an algorithm based on deep reinforcement learning (DRL), which has proven to be a powerful tool for solving complex resource management problems in recent years [5], [17], [18].

Fig. 1. The proposed deep reinforcement learning-based RA and MS protocol.

As a case study, we evaluate the proposed DRL-based RA and MS algorithm in the field of text transmission. Our results demonstrate that the proposed DRL-based RA and MS algorithm can achieve superior performance in terms of sentence similarity [11], [12], [19], [20] over various conventional schemes such as DeepSC and bit communication.

II. SYSTEM MODEL AND PROBLEM FORMULATION
A. Scenario
We consider a scenario in which a base station (BS) communicates with $K$ UEs. Given the CSI and the sentences to be transmitted by the UEs, the BS allocates the UEs to $N$ RUs while also selecting the optimal transmission mode, which could be either conventional bit or semantic communication. Additionally, if the BS decides to serve a UE with semantic communication, it needs to determine the number of transmitted semantic symbols. The primary objective of the RA and MS process is to maximize task-specific performance metrics within the predefined packet length for all UEs. The RA and MS process is shown in Fig. 1.

B. Wireless Communication Model
We define $a_{n,k}$ as a binary RU assignment variable such that $a_{n,k} = 1$ if the $k$-th UE is allocated on the $n$-th RU, and $a_{n,k} = 0$ otherwise. Then, we can represent the constraints on the RA as follows:
$$\sum_{n \in \mathcal{N}} a_{n,k} \le 1, \quad \forall k \in \mathcal{K}, \tag{1a}$$
$$\sum_{k \in \mathcal{K}} a_{n,k} \le 1, \quad \forall n \in \mathcal{N}, \tag{1b}$$
where $\mathcal{N} = \{0, 1, \ldots, N-1\}$ and $\mathcal{K} = \{0, 1, \ldots, K-1\}$. Constraint (1a) ensures that each UE is assigned to at most one RU, and constraint (1b) ensures that each RU is occupied by at most one UE.
Let $h_{n,k} \in \mathbb{C}$ denote the uplink communication channel between the BS and the $k$-th UE on the $n$-th RU. Then, the SNR for the $k$-th UE on the $n$-th RU is given by $\Gamma_{n,k} = P_{n,k} |h_{n,k}|^2 / \sigma^2$, where $P_{n,k}$ is the transmit power of the $k$-th UE on the $n$-th RU, and $\sigma^2$ is the noise variance.

C. Text Transmission Performance
Many researchers rely on a specific, well-developed large language model, known as bi-directional encoder representations from transformers (BERT) [21], to measure how accurately the semantic information is transmitted in the text transmission field [11], [12], [19], [20]. In this paper, we adopt the sentence similarity [12], which is defined by
$$F(s, \hat{s}) = \frac{B(s)\, B(\hat{s})^{T}}{\|B(s)\|\, \|B(\hat{s})\|}, \tag{2}$$
where $B(s)$ represents the output embedding vector of the BERT model for a sentence $s$. We leverage a pre-trained BERT model to compute the sentence similarity. Note that from the similarity definition in (2), we have $0 \le F(s, \hat{s}) \le 1$, with 1 indicating the highest similarity and 0 indicating no relationship between two sentences.

D. Definition of Semantic Rate
With the definition of sentence similarity, SR is proposed in [12] for measuring the semantic information transmission rate using the BERT model. However, the conventional approach calculates the average value of SR over an infinite frame length when sending a large amount of data, whereas in real communication environments each user transmits limited data of different sizes. Furthermore, all users must transmit data within a predetermined frame length to synchronize the uplink transmission. To address these practical issues, we newly define the SR in this paper.
Let $\mathcal{D}_k = \{s_{j,k} = [w_{j,k,0}, w_{j,k,1}, \ldots, w_{j,k,L_{j,k}-1}]\}_{j=0}^{D_k-1}$ denote the text dataset for the $k$-th UE with size $D_k$, where $s_{j,k}$ is the $j$-th sentence with length $L_{j,k}$ and $w_{j,k,l}$ is the $l$-th word of the $j$-th sentence of the $k$-th UE. In addition, one can define
the amount of semantic information of $s_{j,k}$ as $I_{j,k}$ (suts). Each sentence is transmitted via either bit communication or semantic communication, as shown in Fig. 1. We denote $m_{n,k}$ as the binary transmission mode variable of the $k$-th UE on the $n$-th RU such that $m_{n,k} = 0$ represents bit communication while $m_{n,k} = 1$ means semantic communication.
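To make the similarity measure in (2) concrete before the encoders are described, the following is a minimal sketch of the cosine similarity between two embedding vectors. The function name `sentence_similarity` and the toy vectors are illustrative assumptions; in the paper the embeddings $B(s)$ come from a pre-trained BERT model, which is not reproduced here.

```python
import math


def sentence_similarity(emb_a, emb_b):
    """Cosine similarity between two embedding vectors, following (2).

    emb_a and emb_b stand in for B(s) and B(s_hat); here they are plain
    lists of floats (an assumption made for illustration only).
    """
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors, not real BERT outputs.
    sent = [0.2, 0.7, 0.1]
    decoded = [0.25, 0.6, 0.15]
    print(sentence_similarity(sent, decoded))
```

For arbitrary real vectors this quantity can be negative, so the sketch does not by itself enforce the range stated with (2); that range is a property the paper states for the BERT-based similarity.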
In bit communication, the transmitter protects information from impairments such as noise or distortion by performing rate adaptation through source coding and channel coding based on the current SNR $\Gamma_{n,k}$. In the case of semantic communication, successful transmission of semantic information is guaranteed by extracting semantic information and compressing the sentence length to $c_{n,k}$ according to the SNR $\Gamma_{n,k}$ through semantic encoding and channel encoding.
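As a rough illustration of how the dimension $c_{n,k}$ interacts with the frame length, the sketch below counts the semantic symbols produced when every word is mapped to $c$ symbols and keeps the values of $c$ that satisfy the frame-length condition $\sum_j c_{n,k} L_{j,k} \le L_{\text{frame}}$ given later in this section. The function names and the candidate set of dimensions are hypothetical; only the frame length of 1024 is taken from the experimental setup.

```python
def semantic_symbols(sentence_lengths, c):
    """Total number of symbols if each word is packed into c symbols."""
    return c * sum(sentence_lengths)


def feasible_dims(sentence_lengths, frame_len, candidates):
    """Return the candidate dimensions c whose symbol count fits in one frame."""
    return [c for c in candidates
            if semantic_symbols(sentence_lengths, c) <= frame_len]


if __name__ == "__main__":
    lengths = [10, 20, 30]        # words per sentence held by one UE (toy data)
    candidates = [4, 8, 16, 32]   # hypothetical choices of c, not from the paper
    print(feasible_dims(lengths, frame_len=1024, candidates=candidates))
```

Under this reading, a UE holding more or longer sentences is restricted to smaller values of $c$, which is why the choice of $c_{n,k}$ depends on the data size as well as on the SNR.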
The encoded symbol stream can then be represented by
$$x = \begin{cases} C_{bc}(s; \Gamma_{n,k}, m_{n,k}), & \text{if } m_{n,k} = 0, \\ C_{sc}(s; \Gamma_{n,k}, c_{n,k}, m_{n,k}, \beta), & \text{if } m_{n,k} = 1, \end{cases} \tag{3}$$
where $C_{sc}$ includes semantic encoding and channel encoding, while $C_{bc}$ includes source encoding, channel encoding, and modulation, and $\beta$ is the parameter set of the semantic and channel encoder networks. If $x$ is sent, the signal received at the receiver will be $y = hx + z$, where $z$ is the additive white Gaussian noise (AWGN) that follows $\mathcal{CN}(0, \sigma^2 I)$. With the received signal, the decoded sentence can be represented as
$$\hat{s} = \begin{cases} C_{bc}^{-1}(y; \Gamma_{n,k}, m_{n,k}), & \text{if } m_{n,k} = 0, \\ C_{sc}^{-1}(y; \Gamma_{n,k}, c_{n,k}, m_{n,k}, \beta), & \text{if } m_{n,k} = 1, \end{cases} \tag{4}$$
where the inverse operation for $C$ means the reverse process of $C$. Finally, the SR (suts/s) on the $n$-th RU for the $k$-th UE is defined by
$$\phi(\mathcal{D}_k; \Gamma_{n,k}, c_{n,k}, m_{n,k}) = \frac{\sum_{j=0}^{D_k-1} W I_{j,k} \cdot F(s_{j,k}, \hat{s}_{j,k})}{L_{\text{frame}}}, \tag{5}$$
where $W$ is the bandwidth and $L_{\text{frame}}$ is the frame length.
Note that the sentence similarity heavily depends on the design of $C_{sc}$ and $C_{bc}$. In bit communication, the design of $C_{bc}$ is standardized according to the SNR $\Gamma$. Then, it must be satisfied that $\sum_{j=0}^{D_k-1} \hat{L}_{j,k} \le L_{\text{frame}}$, where $\hat{L}_{j,k}$ is the length of $C_{bc}(s_{j,k}; \Gamma_{n,k})$. In semantic communication, the optimal channel coding dimension with respect to SNR has not been thoroughly surveyed. Thus, we define the channel coding dimension of semantic communication for the $k$-th UE on the $n$-th RU as $c_{n,k}$. Then, semantic communication transmits each word by packing it with a size of $c_{n,k}$. We determine this value to regulate the number of transmitted semantic symbols. Similar to the approach in bit communication, it is essential to satisfy the condition $\sum_{j=0}^{D_k-1} c_{n,k} L_{j,k} \le L_{\text{frame}}$ for the $k$-th UE on the $n$-th RU.

E. Problem Formulation
From (1) and (5), we formulate the joint RA and MS optimization problem that maximizes the sum of SR (S-SR) as follows:
$$\begin{aligned}
\max_{a, c, m} \quad & \Phi = \sum_{n \in \mathcal{N}} \sum_{k \in \mathcal{K}} a_{n,k}\, \phi(\mathcal{D}_k; \Gamma_{n,k}, c_{n,k}, m_{n,k}), && \text{(6a)} \\
\text{s.t.} \quad & \text{(1a), (1b)}, && \text{(6b)} \\
& \sum_{j=0}^{D_k-1} c_{n,k} L_{j,k} \le L_{\text{frame}}, \quad \forall n \in \mathcal{N}, \forall k \in \mathcal{K}, && \text{(6c)} \\
& \sum_{j=0}^{D_k-1} \hat{L}_{j,k} \le L_{\text{frame}}, \quad \forall n \in \mathcal{N}, \forall k \in \mathcal{K}, && \text{(6d)} \\
& c_{n,k} \in \mathbb{N}, \quad \forall n \in \mathcal{N}, \forall k \in \mathcal{K}, && \text{(6e)} \\
& a_{n,k}, m_{n,k} \in \{0, 1\}, \quad \forall n \in \mathcal{N}, \forall k \in \mathcal{K}, && \text{(6f)}
\end{aligned}$$
where $a$, $c$, and $m$ are the sets of all variables $a_{n,k}$, $c_{n,k}$, and $m_{n,k}$ for $n \in \mathcal{N}$ and $k \in \mathcal{K}$, respectively. Clearly, due to its nonconcave aspect, it is intractable to solve the RA and MS optimization problem [16].

III. PROPOSED DRL-BASED RA OPTIMIZATION
A. Proposed DRL structure
We propose a DRL structure consisting of an agent, which performs RA and MS based on the SNR and the data size. If the allocated UE uses semantic communication, the dimension of the channel encoder and decoder $c_{n,k}$, i.e., the number of symbols for each word, is selected to maximize the S-SR $\Phi$ in (6). We obtain the solution by precomputing $\Phi$ for all possible $c_{n,k}$ and organizing the results into an SR table, as shown in Fig. 2. In the case where the agent chooses bit communication for data transmission, the sentence is conveyed using the conventional bit communication protocol.

Fig. 2. Semantic rate table according to SNR and data size $c_{n,k}$.

B. Definitions of Parameters in DRL
Here, we define the result of RA and MS, whether it is bit communication or semantic communication, as an action. The BS selects actions corresponding to each RU index at each time step based on the current state. Therefore, one can set $t \in \mathcal{N}$. Then, the state space, action space, and reward functions of the agent are defined below.
State Space: The state includes the CSI and dataset to
transmit of the UEs, which is defined as $\tilde{s}_{n,k} = \{\Gamma_{n,k}, \mathcal{D}_k\}$. Additionally, the initial state for all RUs and all UEs is defined as $S_0 = \bigcup_{n \in \mathcal{N}} \bigcup_{k \in \mathcal{K}} \tilde{s}_{n,k}$. When the $k$-th UE is selected as an action during the DRL procedure, we set $\Gamma_{n,k} = -1$ for all $n$ to mark it as an unavailable option.
Action Space: The action is defined by $a_t \in \mathcal{A}$, which represents the result of RA and MS on the $t$-th RU. Thus, we can represent the action as $a_t = \{(k, m_{t,k}) \mid a_{t,k} = 1, \forall k \in \mathcal{K}\}$.
Reward Function: We define the reward function of the agent as $r_t = \sum_{k \in \mathcal{K}} a_{t,k}\, \phi_{t,k}$.

TABLE I
THE S-SR COMPARISON OF THE PROPOSED AND CONVENTIONAL METHODS WITH RANDOM SNR AND RANDOM NUMBER OF SENTENCES.

          Random + BC   Random + SC   Max-SNR + BC   Max-SNR + SC
S-SR      1,776         2,464         2,169          2,498

          DRL + BC      DRL + SC      Proposed
S-SR      2,374         3,091         3,113
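The state, action, and reward definitions above can be sketched as a small environment in which one decision is taken per RU. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `RaMsEnv`, its method names, and the callable `sr_fn` (a stand-in for looking up the SR $\phi$ of a UE, for example from a precomputed SR table) are all hypothetical.

```python
UNAVAILABLE = -1.0  # marker used in the text: SNR set to -1 once a UE is served


class RaMsEnv:
    """Toy RA/MS environment: time step t is the index of the RU being decided."""

    def __init__(self, snr, data_size):
        # snr[n][k]: SNR of UE k on RU n (assumed non-negative when valid).
        # data_size[k]: number of sentences held by UE k.
        self.snr = [list(row) for row in snr]
        self.data_size = list(data_size)
        self.t = 0

    def valid_ues(self):
        """UEs that can still be selected on the current RU."""
        return [k for k, g in enumerate(self.snr[self.t]) if g != UNAVAILABLE]

    def step(self, k, m, sr_fn):
        """Serve UE k on RU t in mode m (0 = bit, 1 = semantic).

        Returns (reward, done). With one UE per RU, the reward
        r_t = sum_k a_{t,k} * phi_{t,k} reduces to the SR of the chosen UE.
        """
        if k not in self.valid_ues():
            raise ValueError("UE %d is not available on RU %d" % (k, self.t))
        reward = sr_fn(self.snr[self.t][k], self.data_size[k], m)
        for row in self.snr:  # mask the served UE on every RU
            row[k] = UNAVAILABLE
        self.t += 1
        return reward, self.t == len(self.snr)


if __name__ == "__main__":
    # Hypothetical SR model used only to exercise the environment.
    toy_sr = lambda snr, size, m: snr * size if m == 1 else snr
    env = RaMsEnv(snr=[[3.0, 9.0], [12.0, 6.0]], data_size=[2, 5])
    print(env.step(k=1, m=1, sr_fn=toy_sr))
    print(env.valid_ues())
```

Masking the served UE through the SNR entries keeps the state representation fixed in size, which is convenient when the state is fed to a Q-network with a fixed input dimension.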
C. DRL Training Process
Initialization: We introduce the Deep Q-network (DQN) [22] as the learning framework of the agent. Thus, we utilize a parameter $\theta$ that defines an action-value function $Q(S, a; \theta)$ for the agent. In addition, we initialize the replay memory $\mathcal{E}$ for the agent to capacity $E$.
Experience collection: At each time step $t$, the agent iteratively collects experience by selecting actions. Each action is drawn in an epsilon-greedy fashion with linear decay, i.e., $\epsilon(e) = \max\{1 - e/Z, 0.01\}$, where $Z$ is the decaying rate constant and $e$ is the episode step. The agent selects a random action $a_t$ with probability $\epsilon(e)$ and selects $a_t = \arg\max_a Q(S_t, a; \theta)$ otherwise. The agent stores the transition $(S_t, a_t, r_t, S_{t+1})$ of each time step in $\mathcal{E}$.
Updating model parameters: With the stored experiences in the replay memory, the agent updates the learning parameter $\theta$. To do so, the agent samples a random mini-batch of $B$ transitions $(S_j, a_j, r_j, S_{j+1})$ from $\mathcal{E}$. We set $y_j = r_j$ if $S_{j+1}$ is a terminal state and $y_j = r_j + \gamma \max_a Q(S_{j+1}, a; \theta)$ otherwise. Then, we get the training loss $J(\theta) = \sum_j (y_j - Q(S_j, a_j; \theta))^2 / B$. The agent performs a gradient descent step on $J(\theta)$ and updates $\theta$.

IV. SIMULATION RESULTS
To evaluate the performance of the proposed DRL-based RA and MS algorithm in a scenario where both semantic and bit communication are available, we have conducted simulations with the proposed DRL algorithm and baseline methods.

A. Experimental Setup
We adopt the dataset named European Parliament Proceedings Parallel Corpus [23]. It includes around 2.0 million sentences and 53 million words. We sample 200,000 sentences from the dataset and divide them into a training dataset and a test dataset. In addition, we collect sentences with lengths of 4 to 30.
We examine baselines in RA methods and communication types. In RA methods, we investigate two methods: random and max-SNR [24], [25]. The random method chooses UEs regardless of SNR and data size, while the max-SNR method prioritizes UEs based solely on SNR. In terms of communication types, semantic communication-based and bit communication-based systems are considered. In the semantic communication-based system, we refer to it as “DeepSC” [11] when the channel coding dimension is fixed at eight and “Semantic” when the channel coding dimension is optimized according to SNR. In the bit communication-based system, we adopt Huffman coding as the source coding and low-density parity check (LDPC) coding as the channel coding. We follow the 5G standard in terms of coding rate and modulation, and [26] to get the modulation and coding scheme index according to SNR.
We set the bandwidth $W = 180$ kHz and the frame length $L_{\text{frame}} = 1024$. We assume that the amounts of semantic information of all sentences are equivalent, i.e., $I_{j,k} = 1$ for all $(j, k)$. In all experiments, the number of users is set to 5, and the number of resource blocks is fixed at 5 3.

B. Result Analysis
We first conduct a comparative analysis between the conventional and proposed schemes in a scenario involving randomly varying data sizes ranging from 1 to 10 and SNR levels distributed uniformly between 3 dB and 15 dB; the results are presented in Table I. From the result, we conclude that the proposed DRL-based method achieves the highest S-SR among all compared methods.
In the following, we assess the S-SR of the bit communication only, semantic communication only, and proposed schemes with the DRL method across different numbers of sentences, as shown in Fig. 3, to ascertain the influence of MS. When a UE sends a relatively small number of sentences, it can achieve a higher S-SR with bit communication because the data can be sent reliably within the frame length. However, when sending a large number of sentences, compressing sentences into semantic information and transmitting them proves to be much more effective. Thus, the proposed method, which allows users to flexibly choose between the two modes of bit and semantic communication based on the data size, achieves the highest S-SR compared to the other two communication techniques.
Fig. 4 shows the S-SR of the proposed and conventional methods for different SNRs. In a low SNR environment, the S-SR of bit communication deteriorates due to the failure of complete restoration of data. In contrast, semantic communication provides a significantly better S-SR in low SNR conditions; however, it shows a slightly lower S-SR compared to bit communication when the SNR is 9 dB or higher. While semantic communication experiences some loss in S-SR performance due to lossy compression, bit communication achieves better performance in high SNR
environments due to its precise data reconstruction. However, the proposed method outperforms all baseline methods across the entire SNR range by adaptively selecting the optimal transmission mode.

Fig. 3. The S-SR comparison of the proposed and conventional methods with respect to the number of sentences. AWGN channel with a uniform distribution of SNR from 3 dB to 15 dB is considered.

Fig. 4. The S-SR comparison of the proposed and conventional methods with respect to SNR. The number of sentences each UE possesses is two.

V. CONCLUSION
We proposed a DRL-based algorithm for optimizing joint RA and MS, effectively allocating UEs to RUs and determining the optimal transmission mode between semantic and bit-based communication. Our approach dynamically adjusts the number of transmitted semantic symbols, addressing the complexity of unsaturated traffic conditions. Experiments show superior performance over traditional schemes like DeepSC and bit communication, particularly in terms of sentence similarity. Future work will focus on refining the definition and quantification of semantic information in sentence data and expanding the framework to more complex network scenarios. This will enhance the system’s adaptability and efficiency, paving the way for more intelligent semantic communication solutions in evolving wireless networks.

VI. ACKNOWLEDGEMENT
This work was supported in part by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00161, 6G MIMO System Research), in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00250191), and in part by the New Faculty Startup Fund from Seoul National University.

REFERENCES
[1] Deniz Gündüz et al., “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 5–41, 2023.
[2] Yalin E. Sagduyu, Sennur Ulukus, and Aylin Yener, “Task-oriented communications for nextG: End-to-end deep learning and ai security aspects,” IEEE Wireless Commun., vol. 30, no. 3, pp. 52–60, 2023.
[3] Wanting Yang et al., “Semantic communications for future internet: Fundamentals, applications, and challenges,” IEEE Commun. Surv. Tut., vol. 25, no. 1, pp. 213–250, 2023.
[4] Christina Chaccour, Walid Saad, Mérouane Debbah, Zhu Han, and H. Vincent Poor, “Less data, more knowledge: Building next generation semantic communication networks,” IEEE Commun. Surveys Tuts., pp. 1–1, 2024.
[5] Haijun Zhang et al., “DRL-driven dynamic resource allocation for task-oriented semantic communication,” IEEE Trans. Commun., vol. 71, no. 7, pp. 3992–4004, 2023.
[6] Hongwei Zhang et al., “Deep learning-enabled semantic communication systems with task-unaware transmitter and dynamic data,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 170–185, 2023.
[7] Ke Yang et al., “WITT: A wireless image transmission transformer for semantic communications,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2023, pp. 1–5.
[8] Huiqiang Xie, Zhijin Qin, and Geoffrey Ye Li, “Semantic communication with memory,” IEEE J. Sel. Areas Commun., vol. 41, no. 8, pp. 2658–2669, 2023.
[9] Guangming Shi et al., “From semantic communication to semantic-aware networking: model, architecture, and open problems,” IEEE Commun. Magazine, vol. 59, no. 8, pp. 44–50, 2021.
[10] Xuewen Luo, Hsiao-Hwa Chen, and Qing Guo, “Semantic communications: Overview, open issues, and future research directions,” IEEE Wireless Commun., vol. 29, no. 1, pp. 210–219, 2022.
[11] Huiqiang Xie, Zhijin Qin, Geoffrey Ye Li, and Biing-Hwang Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021.
[12] Lei Yan, Zhijin Qin, Rui Zhang, Yongzhao Li, and Geoffrey Ye Li, “Resource allocation for text semantic communications,” IEEE Wireless Commun. Lett., vol. 11, no. 7, pp. 1394–1398, 2022.
[13] Xidong Mu et al., “Heterogeneous semantic and bit communications: A semi-noma scheme,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 155–169, 2023.
[14] Xidong Mu and Yuanwei Liu, “Exploiting semantic communication for non-orthogonal multiple access,” IEEE J. Sel. Areas Commun., vol. 41, no. 8, pp. 2563–2576, 2023.
[15] Hyeonho Noh, Harim Lee, and Hyun Jong Yang, “Joint optimization on uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep hierarchical reinforcement learning approach,” IEEE Commun. Lett., pp. 1–5, 2024.
[16] Nan Zhao et al., “Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks,” IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5141–5152, 2019.
[17] Haijun Zhang et al., “Power control based on deep reinforcement learning for spectrum sharing,” IEEE Trans. Wireless Commun., vol. 19, no. 6, pp. 4209–4219, 2020.
[18] Shaoyang Wang et al., “Joint resource management for MC-NOMA: A deep reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 20, no. 9, pp. 5672–5688, 2021.
[19] Zi Qin Liew et al., “Economics of semantic communication system in wireless powered internet of things,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2022, pp. 8637–8641.
[20] Tianxiao Han et al., “Semantic-preserved communication system for highly efficient speech transmission,” IEEE J. Sel. Areas Commun., vol. 41, no. 1, pp. 245–259, 2023.
[21] Matthew E. Peters et al., “Deep contextualized word representations,” in Proc. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Tech., New Orleans, Louisiana, June 2018, pp. 2227–2237.
[22] Volodymyr Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[23] Philipp Koehn, “Europarl: A parallel corpus for statistical machine translation,” in MT summit, 2005, pp. 79–86.
[24] Shengli Liu et al., “Joint user association and resource allocation for wireless hierarchical federated learning with IID and non-IID data,” IEEE Trans. Wireless Commun., vol. 21, no. 10, pp. 7852–7866, 2022.
[25] Amin Abdel Khalek, Constantine Caramanis, and Robert W. Heath, “Delay-constrained video transmission: Quality-driven resource allocation and scheduling,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 1, pp. 60–75, 2015.
[26] Eunmi Chu, Janghyuk Yoon, and Bang Chul Jung, “A novel link-to-system mapping technique based on machine learning for 5G/IoT wireless networks,” Sensors, vol. 19, no. 5, pp. 1196, 2019.