# Resource Allocation for Text Semantic Communications

Lei Yan, Zhijin Qin, Senior Member, IEEE, Rui Zhang, Member, IEEE, Yongzhao Li, Senior Member, IEEE, and Geoffrey Ye Li, Fellow, IEEE

Abstract—Semantic communications have shown great potential to improve transmission reliability, especially in the low signal-to-noise regime. However, resource allocation for semantic communications remains unexplored, although it is critical to guaranteeing both semantic transmission reliability and communication efficiency. To fill this gap, we investigate the spectral efficiency in the semantic domain and rethink the semantic-aware resource allocation issue. Specifically, taking text semantic communication as an example, the semantic spectral efficiency (S-SE) is defined for the first time and is used to optimize resource allocation in terms of channel assignment and the number of transmitted semantic symbols. Additionally, for a fair comparison of semantic and conventional communication systems, a transform method is developed to convert the conventional bit-based spectral efficiency to the S-SE. Simulation results demonstrate the validity and feasibility of the proposed resource allocation method, as well as the superiority of semantic communications in terms of the S-SE.

Index Terms—Semantic communications, semantic spectral efficiency, resource allocation.

# I. INTRODUCTION

WITH growing wireless applications and increasing data traffic, wireless communications are facing the bottleneck of spectrum scarcity, which motivates a paradigm shift from conventional to semantic communications [1], [2]. By focusing on transmitting the meaning of the source, semantic communications have shown great potential to reduce network traffic and thus alleviate the spectrum shortage.
Particularly, different types of semantic systems have been studied for different types of sources, including text [3], [4], image [5], [6], speech [7], and video [8], to ensure significant improvement in semantic transmission reliability. In this context, it is vital to investigate the resource allocation issue for semantic communications to improve the communication efficiency while guaranteeing the transmission reliability [9].

Manuscript received March 5, 2022; revised April 13, 2022; accepted April 21, 2022. Date of publication April 27, 2022; date of current version July 11, 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 61901345, Grant 61901333, and Grant 62001358; in part by the Postdoctoral Science Foundation of China under Grant 2019M663630; in part by the Shaanxi Provincial Key Research and Development Program under Grant 2021ZDLGY04-08, Grant 2022ZDLGY05-03, and Grant 2022ZDLGY05-04; in part by the State Key Laboratory of Integrated Services Network under Grant ISN090105; in part by the 111 Project under Grant B08038; in part by Huawei Technologies Ltd.; and in part by the China Scholarship Council under Grant 202006960013. The associate editor coordinating the review of this article and approving it for publication was D. B. da Costa. (Corresponding authors: Rui Zhang; Yongzhao Li.) Lei Yan, Rui Zhang, and Yongzhao Li are with the State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China (e-mail: lyan@stu.xidian.edu.cn; rz@xidian.edu.cn; yzhli@xidian.edu.cn). Zhijin Qin is with the School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, U.K. (e-mail: z.qin@qmul.ac.uk). Geoffrey Ye Li is with the School of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: geoffrey.li@imperial.ac.uk). Digital Object Identifier 10.1109/LWC.2022.3170849
In wireless communications, how to measure the information content as well as the spectral efficiency (SE) is fundamental to the resource allocation issue. Conventional communications use the bit as the unit of information. However, the bit is not applicable in semantic communications, as bits are produced based on the statistical knowledge of source symbols rather than the semantic information of the source. Therefore, resource allocation needs to be rethought from the semantic perspective. The research on semantic theory has provided some insights on this issue. Carnap and Bar-Hillel [10] first attempted to measure the semantic information in a sentence based on the logical probability. On this basis, the semantic channel capacity was derived in [11] for the discrete memoryless channel, revealing the existence of a semantic coding strategy for reliable communications. Furthermore, semantic coding, the fundamental limits of semantic transmission, and semantic compression were investigated in [12]. However, the aforementioned works are based on abstract models without any hint of practical implementation and fail to quantify the SE in the semantic domain.

Although a complete theory or a well-developed mathematical model for semantic communications is still missing, the success of semantic system design with the aid of deep learning (DL) makes it possible to define a calculable SE in the semantic domain. Particularly, the DL-enabled semantic communication system (DeepSC) [3] and its several variants [4], [13] can effectively extract the semantic information from text and successfully deliver the meaning to the receiver. In this letter, we use DeepSC as an example to explore the SE issue and the resource allocation problem in such a semantic-aware network. The main contributions are as follows:

• A novel resource allocation model is proposed for semantic-aware networks.
Specifically, the semantic spectral efficiency (S-SE) is first defined to measure the communication efficiency from the semantic perspective. Then a new formulation is proposed and solved to maximize the overall S-SE in terms of channel assignment and the number of transmitted semantic symbols.
• To make a fair comparison between semantic and conventional communication systems, a transform method is developed to convert the bit-based SE to the S-SE. Simulation results verify the effectiveness of the proposed resource allocation model, as well as the superiority of semantic communication systems in terms of the S-SE.

The rest of this letter is organized as follows. Section II introduces the system model. Semantic-aware resource allocation is formulated and solved in Section III. Section IV introduces a transform method for fair comparison of semantic and conventional communication systems and presents the simulation results. Section V concludes this letter.

Notation: $\mathbb{R}^{n \times m}$ represents the set of real matrices of size $n \times m$. Bold-font variables represent matrices and vectors. $x \sim \mathcal{CN}(\mu, \sigma^2)$ means that $x$ follows a circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

Fig. 1. The structure of semantic-aware networks.

# II. SYSTEM MODEL

We consider a cellular network consisting of a base station (BS) and a set of users denoted by $\mathcal{N} = \{1, 2, \dots, n, \dots, N\}$, as shown in Fig. 1. DeepSC [3] is adopted as the semantic communication model and deployed at each user for text transmission, where the semantics underlying the text can be effectively extracted by a Transformer. The DeepSC transceiver is assumed to be trained at the BS or on cloud platforms. Then the trained semantic transmitter model is broadcast to users.
In the following, we detail the DeepSC transmitter at the users, the transmission model, and the DeepSC receiver at the BS.

# A. DeepSC Transmitter

In our model, the $n$-th user generates a sentence $\mathbf{s}_n = [w_{n,1}, w_{n,2}, \ldots, w_{n,l}, \ldots, w_{n,L_n}]$, where $w_{n,l}$ denotes the $l$-th word and $L_n$ is the sentence length at the $n$-th user. Then the sentence is fed into the DeepSC transmitter and mapped to a semantic symbol vector $\mathbf{X}_n = [\mathbf{x}_{n,1}, \mathbf{x}_{n,2}, \ldots, \mathbf{x}_{n,k_n L_n}]$, where $\mathbf{X}_n \in \mathbb{R}^{k_n L_n \times 2}$ and $k_n L_n$ is the length of the semantic symbol vector for a sentence at the $n$-th user. Note that the length of $\mathbf{X}_n$ varies with $L_n$ so that the semantic information of sentences with different lengths is extracted more effectively [3]. In such a model, $k_n$ denotes the average number of semantic symbols used for each word at the $n$-th user, and each semantic symbol can be transmitted over the transmission medium directly.

# B. Transmission Model

Let $\mathcal{M} = \{1, 2, \dots, m, \dots, M\}$ denote the set of available channels in the network, where $M$ is the number of channels and each channel has bandwidth $W$. The channel assignment vector of the $n$-th user is denoted as $\boldsymbol{\alpha}_n = [\alpha_{n,1}, \alpha_{n,2}, \ldots, \alpha_{n,m}, \ldots, \alpha_{n,M}]$, where $\alpha_{n,m} \in \{0, 1\}$, with $\alpha_{n,m} = 1$ when the $m$-th channel is allocated to the $n$-th user and $\alpha_{n,m} = 0$ otherwise.
Assuming that each channel can be allocated to at most one user and each user can occupy at most one channel, we have

$$ \sum_{n=1}^{N} \alpha_{n,m} \leq 1, \forall m \in \mathcal{M}; \quad \sum_{m=1}^{M} \alpha_{n,m} \leq 1, \forall n \in \mathcal{N}. \tag{1} $$

In addition, we consider that all channels experience large-scale fading and small-scale Rayleigh fading. The signal-to-noise ratio (SNR) of the $n$-th user over the $m$-th channel is

$$ \gamma_{n,m} = \frac{p_n g_n |h_{n,m}|^2}{W N_0}, \tag{2} $$

where $p_n$ is the transmit power of the $n$-th user, $g_n$ is the large-scale channel gain of the $n$-th user including path loss and shadowing, $h_{n,m} \sim \mathcal{CN}(0, 1)$ is the Rayleigh fading coefficient for the $n$-th user transmitting over the $m$-th channel, and $N_0$ is the noise power spectral density.

# C. DeepSC Receiver

At the BS, the signal from the $n$-th user can be denoted as $\mathbf{Y}_n = \sqrt{g_n} h_{n,m} \mathbf{X}_n + \mathbf{z}$, where $\mathbf{z}$ is additive white Gaussian noise (AWGN) and each element of $\mathbf{z}$ follows $\mathcal{CN}(0, N_0)$. The received signal is decoded first by the channel decoder and then by the semantic decoder to estimate the sentence $\hat{\mathbf{s}}_n$. To evaluate the performance of semantic communications for text transmission, we adopt the semantic similarity [3] as the performance metric,

$$ \xi = \frac{\mathbf{B}(\mathbf{s}) \mathbf{B}(\hat{\mathbf{s}})^{\mathrm{T}}}{\| \mathbf{B}(\mathbf{s}) \| \| \mathbf{B}(\hat{\mathbf{s}}) \|}, \tag{3} $$

where $\mathbf{B}(\cdot)$ denotes the Sentence-Bidirectional Encoder Representations from Transformers (Sentence-BERT) model, which improves considerably on earlier sentence embedding methods. A pre-trained Sentence-BERT model [14] is adopted.
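To make (3) concrete, the following minimal sketch computes the cosine similarity of two embedding vectors. The vectors are hypothetical toy values, not outputs of a Sentence-BERT model, and the function name `semantic_similarity` is chosen here for illustration only.

```python
import math


def semantic_similarity(u, v):
    """Eq. (3): cosine similarity between two sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of a transmitted and a recovered sentence.
sent = [0.2, 0.7, 0.1]
recovered = [0.25, 0.65, 0.05]
print(semantic_similarity(sent, recovered))
```

In a real evaluation the two vectors would come from the pre-trained embedding model applied to $\mathbf{s}$ and $\hat{\mathbf{s}}$; only the final normalization step is shown here.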
Compared with other semantic metrics, such as bilingual evaluation understudy (BLEU) [15], the BERT-based similarity measures the distance of semantic information between two sentences more precisely. From (3), we have $0 \leq \xi \leq 1$, where $\xi = 1$ means that the two sentences have the highest similarity and $\xi = 0$ indicates no similarity between them.

# III. SEMANTIC-AWARE RESOURCE ALLOCATION

In this section, the S-SE is first defined as a new metric for semantic-aware networks. Then the semantic-aware resource allocation is formulated as an S-SE maximization problem in terms of channel assignment and the number of transmitted semantic symbols. Finally, the optimal solution of the optimization problem is obtained.

# A. Semantic Spectral Efficiency

In conventional communications, spectral efficiency is measured in bits per second per Hertz (bits/s/Hz), which can effectively measure the transmission rate of bit sequences but cannot be used to measure the transmission rate of semantic information. This is because the bit sequences are produced based on the statistical knowledge of the source and are irrelevant to the meaning of the source. Thus new performance metrics need to be investigated at the semantic level. For the sake of clarity, we assume that semantic information can be measured by the semantic unit (sut), which represents the basic unit of semantic information.$^1$ Based on this, two crucial semantic-based performance metrics can be defined:

• Semantic transmission rate (S-R) refers to the effectively transmitted semantic information per second and is measured in suts/s.
• Semantic spectral efficiency (S-SE) refers to the rate at which semantic information can be successfully transmitted over a unit of bandwidth, and is measured in suts/s/Hz.

$^1$The semantic unit here is only a concept and does not affect the resource optimization solution, for reasons clarified in Section III-C.
Then the expressions of S-R and S-SE are derived in turn. Denote the text dataset of size $D$ as $\mathcal{D} = \{ \mathbf{s}_j = [w_{j,1}, w_{j,2}, \ldots, w_{j,l}, \ldots, w_{j,L_j}] \}_{j=1}^{D}$, where $\mathbf{s}_j$ is the $j$-th sentence with length $L_j$ and $w_{j,l}$ is its $l$-th word. Let the amount of semantic information of $\mathbf{s}_j$ be $I_j$. With $p(\mathbf{s}_j)$ representing the occurrence probability of $\mathbf{s}_j$, the expected amount of semantic information per sentence can be expressed as $I = \sum_{j=1}^{D} I_j p(\mathbf{s}_j)$, which corresponds to an expected number of words per sentence of $L = \sum_{j=1}^{D} L_j p(\mathbf{s}_j)$. Note that we focus on long-term text transmission rather than the transmission of individual sentences, so the expected values $I$ and $L$, instead of the random values, should be used to obtain the expressions of S-R and S-SE. Hence, at the $n$-th user, there are $k_n L$ semantic symbols on average carrying an amount of semantic information $I$, and the average amount of semantic information per semantic symbol is $I / (k_n L)$. Moreover, since the symbol rate is equal to the channel bandwidth for passband transmission, the total semantic information transmitted per second over a channel with bandwidth $W$ is $W I / (k_n L)$. Thus the S-R of the $n$-th user over the $m$-th channel can be expressed as

$$ \Gamma_{n,m} = \frac{W I}{k_n L} \xi_{n,m}, \tag{4} $$

where $\xi_{n,m}$ is the semantic similarity of the $n$-th user over the $m$-th channel.
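As a quick numerical illustration of (4), the sketch below evaluates the S-R for one user and one channel. Since $I/L$ is not known in closed form, the hypothetical parameter `info_per_word` defaults to 1, so the result is expressed in units of $I/L$. The values of $k_n$ and $\xi_{n,m}$ are arbitrary choices for illustration, not results from the letter.

```python
def semantic_rate(bandwidth_hz, k, xi, info_per_word=1.0):
    """Eq. (4): S-R = W * (I / L) / k * xi, in suts/s.

    info_per_word stands for I / L (suts per word). It is unknown in
    practice, so the default of 1.0 reports the rate in units of I / L.
    """
    if k < 1:
        raise ValueError("k must be at least 1 symbol per word")
    return bandwidth_hz * info_per_word / k * xi


# Illustrative values only: W = 180 kHz, k and xi chosen arbitrarily.
rate = semantic_rate(bandwidth_hz=180e3, k=5, xi=0.9)
print(f"S-R = {rate:.0f} x (I/L) suts/s")
```

For a fixed similarity, doubling `k` halves the S-R, which is why the number of symbols per word is worth optimizing alongside the channel assignment.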
Note that $\xi_{n,m}$ relies on the neural network structure of DeepSC and the channel conditions. It can be expressed as a function of $k_n$ and $\gamma_{n,m}$, i.e., $\xi_{n,m} = f(k_n, \gamma_{n,m})$. From (4), the corresponding S-SE can be expressed as

$$ \Phi_{n,m} = \frac{\Gamma_{n,m}}{W} = \frac{I}{k_n L} \xi_{n,m}. \tag{5} $$

Fig. 2. The semantic similarity for DeepSC.

# B. Problem Formulation

In this part, a semantic-aware resource allocation model is proposed to maximize the overall S-SE of all users. By denoting $\Phi$ as the overall S-SE of all users, we have

$$ \Phi = \sum_{n=1}^{N} \sum_{m=1}^{M} \alpha_{n,m} \frac{\xi_{n,m} I}{k_n L}. \tag{6} $$

The channel assignment vector is considered as one of the optimization variables to fully exploit the performance advantage of DeepSC in the low SNR regime. Furthermore, we also optimize the average number of transmitted semantic symbols for each word, $k_n$, to enable each symbol to carry more semantic information and thus achieve a higher S-SE while ensuring the same transmission reliability. According to the above analysis, the optimization problem can be formulated as

$$ (\mathbf{P0}) \max_{\boldsymbol{\alpha}_n, k_n} \Phi \tag{7} $$
$$ \text{s.t.} \quad \mathrm{C}_1: \alpha_{n,m} \in \{0, 1\}, \forall n \in \mathcal{N}, \forall m \in \mathcal{M}, \tag{7a} $$
$$ \mathrm{C}_2: \sum_{n=1}^{N} \alpha_{n,m} \leq 1, \forall m \in \mathcal{M}, \tag{7b} $$
$$ \mathrm{C}_3: \sum_{m=1}^{M} \alpha_{n,m} \leq 1, \forall n \in \mathcal{N}, \tag{7c} $$
$$ \mathrm{C}_4: k_n \in \{1, 2, \dots, K\}, \tag{7d} $$
$$ \mathrm{C}_5: \xi_{n,m} \geq \xi_{\mathrm{th}}, \tag{7e} $$
$$ \mathrm{C}_6: \Phi_{n,m} \geq \Phi_{\mathrm{th}}, \tag{7f} $$

where $\mathrm{C}_1$, $\mathrm{C}_2$, and $\mathrm{C}_3$ are channel assignment constraints, $\mathrm{C}_4$ specifies the permitted range of the average number of semantic symbols per word with $K$ representing the maximum value, $\mathrm{C}_5$ reflects the minimum required semantic similarity $\xi_{\mathrm{th}}$, and $\mathrm{C}_6$ restricts the minimum S-SE of users by $\Phi_{\mathrm{th}}$.

# C. The Optimal Solution

To solve $(\mathbf{P0})$, two challenges should be addressed. One is how to deal with the term $I/L$ in the objective function, and the other is how to cope with $\xi_{n,m}$, which is closely related to $\Phi$, $\mathrm{C}_5$, and $\mathrm{C}_6$.

First, we note that the term $I/L$ depends on the type of source. According to the analysis in Section III-A, this term is a constant for a particular type of source and therefore does not affect the resource optimization. Consequently, we can omit this term when solving $(\mathbf{P0})$. Thus the optimization problem $(\mathbf{P0})$ can be rewritten as

$$ \begin{array}{l} (\mathbf{P1}) \max_{\boldsymbol{\alpha}_n, k_n} \widetilde{\Phi} = \sum_{n=1}^{N} \sum_{m=1}^{M} \alpha_{n,m} \frac{\xi_{n,m}}{k_n} \\ \text{s.t.} \quad \mathrm{C}_1, \mathrm{C}_2, \mathrm{C}_3, \mathrm{C}_4, \mathrm{C}_5, \mathrm{C}_6. \end{array} \tag{8} $$

Then, since $\xi_{n,m}$ depends on the specific semantic communication system and the physical channel conditions, we run the DeepSC model over the AWGN channel to obtain the mapping between $\xi_{n,m}$ and $(k_n, \gamma_{n,m})$, as shown in Fig. 2.

After addressing the two challenges, $(\mathbf{P0})$ can be solved. Specifically, due to the orthogonality of different cellular links, (P1) can be decoupled into the following two equivalent independent optimization problems:

$$ \begin{array}{l} (\mathbf{P2}) \max_{k_n} \widetilde{\Phi}_{n,m} \\ \text{s.t.} \quad \mathrm{C}_4, \mathrm{C}_5, \mathrm{C}_6, \end{array} \tag{9} $$

and

$$ \begin{array}{l} (\mathbf{P3}) \max_{\boldsymbol{\alpha}_n} \sum_{n=1}^{N} \sum_{m=1}^{M} \alpha_{n,m} \widetilde{\Phi}_{n,m}^{\max} \\ \text{s.t.} \quad \mathrm{C}_1, \mathrm{C}_2, \mathrm{C}_3, \end{array} \tag{10} $$

where $\widetilde{\Phi}_{n,m} = \xi_{n,m} / k_n$ and $\widetilde{\Phi}_{n,m}^{\max}$ represents the maximum $\widetilde{\Phi}_{n,m}$ with respect to $k_n$. (P2) aims to obtain $\widetilde{\Phi}_{n,m}^{\max}$ for all users over all candidate channels. Since $\xi_{n,m}$ in $\mathrm{C}_5$ and $\mathrm{C}_6$ can only be obtained by the look-up table method, exhaustive search is adopted to solve (P2). Moreover, (P3) can be regarded as a maximum weight matching problem of a bipartite graph. It can be solved by the Hungarian algorithm [16], where the two vertex sets are $\mathcal{N}$ and $\mathcal{M}$, respectively, and $\widetilde{\Phi}_{n,m}^{\max}$ is regarded as the weight between the $n$-th user and the $m$-th channel.

# IV. SIMULATION RESULTS AND COMPARISON

To evaluate the performance of the proposed semantic-aware resource allocation scheme comprehensively, we conduct the following verifications in the simulation:

1) Comparing the proposed resource allocation model against the conventional one to verify the proposed model in semantic-aware networks.
2) Comparing the S-SE of semantic and conventional communication systems to show the superiority of semantic communications.

Since conventional systems are usually assessed in the bit domain, we first develop a transform method that converts the typical SE to the S-SE by taking the effect of source coding into consideration, making fair comparisons possible. On this basis, simulation results are presented and analysed.

# A. The Transform Method for Fair Comparisons

In conventional communications, each letter in a word is mapped into bits by the source encoder. From the semantic perspective, each bit can be loosely regarded as a semantic symbol, although it may carry less semantic information than a semantic symbol of DeepSC. Similar to the definition in Section III-A, the equivalent S-R can be expressed as

$$ \Gamma_{n,m}^{\prime} = C_{n,m} \frac{I}{\mu L} \xi_{n,m}, \tag{11} $$

where $C_{n,m}$ is the transmission rate of the $n$-th user over the $m$-th channel, measured in bits/s, and $\mu$ is the transforming factor, which reflects the ability of the source coding scheme to compress data and represents the average number of bits per word, measured in bits/word. For example, if a word contains five letters on average and each letter is encoded as an 8-bit ASCII character, then $\mu = 5 \times 8 = 40$ bits/word. Moreover, when we assume no bit errors in conventional communications, $\xi_{n,m}$ is equal to 1. By denoting $R_{n,m} = C_{n,m} / W$ as the SE, the equivalent S-SE can be given by

$$ \Phi_{n,m}^{\prime} = R_{n,m} \frac{I}{\mu L}. \tag{12} $$

Hence, the source coding process and the bit transmission process are both considered in deriving the S-SE of conventional systems, so that fair comparisons between different communication systems can be performed.

# B. Benchmarks

Considering that the proposed resource allocation scheme is for a specific semantic system, i.e., DeepSC, we compare it with the following three benchmarks, including an ideal system and two practical ones that have been widely deployed:

• Ideal system: The Shannon limit can be achieved with no bit errors, i.e., $R_{n,m} = \log_2(1 + \gamma_{n,m})$.
• 4G system: According to the measured SNR, the BS obtains the channel quality indicator (CQI) [17], based on which the achievable SE $R_{n,m}$ can be obtained according to Table 7.2.3-1 in 3GPP TS 36.213.

Fig. 3. The S-SE of the semantic-aware network with different models.

TABLE I SIMULATION PARAMETERS
| Parameter | Value |
| --- | --- |
| Number of users, $N$ | 5 |
| Number of channels, $M$ | 5 |
| Channel bandwidth, $W$ | 180 kHz |
| Noise power spectral density, $N_0$ | -174 dBm/Hz |
| Pathloss model | $128.1 + 37.6 \log_{10}(d)$ dB, $d$ in km |
| Shadow effect factor | 6 dB |
| Transmit power, $p_n$ | 10 dBm |
| Maximum number of symbols per word, $K$ | 20 symbols/word |
| Semantic similarity threshold, $\xi_{\mathrm{th}}$ | 0.9 |
| S-SE threshold, $\Phi_{\mathrm{th}}$ | $0.025\,(I/L)$ suts/s/Hz |
| Transforming factor, $\mu$ | 40 bits/word |
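The following Python sketch ties the parameters of Table I to the SNR in (2), the exhaustive search for (P2), the assignment in (P3), and the ideal-system conversion in (12). It is a rough illustration under several assumptions, not the authors' simulation code. The function `toy_similarity` is a hypothetical placeholder for the Fig. 2 look-up table, the distances and fading values are made up, the shadowing term is passed in as a fixed loss in dB rather than sampled, and (P3) is solved by brute force over permutations instead of the Hungarian algorithm, which is adequate only for small $N$ and $M$. All S-SE values are in units of $I/L$.

```python
import itertools
import math

# Values from Table I.
W_HZ = 180e3         # channel bandwidth W
N0_DBM_HZ = -174.0   # noise power spectral density N0
P_DBM = 10.0         # transmit power p_n
K_MAX = 20           # maximum symbols per word K
XI_TH = 0.9          # semantic similarity threshold
PHI_TH = 0.025       # S-SE threshold, in units of I/L
MU = 40.0            # transforming factor, bits/word


def snr_linear(distance_km, shadowing_db=0.0, fading_power=1.0):
    """Eq. (2) in linear scale. shadowing_db is a realized loss in dB and
    fading_power is |h|^2; both are inputs here, not sampled."""
    pathloss_db = 128.1 + 37.6 * math.log10(distance_km)
    noise_dbm = N0_DBM_HZ + 10.0 * math.log10(W_HZ)
    snr_db = P_DBM - pathloss_db - shadowing_db - noise_dbm
    return fading_power * 10.0 ** (snr_db / 10.0)


def toy_similarity(k, snr):
    """HYPOTHETICAL stand-in for the Fig. 2 look-up table xi = f(k, gamma)."""
    return 1.0 - math.exp(-0.5 * k * math.log2(1.0 + snr))


def best_k(snr, similarity=toy_similarity):
    """(P2) by exhaustive search over k = 1..K.
    Returns (normalized S-SE, k), or None if no k satisfies C5 and C6."""
    best = None
    for k in range(1, K_MAX + 1):
        xi = similarity(k, snr)
        score = xi / k  # S-SE divided by the constant I/L
        if xi >= XI_TH and score >= PHI_TH and (best is None or score > best[0]):
            best = (score, k)
    return best


def assign_channels(weights):
    """(P3) by brute force (not the Hungarian algorithm).
    weights[n][m] is the best normalized S-SE of user n on channel m,
    with 0.0 marking an infeasible pair. Needs channels >= users."""
    n_users, n_channels = len(weights), len(weights[0])
    best_total, best_perm = -1.0, None
    for perm in itertools.permutations(range(n_channels), n_users):
        total = sum(weights[n][perm[n]] for n in range(n_users))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


def conventional_sse(snr):
    """Eq. (12) for the ideal benchmark, in units of I/L."""
    return math.log2(1.0 + snr) / MU


# Two users at hypothetical distances, two channels with hypothetical |h|^2.
fading = [[1.0, 0.3], [0.5, 2.0]]
distances_km = [0.2, 0.5]
weights = []
for n, d in enumerate(distances_km):
    row = []
    for m in range(2):
        result = best_k(snr_linear(d, fading_power=fading[n][m]))
        row.append(result[0] if result else 0.0)
    weights.append(row)
print(assign_channels(weights))
```

A pair that gets weight 0.0 contributes nothing to the objective, which corresponds to leaving that user unassigned under the "at most one" constraints in (1). For larger networks, a polynomial-time assignment solver would replace the permutation loop.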