Article

A Knowledge Base Driven Task-Oriented Image Semantic Communication Scheme

College of Information and Communication, National University of Defense Technology, Wuhan 430000, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(21), 4044; https://doi.org/10.3390/rs16214044
Submission received: 11 September 2024 / Revised: 27 October 2024 / Accepted: 28 October 2024 / Published: 30 October 2024
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

Abstract
With the development of artificial intelligence and computer hardware, semantic communication has been attracting great interest. As an emerging communication paradigm, semantic communication can reduce the requirement for channel bandwidth by extracting semantic information. This makes it an effective method for image acquisition by unmanned aerial vehicles, which must transmit high-data-volume images within the constraints of limited available bandwidth. However, existing semantic communication schemes fail to adequately incorporate task requirements into the semantic communication process and struggle to adapt to dynamic changes in tasks. A task-oriented image semantic communication scheme driven by a knowledge base is proposed, aiming to achieve a high compression ratio and high-quality image reconstruction, effectively overcoming the bandwidth limitation. This scheme segments the input image into several semantic information units under the guidance of task requirements using Yolo-World and the Segment Anything Model. Bandwidth is assigned to each unit according to its task relevance score, which enables high-quality transmission of task-related information with lower communication overhead. An improved metric, weighted learned perceptual image patch similarity (LPIPS), is proposed to evaluate the transmission accuracy of the novel scheme. Experimental results show that our scheme achieves a notable performance improvement on weighted LPIPS at the same compression ratio compared with traditional image compression schemes. Our scheme also achieves a higher target capture ratio than traditional image compression schemes under the task of target detection.

1. Introduction

In recent years, unmanned aerial vehicles (UAVs) have advanced rapidly, especially in terms of flexibility and maneuverability. As one of the emerging trends in advanced aerospace technology, they have been widely applied in the agriculture and defense sectors. In practice, UAVs are often utilized for image acquisition in various scenarios, employing aerial cameras and remote sensing systems [1,2]. During aerial photography, UAVs must relay the image data to the control end in real time for analysis and surveillance. However, as aviation coverage expands and image quality and accuracy improve, the amount of image data has surged. Considering the limited bandwidth of UAV airborne wireless communication, this poses a challenge [3].
To address this challenge, image compression algorithms for UAV communication have been developed. By using real-time image compression, the data volume has been significantly reduced, which decreases the demand for wireless communication bandwidth. Since traditional image compression algorithms, such as JPEG and JPEG2000, are unable to access the deeper semantic information within images, they have reached a performance plateau. The widespread use of image sensors with superior resolution requires a high compression ratio without affecting the quality of image reconstruction. This brings challenges to UAV communication, because traditional image compression algorithms cannot meet the needs of UAV image processing.
With the development of artificial intelligence and computer hardware, communication based on artificial intelligence has become the future development trend [4,5]. Communication technology has also evolved from traditional syntactic communication to semantic communication. This approach enables the effective compression of original data through the extraction of semantic information, offering the advantages of lower latency, reduced bandwidth consumption, and increased throughput. Furthermore, it has the potential to decrease the demand for wireless communication bandwidth. In particular, image semantic communication exhibits significantly superior performance, owing to the abundant semantic information and high restructurability of images. Distinct from traditional syntactic communication, image semantic communication extracts semantic information by analyzing various semantic features within images, such as objects, backgrounds, and relational information [6]. Therefore, it has the potential to address the limited availability of communication bandwidth in UAV wireless communication systems [7].
Recently, a notable approach is semantic communication based on image semantic segmentation, which divides an image into regions of interest (ROI) and regions of non-interest (RONI), and achieves exceptional performance [8]. Wu presents a semantic communication scheme which transmits more details of the ROI and fewer details of the RONI, ensuring transmission efficiency under a limited bandwidth. Dividing the image into ROI and RONI based on task requirements plays a pivotal role in achieving this superior performance. In fact, different tasks tend to focus on distinct pieces of information within the same image. It may be an effective way to further improve performance by segmenting the image into multiple semantic information units based on the task background, and allocating bandwidth to each unit according to task relevance.
To address the challenges of limited bandwidth in UAV communication, this paper proposes an innovative scheme: a knowledge-base-driven task-oriented image semantic communication scheme. The scheme aims to achieve high compression ratios and superior image reconstruction quality, effectively tackling bandwidth constraints. Compared with existing semantic communication frameworks, we claim two key benefits. Firstly, we introduce a task knowledge base into the scheme, instead of simply embedding the knowledge into the algorithm. The adoption of a task knowledge base notably enhances the capability of task information to steer and refine the semantic communication process. We designed a construction and interactive update mechanism for the task knowledge base, which allows the task information and the semantic coding process to be adjusted according to the task progress. This enables the semantic encoding process to adjust the focus of information according to task requirements, adapting to dynamic changes in tasks. Secondly, the adoption of a separation-based semantic communication architecture avoids the demand for end-to-end joint training of the transmitter and receiver. Furthermore, the incorporation of Yolo-World and the Segment Anything Model (SAM) notably enhances the flexibility of the designed algorithm, allowing it to adapt to a wider range of situations and tasks more effectively. The proposed scheme is particularly suitable for scenarios requiring high-quality image transmission under limited bandwidth constraints, given specific background and task requirements, such as UAV image acquisition. The main contributions of this paper can be summarized in three parts:
1.
A task-oriented scheme of image semantic information transmission is proposed, which is driven by a knowledge base. We designed a construction and interactive update mechanism for the task knowledge base, which can adjust the process of semantic communication to adapt to dynamic changes in tasks. Experimental results demonstrate that this method significantly improves the performance of weighted learned perceptual image patch similarity (LPIPS) and shows a high target capture rate in the target detection task.
2.
For the unmanned end, we utilize Yolo-World and SAM to segment images into relevant semantic information units driven by task requirements. We propose a bandwidth allocation algorithm to assign bandwidth to each unit based on task relevance scores and bandwidth conditions. Lastly, we compress the units based on the assigned bandwidth to realize multi-scale image compression.
3.
For the control end, we propose an image reconstruction algorithm based on OpenCV to reconstruct images according to the received semantic information units. We introduce an information supplement mechanism to increase the visual quality of reconstructed images.
The remainder of this paper is organized as follows. Section 2 reviews related research on image semantic communication. Section 3 introduces the scheme proposed in this paper. Section 4 presents experimental verification of our scheme. Finally, a conclusion is drawn in Section 5.

2. Related Works

In this section, we first introduce the origin and development of semantic communication. Then, the latest achievements made in terms of task-oriented semantic communication technology for image data transmission are discussed.
As proposed by Weaver and Shannon, communication can be classified into three levels: the technical level, which focuses on resolving the technical problem of “how to transmit communication symbols accurately”; the semantic level, which focuses on resolving the semantic problem of “how does the transmitted symbol accurately convey the meaning”; and the effectiveness level, which focuses on resolving the effectiveness problem of “how does the received meaning affect behavior in the desired way?” [9].
For a long time, semantic communication has remained focused on theoretical issues. Carnap and some other researchers have introduced a series of semantic information theories grounded in logical probability [10,11]. With the development of artificial intelligence and computer hardware, traditional syntactic communication has been rapidly transformed into semantic communication. In the latest theoretical research, Niu Kai and Zhang Ping have proposed a mathematical theory of semantic communication. It systematically elucidates the measurement system and theoretical limits of semantic information, and theoretically demonstrates the substantial performance potential inherent in semantic communication [12]. This research demonstrates that the evolution from traditional syntactic communication to semantic communication is inevitable.
In recent years, a series of semantic communication schemes have been proposed [13,14,15,16,17,18,19,20,21,22], which are implemented based on deep learning, convolutional neural networks, etc. Most research focuses on extracting semantic information from images, without considering the relevance of information to tasks. In fact, the connotation of semantic information is grammatical information and pragmatic information, which means semantic information is often associated with the task [23,24]. Therefore, we believe that integrating semantic communication with task construction to build a task-oriented semantic communication system can achieve higher communication efficiency, representing an important research direction in semantic communication. In the field of task-oriented image semantic communication technology, researchers have designed a series of image semantic communication schemes for tasks such as image classification and object detection, preliminarily achieving task-oriented image semantic communication through theoretical and experimental approaches [25,26,27].
One interesting study caught our attention, which proposed a task-oriented explainable semantic communication scheme [28]. This scheme disentangles the features of the input image and selects features which are of interest to the receiver to transmit. Similarly, Wu proposed a scheme based on image semantic segmentation, which classified each pixel of an image into two categories: ROI and RONI [8]. This scheme ensured high-quality transmission of the ROI by allocating greater bandwidth. These studies demonstrate that disentangling the image into different units and transmitting these units through semantic channels with varying bandwidth is an effective method to enhance performance. However, it remains an open problem to further enhance the relevance between tasks and the process of semantic communication. In addition, the requirements of tasks may change dynamically, which could lead to the semantic communication scheme not transmitting information related to new task requirements correctly. We believe that it is important to establish a task knowledge base and an update synchronization mechanism during the communication process. The task knowledge base furnishes task requirements and relevance scores for semantic encoding and decoding, which allows for task-related information in images to be identified and transmitted with different bandwidths according to task relevance. It is an effective method, which can reduce communication overhead while preserving task-related information.
In addition, Huang proposed an image semantic encoding scheme based on deep learning and introduced a semantic bandwidth allocation model [29]. This bandwidth allocation model significantly improves the quality of rebuilt images by balancing bit expenditure and reconstruction quality. However, in specific task scenarios with limited bandwidth, the foundation of bandwidth allocation should be task relevance, allocating more bandwidth to semantic features that are of higher importance in the task.

3. System Model

In this part, a task-oriented semantic communication scheme for UAV image transmission that is driven by a knowledge base is proposed. The system model is shown in Figure 1, and the whole system consists of three parts: the knowledge base, the encoder of the unmanned end, and the decoder of the control end. The knowledge base comprises a task knowledge base and a domain knowledge base, and is used for task analysis and to assist semantic information extraction. The encoder extracts image semantic information based on task information, through semantic segmentation, adaptive bandwidth allocation, and multi-scale image compression. The decoder reconstructs and restores the image based on the received semantic information.
At the unmanned end, the semantic information units are extracted from the original image by semantic segmentation, assisted by the task information and domain information. Then, the adaptive bandwidth allocation algorithm allocates bandwidth to each semantic information unit based on its task relevance score, and the multi-scale image compression algorithm compresses all semantic information units based on the allocated bandwidth. At the control end, the information supplement algorithm obtains domain information to enhance the visual effect of the image. Then, the image reconstruction algorithm rebuilds the image based on all semantic information units. Furthermore, the user can adjust the task by generating and transmitting new task information, such as switching tasks, defining new tasks, or modifying the task information of existing tasks.

3.1. Knowledge Base

As can be seen from Figure 1, the knowledge base comprises a task knowledge base and a domain knowledge base. The task knowledge base consists of the task name, requirements, and task relevance scores. The domain knowledge base consists of task-related domain knowledge, such as common objects and image data in target detection tasks. Both the task knowledge base and the domain knowledge base are deployed on both the control end and the unmanned end, totaling four distinct knowledge bases.
Due to the highly logical nature of the task information, it is appropriate to construct a four-level tree knowledge base consisting of task background, task name, requirements, and relevance scores. Before task execution starts, the same task knowledge base is deployed, and the initial task is set up, on both the unmanned end and the control end. However, the requirements and relevance scores of a task change dynamically as the task progresses, which may decrease the accuracy of task-related semantic information extraction. To achieve effective extraction of task-related semantic information, the control end needs to generate task feed-forward information based on the received information and the current task status, and send it to the unmanned end to adjust the task information. The pseudo-code for updating the knowledge base and generating task information is given in Algorithm 1.
Algorithm 1: Algorithms for knowledge base updating and task information generation.
procedure TaskKnowledgeBaseUpdate(N, R, δ)
    // N: task name
    // R: task requirement
    // δ: relevance score
    if N ∈ KB:                         // the task is already in the knowledge base
        if R ∈ KB[N]:                  // the requirement is already in the task
            if δ = KB[N][R]:           // the score of the requirement is unchanged
                output ← (N)
            else:                      // the score of the requirement has changed
                KB[N][R] ← δ
                output ← (R, δ)
        else:                          // new requirement for an existing task
            KB[N] ← KB[N] ∪ {(R, δ)}
            output ← (N, R, δ)
    else:                              // new task
        KB ← KB ∪ {(N, (R, δ))}
        output ← (N, R, δ)
    return output
end procedure
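The branching in Algorithm 1 can be sketched as a short Python function. The dictionary-based layout of the knowledge base and the returned tuples are illustrative assumptions, not the paper's actual implementation:

```python
def update_task_kb(kb, name, req, score):
    """Update the task knowledge base and return the minimal
    information that must be transmitted (cf. Algorithm 1).
    kb maps task name -> {requirement -> relevance score}."""
    if name not in kb:            # new task: transmit everything
        kb[name] = {req: score}
        return (name, req, score)
    if req not in kb[name]:       # new requirement for a known task
        kb[name][req] = score
        return (name, req, score)
    if kb[name][req] != score:    # relevance score has changed
        kb[name][req] = score
        return (req, score)
    return (name,)                # nothing changed: transmit the name only
```

Only the parts of the task information that actually changed are emitted, which keeps the feed-forward overhead small.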
The domain knowledge base comprises two parts. One part refers to the knowledge embedded into model parameters during algorithm training, such as the data used to train SAM. The other part comprises the domain knowledge, which encompasses various types of entities that may be captured and their corresponding three-dimensional image data, all based on the communication scenario. Note that the knowledge base mentioned in this paper refers specifically to the latter part. As the domain knowledge base contains a substantial amount of image data, it is challenging to deploy on unmanned terminals. We therefore only deploy a list of Boolean variables on the unmanned side to query whether the domain knowledge base contains relevant image data.
During the process of the adaptive bandwidth allocation algorithm, the unmanned end can analyze whether the separated semantic information units are included in the domain knowledge base. If relevant data already exist, and the unit is not the key information, only the labels and location information need to be transmitted, thereby reducing the communication overhead for information transmission. During the information supplementation process, the control end retrieves image data from the domain knowledge base based on the received labels, enhancing the completeness and visual quality of the rebuilt images.
In fact, since the communication process is also a learning process, the domain knowledge base needs to be continuously refined during communication to improve its completeness. This paper designs an update mechanism for the domain knowledge base, the pseudo-code for which is given in Algorithm 2. When the unmanned end detects information outside the scope of the domain knowledge base, it incorporates the image data and labels into the transmitted data, and the control end updates the domain knowledge base using clustering algorithms based on the received information.
Algorithm 2: Algorithm for knowledge base updating and calling.
Unmanned end:
procedure DomainKnowledgeBaseUpdate-U(La, I, Lo)
    // La: image label
    // I: image data
    // Lo: location data
    unit ← (La, I, Lo)
    if La ∉ KB:                        // the unit is not in the knowledge base
        KB ← KB ∪ {La}
        output ← (La, I, Lo)
    else:
        output ← (La, Lo)              // image data omitted
    return output
end procedure
Control end:
procedure DomainKnowledgeBaseUpdate-C(La, I, Lo)   // I may be null
    if La ∉ KB:
        KB ← KB ∪ {(La, I)}
        output ← (La, I, Lo)
    else:
        output ← (La, KB[La], Lo)      // obtain image data from the knowledge base
    return output
end procedure
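Both halves of Algorithm 2 can be sketched in Python. Representing the unmanned end's Boolean query list as a set and the control end's knowledge base as a dict are assumptions made for illustration:

```python
def domain_kb_update_unmanned(kb_labels, label, image, location):
    """Unmanned end: transmit image data only for labels that the
    domain knowledge base does not yet contain (cf. Algorithm 2)."""
    if label not in kb_labels:
        kb_labels.add(label)             # remember the label locally
        return (label, image, location)  # full unit transmitted
    return (label, None, location)       # image data omitted

def domain_kb_update_control(kb, label, image, location):
    """Control end: store newly received image data, or retrieve it
    from the knowledge base when only the label was transmitted."""
    if label not in kb:
        kb[label] = image
    return (label, kb[label], location)
```

After the first transmission of a “car” unit, subsequent “car” units cost only a label and a location, since the control end restores the image data locally.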

3.2. Unmanned End

The source image data $S_{original}$ is first split into several semantic information units, based on the task requirements provided by the task knowledge base. The remaining part composes a background semantic information unit. These semantic information units consist of image data, label data, and location data, as shown in Equation (1):
$(I_{unit_i}, La_{unit_i}, Lo_{unit_i}), \quad i = 1, \ldots, m$
where $m$ represents the number of semantic information units, and $I_{unit_i}$, $La_{unit_i}$, and $Lo_{unit_i}$ represent the image data, label data, and location data of the $i$-th semantic information unit, respectively.
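A semantic information unit of Equation (1) might be held in a simple container such as the following; the field names and types are assumptions for illustration, not the paper's data format:

```python
from dataclasses import dataclass

@dataclass
class SemanticUnit:
    """One semantic information unit (I, La, Lo) from Equation (1)."""
    image: bytes        # I: segmented image data
    label: str          # La: label data
    location: tuple     # Lo: location data, e.g. a bounding box (x, y, w, h)
    score: float = 0.0  # task relevance score, used later for bandwidth allocation
```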
To achieve image semantic segmentation driven by tasks, this paper combines the Yolo-World model and SAM to realize requirement-driven image semantic segmentation based on text prompts. First, the original image is detected by Yolo-World to obtain the labels and location boxes of task-related targets. Subsequently, all location boxes of targets are input into the SAM to segment the main target subjects within those location boxes.
In fact, different requirements have different relevance scores, which means different semantic information units have different importance. To transmit as much effective task-related information as possible under limited bandwidth conditions, we need to allocate bandwidth to different semantic information units. This paper provides an adaptive information organization algorithm, which allocates bandwidth based on the relevance scores $\delta$ provided by the task knowledge base and the bandwidth of the communication channel. We introduce two hyper-parameters, $k_1$ and $k_2$, which serve as thresholds for the relevance score, facilitating the classification of these semantic information units. The values of these hyper-parameters are contingent upon the bandwidth requirements of the communication link. In scenarios with limited bandwidth, we elevate the thresholds to prioritize the transmission of critical information. Conversely, when bandwidth is abundant, we lower the thresholds to maximize data transmission. Based on the thresholds $k_1$ and $k_2$, the semantic information units are categorized into three types:
1.
Key semantic information units $(I_{k_i}, La_{k_i}, Lo_{k_i})$, where relevance scores $\delta > k_1$;
2.
General semantic information units $(I_{g_i}, La_{g_i}, Lo_{g_i})$, where relevance scores $k_2 < \delta \le k_1$;
3.
Redundant semantic information units $(I_{r_i}, La_{r_i}, Lo_{r_i})$, where relevance scores $\delta \le k_2$.
Furthermore, the remaining information forms one background information unit $(I_b)$, resulting in a total of four categories of information units. During the bandwidth allocation process, redundant information units are deleted, general semantic information units are converted into labels, and key semantic information units are compressed with different compression ratios based on their relevance scores. The values of these two hyper-parameters influence the classification of semantic information units, ultimately affecting the compression ratio of the semantic encoding process and the visual quality of the rebuilt image. Bandwidth allocation should not only allocate bandwidth to different information units according to relevance scores, but also make full use of channel bandwidth resources. Hence, the values of the hyper-parameters mainly depend on the channel condition.
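The threshold-based classification can be sketched in a few lines; representing a unit as a dict with a "score" entry is an assumption for illustration:

```python
def classify_units(units, k1, k2):
    """Split semantic information units into key / general / redundant
    groups by relevance score, using thresholds k1 > k2."""
    key       = [u for u in units if u["score"] > k1]
    general   = [u for u in units if k2 < u["score"] <= k1]
    redundant = [u for u in units if u["score"] <= k2]
    return key, general, redundant
```

Raising $k_1$ and $k_2$ under a tight bandwidth budget shifts units toward the label-only and deleted categories, exactly as described above.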
Since redundant information units are deleted, they require no bandwidth. General semantic information units are converted into labels and need only minimal bandwidth to transmit label and location data. Therefore, the focus of bandwidth allocation lies in the background information unit and the key semantic information units. In fact, the compression ratio of the background information unit only affects the visual effect, and images of a specific scene share similar characteristics, such as structure. Hence, the compression ratio of the background information unit can be determined directly by referring to the image dataset. In this paper, we set the compression ratio of the background information unit to 20%. As for key semantic information units, the weight of each unit is calculated based on its relevance score and data size, and the remaining bandwidth is allocated accordingly. The weight calculation formula for key semantic information units is given in Equation (2):
$w_i = \dfrac{Size_i \cdot \delta_i}{\sum_{j=1}^{a} Size_j \cdot \delta_j}$
where $Size_i$ and $\delta_i$ represent the file size and relevance score of the $i$-th key semantic information unit, respectively, and $a$ represents the number of key semantic information units.
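Equation (2) reduces to a few lines of Python; the function below is a sketch, not the authors' code:

```python
def unit_weights(sizes, scores):
    """Weights w_i = (Size_i * delta_i) / sum_j(Size_j * delta_j)
    from Equation (2); the weights always sum to 1."""
    total = sum(s * d for s, d in zip(sizes, scores))
    return [s * d / total for s, d in zip(sizes, scores)]
```

Note that a small unit with a high relevance score can receive the same weight as a large unit with a low one, since the weight is the product of size and score.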
After obtaining the weights, the remaining 80% of the bandwidth is allocated to each key semantic information unit based on these weights. In fact, we have observed that, due to the relatively small data size of key semantic information units in the image, compression may be unnecessary, leaving a significant amount of redundant bandwidth. Therefore, after allocating the bandwidth of key semantic information units, we calculate the redundant bandwidth. A portion of this redundant bandwidth is allocated to the label and location data, and the remainder is supplemented to the background information unit to enhance the visual quality of the rebuilt image. Finally, the information data, compression ratio data, and location data of each information unit are integrated and output to the multi-scale image compression algorithm. The organized output can be represented by Equation (3):
$List_k[i] = (I_{k_i}, R_{k_i}, Lo_{k_i}), \quad i = 1, \ldots, a$
$List_g[i] = (La_{g_i}, R_{g_i}, Lo_{g_i}), \quad i = 1, \ldots, b$
$List_b = (I_b, R_b)$
where $b$ represents the number of general semantic information units. The pseudo-code of the adaptive bandwidth allocation algorithm is given in Algorithm 3.
Algorithm 3: Adaptive bandwidth allocation.
procedure BandwidthAllocation(U_k, U_g, B_sum)
    // U_k: key semantic information units
    // U_g: general semantic information units
    // B_sum: total bandwidth
    B_label ← Calculate(U_g)           // bandwidth for label and location data
    B_sum ← B_sum − B_label
    B_background ← factor · B_sum      // e.g., factor = 20%
    B_sum ← B_sum − B_background
    for i in 1..length(U_k):
        b_i ← B_sum · w_i              // weight w_i from Equation (2)
        if b_i ≥ Size(U_k[i]):
            B_k.append(Size(U_k[i]))   // no unit needs more than its actual size
        else:
            B_k.append(b_i)
    if B_sum > sum(B_k):               // return redundant bandwidth to the background
        B_background ← B_background + B_sum − sum(B_k)
    output: B_background, B_k, B_label
end procedure
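A runnable sketch of Algorithm 3 follows. Sizes, scores, and budgets are treated as plain numbers, and the 20% background factor matches the text; the function name and the label-cost argument are assumptions:

```python
def allocate_bandwidth(key_sizes, key_scores, label_cost, b_total, bg_factor=0.2):
    """Adaptive bandwidth allocation (cf. Algorithm 3). Returns the
    budgets for the background unit, each key unit, and the label data.
    Budget left over after capping a key unit at its actual size is
    returned to the background unit."""
    b_sum = b_total - label_cost                 # reserve label/location bandwidth
    b_background = bg_factor * b_sum             # 20% for the background unit
    b_sum -= b_background
    total = sum(s * d for s, d in zip(key_sizes, key_scores))
    b_key = []
    for size, score in zip(key_sizes, key_scores):
        share = b_sum * (size * score) / total   # weight from Equation (2)
        b_key.append(min(share, size))           # no unit needs more than its size
    b_background += b_sum - sum(b_key)           # redundant bandwidth to background
    return b_background, b_key, label_cost
```

By construction the three budgets always sum to the total bandwidth, so the channel is fully utilized.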
For multi-scale image compression, the image compression algorithm is invoked to perform adaptive compression on both the key semantic information units and the background semantic information unit, based on the results of bandwidth allocation. The general semantic information units are converted into label data. Ultimately, the image data, label data, and corresponding location data together constitute the data to be transmitted, which is then sent through the channel. It is important to note that the scheme designed in this paper is not limited to a specific compression algorithm, and can invoke any image compression algorithm as needed. Theoretically, our method can be applied to most image compression algorithms to improve their performance. To facilitate experimental verification, this paper adopts the traditional JPEG compression method at this stage. The demonstration process of the unmanned end is shown in Figure 2.
As shown in Figure 2, the task requirements parsed from the task knowledge base are “person” and “car”. We assume that, given the task background, “person” is more important than “car”. Subsequently, Yolo-World obtains the labels and location boxes of task-related targets based on these requirement prompts. SAM completes the semantic segmentation based on the location boxes. Then, an adaptive bandwidth allocation algorithm assigns bandwidth to each unit. In this process, we have segmented the images in the dataset and used the results to form a simple domain knowledge base, which includes “car”. We can find that each unit related to “person” is allocated different bandwidth, while the unit related to “car” is converted to a label. The remaining bandwidth is assigned to the background unit. Finally, the multi-scale image compression algorithm utilizes the allocated bandwidth to compress all the units and subsequently transmits them to the channel, along with the label data and corresponding location data.

3.3. Control End

After receiving the data, the control end first supplements the information through the information supplementation module based on the domain knowledge base, converting the label data into image data to enhance the visual effect of the rebuilt image. The sequence of redundant information units after supplementation can be represented as Equation (4):
$List_r[i] = (I_{r_i}, La_{r_i}, Lo_{r_i})$
After completing the information supplementation, all information units are gradually integrated into the background information unit to reconstruct the image, which is then output to perform downstream tasks. This paper employs the image fusion functions provided by OpenCV to sequentially integrate the received information units into the background based on the received location data to complete image reconstruction. The image fusion function can be represented as Equation (5):
$dst = src_1 \cdot \alpha + src_2 \cdot \beta + \gamma$
where $src_1$ and $src_2$ represent the two input image units, $\gamma$ is the result offset (with a default value of 0), and $\alpha$ and $\beta$ represent the weights of the two input images, respectively. In the scheme described in this paper, the pixel values of the segmented areas in the background unit are set to 0, while the pixel values outside the segmented areas in the information unit are set to 0. Therefore, by setting the weights $\alpha = \beta = 1$, image fusion can be achieved.
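With $\alpha = \beta = 1$ and $\gamma = 0$, the fusion of Equation (5) is a pixel-wise sum. The pure-Python function below mirrors what OpenCV's cv2.addWeighted computes, on nested lists instead of arrays, purely as an illustration:

```python
def fuse(src1, src2, alpha=1.0, beta=1.0, gamma=0.0):
    """Pixel-wise fusion dst = src1*alpha + src2*beta + gamma (Equation (5)).
    With alpha = beta = 1 and zeroed masked regions, the semantic unit
    drops into the background exactly where it was segmented out."""
    return [[a * alpha + b * beta + gamma for a, b in zip(r1, r2)]
            for r1, r2 in zip(src1, src2)]
```

Because each pixel is nonzero in at most one of the two inputs, the sum simply pastes the unit into the hole left in the background.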
The image fusion process described in this paper is shown in Figure 3, where the inputs are the received background information unit, semantic information unit, as well as label and location data, and the output is the rebuilt image.
In the demonstration process shown in Figure 3, the image data of “car” provided by the information supplementation is the image segmented from the original image. It should be noted that in the practical process of fusing the three-dimensional image data provided by the information supplement algorithm into the image, parameters such as size, orientation, and brightness need to be considered. After the information supplement process, all data are input to the image reconstruction algorithm. Then, all units about “car” and “person” are fused into the background unit according to the location data. Furthermore, as the task progresses dynamically, the control side will promptly generate new task information and send it to the unmanned side, which will adjust the image semantic encoding process accordingly and provide more targeted image data transmission services for task execution.

4. Experimental Results

In this section, we introduce the experimental setup, demonstrate the proposed scheme, and measure its performance. The experiments were run on a computer with an Intel(R) Core(TM) Ultra 7 155H CPU and an NVIDIA GeForce RTX 4060 Laptop GPU. The experiments consist of three parts. The first part demonstrates the scheme under the guidance of different tasks. The second part measures the image similarity between the rebuilt image and the original image. The third part measures the task performance of the proposed scheme under the task of object detection.
We have selected the Semantic Drone Dataset released by Graz University of Technology [30], analyzed the images in the dataset and built a simple task knowledge base.
The Semantic Drone Dataset focuses on semantic understanding of urban scenes for increasing the safety of autonomous drone flight and landing procedures. The imagery depicts more than 20 houses from nadir (bird’s eye) view acquired at an altitude of 5 to 30 m above ground. A high resolution camera was used to acquire images at a size of 6000 × 4000 px (24M px). The hyperparameters in the experiment are shown in Table 1.
This paper takes urban aerial photography by drones as the scenario, and environmental monitoring, traffic detection, and operational monitoring as tasks. A multi-level knowledge base has been constructed with targets such as person, car, and buildings as requirements, as shown in Figure 4.
Designing relevance scores that align with practical task backgrounds requires data provided by experts in the corresponding fields; to facilitate experimentation, this paper sets some relevance scores directly. It should be noted that this paper constructs only a simple knowledge base for experimental verification and does not investigate knowledge base construction in depth. In future work, we will delve into the construction and utilization of knowledge bases, which will contribute to advancing the development and practical application of task-oriented image semantic communication technology.
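A multi-level task knowledge base of this kind can be sketched as a nested mapping from tasks to required targets and their relevance scores δ. The structure below is a hypothetical illustration: only the δ = 2 scores for “person” and “car” under traffic detection match the experimental setup described later; the other entries and values are invented placeholders.

```python
# Hypothetical task knowledge base: task -> {target: relevance score delta}.
# Only the traffic-detection scores (delta = 2) match the paper's setup;
# the remaining tasks and values are illustrative placeholders.
task_kb = {
    "traffic detection": {"person": 2, "car": 2},
    "environmental monitoring": {"building": 2, "car": 1},
    "operational monitoring": {"person": 2, "building": 1},
}

def requirements(task: str) -> dict:
    """Return the (target, relevance score) pairs for a task,
    or an empty dict for an unknown task."""
    return task_kb.get(task, {})
```

When the control side switches tasks, only the lookup key changes; the unmanned side re-runs semantic extraction with the new target list.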

4.1. Results Under Different Task Guidance

This paper selects a set of images containing a relatively large number of targets to demonstrate the scheme under different tasks. We set the task of traffic detection with the requirements of detecting “persons” and “cars” as the initial task and tested the scheme using the selected images. Subsequently, we adjusted the requirement to detecting only “persons” and conducted further testing. Finally, we modified the requirement to detecting only “cars” and performed the final test. Considering the large gap between the file sizes of different images, this paper fixes the compression ratio instead of setting a channel bandwidth condition. In this experiment stage, we set the compression ratio to 2%. In addition, we compressed the original image with JPEG at the same compression ratio for comparison with our scheme. The test results are shown in Figure 5.
It can be seen that specific objects, such as “person” and “car”, in images compressed with JPEG often suffer from significant distortion, such as color changes and structural deformations. However, a semantic communication scheme tailored to the needs of targets like “person” and “car” can effectively address the distortion in these specific regions. When we set the requirements to “person” and “car”, the reconstructed image keeps a visual effect close to the original image in the person and car areas. When we adjust the requirement to “person”, the reconstructed image keeps a visual effect close to the original in the person areas, but suffers the same distortion as JPEG in other areas. Similarly, when we adjust the requirement to “car”, the reconstructed image keeps a visual effect close to the original in the car areas but suffers the same distortion as JPEG in other areas. These results demonstrate that our scheme can control the semantic encoding process by adjusting the task information, achieving high-quality transmission of task-related regions.

4.2. Image Reconstruction Quality Assessment

In order to quantitatively analyze the performance of the proposed scheme, we analyze general evaluation metrics for rebuilt images. In terms of evaluation metrics for rebuilt images, the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and LPIPS are commonly used. LPIPS is a deep learning-based image quality assessment metric, which has significant advantages compared to PSNR and SSIM [31]. Therefore, this paper adopts LPIPS as the metric to evaluate the quality of image reconstruction. During the calculation, we designed a weighted LPIPS by referencing Wu’s research [8], which can be represented as Equation (6):
weighted LPIPS = ( Σ_{i=0}^{a} δ_i · LPIPS_i ) / ( Σ_{i=0}^{a} δ_i )        (6)
where δ_i represents the relevance score of the i-th information unit, and LPIPS_i denotes the LPIPS value of the region where the i-th information unit is located.
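Equation (6) is a relevance-weighted mean, which can be sketched as follows (function and argument names are illustrative; a real evaluation would obtain the per-region values from an LPIPS network as in [31]):

```python
def weighted_lpips(relevance_scores, lpips_values):
    """Relevance-weighted mean of per-unit LPIPS values (Equation (6)).
    relevance_scores: delta_i for each information unit.
    lpips_values: LPIPS_i of the region covered by each unit."""
    if len(relevance_scores) != len(lpips_values):
        raise ValueError("one relevance score per LPIPS value expected")
    total = sum(relevance_scores)
    return sum(d * l for d, l in zip(relevance_scores, lpips_values)) / total
```

For example, two task-related units with δ = 2 and one background unit with δ = 1 give `weighted_lpips([2, 2, 1], [0.1, 0.2, 0.5])` ≈ 0.22, so well-transmitted task regions dominate the score.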
In the experimental process, traffic detection with the requirements “person” and “car” is taken as the task. To simplify the setup, we set the relevance scores of both requirements to δ = 2, and the hyper-parameter thresholds to k1 = k2 = 1. Based on Equation (6), we calculate the weighted LPIPS of our scheme and the JPEG scheme across various compression ratios. The experimental results are shown in Figure 6.
It can be seen that our scheme exhibits a notable 1.56 dB reduction in weighted LPIPS compared with the JPEG scheme at a compression ratio of 2%. As the compression ratio increases, the weighted LPIPS of our scheme continues to decrease, and the gap with the JPEG scheme gradually narrows; at a compression ratio of 10%, the reduction is 0.72 dB. These results indicate that, compared with traditional JPEG, the visual quality in task-related areas of the proposed method is significantly improved: even at a high compression ratio, it ensures high-fidelity transmission of task-critical areas. As shown in Figure 5, our scheme delivers superior visual quality in the task area, while the background appears more blurred. Furthermore, our scheme achieves a compression ratio of 2% at a target capture ratio of more than 90%, which means it can speed up image transfer by 50 times. However, the algorithm also requires additional running time: from the experimental data, it takes about 0.2 s to process one image. Many optimization strategies and hardware solutions could improve this efficiency, and in future work we will explore algorithm optimization, parallel processing, and hardware acceleration.

4.3. Task Performance Evaluation

As a task-oriented image semantic communication scheme, it is important to analyze task performance. This subsection analyzes the effectiveness of rebuilt images for the intelligent task of target detection. The experiments continue to use traffic detection as the task, specifically requiring the analysis of the number of people and cars in the collected images, so the intelligent task can be simplified as the detection of cars and people. The original image, rebuilt image, and compressed image were simultaneously input into the Yolo-World object detection model, with the confidence threshold set to 0.6 and the text prompts set to ‘person’ and ‘car’. Using the detection count from the original image as the baseline, we normalized the target capture ratio, as shown in Figure 7.
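The normalization step can be sketched as below. The per-class counting and the clamping of over-detections are our assumptions, since the paper only states that the original image's detection count serves as the baseline:

```python
def target_capture_ratio(detections: dict, baseline: dict) -> float:
    """Fraction of baseline targets recovered in a rebuilt/compressed image.
    Both arguments map class name -> detection count; counts from the rebuilt
    image are clamped so spurious extra boxes do not inflate the ratio."""
    total = sum(baseline.values())
    captured = sum(min(detections.get(cls, 0), n) for cls, n in baseline.items())
    return captured / total
```

If the original image yields 10 persons and 5 cars while the rebuilt image yields 8 and 5, the ratio is 13/15 ≈ 0.867.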
It can be seen that the target capture ratio gain of the rebuilt image reaches 3.64 dB at a compression ratio of 2%. Even at this ratio, the capture ratio of our scheme remains above 90%, indicating task performance close to that of the original image. At a compression ratio of 10%, the capture ratios of both our scheme and the JPEG scheme are almost the same as the original image. The experimental results show that our proposed scheme performs favorably compared with the JPEG scheme: under the same compression ratio, it is more effective in ensuring the execution of downstream tasks, which indicates the advantage of our approach. We have selected partial experimental results at a compression ratio of 2%, as shown in Figure 8. Our scheme exhibits superior visual quality in the task area, making targets easier to detect and yielding better task efficiency.
To verify the advantages and limitations of the scheme in different environments, we tested it on the MAR20 dataset [32]. MAR20 is currently the largest remote sensing image dataset for military aircraft recognition, including 3842 images, 20 types, and 22,341 instances; most images are 800 × 800 pixels. Since the main target in MAR20 is aircraft, we set the task requirement in our algorithm to aircraft. As before, the original image, rebuilt image, and compressed image were simultaneously input into the YOLO-World object detection model, with the confidence threshold set to 0.6 and the text prompt set to ‘airplane’. The experimental results are shown in Figure 9.
Figure 9 shows that our scheme also achieves a significant performance improvement on the MAR20 dataset. At a compression ratio of 5%, the normalized target capture ratio of our scheme is 65%, which is 5.66 dB higher than that of the traditional scheme. The experimental results show that our scheme can be effectively extended to satellite remote sensing scenarios. However, at the same compression ratio, the target capture ratio on the MAR20 dataset is lower than on the Semantic Drone Dataset, which can be attributed to the differing resolution and sharpness of the two datasets. Nevertheless, our scheme outperforms the JPEG compression scheme on both datasets: despite the variance in performance across datasets, it consistently demonstrates substantial improvements over traditional compression methods.
It should be noted that our scheme can adapt to data from different types of sensors by adjusting the algorithmic model. In fact, there have been a series of image processing algorithms for multispectral or hyperspectral images derived from various sensor data [33,34,35].

4.4. Ablation Study

To better understand the contributions of each component in our scheme, we perform ablation studies by systematically removing or modifying key elements. We first analyze the impact of the bandwidth allocation algorithm under different compression ratios; the results are shown in Table 2. During the ablation experiments, the target compression ratio was set to 5%.
As evident from Table 2, the absence of a knowledge base or Yolo-World in the scheme results in its direct regression to conventional image compression methods. This regression occurs because, within our scheme, the knowledge base supplies task information, while object detection facilitates semantic extraction. Furthermore, when the scheme is devoid of adaptive bandwidth allocation, both the LPIPS and target acquisition rate experience a notable decline, highlighting bandwidth allocation as a pivotal component in enhancing the scheme’s overall performance.

5. Discussion and Conclusions

This paper proposes a knowledge base driven task-oriented image semantic communication scheme. The scheme extracts semantic information units from images based on a task knowledge base and divides them into key, general, and redundant semantic information units. In addition, it adaptively allocates bandwidth according to the channel bandwidth conditions and the relevance scores of the different semantic information units. We also propose an evaluation metric, weighted LPIPS. Compared with other semantic communication schemes, our scheme is augmented with an external entity knowledge base, which enables flexible adjustment of task information, adapts to real-time dynamic changes in the task, and enhances the interpretability of the semantic communication scheme. Compared with the traditional image compression scheme JPEG, our scheme achieves a notable performance improvement in weighted LPIPS at the same compression ratio, and a higher target capture ratio under the task of target detection.
Our scheme depends on Yolo-World and SAM for segmentation, which may limit performance in specialized remote sensing tasks due to the unique challenges of these environments. However, we adopt a modular design, which makes it easy to replace Yolo-World and SAM with dedicated models. This allows the easy integration of various communication modules, data processing algorithms, and power management strategies tailored to the specific requirements of each platform, and enhances the system’s adaptability across different platforms and environments. Specifically, Yolo-World could be replaced by advanced versions of YOLO optimized for remote sensing datasets, which offer improved accuracy in detecting and classifying objects in aerial or satellite imagery. For semantic segmentation of remote sensing images, SAM could be replaced by U-Net or its variants, which have shown great promise in accurately delineating features in satellite imagery thanks to their ability to capture both local and global information. In addition, we could adopt strategies such as customization and fine-tuning, the development of specialized models, and continuous learning and adaptation. In future work, we will apply these methods to enhance the adaptability of the system and ensure better performance across a wide range of remote sensing scenarios.
In future work, we plan to extend the existing multi-level tree-structured task knowledge base to an event-centric temporal knowledge graph [36], in order to better match task timing and provide more detailed task requirements. Furthermore, we aim to develop this domain knowledge base into a large-scale domain-specific language model, with the goal of achieving full coverage of domain knowledge. Notably, in the first phase of system design we only need to initialize a small amount of default information in the knowledge base; in the second phase of practical application, we will continuously improve the knowledge base through an update mechanism to adapt to real-world situations.
On the other hand, the performance of this scheme can be significantly improved by designing algorithms with higher compression capability. The scheme can incorporate more advanced compression algorithms, including superior image formats such as WebP or FLIF, as well as deep learning-based methods [37,38,39]. In future work, we plan to apply these advanced image compression schemes to the image compression module to enhance the overall performance. Our experimental results have shown that the scheme significantly enhances the performance of the invoked image compression algorithm; in theory, it is therefore expected to improve upon the state-of-the-art compression algorithms mentioned above. To validate this, we intend to conduct experiments comparing our scheme, utilizing these advanced compression algorithms, with existing cutting-edge techniques.

Author Contributions

Conceptualization, C.G. and J.X.; methodology, C.G. and J.X.; investigation, Z.H.; resources, C.G.; data curation, C.G.; writing—original draft preparation, C.G.; writing—review and editing, C.G., J.X. and J.L.; visualization, C.G. and J.X.; supervision, J.Y.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China Grant No. 62402505, the Independent Innovation Science Fund of National University of Defense Technology under Grant No. 22-ZZCX-055 and the Graduate Research Innovation Project of National University of Defense Technology under Grant No. XJZH2024006.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xu, Q.; Su, Z.; Fang, D.; Wu, Y. BASIC: Distributed Task Assignment with Auction Incentive in UAV-Enabled Crowdsensing System. IEEE Trans. Veh. Technol. 2024, 73, 2416–2430. [Google Scholar] [CrossRef]
  2. Hu, G.; Ye, R.; Wan, M.; Bao, W.; Zhang, Y.; Zeng, W. Detection of Tea Leaf Blight in Low-Resolution UAV Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5601218. [Google Scholar] [CrossRef]
  3. Dai, Y.; Tan, J.; Wang, M.; Jiang, C.; Li, M. A Convolutional Neural Network Image Compression Algorithm for UAVs. J. Circuits Syst. Comput. 2024, 33, 2450211. [Google Scholar] [CrossRef]
  4. Zheng, Q.; Saponara, S.; Tian, X.; Yu, Z.; Elhanashi, A.; Yu, R. A real-time constellation image classification method of wireless communication signals based on the lightweight network MobileViT. Cogn. Neurodyn. 2024, 18, 659–671. [Google Scholar] [CrossRef]
  5. Zheng, Q.; Zhao, P.; Zhang, D.; Wang, H. MR-DCAE: Manifold regularization-based deep convolutional autoencoder for unauthorized broadcasting identification. Int. J. Intell. Syst. 2021, 36, 7204–7238. [Google Scholar] [CrossRef]
  6. Huang, D.; Tao, X.; Gao, F.; Lu, J. Deep Learning-Based Image Semantic Coding for Semantic Communications. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–6. [Google Scholar] [CrossRef]
  7. Lan, Q.; Wen, D.; Zhang, Z.; Zeng, Q.; Chen, X.; Popovski, P.; Huang, K. What Is Semantic Communication? A View on Conveying Meaning in the Era of Machine Intelligence. J. Commun. Inf. Networks 2021, 6, 336–371. [Google Scholar] [CrossRef]
  8. Wu, J.; Wu, C.; Lin, Y.; Yoshinaga, T.; Zhong, L.; Chen, X.; Ji, Y. Semantic Segmentation-Based Semantic Communication System for Image Transmission. Digit. Commun. Networks 2024, 10, 519–527. [Google Scholar] [CrossRef]
  9. Shannon, C.; Weaver, W. The Mathematical Theory of Communication. Philos. Rev. 1949, 60, 398–400. [Google Scholar]
  10. Carnap, R.; Bar-Hillel, Y. An Outline of a Theory of Semantic Information. J. Symb. Log. 1954, 19, 230–232. [Google Scholar] [CrossRef]
  11. Bao, J.; Basu, P.; Dean, M.; Partridge, C.; Swami, A.; Leland, W.; Hendler, J.A. Towards a Theory of Semantic Communication. In Proceedings of the 2011 IEEE Network Science Workshop, West Point, NY, USA, 22–24 June 2011; pp. 110–117. [Google Scholar] [CrossRef]
  12. Niu, K.; Zhang, P. A mathematical theory of semantic communication. J. Commun. 2024, 45, 7–59. [Google Scholar] [CrossRef]
  13. Dai, J.; Wang, S.; Tan, K.; Si, Z.; Qin, X.; Niu, K.; Zhang, P. Nonlinear Transform Source-Channel Coding for Semantic Communications. IEEE J. Sel. Areas Commun. 2022, 40, 2300–2316. [Google Scholar] [CrossRef]
  14. Dong, C.; Liang, H.; Xu, X.; Han, S.; Wang, B.; Zhang, P. Semantic Communication System Based on Semantic Slice Models Propagation. IEEE J. Sel. Areas Commun. 2023, 41, 202–213. [Google Scholar] [CrossRef]
  15. Yoo, H.; Jung, T.; Dai, L.; Kim, S.; Chae, C.B. Demo: Real-Time Semantic Communications with a Vision Transformer. In Proceedings of the 2022 IEEE International Conference on Communications Workshops (ICC Workshops), Seoul, Republic of Korea, 16–20 May 2022; pp. 1–2. [Google Scholar] [CrossRef]
  16. Li, A.; Liu, X.; Wang, G.; Zhang, P. Domain Knowledge Driven Semantic Communication for Image Transmission Over Wireless Channels. IEEE Wirel. Commun. Lett. 2023, 12, 55–59. [Google Scholar] [CrossRef]
  17. Ma, S.; Qiao, W.; Wu, Y.; Li, H.; Shi, G.; Gao, D.; Shi, Y.; Li, S.; Al-Dhahir, N. Features Disentangled Semantic Broadcast Communication Networks. arXiv 2023, arXiv:2303.01892. Available online: http://arxiv.org/abs/2303.01892 (accessed on 9 September 2024). [CrossRef]
  18. Peng, X.; Qin, Z.; Tao, X.; Lu, J.; Letaief, K.B. A Robust Semantic Communication System for Image. arXiv 2024, arXiv:2403.09222. Available online: http://arxiv.org/abs/2403.09222 (accessed on 9 September 2024).
  19. Tian, Z.; Vo, H.; Zhang, C.; Min, G.; Yu, S. An Asynchronous Multi-Task Semantic Communication Method. IEEE Netw. 2023, 38, 275–283. [Google Scholar] [CrossRef]
  20. Zhang, H.; Shao, S.; Tao, M.; Bi, X.; Letaief, K.B. Deep Learning-Enabled Semantic Communication Systems with Task-Unaware Transmitter and Dynamic Data. IEEE J. Sel. Areas Commun. 2023, 41, 170–185. [Google Scholar] [CrossRef]
  21. Liang, C.; Li, D.; Lin, Z.; Cao, H. Selection-Based Image Generation for Semantic Communication Systems. IEEE Commun. Lett. 2024, 28, 34–38. [Google Scholar] [CrossRef]
  22. Qiao, L.; Mashhadi, M.B.; Gao, Z.; Foh, C.H.; Xiao, P.; Bennis, M. Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models. IEEE Wirel. Commun. Lett. 2024, 13, 2652–2656. [Google Scholar] [CrossRef]
  23. Weaver, W. Recent Contributions to The Mathematical Theory of Communication. ETC A Rev. Gen. Semant. 1953, 10, 261–281. [Google Scholar]
  24. Zhong, Y. A Theory of Semantic Information. China Commun. 2017, 14, 1–17. [Google Scholar] [CrossRef]
  25. Yang, Y.; Guo, C.; Liu, F.; Liu, C.; Sun, L.; Sun, Q.; Chen, J. Semantic Communications with Artificial Intelligence Tasks: Reducing Bandwidth Requirements and Improving Artificial Intelligence Task Performance. IEEE Ind. Electron. Mag. 2023, 17, 4–13. [Google Scholar] [CrossRef]
  26. Sun, Q.; Guo, C.; Yang, Y.; Chen, J.; Xue, X. Semantic-Assisted Image Compression. arXiv 2022, arXiv:2201.12599. Available online: http://arxiv.org/abs/2201.12599 (accessed on 9 September 2024).
  27. Fan, S.; Liang, H.; Dong, C.; Xu, X.; Liu, G. A Specific Task-Oriented Semantic Image Communication System for Substation Patrol Inspection. IEEE Trans. Power Deliv. 2024, 39, 835–844. [Google Scholar] [CrossRef]
  28. Ma, S.; Qiao, W.; Wu, Y.; Li, H.; Shi, G.; Gao, D.; Shi, Y.; Li, S.; Al-Dhahir, N. Task-Oriented Explainable Semantic Communications. IEEE Trans. Wirel. Commun. 2023, 22, 9248–9262. [Google Scholar] [CrossRef]
  29. Huang, D.; Gao, F.; Tao, X.; Du, Q.; Lu, J. Toward Semantic Communications: Deep Learning-Based Image Semantic Coding. IEEE J. Sel. Areas Commun. 2023, 41, 55–71. [Google Scholar] [CrossRef]
  30. Semantic Drone Dataset. Available online: https://www.tugraz.at/index.php?id=22387 (accessed on 10 July 2024).
  31. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar] [CrossRef]
  32. Yu, W.; Cheng, G.; Wang, M.; Yao, Y.; Xie, X.; Yao, X.; Han, J. MAR20: A Benchmark for Military Aircraft Recognition in Remote Sensing Images. Natl. Remote Sens. Bull. 2023, 27, 2688–2696. [Google Scholar] [CrossRef]
  33. Osco, L.P.; Nogueira, K.; Ramos, A.P.M.; Pinheiro, M.M.F.; Santos, J.A.D. Semantic segmentation of citrus-orchard using deep neural networks and multispectral UAV-based imagery. Precis. Agric. 2021, 22, 1171–1188. [Google Scholar] [CrossRef]
  34. Rajani, D.C. Low and mid-level features for target detection in satellite images. Int. J. Adv. Res. Comput. Eng. Technol. 2013, 2, 212438858. [Google Scholar]
  35. Wang, Z.; Yang, P.; Liang, H.; Cui, W. Semantic Segmentation and Analysis on Sensitive Parameters of Forest Fire Smoke Using Smoke-Unet and Landsat-8 Imagery. Remote. Sens. 2021, 14, 45. [Google Scholar] [CrossRef]
  36. Gottschalk, S.; Demidova, E. EventKG+TL: Creating Cross-Lingual Timelines from an Event-Centric Knowledge Graph. In Proceedings of the ESWC 2018 Satellite Events, Heraklion, Crete, Greece, 3–7 June 2018. [Google Scholar]
  37. Wang, W.; Zhu, D.; Hu, K. A channel-gained single-model network with variable rate for multispectral image compression in UAV air-to-ground remote sensing. Multimed. Syst. 2024, 30, 193. [Google Scholar] [CrossRef]
  38. Chaudhary, P.K. FBSE-Based JPEG Image Compression. IEEE Sensors Lett. 2024, 8, 7001104. [Google Scholar] [CrossRef]
  39. Barman, D.; Hasnat, A.; Begum, S.; Barman, B. A deep learning based multi-image compression technique. Signal Image Video Process. 2024, 18, 407–416. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of task-oriented image semantic communication scheme driven by knowledge base.
Figure 2. Schematic of unmanned end.
Figure 3. Schematic of image fusion.
Figure 4. Schematic of task knowledge base.
Figure 5. Results under different task guidance: (a) original image. (b) image compressed by traditional algorithm. (c) Rebuilt image of semantic communication with “car” and “person” as cue words. (d) Rebuilt image of semantic communication with “car” as cue word. (e) Rebuilt image of semantic communication with “person” as cue word.
Figure 6. Learned perceptual image patch similarity (LPIPS) under different compression rates.
Figure 7. Target acquisition rate under different compression ratio.
Figure 8. Comparison of target detection rate for images of different schemes: (a) Original image. (b) Rebuilt image of semantic communication. (c) Image compressed by traditional algorithm.
Figure 9. Target acquisition rate under different compression ratio on the MAR20 dataset.
Table 1. Hyperparameters in the experiment.

Hyperparameter | Value
Confidence in Semantic Segmentation | 0.40
Confidence in Target Detection (performance analysis) | 0.60
Image fusion weight α | 1
Image fusion weight β | 1
Information unit classification threshold k1 | 1
Information unit classification threshold k2 | 1
Table 2. Ablation studies. Assessing the performance of various modules under a 5% compression ratio.

Method | LPIPS | Target Acquisition
Scheme without Knowledge Base | 0.44 | 0.83
Scheme without Yolo-World | 0.44 | 0.83
Scheme without Bandwidth Allocation | 0.34 | 0.94
Complete Scheme | 0.32 | 0.96

Guo, C.; Xi, J.; He, Z.; Liu, J.; Yang, J. A Knowledge Base Driven Task-Oriented Image Semantic Communication Scheme. Remote Sens. 2024, 16, 4044. https://doi.org/10.3390/rs16214044