Article

Toward Low Time Fragmentation of Equipment: A Double-DQN Based TT&C Task Planning Approach

School of Information and Control Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(10), 6326; https://doi.org/10.3390/app13106326
Submission received: 13 March 2023 / Revised: 14 May 2023 / Accepted: 18 May 2023 / Published: 22 May 2023

Abstract

With the increase in the number of satellites in space, satellite tracking, telemetry, and command (TT&C) is becoming increasingly important for aerospace. This paper proposes a low time fragmentation-oriented TT&C task-planning method based on the Double Deep Q-Network (DDQN). The method mainly addresses the poor responsiveness to emergency tasks caused by the large amount of equipment time fragments left by traditional TT&C task-planning methods. Firstly, a multi-objective optimization model aiming at minimizing time fragments and maximizing task revenue is formulated. Then, according to the conflict characteristics of tasks, a task-planning conflict graph is proposed, based on which the TT&C task-planning problem is transformed into an independent set problem. Finally, DDQN is combined with graph structure embedding to solve the transformed independent set problem. The experimental results demonstrate that, compared to existing methods, the proposed method reduces the time fragments of TT&C equipment by 32% and shortens the response time of emergency tasks by 36%.

1. Introduction

The satellite tracking, telemetry, and command (TT&C) system is used to transmit telemetry and telecommand data and to determine satellite orbits [1]. With the development and construction of prominent satellite constellations such as Hongyun and Hongyan in China [2], the number of satellites in space has increased rapidly [3,4], resulting in a growing TT&C demand for space networks. On the other hand, due to the long construction period, high cost, and constraints of national borders, the number of ground stations is relatively limited, resulting in a shortage of available TT&C resources. Therefore, more severe resource conflicts will be encountered in TT&C task planning, which brings significant challenges to the TT&C resource management of space networks.
TT&C tasks can be divided into two categories: common tasks and emergency tasks. Common tasks [5] request the daily tracking of satellites in space and form the majority of TT&C tasks. Emergency tasks [6] are real-time tasks that arise during emergencies. As satellites play an increasingly important role in disaster relief and emergency search and rescue, more and more emergency TT&C tasks are emerging. During the planning stage, an emergency task often needs to be inserted into the idle time of the TT&C equipment or to preempt resources already allocated to common tasks. However, existing TT&C task-planning methods aim only to maximize the total revenue of the completed tasks, which leaves many short time intervals (i.e., time fragments) between the scheduled TT&C windows. These fragments cannot be used for planning tasks and significantly hinder the planning of emergency tasks; when the resources of common tasks are preempted instead, those common tasks can no longer be executed successfully. Therefore, it is necessary to study TT&C task planning toward low time fragmentation, which facilitates the planning of emergency TT&C tasks and reduces their response time.
Existing methods for common TT&C task planning mainly include deterministic algorithms [7,8,9], heuristic algorithms [10,11,12], and intelligent algorithms [13,14,15,16,17]. Deterministic algorithms are mainly suitable for small-scale TT&C task scheduling problems and are difficult to apply at large scale. Heuristic algorithms rely on heuristic rules to obtain a good solution within a certain time, but these rules are often derived from engineering experience and carry great uncertainty. Intelligent algorithms seek optimal results while balancing efficiency and accuracy. With the gradual increase in satellites in recent years, traditional deterministic and heuristic algorithms are no longer the most suitable for TT&C task-planning problems, so most current research focuses on intelligent algorithms. References [13,14,15] established mathematical models of TT&C resource planning aimed at minimizing the workload and used different ant colony algorithms to solve them. Reference [16] designed an emergency task-planning algorithm based on simulated annealing with the goal of maximizing the total task revenue; the algorithm reduces the probability of falling into a local optimum through an iterative neighborhood search and an instability strategy, realizing a reasonable allocation of multi-dimensional TT&C resources. Reference [17] established a mathematical model of TT&C task planning with maximizing the overall task satisfaction rate as the optimization goal and proposed an improved DQN (Deep Q-Network)-based TT&C resource configuration method that effectively solves the joint planning problem of heterogeneous TT&C network resources and improves the satisfaction rate of TT&C tasks.
Although References [13,14,15,16,17] used intelligent algorithms to effectively improve the completion rate of TT&C tasks, these methods do not control the time fragments of TT&C equipment during the planning process, resulting in untimely responses to emergency tasks. Therefore, this paper designs a low time fragmentation-oriented TT&C task-planning method (LTFTT&C) based on the Double Deep Q-Network (DDQN) to minimize time fragmentation and maximize task revenue. The algorithm uses deep reinforcement learning to mine the distribution rules of the TT&C task resources of the space network and constructs feasible schemes for common TT&C tasks that consider both the task completion rate and the emergency task response speed. Unlike traditional TT&C task-planning algorithms, LTFTT&C reduces the time fragments produced during task planning while ensuring the success rate of the tasks. The basic idea is as follows: Firstly, aiming at the high time fragmentation and low utilization of TT&C equipment in existing planning results, a mathematical model of TT&C task planning is developed with the objective of minimizing the total time fragments and maximizing the total task revenue. Then, based on a conflict graph model, the task-planning problem is transformed into a maximum independent set problem. Finally, a DDQN-based task-planning algorithm and the mathematical model are used alternately to complete the solution process. Comparing LTFTT&C with existing methods, the simulation results show that the proposed algorithm reduces the time fragments of TT&C equipment while ensuring the revenue of common TT&C tasks and effectively improves the response speed to subsequent emergency tasks.
The remainder of this paper is organized as follows: Section 2 describes the TT&C task-planning scenario and problem. Section 3 establishes the mathematical model of the TT&C task-planning problem in a space network. Section 4 transforms the TT&C task-planning problem into an independent set problem on a graph. Section 5 introduces the low time fragmentation-oriented TT&C task-planning algorithm. Section 6 conducts simulation experiments to verify the effectiveness of the proposed algorithm. The last section concludes the paper.

2. Scenario and Problem Description

The TT&C scenario shown in Figure 1 is composed of satellites and ground stations. The set of satellites is $S = \{s_1, s_2, s_3, \ldots, s_i, \ldots, s_N\}$, where $s_i$ represents the $i$-th satellite. The set of ground stations is $G = \{g_1, g_2, g_3, \ldots, g_h, \ldots, g_M\}$, where $g_h$ represents the $h$-th ground station. Each ground station has $L$ sets of TT&C equipment. The set of all TT&C equipment is $A = \{a_1, a_2, a_3, \ldots, a_j, \ldots, a_L, a_{L+1}, \ldots, a_{2L}, \ldots, a_{ML}\}$, where $a_j$ represents the $j$-th piece of TT&C equipment. There are $wn_{i,j}$ visible windows between satellite $s_i$ and equipment $a_j$ in the planning horizon, and their set is $W_{i,j} = \{w_{i,j}^k \mid k = 0, 1, 2, \ldots, wn_{i,j}\}$, where $w_{i,j}^k$ represents the $k$-th visible window between satellite $s_i$ and equipment $a_j$. Each visible window $w_{i,j}^k \in W_{i,j}$ can be expressed as $w_{i,j}^k = [ws_{i,j}^k, we_{i,j}^k, ud_{i,j}^k]$, where $ws_{i,j}^k$ and $we_{i,j}^k$ are the start time and end time of the window and $ud_{i,j}^k$ is the flag of an ascending or descending orbit pass. $W_i = \bigcup_{1 \le j \le ML} W_{i,j}$ is the set of all visible windows of satellite $s_i$ over all TT&C equipment, and the set of all visible windows in the planning horizon is $W = \bigcup_{1 \le i \le N,\, 1 \le j \le ML} W_{i,j}$.
During the planning horizon, all $N$ satellites in the space network need to be tracked, telemetered, and commanded as required. $om_i$ represents the TT&C requirement of the $i$-th satellite, so the space network has $N$ TT&C tasks, whose set is denoted $OM = \{om_i \mid 1 \le i \le N\}$. Each $om_i \in OM$ is represented by a four-dimensional tuple $om_i = \langle id_i, wr_i, gr_i, \omega_i \rangle$, where $id_i$ is the ID of the satellite to be served by task $om_i$; $wr_i$ is the TT&C window requirement, i.e., the number of TT&C windows required and the requested ascending or descending orbit passes; $gr_i$ indicates the type of TT&C equipment, that is, the function of the equipment required by the task; and $\omega_i$ is the revenue brought by completing task $om_i$.
As shown in Figure 2, $w_{1,1}^1$, $w_{3,1}^2$, and $w_{2,1}^3$ are three scheduled windows of different tasks executed by the same equipment. A relatively short time interval, i.e., a time fragment, easily forms between adjacent scheduled windows of the same TT&C equipment. Specifically, two time intervals of lengths $\varepsilon_1$ and $\varepsilon_2$ are formed between the three consecutive windows $w_{1,1}^1$, $w_{3,1}^2$, and $w_{2,1}^3$ of equipment $a_1$. The length of $\varepsilon_1$ is shorter than the threshold $Th$, while the length of $\varepsilon_2$ is longer, so $\varepsilon_1$ forms a time fragment of equipment $a_1$. In general, if the interval between two adjacent task-executing windows on the same TT&C equipment is less than the threshold $Th$, a time fragment $\Delta t_l$ is formed, and the set of time fragments is $\Delta T = \{\Delta t_1, \Delta t_2, \Delta t_3, \ldots, \Delta t_l, \ldots\}$.
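This fragment definition reduces to a scan over each equipment's scheduled windows in start-time order. The following Python sketch illustrates the computation; the Window type, its field names, and the TH_SECONDS constant are illustrative assumptions rather than identifiers from the paper.

from typing import List, NamedTuple

class Window(NamedTuple):
    start: float  # window start time in seconds
    end: float    # window end time in seconds

TH_SECONDS = 5 * 60  # threshold Th; the experiments in Section 6 set Th to 5 min

def time_fragments(scheduled: List[Window]) -> List[float]:
    """Gaps shorter than Th between adjacent scheduled windows."""
    ordered = sorted(scheduled, key=lambda w: w.start)
    gaps = [nxt.start - cur.end for cur, nxt in zip(ordered, ordered[1:])]
    return [g for g in gaps if 0 < g < TH_SECONDS]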
The essence of the TT&C task-planning problem of the space network is to allocate suitable visible windows of suitable equipment to each satellite that needs to be tracked, telemetered, and commanded. In this way, the TT&C equipment can track, telemeter, monitor, and control the satellite's flight orbit, attitude, and the operational status of its onboard subsystems. The core problem studied in this paper is to maximize the total revenue of TT&C tasks while minimizing the total time fragments left after their execution, without exceeding the capacity of the network resources.

3. Mathematical Model

During the execution of a TT&C task, the transceiver equipment between the satellite and the ground station can only perform one task at a time. Therefore, there may be conflicts when different visible windows are scheduled simultaneously. Any two visible windows $w_{i,j}^k, w_{m,n}^r \in W$ conflict with each other if both of the following conditions are satisfied:
(1)
There is a time overlap between visible windows $w_{i,j}^k$ and $w_{m,n}^r$, i.e., $ws_{m,n}^r < ws_{i,j}^k < we_{m,n}^r$ or $ws_{m,n}^r < we_{i,j}^k < we_{m,n}^r$.
(2)
Windows $w_{i,j}^k$ and $w_{m,n}^r$ occupy the same satellite or the same TT&C equipment, i.e., $i = m$ or $j = n$.
As shown in Figure 3, the start time of window $w_{2,1}^1$ is before the end time of window $w_{1,1}^1$, and the end time of $w_{2,1}^1$ is after the start time of $w_{1,1}^2$. Because the overlapping windows occupy the same equipment, a conflict arises.
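The two conditions translate directly into a predicate over window pairs. The sketch below is one possible Python rendering; the VisWin type and its field names are assumptions for illustration, not identifiers from the paper.

from typing import NamedTuple

class VisWin(NamedTuple):
    sat: int     # satellite index i
    equip: int   # equipment index j
    start: float
    end: float

def windows_conflict(a: VisWin, b: VisWin) -> bool:
    overlap = a.start < b.end and b.start < a.end   # condition (1): time overlap
    shared = a.sat == b.sat or a.equip == b.equip   # condition (2): shared resource
    return overlap and shared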
A TT&C task of a satellite may require multiple ascending- or descending-orbit TT&C windows, and a combination of visible windows that meets the requirement constitutes a feasible scheme for the task. In this paper, $y_i^\alpha$ denotes the $\alpha$-th feasible scheme of task $om_i$, so the set of feasible schemes of task $om_i$ is $Y_i = \{y_i^1, y_i^2, \ldots, y_i^\alpha, \ldots\}$, and the set of feasible schemes of all TT&C tasks is $Y = \bigcup_{1 \le i \le N} Y_i$. For example, suppose the window requirement $wr_i$ of task $om_i$ is two ascending-orbit windows and two descending-orbit windows. Then the $\alpha$-th feasible scheme of the task is $y_i^\alpha = \{w_i^{\alpha,1}, w_i^{\alpha,2}, w_i^{\alpha,3}, w_i^{\alpha,4}\}$ with $w_i^{\alpha,1}, w_i^{\alpha,2}, w_i^{\alpha,3}, w_i^{\alpha,4} \in W$, where $w_i^{\alpha,1}$, $w_i^{\alpha,2}$, $w_i^{\alpha,3}$, and $w_i^{\alpha,4}$ are, respectively, the first ascending-orbit window, the first descending-orbit window, the second ascending-orbit window, and the second descending-orbit window of the scheme.
Since different visible windows between satellites and TT&C equipment may conflict, different feasible schemes may also conflict: two schemes conflict if they contain the same window or if some of their visible windows conflict. Therefore, any two feasible schemes $y_i^\alpha, y_j^\beta \in Y$ conflict if one of the following two conditions is met:
(1) $\exists\, w_{i,j}^k \in W$ such that $w_{i,j}^k \in y_i^\alpha$ and $w_{i,j}^k \in y_j^\beta$.
(2) $\exists\, w_{i,j}^k \in y_i^\alpha$ and $w_{m,n}^r \in y_j^\beta$ such that window $w_{i,j}^k$ conflicts with window $w_{m,n}^r$.
Let $O(y_i^\alpha)$ denote the set of all feasible schemes that conflict with $y_i^\alpha$. The Boolean variable $x_i^\alpha \in \{0, 1\}$ indicates whether the $\alpha$-th feasible scheme of task $om_i$ is scheduled: $x_i^\alpha = 1$ means the scheme is scheduled, and $x_i^\alpha = 0$ means it is not.
The goal of TT&C task planning is to allocate appropriate time windows for all TT&C tasks within the planning horizon so that the total revenue of TT&C tasks is maximized and the time fragments generated by them are minimized. The following conditions must be met when planning the TT&C tasks:
(1) A TT&C task can only be completed once by a TT&C feasible scheme.
(2) A TT&C scheme and its conflicting schemes cannot be scheduled at the same time.
The low time fragmentation-oriented TT&C task-planning problem is modeled as the following multi-objective optimization problem:
$$\mathbf{P}: \quad \min \sum_{l=1}^{|\Delta T|} \Delta t_l, \qquad \max \sum_{i=1}^{N} \omega_i \sum_{\alpha=1}^{|Y_i|} x_i^\alpha$$
Subject to the following:
$$C1: \quad \sum_{\alpha=1}^{|Y_i|} x_i^\alpha \le 1, \quad 1 \le i \le N$$
$$C2: \quad x_i^\alpha + \sum_{y_j^\beta \in O(y_i^\alpha)} x_j^\beta \le 1, \quad \forall\, y_i^\alpha \in Y$$
where C1 is the uniqueness constraint of tasks, which guarantees that each TT&C task is completed at most once. C2 is the conflict constraint of the TT&C schemes, i.e., a TT&C scheme and its conflicting schemes cannot be scheduled simultaneously.

4. Problem Transformation

Observing the above optimization problem P, we can see that it is a multi-objective combinatorial optimization problem, which has been proven to be NP-hard [18]. To solve the TT&C task-planning problem more conveniently and effectively, this paper first transforms it into a graph theory problem and then uses machine learning to solve it. Specifically, a task-planning conflict graph $CG(V, E)$ is constructed, in which the vertex $v_i^\alpha$ represents the feasible scheme $y_i^\alpha$, i.e., $V = \{v_i^\alpha \mid y_i^\alpha \in Y\}$, and the edges represent the conflict relationships between different feasible schemes. Any two vertices $v_i^\alpha, v_n^\beta \in V$ are in conflict if one of the following two conditions is met:
(1)
$v_i^\alpha$ and $v_n^\beta$ represent two different feasible schemes of the same task, i.e., $i = n$.
(2)
The feasible schemes corresponding to $v_i^\alpha$ and $v_n^\beta$ conflict with each other, i.e., $y_n^\beta \in O(y_i^\alpha)$.
Either condition means there is a conflict relationship between the two vertices, namely $(v_i^\alpha, v_n^\beta) \in E$.
These two conditions correspond to the constraints C1 and C2 in problem P: C1 captures the conflict between different feasible schemes of the same TT&C task, and C2 captures the conflict between feasible schemes of different TT&C tasks. Therefore, solving the optimization problem P is equivalent to solving the independent set problem on the graph.
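As an illustration, the conflict graph can be materialized in a few lines of Python. The sketch below assumes each feasible scheme is encoded as a (task_id, windows) pair and reuses the hypothetical windows_conflict() predicate sketched in Section 3; networkx is used only for convenience.

import itertools
import networkx as nx

def build_conflict_graph(schemes):
    """schemes: list of (task_id, tuple-of-windows) pairs, one per scheme y_i^alpha."""
    cg = nx.Graph()
    cg.add_nodes_from(range(len(schemes)))
    for u, v in itertools.combinations(range(len(schemes)), 2):
        task_u, wins_u = schemes[u]
        task_v, wins_v = schemes[v]
        same_task = task_u == task_v                    # condition (1) / C1
        clash = any(a == b or windows_conflict(a, b)    # condition (2) / C2
                    for a in wins_u for b in wins_v)
        if same_task or clash:
            cg.add_edge(u, v)
    return cg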

5. Low Time Fragmentation-Oriented TT&C Task-Planning Algorithm

5.1. Graph Learning-Based Framework

Section 4 transformed the TT&C task-planning problem into the problem of solving an independent set in the task-planning conflict graph. On this basis, this section combines deep learning and graph theory to propose a problem-solving framework, as shown in Figure 4. In this framework, the set IS represents the local solution of the independent set problem, which is a subset of the set V, and the Q evaluation function is obtained through reinforcement learning training.
As shown in Figure 4, we first construct the set of feasible schemes. On this basis, we construct a task-planning conflict graph and calculate the Q values of all vertices in the graph. Then, according to the Q evaluation function trained by deep learning, we select the vertex with the largest Q value and add it to the current local solution IS. Afterward, we delete this vertex and its neighbors from the task-planning conflict graph, which updates the graph. Lastly, we repeat the above process until no vertices remain in the task-planning conflict graph; the final local solution IS is the resulting independent set.
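Under the assumption that the trained evaluation function is available as a callable q_value(IS, v), a minimal Python sketch of this loop is given below; deleting the chosen vertex together with its neighbors is exactly what keeps IS an independent set.

def plan_with_q(cg, q_value):
    cg = cg.copy()
    IS = []
    while cg.number_of_nodes() > 0:
        best = max(cg.nodes, key=lambda v: q_value(IS, v))  # vertex with largest Q
        IS.append(best)
        # removing the vertex and its neighbors updates the conflict graph
        cg.remove_nodes_from(list(cg.neighbors(best)) + [best])
    return IS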

5.2. Feasible Scheme Set Construction

A TT&C task may have many feasible schemes, which increases the complexity of TT&C task planning for the space network. To solve the TT&C task-planning problem more effectively, this paper constructs the feasible scheme set of all tasks in advance. Specifically, we model the feasible planning orders of the visible windows of the same task as a directed graph. Then, the problem of constructing the set of feasible schemes is transformed into a path-searching problem in the directed graph. Finally, we use a traversal search algorithm to find all the feasible schemes of all TT&C tasks.
Let the directed graph $G_i(V_i, E_i)$ represent all visible windows of satellite $s_i$ and their relationships within a planning horizon. The vertex set $V_i = W_i \cup \{B, E\}$ contains the vertices corresponding to all visible windows between satellite $s_i$ and the TT&C equipment within the planning horizon, where the start vertex $B$ and the end vertex $E$ are two virtual visible windows of the TT&C task $om_i$. A directed edge of $G_i$ represents a sequential scheduling relationship between two visible windows of task $om_i$. For instance, let $w_{i,m}^k, w_{i,n}^r \in W_i$ be two visible windows of task $om_i$; if there is no overlap between them and the ascending/descending orbit alternation condition is satisfied, the two windows can be scheduled sequentially for the task. The edge set $E_i$, which represents the connection relationships between vertices, is defined as follows:
(1)
$\forall\, w_{i,j}^k \in V_i$, there exist $(B, w_{i,j}^k) \in E_i$ and $(w_{i,j}^k, E) \in E_i$;
(2)
$\forall\, w_{i,m}^k, w_{i,n}^r \in V_i$, if the two windows do not overlap and one is an ascending-orbit window while the other is a descending-orbit window (without loss of generality, we assume $ws_{i,n}^r > we_{i,m}^k$), then $(w_{i,m}^k, w_{i,n}^r) \in E_i$.
As shown in Figure 5, vertex $B$ and vertex $E$ represent the virtual start window and virtual end window, respectively, and $w_{1,1}^1$, $w_{1,1}^2$, $w_{1,2}^1$, $w_{1,1}^3$, $w_{1,1}^4$, $w_{1,2}^2$, $w_{1,2}^3$, and $w_{1,2}^4$ represent different visible windows. A path from $B$ to $E$ that satisfies the required number of windows and the ascending/descending requirements represents a feasible scheme within the planning period. Since the aim of the TT&C tasks of the space network is to track, telemeter, and command satellites, different tasks are independent of each other in the construction of feasible schemes. That is to say, the directed graphs of the satellites are independent of each other, and the problem of constructing feasible schemes becomes a path problem on each directed graph, which this paper solves with a traversal search algorithm on the graph [19,20,21].
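A minimal sketch of this traversal is given below: an iterative depth-first search enumerates every B-to-E path in the window graph G_i and keeps those whose interior windows satisfy the requirement wr_i. The satisfies() predicate is an assumed stand-in for the window-count and ascending/descending checks.

def enumerate_schemes(gi, satisfies, start="B", end="E"):
    """gi: networkx.DiGraph over windows plus the virtual B/E vertices."""
    schemes, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            interior = path[1:-1]              # drop the virtual B and E
            if satisfies(interior):
                schemes.append(interior)
            continue
        for nxt in gi.successors(node):
            stack.append((nxt, path + [nxt]))
    return schemes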

5.3. Vertex Attribute Extraction and Parametric Design of Q Function

Define the vertex attribute $x_v = (d_v, \beta_v, \varpi_v, \omega_v)$ of the task-planning conflict graph, where $d_v$ is the degree of vertex $v$ in the current task-planning conflict graph $CG_{\overline{IS}}$, and $\beta_v$ is the number of vertices in $CG_{\overline{IS}}$ that correspond to the same task as vertex $v$, namely,
$$\beta_v = \left|\left\{ v_i^\alpha \mid v_i^\alpha \in V_{\overline{IS}} \right\}\right|$$
where the operator $|\cdot|$ denotes the number of elements in a set. $\varpi_v$ represents the newly added time fragments after the feasible scheme represented by vertex $v$ is added to the current local solution, and $\omega_v$ is the revenue obtained after the task feasible scheme corresponding to vertex $v$ is completed.
To simplify the notation, $v$ is used in the following to denote the vertex $v_i^\alpha$ of the task-planning conflict graph. Graph structure embedding is carried out recursively: in each step of the recursion, information is embedded into each vertex of the graph so that it captures a wider range of graph structure and vertex attribute information. By embedding different degrees of information on each vertex, the computation can be effectively reduced and the algorithm efficiency improved. In particular, a $q$-dimensional vector $\mu_v^{(k)}$ represents the embedding of vertex $v$ obtained after the $k$-th iteration, with the initial embedding of each vertex set to 0:
$$\mu_v^{(k+1)} \leftarrow \mathrm{relu}\Big( \theta_1 x_v + \theta_2 \sum_{u \in N_{\overline{IS}}(v)} \mu_u^{(k)} \Big) \qquad (4)$$
As Equation (4) shows, the update of the graph structure embedding follows the adjacency relationships of the vertices in the graph, where $N_{\overline{IS}}(v)$ is the set of vertices adjacent to $v$ in $CG_{\overline{IS}}$, $\theta_1$ and $\theta_2$ are model parameters, and relu() is the rectified linear unit. A new round of embedding sweeping across the vertices starts only after the embedding update for all vertices from the previous round has finished. The process runs for $K$ iterations in total, after which each vertex obtains an embedding vector $\mu_v^{(K)}$.
After the $K$-step graph embedding is completed, the evaluation function $Q(IS, v; \Theta)$ is defined on the embeddings: $\sum_{u \in IS} \mu_u$ and $\mu_v$ represent the graph $CG$ and the vertex $v$, respectively, and $Q(IS, v; \Theta)$ is defined as follows:
$$Q(IS, v; \Theta) = \theta_3^{T}\, \mathrm{relu}\Big( \big[ \theta_4 \textstyle\sum_{u \in IS} \mu_u,\; \theta_5 \mu_v \big] \Big)$$
where $\theta_3$, $\theta_4$, and $\theta_5$ are the model parameters to be trained and $[\cdot,\cdot]$ is the concatenation operator. The embeddings $\mu_u$ and $\mu_v$ are calculated by Equation (4) and are therefore affected by the parameters $\theta_1$ and $\theta_2$ of the graph structure embedding, so the evaluation function $Q(IS, v; \Theta)$ is determined by the five parameters $\{\theta_i\}_{i=1}^{5}$.
Algorithm 1 shows the specific calculation steps of the evaluation function $Q(IS, v; \Theta)$, where $\sum_{u \in IS} \mu_u$ must be recalculated in each iteration according to the current subgraph of the task-planning conflict graph. The parameters $\{\theta_i\}_{i=1}^{5}$ are obtained through deep Q-learning training.
Algorithm 1 Calculation of the evaluation function $Q(IS, v; \Theta)$
1: Initialize $x_v$ and $\mu_v^{(0)} = 0$ for all $v \in V$
2: for $k$ = 1 to $K$ do
3:   for $v \in V$ do
4:     $\mu_v^{(k+1)} \leftarrow \mathrm{relu}\big( \theta_1 x_v + \theta_2 \sum_{u \in N_{\overline{IS}}(v)} \mu_u^{(k)} \big)$
5:   end for
6: end for
7: return $Q(IS, v; \Theta) = \theta_3^{T}\, \mathrm{relu}\big[ \theta_4 \sum_{u \in IS} \mu_u,\; \theta_5 \mu_v \big]$
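For concreteness, the NumPy sketch below mirrors Algorithm 1 in matrix form. All shapes are assumptions (vertex attributes of dimension p, embeddings of dimension q, theta1 of shape (q, p), theta2/theta4/theta5 of shape (q, q), theta3 of shape (2q,)); the synchronous sweep corresponds to Equation (4) and the final line to the Q readout above.

import numpy as np

def q_values(adj, X, in_is, theta, K=4):
    """adj: (n, n) adjacency of the remaining conflict graph;
    X: (n, p) vertex attributes (d_v, beta_v, varpi_v, omega_v);
    in_is: (n,) boolean mask of vertices already in IS."""
    t1, t2, t3, t4, t5 = theta
    n, q = X.shape[0], t1.shape[0]
    mu = np.zeros((n, q))
    for _ in range(K):                               # Equation (4), synchronous sweep
        mu = np.maximum(0.0, X @ t1.T + adj @ mu @ t2.T)
    pooled = t4 @ mu[in_is].sum(axis=0)              # theta4 times the sum of mu_u over IS
    feats = np.concatenate([np.tile(pooled, (n, 1)), mu @ t5.T], axis=1)
    return np.maximum(0.0, feats) @ t3               # one Q value per remaining vertex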

5.4. Evaluation Function Based on DDQN Network Training

As a typical value-based reinforcement learning algorithm, DQN combines deep learning and reinforcement learning to predict the Q value, which can be understood as the value of a state-action pair, i.e., the expected revenue the agent obtains by taking a certain action in a certain state. Although DQN has achieved good results in deep reinforcement learning, it sometimes overestimates action values too optimistically, which can cause the algorithm to consistently treat a suboptimal action value as the optimal one and eventually fail to converge to the best value function. DDQN differs from DQN in how the target value is calculated: DQN directly selects the largest Q value of the target network as the target value, whereas DDQN selects the action with the current (evaluation) Q network and evaluates it with the target network. The resulting Q value in DDQN is therefore less than or equal to that in DQN, which reduces the overestimation to a certain extent and brings the Q value closer to the true value.
Therefore, in this section, solving the independent set in the task-planning conflict graph is modeled as a DDQN learning process, and the Q evaluation function is trained accordingly. The state, action, and reward under the reinforcement learning framework are defined as follows:
  • State: The state in reinforcement learning is the local solution IS. Because the effect of the local solution on the subsequent independent set construction is fully captured by the task-planning conflict subgraph in the iteration process, the $q$-dimensional embedding vector $\sum_{u \in IS} \mu_u$ of the subgraph is used to represent the current state.
  • Action: The action $v$ is a vertex that does not belong to the current local solution IS, expressed as the $q$-dimensional embedding vector $\mu_v$.
  • Reward: This paper aims to minimize time fragments and maximize task revenue. Therefore, the designed reward is the revenue of vertex $v$ minus the increment in total time fragments after vertex $v$ is added to the current local solution IS (see the sketch after this list), namely:
    $$r(IS, v) = \lambda \omega_v - \Delta T(IS, v)$$
    where $\Delta T(IS, v)$ is the total time fragments generated by adding the feasible scheme represented by vertex $v$ to the set of feasible schemes corresponding to the current solution IS, and $\lambda \in (0, 1)$ is a constant.
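A possible Python rendering of this reward is sketched below. The equip_windows() helper, which maps a set of scheduled schemes to each equipment's window list, is an assumption; time_fragments() is the sketch from Section 2, and lam stands for the constant lambda.

def reward(IS, v, omega, equip_windows, lam=0.5):
    """Scaled revenue of v minus the fragment increment its insertion causes."""
    def total_frag(schemes):
        return sum(sum(time_fragments(ws))
                   for ws in equip_windows(schemes).values())
    return lam * omega[v] - (total_frag(IS + [v]) - total_frag(IS))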
The evaluation function $Q(IS, v; \Theta)$ gives the value of action $v$ in a given state IS, and DDQN is used to optimize its parameter $\Theta$ during the reinforcement learning process.
In this paper, two DDQN neural networks, i.e., a target network and an evaluation network, are constructed to train the model parameters. The evaluation network is used to select an action, and the target network is used to evaluate the value of that action; the two networks update their parameters alternately. Specifically, the target network has parameters $\Theta_t$ and the evaluation network has parameters $\Theta_e$. The parameter $\Theta_e$ is updated by minimizing the squared loss through the gradient descent method:
$$L(\Theta) = \mathbb{E}\big[ (h - Q(IS, v; \Theta_e))^2 \big] \qquad (7)$$
During the training process, the neural network copies the parameters $\Theta_e$ to $\Theta_t$ every several steps. In DDQN, the target value is defined as
$$h = \begin{cases} r(IS_\tau, v_\tau) + \gamma\, Q\big( IS_{\tau+1}, \arg\max_{v} Q(IS_{\tau+1}, v; \Theta_e); \Theta_t \big), & IS \ne \widehat{IS} \\ r(IS_\tau, v_\tau), & IS = \widehat{IS} \end{cases}$$
where $\gamma$ is the Q-learning decay coefficient and $\widehat{IS}$ denotes the terminal local solution.
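The decoupling of action selection and action evaluation in this target value can be sketched as follows; q_eval and q_target are assumed callables (state, action) -> float for the evaluation and target networks, and terminal corresponds to the case $IS = \widehat{IS}$.

def ddqn_target(r, next_state, actions, q_eval, q_target, gamma=0.99, terminal=False):
    if terminal or not actions:
        return r
    # the evaluation network selects the action, the target network scores it
    a_star = max(actions, key=lambda a: q_eval(next_state, a))
    return r + gamma * q_target(next_state, a_star)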
When constructing an independent set, the reward obtained for each added vertex comes not only from the successful planning of the task represented by that vertex but also from the subsequent rewards of non-conflicting tasks planned later. In this delayed-reward setting, a Q function updated at every single step is too short-sighted to reflect the long-term revenue well. Therefore, this paper uses the n-step Q-learning method to update the parameters. Unlike single-step learning, n-step Q-learning waits n steps before updating the parameters; only in this way can the Q function provide a more precise evaluation of future rewards. The parameter update method is the same as in single-step learning (Equation (7)); the difference lies in the calculation of the target value:
$$h = \begin{cases} \sum_{i=1}^{n-1} r(IS_{\tau+i}, v_{\tau+i}) + \gamma\, Q\big( IS_{\tau+n}, \arg\max_{v} Q(IS_{\tau+n}, v; \Theta_e); \Theta_t \big), & IS_{\tau+n-1} \ne \widehat{IS} \\ \sum_{i=1}^{n-1} r(IS_{\tau+i}, v_{\tau+i}), & IS_{\tau+n-1} = \widehat{IS} \end{cases}$$
Moreover, there is a strong correlation between the consecutive actions of the independent set construction process. To reduce the impact of this correlation between consecutive samples on neural network training, this paper uses experience replay to speed up the convergence of the neural network and improve the robustness of the training results. Specifically, a sample is added to the data set $\Pi$ at each step of every round; for example, at step $\tau + n$, the tuple $(IS_{\tau+n}, v_{\tau+n}, R_{\tau+n,\tau}, IS_\tau)$ is added to $\Pi$, where $R_{\tau+n,\tau} = \sum_{i=1}^{n-1} r(IS_{\tau+i}, v_{\tau+i})$.
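A sketch of this n-step replay store is given below: once more than n steps of history exist, the tuple $(IS_{\tau+n}, v_{\tau+n}, R_{\tau+n,\tau}, IS_\tau)$ is pushed into $\Pi$. The capacity and n defaults are illustrative assumptions.

import random
from collections import deque

class NStepReplay:
    def __init__(self, capacity=10000, n=5):
        self.buf = deque(maxlen=capacity)   # the data set Pi
        self.n = n
        self.trace = []                     # per-episode (state, action, reward) history

    def step(self, state, action, r):
        self.trace.append((state, action, r))
        if len(self.trace) > self.n:        # steps tau .. tau+n are all available
            s_tau = self.trace[-(self.n + 1)][0]                  # IS_tau
            R = sum(rew for _, _, rew in self.trace[-self.n:-1])  # R_{tau+n,tau}
            self.buf.append((state, action, R, s_tau))

    def sample(self, batch_size):
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))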
In summary, the independent set construction exhibits a delayed-reward structure and a strong correlation between consecutive samples. We therefore address the delayed reward with n-step Q-learning and reduce the sample correlation with experience replay. Finally, this paper designs an independent set algorithm based on double deep Q-learning to complete the training of the parameter $\Theta$, as shown in Algorithm 2, where $CG(V_c, E_c)$ is a task-planning conflict graph constructed from a set of tasks drawn from the sample space.
Algorithm 2: Evaluation function training based on DDQN learning
1: Initialize the experience replay storage space $\Pi$ and random parameters $\Theta_t$, $\Theta_e$
2: for round $e$ = 1 to $L$ do
3:   Extract a task set and construct the corresponding task-planning conflict graph $CG(V_c, E_c)$
4:   Initialize $\tau = 1$ and $IS_1 = \varnothing$
5:   while $V_{\overline{IS_\tau}} \ne \varnothing$ do
6:     $v_\tau = \begin{cases} \text{a random vertex } v \in V_{\overline{IS_\tau}} & \text{w.p. } \varepsilon \\ \arg\max_{v \in V_{\overline{IS_\tau}}} Q(IS_\tau, v; \Theta_e) & \text{w.p. } 1 - \varepsilon \end{cases}$
7:     Add vertex $v_\tau$ to the local solution: $IS_{\tau+1} = IS_\tau \cup \{v_\tau\}$
8:     Delete vertex $v_\tau$ and its neighbor vertices from $CG_{\overline{IS}}$
9:     if $\tau \ge n$ then
10:      Add the tuple $(IS_{\tau+n}, v_{\tau+n}, R_{\tau+n,\tau}, IS_\tau)$ to $\Pi$
11:      Sample a random batch $B$ from $\Pi$
12:      Use the samples in $B$ to train and update $\Theta_e$ by the gradient descent method
13:      Replace the target parameters $\Theta_t \leftarrow \Theta_e$ every $c$ steps
14:    end if
15:    $\tau = \tau + 1$
16:  end while
17: end for
18: return $\Theta_t$
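The epsilon-greedy choice on line 6 of Algorithm 2 can be sketched as a small helper: with probability eps a random remaining vertex is explored, otherwise the vertex with the highest evaluation-network Q value is exploited. pick_vertex() and its arguments are illustrative names, not identifiers from the paper.

import random

def pick_vertex(remaining, IS, q_eval, eps=0.1):
    if random.random() < eps:
        return random.choice(list(remaining))           # explore
    return max(remaining, key=lambda v: q_eval(IS, v))  # exploit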

6. Simulation

In this paper, STK 11.3 is used to build the space network scenario, TensorFlow 1.15 is used to build the neural network, and Python 3.7 is used to build the main program of the algorithm. The space network scenario is configured as follows:
(1)
A total of 150 low-orbit satellites are located in 150 different orbits. The orbital altitudes are 500 km, 650 km, and 800 km. Inclinations are evenly distributed within the range of 96 to 100 degrees.
(2)
There are two ground stations (Xi’an Station and Kashi Station in China), each of which has three sets of TT&C equipment.
The task-planning horizon is from 2022-08-27 04:00:00 to 2022-08-28 04:00:00. In the experiment, each TT&C task has two to six different TT&C window requests with ascending- or descending-orbit requirements. A total of 150 groups of tasks are randomly generated, of which 120 are used as training sets and 30 as test sets. The time fragment length threshold Th is set to 5 min.
Figure 6 shows the convergence of the Q-network training at different learning rates. In general, the larger the learning rate, the faster the Q function converges. With a learning rate of 0.05, the loss function converges fastest but quickly falls into a local optimum; with a learning rate of 0.0005, the loss function converges too slowly, so the training takes too long to achieve a good result. Therefore, this paper trains the Q evaluation function with a learning rate of 0.005, which neither lets the loss function converge too slowly nor, to a certain extent, lets it fall into a local optimum, and finally achieves the best training result.
To verify the performance of the proposed algorithm, this paper compares the LTFTT&C algorithm with the greedy randomized adaptive search procedure (GRASP) [22] and the genetic algorithm (GA) [23]. The comparison covers the total time fragments in different TT&C task scenarios, the average response time of emergency tasks, and the total task revenue.
Figure 7 compares the total time fragments of the TT&C equipment under the three TT&C task-planning algorithms for different task numbers. Compared with the other two algorithms, the LTFTT&C algorithm significantly reduces the total amount of equipment time fragments for every task number: by 48% relative to the GRASP algorithm and by 32% relative to the genetic algorithm. To further verify the gain that this reduced time fragmentation brings to emergency task planning, emergency tasks were inserted on top of the common-task plans produced by the three algorithms, and the response time (i.e., the difference between an emergency task's completion time and its arrival time) was measured. As shown in Figure 8, as the number of emergency tasks increases, the average response time of the emergency tasks increases to varying degrees under all three algorithms, and the LTFTT&C algorithm is clearly better than the other two: its average response time is 43% shorter than that of the GRASP algorithm and 36% shorter than that of the genetic algorithm. This is mainly because the LTFTT&C algorithm effectively reduces the amount of time fragments, making it easier to insert emergency tasks into suitable visible windows.
Figure 9 shows the performance of the three algorithms in terms of total task revenue. When the number of tasks is small, the LTFTT&C algorithm is significantly better than the two comparison algorithms, with task revenue 21% and 15% higher than the GRASP and genetic algorithms, respectively. When the number of tasks is large, the LTFTT&C algorithm improves the total task revenue by 28% compared with the GRASP algorithm but decreases it by 7% compared with the genetic algorithm. Combining Figure 7 and Figure 9: compared with GRASP, the LTFTT&C algorithm adopts deep reinforcement learning, which not only improves the revenue but also reduces the time fragments; compared with the genetic algorithm, it sacrifices a small amount of revenue in exchange for a significant decrease in time fragmentation, thus effectively reducing the response time of emergency tasks.

7. Conclusions

In this paper, we proposed a low time fragmentation-oriented TT&C task-planning method to reduce the total time fragments of TT&C equipment and shorten the response time of emergency tasks. Firstly, we established a mathematical model with the optimization goals of minimizing the total time fragments and maximizing the total revenue of TT&C tasks. Then, we transformed the TT&C task-planning problem into an independent set problem on a graph and introduced DDQN from deep reinforcement learning as the evaluation strategy. Afterward, we designed the low time fragmentation-oriented TT&C task-planning method to improve the service capability of TT&C resources. The simulation results of multiple experimental scenarios showed that the LTFTT&C algorithm is significantly better than the GRASP and GA algorithms in terms of time fragmentation. Therefore, using the LTFTT&C algorithm to solve the TT&C task-planning problem is reasonable and effective.

Author Contributions

Methodology, H.X. and R.L.; Validation, H.X.; Writing—original draft, H.X.; Writing—review & editing, R.L.; Supervision, R.L. All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of Shaanxi Province (2023-JC-YB-556), in part by the Young Talent fund of University Association for Science and Technology in Shaanxi China (20200112), in part by the Key Research and Development Program in Shaanxi Province of China (2021GY-006), and in part by the Postdoctoral Foundation in Shaanxi Province of China (2018BSHEDZZ47).

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhan, Y.; Wan, P.; Jiang, C.; Pan, X.; Chen, X.; Guo, S. Challenges and solutions for the satellite tracking, telemetry, and command system. IEEE Wirel. Commun. 2020, 27, 12–18. [Google Scholar] [CrossRef]
  2. Chen, S. Analysis on low-orbit satellite communication and my country’s development suggestions. Telecommun. Sci. 2020, 36, 1–13. [Google Scholar]
  3. Dong, Y.; Li, Y.; Chen, J.; Ou, T.; Li, Y.; Xi’an Satellite Control Center. Demand analysis of command control system of the space TT&C network. In Proceedings of the 2nd International Conference on Information Technology and Computer Application (ITCA), Beijing, China, 25 July 2019; pp. 236–239. [Google Scholar]
  4. Zhou, D.; Sheng, M.; Li, J.; Han, Z. Aerospace Integrated Networks Innovation for Empowering 6G: A Survey and Future Challenges. IEEE Commun. Surv. Tutor. 2023. [Google Scholar] [CrossRef]
  5. Zhai, X.; Niu, X.; Tang, H.; Wu, L.; Shen, Y. Robust satellite scheduling approach for dynamic emergency tasks. Math. Probl. Eng. 2015, 2015, 1–20. [Google Scholar] [CrossRef]
  6. Dai, C.Q.; Li, C.; Fu, S.; Zhao, J.; Chen, Q. Dynamic scheduling for emergency tasks in space data relay network. IEEE Trans. Veh. Technol. 2020, 70, 795–807. [Google Scholar] [CrossRef]
  7. Whitley, D.; Sutton, A.; Howe, A.; Barbulescu, L. Resource scheduling with permutation based representations: Three applications. Evol. Comput. Pract. 2008, 88, 219–243. [Google Scholar]
  8. Marinelli, F.; Nocella, S.; Rossi, F.; Smriglio, S. A Lagrangian heuristic for satellite range scheduling with resource constraints. Comput. Oper. Res. 2011, 38, 1572–1583. [Google Scholar] [CrossRef]
  9. Sarkheyli, A.; Bagheri, A.; Ghorbani-Vaghei, B.; Askari-Moghadam, R. Using an effective tabu search in interactive resources scheduling problem for LEO satellites tasks. Aerosp. Sci. Technol. 2013, 29, 287–295. [Google Scholar] [CrossRef]
  10. Chen, M.; Wen, J.; Song, Y.J.; Xing, L.N.; Chen, Y.W. A population perturbation and elimination strategy based genetic algorithm for multi-satellite TT&C scheduling problem. Swarm Evol. Comput. 2021, 65, 100912. [Google Scholar]
  11. Ren, B.; Liu, J.; Li, Z.; Wu, J.; Yao, Z. Satellite Requirement Preference Driven TT&C Resources Scheduling Algorithm for Time Sensitive Tasks. In Proceedings of the 3rd International Conference on Electronic Information and Communication Technology (ICEICT), Shenzhen, China, 13–15 November 2020; pp. 15–19. [Google Scholar]
  12. Gu, X.; Bai, J.; Zhang, C.; Gao, H. Study on TT&C resources scheduling technique based on inter-satellite link. Acta Astronaut. 2014, 104, 26–32. [Google Scholar]
  13. Zhang, Z.; Hu, F.; Zhang, N. Ant colony algorithm for satellite control resource scheduling problem. Appl. Intell. 2018, 48, 3295–3305. [Google Scholar] [CrossRef]
  14. Zhang, N.; Feng, Z.; Ke, L. Guidance-solution based ant colony optimization for satellite control resource scheduling problem. Appl. Intell. 2011, 35, 436–444. [Google Scholar] [CrossRef]
  15. Zhang, Z.; Zhang, N.; Feng, Z. Multi-satellite control resource scheduling based on ant colony optimization. Expert Syst. Appl. 2014, 41, 2816–2823. [Google Scholar] [CrossRef]
  16. Wang, Y.; Zhou, D.; Song, N.; Sheng, M.; Li, J.; Liu, J. Concurrent Reconfiguration of Resource-Oriented Emergency TT&C Task Scheduling for Space Information Networks. J. Commun. Inf. Netw. 2021, 6, 142–152. [Google Scholar]
  17. Xue, N.; Ding, D.; Fan, Y.; Wang, Z. Research on Joint Scheduling Method of Heterogeneous TT&C Network Resources Based on Improved DQN Algorithm. In Proceedings of the 2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 17–19 December 2021; Volume 2, pp. 1009–1014. [Google Scholar]
  18. Barbulescu, L.; Watson, J.P.; Whitley, L.D.; Howe, A.E. Scheduling space–ground communications for the air force satellite control network. J. Sched. 2004, 7, 7–34. [Google Scholar] [CrossRef]
  19. Christofides, N. The optimum traversal of a graph. Omega 1973, 1, 719–732. [Google Scholar] [CrossRef]
  20. Cheung, T.Y. Graph traversal techniques and the maximum flow problem in distributed computation. IEEE Trans. Softw. Eng. 1983, SE-9, 504–512. [Google Scholar] [CrossRef]
  21. Buchsbaum, A.L.; Goldwasser, M.H.; Venkatasubramanian, S.; Westbrook, J.R. On external memory graph traversal. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, USA, 9–11 January 2000; pp. 859–860. [Google Scholar]
  22. Rojanasoonthon, S.; Bard, J. A GRASP for parallel machine scheduling with time windows. INFORMS J. Comput. 2005, 17, 32–51. [Google Scholar] [CrossRef]
  23. Wu, B.; Li, Y.-X.; Huang, Y.-X. Optimal scheduling of TT&C network resources based on genetic algorithm. J. Astronaut. 2006, 27, 1132–1136. [Google Scholar]
Figure 1. TT&C scenario.
Figure 2. Time fragments.
Figure 3. Conflicts among visible windows.
Figure 4. Graph learning-based framework for solving the TT&C task-planning problem.
Figure 5. Directed graph of visible windows.
Figure 6. The change of the loss function with the number of training steps.
Figure 7. Total time fragments vs. number of tasks.
Figure 8. Average response time of emergency tasks.
Figure 9. Total revenue of tasks vs. number of tasks.