Optimal Task Abort and Maintenance Policies Considering Time Redundancy

Chen, Ke; Zhao, Xian; Qiu, Qingan

doi:10.3390/math10091360


by Ke Chen, Xian Zhao and Qingan Qiu ^*

School of Management & Economics, Beijing Institute of Technology, Beijing 100081, China

^* Author to whom correspondence should be addressed.

Mathematics 2022, 10(9), 1360; https://doi.org/10.3390/math10091360

Submission received: 20 March 2022 / Revised: 10 April 2022 / Accepted: 11 April 2022 / Published: 19 April 2022

(This article belongs to the Special Issue Data-Driven Methods and Artificial Intelligence in Reliability and Maintenance)


Abstract: For many practical systems that must perform critical tasks, a task can often be attempted multiple times within a limited time to improve the task success probability. This property is referred to as time redundancy. This paper studies the optimal adaptive maintenance and task abort strategies of continuously degrading systems under two kinds of time redundancy, with the aim of improving system safety and task reliability. The task abort decision is made dynamically according to the degradation level and the number of task attempts. Task success probability and system survival probability under the two kinds of time redundancy are evaluated using an event-based numerical algorithm. The optimal imperfect maintenance and task abort thresholds are determined dynamically for each attempt to minimize the expected total cost of maintenance, task failure and system failure. The established model is illustrated by numerical results.

Keywords: task abort; time redundancy; task success probability; system survival probability

MSC: 93E20

1. Introduction

Various practical systems are commonly required to perform specific tasks within a certain duration. Task success probability (TSP), the probability of completing a required task with or without deadlines, is a core metric for evaluating performance during task execution [1,2,3,4,5,6,7]. Existing models of task-based systems are mainly devoted to the evaluation and maximization of TSP [8,9,10]. However, there are often situations where the survival of the system is more crucial than the successful completion of the required tasks, since the failure of these systems causes huge economic losses and major environmental hazards [11,12,13,14]. For example, when external influences (such as extreme natural conditions) raise the risk of failure to a high level, drones are designed to abort the surveillance task and immediately start rescue procedures [15]. As system failures in critical engineering applications often result in huge damages and casualties, it is pivotal to implement a detailed task abort policy that balances the trade-off between TSP and system survival probability (SSP), thereby minimizing the expected total cost of task failures and system failures.

Time redundancy is commonly incorporated into task execution to improve the TSP of task-critical systems. Time redundancy allows systems to execute tasks multiple times within a constrained time. For instance, satellites can perform the task of transmitting information between the spacecraft and the ground observation station multiple times within a given time window [16]. There exist two types of time redundancy according to the criteria of task success: in type I time redundancy (ITR), systems should run continuously for a period of time greater than a specified value; in type II time redundancy (IITR), task success requires that the cumulative operating time be larger than the required value [8].

In addition to task termination and time redundancy, preventive maintenance is another critical factor influencing TSP and SSP. Task abort policy and time redundancy are designed during task execution while preventive maintenance is taken before or after task execution. Preventive maintenance is crucial to ensure the highly reliable performance during task execution. To improve the TSP and SSP of safety-critical systems, an effective preventive maintenance scheme considering task abort and time redundancy should be developed.

Despite the significant theoretical advancement in the optimization of abort policy, the effect of time redundancy on the TSP and SSP under different abort and maintenance policies has not been explored. In real-world applications, time redundancy can enhance the system performance significantly and consequently influence the decision-making process. Taking time redundancy into account when the abort and maintenance decisions are made leads to more effective and beneficial task abort policies. To further advance the state of the art of evaluating and enhancing system performance during task execution, this paper models dynamic condition-based task termination and maintenance strategies considering two types of time redundancy. The results indicate that by introducing time redundancy into the abort and maintenance decision making, the SSP and TSP can be significantly improved. Our contributions to existing theoretical and practical research on risk control are summarized as follows:

Dynamic preventive maintenance and task abort policies are designed that vary with the number of task attempts;
TSP and SSP are derived under proposed preventive maintenance and task abort policies considering ITR and IITR;
The optimal preventive maintenance and abort thresholds minimizing the expected cost of preventive maintenance, task failure and system failure are studied.

The rest of this paper is organized as follows. Section 2 reviews the literature on task abort, time redundancy and maintenance planning. Section 3 characterizes the monotone degradation behavior and develops maintenance and task abort policies considering ITR and IITR. The TSP and SSP are evaluated under ITR and IITR in Section 4 and Section 5, respectively. Section 6 studies the optimal maintenance and abort thresholds. The obtained results are illustrated with a case study in Section 7. Section 8 concludes the paper and discusses future research directions.

2. Literature Review

Intensive efforts have been dedicated to modeling task abort policy with the aim of balancing the TSP and SSP under different types of system and mission characteristics. The optimal task abort strategy for unmanned aerial vehicles (UAVs) modeled by single-component systems under external shocks was studied in [15,17,18,19,20], where the task abort was determined by a threshold of the number of external shocks. In [21], the multi-state shock models were considered, and each shock may lead to the degradation of the system to a worse performance level and eventually cause system failure. The system state was used in the decision optimization problem, and the trade-off between TSP and SSP was studied. In addition to the shock-based abort decision making, other criteria can also be used to guide the termination strategy of safety-critical systems. For systems with a defective state, the duration of defective state [22] and warning signal [23] can be used to guide abort decision making. For systems subject to minor and catastrophic failures, the number of minimal repairs can be chosen as the decision variable of the termination strategy [24,25]. The abort modeling for systems with continuous degradation is drawing increasing attention thanks to the wide application of sensor technologies in safety-critical systems [26,27,28,29,30].

As an extension of abort policies for single-component systems, the pioneering work on the optimal task termination strategy for multi-component systems was conducted by Myers [12], who considered a hot standby system and developed a task termination strategy that takes the number of failed components as a decision variable. The rescue procedure in multi-component systems is commonly triggered by a certain number of failed components or external shocks to avoid costly consequences. Levitin et al. [31] generalized the model in [12] to the case of different components and proposed an adaptive termination strategy. Filene and Daly [32] characterized the effect of the task termination strategy on TSP and SSP in distributed computer systems. Peng [33] designed the termination strategy of multiple cooperative UAVs subject to external shocks. The optimal routing, aborting, and hitting strategies of UAVs were investigated in [34,35]. Levitin [36] calculated TSP and SSP considering the fault propagation effect. Levitin [21] considered several subtasks performed by different groups of units and studied the optimal subtask assignment and optimal termination strategy between units. The joint optimization of mission abort and component switching policies for multi-state warm standby systems was studied in [19].

The above-mentioned references have considered termination policies allowing only one attempt to complete a task. However, for systems performing critical tasks, time redundancy is commonly taken into consideration to improve their TSP. Reliability modeling of time redundancy has received considerable research attention in the past several years. In [10,37], the TSP and optimal maintenance policy under ITR were modeled and analyzed. The abort modeling incorporating time redundancy is a rather new topic. In [38], the optimal termination strategy was studied for single-component systems with a fixed number of attempts to complete the task. In [21], optimal abort rules for multi-component systems with multiple allowed attempts were considered. The optimal mission abort policies under ITR and IITR were investigated in [16]. Compared with the existing literature with constant abort thresholds, the abort threshold varies with the task attempts due to time redundancy.

In addition to task termination and time redundancy, preventive maintenance is another critical factor influencing TSP and SSP. Planning maintenance actions in an optimal way can not only enhance system reliability but also reduce the system operation cost. According to the effect of maintenance actions, existing maintenance policies can be classified into perfect maintenance and imperfect maintenance [39,40,41,42,43]. Perfect maintenance restores the system to an “as good as new” state, and the maintenance effect of imperfect maintenance is worse than that of perfect maintenance. Under an age-based preventive maintenance policy, a product is maintained at a certain age or upon failure, whichever occurs first. The rapid development of sensing technology makes it much easier to monitor the condition of the system, which facilitates modeling the system degradation paths through random processes such as the Wiener process and the gamma process. For systems with measurable degradation, condition-based maintenance is more effective than age-based maintenance in reducing the risk of failure [44,45,46,47,48,49]. In existing models, the joint effects of time redundancy, preventive maintenance and task abort on TSP and SSP have not been considered.

3. Problem Formulation

We consider safety-critical systems whose degradation is stochastically increasing. Because the degradation path is monotone, the degradation process is modeled by a homogeneous Gamma process $\{Y (t), t \geq 0\}$ with shape function $α t$ and scale parameter $β$. That is, $\{Y (t), t \geq 0\}$ possesses the following properties:

$Y (0) = 0$ with probability one.
For $u < v$ , the degradation increment in time interval $(u, v)$ , $Y (v) - Y (u)$ follows Gamma distribution with the distribution function $G_{(α (v - u), β)} (y)$ given as

$G_{(α (v - u), β)} (y) = 1 - \frac{Γ (α (v - u), β y)}{Γ (α (v - u))},$

(1)

and probability density function $g_{(α (v - u), β)} (y)$

$g_{(α (v - u), β)} (y) = \frac{β^{α (v - u)} y^{α (v - u) - 1} e^{- β y}}{Γ (α (v - u))} .$

(2)

Here, $Γ (u)$ and $Γ (u, v)$ are the Gamma function and incomplete Gamma function, which are, respectively, defined as

$\left\{\begin{matrix} Γ (u) = \int_{0}^{\infty} v^{u - 1} e^{- v} d v, \\ Γ (u, v) = \int_{v}^{\infty} z^{u - 1} e^{- z} d z . \end{matrix}\right.$

(3)
$Y (t)$ has independent increments over disjoint intervals.
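
Equations (1) and (2) can be evaluated numerically without special libraries. The following is a minimal sketch using only the Python standard library; the function names `reg_lower_gamma`, `increment_cdf` and `increment_pdf` are illustrative and do not come from the paper. It assumes that $β$ enters as a rate, as the density in Equation (2) implies, and it uses a power series for the regularized lower incomplete Gamma function, which is adequate only for moderate arguments.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) = 1 - Γ(a, x) / Γ(a), by power series."""
    if x <= 0.0:
        return 0.0
    # First series term x^a e^{-x} / Γ(a + 1), computed in log space for stability.
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total = term
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= x / (a + n)  # ratio of consecutive series terms
        total += term
    return min(total, 1.0)


def increment_cdf(y: float, alpha: float, beta: float, dt: float) -> float:
    """Equation (1): P(Y(v) - Y(u) <= y) with dt = v - u."""
    return reg_lower_gamma(alpha * dt, beta * y)


def increment_pdf(y: float, alpha: float, beta: float, dt: float) -> float:
    """Equation (2): density of the increment Y(v) - Y(u) with dt = v - u."""
    if y <= 0.0:
        return 0.0
    a = alpha * dt
    log_pdf = a * math.log(beta) + (a - 1.0) * math.log(y) - beta * y - math.lgamma(a)
    return math.exp(log_pdf)
```

For $α (v - u) = 1$ the increment is exponentially distributed with rate $β$, which gives a quick sanity check: `increment_cdf(y, 1.0, beta, 1.0)` should agree with $1 - e^{- β y}$.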

System failure occurs once the degradation level reaches the threshold $D$. The random system lifetime can therefore be defined as the first hitting time of the degradation process $Y (t)$ with respect to the failure threshold $D$. For the considered system, SSP is measured through the probability that no catastrophic failure occurs during task execution. To enhance the SSP under continuous degradation, a task can be terminated if the degradation level exceeds a specified level, in which case a rescue procedure is started immediately. The duration of a rescue procedure started at time $t$, denoted by $ϕ (t)$, is increasing in $t$. Let $τ$ denote the required task duration and let $ε$ be the time after which completing the task takes less time than the rescue procedure, namely,

ϕ (t) + t > τ, \forall t > ε .

Thus, for $t > ε$, the task will not be aborted.

For systems executing critical tasks, time redundancy is another commonly adopted method to improve TSP and SSP. With time redundancy, the task can be executed multiple times before the required deadline $\hat{τ}$. Let $K$ be the maximum number of attempts by time $\hat{τ}$. The mission succeeds if, in any attempt $k \leq K$, the system completes the task within the deadline $\hat{τ}$. The following two common types of time redundancy are considered in the established models:

Task success under ITR: the continuous operating time should exceed a threshold $τ$ $(τ < \hat{τ})$ ;
Task success under IITR: the cumulative operating time should exceed a threshold $τ$ $(τ < \hat{τ})$ .
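
The two success criteria differ only in whether operating time carries over between attempts. A minimal sketch of that difference follows; the function names are hypothetical, the input is assumed to be the list of operating times of the successive attempts (all assumed to fit within the deadline $\hat{τ}$), and "exceeds" is implemented as "reaches or exceeds".

```python
from typing import Optional, Sequence


def first_success_itr(durations: Sequence[float], tau: float) -> Optional[int]:
    """Return the 1-based attempt whose single uninterrupted run reaches tau, else None."""
    for k, run in enumerate(durations, start=1):
        if run >= tau:
            return k
    return None


def first_success_iitr(durations: Sequence[float], tau: float) -> Optional[int]:
    """Return the 1-based attempt at which the cumulative operating time reaches tau, else None."""
    total = 0.0
    for k, run in enumerate(durations, start=1):
        total += run
        if total >= tau:
            return k
    return None
```

With attempt durations `[3, 4, 5]` and `tau = 6`, no single run is long enough for ITR, whereas under IITR the task succeeds in the second attempt because 3 + 4 already reaches the required time.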

A crucial problem in designing the task termination policy is balancing TSP and SSP. To achieve more accurate risk evaluation and control under time redundancy, a dynamic task termination policy is proposed in which the control limit for task termination varies with the attempts. Specifically, in the $i$th attempt, the task is terminated and a rescue procedure is started if the degradation level is larger than the threshold $d_{i}$. Let $T_{d_{i}}$ be the duration from the beginning of the $i$th attempt to the task termination instant if abort threshold $d_{i}$ is adopted, which is defined as the first hitting time of $Y (t)$ with respect to the termination threshold $d_{i}$. By Equation (2), the cumulative distribution function of $T_{d_{i}}$, $F_{T_{d_{i}}} (t)$, is given as

F_{T_{d_{i}}} (t) = P (Y (t) > d_{i}) = \int_{d_{i}}^{\infty} g_{α t, β} (y) d y = \frac{Γ (α t, β d_{i})}{Γ (α t)} .

(4)
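
Equation (4) relies on the monotonicity of the path: the threshold has been crossed by time $t$ exactly when $Y (t) > d_{i}$. The sketch below checks this by simulation on a time grid. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the grid step `dt` and the seed are choices made here, and `random.gammavariate` takes a scale argument, so the paper's $β$ (a rate in Equation (2)) is passed as `1 / beta`.

```python
import random


def first_crossing_time(d, alpha, beta, dt, horizon, rng):
    """Return the first grid time at which a simulated Gamma path exceeds d, or None."""
    n_steps = round(horizon / dt)
    y = 0.0
    for i in range(1, n_steps + 1):
        # Independent Gamma increment over one grid step (shape alpha * dt, scale 1 / beta).
        y += rng.gammavariate(alpha * dt, 1.0 / beta)
        if y > d:
            return i * dt
    return None


def estimate_hitting_cdf(t, d, alpha, beta, dt=0.1, n_paths=20_000, seed=0):
    """Monte Carlo estimate of F_{T_d}(t) = P(T_d <= t) = P(Y(t) > d)."""
    rng = random.Random(seed)
    hits = sum(
        first_crossing_time(d, alpha, beta, dt, t, rng) is not None
        for _ in range(n_paths)
    )
    return hits / n_paths
```

For $α t = 1$ the level $Y (t)$ is exponential with rate $β$, so the estimate for `t = 1`, `d = 1`, `alpha = 1`, `beta = 1` should be close to $e^{- 1}$, up to Monte Carlo error.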

In addition to task abort and time redundancy, a preventive maintenance policy is incorporated to enhance the TSP and SSP. Specifically, upon the completion of a rescue procedure, imperfect maintenance is carried out, whose effect is characterized by the maintenance degree $λ$. Given the degradation level $y$ at the completion of a rescue, the degradation after preventive maintenance is reduced to $λ y$ with $0 \leq λ \leq 1$. The case $λ = 1$ implies that no maintenance action is performed, and $λ = 0$ corresponds to replacement. The preventive maintenance cost associated with maintenance degree $λ$ and degradation $y$ is denoted by $c_{p} (λ, y)$, which is increasing in $y$ and decreasing in $λ$. When $λ = 1$, no maintenance is performed, and thus $c_{p} (1, y) = 0$.
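
The maintenance rule itself is a one-line scaling of the degradation level. The sketch below pairs it with one possible cost function. The linear form `unit_cost * (1 - lam) * y` is purely an assumption chosen because it satisfies the stated properties (increasing in $y$, decreasing in $λ$, zero at $λ = 1$); the paper does not specify $c_{p}$ in this section, and the function names are hypothetical.

```python
def apply_maintenance(y: float, lam: float) -> float:
    """Degradation after imperfect maintenance of degree lam (0 = replacement, 1 = no action)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("maintenance degree must lie in [0, 1]")
    return lam * y


def maintenance_cost(lam: float, y: float, unit_cost: float = 1.0) -> float:
    """Assumed cost c_p(lam, y) = unit_cost * (1 - lam) * y; an illustrative form only."""
    return unit_cost * (1.0 - lam) * y
```

Under this assumed form the cost is proportional to the amount of degradation removed, $(1 - λ) y$, which is one simple way to make deeper maintenance on a more degraded system more expensive.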

Figure 1 illustrates the multiple attempts under ITR and IITR. It can be seen from Figure 1a that under IITR, the task is terminated in attempt 1, and the degradation after rescue completion is below the failure threshold. Thus, the system survives the rescue process. After the first attempt, imperfect maintenance is carried out to reduce the degradation level. The operating time of the first attempt is accumulated, and the task succeeds in attempt 2 before the task abort time $T_{2}$. Figure 1b shows a sample path under ITR where the task is aborted in attempts 1 and 2 and the continuous operating time reaches $τ$ in the third attempt.

4. TSP and SSP under ITR

In this section, TSP and SSP are derived under ITR considering the proposed dynamic abort and maintenance policies. Since multiple attempts can be executed until task success, an event transition-based numerical algorithm is adopted to evaluate the TSP and SSP.

4.1. TSP Evaluation under ITR

Under ITR, the continuous operating time should exceed a threshold $τ$ $(τ < \hat{τ})$. Let $T_{k}$ and $Y_{k}$ be random variables representing the remaining time for task execution and the degradation level before the $k$th attempt. Let $q_{k} (t, y)$ be the joint pdf of $T_{k}$ and $Y_{k}$. A new system starts operation with the remaining task execution time $\hat{τ}$ at time 0. Therefore, by the definition of $q_{k} (t, y)$, the corresponding probability mass function of $T_{0}$ and $Y_{0}$ can be given as follows

q_{0} (t, y) = \left\{\begin{matrix} 1, & t = \hat{τ}, y = 0, \\ 0, & \text{otherwise} . \end{matrix}\right.

Since both the task abort and maintenance policies depend on the number of attempts, in the $(k - 1)$th attempt, if abort threshold $d_{k - 1}$ and maintenance degree $λ_{k - 1}$ are adopted, then the rescue initiation time and rescue completion time are $T_{d_{k - 1}}$ and $T_{d_{k - 1}} + ϕ (T_{d_{k - 1}})$, respectively. Let $H_{k - 1}$ denote the abort and preventive maintenance policies during the $(k - 1)$th attempt. The degradation level at the beginning of the $k$th attempt can be recursively determined by

Y_{k} = I_{(H_{k - 1} = (d_{k - 1}, λ_{k - 1}))} λ_{k - 1} (Y_{k - 1} + Δ_{d_{k - 1}}),

(5)

where $I (A)$ denotes the indicator function of event $A$ and $Δ_{d_{k - 1}}$ is the degradation increment during the $(k - 1)$th attempt if abort threshold $d_{k - 1}$ is adopted. Given the elapsed time ${\tilde{T}}_{d_{k - 1}}$ of the $(k - 1)$th attempt, the remaining time for task execution at the beginning of the $k$th attempt is given as

T_{k} = T_{k - 1} - {\tilde{T}}_{d_{k - 1}} .

(6)

Let $g_{k - 1} (\tilde{t}, \tilde{y})$ be the joint probability density function of the operating time and degradation increment of the $(k - 1)$th attempt. By Equations (5) and (6), given the degradation increment $\tilde{y}$ and operating time $\tilde{t}$ of the $(k - 1)$th attempt, the degradation and remaining task execution time before the $(k - 1)$th attempt are $\frac{y}{λ_{k - 1}} - \tilde{y}$ and $t + \tilde{t}$, respectively. Thus, one can obtain the overall unconditional probability density function $q_{k} (t, y)$ recursively as

q_{k} (t, y) = I_{(H_{k - 1} = (d_{k - 1}, λ_{k - 1}))} \int_{0}^{min (τ, (\hat{τ} - t))} \int_{0}^{y / λ_{k - 1}} q_{k - 1} (t + \tilde{t}, \frac{y}{λ_{k - 1}} - \tilde{y}) g_{k - 1} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} .
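
The arguments $t + \tilde{t}$ and $\frac{y}{λ_{k - 1}} - \tilde{y}$ in the recursion are simply the inverse of the state update in Equations (5) and (6). The following sketch makes that bookkeeping explicit; the class and function names are hypothetical, and the inverse map assumes $λ_{k - 1} > 0$ (for a replacement, $λ_{k - 1} = 0$, the previous degradation cannot be recovered from the current one).

```python
from typing import NamedTuple


class AttemptState(NamedTuple):
    remaining_time: float  # T_k, time left before the deadline
    degradation: float     # Y_k, degradation level at the start of the attempt


def next_state(prev: AttemptState, elapsed: float, increment: float, lam: float) -> AttemptState:
    """Forward update of Equations (5) and (6) after an aborted attempt and maintenance."""
    return AttemptState(
        remaining_time=prev.remaining_time - elapsed,
        degradation=lam * (prev.degradation + increment),
    )


def previous_state(cur: AttemptState, elapsed: float, increment: float, lam: float) -> AttemptState:
    """Inverse map used inside the recursion: (t, y) -> (t + t_tilde, y / lam - y_tilde)."""
    if lam <= 0.0:
        raise ValueError("the inverse map requires lam > 0")
    return AttemptState(
        remaining_time=cur.remaining_time + elapsed,
        degradation=cur.degradation / lam - increment,
    )
```

Applying `previous_state` to the output of `next_state` with the same elapsed time, increment and maintenance degree returns the original state, which is the change of variables behind the double integral.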

In what follows, we focus on deriving the expression for $g_{k - 1} (\tilde{t}, \tilde{y})$. Note that the Gamma process $Y (t)$ is a jump process with an infinite number of jumps in any finite interval, so the degradation level at time $T_{d_{k - 1}}$ is not exactly $d_{k - 1}$ but exceeds it by a non-degenerate random overshoot. By Bertoin [50], the joint probability density function of $T_{d_{k - 1}}$ and $Y (T_{d_{k - 1}})$ can be given as

f_{(T_{d_{k - 1}}, Y (T_{d_{k - 1}}))} (t, y) = \int_{0}^{\infty} I_{(d_{k - 1} \leq y < d_{k - 1} + s)} g_{(α t, β)} (y - s) μ (d s),

(7)

where $μ (d s)$ is the Lévy measure of the Gamma process with parameters $α$ and $β$ given by

μ (d s) = \frac{α e^{- β s}}{s}, s > 0 .
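
The overshoot is easy to observe in simulation. The sketch below is a grid-based approximation and not an evaluation of Equation (7): it records $Y (T_{d}) - d$ at the first grid point where the simulated path exceeds $d$, so the result depends on the step `dt` and tends to overstate the continuous-time overshoot. The function name, parameter values and seed are illustrative assumptions, and `1 / beta` is passed as the scale because $β$ acts as a rate in Equation (2).

```python
import random


def overshoot_sample(d: float, alpha: float, beta: float, dt: float, rng: random.Random) -> float:
    """Simulate one path on a grid and return Y(T_d) - d at the first crossing of level d."""
    y = 0.0
    while y <= d:
        # Independent Gamma increment over one grid step (shape alpha * dt, scale 1 / beta).
        y += rng.gammavariate(alpha * dt, 1.0 / beta)
    return y - d


rng = random.Random(0)
samples = [overshoot_sample(d=5.0, alpha=2.0, beta=1.0, dt=0.05, rng=rng) for _ in range(500)]
mean_overshoot = sum(samples) / len(samples)
```

Every sample is strictly positive, which is why the density in Equation (7) is supported on levels at or above the abort threshold rather than concentrated at the threshold itself.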

Since the task is aborted at the $(k - 1)$th attempt, if the elapsed time of the $(k - 1)$th attempt is $\tilde{t}$, then $T_{d_{k - 1}} + ϕ (T_{d_{k - 1}}) = \tilde{t}$. By Equation (7), the expression for $g_{k - 1} (\tilde{t}, \tilde{y})$ can be obtained using the independent increments property of the Gamma process

\begin{matrix} g_{k - 1} (\tilde{t}, \tilde{y}) & = I_{(t + ϕ (t) = \tilde{t})} f_{(T_{d_{k - 1}}, Y (T_{d_{k - 1}}))} (t, y) g_{(α ϕ (t), β)} (\tilde{y} - y) \\ & = \int_{0}^{\infty} I_{(t + ϕ (t) = \tilde{t})} I_{(d_{k - 1} \leq y < d_{k - 1} + s)} g_{(α t, β)} (y - s) g_{(α ϕ (t), β)} (\tilde{y} - y) μ (d s) . \end{matrix}

(8)

Under ITR, if the task succeeds at the $k$th attempt by time $\hat{τ}$, then the remaining time for task execution before the $k$th attempt should be larger than the task duration $τ$, and no system failure occurs at the $k$th attempt. That is, the rescue initiation time in the $k$th attempt, $T_{d_{k}}$, is larger than $ε$, and the system lifetime $L_{d_{k}}$ is greater than the task duration $τ$, i.e., $T_{d_{k}} > ε$ and $L_{d_{k}} > τ$. Then, the probability that the task is completed at the $k$th attempt under ITR is given as

\begin{matrix} R_{I} (k) = \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} P (T_{d_{k}} > ε, L_{d_{k}} > τ | Y_{k} = y) q_{k} (t, y) d y d t . \end{matrix}

(9)

Due to the stationary increments property of the Gamma process, the degradation increment in time interval $(s, t)$, $Y (t) - Y (s)$, follows a Gamma distribution with shape parameter $α (t - s)$ and scale parameter $β$. Based on the distribution of the degradation increment $Y (t) - Y (s)$ in Equation (2) and the joint probability density function of $T_{d_{k}}$ and $Y (T_{d_{k}})$ in Equation (7), we have

\begin{matrix} P (T_{d_{k}} > ε, L_{d_{k}} > τ | Y_{k} = y) \\ = P (T_{d_{k}} > τ) + P (ε < T_{d_{k}} \leq τ, L_{d_{k}} > τ) \\ = P (Y (τ) - Y (0) < d_{k} - y) + \int_{ε}^{τ} \int_{d_{k}}^{D} P (Y (τ) - Y (\tilde{t}) < D - \tilde{y}) f_{(T_{d_{k}}, Y (T_{d_{k}}))} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} \\ = 1 - \frac{Γ (α τ, β (d_{k} - y))}{Γ (α τ)} + \int_{ε}^{τ} \int_{d_{k}}^{D} (1 - \frac{Γ (α (τ - \tilde{t}), β (D - \tilde{y}))}{Γ (α (τ - \tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}))} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} . \end{matrix}

(10)

Based on Equation (10), the probability that the task is completed at the kth attempt under ITR is given as

R_{I} (k) = \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} (\begin{matrix} 1 - \frac{Γ (α τ, β (d_{k} - y))}{Γ (α τ)} \\ + \int_{ε}^{τ} \int_{d_{k}}^{D} (1 - \frac{Γ (α (τ - \tilde{t}), β (D - \tilde{y}))}{Γ (α (τ - \tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}))} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} \end{matrix}) q_{k} (t, y) d y d t .

Since the number of attempts until task success is mutually exclusive, by the law of total probability, the TSP under ITR can be obtained as

R_{I} = \sum_{k = 1}^{K} R_{I} (k) .

4.2. SSP Evaluation under ITR

The system survives the task if it completes either the task or the rescue process. Consequently, SSP is the sum of TSP and rescue success probability. If the system survives after making $k$ attempts before time $\hat{τ}$, then the task is aborted at the $k$th attempt and the rescue procedure succeeds, i.e., $T_{d_{k}} < ε$ and $T_{d_{k}} + ϕ (T_{d_{k}}) < L_{d_{k}}$, and the remaining time after the $k$th rescue is smaller than $τ$ so that no further attempt is made, i.e., $t - (T_{d_{k}} + ϕ (T_{d_{k}})) < τ$. Thus, the probability that the system survives after $k$ attempts is given by

\begin{matrix} S_{I} (k) = \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} P (T_{d_{k}} < ε, t - τ < T_{d_{k}} + ϕ (T_{d_{k}}) < L_{d_{k}} | Y_{k} = y) q_{k} (t, y) d y d t . \end{matrix}

(11)

In accordance with the property of independent and stationary increments of Gamma process, given the degradation and remaining task execution time at the beginning of the kth attempt, the probability that the system survives after the kth attempt is given by

\begin{matrix} P (T_{d_{k}} < ε, t - τ < T_{d_{k}} + ϕ (T_{d_{k}}) < L_{d_{k}} | Y_{k} = y) \\ = \int_{ζ}^{ε} \int_{d_{k}}^{D} P (T_{d_{k}} + ϕ (T_{d_{k}}) < L | T_{d_{k}} = \tilde{t}, Y (T_{d_{k}}) = \tilde{y}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} \\ = \int_{ζ}^{ε} \int_{d_{k}}^{D} P (Y (\tilde{t} + ϕ (\tilde{t})) - Y (\tilde{t}) < D - \tilde{y}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} \\ = \int_{ζ}^{ε} \int_{d_{k}}^{D} (1 - \frac{Γ (α ϕ (\tilde{t}), β (D - \tilde{y}))}{Γ (α ϕ (\tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t}, \end{matrix}

(12)

where $ζ$ satisfies $t - ζ - ϕ (ζ) = τ$ and

f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) = \int_{0}^{\infty} I_{(d_{k} \leq \tilde{y} < d_{k} + s)} g_{(α \tilde{t}, β)} (\tilde{y} - y - s) μ (d s) .

According to Equations (11) and (12), the probability that the system survives after k attempts under ITR in Equation (11) is given by

\begin{matrix} S_{I} (k) = \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} \int_{ζ}^{ε} \int_{d_{k}}^{D} (1 - \frac{Γ (α ϕ (\tilde{t}), β (D - \tilde{y}))}{Γ (α ϕ (\tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} q_{k} (t, y) d y d t . \end{matrix}

Since the number of attempts until rescue success is mutually exclusive, the SSP under ITR can be obtained by the law of total probability as

S_{I} = \sum_{k = 1}^{K} S_{I} (k) .
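
As a cross-check on the analytical expressions, the ITR mission can also be simulated directly. The sketch below is a Monte Carlo illustration under stated assumptions and is not the event-based numerical algorithm of the paper: the path is advanced on a grid of step `dt`, failure and abort are checked only at grid points, the lists `d` and `lam` hold the thresholds $d_{k}$ and degrees $λ_{k}$ for $k = 1, \ldots, K$, `phi` is any positive rescue-duration function, and survival is counted as "no failure occurred", whether the mission ended in task success or in a completed rescue. All names are hypothetical.

```python
import random


def simulate_itr_mission(alpha, beta, D, tau, tau_hat, d, lam, phi, eps, dt, rng):
    """Simulate one mission under ITR; return (task_success, system_survived)."""
    y, remaining = 0.0, tau_hat
    n_steps = round(tau / dt)
    for d_k, lam_k in zip(d, lam):
        if remaining < tau:
            break  # not enough time left for a full uninterrupted run
        aborted = False
        for i in range(1, n_steps + 1):
            y += rng.gammavariate(alpha * dt, 1.0 / beta)  # beta is a rate, so scale = 1 / beta
            s = i * dt  # operating time within the current attempt
            if y >= D:
                return False, False  # system failure during the task
            if y > d_k and s <= eps:
                rescue = phi(s)
                y += rng.gammavariate(alpha * rescue, 1.0 / beta)
                if y >= D:
                    return False, False  # system failure during the rescue
                remaining -= s + rescue
                y = lam_k * y  # imperfect maintenance after a completed rescue
                aborted = True
                break
        if not aborted:
            return True, True  # ran continuously for tau without failure
    return False, True  # task not completed, but the system survived


def estimate_itr(n_runs, seed, **params):
    """Return Monte Carlo estimates (TSP, SSP) from n_runs simulated missions."""
    rng = random.Random(seed)
    successes = survivals = 0
    for _ in range(n_runs):
        ok, alive = simulate_itr_mission(rng=rng, **params)
        successes += ok
        survivals += alive
    return successes / n_runs, survivals / n_runs
```

A call such as `estimate_itr(2000, 1, alpha=1.0, beta=1.0, D=8.0, tau=2.0, tau_hat=10.0, d=[3.0] * 3, lam=[0.5] * 3, phi=lambda s: 0.5 + 0.5 * s, eps=1.0, dt=0.1)` returns a pair of estimates; the parameter values here are arbitrary and chosen only to exercise the code.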

5. TSP and SSP under IITR

In this section, TSP and SSP are derived under IITR considering the dynamic abort and maintenance policies. Similar to the derivation of TSP and SSP under ITR, the recursive method is adopted to evaluate the TSP and SSP.

5.1. TSP Evaluation under IITR

Under IITR, the cumulative operating time should exceed a threshold $τ$ $(τ < \hat{τ})$. Let $T_{k}$, $Y_{k}$, and $M_{k}$ be random variables representing the remaining time for task execution, the degradation level, and the cumulative operating time before the $k$th attempt. Let $q_{k} (t, y, m)$ be the joint pdf of $T_{k}$, $Y_{k}$, and $M_{k}$. A new system starts operation with the remaining task execution time $\hat{τ}$ and cumulative operating time 0 at the beginning of the first attempt. Therefore, by the definition of $q_{k} (t, y, m)$, the corresponding probability mass function of $T_{0}$, $Y_{0}$ and $M_{0}$ can be given as

q_{0} (t, y, m) = \left\{\begin{matrix} 1, & t = \hat{τ}, y = 0, m = 0, \\ 0, & \text{otherwise} . \end{matrix}\right.

Let $g_{k - 1} (\tilde{t}, \tilde{y}, \tilde{m})$ be the joint probability density function of the operating time, degradation increment and time in task of the $(k - 1)$th attempt. Given the degradation increment $\tilde{y}$, operating time $\tilde{t}$ and time in task $\tilde{m}$ of the $(k - 1)$th attempt, the degradation, remaining task execution time and cumulative time in task before the $(k - 1)$th attempt are $\frac{y}{λ_{k - 1}} - \tilde{y}$, $t + \tilde{t}$ and $m - \tilde{m}$, respectively. Thus, one can obtain the overall unconditional probability density function $q_{k} (t, y, m)$ recursively as

q_{k} (t, y, m) = I_{(H_{k - 1} = (d_{k - 1}, λ_{k - 1}))} \int_{0}^{τ - m} \int_{0}^{y / λ_{k - 1}} \int_{0}^{min (τ, (\hat{τ} - t))} q_{k - 1} (t + \tilde{t}, \frac{y}{λ_{k - 1}} - \tilde{y}, m - \tilde{m}) g_{k - 1} (\tilde{t}, \tilde{y}, \tilde{m}) d \tilde{t} d \tilde{y} d \tilde{m} .

Under IITR, if $k$ attempts are made until task success before time $\hat{τ}$, then the remaining time for task execution before the $k$th attempt should be greater than the time required to finish the remaining task, and the cumulative operating time is larger than $τ$ after the $k$th attempt. There exist two possible scenarios for task success. In scenario 1, the cumulative operating time exceeds $τ$ before the abort time $T_{d_{k}}$. In scenario 2, the cumulative operating time is smaller than $τ$ before the abort time $T_{d_{k}}$ but exceeds $τ$ before failure occurs. Thus, the probability that the task is completed at the $k$th attempt under IITR is given as

\begin{matrix} R_{I I} (k) & = \int_{0}^{τ} \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} P (T_{d_{k}} > ε, L_{d_{k}} > τ - m | Y_{k} = y) q_{k} (t, y, m) d y d t d m \\ & = \int_{τ - ε}^{τ} \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} P (T_{d_{k}} \geq τ - m | Y_{k} = y) q_{k} (t, y, m) d y d t d m \\ & + \int_{0}^{τ - ε} \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} P (ε < T_{d_{k}} < τ - m, L_{d_{k}} > τ - m | Y_{k} = y) q_{k} (t, y, m) d y d t d m . \end{matrix}

(13)

According to the distribution of the degradation increment in Equation (2) and the joint probability density function of $T_{d_{k}}$ and $Y (T_{d_{k}})$ in Equation (7), the probability that the mission is completed before reaching the abort threshold $d_{k}$ is

P (T_{d_{k}} > τ - m | Y_{k} = y) = 1 - \frac{Γ (α (τ - m), β (d_{k} - y))}{Γ (α (τ - m))},

(14)

and the probability that the mission is completed after reaching the abort threshold $d_{k}$ is

\begin{matrix} P (ε < T_{d_{k}} < τ - m, L_{d_{k}} > τ - m | Y_{k} = y) \\ = \int_{ε}^{τ - m} \int_{d_{k}}^{D} P \{Y (τ - m) - Y (\tilde{t}) < D - \tilde{y}\} f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} \\ = \int_{ε}^{τ - m} \int_{d_{k}}^{D} (1 - \frac{Γ (α (τ - m - \tilde{t}), β (D - \tilde{y}))}{Γ (α (τ - m - \tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} . \end{matrix}

(15)

Based on the expression in Equations (14) and (15), the probability that the task is completed at the kth attempt under IITR can be derived as

\begin{matrix} R_{I I} (k) & = \int_{τ - ε}^{τ} \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} (1 - \frac{Γ (α (τ - m), β (d_{k} - y))}{Γ (α (τ - m))}) q_{k} (t, y, m) d y d t d m \\ & + \int_{0}^{τ - ε} \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} \int_{ε}^{τ - m} \int_{d_{k}}^{D} (1 - \frac{Γ (α (τ - m - \tilde{t}), β (D - \tilde{y}))}{Γ (α (τ - m - \tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} q_{k} (t, y, m) d y d t d m . \end{matrix}

Since the number of attempts until task success is mutually exclusive, by the law of total probability, the TSP under IITR can be obtained as

R_{I I} = \sum_{k = 1}^{K} R_{I I} (k) .

5.2. SSP Evaluation under IITR

According to the time redundancy property, if the system survives after making $k$ attempts before time $\hat{τ}$, then the task is aborted at the $k$th attempt, i.e., $T_{d_{k}} < ε$ and $T_{d_{k}} + ϕ (T_{d_{k}}) < L_{d_{k}}$, and the remaining time for task execution after the completion of the $k$th rescue procedure should be smaller than the remaining task time, i.e., $t - ϕ (T_{d_{k}}) < τ - m$. Based on the probability density function of $T_{d_{k}}$, the probability that the system survives after $k$ attempts under IITR is given by

\begin{matrix} S_{I I} (k) = \int_{0}^{τ} \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} P (T_{d_{k}} < ε, T_{d_{k}} + ϕ (T_{d_{k}}) < L, t - ϕ (T_{d_{k}}) < τ - m | Y_{k} = y) q_{k} (t, y, m) d y d t d m . \end{matrix}

(16)

By the distribution of the degradation increment in Equation (2) and the joint probability density function of $T_{d_{k}}$ and $Y (T_{d_{k}})$ in Equation (7), it follows that

\begin{matrix} P (T_{d_{k}} < ε, T_{d_{k}} + ϕ (T_{d_{k}}) < L, t - ϕ (T_{d_{k}}) < τ - m | Y_{k} = y) \\ = \int_{ψ}^{ε} \int_{d_{k}}^{D} P \{T_{d_{k}} + ϕ (T_{d_{k}}) < L | T_{d_{k}} = \tilde{t}, Y (T_{d_{k}}) = \tilde{y}\} f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} \\ = \int_{ψ}^{ε} \int_{d_{k}}^{D} P \{Y (\tilde{t} + ϕ (\tilde{t})) - Y (\tilde{t}) < D - \tilde{y}\} f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} \\ = \int_{ψ}^{ε} \int_{d_{k}}^{D} (1 - \frac{Γ (α ϕ (\tilde{t}), β (D - \tilde{y}))}{Γ (α ϕ (\tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t}, \end{matrix}

(17)

where

ψ

satisfies

t - ϕ (ψ) = τ - m

. Using Equation (17), the probability that the system survives after k attempts under IITR can be given as

S_{I I} (k) = \int_{0}^{τ} \int_{τ}^{\hat{τ}} \int_{0}^{d_{k}} I_{(H_{k} = (d_{k}, λ_{k}))} \int_{ψ}^{ε} \int_{d_{k}}^{D} (1 - \frac{Γ (α ϕ (\tilde{t}), β (D - \tilde{y}))}{Γ (α ϕ (\tilde{t}))}) f_{(T_{d_{k}}, Y (T_{d_{k}}) | Y_{k} = y)} (\tilde{t}, \tilde{y}) d \tilde{y} d \tilde{t} q_{k} (t, y, m) d y d t d m .

In a similar manner, the SSP under IITR can be obtained by the law of total probability as

S_{I I} = \sum_{k = 1}^{K} S_{I I} (k) .
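The factor 1 - Γ(α ϕ, β (D - y)) / Γ(α ϕ) inside Equation (17) is the probability that a Gamma-distributed degradation increment accumulated over a rescue of length ϕ stays below the remaining margin D - y. The following is a minimal Python sketch of that single factor, not the authors' implementation. It assumes, from the form of the equations, that β enters as a rate parameter; the function names `reg_lower_gamma` and `survive_window` are hypothetical.

```python
import math


def reg_lower_gamma(a: float, x: float, tol: float = 1e-12, max_terms: int = 10_000) -> float:
    """Regularized lower incomplete gamma P(a, x) = 1 - Gamma(a, x) / Gamma(a).

    Uses the power series gamma(a, x) = x^a e^-x * sum_n x^n / (a (a+1) ... (a+n)).
    """
    if a <= 0:
        raise ValueError("shape a must be positive")
    if x <= 0:
        return 0.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return min(1.0, total * math.exp(log_prefactor))


def survive_window(alpha: float, beta: float, duration: float, margin: float) -> float:
    """P{degradation increment over `duration` < `margin`} for a homogeneous Gamma process.

    Assumption: the increment has shape alpha * duration and rate beta.
    """
    if margin <= 0:
        return 0.0          # no margin left: the failure threshold is already reached
    if duration <= 0:
        return 1.0          # no time elapses, so no further degradation
    return reg_lower_gamma(alpha * duration, beta * margin)
```

For example, `survive_window(1.2, 0.4, 0.5 * 6.0, 20.0 - 14.0)` would evaluate the factor for an abort at 6 h with degradation 14 mm under the case-study parameters of Section 7.1. Where SciPy is available, `scipy.special.gammainc(a, x)` computes the same regularized function.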

6. Optimal Abort and Maintenance Policies

The TSP increases with the abort thresholds, whereas the SSP decreases with them, because a longer task execution raises the failure risk. It is therefore of practical value to find the task abort thresholds that best balance the trade-off between TSP and SSP. A commonly used criterion for this optimization problem is economic loss. The cost incurred during task execution comprises the maintenance cost before each attempt, the task failure cost and the system failure cost. Denote the random maintenance cost over task execution by W. Let

c_{m}

and

c_{s}

be the task failure cost and system failure cost, respectively. Based on the expressions for TSP and SSP, the expected total cost under ITR during task execution can be given as

E (C_{I}) = c_{m} (1 - R_{I}) + c_{s} (1 - S_{I}) + E (W_{I}),

(18)

and the expected total cost under IITR during task execution can be given as

E (C_{I I}) = c_{m} (1 - R_{I I}) + c_{s} (1 - S_{I I}) + E (W_{I I}) .

(19)
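Equations (18) and (19) share one structure, which can be written as a small helper. This is only an illustrative sketch: the function name `expected_total_cost` is hypothetical, the default costs 500 and 1700 are the values assumed later in the case study, and the probabilities in the usage note are made-up inputs, not results from the paper.

```python
def expected_total_cost(tsp: float, ssp: float, expected_maintenance: float,
                        c_m: float = 500.0, c_s: float = 1700.0) -> float:
    """Expected total cost: c_m (1 - R) + c_s (1 - S) + E(W).

    tsp and ssp are the task success and system survival probabilities (R and S);
    expected_maintenance is E(W) for the same type of time redundancy.
    """
    if not (0.0 <= tsp <= 1.0 and 0.0 <= ssp <= 1.0):
        raise ValueError("tsp and ssp must be probabilities")
    return c_m * (1.0 - tsp) + c_s * (1.0 - ssp) + expected_maintenance
```

With the hypothetical inputs `expected_total_cost(0.9, 0.95, 10.0)`, the three terms are roughly 50, 85 and 10, which shows how a small loss in SSP can outweigh a larger loss in TSP when the system failure cost dominates.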

In what follows, we focus on deriving the expected maintenance cost during task execution under two types of time redundancy. Let

w_{I} (j)

and

w_{I I} (j)

be the maintenance cost at the jth attempt under ITR and IITR, respectively. By the law of total expectation, the expected maintenance cost under ITR is

E (W_{I}) = \sum_{k = 1}^{K} \sum_{j = 1}^{k - 1} R_{I} (k) w_{I} (j) .

(20)

In a similar manner, the expected maintenance cost under IITR is

E (W_{I I}) = \sum_{k = 1}^{K} \sum_{j = 1}^{k - 1} R_{I I} (k) w_{I I} (j) .
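The double sum in Equation (20), and its IITR counterpart, weights the maintenance costs of attempts 1 to k - 1 by the probability R(k) of succeeding at attempt k. A minimal sketch of that bookkeeping follows, assuming the per-attempt quantities have already been computed; the function name and the list-based inputs are illustrative choices, not part of the original model.

```python
from typing import Sequence


def expected_maintenance_cost(success_at: Sequence[float], cost_at: Sequence[float]) -> float:
    """Evaluate sum_{k=1}^{K} sum_{j=1}^{k-1} R(k) * w(j).

    success_at[k - 1] holds R(k) and cost_at[j - 1] holds w(j), for k, j = 1..K.
    """
    if len(success_at) != len(cost_at):
        raise ValueError("both sequences must have length K")
    total = 0.0
    for k, r_k in enumerate(success_at, start=1):
        # Inner sum over j = 1..k-1; it is empty for k = 1.
        total += r_k * sum(cost_at[:k - 1])
    return total
```

The inner sum is empty for k = 1, so a task that succeeds at the first attempt contributes no maintenance cost under this formula.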

Note that the maintenance cost depends on both the degradation before maintenance and the maintenance degree. Given the degradation level y at the beginning of the jth attempt, the degradation after the rescue of the

(j - 1)

th attempt is

\frac{y}{λ_{j - 1}}

. The expected maintenance cost at the jth attempt under ITR is

w_{I} (j) = \int_{τ}^{\hat{τ}} \int_{0}^{d_{j}} I_{(H_{j - 1} = (d_{j - 1}, λ_{j - 1}))} c_{p} (λ_{j - 1}, \frac{y}{λ_{j - 1}}) q_{j} (t, y) d y d t .

(21)

By Equations (20) and (21), the expected maintenance cost under ITR is

E (W_{I}) = \sum_{k = 1}^{K} \sum_{j = 1}^{k - 1} \int_{τ}^{\hat{τ}} \int_{0}^{d_{j}} R_{I} (k) I_{(H_{j - 1} = (d_{j - 1}, λ_{j - 1}))} c_{p} (λ_{j - 1}, \frac{y}{λ_{j - 1}}) q_{j} (t, y) d y d t .

The expected maintenance cost under IITR is given in a similar manner as

E (W_{I I}) = \sum_{k = 1}^{K} \sum_{j = 1}^{k - 1} \int_{0}^{τ} \int_{τ}^{\hat{τ}} \int_{0}^{d_{j}} R_{I I} (k) I_{(H_{j - 1} = (d_{j - 1}, λ_{j - 1}))} c_{p} (λ_{j - 1}, \frac{y}{λ_{j - 1}}) q_{j} (t, y, m) d y d t d m .

Since the calculation of TSP and SSP involves reserve function, a numerical method is designed to obtain the value of TSP and SSP, and then, the optimal abort thresholds and maintenance degrees can be solved by efficient heuristic algorithms. The following section evaluates the performance of the proposed abort and maintenance policies numerically.
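The text does not name the heuristic used. As one possible illustration only, the sketch below applies a simple coordinate search over discretized abort thresholds and maintenance degrees. The function `coordinate_search`, the callable `cost_fn` (standing in for a numerical evaluation of the expected total cost) and the grids are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def coordinate_search(cost_fn: Callable[[List[float]], float],
                      start: Sequence[float],
                      grids: Sequence[Sequence[float]],
                      max_sweeps: int = 50) -> Tuple[List[float], float]:
    """Minimize cost_fn by changing one decision variable at a time.

    grids[i] lists the candidate values of variable i (for example, an abort
    threshold d_i or a maintenance degree lambda_i). The search stops when a
    full sweep over all variables brings no improvement.
    """
    current = list(start)
    best = cost_fn(current)
    for _ in range(max_sweeps):
        improved = False
        for i, grid in enumerate(grids):
            for value in grid:
                if value == current[i]:
                    continue
                candidate = current[:i] + [value] + current[i + 1:]
                cost = cost_fn(candidate)
                if cost < best:
                    best, current, improved = cost, candidate, True
        if not improved:
            break
    return current, best
```

A coordinate search of this kind only guarantees a local optimum on the chosen grid, so in practice it would usually be restarted from several initial points.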

7. Case Study

7.1. Background

This section applies the developed abort and maintenance strategies to the cooling system in chemical reactors studied by Cha et al. [24] to illustrate the proposed risk assessment method and the performance of the optimal policies. Cooling systems are widely used in chemical reactors to keep temperatures at required levels. Their failure leads to dramatic temperature fluctuations and ultimately to reactor damage, resulting in huge economic loss and serious environmental and personal harm. Thus, an ongoing cooling task can be terminated to avoid these serious failure consequences. Additionally, routine preventive maintenance is critical to improving the TSP and SSP of the cooling system. The development of optimal maintenance and task termination strategies for cooling systems is therefore of crucial importance in engineering practice. Crack degradation caused by corrosion is the most common internal failure mode of a typical cooling system. To characterize the monotone degradation behavior, the degradation process is modeled by a homogeneous Gamma process with D = 20 mm,

α = 1.2, β = 0.4

.

Assume that the allowable time for performing the cooling task is 30 h. The time for a single cooling task is 15 h. When the system degradation in an attempt exceeds a threshold, the cooling task is suspended, and a rescue is carried out, whose duration at time t is

ϕ (t) = 0.5 t

. Then, we can calculate the maximum abort time

ε

in each attempt as 10 h. When more than 10 h of an attempt have elapsed, completing the task takes less time than the rescue; that is, the task is no longer aborted after 10 h. After each rescue, imperfect maintenance is carried out with cost function

c_{p} (λ, y) = 20 y (1 - λ)

. In this section, the TSP and SSP of the cooling system are evaluated by numerical integration. Then, the optimal maintenance and abort thresholds under the dynamic policy are studied.
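The value of 10 h for the maximum abort time can be checked numerically. The sketch below reads the maximum abort time as the point where the rescue duration ϕ(t) equals the remaining task time τ - t, which is our interpretation of the statement that completing the task is faster than the rescue after 10 h. The function name `max_abort_time` is hypothetical.

```python
from typing import Callable


def max_abort_time(task_duration: float,
                   rescue_duration: Callable[[float], float],
                   tol: float = 1e-9) -> float:
    """Solve rescue_duration(t) = task_duration - t for t by bisection.

    Assumes rescue_duration is nondecreasing with rescue_duration(0) < task_duration,
    so the difference below changes sign exactly once on [0, task_duration].
    """
    lo, hi = 0.0, task_duration
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Negative gap: the rescue is still shorter than finishing the task.
        gap = rescue_duration(mid) - (task_duration - mid)
        if gap < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the case-study setting, `max_abort_time(15.0, lambda t: 0.5 * t)` solves 0.5 t = 15 - t, which has the closed-form solution t = 10.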

7.2. Evaluation of TSP and SSP

This section uses a forward numerical algorithm to evaluate TSP and SSP based on the backward equations. First, we define the discretized time interval and degradation level. The running time of Matlab on a Pentium 3.2 GHz PC is approximately 1500 s under the current parameter setting. Figure 2 shows the TSP and SSP under a single task attempt. The solid line represents the SSP, and the dotted line denotes the TSP. The TSP rises as the degradation-based abort threshold increases: under a larger abort threshold, the task can continue for longer, so the TSP is larger. A longer task duration, however, also raises the probability of system failure, so the SSP decreases with the abort threshold. Figure 2 shows that the TSP remains low when the abort threshold is less than 8, because the probability of completing the task is then very small, and increases significantly beyond that value. When the abort threshold is 0, the task is aborted immediately before execution, and the corresponding SSP and TSP are 1 and 0, respectively. When the abort threshold is 20, the task is never aborted, so the SSP equals the TSP.
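The qualitative behavior described for a single attempt can be explored with a crude Monte Carlo simulation. This is a swapped-in technique for illustration, not the numerical algorithm used in the paper. The sketch assumes that β is a rate parameter (if a scale parameter is intended, `1.0 / beta` should be replaced by `beta`), that each attempt starts from zero degradation, and that the abort condition is checked on a time grid of step `dt`. All function names are hypothetical.

```python
import random


def simulate_attempt(d, rng, D=20.0, alpha=1.2, beta=0.4, tau=15.0,
                     eps=10.0, rescue_factor=0.5, dt=0.05):
    """Simulate one task attempt; return (task_success, system_survived)."""
    y = 0.0                                   # assumption: attempt starts as good as new
    for step in range(1, round(tau / dt) + 1):
        t = step * dt
        # Gamma increment over dt: shape alpha * dt, scale 1 / beta (beta assumed a rate).
        y += rng.gammavariate(alpha * dt, 1.0 / beta)
        if y >= D:
            return False, False               # system fails during the task
        if y >= d and t < eps:
            # Abort: the system must also survive a rescue of length rescue_factor * t.
            y += rng.gammavariate(alpha * rescue_factor * t, 1.0 / beta)
            return False, y < D
    return True, True                         # task completed without failure


def estimate_tsp_ssp(d, n_runs=2000, seed=0):
    """Estimate (TSP, SSP) for abort threshold d from n_runs simulated attempts."""
    rng = random.Random(seed)
    successes = survivals = 0
    for _ in range(n_runs):
        ok, alive = simulate_attempt(d, rng)
        successes += ok
        survivals += alive
    return successes / n_runs, survivals / n_runs
```

Calling `estimate_tsp_ssp(d)` over a range of thresholds d between 0 and 20 gives rough curves that can be compared in shape with Figure 2. The estimates depend on the rate-versus-scale assumption above and carry sampling noise, so they are not expected to reproduce the reported values.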

7.3. Optimal Task Abort and Maintenance Policies

We further consider optimal task abort and maintenance policies under the two types of time redundancy. This section investigates how the optimal solution varies with the allowable time and the task duration. The costs of a task failure and a system failure are assumed to be 500 and 1700, respectively. Table 1 shows how the optimal abort and maintenance decisions under ITR vary with the time deadline and the task duration. Given a fixed task duration and number of attempts, the abort threshold is nondecreasing in the time deadline. One possible explanation is that when the time deadline is small, the abort should be conducted earlier during the first several attempts to save time for the rescue procedure and subsequent attempts. As the time deadline increases, it is optimal to delay the task abort due to the need for more time for the rescue procedure and the following task execution. Similarly, Table 1 shows that given a fixed number of attempts, the abort threshold decreases as the task duration increases: when the task duration is small, the abort should be conducted at a later stage to improve TSP, whereas as the task duration increases, it is optimal to advance the task abort to improve SSP. For a fixed time deadline and task duration, the abort threshold decreases over successive task attempts. One explanation is that in the last few attempts the task should be terminated earlier to save time for rescue and improve SSP.

We can observe that given fixed amounts of allowable time, task duration and allowed attempts, the optimal

λ

decreases with the increase of task attempts to reduce the total cost. To be specific, in the first few task attempts, it is optimal to perform imperfect maintenance to reduce maintenance cost, while with the increase of task attempts, it is optimal to perform perfect maintenance to improve TSP and SSP. Given a fixed task duration and number of attempts, the optimal

λ

increases when the allowable time increases due to the increased TSP.

Table 2 shows how the optimal abort and maintenance decisions under IITR vary with the time deadline and the task duration. Comparing Table 1 and Table 2, it can be found that the optimal

λ

is larger under IITR than under ITR, since the completed work can be accumulated across attempts under IITR. Thus, it is optimal to conduct imperfect maintenance to save maintenance cost. The abort threshold under ITR is larger than that under IITR: because the work completed in different attempts accumulates under IITR, the TSP is higher, and consequently the task can be aborted earlier.

8. Conclusions, Limitations, and Future Research

This paper advances the state of the art of task termination by studying the joint optimal task abort and maintenance policies for task-based systems with a stochastic degradation process. The task can be attempted multiple times before a deadline. TSP and SSP are evaluated considering ITR and IITR with different task success criteria. Based on the system degradation process and time redundancy properties, dynamic task abort and preventive maintenance policies are designed by considering the degradation level, the remaining time for task execution and the cumulative operating time. TSP and SSP are evaluated by an event transition-based numerical algorithm. Based on the proposed framework, cost models are constructed to characterize the total cost due to maintenance, task failure and system malfunction. The optimal thresholds and the performance of the policies under ITR and IITR are investigated. The results indicate that both TSP and SSP under IITR are better than those under ITR.

The limitations of the current study and a number of corresponding future research directions are summarized as follows. Firstly, we assume that both the maintenance and abort thresholds depend on the number of task attempts. Future studies can consider maintenance and abort thresholds that depend on the remaining task to be completed. Secondly, maintenance time is assumed to be negligible in this study. The case of time-consuming maintenance activities, which is more realistic in engineering practice, is worth investigating. Last but not least, the current research can be extended to the case where a certain amount of work is required to complete the task.

Author Contributions

Formal analysis, K.C.; Conceptualization, Methodology, Funding acquisition, Q.Q.; Funding acquisition, Software, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (72131002, 72001026 and 71971026), Science and Technology Innovation Project of Beijing Institute of Technology (2021CX01022).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are available within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Notation	Description
$Y (t)$	system degradation at time t
D	failure threshold
$ϕ (t)$	rescue duration at time t
$τ$	task duration
$d_{i}$	abort limit at attempt i
$T_{d_{i}}$	task abort time
ITR	type I time redundancy
IITR	type II time redundancy
TSP	task success probability
SSP	system survival probability
$R_{I}$	TSP under ITR
$R_{I I}$	TSP under IITR
$S_{I}$	SSP under ITR
$S_{I I}$	SSP under IITR
$c_{m}$	task failure cost
$c_{s}$	system failure cost

References

1. Gao, K.; Peng, R.; Qu, L.; Wu, S. Jointly optimizing lot sizing and maintenance policy for a production system with two failure modes. Reliab. Eng. Syst. Saf. 2020, 202, 106996.
2. Gao, K.; Yan, X.; Liu, X.d.; Peng, R. Object defence of a single object with preventive strike of random effect. Reliab. Eng. Syst. Saf. 2019, 186, 209–219.
3. Gao, K.; Peng, R.; Qu, L.; Xing, L.; Wang, S.; Wu, D. Linear system design with application in wireless sensor networks. J. Ind. Inf. Integr. 2021, 100279, in press.
4. Qiu, Q.; Cui, L. Reliability evaluation based on a dependent two-stage failure process with competing failures. Appl. Math. Model. 2018, 64, 699–712.
5. Peng, R.; Zhai, Q.; Xing, L.; Yang, J. Reliability analysis and optimal structure of series-parallel phased-mission systems subject to fault-level coverage. IIE Trans. 2016, 48, 736–746.
6. Zhao, J.; Si, S.; Cai, Z.; Guo, P.; Zhu, W. Mission success probability optimization for phased-mission systems with repairable component modules. Reliab. Eng. Syst. Saf. 2020, 195, 106750.
7. Levitin, G.; Xing, L.; Johnson, B.W.; Dai, Y. Mission reliability, cost and time for cold standby computing systems with periodic backup. IEEE Trans. Comput. 2015, 64, 1043–1057.
8. Wu, X.; Hillston, J. Mission reliability of semi-Markov systems under generalized operational time requirements. Reliab. Eng. Syst. Saf. 2015, 140, 122–129.
9. Zhu, P.; Han, J.; Liu, L.; Lombardi, F. Reliability evaluation of phased-mission systems using stochastic computation. IEEE Trans. Reliab. 2016, 65, 1612–1623.
10. Yi, H.; Cui, L.; Shen, J. Modeling and analysis for time redundant systems with a given mission window. Comput. Ind. Eng. 2019, 127, 480–492.
11. Peng, R.; Zhai, Q.; Xing, L.; Yang, J. Reliability of demand-based phased-mission systems subject to fault level coverage. Reliab. Eng. Syst. Saf. 2014, 121, 18–25.
12. Myers, A. Probability of Loss Assessment of Critical k-Out-of-n: G Systems Having a Mission Abort Policy. IEEE Trans. Reliab. 2009, 58, 694–701.
13. Peng, R.; Levitin, G.; Xie, M.; Ng, S.H. Defending simple series and parallel systems with imperfect false targets. Reliab. Eng. Syst. Saf. 2010, 95, 679–688.
14. Qiu, Q.; Cui, L.; Gao, H.; Yi, H. Optimal allocation of units in sequential probability series systems. Reliab. Eng. Syst. Saf. 2018, 169, 351–363.
15. Levitin, G.; Finkelstein, M. Optimal mission abort policy for systems operating in a random environment. Risk Anal. 2018, 38, 795–803.
16. Qiu, Q.; Kou, M.; Chen, K.; Deng, Q.; Kang, F.; Lin, C. Optimal stopping problems for mission oriented systems considering time redundancy. Reliab. Eng. Syst. Saf. 2021, 205, 107226.
17. Zhao, X.; Chai, X.; Sun, J.; Qiu, Q. Optimal bivariate mission abort policy for systems operate in random shock environment. Reliab. Eng. Syst. Saf. 2021, 205, 107244.
18. Levitin, G.; Finkelstein, M. Optimal mission abort policy with multiple shock number thresholds. Proc. Inst. Mech. Eng. Part J. Risk Reliab. 2018, 232, 607–615.
19. Zhao, X.; Chai, X.; Sun, J.; Qiu, Q. Joint optimization of mission abort and protective device selection policies for multistate systems. Risk Anal. 2022; Online ahead of print.
20. Levitin, G.; Finkelstein, M.; Dai, Y. Mission abort policy optimization for series systems with overlapping primary and rescue subsystems operating in a random environment. Reliab. Eng. Syst. Saf. 2020, 193, 106590.
21. Levitin, G.; Finkelstein, M.; Xiang, Y. Optimal abort rules and subtask distribution in missions performed by multiple independent heterogeneous units. Reliab. Eng. Syst. Saf. 2020, 199, 106920.
22. Qiu, Q.; Cui, L. Optimal mission abort policy for systems subject to random shocks based on virtual age process. Reliab. Eng. Syst. Saf. 2019, 189, 11–20.
23. Yang, L.; Sun, Q.; Ye, Z.S. Designing mission abort strategies based on early-warning information: Application to UAV. IEEE Trans. Ind. Inform. 2019, 16, 277–287.
24. Cha, J.H.; Finkelstein, M.; Levitin, G. Optimal Mission Abort Policy for Partially Repairable Heterogeneous Systems. Eur. J. Oper. Res. 2018, 271, 818–825.
25. Qiu, Q.; Cui, L.; Wu, B. Dynamic mission abort policy for systems operating in a controllable environment with self-healing mechanism. Reliab. Eng. Syst. Saf. 2020, 203, 107069.
26. Qiu, Q.; Cui, L. Gamma process based optimal mission abort policy. Reliab. Eng. Syst. Saf. 2019, 190, 106496.
27. Zhao, X.; Sun, J.; Qiu, Q.; Chen, K. Optimal inspection and mission abort policies for systems subject to degradation. Eur. J. Oper. Res. 2021, 292, 610–621.
28. Zhao, X.; Fan, Y.; Qiu, Q.; Chen, K. Multi-criteria mission abort policy for systems subject to two-stage degradation process. Eur. J. Oper. Res. 2021, 295, 233–245.
29. Levitin, G.; Finkelstein, M.; Dai, Y. State-based mission abort policies for multistate systems. Reliab. Eng. Syst. Saf. 2020, 204, 107122.
30. Yang, L.; Chen, Y.; Qiu, Q.; Wang, J. Risk Control of Mission-Critical Systems: Abort Decision-Makings Integrating Health and Age Conditions. IEEE Trans. Ind. Inform. 2022.
31. Levitin, G.; Xing, L.; Dai, Y. Mission abort policy in heterogeneous nonrepairable 1-out-of-N warm standby systems. IEEE Trans. Reliab. 2017, 67, 342–354.
32. Filene, R.; Daly, W. The reliability impact of mission abort strategies on redundant flight computer systems. IEEE Trans. Comput. 1974, 100, 739–743.
33. Peng, R. Joint routing and aborting optimization of cooperative unmanned aerial vehicles. Reliab. Eng. Syst. Saf. 2018, 177, 131–137.
34. Zhu, X.; Yan, R.; Peng, R.; Zhang, Z. Optimal routing, loading and aborting of UAVs executing both visiting tasks and transportation tasks. Reliab. Eng. Syst. Saf. 2020, 204, 107132.
35. Zhu, X.; Zhu, X.; Yan, R.; Peng, R. Optimal routing, aborting and hitting strategies of UAVs executing hitting the targets considering the defense range of targets. Reliab. Eng. Syst. Saf. 2021, 215, 107811.
36. Levitin, G.; Xing, L.; Luo, L. Influence of failure propagation on mission abort policy in heterogeneous warm standby systems. Reliab. Eng. Syst. Saf. 2019, 183, 29–38.
37. Yu, H.; Wu, X.; Wu, X. An extended object-oriented petri net model for mission reliability evaluation of phased-mission system with time redundancy. Reliab. Eng. Syst. Saf. 2020, 197, 106786.
38. Levitin, G.; Finkelstein, M.; Huang, H.Z. Optimal Abort Rules for Multiattempt Missions. Risk Anal. 2019, 39, 2732–2743.
39. Dui, H.; Xu, Z.; Chen, L.; Xing, L.; Liu, B. Data-driven maintenance priority and resilience evaluation of performance loss in a main coolant system. Mathematics 2022, 10, 563.
40. Gu, X.; Guo, W.; Jin, X. Performance evaluation for manufacturing systems under control-limit maintenance policy. J. Manuf. Syst. 2020, 55, 221–232.
41. Wang, W.; Zhao, F.; Peng, R. A preventive maintenance model with a two-level inspection policy based on a three-stage failure process. Reliab. Eng. Syst. Saf. 2014, 121, 207–220.
42. Wang, J.; Qiu, Q.; Wang, H.; Lin, C. Optimal condition-based preventive maintenance policy for balanced systems. Reliab. Eng. Syst. Saf. 2021, 211, 107606.
43. Xia, T.; Sun, B.; Chen, Z.; Pan, E.; Wang, H.; Xi, L. Opportunistic maintenance policy integrating leasing profit and capacity balancing for serial-parallel leased systems. Reliab. Eng. Syst. Saf. 2021, 205, 107233.
44. Sun, Q.; Ye, Z.S.; Zhu, X. Managing component degradation in series systems for balancing degradation through reallocation and maintenance. IISE Trans. 2019, 52, 797–810.
45. Yang, L.; Li, G.; Zhang, Z.; Ma, X. Operations & Maintenance Optimization of Wind Turbines Integrating Wind and Aging Information. IEEE Trans. Sustain. Energy 2020, 12, 211–221.
46. Wang, X.; Zhou, H.; Parlikad, A.K.; Xie, M. Imperfect Preventive Maintenance Policies With Unpunctual Execution. IEEE Trans. Reliab. 2020, 69, 1480–1492.
47. Yang, L.; Zhao, Y.; Peng, R.; Ma, X. Hybrid preventive maintenance of competing failures under random environment. Reliab. Eng. Syst. Saf. 2018, 174, 130–140.
48. Hu, J.; Shen, J.; Shen, L. Opportunistic Maintenance for Two-Component Series Systems Subject to Dependent Degradation and Shock. Reliab. Eng. Syst. Saf. 2020, 201, 106995.
49. Sun, Q.; Ye, Z.S.; Chen, N. Optimal inspection and replacement policies for multi-unit systems subject to degradation. IEEE Trans. Reliab. 2017, 67, 401–413.
50. Bertoin, J. Lévy Processes; Cambridge University Press: Cambridge, UK, 1998; Volume 121.

Figure 1. A sample path of multiple attempts under type I time redundancy.

Figure 2. TSP and SSP under single task attempt.

Table 1. Optimal solution under ITR given different allowable time and task duration.

(Allowable Time $\hat{τ}$ , Task Duration $τ$ )	Allowed Attempts K = 2	Allowed Attempts K = 3	Allowed Attempts K = 4
(30 h, 15 h)	$(13.2, 0; 13.2)$	$(15.1, 0.23; 13.7, 0; 12.8)$	$(15.7, 0.21; 14.0, 0.30; 13.7, 0; 12.6)$
(35 h, 15 h)	$(13.9, 0; 13.9)$	$(15.2, 0.27; 14.4, 0; 13.5)$	$(16.0, 0.26; 15.0, 0.31; 14.2, 0; 13.4)$
(40 h, 15 h)	$(14.2, 0; 14.2)$	$(15.5, 0.29; 14.7, 0; 14.1)$	$(16.1, 0.27; 15.3, 0.35; 14.6, 0; 14.0)$
(45 h, 15 h)	$(14.7, 0; 14.7)$	$(15.8, 0.32; 15.0, 0; 14.3)$	$(16.5, 0.30; 15.7, 0.38; 15.0, 0; 14.2)$
(50 h, 15 h)	$(15.3, 0; 15.3)$	$(16.9, 0.35; 15.6, 0; 14.5)$	$(17.7, 0.32; 16.8, 0.43; 15.4, 0; 14.3)$
(30 h, 10 h)	$(13.7, 0; 13.7)$	$(15.4, 0.28; 13.8, 0; 13.3)$	$(16.3, 0.26; 14.5, 0.15; 14.1, 0; 12.9)$
(30 h, 15 h)	$(13.2, 0; 13.2)$	$(15.1, 0.23; 13.7, 0; 12.8)$	$(15.7, 0.21; 14.0, 0.15; 13.7, 0; 12.6)$
(30 h, 20 h)	$(12.9, 0; 12.9)$	$(14.3, 0.21; 13.4, 0; 12.5)$	$(15.1, 0.15; 13.5, 0.12; 13.2, 0; 12.1)$
(30 h, 25 h)	$(12.2, 0; 12.2)$	$(14.0, 0.10; 13.1, 0; 11.9)$	$(14.6, 0.11; 13.1, 0.10; 12.8, 0; 11.1)$

Table 2. Optimal solution under IITR given different allowable time and task duration.

(Allowable Time $\hat{τ}$ , Task Duration $τ$ )	Allowed Attempts K = 2	Allowed Attempts K = 3	Allowed Attempts K = 4
(30 h, 15 h)	$(12.7, 0.15; 12.5)$	$(14.6, 0.28; 13.3, 0; 12.3)$	$(15.1, 0.45; 13.6, 0.38; 13.3, 0.31; 12.1)$
(35 h, 15 h)	$(13.2, 0.20; 12.9)$	$(15.0, 0.30; 13.7, 0.11; 13.2)$	$(15.7, 0.43; 14.5, 0.40; 13.5, 0.36; 12.6)$
(40 h, 15 h)	$(14.0, 0.25; 13.3)$	$(15.3, 0.39; 14.3, 0.24; 13.8)$	$(15.8, 0.47; 15.1, 0.42; 14.0, 0.37; 12.9)$
(45 h, 15 h)	$(14.3, 0.41; 13.8)$	$(15.1, 0.43; 14.8, 0.37; 14.0)$	$(16.4, 0.48; 15.6, 0.43; 14.5, 0.39; 13.2)$
(50 h, 15 h)	$(14.7, 0.44; 14.1)$	$(16.4, 0.51; 15.0, 0.44; 14.2)$	$(17.2, 0.52; 16.3, 0.51; 15.0, 0.46; 13.9)$
(30 h, 10 h)	$(13.5, 0.20; 13.1)$	$(15.2, 0.33; 13.5, 0; 13.0)$	$(16.1, 0.49; 14.2, 0.40; 13.6, 0.31; 12.5)$
(30 h, 15 h)	$(12.7, 0.15; 12.5)$	$(14.6, 0.28; 13.3, 0; 12.3)$	$(15.1, 0.45; 13.6, 0.34; 13.3, 0.30; 12.1)$
(30 h, 20 h)	$(12.3, 0; 12.1)$	$(14.1, 0.22; 13.4, 0; 12.1)$	$(14.7, 0.30; 13.2, 0.21; 13.0, 0; 11.4)$
(30 h, 25 h)	$(12.0, 0; 11.8)$	$(13.7, 0.18; 13.0, 0; 11.6)$	$(14.3, 0.27; 12.8, 0.17; 12.5, 0; 10.8)$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
