1. Introduction
Many multi-element systems employ redundancy to provide high reliability, and one of the commonly used redundant systems is the linear multi-state consecutively connected system (LMCCS) [1,2]. An LMCCS contains multiple consecutively connected elements, each with several performance levels to connect itself and the following elements. An element cannot provide connections to any other elements when it fails, and the system fails when the first and last elements are not connected. The LMCCS generalizes linear consecutive-k-out-of-n:F systems [1,3] and is widely applied in engineering projects, such as petroleum pipeline systems, signal relay stations, street illumination systems, and logistics systems. Elements in such systems degrade over time, leading to costly system failure. Therefore, this paper focuses on degradation management for LMCCSs to lower maintenance-related costs, including the system failure cost.
Take an oil pipeline system as an example of the LMCCS, where a sequence of pump stations transports crude oil from an oilfield to a refinery. The system is designed to be redundant, and the output power (performance level) of pump stations is controllable. Hence, if a station fails, operators can improve the performance of stations before the failed station to guarantee steady oil transportation. However, a higher output power leads to a higher degradation rate. At the same time, the operator can actively lower the performance of a highly degraded pump station to reduce its failure risk. To summarize, the system redundancy and the controllability of performance provide opportunities to decrease operation and maintenance costs through performance control. Therefore, we consider condition-based performance control (CBP), i.e., controlling the performance level of each element based on the system degradation state. In addition, timely maintenance brings an element to a better or brand-new state, preventing system failure or decreasing system downtime. Thus, we adopt condition-based maintenance (CBM), i.e., determining maintenance for the elements based on the system degradation state.
The CBP decision affects the degradation of the elements and further the CBM decision. Meanwhile, CBM brings failed elements back to the functional state, which influences the CBP decision in that only the performance level of a functional element is controllable. That is, the CBP and CBM both affect the degradation of the elements and further affect each other. Therefore, the CBP and CBM should be simultaneously optimized. We also consider the impact of maintenance capacity on the CBP and CBM. Although higher capacity guarantees lower system failure risk, it needs more resources, e.g., a large maintenance team and a high spare parts inventory. Therefore, considering the restriction of the maintenance capacity, we focus on the joint optimization of condition-based maintenance and performance control (CBMP) for LMCCSs. Specifically, we assume that all the elements in an LMCCS are functionally identical and degrade continuously over time according to a gamma process, and the degradation rate of an element increases when its performance level increases. The performance level of an element indicates how many subsequent nodes it can connect and is controllable if the element is not failed. The LMCCS is inspected periodically, on which the optimal performance control and maintenance decisions are made to minimize the related total cost. Finally, the CBMP problem is modeled by the Markov decision process (MDP) and solved by a dynamic programming algorithm, through which the optimal CBMP policy, i.e., the optimal performance control and maintenance actions for each system degradation state, is provided.
The performance control in CBMP optimization also covers the load-sharing in redundant systems. When an element fails, the system with redundancy keeps operating, but the remaining elements should raise their performance to keep the same system performance. The elements taking an additional load, i.e., in high performance, degrade faster. In LMCCSs, including the linear consecutive-k-out-of-n:F system, the load increment caused by element failure can only be shared with elements before the failed one. In addition, in LMCCSs, such as oil pipeline systems and signal relay networks, operators can choose which element(s) to take an additional load by raising performance. Therefore, the degradation processes of elements are dependent, and operators can manipulate such degradation dependence. The CBMP optimization can reach optimal load-sharing, i.e., effectively assign the load (control the performance) among functional elements.
Based on the numerical studies in this paper, we find that the restricted maintenance capacity, the load-sharing, and the structural characteristics of LMCCSs can lead to a reverse balance of degradation levels between elements. The reverse balance in degradation management means increasing the difference in degradation levels between elements. The existing literature has studied the balance of degradation, which decreases the difference in degradation levels between elements. In contrast, our numerical results show that the reverse balance can also be cost-effective in specific situations, where more-degraded elements take over the load from less-degraded ones, or highly degraded elements rather than failed ones are prioritized for replacement. Meanwhile, the CBMP policy proposed in this paper is proven to outperform a benchmark policy, where predetermined performance control and CBM are considered. In addition, sensitivity analyses are performed on the maintenance capacity, setup cost, and the difference between preventive and corrective replacement costs to give more management insights into the LMCCSs.
As described above, our contributions are summarized in the following four aspects: (1) CBMP, an efficient management technique, is proposed for LMCCSs. As far as we know, few studies researched CBM or CBMP for LMCCSs. (2) The load-sharing in LMCCSs is investigated. Our research methodology and findings also apply to linear consecutive-k-out-of-n:F systems. (3) We provided innovative insights into CBMP. To our knowledge, little existing literature considers the maintenance capacity’s substantial influence on CBMP. We studied the CBMP policy under the combined effect of the restricted maintenance capacity, the load-sharing, the maintenance setup cost, and the structural characteristics of LMCCSs. (4) We examined the value and rationale of the reverse balance of degradation levels between elements while balancing degradation is usual in degradation management. The optimal CBMP policy takes advantage of both the balance and the reverse balance.
The remainder of the paper is structured as follows. Section 2 discusses the related literature. Section 3 elaborates on the LMCCS and the CBMP policy, where we also demonstrate assumptions concerning the maintenance capacity, maintenance costs, and the system failure cost. Section 4 presents the formulation of the MDP model and the dynamic programming algorithm. Section 5 provides numerical experiments of a five-element LMCCS. The main experiment, comparison experiment, and sensitivity analysis are performed to acquire insights into the optimal CBMP policy, the balance, and the reverse balance. Finally, Section 6 concludes the paper and offers future research directions.
3. Problem Description
Our base system is an LMCCS consisting of $N+1$ sequentially ordered nodes $C_n$, $n = 1, \dots, N+1$, as shown in Figure 1. Each node, except the last one, has an element to connect the current node and subsequent nodes. All the $N$ elements are functionally identical and degrade continuously over time. The degradation rate of an element increases when its performance level increases. The performance level of an element indicates how many subsequent nodes it can connect and is controllable if the element has not failed. Let $g_i$, $g_i \in \{0, 1, \dots, G\}$, $i = 1, \dots, N$, denote the performance level of the $i$th element, $e_i$, where level $0$ means the element provides no connection and $G$ is the maximum level. As shown in Figure 1, the $i$th element at node $C_i$ at performance level $g_i$ can provide connections from $C_i$ to each of the nodes $C_{i+1}$ to $C_{\min(i+g_i,\,N+1)}$. If the first node $C_1$ and the last one $C_{N+1}$ are not connected, the LMCCS fails.
In the remainder of the section, we first introduce the discretization of the continuous degradation level and then the degradation process of the elements in Section 3.1. We elaborate on the CBMP policy and related costs for the LMCCS in Section 3.2.
3.1. Degradation of the Elements
The degradation level of an element ranges from to infinity, where level indicates that the element is brand-new, and an element fails when its degradation level reaches or exceeds a predetermined failure threshold . To make the CBMP policy optimization tractable using the MDP, we discretize the continuous degradation level into discrete degradation states as . Let . The degradation level in corresponds to the discrete degradation state , , the level in corresponds to the brand-new state , and the level in corresponds to the failure state .
Let denote the degradation state of the th element, , . Then, the system degradation state is . Starting from , let denote the system state after a time unit, for . The probability of degrading from state to of the th element under performance in a time unit is denoted as . The system performance level is denoted as .
The degradation of elements is subject to load-sharing, and there are three considerations for the condition-based load-sharing decision. First, the load of an element in an LMCCS can only be shared by several elements before it, and only a functional element can take the load. Second, the load shift is not merely triggered by element failure, e.g., operators can switch off a functional element based on the system degradation state. Last, when the maximal performance level is greater than 2, multiple elements can share the load of a not-functioning element, so it has to be decided which element will bear the load. To summarize, the condition-based load-sharing should consider which element(s) do not take the load, which element(s) bear the load, and the structural characteristics of LMCCSs. To model the load-sharing, we convert it into a performance control problem. We assume that, at a given performance level, the degradation process of an element is independent of the degradation processes of other elements. Therefore, the probability of degrading from system state to under system performance in a time unit is .
To model the degradation of each element, we employ the gamma process, which is appropriate for depicting continuous degradation in cumulative damage such as wear, erosion, and fatigue [39,40]. We model the degradation level of the $i$th element as a gamma process $X_i(t)$ with the shape parameter $\alpha$ and the scale parameter $\beta(g_i)$, where $g_i$ is the performance level of the element. The increment of the degradation level in a time unit, denoted by $\Delta X_i$, has the probability density function

$$f\big(x;\alpha,\beta(g_i)\big) = \frac{x^{\alpha-1}\, e^{-x/\beta(g_i)}}{\Gamma(\alpha)\,\beta(g_i)^{\alpha}}, \quad x \ge 0, \tag{1}$$

where the gamma function $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt$. Then, the expected degradation increment per time unit, i.e., the degradation rate, of the $i$th element is $\alpha\beta(g_i)$. Since the degradation rate increases with the performance level, $\beta(g_i)$ is an increasing function of $g_i$. The gamma process is a monotonic degradation process, where an element can only remain in its current state or degrade to a worse state.
The probability that an element degrades from state to under performance in a time unit for is
3.2. CBMP Policy and Related Costs
Inspection, condition-based element replacement, and condition-based performance control are successively executed at the start of every time unit. The time required for these activities is assumed to be negligible. The inspection obtains the system degradation state and incurs an inspection cost. Specialized equipment and maintenance staff need to be prepared if maintenance is scheduled, which incurs a setup cost. We assume that the replacement restores an element to its brand-new state. The residual value of a failed element is smaller than that of a degraded but not failed one. Therefore, the cost of replacing a degraded but not failed element (the preventive replacement cost) is less than that of replacing a failed element (the corrective replacement cost). The difference between the corrective and preventive replacement costs can be seen as an element failure cost, not incurred upon element failure but reflected when performing the replacement. Maintenance capacity refers to the maximum number of elements that can be replaced simultaneously at the beginning of each unit of time.
Performance control is performed after the replacement. The performance level of an element can be set as an integer from 0 to the maximal performance level if the element is functional, i.e., in any degradation state other than the failure state. For a failed element, the performance level is not controllable and can only be set as 0. The system failure cost is incurred if the system performance level cannot enable the first node and the last node to be connected. The degradation rates of the elements in the unit of time are determined by their performance levels.
This paper aims to find an optimal CBMP policy for LMCCSs, i.e., to determine which elements are to be replaced and how to control performance levels based on the system degradation state. The CBMP decision is made at the beginning of every unit of time to minimize the long-term total cost, including the inspection, setup, replacement, and system failure costs.
4. Markov Decision Process Formulation
The MDP model includes a set of possible states, and every state has its own set of feasible actions. The model also contains the transition probabilities that the system degrades from one state to another if a given feasible action is chosen. When an LMCCS is in a given state and under a feasible action, a corresponding cost is incurred. The value function reflects the effect of actions on long-term maintenance-related costs. In addition, we present the dynamic programming algorithm, specifically the policy iteration algorithm, to solve the model, along with its pseudocode.
4.1. State Space, Action Space, and Transition Probabilities
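State space. The state of the LMCCS is measured by the degradation state $\mathbf{s} = (s_1, \dots, s_N)$ of its $N$ elements. The state space is $\mathcal{S} = \{(s_1, \dots, s_N)\}$, where $s_i \in \{0, 1, \dots, F\}$ for $i = 1, \dots, N$, with $0$ the brand-new state and $F$ the failure state.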
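Action space. At the beginning of each time unit, two sequential actions are executed. First is the replacement decision, denoted as $\mathbf{r} = (r_1, \dots, r_N)$, where $r_i = 1$ if the $i$th element is replaced and $r_i = 0$ otherwise. Then, the performance control sets the system performance to $\mathbf{g} = (g_1, \dots, g_N)$. As a result, the action space of a state $\mathbf{s} = (s_1, \dots, s_N)$ is determined as follows: $A(\mathbf{s}) = \{(\mathbf{r}, \mathbf{g})\}$, subject to Equations (3)–(5). The maintenance capacity $M$ limits the number of replacements. For every state $\mathbf{s}$:

$$\sum_{i=1}^{N} r_i \le M, \quad r_i \in \{0, 1\}. \tag{3}$$

An LMCCS's degradation state after replacement decision $\mathbf{r}$ is denoted as $\mathbf{s}^{r} = (s_1^{r}, \dots, s_N^{r})$. The replacement is assumed to restore an element to its brand-new state. Thus,

$$s_i^{r} = \begin{cases} 0, & r_i = 1, \\ s_i, & r_i = 0. \end{cases} \tag{4}$$

If, after replacement, the $i$th element is still in the failure state $F$, its performance level is not controllable and is fixed at $0$. Otherwise, its performance level can be set to any integer from $0$ to the maximal level $G$. Hence, for $i = 1, \dots, N$:

$$g_i \in \begin{cases} \{0\}, & s_i^{r} = F, \\ \{0, 1, \dots, G\}, & s_i^{r} < F. \end{cases} \tag{5}$$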
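Transition probabilities. The degradation processes of the elements are independent when the system performance level is fixed. In detail, the degradation of each element follows the gamma process. The transition probability of moving from $\mathbf{s}$ to $\mathbf{s}'$ under replacement $\mathbf{r}$ and performance control $\mathbf{g}$ is $P(\mathbf{s}' \mid \mathbf{s}, \mathbf{r}, \mathbf{g}) = \prod_{i=1}^{N} p(s_i' \mid s_i^{r}, g_i)$, where $s_i^{r}$ is the state of the $i$th element after replacement and $g_i$ is its performance level. The method of computing the probability $p(s_i' \mid s_i^{r}, g_i)$ has been given in Equations (1) and (2) in Section 3.1.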
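The action-space rules of this subsection (a capacity limit on replacements, replaced elements returning to the brand-new state, and failed elements being fixed at performance level 0) can be sketched as a small enumeration routine. This is an illustrative sketch, not the authors' implementation: the function name and argument names are hypothetical, and it assumes that any element, failed or not, may be selected for replacement.

```python
from itertools import combinations, product

def feasible_actions(state, capacity, max_level, fail_state):
    """Yield (replacement, performance) pairs allowed in a given system state.

    Assumption: any element, failed or not, may be selected for replacement.
    """
    n = len(state)
    for k in range(min(capacity, n) + 1):          # at most `capacity` replacements
        for chosen in combinations(range(n), k):
            replace = tuple(1 if i in chosen else 0 for i in range(n))
            # A replaced element returns to the brand-new state 0.
            post = tuple(0 if r else s for r, s in zip(replace, state))
            # A failed element is fixed at level 0; others may take any level 0..max_level.
            options = [(0,) if s == fail_state else tuple(range(max_level + 1)) for s in post]
            for levels in product(*options):
                yield replace, levels

# Two-element toy system: element 1 failed (state 3), element 2 brand-new (state 0).
actions = list(feasible_actions(state=(3, 0), capacity=2, max_level=2, fail_state=3))
print(len(actions))
```

Because the set of admissible performance vectors depends on the post-replacement state, the number of feasible actions differs from state to state, which is why the action space has to be built per state before solving the model.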
4.2. Value Function Formulation
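We consider inspection, setup, element replacement, and system failure costs for each time unit. The replacement of a failed element incurs the corrective maintenance cost $c_C$, while that of a functional but degraded element incurs the preventive maintenance cost $c_P$. With $s_i$ the degradation state of the $i$th element, $F$ the failure state, and $r_i \in \{0, 1\}$ the replacement decision, the replacement cost of the $i$th element is denoted as $c_i(s_i, r_i)$ and

$$c_i(s_i, r_i) = \begin{cases} c_C, & r_i = 1,\ s_i = F, \\ c_P, & r_i = 1,\ s_i < F, \\ 0, & r_i = 0. \end{cases}$$

Therefore, the replacement cost of the LMCCS is $c_R(\mathbf{s}, \mathbf{r}) = \sum_{i=1}^{N} c_i(s_i, r_i)$.
The indicator function $\mathbb{1}\{\cdot\}$ equals 1 if the condition is true and 0 otherwise. A setup cost $c_S$ is required if the replacement is initiated, i.e., $\sum_{i=1}^{N} r_i > 0$. Hence, the system maintenance setup cost is $c_{SU}(\mathbf{r}) = c_S\, \mathbb{1}\{\sum_{i=1}^{N} r_i > 0\}$.
The method to determine whether a system failure cost is incurred is as follows. With $g_i$ the performance level of the $i$th element, node $j$, $j = 2, \dots, N+1$, is connected by at least one element before $j$ if $\max_{1 \le i < j} (i + g_i) \ge j$. If every node $j$, $j = 2, \dots, N+1$, is connected by at least one element before $j$, i.e., $\prod_{j=2}^{N+1} \mathbb{1}\{\max_{1 \le i < j} (i + g_i) \ge j\} = 1$, the LMCCS works correctly. Thus, the system failure cost of an LMCCS in system performance $\mathbf{g}$ is $c_{SF}(\mathbf{g}) = c_F \big(1 - \prod_{j=2}^{N+1} \mathbb{1}\{\max_{1 \le i < j} (i + g_i) \ge j\}\big)$, where $c_F$ is the cost of a system failure.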
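The connectivity rule described above (every node after the first must be covered by at least one preceding element) can be checked with a single left-to-right pass that tracks the farthest node reached so far. The sketch below is illustrative; the function name is hypothetical, and elements and nodes are numbered from 1 as in the text. The first example vector is the performance setting reported for instance I7 in Section 5.1.2.

```python
def system_connected(levels):
    """Return True if node 1 and node N+1 are connected under the given performance levels.

    levels[i-1] is the performance level of element i, located at node i;
    at level g it connects node i to nodes i+1, ..., i+g.
    """
    n = len(levels)
    reach = 1                                   # farthest node known to be connected to node 1
    for i, g in enumerate(levels, start=1):
        if i > reach:                           # node i is not covered by any earlier element
            return False
        reach = max(reach, min(i + g, n + 1))
    return reach >= n + 1

print(system_connected([1, 1, 1, 2, 0]))   # element 4 bridges over the idle element 5
print(system_connected([1, 1, 1, 1, 0]))   # nothing reaches the last node
```

A system failure cost would be charged exactly when this check returns False.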
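The sum of costs $c(\mathbf{s}, \mathbf{r}, \mathbf{g})$ in an LMCCS for one unit of time, given the system state $\mathbf{s}$, the replacement decision $\mathbf{r}$, and the system performance $\mathbf{g}$, comprises the inspection cost $c_I$, the replacement cost $c_R(\mathbf{s}, \mathbf{r})$, the setup cost $c_{SU}(\mathbf{r})$, and the system failure cost $c_{SF}(\mathbf{g})$. Hence,

$$c(\mathbf{s}, \mathbf{r}, \mathbf{g}) = c_I + c_R(\mathbf{s}, \mathbf{r}) + c_{SU}(\mathbf{r}) + c_{SF}(\mathbf{g}). \tag{6}$$

We use $\Omega(\mathbf{s}, \mathbf{r})$ to represent the set of all degradation states that can be reached from state $\mathbf{s}$ when the replacement decision is $\mathbf{r}$. An element whose degradation follows the gamma process can only remain in its current state or degrade to a worse state. Hence, $\Omega(\mathbf{s}, \mathbf{r}) = \{\mathbf{s}' : s_i' \ge s_i^{r},\ i = 1, \dots, N\}$, where $s_i^{r}$ is the state of the $i$th element after replacement.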
Let $\lambda$, $0 < \lambda < 1$, be the discount rate, which decides the present value of future costs. A cost occurring $t$, $t = 1, 2, \dots$, units of time in the future equals only $\lambda^{t}$ times what it would be if it happened instantly.
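A value function specifies what is costly in the long run, whereas the abovementioned cost $c(\mathbf{s}, \mathbf{r}, \mathbf{g})$ of state $\mathbf{s}$ under replacement decision $\mathbf{r}$ and performance control $\mathbf{g}$ indicates what is costly in a time unit. Therefore, the value function $V(\mathbf{s})$ first considers the immediate cost $c(\mathbf{s}, \mathbf{r}, \mathbf{g})$, then the set of possible following states $\Omega(\mathbf{s}, \mathbf{r})$ and finally, the expected potential costs in those states $V(\mathbf{s}')$. These potential costs in the future should be multiplied by the discount rate $\lambda$. Hence, the minimized value function $V^{*}(\mathbf{s})$ is as follows:

$$V^{*}(\mathbf{s}) = \min_{(\mathbf{r}, \mathbf{g}) \in A(\mathbf{s})} \Big\{ c(\mathbf{s}, \mathbf{r}, \mathbf{g}) + \lambda \sum_{\mathbf{s}' \in \Omega(\mathbf{s}, \mathbf{r})} P(\mathbf{s}' \mid \mathbf{s}, \mathbf{r}, \mathbf{g})\, V^{*}(\mathbf{s}') \Big\}, \tag{7}$$

where $P(\mathbf{s}' \mid \mathbf{s}, \mathbf{r}, \mathbf{g})$ is the transition probability and the constraints on the action space $A(\mathbf{s})$ have been given by Equations (3)–(5) in Section 4.1.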
4.3. Policy Iteration Algorithm
We adopt the policy iteration to determine the optimal action for every system state. Algorithm 1 demonstrates the pseudocode. Due to the constraints in Equations (3)–(5), the action space of each state can be different and thus needs to be determined in Step 0, preparation. We start with an arbitrary policy, as shown in Step 1, initialization. The value function of that policy is found in Step 2, policy evaluation. Specifically, for each system state, the effect of the current policy on its value function is evaluated iteratively by computing the objective function in the minimization problem in Equation (7). Then, based on the updated value function, a new and improved policy is reached in Step 3, policy improvement. In detail, every feasible action of each state is evaluated to find the action that is optimal for the current value function. After that, the policy evaluation and the policy improvement steps are repeated until the optimal actions of every state remain unchanged, which means the optimal solution of the model has been achieved.
Algorithm 1: Policy iteration algorithm.
Input: Number of elements; failure threshold; failure state; maximal performance level; maintenance capacity; inspection, setup, preventive replacement, corrective replacement, and system failure costs; shape parameter of the gamma process; scale parameter for each performance level; discount rate; termination tolerance.
Step 0 (Preparation): Determine the action space for all states according to Equations (3)–(5). Compute the transition probabilities according to Equations (1) and (2). Compute the sum of costs for all states and actions according to Equation (6).
Step 1 (Initialization): Initialize the maintenance policy and the performance control policy for every system state; each is a vector whose length equals the number of elements. Initialize the value function of every state.
Step 2 (Policy Evaluation): Repeat: keep a copy of the value function; update the value of every state under the current policy using the objective function in Equation (7); record the largest difference in the value function between two successive iterations. Until this difference is smaller than the termination tolerance.
Step 3 (Policy Improvement): Keep a copy of the current policy. For every state, set the policy to the feasible action at which the objective function in Equation (7) takes its minimal value. If the policy of every state remains unchanged, stop; else go back to Step 2 (Policy Evaluation).
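The evaluation and improvement steps of Algorithm 1 can be sketched for a generic finite MDP with discounted costs. The sketch below is a minimal illustration, not the authors' code: the callables `actions`, `transition`, and `cost`, the improvement margin of 1e-6, and the two-state toy problem with its numbers are all hypothetical stand-ins for the LMCCS model.

```python
def policy_iteration(states, actions, transition, cost, discount, tol=1e-8):
    """Policy iteration for a finite discounted-cost MDP.

    actions(s) -> list of feasible actions; transition(s, a) -> {next_state: prob};
    cost(s, a) -> one-period cost. All three callables are placeholders.
    """
    def q(s, a, value):
        return cost(s, a) + discount * sum(p * value[t] for t, p in transition(s, a).items())

    policy = {s: actions(s)[0] for s in states}      # Step 1: arbitrary initial policy
    value = {s: 0.0 for s in states}
    while True:
        while True:                                   # Step 2: policy evaluation
            old = dict(value)                         # copy of the value function
            value = {s: q(s, policy[s], old) for s in states}
            if max(abs(value[s] - old[s]) for s in states) < tol:
                break
        stable = True                                 # Step 3: policy improvement
        for s in states:
            best = min(actions(s), key=lambda a: q(s, a, value))
            # Switch only on a clear improvement, so ties cannot cause cycling.
            if q(s, best, value) < q(s, policy[s], value) - 1e-6:
                policy[s] = best
                stable = False
        if stable:
            return policy, value

# Hypothetical two-state toy problem: a machine is "good" or "bad".
states = ["good", "bad"]

def actions(s):
    return ["run", "replace"]

def transition(s, a):
    if a == "replace" or s == "good":                 # a new or good machine may go bad
        return {"good": 0.8, "bad": 0.2}
    return {"bad": 1.0}                               # a bad machine stays bad if run

def cost(s, a):
    replace_cost = 20.0 if a == "replace" else 0.0
    failure_cost = 50.0 if (s == "bad" and a == "run") else 0.0
    return replace_cost + failure_cost

policy, value = policy_iteration(states, actions, transition, cost, discount=0.9)
print(policy, value)
```

For the LMCCS, `states` would be the system degradation states, `actions(s)` the feasible replacement and performance-control pairs of Equations (3)–(5), and `cost` the one-period cost of Equation (6).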
5. Numerical Experiments
In this section, we employ numerical experiments to explore the CBMP policy. Section 5.1 elaborates on the main experiment, which focuses on the reverse balance and the balance in degradation management. Section 5.2 introduces a benchmark policy that considers CBM but predetermines a load-sharing rule. The CBMP policy compares favorably with the benchmark because the CBMP policy offers different load-sharing arrangements according to different system degradation states. Section 5.3 performs the sensitivity analysis to observe how the change in several parameter values influences the optimal CBMP policy. The whole solution process is implemented in Python 3.10 (64-bit) on an Intel® Core™ i7-1165G7 @ 2.80 GHz CPU with 16 GB of 3733 MHz memory.
5.1. Main Experiment
In the main experiment, we discuss one of our contributions, i.e., explorations of the reverse balance of degradation. Section 5.1.1 introduces the parameters of the model and assumptions concerning the performance–degradation relation, for which the degradation rate of an element is a function of its performance level. Section 5.1.2 demonstrates examples of three categories of the reverse balance and one category of the balance of degradation.
5.1.1. Parameters of the Main Experiment
We consider an LMCCS comprising five elements. Each element has four degradation states: state 0 means not degraded, state 1 means moderately degraded, state 2 means highly degraded, and state 3 means failed. Hence, there are a total of $4^5 = 1024$ possible system states. The maximal performance level of each element is 2. Level 0 means idle, level 1 indicates that the element can connect to the next node, and level 2 indicates connecting to the following two nodes. The maintenance capacity is 2, which means at most two elements can be replaced in a decision period. The inspection, maintenance setup, preventive maintenance, corrective maintenance, and system failure costs are 5, 100, 20, 150, and 5000, respectively. The discount rate is 0.97, and the policy evaluation step of the policy iteration algorithm stops at a preset termination tolerance.
As mentioned in
Section 3.1, the degradation of each element follows the gamma process, so the degradation rate of the
th element is the product of the shape and scale parameters
. The relation between the performance level
and the degradation rate
can be modeled as, for example, power, exponential, logarithmic functions, or certain combinations among these functions. We model the performance–degradation relation as follows:
where the shape parameter
, the degradation rate of an idle element
, and that of an element at the maximal performance level
. We assume that an element still gradually degrades while not working. In addition, we set the exponent of
as
so that the degradation rate rises faster and faster when the performance level increases. An exponent greater than one leads to a result that
, which encourages sharing the total system load equally among functional elements. Therefore, the degradation rate
for an element at performance level 1. The parameter values of the main experiment are summarized in
Table 1.
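The effect of an exponent greater than one can be illustrated with a small numerical sketch. The power-law form below (idle rate plus a scaled power of the normalized performance level) and every numeric value in it are assumptions made only for illustration; they are not claimed to be the paper's Equation (8) or its parameter values.

```python
def degradation_rate(level, max_level, idle_rate, max_rate, exponent):
    """Hypothetical power-law link between performance level and degradation rate."""
    return idle_rate + (max_rate - idle_rate) * (level / max_level) ** exponent

# Hypothetical values: idle rate 0.1, rate at the maximal level 1.0, exponent 2.
rates = [degradation_rate(g, 2, idle_rate=0.1, max_rate=1.0, exponent=2.0) for g in range(3)]

shared = 2 * rates[1]                # two elements, each at level 1
concentrated = rates[0] + rates[2]   # one idle element and one element at level 2
print(rates, shared, concentrated)
```

Under these assumed numbers, the total degradation rate is lower when two elements each run at level 1 than when one idles and the other runs at level 2, which is the sense in which a convex relation encourages equal load-sharing among functional elements.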
5.1.2. Reverse Balance and Balance in Degradation Management
The optimal replacement and performance control of the 1024 system states can be reached through the policy iteration algorithm provided in
Section 4.3. Some of the results are presented in
Table 2, where the system state before replacement (original state), the optimal replacement action, the system state after replacement (post-replace state), the optimal performance control, and the value function are shown. Based on the results, we discuss one category of the reverse balance resulting from replacement, two categories of the reverse balance caused by performance control, and one category of the balance by performance control.
Instances I1–I6 in group G1 (see
Table 2) illustrate the reverse balance resulting from replacement. The failed elements, i.e., the elements with degradation state 3, are not prioritized for replacement, while highly degraded elements, i.e., the elements with degradation state 2, are replaced first. It differs from the control limit policy under which maintenance occurs once the degradation level exceeds a threshold. The control limit policy is intuitive since the degradation level surpassing the control limit implies an underlying risk of system failure. The reason to avoid system failure is that system failure or downtime costs are high for some systems in real life. Similarly, regarding specific multi-element systems, an element failure cost exists, reflected by a difference between preventive and corrective replacement costs in our model. The element failure cost is realistic because irreparable damage is possible upon element failure, reducing the residual value of a replaced element. Therefore, when the maintenance capacity is restricted, a trade-off has to be reached between replacing failed and highly degraded elements. Giving priority to failed elements better strengthens the system reliability while replacing highly degraded ones first can avoid element failure costs. Hence, a reverse balance may occur in a redundant system with a relatively low system failure cost, a high element failure cost, and a limited maintenance capacity. Such a reverse balance can be observed in redundant systems other than LMCCSs, e.g., parallel systems and variants of
k-out-of-n systems [8,9,10,41].
Instances I7–I14 in group G2 (see
Table 2) display the reverse balance arising from performance control, where a more-degraded element takes more load than a less-degraded one, i.e., the performance level of a more-degraded element is set higher than that of less-degraded ones. For example, in instance I7 group G2, when the post-replace state is (0,0,0,1,2), the performance level is (1,1,1,2,0), where the performance level of the element with degradation state 0 is 1 while the performance level of the element with degradation state 1 is 2. The following three factors contribute to one type of reverse balance caused by performance control. First, the LMCCS’s redundancy allows a functional element to be switched off, which brings down the degradation rate of the element. Second, switching off highly degraded elements can be cost-effective when the element failure cost is high. Third, the unique structural characteristics of the LMCCS constrain the load-sharing arrangement. The load of an element in the LMCCS can only be taken by several elements before it. Therefore, if not properly located, the least-degraded element cannot take the load from a highly degraded element, which results in a reverse balance. The abovementioned reverse balance can be observed in instances I7–I10, group G2,
Table 2. Another type of reverse balance can arise if the structural characteristics of the LMCCS prevent the least-degraded elements from taking on more load when there is a failed element. Instances I11–I14 in group G2 in
Table 2 demonstrate this type of reverse balance.
A balance of degradation may simultaneously occur with the reverse balance mentioned above. For example, the elements with degradation state 2 are actively switched off, and their loads are taken by elements with degradation state 0 or 1 in instances I7–I9, group G2,
Table 2. Using a less-degraded element to protect a more-degraded element from failure through load-sharing (performance control) is an active balance of degradation. In addition, the active switch-off of a functional element based on system degradation states can be regarded as a condition-based mission abort [
42], where an element’s work is stopped to ensure its survivability. Our merit is considering load-sharing and redundancy so that both the system’s correct functioning and the element survivability are achieved.
The maintenance setup cost and the maintenance capacity influence the occurrence of the balances and reverse balances. A high setup cost can delay the replacement of degraded elements, and a low maintenance capacity restricts the number of replacements, both of which can lead to a poor system state. It is realistic that various restrictions on maintenance capacity exist in real life, and the setup cost is expensive for specific systems. However, the active balance and reverse balances discussed in this section stem from coping with systems with highly degraded or failed elements, and the cost caused by unfavorable factors, e.g., low maintenance capacity and high setup costs, can be reduced by the condition-based maintenance and performance control.
5.2. Comparison Experiment
This section first defines a benchmark policy and then presents the results of the comparison experiment, where how and why the CBMP outperforms the benchmark is also analyzed.
The benchmark policy consists of predetermined performance control and CBM. The predetermined performance control is introduced as follows. The system load is equally distributed among all system elements if they are all functional. When an element fails, its load is taken by its closest functional element before it. If no functional element meets such requirements, the LMCCS fails, and the remaining functional elements are switched off. Whether the system fails or not, we assume an element degrades according to its performance level. The CBM decision is made considering the predetermined performance control. From the perspective of MDP modeling, the action space of the benchmark is a proper subset of the CBMP problem’s action space. It means that the CBMP problem has more alternatives in making performance control and replacement decisions, which guarantees that the optimal CBMP policy is better than or at least equal to the optimal benchmark policy.
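One possible reading of the predetermined performance control can be sketched as follows. The sketch assumes that an equal distribution of load means every functional element runs at level 1, and that the closest functional element before a failed one must raise its level far enough to reach the node the failed element would have served. The function name and the example states are hypothetical.

```python
def benchmark_levels(state, fail_state, max_level):
    """Predetermined load-sharing sketch: functional elements start at level 1, and the
    closest functional predecessor of a failed element covers for it (assumed reading)."""
    n = len(state)
    levels = [0 if s == fail_state else 1 for s in state]
    for i, s in enumerate(state):
        if s != fail_state:
            continue
        j = i - 1
        while j >= 0 and state[j] == fail_state:   # search backwards for a functional element
            j -= 1
        needed = i - j + 1                          # level element j needs to bridge the gap
        if j < 0 or needed > max_level:             # nobody can bridge the gap: system fails
            return [0] * n                          # remaining elements are switched off
        levels[j] = max(levels[j], needed)
    return levels

print(benchmark_levels((0, 0, 0, 3, 0), fail_state=3, max_level=2))
print(benchmark_levels((0, 3, 3, 0, 0), fail_state=3, max_level=2))
```

Because this rule fixes the performance levels once the degradation state is known, the benchmark only optimizes the replacement decision, whereas CBMP also chooses the performance levels.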
According to the results of the experiments, the mean value of the value function under CBMP is 4366.71, 6.54% smaller than the one under the benchmark policy, which is 4672.32. Meanwhile, for every state, the value function of CBMP is smaller than that of the benchmark policy. Under the two policies, 342 system states have different optimal actions, accounting for 33.40% of all 1024 states. The experiment results and the above theoretical analysis consistently indicate that CBMP outperforms the benchmark. Next, we examine the advantages of CBMP over the benchmark.
The CBMP policy allows for actively switching off a functional element, which raises the average degradation rate of system elements, according to Equation (8). Moreover, switching off a functional element may cause system failure when certain other elements have already failed. Therefore, the performance control should be condition-based and optimized so that the cost savings brought by switching off a functional element outweigh its side effect. Specifically, through an active balance of degradation, the load of a highly degraded element is taken by less-degraded ones, averting element failure, which also strengthens the system reliability.
Table 3 shows some of the results of CBMP and the benchmark policy, where the first row of each instance is the result of CBMP, while the second row in gray shading is the result of the benchmark policy. As shown in instances I15–I19, group G3 in
Table 3, because the system reliability is enhanced by the active performance control (load-sharing), replacements are delayed in the CBMP policy compared with the benchmark. Replacements being postponed means that the remaining useful life of degraded elements can be fully exploited. In addition, the postponed replacements and the balanced degradation levels can cluster replacements to achieve economies of scale.
5.3. Sensitivity Analysis
We discussed the balance and the reverse balance in the main experiment. In the comparison experiment, the advantages of the CBMP policy over the benchmark stem from the active balance by condition-based performance control. Several groups of sensitivity analysis are conducted to better investigate the impacts of uncertainties in the model parameters on the optimal CBMP, including the balance and the reverse balance. Based on the parameters in
Table 1, we investigate how the maintenance capacity, the setup cost, and the difference between preventive and corrective replacement costs affect the optimal results.
5.3.1. Sensitivity Analysis concerning Maintenance Capacity
In the main experiment, the maintenance capacity is set to 2 so that, for a system with multiple degraded elements, the post-replacement system state is still degraded. Since the balance and the reverse balance discussed in Section 5.1.2 arise from coping with a degraded system state, we may no longer observe such phenomena if the maintenance capacity is not restricted.
Table 4 shows the results with a different value of the maintenance capacity, in which the first row of each instance is the result of the main experiment, where the capacity is 2, while the second row in gray shading results from raising the capacity to 5. In instances I20–I22 in group G4 (see Table 4), when the capacity is 2, highly degraded elements are replaced first rather than the failed ones, and reverse balances regarding performance control can be observed in I23–I25 (group G5, Table 4). However, when the capacity is lifted to 5, i.e., no limit on the number of replacements exists, the elements in I20–I25 are all replaced so that the systems are new, and neither the balance nor the reverse balance discussed in Section 5.1.2 occurs.
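The effect of the capacity limit on the set of available replacement actions can be sketched as follows. This is an illustrative enumeration only, not the paper's action-space definition; the function `feasible_replacements` and the element indices are hypothetical.

```python
from itertools import combinations


def feasible_replacements(candidates, capacity):
    """All subsets of `candidates` containing at most `capacity` elements."""
    limit = min(capacity, len(candidates))
    return [set(c) for k in range(limit + 1) for c in combinations(candidates, k)]


degraded = [1, 2, 3, 4]  # hypothetical indices of degraded or failed elements

restricted = feasible_replacements(degraded, capacity=2)
unrestricted = feasible_replacements(degraded, capacity=5)

# With capacity 2, renewing all four elements at once is not an available action,
# so the post-replacement system state necessarily remains degraded.
full_renewal_when_restricted = set(degraded) in restricted
full_renewal_when_unrestricted = set(degraded) in unrestricted
```

In this sketch, a capacity of 2 leaves 11 feasible replacement sets out of 16, and the full renewal is among the excluded ones, which is the situation in which the balance and the reverse balance become relevant.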
5.3.2. Sensitivity Analysis concerning Maintenance Setup Costs
In the main experiment, the setup cost (100) is relatively high compared with the preventive and corrective replacement costs, which can postpone the replacement and lead to a degraded system state.
Table 5 shows the results with a different value of the setup cost, in which the first row of each instance is the result of the main experiment, where the setup cost is 100, while the second row in gray shading results from lowering the setup cost to 20. When the setup cost is 100, the active balance by performance control can be seen in I26–I29 (Table 5), and reverse balances by performance control can be observed in I28–I31, where instances I28–I29 in G7 concern protecting highly degraded elements and instances I30–I31 in G8 concern load-sharing in response to element failure. However, when the setup cost is decreased from 100 to 20, the highly degraded and failed elements in I26–I31 are replaced, preventing the balance and the reverse balance discussed in Section 5.1.2 from occurring.
5.3.3. Sensitivity Analysis concerning the Difference between Preventive and Corrective Replacement Costs
As mentioned in Section 5.1.2, the reverse balance by replacement and the active balance by performance control arise from reducing the failure risk of highly degraded elements so that the expected element failure cost, reflected by the difference between preventive and corrective replacement costs, is decreased.
Table 6 shows the results with a different value of the element failure cost, in which the first row of each instance is the result of the main experiment, while the second row in gray shading results from a smaller difference between the preventive and corrective replacement costs. When this difference decreases from 130 to 60, the balances and reverse balances can still be observed, as presented in instances I32–I37 (groups G9–G11, Table 6). Group G9 is about the reverse balance by replacement, group G10 relates to the balance and the reverse balance by performance control, and group G11 concerns the reverse balance by performance control arising from element failure.
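As a rough illustration of why the cost difference matters, the expected penalty of risking a failure instead of replacing preventively can be approximated as the failure probability times the cost difference. The sketch below rests on that simplifying assumption; the failure probability and the individual cost values are hypothetical and are chosen only so that the differences equal 130 and 60.

```python
def expected_failure_penalty(p_fail, c_prev, c_corr):
    """Expected extra cost of risking failure rather than replacing preventively."""
    return p_fail * (c_corr - c_prev)


P_FAIL = 0.4  # hypothetical one-period failure probability of a highly degraded element

# Hypothetical cost pairs whose differences match the two settings (130 and 60).
main_setting = expected_failure_penalty(P_FAIL, c_prev=50, c_corr=180)
reduced_setting = expected_failure_penalty(P_FAIL, c_prev=50, c_corr=110)
```

The penalty shrinks with the cost difference but stays positive, which is consistent with the balances and reverse balances remaining observable in the second setting.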
6. Conclusions
We propose an optimal CBMP policy for LMCCSs to minimize long-term maintenance-related costs, including system failure costs. Specifically, the optimal maintenance and performance control decisions for every system degradation state are obtained through MDP modeling and the policy iteration algorithm. In addition, we model the element degradation by the gamma process and discretize the continuous degradation process so that it can be modeled by the MDP. Our model and algorithm also apply to linear consecutive-k-out-of-n:F systems because the LMCCS generalizes such systems. The optimal condition-based load-sharing for LMCCSs is also covered in the CBMP optimization. We have examined five critical factors influencing the optimal CBMP policy: load-sharing, structural characteristics of the LMCCS, maintenance setup cost, limited maintenance capacity, and the difference between preventive and corrective replacement costs. In the numerical experiments, we analyze the reverse balances resulting from replacement and performance control and the active balance achieved through performance control. The reverse balance by replacement indicates that condition-based maintenance policies built on a maintenance threshold, e.g., a control limit, may not always be optimal.
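For readers unfamiliar with the solution approach, the following is a minimal sketch of policy iteration on a toy single-element replacement problem. It is not the LMCCS model of this paper: the four degradation levels, the transition matrix `DEGRADE`, the discount factor, and all costs other than the setup cost of 100 are hypothetical, performance control is omitted, and a discounted cost criterion is assumed for simplicity.

```python
GAMMA = 0.9   # discount factor (assumption; the paper's cost criterion may differ)
FAILED = 3    # index of the failed state
KEEP, REPLACE = 0, 1

# DEGRADE[s][t]: hypothetical probability of moving from level s to level t
# in one period when the element is not replaced.
DEGRADE = [
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
]
SETUP, C_PREV, C_CORR, DOWNTIME = 100, 50, 180, 300  # only SETUP is from the paper
N = len(DEGRADE)


def step(state, action):
    """Return (immediate cost, next-state distribution) for a state-action pair."""
    if action == REPLACE:
        unit = C_CORR if state == FAILED else C_PREV
        return SETUP + unit, DEGRADE[0]  # element is renewed, then degrades
    return (DOWNTIME if state == FAILED else 0), DEGRADE[state]


def q_value(state, action, values):
    cost, probs = step(state, action)
    return cost + GAMMA * sum(p * v for p, v in zip(probs, values))


def evaluate(policy, tol=1e-10):
    """Policy evaluation by repeated sweeps until the values stop changing."""
    values = [0.0] * N
    while True:
        new = [q_value(s, policy[s], values) for s in range(N)]
        if max(abs(a - b) for a, b in zip(new, values)) < tol:
            return new
        values = new


def policy_iteration():
    policy = [KEEP] * N
    while True:
        values = evaluate(policy)
        improved = list(policy)
        for s in range(N):
            for a in (KEEP, REPLACE):
                # Switch only on a strict improvement to avoid cycling on ties.
                if q_value(s, a, values) < q_value(s, improved[s], values) - 1e-9:
                    improved[s] = a
        if improved == policy:
            return policy, values
        policy = improved


policy, values = policy_iteration()
```

In this toy problem the returned policy keeps a new element and replaces a failed one; the decisions at the intermediate levels depend on the hypothetical numbers chosen above.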
We suggest three directions for future research. First, the maintenance capacity is predetermined in our study, whereas future studies could treat it as dynamic. Spare inventory replenishment, transportation, budget, and other concrete constraints on maintenance resources can be considered. Second, a node can hold more than one element in some LMCCSs, which complicates the condition-based load-sharing decision. Researchers can consider the CBMP policy for such variants of the LMCCS. Last, real-world systems may pose challenges for collecting and processing information on the system degradation state. Making maintenance and performance control decisions under partial or inaccurate information is worth studying.