1. Introduction
Uninterruptible power supply (UPS) batteries are an integral part of any data center: they keep the data center running during the transitional fail-over between the power grid and diesel generators [1]. Data centers require steady power for smooth operation, and the UPS, installed between the main power grid and the servers, provides it [2]. Data centers are now major consumers of electrical energy, and the electricity bill constitutes a significant portion of a data center's overall operational costs [3]. In 2013, data centers in the U.S.A. consumed 91 billion kilowatt-hours of electricity, and this figure was expected to continue rising over the years [4]. In 2017, nearly 8 million data centers required an astronomical 416.2 terawatt-hours of electricity [5,6]. Even a single faulty battery in a pack could cause millions of dollars of damage to data center equipment during such a transition. The layout of the data center's design is illustrated in Figure 1.
Despite steady improvements in battery manufacturing and storage technology [7], health estimation of batteries in data centers remains a challenge. Not surprisingly, many studies have addressed battery-pack life prediction through voltage fault diagnosis, charging regimes, and state-of-health (SOH) estimation. Severson et al. [8] demonstrated a data-driven model that predicts battery life cycle from the voltage curves of 124 batteries before degradation. Tang et al. [9] predicted battery voltage with a model-based extreme learning machine for electric vehicles. L. Jiang et al. [10] employed the Taguchi method to search for an optimal charging pattern for a five-stage constant-current charging strategy, improving lithium-ion battery charging efficiency by 0.6–0.9%. D. Sidorov et al. [11] presented a review of battery energy storage with an example of battery modeling for renewable energy applications, and demonstrated an adaptive approach to the load-leveling problem with storage. Hu et al. [12] employed a sparse Bayesian predictive modeling (SBPM) methodology to capture the underlying correspondence between capacity loss and sample entropy; the sample entropy of short voltage sequences proved an effective predictor of capacity loss. You et al. [13] proposed a data-driven approach that traces battery SOH using current, voltage, and temperature data as well as their historical distributions. Song et al. [14] proposed a data-driven hybrid remaining-useful-life estimation approach for spacecraft lithium-ion batteries by fusing an iterative nonlinear degradation autoregressive (IND-AR) model and an empirical model via the state-space model in a regularized particle filter (RPF). Zhou et al. [15] combined Empirical Mode Decomposition (EMD) and Auto-Regressive Integrated Moving Average (ARIMA) models to predict the Remaining Useful Life (RUL) of lithium-ion batteries in the Battery Management System (BMS) of electric vehicles. Chen et al. [16] proposed a hybrid approach combining the Variational Mode Decomposition (VMD) de-noising technique with ARIMA and Gray Model GM(1,1) models for battery RUL prediction.
The ARIMA model has been one of the most widely used models in time-series forecasting [17,18,19]. Kavasseri et al. [20] examined the use of fractional-ARIMA (f-ARIMA) models to forecast wind speeds on the day-ahead (24 h) and two-day-ahead (48 h) horizons. Khashei et al. [21] proposed a hybridization of Artificial Neural Networks (ANNs) and the ARIMA model to overcome the limitations of ANNs and yield a more accurate forecasting model than traditional hybrid ARIMA-ANN models. Barak et al. [22] forecasted the annual energy consumption in Iran using three patterns of an ARIMA-ANFIS model.
ARIMA is used in forecasting social, economic, engineering, foreign-exchange, and stock problems. It predicts future values of a time series from a linear combination of its past values and a series of past errors [23,24,25,26,27]. Because batteries in the data center are almost always on charge, deep discharge is a rare occurrence, and each battery's distinctive internal chemistry produces different behavior, stationary or stochastic, from battery to battery. In addition, failure data is rarely available in practice, which makes it challenging to accurately predict a battery's status before its first failure. In this paper, we develop a cluster-assisted ARIMA model framework to improve the accuracy of battery voltage prediction. Clustered patterns are used as external regressors to improve the accuracy of the ARIMA model and provide a more reliable indication of future battery status. Clustering in machine learning groups similar data points; we use it to group the voltage patterns of batteries within the data center and improve the forecasting model, instead of predicting thousands of batteries individually. Clustering algorithms such as Dynamic Time Warping (DTW)-based clustering, hierarchical, fuzzy, k-shape, and TADPole each group similar data points in their own way, and the features selected by clustering improve forecasting accuracy [28,29,30]. The proposed cluster-assisted forecasts are compared with the actual battery data and with ARIMA forecasts produced without clustering.
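As a concrete illustration of one clustering building block mentioned above, the following is a minimal pure-Python sketch of the DTW distance, the elastic measure that lets two voltage curves be compared even when their shapes are shifted in time. The series values here are hypothetical, not taken from the study's data set.

```python
def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A time-shifted copy of a steady voltage curve aligns cheaply,
# while a drifting (degrading) curve does not.
steady = [13.5, 13.5, 13.6, 13.5, 13.5]
shifted = [13.5, 13.6, 13.5, 13.5, 13.5]
drifting = [13.5, 13.3, 13.1, 12.9, 12.7]
assert dtw_distance(steady, shifted) < dtw_distance(steady, drifting)
```

A clustering algorithm built on this distance would group batteries whose voltage curves warp onto one another at low cost, which is what makes DTW-based clustering robust to small timing offsets between otherwise similar batteries.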
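The cluster-assisted idea, feeding a cluster-level pattern into the forecast as an external regressor alongside the series' own past values, can be sketched with a least-squares autoregressive model with an exogenous input. This is an illustrative simplification, not the paper's actual ARIMA implementation; the function names, the AR(2) order, and the synthetic data below are all assumptions made for the sketch.

```python
import numpy as np

def fit_arx(y, x, p=2):
    """Least-squares fit of y[t] ~ a1*y[t-1] + ... + ap*y[t-p] + b*x[t] + c."""
    rows, targets = [], []
    for t in range(p, len(y)):
        rows.append([y[t - i] for i in range(1, p + 1)] + [x[t], 1.0])
        targets.append(y[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef

def forecast_one(y, x_next, coef, p=2):
    """One-step-ahead forecast from the last p observations and the next regressor value."""
    lags = [y[-i] for i in range(1, p + 1)]
    return float(np.dot(coef, np.array(lags + [x_next, 1.0])))

# Hypothetical voltage-like series: an AR(2) recursion settling near 13 V,
# driven by an exogenous "cluster pattern" x (here a slow sinusoid).
x = [float(np.sin(t / 3.0)) for t in range(40)]
y = [13.0, 13.0]
for t in range(2, 40):
    y.append(0.6 * y[t - 1] - 0.1 * y[t - 2] + 0.4 * x[t] + 6.5)

coef = fit_arx(y[:-1], x[:-1])          # train on all but the last point
pred = forecast_one(y[:-1], x[-1], coef)  # pred closely matches y[-1]
```

Because the held-out point obeys the same recursion the model was fitted on, the one-step forecast closely matches the actual value; in the paper's setting, x would be the representative voltage pattern of the battery's cluster, and the autoregressive structure would come from the fitted ARIMA model rather than this plain least-squares fit.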
The rest of the paper is organized as follows:
Section 2 describes the features of the data center and data set used for the study.
Section 3 describes data preprocessing and explains the methodology, introducing the algorithms for cluster consistency and clustered ARIMA forecasting.
Section 4 shows the steps to implement the proposed clustered forecasting method.
Section 5 presents the battery cluster consistency detection results and the cluster-assisted ARIMA forecasts, and discusses the effectiveness of the method by comparing the results with actual data and with ARIMA forecasts produced without clustering.
Section 6 concludes this work.