1. Introduction
Thailand is widely recognized as a global hub for hard disk drive (HDD) production. HDD manufacturing uses advanced technology at many stages. Among these stages, the HDD testing process is essential for ensuring confidence in the products being sold. The signal writing process, which relies on the spiral seed self-servo writing machine, requires precise position control. To achieve high-quality position control, a proportional-integral-derivative (PID) controller must effectively handle diverse load characteristics. At present, a single set of PID controller gains is used to control multiple positioners. Before deployment in the manufacturing process, the PID gains are manually tuned by expert designers to meet time and frequency domain criteria during bench testing in a clean room environment.
During operation, the signal writing machine operates across a wide range of frequencies within cycle-time limits. Any deviation in the positioner can degrade the quality of the signal writing process, causing the position error signal (PES) to exceed control limits. The reference signal is generated during the HDD manufacturing process, and high-precision machines are used to regulate the position of the read–write head. Over time, however, the parts of these machines wear and deteriorate, leading to signal writing errors that exceed the specified range and cause waste in the production process. In severe cases of machine part damage, the machine must be stopped for repairs. Optimizing the PID gains is crucial for minimizing the PES. However, repair history shows that some machines have significant mechanical defects that cannot be rectified by optimizing the controller gains. These machines should be repaired rather than tuned, so they must be separated from the tunable class.
Currently, there are two approaches to address these issues. In cases where the signal writing quality does not meet the control specifications and defective units are identified, a high-strength permanent magnet is employed to erase the incorrectly written spiral signal. In cases where the machinery stops, the repair process is carried out in a step-by-step manner, beginning with the replacement of components according to predefined steps.
We conducted a study and classified the signal writing machines into four classes: 0 (healthy machines), 1 (encoder sensor degradation), 2 (pushpin degradation), and 3 (tunable machines). Machines in class 0 function well in manufacturing and produce high-quality signal writing. Machines in classes 1 and 2 should be repaired by replacing the degraded components. Machines in class 3 do not exhibit any mechanical degradation, but the PES exceeds the control limit. These machines should be categorized for controller optimization.
In model-based controller tuning, accurately estimating the mathematical model up to high frequencies is crucial to account for the effect of each resonance. The mathematical model can be obtained from the frequency response function (FRF), which can be gathered and used with system identification techniques to acquire the model. Controller tuning can then be implemented through model-based optimization to meet both time and frequency domain criteria.
To make informed decisions for each signal writing machine class, machine learning is employed to classify machine performance. Operating parameters during the signal writing process that demonstrate a high correlation with the dynamic response of each machine characteristic are considered as feature inputs for the machine learning model. The signal writing machines in class 3, which exhibit less deviation from nominal performance, are selected for further controller optimization.
The approaches used for system identification and machine learning classification were then reviewed.
E.C. Levy [1] introduced complex curve fitting in the frequency domain. Levy's method identifies the FRF in polynomial form in an ordinary least squares (OLS) sense and was later adopted by several other least squares methods. J.R. Marti [2] presented the Bode technique to estimate the model parameters of transmission lines. The transfer function model was estimated using only the magnitude response, with the poles and zeros restricted to real values. Jiraphon Srisertpol and Autsadayut Rodpai [3] proposed a mathematical model for linear viscoelastic materials. They created several weighting functions through frequency modification and regarded a system as stable when all of its poles lie in the left half-plane of the s-domain. In each iteration loop, the system with the minimum norm error was selected as the candidate model. Abdullah Al Mamun, T.H. Lee and T.S. Low [4] proposed frequency domain identification of a transfer function model for disk drive actuators. This approach uses the error residue matrix from the prior cycle as the weighting function for the current cycle, and model stability is determined from whether the poles lie in the left half-plane. A stable model is required for accurate controller tuning. Further research is ongoing for frequencies below 50 Hz, owing to limitations in data collection. Moreover, some resonance peaks above 4 kHz have not yet been identified, and more iterations may be required. Taku Noda [5] proposed identifying a multiphase network equivalent for electromagnetic transient calculations using a partitioned frequency response with iterative re-weighting. The weighting function is the absolute error from the previous cycle. The partitions are set manually at 1 kHz intervals, at the closest equidistant points. This method performs well for high resonance modes and yields a smooth gain at low frequencies. In addition, Eduardo Salvador [6] used several adaptive errors to modify the weighting functions. This method works well when the number of zeros (numerator order) equals the number of poles (denominator order). Takashi Yamaguchi, Mitsuo Hirata and Chee Khiang Pang [7] proposed HDD actuator modeling for high-speed, high-precision motion control. This approach combines a rigid-body-plus-delay-time model with a multiplicative resonance model. A weighted least squares (WLS) technique (fitsys in MATLAB®) was used to validate the stable model, with a weighting function based on the rigid-body baseline and reduced gain at high frequencies. Chen Peng and Liang Yanbing [8] presented frequency domain identification of a fast-steering mirror. The model was fitted by a gradient search based on the Levenberg–Marquardt algorithm, and the model error was reduced by adding a zero to the second-order standard form. Tomaz K. and Damir [9] presented an analytical method for estimating a five-parameter second-order model with a zero plus time delay. The method calculates the model parameters from characteristic areas obtained from an arbitrary change between steady states of the process. The results show good fitting of both the time and frequency responses.
Classification techniques for assessing machine performance were also reviewed.
Salem M. et al. [10] presented frequency response analysis (FRA) of three-phase star- and delta-connected induction motors for pattern recognition and fault analysis. They analyzed statistical parameters, namely the correlation coefficient (CC), absolute sum of logarithmic error (ASLE), standard deviation (SD), and mean square error (MSE), for healthy motors in both star and delta connections and for motors with short circuit (SC) and open circuit (OC) faults. The results demonstrated that these statistical parameters can be used for pattern recognition. I Abdul et al. [11] presented a fault diagnosis method for wind turbines using a decision tree classification algorithm. One advantage of this approach is that it does not require transformation or normalization of the input features, which makes it easier to trace the root cause of faults in the wind turbine. Engineers can use the resulting decision tree classifier to design simulation scenarios that replicate the faults and to propose mitigating actions in the supervisory control and data acquisition (SCADA) system. Masayuti S. and Prabhas C. [12] used a support vector machine (SVM) to classify abnormal assembly processes in hard disk drive manufacturing. They achieved 100% accuracy in classifying good and abnormally assembled drives using a training set of 500 drives. They showed that SVM can be used in manufacturing and has advantages over conventional methods such as the trapping method, because it is easier to develop and does not require knowledge of specific abnormalities to develop a correct program. Aida et al. [13] presented a fault detection and classification method for power transmission systems using the k-nearest neighbor (k-NN) classifier, which classified the system as healthy or as one of 10 fault types with an accuracy of about 98%. Yordanos D. et al. [14] applied machine learning to fault classification and location in a radial distribution grid. The discrete wavelet transform (DWT) was applied to the three-phase current signals, and standard statistical techniques were applied to the DWT coefficients to extract useful features. A multilayer perceptron (MLP) and an extreme learning machine (ELM) were used to classify "no fault" and 10 fault types. Both methods performed efficiently in classifying and locating faults, and the ELM was faster to train. Thanasak W. et al. [15] presented fault detection and identification (FDI) of mount head damage in head gimbal assembly (HGA) manufacturing, using an artificial neural network (ANN) to reduce slider loss defects (SLD). The mount heads were classified as healthy or as one of three fault types defined by damage level, and the input features were images of the mount heads in each group. The classification accuracy for the damage types was 94.3%, which was better than that obtained using the voltage and vacuum methods. Prathan C. et al. [16] developed fault detection and diagnosis techniques for linear bearings in an auto core adhesion mounting (ACAM) machine using vibration signal analysis and motor current analysis. The vibration signal was converted into a crest factor and a fast Fourier transform (FFT) spectrum, while the motor current was analyzed using analysis of variance (ANOVA). The researchers were able to distinguish the healthy condition from five fault types with good classification accuracy. Prathan C. et al. [17] used an ANN to separate healthy bearings from five defect types, using the FFT spectrum of the vibration during linear motor movement, the motor current, and the crest factor as input parameters. The results showed an accuracy of up to 93% using all three parameters. In subsequent work, Prathan C. et al. [18] used an ANN to classify anomalies in a high-speed auto core adhesion mounting machine and then used a PI servo to compensate the machine back to an acceptable condition. They achieved a classification accuracy of 100%, as shown by the confusion matrix. They also demonstrated two types of gain scheduling, namely discrete gain and continuous gain, which resulted in 86% and 93% error reduction, respectively. Ahmed R. Nasser et al. [19] presented intelligent fault detection and identification for an analogue circuit using a fuzzy-logic classifier. Testing was performed on a low-pass filter analogue circuit, and the classifier achieved an average F-score of 98%. Yun Peng Zue et al. [20] presented a study on the condition monitoring of a rotor shaft system using the frequency response function (FRF). The researchers conducted two experimental studies: the first focused on rub-impact faults, and the second examined misalignment. Three methods were compared: the first used harmonic data, the second employed harmonic excitation, and the third used potential derived from experimental data. The second method detected faults better than the other methods. Naderi E. and Khorasani K. [21] presented a data-driven approach for fault detection, isolation, and estimation of aircraft gas turbine engine actuators and sensors. The frequency response of the gas turbine engine was estimated using Markov parameter estimation, and these data were then used directly to design and implement the fault detection, isolation, and estimation filters. Samree J. et al. [22] presented a transient analysis method using a high-pass filter circuit in a high-voltage system, which detected faults within 1 ms. Abdelila E. [23] presented a fault detection method for photovoltaic (PV) panels using k-means clustering. The input data were collected from thermal images, and an image processing technique was applied to convert the RGB color space to HSV. The number of clusters, k, was determined using the elbow method and the average silhouette method. The results demonstrated that the k-means algorithm successfully located faults on the PV panels. Christoph K. et al. [24] presented vibration classification of a motor using machine learning. Several acceleration sensors were evaluated, and the vibration at each motor speed was used as the label for the machine learning model. Smoothing and differencing were used for feature extraction. The decision tree, gradient boost, and focus cluster algorithms were compared in terms of accuracy and computing time. A. M. Umbrajkaar et al. [25] presented vibration analysis of rotating machines using a combined SVM and ANN approach. Data from triaxial acceleration sensors were collected in three groups (healthy, parallel misalignment, and angular misalignment) and were transformed using the DWT before being passed to the machine learning model.
Based on our review of system identification techniques, we concluded that the WLS technique is appropriate for signal writing machines because it does not require initial parameters for a gradient search. However, WLS requires an appropriate weighting function. The method of Srisertpol and Rodpai [3] works well for stable systems and linear viscoelastic materials, and the Noda [5] method works well for electrical plant identification when the partitions can be placed at equidistant points in the frequency response. In contrast, the resonance modes of signal writing machines occupy unequal frequency bands, and some resonance modes tend to be identified at the wrong frequency or require multiple iterations to meet the criteria. Building on [3,5], we used a peak partition technique to model the signal writing machine.
Among the classification techniques reviewed, we chose an ANN as our machine learning classifier. The performance group of each machine is used as the output label, and the operating parameters recorded during the signal writing process that correlate highly with the dynamic behavior of each transfer function are used as input features.
The paper is organized as follows:
Section 2 describes the materials and methods used to analyze and estimate mathematical models in transfer function form, including time and frequency domains. The machine classification using an artificial neural network (ANN) structure is also presented in this section.
Section 3 describes the results of the mathematical model obtained from system identification techniques and the results of machine classification using ANN.
Section 4 discusses the validation results for each model structure and the accuracy of machine classification. Finally, Section 5 offers conclusions and outlines further work.
2. Materials and Methods
This section describes how a spiral seed self-servo track writing machine performs closed-loop position control during HDD testing and how data are collected from the machine during testing. It also discusses the structure of the mathematical model in transfer function form, the procedure for estimating the model using system identification methods, and the analysis of the time and frequency domain responses of the test machine for each machine characteristic. Finally, it presents the ANN classification procedure for the test machines.
2.1. Position Control
The spiral seed self-servo track writing machine, shown in Figure 1, is used to write a reference signal in a spiral pattern. The machine has 16 signal write slots, with one industrial computer controlling four slots, and runs on multithreaded software called WinSTW. Each slot consists of a drive fixture clamp, an electric control board, and a positioner known as MicroE. The positioner model is the PA2000, which integrates a voice-coil motor (VCM) and an absolute encoder, and the amplifier model is the SA200. The maximum stroke range is 40 degrees, and the encoder resolution is 4.68 nanoradians. The rotary shaft of the MicroE is assembled with a customized positioner arm that has an adjustable pushpin height to support multiple product specifications.
The position control system shown in Figure 2 consists of a plant that is engaged with complex loads such as hard disk drives (HDDs). The controller is a PID controller with low-pass and notch filters.
To obtain frequency response data, MicroE Systems Motion Control software version 6.3 was used in frequency analysis mode. The input swept-sine signal was set to 0.4 Vpp over a frequency range of 40–5000 Hz, and the sampling time for data collection was 39.6 µs. Bode plots were generated by injecting the sine input signal at the junction between the controller and the amplifier and collecting the resulting position change from the encoder, as illustrated in Figure 3.
Note that the phase response was displayed as a phase margin and was wrapped between +180 and −180 degrees. The frequency response data were saved in a .bod file and imported into the system identification software. As shown in Figure 4, the phase data displayed by the MicroE Systems Motion Control software must be unwrapped and corrected to obtain the phase response.
The corrected, unwrapped phase response is shown in Figure 5.
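The unwrapping step can be reproduced outside the vendor software. The sketch below assumes the exported phase is in degrees and wrapped to roughly ±180°. The helper `unwrap_phase_deg` and its `offset_deg` parameter are hypothetical, and whether a constant offset (for example −180° for a phase-margin style display) is needed is an assumption to check against the measured data.

```python
import numpy as np


def unwrap_phase_deg(phase_deg, offset_deg=0.0):
    """Unwrap a phase trace that was wrapped to about +/-180 degrees.

    offset_deg is a hypothetical constant correction (e.g. -180 if the
    display reports phase margin instead of phase); it is an assumption.
    """
    phase_rad = np.deg2rad(np.asarray(phase_deg, dtype=float))
    # np.unwrap removes jumps larger than pi by adding multiples of 2*pi
    unwrapped = np.rad2deg(np.unwrap(phase_rad))
    return unwrapped + offset_deg
```

Unwrapping is anchored at the first (lowest-frequency) sample, so the result is only meaningful when adjacent samples differ by less than 180°, which a dense swept-sine grid normally satisfies.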
2.2. Model Structure
The frequency response of the high-precision machine in Figure 5 can be used for system identification. The model structure was defined by a combination of low- and high-frequency models.
The model structure for the low-frequency range was obtained from the response data between the first measured frequency and 500 Hz. When the frequency axis was converted from Hz to rad/s, the magnitude response exhibited a slope of −40 dB/decade, with a small gain increase around 50 to 70 Hz. These results motivated a second-order standard form model, shown in (1), where $\omega_n$ and $\zeta$ are the natural frequency and damping ratio, respectively, and $K$ represents the DC gain of the model, which combines the plant gain and amplifier gain.
The model structure for the high-frequency range was obtained from the frequency response data above 500 Hz and is represented by the resonance model in (2), where $\omega_{z,i}$ and $\zeta_{z,i}$ are the natural frequencies and damping ratios of the anti-resonance modes, and $\omega_{p,i}$ and $\zeta_{p,i}$ are the natural frequencies and damping ratios of the resonance modes, respectively.
The plant transfer function is shown in (3).
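For reference, the structures described above can be written in a common form. The symbols, the number of resonance sections $r$, and the multiplicative form of $G_{\mathrm{res}}(s)$ (each section normalized to unit DC gain, in the spirit of the multiplicative resonance model of [7]) are assumptions for illustration; the exact expressions (1) to (3) may differ.

```latex
G_{\mathrm{low}}(s) = \frac{K\,\omega_n^{2}}{s^{2} + 2\zeta\omega_n s + \omega_n^{2}}

G_{\mathrm{res}}(s) = \prod_{i=1}^{r}
  \frac{\omega_{p,i}^{2}}{\omega_{z,i}^{2}}\,
  \frac{s^{2} + 2\zeta_{z,i}\,\omega_{z,i}\,s + \omega_{z,i}^{2}}
       {s^{2} + 2\zeta_{p,i}\,\omega_{p,i}\,s + \omega_{p,i}^{2}}

P(s) = G_{\mathrm{low}}(s)\,G_{\mathrm{res}}(s)
```

With this normalization, $G_{\mathrm{res}}(0) = 1$, so the resonance sections leave the low-frequency gain set by $K$ unchanged.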
2.3. System Identification Algorithm
To find the model parameters and order of the high-precision machine, OLS and WLS were used with an iterative weighting function. We considered the general transfer function $G(s)$ in (4), where $N(s)$ and $D(s)$ represent the numerator and denominator, respectively. They can be expressed in polynomial form, as shown in (5).
The error $e(\omega_k)$ between the actual frequency response data $H(j\omega_k)$ and the model $G(j\omega_k)$ is given in (6). Replacing $G(j\omega_k)$ with $N(j\omega_k)/D(j\omega_k)$ gives (7), and multiplying both sides of (7) by $D(j\omega_k)$ gives (8).
When the error between the frequency response data and the model is zero, the left side of (8) is also zero. Substituting the polynomials from (5) for $N$ and $D$ gives (9), which is rearranged into (10) and then expressed in matrix form as (11) to obtain the least squares solution.
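The linearization from (6) to (11) follows Levy's approach [1]. The sketch below writes it out under assumed notation: $N(s)=\sum_{i=0}^{m}\beta_i s^i$, $D(s)=\sum_{i=0}^{n}\alpha_i s^i$ with the constant denominator coefficient $\alpha_0$ normalized to 1, $s_k = j\omega_k$ for $k = 1,\dots,N_f$, and the error defined as measured minus model. The normalization and sign conventions of the original equations may differ.

```latex
\begin{aligned}
e(\omega_k) &= H(s_k) - \frac{N(s_k)}{D(s_k)} \\
e(\omega_k)\,D(s_k) &= H(s_k)\,D(s_k) - N(s_k) \\
0 &= H(s_k)\Big(1 + \sum_{i=1}^{n}\alpha_i s_k^{i}\Big) - \sum_{i=0}^{m}\beta_i s_k^{i} \\
\sum_{i=0}^{m}\beta_i s_k^{i} - H(s_k)\sum_{i=1}^{n}\alpha_i s_k^{i} &= H(s_k)
\end{aligned}

\underbrace{\begin{bmatrix}
1 & s_1 & \cdots & s_1^{m} & -H(s_1)\,s_1 & \cdots & -H(s_1)\,s_1^{n}\\
\vdots & & & & & & \vdots\\
1 & s_{N_f} & \cdots & s_{N_f}^{m} & -H(s_{N_f})\,s_{N_f} & \cdots & -H(s_{N_f})\,s_{N_f}^{n}
\end{bmatrix}}_{\mathbf{A}}
\underbrace{\begin{bmatrix}\beta_0\\ \vdots\\ \beta_m\\ \alpha_1\\ \vdots\\ \alpha_n\end{bmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{bmatrix}H(s_1)\\ \vdots\\ H(s_{N_f})\end{bmatrix}}_{\mathbf{b}}
```

Each row is linear in the unknown coefficients, which is why an ordinary or weighted least squares solver can be applied directly, without a gradient search.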
The coefficient matrix $\mathbf{A}$ is shown in (12). Since one denominator coefficient is normalized to 1, Equation (12) is modified to (13). The vector of numerator and denominator coefficients $\mathbf{x}$ is shown in (14), and the actual frequency response vector $\mathbf{b}$ is shown in (15). Here, $n$ and $m$ represent the numbers of poles and zeros, respectively, and $k$ indexes the measurement points.
The OLS solution is shown in (16), and the WLS solution in (17), where $\mathbf{W}_i$ is the weighting function [5], which was updated adaptively from the least squares solution of the previous step. Here, $i$ is the iteration step, and the initial weighting matrix $\mathbf{W}_0$ is the identity matrix, so the first iteration yields the OLS solution.
To evaluate model accuracy, we considered the error of the transfer function in (7) at each data point and then calculated its norm [3], as shown in (19).
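A minimal numerical sketch of the OLS/WLS iteration and error norm, using NumPy. Assumptions: the constant denominator coefficient is fixed to 1; the complex equations are split into real and imaginary parts so that the coefficients come out real; the weight update uses the normalized absolute error of the previous iteration (following the description of [5] in the review above; the exact update rule may differ); and the error norm is the Euclidean norm. The names `tf_response`, `levy_wls`, `iterative_wls`, and the frequency-scaling parameter `w_scale` are hypothetical.

```python
import numpy as np


def tf_response(b, a, w):
    """Evaluate N(jw)/D(jw); b and a hold ascending-power coefficients."""
    s = 1j * np.asarray(w, dtype=float)
    # np.polyval expects the highest power first, so reverse the arrays
    return np.polyval(b[::-1], s) / np.polyval(a[::-1], s)


def levy_wls(w, H, n_zeros, n_poles, weights=None, w_scale=1.0):
    """One weighted linear least-squares (Levy-type) fit with D(0) fixed to 1."""
    s = 1j * np.asarray(w, dtype=float) / w_scale  # scaled variable for conditioning
    H = np.asarray(H, dtype=complex)
    # Row k: [1, s, ..., s^m, -H s, ..., -H s^n] @ x = H
    A = np.hstack([s[:, None] ** np.arange(n_zeros + 1),
                   -H[:, None] * s[:, None] ** np.arange(1, n_poles + 1)])
    rhs = H.copy()
    if weights is not None:
        sw = np.sqrt(np.asarray(weights, dtype=float))
        A, rhs = A * sw[:, None], rhs * sw      # row scaling = diagonal W
    # Stack real and imaginary parts so the coefficients are real
    A_r = np.vstack([A.real, A.imag])
    rhs_r = np.concatenate([rhs.real, rhs.imag])
    x, *_ = np.linalg.lstsq(A_r, rhs_r, rcond=None)
    # Undo the frequency scaling: the coefficient of s^i is divided by w_scale^i
    b = x[: n_zeros + 1] / w_scale ** np.arange(n_zeros + 1)
    a = np.concatenate([[1.0], x[n_zeros + 1:]]) / w_scale ** np.arange(n_poles + 1)
    return b, a


def iterative_wls(w, H, n_zeros, n_poles, n_iter=10, w_scale=1.0):
    """Iteratively reweighted fit; returns (b, a, norm) with the smallest error norm."""
    weights = np.ones(len(w))  # W_0 = I, so the first pass is the OLS solution
    best = None
    for _ in range(n_iter):
        b, a = levy_wls(w, H, n_zeros, n_poles, weights, w_scale)
        err = np.asarray(H) - tf_response(b, a, w)
        norm = np.linalg.norm(err)  # assumed Euclidean norm of the error
        if best is None or norm < best[2]:
            best = (b, a, norm)
        mag = np.abs(err)
        if mag.max() == 0:
            break
        # Assumed update: weight each point by its normalized absolute error
        weights = np.maximum(mag / mag.max(), 1e-6)
    return best
```

With noise-free data generated from a model of the chosen order, the first (OLS) pass already recovers the coefficients; on measured data, later passes shift emphasis toward poorly fitted points. Scaling the frequency axis by `w_scale` keeps the powers of $s$ from spanning many orders of magnitude, which becomes important for higher-order fits.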
2.4. Second-Order Standard Form Model Estimation
The second-order standard form model parameters in (1) were identified using (16) and (17) and are shown in (20). The fitting result is illustrated in Figure 6. The model magnitude response fits the actual magnitude response well at low frequencies, where the measured phase shift is not less than −180 degrees. The minimum model error was 808. The best weighting function, shown in Figure 7, reduces the gain at high frequencies.
The phase response showed delays at frequencies above 1 kHz. We therefore added a transport delay to (1), multiplying the second-order standard form by a delay term to reduce the phase error at high frequencies while preserving the magnitude response. The modified model structure is shown in (21).
The MicroE Systems Motion Control software was used in Time Response Analysis mode, and the time response data were exported to the plot in Figure 8, where the delay appears as the time between the two samples. Candidate time delays between 39.6 µs and 79.4 µs were considered, and a golden section method (GSM) search gave a time delay of about 65 µs. The model response and actual response are shown in Figure 9, and the model parameters are given in (22). With the delay added, the minimum model error of the second-order standard form was reduced from 808 to 277.
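The delay search can be illustrated with a golden section search over the same interval, 39.6 µs to 79.4 µs. This sketch assumes the objective is the Euclidean norm of the complex error between the measured FRF and the second-order model multiplied by $e^{-j\omega T}$; the objective actually used for (22) is not stated, and the function names are hypothetical.

```python
import numpy as np


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function f on [lo, hi] by golden section search."""
    r = (np.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:                 # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:                       # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def second_order(w, K, wn, zeta):
    """Second-order standard form K*wn^2 / (s^2 + 2*zeta*wn*s + wn^2) at s = jw."""
    s = 1j * w
    return K * wn**2 / (s**2 + 2 * zeta * wn * s + wn**2)


def delay_error(T, w, H_meas, G_model):
    """Error norm between the measured FRF and the model times exp(-jwT)."""
    return np.linalg.norm(H_meas - G_model * np.exp(-1j * w * T))
```

The same `golden_section_min` routine could be reused for the one-dimensional resonance gain search between 0.1 dB and 2 dB. Golden section search requires a unimodal objective on the interval; for this delay error that holds as long as $\omega\,|T - T_{\mathrm{true}}|$ stays below $\pi$ across the measured band.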
2.5. Resonance Model Estimation
The resonance model in (2) was identified separately using a peak detection command (findpeaks in MATLAB®). The first data point of the first group was taken from the end of the second-order standard form data range. Then, the frequency range was partitioned using the first resonance frequency and the center frequency of each group, as indicated by the dashed lines in Figure 10.
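The partitioning step can be sketched in NumPy as a stand-in for the MATLAB® findpeaks workflow. The sketch detects strict local maxima of the magnitude (in dB) above a start frequency, places partition boundaries at the geometric mean of adjacent peak frequencies (an assumption; the boundaries described above are based on the first resonance and the center frequency of each group), and returns the groups in the higher-to-lower magnitude order used for fitting. The function names are hypothetical; `scipy.signal.find_peaks` offers `prominence` and `distance` filtering if a more robust detector is needed.

```python
import numpy as np


def find_local_peaks(mag_db):
    """Indices of strict local maxima in a magnitude array (simple findpeaks stand-in)."""
    m = np.asarray(mag_db, dtype=float)
    return np.where((m[1:-1] > m[:-2]) & (m[1:-1] > m[2:]))[0] + 1


def partition_by_peaks(f, mag_db, f_start):
    """Split the band above f_start into one group per peak.

    Returns a list of (peak_freq, f_lo, f_hi, peak_db) tuples sorted by
    decreasing peak magnitude, i.e. the order in which groups are fitted.
    """
    f = np.asarray(f, dtype=float)
    mag_db = np.asarray(mag_db, dtype=float)
    peaks = find_local_peaks(mag_db)
    peaks = peaks[f[peaks] > f_start]
    pf = f[peaks]
    # Assumed boundaries: geometric mean between neighbouring peak frequencies
    edges = np.concatenate([[f_start], np.sqrt(pf[:-1] * pf[1:]), [f[-1]]])
    groups = [(pf[i], edges[i], edges[i + 1], mag_db[peaks[i]])
              for i in range(len(pf))]
    groups.sort(key=lambda g: g[3], reverse=True)  # higher peak fitted first
    return groups
```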
Next, (16) and (17) were used to estimate the model parameter, under the following specific conditions:
The peak with the higher magnitude was identified first.
The maximum order of each partition was limited to twice the number of peaks in it.
Each peak corresponds to two orders.
The minimum order of each partition was set to second order.
The resulting poles and zeros must lie in the left half-plane of the s-domain.
The resulting poles and zeros must lie within the fitting frequency range.
The poles and zeros must form complex conjugate pairs. This condition was set so that the resonance model preserves the overall phase slope of the second-order standard form plus delay time model in (21).
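The order limits and acceptance conditions listed above can be checked mechanically for each candidate section. This sketch assumes each candidate is given as numerator and denominator polynomials in $s$ (highest power first) and interprets "within the fitting frequency range" as the natural frequency $|p|/2\pi$ of every pole and zero lying inside the partition band; the function names are hypothetical.

```python
import numpy as np


def allowed_orders(n_peaks):
    """Even model orders from 2 up to 2 * n_peaks (two orders per peak)."""
    return list(range(2, 2 * max(n_peaks, 1) + 1, 2))


def passes_conditions(num, den, f_lo, f_hi):
    """Check a candidate section: LHP, complex roots, natural frequencies in band.

    num, den: polynomial coefficients in s, highest power first (np.roots order).
    """
    roots = np.concatenate([np.roots(num), np.roots(den)])
    if np.any(roots.real >= 0):
        return False                 # a pole or zero is not in the open left half-plane
    # Real-coefficient polynomials give conjugate pairs automatically,
    # so it is enough to reject real roots here
    if np.any(np.abs(roots.imag) < 1e-12 * np.abs(roots)):
        return False
    fn = np.abs(roots) / (2 * np.pi)  # natural frequency of each root in Hz
    return bool(np.all((fn >= f_lo) & (fn <= f_hi)))
```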
The resonance groups were fitted in order of decreasing peak magnitude, as indicated by the numbers at the top of each peak in Figure 11. The first resonance group identified was around 1200 Hz, and the last was around 4000 Hz.
The fitting result of the resonance model is shown in Figure 12, and the model parameters are given in (23) to (28). Several resonance peaks were identified as second-order sections with complex conjugate poles and zeros, and the phase response of the resonance model was maintained at 0 degrees.
The combination of the second-order standard form plus delay time with the resonance model is shown in Figure 13. The model error was 114.
2.6. Resonance Gain Adjustment
The combined second-order standard form plus delay time and resonance model in Figure 13 shows a resonance gain that is slightly too small and decreases at high frequencies. We improved the model by fine-tuning the resonance gain while maintaining the phase response: (2) was modified to (29) by introducing a resonance gain. Because only a one-dimensional search was required, we again used GSM, with the minimum and maximum resonance gains set to 0.1 dB and 2 dB, respectively. The best resonance gain was 1.5 dB, which reduced the model error to 93. The actual and model responses are compared in Figure 14.
The final model parameters are shown in (30). The model is of 22nd order, with a small time delay.
Because the signal writing machine uses a digital PID controller, the plant model in (30) must be converted from continuous time to discrete time. This conversion was performed with the C2D (continuous-to-discrete) method using a zero-order hold (ZOH), and the resulting discrete-time model is given in (31).
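The ZOH conversion can be sketched without MATLAB using the standard block-matrix exponential identity on a state-space realization. In this NumPy-only sketch the helper names are hypothetical, and a truncated Taylor series with scaling and squaring stands in for a library matrix exponential. It covers the rational part of the model; a transport delay that is not an integer number of samples needs separate treatment, which is not shown.

```python
import numpy as np


def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    k = max(0, int(np.ceil(np.log2(norm / 0.5)))) if norm > 0 else 0
    Ms = M / 2.0**k                  # scale so that ||Ms||_1 <= 0.5
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for i in range(1, terms + 1):
        term = term @ Ms / i         # Ms^i / i!
        E = E + term
    for _ in range(k):               # undo the scaling: exp(M) = exp(Ms)^(2^k)
        E = E @ E
    return E


def c2d_zoh(A, B, T):
    """Zero-order-hold discretization of x' = A x + B u with sample time T.

    Uses expm([[A, B], [0, 0]] * T) = [[Ad, Bd], [0, I]].
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.asarray(B, dtype=float).reshape(A.shape[0], -1)
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    E = expm_taylor(M * T)
    return E[:n, :n], E[:n, n:]
```

ZOH discretization preserves the DC gain, which makes a convenient sanity check. With SciPy available, `scipy.signal.cont2discrete((num, den), dt, method='zoh')` performs the same conversion directly on a transfer function.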
The flowchart of the system identification method, including the adaptive weighted least squares and the peak detector for the partitioned resonance frequency model, is shown in Figure 15.
2.7. Frequency and Time Response Analysis
In the manufacturing process, even though the same type of signal writing machine is used, differences in its characteristics can result in varying dynamic response performance.
Figures 16 to 20 compare the mechanical FRF and PES of three classes of signal writing machines with those of class 0, which is defined as a healthy machine that produces good HDDs during the signal writing operation.
For class 1, the magnitude plot in Figure 16a shows the DC gain increasing from 75 to 83 compared with the healthy machine. Equations (32) and (33) give the second-order standard form models of the healthy machine and the class 1 machine, respectively. The PES in Figure 16b shows high variation as the HSA seeks from the OD to the ID, as indicated by the dashed line. According to the repair history, the sensor voltage drops from more than 1.5 V to 0.5 V, as shown by the red line in Figure 17. This machine is categorized as having a sensor fault, which cannot be compensated because the sensor itself has degraded. Such machines are shut down for repair.
For class 2, Figure 18a shows that the first resonance frequency has shifted from around 1230 Hz to 665 Hz, resulting in an increased first resonance magnitude. The natural frequency of the second-order standard form model has moved to around 110 Hz, which is close to the HDD spindle motor speed of 120 Hz (7200 rpm). Equations (34) and (35) give the first resonance models of the healthy machine and the class 2 machine, respectively. The PES in Figure 18b exhibits a slow response and large oscillations when seeking starts from the OD, as indicated by the dashed line. The DFT is calculated from the PES at servo samples 200 to 400, which lie in the OD zone, and the DFT control band is 600 to 1500 Hz; these oscillations cause the DFT to exceed the control limit. The machine has a loose pushpin, a mechanical degradation that cannot be compensated, so it is shut down for repair. Figure 19 compares a healthy pushpin with a loose pushpin.
For class 3, Figure 20a shows an FRF comparable to that of the healthy machine, with a slightly higher first resonance frequency. On closer examination in Figure 21, the gain crossover frequency is slightly lower than that of the healthy machine, as indicated by the dashed arrow, and the phase margin tends to be higher, leading to a lower closed-loop bandwidth and a larger damping ratio. With the same controller gains, this class therefore responds more slowly than the healthy machine. Equations (36) and (37) give the first resonance models of the healthy machine and the class 3 machine, respectively. The PES in Figure 20b shows a slower response than that of the healthy machine, as indicated by the dashed line. This class exhibits only a slight performance deviation and can be optimized by adjusting the controller gains to meet frequency domain stability criteria and reduce the PES.
The DFT results of each machine class are compared with those of the healthy machine in Figure 22. The healthy class shows the minimum amplitude at each harmonic. In the sensor fault class (red line), a high amplitude is observed around 1100 Hz, which is considered system noise because it lies beyond the closed-loop design bandwidth. In the pushpin-loose class (yellow line), a high amplitude is observed around 700 Hz, which is attributed to the first resonance. In the tunable class (green markers), high amplitudes are observed at low frequencies, which are attributed to the slow system response.
2.8. Tester Classification: ANN Structure
The three degraded machine classes deviate from the healthy machine in different ways, and each calls for a different decision, so the classes must be separated according to their deviations. Class 1 and class 2 machines require maintenance, while class 3 machines are tunable. To address this issue, the authors propose a machine learning method that classifies machines based on their characteristics.
The authors use an artificial neural network (ANN) built with the Keras Python package to classify machines into different classes. The ANN has 10 input nodes, representing the selected features. To achieve high accuracy, the authors initially experimented with a large number of nodes in each layer, using a 200–500–200 configuration for the three hidden layers. The rectified linear unit (ReLU) activation function is applied in these hidden layers.
The output layer of the ANN consists of 4 nodes, representing the 4 machine classes. The SoftMax activation function is used in the output layer to generate class probabilities. Figure 23 illustrates the structure of the ANN.
During training, the network is trained for 200 epochs with a batch size of 10, using the Adam optimizer with a learning rate of 0.01 and the categorical cross-entropy loss function.
The ReLU activation function is defined in (38), and the SoftMax activation function in (39). These activation functions introduce non-linearity and produce class probabilities suitable for classification.
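To make the ReLU and SoftMax definitions concrete, the sketch below implements them, a forward pass through a fully connected 10–200–500–200–4 network, and the categorical cross-entropy loss in NumPy. It only illustrates the layer shapes and activations described above; the random initialization is an assumption, no training is performed, and the function names are hypothetical.

```python
import numpy as np


def relu(z):
    """ReLU: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)


def softmax(z):
    """SoftMax over the last axis; subtracting the row maximum avoids overflow."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_params(sizes, rng):
    """Random (assumed He-style) weights and zero biases for each dense layer."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """ReLU on the hidden layers, SoftMax on the output layer."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)


def categorical_cross_entropy(y_onehot, p, eps=1e-12):
    """Mean categorical cross-entropy between one-hot labels and probabilities."""
    return float(-np.mean(np.sum(y_onehot * np.log(p + eps), axis=-1)))
```

In Keras, the same stack would typically be declared with `keras.layers.Dense` layers (200, 500, and 200 units with `activation="relu"`, then 4 units with `activation="softmax"`), compiled with `keras.optimizers.Adam(learning_rate=0.01)` and `loss="categorical_crossentropy"`, and trained with `model.fit(..., epochs=200, batch_size=10)`.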
2.9. Data Collection and Preparation
In machine learning, data collection and preparation are crucial. This study collected over 200 features from 4 classes of signal writing machines. The raw data for 486 drives were obtained from the manufacturing log file. Class 0, representing the healthy condition, has 198 drives in its dataset. Class 1, representing a sensor fault machine, has 106 drives. Class 2, representing a pushpin-loose machine, has 78 drives. Finally, class 3, representing a tunable machine, has 104 drives.
The data were first shuffled and then cleaned using a modified version of the FeatureSelector Python package [26] that supports multiclass data. The cleaning process removed features with more than 20% missing values, features with a single unique value, and features with a correlation of 90% or higher with another feature. The features were then ranked using LightGBM, and Figure 24 displays the 10 most important operating parameters, which correlate highly with the label. These parameters are used as the input features.
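The three cleaning rules can be sketched in NumPy as follows. This is not the FeatureSelector package or the authors' modification of it: it assumes missing values are stored as NaN, computes correlations on pairwise-complete rows, and, for a pair of features correlated at or above the threshold, drops the later one. The function name is hypothetical, and the LightGBM ranking step is not shown.

```python
import numpy as np


def clean_features(X, names, missing_thresh=0.2, corr_thresh=0.9):
    """Drop features with >20% missing, a single unique value, or
    |correlation| >= 0.9 with an earlier feature. NaN marks missing data."""
    X = np.asarray(X, dtype=float)
    keep = []
    for j, name in enumerate(names):
        col = X[:, j]
        if np.isnan(col).mean() > missing_thresh:
            continue                              # too many missing values
        if np.unique(col[~np.isnan(col)]).size <= 1:
            continue                              # single unique value
        keep.append(j)
    dropped = set()
    for idx, j in enumerate(keep):
        for i in keep[:idx]:                      # upper-triangle scan
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if ok.sum() > 2 and abs(np.corrcoef(X[ok, i], X[ok, j])[0, 1]) >= corr_thresh:
                dropped.add(j)                    # collinear with an earlier feature
                break
    return [names[j] for j in keep if j not in dropped]
```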
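The DFT in the frequency domain is calculated from the raw time-domain PES data by (40), and only the DFT amplitude is considered (41):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \tag{40}$$

$$\lvert X(k) \rvert = \sqrt{\operatorname{Re}\{X(k)\}^{2} + \operatorname{Im}\{X(k)\}^{2}} \tag{41}$$

where
$X(k)$ is the DFT, which includes information on both amplitude and phase;
$x(n)$ is the PES value in the time domain;
$N$ is the number of samples;
$n$ is the current sample;
$k$ is the harmonic index.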
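Equations (40) and (41) can be evaluated directly with the O(N²) sum below. It is a sketch for checking the definitions; in practice an FFT routine such as `numpy.fft.fft` gives the same values much faster. The PES trace here is a hypothetical cosine, chosen because its DFT amplitude is known in closed form (N/2 at its own harmonic).

```python
import cmath
import math

def dft_amplitude(x):
    """Return |X(k)| for k = 0..N-1 using the direct DFT sum of (40) and the modulus of (41)."""
    n_samples = len(x)
    amps = []
    for k in range(n_samples):
        X_k = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        amps.append(abs(X_k))    # abs() of a complex number is sqrt(Re^2 + Im^2)
    return amps

# Hypothetical PES trace: a unit cosine completing 3 cycles in 32 samples
N = 32
pes = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
amps = dft_amplitude(pes)
print(round(amps[3], 6))   # 16.0  (N/2 for a unit-amplitude cosine)
```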
1. DFT_Euclidean_Low: Represents the Euclidean norm (43) of the DFT harmonics at low frequencies between 600 and 1500 Hz. The symptoms associated with this feature are observed in the frequency- and time-response analysis in Section 2.7. A high DFT value can be attributed to three symptoms:
   - High noise, as shown in Figure 16, where the model exhibits an overall increase in magnitude, including noise.
   - High variation at the start of movement, as shown in Figure 18, where the model’s first resonance shifts to a lower frequency.
   - Slow response due to low bandwidth resulting from the reduction of complex pole/zero cancellation, as shown in Figure 20.
   The histogram in Figure 25a indicates significant variation in the sensor fault class, while the boxplot in Figure 25b shows that the median value of the healthy class is significantly lower than that of the other classes; outliers are marked with diamond symbols. This feature plays a crucial role in distinguishing the healthy class from the other classes.
2. DFT_Euclidean_High: Represents the Euclidean norm (43) of the DFT harmonics at high frequencies between 1500 and 3000 Hz. This feature is also important for distinguishing the healthy class from the other classes. Its interpretation is similar to that of the first feature, as discussed in Section 2.7, but it focuses on the higher frequency band. The histogram in Figure 26a indicates significant variation in the sensor fault class, while the boxplot in Figure 26b shows that the median value of the healthy class is significantly lower than that of the other classes.
3. Harmonic3: Represents the amplitude of the third DFT harmonic (41). Because of the nature of a low-bandwidth system, a slow response results in a high DFT magnitude at low frequencies, as shown in Figure 22. The histogram in Figure 27a shows narrow variation in the healthy, sensor fault, and pushpin-loose classes, and the boxplot in Figure 27b reveals that the median of the tunable class is higher than that of the other classes. This indicates that Harmonic3 has high potential for distinguishing the tunable class from the other classes.
4. PES_Range: Represents the range of PES peaks, in track units, from OD to ID. A high value indicates a large spread of position error in the system or high overshoot, which can be attributed to a low closed-loop damping ratio when the plant described in (30) is operated in closed loop. Note, however, that some unit conversions and the truncation of floating-point values in reporting may affect accuracy. Figure 28a,b shows no obvious symptom that would allow this feature alone to separate the machine classes; however, it can increase accuracy when combined with other features.
5. FF_Factor: Represents the feed-forward gain adjusted in the feed-forward controller to minimize PES during mechanical HSA warmup. If the system exhibits the smooth slope of a second-order system at low frequency, an appropriate FF factor can be added to the feed-forward filter to reduce PES. However, if the first resonance affects the low-frequency response, the FF gain cannot be increased further to reduce PES. In the histogram in Figure 29a, the pushpin-loose class exhibits a significantly lower median than the other classes, which is attributed to the first resonance shifting to a lower frequency.
6. PES_STD: Represents the standard deviation (44) of the PES within the range used in the DFT calculation. A high PES_STD can be interpreted as a long settling time, indicating a low closed-loop bandwidth. In Figure 30a, the data from the tunable class and some of the sensor fault class are separated from the healthy and pushpin-loose classes, and the boxplot in Figure 30b shows high potential to distinguish the tunable class from the other classes.
7. Harmonic15: Represents the amplitude of the 15th DFT harmonic (41). A system with low closed-loop bandwidth responds slowly, which results in a high DFT magnitude at low frequencies, as shown in Figure 22. Figure 31a shows slightly higher variation in the sensor fault class, while the boxplot in Figure 31b shows a high median value of the 15th harmonic for the tunable class, indicating that this class can potentially be separated from the others.
8. Harmonic1: Represents the amplitude of the first DFT harmonic (41). Like the other low-order harmonic features, it reflects the slow response of a system with low closed-loop bandwidth. In Figure 32a, the data of the tunable class are separated from the other classes, and the boxplot in Figure 32b shows that the median of the tunable class is clearly distinct from those of the other classes. This indicates high potential for using this feature to separate machine classes.
9. Harmonic6: Represents the amplitude of the sixth DFT harmonic (41). Although the histogram in Figure 33a shows no clear variation, the boxplot in Figure 33b suggests that the data can be separated into two distinct groups: the first consists of the healthy and sensor fault classes, and the second includes the pushpin-loose and tunable classes.
10. Harmonic8: Represents the amplitude of the eighth DFT harmonic (41). In Figure 22, high peaks are visible around 1100 Hz and 2200 Hz, indicating resonance and increased noise. These peaks correspond to a frequency response whose gain increases across the entire frequency range. Figure 34a shows significant variation within the sensor fault class, suggesting the occurrence of faults, and Figure 34b highlights the high magnitude of the eighth harmonic, indicating its potential to distinguish the sensor fault class from the other groups.
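The band features (DFT_Euclidean_Low and DFT_Euclidean_High) can be sketched as the Euclidean norm of the DFT amplitudes whose bin frequency falls in a band. This section does not give the sampling rate or how the band edges are treated, so the following are assumptions: harmonic k sits at k·fs/N Hz, only the one-sided spectrum is used, and each band is half-open [f_lo, f_hi) so that 1500 Hz is not counted twice. Under the same bin mapping, a single-harmonic feature such as Harmonic3 would simply be `amps[3]`. The spectrum below is hypothetical.

```python
import math

def band_euclidean_norm(amps, fs, f_lo, f_hi):
    """Euclidean norm of DFT amplitudes whose bin frequency lies in [f_lo, f_hi).

    amps: DFT amplitudes |X(k)| for k = 0..N-1; fs: sampling rate in Hz (assumed known).
    Only the one-sided spectrum (k <= N/2) is used.
    """
    n_samples = len(amps)
    total = 0.0
    for k in range(n_samples // 2 + 1):
        f_k = k * fs / n_samples          # frequency of harmonic k in Hz
        if f_lo <= f_k < f_hi:
            total += amps[k] ** 2
    return math.sqrt(total)

# Hypothetical 16-point spectrum sampled at 16 kHz, i.e. 1000 Hz per bin
amps = [0.0] * 16
amps[1], amps[2], amps[3] = 3.0, 4.0, 12.0
low = band_euclidean_norm(amps, 16000, 600, 1500)    # bin 1 (1000 Hz)
high = band_euclidean_norm(amps, 16000, 1500, 3000)  # bin 2 (2000 Hz); 3000 Hz excluded
print(low, high)   # 3.0 4.0
```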
Because the 10 operating parameters have different scales and units, the data were normalized using (45) and (46). This normalization ensures that the data have zero mean and unit L2 norm, and it was performed using the normalize Python package. In addition, the output labels were one-hot encoded, a technique commonly used for categorical variables in machine learning.
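Equations (45) and (46) are not reproduced in this section, so the sketch below assumes the most direct reading of the text: each feature column is first centered to zero mean and then divided by its L2 norm. Note that scikit-learn’s `sklearn.preprocessing.normalize` scales each sample (row) to unit norm by default and does not center, so if that function was used, centering would be a separate step. The one-hot helper encodes the four class labels.

```python
import math

def normalize_column(values):
    """Center a feature column to zero mean, then scale it to unit L2 norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm else centered

def one_hot(label, n_classes=4):
    """Encode an integer class label (0-3) as a one-hot list."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec

col = normalize_column([1.0, 2.0, 3.0])
print(col)          # approximately [-0.7071, 0.0, 0.7071]
print(one_hot(2))   # [0, 0, 1, 0]
```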
To assess the model’s performance, the dataset was divided into three groups using the Python train_test_split function. First, 20% of the data were held out for testing; the remaining 80% was then split again so that validation received 20% of the total data (25% of the remainder).
Because of the nature of the manufacturing process, the dataset contains fewer samples from defective machines than from healthy machines. To account for this class imbalance, the “stratify = y” argument was used during data splitting. This preserved the class proportions of the full dataset in each of the three groups (train, validate, and test). By stratifying on the output labels, each resulting dataset had a representative distribution of samples from each class, supporting more reliable model training and evaluation.
The final distribution of the data after splitting was as follows:
Train: 60% (291 drives);
Validate: 20% (97 drives);
Test: 20% (98 drives).
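The reported counts are consistent with a two-stage split, assuming scikit-learn’s rule for a float `test_size` (the test count is rounded up, n_test = ceil(test_size × n)) and a second-stage `test_size` of 0.25: 486 drives become 388/98, then 388 become 291/97. The sketch below checks that arithmetic and shows a simple per-class (stratified) split in plain Python using the class sizes from the manufacturing log. The per-class rounding is a simplification and can differ by a drive or so from scikit-learn’s own allocation; the helper names are hypothetical.

```python
import math
import random
from collections import defaultdict

def split_sizes(n_samples, test_size):
    """(n_train, n_test) under the rule n_test = ceil(test_size * n_samples)."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def stratified_split(labels, test_size, seed=0):
    """Return sorted (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        k = round(test_size * len(idx))    # per-class share of the test set
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)

# Class sizes from the manufacturing log (classes 0-3)
counts = {0: 198, 1: 106, 2: 78, 3: 104}
labels = [c for c, n in counts.items() for _ in range(n)]

print(split_sizes(486, 0.20))   # (388, 98)
print(split_sizes(388, 0.25))   # (291, 97)

train_idx, test_idx = stratified_split(labels, 0.20)
print({c: sum(labels[i] == c for i in test_idx) for c in counts})   # {0: 40, 1: 21, 2: 16, 3: 21}
```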
4. Discussion
A mathematical model of a signal writing machine in a stable condition has been successfully developed. The model consists of a second-order system with a small delay connected in series with a resonance model. The model parameters were determined using an adaptive weighted least-squares method, with peak detection used to partition the resonances.
The first resonance peak, which is the most critical because it significantly affects the system’s closed-loop response, was identified accurately. If the first resonance frequency falls within the DFT checking band, there is a high risk of failing the DFT limit.
A model that deviates minimally from the baseline can be used in future controller optimization to minimize PES while ensuring stability.
However, because system identification is complex and time-consuming, and because some mechanical degradation cannot be compensated for by adjusting the controller, machine classification is proposed.
The features of the machine learning model were derived from the DFT values at each harmonic, which correlate with the resonance model obtained from system identification.
The performance of the ANN classifier was promising, with 100% accuracy. Cross-validation shows a minimum accuracy of 98% with low variation, indicating that the classifier can effectively distinguish healthy signal writing machines from those in the other classes. Hyperparameter tuning could be used to reduce the network size while maintaining high accuracy.
The tunable class has significant potential for controller gain optimization, whereas the sensor fault and pushpin-loose classes can be used for maintenance planning.
5. Conclusions
This research presents a classification system for machine performance using system identification analysis and an artificial neural network (ANN). The machines are categorized into four classes based on position error symptoms and machine component defects: 0 (healthy machine), 1 (sensor fault), 2 (loose pushpin), and 3 (tunable machine). The transfer function model parameters are analyzed to determine their correlation with the position error response in both the time and frequency domains. Parameters from both domains that show high correlation are used as feature inputs to the ANN, with the machine class serving as the label.
The time-domain position error signal (PES) is transformed to the frequency domain using the DFT at a specific OD seek location. Class 0 represents a healthy machine that exhibits a small PES and no abnormalities in its components; all DFT harmonics are within the pass criterion.
The symptoms of class 1 indicate low voltage in the encoder sensor. The PES shows high variation across the OD-to-ID seek stroke, indicating that noise affects the response. System identification shows that the system gain increases, which amplifies the noise. The DFT shows several high harmonic peaks, so the HDD fails the DFT criterion. Because the controller cannot compensate for this fault, machines in this class must be repaired by replacing the positioner.
The symptoms of class 2 indicate a loose pushpin. The PES shows oscillation in the OD zone. System identification shows that the first resonance frequency of the model shifts from 1230 Hz to 665 Hz, which lies within the closed-loop bandwidth and therefore increases the first resonance response. The DFT shows a high harmonic at the first resonance frequency, so the HDD fails the DFT criterion. Because the controller cannot compensate for this fault, machines in this class must be repaired by replacing the pushpin.
The class 3 PES is smoother than those of classes 1 and 2; however, it decreases more slowly than that of a healthy machine. System identification shows that the first resonance is close to that of the healthy class. However, the gain crossover frequency is lower and the phase margin tends to be larger than those of a healthy machine, which leads to a slower system response. The DFT shows a high magnitude at the first harmonic, so the HDD fails the DFT criterion. No component shows obvious degradation. Preliminary experiments indicate that the response of machines in this class can be improved by optimizing the controller gain.
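The link between a lower gain crossover frequency, a larger phase margin, and a slower response can be illustrated numerically. The sketch uses a hypothetical open loop L(s) = K / (s(s + a)), not the identified plant model of this study: it locates the gain crossover by bisection and reads the phase margin there. Lowering the loop gain K moves the crossover (and hence the bandwidth) down while raising the phase margin, the same pattern reported for class 3.

```python
import cmath
import math

def loop_gain(w, K, a):
    """Hypothetical open-loop response L(jw) = K / (jw * (jw + a))."""
    s = 1j * w
    return K / (s * (s + a))

def crossover_and_margin(K, a, w_lo=1e-6, w_hi=1e6, iters=200):
    """Locate the gain crossover |L(jw)| = 1 by log-scale bisection.

    Returns (crossover frequency in rad/s, phase margin in degrees).
    """
    for _ in range(iters):
        w_mid = math.sqrt(w_lo * w_hi)
        if abs(loop_gain(w_mid, K, a)) > 1.0:   # |L| falls monotonically with w here
            w_lo = w_mid
        else:
            w_hi = w_mid
    w_c = math.sqrt(w_lo * w_hi)
    phase_deg = math.degrees(cmath.phase(loop_gain(w_c, K, a)))
    return w_c, 180.0 + phase_deg

# Same hypothetical plant, two loop gains
wc_high, pm_high = crossover_and_margin(K=math.sqrt(2), a=1.0)
wc_low, pm_low = crossover_and_margin(K=0.5, a=1.0)
print(round(wc_high, 3), round(pm_high, 1))   # 1.0 45.0
print(wc_low < wc_high, pm_low > pm_high)     # True True
```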
The DFT and PES features from specific frequency ranges that correlate with the characteristics of the mathematical model are key parameters for classifying machines. Sets of ten and five features are evaluated. After training over several epochs, the ANN classification results show high accuracy in all confusion matrices.
Machines in classes 1 and 2 are recommended for repair, whereas those in the tunable machine class have the potential for PES improvement by adjusting the controller gain. The mathematical model obtained from the system identification method will be used for further controller optimization.
The machine learning classification method has demonstrated high accuracy, greatly helping users identify faulty parts instead of following a sequential repair process. Furthermore, the model information from the tunable group can be leveraged to optimize the controller, rather than relying solely on a single set of controller gains. This approach offers potential advantages in efficiency and performance optimization.