Article

Real-Time Analysis of Industrial Data Using the Unsupervised Hierarchical Density-Based Spatial Clustering of Applications with Noise Method in Monitoring the Welding Process in a Robotic Cell

Tomasz Blachowicz, Jacek Wylezek, Zbigniew Sokol and Marcin Bondel
1 PROPOINT S.A., R&D Department, Bojkowska 37 R Str., 44-100 Gliwice, Poland
2 Institute of Physics—CSE, Silesian University of Technology, S. Konarskiego 22B Str., 44-100 Gliwice, Poland
* Author to whom correspondence should be addressed.
Information 2025, 16(2), 79; https://doi.org/10.3390/info16020079
Submission received: 28 November 2024 / Revised: 7 January 2025 / Accepted: 20 January 2025 / Published: 22 January 2025

Abstract

The application of modern machine learning methods in industrial settings is a relatively new challenge and remains in the early stages of development. Current computational power enables the processing of vast numbers of production parameters in real time. This article presents a practical analysis of the welding process in a robotic cell using the unsupervised HDBSCAN machine learning algorithm, highlighting its advantages over the classical k-means algorithm. This paper also addresses the problem of predicting and monitoring undesirable situations and proposes the use of the real-time graphical representation of noisy data as a particularly effective solution for managing such issues.


1. Introduction

The increasing popularity of artificial intelligence (AI) methods, especially those in machine learning (ML), has found applications in numerous branches of science and technology [1,2,3,4,5,6,7,8,9]. The rapid development of these methods is closely linked to advances in computational power, particularly for big data (BD) processing [10], which has practical applications across various industrial sectors [11]. Examples include food processing technology [12], quality and safety improvement [13], decision support [14], and risk management [15]. Additionally, BD analysis facilitates supply chain continuity [16], material data mining [17], and the prediction of production anomalies [18,19,20,21,22].
A key challenge in BD and AI applications is data visualization, particularly from the perspective of human operators, whether as end-users of software applications or individuals supervising production processes [23,24,25,26,27,28,29,30,31,32,33,34,35,36]. Virtual-reality solutions are of particular interest, as they integrate real-time production data with spatial information about physical objects [37,38,39]. Selecting the appropriate type of production parameter information for workers is essential [40], depending on production activities, such as automated assembly [41], manual assembly [42], or process monitoring and control [43,44]. Though we cannot delve into all related challenges here, issues like the use of virtual reality goggles [45,46,47] and human–machine interfaces (HMIs) [48,49,50] for optimal visualization remain pivotal to industrial progress.
This paper focuses on utilizing the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm in real technological scenarios, aiming to provide operators with informative graphical data to support decision making—a key objective of this study.
Furthermore, it must be noted that ML methods in real-world robotic applications are not yet widely implemented. While robotization is substantial in many industries, its geographic distribution is uneven [51,52]. Historically, industry standards have relied on programmable logic controllers (PLCs) for automation [53,54], leaving industrial sensor technology relatively underdeveloped. As a result, while major industry players generate large volumes of data, their practical use often remains underexplored. Key areas of current BD and ML applications include geoscience [55], urban transport monitoring [56], and database numerical analysis [57,58]. In machine industry contexts, k-means algorithms have been applied to process optimization and anomaly detection in the automotive sector [59,60], while HDBSCAN has seen applications in hydropower plant monitoring [61] and fault detection [62,63,64,65]. A comprehensive review of Industry 4.0, emphasizing BD utilization, is available in [66].
We believe that the primary challenge is not the type or quantity of sensors but overcoming barriers to data usage and visualization. Consequently, a secondary objective of this work is to demonstrate the implementation of the HDBSCAN algorithm [67] for predicting undesirable events, including production breakdowns.
The selection of HDBSCAN as the preferred method is based on an extensive comparison against other clustering methods, including standard k-means [68], MiniBatch k-means [69], Mean-Shift clustering [70], Hierarchical Clustering [71], Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [72], and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [73]. All numerical analyses were conducted using the Scikit-learn Python package ver. 1.4.2 [74], with visualizations generated using the matplotlib library [75].
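For orientation, a minimal sketch of how such a comparison can be assembled in scikit-learn is given below; the input array and all parameter values are illustrative placeholders rather than the settings used in this study.

```python
# Sketch: instantiating the compared clustering methods in scikit-learn (ver. 1.4.x).
# The parameter values below are illustrative placeholders, not the settings used in the study.
import numpy as np
from sklearn.cluster import (KMeans, MiniBatchKMeans, MeanShift,
                             AgglomerativeClustering, Birch, DBSCAN, HDBSCAN)

X = np.random.rand(1200, 2)  # stand-in for the ~1200 recorded (current, energy) points

estimators = {
    "k-means": KMeans(n_clusters=4, n_init="auto"),
    "MiniBatch k-means": MiniBatchKMeans(n_clusters=4, n_init="auto"),
    "Mean-Shift": MeanShift(),
    "Hierarchical": AgglomerativeClustering(n_clusters=4),
    "BIRCH": Birch(n_clusters=4),
    "DBSCAN": DBSCAN(eps=0.1, min_samples=25),
    "HDBSCAN": HDBSCAN(min_cluster_size=30),
}

for name, est in estimators.items():
    labels = est.fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise where supported
    print(f"{name}: {n_clusters} clusters")
```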
This paper is structured as follows: the first section describes the technological context and data origins, with time-domain data presentations. Next, the k-means and DBSCAN algorithms are detailed as precursors to illustrating the practical advantages of HDBSCAN. These algorithms were tested for their ability to discern two situations: a stable system and one displaying instabilities during welding. Furthermore, visual information was evaluated from the practical perspective of operators overseeing production. This paper concludes with a summary of the findings.

2. Industrial Data Collection and Methods of Analysis

2.1. Collecting Data

The presented analysis is based on data obtained during a real welding process in a robotic cell using a Fronius® welding system (TPS500i) manufactured by Fronius GmbH (Wels, Austria) [76]. Data were recorded over a period of 78 min, during which the robot cyclically repeated several welding steps that varied in the length and shape of the welding path. The data were stored at a constant interval of 4.2 s, and a single experiment typically yielded approximately 1200 data points. The experiment was repeated for several welded samples, yielding results of the same nature each time. During operation, the following parameters were recorded: electric current intensity (A), electric power (W), electric voltage of the welding power supply (V), welding time (s), wire feed speed (WFS) (m/min), and system energy consumption (kJ). The welding method used was MIG/MAG (gas metal arc welding) [77]. Heat exchangers for domestic applications, made of ordinary carbon steel, were used as samples for the experiments (Figure 1). The typical length of the weld paths ranged from about 60 mm to 80 mm. There were six welds securing the tubes around their circumferences and eight straight welds attaching the front square surfaces. A typical welding process for these paths was split into single events of varying durations, with the duration of a single event generally not exceeding 6 s.
The Open Platform Communications Unified Architecture (OPC-UA) communication protocol was used to collect data from the welding source, enabling manufacturer-independent data exchange between the machines used in industrial systems. The protocol serves as a base for integrating devices in IoT and implementing Industry 4.0 approaches. The Fronius welding source had an option that allowed real-time recording of the required information into a text file, which was later analyzed. The collected industrial signals are presented in the time domain (Figure 2). As shown, the recorded data clustered into dominant areas, with some outliers that could be considered undesirable. For example, in Figure 2a, there are five dominant regions (“islands”) of data, displayed in detail in the zoomed view of Figure 2b. Additionally, two significant regions appear: one at the beginning of the recording (time = 0), where non-zero values of current are observed, and another with zero-value currents below the islands, recorded for time > 70,000. Other points exhibit random behavior, indicating abnormal data.
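As an illustration only, the sketch below shows one way such a recorded text file could be loaded for analysis; the file name, delimiter, and column names are assumptions, since the exact export format of the welding source is not reproduced here.

```python
# Sketch: loading the welding-source log into a DataFrame.
# File name, delimiter and column names are hypothetical; adapt to the actual export format.
import pandas as pd

COLUMNS = ["time_s", "current_A", "power_W", "voltage_V",
           "welding_time_s", "wfs_m_min", "energy_kJ"]

df = pd.read_csv("welding_log.txt", sep=";", names=COLUMNS, header=0)

# Basic sanity checks: roughly 1200 rows recorded at a constant 4.2 s interval.
print(len(df), "samples")
print(df["time_s"].diff().median(), "s between samples")
```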

2.2. Preliminary Data Clustering

A deeper analysis of the measured parameters can be conducted by excluding the time domain—a standard step in clustering analysis. Modified results are shown in Figure 3. While other combinations of datasets are possible, the presented relations are representative for further analysis. However, this presentation alone does not provide comprehensive insight into undesirable production moments, which is crucial for making corrections during the technological process.
The following section will focus on the relationship observed in Figure 3a, specifically the dependence of energy on the current.

2.3. Methods of Data Analysis

2.3.1. K-Means Algorithm

The classical k-means clustering method divides data into sub-areas by determining a specific number of clusters and their centers, minimizing the sum of distances from individual points to their cluster centers. In this method, the number of centers and the distance metric must be specified by the user. For the presented analysis, the squared Euclidean distance $d_k^i$ between the position $\mathbf{r}_k^i$ of the i-th data point and the center $\mathbf{c}_k$ of the k-th cluster was used, defined as

$$d_k^i(\mathbf{r}_k^i, \mathbf{c}_k) = \left\| \mathbf{r}_k^i - \mathbf{c}_k \right\|^2,$$

and the total sum of distances over all k clusters is minimized as follows:

$$\min_{\{\mathbf{c}_k\}} \sum_k \sum_i d_k^i(\mathbf{r}_k^i, \mathbf{c}_k).$$

Figure 4 (left panel) shows an example of a k-means clustering result between normalized energy and current data. Although normalization is not strictly necessary, it is commonly applied, especially for data from unrelated subject areas. In the right panel, the choice of four clusters is justified using the overall silhouette value (SV) parameter, which indicates how effectively data points are grouped. For the i-th data point from a dataset of N points, the individual silhouette value is defined as

$$SV(i) = \frac{\bar{D}_i - \bar{d}_k^i}{\max\left(\bar{D}_i, \bar{d}_k^i\right)},$$

where $\bar{D}_i$ is the mean distance of the i-th data point to all points in other clusters, and $\bar{d}_k^i$ is the mean distance within the k-th cluster. The overall SV is then calculated as

$$SV = \frac{1}{N} \sum_{i=1}^{N} SV(i).$$
The first minimum in the relationship between the SV and the number of clusters typically determines the optimal cluster count. Our intention was not to review clustering outcomes across different algorithms but rather to focus on separating noisy data from the remaining data points. For comprehensive reviews of various approaches in this field, readers are directed to [78,79].
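A compact sketch of this procedure is given below, assuming a two-column array X holding the raw current and energy values (in that order); the candidate cluster range is illustrative.

```python
# Sketch: k-means clustering with selection of the cluster count via the overall
# silhouette value (SV). Assumes X is a (current, energy) array, as described above.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

X_n = MinMaxScaler().fit_transform(X)   # normalized data, as used in Figure 4

scores = {}
for k in range(2, 9):                   # candidate cluster counts (illustrative range)
    labels = KMeans(n_clusters=k, n_init="auto", random_state=0).fit_predict(X_n)
    scores[k] = silhouette_score(X_n, labels)   # overall SV defined above

best_k = max(scores, key=scores.get)    # one common criterion; the paper inspects the SV-vs-k curve directly
print(scores, "-> chosen k:", best_k)
```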
Clusters also provide meaningful interpretations, as shown in Figure 5, which demonstrates the relationship between consumed energy, applied currents, and welding durations (welding_time_n) across clusters.
While k-means produces clear cluster boundaries (Figure 4)—an advantage of the method—its graphical representation may not quickly convey information about undesirable situations to an operator. Although cluster center observations can offer hints about production quality, they may rely too heavily on human recognition and reflex.

2.3.2. DBSCAN Algorithm

In the next step, we focus on testing the effectiveness of monitoring the production process. This can be achieved using an algorithm that excludes outlier results from clusters, treating these as undesirable noise indicative of problematic technological situations. The DBSCAN approach is an example of such a solution. This algorithm allows the user to define a minimum number of data points, within an assumed range of influence (epsilon), required for a point’s coordinates to be classified as part of a given cluster.
The algorithm classifies data points into three categories: core points, directly reachable points, and non-reachable (noise) points [73]. Core and directly reachable points form data clusters. In our case, the clusters generated by DBSCAN may overlap entirely, and the density at a given position remains unknown.
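A minimal sketch of this classification with the scikit-learn DBSCAN implementation is shown below; the eps and min_samples values are placeholders, and points labeled -1 correspond to the noise category discussed above.

```python
# Sketch: DBSCAN on the normalized (current, energy) data; eps and min_samples are placeholders.
import numpy as np
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.1, min_samples=25).fit(X_n)    # X_n: normalized data, assumed available
labels = db.labels_

core_mask = np.zeros_like(labels, dtype=bool)
core_mask[db.core_sample_indices_] = True        # core points
noise_mask = labels == -1                        # noise (outlier) points

print("clusters:", len(set(labels)) - (1 if -1 in labels else 0),
      "| core points:", core_mask.sum(), "| noise points:", noise_mask.sum())
```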

2.3.3. HDBSCAN Algorithm

While the DBSCAN algorithm offers the basic advantage of distinguishing outlier data (noise) from dominant clusters, the closely related HDBSCAN approach [7,80] extends this capability by identifying variable densities within local clusters. In HDBSCAN, the classification of clustered points versus noise relies not only on the epsilon range but also on the assumed density threshold for data points to be considered cluster members. The density of clustered points provides a useful way to graphically inform an operator about the degree of overlap (density). For example, a point in the figure may have a size proportional to its density.
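The sketch below illustrates this idea using the HDBSCAN implementation available in scikit-learn; the parameter values, and the use of cluster-membership probabilities as a density proxy for marker sizes, are assumptions made for illustration.

```python
# Sketch: HDBSCAN with marker sizes reflecting cluster-membership strength (density proxy).
# min_cluster_size and cluster_selection_epsilon values are illustrative.
import matplotlib.pyplot as plt
from sklearn.cluster import HDBSCAN

hdb = HDBSCAN(min_cluster_size=30,
              cluster_selection_epsilon=0.1).fit(X_n)   # epsilon stands in for the "cut distance"
labels = hdb.labels_                 # -1 marks noise
probs = hdb.probabilities_           # membership strength, used here as a density proxy

plt.scatter(X_n[:, 0], X_n[:, 1], c=labels, s=10 + 40 * probs, cmap="tab10")
plt.xlabel("current (normalized)")
plt.ylabel("energy (normalized)")
plt.title(f"HDBSCAN: {labels.max() + 1} clusters, {(labels == -1).sum()} noise points")
plt.show()
```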
However, we propose an important enhancement in data presentation: generating figures in real time, where the time-dependent number of noisy points serves as a key indicator of the welding process’s stability or instability. Detailed information extracted from such an approach will be discussed in the subsequent paragraphs.
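One possible realization of this real-time indicator, sketched under the assumption that samples arrive sequentially and that re-clustering the accumulated data at every step is computationally affordable, is shown below.

```python
# Sketch: tracking the number of clusters and noise points as data accumulate.
# Re-fits HDBSCAN on the growing window at each step; parameters are illustrative.
import numpy as np
from sklearn.cluster import HDBSCAN

def track_noise(X_stream, min_cluster_size=30, cut_distance=0.1, start=50):
    """Return per-step cluster counts and noise counts for an incrementally growing dataset."""
    n_clusters, n_noise = [], []
    for t in range(start, len(X_stream) + 1):
        labels = HDBSCAN(min_cluster_size=min_cluster_size,
                         cluster_selection_epsilon=cut_distance).fit_predict(X_stream[:t])
        n_clusters.append(labels.max() + 1)        # noise label -1 is excluded automatically
        n_noise.append(int((labels == -1).sum()))
    return np.array(n_clusters), np.array(n_noise)

# clusters, noise = track_noise(X_n)   # X_n: normalized (current, energy) data
```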

3. Analysis of Results

3.1. DBSCAN Results

Figure 6 illustrates four cases of clustering using the DBSCAN algorithm, where the assumed minimum number of cluster members is set to 5, 10, 25, and 50 (described in the figure captions as the number of samples). These cases are presented for an exemplary energy–current dataset to determine the optimal value of the impact range (epsilon range). This analysis helps estimate a rational number of clusters for data visualization and assess the relevant number of noisy points, as evident in the figures.
By comparing these results with the earlier k-means analysis, we can conclude that setting epsilon = 10 and a minimum sample size of 25 or 50 (shown in the lower panels of Figure 6) offers a good balance. Too small an epsilon value causes numerical instability, while too large a value reduces the number of clusters, leading to information loss. Proper selection of these parameters (epsilon and minimum samples) ensures clear distinction between desired data and noise, enabling effective monitoring of production process stability. Figure 7 further demonstrates the clustering results for varying epsilon values (5, 10, 15, and 25), with a consistent minimum sample size of 50.
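The parameter sweep behind Figure 6 and Figure 7 can be reproduced along the following lines; the grids of epsilon and minimum-sample values simply mirror those quoted above, and X stands for the raw energy and current data.

```python
# Sketch: sweeping DBSCAN parameters to balance cluster count against noise level.
from sklearn.cluster import DBSCAN

for eps in (5, 10, 15, 25):                 # impact range, in the units of the raw data
    for min_samples in (5, 10, 25, 50):     # minimum number of cluster members
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)  # X: raw data
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int((labels == -1).sum())
        print(f"eps={eps:>2}, min_samples={min_samples:>2}: "
              f"{n_clusters} clusters, {n_noise} noise points")
```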
These results (compare Figure 4 and Figure 7), particularly for epsilon = 10, effectively highlight relevant regions of the welding process dynamics.

3.2. HDBSCAN Results

The HDBSCAN algorithm enables the identification of clusters with variable densities, graphically represented by circles of varying diameters proportional to the number of samples with identical coordinates. Like DBSCAN, this method uses a separation measure called “cut distance”, equivalent to epsilon in DBSCAN. Additionally, HDBSCAN incorporates an adjustable parameter for the minimum number of points needed to form a cluster’s nucleus. In Figure 8, various cases are presented, where the minimum cluster size is set to 15, 20, 25, 30, 40, and 80. The cut distance is fixed at 0.1 for consistency across analyses.
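Under the same assumptions as in the earlier sketches, the sweep over minimum cluster sizes shown in Figure 8 can be expressed as follows.

```python
# Sketch: sweeping the HDBSCAN minimum cluster size at a fixed cut distance of 0.1.
from sklearn.cluster import HDBSCAN

for mcs in (15, 20, 25, 30, 40, 80):
    labels = HDBSCAN(min_cluster_size=mcs,
                     cluster_selection_epsilon=0.1).fit_predict(X_n)
    print(f"min_cluster_size={mcs:>2}: "
          f"{labels.max() + 1} clusters, {(labels == -1).sum()} noise points")
```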

HDBSCAN Data Analysis

Choosing a minimum cluster size of 30 enables the detection of two clusters at zero-value current, a potentially significant observation about the welding process’s resting state. Additional relationships among data series are shown in Figure 9. Points with higher densities are represented by larger circles, though the uniformity of circle sizes indicates a relatively consistent density within the analyzed welding process.
While HDBSCAN provides more precise information compared to earlier methods, the real-time monitoring of process dynamics—specifically, variations in cluster counts and noisy points—remains ambiguous until the process conclusion. This limitation is addressed in Figure 10, which dynamically tracks these variations for a minimum sample size of 30. In the figure, the initial phase is clearly identifiable, marked by the formation of the first two clusters and a simultaneous drop in the number of noise points. Following this phase, the production process transitions to a four-cluster state, during which the number of noise points increases linearly at a relatively small growth rate of approximately 0.015 points per time step. However, toward the final stages of production, a sudden step in noise levels is observed.
To simulate a hypothetically more unstable production scenario, the minimum sample size was reduced to 15, as shown in Figure 11.
The numbers of clusters and noisy points reveal instabilities, for example at time = 600 and 660. In the latter case, the system transitions from four clusters to six, with an intermediate five-cluster state, as seen more clearly in the noisy data (right panel of Figure 11). Unlike the results in Figure 10, the noise signals here exhibit nonlinear behavior. However, from a practical standpoint, these instabilities had no adverse effects on the welded parts. Figure 11 primarily serves as a simulation of a hypothetical, undesirable situation.
Returning to the case with a minimum sample size of 30, Figure 12 shows noise patterns recorded in real time across different data series. Observing multiple datasets provides additional insights, such as process instabilities in the energy–power relationship around time = 600, which are not evident in other datasets.
To deepen this analysis, Figure 13 presents the first derivative of the noise signal, highlighting two significant instability periods: around time = 620 and nearing the process conclusion (time > 1000).
Both instability periods, the first around time-point 620 (clearly visible in Figure 13b) and the second near the end of the production process at time-points exceeding 1000 (most apparent in Figure 13a,b), can also be detected through direct (x, y) dependency analysis, particularly in the energy vs. power relationships shown in Figure 14 and Figure 15. Both figures illustrate the state prior to the abrupt increase in noise levels (left panels) and during elevated noise levels (right panels).
In both cases, noise spikes correspond to scattered energy consumption values (0–15 kJ), while power fluctuates around 3500 W. These points remain unclassified within the clusters.
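As a closing sketch, the derivative-based indicator of Figure 13 could be computed from the recorded noise counts as follows; the noise array comes from the tracking sketch given earlier, and the spike threshold is an arbitrary illustration rather than the criterion used in the paper.

```python
# Sketch: first derivative of the noise-count signal and a simple spike flag.
# Assumes `noise` is the per-step noise-point count from the tracking sketch above.
import numpy as np

d_noise = np.diff(noise)                     # first derivative (per time step)
threshold = 5 * np.std(d_noise)              # arbitrary spike threshold for illustration
spike_steps = np.flatnonzero(np.abs(d_noise) > threshold)

print("instability candidates at time steps:", spike_steps)
```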

4. Conclusions

In this paper, we demonstrated the practical application of the unsupervised machine learning algorithm HDBSCAN for monitoring the arc welding process. A visually informative representation of the production process was developed, providing valuable insights for operators. The advantages of HDBSCAN were also highlighted in comparison to classical algorithms like k-means and DBSCAN, particularly for identifying potentially problematic situations during the production process.
Our findings underscore the importance of analyzing a combination of diverse data series to accurately assess process flow, especially when dealing with noisy data. In this case, the analysis focused on mechanical and energy-related time-dependent parameters. These two types of data, specific to arc welding process engineering, appear to form a reasonable minimum requirement for addressing the challenge of failure prediction.
By leveraging real-time information from HDBSCAN about the number of clusters and the volume of noisy data signals—as well as their first derivatives—it was possible to clearly identify process instabilities. We emphasize once more that such graphical representations should be simultaneously monitored across several selected combinations of data series for a more comprehensive analysis. Hence, we recommend using real-time data on the number of clusters and the number of noise points as a significant type of graphical information to be implemented in user interface screens.
We hope that the analysis of real industrial data presented here will contribute to broadening the use of practical machine learning (ML) and big data (BD) methodologies in industrial technological processes. In future work, we will aim to integrate supervised methods as a natural extension of the current unsupervised approach, addressing the additional challenge of labeling data, particularly for high-dimensional datasets representing complex technological processes.

Author Contributions

Conceptualization, T.B., J.W. and Z.S.; methodology, T.B. and M.B.; software, T.B. and M.B.; validation, T.B. and J.W.; formal analysis, T.B. and Z.S.; investigation, T.B. and M.B.; resources, J.W. and Z.S.; writing—original draft preparation, T.B.; writing—review and editing, all authors; visualization, T.B.; supervision, all authors; and funding acquisition, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The National Centre for Research and Development, Poland, with grant “Development of energy-efficient, adaptive robotic cells with Industry 4.0 features, dedicated to the creation of a modular, freely complex and interconnected set of production machines or stand-alone operation” (no. POIR.01.01.01-00-1032/19).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

All authors were employed by the company PROPOINT S.A. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Shalev-Shwartz, S.; Ben-David, S. Understanding Machine Learning: From Theory to Algorithms; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
  2. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  3. Aggarwal, C.C.; Reddy, C.K. (Eds.) Data Clustering: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2013. [Google Scholar]
  4. Azzalini, A.; Torelli, N. Clustering via nonparametric density estimation. Stat. Comput. 2007, 17, 71–80. [Google Scholar] [CrossRef]
  5. Kriegel, H.P.; Kröger, P.; Sander, J.; Zimek, A. Density-based clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2009, 1, 231–240. [Google Scholar] [CrossRef]
  6. Campello, R.J.G.B.; Hruschka, E.R.; Sander, J. Clustering based on density measures and automated cluster extraction. Data Min. Knowl. Discov. 2015, 29, 802–830. [Google Scholar]
  7. Campello, R.J.; Moulavi, D.; Zimek, A.; Sander, J. Density-Based Clustering Based on Hierarchical Density Estimates. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Gold Coast, Australia, 14–17 April 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 160–172. [Google Scholar]
  8. Sumit; Gupta, D.; Juneja, S.; Nauman, A.; Hamid, Y.; Ullah, I.; Kim, T.; Tag eldin, E.M.; Ghamry, N.A. Energy Saving Implementation in Hydraulic Press Using Industrial Internet of Things (IIoT). Electronics 2022, 11, 4061. [Google Scholar] [CrossRef]
  9. Ullah, I.; Adhikari, D.; Khan, H.; Anwar, M.S.; Ahmad, S.; Bai, X. Mobile robot localization: Current challenges and future prospective. Comput. Sci. Rev. 2024, 53, 100651. [Google Scholar] [CrossRef]
  10. Liu, C.; Peng, G.; Kong, Y.; Li, S.; Chen, S. Data Quality Affecting Big Data Analytics in Smart Factories: Research Themes, Issues and Methods. Symmetry 2021, 13, 1440. [Google Scholar] [CrossRef]
  11. Silva, N.; Barros, J.; Santos, M.Y.; Costa, C.; Cortez, P.; Carvalho, M.S.; Gonçalves, J.N.C. Advancing Logistics 4.0 with the Implementation of a Big Data Warehouse: A Demonstration Case for the Automotive Industry. Electronics 2021, 10, 2221. [Google Scholar] [CrossRef]
  12. Ding, H.; Tian, J.; Yu, W.; Wilson, D.I.; Young, B.R.; Cui, X.; Xin, X.; Wang, Z.; Li, W. The Application of Artificial Intelligence and Big Data in the Food Industry. Foods 2023, 12, 4511. [Google Scholar] [CrossRef]
  13. Meng, Q.; Peng, Q.; Li, Z.; Hu, X. Big Data Technology in Construction Safety Management: Application Status, Trend and Challenge. Buildings 2022, 12, 533. [Google Scholar] [CrossRef]
  14. Theodorakopoulos, L.; Theodoropoulou, A.; Halkiopoulos, C. Enhancing Decentralized Decision-Making with Big Data and Blockchain Technology: A Comprehensive Review. Appl. Sci. 2024, 14, 7007. [Google Scholar] [CrossRef]
  15. Iglesias, C.A.; Favenza, A.; Carrera, Á. A Big Data Reference Architecture for Emergency Management. Information 2020, 11, 569. [Google Scholar] [CrossRef]
  16. Riad, M.; Naimi, M.; Okar, C. Enhancing Supply Chain Resilience Through Artificial Intelligence: Developing a Comprehensive Conceptual Framework for AI Implementation and Supply Chain Optimization. Logistics 2024, 8, 111. [Google Scholar] [CrossRef]
  17. Chittam, S.; Gokaraju, B.; Xu, Z.; Sankar, J.; Roy, K. Big Data Mining and Classification of Intelligent Material Science Data Using Machine Learning. Appl. Sci. 2021, 11, 8596. [Google Scholar] [CrossRef]
  18. Elouataoui, W.; El Mendili, S.; Gahi, Y. An Automated Big Data Quality Anomaly Correction Framework Using Predictive Analysis. Data 2023, 8, 182. [Google Scholar] [CrossRef]
  19. Oprea, S.-V.; Bâra, A.; Puican, F.C.; Radu, I.C. Anomaly Detection with Machine Learning Algorithms and Big Data in Electricity Consumption. Sustainability 2021, 13, 10963. [Google Scholar] [CrossRef]
  20. Liu, W.; Lei, P.; Xu, D.; Zhu, X. Anomaly Recognition, Diagnosis and Prediction of Massive Data Flow Based on Time-GAN and DBSCAN for Power Dispatching Automation System. Processes 2023, 11, 2782. [Google Scholar] [CrossRef]
  21. Vladov, S.; Vysotska, V.; Sokurenko, V.; Muzychuk, O.; Nazarkevych, M.; Lytvyn, V. Neural Network System for Predicting Anomalous Data in Applied Sensor Systems. Appl. Syst. Innov. 2024, 7, 88. [Google Scholar] [CrossRef]
  22. Grunova, D.; Bakratsi, V.; Vrochidou, E.; Papakostas, G.A. Machine Learning for Anomaly Detection in Industrial Environments. Eng. Proc. 2024, 70, 25. [Google Scholar] [CrossRef]
  23. McInnes, L.; Healy, J.; Astels, S. HDBSCAN: Hierarchical density-based clustering. J. Open Source Softw. 2017, 2, 205. [Google Scholar] [CrossRef]
  24. Hartigan, J.A.; Wong, M.A. A K-means clustering algorithm. Appl. Stat. 1979, 28, 100–108. [Google Scholar] [CrossRef]
  25. Jain, A.K. Data clustering: 50 years beyond K-means. Patt. Recogn. Lett. 2010, 31, 651–666. [Google Scholar] [CrossRef]
  26. Schubert, E.; Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 2017, 42, 19. [Google Scholar] [CrossRef]
  27. Retiti Diop Emane, C.; Song, S.; Lee, H.; Choi, D.; Lim, J.; Bok, K.; Yoo, J. Anomaly Detection Based on GCNs and DBSCAN in a Large-Scale Graph. Electronics 2024, 13, 2625. [Google Scholar] [CrossRef]
  28. Emmons, S.; Kobourov, S.; Gallant, M.; Börner, K. Analysis of network clustering algorithms and cluster quality metrics at scale. PLoS ONE 2016, 11, e0159161. [Google Scholar] [CrossRef] [PubMed]
  29. Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678. [Google Scholar] [CrossRef]
  30. Sun, W.; Zhou, Z.; Ma, F.; Wang, J.; Ji, C. Industrial Application of Data-Driven Process Monitoring with an Automatic Selection Strategy for Modeling Data. Processes 2023, 11, 402. [Google Scholar] [CrossRef]
  31. Ackermann, M.R.; Blömer, J.; Sohler, C. Clustering for metric and non-metric distance measures. ACM Trans. Algorithms 2010, 6, 1–26. [Google Scholar] [CrossRef]
  32. Hasan, M.M.U.; Hasan, T.; Shahidi, R.; James, L.; Peters, D.; Gosine, R. Lithofacies Identification from Wire-Line Logs Using an Unsupervised Data Clustering Algorithm. Energies 2023, 16, 8116. [Google Scholar] [CrossRef]
  33. Han, X.; Armenakis, C.; Jadidi, M. Modeling Vessel Behaviours by Clustering AIS Data Using Optimized DBSCAN. Sustainability 2021, 13, 8162. [Google Scholar] [CrossRef]
  34. Munguía Mondragón, J.C.; Rendón Lara, E.; Alejo Eleuterio, R.; Granda Gutirrez, E.E.; Del Razo López, F. Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems. Mathematics 2023, 11, 4008. [Google Scholar] [CrossRef]
  35. Yun, K.; Yun, H.; Lee, S.; Oh, J.; Kim, M.; Lim, M.; Lee, J.; Kim, C.; Seo, J.; Choi, J. A Study on Machine Learning-Enhanced Roadside Unit-Based Detection of Abnormal Driving in Autonomous Vehicles. Electronics 2024, 13, 288. [Google Scholar] [CrossRef]
  36. DeMedeiros, K.; Koh, C.Y.; Hendawi, A. Clustering on the Chicago Array of Things: Spotting Anomalies in the Internet of Things Records. Future Internet 2024, 16, 28. [Google Scholar] [CrossRef]
  37. Bano, F.; Alomar, M.A.; Alotaibi, F.M.; Serbaya, S.H.; Rizwan, A.; Hasan, F. Leveraging Virtual Reality in Engineering Education to Optimize Manufacturing Sustainability in Industry 4.0. Sustainability 2024, 16, 7927. [Google Scholar] [CrossRef]
  38. Zhang, H.; Lee, S.; Lu, Y.; Yu, X.; Lu, H. A Survey on Big Data Technologies and Their Applications to the Metaverse: Past, Current and Future. Mathematics 2023, 11, 96. [Google Scholar] [CrossRef]
  39. Tarng, W.; Wu, Y.-J.; Ye, L.-Y.; Tang, C.-W.; Lu, Y.-C.; Wang, T.-L.; Li, C.-L. Application of Virtual Reality in Developing the Digital Twin for an Integrated Robot Learning System. Electronics 2024, 13, 2848. [Google Scholar] [CrossRef]
  40. Martins, N.C.; Marques, B.; Dias, P.; Sousa Santos, B. Expanding the Horizons of Situated Visualization: The Extended SV Model. Big Data Cogn. Comput. 2023, 7, 112. [Google Scholar] [CrossRef]
  41. Burghardt, A.; Szybicki, D.; Gierlak, P.; Kurc, K.; Pietruś, P.; Cygan, R. Programming of Industrial Robots Using Virtual Reality and Digital Twins. Appl. Sci. 2020, 10, 486. [Google Scholar] [CrossRef]
  42. Lewczuk, K.; Żuchowicz, P. Virtual Reality Application for the Safety Improvement of Intralogistics Systems. Sustainability 2024, 16, 6024. [Google Scholar] [CrossRef]
  43. Žemla, F.; Cigánek, J.; Rosinová, D.; Kučera, E.; Haffner, O. Smart Platform for Monitoring and Control of Discrete Event System in Industry 4.0 Concept. Appl. Sci. 2023, 13, 10697. [Google Scholar] [CrossRef]
  44. Caiza, G.; Sanz, R. Digital Twin to Control and Monitor an Industrial Cyber-Physical Environment Supported by Augmented Reality. Appl. Sci. 2023, 13, 7503. [Google Scholar] [CrossRef]
  45. Yang, Y.; Zhong, L.; Li, S.; Yu, A. Research on the Perceived Quality of Virtual Reality Headsets in Human–Computer Interaction. Sensors 2023, 23, 6824. [Google Scholar] [CrossRef] [PubMed]
  46. Muñoz-Saavedra, L.; Miró-Amarante, L.; Domínguez-Morales, M. Augmented and Virtual Reality Evolution and Future Tendency. Appl. Sci. 2020, 10, 322. [Google Scholar] [CrossRef]
  47. Alpala, L.O.; Quiroga-Parra, D.J.; Torres, J.C.; Peluffo-Ordóñez, D.H. Smart Factory Using Virtual Reality and Online Multi-User: Towards a Metaverse for Experimental Frameworks. Appl. Sci. 2022, 12, 6258. [Google Scholar] [CrossRef]
  48. Florescu, A. Digital Twin for Flexible Manufacturing Systems and Optimization Through Simulation: A Case Study. Machines 2024, 12, 785. [Google Scholar] [CrossRef]
  49. Krupas, M.; Kajati, E.; Liu, C.; Zolotova, I. Towards a Human-Centric Digital Twin for Human–Machine Collaboration: A Review on Enabling Technologies and Methods. Sensors 2024, 24, 2232. [Google Scholar] [CrossRef]
  50. Mourtzis, D.; Angelopoulos, J.; Panopoulos, N. The Future of the Human–Machine Interface (HMI) in Society 5.0. Future Internet 2023, 15, 162. [Google Scholar] [CrossRef]
  51. Hetmanczyk, M.P. A Method to Evaluate the Maturity Level of Robotization of Production Processes in the Context of Digital Transformation—Polish Case Study. Appl. Sci. 2024, 14, 5401. [Google Scholar] [CrossRef]
  52. Çiğdem, Ş.; Meidute-Kavaliauskiene, I.; Yıldız, B. Industry 4.0 and Industrial Robots: A Study from the Perspective of Manufacturing Company Employees. Logistics 2023, 7, 17. [Google Scholar] [CrossRef]
  53. Yao, K.-C.; Lin, C.-L.; Pan, C.-H. Industrial Sustainable Development: The Development Trend of Programmable Logic Controller Technology. Sustainability 2024, 16, 6230. [Google Scholar] [CrossRef]
  54. Langmann, R.; Stiller, M. The PLC as a Smart Service in Industry 4.0 Production Systems. Appl. Sci. 2019, 9, 3815. [Google Scholar] [CrossRef]
  55. Pedersen, K.; Jensen, R.R.; Hall, L.K.; Cutler, M.C.; Transtrum, M.K.; Gee, K.L.; Lympany, S.V. K-Means Clustering of 51 Geospatial Layers Identified for Use in Continental-Scale Modeling of Outdoor Acoustic Environments. Appl. Sci. 2023, 13, 8123. [Google Scholar] [CrossRef]
  56. Cesario, E.; Lindia, P.; Vinci, A. Detecting Multi-Density Urban Hotspots in a Smart City: Approaches, Challenges and Applications. Big Data Cogn. Comput. 2023, 7, 29. [Google Scholar] [CrossRef]
  57. Ragazou, K.; Passas, I.; Garefalakis, A.; Galariotis, E.; Zopounidis, C. Big Data Analytics Applications in Information Management Driving Operational Efficiencies and Decision-Making: Mapping the Field of Knowledge with Bibliometric Analysis Using R. Big Data Cogn. Comput. 2023, 7, 13. [Google Scholar] [CrossRef]
  58. Kumar, Y.; Marchena, J.; Awlla, A.H.; Li, J.J.; Abdalla, H.B. The AI-Powered Evolution of Big Data. Appl. Sci. 2024, 14, 10176. [Google Scholar] [CrossRef]
  59. Gadal, S.; Mokhtar, R.; Abdelhaq, M.; Alsaqour, R.; Ali, E.S.; Saeed, R. Machine Learning-Based Anomaly Detection Using K-Mean Array and Sequential Minimal Optimization. Electronics 2022, 11, 2158. [Google Scholar] [CrossRef]
  60. Guerreiro, M.T.; Guerreiro, E.M.A.; Barchi, T.M.; Biluca, J.; Alves, T.A.; de Souza Tadano, Y.; Trojan, F.; Siqueira, H.V. Anomaly Detection in Automotive Industry Using Clustering Methods—A Case Study. Appl. Sci. 2021, 11, 9868. [Google Scholar] [CrossRef]
  61. Choi, W.-H.; Kim, J. Unsupervised Learning Approach for Anomaly Detection in Industrial Control Systems. Appl. Syst. Innov. 2024, 7, 18. [Google Scholar] [CrossRef]
  62. Barrera, J.M.; Reina, A.; Mate, A.; Trujillo, J.C. Fault detection and diagnosis for industrial processes based on clustering and autoencoders: A case of gas turbines. Int. J. Mach. Learn. Cyber. 2022, 13, 3113–3129. [Google Scholar] [CrossRef]
  63. Nelson, W.; Culp, C. Machine Learning Methods for Automated Fault Detection and Diagnostics in Building Systems—A Review. Energies 2022, 15, 5534. [Google Scholar] [CrossRef]
  64. Vijayan, D.; Aziz, I. Adaptive Hierarchical Density-Based Spatial Clustering Algorithm for Streaming Applications. Telecom 2023, 4, 1–14. [Google Scholar] [CrossRef]
  65. Mazzei, D.; Ramjattan, R. Machine Learning for Industry 4.0: A Systematic Review Using Deep Learning-Based Topic Modelling. Sensors 2022, 22, 8641. [Google Scholar] [CrossRef] [PubMed]
  66. Zhang, F.; Guo, J.; Yuan, F.; Qiu, Y.; Wang, P.; Cheng, F.; Gu, Y. Enhancement Methods of Hydropower Unit Monitoring Data Quality Based on the Hierarchical Density-Based Spatial Clustering of Applications with a Noise–Wasserstein Slim Generative Adversarial Imputation Network with a Gradient Penalty. Sensors 2024, 24, 118. [Google Scholar] [CrossRef] [PubMed]
  67. Stewart, G.; Al-Khassaweneh, M. An Implementation of the HDBSCAN* Clustering Algorithm. Appl. Sci. 2022, 12, 2405. [Google Scholar] [CrossRef]
  68. Tabianan, K.; Velu, S.; Ravi, V. K-Means Clustering Approach for Intelligent Customer Segmentation Using Customer Purchase Behavior Data. Sustainability 2022, 14, 7243. [Google Scholar] [CrossRef]
  69. John, J.M.; Shobayo, O.; Ogunleye, B. An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market. Analytics 2023, 2, 809–823. [Google Scholar] [CrossRef]
  70. Trassinelli, M.; Ciccodicola, P. Mean Shift Cluster Recognition Method Implementation in the Nested Sampling Algorithm. Entropy 2020, 22, 185. [Google Scholar] [CrossRef]
  71. Sokhonn, L.; Park, Y.-S.; Lee, M.-K. Hierarchical Clustering via Single and Complete Linkage Using Fully Homomorphic Encryption. Sensors 2024, 24, 4826. [Google Scholar] [CrossRef]
  72. Doğan, Y.; Dalkılıç, F.; Kut, A.; Kara, K.C.; Takazoğlu, U. A Novel Stream Mining Approach as Stream-Cluster Feature Tree Algorithm: A Case Study in Turkish Job Postings. Appl. Sci. 2022, 12, 7893. [Google Scholar] [CrossRef]
  73. Li, X.; Zhang, P.; Zhu, G. DBSCAN Clustering Algorithms for Non-Uniform Density Data and Its Application in Urban Rail Passenger Aggregation Distribution. Energies 2019, 12, 3722. [Google Scholar] [CrossRef]
  74. Available online: https://scikit-learn.org/ (accessed on 7 January 2025).
  75. Available online: https://matplotlib.org/ (accessed on 7 January 2025).
  76. Available online: https://www.fronius.com/en/welding-technology/product-list?filter=11409 (accessed on 7 January 2025).
  77. González-González, C.; Los Santos-Ortega, J.; Fraile-García, E.; Ferreiro-Cabello, J. Environmental and Economic Analyses of TIG, MIG, MAG and SMAW Welding Processes. Metals 2023, 13, 1094. [Google Scholar] [CrossRef]
  78. Nowak-Brzezińska, A.; Gaibei, I. How the Outliers Influence the Quality of Clustering? Entropy 2022, 24, 917. [Google Scholar] [CrossRef] [PubMed]
  79. Kossakov, M.; Mukasheva, A.; Balbayev, G.; Seidazimov, S.; Mukammejanova, D.; Sydybayeva, M. Quantitative Comparison of Machine Learning Clustering Methods for Tuberculosis Data Analysis. Eng. Proc. 2024, 60, 20. [Google Scholar] [CrossRef]
  80. Available online: https://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html (accessed on 7 January 2025).
Figure 1. The heat exchanger used in the experiment. The total length of the sample is 385 mm, with a main square cross-section of 60 mm × 60 mm.
Figure 2. Time-domain representation of the registered data: (a) current, (b) current with the initial phase excluded, (c) power, (d) voltage, (e) welding time, (f) wire feed speed (WFS), and (g) energy.
Figure 3. Data representation for classification—various combinations of data series with the time domain excluded: (a) power vs. current, (b) voltage vs. current, (c) welding time vs. current, (d) WFS vs. current, (e) energy vs. current, and (f) energy vs. welding time.
Figure 4. K-means clustering results: (left) normalized energy vs. normalized current and (right) overall silhouette value (SV) vs. the number of clusters.
Figure 5. Classification of data without k-means clustering, showing normalized energy vs. normalized current for various welding durations.
Figure 6. Tuning of the DBSCAN algorithm—results for different minimum numbers of samples constituting a cluster, with adjustments to the attraction region size (epsilon range).
Figure 7. Results for the DBSCAN algorithm: clustering outcomes for different epsilon values (eps) of 5, 10, 15, and 25.
Figure 8. Results for the HDBSCAN algorithm: clustering outcomes for different minimum sample sizes of 15, 20, 25, 30, 40, and 80. The case with 30 samples, marked in red, was selected for further evaluation.
Figure 9. HDBSCAN results, with a minimum sample size set to 30. The number of detected clusters varies between 3, 4, or 5, depending on the data series type, representing the entire production history and final stage of welding.
Figure 10. Real-time HDBSCAN results, tracking the number of clusters and noisy points with a minimum cluster size of 30 and a cut distance of 0.1.
Figure 11. Real-time HDBSCAN results, simulating unstable production conditions with a minimum cluster size of 15 and a cut distance of 0.1.
Figure 12. HDBSCAN results of real-time noise tracking for various data series. Minimum cluster size: 30. Results for various combinations of data series: (a) energy vs. power, (b) energy vs. voltage, (c) energy vs. welding time, and (d) energy vs. WFS.
Figure 13. First derivative of the noise signal for 30-point clusters, showing two key instability periods. Results for various combinations of data series: (a) energy vs. current, (b) energy vs. power, (c) energy vs. voltage, (d) energy vs. welding time, and (e) energy vs. WFS.
Figure 14. Instability detection using HDBSCAN (time = 620). Left: state before noise increase; and right: elevated noise level (highlighted by the red oval).
Figure 15. Instability detection using HDBSCAN (time > 1000). Left: state before noise increase; and right: elevated noise level (highlighted by the red oval).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

