Big Data Intelligence: Methodologies and Applications

22 pages, 7126 KiB

Open Access Article

Exploring Downscaling in High-Dimensional Lorenz Models Using the Transformer Decoder

by Bo-Wen Shen

Mach. Learn. Knowl. Extr. 2024, 6(4), 2161-2182; https://doi.org/10.3390/make6040107 - 25 Sep 2024

Viewed by 1801

Abstract

This paper investigates the feasibility of downscaling within high-dimensional Lorenz models through the use of machine learning (ML) techniques. This study integrates atmospheric sciences, nonlinear dynamics, and machine learning, focusing on using large-scale atmospheric data to predict small-scale phenomena through ML-based empirical models. The high-dimensional generalized Lorenz model (GLM) was utilized to generate chaotic data across multiple scales, which was subsequently used to train three types of machine learning models: a linear regression model, a feedforward neural network (FFNN)-based model, and a transformer-based model. The linear regression model uses large-scale variables to predict small-scale variables, serving as a foundational approach. The FFNN and transformer-based models add complexity, incorporating multiple hidden layers and self-attention mechanisms, respectively, to enhance prediction accuracy. All three models demonstrated robust performance, with correlation coefficients between the predicted and actual small-scale variables exceeding 0.9. Notably, the transformer-based model, which yielded better results than the others, exhibited strong performance in both control and parallel runs, where sensitive dependence on initial conditions (SDIC) occurs during the validation period. 
This study highlights several key findings and areas for future research: (1) a set of large-scale variables, analogous to multivariate analysis, which retain memory of their connections to smaller scales, can be effectively leveraged by trained empirical models to estimate irregular, chaotic small-scale variables; (2) modern machine learning techniques, such as FFNN and transformer models, are effective in capturing these downscaling processes; and (3) future research could explore both downscaling and upscaling processes within a triple-scale system (e.g., large-scale tropical waves, medium-scale hurricanes, and small-scale convection processes) to enhance the prediction of multiscale weather and climate systems.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)

12 pages, 2111 KiB

Open Access Article

RAAFNet: Reverse Attention Adaptive Fusion Network for Large-Scale Point Cloud Semantic Segmentation

by Kai Wang and Huanhuan Zhang

Mathematics 2024, 12(16), 2485; https://doi.org/10.3390/math12162485 - 12 Aug 2024

Viewed by 980

Abstract

Point cloud semantic segmentation is essential for comprehending and analyzing scenes. However, performing semantic segmentation on large-scale point clouds presents challenges, including high memory requirements, a lack of structured data, and the absence of topological information. This paper presents a novel method based on the Reverse Attention Adaptive Fusion network (RAAFNet) for segmenting large-scale point clouds. RAAFNet consists of a reverse attention encoder–decoder module, an adaptive fusion module, and a local feature aggregation module. The reverse attention encoder–decoder module is applied to extract point cloud features at different scales. The adaptive fusion module enhances fine-grained representation within multi-resolution feature maps. Furthermore, a local aggregation classifier is introduced, which aggregates the features of neighboring points to the center point in order to leverage contextual information and enhance the classifier’s perceptual capability. Finally, the predicted labels are generated. Notably, our method excels at extracting point cloud features across different dimensions and produces highly accurate segmentation results. Experiments on the Semantic3D dataset yielded an overall accuracy of 89.9% and a mean intersection over union (mIoU) of 74.4%.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)
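
The overall accuracy and mIoU figures quoted in the RAAFNet abstract above are standard segmentation metrics. As a minimal sketch (not the authors' evaluation code), both can be computed from a confusion matrix; the function names and the toy labels below are illustrative assumptions, not taken from the paper.

```python
# Sketch: overall accuracy and mean IoU from a confusion matrix.
# Function names and toy labels are hypothetical, not from the paper.

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts points with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of all points whose predicted class equals the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Toy example: six points, three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
print(f"overall accuracy = {oa:.3f}, mIoU = {miou:.3f}")
```

Because mIoU averages over classes rather than points, it penalizes poor performance on rare classes more than overall accuracy does, which is why the two figures are usually reported together.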

18 pages, 2378 KiB

Open Access Review

A Review of Orebody Knowledge Enhancement Using Machine Learning on Open-Pit Mine Measure-While-Drilling Data

by Daniel M. Goldstein, Chris Aldrich and Louisa O’Connor

Mach. Learn. Knowl. Extr. 2024, 6(2), 1343-1360; https://doi.org/10.3390/make6020063 - 18 Jun 2024

Cited by 1 | Viewed by 2005

Abstract

Measure while drilling (MWD) refers to the acquisition of real-time data associated with the drilling process, including information related to the geological characteristics encountered in hard-rock mining. The availability of large quantities of low-cost MWD data from blast holes compared to expensive and sparsely collected orebody knowledge (OBK) data from exploration drill holes makes the former more desirable for characterizing pre-excavation subsurface conditions. Machine learning (ML) plays a critical role in the real-time or near-real-time analysis of MWD data to enable timely enhancement of OBK for operational purposes. Applications can be categorized into three areas, focused on the mechanical properties of the rock mass, the lithology of the rock, and, relatedly, the estimation of the geochemical species in the rock mass. From a review of the open literature, the following can be concluded: (i) The most important MWD metrics are the rate of penetration (rop), torque (tor), weight on bit (wob), bit air pressure (bap), and drill rotation speed (rpm). (ii) Multilayer perceptron analysis has mostly been used, followed by Gaussian processes and other methods, mainly to identify rock types. (iii) Recent advances in deep learning methods designed to deal with unstructured data, such as borehole images and vibrational signals, have not yet been fully exploited, although this is an emerging trend. (iv) Significant recent developments in explainable artificial intelligence could also be used to better advantage in understanding the association between MWD metrics and the mechanical and geochemical structure and properties of drilled rock.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)

20 pages, 5568 KiB

Open Access Article

Extracting Interpretable Knowledge from the Remote Monitoring of COVID-19 Patients

by Melina Tziomaka, Athanasios Kallipolitis, Andreas Menychtas, Parisis Gallos, Christos Panagopoulos, Alice Georgia Vassiliou, Edison Jahaj, Ioanna Dimopoulou, Anastasia Kotanidou and Ilias Maglogiannis

Mach. Learn. Knowl. Extr. 2024, 6(2), 1323-1342; https://doi.org/10.3390/make6020062 - 18 Jun 2024

Viewed by 1425

Abstract

Apart from providing user-friendly applications that support digitized healthcare routines, the use of wearable devices has proven to increase the independence of patients in a healthcare setting. By applying machine learning techniques to real health-related data, important conclusions can be drawn for unsolved issues related to disease prognosis. In this paper, various machine learning techniques are examined and analyzed for the provision of personalized care to COVID-19 patients with mild symptoms based on individual characteristics and the comorbidities they have, while the connection between the stimuli and predictive results is utilized for the evaluation of the system’s transparency. The results, jointly analyzing wearable and electronic health record data for the prediction of a daily dyspnea grade and the duration of fever, are promising in terms of evaluation metrics even in a specified stratum of patients. The interpretability scheme provides useful insight concerning factors that greatly influenced the results. Moreover, it is demonstrated that the use of wearable devices for remote monitoring through cloud platforms is feasible while providing awareness of a patient’s condition, leading to the early detection of undesired changes and reduced visits for patient screening.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)

14 pages, 4201 KiB

Open Access Article

Solving Contextual Stochastic Optimization Problems through Contextual Distribution Estimation

by Xuecheng Tian, Bo Jiang, King-Wah Pang, Yu Guo, Yong Jin and Shuaian Wang

Mathematics 2024, 12(11), 1612; https://doi.org/10.3390/math12111612 - 21 May 2024

Viewed by 1214

Abstract

Stochastic optimization models always assume known probability distributions about uncertain parameters. However, it is unrealistic to know the true distributions. In the era of big data, with the knowledge of informative features related to uncertain parameters, this study aims to estimate the conditional distributions of uncertain parameters directly and solve the resulting contextual stochastic optimization problem by using a set of realizations drawn from estimated distributions, which is called the contextual distribution estimation method. We use an energy scheduling problem as the case study and conduct numerical experiments with real-world data. The results demonstrate that the proposed contextual distribution estimation method offers specific benefits in particular scenarios, resulting in improved decisions. This study contributes to the literature on contextual stochastic optimization problems by introducing the contextual distribution estimation method, which holds practical significance for addressing data-driven uncertain decision problems.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)
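
The two-step idea in the abstract above (estimate a conditional distribution from features, then optimize over realizations from it) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not say how the conditional distribution is estimated, so a k-nearest-neighbour empirical distribution stands in for it, and a simple single-quantity ordering problem with hypothetical shortage and surplus costs stands in for the paper's energy scheduling case study.

```python
# Sketch of contextual stochastic optimization under stated assumptions:
# 1) approximate the conditional distribution of an uncertain quantity given
#    a feature value by the outcomes of the k nearest historical observations;
# 2) pick the decision minimizing the average cost over those scenarios.
# kNN estimation and the cost model are illustrative, not the paper's method.

def knn_scenarios(history, x_new, k):
    """history: list of (feature, outcome); return outcomes of the k nearest features."""
    ranked = sorted(history, key=lambda pair: abs(pair[0] - x_new))
    return [outcome for _, outcome in ranked[:k]]

def average_cost(q, scenarios, shortage_cost, surplus_cost):
    """Mean cost of committing quantity q when the realized outcome is d."""
    total = 0.0
    for d in scenarios:
        total += shortage_cost * max(d - q, 0) + surplus_cost * max(q - d, 0)
    return total / len(scenarios)

def best_decision(scenarios, shortage_cost, surplus_cost):
    """Search candidate decisions among the scenario values themselves."""
    candidates = sorted(set(scenarios))
    return min(
        candidates,
        key=lambda q: average_cost(q, scenarios, shortage_cost, surplus_cost),
    )

# Hypothetical history of (feature, realized demand) pairs.
history = [(10, 100), (12, 110), (11, 105), (30, 200), (31, 210), (29, 190)]
scenarios = knn_scenarios(history, x_new=30, k=3)
decision = best_decision(scenarios, shortage_cost=3.0, surplus_cost=1.0)
print("scenarios:", sorted(scenarios), "decision:", decision)
```

Restricting candidates to the scenario values is enough here because the cost is piecewise linear in the decision, so a minimizer always lies at one of the scenario points. With shortage three times as costly as surplus, the chosen quantity sits at the upper end of the neighbouring outcomes.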

46 pages, 3360 KiB

Open Access Review

Categorical Data Clustering: A Bibliometric Analysis and Taxonomy

by Maya Cendana and Ren-Jieh Kuo

Mach. Learn. Knowl. Extr. 2024, 6(2), 1009-1054; https://doi.org/10.3390/make6020047 - 7 May 2024

Viewed by 3372

Abstract

Numerous real-world applications apply categorical data clustering to find hidden patterns in the data. The K-modes-based algorithm is a popular algorithm for solving common issues in categorical data, from outlier and noise sensitivity to local optima, utilizing metaheuristic methods. Many studies have focused on increasing clustering performance, with new methods now outperforming the traditional K-modes algorithm. It is important to investigate this evolution to help scholars understand how the existing algorithms overcome the common issues of categorical data. Using a research-area-based bibliometric analysis, this study retrieved articles from the Web of Science (WoS) Core Collection published between 2014 and 2023. This study presents a deep analysis of 64 articles to develop a new taxonomy of categorical data clustering algorithms. This study also discusses the potential challenges and opportunities in possible alternative solutions to categorical data clustering.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)
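
For readers unfamiliar with the traditional K-modes algorithm that the review above takes as its baseline, the following is a minimal sketch assuming the usual formulation: simple matching dissimilarity for assignment and per-attribute most frequent values for the cluster modes. The toy records and the hand-picked initial modes are illustrative assumptions.

```python
# Sketch of basic K-modes for categorical data (assumed textbook formulation):
# assign each record to the nearest mode under simple matching dissimilarity,
# then recompute each mode as the most frequent category per attribute.
from collections import Counter

def matching_distance(a, b):
    """Number of attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, init_modes, max_iter=10):
    modes = [tuple(m) for m in init_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest mode for every record.
        labels = [
            min(range(len(modes)), key=lambda j: matching_distance(r, modes[j]))
            for r in records
        ]
        # Update step: per-attribute most frequent value within each cluster.
        new_modes = []
        for j, old_mode in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if not members:
                new_modes.append(old_mode)  # keep the mode of an empty cluster
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break  # converged: modes no longer change
        modes = new_modes
    return labels, modes

# Toy records with three categorical attributes (hypothetical data).
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
labels, modes = k_modes(records, init_modes=[records[0], records[3]])
print(labels, modes)
```

The result depends on the initial modes, which is the local-optimum sensitivity the abstract mentions.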

11 pages, 1606 KiB

Open Access Article

Effective Data Reduction Using Discriminative Feature Selection Based on Principal Component Analysis

by Faith Nwokoma, Justin Foreman and Cajetan M. Akujuobi

Mach. Learn. Knowl. Extr. 2024, 6(2), 789-799; https://doi.org/10.3390/make6020037 - 3 Apr 2024

Cited by 2 | Viewed by 2217

Abstract

Effective data reduction must retain the greatest possible amount of informative content of the data under examination. Feature selection is the default for dimensionality reduction, as the relevant features of a dataset are usually retained through this method. In this study, we used unsupervised learning to discover the top-k discriminative features present in the large multivariate IoT dataset used. We used the statistics of principal component analysis to filter the relevant features based on the ranks of the features along the principal directions while also considering the coefficients of the components. The selected number of principal components was used to decide the number of features to be selected in the SVD process. A number of experiments were conducted using different benchmark datasets, and the effectiveness of the proposed method was evaluated based on the reconstruction error. The potency of the results was verified by subjecting the algorithm to a large IoT dataset, and we compared the performance based on accuracy and reconstruction error to the results of the benchmark datasets. The performance evaluation showed consistency with the results obtained with the benchmark datasets, which were of high accuracy and low reconstruction error.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)
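
The abstract above describes ranking features by their coefficients along the principal directions. The exact ranking rule is not given there, so the following is only a simplified sketch under stated assumptions: it uses the leading principal component alone, found by power iteration on the sample covariance matrix, and ranks features by the absolute value of their loadings. The function names and toy data are hypothetical.

```python
# Simplified sketch of PCA-based feature ranking (not the paper's algorithm):
# rank features by |loading| on the leading principal component only.
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def leading_component(cov, iters=200):
    """Power iteration: repeatedly apply cov and renormalize."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def top_k_features(rows, k):
    """Indices of the k features with the largest absolute loadings."""
    v = leading_component(covariance_matrix(rows))
    ranked = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    return ranked[:k]

# Toy data: feature 0 varies strongly, feature 2 weakly, feature 1 hardly at all.
rows = [
    [1.0, 5.0, 0.2],
    [2.0, 5.1, 0.1],
    [3.0, 4.9, 0.4],
    [4.0, 5.0, 0.3],
    [5.0, 5.1, 0.6],
    [6.0, 4.9, 0.5],
]
selected = top_k_features(rows, k=2)
print("selected feature indices:", selected)
```

Because the covariance is computed on unscaled data, features with larger numeric ranges dominate the loadings. Standardizing each feature first is a common choice when the features are measured in different units.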

20 pages, 1028 KiB

Open Access Review

Luxury Car Data Analysis: A Literature Review

by Pegah Barakati, Flavio Bertini, Emanuele Corsi, Maurizio Gabbrielli and Danilo Montesi

Data 2024, 9(4), 48; https://doi.org/10.3390/data9040048 - 30 Mar 2024

Cited by 1 | Viewed by 6616

Abstract

The concept of luxury, considered a rare and exclusive attribute, is evolving due to technological advances and the increasing influence of consumers in the market. Luxury cars have always symbolized wealth, social status, and sophistication. Recently, as technology progresses, the ability and interest to gather, store, and analyze data from these elegant vehicles have also increased. In recent years, the analysis of luxury car data has emerged as a significant area of research, highlighting researchers’ exploration of various aspects that may differentiate luxury cars from ordinary ones. For instance, researchers study factors such as economic impact, technological advancements, customer preferences and demographics, environmental implications, brand reputation, security, and performance. Although the percentage of individuals purchasing luxury cars is lower than that of ordinary cars, the significance of analyzing luxury car data lies in its impact on various aspects of the automotive industry and society. This literature review aims to provide an overview of the current state of the art in luxury car data analysis.

(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)

Journal Name	Impact Factor	CiteScore	Launched Year	First Decision (median)	APC
Big Data and Cognitive Computing (BDCC)	3.7	7.1	2017	25.3 Days	CHF 1800
Data	2.2	4.3	2016	26.8 Days	CHF 1600
Machine Learning and Knowledge Extraction (MAKE)	4.0	6.3	2019	20.8 Days	CHF 1800
Mathematics	2.3	4.0	2013	18.3 Days	CHF 2600
