Topic Editors

Prof. Dr. Liang Zhao
School of Software Technology, Dalian University of Technology, Dalian 116024, China
Dr. Liang Zou
School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
Dr. Boxiang Dong
Department of Computer Science, Montclair State University, Montclair, NJ, USA

Big Data Intelligence: Methodologies and Applications

Abstract submission deadline: closed (31 October 2024)
Manuscript submission deadline: 31 December 2024
Viewed by: 16672

Topic Information

Dear Colleagues,

In the big data era, richer means of collecting and describing data make it much easier than before to gather a wide array of data in various formats. It is important to discover the knowledge hidden in this mass of data through comprehensive understanding and learning, and thereby realize data intelligence that can help humans in many ways, such as intelligent decision-making and predictive services. However, the high-dimensional, heterogeneous, real-time, and low-quality characteristics of the collected data pose great challenges to the design of knowledge discovery methods. If feature learning can be performed effectively on such data to discover hidden knowledge and rules, its potential value and insights can be identified. This, in turn, will provide a comprehensive understanding and a sound decision-making framework based on massive data, realizing true big data intelligence.

This topic seeks high-quality papers from academic and industry researchers in the areas of big data, data mining, machine learning, artificial intelligence, and multimedia analysis that present the most recent advances in methods and applications for realizing big data intelligence. Submissions should be original, unpublished, and novel, reflecting in-depth research. Topics include, but are not limited to:

  • Big Data Theory and Methods;
  • Artificial Intelligence Theory and Methods;
  • Multimodal Data Analysis;
  • Domain Adaptation and Transfer Learning;
  • Deep Learning and Reinforcement Learning;
  • Knowledge Graphs;
  • Natural Language Processing;
  • Cross-modal Index;
  • Uncertainty Data Analysis;
  • Data Reliability Analysis;
  • Medical Big Data Analysis and Application;
  • Industrial Big Data Analysis and Application;
  • Big Data Analysis and Application in Other Fields.

Prof. Dr. Liang Zhao
Dr. Liang Zou
Dr. Boxiang Dong
Topic Editors

Keywords

  • big data
  • artificial intelligence
  • multimodal learning
  • knowledge graphs
  • data reliability

Participating Journals

Journal Name                                      Impact Factor  CiteScore  Launched Year  First Decision (median)  APC
Big Data and Cognitive Computing (BDCC)           3.7            7.1        2017           18 Days                  CHF 1800
Data (data)                                       2.2            4.3        2016           27.7 Days                CHF 1600
Machine Learning and Knowledge Extraction (make)  4.0            6.3        2019           27.1 Days                CHF 1800
Mathematics (mathematics)                         2.3            4.0        2013           17.1 Days                CHF 2600

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of the following benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (8 papers)

22 pages, 7126 KiB  
Article
Exploring Downscaling in High-Dimensional Lorenz Models Using the Transformer Decoder
by Bo-Wen Shen
Mach. Learn. Knowl. Extr. 2024, 6(4), 2161-2182; https://doi.org/10.3390/make6040107 - 25 Sep 2024
Viewed by 1458
Abstract
This paper investigates the feasibility of downscaling within high-dimensional Lorenz models through the use of machine learning (ML) techniques. This study integrates atmospheric sciences, nonlinear dynamics, and machine learning, focusing on using large-scale atmospheric data to predict small-scale phenomena through ML-based empirical models. The high-dimensional generalized Lorenz model (GLM) was utilized to generate chaotic data across multiple scales, which was subsequently used to train three types of machine learning models: a linear regression model, a feedforward neural network (FFNN)-based model, and a transformer-based model. The linear regression model uses large-scale variables to predict small-scale variables, serving as a foundational approach. The FFNN and transformer-based models add complexity, incorporating multiple hidden layers and self-attention mechanisms, respectively, to enhance prediction accuracy. All three models demonstrated robust performance, with correlation coefficients between the predicted and actual small-scale variables exceeding 0.9. Notably, the transformer-based model, which yielded better results than the others, exhibited strong performance in both control and parallel runs, where sensitive dependence on initial conditions (SDIC) occurs during the validation period. 
This study highlights several key findings and areas for future research: (1) a set of large-scale variables, analogous to multivariate analysis, which retain memory of their connections to smaller scales, can be effectively leveraged by trained empirical models to estimate irregular, chaotic small-scale variables; (2) modern machine learning techniques, such as FFNN and transformer models, are effective in capturing these downscaling processes; and (3) future research could explore both downscaling and upscaling processes within a triple-scale system (e.g., large-scale tropical waves, medium-scale hurricanes, and small-scale convection processes) to enhance the prediction of multiscale weather and climate systems.
(This article belongs to the Topic Big Data Intelligence: Methodologies and Applications)

12 pages, 2111 KiB  
Article
RAAFNet: Reverse Attention Adaptive Fusion Network for Large-Scale Point Cloud Semantic Segmentation
by Kai Wang and Huanhuan Zhang
Mathematics 2024, 12(16), 2485; https://doi.org/10.3390/math12162485 - 12 Aug 2024
Viewed by 786
Abstract
Point cloud semantic segmentation is essential for comprehending and analyzing scenes. However, performing semantic segmentation on large-scale point clouds presents challenges, including demanding high memory requirements, a lack of structured data, and the absence of topological information. This paper presents a novel method based on the Reverse Attention Adaptive Fusion network (RAAFNet) for segmenting large-scale point clouds. RAAFNet consists of a reverse attention encoder–decoder module, an adaptive fusion module, and a local feature aggregation module. The reverse attention encoder–decoder module is applied to extract point cloud features at different scales. The adaptive fusion module enhances fine-grained representation within multi-resolution feature maps. Furthermore, a local aggregation classifier is introduced, which aggregates the features of neighboring points to the center point in order to leverage contextual information and enhance the classifier’s perceptual capability. Finally, the predicted labels are generated. Notably, our method excels at extracting point cloud features across different dimensions and produces highly accurate segmentation results. Experimental results on the Semantic3D dataset achieved an overall accuracy of 89.9% and a mIoU of 74.4%.

18 pages, 2378 KiB  
Review
A Review of Orebody Knowledge Enhancement Using Machine Learning on Open-Pit Mine Measure-While-Drilling Data
by Daniel M. Goldstein, Chris Aldrich and Louisa O’Connor
Mach. Learn. Knowl. Extr. 2024, 6(2), 1343-1360; https://doi.org/10.3390/make6020063 - 18 Jun 2024
Viewed by 1614
Abstract
Measure while drilling (MWD) refers to the acquisition of real-time data associated with the drilling process, including information related to the geological characteristics encountered in hard-rock mining. The availability of large quantities of low-cost MWD data from blast holes compared to expensive and sparsely collected orebody knowledge (OBK) data from exploration drill holes make the former more desirable for characterizing pre-excavation subsurface conditions. Machine learning (ML) plays a critical role in the real-time or near-real-time analysis of MWD data to enable timely enhancement of OBK for operational purposes. Applications can be categorized into three areas, focused on the mechanical properties of the rock mass, the lithology of the rock, as well as, related to that, the estimation of the geochemical species in the rock mass. From a review of the open literature, the following can be concluded: (i) The most important MWD metrics are the rate of penetration (rop), torque (tor), weight on bit (wob), bit air pressure (bap), and drill rotation speed (rpm). (ii) Multilayer perceptron analysis has mostly been used, followed by Gaussian processes and other methods, mainly to identify rock types. (iii) Recent advances in deep learning methods designed to deal with unstructured data, such as borehole images and vibrational signals, have not yet been fully exploited, although this is an emerging trend. (iv) Significant recent developments in explainable artificial intelligence could also be used to better advantage in understanding the association between MWD metrics and the mechanical and geochemical structure and properties of drilled rock.

20 pages, 5568 KiB  
Article
Extracting Interpretable Knowledge from the Remote Monitoring of COVID-19 Patients
by Melina Tziomaka, Athanasios Kallipolitis, Andreas Menychtas, Parisis Gallos, Christos Panagopoulos, Alice Georgia Vassiliou, Edison Jahaj, Ioanna Dimopoulou, Anastasia Kotanidou and Ilias Maglogiannis
Mach. Learn. Knowl. Extr. 2024, 6(2), 1323-1342; https://doi.org/10.3390/make6020062 - 18 Jun 2024
Viewed by 1222
Abstract
Apart from providing user-friendly applications that support digitized healthcare routines, the use of wearable devices has proven to increase the independence of patients in a healthcare setting. By applying machine learning techniques to real health-related data, important conclusions can be drawn for unsolved issues related to disease prognosis. In this paper, various machine learning techniques are examined and analyzed for the provision of personalized care to COVID-19 patients with mild symptoms based on individual characteristics and the comorbidities they have, while the connection between the stimuli and predictive results are utilized for the evaluation of the system’s transparency. The results, jointly analyzing wearable and electronic health record data for the prediction of a daily dyspnea grade and the duration of fever, are promising in terms of evaluation metrics even in a specified stratum of patients. The interpretability scheme provides useful insight concerning factors that greatly influenced the results. Moreover, it is demonstrated that the use of wearable devices for remote monitoring through cloud platforms is feasible while providing awareness of a patient’s condition, leading to the early detection of undesired changes and reduced visits for patient screening.

14 pages, 4201 KiB  
Article
Solving Contextual Stochastic Optimization Problems through Contextual Distribution Estimation
by Xuecheng Tian, Bo Jiang, King-Wah Pang, Yu Guo, Yong Jin and Shuaian Wang
Mathematics 2024, 12(11), 1612; https://doi.org/10.3390/math12111612 - 21 May 2024
Viewed by 939
Abstract
Stochastic optimization models always assume known probability distributions about uncertain parameters. However, it is unrealistic to know the true distributions. In the era of big data, with the knowledge of informative features related to uncertain parameters, this study aims to estimate the conditional distributions of uncertain parameters directly and solve the resulting contextual stochastic optimization problem by using a set of realizations drawn from estimated distributions, which is called the contextual distribution estimation method. We use an energy scheduling problem as the case study and conduct numerical experiments with real-world data. The results demonstrate that the proposed contextual distribution estimation method offers specific benefits in particular scenarios, resulting in improved decisions. This study contributes to the literature on contextual stochastic optimization problems by introducing the contextual distribution estimation method, which holds practical significance for addressing data-driven uncertain decision problems.

46 pages, 3360 KiB  
Review
Categorical Data Clustering: A Bibliometric Analysis and Taxonomy
by Maya Cendana and Ren-Jieh Kuo
Mach. Learn. Knowl. Extr. 2024, 6(2), 1009-1054; https://doi.org/10.3390/make6020047 - 7 May 2024
Viewed by 2727
Abstract
Numerous real-world applications apply categorical data clustering to find hidden patterns in the data. The K-modes-based algorithm is a popular algorithm for solving common issues in categorical data, from outlier and noise sensitivity to local optima, utilizing metaheuristic methods. Many studies have focused on increasing clustering performance, with new methods now outperforming the traditional K-modes algorithm. It is important to investigate this evolution to help scholars understand how the existing algorithms overcome the common issues of categorical data. Using a research-area-based bibliometric analysis, this study retrieved articles from the Web of Science (WoS) Core Collection published between 2014 and 2023. This study presents a deep analysis of 64 articles to develop a new taxonomy of categorical data clustering algorithms. This study also discusses the potential challenges and opportunities in possible alternative solutions to categorical data clustering.

11 pages, 1606 KiB  
Article
Effective Data Reduction Using Discriminative Feature Selection Based on Principal Component Analysis
by Faith Nwokoma, Justin Foreman and Cajetan M. Akujuobi
Mach. Learn. Knowl. Extr. 2024, 6(2), 789-799; https://doi.org/10.3390/make6020037 - 3 Apr 2024
Viewed by 1883
Abstract
Effective data reduction must retain the greatest possible amount of informative content of the data under examination. Feature selection is the default for dimensionality reduction, as the relevant features of a dataset are usually retained through this method. In this study, we used unsupervised learning to discover the top-k discriminative features present in the large multivariate IoT dataset used. We used the statistics of principal component analysis to filter the relevant features based on the ranks of the features along the principal directions while also considering the coefficients of the components. The selected number of principal components was used to decide the number of features to be selected in the SVD process. A number of experiments were conducted using different benchmark datasets, and the effectiveness of the proposed method was evaluated based on the reconstruction error. The potency of the results was verified by subjecting the algorithm to a large IoT dataset, and we compared the performance based on accuracy and reconstruction error to the results of the benchmark datasets. The performance evaluation showed consistency with the results obtained with the benchmark datasets, which were of high accuracy and low reconstruction error.

20 pages, 1028 KiB  
Review
Luxury Car Data Analysis: A Literature Review
by Pegah Barakati, Flavio Bertini, Emanuele Corsi, Maurizio Gabbrielli and Danilo Montesi
Data 2024, 9(4), 48; https://doi.org/10.3390/data9040048 - 30 Mar 2024
Viewed by 4619
Abstract
The concept of luxury, considering it a rare and exclusive attribute, is evolving due to technological advances and the increasing influence of consumers in the market. Luxury cars have always symbolized wealth, social status, and sophistication. Recently, as technology progresses, the ability and interest to gather, store, and analyze data from these elegant vehicles has also increased. In recent years, the analysis of luxury car data has emerged as a significant area of research, highlighting researchers’ exploration of various aspects that may differentiate luxury cars from ordinary ones. For instance, researchers study factors such as economic impact, technological advancements, customer preferences and demographics, environmental implications, brand reputation, security, and performance. Although the percentage of individuals purchasing luxury cars is lower than that of ordinary cars, the significance of analyzing luxury car data lies in its impact on various aspects of the automotive industry and society. This literature review aims to provide an overview of the current state of the art in luxury car data analysis.
