Advances in Data Science: Methods, Systems, and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 18083

Special Issue Editors


Dr. Giovanni Malnati
Guest Editor
Department of Control and Computer Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
Interests: mobile and pervasive systems; tracking systems; natural interfaces; data-intensive architectures; data-driven methodologies for cultural heritage

Dr. Genoveva Vargas-Solar
Guest Editor
French National Centre for Scientific Research (CNRS), LIRIS, Campus de la Doua, 25 Avenue Pierre de Coubertin, 69622 Villeurbanne CEDEX, France
Interests: data science pipeline optimization and enactment; data analytics operators; graph analytics pipeline specification and execution on just-in-time architectures; data analytics on multi-scale target architectures; domain-specific query languages for data science queries

Dr. Tania Cerquitelli
Guest Editor
Department of Control and Computer Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
Interests: explainable AI; data science; automated data analytics; machine learning; natural language processing; concept drift methodologies; computational social science

Special Issue Information

Dear Colleagues,

We are pleased to announce a new Special Issue, “Advances in Data Science: Methods, Systems and Applications”, which invites researchers and practitioners from different research areas to share their experience in developing state-of-the-art analytics solutions through new methods, novel architectures and systems, and real-world applications that could benefit from the proposed solutions. Researchers are invited to submit work describing innovative methods, algorithms, and platforms that cover all facets of a data analytics process. Papers detailing industrial implementations of data analytics applications, as well as design and deployment experience reports on the various issues raised by data analytics projects, are particularly welcome. We call for research papers, experience reports, and demonstration proposals covering all aspects of data analytics projects and research activities.

We welcome technical, experimental, and methodological manuscripts, as well as contributions to applied data science, that address the following topics:

  • Advances in data science and data management methods, systems, and applications.
  • Intelligent systems, cyber-physical systems, data engines, IoT platforms, and big data frameworks and architectures.
  • Advances in AI and ML methods, such as deep neural networks, explainable AI, computational intelligence, natural language processing, reinforcement learning models, concept drift management, and augmented reality.
  • Applications in engineering, computer science, and the physical, social, and life sciences, with particular emphasis on ethical issues, fairness, and accountability.
  • Academic and industrial needs, together with future research directions and agendas.

Application scenarios of interest include, but are not limited to, the areas listed under Keywords below.

Dr. Giovanni Malnati
Dr. Genoveva Vargas-Solar
Dr. Tania Cerquitelli
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data management
  • explainable AI
  • machine learning
  • big data architectures
  • applied data science
  • computational social science

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (10 papers)


Research

28 pages, 5711 KiB  
Article
Timed Genetic Process Mining for Robust Tracking of Processes under Incomplete Event Log Conditions
by Yutika Amelia Effendi and Minsoo Kim
Electronics 2024, 13(18), 3752; https://doi.org/10.3390/electronics13183752 - 21 Sep 2024
Viewed by 509
Abstract
In process mining, an event log is a structured collection of recorded events that describes the execution of processes within an organization. The completeness of event logs is crucial for ensuring accurate and reliable process models. Incomplete event logs, which can result from system errors, manual data entry mistakes, or irregular operational patterns, undermine the integrity of these models. Addressing this issue is essential for constructing accurate models. This research aims to enhance process model performance and robustness by transforming incomplete event logs into complete ones using a process discovery algorithm. Genetic process mining, a type of process discovery algorithm, is chosen for its ability to evaluate multiple candidate solutions concurrently, effectively recovering missing events and improving log completeness. However, the original form of the genetic process mining algorithm is not optimized for handling incomplete logs, which can result in incorrect models being discovered. To address this limitation, this research proposes a modified approach that incorporates timing information to better manage incomplete logs. By leveraging timing data, the algorithm can infer missing events, leading to more accurate process tracking and reconstruction. Experimental results validate the effectiveness of the modified algorithm, showing higher fitness and precision scores, improved process model comparisons, and a good level of coverage without errors. Additionally, several advanced metrics for conformance checking are presented to further validate the process models and event logs discovered by both algorithms. Full article
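The timing-based intuition in the abstract above, namely that unusually long gaps between consecutive timestamps hint at unrecorded events, can be sketched as follows. The trace format, field layout, and gap threshold are illustrative assumptions, not the authors' actual algorithm:

```python
from datetime import datetime, timedelta

def find_suspicious_gaps(trace, max_gap=timedelta(minutes=30)):
    """Flag positions in a trace where the time between consecutive
    events exceeds max_gap, hinting that events may be missing there."""
    gaps = []
    for i in range(len(trace) - 1):
        delta = trace[i + 1][1] - trace[i][1]
        if delta > max_gap:
            gaps.append((i, delta))
    return gaps

# hypothetical trace: (activity name, completion timestamp)
trace = [
    ("register", datetime(2024, 1, 1, 9, 0)),
    ("check",    datetime(2024, 1, 1, 9, 5)),
    # a long silence here may hide an unlogged "approve" step
    ("archive",  datetime(2024, 1, 1, 11, 0)),
]
print(find_suspicious_gaps(trace))
```

A genetic miner could then bias candidate models toward inserting missing activities at the flagged positions.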
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

43 pages, 2556 KiB  
Article
Processing the Narrative: Innovative Graph Models and Queries for Textual Content Knowledge Extraction
by Genoveva Vargas-Solar
Electronics 2024, 13(18), 3688; https://doi.org/10.3390/electronics13183688 - 17 Sep 2024
Viewed by 799
Abstract
The internet contains vast amounts of text-based information across various domains, such as commercial documents, medical records, scientific research, engineering tests, and events affecting urban and natural environments. Extracting knowledge from these texts requires a deep understanding of natural language nuances and accurately representing content while preserving essential information. This process enables effective knowledge extraction, inference, and discovery. This paper presents a critical study of state-of-the-art contributions, exploring the complexities and emerging trends in representing, querying, and analysing content extracted from textual data. This study’s hypothesis states that graph-based representations can be particularly effective when combined with sophisticated querying and analytics techniques. This hypothesis is discussed through the lenses of contributions in linguistics, natural language processing, graph theory, databases, and artificial intelligence. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

16 pages, 1950 KiB  
Article
Anomaly Detection Based on GCNs and DBSCAN in a Large-Scale Graph
by Christopher Retiti Diop Emane, Sangho Song, Hyeonbyeong Lee, Dojin Choi, Jongtae Lim, Kyoungsoo Bok and Jaesoo Yoo
Electronics 2024, 13(13), 2625; https://doi.org/10.3390/electronics13132625 - 4 Jul 2024
Viewed by 1687
Abstract
Anomaly detection is critical across domains, from cybersecurity to fraud prevention. Graphs, adept at modeling intricate relationships, offer a flexible framework for capturing complex data structures. This paper proposes a novel anomaly detection approach, combining Graph Convolutional Networks (GCNs) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). GCNs, specialized deep learning models for graph data, extract meaningful node and edge representations by incorporating graph topology and attribute information. This facilitates learning expressive node embeddings capturing local and global structural patterns. For anomaly detection, DBSCAN, a density-based clustering algorithm effective in identifying clusters of varying densities amidst noise, is employed. By defining a minimum distance threshold and a minimum number of points within that distance, DBSCAN proficiently distinguishes normal graph elements from anomalies. Our approach involves training a GCN model on a labeled graph dataset, generating appropriately labeled node embeddings. These embeddings serve as input to DBSCAN, identifying clusters and isolating anomalies as noise points. The evaluation on benchmark datasets highlights the superior performance of our approach in anomaly detection compared to traditional methods. The fusion of GCNs and DBSCAN demonstrates a significant potential for accurate and efficient anomaly detection in graphs. This research contributes to advancing graph-based anomaly detection, with promising applications in domains where safeguarding data integrity and security is paramount. Full article
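As a rough sketch of the second stage of the pipeline described above, the fragment below runs a minimal pure-Python DBSCAN over toy two-dimensional "embeddings" and flags noise points (label -1) as candidate anomalies. In practice the embeddings would come from a trained GCN and a library implementation such as scikit-learn's DBSCAN would be used; the parameters and data here are illustrative:

```python
def dbscan_noise(points, eps=1.0, min_pts=3):
    """Minimal DBSCAN: assign cluster labels (0, 1, ...) to dense
    regions and -1 to noise points, i.e. candidate anomalies."""
    def neighbors(i):
        # indices within eps of point i (including i itself)
        return [j for j in range(len(points))
                if sum((a - b) ** 2
                       for a, b in zip(points[i], points[j])) <= eps ** 2]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1           # noise: candidate anomaly
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reclaimed by the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core point: keep expanding
        cluster += 1
    return labels

# toy "embeddings": one dense cluster and one isolated outlier
emb = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(dbscan_noise(emb, eps=0.5, min_pts=3))
```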
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

20 pages, 1844 KiB  
Article
APIMiner: Identifying Web Application APIs Based on Web Page States Similarity Analysis
by Yuanchao Chen, Yuliang Lu, Zulie Pan, Juxing Chen, Fan Shi, Yang Li and Yonghui Jiang
Electronics 2024, 13(6), 1112; https://doi.org/10.3390/electronics13061112 - 18 Mar 2024
Viewed by 1328
Abstract
Modern web applications offer various APIs for data interaction. However, as the number of these APIs increases, so does the potential for security threats. Essentially, more APIs in an application can lead to more detectable vulnerabilities. Thus, it is crucial to identify APIs as comprehensively as possible in web applications. However, this task faces challenges due to the increasing complexity of web development techniques and the abundance of similar web pages. In this paper, we propose APIMiner, a framework for identifying APIs in web applications by dynamically traversing web pages based on web page state similarity analysis. APIMiner first builds a web page model based on the HTML elements of the current web page. APIMiner then uses this model to represent the state of the page. Then, APIMiner evaluates each element’s similarity in the page model and determines the page state similarity based on these similarity values. From the different states of the page, APIMiner extracts the data interaction APIs on the page. We conduct extensive experiments to evaluate APIMiner’s effectiveness. In the similarity analysis, our method surpasses state-of-the-art methods like NDD and mNDD in accurately distinguishing similar pages. We compare APIMiner with state-of-the-art tools (e.g., Enemy of the State, Crawlergo, and Wapiti3) for API identification. APIMiner excels in the number of identified APIs (average 1136) and code coverage (average 28,470). Relative to these tools, on average, APIMiner identifies 7.96 times more APIs and increases code coverage by 142.72%. Full article
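Page-state similarity can be approximated in many ways; the fragment below is a hedged sketch, not APIMiner's actual page model, that compares two pages by the Jaccard similarity of their (tag, attribute-name) signatures:

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collect (tag, attribute-name) pairs as a crude page-state signature."""
    def __init__(self):
        super().__init__()
        self.signature = set()

    def handle_starttag(self, tag, attrs):
        self.signature.add((tag, tuple(sorted(name for name, _ in attrs))))

def page_similarity(html_a, html_b):
    """Jaccard similarity of two pages' structural signatures."""
    a, b = TagCollector(), TagCollector()
    a.feed(html_a)
    b.feed(html_b)
    union = a.signature | b.signature
    return len(a.signature & b.signature) / len(union) if union else 1.0

page1 = '<div class="item"><a href="/api/1">one</a></div>'
page2 = '<div class="item"><a href="/api/2">two</a></div>'
page3 = '<form action="/login"><input name="user"></form>'
print(page_similarity(page1, page2))  # structurally identical pages
print(page_similarity(page1, page3))  # different page states
```

A crawler can treat pages above a similarity threshold as the same state, avoiding redundant exploration of near-duplicate listing pages.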
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

12 pages, 2919 KiB  
Article
Aircraft Behavior Recognition on Trajectory Data with a Multimodal Approach
by Meng Zhang, Lingxi Zhang and Tao Liu
Electronics 2024, 13(2), 367; https://doi.org/10.3390/electronics13020367 - 16 Jan 2024
Viewed by 1096
Abstract
Moving traces are essential data for target detection and associated behavior recognition. Previous studies have used time–location sequences, route maps, or tracking videos to establish mathematical recognition models for behavior recognition. The multimodal approach has seldom been considered because of the limited modality of sensing data. With the rapid development of natural language processing and computer vision, multimodal models have become a viable choice for processing multisource data. In this study, we propose a mathematical model for aircraft behavior recognition that jointly exploits multiple data modalities. The proposed model includes feature abstraction, cross-modal fusion, and classification layers for obtaining multiscale features and analyzing multimodal information. Particular attention is paid to self- and cross-relation assessments on the spatiotemporal and geographic data related to a moving object. Both a feedforward network and a softmax function form the classifier. Moreover, we introduce a modality-increasing phase that combines longitude and latitude sequences with related geographic maps to avoid relying on a single data modality. We collected an aircraft trajectory dataset of longitude and latitude sequences for experimental validation. The proposed model, combined with the modality-increasing phase, achieves excellent behavior recognition performance: it reached the highest accuracy of 95.8% among all the adopted methods, demonstrating the effectiveness and feasibility of trajectory-based behavior recognition. Full article
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

24 pages, 825 KiB  
Article
Evaluation Method of IP Geolocation Database Based on City Delay Characteristics
by Yuancheng Xie, Zhaoxin Zhang, Yang Liu, Enhao Chen and Ning Li
Electronics 2024, 13(1), 15; https://doi.org/10.3390/electronics13010015 - 19 Dec 2023
Cited by 2 | Viewed by 1328
Abstract
Despite the widespread use of IP geolocation databases, a robust and precise method for evaluating their accuracy remains elusive. This study presents a novel algorithm designed to assess the reliability of IP geolocation databases, leveraging the congruence of delay distributions across network segments and cities. We developed a fusion reference database, termed CDCDB, to facilitate the evaluation of commercial IP geolocation databases. Remarkably, CDCDB achieves an average positioning accuracy at the city level of 94%, coupled with a city coverage of 99.99%. This allows for an effective and comprehensive evaluation of IP geolocation databases. When compared to IPUU, CDCDB demonstrates an increase in the number of network segments by 18.7%, an increase in the number of high-quality network segments by 13.2%, and an enhancement in the coverage of city-level network segments by 20.92%. The evaluation outcomes reveal that the reliability of IP geolocation databases is not uniform across different cities. Moreover, distinct IP geolocation databases display varying preferences for cities. Consequently, we advise online service providers to select suitable IP geolocation databases based on the cities they cater to, as this could significantly enhance service quality. Full article
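One classic delay-based sanity check that such evaluations build on is the speed-of-light constraint: a round-trip time below the physical minimum for the claimed distance rules the claimed location out. The sketch below illustrates only this basic idea, with an assumed propagation speed in fiber of roughly 200 km/ms; it is not the paper's CDCDB algorithm:

```python
def geolocation_plausible(rtt_ms, distance_km, fiber_speed_km_per_ms=200.0):
    """A claimed location is implausible if the measured round-trip time
    is below the physical minimum for the claimed distance: light in
    fiber covers roughly 200 km per millisecond, one way."""
    min_rtt_ms = 2 * distance_km / fiber_speed_km_per_ms
    return rtt_ms >= min_rtt_ms

# hypothetical probe: a database claims a city ~9700 km away,
# but the measured RTT is only 12 ms, which is physically impossible
print(geolocation_plausible(rtt_ms=12.0, distance_km=9700))  # implausible
print(geolocation_plausible(rtt_ms=12.0, distance_km=900))   # plausible
```

Delay-distribution methods such as the paper's go further, comparing the full RTT distribution of a network segment against reference distributions per city rather than a single bound.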
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

20 pages, 3841 KiB  
Article
High-Level K-Nearest Neighbors (HLKNN): A Supervised Machine Learning Model for Classification Analysis
by Elife Ozturk Kiyak, Bita Ghasemkhani and Derya Birant
Electronics 2023, 12(18), 3828; https://doi.org/10.3390/electronics12183828 - 10 Sep 2023
Cited by 10 | Viewed by 4876
Abstract
The k-nearest neighbors (KNN) algorithm has been widely used for classification analysis in machine learning. However, it suffers from noise samples that reduce its classification ability and therefore prediction accuracy. This article introduces the high-level k-nearest neighbors (HLKNN) method, a new technique for enhancing the k-nearest neighbors algorithm, which can effectively address the noise problem and contribute to improving the classification performance of KNN. Instead of only considering k neighbors of a given query instance, it also takes into account the neighbors of these neighbors. Experiments were conducted on 32 well-known datasets. The results showed that the proposed HLKNN method outperformed the standard KNN method with average accuracy values of 81.01% and 79.76%, respectively. In addition, the experiments demonstrated the superiority of HLKNN over previous KNN variants in terms of the accuracy metric in various datasets. Full article
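The core idea of HLKNN, voting over the k nearest neighbours together with the neighbours of those neighbours, can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' reference implementation; details such as the distance metric, tie handling, and the absence of weighting are assumptions:

```python
from collections import Counter

def hlknn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote over the k nearest neighbours
    AND the k nearest neighbours of each of those neighbours."""
    def knn_indices(point, exclude=()):
        # squared Euclidean distance is enough for ranking
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(point, X_train[i])), i)
            for i in range(len(X_train)) if i not in exclude
        )
        return [i for _, i in dists[:k]]

    first_level = knn_indices(query)
    voters = set(first_level)
    for i in first_level:
        # second level: neighbours of each neighbour, excluding itself
        voters.update(knn_indices(X_train[i], exclude={i}))
    votes = Counter(y_train[i] for i in voters)
    return votes.most_common(1)[0][0]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(hlknn_predict(X, y, (0.4, 0.4)))
```

The enlarged voter set is what dampens the influence of a single noisy neighbour on the final vote.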
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

23 pages, 1520 KiB  
Article
Research and Hardware Implementation of a Reduced-Latency Quadruple-Precision Floating-Point Arctangent Algorithm
by Changjun He, Bosong Yan, Shiyun Xu, Yiwen Zhang, Zhenhua Wang and Mingjiang Wang
Electronics 2023, 12(16), 3472; https://doi.org/10.3390/electronics12163472 - 16 Aug 2023
Cited by 2 | Viewed by 1441
Abstract
In the field of digital signal processing, such as in navigation and radar, a significant number of high-precision arctangent function calculations are required. Lookup tables, polynomial approximation, and single/double-precision floating-point Coordinate Rotation Digital Computer (CORDIC) algorithms are insufficient to meet the demands of practical applications, where both high precision and low latency are essential. In this paper, based on the concept of trading area for speed, a four-step parallel branch iteration CORDIC algorithm is proposed. Using this improved algorithm, a 128-bit quadruple-precision floating-point arctangent function is designed, and the hardware circuit implementation of the arctangent algorithm is realized. The results demonstrate that the improved algorithm can achieve 128-bit floating-point arctangent calculations in just 32 cycles, with a maximum error not exceeding 2×10⁻³⁴ rad. It possesses exceptionally high computational accuracy and efficiency. Furthermore, the hardware area of the arithmetic unit is approximately 0.6317 mm², and the power consumption is about 40.6483 mW under the TSMC 65 nm process at a working frequency of 500 MHz. This design can be well suited for dedicated CORDIC processor chip applications. The research presented in this paper holds significant value for high-precision and rapid arctangent function calculations in radar, navigation, meteorology, and other fields. Full article
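The underlying vectoring-mode CORDIC recurrence can be illustrated in scalar double-precision Python: rotate the vector (x, y) toward the x-axis by elementary angles atan(2⁻ⁱ) and accumulate the applied angles to obtain atan2(y, x) for x > 0. The paper's actual contribution, a four-step parallel branch iteration in 128-bit hardware, is not reproduced here:

```python
import math

# Elementary angles atan(2^-i); a hardware implementation would store
# these in a ROM at the target precision.
ANGLES = [math.atan(2.0 ** -i) for i in range(60)]

def cordic_atan2(y, x, iterations=60):
    """Vectoring-mode CORDIC: drive y toward 0 with shift-and-add
    micro-rotations, accumulating the rotation angle in z."""
    z = 0.0
    for i in range(iterations):
        if y > 0:
            # rotate clockwise by atan(2^-i)
            x, y, z = x + y * 2.0 ** -i, y - x * 2.0 ** -i, z + ANGLES[i]
        else:
            # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - y * 2.0 ** -i, y + x * 2.0 ** -i, z - ANGLES[i]
    return z

print(cordic_atan2(1.0, 1.0))  # ≈ math.pi / 4
```

Each iteration contributes roughly one bit of accuracy, which is why quadruple precision needs on the order of 113 micro-rotations and why the paper's four-way parallel branches cut the cycle count to 32.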
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

34 pages, 2337 KiB  
Article
A Novel Process of Parsing Event-Log Activities for Process Mining Based on Information Content
by Fadilul-lah Yassaanah Issahaku, Xianwen Fang, Sumaiya Bashiru Danwana, Edem Kwedzo Bankas and Ke Lu
Electronics 2023, 12(2), 289; https://doi.org/10.3390/electronics12020289 - 5 Jan 2023
Cited by 1 | Viewed by 1989
Abstract
Process mining has piqued the interest of researchers and technology manufacturers. Process mining aims to extract information from event activities and their interdependencies from events recorded by some enterprise systems. An enterprise system’s transactions are labeled based on their information content, such as an activity that causes the occurrence of another, the timestamp between events, and the resource from which the transaction originated. This paper describes a novel process of parsing event-log activities based on information content (IC). The information content of attributes, especially activity names, which are used to describe the flow processes of enterprise systems, is grouped hierarchically as hypernyms and hyponyms in a subsume tree. The least common subsumer (LCS) values of these activity names are calculated, and the corresponding relatedness values between them are obtained. These values are used to create a fuzzy causal matrix (FCM) for parsing the activities, from which a process mining algorithm is designed to mine the structural and semantic relationships among activities using an enhanced gray wolf optimizer and backpropagation algorithm. The proposed approach is resistant to noisy and incomplete event logs and can be used for process mining to reflect the structure and behavior of event logs. Full article
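A minimal sketch of LCS-based relatedness over a hypernym tree follows; the tree, the activity names, and the Wu–Palmer-style score are illustrative assumptions rather than the paper's exact formulation:

```python
def relatedness(tree, a, b):
    """Wu–Palmer-style relatedness from a hypernym tree: twice the depth
    of the least common subsumer (LCS) divided by the summed depths of
    the two nodes. `tree` maps each node to its parent (root -> None)."""
    def path_to_root(node):
        path = []
        while node is not None:
            path.append(node)
            node = tree[node]
        return path

    def depth(node):
        return len(path_to_root(node))

    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_a = set(pa)
    lcs = next(n for n in pb if n in ancestors_a)  # first shared ancestor
    return 2 * depth(lcs) / (depth(a) + depth(b))

# hypothetical hypernym tree over event-log activity names
tree = {
    "activity": None,
    "payment": "activity",
    "review": "activity",
    "pay_invoice": "payment",
    "pay_refund": "payment",
}
print(relatedness(tree, "pay_invoice", "pay_refund"))  # share the "payment" LCS
print(relatedness(tree, "pay_invoice", "review"))      # only the root in common
```

Scores like these would populate the fuzzy causal matrix, so that semantically close activity names reinforce the same causal relations.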
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

11 pages, 341 KiB  
Article
Theory-Guided Deep Learning Algorithms: An Experimental Evaluation
by Simone Monaco, Daniele Apiletti and Giovanni Malnati
Electronics 2022, 11(18), 2850; https://doi.org/10.3390/electronics11182850 - 9 Sep 2022
Cited by 3 | Viewed by 1945
Abstract
The use of theory-based knowledge in machine learning models has a major impact on many engineering and physics problems. The growth of deep learning algorithms is closely related to an increasing demand for data that are often not accessible or available in many use cases. In this context, the incorporation of physical knowledge or a priori constraints has proven beneficial in many tasks. On the other hand, this collection of approaches is context-specific, and it is difficult to generalize them to new problems. In this paper, we experimentally compare some of the most commonly used theory-injection strategies to perform a systematic analysis of their advantages. Selected state-of-the-art algorithms were reproduced for different use cases to evaluate their effectiveness with smaller training data and to discuss how the underlying strategies can fit into new application contexts. Full article
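A common theory-injection strategy compared in such studies is adding a penalty for violations of a known physical constraint to the data loss. The sketch below assumes a hypothetical monotonicity constraint; the constraint, the weight, and the data are illustrative, not taken from the paper:

```python
def theory_guided_loss(y_pred, y_true, lambda_phys=0.1):
    """Data loss (MSE) plus a physics-based penalty. Here the hypothetical
    prior says predictions must be non-decreasing over the sequence, so
    every decrease between consecutive predictions is penalised."""
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
    violation = sum(
        max(0.0, y_pred[i] - y_pred[i + 1]) ** 2
        for i in range(len(y_pred) - 1)
    )
    return mse + lambda_phys * violation

y_true = [0.0, 1.0, 2.0, 3.0]
good   = [0.1, 1.0, 2.1, 2.9]   # respects the monotonicity prior
bad    = [0.1, 2.0, 1.0, 2.9]   # violates it between steps 2 and 3
print(theory_guided_loss(good, y_true))
print(theory_guided_loss(bad, y_true))
```

Minimising such a combined loss steers the model toward physically consistent solutions even when the training set is small, which is precisely the regime the experiments above evaluate.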
(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)
