New Advances in Data Analytics and Mining

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 9 June 2026 | Viewed by 9852

Special Issue Editor


Prof. Dr. Xibei Yang
Guest Editor
School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
Interests: clustering and classification algorithms; feature selection; rough set; granular computing; machine learning

Special Issue Information

Dear Colleagues,

In recent years, with the rapid development of computer and network technologies, some traditional data analytics and mining techniques are facing unprecedented challenges. Handling large-scale data is now a challenging task in the fields of information science and artificial intelligence. Many new data analytics and mining methods, including deep learning, granular computing, concept lattices and visualization methods, have been developed to address the problem of big data analytics and mining.

This Special Issue provides a platform for researchers to present their novel and unpublished works in the domain of data analytics and mining. We are pleased to invite you, along with the members of your research group, to contribute to the forthcoming Special Issue, entitled “New Advances in Data Analytics and Mining”. Potential topics include, but are not limited to, the following:

  1. Data mining techniques;
  2. Knowledge-based granular data analytics;
  3. Knowledge-based three-way data analytics;
  4. Concept lattice;
  5. Visualization methods;
  6. Deep learning;
  7. Clustering and classification algorithms;
  8. Uncertainty in big data;
  9. Cognitive computing;
  10. Feature selection.

Prof. Dr. Xibei Yang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • data analytics
  • granular computing
  • uncertainty analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)


Research

19 pages, 2688 KiB  
Article
Similarity-Based Three-Way Clustering by Using Dimensionality Reduction
by Anlong Li, Yiping Meng and Pingxin Wang
Mathematics 2024, 12(13), 1951; https://doi.org/10.3390/math12131951 - 24 Jun 2024
Cited by 1 | Viewed by 943
Abstract
Three-way clustering describes each cluster by a core region and a fringe region, dividing the dataset into three parts. The division helps identify the central core and the sparse outer region of a cluster. One of the main challenges in three-way clustering is the meaningful construction of these two sets. Aimed at handling high-dimensional data and improving clustering stability, this paper proposes a novel three-way clustering method. The proposed method uses dimensionality reduction techniques to reduce data dimensions and eliminate noise. On the reduced dataset, random sampling and feature extraction are performed multiple times to introduce randomness and diversity, enhancing the algorithm's robustness. Ensemble strategies are applied to these subsets, and the k-means algorithm is used to obtain multiple clustering results. From these results, we obtain the co-association frequency between samples and a fused clustering result using single-linkage hierarchical clustering. To describe the core region and fringe region of each cluster, the similar class of each sample is defined by co-association frequency. The lower and upper approximations of each cluster are obtained from these similar classes. The samples in the lower approximation of a cluster belong to its core region, and the difference between the upper and lower approximations is defined as its fringe region. A three-way explanation of each cluster is thus formed naturally. Experiments on various UC Irvine Machine Learning Repository (UCI) datasets, evaluated with clustering metrics such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) and Accuracy (ACC), show that the proposed strategy is effective in improving the structure of clustering results.
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)
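To make the core/fringe construction concrete, the sketch below computes co-association frequencies from several base clusterings and derives each cluster's lower approximation (core) and upper-minus-lower difference (fringe) from similar classes. It is a minimal illustration under stated assumptions, not the authors' implementation: it takes the base labelings and the fused labels as given (skipping dimensionality reduction, sampling, k-means and single linkage), and the threshold `alpha` that defines a similar class is a hypothetical parameter.

```python
import numpy as np

def co_association(labelings):
    """Fraction of base clusterings that place samples i and j in the same cluster."""
    labelings = np.asarray(labelings)             # shape (m, n): m base clusterings of n samples
    m, n = labelings.shape
    ca = np.zeros((n, n))
    for labels in labelings:
        ca += labels[:, None] == labels[None, :]  # adds 1 where i and j share a cluster
    return ca / m

def three_way_regions(ca, final_labels, alpha=0.5):
    """Return {cluster: (core, fringe)} using similar classes S(i) = {j : ca[i, j] >= alpha}.

    Core = lower approximation: samples whose whole similar class lies in the cluster.
    Fringe = upper minus lower: samples whose similar class only partly meets the cluster.
    `alpha` is a hypothetical threshold, not a value taken from the paper.
    """
    final_labels = np.asarray(final_labels)
    similar = ca >= alpha                          # row i is the similar class of sample i
    regions = {}
    for c in np.unique(final_labels):
        in_c = final_labels == c
        lower = np.all(~similar | in_c[None, :], axis=1)   # S(i) is a subset of cluster c
        upper = np.any(similar & in_c[None, :], axis=1)    # S(i) intersects cluster c
        core = set(np.flatnonzero(lower).tolist())
        fringe = set(np.flatnonzero(upper & ~lower).tolist())
        regions[int(c)] = (core, fringe)
    return regions

# Toy example: four hypothetical base clusterings of six samples.
base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
        [0, 0, 0, 1, 1, 1]]
ca = co_association(base)
print(three_way_regions(ca, final_labels=[0, 0, 0, 1, 1, 1], alpha=0.25))
# {0: ({0, 1}, {2, 3}), 1: ({4, 5}, {2, 3})}
```

Samples 2 and 3 are grouped together in only one of the four base clusterings, so with this threshold each of them falls in the fringe of both clusters, while samples 0, 1 and 4, 5 form the cores.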

11 pages, 701 KiB  
Article
A Novel Fuzzy Bi-Clustering Algorithm with Axiomatic Fuzzy Set for Identification of Co-Regulated Genes
by Kaijie Xu and Yixi Wang
Mathematics 2024, 12(11), 1659; https://doi.org/10.3390/math12111659 - 26 May 2024
Viewed by 770
Abstract
The identification of co-regulated genes and their Transcription-Factor Binding Sites (TFBSs) is a key step toward understanding transcription regulation. In addition to effective laboratory assays, various bi-clustering algorithms for detecting co-expressed genes have been developed. When applied to gene expression data, bi-clustering methods discover subgroups of genes with similar expression patterns under subsets of experimental conditions that are themselves identified by the algorithm. By building two fuzzy partition matrices of the gene expression data with Axiomatic Fuzzy Set (AFS) theory, this paper proposes a novel fuzzy bi-clustering algorithm for identifying co-regulated genes. Specifically, the gene expression data are first transformed into two fuzzy partition matrices via the sub-preference relations of AFS theory. One matrix treats the genes as the universe and the conditions as the concepts; the other treats the genes as the concepts and the conditions as the universe. The co-regulated genes (bi-clusters) are identified on both partition matrices simultaneously. A novel fuzzy similarity criterion is then defined on the partition matrices, and a cyclic optimization algorithm is designed to discover significant bi-clusters at the expression level. These procedures ensure that the generated bi-clusters have more significant expression values than those extracted by traditional bi-clustering methods. Finally, the proposed method is compared with three well-known bi-clustering algorithms on publicly available real microarray datasets. The experimental results agree with the theoretical analysis and show that the proposed algorithm can effectively detect co-regulated genes without any prior knowledge of the gene expression data.
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)
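A bi-cluster is a submatrix: a subset of genes (rows) paired with a subset of conditions (columns). As a hedged illustration of how such a submatrix can be scored, the sketch below computes the mean squared residue of Cheng and Church, a classical coherence measure from the traditional bi-clustering literature. It is not the AFS-based fuzzy similarity criterion proposed in the paper; the function name and the expression matrix are hypothetical.

```python
import numpy as np

def mean_squared_residue(expr, genes, conditions):
    """Cheng-Church mean squared residue of the submatrix expr[genes, conditions].

    0 means the bi-cluster is perfectly additive (entry = row effect + column effect);
    larger values mean the selected genes are less coherent across the selected conditions.
    """
    sub = np.asarray(expr, dtype=float)[np.ix_(genes, conditions)]
    row_mean = sub.mean(axis=1, keepdims=True)
    col_mean = sub.mean(axis=0, keepdims=True)
    residue = sub - row_mean - col_mean + sub.mean()
    return float(np.mean(residue ** 2))

# Hypothetical expression matrix: rows are genes, columns are conditions.
expr = np.array([[1.0, 2.0, 3.0, 9.0],
                 [2.0, 3.0, 4.0, 0.0],
                 [5.0, 6.0, 7.0, 1.0],
                 [1.0, 0.0, 1.0, 0.0]])
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))        # additive block: residue near 0
print(mean_squared_residue(expr, [0, 1, 2, 3], [0, 1, 2, 3]))  # whole matrix: clearly larger
```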

15 pages, 4026 KiB  
Article
Augmentation of Soft Partition with a Granular Prototype Based Fuzzy C-Means
by Ruixin Wang, Kaijie Xu and Yixi Wang
Mathematics 2024, 12(11), 1639; https://doi.org/10.3390/math12111639 - 23 May 2024
Viewed by 632
Abstract
Clustering is a fundamental cornerstone of unsupervised learning and plays a pivotal role in various data mining techniques. Precise and efficient classification of data is a central focus for many researchers and practitioners. In this study, we design an effective soft partition classification method that refines and extends the prototypes of the well-known Fuzzy C-Means (FCM) clustering algorithm. Specifically, the developed scheme employs membership functions to extend the prototypes into a series of granular prototypes, revealing the structure of the data more deeply. This process softly divides the data into core and extended parts. The core part can be succinctly encapsulated by several information granules, whereas the extended part lacks discernible geometry and requires formal descriptors (such as membership formulas). Our objective is to develop information granules that shape the core structure of the dataset, delineate their characteristics, and explore the interactions among these granules that cause their deformation. The granular prototypes become the main component of the information granules and provide an optimization space for traditional prototypes. We then apply quantum-behaved particle swarm optimization to identify the optimal partition matrix for the data, which significantly enhances the partition performance. Experimental results provide substantial evidence of the effectiveness of the proposed approach.
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)
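For reference, the sketch below shows the standard Fuzzy C-Means iteration whose prototypes the paper extends: it alternates the prototype update v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m with the membership update u_ik ∝ d_ik^(-2/(m-1)). It is a minimal baseline under assumed defaults (fuzzifier m = 2, random initial memberships, Euclidean distance); the granular prototypes and the quantum-behaved particle swarm optimization described above are not implemented.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard FCM; returns prototypes V (c, d) and memberships U (c, n)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                                   # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)       # weighted-mean prototype update
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # (c, n) distances
        d = np.fmax(d, 1e-12)                            # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                    # u_ik proportional to d_ik^(-2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

# Two hypothetical, well-separated blobs.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
V, U = fuzzy_c_means(X, c=2)
print(V[np.argsort(V[:, 0])])  # close to the blob means (1/3, 1/3) and (31/3, 31/3)
```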

13 pages, 8207 KiB  
Article
Enhancement of the Classification Performance of Fuzzy C-Means through Uncertainty Reduction with Cloud Model Interpolation
by Weiwei Mao and Kaijie Xu
Mathematics 2024, 12(7), 975; https://doi.org/10.3390/math12070975 - 25 Mar 2024
Cited by 2 | Viewed by 915
Abstract
As an information granulation technology, clustering plays a pivotal role in unsupervised learning and is a fundamental cornerstone of various data mining techniques. Effective and accurate classification of data is a central focus for many researchers. We argue that, for a given dataset, the classification performance of a clustering method is significantly influenced by uncertain data, particularly data situated at cluster boundaries, and that uncertain data encapsulate richer information than other data: generally, the greater the uncertainty, the more information a data point holds. A comprehensive analysis of this subset of data is therefore of substantial significance. This study characterizes data distribution properties using fuzzy clustering and defines the boundary and non-boundary characteristics (uncertainty and certainty) of the data. To improve classification performance, the strategy focuses on reducing the uncertainty associated with boundary data. The proposed scheme inserts data points using the cloud model, based on the distribution characteristics of the membership functions, to diminish the uncertainty of uncertain data. Building on this, the contribution of boundary data to the prototypes is reassigned to reduce the proportion of uncertain data. The classifier is then optimized under the supervision of data labels (classification error). The overall objective is to use clustering algorithms for classification and thereby improve classification accuracy. Experimental results substantiate the effectiveness of the proposed scheme.
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)
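One simple way to flag the uncertain boundary data discussed above is to inspect each sample's fuzzy memberships: a sample whose two largest memberships are nearly equal lies between clusters. The sketch below uses that gap criterion with a hypothetical threshold `margin`; the paper's own definition of boundary data and its cloud-model interpolation step may differ and are not reproduced here.

```python
import numpy as np

def boundary_samples(U, margin=0.2):
    """Indices of samples whose two largest memberships differ by less than `margin`.

    U has shape (c, n): U[i, k] is the membership of sample k in cluster i.
    `margin` is a hypothetical threshold chosen for illustration.
    """
    top2 = np.sort(U, axis=0)[-2:, :]     # second-largest and largest membership per sample
    gap = top2[1] - top2[0]
    return np.flatnonzero(gap < margin)

# Hypothetical membership matrix for two clusters and three samples.
U = np.array([[0.90, 0.55, 0.20],
              [0.10, 0.45, 0.80]])
print(boundary_samples(U))   # [1]: sample 1 sits between the two clusters
```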

16 pages, 449 KiB  
Article
A Task Orchestration Strategy in a Cloud-Edge Environment Based on Intuitionistic Fuzzy Sets
by Chunmei Huang, Bingbing Fan and Chunmao Jiang
Mathematics 2024, 12(1), 122; https://doi.org/10.3390/math12010122 - 29 Dec 2023
Viewed by 994
Abstract
Amid the burgeoning cloud-edge collaboration paradigm, driven by advances in the Internet of Things (IoT), cloud computing and 5G technology, this paper proposes a task orchestration strategy for cloud-edge collaborative environments based on intuitionistic fuzzy sets. The strategy aims to use resources efficiently, minimize task failures and reduce service time. First, WAN bandwidth, edge server virtual machine utilization, the delay sensitivity of the task and the task length are used to determine whether the task should be executed on the cloud or on an edge device. Then, a cloud-edge collaborative decision-making algorithm selects the task's target edge server (either the local edge server or a neighboring edge server). Finally, simulation experiments demonstrate the effectiveness of the proposed algorithm.
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)
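An intuitionistic fuzzy set assigns each element a membership degree mu and a non-membership degree nu with mu + nu <= 1, leaving a hesitation degree pi = 1 - mu - nu. The sketch below shows one hypothetical way to turn per-criterion (mu, nu) judgments, for example on WAN bandwidth, edge VM utilization, delay sensitivity and task length, into a cloud-or-edge decision: aggregate them with the intuitionistic fuzzy weighted averaging (IFWA) operator and compare the score mu - nu with zero. The criterion values, weights and decision rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def ifwa(pairs, weights):
    """Intuitionistic fuzzy weighted averaging of (mu, nu) pairs; weights should sum to 1."""
    mu = np.array([p[0] for p in pairs], dtype=float)
    nu = np.array([p[1] for p in pairs], dtype=float)
    w = np.asarray(weights, dtype=float)
    agg_mu = 1.0 - np.prod((1.0 - mu) ** w)
    agg_nu = np.prod(nu ** w)
    return float(agg_mu), float(agg_nu)

def score(mu, nu):
    """Score function: larger values mean stronger net support."""
    return mu - nu

def place_task(criteria, weights):
    """Hypothetical rule: run on the edge if the aggregated evidence favours it, else the cloud."""
    mu, nu = ifwa(criteria, weights)
    return "edge" if score(mu, nu) > 0 else "cloud"

# Hypothetical (mu, nu) "suitable for edge" judgments for bandwidth, VM utilization,
# delay sensitivity and task length, with hypothetical criterion weights.
criteria = [(0.7, 0.2), (0.5, 0.4), (0.8, 0.1), (0.3, 0.6)]
weights = [0.3, 0.2, 0.3, 0.2]
print(ifwa(criteria, weights))        # roughly (0.65, 0.23)
print(place_task(criteria, weights))  # edge
```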

26 pages, 10013 KiB  
Article
An Adaptive Ant Colony Optimization for Solving Large-Scale Traveling Salesman Problem
by Kezong Tang, Xiong-Fei Wei, Yuan-Hao Jiang, Zi-Wei Chen and Lihua Yang
Mathematics 2023, 11(21), 4439; https://doi.org/10.3390/math11214439 - 26 Oct 2023
Cited by 6 | Viewed by 2980
Abstract
The ant colony algorithm suffers from the curse of dimensionality when solving the large-scale traveling salesman problem, which leads to unsatisfactory solution quality and convergence speed. To address this, an adaptive ant colony optimization for the large-scale traveling salesman problem (AACO-LST) is proposed. First, AACO-LST improves the state transition rule so that it adjusts adaptively as the population evolves, accelerating convergence. Then, the 2-opt operator is used to locally optimize part of the better ant paths, further improving solution quality. Finally, the constructed adaptive pheromone update rules significantly improve search efficiency and prevent the algorithm from falling into local optima or stagnating prematurely. Simulations on 45 traveling salesman problem instances show that AACO-LST improves solution quality by 79% compared with the ant colony system (ACS); compared with other algorithms, the PE of AACO-LST is no more than 1% and the Err is no more than 2%, indicating that AACO-LST can find high-quality solutions with high stability. The convergence speed of the proposed algorithm was also tested: the average convergence speed of AACO-LST is more than twice that of the compared algorithms. The relevant code can be found on our project homepage.
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)
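The 2-opt operator mentioned above removes two non-adjacent edges of a tour and reconnects it by reversing the segment between them, keeping the change only if the tour gets shorter. The sketch below is a plain first-improvement 2-opt on Euclidean coordinates; AACO-LST applies 2-opt only to part of the better ant paths inside its ant colony loop, which is not shown, and the helper names are hypothetical.

```python
import math

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def two_opt(tour, pts):
    """First-improvement 2-opt: reverse a segment whenever that shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue                      # these two edges share city tour[0]
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Change in length from replacing edges (a,b),(c,d) with (a,c),(b,d).
                delta = math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour

pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
start = [0, 2, 1, 3]                  # self-crossing tour around a unit square
best = two_opt(start, pts)
print(best, tour_length(best, pts))   # [0, 1, 2, 3] 4.0
```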

30 pages, 1552 KiB  
Article
A Broad TSK Fuzzy Classifier with a Simplified Set of Fuzzy Rules for Class-Imbalanced Learning
by Jinghong Zhang, Yingying Li, Bowen Liu, Hao Chen, Jie Zhou, Hualong Yu and Bin Qin
Mathematics 2023, 11(20), 4284; https://doi.org/10.3390/math11204284 - 13 Oct 2023
Viewed by 1229
Abstract
With the growing scale and diversity of data, class imbalance has become an increasingly salient issue. Current methods, including oversampling and undersampling, have limitations in handling complex data, leading to overfitting, loss of critical information and insufficient interpretability. In response to these challenges, we propose a broad TSK fuzzy classifier with a simplified set of fuzzy rules (B-TSK-FC) for classification tasks on class-imbalanced data. First, fuzzy rules are selected and optimized according to their adaptability to different complex data, simplifying the rule base and thereby improving the interpretability of the TSK fuzzy sub-classifiers. Second, the fuzzy rules are weighted to protect the information carried by minority classes, improving classification performance on class-imbalanced datasets. Finally, a novel loss function is designed to derive the weight of each TSK fuzzy sub-classifier. Experimental results on fifteen benchmark datasets demonstrate that B-TSK-FC outperforms the compared methods in classification performance and interpretability under class imbalance.
(This article belongs to the Special Issue New Advances in Data Analytics and Mining)
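For readers unfamiliar with TSK fuzzy classifiers, the sketch below shows zero-order TSK inference: each rule has Gaussian antecedents and a constant consequent, a rule's firing strength is the product of its per-feature memberships, and the output is the normalized weighted sum of consequents. The optional `rule_weights` argument is a hypothetical stand-in for the rule weighting that B-TSK-FC uses to protect minority classes; rule selection, the broad ensemble of sub-classifiers and the paper's loss function are not reproduced.

```python
import numpy as np

def tsk0_predict(X, centers, sigmas, consequents, rule_weights=None):
    """Zero-order TSK output for samples X (n, d) with r rules.

    centers, sigmas: (r, d) Gaussian antecedent parameters; consequents: (r,) constants.
    rule_weights: optional (r,) hypothetical per-rule weights scaling the firing strengths.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))                 # (n, d)
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    diff = X[:, None, :] - centers[None, :, :]                    # (n, r, d)
    # Product of per-feature Gaussian memberships = exp of the summed exponents.
    firing = np.exp(-np.sum(diff ** 2 / (2.0 * sigmas[None, :, :] ** 2), axis=2))  # (n, r)
    if rule_weights is not None:
        firing = firing * np.asarray(rule_weights, dtype=float)[None, :]
    norm = firing / firing.sum(axis=1, keepdims=True)             # normalized firing strengths
    return norm @ np.asarray(consequents, dtype=float)            # (n,)

# Two hypothetical rules on one feature: consequent -1 for class 0, +1 for class 1.
centers = np.array([[0.0], [1.0]])
sigmas = np.array([[0.5], [0.5]])
consequents = np.array([-1.0, 1.0])
scores = tsk0_predict([[0.0], [0.5], [1.0]], centers, sigmas, consequents)
print(scores)                        # approximately [-0.76, 0.0, 0.76]
print((scores > 0).astype(int))      # predicted labels
print(tsk0_predict([[0.5]], centers, sigmas, consequents, rule_weights=[1.0, 3.0]))  # about [0.5]
```

Upweighting the second rule shifts the midpoint score toward that rule's class, which is the kind of effect a minority-class rule weight is meant to have.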
