Data Mining: Analysis and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (20 June 2023) | Viewed by 49386

Special Issue Editors


Prof. Dr. Weihua Xu
Guest Editor
School of Artificial Intelligence, Southwest University, Chongqing 400100, China
Interests: cognitive computing; data mining; granular computing; information fusion; knowledge engineering

Prof. Dr. Jinhai Li
Guest Editor
Faculty of Science, Kunming University of Science and Technology, Kunming 650504, China
Interests: classification algorithms; computational intelligence; feature selection; formal concept analysis; optimization

Prof. Dr. Xibei Yang
Guest Editor
School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
Interests: clustering and classification algorithms; feature selection; rough set; granular computing; machine learning

Special Issue Information

Dear Colleagues,

Data mining is a research field aimed at extracting potentially useful information from data sets, and both its theoretical analysis and its practical applications have grown steadily. In recent years, the rapid development of computer and network technologies has made massive amounts of data available, and some traditional data mining technologies are facing unprecedented challenges. Dealing with large-scale data is a challenging task in information science and artificial intelligence. Many new data mining methods, including deep learning, granular computing, concept lattices, and visualization methods, have been developed to address big data mining. This Special Issue provides a platform for researchers from both analysis and application backgrounds to present their novel and unpublished work in the domain of data mining. Potential topics include, but are not limited to, the following:

  1. Big data mining;
  2. Clustering and classification algorithms;
  3. Cognitive computing;
  4. Computational intelligence;
  5. Data mining techniques;
  6. Deep learning;
  7. Feature selection;
  8. Formal concept analysis;
  9. Fuzzy set and fuzzy logic;
  10. Information fusion;
  11. Granular computing;
  12. Knowledge discovery;
  13. Machine learning;
  14. Practical applications of data mining;
  15. Uncertainty in big data;
  16. Visualization methods.

Prof. Dr. Weihua Xu
Prof. Dr. Jinhai Li
Prof. Dr. Xibei Yang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • knowledge engineering
  • information fusion
  • machine learning
  • uncertainty analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (22 papers)


Research

21 pages, 1536 KiB  
Article
Privacy-Preserving Distributed Learning via Newton Algorithm
by Zilong Cao, Xiao Guo and Hai Zhang
Mathematics 2023, 11(18), 3807; https://doi.org/10.3390/math11183807 - 5 Sep 2023
Viewed by 947
Abstract
Federated learning (FL) is a prominent distributed learning framework. The main barriers to FL include communication cost and privacy breaches. In this work, we propose a novel privacy-preserving second-order-based FL method, called GDP-LocalNewton. To improve communication efficiency, we use Newton’s method to iterate and allow local computations before aggregation. To ensure a strong privacy guarantee, we make use of the notion of differential privacy (DP) to add Gaussian noise in each iteration. Using advanced tools of Gaussian differential privacy (GDP), we prove that the proposed algorithm satisfies the strong notion of GDP. We also establish the convergence of our algorithm. It turns out that the convergence error comes from the local computation and the Gaussian noise for DP. We conduct experiments to show the merits of the proposed algorithm. Full article
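As an illustration of the second-order update described above, the following sketch shows a single local Newton step for L2-regularized logistic regression with Gaussian noise added to the update. It is a minimal stand-in, not the paper's GDP-LocalNewton implementation: the loss, the noise scale (which GDP requires to be calibrated from the privacy budget), and the server-side aggregation are all assumptions.

```python
import numpy as np

def local_newton_step(w, X, y, noise_std, lam=1e-2):
    """One illustrative local Newton update for L2-regularized logistic
    regression, with Gaussian noise added to the update for privacy."""
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-X @ w))                     # predicted probabilities
    grad = X.T @ (p - y) / n + lam * w                   # gradient of the local loss
    H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(X.shape[1])  # local Hessian
    direction = np.linalg.solve(H, grad)                 # Newton direction
    noise = np.random.normal(0.0, noise_std, size=w.shape)  # DP Gaussian noise (scale assumed)
    return w - direction + noise
```

In the federated setting, each client would run a few such local steps before the server averages the resulting models; that orchestration is omitted here.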

26 pages, 3662 KiB  
Article
Unsupervised Attribute Reduction Algorithm for Mixed Data Based on Fuzzy Optimal Approximation Set
by Haotong Wen, Shixin Zhao and Meishe Liang
Mathematics 2023, 11(16), 3452; https://doi.org/10.3390/math11163452 - 9 Aug 2023
Cited by 2 | Viewed by 1075
Abstract
Fuzzy rough set theory has been successfully applied to many attribute reduction methods, in which the lower approximation set plays a pivotal role. However, the definition of lower approximation used has ignored the information conveyed by the upper approximation and the boundary region. This oversight has resulted in an unreasonable relation representation of the target set. Despite the fact that scholars have proposed numerous enhancements to rough set models, such as the variable precision model, none have successfully resolved the issues inherent in the classical models. To address this limitation, this paper proposes an unsupervised attribute reduction algorithm for mixed data based on an improved optimal approximation set. Firstly, the theory of an improved optimal approximation set and its associated algorithm are proposed. Subsequently, we extend the classical theory of optimal approximation sets to fuzzy rough set theory, leading to the development of a fuzzy improved approximation set method. Finally, building on the proposed theory, we introduce a novel, fuzzy optimal approximation-set-based unsupervised attribute reduction algorithm (FOUAR). Comparative experiments conducted with all the proposed algorithms indicate the efficacy of FOUAR in selecting fewer attributes while maintaining and improving the performance of the machine learning algorithm. Furthermore, they highlight the advantage of the improved optimal approximation set algorithm, which offers higher similarity to the target set and provides a more concise expression. Full article
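For context, the classical fuzzy-rough approximations that this line of work extends can be computed as below. This is a textbook definition (min/max connectives), not the paper's improved optimal approximation set; the similarity matrix `R` and membership vector `A` are assumed inputs.

```python
import numpy as np

def fuzzy_rough_approximations(R, A):
    """Classical fuzzy-rough lower/upper approximations:
    R is an n x n fuzzy similarity matrix, A is the fuzzy membership vector
    of the target set over the same n objects."""
    lower = np.min(np.maximum(1.0 - R, A[None, :]), axis=1)  # inf_y max(1 - R(x,y), A(y))
    upper = np.max(np.minimum(R, A[None, :]), axis=1)        # sup_y min(R(x,y), A(y))
    return lower, upper
```

The paper's FOUAR combines information from both approximations (and the boundary region) to build an "optimal" approximation of the target set; that construction is not reproduced here.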

21 pages, 11025 KiB  
Article
Three-Way Co-Training with Pseudo Labels for Semi-Supervised Learning
by Liuxin Wang, Can Gao, Jie Zhou and Jiajun Wen
Mathematics 2023, 11(15), 3348; https://doi.org/10.3390/math11153348 - 31 Jul 2023
Cited by 1 | Viewed by 1129
Abstract
The theory of three-way decision has been widely utilized across various disciplines and fields as an efficient method for both knowledge reasoning and decision making. However, the application of the three-way decision theory to partially labeled data has received relatively less attention. In this study, we propose a semi-supervised co-training model based on the three-way decision and pseudo labels. We first present a simple yet effective method for producing two views by assigning pseudo labels to unlabeled data, based on which a heuristic attribute reduction algorithm is developed. The three-way decision is then combined with the concept of entropy to form co-decision rules for classifying unlabeled data into useful, uncertain, or useless samples. Finally, some useful samples are iteratively selected to improve the performance of the co-decision model. The experimental results on UCI datasets demonstrate that the proposed model outperforms other semi-supervised models, exhibiting its potential for partially labeled data. Full article
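A minimal sketch of the three-way split used to sort unlabeled samples is given below; the thresholds `alpha` and `beta` and the confidence score are placeholders rather than the paper's entropy-based co-decision rule.

```python
def three_way_label(confidence, alpha=0.8, beta=0.4):
    """Illustrative three-way decision on an unlabeled sample: 'useful' if the
    co-decision confidence is high, 'useless' if it is low, 'uncertain' otherwise.
    Thresholds are hypothetical, not the paper's exact values."""
    if confidence >= alpha:
        return "useful"      # accept: add to training with its pseudo label
    if confidence <= beta:
        return "useless"     # reject: discard for this round
    return "uncertain"       # defer: leave unlabeled for later iterations
```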

14 pages, 472 KiB  
Article
A Fast Algorithm for Updating Negative Concept Lattices with Increasing the Granularity Sizes of Attributes
by Junping Xie, Liuhai Zhang and Jing Yang
Mathematics 2023, 11(14), 3229; https://doi.org/10.3390/math11143229 - 22 Jul 2023
Cited by 1 | Viewed by 1017
Abstract
In this paper, we first studied the relationship between negative concept lattices when the granularity sizes of the attributes are increased. To do this, negative concepts and covering relations were both classified into three types, and the sufficient and necessary conditions for distinguishing these kinds of negative concepts and covering relations were given, respectively. Further, based on the above analysis, an algorithm for updating negative concept lattices after the increase is proposed. Finally, the experimental results demonstrated that our algorithm performs significantly better than the direct construction algorithm. Full article

13 pages, 1602 KiB  
Article
Role Minimization Optimization Algorithm Based on Concept Lattice Factor
by Tao Wang and Qiang Wu
Mathematics 2023, 11(14), 3047; https://doi.org/10.3390/math11143047 - 10 Jul 2023
Cited by 1 | Viewed by 1185
Abstract
Role-based access control (RBAC) is a widely adopted security model that provides a flexible and scalable approach for managing permissions in various domains. One of the critical challenges in RBAC is the efficient assignment of roles to users while minimizing the number of roles involved. This article presents a novel role minimization optimization algorithm (RMOA) based on the concept lattice factor to address this challenge. The proposed RMOA leverages the concept lattice, a mathematical structure derived from formal concept analysis, to model and analyze the relationships between roles, permissions, and users in an RBAC system. By representing the RBAC system as a concept lattice, the algorithm captures the inherent hierarchy and dependencies among roles and identifies the optimal role assignment configuration. The RMOA operates in two phases: the first phase focuses on constructing the concept lattice from the RBAC system’s role–permission–user relations, while the second phase performs an optimization process to minimize the number of roles required for the access control. It determines the concept lattice factor using the concept lattice interval to discover the minimum set of roles. The optimization process considers both the user–role assignments and the permission–role assignments, ensuring that access requirements are met while reducing role proliferation. Experimental evaluations conducted on diverse RBAC datasets demonstrate the effectiveness of the proposed algorithm. The RMOA achieves significant reductions in the number of roles compared to existing role minimization approaches, while preserving the required access permissions for users. The algorithm’s efficiency is also validated by its ability to handle large-scale RBAC systems within reasonable computational time. Full article
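The concept-lattice machinery that RMOA builds on rests on the standard FCA derivation operators, sketched below on a toy user-permission context; the lattice-factor and interval computations of RMOA itself are not reproduced, and the example data are hypothetical.

```python
def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (FCA derivation; assumes a non-empty set)."""
    return set.intersection(*(context[o] for o in objects))

def common_objects(attributes, context):
    """Objects possessing every attribute in `attributes`."""
    return {o for o, attrs in context.items() if attributes <= attrs}

# Toy user-permission context; a pair (A, B) with B = common_attributes(A) and
# A = common_objects(B) is a formal concept, i.e., a candidate role grouping
# users with identical permissions.
context = {"u1": {"read", "write"}, "u2": {"read"}, "u3": {"read", "write"}}
print(common_attributes({"u1", "u3"}, context))    # {'read', 'write'}
print(common_objects({"read", "write"}, context))  # {'u1', 'u3'}
```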

33 pages, 1479 KiB  
Article
Parallel Selector for Feature Reduction
by Zhenyu Yin, Yan Fan, Pingxin Wang and Jianjun Chen
Mathematics 2023, 11(9), 2084; https://doi.org/10.3390/math11092084 - 27 Apr 2023
Cited by 1 | Viewed by 1272
Abstract
In the field of rough sets, feature reduction is a hot topic. Up to now, to better guide explorations of this topic, various devices regarding feature reduction have been developed. Nevertheless, some challenges regarding these devices should not be ignored: (1) the viewpoint provided by a fixed measure is underabundant; (2) the final reduct based on a single constraint is sometimes not robust to data perturbation; (3) the efficiency in deriving the final reduct is inferior. In this study, to improve the effectiveness and efficiency of feature reduction algorithms, a novel framework named parallel selector for feature reduction is reported. Firstly, the granularity of raw features is quantitatively characterized. Secondly, based on these granularity values, the raw features are sorted. Thirdly, the reordered features are evaluated again. Finally, following these two evaluations, the reordered features are divided into groups, and the features satisfying the given constraints are selected in parallel. Our framework can not only produce a relatively stable feature ordering if data perturbation occurs but can also reduce the time consumed by feature reduction. The experimental results over 25 UCI data sets with four different ratios of noisy labels demonstrated the superiority of our framework through a comparison with eight state-of-the-art algorithms. Full article
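A schematic of the sort-group-select idea might look like the following; the granularity score, group count, and per-group constraint are placeholders, not the framework's actual evaluation measures.

```python
import numpy as np

def parallel_select(X, score_fn, n_groups=4, keep_per_group=2):
    """Illustrative skeleton of the parallel-selector idea: score features,
    sort them, split the ordered features into groups, and pick the top
    features from each group. score_fn and the constants are hypothetical."""
    scores = np.array([score_fn(X[:, j]) for j in range(X.shape[1])])
    order = np.argsort(scores)                   # features sorted by granularity score
    groups = np.array_split(order, n_groups)     # groups can be evaluated independently
    return [int(j) for g in groups for j in g[:keep_per_group]]

# Example: use the negative variance as a stand-in granularity score.
X = np.random.rand(100, 12)
print(parallel_select(X, score_fn=lambda col: -col.var()))
```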

15 pages, 2039 KiB  
Article
A Random Forest-Based Method for Predicting Borehole Trajectories
by Baoyong Yan, Xiantao Zhang, Chengxu Tang, Xiao Wang, Yifei Yang and Weihua Xu
Mathematics 2023, 11(6), 1297; https://doi.org/10.3390/math11061297 - 8 Mar 2023
Cited by 3 | Viewed by 1869
Abstract
Drilling trajectory control for near-horizontal directional drilling in coal mines is mainly determined empirically from manual skew data, and the empirical results are only qualitative and variable, with great instability and uncertainty. In order to improve the accuracy and efficiency of drilling trajectory prediction, this paper investigates a random forest regression-based drilling trajectory prediction method after applying numerous machine learning methods to experimental data for comparison. In the selection of prediction features, this paper replaces all feature variables of the borehole trajectory with the feature values that have higher cumulative influence weights, and screens out three variables of high importance at the present moment of drilling, namely azimuth, inclination, and bend, as the input variables of the model; the increments of the borehole in the horizontal, left-right, and up-down directions at the present moment and the next moment serve as the output variables. In model training, bootstrap resampling is used to collect training sample data, a random forest regression model is constructed, and the mean of the decision tree outputs is taken as the borehole trajectory prediction. To further improve accuracy, the prediction performance is tuned by adjusting the parameters of the random forest model, such as the number of trees, tree depth, the minimum number of samples per leaf node, and the minimum number of samples required to split an internal node. The model is evaluated using common machine learning metrics: the R2 score, RAE, RMSE, and MSE. The experimental results show that the average absolute error of the model drops to 1% for the prediction of the horizontal and up-down directions and to 9% for the prediction of the left-right direction; this error rate meets the error requirement of the actual construction process, so the model can effectively improve the prediction accuracy of the borehole trajectory while improving the safety of directional construction. Full article
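A minimal version of the modeling step, using scikit-learn's RandomForestRegressor with the hyperparameters named in the abstract, could look like this; the data here are random placeholders, not the drilling dataset, and the parameter values are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Placeholder data: rows = [azimuth, inclination, bend]; targets = the three displacement increments.
X = np.random.rand(500, 3)
y = np.random.rand(500, 3)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Hyperparameters named in the abstract: number of trees, depth, leaf/split sample sizes.
model = RandomForestRegressor(n_estimators=200, max_depth=10,
                              min_samples_leaf=2, min_samples_split=4,
                              random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("R2:", r2_score(y_te, pred), "MSE:", mean_squared_error(y_te, pred))
```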

19 pages, 2664 KiB  
Article
Association Rule Mining through Combining Hybrid Water Wave Optimization Algorithm with Levy Flight
by Qiyi He, Jin Tu, Zhiwei Ye, Mingwei Wang, Ye Cao, Xianjing Zhou and Wanfang Bai
Mathematics 2023, 11(5), 1195; https://doi.org/10.3390/math11051195 - 28 Feb 2023
Cited by 4 | Viewed by 1552
Abstract
Association rule mining (ARM) is one of the most important tasks in data mining. In recent years, swarm intelligence algorithms have been effectively applied to ARM, and the main challenge has been to achieve a balance between search efficiency and the quality of the mined rules. As a novel swarm intelligence algorithm, the water wave optimization (WWO) algorithm has been widely used for combinatorial optimization problems, with the disadvantage that it tends to fall into local optimum solutions and converges slowly. In this paper, a novel hybrid ARM method based on WWO with Levy flight (LWWO) is proposed. The proposed method improves the solution of WWO by expanding the search space through Levy flight while effectively increasing the search speed. In addition, this paper employs the hybrid strategy to enhance the diversity of the population in order to obtain the global optimal solution. Moreover, the proposed ARM method does not generate frequent items, unlike traditional algorithms (e.g., Apriori), thus reducing the computational overhead and saving memory space, which increases its applicability in real-world business cases. Experiment results show that the performance of the proposed hybrid algorithms is significantly better than that of the WWO and LWWO in terms of quality and number of mined rules. Full article
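The Levy flight component can be sketched with Mantegna's algorithm, a common way to draw Levy-distributed steps in swarm-intelligence methods; how LWWO scales the step and applies it to wave positions is not reproduced here, and the step size below is an assumption.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5):
    """Levy-distributed step via Mantegna's algorithm, commonly used to add
    occasional long jumps to swarm-intelligence searches."""
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0, sigma_u, dim)
    v = np.random.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

# Example: perturb a candidate rule encoding with a (scaled) Levy step.
position = np.random.rand(8)
position = position + 0.01 * levy_step(len(position))
```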

10 pages, 1976 KiB  
Article
Demand Forecasting of Spare Parts Using Artificial Intelligence: A Case Study of K-X Tanks
by Jae-Dong Kim, Tae-Hyeong Kim and Sung Won Han
Mathematics 2023, 11(3), 501; https://doi.org/10.3390/math11030501 - 17 Jan 2023
Cited by 4 | Viewed by 5655
Abstract
The proportion of the inventory range associated with spare parts is often considerable in the industrial context. Therefore, even minor improvements in forecasting the demand for spare parts can lead to substantial cost savings. Despite notable research efforts, demand forecasting remains challenging, especially in areas with irregular demand patterns, such as military logistics. Thus, an advanced model for accurately forecasting this demand was developed in this study. The K-X tank is one of the Republic of Korea Army’s third-generation main battle tanks. Data on spare part consumption, comprising 1,053,422 transactional data points stored in a military logistics management system, were obtained. Demand forecasting classification models were developed using machine learning, stacked generalization, and time series methods as baselines. Additionally, various stacked generalizations were established for spare part demand forecasting. The results demonstrated that a suitable selection of methods can help enhance the performance of the forecasting models in this domain. Full article
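A small stacked-generalization baseline of the kind described can be assembled with scikit-learn; the base learners, meta-learner, and data below are assumptions for illustration only, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge

# Placeholder demand history: lagged consumption features -> next-period demand.
X = np.random.rand(300, 6)
y = np.random.rand(300)

# Stacked generalization: base learners' out-of-fold predictions feed a meta-learner.
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
                ("gbr", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge())
stack.fit(X, y)
print(stack.predict(X[:5]))
```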

30 pages, 21240 KiB  
Article
A Semisupervised Concept Drift Adaptation via Prototype-Based Manifold Regularization Approach with Knowledge Transfer
by Muhammad Zafran Muhammad Zaly Shah, Anazida Zainal, Taiseer Abdalla Elfadil Eisa, Hashim Albasheer and Fuad A. Ghaleb
Mathematics 2023, 11(2), 355; https://doi.org/10.3390/math11020355 - 9 Jan 2023
Viewed by 1898
Abstract
Data stream mining deals with processing large amounts of data in nonstationary environments, where the relationship between the data and the labels often changes. Such dynamic relationships make it difficult to design a computationally efficient data stream processing algorithm that is also adaptable to the nonstationarity of the environment. To make the algorithm adaptable to the nonstationarity of the environment, concept drift detectors are attached to detect the changes in the environment by monitoring the error rates and adapting to the environment’s current state. Unfortunately, current approaches to adapt to environmental changes assume that the data stream is fully labeled. Assuming a fully labeled data stream is a flawed assumption, as the labeling effort would be too impractical due to the rapid arrival and volume of the data. To address this issue, this study proposes to detect concept drift by anticipating a possible change in the true label in the high confidence prediction region. This study also proposes an ensemble-based concept drift adaptation approach that transfers reliable classifiers to the new concept. The significance of our proposed approach compared to the current baselines is that our approach does not use a performance measure as the drift signal or assume a change in data distribution when concept drift occurs. As a result, our proposed approach can detect concept drift when labeled data are scarce, even when the data distribution remains static. Based on the results, the proposed approach detects concept drifts comparably to fully supervised data stream mining approaches and performs well on mixed-severity concept drift datasets. Full article

13 pages, 1570 KiB  
Article
A Trie Based Set Similarity Query Algorithm
by Lianyin Jia, Junzhuo Tang, Mengjuan Li, Runxin Li, Jiaman Ding and Yinong Chen
Mathematics 2023, 11(1), 229; https://doi.org/10.3390/math11010229 - 2 Jan 2023
Cited by 1 | Viewed by 1828
Abstract
Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most of the existing algorithms are inverted-index based; they usually filter unqualified sets one by one and do not have sufficient support for duplicated sets, thus leading to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie-based index for set similarity query, which can naturally group sets with the same prefix into one node and can filter all sets corresponding to the node at a time, thereby significantly improving the efficiency of candidate generation. In this paper, we find that the set similarity query problem can be transformed into the problem of detecting first-layer matching nodes (FLMNodes) on T-starTrie. Therefore, an efficient FLMNode detection algorithm is designed. Based on this, an efficient set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms. Full article
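The prefix-grouping idea behind a trie-based set index can be illustrated with a plain trie over sorted set elements, as below; the FLMNode detection and the additional filters of TT-SSQ are not shown, and the structure is a generic sketch rather than T-starTrie itself.

```python
class TrieNode:
    """Node of a simple prefix trie over sorted set elements; sets sharing a
    prefix are grouped under one node, so a whole subtree can be pruned at once."""
    def __init__(self):
        self.children = {}
        self.set_ids = []          # ids of sets that end exactly at this node

def insert_set(root, set_id, elements):
    node = root
    for e in sorted(elements):     # a global element order makes prefixes canonical
        node = node.children.setdefault(e, TrieNode())
    node.set_ids.append(set_id)

root = TrieNode()
insert_set(root, 0, {3, 1, 7})
insert_set(root, 1, {1, 3})        # shares the prefix 1 -> 3 with set 0
```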

24 pages, 1153 KiB  
Article
A Hybrid Encryption Scheme for Quantum Secure Video Conferencing Combined with Blockchain
by Dexin Zhu, Jun Zheng, Hu Zhou, Jianan Wu, Nianfeng Li and Lijun Song
Mathematics 2022, 10(17), 3037; https://doi.org/10.3390/math10173037 - 23 Aug 2022
Cited by 10 | Viewed by 2195
Abstract
Traditional video conference systems depend largely on computational complexity to ensure system security, but with the development of high-performance computers, the existing encryption system will be seriously threatened. To solve this problem, a hybrid encryption scheme for quantum secure video conferencing combined with blockchain is proposed in this study. In the system solution architecture, first, the quantum key distribution network is embedded in the classic network; then, the “classical + quantum” hybrid encryption scheme is designed according to the secret level required for the video conference content. Besides, the real-time monitoring module of the quantum key distribution network is designed to ensure that users can check the running state of the network at any time. Meeting minutes can be shared by combining with blockchain. In order to quickly query meeting minutes, a cache-efficient query method based on B+ tree is proposed. The experimental results show that compared with the traditional video conference system, the quantum secure video conference system sufficiently integrates the technical advantages of the quantum key distribution to resist the security threats such as channel eavesdropping and high-performance computational attacks while ensuring the stable operation of the classic system, thus providing a video conference system with a higher security level. Meanwhile, the query time cost of blockchain with different lengths is tested, and the query efficiency of the proposed method is 3.15-times higher than the original query efficiency of blockchain. Full article

12 pages, 2420 KiB  
Article
Developing a New Collision-Resistant Hashing Algorithm
by Larissa V. Cherckesova, Olga A. Safaryan, Nikita G. Lyashenko and Denis A. Korochentsev
Mathematics 2022, 10(15), 2769; https://doi.org/10.3390/math10152769 - 4 Aug 2022
Cited by 6 | Viewed by 5368
Abstract
Today, cryptographic hash functions have numerous applications in different areas. At the same time, new collision attacks have been developed recently, making some widely used algorithms like SHA-1 vulnerable and unreliable. This article aims at the development of a new hashing algorithm that will be resistant to all cryptographic attacks, including quantum collision attacks that potentially pose a threat to some widely used cryptographic hash functions. This algorithm was called Nik-512. The avalanche effect is tested to ensure the cryptographic strength of the developed algorithm. The Nik-512 function is then applied to build a data integrity system which can be used to protect data from malicious users. Full article
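The avalanche-effect test mentioned in the abstract can be expressed as follows; since Nik-512 is not publicly available here, SHA-512 stands in purely to illustrate the measurement.

```python
import hashlib

def avalanche_ratio(message: bytes, hash_fn=lambda m: hashlib.sha512(m).digest()):
    """Flip one input bit and report the fraction of output bits that change.
    A strong hash should flip roughly half of the output bits."""
    flipped = bytes([message[0] ^ 0x01]) + message[1:]   # flip the lowest bit of byte 0
    h1, h2 = hash_fn(message), hash_fn(flipped)
    diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))
    return diff_bits / (len(h1) * 8)

print(avalanche_ratio(b"data mining special issue"))      # expected to be close to 0.5
```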

15 pages, 5259 KiB  
Article
A Parallel Convolution and Decision Fusion-Based Flower Classification Method
by Lianyin Jia, Hongsong Zhai, Xiaohui Yuan, Ying Jiang and Jiaman Ding
Mathematics 2022, 10(15), 2767; https://doi.org/10.3390/math10152767 - 4 Aug 2022
Cited by 3 | Viewed by 2069
Abstract
Flower classification is of great significance to the fields of plants, food, and medicine. However, due to the inherent inter-class similarity and intra-class differences of flowers, it is a difficult task to accurately classify them. To this end, this paper proposes a novel flower classification method that combines enhanced VGG16 (E-VGG16) with decision fusion. Firstly, facing the shortcomings of the VGG16, an enhanced E-VGG16 is proposed. E-VGG16 introduces a parallel convolution block designed in this paper on VGG16 combined with several other optimizations to improve the quality of extracted features. Secondly, considering the limited decision-making ability of a single E-VGG16 variant, parallel convolutional blocks are embedded in different positions of E-VGG16 to obtain multiple E-VGG16 variants. By introducing information entropy to fuse multiple E-VGG16 variants for decision-making, the classification accuracy is further improved. The experimental results on the Oxford Flower102 and Oxford Flower17 public datasets show that the classification accuracy of our method reaches 97.69% and 98.38%, respectively, which significantly outperforms the state-of-the-art methods. Full article
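One generic way to fuse the variants' softmax outputs with information entropy is sketched below: lower-entropy (more certain) predictions get larger weights. The exact weighting used for the E-VGG16 ensemble is not reproduced, and the fusion rule here is an assumption.

```python
import numpy as np

def entropy_weighted_fusion(prob_list):
    """Fuse softmax outputs of several classifier variants, weighting each model
    inversely to the entropy of its prediction."""
    probs = np.stack(prob_list)                                 # (n_models, n_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)    # per-model entropy
    weights = 1.0 / (entropy + 1e-12)
    weights /= weights.sum()
    return weights @ probs                                      # fused class distribution

fused = entropy_weighted_fusion([np.array([0.7, 0.2, 0.1]),
                                 np.array([0.4, 0.35, 0.25])])
print(fused.argmax())
```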

21 pages, 3394 KiB  
Article
Advertising Decisions of Platform Supply Chains Considering Network Externalities and Fairness Concerns
by Liang Shen, Fei Lin, Yuyan Wang, Xin Su, Hua Li and Rui Zhou
Mathematics 2022, 10(13), 2359; https://doi.org/10.3390/math10132359 - 5 Jul 2022
Cited by 6 | Viewed by 2058
Abstract
With the popularization of platform economics, many manufacturers are shifting their operations from offline to online, forming platform supply chains (PSCs), which combine e-commerce with supply chain management. To study the influences of network externalities and fairness concerns on advertising strategies of the platform supply chain (PSC), we construct decentralized decision-making models, with and without fairness concerns. Then, we solve the optimal decisions regarding PSC and use numerical examples to verify the conclusions of the decision models. We further analyze the internal influences of advertising strategies on network externalities in the extended model. We find that the network externalities are beneficial to the PSC system, but the manufacturer’s fairness concerns are not beneficial to the PSC. The advertising strategies of the network platform are not affected by network externalities and fairness concerns. In the extended model, the manufacturer can obtain more profits, but the network platform yields less profit than the decentralized model without fairness concerns. Moreover, the more sensitive the network externalities are to the change in advertising strategies, the greater the profits for the PSC members. Full article

24 pages, 1360 KiB  
Article
A Novel Method for Decision Making by Double-Quantitative Rough Sets in Hesitant Fuzzy Systems
by Xiaoyan Zhang and Qian Yang
Mathematics 2022, 10(12), 2069; https://doi.org/10.3390/math10122069 - 15 Jun 2022
Cited by 1 | Viewed by 1631
Abstract
In some complex decision-making issues such as economy, management, and social development, decision makers are often hesitant to reach a consensus on the decision-making results due to different goals. How to reduce the influence of decision makers’ subjective arbitrariness on decision results is an inevitable task in decision analysis. Following the principle of improving the fault-tolerance capability, this paper firstly proposes the graded and the variable precision rough set approaches from a single-quantitative decision-making view in a hesitant fuzzy environment (HFEn). Moreover, in order to improve the excessive overlap caused by the high concentration of single quantization, we propose two kinds of double-quantitative decision-making methods by cross-considering relative quantitative information and absolute quantitative information. The proposal of this method not only solves the fuzzy system problem of people’s hesitation in the decision-making process, but also greatly enhances the fault-tolerant ability of the model in application. Finally, we further compare the approximation process and decision results of the single-quantitative models and the double-quantitative models, and explore some basic properties and corresponding decision rules of these models. Meanwhile, we introduce a practical example of housing purchase to expound and verify these theories, which shows that the application value of these theories is impressive. Full article
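For orientation, the crisp single-quantitative approximations that the graded and variable precision approaches generalize are recalled below under one frequently used convention (Ziarko's variable precision rough sets with precision level beta, and Yao-Lin graded rough sets with grade k); the hesitant fuzzy and double-quantitative extensions developed in the paper are not reproduced.

```latex
% Variable precision approximations, \beta \in [0, 0.5):
\underline{R}_{\beta}X = \{\, x \mid P(X \mid [x]_R) \ge 1-\beta \,\}, \qquad
\overline{R}_{\beta}X  = \{\, x \mid P(X \mid [x]_R) > \beta \,\}
% Graded approximations, grade k \in \mathbb{N}:
\underline{R}_{k}X = \{\, x \mid |[x]_R| - |[x]_R \cap X| \le k \,\}, \qquad
\overline{R}_{k}X  = \{\, x \mid |[x]_R \cap X| > k \,\}
```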

14 pages, 509 KiB  
Article
On the Suitability of Bagging-Based Ensembles with Borderline Label Noise
by José A. Sáez and José L. Romero-Béjar
Mathematics 2022, 10(11), 1892; https://doi.org/10.3390/math10111892 - 1 Jun 2022
Cited by 1 | Viewed by 1682
Abstract
Real-world classification data usually contain noise, which can affect the accuracy of the models and their complexity. In this context, an interesting approach to reduce the effects of noise is building ensembles of classifiers, which traditionally have been credited with the ability to tackle difficult problems. Among the alternatives to build ensembles with noisy data, bagging has shown some potential in the specialized literature. However, existing works in this field are limited and only focus on the study of noise based on a random mislabeling, which is unlikely to occur in real-world applications. Recent research shows that other types of noise, such as that occurring at class boundaries, are more common and challenging for classification algorithms. This paper delves into the analysis of the usage of bagging techniques in these complex problems, in which noise affects the decision boundaries among classes. In order to investigate whether bagging is able to reduce the impact of borderline noise, an experimental study is carried out considering a large number of datasets with different noise levels, and several noise models and classification algorithms. The results obtained reflect that bagging obtains a better accuracy and robustness than the individual models with this complex type of noise. The highest improvements in average accuracy are around 2–4% and are generally found at medium-high noise levels (from 15–20% onwards). The partial consideration of noisy samples when creating the subsamples from the original training set in bagging can make it so that only some parts of the decision boundaries among classes are impaired when building each model, reducing the impact of noise in the global system. Full article
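A quick way to reproduce the flavor of the comparison with scikit-learn is shown below; random label flipping is used as a crude stand-in for the borderline noise models studied in the paper, and the data and noise level are placeholders.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic data with 15% of labels flipped at random (illustrative only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
rng = np.random.default_rng(0)
y_noisy = np.where(rng.random(len(y)) < 0.15, 1 - y, y)

single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(n_estimators=50, random_state=0)  # bags decision trees by default
print("single tree:", cross_val_score(single, X, y_noisy, cv=5).mean())
print("bagging    :", cross_val_score(bagged, X, y_noisy, cv=5).mean())
```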

23 pages, 37701 KiB  
Article
Robust Multi-Label Classification with Enhanced Global and Local Label Correlation
by Tianna Zhao, Yuanjian Zhang and Witold Pedrycz
Mathematics 2022, 10(11), 1871; https://doi.org/10.3390/math10111871 - 30 May 2022
Cited by 4 | Viewed by 2410
Abstract
Data representation is of significant importance in minimizing multi-label ambiguity. While most researchers intensively investigate label correlation, the research on enhancing model robustness is preliminary. Low-quality data is one of the main reasons that model robustness degrades. Aiming at the cases with noisy features and missing labels, we develop a novel method called robust global and local label correlation (RGLC). In this model, subspace learning reconstructs intrinsic latent features immune from feature noise. The manifold learning ensures that outputs obtained by matrix factorization are similar in the low-rank latent label space if the latent features are similar. We examine the co-occurrence of global and local label correlation with the constructed latent features and the latent labels. Extensive experiments demonstrate that the classification performance with integrated information is statistically superior to a collection of state-of-the-art approaches across numerous domains. Additionally, the proposed model shows promising performance on multi-label data when noisy features and missing labels occur, demonstrating the robustness of multi-label classification. Full article

22 pages, 441 KiB  
Article
An Improved Three-Way Clustering Based on Ensemble Strategy
by Tingfeng Wu, Jiachen Fan and Pingxin Wang
Mathematics 2022, 10(9), 1457; https://doi.org/10.3390/math10091457 - 26 Apr 2022
Cited by 18 | Viewed by 2210
Abstract
As a powerful data analysis technique, clustering plays an important role in data mining. Traditional hard clustering uses one set with a crisp boundary to represent a cluster, which cannot solve the problem of inaccurate decision-making caused by inaccurate information or insufficient data. In order to solve this problem, three-way clustering was presented to represent the uncertainty information in the dataset by adding the concept of a fringe region. In this paper, we present an improved three-way clustering algorithm based on an ensemble strategy. Different from existing clustering ensemble methods, which use various clustering algorithms to produce the base clustering results, the proposed algorithm randomly extracts a feature subset of samples and uses a traditional clustering algorithm to obtain diverse base clustering results. Based on the base clustering results, label matching is used to align all clustering results in a given order, and a voting method is used to obtain the core region and the fringe region of the three-way clustering. The proposed algorithm can be applied on top of any existing hard clustering algorithm to generate the base clustering results. As examples for demonstration, we apply the proposed algorithm on top of K-means and spectral clustering, respectively. The experimental results show that the proposed algorithm is effective in revealing cluster structures. Full article
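The voting step that yields core and fringe regions can be sketched as follows, assuming the base clustering labels have already been aligned; the agreement threshold is a placeholder, not the paper's setting.

```python
import numpy as np

def core_and_fringe(label_matrix, k, tau=0.8):
    """Given aligned labels from several base clusterings (rows = base results,
    columns = samples), assign a sample to cluster k's core region if at least
    a fraction tau of base clusterings agree, and to its fringe region if some
    but not enough of them do."""
    votes = (label_matrix == k).mean(axis=0)          # agreement ratio per sample
    core = np.where(votes >= tau)[0]
    fringe = np.where((votes > 0) & (votes < tau))[0]
    return core, fringe

labels = np.array([[0, 0, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 0, 1, 0, 0]])                  # three aligned base clusterings
print(core_and_fringe(labels, k=0))
```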

21 pages, 685 KiB  
Article
A Quantum Language-Inspired Tree Structural Text Representation for Semantic Analysis
by Yan Yu, Dong Qiu and Ruiteng Yan
Mathematics 2022, 10(6), 914; https://doi.org/10.3390/math10060914 - 13 Mar 2022
Viewed by 2297
Abstract
Text representation is an important topic in the field of natural language processing, which can effectively transfer knowledge to downstream tasks. To extract effective semantic information from text with unsupervised methods, this paper proposes a quantum language-inspired tree structural text representation model to study the correlations between words with variable distance for semantic analysis. Combining the different semantic contributions of associated words in different syntax trees, a syntax tree-based attention mechanism is established to highlight the semantic contributions of non-adjacent associated words and weaken the semantic weight of adjacent non-associated words. Moreover, the tree-based attention mechanism includes not only the overall information of entangled words in the dictionary but also the local grammatical structure of word combinations in different sentences. Experimental results on semantic textual similarity tasks show that the proposed method achieves significant improvements over state-of-the-art sentence embeddings. Full article

24 pages, 3447 KiB  
Article
Research on the Path of Policy Financing Guarantee to Promote SMEs’ Green Technology Innovation
by Ruzhi Xu, Tingting Guo and Huawei Zhao
Mathematics 2022, 10(4), 642; https://doi.org/10.3390/math10040642 - 18 Feb 2022
Cited by 15 | Viewed by 3263
Abstract
In the process of policy financing guarantees helping SMEs to make innovations in green technologies, multiple parties continue to play strategic games for their own interests. Evolutionary game theory is a practical tool for analyzing multi-agent strategies, which can help us to explore how policy financing guarantees help SMEs to achieve effective credit enhancement. This paper constructs a four-party evolutionary game model among SMEs, banks, guarantee agencies, and the government, and obtains four evolutionarily stable strategies by analyzing the players’ replicator dynamics. In addition, we carry out numerical simulations on the key parameters affecting the stability of the game system. The findings suggest that keeping the risk ratio between guarantee agencies and banks fixed reduces the government’s financial burden and strengthens the construction of the re-guarantee system at the initial stage of SME financing, which can indirectly increase the enthusiasm for cooperation between banks and guarantee agencies. The interest subsidy policy is more effective in promoting SMEs’ compliance and bank–guarantee cooperation in the short term. Meanwhile, the government should increase the supervision of defaulting SMEs and cooperate with financial institutions to improve the credit system for SMEs. Full article

20 pages, 389 KiB  
Article
Beam-Influenced Attribute Selector for Producing Stable Reduct
by Wangwang Yan, Jing Ba, Taihua Xu, Hualong Yu, Jinlong Shi and Bin Han
Mathematics 2022, 10(4), 553; https://doi.org/10.3390/math10040553 - 11 Feb 2022
Cited by 7 | Viewed by 1615
Abstract
Attribute reduction is a critical topic in the field of rough set theory. Currently, to further enhance the stability of the derived reduct, various attribute selectors are designed based on the framework of ensemble selectors. Nevertheless, it must be pointed out that these selectors conceal some limitations: (1) they rely heavily on the distribution of samples; (2) they rely heavily on the optimal attribute. To generate reducts with higher stability, a novel beam-influenced selector (BIS) is designed based on the strategies of random partition and beam search. The scientific novelty of our selector can be divided into two aspects: (1) samples are randomly partitioned without considering their distribution; (2) beam-based selection of features frees the selector from dependency on the optimal attribute. Comprehensive experiments using 16 UCI data sets show the following: (1) the stability of the derived reducts may be significantly enhanced by using our selector; (2) the reducts generated based on the proposed selector provide competent performance in classification tasks. Full article
