FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems
Abstract
1. Introduction
- (a) To determine whether a dataset is reducible or not by means of a uniform stratified sub-sampling (a minimal sketch of this sampling scheme follows this list).
- (b) To estimate the maximum possible reduction percentage and the most important features.
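The uniform stratified sub-sampling of objective (a) discards a fixed fraction of the examples while preserving the original class proportions. Below is a minimal PySpark sketch of that idea, assuming a binary `label` column with values 0/1; the dataset path and the 80% reduction are illustrative placeholders, not values from the paper.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stratified-subsampling").getOrCreate()

# Illustrative input: a labeled tabular dataset (hypothetical path).
df = spark.read.parquet("data/some_dataset.parquet")

red_perc = 80  # candidate reduction percentage (placeholder)
keep = 1.0 - red_perc / 100.0

# sampleBy applies the same keep fraction to every class stratum, so the
# class distribution of the reduced set matches the original one.
condensed = df.sampleBy("label", fractions={0: keep, 1: keep}, seed=42)
```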
2. Data Reduction in Big Data
2.1. Big Data Design Approaches
2.2. Data Reduction
2.3. Data Reduction Meets Big Data
3. FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems
3.1. FDR2-BD Description and Workflow
- First, the ability to operate automatically by means of a hyper-parametrization process based on an ordered, descending list of suggested reduction percentages. This process works over a dataset following a k-fold cross-validation scheme, ensuring that all data are analyzed.
- Second, the ability to achieve the maximum level of data reduction within a threshold of acceptable loss of predictive performance. Both the list of percentages (redPerc list) and the predictive loss threshold (PLT) govern the stopping conditions of the hyper-parametrization process, and both can be set by the data expert user (a schematic sketch of this search loop follows this list).
- Finally, the capability of being a scalable methodology, able to deal with big data in reasonable times by taking advantage of parallel computing; the technical implementation details are described in Section 3.2.
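To make the search loop concrete, here is a schematic Python sketch of the descending hyper-parametrization described above. It is a reconstruction from the description, not the authors' code; `uniform_reduction`, `fit_classifier`, and `gm_score` are hypothetical callables supplied by the caller, and `aggregate=min` mirrors the minimum aggregation criterion used in the experiments.

```python
def hyper_parametrization(folds, red_perc_list, gm_baseline, plt_pct,
                          uniform_reduction, fit_classifier, gm_score,
                          aggregate=min):
    """Return the recommended percentage reduction (RPR), or None if the
    dataset is deemed non-reducible.

    folds          -- k (train, validation) pairs from a cross-validation split
    red_perc_list  -- candidate reduction percentages in descending order
    gm_baseline    -- GM achieved with the full, non-reduced training data
    plt_pct        -- acceptable predictive loss threshold, in percent
    aggregate      -- criterion to combine the k fold scores (paper uses minimum)
    """
    for red_perc in red_perc_list:  # try the most aggressive reduction first
        scores = []
        for train, valid in folds:
            condensed = uniform_reduction(train, red_perc)
            model = fit_classifier(condensed)
            scores.append(gm_score(model, valid))
        # Accept the first (largest) percentage whose aggregated GM stays
        # within PLT percent of the baseline GM.
        if aggregate(scores) >= gm_baseline * (1.0 - plt_pct / 100.0):
            return red_perc
    return None  # corresponds to the '–' entries in the results tables
```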
- First, an initial reduction in dimensionality (vertical or column-wise reduction) is achieved by selecting the most important features.
- Second, if the feature selection (FS) produces redundant instances, i.e., exactly duplicated “rows”, they are removed (horizontal or row-wise reduction).
- Third, the main process is carried out by means of a hyper-parametrization that decides what percentage of instances can be removed while maintaining a desired quality threshold. From the resulting decision, a final and intensive horizontal reduction is carried out.
Algorithm 1 Pseudo-code showing the use of FDR2-BD for data reduction.
Require: data, redPercList, GMbsl, PLT, k, aggCriterion
1: (recommFeats, RPR) ← hyper-parametrization over data with redPercList, GMbsl, PLT, k, and aggCriterion
2: (training, test) ← split data
3: tra ← select the recommFeats features from training
4: condensedTra ← apply UniformReduction over tra with RPR
5: model ← train the classifier using condensedTra
6: results ← classify test from model
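The deployment phase of Algorithm 1 maps almost one-to-one onto PySpark primitives. The following is a minimal sketch under illustrative assumptions: the dataset path, column names, recommFeats, and RPR values are placeholders rather than outputs reported in the paper.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("fdr2bd-deploy").getOrCreate()
data = spark.read.parquet("data/some_dataset.parquet")  # hypothetical path

recomm_feats = ["f1", "f3", "f7"]  # placeholder MIFs from the hyper-parametrization
rpr = 99                           # placeholder recommended percentage reduction

# Line 2: split into training and test partitions.
training, test = data.randomSplit([0.8, 0.2], seed=42)

# Line 3: vertical reduction -- keep only the recommended features (plus label).
tra = training.select(*recomm_feats, "label")

# Line 4: horizontal reduction -- stratified uniform sampling keeps
# (100 - RPR)% of each class, preserving the class distribution.
keep = 1.0 - rpr / 100.0
condensed_tra = tra.sampleBy("label", fractions={0: keep, 1: keep}, seed=42)

# Lines 5-6: train on the condensed data and classify the untouched test set.
assembler = VectorAssembler(inputCols=recomm_feats, outputCol="features")
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features",
                            impurity="entropy", maxDepth=5)
model = dt.fit(assembler.transform(condensed_tra))
results = model.transform(assembler.transform(test))
```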
3.2. Technical Implementation Highlights
4. Experimental Environment
4.1. Datasets’ Description
4.2. Comparison Methods, Classifier, and Parameters
4.3. Evaluation Metrics: Condensation and Predictive Performance
- Dimensionality reduction percentage (DimRed): percentage of features removed with respect to the original number of features in the training data.
- Size reduction percentage (SizeRed): percentage of examples removed with respect to the original data volume (a minimal sketch of both computations follows this list).
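Both metrics are simple ratios over the data before and after reduction. A minimal sketch of the computations (function and argument names are illustrative):

```python
def dim_red(n_selected_feats: int, n_original_feats: int) -> float:
    """DimRed: percentage of features removed relative to the original count."""
    return 100.0 * (1.0 - n_selected_feats / n_original_feats)


def size_red(n_remaining_examples: int, n_original_examples: int) -> float:
    """SizeRed: percentage of examples removed relative to the original volume."""
    return 100.0 * (1.0 - n_remaining_examples / n_original_examples)
```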
4.4. Scalability Evaluation
4.5. Infrastructure
5. Experimental Results
5.1. Data Volume Reduction Study
5.2. The Influence of FS in the Data Volume Downsizing
5.3. Data Condensation Details and Performance Evaluation
5.4. Scalability Evaluation
6. Concluding Remarks and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
MIFs | Most important features
RPR | Recommended percentage reduction
PLT | Predictive loss threshold
Spark Operation | Description
---|---
map | Applies a transformation to each element of a data structure
filter | Selects all data structure elements that satisfy a predicate
sample | Takes a random sample of a dataset
union | Combines two Spark data structures row-wise
kFold | Splits data into k pairs of training and validation sets, following Bernoulli sampling
dropDuplicates | Eliminates duplicated rows of a DataFrame
RandomForestClassifier, DecisionTreeClassifier | Distributed learning algorithms for data classification
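As a quick illustration of how these primitives combine, here is a self-contained toy example (not taken from the tool's code base):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-ops-demo").getOrCreate()

# Tiny toy DataFrame with one feature column and a binary label.
df = spark.createDataFrame([(1.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)],
                           ["f1", "label"])

majority = df.filter(df.label == 0)        # filter: rows satisfying a predicate
sampled = df.sample(fraction=0.5, seed=7)  # sample: random subset of the rows
combined = majority.union(sampled)         # union: row-wise combination
unique = combined.dropDuplicates()         # dropDuplicates: remove exact duplicates
unique.show()
```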
Dataset | #Ex. | #Atts. | #(Class0; Class1) | %(Class0; Class1) | Co/Di/Ca (#continuous/discrete/categorical atts.) | Size (MB) |
---|---|---|---|---|---|---|
Agrawal | 1,000,000 | 9 | (672,044; 327,955) | (67.2; 32.8) | 4/2/3 | 71.6 |
airlines | 539,383 | 7 | (299,119; 240,264) | (55.46; 44.54) | 4/0/3 | 18.3 |
BNG_Australian | 1,000,000 | 14 | (573,051; 426,949) | (57.31; 42.69) | 14/0/0 | 80.8 |
BNG_heart | 1,000,000 | 13 | (555,946; 444,054) | (55.59; 44.41) | 6/0/7 | 65.7 |
census | 140,246 | 41 | (132,085; 8162) | (94.18; 5.82) | 1/12/28 | 68.7 |
click_prediction | 1,963,972 | 11 | (1,636,593; 327,379) | (83.33; 16.67) | 0/11/0 | 136.3 |
covtype1 | 581,012 | 54 | (369,307; 211,705) | (63.56; 36.44) | 10/0/44 | 71.7 |
covtype1_vs_2 | 495,173 | 54 | (283,468; 211,705) | (57.25; 42.75) | 10/0/44 | 61.1 |
covtype2 | 581,012 | 54 | (297,544; 283,468) | (51.21; 48.79) | 10/0/44 | 71.7 |
creditCard | 284,015 | 30 | (283,540; 480) | (99.83; 0.17) | 28/1/0 | 145.2 |
ECBDL14-10mill-90 | 12,000,000 | 90 | (11,760,000; 240,000) | (98; 2) | 60/0/30 | 3481.6 |
ethylene_ECO_E_LH | 4,208,261 | 16 | (3,895,861; 312,400) | (92.58; 7.42) | 16/0/0 | 517 |
ethylene_EM_E_LH | 4,178,500 | 16 | (3,840,313; 338,191) | (91.91; 8.09) | 16/0/0 | 518.7 |
fars_fatal | 100,968 | 29 | (58,852; 42,116) | (58.29; 41.71) | 0/1/28 | 59 |
HEPMASS_IR_16 | 5,578,255 | 28 | (5,250,122; 328,133) | (94.12; 5.88) | 28/0/0 | 1331.2 |
higgs | 11,000,000 | 28 | (5,827,686; 5,172,314) | (52.98; 47.02) | 28/0/0 | 7680 |
HIGGS_IR_16 | 6,193,440 | 28 | (5,829,120; 364,320) | (94.12; 5.88) | 28/0/0 | 3378.7 |
homeCredit | 307,511 | 171 | (282,686; 24,825) | (91.93; 8.07) | 1/9/161 | 113.1 |
hyperplane | 1,000,000 | 10 | (500,007; 499,993) | (50; 50) | 10/0/0 | 91.4 |
klaverjas | 981,541 | 34 | (528,339; 453,202) | (53.83; 46.17) | 2/32/0 | 79.2 |
MiniBooNE | 129,590 | 49 | (93,101; 36,489) | (71.84; 28.16) | 49/0/0 | 67.7 |
rlcp | 5,749,132 | 4 | (5,728,201; 20,931) | (99.64; 0.36) | 1/3/0 | 180.1 |
skin | 245,057 | 3 | (194,198; 50,858) | (79.25; 20.75) | 0/4/0 | 3 |
susy | 5,000,000 | 18 | (2,712,173; 2,287,827) | (54.24; 45.76) | 18/0/0 | 1740.8 |
SUSY_IR_16 | 2,881,684 | 18 | (2,712,173; 169,511) | (94.12; 5.88) | 18/0/0 | 1022.5 |
Algorithm | Parameter | Value
---|---|---
FDR2-BD | PLT | 1%, 5%
 | redPerc list (%) | {99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 80, 70, 60, 50, 40, 30, 20, 10}
 | aggregation criterion | minimum
 | internal k-fCV | 5
FS | covered variance | up to 90%
DT | impurity | entropy
 | maxDepth | 5
RUS | final ratio | 1:1
FCNN_MR | k | 1
 | ReducerType | join
 | Predicted Positive | Predicted Negative
---|---|---
Actual Positive | True positive (TP) | False negative (FN)
Actual Negative | False positive (FP) | True negative (TN)
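The predictive performance measure used throughout the experiments is the geometric mean (GM) of the true-positive and true-negative rates, a standard choice for imbalanced classification. Reconstructed from the confusion matrix above (this is the standard definition, not a formula quoted verbatim from the paper):

```latex
GM = \sqrt{TP_{rate} \cdot TN_{rate}}, \qquad
TP_{rate} = \frac{TP}{TP + FN}, \qquad
TN_{rate} = \frac{TN}{TN + FP}
```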
Dataset | RPR [%], PLT = 1% | RPR [%], PLT = 5% | Dataset | RPR [%], PLT = 1% | RPR [%], PLT = 5%
---|---|---|---|---|---
Agrawal | 99 | 99 | fars_fatal | 30 | 94
airlines | – | – | HEPMASS_IR_16 | 98 | 99
BNG_Australian | 99 | 99 | higgs | 98 | 99
BNG_heart | 99 | 99 | HIGGS_IR_16 | 99 | 99
census | – | – | homeCredit | 93 | 98
click_prediction | 99 | 99 | hyperplane | 99 | 99
covtype1 | 97 | 99 | klaverjas | 99 | 99
covtype1_vs_2 | 96 | 98 | MiniBooNE | 95 | 99
covtype2 | 94 | 99 | rlcp | 99 | 99
creditCard | 98 | 99 | skin | – | 97
ECBDL14-10mill-90 | 99 | 99 | susy | 99 | 99
ethylene_ECO_E_LH | 97 | 99 | SUSY_IR_16 | 98 | 99
ethylene_EM_E_LH | 99 | 99 | | |
Dataset | GM (BSL / FCNN_MR / FDR2-BD) | SizeRed [%] (BSL / FCNN_MR / FDR2-BD) | DimRed [%] (BSL / FCNN_MR / FDR2-BD)
---|---|---|---
Agrawal | 0.9475 / 0.9506 / 0.9476 | 42 / 38 / 99 | 67 / 0 / 67
BNG_Australian | 0.8606 / 0.8332 / 0.8561 | 14 / 66 / 99 | 71 / 0 / 71
BNG_heart | 0.8541 / 0.8549 / 0.8534 | 23 / 55 / 99 | 62 / 0 / 62
click_prediction | 0.6112 / 0.2254 / 0.6171 | 68 / 61 / 99 | 64 / 0 / 64
covtype1 | 0.7534 / 0.7505 / 0.7585 | 56 / 76 / 98 | 87 / 0 / 89
covtype1_vs_2 | 0.7557 / 0.7487 / 0.7492 | 38 / 76 / 98 | 85 / 0 / 91
covtype2 | 0.7411 / 0.7496 / 0.7351 | 37 / 73 / 96 | 83 / 0 / 85
creditCard | 0.9269 / 0.8662 / 0.9239 | 99 / 99 / 98 | 67 / 0 / 73
ECBDL14-10mill-90 | 0.6969 / 0.1514 / 0.6959 | 96 / 88 / 99 | 79 / 0 / 82
ethylene_ECO_E_LH | 0.8814 / 0.6050 / 0.8777 | 85 / 98 / 97 | 50 / 0 / 50
ethylene_EM_E_LH | 0.8692 / 0.7295 / 0.8693 | 84 / 97 / 99 | 44 / 0 / 44
HEPMASS_IR_16 | 0.8284 / 0.7347 / 0.8252 | 88 / 85 / 98 | 86 / 0 / 86
higgs | 0.6509 / 0.6634 / 0.6488 | 38 / 39 / 99 | 82 / 0 / 82
HIGGS_IR_16 | 0.6557 / 0.2771 / 0.6576 | 88 / 75 / 99 | 79 / 0 / 79
homeCredit | 0.5953 / 0.1388 / 0.5954 | 84 / 74 / 93 | 90 / 0 / 91
hyperplane | 0.3494 / 0.3432 / 0.5263 | 0 / 42 / 99 | 60 / 0 / 60
klaverjas | 0.8006 / 0.8070 / 0.8072 | 0 / 41 / 99 | 79 / 0 / 79
MiniBooNE | 0.8743 / 0.8458 / 0.8689 | 49 / 72 / 95 | 75 / 0 / 78
rlcp | 0.9309 / 0.9308 / 0.9306 | 99 / 99 / 99 | 25 / 0 / 25
susy | 0.7612 / 0.7542 / 0.7620 | 0 / 52 / 99 | 78 / 0 / 78
SUSY_IR_16 | 0.7613 / 0.5604 / 0.7614 | 88 / 82 / 98 | 78 / 0 / 78
Method | GM: Avg | GM: Diff | GM: W/T/L | SizeRed: Avg | SizeRed: Diff | SizeRed: W/T/L | DimRed: Avg | DimRed: Diff | DimRed: W/T/L
---|---|---|---|---|---|---|---|---|---
BSL | 0.7669 | 0.0077 | 4/15/2 | 56 | 42 | 19/1/1 | 71 | 1 | 7/14/0
FCNN_MR | 0.6438 | 0.1308 | 14/5/2 | 71 | 27 | 18/1/2 | 0 | 72 | 21/0/0
FDR2-BD | 0.7746 | – | – | 98 | – | – | 72 | – | –