Effect of Feature Selection on the Accuracy of Music Popularity Classification Using Machine Learning Algorithms
Abstract
1. Introduction
- (1) This article shows how effectively feature selection identifies the properties that most influence the popularity of songs.
- (2) This article provides a comprehensive evaluation comparing the performance of machine learning algorithms with and without feature selection. Performance criteria are used to determine the best algorithm.
2. Literature Survey
3. Dataset
4. Methodology
4.1. Data Pre-Processing
Detecting and Extracting Outliers from the Dataset
- Outlier Data Query and Deletion Algorithm
- Q1 = np.percentile(data[c], 25)  # first quartile of column c
- Q3 = np.percentile(data[c], 75)  # third quartile of column c
- IQR = Q3 - Q1
- outlier_step = 1.5 * IQR
- outlier_list_col = data[(data[c] < Q1 - outlier_step) | (data[c] > Q3 + outlier_step)].index
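The outlier rule listed above can be illustrated with a small, dependency-free Python sketch. It is not the authors' code: the helper names `percentile`, `outlier_rows` and `drop_rows` and the toy values are assumptions made for illustration, and the percentile helper reproduces the linear interpolation that `np.percentile` uses by default.

```python
from typing import Dict, List, Set


def percentile(values: List[float], q: float) -> float:
    """q-th percentile using linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def outlier_rows(data: Dict[str, List[float]], columns: List[str]) -> Set[int]:
    """Indices of rows lying outside the 1.5 * IQR fences in any given column."""
    flagged: Set[int] = set()
    for c in columns:
        q1 = percentile(data[c], 25)
        q3 = percentile(data[c], 75)
        outlier_step = 1.5 * (q3 - q1)
        for i, value in enumerate(data[c]):
            if value < q1 - outlier_step or value > q3 + outlier_step:
                flagged.add(i)
    return flagged


def drop_rows(data: Dict[str, List[float]], rows: Set[int]) -> Dict[str, List[float]]:
    """Copy of the column-oriented table without the flagged row indices."""
    return {
        name: [v for i, v in enumerate(col) if i not in rows]
        for name, col in data.items()
    }


# Hypothetical toy data: the last tempo value is far outside the fences.
toy = {
    "tempo": [100.0, 110.0, 120.0, 115.0, 105.0, 400.0],
    "energy": [0.5, 0.6, 0.55, 0.58, 0.52, 0.57],
}
bad = outlier_rows(toy, ["tempo", "energy"])
clean = drop_rows(toy, bad)
print(sorted(bad), len(clean["tempo"]))
```

Collecting the flagged indices across all columns before deleting means a row is dropped once, even if it is extreme in several features.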
4.2. Categorizing the Popularity Variable
4.3. Feature Selection
- Filter methods
- Wrapper method (Forward, Backward)
- Embedded methods (Lasso-L1, Ridge-L2 Regression)
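Of the three families listed above, the filter approach is the simplest to show in a few lines. The sketch below ranks features by the absolute Pearson correlation with the target and keeps the top `k`; the source does not state which filter criterion or cut-off was used, so the scoring rule, the function names `pearson` and `select_top_k`, and the toy values are all assumptions.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def select_top_k(features: Dict[str, List[float]], target: List[float], k: int) -> List[str]:
    """Filter method: score each feature independently of any model, keep the best k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy values, not taken from the dataset.
features = {
    "danceability": [0.1, 0.2, 0.3, 0.4, 0.5],
    "key": [1.0, 8.0, 9.0, 2.0, 5.0],
    "loudness": [-9.0, -8.0, -6.0, -7.0, -5.0],
}
popularity = [10.0, 20.0, 30.0, 40.0, 50.0]
selected = select_top_k(features, popularity, k=2)
print(selected)
```

Wrapper and embedded methods differ in that they score feature subsets through a fitted model rather than through a model-free statistic such as this one.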
5. Results and Discussion
5.1. Separate Dataset
5.2. Classification Using Logistic Regression-Random Forest-KNN Algorithms and Evaluation
6. Conclusions and Recommendations
Author Contributions
Funding
Conflicts of Interest
References
- Li, T.; Ogihara, M.; Tzanetakis, G. Music Data Mining; CRC Press: Boca Raton, FL, USA, 2012; Available online: https://www.routledge.com/Music-Data-Mining/Li-Ogihara-Tzanetakis/p/book/9781439835524 (accessed on 15 July 2022).
- Sloboda, J.A.; O’Neill, S.A.; Ivaldi, A. Functions of music in everyday life: An exploratory study using the Experience Sampling Method. Music. Sci. 2001, 5, 9–32.
- Prabhu, N.R.; Vasko, J.A.; Bein, D.; Bein, W. Music genre classification using data mining and machine learning. Inf. Technol.-New Gener. 2018, 738, 397–403.
- Popular Music. Available online: https://tr.wikipedia.org/wiki/Pop%C3%BCler_m%C3%BCzik (accessed on 29 June 2022).
- Spotify. Audio Features & Analysis. 2021. Available online: https://developer.spotify.com/discover/ (accessed on 10 October 2022).
- Khan, M.A.; Abbas, S.; Raza, A.; Khan, F.; Whangbo, T. Emotion Based Signal Enhancement Through Multisensory Integration Using Machine Learning. CMC-Comput. Mater. Contin. 2022, 71, 5911–5931.
- Ayvaz, U.; Gürüler, H.; Khan, F.; Ahmed, N.; Whangbo, T. Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning. CMC-Comput. Mater. Contin. 2022, 71, 5511–5521.
- Iqbal, M.W.; Naqvi, M.R.; Khan, M.A.; Khan, F.; Whangbo, T. Mobile Devices Interface Adaptivity Using Ontologies. CMC-Comput. Mater. Contin. 2022, 71, 4767–4784.
- Laila, U.E.; Mahboob, K.; Khan, A.W.; Khan, F.; Taekeun, W. An Ensemble Approach to Predict Early-Stage Diabetes Risk Using Machine Learning: An Empirical Study. Sensors 2022, 22, 5247.
- Zhang, B.; Kreitz, G.; Isaksson, M.; Ubillos, J.; Urdaneta, G.; Pouwelse, J.A.; Epema, D. Understanding User Behavior in Spotify. In Proceedings of the 2013 Proceedings IEEE INFOCOM, Turin, Italy, 14–19 April 2013; pp. 220–224.
- Aguiar, L.; Waldfogel, J. Platforms, Promotion and Product Discovery: Evidence from Spotify Playlist; National Bureau of Economic Research: Cambridge, MA, USA, 2018; pp. 1–44.
- Goldmann, M.; Kreitz, G. Measurements on the Spotify Peer-Assisted Music on Demand Streaming System. In Proceedings of the IEEE International Conference on Peer-to-Peer Computing, Kyoto, Japan, 31 August–September 2011.
- Vonderau, P. The Spotify Effect: Digital Distribution and Financial Growth. SAGE J. 2017, 20, 3–19.
- Jacobson, K.; Murali, V.; Newett, E.; Whitman, B.; Yon, R. Music personalization at Spotify. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; p. 373.
- Efe, A. Example Of Online Music Platform as A Display Advertising Space: Spotify. Int. J. Public Relat. Advert. Stud. 2019, 2, 131–146.
- Canyakan, S. Audio History: Audio-Specific Music Technology and Origin. Uşak Univ. J. Soc. Sci. 2017, 10, 171–191.
- An, Y.; Sun, S.; Wang, S. Naive Bayes classifiers for music emotion classification based on lyrics. In Proceedings of the 2017 IEEE/ACIS 16th International Conference on Computer and Information Science, Wuhan, China, 24–26 May 2017; pp. 635–638.
- Guimaraes, P.; Froes, J.; Costa, D.; Freitas, L.A. A comparison of identification methods of Brazilian music styles by lyrics. In Proceedings of the Fourth Widening Natural Language Processing Workshop, Online, 4 December 2020; pp. 61–63.
- Duru, S.; Yüreğir, O.H. Data Cleaning for Data Mining and Applications on Turkish Classical Music Data. J. Econ. Adm. Sci. 2019, 3, 150–159.
- Karatana, A.; Yıldız, O. Music Genre Classification with Machine Learning Techniques. In Proceedings of the 25th Signal Processing and Communications Applications Conference, Antalya, Turkey, 15–18 May 2017; pp. 1–4.
- Sciandra, M.; Spera, I.C. A model-based approach to Spotify data analysis: A Beta GLMM. J. Appl. Stat. 2022, 49, 214–229.
- Apostolova-Trpkovska, M.; Kajtazi, A.; Abazi Bexheti, L.; Kadriu, A. Applying Data Mining and Data Visualization within the Scope of Audio Data Using Spotify; IADIS: Lisbon, Portugal, 2019; pp. 197–204.
- Pareek, P.; Shankar, P.; Sakariya, N. Predicting Music Popularity Using Machine Learning Algorithm and Music Metrics Available in Spotify. J. Dev. Econ. Manag. Res. Stud. JDMS 2022, 9, 10–19.
- Cueva Mora, A.; Tierney, B. Feature Engineering vs. Feature Selection vs. Hyperparameter Optimization in the Spotify Song Popularity Dataset. In Proceedings of the Tenth International Conference on Data Analytics, Barcelona, Spain, 3–7 October 2021.
- Zangerle, E.; Vötter, M.; Huber, R.; Yang, Y.H. Hit Song Prediction: Leveraging Low-and High-Level Audio Features. In Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 4–8 November 2019; pp. 319–326.
- Nijkamp, R. Prediction of Product Success: Explaining Song Popularity by Audio Features from Spotify Data. Bachelor’s Thesis, University of Twente, Enschede, The Netherlands, 2018.
- Rahardwika, D.S.; Rachmawanto, E.H.; Sari, C.A.; Susanto, A.; Mulyono, I.U.W.; Astuti, E.Z.; Fahmi, A. Effect of feature selection on the accuracy of music genre classification using SVM classifier. In Proceedings of the 2020 International Seminar on Application for Technology of Information and Communication (iSemantic), Semarang, Indonesia, 19–20 September 2020; pp. 7–11.
- Available online: https://www.kaggle.com/tomigelo/spotify-audio-features (accessed on 15 July 2022).
- Pan, J.S.; Tian, A.Q.; Chu, S.C.; Li, J.B. Improved binary pigeon-inspired optimization and its application for feature selection. Appl. Intell. 2021, 51, 8661–8679.
- Hu, P.; Pan, J.S.; Chu, S.C. Improved binary grey wolf optimizer and its application for feature selection. Knowl.-Based Syst. 2020, 195, 105746.
- Bircan, H. Logistic Regression Analysis: An Application on Medical Data. Kocaeli Univ. J. Soc. Sci. 2004, 8, 185–208.
- Özdemir, S. Potential Distribution Modelling and Mapping Using Random Forest Method: An Example of Yukarigökdere District. Turk. J. For. 2018, 19, 51–56.
- Aksoy, G.; Ataş, P.; Karabatak, M. Investigation of shopping habits using data mining classification algorithms. In Proceedings of the 2019 1st International Informatics and Software Engineering Conference (UBMYK), Ankara, Turkey, 6–7 November 2019; pp. 1–5.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Cover, T.; Hart, P. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
- Jiang, Y.; Zhou, Z.H. Editing Training Data for kNN Classifiers with Neural Network Ensemble. Lect. Notes Comput. Sci. 2004, 3173, 356–361.
- Veranyurt, Ü.; Deveci, A.; Esen, M.F.; Veranyurt, O. Disease Classification by Machine Learning Techniques: Random Forest, K-Nearest Neighbor and Adaboost Algorithms Applications. Int. J. Health Manag. Strateg. Res. 2020, 6, 275–286. Available online: https://dergipark.org.tr/en/pub/usaysad/issue/56571/786740 (accessed on 15 July 2022).
- Han, J.; Pei, J.; Kamber, M. Data Mining: Concepts and Techniques; Elsevier: Amsterdam, The Netherlands, 2011.
- Japkowicz, N. Performance Evaluation for Learning Algorithms; Cambridge University Press: Cambridge, UK, 2011.
- Alan, A.; Karabatak, M. Evaluation of the Factors Affecting Performance on the Datasets—Classification Relationship. Fırat Univ. J. Eng. Sci. 2020, 32, 531–540.
- Tzanetakis, G.; Cook, P. Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 2002, 10, 293–302.
Study | Research Problem | Objectives | Contribution | Domain | Simulator | Metrics |
---|---|---|---|---|---|---|
[12] | Determinants of popularity | Determination within the set of characteristics | Improving the success of newly released songs | Spotify Web API | RStudio | Coefficients, standard error and z value |
[13] | Detection of audio in Spotify music data | Identification of popular songs | Enabling the use of more relevant audio features | Spotify Web API | RStudio | Correlation |
[14] | Prediction of a song’s capacity | Determining the capacity of a song | Improving the success of newly released songs | Spotify and other platforms | Python | Accuracy, precision, recall and F1-score |
[15] | Comparing feature engineering, feature selection and hyperparameter optimization | Predicting popularity in the Spotify Song Popularity dataset | Using the algorithm in popularity estimation | Spotify Web API | Kaggle | Cross-validation (CV) and root mean square error |
[16] | Acoustic feature extraction from songs | Predicting low- and high-level audio features volume characteristics | Ensuring the success of a new song | All music platforms | Python | Root mean squared error (RMSE) and mean absolute error (MAE) |
[17] | Attribute approach based on Spotify’s audio features | Determining song audio features and measuring song popularity | Generating hypotheses about vocal characteristics and song popularity, and testing the accuracy of these hypotheses | Spotify Web API | SPSS | R² |
Proposed method | Determination of sound features | Effectiveness of the sound features | Improving the rhythm of songs | Spotify Web API | Anaconda (Jupyter Notebook) | Correlation, accuracy, F1-score |
Acousticness | Danceability | Duration_ms | Energy | Instrumentalness | Key | Liveness | Loudness | Mode | Speechiness | Tempo | Time_signature | Valence | Popularity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.005 | 0.743 | 238373 | 0.339 | 0 | 1 | 0.0812 | −7.678 | 1 | 0.409 | 203.927 | 4 | 0.118 | 15 |
0.024 | 0.846 | 214800 | 0.557 | 0 | 8 | 0.286 | −7.259 | 1 | 0.457 | 159.009 | 4 | 0.371 | 0 |
0.025 | 0.603 | 138913 | 0.723 | 0 | 9 | 0.0824 | −5.89 | 0 | 0.0454 | 114.966 | 4 | 0.382 | 56 |
Feature Selection | Logistic Regression | Random Forest | KNN |
---|---|---|---|
Yes | 95.14% | 93.54% | 93.54% |
No | 95.15% | 93.40% | 91.64% |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Khan, F.; Tarimer, I.; Alwageed, H.S.; Karadağ, B.C.; Fayaz, M.; Abdusalomov, A.B.; Cho, Y.-I. Effect of Feature Selection on the Accuracy of Music Popularity Classification Using Machine Learning Algorithms. Electronics 2022, 11, 3518. https://doi.org/10.3390/electronics11213518