A Deeper Look at Sheet Music Composer Classification Using Self-Supervised Pretraining
Abstract
1. Introduction
2. Materials and Methods
2.1. Pretraining
2.2. Finetuning
2.3. Inference
3. Results
3.1. Experimental Setup
3.2. Fragment Classification Results
3.3. Full-Page Classification Results
4. Discussion
4.1. Effect of Data Augmentation
4.2. Single Hand Models
4.3. t-SNE
4.4. Unseen Composer Classification
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
OMR | Optical Music Recognition |
IMSLP | International Music Score Library Project |
KNN | K-Nearest Neighbors |
SVM | Support Vector Machine |
CNN | Convolutional Neural Network |
MIDI | Musical Instrument Digital Interface |
NLP | Natural Language Processing |
ULMFiT | Universal Language Model Fine-tuning |
LSTM | Long Short-Term Memory |
GPT-2 | Generative Pre-trained Transformer 2 |
BERT | Bidirectional Encoder Representations from Transformers |
RoBERTa | A Robustly Optimized BERT Pretraining Approach |
AWD-LSTM | Average Stochastic Gradient Descent Weight-Dropped LSTM |
PDF | Portable Document Format |
BPE | Byte Pair Encoding |
t-SNE | t-Distributed Stochastic Neighbor Embedding |
MRR | Mean Reciprocal Rank |
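The abbreviation list includes MRR. For reference, the standard definition averages the reciprocal of the rank at which the correct answer appears: MRR = (1/|Q|) Σ 1/rank_i over the |Q| queries, with a query contributing 0 if the correct answer is missing from its list. The minimal Python sketch below implements that standard definition; the function names and the example composer rankings are hypothetical, and this section does not state exactly how the paper constructs its ranked lists.

```python
from typing import Sequence


def reciprocal_rank(ranked_labels: Sequence[str], true_label: str) -> float:
    """Return 1/rank of true_label in ranked_labels (ranks start at 1), or 0.0 if it is absent."""
    for position, label in enumerate(ranked_labels, start=1):
        if label == true_label:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """Average the reciprocal rank of the correct label over all queries."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not rankings:
        raise ValueError("at least one query is required")
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, truths))
    return total / len(rankings)


if __name__ == "__main__":
    # Hypothetical ranked predictions for three queries (not results from the paper).
    rankings = [
        ["Bach", "Mozart", "Haydn"],                  # correct label at rank 1
        ["Chopin", "Liszt", "Bach"],                  # correct label at rank 2
        ["Haydn", "Mozart", "Schubert", "Beethoven"], # correct label at rank 4
    ]
    truths = ["Bach", "Liszt", "Beethoven"]
    print(mean_reciprocal_rank(rankings, truths))  # equals (1 + 1/2 + 1/4) / 3
```

Unlike top-1 accuracy, MRR gives partial credit when the correct class is ranked second or third rather than first.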
Composer | Pieces | Pages | Features |
---|---|---|---|
Bach | 176 | 969 | 245k |
Beethoven | 70 | 666 | 153k |
Chopin | 82 | 665 | 118k |
Haydn | 50 | 347 | 88k |
Liszt | 169 | 2272 | 394k |
Mozart | 54 | 468 | 116k |
Schubert | 75 | 578 | 139k |
Schumann | 37 | 500 | 105k |
Scriabin | 74 | 686 | 108k |
Total | 787 | 7151 | 1.47M |
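As a quick consistency check on the table, the sketch below re-adds each column and computes each composer's share of the features. The counts are copied from the table, with the rounded "k" values expanded to thousands, so the feature total comes out to 1,466,000, which matches the reported 1.47M after rounding. The helper names are hypothetical. The same arithmetic makes the class imbalance explicit: Liszt accounts for roughly 27% of all features, while Haydn accounts for roughly 6%.

```python
# Per-composer (pieces, pages, features) copied from the table above.
# Feature counts are the rounded thousands shown in the table (e.g. 245k -> 245_000).
DATA = {
    "Bach":      (176,  969, 245_000),
    "Beethoven": (70,   666, 153_000),
    "Chopin":    (82,   665, 118_000),
    "Haydn":     (50,   347,  88_000),
    "Liszt":     (169, 2272, 394_000),
    "Mozart":    (54,   468, 116_000),
    "Schubert":  (75,   578, 139_000),
    "Schumann":  (37,   500, 105_000),
    "Scriabin":  (74,   686, 108_000),
}


def column_totals(data):
    """Sum pieces, pages and features across all composers."""
    pieces = sum(p for p, _, _ in data.values())
    pages = sum(pg for _, pg, _ in data.values())
    features = sum(f for _, _, f in data.values())
    return pieces, pages, features


def feature_shares(data):
    """Fraction of all features contributed by each composer."""
    _, _, total = column_totals(data)
    return {name: f / total for name, (_, _, f) in data.items()}


if __name__ == "__main__":
    print(column_totals(DATA))  # (787, 7151, 1466000)
    shares = feature_shares(DATA)
    largest = max(shares, key=shares.get)
    smallest = min(shares, key=shares.get)
    print(largest, round(shares[largest], 3), smallest, round(shares[smallest], 3))
```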
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yang, D.; Ji, K.; Tsai, T. A Deeper Look at Sheet Music Composer Classification Using Self-Supervised Pretraining. Appl. Sci. 2021, 11, 1387. https://doi.org/10.3390/app11041387