Comparison Among AlexNet, GoogLeNet and ResNet-18 in Automatic Music Genre Classification

Xinyu Hong

doi:10.54097/j1ry4d81

Keywords:

Deep learning, music genre classification, convolutional neural networks.

Abstract

This paper compares the performance of three well-known convolutional neural networks, AlexNet, GoogLeNet and ResNet-18, in classifying music genres. Experiments on a prominent dataset, the Free Music Archive (FMA), show that ResNet-18 performs much better than AlexNet and GoogLeNet at classifying music genres on a relatively small dataset. The classification accuracy of each model for each music genre was also recorded, indicating that different models can be better at identifying different genres. For several genres, including blues, hip-hop and international, accuracy was not closely related to the choice of model. Overall, ResNet-18 reached the highest average classification accuracy, at approximately 80%, while AlexNet performed best at identifying hip-hop music and GoogLeNet showed relatively small differences in recognition rate across genres. These findings can serve as a reference for future music genre classification tasks and for personalized music recommendation based on big data.
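
The abstract reports both per-genre accuracy and an average accuracy but does not say how the average is taken. The following is a minimal sketch, in Python with only the standard library, of one plausible way to compute these figures from a model's predicted labels. The function names and the toy labels are hypothetical and invented purely for illustration; they are not the paper's code, data or results. The sketch also shows that "average accuracy" can mean either the overall fraction of correctly classified clips or the unweighted mean of the per-genre accuracies, and these two numbers differ when genres have unequal numbers of test clips.

```python
from collections import Counter


def per_genre_accuracy(y_true, y_pred):
    """Return {genre: fraction of that genre's clips that were predicted correctly}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)  # number of test clips per true genre
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per genre
    return {genre: correct[genre] / n for genre, n in totals.items()}


def overall_accuracy(y_true, y_pred):
    """Fraction of all clips classified correctly (each clip weighted equally)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_per_genre_accuracy(y_true, y_pred):
    """Unweighted mean of per-genre accuracies (each genre weighted equally)."""
    accs = per_genre_accuracy(y_true, y_pred)
    return sum(accs.values()) / len(accs)


# Hypothetical toy labels, invented for illustration only (not results from the paper).
y_true = ["blues", "blues", "hip-hop", "hip-hop",
          "international", "international", "international", "blues"]
y_pred = ["blues", "hip-hop", "hip-hop", "hip-hop",
          "international", "blues", "international", "blues"]

if __name__ == "__main__":
    for genre, acc in sorted(per_genre_accuracy(y_true, y_pred).items()):
        print(f"{genre}: {acc:.2f}")
    print(f"overall: {overall_accuracy(y_true, y_pred):.2f}")
    print(f"mean per-genre: {mean_per_genre_accuracy(y_true, y_pred):.2f}")
```

With these toy labels, overall accuracy is 0.75 while the mean of the per-genre accuracies is about 0.78, because the perfectly classified hip-hop class has fewer clips and counts for more when every genre is weighted equally.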