An EB-enhanced CNN Model for Piano Music Transcription


Abstract

Automatic Music Transcription (AMT) is an important task in Music Information Retrieval (MIR). Many researchers have focused on Convolutional Neural Network (CNN) architectures for transcription. In this paper, we construct a CNN-based piano music transcription model that takes an energy-balanced (EB) constant-Q transform (CQT) spectrogram as input, which we call the EB-enhanced CNN model. Unlike standard CNN-based methods, the proposed model balances the energy of the input features, so that many pitches previously missed due to weak energy can be detected. Training and evaluation are performed on MAPS, a public dataset for piano transcription. Our technique achieves a 3.53% F1-score improvement over the state-of-the-art method on the MAPS ENSTDkCl subset.
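The abstract does not spell out the EB transform itself. The sketch below shows one plausible way to balance energy in a CQT spectrogram, assuming log compression followed by per-frequency-bin standardization; the sample rate, hop length, bin count, and normalization scheme are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of an energy-balanced CQT input feature (assumed scheme,
# not the paper's exact EB formulation).
import numpy as np
import librosa

def energy_balanced_cqt(path, sr=16000, hop_length=512,
                        n_bins=252, bins_per_octave=36):
    y, sr = librosa.load(path, sr=sr)
    # Constant-Q transform: 252 bins at 36 bins/octave spans 7 octaves,
    # a common front-end choice for piano transcription (an assumption here).
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop_length,
                           n_bins=n_bins, bins_per_octave=bins_per_octave))
    # Log compression shrinks the dynamic range between loud and quiet notes.
    S = np.log1p(C)
    # Per-frequency-bin standardization balances energy across pitches,
    # so weak partials are not dwarfed by strong low-register energy.
    mean = S.mean(axis=1, keepdims=True)
    std = S.std(axis=1, keepdims=True) + 1e-8
    return (S - mean) / std  # shape: (n_bins, n_frames)
```

The intuition is that a frame-level detector trained on such a representation sees weak-energy notes (e.g., soft high-register onsets) on a comparable scale to loud bass notes, which is consistent with the abstract's claim that energy balancing recovers previously missed pitches.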

