Score Following and Retrieval Based on Chroma and Octave Representation

Wei-Ta Chu and Meng-Luen Li

Multimedia Computing Laboratory
Dept. of Computer Science and Information Engineering
National Chung Cheng University

1. Introduction

Building on effective representations of music signals and music scores, namely chroma and octave features, this work conducts score following and score retrieval. To compensate for the limitations of the chromagram representation, energy distributions in different octaves are used to describe tone height information. By transforming music signals and scores into sequences of feature vectors, score following is cast as a sequence matching problem and solved by the dynamic time warping (DTW) algorithm. To conduct score retrieval, we modify the backtracking step of DTW to determine multiple partial matchings between the query and a score. Experimental results show the effectiveness of the proposed features and the feasibility of the modified DTW algorithm in score retrieval.
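
As a rough illustration of the sequence matching step, the following is a minimal sketch of standard DTW with ordinary backtracking. It is not the modified backtracking used for score retrieval in the paper. The function name dtw, the Euclidean frame distance, and the toy one-dimensional feature vectors are assumptions made only for this example.

```python
import math


def dtw(seq_a, seq_b):
    """Align two sequences of feature vectors with standard DTW.

    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    The Euclidean distance is an assumption; the paper may use another measure.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the last cell to the first to recover one optimal path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (cost[i - 1][j], i - 1, j),          # advance in seq_a only
            (cost[i][j - 1], i, j - 1),          # advance in seq_b only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy example: the second sequence repeats one frame of the first.
music = [(0.0,), (1.0,), (2.0,)]
score = [(0.0,), (1.0,), (1.0,), (2.0,)]
total, path = dtw(music, score)
```

In this toy case the repeated frame in the second sequence is absorbed by mapping one music frame to two score frames, which is the kind of tempo variation DTW is meant to handle.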

2. Datasets

Dataset1 consists of 67 music files and 67 corresponding scores (MIDI files), collected from [6]. Each music piece is a recording of a real performance that follows the corresponding score. The durations of the music pieces range from 27.1 to 191.5 seconds. The sampling rate is 44.1 kHz and each sample is represented by 16 bits. The number of score bars per score ranges from 9 to 92. To evaluate the performance of music-score matching, we listened to all music files and manually identified which segment of each music signal corresponds to each score bar in the corresponding score.

Each music piece is provided as a set of three files:

File type   Meaning
xxx.wav     Music file
xxx.mid     Score file
xxx.lab     Ground truth of score bars in the music file

Each ground-truth file specifies the start time and end time (in seconds) of each score bar in the .wav file. For example, in []_bach-johann-sebastian-invention-177.lab:

Start time   End time
0.05         2.6
2.6          5.15
5.15         7.35
...          ...

That is, the first score bar starts at 0.05 sec and ends at 2.6 sec, the second starts at 2.6 sec and ends at 5.15 sec, the third starts at 5.15 sec and ends at 7.35 sec, and so on.
These ground truths were defined manually, with the assistance of WaveSurfer.
As for the score bar information in MIDI, you can parse the MIDI file according to the MIDI specification. You can also inspect the score information with MIDI visualization tools.
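
The .lab ground truth described above could be read with a short script like the following. This is only a sketch: it assumes each line holds two whitespace-separated numbers (start and end time in seconds) and skips any line that does not, such as a possible header. The function names parse_lab and bar_at are hypothetical helpers, not part of the dataset.

```python
def parse_lab(text):
    """Parse .lab ground-truth text into a list of (start_sec, end_sec) tuples.

    Assumes two whitespace-separated numbers per line; lines that do not
    fit this pattern (e.g. a header or a blank line) are skipped.
    """
    bars = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            start, end = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric line, such as a column header
        bars.append((start, end))
    return bars


def bar_at(bars, t):
    """Return the 1-based index of the score bar containing time t, or None."""
    for index, (start, end) in enumerate(bars, start=1):
        if start <= t < end:
            return index
    return None


# The three bars quoted in the example above.
sample = "Start time End time\n0.05 2.6\n2.6 5.15\n5.15 7.35\n"
bars = parse_lab(sample)
```

To read an actual file, pass its contents to parse_lab, for example via open(path).read(). Treating each bar as a half-open interval [start, end) is a design choice made here so that a boundary time such as 2.6 sec belongs to exactly one bar.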

Dataset2 also contains 67 music files and 67 corresponding score files (MIDI files). The score files are the same as those in Dataset1, while the music files in Dataset2 are synthesized from the MIDI files by a MIDI synthesizer.

Dataset3 is generated for evaluating the score retrieval system. It consists of 1491 music queries, 67 corresponding score files (MIDI files), and 133 irrelevant score files. Each query piece is generated by randomly selecting a segment from a music signal in Dataset1.
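
The random selection of a query segment might look like the sketch below. The query length is not stated here, so the 10-second value is purely an assumption, as are the function name random_segment and the use of a seeded random generator. Only the 44.1 kHz sampling rate is taken from the description of Dataset1.

```python
import random

SAMPLE_RATE = 44100  # Hz, as stated for the music files in Dataset1


def random_segment(total_samples, segment_seconds, rng):
    """Pick a random segment and return its (start, end) sample indices.

    end is exclusive. segment_seconds is an assumed parameter; the actual
    query lengths used to build Dataset3 are not specified in this document.
    """
    seg_len = int(segment_seconds * SAMPLE_RATE)
    if seg_len <= 0 or seg_len > total_samples:
        raise ValueError("segment length must fit inside the signal")
    start = rng.randint(0, total_samples - seg_len)  # inclusive bounds
    return start, start + seg_len


# Example: a 10-second query from the shortest piece (27.1 seconds).
total_samples = int(27.1 * SAMPLE_RATE)
rng = random.Random(0)  # seeded only to make the example repeatable
start, end = random_segment(total_samples, 10.0, rng)
```

The returned indices can then be used to slice the samples of a decoded .wav file.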


3. Citation

W.-T. Chu and M.-L. Li, "Score Following and Retrieval Based on Chroma and Octave Representation," Proceedings of International Conference on Multimedia Modeling, 2011. (in Lecture Notes in Computer Science, vol. 6523, pp. 229-239, 2011.)

Any problem please contact .

Last Updated: April 5, 2012