- Research Article
- Open Access

# Optical Music Recognition for Scores Written in White Mensural Notation

Lorenzo J. Tardón¹, Simone Sammartino¹, Isabel Barbancho¹, Verónica Gómez¹ and Antonio Oliver¹

**2009**:843401

https://doi.org/10.1155/2009/843401

© Lorenzo J. Tardón et al. 2009

**Received:** 30 January 2009 **Accepted:** 18 November 2009 **Published:** 28 December 2009

## Abstract

An Optical Music Recognition (OMR) system especially adapted for handwritten musical scores of the XVII-th and the early XVIII-th centuries written in white mensural notation is presented. The system performs a complete sequence of analysis stages: the input is the RGB image of the score to be analyzed; after a preprocessing stage that returns a black-and-white image with corrected rotation, the staves are processed to return a score without staff lines; then, a music symbol processing stage isolates the music symbols contained in the score and, finally, the classification process obtains the transcription in a suitable electronic format so that it can be stored or played. This work will help to preserve our cultural heritage by keeping the musical information of the scores in a digital format that also makes it possible to perform and distribute the original music contained in those scores.

## Keywords

- Black Pixel
- Fourier Descriptor
- Staff Line
- Fisher Linear Discriminant
- Music Symbol

## 1. Introduction

Optical Music Recognition (OMR) aims to provide a computer with the processing capabilities needed to convert a scanned score into an electronic format and even to recognize and understand its contents. OMR is related to Optical Character Recognition (OCR); however, it shows several differences due to the typology of the symbols to be recognized and the structure of the framework [1]. OMR has been an active research area since the 1970s, but it was in the early 1990s that the first works on handwritten scores [2] and ancient music [3, 4] started to be developed. Some of the most recent works on ancient music recognition are due to Pugin et al. [5], based on hidden Markov models and adaptive binarization, and to Caldas Pinto et al. [6], with the development of the project ROMA (*Reconhecimento Óptico de Música Antiga*), directed by the *Biblioteca Geral da Universidade de Coimbra*, for the recognition and restoration of ancient music manuscripts.

Of course, a special category of OMR systems deals with ancient handwritten music scores. OMR applied to ancient music shows several additional difficulties with respect to classic OMR [6]. The notation can vary from one author to another, among different scores of the same author, or even within the same score. The size, shape, and intensity of the symbols can change due to the imperfections of handwriting. When the scores have undergone later interventions, other classes of symbols, often in different styles, may appear superimposed on the original ones. The thickness of the staff lines is no longer constant, and in real scores the staff lines are not continuous straight lines. Moreover, the original scores are degraded by age. Finally, the digitized scores may present additional imperfections: geometrical distortions, rotations, or even heterogeneous illumination.

A good review of the stages related to the OMR process can be found in [7] or [8]. These stages can be described as follows: correction of the rotation of the image, detection and processing of staff lines, detection and labeling of musical objects, and recognition and generation of the electronic descriptive document.

Working with early scores requires paying closer attention to the image preprocessing stages, which include specific tasks devoted to obtaining good binary images. This topic is also considered in the paper, together with all the stages required and the specific algorithms developed to obtain an electronic description of the music in the scores.

The scores considered in this work come from the *Archivo de la Catedral de Málaga* (ACM). The ACM was created at the end of the XV-th century and it contains music scores from the XV-th to the XX-th centuries. The OMR system developed will be evaluated on scores written in white mensural notation. We will distinguish between two different styles of notation: the style mainly used in the scores by Stephano di Britto and the style mainly used by Francisco Sanz (XVII-th century and early XVIII-th century, respectively). So, the target scores are documents written in rather different styles (Figure 1): Britto (Figure 1(a)) uses a rigorous style, with squared notes, whereas Sanz (Figure 1(b)) shows a handwritten style close to the modern one, with rounded notes and vertical stems of varying thickness due to the use of a quill pen. The scores of these two authors, and of others of less importance in the ACM, are characterized by the presence of frontispieces, located at the beginning of the first page in Sanz style scores and at the beginning of each voice (two voices per page) in Britto style scores. In both cases, the lyrics (text) of the song are present. The text can be located above or below the staff, and its presence must be taken into account during the preprocessing stage.

## 2. Image Preprocessing

The preprocessing of the input image consists of the following steps:

- (a) selection of the area of interest and elimination of nonmusical elements,
- (b) grayscale conversion and illumination compensation,
- (c) image binarization,
- (d) correction of image rotation.

These steps are implemented in different stages, applying the procedures both to the whole image and to parts of it to obtain better results. The following subsections describe the preprocessing stages implemented.

### 2.1. Selection of the Area of Interest and Elimination of Nonmusical Elements

A *ROI* (region of interest) extraction algorithm [10] has been developed. After the user manually draws a polygon surrounding the area of interest, the algorithm returns the minimal rectangle containing this image area (Figure 4).

### 2.2. Grayscale Conversion and Illumination Compensation

The original color space of the acquired images is RGB. The musical information of the score is contained in the position and shape of the music symbols, not in their color, so the images are converted to grayscale. The algorithm is based on the HSI (*Hue*, *Saturation*, *Intensity*) model, and the conversion implemented is a weighted average [10] of the $R$, $G$, and $B$ coordinates of the color of each pixel.
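The weighted-average conversion can be sketched in Python with NumPy as follows. The weights are an assumption: the ITU-R BT.601 luma coefficients are used as a common stand-in and are not necessarily the ones used by the authors; the function name is hypothetical.

```python
import numpy as np

# Hypothetical weights (ITU-R BT.601 luma); the paper's own weights may differ.
BT601_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(rgb, weights=BT601_WEIGHTS):
    """Convert an H x W x 3 RGB array to grayscale by a weighted average of R, G, B."""
    rgb = np.asarray(rgb, dtype=float)
    w = np.asarray(weights, dtype=float)
    # normalising by the weight sum keeps the output in the input range
    return rgb @ (w / w.sum())
```

For example, `rgb_to_gray(np.array([[[255, 0, 0]]]))` gives about 76.2 for a pure red pixel.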

Now, the process of illumination compensation starts. The objective is to obtain a more uniform background so that the symbols can be detected more efficiently. In our system, the illumination cannot be measured; it must be estimated from the available data.

The acquired image $f(x,y)$ is considered to be the product of the reflectance field $r(x,y)$ and the illumination field $i(x,y)$ [11]:

$$f(x,y) = i(x,y)\, r(x,y).$$

The reflectance measures the light reflection characteristic of the object, varying from 0 (total absorption) to 1 (total reflectance) [12]. The reflectance contains the musical information.

The aim is to obtain an estimate of the illumination field, so that a corrected image can be computed according to [11] by dividing the acquired image by this estimate.

The next step is to interpolate the illumination pattern to the size of the original image. The starting points for the interpolation process are placed as shown in Figure 6. The algorithm used is a piecewise bicubic interpolation with a neighborhood of 16 points, which gives a smooth illumination field with continuous derivative [14]. Figure 6 shows the steps performed for the compensation of the illumination.
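The compensation can be sketched as follows, assuming a coarse grid of illumination samples, bicubic interpolation to full size, and division of the image by the interpolated field (the product model above). How the coarse illumination pattern is estimated is not detailed in this section, so the sketch hypothetically takes the brightest pixel of each block as a sample of the paper background. The interpolation uses the cubic convolution kernel of Keys (a = -0.5), applied separably, so each output pixel depends on a 4 x 4 = 16-sample neighborhood. All names and the `block` size are hypothetical.

```python
import numpy as np

def keys_kernel(t, a=-0.5):
    """Cubic convolution kernel (Keys); non-zero only for |t| < 2, i.e. 4 taps per axis."""
    t = np.abs(t)
    near = (a + 2) * t**3 - (a + 3) * t**2 + 1
    far = a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return np.where(t <= 1, near, np.where(t < 2, far, 0.0))

def cubic_resample_axis0(values, new_len):
    """Resample along axis 0 with cubic convolution (end samples aligned, borders replicated)."""
    n = values.shape[0]
    x = np.linspace(0.0, n - 1.0, new_len)
    base = np.floor(x).astype(int)
    out = np.zeros((new_len,) + values.shape[1:])
    for offset in (-1, 0, 1, 2):
        idx = np.clip(base + offset, 0, n - 1)
        w = keys_kernel(x - (base + offset))
        out += w.reshape((-1,) + (1,) * (values.ndim - 1)) * values[idx]
    return out

def bicubic_resize(grid, shape):
    """Separable bicubic interpolation: 16 neighbours per output pixel."""
    tmp = cubic_resample_axis0(grid, shape[0])
    return cubic_resample_axis0(tmp.T, shape[1]).T

def compensate_illumination(gray, block=32):
    """Estimate the illumination, interpolate it to full size and divide it out."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    nby, nbx = max(1, h // block), max(1, w // block)
    coarse = np.empty((nby, nbx))
    for i in range(nby):
        for j in range(nbx):
            tile = gray[i * h // nby:(i + 1) * h // nby, j * w // nbx:(j + 1) * w // nbx]
            coarse[i, j] = tile.max()   # assumption: brightest pixel ~ paper background
    illumination = np.maximum(bicubic_resize(coarse, (h, w)), 1e-6)
    return np.clip(gray / illumination, 0.0, 1.0)   # reflectance estimate f / i
```

On a uniformly lit page, the background maps to 1 and the ink keeps its relative darkness, regardless of the absolute light level.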

### 2.3. Image Binarization

In our context, the binarization aims to distinguish between the pixels that constitute the music symbols and the background. Using the grayscale image obtained after the process described in the previous section, a threshold $T$, within the range of gray levels of the image, must be found to classify each pixel as background or foreground [10].

Two methods are employed in our system to define the threshold: the *iterative average* method [10] and the Otsu method [13], based on a deterministic and a probabilistic approach, respectively.
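Both threshold rules can be sketched in a few lines. The iterative average method sets the threshold to the midpoint of the mean gray levels of the two classes it currently defines and repeats until it stops moving; Otsu's method picks the gray level that maximizes the between-class variance of the histogram. The function names, the convergence tolerance and the assumption of integer gray levels in [0, 255] for Otsu are ours.

```python
import numpy as np

def iterative_average_threshold(gray, tol=0.5):
    """Iterative average threshold: T is the midpoint of the two class means."""
    g = np.asarray(gray, dtype=float).ravel()
    t = g.mean()
    while True:
        low, high = g[g <= t], g[g > t]
        # guard against an empty class (e.g. a uniform image)
        m_low = low.mean() if low.size else t
        m_high = high.mean() if high.size else t
        t_new = 0.5 * (m_low + m_high)
        if abs(t_new - t) < tol:
            return t_new
        t = t_new

def otsu_threshold(gray, levels=256):
    """Otsu threshold for integer gray levels in [0, levels - 1]; pixels <= T are dark."""
    g = np.asarray(gray).astype(int).ravel()
    p = np.bincount(g, minlength=levels)[:levels] / g.size
    omega = np.cumsum(p)                     # probability of the class "<= t"
    mu = np.cumsum(p * np.arange(levels))    # cumulative first moment
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0)
    return int(np.argmax(sigma_b))           # maximises between-class variance
```

The iterative rule only uses class means, whereas Otsu's rule uses the whole gray-level histogram as a probability distribution, which matches the deterministic/probabilistic distinction made above.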

### 2.4. Correction of Image Rotation

The staff lines are a main source of information about the size of the music symbols and their position. Hence, the detection and extraction of staff lines are, in general, an important stage of an OMR system [9]. In particular, subsequent procedures are simplified if the lines are straight and horizontal. So, a stage for the correction of the global rotation of the image is included. Note that other geometrical corrections [15] have not been considered.

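The main slope of the image is detected by means of the Hough transform, with lines parameterized as $\rho = x\cos\theta + y\sin\theta$. The method relies on the following properties:

- (1) a point in the standard plane corresponds to a sinusoidal curve in the Hough plane,
- (2) a point in the Hough plane corresponds to a straight line in the standard plane,
- (3) points of the same straight line in the standard plane correspond to sinusoids that share a single common point in the Hough plane.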

Once the main slope is detected, its deviation from the horizontal direction is computed, and the image is rotated to correct its inclination. This procedure is useful for images with global rotation and low distortion. Unfortunately, most of the images of the scores under analysis have distortions that make the staff appear locally rotated. To overcome this drawback, the global rotation is corrected only if the detected angle is larger than a fixed threshold. In later steps of the OMR process, the rotation of portions of each single staff is checked and corrected using the technique described here.
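A minimal sketch of the slope detection: every black pixel votes for $\rho = x\cos\theta + y\sin\theta$, and only normal angles close to 90° (near-horizontal lines) are scanned. The orientation whose accumulator column has the highest peak is the dominant line direction, returned as its deviation from the horizontal. The search range and angular step are hypothetical parameters; the rotation of the image by the opposite angle is not shown.

```python
import numpy as np

def hough_skew_angle(binary, max_skew_deg=10.0, step_deg=0.1):
    """Deviation (degrees) of the dominant lines from the horizontal.

    binary: 2-D bool array, True for black pixels. Positive angles mean the
    lines descend to the right in image coordinates (row index grows downwards).
    """
    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        return 0.0
    offsets = np.arange(-max_skew_deg, max_skew_deg + step_deg / 2, step_deg)
    diag = int(np.ceil(np.hypot(*binary.shape)))   # shift so that rho indices are >= 0
    best_votes, best_offset = -1, 0.0
    for off in offsets:
        theta = np.deg2rad(90.0 + off)             # normal of a line tilted by `off`
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        votes = np.bincount(rho).max()             # strongest line at this orientation
        if votes > best_votes:
            best_votes, best_offset = votes, off
    return float(best_offset)
```

Restricting the search to a narrow band around the horizontal keeps the accumulator small and avoids voting for stems and other vertical strokes.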

## 3. Staff Processing

In this section, the procedure developed to detect and remove the staff lines is presented. The whole procedure includes the detection of the staff lines and their removal using a line tracking algorithm following the characterization in [19]. However, specific processes are included in our implementation, like the normalization of the score size and the local correction of rotation. In the next subsections, the stages of the staff processing procedure are described.

### 3.1. Isolation of the Staves

The isolation of the staves is carried out in four steps:

- (1) estimation of the thickness of the staff lines,
- (2) estimation of the average distance between the staff lines and between staves,
- (3) estimation of the width of the staves and division of the score,
- (4) revision of the staves extracted.

#### 3.1.1. Estimation of the Thickness of the Staff Lines

A threshold is applied to the row histograms to obtain the reference values used to determine the average thickness of the staff lines. The choice of the histogram threshold should be automatic and should depend on the distribution of black/white values of the row histograms. To define the histogram threshold, the whole set of histogram values is clustered into three classes using K-means [21]; the three centroids represent the small extraneous elements of the score, the horizontal elements other than the staff lines (such as the aligned horizontal segments of the characters), and the actual staff lines (see Figure 11). Then, the arithmetic mean of the second and third centroids defines the histogram threshold.
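The thickness estimate can be sketched as follows: a plain one-dimensional K-means with three clusters is run on the row histogram, the threshold is the mean of the two largest centroids, and the thickness is the average length of the runs of consecutive rows above it. Initializing the centroids evenly between the minimum and maximum histogram values is our assumption, as are the function names.

```python
import numpy as np

def kmeans_1d(values, k, iters=100):
    """Lloyd's K-means on scalars; returns the sorted centroids."""
    v = np.asarray(values, dtype=float)
    c = np.linspace(v.min(), v.max(), k)        # assumed initialisation
    for _ in range(iters):
        labels = np.argmin(np.abs(v[:, None] - c[None, :]), axis=1)
        new = np.array([v[labels == j].mean() if np.any(labels == j) else c[j]
                        for j in range(k)])
        if np.allclose(new, c):
            break
        c = new
    return np.sort(c)

def staff_line_thickness(binary):
    """Average staff-line thickness (rows) and histogram threshold of a staff image."""
    hist = binary.sum(axis=1)                   # black pixels per row
    c = kmeans_1d(hist, 3)                      # small elements, text, staff lines
    threshold = 0.5 * (c[1] + c[2])             # mean of 2nd and 3rd centroids
    runs, count = [], 0
    for flag in hist > threshold:               # runs of rows above the threshold
        if flag:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return (float(np.mean(runs)) if runs else 0.0), float(threshold)
```

The middle cluster absorbs long horizontal strokes of text, so they do not pull the threshold down towards the noise level.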

#### 3.1.2. Estimation of the Average Distance between the Staff Lines and between the Staves

In this case, the K-means algorithm [21] is applied to the distances between consecutive local maxima of the histogram above the histogram threshold to find two clusters. The centroids of these clusters represent the average distance between the staff lines and the average distance between the staves. The histogram threshold is obtained using the technique described in the previous task (task 1 of the staff isolation procedure).
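A sketch of this step, taking the row histogram and the threshold of the previous task as inputs: local maxima above the threshold are located (a plateau counts once), the gaps between consecutive maxima are clustered into two groups with a two-centroid K-means, and the smaller centroid is read as the line spacing and the larger one as the staff spacing. The function name and the min/max initialization are assumptions.

```python
import numpy as np

def line_and_staff_spacing(hist, threshold):
    """Average line spacing and staff spacing from a row histogram."""
    hist = np.asarray(hist, dtype=float)
    n = len(hist)
    # local maxima above the threshold; a plateau is counted at its first row
    peaks = [i for i in range(n)
             if hist[i] > threshold
             and (i == 0 or hist[i] > hist[i - 1])
             and (i == n - 1 or hist[i] >= hist[i + 1])]
    gaps = np.diff(peaks).astype(float)
    c = np.array([gaps.min(), gaps.max()])       # two-cluster K-means on the gaps
    for _ in range(100):
        big = np.abs(gaps - c[0]) > np.abs(gaps - c[1])
        new = np.array([gaps[~big].mean() if (~big).any() else c[0],
                        gaps[big].mean() if big.any() else c[1]])
        if np.allclose(new, c):
            break
        c = new
    return float(c[0]), float(c[1])
```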

#### 3.1.3. Estimation of the Width of the Staff and Division of the Score

Now the parameters described in the previous stages are employed to divide the score into its staves. Assuming that all the staves of a given score have the same width, the height of the staves is estimated from the thickness of the staff lines and the average distances obtained in the previous stages.

As mentioned before, rotations or distortions of the original image could lead to a wrong detection of the line thickness and to the failure of the entire process. To avoid this situation, the parameters used in this stage are calculated using a central portion of the original image. The original image is divided into 16 cells and only the central part (4 cells) is extracted. The rotation of this portion of the image is corrected as described in Section 2.4, and then the thickness and width parameters are estimated.

#### 3.1.4. Revision of the Staves Extracted

In some handwritten music scores, the margins do not have the same width, and the extraction procedure can lead to a wrong fragmentation of the staves. When a staff is not correctly cut, at least one of the margins of the selected image is not completely white, that is, some black elements lie in the margins. The row histogram of white pixels can be used to detect this problem easily, by checking its first and last values (see Figures 15(a) and 15(b)) and comparing them with the maximum. If the value of the first row is smaller than the maximum, the selection window for that staff is moved up one row. Conversely, if the value of the last row of the histogram is smaller than the maximum, the selection window for that staff is moved down one row. The process is repeated until a correct staff image, with white margins and containing all five lines, is obtained.
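The window revision can be sketched as a loop over the white-pixel row histogram of the current selection. When both margins contain black pixels, the sketch stops rather than guess a direction, and `max_steps` bounds the search. Both behaviors are our assumptions, not details given in the text.

```python
import numpy as np

def revise_staff_window(page, top, height, max_steps=100):
    """Shift a staff selection window until its first and last rows are white.

    page: 2-D bool array, True = black pixel. Returns the revised top row.
    """
    n_rows, width = page.shape
    for _ in range(max_steps):
        white = width - page[top:top + height].sum(axis=1)   # white pixels per row
        first_ok = white[0] == white.max()
        last_ok = white[-1] == white.max()
        if first_ok and last_ok:
            break                        # both margins are clean
        if not first_ok and last_ok and top > 0:
            top -= 1                     # black at the top: move the window up one row
        elif first_ok and not last_ok and top + height < n_rows:
            top += 1                     # black at the bottom: move it down one row
        else:
            break                        # both margins dirty, or page border reached
    return top
```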

### 3.2. Scaling of the Score

In order to normalize the dimensions of the score and the descriptors of the objects before any recognition stage, a scaling procedure is applied. A reference measure is required to obtain a global scaling value for the entire staff; the most convenient parameter is the distance between the staff lines. A large set of measurements has been carried out on the available image samples, and a reference value $d_{\mathrm{ref}}$ (in pixels) has been chosen. The scaling factor $s$ between the reference value and the current line distance is computed by

$$s = \frac{d_{\mathrm{ref}}}{d_l},$$

where $d_l$ is the distance between the lines of the staff, measured as described in Section 3.1.2. The image is interpolated to the new size using the nearest-neighbor interpolation method (zero-order interpolation) [22].
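A sketch of the scaling step with nearest-neighbor interpolation: each output pixel center is mapped back to the input grid and takes the value of the nearest input pixel, so binary images stay binary. The reference distance is left as a required argument because its value is not given here.

```python
import numpy as np

def scale_nearest(img, line_distance, reference_distance):
    """Rescale so that the staff-line distance matches the reference (zero-order interpolation)."""
    s = reference_distance / line_distance          # scaling factor
    h, w = img.shape
    new_h, new_w = max(1, round(h * s)), max(1, round(w * s))
    # map every output pixel centre back to the nearest input pixel
    rows = np.minimum(((np.arange(new_h) + 0.5) / s).astype(int), h - 1)
    cols = np.minimum(((np.arange(new_w) + 0.5) / s).astype(int), w - 1)
    return img[np.ix_(rows, cols)]
```

For instance, a staff whose lines are 10 pixels apart, scaled to a reference of 20 pixels, doubles in size with every pixel replicated into a 2 x 2 block.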

### 3.3. Local Correction of the Rotation

To correct local rotations, each staff is split into portions whose rotation is then checked and corrected with the technique described in Section 2.4. The splitting positions are selected as follows:

- (1) split the staff into four equal parts and store the three splitting points,
- (2) compute the column histogram (vertical projection) [7],
- (3) set a threshold on the column histogram as a multiple of the thickness of the staff lines estimated previously,
- (4) locate the minima of the column histogram under the threshold (Figure 16(b)),
- (5) select as splitting positions the three minima that are closest to the three points selected at step (1).
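The five steps above can be sketched as follows. The multiplier `k` is hypothetical: since an empty column already crosses five staff lines, `k` must be larger than 5 for any column to fall below the threshold. If no admissible minimum exists, the sketch falls back to the equal quarters of step (1), which is our own choice.

```python
import numpy as np

def staff_split_points(staff, line_thickness, k=6.0):
    """Three column positions that split a staff into four portions.

    staff: 2-D bool array (True = black); k: hypothetical threshold multiplier.
    """
    width = staff.shape[1]
    hist = staff.sum(axis=0).astype(float)          # step 2: column histogram
    threshold = k * line_thickness                  # step 3
    left = np.r_[np.inf, hist[:-1]]
    right = np.r_[hist[1:], np.inf]
    minima = np.flatnonzero((hist <= left) & (hist <= right) & (hist < threshold))  # step 4
    targets = [width // 4, width // 2, 3 * width // 4]                              # step 1
    if minima.size == 0:
        return targets
    # step 5: nearest admissible minimum to each quarter point
    return [int(minima[np.argmin(np.abs(minima - t))]) for t in targets]
```

Cutting at minima of the column histogram keeps the cuts in gaps between symbols, so no symbol is split between two portions.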

### 3.4. Blanking of the Staff Lines

The staff lines are often an obstacle to symbol tagging and recognition in OMR systems [23]. Hence, a specific staff removal algorithm has been developed. Our blanking algorithm is based on tracking the lines before their removal. Note that the detection of the position of the staff lines is crucial for locating the music symbols and determining the pitch. Notes and clefs are painted over the staff lines, so removing the lines can partially erase the symbols. Moreover, the lines can even modify the apparent shape of the symbols, filling holes or connecting symbols that should be separate. In the literature, several distinct methods for line blanking can be found [24–30], each of them valid under general conditions, but they do not perform properly when applied to the scores we are analyzing. Even the comparative study in [19] does not identify a clear best algorithm.

The approach implemented in this work uses the row histogram to detect the position of the lines. Then, a moving window is employed to track the lines and remove them. The details of the process are explained throughout this section.

To begin tracking the staff lines, a reference point for each staff line must be found. To this end, the approach shown in Section 3.1.1 is used: the row histogram is computed on a portion of the staff, the threshold is computed and the references of the five lines are retrieved.
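Once the reference rows are known, a line can be followed column by column. The sketch below is a hypothetical tracker, not the authors' exact procedure: in each column it looks for black pixels in a small window around the current line position, measures the vertical black run there, erases the run only if it is thin enough to be the line alone, and updates the tracked position so the line can drift. The window half-height and the run-length factor are assumed parameters.

```python
import numpy as np

def track_and_remove_line(img, start_row, thickness, window=2, max_run_factor=2.0):
    """Follow one staff line from left to right and blank it.

    img: 2-D bool array (True = black); start_row: reference row of the line.
    Hypothetical parameters: `window` (half-height of the search window) and
    `max_run_factor` (thicker runs are treated as symbols crossing the line).
    """
    out = img.copy()
    h, w = img.shape
    row = float(start_row)
    for x in range(w):
        r = int(round(row))
        lo, hi = max(0, r - window), min(h, r + window + 1)
        ys = np.flatnonzero(img[lo:hi, x]) + lo
        if ys.size == 0:
            continue                          # gap in the line: keep the last position
        y = ys[np.argmin(np.abs(ys - row))]   # black pixel closest to the track
        a, b = y, y + 1                       # vertical black run containing y: [a, b)
        while a > 0 and img[a - 1, x]:
            a -= 1
        while b < h and img[b, x]:
            b += 1
        if b - a <= max_run_factor * thickness:
            out[a:b, x] = False               # thin run: only the staff line, erase it
            row = (a + b - 1) / 2.0           # follow the (possibly drifting) line
        # a thicker run means a symbol crosses the line here, so it is kept
    return out
```

Keeping the thick runs is what prevents the removal from cutting notes and clefs painted over the line.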

## 4. Processing of Music Symbols

At this point, we have a black-and-white image of each staff without the staff lines, in which the music symbols are present together with artifacts due to parts of the staff lines not deleted, spots, and so forth. The aim of the procedure described in this section is to isolate the sets of black pixels that belong to the musical symbols (notes, clefs, etc.), putting together the pieces that belong to the same music symbol. Therefore, two main steps can be identified: isolation of music elements and combination of elements that belong to the same music symbol. These steps are considered in the following subsections.

### 4.1. Isolation of Music Elements

The isolation of music elements is carried out in two steps:

- (1) tagging of elements,
- (2) removal of artifacts.

#### 4.1.1. Tagging of Elements

An element is defined as a group of pixels isolated from their neighborhood. Each of these groups is tagged with a unique value; the elements are detected using the pixel connectivity rule [31] with 4-connectivity.
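Tagging with the 4-connectivity rule can be sketched as a breadth-first flood fill: pixels are neighbors only if they share an edge, so two black pixels touching diagonally receive different tags. The function name is hypothetical; `scipy.ndimage.label` offers the same operation in practice.

```python
import numpy as np
from collections import deque

def label_4connected(binary):
    """Tag each 4-connected group of black pixels with a unique positive integer."""
    binary = np.asarray(binary, dtype=bool)
    labels = np.zeros(binary.shape, dtype=int)
    h, w = binary.shape
    current = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                current += 1                       # new element found
                labels[y, x] = current
                queue = deque([(y, x)])
                while queue:                       # breadth-first flood fill
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current
```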

#### 4.1.2. Removal of Artifacts

Small fragments coming from an incomplete removal of the staff lines, text, and other elements must be removed before the classification of the tagged objects starts. This task performs two different tests.

First, the elements smaller than the average size of a *dot* (the music symbol that increases the value of a note by half) are detected and removed. The average size is fixed *a priori* by evaluating a set of the scores to be recognized and using the distance between staff lines as reference. Second, other elements (text in most cases), generally located at the edges of the staff, are removed. The top and bottom staff lines are used as reference: if an element extends beyond these lines, it is checked whether it lies completely outside them and, if so, it is removed. An example of the performance of this strategy is shown in Figure 20.
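Both tests can be sketched on a tag image. The minimum size is passed in as a pixel count (in the system it is derived from the average dot size), and an element is discarded as text only when all its rows lie above the top line or below the bottom line, so elements that merely cross these lines are kept. Names are hypothetical.

```python
import numpy as np

def remove_artifacts(labels, min_size, top_line, bottom_line):
    """Drop tagged elements that are too small or lie entirely outside the staff.

    labels: int array of element tags (0 = background).
    top_line, bottom_line: row positions of the first and fifth staff lines.
    """
    out = labels.copy()
    for tag in np.unique(labels):
        if tag == 0:
            continue
        rows = np.nonzero(labels == tag)[0]
        too_small = rows.size < min_size                        # smaller than a dot
        outside = rows.max() < top_line or rows.min() > bottom_line
        if too_small or outside:
            out[labels == tag] = 0
    return out
```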

### 4.2. Combination of Elements Belonging to the Same Music Symbol

At this stage, we deal with a number of music symbols composed of two or more elements, spatially separated and with different tags. In order to properly feed the classifier, the different parts of the same symbol must be joined and a single tag must be given.

In spite of these processes, the classifier will receive some symbols that do not correspond to real music symbols; hence, the classifier should be able to perform a further inspection based on the coherence of the notation and, as suggested in [32, 33], on the application of music laws.

## 5. Classification

At this stage, the vectors of features extracted for the unknown symbols must be compared against a series of patterns of known symbols. The classification strategy and the features to employ have to be decided. In this section, the features that will be used for the classification are described. Then, the classifiers employed are presented [31, 34]. Finally, the task of identification of the location of the symbols is considered.

### 5.1. Extraction of Features of Music Symbols

The following four types of features are considered:

- (1) Fourier descriptors,
- (2) bidimensional wavelet transform coefficients,
- (3) bidimensional Fourier transform coefficients,
- (4) angular-radial transform coefficients.

These descriptors are extracted from the scaled music symbols (see Section 3.2) and used in different classification strategies, with different similarity measures.

#### 5.1.1. Fourier Descriptors

The Fourier transform of the sequence of coordinates of the contour of each symbol is computed to obtain the vector of Fourier descriptors, a one-dimensional, robust, and reliable representation of the shape [37]. The low-frequency components represent the overall shape of the object, while the high-frequency components capture the finest details. The vector of coordinates of the contour of the object (2D) must first be transformed into a one-dimensional representation. Two options are considered to code the contour.

One of them, used for FD1 in Section 6.2, is the distance from each contour point to the centroid:

$$r(n) = \sqrt{(x_n - x_c)^2 + (y_n - y_c)^2},$$

where $x_c$ and $y_c$ are the coordinates of the centroid and $x_n$ and $y_n$ are the coordinates of the $n$th point of the contour of the symbol. Fourier descriptors are widely employed in shape recognition, where invariance with respect to geometrical transformations and to changes of the starting point selected for tracking the contour is important. When the contour coordinates themselves are transformed, the zero-frequency coefficient corresponds to the centroid, so removing it (i.e., subtracting the centroid from the coordinates) gives invariance against translation. Also, normalizing the coefficients with respect to the first coefficient can provide invariance against scaling. Finally, if only the modulus of the coefficients is observed, invariance against rotation and against changes of the starting point of the contour is achieved.
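A sketch of centroid-distance Fourier descriptors in the spirit of FD1, showing where each invariance comes from. Two choices are assumptions: the centroid is taken as the mean of the contour points, and the magnitudes are normalized by the zero-frequency term (the mean radius), which plays the role of the scale reference for this signature.

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs):
    """Normalised centroid-distance Fourier descriptors.

    contour: (N, 2) array of (x, y) points ordered along the closed contour.
    """
    pts = np.asarray(contour, dtype=float)
    centroid = pts.mean(axis=0)
    # 1-D signature: distance to the centroid (translation invariant)
    r = np.hypot(pts[:, 0] - centroid[0], pts[:, 1] - centroid[1])
    mag = np.abs(np.fft.fft(r))          # modulus: invariant to the starting point
    return mag[1:n_coeffs + 1] / mag[0]  # ratio to the DC term: scale invariant
```

A translated, scaled copy of a contour traced from a different starting point yields the same descriptors.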

Before the contour is extracted, a *dilation* operator is applied to the symbols to fill the white spaces and holes (Figure 24), using a structuring element fixed *a priori* for each of the music notation styles considered. However, the largest holes may still remain (as in the inner part of the G clef, see Figure 25(b)). Hence, all the edges are tracked using a backtracking bug follower algorithm [17], their coordinates are retrieved, and the smaller contours are removed.
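Binary dilation with an odd-sized structuring element can be sketched with NumPy alone: each output pixel is black if the reflected structuring element, centered on it, covers at least one black input pixel. The structuring elements in the usage lines are illustrative and are not the ones fixed for each notation style.

```python
import numpy as np

def binary_dilate(img, se):
    """Binary dilation of `img` by an odd-sized boolean structuring element `se`."""
    img = np.asarray(img, dtype=bool)
    se = np.asarray(se, dtype=bool)[::-1, ::-1]        # reflect the element
    h, w = img.shape
    sh, sw = se.shape
    cy, cx = sh // 2, sw // 2
    padded = np.pad(img, ((cy, sh - 1 - cy), (cx, sw - 1 - cx)))
    out = np.zeros((h, w), dtype=bool)
    for dy in range(sh):
        for dx in range(sw):
            if se[dy, dx]:
                out |= padded[dy:dy + h, dx:dx + w]    # OR of shifted copies
    return out
```

For example, a 1 x 3 element closes one-pixel horizontal gaps, which is the kind of small white space this step is meant to fill.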

#### 5.1.2. Bidimensional Wavelet Transform Coefficients

The wavelet transform is based on the convolution of the original signal with a defined function with a fixed shape (the mother function) that is shifted and scaled to best fit the signal itself [10]. After applying the transformation, some coefficients will be used for the classification. In our case, the mother wavelet will be the CDF 5/3 biorthogonal wavelet (Cohen-Daubechies-Feauveau), also called the LeGall 5/3, widely used in JPEG 2000 lossless compression [38]. The coefficients are obtained computing the wavelet transform of each symbol framed in its tight bounding box. Only the coefficients with the most relevant information are kept. This selection is done taking into account both the frequency content (the first half of the coefficients) and their absolute value (the median of the absolute value of the horizontal component). Finally, the coefficients are employed to compute the following measures used as descriptors: sum of absolute values, energy, standard deviation, mean residual and entropy.
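One level of the reversible LeGall 5/3 transform can be written with the integer lifting steps used in JPEG 2000 (predict, then update, with symmetric extension at the borders) and applied to rows and then columns. The sketch assumes even image dimensions and skips the coefficient selection described above. It computes four of the listed measures; the "mean residual" is omitted because its definition is not given here, and the entropy is one common definition based on normalized energies (an assumption).

```python
import numpy as np

def legall53_forward(x):
    """One level of the reversible LeGall 5/3 lifting transform (even-length input)."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    even_next = np.r_[even[1:], even[-1]]      # symmetric extension at the right edge
    d = odd - (even + even_next) // 2          # predict: detail (high-pass)
    d_prev = np.r_[d[0], d[:-1]]               # symmetric extension at the left edge
    s = even + (d_prev + d + 2) // 4           # update: approximation (low-pass)
    return s, d

def legall53_inverse(s, d):
    d_prev = np.r_[d[0], d[:-1]]
    even = s - (d_prev + d + 2) // 4
    even_next = np.r_[even[1:], even[-1]]
    x = np.empty(2 * len(s), dtype=np.int64)
    x[0::2], x[1::2] = even, d + (even + even_next) // 2
    return x

def dwt2_legall53(img):
    """Single-level 2-D transform: filter along rows, then along columns."""
    img = np.asarray(img, dtype=np.int64)
    lo, hi = (np.array(a) for a in zip(*(legall53_forward(r) for r in img)))
    def along_columns(a):
        s, d = zip(*(legall53_forward(c) for c in a.T))
        return np.array(s).T, np.array(d).T
    ll, lh = along_columns(lo)   # row low-pass; column low-/high-pass
    hl, hh = along_columns(hi)   # row high-pass; column low-/high-pass
    return ll, lh, hl, hh

def wavelet_measures(coeffs):
    c = np.asarray(coeffs, dtype=float).ravel()
    energy = float(np.sum(c ** 2))
    p = c ** 2 / energy if energy > 0 else np.zeros_like(c)
    nz = p[p > 0]
    return {"sum_abs": float(np.sum(np.abs(c))), "energy": energy,
            "std": float(np.std(c)), "entropy": float(-np.sum(nz * np.log2(nz)))}
```

Because the lifting steps use integer arithmetic, the transform is exactly invertible, which is why this filter is the one chosen for lossless JPEG 2000.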

#### 5.1.3. Bidimensional Fourier Transform Coefficients

As with the wavelet transform, the Fourier transform is used to obtain a bidimensional frequency spectrum. The coefficients of the transform are selected according to their magnitude, and a series of measures is obtained (as in Section 5.1.2). Note that, following the comments in Section 5.1.1, only the modulus of the coefficients is taken into account.
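A short sketch of magnitude-only 2-D Fourier features. Keeping only the modulus makes the measures insensitive to (circular) shifts of the symbol inside its frame, since a shift only changes the phase. The selection rule (keeping the `n_keep` largest magnitudes) is a hypothetical stand-in for the magnitude-based selection mentioned above.

```python
import numpy as np

def fft2_measures(img, n_keep=32):
    """Measures on the n_keep largest 2-D Fourier magnitudes (hypothetical selection rule)."""
    mag = np.abs(np.fft.fft2(np.asarray(img, dtype=float)))
    selected = np.sort(mag.ravel())[-n_keep:]      # largest magnitudes only
    return {"sum_abs": float(selected.sum()), "energy": float(np.sum(selected ** 2))}
```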

#### 5.1.4. Angular-Radial Transform Coefficients

The angular radial transform (ART) is a region-based shape descriptor that can represent the shape of an object (even one with holes) using a small number of coefficients [39]. The transform rests on a set of basis functions $V_{nm}(\rho, \theta)$ that depend on two parameters ($m$ and $n$) associated with the angle ($\theta$) and the radius ($\rho$), respectively. In our case, a set of angular functions and a set of radial functions are built, and their products define the basis functions. Then, each basis function is integrated over all the pixel locations of the image of the symbol to obtain the set of descriptors (the first one is used to normalize the others). In order to speed up the extraction of the coefficients, a LUT (look-up table) approach is employed [40].

### 5.2. Classifiers

#### 5.2.1. K-NN Classification

The k-NN classifier is one of the simplest classifiers, with asymptotically optimal performance. The class of an unknown object is obtained as the most common class among its k nearest known elements, for a selected distance measure. Note that the performance of the procedure depends on the number of training members of each class [31].
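A minimal k-NN sketch. The Euclidean and city-block distances are shown only as examples of selectable measures, an assumption since the two measures used in this work are not listed here. Ties in the vote go to the class encountered first, that is, the class of the nearest tied sample.

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=3, metric="euclidean"):
    """Majority vote among the k nearest training samples."""
    diff = np.asarray(train_X, dtype=float) - np.asarray(x, dtype=float)
    if metric == "euclidean":
        d = np.sqrt(np.sum(diff ** 2, axis=1))
    elif metric == "cityblock":
        d = np.sum(np.abs(diff), axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    nearest = np.argsort(d, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)   # insertion order: nearest first
    return votes.most_common(1)[0][0]
```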

For our classifier, the first- and second-order statistics of each feature are computed for each class of symbol. The features that do not help to distinguish among different classes are rejected.

The two sets of Fourier descriptors (Section 5.1.1) are entirely included, leading to two sets of features. Similarly, the two complete sets of coefficients obtained by the angular-radial transform (Section 5.1.4) are kept. Of the measures obtained from the bidimensional wavelet transform (Section 5.1.2), the first three are used (the sum of absolute values, the energy, and the standard deviation), because they are the only ones showing reasonable reliability for the discrimination of classes. Finally, only the sum of the absolute values of the bidimensional Fourier transform coefficients (Section 5.1.3) is retained, for the same reason.

The two distance measures employed are

#### 5.2.2. Classifiers Based on Mahalanobis Distance

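The Mahalanobis distance [41] is used in a classification procedure to measure the distance from the feature vector of an object to the centroid of each class. This measure rests on a dissimilarity between two multidimensional random variables, calculated using the covariance matrix as

$$D_M(\mathbf{x}, \boldsymbol{\mu}_c) = \sqrt{(\mathbf{x} - \boldsymbol{\mu}_c)^{T}\, \boldsymbol{\Sigma}_c^{-1}\, (\mathbf{x} - \boldsymbol{\mu}_c)},$$

where the distance is computed between the feature vector $\mathbf{x}$ of the unknown symbol and the centroid $\boldsymbol{\mu}_c$ of class $c$, and $\boldsymbol{\Sigma}_c$ is the covariance matrix of that class.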

Note that the inverse of the class covariance matrix is required, so this matrix must be nonsingular. To this end, the number of linearly independent training elements of each class should not be smaller than the dimension of the feature vector. Since some rare (not commonly used) musical objects are present in the available data, a reduced number of features is required. We have selected the features among those with the smallest within-class variance, to guarantee that the inverse exists.
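A sketch of a nearest-centroid classifier under the per-class Mahalanobis distance. It refuses to fit a class whose sample covariance is singular, which is exactly the situation the feature reduction described above avoids, and it uses `np.linalg.solve` rather than an explicit inverse. The class name and methods are hypothetical.

```python
import numpy as np

class MahalanobisClassifier:
    """Assigns a feature vector to the class centroid with the smallest Mahalanobis distance."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            # Sigma_c must be invertible: enough independent samples per class
            if np.linalg.matrix_rank(cov) < cov.shape[0]:
                raise ValueError(f"singular covariance for class {c!r}")
            self.params[c] = (Xc.mean(axis=0), cov)
        return self

    def distance(self, x, c):
        mu, cov = self.params[c]
        diff = np.asarray(x, dtype=float) - mu
        return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

    def predict(self, x):
        return min(self.params, key=lambda c: self.distance(x, c))
```

In the usage below, a point that is closer to class B in Euclidean terms is still assigned to class A, because A is widely spread along that axis and B is not.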

#### 5.2.3. Classifiers Based on the Fisher Linear Discriminant

The Fisher linear discriminant approach is based on the search for a linear combination of the features such that the dissimilarity between two classes is as large as possible [31]. In particular, the Fisher linear discriminant aims to represent two clouds of multidimensional vectors in a favorable one-dimensional framework [42]. To this end, a projection vector for the feature vectors is found such that, once projected onto a line, the elements of the two classes form two clouds whose separation is as large as possible relative to their spread. Then, the class of an unknown symbol (vector of features) is derived from the location of its projection. In particular, a k-NN approach can be used, or it can be assumed that the projections of the vectors of features follow a Gaussian distribution [42]; this model can be used to compute the probability that a certain projection belongs to a certain distribution (class). Both approaches are implemented in this work.
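A two-class sketch of the Fisher direction $\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$, where $\mathbf{S}_W$ is the within-class scatter, followed by the Gaussian membership rule: a 1-D Gaussian is fitted to the projections of each class and the more likely one wins. The multi-class search performed in the system is not shown, and the function names are hypothetical.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit projection vector maximising between-class over within-class scatter."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)
    S2 = (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S1 + S2, m1 - m2)      # w proportional to S_W^{-1} (m1 - m2)
    return w / np.linalg.norm(w)

def fisher_gaussian_classify(x, X1, X2):
    """Return 0 or 1: the class whose Gaussian on the projected line is more likely."""
    w = fisher_direction(X1, X2)
    z = float(np.dot(w, x))
    best, best_ll = None, -np.inf
    for label, X in enumerate((X1, X2)):
        p = X @ w
        mu, sigma = p.mean(), p.std(ddof=1)
        ll = -np.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2   # log-likelihood up to a constant
        if ll > best_ll:
            best, best_ll = label, ll
    return best
```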

Note that the Fisher linear discriminant is defined in a two-class framework whilst an OMR system aims to recognize the proper symbol among several classes, so, an exhaustive search for the class membership is done.

#### 5.2.4. Building the Training Database

About eighty scores written in white mensural notation in the two styles considered (Stephano di Britto and Maestro Sanz notation styles) have been analyzed. These scores contain more than isolated music symbols. About of the scores are written with the style of di Britto and about of the scores of each style correspond to these two authors. Note that we have not found significant differences between the results and features obtained for these two main authors and those obtained for the others. A minimum of samples of the less common symbols (classes) are stored. When the samples of a certain class are not enough to reach the lower limit of , the necessary elements are generated artificially using nonlinear morphological operations.

### 5.3. Locating the Symbol Position (Pitch)

The final task related to the recognition of music symbols is the determination of the position of each of them in the staff. Note that, at this stage, the accurate tracking of the staff lines has already been performed and their positions, throughout the whole staff, are known.

Observe that there exist two classes of symbols for which the position is not relevant: the *longa* silence and the C clef.

Also note that other alternatives could be used to determine the pitch of the symbols. For example, the location of the bounding box in the scaled score together with the location and shape of the model of the symbol in the box would be enough to accomplish this task.
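As an illustration of how known staff-line positions translate into a pitch position, the hypothetical helper below maps the vertical center of a symbol to a staff step counted from the bottom line (0 = bottom line, 1 = first space, ..., 8 = top line; values outside this range are ledger positions). It is a sketch under these assumptions, not the procedure used in the system.

```python
import numpy as np

def staff_position(y, line_rows):
    """Staff step of a symbol centred at row y, given the five line rows at its column."""
    lines = np.sort(np.asarray(line_rows, dtype=float))
    spacing = np.mean(np.diff(lines))          # average distance between lines
    # half a spacing per step, counted upwards from the bottom line
    return int(round(2.0 * (lines[-1] - y) / spacing))
```

Using the line rows tracked at the symbol's own column, rather than a global value, keeps the estimate valid on locally distorted staves.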

## 6. Evaluation of the System Performance

Two main tasks are directly related to the global success of the recognizer: the extraction and the classification of the symbols. Thus, the evaluation of the OMR system is based on the analysis of the results of these two stages.

### 6.1. Performance Evaluation of the Algorithm of Symbol Extraction

Success rates of the extraction algorithm for samples written in the notation styles of Britto and Sanz.

| Notation style | % of symbols correctly extracted | % of symbols extracted with errors | % of symbols completely lost |
|---|---|---|---|
| Britto | 80.58% | 12.81% | 6.61% |
| Sanz | 64.98% | 14.34% | 20.68% |

Both the symbols correctly extracted and those extracted with errors are included in the evaluation of the classifier, as described in the next sections.

### 6.2. Implementation Results of the k-NN Classifier

Correct classification rates for the k-NN method for symbols correctly extracted and partially extracted. The methods employed for the extraction of the vectors of features are: the Fourier descriptor, with the edge function computed by distance from the centroid (FD1), the Wavelet transform coefficients (WTC), the Fourier transform coefficients (FTC) and the two sets of angular-radial transform coefficients based on center of gravity of the edges (ART1) and on the center of the bounding box (ART2).

K-NN classifier results

| Features | K | Entire symbols, Britto | Entire symbols, Sanz | Partial symbols, Britto | Partial symbols, Sanz |
|---|---|---|---|---|---|
| FD1 | 1 | 72.31% | 57.80% | 64.52% | 29.41% |
| FD1 | 3 | 73.33% | 58.40% | 61.29% | 29.41% |
| FD1 | 5 | 74.87% | 61.04% | 64.52% | 26.47% |
| WTC | 1 | 63.08% | 40.26% | 16.13% | 5.88% |
| WTC | 3 | 68.72% | 46.75% | 16.13% | 8.82% |
| WTC | 5 | 74.36% | 51.30% | 16.13% | 11.76% |
| FTC | 1 | 48.72% | 35.71% | 0% | 2.94% |
| FTC | 3 | 48.72% | 40.91% | 0% | 2.94% |
| FTC | 5 | 53.33% | 44.81% | 0% | 2.94% |
| ART1 | 1 | 95.90% | 79.87% | 58.06% | 20.59% |
| ART1 | 3 | 95.38% | 78.57% | 61.29% | 26.47% |
| ART1 | 5 | 95.38% | 81.82% | 70.97% | 26.47% |
| ART2 | 1 | 94.36% | 75.32% | 22.58% | 32.35% |
| ART2 | 3 | 91.79% | 72.73% | 41.93% | 35.29% |
| ART2 | 5 | 91.79% | 70.12% | 51.61% | 38.23% |

The results are generally better for the scores that use the notation style of Britto than for those that use the Sanz notation style. This is mainly due to the generally lower quality of the scores written in the Sanz notation style and, also, to the different discrimination capabilities of the features when applied to different notation styles.

In general, the Fourier descriptors show good performance, giving the best results for symbols that are hard to recognize or only partially extracted. This is mainly due to the approach used for the selection of the largest object in the frame, based on the contours (Section 5.1.1). The wavelet coefficients seem to be more heavily influenced by the poorer condition of the Sanz style scores, but the classifier attains reasonable results when using the k-NN method. Recall that the k-NN classifier depends heavily on the robustness of the features employed; this explains the poor results obtained with the Fourier transform coefficients, for which only one feature is used (Section 5.2.1). Finally, the k-NN classifier implemented with the angular-radial transform (ART) coefficients gives the best results. The method that uses the centroid computed as the center of gravity of the contour of the objects shows slightly better classification rates than the approach that uses the center of the bounding box of the object.

### 6.3. Implementation Results of the Classifier Based on the Mahalanobis Distance

Correct classification rates for the classifier based on the Mahalanobis distance. The feature vectors are obtained from the angular-radial transform coefficients, computed with respect to the center of gravity of the contour (ART1) and to the center of the bounding box (ART2).

Mahalanobis distance classifier results

| Features | Britto, entire symbols | Sanz, entire symbols | Britto, partial symbols | Sanz, partial symbols |
|---|---|---|---|---|
| ART1 | 74.36% | 59.09% | 12.90% | 35.29% |
| ART2 | 69.23% | 56.49% | 48.39% | 23.53% |

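Better results could be expected if more training samples of all the classes were available, since this would allow larger feature vectors to be used in this procedure: the covariance matrix of each class is estimated from that class's samples, and it cannot be inverted unless the class has more samples than there are features.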
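The sketch below illustrates a minimum-Mahalanobis-distance classifier and makes the sample-size constraint explicit. It assumes NumPy and one covariance matrix estimated per class (a pooled covariance is a common alternative); the class name `MahalanobisClassifier` is hypothetical, and the check on the number of samples per class reflects the general rule that an n-sample covariance estimate has rank at most n − 1, not a documented detail of the system.

```python
import numpy as np

class MahalanobisClassifier:
    """Assigns each vector to the class mean at the smallest Mahalanobis distance.

    Hypothetical sketch: one covariance matrix is estimated per class.
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        self.means_, self.inv_covs_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            # An n-sample covariance has rank at most n - 1, so it can only be
            # inverted when the class has more samples than features.
            if len(Xc) <= n_features:
                raise ValueError(
                    f"class {c!r} has {len(Xc)} samples for {n_features} features; "
                    "its covariance matrix would be singular")
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            self.means_.append(Xc.mean(axis=0))
            self.inv_covs_.append(np.linalg.inv(cov))
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Squared distances (x - m)^T S^-1 (x - m), one column per class.
        d2 = np.stack([
            np.einsum("ij,jk,ik->i", X - m, S_inv, X - m)
            for m, S_inv in zip(self.means_, self.inv_covs_)
        ], axis=1)
        return self.classes_[np.argmin(d2, axis=1)]
```

With rare symbol classes, this constraint caps the number of features that can be used, which is why more training samples would permit larger feature vectors.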

### 6.4. Implementation Results of the Fisher Linear Discriminant

Correct classification rates for the Fisher method, for both correctly extracted and partially extracted symbols. The feature vectors employed are the Fourier descriptors of the distance from the contour points to their centroid (FD1) and the two sets of angular-radial transform coefficients, centered at the centroid of the contour (ART1) and at the center of the bounding box (ART2). Class membership is decided using either a k-NN rule or a Gaussian model.

Fisher linear classifier results

| Features | Membership rule | K | Britto, entire symbols | Sanz, entire symbols | Britto, partial symbols | Sanz, partial symbols |
|---|---|---|---|---|---|---|
| ART1 | k-NN | 1 | 73.33% | 59.74% | 41.94% | 20.59% |
| ART1 | k-NN | 3 | 74.87% | 64.28% | 45.16% | 17.65% |
| ART1 | k-NN | 5 | 75.38% | 64.28% | 41.94% | 17.65% |
| ART2 | k-NN | 1 | 67.18% | 45.45% | 45.16% | 17.65% |
| ART2 | k-NN | 3 | 67.18% | 47.40% | 45.16% | 20.59% |
| ART2 | k-NN | 5 | 67.18% | 49.35% | 45.16% | 20.59% |
| ART1 | Gaussian | – | 72.31% | 59.06% | 32.26% | 23.53% |
| ART2 | Gaussian | – | 63.08% | 46.10% | 45.16% | 20.59% |
| FD1 | Gaussian | – | 62.05% | 51.95% | 35.48% | 29.41% |

Again, the results obtained with the angular-radial transform coefficients referenced to the center of gravity of the contour (ART1) are better than those of the alternative approaches. Note that there is no marked difference between the k-NN and the Gaussian methods when classifying the projections of the feature vectors.
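The following sketch outlines the two-stage scheme: feature vectors are projected onto the leading eigenvectors of S_W^{-1} S_B (the multiclass Fisher criterion, where S_W and S_B are the within-class and between-class scatter matrices), and class membership is then decided in the projected space either by a k-NN majority vote or by a per-class Gaussian likelihood with equal priors. It is a minimal NumPy sketch with hypothetical function names; it assumes S_W is invertible, which again requires enough samples of every class, and it breaks k-NN ties in favor of the smallest label, which may differ from the rule used in the system.

```python
import numpy as np

def fisher_directions(X, y, n_components):
    """Leading eigenvectors of S_W^-1 S_B (multiclass Fisher discriminant)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        diff = (mc - mu)[:, None]
        S_b += len(Xc) * (diff @ diff.T)        # between-class scatter
    # Assumes S_w is invertible (enough samples per class).
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real

def knn_predict(Z_train, y_train, Z_test, k=3):
    """Majority vote among the k nearest training projections (Euclidean).

    Ties go to the smallest label, because np.unique returns sorted labels.
    """
    y_train = np.asarray(y_train)
    preds = []
    for z in np.atleast_2d(Z_test):
        nearest = np.argsort(np.linalg.norm(Z_train - z, axis=1))[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

def gaussian_predict(Z_train, y_train, Z_test):
    """Pick the class with the highest Gaussian log-likelihood (equal priors)."""
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Zc = Z_train[y_train == c]
        m = Zc.mean(axis=0)
        S = np.atleast_2d(np.cov(Zc, rowvar=False))
        diff = np.atleast_2d(Z_test) - m
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)
        # Constant terms shared by all classes are dropped.
        scores.append(-0.5 * (maha + np.log(np.linalg.det(S))))
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```

Keeping at most C − 1 directions for C classes is the usual choice, since S_B has rank at most C − 1.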

## 7. Computer Music Representation

After all the stages of the OMR system are completed (Figure 2), the recognized symbols can be used to write out the score in different engraving styles or even to play it back. Currently, there is no real standard for computer symbolic music representation [8], although several representation formats (sometimes linked to specific applications) are available. Among them, MusicXML [44] is a format for representing Western music notation from the 17th century onwards. WEDELMUSIC [45] is an XML-compliant format that can include the image of the score together with an associated WAV or MIDI file; it is mainly aimed at supporting the development of new emerging applications. GUIDO [46] is a general-purpose language for representing scores. More recently, MPEG-SMR (Symbolic Music Representation) [47] aims to become a real standard for computer music representation that meets the emerging needs of new interactive music applications.

In our case, we have selected Lilypond [48] for music engraving. This program, together with its input language, is part of the GNU project and accepts plain ASCII input to engrave the score. It determines the spacing by itself and breaks lines and pages to provide a tight and uniform layout. An important feature in our context is that it can draw the score in modern notation and, with minimal changes, also in white mensural notation [49]. Additionally, the program can generate a MIDI file of the typed score [50], so that the recognized score can be listened to.
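Because Lilypond reads plain-text input, the final stage only has to serialize the recognized symbols. The sketch below shows one way this could be done for a single voice. The symbol names, the assumption that pitches arrive already spelled in Lilypond's absolute syntax, and the mapping from mensural values to Lilypond durations (brevis to `\breve`, semibrevis to `1`, minima to `2`, semiminima to `4`) are all illustrative assumptions; only standard Lilypond constructs (`\version`, `\header`, and a `\score` block with `\layout { }` and `\midi { }` to request both the engraved score and a MIDI file) are emitted.

```python
# Hypothetical mapping from recognized mensural note values to Lilypond durations.
DURATIONS = {
    "brevis": "\\breve",
    "semibrevis": "1",
    "minima": "2",
    "semiminima": "4",
}

def to_lilypond(notes, title="Recognized score"):
    """Serialize (pitch, value) pairs such as ("c'", "minima") as Lilypond source.

    Pitches are assumed to be already spelled in Lilypond's absolute syntax
    (c' is middle C); a rest could be passed with pitch "r".
    """
    body = " ".join(pitch + DURATIONS[value] for pitch, value in notes)
    return "\n".join([
        '\\version "2.12.0"',
        f'\\header {{ title = "{title}" }}',
        "\\score {",
        f"  {{ {body} }}",
        "  \\layout { }",   # engraved output
        "  \\midi { }",     # MIDI output
        "}",
    ])

if __name__ == "__main__":
    print(to_lilypond([("c'", "semibrevis"), ("d'", "minima"), ("f'", "brevis")]))
```

Switching between modern and mensural output would then mostly be a matter of changing the header and layout context, in line with the small changes described in the text.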

*brevis* in white mensural notation) has been written as a quarter note tied to a whole note tied to a dotted half note, instead of as a square note (a square note is worth two whole notes). These changes must be made by hand, since the version of Lilypond employed does not make such corrections automatically.
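As a quick check that the tied replacement keeps the length of the *brevis*, measure note values in whole notes and count the dotted half note as 1/2 + 1/4:

$$\tfrac{1}{4} + 1 + \left(\tfrac{1}{2} + \tfrac{1}{4}\right) = 2,$$

which is the value of one *brevis* (two whole notes).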

Note that the headers differ depending on the type of notation selected (Figure 30). Also note that the code for the modern notation includes, at the end, the command [50] to generate the corresponding MIDI file.

## 8. Conclusions and Discussion

A complete OMR system for the analysis of manuscript music scores of the XVII-th and early XVIII-th centuries written in white mensural notation has been presented, and two different notation styles have been considered. Several methods for extracting features of the music symbols have been implemented, and the resulting feature vectors have been used in several classification strategies.

User interaction has been limited to the selection of an initial ROI and, at certain stages, the choice among some of the processing techniques available in the system. Also, the computation of the Hough transform used to correct the global rotation of the image, which is the most time-consuming process in the implemented system, can be replaced by manually entering the rotation angle.
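For reference, the idea behind the automatic alternative can be sketched as follows: staff lines are long, nearly horizontal runs of black pixels, so the normal angle θ at which the Hough accumulator ρ = x cos θ + y sin θ reaches its highest vote count reveals the skew. This NumPy sketch is a simplification under stated assumptions (a coarse angle grid close to the horizontal and a plain maximum over the accumulator), not the system's implementation; `max_angle` and `step` are hypothetical parameters.

```python
import numpy as np

def estimate_skew_deg(binary, max_angle=5.0, step=0.1):
    """Estimate the tilt (degrees) of near-horizontal lines with a Hough vote.

    `binary` is a 2-D array whose nonzero entries are black pixels. The angle is
    the slope angle in array coordinates (row index increasing downward).
    For each candidate angle a, pixels are projected onto the line normal
    theta = 90 + a degrees, rho = x*cos(theta) + y*sin(theta), and the
    tallest accumulator bin is recorded.
    """
    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        return 0.0
    best_angle, best_votes = 0.0, -1
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        theta = np.deg2rad(90.0 + angle)
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        votes = np.bincount(rho - rho.min()).max()
        if votes > best_votes:
            best_angle, best_votes = float(angle), votes
    return best_angle
```

The loop visits every black pixel once per candidate angle, so its cost grows with both the image content and the angular resolution, which is consistent with offering manual entry of the angle as a faster option.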

In the experiments, it has been observed that, despite the size of the database of scores used for training, the presence of some rare symbols had a significant influence on the system. Some of the classification strategies have been adapted to use a reduced number of features in order to handle these rare symbols in the same way as the more common ones. Hence, the methods that do not suffer from the scarcity of reference samples are the ones that attain the best performance.

The best combination of techniques uses the k-NN method with feature vectors based on the angular-radial transform (ART) coefficients. Success rates reach about of symbols correctly recognized. Such performance is attained using a k-NN classifier that employs a large number of features ( angular radial transform coefficients) that are able to represent, with a high level of reliability, the very complex shape of the extracted symbols.

The open-source music engraving program Lilypond has proved useful for producing new scores from the processed ones, either in modern notation or in white mensural notation as in the original scores. MIDI files can also be generated automatically.

Future developments of the system will focus mainly on improving the performance of the preprocessing steps, since these stages are critical for the whole OMR system. Also, a larger training database would allow some of the evaluated classification strategies, such as those based on the Fisher linear discriminant, to be used more robustly; these are currently limited by the availability of samples of certain rare classes.

## Declarations

### Acknowledgments

This work has been funded by the Ministerio de Educación y Ciencia of the Spanish Government under Project no. TSI2007-61181 and by the Junta de Andalucía under Project no. P07-TIC-02783. The authors are grateful to the person in charge of the Archivo de la Catedral de Málaga, who allowed the use of the data sets employed in this work.


## References

- Bainbridge D, Bell T: **The challenge of optical music recognition.** *Computers and the Humanities* 2001, **35**(2):95-121. 10.1023/A:1002485918032
- Wolman J, Choi J, Asgharzadeh S, Kahana J: **Recognition of handwritten music notation.** *Proceedings of the International Computer Music Conference, 1992, San Jose, Calif, USA* 125-127.
- McGee W, Merkley P: **The optical scanning of medieval music.** *Computers and the Humanities* 1991, **25**(1):47-53.
- Carter NP: **Segmentation and preliminary recognition of madrigals notated in white mensural notation.** *Machine Vision and Applications* 1992, **5**(3):223-230. 10.1007/BF02627000
- Pugin L, Burgoyne JA, Fujinaga I: **Goal-directed evaluation for the improvement of optical music recognition on early music prints.** *Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries (JCDL '07), June 2007, Vancouver, Canada* 303-304.
- Caldas Pinto J, Vieira P, Ramalho M, Mengucci M, Pina P, Muge F: **Ancient music recovery for digital libraries.** *Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries, January 2000, Lecture Notes in Computer Science* 24-34.
- Fujinaga I: *Adaptive optical music recognition, Ph.D. thesis*. Faculty of Music, McGill University; June 1996.
- Droettboom M, Fujinaga I, MacMillan K: **Optical music interpretation.** *Proceedings of the IAPR International Workshop on Structural, Syntactic and Statistical Pattern Recognition, 2002, Lecture Notes in Computer Science* 378-386.
- Bainbridge D: **Optical music recognition.** In *Progress Report*. Department of Computer Science, University of Canterbury; 1994.
- González RC, Woods RE: *Digital Image Processing*. Prentice-Hall, Upper Saddle River, NJ, USA; 2007.
- Trabocchi O, Sanfilippo F: *Efectos de la iluminación*. Ingeniero en Automatización Y Control Industrial, Universidad Nacional de Quilmes; 2005.
- Sanfilippo F, Trabocchi O: *Tipos de iluminación*. Ingeniero en Automatización Y Control Industrial, Universidad Nacional de Quilmes; 2005.
- Otsu N: **A threshold selection method from gray-level histograms.** *IEEE Transactions on Systems, Man and Cybernetics* 1979, **9**(1):62-66.
- Lu H, Cary PD: **Deformation measurements by digital image correlation: implementation of a second-order displacement gradient.** *Experimental Mechanics* 2000, **40**(4):393-400. 10.1007/BF02326485
- Lobb R, Bell T, Bainbridge D: **Fast capture of sheet music for an agile digital music library.** *Proceedings of the International Symposium on Music Information Retrieval, 2005* 145-152.
- Pitas I: *Digital Image Processing Algorithms and Applications*. Wiley-Interscience, New York, NY, USA; 2000.
- Pratt WK: *Digital Image Processing*. John Wiley & Sons, New York, NY, USA; 2007.
- Duda RO, Hart PE: *Pattern Classification and Scene Analysis*. John Wiley & Sons, New York, NY, USA; 1973.
- Dalitz C, Droettboom M, Pranzas B, Fujinaga I: **A comparative study of staff removal algorithms.** *IEEE Transactions on Pattern Analysis and Machine Intelligence* 2008, **30**(5):753-766.
- Szwoch M: **A robust detector for distorted music staves.** *Proceedings of the 11th International Conference on Computer Analysis of Images and Patterns (CAIP '05), September 2005* **3691:**701-708.
- Therrien CW: *Decision Estimation and Classification: An Introduction to Pattern Recognition and Related Topics*. John Wiley & Sons, New York, NY, USA; 1989.
- Lim JS: *Two-Dimensional Signal and Image Processing*. Prentice-Hall, Upper Saddle River, NJ, USA; 1990.
- Blostein D, Baird HS: **A critical survey of music image analysis.** In *Structured Document Image Analysis*. Edited by: Baird HS, Bunke H, Yamamoto K. Springer, Berlin, Germany; 1992:405-434.
- Pruslin D: *Automatic recognition of sheet music, Ph.D. thesis*. Massachusetts Institute of Technology, Cambridge, Mass, USA; June 1966.
- Prerau DS: *Computer pattern recognition of standard engraved music notation, Ph.D. thesis*. Massachusetts Institute of Technology, Cambridge, Mass, USA; September 1970.
- Andronico A, Ciampa A: **On automatic pattern recognition and acquisition of printed music.** *Proceedings of the International Computer Music Conference (ICMC '82), 1982* 245-278.
- Mahoney JV: *Automatic analysis of musical score images, M.S. thesis*. Massachusetts Institute of Technology, Cambridge, Mass, USA; 1982.
- Roach JW, Tatem JE: **Using domain knowledge in low-level visual processing to interpret handwritten music: an experiment.** *Pattern Recognition* 1988, **21**(1):33-44. 10.1016/0031-3203(88)90069-6
- Carter NP: *Automatic recognition of printed music in the context of electronic publishing, Ph.D. thesis*. University of Surrey; February 1989.
- Kato H, Inokuchi S: **The recognition system of printed piano using musical knowledge and constraints.** *Proceedings of IAPR Workshop on Syntactic and Structured Pattern Recognition, June 1990* 231-248.
- Duda RO, Hart PE, Stork DG: *Pattern Classification*. Wiley-Interscience, New York, NY, USA; 2000.
- Ng KC, Boyle RD: **Recognition and reconstruction of primitives in music scores.** *Image and Vision Computing* 1996, **14**(1):39-46. 10.1016/0262-8856(95)01038-6
- Gezerlis VG, Theodoridis S: **Optical character recognition of the Orthodox Hellenic Byzantine music notation.** *Pattern Recognition* 2002, **35**(4):895-914. 10.1016/S0031-3203(01)00098-X
- Theodoridis S, Koutroumbas K: *Pattern Recognition*. Academic Press, New York, NY, USA; 2006.
- Llobet R, Pérez J, Paredes R: **Técnicas reconocimiento de formas aplicadas al diagnóstico de cáncer asistido por ordenador.** *RevistaeSalud.com* 2006, **2**(7).
- Sonka M, Hlavac V, Boyle R: *Image Processing, Analysis and Machine Vision*. Cambridge University Press, Cambridge, UK; 1993.
- Zahn CT, Roskies RZ: **Fourier descriptors for plane closed curves.** *IEEE Transactions on Computers* 1972, **21**(3):269-281.
- Christopoulos C, Skodras A, Ebrahimi T: **The JPEG2000 still image coding system: an overview.** *IEEE Transactions on Consumer Electronics* 2000, **46**(4):1103-1127. 10.1109/30.920468
- Hwang S-K, Kim W-Y: **Fast and efficient method for computing ART.** *IEEE Transactions on Image Processing* 2006, **15**(1):112-117.
- Höynck M, Ohm J-R: **Shape retrieval with robustness against partial occlusion.** *Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), April 2003, Hong Kong* **3:**593-596.
- De Maesschalck R, Jouan-Rimbaud D, Massart DL: **Tutorial: the Mahalanobis distance.** *Chemometrics and Intelligent Laboratory Systems* 2000, **50:**1-8. 10.1016/S0169-7439(99)00047-7
- Stark H, Woods JW: *Probability Random Processes & Estimation Theory for Engineers*. Prentice-Hall, Upper Saddle River, NJ, USA; 1994.
- Bainbridge D, Bell TC: **Dealing with superimposed objects in optical music recognition.** *Proceedings of the 6th International Conference on Image Processing and Its Applications, July 1997, Dublin, Ireland* **2:**756-760.
- Good M: **MusicXML for notation and analysis.** In *The Virtual Score: Representation, Retrieval, Restoration, Computing in Musicology, no. 12, Cambridge, Mass, USA*. Edited by: Hewlett WB, Selfridge-Field E. MIT Press; 2001:113-124.
- Bellini P, Nesi P: **WEDELMUSIC format: an XML music notation format for emerging applications.** *Proceedings of the 1st International Conference on WEB Delivering of Music (WEDELMUSIC '01), 2001* 79-86.
- Hoos HH, Hamel KA, Renz K, Kilian J: **The GUIDO notation format. A novel approach for adequately representing score-level music.** *Proceedings of the International Computer Music Conference, 1998* 451-454.
- Bellini P, Nesi P, Zoia G: **Symbolic music representation in MPEG.** *IEEE Multimedia* 2005, **12**(4):42-49. 10.1109/MMUL.2005.82
- **LilyPond … music notation for everyone.** 2009, http://lilypond.org/web/index
- Nienhuys H-W, Nieuwenhuizen J: **Lilypond, a system for automated music engraving.** *Proceedings of the 14th Colloquium on Musical Informatics, 2003, Florence, Italy* CIM-1-CIM-6.
- **GNU LilyPond: notation reference.** 2009, http://lilypond.org/doc/v2.12/Documentation/user/lilypond.pdf

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.