Quality assessment of image-based biometric information

The quality of biometric raw data is one of the main factors affecting the overall performance of biometric systems. Poor-quality biometric samples increase the enrollment failure rate and degrade system performance. Hence, controlling the quality of the acquired biometric raw data is essential for building useful biometric authentication systems. Towards this goal, we present a generic methodology for the quality assessment of image-based biometric modalities that combines two types of information: 1) image quality and 2) pattern-based quality computed with the scale-invariant feature transform (SIFT) descriptor. The associated metric has the advantages of being multimodal (face, fingerprint and hand veins) and independent of the authentication system used. Six benchmark databases and one biometric verification system are used to illustrate the benefits of the proposed metric. A comparison study with the NIST Fingerprint Image Quality (NFIQ) metric, proposed by the National Institute of Standards and Technology (NIST), shows the benefits of the presented metric.


Introduction
Biometric systems are increasingly used in daily life to manage access to physical resources (such as border control) and logical resources (such as e-commerce). Biometrics relies on authentication factors based on "something that qualifies the user" and "something the user can do". The main benefits of this authentication method are the strong link between individuals and their authenticators, as well as its ease of use.
Also, it is usually more difficult to copy the biometric characteristics of an individual than the credentials of most other authentication methods, such as passwords.
Despite the advantages of biometric systems, several drawbacks limit their proliferation. The main one is the uncertainty of the verification result. Unlike password checking, the verification of biometric raw data is subject to errors and yields a similarity score (100% similarity is never reached). This verification inaccuracy has many causes, such as variations of human characteristics (e.g., occlusions [1]), environmental factors (e.g., illumination [2]) and cross-device matching [3]. Such acquisition artifacts may deeply affect the performance of biometric systems and hence limit their use in real-life applications. Moreover, the impact of quality on overall system performance is also shown by the results of the FVC series of competitions (FVC2000, FVC2002, FVC2004 and FVC2006 [4]). Therefore, controlling the quality of the acquired biometric raw data is considered an essential step in both the enrollment and verification phases. Using quality information, poor-quality samples can be removed during enrollment or rejected during verification. Such information could also be used for soft biometrics and multimodal approaches [5,6].
We present in this paper a quality assessment metric for image-based biometric raw data that uses two types of information: 1) image quality and 2) pattern-based quality computed from the SIFT keypoints extracted from the image. The presented metric has the advantages of being multimodal (face, fingerprint and hand veins) and independent of the authentication system used. The outline of the paper is as follows: Section 2 presents previous work on the quality assessment of biometric raw data. Section 3 presents the proposed quality assessment metric. Section 4 describes the experimental results obtained on the six benchmark biometric databases (four face databases, one fingerprint database and one hand-vein database).
A comparison study with the NFIQ (NIST Fingerprint Image Quality) metric on fingerprints is given in Section 5. A conclusion and some perspectives of this work are given in Section 6.

Related works
The quality assessment of biometric raw data is receiving increasing attention in the biometrics community. In this section, we present an overview of existing image-based biometric quality metrics.
The quality assessment of biometric raw data can be considered from three points of view, as illustrated in Figure 1 [7]:
• Character: refers to the quality of the physical features of the individual.
• Fidelity: refers to the degree of similarity between a biometric sample and its source.
• Utility: refers to the impact of an individual biometric sample on the overall performance of a biometric system.
In biometrics, there is an international consensus that the quality of a biometric sample should be related to its recognition performance [8]. Therefore, we present in this paper a utility-based quality assessment metric for biometric raw data.
Alonso-Fernandez et al. [9] present an extensive overview of existing fingerprint quality metrics, which fall into three major categories: 1) methods based on local features of the image; 2) methods based on global features of the image; and 3) methods that address quality assessment as a classification problem.
The methods presented in [9] have shown their efficiency in predicting the quality of fingerprint images. However, these methods are modality-dependent, so they cannot be used for other modalities (such as face). An example of these metrics is the NIST Fingerprint Image Quality metric (NFIQ) [10], which is dedicated to fingerprint quality evaluation.
Shen et al. [11] applied Gabor filters to identify blocks with clear ridge and valley patterns as good quality blocks. Lim et al. [12] combined local and global spatial features to detect low quality and invalid fingerprint images.
Chen et al. [13] developed two new quality indices for fingerprint images.
The first index measures the energy concentration in the frequency domain as a global feature. The second index measures the spatial coherence in local regions. These methods have shown their efficiency in predicting the quality of fingerprint images. However, they are dedicated to the fingerprint modality and cannot be used for other modalities such as vein images.
Krichen et al. [1] present a probabilistic iris quality measure based on a Gaussian Mixture Model (GMM). The authors compared the efficiency of their metric with existing ones under two types of alterations (occlusion and blurring) that may significantly decrease the performance of iris recognition systems. Chaskar et al. [14] assessed nine quality factors of iris images, such as Ideal Iris Resolution (IIR) and Actual Iris Resolution (AIR). Other iris quality metrics are presented in [15,16]. However, these methods measure the quality of iris images and cannot be used for other types of modalities.
He et al. [17] present a hierarchical model to compute biometric sample quality at three levels: database, class and image. The method is based on the quantiles of the genuine and impostor matching score distributions. However, their model cannot be used directly on a single capture (i.e., it requires a pre-acquired database).
Zhang and Wang [2] present an asymmetry-based quality assessment method for face images that uses the SIFT descriptor. The method has shown its robustness against illumination and pose variations. Another asymmetry-based method is presented in [18,19]. However, this approach relies on the asymmetry hypothesis and hence cannot be used for other types of modalities.
For the finger-vein modality, very few works exist that predict the quality of finger-vein images. One example is the work of Qin et al. [20], who present a quality assessment method for finger-vein images based on the Radon transform to detect local vein patterns. We believe that more extensive work is needed in this area, since the vein modality is considered a promising solution for deployment.

Discussion
Quality assessment of biometric raw data is an essential step towards better accuracy in real-life applications. Despite this, little research has been conducted on this topic compared with research on performance. Moreover, most of the existing quality metrics are modality- and matcher-dependent. The others, based on the genuine and impostor matching score distributions, cannot be used directly on a single capture (i.e., they require a large number of captures of the same person in order to build its genuine score distribution). Therefore, the main contribution of this paper is the definition of a quality metric that can be considered independent of the matching system used and that can be used for several biometric modalities (face, fingerprint and hand-vein images). It detects, with reasonable accuracy, three types of alterations that may deeply affect the global performance of the most widely used matching systems.
The presented metric is not based on the asymmetry hypothesis. Thus, it may be used for several types of modalities (such as fingerprint, face, hand and finger veins), and it can be used directly on a single capture once the model has been trained.

Developed Metric
The presented metric is designed to quantify the quality of image-based biometric data using two types of information, as illustrated in Figure 2. The retained principle is as follows: using one image quality criterion (Section 3.1) and four pattern-based quality criteria (Section 3.2), an SVM-based classification process (Section 3.3) predicts the quality of the target biometric data.

No-reference image quality
Image quality assessment is an active research topic that is widely used to validate processing applied to digital images. In the context of image compression, for example, this kind of assessment is used to quantify the quality of the reconstructed image. Existing image quality assessment metrics are divided into three categories: 1) Full-Reference (FR) quality metrics, where the target image is compared with a reference signal that is assumed to have perfect quality; 2) Reduced-Reference (RR) quality metrics, where a description of the target image is compared with a description of the reference signal; and 3) No-Reference (NR) quality metrics, where the target image is evaluated without any reference to the original one. Despite the acceptable performance of current FR quality algorithms, the need for a reference signal limits their application and calls for reliable no-reference algorithms.
In our study, we use a No-Reference Image Quality Assessment (NR-IQA) index, since no reference image exists. The NR-IQA method used in this paper is the BLIINDS (BLind Image Integrity Notator using DCT Statistics) index introduced by Saad et al. [21]. This index operates in the DCT domain, which makes it computationally convenient, relies on a widely used transform, and provides a coherent framework. The BLIINDS index is defined from four features (υ1 to υ4), computed on 17 × 17 image patches centered at every pixel of the image and then pooled together.
The first feature is contrast, a basic perceptual attribute of an image. One may distinguish between global contrast measures and ones computed locally (and possibly pooled into one measure after local extraction). The contrast of the k-th local DCT patch is computed as follows:

$$c_k = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i^{AC}}{x_{DC}}$$

where N is the patch size, x_DC represents the DC coefficient and the set {x_i^AC | i = 1, ..., N} represents the AC coefficients. The local contrast scores of all patches are then pooled by averaging to obtain a global image contrast value υ1:

$$\upsilon_1 = \frac{1}{M} \sum_{k=1}^{M} c_k$$

where M is the number of local patches.
The second feature, υ2, is based on kurtosis: for each local DCT patch, the kurtosis of its AC coefficients x^AC is computed to quantify the degree of peakedness and tail weight of their distribution:

$$\kappa(x^{AC}) = \frac{E\left[(x^{AC} - \mu)^4\right]}{\sigma^4}$$

where µ is the mean of x^AC and σ is its standard deviation. The resulting values of all patches are then pooled by averaging the lowest tenth percentile of the obtained values, which gives the global image kurtosis value υ2.

DCT-Based Anisotropy Orientation (υ3 and υ4)
It has been hypothesized that degradation processes damage a scene's directional information. Consequently, anisotropy, a directionally dependent property of images, was shown by Gabarda and Cristóbal [22] to decrease as more degradation is added to the image. The anisotropy measure is computed using the Rényi entropy of DCT image patches along four orientations θ ∈ {0°, 45°, 90°, 135°}. Each patch consists of the DCT coefficients of oriented pixel intensities. The DC coefficient is discarded, since the focus is on directional information. Let P_θ[k, j] denote the DCT coefficients of the k-th patch of orientation θ, where j is the frequency index of the DCT coefficient. Each DCT patch is then normalized as

$$\tilde{P}_\theta[k, j] = \frac{P_\theta[k, j]^2}{\sum_{j=1}^{N} P_\theta[k, j]^2}$$

where N is the size of the oriented k-th patch. The associated Rényi entropy R_θ^k is then computed as

$$R_\theta^k = \frac{1}{1 - \beta} \log_2 \left( \sum_{j=1}^{N} \tilde{P}_\theta[k, j]^{\beta} \right)$$

where β > 1. Finally, the two anisotropy measures υ3 and υ4 are derived from these orientation-dependent entropies, as defined in [21].
Because the perception of image details depends on the image resolution, the distance from the image plane to the observer, and the acuity of the observer's visual system, a multiscale approach is applied to compute the final global score:

$$\text{BLIINDS} = \sum_{j=1}^{4} \sum_{i=1}^{L} \alpha_i^j \, \upsilon_j^{(i)}, \qquad \text{subject to} \quad \sum_{j=1}^{4} \sum_{i=1}^{L} \alpha_i^j = 1$$

where L is the number of decomposition levels used and υ_j^(i) denotes feature υ_j computed at level i. The α_i^j values are obtained from the correlation of each criterion υ_j with the subjective scores given by human observers [21].
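The normalization and Rényi entropy above can be sketched in Python for one oriented line of pixels. This is a minimal sketch under several assumptions: each orientation is represented by a single hypothetical pixel line through a square patch (centre row, anti-diagonal, centre column, main diagonal), β = 3 is an illustrative default rather than a value stated in this text, and returning zero entropy when there is no AC energy is a convention of the sketch. The final anisotropy measures υ3 and υ4 follow [21] and are not reproduced here.

```python
import math


def dct1(seq):
    """Naive orthonormal 1-D DCT-II."""
    n = len(seq)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(seq))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def renyi_entropy(ac, beta=3.0):
    """Renyi entropy (beta > 1) of the energy-normalized AC coefficients."""
    energy = sum(c * c for c in ac)
    if energy == 0.0:
        return 0.0  # convention of this sketch: no AC energy -> zero entropy
    probs = [c * c / energy for c in ac]          # P~ = P^2 / sum(P^2)
    return math.log2(sum(p ** beta for p in probs)) / (1.0 - beta)


def oriented_sequences(patch):
    """Hypothetical sampling of one pixel line per orientation (degrees)."""
    n = len(patch)
    c = n // 2
    return {
        0: [patch[c][j] for j in range(n)],           # centre row
        45: [patch[n - 1 - i][i] for i in range(n)],  # anti-diagonal
        90: [patch[i][c] for i in range(n)],          # centre column
        135: [patch[i][i] for i in range(n)],         # main diagonal
    }


def oriented_entropies(patch, beta=3.0):
    """R_theta per orientation, with the DC coefficient discarded."""
    return {theta: renyi_entropy(dct1(seq)[1:], beta)
            for theta, seq in oriented_sequences(patch).items()}
```

Energy spread evenly over four AC coefficients gives an entropy of log2(4) = 2, while energy concentrated in a single coefficient gives 0, which is the directional "peakedness" the anisotropy measures exploit.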
Examples of quality scores predicted by the BLIINDS index are given in Figure 3 (reported scores: 13.58, 11.15, 9.35 and 8.50). The more strongly the image is degraded, the lower the quality index.

Pattern-based quality
The pattern-based quality criteria are based on statistical measures of keypoint features. We use this approach because keypoint features describe, in a stable manner, the regions of the image where the information is important. This approach is widely used in object [23] and biometric [24] recognition. Several methods exist in the literature for computing descriptor vectors, such as the Scale Invariant Feature Transform (SIFT) [25], Shape Contexts [26] and Speeded Up Robust Features (SURF) [27]. In our study, we use the SIFT algorithm, since a comparison study presented by Mikolajczyk and Schmid [28] showed that SIFT outperformed the other methods.
The SIFT algorithm consists of four major stages: 1) scale-space extrema detection, 2) keypoint localization, 3) orientation assignment and 4) keypoint descriptor computation. In the first stage, potential interest points that are invariant to scale and orientation are identified using a difference-of-Gaussian function.
In the second stage, candidate keypoints are localized to sub-pixel accuracy and eliminated if found to be unstable. The third stage identifies the dominant orientations of each keypoint from its local image patch, and the fourth stage builds a local descriptor for each keypoint from the image gradients in its neighborhood. In other words, each image im is described by a set of invariant features that contribute to the quality assessment of the biometric raw data.

SVM-based classification
In order to predict biometric sample quality from both types of information (image quality and pattern-based quality), we use a Support Vector Machine (SVM). Among existing classification schemes, an SVM-based technique was selected because of the high classification rates obtained in previous works [33] and its high generalization ability. SVMs work in two steps. The first step maps the input data into a high-dimensional feature space through a nonlinear transformation Φ(·). The second step finds an optimal decision hyperplane in this space; the optimality criterion is defined below. Note that for the same training set, different transformations Φ(·) may lead to different decision functions. The transformation is achieved implicitly through a kernel K(·, ·), and consequently the decision function can be defined as:

$$f(x) = \langle w, \Phi(x) \rangle + b = \sum_{i=1}^{n} \alpha_i^* K(x_i, x) + b$$

with α*_i ∈ R, where x_1, ..., x_n are the training samples; the predicted class is given by the sign of f(x). The values w and b are the parameters defining the linear decision hyperplane. In SVMs, the optimality criterion to maximize is the margin, that is, the distance between the hyperplane and the nearest point Φ(x_i) of the training set. The α*_i that optimize this criterion are given by α*_i = y_i α_i, where y_i ∈ {−1, +1} are the training labels and the α_i solve the following problem:

$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$

where C is a penalization coefficient for data points located in or beyond the margin; it provides a compromise between their number and the width of the margin. In this paper, we use the RBF kernel:

$$K(x_i, x_j) = \exp\left(-\gamma \, \lVert x_i - x_j \rVert^2\right)$$

In order to train models with RBF kernels, we use a Python script provided by the libsvm library [35]. This script automatically scales the training and testing sets and searches (on the training set only) for the best pair (C, γ) of kernel parameters using five-fold cross-validation.
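To make the decision function concrete, the following Python sketch evaluates the RBF kernel and f(x) = Σ α*_i K(x_i, x) + b for given support vectors. It is only an illustration of the formulas above: it does not train anything (in the paper, training and the (C, γ) search are delegated to libsvm), the signed coefficients α*_i, the bias b and γ are assumed to come from an already trained model, and assigning the label +1 when f(x) = 0 is a convention of the sketch.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, alpha_star, b, gamma):
    """f(x) = sum_i alpha*_i K(x_i, x) + b, with signed alpha*_i = y_i * alpha_i."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, alpha_star)) + b


def predict(x, support_vectors, alpha_star, b, gamma):
    """Binary label from the sign of f(x); f(x) == 0 is mapped to +1 here."""
    f = decision_value(x, support_vectors, alpha_star, b, gamma)
    return 1 if f >= 0 else -1
```

For the ten quality classes used later, several such binary machines must be combined; libsvm, for instance, handles multi-class problems with a one-against-one scheme.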
Originally, SVMs were essentially developed for two-class problems. However, several approaches can be used to extend SVMs to multi-class problems.

Experimental Results
The goal of the proposed quality metric is to detect, with reasonable accuracy, three synthetic alterations that may deeply affect the most widely used matching systems. The proposed metric may be considered independent of the matching system used. An example of its practical use is illustrated in Figure 5: the method predicts the alteration of the input image; then, depending on the robustness of the matching system against the predicted alteration, the image is assigned a quality level (good, fair, bad or very bad quality).

Protocol
Six benchmark databases and one biometric matching algorithm are used in order to validate the proposed metric.

Alteration process
In this study, we introduce three types of synthetic alterations, with three levels for each type, using MATLAB:
• Blurring alteration: blurred images are obtained using a two-dimensional Gaussian filter. To do so, we use the fspecial('gaussian', hsize, σ) function, which returns a rotationally symmetric Gaussian lowpass filter of size hsize with standard deviation σ.
Using these alterations, the input vector to the SVM consists of the five retained quality criteria (one image quality criterion and four pattern-based criteria), and the output belongs to one of ten classes defined as follows (see Table 2):
• class 1 corresponds to a reference image.
• classes 2 to 10 correspond to the 3 types of alterations with 3 levels for each type (see Section 4.1.2 for details about the introduced alterations).
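The blurring alteration can be approximated outside MATLAB with the Python sketch below. The kernel mirrors what fspecial('gaussian', hsize, σ) documents (a centred grid of exp(−(x² + y²)/(2σ²)) values normalized to sum to 1; MATLAB's additional zeroing of negligible values is omitted). The filtering step is an assumption of this sketch: the exact MATLAB filtering call and its border handling are not stated in the text, so zero padding and odd kernel sizes are used here. The noise and resize alterations are not reproduced because their parameters are not given.

```python
import math


def gaussian_kernel(hsize, sigma):
    """Normalized, rotationally symmetric hsize x hsize Gaussian kernel."""
    half = (hsize - 1) / 2.0
    k = [[math.exp(-((c - half) ** 2 + (r - half) ** 2) / (2.0 * sigma ** 2))
          for c in range(hsize)] for r in range(hsize)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def gaussian_blur(image, hsize, sigma):
    """Filter a 2-D list of intensities with the kernel above (zero padding)."""
    if hsize % 2 == 0:
        raise ValueError("this sketch only supports odd hsize")
    k = gaussian_kernel(hsize, sigma)
    h, w, r = len(image), len(image[0]), hsize // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(hsize):
                for b in range(hsize):
                    y, x = i + a - r, j + b - r
                    if 0 <= y < h and 0 <= x < w:   # pixels outside count as 0
                        acc += k[a][b] * image[y][x]
            out[i][j] = acc
    return out
```

Increasing hsize and σ yields stronger blur, which is one way to produce the three alteration levels.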

Benchmark databases
In this study, we use six benchmark databases. For each database, we introduce three types of alterations (blurring, Gaussian noise and resizing). Figure 12 shows these alterations on a sample from the FACES94 database.

Biometric matching algorithm
The used biometric matching algorithm is a SIFT-based algorithm [25].
The matching similarity principle used is described in previous works [24].
Each image im is described by a set of invariant features X(im), as described in Section 3.2. The verification between two images im1 and im2 corresponds to the similarity between the two sets of features X(im1) and X(im2). We use the following matching method, which is a modified version of a decision criterion first proposed by Lowe [25]. Given two keypoints x ∈ X(im1) and y ∈ X(im2), we say that x is associated to y if:

$$d(x, y) = \min_{z \in X(im_2)} d(x, z) \quad \text{and} \quad d(x, y) \le C \, d(x, y')$$

where C is an arbitrary threshold, d(·, ·) denotes the Euclidean distance between SIFT descriptors, and y' denotes any point of X(im2) whose distance to x is minimal but greater than d(x, y):

$$d(x, y') = \min_{z \in X(im_2),\; d(x, z) > d(x, y)} d(x, z)$$

In other words, x is associated to y if y is the closest point to x in X(im2) according to the Euclidean distance between SIFT descriptors, and if the second smallest value of this distance, d(x, y'), is significantly greater than d(x, y). The size of the required gap between d(x, y) and d(x, y') is encoded by the constant C. Then, we consider that the keypoint x is matched to y if and only if x is associated to y and y is associated to x. Figure 13 presents an example of matching resulting from a genuine and an impostor comparison.
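The association and mutual-matching rule can be sketched in Python on plain descriptor vectors. The sketch assumes descriptors are given as equal-length tuples of floats; the default C = 0.8 merely echoes the distance ratio Lowe suggested and is not the value used by the authors, which is not given here; and treating a keypoint with no valid y' as "not associated" is a conservative choice of this sketch.

```python
import math


def associated_index(x, candidates, C):
    """Index of y in candidates if x is associated to y, else None."""
    if not candidates:
        return None
    dists = [math.dist(x, z) for z in candidates]
    d_min = min(dists)
    farther = [d for d in dists if d > d_min]   # candidate values of d(x, y')
    if not farther:
        return None   # no y' exists: treated as "not associated" (assumption)
    if d_min <= C * min(farther):
        return dists.index(d_min)
    return None


def mutual_matches(X1, X2, C=0.8):
    """Pairs (i, j) such that X1[i] and X2[j] are associated in both directions."""
    matches = []
    for i, x in enumerate(X1):
        j = associated_index(x, X2, C)
        if j is not None and associated_index(X2[j], X1, C) == i:
            matches.append((i, j))
    return matches
```

For example, with X1 = [(0, 0), (10, 10)] and X2 = [(0.1, 0), (10, 10.1), (5, 5)], each keypoint of X1 has a clearly closest partner in X2 and vice versa, so both pairs are matched; an ambiguous keypoint whose two nearest neighbours are almost equidistant is rejected.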

Validation process
According to Grother and Tabassi [8], biometric quality metrics should predict matching performance. That is, a quality metric takes biometric raw data and produces a class or a scalar related to the error rates associated with that sample. Therefore, we use the Equal Error Rate (EER), which summarizes the overall performance of a biometric system [42]. The EER is the error rate at the operating point where the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) are equal: the lower the EER, the more accurate the system is considered to be.
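A minimal Python sketch of the EER computation, assuming similarity scores (higher means more likely genuine, accept when score ≥ threshold). It sweeps the observed scores as thresholds and returns (FAR + FRR) / 2 where the two rates are closest, which is a common approximation rather than an interpolated crossing point; it is not necessarily the exact procedure used in [42].

```python
def far_frr(genuine, impostor, t):
    """FAR and FRR at threshold t for similarity scores (accept if score >= t)."""
    far = sum(s >= t for s in impostor) / len(impostor)
    frr = sum(s < t for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: (FAR + FRR) / 2 where |FAR - FRR| is smallest."""
    thresholds = sorted(set(genuine) | set(impostor))
    thresholds.append(thresholds[-1] + 1.0)   # reject everything: FAR = 0
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        far, frr = far_frr(genuine, impostor, t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer
```

Perfectly separated genuine and impostor scores give an EER of 0, and the EER grows as the two score distributions overlap.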
In order to validate the proposed quality metric, we proceed as follows:
• Quality sets definition: the proposed metric predicts a quality class for the target image. In order to show the utility of this metric, we need to define the quality sets for the authentication system used. Depending on the authentication system, some alterations may have more impact on its global performance than others. We use the EER to illustrate the global performance of the biometric system.
• EER value of each quality set: in order to quantify the effectiveness of our quality metric in predicting system performance, we assign each image to a quality set using the label predicted by our metric, and then compute the EER value of each quality set. The effectiveness of the method is quantified by how well our quality metric predicts system performance across the defined quality sets. Generally speaking, the more the images are degraded, the more the performance of the overall system decreases (as shown by an increase of its EER value).

Quality criteria behavior with alterations
In this section, we show the robustness of the used criteria in detecting the alterations presented in the previous section. To do so, we use Pearson's correlation coefficient between two variables (Equation 13), defined as the covariance of the two variables X and Y divided by the product of their standard deviations σ_X and σ_Y:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

In order to compute the correlation of the used criteria with the three types of alterations, we define variables for each type of alteration and for each criterion p. In the criteria below, N(im) is the number of keypoints detected in image im.

3. Mean (µ) and 4. standard deviation (σ) of scales: the mean and standard deviation of the scales of the keypoints detected in image im.
Therefore, the vector V used to predict biometric sample quality is a five-dimensional vector containing one image quality criterion and four pattern-based criteria, as depicted in Table 3.
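The Pearson correlation used earlier in this section to relate each criterion to the alterations can be sketched in a few lines of Python. The inputs here are generic sequences X and Y, since the exact per-alteration variables are not recoverable from this text; population moments are used, which does not change ρ because the 1/n factors cancel.

```python
import math


def pearson(xs, ys):
    """rho = cov(X, Y) / (sigma_X * sigma_Y), computed with population moments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```

A criterion that decreases steadily as the alteration level increases would yield a correlation close to −1, while a criterion insensitive to the alteration would yield a value near 0.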

Learning the multi-class SVM models
We learned 7 multi-class SVM models: 5 for the face databases and 2 for the hand-vein and fingerprint databases. Table 4 presents the accuracy of the learned multi-class SVM models on both the training and test sets. The symbol "×" appears in the last two rows, since only one multi-class SVM model was generated per database.

Quality sets definition
In order to quantify the robustness of the proposed metric in predicting system performance, we first need to define the quality sets of the biometric authentication system used. To this end, we have tested the robustness of this system against the three alterations presented in Section 4.1.2.
The EER values are computed using the first image as the reference (single enrollment process) and the remaining images for testing. Figure 16 shows that all the introduced alterations have an impact on the overall performance of the matching algorithm presented in Section 4.1.3. Therefore, we define in Table 5 the quality sets for the matching algorithm used.
Figure 16: Impact of alterations on the overall performance of the authentication system used, across the 4 face databases.
Table 5: Quality sets definition for the matching algorithm used (columns: quality set, quality class predicted by the SVM, description).

EER value of each quality set
In order to validate the ability of the proposed quality metric to predict the performance of the matching algorithm, following Grother and Tabassi [8], we calculate the EER value of each quality set predicted by the learned multi-class SVM models. Figure 17 shows the EER values of each quality set across the biometric databases used. From this figure, we can deduce the following: • The proposed metric has shown its efficiency in predicting the performance of the matching system across the 6 biometric databases. Generally speaking, the more the images are altered, the higher the EER values, as demonstrated by the increasing curves in Figure 17. This suggests that the presented metric can be used on different types of images (e.g., images of different resolutions).

Comparison study with NFIQ
In order to show the efficiency of the proposed metric, we present in this section a comparison study with the NIST Fingerprint Image Quality metric (NFIQ) [10]. We use NFIQ because it is the most cited quality metric in the literature for the fingerprint modality. NFIQ provides five quality levels (NFIQ = 1 indicates high-quality samples, whereas NFIQ = 5 indicates poor-quality samples). For the comparison with the proposed method (four quality levels), we consider that the 4th and 5th levels belong to the very bad quality set.
In order to compare the proposed metric with NFIQ, we use the approach suggested by Grother and Tabassi [8] for comparing quality metrics. To do so, we use the Kolmogorov-Smirnov (KS) test [44], a nonparametric test that measures the overlap of two distributions: in our case, the distributions of genuine and impostor scores, respectively. The KS test statistic lies between 0 and 1: for better-quality samples, a larger KS statistic (i.e., a higher separation between the genuine and impostor distributions) is expected.
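The two-sample KS statistic used for this comparison can be sketched in Python as the largest gap between the empirical cumulative distribution functions of the genuine and impostor scores. This sketch computes only the statistic D; SciPy's `scipy.stats.ks_2samp` computes the same statistic together with a p-value if a library routine is preferred.

```python
import bisect


def ks_statistic(genuine, impostor):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_g(x) - F_i(x)|."""
    g, imp = sorted(genuine), sorted(impostor)
    d = 0.0
    for x in g + imp:   # the supremum is reached at one of the sample points
        fg = bisect.bisect_right(g, x) / len(g)
        fi = bisect.bisect_right(imp, x) / len(imp)
        d = max(d, abs(fg - fi))
    return d
```

Fully separated genuine and impostor scores give D = 1, identical distributions give D = 0, so a quality set whose samples are truly of better quality should show a larger D.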

Conclusion and perspectives
The quality assessment of biometric raw data is a key factor to take into account during the enrollment step of biometric systems. Such information may be used to enhance the overall performance of biometric systems, as well as in fusion approaches. However, few works exist on this topic compared with those on performance. Towards contributing to this research area, we have presented an image-based quality assessment metric for biometric raw data that uses two types of information (image quality and pattern-based quality). The proposed metric is independent of the matching system used and can be applied to several kinds of modalities. Using six public biometric databases (face, fingerprint and hand veins), we have shown its efficiency in detecting three kinds of synthetic alterations (blurring, Gaussian noise and resolution).
As perspectives for this work, we aim to add a quality criterion to detect luminance alteration, which is also considered an important alteration affecting biometric systems (mainly facial recognition systems). We also aim to compare the proposed metric with NFIQ using other kinds of biometric matching algorithms (such as BOZORTH3 [45] proposed by the NIST). In addition, we plan to test the efficiency of the presented method on images combining the presented alterations, which represent another kind of real-life alteration. This can be done using the presented criteria and an SVM or a genetic algorithm in order to produce an index between 0% and 100% (i.e., the closer the index is to 100%, the better the quality). Modality-specific alterations could also be used for a more precise analysis of the efficiency of the proposed methodology.

Terms and definitions
Enrollment: The process of collecting biometric samples from a person and the subsequent preparation and storage of biometric reference templates representing that person's identity.
False Acceptance Rate (FAR): Rate at which an impostor is accepted by an authentication system.