- Open Access
Complementary feature sets for optimal face recognition
EURASIP Journal on Image and Video Processing volume 2014, Article number: 35 (2014)
In face recognition tasks, a single kind of feature set is not adequate to produce superior results; thus, the selection and combination of complementary features are crucial steps. In this paper, the fusion of two useful descriptors, i.e., the Zernike moments (ZMs) and the local binary pattern (LBP)/local ternary pattern (LTP), is proposed. The ZM descriptor has good global image representation capabilities and is invariant to image rotation and robust to noise, while the LBP/LTP descriptors capture the fine details within local regions of the face image and are insensitive to illumination variations. The fusion of the two is observed to incorporate the traits of both individual descriptors. In this work, the performance of diverse feature sets of ZMs (i.e., magnitude features, magnitude plus phase features, and real plus imaginary component features) combined with the LBP/LTP descriptor is analyzed on the FERET, Yale, and ORL face databases. The recognition rates achieved by the proposed method are approximately 10 to 30% higher than those obtained with these descriptors separately. The recognition rates of the proposed method are also significantly better (i.e., by 8 to 24%) in the case of a single training image per person.
In recent times, face recognition has become one of the most widely used biometric techniques, with a number of real-world applications such as human-computer interaction, surveillance, authentication, computer vision, and computer user interfaces. An automatic face recognition system ascertains a person's identity on the basis of his/her physiological characteristics. The sensitivity of available classifiers to different kinds of variations, such as illumination, facial expression, occlusion, pose, and aging, is among the most challenging problems that researchers face.
In order to improve existing face recognition techniques, the discriminative power of the invariant features selected to represent the face images should be high, because classification is subsequently performed on the basis of these features only. In the literature, the approaches used to represent face images are broadly classified into two categories, namely, global feature extraction approaches and local feature extraction approaches. Global feature extraction approaches are based on statistical methods, wherein features are extracted from the whole face image. In this category, subspace-based methods, namely, principal component analysis (PCA), Fisher linear discriminant (FLD), two-dimensional PCA (2DPCA), and two-directional two-dimensional PCA (2D2PCA) [3–7], are some of the popular and most frequently employed techniques. Moment invariants, such as Hu's seven moment invariants, and orthogonal rotation invariant moments, such as Zernike moments (ZMs), pseudo-Zernike moments (PZMs), and orthogonal Fourier-Mellin moments (OFMMs), are observed to be very effective in global image description and recognition, and MPEG-7 uses some of them as region-based shape descriptors for image retrieval. The magnitude of these moments is invariant to image rotation, and after suitable geometric normalization, it also becomes invariant to translation and scale [10, 11].
Local feature extraction approaches capture fine information within specific parts of face images, such as the eyes, nose, and mouth. Recently, a lot of work has been done on these methods because local features are known to be robust against illumination, occlusion, expression, and noise variations. Local feature extraction approaches are classified into two categories, i.e., sparse descriptors and dense descriptors. Sparse descriptors first divide a face image into patches and then determine their invariant features. A prominent descriptor in this category is the scale-invariant feature transform (SIFT) introduced by Lowe, which has the useful property of being invariant to scale and rotation. Soyel et al. used the discriminative SIFT (D-SIFT) approach for optimal facial expression recognition, but this method is somewhat susceptible to illumination variation. In face recognition technology, the Gabor wavelet is one of the most frequently used and successful local image descriptors. It incorporates the characteristics of both the spatial and frequency domains. The local features extracted by Gabor filters are invariant to scale and orientation and are able to detect edges and lines in face images. The main difficulty with Gabor filters is their high computational complexity. Among dense descriptors, the local binary pattern (LBP) is one of the most widely used approaches due to its invariance to monotonic gray-level changes and the ease with which it extracts local features. Apart from texture analysis, it has provided excellent results in many areas of image processing and computer vision, including its wide use in face recognition [15, 16]. Several variants of LBP are available in the literature to represent face images with compact feature sets. Such variants also improve the classification performance of the basic LBP approach [17, 18].
In complex applications like face recognition, it is observed that one kind of feature set is not rich enough to capture the entire face information. Thus, finding and combining complementary feature sets has become an active research topic in recent years. Specifically, global features are related to the holistic characteristics of the face, whereas local features describe the finer details within face images; it is therefore logical to combine both feature sets, since the information they convey belongs to different attributes of the face images. In recent times, many researchers have developed face recognition algorithms that combine multiple feature sets. Kim et al. have proposed a combined subspace-based approach for face recognition using both global and local features obtained by applying a linear discriminant analysis (LDA)-based method. Zhou and Yang have proposed the fusing feature Fisher classifier (F3C) approach, where the face image is first divided into smaller subimages and then the discrete cosine transform (DCT) is applied to the whole image and to some subimages to extract the holistic and local facial features. After these DCT-based holistic and local facial features are concatenated, the enhanced Fisher linear discriminant model (EFM) is employed to generate a low-dimensional feature vector. Similarly, local and global information extracted using DCT coefficients, along with a Fisher classifier developed for high-dimensional multiclass problems, has been proposed. Singh et al. proposed a robust two-stage face recognition approach by fusing global ZMs with local features based on the Weber law descriptor (WLD). The usefulness of combining global and local facial features has also been demonstrated by a hierarchical ensemble of global and local features. In this technique, the 2D Fourier transform is used to extract the global features and the Gabor wavelet is used to extract the local features.
Equal weights are then assigned to the global and local features when combining the outputs of the two classifiers, although the authors acknowledge that the contributions of global and local features differ. Wong et al. have proposed the dual optimal multiband feature (DOMF) method for face recognition, in which the wavelet packet transform (WPT) decomposes the image into frequency subbands and a multiband feature fusion technique is used to select optimal multiband feature sets that are invariant to illumination and facial expression. In this method, parallel radial basis function (RBF) neural networks are used to classify the two sets of features. The decision scores are then combined and processed by an adaptive fusion mechanism. The use of steerable pyramid decomposition (S-P transform) in both global and local appearance, together with feature/score fusion, has also been analyzed. In this work, each face image is described by a subset of band-filtered images containing steerable pyramid coefficients. These S-P subbands are divided into small subblocks to extract compact and meaningful feature vectors that provide a better representation of the class information. Recently, Liu and Liu have proposed a face recognition approach that fuses color, local spatial, and global frequency information. This method combines multiple features of face images derived from LBP, DCT, a hybrid color space, and the Gabor image representation. The combination of Gabor and LBP enhances the power of the spatial histogram, which is highly insensitive to appearance variations. This method has proven to be robust against illumination, pose, and expression variations [27, 28]. However, the combination of Gabor and multiresolution LBP descriptors requires significantly more computation time.
Although much research has been devoted to combining multiple feature sets, the selection of complementary feature sets for fusion and the techniques for combining these divergent feature sets remain a challenge. In view of this, in this paper, a fusion of two complementary feature sets is proposed. The global information of the face images is extracted by the ZM descriptor, exploiting its rotation invariance, while the LBP/LTP descriptor captures the significant local information. Among various global shape descriptors, ZMs are observed to be one of the best because of their many attractive characteristics: minimum information redundancy, rotation invariance of their magnitude, robustness to noise, etc. Head movement has also been estimated from the phase coefficients of the ZMs of the original and rotated images to generate a set of features that is significantly tolerant to pose variation. The magnitude features of ZMs obtained at higher orders of moments are observed to be invariant to expression variation. On the other hand, the LBP descriptor is observed to be relatively insensitive to illumination changes. It is also computationally efficient and quite simple to implement. A useful extension of this approach, the local ternary pattern (LTP), has recently been introduced; it is observed to be more discriminative and less sensitive to image noise in near-uniform regions than LBP. In particular, combining feature sets that are invariant to global variations with those that are invariant to local changes of face images is expected to be an effective way to achieve an optimal face recognition system. As discussed earlier, the information conveyed by the ZM and LBP/LTP descriptors is distinct and belongs to different aspects of a facial image. The fused representation is therefore expected to inherit the useful characteristics of both descriptors.
One of the critical issues in the fusion process is the time spent computing the combined features. A timing analysis of the proposed approach shows that the total time required for the recognition process is very small and can be afforded by PCs and other low-computation devices.
The ZM descriptor provides three different sets of features, namely, magnitude features, combined magnitude and phase features [31, 32], and modified real and imaginary component features. In this study, these feature sets of ZMs are referred to as ZMmag, ZMmagPhase, and ZMcomponent, respectively. The performance of the feature sets of ZMs combined with the LBP descriptor is also compared with that of the ZMs coupled with the LTP descriptor. Consequently, the proposed fusion of the diverse feature sets of ZMs and the LBP/LTP descriptors yields six combined approaches: ZMmagLBP, ZMmagPhaseLBP, ZMcomponentLBP, ZMmagLTP, ZMmagPhaseLTP, and ZMcomponentLTP. In order to compare the performance of these combined approaches with that of the individual ZM and LBP/LTP approaches, exhaustive experiments are performed on three prominent face databases, namely, FERET, Yale, and ORL, under pose, illumination, expression, and noise variations. The results show that the recognition rates of the combined approaches are significantly better than those of their individual counterparts, by 10 to 30%. Experimental results also demonstrate the efficacy of the proposed methods over other existing works. A significant improvement in recognition rate is achieved in the case of a single training image per person.
The rest of the paper is organized as follows: Section 2 presents a brief overview of the ZM approach and the diverse feature sets obtained from it and includes a brief introduction to the LBP/LTP approaches, Section 3 describes the similarity measures used to evaluate the matching score of these methods, the procedure involved in the proposed fusion of the ZM and the LBP/LTP descriptors is described in Section 4, the experiments and results obtained are presented in Section 5, and the conclusions and future directions are presented in Section 6.
2 Baseline image descriptors
2.1 Global image descriptor
2.1.1 Zernike moments
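The Zernike functions constitute a set of orthogonal basis functions defined over the unit circle. The Zernike moments of a function $f(x, y)$ are obtained by projecting it onto these functions. The ZMs of order $n$ and repetition $m$ are defined by

$$Z_{nm} = \frac{n+1}{\pi} \iint_{x^2 + y^2 \le 1} f(x, y)\, V_{nm}^{*}(x, y)\, dx\, dy$$

where $n \ge 0$, $|m| \le n$, $n - |m|$ is even, and $V_{nm}^{*}(x, y)$ is the complex conjugate of the Zernike function $V_{nm}(x, y)$, where

$$V_{nm}(x, y) = R_{nm}(r)\, e^{jm\theta}$$

with $r = \sqrt{x^2 + y^2}$, $\theta = \tan^{-1}(y/x)$, $\theta \in [0, 2\pi]$, and

$$R_{nm}(r) = \sum_{k=0}^{(n-|m|)/2} \frac{(-1)^k\,(n-k)!}{k!\,\left(\frac{n+|m|}{2}-k\right)!\,\left(\frac{n-|m|}{2}-k\right)!}\, r^{n-2k}$$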
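The following minimal NumPy sketch makes the ZM definition concrete. It computes the radial polynomial and a zeroth-order (pixel-sum) approximation of $Z_{nm}$. It is not the authors' implementation. The following are assumptions made for illustration:

- the square N × N image is mapped onto the inscribed unit disk through pixel centres;
- pixels outside the disk are discarded;
- the names `radial_poly` and `zernike_moment` are hypothetical.

```python
import numpy as np
from math import factorial

def radial_poly(n, m, r):
    """Zernike radial polynomial R_nm(r); assumes |m| <= n and n - |m| even."""
    m = abs(m)
    out = np.zeros_like(r, dtype=float)
    for k in range((n - m) // 2 + 1):
        coef = ((-1) ** k * factorial(n - k)
                / (factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k)))
        out += coef * r ** (n - 2 * k)
    return out

def zernike_moment(img, n, m):
    """Pixel-sum approximation of Z_nm for a square image mapped onto the unit disk."""
    N = img.shape[0]
    # Pixel-centre coordinates in (-1, 1): an assumed inner-disk mapping.
    c = (2 * np.arange(N) + 1 - N) / N
    x, y = np.meshgrid(c, c)           # x varies along columns, y along rows
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    inside = r <= 1.0                  # only pixels inside the unit disk contribute
    v_conj = radial_poly(n, m, r) * np.exp(-1j * m * theta)   # V*_nm(x, y)
    dA = (2.0 / N) ** 2                # area of one pixel in disk coordinates
    return (n + 1) / np.pi * np.sum(img[inside] * v_conj[inside]) * dA
```

A 90° rotation maps this pixel grid onto itself. As a result, `abs(zernike_moment(np.rot90(img), n, m))` matches `abs(zernike_moment(img, n, m))` up to rounding, which illustrates the rotation invariance of the ZM magnitude.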
2.1.2 Diverse feature sets of ZMs and related work
Since the magnitude of ZMs is invariant to rotation, it is commonly used as an invariant image descriptor in many image analysis and pattern recognition applications, while the phase component of ZMs is usually ignored. However, the phase component is observed to carry information as significant as that of the magnitude component. Therefore, in recent years, considerable research has been carried out to incorporate ZM phase coefficients along with their magnitudes as invariant feature descriptors. At present, there are two approaches to realize this objective.

In the first approach, developed by Revaud et al., a similarity measure incorporating both the magnitude and phase coefficients of the query and database images is used. The method provides excellent pattern matching performance, but at the cost of increased computation time.

In the second approach, the rotation angle between a query image and a database image is estimated, assuming that the query image is a rotated version of the database image. The estimated rotation angle is then used to cancel the effect of rotation so that the phases of the query and database images can be compared.

Recently, we devised a novel way to correct the phase coefficients without estimating the rotation angle, and the method was applied successfully in face recognition. The method works as follows. Suppose $Z_{nm}$ and $Z'_{nm}$ are the ZMs of the database and query images, respectively, and $\phi_{nm}$ and $\phi'_{nm}$ are their respective phase angles. The ZMs of the query image are corrected by evaluating $\hat{Z}'_{nm} = Z'_{nm}\, e^{j(\phi_{nm} - \phi'_{nm})}$. If the two images are the same, then $\hat{Z}'_{nm} = Z_{nm}$; otherwise, $\hat{Z}'_{nm} \ne Z_{nm}$. Therefore, the real and imaginary components of the ZMs of the query and database images can be compared separately, instead of comparing only their magnitudes. An attractive advantage of this approach is that, by using two-component feature vectors, the number of features is almost doubled compared with the ZM magnitude-only features for the same order of moments.
This approach has the additional advantages of low computation cost, lower susceptibility to image noise, and numerical stability, in addition to providing a better recognition rate. Throughout the paper, the features of ZMs based on the magnitude, the magnitude together with the corrected phase, and the corrected real and imaginary parts are referred to as ZMmag, ZMmagPhase, and ZMcomponent, respectively.
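The sketch below illustrates two points made in this subsection, under stated assumptions.

First, rotating an image by an angle α multiplies each moment by $e^{-jm\alpha}$ (the sign depends on the rotation-direction convention). Magnitudes are therefore preserved, while phases, and hence real and imaginary parts, change. This is why the phase must be corrected before a component-wise comparison.

Second, the sketch counts real-valued features for moments with $0 \le m \le n \le n_{max}$ and $n - m$ even. Because $Z_{n0}$ is real for a real image, the component representation is slightly less than twice as long as the magnitude representation.

Whether particular moments (for example $Z_{00}$ or $Z_{11}$) are excluded is not stated in this section, so the counts are illustrative. The helper names are hypothetical.

```python
import numpy as np

def moment_indices(n_max):
    """(n, m) pairs with 0 <= m <= n <= n_max and n - m even.
    Negative m is skipped: for a real image, Z_{n,-m} is the conjugate of Z_{nm}."""
    return [(n, m) for n in range(n_max + 1)
            for m in range(n + 1) if (n - m) % 2 == 0]

def feature_counts(n_max):
    """Lengths of the magnitude and the real/imaginary (component) feature vectors."""
    idx = moment_indices(n_max)
    n_mag = len(idx)                                   # one magnitude per moment
    n_comp = sum(1 if m == 0 else 2 for _, m in idx)   # Z_n0 is real: no imaginary part
    return n_mag, n_comp

def rotate_moments(z, m, alpha):
    """Moments of the image rotated by alpha radians: Z'_nm = Z_nm * exp(-j m alpha)."""
    return np.asarray(z) * np.exp(-1j * np.asarray(m) * alpha)
```

For example, `feature_counts(9)` returns `(30, 55)`: 30 magnitude features versus 55 component features.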
2.2 Local image descriptor
2.2.1 Local binary pattern
Ojala et al. introduced local binary patterns (LBP) for effective texture description, and LBP has since been used in many image processing and computer vision applications. The most important property of this approach is its tolerance to illumination variation, and its computational simplicity gives it a significant advantage over other approaches. The LBP operator takes a specific neighborhood around each pixel and thresholds the values of the neighborhood pixels against the central pixel's value; the resulting binary pattern is used as an element of the local image descriptor. Thus, it assigns a label to every pixel $p_c$ of an image by thresholding the values $p_i$ of its 3 × 3 neighborhood against $p_c$ and expressing the result as an 8-bit binary code. The LBP operator is computed as

$$\mathrm{LBP}(p_c) = \sum_{i=0}^{7} b(p_i - p_c)\, 2^{i}$$

$$b(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (7)$$

where $i$ runs over the eight neighbors of the central pixel. For 8-bit patterns, Ojala et al. observed that, of the $2^8 = 256$ possible patterns, only 58 uniform patterns (those with at most two 0/1 transitions in the circular bit string) account for approximately 90% of the image neighborhoods, while the remaining patterns mostly consist of noise. This property reduces the number of LBP histogram bins from 256 to 59, with all the non-uniform patterns assigned to a single bin, the 59th bin.
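The following minimal NumPy sketch shows the basic 3 × 3 LBP operator and the 59-bin uniform-pattern histogram described above. The neighbour ordering, the skipping of border pixels, and all function names are illustrative assumptions. A different fixed circular ordering would only permute the bin labels.

```python
import numpy as np

# Neighbours visited clockwise from the top-left corner; bit i <-> neighbour i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img):
    """Basic LBP: bit i is b(p_i - p_c) = 1 if p_i >= p_c. Border pixels are skipped."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for i, (dy, dx) in enumerate(OFFSETS):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(np.int32) << i
    return codes

def transitions(code):
    """Number of 0/1 transitions in the circular 8-bit pattern."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

def uniform_lut():
    """Map the 256 codes to 59 labels: uniform codes get 0..57, all others share 58."""
    lut = np.full(256, 58, dtype=np.int32)
    label = 0
    for code in range(256):
        if transitions(code) <= 2:     # uniform: at most two transitions
            lut[code] = label
            label += 1
    return lut

def lbp_histogram(img):
    """59-bin uniform-LBP histogram of an image or patch."""
    labels = uniform_lut()[lbp_codes(img)]
    return np.bincount(labels.ravel(), minlength=59)
```

`uniform_lut()` assigns labels 0 to 57 to the 58 uniform codes and label 58 to the remaining 198 codes.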
2.2.2 Local ternary patterns
The local histogram features obtained from LBP have proven to be highly discriminative in face recognition [16, 17]. However, they are sensitive to noise because they are thresholded exactly at the value of the central pixel. The problem is most visible in near-uniform, smooth regions of face images such as the cheeks or forehead. Tan et al. recently provided an important extension of the original LBP that generates a three-valued code for each image pixel. In their method, the binary LBP code is replaced with the ternary LTP code:

- gray values in a zone of width ±w around the central pixel $p_c$ are quantized to 0;
- values above this zone are quantized to +1;
- values below it are quantized to −1.

Specifically, the function $b(x)$ given in Equation 7 is replaced with the following three-valued function:

$$b(x) = \begin{cases} +1, & x \ge w \\ 0, & |x| < w \\ -1, & x \le -w \end{cases}$$

where $w$ is a user-defined threshold. The LTP code is more resistant to image noise but is no longer strictly invariant to gray-level transformations. The concept of uniform patterns used to obtain histogram features also applies to LTP. For simplicity, each three-valued LTP code is split into its positive and negative halves. These generate two sets of histogram features: one corresponding to the positive patterns and the other to the negative patterns.
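The positive/negative split can be sketched as follows, assuming a 3 × 3 neighbourhood with a fixed clockwise neighbour ordering (an illustrative choice). `ltp_split_codes` is a hypothetical name. Each returned array is an ordinary 8-bit code, so each half can be mapped to its own 59-bin uniform-pattern histogram. The two histograms together form the LTP descriptor.

```python
import numpy as np

# Neighbours visited clockwise from the top-left corner; bit i <-> neighbour i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def ltp_split_codes(img, w):
    """Ternary coding with threshold w, returned as two binary codes per interior pixel.

    upper: bit i set where p_i - p_c >= w   (ternary value +1)
    lower: bit i set where p_i - p_c <= -w  (ternary value -1)
    Neighbours with |p_i - p_c| < w (ternary value 0) set no bit in either code.
    """
    img = np.asarray(img, dtype=np.int32)
    rows, cols = img.shape
    center = img[1:rows - 1, 1:cols - 1]
    upper = np.zeros_like(center)
    lower = np.zeros_like(center)
    for i, (dy, dx) in enumerate(OFFSETS):
        diff = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx] - center
        upper |= (diff >= w).astype(np.int32) << i
        lower |= (diff <= -w).astype(np.int32) << i
    return upper, lower

patch = np.array([[110, 102, 90],
                  [105, 100, 100],
                  [99, 80, 120]])
upper, lower = ltp_split_codes(patch, w=5)
print(upper[0, 0], lower[0, 0])   # 145 36
```

In this example, the top-left (+10), bottom-right (+20), and left (+5) neighbours set bits 0, 4, and 7 of the upper code. The top-right (−10) and bottom (−20) neighbours set bits 2 and 5 of the lower code.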
3 Similarity measures used
In this section, the similarity measures used to compute the matching scores of the ZM and LBP/LTP descriptors are briefly discussed. In this work, it is observed that fusing two sets of matching scores yields superior performance: those obtained by applying the L2-norm/L1-norm to the ZM descriptor, and those obtained by applying histogram intersection to the LBP/LTP descriptor. Hence, these different similarity measures are applied to the feature sets generated by the respective descriptors. Since the matching scores obtained from these different approaches are heterogeneous, normalization is required to map them to a common range before they are combined.
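The normalization scheme is not named in this section. Min-max normalization is one common choice and is assumed in this sketch; `min_max_normalize` is a hypothetical helper. It rescales one query's scores against all gallery images to [0, 1].

```python
import numpy as np

def min_max_normalize(scores):
    """Map raw matching scores to [0, 1] with min-max scaling (assumed scheme)."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:                 # identical scores carry no ranking information
        return np.zeros_like(s)
    return (s - lo) / (hi - lo)
```

For example, `min_max_normalize([2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`.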
3.1 Similarity measure for ZM descriptor
The magnitude features of ZMs (ZMmag) of two images are compared by evaluating the normalized Euclidean distance (L2-norm) between them. The normalized L2-norm between the two ZM feature vectors is given by
where $Z'_i$ and $Z_i$ are the $i$th elements of the feature vectors of the query and database images, respectively, and $L$ is the size of the feature vector consisting of the magnitudes of ZMs. The normalized Euclidean distance $d_{Phase}$ between the ZM phases of the query and database images is computed as
where φ is the phase angle of the database image and φ′ is the phase angle of the query image, obtained after estimating the rotation angle between the query and database images and correcting the phase. The total distance $d_{magPhase}$ between the ZMmagPhase feature vectors combines the distances $d_{mag}$ and $d_{Phase}$, computed as per Equations 9 and 10, respectively:

$$d_{magPhase} = w_1\, d_{mag} + w_2\, d_{Phase} \qquad (11)$$
Normally, equal weights are assigned to simplify this process, i.e., w1 = w2 = 0.5.
The ZMcomponent descriptor uses the modified real and imaginary parts of ZMs to form a two-component feature vector for each ZM. The normalized L1-norm-based distance measure used to evaluate the similarity between the component features of the database and query images is given by
This distance metric, $d_{comp}$, has been shown to be a better similarity measure between two sets of component feature vectors.
3.2 Similarity measure for LBP/LTP descriptor
The histogram intersection distance has been used to compare the feature vectors of query and database images for both the LBP and the LTP descriptors. The histogram intersection distance evaluated for every bin n between the database image and the query image is given as
where $H_d$ and $H_q$ are the histograms of LBP/LTP features of the database and query images, respectively, and $B$ is the total number of bins in the histograms. If the database and query images are identical, then the value of $D_h(H_d, H_q)$ is 1.
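The sketch below computes histogram intersection in one common normalized form, assumed here for illustration. Each histogram is scaled to unit sum and the bin-wise minima are summed, so identical histograms give 1 and histograms with no overlap give 0. `histogram_intersection` is a hypothetical name.

```python
import numpy as np

def histogram_intersection(h_d, h_q):
    """Sum over bins of min(H_d(n), H_q(n)) after scaling both histograms to unit sum.
    The unit-sum normalization is an assumption that makes identical histograms score 1."""
    h_d = np.asarray(h_d, dtype=float)
    h_q = np.asarray(h_q, dtype=float)
    h_d = h_d / h_d.sum()
    h_q = h_q / h_q.sum()
    return float(np.minimum(h_d, h_q).sum())
```

For example, `histogram_intersection([2, 2], [1, 3])` compares `[0.5, 0.5]` with `[0.25, 0.75]` and returns `0.75`.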
4 Fusion of ZM and LBP/LTP descriptors
The ZM descriptor and the LBP/LTP descriptors are observed to be complementary, and their fusion is expected to discriminate face images even in the presence of diverse variations. The ZM descriptor is observed to extract the global information of images more effectively than other global descriptors. On the other hand, the LBP and LTP descriptors are established methods for representing the finer interior details of face images. The feature set formed by fusing these independent approaches, i.e., ZMs and LBP/LTP, is expected to inherit the invariant characteristics of both. Exhaustive experiments performed under pose, illumination, expression, and noise variations on suitable databases confirm this hypothesis.
The procedure followed to recognize face images with the proposed combined approaches, i.e., ZMmagLBP, ZMmagPhaseLBP, ZMcomponentLBP, ZMmagLTP, ZMmagPhaseLTP, and ZMcomponentLTP, is briefly described in Figure 1. Recognition through the proposed fusion of feature sets involves three stages: feature extraction, similarity score fusion, and classification. The first stage creates the invariant feature sets using the ZM and LBP/LTP descriptors. The second stage fuses the matching scores obtained from these feature sets after applying the similarity measures described in the previous section.

Multiple feature sets can be combined at the feature extraction level, the matching score level, or the decision level. It is not easy to combine information at the feature level when the feature sets obtained by different techniques are either inaccessible or incompatible. Fusion at the decision level is too rigid, as only a limited amount of information is available at this level. Therefore, integration at the matching score level is generally preferred, because matching scores are easy to access and combine.

In the proposed work, feature vectors that provide complementary information are obtained by applying the ZM and LBP/LTP descriptors. Further, we observed that for the LBP/LTP descriptor, matching scores evaluated by the histogram intersection measure give better results than those obtained using the L2-norm. Hence, in this work, fusion is performed at the matching score level. Histogram intersection (Equation 13) is used to evaluate the matching scores of the LBP/LTP feature vectors. For the ZM feature vectors, the L2-norm (Equations 9 and 11, for ZMmag and ZMmagPhase) or the L1-norm (Equation 12, for ZMcomponent) is used.
These individual matching scores are then combined using the sum rule to generate a single scalar score, which is used to make the final decision. The sum rule used to fuse the individual matching scores is given by

$$S = S_Z + S_L$$
where $S_Z$ and $S_L$ are the matching scores of the ZM descriptor and the LBP/LTP descriptor, respectively. The matching scores of these approaches are normalized before fusion. Normalization maps the matching scores obtained from the different frameworks to a common range so that they can be easily combined. Before combining, $S_L$ is subtracted from 1, so that, as with the ZM distances, lower values of the histogram intersection score indicate higher similarity. Finally, in the third stage, the nearest neighbor rule is used to perform classification. For each query, this rule returns exactly one recognized image, which is labeled either correct or incorrect to evaluate recognition performance. The recognition rate (in percent) is measured using the following formula:

$$R = \frac{N_{test} - N_f}{N_{test}} \times 100$$

where $N_{test}$ is the total number of images in the test set and $N_f$ is the number of images recognized incorrectly.
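The following sketch combines the second and third stages to score one query against a gallery. It assumes min-max normalization over the gallery for each query, since the normalization method is not specified in this section. It then flips the normalized histogram intersection to $1 - S_L$ as described, adds the two scores with the sum rule, and applies the nearest neighbor rule. All function names are hypothetical.

```python
import numpy as np

def min_max(scores):
    """Rescale one query's scores over the gallery to [0, 1] (assumed normalization)."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span

def fused_match(d_zm, sim_lbp):
    """Nearest-neighbour match using the sum rule S = S_Z + (1 - S_L).

    d_zm:    ZM distances to each gallery image (lower = more similar)
    sim_lbp: LBP/LTP histogram intersections (higher = more similar)
    """
    s_z = min_max(d_zm)
    s_l = min_max(sim_lbp)
    fused = s_z + (1.0 - s_l)      # flip S_L so that lower means more similar for both
    return int(np.argmin(fused))   # index of the single recognized gallery image

def recognition_rate(n_test, n_false):
    """Percentage of test images recognized correctly."""
    return 100.0 * (n_test - n_false) / n_test

# ZM distances alone favour gallery image 0, but the fused score favours image 1.
print(fused_match([0.1, 0.2, 0.9], [0.2, 0.9, 0.1]))   # 1
print(recognition_rate(150, 15))                        # 90.0
```

In the example, the fused scores are 0.875, 0.125, and 2.0. The strong LBP/LTP similarity of gallery image 1 therefore outweighs its slightly larger ZM distance.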
5 Experiments and results
In order to evaluate the performance of the individual approaches in comparison with that of the proposed combined approaches, experiments are performed on three well-known and calibrated face databases:

- the FERET face database, consisting of images with diverse variations;
- the Yale face database, consisting of illumination and expression variations;
- the ORL face database, which has small pose (tilt/yaw) changes.

It is well known that the accuracy of a face recognition system is significantly affected by the kinds of variations present in the face database images, as well as by the number of images of each subject (i.e., person) in the training set. Thus, exhaustive experiments are performed in a comprehensive and deterministic manner with respect to the different types of variations present in these databases. The number of training images per person is also varied to observe its effect on recognition accuracy. The best results are highlighted in italics. All the experiments are performed in Visual C++ 6.0 under Microsoft Windows on a PC with a 3.0-GHz CPU and 3 GB of RAM.
5.1 Performance on FERET database
The FERET grayscale face database has become one of the most popular standard databases in the field of face recognition. We have performed experiments on two subsets of this database that contain frontal to profile pose variations:

- The first subset, called FERET_A, is formed by randomly selecting 100 persons with seven different poses (yaw): 0°, ±22.5°, ±67.5°, and ±90°. It consists of 700 images.
- The second subset, named FERET_B, consists of FERET ‘b’ category images of 200 persons under different illumination, expression, and pose angles of 0°, ±15°, ±25°, ±40°, and ±60°. It contains 2,200 images.

The FERET evaluation protocol partitions the database into a gallery (1,196 images of 1,196 persons) and four probe sets, namely, fafb, fc, dup I, and dup II. The images in the fafb set contain facial expression variations, the fc set contains images with illumination variations, and the images in the dup I and dup II sets represent aging effects. For detailed experimentation against pose, expression, and illumination variations, various data partitions are generated for these subsets, as described in Table 1.

The original images of this database are 256 × 384 pixels. We resized these images to 128 × 128 pixels to reduce the time taken to conduct the experiments, while the face images of the FERET_Gallery/Probe subset are cropped and resized to 64 × 64 pixels. Some sample images of one person from this database are shown in Figure 2. To extract the local LBP/LTP features, the 128 × 128 face images of the FERET_A and FERET_B subsets are partitioned into 64 patches of 16 × 16 pixels. The 64 × 64 images of the FERET_Gallery/Probe subset are partitioned into 64 patches of 8 × 8 pixels. The global ZM features are extracted from the whole face images.
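The patch arithmetic can be checked with a short sketch. Splitting a 128 × 128 image into 16 × 16 blocks, or a 64 × 64 image into 8 × 8 blocks, yields 64 patches either way.

Suppose, as is common for LBP-based face descriptors, that one 59-bin uniform histogram is computed per patch and the histograms are concatenated. This is an assumption, because the concatenation step is not spelled out in this section. Under it, the LBP descriptor has 64 × 59 = 3,776 bins, and the two-part LTP descriptor has twice that. `split_into_patches` and `descriptor_length` are hypothetical helpers.

```python
import numpy as np

def split_into_patches(img, patch):
    """Split an H x W image into non-overlapping patch x patch blocks in row-major
    block order; H and W are assumed to be multiples of patch."""
    h, w = img.shape
    return (img.reshape(h // patch, patch, w // patch, patch)
               .swapaxes(1, 2)
               .reshape(-1, patch, patch))

def descriptor_length(n_patches, bins_per_patch=59, parts=1):
    """Length of a concatenated per-patch histogram descriptor (parts=2 for LTP halves)."""
    return n_patches * bins_per_patch * parts

print(split_into_patches(np.zeros((128, 128)), 16).shape)   # (64, 16, 16)
print(descriptor_length(64), descriptor_length(64, parts=2))  # 3776 7552
```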
To analyze the performance of the proposed combined approaches, the first set of comprehensive experiments is performed on the FERET_A1 category described in Table 1. The possible trials for this setup, containing various combinations of training and test sets, are shown in Table 2. The average recognition performance of the individual and combined approaches over these seven trials is presented in Figure 3 for different values of the maximum order of ZMs, denoted by nmax. Note that the value of nmax has no effect on the performance of the LBP and LTP descriptors, so the results presented for the LBP/LTP descriptors are the same for every value of nmax. The results show that, among the individual approaches, the ZMcomponent approach performs better than the others. However, the proposed combined approaches exhibit significantly higher recognition rates than their individual counterparts. In this experiment, the ZMmagPhaseLBP approach provides the highest recognition rate of 71.24%.
Next, experiments are performed on the FERET_A2 category. The recognition results are presented in Figure 4 for both the individual and the combined approaches over different orders of moments nmax used for the ZM descriptors. The basic LBP/LTP descriptors are not invariant to rotation; however, in this category, they perform better than the ZM descriptor. This is because higher pose angles occlude a significant portion of the face, and under this kind of distortion, local feature sets are observed to be more successful than global features. On the other hand, the proposed fusion of the ZM and LBP/LTP descriptors achieves approximately 20% improvement in recognition rate compared with these independent approaches. It is also noticed that the ZM descriptors coupled with the LBP descriptor generate superior results, and the highest recognition rate of 77.33% is achieved by the ZMmagPhaseLBP approach. Among the LTP-based combinations, ZMcomponentLTP provides better results than the others.
Further experiments are performed on the FERET_A3, FERET_A4, and FERET_A5 categories, and the recognition results for these setups are shown in Table 3. On this database, the fusion of the ZM features obtained for nmax = 9 provides better results; accordingly, all the remaining experiments on this database are conducted with this order of moments only. Table 3 shows an improvement in recognition rates of approximately 10 to 20% due to the fusion of the ZM and LBP/LTP descriptors. Further, the recognition rates decline significantly as the pose angle of the test images increases. For example, the highest recognition rate for FERET_A4, which contains pose variations of ±67.5° in the test images, is only 74.5%. This outcome is expected, because higher pose angles occlude a significant part of the face image. The highest recognition rate of 86.5% is achieved by the ZMmagPhaseLBP approach on the FERET_A3 category. On the FERET_A5 category, a recognition rate of 99.5% is achieved by both the ZMcomponentLBP and ZMcomponentLTP approaches. Similarly, the ZMmagPhaseLBP approach provides the highest recognition rate on the FERET_A4 category, which reveals the contribution of ZM phase coefficients to the improvement in recognition results.
For FERET_B, the recognition results obtained by performing the experiments on FERET_B1 and FERET_B2 categories are also shown in Table 3. It is clear from the experimental results that the combined approaches exhibit approximately 15 to 20% hike in the recognition rate than that obtained by the individual approaches. The highest recognition rate of 78.5 and 88.2% is achieved by the ZMmagPhaseLBP descriptor for FERET_B1 and FERET_B2 categories, respectively. The result obtained on FERET_Gallery/Probe set ascertains the robustness of the proposed system against changes in expression and lighting; however, further research in aging is required. In general, on FERET_B and FERET_Gallery/Probe subsets, the combination of ZMmagPhase with LBP descriptor generates higher results than others.
5.2 Performance on Yale database
The Yale face database contains 11 images for each of 15 individuals, i.e., 165 images in total. The images exhibit major variations in illumination and facial expression, and some show the eyes occluded by eyeglasses. The original images are 243 × 320 pixels with 256 gray levels; for the experiments, they are cropped to 64 × 64 pixels. Sample cropped images of one person are shown in Figure 5. As before, each face image is partitioned into 64 patches of 8 × 8 pixels to extract the local LBP/LTP features.
To examine the improvement brought by the proposed combined approaches under expression and illumination variations, exhaustive experiments are performed on this database with different numbers of images in the training and test sets. The data partitions generated for this purpose are presented in Table 4. The first set of experiments is performed on the YALE 1 category, where one of the 11 images of each person is placed in the training set and the remaining ten are placed in the test set. This process is repeated 11 times, each time with a different training image per person. The average recognition results over these 11 runs are presented in Figure 6 for different values of nmax.
From these results, the proposed combined descriptors achieve an average improvement of approximately 12% over the individual approaches. Among the individual approaches, both LBP and LTP perform better than the three ZM descriptors. This indicates that the local approaches capture the interior details of a face image more efficiently than the global ones, which makes them well suited to the case of a single exemplar image per person. Figure 6 also shows that, among the combined approaches, the ZMmagPhaseLBP descriptor achieves the highest recognition rates, and that fusion of the ZM features obtained for nmax = 11 provides the best results. Hence, all other experiments on this database are carried out for this order of moments.
Next, experiments are performed on the YALE 2.1, YALE 2.2, YALE 2.3, YALE 2.4, and YALE 2.5 categories. The average recognition results over ten trials for each group of training and test sets are presented in Table 5. Because the LBP and LTP descriptors are largely insensitive to changes in image intensity, these two approaches achieve considerably higher results than the ZM descriptors on this database. Consequently, the LBP/LTP feature sets contribute most of the improvement in the recognition rate of the proposed combined approaches, which is significantly higher than that of the individual approaches. In most cases, the LBP and LTP descriptors combined with the ZMmag features generate superior results. For example, the highest recognition rate on the YALE 2.4 category, 97.56%, is achieved by the ZMmagLBP approach, and on the YALE 2.3 category the highest average recognition rate, 97.14%, is also obtained with ZMmagLBP. Thus, in general, the proposed ZMmagLBP approach outperforms the others.
Thereafter, experiments are performed on the YALE 3.1 and YALE 3.2 categories to assess robustness against illumination variation. Similarly, to examine the performance of the proposed approaches under expression variation, experiments are carried out on the YALE 4.1 and YALE 4.2 categories. The results are presented in Table 6. The proposed combined approaches improve performance by approximately 30% over the ZM descriptors alone and by approximately 10% over the individual LBP/LTP descriptors. As noted earlier, among the individual approaches, the LBP/LTP descriptors perform better than the ZM descriptors. Between these two, the LBP descriptor yields higher recognition rates against illumination and expression variations on the YALE 3.1, YALE 3.2, and YALE 4.2 categories, while the LTP descriptor performs better only on the YALE 4.1 category (expression variation).
For illumination variation, i.e., on the YALE 3.1 category, the highest recognition rate of 91.67% is achieved by the proposed ZMmagLBP approach, whereas for expression variation the ZMmagPhaseLBP descriptor gives the highest recognition rate, 85.56%, on the YALE 4.1 category. In the YALE 3.2 category, all face images with expression variation are placed in the training set, and the remaining ones, i.e., one neutral image and four images with illumination changes, form the test set. The results in Table 6 show that ZMmagLBP and ZMcomponentLBP perform best here, both achieving a recognition rate of 97.33%. Similarly, in the YALE 4.2 category, the four images of each person with illumination variation form the training set, and the remaining ones (i.e., one neutral image and six images with varying expressions) form the test set; ZMmagPhaseLBP achieves a recognition rate of 98.89% on this category. From Table 6, it can therefore be concluded that ZMmagLBP is the most robust combination against illumination variation and ZMmagPhaseLBP the most robust against expression variation. Overall, ZMmagLBP and ZMmagPhaseLBP outperform the other combinations on the Yale database.
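The patch-based local feature extraction described above can be sketched as follows. The sketch assumes the basic 3 × 3 LBP operator (eight neighbours at radius 1, edge-replicated borders) and a 256-bin histogram per 8 × 8 patch; the authors may use a different LBP variant (e.g., uniform patterns with fewer bins), radius, or border handling, and the function names are hypothetical.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbours at radius 1, clockwise from the top-left.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic LBP code (0..255) of every pixel; borders use edge replication."""
    img = np.asarray(img, dtype=np.int32)
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.int32)
    for bit, (dy, dx) in enumerate(_OFFSETS):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= img).astype(np.int32) << bit
    return codes


def blockwise_lbp_histogram(img, block=8):
    """Concatenate the 256-bin LBP histograms of non-overlapping block x block patches."""
    codes = lbp_codes(img)
    h, w = codes.shape
    hists = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = codes[y:y + block, x:x + block]
            hists.append(np.bincount(patch.ravel(), minlength=256))
    return np.concatenate(hists)


face = np.random.default_rng(0).integers(0, 256, size=(64, 64))
features = blockwise_lbp_histogram(face)
print(features.shape)  # (16384,) = 64 patches x 256 bins
```

Keeping one histogram per patch, rather than one for the whole face, is what preserves the spatial layout of the local texture.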
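The YALE 1 protocol (one training image per person, rotated over all images, with the recognition rates averaged over the runs) can be sketched as follows. The classifier is not specified in this section, so the sketch assumes a nearest-neighbour rule with Euclidean distance on precomputed feature vectors; `single_sample_protocol` is a hypothetical name.

```python
import numpy as np


def single_sample_protocol(features):
    """Average 1-NN recognition rate (%) with one training image per subject.

    features: array of shape (n_subjects, n_images, d) of precomputed descriptors.
    Run k uses image k of every subject as the gallery and all other images as
    probes; for Yale (11 images per person) this gives 11 runs.
    """
    n_subjects, n_images, _ = features.shape
    rates = []
    for k in range(n_images):
        gallery = features[:, k, :]                 # one training image per person
        correct, total = 0, 0
        for s in range(n_subjects):
            for j in range(n_images):
                if j == k:
                    continue                        # the training image is not a probe
                dists = np.linalg.norm(gallery - features[s, j], axis=1)
                correct += int(np.argmin(dists) == s)
                total += 1
        rates.append(100.0 * correct / total)
    return float(np.mean(rates))
```

The same function covers the ORL 1 setup (ten runs for ten images per person) when given features of shape (40, 10, d).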
5.3 Performance on ORL database
The ORL face database consists of 400 images of size 112 × 92 pixels of 40 persons, with ten images per person under different variations. All face images are taken against a dark homogeneous background. They contain slight pose variations (tilt and yaw) of up to ±20° and some basic facial expressions (smiling/not smiling, open/closed eyes). For the experiments, the images are cropped to 64 × 64 pixels; sample cropped images of one person are shown in Figure 7. The face images are again partitioned into 64 patches of 8 × 8 pixels to extract the local LBP/LTP features. Detailed experiments are performed on this database to analyze the robustness of the proposed combined approaches against pose variation; the data partitions generated for this purpose are presented in Table 7.
First, experiments are performed on the ORL 1 category, where one image of each person forms the training set and the remaining nine form the test set. Since each person has ten images, ten different training/test splits are possible, and the average recognition results over these ten trials are shown in Figure 8a,b. Results are given for different values of nmax to analyze the effect of the maximum order of moments on the performance of the proposed combined approaches. Because the basic LBP and LTP descriptors used in this work are not rotation invariant, whereas the ZM descriptor is an established rotation-invariant scheme, the individual ZM descriptors perform better than the LBP/LTP descriptors on this database. Among the ZM-based descriptors, ZMcomponent and ZMmagPhase give the highest recognition rates owing to the inclusion of phase information. Fusing the invariant feature sets of the ZM and LBP/LTP descriptors, with the ZM descriptor contributing the rotation invariance, yields an improvement of more than 10%. The highest recognition rate, 81.22%, is achieved by the proposed ZMmagLTP approach. Figure 8a,b also shows that fusion of the ZM features obtained at nmax = 9 provides the best results on this database; accordingly, further experiments are conducted only for this order of moments.
Table 8 presents the average recognition results over ten different trials for each group of training and test sets (i.e., ORL 2.1, ORL 2.2, ORL 2.3, and ORL 2.4). The proposed combined approaches give excellent results, with ZMcomponentLTP providing the best. With five images in the training set and the remaining five in the test set (i.e., ORL 2.4, over ten runs), an average recognition rate of 99.2% is achieved by both the ZMmagLBP and ZMcomponentLTP approaches. ZMcomponent features have previously been shown to be invariant to image rotation and tolerant to pose variations to some extent. From this analysis, ZMcomponent combined with LTP, as well as ZMmag coupled with LBP, provides superior results against pose variations.
Next, experiments are performed on the ORL 3 category, with two neutral face images of each person in the training set and four images involving scale changes and up/down head movement in the test set. Similarly, to examine performance under yaw pose variation, two neutral images of each person are placed in the training set and four images with slight left/right head movement in the test set (the ORL 4 category). These results are also presented in Table 8. On ORL 3, ZMcomponent coupled with the LBP/LTP descriptors performs best, achieving a recognition rate of 90.0%. On ORL 4, the highest recognition rate of 91.25% is achieved by both the ZMmagLBP and ZMmagPhaseLBP approaches. Thus, in most of the ORL experiments, ZMcomponent combined with LBP/LTP outperforms the other proposed combinations.
5.4 Performance analysis against noise variation
To examine the effect of noise on recognition accuracy, impulsive noise, commonly called salt-and-pepper or spike noise, is added to the face images of the three databases. Impulsive noise produces dark pixels in bright regions and bright pixels in dark regions. In this analysis, noise with a density of 0.05 is added to the test images, whereas training is done on the original noise-free images. The experimental procedure is otherwise the same as before. For the FERET database, experiments are performed on the FERET_A3 data partition, in which one frontal image (0° pose) of each person is placed in the training set and four images in different poses (±22.5° and ±67.5°, with additive noise) are used in the test set. For the YALE 2.4 data partition, five images of each person are placed in the training set and the remaining six (with additive noise) in the test set, and the reported results are average recognition rates over ten runs of training and test sets. Similarly, for the ORL 2.4 data partition, five random images of each person form the training set and the remaining five (with additive noise) the test set, again averaged over ten runs. The results are shown in Table 9.
Among the individual approaches, the LTP descriptor is more robust to noise than LBP. On the Yale and ORL databases with noise, the proposed ZMmagLTP and ZMcomponentLTP descriptors, respectively, perform better than all other combined approaches. On the noisy FERET images, the recognition rate of ZMmagPhaseLBP is 81.5%, whereas that of ZMmagLTP is 81.0%. However, the drop relative to the noise-free results is 5.0 percentage points for ZMmagPhaseLBP and only 3.5 percentage points for ZMmagLTP. Hence, on the FERET database, the proposed ZMmagLTP descriptor is more robust to noise. On the Yale and ORL databases, the degradation of recognition rates due to noise is very small.
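A minimal sketch of the noise model, assuming that "noise of 0.05" denotes a noise density, i.e., the fraction of pixels replaced, with half of them set to black (0) and half to white (255). This mirrors the convention of MATLAB's `imnoise` with the `'salt & pepper'` option, but the tool actually used by the authors is not stated here, and `add_salt_and_pepper` is a hypothetical helper.

```python
import numpy as np


def add_salt_and_pepper(img, density=0.05, rng=None):
    """Return a noisy copy of an 8-bit image; the input image is left unchanged.

    About `density` of the pixels are replaced: those with u < density/2 become
    0 (pepper) and those with density/2 <= u < density become 255 (salt).
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.array(img, dtype=np.uint8, copy=True)
    u = rng.random(noisy.shape)
    noisy[u < density / 2] = 0
    noisy[(u >= density / 2) & (u < density)] = 255
    return noisy


# Only test images are corrupted; training images stay noise-free.
rng = np.random.default_rng(0)
test_face = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
noisy_face = add_salt_and_pepper(test_face, density=0.05, rng=rng)
```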
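One plausible reason for the greater noise tolerance of LTP is its dead zone: neighbours within ±t of the centre pixel are coded as 0, so small intensity fluctuations that flip LBP bits leave the LTP code unchanged (a full-range impulse, by contrast, still changes both codes). The sketch below compares the two codes on a single 3 × 3 neighbourhood, splitting LTP into upper and lower binary codes as in Tan and Triggs; the threshold t = 5 and the function names are assumptions.

```python
import numpy as np

# Row and column indices of the 8 neighbours, clockwise from the top-left.
_ROWS = [0, 0, 0, 1, 2, 2, 2, 1]
_COLS = [0, 1, 2, 2, 2, 1, 0, 0]


def lbp_code(patch):
    """Basic LBP code of the centre pixel of a 3x3 patch."""
    p = np.asarray(patch, dtype=np.int64)
    c = p[1, 1]
    return sum(int(g >= c) << i for i, g in enumerate(p[_ROWS, _COLS]))


def ltp_codes(patch, t=5):
    """LTP of the centre pixel as (upper, lower) binary codes.

    upper bit = 1 if neighbour >= centre + t; lower bit = 1 if neighbour <= centre - t.
    """
    p = np.asarray(patch, dtype=np.int64)
    c = p[1, 1]
    neighbours = p[_ROWS, _COLS]
    upper = sum(int(g >= c + t) << i for i, g in enumerate(neighbours))
    lower = sum(int(g <= c - t) << i for i, g in enumerate(neighbours))
    return upper, lower


clean = np.array([[100, 101, 99], [100, 100, 102], [98, 100, 101]])
noisy = clean + np.array([[-1, -1, 2], [1, 0, -2], [1, 1, -1]])  # perturbation of at most ±2
print(lbp_code(clean), lbp_code(noisy))    # 187 190
print(ltp_codes(clean), ltp_codes(noisy))  # (0, 0) (0, 0)
```

The ±2 perturbation changes the LBP code but not the LTP codes. A genuine edge, such as a right-hand column of 200s around a centre of 100, still produces non-zero LTP bits.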
5.5 Time complexity
One important issue with combined approaches such as the ones proposed here is their time complexity. Moment-based descriptors are commonly perceived as computationally intensive, which is true to some extent, especially for ZMs. Computing all ZMs up to a maximum order nmax for an image of N × N pixels by the direct method has a time complexity of order O(N² nmax³); with fast algorithms [10, 11], this is reduced to O(N² nmax²). A further significant reduction in computation time is achieved by exploiting the symmetry/antisymmetry properties of the ZM kernel function. The ZMs of the database images are computed offline and indexed with the images themselves, while the ZMs of a test image are computed online. Although ZM computation remains costly, better recognition results are obtained with nmax = 9 for the FERET and ORL databases and with nmax = 11 for the Yale database, so moments are computed only up to these orders. Since Z0,0 and Z1,1 have no discriminative capability, they do not affect the recognition rate and are discarded; hence, with nmax = 11, 40 features remain. In contrast, the LBP/LTP feature vectors, which concatenate local histograms, are long but very fast to compute. Thus, the proposed fusion of the ZM and LBP/LTP descriptors maintains a good balance between speed and dimensionality. The sizes of the ZM and LBP/LTP feature vectors are shown in Table 10.
For an image of 256 × 256 pixels, the elapsed CPU time for calculating ZMs up to nmax = 12 is only 0.032 s on a PC with a 3.0-GHz CPU and 1 GB of RAM running Microsoft Windows. Computing the LBP and LTP features takes 0.015 and 0.016 s, respectively, so the total time for extracting the local and global features of a test image does not exceed 0.048 s. Since this time does not depend on the image content, the measurements are made on a single image. The classification time is much smaller than the feature extraction time. Thus, relative to the gain in recognition performance, the additional time required by the combined features is small and can be afforded even by low-power devices in online mode.
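To make the feature count and the cost discussion concrete, the sketch below computes ZMs by the direct (slow) method that the fast algorithms [10, 11] improve upon, and counts the moments. It assumes that the square image is mapped onto the unit disk through pixel centres, keeps only m ≥ 0 (for a real image, Z_{n,-m} is the complex conjugate of Z_{n,m}), and approximates the integral by a sum over the pixels inside the disk. The function names are hypothetical.

```python
import numpy as np
from math import factorial


def zernike_indices(n_max):
    """All (n, m) with 0 <= m <= n <= n_max and n - m even."""
    return [(n, m) for n in range(n_max + 1) for m in range(n % 2, n + 1, 2)]


def radial_poly(n, m, rho):
    """Radial polynomial R_nm(rho) evaluated term by term (direct formula)."""
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R


def zernike_moments(img, n_max):
    """Direct-method ZMs Z_nm of a square N x N image mapped onto the unit disk."""
    img = np.asarray(img, dtype=float)
    N = img.shape[0]
    coords = (2 * np.arange(N) - N + 1) / N      # pixel centres in (-1, 1)
    x, y = np.meshgrid(coords, coords)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0
    dA = (2.0 / N) ** 2                          # area of one pixel
    Z = {}
    for n, m in zernike_indices(n_max):
        V_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
        Z[(n, m)] = (n + 1) / np.pi * np.sum(img[inside] * V_conj[inside]) * dA
    return Z


# 42 moments up to order 11; discarding Z(0,0) and Z(1,1) leaves 40 features.
print(len(zernike_indices(11)) - 2)  # 40
```

The nested loops over moments, polynomial terms, and pixels show where the cost comes from. The same count gives 28 features for nmax = 9, and |Z_nm| is unchanged when the image is rotated by 90°, which is the rotation invariance exploited on the ORL database.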
5.6 Performance comparison
The performance of the proposed combined descriptors is compared with that of other popular methods, namely, PCA, 2DPCA, (PC)2A, E(PC)2A, 2D(PC)2A, SVD perturbation, and the hybrid Fourier-AFMT transform, for face recognition with a single (first) example image per person. As shown in Table 11, the proposed combined descriptors give the best recognition rates among these well-established methods. Moreover, the time complexity of the PCA-based methods is much higher than that of the proposed approaches.
Table 11 Comparison of the performance of the proposed combined descriptors with other popular methods for face recognition with a single (first) example image per person.
Dual optimal multiband features (DOMF) give recognition rates of 92.6 and 88.4% on the Yale and ORL databases, respectively, when two images of each person are used for training and the remaining ones for testing. Under the same setup, the highest recognition rate achieved by the proposed ZMmagPhaseLBP descriptor on YALE 2.1 is 94.59%, while the ZMcomponentLBP descriptor achieves 92.47% on the ORL 2.1 category.
The performance of the proposed combined approaches is also compared with that of some recent face recognition methods when five images of each person are used for training. The recognition results on the Yale and ORL databases for this case are shown in Table 12, with the best results highlighted in italics. All these methods use multidimensional features or combined approaches to represent face images. As the results show, the proposed approaches achieve higher recognition rates than these recent methods. Note that for the block-based S-P approach, a single random set of five images per person is used for training and the remaining images for testing on both the Yale and ORL databases, whereas the results for the proposed approaches are averages over ten random trials of training and test sets; on some of these trials, the proposed descriptors reach a 100% recognition rate. The recently introduced wavelet moment (WM) and complex WM (CWM) approaches achieve recognition rates of 51.5 and 54.3%, respectively, on the FERET_A2 subset, whereas the proposed ZMmagPhaseLBP descriptor attains 77.33%. On the fafb subset of the FERET database, the RES, WM, and CWM approaches achieve recognition rates of 95.0, 88.0, and 91.0%, respectively, whereas the highest recognition rate of the proposed approaches, 98.04%, is achieved by ZMmagPhaseLBP. On the basis of these results, it can be concluded that combining the feature sets of the ZM and LBP/LTP descriptors is an efficient and practical approach to robust face recognition.
This paper proposes the fusion of two useful feature sets, i.e., the global ZMs and the local LBP/LTP descriptors. Face images exhibit extensive variation due to pose and lighting conditions, accompanied by expression changes and noise. Individually, the ZM and LBP/LTP descriptors are effective on face images containing certain kinds of variation: the ZM descriptor extracts rotation-invariant shape features from the whole face image, whereas the LBP/LTP descriptors capture fine details and illumination-insensitive characteristics within local regions of the face. The fusion of these two complementary approaches incorporates the benefits of both descriptors and thus proves robust against the various distortions present in face images. In this work, diverse ZM feature sets are combined with the LBP/LTP descriptors to generate six combined approaches, namely, ZMmagLBP, ZMmagPhaseLBP, ZMcomponentLBP, ZMmagLTP, ZMmagPhaseLTP, and ZMcomponentLTP. Detailed experiments on the FERET, Yale, and ORL face databases show that the proposed combined approaches are highly robust against pose, expression, illumination, and noise variations: their recognition rate is approximately 10 to 30% higher than that of the individual approaches. Fusion of ZMs with the LBP descriptor performs better under pose, expression, and illumination variations, while in the presence of noise, ZMs combined with the LTP descriptor generate superior results. The experimental results also demonstrate the efficacy of the proposed methods over other existing techniques, and a significant improvement in recognition rate is achieved when only a single training image per person is available.
Future work will focus on finding optimal ways to exploit the information carried by the phase coefficients of the ZM descriptor, as well as on using different classification methods, to further improve the performance of the proposed fusion approach.
Zhao W, Chellappa R, Phillips P, Rosenfeld A: Face recognition: a literature survey. ACM Comput. Surv. 2003, 35(4):399-458.
Hjelmas E, Low BK: Face detection: a survey. Comput. Vis. Image Underst. 2001, 83: 236-274.
Turk M: A random walk through Eigenspace. IEICE Trans. Inf. Syst. 2001, E84-D(12):1586-1595.
Mittal N, Walia E: Face recognition using improved fast PCA algorithm. In Proceedings of the IEEE International Congress on Image and Signal Processing (CISP '08). Sanya, Hainan; 2008:554-558.
Belhumeur PN, Hespanha JP, Kriegman DJ: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19: 711-720.
Xu Y, Zhang D, Yang J, Yang J-Y: An approach for directly extracting features from matrix data and its application in face recognition. Neurocomputing 2008, 71: 1857-1865.
Zhang D, Zhou Z-H: (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing 2005, 69: 224-231.
Zhang D, Lu G: Review of shape representation and description techniques. Pattern Recognit. 2004, 37(1):1-19.
Zhang D, Lu G: Evaluation of MPEG-7 shape descriptors against other shape descriptors. Multimed. Syst. 2003, 9: 15-30.
Singh C, Walia E: Fast and numerically stable methods for the computation of Zernike moments. Pattern Recognit. 2010, 43(7):2497-2506.
Wee C-Y, Paramesran R: On the computational aspects of Zernike moments. Image Vis. Comput. 2007, 25: 967-980.
Lowe DG: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60(2):91-110.
Soyel H, Demirel H: Facial expression recognition based on discriminative scale invariant feature transform. IET Electron. Lett. 2010, 46(5):343-345.
Huang L, Shimizu A, Kobatake H: Robust face detection using Gabor filter features. Pattern Recognit. Lett. 2005, 26(11):1641-1649.
Ojala T, Pietikäinen M: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24(7):971-987.
Ahonen T, Hadid A, Pietikäinen M: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(12):2037-2041.
Jun B, Kim T, Kim D: A compact local binary pattern using maximization of mutual information for face analysis. Pattern Recognit. 2011, 44: 532-543.
Tan X, Triggs B: Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process. 2010, 19(6):1635-1648.
Kim C, Oh J, Choi C: Combined Subspace Method Using Global and Local features For Face Recognition. 4th edition. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN ‘05), Montreal Canada; 2005:2030-2035.
Zhou D, Yang X: Feature fusion based face recognition using EFM. Proceedings International Conference on Image Analysis and Recognition (ICIAR ‘04). Lect. Notes Comput. Sci. 2004, 3212: 643-650.
Zhou D, Yang X, Peng N, Wang Y: Improved-LDA based face recognition using both facial global and local information. Pattern Recognit. Lett. 2006, 27: 536-543.
Singh C, Walia E, Mittal N: Robust two-stage face recognition approach using global and local features. Vis. Comput. 2012, 28(11):1085-1098.
Su Y, Shan S, Chen X, Gao W: Hierarchical ensemble of global and local classifiers for face recognition. IEEE Trans. Image Process. 2009, 18(8):1885-1895.
Wong Y-W, Seng KP, Ang L-M: Dual optimal multiband features for face recognition. Expert Syst. Appl. 2010, 37(4):2957-2962.
Aroussi ME, Hassouni ME, Ghouzali S, Rziza M, Aboutajdine D: Local appearance based face recognition method using block based steerable pyramid transform. Signal Process. 2011, 91: 38-50.
Liu Z, Liu C: Fusion of color, local spatial and global frequency information for face recognition. Pattern Recognit. 2010, 43: 2882-2890.
Jun B, Lee J, Kim D: A novel illumination-robust face recognition using statistical and non-statistical method. Pattern Recognit. Lett. 2011, 32: 329-336.
Moore S, Bowden R: Local binary patterns for multi-view facial expression recognition. Comput. Vis. Image Underst. 2011, 115: 541-558.
Singh C, Walia E, Mittal N: Rotation invariant complex Zernike moments features and their application to human face and character recognition. IET Comput. Vis. 2011, 5(5):255-265.
Lajevardi SM, Hussain ZM: Higher order orthogonal moments for invariant facial expression recognition. Digit. Signal Process. 2010, 20: 1771-1779.
Li S, Lee M-C, Pun C-M: Complex Zernike moments features for shape based image retrieval. IEEE Trans. Syst. Man. Cybern. C Appl. Rev. 2009, 39: 227-237.
Singh C, Mittal N, Walia E: Face recognition using Zernike and complex Zernike moment features. Pattern Recognit. Image Anal. 2011, 21(1):71-81.
Revaud J, Lavoue G, Baskurt A: Improving Zernike moments comparison for optimal similarity and rotation angle retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31(4):627-636.
Jain A, Nandakumar K, Ross A: Score normalization in multimodal biometric systems. Pattern Recognit. 2005, 38: 2270-2285.
The Facial Recognition Technology (FERET) face database http://www.nist.gov/itl/iad/ig/colorferet.cfm
Yale face database http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
Olivetti Research Laboratory (ORL) face database http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
Li J, Pan J-S: A novel pose and illumination robust face recognition with a single training image per person algorithm. Chin. Opt. Lett. 2008, 6(4):255-257.
Chen YM, Chiang J-H: Fusing multiple features for Fourier Mellin-based face recognition with single example image per person. Neurocomputing 2010, 73(16–18):3089-3096.
Kuo C-H, Lee JD: Face recognition based on a two-view projective transformation using one sample per subject. IET Comput. Vis. 2012, 6(5):489-498.
Singh C, Sahan AM: Face recognition using complex wavelet moments. Opt. Laser Technol. 2013, 47: 256-267.
Zhi R, Ruan Q: Two-dimensional direct and weighted linear discriminant analysis for face recognition. Neurocomputing 2008, 71: 3607-3611.
Wang Y, Wu Y: Face recognition using Intrinsicfaces. Pattern Recognit. 2010, 43: 3580-3590.
The authors are thankful for the useful comments and suggestions of the anonymous reviewers, which have raised the standard of the paper. The authors are grateful to the All India Council for Technical Education (AICTE), Govt. of India, New Delhi, India, for supporting the research work under file number 8013/RID/BOR/RPS-77/2005-06. We are also grateful to the National Institute of Standards and Technology for providing the FERET face database.
The authors declare that they have no competing interests.
Cite this article
Singh, C., Mittal, N. & Walia, E. Complementary feature sets for optimal face recognition. J Image Video Proc 2014, 35 (2014). https://doi.org/10.1186/1687-5281-2014-35
- Face recognition
- Zernike moments (ZMs)
- Local binary pattern (LBP)
- Local ternary pattern (LTP)
- Invariant image features