Complementary feature sets for optimal face recognition
© Singh et al.; licensee Springer. 2014
Received: 4 January 2014
Accepted: 4 June 2014
Published: 5 July 2014
In face recognition tasks, a single kind of feature set is not adequate to generate superior results; thus, the selection and combination of complementary features are crucial steps. In this paper, the fusion of two useful descriptors, i.e., the Zernike moments (ZMs) and the local binary pattern (LBP)/local ternary pattern (LTP), is proposed. The ZM descriptor offers good global image representation capabilities and is invariant to image rotation and robust to noise, while the LBP/LTP descriptors capture the fine details within local parts of the face image and are insensitive to illumination variations. The fusion of the two is observed to incorporate the traits of both individual descriptors. In this work, the performance of diverse feature sets of ZMs (i.e., magnitude features, magnitude plus phase features, and real plus imaginary component features) combined with the LBP/LTP descriptor is analyzed on the FERET, Yale, and ORL face databases. The recognition results achieved by the proposed method are approximately 10 to 30% higher than those obtained with these descriptors separately. Recognition rates of the proposed method are also found to be significantly better (i.e., by 8 to 24%) when only a single example image per person is available for training.
In recent times, face recognition has become one of the most widely used biometric techniques, with a number of real-world applications such as human-computer interaction, surveillance, authentication, computer vision applications, and computer user interfaces. An automatic face recognition system ascertains a person's identity on the basis of his/her physiological characteristics. The sensitivity of available classifiers to different kinds of variation, such as illumination, facial expression, facial occlusion, pose, and aging, is among the most challenging problems that researchers face.
To improve existing face recognition techniques, the invariant features selected to represent the face images should have high discriminative power, because classification is subsequently performed on the basis of these features alone. In the literature, the approaches used to represent face images are classified broadly into two categories, namely, global feature extraction approaches and local feature extraction approaches. The global feature extraction approaches are based on statistical methods, wherein features are extracted from the whole face image. In this category, the subspace-based methods, namely, principal component analysis (PCA), Fisher linear discriminant (FLD), two-dimensional PCA (2DPCA), and two-directional two-dimensional PCA (2D2PCA) [3–7], are some of the most popular and frequently employed techniques. Moment invariants, such as Hu's seven moment invariants and orthogonal rotation invariant moments such as Zernike moments (ZMs), pseudo-Zernike moments (PZMs), and orthogonal Fourier-Mellin moments (OFMMs), are observed to be very effective in global image description and recognition, and MPEG-7 uses some of them as region-based shape descriptors for image retrieval. The magnitude of these moments is invariant to image rotation, and after applying some geometric transformations, it becomes invariant to translation and scale [10, 11].
Local feature extraction approaches deal with fine information within specific parts of face images such as the eyes, nose, and mouth. Recently, a lot of work has been done on these methods because local features are known to be robust against illumination, occlusion, expression, and noise variations. The local feature extraction approaches are classified into two categories, i.e., sparse descriptors and dense descriptors. The sparse descriptors first divide a face image into patches and then determine the invariant features of these patches. A prominent descriptor in this category is the scale-invariant feature transform (SIFT) introduced by Lowe, which is invariant to scale and rotation. Soyel et al. used the discriminative SIFT (D-SIFT) approach for optimal facial expression recognition, but this method is somewhat susceptible to illumination variation. In face recognition technology, the Gabor wavelet is one of the most frequently used and successful local image descriptors. It incorporates the characteristics of both the space and frequency domains. The local features extracted by Gabor filters are invariant to scale and orientation and are able to detect edges and lines in the face images. The main difficulty with Gabor filters is their high computational complexity. Among the dense descriptors, the local binary pattern (LBP) is one of the most widely used approaches due to its invariance to monotonic gray-level changes and the ease with which local features can be extracted. Apart from texture analysis, it has provided excellent results in many areas of image processing and computer vision, including face recognition [15, 16]. Several variants of LBP are available in the literature to represent face images with compact feature sets. Such variants also improve the classification performance of the basic LBP approach [17, 18].
In complex applications like face recognition, it is observed that one kind of feature set is not rich enough to capture the entire face information. Thus, finding and combining complementary feature sets has become an active research topic in recent years. Specifically, global features are related to the holistic characteristics of the face, whereas local features describe the finer details within face images, so it seems logical to combine both feature sets since the information conveyed by them belongs to different attributes of the face images. In recent times, many researchers have developed face recognition algorithms by combining multiple feature sets. Kim et al. have proposed a combined subspace-based approach using both global and local features obtained by applying a linear discriminant analysis (LDA)-based method for face recognition. Zhou and Yang have proposed the fusing feature Fisher classifier (F3C) approach, where the face image is first divided into smaller subimages and then the discrete cosine transform (DCT) technique is applied to the whole image and some subimages to extract the holistic and local facial features. After concatenating these DCT-based holistic and local facial features, the enhanced Fisher linear discriminant model (EFM) is employed to generate a low-dimensional feature vector. Similarly, the use of local and global information extracted from DCT coefficients, along with a Fisher classifier developed for high-dimensional multiclass problems, has also been proposed. Singh et al. proposed a robust two-stage face recognition approach based on the fusion of global ZMs and Weber law descriptor (WLD)-based local features. The usefulness of combining global and local facial features has also been demonstrated with a hierarchical ensemble of global and local features. In this technique, the 2D Fourier transform is used to extract the global features and the Gabor wavelet is used to extract the local features. 
Subsequently, equal weights are assigned to both the global and local features for combining the outputs of the two classifiers (although the authors establish that the contributions of the global and local features are different). Wong et al. have proposed the dual optimal multiband feature (DOMF) method for face recognition, in which the wavelet packet transform (WPT) decomposes the image into frequency subbands and a multiband feature fusion technique is used to select optimal multiband feature sets that are invariant to illumination and facial expression. In this method, parallel radial basis function (RBF) neural networks are used to classify the two sets of features. The decision scores are then combined and processed by an adaptive fusion mechanism. The use of steerable pyramid decomposition (S-P transform) in both global and local appearance, together with feature/score fusion, has also been analyzed. In that work, each face image is described by a subset of band-filtered images containing steerable pyramid coefficients. These S-P subbands are divided into small subblocks to extract compact and meaningful feature vectors that provide a better representation of the class information. Recently, Liu and Liu have proposed an approach for face recognition that fuses color, local spatial, and global frequency information. This method is composed of multiple features of face images derived from LBP, DCT, a hybrid color space, and the Gabor image representation. The combination of Gabor and LBP enhances the power of the spatial histogram, which is impressively insensitive to appearance variations. This method has proven to be robust against illumination, pose, and expression variations [27, 28]. However, the combination of Gabor and multiresolution LBP descriptors requires significantly greater computation time.
Although a lot of research is under way on combining multiple feature sets, the selection of complementary feature sets for fusion and the techniques for combining these divergent feature sets remain a challenge. In view of that, in this paper, a fusion of two complementary feature sets is proposed, where the global information of the face images is extracted by the ZM descriptor, exploiting its rotation invariance, while the LBP/LTP descriptor captures the significant local information. Among various global shape descriptors, ZMs are observed to be one of the best because of their many attractive characteristics, such as minimum information redundancy, rotation invariance of their magnitude, and robustness to noise. The estimation of head movement from the phase coefficients of the ZMs of the original image and those of the rotated image has been employed to generate a set of features that is also significantly tolerant to pose variation. The magnitude features of ZMs obtained at some higher orders of moments are observed to be invariant to expression variation. On the other hand, the LBP descriptor is observed to be relatively more insensitive to illumination changes. It is computationally efficient and quite simple to implement. Recently, a useful extension to this approach has been introduced, namely, the local ternary pattern (LTP), which is observed to be more discriminative and more tolerant of image noise in near-uniform regions than LBP. In particular, combining feature sets that are invariant to global variations as well as to local changes of face images would be an effective approach to achieving an optimal face recognition system. As discussed earlier, the information conveyed by the ZM and LBP/LTP descriptors is distinct and belongs to different aspects of a facial image. The fusion of these descriptors is expected to inherit the useful characteristics of both. 
One of the critical issues involved in the fusion process is the time spent in the computation of the combined features. It is shown through time analysis of the proposed approach that the total time required for the recognition process is very small and can be afforded by PCs and other low computation devices.
The ZM descriptor provides three different sets of features, namely, magnitude features, combined magnitude and phase features [31, 32], and the modified real and imaginary component features. In this study, these diverse feature sets of ZMs are referred to as ZMmag, ZMmagPhase, and ZMcomponent, respectively. The performance of the feature sets of ZMs combined with the LBP descriptor is also compared with that of the ZMs coupled with the LTP descriptor. Consequently, the proposed fusion of the diverse feature sets of ZMs and the LBP/LTP descriptors provides six combined approaches: ZMmagLBP, ZMmagPhaseLBP, ZMcomponentLBP, ZMmagLTP, ZMmagPhaseLTP, and ZMcomponentLTP. In order to compare the performance of these combined approaches to that of the individual ZM and LBP/LTP approaches, exhaustive experiments are performed on three prominent face databases, namely, FERET, Yale, and ORL, against pose, illumination, expression, and noise variations. The results show that the recognition rate of the combined approaches is significantly better than that of their individual counterparts, with gains varying between 10 and 30%. Experimental results also demonstrate the efficacy of the proposed methods over other existing works. A significant improvement in recognition rate is achieved for the case of a single training image per person.
The rest of the paper is organized as follows. Section 2 presents a brief overview of the ZM approach and the diverse feature sets obtained from it, together with a brief introduction to the LBP/LTP approaches. Section 3 describes the similarity measures used to evaluate the matching scores of these methods. Section 4 describes the procedure involved in the proposed fusion of the ZM and LBP/LTP descriptors. Section 5 presents the experiments and the results obtained, and Section 6 presents the conclusions and future directions.
2 Baseline image descriptors
2.1 Global image descriptor
2.1.1 Zernike moments
2.1.2 Diverse feature sets of ZMs and related work
Since the magnitude of ZMs is invariant to rotation, it is usually used as an invariant image descriptor in many image analysis and pattern recognition applications. The phase component of ZMs is, however, ignored. It is observed that the phase component carries information as significant as that of the magnitude component. Therefore, in recent years, significant research work has been carried out to incorporate ZM phase coefficients along with their magnitudes as invariant feature descriptors. At present, there are two approaches to realize this objective. In the first approach, developed by Revaud et al., a similarity measure incorporating both the magnitude and phase coefficients of the query and database images is used. The method provides excellent pattern matching performance but at the cost of increased computation time. In the second approach, the rotation angle between a query image and the database image is estimated. It is assumed that the query image is a rotated version of the original database image. The estimated rotation angle is used to cancel the effect of rotation in order to compare the phases of the query and database images. Recently, we devised a novel way to correct phase coefficients without estimating the rotation angle. The method was applied successfully in face recognition. The method works as follows: Suppose Z_nm and Z′_nm are the ZMs of the database and query images, respectively, and ϕ_nm and ϕ′_nm are their respective phase angles. We compute and correct the ZMs of the query image by evaluating . If the two images are the same, then , otherwise ; therefore, the real and imaginary components of the ZMs of the query and database images can be compared separately, instead of comparing only their magnitudes. An attractive advantage of this approach is that by using two-component feature vectors, the number of features is almost doubled as compared to the magnitude-only ZM features for the same order of moments. 
This approach has the additional advantages of low computation cost, less susceptibility to image noise, and numerical stability, in addition to providing a better recognition rate. Throughout the paper, the features of ZMs based on magnitude, magnitude together with corrected phase, and the corrected real and imaginary parts of ZMs are referred to as ZMmag, ZMmagPhase, and ZMcomponent, respectively.
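As an illustration of the three feature sets, the following is a minimal sketch that computes ZMs with the standard textbook definition (radial polynomial and a zeroth-order pixel sum over the unit disk) and derives magnitude, magnitude-plus-phase, and real/imaginary features from them. The function names, the square-image assumption, and the pixel-to-disk mapping are choices made for this sketch, and the phase correction step described above is not included, so it should not be read as the authors' implementation.

```python
import math
import numpy as np


def radial_poly(n, m, r):
    """Zernike radial polynomial R_nm(r), valid for n >= |m| and n - |m| even."""
    m = abs(m)
    out = np.zeros_like(r, dtype=float)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        out += coeff * r ** (n - 2 * s)
    return out


def zernike_moments(img, n_max):
    """Return {(n, m): Z_nm} for 0 <= m <= n <= n_max with n - m even."""
    size = img.shape[0]  # assumption: square image
    # Map pixel centres into [-1, 1] x [-1, 1] and keep those inside the unit disk.
    coords = (2.0 * np.arange(size) + 1.0 - size) / size
    x, y = np.meshgrid(coords, coords)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = r <= 1.0
    pixel_area = (2.0 / size) ** 2
    moments = {}
    for n in range(n_max + 1):
        for m in range(n % 2, n + 1, 2):
            basis_conj = radial_poly(n, m, r) * np.exp(-1j * m * theta)
            total = np.sum(img[inside] * basis_conj[inside])
            moments[(n, m)] = (n + 1) / np.pi * pixel_area * total
    return moments


def feature_sets(moments):
    """Magnitude, magnitude plus (uncorrected) phase, and real/imaginary features."""
    z = np.array([moments[key] for key in sorted(moments)])
    mag = np.abs(z)
    mag_phase = np.concatenate([mag, np.angle(z)])
    component = np.concatenate([z.real, z.imag])
    return mag, mag_phase, component
```

Rotating the input by a multiple of 90° (for example with `np.rot90`) leaves the magnitude features unchanged up to floating-point error, which is the rotation invariance the text relies on; the phase and component features do change, which is why a correction step is needed before they can be compared.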
2.2 Local image descriptor
2.2.1 Local binary pattern
where the values of i run over the eight neighbors of the central pixel. In the case of 8-bit patterns, Ojala et al. have observed that out of 2^8 = 256 patterns, only 58 uniform patterns provide approximately 90% of the information of the image neighborhoods, while the remaining patterns consist mostly of noise. This attribute significantly reduces the number of LBP histogram bins from 256 to 59, where all the non-uniform patterns are stored in a single bin, the 59th bin.
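The following minimal sketch illustrates the 8-neighbor LBP code and the 59-bin uniform histogram described above. It uses the common convention that a neighbor contributes a 1 bit when its gray level is greater than or equal to that of the central pixel, and a pattern is treated as uniform when its circular bit string has at most two 0/1 transitions; the neighbor ordering and function names are assumptions of this sketch.

```python
import numpy as np

# Neighbour offsets (row, column) in circular order; neighbour i sets bit 2**i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """8-neighbour LBP code of every interior pixel of a 2D gray-level image."""
    img = np.asarray(img, dtype=np.int32)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for i, (dy, dx) in enumerate(OFFSETS):
        neigh = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        codes += (neigh >= center).astype(np.int32) << i
    return codes


def is_uniform(code):
    """True if the circular 8-bit pattern has at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2


UNIFORM_CODES = [c for c in range(256) if is_uniform(c)]
# Every non-uniform code falls into bin index 58, i.e., the 59th bin.
BIN_OF_CODE = np.full(256, 58, dtype=np.int64)
BIN_OF_CODE[UNIFORM_CODES] = np.arange(len(UNIFORM_CODES))


def uniform_lbp_histogram(img):
    """59-bin histogram of uniform LBP codes for one image or image patch."""
    return np.bincount(BIN_OF_CODE[lbp_codes(img)].ravel(), minlength=59)
```

Because the code depends only on the sign of gray-level differences, any strictly increasing gray-level transformation of the image yields the same codes, which is the invariance to monotonic gray-level changes mentioned earlier.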
2.2.2 Local ternary patterns
where w is a user-defined threshold. The LTP code is more tolerant of image noise but is not strictly invariant to gray-level transformations. The concept of uniform patterns used to obtain histogram features is also applicable to LTP. For simplicity, each three-valued LTP code is split into its positive and negative halves, which generates two sets of histogram features: one corresponding to the positive patterns and the other to the negative patterns.
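The split into positive and negative patterns can be sketched as follows. The sketch assumes the usual three-valued rule, in which a neighbor is coded +1 when it is at least w above the central pixel, −1 when it is at least w below it, and 0 otherwise; the neighbor ordering and the function name are choices made here for illustration.

```python
import numpy as np

# Neighbour offsets (row, column) in circular order; neighbour i sets bit 2**i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_codes(img, w):
    """Return the (positive, negative) 8-bit pattern maps of the LTP code.

    w is the user-defined threshold; it is assumed to be positive.
    """
    img = np.asarray(img, dtype=np.int32)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    positive = np.zeros_like(center)
    negative = np.zeros_like(center)
    for i, (dy, dx) in enumerate(OFFSETS):
        neigh = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        positive += (neigh >= center + w).astype(np.int32) << i  # ternary value +1
        negative += (neigh <= center - w).astype(np.int32) << i  # ternary value -1
    return positive, negative
```

Each of the two maps is an ordinary 8-bit pattern map, so the same uniform-pattern histogram used for LBP can be applied to both, which doubles the number of histogram features. In a near-uniform region whose fluctuations stay below w, both maps are zero everywhere, which illustrates the tolerance to noise.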
3 Similarity measures used
In this section, the similarity measures used for finding the matching scores of the ZM and LBP/LTP descriptors are discussed briefly. In this work, it is observed that fusing the matching scores obtained by applying the L2-norm/L1-norm to the ZM descriptor and histogram intersection to the LBP/LTP descriptor gives superior performance. Hence, these different similarity measures are used on the feature sets generated by the respective descriptors. Since the matching scores obtained from these different measures are heterogeneous (the norms are distances, whereas histogram intersection is a similarity), normalization is required to transform them to a common range before combining them.
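A minimal sketch of this normalize-then-combine step is given below. The text does not state which normalization or combination rule is used, so the sketch assumes min-max normalization over the gallery and a weighted sum; both are common choices rather than the method confirmed by the paper, and the names `min_max_normalize`, `fuse_scores`, and `weight_zm` are hypothetical.

```python
import numpy as np


def min_max_normalize(scores):
    """Rescale a set of scores to [0, 1] (assumed normalization scheme)."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def fuse_scores(zm_distances, hist_similarities, weight_zm=0.5):
    """Combine per-gallery-image scores of one query into a single similarity.

    zm_distances:      L1/L2 distances between ZM features (smaller is better).
    hist_similarities: histogram intersection scores (larger is better).
    The weighted sum is an assumption of this sketch.
    """
    zm_similarity = 1.0 - min_max_normalize(zm_distances)  # distance -> similarity
    local_similarity = min_max_normalize(hist_similarities)
    return weight_zm * zm_similarity + (1.0 - weight_zm) * local_similarity
```

With three gallery images, `fuse_scores([0.2, 1.0, 0.6], [0.9, 0.1, 0.5])` gives fused scores of approximately 1.0, 0.0, and 0.5, so the first gallery image would be returned as the match.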
3.1 Similarity measure for ZM descriptor
Normally, equal weights are assigned to simplify this process, i.e., w1 = w2 = 0.5.
The above-mentioned distance metric d_comp has proven to be a better similarity measure between two sets of component feature vectors.
3.2 Similarity measure for LBP/LTP descriptor
where H_d and H_q are the histograms consisting of the LBP/LTP features of the database and the query images, respectively, and B is the total number of bins in the histograms. If the two images, i.e., the database and the query images, are identical, then the value of D_h(H_d, H_q) is 1.
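A minimal sketch of a histogram intersection measure with this property follows. Each histogram is normalized to unit sum before the bin-wise minimum is accumulated, which guarantees a value of 1 for identical histograms; this particular normalization is an assumption of the sketch, since other normalizations lead to the same property.

```python
import numpy as np


def histogram_intersection(h_d, h_q):
    """Intersection of two histograms with B bins, in the range [0, 1]."""
    h_d = np.asarray(h_d, dtype=float)
    h_q = np.asarray(h_q, dtype=float)
    h_d = h_d / h_d.sum()  # assumed normalization to unit sum
    h_q = h_q / h_q.sum()
    return float(np.minimum(h_d, h_q).sum())
```

Identical histograms give a score of 1, histograms with no overlapping bins give 0, and partially overlapping histograms fall in between.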
4 Fusion of ZM and LBP/LTP descriptors
The ZM descriptor and the LBP/LTP descriptors are observed to be complementary to each other, and their fusion is expected to discriminate face images even in the presence of diverse variations. The ZM descriptor is observed to extract the global information of the images more effectively than other global descriptors. On the other hand, the LBP and LTP descriptors have been established as successful methods for representing the finer interior details within face images. The feature set obtained by the fusion of these independent approaches, i.e., ZMs and LBP/LTP, is expected to inherit the invariant characteristics of both. Exhaustive experiments performed against pose, illumination, expression, and noise variations on suitable databases support this hypothesis.
where N_test is the total number of images in the test set and N_f is the number of images recognized incorrectly.
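Given these two quantities, the recognition rate is presumably the percentage of test images recognized correctly. The sketch below assumes that form, i.e., 100 × (N_test − N_f) / N_test; the function name and the example numbers are hypothetical.

```python
def recognition_rate(n_test, n_failed):
    """Percentage of test images recognized correctly (assumed definition)."""
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    if not 0 <= n_failed <= n_test:
        raise ValueError("n_failed must lie between 0 and n_test")
    return 100.0 * (n_test - n_failed) / n_test
```

For a hypothetical test set of 200 images with 30 misclassified images, `recognition_rate(200, 30)` returns 85.0.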
5 Experiments and results
In order to evaluate the performance of the considered individual approaches in comparison to that of the proposed combined approaches, experiments are performed on three well-known and calibrated face databases, namely, the FERET face database, consisting of images with diverse variations, the Yale face database, consisting of illumination and expression variations, and the ORL face database, having small pose (tilt/yaw) changes. It is well known that the accuracy of a face recognition system is significantly affected by the kind of variations present in the images of the face database as well as by the number of images of each subject (i.e., person) in the training set. Thus, exhaustive experiments are performed in a comprehensive and deterministic manner with respect to the different types of variations present in these databases. The number of training images per person is also varied to observe its effect on recognition accuracy. The best results are highlighted in italics. All the experiments are performed in Visual C++ 6.0 under Microsoft Windows environment on a PC with a 3.0-GHz CPU and a 3-GB RAM.
5.1 Performance on FERET database
Data partition on FERET database for performing various experiments
Set of experiment (training set; test set; remarks)
Training set: one image of each person, resulting in a total of 100 images. Test set: remaining six images of each person in different pose variations, i.e., a total of 600 images.
Training set: one image of each person in frontal, i.e., 0° pose. Test set: six remaining images having pose variations of ±22.5°, ±67.5°, and ±90°. Remarks: against the frontal pose image, all the images in different poses (≤ ±90°) are taken for testing.
Training set: one image in frontal pose. Test set: four images having pose variations of ±22.5° and ±67.5°. Remarks: testing the performance for small and large pose variation up to ±67.5°.
Training set: one image in frontal pose. Test set: two images in ±67.5° pose angle. Remarks: testing the performance for large pose variation in the left and right directions.
Training set: one image in frontal pose. Test set: two images in ±22.5° pose angle. Remarks: testing the performance for small pose variation in the left and right directions.
Training set: one image in frontal pose, i.e., the image labeled 'ba', is in the training set. Test set: ten remaining images having illumination, expression, and pose (up to ±60°) variations, i.e., the images labeled bk, bj, bb, bc, bd, be, bf, bg, bh, and bi, are kept in the test set. Remarks: images consisting of all the three variations, i.e., illumination, expression, and pose (up to ±60°), are taken for testing against the frontal pose image. A total of 200 images are in the training set, and 2,000 images are in the test set.
Training set: three random images of each person, resulting in a total of 600 images. Test set: remaining seven images of each person in different variations, i.e., 1,600 images. Remarks: ten different trials of this setup have been taken, and the recognition result (Table 3) is the average of all these trials.
Training set: one image of each person in frontal pose, i.e., the image labeled 'ba', is in the training set, which contains a maximum of 200 face images. Test set: eight images of each person having pose variations (up to ±60°), i.e., the images labeled bb, bc, bd, be, bf, bg, bh, and bi, are kept in the test set, resulting in a maximum of 1,600 images. Remarks: this category consists of the images of 200 persons in 9 pose variations, from frontal to profile pose, resulting in a total of 1,800 images. In this category, the different experiments are carried out by varying the number of subjects (persons) in the database.
Training set: standard gallery set fa, which contains 1,196 images of 1,196 subjects in frontal view. Test set: fafb, fc, dupI, and dupII. Remarks: the images in the fafb (1,195 images) set are with facial expression variation, the fc (194 images) set contains images with illumination variation, and the images in the dupI (722 images) and dupII (234 images) sets represent aging effects.
Different trials for one image in training and remaining six in test set for FERET_A1 category
Image in training set in pose angle; images in test set in pose angle
0°; ±22.5°, ±67.5°, ±90°
+22.5°; 0°, −22.5°, ±67.5°, ±90°
−22.5°; 0°, +22.5°, ±67.5°, ±90°
+67.5°; 0°, ±22.5°, −67.5°, ±90°
−67.5°; 0°, ±22.5°, +67.5°, ±90°
+90°; 0°, ±22.5°, ±67.5°, −90°
−90°; 0°, ±22.5°, ±67.5°, +90°
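The trial layout above, in which each of the seven poses is used once as the single training image while the remaining six poses form the test set, can be generated with a short sketch. The list of pose angles is taken from the table; the function name is hypothetical.

```python
# Seven pose angles (in degrees) available per person in this category.
POSES = [0.0, 22.5, -22.5, 67.5, -67.5, 90.0, -90.0]


def single_pose_trials(poses):
    """One trial per pose: that pose trains, all remaining poses are tested."""
    return [(train, [p for p in poses if p != train]) for train in poses]


for train_pose, test_poses in single_pose_trials(POSES):
    print(f"train: {train_pose:+.1f}  test: {test_poses}")
```

Averaging the recognition rate over all trials then gives one figure that does not depend on which pose happens to be chosen for training.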
Performance of the considered approaches on FERET database
For FERET_B, the recognition results obtained by performing the experiments on FERET_B1 and FERET_B2 categories are also shown in Table 3. It is clear from the experimental results that the combined approaches achieve a recognition rate approximately 15 to 20% higher than that obtained by the individual approaches. The highest recognition rates of 78.5% and 88.2% are achieved by the ZMmagPhaseLBP descriptor for FERET_B1 and FERET_B2 categories, respectively. The results obtained on the FERET_Gallery/Probe set confirm the robustness of the proposed system against changes in expression and lighting; however, further research on aging is required. In general, on FERET_B and FERET_Gallery/Probe subsets, the combination of ZMmagPhase with the LBP descriptor gives better results than the other combinations.
5.2 Performance on Yale database
Data partition on Yale database for performing various experiments
Set of experiment
All remaining (i.e., remaining ten images except the one selected for the training)
Eleven different trials of this setup have been taken, and the recognition result (Figure 6) is the average of all these trials
The recognition result (Table 5) is the average of ten random trials on each setup, i.e., YALE 2.1, YALE 2.2, YALE 2.3, YALE 2.4, and YALE 2.5
f, g, h, i
Testing consists of experiments against illumination variation
b, c, d, e, j, k
a, f, g, h, i
b, c, d, e, j, k
Testing consists of experiments against expression variation
f, g, h, i
a, b, c, d, e, j, k
From the results obtained, it is observed that the proposed combined descriptors achieve an average improvement of approximately 12% over the individual approaches. Among the individual approaches, both the LBP and LTP approaches perform better than the three descriptors of ZMs. This indicates that the local approaches capture the interior details of a face image more efficiently than the global ones, which makes them suitable even when only a single exemplar image per person is available. From the results depicted in Figure 6, it is observed that among all the combined approaches, the highest recognition results are achieved by the ZMmagPhaseLBP descriptor. It is also observed that for the proposed combined methods, fusion of ZM features obtained for nmax = 11 provides better results. Hence, on this database, all other experiments have been carried out for this order of moments.
Performance (average) of the individual/combined approaches on Yale database
Performance of the individual/combined approaches over illumination and expression variation on Yale database
For illumination variation, i.e., on YALE 3.1 category, the highest recognition rate of 91.67% is achieved by the proposed ZMmagLBP approach, whereas against expression variation, the ZMmagPhaseLBP descriptor gives the highest recognition rate of 85.56% on YALE 4.1 category. Experiments are also conducted on YALE 3.2 category, wherein all of the face images with expression variation are taken in the training set and the remaining ones, i.e., one neutral image and four images with illumination changes, are placed in the test set. The results for this setup are also shown in Table 6, from which it is observed that ZMmagLBP and ZMcomponentLBP perform best: on YALE 3.2 category, both approaches achieve a recognition rate of 97.33%. Similarly, in the case of YALE 4.2 category, four images of each person with illumination variation are used to create the training set, while all of the remaining ones (i.e., one neutral image and six images in varying expressions) are placed in the test set. As shown in Table 6, ZMmagPhaseLBP achieves a high recognition rate of 98.89% for this category. Thus, from the results shown in Table 6, it can be concluded that ZMmagLBP is the most robust to illumination variation and ZMmagPhaseLBP is the most robust to expression variation. Considering the overall performance of the proposed approaches on Yale database, ZMmagLBP and ZMmagPhaseLBP outperform the other combinations.
5.3 Performance on ORL database
Data partition on ORL database for performing various experiments
Set of experiment
All remaining (i.e., remaining nine images except the one selected for the training)
Ten different trials of this setup have been taken, and the recognition result (Figure 8) is the average of all these trials
The recognition result (shown in Table 8) is the average of ten random trials on each setup, i.e., ORL 2.1, ORL 2.2, ORL 2.3, and ORL 2.4
c, g, i, j
Testing consists of experiments against scale and up/down (tilt) pose variation
b, d, e, f
Testing consists of experiments against left/right (yaw) pose variation
Performance of the considered approaches against different pose variations on ORL database
Next, experiments are performed on ORL 3 category by taking two neutral face images in the training set, while four images of each person with scale and up/down head movement are taken in the test set. Similarly, in order to examine the performance of the proposed approaches over yaw pose variation, two neutral images of each person are placed in the training set and four images with slight left/right head movement are placed in the test set, i.e., ORL 4 category. The results obtained from this experimental analysis are also presented in Table 8. On ORL 3 category, the ZMcomponent features coupled with the LBP/LTP descriptors perform best, achieving a recognition rate of 90.0%. On ORL 4 category, the highest recognition rate of 91.25% is achieved by both the ZMmagLBP and the ZMmagPhaseLBP approaches. Overall, in most of the cases on ORL database, the ZMcomponent combined with LBP/LTP outperforms the other proposed combinations.
5.4 Performance analysis against noise variation
Performance of the considered approaches against noise variation
Face datasets comprising additive noise in testing
From the results presented, it is observed that among the individual approaches, the LTP descriptor is more robust to noise variation than the LBP. On Yale and ORL databases (with noise variation), the proposed ZMmagLTP and ZMcomponentLTP descriptors, respectively, perform better than all other combined approaches. On the other hand, on FERET images with noise variations, the recognition rate of ZMmagPhaseLBP is 81.5%, whereas the recognition rate of ZMmagLTP is 81.0%. The drop in recognition rate caused by the added noise (i.e., the difference between the results obtained without and with noise) is 5% and 3.5% for these two approaches, respectively. Hence, on FERET database, the proposed ZMmagLTP descriptor is more robust against noise variation, since its recognition rate degrades less. For Yale and ORL databases, the degradation in recognition rates due to noise is very small.
5.5 Time complexity
Dimensionality of the feature vectors of the ZM and LBP/LTP descriptors
Descriptor; size of the feature vector; remarks
ZMmag; 40; for moments up to order nmax = 11
ZMmagPhase; 2 × 40; size of the feature vector is double the size of magnitude features
ZMcomponent; 2 × 40; size of the feature vector is double the size of magnitude features
LBP; 3,776; taking image patch size of 8 × 8 pixels for uniform local binary patterns
LTP; 2 × 3,776; number of features is double the number of LBP features because two feature vectors consisting of the positive and the negative uniform binary patterns are taken
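The sizes listed above can be checked with a little arithmetic. In the sketch below, the count of 64 image regions is inferred by dividing the listed 3,776 LBP features by the 59 uniform-pattern bins, and the exclusion of two low-order moments is an assumption made to reconcile the 42 moments with m ≥ 0 up to order 11 with the 40 magnitude features implied by the table; neither detail is stated explicitly in the text.

```python
def count_zernike_moments(n_max):
    """Number of moments Z_nm with 0 <= m <= n <= n_max and n - m even."""
    return sum(1 for n in range(n_max + 1) for m in range(n % 2, n + 1, 2))


N_EXCLUDED = 2       # assumption: two low-order moments are left out
BINS_PER_REGION = 59  # 58 uniform patterns plus one bin for the rest
N_REGIONS = 64        # inferred from 3,776 / 59

zm_mag_dim = count_zernike_moments(11) - N_EXCLUDED  # 42 - 2 = 40
zm_mag_phase_dim = 2 * zm_mag_dim                    # magnitude + phase
zm_component_dim = 2 * zm_mag_dim                    # real + imaginary
lbp_dim = N_REGIONS * BINS_PER_REGION                # 3,776
ltp_dim = 2 * lbp_dim                                # positive + negative

print(zm_mag_dim, zm_mag_phase_dim, zm_component_dim, lbp_dim, ltp_dim)
```

The comparison makes the imbalance between the two descriptors explicit: the local histogram features are roughly two orders of magnitude more numerous than the ZM features, which is one reason score-level fusion is a natural alternative to concatenating the raw feature vectors.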
We observe that for an image of 256 × 256 pixels, the elapsed CPU time for calculating ZMs is only 0.032 s for nmax = 12 on a PC with a 3.0-GHz CPU and a 1-GB RAM under Microsoft Windows environment. The time taken for computing LBP and LTP features is 0.015 and 0.016 s, respectively. Thus, the total time elapsed for the extraction of the local and global features of a test image does not exceed 0.048 s (0.032 s for the ZMs plus at most 0.016 s for the LBP/LTP features). The time taken for classification is much less than the feature extraction time. Thus, relative to the gain in recognition performance, the time taken by the combined features is small and can be afforded by devices with low computation power in online mode. Since the time complexity does not depend on the contents of the image, these experiments are carried out for one image only.
5.6 Performance comparison
Performance comparison (%) of some recent approaches with proposed methods on Yale and ORL databases
Hybrid Fourier-AFMT transform
Comparison of performance of the proposed combined descriptors with other popular methods for face recognition with single (first) example image per person.
Dual optimal multiband features (DOMF) give recognition rates of 92.6 and 88.4% on Yale and ORL databases, respectively, when two images of each person are taken in the training set and all the remaining images are kept in the test set. With the same split of training and test images, the highest recognition rate achieved by the proposed ZMmagPhaseLBP descriptor for the YALE 2.1 category is 94.59%, while the ZMcomponentLBP descriptor achieves a recognition rate of 92.47% for the ORL 2.1 category.
Performance comparison (%) of the proposed approaches with recent methods on Yale and ORL databases. Methods compared: two-dimensional LDA (2D-LDA), direct LDA (DLDA), enhanced Fisher linear discriminant model (EFM), combined feature Fisher classifier (CF2C), feature Fisher classifier (F3C), block based S-Pa, Algorithm A (WMs), and Algorithm B (CWMs).
This paper proposes the fusion of two useful feature sets, i.e., the global ZMs and the local LBP/LTP descriptor. Face images exhibit extensive variation under varying pose and lighting conditions, accompanied by the presence of expression and noise. Individually, the ZM and LBP/LTP descriptors are observed to be very effective in providing good recognition performance on face images containing certain variations. In particular, the ZM descriptor extracts rotationally invariant shape features from the whole face image, whereas the LBP/LTP descriptors capture the fine details and illumination-invariant characteristics within local regions of the face image. The fusion of these two complementary approaches incorporates the benefits of both descriptors and as such proves to be robust against various distortions present in the face images. In this work, diverse feature sets of ZMs are combined with the LBP/LTP descriptors to generate various combined approaches, namely, ZMmagLBP, ZMmagPhaseLBP, ZMcomponentLBP, ZMmagLTP, ZMmagPhaseLTP, and ZMcomponentLTP. From the detailed experiments performed on FERET, Yale, and ORL face databases, it has been observed that the proposed combined approaches are highly robust against pose, expression, illumination, and noise variations, as the recognition rate achieved by the proposed approaches is approximately 10 to 30% higher than that obtained by applying these descriptors individually. The fusion of the ZM and LBP descriptors performs better under pose, expression, and illumination variations, while in the presence of noise, ZMs combined with the LTP descriptor generate superior results. Experimental results also demonstrate the efficacy of the proposed methods over other existing techniques. In addition, a significant improvement in the recognition rate is achieved by the proposed scheme when only a single training image per person is available.
Future work could explore optimal ways of utilizing the information carried by the phase coefficients of the ZM descriptor, as well as different methods of classification, to further improve the performance of the proposed fusion approach.
The authors are thankful to the anonymous reviewers for their useful comments and suggestions, which helped raise the standard of the paper. The authors are grateful to the All India Council for Technical Education (AICTE), Govt. of India, New Delhi, India, for supporting the research work vide their file number 8013/RID/BOR/RPS-77/2005-06. We are also grateful to the National Institute of Standards and Technology for providing the FERET face database.
- Zhao W, Chellappa R, Phillips P, Rosenfeld A: Face recognition: a literature survey. ACM Comput. Surv. 2003, 35(4):399-458.
- Hjelmas E, Low BK: Face detection: a survey. Comput. Vision Image Underst. 2001, 83: 236-274.
- Turk M: A random walk through Eigenspace. IEICE Trans. Inf. Syst. 2001, E84-D(12):1586-1595.
- Mittal N, Walia E: Face recognition using improved fast PCA algorithm. In Proceedings of the IEEE International Congress on Image and Signal Processing (CISP '08). Sanya, Hainan; 2008:554-558.
- Belhumeur PN, Hespanha JP, Kriegman DJ: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19: 711-720.
- Xu Y, Zhang D, Yang J, Yang J-Y: An approach for directly extracting features from matrix data and its application in face recognition. Neurocomputing 2008, 71: 1857-1865.
- Zhang D, Zhou Z-H: (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing 2005, 69: 224-231.
- Zhang D, Lu G: Review of shape representation and description techniques. Pattern Recognit. 2004, 37(1):1-19.
- Zhang D, Lu G: Evaluation of MPEG-7 shape descriptors against other shape descriptors. Multimed. Syst. 2003, 9: 15-30.
- Singh C, Walia E: Fast and numerically stable methods for the computation of Zernike moments. Pattern Recognit. 2010, 43(7):2497-2506.
- Wee C-Y, Paramesran R: On the computational aspects of Zernike moments. Image Vis. Comput. 2007, 25: 967-980.
- Lowe DG: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60(2):91-110.
- Soyel H, Demirel H: Facial expression recognition based on discriminative scale invariant feature transform. IET Electron. Lett. 2010, 46(5):343-345.
- Huang L, Shimizu A, Kobatake H: Robust face detection using Gabor filter features. Pattern Recognit. Lett. 2005, 26(11):1641-1649.
- Ojala T, Pietikäinen M: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24(7):971-987.
- Ahonen T, Hadid A, Pietikäinen M: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(12):2037-2041.
- Jun B, Kim T, Kim D: A compact local binary pattern using maximization of mutual information for face analysis. Pattern Recognit. 2011, 44: 532-543.
- Tan X, Triggs B: Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process. 2010, 19(6):1635-1648.
- Kim C, Oh J, Choi C: Combined Subspace Method Using Global and Local features For Face Recognition. 4th edition. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '05), Montreal, Canada; 2005:2030-2035.
- Zhou D, Yang X: Feature fusion based face recognition using EFM. Proceedings International Conference on Image Analysis and Recognition (ICIAR '04). Lect. Notes Comput. Sci. 2004, 3212: 643-650.
- Zhou D, Yang X, Peng N, Wang Y: Improved-LDA based face recognition using both facial global and local information. Pattern Recognit. Lett. 2006, 27: 536-543.
- Singh C, Walia E, Mittal N: Robust two-stage face recognition approach using global and local features. Vis. Comput. 2012, 28(11):1085-1098.
- Su Y, Shan S, Chen X, Gao W: Hierarchical ensemble of global and local classifiers for face recognition. IEEE Trans. Image Process. 2009, 18(8):1885-1895.
- Wong Y-W, Seng KP, Ang L-M: Dual optimal multiband features for face recognition. Expert Syst. Appl. 2010, 37(4):2957-2962.
- Aroussi ME, Hassouni ME, Ghouzali S, Rziza M, Aboutajdine D: Local appearance based face recognition method using block based steerable pyramid transform. Signal Process. 2011, 91: 38-50.
- Liu Z, Liu C: Fusion of color, local spatial and global frequency information for face recognition. Pattern Recognit. 2010, 43: 2882-2890.
- Jun B, Lee J, Kim D: A novel illumination-robust face recognition using statistical and non-statistical method. Pattern Recognit. Lett. 2011, 32: 329-336.
- Moore S, Bowden R: Local binary patterns for multi-view facial expression recognition. Comput. Vision Image Underst. 2011, 115: 541-558.
- Singh C, Walia E, Mittal N: Rotation invariant complex Zernike moments features and their application to human face and character recognition. IET Comput. Vision 2011, 5(5):255-265.
- Lajevardi SM, Hussain ZM: Higher order orthogonal moments for invariant facial expression recognition. Digit. Signal Process. 2010, 20: 1771-1779.
- Li S, Lee M-C, Pun C-M: Complex Zernike moments features for shape based image retrieval. IEEE Trans. Syst. Man. Cybern. C Appl. Rev. 2009, 39: 227-237.
- Singh C, Mittal N, Walia E: Face recognition using Zernike and complex Zernike moment features. Pattern Recognit. Image Anal. 2011, 21(1):71-81.
- Revaud J, Lavoue G, Baskurt A: Improving Zernike moments comparison for optimal similarity and rotation angle retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31(4):627-636.
- Jain A, Nandakumar K, Ross A: Score normalization in multimodal biometric systems. Pattern Recognit. 2005, 38: 2270-2285.
- The Facial Recognition Technology (FERET) face database http://www.nist.gov/itl/iad/ig/colorferet.cfm
- Yale face database http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
- Olivetti Research Laboratory (ORL) face database http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
- Li J, Pan J-S: A novel pose and illumination robust face recognition with a single training image per person algorithm. Chin. Optic. Lett. 2008, 6(4):255-257.
- Chen YM, Chiang J-H: Fusing multiple features for Fourier Mellin-based face recognition with single example image per person. Neurocomputing 2010, 73(16–18):3089-3096.
- Kuo C-H, Lee JD: Face recognition based on a two-view projective transformation using one sample per subject. IET Comput. Vision 2012, 6(5):489-498.
- Singh C, Sahan AM: Face recognition using complex wavelet moments. Opt. Laser Technol. 2013, 47: 256-267.
- Zhi R, Ruan Q: Two-dimensional direct and weighted linear discriminant analysis for face recognition. Neurocomputing 2008, 71: 3607-3611.
- Wang Y, Wu Y: Face recognition using Intrinsicfaces. Pattern Recognit. 2010, 43: 3580-3590.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.