Complementary feature sets for optimal face recognition

Singh, Chandan; Mittal, Neerja; Walia, Ekta

doi:10.1186/1687-5281-2014-35

Research
Open access
Published: 05 July 2014

Chandan Singh¹,
Neerja Mittal² &
Ekta Walia³

EURASIP Journal on Image and Video Processing volume 2014, Article number: 35 (2014)

Abstract

In face recognition tasks, a single kind of feature set is not adequate to generate superior results; thus, the selection and combination of complementary features are crucial steps. In this paper, the fusion of two useful descriptors, i.e., the Zernike moments (ZMs) and the local binary pattern (LBP)/local ternary pattern (LTP), is proposed. The ZM descriptor offers good global image representation and is invariant to image rotation and robust to noise, while the LBP/LTP descriptors capture the fine details within local parts of the face image and are insensitive to illumination variations. The fusion of the two is observed to combine the traits of both individual descriptors. The performance of diverse feature sets of ZMs (i.e., magnitude features, magnitude plus phase features, and the real plus imaginary component features) combined with the LBP/LTP descriptor is analyzed on the FERET, Yale, and ORL face databases. The recognition results achieved by the proposed method are approximately 10 to 30% higher than those obtained with these descriptors separately. Recognition rates of the proposed method are also found to be significantly better (i.e., by 8 to 24%) when only a single example image per person is available for training.

1 Introduction

In recent times, face recognition has become one of the most widely used biometric techniques, with a number of real-world applications such as human-computer interaction, surveillance, authentication, computer vision applications, computer user interfaces, etc. An automatic face recognition system consists of methods to ascertain a person's identity on the basis of his/her physiological characteristics. The sensitivity of available classifiers to different kinds of variations such as illumination variation, facial expression, facial occlusion, pose variation, aging, etc. is among the most challenging problems that researchers face [1].

In order to improve the existing face recognition techniques, discriminative competence of the invariant features selected to represent the face images should be high because, thereafter, classification is performed on the basis of these invariant features only. In literature, the approaches used to represent the face images are classified broadly into two categories, namely, the global feature extraction approaches and the local feature extraction approaches [2]. The global feature extraction approaches are based on the statistical methods, wherein features are extracted from the whole face image. In this category, the subspace-based methods, namely, principal component analysis (PCA), Fisher linear discriminant (FLD), two-dimensional PCA (2DPCA), and two-directional two-dimensional PCA (2D²PCA) [3–7], are some of the popular and most frequently employed techniques. Moment invariants, such as Hu's seven moment invariants and orthogonal rotation invariant moments such as Zernike moments (ZMs), pseudo-Zernike moments (PZMs), and orthogonal Fourier-Mellin moments (OFMMs), are observed to be very effective in global image description and recognition [8], and MPEG-7 uses some of them as region-based shape descriptors for image retrieval [9]. The magnitude of these moments is invariant to image rotation, and after applying some geometric transformations, it becomes invariant to translation and scale [10, 11].

Local feature extraction approaches deal with fine information within the specific parts of face images such as eyes, nose, mouth, etc. Recently, a lot of work has been done on these methods because the local features are known to be robust against illumination, occlusion, expression, and noise variations. The local feature extraction approaches have been classified into two categories, i.e., the sparse descriptors and the dense descriptors. The sparse descriptors initially divide a face image into patches and then determine its invariant features. A prominent descriptor in this category is the scale-invariant feature transform (SIFT) introduced by Lowe [12], which possesses useful characteristics of being invariant to scale and rotation. Soyel et al. used the discriminative SIFT (D-SIFT) approach for optimal facial expression recognition, but this method is somewhat susceptible to the illumination variation [13]. In face recognition technology, Gabor wavelet is one of the most frequently used and successful local image descriptors. It incorporates the characteristics of space and frequency domains. The local features extracted by Gabor filters are invariant to scale and orientation and are able to detect edges and lines in the face images [14]. The main difficulty with Gabor filters is their high computational complexity. In case of the dense descriptors, local binary pattern (LBP) is one of the most widely used approaches due to its invariance to monotonic gray-level changes and ease in extraction of the local features. Apart from texture analysis, it has provided excellent results in many areas of image processing and computer vision including its wide use in face recognition [15, 16]. Several variants of LBP are available in literature to represent face images with compact feature sets. Such variants also improve the classification performance of the basic LBP approach [17, 18].

In complex applications like face recognition, it is observed that one kind of feature set is not rich enough to capture the entire face information. Thus, finding and combining the complementary feature sets have become an active research topic in recent years. Specifically, global features are related to the holistic characteristics of face, whereas local features describe the finer details within face images, so it seems logical to combine both of these feature sets since the information conveyed by them belongs to different attributes of the face images. In recent times, many researchers are developing the face recognition algorithms by combining the multiple feature sets. Kim et al. [19] have proposed a combined subspace-based approach using both global and local features obtained by applying linear discriminant analysis (LDA)-based method for face recognition. Zhou and Yang [20] have proposed fusing feature Fisher classifier (F³C) approach where the face image is first divided into smaller subimages and then the discrete cosine transform (DCT) technique is applied to the whole image and some subimages to extract the holistic and local facial features. After concatenating these DCT-based holistic and local facial features, the enhanced Fisher linear discriminant model (EFM) has been employed to generate a low-dimensional feature vector. Similarly, local and global information extracted by using DCT coefficients along with the Fisher classifier developed for high-dimensional multiclass problem have been proposed in [21]. Singh et al. proposed a robust two-stage face recognition approach by the fusion of global ZMs and Weber law descriptor (WLD)-based local features [22]. The usefulness of combining the global and local facial features is presented in [23] where a hierarchical ensemble of global and local features is performed. In this technique, 2D Fourier transform is used to extract the global features and the Gabor wavelet is opted to extract local features. 
Subsequently, equal weights are assigned to both the global and local features for combining the outputs of two classifiers (although it is established by the authors that the contribution of both global and local features is different). Wong et al. have proposed dual optimal multiband feature (DOMF) method for face recognition in which wavelet packet transform (WPT) decomposes the image into frequency subbands and the multiband feature fusion technique is incorporated to select optimal multiband feature sets that are invariant to illumination and facial expression. In this method, parallel radial basis function (RBF) neural networks are used to classify the two sets of features. The decision scores are then combined and processed by an adaptive fusion mechanism [24]. The use of steerable pyramid decomposition (S-P transform) both in global and local appearance and feature/score fusion has been analyzed in [25]. In this work, each face image is described by a subset of band-filtered images containing steerable pyramid coefficient. These S-P subbands are divided into small subblocks to extract the compact and meaningful feature vectors that provide a better representation of the class information. Recently, Liu and Liu [26] have proposed an approach for face recognition that fuses color and local spatial and global frequency information. This method is composed of multiple features of face images derived from LBP, DCT, hybrid color space, and the Gabor image representation. The combination of Gabor and LBP enhances the power of spatial histogram that is impressively insensitive to appearance variations. This method has proven to be robust against illumination, pose, and expression variations [27, 28]. However, the combination of Gabor and multiresolution LBP descriptors requires significantly greater computation time.

Although a lot of research is going on combining the multiple feature sets, still the selection of complementary feature sets for fusion and the techniques for combining these divergent feature sets are a challenge. In view of that, in this paper, a fusion of two complementary feature sets is proposed, where the global information of the face images is extracted by the ZM descriptor employing its rotation invariance characteristic, while the LBP/LTP descriptor captures the significant local information. Among various global shape descriptors, ZMs are observed to be one of the best shape descriptors because of their many attractive characteristics [8]. They possess minimum information redundancy, rotation invariance of their magnitude, robustness to noise, etc. The estimation of head movement by using the phase coefficients of ZMs of the original image and that of the rotated image has been employed to generate a set of features that is significantly tolerant to pose variation as well [29]. The magnitude features of ZMs obtained at some higher order of moments are observed to be invariant to expression variation [30]. On the other hand, the LBP descriptor is observed to be relatively more insensitive to illumination changes. It is computationally efficient as well as quite simple to implement. Recently, a useful extension to this approach is introduced, namely, the local ternary patterns (LTP), that is observed to be more discriminative and invariant to image noise in near-uniform regions as compared to LBP [18]. Particularly, combining the feature sets that are invariant to global variations as well as to local changes of face images would be an effective approach to achieve an optimal face recognition system. As discussed earlier, the information conveyed by the ZM and LBP/LTP descriptors are distinct and belong to different aspects of a facial image. Fusion of these descriptors is expected to be enriched with the useful characteristics of both of them. 
One of the critical issues involved in the fusion process is the time spent in the computation of the combined features. It is shown through time analysis of the proposed approach that the total time required for the recognition process is very small and can be afforded by PCs and other low computation devices.

The ZM descriptor provides three different sets of features, namely, magnitude features, combined magnitude and phase features [31, 32], and the modified real and imaginary component features [29]. In this study, these diverse feature sets of ZMs are referred to as ZM_mag, ZM_magPhase, and ZM_component, respectively. The performance of the feature sets of ZMs combined with LBP descriptor, in comparison to that of the ZMs coupled with the LTP descriptor, is also analyzed. Consequently, the proposed fusion of the diverse feature sets of ZMs and the LBP/LTP descriptors provides various combined approaches such as ZM_magLBP, ZM_magPhaseLBP, ZM_componentLBP, ZM_magLTP, ZM_magPhaseLTP, and ZM_componentLTP. In order to compare the performance of these combined approaches to that of the individual ZM and LBP/LTP approaches, exhaustive experiments are performed on three prominent face databases, namely, FERET, Yale, and ORL, against pose, illumination, expression, and noise variations. The results obtained show that the recognition rate of the combined approaches, in comparison to that of their individual counterparts, is significantly better varying between 10 and 30%. Experimental results also prove the efficacy of the proposed methods over other existing works. A significant improvement in recognition rate is achieved for the case of single training image per person.

The rest of the paper is organized as follows. Section 2 presents a brief overview of the ZM approach and the diverse feature sets obtained from it, together with a brief introduction to the LBP/LTP approaches. Section 3 describes the similarity measures used to evaluate the matching scores of these methods. Section 4 describes the procedure involved in the proposed fusion of the ZM and the LBP/LTP descriptors. Section 5 presents the experiments and results, and Section 6 presents the conclusions and future directions.

2 Baseline image descriptors

2.1 Global image descriptor

2.1.1 Zernike moments

The Zernike functions constitute a set of orthogonal basis functions mapped over the unit circle. Zernike moments of a function f (x, y) are constructed by projecting it onto those functions. The ZMs of order n and repetition m are defined by

Z_{nm} = \frac{n + 1}{π} \iint_{x^{2} + y^{2} \leq 1} f (x, y) V_{nm}^{*} (x, y) dxdy

(1)

where n ≥ 0, |m| ≤ n, n − |m| is even, and $V_{nm}^{*} (x, y)$ is the complex conjugate of the Zernike function V_nm(x, y), where

V_{nm} (x, y) = R_{nm} (x, y) e^{jmθ}

(2)

with $j = \sqrt{- 1}$, θ = tan⁻¹(y/x), θ ∈ [0, 2π], and

R_{nm} (x, y) = \sum_{s = 0}^{(n - |m|) / 2} \frac{{(- 1)}^{s} (n - s)! {(x^{2} + y^{2})}^{\frac{n - 2 s}{2}}}{s! (\frac{(n + |m|)}{2} - s)! (\frac{(n - |m|)}{2} - s)!}

(3)

The ZMs are derived for a discrete image function using zeroth-order approximation of Equation 1 given by [10]

Z_{nm} = \frac{n + 1}{π} \sum_{i = 0}^{N - 1} \sum_{k = 0}^{N - 1} f (i, k) V_{nm}^{*} (x_{i}, y_{k}) Δ^{2}, \quad x_{i}^{2} + y_{k}^{2} \leq 1

(4)

where

x_{i} = \frac{2 i + 1 - N}{N \sqrt{2}}, \quad y_{k} = \frac{2 k + 1 - N}{N \sqrt{2}}, \quad i, k = 0, 1, 2, \dots, N - 1, \quad \text{and} \quad Δ = \frac{2}{N \sqrt{2}}

(5)
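
As an illustration of Equations 3 to 5, the following minimal sketch computes a single Zernike moment of a square grayscale image stored as a list of rows. The function names `radial_poly` and `zernike_moment` are hypothetical, the code uses only the Python standard library, and it is a direct, unoptimized transcription of the formulas rather than the implementation used in the experiments.

```python
import cmath
import math


def radial_poly(n, m, r):
    """Radial polynomial R_nm of Equation 3 at radius r = sqrt(x^2 + y^2)."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * r ** (n - 2 * s)
    return total


def zernike_moment(img, n, m):
    """Zeroth-order approximation of Z_nm (Equation 4) for an N x N image."""
    if n < 0 or abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need n >= 0, |m| <= n and n - |m| even")
    size = len(img)
    scale = size * math.sqrt(2.0)
    delta = 2.0 / scale                      # pixel width from Equation 5
    acc = 0j
    for i in range(size):
        x = (2 * i + 1 - size) / scale       # x_i from Equation 5
        for k in range(size):
            y = (2 * k + 1 - size) / scale   # y_k from Equation 5
            r = math.hypot(x, y)
            if r > 1.0:                      # keep only points in the unit disc
                continue
            theta = math.atan2(y, x)
            # Conjugate basis function: V*_nm = R_nm * exp(-j m theta)
            acc += img[i][k] * radial_poly(n, m, r) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * acc * delta ** 2
```

With the mapping of Equation 5 the whole image lies inside the unit circle, so for a constant image of ones the moment Z_00 equals N²Δ²/π = 2/π, which gives a quick sanity check. Rotating an image by 90° on the grid changes only the phase of each moment, so the magnitudes |Z_nm| agree.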

2.1.2 Diverse feature sets of ZMs and related work

Since the magnitude of ZMs is invariant to rotation, usually it is used as invariant image descriptor in many image analysis and pattern recognition applications. The phase component of ZMs is, however, ignored. It is observed that the phase component also carries equally significant information as the magnitude component does [31]. Therefore, in recent years, significant research work has been carried out to incorporate ZM phase coefficients along with their magnitudes as invariant feature descriptors. At present, there are two approaches to realize this objective. In the first approach, developed by Revaud et al. [33], a similarity measure incorporating both the magnitude and phase coefficients of the query and database image is used. The method provides excellent pattern matching performance but at the cost of enhanced computation time. In the second approach, the rotation angle between a query image and the database image is estimated. It is assumed that the query image is the rotated version of the original database image. The estimated rotation angle is used to cancel the effect of rotation in order to compare the phases of the query and database images. Recently, we devised a novel way to correct phase coefficients without estimating rotation angle. The method was applied successfully in face recognition [29]. The method works as follows: Suppose Z_nm and Z′_nm are the ZMs of database and query images, respectively, and ϕ_nm and ϕ′_nm are their respective phase angles. We compute $mθ = ϕ_{nm} - ϕ_{nm}^{'}$ and correct the ZMs of the query image by evaluating $Z_{nm}^{' c} = Z_{nm}^{'} e^{jmθ}$ . If the two images are same, then $Z_{nm}^{' c} = Z_{nm}$ , otherwise $Z_{nm}^{' c} \neq Z_{nm}$ ; therefore, the real and imaginary components of ZMs of the query and database images can be compared separately, instead of comparing only their magnitude. 
An attractive advantage of this approach is that by using two-component feature vectors, the number of features is almost doubled as compared to the ZM magnitude only features for the same order of moments. This approach has additional advantages of having low computation cost, less susceptibility to image noise, and numerical stability, in addition to providing better recognition rate [29]. Throughout the paper, these features of ZMs based on magnitude, magnitude together with corrected phase [31], and the corrected real and imaginary parts of ZMs [29] are referred to as ZM_mag, ZM_magPhase, and ZM_component, respectively.
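
The phase correction described above can be sketched as follows. This is a literal reading of the text, in which the correction angle mθ is taken as the phase difference of each moment pair; the function name `corrected_components` is hypothetical, and the details of the method in [29] may differ.

```python
import cmath


def corrected_components(z_db, z_query):
    """Return (Re, Im) of the corrected query moment Z'^c = Z' * exp(j*m*theta).

    The angle m*theta is taken as phi_nm - phi'_nm, the phase difference
    between the database moment and the query moment.
    """
    m_theta = cmath.phase(z_db) - cmath.phase(z_query)
    z_corr = z_query * cmath.exp(1j * m_theta)
    return z_corr.real, z_corr.imag


# If the query is the database image rotated by an angle alpha, its moment
# is Z' = Z * exp(-j*m*alpha); the correction then recovers Z.
z_db = 0.3 + 0.4j
m, alpha = 2, 0.7
z_query = z_db * cmath.exp(-1j * m * alpha)
re_c, im_c = corrected_components(z_db, z_query)
```

For a query image of a different face the magnitudes |Z′_nm| and |Z_nm| differ, so the corrected real and imaginary parts differ from those of the database moment even though the phases have been aligned.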

2.2 Local image descriptor

2.2.1 Local binary pattern

Ojala et al. have introduced the local binary patterns (LBP) for effective texture description that has been used in many image processing and computer vision applications [15]. The most important property of this approach is its tolerance against illumination variation. Being computationally simple, it provides significant advantage over other approaches. The LBP operator takes some specific neighborhood around each pixel, and it then thresholds the values of these neighborhood pixels with respect to the central pixel's value. The resulting binary pattern is used as an element of the local image descriptor. Thus, it assigns a label to every pixel p_i of an image by thresholding its respective 3 × 3 neighborhood values with the value of the central pixel p_c and producing the result in the form of an 8-bit binary code. The LBP operator is computed as

LBP = \sum_{i = 0}^{7} 2^{i} b (p_{i} - p_{c})

(6)

b (p_{i} - p_{c}) = \begin{cases} 1, & \text{if } p_{i} \geq p_{c} \\ 0, & \text{otherwise} \end{cases}

(7)

where the values of i move along the eight neighbors of the central pixel. In case of 8-bit patterns, Ojala et al. [15] have observed that out of 2⁸ patterns, only 58 uniform patterns provide approximately 90% information of the image neighborhoods while the remaining patterns consist of mostly noise. This attribute significantly reduces the number of LBP histogram bins from 256 to 59 where all the non-uniform patterns are stored in a single bin, the 59th bin.
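
A minimal sketch of Equations 6 and 7 and of the 59-bin uniform-pattern histogram follows. It assumes the usual definition of a uniform pattern (at most two 0/1 transitions in the circular 8-bit code) and an arbitrary clockwise neighbor ordering starting at the top-left pixel; the names `lbp_code`, `transitions`, and `lbp_histogram` are hypothetical.

```python
# Neighbor positions in a 3 x 3 patch, clockwise from the top-left corner.
# The ordering is an assumption; any fixed ordering gives a valid LBP code.
OFFSETS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch):
    """8-bit LBP code of the central pixel of a 3 x 3 patch (Equations 6, 7)."""
    pc = patch[1][1]
    return sum(1 << i for i, (r, c) in enumerate(OFFSETS) if patch[r][c] >= pc)


def transitions(code):
    """Number of 0/1 changes when the 8-bit code is read circularly."""
    rotated = (code >> 1) | ((code & 1) << 7)
    return bin(code ^ rotated).count("1")


# Codes with at most two transitions are uniform; each gets its own bin.
UNIFORM = [c for c in range(256) if transitions(c) <= 2]
BIN_OF = {code: idx for idx, code in enumerate(UNIFORM)}


def lbp_histogram(img):
    """59-bin histogram: 58 uniform bins plus one shared non-uniform bin."""
    hist = [0] * 59
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            patch = [row[c - 1:c + 2] for row in img[r - 1:r + 2]]
            hist[BIN_OF.get(lbp_code(patch), 58)] += 1
    return hist
```

Enumerating all 256 codes with `transitions` yields exactly 58 uniform patterns, which matches the bin count quoted above. Border pixels are skipped here for simplicity; this is a design choice of the sketch, not something stated in the text.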

2.2.2 Local ternary patterns

The local histogram features obtained from LBP have proven to be highly discriminative in face recognition [16, 17]. However, they are found to be sensitive to noise because of the fact that they are thresholded exactly at the value of the central pixel especially in near-uniform and smooth regions of face images like cheeks or forehead. Recently, an important extension to the original LBP is provided by Tan et al. [18]. It generates three-valued codes corresponding to each image pixel. In their method, the binary LBP code is replaced with the ternary LTP code and the gray values in a zone of width ± w around the central pixel p_c are quantized to 0 and the values above this zone are quantized to +1 while those below it are quantized to −1. Specifically, the value of b(x) given in Equation 7 is replaced with the following three-valued function:

b^{'} (p_{i}, p_{c}, w) = \begin{cases} 1, & p_{i} \geq (p_{c} + w) \\ 0, & |p_{i} - p_{c}| < w \\ - 1, & p_{i} \leq (p_{c} - w) \end{cases}

(8)

where w is a user-defined threshold. The LTP code is assumed to be invariant to image noise but may not be strictly invariant to gray-level transformations. The concept of uniform patterns to obtain histogram features is also applicable to LTP. For simplicity, the three-valued LTP codes are split into their positive and negative halves, which generate two sets of histogram features: one corresponds to the positive patterns and the other to the negative patterns [18].
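
The ternary quantization of Equation 8 and the split into positive and negative patterns can be sketched as below. The function name `ltp_codes` and the clockwise neighbor ordering are assumptions made for illustration; each returned code is an ordinary 8-bit pattern that can be histogrammed in the same way as an LBP code.

```python
# Neighbor positions in a 3 x 3 patch, clockwise from the top-left corner
# (an assumed ordering, as for the LBP operator).
OFFSETS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def ltp_codes(patch, w):
    """Return (positive, negative) 8-bit codes of the central pixel.

    A neighbor contributes to the positive code if it is at least p_c + w,
    to the negative code if it is at most p_c - w, and to neither code if
    it lies inside the zone of width w around p_c (Equation 8).
    """
    pc = patch[1][1]
    pos = neg = 0
    for i, (r, c) in enumerate(OFFSETS):
        p = patch[r][c]
        if p >= pc + w:
            pos |= 1 << i
        elif p <= pc - w:
            neg |= 1 << i
    return pos, neg


pos, neg = ltp_codes([[10, 20, 30], [40, 20, 0], [15, 25, 20]], w=5)
```

In this example the neighbors 30, 25, and 40 set bits 2, 5, and 7 of the positive code, while 10, 0, and 15 set bits 0, 3, and 6 of the negative code; the two neighbors equal to the central value 20 fall inside the tolerance zone and set no bit.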

3 Similarity measures used

In this section, the similarity measures used for finding the matching scores of ZM and LBP/LTP descriptors are discussed briefly. In this work, it is observed that the fusion of matching scores obtained by applying L₂ − Norm/L₁ − Norm on the ZM descriptor and histogram intersection on the LBP/LTP descriptor generates superior performance. Hence, these different similarity measurement techniques are used on the feature sets generated by these descriptors. Since the matching scores obtained from these different approaches are heterogeneous, normalization is required to transform these matching scores to a common range before combining them.

3.1 Similarity measure for ZM descriptor

The magnitude features of ZMs, i.e., ZM_mag, of two images are compared by evaluating the normalized Euclidean distance (L₂ − Norm) between them. The normalized L₂ − Norm between the two sets of feature vectors of ZMs is given by

d_{mag} = \frac{1}{L} \sqrt{\sum_{i = 1}^{L} \frac{{(|Z_{i}^{'}| - |Z_{i}|)}^{2}}{\max ({|Z_{i}^{'}|}^{2}, {|Z_{i}|}^{2})}}

(9)

where $Z_{i}^{'}$ and Z_i are the feature vectors of the query and the database images, respectively, and L represents the size of the feature vector consisting of the magnitude of ZMs. The normalized Euclidean distance d_Phase defined by [31], between ZM phases of the query and the database images, is computed as

d_{Phase} = \frac{1}{L} \sqrt{\sum_{i = 1}^{L} \frac{1}{2 π} {(|φ_{i}^{'}| - |φ_{i}|)}^{2}}

(10)

where φ is the phase angle of the database image and φ′ is the phase angle after estimating the rotation angle between the query and database images and correcting the phase [31]. The total distance d_magPhase between the feature vectors consisting of ZM_magPhase coefficients has been evaluated by using the distances d_mag and d_Phase, computed as per Equations 9 and 10, respectively. The formula used to compute the d_magPhase is given as

d_{magPhase} = \frac{w_{1} d_{mag} + w_{2} d_{Phase}}{w_{1} + w_{2}}

(11)

Normally, equal weights are assigned to simplify this process, i.e., w₁ = w₂ = 0.5.
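
The three distances of Equations 9 to 11 can be sketched as follows. The sketch assumes that the normalized squared differences in Equation 9 are summed over i = 1, …, L inside the square root, as in Equation 10, and that the phases passed to `d_phase` have already been corrected; the function names are hypothetical, and zero-valued moment pairs are skipped to avoid division by zero, which is a choice of this sketch.

```python
import math


def d_mag(z_query, z_db):
    """Normalized L2 distance between ZM magnitudes (Equation 9)."""
    total = 0.0
    for zq, zd in zip(z_query, z_db):
        mq, md = abs(zq), abs(zd)
        denom = max(mq * mq, md * md)
        if denom > 0.0:              # skip pairs where both moments vanish
            total += (mq - md) ** 2 / denom
    return math.sqrt(total) / len(z_db)


def d_phase(phi_query, phi_db):
    """Normalized L2 distance between corrected ZM phases (Equation 10)."""
    total = sum((abs(pq) - abs(pd)) ** 2 / (2.0 * math.pi)
                for pq, pd in zip(phi_query, phi_db))
    return math.sqrt(total) / len(phi_db)


def d_mag_phase(dm, dp, w1=0.5, w2=0.5):
    """Weighted combination of the two distances (Equation 11)."""
    return (w1 * dm + w2 * dp) / (w1 + w2)
```

With the default equal weights, `d_mag_phase` reduces to the plain average of the magnitude and phase distances.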

The ZM_component descriptor includes the modified real and imaginary parts of ZMs to formulate a two-component feature vector for each ZM. The normalized L₁ − Norm-based distance measure for the evaluation of similarity between component features of the database and the query images is given as under

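d_{comp} = \frac{1}{L} \sum_{i = 1}^{L} \left[ \frac{|\operatorname{Re} (Z_{i}) - \operatorname{Re} ({Z^{'}}_{i}^{c})|}{\max (|\operatorname{Re} (Z_{i})|, |\operatorname{Re} ({Z^{'}}_{i}^{c})|)} + \frac{|\operatorname{Im} (Z_{i}) - \operatorname{Im} ({Z^{'}}_{i}^{c})|}{\max (|\operatorname{Im} (Z_{i})|, |\operatorname{Im} ({Z^{'}}_{i}^{c})|)} \right]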

(12)

The above mentioned distance metric d_comp has proven to be a better similarity measure between two sets of component feature vectors [29].
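
A minimal sketch of the component distance of Equation 12 is given below, assuming that both the real-part term and the imaginary-part term are accumulated for every index i. The function name `d_comp` is hypothetical, and a term whose two components are both zero is treated as contributing nothing, which is an assumption of this sketch.

```python
def d_comp(z_db, z_query_corr):
    """Normalized L1 distance between real/imaginary ZM components (Eq. 12).

    z_db holds the database moments and z_query_corr the phase-corrected
    query moments, both as sequences of Python complex numbers.
    """
    def term(a, b):
        denom = max(abs(a), abs(b))
        return abs(a - b) / denom if denom > 0.0 else 0.0

    total = 0.0
    for z, zc in zip(z_db, z_query_corr):
        total += term(z.real, zc.real) + term(z.imag, zc.imag)
    return total / len(z_db)
```

For example, comparing the single moment 2 + 4j with 1 + 4j gives |2 − 1| / 2 = 0.5 from the real parts and 0 from the imaginary parts, so the distance is 0.5.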

3.2 Similarity measure for LBP/LTP descriptor

The histogram intersection distance has been used to compare the feature vectors of query and database images for both the LBP and the LTP descriptors. The histogram intersection distance evaluated for every bin n between the database image and the query image is given as

D_{h} (Hd, Hq) = \frac{\sum_{n = 1}^{B} min (H d_{n}, H q_{n})}{min (\sum_{n = 1}^{B} H d_{n}, \sum_{n = 1}^{B} H q_{n})}, D_{h} (Hd, Hq) \in [0, 1]

(13)

where Hd and Hq are the histograms consisting of LBP/LTP features of the database and the query images, respectively, and B is the total number of bins in the histograms. If the database and the query images are identical, then the value of D_h(Hd, Hq) is 1.
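
Equation 13 translates directly into a few lines of code. The function name `histogram_intersection` is hypothetical, and returning 0 for two empty histograms is an assumption of this sketch.

```python
def histogram_intersection(hd, hq):
    """Histogram intersection of Equation 13; the result lies in [0, 1]."""
    overlap = sum(min(a, b) for a, b in zip(hd, hq))
    norm = min(sum(hd), sum(hq))
    return overlap / norm if norm > 0 else 0.0
```

Unlike the ZM distances, this measure grows with similarity: identical histograms give 1, and histograms with no overlapping mass give 0.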

4 Fusion of ZM and LBP/LTP descriptors

The ZM descriptor and the LBP/LTP descriptors are observed to be complementary to each other, and their fusion is expected to discriminate face images even in the presence of diverse variations. The ZM descriptor is observed to extract the global information of the images more effectively than other global descriptors [9]. On the other hand, the LBP and the LTP descriptors are established methods for representing the finer interior details within face images. The feature set formed by the fusion of these independent approaches, i.e., ZMs and the LBP/LTP, is expected to inherit the invariant characteristics of both. Exhaustive experiments performed against pose, illumination, expression, and noise variations on suitable databases support this hypothesis.

The procedure followed to recognize the face images by the proposed combined approaches, i.e., ZM_magLBP, ZM_magPhaseLBP, ZM_componentLBP, ZM_magLTP, ZM_magPhaseLTP, and ZM_componentLTP, is briefly described in Figure 1. The recognition of face images through the proposed fusion of feature sets includes three stages - feature extraction, fusion of similarity score, and classification. The first stage of this procedure creates the invariant feature sets extracted by using ZM and LBP/LTP descriptors. The second stage involves fusion of the matching scores obtained from these feature sets after applying the similarity measures as described in the previous section. A number of feasible techniques such as fusion at the feature extraction level, matching score level, or decision level exist for combining the multiple feature sets. It is not easy to combine the information at the feature level when the feature sets obtained by different techniques are either inaccessible or incompatible. Fusion at the decision level is too rigid as only a limited amount of information is available at this level. Therefore, integration at the matching score level is generally preferred due to the ease of accessing and combining matching scores [34]. In the proposed work, feature vectors are obtained by applying ZM and LBP/LTP descriptors which provide complementary information. Further, we observed that for the LBP/LTP descriptor, the matching score evaluated by the histogram intersection measure gives better results than using L₂ − Norm. Hence, in this work, the fusion at the matching score level is employed wherein the histogram intersection (using Equation 13) and L₂ − Norm (using Equations 9 and 11 for ZM_mag and ZM_magPhase)/L₁ − Norm (using Equation 12 for ZM_component) is used for evaluating the matching scores of the feature vectors obtained from the LBP/LTP and ZM descriptors, respectively. 
Thereafter, these individual matching scores are combined by using the sum rule to generate a single scalar score which is then used to make the final decision. The sum rule to compute the fusion of individual matching scores is given as below

F_{sr} = \frac{S_{Z} + (1 - S_{L})}{2}

(14)

where S_Z and S_L represent the matching scores of the ZM descriptor and the LBP/LTP descriptors, respectively. The matching scores of these approaches are normalized before fusion. Normalization is required to map the matching scores obtained from multiple frameworks to a common range so that they can be easily combined. In order to combine these matching scores, S_L is subtracted from 1, so that the histogram intersection would now signify higher similarity with lower values. Finally, in the third stage, we use the nearest neighbor rule to perform classification. This method always gives us only one recognized image which is labeled as either correct or incorrect in order to evaluate the recognition performance. The recognition rate (in percentage) is measured by using the following formula:

RecRate = \frac{N_{test} - N_{f}}{N_{test}} \times 100

(15)

where N_test is the total number of images in the test set and N_f is the number of images recognized incorrectly.
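
The second and third stages (score fusion by Equation 14, nearest neighbor classification, and the recognition rate of Equation 15) can be sketched as follows. The text does not state which normalization is used, so min-max scaling over the gallery is an assumption of this sketch, as are the function names `min_max`, `fuse`, `classify`, and `recognition_rate`.

```python
def min_max(scores):
    """Map scores to [0, 1]; min-max scaling is an assumed normalization."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(s_z, s_l):
    """Sum rule of Equation 14; lower fused values mean higher similarity."""
    return (s_z + (1.0 - s_l)) / 2.0


def classify(zm_distances, hist_similarities, labels):
    """Nearest neighbor rule on fused scores of one query against a gallery."""
    s_z = min_max(zm_distances)          # ZM distance: smaller is better
    s_l = min_max(hist_similarities)     # intersection: larger is better
    fused = [fuse(a, b) for a, b in zip(s_z, s_l)]
    best = min(range(len(fused)), key=fused.__getitem__)
    return labels[best]


def recognition_rate(predicted, actual):
    """Recognition rate in percent (Equation 15)."""
    n_test = len(actual)
    n_f = sum(p != a for p, a in zip(predicted, actual))
    return (n_test - n_f) / n_test * 100.0
```

Subtracting S_L from 1 puts both scores on a "smaller is better" scale, so a single `min` over the fused values implements the nearest neighbor decision.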

5 Experiments and results

In order to evaluate the performance of the considered autonomous approaches in comparison to that of the proposed combined approaches, experiments are performed on three well-known and calibrated face databases, namely, the FERET face database [35] consisting of images with diverse variations, the Yale face database [36] consisting of illumination and expression variations, and the ORL face database [37] having small pose (tilt/yaw) changes. It is well known that the accuracy of a face recognition system is significantly affected by the kind of variations present in the images of the face database as well as by the number of images of each subject (i.e., person) in the training set. Thus, exhaustive experiments are performed in a comprehensive and deterministic manner with respect to the different types of variations present in these databases. The number of training images per person is also varied to observe its effect on recognition accuracy. The best results are highlighted in italics. All the experiments are performed in Visual C++ 6.0 under the Microsoft Windows environment on a PC with a 3.0-GHz CPU and 3 GB of RAM.

5.1 Performance on FERET database

The FERET grayscale face database has become the most popular standard database in the field of face recognition. We have performed experiments on two subsets of this database, covering frontal to profile pose variation. The first subset is formed by randomly selecting 100 persons with seven different poses (yaw): 0°, ±22.5°, ±67.5°, and ±90°. The second subset consists of FERET ‘b’ category images of 200 persons under different illumination, expression, and pose angles of 0°, ±15°, ±25°, ±40°, and ±60°. In this work, the first subset is called FERET_A and consists of 700 images. The second subset is named FERET_B and contains 2,200 images. In addition, the FERET evaluation protocol partitions the database into a gallery (1,196 images of 1,196 persons) and four probe sets, namely, fafb, fc, dupI, and dupII. The images in the fafb set have facial expression variation, the fc set contains images with illumination variations, and the images in the dupI and dupII sets represent aging effects. For detailed experimentation against pose, expression, and illumination variations, various data partitions are generated for these subsets, which are described in Table 1. The original images of this database are of size 256 × 384 pixels. We resized the FERET_A and FERET_B images to 128 × 128 pixels in order to reduce the time taken for conducting the experiments, while the face images from the FERET_Gallery/Probe subset are cropped and resized to 64 × 64 pixels. Some sample images, for one person, from this database are shown in Figure 2. To extract the local LBP/LTP features, the face images of the FERET_A and FERET_B subsets are partitioned into 64 patches of 16 × 16 pixels and those of the FERET_Gallery/Probe subset into 64 patches of 8 × 8 pixels, while the global ZM features are extracted from whole face images.
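
The patch partition described above can be sketched as follows, assuming a regular, non-overlapping 8 × 8 grid; the function name `split_into_patches` and the `grid` parameter are hypothetical.

```python
def split_into_patches(img, grid=8):
    """Split a square image (list of rows) into grid * grid equal patches."""
    size = len(img) // grid              # patch side length in pixels
    patches = []
    for pr in range(grid):
        for pc in range(grid):
            rows = img[pr * size:(pr + 1) * size]
            patches.append([row[pc * size:(pc + 1) * size] for row in rows])
    return patches


# A 128 x 128 image gives 64 patches of 16 x 16 pixels,
# and a 64 x 64 image gives 64 patches of 8 x 8 pixels.
big = [[0] * 128 for _ in range(128)]
small = [[0] * 64 for _ in range(64)]
big_patches = split_into_patches(big)
small_patches = split_into_patches(small)
```

If one 59-bin uniform-pattern histogram were computed per patch and the histograms concatenated, the local feature vector would have 64 × 59 = 3,776 entries; the concatenation step is an assumption here, since the text does not spell out how the patch histograms are combined.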

Table 1 Data partition on FERET database for performing various experiments

In order to analyze the performance of the proposed combined approaches, the first set of comprehensive experiments is performed on the FERET_A1 category as described in Table 1. The different possible trials for this setup, containing various combinations of the training and the test sets, are shown in Table 2. The average recognition performance of the individual and the combined approaches over these seven trials is presented in Figure 3 for different values of the maximum order of ZMs, denoted by n_max. Note that the value of n_max has no effect on the performance of the LBP and LTP descriptors, so the results presented for the LBP/LTP descriptors remain the same for each value of n_max. From the results presented, it is observed that among the autonomous approaches, the performance of the ZM_component approach is better than that of the others. However, the proposed combined approaches exhibit significantly higher recognition rates than their individual counterparts. In this experiment, the ZM_magPhaseLBP approach provides the highest recognition rate at 71.24%.

Table 2 Different trials for one image in training and remaining six in test set for FERET_A1 category


Next, experiments are performed on the FERET_A2 category, and the recognition results are presented in Figure 4 for both the individual and the combined approaches over different values of the maximum order of moments n_max used for the ZM descriptors. Although the basic LBP/LTP descriptors are not invariant to rotation, in this category they perform better than the ZM descriptor. This is because the higher pose angles occlude a significant portion of the face, and for this kind of distortion, the local feature sets are observed to be more successful than the global features. The proposed fusion of the ZM descriptors and the LBP/LTP descriptors achieves an improvement of approximately 20% in the recognition results over these individual approaches. It is also noticed that the ZM descriptors coupled with the LBP descriptor generate superior results, and the highest recognition rate of 77.33% is achieved by the ZM_magPhaseLBP approach. Among the LTP-based combinations, LTP combined with the ZM_component descriptor, i.e., ZM_componentLTP, provides better results than the others.

Further experiments are performed on the FERET_A3, FERET_A4, and FERET_A5 categories. The recognition results for these setups are shown in Table 3. On this database, fusion of the ZM features obtained for n_max = 9 provides better results; accordingly, all other experiments on this database have been conducted only for this order of moments. It is observed from Table 3 that the fusion of the ZM and LBP/LTP descriptors improves the recognition rates by approximately 10 to 20%. Further, the recognition rates decline significantly as the pose angle of the test images increases (e.g., the highest recognition rate is only 74.5% for FERET_A4, which contains pose variations of ±67.5° in the test images). This is expected because a higher pose angle occludes a significant part of the face. The highest recognition rate of 86.5% is achieved by the ZM_magPhaseLBP approach on the FERET_A3 category. On the FERET_A5 category, a superior recognition rate of 99.5% is achieved by both the ZM_componentLBP and the ZM_componentLTP approaches. Similarly, the ZM_magPhaseLBP approach provides the highest recognition rate on the FERET_A4 category, which reveals the contribution of the ZM phase coefficients towards the improvement in recognition results.

Table 3 Performance of the considered approaches on FERET database


For FERET_B, the recognition results obtained on the FERET_B1 and FERET_B2 categories are also shown in Table 3. The experimental results show that the combined approaches improve the recognition rate by approximately 15 to 20% over the individual approaches. The highest recognition rates of 78.5 and 88.2% are achieved by the ZM_magPhaseLBP descriptor for the FERET_B1 and FERET_B2 categories, respectively. The results obtained on the FERET_Gallery/Probe set confirm the robustness of the proposed system against changes in expression and lighting; however, further research on aging is required. In general, on the FERET_B and FERET_Gallery/Probe subsets, the combination of ZM_magPhase with the LBP descriptor yields higher recognition rates than the other combinations.

5.2 Performance on Yale database

The Yale face database contains 11 images per person for 15 individuals resulting in a total of 165 images. The images in this database have major variations in illumination and facial expressions. They also have images demonstrating occlusion of eyes with eyeglasses. The original size of the images in this database is 243 × 320 pixels with 256 gray levels. For the experiments, these are cropped down to 64 × 64 pixels. Sample cropped images from this database, for one person, are shown in Figure 5. Here also, the face images are partitioned into 64 patches of 8 × 8 pixels to extract the local LBP/LTP features.

In order to examine the improvement in performance by the proposed combined approaches across the expression and illumination variations, exhaustive experiments are performed on this database by taking different numbers of images in the training and test sets. For this purpose, various data partitions have been generated, which are presented in Table 4. The first set of comprehensive experiments is performed on the YALE 1 category, where out of the 11 images of each person, one image is taken in the training set and all the remaining ones are placed in the test set. This process is repeated 11 times, each time taking a different face image of each person in the training set. Average recognition results over the 11 runs are presented in Figure 6 for different n_max of ZMs.
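The single-training-image protocol described above can be sketched as a simple enumeration of trials. This is only an illustration under stated assumptions: images are referred to by a per-person index, and `single_training_trials` and `average_rate` are hypothetical helper names, not part of the original experiments.

```python
def single_training_trials(n_images):
    """Yield (train_index, test_indices) for every choice of one training image."""
    for k in range(n_images):
        yield k, [i for i in range(n_images) if i != k]

def average_rate(rates):
    """Mean recognition rate over a list of per-trial rates."""
    return sum(rates) / len(rates)

# YALE 1: 11 images per person, so 11 trials with 10 test images each.
for train, test in single_training_trials(11):
    print(train, test)
```

The same enumeration with `n_images=7` gives the seven trials used for FERET_A1, each with six test images per person.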

Table 4 Data partition on Yale database for performing various experiments


From the results obtained, it is observed that the proposed combined descriptors achieve an average improvement of approximately 12% over the individual approaches. Among the individual approaches, both the LBP and LTP approaches perform better than the three ZM descriptors. This indicates that the local approaches capture the interior details of a face image more efficiently than the global ones, which makes them well suited to the case where only a single exemplar image per person is available. From the results depicted in Figure 6, it is observed that among all the combined approaches, the highest recognition results are achieved by the ZM_magPhaseLBP descriptor. It is also observed that for the proposed combined methods, fusion of the ZM features obtained for n_max = 11 provides better results. Hence, on this database, all other experiments have been carried out for this order of moments.

Next, the experiments are performed on the YALE 2.1, YALE 2.2, YALE 2.3, YALE 2.4, and YALE 2.5 categories. The average recognition results over ten trials of each group of the training and the test sets are presented in Table 5. The LBP and LTP descriptors are known to be insensitive to changes in image intensity, so the results obtained by these two approaches are considerably higher than those obtained by the ZM descriptors. Hence, on this database, the LBP/LTP feature sets contribute much more towards the recognition rate of the proposed combined approaches, which is significantly higher than that of the individual approaches. In most of the cases, the LBP and LTP descriptors combined with ZM_mag features generate superior results. For example, on the YALE 2.4 category, the highest recognition rate of 97.56% is achieved by the ZM_magLBP approach. Likewise, on the YALE 2.3 category, the highest average recognition rate of 97.14% is obtained with the ZM_magLBP approach. Thus, in general, the proposed ZM_magLBP approach outperforms the others.

Table 5 Performance (average) of the individual/combined approaches on Yale database


Thereafter, experiments are performed on YALE 3.1 and YALE 3.2 categories against illumination variation. Similarly, in order to examine the performance of the proposed approaches particularly over expression variation, experiments are carried out on YALE 4.1 and YALE 4.2 categories. The results obtained from these experiments are presented in Table 6. From the results shown, it is clearly noticed that the proposed combined approaches show an improvement in performance by approximately 30% over the ZM descriptors alone, whereas in comparison to the performance of individual LBP/LTP descriptors, an improvement of approximately 10% is achieved. As described earlier, among the individual approaches, the performance of the LBP/LTP descriptors is better than that of the ZM descriptor. Between these two descriptors, the LBP descriptor generates higher recognition rate against the illumination and expression variations on YALE 3.1, YALE 3.2, and YALE 4.2 categories while the LTP descriptor gives higher results on only YALE 4.1 category for expression variation.

Table 6 Performance of the individual/combined approaches over illumination and expression variation on Yale database


For illumination variation, i.e., on YALE 3.1 category, the highest recognition rate of 91.67% is achieved by the proposed ZM_magLBP approach, whereas against expression variation, the ZM_magPhaseLBP descriptor gives the highest recognition rate at 85.56% on YALE 4.1 category. Experiments are also conducted on YALE 3.2 category wherein all of the face images consisting of expression variation are taken in the training set and the remaining ones, i.e., one neutral and four images with illumination changes, are placed in the test set. The results for this setup are also shown in Table 6, from which it is observed that the performance of the ZM_magLBP as well as that of ZM_componentLBP is better. Particularly, on YALE 3.2 category, a superior recognition rate of 97.33% is achieved by both approaches. Similarly, in case of YALE 4.2 category, four images of each person consisting of illumination variation are used to create the training set while all of the remaining ones (i.e., one neutral and six images in varying expressions) are placed in the test set. As shown in Table 6, ZM_magPhaseLBP achieves a high recognition rate of 98.89% for this category. Thus, from the results shown in Table 6, it can be concluded that ZM_magLBP is illumination invariant and ZM_magPhaseLBP is expression invariant. If we look at the overall performance of the proposed approaches on Yale database, ZM_magLBP and ZM_magPhaseLBP outperform the other combinations.

5.3 Performance on ORL database

The ORL face database consists of a total of 400 images of size 112 × 92 pixels of 40 persons, with ten images per person in different states of variation. All the face images in this database are taken against a dark homogeneous background. These images contain slight pose variation (tilt and yaw) up to ±20° with some basic facial expressions (smiling/not smiling, open/closed eyes). For performing experiments, the images of this database are cropped to 64 × 64 pixels. Sample cropped images for one person are shown in Figure 7. The face images of this database are partitioned into 64 patches of 8 × 8 pixels to extract the local LBP/LTP features. Detailed experiments are performed on this database in order to analyze the robustness of the proposed combined approaches against pose variation. Various data partitions generated for this purpose are presented in Table 7.

Table 7 Data partition on ORL database for performing various experiments


Firstly, experiments are performed on the ORL 1 category by taking one image of each person in the training set and using all of the remaining ones to form the test set. As each person has ten images, any one of which can serve as the training image while the other nine form the test set, ten combinations of training and test images are possible here. The average recognition results over these ten trials are shown in Figure 8a,b. The results for different values of n_max are depicted in order to analyze the effect of the maximum order of moments of the ZMs on the performance of the proposed combined approaches. As the basic LBP and LTP descriptors used in this work are not invariant to image rotation, whereas the ZM descriptor is an established rotation-invariant scheme, the individual ZM descriptors perform better than the LBP/LTP descriptors on this database. Among the ZM-based descriptors, the ZM_component and ZM_magPhase descriptors give the highest recognition rates because of the inclusion of phase coefficients. However, an improvement of more than 10% is achieved by fusing the invariant feature sets of the ZM and LBP/LTP descriptors, wherein the ZM descriptor plays a significant role in achieving rotation invariance. The highest recognition rate of 81.22% is achieved by the proposed ZM_magLTP approach. From the results presented in Figure 8a,b, it is observed that in the proposed combined methods, fusion of the ZM features obtained at n_max = 9 provides better results on this database. Accordingly, further experiments have been conducted only for this order of moments.
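The rotation invariance of the ZM magnitudes mentioned above can be checked numerically with a small sketch. It is not the fast algorithm of [10, 11]: it assumes a direct pixel-centre (zeroth-order) approximation of the moment integral over the unit disc inscribed in a square image, and the names `radial_poly` and `zernike_moment` are hypothetical.

```python
import math
import numpy as np

def radial_poly(n, m, r):
    """Zernike radial polynomial R_{n,|m|}(r), evaluated elementwise."""
    m = abs(m)
    out = np.zeros_like(r, dtype=float)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        out += coeff * r ** (n - 2 * s)
    return out

def zernike_moment(img, n, m):
    """Pixel-centre approximation of Z_{n,m} for a square image (n - |m| must be even)."""
    size = img.shape[0]
    c = (2 * np.arange(size) + 1 - size) / size  # pixel centres in (-1, 1)
    x, y = np.meshgrid(c, c)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = r <= 1.0                            # keep pixels inside the unit disc
    kernel = radial_poly(n, m, r) * np.exp(-1j * m * theta)
    pixel_area = (2.0 / size) ** 2
    return (n + 1) / np.pi * np.sum(img[inside] * kernel[inside]) * pixel_area

# A random image stands in for a face; np.rot90 rotates it by 90 degrees.
face = np.random.default_rng(0).random((32, 32))
rotated = np.rot90(face)
for n, m in [(2, 0), (3, 1), (4, 2), (5, 3)]:
    print(n, m, abs(zernike_moment(face, n, m)), abs(zernike_moment(rotated, n, m)))
```

For a 90° rotation the sampling grid maps onto itself, so the two magnitudes agree up to floating-point error while the phase shifts by a multiple of 90° times m. For arbitrary angles, interpolation and discretization make the invariance only approximate.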

The average recognition results over ten different trials of each group (i.e., ORL 2.1, ORL 2.2, ORL 2.3, and ORL 2.4) of the training and the test sets are presented in Table 8. Excellent results are obtained by the proposed combined approaches, with ZM_componentLTP providing the best results overall. On taking five images in the training set and the remaining five in the test set (i.e., for ORL 2.4 over ten runs), an average recognition rate of 99.2% is achieved by both the ZM_magLBP and ZM_componentLTP approaches. Further, ZM_component features have proven to be invariant to image rotation and tolerant to pose variations to some extent [29]. From this analysis, we can state that ZM_component combined with LTP, as well as ZM_mag coupled with LBP, provides superior results against pose variations.

Table 8 Performance of the considered approaches against different pose variations on ORL database


Next, experiments are performed on the ORL 3 category by taking two neutral face images in the training set, while four images of each person containing scale and up/down head movement are taken in the test set. Similarly, in order to examine the performance of the proposed approaches over yaw pose variation, two neutral images of each person are placed in the training set and four images with slight left/right head movement are placed in the test set, i.e., the ORL 4 category. The results obtained from this experimental analysis are also presented in Table 8. On the ORL 3 category, ZM_component coupled with the LBP/LTP descriptors performs better, achieving a recognition rate of 90.0%. Similarly, on the ORL 4 category, the highest recognition rate of 91.25% is achieved by both the ZM_magLBP and the ZM_magPhaseLBP approaches. Thus, in most of the cases on the ORL database, ZM_component combined with LBP/LTP outperforms the other proposed combinations.

5.4 Performance analysis against noise variation

To examine the effect of noise on the recognition accuracy, we add impulsive noise, commonly called salt-and-pepper or spike noise, to the face images of the three databases. In the presence of impulsive noise, an image has dark pixels in bright regions and white pixels in dark regions [30]. In this analysis, noise at a level of 0.05 is added to the images of the test set, whereas the training is done on the original face images, i.e., on images with no noise. The experimental setup used to examine the performance of these approaches against additive noise is the same as before. That is, on the FERET database, experiments are performed on the FERET_A3 data partition, in which one frontal image (0° pose) of each person is selected for the training set and the four images in different poses (±22.5° and ±67.5°, with additive noise) are used in the test set. On the YALE 2.4 data partition, robustness of the proposed approaches against noise variation is analyzed by selecting five images of each person for the training set and the remaining six images (with additive noise) for the test set. The results presented are the average recognition rates over ten different runs of training and test sets. In a similar manner, the experiments on the ORL 2.4 data partition are performed by taking five randomly selected images of each person in the training set and the remaining five images (with additive noise) in the test set, and the recognition results are again averaged over ten different runs of training and test sets. The experimental results on the said databases are shown in Table 9.
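A minimal sketch of how such test images could be corrupted is given below. It assumes that the value 0.05 denotes the noise density, i.e., the fraction of pixels replaced, split equally between black (0) and white (255) on 8-bit images; the text does not state this explicitly, and the function name `add_salt_and_pepper` is hypothetical.

```python
import numpy as np

def add_salt_and_pepper(img, density=0.05, rng=None):
    """Return a copy of an 8-bit image with roughly `density` of its pixels set to 0 or 255."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.array(img, dtype=np.uint8, copy=True)
    u = rng.random(noisy.shape)
    noisy[u < density / 2] = 0          # pepper: dark pixels
    noisy[u >= 1 - density / 2] = 255   # salt: white pixels
    return noisy

# A uniform gray 64 x 64 image stands in for a cropped test face.
face = np.full((64, 64), 128, dtype=np.uint8)
noisy_face = add_salt_and_pepper(face, density=0.05, rng=np.random.default_rng(0))
print(np.mean(noisy_face != face))  # fraction of corrupted pixels, close to 0.05
```

Only the test images would be passed through such a function; the training images stay unmodified, as in the setup described above.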

Table 9 Performance of the considered approaches against noise variation


From the results presented, it is observed that among the individual approaches, the LTP descriptor is more robust to noise variation than the LBP. On the Yale and ORL databases (with noise variation), the proposed ZM_magLTP and ZM_componentLTP descriptors, respectively, perform better than all other combined approaches. On the other hand, for FERET images with noise variations, the recognition rate of ZM_magPhaseLBP is 81.5%, whereas the recognition rate of ZM_magLTP is 81.0%. The drop in recognition rate from the noise-free to the noisy case is 5 and 3.5 percentage points, respectively, for these two approaches. Hence, we can say that on the FERET database, the proposed ZM_magLTP descriptor is more robust against noise variation. For the Yale and ORL databases, the degradation in recognition rates due to noise is very small.
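One intuition for the greater noise tolerance of LTP is its dead zone around the centre pixel, sketched below following the ternary coding of Tan and Triggs [18]. The threshold `t = 5`, the 8-neighbour layout, and the edge-replicated borders are assumptions not stated in this section, and `ltp_codes` is a hypothetical name. The sketch illustrates tolerance to small gray-level fluctuations only and does not model impulsive noise.

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def ltp_codes(img, t=5):
    """Return the (upper, lower) binary code images of the local ternary pattern."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    upper = np.zeros((h, w), dtype=np.int32)
    lower = np.zeros((h, w), dtype=np.int32)
    for bit, (dy, dx) in enumerate(OFFSETS):
        diff = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] - img
        upper |= (diff >= t).astype(np.int32) << bit   # ternary value +1
        lower |= (diff <= -t).astype(np.int32) << bit  # ternary value -1
    return upper, lower

# A flat region with small fluctuations (at most +/-2 gray levels).
flat = np.full((16, 16), 100)
jitter = np.random.default_rng(0).integers(-2, 3, size=flat.shape)
upper, lower = ltp_codes(flat + jitter, t=5)
print(upper.any(), lower.any())  # False False
```

Neighbour differences smaller than `t` in magnitude map to the ternary value 0, so the codes of a nearly flat region stay constant, whereas a binary comparison against the centre would flip bits on every small fluctuation.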

5.5 Time complexity

One of the important issues in using combined approaches such as the ones proposed here is their time complexity. It is a common perception that moment-based descriptors are computationally intensive, which is true to some extent, especially for the ZM calculation. The time complexity of the ZMs is of order $O (N^{2} n_{max}^{3})$ if all moments up to a maximum order n_max are computed for an image of N × N pixels. However, with the use of fast algorithms [10, 11], the time complexity is reduced to $O (N^{2} n_{max}^{2})$. A further significant reduction in computation time is achieved by using the symmetry/antisymmetry properties of the kernel function of ZMs. The ZMs of the database images are computed offline and indexed with the images themselves, while the ZMs of the test image are computed online. Although the time complexity of ZM calculation is still high, in this work better recognition results have been obtained with n_max = 9 for the FERET and ORL databases and with n_max = 11 for the Yale database; therefore, we consider moments only up to these orders. As Z_0,0 and Z_1,1 have no discriminative capability, discarding them does not affect the recognition rate. Hence, with n_max = 11, we have 40 features after discarding the coefficients Z_0,0 and Z_1,1. In contrast, although the number of features in the feature vector containing the local histogram features of the LBP/LTP descriptors is high, the computation time of these descriptors is very low. Thus, the proposed fusion of the ZM and LBP/LTP descriptors maintains a good balance between speed and dimensionality. The size of the feature vectors of the ZM and LBP/LTP descriptors is shown in Table 10.
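The feature count quoted above can be reproduced by enumerating the valid ZM indices. The sketch assumes the usual convention that only moments with m ≥ 0 are counted (those with negative m being complex conjugates) and that n − m must be even; `zm_indices` is a hypothetical helper name.

```python
def zm_indices(n_max, skip=((0, 0), (1, 1))):
    """List (n, m) with 0 <= m <= n <= n_max and n - m even, excluding `skip`."""
    return [(n, m)
            for n in range(n_max + 1)
            for m in range(n + 1)
            if (n - m) % 2 == 0 and (n, m) not in skip]

print(len(zm_indices(11)))  # 40, the count stated for n_max = 11
print(len(zm_indices(9)))   # 28, the corresponding count for n_max = 9
```

Without the two discarded coefficients there are 42 such moments for n_max = 11, which gives the 40 features stated in the text.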

Table 10 Dimensionality of the feature vectors of the ZM and LBP/LTP descriptors


We observe that for an image of 256 × 256 pixels, the elapsed CPU time for calculating ZMs is only 0.032 s for n_max = 12 on a PC with a 3.0-GHz CPU and 1 GB of RAM under the Microsoft Windows environment. The time taken for computing LBP and LTP features is 0.015 and 0.016 s, respectively. Thus, the total time elapsed for the extraction of the local and global features of a test image does not exceed 0.048 s. The time taken for classification is much less than the feature extraction time. Thus, relative to the gain in recognition performance, the time taken by the combined features is small and can be afforded by devices with low computational power in online mode. Since the time complexity does not depend on the contents of the image, these experiments are carried out for one image only.

5.6 Performance comparison

We have compared the performance of the proposed combined descriptors with other popular methods such as PCA, 2DPCA, (PC)²A, E(PC)²A, 2D(PC)²A, SVD perturbation [38], and hybrid Fourier-AFMT transform [39] for face recognition with single (first) example image per person. As shown in Table 11, the proposed combined descriptors give the best recognition rate when compared with other well-established methods. On the other hand, the time complexity of PCA-based methods is very high as compared to the proposed approaches.

Table 11 Performance comparison (%) of some recent approaches with proposed methods on Yale and ORL databases


Comparison of performance of the proposed combined descriptors with other popular methods for face recognition with single (first) example image per person.

Dual optimal multiband features (DOMF) [24] give a recognition rate of 92.6 and 88.4% on Yale and ORL databases, respectively, when two images of each person are taken in the training set and all the remaining are kept in the test set. On this similar setup for training and test images, the highest recognition rate achieved by the proposed ZM_magPhaseLBP descriptor for YALE 2.1 is 94.59% while the ZM_componentLBP descriptor achieves a recognition rate of 92.47% for ORL 2.1 category.

The performance of the proposed combined approaches is also compared with that of some recent face recognition methods when five images of each person are used for training. The recognition results of the proposed combined approaches and those of these recent methods on Yale and ORL databases for this case are shown in Table 12. The best results are highlighted in italics. All these methods use multidimensional features or combined approaches to represent the face images. As can be seen from the results presented, the recognition rate of the proposed approaches is higher as compared to that of the recent methods. In case of block-based S-P approach [25], one random set of five images per person is taken in the training set while all the remaining are kept in the test set for both the Yale and ORL databases, whereas the results presented for our proposed approaches are the average of ten random trials of training and test sets. It is worth mentioning here that on some of the random trials, our proposed descriptors also provide 100% recognition rate. Recently introduced wavelet moment (WM) and complex WM (CWM) approaches [41] have achieved a recognition rate of 51.5 and 54.3%, respectively, on FERET_A2 subset, while the proposed ZM_magPhaseLBP descriptor has attained a recognition rate of 77.33%. On the fafb subset of FERET database, the recognition rate obtained by the RES [40], WM, and CWM [41] approaches is 95.0, 88.0, and 91.0% whereas the highest recognition rate achieved by the proposed ZM_magPhaseLBP approach is 98.04%. Thus, on the basis of superior results obtained by the proposed fusion technique, it can be concluded that combining the feature sets of the ZM and LBP/LTP descriptors is an efficient and practical approach for robust face recognition.

Table 12 Performance comparison (%) of the proposed approaches with recent methods on Yale and ORL databases


6 Conclusions

This paper proposes the fusion of two useful feature sets, i.e., the global ZMs and the local LBP/LTP descriptor. Face images exhibit extensive variation under varying pose and lighting conditions, accompanied by the presence of expression and noise. Individually, the ZM and LBP/LTP descriptors are observed to be very effective in providing good recognition performance on face images containing certain variations. In particular, the ZM descriptor extracts rotationally invariant shape features from the whole face image, whereas the LBP/LTP descriptors capture the fine details and illumination-invariant characteristics within local regions of the face image. The fusion of these two complementary approaches incorporates the benefits of both descriptors and as such proves to be robust against various distortions present in the face images. In this work, diverse feature sets of ZMs are combined with LBP/LTP descriptors to generate various combined approaches, namely, ZM_magLBP, ZM_magPhaseLBP, ZM_componentLBP, ZM_magLTP, ZM_magPhaseLTP, and ZM_componentLTP. From the detailed experiments performed on the FERET, Yale, and ORL face databases, it has been observed that the proposed combined approaches are highly robust against pose, expression, illumination, and noise variations, as the recognition rate achieved by the proposed approaches is approximately 10 to 30% higher than that obtained by applying these approaches individually. Fusion of the ZM and LBP descriptors performs better over pose, expression, and illumination variations, while in the presence of noise, ZMs combined with the LTP descriptor generate superior results. Experimental results also demonstrate the efficacy of the proposed methods over other existing techniques. In addition, a significant improvement in the recognition rate is achieved by the proposed scheme when only a single training image per person is available.

Future work is suggested towards discovering the optimal ways to utilize the information acquired by the phase coefficients of ZM descriptor in addition to using different methods of classification to further improve the performance of the proposed fusion approach.

References

Zhao W, Chellappa R, Phillips P, Rosenfeld A: Face recognition: a literature survey. ACM Comput. Surv. 2003, 35(4):399-458.
Article Google Scholar
Hjelmas E, Low BK: Face detection: a survey. Comput. Vision. Image. Underst. 2001, 83: 236-274.
Article MATH Google Scholar
Turk M: A random walk through Eigenspace. IEICE Trans. Inf. Syst. 2001, E84-D(12):1586-1595.
Google Scholar
Mittal N, Walia E: Face recognition using improved fast PCA algorithm. In Proceedings of the IEEE International Congress on Image and Signal Processing (CISp ‘08). Sanya, Hainan; 2008:554-558.
Google Scholar
Belhumeur PN, Hespanha JP, Kriegman DJ: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19: 711-720.
Article Google Scholar
Xu Y, Zhang D, Yang J, J–Y Y: An approach for directly extracting features from matrix data and its application in face recognition. Neurocomputing 2008, 71: 1857-1865.
Article Google Scholar
Daoqiang Z, Zhi-Hua Z: (2D)²PCA: Two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing 2005, 69: 224-231.
Article Google Scholar
Zhang D, Lu G: Review of shape representation and description techniques. Pattern Recognit. 2004, 37(1):1-19.
Article MATH Google Scholar
Zhang D, Lu G: Evaluation of MPEG-7 shape descriptors against other shape descriptors. Multimed. Syst. 2003, 9: 15-30.
Article Google Scholar
Singh C, Walia E: Fast and numerically stable methods for the computation of Zernike moments. Pattern Recognit. 2010, 43(7):2497-2506.
Article MATH Google Scholar
C–Y W, Paramesran R: On the computational aspects of Zernike moments. Image Vis. Comput. 2007, 25: 967-980.
Article Google Scholar
Lowe DG: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60(2):91-110.
Article Google Scholar
Soyel H, Demirel H: Facial expression recognition based on discriminative scale invariant feature transform. IET Electron. Lett. 2010, 46(5):343-345.
Article Google Scholar
Huang L, Shimizu A, Kobatake H: Robust face detection using Gabor filter features. Pattern Recognit. Lett. 2005, 26(11):1641-1649.
Article Google Scholar
Ojala T, Pietikäinen M: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24(7):971-987.
Article Google Scholar
Ahonen T, Hadid A, Pietikäinen M: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(12):2037-2041.
Article MATH Google Scholar
Jun B, Kim T, Kim D: A compact local binary pattern using maximization of mutual information for face analysis. Pattern Recognit. 2011, 44: 532-543.
Article MATH Google Scholar
Tan X, Triggs B: Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process. 2010, 19(6):1635-1648.
Article MathSciNet Google Scholar
Kim C, Oh J, Choi C: Combined Subspace Method Using Global and Local features For Face Recognition. 4th edition. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN ‘05), Montreal Canada; 2005:2030-2035.
Google Scholar
Zhou D, Yang X: Feature fusion based face recognition using EFM. Proceedings International Conference on Image Analysis and Recognition (ICIAR ‘04). Lect. Notes Comput. Sci. 2004, 3212: 643-650.
Article Google Scholar
Zhou D, Yang X, Peng N, Wang Y: Improved-LDA based face recognition using both facial global and local information. Pattern Recognit. Lett. 2006, 27: 536-543.
Article Google Scholar
Singh C, Walia E, Mittal N: Robust two-stage face recognition approach using global and local features. Vis. Comput. 2012, 28(11):1085-1098.
Article Google Scholar
Su Y, Shan S, Chen X, Gao W: Hierarchical ensemble of global and local classifiers for face recognition. IEEE Trans. Image Process. 2009, 18(8):1885-1895.
Article MathSciNet Google Scholar
Wong Y-W, Seng KP, Li-M A: Dual optimal multiband features for face recognition. Expert Syst, Appl 2010, 37(4):2957-2962.
Article Google Scholar
Aroussi ME, Hassouni ME, Ghouzali S, Rziza M, Aboutajdine D: Local appearance based face recognition method using block based steerable pyramid transform. Signal Process 2011, 91: 38-50.
Article MATH Google Scholar
Liu Z, Liu C: Fusion of color, local spatial and global frequency information for face recognition. Pattern Recognit. 2010, 43: 2882-2890.
Article MATH Google Scholar
Jun B, Lee J, Kim D: A novel illumination-robust face recognition using statistical and non-statistical method. Pattern Recognit. Lett. 2011, 32: 329-336.
Article Google Scholar
Moore S, Bowden R: Local binary patterns for multi-view facial expression recognition. Comput. Vision Image Underst. 2011, 115: 541-558.
Article Google Scholar
Singh C, Walia E, Mittal N: Rotation invariant complex Zernike moments features and their application to human face and character recognition. IET Comput. Vision 2011, 5(5):255-265.
Article Google Scholar
Lajevardi SM, Hussain ZM: Higher order orthogonal moments for invariant facial expression recognition. Digit Signal Process 2010, 20: 1771-1779.
Article Google Scholar
Li S, M–C L, Chi-Man P: Complex Zernike moments features for shape based image retrieval. IEEE Trans. Syst. Man. Cybern. C Appl. Rev. 2009, 39: 227-237.
Article Google Scholar
Singh C, Mittal N, Walia E: Face recognition using Zernike and complex Zernike moment features. Pattern Recognit. Image Anal. 2011, 21(1):71-81.
Article Google Scholar
Revaud J, Lavoue G, Baskurt A: Improving Zernike moments comparison for optimal similarity and rotation angle retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31(4):627-636.
Article Google Scholar
Jain A, Nandakumar K, Ross A: Score normalization in multimodal biometric systems. Pattern Recognit. 2005, 38: 2270-2285.
Article Google Scholar
The Facial Recognition Technology (FERET) face database http://www.nist.gov/itl/iad/ig/colorferet.cfm
Yale face database http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
Olivetti Research Laboratory (ORL) face database http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
Li J, Pan J-S: A novel pose and illumination robust face recognition with a single training image per person algorithm. Chin. Optic. Lett. 2008, 6(4):255-257.
Article Google Scholar
Chen YM, Chiang J-H: Fusing multiple features for Fourier Mellin-based face recognition with single example image per person. Neurocomputing 2010, 73(16–18):3089-3096.
Article Google Scholar
Kuo C-H, Lee JD: Face recognition based on a two-view projective transformation using one sample per subject. IET Comput. Vision 2012, 6(5):489-498.
Singh C, Sahan AM: Face recognition using complex wavelet moments. Opt. Laser Technol. 2013, 47: 256-267.
Zhi R, Ruan Q: Two-dimensional direct and weighted linear discriminant analysis for face recognition. Neurocomputing 2008, 71: 3607-3611.
Wang Y, Wu Y: Face recognition using Intrinsicfaces. Pattern Recognit. 2010, 43: 3580-3590.

Acknowledgements

The authors thank the anonymous reviewers for their useful comments and suggestions, which helped raise the standard of the paper. The authors are grateful to the All India Council for Technical Education (AICTE), Govt. of India, New Delhi, India, for supporting the research work vide their file number 8013/RID/BOR/RPS-77/2005-06. We are also grateful to the National Institute of Standards and Technology (colorferet@nist.gov) for providing the FERET face database.

Author information

Authors and Affiliations

Department of Computer Science, Punjabi University, Patiala, 147002, India
Chandan Singh
Central Scientific Instruments Organisation, Sector 30-C, Chandigarh, 160030, India
Neerja Mittal
Department of Computer Science, South Asian University, Akbar Bhawan, Chanakyapuri, Delhi, 110021, India
Ekta Walia

Corresponding author

Correspondence to Neerja Mittal.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Singh, C., Mittal, N. & Walia, E. Complementary feature sets for optimal face recognition. J Image Video Proc 2014, 35 (2014). https://doi.org/10.1186/1687-5281-2014-35

Received: 04 January 2014
Accepted: 04 June 2014
Published: 05 July 2014
DOI: https://doi.org/10.1186/1687-5281-2014-35
