Three-dimensional face recognition under expression variation
© Wang et al.; licensee Springer. 2014
- Received: 1 April 2014
- Accepted: 6 November 2014
- Published: 25 November 2014
In this paper, we introduce a fully automatic framework for 3D face recognition under expression variation. For 3D data preprocessing, an improved nose detection method is presented, which also corrects small pose variations. Subsequently, a new facial expression processing method based on sparse representation is proposed. Because facial expression is one of the biggest obstacles for 3D face recognition, removing it enhances the recognition rate of the framework. The facial representation, which is based on the dual-tree complex wavelet transform (DT-CWT), is then extracted from depth images; it contains information from the whole face and from six subregions. Recognition is achieved by linear discriminant analysis (LDA) and a nearest neighbor classifier. We have performed experiments on the Face Recognition Grand Challenge (FRGC) database and the Bosphorus database. The method achieves a verification rate of 98.86% at a 0.1% false acceptance rate (FAR) in the all vs. all experiment on FRGC, and a verification rate of 95.03% on nearly frontal faces with expression changes and occlusions in the Bosphorus database.
- Dual-tree complex wavelet transform
- 3D face recognition
- Sparse representation
- Linear discriminant analysis
3D face recognition is a continuously developing subject with many challenging issues [1–3]. In recent years, many new 3D face recognition methods evaluated on the Face Recognition Grand Challenge (FRGC) v2 data have achieved good performance.
A regional matching scheme was first proposed by Faltemier et al. In their paper, the whole 3D face image was divided into 28 patches, and fusing the results of the independently matched regions achieved good performance. Wang et al. extracted Gabor, LBP, and Haar features from the depth image; the most discriminative local features were then selected by boosting and trained as weak classifiers, which were assembled into three collective strong classifiers. Mian et al. used the spherical face representation (SFR) of the 3D facial data and the scale invariant feature transform (SIFT) descriptor of the 2D data to train a rejection classifier. The remaining faces were verified using a region-based matching approach that is robust to facial expression. Berretti et al. proposed an approach that represents the 3D facial surface in graph form to capture its geometrical information, so that the relevant information among neighboring points is encoded into a compact representation. 3D weighted walkthrough (3DWW) descriptors were proposed to represent the mutual spatial displacement among pairwise arcs of points of the corresponding stripes. Zhang et al. proposed a novel resolution invariant local feature for 3D face recognition. Six different scale invariant similarity measures were fused at the score level, which increased the robustness against expression variation.
The accuracy of 3D face recognition can be significantly degraded by large facial expression variations. Alyuz et al. proposed an expression resistant 3D face recognition method based on regional registration. In recent years, many methods have dealt with facial expression before recognition. Kakadiaris et al. first fitted an elastically adapted deformable model and then mapped the 3D geometry information onto a 2D regular grid, thus combining the descriptiveness of 3D data with the computational efficiency of 2D data. A multistage fully automatic alignment algorithm and advanced wavelet analysis were used for recognition. Drira et al. represented facial surfaces by radial curves emanating from the nose tip and used elastic shape analysis of these curves to develop a Riemannian framework for analyzing the shapes of full facial surfaces. Their method relies on nose tip locations that are provided in advance. Mohammadzade et al. presented a new iterative method that can deal with 3D faces with open mouths. Their experiments showed that combining the normal vectors and the point coordinates improves recognition performance, and the method achieved a verification rate of 99.6% at a false acceptance rate (FAR) of 0.1% in the all versus all experiment. Amberg et al. described an expression invariant face recognition method that fits an identity/expression separated 3D Morphable Model to shape data. The expression model greatly improved recognition. Their method takes approximately 40 to 90 s per query.
The main contributions of this work can be summarized as follows:
● The first contribution is an improved nose detection method that iteratively corrects small pose variations of the face. The proposed nose detection algorithm is simple, and its success rate on the FRGC database is 99.95%.
● The second contribution is a new 3D facial expression processing method based on sparse representation. Li et al. applied sparse representation to 3D face recognition, but they used it at the recognition stage. In this paper, sparse representation is used for facial expression processing. The objective of sparse representation is to represent a probe using the minimum number of gallery samples. Because the first task of our expression processing is to find the minimum number of expressional components in the dictionary (a person makes only one expression at a time), this objective is naturally suited to finding the expressional deformation from the dataset. The method is a learning method that extracts the neutral component of a test face from a dictionary of neutral and expressional spaces, and it takes only 14.91 s to remove one facial expression (on an Intel(R) Core(TM) i3-2120 CPU with 2 GB of RAM). The proposed method is therefore simple and requires little computation time.
The paper is organized as follows. Section 2 presents the data preprocessing methods, including the improved nose tip detection method. Section 3 presents the 3D facial expression processing method. Section 4 describes the framework of our 3D face recognition method. Experimental results are given in Section 5, and conclusions are drawn in Section 6.
Firstly, a 3 × 3 Gaussian filter is used to remove spikes and noise, and then the range data are subsampled at a 1:4 ratio.
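The smoothing and subsampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the range data are taken to be a 2D grid of depth values, the 3 × 3 Gaussian is approximated by the binomial kernel [1, 2, 1]ᵀ[1, 2, 1]/16, borders are handled by replicating edge values, and "1:4" is read as keeping every second row and every second column (one sample in four). The helper names `gaussian_3x3`, `subsample`, and `preprocess` are hypothetical.

```python
# Assumed 3x3 Gaussian approximation (binomial kernel); weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def gaussian_3x3(depth):
    """Smooth a 2D depth grid (list of rows) with the 3x3 binomial kernel."""
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Replicate the border by clamping indices into the grid.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += KERNEL[dr + 1][dc + 1] * depth[rr][cc]
            out[r][c] = acc / 16.0
    return out


def subsample(depth, step=2):
    """Keep every `step`-th row and column (step=2 keeps 1 sample in 4)."""
    return [row[::step] for row in depth[::step]]


def preprocess(depth, step=2):
    """Assumed preprocessing order: smooth first, then subsample."""
    return subsample(gaussian_3x3(depth), step)
```

If "1:4" is instead meant per axis (every fourth row and column), the same sketch applies with `preprocess(depth, step=4)`.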
Some 3D faces in the FRGC database include the ears, while in others the ears are hidden by hair. For consistency, we use only the face region for recognition. The face region extraction method is introduced below.
2.1 Nose detection
In this paper, the first step of nose tip detection is finding the central stripe; details are presented in our earlier work. The nose tip is then located and refined as follows:
(1) Align stripe A to stripe B using the ICP method and record the transformation matrix $M_2$.
(2) Use $M_2$ to find point $p$, the transformed nose tip of the first person.
(3) Crop a sphere (radius = 37 mm) centered at point $p$. The highest point in the sphere is taken as the nose tip of B. This step is described in our previous work.
(4) Crop a sphere (radius = 90 mm) centered at the nose tip and align it to the standard face. Calculate the transformed nose tip $p_1$.
(5) Crop a sphere (radius = 25 mm) centered at point $p_1$. The highest point in the sphere is taken as the new nose tip $p_2$.
(6) If $\|p_2 - p_1\| < 2$ mm, $p_2$ is the nose tip; otherwise, go back to step (4).
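The last three steps above form an iterative refinement loop (the loop that returns to step (4)). The sketch below illustrates it under loudly stated assumptions: the face is a list of (x, y, z) points in millimetres with z pointing toward the sensor, so the "highest" point is the one with the largest z, and the ICP alignment to the standard face is replaced by a hypothetical `align` callback whose default is the identity transform. All function names are illustrative, not the authors'.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points (millimetres)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def crop_sphere(points, center, radius):
    """Keep the points that lie within `radius` of `center`."""
    return [p for p in points if distance(p, center) <= radius]


def highest_point(points):
    """Assumption: the face points toward +z, so "highest" means largest z."""
    return max(points, key=lambda p: p[2])


def identity_align(face, tip):
    """Hypothetical stand-in for ICP alignment of `face` to a standard face.

    A real implementation would return the transformed face and the
    transformed nose tip; here nothing is moved.
    """
    return face, tip


def refine_nose_tip(points, tip, align=identity_align, tol=2.0, max_iter=20):
    """Iteratively refine a nose tip estimate (radii and tolerance in mm)."""
    for _ in range(max_iter):
        # Step (4): crop a 90 mm sphere around the current tip and align it;
        # p1 is the transformed nose tip. Later iterations reuse this face.
        points, p1 = align(crop_sphere(points, tip, 90.0), tip)
        # Step (5): the highest point within 25 mm of p1 is the new tip p2.
        p2 = highest_point(crop_sphere(points, p1, 25.0))
        # Step (6): stop once the estimate moves by less than `tol`.
        if distance(p2, p1) < tol:
            return p2
        tip = p2  # otherwise go back to step (4) with the new tip
    return tip
```

On a synthetic dome-shaped surface whose peak is at (0, 0, 100), a start 14 mm away from the peak reaches it within two iterations; the `max_iter` cap only guards against oscillation.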
2.2 Face region
Once the nose tip has been found, the region cropped in the last step of nose detection is used as the face region. All faces with excessive head rotation, hair artifacts, and large expressions were successfully segmented by the proposed nose detection algorithm. Some examples are presented in Figure 2.
Facial expression is one of the biggest obstacles to 3D face recognition, because a 3D face carries limited information and part of that information can easily be changed by facial expression. In this section, we introduce a new method based on sparse representation for removing facial expression. We expect our method to establish correspondence between an open mouth and its estimated neutral component.
3.1 Brief introduction of sparse representation
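The sparse representation $x$ of a test sample is obtained by solving

$$\min_{x} \; \|y - A x\|_2^2 + \gamma \|x\|_1 \qquad (1)$$

where $y$ is the test sample, $x$ is the sparse representation on dictionary $A$, and $\gamma$ is a scalar constant (we use $\gamma = 5{,}000$ in this paper). The feature-sign search method is adopted to solve Equation 1.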
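The following sketch solves an ℓ1-regularized least-squares problem of the form $\min_x \|y - Ax\|_2^2 + \gamma\|x\|_1$, which is assumed here to be the form of Equation 1 because it is the problem feature-sign search is designed for. Note that it swaps in a different solver: ISTA (iterative shrinkage-thresholding), a simple proximal-gradient method, rather than the feature-sign search used in the paper. Both target the same objective, but ISTA typically converges more slowly. Pure Python lists keep it dependency-free; the function names are hypothetical, and the toy value of γ used below is not the paper's γ = 5,000, whose effect depends on the scale of the data.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, gamma, n_iter=500):
    """Approximately minimize ||y - A x||_2^2 + gamma * ||x||_1 with ISTA.

    A is an m x n dictionary given as a list of m rows; y has length m.
    """
    m, n = len(A), len(A[0])
    # The gradient of ||y - A x||^2 is 2 A^T (A x - y); its Lipschitz constant
    # is 2 * sigma_max(A)^2, bounded above by 2 * (squared Frobenius norm).
    L = 2.0 * sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth term, then shrinkage for the l1 term.
        x = [soft_threshold(x[j] - grad[j] / L, gamma / L) for j in range(n)]
    return x
```

With an identity dictionary the minimizer is a soft-thresholded copy of $y$ with threshold γ/2, which makes an easy check: for $y = (3, 0.2, -2)$ and γ = 1, the solution is $(2.5, 0, -1.5)$, and the small middle coefficient is driven exactly to zero.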
3.2 Facial expression processing method
First, we use triangle-based linear interpolation to fit a surface $Z = f(X, Y)$ of size 128 × 128. We also use triangle-based linear interpolation to fit a second surface of size 384 × 384, from which we build the depth image used for feature extraction in Section 4.
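Triangle-based linear interpolation evaluates each grid point that falls inside a triangle of the scattered 3D points as the barycentric-weighted average of the three vertex depths, so the fitted surface is piecewise planar. The sketch below shows this for a given triangulation. Computing the triangulation itself (for example, a Delaunay triangulation of the (X, Y) coordinates) is assumed and not shown, the helper names are hypothetical, and grid points outside every triangle are returned as None.

```python
def interpolate_in_triangle(px, py, tri, eps=1e-12):
    """Linearly interpolate z at (px, py) inside triangle `tri`.

    `tri` holds three (x, y, z) vertices. Returns None when (px, py) lies
    outside the triangle or the triangle is degenerate.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None
    # Barycentric weights of (px, py) with respect to the three vertices.
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -eps:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


def resample_to_grid(triangles, xs, ys):
    """Evaluate the piecewise-linear surface Z = f(X, Y) on a regular grid."""
    grid = []
    for y in ys:
        row = []
        for x in xs:
            z = None
            for tri in triangles:  # brute-force search; fine for a sketch
                z = interpolate_in_triangle(x, y, tri)
                if z is not None:
                    break
            row.append(z)
        grid.append(row)
    return grid
```

Because the interpolant is linear inside each triangle, it reproduces any plane exactly, which is a convenient sanity check; for a 128 × 128 grid over thousands of triangles, a spatial index would replace the brute-force search.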
Finally, the expression-removed depth images are constructed using $F_{\mathrm{Neutral}}$. The size of each depth image is 128 × 128.
In the training stage, we use all 943 faces in FRGC v1.0 for training. First, we extract the four-level magnitude subimages of each training face, vectorize the six magnitude subimages into a single vector (of dimension 384), and use LDA to learn the discriminant subspace and record the transformation matrix. Second, we extract the four-level magnitude subimages of the six subregions using DT-CWT, vectorize them into a single vector (of dimension 2,304), and again use LDA to learn a subspace. Finally, we compute the two DT-CWT features of every gallery face and project them with their respective transformation matrices.
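The feature dimensions quoted above follow from the DT-CWT's decimation: each decomposition level halves the subband size in both directions and produces six oriented subbands, so the level-4 magnitude subimages of a 128 × 128 depth image are six 8 × 8 arrays, or 6 × 64 = 384 values. The sketch below reproduces that arithmetic and the vectorization of subband magnitudes. It assumes, since the text does not state it explicitly, that only the fourth-level subbands are used and that each of the six subregions is also 128 × 128, which is consistent with 6 × 384 = 2,304. The DT-CWT itself and the LDA training are not shown, and the function names are hypothetical.

```python
def subband_side(image_side, level):
    """Side length of a DT-CWT subband at `level` (halved at each level)."""
    return image_side // (2 ** level)


def magnitude_feature_dim(image_side, level=4, orientations=6):
    """Length of the vectorized magnitudes of all oriented subbands at `level`."""
    side = subband_side(image_side, level)
    return orientations * side * side


def vectorize_magnitudes(subbands):
    """Flatten the magnitudes of complex 2D subbands into one feature vector."""
    return [abs(c) for band in subbands for row in band for c in row]


whole_face_dim = magnitude_feature_dim(128)       # 6 * 8 * 8 = 384
subregions_dim = 6 * magnitude_feature_dim(128)   # assumes six 128 x 128 subregions
```

The magnitudes, rather than the raw complex coefficients, are what get vectorized, which is why `abs` appears in `vectorize_magnitudes`.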
In this normalization function, $S_{rc}$ represents the element of similarity matrix $S_1$ or $S_2$ at row $r$ and column $c$, $S_r$ denotes the elements of $S_1$ or $S_2$ in row $r$, and the function output is the normalized value of $S_{rc}$. The final similarity matrix is then obtained from the normalized matrices by a simple sum rule, $S = S_1 + S_2$. Recognition is achieved by the nearest neighbor classifier.
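The normalization function itself is not reproduced in this text, so the sketch below assumes row-wise min-max normalization, one common choice, with rows indexing probes and columns indexing gallery faces. It then applies the sum rule $S = S_1 + S_2$ to the normalized matrices and assigns each probe to the gallery face with the highest fused similarity (nearest neighbor). The names and the normalization choice are illustrative assumptions.

```python
def minmax_rows(S):
    """Assumed normalization: rescale each row of S to [0, 1]."""
    normalized = []
    for row in S:
        lo, hi = min(row), max(row)
        span = hi - lo
        normalized.append([(s - lo) / span if span > 0 else 0.0 for s in row])
    return normalized


def fuse_and_match(S1, S2):
    """Sum-rule fusion of two normalized similarity matrices, then 1-NN.

    Rows are assumed to index probes and columns gallery faces.
    """
    N1, N2 = minmax_rows(S1), minmax_rows(S2)
    S = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(N1, N2)]
    # Nearest neighbour: the gallery column with the highest fused similarity.
    matches = [max(range(len(row)), key=row.__getitem__) for row in S]
    return S, matches
```

Normalizing before summing matters because the two features produce similarities on different scales; without it, whichever matrix has the larger range would dominate the sum.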
The Bosphorus database consists of 105 subjects in various poses, expressions, and occlusion conditions. Eighteen subjects have a beard or moustache, and 15 subjects have short facial hair. Most subjects are aged between 25 and 35 years. There are 60 men and 45 women in total, most of them Caucasian, and 27 professional actors and actresses are included. Up to 54 face scans are available per subject, while 34 subjects have 31 scans. In total, the database contains 4,652 face scans.
FRGC v1 contains 943 3D faces, while FRGC v2 contains 4,007 3D faces of 466 persons. The images were acquired with a Minolta Vivid 910 scanner, which uses triangulation with a laser stripe projector to build a 3D model of the face. The 3D faces are available in the form of four matrices, each of size 640 × 480. The data consist of frontal views. Some subjects have facial hair, but none of them wear glasses. Each 3D face has a corresponding 2D image. In FRGC v2, 57% are male and 43% are female. The database was collected from 2003 to 2004. To evaluate the robustness of our method against expression variations, we classified 1,648 faces with expressions as the non-neutral dataset (411 persons) and 2,359 neutral faces as the neutral dataset (422 persons). The numbers of persons in the neutral and non-neutral datasets differ because some people in FRGC v2 have only one face scan. In the rest of the paper, ‘N’ stands for neutral, ‘E’ for non-neutral, and ‘A’ for all.
5.1 Experiments on Bosphorus database
5.2 Experiments on FRGC
5.2.1 Comparison with original mouths
Handling open mouths has been an important topic in 3D face recognition, and a number of researchers have been working on it. We expect that our method, by correctly establishing correspondence between an open mouth and its estimated neutral component, can greatly improve 3D face recognition.
As a first set of experiments, we test our algorithm on the mouth area of FRGC v2. Following the experimental protocol, the gallery set contains the first neutral face of each subject, and the remaining faces make up the probe set. We compare the expression-removed mouths with the original mouths using PCA. The recognition rate using the original mouths is 52.95%, while that using the expression-removed mouths is 69.5%, which indicates that the expression-removed mouths retain more identity information than the original mouths.
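The gallery/probe protocol and the rank-one recognition rate can be sketched as follows. Scans are assumed to be given in acquisition order as (subject, is_neutral, feature) tuples, and `similarity` is a placeholder for whatever matching score is used (for example, a score computed in a PCA subspace, which is not shown). The scalar features in the usage check are purely illustrative, and the function names are hypothetical.

```python
def split_gallery_probe(scans):
    """Gallery = first neutral scan of each subject; everything else = probes.

    `scans` is a list of (subject, is_neutral, feature) in acquisition order.
    """
    gallery, probes = {}, []
    for subject, is_neutral, feature in scans:
        if is_neutral and subject not in gallery:
            gallery[subject] = feature
        else:
            probes.append((subject, feature))
    return gallery, probes


def rank_one_rate(gallery, probes, similarity):
    """Fraction of probes whose most similar gallery entry has the same identity."""
    correct = 0
    for subject, feature in probes:
        best = max(gallery, key=lambda g: similarity(feature, gallery[g]))
        if best == subject:
            correct += 1
    return correct / len(probes)
```

Note that a subject's expressional scans acquired before that subject's first neutral scan still end up in the probe set, which matches "the remaining ones made up the probe set".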
5.2.2 Comparison with original faces
Then, to evaluate the performance of the expression processing method, we compare the expression-removed faces with the original faces using the Gabor feature and the DT-CWT feature of the whole depth image. We performed four experiments: neutral vs. neutral, neutral vs. non-neutral, all vs. all, and ROCIII. In the all vs. all experiment, every image of FRGC v2 is matched with all remaining images, resulting in 16,052,042 combinations. Similarly, in the neutral vs. neutral experiment, every image of the neutral dataset is matched with all remaining images, resulting in 5,562,522 combinations. In the neutral vs. non-neutral experiment, the gallery images come from the neutral dataset and the probe images come from the non-neutral dataset. In the ROCIII experiment, the gallery images come from the Fall 2003 semester, while the probe images come from the Spring 2004 semester.
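The combination counts follow from matching every image with every other image in both directions, that is, $n(n-1)$ ordered pairs, as the quick check below shows using the dataset sizes given earlier (4,007 images in FRGC v2 and 2,359 neutral images). Counting each unordered pair only once would give half of these numbers. The function name is hypothetical.

```python
def all_vs_all_pairs(n):
    """Number of ordered (probe, gallery) pairs when each of n images is
    matched against every other image."""
    return n * (n - 1)


if __name__ == "__main__":
    print(all_vs_all_pairs(4007))  # 16052042 for all vs. all on FRGC v2
    print(all_vs_all_pairs(2359))  # 5562522 for neutral vs. neutral
```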
Facial expression-removed faces compared with original faces using the Gabor feature and the DT-CWT feature of the whole depth image (rank-one recognition rate and verification rate at 0.001 FAR for the N vs. N, N vs. E, and A vs. A experiments)
5.2.3 ROC and CMC of our method
5.2.4 Comparisons with other methods
Verification rate comparison with the state-of-the-art methods at 0.001 FAR for the N vs. N, N vs. E, and A vs. A experiments (our method uses FRGC v1 as the training set)
The verification rates of our method are also shown in Table 2. The performance in the A vs. A and ROCIII experiments is slightly lower than the best reported results but still close to them.
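The verification rate (VR) at a fixed FAR, reported throughout this section at 0.1% FAR, can be computed from genuine (same-person) and impostor (different-person) similarity scores by choosing the acceptance threshold from the impostor distribution. The sketch below is a generic illustration, not the FRGC evaluation software: it assumes that higher scores mean more similar and picks the threshold so that at most a fraction `far` of impostor scores are accepted. The function name is hypothetical.

```python
import math


def verification_rate_at_far(genuine, impostor, far=0.001):
    """Fraction of genuine scores accepted at a threshold whose FAR <= far.

    Assumes higher scores mean "more similar".
    """
    ranked = sorted(impostor, reverse=True)
    # Number of impostor scores we may accept; the small epsilon guards
    # against floating-point error in far * len(ranked).
    k = math.floor(far * len(ranked) + 1e-9)
    # Accept scores strictly above the (k+1)-th highest impostor score, so at
    # most k impostors (fewer if there are ties) are accepted.
    threshold = ranked[k] if k < len(ranked) else -math.inf
    accepted = sum(1 for g in genuine if g > threshold)
    return accepted / len(genuine)
```

For example, with 1,000 impostor scores from 0 to 999 and `far=0.001`, the threshold is 998, so exactly one impostor (999) is accepted, and genuine scores of 998.5, 999.5, and 500 give a VR of 2/3.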
We presented an automatic method for 3D face recognition. We used an improved nose detection method to correct the pose of the face and showed that it can correct faces whose pose angle is less than 30°.
We also proposed a 3D facial expression processing method based on sparse representation. It extracts the neutral component from a dictionary that combines neutral and expressional spaces, which enhances the recognition rate. Our method can handle open mouths and grinning expressions. We showed that the estimated neutral faces extracted from expressional faces are similar to those extracted from the corresponding neutral faces.
Then, a facial representation containing the whole-face feature and the features of six subregions was extracted using DT-CWT. Combining holistic and local features represents a 3D face more effectively for recognition. Finally, LDA was used to improve recognition accuracy.
This work was supported partly by the National Natural Science Foundation of China (61172128), the National Key Basic Research Program of China (2012CB316304), the New Century Excellent Talents in University (NCET-12-0768), the Fundamental Research Funds for the Central Universities (2013JBM020, 2013JBZ003), the Program for Innovative Research Team in the University of Ministry of Education of China (IRT201206), the Beijing Higher Education Young Elite Teacher Project (YETP0544), the National Natural Science Foundation of China (61403024), and the Research Fund for the Doctoral Program of Higher Education of China (20120009110008, 20120009120009).
- Zhong C, Sun Z, Tan T: Robust 3D face recognition using learned visual codebook. In Proceedings of IEEE Conference on Pattern Recognition. Minneapolis; 2007:17-22.
- Zhong C, Sun Z, Tan T: Learning efficient codes for 3D face recognition. In Proceedings of 15th IEEE International Conference on Image Processing. San Diego; 12–15 Oct 2008:1928-1931.
- Chang KI, Bowyer KW, Flynn PJ: An evaluation of multi-modal 2D+3D face biometrics. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27(4):619-624.
- Faltemier TC, Bowyer KW, Flynn PJ: A region ensemble for 3-D face recognition. IEEE Trans. Inf. Forensics Secur. 2008, 3(1):62-73.
- Wang Y, Liu J, Tang X: Robust 3D face recognition by local shape difference boosting. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32(10):1858-1870.
- Lee H, Battle A, Raina R, Ng AY: Efficient sparse coding algorithms. Adv. Neural. Inf. Proc. Syst. 2007, 19: 801.
- Berretti S, Bimbo AD, Pala P: 3D face recognition using iso-geodesic stripes. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32(12):2162-2177.
- Zhang G, Wang Y: Robust 3D face recognition based on resolution invariant features. Pattern Recognit. Lett. 2011, 32(7):1009-1019. 10.1016/j.patrec.2011.02.004
- Alyuz N, Gökberk B, Akarun L: Regional registration for expression resistant 3-D face recognition. IEEE Trans. Inf. Forensics Secur. 2010, 5(3):425-440.
- Kakadiaris IA, Passalis G, Toderici G, Murtuza MN, Lu Y, Karampatziakis N, Theoharis T: Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29(4):640-649.
- Drira H, Amor BB, Srivastava A, Daoudi M, Slama R: 3D face recognition under expressions, occlusions, and pose variations. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35(9):2270-2283.
- Mohammadzade H, Hatzinakos D: Iterative closest normal point for 3D face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35(2):381-397.
- Amberg B, Knothe R, Vetter T: Expression invariant 3D face recognition with a Morphable Model. In International Conference on Automatic Face & Gesture Recognition. Amsterdam; 17–19 Sept 2008:1-6.
- Belhumeur PN, Hespanha JP, Kriegman DJ: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19(7):711-720. 10.1109/34.598228
- Li X, Jia T, Zhang H: Expression-insensitive 3D face recognition using sparse representation. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Miami; 20–25 June 2009:2575-2582.
- Wang X, Ruan Q, Jin Y, An G: Expression robust three-dimensional face recognition based on Gaussian filter and dual-tree complex wavelet transform. J. Intell. Fuzzy Syst. 2014, 26: 193-201.
- Besl PJ, McKay ND: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14(2):239-256. 10.1109/34.121791
- Donoho D: For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 2006, 59(6):797-829. 10.1002/cpa.20132
- Selesnick IW, Baraniuk RG, Kingsbury NG: The dual-tree complex wavelet transform. IEEE Signal Proc. Mag. 2005, 22(6):123-151.
- Liu C, Dai D: Face recognition using dual-tree complex wavelet features. IEEE Trans. Image Process. 2009, 18(11):2593-2599.
- Koenderink JJ, van Doorn AJ: Surface shape and curvature scales. Image Vision Comput. 1992, 10(8):557-565. 10.1016/0262-8856(92)90076-F
- Savran A, Alyüz N, Dibeklioğlu H, Çeliktutan O, Gökberk B, Sankur B, Akarun L: Bosphorus Database for 3D face analysis. Workshop on Biometrics and Identity Management 2008, 47-56.
- Phillips PJ, Flynn P, Scruggs T, Bowyer KW, Chang J, Hoffman K, Marques J, Min J, Worek W: Overview of the Face Recognition Grand Challenge. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1. San Diego; 20–25 June 2005:947-954.
- Jones JP, Palmer LA: An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 1987, 58: 1233-1258.
- Mian AS, Bennamoun M, Owens R: An efficient multimodal 2D-3D hybrid approach to automatic face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29(11):1927-1943.
- Maurer T, Guigonis D, Maslov I, Pesenti B, Tsaregorodtsev A, West D, Medioni G: Performance of Geometrix ActiveID™ 3D face recognition engine on the FRGC data. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). San Diego; 20–25 June 2005:154.
- Cook J, Cox M, Chandran V, Sridharan S: Robust 3D face recognition from expression categorisation, ICB 2007. LNCS 2007, 4642: 271-280.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.