Gauss–Laguerre wavelet textural feature fusion with geometrical information for facial expression identification
© Poursaberi et al.; licensee Springer. 2012
Received: 1 November 2011
Accepted: 3 September 2012
Published: 25 September 2012
Facial expressions are a valuable source of information that accompanies facial biometrics. Early detection of physiological and psycho-emotional data from facial expressions is linked to the situational awareness module of any advanced biometric system for personal state re/identification. In this article, a new method that utilizes both texture and geometric information of facial fiducial points is presented. We investigate Gauss–Laguerre wavelets, which have rich frequency extraction capabilities, to extract texture information of various facial expressions. The rotation invariance and multiscale nature of these wavelets make the feature extraction robust. Moreover, the geometric positions of fiducial points provide valuable information for upper/lower face action units. The combination of these two types of features is used for facial expression classification. The performance of this system has been validated on three public databases: the JAFFE, the Cohn-Kanade, and the MMI image databases.
Automatic facial expression recognition (AFER) is of interest to researchers because of its importance for facial biometric-based intelligent support systems. It provides a behavioral measure to assess emotions, cognitive processes, and social interaction. Applications of AFER include robotics, human–computer interfaces, behavioral science, animation and computer games, educational software, emotion processing, and fatigue detection. Recognition is complicated by occlusion, lighting conditions, and variation of expressions across the population, or even for a single individual. An automatic system that copes with these difficulties helps in creating intelligent visual media for understanding different expressions, and this understanding in turn helps in building meaningful and responsive human–computer interfaces.
Approaches to facial expression recognition fall into two categories:
- Processing 2D static images.
- Processing image sequences.
The first approach, which is more difficult than processing image sequences because less information is available, often uses feature-based methods. Recognizing an expression from a single image requires robust and highly distinctive features to cope with variations in human subjects and imaging conditions. Several methods have been proposed for still images. Cottrell and Metcalfe proposed holistic representations based on principal component analysis (PCA), with feed-forward neural networks (NNs) for classification. Chen and Huang used clustering-based feature extraction to recognize only three facial expressions. Eigenface feature extraction based on PCA was proposed by Turk and Pentland. Rahardja et al. applied holistic representations and NNs to pyramid-structured images. Feng et al. applied local binary patterns for feature extraction and used a linear programming technique as the classifier. Lanitis et al. utilized deformable models to capture variations in shape and grey-level appearance. In the second approach, an image sequence displays one expression. The neutral face is used as a baseline, and facial expression recognition (FER) is based on the difference between the baseline and the subsequent input face images. Preliminary work on facial expressions, tracking the motion of 20 identified spots, was done by Suwa et al. Motion tracking of facial features in image sequences has been performed using optical flow, with expressions classified into six basic classes. The Fourier transform has also been utilized for feature extraction, with fuzzy C-means clustering applied to build a spatiotemporal model for each expression.
Facial coding is normally performed in two different ways: holistic and analytic. In the holistic approach, the face is treated as a whole. Methods in this category include [16, 17] optical flow, Fisher linear discriminants, NNs, active appearance models (AAMs), and Gabor filters. In the analytic approach, local features are used instead of the whole face: fiducial points describe the positions of important points on the face (e.g., eyes, eyebrows, mouth, and nose), together with the geometry or texture features around these points.
Gabor filters are widely used in texture analysis. These filters model simple cells in the primary visual cortex. Zafeiriou and Pitas showed that Gabor filters give the best performance in both analytic and holistic approaches. Gabor filters have been used for expression classification in [4, 20]. Although Gabor filters show high performance in FER, the main problem with them is how to select the optimal filters in terms of scale and orientation. For example, a bank of 40 filters (5 scales and 8 orientations) has been used. Because of the large number of convolution operations, such a bank requires large amounts of memory and computation. Moreover, with few training samples, the resulting feature dimensionality is very high. Normally, two types of facial features are used: permanent and transient. Permanent features include the eyes, lips, brows, and cheeks, and transient features include facial lines, wrinkles, and furrows. The eyebrows and mouth play the main role in facial expressions. Pardas and Bonafonte showed that expressions such as surprise, joy, and disgust have much higher recognition rates, since clear motion of the mouth and the eyebrows is involved.
In this article, both texture and geometric information of face fiducial points are combined to code different expressions. Gauss–Laguerre (GL) wavelets are used for texture analysis, and the positions of 18 fiducial points represent the deformation of the eyes, eyebrows, and mouth. The combination of these features is used for expression classification. The K-nearest neighbor (KNN) classifier is used to classify expressions based on the closest training examples in the feature space. The rest of the article is organized as follows: in the “G–L wavelets” section, a mathematical description of GL circular harmonic wavelets (CHWs) is presented; the feature extraction approach and the classification method are described in “The proposed approach” section; experimental results on the JAFFE, the Cohn-Kanade, and the MMI face databases are reported in the “Experiment results” section; finally, a conclusion is drawn in the “Conclusion” section.
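A circular harmonic function (CHF) is a polar-separable function of the general form

$$f_{n,k}(r, \theta) = h_{n,k}(r)\, e^{in\theta},$$

where $n$ is the order, $k$ is the degree of the CHF, and $h_{n,k}(r)$ is the radial profile.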
The same functions also appear in harmonic tomographic decomposition, and have been considered for the analysis of local image symmetry. CHFs have been employed for defining rotation-invariant pattern signatures. A family of orthogonal CHWs, forming a multi-resolution pyramid referred to as the circular harmonic pyramid (CHP), is utilized for coefficient generation and coding. Each CHW in the pyramid represents the image by translated, dilated, and rotated versions of a CHF. At the same time, for a fixed resolution, the CHP orthogonal system provides a local representation of the given image around a point in terms of CHFs. The self-steerability of each component of the CHP can be exploited for pattern analysis in the presence of rotation (in addition to translation and dilation), in particular for pattern recognition irrespective of orientation.
CHFs are complex, polar-separable filters characterized by a harmonic angular shape, which allows rotationally invariant descriptors to be built. A scale parameter is also introduced to perform a multi-resolution analysis. The GL filters, which belong to a family of orthogonal functions satisfying the wavelet admissibility condition required for multi-resolution wavelet pyramid analysis, are used. As with Gabor wavelets, any image may be represented by translated, dilated, and rotated replicas of the GL wavelet. For a fixed resolution, the GL CHFs provide a local representation of the image in a polar coordinate system centered at a given point, named the pivot point. This representation is called the GL transform.
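As an illustration of the polar-separable structure described above, the following minimal sketch evaluates a GL circular harmonic function at a single point. It assumes the normalization commonly given for Laguerre–Gauss CHFs in the circular harmonic wavelet literature, namely a radial profile proportional to $r^{|n|} L_k^{(|n|)}(2\pi r^2) e^{-\pi r^2}$; the article's own equation is not reproduced in this text, so the constants may differ. The function names and the `scale` parameter are hypothetical.

```python
import cmath
import math


def generalized_laguerre(k, alpha, t):
    """Generalized Laguerre polynomial L_k^(alpha)(t) via the standard
    three-term recurrence."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, 1.0 + alpha - t
    for m in range(1, k):
        nxt = ((2 * m + 1 + alpha - t) * curr - (m + alpha) * prev) / (m + 1)
        prev, curr = curr, nxt
    return curr


def gl_chf(n, k, r, theta, scale=1.0):
    """Gauss-Laguerre CHF of order n and degree k at polar position
    (r, theta). The radius is simply divided by `scale` (an assumption);
    no amplitude renormalization is applied for the dilation."""
    r = r / scale
    a = abs(n)
    norm = (
        (-1) ** k
        * 2 ** ((a + 1) / 2)
        * math.pi ** (a / 2)
        * math.sqrt(math.factorial(k) / math.factorial(a + k))
    )
    radial = (
        norm
        * r ** a
        * generalized_laguerre(k, a, 2 * math.pi * r * r)
        * math.exp(-math.pi * r * r)
    )
    # Harmonic angular part: rotating the pattern by phi multiplies the
    # response by exp(i * n * phi), so its magnitude is rotation invariant.
    return radial * cmath.exp(1j * n * theta)
```

Because the angular dependence is a pure phase factor, `abs(gl_chf(n, k, r, theta))` does not depend on `theta`, which is the property that makes magnitude-based descriptors insensitive to rotation.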
The proposed approach
In this section, the algorithmic steps of the proposed approach are explained. For each input image, the face area is localized first. Then, the features are extracted based on GL filters, and, finally, KNN classification is used for expression recognition.
Preprocessing is normally performed before feature extraction in FER to improve system performance. This step, which includes scaling, intensity normalization, and size equalization, aims to produce images that contain only a face expressing a certain emotion. Sometimes, histogram equalization is also used to adjust image brightness and contrast. To normalize the face, the image with the neutral expression is scaled so that it has a fixed distance between the eyes. No intensity normalization was applied, since the GL filters can extract an abundance of features without any preprocessing.
To extract facial features, the AAM is utilized. It is widely used in face recognition and expression classification because of its remarkable performance in extracting face shape and texture information. The AAM contains both a statistical shape model and texture information of the face, and performs matching by finding the model parameters that minimize the difference between the image and the synthesized model. We used 18 fiducial points to model the face and distinguish facial expressions. The features used to distinguish the expressions are explained in section “AAM”. In our experiment, the AAM has been created using different images from the three databases with different expressions. All images were roughly resized and cropped to 256 × 256. After creating the AAM, the eye positions in each image are automatically extracted, and the line that connects the inner corners of the eyes is used for normalization.
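The eye-line normalization mentioned above can be sketched as a similarity transform that makes the line between the inner eye corners horizontal and of fixed length. This is only an illustration under stated assumptions: the article does not give the target distance or the exact transform, so `target_distance`, the choice of the left inner corner as origin, and both function names are hypothetical.

```python
import math


def eye_alignment(left_eye, right_eye, target_distance):
    """Return (angle, scale): the tilt of the eye line in radians and the
    factor that maps the inter-eye distance to target_distance."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        raise ValueError("eye corners coincide")
    return math.atan2(dy, dx), target_distance / distance


def normalize_points(points, left_eye, right_eye, target_distance):
    """Translate so left_eye is the origin, rotate the eye line onto the
    x-axis, and scale it to target_distance."""
    angle, scale = eye_alignment(left_eye, right_eye, target_distance)
    c, s = math.cos(-angle), math.sin(-angle)
    normalized = []
    for x, y in points:
        tx, ty = x - left_eye[0], y - left_eye[1]
        normalized.append((scale * (c * tx - s * ty), scale * (s * tx + c * ty)))
    return normalized
```

After this step the inner eye corners sit at `(0, 0)` and `(target_distance, 0)`, so distances measured on different faces become comparable.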
AAM is an algorithm for matching a statistical shape model to an image with both shape and appearance variations. In facial expression recognition, for example, the shape deformations arise from facial expression changes and pose variations, and the texture variations are caused by illumination. These variations are represented by a linear model such as PCA. The main purpose of the AAM is therefore first to define a model and then to find, using a fitting algorithm, the parameters that best match a given new image to the built model.
where α is the appearance parameter. After finding the shape and appearance parameters, a piecewise affine warp is used to construct the AAM by locating each pixel of appearance onto the inner side of the current shape. The goal is to minimize the difference between the warped image and the appearance image.
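The linear (PCA-like) model underlying the AAM can be illustrated with a minimal sketch: an instance is the mean plus a weighted sum of variation modes, and the weights play the role of the shape or appearance parameters. This is a generic linear-model example, not the authors' implementation; it assumes orthonormal modes, and the function names are hypothetical.

```python
def synthesize(mean, modes, params):
    """Return mean + sum_i params[i] * modes[i] (a model instance)."""
    instance = list(mean)
    for weight, mode in zip(params, modes):
        for j, value in enumerate(mode):
            instance[j] += weight * value
    return instance


def project(sample, mean, modes):
    """Recover the parameters of `sample`, assuming the modes are
    orthonormal: each parameter is the dot product of the centered
    sample with the corresponding mode."""
    centered = [s - m for s, m in zip(sample, mean)]
    return [sum(c * v for c, v in zip(centered, mode)) for mode in modes]


# Toy example: a 4-dimensional vector with two orthonormal modes.
mean_shape = [0.0, 0.0, 1.0, 0.0]
shape_modes = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.6, 0.8, 0.0]]
instance = synthesize(mean_shape, shape_modes, [2.0, -1.0])
recovered = project(instance, mean_shape, shape_modes)
```

A fitting algorithm searches over such parameters so that the synthesized instance matches the new image as closely as possible.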
The feature vector consists of two types: textural features, which are extracted globally by applying the GL filter, and the geometric information of local fiducial points.
Textural feature extraction
Geometric feature extraction
The AAM is applied to extract the 18 points. The distances are labeled by d’s as shown in Figure 4.
For the upper portion of the face, ten distances are calculated, according to Table 1.
Table 1 Upper (distances 1–10) and lower (distances 11–15) face geometric distances
Upper face:
- left inner brow to left inner eye corner
- right eye height
- right inner brow to right inner eye corner
- left top eye point to the line connecting the left eye corners
- left top brow to the line connecting the left eye corners
- right top eye point to the line connecting the right eye corners
- right top brow to the line connecting the right eye corners
- left bottom eye point to the line connecting the left eye corners
- left eye height
- right bottom eye point to the line connecting the right eye corners
Lower face:
- left lip corner to the line connecting the left eye corners
- right lip corner to the line connecting the right eye corners
- top lip to the line connecting the lip corners
For the lower portion of the face, five distances are calculated, according to Table 1.
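The distances in Table 1 are of two kinds: point-to-point (e.g., inner brow to inner eye corner) and point-to-line (e.g., top brow to the line connecting the eye corners). The sketch below shows how both could be computed from 2D fiducial points. The landmark names and coordinates are made up for illustration, and the function names are hypothetical.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite line through
    a and b (cross-product magnitude divided by the segment length)."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    if length == 0:
        raise ValueError("line endpoints coincide")
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / length


# Hypothetical landmark coordinates (x, y) in pixels.
landmarks = {
    "left_inner_brow": (110.0, 90.0),
    "left_top_brow": (95.0, 84.0),
    "left_inner_eye": (112.0, 110.0),
    "left_outer_eye": (80.0, 110.0),
}

brow_to_eye = point_distance(
    landmarks["left_inner_brow"], landmarks["left_inner_eye"]
)
brow_to_eye_line = point_line_distance(
    landmarks["left_top_brow"],
    landmarks["left_outer_eye"],
    landmarks["left_inner_eye"],
)
```

The resulting scalars would then be collected into the geometric part of the feature vector.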
KNN is a well-known instance-based classification algorithm that makes no assumptions about the underlying data distribution. The similarity between the test sample and each training sample is calculated, and the k most similar training samples are determined. The class of the test sample is then assigned based on the classes of these k nearest neighbors.
This classifier suits multi-class problems in which the decision is based on a small neighborhood of similar objects. In the classification procedure, the training data are first placed in an n-dimensional space, where n is the number of features. The training data consist of a set of vectors, each labeled with its associated class (the number of classes is arbitrary). The number k defines how many neighbors influence the classification. Following the suggestion of Sohail and Bhattacharya, the best classification is obtained when k = 3. This suggestion was based on different experiments and on observing the classification rate on the JAFFE database. The same classifier is used for the Cohn-Kanade and the MMI databases as well.
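A minimal sketch of the KNN decision rule with k = 3 follows. The article does not state the distance measure, so Euclidean distance is an assumption here, as is the tie-breaking rule (the nearest neighbor among the tied classes wins); the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training
    vectors (Euclidean distance, assumed)."""
    neighbors = sorted(
        ((math.dist(vec, query), label)
         for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
    )[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Neighbors are sorted by distance, so the first label that reaches
    # the top vote count also resolves ties in favor of the closest one.
    for _, label in neighbors:
        if votes[label] == top:
            return label


# Toy 2D feature vectors with two expression classes.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["happy", "happy", "happy", "sad", "sad", "sad"]
prediction = knn_classify(train_x, train_y, (0.5, 0.5), k=3)
```

In the setting described here, each training vector would be the fused texture and geometric feature vector of one image.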
To evaluate the performance of the proposed method, the JAFFE image database, the Cohn-Kanade, and the MMI databases have been used. Eighteen fiducial points have been obtained via the AAM model, and two types of information have been extracted: geometric and textural. MATLAB was used for implementation.
“Leave-One-Out” cross-validation: for each expression from each subject, one image is left out, and the rest are used for training.
Cross-validation: the database is randomly partitioned into ten distinct segments; nine partitions are used for training, and the remaining partition is used to test performance. The procedure is repeated so that every equal-sized set is used once as the test set. Finally, the average of the ten experiments is reported.
Expresser-based segmentation: the database is divided into several segments, each of which corresponds to a subject. For the JAFFE database, 213 expression images, posed by 10 subjects, are partitioned into 10 segments, each corresponding to one subject. For the Cohn-Kanade database, 375 video sequences were used, that is, over 4,000 images. Nine out of ten segments are used for training and the tenth for testing. This is repeated so that each of the ten segments is used in testing. The average results of those ten experiments are reported.
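The random ten-segment partition and the subject-based partition described above can be sketched as index generators. This is an illustrative sketch only: the function names and the fixed random seed are hypothetical, and the folds are made as equal in size as the sample count allows.

```python
import random


def k_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) for a random partition into
    n_folds nearly equal segments; each segment is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def subject_splits(subject_ids):
    """Yield (subject, train_indices, test_indices), holding out all
    images of one subject at a time."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies into the reported rate."""
    return sum(fold_accuracies) / len(fold_accuracies)
```

For the 213 JAFFE images, `k_fold_splits(213)` produces ten test sets of 21 or 22 images each; the reported figure is then the mean of the ten per-fold accuracies.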
Recognition accuracy (%) on the JAFFE database for different approaches
Confusion matrix for the Leave-One-Out method (the JAFFE database)
Recognition accuracy (%) on the Cohn-Kanade database for different approaches
Confusion matrix for the Leave-One-Out method (the Cohn-Kanade database)
Comparison of facial expression recognition for the Cohn-Kanade database. Columns: method, number of selected video sequences, recognition rate (average %). Methods compared: Zhan et al., Shan et al., Bartlett et al., Littlewort et al., Yang et al., Aleksic and Katsaggelos, Zafeiriou and Pitas, and Kotsia and Pitas.
Recognition accuracy (%) on the MMI database for different approaches
The experimental results show that the proposed method meets the criteria of accuracy and efficiency for facial expression classification. In terms of accuracy, it outperforms some existing approaches that used the same databases. On the JAFFE database, the average recognition rate of the proposed approach is 96.71% with the “Leave-One-Out” method and 95.04% with cross-validation. On the Cohn-Kanade database, the average recognition rate is 92.20% with the “Leave-One-Out” method and 90.37% with cross-validation. On the MMI database, the average recognition rate is 87.66% with the “Leave-One-Out” method and 85.97% with cross-validation. Few articles have reported emotion recognition accuracy on the MMI database; most report the recognition rate for action units (AUs). Sánchez et al. achieved 92.42%, but it is not clear how many video sequences were used. Cerezo et al. reported a 92.9% average recognition rate on 1,500 still images from a mix of the MMI and Cohn-Kanade databases. Shan et al. used 384 images from the MMI database and reported an average recognition rate of 86.9%.
For the “Leave-One-Out” procedure in Table 5, all image sequences are divided into six classes, each corresponding to one of the six expressions. Four sets, each containing 20% of the data for each class, chosen randomly, were created to be used as training sets, while the remaining 20% were used as the test set.
The classification procedure is repeated five times. In each cycle, the samples in the testing set are returned to the current training set, and a new set of samples (20% of the samples for each class) is drawn to form the new test set, with the remaining samples forming the new training set. Finally, the average classification rate is the mean of the success rates over the five cycles.
This article proposes a combined texture/geometric feature selection for facial expression recognition. The GL circular harmonic filter is applied, for the first time, to facial expression identification. The advantages of this filter are its rich frequency extraction capability for texture analysis, its rotation invariance, and its multiscale nature. The geometric information of fiducial points is added to the texture information to construct the feature vector. Given a still expression image, normalization is performed first. The extracted features are then passed to a KNN classifier. Experiments showed that the selected features represent the facial expression effectively, with average success rates of 96.71, 92.2, and 87.66% on the JAFFE, Cohn-Kanade, and MMI databases, respectively, under the “Leave-One-Out” strategy for accuracy estimation, and 95.04, 90.37, and 85.97% under the cross-validation method. These results are comparable with those reported for other approaches: they show a better success rate on the JAFFE database and fall in the same range as existing approaches on the Cohn-Kanade database. Further development of the proposed approach includes refining the local and global feature selection, as well as testing other classification techniques.
- Yuki M, Maddux WW, Masuda T: Are the windows to the soul the same in the East and West? Cultural differences in using the eyes and mouth as cues to recognize emotions in Japan and the United States. J. Exp. Soc. Psychol. 2007, 43(2):303-311. 10.1016/j.jesp.2006.02.004
- Suwa M, Sugie N, Fujimora K: A preliminary note on pattern recognition of human emotional expression, in Proceedings of the Fourth International Joint Conference on Pattern Recognition. Kyoto, Japan; 1978:408-410.
- Bashyal S, Venayagamoorthy GK: Recognition of facial expressions using Gabor wavelets and learning vector quantization. Eng. Appl. Artif. Intell. 2008, 21: 1056-1064. 10.1016/j.engappai.2007.11.010
- Ekman P, Friesen WV: Manual for the Facial Action Coding System. Consulting Psychologists Press, Palo Alto, CA; 1977.
- Audio (MPEG Mtg, Atlantic City, 1998): MPEG Video and SNHC, Text of ISO/IEC FDIS 14 496–3. Doc. ISO/MPEG N2503
- Ekman P, Friesen WV: Constants across cultures in the face and emotions. J. Personal Soc. Psychol. 1971, 17(2):124-129.
- Cottrell G, Metcalfe J: Face, gender and emotion recognition using holons, in Advances in Neural Information Processing Systems 3. Morgan Kaufmann, San Mateo; 1991:564-571.
- Chen X, Huang T: Facial expression recognition: a clustering based approach. Pattern Recognit. Lett. 2003, 24: 1295-1302. 10.1016/S0167-8655(02)00371-9
- Turk M, Pentland A: Eigenfaces for recognition. J. Cogn. Neurosci. 1991, 3: 71-86. 10.1162/jocn.1991.3.1.71
- Rahardja A, Sowmya A, Wilson W: A neural network approach to component versus holistic recognition of facial expressions in images. Intell. Robots Comput. Vis. X: Algorithms and Techniques 1991, 1607: 62-70.
- Feng X, Pietikäinen M, Hadid A: Facial expression recognition based on local binary patterns. Pattern Recognit. Image Anal. 2007, 17(4):592-598. 10.1134/S1054661807040190
- Lanitis A, Taylor C, Cootes T: Automatic interpretation and coding of face images using flexible models. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19(7):743-756. 10.1109/34.598231
- Suwa M, Sugie N, Fujimora K: A preliminary note on pattern recognition of human emotional expression, in Proceedings of the 4th International Joint Conference on Pattern Recognition. Kyoto, Japan; 1978:408-410.
- Yacoob Y, Davis L: Recognizing faces showing expressions, in International Workshop Automatic Face and Gesture Recognition. Zurich, Switzerland; 1995:278-283.
- Xiang T, Leung MKH, Cho SY: Expression recognition using fuzzy spatio-temporal modeling. Pattern Recognit. 2008, 41(1):204-216. 10.1016/j.patcog.2007.04.021
- Essa I, Pentland A: Coding, analysis, interpretation and recognition of facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19(7):757-763. 10.1109/34.598232
- Fasel IR, Bartlett MS, Movellan JRA: A comparison of Gabor filter methods for automatic detection of facial landmarks. IEEE 5th International Conference on Automatic Face and Gesture Recognition, Washington, DC; 2002:242-248.
- Fasel B, Luettin J: Automatic facial expression analysis: a survey. Pattern Recognit. 2003, 36(1):259-275. 10.1016/S0031-3203(02)00052-3
- Zafeiriou S, Pitas I: Discriminant graph structures for facial expression recognition. IEEE Trans. Multimed. 2008, 10(8):1528-1540.
- Lee CC, Shih CY: Gabor feature selection for facial expression recognition. International Conference on Signals and Electronic Systems, Gliwice, Poland; 2010:139-142.
- Deng H, Zhu J, Lyu MR, King I: Two-stage multi-class AdaBoost for facial expression recognition. Proceedings of IJCNN07, Orlando, USA; 2007:3005-3010.
- Pardas M, Bonafonte A: Facial animation parameters extraction and expression recognition using Hidden Markov Models. Signal Process: Image Commun 2002, 17: 675-688. 10.1016/S0923-5965(02)00078-4
- Jacovitti G, Neri A: Multiscale image features analysis with circular harmonic wavelets. Proc. SPIE: Wavelets Appl. Signal Image Process. 1995, 2569: 363-372.
- Capdiferro L, Casieri V, Laurenti A, Jacovitti G: Multiple feature based multiscale image enhancement, in Digital Signal Processing Conference, Greece; 2002, 2: 931-934.
- Ahmadi H, Poursaberi A, Azizzadeh A, Kamarei M: An efficient iris coding based on Gauss-Laguerre wavelets, in 2nd IAPR/IEEE International Conference on Biometrics, Seoul, South Korea; 2007, 4642: 917-926.
- Sohail A, Bhattacharya P: Classification of facial expressions using k-nearest neighbor classifier, in Computer Vision/Computer Graphics Collaboration Techniques, vol. 4418; 2007:555-566.
- Lyons M, Akamatsu S, Kamachi M, Gyoba J: Coding facial expressions with Gabor wavelets. Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan; 1998:200-206.
- Kanade T, Cohn JF, Tian Y: Comprehensive database for facial expression analysis. Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France; 2000:46-53.
- Pantic M, Valstar MF, Rademaker R, Maat L: Web-based database for facial expression analysis. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME’05), Amsterdam, Netherlands; 2005:317-321.
- Shan C, Shaogang G, McOwan PW: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 2009, 27: 803-816. 10.1016/j.imavis.2008.08.005
- Lyons M, Budynek J, Akamatsu S: Automatic classification of single facial images. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21: 1357-1362. 10.1109/34.817413
- Feng X, Lv B, Li Z, Zhang J: A novel feature extraction method for facial expression recognition. Proceeding of JCIS, Taiwan; 2006:371-375.
- Zhi R, Ruan Q: Facial expression recognition based on two dimensional discriminant locality preserving projections. Neuro Comput. 2008, 71: 1730-1734.
- Zhang Z, Lyons M, Schuster M, Akamatsu S: Comparison between geometry based and Gabor wavelet based facial expression recognition using multi layer perceptron. Proceeding 3rd International Conference on Automatic Face and Gesture Recognition, Nara, Japan; 1998:454-459.
- Liejun W, Xizhong Q, Taiyi Z: Facial expression recognition using improved support vector machine by modifying kernels. Inf. Technol. J. 2009, 8(4):595-599. 10.3923/itj.2009.595.599
- Shih FY, Chuang C, Wang PSP: Performance comparisons of facial expression recognition in Jaffe database. IJPRAI 2008, 445-459.
- Zhao L, Zhuang G, Xu X: Facial expression recognition based on PCA and NMF. Proceeding of the 7th World Congress on Intelligent Control and Automation, Chongqing, China; 2008:6822-6825.
- Guo G, Dyer CR: Learning from examples in the small sample case: face expression recognition. IEEE Trans. Syst. Man Cybern. B 2005, 35(3):477-488. 10.1109/TSMCB.2005.846658
- Zhan Y, Ye J, Niu D, Cao P: Facial expression recognition based on Gabor wavelet transformation and elastic templates matching. Int. J. Image Graph. 2006, 6(1):125-138. 10.1142/S0219467806002112
- Shan C, Gong S, McOwan PW: Robust facial expression recognition using local binary patterns, in Proceeding of ICIP05, Genoa, Italy; 2005, 2: 370-373.
- Bartlett MS, Littlewort G, Fasel I, Movellan JR: Real time face detection and facial expression recognition: development and applications to human computer interaction, in IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin; 2003, 5: 53-53.
- Littlewort G, Bartlett M, Fasel I, Susskind J, Movellan J: Dynamics of facial expression extracted automatically from video. Proceeding of IEEE Conf. Computer Vision and Pattern Recognition, Workshop on Face Processing in Video, New York, USA; 2004:80-88.
- Yang P, Liu Q, Metaxas DN: Exploring facial expressions with compositional features. Proceeding of CVPR, San Francisco, USA; 2010:2638-2644.
- Tian Y: Evaluation of face resolution for expression analysis. Proceeding of IEEE Workshop Face Processing in Video, Washington, DC, USA; 2004:82-82.
- Aleksic SP, Katsaggelos KA: Automatic facial expression recognition using facial animation parameters and multi-stream HMMS. IEEE Trans. Inf. Forensics Secur. 2006, 1(1):3-11. 10.1109/TIFS.2005.863510
- Kotsia I, Pitas I: Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Trans. Image Process 2007, 16(1):172.
- Sánchez A, Ruiz JV, Ana M: Belén, AS Montemayor, H Javier, P Juan José. Differential optical flow applied to automatic facial expression recognition. Neurocomputing 2011, 74(8):1272-1282.
- Cerezo E, Hupont I, Baldassarri S, Ballano S: Emotional facial sensing and multimodal fusion in a continuous 2D affective space. Ambient Intell. Hum. Comput. 2012, 3: 31-46. 10.1007/s12652-011-0087-6
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.