Robust gait-based gender classification using depth cameras
© Igual et al.; licensee Springer. 2013
Received: 5 July 2012
Accepted: 16 October 2012
Published: 2 January 2013
This article presents a new approach for gait-based gender recognition using depth cameras, that can run in real time. The main contribution of this study is a new fast feature extraction strategy that uses the 3D point cloud obtained from the frames in a gait cycle. For each frame, these points are aligned according to their centroid and grouped. After that, they are projected into their PCA plane, obtaining a representation of the cycle particularly robust against view changes. Then, final discriminative features are computed by first making a histogram of the projected points and then using linear discriminant analysis. To test the method we have used the DGait database, which is currently the only publicly available database for gait analysis that includes depth information. We have performed experiments on manually labeled cycles and over whole video sequences, and the results show that our method improves the accuracy significantly, compared with state-of-the-art systems which do not use depth information. Furthermore, our approach is insensitive to illumination changes, given that it discards the RGB information. That makes the method especially suitable for real applications, as illustrated in the last part of the experiments section.
1 Introduction
In this article, we focus our attention on the problem of gender classification. Almost any gait classification task can benefit from a previous robust gender recognition phase. However, current systems for gender recognition still fall short of human abilities. In short, there are three main difficulties in automatic gait classification: (i) the human figure segmentation, which is usually computationally demanding, (ii) the changes of viewpoint, and (iii) the partial occlusions. This study deals particularly with the first two, and presents some future lines of research regarding the third one.
In order to improve the gait-based gender classification methods, we propose to use depth cameras. More concretely, we present a gait feature extraction system that uses just depth information. In particular, we used Microsoft’s Kinect, which is an affordable device provided with an RGB camera and a depth sensor. It records RGBD videos at 30 frames per second at a resolution of 640×480. This device has rapidly attracted interest among the computer vision community. For example, Shotton et al.  won the best paper award at CVPR 2011 for their work on human pose recognition using Kinect, while ICCV 2011 included a workshop on depth cameras, especially focused on the use of Kinect for computer vision applications.
In the recent literature, we can find some papers on the use of Kinect for human detection, body figure segmentation, or pose estimation [8–10]. However, there are few works on gait analysis that use this device, although this topic can benefit from the depth information. These works are reviewed in the next section. Notice that the use of depth cameras simplifies the human figure segmentation stage, making it possible to process this information considerably faster than before. In addition, the depth information offers the possibility of extracting gait features that are more robust against view changes.
In this study, we present a new feature extraction system for gait-based gender classification that uses the 3D point cloud of the subject per frame as a source. These point clouds are aligned according to their centroid and projected into their PCA plane. Then, a 2D histogram is computed in this plane, and it is divided into five parts, to compute the final discriminative features. We use a support vector machine (SVM) during the classification stage. The proposed method is detailed in Section 3.
To test our approach we used the DGait database. This database is currently the only publicly available database for gait analysis that includes depth information. It has been acquired with Kinect in an indoor semi-controlled environment, and contains videos of 53 subjects walking in different directions. Subject, gender and age labels are also provided. Furthermore, a cycle per direction and subject has been manually labeled as well. We performed different experiments with the DGait database and compared our results with the state-of-the-art method for gait-based gender recognition proposed by Li et al. Our system shows higher performance across all the tests and higher robustness against changes in viewpoint. Moreover, we show results of a test performed on real environment data, where we deal with partial occlusions.
The rest of the article is organized as follows. The next section offers a brief overview of the recent literature on gait classification. Then, Section 3 details the proposed feature extraction method. In Section 4, we present the database and the results of our experiments. Finally, the last section concludes the study and proposes a line of further research.
2 Related work
In general, there are two main approaches for gait analysis, termed model-based and model-free. The first one encodes the gait information using body and motion models, and extracts features from the parameters of the models. In the second one, no prior knowledge of the human figure or walking process is assumed. While model-based methods have shown interesting robustness against view changes or occlusions [13, 14], they are usually computationally demanding. That makes them less suitable for real-time applications than model-free approaches, which do not have to model the constraints of the walking movements.
Most of the existing methods for gait classification deal only with side views. Nevertheless, we can find some approaches dealing with the multi-view problem. For example, Makihara et al. proposed a spatio-temporal silhouette volume of a walking person to encode the gait features, and then applied a view transformation model using singular value decomposition to obtain a more view-invariant feature vector. A further study on the view dependency in gait recognition was later presented by Makihara et al. On the other hand, Yu et al. presented a set of experiments to evaluate the effect of view angle on gait recognition, using the GEI images as features and classifying with Nearest Neighbors. Finally, Kusakunniran et al. proposed a view transformation model (VTM), which adopts a multi-layer perceptron as a regression tool. The method estimates gait features from one view using selected regions of interest from another view. With this strategy, they normalized gait features of different views into the same view before measuring gait similarity. They tested their model using several large gait databases. Their results show a significantly improved performance for both cross-view and multi-view gait recognition, in comparison with other typical VTM methods.
As previously stated, we can find some work on gait analysis that uses depth information. For instance, Ioannidis et al. presented a work on gait recognition using depth images. More recently, Sivapalan et al. presented a new approach for people recognition based on frontal gait energy volumes. In their study, they use a gait database that includes depth information, but the authors have not published it yet. Moreover, this dataset includes just 15 different subjects in frontal view. For this reason, this database cannot be used for this study, which aims to analyze gait from different points of view.
3 Method for gait-based gender recognition
We present a feature extraction method which is partially based on the study of Li et al. We selected this algorithm as a baseline for its tradeoff between simplicity and robustness, given that our goal is to deal with real-time applications. Moreover, it does not actually use RGB information, given that the features are extracted from the binary body silhouettes. That makes this method insensitive to illumination changes.
The baseline method consists of the following steps:
1. Data preprocessing. For all the images in a cycle, we perform the following preprocessing:
   (a) We segment human silhouettes using the depth map. The segmentation algorithm is proprietary software included in the communication libraries of the Kinect (OpenNI middleware from ).
   (b) We resize images to d_x × d_y, to ensure that all silhouettes have the same height.
   (c) We center the upper half of the silhouette with respect to its horizontal centroid, to ensure that the torso is aligned in all the sequences.
2. Compute GEI. GEI is defined as the average of silhouettes in a gait cycle (composed of T frames):
   $$\mathrm{GEI}(i,j) = \frac{1}{T}\sum_{t=1}^{T} I(i,j,t) \qquad (1)$$
   where i and j are the image coordinates, and I(·,·,t) is the binary silhouette image obtained from the t-th frame.
3. Parts definition. We divide the GEI into five parts corresponding to head and hair, chest, back, waist and buttocks, and legs, as defined in .
4. PCA. For every part, we compute PCA (keeping the principal components that preserve 98% of the data variance).
5. FLD. We compute one single feature per part using linear discriminant analysis, thus obtaining a final feature vector with five components.
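The GEI averaging, PCA, and FLD steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the article: the function names, the toy data, the choice of "the top half of the image" as a stand-in part, and the small `ridge` term added to the within-class scatter are all hypothetical.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average binary silhouettes of shape (T, rows, cols) over the T frames."""
    frames = np.asarray(silhouettes, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty stack of shape (T, rows, cols)")
    return frames.mean(axis=0)

def fit_pca(X, variance=0.98):
    """Return the mean and the fewest principal axes that reach `variance`."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(ratio, variance)) + 1, len(s))
    return mean, vt[:k]

def fit_fld(Z, y, ridge=1e-6):
    """Two-class Fisher direction; `ridge` is a hypothetical regularizer."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    Sw += ridge * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Toy data (not from DGait): 30 random "cycles" of 8 frames at 6x4 pixels.
rng = np.random.default_rng(0)
labels = np.array([0] * 15 + [1] * 15)
density = (0.3 + 0.4 * labels)[:, None, None, None]
cycles = rng.random((30, 8, 6, 4)) < density
geis = np.stack([gait_energy_image(c) for c in cycles])

# One hypothetical part (top half of the GEI), flattened to a vector per cycle.
part = geis[:, :3, :].reshape(len(geis), -1)
mean, axes = fit_pca(part)
projected = (part - mean) @ axes.T
w = fit_fld(projected, labels)
feature = projected @ w  # one scalar feature per cycle for this part
```

Repeating the last four lines for each of the five parts would give the five-component feature vector described above.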
Our method consists of the following steps:
1. 3D points alignment. For all the images of a cycle, silhouettes are segmented based on the depth map.
   (a) For each frame, we keep the 3D point cloud of the subject contour.
   (b) We compute the centroid of the 3D point cloud and use it to align the points.
   (c) We accumulate all the centered 3D points, obtaining a single 3D point cloud that summarizes the entire cycle.
2. PCA plane computation. To ensure orientation invariance in the 3D-GEI features, we compute the PCA plane of the accumulated point cloud and use it to represent the 3D information. We rotate the plane so that the y-axis points up and the x-axis points to the right.
3. Point projection. We project all the points into the PCA plane. The PCA plane contains the main orientation of the person in the 3D frame, and projecting the data into this plane allows us to capture orientation-invariant shapes.
4. 3D histogram definition. We consider the smallest window that contains all the projected points, divide it into a grid of m_x × m_y bins, and compute a histogram image, as illustrated in Figure 4. More concretely, each cell of the grid stores the number of points whose projection falls in that cell.
5. Parts definition. We divide the 3D histogram into five parts corresponding to body parts, as is done for 2D-GEI images. Figure 5 shows an example of the histogram image with the corresponding parts.
6. PCA. For every part, we compute PCA (keeping the principal components that explain 98% of the data variance).
7. FLD. We compute one single feature per part using linear discriminant analysis, thus obtaining a final feature vector with five components.
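The alignment, plane computation, projection, and histogram steps above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `cycle_histogram`, the `up` vector (which assumes the sensor's y-axis is roughly vertical), and the synthetic point clouds are assumptions. The sign of the x-axis depends on the sign of the plane normal returned by the SVD, so a real system would need an extra convention to decide which direction is "right".

```python
import numpy as np

def cycle_histogram(frames, m_x=64, m_y=45, up=(0.0, 1.0, 0.0)):
    """Build the histogram image of one cycle.

    frames: sequence of (N_t, 3) arrays, one contour point cloud per frame.
    Returns an array of shape (m_y, m_x) whose cells count projected points.
    """
    # Alignment: centre every frame on its own centroid, then pool the cycle.
    centred = []
    for points in frames:
        points = np.asarray(points, dtype=np.float64)
        centred.append(points - points.mean(axis=0))
    cloud = np.vstack(centred)

    # PCA plane: spanned by the two directions of largest variance,
    # so the direction of smallest variance is the plane normal.
    _, _, vt = np.linalg.svd(cloud - cloud.mean(axis=0), full_matrices=False)
    normal = vt[2]

    # In-plane axes: y is the projection of `up` onto the plane (assumption),
    # x is perpendicular to y inside the plane.
    up = np.asarray(up, dtype=np.float64)
    y_axis = up - (up @ normal) * normal
    y_axis /= np.linalg.norm(y_axis)
    x_axis = np.cross(y_axis, normal)

    # Projection: 2D coordinates of every point in the plane.
    u = cloud @ x_axis
    v = cloud @ y_axis

    # Histogram over the smallest window containing all projected points.
    hist, _, _ = np.histogram2d(v, u, bins=(m_y, m_x))
    return hist[::-1]  # flip rows so that "up" is at the top of the image

# Synthetic cycle (not DGait data): 20 frames, each shifted by a random offset.
rng = np.random.default_rng(1)
scale = np.array([0.3, 0.9, 0.1])
frames = [rng.normal(size=(200, 3)) * scale + rng.normal(size=3)
          for _ in range(20)]
hist = cycle_histogram(frames)
```

The resulting image plays the role that the GEI plays in the baseline: it is split into parts and each part is reduced with PCA and FLD.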
4 Experiments
To test the proposed method, we perform different experiments with the DGait database. This database contains DRGB gait video sequences from different points of view. The next section briefly describes this dataset. We use the manually labeled cycles to perform a first evaluation, and then we test the method without these labels on the entire trajectories of each subject. In these tests, we compare our results with the 2D feature extraction method described in . We denote this method by 2D-FE, while our method is denoted by 3D-FE. Notice that both methods extract a final feature vector of five components. In all the experiments, we used the OpenNI middleware from  to segment silhouettes in the scene, and we classify with SVM. Concretely, we used the OSU-SVM toolbox for Matlab .
In the last part of this section, we show results computed in real-time in a video acquired in a non-controlled environment using our 3D-FE method.
4.1 The DGait database
The DGait database was acquired in an indoor environment, using Microsoft’s Kinect. The dataset contains DRGB video sequences from 53 subjects, 36 male (67.9%) and 17 female (32.1%), most of them Caucasian. This database can be viewed and downloaded at the following address: http://www.cvc.uab.es/DGaitDB.
The database contains one video per subject, containing all the sequences. The labels provided with the database are subject, gender, and age. Also the initial and final frames of an entire gait cycle per direction and subject are provided. Some baseline results of gait-based gender classification using this database are shown in .
In all the experiments, we considered images of dimensions d_x = 256, d_y = 180, and 3D histograms of size m_x = 64, m_y = 45.
4.2 Experiments on the labeled cycles
In these experiments, we considered for each subject the manually labeled cycle per sequence. This is a total of 11 cycles per subject, and we group them into three categories: diagonal (denoted by D), side (denoted by S), and frontal (denoted by F).
First, we performed leave-one-subject-out validation on the set of 53 subjects of the DGait database. In each run, we trained a classifier with the cycles of all the subjects except one and estimated the gender of each of the 11 cycles of the test subject separately. The RBF parameter is learned in the leave-one-subject-out validation process and is set to σ=0.007 for 2D-FE, and σ=0.02 for 3D-FE.
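The leave-one-subject-out protocol can be sketched as a simple index generator. This is a minimal sketch, assuming cycles are stored in a flat list with one subject identifier per cycle; the function name and the layout are hypothetical, and the classifier training step is deliberately left out because the kernel parameterization of the toolbox is not specified here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# 53 subjects with 11 labeled cycles each, as in the experiment above.
subject_ids = [subject for subject in range(53) for _ in range(11)]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train, test in folds[:1]:
    # A classifier would be trained on `train` and evaluated on each of the
    # 11 cycles in `test` separately.
    print(held_out, len(train), len(test))
```

Splitting by subject rather than by cycle keeps all cycles of the test subject out of the training set, which is what makes the estimate subject-independent.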
Results of the leave-one-subject-out validation on labeled cycles
In the reported measures, TF and TM denote true female and true male, respectively, while FF and FM denote false female and false male, respectively.
Notice that 3D-FE improves on 2D-FE in all the measures. In particular, the largest improvement of 3D-FE is in the F-recall. On the other hand, observe that M-recall is higher than F-recall in all the cases. This is because the training data are unbalanced, since the database includes fewer females than males. However, 3D-FE can represent the gait patterns more accurately even in the case of unbalanced data.
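The formulas for these measures did not survive in this version of the text, so the following sketch uses the usual definitions of accuracy and per-class recall. It assumes that FF counts males predicted as female and FM counts females predicted as male; both the function name and the counts in the example are illustrative and are not results from the article.

```python
def gender_metrics(tf, tm, ff, fm):
    """Accuracy and per-class recall from the four confusion counts.

    Assumption: ff = males predicted as female, fm = females predicted as male.
    """
    total = tf + tm + ff + fm
    return {
        "accuracy": (tf + tm) / total,
        "f_recall": tf / (tf + fm),  # share of females correctly recognized
        "m_recall": tm / (tm + ff),  # share of males correctly recognized
    }

# Made-up counts for illustration only (17 females, 36 males).
example = gender_metrics(tf=12, tm=33, ff=3, fm=5)
```

With unbalanced classes, accuracy alone can hide a weak recall on the smaller class, which is why the two recalls are reported separately.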
Results of the leave-one-orientation-out on labeled cycles
4.3 Experiments on the video sequences
Using the DGait database, we evaluated the performance of the leave-one-subject-out experiment for each frame, making no use of the labeled cycles of the testing subject. Concretely, we trained a classifier with the cycles of all the subjects except the subject to test, and estimated the gender of the subject at each frame of the whole trajectory, separately. Thus, given a specific frame, we compute the 2D-FE and 3D-FE features using a sliding window on the frame sequence of size 20. This size is the mean length of the labeled cycles in the DGait database.
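The per-frame evaluation can be pictured with a small window generator. This is a sketch under an assumption: the text does not say how the window is positioned relative to the frame being classified, so the code below uses a trailing window that ends at the current frame. The function name is hypothetical.

```python
def sliding_windows(n_frames, size=20):
    """Yield (frame_index, window), where window is the range of `size`
    consecutive frames ending at frame_index (trailing window, an assumption).
    Frames with fewer than `size` frames of history are skipped."""
    for end in range(size - 1, n_frames):
        yield end, range(end - size + 1, end + 1)

windows = list(sliding_windows(n_frames=25))
first_frame, first_window = windows[0]
last_frame, last_window = windows[-1]
```

Each window would be treated like one gait cycle: its frames are passed to the feature extraction, and the resulting five-component vector is classified.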
Results of the leave-one-subject-out on frames
Finally, in this experiment, we also classified the gender of each subject from the previous frame-level results. For that, we imposed a threshold of 60% on the percentage of classified frames: a subject is assigned a gender when at least 60% of that subject's frames are classified as that gender. The correct gender classification rates obtained are 65.38% for 2D-FE and 92.31% for 3D-FE, a clear improvement for 3D-FE.
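The subject-level decision can be sketched as a thresholded vote over the frame labels. The sketch assumes that the threshold is inclusive ("at least 60%") and that a subject whose frames do not reach the threshold for either gender is left undecided; the text does not state either point, and the function name and label encoding are hypothetical.

```python
def subject_gender(frame_labels, threshold=0.6):
    """Return 'F' or 'M' when that label covers at least `threshold` of the
    frames, or None when neither label does (undecided, an assumption)."""
    if not frame_labels:
        return None
    for gender in ("F", "M"):
        if frame_labels.count(gender) / len(frame_labels) >= threshold:
            return gender
    return None

mostly_male = subject_gender(["M"] * 7 + ["F"] * 3)
undecided = subject_gender(["M"] * 5 + ["F"] * 5)
```

Because the threshold is above 50%, at most one gender can satisfy it, so the order in which the two labels are checked does not matter.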
4.4 Evaluation on real-conditions
The last experiment performed in this study is a quantitative and qualitative evaluation of our method using a video acquired in non-controlled conditions. The Kinect was placed 2 m above the ground, and five different subjects were asked to walk around, with no specific paths. The algorithm ran on this video in real time, and we used the SVM classifier previously learned with the DGait database for the gender recognition.
We perform the test on the whole sequence as described in Section 4.3, using a sliding window on frames of size 20, to compute the 3D-FE features. A total of 745 frames have been tested, considering each frame as many times as the number of people that appear in it. In this experiment, the OpenNI middleware was used to identify the different subjects, in order to compute the 3D-FE of each subject at each frame.
Results of the real-conditions evaluation
5 Conclusions
In this study, we presented a new approach for gait-based gender classification using Kinect, which can run in real time. Specifically, we proposed a feature extraction algorithm that takes as input the 3D point clouds of the video frames. The system does not make use of RGB information, making it insensitive to illumination changes. In short, the 3D point clouds of a cycle sequence are aligned and grouped, and then projected into their PCA plane. A 2D histogram is computed in this plane, and then final discriminative features are obtained by first dividing the histogram into parts and then using linear discriminant analysis.
To evaluate the proposed methodology we have used a DRGB database acquired with Kinect, which is the first publicly available database for gait analysis that includes both RGB and depth information. As shown in the experiments, our proposal effectively encodes the gait sequences, and is more robust against view changes than other state-of-the-art approaches that can be run without RGB information. Our method is fast and suitable for real-time and real environment applications. In the last part of our tests we show an example of its performance in this context.
In our future work, we want to focus our attention on the problem of partial occlusions. In this case, we plan to process the 3D data and build the 2D histogram in a more sophisticated way, to develop a system that is more robust against missing data.
This study was supported in part by TIN2009-14404-C02-01 and CONSOLIDER-INGENIO CSD 2007-00018.
References
1. Sen Köktas N, Yalabik N, Yavuzer G, Duin R: A multi-classifier for grading knee osteoarthritis using gait analysis. Pattern Recognition Letters 2010, 31(9):898-904. 10.1016/j.patrec.2010.01.003
2. Bashir K, Xiang T, Gong S: Gait recognition without subject cooperation. Pattern Recognition Letters 2010, 31(13):2052-2060. 10.1016/j.patrec.2010.05.027
3. Trumedia: TruMedia and Dzine Introduce Joint Targeted Advertising Solution PROM Intergrated into DISplayer. http://www.trumedia.co.il/trumedia-and-dzine-introduce-joint-targeted-advertising-solution-prom-intergrated-displayer
4. Wagg DK, Nixon MS: On automated model-based extraction and analysis of gait. In Proceedings of the Sixth IEEE international conference on Automatic face and gesture recognition, FGR’04. Seoul, Korea: IEEE Computer Society; 2004:11-16.
5. Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A: Real-time human pose recognition in parts from single depth images. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. Piscataway, New Jersey USA: IEEE Publisher; 2011:1297-1304.
6. Spinello L, Arras K: People detection in RGB-D data. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2011, 3838-3843.
7. Gulshan V, Lempitsky V, Zisserman A: Humanising GrabCut: Learning to segment humans using the Kinect. In 1st IEEE Workshop on Consumer Depth Cameras for Computer Vision (ICCV Workshops). Piscataway, New Jersey USA: IEEE Publisher; 2011:1127-1133.
8. Jain HP, Subramanian A, Das S, Mittal A: Real-time upper-body human pose estimation using a depth camera. In Proceedings of the 5th international conference on Computer vision/computer graphics collaboration techniques, MIRAGE’11. Berlin, Heidelberg: Springer-Verlag; 2011:227-238.
9. Girshick RB, Shotton J, Kohli P, Criminisi A, Fitzgibbon AW: Efficient regression of general-activity human poses from depth images. In IEEE International Conference on Computer Vision (ICCV) (2011). Piscataway, New Jersey USA: IEEE Publisher; 415-422.
10. Baak A, Müller M, Bharaj G, Seidel HP, Theobalt C: A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera. In IEEE 13th International Conference on Computer Vision (ICCV), (IEEE 2011). Piscataway, New Jersey USA: IEEE Publisher; 1092-1099.
11. Borràs R, Lapedriza A, Igual L: Depth Information in Human Gait Analysis: An Experimental Study on Gender Recognition. In Proceedings of the International Conference on Image Analysis and Recognition. Berlin, Heidelberg: Springer-Verlag; 2012:98-105.
12. Li X, Maybank S, Yan S, Tao D, Xu D: Gait components and their application to gender recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 2008, 38(2):145-155.
13. Bouchrika I, Nixon MS: Model-based feature extraction for gait analysis and recognition. In Proceedings of the 3rd international conference on Computer vision/computer graphics collaboration techniques, MIRAGE’07. Berlin, Heidelberg: Springer-Verlag; 2007:150-160.
14. Yam C, Nixon M: Model-based Gait Recognition. Encyclopedia of Biometrics 2009, 1:1082-1088.
15. Han J, Bhanu B: Individual Recognition Using Gait Energy Image. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28:316-322.
16. Wang C, Zhang J, Pu J, Yuan X, Wang L: Chrono-gait image: a novel temporal template for gait recognition. In Proceedings of the 11th European conference on Computer vision: Part I. Berlin, Heidelberg: Springer-Verlag; 2010:257-270.
17. Yu S, Tan T, Huang K, Jia K, Wu X: A study on gait-based gender classification. IEEE Transactions on Image Processing 2009, 18(8):1905-1910.
18. Makihara Y, Sagawa R, Mukaigawa Y, Echigo T, Yagi Y: Gait recognition using a view transformation model in the frequency domain. In Proceedings of the 9th European conference on Computer Vision - Volume Part III. Berlin, Heidelberg: Springer-Verlag; 2006:151-163.
19. Makihara Y, Mannami H, Yagi Y: Gait analysis of gender and age using a large-scale multi-view gait database. In Proceedings of the 10th Asian conference on Computer vision - Volume Part II, (ACCV’10). Berlin, Heidelberg: Springer-Verlag; 2011:440-451.
20. Yu S, Tan D, Tan T: A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In 18th International Conference on Pattern Recognition, (ICPR), Volume 4, (IEEE 2006). New Jersey USA: ; 441-444.
21. Kusakunniran W, Wu Q, Zhang J, Li H: Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron. Pattern Recognition Letters 2012, 33(7):882-889. 10.1016/j.patrec.2011.04.014
22. Ioannidis D, Tzovaras D, Damousis I, Argyropoulos S, Moustakas K: Gait recognition using compact feature extraction transforms and depth information. IEEE Transactions on Information Forensics and Security 2007, 2(3):623-630.
23. Sivapalan S, Chen D, Denman S, Sridharan S, Fookes CB: Gait energy volumes and frontal gait recognition using depth images. In Proc. the 1st IEEE Int. Joint Conf. on Biometrics. Washington DC, USA: ; 2011:1-6.
24. OpenNI: OpenNI Organization. www.openni.org
25. OSU-SVM: Support Vector Machine (SVM) toolbox for the MATLAB numerical environment. http://sourceforge.net/projects/svm/
26. Kinect: Microsoft Corp. Redmond WA. Kinect for Xbox 360. http://www.microsoft-careers.com/go/Kinect-for-Xbox-360-Jobs/150565/
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.