A statistical approach for person verification using human behavioral patterns
© Gomez-Caballero et al.; licensee Springer. 2013
Received: 25 September 2012
Accepted: 2 August 2013
Published: 8 August 2013
We propose a person verification method using behavioral patterns of human upper body motion. Behavioral patterns are represented by three-dimensional features obtained from a time-of-flight camera. We take a statistical approach to modeling the behavioral patterns using Gaussian mixture models (GMM) and support vector machines. We employ the maximum likelihood linear regression (MLLR) adaptation method to estimate GMM parameters from a limited amount of data. Experimental results show that, with 25 training samples per subject, MLLR adaptation reduced the equal error rate by 28.6% relative to a system using maximum likelihood estimation. We also demonstrate that the proposed approach is robust against variations in body motion over time.
Identity verification systems are becoming increasingly common in daily life. They provide a secure means of controlling access to information or equipment. Traditionally, these systems have required something that one has or something that one knows (e.g., keys and passwords). However, these representations of identity can easily be lost, manipulated, or stolen. This problem can be addressed by biometrics, which identifies individuals based on their characteristic traits. Biometrics can be divided into two classes: physiological and behavioral.
Physiological biometrics uses physical traits, such as fingerprints or the iris. This type of biometrics is stable and accurate because it relies on unique and permanent physical traits. However, such traits cannot be changed or reissued if the biometric data are exposed or counterfeited, and they are also perceived as obtrusive. In behavioral biometrics, a person's identity is verified through action patterns that the person repeats in a characteristic manner. Compared with physiological biometrics, behavioral biometrics are less stable, since behavior may change with the environment or the physical state of the individual. On the other hand, they are unobtrusive and are difficult to disguise or for others to imitate. Examples of behavioral biometrics include voice, keystroke dynamics, signature, and gait. Voice can be used for remote identity verification. Keystroke dynamics can verify the identity of users while they use a computer. Signatures can be used in online transactions. Gait has proven useful in surveillance applications, where data can be collected from a distance.
In this paper, we focus on the individuality of upper body motion as an alternative cue for identity verification in applications where acquiring other behavioral biometrics is not feasible because of visual and space constraints. An example application is an automatic gatekeeper, where a person explicitly requests permission to access a secure area by performing a body gesture in front of a camera. Only a few studies have addressed this kind of application. Pratheepan et al. used arm-waving motion and simple actions such as sitting down, standing up, and walking away for individual identification in surveillance applications. These studies used holistic features, which represent characteristics of a person's region of interest as a whole. They used these features to create templates that represent each person's behavioral patterns.
The holistic features used in the above-mentioned works are sensitive to visual variations, such as differences in clothing and changes in view and scale, since they rely on global information about a person's appearance. To reduce the effects of visual variations, human body models describing the kinematic properties (skeleton) or the shape of the body have been used for feature extraction in gait recognition. With such models, features can be extracted from joint positions, which alleviates the effects of clothing changes and noise. However, this technique requires localizing and tracking specific body parts, which is often error-prone.
Accurate feature extraction also depends on correctly segmenting the person from the background scene, which can be affected by texture and illumination changes. Recently, time-of-flight (ToF) cameras have been used to reduce such errors by simplifying background segmentation [17, 18]. Furthermore, ToF cameras depend little on lighting conditions and are invariant to color and texture.
In previous approaches [11, 12], the extracted features were used to create a template characterizing a person's motion. However, templates are weak against the natural variations of individual behavioral patterns, since they encode only an average representation of the observed samples. Statistical models such as Gaussian mixture models (GMM) and hidden Markov models (HMM) have successfully handled variations in individual behavioral patterns [20–23]. For example, Kale et al. showed that HMMs were more robust than templates for gait recognition. Statistical models usually achieve higher accuracy than templates because they model intra-individual variations well and can also handle variations in sample duration. One drawback is that a relatively large amount of data is needed to estimate their parameters. This data sparseness problem becomes significant when training data are limited; it is often addressed by adaptation techniques such as maximum a posteriori (MAP) estimation, maximum likelihood linear regression (MLLR), and eigenspace-based techniques.
In this paper, we propose a statistical method for person verification based on behavioral patterns of human upper body motion, extending our prior work on this topic. We use depth information acquired from a ToF camera to extract characteristic features from specific parts of the body in three-dimensional (3D) space. GMMs are used to robustly model individuals' behavioral patterns. To cope with the problem of data sparseness, we use the MLLR adaptation technique. For identity verification, we combine GMMs with support vector machines (SVM) and compare the performance of this combination with that of the GMM log-likelihood ratio (LLR) framework. Lastly, we evaluate our method on a data set containing samples collected over different sessions to demonstrate that it can verify the identity of a person even after a period of time.
The remainder of this paper is organized as follows. Section 2 gives a brief overview of our person verification system. Section 3 describes the features used for person verification in our approach. Section 4 describes the statistical modeling and adaptation techniques used to model a person's behavioral patterns. Section 5 describes the classifiers used in the person verification systems. Section 6 explains the experimental conditions and presents the results. Finally, Section 7 presents our conclusions and future work.
2 Overview of our person verification system
3 Feature extraction
We implement an image processing front-end to locate and track eight anatomical landmark points in depth image streams acquired by the ToF camera. The input consists of image streams acquired from a SwissRanger SR4000 camera (MESA Imaging, Zurich, Switzerland) at approximately 20 frames per second. Each image frame represents the scene depth map with a resolution of 177 × 144 pixels and a field of view of 43.6° × 34.6°. Each pixel has an accuracy better than 1 cm within the 5 m measurement range.
where Dist(a,b) is the distance between pixel a and pixel b, D_a and D_b are the depth values of pixels a and b, respectively, and pixel a is the seed or a pixel already aggregated to R.
Pixel b is one of the four-connected neighbors of a that is not yet part of the region. The threshold θ in Equation 1 was determined empirically in preliminary experiments. The region keeps growing recursively until no neighbor pixel satisfies the condition in Equation 1; the resulting region R depicts the body of the person.
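The region-growing step can be illustrated with a short Python sketch. It is not the authors' implementation: Equation 1 is not reproduced in the text, so the sketch assumes that a neighbor b joins R when the absolute depth difference |D_a − D_b| is below θ, and it uses an explicit queue instead of recursion. The function name `grow_region` and the toy depth map are hypothetical.

```python
from collections import deque


def grow_region(depth, seed, theta):
    """Return the set of (row, col) pixels grown from `seed`.

    Assumption: neighbor b joins the region when |D_a - D_b| < theta,
    where a is the region pixel it was reached from (stand-in for Equation 1).
    """
    rows, cols = len(depth), len(depth[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()  # pixel a: the seed or an already aggregated pixel
        # Visit the four-connected neighbors of a that are not yet in the region.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(depth[r][c] - depth[nr][nc]) < theta:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Toy depth map in meters: a person at about 1.5 m in front of a wall at 3.0 m.
depth = [
    [3.0, 3.0, 3.0, 3.0],
    [3.0, 1.5, 1.52, 3.0],
    [3.0, 1.51, 1.53, 3.0],
    [3.0, 3.0, 3.0, 3.0],
]
body = grow_region(depth, seed=(1, 1), theta=0.05)
print(sorted(body))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

The explicit queue visits the same pixels as a recursive formulation but avoids deep call stacks, which matters on a full depth frame where the body region can contain thousands of pixels.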
Next, the algorithm segments the person's body into four regions: chest, head, left arm, and right arm. First, it finds a chest rectangle around the center of mass of the body. It then performs a few iterations of expanding and shrinking the rectangle until 15% of its area contains no pixels from the body region. This 15% empty area accounts for the space between the arms and the chest, due to the 'open arms' position that each subject adopts at the beginning of recording, as well as possible holes in the body region caused by noisy data. After setting the chest rectangle, the algorithm segments the head and both arm regions by region growing with a constrained search space. For the head, region growing starts from the midpoint of the top edge of the chest rectangle and is constrained to the upper sector. For the right and left arms, region growing starts at the corresponding top vertex of the chest rectangle and is constrained to the corresponding side. If a landmark position from the previous frame (i.e., elbows and head) is available, the algorithm uses it as the seed for region growing. Each region is labeled as detected if its area is larger than a threshold, which was set empirically in preliminary tests.
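The chest-rectangle search can be sketched under simplifying assumptions: the body region is a binary mask, the rectangle starts as a single pixel at the center of mass, and it only expands (one pixel per side per iteration) while the fraction of non-body pixels stays at or below 15%. The shrinking iterations described above are omitted, and the names `empty_fraction` and `fit_chest_rect` are hypothetical.

```python
def empty_fraction(mask, top, left, bottom, right):
    """Fraction of pixels in the inclusive rectangle that are not body pixels (mask value 0)."""
    total = (bottom - top + 1) * (right - left + 1)
    body = sum(mask[r][c] for r in range(top, bottom + 1) for c in range(left, right + 1))
    return 1.0 - body / total


def fit_chest_rect(mask, center, target=0.15, max_iter=100):
    """Grow a rectangle around `center` while its empty fraction stays <= target."""
    rows, cols = len(mask), len(mask[0])
    top = bottom = center[0]
    left = right = center[1]
    for _ in range(max_iter):
        grown = (max(top - 1, 0), max(left - 1, 0),
                 min(bottom + 1, rows - 1), min(right + 1, cols - 1))
        # Stop if the rectangle cannot grow further or would exceed the empty-area target.
        if grown == (top, left, bottom, right) or empty_fraction(mask, *grown) > target:
            break
        top, left, bottom, right = grown
    return top, left, bottom, right


# 7 x 7 toy mask: a 5 x 5 block of body pixels surrounded by background.
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
print(fit_chest_rect(mask, center=(3, 3)))  # (1, 1, 5, 5)
```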
After body segmentation, the algorithm calculates eight landmark points. The 3D points corresponding to the top vertices of the chest rectangle are registered as the left and right shoulder points, and the 3D point corresponding to the rectangle's center is registered as the center of mass of the body. The head point is the 3D point corresponding to the center of mass of the head region. For the elbow and hand points, a skeletonization operation is applied to each arm region to find the longest single-pixel line starting from the shoulder point. We assume that the hand point lies at the end of this line and that the center of mass of the line corresponds to the elbow point. Finally, a Kalman filter for each landmark point is updated to estimate its position in the next frame. If a region is not found, the estimate calculated in the previous frame is used. The Kalman filter is used for visual tracking to cope with ambiguities when capturing human movement.
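A per-landmark tracker of this kind can be sketched with a constant-velocity Kalman filter. The paper does not specify the motion model or the noise parameters, so the following are assumptions: each of the x, y, z coordinates is filtered independently with state [position, velocity], the time step is dt = 0.05 s (about 20 frames per second), the process noise is simplified to q·I, and the values of q and r are hypothetical. When a region is not found, the correction is skipped and the predicted position is used.

```python
class LandmarkKalman:
    """Constant-velocity Kalman filter for one 3D landmark, one axis at a time.

    Assumptions (not specified in the paper): independent x, y, z axes with
    state [position, velocity], dt = 0.05 s, simplified process noise Q = q * I,
    and hypothetical noise levels q (process) and r (measurement).
    """

    def __init__(self, first_pos, dt=0.05, q=1e-2, r=1e-3):
        self.dt, self.q, self.r = dt, q, r
        self.x = [[p, 0.0] for p in first_pos]                  # [position, velocity] per axis
        self.P = [[[1.0, 0.0], [0.0, 1.0]] for _ in first_pos]  # 2x2 covariance per axis

    def predict(self):
        """Propagate each axis with F = [[1, dt], [0, 1]]: x <- F x, P <- F P F^T + Q."""
        dt, q = self.dt, self.q
        for i, (p, v) in enumerate(self.x):
            self.x[i] = [p + dt * v, v]
            (a, b), (c, d) = self.P[i]
            self.P[i] = [[a + dt * (b + c) + dt * dt * d + q, b + dt * d],
                         [c + dt * d, d + q]]
        return [state[0] for state in self.x]

    def update(self, z):
        """Correct with measured position z; if z is None (region not found), keep the prediction."""
        if z is not None:
            for i, zi in enumerate(z):
                (a, b), (c, d) = self.P[i]
                innov_var = a + self.r                  # measurement matrix H = [1, 0]
                k0, k1 = a / innov_var, c / innov_var   # Kalman gain
                residual = zi - self.x[i][0]
                self.x[i] = [self.x[i][0] + k0 * residual, self.x[i][1] + k1 * residual]
                self.P[i] = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return [state[0] for state in self.x]


kf = LandmarkKalman(first_pos=(0.0, 0.0, 2.0))
estimates = []
for measurement in [(0.01, 0.0, 2.0), None, (0.03, 0.0, 2.0)]:  # None: region not found
    kf.predict()
    estimates.append(kf.update(measurement))
print(estimates[-1])
```

Raising r relative to q makes the tracker smoother but less responsive, which is the trade-off the authors describe tuning for the elbow and hand points.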
4 Statistical modeling based on Gaussian mixture models
A GMM represents feature vectors through the mean vectors of its mixture components and represents their variations through the covariance matrices. This makes it possible to model the variations in the individual features that characterize a person. While other statistical models such as hidden Markov models and conditional random fields have proven effective for modeling human motion, they are better suited to sequential actions that characterize an activity.
To estimate the GMM parameters robustly, we have to deal with the problem of data sparseness. Maximum likelihood (ML) estimation cannot precisely estimate the model parameters when the training data are sparse and small in size. Adaptation techniques such as MAP, MLLR, and eigenspace-based techniques are often used to solve this problem. Although eigenspace-based techniques are effective when the amount of adaptation data is extremely small, they restrict the model to a lower-dimensional space in which much information might be lost. MLLR and MAP, on the other hand, do not impose this restriction. However, MAP only updates distributions that are observed in the adaptation data, and thus it requires more adaptation data. MLLR estimates a set of transformations that can be shared by several model components, hence reducing the amount of adaptation data required [41, 42]. Therefore, we use MLLR.
where A is an n×n transformation matrix (n is the dimensionality of the data) and b is a bias vector; A and b are estimated to maximize the likelihood of the adaptation data. The parameters A and b are shared among all the mixture components of a GMM. To further reduce the number of parameters, we use a diagonal transformation. A universal background model (UBM) is often used as the initial model. A UBM is a GMM trained with the expectation-maximization (EM) algorithm on the training data from all subjects in the data set. The parameters of the UBM are adapted via MLLR to derive a person-dependent model from a specific person's training data.
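The mean adaptation step can be sketched under the standard MLLR formulation, in which each adapted mixture mean is μ̂ = Aμ + b. With a diagonal A (entries a_i) and, as an assumption, diagonal covariances, the maximum likelihood estimate decouples per dimension into a weighted least-squares fit of x_ti ≈ a_i μ_mi + b_i, with weights γ_m(t)/σ²_mi taken from the UBM posteriors. This is a single-iteration, mean-only sketch for illustration; it is not HTK's implementation, and all function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances):
    """Component occupation probabilities gamma_m for one frame (log-sum-exp)."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_diag_mllr(frames, weights, means, variances):
    """Estimate diagonal A (returned as vector a) and bias b from adaptation frames."""
    dim = len(means[0])
    # Per-dimension sums for the 2x2 normal equations [[s_mm, s_m], [s_m, s_1]] [a, b] = [k_m, k_1].
    s_mm, s_m, s_1 = [0.0] * dim, [0.0] * dim, [0.0] * dim
    k_m, k_1 = [0.0] * dim, [0.0] * dim
    for x in frames:
        for g, mu, var in zip(posteriors(x, weights, means, variances), means, variances):
            for i in range(dim):
                w = g / var[i]
                s_mm[i] += w * mu[i] * mu[i]
                s_m[i] += w * mu[i]
                s_1[i] += w
                k_m[i] += w * mu[i] * x[i]
                k_1[i] += w * x[i]
    a, b = [], []
    for i in range(dim):
        det = s_mm[i] * s_1[i] - s_m[i] ** 2
        a.append((k_m[i] * s_1[i] - s_m[i] * k_1[i]) / det)
        b.append((s_mm[i] * k_1[i] - s_m[i] * k_m[i]) / det)
    return a, b


def adapt_means(means, a, b):
    """Apply mu_hat = A mu + b with diagonal A to every mixture mean."""
    return [[ai * mi + bi for mi, ai, bi in zip(mu, a, b)] for mu in means]


# Toy 1-D UBM with two well-separated components; the person's data are shifted by +1.
weights, means, variances = [0.5, 0.5], [[0.0], [10.0]], [[1.0], [1.0]]
frames = [[1.0], [11.0], [1.0], [11.0]]
a, b = estimate_diag_mllr(frames, weights, means, variances)
print(a, b, adapt_means(means, a, b))
```

Because a and b are shared by all mixture components, even components that receive few frames are moved consistently, which is the property that motivates MLLR over MAP for small adaptation sets.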
5 Person identity verification
In this task, an unknown person claims an identity and provides a sample to be compared with the model of the claimed identity. A match score between this model and the input sample is computed; if the score is above a threshold, the identity claim is accepted, and otherwise it is rejected. We implement GMM log-likelihood ratio (LLR) and SVM classifiers for the person identity verification system and compare their performance.
5.1 Log-likelihood ratio
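The LLR score can be sketched under the formulation commonly used in GMM-UBM speaker verification, which is assumed here because the equation is not reproduced in the text: the score is the average over frames of log p(x_t | target GMM) − log p(x_t | UBM), and the claim is accepted when the score reaches a threshold. Models are passed as (weights, means, variances) tuples with diagonal covariances; all names and toy models are hypothetical.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """log p(x | GMM) for a diagonal-covariance GMM, using log-sum-exp for stability."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            lp -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(lp)
    top = max(logs)
    return top + math.log(sum(math.exp(lp - top) for lp in logs))


def llr_score(frames, target, ubm):
    """Average per-frame log p(x | target) - log p(x | UBM)."""
    total = sum(gmm_frame_loglik(x, *target) - gmm_frame_loglik(x, *ubm) for x in frames)
    return total / len(frames)


def verify(frames, target, ubm, threshold):
    """Accept the identity claim if the LLR score reaches the threshold."""
    return llr_score(frames, target, ubm) >= threshold


# Hypothetical 1-D models given as (weights, means, variances).
ubm = ([1.0], [[0.0]], [[1.0]])
target = ([1.0], [[1.0]], [[1.0]])
print(round(llr_score([[1.0]], target, ubm), 6))  # 0.5
print(verify([[0.0]], target, ubm, 0.0))          # False
```

Averaging over frames keeps scores comparable across samples of different lengths, which matters here because sample durations vary.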
5.2 Support vector machine
where t_i is an ideal output (either +1 or −1, depending on whether the corresponding support vector is a positive or negative example of a given class), and α_i and d are SVM parameters set during training. The vector x_i is a support vector obtained from the training set by an optimization method. The support vectors are the training data points lying on the margin boundaries.
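The decision function described above can be sketched as f(x) = Σ_i α_i t_i K(x, x_i) + d, assuming this standard form for the equation that is not reproduced in the text. With the linear kernel used in the experiments, K(x, x_i) is the dot product, so f collapses to w·x + d with w = Σ_i α_i t_i x_i. The toy support vectors and parameters are hypothetical.

```python
def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(xi * yi for xi, yi in zip(x, y))


def svm_decision(x, support_vectors, targets, alphas, d, kernel=linear_kernel):
    """f(x) = sum_i alpha_i * t_i * K(x, x_i) + d."""
    return sum(a * t * kernel(x, sv)
               for sv, t, a in zip(support_vectors, targets, alphas)) + d


def primal_weights(support_vectors, targets, alphas):
    """For a linear kernel, w = sum_i alpha_i * t_i * x_i, so f(x) = w . x + d."""
    dim = len(support_vectors[0])
    return [sum(a * t * sv[j] for sv, t, a in zip(support_vectors, targets, alphas))
            for j in range(dim)]


svs, ts, alphas, d = [[1.0, 0.0], [-1.0, 0.0]], [+1, -1], [0.5, 0.5], 0.0
x = [2.0, 0.0]
print(svm_decision(x, svs, ts, alphas, d))                    # 2.0
print(linear_kernel(primal_weights(svs, ts, alphas), x) + d)  # 2.0
```

For verification, f(x) plays the role of the match score and is compared with a threshold, just like the LLR score.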
We utilize two methods, SVM-S and SVM-T, where different input features are used. SVM-S employs GMM supervectors as input feature vectors [46, 47]. A GMM supervector is formed by concatenating the mean vectors (μ) of GMM mixture components into a single vector. One GMM is created by adaptation using one sample as adaptation data. For training, positive feature vectors are made from the GMMs of the target subject, and negative feature vectors are made from the GMMs of non-target subjects.
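Supervector construction can be sketched as below. Concatenating means is only meaningful if mixture m refers to the same component in every GMM, which presumably holds here because each GMM is adapted from the same UBM. The helper `to_svmlight_line` writes one training example in SVM-light's plain-text input format (`<target> <feature>:<value> ...`, with increasing 1-based feature indices and zero-valued features omitted). The function names and toy values are hypothetical.

```python
def gmm_supervector(means):
    """Concatenate mixture mean vectors mu_1, ..., mu_N (fixed order) into one list."""
    return [value for mu in means for value in mu]


def to_svmlight_line(label, vector):
    """Format one example as '<+1|-1> index:value ...' with 1-based indices, skipping zeros."""
    feats = [f"{i}:{v:g}" for i, v in enumerate(vector, start=1) if v != 0.0]
    return " ".join([f"{label:+d}"] + feats)


# Toy GMM with N = 2 mixtures of 3-dimensional means (the paper's features are 45-dimensional).
sv = gmm_supervector([[0.5, 0.0, 2.0], [1.25, -1.0, 0.0]])
print(sv)                        # [0.5, 0.0, 2.0, 1.25, -1.0, 0.0]
print(to_svmlight_line(+1, sv))  # +1 1:0.5 3:2 4:1.25 5:-1
```

Lines labeled +1 would come from the target subject's GMMs and lines labeled −1 from non-target subjects' GMMs.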
SVM-T employs MLLR transformation parameters as input feature vectors [48, 49]. The elements of the matrix A and the vector b in Equation 6 are concatenated to form a single supervector. By doing this, it is possible to model the difference between the subject GMM and the UBM instead of modeling each subject’s characteristics. One supervector is obtained by performing MLLR adaptation using one sample as adaptation data. For training, positive feature vectors are made from the transformation parameters of the target subject, and negative feature vectors are made from the transformation parameters of non-target subjects.
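The SVM-T input can be sketched as a row-by-row flattening of A followed by b, giving n·n + n values. For 45-dimensional features this is 45×45+45 = 2,070, the dimensionality reported for SVM-T in the experimental setup, which suggests a full matrix A for this system; with the diagonal transformation used for the LLR system, only the 45 diagonal entries plus 45 bias values (90 in total) would be available. The function name is hypothetical.

```python
def mllr_supervector(A, b):
    """Flatten the n x n MLLR matrix A row by row and append the bias b (n*n + n values)."""
    n = len(b)
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be an n x n matrix matching len(b)")
    return [value for row in A for value in row] + list(b)


print(mllr_supervector([[1.0, 2.0], [3.0, 4.0]], [5.0, 6.0]))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = 45
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
print(len(mllr_supervector(identity, [0.0] * n)))  # 2070
```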
We collected a new data set to evaluate our method, since no public database contains human upper body movements recorded with a ToF camera over several sessions. The data set consists of short image streams of 12 subjects (three females and nine males), where each subject performed two different upper body movements separately: 'raising left arm' and 'raising right arm'. The data were organized in five sessions recorded 3 to 5 days apart. The first session (session 0) contains 25 samples per user for each movement and was used only for model training. Each of the four remaining sessions contains eight samples per person for each movement and was used only for testing. The average length per sample is 2.93 s, and the average number of frames per sample is 70.5.
Identity verification tests were conducted for each of the four test sessions in the data set. In each session, we conducted eight verification trials per person, where each trial used a single sample per subject. System performance was measured by the equal error rate (EER), calculated a posteriori at the optimal decision threshold. The EER is the error rate at which the false acceptance rate and the false rejection rate are equal. The same optimal threshold was used for all subjects. Detection error trade-off (DET) curves were also plotted to assess system behavior over the full range of operating points.
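The EER computation can be sketched as a threshold sweep over pooled genuine and impostor scores, with one global threshold as described above. This is an approximation on a finite score set (the EER is taken as the mean of the false acceptance and false rejection rates where they are closest); the function name and toy scores are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) from a sweep over all observed scores.

    A claim is accepted when score >= threshold. FAR is the fraction of accepted
    impostor scores, FRR the fraction of rejected genuine scores.
    """
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        frr = sum(s < thr for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


genuine = [0.9, 0.8, 0.7, 0.4]   # hypothetical target-trial scores
impostor = [0.1, 0.2, 0.3, 0.6]  # hypothetical non-target-trial scores
print(equal_error_rate(genuine, impostor))  # (0.25, 0.6)
```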
For the LLR system, we used person-dependent models adapted from a UBM by MLLR with a diagonal transformation and compared their performance with that of models obtained by ML estimation using the EM algorithm. Models used in the LLR systems were created with 16 Gaussian mixtures, since this setup exhibited the best performance in our preliminary experiments. The Hidden Markov Model Toolkit (HTK) was used to train the GMMs. For the SVM-S and SVM-T systems, we evaluated variants with 2, 4, 8, 16, and 32 Gaussian mixtures. For simplicity, only the configurations that exhibit the best results are presented in this paper. The SVM-light toolkit was used to train the SVMs. Based on preliminary experiments, we chose a linear kernel for SVM training.
The SVMs were trained using the target person's feature vectors as positive examples (25 training samples) and the non-target persons' feature vectors as negative examples (275 training samples). To deal with the unbalanced training data set, we use a cost factor to penalize classification errors on positive examples more heavily than errors on negative examples, that is, a higher cost for false negatives than for false positives. The feature vector dimensionality for the SVM-S system was 45×N, where N is the number of Gaussian mixtures in the GMM and 45 is the number of mean components per Gaussian mixture. For SVM-T, the MLLR transforms resulted in 2,070-dimensional feature vectors (45×45+45, including the bias vector b).
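The feature dimensionalities and the class imbalance can be checked with a few lines of arithmetic. The cost factor value is not reported, so setting it to the negative-to-positive ratio (275/25 = 11) is a hypothetical heuristic. `-t 0` (linear kernel) and `-j` (the factor by which training errors on positive examples outweigh errors on negative examples) are documented `svm_learn` options, while the file names are placeholders; the command is only assembled here, not executed.

```python
FEATURE_DIM = 45  # mean components per Gaussian mixture, as stated in the text


def svm_s_dim(n_mixtures):
    """SVM-S supervector size: 45 mean components per mixture times N mixtures."""
    return FEATURE_DIM * n_mixtures


def svm_t_dim(full_matrix=True):
    """SVM-T size: full 45x45 matrix A plus bias b, or diagonal of A plus b."""
    return FEATURE_DIM * FEATURE_DIM + FEATURE_DIM if full_matrix else 2 * FEATURE_DIM


n_pos, n_neg = 25, 275       # target samples vs. 11 non-target subjects x 25 samples
cost_factor = n_neg / n_pos  # hypothetical choice; the paper does not report its value

# SVM-light command line: -t 0 selects the linear kernel, -j sets the cost factor.
argv = ["svm_learn", "-t", "0", "-j", f"{cost_factor:g}", "train.dat", "model.dat"]

print({n: svm_s_dim(n) for n in (2, 4, 8, 16, 32)})  # {2: 90, 4: 180, 8: 360, 16: 720, 32: 1440}
print(svm_t_dim(), svm_t_dim(full_matrix=False))     # 2070 90
print(" ".join(argv))                                # svm_learn -t 0 -j 11 train.dat model.dat
```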
First, we show the accuracy results for a landmark point localization test. Then the results for person verification task are presented.
6.2.1 Body landmark point localization accuracy test
We examined the accuracy of landmark point localization in the image processing front-end. The measure was the average accuracy per landmark point over all subjects. We hand-labeled ground truth data and compared them with the landmark positions inferred by our method. A landmark point found within D centimeters of its ground truth position was considered correctly localized; otherwise, it was considered incorrectly localized. We set D = 10 cm, the same value used in previous human pose recognition research. The hand-labeled data consisted of image streams depicting left arm movements of five subjects (ten samples per person), recorded in a separate session under the same conditions as the rest of the data set.
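The accuracy measure can be sketched as follows, assuming the distance is the 3D Euclidean distance between estimated and hand-labeled positions and that "within D centimeters" includes the boundary. The function returns both the hit rate and the mean distance, since the results table also reports average distance in cm; the names and sample points are hypothetical.

```python
import math


def localization_accuracy(predicted, ground_truth, d_cm=10.0):
    """Return (fraction of frames within d_cm of ground truth, mean distance in cm).

    Points are (x, y, z) tuples in centimeters; 3D Euclidean distance is assumed.
    """
    dists = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    hits = sum(d <= d_cm for d in dists)
    return hits / len(dists), sum(dists) / len(dists)


# Hypothetical hand-point estimates for three frames vs. hand-labeled ground truth.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 20.0)]
truth = [(0.0, 0.0, 0.0)] * 3
acc, mean_cm = localization_accuracy(predicted, truth)
print(round(acc, 3), round(mean_cm, 2))  # 0.667 8.33
```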
Table: Landmark localization accuracy (with average distance to ground truth in cm)
Although the elbow and hand show lower accuracy than the other landmark points, the tracked points still follow the motion path thanks to the Kalman filter. Measurement errors are smoothed by tuning the Kalman filter parameters to balance the responsiveness of the tracker against the variance of its estimates. Because of the Kalman filter, the tracking results are consistent across samples for each subject, so the tracker exhibits low variance but high bias for the elbow and hand points. For this reason, we assume that the motion pattern is preserved to some degree even when the estimated position differs from the ground truth position. Furthermore, combining features from all landmark points minimizes the effects of inaccurate elbow and hand estimates, since the remaining landmark points are estimated and tracked more accurately.
6.2.2 Person verification results
Table: Average equal error rate for the LLR system (columns: full feature set, arm feature set)
Using the full feature set, the systems exhibit higher overall performance than when using only features from the arm in motion. The reason is that the whole upper body takes part in the execution of the analyzed gestures. For example, subjects assume a slightly different posture when raising their arms, and a characteristic motion is observed on the arm that is not raised. Combining these perceptible features yields better representations of individual behavioral patterns.
Using the full feature set, ML estimation yielded an average EER of 9.1% for the left arm samples. MLLR adaptation reduced the average EER from 9.1% to 6.5%. This relative EER reduction of 28.6% confirms that MLLR adaptation was effective.
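As a worked check, the relative reductions quoted in this section follow directly from the average EERs; the same formula applied to the SVM systems' EER of 8.9% gives the 27% figure:

```latex
\frac{\mathrm{EER}_{\mathrm{ML}} - \mathrm{EER}_{\mathrm{MLLR}}}{\mathrm{EER}_{\mathrm{ML}}}
  = \frac{9.1 - 6.5}{9.1} = \frac{2.6}{9.1} \approx 0.286 \;(28.6\%),
\qquad
\frac{8.9 - 6.5}{8.9} = \frac{2.4}{8.9} \approx 0.270 \;(27\%)
```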
Table: Average EER over four sessions for the verification task using left and right arm movements (columns: left arm EER (%), right arm EER (%))
The two SVM systems achieved the same EER of 8.9%, using different numbers of Gaussian components for the GMMs from which the supervectors were constructed. However, contrary to our expectations, the LLR system with MLLR adaptation achieved a 27% relative reduction in EER compared with the SVM systems. McNemar's test confirmed that the performance difference between the systems is statistically significant (P value < 0.001).
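McNemar's test compares two systems on the same trials using only the discordant trials, where one system decides correctly and the other does not. The text does not state which variant was used or report the counts, so the sketch below assumes the chi-square form with continuity correction, and the counts in the usage line are hypothetical.

```python
import math


def mcnemar(b, c):
    """McNemar's test with continuity correction.

    b: trials system 1 decided correctly and system 2 incorrectly.
    c: trials system 1 decided incorrectly and system 2 correctly.
    Returns (chi2, p_value) under a chi-square distribution with 1 degree of freedom.
    """
    if b + c == 0:
        raise ValueError("no discordant trials")
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


chi2, p = mcnemar(10, 0)  # hypothetical discordant counts
print(round(chi2, 2), round(p, 4))  # 8.1 0.0044
```

With few discordant trials, an exact binomial version of the test is often preferred over this chi-square approximation.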
The SVM systems may not have performed better because of the small amount of data used to derive the GMMs from which the supervectors were created. The number of frames per sample used for GMM adaptation in the SVM-S and SVM-T systems was relatively small (70.5 frames on average), so adaptation was less effective than in the LLR system, where person-dependent models were derived using all training samples of the target person.
Table: EER per session for the verification task using left arm movement (%)
For comparison, we measured the training and testing time of each system on a PC with 8 GB of RAM and a dual-core Intel(R) Xeon(R) CPU running at 1.86 GHz. For the LLR system, the average training time is 34 s for the UBM and 0.33 s for each person-dependent model obtained by MLLR adaptation. For the SVM systems, the average training time is 21.91 s. The average testing time per sample is 0.04 s for the LLR system and 0.17 s for the SVM systems.
We have proposed a statistical approach for person identity verification using behavioral patterns observed in human body motion, in particular left arm and right arm movements. By using a ToF camera, we simplified the segmentation of the human body to track specific body parts in 3D space. Since we extract static and dynamic features of human motion directly from identified landmark points, the effects of appearance changes were reduced. By taking a statistical approach, we effectively modeled the natural variation in the features of behavioral patterns. To deal with the problem of data sparseness, we used the MLLR adaptation method along with a UBM to estimate the parameters of person-dependent GMMs. In addition, we used GMM mean vectors and MLLR transformation parameters as features to create supervectors for SVMs.
We have shown that by using a model adaptation method in the training phase, the average EER of the LLR system was reduced to 6.5%, a relative reduction of 27% compared with our SVM systems. The SVM systems may not have performed better because the model adaptation used to derive the GMMs for the supervectors was not as effective as that of the LLR system. Although the SVM classifiers did not improve verification performance, we believe that the comparison against the LLR system is useful for future improvements of such approaches. We found that features extracted from left arm motion samples are overall more distinctive than those from right arm motion samples. Furthermore, experimental results showed that our system is able to verify the identity of a person even after a period of time. Hence, our approach is promising for person verification tasks even when natural variations in behavioral patterns exist. Although we used vertical arm motions in our experiments, the approach presented in this paper is applicable to other upper body movements or gestures as well.
For future work, we plan to increase the size of the data set, in both the number of users and sessions, to perform further analysis using a wider range of body movements. We would also like to measure how discriminative different upper body movements are, especially when each subject performs a personal movement. We also plan to implement more robust landmark point localization and tracking to minimize errors introduced by ambiguities and noisy data. We are interested in implementing an HMM-based framework in which more complex movements are used as a cue for identity verification, taking advantage of their temporal information. We would also like to explore the Kinect image sensor, since it can acquire depth images at higher resolutions.
- Jain A, Ross A, Prabhakar S: An introduction to biometric recognition. Circuits Syst. Video Technol., IEEE Trans 2004, 14: 4-20.View ArticleGoogle Scholar
- Lin JL, Hsu HL, Jong TL, Hsu WH: Biometric authentication. In Pattern Recognition, Machine Intelligence and Biometrics. Berlin: Springer; 2011:607-631.View ArticleGoogle Scholar
- O’Gorman L: Comparing passwords, tokens and biometrics for user authentication. Proc. IEEE 2003, 91(12):2021-2040. 10.1109/JPROC.2003.819611View ArticleGoogle Scholar
- Vildjiounaite E, Makela SM, Lindholm M, Riihimaki R, Kyllonen V, Mantyjarvi J, Ailisto H: Unobtrusive multimodal biometrics for ensuring privacy and information security with personal devices. In Pervasive Computing. Lecture Notes in Computer Science, vol 3968. Berlin: Springer; 2006:187-201.Google Scholar
- Yampolskiy RV, Govindaraju V: Behavioural biometrics: a survey and classification. Int. J. Biometrics 2008, 1: 81-113. 10.1504/IJBM.2008.018665View ArticleGoogle Scholar
- Moskovitch R, Feher C, Messerman A, Kirschnick N, Mustafic T, Camtepe A, Lohlein B, Heister U, Moller S, Rokach L, Elovici Y: Identity theft, computers and behavioral biometrics. In IEEE International Conference on Intelligence and Security Informatics, 2009. Piscataway: IEEE; 2009:155-160.View ArticleGoogle Scholar
- Furui S: 40 years of progress in automatic speaker recognition. In Advances in Biometrics. Lecture Notes in Computer Science, vol 5558. Berlin: Springer; 2009:1050-1059.Google Scholar
- Revett K, De Magalhães ST, Santos HMD: On the use of rough sets for user authentication via keystroke dynamics. In Progress in Artificial Intelligence. 13th Portuguese Conference on Artificial Intelligence, EPIA’07. Berlin: Springer; 2007:145-159.Google Scholar
- Bailador G, Sanchez-Avila C, Guerra-Casanova J, de Santos Sierra A: Analysis of pattern recognition techniques for in-air signature biometrics. Pattern Recognit 2011, 44: 2468-2478. 10.1016/j.patcog.2011.04.010View ArticleGoogle Scholar
- Sarkar S, Phillips P, Liu Z, Vega I, Grother P, Bowyer K: The humanID gait challenge problem: data sets, performance, and analysis. Pattern Anal. Mach. Intell., IEEE Trans 2005, 27(2):162-177.View ArticleGoogle Scholar
- Pratheepan Y, Prasad G, Condell J: Style of action based individual recognition in video sequences. In IEEE International Conference on Systems, Man and Cybernetics, 2008. SMC 2008. Piscataway: IEEE; 2008:1237-1242.View ArticleGoogle Scholar
- Pratheepan Y, Torr P, Condell J, Prasad G: Body language based individual identification in video using gait and actions. In Image and Signal Processing. Lecture Notes in Computer Science, vol 5099. Berlin: Springer; 2008:368-377.Google Scholar
- Davis J: Hierarchical motion history images for recognizing human motion. In Proceedings of the IEEE Workshop on Detection and Recognition of Events in Video, 2001.. Piscataway: IEEE; 2001:39-46.View ArticleGoogle Scholar
- Li N, Xu Y, Yang XK: Part-based human gait identification under clothing and carrying condition variations. In 2010 International Conference on Machine Learning and Cybernetics (ICMLC). Piscataway; 2010:268-273.View ArticleGoogle Scholar
- Wagg DK, Nixon MS: On automated model-based extraction and analysis of gait. In Sixth IEEE International Conference on Automatic on Face Gesture Recognition, 2004. Piscataway: IEEE; 2004:11-16.View ArticleGoogle Scholar
- Lee L, Grimson W: Gait analysis for recognition and classification. In Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002. Piscataway: IEEE; 2002:148-155.Google Scholar
- Jensen R, Paulsen R, Larsen R: Analysis of gait using a treadmill and a time-of-flight camera. In Dynamic 3D Imaging. Lecture Notes in Computer Science, vol 5742. Berlin: Springer; 2009:154-166.Google Scholar
- Derawi MO, Ali H, Cheikh FA: Gait recognition using time-of-flight sensor. In BIOSIG. LNI. Bonn: GI; 2011:187-194.Google Scholar
- Boulgouris N, Hatzinakos D, Plataniotis K: Gait recognition: a challenging signal processing technology for biometric identification. Signal Process. Mag., IEEE 2005, 22(6):78-90.View ArticleGoogle Scholar
- Sundaresan A, RoyChowdhury A, Chellappa R: A hidden Markov model based framework for recognition of humans from gait sequences. In Proceedings of the 2003 International Conference on Image Processing, ICIP 2003. Piscataway: IEEE; 2003:II-93–6 vol. 3.Google Scholar
- Kale A, Sundaresan A, Rajagopalan AN, Cuntoor NP, Roy-chowdhury AK, Krüger V: Identification of humans using gait. IEEE Trans. Image Process 2004, 13: 1163-1173. 10.1109/TIP.2004.832865View ArticleGoogle Scholar
- Aqmar MR, Shinoda K, Furui S: Robust gait-based person identification against walking speed variations. IEICE Trans. Inf. Syst 2012, 95: 668-676.View ArticleGoogle Scholar
- Reynolds D, Rose R: Robust text-independent speaker identification using gaussian mixture speaker models. Speech Audio Process., IEEE Trans. 1995, 3: 72-83. 10.1109/89.365379View ArticleGoogle Scholar
- Luc J, Gauvain C, Lee H: Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Process 1994, 2: 291-298. 10.1109/89.279278View ArticleGoogle Scholar
- Leggetter CJ, Woodland PC: Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput. Speech Lang 1995, 9(2):171-185. 10.1006/csla.1995.0010View ArticleGoogle Scholar
- Kuhn R, Nguyen P, Junqua JC, Goldwasser L, Niedzielski N, Fincke S, Field K, Contolini M: Eigenvoices for speaker adaptation. In International Conference on Spoken Language Processing. Camberra: ASSTA; 1998:1771-1774.Google Scholar
- Gomez-Caballero F, Shinozaki T, Furui S: User identification using time-of-flight camera image streams. Inf. Process. Soc. Japan Tech Rep 2010, 2: 615-616.Google Scholar
- Gomez-Caballero F, Shinozaki T, Furui S, Shinoda K: Person authentication using 3D human motion. In Proceedings of the 2011 Joint ACM Workshop on Human Gesture and Behavior Understanding, J-HGBU ’11. New York: ACM; 2011:35-40.View ArticleGoogle Scholar
- Vapnik N: Statistical Learning Theory. New York: Wiley; 1998.Google Scholar
- Oggier T, Lehmann M, Kaufmann R, Schweizer M, Richter M, Metzler P, Lang G, Lustenberger F, Blanc N: An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth resolution (SwissRanger). Proc. SPIE Proceedings 2004, 5249: 534-545. 10.1117/12.513307View ArticleGoogle Scholar
- Bianchi L, Gatti R, Lombardi L, Lombardi P: Tracking without background model for time-of-flight cameras. In Advances in Image and Video Technology. 3rd Pacific Rim Symposium on Advances in Image and Video Technology. Lecture Notes in Computer Science, vol 5414. Berlin: Springer; 2009:726-737.Google Scholar
- Eberley D: Skeletonization of 2D binary images. Tumwater: Geometric Tools; 2008. . Accessed 06 Aug 2013 http://www.geometrictools.com/Documentation/Skeletons.pdf Google Scholar
- Kalman RE: A new approach to linear filtering and prediction problems. J. Basic Eng 1960, 82(1):35-45. http://dx.doi.org/10.1115/1.3662552 http://fluidsengineering.asmedigitalcollection.asme.org/article.aspx?articleid=1430402 10.1115/1.3662552View ArticleGoogle Scholar
- Reynolds D: Automatic speaker recognition using Gaussian mixture speaker models. Lincoln Lab. J 1995, 8(2):173-192. http://www.ll.mit.edu/publications/journal/journalarchives08-2.html#4
- Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B 1977, 39:1-38.
- Mendoza M, Pérez De La Blanca N: Applying space state models in human action recognition: a comparative study. Articulated Motion and Deformable Objects 2008, 5098: 53-62. 10.1007/978-3-540-70517-8_6
- Starner T, Weaver J, Pentland A: Real-time American sign language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell 1998, 20: 1371-1375. 10.1109/34.735811
- Kuhn R, Junqua JC, Nguyen P, Niedzielski N: Rapid speaker adaptation in Eigenvoice space. IEEE Trans. Speech Audio Process 2000, 8(6):695-707. 10.1109/89.876308
- Thyes O, Kuhn R, Nguyen P, Junqua JC: Speaker identification and verification using eigenvoices. In Sixth International Conference on Spoken Language Processing. Baixas: ISCA; 2000:1-3.
- Lee CH, Lin CH, Juang BH: A study on speaker adaptation of continuous density HMM parameters. In 1990 International Conference on Acoustics, Speech, and Signal Processing. Albuquerque; 3–6 April 1990.
- Leggetter CJ, Woodland PC: Flexible speaker adaptation using maximum likelihood linear regression. Proc. ARPA Spoken Lang. Technol. Workshop 1995, 9: 110-115.
- Gales MJF, Woodland PC: Mean and variance adaptation within the MLLR framework. Comput. Speech Lang 1996, 10(4):249-264. 10.1006/csla.1996.0013
- Reynolds DA, Quatieri TF, Dunn RB: Speaker verification using adapted Gaussian mixture models. Digit. Signal Process 2000, 10(1–3):19-41.
- Reynolds DA, Campbell WM: Text-independent Speaker Recognition. Berlin: Springer; 2008.
- Cristianini N, Shawe-Taylor J: Support Vector Machines. Cambridge: Cambridge University Press; 2000.
- Campbell WM: Generalized linear discriminant sequence kernels for speaker recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Piscataway: IEEE; 2002:I-161–I-164.
- Campbell W, Sturim D, Reynolds D, Solomonoff A: SVM based speaker verification using a GMM supervector kernel and NAP variability compensation. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. Piscataway: IEEE; 2006:97-97.
- Stolcke A, Ferrer L, Kajarekar S, Shriberg E, Venkataraman A: MLLR transforms as features in speaker recognition. In Proceedings of the 9th European Conference on Speech Communication and Technology. Baixas: ISCA; 2005:2425-2428.
- Ferras M, Leung CC, Barras C, Gauvain JL: MLLR techniques for speaker recognition. Odyssey-2008 (2008), paper 023.
- Young SJ, Kershaw D, Odell J, Ollason D, Valtchev V, Woodland P: The HTK Book Version 3.4. Cambridge: Cambridge University Press; 2006.
- Joachims T: Making large-scale support vector machine learning practical. In Advances in Kernel Methods. Edited by: Schölkopf B, Burges CJC, Smola AJ. Cambridge: MIT Press; 1999:169-184.
- Morik K, Brockhausen P, Joachims T: Combining statistical learning with a knowledge-based approach - a case study in intensive care monitoring. In Proceedings of the 16th International Conference on Machine Learning (ICML-99). San Francisco: Morgan Kaufmann; 1999.
- Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A: Real-time human pose recognition in parts from single depth images. In IEEE Conference on Computer Vision and Pattern Recognition. Washington, D.C.: IEEE; 2011.
- McNemar Q: Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947, 12(2):153-157. 10.1007/BF02295996
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.