An automated chimpanzee identification system using face detection and recognition
© Loos and Ernst; licensee Springer. 2013
Received: 31 January 2013
Accepted: 31 July 2013
Published: 19 August 2013
Due to the ongoing biodiversity crisis, many species, including great apes like chimpanzees, are on the brink of extinction. Consequently, there is an urgent need to protect the remaining populations of threatened species. To counter the catastrophic decline of biodiversity, biologists and gamekeepers recently started to use remote cameras and recording devices for wildlife monitoring in order to estimate the size of remaining populations. However, the manual analysis of the resulting image and video material is extremely tedious, time-consuming, and cost-intensive. To relieve researchers of this time-consuming routine work, we have recently started to develop computer vision algorithms for automated chimpanzee detection and identification of individuals. Based on the assumption that humans and great apes share similar properties of the face, we proposed to adapt and extend face detection and recognition algorithms, originally developed to recognize humans, for chimpanzee identification. In this paper we not only summarize our earlier work in the field but also extend our previous approaches towards a more robust system that is less prone to difficult lighting situations, varying poses and expressions, and partial occlusion by branches, leaves, or other individuals. To overcome the limitations of our previous work, we combine holistic global features and locally extracted descriptors using a decision fusion scheme. We present an automated framework for photo identification of chimpanzees including face detection, face alignment, and face recognition. We thoroughly evaluate the proposed algorithms on two expert-annotated datasets of captive and free-living chimpanzee individuals. In three experiments we show that the presented framework outperforms previous approaches in the field of great ape identification and achieves promising results.
Therefore, our system can be used by biologists, researchers, and gamekeepers to estimate population sizes faster and more precisely than with current frameworks. Thus, the proposed framework for chimpanzee identification has the potential to open up new avenues in efficient wildlife monitoring and can help researchers to develop innovative protection schemes in the future.
According to the International Union for Conservation of Nature (IUCN), about 22% of the mammal species worldwide are threatened or extinct. The current biodiversity crisis is observed all over the world, and primates are among the most severely endangered mammals. Walsh et al. reported a decrease of ape populations in western Equatorial Africa by more than half between 1983 and 2000. In a similar survey, Campbell et al. observed a 90% decrease of chimpanzee sleeping nests in Côte d’Ivoire between 1990 and 2007.
These alarming results demonstrate the urgent need to intensify close surveillance of these threatened species. Many protected areas have already been established. However, effectively protecting the animals requires good knowledge of existing populations and of changes in population sizes over time. Individual identification of animals is a prerequisite not only for measuring the success of implemented protection schemes but also for many other biological questions, e.g., wildlife epidemiology and social network analysis. However, estimating population sizes in the wild is a labor-intensive task. Therefore, noninvasive monitoring techniques that take advantage of automatic camera traps are currently under development, and the number of published studies that use camera traps or autonomous recording devices is increasing rapidly. However, the collected data are still evaluated manually, which is a time- and resource-consuming task. Consequently, there is a high demand for automated algorithms to analyze remotely gathered video recordings. Especially so-called capture-mark-recapture methods, commonly used in ecology, could benefit from an automated system for the identification of great apes.
This paper shows that technology developed for human face detection and identification can provide substantial assistance in evaluating data gathered by camera traps. We summarize and extend our previous work on face detection and individual identification of African great apes for wildlife monitoring and present an automated framework to detect and subsequently identify free-living as well as captive chimpanzee individuals in uncontrolled environments.
Some aspects of this paper have been published in our previous work. We extended our approaches from  and  to improve the system’s robustness against pose variations, difficult lighting conditions, and partial occlusions . However, in this paper we present a complete system for chimpanzee photo identification including face detection, face alignment, and face recognition. We significantly improve previous approaches by fusing global and local descriptors in a decision-based manner.
Presentation of an automated framework for primate photo identification including face detection, face alignment and lighting normalization, as well as identification.
Extension and improvement of our previous work to achieve better performance and more robustness against pose variation, lighting conditions, facial expressions, noncooperative subjects, and even partial occlusion by branches or leaves.
Evaluation of the proposed system on two realistic real-world datasets of free-living and captured chimpanzee individuals gathered in uncontrolled environments.
The outcome of this paper builds the basis of an automated system for primate identification in photos and videos, which could open up new avenues in efficient wildlife monitoring and biodiversity conservation management.
The remaining paper is organized as follows: In the subsequent section, we give a short recap of the existing work in the field of animal detection and identification and our own previous work. A detailed description of the proposed system, including face and facial feature detection, face alignment, and individual identification is presented in Section 3. We thoroughly evaluate our system on two datasets of free-living and captive chimpanzees in Section 4 using an open-set identification scheme. Finally, in Section 5, we conclude this paper and give further ideas of improvement.
2 Related work
Computer vision and pattern recognition have been active research fields for years. Even though automatic image and video processing techniques are becoming more and more important for the detection and identification of animals, only a few publications exist on this topic. In this section we give a brief overview of existing technologies for the detection and identification of animals and briefly review face detection and recognition technologies developed for human identification.
2.1 Visual detection
Automatic face detection has been an important research area for many years and has been studied extensively for human faces. Rowley et al. published good results with a neural network-based face detector more than 10 years ago. However, the system was not real-time capable at that time. Some years later, Viola and Jones developed and published probably the best-known algorithm for real-time object detection. It uses AdaBoost for feature selection and learning and exploits the integral image to extract Haar-like features very quickly. Numerous improvements and variants have been published in the literature since.
Whereas plenty of work has already been done in the field of human face detection, only a few publications deal with the automatic detection, tracking, and analysis of animals in videos. Wawerla et al. describe a system to monitor the behavior of grizzly bears at the Arctic Circle with camera traps. They use motion shapelet features and AdaBoost to detect bears in video footage. Burghardt and Calic worked on the detection and tracking of animal faces based on the Viola-Jones detector and a low-level feature tracker. They trained the system on lion faces and showed that the results can be used to classify basic locomotive actions. Spampinato et al. [19, 20] proposed a system for fish detection, tracking, and species classification in natural underwater environments. They first detect fish using a combination of a Gaussian mixture model and moving average algorithms. The detected objects are then tracked using an adaptive mean shift algorithm. Finally, species classification is performed by combining texture and shape features into a powerful descriptor.
2.2 Visual identification
Among the most established and well-studied approaches to face recognition are appearance-based methods. Here, two-dimensional gray-level images of size w × h are represented as vectors of size n = w · h; thus, simple pixel-based features are often used as face descriptors. Since this high-dimensional feature space is too large to perform fast and robust face recognition in practice, dimensionality reduction techniques like principal component analysis (PCA), linear discriminant analysis (LDA), or locality preserving projections (LPP) can be used to project the vectorized face images into a lower-dimensional subspace. These methods are often referred to as Eigenfaces, Fisherfaces, and Laplacianfaces, respectively. Recently, random projections have also been successfully used for face recognition in combination with a sparse representation classification (SRC) scheme. Random projection matrices can simply be generated by sampling zero-mean independent identically distributed Gaussian entries. This approach was later extended by using Gabor features instead of pixel-based features, which greatly improves the recognition accuracy while at the same time reducing the computational cost when dealing with occluded face images.
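To illustrate the random projection idea mentioned above, the following minimal sketch (not the cited authors' implementation; the 64 × 64 face size and the 160-dimensional target are arbitrary choices for illustration) projects a vectorized gray-level face with a Gaussian random matrix:

```python
import numpy as np

def random_projection_matrix(n, d, seed=None):
    """d x n matrix with i.i.d. zero-mean Gaussian entries; projecting an
    n-dimensional vectorized face to d dimensions approximately preserves
    pairwise distances (Johnson-Lindenstrauss lemma)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n))

# usage: a 64x64 gray-level face becomes a 4096-vector, reduced to 160 dims
R = random_projection_matrix(64 * 64, 160, seed=0)
face = np.random.default_rng(1).random(64 * 64)  # stand-in for a real face
projected = R @ face
```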
While biometric identification of humans has been an active research topic for decades, individual recognition of animals has only been addressed in the recent past. Ardovini et al., for instance, proposed a system for the semiautomatic recognition of elephants from photos based on shape comparison of the nicks characterizing the elephants' ears. A similar approach was presented by Araabi et al., who proposed a string matching method as part of a computer-assisted system for dolphin identification from images of their dorsal fins. Burghardt et al. [28, 29] also presented a fully automatic system for penguin identification. After a penguin has been detected, unique individual-specific spot patterns on the penguin's coat are used for identification. More recently, a method called StripeCodes for zebra identification was published by Lahiri et al. The authors claim that their algorithm efficiently extracts simple image features used for the comparison of zebra images to determine whether the animal has been observed before.
To the best of our knowledge, the problem of nonhuman primate identification has not been addressed by other researchers so far.
2.3 Own work
The aforementioned approaches use characteristic coat patterns or other individually unique biometrics, like the pattern of fur and skin or unique nicks in ears or dorsal fins, to distinguish between individuals. Unfortunately, such an approach is often infeasible for the identification of great apes since unique coat markings either do not exist or cannot be used because of the limited resolution of video recordings.
Based on the assumption that humans and our closest relatives share similar properties of the face, we suggested to use and adapt face recognition techniques, originally developed to recognize humans, for the identification of great apes within the SAISBECO project ( http://www.saisbeco.com). In we showed that state-of-the-art face recognition techniques are capable of identifying chimpanzees and gorillas as well. Based on these results, we significantly improved the performance of the proposed system by using Gabor features in combination with LPP for dimensionality reduction in . The SRC scheme was used to assign identities to the facial images. Although the results of are very promising, the accuracy of the system drops significantly if nonfrontal face images are used for testing. Another drawback is the assumption that faces and facial feature points have already been detected properly for alignment and recognition. We overcame the latter issue by combining face and facial feature detection with face recognition and presented an automated identification system for chimpanzees in . However, we only used simple pixel information in the recognition part of the proposed system. Thus, although the achieved results were very promising for a first approach, the accuracy of the system was limited due to the lack of robustness against difficult lighting situations, pose, partial occlusion, and the variety of occurring facial expressions.
In this paper we show how to overcome this limitation by using more sophisticated face descriptors in combination with a powerful feature space transformation technique. By combining global and local features, the system's performance and robustness against the above-mentioned situations can be further increased . However, this technique has never been used within a complete identification framework for great apes including face detection, face alignment, and face recognition. Therefore, in this paper we propose, design, and evaluate an automated face detection and recognition system for chimpanzees in wildlife environments.
3 Proposed system
3.1 Face and facial feature detection
The first feature type describes the local gradient direction. Sobel kernels of size 3 × 3 extract the gradients s_x and s_y in x- and y-direction, similar to . In homogeneous regions where s_x and s_y equal 0, the final feature is encoded as 0; otherwise, the feature encodes the result of atan2(s_y, s_x) quantized to the range 1…q. Experiments indicated that 35 is a good choice for q, resulting in a quantization interval of slightly more than 10°. We use census features (also known as local binary patterns) as a second feature type. These features describe the local brightness changes within a 3 × 3 neighborhood. The center pixel intensity is compared with its eight neighbors, and the result is encoded as an 8-bit string that indicates which neighboring pixels are less bright than the center pixel. The 3 × 3 local features are complemented by a third feature type that covers enlarged areas. To this end, we encode structures by resized versions of census features that are calculated on image regions of 3u × 3v pixels. These structure features are a superset of the census features. Nevertheless, considering census features separately is well justified in terms of processing speed because they can be calculated much faster for the whole image.
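The census (LBP) feature described above can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation; in particular, the clockwise bit ordering of the eight neighbors is an arbitrary choice:

```python
import numpy as np

def census_transform(img):
    """Census transform / local binary pattern: compare each pixel with its
    8 neighbors in a 3x3 window; a bit is set for every neighbor that is
    less bright than the center pixel, yielding an 8-bit code per pixel."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # top-left corners of the 8 neighbor windows, clockwise from top-left
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    center = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[dy:dy + h - 2, dx:dx + w - 2]
        codes |= (neighbor < center).astype(np.uint8) << bit
    return codes
```

A bright center surrounded by darker pixels yields the code 255 (all bits set), while a perfectly homogeneous patch yields 0.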
The distinction between pixel-based gradient features, census features, and region-based structure features is important for meeting real-time requirements. Pixel-based features are calculated beforehand for the whole image and reused when sliding the analysis window over the image. Region-based features have to be calculated separately for each analysis window. Pixel-based features are suited for a fast candidate search, whereas the more discriminative region-based features improve the performance of candidate verification. We choose a model size of 24 × 24 pixels, which is commonly used for human face detection, and obtain 484 gradient features, 484 census features, and 8,464 structure features. The first stages offer a quick candidate search, and the final stages provide a more accurate but slower classification. The training procedure starts with randomly chosen nonface data for the initial stage. Subsequent stages are trained with nonface data gathered by bootstrapping the model on images without ape faces. More details about the training procedure can be found in our previous work .
A 3 × 3 mean filter reduces noise in the input image. We resize the filtered image with different scaling factors and generate an image pyramid to detect faces of arbitrary size. The detection model of size 24 × 24 pixels analyzes each pyramid level with a coarse-to-fine search to further improve speed: the detection model is shifted with a step size of about 6 pixels across each pyramid level, and the neighborhood of a grid point is scanned more thoroughly only if the grid point produced a high face correlation score.
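The coarse-to-fine grid search on one pyramid level can be sketched as follows. This is illustrative only: `score` stands in for a hypothetical window classifier returning a face correlation score, while the 24-pixel model size and 6-pixel step are taken from the description above:

```python
def coarse_to_fine_scan(score, width, height, model=24, step=6, thresh=0.5):
    """Slide the detection window on a coarse grid (step pixels); refine
    pixel-by-pixel only around grid points with a high correlation score."""
    hits = []
    for y in range(0, height - model + 1, step):
        for x in range(0, width - model + 1, step):
            if score(x, y) < thresh:
                continue  # coarse rejection: no fine scan around this point
            # fine scan: re-evaluate every window in the grid point's vicinity
            for yy in range(max(0, y - step // 2),
                            min(height - model, y + step // 2) + 1):
                for xx in range(max(0, x - step // 2),
                                min(width - model, x + step // 2) + 1):
                    s = score(xx, yy)
                    if s >= thresh:
                        hits.append((xx, yy, s))
    return hits
```

Only a small fraction of window positions is evaluated unless the coarse grid reports candidates, which is what makes the search fast.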
After the face detection process, we apply a subsequent eye search in all detected face regions using the same algorithms. We trained a detection model for each eye with a reduced size of 16 × 16 pixels; only the eye regions were cut out from the annotated training data for this purpose. The eye models are simpler and less powerful than the face model, comprising only five stages and fewer features, because searching within face regions leads to few false positives. Selected areas around the left and right eye in all face regions are scanned with the appropriate eye model at different scaling levels. Fixed eye markers of the face model are used if an eye could not be detected by the eye search.
3.2 Face alignment and lighting normalization
3.3 Individual identification
Individual identification is the main part of the proposed system and consists of three steps: feature extraction, feature space transformation, and classification. In the first step we extract global as well as local visual features, both of which are well suited for discrimination. As those descriptors are too high-dimensional to perform fast and robust face recognition in practice, we apply a feature space transformation technique called LPP  to obtain a lower-dimensional subspace with only little loss of information relevant for identification. These lower-dimensional feature vectors are then used for classification. After classifying the global and local feature vectors separately, we apply a decision fusion technique to obtain the final result.
3.3.1 Feature extraction
Since global features gather holistic information about the face and local descriptors around facial feature points represent intrinsic factors, both should be used for classification. Additionally, it has been reported in the literature that different representations misclassify different patterns . Therefore, various features offer complementary information which can be used to improve the recognition results. As global features we propose to use Gabor features, which are known to perform well in pattern recognition tasks. The complementary local descriptor is SURF, a powerful visual descriptor of interest points in an image.
where the wave vector is defined as $\mathbf{k}_{\mu,\nu} = k_\nu e^{i\phi_\mu}$ with $k_\nu = k_{\max}/f^{\nu}$ and $\phi_\mu = \pi\mu/8$. The maximum frequency is denoted as $k_{\max}$, and $f$ is the spacing factor between kernels in the frequency domain. Furthermore, $\sigma$ represents the ratio of the Gaussian window width to the wavelength.
where $\mathbf{g}_{\mu,\nu}$ denotes a column vector representing the normalized and vectorized version of the magnitude matrix $M_{\mu,\nu}$, which was down-sampled by the factor $\rho$.
For feature extraction we use five scales and eight orientations to generate Gabor kernels of size 31 × 31, with $\sigma = \pi$ and fixed values for $k_{\max}$ and $f$. After convolving an image with the resulting 40 Gabor wavelets, we down-sample each magnitude matrix $M_{\mu,\nu}$ by a factor of $\rho = 8$ using bilinear interpolation.
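The Gabor feature extraction described above can be sketched as follows. This is a simplified illustration: `k_max = π/2` and `f = √2` are common default parameters, not necessarily the values used in the paper, and plain decimation replaces the bilinear down-sampling:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(mu, nu, k_max=np.pi / 2, f=np.sqrt(2), sigma=np.pi, size=31):
    """Complex Gabor wavelet at orientation mu and scale nu. NOTE: k_max and
    f are common default choices, not necessarily the paper's values."""
    k = k_max / f ** nu                 # scale-dependent frequency k_nu
    phi = np.pi * mu / 8.0              # orientation phi_mu
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    envelope = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * r2 / (2 * sigma ** 2))
    carrier = np.exp(1j * k * (x * np.cos(phi) + y * np.sin(phi)))
    return envelope * (carrier - np.exp(-sigma ** 2 / 2))  # DC-free wavelet

def gabor_features(img, rho=8):
    """Convolve with 40 wavelets (5 scales x 8 orientations), keep the
    magnitudes, down-sample by rho (plain decimation instead of bilinear
    interpolation), z-normalize, and concatenate into one feature vector."""
    feats = []
    for nu in range(5):
        for mu in range(8):
            mag = np.abs(fftconvolve(img, gabor_kernel(mu, nu), mode='same'))
            v = mag[::rho, ::rho].ravel()
            feats.append((v - v.mean()) / (v.std() + 1e-8))
    return np.concatenate(feats)
```

For a 64 × 64 face, each of the 40 magnitude maps shrinks to 8 × 8 values, giving a 2,560-dimensional global feature vector.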
3.3.2 Feature space transformation
The resulting feature vectors $\mathbf{x}_k$, with $k = 1,\dots,N$, can then be used for classification.
LPP  assumes that the feature vectors reside on a nonlinear submanifold hidden in the original feature space. LPP tries to find an embedding that preserves local information by modeling the manifold structure of the feature space using a nearest-neighbor graph. First, an adjacency graph G is defined: an edge is inserted between two nodes k and j if the corresponding samples belong to the same class.
The transformation vectors are obtained by solving the generalized eigenvalue problem $X L X^{T}\mathbf{a} = \lambda X D X^{T}\mathbf{a}$, where $D$ is a diagonal matrix whose entries are the column sums of the adjacency matrix $S$, and $L = D - S$ is the so-called Laplacian matrix. The $k$-th column of the matrix $X$ is $\mathbf{x}_k$.
Details about the algorithm and the underlying theory can be found in .
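A supervised LPP computation along these lines might look as follows. This is a minimal sketch, not the reference implementation: the adjacency matrix connects samples of the same class, $D$ holds its column sums, $L = D - S$, and the projection directions solve the generalized eigenvalue problem $XLX^{T}\mathbf{a} = \lambda XDX^{T}\mathbf{a}$; a small regularizer is added for numerical stability:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, labels, dim):
    """Supervised LPP sketch. X holds one sample per column. Returns a
    (features x dim) projection matrix whose columns are the eigenvectors
    of X L X^T a = lambda X D X^T a with the smallest eigenvalues."""
    labels = np.asarray(labels)
    S = (labels[:, None] == labels[None, :]).astype(float)  # same-class edges
    np.fill_diagonal(S, 0.0)
    D = np.diag(S.sum(axis=0))      # column sums of S on the diagonal
    L = D - S                       # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T + 1e-6 * np.eye(X.shape[0])  # regularized for stability
    _, vecs = eigh(A, B)            # generalized eigenvectors, ascending
    return vecs[:, :dim]
```

Projecting with `W.T @ X` then yields the lower-dimensional embedding used for classification.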
Sparse representation classification
where ⊙ denotes elementwise multiplication, known as the Hadamard product. The vector $\boldsymbol{\delta}_i$ is called the characteristic function of class $i$: it is a filter vector which is 1 for all training samples of class $i$ and 0 elsewhere. A detailed description of SRC can be found in .
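A compact sketch of SRC with the characteristic-function filtering can be written as follows. This is illustrative only: the l1-regularized sparse code is found with a simple ISTA loop rather than the solver used in the paper:

```python
import numpy as np

def src_classify(A, labels, y, lam=0.01, iters=500):
    """SRC sketch: find a sparse code x via ISTA (proximal gradient descent
    on l1-regularized least squares), then assign the class whose training
    columns reconstruct the probe y with the smallest residual."""
    A = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)  # unit columns
    labels = np.asarray(labels)
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of grad
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - y))                       # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0)   # soft threshold
    classes = np.unique(labels)
    residuals = np.array([
        # (labels == c) is the characteristic function of class c; the
        # elementwise product with x is the Hadamard product from the text
        np.linalg.norm(y - A @ (x * (labels == c).astype(float)))
        for c in classes])
    return classes[int(np.argmin(residuals))], residuals
```

The residual vector doubles as the classifier's confidence measure, which the decision fusion step below relies on.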
Support vector machines
In the proposed system, we use a support vector machine (SVM)  for the classification of local features. The SVM is a discriminative classifier that attempts to generate an optimal decision plane between the feature vectors of the training classes. In real-world applications, classification with linear separation planes is often not possible in the original feature space. Using the so-called kernel trick, the feature vectors are implicitly transformed into a higher dimensional space in which they can be linearly separated. We use a radial basis function (RBF) kernel in this paper.
3.3.4 Decision fusion
The decision fusion paradigm we use in this paper was influenced by ideas of . A parallel ensemble classifier which fuses the rank outputs of different classifiers is used to combine the results of local and global features. In contrast to the parallel fusion scheme proposed in , where a single weighting function of rank and constant c is used as a nonlinear rank-sum method, we weight the results of both classifiers using a different weighting function for each classifier. Additionally, the confidence of each classifier can be taken into account when generating its weighting vector $\mathbf{w}$, whose entries represent the confidence of SRC or SVM at each rank. For SRC we use the vector of residuals from Equation 11 as the confidence measure, while for SVM the probability estimates of LibSVM  can be utilized. The probability estimates can simply be converted into match scores by negating the probabilities. Details on the estimation of probabilities for SVM can be found in . The final score vector $\mathbf{s}_f \in \mathbb{R}^C$, where $C$ is the number of classes, is then simply the sum of both weighting vectors: $\mathbf{s}_f = \mathbf{w}_{\mathrm{SRC}} + \mathbf{w}_{\mathrm{SVM}}$. Finally, $\mathbf{s}_f$ is sorted in ascending order to obtain the final result.
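The parallel fusion of both classifiers can be sketched like this. It is illustrative only: the paper's specific weighting functions are not reproduced, and a simple rank-times-score weighting stands in for them; scores are assumed to lie in [0, 1] with lower values meaning higher confidence:

```python
import numpy as np

def fuse_rankings(scores_src, scores_svm):
    """Parallel decision fusion sketch: each classifier supplies per-class
    scores (lower = more confident). A weighting vector combines each
    class's rank with its score; the fused score s_f is the sum of both
    weightings, sorted ascending so the best class comes first."""
    def weighting(scores):
        scores = np.asarray(scores, dtype=float)
        ranks = np.empty(len(scores))
        ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
        return ranks * scores  # illustrative rank-times-confidence weighting
    s_f = weighting(scores_src) + weighting(scores_svm)
    return np.argsort(s_f)     # class indices ordered best-first
```

Because both rank and raw confidence enter the weighting, a class that one classifier ranks first with low confidence can be overruled by a class that both classifiers rank highly.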
4 Experiments and results
4.1 Dataset description
Overview of the datasets we used in our experiments
4.2 Evaluation measures and experiment design
Since the face detection stage will produce false-positive detections, we decided to use an open-set identification scheme to deal with this issue. We use the performance statistics described in -  to evaluate our system. In open-set identification, the system first has to decide whether the probe p_j represents a sample of an individual in the gallery or not. If the system decides that the individual in the probe is known, it also has to report the identity of the individual. While for closed-set identification the question is how many test images are correctly classified as a certain individual, two more types of errors can occur in open-set classification: in addition to false classifications, it is also possible that the system rejects known individuals or accepts impostors. Let $\mathcal{P}_G$ be the probe set that contains face images of chimpanzees in the gallery and $\mathcal{P}_N$ the probe set that contains samples of chimpanzees that are not known to the system. When a probe p_j is presented to the system, a score vector $\mathbf{s} \in \mathbb{R}^C$ can be calculated, where $C$ is the number of known individuals in the database. The entries of this vector are scaled between 0 and 1; the smaller the value, the higher the confidence of the classifier. For SRC we use the vector of residuals r from Equation 11 as confidence measures, while for the proposed decision fusion technique, the combined weightings $\mathbf{s}_f$ can be used as score values for each class. For classification by SVM, the probabilities of the classifier can be negated to obtain the score vector.
An ideal system would have a detection and identification rate of 1.0 and a false alarm rate of 0.0, which means that all individuals are detected and classified correctly and there are no false alarms. In practice, however, both measures have to be traded off against each other. This trade-off is shown in a receiver operating characteristic (ROC) curve obtained by iteratively changing the operating threshold τ. Another important performance statistic is the equal error rate (EER). It is reached when the false alarm rate equals one minus the detection and identification rate, i.e., P_FA = 1 − P_DI.
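The open-set statistics and the EER operating point can be computed as in the following sketch (illustrative; scores are assumed to lie in [0, 1] with lower values meaning higher classifier confidence, as described above):

```python
import numpy as np

def open_set_rates(genuine_scores, genuine_correct, impostor_scores, tau):
    """Open-set statistics at threshold tau: a genuine probe counts towards
    the detection-and-identification rate P_DI if its best score is <= tau
    AND the top-ranked identity is correct; an impostor probe with a best
    score <= tau raises a false alarm (P_FA)."""
    accepted = np.asarray(genuine_scores) <= tau
    p_di = float(np.mean(accepted & np.asarray(genuine_correct)))
    p_fa = float(np.mean(np.asarray(impostor_scores) <= tau))
    return p_di, p_fa

def eer_threshold(genuine_scores, genuine_correct, impostor_scores):
    """Sweep tau over [0, 1] and return the operating point where P_FA is
    closest to 1 - P_DI, i.e., the equal error rate condition."""
    def gap(tau):
        p_di, p_fa = open_set_rates(genuine_scores, genuine_correct,
                                    impostor_scores, tau)
        return abs(p_fa - (1.0 - p_di))
    return min(np.linspace(0.0, 1.0, 1001), key=gap)
```

Sweeping τ in this way also yields the points of the ROC curve described above.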
In addition to false-positive detections, one individual at a time is removed from the training set and presented as an impostor to test the system's capability to reject unknown chimpanzees. This procedure is repeated C times, where C is the number of individuals in the dataset, such that every chimpanzee takes the role of an impostor once. To obtain valid results, we additionally apply a tenfold stratified cross-validation. Images of false-positive detections as well as all pictures of the unknown individual remain in the test set for all ten folds and are not used for training. We only consider detections with a minimum size of 64 × 64 pixels for identification, which dramatically decreases the number of false-positive detections. Furthermore, we only focus on individuals with at least five detected face images in the database to obtain an appropriate number of training images for each class. This limitation results in 24 individuals for the ChimpZoo dataset and 48 subjects for the ChimpTaï dataset. After aligning the detected face images as described in Section 3.2, we apply a histogram equalization for lighting normalization. To make the results comparable, we chose a feature dimension of 160 for all applied feature space transformation techniques. For the local SURF features, we transform the resulting feature vectors separately into a lower-dimensional subspace of size 50 for each of the six facial fiducial points before concatenating them into the final feature vector. This results in a local feature vector of size 6 × 50 = 300.
4.3 System evaluation
4.3.1 Face detection
4.3.2 Face identification
Experiment 1: influence of visual features and feature space transformation
EER for Gabor and pixel-based features for feature space transformation
Since global Gabor features in conjunction with LPP achieve the best results in the first experiment, this combination should be used for holistic face recognition in primates. However, in the next experiment we show that this algorithm can still be enhanced by additionally using locally extracted SURF features and our proposed decision-based fusion scheme.
Experiment 2: combination of global and local features
EER for global Gabor and local SURF features and proposed parallel decision fusion scheme
It is obvious that our proposed fusion scheme performs better than global and local features alone. Therefore, the idea of using the confidences of both classifiers improves the performance of the face recognition algorithm for chimpanzee faces in real-world environments.
However, we still used manually annotated eye coordinates for alignment and estimation of facial fiducial points for local feature extraction. In the final experiment we use the automatically detected facial markings for this purpose.
Experiment 3: manually annotated vs. automatically detected facial markings
EER of the proposed identification algorithm if alignment was applied
If manual markings are used for alignment and for the estimation of the facial fiducial points for local feature extraction, the proposed algorithm performs best. However, if we use automatically detected eye coordinates, the performance is only slightly worse than for manually annotated markings. One reason is that the automatic detection of eye coordinates is not always as accurate as manual annotation. Another is that for the automatically detected markings, the coordinates for local feature extraction could only be estimated from the locations of both eyes, whereas for the manually annotated ones we additionally used the annotated location of the mouth to estimate these locations more precisely. Therefore, the local feature extraction is much more accurate if an exact location of the mouth region is available.
P_FR and P_FC at EER for both datasets
It can be seen that for the ChimpZoo dataset, the main contribution to the overall error rate of the system is caused by falsely rejected faces of genuine individuals, with a P_FR of 12.53%; only 3.52% is due to false classifications. This shows that many facial images of known identities were rejected as impostors because of too much pose variation, facial expression, or occlusion. For the ChimpTaï dataset, however, the system's error is caused almost equally by false classifications, with a P_FC of 17.73%, and false rejections, with a P_FR of 13.52%. This shows that the ChimpTaï dataset is much more challenging than the ChimpZoo dataset because it was gathered in a wildlife environment. Furthermore, the ChimpTaï dataset contains twice as many individuals at a much lower image quality, which again explains the strong influence of false classifications on the overall error of the proposed system.
In the ongoing biodiversity crisis, many species, including great apes like chimpanzees, are threatened and need to be protected. An essential part of efficient biodiversity and wildlife conservation management is population monitoring and individual identification to estimate population sizes, assess viability, and evaluate the success of implemented protection schemes. Therefore, the development of new monitoring techniques using autonomous recording devices is currently an area of intense research . However, manually processing large amounts of data is tedious and therefore extremely time-consuming and cost-intensive.
To overcome these issues, we presented an automated identification framework for chimpanzees in real-world environments. Based on the assumption that humans and chimpanzees share similar properties of the face, we proposed in our previous work to use face detection and recognition technology for the identification of great apes. In this paper we combined face detection, face alignment, and face recognition into a complete identification system for chimpanzee faces in real-world environments. We successfully combined globally extracted holistic features and local descriptors for identification using a decision fusion scheme. As global features we used the well-established Gabor features. We transformed the resulting high-dimensional feature vectors into a smaller, more discriminative subspace using LPP. For classification we used an algorithm called SRC. Since it is known from the literature that different features encode different information, we also extracted SURF descriptors around local facial feature points to make the system more robust against difficult lighting situations, various poses and expressions, and partial occlusion by branches, leaves, or other individuals. We separately transformed the resulting SURF descriptors into a lower-dimensional subspace for every facial fiducial point. After concatenating the resulting low-dimensional descriptors into one comprehensive vector of local features, we used an SVM with RBF kernel for classification. We combined the classification results of global and local features in a decision-based manner by taking the confidences of both classifiers into account. Furthermore, we thoroughly evaluated the proposed algorithm on two expert-annotated datasets of captive and free-living chimpanzee individuals using an open-set classification scheme. In three experiments we showed that our approach outperforms previously presented algorithms for chimpanzee identification.
Although both datasets were gathered in real-world environments, as opposed to most datasets used to evaluate algorithms for human face recognition, our system performs very well and achieves promising results. Therefore, the presented framework can be applied in real-life scenarios for the identification of great apes. The system will thus assist biologists, researchers, and gamekeepers with the tedious annotation of gathered image and video material and therefore has the potential to open up new avenues for efficient and innovative wildlife monitoring and biodiversity conservation management. Currently, intensive pilot studies using autonomous infrared-triggered remote video cameras are being conducted in Loango National Park, Gabon, and Taï National Park, Côte d’Ivoire. These studies have provided promising results in both the number of species detected and visitation rates, demonstrating the potential of such an approach for biomonitoring. Our proposed framework for automatic detection and identification of chimpanzees will help researchers to efficiently scan and retrieve the video sequences that are important for biologists, i.e., those in which chimpanzees or other great apes are present. Given an annotated dataset of labeled chimpanzee faces, the system will also be able to recognize known and reject unknown individuals. Although grouping similar-looking faces of unknown individuals remains future work, such an approach could help biologists to expand the dataset of known chimpanzees over time and successively improve the accuracy of the system. Biologists would then be able to conduct biodiversity time-series analyses to assess whether significant fluctuations in biodiversity occur.
Although the presented system achieved very good results on both datasets, we hope to further increase its performance by extending the approach to face recognition in video. Because the temporal component of video can contain important information for identification, we expect further improvement from exploiting it. For example, finding the shots in a video sequence that are best suited for face recognition in terms of pose, motion blur, and lighting could be one way to extend the system towards video. Furthermore, frame-weighting algorithms or techniques such as super-resolution are conceivable ways to take advantage of temporal information in video sequences. In addition, automatic detection of more facial features could lead to better alignment and more precise localization of facial fiducial points for local feature extraction, which would further improve the performance of the system.
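One simple way to select the frames best suited for recognition, as outlined above, is to score each frame by a sharpness measure and keep the sharpest ones. The sketch below uses the variance of a 4-neighbour Laplacian as a blur proxy; this is a common focus measure, not the paper's method, and the function names are illustrative only:

```python
import numpy as np

def sharpness(frame):
    """Variance of a simple 4-neighbour Laplacian over the frame interior.

    Higher values indicate more high-frequency detail, i.e. less motion
    blur; a constant (fully blurred) frame scores 0.
    """
    f = np.asarray(frame, dtype=float)
    lap = (-4.0 * f[1:-1, 1:-1]
           + f[:-2, 1:-1] + f[2:, 1:-1]     # vertical neighbours
           + f[1:-1, :-2] + f[1:-1, 2:])    # horizontal neighbours
    return float(lap.var())

def best_frames(frames, k=3):
    """Return the (sorted) indices of the k sharpest frames."""
    scores = [sharpness(fr) for fr in frames]
    return sorted(np.argsort(scores)[::-1][:k])
```

For instance, given a flat grey frame, a checkerboard frame, and a constant white frame, `best_frames(..., k=1)` picks the checkerboard, since only it carries local contrast. In a full video pipeline, such scores could also serve as the per-frame weights mentioned above.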
This work was funded by the German Federal Ministry of Education and Research (BMBF) under the ‘Pact for research and innovation’. We thank the Ivorian authorities for their long-term support, especially the Ivorian Ministère de l’Environnement, des Eaux et Forêts and the Ministère de l’Enseignement Supérieur et de la Recherche Scientifique, the directorship of the Taï National Park, the OIPR, and the CSRS in Abidjan. Financial support from the Swiss Science Foundation is gratefully acknowledged. We would especially like to thank Dr. Tobias Deschner for collecting videos and pictures over the last years and for providing invaluable assistance during the data collection. We thank all the numerous field assistants and students for their work on the Taï Chimpanzee Project. We thank the Zoo Leipzig and the Wolfgang Köhler Primate Research Center (WKPRC), especially Josep Call and all the numerous research assistants, zoo-keepers, and Josefine Kalbitz, for their support and collaboration. We also thank Laura Aporius for providing the videos and pictures in 2010 and for the annotation of data. This work was supported by the Max Planck Society.
- Hilton-Taylor C, Stuart SN: Wildlife in a Changing World - an Analysis of the 2008 IUCN Red List of Threatened Species. Gland, Switzerland: IUCN; 2009. http://www.iucnredlist.org/technical-documents/references
- Walsh PD, Abernethy KAA, Bermejo M: Catastrophic ape decline in western equatorial Africa. Nature 2003, 422: 611-614. 10.1038/nature01566
- Campbell G, Kuehl H, Kouame PN, Boesch C: Alarming decline of west African chimpanzees in Côte d’Ivoire. Current Biology 2008, 18(19): R904-905. 10.1016/j.cub.2008.07.065
- Rowcliffe JM, Carbone C: Surveys using camera traps: are we looking to a brighter future? Anim. Conserv 2008, 11(3): 185-186. 10.1111/j.1469-1795.2008.00180.x
- Loos A, Pfitzer M, Aporius L: Identification of great apes using face recognition. In 19th European Signal Processing Conference (EUSIPCO). Barcelona; 29 August–2 September 2011.
- Loos A: Identification of great apes using Gabor features and locality preserving projections. In 1st ACM International Workshop on Multimedia Analysis for Ecological Data (MAED) in Conjunction with ACM Multimedia. New York: ACM; 2012.
- Loos A, Ernst A: Detection and identification of chimpanzee faces in the wild. In IEEE International Symposium on Multimedia. Irvine, CA; 10–12 December 2012.
- Loos A: Identification of primates using global and local features. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver; 26–31 May 2013.
- Ernst A, Küblbeck C: Fast face detection and species classification of African great apes. In 8th IEEE International Conference on Advanced Video and Signal Based Surveillance. New York: IEEE; 2011.
- Bay H, Ess A, Tuytelaars T, Gool LV: Speeded-up robust features (SURF). Comput. Vis. Image Underst 2008, 110(3): 346-359. 10.1016/j.cviu.2007.09.014
- Rowley HA, Baluja S, Kanade T: Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell 1998, 20: 23-38. 10.1109/34.655647
- Viola P, Jones M: Rapid object detection using a boosted cascade of simple features. In IEEE Conference on Computer Vision and Pattern Recognition, vol. 1. New York: IEEE; 2001.
- Freund Y, Schapire RE: A short introduction to boosting. J. Japanese Soc. Artif. Intell 1999, 14(5): 771-780.
- Viola P, Jones M: Robust real-time object detection. Int. J. Comput. Vis 2002, 57(2): 137-154.
- Lienhart R, Maydt J: An extended set of Haar-like features for rapid object detection. In IEEE International Conference on Image Processing (ICIP), vol. 1. New York: IEEE; 2002.
- Wu B, Haizhou A, Chang H, Shihong L: Fast rotation invariant multi-view face detection based on real Adaboost. In 6th IEEE International Conference on Automatic Face and Gesture Recognition. New York: IEEE; 2004.
- Wawerla J, Marshall S, Mori G, Rothley K, Sabzmeydani P: BearCam: automated wildlife monitoring at the Arctic Circle. Mach. Vis. Appl 2009, 20(5): 303-317. 10.1007/s00138-008-0128-0
- Burghardt T, Calic J: Real-time face detection and tracking of animals. In 8th Seminar on Neural Network Applications in Electrical Engineering. Belgrade, Serbia & Montenegro; 25–27 September 2006: 27-32.
- Spampinato C, Giordano D, Di Salvo R: Automatic fish classification for underwater species behavior understanding. In ACM International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams (ARTEMIS). Firenze, Italy; 29 October 2010: 40-50.
- Spampinato C, Palazzo S, Boom B, van Ossenbruggen J, Kavasidis I, Di Salvo R, Lin FP, Giordano D, Hardman L, Fisher RB: Understanding fish behavior during typhoon events in real-life underwater environments. Multimedia Tools Appl 2012. 10.1007/s11042-012-1101-5
- Turk M, Pentland A: Eigenfaces for recognition. J. Cogn. Neurosci 1991, 3: 71-86.
- Belhumeur PN, Hespanha JP, Kriegman DJ: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell 1997, 19(7): 711-720. 10.1109/34.598228
- He X, Yan S, Hu Y, Niyogi P, Zhang HJ: Face recognition using Laplacianfaces. IEEE Trans. Pattern Anal. Mach. Intell 2005, 27(3): 328-340.
- Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell 2009, 31(2): 210-227.
- Yang M, Zhang L: Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary. In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol. 6316. Heidelberg: Springer; 2010: 448-461.
- Ardovini A, Cinque L, Sangineto E: Identifying elephant photos by multi-curve matching. Pattern Recognit 2008, 41(6): 1867-1877. 10.1016/j.patcog.2007.11.010
- Araabi BN, Kehtarnavaz N, McKinney T, Hillman G, Würsig B: A string matching computer-assisted system for dolphin photoidentification. Ann. Biomed. Eng 2000, 28(10): 1269-1279.
- Burghardt T, Campbell N: Individual animal identification using visual biometrics on deformable coat patterns. In 5th International Conference on Computer Vision Systems (ICVS). Bielefeld; 21–24 March 2007. http://biecoll.ub.uni-bielefeld.de/volltexte/2007/20/
- Burghardt T: Visual animal biometrics - automatic detection and individual identification by coat pattern. PhD Thesis, University of Bristol, 2008.
- Lahiri M, Warungu R, Rubenstein DI, Berger-Wolf TY, Tantipathananandh C: Biometric animal databases from field photographs: identification of individual zebra in the wild. In ACM International Conference on Multimedia Retrieval (ICMR). New York: ACM; 2011.
- Schapire RE, Singer Y: Improved boosting algorithms using confidence-rated predictions. Mach. Learn 1999, 37(3): 297-336. 10.1023/A:1007614523901
- Fröba B: Verfahren zur Echtzeit-Gesichtsdetektion in Grauwertbildern. Aachen: Shaker; 2003.
- Zabih R, Woodfill J: Non-parametric local transforms for computing visual correspondence. In Third European Conference on Computer Vision Proceedings, Volume II. Stockholm, Sweden; 2–6 May 1994: 151-158.
- Ojala T, Pietikäinen M, Harwood D: A comparative study of texture measures with classification based on featured distributions. Pattern Recognit 1996, 29: 51-59. 10.1016/0031-3203(95)00067-4
- Gao Y, Wang Y, Feng X, Zhou X: Face recognition using most discriminative local and global features. In International Conference on Pattern Recognition (ICPR). New York: IEEE; 2006.
- Wiskott L, Fellous JM, Krüger N, von der Malsburg C: Face recognition by elastic bunch graph matching. IEEE Trans. Pattern Anal. Mach. Intell 1997, 19(7): 775-779. 10.1109/34.598235
- Liu C, Wechsler H: Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Trans. Image Proc 2002, 11(4): 467-476. 10.1109/TIP.2002.999679
- Xie S, Shan S, Chen X: Fusing local patterns of Gabor magnitude and phase for face recognition. IEEE Trans. Image Proc 2010, 19(5).
- Lowe D: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis 2004, 60(2): 91-110.
- Chang CC, Lin CJ: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Tech 2011, 2. 10.1145/1961189.1961199
- Gökberk B, Salah AA, Akarun L: Rank-based decision fusion for 3D shape-based face recognition. In International Conference on Audio- and Video-Based Biometric Person Identification (AVBPA). Berlin Heidelberg: Springer-Verlag; 2005.
- Wu TF, Lin CJ, Weng R: Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res 2004, 5: 975-1005.
- Ernst A, Ruf T, Küblbeck C: A modular framework to detect and analyze faces for audience measurement systems. In Proceedings of the 2nd Workshop on Pervasive Advertising, in conjunction with Informatik 2009. Lübeck, Germany; 2 October 2009: 3941-3953.
- Phillips PJ, Rizvi S, Rauss P: The FERET evaluation methodology for face recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell 2000, 22: 1090-1104. 10.1109/34.879790
- Phillips PJ, Grother P, Micheals R, Blackburn D, Tabassi E, Bone J: Face recognition vendor test 2002: evaluation report. NISTIR 6965, National Institute of Standards and Technology, 2003.
- Phillips PJ, Grother P, Micheals R: Chapter 21: evaluation methods in face recognition. In Handbook of Face Recognition. London: Springer-Verlag; 2011: 551-574.
- Ahumada JA, Silva CEF, Gajapersad K, Hallam C, Hurtado J, Martin E, McWilliam A, Mugerwa B, O’Brien T, Rovero F, Sheil D, Spironello WR, Winarni N, Andelman SJ: Community structure and diversity of tropical forest mammals: data from a global camera trap network. Philos. Trans. R. Soc. London B 2011, 366(1578): 2703-2711. 10.1098/rstb.2011.0115
- Head J, Boesch C, Robbins M, Rabal L, Makaga L, Kuehl H: Effective socio-demographic population assessment of elusive species for ecology and conservation management. Ecol. Evol 2013. 10.1002/ece3.670
- Hoppe-Dominik B, Kühl H, Radl G, Fischer F: Long-term monitoring of large rainforest mammals in the biosphere reserve of Taï National Park, Côte d’Ivoire. Afr. J. Ecol 2011, 49(4): 450-458. 10.1111/j.1365-2028.2011.01277.x
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.