 Research
 Open Access
2D and 3D analysis of animal locomotion from biplanar X-ray videos using augmented active appearance models
 Daniel Haase^{1} and
 Joachim Denzler^{1}
https://doi.org/10.1186/1687-5281-2013-45
© Haase and Denzler; licensee Springer. 2013
 Received: 31 January 2013
 Accepted: 31 July 2013
 Published: 12 August 2013
Abstract
For many fundamental problems and applications in biomechanics, biology, and robotics, an in-depth understanding of animal locomotion is essential. To analyze the locomotion of animals, high-speed X-ray videos are recorded, in which anatomical landmarks of the locomotor system are of main interest and must be located. To date, several thousand sequences have been recorded, which makes a manual annotation of all landmarks practically impossible. Automating X-ray landmark tracking in locomotion scenarios is therefore worthwhile. However, tracking all landmarks of interest is a very challenging task, as severe self-occlusions of the animal and low contrast are present in the images due to the X-ray modality. For this reason, existing approaches are currently only applicable to very specific subsets of anatomical landmarks. In contrast, our goal is to present a holistic approach which models all anatomical landmarks in one consistent, probabilistic framework. While active appearance models (AAMs) provide a reasonable global modeling framework, they yield poor fitting results when applied to the full set of landmarks. In this paper, we propose to augment the AAM fitting process by imposing constraints from various sources. We derive a general probabilistic fitting approach and show how results of subset AAMs, local tracking, anatomical knowledge, and epipolar constraints can be included. The evaluation of our approach is based on 32 real-world datasets of five bird species which contain 175,942 ground-truth landmark positions provided by human experts. We show that our method clearly outperforms standard AAM fitting and provides reasonable tracking results for all landmark types. In addition, we show that the tracking accuracy of our approach is even sufficient to provide reliable three-dimensional landmark estimates for calibrated datasets.
Keywords
 Active appearance models
 X-ray videography
 Landmark tracking
 Animal locomotion analysis
1 Introduction
For an evaluation of acquired data, anatomical landmarks (usually skeletal joints of the locomotor system such as hip joints, knee joints, intertarsal joints, and phalangeal joints [6, 7]) have to be located in the images. Most evaluations to date solely rely on human experts (e.g., [5, 6]), which is an extremely time-consuming process and complicates the realization of large-scale studies. An automation of this process would therefore greatly benefit research in the aforementioned areas [9]. However, as almost all parts of an animal's skeletal system undergo severe self-occlusions during locomotion (cf. Figure 1), developing fully automatic tracking methods for this application is a challenging task.
In this paper, we address the issue of landmark tracking in X-ray sequences of grounded locomotion of birds. We present a novel method which, unlike previous approaches, is able to track all landmarks used in locomotion analysis and can overcome many other practically relevant drawbacks of existing methods (see Subsection 1.2) using a unified, consistent, and probabilistic framework that combines the complementary paradigms of model-driven and data-driven tracking.
1.1 Related work
For very simple scenarios of locomotion analysis, straightforward tracking approaches such as template matching can be applied [9]. Due to severe occlusions, however, template matching and a variety of other standard methods such as optical flow/KLT and its extensions [10–12], region tracking [13, 14], and SIFT-based tracking [15] were proven to be unsuited for X-ray analyses in the challenging scenario at hand [16, 17]. A more advanced approach for skeletal tracking is based on image registration between recorded X-ray images and back-projected CT scans [3, 18, 19]. However, in most cases this method is only feasible for medical applications, as a full CT scan is necessary for each subject to be analyzed.
An alternative, completely data-driven approach for robust template tracking in X-ray sequences was recently proposed in [16]. As standard template tracking fails due to the severe occlusions, the idea is to divide the template to be tracked into certain subtemplates. For each frame, all subtemplates are matched to the target image individually, and the results of these subtemplates are then merged to obtain one consistent parameter transformation for the whole template. The important difference between [16] and existing subtemplate-based approaches such as [20–22] lies in the fusion of subtemplate results. While previous approaches employ a hard decision between occluded and non-occluded subtemplates, the authors in [16] use a soft decision which exploits special properties of X-ray images. It has proven to be well suited for X-ray bone tracking under moderate occlusions (e.g., for the lower leg landmarks in the side view) [16]. However, due to its data-driven nature, landmarks undergoing severe occlusions (landmarks occluded by the torso, e.g., knee landmarks of the side view or feet landmarks of the top view, cf. Figure 1) cannot be handled.
To overcome such problems of data-driven approaches, model-driven methods are generally able to estimate landmark positions, even under total occlusions, by using global context. One prominent example of global models are active appearance models (AAMs) [23–25]. Besides many applications for human face modeling (e.g., [24, 26, 27]) and medical image analysis (e.g., [28, 29]), AAMs have also been successfully applied to landmark tracking in X-ray locomotion scenarios [17, 30]. One major problem in our scenario, however, is that the movement of the animals is often very complex. As a result, especially for the lower legs, landmark configurations during locomotion substantially differ from the mean landmark configuration, i.e., the motions are non-stationary [31, 32]. As discussed in [31] and [33], this situation drastically complicates the fitting of AAM-like models. Besides the non-stationary motion, another major problem is the non-discriminative texture information of the lower leg landmarks (cf. landmarks 12 to 15 and 19 to 22 in Figure 1b), which additionally complicates the fitting process of AAMs. Thus, the aforementioned standard AAM-based approaches only work when neglecting the set of non-stationary landmarks, as in [17, 30, 34].
To combine the benefits of data-driven and model-driven methods, several hybrid models were developed over time. One straightforward example is combined local models [35], where the shape is modeled globally, as for AAMs, but the texture is modeled locally around each landmark. A recently proposed probabilistic example of this approach is discriminative Bayesian active shape models [36], where many local detectors are used to estimate a global landmark configuration. Both approaches, however, model landmark motions similarly to AAMs and are thus very likely to suffer from the same problems as well.
1.2 Motivation
As mentioned in the last subsection, data-driven [16] as well as model-driven approaches [17, 30] exist for landmark tracking in X-ray locomotion analysis. However, all previously published works in this field suffer from at least one of the following shortcomings, which is a major obstacle to using these methods in actual zoological and biomechanical studies:

Only very specific anatomical landmark subsets can be tracked, e.g., the torso landmarks [17, 30] or the lower leg landmarks [16]. In addition, certain landmarks exist which are covered by neither of the current approaches, e.g., the lower leg landmarks of the top camera view (cf. landmarks 12 to 15 and 19 to 22 in Figure 1b).

Although model-driven and data-driven approaches can generally complement each other (e.g., [36]), only one paradigm is used at a time [16, 17, 30, 34].

For data-driven approaches, only landmarks of the side camera view are considered due to severe self-occlusions in the top camera view [16].

Modeling, tracking, and evaluation are performed solely using two-dimensional (2D) approaches, although an accurate camera calibration can be obtained for newly recorded datasets [16, 17, 30, 34].

The validation of the tracking methods is performed only on very few real-world datasets, as obtaining ground-truth landmarks is a tedious and time-consuming task [17, 30, 34].
The remainder of this paper is structured as follows. First, an overview of standard AAMs is given in Section 2, as AAMs form the baseline of our method. In Section 3, we present augmented AAMs as our approach for landmark tracking in Xray locomotion sequences. After deriving a general fitting framework, we describe the constraints used in our specific case. The validation of our approach is presented and discussed in Section 4.
2 Active appearance models
This section gives an overview of standard AAMs [23–25], which form the baseline of our augmented approach presented in Section 3. AAMs are parametric statistical models which describe the visual appearance of arbitrary object classes. The variation in object appearance is modeled by a shape component (represented by image landmarks) and a shapefree texture component. AAMs are trained from sample images with annotated landmark positions. Once learned, a trained model can be fit to unseen images automatically. In the following subsections, the basic training and fitting procedure of standard AAMs will be described.
2.1 AAM training
The resulting texture model is $\mathit{g}=\overline{\mathit{g}}+{\mathit{P}}_{g}{\mathit{b}}_{g}$, where g is an arbitrary shape-normalized texture with texture parameters b _{g}, P _{g} is the matrix of texture eigenvectors, and $\overline{\mathit{g}}=\frac{1}{N}\sum _{n=1}^{N}{\mathit{g}}_{n}$ is the mean texture of the samples.
To obtain a combined representation of both shape and texture, the third (albeit optional) step of AAM training is to merge shape and texture parameters into one parameter set. This is achieved by concatenating the variance-normalized shape and texture parameter vectors for each training sample and again applying PCA. Each object instance can then be represented by its combined parameters b _{c}. The final parameter count, i.e., the dimension of b _{c}, is then reduced by discarding parameters which explain only a small fraction of the total variance.
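All three training steps rest on the same PCA building block. A minimal sketch of this step is given below; the function and variable names are illustrative and not taken from the paper, and the variance threshold is an assumption:

```python
import numpy as np

def train_pca_model(samples, var_keep=0.98):
    """PCA model as used for the AAM shape, texture, and combined
    components: mean, eigenvector matrix, and per-sample parameters.

    samples: (N, D) matrix of aligned shape or texture vectors.
    """
    mean = samples.mean(axis=0)
    X = samples - mean
    # SVD of the centered data yields the PCA eigenvectors directly
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s ** 2 / max(len(samples) - 1, 1)
    # discard modes explaining only a small fraction of the total variance
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_keep)) + 1
    P = Vt[:k].T          # (D, k) eigenvector matrix
    b = X @ P             # per-sample model parameters
    return mean, P, b
```

Any training sample can then be reconstructed as mean + P b, and unseen shapes or textures are approximated by projecting onto the retained modes.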
2.2 AAM fitting
In its original formulation [23–25], this problem was solved in an iterative manner by assuming a linear relationship δ b _{c} = A δ g between the necessary model parameter changes δ b _{c} and the current image difference δ g, where A can be learned in advance. In general, however, such a simple constant relationship between δ b _{c} and δ g does not exist, which can lead to suboptimal fitting results [40]. An alternative optimization approach for Equation 3 is the inverse compositional/project-out algorithm [40]. By decoupling shape and texture parameters, it allows for a very efficient alignment that eliminates many drawbacks of the original AAM fitting method.
Note, however, that our augmented AAM approach presented in Section 3 is independent of the actual optimization scheme: it can be based on both the additive and the inverse compositional methods (cf. Subsection 3.1).
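The additive update scheme can be sketched in a few lines. The sketch below assumes a precomputed update matrix A and abstracts image sampling and model synthesis into callables; all names are hypothetical:

```python
import numpy as np

def fit_additive(sample_texture, model_texture, A, b0, n_iter=20):
    """Simplified additive AAM fitting loop: the texture residual delta_g
    is mapped to a parameter update via the precomputed matrix A and
    subtracted from the current parameters.

    sample_texture(b): texture sampled from the image under parameters b.
    model_texture(b):  texture synthesized by the model for parameters b.
    """
    b = np.asarray(b0, dtype=float).copy()
    for _ in range(n_iter):
        delta_g = sample_texture(b) - model_texture(b)  # current image difference
        b = b - A @ delta_g                             # assumed linear relationship
    return b
```

When the residual really is linear in the parameter error, a single iteration with a well-chosen A converges; in practice the relationship only holds approximately near the optimum, which is exactly the weakness discussed above.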
2.3 Multi-view extension
While standard AAMs can only be used for a single camera view, extensions are available for scenarios which contain more than one camera, e.g., [41] or [42]. In our case, a biplanar image acquisition is usual, although monocular sequences also exist. In addition, for many previously recorded datasets from biological studies such as [7], a calibration of the camera setup is not available. Therefore, in our locomotion scenario, it is generally not possible to apply any of the methods mentioned above, as they rely on certain assumptions about the scene. However, it is still possible to exploit relationships between multiple camera views using multi-view AAMs [43, 44], as shown in [30].
The construction of multi-view AAMs is closely related to standard AAMs. Let K denote the number of camera views. As a first step, the aligned landmark vectors ${\mathit{s}}_{n}^{\left(1\right)},\dots ,{\mathit{s}}_{n}^{\left(K\right)}$ of all camera views are concatenated into one vector s _{n}′. Afterwards, PCA is applied to obtain the multi-view shape model in the same manner as for standard AAMs. Analogously, for each training sample the texture vectors of all views are concatenated to form the vector g _{n}′ and PCA is applied. Note that this multi-view extension is used in exactly the same manner for augmented AAMs, which are presented in the following section.
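The concatenate-then-PCA construction can be sketched as follows (function names are illustrative). The key effect is that motion correlated across views ends up in shared modes:

```python
import numpy as np

def multiview_shape_model(views, k):
    """Multi-view shape model: the aligned landmark vectors of all K views
    are concatenated per training sample, then PCA is applied exactly as in
    the single-view case, so correlated motion across views is captured by
    shared modes. views: list of K arrays of shape (N, D_k)."""
    X = np.concatenate(views, axis=1)   # (N, D_1 + ... + D_K)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T               # mean and first k eigenvectors
```

If the views were modeled independently, this cross-view coupling, which later also helps constrain occluded landmarks, would be lost.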
3 Augmented AAM approach
In the following, augmented AAMs, our extension of standard AAMs, are presented. As stated in the motivation (cf. Subsection 1.2), the goal is to overcome poor fitting results in cases of non-stationary shape activities [31, 32] and non-discriminative texture information, which is particularly true for the locomotion analysis scenario presented in this paper. We achieve this goal by augmenting the fitting process of standard AAMs with various types of constraints. A general overview of augmented AAMs is shown in Figure 2. It depicts the different components which contribute to the final system, most of which are directly based on the given training data. An AAM trained on all landmarks of the training data forms the baseline of our augmented AAM (‘full AAM training’ in Figure 2). The fitting step of this AAM is then augmented using constraints derived from (1) a standard AAM trained only on the subset of stationary (i.e., torso and upper leg) landmarks, (2) local tracking methods for lower leg landmarks, (3) anatomical knowledge, and (4) the epipolar geometry of the scene.
In Subsection 3.1, we first derive a general framework for the inclusion of AAM fitting constraints. The remainder of this section gives a detailed description of the particular constraints used for the application to locomotion sequences. In Subsection 3.6, the necessary conditions of our approach and its generalization ability to other scenarios are discussed.
3.1 AAM fitting with constraints
For the first likelihood term p(I | b _{c}), not the whole input image I is relevant, but only its sampled version g _{image}, which is based on the AAM shape configuration specified by b _{c}, i.e., p(I | b _{c}) = p(g _{image} | b _{c}). As for standard AAMs, we assume the fitting process to be initialized at a parameter combination close to the optimal value. The likelihood can then be modeled as a Gaussian distribution ${\mathit{g}}_{\text{image}}\mid {\mathit{b}}_{\mathrm{c}}\sim \mathcal{N}({\mathit{g}}_{\text{model}},{\mathit{\Sigma}}_{\delta \mathit{g}})$ or, equivalently, p(g _{image} | b _{c}) = p(δ g) with $\delta \mathit{g}\sim \mathcal{N}(0,{\mathit{\Sigma}}_{\delta \mathit{g}})$. The covariance matrix Σ _{ δ g } of the texture errors can be estimated in the training step of the AAM and is usually assumed to be diagonal due to its large dimensionality (cf. Subsection 2.2).
The likelihood term p(π | b _{c}) of Equation 5 integrates constraints into the fitting process. Here, π is a vector which contains the differences between given target values (constraints) and the actual values based on the current AAM parameters b _{c}. We again assume a Gaussian distribution, i.e., $\mathit{\pi}\mid {\mathit{b}}_{\mathrm{c}}\sim \mathcal{N}(0,{\mathit{\Sigma}}_{\mathit{\pi}})$. Concrete configurations of π for different types of priors will be presented in the following subsections. Note that if multiple prior types are used, as is the case in our scenario, Equation 5 contains one likelihood term for each prior type.
The prior term p(b _{c}) of Equation 5 can be modeled in various ways, e.g., using a uniform distribution (resulting in a maximum likelihood estimation) or a zeromean Gaussian distribution [37]. To favor model configurations with a low complexity, in this work we prefer the latter method.
As a result of the above considerations, maximizing Equation 5 is equivalent to minimizing its negative log likelihood, thus
As mentioned above, Equation 6 can be optimized using arbitrary methods. One possible approach is based on the standard additive AAM parameter update scheme [25], which is derived in [37] and used in this work. However, it is also possible to reformulate Equation 6, i.e., AAM fitting with constraints, for the inverse compositional/project-out approach [40], which is described in detail in [45].
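With the Gaussian assumptions above, the negative log-likelihood is, up to constants, a sum of Mahalanobis distances: one for the texture residual, one per constraint type, and one for the parameter prior. A sketch of evaluating this cost (names illustrative, diagonal covariances assumed for texture and prior):

```python
import numpy as np

def augmented_cost(delta_g, sigma_dg, priors, b, sigma_b):
    """Negative log-likelihood (up to additive constants) of the augmented
    fitting problem: a Mahalanobis term for the texture residual, one per
    constraint type, and a zero-mean Gaussian prior on the parameters.

    delta_g, b: texture residual and current model parameters.
    sigma_dg, sigma_b: diagonal covariances given as 1-D arrays.
    priors: list of (pi, Sigma) pairs, pi being the residual vector of
    one constraint type with full covariance Sigma.
    """
    cost = delta_g @ (delta_g / sigma_dg)           # texture likelihood
    for pi, Sigma in priors:
        cost += pi @ np.linalg.solve(Sigma, pi)     # constraint likelihoods
    cost += b @ (b / sigma_b)                       # parameter prior
    return cost
```

The relative weighting of the terms is thus governed entirely by the covariances, which is what the subsequent subsections exploit when assigning reliabilities to the individual constraint types.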
3.2 Anchor AAM
The first type of constraints we use for fitting the full-body AAM are the results of an ‘anchor AAM’ or ‘subset AAM,’ which is an AAM applied on the subset of stationary landmarks, i.e., the torso and upper leg landmarks (cf. Figure 1). We include the results by using the tracked landmark locations as positional constraints. Therefore, π _{anchor} is the difference vector between target and current landmark positions. To estimate the reliability of the constrained positions and thus ${\mathit{\Sigma}}_{{\mathit{\pi}}_{\text{anchor}}}$, robust confidence measures derived from the AAM fitting process (e.g., [46] or [47]) can be applied.
3.3 Robust local tracking constraints
While standard AAMs have problems with landmarks located at distal limb segments such as the lower legs, the data-driven approach in [16] was specifically designed for tracking in X-ray sequences containing occlusions. In former studies, the method proved to be well suited for tracking the subset of lower leg landmarks of the side camera view, but it is inapplicable for landmarks with more severe occlusions such as the knee landmarks of the side view or feet landmarks of the top view. We include the tracking results for the lower leg landmarks as additional constraints π _{local} in the augmented AAM. As for π _{anchor}, the vector π _{local} is the difference between target and current landmark positions. For the estimation of the corresponding covariance matrix ${\mathit{\Sigma}}_{{\mathit{\pi}}_{\text{local}}}$, the same options as for the local detector used in [36] apply. In our case, due to the high accuracy of the local method [16], it is sufficient to use an isotropic covariance.
3.4 Anatomical constraints
 1.
Global thresholding and contour finding → whole-body segment
 2.
Iterative ellipse fitting on the whole body → torso segment
 3.
Removing the torso segment from the whole-body segment → leg segments
Here, the main problem is to find the correct correspondence between the two leg segments in the images and their anatomical counterparts. We propose to use the anchor AAM’s training data to train a regression model which can predict the correct correspondence for the entire sequence based on the AAM’s model parameters.
To quickly obtain values for Equation 7 during the fitting process, we precompute distance-transformed images for each segment using the algorithm presented in [49, 50]. However, faster approximations of the distance transform such as [51] can also be used, as small errors in the computed distances do not affect the overall result.
Because anatomical region constraints can only provide a coarse estimate for individual landmark positions, for the covariance matrix ${\mathit{\Sigma}}_{{\mathit{\pi}}_{\text{anatomical}}}$ we assume a scaled identity matrix σ^{2} I, where σ^{2} is chosen to be substantially smaller than the covariances of the other priors. This has the effect that the fitting process is at first completely driven by the anatomical constraints. When, as a result, each landmark position p _{ m } is aligned to its corresponding anatomical segment S(m), i.e., p _{ m } ∈ S(m), the vector π _{anatomical} becomes zero and the fitting procedure is governed by the other constraints.
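The three segmentation steps listed above can be sketched on a synthetic image as follows. This is a simplified illustration, not the paper's implementation: the threshold value, the moment-based stand-in for the iterative ellipse fit, and the two-sigma ellipse size are all assumptions, and darker pixels are taken to belong to the animal due to X-ray absorption:

```python
import numpy as np

def segment_body_torso_legs(img, thresh):
    """Sketch of the anatomical segmentation: (1) global thresholding gives
    the whole-body segment, (2) an ellipse fitted to the body's second
    moments approximates the torso segment, (3) subtracting the torso
    leaves the leg segments."""
    body = img < thresh                              # step 1: thresholding
    ys, xs = np.nonzero(body)
    cy, cx = ys.mean(), xs.mean()
    cov = np.cov(np.stack([ys - cy, xs - cx]))       # step 2: moment-based ellipse
    evals, evecs = np.linalg.eigh(cov)
    yy, xx = np.mgrid[:img.shape[0], :img.shape[1]]
    d = np.stack([yy - cy, xx - cx], axis=-1) @ evecs
    torso = ((d ** 2 / (4.0 * evals)).sum(axis=-1) <= 1.0) & body  # 2-sigma ellipse
    legs = body & ~torso                             # step 3: subtraction
    return body, torso, legs
```

The distance transforms of the resulting torso and leg masks are then what Equation 7 is evaluated on.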
3.5 Epipolar priors
Equation 8 becomes zero if v _{ n } is located on the epipolar line F u _{ n } and vice versa. The covariance matrix ${\mathit{\Sigma}}_{{\mathit{\pi}}_{\text{epipolar}}}$ can be estimated by evaluating Equation 8 on the point correspondences used for the estimation of F.
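Such an epipolar residual is essentially a normalized point-to-line distance. A sketch (the function name is illustrative; u and v are inhomogeneous 2D points, F the 3 × 3 fundamental matrix):

```python
import numpy as np

def epipolar_residual(F, u, v):
    """Distance of v to the epipolar line F u of its counterpart u in the
    other view; zero iff the correspondence satisfies the epipolar
    constraint."""
    line = F @ np.append(u, 1.0)                  # epipolar line (a, b, c)
    return (np.append(v, 1.0) @ line) / np.hypot(line[0], line[1])
```

For a rectified camera pair, the residual reduces to the row difference of the two points, which makes the behavior easy to check.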
3.6 Generalization to other scenarios
The presented method was specifically designed for the skeletal locomotion tracking scenario at hand. For this particular case, X-ray acquisition is a necessity, as all skeletal landmarks of interest must be observable. In addition, all parts of interest of an animal must remain in the field of view during the whole sequence, which generally implies the use of a treadmill. As the appearance of the animal is modeled using multi-view AAMs [43, 44] (cf. Subsection 2.3), the camera setup must remain static during a recording. Similarly, if a trained model is to be reused for another sequence, the recordings must share an identical camera setup. However, as for standard multi-view AAMs, the number of cameras used for a sequence is flexible; in fact, the validation of our approach presented in Section 4 includes datasets with one camera view as well as datasets with two camera views.
More generally, the main characteristics of the data which led to our approach are non-stationary landmark movements and non-discriminative local texture information of certain landmarks (cf. Subsection 1.2). Therefore, the idea of augmented active appearance models should be applicable to all scenarios (1) in which landmarks and texture can be modeled by active appearance models, (2) which suffer from the data characteristics mentioned above, and (3) for which sufficient fitting constraints can be obtained. One possible example might be a medical scenario in which certain anatomical structures are to be tracked in an image sequence.
4 Experiments and results
We evaluate our approach based on the point-to-point error [52], i.e., the Euclidean error (in pixels for the 2D case and in millimeters for the 3D case) between manually located and automatically tracked landmark positions. For each sequence, an AAM was trained based on exactly one stride, using the provided landmark data. In all cases, at most ten frames of a sequence were used for AAM training. Afterwards, all frames of the sequence were tracked using our presented augmented AAM approach.
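The point-to-point error is simply the per-landmark Euclidean distance; a sketch (the function name is illustrative):

```python
import numpy as np

def point_to_point_error(tracked, ground_truth):
    """Euclidean point-to-point error per landmark; inputs are (N, 2)
    arrays in pixels for 2D or (N, 3) arrays in millimeters for 3D."""
    return np.linalg.norm(np.asarray(tracked) - np.asarray(ground_truth), axis=1)
```

The per-sequence statistics reported below are medians of these per-landmark errors.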
4.1 Comparison to standard AAMs
From the quantitative results presented in Figure 5a, it can be seen that augmented AAMs substantially outperform standard AAMs in terms of fitting accuracy in all cases. This is particularly apparent for lower leg landmarks, where median errors of up to 150 pixels are consistently reduced to below 25 pixels for image sizes of 1,536 × 1,024 pixels. As a typical example, for all 15 quail sequences, the median point-to-point error of lower leg landmarks of the side camera view is about 110 pixels for standard AAMs and only about 20 pixels for augmented AAMs. The reason for this result is that especially lower leg landmarks are prone to non-stationary shape movements and non-discriminative texture information, which drastically complicates standard AAM fitting but can be handled well by augmented AAMs. For other landmark groups, augmented AAMs are also clearly superior to their standard AAM counterparts: for the example of the 15 quail datasets, the median point-to-point error of torso landmarks of the top camera view is about 25 pixels for standard AAMs and about 15 pixels for augmented AAMs. The general performance disparity between the five bird species can be explained by different locomotion characteristics. For birds such as tinamous, the movement of the lower leg landmarks is less dominant compared to species such as jackdaws (cf. images in Figure 4).
Additional file 1: Video 1: Qualitative tracking examples. This video shows some qualitative landmark tracking results for the scenario of grounded bird locomotion. (AVI 19 MB)
The above comparison clearly shows that our augmented AAM approach, as opposed to standard AAMs, is well suited for tracking the entire set of anatomical landmarks in this challenging scenario. Based on a large-scale study which analyzes the accuracy of manually located landmarks in X-ray locomotion scenarios [34], it can be stated that the accuracy of our approach is comparable to the performance of human experts.
4.2 Comparison to non-holistic approaches
While our augmented AAM approach is holistic in the sense that all landmarks are modeled in one consistent framework, it uses constraints obtained from methods which only perform well on very specific landmark subsets (cf. Section 3).
It is important to note that for both cases, the evaluation is performed only on the specific landmark subset of the respective non-holistic method.
As can be seen in the top row of Figure 6, the median error of the subset AAM is between 2 pixels (tinamous, top view) and 5 pixels (quails, side view) smaller than for corresponding landmarks of the augmented AAM. For the example of quails, the median error of the side view landmarks is about 10 pixels for subset AAMs, and about 15 pixels for augmented AAMs. This effect can be explained by the fact that the subset AAM is optimized for these specific landmarks, while the augmented AAM mediates between various fitting constraints for all landmarks, even those not covered in this comparison. In addition, the shape and texture models of the augmented AAM are more complex due to the increased scope and thus are harder to optimize. The results of the second non-holistic method, robust template tracking, are presented in the bottom row of Figure 6 and show the same tendency. While local tracking is even more accurate than the subset AAM, the performance of the augmented AAM is similar for both comparisons. Here, the very same explanations as before apply.
As a result, we can state that both non-holistic methods are more accurate on their specific landmark subsets than our holistic approach. However, the holistic approach has the essential advantage that it can also reliably and consistently track landmarks which are covered by neither of the non-holistic approaches, for instance the lower leg landmarks of the top camera view (cf. Figure 5).
4.3 Influence of constraints
As our approach combines several fitting constraints, an important aspect is the practical relevance of individual constraint types. It is to be expected that positional constraints such as local tracking priors will have a larger benefit on fitting accuracy than, e.g., anatomical constraints. However, the question is whether a combination of several constraints can improve the fitting results. We therefore compare the performance of augmented AAMs using different combinations of constraints described in Section 3.
Similarly, for lower leg landmarks, it is sufficient to use local template tracking constraints in easy scenarios (tinamous). However, in more challenging scenarios (jackdaws), all constraints contribute to the final fitting performance. In both scenarios, epipolar constraints primarily improve the results of the top view. This is mainly due to the fact that the lower leg landmarks have no positional constraints for the top view and thus have a larger inaccuracy. While anatomical constraints do not increase accuracy when used together with local constraints, they improve results of standard AAMs in complicated scenarios (jackdaws). This fits their intended purpose of providing a rough initial landmark estimate for the other constraints (cf. Section 3).
An example in Figure 7 which demonstrates all aspects of the above argumentation is the case of lower leg landmarks in the top view for jackdaws. When no constraints are used (‘none,’ standard AAMs), the median point-to-point error is larger than 125 pixels. If all but local tracking priors are employed (‘without local’), a median error of about 55 pixels is obtained. When using all priors (‘all,’ augmented AAMs), the median error is smallest at about 25 pixels.
4.4 3D evaluation of tracking results
 1.
State whether our approach is accurate enough to produce reliable 3D results
 2.
Obtain an upper error bound for pure 3D tracking methods
Currently, camera calibration is only available for exactly one of the 32 datasets presented above, namely a quail dataset with 1,841 frames covering 24 steps. For the calibration of this dataset, a custom-built metal plate with a size of 140 mm × 60 mm × 0.5 mm was employed. It contains 18 uniquely identifiable holes which are easily detectable in both X-ray and visible-light cameras. For the actual calibration, we use the method of Zhang [55]. The mean back-projection error of the intrinsic camera calibration is 0.27 pixels at an image size of 1,536 × 1,024 pixels.
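The reported back-projection error is the mean distance between the detected calibration points and the projections of their known 3D positions. A sketch under a plain pinhole model follows; P is the 3 × 4 camera matrix, and the function name is hypothetical:

```python
import numpy as np

def mean_backprojection_error(P, points_3d, points_2d):
    """Mean back-projection error of a calibration: project the known 3D
    calibration-plate points with the 3x4 camera matrix P and average the
    Euclidean distances to their detected 2D image positions."""
    X = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # homogeneous
    proj = X @ P.T
    proj = proj[:, :2] / proj[:, 2:3]                         # perspective divide
    return np.linalg.norm(proj - points_2d, axis=1).mean()
```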
4.5 Implementation details
Both the augmented AAM approach presented in this work and the standard AAMs were entirely implemented in the programming language R (http://www.r-project.org/). The robust template tracking approach of [16], which is used to provide local AAM fitting constraints, was implemented in C++ using the OpenCV library [56]. All experiments were performed on a typical desktop computer with an Intel® Core™ i5-760 CPU at 2.80 GHz. On average, the creation of all fitting constraints for the augmented AAM was performed at 13.2 frames per second (fps) for the anchor AAM, 11.4 fps for local tracking constraints, and 2.8 fps for torso and leg distance constraints (cf. Section 3). Given these constraints, our implementation of augmented AAMs runs at 0.5 fps. Note that this value could be drastically increased by using a pure C/C++ implementation, employing the inverse compositional/project-out optimization [40, 45] instead of the additive method [37], and exploiting the vast parallelization capability of the approach. In the animal locomotion scenario at hand, however, real-time processing of datasets is not of primary importance.
5 Conclusions and further work
In this paper, we presented augmented active appearance models, a general approach for AAM fitting in cases of non-stationary shape motions and non-discriminative local texture information. Our method is based on a holistic, probabilistic framework which allows the inclusion of arbitrary fitting priors. We applied our approach to the challenging scenario of landmark tracking in X-ray animal locomotion sequences, for which until now only methods for specific landmark subsets existed. For this particular scenario, we presented various types of suitable fitting constraints that were included in our probabilistic framework. Extensive experiments based on 32 real-world datasets including 175,942 ground-truth landmark positions showed that our approach clearly outperforms standard AAM fitting and allows all landmarks of interest to be tracked reliably. In addition, we could show that the accuracy of our approach is sufficient to provide reliable 3D landmark estimates for calibrated datasets.
For further work, an interesting and relevant point to consider is the scenario of non-cyclic locomotion, for instance birds running over obstacles. Another important problem we want to solve is how to transfer already trained models to different tracking scenarios, such as adapting a quail model to track tinamous. Both points require an adaptation of a given model to novel cases, and we plan to utilize methods from incremental learning [47] and domain adaptation for this task. Inspired by the promising results of 3D landmark estimation for calibrated datasets, another idea for further work is the inclusion of additional imaging modalities such as visible-light cameras into the tracking process.
Declarations
Acknowledgements
The authors would like to thank Alexander Stößel from the Department of Human Evolution at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany for providing the quail, jackdaw, and tinamou datasets. Furthermore, we would like to thank John Nyakatura from the Institute of Systematic Zoology and Evolutionary Biology with Phyletic Museum at the Friedrich Schiller University of Jena, Germany for providing the bantam and lapwing datasets as well as one additional quail dataset. This research was supported by grant DE 735/81 of the German Research Foundation (DFG).
Authors’ Affiliations
References
 Gatesy SM, Guineafowl hind limb function: I: cineradiographic analysis and speed effects. J. Morphol 1999,240(2):10974687.Google Scholar
 Fischer MS, Schilling N, Schmidt M, Haarhaus D, Witte H: Basic limb kinematics of small therian mammals. J Exp Biol 2002,205(Pt 9):13151338.Google Scholar
 Tashman S, Anderst W: In vivo measurement of dynamic joint motion using high speed biplane radiography and CT: application to canine ACL deficiency. J. Biomech. Eng 2003, 125: 238245. 10.1115/1.1559896View ArticleGoogle Scholar
 Brainerd EL, Baier DB, Gatesy SM, Hedrick TL, Metzger KA, Gilbert SL, Crisco JJ: Xray reconstruction of moving morphology (XROMM): precision, accuracy and applications in comparative biomechanics research. J. Exp. Zool. A 2010,313A(5):262279.Google Scholar
 Gatesy SM, Baier DB, Jenkins FA: KP Dial: Scientific rotoscoping: a morphologybased method of 3D motion analysis and visualization. J. Exp. Zool A 2010,313A(5):244261.Google Scholar
 Nyakatura JA, Andrada E, Grimm N, Weise H, Fischer MS: Kinematics and center of mass mechanics during terrestrial locomotion of northern lapwings (Vanellus vanellus, Charadriiformes). J. Exp. Zool. A 2012,317(9):580594. 10.1002/jez.1750View ArticleGoogle Scholar
 Stößel A, Fischer MS: Comparative intralimb coordination in avian bipedal locomotion. J. Exp. Biol 2012, 215: 40554069. 10.1242/jeb.070458View ArticleGoogle Scholar
 Fischer MS, Lilje K: Dogs in Motion. Dortmund: VDH; 2011.Google Scholar
 Hedrick TL: Software techniques for two and threedimensional kinematic measurements of biological and biomimetic systems. Bioinspir Biomim 2008,3(3):034001. 10.1088/17483182/3/3/034001View ArticleGoogle Scholar
 Lucas BD, Kanade T: An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI ’81). Vancouver: William Kaufmann; August 1981:674679.Google Scholar
 Shi J, Tomasi C: Good features to track. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Seattle; 21–23 June 1994:593600.Google Scholar
 Baker S, Matthews I: LucasKanade 20 years on: a unifying framework. Int. J. Comput. Vis 2004,56(3):221255.View ArticleGoogle Scholar
 Hager GD, Belhumeur PN: Efficient region tracking with parametric models of geometry and illumination. IEEE T. Pattern Anal 1998,20(10):10251039. 10.1109/34.722606View ArticleGoogle Scholar
 Jurie F, Dhome M: Hyperplane approximation for template matching. IEEE T. Pattern Anal 2002,24(7):9961000. 10.1109/TPAMI.2002.1017625View ArticleGoogle Scholar
 Lowe DG, Distinctive image features from scaleinvariant keypoints: Int. J. Comput. Vis. 2004,60(2):91110.Google Scholar
 Amthor M, Haase D, Denzler J: Fast and robust landmark tracking in Xray locomotion sequences containing severe occlusions. In Proceedings of the Vision, Modeling and Visualization (VMV) Workshop. Magdeburg; 12–14 November 2012:1522.Google Scholar
 Haase D, Denzler J: Anatomical landmark tracking for the analysis of animal locomotion in Xray videos using active appearance models. In Image Analysis ed. by A Heyden, F Kahl. Proceedings of the 17th Scandinavian Conference on Image Analysis (SCIA 2011), no. 6688 in LNCS, Ystad, May 2011. Springer, Heidelberg, 2011); 604615.Google Scholar
 Rohlfing T, Denzler J, Gräßl C, Russakoff DB, Maurer Jr CR: Markerless realtime 3D target region tracking by motion backprojection from projection images. IEEE T. Med. Imaging 2005,24(11):14551468.View ArticleGoogle Scholar
 Miranda DL, Schwartz JB, Loomis AC, Brainerd EL, Fleming BC, Crisco JJ: Static and dynamic error of a biplanar videoradiography system using markerbased and markerless tracking techniques. J. Biomech. Eng 2011,133(12):121002. 10.1115/1.4005471View ArticleGoogle Scholar
 Ishikawa T, Matthews I, Baker S: Efficient image alignment with outlier rejection. Technical Report CMURITR0227. Carnegie Mellon University Robotics Institute, 2002Google Scholar
 Jurie F, Dhome M: Real time robust template matching. Proceedings of the British Machine Vision Conference 2002, (BMVC) (British Machine Vision Association, Cardiff, 2–5 September 2002)Google Scholar
 Pan J, Hu B, Zhang JQ: Robust and accurate object tracking under various types of occlusions. IEEE Trans. Circuits Syst. Video Techn 2008,18(2):223236.View ArticleGoogle Scholar
 Cootes TF, Edwards GJ, Taylor CJ: Active appearance models. In Computer Vision–ECCV’98 Edited by: Burkhardt H, Neumann B. Proceedings of the 5th European Conference on Computer Vision, Freiburg, 2–6 June 1998. Lecture Notes in Computer Science, vol. 1407 (Springer, Berlin, 1998), pp. 484–498Google Scholar
 Cootes TF, Taylor CJ, Edwards G J: Face recognition using active appearance models. In Computer Vision–ECCV’98 Edited by: Burkhardt H, Neumann B. Proceedings of the 5th European Conference on Computer Vision, Freiburg, 2–6 June 1998. Lecture Notes in Computer Science, vol. 1407 (Springer, Berlin, 1998), pp. 581–595Google Scholar
 Cootes TF, Edwards GJ, Taylor CJ: Active appearance models. IEEE T. Pattern Anal 2001,23(6):681685. 10.1109/34.927467View ArticleGoogle Scholar
 Ashraf AB, Lucey S, Cohn JF, Chen T, Ambadar Z, Prkachin KM, Solomon PE: The painful face  pain expression recognition using active appearance models. Im. Vis. Comp 2009,27(12):17881796. 10.1016/j.imavis.2009.05.007View ArticleGoogle Scholar
 van der Maaten L, Hendriks E: Action unit classification using active appearance models and conditional random fields. Cogn. Process 2012,13(Suppl 2):S507S518.View ArticleGoogle Scholar
 Cootes TF, Taylor CJ: Statistical models of appearance for medical image analysis and computer vision. Medical Imaging: Image Processing, vol. 4322 ed. by M Sonka, KM Hanson. Proceedings of SPIE, Bellingham August 2001, (SPIE, Bellingham, 2001), pp. 236–248Google Scholar
 Mitchell SC, Lelieveldt BPF, van der Geest RJ, Bosch JG, Reiber JHC, Sonka M: Multistage hybrid active appearance model matching: segmentation of left and right ventricles in cardiac MR images. IEEE Trans. Med. Imaging 2001,20(5):415423. 10.1109/42.925294View ArticleGoogle Scholar
 Haase D, Nyakatura JA, Denzler J: Multiview active appearance models for the Xray based analysis of avian bipedal locomotion. Pattern Recognition ed. by R Mester, M Felsberg. Proceedings of the 33rd DAGM Symposium (DAGM), no. 6835 in LNCS, Frankfurt, 31 August to 2 September 2011, (Springer, Berlin, 2011), pp. 11–20Google Scholar
 Das S, Vaswani N: Nonstationary shape activities: dynamic models for landmark shape change and applications. IEEE T. Pattern Anal 2010,32(4):579592.View ArticleGoogle Scholar
 Vaswani N, Chowdhury AKR, Chellappa R: “Shape activity”: a continuousstate HMM for moving/deforming shapes with application to abnormal activity detection. IEEE Trans. Image Process 2005,14(10):16031616.View ArticleGoogle Scholar
 Cootes TF, Taylor CJ, Cooper DH, Graham J, Active shape models—their training and application: Comput Vis. Image Underst. 1995, 61: 3859. 10.1006/cviu.1995.1004View ArticleGoogle Scholar
 Haase D, Denzler J: Comparative evaluation of human and active appearance model based tracking performance of anatomical landmarks in locomotion analysis. Proceedings of the 8th Open GermanRussian Workshop Pattern Recognition and Image Understanding (OGRW82011) Nizhny Novgorod, November 2011, 9699.Google Scholar
 Cristinacce D, Cootes TF: Automatic feature localisation with constrained local models. Pattern Recognit 2008,41(10):30543067. 10.1016/j.patcog.2008.01.024View ArticleGoogle Scholar
 Martins P, Caseiro R, Henriques JF: Discriminative Bayesian active shape models. In Proceedings of the 12th European Conference on Computer Vision. Florence; 7–13 October 2012:5770.Google Scholar
 Cootes TF, Taylor CJ: Constrained active appearance models. In IEEE International Conference on Computer Vision (ICCV). BC: Vancouver; 7–14 July 2001:748754.Google Scholar
 Bookstein FL: Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Med. Image Anal 1997,1(3):225243. 10.1016/S13618415(97)850128View ArticleGoogle Scholar
 Dryden IL, Mardia KV: Statistical Shape Analysis. Chichester: Wiley; 1998.Google Scholar
 Matthews I, Baker S: Active appearance models revisited. Int. J. Comput. Vis 2004,60(2):135164.View ArticleGoogle Scholar
 Xiao J, Baker S, Matthews I, Kanade T: Realtime combined 2D+3D active appearance models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C.; June 2004:535542.Google Scholar
 Sung J, Kim D: Estimating 3D facial shape and motion from stereo image using active appearance models with stereo constraints. In Third International Conference on Image Analysis and Recognition, Póvoa de Varzim, 18–20 September 2006, Lecture Notes in Computer Science vol. 4142. Berlin: Springer; 2006:457467.Google Scholar
 Lelieveldt B, Üzümcü M, van der Geest R, Reiber J, Sonka M: Multiview active appearance models for consistent segmentation of multiple standard views. Int. Congr. Ser 2003, 1256: 11411146.View ArticleGoogle Scholar
 Oost E, Koning G, Sonka M, Oemrawsingh PV, Reiber JHC, Lelieveldt BPF: Automated contour detection in Xray left ventricular angiograms using multiview active appearance models and dynamic programming. IEEE T. Med. Imaging 2006,25(9):11581171.View ArticleGoogle Scholar
 Papandreou G, Maragos P: Adaptive and constrained algorithms for inverse compositional active appearance model fitting. In Conference on Computer Vision and Pattern Recognition (CVPR 2008). Anchorage; 23–28 June 2008.Google Scholar
 Edwards GJ, Cootes TF, Taylor CJ: Advances in active appearance models. In IEEE International Conference on Computer Vision (ICCV) vol. 1. Kerkyra; 20–27 September 1999:137142.Google Scholar
 Sung J, Kim D: Adaptive active appearance model with incremental learning. Pattern Recogn. Lett 2009,30(4):359367. 10.1016/j.patrec.2008.11.006View ArticleGoogle Scholar
 Zhou M, Liang L, Sun J, Wang Y: AAM based face tracking with temporal matching and face segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco; 13–18 June 2010:701708.Google Scholar
 Felzenszwalb PF, Huttenlocher DP: Distance transforms of sampled functions. Theory Comput 2012,8(19):415428.MathSciNetView ArticleGoogle Scholar
 van den Boomgaard R: Mathematical morphology: extensions towards computer vision. PhD thesis. University of Amsterdam, 1992Google Scholar
 Borgefors G: Distance transformations in digital images. Comput. Vis., Graph., Image Process 1986,34(3):344371. 10.1016/S0734189X(86)800470View ArticleGoogle Scholar
 Stegmann MB: Active appearance models: theory, extensions and cases. Master’s thesis. Technical University of Denmark, DTU, 2000Google Scholar
 Hartley R, Zisserman A: Multiple View Geometry in Computer Vision. Cambridge: Cambridge University Press; 2003.Google Scholar
 Mittrapiyanuruk P, DeSouza GN, Kak AC: Calculating the 3Dpose of rigidobjects using active appearance models. In International Conference on Robotics and Automation (ICRA 2004). New Orleans; 26 April to 1 May 2004:51475152.Google Scholar
 Zhang Z: A flexible new technique for camera calibration. TPAMI 2000,22(11):13301334. 10.1109/34.888718View ArticleGoogle Scholar
 Bradski G, Kaehler A: Learning OpenCV: Computer Vision with the OpenCV Library. Cambridge: O’Reilly; 2008.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.