Comparison of two 3D tracking paradigms for freely flying insects

In this paper, we discuss and compare state-of-the-art 3D tracking paradigms for flying insects such as Drosophila melanogaster. If two cameras are employed to estimate the trajectories of these identically appearing objects, calculating stereo and temporal correspondences leads to an NP-hard assignment problem. Currently, there are two different types of approaches discussed in the literature: probabilistic approaches and global correspondence selection approaches. Both have advantages and limitations in terms of accuracy and complexity. Here, we present algorithms for both paradigms. The probabilistic approach utilizes the Kalman filter for temporal tracking. The correspondence selection approach calculates the trajectories based on an overall cost function. Limitations of both approaches are addressed by integrating a third camera to verify consistency of the stereo pairings and to reduce the complexity of the global selection. Furthermore, a novel greedy optimization scheme is introduced for the correspondence selection approach. We compare both paradigms based on synthetic data for which ground truth is available. Results show that the global selection is more accurate, while the previously proposed tracking-by-matching (probabilistic) approach is causal and feasible for longer tracking periods and very high target densities. We further demonstrate that our extended global selection scheme outperforms current correspondence selection approaches in tracking accuracy and tracking time.


Introduction
The investigation of complex movement patterns of various organisms has become an integral subject of biological research. From a biological point of view, motion is the visual response to any kind of perceivable stimulation. The nervous system is responsible for the perception, the integration of the information, and the execution of the final response. One of the most popular model organisms to study how the nervous system controls locomotion is Drosophila melanogaster (i.e., fruit fly). Sophisticated genetic tools as well as advanced imaging techniques allow the functional dissection of neural circuits [1][2][3][4].
Drosophila is a holometabolous insect. In the larval stage, locomotion is confined to two dimensions, http://jivp.eurasipjournals.com/content/2013/1/57 time). Together they form the so-called general multi-index assignment problem [16]. This problem is nondeterministic polynomial-time hard (NP-hard) [16]. If all correspondences are known, triangulation is used to determine the 3D positions.
To avoid expensive multi-camera multi-target 3D tracking, existing approaches typically either track in two dimensions (no stereo matching) [2,9,10,11] or track only a single target (no ambiguities over time) [14,17]. If multi-camera multi-target 3D tracking is required, stereo matching and temporal tracking can be solved separately at the cost of reduced tracking accuracy [18][19][20].
Among others, there are two fundamentally different paradigms used to capture 3D trajectories of multiple adult Drosophila. The first paradigm uses the extended Kalman filter and avoids complexity by separating stereo and temporal correspondence associations [21,22]. Due to this separation, optimal results cannot be guaranteed, and fragmented tracks prevent the preservation of the fly identities over time. The second paradigm performs a global selection by combining both tasks to calculate the overall best assignment [23]. As a result, identity preservation can be achieved for many flies and frames. However, the number of possible combinations increases exponentially with the number of animals and time steps; thus, current solutions are only able to track for a short period.
Another probabilistic approach addresses the tradeoff between identity preservation and long-term experiments [24]. The authors use the Hungarian algorithm and Kalman filtering for stereo matching and temporal correspondence association. Focusing on applicability for biologists, up to seven flies were evaluated in several experiments.
All the above-mentioned approaches focus on either tracking a few hundred targets for a short period of time or tracking fewer targets for more frames. High-density tracking is used in different research areas like particle tracking velocimetry [18,25] and tracking bats [26,27], bees [28] or fruit flies [20,23]. A quantitative comparison of several three-dimensional Lagrangian particle tracking approaches for high-density situations is given in [29].
Examples of long-term tracking approaches for fruit flies are given in [21,22,24,30,31]. In a recent publication, problems like noise and low frame rates are addressed to calculate trajectories of wild mosquitoes [32]. The authors used a probabilistic multi-target tracking for swarms of 6 to 25 mosquitoes. If hundreds of flies are tracked for a comparatively long period, trajectories are fragmented and the identity is not preserved. Furthermore, tracking several hundreds of flies simultaneously is not practical for most biological applications [24,30]. Only if swarming behavior needs to be analyzed, ambiguous animals are neglected leading to a strongly varying number of targets over time [33].
In a recent publication, multi-path branching was used to handle occlusions by employing global optimization when calculating the trajectories [34]. The algorithm was exhaustively tested for both high-density and long-term situations. Again, the tracking accuracy decreases if the number of targets and the number of frames increase simultaneously.

Proposed algorithms and comparison scheme
In this paper, we compare identity-preserving 3D tracking approaches for long-term experiments considering biological usability. First, we present algorithms for both the above-mentioned paradigms (see Figure 1):
• The previously proposed tracking-by-matching (TbM) solution [35] integrates a third camera to conduct a projection consistency check within the probabilistic approach.
• In addition, we introduce a global correspondence selection (GCS) algorithm (an extension of [23]), which calculates the global search space and minimizes a cost function afterwards.
Limitations of the TbM and the GCS approach are addressed by utilizing a third camera to verify the consistency of stereo pairings. The third camera is integrated by the so-called projection consistency [35]. As a result, the amount of ambiguous temporal associations is reduced in the TbM approach. The GCS approach benefits from the projection consistency by means of a reduced overall complexity. Besides utilizing Gibbs sampling for optimization, as suggested by [23], we introduce an alternative greedy selection scheme (see Figure 1). It should be pointed out that we use GCS in terms of optimizing a global search space, not determining the global optimum for our optimization task.
We compare both paradigms, the TbM and the GCS approach, based on synthetic data; thus, the ground truth is available. Global correspondence selection was done via Gibbs sampling [23] and greedy optimization utilizing projection consistency. This leads to the comparison scheme illustrated in Figure 1.
This paper is organized as follows: In Section 2, we provide notes about notations and central equations. In particular, the projection consistency is described in detail. Algorithms are presented in Section 2.2. Section 2.2.3 describes the extensions of the GCS approach. The synthetic data and measures used for comparison are described in Section 3. All results are listed in Section 4: We compare GCS approaches with the TbM approach in Sections 4.2, 4.3, and 4.4. In addition, we compare both GCS approaches in more detail in Section 4.1. A concluding discussion of both paradigms is given in Section 5.

Figure 1 General comparison scheme of this paper. The new approach is highlighted in yellow.

Methods
Both algorithms expect time-synchronized image streams from up to three cameras. Let $I^i_t$ represent the images of cameras $i = 1, 2, 3$ at times $t = 1, \dots, T$. All cameras need to be calibrated; thus, the camera matrices $K_i$, rotation matrices $R_{ij}$ (from camera $i$ to camera $j$), and translation vectors $t_{ij}$ are given. Then, the fundamental matrices $F_{ij}$ can be calculated from these parameters (for more details, see [36]). Consider a swarm of flying targets of similar appearance and small size. The centers of detected targets (i.e., blobs) in a single image are collected in the set $M^i_t = \{ m^i_{n_i,t} \mid n_i = 1, \dots, N^i_t \}$ with $m^i_{n_i,t} = (x, y)$, where $(x, y)$ is the image coordinate of the object's centroid. The value $N^i_t$ may differ between views and time steps due to occlusions or noise.
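As an illustration of how the fundamental matrices follow from the calibration data, the sketch below uses the textbook relation $F_{ij} = K_j^{-T} [t_{ij}]_\times R_{ij} K_i^{-1}$. It assumes the convention $X_j = R_{ij} X_i + t_{ij}$ for mapping 3D points from camera $i$ to camera $j$, which may differ from the convention used by the authors; the function names are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def fundamental_matrix(K_i, K_j, R_ij, t_ij):
    """Fundamental matrix F_ij with x_j^T F_ij x_i = 0.

    Assumed convention (not confirmed by the paper): a 3D point X_i in the
    frame of camera i maps to camera j as X_j = R_ij @ X_i + t_ij.
    """
    E = skew(t_ij) @ R_ij                       # essential matrix
    return np.linalg.inv(K_j).T @ E @ np.linalg.inv(K_i)
```

For a pair of corresponding homogeneous image points, `x_j @ F @ x_i` should then vanish up to numerical noise, which is the epipolar constraint used for stereo matching.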
To calculate the 3D positions of the flies, stereo correspondences between detected blobs need to be established. Since we use three cameras, triplets of image points $(m^1_{n_1,t}, m^2_{n_2,t}, m^3_{n_3,t})$ correspond to one target. In general, two 2D image coordinates are sufficient to calculate a single 3D position; thus, we define the set $H^{ij}_t$ of all possible pairs $(m^i_{n_i,t}, m^j_{n_j,t})$ between views $i$ and $j$. A pairing $s^{ij}_{k,t} \in H^{ij}_t$ could represent either a true or a false correspondence for target $k$.

Stereo matching and projection consistency
Both paradigms perform stereo matching based on epipolar geometry and verify matches using the so-called projection consistency constraint [35].

Stereo matching
Stereo matching is used to identify possible pairings between two views and thus yields possible correspondences. For matching a point

Projection consistency
Since we use three calibrated cameras, triplets of 2D points $(m^1_{n_1,t}, m^2_{n_2,t}, m^3_{n_3,t})$ located in images $I^1_t$, $I^2_t$, and $I^3_t$ correspond to the same target in 3D space. Projection consistency is applied to those triplets to verify the overall match. The constraint evaluates the projections from two 2D points $(m^i_{n_i,t}, m^j_{n_j,t})$ into the third view $h$ as a sum of two terms: the first summand is the Euclidean distance between the hypothetical position in view $h$ and the measured positions in $M^h_t$, and the second summand is the distance between a measured point and the epipolar line $l^h_{n_i}$ in view $h$. If this sum stays below the threshold $\tau$ for a blob $m^h_t$, then $m^h_t$ (and thus the underlying pairing $(m^i_{n_i,t}, m^j_{n_j,t})$) describes a correct stereo correspondence. The threshold $\tau$ for the projection consistency depends on the calibration accuracy. The triplet
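A minimal sketch of such a check is given below. It assumes that the hypothetical position in the third view is the intersection of the two epipolar lines induced by the pairing, and that the cost is the plain sum of the two distances described above; the exact form of the constraint used by the authors is not recoverable here, so both the cost and the function names are assumptions.

```python
import numpy as np

def epipolar_line(F, m):
    """Epipolar line F @ [x, y, 1], scaled so that |l . [x, y, 1]| is a point-line distance."""
    l = F @ np.array([m[0], m[1], 1.0])
    return l / np.hypot(l[0], l[1])

def point_line_distance(l, m):
    """Distance of the image point m to the normalized line l."""
    return abs(l[0] * m[0] + l[1] * m[1] + l[2])

def projection_consistent(m_i, m_j, F_ih, F_jh, blobs_h, tau):
    """Index of the blob in view h that confirms the pairing (m_i, m_j), else None.

    F_ih and F_jh map points of views i and j to epipolar lines in view h.
    """
    l_i = epipolar_line(F_ih, m_i)
    l_j = epipolar_line(F_jh, m_j)
    p = np.cross(l_i, l_j)            # homogeneous intersection of both lines
    if abs(p[2]) < 1e-12:             # (nearly) parallel lines: no usable hypothesis
        return None
    hypothesis = p[:2] / p[2]
    best, best_cost = None, np.inf
    for n, m_h in enumerate(blobs_h):
        # assumed cost: distance to the hypothesis plus distance to the epipolar line
        cost = np.linalg.norm(hypothesis - np.asarray(m_h)) + point_line_distance(l_i, m_h)
        if cost < best_cost:
            best, best_cost = n, cost
    return best if best_cost < tau else None
```

A pairing that is confirmed by no blob in the third view returns `None` and could be rejected, which is how the third camera reduces the number of ambiguous stereo matches.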

Presented algorithms
To compare current state-of-the-art tracking paradigms, we introduce the TbM algorithm and a GCS algorithm. We try to overcome the limitations of probabilistic tracking, namely the separation of stereo matching and temporal tracking, by integrating projection consistency into the temporal tracking routine. Exponential complexity, arising in global correspondence selection algorithms, is avoided by reducing the global search space based on the projection consistency. Correspondence selection can be done by Gibbs sampling [37] or in a greedy manner. Since we introduce a novel selection scheme for the GCS approach (yellow box in Figure 1), we describe it in more detail. The TbM is explicitly described in [35].

Tracking-by-matching algorithm
As in all probabilistic tracking approaches, our tracking algorithm models the position and motion information of the targets independently from the stereo correspondences between the views. We use the unscented Kalman filter (UKF) as a Bayesian framework for 2D tracking [8,38]. Using the notation introduced above, every target $m^i_{n_i,t}$ in every view ($i \in \{1, 2, 3\}$) is represented by its own tracker $T^i_k$ (i.e., a single UKF). Temporal tracking is achieved by assigning the measured $n_i$-th target to a specific tracker $k$ over time $t$ (yellow box 'temporal tracking' in Figure 2). Thus, for each new frame triplet, every UKF predicts the next possible 2D position for its target. Then, detections close to the predictions are verified with projection consistency (green box 'Verify new triplet via proj. consist.'; for details, refer to [35]). In this way, the projection consistency constraint is used to integrate stereo matching into temporal tracking. After updating all trackers $T^i_k$, this procedure is repeated as long as there are further frames available (see Figure 2).

Global correspondence selection
Before going into formal details, the following section introduces the GCS algorithm in a top-down manner.
General workflow of the GCS approach. In contrast to the TbM approach, temporal tracking and stereo matching are not done within the main loop (compare to Figure 2). Instead, a global search space named S is constructed over all the accessible frames before the actual tracking (compare to box 'Construction of S'). Afterwards, the best possible assignments are calculated by minimizing an overall cost function operating on S.
To reduce the size of this search space, epipolar and temporal assumptions are made before considering actual correspondences. As illustrated in Figure 2, possible stereo correspondences between cameras 1 and 2 are calculated for every time step. Only blob pairings close to their respective epipolar lines are considered as possible stereo matches (box 'Calculate stereo correspondences'). If projection consistency is used (dashed box in Figure 2), invalid matches are removed or replaced via Equation 4. Note that the third camera is only used to replace incorrect matches from cameras 1 and 2. In other words, projection consistency is used to further reduce the set of possible pairings arising from $I^1$ and $I^2$. The resulting set of matches can be interpreted as a set of possible 3D positions. Given two sets of 3D positions for consecutive time steps, possible temporal assignments can be calculated (compare to box 'Calculate temporal correspondences'). If a target has no successor within a 3D neighborhood (given by the maximal flight speed), it is removed from S.
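The temporal part of this reduction can be sketched as follows. The example assumes that candidate 3D positions are given in metres, that the neighborhood radius is simply the maximal flight speed divided by the frame rate, and that pruning is done in a single backward pass; these choices and all names are illustrative assumptions rather than the authors' implementation.

```python
import math

def temporal_candidates(positions, max_speed, fps):
    """positions[t] is a list of candidate 3D points (metres) at time step t.

    Returns links, where links[t][a] lists the indices b of candidates in
    frame t + 1 that lie within the reachable radius of candidate a.
    """
    radius = max_speed / fps          # maximal displacement between two frames
    links = []
    for current, following in zip(positions, positions[1:]):
        links.append([[b for b, q in enumerate(following) if math.dist(p, q) <= radius]
                      for p in current])
    return links

def prune_without_successor(links, n_last):
    """Indices kept per frame after removing candidates that have no surviving successor.

    The last frame is kept unchanged because no successor frame exists for it.
    """
    keep = [set(range(n_last))]
    for frame_links in reversed(links):
        alive = keep[0]
        keep.insert(0, {a for a, successors in enumerate(frame_links)
                        if alive.intersection(successors)})
    return keep
```

With a maximal flight speed of 0.8 m/s and 150 fps, the radius is roughly 5.3 mm per frame, so most false 3D candidates far away from any plausible successor drop out before optimization.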
This reduction is done for all available frames and time steps $T$. After constructing the search space, several assignments are unique. Ambiguous pairings and ambiguous temporal correspondences form natural clusters in S. Thus, only samples inside these clusters need to be optimized (see box 'Get ambiguous clusters from S' in Figure 2). The subsequent optimization is done by a cost function introduced below which incorporates stereo and temporal matching (Equation 6). We implemented two optimization strategies to find possible samples in the respective clusters (see box 'Greedy / Gibbs cluster optimization'). To avoid additional complexity, arising from pairwise pairings between three views, we use cameras 1 and 2 for stereo matching. Thus, the initial search space is constructed for $H^{12}_t$. Blobs from $I^3_t$ are only considered if projection consistency is used (see Section 2.2.3).
A subset $S_t \in P(H_t)$ containing $N_t$ pairings from the power set is called a configuration and contains $N_t$ stereo correspondences for time step $t$. If camera indices are not necessary, we use $H_t$ instead of $H^{12}_t$. Let $S = (P(H_1), P(H_2), \dots, P(H_T))$ be the set of all configurations over all time steps, or $S_P = (P_{N_1}(H_1), P_{N_2}(H_2), \dots, P_{N_T}(H_T))$ if the number of targets per time step is known. A sequence of configurations between two time steps $t-1$ and $t$ is denoted by $S_{t-1:t}$ and contains temporal correspondences between consecutive frames. Thus, an overall solution, containing all tracks for all flies and $T$ time steps, is given by a sequence $S_{1:T} \in S$. The entire 3D trajectory of target $k$ is then given by $s_{k,1:T} = (s_{k,1}, s_{k,2}, \dots, s_{k,T})$.

Figure 2 Flow charts of the two 3D tracking paradigms. Temporal tracking is marked in yellow, stereo matching is marked in green, and projection consistency is highlighted by dots.
Cost function. Stereo matching and temporal tracking are incorporated into a single optimization task, solving the optimization problem

$$S^*_{1:T} = \arg\min_{S_{1:T} \in S} f(S_{1:T}). \quad (5)$$

The cost function $f(\cdot)$ incorporates an epipolar constraint $f_E(\cdot)$ for stereo matching, kinetic coherence $f_K(\cdot)$ for temporal tracking, and a so-called conservation-observation match $f_C(\cdot)$ to punish multiple assignments. Thus, $f(\cdot)$ can be written as a weighted sum of all the above-mentioned constraints with weights $\alpha$, $\beta$, and $\gamma$ (Equation 6; compare to [23]).

Cost function summands. The epipolar costs $f_E(\cdot)$ are based on $\rho_e(s_{k,t})$, which sums the distances between the blobs $m^i_{k,t}$, $m^j_{k,t}$ from $s_{k,t}$ and their epipolar lines (compare Section 2.1.1). To avoid improbable stereo matchings, values of $f_E(\cdot)$ larger than a threshold $\varepsilon_E$ are set to $\infty$ (Equation 7).
The kinetic coherence $\rho_k(s_{k,t-1}, s_{k,t}) = \mathrm{dist}(p_{k,t-1}, p_{k,t})$ calculates the Euclidean distance $\mathrm{dist}(\cdot)$ between the 3D positions $p_{k,t-1}$ and $p_{k,t}$ (defined by $s_{k,t-1}$ and $s_{k,t}$); $\rho_k(\cdot)$ thus expects two consecutive pairings for 3D coordinate calculation. Improbable temporal connections are set to $\infty$ (Equation 8).
Finally, the conservation-observation match is based on $n_c(m^i_{k,t}, S_t)$, which adds up the contributions of a blob $m^i_{k,t}$ in configuration $S_t$. If the number of correspondences exceeds a threshold $\varepsilon_C$, configuration costs are set to $\infty$.
Recursive decomposition. Equation 6 can be rewritten in a recursive manner (Equations 10 and 11). Thus, the whole optimization can be done by dynamic programming (for more details, see [23]).
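The interplay of the three summands and their thresholds can be sketched for a single temporal link as follows. The assignment of the weights α, β, and γ to the epipolar, kinetic, and conservation terms, the reduction of the conservation term to a simple usage count, and all names are assumptions made for illustration only.

```python
import math

def capped(value, limit):
    """Return value unchanged, or infinity if it exceeds its threshold."""
    return value if value <= limit else math.inf

def link_cost(epi_dist, p_prev, p_curr, n_uses,
              alpha, beta, gamma, eps_e, eps_k, eps_c):
    """Weighted cost of linking two consecutive pairings of one target.

    epi_dist: summed blob-to-epipolar-line distance of the current pairing
    p_prev, p_curr: triangulated 3D positions of the two pairings
    n_uses: how often a blob of the pairing is used in the configuration (assumed proxy)
    """
    f_e = capped(epi_dist, eps_e)
    f_k = capped(math.dist(p_prev, p_curr), eps_k)
    f_c = capped(n_uses, eps_c)
    if math.inf in (f_e, f_k, f_c):
        return math.inf               # also avoids 0 * inf = nan for zero weights
    return alpha * f_e + beta * f_k + gamma * f_c
```

Because any violated threshold makes the whole link infinitely expensive, such links can be discarded before optimization, which is what allows the search space to be reduced in advance.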

Reduction of S.
In [23], Gibbs sampling [37] is suggested to find the best possible sequence of configurations $S^*_{1:T} \in S$. Since S is a set of $T$ power sets $P(H_t)$, several steps are suggested to reduce the search space. First of all, sampling for solutions with $N_t$ targets for time $t$ leads to a reduced set $P_{N_t}(H_t)$. Thus, we redefine the overall search space for $S^*_{1:T}$ by $S_P = (P_{N_1}(H_1), P_{N_2}(H_2), \dots, P_{N_T}(H_T))$.
The set $H_t$ is reduced by rejecting pairings which do not satisfy Equation 7. In the remaining subset of $H_t$, only blob pairings $s_{k,t}$ close to the respective epipolar lines are considered. Due to the recursive decomposition given in Equation 10, the successor to $S_{t-1}$ can be selected from the $N_t$-permutation $P_{N_t}(H_t)$. Since kinetic costs are limited (see Equation 8), improbable temporal correspondences can be rejected from $P_{N_t}(H_t)$. Figure 3 illustrates the reduction of the cardinality for an $N_1$-permutation.
After rejecting both impossible stereo matchings and temporal correspondences, some sequences of configurations $S_{t-1:t} \in (P_{N_{t-1}}(H_{t-1}), P_{N_t}(H_t))$ are unique. The remaining ambiguities form natural clusters $C_{(t-\delta:t),\nu} \subset S_P$ for $\delta + 1$ frames and $\nu$ flies. Zou et al. [23] extend ambiguous clusters by adjacent pairings. However, these pairings can again be involved in an ambiguous cluster. Since we tried to keep the identity over time, we merged the clusters in these situations until no ambiguous situations remained before and after each cluster. In this way, the resultant clusters include overall ambiguous situations, and the domain of Equation 5 is global.
Since the cluster size increases exponentially with the number of targets N and time steps T, Gibbs sampling also requires thousands of sampling steps to guarantee good results. Indeed, the authors of [23] were only able to track for less than 1 s of recording.

Introduced improvements for the GCS approach
Here, we introduce two extensions to improve the performance of the GCS approach:
• Utilizing projection consistency to reject ambiguous pairings $s_{k,t}$ and thus reducing the sizes of the clusters
• Performing optimization in a greedy manner by selecting the best successor directly based on Equation 11
GCS with projection consistency. Similar to the above introduced probabilistic tracking approach, ambiguities and wrong stereo matches increase the size of the search space $H^{12}_t$. Thus, all pairings $s^{12}_{k,t} \in H^{12}_t$ are projected into the view of the third camera $I^3_t$ (see Figure 4). The resulting reduced search space for Equation 5 is denoted by $S^*_P$.

The optimization of clusters based on Equation 5 via Gibbs sampling is described in [23]. The greedy optimization strategy is described below.
Greedy optimization. Given a cluster with ambiguities $C_{(t-\delta:t),\nu} \subset S^*_P$, a sequence of configurations $S_{t-\delta:t}$ is built step by step: for every pairing, the successor with minimal cost (Equation 11) is selected among the candidates that are not yet assigned. For example, if $s_{1,0}$ is already assigned to $s_{1,1}$, a successor to $s_{2,0}$ is selected by $s^*_{2,1} = \arg\min$ over the remaining pairings. This is successively done for all pairings and all configurations until every pairing in every configuration has a successor.
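The greedy successor selection between two consecutive configurations can be sketched as below. The processing order of the pairings, the one-to-one restriction, and the function name are assumptions; the link costs are taken as given (for instance from the cost function described earlier).

```python
import math

def greedy_successors(costs):
    """Greedy successor selection between two consecutive configurations.

    costs[a][b] is the cost of linking pairing a (frame t-1) to pairing b
    (frame t), with math.inf marking infeasible links. Pairings are processed
    in index order, and each one takes the cheapest successor still unassigned.
    """
    assigned = {}
    taken = set()
    for a, row in enumerate(costs):
        candidates = [(c, b) for b, c in enumerate(row)
                      if b not in taken and c < math.inf]
        if not candidates:
            continue                  # pairing a remains without a successor
        _, b = min(candidates)
        assigned[a] = b
        taken.add(b)
    return assigned
```

For `costs = [[1.0, 2.0], [0.5, 3.0]]` the sketch selects `{0: 0, 1: 1}` with a total cost of 4.0, although the swapped assignment would cost only 2.5. This illustrates why a greedy scheme is fast but cannot guarantee the global optimum.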

Figure 4 Example for pairings H with and without projection consistency (PC). One target is occluded in $I^2$, and all possible combinations are generated between $I^1$ and $I^2$. Pairings that do not satisfy the PC constraint are removed, and ambiguous pairings are corrected using projection consistency.

Complexity of the algorithms
The complexity of the GCS search space and thus the memory storage is $O(k^{NT})$ in theory ($N$ is the number of targets, $T$ is the number of time steps, and $k \le N$ denotes ambiguities after cardinality reduction), since there are $k^N$ possible configurations between two views and each of these configurations at $t$ can be combined with all configurations at $(t + 1)$. Optimization is only necessary for ambiguous clusters $C_{(t-\delta:t),\nu}$; therefore, $N = \nu$ specifies the number of flies in this cluster and $T = \delta + 1$ specifies the length of the cluster. Thus, the global optimum must be calculated based on $k^{NT}$ possible cluster configurations.
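To get a feeling for this growth, the number of cluster configurations can be evaluated on a logarithmic scale. The helper below is a small illustrative calculation; the chosen value of k is an assumption and not a figure reported in the paper.

```python
import math

def log10_configurations(k, n_targets, n_steps):
    """Base-10 logarithm of k**(N*T), the theoretical number of cluster configurations."""
    return n_targets * n_steps * math.log10(k)

# Illustration with an assumed ambiguity of k = 2 for a cluster of
# 18 targets spanning 843 frames.
exponent = log10_configurations(2, 18, 843)
print(f"about 10^{exponent:.0f} configurations")
```

Even with only two alternatives per target, a cluster of this size has on the order of 10^4568 theoretical configurations, which is why exhaustive evaluation is out of reach and sampling or greedy strategies are used instead.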

Synthetic data
Both tracking paradigms are evaluated using synthetic data, generated by the swarm simulator introduced in [35]. The simulator generates all necessary data for tracking (i.e., rendered images and camera matrices) and evaluation (i.e., ground truth of the 2D and 3D trajectories). For our tests, three synchronized and calibrated cameras are placed around a 20 × 20 × 20 cm³ chamber. All movies are recorded with 800 × 800 pixel resolution and 150 fps. Since the beam width of the field of view is 45°, all cameras are placed 80 cm away from the cube's center. Rotations around the y-axes, for cameras 1, 2, and 3, are 0°, -120°, and 120°, respectively.
According to [39], the maximum flight speed is set to 0.8 m/s. The crawling speed is reduced by the factor 0.1, and we use a Gaussian random walk for flight movement calculation [35]. To achieve more realistic conditions and to increase the probability of occlusions and nearby targets, we integrated negative geotaxis within our random walk model. Negative geotaxis describes the tendency of Drosophila to orient themselves against the earth's gravity [40]. We integrated negative geotaxis by manipulating the randomly generated velocity in the y direction, $v_t = v_{t-1} + n_t$ (with Gaussian noise $n_t \sim N(0, \sigma^2)$ and a smoothness parameter in $[0, 1]$). With a probability of 0.002%, the y entry of $n_t$ is forced to be zero or positive over time.
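A much simplified sketch of such a velocity model is shown below. It assumes that the geotaxis event, once triggered, persists for the rest of the sequence, that the speed limit is enforced by rescaling the velocity vector, and it leaves out the smoothness parameter because its role in the update is not specified here. All names and default values other than the 0.8 m/s limit and the 0.002% probability are hypothetical.

```python
import math
import random

def simulate_velocities(n_steps, sigma, p_geotaxis=0.00002, v_max=0.8, seed=0):
    """Gaussian random-walk velocities with a simple negative-geotaxis switch.

    p_geotaxis = 0.00002 corresponds to 0.002% per time step.
    """
    rng = random.Random(seed)
    v = [0.0, 0.0, 0.0]
    geotaxis = False                  # assumed to persist once triggered
    velocities = []
    for _ in range(n_steps):
        if rng.random() < p_geotaxis:
            geotaxis = True
        n = [rng.gauss(0.0, sigma) for _ in range(3)]
        if geotaxis:
            n[1] = max(0.0, n[1])     # y noise forced to be zero or positive
        v = [a + b for a, b in zip(v, n)]
        speed = math.sqrt(sum(c * c for c in v))
        if speed > v_max:             # respect the maximal flight speed
            v = [c * v_max / speed for c in v]
        velocities.append(tuple(v))
    return velocities
```

Integrating these velocities over time yields 3D paths that drift upwards once geotaxis is active, which increases the target density near the top of the chamber and therefore the number of occlusions.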
In this way, we generated several test movies with an increasing number of targets. For most real-world locomotion experiments, 50 flies per run are sufficient; thus, we generated movies with 10 to 50 targets and 1,000 frames (approximately 6.7 s at 150 fps; Sections 4.2 and 4.1). In addition, we made a long-term movie with 50 flies over 3,000 frames (Section 4.3) and high-density movies with a few hundred flies and time steps (Section 4.4).
To guarantee identical raw data for both algorithms, the 2D positions of all views are established by a separate blob detection routine. Resultant measurements contain time steps with several occluded flies in all views (leading to changes in $N^i_t$; compare to Table 1). We also added Gaussian noise ($\sigma^2 = 0.001$ in the intensity domain [0, 1]) to the ground truth videos to simulate blob detections under realistic conditions. Figure 5 shows an example triplet of noisy images of 200 flies.

Evaluation and comparison measure
Both paradigms are compared in terms of tracking accuracy using the correspondence and association error ($E_{ca}$) [35]. The $E_{ca}$ is defined in terms of $N_c$, the number of incorrect stereo matches, $N_a$, the number of false temporal associations, and $T$, the number of frames. To calculate $N_a$ and $N_c$, all computed 3D trajectories are assigned to their respective ground truth paths. This assignment is used to calculate Euclidean distances between calculated positions and ground truth positions. If the distance is not within a tolerance, $N_c$ is incremented for each frame and time step. The temporal association value $N_a$ is incremented if the ID of the calculated 3D path changes between consecutive frames.
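The counting of the two error types for one ground truth path can be sketched as follows. The data layout, the handling of frames without an estimate, and the function name are assumptions; the sketch only counts the errors and does not compute the final error value.

```python
import math

def count_errors(ground_truth, assigned, tolerance):
    """Count stereo (n_c) and association (n_a) errors for one ground truth path.

    ground_truth[t]: true 3D position at frame t
    assigned[t]: (track_id, estimated 3D position) matched to this path,
                 or None if no estimate exists for frame t (assumed to be skipped)
    """
    n_c = n_a = 0
    previous_id = None
    for truth, estimate in zip(ground_truth, assigned):
        if estimate is None:
            continue
        track_id, position = estimate
        if math.dist(truth, position) > tolerance:
            n_c += 1                  # position outside the tolerance
        if previous_id is not None and track_id != previous_id:
            n_a += 1                  # identity switch between consecutive estimates
        previous_id = track_id
    return n_c, n_a
```

Summing both counters over all ground truth paths gives the totals that enter the error measure; fragmented trajectories show up as a large number of identity switches.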

Results
We tested all combinations illustrated in Figure 1 as follows:
• Tracking-by-matching method (named TbM)
• GCS optimized via Gibbs sampling analogous to [23] (named Gibbs)
• GCS with projection consistency (PC) optimized via greedy selection (named Greedy PC)
General tracking results for 50 flies over 1,000 time steps are given in Figure 6. Table 1 summarizes the results for all approaches. The resultant $E_{ca}$ value is additionally plotted in Figure 7a.

Gibbs sampling vs. greedy optimization
Table 1 Number of tracks per experiment (only these rows of the table are reproduced here)

Experiment   General                        Long-term   High-density
N            10    20    30    40    50     50          100   150   200
TbM          12    24    40    65    75     137         124   194   284
Gibbs        10    20    30    38    43     n/a         n/a   n/a   n/a
Greedy PC    10    20    30    40    49     50          106   137   179

The measurements are classified into general, long-term, and high-density results. Each experiment was done for $N$ targets and $T$ frames (second row). Occlusions are given in the third row: the absolute numbers sum all occlusions in all views, and relative occlusions indicate the mean number of occlusions per camera and frame. In row max |C|, the size of the biggest cluster in $S_P$ (without PC) and $S^*_P$ (with PC) is given for Gibbs and Greedy PC, respectively. $|S_P|$ indicates only one cluster $C_{(t-\delta:t),\nu} = C_{T,N}$ including all sequences of configurations. Time measurements are given in seconds, except for the entries >4 h, which indicate a computational time longer than 4 h. Since we focus on biological applicability, we do not discuss tracking results with a computational time >4 h (entries are marked by n/a). In the last row, the $E_{ca}$ measurements are given in combination with the missing flies and the absolute number of calculated tracks.

The first observation is related to the number of occlusions and maximal cluster sizes. In general, the complexity of the global search space increases with the number of targets and frames [16]. If there are only a few ambiguities (e.g., occlusions, nearby 3D paths), most of the correspondences are unique and the subsequent optimization is only necessary for a few small clusters (compare to max |C| in Table 1).
This can be observed in all movies besides the movie with 20 flies: the maximum cluster contains 19,024 pairings and 2,140 pairings without and with projection consistency, respectively (Table 1). Thus, the tracking time increases for both GCS approaches. However, the greedy selection is still able to calculate sufficient tracks, whereas Gibbs sampling yields less reliable results. The reason is that Gibbs needs to sample one sequence of configurations in a cluster containing almost 20,000 pairings covering 18 targets for 843 frames. A single wrong correspondence selection prevents Gibbs from converging to the global optimum. Within the 10,000 sampling iterations we used, convergence was not achievable in reasonable time.
This relationship is also observable in the long-term and high-density experiments. In contrast to Zou et al. [23], we merge overlapping clusters for both joint time steps and joint targets (see Section 2.2.3) to guarantee a global search space. Thus, given very dense situations with hundreds of flies, the natural segmentation of the clusters is no longer available. In all measurements, the cluster size of the Gibbs approach was equivalent to the overall search space so that $|C_{(t-\delta:t),\nu}| = |C_{T,N}| = |S_P|$. The subsequent optimization must therefore statistically sample one sequence of configurations out of $k^{\nu(\delta+1)}$ ($k \le \nu$ and $\nu \to N$, $\delta \to T$) possible sequences (compare to Section 2.2.4). This is why Gibbs sampling requires millions of sampling steps to calculate stable results [37], which was neither shown in [23] nor possible in our data for thousands of frames in reasonable time. Since algorithms requiring more than 4 h for only a few seconds of movie length are not suitable for biological applications, we neglect these tracking results in Table 1 (indicated by n/a). Thus, Gibbs sampling for high-density or long-term situations is more interesting from a theoretical point of view [24].

TbM vs. GCS
Obviously, the Greedy PC approach has the best overall performance in the general experiments. The TbM approach is between the Greedy and Gibbs solution. Optimization of GCS without PC and via Gibbs leads to the worst results with irregular $E_{ca}$ values. If the number of flies increases, the $E_{ca}$ increases for both TbM and Greedy tracking (compare to Figure 7a). Since both measurements increase proportionally to the number of wrong correspondences between views $N_c$ and wrong associations over time $N_a$, these values are examined in Figure 7b.
As is apparent, an increasing $N_c$ value leads to high error measurements for both the TbM and the Greedy PC approach. The main cause of wrong or missing stereo correspondences is occlusions. The more flies are located in the chamber, the more occlusions arise during blob detection (compare to Table 1). Especially in later frames, occlusions arise very frequently because of the negative geotaxis (compare to Section 3.1). During the movie with 50 flies, up to 4 flies are occluded for several frames in camera 1, for example. Thus, even in situations with up to 50 flies, the target density is comparatively high. Since TbM and Greedy PC try to overcome these events using the projection consistency, both have much lower $N_c$ than the two-camera tracking solutions. Gibbs has up to 1,677 wrong stereo correspondences (data not shown). Therefore, it is not able to calculate the global optimum even after 10,000 sampling steps.
In other words, the overall search space $S^*_P$, containing more than 1,000 occlusions, cannot be sampled sufficiently because of the growth of the clusters (Table 1). However, Greedy PC benefits from the previously calculated overall search space $S^*_P$: since all possible pairings and sequences of configurations are used for cost calculations, ambiguities caused by occlusions can be corrected more frequently.

Long-term tracking
In the long-term experiment, 50 flies were tracked for 3,000 frames. Gibbs failed in this experiment because the size of the clusters C increases drastically for 3,000 frames. Thus, only TbM and Greedy PC were able to track during this experiment.
In long-term movies, the TbM approach can achieve better results than the Greedy PC algorithm (Table 1). The reason for this inversion compared to the 1,000 frame results is that the size of the clusters |C| increases too much for 3,000 frames. Thus, the probability of ending up in a local optimum via greedy selection increases accordingly. However, tracking accuracy is still acceptable in the Greedy PC approach. On the other hand, TbM, as a causal method, is not affected by the length of tracking sequences.
In contrast to the TbM approach, GCS can miss targets during optimization (see Table 1). However, projection consistency reduces the number of missed targets. Furthermore, GCS optimization leads to less fragmented trajectories than the TbM approach. Whereas TbM results in 137 trajectories ($N_a = 99$) for 50 flies (over 3,000 frames), the Greedy PC approach calculates 49 complete tracks of 50 tracks in total ($N_a = 7$). If complete trajectories are required (i.e., identity of the flies must remain over time), Greedy PC is recommended but with the possibility of losing flies.

High-density tracking
To evaluate the behavior of both tracking approaches, we tracked up to 200 flies. Similar to the long-term experiment in Section 4.3, we limit our comparisons to Greedy PC and TbM. Table 1 highlights the measurements for 100, 150, and 200 targets. We decreased the number of frames for the movies with 150 and 200 flies to limit the size of $S^*_P$. For up to 100 targets and 200 frames, Greedy correspondence selection is more accurate than TbM. However, given more than 100 targets, resulting in very high fly densities, the TbM outperforms the GCS approach. Most importantly, there are no missing targets in the probabilistic approach, whereas Greedy was not able to find trajectories for all flies. Furthermore, TbM can achieve better overall accuracy in high-density situations in less tracking time. The only drawback of the probabilistic tracking is again the fragmentation of the trajectories: TbM calculates more tracks than Greedy PC, resulting in many identity changes.

Conclusion
In this paper, we discussed two tracking paradigms for identically appearing objects such as Drosophila melanogaster in 3D. One paradigm is based on a probabilistic approach that conducts tracking and matching alternately [35]. The other paradigm constructs a global search space over all targets and time steps, which is optimized in a second step [23].
Due to the high complexity of the GCS paradigm, we introduced two improvements, namely projection consistency and greedy optimization. Projection consistency in particular reduces the overall complexity and thus improves the tracking results without driving the optimization into local optima. Since Gibbs sampling, used for GCS optimization in [23], needs thousands of iterations to guarantee good results, our greedy selection scheme outperforms Gibbs sampling. However, greedy optimization cannot guarantee a globally optimal result.
We demonstrated several advantages and disadvantages of both the TbM and the GCS approach. Thus, the decision which approach to use must be made carefully. If the identity of the flies is not important, TbM can be used to track for several thousands of frames (see Section 4.3). All flies were detected, but the trajectories were fragmented due to occlusions. The GCS approach was not able to track all flies in all experiments: only 49 of 50 flies were detected. On the other hand, the trajectories of the detected flies were less fragmented (see Section 4.2). In addition, GCS was able to resolve collisions and occlusions more frequently because of the global search space. This leads to the higher tracking accuracy illustrated in Figure 7a. If dozens of flies must be tracked for a comparatively short period, GCS outperforms TbM tracking. For very long sequences, it is the other way around.
If high fly densities are needed for a comparatively long period, the size of the global search space prevents Gibbs optimization, because it requires too many sampling iterations. In addition, greedy tracking quality decreases drastically compared to TbM (see Section 4.4). Thus, without further reductions of the global search space, probabilistic tracking is the preferable paradigm in high-density experiments.
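The limitation of greedy selection can be illustrated with a minimal Python sketch. It is not the optimization scheme used in this paper: the 2 x 2 cost matrix, the function names, and the one-to-one pairing constraint are assumptions chosen purely for illustration. The sketch selects pairings in order of increasing cost and compares the result with an exhaustive search over all conflict-free assignments.

```python
from itertools import permutations

def greedy_select(cost):
    """Pick pairings in order of increasing cost, skipping conflicting ones."""
    n = len(cost)
    candidates = sorted((cost[i][j], i, j) for i in range(n) for j in range(n))
    used_rows, used_cols, chosen = set(), set(), []
    for _, i, j in candidates:
        if i not in used_rows and j not in used_cols:
            chosen.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return chosen

def exhaustive_select(cost):
    """Try every one-to-one assignment and keep the cheapest (small n only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return [(i, best[i]) for i in range(n)]

def total_cost(cost, pairs):
    """Sum the cost of the selected pairings."""
    return sum(cost[i][j] for i, j in pairs)

# Hypothetical toy costs: the cheapest single pairing (0, 0) forces the
# expensive pairing (1, 1) afterwards.
cost = [[1.0, 2.0],
        [2.0, 10.0]]

greedy_pairs = greedy_select(cost)
optimal_pairs = exhaustive_select(cost)
greedy_total = total_cost(cost, greedy_pairs)
optimal_total = total_cost(cost, optimal_pairs)
```

In this toy case the greedy choice commits to the locally cheapest pairing first and ends with a higher total cost than the exhaustive search, which is the kind of local optimum referred to above. The exhaustive search is only feasible for very small problems, which is why it cannot replace greedy or sampling-based optimization on the full search space.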
TbM could be optimized in terms of tracking accuracy, whereas GCS could be optimized for longer tracking durations and higher target densities. Possible improvements for the TbM approach are discussed in [35]. Here, we want to focus on improvements of the GCS approach.
Currently, we use the third camera only to correct mismatches between cameras 1 and 2. The optimization scheme is still executed on pairings. Since pairwise comparison in a triplet would further reduce the search space, all optimization steps could be done on three image points.
The kinetic model given by Equation 8 is also a current drawback of the GCS approach. Only motion from (t − 1) is considered for time step t. Thus, a more appropriate motion model would further improve the accuracy of GCS tracking.
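To make the possible benefit of a richer motion model concrete, the following Python sketch contrasts two hypothetical predictors. It does not reproduce Equation 8: we assume, for illustration only, that a one-step model expects the target near its position at (t − 1), whereas a constant-velocity model extrapolates from the positions at (t − 2) and (t − 1). All positions and function names are invented for this example.

```python
import math

def predict_one_step(prev):
    """Assumed one-step model: expect the target at its previous position."""
    return prev

def predict_constant_velocity(prev2, prev):
    """Assumed two-step model: extrapolate linearly from the last two positions."""
    return tuple(2 * b - a for a, b in zip(prev2, prev))

def closest_candidate(prediction, candidates):
    """Return the index of the candidate nearest to the predicted position."""
    return min(range(len(candidates)),
               key=lambda k: math.dist(prediction, candidates[k]))

# Hypothetical fly moving along the x axis with one unit per frame.
pos_t2 = (0.0, 0.0, 0.0)   # position at t - 2
pos_t1 = (1.0, 0.0, 0.0)   # position at t - 1

# Candidate 0 continues the motion; candidate 1 is a nearby second fly.
candidates = [(2.0, 0.0, 0.0), (1.2, 0.3, 0.0)]

pick_one_step = closest_candidate(predict_one_step(pos_t1), candidates)
pick_velocity = closest_candidate(
    predict_constant_velocity(pos_t2, pos_t1), candidates)
```

Under these assumptions, the one-step predictor prefers the nearby second fly, while the constant-velocity predictor prefers the candidate that continues the observed motion. Whether such a model improves GCS tracking on real data remains to be tested.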
Currently, we are developing a three-camera real-world setup to capture movies of adult Drosophila flies. Thus, we are going to test both algorithms on real video sequences, comparable to the synthetic data introduced above.