Efficient estimation of disparity statistics and their use as a predictor for perceived 3D video scene quality
© Gürol et al.; licensee Springer. 2013
Received: 27 February 2013
Accepted: 9 September 2013
Published: 16 September 2013
Excessive depth perception in 3D video is one of the major factors that cause viewer discomfort and can lower the viewer’s perception of 3D video quality. With real-time quality control of 3D videos in mind, we propose an edge-based sparse disparity estimation algorithm with a novel similarity metric. In a comparative assessment against four other state-of-the-art similarity metrics, all implemented within the proposed edge-based disparity estimator, the novel metric showed higher performance. User tests were conducted to assess the relation between certain disparity statistics and user perception of 3D scene quality, which is a retrospective subjective experience of quality. The subjective tests indicate that viewer discomfort is best predicted by using the maximum and the slew rate of the 95 percentile scene disparities together.
The consumer market is moving rapidly toward 3D motion image delivery, and content providers, distributors, and equipment manufacturers see this as an opportunity. Consequently, there is an intense development effort in the field of 3D technologies, which range from compressive coding to 3D displays. In parallel, 3D content production is growing rapidly in the form of 3D cinema and television programs. It is expected that 3D video will attain wide usage both in home theaters and on mobile platforms in the coming years. The penetration of 3D video into our lives brings with it the question of multimedia user experience. In particular, the comfort level of 3D viewing will be of paramount importance in contrast to 2D video; in fact, negative aspects such as visual strain and viewer fatigue will curb the wide adoption of 3D. In this paper, we address two issues: (1) realizable and efficient estimation of large disparities in 3D video and (2) the impact of excessive disparities on 3D viewing comfort.
In 3D broadcasting, besides having visually good images for each stereo channel, it is also important that the channels match each other. Research in stereo quality control has identified a number of factors affecting the viewing experience, such as parallax (disparity) irregularities, focus mismatch, color mismatch, geometry mismatch, vertical parallax, object edge tearing, cardboard effect, pincushion distortion, etc. Such distortion factors affecting 3D video quality are well documented.
The mechanism of depth perception in the human visual system is fairly well understood. It is known that depth perception uses both psychological and physiological cues. On the psychological side, the human visual system (HVS) uses cues related to perspective, such as overlap, shadow, apparent size, and texture; on the physiological side, the main cues include binocular parallax, motion parallax, accommodation, and convergence. Among them, binocular parallax, while being the most common method for 3D stereo rendering, is also one of the most dominant factors affecting the viewing experience. Binocular parallax is the relative spatial distance between similar points, which share the same physical origin, in the left and right stereo image pairs. It is governed by the binocular disparity, that is, the horizontal separation between the retinal images of the two eyes when convergence to a specific distance is achieved. Hence, it can be conjectured that the quality of 3D video from the viewers’ point of view is largely determined by the binocular disparity. This calls for an automated 3D video quality assessment tool based on the estimated disparities between stereo frame pairs as a function of time. Such a tool would not only provide an overall quality figure for a given video but could also provide information regarding the frequency of parallax errors with respect to scenes as guidance for video post-processing.
It is known that the brain uses binocular disparity to infer depth information from the 2D retinal images, resulting in 3D perception, that is, stereopsis. The creation of the sense of depth via binocular disparity on stereoscopic screens is influenced by the size of the 3D display and the viewing distance, given the same relative parallax. Thus, the disparity requirements vary proportionally for cinema viewing (typically 20 m), home TV viewing (typically 1.5 m), and mobile device viewing (typically 0.2 m). In practice, smaller screens require a larger stereo baseline to provide more disparity as a fraction of the image width to retain a good impression of depth. It has been recommended that the perceived depth range be upper bounded at a visual angle of 60 arcmin to ensure visual comfort for the majority of viewers [2–5].
In order to create a satisfying sense of depth, the disparities between the image pairs should be made compatible with HVS 3D perception. Excessive disparity values correspond to an exaggerated depth range; they may strain the binocular fusion faculty of the subject and may cause the scene depth to be perceived inaccurately. Such defective image pairs can cause a weakened depth sense in the observer, and they can even result in headache or nausea after prolonged exposure. The effects of such defective image pairs on the human visual system [3, 6–8] and some quality assessment methods based on disparities have been investigated in recent years.
Viewer discomfort is known to be affected by multiple factors and is observed in multiple ways. It is generally accepted that the vergence-accommodation conflict is a dominant factor in viewer discomfort and eye strain. Two major consequences of excessive disparities are vergence-accommodation conflict and double vision. Under fixed viewing conditions, the vergence-accommodation conflict can be related to the maximum disparity. Double vision, however, is affected not only by the stereo properties of 3D video, such as disparity, but also by its content and viewing environment. Lambooij et al. have noted that even with a plausible disparity range, there are video characteristics such as fast motion or spatial and temporal inconsistencies that may contribute to visual discomfort. In this work, we focus solely on the effect of disparity on discomfort, since disparity can be measured efficiently with the proposed novel method.
In accordance with our goal of viewing comfort prediction using estimated excessive disparities, the main contributions of this paper are the following:
We introduce a new edge-based efficient sparse disparity estimation approach.
We introduce a novel similarity metric (correlation of gradient orientations (CGO)) for disparity estimation and carry out a comparative performance assessment with respect to the state-of-the-art metrics.
We report the results of a pilot study exploring the correlation between the perceived 3D video quality and a number of statistical measures extracted from the sparse disparity estimates.
The disparity is estimated only on detected edge pixels; hence, the resulting disparity field is sparse in comparison to conventional dense disparity estimators in [10–13]. The sparse approach is chosen since the intention is not to reconstruct the entire disparity field but to find large range disparities in frames; as a byproduct, edge sparseness enables more rapid and efficient estimation of disparity statistics. We have introduced a new block similarity measure called CGO. This method is found to be a more efficient and reliable disparity estimator compared to several other block search methods. We also investigate the relationship between certain disparity statistics and user viewing comfort in order to develop a predictor of subjective 3D video quality. Such a quality indicator would help screen content provided by third parties prior to purchase decision; it would also be instrumental in quality-based scene selection in 3D video during post-processing and prior to broadcast.
The rest of the paper is organized as follows. In Section 2, we describe the proposed edge-based sparse disparity estimation algorithm and the similarity metrics used, including the novel CGO. In Section 3, we explain the test material used together with the details of the performance metrics we defined. The performance results, both on reference databases with ground-truth information and on actual video streams from the TV industry, are presented and discussed in Section 4. The concluding remarks are given in Section 5.
2 Disparity estimation methods
Disparity estimation has been the subject of much interest in the last two decades, and a plethora of algorithms have been developed. These algorithms and their relative performances are well documented in the literature [14, 15].
Several advanced algorithms have been proposed to reconstruct dense disparity fields through global optimization methods [12, 13]. Such approaches, primarily due to their computational load, are not suitable for our major goal of providing a 3D video quality metric fast enough to meet the needs of broadcast companies in selecting and/or post-processing the 3D video prior to purchase/broadcast. Furthermore, dense disparity fields are not required for our purpose. In order to differentiate our method from the dense disparity map methods in the literature, we will call ours the point disparity estimator and its outcome the sparse disparity map.
2.1 Image pre-processing for disparity estimation
Both left and right images are preprocessed to mitigate the illumination artifacts and to enhance the edge structures. The rank filter is applied prior to the disparity search whenever a similarity metric based on pixel intensity differences is used. The rank filtering is omitted for other metrics as it decreases the dynamic range, as explained in Section 2.2. The rank filter considers a W×W window around each pixel, rank orders the pixel values in the window, and assigns the rank of the center pixel as its new pixel value ∈ [1, W²]. This filtering is applied to both stereo images; the choice of W=15 was found to be adequate, so that the original gray values are mapped to the range [1, 225].
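The rank filtering described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `rank_filter`, the edge-replicating border handling, and the tie rule (rank = 1 + number of strictly smaller window pixels) are assumptions made for this example.

```python
import numpy as np

def rank_filter(image, W=15):
    """Replace each pixel by its rank within the W x W window centred on it."""
    half = W // 2
    # Border handling is not specified in the text; edge replication is assumed.
    padded = np.pad(image, half, mode="edge")
    # windows[r, c] is the W x W neighbourhood of pixel (r, c).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (W, W))
    centre = np.asarray(image)[..., None, None]
    # Assumed tie rule: rank = 1 + number of strictly smaller window pixels,
    # so the output lies in [1, W*W].
    return 1 + (windows < centre).sum(axis=(-2, -1))
```

With W=15 the output lies in [1, 225], in agreement with the range quoted above. The intermediate boolean array grows with W², so a production version would likely process the image in tiles.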
2.2 Disparity search
ψ is the similarity metric (similarity cost function), C(d) is called the cost profile, and the estimated disparity is the shift d within the search range [−k, k] at which the cost profile attains its optimum. The parameters h and b are chosen empirically as the smallest block sizes providing robust estimation of the similarity costs used in the study. The choice of k upper bounds the disparity estimates. Too small a value of k would result in underestimation of disparities, while too large a value would cause the algorithm to match unrelated image patches based on some structure or intensity similarity. We set k=80, based on the properties of the HVS and the datasets used, as detailed in Section 3.
We considered five state-of-the-art similarity cost functions including the proposed CGO. The definitions of the five similarity cost functions and the details of the cost aggregations are given in the following.
2.2.1 Sum of absolute differences
where N = h×b. The SAD value at candidate disparity d is obtained as the sum of the absolute differences between the two blocks from f_R and f_T that are shifted horizontally by d units with respect to each other.
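As an illustration of the SAD-based search, the sketch below computes a cost profile C(d) for one pixel and picks the shift with the smallest cost. It is a minimal sketch under stated assumptions: the function names, the division by N = h×b, the odd block sizes, and the skipping of shifts that fall outside the image are choices made for this example and are not taken from the paper.

```python
import numpy as np

def sad_cost_profile(f_R, f_T, row, col, h=25, b=25, k=80):
    """Return the candidate shifts d and the SAD cost C(d) for pixel (row, col).

    Assumes odd h and b and a pixel far enough from the image border
    for the reference block to fit.
    """
    hr, hb = h // 2, b // 2
    ref = f_R[row - hr:row + hr + 1, col - hb:col + hb + 1].astype(float)
    shifts, costs = [], []
    for d in range(-k, k + 1):
        c0, c1 = col + d - hb, col + d + hb + 1
        if c0 < 0 or c1 > f_T.shape[1]:
            continue  # assumed: skip shifts whose block leaves the image
        tgt = f_T[row - hr:row + hr + 1, c0:c1].astype(float)
        shifts.append(d)
        # Assumed normalisation by N = h*b.
        costs.append(np.abs(ref - tgt).sum() / (h * b))
    return np.array(shifts), np.array(costs)

def estimate_disparity(f_R, f_T, row, col, **kwargs):
    """Pick the shift with the smallest SAD cost."""
    shifts, costs = sad_cost_profile(f_R, f_T, row, col, **kwargs)
    return int(shifts[np.argmin(costs)])
```

In the edge-based setting, `estimate_disparity` would be called only at the detected edge pixels of the reference image.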
2.2.2 Hermann Weyl’s discrepancy measure
Here, I_q represents the integral image obtained along the direction q. The minimum values of the integral images are subtracted from the maximum ones in order to constrain the final costs to be positive. The coordinate location that yields the minimum HWDM is taken as the disparity estimate.
2.2.3 Adaptive support windows
where SAD_i(d) represents the SAD cost of the i-th sub-block at shift d.
2.2.4 Sum of absolute differences of scale invariant feature transform vectors
The images f_T and f_R are processed to extract their scale invariant feature transform (SIFT) fields. The SIFT vectors are obtained by dividing the 16×16 neighborhood of each pixel into 4×4 cells and then quantizing the orientation in each cell into 8 bins. Thus, each pixel in both images is replaced with a SIFT vector of size 128, and the corresponding SIFT vector images are obtained. The disparity search consists of matching blocks between the reference and target SIFT images using simply the SAD criterion. Note that this algorithm does not need the pre-processing step of rank filtering (Section 2.1), since SIFT already uses the image contrast in the neighborhood. Again, the matching is performed only on edges of the reference image.
2.2.5 Correlation of gradient orientations
Notice that this algorithm also bypasses the rank filtering step due to its use of gradients.
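The defining formula of CGO is not reproduced in this text. Purely as an illustration of what correlating gradient orientations can look like, and not as the authors' definition, the sketch below follows the orientation correlation idea of Fitch et al.: each pixel is represented by the unit complex number pointing along its intensity gradient, and two blocks are compared through the real part of their complex correlation. The function names and the normalisation by block size are assumptions.

```python
import numpy as np

def orientation_field(image):
    """Unit complex number along the intensity gradient at each pixel (0 where flat)."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, then columns
    g = gx + 1j * gy
    mag = np.abs(g)
    out = np.zeros_like(g)
    np.divide(g, mag, out=out, where=mag > 0)  # flat pixels keep the value 0
    return out

def orientation_correlation(block_a, block_b):
    """Mean cosine of the orientation difference between two equally sized blocks."""
    return float(np.real(np.sum(block_a * np.conj(block_b)))) / block_a.size
```

Because only gradient directions enter the score, it is unaffected by multiplying either image by a positive constant or adding an offset, which is in line with the remark that rank filtering is not needed. A score of 1 means identical orientations everywhere in the block, so a search based on this sketch would pick the shift with the largest score.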
We determined the block sizes empirically. For fairness in the performance comparison of disparity estimation methods, we take the block size that yields the best results for each method. Thus, we used the following h×b figures: the size of the search block is 25×25 in SAD and ASW; in the latter, the four overlapping sub-blocks are of size 15×15 and the middle sub-block is of size 5×5. For HWDM, the block size is 11×11, for SADSIFT, 5×5, and for CGO, it is taken as 15×15.
As a post-processing step, each disparity estimate is validated by cross-checking. If a block at some position (x,y) in f_R finds its match in f_T at (x,y+d_E), where d_E is the estimated disparity, then a matching block is searched for in f_R, this time starting from the reference point in f_T. If the two estimates, f_R→f_T and f_T→f_R, are consistent with each other, that is, they are within 2 pixels of each other, then the estimate is considered valid. Otherwise, the pixel under investigation is labeled as unreliable in the disparity map. Such cross-checking is useful in determining the occluded regions and hence helps to reduce wrong disparity estimates.
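The cross-check can be sketched as a round trip. In the minimal example below, `match_R_to_T` and `match_T_to_R` are hypothetical callables that return a signed disparity for a given pixel; they stand in for whichever block matcher is used and are not names from the paper. Returning `None` for an unreliable pixel is likewise an assumption of this sketch.

```python
def cross_check(row, col, match_R_to_T, match_T_to_R, tol=2):
    """Return the forward disparity if the round trip lands within tol pixels, else None."""
    d_forward = match_R_to_T(row, col)               # f_R -> f_T
    d_backward = match_T_to_R(row, col + d_forward)  # f_T -> f_R, from the matched point
    landed = col + d_forward + d_backward
    if abs(landed - col) <= tol:
        return d_forward
    return None  # labelled unreliable
```

For example, a forward disparity of 10 followed by a backward disparity of −9 lands one pixel from the starting column and is accepted, whereas a backward disparity of −5 is rejected.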
3 Experimental setup
We describe here briefly the stereo image database, video test material, and the metrics used in experiments.
3.1 Stereo image database
We used 35 stereo image pairs, with known dense ground-truth disparity maps, in the Middlebury stereo image database for quantitative performance assessments [21, 22]. These images are known to be rectified and have unidirectional disparity. The maximum disparity values occurring in the Middlebury database are as follows: for 6 of the images, the absolute maximum disparities are less than 20 pixels (≈5% of the scene), and in the remaining 27 images, the absolute maximum disparities vary between 38 and 71 (≈10% to 16% of the scene). Accordingly, in all of our experiments with the Middlebury dataset, we set k=80 (≈18% of the scene), such that the disparity search ranges between (x,y−80) and (x,y+80) for any pixel at location (x,y). Thus, the search limit k is chosen larger than the maximum true disparity value of the image set, and hence, it allows for some overestimation. Any larger setting of the range k would have the potential to induce erroneous and somewhat unrealistic disparity estimates.
3.2 Video test material
In order to assess the potential of using the edge-based maximum disparity estimation for subjective 3D video quality prediction, we used a custom test video set consisting of 12 different stereo scenes from footage provided by a commercial digital broadcasting company (Digitürk A.Ş.). Eight of these scenes were taken in a soccer stadium by an expert 3D broadcasting crew; one of them is a computer animation, and three of them were taken in public locations around the city. The stereo shots have a frame rate of 25 frames per second, and the scenes have durations that range from about 17 to 60 s. There is a 1-s black screen between the scenes. The total length of the test video is 9,963 frames (≈6.5 min). The scenes do not contain any subtitles.
The original resolution of the videos was 1,080×1,920 pixels, but we down-sampled them to 270×480 for practical purposes. The videos, shot by the professional crew, were not rectified; however, their vertical disparity was negligible. The stereo shots were taken with a slight angular shift between the cameras. Therefore, they have bi-directional disparities, such that when the left image is taken as the reference, the resulting disparities can be expected to be positive for background objects and negative for foreground objects. Some of these video shots contain large disparities as the offset between the cameras was intentionally and randomly modulated during the shooting.
Although these video scenes do not have ground-truth information according to which k could be set (as in Section 3.1), we have set k=80 (≈17% of the scene) manually, based on the observed maximum absolute disparity between the stereo pairs throughout the video. A disparity of 17% of the scene is equivalent to 64 arcmin angular disparity in our subjective test setup. This search range slightly exceeds the 60 arcmin of visual angle, which is the threshold for the HVS to be able to fuse stereo images for 3D perception [2–5]. Therefore, k=80 would be a suitable choice in our setup for the prediction of excessive disparity-related visual discomfort within the limits that still enable fusion.
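The conversion from a pixel disparity to a visual angle depends on the display geometry and on the pixel grid in which the disparity is counted. The sketch below is one common way to do the conversion and is not necessarily the calculation used by the authors: it takes the on-screen separation and returns the angle it subtends at the viewing distance. The function name and parameters are hypothetical; the example values (89-cm-wide screen, 1,920 horizontal pixels, 2 m viewing distance) are those of the display setup reported in Section 4.2.

```python
import math

def disparity_arcmin(disparity_px, image_width_px, screen_width_m, viewing_distance_m):
    """Visual angle (arcmin) subtended by an on-screen disparity."""
    # Physical separation on the screen, assuming the image fills the screen width.
    on_screen_m = disparity_px / image_width_px * screen_width_m
    angle_rad = 2.0 * math.atan(on_screen_m / (2.0 * viewing_distance_m))
    return math.degrees(angle_rad) * 60.0

# Assumption for illustration: 80 pixels counted on the 1,920-pixel-wide display.
angle = disparity_arcmin(80, 1920, 0.89, 2.0)
```

Under the stated assumption that the 80 pixels are counted at the display's native width, the result is close to 64 arcmin.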
3.3 Performance metric
Since the goal of our algorithm is to detect the largest disparities in the scene, we have developed performance measures to this effect. Our experience has shown that the absolute disparity estimation error is proportional to the size of the actual disparity. Furthermore, it can be conjectured that the largest disparities per frame and per scene affect the viewer comfort level the most. We therefore compute the mean of the largest 5% of the actual disparities, the true 95 percentile mean, μ_95, for each image and use it as an indicator of estimation error. In fact, we rank the Middlebury images according to their ground-truth μ_95 values in ascending order and report the disparity estimation performance as a function of ascending μ_95. The metrics we used are as follows:
This metric gives scores in the [0, 100] range, with 0 corresponding to perfect estimation.
where the two notations signify the v-th rank-ordered true and estimated disparities (95 percentile disparities), respectively. Criterion 2 yields the absolute discrepancy between the 95 percentile values of the ground-truth and estimated disparities. Obviously, the range of this metric is between 0 for perfect estimation and k, the largest attainable error.
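The metric formulas themselves are not reproduced in this text, so the sketch below only illustrates the stated idea. `top5_mean` computes the mean of the largest 5% of disparity values, and `diff95` is one plausible reading of Criterion 2 as the absolute difference between the true and estimated values of that mean. Both function names, the use of disparity magnitudes, and the handling of unreliable estimates (NaN entries are dropped) are assumptions.

```python
import numpy as np

def top5_mean(disparities):
    """Mean of the largest 5% of disparity magnitudes (at least one value)."""
    d = np.abs(np.asarray(disparities, dtype=float)).ravel()
    d = np.sort(d[~np.isnan(d)])          # assumed: NaN marks unreliable estimates
    n_top = max(1, int(np.ceil(d.size / 20)))
    return float(d[-n_top:].mean())

def diff95(true_disparities, estimated_disparities):
    """Assumed form: absolute gap between the true and estimated top-5% means."""
    return abs(top5_mean(true_disparities) - top5_mean(estimated_disparities))
```

With a search range of k, a value computed this way cannot exceed k, which is consistent with the range stated above.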
4 Results and discussion
4.1 Performance results on stereo images
Rank sums of the disparity estimation methods according to the percentage of erroneous disparity value (Erroneous%) of each image pair
Figure 5b,c further confirms the high performance of CGO in comparison to the other four methods based on the absolute disparity estimation error and the relative disparity scores, respectively. CGO resulted in Diff95% values larger than 0 in only 10 image pairs (out of 35), with an outlier in a single image pair. The nearest performance is achieved by the SAD method, which results in 11 image pairs with a Diff95% value larger than 0. When we consider the Ratio5% measure, both the SAD and CGO scores are clustered around 1 (which represents perfect performance). CGO has an outlier in a single image pair, though in general, CGO results are more tightly clustered around 1 than those of SAD.
It is interesting to observe that all methods overestimated the disparity in almost all cases. This gives us confidence that the methods we applied would not miss large disparities, albeit at the risk of false alarms caused by estimating the disparities as larger than their actual values. It can be observed that HWDM is more prone to give false alarms since it overestimates the disparity more frequently. In this respect, CGO is a more reliable metric than the others, with its Ratio5% values clustering around 1 without large outliers.
In terms of the Diff95% and Ratio5% metrics, SAD and CGO may seem to have similar performances, though it should be noted that the results in Figure 5 are obtained by selecting the best performing search block parameters for each similarity metric. In this sense, SAD required a larger search block size (25×25) to yield performance similar to that of CGO, for which a block size of 15×15 sufficed. Obviously, larger block sizes are computationally more costly, and the computational burden arising from the larger block size in SAD is far greater than that of the simple gradient orientation map computations in CGO.
4.2 Subjective assessment of stereo scenes
We recruited 15 subjects to assess the quality of the 3D stereo video data (as described in Section 3.2), and they reported their viewing experience in a follow-up questionnaire. This population size is suggested in the ITU-R BT.500 recommendation as the lower limit of the cohort size. The subjects were chosen among students within an age range of 20 to 30. They were taken to have normal visual acuity since none of them normally wore eyeglasses. All subjects confirmed that they had previously watched stereoscopic 3D movies and that they did not experience any trouble in perceiving different levels of depth. For our calculations, we take the interpupillary distance of the subjects as 65 mm as it corresponds to the average human eye separation. The display was a commercial 3D TV with 1,920×1,080 pixel resolution and 89×50 cm screen dimensions. The stereoscopic display system was a two-view TV based on temporal multiplexing (shutter glasses) to create the stereoscopic depth sense. The subjects watched the videos sitting in a comfortable chair at a distance of approximately 2 m in a room lit by subdued daylight. The subjects were not guided to look at a particular point or object on the screen. The comfortable viewing zone is defined as a perceptual depth range where stereoscopic visual comfort is maintained. Accordingly, the foreground and background distances of the comfortable viewing zone in our experiments are 0.57 and 1.33 m, respectively. After watching each scene, the subjects were asked if they felt any strain on their eyes at any part of the scene.
Measures of estimated maximum 95 percentile disparity (in % of the frame resolution and in absolute minutes of arc) and the number of subjects who reported eye strain for the 12 scenes
Mean absolute errors of linear regression between explanatory variables (independent regression parameters) and the number of subjects reporting eye strain
Multivariable regression on the maximum and the slew rate of the 95 percentile disparities yields the lowest MAE score and hence the best prediction. In comparison to single-variable regression with the maximum 95 percentile disparity only, using the slew rate of the 95 percentile disparities together with the maximum improves the prediction performance by yielding a 25% lower MAE score (0.8032). Although the slew rate yields the worst single-parameter prediction results, it is interesting to observe that it can significantly improve the viewer discomfort prediction performance when combined with the maximum. The slew rate of the 95 percentile disparities gives a notion of the rate of disparity change within the scenes. Our experiments confirm that sudden disparity changes, captured with the slew rate statistics, are also an important factor in the assessment of stereo video quality, together with the maximum disparity statistics. In any case, it is encouraging to observe that eye strain can be predicted with an average error of about 1 person among 15 subjects (6.7%).
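The regression and its MAE score can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions: the function names are hypothetical, the MAE is computed on the same scenes used for fitting (whether the reported scores are in-sample is not stated here), and no data from the study are reproduced.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [intercept, weight_1, weight_2, ...]

def mean_absolute_error(X, y, coef):
    """Average absolute gap between predicted and reported counts (assumed in-sample)."""
    A = np.column_stack([np.ones(len(y)), X])
    return float(np.mean(np.abs(A @ coef - y)))
```

Here `X` would hold one row per scene, with one column for a single-variable regression or two columns (maximum and slew rate of the 95 percentile disparities) for the multivariable case, and `y` would hold the number of subjects reporting eye strain per scene.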
We would like to discuss the limitations of our work. Recently, it has been pointed out that subjective assessment requires an evaluator cohort larger than the size suggested in ITU-R BT.500. In our exploratory study, the material provided to us consisted mostly of outdoor scenes. A richer repertoire of video material including indoor scenes would be desirable. For example, Lambooij et al. have shown that visual comfort depends on video characteristics such as the activity in the scene. In addition to the post-session questionnaire, on-session monitoring of the viewer’s comfort, e.g., with a slider control, would be valuable, for example, to monitor the onset of discomfort. Our questionnaire was binary; enabling multilevel answers, for example, bad, poor, fair, good, and excellent gradations, would yield richer information but would possibly demand more experienced, if not expert, subjects. Finally, and perhaps most importantly, we have addressed only one aspect of visual discomfort.
In this study, we proposed a new maximum disparity estimation method and evaluated its performance in comparison with four other state-of-the-art methods, as a simple, fast, and objective 3D video quality assessment method. As the driving force for this study is to develop and validate a simple, objective, and fast method for 3D video quality assessment from the viewpoint of viewers’ comfort, to be used by broadcast companies, the factors that depend on video content and viewing environment are not considered.
A combination of the maximum and the slew rate of the 95 percentile disparity statistics per scene, estimated with the proposed CGO algorithm, was shown to predict viewer discomfort, seen as eye strain, with higher accuracy than either statistic alone. The results of limited user tests suggest that viewer discomfort is directly affected by even short durations of high disparities and sudden disparity changes in a scene, leading to low quality perception of the scene. This may be due to cognitive processes that drive the perception of video on the basis of scenes. This observation is especially important in decision making regarding the broadcast of 3D video content. A direct use of the proposed method would be as a monitoring tool to preempt uncomfortable viewing experiences, especially for live 3D video shooting. In a future study, the CGO algorithm can be further enhanced by using time-correlated information between consecutive frames to improve estimation reliability and smooth the disparity time sequence.
Disparities in a band
Along the depth of a scene, the disparity values are expected to range from the smallest negative values in the foreground to the largest positive values in the background (left image is taken as the reference). Accordingly, choosing the smallest signed disparities across the band means choosing the disparities that belong mostly to foreground objects. Since large disparities are typically associated with foreground objects, this processing step is consistent with our goals and is helpful in resolving some of the ambiguities. The s-wide band around each edge pixel is processed according to the following rule (Figure 8):
If the number of unreliable estimates <s/2, then choose the minimum disparity.
If the number of unreliable estimates >s/2 and the center pixel is reliable, then choose the minimum disparity.
If the number of unreliable estimates >s/2 and the center pixel is unreliable, then leave as unreliable.
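The band rule above can be sketched for a single edge pixel as follows. The function name, the list representation of the band, and the use of `None` for unreliable estimates are assumptions of this example. The rule as stated does not cover the case where exactly s/2 estimates are unreliable; the sketch treats that case like the ">s/2" branches, which is also an assumption.

```python
def resolve_band(band, center_index):
    """Apply the band rule to one edge pixel.

    band: disparities across the s-wide band, with None marking unreliable estimates.
    center_index: position of the edge pixel within the band.
    """
    s = len(band)
    reliable = [d for d in band if d is not None]
    n_unreliable = s - len(reliable)
    if n_unreliable < s / 2:
        return min(reliable)   # smallest signed disparity, i.e. nearest to the viewer
    if band[center_index] is not None:
        return min(reliable)   # many unreliable neighbours, but the centre is trusted
    return None                # leave as unreliable
```

For instance, `resolve_band([3, None, -5, 2, None], 2)` returns −5, while `resolve_band([None, 1, None, None, 7], 2)` returns `None` because most of the band and the center pixel are unreliable.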
This research was supported by research grants from Digitürk. The authors would like to thank the software development team and the 3D shooting crew at Digitürk for their support in data collection and evaluation, and Bernhard Moser for the fruitful discussions and for the MATLAB code for Hermann Weyl’s discrepancy calculations.
- Boev A, Hollosi D, Gotchev A: Classification of stereoscopic artefacts. Tech. Rep. 216503, MOBILE3DTV Project 2008. http://sp.cs.tut.fi/mobile3dtv/results/tech/D5.1_Mobile3DTV_v1.0.pdf (accessed 13 Sep 2013)
- Chen W, Fournier J, Barkowsky M, Le Callet P, et al.: New requirements of subjective video quality assessment methodologies for 3DTV. Scottsdale: Video Process. Qual. Metrics (VPQM); 2010.
- Lambooij M, Fortuin M, Heynderickx I, IJsselsteijn W: Visual discomfort and visual fatigue of stereoscopic displays: a review. J. Imaging Sci. Technol 2009, 53(3):1-14.
- Jolly S, Zubrzycki J, Grau O, Vinayagamoorthy V, Koch R, Bartczak B, Fournier J, Gicquel J, Tanger R, Barenbrug B, Murdoch M, Kluger J: 3D Content requirements & initial acquisition work. Public Document 215075, 3D4YOU Project. 2009. http://www.hitech-projects.com/euprojects/3d4you/www.3d4you.eu/images/PDFs/3D4YOU_WP1_D1.1.2_v1.0.pdf (accessed 13 Sep 2013)
- Shibata T, Kim J, Hoffman DM, Banks MS: The zone of comfort: predicting visual discomfort with stereo displays. J. Vis 2011, 11(8).
- Ukai K, Howarth PA: Visual fatigue caused by viewing stereoscopic motion images: background, theories, and observations. Displays 2008, 29(2):106-116. 10.1016/j.displa.2007.09.004
- Lambooij M, Murdoch M, IJsselsteijn W, Heynderickx I: The impact of video characteristics and subtitles on visual comfort of 3D TV. Displays 2013, 34: 8-16. 10.1016/j.displa.2012.09.002
- Lambooij M, IJsselsteijn W, Heynderickx I: Visual discomfort of 3D TV: assessment methods and modeling. Displays 2011, 32(4):209-218. 10.1016/j.displa.2011.05.012
- Benoit A, Le Callet P, Campisi P, Cousseau R: Using disparity for quality assessment of stereoscopic images. In 15th IEEE International Conference on Image Processing. San Diego; 12–15 Oct 2008:389-392.
- Kanade T, Okutomi M: A stereo matching algorithm with an adaptive window: theory and experiment. Pattern Anal. Mach. Intell. IEEE Trans 1994, 16(9):920-932. 10.1109/34.310690
- Yoon KJ, Kweon IS: Locally adaptive support-weight approach for visual correspondence search. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Colorado Springs; 20–25 Jun 2005:924-931.
- Boykov Y, Veksler O, Zabih R: Fast approximate energy minimization via graph cuts. Pattern Anal. Mach. Intell. IEEE Trans 2001, 23(11):1222-1239. 10.1109/34.969114
- Felzenszwalb P, Huttenlocher D: Efficient belief propagation for early vision. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, D.C.; 27 Jun–2 Jul 2004:261-268.
- Scharstein D, Szeliski R: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis 2002, 47: 7-42. 10.1023/A:1014573219977
- Hirschmüller H, Scharstein D: Evaluation of cost functions for stereo matching. In IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis; 17–22 Jun 2007:1-8.
- Zabih R: Non-parametric local transforms for computing visual correspondence. In Computer Vision - ECCV ’94, Lecture Notes in Computer Science. Edited by: Eklundh JO. Berlin: Springer; 1994:151-158.
- Moser B: A similarity measure for image and volumetric data based on Hermann Weyl’s discrepancy. Pattern Anal. Mach. Intell. IEEE Trans 2011, 33(11):2321-2329.
- Hirschmüller H, Innocent P, Garibaldi J: Real-time correlation-based stereo vision with reduced border errors. Int. J. Comput. Vis 2002, 47: 229-246. 10.1023/A:1014554110407
- Liu C, Yuen J, Torralba A: SIFT flow: dense correspondence across scenes and its applications. Pattern Anal. Mach. Intell. IEEE Trans 2011, 33(5):978-994.
- Fitch A, Kadyrov A, Christmas WJ, Kittler J: Orientation correlation. In British Machine Vision Conference. Cardiff; 2–5 Sept 2002:133-142.
- Scharstein D, Szeliski R: High-accuracy stereo depth maps using structured light. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Wisconsin; 18–20 Jun 2003:195-202.
- Middlebury dataset. http://vision.middlebury.edu/stereo/data/ (accessed 13 Sept 2013)
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.