Real-time stereo vision system using adaptive weight cost aggregation approach
© Ding et al; licensee Springer. 2011
Received: 31 March 2011
Accepted: 7 December 2011
Published: 7 December 2011
Many vision applications require high-accuracy dense disparity maps in real time. Because of the complexity of the matching process, most real-time stereo applications rely on local algorithms for the disparity computation. These local algorithms generally suffer from matching ambiguities, as it is difficult to find appropriate support for each pixel. Recent research shows that algorithms using an adaptive cost aggregation approach greatly improve the quality of the disparity map. Unfortunately, these improvements are obtained at the expense of high computational cost. This article presents a hardware implementation for speeding up these methods. With a hardware-friendly approximation, we demonstrate the feasibility of implementing this computationally expensive task in hardware to achieve real-time performance. The entire stereo vision system, including rectification, stereo matching, and disparity refinement, is realized on a single field programmable gate array. The highly parallelized pipeline structure enables the system to achieve 51 frames per second for 640 × 480 stereo images. Finally, the accuracy improvement is demonstrated on the Middlebury dataset, as well as in tests on real scenes.
Keywords: stereo vision; real-time; adaptive support weight; field programmable gate array (FPGA)
1 Introduction
Stereo vision has traditionally been, and continues to be, one of the most extensively investigated topics in computer vision. Since stereo can provide depth information, it has potential uses in many visual domains such as autonomous navigation, 3D reconstruction, object recognition, and surveillance systems. In particular, it is probably most widely used in robot navigation, where accurate 3D information is crucial for reliability. Compared with other range sensors such as laser scanners or time-of-flight cameras, stereo vision is a technology that can deliver a sufficient description of the surrounding environment. Moreover, it is a purely passive technology and thus offers low-cost and highly reliable solutions for many applications.
Stereo matching algorithms are computationally intensive, since they must find reliable matches and extract a dense map. As a result, sparse feature matching methods for stereo correspondence were widely used at first because of their efficiency. A wide variety of approaches have been proposed over the last decade and have greatly improved the accuracy of stereo matching results. However, real-time dense stereo is still difficult to achieve with general purpose processors. To meet the real-time requirements of most applications, specific algorithms are often implemented on dedicated hardware, such as digital signal processors (DSPs), graphics processing units (GPUs), application specific integrated circuits (ASICs), and field programmable gate arrays (FPGAs). In the last few years, GPUs have become increasingly popular. Using GPUs for stereo acceleration is a direct solution for PC-oriented applications. However, their high power consumption limits their applicability. FPGAs have already shown their high performance capacity for image processing tasks, especially in embedded systems. An FPGA consists of an array of programmable logic blocks and represents an efficient solution for parallel processing. Compared with ASICs, FPGAs are re-programmable and have a relatively short design cycle, which gives them great flexibility in adapting the algorithm.
A lot of work has been carried out on hardware implementations of stereo algorithms. However, these implementations differ considerably in their basic design principles. Integrating a specific algorithm into an embedded system is a delicate task, as it must cope with limited resources and different scales. At present, few high-performance implementations of stereo vision algorithms exist. The key challenge in realizing a reliable embedded real-time stereo vision system is balancing execution time against the quality of the matching results.
To address these issues, this article presents a real-time stereo vision system designed to produce depth information both quickly and accurately. All modules are implemented entirely inside a single state-of-the-art FPGA chip. Our study is motivated by the following observation: by carefully selecting the support region, the adaptive support weight (AW) approach leads to a dramatic improvement in matching quality. Unfortunately, this advantage does not come for free. Since many weight factors have to be computed for each pixel, it is obtained at the expense of high computational requirements. This is critical, since it cancels out the biggest advantage of local methods over more complicated ones, namely fast computation. Taking hardware features into account, we propose a stereo system on FPGA with the adaptive support weight approach. Currently, the only solution incorporating the AW algorithm is the one recently proposed by Chang et al. Their architecture was only evaluated with a standard ASIC cell library. For a complete system implementation, improvements are still needed with regard to resource consumption, frame rate, and the integration of pre- and post-processing. Our contribution goes one step further: the entire stereo vision process, including rectification, stereo matching, and post-processing, is designed on a single FPGA. The AW algorithm involves multiplication and division operations, which are difficult to implement on an FPGA. Consequently, the original algorithm is modified in an essential way to eliminate the computational bottleneck and make it hardware friendly. This design concept is realized as a highly parallelized pipeline structure with good resource utilization. The implemented system can generate disparity images of 640 × 480 resolution at a frame rate of 51 fps.
There are two main contributions in this article. First, we design a complete stereo vision system that combines rectification, stereo matching, and post-processing. We keep a highly data-parallel structure in the algorithm design, so that the architecture can efficiently exploit the capabilities of an FPGA through pipelining and process parallelism. Second, the AW approach is modified by introducing a hardware-friendly approximation. The overall design concept aims at balancing functional performance, computational burden, and usage of device resources. To the best of the authors' knowledge, this is the first solution to implement the AW algorithm in a complete FPGA-based system.
The remainder of this article is organized as follows. Section 2 introduces the background of stereo vision and related work on real-time implementations. Section 3 describes the adaptive support weight cost aggregation method with the corresponding hardware-friendly approximation. Section 4 presents our system design together with a detailed description of the implementation. Section 5 evaluates the experimental results. Finally, Section 6 presents the conclusions.
2 Background and related works
A stereo matching algorithm tries to solve the correspondence problem for projected scene points and produces a disparity map. Scharstein and Szeliski have provided an excellent survey of stereo algorithms, together with a taxonomy based on matching cost, aggregation, and optimization. In general, two main categories can be distinguished: local algorithms and global algorithms. Local algorithms estimate the disparity value at a given point based only on the intensity values within a support region around that point. Global algorithms, such as dynamic programming [4, 5], graph cuts, and belief propagation, make explicit smoothness assumptions about the disparity map to improve the estimated results. Typically, these algorithms solve for the disparity map by minimizing a pre-defined global cost function. The smoothness constraint improves the stereo quality in textureless areas, and the pixel-based photo-consistency term reduces the blurring effect in object border regions. Recently, another kind of method called semi-global matching (SGM) was proposed. It performs an optimization along multiple paths across the entire image to approximate the global optimization.
Most top-ranked dense stereo algorithms rely on global optimization methods. However, these algorithms usually do not depend on the specific global method alone. As indicated by Li and Zucker, the smoothness priors used implicitly favor fronto-parallel planes, which poorly capture the real scene by improperly splitting slanted or curved surfaces. As a result, the most successful variants belong to the related class of "segment-based" methods [10, 11], whose optimization is usually complex and extremely computation intensive. When processing time is included in the evaluation, only a few algorithms claim to be capable of near real-time operation. Moreover, the original algorithms are constructed as iterative and sequential procedures, which makes it difficult to exploit hardware parallelism. To achieve real-time operation, recent advances exploit the parallel computational power of GPUs [12, 13]. With some necessary modifications, some global methods have also been implemented in very large scale integration circuits [14, 15], at the expense of considerably high consumption of logic resources, memory, and bandwidth.
Taking into account the balance between resource consumption and performance, we still focus on local algorithms. Local algorithms can be efficient, but they are sensitive to local ambiguities and noise. Early research mostly studied different similarity measures or combined different pre- and post-processing methods to improve the matching quality. Here we take another route: our research is concerned with improving the essence of the algorithm. More specifically, we improve the way support is aggregated during correlation. Local algorithms rely on support windows. The major challenge in local algorithms is to find a well-suited size for the typically square support window. Large support windows provide sufficient intensity variation to reduce ambiguities, but result in blurred object borders and loss of detail. Small support windows reduce this problem, but increase the influence of local ambiguities, which leads to a decrease in correct matches. To handle these areas, the variable window algorithm was proposed first. Later, Bobick et al. and Fusiello et al. proposed shiftable-window approaches, which consider multiple windows located at different positions and select the one with the smallest cost. Hirschmüller et al. proposed another extension, which picks several sub-windows from a multiple-window configuration.
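The window-size trade-off described above can be made concrete with a minimal sketch of a fixed-window local matcher. This is not the authors' method: it is a one-dimensional sum-of-absolute-differences window with a winner-takes-all decision, and the function name, the border penalty of 255, and the assumption that a left pixel at column x matches the right pixel at column x - d are all illustrative choices.

```python
def wta_disparity(left_row, right_row, x, radius, max_disp):
    """Winner-takes-all disparity at column x using a 1-D SAD window.

    radius controls the support size (window width is 2 * radius + 1).
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        cost = 0
        for k in range(-radius, radius + 1):
            xl, xr = x + k, x + k - d
            if 0 <= xl < len(left_row) and 0 <= xr < len(right_row):
                cost += abs(left_row[xl] - right_row[xr])
            else:
                cost += 255  # assumed penalty for samples outside the image
        if cost < best_cost:  # keep the first disparity with the lowest cost
            best_d, best_cost = d, cost
    return best_d
```

A larger radius sums over more pixels, which is what reduces ambiguity in weakly textured regions and also what blurs disparity across object borders.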
The methods mentioned above are still restricted to limited sizes and shapes. With this in mind, several adaptive-window approaches [20, 21] have been proposed, which can model non-rectangular support windows. Instead of finding an optimal support window of arbitrary shape and size, Yoon et al. proposed the adaptive support weight algorithm, which adjusts the support weight of each pixel in a given support window. Tombari et al. and Gong et al. have evaluated many cost aggregation methods in a Winner-Takes-All framework, considering both matching accuracy and execution speed. In their evaluation results, the adaptive support weight (AW) approach is the leading method in terms of accuracy. It is even comparable to many algorithms based on global optimization. However, the aggregation process becomes computationally expensive. As reported in , it took about one minute to produce a small depth map.
2.1 Related studies
Because of the computational complexity of stereo algorithms, a number of attempts have been made to achieve real-time performance. The works in [24, 25] tried to implement real-time stereo matching on general purpose processors; however, the limited computing power restricts the frame rates. DSPs have more computation power than general purpose processors. An early system, introduced by Kanade et al., was a hybrid system based on custom hardware with a C40 DSP array. Later, Konolige introduced the SRI stereo vision system, which performs rectification and area correlation. It is one of the best-known DSP solutions. However, DSPs are still limited by their sequential operation. Another powerful solution is the use of GPUs. The GPU is a massively parallel computing device. Using GPUs for stereo matching was first investigated by Yang and Pollefeys. They used the sum-of-squared-differences (SSD) dissimilarity measure for windows of different sizes. With their advantages in compatibility and flexibility, GPU solutions can implement complex global optimization algorithms [29–31]. However, GPUs generally consume too much power for embedded applications.
Regarding hardware, FPGAs remain the most popular choice because of their inherently parallel architecture and high computational power. This line of work began with Woodfill and Von Herzen, who used the PARTS reconfigurable computer, consisting of a 4 × 4 mesh of connected FPGAs, to implement stereo matching. Following this line of research, a number of real-time stereo systems have been presented in the literature in recent years.
For the Sum of Absolute Differences (SAD) algorithm, Miyajima and Maruyama proposed a stereo vision system that connects to personal computers using PCI cards. The system can process images with a size of 640 × 480 at a frame rate of 20 fps. The processing time is closely related to the window size: as the authors mention, performance degrades as the window size grows. Perri et al. proposed a stereo matching circuit that processes 512 × 512 images using a disparity range of 255 pixels. This study shows an advantage in terms of its large disparity range, but the 5 × 5 window size is not considered sufficient for correlation. MingXiang and Yunde proposed a stereo vision system on a programmable chip. It performs dense disparity mapping on 320 × 240 pixel images with 32 disparity levels, achieving video rate. Recently, Calderon et al. presented a solution with a two-step processing algorithm. The hardware accelerator, which works in five pipeline stages, can achieve 142 fps for CIF format images at a frequency of 174.5 MHz.
Another popular algorithm for hardware implementation is the census-based matching method. Woodfill et al. described an ASIC design called DeepSea, which enables the processing of 512 × 480 images at a disparity range of 52 pixels, a block size of 7 × 7, and a high frame rate of 200 fps. A technique for implementing a flexible block size, disparity range, and frame rate was proposed in . The impact of using different similarity measures such as SAD, rank, and census transform was also presented. Another census-based approach was introduced by Murphy et al. Here, a Xilinx Spartan-3 FPGA was used, and a frame rate of 150 fps could be achieved for 320 × 240 pixel images.
The phase-based computational model provides an alternative to correlation methods [40, 41]. Diaz et al. developed a phase-based stereo vision design which generates about 20 stereo disparities using 1280 × 960 pixel images. As the number of phase correlation units is directly related to the disparity range, the resource limitation of the FPGA limits the range of disparity. To handle this problem, Masrani and MacLean took advantage of temporal information to extend the disparity range without large resource consumption.
In recent years, Ambrosch and Kubinger proposed a stereo matching implementation that extends the Census Transform to gradient images and is intended to be offered as an IP core for embedded real-time systems. Another recent implementation was proposed by Jin et al., who designed a stereo matching system based on a Xilinx Virtex-4 FPGA, processing 640 × 480 images with a block size of 15 × 15 and a disparity range of 64 pixels. While all the implementations mentioned above exhibit good real-time behavior, these two studies additionally present a complete discussion of the accuracy of the algorithm on the Middlebury stereo datasets.
3 Adaptive support weight approach
where Δc_pq represents the color difference and Δg_pq represents the spatial distance between the pixels p and q. The functions f_c and f_g are the two weighting functions, which assign a weight value by color similarity and by geometric proximity, respectively.
where the choice of the Gaussian variance value γ tunes the strength of the resulting weights.
This mechanism reduces the influence of occluded pixels.
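Since the weight equations themselves are not reproduced here, the following is only a hedged sketch of how a support weight could be computed from the quantities named above. It assumes that both f_c and f_g are exponentials, as in the adaptive support-weight formulation of Yoon and Kweon, so that the weight is exp(-(Δc_pq/γ_c + Δg_pq/γ_g)). The function name, the use of raw RGB tuples, and the two γ values are illustrative assumptions, not values taken from this article.

```python
import math

def support_weights(window, gamma_c=7.0, gamma_g=36.0):
    """Support weights for an n x n window of colour tuples (n odd).

    The centre pixel of the window plays the role of p; every other
    pixel is a q. gamma_c and gamma_g are assumed, illustrative values.
    """
    n = len(window)
    r = n // 2
    center = window[r][r]
    weights = []
    for y in range(n):
        row = []
        for x in range(n):
            delta_c = math.dist(window[y][x], center)  # colour difference
            delta_g = math.hypot(y - r, x - r)         # spatial distance
            # Product of the two assumed exponentials f_c and f_g.
            row.append(math.exp(-(delta_c / gamma_c + delta_g / gamma_g)))
        weights.append(row)
    return weights
```

Pixels that are close to p and similar in colour receive weights near 1, while pixels across a colour edge are suppressed, which is what lets a large window behave like an object-shaped one.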
3.1 Hardware-friendly approximation
Despite the dramatic improvement in accuracy brought by the adaptive support weight approach, it comes at the cost of high computational complexity, which makes it not only time-consuming but also resource-intensive. While designing with FPGAs is faster than designing ASICs, FPGAs suffer from the problem of fixed resources. Thus, approximations must be introduced to provide a trade-off between overall performance and resource consumption.
Number of operations required by AW cost aggregation for every pixel, when using an n × n support region and a disparity range of d
3(n² × d)
2(n² - 1)
Omit weights of reference image
2(n² × d)
n² - 1
4(n × d)
+Truncation of weight value
4(n × d)
where N_vet denotes the neighborhood in the vertical direction and N_hor denotes the neighborhood in the horizontal direction.
In this section, we evaluate the modified algorithm. After these hardware-friendly approximations, the disparity maps we generate are not as good as those reported in the original article. To find a good trade-off, we mainly consider matching quality and hardware efficiency. As a reference for matching quality, four stereo image sets from the Middlebury benchmark are used: Tsukuba, Venus, Teddy, and Cones. As the evaluation criterion for matching quality, we use the average error rate, that is, the average percentage of bad pixels over all four benchmark datasets.
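The evaluation criterion can be sketched as follows. The code counts a pixel as bad when its absolute disparity error exceeds a threshold and then averages the resulting percentages over the datasets. The threshold of 1.0 and both function names are assumptions for illustration, and the sketch ignores any masking of occluded or unknown regions that a benchmark may apply.

```python
def bad_pixel_rate(disparity, ground_truth, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds threshold."""
    total = 0
    bad = 0
    for d_row, g_row in zip(disparity, ground_truth):
        for d, g in zip(d_row, g_row):
            total += 1
            if abs(d - g) > threshold:
                bad += 1
    return 100.0 * bad / total

def average_error_rate(pairs, threshold=1.0):
    """Mean bad-pixel percentage over (disparity, ground_truth) pairs."""
    rates = [bad_pixel_rate(d, g, threshold) for d, g in pairs]
    return sum(rates) / len(rates)
```

With four datasets, `pairs` would hold four (estimated map, ground truth map) tuples, one per image set.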
4 System design
In this section, we first give a brief overview of our system. Then, we discuss the implementation of each processing module in detail.
4.1 System overview
4.2 Hardware implementation
4.2.1 Rectification module
4.2.2 Stereo matching module
As mentioned above, corresponding pixels in rectified images differ only in horizontal displacement. We use shift registers to perform the correlation under different disparity candidates. The disparity search range is currently set to 0-59 pixels. Each column vector of image pixels passes through the delay network as a whole unit, one after another. Since the delay network consists of shift registers, it guarantees that the correlated data under different disparity hypotheses can be accessed in parallel. Multiple cost aggregation modules are used to calculate the final cost of every disparity candidate. This mechanism allows the disparity to be estimated in step with the column data input.
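A software model may help to picture the delay network. The sketch below is a simplification under stated assumptions: it streams single pixels instead of column vectors, uses the absolute difference as the per-pixel cost, and assumes that the left pixel at column x is compared with right pixels at columns x, x - 1, ..., x - (max_disp - 1). The function name and the use of `None` for taps that are not yet filled are illustrative.

```python
from collections import deque

def scanline_costs(left_row, right_row, max_disp=60):
    """For each pixel clock, return the costs of all disparity taps.

    The deque models a shift register: one new right-image pixel is
    shifted in per clock and the oldest one drops off the far end.
    """
    taps = deque([None] * max_disp, maxlen=max_disp)
    costs_per_clock = []
    for l_pix, r_pix in zip(left_row, right_row):
        taps.appendleft(r_pix)  # taps[d] now holds right_row[x - d]
        # In hardware all taps are read in the same clock cycle.
        costs_per_clock.append(
            [abs(l_pix - t) if t is not None else None for t in taps]
        )
    return costs_per_clock
```

Each inner list has one entry per disparity candidate, so a separate aggregation unit can consume each entry without any unit waiting for another.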
The significant difference between AW and other aggregation methods is that the sliding-window technique cannot be applied, as the adaptive weights have to be recomputed at every pixel. Therefore, cost aggregation has to be performed exhaustively, and the computational requirements depend directly on the size of the matching window. This is particularly problematic, as we found that the AW approach usually requires a larger window to produce good-quality results. Thus, we had to reduce the computational complexity to make it suitable again for real-time implementation. Figure 12 shows block diagrams of our aggregation strategy. Instead of directly aggregating the cost over the whole n × n support region, we use a two-pass approach: the first pass aggregates costs along the vertical direction, followed by a second pass that aggregates costs along the horizontal direction. This reduces the arithmetic complexity from O(n²) to O(2n). It is also worth observing that, in our weight generation strategy, the weight term is obtained using only the target image, so the correlated blocks share the same weights under the different disparity hypotheses. Consequently, we omit the weight accumulation operation as well as the normalization module. The weight of every pixel is a combination of the similarity and proximity measurements. Thus, both kinds of weight components have to be calculated in each aggregation pass.
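The two-pass idea can be sketched in a few lines. This is an illustrative model rather than the implemented datapath: it uses grayscale intensities, an assumed exponential weight with made-up γ values, and floating-point arithmetic. In line with the text, weights come from a single image and no normalization is applied. All function names are hypothetical.

```python
import math

def weight(i_q, i_p, dist, gamma_c=10.0, gamma_g=10.0):
    """Assumed weight: similarity and proximity terms combined."""
    return math.exp(-(abs(i_q - i_p) / gamma_c + dist / gamma_g))

def vertical_pass(cost_col, int_col):
    """Collapse one column, weighting against the column's centre pixel."""
    r = len(cost_col) // 2
    return sum(
        weight(int_col[y], int_col[r], abs(y - r)) * cost_col[y]
        for y in range(len(cost_col))
    )

def two_pass_aggregate(cost, intensity):
    """Aggregate an n x n cost window: vertical pass, then horizontal pass."""
    n = len(cost)
    r = n // 2
    col_sums = [
        vertical_pass([cost[y][x] for y in range(n)],
                      [intensity[y][x] for y in range(n)])
        for x in range(n)
    ]
    center_row = intensity[r]
    # The horizontal pass only needs the centre-row intensities.
    return sum(
        weight(center_row[x], center_row[r], abs(x - r)) * col_sums[x]
        for x in range(n)
    )
```

A column result depends only on its own column, so in a streaming design it can be computed once when the column arrives and reused by every horizontal window that contains it. That reuse is what brings the per-pixel work down to roughly n operations per pass.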
The weighting is simplified by using shift operations instead of multiplications. The intensity difference of the center pixel is shifted by the highest coefficient C_0. In the next stage, the outputs of the shift modules are passed to the subsequent geometric proximity weighting module. Here we exploit the symmetry of the weighting function of the geometric component. The top and bottom pixels in the column share the same geometric coefficient P_0, so those two costs are first summed and then shifted as a group. The cost of the pixel in the center of the column does not belong to any pair and is shifted directly. Once the costs are proximity weighted, the weighted values are summed by the adder tree into one row value. The intensities of the vertical center pixels are also passed on for calculating the color similarity in the horizontal aggregation.
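As a rough illustration of shift-based weighting with symmetric pairing, the sketch below models one vertical aggregation column with integer arithmetic. The mapping from intensity difference to shift amount (`similarity_shift`, with its `step` and `max_shift` parameters) and the list `prox_shifts`, indexed by distance from the column center, are hypothetical and are not the coefficients C_0 or P_0 of the actual design.

```python
def similarity_shift(i_q, i_p, step=16, max_shift=7):
    """Hypothetical mapping: a larger intensity gap gives a larger right
    shift, i.e. a smaller power-of-two weight."""
    return min(abs(i_q - i_p) // step, max_shift)

def column_aggregate(costs, intensities, prox_shifts):
    """Weight and sum one column of integer costs using shifts only.

    prox_shifts[k] is the assumed proximity shift at distance k from
    the centre of the column.
    """
    r = len(costs) // 2
    # Similarity weighting: one right shift per pixel, no multiplier.
    sim = [c >> similarity_shift(i, intensities[r])
           for c, i in zip(costs, intensities)]
    # The centre pixel has no partner and is shifted on its own.
    total = sim[r] >> prox_shifts[0]
    # Pixels at equal distance share a coefficient: add first, shift once.
    for k in range(1, r + 1):
        total += (sim[r - k] + sim[r + k]) >> prox_shifts[k]
    return total
```

Adding a symmetric pair before shifting needs one shifter per pair instead of one per pixel. The result can differ from shifting each cost separately by a rounding bit, which is the kind of small error a truncated-weight scheme accepts.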
4.2.3 Post processing module
The post-processing module contains three sub-modules: sub-pixel estimation, uniqueness check, and median filter. They can be divided into two categories: (a) the sub-pixel estimation improves the resolution of the final results, and (b) the other two sub-modules are used to detect unreliable matches and improve the accuracy of the depth measurement.
Finally, median filtering is applied to the disparity data. The median operation applies a 3 × 3 filter to the disparity image. It enhances the disparity results by cleaning up spurious isolated matches, which are almost always false. Median filtering is also implemented on the window processing architecture, and scan line buffers similar to those of the stereo matching module are used to construct the filter window. For such a small kernel, the element in the middle position can be identified through pipelined sorting and swapping operations, which consume only limited system resources.
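One well-known way to find the median of a 3 × 3 window with a fixed sequence of compare-and-swap steps is shown below as a sketch. The text does not state which sorting network the design uses, so this column-sort scheme is an assumption chosen because it maps naturally onto pipeline stages. The function names are illustrative.

```python
def sort3(a, b, c):
    """Sort three values with three compare-and-swap steps."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return a, b, c

def median3x3(window):
    """Median of a 3 x 3 window given as three rows of three values."""
    # Stage 1: sort each column into (low, mid, high).
    cols = [sort3(window[0][x], window[1][x], window[2][x]) for x in range(3)]
    lows, mids, highs = zip(*cols)
    # Stage 2: largest low, middle mid, smallest high.
    max_of_lows = max(lows)
    mid_of_mids = sort3(*mids)[1]
    min_of_highs = min(highs)
    # Stage 3: the median of those three is the median of all nine.
    return sort3(max_of_lows, mid_of_mids, min_of_highs)[1]
```

Every stage uses the same comparisons regardless of the data, so the latency is constant, and the sorted columns can be reused as the window slides one pixel to the right.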
5 Experimental results and discussion
The application of our hardware-based stereo vision system is mainly focused on robot navigation. This task requires the system to continuously provide accurate information about the environment. Thus, we evaluate the performance of the implementation in terms of both accuracy and running speed.
Device utilization summary
Number of Slice Registers
Number of Slice LUTs
Number of DSP48Es
Number of TEMACs
Module level utilization
Number of occupied Slices
Number of BRAM/FIFO
The licensed Ethernet Media Access Controller (TEMAC) core is used to implement Ethernet communications. Two DSP48E slices are used to rectify the left and right images, respectively. The results show that logic resource consumption is dominated by the stereo matching module because of its high number of aggregations, while the rectification and post-processing modules require slightly less logic. Since the correlation modules for the different disparity hypotheses are processed in parallel, increasing the disparity range proportionally increases the resources needed by the stereo matching module. Applying block reuse techniques could optimize resource usage, but at the expense of processing speed. On the other hand, increasing the image resolution has little effect on resource consumption, since our architecture is based on local pixel processing. The results also show that the vast majority of the FPGA's block RAM is consumed by the scan line buffers in the stereo matching module.
Real-time performance of reported stereo vision systems based on FPGA
fps (60 MHz)
Ambrosch et al.: 750 × 400
Miyajima et al.: 640 × 480
Mingxiang et al.: 320 × 240
Calderon et al.: 320 × 240
Perri et al.: 512 × 512
Murphy et al.: 320 × 240
Jin et al.: 640 × 480
Diaz et al.: 1280 × 960
Darabiha et al.: 360 × 256
Chang et al.: 352 × 288
640 × 480
The accuracy of disparity maps
Jin et al. 
Darabiha et al. 
Ambrosch et al. 
Banz et al. 
Veksler et al. 
Grauer-Gray et al. 
Gong et al. 
Yang et al. 
Adapt. Weights 
Finally, it must be acknowledged that our algorithm is not among the top performers in the comparative results. Global algorithms generally perform better than correlation-based algorithms. However, the iterative, sequential operations of global algorithms still make them difficult to implement on medium-scale FPGAs. SGM is another interesting algorithm, which performs an optimization along multiple paths across the entire image to approximate the global optimization. Recent advances in FPGA technology have made it possible to implement the SGM algorithm with the necessary modifications [52, 53]. Although it still consumes a large amount of memory to store the temporary costs of the different aggregation paths, it has the potential to influence trends in hardware implementation because of its highly accurate results.
6 Conclusions
In this article, we have proposed a high-performance FPGA-based stereo vision system. Our system exploits a novel cost aggregation approach called the adaptive support weight method. This approach has been shown to be remarkably effective. It is essentially similar to "segment-based" algorithms while avoiding the difficult problem of image segmentation. However, this aggregation scheme is very expensive in terms of computation. As the weighting mask varies from pixel to pixel, it cannot be computed by means of incremental calculation schemes. It also involves complex arithmetic operations such as multiplication and division. Our analysis showed the necessity of trade-offs between accuracy and efficient hardware implementation. With a hardware-friendly approximation, we demonstrate the feasibility of implementing this computationally expensive task in hardware to achieve frame-rate performance. Evaluation results have shown that our implementation is among the best performing local algorithms in terms of quality. In addition, the highly parallelized pipeline structure enables the system to handle 640 × 480 pixel images at over 51 fps. The adaptive cost aggregation units of this system can also be reused as a bilateral filter for noise reduction in other vision systems.
In the future, the proposed system will be used for higher-level vision applications such as autonomous vehicle navigation. Several improvements remain possible. The accuracy is expected to improve with a pre-processing step that rejects matches belonging to poorly textured areas. Moreover, with the rapid evolution of FPGA technology, it is possible to include a soft processor core within an FPGA device. This customization enables an integrated design for higher-level control tasks.
The authors would like to thank Xin Du for his thoughts and suggestions, and Xinhuan Wang for his technical assistance during the course of this study. The authors would also like to acknowledge the financial support from the National Natural Science Foundation of China via grant 61001171, 60534070, and 90820306, and the Natural Science Foundation of Zhejiang Province (Grant No. Y1090881).
- Yoon KJ, Kweon IS IS: Adaptive support-weight approach for correspondence search. IEEE Trans Pattern Anal Mach Intell 2006, 28(4):650-656.View ArticleGoogle Scholar
- Chang NYC, Tsai TH, Hsu BH, Chen YC, Chang TS: Algorithm and architecture of disparity estimation with mini-census adaptive support weight. IEEE Trans Circ Syst Video Technol 2010, 20(6):792-805.View ArticleGoogle Scholar
- Scharstein D, Szeliski R: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vis 2002, 47: 7-42. 10.1023/A:1014573219977View ArticleMATHGoogle Scholar
- Amini AA, Weymouth TE, Jain RC: Using dynamic programming for solving variational problems in vision. IEEE Trans Pattern Anal Mach Intell 1990, 12(9):855-867. 10.1109/34.57681View ArticleGoogle Scholar
- Veksler O: Stereo correspondence by dynamic programming on a tree. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05) 2005, 2: 384-390.Google Scholar
- Boykov Y, Veksler O, Zabih R: Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 2001, 29: 1222-1239.View ArticleGoogle Scholar
- Sun J, Zheng NN, Shum HY: Stereo matching using belief propagation. IEEE Trans Pattern Anal Mach Intell 2003, 25(7):787-800. 10.1109/TPAMI.2003.1206509View ArticleMATHGoogle Scholar
- Hirschmüller H: Accurate and efficient stereo processing by semi-global matching and mutual information. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05) 2005, 2: 807-814.Google Scholar
- Li G, Zucker S: Surface geometric constraints for stereo in belief propagation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06) 2006, 2: 2355-2362.Google Scholar
- Klaus A, Sormann M, Karner K: Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. Proceedings of the IEEE International Conference on Pattern Recognition (ICPR'06) 2006, 3: 15-18.View ArticleGoogle Scholar
- Yang Q, Wang L, Yang R, Stewenius H, Nister D: Stereo matching with color-weighted correlation, hierachical belief propagation and occlusion handling. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06) 2006, 2: 2347-2354.Google Scholar
- Yang R, Pollefeys M, Li S: Improved real-time stereo on commodity graphics hardware. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'05) 2005, 36-43.Google Scholar
- Yang R, Pollefeys M: A versatile stereo implementation on commodity graphics hardware. Real-Time Imag 2005, 11: 7-18. 10.1016/j.rti.2005.04.002View ArticleGoogle Scholar
- Pérez J, Sanchez P, Martinez M: High memory throughput FPGA architecture for high-definition Belief-Propagation stereo matching. Proceedings of the 3rd International Conference on Signals, Circuits and Systems (SCS) 2009, 1-6. IEEEGoogle Scholar
- Liang C, Cheng C, Lai Y, Chen L, Chen H: Hardware-efficient belief propagation. IEEE Trans Circ Syst Video Technol 2009, 21(5):525-537.View ArticleGoogle Scholar
- Okutomi M, Kanade T: A locally adaptive window for signal matching. Int J Comput Vis 2nd edition. 1992, 7: 143-162. 10.1007/BF00128133View ArticleGoogle Scholar
- Bobick A, Intille S: Large occlusion stereo. Int J Comput Vis 3rd edition. 1999, 33: 181-200. 10.1023/A:1008150329890View ArticleGoogle Scholar
- Fusiello A, Roberto V, Trucco E: Symmetric stereo with multiple windowing. Int J Pattern Recogn Artif Intell 8th edition. 2000, 14: 1053-1066.Google Scholar
- Hirschmüller H, Innocent P, Garibaldi J: Real-time correlation-based stereo vision with reduced border errors. Int J Comput Vis 2002, 47: 229-246. 10.1023/A:1014554110407View ArticleMATHGoogle Scholar
- Boykov Y, Veksler O, Zabih R: A variable window approach to early vision. IEEE Trans Pattern Anal Mach Intell 1998, 20(12):1283-1294. 10.1109/34.735802View ArticleGoogle Scholar
- Veksler O: Fast variable window for stereo correspondence using integral images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03) 2003, 1: 556-561.Google Scholar
- Tombari F, Mattoccia S, Di Stefano L, Addimanda E: Classification and evaluation of cost aggregation methods for stereo correspondence. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08) 2008, 1-8.Google Scholar
- Gong M, Yang R, Wang L: A performance study on different cost aggregation approaches used in real-time stereo matching. Int J Comput Vis 2007, 75(2):283-296. 10.1007/s11263-006-0032-x
- Mühlmann K, Maier D, Hesser J, Männer R: Calculating dense disparity maps from color stereo images, an efficient implementation. Int J Comput Vis 2002, 47: 79-88. 10.1023/A:1014581421794
- Zinner C, Humenberger M, Ambrosch K, Kubinger W: An optimized software-based implementation of a census-based stereo matching algorithm. Adv Visual Comput 2008, 5358: 216-227. 10.1007/978-3-540-89639-5_21
- Kanade T, Yoshida A, Oda K, Kano H, Tanaka M: A stereo machine for video-rate dense depth mapping and its new application. Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR'96) 1996, 196-202.
- Konolige K: Small vision systems: hardware and implementation. In Proceedings of the Eighth International Symposium on Robotics Research. Volume 8. MIT Press; 1997:203-212.
- Yang R, Pollefeys M: Multi-resolution real-time stereo on commodity graphics hardware. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03) 2003, 1: 211-217.
- Grauer-Gray S, Kambhamettu C: Hierarchical belief propagation to reduce search space using cuda for stereo and motion estimation. Proceedings of the Workshop on Applications of Computer Vision (WACV) 2009, 1-8. IEEE.
- Gong M, Yang YH: Near real-time reliable stereo matching using programmable graphics hardware. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05) 2005, 1: 924-931.
- Yang Q, Wang L, Yang R, Wang S, Liao M, Nister D: Real-time global stereo matching using hierarchical belief propagation. Proceedings of the British Machine Vision Conference (BMVC'06) 2006, 989-998.
- Woodfill J, Von Herzen N: Real-time stereo vision on the PARTS reconfigurable computer. Proceedings of the Symposium on FPGAs for Custom Computing Machines 1997, 201-210. IEEE.
- Miyajima Y, Maruyama T: A real-time stereo vision system with FPGA. In Field Programmable Logic And Application. Springer; 2003:448-457.
- Perri S, Colonna D, Zicari P, Corsonello P: SAD-based stereo matching circuit for FPGAs. Proceedings of the International Conference on Electronics, Circuits and Systems (ICECS'06) 2006, 846-849. IEEE.
- Mingxiang L, Yunde J: Stereo vision system on programmable chip (SVSoC) for small robot navigation. Proceedings of International Conference on Intelligent Robots and Systems 2006, 1359-1365. IEEE.
- Calderon H, Ortiz J, Fontaine J: High parallel disparity map computing on FPGA. Proceedings of the International Conference on Mechatronics and Embedded Systems and Applications (MESA) 2010, 307-312. IEEE.
- Woodfill JI, Gordon G, Buck R: Tyzx DeepSea high speed stereo vision system. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) 2004, 3: 41-48.
- Ambrosch K, Kubinger W, Humenberger M, Steininger A: Flexible hardware-based stereo matching. EURASIP J Embed Syst 2008, 2008: 1-18.
- Murphy C, Lindquist D, Rynning A, Cecil T, Leavitt S, Chang M: Low-cost stereo vision on an FPGA. Proceedings of the Symposium on Field-Programmable Custom Computing Machines (FCCM) 2007, 333-334. IEEE.
- Darabiha A, Rose J, MacLean WJ: Video-rate stereo depth measurement on programmable hardware. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03) 2003, 1: 203-210.
- Darabiha A, MacLean W, Rose J: Reconfigurable hardware implementation of a phase-correlation stereo algorithm. Mach Vis Appl 2006, 17(2):116-132. 10.1007/s00138-006-0018-2
- Diaz J, Ros E, Carrillo R, Prieto A: Real-time system for high-image resolution disparity estimation. IEEE Trans Image Proces 2007, 16: 280-285.
- Masrani D, MacLean W: A real-time large disparity range stereo-system using FPGAs. In Proceedings of the Asian Conference on Computer Vision (ACCV'06). Springer; 2006:42-51.
- Ambrosch K, Kubinger W: Accurate hardware-based stereo vision. Comput Vis Image Understand 2010, 114(11):1303-1316. 10.1016/j.cviu.2010.07.008
- Jin S, Cho J, Dai Pham X, Lee KM, Park SK, Kim M, Jeon JW: FPGA design and implementation of a real-time stereo vision system. IEEE Trans Circ Syst Video Technol 2010, 20: 15-26.
- Tomasi C, Manduchi R: Bilateral filtering for gray and color images. Proceedings of International Conference on Computer Vision (ICCV'98) 1998, 839-846. IEEE.
- Pham T, Van Vliet L: Separable bilateral filtering for fast video preprocessing. Proceedings of International Conference on Multimedia and Expo (ICME'05) 2005, 4-8. IEEE.
- Fusiello A, Trucco E, Verri A: A compact algorithm for rectification of stereo pairs. Mach Vis Appl 2000, 12: 16-22. 10.1007/s001380050120
- Shimizu M, Okutomi M: Precise sub-pixel estimation on area-based matching. Proceedings of International Conference on Computer Vision (ICCV'01) 2001, 1: 90-97. IEEE.
- Di Stefano L, Marchionni M, Mattoccia S: A fast area-based stereo matching algorithm. Image Vis Comput 2004, 22(12):983-1005. 10.1016/j.imavis.2004.03.009
- Bates G, Nooshabadi S: FPGA implementation of a median filter. Proceedings of the IEEE Conference on Speech and Image Technologies for Computing and Telecommunications (TEN-CON'97) 1997, 2: 437-440.
- Banz C, Hesselbarth S, Flatt H, Blume H, Pirsch P: Real-time stereo vision system using semi-global matching disparity estimation: architecture and FPGA-implementation. Proceedings of the International Conference on Embedded Computer Systems (SAMOS) 2010, 93-101. IEEE.
- Gehrig S, Eberli F, Meyer T: A real-time low-power stereo vision engine using semi-global matching. In Proceedings of the International Conference on Computer Vision Systems (ICVS). Springer; 2009:134-143.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.