Real-time single-pass connected components analysis algorithm
- Fei Zhao1,
- Huan zhang Lu1 and
- Zhi yong Zhang1
https://doi.org/10.1186/1687-5281-2013-21
© Zhao et al.; licensee Springer. 2013
Received: 20 March 2012
Accepted: 27 March 2013
Published: 22 April 2013
Abstract
Because real-time automatic target recognition (RTATR) systems demand real-time processing, fast connected components analysis (CCA) is important for improving RTATR performance. Conventional single-pass CCA algorithms need horizontal blanking periods to resolve label equivalences, which makes them difficult to apply when the streamed data is transmitted without horizontal blanking periods. In this paper, a real-time single-pass CCA algorithm is proposed. Unlike the conventional algorithms, we adopt the pixel as the scan unit and the run as the labeling unit, and we manage the correspondence of labels between adjacent rows with a multi-layer-index structure. Equivalences are resolved while the image is being scanned, without extra processing time. The proposed algorithm is suitable for hardware acceleration, and streamed image data can be processed during image transmission without horizontal blanking periods. Experimental results indicate that the hardware-accelerated algorithm achieves real-time CCA in an RTATR system.
1 Introduction
- (1) Labeling-first algorithms [1–5]. These algorithms first label the binary image and then extract the features from the labeled image. Connected components labeling is a general-purpose method, and many researchers have concentrated on making it fast. Existing fast connected components labeling (CCL) algorithms can be divided into two classes: (a) label-equivalence-based algorithms [1–4] and (b) region-growing-based algorithms [5]. Label-equivalence-based algorithms process an image in raster-scan order (top to bottom, left to right) at least twice. In the first scan, provisional labels are assigned, and the key step is then to resolve label equivalences by finding a unique representative label for each group of equivalent labels. In the second scan, each pixel's provisional label is replaced by its representative label. According to the scan unit, label-equivalence-based algorithms can be divided into run-based algorithms [1], pixel-based algorithms [2, 4], and block-based algorithms [3]. In region-growing-based algorithms, label equivalences are neither recorded nor resolved, and connected components of any shape can be labeled in a single scan. However, these algorithms access the image in an irregular order, which means that the whole image must be available before labeling.
- (2) Single-pass CCA algorithms [10–12]. Bailey et al. have proposed another, more goal-directed CCA method: the single-pass CCA algorithm, which accumulates feature data (such as the area, center of gravity, and perimeter) while the pixels are being scanned and labeled, and does not generate a labeled image. Essentially, single-pass CCA algorithms are also label-equivalence-based: they resolve the equivalences of each row and merge the accumulated data during the horizontal blanking periods. These algorithms eliminate the need to produce a labeled image and avoid the second relabeling pass, and are hence suitable for processing streamed image data on an FPGA. Furthermore, an optimized single-pass CCA algorithm [12] has been proposed, which applies a label recycling scheme between adjacent rows to save memory resources. Such methods are of great benefit to RTATR systems: streamed images can be processed during transmission from the detector to the processing unit, which saves processing time and shortens the delays of closed-loop control. However, these methods require horizontal blanking periods to resolve equivalences, which becomes a bottleneck when the streamed data is transmitted without horizontal blanking periods.
In this study, we present a new single-pass CCA algorithm, which eliminates the requirement for horizontal blanking periods in the aforementioned methods, together with an FPGA implementation of the proposed algorithm that achieves real-time processing of streamed images. The remainder of this paper is organized as follows. Section 2 presents the principle of the algorithm. In Section 3, the hardware implementation of the proposed method is introduced. Section 4 gives the experimental results of the algorithm on both PC and RTATR platforms, together with a comparison with an existing single-pass CCA algorithm. Section 5 concludes the paper.
2 Proposed algorithm
Stair-like connected component.
Multi-layer-index structure.
As shown in Figure 2, Previous_P_Label and New_P_Label are the provisional labels assigned in the previous row and the current row, respectively. Previous_F_Label and New_F_Label are label-index tables, which give the representative labels corresponding to Previous_P_Label and New_P_Label. For instance, if Previous_F_Label(p) = n, the representative label of p is n, where p and n are both labels assigned in the previous row; if Previous_F_Label(p) = 0, the representative label of p is p itself. New_F_Label has the same meaning for the current row. MAP is the translation table, which records the correspondence between the representative labels in the previous row and the provisional labels in the current row. If MAP(i) = k, the representative label i in the previous row has been translated to label k in the current row; if MAP(i) = 0, the representative label i in the previous row does not have a translated label in the current row. The feature data tables that store the accumulated features are denoted Data_Previous and Data_New, and the four layer indexes (layer 1_index, layer 2_index, layer 3_index, and layer 4_index in Figure 2) form a pipeline when a provisional label in the previous row is indexed.
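The lookup chain described above can be modeled in software. The Python sketch below is a hypothetical model, not the authors' hardware: the class name MultiLayerIndex, the use of Python lists for the tables, and the single lookup method are assumptions; only the table names and the meaning of a stored 0 come from the text. How the four pipeline layers are assigned to these lookups is not modeled.

```python
class MultiLayerIndex:
    """Hypothetical software model of the tables in the multi-layer-index
    structure. Labels run from 1 to max_labels; entry 0 is unused, and a
    stored value of 0 means 'self' in a label-index table or 'not
    translated' in MAP, as in the text."""

    def __init__(self, max_labels):
        n = max_labels + 1
        self.previous_f_label = [0] * n   # Previous_F_Label
        self.new_f_label = [0] * n        # New_F_Label
        self.map = [0] * n                # MAP (translation table)
        self.data_previous = [None] * n   # Data_Previous
        self.data_new = [None] * n        # Data_New

    def representative(self, p):
        """Previous_F_Label(p) = n means p's representative is n; 0 means p."""
        n = self.previous_f_label[p]
        return p if n == 0 else n

    def lookup(self, p):
        """Follow a previous-row provisional label p through the tables:
        representative label -> translated label (MAP) -> feature data."""
        rep = self.representative(p)
        return rep, self.map[rep], self.data_previous[rep]


# Usage with made-up values: label 3 of the previous row is equivalent to 1,
# and component 1 has been translated to label 2 in the current row.
idx = MultiLayerIndex(max_labels=8)
idx.previous_f_label[3] = 1
idx.map[1] = 2
idx.data_previous[1] = (10, 55)   # e.g. (area, sum of x), illustrative only
print(idx.lookup(3))              # (1, 2, (10, 55))
```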
In our method, the key point is to keep the multi-layer index consistent between adjacent rows; that is, the correspondences (between labels in adjacent rows, between feature data in adjacent rows, and between feature data and labels in the same row) must remain correct throughout the scan. Because the run is the labeling unit, only the labels in the previous row need to be considered during scanning. Thus, the assignment of provisional labels in the current row is simplified, and the multi-layer-index structure needs to be updated only in some special cases (overlapping between runs in adjacent rows, label equivalence, and the tail of a run). Furthermore, the updating can be distributed over the layers of the multi-layer index (from layer 1_index to layer 4_index), and the operations in each layer can be carried out independently while the pixel is being scanned. In this way, all special cases are resolved during the scan, eliminating the need for horizontal blanking periods. While the pixels are scanned, the feature data are accumulated and labels are assigned to the runs; the multi-layer index is updated in the special cases to keep it consistent; and the features of completed connected components are passed to the subsequent processing unit. The proposed algorithm is presented in two parts: (1) special case detection and feature accumulation and (2) multi-layer-index update.
2.1 Special case detection and feature accumulation
In the pixel-based scan process, the run is the labeling unit, which means that the beginning of a run must be detected first and the features are accumulated until the tail of the run. When the scanned pixel lies inside a run, overlapping between runs in adjacent rows and label equivalence in the previous row are detected, and details of these special cases (the overlapping modes and equivalence modes) are provided for the updating. When the scanned pixel is the tail of a run, the run that has just been scanned is labeled with a provisional label, and details of the run (such as its run end mode) are also sent to the updating module.
- (1) If P(y,x − 1) = 0 and P(y,x) ≠ 0, then P(y,x) is the beginning of a run.
- (2) If P(y,x − 1) ≠ 0 and P(y,x) ≠ 0, then P(y,x) lies inside a run.
- (3) If P(y,x − 1) ≠ 0 and P(y,x) = 0, then P(y,x) is the first pixel behind the tail of a run.
- (4) If P(y,x − 1) = 0 and P(y,x) = 0, then P(y,x) does not lie inside any run.
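The four rules above compare only P(y, x − 1) with P(y, x), so they can be evaluated with a single pixel of delay. The Python sketch below is illustrative only: the names Position, classify, and row_runs are hypothetical, the pixel left of the image border is assumed to be 0, and one padding zero stands in for the end of the row so that a run touching the right border is also closed by rule (3).

```python
from enum import Enum


class Position(Enum):
    RUN_BEGIN = 1    # rule (1): P(y,x-1) = 0, P(y,x) != 0
    INSIDE_RUN = 2   # rule (2): P(y,x-1) != 0, P(y,x) != 0
    AFTER_TAIL = 3   # rule (3): P(y,x-1) != 0, P(y,x) = 0
    BACKGROUND = 4   # rule (4): P(y,x-1) = 0, P(y,x) = 0


def classify(left, pixel):
    """Apply rules (1)-(4) to left = P(y, x-1) and pixel = P(y, x)."""
    if left == 0:
        return Position.RUN_BEGIN if pixel != 0 else Position.BACKGROUND
    return Position.INSIDE_RUN if pixel != 0 else Position.AFTER_TAIL


def row_runs(row):
    """Return the runs of one binary row as (start, end) pairs, end inclusive.

    The pixel left of the row is taken as 0, and one zero is appended so
    that a run touching the right border also reaches rule (3)."""
    runs, start, left = [], None, 0
    for x, pixel in enumerate(list(row) + [0]):
        pos = classify(left, pixel)
        if pos is Position.RUN_BEGIN:
            start = x
        elif pos is Position.AFTER_TAIL:
            runs.append((start, x - 1))   # tail is the pixel before x
        left = pixel
    return runs


print(row_runs([0, 1, 1, 0, 1]))   # [(1, 2), (4, 4)]
```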
Special case detection and label assignment.
As shown in Figure 3, while the image is scanned, the label in the previous row is stored in last_label and the equivalence decision is performed. When P(y,x) is the beginning of a run or lies inside a run, connectivity is detected by checking info(x).g. If connectivity is detected (overlapped = 1), the translated label is stored in line_new_cnt: this is MAP(pfl1), the label corresponding to pfl1 in the current row, when MAP(pfl1) ≠ 0, or new_cnt when MAP(pfl1) = 0. line_new_cnt is used to assign the provisional label to the run when the first pixel behind the tail of the run is encountered, and it is also used to update the multi-layer-index structure. The overlapping mode (overlapping_mode) and the run end mode (run_end_mode) are also recorded for the updating of the multi-layer-index structure. During scanning, info(x).g and info(x).run_start are updated at each pixel, and info(x).label is assigned at the tail of the run.
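The choice of the translated label when overlapped = 1 reduces to one MAP read. The Python sketch below is a hypothetical model of that decision only: the function name choose_line_new_cnt is illustrative, new_cnt is assumed to be a simple counter of unused current-row labels, and the equivalence case (one run touching two previous-row components) is not modeled. Writing MAP is left to the caller, signalled by overlapping_mode.

```python
def choose_line_new_cnt(map_table, pfl1, new_cnt):
    """Hypothetical model of the choice made when overlapped = 1.

    map_table: MAP, indexed by representative labels of the previous row.
    pfl1:      representative label of the overlapped previous-row run.
    new_cnt:   next unused provisional label of the current row (assumed
               to be a simple counter).
    Returns (line_new_cnt, overlapping_mode, next value of new_cnt).
    """
    if map_table[pfl1] != 0:
        # Component already continued in this row: reuse MAP(pfl1).
        return map_table[pfl1], 0, new_cnt
    # First contact in this row: take new_cnt and advance the counter.
    return new_cnt, 1, new_cnt + 1


map_table = [0] * 5
print(choose_line_new_cnt(map_table, pfl1=2, new_cnt=1))  # (1, 1, 2)
map_table[2] = 1
print(choose_line_new_cnt(map_table, pfl1=2, new_cnt=2))  # (1, 0, 2)
```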
Equivalence decision.
To analyze region features such as the area, center of gravity, bounding box, or perimeter, some features must be collected as each pixel is scanned. Once the position of P(y,x) has been judged, the data can be accumulated from the beginning to the tail of the run, and the accumulated data are then used to update the data table.
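Accumulating a center of gravity per pixel amounts to keeping running sums, so that a single division is needed only when a component is output. The Python sketch below assumes a feature record of area, coordinate sums, and bounding box; the class Features and its fields are hypothetical, perimeter is omitted, and merge stands in for combining records when a run joins a component.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical per-run / per-component feature record. Running sums
    are kept so that the center of gravity needs only one division at the
    end; perimeter is not modeled."""
    area: int = 0
    sum_x: int = 0
    sum_y: int = 0
    x_min: float = float("inf")
    x_max: float = float("-inf")
    y_min: float = float("inf")
    y_max: float = float("-inf")

    def add_pixel(self, y, x):
        """Accumulate one foreground pixel P(y, x)."""
        self.area += 1
        self.sum_x += x
        self.sum_y += y
        self.x_min, self.x_max = min(self.x_min, x), max(self.x_max, x)
        self.y_min, self.y_max = min(self.y_min, y), max(self.y_max, y)

    def merge(self, other):
        """Combine two records, e.g. a finished run and its component."""
        return Features(
            self.area + other.area, self.sum_x + other.sum_x,
            self.sum_y + other.sum_y,
            min(self.x_min, other.x_min), max(self.x_max, other.x_max),
            min(self.y_min, other.y_min), max(self.y_max, other.y_max),
        )

    def centroid(self):
        """Center of gravity as (y, x)."""
        return self.sum_y / self.area, self.sum_x / self.area


run_a, run_b = Features(), Features()
for x in range(2, 5):
    run_a.add_pixel(0, x)   # run in row 0, columns 2..4
for x in range(3, 6):
    run_b.add_pixel(1, x)   # overlapping run in row 1, columns 3..5
component = run_a.merge(run_b)
print(component.area, component.centroid())   # 6 (0.5, 3.5)
```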
2.2 Multi-layer-index update
Depending on the mode of each special case (overlapping between runs in adjacent rows, equivalence, and run end), the multi-layer-index structure must be updated during scanning so that the index results in each layer remain correct.
2.2.1 Overlapping between runs in adjacent rows
To save resources for storing labels, labels are reused between adjacent rows. When the scanned pixel P(y,x) connects with run L_p in the previous row, the connected component containing L_p is not yet complete, so the label assigned to L_p in the previous row must be translated to a new label, and the translation table MAP must be updated.
When overlapped is set to 1 for the first time and MAP(pfl1) = 0 (as shown in Figure 3), which means that L_p has not yet connected with any pixel in the current row (overlapping_mode = 1), a new label in the current row is assigned and the translation table is updated as MAP(pfl1) = line_new_cnt, where line_new_cnt is the newly assigned label recorded in Figure 3. The feature data are translated as Data_New(line_new_cnt) = Data_Previous(pfl1). If overlapped = 1 has already been found (i.e., MAP(last_label) ≠ 0, corresponding to overlapping_mode = 0), no update is required.
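The overlap update therefore writes two tables at most, and only in overlapping_mode = 1. The Python sketch below is a hypothetical model of that step (the function name overlap_update and the in-place list tables are assumptions); later accumulation of the current run into Data_New is not shown.

```python
def overlap_update(map_table, data_previous, data_new,
                   pfl1, line_new_cnt, overlapping_mode):
    """Hypothetical model of the update for overlapping runs; it modifies
    map_table and data_new in place."""
    if overlapping_mode == 1:
        # First overlap: record the translation pfl1 -> line_new_cnt ...
        map_table[pfl1] = line_new_cnt
        # ... and carry the component's accumulated features into the
        # current-row table: Data_New(line_new_cnt) = Data_Previous(pfl1).
        data_new[line_new_cnt] = data_previous[pfl1]
    # overlapping_mode == 0: already translated, so nothing is written.


map_table = [0] * 4
data_previous = [None, None, (5, 20), None]   # made-up (area, sum of x)
data_new = [None] * 4
overlap_update(map_table, data_previous, data_new, 2, 1, 1)
print(map_table, data_new)   # [0, 0, 1, 0] [None, (5, 20), None, None]
```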
2.2.2 Equivalence
Update when Equ_sta = 1.
Update when Equ_sta = 2.
2.2.3 Run end
Update in run end.
After the above updates, the multi-layer index and the data tables are kept up to date. Each update is carried out as soon as the pixel is scanned, so no extra periods are required.
When the last pixel in the current row has been scanned, Previous_F_Label is replaced with New_F_Label and Data_Previous is replaced with Data_New, for the updating in the next row. By analyzing the translation table MAP and the previous data table Data_Previous, completed connected components can be identified, so their feature data can be passed to the next processing unit immediately, without waiting for the end of the image scan. At the end of the image, a dummy row is needed to analyze the last row of the image and to initialize the memories.
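The text does not spell out the completeness test, so the Python sketch below assumes one plausible rule: a representative label of the previous row whose MAP entry is still 0 after the current row was not continued by any run, so its component is complete. The function name completed_components and the list-based tables are hypothetical. The same rule shows why the dummy row flushes everything: an all-zero row leaves MAP empty.

```python
def completed_components(previous_f_label, map_table, data_previous):
    """Hypothetical end-of-row check. Assumed rule: a representative label
    of the previous row (Previous_F_Label = 0) that still has MAP = 0 after
    the current row was not continued by any run, so its component is
    complete and its accumulated data can be emitted."""
    done = []
    for label in range(1, len(data_previous)):
        is_representative = previous_f_label[label] == 0
        continued = map_table[label] != 0
        if is_representative and not continued and data_previous[label] is not None:
            done.append((label, data_previous[label]))
    return done


f_label = [0, 0, 1, 0]                  # label 2 is equivalent to label 1
map_table = [0, 3, 0, 0]                # component 1 continues as label 3
data = [None, (4, 10), None, (2, 6)]    # made-up (area, sum of x)
print(completed_components(f_label, map_table, data))   # [(3, (2, 6))]
# A dummy all-zero row leaves MAP empty, so everything left is flushed:
print(completed_components(f_label, [0] * 4, data))     # [(1, (4, 10)), (3, (2, 6))]
```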
3 Hardware acceleration design
- 1. The row buffer block stores the pixel information of the previous row. It is implemented with dual-port block RAM (BRAM). Because updating the information of the scanned pixel would conflict with the provisional label assignment at the tail of a run, two BRAMs are used alternately to store the pixel information of each row.
- 2. The special case detection and data accumulation block provides the flags for multi-layer-index updating and, at the same time, accumulates the feature data.
- 3. The multi-layer-index update block updates the translation table, the label-index tables, and the data tables, and maintains the correctness of the index results in each layer. In this block, the label-index tables (Previous_F_Label and New_F_Label) are implemented as register arrays, and the translation table MAP and the data tables (Data_Previous and Data_New) are implemented with dual-port block RAMs.
- 4. The MAP and data analysis block analyzes the translation table MAP and the data table Data_Previous during scanning, and outputs the feature data of connected components once they are found to be complete.
Architecture of hardware acceleration.
In the implementation, three data tables, two translation tables, and two label-index tables are used in turn to achieve real-time updating and analysis. Let d1, d2, and d3 be three identical data tables, m1 and m2 two identical translation tables, and i1 and i2 two identical label-index tables; before the image is scanned, all of them are initialized to 0. In the first row of the image, d1 and d2 are used as Data_Previous and Data_New, m1 is used as MAP, i1 is used as Previous_F_Label, and i2 is used as New_F_Label. At the end of the first row, d2 becomes Data_Previous, d3 becomes Data_New, and m2 becomes MAP, while m1 and d1 are switched out of the update path and used to extract the feature data of regions that are found to be complete. Meanwhile, i1 is initialized to 0 and swapped with i2, so that i2 becomes Previous_F_Label and i1 becomes New_F_Label in the second row. At the end of each row, the tables rotate in this way, so the analysis of completed connected components and the updating of the multi-layer index can proceed in parallel.
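The rotation can be summarized as a table of roles per row. The Python sketch below is a hypothetical model: the first two rows follow the text exactly, while continuing the same rotation (modulo 3 for the data tables, modulo 2 for MAP and the label-index tables) and assuming that the analysis clears each table before it is reused are extrapolations. The name table_roles and the role keys are illustrative.

```python
def table_roles(num_rows):
    """Hypothetical model of the ping-pong use of d1-d3, m1-m2 and i1-i2.
    Rows 0 and 1 follow the text; continuing the same rotation (modulo 3
    for data tables, modulo 2 for the others) is an assumption, as is the
    analysis clearing each table before it is reused."""
    data, maps, index = ["d1", "d2", "d3"], ["m1", "m2"], ["i1", "i2"]
    roles = []
    for row in range(num_rows):
        roles.append({
            "Data_Previous": data[row % 3],
            "Data_New": data[(row + 1) % 3],
            "Analyzed_Data": data[(row + 2) % 3],   # idle in row 0
            "MAP": maps[row % 2],
            "Analyzed_MAP": maps[(row + 1) % 2],    # idle in row 0
            "Previous_F_Label": index[row % 2],
            "New_F_Label": index[(row + 1) % 2],
        })
    return roles


for row, r in enumerate(table_roles(3)):
    print(row, r["Data_Previous"], r["Data_New"], r["MAP"],
          r["Analyzed_Data"], r["Analyzed_MAP"])
# 0 d1 d2 m1 d3 m2
# 1 d2 d3 m2 d1 m1
# 2 d3 d1 m1 d2 m2
```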
Owing to its pipelined and parallel architecture, the implementation of the proposed algorithm can perform real-time analysis at the original pixel clock, and no extra periods (such as the horizontal blanking periods in [10] and [12]) are required. Hence the processing time of the hardware acceleration depends only on the pixel clock frequency and equals the image transmission period.
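The timing claim can be written out as a short worked relation. Assuming one pixel is processed per pixel-clock cycle and the stream contains no blanking (W, H, and f_pix are notation introduced here for the image width, image height, and pixel clock frequency, not symbols from the text), the processing time of a W × H frame is:

```latex
T_{\text{frame}} = \frac{W \cdot H}{f_{\text{pix}}},
\qquad
T_{\text{frame+dummy}} = \frac{W \cdot (H + 1)}{f_{\text{pix}}}
```

The second form additionally counts the dummy row needed at the end of the image, on the assumption that it is clocked at the same rate. Because neither expression depends on image content, this is consistent with the identical RTATR times reported for all test images in Section 4.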
4 Experiment results
As mentioned above, the proposed algorithm not only runs on a general-purpose processor (GPP) platform but is also suited to hardware acceleration on the FPGA-based RTATR platform. Therefore, we verified its performance on both PC and RTATR platforms. The algorithm in [5], an acknowledged fast connected components labeling algorithm, is selected for comparison, and running time is the key measurement in the experiment. For a more specific comparison, the optimized single-pass algorithm [12], which is designed for hardware implementation, is also selected, and resource utilization and processing ability are analyzed.
4.1 Experiment in different platforms
In the experiment, the two platforms are a PC (2 × 2.5 GHz, 2 GB memory, Windows XP, VC6) and our RTATR system (digital signal processor (DSP): TMS320C6713 at 200 MHz; FPGA: XC2V3000-4FG676). In the RTATR system, the choice of processor (DSP) is limited by volume and power consumption constraints: the processor frequency is only 200 MHz, and the external bus bandwidth is ideally 400 MB/s. For processor-based algorithms (such as the algorithm in [5]), the image can be processed only after transmission, and accessing the image data in external memory becomes the processing bottleneck. In the experiment, the algorithm in [5] is followed by an added analysis step after CCL. By contrast, the proposed algorithm is implemented on the FPGA and runs during image transmission.
Experiment images. (a) Cone target, (b) two persons in forest, (c) two persons beside river, (d) truck and APCs, (e) aerial, (f) mandrill.
Experiment results
Image | Number of CCs | Proposed algorithm, PC (ms) | Proposed algorithm, RTATR (ms) | Algorithm in [5], PC (ms) | Algorithm in [5], RTATR (ms) |
---|---|---|---|---|---|
a | 5 | 2.34 | 4.37 | 0.32 | 22.7 |
b | 67 | 2.81 | 4.37 | 0.34 | 30.4 |
c | 134 | 2.79 | 4.37 | 0.33 | 33.7 |
d | 220 | 2.75 | 4.37 | 0.46 | 37.8 |
e | 341 | 3.22 | 4.37 | 0.91 | 62.7 |
f | 802 | 3.33 | 4.37 | 0.82 | 73.2 |
4.2 Hardware acceleration comparison
Resource utilization
Resource | Proposed method | Algorithm in [12] |
---|---|---|
Used BRAMs | 13 (14%) | 4 |
Used slice flip-flops | 3,154 (11%) | 600 |
Used four-input LUTs | 4,587 (16%) | 1,757 |
Maximum clock frequency (MHz) | 95.7 | 40.63 |
Compared with the existing single-pass CCA algorithm implemented in hardware in [12], our goal is to obtain the center of each region, whereas the implementation in [12] computes only the area; therefore, our design uses many more BRAMs than [12]. Even though the image size in [12] is larger, our implementation occupies considerably more resources than [12]. This is because the multi-layer-index structure and the row buffer need more RAM and registers to store intermediate results, and the additional logical decisions in indexing and updating the multi-layer index occupy more LUTs. However, the contribution of this paper is that the need for horizontal blanking periods for equivalence resolution has been eliminated. From this point of view, the resource consumption in our application is acceptable. Because the CCA is performed during scanning, the center of each region is available by the end of the scan. This is very important for real-time processing in systems without blanking periods (such as the IR detector in our system).
Furthermore, the maximum clock frequency of our design exceeds 90 MHz (95.7 MHz), which is more than twice the 40.63 MHz achieved in [12]. With a higher pixel clock frequency (which must stay below the maximum clock frequency of the design), the hardware implementation of the proposed algorithm would take less time, and real-time CCA could be realized for larger images.
5 Conclusions
In this study, a real-time single-pass connected components analysis algorithm is proposed. Unlike existing single-pass CCA algorithms, it uses the pixel as the scan unit and the run as the labeling unit, and it manages the correspondence of labels in adjacent rows with the multi-layer-index structure. As a result, equivalences are resolved as soon as they are encountered, eliminating the need to wait for the end of the row. Owing to its architecture, the algorithm can perform single-pass CCA on an FPGA while the pixels are being transmitted. Experimental results indicate that the algorithm is suitable for real-time processing in the RTATR system.
References
- He L, Chao Y, Suzuki K: A run-based two-scan labelling algorithm. IEEE Trans. Image Process. 2008, 17: 749-756.
- He L, Chao Y, Suzuki K: An efficient first-scan method for label-equivalence-based labelling algorithms. Pattern Recognit. Lett. 2010, 31: 28-35. 10.1016/j.patrec.2009.08.012
- Grana C, Borghesani D, Cucchiara R: Fast block based connected component labelling. In IEEE International Conference on Image Processing (ICIP 2009). Cairo; 7–10 November 2009:4061-4064.
- Di Stefano L, Bulgarelli A: A simple and efficient connected components labelling algorithm. In International Conference on Image Analysis and Processing. Venice; 27–29 September 1999:322-327.
- Chang F, Chen CJ, Lu CJ: A linear-time component-labelling algorithm using contour tracing technique. Comput. Vis. Image Underst. 2004, 93: 206-220. 10.1016/j.cviu.2003.09.002
- Crookes D, Benkrid K: FPGA implementation of image component labelling. In Proceedings SPIE 3844, Reconfigurable Technology: FPGAs for Computing and Applications. Boston; 20–21 September 1999:17-23.
- Jablonski M, Gorgon M: Handel-C implementation of classical component labelling algorithm. In Euromicro Symposium on Digital System Design (DSD 2004). Rennes; 31 August to 3 September 2004:387-393.
- Appiah K, Hunter A, Dickinson P, Owens J: A run-length based connected component algorithm for FPGA implementation. In International Conference on Field-Programmable Technology. Taipei; 8–10 December 2008:177-184.
- Appiah K, Hunter A, Dickinson P, Meng H: Accelerated hardware video object segmentation: from foreground detection to connected components labelling. Comput. Vis. Image Underst. 2010, 114: 1282-1291. 10.1016/j.cviu.2010.03.021
- Bailey DG, Johnston CT: Single pass connected components analysis. In Proceedings of Image and Vision Computing. Hamilton: University of Waikato; 5–7 December 2007:282-287.
- Johnston CT, Bailey DG: FPGA implementation of a single pass connected components algorithm. In The 4th IEEE International Symposium on Electronic Design, Test and Applications (DELTA 2008). Hong Kong; 23–25 January 2008:228-231.
- Ma N, Bailey DG, Johnston CT: Optimised single pass connected components analysis. In International Conference on Field-Programmable Technology. Taipei; 8–10 December 2008:185-192.
- University of Southern California: SIPI Image Database. http://sipi.usc.edu/database/database.php?volume=misc&image=11#top. Accessed June 2010.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.