Efficient scan mask techniques for connected components labeling algorithm

Abstract

Block-based connected components labeling is by far the fastest algorithm to label the connected components in 2D binary images, especially when the image size is quite large. This algorithm produces a decision tree that contains 211 leaf nodes with 14 levels for the depth of a tree and an average depth of 1.5923. This article attempts to provide a faster method for connected components labeling. We propose two new scan masks for connected components labeling, namely, the pixel-based scan mask and the block-based scan mask. In the final stage, the block-based scan mask is transformed to a near-optimal decision tree. We conducted comparative experiments using different sources of images for examining the performance of the proposed method against the existing methods. We also performed an average tree depth analysis and tree balance analysis to consolidate the performance improvement over the existing methods. Most significantly, the proposed method produces a decision tree containing 86 leaf nodes with 12 levels for the depth of a tree and an average depth of 1.4593, resulting in faster execution time, especially when the foreground density is equal to or greater than the background density of the images.

1. Introduction

Applying connected components labeling in a binary image is of crucial importance in image processing, image recognition, and computer vision tasks. The labeling operation finds the connected components in an image and assigns a unique label to all pixels belonging to the same component. Many algorithms have been proposed to address the labeling operation. In general, these algorithms are categorized into four classes: (i) one-scan [1, 2], (ii) two-scan [3–11], (iii) multi-scan [12], and (iv) contour tracing [13] algorithms.

According to Grana et al. [3], two-scan is the fastest algorithm for labeling the connected components. In this article, the two-scan algorithm will be discussed and analyzed in detail. Two-scan is simple and efficient in computation time and was introduced by Rosenfeld and Pfaltz in 1966 [4]. It consists of three classical operations:

  1. First image scan: provisional label assignment and collection of label equivalences

  2. Equivalences resolution: creation of equivalence classes

  3. Second image scan: final label assignment

First image scan

This is an operation in the classical two-scan labeling algorithm which accesses the pixels sequentially in raster scan order to find the eight-connectivity using the pixel-based scan mask, as shown in Figure 1 [5]. This algorithm works with only two contiguous rows of an image at a time.

Figure 1. Pixel-based scan mask [5]. (a) Pixel coordinates; (b) identifiers of the single pixels.

The equivalences resolution

This is an operation that creates an equivalence table containing the information needed to assign unique labels to each connected component. In the first image scan, all those labels that belong to one component are declared equivalent. In the second image scan, one label from an equivalent class is selected to be assigned to all pixels of a component.
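As an illustration of how such an equivalence table can be maintained, the following minimal union-find sketch (our own, not the implementation used in the algorithms discussed later) records label equivalences during the first scan and resolves representatives before the second scan.

#include <algorithm>
#include <vector>

// Minimal union-find over provisional labels (illustrative sketch only; the
// algorithms discussed in this article use more elaborate array-based schemes).
struct LabelEquivalence {
    std::vector<int> parent;                  // parent[l] points toward l's representative

    int newLabel() {                          // create a fresh provisional label
        parent.push_back((int)parent.size());
        return (int)parent.size() - 1;
    }
    int find(int l) {                         // representative lookup with path halving
        while (parent[l] != l) {
            parent[l] = parent[parent[l]];
            l = parent[l];
        }
        return l;
    }
    int merge(int a, int b) {                 // declare labels a and b equivalent
        a = find(a); b = find(b);
        if (a == b) return a;
        if (b < a) std::swap(a, b);           // keep the smaller label as representative
        parent[b] = a;
        return a;
    }
};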

Recently, Grana et al. [3] proposed a new algorithm in the class of two-scan labeling algorithms that improves on all other existing algorithms, with an average improvement of 23-29%. They optimized the first image scan by using a block-based connected components labeling method that moves a 2 × 2 pixel grid over the image. The extended mask of five 2 × 2 blocks is shown in Figure 2. As a result, the number of provisional labels created during the first scan is roughly reduced by a factor of four, which requires fewer union operations (i.e., label equivalences are implicitly solved within the blocks). Consequently, the block-based connected components labeling proposed by Grana et al. [3] creates a decision tree with 210 condition nodes and 211 leaf nodes with 14 levels for the depth of the tree.

Figure 2. Block-based scan mask [3]. (a) Identifiers of the single pixels; (b) block identifiers.

This article presents a new, more efficient algorithm for assigning provisional labels to object pixels (eight-connectivity) in binary images during the two-scan connected components labeling process. We consider only binary images stored in a 2D array of pixels, and we propose a new block-based connected components labeling method that introduces a new scan mask, shown in Figure 3 (with an extended mask of four 2 × 2 blocks shown in Figure 4). Applying our algorithm to block-based connected components labeling produces a near-optimal decision tree containing only 86 leaf nodes with 12 levels for the depth of the tree. The experimental results show that our algorithm is more efficient in computation time for the connected components labeling operation and can process high density images in less time than other existing comparable algorithms.

Figure 3. The proposed pixel-based scan mask or P-Mask (do not check on position r).

Figure 4. The proposed block-based scan mask or B-Mask (do not check on position R).

The rest of this article is organized as follows. A general background of the connected components labeling process, the two-scan algorithm, and evolution strategies is given in Section 2. The details of the proposed method are described in Section 3. Experimental results comparing our proposed method with other two-scan algorithms from previous studies are shown in Section 4. The analyses and interpretation of the results are discussed in Section 5, and a brief conclusion is given in Section 6.

2. Fundamentals

2.1 Connected components labeling

A connected component is a set of pixels in which all pixels are connected to each other. Connected component labeling is a methodology to group all connected pixels into components based on pixel connectivity and mark each component with a different label. In a connected component, all pixels have similar values and are, in some manner, connected to each other.

Pixel connectivity is a method typically used in image processing to analyze which pixels are connected to other pixels in the surrounding neighborhoods. Two pixels are considered connected to each other if they are adjacent to each other and their values are from the same set of values. A pixel value in a binary image is an element of the set {0, 1}, of which the 0-valued pixels are called background and the 1-valued pixels are called foreground.

The two most widely used methods to formulate the adjacency criterion for connectivity are four-connectivity (N4) and eight-connectivity (N8), as shown in Figure 5. For a pixel p with the coordinates (x, y), the set of connected pixels of p(x, y) is given by:

Figure 5. Pixel connectivity. (a) Four-connectivity (N4); (b) eight-connectivity (N8).

N4(p) = { p(x+1, y), p(x−1, y), p(x, y+1), p(x, y−1) }
(1)
N8(p) = N4(p) ∪ { p(x+1, y+1), p(x+1, y−1), p(x−1, y+1), p(x−1, y−1) }
(2)
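As a concrete illustration of definitions (1) and (2), the sketch below enumerates the in-bounds eight-connected neighbors of a pixel; the offset tables and the function name neighbors8 are ours, not part of any cited implementation.

#include <utility>
#include <vector>

// Offsets realizing N4(p) and N8(p) from Equations (1) and (2).
static const int N4_DX[4] = { 1, -1, 0,  0 };
static const int N4_DY[4] = { 0,  0, 1, -1 };
static const int N8_DX[8] = { 1, -1, 0,  0, 1,  1, -1, -1 };
static const int N8_DY[8] = { 0,  0, 1, -1, 1, -1,  1, -1 };

// Collect the in-bounds eight-connected neighbors of pixel (x, y)
// in a width x height image.
std::vector<std::pair<int,int>> neighbors8(int x, int y, int width, int height) {
    std::vector<std::pair<int,int>> out;
    for (int k = 0; k < 8; ++k) {
        int nx = x + N8_DX[k], ny = y + N8_DY[k];
        if (nx >= 0 && nx < width && ny >= 0 && ny < height)
            out.push_back({nx, ny});
    }
    return out;
}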

2.2 Two-scan algorithm

The two-scan algorithm is a method used for labeling the connected components in a binary image. There are three classical operations in the two-scan algorithm: first image scan, equivalences resolution, and second image scan. This section presents the literature related to the first image scan operation. The algorithms used in the first image scan are classified into two types: pixel-based and block-based scan masks.

2.2.1 Pixel-based scan mask

This operation accesses the pixels sequentially in raster scan order for finding the eight-connectivity using the pixel-based scan mask as shown in Figure 1 [5]. The condition outcomes are given by all possible combinations of five Boolean variables (p, q, r, s, x). The actions belong to four classes: no action, new label, assign, and merge [3].

  1. No action: performed if the current pixel belongs to the background.

  2. New label: created when the neighborhood is composed only of background pixels.

  3. Assign: the current pixel can take any existing provisional label in the mask without considering label equivalences (either only one neighbor is foreground or all foreground neighbors share the same label).

  4. Merge: performed to solve an equivalence between two or more classes; a representative label is assigned to the current pixel.

In 2005, Wu et al. [6] proposed a decision tree, shown in Figure 6, to examine the neighbors of the connected components. A decision tree is a binary tree whose non-terminal nodes are conditional variables and whose terminal nodes are actions to be performed. A decision tree is defined as optimal if it has a minimal number of non-terminal and terminal nodes. Wu et al. [6] observed that every pixel in the scan mask is always a neighbor of "q" (see Figure 1): if there is enough equivalence information to access the correct label of "q", there is no need to examine the rest of the neighbors. Therefore, their decision tree minimizes the number of scanned neighbors.

Figure 6. The decision tree used in scanning for eight-connectivity proposed by Wu et al. [6].
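To make the four action classes and the check ordering of Figure 6 concrete, here is a hedged sketch of one step of the first scan over the classical mask of Figure 1. It reuses the LabelEquivalence sketch given earlier, the function name firstScanStep is ours, and it is a simplification rather than a transcription of Wu et al.'s optimized tree.

// One step of the first scan at a foreground pixel x, using the mask of
// Figure 1 (p, q, r: upper-left, upper, upper-right neighbors; s: left
// neighbor). Lp..Ls are the provisional labels already stored for the
// foreground neighbors; border handling and the label image are omitted.
int firstScanStep(bool p, bool q, bool r, bool s,
                  int Lp, int Lq, int Lr, int Ls, LabelEquivalence& eq) {
    if (q) return Lq;                       // assign: q already connects everything
    if (r) {
        if (p) return eq.merge(Lp, Lr);     // merge: p and r may carry different labels
        if (s) return eq.merge(Ls, Lr);     // merge: s and r may carry different labels
        return Lr;                          // assign
    }
    if (p) return Lp;                       // assign
    if (s) return Ls;                       // assign
    return eq.newLabel();                   // new label: all neighbors are background
}

Checking q first mirrors the observation above: whenever q is foreground, its label already covers every other mask neighbor, so no further checks are needed.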

Instead of using the decision tree, He et al. [7], in 2009, analyzed the mask for eight-connectivity, which contains 16 possible cases (excluding the case in which "x" is background), as shown in Figure 7. Case 1 is the new label action, cases 2-9 and 13-16 are the assign action, and cases 10-12 are the merge action. Based on these cases, they proposed the algorithm shown in Figure 8.

Figure 7. Sixteen possible cases for the current object pixel in the mask for eight-connectivity proposed by He et al. [7].

Figure 8. He et al.'s [7] first-scan algorithm.

In 2010, Grana et al. [3] analyzed the eight-connectivity scan mask using a decision table. They defined the OR-decision table, in which any one of the actions in the set of actions may be performed to satisfy the corresponding condition. Their OR-decision table differs from the classical decision table in that all actions in a classical decision table have to be performed. First, they produced an optimal decision tree from the OR-decision table, after converting the multiple-action OR-decision table into a single-action decision table using a greedy method. The resulting OR-decision table is shown in Table 1; it contains 16 rules, with the selected actions marked by boldface 1's. We added the "Mask" column to Grana et al.'s [3] OR-decision table to map the 16 possible cases proposed by He et al. [7] (Figure 7) to the corresponding rules in the OR-decision table.

Table 1 OR-decision table for labeling

The following describes the algorithm they used to convert the OR-decision table into a single-action decision table for obtaining the optimal decision tree. In OR-decision tables, only one of the different alternatives provided must be selected. While an arbitrary selection does not change the result of the algorithm, the optimal tree derived from a decision table implementing these arbitrary choices may differ. They used a greedy approach: the number of occurrences of each action entry is counted; iteratively the most common one is selected, and for each rule where this entry is present all the other entries are removed, until no more changes are required. If two actions have the same number of entries, the one with the lower index is chosen. The resulting table, after applying this process, is shown in Table 1 with boldface 1's. In this procedure only two actions are chosen arbitrarily, which leads to four possible equivalent decision trees. All of these trees have the same number of nodes and are optimal. Two of these trees are described by Wu et al. [6], as shown in Figure 6, and by He et al. [7], as shown in Figure 8.

In 2010, He et al. [8] proposed a new pixel-based scan mask consisting of three processed neighbor pixels, used when the current foreground pixel follows another foreground pixel, as shown in Figure 9. With this scan mask, whether the current foreground pixel follows a background or a foreground pixel is known without any additional computation cost. Consequently, the pixel preceding the current foreground pixel can be removed from the mask. In other words, their algorithm is highly efficient when there are long runs of foreground pixels.

Figure 9. Pixel-based scan mask proposed by He et al. [8]. (a) Pixel coordinates; (b) identifiers of the single pixels.

In Figure 9, a pixel-based scan mask (proposed by He et al. [8]) is illustrated. In Figure 10, eight possible cases for the current object pixel in the mask are shown and finally in Figure 11, the first-scan algorithm (of a pixel-based scan mask) is shown.

Figure 10. Eight possible cases for the current object pixel in the mask [8].

Figure 11. He et al.'s [8] algorithm. (a) First-scan algorithm; (b) Procedure 1; (c) Procedure 2.

Figure 11 shows the first-scan algorithm proposed by He et al. [8]. In the while loop, the value of "x" is increased without checking whether "x" exceeds the image width. This is because they consider all pixels on the border of an image to be background pixels [8]. When the algorithm is applied to a general image, however, the border pixels are not guaranteed to be background, so the check on whether "x" exceeds the image width must be performed. Therefore, their performance is reduced when the algorithm is applied to general images (the performance of the modified version of [8] is reported in Section 4).

He et al. [9, 10] proposed a run-based procedure for the first scan of the two-scan labeling algorithm that can lead to more efficient computation for images with many long runs and/or a small number of object pixels (VO < VB, with VO the pixel value for the object and VB the pixel value for the background). These two studies [9, 10] also work on images in which the border pixels are considered background pixels.

Finally, we can conclude that these three algorithms proposed by He et al. [8–10] are highly efficient in computation time for images with many long runs (foreground pixel followed by foreground pixel).

2.2.2 Block-based scan mask

This operation accesses the pixels sequentially in raster scan order for finding the eight-connectivity using the 2 × 2 block-based scan mask as shown in Figure 2. Classical 2 × 2 block-based connected components labeling was first introduced by Grana et al. [3]. The main idea of their proposal is based on two very straightforward observations: (1) when using eight-connection, the pixels of a 2 × 2 square are all connected to each other and (2) a 2 × 2 square is the largest set of pixels in which this property holds. This implies that all foreground pixels in a block will share the same label at the end of the computation. For this reason, they proposed to scan an image by moving over a 2 × 2 pixel grid applying an extended mask of five 2 × 2 blocks as shown in Figure 2 instead of the classical neighborhood as shown in Figure 1.

Scanning the image with this larger area has the advantage of labeling four pixels at the same time. The number of provisional labels created during the first scan is roughly reduced by a factor of four, which leads to many fewer union operations, since label equivalences are implicitly solved within the blocks. Moreover, a single label is stored for the whole block.

The new scanning procedure may also require the same pixel to be checked multiple times but the impact of this problem is greatly reduced by their optimized pixel access scheme. Finally, a second scan requires accessing the original image again to check which pixels in the block require their label to be set. Overall, the advantages will be shown to largely overcome the additional work required in subsequent stages.

Considering the block-based scan mask in Figure 2, they would need to work with 20 pixels: for this reason, the decision table would have 20 conditions and the number of possible configurations of condition outcomes would be 2^20. However, some pixels (a, f, l, q) do not provide an eight-connection between blocks of the mask and can be ignored, thus the decision table only has 16 pixels, or 2^16 = 65,536 possible combinations (rules).

Grana et al. [3] defined an abstraction layer over the relations between blocks, which they call block connectivity; connectivity between two blocks implies that all foreground pixels of the two blocks share the same label. They also defined the block-based decision table (BBDT) over the block connectivity. The conditions for block connectivity are shown below.

PX = (h ∈ F and o ∈ F)
QX = (i ∈ F or j ∈ F) and (o ∈ F or p ∈ F)
RX = (k ∈ F and p ∈ F)
SX = (n ∈ F or r ∈ F) and (o ∈ F or s ∈ F)
PQ = (b ∈ F or h ∈ F) and (c ∈ F or i ∈ F)
QR = (d ∈ F or j ∈ F) and (e ∈ F or k ∈ F)
SP = (g ∈ F or h ∈ F) and (m ∈ F or n ∈ F)
SQ = (i ∈ F and n ∈ F)
X = (o ∈ F or p ∈ F or s ∈ F or t ∈ F)

They thus defined nine Boolean conditions, with a total of 2^9 = 512 combinations. However, only 192 conditions are effectively possible in the BBDT (covering the 65,536 combinations of the pixel-based decision table, PBDT); these form what they call OR-decision tables. Grana et al. [3] converted an OR-decision table into a decision tree in two steps. First, they used the greedy procedure to optimize the OR-decision table into a single-entry decision table. Second, they used dynamic programming [14] to synthesize the decision tree, which contains 211 leaf nodes with 14 levels for the depth of the tree.

The concept of dynamic programming is that an optimal solution can be built from optimal sub-solutions. This applies here because building a decision sub-tree for each restriction is a separate problem that can be solved optimally and independently of the others; however, sub-diagrams often overlap, and the resulting interaction destroys the independence of the sub-problems [15], as shown in Figure 12.

Figure 12. The dynamic programming lattice of three input variables.

Figure 12 illustrates the lattice of three input variables using dynamic programming: there are eight different problems at step 0, 12 different problems at step 1, six different problems at step 2, and one problem at the final step. The number of different problems at each step is calculated using formula (3) [14]. Table 2 shows the number of different problems at each step when the number of input variables varies from 3 to 16. The disadvantage of using dynamic programming to convert a decision table to a decision tree is that it requires a huge amount of computation. According to Table 2, converting a 16-input-variable decision table to a decision tree requires computing 43,046,721 problems.

Table 2 The number of different problems at each step for 3 to 16 input variables
∑_{i=0}^{n} C(n, i) · 2^(n−i)
(3)
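As a quick verification of formula (3) and Table 2, the following small program (ours) computes the number of sub-problems directly; for n = 3 it yields 8 + 12 + 6 + 1 = 27, and for n = 16 it yields 43,046,721, which equals 3^n by the binomial theorem.

#include <cstdint>
#include <cstdio>

// Number of distinct sub-problems when converting an n-variable decision
// table to a tree with dynamic programming: sum over i of C(n, i) * 2^(n-i).
uint64_t subProblemCount(int n) {
    uint64_t total = 0, binom = 1;                 // binom = C(n, i)
    for (int i = 0; i <= n; ++i) {
        total += binom << (n - i);                 // C(n, i) * 2^(n - i)
        binom = binom * (n - i) / (i + 1);         // update to C(n, i + 1), exact division
    }
    return total;                                  // equals 3^n by the binomial theorem
}

int main() {
    printf("%llu %llu\n",
           (unsigned long long)subProblemCount(3),    // prints 27
           (unsigned long long)subProblemCount(16));  // prints 43046721
    return 0;
}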

2.3 Evolution strategies

Evolution strategy (ES) is one of the main branches of evolutionary computation. Similar to genetic algorithms [16], ESs imitate the principles of natural Darwinian evolution and produce consecutive generations of samples. During each generation, a batch of samples is generated by mutating the parents' parameters. A number of samples are selected based on their fitness values, while the less fit individuals are discarded. The survivors are then used as parents for the next generation, and so on. This process typically leads to increasing fitness over the generations.

ES was proposed for real-valued parameter optimization problems by Rechenberg [17] in 1971. In ES, an individual is represented as an n-dimensional real-valued vector, and the standard deviation is used to control the search strategy. Rechenberg used Gaussian mutation as the main operator, in which a random value drawn from a Gaussian (normal) distribution is added to each element of an individual's vector to create a new offspring. This basic ES framework, though simple and heuristic in nature, has proven to be very powerful and robust, spawning a wide variety of algorithms.

The basic difference between evolution strategy and genetic algorithms lies in their domains (i.e., the representation of individuals). ES represents an individual as float-valued vectors instead of a binary representation. This type of representation reduces the burden of converting genotype to phenotype during the evolution process.

ESs introduced by Rechenberg [17, 18] were (1 + 1)-ES and (μ + 1)-ES, and two further versions introduced by Schwefel [19, 20] were (μ + λ)-ES and (μ, λ)-ES.

  • (1 + 1)-ES or two-membered ES is the simplest form of ES. There is one parent which creates one n-dimensional real-valued vector of object variables by applying a mutation with identical standard deviations to each object variable. The resulting individual is evaluated and compared to its parent, and the better of the two individuals survive to become the parent of the next generation, while the other one is discarded.

  • (μ + 1)-ES or steady-state ES is the first multimembered ES: there are μ parents at a time (μ > 1), and one offspring is created from them by recombination and then mutation. The best individual is selected as the new current solution, which may be the offspring or one of the parents, thus keeping the population size constant.

  • (μ + λ)-ES, in which not only one offspring is created at a time or in a generation, but λ ≥ 1 descendants, and, to keep the population size constant, the λ worst out of all μ + λ individuals are discarded.

  • (μ, λ)-ES, in which the selection takes place among the λ offspring only, whereas their parents are "forgotten" no matter how good or bad their fitness was compared to that of the new generation. Obviously, this strategy relies on a birth surplus, i.e., on λ > μ in a strict Darwinian sense of natural selection.

3. Proposed scan mask for two-scan algorithm

This article proposes a new scan mask for connected components labeling. The underlying idea is to produce a near-optimal decision tree that improves performance over the existing connected components labeling algorithms, especially for high density images. Instead of having five pixels, the proposed scan mask has only four pixels (ignoring pixel r), as shown in Figure 3. We also applied the concept of the pixel-based scan mask to the block-based scan mask, as shown in Figure 4. More details on the proposed algorithm are described in the following sections.

3.1 Proposed pixel-based scan mask (P-mask)

From the literature described in the previous sections, all existing connected components labeling algorithms create unbalanced trees; for instance, the decision tree proposed by Wu et al. [6] in Figure 6. At the current position (x), if the pixel is background, no action is performed and the operation is complete. As a result, the heights of the two child sub-trees of node x are very different: the left side of the tree is much shorter than the right side.

This section presents the concept of using the proposed scan mask for finding eight-connectivity. The proposed scan mask ignores pixel r and uses only four pixels, as shown in Figure 3. It is used to scan pixels in raster scan order: from top to bottom, left to right, pixel by pixel. For the first scan at the top-left position of an image at time 1, the proposed scan mask checks positions p, q, s, x (see Figure 13a). After that, it is shifted to the right by one pixel at time 2, as shown in Figure 13b, and it continues checking at the new positions p, q, s, x. Now, the new position of q at time 2 was previously the position of r at time 1, and the new position of s at time 2 was previously the position of x at time 1. So, the positions of x and r at time 1 can be checked later, while performing the checks at positions s and q at time 2; checking at positions s and q is always performed no matter whether position x holds a foreground or background pixel. Therefore, we propose that the scan mask contain only p, q, s, x.

Figure 13. Example of using the P-Mask at times 1 and 2.

More importantly, we also reanalyzed the actions in the scan mask and added merge only as a new class of action. Analyzing the new pixel-based scan mask for eight-connectivity gives 16 possible cases (whether x is background or foreground), as shown in Figure 14. The action entries are obtained by applying the following considerations (a code sketch of this per-pixel logic follows the list):

Figure 14. Sixteen possible cases.

  1. No action: for cases 1-3 and 5-8, take no action.

  2. New label: for case 9, assign a new provisional label to pixel 'x'.

  3. Assign: for cases 10-11 and 13-16, assign the provisional label of a neighbor to pixel 'x' (all its foreground neighbors share the same provisional label).

  4. Merge: for case 12, merge two provisional labels into the same class and assign a representative to pixel 'x' (with the proposed scan mask, s and q are not yet connected to each other and might belong to different provisional labels).

  5. Merge only: for case 4, merge the two provisional labels of s and q into the same class and do not assign a provisional label to pixel 'x'.
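The sketch below expresses the five action classes for the proposed P-mask in code. It reuses the LabelEquivalence sketch from Section 1, the function name pMaskStep is ours, and the mapping of bit patterns to the case numbers of Figure 14 is our illustrative reading rather than a transcription of the published table.

// Per-pixel action for the proposed P-mask (Figure 3): neighbors
// p (upper-left), q (upper), s (left); r is never examined.
// Returns the label assigned to x, or -1 when x stays unlabeled.
// Sketch of the action classes only; the article's decision tree
// (Figure 15) orders these tests differently.
int pMaskStep(bool p, bool q, bool s, bool x,
              int Lp, int Lq, int Ls, LabelEquivalence& eq) {
    if (!x) {
        if (!p && q && s) eq.merge(Lq, Ls);    // merge only (cf. case 4): s and q touch diagonally
        return -1;                             // no action otherwise
    }
    if (!p && !q && !s) return eq.newLabel();  // new label (cf. case 9)
    if (!p && q && s) return eq.merge(Lq, Ls); // merge (cf. case 12)
    if (q) return Lq;                          // assign: foreground neighbors already equivalent
    if (s) return Ls;                          // assign
    return Lp;                                 // assign (only p is foreground)
}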

We also organized the above 16 possible cases into the OR-decision table shown in Table 3.

Table 3 OR-decision table

We converted the OR-decision table (Table 3) into a decision tree directly, without converting the OR-decision table into a single action decision table, using the algorithm previously reported in [21]. The resulting decision tree is shown in Figure 15.

Figure 15. The resulting decision tree converted from Table 3.

The resulting decision tree shown in Figure 15 has only four levels, whereas that of Wu et al. [6] has five levels (see Figure 6). In terms of depth, therefore, the decision tree created from the proposed algorithm appears better than the tree proposed by Wu et al. [6]. Considering the number of leaf nodes, however, the proposed decision tree has nine leaf nodes, one more than the eight leaf nodes of the decision tree proposed by Wu et al. [6]; practically, an optimal decision tree should have a lower number of leaf nodes. So the proposed scan mask might not work well for pixel-based connected components labeling. We introduce it as a pixel-based mask only to demonstrate the idea; later in this article we apply it to the block-based connected components method, where the proposed algorithm has advantages over existing algorithms in both criteria, tree height and number of leaf nodes, and eventually produces a near-optimal decision tree. The next section describes the concept of the proposed block-based scan mask.

3.2 Proposed block-based scan mask (B-mask)

Building on the success of Grana et al.'s [3] method, their decision tree performs connected components labeling very fast. However, their method uses the classical scan mask, which produces an unbalanced decision tree, as described in the previous section.

In this article, a new block-based scan mask (see Figure 4) is also proposed by applying the new pixel-based scan mask (see Figure 3). The proposed block-based scan mask has only four blocks of 2 × 2 pixels, 16 pixels in total. However, pixels a, d, and q (see Figure 4) do not provide eight-connectivity between blocks of the mask and can be ignored. We therefore need to deal with only 13 pixels, with a total of 2^13 possible combinations. The basic idea is to reduce the number of possible combinations from the 2^16 = 65,536 of Grana et al. [3] to 2^13 = 8,192 rules. There are seven conditions for the proposed block connectivity:

X = (o ∈ F or p ∈ F or s ∈ F or t ∈ F)
PX = (h ∈ F and o ∈ F)
QX = (i ∈ F or j ∈ F) and (o ∈ F or p ∈ F)
SX = (n ∈ F or r ∈ F) and (o ∈ F or s ∈ F)
PQ = (b ∈ F or h ∈ F) and (c ∈ F or i ∈ F)
SP = (g ∈ F or h ∈ F) and (m ∈ F or n ∈ F)
SQ = (i ∈ F and n ∈ F)

These seven Boolean conditions give a total of 2^7 = 128 combinations. However, only 57 conditions are effectively possible in the BBDT (covering the 8,192 combinations of the PBDT). The complete proposed BBDT is shown in Table 4. We also defined two new actions:

Table 4 Proposed new BBDT
  • Merge only: for mask numbers 2, 4, and 6 in Table 4, merge the two provisional labels of blocks S and Q into the same class and do not assign provisional labels to the pixels in block X. Examples of mask numbers 2, 4, and 6 are shown in Figure 16.

Figure 16. Q + S, mask numbers 2, 4, and 6 in Table 4.

  • Merge and assign new label: for mask numbers 10, 12, and 14 in Table 4, merge the two provisional labels of blocks S and Q into the same class and assign a new provisional label to the pixels in block X. Examples of mask numbers 10, 12, and 14 are shown in Figure 17.

Figure 17. Q + S, new label, mask numbers 10, 12, and 14 in Table 4.

The merge only operation is performed for mask numbers 2, 4, and 6. According to Table 4, mask number 2 has 16 possible rules, as shown in Figure 18, and mask numbers 4 and 6 also have 16 possible rules each. The total number of possible rules performing the merge only operation is therefore 48.

Figure 18. Sixteen rules of mask number 2.

The merge and assign new label operation is performed for mask numbers 10, 12, and 14. According to Table 4, there are 48 possible rules for performing the merge and assign new label operation.
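For illustration, the seven block-connectivity conditions could be evaluated from the pixel identifiers of Figure 4 as in the following sketch; the struct names and layout are ours.

// Boolean pixel flags named as in Figure 4 (true = foreground).
// Pixels a, d, and q belong to the mask but are never read.
struct BMaskPixels {
    bool b, c, g, h, i, j, m, n, o, p, r, s, t;
};

// The seven block-connectivity conditions of the proposed B-mask.
struct BMaskConditions {
    bool X, PX, QX, SX, PQ, SP, SQ;
};

BMaskConditions evalConditions(const BMaskPixels& px) {
    BMaskConditions c;
    c.X  = px.o || px.p || px.s || px.t;
    c.PX = px.h && px.o;
    c.QX = (px.i || px.j) && (px.o || px.p);
    c.SX = (px.n || px.r) && (px.o || px.s);
    c.PQ = (px.b || px.h) && (px.c || px.i);
    c.SP = (px.g || px.h) && (px.m || px.n);
    c.SQ = px.i && px.n;
    return c;
}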

Next, we mapped the BBDT to the PBDT and produced the 8,192-rule PBDT. We then used the algorithm previously reported in [21] (setting the condition weight to 1.0) to convert the PBDT into a decision tree containing 118 condition nodes and 119 leaf nodes. To convert a 13-input decision table to a decision tree, Sutheebanjard and Premchaiswadi's algorithm [21] needs to compute 118 problems, significantly fewer than the 1,594,323 problems of Schumacher and Sevcik's algorithm [14] shown in Table 2. It is therefore clear that using the algorithm of [21] to convert a decision table to a decision tree enormously reduces the computation time.

To control the resulting decision tree, we assign a weight to each condition in the OR-decision table; if the condition weights change, the resulting decision tree also changes. The question is: what condition weights create an optimal decision tree? The simplest method is to assign random real values to the condition weights, but with 13 condition weights this is impractical. To deal with this problem, this article applied the (μ + λ)-ES described in Section 2.3 to adjust the condition weights until an optimized weight vector is found. The (μ + λ)-ES consisted of 80 parent individuals (real-valued vectors), which produce 100 offspring by adding Gaussian (normally distributed) random numbers. The best 80 individuals serve as the parents of the following generation. The condition weights are initialized by the mutation operation and then the evolution process begins.

Sutheebanjard and Premchaiswadi's algorithm [21] was used to convert the OR-decision table into a decision tree, and fitness was evaluated by counting the number of leaf nodes, the quantity to be minimized.

A child vector is defined by the mutation of each real-valued coefficient: a value sampled from a Gaussian distribution is added to the parent coefficient, as shown in (4).

a_c = a_p + N(0, σ²)
(4)

where a_p is a parent coefficient, a_c is a child coefficient, N(0, σ²) is a normal distribution, and σ denotes the standard deviation of the system.

To control the standard deviation, it is adjusted according to the ratio of successful individuals during the evolution process (the 1/5 success rule [18]), as shown in (5). The implemented algorithm is shown in Figure 19.

Figure 19. Optimizing the OR-decision table algorithm.

σ′ = σ / 0.817   if p > 1/5
σ′ = σ · 0.817   if p < 1/5
σ′ = σ           if p = 1/5
(5)

The algorithm in Figure 19 was run for 1,000 generations and produced a decision tree with a minimum of 86 leaf nodes and 12 levels for the depth of the tree, which was therefore selected. The resulting near-optimal decision tree was implemented in C++ using the OpenCV library and is available on-line at http://phaisarn.com/labeling.
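For illustration only, the weight-optimization loop can be pictured as the following schematic (μ + λ)-ES with Gaussian mutation (4) and the 1/5 success rule (5); it is not the exact procedure of Figure 19, and the fitness function leafCountFromWeights, which would build the decision tree from the weighted OR-decision table [21] and count its leaves, is left as a placeholder.

#include <algorithm>
#include <random>
#include <vector>

// Placeholder: builds the decision tree from a weighted OR-decision table
// using [21] and returns its number of leaf nodes (the quantity to minimize).
int leafCountFromWeights(const std::vector<double>& weights);

std::vector<double> optimizeWeights(int dim = 13, int mu = 80, int lambda = 100,
                                    int generations = 1000) {
    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0);
    double sigma = 1.0;                        // mutation step size

    // Population of (weights, fitness), initialized around 1.0.
    std::vector<std::pair<std::vector<double>, int>> pop(mu);
    for (auto& ind : pop) {
        ind.first.assign(dim, 1.0);
        ind.second = leafCountFromWeights(ind.first);
    }

    for (int g = 0; g < generations; ++g) {
        int successes = 0;
        std::vector<std::pair<std::vector<double>, int>> children;
        for (int k = 0; k < lambda; ++k) {
            const auto& parent = pop[k % mu];
            std::vector<double> w = parent.first;
            for (double& wi : w) wi += sigma * gauss(rng);   // Gaussian mutation, Eq. (4)
            int fit = leafCountFromWeights(w);
            if (fit < parent.second) ++successes;            // offspring better than its parent
            children.push_back({std::move(w), fit});
        }
        // (mu + lambda) selection: keep the best mu of parents and offspring.
        pop.insert(pop.end(), children.begin(), children.end());
        std::sort(pop.begin(), pop.end(),
                  [](const auto& a, const auto& b) { return a.second < b.second; });
        pop.resize(mu);

        // 1/5 success rule, Eq. (5).
        double rate = double(successes) / lambda;
        if (rate > 0.2)      sigma /= 0.817;
        else if (rate < 0.2) sigma *= 0.817;
    }
    return pop.front().first;                  // best weight vector found
}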

4. Experimental result

The tested algorithms in this article, as mentioned earlier, belong to the class of "two-scan algorithms". In order to evaluate the performance of the proposed first image scan algorithm while avoiding any effect of the equivalences resolution operation, the experiment executed a variety of algorithms (He et al. [8], Grana et al. [3], and our proposed method) with the same equivalences resolution. For the equivalences resolution, we followed the Union-Find technique presented by He et al. [11], which is among the most advanced techniques currently available; it uses three array-based data structures that implement the algorithm very efficiently. During the second image scan, we only need to replace each provisional label by its representative label. As a result, all pixels belonging to a connected component are assigned a unique label.

In this study, we claim that the decision tree created from the proposed block-based scan mask (B-mask) provides the most efficient way, in terms of computation time, to scan general images (where border pixels can be either background or foreground) and evaluate connectivity. Consequently, we tested and compared the results on different image datasets to assess the efficiency and performance of the different methods and algorithms.

The experiment was performed on Ubuntu 10.04 with an Intel® Xeon® Processor E5310, 1.60 GHz, 4 cores, using a single core for the processing. All algorithms used for our comparison were implemented in C++ using the OpenCV library; the compiler was gcc version 4.4.3. All experimental results presented in this section were obtained by averaging the execution time over 100 runs. To prevent one run from filling the cache and making subsequent runs faster, we deallocated the image header and the image data at the end of each run by calling the standard OpenCV function cvReleaseImage(). In all cases, all algorithms produced the same number of labels and the same labeling on all images.
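The timing protocol described above might look like the following sketch using OpenCV's legacy C API; labelImage stands in for whichever labeling routine is under test, and averageRunMs is our own helper name.

#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <cstdio>
#include <ctime>

// Stand-in for the algorithm under test (ours, [3], or [8]).
void labelImage(const IplImage* binary);

double averageRunMs(const char* path, int runs = 100) {
    double totalMs = 0.0;
    for (int i = 0; i < runs; ++i) {
        IplImage* img = cvLoadImage(path, CV_LOAD_IMAGE_GRAYSCALE);
        clock_t start = clock();
        labelImage(img);
        totalMs += 1000.0 * (clock() - start) / CLOCKS_PER_SEC;
        cvReleaseImage(&img);   // free header and data so one run cannot warm the next
    }
    return totalMs / runs;      // mean execution time in milliseconds
}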

4.1 Synthetic dataset

We used the synthetic dataset of black and white random noise square images, with eight different image sizes from a low resolution of 32 × 32 pixels up to a maximum resolution of 4096 × 4096 pixels, proposed by Grana et al. [3]. In our experiment, the synthetic dataset of [22], containing 720 files, is used for the test. The experimental results show that the proposed method consumes the lowest computation time for all image sizes, as shown in Figure 20 and Table 5.

Figure 20. Performance of each algorithm with varying size of the image.

Table 5 Performance of each algorithm with varying size of the image

We also tested 4096 × 4096 pixel images with nine different foreground densities (ten images for each density). An illustrative example of density variation is provided in Figure 21. The experimental results show that the proposed method consumes the lowest computation time in six out of nine densities, as shown in Figure 22 and Table 6.

Figure 21. Sample collection of random images, shown here at 32 × 32 resolution, in which the threshold is varied to produce different densities of labels.

Figure 22. The average performance of each algorithm with varying label densities. The image size was 4096 × 4096 pixels.

Table 6 Performance of each algorithm with varying label densities.

This dataset allowed us to evaluate the performance of both our approach and the other selected algorithms in terms of scalability with the number of pixels and scalability with the number of labels (density).

4.2 SIMPLIcity

We tested 1,000 images from the database used in the SIMPLIcity paper [23] (referred to here as the SIMPLIcity dataset). We transformed the images into binary images using Otsu's threshold selection method [24] and categorized the 1,000 images into nine different density levels (the images are available on-line at http://phaisarn.com/labeling). Example images at different densities are shown in Figure 23. The performance of each algorithm is shown in Figure 24 and Table 7.

Figure 23. Sample images at different densities from the MIRflickr dataset binarized by Otsu's method.

Figure 24. The average performance of each algorithm using images from the SIMPLIcity dataset binarized by Otsu's method at nine densities.

Table 7 Performance of each algorithm using images from the SIMPLIcity dataset binarized by Otsu's method at nine densities

4.3 The USC-SIPI image database

The USC-SIPI image database is a collection of digitized images suited to research in image processing, image analysis, and machine vision. The first edition was distributed in 1977, and many new images have been added since then.

The database is divided into volumes based on the basic characteristics of the pictures. Images in each volume are of various sizes, such as 256 × 256, 512 × 512, or 1024 × 1024 pixels, with 8 bits/pixel for black-and-white images and 24 bits/pixel for color images. We selected images from the Aerials, Miscellaneous, and Textures volumes [25], transformed them into binary images using Otsu's threshold selection method [24], and categorized them into nine density levels (the images are available on-line at http://phaisarn.com/labeling). Samples of images at each density are shown in Figure 25. The average performance over all of the images within the set for each algorithm at different densities is shown in Figure 26 and Table 8.

Figure 25. Sample images at different densities from the USC-SIPI database binarized by Otsu's method.

Figure 26. The average performance of each algorithm using images at different densities from the USC-SIPI database binarized by Otsu's method.

Table 8 Performance of each algorithm using images at different densities from the USC-SIPI database binarized by Otsu's method

5. Analysis

5.1 The average tree depth analysis

The comparison of the 16 configurations is shown in Figure 7. For processing a foreground pixel in the first scan, the number of times the neighbor pixels are checked in He et al. [7, 8] and in other conventional label-equivalence-based labeling algorithms is shown in Table 9 [8].

Table 9 Number of times for checking the processed neighbor pixels [8]

We also performed this analysis on Grana et al.'s [3] decision tree, in which there are 2^16 = 65,536 combinations. We counted the total number of executed conditions over all combinations and calculated the average number of condition executions by dividing this total by 65,536. We then divided the result by 4 (Grana et al. [3] use a 2 × 2 block); the result equals 1.592.

Finally, the same analysis was performed on the proposed method (the decision tree from the block-based scan mask), in which there are 2^13 = 8,192 combinations. We counted the total number of executed conditions, calculated the average by dividing this total by 8,192, and again divided the result by 4; the result equals 1.459.
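The average-depth figures can be reproduced by walking a decision tree once for every possible configuration of the mask pixels; the sketch below assumes a simple binary-tree representation of our own, where each internal node tests one mask pixel.

#include <cstdint>

// Minimal decision-tree node: a leaf has test < 0, an internal node tests
// bit 'test' of the pixel configuration (our representation, not the
// generated trees of [3] or of this article).
struct Node {
    int test;                 // index of the mask pixel to check, or -1 at a leaf
    const Node* child[2];     // child[0]: background, child[1]: foreground
};

// Average number of condition checks per processed pixel: walk the tree for
// all 2^bits configurations, count tests, divide by 2^bits and by the block
// size (4 for a 2 x 2 block-based mask, 1 for a pixel-based one).
double averageChecksPerPixel(const Node* root, int bits, int blockSize) {
    uint64_t totalChecks = 0;
    uint64_t configs = 1ull << bits;
    for (uint64_t cfg = 0; cfg < configs; ++cfg) {
        const Node* n = root;
        while (n->test >= 0) {                       // descend until a leaf (an action)
            ++totalChecks;
            n = n->child[(cfg >> n->test) & 1u];
        }
    }
    return double(totalChecks) / double(configs) / blockSize;
}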

Thus, to process a pixel, the average number of times the processed neighbor pixels are checked in the first scan of our decision tree is reduced to 1.459. As a result, our proposed algorithm is the fastest algorithm for labeling the connected components overall. However, looking at each density level in Tables 6, 7 and 8, in the density range 0.1-0.4 Grana et al.'s [3] algorithm is the fastest, whereas in the density range 0.5-0.9 our proposed method performs much faster than all other algorithms. The rationale behind these results is discussed in the next section.

5.2 The balanced tree analysis

The decision trees proposed by Wu et al. [6] and He et al. [7] are created from the pixel-based scan mask shown in Figure 1, and the result is an unbalanced tree, as shown in Figure 6. The decision tree proposed by Grana et al. [3] is also unbalanced because it is derived from the same classical mask structure (Figure 1), extended to blocks. According to Tables 6, 7 and 8, the decision tree proposed by Grana et al. [3] performs faster than our proposed decision tree for lower density images. Their algorithm is faster there because in low density images most pixels are background rather than foreground, and if pixel 'x' is a background pixel the operation stops with no further action required.

Compared to other algorithms, our proposed decision tree shown in Figure 15, which is created from the proposed pixel-based scan mask in Figure 3, is a near-optimal, more balanced decision tree. It performs approximately the same number of operations whether pixel 'x' is background or foreground. Hence, we can consider the fundamental difference between our proposed decision tree (Figure 15) and the decision tree proposed by Wu et al. [6] (Figure 6): if the current pixel is background, Wu et al.'s [6] decision tree performs only one check, whereas our proposed decision tree usually performs between 2 and 4 checks; if the current pixel is foreground, Wu et al.'s [6] decision tree performs between 2 and 5 checks, whereas ours performs only between 2 and 4.

The properties of a block-based decision tree are the same as those of the pixel-based decision tree just analyzed. Grana et al. [3] developed a decision tree based on the block-based scan mask in Figure 2, which is an enhancement of the pixel-based scan mask in Figure 1. Hence, their decision tree is very fast when the current 2 × 2 block consists of background pixels, compared with our proposed decision tree. Our proposed decision tree extends the pixel-based scan mask in Figure 3 to the block-based scan mask in Figure 4; therefore, it is very fast when the current 2 × 2 block is composed of foreground pixels, compared with Grana et al.'s [3] decision tree. This is why the proposed decision tree performs faster than Grana et al. [3] for high density images.

6. Conclusion

The main contribution of this article is to improve the performance of existing connected components labeling methods for general binary images, especially high density images. In this article, we presented a new method to label the connected components. Initially, we introduced a new pixel-based scan mask (P-mask) for eight-connectivity in conjunction with a new class of action. Second, we applied the new pixel-based scan mask to a new block-based scan mask (B-mask) and created the BBDT from the B-mask. Then, we mapped the BBDT into the PBDT. Finally, we converted the PBDT into a decision tree with fast computation [21] and used the ES methodology to optimize the condition weights in the PBDT. The result of these operations is a near-optimal decision tree that contains 85 condition nodes and 86 leaf nodes with 12 levels for the depth of the tree.

In terms of performance, we compared the proposed method against other techniques using images from various sources with different image sizes and densities. The experimental results show that the proposed method is faster than all other techniques except for [3], which performed slightly faster on low density images. The analyses of the results are described in Section 5. Based on our findings, we conclude that the proposed method improves the performance of connected components labeling and is particularly effective for high density images.

References

  1. AbuBaker A, Qahwaji R, Ipson S, Saleh M: One scan connected component labeling technique. In IEEE International Conference on Signal Processing and Communications (ICSPC 2007), Dubai, United Arab Emirates; 2007:1283-1286.

  2. Trein J, Schwarzbacher AT, Hoppe B: FPGA implementation of a single pass real-time blob analysis using run length encoding. In MPC-Workshop, Ravensburg-Weingarten, Germany; 2008:71-77.

  3. Grana C, Borghesani D, Cucchiara R: Optimized block-based connected components labeling with decision trees. IEEE Trans Image Process 2010, 19(6):1596-1609.

  4. Rosenfeld A, Pfaltz JL: Sequential operations in digital picture processing. J ACM 1966, 13(4):471-494. 10.1145/321356.321357

  5. Rosenfeld A, Kak AC: Digital Picture Processing. Volume 2. 2nd edition. Academic Press, San Diego; 1982.

  6. Wu K, Otoo E, Shoshani A: Optimizing connected component labeling algorithms. Proc SPIE 2005, 5747:1965-1976.

  7. He L, Chao Y, Suzuki K, Wu K: Fast connected-component labeling. Pattern Recogn 2009, 42(9):1977-1987. 10.1016/j.patcog.2008.10.013

  8. He L, Chao Y, Suzuki K: An efficient first-scan method for label-equivalence-based labeling algorithms. Pattern Recogn Lett 2010, 31:28-35. 10.1016/j.patrec.2009.08.012

  9. He L, Chao Y, Suzuki K: A run-based two-scan labeling algorithm. IEEE Trans Image Process 2008, 17(5):749-756.

  10. He L, Chao Y, Suzuki K: A run-based one-and-a-half-scan connected-component labeling algorithm. Int J Pattern Recogn Artif Intell 2010, 24(4):557-579. 10.1142/S0218001410008032

  11. He L, Chao Y, Suzuki K: A linear-time two-scan labeling algorithm. In 2007 IEEE International Conference on Image Processing (ICIP), San Antonio, Texas, USA; 2007:V-241-V-244.

  12. Suzuki K, Horiba I, Sugie N: Linear-time connected-component labeling based on sequential local operations. Comput Vis Image Understand 2003, 89:1-23. 10.1016/S1077-3142(02)00030-9

  13. Chang F, Chen CJ, Lu CJ: A linear-time component-labeling algorithm using contour tracing technique. Comput Vis Image Understand 2004, 93:206-220. 10.1016/j.cviu.2003.09.002

  14. Schumacher H, Sevcik KC: The synthetic approach to decision table conversion. Commun ACM 1976, 19:343-351. 10.1145/360238.360245

  15. Moret BME: Decision trees and diagrams. ACM Comput Surv 1982, 14(4):593-623. 10.1145/356893.356898

  16. Holland JH: Genetic algorithms. Sci Am 1992, 267:66-72. 10.1038/scientificamerican0792-66

  17. Rechenberg I: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Dr.-Ing. Thesis, Technical University of Berlin, Department of Process Engineering; 1971.

  18. Rechenberg I: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart; 1973.

  19. Schwefel HP: Evolutionsstrategie und numerische Optimierung. Dissertation, TU Berlin, Germany; 1975.

  20. Schwefel HP: Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Interdisciplinary Systems Research, 26. Birkhäuser, Basel; 1977.

  21. Sutheebanjard P, Premchaiswadi W: Fast convert OR-decision table to decision tree. In IEEE ICT&KE 2010; 2010.

  22. University of Modena and Reggio Emilia, Modena, Italy: cvLabelingImageLab: an impressively fast labeling routine for OpenCV. 2010. [http://imagelab.ing.unimore.it/imagelab/labeling.asp]

  23. Wang JZ, Li J, Wiederhold G: SIMPLIcity: semantics-sensitive integrated matching for picture libraries. IEEE Trans Pattern Anal Mach Intell 2001, 23(9):947-963. 10.1109/34.955109

  24. Otsu N: A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 1979, 9:62-66.

  25. University of Southern California: The USC-SIPI Image Database. 2010. [http://sipi.usc.edu/database/]

Acknowledgements

The authors would like to express their extreme gratitude to Grana et al. [3], who provide their source code on the internet, free of charge, for all researchers, users, and others interested in this field of study. We used their source code in the experiments and implemented our own algorithms based on it. In particular, we would like to thank Daniele Borghesani, who helped us understand their algorithm.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Phaisarn Sutheebanjard.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sutheebanjard, P., Premchaiswadi, W. Efficient scan mask techniques for connected components labeling algorithm. J Image Video Proc. 2011, 14 (2011). https://doi.org/10.1186/1687-5281-2011-14
