A mutual GrabCut method to solve co-segmentation

Gao, Zhisheng; Shi, Peng; Karimi, Hamid Reza; Pei, Zheng

doi:10.1186/1687-5281-2013-20

Research
Open access
Published: 20 April 2013


EURASIP Journal on Image and Video Processing volume 2013, Article number: 20 (2013)


Abstract

Co-segmentation aims to segment common objects from a group of images. Markov random fields (MRFs) have been widely used for co-segmentation, typically by introducing a global constraint that makes the foregrounds similar to each other. However, the resulting model is difficult to minimize. In this paper, we propose a new MRF-based co-segmentation model that avoids this difficult minimization. In our model, the foreground similarity constraint is added to the unary term of the MRF model rather than to a global term, so the energy can be minimized by the graph cut method. A new energy function is designed by considering both foreground similarity and background consistency, and a mutual optimization approach is used to minimize it. We test the proposed method on many pairs of images. The experimental results demonstrate the effectiveness of the proposed method.

1 Introduction

Image segmentation is a fundamental problem for many computer vision tasks, such as object recognition [1, 2], image understanding [3], and retrieval [4]. Due to variations of the objects, image segmentation remains a challenging problem. Recently, co-segmentation [5–15] has attracted much attention from the community. The goal of co-segmentation is to segment common objects from a group of images. Unlike traditional single-image segmentation, the co-segmentation method can segment multiple images jointly rather than independently segmenting each image based on the co-occurrence of objects in the images [16]. Several examples can be found in Figure 1, where six image pairs are shown. In each image pair, the co-segmentation aims to extract the common objects from the image pair, such as the ‘plane’ and ‘banana’ in the first two image pairs. Compared with traditional segmentation methods, co-segmentation can accurately segment objects from images by several related images and requires less user workload [17]. It has many potential applications in computer vision, such as image classification, object recognition, and image retrieval. This paper focuses on the co-segmentation problem.

Existing co-segmentation models treat co-segmentation as an optimization problem in which common objects are obtained by adding a foreground similarity constraint to a segmentation model. Both the local smoothness within each image and the foreground similarity among the images are considered. Many traditional segmentation methods have been adapted to solve the co-segmentation problem, such as Markov random field (MRF)-based segmentation methods [5–8], the random walker-based segmentation method [18], and discriminative clustering-based segmentation methods [10, 19]. These co-segmentation methods can be regarded as extensions of interactive segmentation methods, since it is natural to replace the manually provided seeds of the traditional method with locally similar regions shared by the images.

Several well-known interactive-based segmentation methods have been extended to solve co-segmentation problem. MRF-based segmentation method was first extended for co-segmentation task by Rother [5], which introduced a global term representing foreground similarity into the MRF-based image segmentation model. Kim et al. [15] extended heat diffusion-based interactive segmentation method to solve multi-class co-segmentation problem. The heat diffusion-based method spreads the heat from the source seeds to the other pixels by pixel similarity. To solve multiple images in co-segmentation, the heat was diffused among the common objects by foreground similarity. The random walker-based interactive segmentation method was extended to solve co-segmentation problem in the work of Collins et al. [18], which introduces foreground similarity constraint into the random walker-based method. In the work of Meng et al. [16, 20], the active contour-based model (Chan-Vese model) was extended to fit co-segmentation task by considering both foreground similarity constraint and background consistency.

Among these methods, MRF-based co-segmentation methods have attracted much attention because of the success of MRF-based methods on single-image segmentation. Several MRF-based co-segmentation methods have been proposed [5–8]. They differ mainly in the formulation of the foreground similarity constraint, for which the L1-norm [5], the L2-norm [6], and a reward strategy [7] have been used. However, minimizing the MRF-based co-segmentation energy function remains challenging regardless of which global term is chosen. To cope with the minimization problem, existing methods search for approximate solutions [5, 6, 8] or require the user to provide the foreground appearance and locations [7]. Other methods use a saliency map [13] to obtain an initial object appearance model. For these methods, the results depend on the accuracy of the initial appearance models.

GrabCut [21] is an important MRF-based segmentation method, which segments the object inside a manually drawn rectangle by the graph cut algorithm. The main advantage of GrabCut is that its energy function can be minimized efficiently by iteratively applying the graph cut algorithm, which runs in polynomial time. Hence, it can be used in many real-time applications. Furthermore, it builds the foreground and background appearance priors from a simple rectangle, which is convenient compared with other interactive segmentation methods. Performing co-segmentation based on the GrabCut model can therefore yield both efficient optimization and simple prior model generation. Meanwhile, the GrabCut model can also benefit from the co-segmentation task: it becomes more robust to the initial curve setting, because a pair of images provides a richer prior than a single image. Hence, automatic object segmentation by GrabCut (without manual curve setting) can be achieved in the co-segmentation task.

In this paper, we propose a new MRF-based co-segmentation method, namely mutual GrabCut (MGrabCut), which extends GrabCut [21] to common object segmentation. In the method, the region outside each initial rectangle is treated as background, and the regions inside the initial rectangles are used to model the unary potential of the foreground. To segment similar foregrounds, we introduce the foreground model of the other image into the unary term of the current image. The final co-segmentation results are obtained by graph cuts while iteratively updating the foreground and background appearance models in the unary term. The proposed method has three main advantages. First, compared with existing MRF-based co-segmentation methods, foreground similarity enters the unary term rather than a global term, which makes minimization easier and the model efficient. Second, the proposed method is robust to the initial curve setting because the foreground similarity constraint locates the common objects more accurately; a fixed initial curve can be used for all pairs of images. Third, since the foreground model is updated at every iteration, a more accurate appearance model is obtained. We test the proposed method on many pairs of images. The experimental results demonstrate the effectiveness of the proposed method.

The contributions of the proposed method are listed as follows:

1. A novel MRF-based co-segmentation model is designed. In the model, the foreground similarity constraint is added to the unary term rather than to a global term, which allows efficient minimization by the graph cut algorithm.
2. Compared with the traditional GrabCut model, the proposed model is more robust to the initial curve setting and can segment objects with fixed initial curves. This benefit comes from considering a pair of images instead of a single image.
3. A mutual graph cut-based minimization method is developed to minimize the pair of energies.

2 Related work

In image segmentation, many minimization techniques have been used to achieve accurate object segmentation. Boykov et al. in [22] used graph cut algorithm to minimize the energy in MRF-based segmentation model. In the work of Meng et al. [16], the active contour-based energy function was minimized by level set techniques and the method of calculus of variations. In [17], the shortest path algorithm achieved by dynamic programming method was used for object segmentation. In the work of Zeng et al. [23], a hybrid extended Kalman filter and switching particle swarm optimization algorithm were proposed for model parameter estimation. In [24], a new particle filter was developed to simultaneously identify both the states and parameters of the model. In [25], Zineddin et al. presented a new image reconstruction algorithm using the cellular neural network that solves the Navier-Stokes equation, which offered a robust method for estimating the background signal within the gene spot region.

In the existing co-segmentation methods, co-segmentation is commonly modeled as an optimization problem, which introduces foreground similarity to fit common object segmentation. For MRF-based co-segmentation model, the energy function is usually defined as

E = U_{pixel} + V_{pair} + G_{global}

(1)

where U_{pixel} is the data term, which evaluates the potential of each pixel to belong to the foreground or the background, and V_{pair} is the smoothness term, which measures the smoothness of local pixels. These two terms are the same as in single-image segmentation. G_{global} is the global term, which evaluates the similarity between the foregrounds. By minimizing the energy function, only the common objects are extracted.

Although the global term makes the foreground similar, it also results in difficult minimization since searching the regions with similar appearance is challenging. The existing methods employ various global terms to cope with the minimizations. Rother et al. [5] used L1-norm to measure foreground similarity. The trust region graph cut method was proposed for energy optimization. Mukherjee et al. [6] replaced L1-norm with L2-norm. Pseudo-Boolean optimization was used for optimization. Instead of penalizing foreground difference, Hochbaum and Singh [7] rewarded foreground similarity. Vicente et al. in [8] modified the Boykov-Jolly model for foreground similarity measurement. Dual decomposition was employed for minimization.

Other methods have also been used for the co-segmentation task. Joulin et al. [10] segmented common objects by a clustering strategy. The main idea was that the common objects can be classified into the same class since they have similar features. Hence, co-segmentation was achieved by searching, with spectral clustering and positive definite kernels, for the classifier that best separated the common objects. Batra et al. [11] proposed an interactive co-segmentation method that segmented common objects through human interaction guided by an automatic recommendation system. Mukherjee et al. [12] proposed a scale-invariant co-segmentation method based on the fact that the rank of the matrix corresponding to the foreground regions should be equal to 1. The algorithm of Chang et al. [13] solved co-segmentation with a novel global energy term that used a co-saliency model to measure foreground potentials. The energy function, which considered both foreground similarity and background consistency, was submodular and could be minimized efficiently by the graph cut algorithm. Vicente et al. [14] focused on co-segmentation of interesting objects. A useful feature for distinguishing the common objects was learned from a total of 33 features through random forest regression, and the common objects were segmented by loopy belief propagation on a fully connected graph. Kim et al. [15] solved the multi-class co-segmentation problem by anisotropic heat diffusion. By combining a clustering method with the random walk segmentation method, multiple classes could be labeled successfully in a large number of images. Recently, Joulin et al. [19] addressed multi-class co-segmentation by taking both discriminative clustering and multi-class segmentation into account, and more accurate segmentation results were obtained. Collins et al. [18] solved co-segmentation with a random walker-based segmentation method, which added foreground consistency to the traditional random walker-based method. Compared with MRF-based co-segmentation, the random walker-based co-segmentation method was efficient. Rubio et al. [26] segmented common objects by correcting wrongly segmented regions using the other, successful segmentations. Their co-segmentation framework was formulated as an MRF, and a new global term based on graph matching was proposed. In the work of Meng [17], co-segmentation from a large number of original images with similar backgrounds was considered. A digraph was constructed from foreground similarity and saliency values, and the co-segmentation problem was formulated as a shortest path problem and solved by dynamic programming.

3 The proposed model

In this section, we first introduce the GrabCut method. Then, the proposed method is illustrated.

3.1 GrabCut segmentation

GrabCut is an interactive image segmentation method that has been widely used in many computer vision tasks. In GrabCut, image segmentation is a labeling problem that assigns a label α_i ∈ {0,1}, i=1,⋯,N, to each image pixel z_i, with α_i=1 for foreground and α_i=0 for background, where N is the number of pixels. The labeling problem is then posed as an optimization problem that minimizes the energy function

E (α, k, θ, z) = U (α, k, θ, z) + V (α, z)

(2)

where α=(α_1,…,α_N), z=(z_1,…,z_N), and θ describes the foreground and background appearance models of the image, represented as

θ = \{ h (z; α), α = 0, 1 \}

(3)

where α=0 for the background model and α=1 for the foreground model. h is the appearance model, which is represented as a Gaussian mixture model (GMM). A full-covariance Gaussian mixture with K components is used. With a GMM for the foreground or the background, each pixel z_i is assigned a unique GMM component k_i, either from the background or from the foreground model according to α_i=0 or 1, where k=(k_1,…,k_N) and θ={Π(α,k_i), μ(α,k_i), Σ(α,k_i), α=0,1, i=1,⋯,N}. Here, Π(·) are the mixture weighting coefficients, and μ(·) and Σ(·) are the means and covariances of the Gaussian distributions p(·).

The data term U(α,k,θ,z) in Equation 2 evaluates the fit of the label α to the data z given θ and k and is represented as

U (α, k, θ, z) = \sum_{n} D (α_{n}, k_{n}, θ, z_{n})

(4)

where n indexes the pixels and

D (α_{n}, k_{n}, θ, z_{n}) = - log p (z_{n} | α_{n}, k_{n}, θ) - log Π (α_{n}, k_{n}) .

(5)
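The per-pixel data term in Equation 5 can be illustrated with a minimal sketch. For brevity it assumes a diagonal covariance per component, whereas the model above uses full covariances; the function names and the `(weight, mean, variance)` tuple layout are hypothetical. Choosing the component with the smallest data term mirrors the way GrabCut assigns k_n to each pixel.

```python
import math

def neg_log_gaussian_diag(z, mean, var):
    """-log N(z; mean, diag(var)) for one pixel (diagonal covariance is an assumption)."""
    total = 0.0
    for zc, mc, vc in zip(z, mean, var):
        total += 0.5 * math.log(2.0 * math.pi * vc) + 0.5 * (zc - mc) ** 2 / vc
    return total

def data_term(z, weight, mean, var):
    """D = -log p(z | alpha, k, theta) - log Pi(alpha, k) for one GMM component."""
    return neg_log_gaussian_diag(z, mean, var) - math.log(weight)

def assign_component(z, components):
    """Index k of the component with the smallest data term for pixel z."""
    costs = [data_term(z, w, m, v) for (w, m, v) in components]
    return min(range(len(costs)), key=costs.__getitem__)

# Two illustrative foreground components: a reddish one and a bluish one.
components = [
    (0.5, (200.0, 30.0, 30.0), (100.0, 100.0, 100.0)),
    (0.5, (30.0, 30.0, 200.0), (100.0, 100.0, 100.0)),
]
k = assign_component((190.0, 40.0, 35.0), components)  # reddish pixel -> component 0
```

Summing `data_term` over all pixels, each with its assigned component, gives the data term U of Equation 4.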

The smoothness term V(α,z) in Equation 2 encourages coherence in local regions and is defined as

V (α, z) = γ \sum_{(m, n) \in C} [α_{n} \neq α_{m}] \exp ( - β ∥ z_{m} - z_{n} ∥^{2} )

(6)

where [·] denotes the indicator function, which takes the value 1 if the predicate · holds and 0 otherwise, γ and β are constants, and C is the set of pairs of neighboring pixels. Two pixels are neighbors if they are adjacent horizontally, vertically, or diagonally.
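As a rough illustration of Equation 6, the sketch below evaluates the smoothness energy of a given labeling on a small image with an 8-connected neighborhood. The function name, the nested-list image layout, and the values of γ and β are assumptions made for the example only.

```python
import math

# Offsets that visit each unordered 8-neighbour pair exactly once.
HALF_NEIGHBOURHOOD = [(0, 1), (1, -1), (1, 0), (1, 1)]

def smoothness_energy(labels, image, gamma, beta):
    """V = gamma * sum over neighbour pairs of [a_n != a_m] * exp(-beta * ||z_m - z_n||^2)."""
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for r in range(height):
        for c in range(width):
            for dr, dc in HALF_NEIGHBOURHOOD:
                rr, cc = r + dr, c + dc
                if not (0 <= rr < height and 0 <= cc < width):
                    continue
                if labels[r][c] == labels[rr][cc]:
                    continue  # the indicator is 0 when both labels agree
                dist2 = sum((a - b) ** 2 for a, b in zip(image[r][c], image[rr][cc]))
                energy += gamma * math.exp(-beta * dist2)
    return energy

# A uniform 2x2 image cut horizontally: four neighbour pairs cross the boundary.
flat = [[(10, 10, 10), (10, 10, 10)], [(10, 10, 10), (10, 10, 10)]]
cut_cost = smoothness_energy([[0, 0], [1, 1]], flat, gamma=1.0, beta=0.01)
```

A label boundary that runs through uniform colour pays the full γ per pair, while a boundary that follows a strong colour edge pays almost nothing, which is what encourages cuts along object contours.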

The segmentation is obtained by minimizing the energy in Equation 2:

\hat{α} = arg min_{α} E (α, k, θ, z)

(7)

With k and θ fixed, the problem in Equation 7 is solved by the minimum cut (graph cut) algorithm. In GrabCut, the energy minimization works iteratively: k and θ are updated from the current segmentation, and the new k and θ are used to obtain a new segmentation by solving Equation 7 again. The algorithm starts from a manually set initial curve. The iteration stops when a convergence criterion is satisfied.

3.2 The proposed method

Unlike the single-image GrabCut method, the proposed model considers a pair of images z^l, l=0,1. Let $z_{i}^{l}$ be the i th pixel in the l th image and $z^{l} = (z_{1}^{l}, \dots, z_{N^{l}}^{l})$, where N^l is the number of pixels in image l. The label for image z^l is α^l, l=0,1. The proposed method poses co-segmentation as a labeling problem that assigns 1 to pixels on the common objects and 0 otherwise. To segment common objects, we design a new unary term that takes foreground similarity into account, so that only common objects are favored as foreground. The unary term is defined as

U (α^{l}, k^{l}, θ^{l}, θ^{1 - l}, z^{l}) = \sum_{n} \begin{cases} λ \cdot D^{1} (α_{n}^{l}, k_{n}^{l}, θ^{l}, z_{n}^{l}) + (1 - λ) \cdot D^{2} (α_{n}^{l}, k_{n}^{1 - l}, θ^{1 - l}, z_{n}^{l}), & \text{if } α_{n}^{l} = 1; \\ D^{1} (α_{n}^{l}, k_{n}^{l}, θ^{l}, z_{n}^{l}), & \text{otherwise} \end{cases}

(8)

where θ^l and k^l are the parameter sets of the GMM representation of z^l, defined as in GrabCut, and λ is the scale factor that balances the influence of the foreground models of the current image and the other image. D¹ evaluates the fit of the label α^l to the data z^l given θ^l and k^l in the current image and is represented as

D^{1} (α_{n}^{l}, k_{n}^{l}, θ^{l}, z_{n}^{l}) = - log p (z_{n}^{l} | α_{n}^{l}, k_{n}^{l}, θ^{l}) - log Π (α_{n}^{l}, k_{n}^{l}) .

(9)

The foreground similarity term D² evaluates the similarity between the foregrounds and is defined as

D^{2} (α_{n}^{l}, k_{n}^{1 - l}, θ^{1 - l}, z_{n}^{l}) = - log p (z_{n}^{l} | α_{n}^{l}, k_{n}^{1 - l}, θ^{1 - l}) - log Π (α_{n}^{l}, k_{n}^{1 - l}) .

(10)

We use the GrabCut smoothness term in Equation 6 as the smoothness term of the proposed method, so that the energy E of image l is the sum of the unary term in Equation 8 and the smoothness term V(α^l, z^l). Co-segmentation then amounts to minimizing this energy for each image:

{\hat{α}}^{l} = arg min_{α^{l}} E (α^{l}, k^{l}, θ^{l}, k^{1 - l}, θ^{1 - l}, z^{l}) .

(11)

We can see from Equation 10 that D² evaluates the fit of the pixels with $α_{n}^{l} = 1$ in the current image to the foreground model θ^{1−l} of the other image. Pixels on the common objects have a small D² since they are similar to the common objects in the other image; hence, they tend to be assigned the label 1. Other pixels have a larger D² and therefore tend to be labeled as background.
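The effect of the foreground similarity term can be seen in a small numerical sketch of the per-pixel cost in Equation 8. The function names and the cost values are hypothetical, and the smoothness term is ignored, so this only shows which label the unary term alone would favor.

```python
def mutual_unary(alpha, d1, d2, lam=0.2):
    """Per-pixel unary cost in the style of Equation 8.

    alpha: candidate label (1 = foreground, 0 = background)
    d1:    cost D1 under the current image's own model for that label
    d2:    cost D2 under the other image's foreground model
    lam:   scale factor lambda
    """
    if alpha == 1:
        return lam * d1 + (1.0 - lam) * d2
    return d1

def prefers_foreground(d1_fg, d2_fg, d1_bg, lam=0.2):
    """True if the unary term alone favours label 1 (smoothness ignored)."""
    return mutual_unary(1, d1_fg, d2_fg, lam) < mutual_unary(0, d1_bg, 0.0, lam)

# A pixel that fits its own foreground model (D1 = 3) but not the other image's (D2 = 12).
distractor = prefers_foreground(d1_fg=3.0, d2_fg=12.0, d1_bg=8.0)
# A pixel that fits both foreground models (D1 = 3, D2 = 4).
common = prefers_foreground(d1_fg=3.0, d2_fg=4.0, d1_bg=8.0)
```

With these illustrative numbers, the first pixel would be labeled foreground by a single-image model (3 < 8) but is pushed to background once the other image's foreground model is taken into account (10.2 > 8), while the second pixel stays foreground (3.8 < 8).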

With k^l, θ^l, k^{1−l}, and θ^{1−l} fixed, the energy function is minimized by the minimum cut method. As in GrabCut, we iteratively update the foreground and background models to segment the common objects accurately. The main difference is that our model involves two images, so we adapt the iteration to update the foreground and background models of both images simultaneously. First, an initial curve is set on each image, and the initial segmentations are obtained by treating the pixels inside the curve as foreground and the pixels outside the curve as background. Then, based on the current segmentation, we estimate the foreground and background models θ^l and k^l for each image, which are used to compute the foreground and background potentials of each image according to Equations 9 and 10. Finally, we minimize the two energies according to Equation 11 to obtain the segmentation results, which serve as the initial segmentations for the next iteration. The algorithm stops when the stop condition is satisfied.

We next compare the proposed model with GrabCut. The difference is illustrated in Figure 2, where Figure 2a shows the GrabCut model, which involves a single image. An initial curve C⁰ separates the image Z⁰ into two regions: the region inside the curve and the region outside it. GrabCut treats the region inside the curve as foreground and the region outside the curve as background, and the GMMs of the foreground and the background, represented by k⁰ and θ⁰, are estimated from these two regions. A pixel (a blue point) is subject to two influences in the GrabCut model: the foreground model, represented by the green lines, and the background model, represented by the yellow lines. Based on these two influences, the pixel is given a label. GrabCut is sensitive to the initial curve setting because a change of the initial curve changes the parameters of the foreground and background models, which results in different segmentations. Hence, GrabCut relies on a manually selected initial curve.

The proposed model is represented in Figure 2b, where there are two images, Z⁰ and Z¹, rather than a single image. Each image has a curve, which again divides the image into the region inside the curve and the region outside it. As in the analysis of GrabCut in Figure 2a, we consider the blue points in Z⁰. Our model has three terms. The first two are the foreground model (the green line in Z⁰) and the background model (the yellow line) of the current image Z⁰; they correspond to the two terms in GrabCut. The third is the foreground model of Z¹. Since only the common objects share similar colors, pixels on the common objects have a large response to the third term, whereas background pixels have a small response and are therefore labeled as background. Hence, the pixels on the common objects are labeled as foreground.

Compared with GrabCut, our model introduces the third term, which leads to the segmentation of the common objects. The third term also makes the model robust to the initial curve setting. The initial curve of the current image may distort its foreground model; however, the other image can still provide an accurate foreground model when its curve C¹ covers most of the common object. The appearance model of the third term can then correct the labels of the pixels and lead to a successful segmentation. We therefore have to guarantee that the curve in the other image covers most of the area of the common objects. This is easily satisfied by setting the initial curve to a rectangle at a small distance from the image edge. This initial curve setting can be used for all image pairs, which means that the proposed method does not need a manually set initial curve. Note that other initial curve settings, such as saliency map-based or manual settings, can also be used.

In this paper, we set the initial curve as a rectangle at a small distance (ν=5) from the image edge; some examples are shown in Figure 3. The iteration stops when the difference between the old segmentation and the new segmentation is less than a threshold T_s. The proposed method is summarized in Algorithm 1.

Algorithm 1 The algorithm for MGrabCut
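Based on the description in Section 3.2, the mutual iteration might be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: `fit_models` (GMM estimation from a mask) and `graph_cut` (minimization of Equation 11 for one image given both models) are hypothetical callables supplied by the caller, images are nested lists, and the default threshold value is arbitrary.

```python
def initial_mask(height, width, margin=5):
    """Label 1 inside a rectangle `margin` pixels from each image edge, else 0."""
    return [[1 if margin <= r < height - margin and margin <= c < width - margin else 0
             for c in range(width)] for r in range(height)]

def changed_fraction(old, new):
    """Fraction of pixels whose label differs between two masks."""
    total = sum(len(row) for row in old)
    diff = sum(a != b for ro, rn in zip(old, new) for a, b in zip(ro, rn))
    return diff / total

def mutual_grabcut(images, fit_models, graph_cut, margin=5, threshold=1e-3, max_iters=9):
    """Alternate model fitting and graph-cut labelling on an image pair.

    fit_models(image, mask) -> model                              (hypothetical)
    graph_cut(image, own_model, other_model, mask) -> new mask    (hypothetical)
    """
    masks = [initial_mask(len(img), len(img[0]), margin) for img in images]
    for _ in range(max_iters):
        # Update the appearance models of both images from the current masks.
        models = [fit_models(img, m) for img, m in zip(images, masks)]
        # Segment each image using its own model and the other image's model.
        new_masks = [graph_cut(images[l], models[l], models[1 - l], masks[l])
                     for l in (0, 1)]
        change = max(changed_fraction(o, n) for o, n in zip(masks, new_masks))
        masks = new_masks
        if change < threshold:  # stop criterion T_s
            break
    return masks
```

Setting `threshold` to 0 makes the loop run for exactly `max_iters` iterations, which corresponds to using a fixed iteration count as the stop condition.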

4 Experimental results

In this section, we introduce the experimental results. The subjective results and objective results are illustrated.

4.1 Datasets

We use the co-saliency database given in [27]. The co-saliency database contains 105 image pairs which are collected from several well-known datasets, such as the Microsoft Research Cambridge image database, the Caltech-256 Object Categories database, and PASCAL VOC dataset. Each image pair contains a common object. All image pairs are considered in our method. Due to the complexities of the backgrounds and the changes of the foregrounds, the co-saliency dataset is challenging for co-segmentation task.

4.2 Results of the proposed method

We first introduce the parameter setting. In Equation 8, λ=0.2. For the GMMs, we set the number of Gaussian components to K=5 for the foreground model and K=3 for the background model. For simplicity, the stop condition is a fixed number of iterations, which we set to 9.

The results of the proposed method are shown in Figure 4, where the first row of each image block shows the original images and the fifth row shows the segmentation results of the proposed method. Although the original images have complex backgrounds, the proposed method successfully segments the common objects from them. For example, the ‘bus’ in the last image pair, schoolbus, is segmented from the original images although the backgrounds are complex.

We also compare our method with GrabCut [21] and several existing co-segmentation methods [10, 15]. Joulin et al. [10] proposed a co-segmentation model using discriminative clustering and spectral clustering. In the method, each labeling of the images defines a supervised classifier and thus a separation of the two classes, and the labeling that leads to the maximal separation is the co-segmentation result. The search problem is solved by relaxation to a continuous convex optimization problem. Superpixels are generated by the Ncuts method [28]. The results of the method in [10] are shown in the second row of each image block in Figure 4. The common objects are successfully segmented from the original images by [10] in some cases, such as the ‘boats’ in boats. There are also unsuccessful segmentations, such as the first image pair, kim, which are caused by the complexity and similarity of the backgrounds.

The method in [15] focuses on segmenting multiple common objects and uses color information to label similar objects. By applying linear anisotropic diffusion to co-segmentation, it models co-segmentation as a K-way segmentation problem that maximizes the temperature under anisotropic heat diffusion. A greedy algorithm is employed for optimization. In the experiment, we use the code released by the authors and adjust the intra-image Gaussian weights and the number of segments (K) to obtain more accurate results. The results of the method in [15] are shown in the third row of each image block in Figure 4. The method achieves successful segmentation on several classes, such as boats and faces2, but fails on others, such as kim and schoolbus, mainly because the complex background interferes with the extraction of the common objects.

For GrabCut, we use the same initial curve for a fair comparison. The results of GrabCut are shown in the fourth row of each image block in Figure 4. GrabCut successfully segments the common objects from some of the original images, such as the ‘car’ in the first image of car. There are also unsuccessful segmentations, such as the ‘butterfly’ in the first image of butterfly, where the red flower is also segmented as foreground. These failures occur because a single image does not provide enough information to distinguish the objects from the background. For example, the red flower is located inside the initial curve, so GrabCut segments it as foreground. MGrabCut segments the red flower as background since there is no similar region in the other image.

Furthermore, we show the segmentation results under different values of the scale factor λ, which balances the foreground potentials of the current image and the other image. The results for various λ are shown in Figure 5, where the original images are shown in the first column of each image block and the results with λ=0.1, 0.2, 0.3, 0.4, and 0.5 are shown in the second to the last columns, respectively. Six image pairs are shown. The proposed method is robust to λ, although adjusting λ leads to slight differences. A small λ results in a segmentation similar to single-image segmentation, which may contain redundant regions, such as in the segmentation of plane, while a large λ favors the segmentation of common objects but may lose several regions, such as in the segmentation of train. Hence, we set λ=0.2 as a trade-off between single-image segmentation and common object segmentation.

Figure 6 displays some segmentation results under different initial curve settings. Three image pairs are shown. For each image pair, we segment the common objects with three initial curve settings: initial curves that cover most parts of the common objects, initial curves that partially cover the common objects, and initial curves of which only one covers most parts of the common objects. The results of the three initial curve settings are shown in each row. From Figure 6, we can see that the proposed method achieves successful segmentation on these image pairs with various initial curve settings, which demonstrates that the proposed model is robust to the initial curve setting.

4.3 Objective results

We next present the objective evaluation. We evaluate the segmentation performance by the error rate, which is defined as the ratio of the number of wrongly segmented pixels to the total number of pixels. The error rate is small when the object is accurately segmented. Since there are 105 image pairs, we only show the error rates of 30 image pairs here. The error rates are shown in Table 1. The proposed method successfully segments the common objects in most of the image pairs. There are also several unsuccessful segmentations, such as ‘cdcora’ and ‘pvocsheepb’. These failures occur because the common objects have color variations, which violates our assumption that the common objects have similar colors.
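The error rate defined above is straightforward to compute from a predicted mask and a ground-truth mask. The following is a minimal sketch; the function name and the nested-list mask layout are assumptions for illustration.

```python
def error_rate(predicted, ground_truth):
    """Ratio of wrongly labelled pixels to the total number of pixels."""
    total = 0
    wrong = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, true_row):
            total += 1
            wrong += (p != g)
    return wrong / total

# One of four pixels is labelled incorrectly.
rate = error_rate([[1, 0], [0, 0]], [[1, 1], [0, 0]])
```

Because the measure counts both missed foreground pixels and false foreground pixels, an image with a small object can score a low error rate even when the object is entirely missed.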

Table 1 Comparison with the existing methods in terms of the error rate


The error rates of the existing methods in [10, 15, 21] are also shown for comparison. From the results, we can see that the proposed method achieves the lowest error rates in most of the image pairs. We also calculate the average error rate over all image pairs for comparison. The average error rates of the existing methods and the proposed method are shown in Figure 7. The proposed method obtains the smallest mean error rate, which demonstrates its effectiveness. Compared with the original GrabCut method [21], MGrabCut achieves lower error rates. The improvement results from considering foreground similarity.

The error rates with various λ are shown in Figure 8, where the error rate is shown in the y-axis. The x-axis displays different λ. We can see that the error rate is smallest when λ=0.2, which means that considering the foregrounds of both the current image and the other image will result in a more accurate co-segmentation.

4.4 Computational complexity analysis

In the proposed method, the minimization is achieved by the graph cut algorithm. Since there are two images, the computational cost of the proposed method is twice that of the graph cut algorithm, O(n log n), which is still O(n log n). Hence, the proposed method has the same computational complexity as existing graph cut-based segmentation methods such as [7, 21]. Moreover, because of the efficiency of graph cut minimization, the computational complexity of the proposed method is lower than that of other co-segmentation methods such as [10, 16, 29], as shown in Table 2.

Table 2 Computational complexities of the compared methods and the proposed method

5 Conclusions

This paper proposes a new co-segmentation model by extending GrabCut to MGrabCut. To segment common objects, we introduce the foreground appearance model of the other image into the unary term of the current image. Both foreground similarity and background consistency are considered in the design of the model. The common objects are finally segmented by mutually updating the foreground and background models of the two images. The experimental results demonstrate the effectiveness of the proposed method. In the future, we will extend the proposed model to handle groups of more than two images. Furthermore, other local features will be considered for more accurate segmentation.

References

1. Mori G, Ren X, Efros A, Malik J: Recovering human body configurations: combining segmentation and recognition. In IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society; 27 June–2 July 2004:326-333.
2. Gu C, Lim JJ, Arbeláez P, Malik J: Recognition using regions. In IEEE Conference on Computer Vision and Pattern Recognition. Florida: IEEE Computer Society; June 2009:20-25.
3. Jacobson N, Lee YL, Mahadevan V, Vasconcelos N, Nguyen T: A novel approach to fruc using discriminant saliency and frame segmentation. IEEE Trans. Image Process 2010, 19(11):2924-2934.
4. Jing F, Li M, Zhang HJ, Zhang B: Relevance feedback in region-based image retrieval. IEEE Trans. Circuits Syst. for Video Technol. 2004, 14(5):672-681. 10.1109/TCSVT.2004.826775
5. Rother C, Kolmogorov V, Minka T, Blake A: Cosegmentation of image pairs by histogram matching-incorporating a global constraint into mrfs. In IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Computer Society; June 2006:17-22.
6. Mukherjee L, Singh V, Dyer CR: Half-integrality based algorithms for cosegmentation of images. In IEEE Conference on Computer Vision and Pattern Recognition. Florida: IEEE Computer Society; June 2009:20-25.
7. Hochbaum DS, Singh V: An efficient algorithm for co-segmentation. In International Conference on Computer Vision. Kyoto: IEEE Inc.; 29 September–2 October, 2009.
8. Vicente S, Kolmogorov V, Rother C: Cosegmentation revisited: models and optimization. In European Conference on Computer Vision. Crete: Springer-Verlag; September 2010:5-11.
9. Batra D, Parikh D, Kowdle A, Chen T, Luo J: Seed image selection in interactive cosegmentation. In IEEE International Conference on Image Processing. Cairo: IEEE Computer Society; November 2009:7-10.
10. Joulin A, Bach F, Ponce J: Discriminative clustering for image co-segmentation. In IEEE Conference on Computer Vision and Pattern Recognition. San Francisco: IEEE Computer Society; June 2010:13-18.
11. Batra D, Kowdle A, Parikh D: Icoseg: interactive co-segmentation with intelligent scribble guidance. In IEEE Conference on Computer Vision and Pattern Recognition. San Francisco: IEEE Computer Society; June 2010:13-18.
12. Mukherjee L, Singh V, Peng J: Scale invariant cosegmentation for image groups. In IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs: IEEE Computer Society; June 2011:20-25.
13. Chang K, Liu T, Lai S: From co-saliency to co-segmentation: an efficient and fully unsupervised energy minimization model. In IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs: IEEE Computer Society; June 2011:20-25.
14. Vicente S, Rother C, Kolmogorov V: Object cosegmentation. In IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs: IEEE Computer Society; June 2011:20-25.
15. Kim G, Xing EP, Fei-Fei L, Kanade T: Distributed cosegmentation via submodular optimization on anisotropic diffusion. In International Conference on Computer Vision. Barcelona: IEEE Inc.; November 2011:6-13.
16. Meng F, Li H, Liu G, Ngan KN: Image cosegmentation by incorporating color reward strategy and active contour model. IEEE Trans. on Cybern. 2013, 43(2):725-737.
17. Meng F, Li H, Liu G, Ngan KN: Object co-segmentation based on shortest path algorithm and saliency model. IEEE Trans. Multimedia 2012, 14(5):1429-1441.
18. Collins M, Xu J, Grady L, Singh V: Random walks for multi-image cosegmentation: quasiconvexity results and gpu-based solutions. In IEEE Conference on Computer Vision and Pattern Recognition. Rhode Island: IEEE Computer Society; June 2012:16-21.
19. Joulin A, Bach F, Ponce J: Multi-class cosegmentation. In IEEE Conference on Computer Vision and Pattern Recognition. Rhode Island: IEEE Computer Society; June 2012:16-21.
20. Meng F, Li H, Liu G: Image co-segmentation via active contours. In IEEE International Symposium on Circuits and Systems. Seoul: IEEE Inc.; May 2012:20-23.
21. Rother C, Kolmogorov V, Blake A: Grabcut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph 2004, 23(3):309-314. 10.1145/1015706.1015720
22. Boykov YY, Jolly MP: Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images. In International Conference on Computer Vision. Vancouver: IEEE Inc.; July 2001:7-14.
23. Zeng N, Wang Z, Li Y, Du M, Liu X: A hybrid ekf and switching pso algorithm for joint state and parameter estimation of lateral flow immunoassay models. IEEE/ACM Trans. Comput. Biol. Bioinform 2012, 9(2):321-329.
24. Zeng N, Wang Z, Li Y, Du M, Liu X: Identification of nonlinear lateral flow immunoassay state-space models via particle filter approach. IEEE Trans. Nanotechnol 2012, 11(2):321-327.
25. Zineddin B, Wang Z, Liu X: Cellular neural networks, Navier-Stokes equation and microarray image reconstruction. IEEE Trans. Image Process 2011, 20(11):3296-3301.
26. Rubio J, Serrat J, López A, Paragios N: Unsupervised co-segmentation through region matching. In IEEE Conference on Computer Vision and Pattern Recognition. Rhode Island: IEEE Computer Society; June 2012:16-21.
27. Li H, Ngan KN: A co-saliency model of image pairs. IEEE Trans. Image Process 2011, 20(12):3365-3375.
28. Shi J, Malik J: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell 2000, 22(8):888-905. 10.1109/34.868688
29. Raviv TR, Sochen N, Kiryati N: Shape-based mutual segmentation. Int. J. Comput. Vis 2008, 79: 231-245. 10.1007/s11263-007-0115-3

Acknowledgements

This work is partially supported by the research fund of Sichuan Key Laboratory of Intelligent Network Information Processing (SGXZD1002-10), Sichuan Key Technology Research and Development Program (2012GZ0019, 2013GZX0155), and Xihua University Key Laboratory Development Program (szjj2011-021).

Author information

Authors and Affiliations

Center for Radio Administration & Technology Development, Xihua University, Chengdu, Sichuan, 610039, China
Zhisheng Gao & Zheng Pei
College of Engineering and Science, Victoria University, Melbourne, Victoria, 8001, Australia
Peng Shi
School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, South Australia, 5005, Australia
Peng Shi
Department of Engineering, Faculty of Technology and Science, University of Agder, 4898, Grimstad, Norway
Hamid Reza Karimi

Corresponding author

Correspondence to Hamid Reza Karimi.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Gao, Z., Shi, P., Karimi, H.R. et al. A mutual GrabCut method to solve co-segmentation. J Image Video Proc 2013, 20 (2013). https://doi.org/10.1186/1687-5281-2013-20

Received: 20 November 2012
Accepted: 14 March 2013
Published: 20 April 2013
DOI: https://doi.org/10.1186/1687-5281-2013-20
