Image segmentation based on adaptive K-means algorithm
EURASIP Journal on Image and Video Processing, volume 2018, Article number: 68 (2018)
Image segmentation is an important preprocessing operation in image recognition and computer vision. This paper proposes an adaptive K-means image segmentation method that generates accurate segmentation results with simple operations and avoids the interactive input of the K value. The method first transforms the image into the LAB color space and sets the luminance component to a fixed value in order to reduce the effect of lighting on segmentation. Then, the equivalence between the K value and the number of connected domains that remain after thresholding is used to segment the image adaptively. After morphological processing, maximum connected domain extraction, and matching with the original image, the final segmentation results are obtained. Experiments show that the proposed method is simple, accurate, and effective.
Image segmentation refers to the decomposition of an image into a number of non-overlapping, meaningful areas that share the same attributes. It is a key technology in digital image processing, and the accuracy of segmentation directly affects the effectiveness of follow-up tasks. Given the complexity and difficulty of the problem, existing segmentation algorithms have achieved varying degrees of success, but research in this area still faces many challenges. Cluster analysis divides a data set into different groups according to a given criterion, so it is widely applied in image segmentation.
As one of the key technologies of digital image processing, image segmentation, combined with relevant domain knowledge, is widely used in machine vision, face recognition, fingerprint recognition, traffic control systems, locating objects (roads, forests, etc.) in satellite images, pedestrian detection, medical imaging, and many other fields, and it is worthy of in-depth study.
Building on the original K-means method, the algorithm proposed in this paper improves both the accuracy of image segmentation and the structure of the algorithm. The luminance component L in the LAB color space is fixed to filter out the influence of the background. In addition, a new method for determining the value of K in the K-means clustering algorithm is proposed. The proposed segmentation method has been applied to food images with good results.
Traditional image segmentation algorithms mainly include threshold-based methods, edge-based methods, and region-based methods. Because image segmentation is closely related to other disciplines in the information field, such as mathematics, pattern recognition, artificial intelligence, and computer science, many segmentation techniques that draw on theories and technologies from those disciplines have appeared. The improved algorithm proposed in this paper uses the theory of cluster analysis. Clustering is a basic human activity: from early childhood, people learn to distinguish different kinds of things by continually refining subconscious clustering patterns. Clustering methods are continually being proposed and improved. The proposed algorithm is based on classical K-means cluster analysis, which is widely used for clustering large-scale data because of its high efficiency, and many algorithms extend and improve on it. Compared with the traditional K-means method, the improved algorithm transforms the image into the LAB color space before segmentation and sets the luminance L to a fixed value to reduce the interference caused by the background. In addition, to select the K value, we compare the number of connected domains that meet the size requirement with an iteration variable; when the two are equal, the value of the iteration variable is taken as K. These improvements increase the accuracy of image segmentation and also simplify the algorithm structure to a certain extent.
In this paper, the proposed method is divided into the following steps: image normalization, color space conversion, adaptive K-means segmentation, and image morphology processing. Finally, the maximum connected domain algorithm is used to match the result against the original image. The workflow is shown in Fig. 1.
Before the formal processing of the image, we will first perform some necessary preprocessing on the image to meet the requirements of the subsequent steps and achieve faster and better segmentation.
Since the input images differ in size, they first need to be normalized. The photos are compressed to a certain extent so that subsequent segmentation runs faster while the image remains clear enough for the task.
L*a*b* color space conversion
The L*a*b* color space has a wide gamut: it contains all the colors of RGB and CMYK and can also represent colors that they cannot. The colors perceived by human eyes can be expressed by the L*a*b* model. In addition, the L*a*b* model compensates for the uneven color distribution of the RGB model, which has too many transitional colors between blue and green but lacks yellow and other colors between green and red. Therefore, we use L*a*b* when dealing with food images that need to retain as wide a color space as possible.
In extensive experiments, we found that food images inevitably have uneven backgrounds because of differences in conditions such as lighting and the color of the food itself, which seriously affects the segmentation results. Therefore, we set the L* component, the luminance component of L*a*b*, to a fixed value x.
First, the image must be converted from the RGB color space to the L*a*b* color space. Since RGB cannot be converted directly into L*a*b*, it is first converted into XYZ and then into L*a*b*, i.e., RGB–XYZ–L*a*b*. The conversion is therefore divided into two steps:
(1) RGB to XYZ.
Let r, g, and b be the three channels of a pixel, each in the range [0, 255]. The conversion formula is as follows:
where M is a 3 × 3 matrix:
The gamma function in the formula applies a nonlinear tone adjustment to the image in order to improve contrast; its exact form is not fixed.
(2) XYZ to L*a*b*
The three components L*, a*, and b* are then computed from the XYZ values:
where f is a calibration function similar to gamma, defined as follows.
Finally, the L* component in this method is set to the fixed value x. Figure 2 compares food images before and after processing.
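The two-step conversion can be sketched in a few lines of Python. This is a minimal illustration, not the paper's exact formulas: it assumes the commonly used sRGB matrix with a D65 white point, the sRGB transfer curve for gamma, and the CIE 1976 definition of f. The names `rgb_to_lab`, `fixed_l`, `M`, and `WHITE` are hypothetical.

```python
# Assumed constants: sRGB primaries with a D65 white point. The paper's own
# matrix M and gamma function may differ from these.
M = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)
WHITE = (0.95047, 1.0, 1.08883)  # assumed D65 reference white (Xn, Yn, Zn)


def gamma(u):
    """Undo the sRGB transfer curve for a channel value u in [0, 1]."""
    return ((u + 0.055) / 1.055) ** 2.4 if u > 0.04045 else u / 12.92


def f(t):
    """CIE 1976 calibration function used in the XYZ -> L*a*b* step."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(r, g, b, fixed_l=None):
    """Convert 8-bit r, g, b to (L*, a*, b*); optionally overwrite L*."""
    lin = [gamma(c / 255.0) for c in (r, g, b)]
    x, y, z = (sum(m * c for m, c in zip(row, lin)) for row in M)
    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), WHITE))
    l_star = 116.0 * fy - 16.0
    a_star = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    if fixed_l is not None:
        l_star = fixed_l  # plays the role of the fixed luminance value x
    return l_star, a_star, b_star
```

Calling `rgb_to_lab(r, g, b, fixed_l=x)` for every pixel leaves only the a* and b* channels to distinguish pixels, which is the effect the fixed luminance is meant to achieve.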
Adaptive K-means segmentation
The K-means algorithm is the most classical partition-based clustering method and one of the ten classical data mining algorithms. Its basic idea is to take K points in the space as cluster centers and to assign each object to the center closest to it. The centroid of each cluster is then updated iteratively until the best clustering result is obtained. K-means is a typical prototype-based clustering method: it takes the distance from each data point to its prototype as the objective function to be optimized, and the update rules of the iteration are derived by finding the extremum of that function. The algorithm uses Euclidean distance as the similarity measure and seeks the classification, for a given initial cluster center vector, that minimizes the evaluation index, for which the sum of squared errors is used as the clustering criterion function. Although K-means is efficient, the value of K must be given in advance, and it is very difficult to estimate. In many cases, it is not known in advance how many categories a given data set should be divided into.
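As an illustration of the assignment and update steps described above, the following is a minimal sketch of the classical (Lloyd) K-means iteration in plain Python, together with the sum-of-squared-errors criterion. The function names `kmeans` and `sse`, the random initialization, and the iteration cap are assumptions made for the example, not details taken from the paper.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k randomly sampled input points
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old center if a cluster became empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared errors, the criterion that K-means minimizes."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(pt, centers[lab]))
        for pt, lab in zip(points, labels)
    )
```

For image segmentation, `points` would hold one feature tuple per pixel, for example the (a*, b*) pair once the luminance has been fixed.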
As mentioned above, K-means is one of the classical partition-based clustering algorithms. It is efficient, but the need to determine the number of clusters K in advance makes automated computation difficult. Our method combines K-means with the maximum connected domain algorithm to determine K adaptively. In extensive experiments, we found that the value of K usually lies between 2 and 10. We use the maximum connected domain algorithm to recover images that each contain only one target object, record their number, and compare it with the K value to obtain the correct K. The algorithm steps are as follows:
As shown in the pseudo code of this adaptive K-means method, the K value starts at 2 and increases step by step up to 10, because in our experiments the number of clusters was mostly between 2 and 10. Determining the correct K value is the key to the success of the K-means method. We start with K = 2, that is, the image is first segmented into two clusters. We then determine the number of segmentation results with the maximum connected domain algorithm. If the number of images in the final segmentation result matches the K value, the K value has been selected correctly. If it does not match, K is increased until the two values match.
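The selection loop described above can be sketched as follows. This is only an outline under stated assumptions: `choose_k`, `segment`, and `count_regions` are hypothetical names, the two callables stand in for the K-means step and for the post-processing chain that counts valid connected domains, and the behavior when no K in the range matches is not specified in the text.

```python
def choose_k(image, segment, count_regions, k_min=2, k_max=10):
    """Return (k, segmentation) for the first K whose valid-region count equals K.

    `segment(image, k)` and `count_regions(segmentation)` are hypothetical
    callables standing in for K-means clustering and for the post-processing
    chain that counts connected domains above the size threshold.
    """
    for k in range(k_min, k_max + 1):
        segmentation = segment(image, k)
        if count_regions(segmentation) == k:
            return k, segmentation
    return None, None  # assumption: no K in the range matched


# Toy stand-ins: the "image" always yields three valid regions.
k, _ = choose_k(None, segment=lambda img, k: k, count_regions=lambda seg: 3)
```

In the toy call the loop rejects K = 2 and stops at K = 3, the first value that equals the region count.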
After segmentation by the K-means algorithm, we obtain the segmentation results of all target objects with little influence from the background, as shown in Fig. 3.
As Fig. 3 shows, many dark (bright) areas of the target object fall below (above) the selected threshold and are therefore misclassified. To correct this, additional morphological processing is required.
The median filter is effective against impulse noise and, in particular, preserves signal edges while filtering so that they are not blurred. Our experiments show that the median filter works well for removing noise in food image segmentation. We then binarized the filtered image with a threshold of 0.95.
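A small pure-Python sketch of these two operations is given below. The 3 × 3 window, the border handling, and the strict "greater than" comparison are assumptions chosen for the example; the text only fixes the threshold value of 0.95. The names `median_filter3` and `binarize` are hypothetical.

```python
from statistics import median


def median_filter3(img):
    """3x3 median filter on a 2-D list; border pixels use the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = median(window)
    return out


def binarize(img, threshold=0.95):
    """Map grey values in [0, 1] to 1 (white) if above threshold, else 0 (black)."""
    return [[1 if v > threshold else 0 for v in row] for row in img]
```

A single bright pixel surrounded by darker ones is replaced by the surrounding value, which is why impulse noise disappears while step edges stay sharp.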
We then invert the processed image (exchanging the black and white parts); many gaps can be observed in the white regions of the result. To ensure the accuracy of the subsequent steps, we perform a partial filling operation: black pixels enclosed by a connected domain of white pixels are converted to white so that each object is complete. The image after partial filling is shown in Fig. 4. Compared with the image in Fig. 3, the gaps and noise inside the connected domains have all been removed, and the image can be segmented into several complete parts.
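One common way to implement such a filling step is to flood-fill the background from the image border and turn every black pixel that was not reached into white. The sketch below takes that approach; the use of 4-connectivity for the background and the name `fill_holes` are assumptions, since the text does not say how the filling is implemented.

```python
from collections import deque


def fill_holes(mask):
    """Set to 1 every 0-pixel that is not 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    # Seed the flood fill with all black pixels on the border.
    queue = deque(
        (y, x)
        for y in range(h)
        for x in range(w)
        if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0
    )
    for y, x in queue:
        outside[y][x] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 0 and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    # White stays white; black pixels unreachable from the border become white.
    return [
        [1 if mask[y][x] == 1 or not outside[y][x] else 0 for x in range(w)]
        for y in range(h)
    ]
```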
Edge extraction with the Canny operator
The Canny operator [8, 9] is an optimized approximation operator, derived from the product of a signal-to-noise ratio measure and a localization measure, that extracts the contour of the target object in the image. Having removed the influence of noise in the previous step, we now apply the Canny operator to extract the edges of the target object. The steps are as follows:
(1) Calculate the gradient value and direction
Edges in an image can point in any direction, so the Canny algorithm uses four operators to detect horizontal, vertical, and diagonal edges. The method proposed in this paper uses the Sobel operator to return the first derivative values in the horizontal direction (Gx) and the vertical direction (Gy), from which the gradient magnitude G and direction θ of each pixel are determined. The formulas are as follows:
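In the standard formulation, the magnitude is G = sqrt(Gx² + Gy²) and the direction is θ = arctan(Gy / Gx). The sketch below computes both at one interior pixel with the usual 3 × 3 Sobel kernels; the kernel sign convention and the name `sobel_at` are assumptions made for the example.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_at(img, y, x):
    """Gradient magnitude G and direction theta (radians) at an interior pixel."""
    gx = sum(SOBEL_X[j][i] * img[y - 1 + j][x - 1 + i] for j in range(3) for i in range(3))
    gy = sum(SOBEL_Y[j][i] * img[y - 1 + j][x - 1 + i] for j in range(3) for i in range(3))
    # atan2 handles Gx = 0, where arctan(Gy / Gx) would divide by zero.
    return math.hypot(gx, gy), math.atan2(gy, gx)
```

For a vertical step edge the gradient points horizontally (θ = 0), and for a horizontal step edge it points vertically (θ = π/2).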
(2) Non-maximum suppression
Non-maximum suppression is an edge-thinning technique that sets all gradient values other than the local maximum to 0. The algorithm compares the gradient intensity of the current pixel with that of the two pixels along the positive and negative gradient directions. If the gradient intensity of the current pixel is larger than that of the other two pixels, the pixel is kept as an edge point; otherwise, it is suppressed.
For greater accuracy, linear interpolation between the two adjacent pixels that straddle the gradient direction is used to obtain the gradient values to be compared.
(3) Double threshold detection
The Canny algorithm used in this paper applies two thresholds, a high threshold and a low threshold, to classify the edge pixels. If the gradient of an edge pixel is larger than the high threshold, the pixel is considered a strong edge point. If the gradient is smaller than the high threshold but larger than the low threshold, the pixel is marked as a weak edge point. Points below the low threshold are suppressed.
(4) Hysteresis boundary tracking
The hysteresis boundary tracking algorithm checks the 8-connected neighborhood of each weak edge point and searches for all connected weak edges. As long as the neighborhood contains a strong edge point, the weak edge point is considered a real edge and is retained; otherwise, the weak edge point is suppressed.
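Double thresholding and hysteresis tracking are often implemented together as a breadth-first search that starts from the strong edge points. The sketch below follows that pattern; it assumes that weak points may also be reached through a chain of other weak points, and the name `hysteresis` and the handling of values exactly equal to a threshold are choices made for the example.

```python
from collections import deque


def hysteresis(grad, low, high):
    """Return a 0/1 edge map: strong pixels plus weak pixels 8-connected to them."""
    h, w = len(grad), len(grad[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if grad[y][x] > high:  # strong edge point
                edges[y][x] = 1
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny in range(max(0, y - 1), min(h, y + 2)):
            for nx in range(max(0, x - 1), min(w, x + 2)):
                # A weak point (low < g <= high) survives once a kept edge touches it.
                if not edges[ny][nx] and grad[ny][nx] > low:
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges
```

Weak points that never connect to a strong point are left at 0, which is the suppression described above.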
After these steps, the set of target-object images recovered with the background influence removed is shown in Fig. 5.
Maximum connected domain algorithm matching
Figure 6 shows that the output is still not ideal and needs manual screening, because small noise regions that are invisible to the naked eye remain in the image. However, there is a large size gap between these invalid images and the target images; that is, the smallest object that human eyes can distinguish is still much larger than the noise. After many experiments, the pixel count of the smallest effective object is set to n, and any connected domain smaller than n is treated as noise. The final processing result is shown in Fig. 7.
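The size filter on connected domains can be sketched as a breadth-first labelling pass that keeps only components with at least n pixels. The use of 8-connectivity and the name `large_components` are assumptions; the text only states that domains smaller than n are discarded.

```python
from collections import deque


def large_components(mask, min_pixels):
    """Return the 8-connected components of a 0/1 mask with at least min_pixels pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    kept = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            component, queue = [], deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if mask[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) >= min_pixels:  # smaller domains are treated as noise
                kept.append(component)
    return kept
```

The number of components returned is the count that the adaptive loop compares with K; each component's pixel list can also be used as a mask to cut the object out of the original image.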
Discussion and experiments
This article proposes an adaptive K-means algorithm. First, the picture is converted to the LAB color space; then the adaptive K-means algorithm segments it, with the value of K looping from 2 to 10. Next, the image is converted into a binary image by morphological operations. Finally, with the size threshold set, the maximum connected domains are extracted in each iteration to obtain the segmentation results. If the number of results equals the value of K at that moment, the iteration stops and these results are the final results.
Figure 8a is a picture containing a hand, a cake, and a cheesecake. It is converted to the LAB color space and the luminance component L is set to a fixed value; the result is shown in Fig. 8b. The K-means method is then called to cluster and segment Fig. 8b, which gives Fig. 8c. Because of the noise present in Fig. 8c, morphological processing is applied to it, which gives Fig. 8d. Next, the threshold is set to 489; the maximum connected domain images are extracted one by one and matched with the original picture to obtain the segmentation results shown in Fig. 8e–g. At this point the number of results is three, the same as K, so the iteration stops and these are the final results.
The segmentation process above is just one representative example from our experiments. We also carried out experiments on medical, animal, landscape, plant, and other images of different types, styles, and application areas. According to our experimental results, the adaptive K-means segmentation method proposed in this paper has practical value. Figure 9 shows some examples of segmentation results from these other fields.
The analysis of segmentation results
In order to measure the accuracy of the results segmented by the method proposed in this article, we compare them with the results of manual segmentation in Photoshop. The error rate is calculated as follows:
Here, Error is the error rate of the segmentation result, S is the number of pixels in the result segmented by the method in this article, and the second quantity is the number of pixels in the manually segmented result (Tables 1 and 2). The results of the experiment are as follows:
The experimental results show that the error rate between the segmentation result and the Photoshop segmentation result is acceptable, and the segmentation result is accurate.
Comparison with the watershed segmentation method
The proposed segmentation method is compared with an image segmentation method based on the watershed algorithm, and Intersection over Union (IOU) is used as the standard to evaluate the segmentation. The formula is as follows [12,13,14]:
The formula means the IOU is equal to the ratio of the overlapping area to the combined area of the original image and segmentation results.
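Written out, this ratio is IOU = |A ∩ B| / |A ∪ B| for two pixel sets A and B. A minimal sketch for two binary masks of equal size follows; the name `iou` and the convention of returning 1.0 when both masks are empty are assumptions made for the example.

```python
def iou(mask_a, mask_b):
    """Intersection over Union of two equally sized 0/1 masks."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if a and b else 0
            union += 1 if a or b else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0
```

Two masks that share one pixel out of three covered pixels give an IOU of 1/3, and identical non-empty masks give 1.0.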
We calculate the IOU value between the segmentation results of this paper (Fig. 10b) and the original (Fig. 10a), as well as the IOU value between the original images and the segmentation results of the watershed algorithm shown in Fig. 10c, and compare them. The larger the IOU value, the greater the similarity between the segmentation result and the original image. The test results are as follows:
The effect of L component in LAB color space on the segmentation results
During shooting, the image is inevitably affected by illumination, which produces shadows or overexposure and has a great influence on segmentation. To address this, this article converts the picture to the LAB color space. The L component represents luminance in the LAB color space and ranges from 0 to 100. The LAB color space image with the original L value is shown in Fig. 11a; after K-means clustering and the subsequent operations, the segmentation result is shown in Fig. 11b. When L is set to 0, the LAB color space image is shown in Fig. 11c, and the segmentation result is shown in Fig. 11d.
Comparing the two segmentation results, we find that when the luminance component is not fixed, a shadow whose color is close to that of the food in natural light is segmented as part of the food, and the segmentation result is inaccurate. When the L component is set to a fixed value, the shadow vanishes and the segmentation result is accurate.
The effect of morphological operation on the segmentation results
In order to distinguish the foreground from the background, the grayscale image produced by the adaptive K-means algorithm, shown in Fig. 12a, is processed by morphological operations. Different colors represent different gray values. The choice of binarization threshold has a great influence on the segmentation results.
When the threshold is set to 0.5, pixels with a gray value below 128 are set to black during binarization and pixels above 128 are set to white. The binarization results are shown in Fig. 12b.
When the threshold is set to 1.0, the image becomes entirely white and the foreground extraction fails.
Therefore, after many experiments, we set the threshold to 0.95; the morphological operation results are shown in Fig. 12c. The background is eliminated in the figure, and the hand and food parts remain intact.
The influence of the connected domain on the segmentation results
The ultimate goal of this paper is to separate the objects in the foreground of the picture so as to facilitate subsequent operations. The results of morphological processing are used to extract the maximum connected domains and match them against the original image. Because small noise regions in the image are also retained when the connected domains are extracted, matching the connected domains against the original image produces many useless pictures. Therefore, the connected domain threshold has a great influence on the accuracy of the segmentation result.
When the threshold is set to 300, the connected domain extraction results contain 4 images as shown in Fig. 13.
When the threshold is too large, the smaller components in the image are filtered out as noise. Therefore, after many experiments, we set the minimum number of effective pixels to 489; that is, any connected domain with fewer than 489 pixels is discarded. The result is shown in Fig. 14.
In this paper, we propose an accurate image segmentation algorithm that provides a technical basis for volume calculation. Compared with traditional methods, it has several advantages. Firstly, the way K is determined in the K-means method is optimized: a loop compares K with the number of connected domains that meet the size requirement in the final step, and when the two are equal, the K value has been selected correctly. Compared with traditional methods such as the elbow method, this saves code and running time. Secondly, transforming images into the LAB color space with a fixed L component is not part of traditional methods. Fixing this parameter filters out the influence of the background on the image, so that the final segmentation result is more accurate.
With the rapid development of science and technology, images are becoming more and more sophisticated. The image segmentation algorithm introduced in this paper is highly accurate, but its running time leaves room for improvement. In future work, we plan to strengthen the image preprocessing so as to greatly reduce the number of pixels and speed up the algorithm while preserving the image information.
IOU: Intersection over Union
SS Alamr, NV Kalyankar, SD Khamitkar, Image segmentation by using threshold techniques. Computer Science 2(5), 83–86 (2010)
J Fan, DY Yau, AK Elmagarmid, WG Aref, Automatic image segmentation by integrating color-edge extraction and seeded region growing. IEEE Trans. Image Process. 10(10), 1454–1466 (2001)
C Li, CY Kao, JC Gore, Z Ding, Minimization of region-scalable fitting energy for image segmentation. IEEE Trans. Image Process. 17(10), 1940–1949 (2008)
BS Everitt, S Landau, M Leese, Cluster analysis. Qual. Quant. 14(1), 75–100 (1980)
M Capó, A Pérez, JA Lozano, An efficient approximation to the K-means clustering for massive data. Knowl.-Based Syst. 117, 56–69 (2017)
HunterLab, Hunter L, a, b versus CIE 1976 L*a*b*. Applications Note 13, 1–4 (2018)
MW Schwarz, WB Cowan, JC Beatty, An experimental comparison of RGB, YIQ, LAB, HSV, and opponent color models. ACM Trans. Graph. 6(2), 123–158 (1987)
J Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986)
B Wang, SS Fan, An improved CANNY edge detection algorithm (2009)
W Zuo, Research on connected region extraction algorithms. Comp. Appl. Softw. 23(1), 97–98 (2006)
K Qazanfari, R Aslanzadeh, M Rahmati, An efficient evolutionary based method for image segmentation (2017)
PN Tan, M Steinbach, V Kumar, Introduction to Data Mining (2005). ISBN 0-321-32136-7
P Jaccard, Etude de la distribution florale dans une portion des Alpes et du Jura. Bull. De La Soc. Vaudoise Des Sci. Natur. 37(142), 547–579 (1901)
P Jaccard, The distribution of the flora in the alpine zone.1. New Phytol. 11(2), 37–50 (2010)
A Bieniek, A Moga, An efficient watershed algorithm based on connected components. Pattern Recogn. 33(6), 907–916 (2000)
About the Authors
Xin Zheng, PhD, female, majored in computer science and is an Associate Professor in the College of Information Science and Technology, Beijing Normal University, China. Her main research areas are computer graphics, image processing, and computational intelligence. She has published more than 50 academic papers in related areas and has undertaken and participated in 863 national projects and National Natural Science Foundation projects.
Qinyi Lei, female, is an undergraduate at Beijing Normal University majoring in electronic science and technology. She participates in image processing-related projects, has published several papers, and holds a national patent. She has a great interest in image processing and related fields.
Yao Run, female, is an undergraduate majoring in computer science and technology at Beijing Normal University. She has studied graphics, digital image processing, and computer vision and has a strong interest in image processing. She holds a national patent and has published papers in the field of image processing.
Yifei Gong, female, is an undergraduate at Beijing Normal University majoring in computer science and technology. She has studied graphics, digital image processing, pattern recognition, and related courses. She is one of the holders of the patent “Multi-view Model Contour Matching Based Food Volume Estimation” and has co-authored two other papers, one accepted by the AHFE conference and the other by the MCEI conference.
Qian Yin, female, majored in computer science and is an Associate Professor and master’s supervisor in the College of Information Science and Technology, Beijing Normal University. Her main research areas are image processing and computational intelligence. She has published more than 50 academic papers as well as a number of books and textbooks, and has undertaken and participated in 863 national projects and National Natural Science Foundation projects. Prof. Qian Yin is the corresponding author.
The research work described in this paper was fully supported by the grants from the National Key R&D program of China (2017YFC1502505) and the National Natural Science Foundation of China (Project No.61472043). Prof. Qian Yin is the author to whom all correspondence should be addressed.
There are no potential competing interests in our paper. All authors have seen the manuscript and approved its submission to the journal. We confirm that the content of the manuscript has not been published or submitted for publication elsewhere.