Color space
In color image segmentation, the first step is to choose a color space. Common color models include RGB, HSI, HSV, CMYK, CIE, YUV, and others. The RGB model is the color model most commonly used by hardware, while the HSI model is the most commonly used for color processing; both are widely used in image processing [14, 15].
RGB space represents colors with the three primaries red, green, and blue; all other colors are mixtures of these three. The RGB model is represented in a Cartesian coordinate system, as shown in Fig. 1. The three axes stand for R, G, and B, respectively, and every point in this three-dimensional space corresponds to a triple of brightness values. Each brightness value lies between zero and one.
In Fig. 1, the origin is black, with value (0,0,0), while the vertex farthest from the origin, with value (1,1,1), is white. The straight line between black and white is called the gray line; along it the gray value changes from black to white. The remaining three corners represent the complementary colors of the three primaries: yellow, cyan, and magenta.
The three components of the RGB color space are highly correlated, and all of them change whenever the brightness changes. RGB is a non-uniform color space, so the perceived difference between two colors does not correspond to the distance between the two points in the color space. Thus, in image processing the RGB color space is often converted to other color spaces, such as HSI, HSV, CIE, and Lab, by linear or nonlinear transforms. However, the images we collect are usually in RGB, and color space conversion increases the amount of computation. Many segmentation methods therefore work directly in the RGB color space; for example, the license plate location method in [16] finds the plate area accurately by computing the contrast of the RGB components, reducing the computational cost.
The HSI color model, put forward by Munsell, fits human visual characteristics well. H (hue) denotes the color itself, S (saturation) denotes the depth of the color, and I (intensity) denotes its brightness. The model has two important characteristics: (1) the I component carries no color information of the image, and (2) the H and S components are closely linked to how humans perceive color. The model is therefore well suited to image processing that exploits the visual system's perception of color, and the H component is often used to segment color images. The model is shown in Fig. 2.
To process an image in HSI space, it must first be converted from RGB to the HSI model. The conversion formulas (obtained by geometric derivation) are given in Eq. (1):
$$ \begin{aligned} H &= \begin{cases} \theta, & G \geq B \\ 2\pi - \theta, & G < B, \end{cases} \quad \text{where}~ \theta = \cos^{-1}\left(\frac{(R-G)+(R-B)}{2\sqrt{(R-B)(G-B)+(R-G)^{2}}}\right), \\ I &= \frac{R+G+B}{3}, \\ S &= 1-\frac{3\min(R,G,B)}{R+G+B} = 1-\frac{\min(R,G,B)}{I}. \end{aligned} $$
(1)
As Eq. (1) shows, the transformation from the RGB model to the HSI model requires considerable computation. When the intensity is zero, the saturation is meaningless, and when the saturation is zero, the hue makes no sense. The conversion therefore produces a singularity in hue that cannot be eliminated [17]. This singularity can make nearby hue values discontinuous, so low-saturation pixels may be ignored during processing, leading to incorrect segmentation [18]. As noted above, HSI fits human visual characteristics, so many scholars have studied color image segmentation in the HSI model. Reference [19] used the saturation and intensity information of the HSI model for texture image segmentation, combining fractal theory with a BP neural network.
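As an illustration, the following minimal C++ sketch converts a single RGB pixel (components normalized to [0, 1]) to HSI according to Eq. (1). The function name and the small epsilon guards against the zero-intensity and zero-saturation singularities discussed above are our own illustrative choices, not part of the paper.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Minimal sketch: convert one RGB pixel (components in [0,1]) to HSI per Eq. (1).
// The epsilon guards for the singular cases are illustrative choices.
struct HSI { double h, s, i; };

HSI rgbToHsi(double r, double g, double b) {
    const double kPi = 3.141592653589793;
    const double eps = 1e-12;
    double i = (r + g + b) / 3.0;                        // intensity
    double minRGB = std::min({r, g, b});
    double s = (i > eps) ? 1.0 - minRGB / i : 0.0;       // saturation (undefined when I = 0)

    double num = (r - g) + (r - b);
    double den = 2.0 * std::sqrt((r - b) * (g - b) + (r - g) * (r - g));
    double h = 0.0;
    if (den > eps) {
        double theta = std::acos(std::clamp(num / den, -1.0, 1.0));
        h = (g >= b) ? theta : 2.0 * kPi - theta;        // hue in [0, 2*pi)
    }                                                     // hue undefined when S = 0 (gray pixel)
    return {h, s, i};
}

int main() {
    HSI p = rgbToHsi(0.8, 0.4, 0.2);
    std::printf("H = %.3f rad, S = %.3f, I = %.3f\n", p.h, p.s, p.i);
    return 0;
}
```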
Video image capture
Generally, there are two ways to capture video images: (1) using a video capture card together with its SDK development tools; this method depends on the capture card and the type of camera, so it is neither flexible nor universal; and (2) using Microsoft's Windows operating system with the VFW (Video for Windows) software development kit in Visual C++. The latter is a pure software approach to acquiring, inputting, and outputting video streams. It does not depend on the type of vision sensor and offers better flexibility and versatility [20, 21].
This paper uses OpenCV's CVCAM technology to capture, process, and play back (display) the video stream from the vision sensor, and likewise to read, process, and play back (display) video file streams.
Introduction to the open source Computer Vision Library (OpenCV)
OpenCV is an open source computer vision library, originally funded by Intel, composed of a series of C functions and C++ classes; it provides an easy-to-use computer vision framework and a rich set of routines. Its functions cover image processing, computer vision, pattern recognition, and artificial intelligence, including image processing, signal processing, structure analysis, motion detection, camera calibration, computer graphics, 3D reconstruction, and machine learning, and its many generic algorithms are highly efficient.
The OpenCV library has the following advantages:

1. Cross-platform: it runs on Windows, Linux, Mac OS, iOS, and Android, independent of the operating system, hardware, and graphics manager;

2. Free: it is open source, whether used for commercial or non-commercial applications;

3. Fast: it is written in C/C++, making it suitable for developing real-time applications;

4. Easy to use: it provides general modules for loading, saving, and retrieving images and video;

5. Flexible: it has good scalability, with both low-level and high-level application development kits.
OpenCV version 1.0 consists of the following six modules:

1. The CXCORE module: basic data structures and algorithm functions;

2. The CV module: main OpenCV functions;

3. The CVAUX module: experimental and auxiliary functions;

4. The HighGUI module: graphical interface functions;

5. The ML module: machine learning functions;

6. The CVCAM module: camera interface functions.
Because the OpenCV library functions are implemented in optimized C code, the code is not only simple and efficient but can also take full advantage of multi-core processors. Therefore, this paper uses the Visual C++ development environment and OpenCV for video image capture, processing, and display.
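A minimal sketch of camera capture, per-frame processing, and display is shown below. The paper works with OpenCV 1.0's CVCAM interface; this example uses the newer OpenCV C++ API (cv::VideoCapture, cv::imshow) as an illustrative stand-in, and the grayscale conversion is only a placeholder processing step.

```cpp
#include <opencv2/opencv.hpp>

// Capture frames from the default camera, apply a placeholder processing
// step, and display both the raw and the processed frames.
int main() {
    cv::VideoCapture cap(0);                            // open the default camera
    if (!cap.isOpened()) return -1;

    cv::Mat frame, gray;
    while (true) {
        if (!cap.read(frame) || frame.empty()) break;   // grab the next frame
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);  // example processing step
        cv::imshow("camera", frame);
        cv::imshow("processed", gray);
        if (cv::waitKey(30) == 27) break;               // stop on Esc
    }
    return 0;
}
```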
Video recognition
Video recognition mainly comprises three stages: front-end video acquisition and transmission, video retrieval, and back-end analysis and processing. It requires the front-end capture camera to provide a clear and stable video signal, because signal quality directly affects the recognition result. An embedded intelligent analysis module then detects, analyzes, and identifies the content of the video frames, filters out interference, and, when abnormal situations occur, marks and tracks targets in the frames. The intelligent video analysis module is based on the principles of artificial intelligence and pattern recognition; such research has already been applied in fire recognition systems [22].
Segmenting the flame object is a key problem in video-based fire recognition and has a direct impact on recognition accuracy [23]. Flame segmentation is based precisely on analyzing the characteristics of fire images. This paper introduces a new flame segmentation method based on an area threshold, using digital image processing and pattern recognition techniques. The method can then judge whether a fire has occurred from characteristic information such as flame color, spreading area, similarity change, and smoke. Experiments show that the method is robust: it effectively segments the flame from a sequence of images and reduces the false and missed alarms of the fire surveillance system, so it is effective even for complex, large outdoor scenes.
By analyzing surveillance video effectively, video recognition technology can detect a fire early so that it can be dealt with as soon as possible, reducing economic losses and protecting people's lives and property. Both economically and technically, video fire recognition has distinct advantages, and it will remain an important direction of future fire detection research.
Currently, depending on the research direction and the hardware devices used, video fire recognition follows several lines of research: analyzing only the static characteristics of the flame, such as its shape, color, and texture; analyzing dynamic characteristics such as similarity, spread trend, edge changes, overall movement, and layered changes; or combining dynamic analysis with some simple area-based criteria [24]. Approaches focused on dynamic characteristics compare two or more adjacent frames of the video to judge whether a flame is present, but their analysis of the properties of a single flame image is relatively weak. Approaches focused on static characteristics analyze the geometric properties of the flame in a single picture to reach a decision; this analysis is faster, but because it ignores the trend of the flame across several consecutive frames, its judgment inevitably contains errors.
To remedy these defects, and based on analysis of fire and image features, this paper proposes a new flame segmentation method based on an area threshold. The method not only removes noise but also extracts the target object rapidly and accurately. It can then judge whether a fire has occurred from characteristic information such as flame color, spreading area, similarity change, and smoke. Experimental results show that the method greatly improves the reliability and accuracy of fire judgment, reduces false alarms and missed detections, and shortens the recognition time.
Video segmentation
So-called video segmentation separates the object or objects in a video sequence that are important or of interest (the video object, VO) from the background; equivalently, it extracts regions with consistent attributes while distinguishing foreground from background. A video can be regarded as a kind of 3D image, i.e., a series of time-continuous 2D images. From a spatial point of view, video segmentation mainly uses both spatial and temporal information to pick out the independently moving regions frame by frame [25]. Video segmentation is the premise and foundation of other video processing tasks, such as video coding, video retrieval, and video database operations, and its quality directly affects the subsequent work. Research on video segmentation is therefore both important and challenging.
The main purpose of video segmentation is to separate the moving foreground of interest from the background. At present, there are many splitting methods, such as the image difference method, the time difference method, and the optical flow method. The image difference method subtracts a reconstructed background image from the original image to realize segmentation. The time difference method is based on the differences between successive frames, exploiting the relationship between the temporal and spatial domains. The optical flow method uses the optical flow characteristics of the moving object over time to extract and track it efficiently [26]. Compared with the other methods, the image difference method has low computational complexity, is less affected by lighting, places low demands on hardware, and detects objects well in most cases. Its key problem is how to reconstruct a complete background image. The background reconstruction method mentioned in the literature requires at least 25 video frames of pixel values at the same coordinates to reconstruct the background, which takes a long time and is not conducive to real-time segmentation. Since the moving foreground region generally has different gray values at the same coordinates in successive frames, the frame difference is larger in the foreground region than in the stationary background. Therefore, the foreground motion region can be obtained by computing the gray-level difference between successive frames.
At present, the general steps of video segmentation are as follows: first, the original video data is simplified and denoised to facilitate segmentation, which can be done by low-pass filtering, median filtering, or morphological filtering; next, features of the video image are extracted, including color, texture, motion, and frame difference; then, based on certain uniformity criteria, a segmentation decision is made from the extracted features to classify the pixels; finally, post-processing removes residual noise and extracts the boundaries accurately to obtain the final segmentation result.
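A minimal sketch of this pipeline, using a simple frame-difference feature with OpenCV's C++ API, is given below. The input file name, the fixed threshold of 30, and the 5 x 5 kernel are illustrative choices, not the paper's settings.

```cpp
#include <opencv2/opencv.hpp>

// Frame-difference segmentation sketch: denoise, difference, threshold,
// and morphological post-processing, mirroring the four general steps above.
int main() {
    cv::VideoCapture cap("input.avi");          // hypothetical input video
    if (!cap.isOpened()) return -1;

    cv::Mat prev, curr, gray, diff, mask;
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));

    while (cap.read(curr)) {
        cv::cvtColor(curr, gray, cv::COLOR_BGR2GRAY);
        cv::medianBlur(gray, gray, 5);                                 // step 1: simplify / denoise
        if (!prev.empty()) {
            cv::absdiff(gray, prev, diff);                             // step 2: frame-difference feature
            cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);     // step 3: classify pixels
            cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);      // step 4: post-process
            cv::imshow("foreground mask", mask);
            if (cv::waitKey(30) == 27) break;
        }
        prev = gray.clone();
    }
    return 0;
}
```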
The analysis of segmentation algorithms
The threshold segmentation method [27] is one of the most commonly used parallel region-based techniques and one of the most widely used methods in image segmentation. In essence, thresholding transforms an input image G into an output image F as follows:
$$ F(i,j)=\begin{cases} 1, & G(i,j) \geq T \\ 0, & G(i,j) < T. \end{cases} $$
(2)
Here, T is the threshold value: F(i,j)=1 for pixels belonging to the object and F(i,j)=0 for background pixels. Thus, the key of a threshold segmentation algorithm is determining the threshold. Once the threshold is determined, each pixel's gray value is compared with it and every pixel is classified in parallel, and the segmentation result directly yields the image regions. Threshold segmentation has the advantages of simple calculation, high efficiency, and high speed, and it has been widely used in applications where operating efficiency matters, such as hardware implementations. Scholars have studied many kinds of thresholding techniques, including global thresholds, adaptive thresholds, and optimal thresholds.
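For concreteness, a direct implementation of Eq. (2) over a grayscale buffer might look like the following sketch; the threshold T is supplied by the caller, and the paper's discussion of global, adaptive, and optimal thresholds concerns how that value is chosen.

```cpp
#include <cstdint>
#include <vector>

// Direct implementation of Eq. (2): label a pixel 1 if its gray value
// is >= T, and 0 otherwise.
std::vector<uint8_t> thresholdSegment(const std::vector<uint8_t>& gray,
                                      uint8_t T) {
    std::vector<uint8_t> out(gray.size());
    for (size_t k = 0; k < gray.size(); ++k)
        out[k] = (gray[k] >= T) ? 1 : 0;
    return out;
}
```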
In color image segmentation, we must also consider the color values of the pixels, i.e., the color information as well as the brightness, both of which influence the segmentation result. Many scholars have studied this problem.
Cheng and Quan [18] put forward a color image background difference method based on the HSI model. Exploiting the independence of hue (H), saturation (S), and intensity (I) in the HSI model, the method builds brightness information from the H and S components and extracts an accurate foreground using a dynamic threshold on that brightness information. Changes in illumination affect the accuracy of moving object detection, and this approach eliminates that influence in the HSI space. The results show that the method is robust to noise and illumination changes and handles brightness variation well, but transforming the color space to HSI increases the amount of computation.
Huang et al. [28] describe an algorithm for traffic sign segmentation. Considering the influence of illumination and of color space transformations, and after analyzing a large number of traffic sign pictures and the relationships among color pixels in the RGB color space, the paper proposes a traffic sign segmentation method based on the RGB model. The method copes well with noise and illumination in traffic sign segmentation, its results are precise, and it can run in real time, but it needs a large amount of traffic sign data to obtain the empirical thresholds.
In this paper, we carefully discuss the factors that influence image segmentation, including illumination, noise, and color space. An algorithm for color image segmentation based on color similarity in the RGB color space is presented: we compute the similarity of pixels with a color similarity measure, form a classification map, and finally obtain the segmentation.
Color sensor and color correction
Color sensor
Color has always played an important role in our life and production activities. The color of an object carries a great deal of information, but it is easily affected by many factors, such as the illuminating and reflected light, the light source azimuth, the observation direction, and the performance of the sensor [29]; a change in any of these parameters leads to a change in the observed color.
The standard method of color measurement uses a spectrophotometric colorimeter to measure the sample's tristimulus values and thus obtain its color. At present, color identification sensors fall into two basic types:
- RGB color sensors (red, green, blue), which mainly detect tristimulus values;

- Color difference sensors, which detect the difference between the color of the object under test and a standard color. Devices of this kind come in diffuse, through-beam, and optical fiber types and are packaged in various metal and polycarbonate housings.
RGB color sensors have two measurement modes. The first analyzes the proportions of red, green, and blue: no matter how the detection distance changes, only the light intensity changes, not the proportions of the three color components, so this mode can be used even where the target is subject to mechanical vibration. The second mode uses the reflected intensities of the red, green, and blue primaries directly; it can detect tiny color differences, but the sensor is affected by the mechanical position of the target. Most RGB color sensors have a teach-in function that makes setup very easy; such sensors usually have a built-in chart and a threshold that determines the operating characteristics. Color can be measured more accurately by using panchromatic color-sensitive devices together with correlation analysis. Typically, obtaining the color tristimulus values requires at least three photodiodes and three corresponding filters [30], so the structure and circuitry are complicated.
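The proportion-based mode can be illustrated by normalizing the three channel readings into chromaticity coordinates, which stay constant when only the overall light intensity changes; the small sketch below is our own illustration of that idea, not a sensor vendor's API.

```cpp
#include <array>

// Sketch of the proportion-based mode: the normalized chromaticity coordinates
// (r + g + b = 1) are unchanged when the overall light intensity is scaled.
std::array<double, 3> chromaticity(double R, double G, double B) {
    double sum = R + G + B;
    if (sum <= 0.0) return {0.0, 0.0, 0.0};   // no light detected
    return {R / sum, G / sum, B / sum};
}
```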
Partial color detection
For a color sensor, the main point is how to detect color correctly. There is a disparity between the real color of the object surface and the color captured by the imaging device. This is a color cast (partial color), caused by the surrounding environment, such as illumination and noise, and its degree is related to the color temperature of the ambient light. Color temperature [31] is a colorimetric description of the color of a light source: when the color of the light emitted by a source is the same as the color radiated by a black body at a certain temperature, that temperature is called the color temperature of the light.
Under different light sources, such as natural light, tungsten filament lamps, and halogen lamps, the same color does not look the same; the difference is caused by the different color temperatures of the sources. Generally, the image looks bluish when the color temperature of the light is higher and reddish when it is lower. How to make collected images correctly reflect the real colors is therefore a key research question.
Before correcting the color, we should determine whether the image has a color cast, how to detect it, and how severe it is. At present, there are several representative detection methods, including histogram statistics [32], the gray balance method [33], and the white balance method [33]; they can all detect whether an image has a color cast.
Histogram statistics reflect the overall color of the image by giving the average brightness of the three channels of the RGB color space. We can judge whether the original image has a color cast from the average brightness of the R, G, and B channels: if one component has the highest brightness, the whole image tends toward the color that component represents. For example, if the brightness of the G component is the largest, the whole image appears greenish. But because the causes of color cast are complex and application-dependent, this method has difficulty giving a comprehensive and accurate judgment.
The gray balance method assumes that the means of R, G, and B over the whole image are equal, so the image averages to a neutral "gray." It averages the brightness of each channel, converts the result to the Lab color space, obtains the corresponding Lab coordinates, computes the distance from the neutral point, and judges whether there is a color cast. But when the scene is very bright or very dark, or the image is dominated by a single color, the means of R, G, and B are not equal.
The white balance method deals with images containing specular reflections; it assumes that the specular (mirror-like) or white areas reflect the color of the light source. It takes the maximum brightness value of each channel, converts it to the Lab color space, obtains the corresponding Lab coordinates, computes the distance to the neutral point, and judges whether there is a color cast. But the result is distorted when the photographed scene contains no white or specular parts.
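A minimal sketch of the gray balance check using OpenCV might look like the following: it averages each channel, converts that average color to Lab, and measures the chromatic distance from the neutral axis. The file name and the cast threshold of 10 are illustrative values, not from the paper; the white balance variant would use the per-channel maxima instead of the means.

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <cstdio>

// Gray balance cast detection sketch: average each channel, convert the mean
// color to Lab, and measure the distance of (a, b) from the neutral point.
// In OpenCV's 8-bit Lab representation, neutral (a, b) is (128, 128).
int main() {
    cv::Mat img = cv::imread("photo.jpg");      // hypothetical input image
    if (img.empty()) return -1;

    cv::Scalar meanBGR = cv::mean(img);         // per-channel means (B, G, R)
    cv::Mat meanPixel(1, 1, CV_8UC3,
                      cv::Scalar(meanBGR[0], meanBGR[1], meanBGR[2]));
    cv::Mat lab;
    cv::cvtColor(meanPixel, lab, cv::COLOR_BGR2Lab);
    cv::Vec3b p = lab.at<cv::Vec3b>(0, 0);

    double da = p[1] - 128.0, db = p[2] - 128.0;
    double castDistance = std::sqrt(da * da + db * db);
    std::printf("chromatic distance from neutral: %.2f\n", castDistance);
    std::printf(castDistance > 10.0 ? "color cast detected\n" : "no obvious cast\n");
    return 0;
}
```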
Each of these methods is suitable only for a certain range of scenes, not for all of them, so measuring the degree of color cast only from the average image color or the maximum brightness is limited. People have therefore developed other detection methods for better results.
Color correction
After color cast detection, the next step is color correction. Color correction aims to describe the intrinsic color of an object under different lighting conditions, and it has been applied to medical images, remote sensing images, mural images, license plates, and many other kinds of images. There are some classic color correction methods, such as gray world color correction [34] and perfect reflection color correction [35].
Gray world color correction relies on the assumption that, for a colorful image, the statistical mean of every channel should be equal, so the average color is gray. We compute the mean of each channel of the captured image, keep the G component unchanged, and use the mean values of the R and B components as the basis for correction. However, this method cannot be used for an image dominated by a single color.
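A minimal sketch consistent with this description (keeping G fixed and rescaling R and B toward the G mean) is shown below. The gain-based formulation is one common variant of gray world correction, given here as an assumption rather than the paper's exact procedure.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Gray world correction sketch: keep the G channel fixed and rescale R and B
// so that their means match the G mean.
cv::Mat grayWorldCorrect(const cv::Mat& bgr) {
    cv::Scalar m = cv::mean(bgr);                     // means of B, G, R
    double gainB = (m[0] > 0.0) ? m[1] / m[0] : 1.0;  // bring B mean to G mean
    double gainR = (m[2] > 0.0) ? m[1] / m[2] : 1.0;  // bring R mean to G mean

    std::vector<cv::Mat> ch;
    cv::split(bgr, ch);
    ch[0].convertTo(ch[0], -1, gainB, 0);             // scale B (saturates at 255)
    ch[2].convertTo(ch[2], -1, gainR, 0);             // scale R

    cv::Mat out;
    cv::merge(ch, out);
    return out;
}
```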
Perfect reflection color correction is based on the fact that an object itself has no color; it shows color by absorbing, reflecting, and transmitting light of different wavelengths. If the object is white, all of the light is reflected, and such a white object or area is called a perfect reflector. The perfect reflection assumption takes the perfect reflector as the standard white in the image: whatever the illumination, the R, G, and B values of a white object in the image should be maximal. Taking the perfect reflector as the reference, the other colors are then corrected.
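Under this assumption, one simple implementation stretches each channel so that its maximum maps to full white. The sketch below uses the per-channel maxima as the white reference, which is a common simplification and not necessarily the exact procedure of [35].

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Perfect reflection correction sketch: treat the brightest response in each
// channel as the reflection of a perfect (white) reflector and stretch the
// channel so that this maximum maps to 255.
cv::Mat perfectReflectionCorrect(const cv::Mat& bgr) {
    std::vector<cv::Mat> ch;
    cv::split(bgr, ch);
    for (cv::Mat& c : ch) {
        double maxVal = 0.0;
        cv::minMaxLoc(c, nullptr, &maxVal);           // channel maximum
        if (maxVal > 0.0)
            c.convertTo(c, -1, 255.0 / maxVal, 0);    // stretch so max -> 255
    }
    cv::Mat out;
    cv::merge(ch, out);
    return out;
}
```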
These two correction methods are suitable for most color correction tasks, and their computation is relatively simple, but sometimes they cannot recover the true color of the object.
Given the varied application scenarios of color correction, many scholars have proposed novel methods. Luz et al. propose a method based on a Markov random field (MRF), which represents the relationship between color-depleted and color images in order to enhance image color for underwater imaging applications [36]. The parameters of the MRF model are learned from training data, and the most likely color distribution for each pixel of a given color-depleted image is then inferred by belief propagation (BP). This allows the system to adapt the color restoration algorithm to the current environmental conditions as well as to the task requirements. Colin et al. propose a method for correcting the color of multiview video sets as a preprocessing step before compression [37]. Unlike previous work, in which one of the captured views is used as the color reference, they correct all views to match the average color of the set of views. Block-based disparity estimation is used to find matching points between all views in the video set, and the average color is calculated for these matching points; a least-squares regression is then performed for each view to find a function that makes the view match the average color as closely as possible. Rizzi et al. propose a new algorithm for unsupervised enhancement of digital images with simultaneous global and local effects, called ACE (Automatic Color Equalization) [38]. It is based on a computational model of the human visual system that merges the two basic global equalization mechanisms, "Gray World" and "White Patch." Like the human visual system, ACE adapts to a wide range of lighting conditions and effectively extracts visual information from the environment. It has shown promising results in different equalization tasks, e.g., achieving color and lightness constancy, performing data-driven stretching of the image dynamic range, and controlling contrast. Yoon et al. use the temporal difference ratio of the HSV color channels to compensate for color distortion between consecutive frames [39]. Experimental results show that their method can be applied to consumer video surveillance systems to remove atmospheric artifacts without color distortion.