Red-Eyes Removal through Cluster-Based Boosting on Gray Codes
© Sebastiano Battiato et al. 2010
Received: 26 March 2010
Accepted: 29 July 2010
Published: 11 August 2010
With the wide diffusion of digital cameras and mobile devices with embedded camera and flashgun, red-eye artifacts have de facto become a critical problem. The technique described herein uses three main steps to identify and remove red eyes. First, red-eye candidates are extracted from the input image by using an image filtering pipeline. A set of classifiers is then learned on gray code features extracted in the clustered patch space and employed to distinguish between eye and non-eye patches. Specifically, for each cluster the gray code of the red-eye candidate is computed and some discriminative gray code bits are selected by a boosting approach. The selected gray code bits are used during classification to discriminate between eye and non-eye patches. Once red eyes are detected, artifacts are removed through desaturation and brightness reduction. Experimental results on a large dataset of images demonstrate the effectiveness of the proposed pipeline, which outperforms other existing solutions in terms of hit rate maximization, false positive reduction, and quality measure.
Alternatively, red eyes can be detected after photo acquisition. Some photo-editing software makes use of red-eye removal tools which require considerable user interaction. To overcome this problem, different techniques have been proposed in the literature (see [1, 2] for recent reviews in the field). Due to the growing interest of industry, many automatic algorithms, embedded in commercial software, have been patented in the last decade. The huge variety of approaches has made it possible to explore different aspects of red-eyes identification and correction. The big challenge now is to obtain the best results with the fewest visual errors.
In this paper, an advanced pipeline for red-eyes detection and correction is discussed. In the first stage, candidate red-eye patches are extracted from the input image through an image filtering pipeline. This process is mainly based on a statistical color model technique coupled with geometrical constraints. In the second stage, a multimodal classifier, obtained by using clustering and boosting on gray code features, is used to distinguish true red-eye patches from other patches. Once the red eyes are detected, a correction technique based on desaturation and brightness reduction is employed to remove the red-eye artifact. The proposed approach has been compared with existing solutions on a purposely collected dataset, obtaining competitive results. One of the main contributions of the present work is to demonstrate that better results are achieved if the multimodal nature of candidate red eyes and the spatial information are taken into account during the classification task. To this aim, we have compared the proposed cluster-based boosting with standard boosting in both cases, with and without considering spatial information.
The remainder of the paper is organized as follows. Section 2 gives an overview of related works. Section 3 provides details of the proposed red-eyes removal pipeline. Section 4 illustrates the experimental settings and the results obtained using the presented technique. Finally, Section 5 concludes this paper with avenues for further research.
2. Related Works
Several studies have considered the problem of automatic red-eyes removal. A pioneering technique for automatic red-eye reduction was proposed by Patti et al. The technique uses a nonstandard luminance-chrominance representation to enhance the regions affected by the red-eye artifacts. After the detection of a block of interest, a thresholding operation and a simple color replacement pipeline are employed to remove the red eyes.
Battiato et al.  have proposed to deal with the problem of red eye detection by using the bag-of-keypoints paradigm. It involves extraction of local image features, quantization of the feature space into a codebook through clustering, and extraction of codeword distribution histograms. An SVM classifier has been used to decide to which class each histogram, thus each patch, belongs.
Gaubatz and Ulichney proposed to apply a face detector as a first stage and then search for the eyes in the candidate face regions by using constraints based on colors, red variance, and glint. One of the drawbacks of such a method is its limited robustness to the multimodality of the face space with respect to pose (e.g., faces are not always frontal and upright). The redness of detected red eyes is attenuated with a tapered color desaturation process.
Schildkraut and Gray used an automatic algorithm to detect pairs of eyes, which is restricted to near-frontal face images. The pair verification technique was used to reduce false positives. However, many photos contain a single red eye (e.g., face partially screened) that cannot be corrected with this approach. Detected red eyes are removed by blending the corrected region with the neighborhood in order to preserve the natural appearance of the eyes. Building on , a combination of boosting classifiers has been proposed by Ioffe. Specifically, one boosting classifier was used to discriminate between red eyes and other regions, and another boosting classifier was used to detect faces in order to reduce the false positives.
A two-stage algorithm was described by Zhang et al. At the first stage, red pixels are grouped, and a cascade of heuristic algorithms dealing with color, size, and highlight is used to decide whether the grouped region is a red eye or not. At the second stage, candidate red-eye regions are checked by using an Adaboost classifier. Though highlight is useful for red-eyes detection, red eyes with no highlight region may occur when the eye is not directed toward the camera/flash light. Artifacts are corrected through brightness and contrast adjustment followed by a blending operation.
Luo et al.  proposed an algorithm that first uses square concentric templates to assess the candidate red-eye regions and then employs an Adaboost classifier coupled with a set of adhoc selected Haar-like features for final detection. Multiscale templates are used to deal with the scale of red-eyes patches. For each scale, a thresholding process has been used to determine which pixels are likely to be red-eye pixels. The correction process is mainly based on adaptive desaturation over the red-eye regions.
Petschnigg et al.  presented a red-eyes detection technique based on changes of pupil color between the ambient image and the flash image. The technique exploits two successive photos taken with and without flash considered into YCbCr space to decorrelate luminance from chrominance. The artifacts are detected by thresholding the differences of the chrominance channels and using geometric constraints to check size and shape of red regions. Detected red eyes are finally corrected through thresholding operation and the color replacement pipeline proposed in .
A wide class of techniques makes use of geometric constraints to restrict possible red-eye regions in combination with reliable supervised classifiers for decision making. Corcoran et al. proposed an algorithm for real-time detection of flash eye defects in the firmware of a digital camera. The detection algorithm comprises different substeps in the Lab color space to segment artifact regions that are finally analyzed with geometric constraints.
The technique proposed by Volken et al.  detects the eye itself by finding the suitable colors and shapes. They use the basic knowledge that an eye is characterized by its shape and the white color of the sclera. Combining this intuitive approach with the detection of "skin" around the eyes, red-eyes artifacts are detected. Correction is made through an adhoc filtering process.
Safonov et al.  suggested a supervised approach taking into account color information via 3D tables and edge information via directional edge detection filters. In the classification stage, a cascade of supervised classifiers has been used. The correction consists in conversion of pixel to gray color, darkening and blending with the initial image in the YCbCr color space. The results were evaluated by using an adhoc detection quality criterion.
Alternatively, an unsupervised method to discover red-eye pixels was adopted by Ferman. The analysis is performed primarily in the hue-saturation-value (HSV) color space. A flash mask, used to define the regions where red-eye artifacts may be present, is first extracted from the brightness component. Subsequent processing on the other color components prunes the number of candidate regions that may correspond to red eyes. Though the overall results are satisfactory, this approach is not able to identify red-eye regions outside the flash mask area (i.e., a very common case).
3. Red-Eyes Detection and Correction
The proposed red-eyes removal pipeline uses three main steps to identify and remove red-eyes artifacts. First, candidate red-eye patches are extracted; then they are classified to distinguish between eye and non-eye patches. Finally, correction is performed on the detected red eyes. The three steps of the proposed pipeline are detailed in the following subsections.
3.1. Red Patch Extraction
Once the closing operation has been accomplished, a search for the connected components is performed using a simple scanline approach. Each group of connected pixels is analyzed making use of simple geometric constraints. As in , the detected regions of connected pixels are classified as possible red-eye candidates if the geometrical constraints of size and roundness are satisfied. Specifically, a region of connected red pixels is classified as a possible red-eye candidate if the following constraints are satisfied:
The parameters involved in the aforementioned filtering pipeline have been set through a learning procedure as discussed in Section 4.
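The candidate extraction step above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it labels connected regions with a breadth-first flood fill instead of the scanline labeling mentioned in the text, and the size and roundness tests, together with every threshold (`min_size`, `max_size`, `min_aspect`, `min_fill`), are hypothetical stand-ins for the learned constraints.

```python
from collections import deque

def connected_components(mask):
    """Group 8-connected truthy pixels of a 2D list; return lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def is_candidate(pixels, min_size=3, max_size=40, min_aspect=0.5, min_fill=0.5):
    """Hypothetical size and roundness test based on the region's bounding box."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    h = max(ys) - min(ys) + 1
    w = max(xs) - min(xs) + 1
    size_ok = min_size <= max(h, w) <= max_size
    aspect_ok = min(h, w) / max(h, w) >= min_aspect   # roughly square box
    fill_ok = len(pixels) / (h * w) >= min_fill       # region fills its box
    return size_ok and aspect_ok and fill_ok

mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
candidates = [c for c in connected_components(mask) if is_candidate(c)]
```

In this toy mask the compact 3x3 blob passes the tests, while the elongated one-pixel-high line is rejected by the aspect constraint.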
where ⊕ denotes the exclusive OR operation. This code has the unique property that successive code words differ in only one bit position. Thus, small changes in gray level are less likely to affect all bit planes.
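The one-bit property can be checked with a short Python sketch. It uses the standard shift-and-XOR formulation of the gray code, in which each bit is XORed with the next more significant one; the function names are illustrative and are not taken from the paper.

```python
def to_gray(value):
    """Gray code of a non-negative integer: each bit XORed with the next higher bit."""
    return value ^ (value >> 1)

def bit_planes(value, bits=8):
    """Bits of value, from most to least significant."""
    return [(value >> i) & 1 for i in range(bits - 1, -1, -1)]

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Intensities 127 and 128 differ in all 8 binary bit planes,
# but in a single gray code bit.
print(hamming(127, 128), hamming(to_gray(127), to_gray(128)))  # 8 1
```

With the plain binary representation, a one-level change in intensity from 127 to 128 flips every bit plane, whereas only one gray code bit changes.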
3.2. Red Patch Classification
The main aim of the classification stage is the elimination of false positive red eyes from the set of patches obtained by the filtering pipeline described in Section 3.1.
At this stage, we deal with a binary classification problem. Specifically, we want to discriminate between eye and non-eye patches. To this aim, we employ an automatic learning technique to make accurate predictions based on past observations. The approach we use can be summarized as follows: start by gathering as many examples as possible of both eye and non-eye patches; next, feed these examples, together with labels indicating whether they are eyes or not, to a machine-learning algorithm which will automatically produce a classification rule. Given a new unlabeled patch, such a rule attempts to predict whether it is an eye or not.
Building a rule that makes highly accurate predictions on new test examples is a challenging task. However, it is not hard to come up with rough weak classifiers that are only moderately accurate. An example of such a rule for the problem under consideration is the following: "If the pixel located in the sclera region of the patch under consideration is not white, then predict it is non-eye". In this case, the rule is related to the knowledge that the white region corresponding to the sclera should be present in an eye patch. On the other hand, such a rule does not cover all possible non-eye cases; for instance, it says nothing about what to predict if the pixel is white. Still, this rule will make predictions that are significantly better than random guessing. The key idea is to find many weak classifiers and combine them in a proper way to derive a single strong classifier.
The rationale behind the use of the gray code representation is the following. In the gray code space, just a subset of all possible bit combinations is related to eye patches. We wish to select those bits that usually differ in binary value between eye and non-eye patches. Moreover, by using the gray code representation rather than the classic bit planes decomposition, we reduce the impact of small changes in the intensity of patches that could produce significant variations in the corresponding binary code.
The approach described above does not take into account the spatial relationship between selected gray code bits. Spatial information is useful to make the classification task stronger (e.g., the pupil is surrounded by sclera). To overcome this problem, we couple the gray code bits selected at the first learning stage using the XOR operator to obtain a new set of binary features. We randomly select a subset of these features and perform a second round of the Gentleboost procedure to select the most discriminative spatial relationships among the randomly selected ones. This new classifier is combined with the one learned previously to perform the final eye and non-eye patch classification.
Due to the multimodal nature of the patches involved in our problem (i.e., colours, orientation, shape, etc.), a single discriminative classifier could fail during the classification task. To overcome this weakness, we propose to first perform a clustering of the input space and then apply the two-stage boosting approach described above on each cluster. More specifically, during the learning phase, the patches are clustered by using K-means in their original color space, producing the subsets of the input patches with the relative prototypes; hence, the two stages of boosting described above are performed on each cluster. During the classification stage, a new patch is first assigned to a cluster according to the closest prototype and then classified taking into account the two additive models learned for the cluster under consideration.
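A minimal Python sketch of the XOR coupling is given here for illustration. It assumes that the bits selected in the first stage are available as a flat list of 0/1 values per patch; the function names, the `num_features` argument, and the fixed random seed are hypothetical.

```python
import random
from itertools import combinations

def xor_pair_features(num_bits, num_features, seed=0):
    """Randomly pick index pairs among the selected bits; each pair is one XOR feature."""
    pairs = list(combinations(range(num_bits), 2))
    rng = random.Random(seed)
    return rng.sample(pairs, min(num_features, len(pairs)))

def apply_xor_features(bits, pairs):
    """Evaluate the XOR features on the selected gray code bits of one patch."""
    return [bits[i] ^ bits[j] for i, j in pairs]

pairs = xor_pair_features(num_bits=5, num_features=4)
features = apply_xor_features([1, 0, 1, 1, 0], pairs)
```

Each XOR feature is 1 when the two coupled bits disagree, so it encodes a relationship between two positions of the patch rather than the value at a single position.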
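The classification stage described above can be sketched in Python as follows. This is only an illustration under stated assumptions: prototypes and patches are flat lists of numbers, each cluster stores its two additive models as callables, the two scores are simply summed, and labels are +1 (eye) and -1 (non-eye). None of these choices is confirmed by the text.

```python
def nearest_prototype(patch, prototypes):
    """Index of the prototype with the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(prototypes)), key=lambda k: sq_dist(patch, prototypes[k]))

def classify(patch, prototypes, cluster_models):
    """Route the patch to its cluster, then combine that cluster's two models."""
    k = nearest_prototype(patch, prototypes)
    bit_model, xor_model = cluster_models[k]
    score = bit_model(patch) + xor_model(patch)  # assumption: scores are summed
    return (1 if score > 0 else -1), k

# Toy setup: two clusters with constant-score models (illustrative only).
prototypes = [[0, 0, 0], [255, 0, 0]]
cluster_models = [
    (lambda p: -1.0, lambda p: 0.2),
    (lambda p: 0.7, lambda p: 0.1),
]
label, cluster = classify([200, 10, 10], prototypes, cluster_models)
```

The point of the routing step is that each pair of models only has to separate eyes from non-eyes within one mode of the input space.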
Experimental results reported in Section 4 confirm the effectiveness of the proposed strategy.
3.3. Boosting for Binary Classification Exploiting Gray Codes
where is the class label associated with the feature vector g. In this work, is associated with the eye class whereas is the label associated with the non-eye class. The cost function in (9) can be thought of as a differentiable upper bound of the misclassification rate.
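For reference, the exponential cost commonly used in the additive logistic regression view of boosting (Friedman et al.) is written out below. The symbols y, H, h_m, and M, as well as the label convention y ∈ {-1, +1}, are assumptions, since the notation of the original equation is not available here.

```latex
\[
J = E\left[ e^{-y\,H(\mathbf{g})} \right], \qquad
H(\mathbf{g}) = \sum_{m=1}^{M} h_m(\mathbf{g}), \qquad y \in \{-1, +1\}
\]
\[
\mathbb{1}\left[\, y\,H(\mathbf{g}) < 0 \,\right] \;\le\; e^{-y\,H(\mathbf{g})}
\]
```

The second line shows why such a cost bounds the misclassification rate from above: whenever the sign of H(g) disagrees with y, the exponent is nonnegative and the exponential is at least 1, and otherwise the exponential is still positive.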
The procedures employed for learning and classification on the proposed representation are summarized in Algorithm 1 and Algorithm 2. In the learning stage, we initialize the weights corresponding to the elements of the training set such that the number of the samples within each class is taken into account. This is done to overcome the problems that can occur due to the unbalanced number of training samples within the considered classes.
Algorithm 1: Learning.
Algorithm 2: Classification.
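Since the listings of the two algorithms are not reproduced here, the following Python sketch shows one plausible form of learning and classification with Gentleboost on binary features. It is an assumption-laden illustration, not the authors' algorithm: weak learners are regression stumps on a single bit, labels are +1 and -1, and the class-balanced weight initialization gives each class the same total weight.

```python
import math

def init_weights(labels):
    """Give each class the same total weight, split evenly among its samples."""
    counts = {c: labels.count(c) for c in set(labels)}
    return [1.0 / (len(counts) * counts[y]) for y in labels]

def fit_stump(features, labels, weights, f):
    """Weighted least-squares fit of h(g) = a if g[f] == 1 else b."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # bit value -> [sum of w*y, sum of w]
    for g, y, w in zip(features, labels, weights):
        sums[g[f]][0] += w * y
        sums[g[f]][1] += w
    a = sums[1][0] / sums[1][1] if sums[1][1] > 0 else 0.0
    b = sums[0][0] / sums[0][1] if sums[0][1] > 0 else 0.0
    err = sum(w * (y - (a if g[f] else b)) ** 2
              for g, y, w in zip(features, labels, weights))
    return err, a, b

def learn(features, labels, rounds):
    """Select one bit per round and return the additive model as (bit, a, b) triples."""
    weights = init_weights(labels)
    model = []
    for _ in range(rounds):
        err, a, b, f = min(fit_stump(features, labels, weights, f) + (f,)
                           for f in range(len(features[0])))
        model.append((f, a, b))
        # Gentleboost update: w <- w * exp(-y * h(g)), then renormalize.
        weights = [w * math.exp(-y * (a if g[f] else b))
                   for g, y, w in zip(features, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, g):
    """Sum of the selected stumps evaluated on the bit vector g."""
    return sum(a if g[f] else b for f, a, b in model)

def classify(model, g):
    """Return +1 (eye) if the additive score is positive, else -1 (non-eye)."""
    return 1 if score(model, g) > 0 else -1

features = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
labels = [1, 1, -1, -1, -1]
model = learn(features, labels, rounds=2)
```

At classification time only the selected bits are read and summed, which is consistent with the bit-comparison cost discussed in Section 4.1.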
3.4. Red-Eyes Correction
where is a neighborhood of the "white" color which can slightly vary in terms of lightness, hue, and saturation. This means that, to prevent the glint from disappearing, only red pixels are desaturated (the whitish pixels are excluded from the brightness processing).
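The following Python sketch illustrates the idea of desaturation and brightness reduction with glint preservation. It is a simplification under stated assumptions: the whitish test, its thresholds (`min_level`, `max_spread`), the darkening factor, and the Gaussian falloff parameter `sigma_scale` are hypothetical, and every non-whitish pixel of the patch is treated as a red pixel.

```python
import math

def is_whitish(rgb, min_level=200, max_spread=40):
    """Hypothetical glint test: bright pixel with nearly equal channels."""
    return min(rgb) >= min_level and max(rgb) - min(rgb) <= max_spread

def correct_pixel(rgb, strength, darken=0.6):
    """Blend a pixel toward a darkened gray; strength in [0, 1] sets the blend."""
    if is_whitish(rgb):
        return tuple(rgb)  # keep the glint untouched
    _, g, b = rgb
    gray = darken * (g + b) / 2.0  # gray level that ignores the saturated red channel
    return tuple(round((1 - strength) * c + strength * gray) for c in rgb)

def correct_patch(patch, sigma_scale=0.35):
    """Apply the correction with a Gaussian falloff from the patch center."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sigma = sigma_scale * max(h, w)
    out = []
    for y, row in enumerate(patch):
        out_row = []
        for x, rgb in enumerate(row):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            strength = math.exp(-d2 / (2 * sigma ** 2))
            out_row.append(correct_pixel(rgb, strength))
        out.append(out_row)
    return out

red, glint = (200, 40, 40), (255, 255, 255)
patch = [[red, red, red], [red, glint, red], [red, red, red]]
corrected = correct_patch(patch)
```

The Gaussian weighting makes the correction strongest at the center of the detected eye and lets it fade toward the border, which avoids a visible edge around the corrected region.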
4. Experimental Settings and Results
Table: Estimated eye sizes taking into account the distance from the camera. Columns: distance from the sensor (m), pupil diameter (pixels).
For each image of the dataset, the pixels belonging to red-eye artifacts have been manually labeled as red-eye pixels. The eight parameters involved in the first stage of the proposed approach (see Section 3.1) have been learned taking into account the true and false red-eye pixels within the labeled dataset. To this aim, a full search procedure on a grid of equispaced points in the eight-dimensional parameter space was employed. For each point of the grid, the correct detection and false positive rates of the true red-eye pixels within the dataset were obtained. The tuple of parameters with the best tradeoff between correct detections and false positives has been used to perform the final filtering pipeline. A similar procedure was employed to determine the subspace of the RGB space involved in the correction step to identify pixels belonging to the glint area.
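The full search procedure can be sketched in Python as follows. The parameter names, the toy evaluation function, and in particular the scoring rule (detection rate minus weighted false positive rate) are assumptions made for illustration; the text does not state how the tradeoff was quantified.

```python
from itertools import product

def grid_search(param_grid, evaluate, fp_weight=1.0):
    """Exhaustively evaluate every grid point and keep the best tradeoff.

    evaluate(params) must return (detection_rate, false_positive_rate).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        detection, false_pos = evaluate(params)
        score = detection - fp_weight * false_pos  # assumed tradeoff measure
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy example with two hypothetical parameters instead of eight.
grid = {"t_red": [0.2, 0.4, 0.6], "min_size": [2, 4]}

def toy_evaluate(p):
    return 1 - abs(p["t_red"] - 0.4), 0.1 * p["min_size"]

best, best_score = grid_search(grid, toy_evaluate)
```

With eight parameters the number of grid points grows as the eighth power of the points per axis, which is presumably why the search is run once, offline, on the labeled dataset.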
Table: Comparison of different configurations. Configurations: Gray Codes + Clustering; Gray Codes + XOR; Gray Codes + Clustering + XOR.
To properly evaluate the overall red-eyes removal pipeline, the qualitative criterion proposed in  was adopted to compare the proposed solution with existing automatic solutions. According to , we divided False Positive (FP) and False Negative (FN) to distinguish different detection cases as follows:
The proposed pipeline has been compared with the following automatic (mainly commercial) solutions: Volken et al., NikonView V6.2.7, KodakEasyShare V6.4.0, StopRedEye! V1.0, HP RedBot, Arcsoft PhotoPrinter V5, and Cyberlink MediaShow. Experiments have been done using the actual commercial software and the implementation of  provided by the authors. The NikonView approach is mainly based on .
Table: Quality score of different red-eyes removal approaches. Entries include: Volken et al.; Arcsoft PhotoPrinter V5; Battiato et al.
4.1. Computational Complexity
To evaluate the complexity, a detailed analysis has been performed by running the proposed pipeline on an ARM926EJ-S processor instruction set simulator. We have chosen this specific processor because it is widely used in embedded mobile platforms. The CPU runs at 300 MHz, and both data and instruction caches have been fixed to 32 KB. The bus clock has been set to 150 MHz, and the memory read/write access time is 9 ns. The algorithm has been implemented using bitwise operators to work on colour maps and fixed-point operations. Since the number of operations depends on the number of red clusters found in the image, we have analyzed a midcase, that is, an image containing around 40 potential red-eye zones, of which only 2 are real eyes to be corrected.
Table 4 reports the performance of the main steps of the proposed pipeline, assuming an XGA (scaled) version of the image: the redness detection (Color Map), the processing on the generated maps (Morphological Operations), the candidate extraction, the classification step, and finally the correction of the identified eyes. The performance information reported in Table 4 relates to the following computational resources.
Instructions: counts the executed ARM instructions.
Core cycles: core clock ticks needed to execute the Instructions.
Data (D$): Read/Write Hits and Misses, cache memory hits and misses.
Seq and Nonseq: sequential and nonsequential memory accesses.
Idle: represents bus cycles when the instruction bus and the data bus are idle, that is, when the processor is running.
Busy: counts busy bus cycles, that is, when the data are transferred from the memory into the cache.
Wait States: the number of bus cycles introduced when waiting to access the RAM (an indicator of the impact of memory latencies).
Total: the total number of cycles required by the specific function, expressed in terms of bus cycles.
Milliseconds: time required by the specific function expressed in milliseconds.
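The relation between the Total and Milliseconds entries can be made explicit with a small Python sketch. It assumes that the time is obtained by dividing the total number of bus cycles by the 150 MHz bus clock stated above; the function names are illustrative.

```python
BUS_CLOCK_HZ = 150_000_000  # bus clock stated in the text

def bus_cycles_to_ms(total_bus_cycles, bus_clock_hz=BUS_CLOCK_HZ):
    """Time in milliseconds taken by a given number of bus cycles."""
    return 1000.0 * total_bus_cycles / bus_clock_hz

def ms_to_bus_cycles(ms, bus_clock_hz=BUS_CLOCK_HZ):
    """Number of bus cycles that fit in a given time in milliseconds."""
    return ms * bus_clock_hz / 1000.0

# The 326 ms overall time reported for the midcase corresponds to
# about 48.9 million bus cycles under this assumption.
print(ms_to_bus_cycles(326))  # 48900000.0
```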
Table 4: Performances of the main steps of the proposed pipeline. Cache counters reported: D$ R Hits, D$ W Hits, D$ R Misses, D$ W Misses.
The overall time achieved on this midcase is 326 ms. The table highlights the efficiency of the classifier, which is mainly based on bit comparisons. Since patches are rescaled before the classification stage, the classifier is essentially a comparison of bit words for each channel, with complexity in the range of one operation per pixel. For this reason, it is very fast and light. The correction is also very light because, as explained in Section 3.4, it is based on the resampling of a precomputed Gaussian function. The impact on memory is significant only in the map processing, where data are processed several times, whereas in the remaining steps of the pipeline the instruction count accounts for most of the processing time.
We cannot compare the performances and complexity of our methodology with other methods because the other proposed methods are commercial ones; hence, the related codes are not available for the analysis.
5. Conclusion and Future Works
In this paper, an advanced red-eyes removal pipeline has been discussed. After an image filtering pipeline selects only the potential regions in which red-eye artifacts are likely to be, a cluster-based boosting on gray code features is employed for classification. Red eyes are then corrected through desaturation and brightness reduction. Experiments on a representative dataset confirm the effectiveness of the proposed strategy, which also allows the multimodal nature of the input space to be properly managed. The obtained results have pointed out a good trade-off between overall hit rate and false positives. Moreover, the proposed approach has shown good performance in terms of quality measure. Future works will be devoted to including the analysis of other eye artifacts (e.g., "golden eyes").
- Gasparini F, Schettini R: Automatic red-eye removal for digital photography. In Single-Sensor Imaging: Methods and Applications For Digital Cameras. Edited by: Lukac R. CRC Press, Boca Raton, Fla, USA; 2008.
- Messina G, Meccio T: Red eye removal. In Image Processing for Embedded Devices, Applied Digital Imaging Ebook Series. Edited by: Battiato S, Bruna AR, Messina G, Puglisi G. Bentham Science; 2010.
- Gasparini F, Schettini R: A review of redeye detection and removal in digital images through patents. Recent Patents on Electrical Engineering 2009,2(1):45-53.
- Patti A, Konstantinides K, Tretter D, Lin Q: Automatic digital redeye reduction. Proceedings of the International Conference on Image Processing (ICIP '98), October 1998 55-59.
- Battiato S, Guarnera M, Meccio T, Messina G: Red eye detection through bag-of-keypoints classification. Proceedings of the International Conference on Image Analysis and Processing, 2009, Lecture Notes in Computer Science 5716: 528-537.
- Gaubatz M, Ulichney R: Automatic red-eye detection and correction. Proceedings of the International Conference on Image Processing (ICIP '02), September 2002 I/804-I/807.
- Schildkraut JS, Gray RT: A fully automatic redeye detection and correction algorithm. Proceedings of the International Conference on Image Processing (ICIP '02), September 2002 I/801-I/803.
- Ioffe S: Red eye detection with machine learning. Proceedings of the International Conference on Image Processing (ICIP '03), September 2003 871-874.
- Zhang L, Sun Y, Li M, Zhang H: Automated red-eye detection and correction in digital photographs. Proceedings of the International Conference on Image Processing (ICIP '04), October 2004 2363-2366.
- Luo H, Yen J, Tretter D: An efficient automatic redeye detection and correction algorithm. Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), August 2004 883-886.
- Petschnigg G, Szeliski R, Agrawala M, Cohen M, Hoppe H, Toyama K: Digital photography with flash and no-flash image pairs. ACM Transactions on Graphics 2004,23(3):664-672. 10.1145/1015706.1015777
- Corcoran P, Bigioi P, Steinberg E, Pososin A: Automated in-camera detection of flash eye-defects. Proceedings of the International Conference on Consumer Electronics (ICCE '05), January 2005 129-130.
- Volken F, Terrier J, Vandewalle P: Automatic red-eye removal based on sclera and skin tone detection. Proceedings of the European Conference on Color in Graphics, Imaging and Vision, 2006.
- Safonov IV, Rychagov MN, Kang K, Kim SH: Automatic red eye correction and its quality metric. Color Imaging XIII: Processing, Hardcopy, and Applications, 2008, San Jose, Calif, USA, Proceedings of SPIE 6807.
- Ferman AM: Automatic detection of red-eye artifacts in digital color photos. Proceedings of the IEEE International Conference on Image Processing (ICIP '08), October 2008 617-620.
- Duda RO, Hart PE, Stork DG: Pattern Classification. 2nd edition. Wiley-Interscience, New York, NY, USA; 2000.
- Gonzalez RC, Woods RE: Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ, USA; 2006.
- Friedman J, Hastie T, Tibshirani R: Additive logistic regression: a statistical view of boosting. Annals of Statistics 2000,28(2):337-407.
- Schapire RE: The boosting approach to machine learning: an overview. Proceedings of the MSRI Workshop on Nonlinear Estimation and Classification, 2001.
- Schapire RE: The strength of weak learnability. Machine Learning 1990,5(2):197-227.
- Lienhart R, Kuranov A, Pisarevsky V: Empirical analysis of detection cascades of boosted classifiers for rapid object detection. Proceedings of the 25th Symposium of the German Association for Pattern Recognition (DAGM '03), September 2003, Magdeburg, Germany 2781: 297-304.
- Torralba A, Murphy KP, Freeman WT: Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2007,29(5):854-869.
- Battiato S, Farinella GM, Guarnera M, Messina G, Ravì D: Red-eyes removal through cluster based linear discriminant analysis. Proceedings of the IEEE International Conference on Image Processing (ICIP '10), September 2010 2185-2188.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.