- Open Access
Shadow extraction and application in pedestrian detection
© Wang and Yagi; licensee Springer. 2014
- Received: 7 January 2013
- Accepted: 31 January 2014
- Published: 24 February 2014
Shadows appear in many images and videos. Traditionally, shadows are treated as noise because they hinder visual tasks such as detection and tracking. In this work, we show that shadows are instead helpful for pedestrian detection. Occlusions make pedestrian detection difficult. Existing shape-based detection methods can produce false positives on shadows, since shadows have shapes similar to those of foreground objects. Appearance-based detection methods cannot detect heavily occluded pedestrians. To deal with these problems, we use appearance, shadow, and motion information simultaneously. We detect pedestrians using appearance information of pedestrians and shape information of shadow regions. Then, we filter the detection results based on motion information, if available. The proposed method yields few false positives because it integrates different features. Moreover, it alleviates the problem caused by occlusions, since shadows can remain observable when foreground objects are occluded. Our experimental results show that the proposed algorithm performs well in many difficult scenarios.
- Indoor Environment
- Object Detection
- Motion Information
- Foreground Object
- Shadow Region
Shadows can be found in many images and videos. A shadow is formed where direct light from a light source cannot reach a surface because an opaque object obstructs it. Traditionally, shadows are regarded as noise for vision tasks such as detection and tracking. In this work, we propose a detection algorithm that treats shadows as helpful information for pedestrian detection.
Pedestrian detection in images and videos is a key issue for many applications such as autonomous vehicles and visual surveillance. A large number of pedestrian detection algorithms have been proposed in recent years, and various features, descriptors, and classification methods have been investigated. Despite the good performance achieved by many detection methods, pedestrian detection is still an open problem. For example, pedestrian detection methods usually have a low detection rate when many pedestrians are occluded. Appearance variations due to viewpoint or illumination also cause problems for pedestrian detection.
We classify image features into four categories: shape, appearance, motion, and depth features. Shape features are employed in many detection algorithms due to their invariance to viewpoint changes. Shape features are very sparse in detection and modeling processes. Therefore, shape-based detection methods can be efficient. However, it is rather difficult to extract accurate shape features because of background clutter. Appearance features are successfully applied in sliding window-based detection systems [2, 3]. They compute contrast information and describe such information using various descriptors. Histograms of gradient orientations (HOGs) have achieved good performance in pedestrian detection. Unfortunately, it is rather difficult to detect heavily occluded objects using appearance-based methods.
In visual surveillance scenarios, stationary cameras are widely used. Background subtraction is the first step in understanding the scene. Object detection and tracking can be performed based on background subtraction results. In a crowded environment, motion blobs found by comparison with the learned background are informative. In this work, we also perform background subtraction in video sequences.
Although background modeling and subtraction are helpful, pedestrian detection is still difficult since the subtraction result can contain many errors. Background subtraction is far from perfect: one blob in the subtraction results may merge several objects, and one object may be split into a few blobs. Incorrect background model updating tends to introduce incorrect models because of motion ambiguities. The problem becomes more difficult when foreground objects have an appearance similar to that of the background.
In surveillance and many other scenarios, we can find cast shadows easily. Shadows are regions where direct light cannot reach due to obstruction by an object. The space behind an opaque object is occupied by the shadow. The shape and position of the shadow are determined by the shape of the object and the position of the light sources. We can calculate the position and the shape of shadows approximately if we know the light source and the rough shape of the object. Therefore, shadows are informative in indicating the existence of an object. We detect shadows in background subtraction results based on the properties of shadows. To be specific, we compute the position of the Sun using the location and time information. Then, we predict the orientation, position, and shape of an object in images.
The rest of this paper is organized as follows. After a brief review of important related works in Section 2, we discuss feature extraction in our algorithm in Section 3. Then, we give the geometric transforms for feature extraction in Section 4. The detection method using foreground and shadow information is given in Section 5. Motion filtering is described in Section 6. Experimental results on real image sequences are presented in Section 7. Section 8 explains the difficulty of applying shadow information in indoor environments. Section 9 concludes this work.
Pedestrian detection has been intensively investigated over the last decades. Gavrila proposed a shape-based pedestrian detection method. Dalal and Triggs proposed an appearance-based object detection method using histograms of gradient orientations. Their approach is very effective for detecting articulated objects such as pedestrians. Tuzel et al. found that covariance descriptors have nice properties for object detection. All of the above detection methods are based on appearance information only. Bertozzi et al. detect pedestrians in infrared images using active contours and neural networks.
Motion information is known to be helpful in detecting objects. Dalal et al. normalized optical flow in video frames and applied motion information to pedestrian detection. Other works also looked at pedestrian detection from video sequences. In fact, motion information had been used in a few works before Dalal et al. Viola and Jones detect pedestrians using patterns of motion and appearance. They model both motion and appearance using Haar-like features. Cutler et al. proposed a detection method using long-term periodic motion information. In order to find periodicity, they analyzed long video sequences. Their system can be applied to image sequences with very low resolutions. Jones and Snow extended the original pedestrian detection algorithm. They analyzed a moderate number of frames in batch processing.
Depth information has also been adopted for pedestrian detection. Gavrila and Munder designed a detection algorithm for driving assistance. This algorithm integrates various techniques for finding pedestrians, including the use of stereo cameras. Ess et al. explicitly use depth information and project two-dimensional (2D) detection results onto the three-dimensional (3D) space.
Part-based detection can be successfully applied in pedestrian detection provided that the resolution of the images is sufficiently high. Felzenszwalb et al. presented a general object detection algorithm that is able to detect partially occluded objects. Lempitsky applied a similar idea in object detection using HOGs.
Some works combine detection and tracking in an integrated framework, e.g., Leibe et al. presented a detection and tracking algorithm in which object detection and trajectory estimation are coupled.
We detect pedestrians using appearance information of pedestrians and shape information of shadow regions. Motion information is used if available. The focus of our work is to show the power of shadow information. Therefore, we do not combine all the features in our work.
We model the background using Gaussian mixture models for each pixel. In addition, we also model possible shadows using available time and region information. We segment input images into three kinds of regions: foreground, shadows, and background.
3.1 Background modeling
In the update of the per-pixel mean, I(t) is the pixel value in the input image, and μ(t−1) and μ(t) are the mean values calculated at times t−1 and t. Here, we set the update weight η(t) to 0.03. The other parameters are learned with a similar approach to that in .
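The update equation itself is not reproduced in this text. As a minimal sketch, assuming the common exponential running-average form μ(t) = (1 − η(t)) μ(t−1) + η(t) I(t), which uses only the symbols defined above, the mean update could look as follows. The function name `update_mean` is hypothetical, and a full Gaussian mixture model would also update weights and variances, which are omitted here.

```python
import numpy as np

ETA = 0.03  # value of eta(t) stated in the text


def update_mean(mu_prev, frame, eta=ETA):
    """Blend the previous per-pixel mean with the new frame.

    Assumed form: mu(t) = (1 - eta) * mu(t-1) + eta * I(t).
    """
    mu_prev = np.asarray(mu_prev, dtype=np.float64)
    frame = np.asarray(frame, dtype=np.float64)
    return (1.0 - eta) * mu_prev + eta * frame


# Usage: a background mean of 100 drifts slowly towards a new value of 200.
mu = np.full((2, 2), 100.0)
for _ in range(10):
    mu = update_mean(mu, np.full((2, 2), 200.0))
print(mu[0, 0])
```

With a small η(t), a single bright frame barely changes the model, while a persistent change is absorbed over many frames.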
3.2 Shadow detection
One of the difficulties in using shadow information is detecting shadows in input images. Invariant color properties have been used in shadow detection, e.g., a normalized RGB color space and a lightness measure have been employed. Pixels with similar hue and saturation values and lower luminosity in the hue-saturation-value (HSV) color space are classified as cast shadows.
In surveillance scenarios, an object casts shadows on surfaces. Shadow regions tend to have lower intensities because the direct light source is obstructed. Given a color vector without cast shadows, many shadow detection algorithms assume that the vector under cast shadows keeps the original vector direction. This assumption does not hold in outdoor environments: the ambient light from the sky is bluish, so the values in different color channels are attenuated differently.
Background subtraction results include foreground objects and shadows. To separate shadows from foreground blobs, we first apply a morphological closing filter to the background subtraction results to fill the gaps. Then, we convert the input images into HSV space, which explicitly separates the chromaticity and luminosity channels. A pixel in the background subtraction results is considered a possible shadow pixel when it has lower luminosity and a similar hue value compared with the mode in the background model. After this classification, we determine the pixels that can be confidently classified as shadows. We use the Canny edge detection algorithm to find edges. The edges on shadow boundaries are found by comparing the hue and luminosity values. When shadows are projected onto a textured background, many edges are found, including texture edges. The gradient orientations of such pixels are similar to those in the background model.
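The per-pixel test described above (lower luminosity, similar hue, restricted to the background subtraction mask) can be sketched with NumPy alone. This is an illustration under stated assumptions, not the implementation used in the paper: the function name, the hue tolerance, and the luminosity ratio bounds are all hypothetical, hue is assumed to be in degrees, and the morphological closing and Canny edge steps are left out.

```python
import numpy as np


def shadow_candidates(hsv, bg_hsv, fg_mask, hue_tol=10.0, v_low=0.4, v_high=0.9):
    """Flag foreground pixels that look like a darkened copy of the background.

    hsv, bg_hsv: float arrays of shape (H, W, 3); hue in degrees [0, 360),
    value in [0, 1]. fg_mask: boolean background subtraction result.
    All thresholds are illustrative assumptions.
    """
    hue_diff = np.abs(hsv[..., 0] - bg_hsv[..., 0])
    hue_diff = np.minimum(hue_diff, 360.0 - hue_diff)  # hue is circular
    # Ratio of current to background luminosity; below 1 means darker.
    ratio = hsv[..., 2] / np.maximum(bg_hsv[..., 2], 1e-6)
    darker = (ratio >= v_low) & (ratio <= v_high)
    return fg_mask & (hue_diff <= hue_tol) & darker


# Usage: the first pixel is a darker version of the background (shadow
# candidate); the second has a very different hue (foreground object).
bg = np.array([[[100.0, 0.5, 0.8], [100.0, 0.5, 0.8]]])
cur = np.array([[[102.0, 0.5, 0.4], [200.0, 0.5, 0.4]]])
mask = np.array([[True, True]])
print(shadow_candidates(cur, bg, mask))
```

The lower bound `v_low` is an added assumption that keeps very dark foreground objects from being labeled as shadows.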
Shadows are helpful for pedestrian detection. However, shadows vary according to the relative position between a pedestrian and the Sun. The Sun angle varies according to the time, and the latitude and altitude of the camera.
4.1 Shadows in the 3D world coordinate
It is possible to infer time based on shadow direction and length. The reverse inference is much easier since we can get precise timing and location information.
where ϕ_o is the observer geometric latitude, calculated using the local latitude; δ′ is the topocentric sun declination, calculated using the geocentric sun declination from the local longitude and current time; and H′ is the topocentric local hour angle, calculated from the current time.
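The equation these symbols belong to is not reproduced in this text. As a hedged sketch, the standard spherical-astronomy relation between latitude, declination, and hour angle gives the Sun's elevation as e = arcsin(sin ϕ sin δ + cos ϕ cos δ cos H), and the zenith angle is 90° − e. The function below assumes that relation, takes the three angles as already computed inputs in degrees, and ignores atmospheric refraction; its name is hypothetical.

```python
import math


def sun_elevation_deg(lat_deg, decl_deg, hour_angle_deg):
    """Sun elevation above the horizon in degrees, without refraction.

    Assumed relation: sin(e) = sin(lat) sin(decl) + cos(lat) cos(decl) cos(H).
    """
    phi = math.radians(lat_deg)
    delta = math.radians(decl_deg)
    h = math.radians(hour_angle_deg)
    s = math.sin(phi) * math.sin(delta) + math.cos(phi) * math.cos(delta) * math.cos(h)
    s = max(-1.0, min(1.0, s))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.asin(s))


# Usage: local noon (H = 0) at an equinox (declination 0) for latitude 35 N.
elevation = sun_elevation_deg(35.0, 0.0, 0.0)
zenith = 90.0 - elevation
print(elevation, zenith)
```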
4.2 Camera projection matrix
The camera coordinate setting is shown in Figure 2b. A 3D point is projected into the image by multiplying its homogeneous 3D coordinates x by the camera projection matrix M, i.e., u = Mx. We calibrate the cameras that are used for video capture, so both the intrinsic and extrinsic camera parameters are known. For single images, we obtain the focal length of the camera lens from the EXIF data of these images. Then, we calibrate the image using the method introduced in .
where R is the rotation matrix and t is the translation vector. According to the setting in Figure 2, the translation vector is t = [0 0 0]^T.
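As an illustration of the projection u = Mx, the sketch below assumes the usual pinhole decomposition M = K [R | t], where K is an intrinsic matrix. The symbol K and all numeric values (focal length of 800 pixels, principal point at (320, 240), identity rotation) are assumptions made for the example and are not taken from the paper; only R, t = [0 0 0]^T, and u = Mx come from the text.

```python
import numpy as np


def projection_matrix(K, R, t):
    """Build the 3x4 matrix M = K [R | t] (assumed pinhole model)."""
    return K @ np.hstack([R, np.reshape(t, (3, 1))])


def project(M, x_world):
    """Project a 3D point to pixel coordinates via u = M x."""
    x_h = np.append(np.asarray(x_world, dtype=float), 1.0)  # homogeneous
    u = M @ x_h
    return u[:2] / u[2]  # divide by the third component


# Hypothetical intrinsics; R is the identity and t is zero as in the text.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
M = projection_matrix(K, R, t)

print(project(M, [0.0, 0.0, 5.0]))  # point on the optical axis
print(project(M, [1.0, 0.0, 5.0]))  # point 1 unit to the side
```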
4.3 Shadows in images
To estimate the shape of a shadow, we need the height of the obstructing object and the Sun angle. We calculate the Sun angle based on the all-sky model [22–24]. We define a world coordinate system (x_w, y_w, z_w). We denote the 3D coordinates of the Sun by s. The Sun position is determined by its zenith angle θ_S and azimuth angle ϕ_S. The two angles decide the shadow projection in the image. We denote the camera local frame by (x_c, y_c, z_c), which is rotated by the angles (θ_C, ϕ_C).
and since shadows are on the ground.
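The shadow equations are not reproduced in this text. As a sketch of the underlying geometry, assume a world frame in which z_w points up, the ground is the plane z_w = 0, and the azimuth ϕ_S is measured from the x_w axis; these conventions and the function name are assumptions for illustration. A point at height h then casts its shadow a distance h tan θ_S away from its foot, in the direction opposite to the Sun.

```python
import math


def shadow_point(p, zenith_deg, azimuth_deg):
    """Ground-plane shadow of the 3D point p = (x, y, z), with z up.

    Assumed Sun direction: s = (sin t cos a, sin t sin a, cos t),
    with zenith angle t and azimuth a measured from the x axis.
    """
    if not 0.0 <= zenith_deg < 90.0:
        raise ValueError("the Sun must be above the horizon")
    theta = math.radians(zenith_deg)
    phi = math.radians(azimuth_deg)
    sx = math.sin(theta) * math.cos(phi)
    sy = math.sin(theta) * math.sin(phi)
    sz = math.cos(theta)
    k = p[2] / sz  # step along the ray until z reaches 0
    return (p[0] - k * sx, p[1] - k * sy, 0.0)


# Usage: the head of a 1.7 m pedestrian standing at the origin.
print(shadow_point((0.0, 0.0, 1.7), 45.0, 0.0))
print(shadow_point((0.0, 0.0, 1.7), 5.0, 0.0))
```

The second call shows why a very small zenith angle is unhelpful: the shadow collapses to a short patch near the feet.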
5.1 Detection in foreground regions
We compute detection probabilities in foreground regions using appearance information. First, we train a Hough forest to model pedestrians. The Hough forest consists of many Hough trees that are efficient in matching descriptors. In testing stages, we calculate histograms of orientations for images in different scales. After that, we accumulate voting probabilities in a Hough space similar to the approach in . The probabilistic formulation in  fits into our framework quite well.
There are many pedestrian detection approaches in the literature. We select the Hough forest because of several merits. First, the Hough forest can detect multiple pedestrians under heavy occlusions. According to the survey by Dollar et al., occlusion is one of the major difficulties for pedestrian detection. Second, the Hough forest detection model is probabilistic, so it can be easily integrated with other knowledge.
Pedestrian detection has also been considered in a Hough-based framework using object segmentation and an MDL prior. The implicit shape model (ISM) interleaves pedestrian detection and segmentation; because of this interleaving, its probabilistic interpretation is not very clear. DPM is a multi-scale sliding window object detector. It is good at dealing with pose variations and small occlusions, but it usually produces multiple overlapping detections for one pedestrian, so non-maximum suppression has to be carried out on the initial detection results. DPM adopts a greedy procedure for discarding repeated detections, and some true detections can be eliminated in this procedure.
In contrast, the non-maximum suppression in the Hough forest detector is more reasonable, since it accumulates detection probabilities according to the comparison of the voting in the iterations. This strategy can lead to good performance for occluded pedestrians. Most other pedestrian detection approaches perform non-maximum suppression as the DPM method does and are therefore not very good at dealing with heavy occlusions. The Hough forest is better in this respect, and our approach further improves this ability by incorporating shadow information.
where L_A denotes the HOG descriptors obtained from the appearance information. The details of the calculation can be found in .
5.2 Shape representation and matching in shadows
We construct a simple 3D model [4, 28] for pedestrian detection. Since we can calculate the geometric properties of shadows, we can generate specific shape templates for different times and locations. To be specific, we project the 3D model onto 2D space based on the shadow geometry. The generated shape templates are matched with shadow silhouettes.
where exp(−λ‖d_{Υ,Λ}‖) accounts for the overlap between the model and the shadow region.
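The definition of the distance d_{Υ,Λ} is not reproduced in this text. To make the role of the exponential term concrete, the sketch below assumes a simple stand-in distance: one minus the intersection-over-union of the projected template mask and the shadow mask. That choice, the value of λ, and the function name are assumptions for illustration only; the paper may use a different shape distance.

```python
import numpy as np


def match_score(template, shadow, lam=2.0):
    """Score exp(-lam * d) for two binary masks.

    d is ASSUMED to be 1 - IoU, a stand-in for the distance in the text.
    """
    template = np.asarray(template, dtype=bool)
    shadow = np.asarray(shadow, dtype=bool)
    union = np.logical_or(template, shadow).sum()
    if union == 0:
        return 0.0  # nothing to match
    inter = np.logical_and(template, shadow).sum()
    d = 1.0 - inter / union
    return float(np.exp(-lam * d))


# Usage: a perfect overlap scores 1; disjoint regions score exp(-lam).
a = np.array([[1, 1, 0, 0]])
b = np.array([[0, 0, 1, 1]])
print(match_score(a, a), match_score(a, b))
```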
5.3 Fusing detection probabilities
When the appearance cue of a pedestrian is available and the shadow cue is unavailable (e.g., a pedestrian’s shadow lies inside another large shadow region), the probability calculated from the shadow cue is zero (or a very small value). We get an extremely low probability (or zero) if we use a simple product fusion method, although we can in fact infer the existence of this pedestrian from the appearance cue alone. The product of the shadow and appearance probabilities can therefore be misleading. We meet a similar problem when the shadow cue is available and the appearance cue is unavailable (e.g., a pedestrian walks outside of the image, but his/her shadow is still in the image). Taking the maximum of the probabilities detects pedestrians successfully when either an appearance cue or a shadow cue is available. When both cues are available, we select the cue that is more informative.
When multiple pedestrians overlap in the image, the probability of the appearance given the state cannot simply be considered as the probability of a single pedestrian. Instead, a joint likelihood of the whole image, given all pedestrians, needs to be considered. The exact normalization of the probability distributions based on appearance and shadow is not easy, so we carry out the normalization heuristically. First, we discard the probabilities in the two distributions that are less than pTA or pTS. Then, we normalize the remaining probabilities to [0, 1].
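The fusion described above can be sketched as thresholding each distribution, rescaling it to [0, 1], and taking the element-wise maximum. The threshold values and the rescaling rule (division by the largest remaining value) are assumptions for illustration, since the text does not specify them; the function names are hypothetical.

```python
import numpy as np


def normalize(p, threshold):
    """Zero out values below the threshold, then rescale to [0, 1].

    Rescaling by the maximum is an assumed choice.
    """
    p = np.asarray(p, dtype=float)
    kept = np.where(p >= threshold, p, 0.0)
    peak = kept.max()
    return kept / peak if peak > 0 else kept


def fuse(p_app, p_shadow, t_app=0.2, t_shadow=0.2):
    """Max fusion of appearance and shadow probabilities per hypothesis."""
    return np.maximum(normalize(p_app, t_app), normalize(p_shadow, t_shadow))


# Usage: three hypotheses. The first has only a shadow cue, the second only
# an appearance cue, and the third has both.
p_app = [0.1, 0.8, 0.4]
p_shadow = [0.6, 0.0, 0.3]
print(fuse(p_app, p_shadow))
```

A product of the two inputs would give values at or near zero for the first two hypotheses, which is the failure case the max rule avoids.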
We improve detection performance by filtering detection results using motion information. In low-resolution images, a pedestrian is roughly a blob moving along a curve, and it is difficult to discriminate the motion of arms at this resolution. Motion information is more complicated at high resolutions. Despite this complexity, we found that false-positive detections usually have motion patterns different from those of true positives. Many false positives are due to appearance similarity with the object models, but they do not show any motion over long durations. We apply the motion filters described in  to the blobs where a possible hypothesis exists. We learn a pattern library off-line using real pedestrian motion. Then, we compare the motion patterns of possible detections with the modeled motion patterns. We discard detections whose patterns are very different from the pattern models.
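The learned pattern library is beyond a short example, but the simplest observation above, that many false positives show no motion over a long duration, can be sketched directly. The check below is a deliberately simplified stand-in, not the motion filters referred to in the text: it measures the path length of a detection's centroid over a window of frames, and the function name and threshold are hypothetical.

```python
import math


def has_motion(track, min_path=5.0):
    """Return True if a centroid track moves at least min_path pixels.

    track: list of (x, y) centroids, one per frame.
    """
    path = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return path >= min_path


# Usage: a static false positive versus a walking pedestrian.
static_track = [(50.0, 80.0)] * 30
walking_track = [(50.0 + 1.5 * i, 80.0) for i in range(30)]
detections = {"static": static_track, "walking": walking_track}
kept = [name for name, trk in detections.items() if has_motion(trk)]
print(kept)
```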
We implemented the proposed method and tested it on a data set collected from several video sequences and single images. We have 4,230 images in the data set. Among them, we randomly select 126 images. We label the subset of the images to make the ground truth for quantitative analysis. We detect shadows in the images captured by stationary cameras using the method described in Section 3.2. We estimate shadow regions in single images using the method described in .
We compare our algorithm with two detection methods. The first one is a part-based detection algorithm that integrates appearance and spatial information using part representation and assembly. One of the nice properties of this method is that it can detect partially occluded objects. The second one is the Hough transform-based method, which is also very good at detecting partially occluded objects.
We show that the integration of multiple cues is helpful in designing an effective object detection system. To be specific, we found that shadow information should be considered informative instead of noise. In addition, the motion-based filtering process finds false positives and improves the performance of our detection system. The experimental results confirm our expectation that fusing multiple sources of information is important for object detection.
Our method has a few limitations. First, it is rather difficult to improve its performance on overcast or rainy days. Second, a pedestrian’s shadow cannot be extracted reliably when it is merged into a large shadow formed by a large object. Third, shadow cues are not very informative when the zenith angle is very small, because the shadow is then short. We therefore consider shadows to be informative features in good weather.
1. Gavrila D, Philomin V: Real-time object detection for smart vehicles. In Proc. Int. Conf. Computer Vision. Corfu: IEEE; 1999:87-93.
2. Viola P, Jones M: Rapid object detection using a boosted cascade of simple features. In Proc. of Conf. on Computer Vision and Pattern Recognition. Kauai: IEEE; 2001:511-518.
3. Dalal N, Triggs B: Histograms of oriented gradients for human detection. In Proc. of Conf. on Computer Vision and Pattern Recognition. San Diego: IEEE; 2005:886-893.
4. Zhao T, Nevatia R: Tracking multiple humans in complex situations. IEEE Trans. Pattern Anal. Mach. Intell 2004, 26(9):1208-1221. doi:10.1109/TPAMI.2004.73
5. Wang J, Yagi Y: Pedestrian detection based on appearance, motion, and shadow information. In Proc. of Int. Conf. on Systems, Man, Cybernetics. Seoul: IEEE; 2012.
6. Tuzel O, Porikli F, Meer P: Human detection via classification on Riemannian manifolds. In Proc. of Conf. on Computer Vision and Pattern Recognition. Minneapolis: IEEE; 2007:1-8.
7. Bertozzi M, Cerri P, Felisa M, Ghidoni S, Rose MD: Pedestrian validation in infrared images by means of active contours and neural networks. EURASIP J. Adv. Signal Process 2010, 2010(5).
8. Dalal N, Triggs B, Schmid C: Human detection using oriented histograms of flow and appearance. In Proc. of European Conf. on Computer Vision. Graz: Springer; 2006:428-441.
9. Viola PA, Jones MJ, Snow D: Detecting pedestrians using patterns of motion and appearance. Int. J. Comput. Vis 2005, 63(2):153-161. doi:10.1007/s11263-005-6644-8
10. Cutler R, Davis L: Robust real-time periodic motion detection: analysis and applications. IEEE Trans. Pattern Anal. Mach. Intell 2000, 22(7):781-796.
11. Jones M, Snow D: Pedestrian detection using boosted features over many frames. In Proc. of Int. Conf. on Pattern Recognition. Tampa: IEEE; 2008:1-4.
12. Gavrila DM, Munder S: Multi-cue pedestrian detection and tracking from a moving vehicle. Int. J. Comput. Vis 2007, 73:41-59. doi:10.1007/s11263-006-9038-7
13. Ess A, Leibe B, Gool LJV: Depth and appearance for mobile scene analysis. In Proc. of Int. Conf. on Computer Vision. Rio de Janeiro: IEEE; 2007:1-8.
14. Felzenszwalb PF, Girshick RB, McAllester DA, Ramanan D: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell 2010, 32(9):1627-1645.
15. Barinova O, Lempitsky V, Kohli P: On detection of multiple object instances using Hough transforms. In Proc. of Conf. on Computer Vision and Pattern Recognition. San Francisco: IEEE; 2010:2233-2240.
16. Leibe B, Schindler K, Gool L: Coupled detection and trajectory estimation for multi-object tracking. In Proc. of Int. Conf. on Computer Vision. Rio de Janeiro: IEEE; 2007:1-8.
17. Elgammal AM, Duraiswami R, Harwood D, Davis LS: Background and foreground modeling using non-parametric kernel density estimation for visual surveillance. Proc. IEEE 2002, 90(7):1151-1163.
18. Elgammal AM, Harwood D, Davis LS: Non-parametric model for background subtraction. In Proc. of European Conf. on Computer Vision. Marseille: Springer; 2000:751-767.
19. Friedman N, Russell S: Image segmentation in video sequences: a probabilistic approach. In Proc. 13th Conf. on Uncertainty in Artificial Intelligence. Providence: AUAI; 1997:175-181.
20. Stenger B, Ramesh V, Paragios N, Coetzee F, Buhmann JM: Topology free hidden Markov models: application to background modeling. In Proc. of Int. Conf. on Computer Vision. Vancouver: IEEE; 2001:294-301.
21. Cucchiara R, Grana C, Piccardi M, Prati A: Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans. Pattern Anal. Mach. Intell 2003, 25(10):1337-1342. doi:10.1109/TPAMI.2003.1233909
22. Blanco-Muriel M, Alarcon-Padilla DC, Lopez-Moratalla T, Lara-Coira M: Computing the solar vector. Solar Energy 2001, 70(5):431-441. doi:10.1016/S0038-092X(00)00156-0
23. Preetham AJ, Shirley P, Smits B: A practical analytic model for daylight. In Proceedings of ACM SIGGRAPH. Los Angeles: ACM; 1999:91-100.
24. Reda I, Andreas A: Solar position algorithm for solar radiation applications. Technical report NREL/TP-560-34302, National Renewable Energy Laboratory, USA; 2005.
25. Hoiem D, Efros A, Hebert M: Putting objects in perspective. Int. J. Comput. Vis 2008, 80:3-15. doi:10.1007/s11263-008-0137-5
26. Dollar P, Wojek C, Schiele B, Perona P: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell 2012, 34(4):743-761.
27. Leibe B, Leonardis A, Schiele B: Robust object detection with interleaved categorization and segmentation. Int. J. Comput. Vis 2008, 77(3):259-289.
28. Zhao T, Nevatia R, Wu B: Segmentation and tracking of multiple humans in crowded environments. IEEE Trans. Pattern Anal. Mach. Intell 2008, 30(7):1198-1211.
29. Toyama K, Blake A: Probabilistic tracking in a metric space. In Proc. of Int. Conf. on Computer Vision. Corfu: IEEE; 2001:50-59.
30. Guo R, Dai Q, Hoiem D: Single-image shadow detection and removal using paired regions. In Proc. of Conf. on Computer Vision and Pattern Recognition. Colorado Springs: IEEE; 2011:2033-2040.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.