3D Objects Localization Using Fuzzy Approach and Hierarchical Belief Propagation: Application at Level Crossings
© N. Fakhfakh et al. 2011
Received: 15 April 2010
Accepted: 1 October 2010
Published: 20 October 2010
Technological solutions for obstacle-detection systems have been proposed to prevent accidents in safety-transport applications. In order to avoid the limits of these proposed technologies, an obstacle-detection system utilizing stereo cameras is proposed to detect and localize multiple objects at level crossings. Background subtraction is first performed using the color independent component analysis technique, which has proved its performance against other well-known object-detection methods. The main contribution is the development of a robust stereo-matching algorithm which reliably localizes in 3D each segmented object. A standard stereo dataset and real-world images are used to test and evaluate the performances of the proposed algorithm to prove the efficiency and the robustness of the proposed video-surveillance system.
In recent years, public security has been facing an increasing demand from the general public as well as from governments. An important part of the efforts to prevent the threats to security is the ever-increasing use of video-surveillance cameras throughout the network in order to monitor and detect incidents without delay. Existing surveillance systems rely on human observation of video streams for high-level classification and recognition. The typically large number of cameras makes this solution inefficient and in many cases unfeasible. Although the basic imaging technologies for simple surveillance are available today, the reliable deployment of them in a large network is still ongoing research.
In the context of railway transport, one of the major issues is the monitoring of linear infrastructures such as railway tracks and level crossings (LCs), which are intersections between a road and a railway track. These infrastructures constitute what are called extended perimeters. Numerous acts of malevolence occur in those areas. One can refer in particular to objects hanging from catenaries, objects that may explode under the ballast, and obstacles at LCs. The transportation network could be interrupted with the aim of causing economic damage or damaging a symbolic landmark.
The advanced surveillance system presented hereafter addresses problems of safety and security at LCs. For some years, road and railway operators have shown growing interest in improving the safety and security of LCs, which have been identified as a particular weak point in the safety of road and railway infrastructures. Road and highway safety professionals from several countries have dealt with the same subject: providing safer LCs. Recently, the EU's FP6 SELCAT project (Safer European Level Crossing Appraisal and Technology) has provided recommendations for actions and evaluation of technological solutions to improve the safety at LCs. A new French project entitled PanSafer aims at proposing such technologies, building upon the results provided by SELCAT. The present work is developed as part of PanSafer.
High-technology systems are developed so as to avoid collisions between trains and road vehicles. Nevertheless, high safety requirements may lead to costly systems, which hinders their actual use. Systems which have unacceptable levels of false or missed detections have adverse effects and should not be implemented either. Some conventional object-detection systems have been tested at level crossings, and provide more or less significant information. According to the literature, little research has focused on passive vision to solve the problems at LCs. Among the existing systems, two based on CCTV cameras are to be distinguished. The first is a system using a single camera. It uses a single CCD camera placed on a high pole in a corner of the LC, classifying objects such as cars, bikes, trucks, pedestrians, dogs, and papers and localizing them according to the camera calibration, assuming a planar model of the road and railroad. This system is prone to false and missed alarms caused by fast illumination changes or shadows. The second is a system using stereo cameras, with a stereo-matching algorithm and 3D background removal. This system more or less detects vehicles and pedestrians by day and night under usual weather conditions, but it is extremely sensitive to adverse weather conditions, such as heavy rain, fog, or snow.
This paper is organized as follows: after an introduction covering the problem, we describe the requirements of the LC's application in Section 2 and our proposed system in Section 3. We present in Section 4 the background subtraction technique to highlight the moving objects in the scene. Section 5 is dedicated to outlining a robust approach for 3D localization of the moving objects. Results are detailed in Section 6. The conclusion is devoted to a discussion on the obtained results, and perspectives are provided.
The most reliable solution to decrease the risk and accident rate at level crossings is to eliminate unsafe railroad crossings, which avoids any collision between trains and road users. Unfortunately, this is impossible in most cases because of location feasibility and the cost that would be incurred. For instance, almost 10 million euros per year are earmarked for the removal of the most dangerous level crossings in France. To overcome these limits, the development of a new obstacle-detection system is required. Such a system is not intended to replace the equipment currently installed at each level crossing. Its purpose is to provide additional information to the human operator; it can be considered as an operations support system. This concerns the detection and localization of any kind of object, such as pedestrians, people on two-wheeled vehicles, wheelchairs, and car drivers. Presently, sensors are evaluated on, among other criteria, their false object-detection alerts. This may increase the risk to level-crossing users. It should be noted that risks associated with the use of technological systems are becoming increasingly important in our society. Risk involves notions of failure and consequences of failure. Therefore, it requires an assessment of dependability; this might be expressed, for example, as probability of failure upon demand, rate of occurrence of failures, probability of mission failure, and so on. Each level crossing is equipped with various sensors for timely detection of potentially hazardous situations. To be reliable, the related information must be shared and transmitted to the train dispatching center, stations, train drivers, and road users. Generally, most level crossings are fitted with standard equipment such as lights, automatic full or half barriers, and notices. This equipment warns all users of the level crossing when a train is approaching the dangerous area.
3. Overview of the System
Our research aims at developing an Automatic Video-Surveillance (AVS) system using the passive stereo-vision principle. The proposed imaging system uses two cameras to detect and localize any kind of object lying on a railway level crossing. The system supervises and estimates automatically the critical situations by detecting objects in the hazardous zone defined as the crossing zone of a railway line by a road or path. The AVS system is used to monitor dynamic scenes where interactions take place among objects of interest (people or vehicles). After a classical image grabbing and digitizing step, this architecture is composed of the two following modules.
(i) Motion Detection Module
The first step consists of separating the motion regions from the background. It is performed using the Independent Component Analysis (ICA) technique for high-quality motion detection. The color information is introduced in the ICA algorithm, which models the background and the foreground as statistically independent signals in space and time. Although many relatively effective motion estimation methods exist, ICA is retained for two reasons. First, it is less sensitive to noise caused by continuous environment changes over time, such as swaying branches, sensor noise, and illumination changes. Second, this method provides clear-cut separation of the objects from the background and can detect objects that remain motionless for a long period. Foreground extraction is performed separately on both cameras. The motion-detection step allows focusing on the areas of interest, to which the 3D localization module is applied.
(ii) 3D Localization Module
This process applies a specific stereo-matching algorithm to obtain a 3D localization of the detected objects. In order to deal with poor-quality images, a selective stereo-matching algorithm is developed and applied to the moving regions. First, a disparity map is computed for all moving pixels according to a dissimilarity function entitled Weighted Average Color Difference (WACD). An unsupervised classification technique is then applied to the initial set of matching pixels, which makes it possible to automatically retain only the well-matched pixels. A pixel is considered well-matched if the correspondent assigned to it by a given stereo-matching algorithm is the true correspondent. The true correspondents are given by the ground truth, which makes it possible to verify the accuracy of the applied matching algorithm. The classification is performed by applying a confidence-measure technique (Section 5.2). It consists of evaluating the result of the likelihood function, based on the "winner-take-all" strategy. The pixels constituting each object are then estimated by applying a hierarchical belief-propagation technique detailed in Section 5.3.
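The "winner-take-all" selection mentioned above can be illustrated with a minimal sketch. It assumes a cost volume of shape (height, width, disparities) in which lower scores mean better matches. The function names are hypothetical, and the cost used here is a plain mean absolute color difference chosen as a stand-in; it is not the WACD function of the paper.

```python
import numpy as np

def absolute_difference_volume(left, right, max_disp):
    """Stand-in cost (NOT WACD): mean absolute color difference per pixel.

    left, right: rectified color images of shape (H, W, 3).
    Returns a cost volume of shape (H, W, max_disp + 1); entries that would
    fall outside the right image are set to infinity.
    """
    h, w, _ = left.shape
    volume = np.full((h, w, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        # A left pixel at column x is compared with the right pixel at x - d.
        diff = np.abs(left[:, d:, :].astype(float) - right[:, :w - d, :].astype(float))
        volume[:, d:, d] = diff.mean(axis=2)
    return volume

def winner_take_all(cost_volume):
    """Pick, for every pixel, the disparity with the lowest dissimilarity."""
    disparity = np.argmin(cost_volume, axis=2)
    best_cost = np.min(cost_volume, axis=2)
    return disparity, best_cost
```

In the selective approach described in the text, only a subset of these winners would be kept, according to their confidence measure.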
4. Background Subtraction by Independent Component Analysis
4.1. Related Work
Real environments are much more complex than indoor environments and require advanced tools to deal, for instance, with sharp brightness variations. Another aspect that must be dealt with is the motion in the background, such as swaying branches, illumination changes, clouds, shadows, and sensor noise. Background subtraction is one of the motion detection methods introduced to extract the foreground objects from a reference background in an image sequence. In recent years, another set of techniques has emerged to cope with the problem of foreground estimation. The Independent Component Analysis (ICA) technique is getting much attention in video processing. It was introduced in the 1980s in the context of neural network modeling. The purpose of ICA is to restore statistically independent source signals, given only observed output signals without knowing the mixing matrix of the sources. Zhang and Chen have introduced the spatiotemporal ICA method to model a video sequence for background subtraction. Their scheme tries to extract a set of mutually independent components from a given mixture of two signals representing, respectively, a background and an image containing an arbitrary object. Recently, Tsai and Lai have proposed an improved ICA scheme for background subtraction without background updating. However, their work is limited to an indoor environment with small environmental changes, and it only uses monochrome image sequences from a stationary camera.
4.2. Motion Detection by Independent Component Analysis
An ICA algorithm can be seen as a convolution between two signals. The more similar the signals, the smaller the resulting values. In Figure 1, the intensities of the pixels of the white lines on the road are very similar in the two images, so the difference between these two signals gives a small value. The smaller the value, the darker the corresponding pixel. For background subtraction, the color ICA model outputs three channels, each linked to a color component of the processed image: red, green, or blue. The channel with the highest signal-to-noise ratio is used to perform the motion-based segmentation process. The foreground segmentation is based on a threshold calculated from the histogram of the considered output channel. The threshold is estimated using the following procedure: each pixel in the output channel belongs to a class representing a color level, and the entries of a class are the pixels whose color corresponds to this class. The color value corresponding to the class with the highest number of entries is taken as the threshold. Therefore, the foreground object can be easily extracted from the estimated source according to its Gaussian distribution.
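The threshold procedure described above (take the most populated histogram class) can be sketched as follows for an 8-bit output channel. How the threshold is then applied is not specified in the text, so the `margin` parameter and the rule "keep pixels that deviate from the dominant level by more than `margin`" are assumptions of this sketch, as are the function names.

```python
import numpy as np

def mode_threshold(channel, levels=256):
    """Return the color level whose histogram class has the most entries."""
    hist = np.bincount(channel.ravel(), minlength=levels)
    return int(np.argmax(hist))

def foreground_mask(channel, margin=10):
    """Assumed decision rule: pixels far from the dominant level are foreground.

    `margin` is a hypothetical tolerance, not a value taken from the paper.
    """
    threshold = mode_threshold(channel)
    deviation = np.abs(channel.astype(np.int32) - threshold)
    return deviation > margin
```

For example, on a channel that is mostly at level 20 with a small bright patch, `mode_threshold` returns 20 and `foreground_mask` marks only the patch.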
The inverse of the mixing matrix, called the de-mixing matrix, is estimated by the FastICA algorithm. The estimated source images contain only the foreground object in a uniform region without the detailed contents of the background. The FastICA algorithm is based on a fixed-point iteration scheme maximizing non-Gaussianity as a measure of statistical independence. It attempts to find a set of independent components by estimating the maximum negentropy. The FastICA algorithm uses an approximation of the Newton method, tailored to the ICA problem, and provides fast convergence with little computation per iteration. To make this estimate, the algorithm iteratively searches for the weight matrix of a neural network that properly separates the data signal mixtures into independent components. The search uses the covariance matrix of the data matrix, and each iteration of the search loop estimates one weight vector. An intuitive interpretation of the contrast functions is that they are measures of non-normality. The estimated source signals are termed independent components. The iterative algorithm finds the direction of the weight matrix that maximizes the non-Gaussianity of the projection of the data matrix. The FastICA algorithm is described in Algorithm 1.
Algorithm 1: The FastICA algorithm. Its final step repeats the procedure starting at step (2).
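Since the listing of Algorithm 1 is not available here, the following is a minimal sketch of a textbook one-unit FastICA with deflation, not a reproduction of the authors' listing. It assumes whitened data, a tanh contrast nonlinearity, and hypothetical names and default parameters (`max_iter`, `tol`, `seed`).

```python
import numpy as np

def whiten(X):
    """Center and whiten data X of shape (n_signals, n_samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = np.cov(Xc)                              # covariance of the data matrix
    eigval, eigvec = np.linalg.eigh(cov)
    whitening = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return whitening @ Xc

def fastica_deflation(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Estimate weight vectors one at a time by fixed-point iteration."""
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    W = np.zeros((n_components, Z.shape[0]))
    for p in range(n_components):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ Z)                    # contrast nonlinearity
            g_prime = 1.0 - g ** 2
            # Fixed-point (approximate Newton) update
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove projections onto vectors already found
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[p] = w
    return W, W @ Z                               # weights, estimated sources
```

The estimated sources are recovered only up to sign and ordering, which is why the convergence check compares the absolute value of the inner product of successive weight vectors.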
This technique enables the detection of any kind of object, such as pedestrians, cars, or arbitrary objects. It has two further advantages. First, very small objects can be detected easily. Second, unlike other foreground detection techniques, color ICA (CICA) does not absorb a stationary object into the background. Therefore, the period during which an object is motionless does not affect the detection performance.
5. Stereo Matching for Robust 3D Localization
The two-frame stereo-matching approaches allow computing disparities and detecting occlusions, assuming that each pixel in the input image corresponds to a unique depth value. The stereo algorithm described in this section stems from the inference principle based on hierarchical belief propagation and energy minimization.
It takes advantage of local methods to reduce the complexity of the belief-propagation method, which leads to an improvement in the quality of results. A Hierarchical Belief Propagation (HBP) based on a confidence-measure technique is proposed. First, the data term (detailed in Section 5.1) is computed using the Weighted Average Color Difference (WACD) dissimilarity function. The obtained 3D volume allows initializing the belief-propagation graph by attributing a set of possible labels (i.e., disparities) to each node (i.e., pixel). The originality is to consider a subset of nodes among all the nodes to begin the inference algorithm. This subset is obtained thanks to a confidence measure computed at each node of a graph of connected pixels. Second, the propagation of messages between nodes is performed hierarchically from the nodes having the highest confidence measure to those having the lowest one. A message is a vector of parameters (e.g., possible disparities, (x, y) coordinates, etc.) that describes the state of a node. To begin with, the propagation is performed within each homogeneous color region and then passed from one region to another. The set of regions is obtained by a color-based segmentation using the mean shift method. At level crossings, the motion constraint is also employed in the matching process in order to reduce both the matching error rate and the processing time: the 3D localization step concerns only the pixels in motion. A summary of our algorithm is given in Algorithm 2.
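Algorithm 2: Summary of the proposed algorithm.
(1) Initialize the data cost for the nodes in the graph using the WACD dissimilarity function (Section 5.1).
(2) Repeat steps (a), (b), (c), and (d) for each node. These steps include:
- Update the label of the current node.
- Update the weight of the current node.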
5.1. Global Energy Minimization
The minimization of this energy is performed iteratively by passing messages between all the neighboring nodes. These messages are updated at each iteration, until convergence. A node can be represented as a pixel having a vector of parameters such as, typically, its possible labels. Several studies [16–18] have proposed ways to improve the processing time of the inference process. However, reducing the complexity of the inference algorithm leads in most cases to reduced matching quality. Other algorithm variants can be derived from this basic model by introducing additional parameters in the message to be passed. A compromise must be found between the reliability and the computational cost. One of the important parameters is the spatiocolorimetric proximity between nodes.
(ii) The smoothness term is used to ensure that neighboring pixels have similar disparities.
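For orientation, energies minimized by belief propagation in stereo matching are commonly written as the sum of a data term and a smoothness term. The form below is that generic formulation, given here as an assumption; the symbols $P$, $N$, $D_p$, and $V$ are illustrative and may differ from the paper's own notation.

```latex
E(d) = \sum_{p \in P} D_p(d_p) \;+\; \sum_{(p,q) \in N} V(d_p, d_q)
```

Here $d_p$ is the disparity label assigned to pixel $p$, $D_p(d_p)$ is the data cost of that assignment, and $V(d_p, d_q)$ penalizes differing labels on neighboring pixels $(p, q)$.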
5.2. Confidence-Measure Computation
(i) Best Correlation Score (min)
The output of the dissimilarity function is a measure of the degree of similarity between two pixels. The candidate pixels are ranked in increasing order according to their corresponding scores. The pair of pixels that has the minimum score is considered as the best-matched pair. The lower the score, the better the matching. The nearer the minimum score is to zero, the greater the chance that the candidate pixel is the actual correspondent.
A second parameter is the number of potential candidate pixels having similar scores. It has a big influence because it reflects the behavior of the dissimilarity function. A high value means that the first candidate pixel is located in a uniform color region of the frame. The lower the value, the fewer the candidate pixels. If there are few candidates, the chosen candidate pixel has a greater chance of being the actual correspondent, because the pixel to be matched belongs to a region with high variation of color components. A very small value combined with a score close to zero means that the pixel to be matched probably belongs to a region of high color variation.
A disparity value is obtained for each candidate pixel. For the potential candidate pixels, we compute the standard deviation of the disparity values. A small standard deviation means that the candidate pixels are spatial neighbors. In this case, the true candidate pixel should belong to a particular region of the frame, such as an edge or a transition point, which increases the confidence measure. A large standard deviation means that the candidate pixels taken into account are situated in a uniform color region.
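The three quantities discussed in this section can be computed from the list of candidate scores of one pixel, as in the sketch below. The tolerance `tol` that defines "similar scores" is a hypothetical parameter, and the way the paper combines the three values into a single confidence measure is not reproduced here.

```python
import numpy as np

def confidence_parameters(scores, disparities, tol=0.05):
    """Return (best score, number of similar candidates, disparity std).

    scores:      dissimilarity score of each candidate pixel (lower is better)
    disparities: disparity associated with each candidate pixel
    tol:         assumed tolerance below which two scores count as similar
    """
    scores = np.asarray(scores, dtype=float)
    disparities = np.asarray(disparities, dtype=float)
    best = float(scores.min())
    similar = scores <= best + tol                # potential candidates
    n_candidates = int(similar.sum())
    sigma = float(disparities[similar].std())     # spread of their disparities
    return best, n_candidates, sigma
```

A pixel with a best score near zero, few similar candidates, and a small disparity spread would be the kind of pixel the text describes as reliably matched.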
5.3. Hierarchical Belief Propagation for Disparity Enhancement
All the matched pixels can be modeled as a set of nodes in an undirected graph. Typically, the inference algorithm based on a belief-propagation method [20, 21] can be applied to achieve the optimal solution that corresponds to the best disparity set. A set of messages is iteratively transmitted from a node to its neighbors until convergence. In this basic framework, all the nodes have the same weight, meaning that a message is passed from a node to all its neighbors. The main drawback is that several erroneous messages might be passed across the graph, leading to an increased number of iterations without guarantee of reaching the best solution. Several works have tried to improve the performance of the message-passing step of the standard belief-propagation method. The proposed HBP technique both improves the quality of results and speeds up the inference step.
In this expression, the first quantity is the Euclidean distance between the two nodes, and the second is the diagonal of the cubic support window of edge β. According to (11), noisy nodes, characterized by a high confidence measure and an outlying disparity value, are eliminated. This reduces the errors in the high level of the message-passing step and significantly decreases the number of iterations, so that the optimal solution is reached quickly.
The confidence measure is used to assign a weight to each node in the graph. At each iteration, messages are passed hierarchically from nodes having a high confidence measure (i.e., high weight) to nodes having a low confidence measure (i.e., small weight). A high weight means a high certainty of the message to be passed. The weights of the nodes are updated after each iteration, so that a subset of nodes is activated to be able to send messages in the next iteration.
The propagation is first performed inside a consistent color region, and then passed to the neighboring regions. The set of regions is obtained by a color-based segmentation using the mean shift method.
In our framework, the messages are passed differently from the standard BP algorithm. Instead of considering the 4-connected nodes, the k-nearest neighboring nodes are considered. These k-nearest neighboring nodes belong to a cubic 3D support window. We assume that the labels of nodes vary smoothly within a 3D support window centered on the node to be updated.
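The neighbor selection described above can be sketched as follows, assuming that each node is represented by its (x, y, disparity) coordinates and that the support window is an axis-aligned cube of edge `beta` centered on the node to be updated. The function name and the normalization of distances by the cube diagonal are illustrative assumptions.

```python
import numpy as np

def k_nearest_in_cube(nodes, center_idx, beta, k):
    """Return indices and normalized distances of the k nearest nodes
    lying inside the cubic window centered on nodes[center_idx].

    nodes: array of shape (n, 3) holding (x, y, disparity) per node.
    """
    nodes = np.asarray(nodes, dtype=float)
    center = nodes[center_idx]
    inside = np.all(np.abs(nodes - center) <= beta / 2.0, axis=1)
    inside[center_idx] = False                    # exclude the node itself
    candidates = np.flatnonzero(inside)
    dist = np.linalg.norm(nodes[candidates] - center, axis=1)
    order = np.argsort(dist, kind="stable")[:k]
    diagonal = beta * np.sqrt(3.0)                # diagonal of the cube
    return candidates[order], dist[order] / diagonal
```

Unlike a fixed 4-connected neighborhood, this selection depends on the estimated disparities as well as on image position, so nodes on different depth layers are less likely to exchange messages.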
6. Experimental Results
The effectiveness of the proposed system is evaluated on both standard and real datasets. Each module is evaluated separately due to the unavailability of a conventional ground truth for motion estimation and 3D localization at a level crossing.
The proposed stereo-matching algorithm is evaluated on the Middlebury stereo benchmark, using the Tsukuba, Venus, Teddy, and Cones standard datasets, and on real-world datasets. The evaluation concerns nonoccluded regions (nonocc), all regions (all), and depth-discontinuity regions (disc). In the first step of our algorithm, the WACD likelihood function is computed on all the pixels. Applying the "winner-take-all" strategy, a label corresponding to the best estimated disparity is attributed to each pixel. The second step consists of selecting a subset of pixels according to their confidence measure. Indeed, the pixels having a low confidence measure generally belong to either occluded or textureless regions. The subset corresponding to the well-matched pixels is taken as the starting point of the hierarchical belief-propagation module. We begin by evaluating the selective approach of attributing a confidence measure to each matched pair. Figure 6 shows the percentage of well-matched pixels depending on the confidence measure parameter. The higher the confidence measure, the greater the rate of well-matched pixels.
Algorithm evaluation on the Middlebury dataset.
The different steps described in the previous sections are illustrated in Figure 10, showing a car crossing an LC in Lausanne (Switzerland). CICA is applied to the left-hand image. The segmentation results are used as motion constraints in the stereo-matching process, which quite often yields several disparity values for each detected foreground point. The false matches corresponding to wrong disparity values are detected automatically using the confidence-measure technique. The final disparity map obtained for each object then allows each object to be located very precisely at the LC. The foreground extraction method based on CICA has already been evaluated in terms of Recall (95%) and Precision (98%), on a set of 300 images with manually elaborated ground truth.
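As a reminder of how recall and precision figures of this kind are usually obtained, the sketch below compares a predicted foreground mask with a ground-truth mask pixel by pixel. This is the standard definition and an assumption about the evaluation protocol; the paper does not detail how its scores were computed.

```python
import numpy as np

def precision_recall(predicted, truth):
    """Pixel-wise precision and recall of a binary foreground mask."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(predicted & truth)      # foreground found correctly
    fp = np.count_nonzero(predicted & ~truth)     # background marked foreground
    fn = np.count_nonzero(~predicted & truth)     # foreground missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```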
The depth of each object point is obtained from its disparity by triangulation:

$$Z = \frac{f \, b}{d},$$

where $Z$ is the depth, that is, the distance between the camera sensor and the object point along the $Z$ axis, $f$ is the focal length, that is, the distance between the lens and the sensor, supposed identical for both cameras, $b$ is the baseline, that is, the distance separating the cameras, and $d$ is the estimated disparity.
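The relation between depth, focal length, baseline, and disparity is the standard triangulation formula for a rectified stereo pair, depth = focal length × baseline / disparity. The sketch below applies it to an array of disparities; the function and parameter names, and the units (focal length in pixels, baseline in meters), are assumptions made for the example.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Convert disparities (pixels) to depths (meters).

    Non-positive disparities carry no depth information and map to infinity.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth
```

For instance, with a focal length of 800 pixels and a baseline of 0.5 m, a disparity of 20 pixels corresponds to a depth of 20 m, and doubling the disparity halves the depth.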
7. Conclusions and Perspectives
In this paper, we have proposed a processing chain addressing safety at level crossings, composed of a foreground extraction based on CICA followed by a robust 3D localization. The latter proves its effectiveness compared to stereo-matching algorithms found in the literature. The experiments showed that the method is applicable to real-world scenes in level-crossing applications. The foreground extraction method based on CICA has already been evaluated in terms of Recall (95%) and Precision (98%) on a set of 300 images with manually elaborated ground truth. Real-world datasets have been shot at four different level crossings, including a hundred scenarios per level crossing under different illumination and weather conditions. The global chain, including foreground extraction and 3D localization, still needs to be evaluated intensively on the above dataset. According to the experiments, the localization of some objects may fail: the localization of one among sixty objects fails, owing to the small number of pixels having a confidence measure larger than a fixed threshold. The starting point of the belief-propagation process highly depends on the number and distribution of pixels having a high confidence measure inside an object. This drawback can be handled by introducing the temporal dependency in the belief-propagation process.
The main output of the proposed system is an accurate localization of any object in and around a level crossing. For safety purposes, the proposed system will be coupled with already existing devices at level crossings. For instance, the status of the traffic light and the barriers will be taken as input in our vision-based system. The level of the alarm raised by the system depends on the configuration of these different parameters. For instance, the presence of an obstacle in the crossing zone when the barriers are lowering is a dangerous situation, and the triggered alarm must be of high importance. A Preliminary Risk Analysis (PRA) seems to be an interesting way to categorize the level of alarms. In the frame of the French project entitled PanSafer, these different parameters will be studied. In particular, telecommunication systems will be used to inform road users of the status of the level crossing. Such information could also be shared with the train driver and the control room. The communication tool and the nature of the information to be transmitted are under study.
- Project SELCAT (Safer European Level Crossing Appraisal and Technology): A Co-ordination Action of the European Commission's 6th Framework Programme.
- Foresti GL: A real-time system for video surveillance of unattended outdoor environments. IEEE Transactions on Circuits and Systems for Video Technology 1998, 8(6):697-704. 10.1109/76.728411
- Ohta M: Level crossings obstacle detection system using stereo cameras. Quarterly Report of Railway Technical Research Institute 2005, 46(2):110-117. 10.2219/rtriqr.46.110
- Fakhfakh N, Khoudour L, El-Koursi M: Mise en correspondance stéréoscopique d'images couleur pour la détection d'objets obstruant la voie aux passages à niveau. Proceedings of the TELECOMA and 6th JFMMA, 2009, Agadir, Maroc 206-209.
- Fakhfakh N, Khoudour L, El-Koursi M, Jacot J, Dufaux A: A new selective confidence measure-based approach for stereo matching. Proceedings of the International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, 2009, Santiago, Chile 5711: 184-191.
- Herault J, Jutten C: Space or time adaptive signal processing by neural networks model. Proceedings of the International Conference on Neural Networks for Computing, April 1986, Snowbird, Utah, USA 206-211.
- Zhang X-P, Chen Z: An automated video object extraction system based on spatiotemporal independent component analysis and multiscale segmentation. EURASIP Journal on Applied Signal Processing 2006, 2006: 1-22.
- Tsai D-M, Lai S-C: Independent component analysis-based background subtraction for indoor surveillance. IEEE Transactions on Image Processing 2009, 18(1):158-167.
- Hyvärinen A: Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 1999, 10(3):626-634. 10.1109/72.761722
- Zhen T, Zhenjiang M: Fast background subtraction using improved GMM and graph cut. Proceedings of the 1st International Congress on Image and Signal Processing (CISP '08), 2008, Sanya, China 4: 181-185.
- Kim K, Chalidabhongse TH, Harwood D, Davis L: Real time foreground-background segmentation using a modified codebook model. Journal of Real-Time Imaging 2005, 11(3):172-185. 10.1016/j.rti.2004.12.004
- Comaniciu D, Meer P: Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2002, 24(5):603-619. 10.1109/34.1000236
- Felzenszwalb PF, Huttenlocher DP: Efficient belief propagation for early vision. International Journal of Computer Vision 2006, 70(1):41-54. 10.1007/s11263-006-7899-4
- Boykov Y, Veksler O, Zabih R: Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(11):1222-1239. 10.1109/34.969114
- Isard M, MacCormick J: Dense motion and disparity estimation via loopy belief propagation. Proceedings of the Asian Conference on Computer Vision, January 2006, Hyderabad, India 32-41.
- Zhou X, Wang R: Stereo matching based on color and disparity segmentation by belief propagation. Optical Engineering 2007, 46(4).
- Yang Q, Wang L, Yang R, Wang S, Liao M, Nister D: Real-time global stereo matching using hierarchical belief propagation. Proceedings of the British Machine Vision Conference (BMVC '06), September 2006 989-998.
- Trinh H: Efficient stereo algorithm using multiscale belief propagation on segmented images. Proceedings of the British Machine Vision Conference (BMVC '08), 2008.
- Yang Q, Wang L, Yang R, Stewénius H, Nistér D: Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling. IEEE Transactions on Pattern Analysis and Machine Intelligence 2009, 31(3):492-504.
- Yang Q, Wang L, Ahuja N: A constant-space belief propagation algorithm for stereo matching. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), June 2010, San Francisco, Calif, USA 1458-1465.
- Scharstein D, Szeliski R: Middlebury stereo vision research. http://vision.middlebury.edu/stereo/eval
- Miyazaki D, Matsushita Y, Ikeuchi K: Interactive shadow removal from a single image using hierarchical graph cut. Proceedings of the Asian Conference on Computer Vision (ACCV '09), 2009.
- El-Etriby S, Al-Hamadi A, Michaelis B: Dense stereo correspondence with slanted surface using phase-based algorithm. Proceedings of the IEEE International Symposium on Industrial Electronics, 2007.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.