Vehicle classification based on images from visible light and thermal cameras
EURASIP Journal on Image and Video Processing volume 2018, Article number: 5 (2018)
We propose novel vehicle detection and classification methods based on images from visible light and thermal cameras. These methods can be used in real-time smart surveillance systems. To classify vehicles by type, we extract the headlight and grill areas from the visible light and thermal images. We then extract texture characteristics from these areas and use them as features for classifying different types of moving vehicles. From images obtained both at night and during the day, we extract four texture features: contrast, homogeneity, entropy, and energy. We validated our method experimentally: when vehicles were classified into six types (e.g., SUV, sedan, and RV), the accuracy of our visible light image classifier was 92.7% and that of our thermal image classifier was 65.8%.
Recently, as vehicle traffic levels have increased, issues such as traffic accidents, congestion, and traffic-induced air pollution have arisen. Among these issues, traffic accidents present a particularly challenging problem. When conducting criminal investigations of traffic accidents, it is important to have automated ways of searching for suspicious vehicles. Partly for this reason, Intelligent Transportation Systems (ITS) have been implemented in many countries [1,2,3]. These systems detect and classify vehicles based on data from video and infrared cameras and from acoustic and vibration sensors. However, the expense of traditional vehicle detection and classification systems often renders them impractical. Furthermore, these systems are difficult to operate, as they require a large amount of hardware. Meanwhile, over the past decade, traffic surveillance cameras have been increasingly deployed to monitor traffic on major roadways. Hence, making effective use of these existing cameras for data collection is of practical significance.
One of the essential components of real-time smart surveillance systems in ITS is moving vehicle detection and classification. Moving vehicles are detected in video frames by foreground extraction. Vehicle detection methods include the inter-frame difference method, the background extraction method, and the optical flow estimation method. Background subtraction is the first step toward object detection and can be performed using frame averaging, a single Gaussian, or a Gaussian mixture model (GMM). Friedman and Russell proposed the basic idea of using a GMM for vehicle detection. They used three Gaussians to represent the road, shadows, and moving vehicles. Stauffer and Grimson then modified this method to use K Gaussians per pixel, where K was fixed. Zivkovic used a Bayesian probability method to adaptively vary the number of Gaussian components required to model a pixel.
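The single-Gaussian option mentioned above can be sketched in a few lines of NumPy. This is a simplified per-pixel running Gaussian, not the multi-component GMM of Stauffer and Grimson or Zivkovic; the learning rate `alpha`, the threshold `k`, the variance floor, and the class name are illustrative assumptions. For an adaptive GMM, OpenCV provides `cv2.createBackgroundSubtractorMOG2`, which is based on Zivkovic's method.

```python
import numpy as np


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model (illustrative sketch).

    Each pixel keeps a running mean and variance; a pixel is labelled
    foreground when it lies more than k standard deviations from its mean.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=225.0, min_var=4.0):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, init_var)
        self.alpha = alpha      # learning rate (assumed value)
        self.k = k              # threshold in standard deviations (assumed)
        self.min_var = min_var  # variance floor so static pixels stay stable

    def apply(self, frame):
        """Return a boolean foreground mask and update the background model."""
        frame = frame.astype(np.float64)
        diff = frame - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        # Update only background pixels so vehicles are not absorbed into the model.
        bg = ~foreground
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = np.maximum(
            (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2,
            self.min_var,
        )
        return foreground


# Usage on a synthetic 4 x 6 scene: learn a noisy static background,
# then insert a bright 2 x 2 "vehicle".
rng = np.random.default_rng(0)
background = np.full((4, 6), 100.0)
model = RunningGaussianBackground(background)
for _ in range(20):
    model.apply(background + rng.normal(0.0, 2.0, background.shape))
frame = background.copy()
frame[1:3, 2:4] = 200.0
mask = model.apply(frame)
print(mask.astype(int))
```

Updating only pixels classified as background is one common design choice; it keeps a slow or stopped vehicle from being blended into the reference too quickly.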
Hit-and-run is the act of causing a traffic accident, such as colliding with another vehicle, and then failing to stop and identify oneself at the scene. Many hit-and-run accidents involving parked cars occur while the driver of the struck car is away from it. Many studies have been devoted to the detection and classification of hit-and-run vehicles; however, relatively few have examined the factors that contribute to a driver's decision to flee after a crash. Many issues affect the detection and classification of different types of vehicles. For example, it is important to classify vehicles by a characteristic region rather than by their contours. Moreover, image-processing techniques are very sensitive to variations in the external environment, so they tend to lose accuracy when the environment changes rapidly.
2 Related work
Numerous researchers have proposed approaches for detecting vehicles and classifying them by type. In one study, standard Principal Component Analysis (PCA) was used for feature extraction together with a nearest-neighbor classifier; however, because the evaluation database was relatively small, it is difficult to draw firm conclusions from that work. Another study combined a Haar wavelet with Gabor features to describe the properties of a vehicle. Scale-invariant feature transform (SIFT) features have been used to detect vehicles in rear-view images. Texture features have also been computed from frontal images of vehicles and used to train a three-layer neural network, which was then able to recognize the make and model of moving vehicles.
Another study developed a vehicle detection and tracking system to detect entering vehicles. This system measured optical flow and tracked vehicles by classifying their headlights, bonnet, front window, and roof area. Other authors proposed a vehicle model recognition system based on SIFT features of the vehicle's headlights and on homogeneity computed from the distribution of those features. License plates have been extracted by restricting the search to ROIs and applying corner templates based on edge detection. A distortion-invariant license plate extraction and recognition algorithm has also been proposed based on a Difference of Gaussians (DoG) filter; after geometric distortion correction and image enhancement, neural networks were used to recognize the license plates.
Another study presented a vehicle detection method based on histogram of oriented gradients (HOG) features extracted from a given region of an image. A combination of speeded-up robust features (SURF) and edges has also been used to detect vehicles in the blind spot. Recently, researchers studying vehicle detection have moved away from complex image features such as Gabor filters and HOGs toward simpler, more efficiently computable feature sets. Because Haar-like features are sensitive to vertical, horizontal, and symmetric structures and can be computed efficiently, they are well suited to real-time vehicle detection applications.
In recent decades, thermal cameras have also been used to obtain traffic information, and a thermal camera mounted on a UAV has been used to detect objects on the ocean surface. In [25, 26], people were detected and classified in low-resolution images from a moving thermal infrared camera. In this paper, we propose a smart surveillance system that detects and classifies vehicles during the day and at night. To achieve this, we first select regions of interest (ROIs) of the vehicle from the visible light and thermal images. Then, we measure the texture, contrast, homogeneity, entropy, and energy of the vehicle. We also estimate the width-to-height (aspect) ratio of the headlight and grill areas in each frame.
3.1 System architecture
Figure 1 shows a flow chart of the vehicle detection and classification process; both the visual and the thermal image classification systems follow these steps. First, we subtract a reference background image from the current image and decompose the resulting foreground into regions that correspond to objects. Second, we extract the headlight and grill areas. Third, in the feature extraction step, we extract object features such as color, texture, and shape. Fourth, we compare the features of different objects, either to identify them or to train the classifier. Finally, we classify the type of vehicle.
3.2 Foreground subtraction and ROI extraction of visual images
Foreground subtraction is a general method used to separate foreground objects from the rest of an image. First, a reference image is produced from the initial images, for example by averaging the initial background images. Then, the current image is compared with the reference image to identify the foreground. The first step is to detect the vehicles that we will then classify. We detect the vehicles using background modeling and subtraction (BGS). To make the model robust and accurate enough to classify vehicles in real time, we also perform Gaussian background subtraction and filtering using functions provided by the OpenCV library. We use pixel-based and non-motion-based background update methods to perform adaptive region-level background learning and updating. We exclude distant regions by specifying an ROI for each camera. To eliminate objects that are too small or too large, we use adaptive filters based on the object size and the average vehicle height.
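The averaging-based reference image and the size filter described above can be sketched as follows. This is a simplified illustration, not the paper's OpenCV pipeline: the fixed difference threshold, the fixed area limits (standing in for the adaptive filters), the 4-connected labeling, and all function names are assumptions.

```python
from collections import deque

import numpy as np


def reference_background(initial_frames):
    """Average the initial (vehicle-free) frames into a reference image."""
    return np.mean(np.stack(initial_frames).astype(np.float64), axis=0)


def foreground_mask(frame, reference, threshold=30.0):
    """Pixels whose absolute difference from the reference exceeds threshold."""
    return np.abs(frame.astype(np.float64) - reference) > threshold


def connected_components(mask):
    """4-connected components of a boolean mask, as lists of (row, col)."""
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def size_filtered_objects(mask, min_area, max_area):
    """Keep only components whose pixel area lies in [min_area, max_area]."""
    return [p for p in connected_components(mask) if min_area <= len(p) <= max_area]


# Usage: a static 8 x 8 scene, one noise pixel, and a 3 x 4 "vehicle" blob.
frames = [np.full((8, 8), 50, dtype=np.uint8) for _ in range(5)]
ref = reference_background(frames)
current = frames[0].copy()
current[0, 0] = 200       # isolated noise pixel, too small to keep
current[3:6, 2:6] = 180   # vehicle-sized region
objects = size_filtered_objects(foreground_mask(current, ref), min_area=4, max_area=40)
print([len(o) for o in objects])
```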
Figure 2 shows the obtained images and the results of the object subtraction step. When objects move through the image for a specified period, we update the motion history by comparing the current image with the first two images of the vehicle. We store the location of the object's center and its direction of motion in a buffer. In the next frame, the object is associated with the detected center that is closest to its center in the previous frame. Figure 3 shows the results of the foreground extraction step.
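The shortest-distance association rule can be sketched as a greedy nearest-neighbour matcher. The `max_distance` gate, the greedy ordering, and the function names are assumptions for illustration; the text does not say how conflicts or missing detections are handled.

```python
import math


def match_to_previous(prev_centers, new_centers, max_distance=50.0):
    """Greedily give each tracked object the closest unused new center.

    prev_centers: dict track_id -> (x, y) from the previous frame.
    new_centers: list of (x, y) detected in the current frame.
    Returns dict track_id -> (x, y); tracks with no center within
    max_distance are left unmatched.
    """
    unused = list(new_centers)
    updated = {}
    for track_id, (px, py) in prev_centers.items():
        if not unused:
            break
        nearest = min(unused, key=lambda c: math.hypot(c[0] - px, c[1] - py))
        if math.hypot(nearest[0] - px, nearest[1] - py) <= max_distance:
            updated[track_id] = nearest
            unused.remove(nearest)
    return updated


def direction(prev_center, center):
    """Unit vector of motion between two consecutive centers."""
    dx, dy = center[0] - prev_center[0], center[1] - prev_center[1]
    norm = math.hypot(dx, dy)
    return (0.0, 0.0) if norm == 0 else (dx / norm, dy / norm)


# Usage: two tracked vehicles, detections arriving in arbitrary order.
prev = {1: (10, 10), 2: (100, 40)}
cur = match_to_previous(prev, [(104, 46), (13, 14)])
print(cur)                          # {1: (13, 14), 2: (104, 46)}
print(direction(prev[1], cur[1]))   # (0.6, 0.8)
```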
3.3 Feature extraction of visual images
In the feature extraction step, we detect Sobel edges in the ROIs of two consecutive images and measure vertical and horizontal projections in the grill and headlight regions. Figure 4a and b show the extracted edge image and the horizontal projection of the ROIs, respectively. The grill and headlight regions are extracted using vertical projections and a median filter. In Fig. 5, the ROIs extracted from the detected vehicle are marked in red.
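A minimal NumPy sketch of the edge and projection step follows, assuming a grayscale ROI. Here the horizontal projection is taken as the sum of edge magnitude along each row and the vertical projection as the sum along each column (conventions vary); the median window size and function names are assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Valid-mode 3x3 correlation (no padding); sign does not matter for magnitude."""
    img = image.astype(np.float64)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out


def sobel_magnitude(image):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    return np.hypot(filter3x3(image, SOBEL_X), filter3x3(image, SOBEL_Y))


def projections(edges):
    """Horizontal projection (per-row sums) and vertical projection (per-column sums)."""
    return edges.sum(axis=1), edges.sum(axis=0)


def median_filter_1d(signal, size=5):
    """Median-smooth a 1-D projection profile (edge-padded)."""
    pad = size // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([np.median(padded[i:i + size]) for i in range(len(signal))])


# Usage: a synthetic ROI with a single horizontal intensity boundary.
roi = np.zeros((12, 10))
roi[6:, :] = 200.0
edges = sobel_magnitude(roi)
h_proj, v_proj = projections(edges)
smoothed = median_filter_1d(v_proj)
print(np.nonzero(h_proj)[0])  # rows of the edge map containing the boundary
```

Peaks in the horizontal projection indicate rows with strong horizontal structure (such as grill slats), and the median-filtered vertical projection suppresses isolated spikes before region boundaries are chosen.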
In general, vehicle models are identified using texture descriptors. Texture descriptors [28, 29] have been widely used to quantify the texture of objects; the texture of an image is represented by the differences in brightness between pixels. The local binary pattern (LBP) approach is a computationally simple method that provides highly discriminative texture information [30, 31] and is invariant to monotonic changes in gray level. We used histograms of LBP patterns as texture descriptors and classified them using a log-likelihood dissimilarity measure. The LBP operator has also been extended to neighborhoods of different sizes.
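The sketch below shows the basic 3 × 3 LBP operator, a normalized 256-bin histogram, and the log-likelihood statistic L(S, M) = −Σ_b S_b log M_b used by Ojala et al. for comparing a sample histogram S with a model histogram M. It is a simplified illustration: it omits the rotation-invariant and multi-scale variants, and the neighbour ordering, `eps`, and function names are assumptions.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from top-left; each sets one bit.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """8-bit LBP code for every interior pixel (neighbour >= centre sets a bit)."""
    img = image.astype(np.int32)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes."""
    hist = np.bincount(lbp_codes(image).ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def log_likelihood_dissimilarity(sample, model, eps=1e-10):
    """L(S, M) = -sum_b S_b * log(M_b); smaller means more similar."""
    return float(-np.sum(sample * np.log(model + eps)))


# Usage: a striped sample should be closer to a striped model than to noise.
rng = np.random.default_rng(0)
stripes_model = lbp_histogram(np.tile([0, 255], (16, 8)))
noise_model = lbp_histogram(rng.integers(0, 256, size=(16, 16)))
sample = lbp_histogram(np.tile([0, 255], (10, 5)))
d_stripes = log_likelihood_dissimilarity(sample, stripes_model)
d_noise = log_likelihood_dissimilarity(sample, noise_model)
print(d_stripes < d_noise)
```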
We quantified textures by generating a gray-level co-occurrence matrix (GLCM) based on the spatial relationships between pixels. The GLCM considers the relationship between two pixels at a time, called the reference pixel and the neighbor pixel. Each element G[i, j] of the GLCM is the probability that a reference pixel with gray level i and a neighbor pixel with gray level j occur at displacement d from each other. Because grayscale images generally have 256 levels, the full GLCM is large (256 × 256). Many texture descriptors can be derived from the GLCM, such as contrast, dissimilarity, and homogeneity; these relate to statistical properties of the distribution such as regularity, mean, variance, and correlation. In this paper, we used four basic texture descriptors of the GLCM G: the contrast, homogeneity, entropy, and energy (angular second moment).
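A minimal GLCM construction for a single non-negative displacement (dx, dy) is sketched below. Quantizing 256 gray levels down to `levels` bins and symmetrizing the counts are common practices assumed here for illustration; the text does not state whether the authors did either.

```python
import numpy as np


def glcm(image, dx=1, dy=0, levels=8, symmetric=True):
    """Normalized gray-level co-occurrence matrix for displacement (dx, dy).

    The 8-bit image is first quantized to `levels` bins, which keeps the
    matrix at levels x levels instead of 256 x 256 (an assumption).
    """
    if dx < 0 or dy < 0:
        raise ValueError("this sketch supports non-negative offsets only")
    q = (image.astype(np.int64) * levels) // 256   # 0..255 -> 0..levels-1
    h, w = q.shape
    ref = q[0:h - dy, 0:w - dx]   # reference pixels
    nbr = q[dy:h, dx:w]           # their neighbours at offset (dy, dx)
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    if symmetric:
        counts += counts.T        # count (i, j) and (j, i) pairs alike
    return counts / counts.sum()  # turn counts into probabilities


# Usage: the classic 4 x 4 example with gray levels 0, 64, 128, 192 -> bins 0..3.
img = np.array([[0, 0, 64, 64],
                [0, 0, 64, 64],
                [0, 128, 128, 128],
                [128, 128, 192, 192]], dtype=np.uint8)
G = glcm(img, levels=4, symmetric=False)
print(G * 12)   # raw horizontal co-occurrence counts (12 pixel pairs)
```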
The contrast, defined by Eq. (1), measures the difference in luminance or color, and it enables us to distinguish objects. In real scenarios, the contrast between the background and the vehicle is often low, which makes it difficult to improve the accuracy of detection and classification. The accuracy of existing vehicle classification schemes under low-contrast conditions is not sufficient to ensure the success of ITS.
By homogeneity, we mean the spectral homogeneity, which is defined in Eq. (2). We used homogeneity as a texture feature for a Bayesian classifier. However, we cannot discriminate vehicle types well using only the contrast and homogeneity.
Entropy is a measure of information content, defined in Eq. (3) as the average uncertainty of the information source. Mutual information measures the mutual dependence of two random variables and can be used as a measure of similarity.
Energy can usually be used to distinguish between vehicle types. Energy is defined by the following equation:
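As a minimal sketch, assuming the common GLCM definitions (which may differ in detail from the paper's equations), the four descriptors of a normalized GLCM G can be computed as: contrast = Σ (i − j)² G[i, j]; homogeneity = Σ G[i, j] / (1 + |i − j|); entropy = −Σ G[i, j] log₂ G[i, j] over nonzero entries; energy = Σ G[i, j]². Some libraries instead use 1 + (i − j)² in the homogeneity denominator, the natural logarithm for entropy, or the square root of Σ G² as energy.

```python
import numpy as np


def glcm_features(g):
    """Contrast, homogeneity, entropy and energy of a normalized GLCM g."""
    i, j = np.indices(g.shape)
    nz = g > 0   # skip empty cells so log2(0) never occurs
    return {
        "contrast": float(np.sum((i - j) ** 2 * g)),
        "homogeneity": float(np.sum(g / (1.0 + np.abs(i - j)))),
        "entropy": float(-np.sum(g[nz] * np.log2(g[nz]))),
        "energy": float(np.sum(g ** 2)),
    }


# Usage: two extreme 4 x 4 GLCMs.
uniform = np.full((4, 4), 1 / 16)   # every gray-level pair equally likely
diagonal = np.eye(4) / 4            # neighbours always share the same level
print(glcm_features(uniform))
print(glcm_features(diagonal))      # zero contrast, maximal homogeneity
```

The two extremes show why the descriptors complement each other: a flat texture (diagonal GLCM) has zero contrast and high energy, while a noisy texture (uniform GLCM) has high contrast and maximal entropy.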
3.4 Feature extraction of thermal images
We collected real-time videos from visual and thermal cameras operating both during the day and at night. The collected video dataset comprises two sequences, one of visual images and one of thermal images. The thermal images were obtained with a FLIR ONE, which combines a thermal camera with a resolution of 160 × 120 pixels and a recording speed of 10 frames per second (fps) with a visual camera with a resolution of 1280 × 720 pixels and a recording speed of 29 fps.
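A quick calculation with the sensor figures quoted above shows how much coarser the thermal stream is, which bears on the resolution limits discussed in the results. The variable names are illustrative only.

```python
# FLIR ONE figures quoted in the text.
thermal_w, thermal_h, thermal_fps = 160, 120, 10
visual_w, visual_h, visual_fps = 1280, 720, 29

thermal_px = thermal_w * thermal_h   # pixels per thermal frame
visual_px = visual_w * visual_h      # pixels per visual frame

print(visual_px // thermal_px)                        # pixel-count ratio
print(visual_w // thermal_w, visual_h // thermal_h)   # per-axis ratio
print(visual_fps / thermal_fps)                       # visual frames per thermal frame
```

Each thermal frame has 48 times fewer pixels than a visual frame (8 times fewer columns and 6 times fewer rows), and the two streams are not frame-aligned, since roughly 2.9 visual frames arrive per thermal frame.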
Figure 6 shows visual images obtained from the FLIR ONE during the day and at night. Vehicles can be classified from visual images captured by an RGB camera during the daytime, but not from visual images captured at night unless there is sufficient illumination. Therefore, another type of input data is needed for vehicle detection and classification at night, and thermal imaging data could be suitable for this purpose.
Figure 7 shows thermal images from the FLIR ONE taken during the day and at night. Thermal images obtained during the day contain fewer features usable for classification than daytime visual images. However, thermal images obtained at night contain more such features than nighttime visual images, as shown in Figs. 6b and 7b.
A vehicle can be identified by its headlights and its shape, including the ratio of its width to its height, by applying image segmentation and pattern analysis techniques. However, the light from the headlights makes it difficult to classify vehicle types in visual video images taken at night, whereas thermal video cameras have an obvious advantage for nighttime surveillance.
Headlights can be detected by identifying bright masses of pixels in the ROI, for example using morphological operations or template matching. In practice, however, these methods are not effective for all kinds of traffic scenes and camera exposures. During the day, vehicle types can be classified by the shape of the headlights or the distance between them. However, most vehicles have circular headlamps, which emit a circular beam of light at night, making it very difficult to distinguish between vehicle types using data captured by visual cameras at night. A vehicle consists of various elements, such as the bumper, engine, windshield, and tires, each of which has its own heat signature depending on the vehicle type. Because the shape, size, and position of these elements vary, vehicles can be classified into types such as sedan, SUV (sport utility vehicle or recreational vehicle), truck, and bus, as shown in Fig. 7b. In this paper, we measured the contrast, homogeneity, entropy, and energy of thermal images for vehicle classification.
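Bright-blob detection with a morphological opening, followed by the width-to-height ratio used as a shape feature, can be sketched as follows. The brightness threshold of 220, the 3 × 3 structuring element, and the function names are illustrative assumptions, not the authors' settings.

```python
from collections import deque

import numpy as np


def erode(mask):
    """3x3 binary erosion (pixels outside the image count as background)."""
    p = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    out = np.ones_like(mask, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out


def dilate(mask):
    """3x3 binary dilation."""
    p = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def bright_blob_boxes(gray, threshold=220):
    """Bounding boxes (top, left, bottom, right) of bright blobs after opening."""
    mask = dilate(erode(gray >= threshold))   # opening removes specks below 3x3
    seen = np.zeros_like(mask)
    boxes = []
    for r, c in zip(*np.nonzero(mask)):
        if seen[r, c]:
            continue
        seen[r, c] = True
        queue, ys, xs = deque([(r, c)]), [], []
        while queue:
            y, x = queue.popleft()
            ys.append(y)
            xs.append(x)
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(min(ys)), int(min(xs)), int(max(ys)), int(max(xs))))
    return boxes


def aspect_ratio(box):
    """Width-to-height ratio of a (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return (right - left + 1) / (bottom - top + 1)


# Usage: two 3 x 5 "headlights" plus a one-pixel highlight that opening removes.
roi = np.zeros((10, 20))
roi[3:6, 2:7] = 255
roi[3:6, 13:18] = 255
roi[8, 10] = 255
boxes = bright_blob_boxes(roi)
print(boxes, [aspect_ratio(b) for b in boxes])
```

The opening discards the small highlights and reflections mentioned in the results, but larger reflections survive it, which is consistent with the authors' observation that such methods do not generalize to all scenes.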
4 Results and discussion
We evaluated our classifiers using a dataset acquired on a local two-way road. We used 6671 visual images and 4005 thermal images as the training set and 767 visual vehicle images and 447 thermal vehicle images as the test set. We classified the visual images into six types based on the texture and the ratio of the width of the grill to its height. To classify the thermal images, we identified vehicle objects and then extracted the shapes of the car fronts; we then classified the vehicles into six types based on their texture.
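The text mentions a Bayesian classifier but does not specify its form. As one possibility, the sketch below trains a Gaussian naive Bayes model on per-vehicle feature vectors such as [contrast, homogeneity, entropy, energy, width/height]. The classifier choice, the class name, and the training data are all hypothetical; the feature values are synthetic and are not the paper's data.

```python
import numpy as np


class GaussianNaiveBayes:
    """Gaussian naive Bayes: each feature is an independent Gaussian per class."""

    def fit(self, X, y, var_floor=1e-6):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) + var_floor for c in self.classes_])
        self.log_priors_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        # Per-class Gaussian log-likelihood of every sample: shape (n, k).
        diff = X[:, None, :] - self.means_[None, :, :]
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * self.vars_)[None] + diff ** 2 / self.vars_[None], axis=2
        )
        return self.classes_[np.argmax(log_lik + self.log_priors_[None, :], axis=1)]


# Usage with synthetic feature vectors: [contrast, homogeneity, entropy, energy, w/h].
rng = np.random.default_rng(1)
centres = {"sedan": [1.0, 0.6, 3.0, 0.10, 2.2], "suv": [2.0, 0.4, 3.5, 0.05, 1.6]}
X_train, y_train, X_test, y_test = [], [], [], []
for label, centre in centres.items():
    X_train += list(rng.normal(centre, 0.05, size=(30, 5)))
    y_train += [label] * 30
    X_test += list(rng.normal(centre, 0.05, size=(10, 5)))
    y_test += [label] * 10
clf = GaussianNaiveBayes().fit(X_train, y_train)
accuracy = np.mean(clf.predict(X_test) == np.array(y_test))
print(accuracy)
```

Because naive Bayes treats features independently, correlated descriptors (for example entropy and energy, which both reflect how spread out the GLCM is) are effectively counted twice; a full-covariance Gaussian model would avoid this at the cost of more parameters.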
We improved the robustness and accuracy of the model by performing Gaussian mixture background subtraction and filtering using functions provided in the OpenCV library. We then extracted a set of GLCM features for texture analysis. In Fig. 6a, the vehicle types d1 to d6, in clockwise order, are the full-size SUV, the mid-size SUV, the small-size sedan, the full-size sedan, the full-size RV, and the mid-size RV. Table 1 shows the confusion matrix of the visual image classifier, and the accuracy is summarized in Table 2. The overall accuracy of this classifier is 92.7%.
When we divided vehicles extracted from the daytime visual image dataset into headlight and grill regions, the regions sometimes contained bright blobs caused by white markings on the road. There were also large bright blobs caused by one or more headlights reflecting off the bonnet of a white or light-colored vehicle. There may also be bright blobs caused by reflections of headlights or sunlight on the pavement. In addition, there are many small bright blobs caused by highlights on the vehicles, and there may even be more complicated bright blobs with many components.
In Fig. 7a, the vehicle types are likewise labeled d1 to d6 in clockwise order. Using the GLCM and the ratio of the width of the vehicle front to its height, we classified the vehicles. Table 3 shows the confusion matrix of the thermal image classifier, and the accuracy is summarized in Table 4. The overall accuracy of this classifier is 65.8%.
Nighttime conditions pose many problems for traffic monitoring with low- and medium-mounted video cameras. Above all, the typical daytime surveillance framework, which uses an RGB camera, does not work at night because of the camera's limited contrast and light sensitivity; at night, the most visible features are generally the moving reflections of the headlights.
Although thermal images obtained with the FLIR ONE at night provide more reliable features for vehicle classification than nighttime visual images, we could not reliably distinguish vehicles using them, because their resolution is insufficient and they contain fewer distinctive features. Therefore, it is necessary to identify appropriate types of sensor data and to develop new methods for detecting and classifying vehicles at night.
In additional experiments, we used three vehicle types: sedan, SUV, and truck. Table 5 shows the confusion matrix of the visual image classifier, and the accuracy is summarized in Table 6. The per-class accuracies for sedan, SUV, and truck were 96.3%, 95.4%, and 98.2%, respectively, and the overall accuracy of this classifier was 96.4%. These results show that the proposed method effectively identified valid vehicle ROIs for the different types. The table shows that some sedan and SUV regions were misclassified because of their similar shapes.
Table 7 shows the confusion matrix of the thermal image classifier, and the accuracy is summarized in Table 8. The per-class accuracies for sedan, SUV, and truck were 86.9%, 22.2%, and 28.6%, respectively, and the overall accuracy of this classifier was 70.5%. The table shows that SUV and truck regions were frequently misclassified because of their similar shapes; in particular, SUV regions resembled sedan regions because the thermal image shapes of SUVs were similar to those of sedans.
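To relate the per-class rates to the overall figures, the sketch below computes both from a confusion matrix (rows are true classes, columns are predictions). The 3 × 3 matrix is invented for illustration and is not Table 7. Because overall accuracy is a support-weighted average of per-class accuracy, the reported thermal figures imply, assuming the per-class rates are recalls, that sedans made up at least about 72% of the three-class thermal test set.

```python
import numpy as np


def accuracy_report(confusion):
    """Per-class accuracy (recall) and overall accuracy from a confusion matrix."""
    cm = np.asarray(confusion, dtype=np.float64)
    per_class = np.diag(cm) / cm.sum(axis=1)   # correct / true count, per row
    overall = np.trace(cm) / cm.sum()          # all correct / all samples
    return per_class, overall


# Hypothetical, imbalanced matrix: 80 sedans, 10 SUVs, 10 trucks.
cm = [[70, 6, 4],
      [7, 2, 1],
      [5, 2, 3]]
per_class, overall = accuracy_report(cm)
print(per_class, overall)   # overall is far above the mean of per_class

# Lower bound on the sedan share implied by the reported thermal rates:
# overall <= w * sedan + (1 - w) * max(suv, truck), solved for w.
sedan, suv, truck, overall_reported = 0.869, 0.222, 0.286, 0.705
worst_other = max(suv, truck)
min_sedan_share = (overall_reported - worst_other) / (sedan - worst_other)
print(round(min_sedan_share, 3))
```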
In this paper, we have presented a smart surveillance system to detect and classify vehicles. We collected videos during the day and at night using a FLIR ONE. We selected the front, grill, and headlights of each vehicle as ROIs. For feature extraction, we measured the texture, contrast, homogeneity, entropy, and energy of front-view images, which enabled us to classify six types of vehicle. We increased the classification accuracy by estimating the width-to-height ratio of the headlight and grill regions. In the experiments, when vehicles were classified into six types, the accuracies of the classifiers based on visible light and thermal images were 92.7% and 65.8%, respectively. When vehicles were classified into three types, the accuracies of the classifiers based on visible light and thermal images were 95.9% and 70.5%, respectively. Although thermal images provide more reliable features at night, the thermal image classifier is less accurate than the daytime visual image classifier because the thermal resolution is insufficient and the images contain fewer distinctive features. To classify vehicles more accurately at night, more reliable features and a higher-resolution thermal camera are necessary. In future work, we will improve our method for detecting vehicles at night, extend it to a wider variety of vehicle types, and validate it using a wider range of experimental data. We will also conduct additional experiments to show that the proposed method is robust to weather conditions.
Q Luo, Research on Intelligent Transportation System Technologies and Applications, in: 2008 Workshop on Power Electronics and Intelligent Transportation System (2008), pp. 529–531
FH Somda, H Cormerais, J Buisson, Intelligent transportation systems: A safe, robust and comfortable strategy for longitudinal monitoring. IET Intell. Transp. Syst. 3(2), 188–197 (2009)
M Tubaishat, P Zhuang, Q Qi, et al., Wireless sensor networks in intelligent transportation systems. Wirel Commun Mob Comput 9(3), 287–302 (2009)
DLK Park, Y Park, Video-Based Detection of Street-Parking Violation, vol 1 (Proc. Int. Conf. Image Process., Comput. Vis., Pattern Recognit, Las Vegas, 2007), pp. 152–156
A Ottlik, HH Nagel, Initialization of model-based vehicle tracking in video sequences of inner city intersections. Int. Journal of Computer Vision 80, 211–225 (2008)
B Morris, M Trivedi, in Intelligent Transportation Systems Conference, ITSC’06. Robust classification and tracking of vehicles in traffic video streams (2006), pp. 1078–1083
N Friedman, S Russell, Image Segmentation in Video Sequences: A Probabilistic Approach (Proc. of the Thirteenth Conf. on Uncertainty in Artificial Intelligence (UAI), Providence, 1997), pp. 1–3
C Stauffer, WEL Grimson, Adaptive Background Mixture Models for Real-Time Tracking, vol 2 (Proc. IEEE Comput. Soc. Conf. Computer Vision Pattern Recognition, Fort Collins, 1999), pp. 246–252
Z Zivkovic, Improved Adaptive Gaussian Mixture Model for Background Subtraction, vol 2 (Proceedings of the 17th Int. Conf. on Pattern Recognition, Cambridge, 2004), pp. 23–26
J Wu, X Zhang, A PCA Classifier and its Application in Vehicle Detection, vol 1 (Proc. IEEE Int’l Joint Conf. Neural Networks, Washington, 2001), pp. 600–604
Z Sun, G Bebis, R Miller, Improving the Performance of on-Road Vehicle Detection by Combining Gabor and Wavelet Features (The IEEE 5th International Conference on Intelligent Transportation Systems, Singapore, 2002), pp. 130–135
D Lowe, in Proc. Int. Conf. Comput. Vis. Object recognition from local scale-invariant features (1999), pp. 1150–1157
X Zhang, N Zheng, Y He, F Wang, in Proc. 14th Int. IEEE Conf. ITSC. Vehicle detection using an extended hidden random field model (2011), pp. 1555–1559
H-J Lee, A study on the model recognition of moving vehicles using a neural network. Inst. Electron. Eng. Korea – Signal Processing 42(4), 69–78 (2005)
S-H Bae, J-E Hong, A vehicle detection system robust to environmental changes for preventing crime. J Korea Multimedia Soc 13(7), 983–990 (2010)
M-H Kim, D-H Choi, A vehicle model recognition using car’s headlights features and homogeneity information. J Korea Multimedia Soc 14(10), 1243–1251 (2011)
J-H Kim, D-H Choi, License plate extraction through the searching area reduction and corner templates. Proceedings of Korea Multimedia Society, 31–32 (2010)
J-H Kim, Distortion invariant vehicle license plate extraction and recognition algorithm. J Korea Contents Assoc 11(3), 1–8 (2011)
M Cheon, W Lee, C Yoon, M Park, Vision-based vehicle detection system with consideration of the detecting location. IEEE Trans. Intell. Transp. Syst. 13(3), 1243–1252 (2012)
BF Lin, YM Chan, LC Fu, PY Hsiao, LA Chuang, SS Huang, M-F Lo, Integrating appearance and edge features for sedan vehicle detection in the blind-spot area. IEEE Trans. Intell. Transp. Syst. 13(2), 737–747 (Jun. 2012)
H Bay, A Ess, T Tuytelaars, LV Gool, SURF: Speeded up robust features. Comput. Vis. Image Underst 110(3), 346–359 (2008)
X Wen, Y Zheng, in The 2nd International Conference on Information Science and Engineering (ICISE2010), Hangzhou, China. An improved algorithm based on AdaBoost for vehicle recognition (2010), pp. 4–7
Y Iwasaki, M Misumi, T Nakamiya, Robust vehicle detection under various environmental conditions using an infrared thermal camera and its application to road traffic flow monitoring. Sensors 13(6), 7756–7773 (2013)
FS Leira, TA Johansen, TI Fossen, in 2015 IEEE Aerospace Conference. Automatic detection, classification and tracking of objects in the ocean surface from UAVs using a thermal camera (2015), pp. 1–10
M Teutsch, T Muller, M Huber, J Beyerer, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Low resolution person detection with a moving thermal infrared camera by hot spot classification (2014), pp. 209–216
X Zhao, Z He, S Zhang, D Liang, Robust pedestrian detection in thermal infrared imagery using a shape distribution histogram feature and modified sparse representation classification. Pattern Recogn. 48(6), 1947–1960 (2015)
Open Source Computer Vision Library (2017). https://sourceforge.net/projects/opencvlibrary/
JR Parker, Algorithms for Image Processing and Computer Vision (John Wiley and Sons, New York, 1996)
I Pitas, Digital Image Processing Algorithms and Applications (Wiley-Interscience, New York, 2000)
T Ojala, M Pietikäinen, D Harwood, A comparative study of texture measures with classification based on feature distributions. Pattern Recogn. 29, 51–59 (1996)
T Ojala, M Pietikäinen, T Mäenpää, Multiresolution gray scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7) (in press)
Z Guo, L Zhang, D Zhang, Rotation invariant texture classification using LBP variance (LBPV) with global matching. Pattern Recogn. 43(3), 706–719 (2010)
FLIR ONE, https://www.flirkorea.com
R Taktak, M Dufaut, R Husson, in Proceedings of ICIP-94, IEEE International Conference on Image Processing, vol 2. Vehicle detection at night using image processing and pattern recognition (1994), pp. 296–300
R. Cucchiara, M. Piccardi, and P. Mello. sis and rule-based reasoning for a traffic monitoring system. Intelligent Transportation Systems, IEEE Transactions on, 1(2):119–130, June 2000.
The authors would like to thank Chungman Kim and Chanki Moon for their valuable contributions to this project.
This work was supported by the Soonchunhyang University Research Fund and also supported by the “ICT Convergence Smart Rehabilitation Industrial Education Program” through the Ministry of Trade, Industry & Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT).
Availability of data and materials
The data are available from the authors on request.
The authors declare that they have no competing interests.
Cite this article
Nam, Y., Nam, YC. Vehicle classification based on images from visible light and thermal cameras. J Image Video Proc. 2018, 5 (2018). https://doi.org/10.1186/s13640-018-0245-2