A study on implementation of real-time intelligent video surveillance system based on embedded module

With conventional surveillance systems for preventing accidents and incidents, a single operator monitoring multiple closed-circuit television (CCTV) feeds misses 95% of events after 22 min. To address this issue, computer-based intelligent video surveillance systems that notify users when abnormal situations occur have been studied; however, they are not commonly used in real environments because of the risk of personal information leaks and their high power consumption. Intelligent video surveillance systems based on small devices have therefore been studied. This paper proposes and implements an intelligent video surveillance system based on embedded modules that performs intruder detection based on learned information, fire detection based on color and motion information, and loitering and fall detection based on human body motion. In addition, algorithm and embedded module optimization methods are applied for real-time processing. The implemented algorithms achieved detection performance of 88.51% for intruder detection, 92.63% for fire detection, 80% for loitering detection and 93.54% for fall detection. A comparison of algorithm processing time before and after optimization showed a 50.53% decrease, indicating that the intelligent video surveillance system can run in real time on embedded modules.

sources. Section 4 describes the implementation of the proposed intelligent video surveillance system, Section 5 presents the experimental results for the implemented system, and Section 6 concludes the paper.

Related works
An abnormal situation is defined as a situation that deviates from normal behavior and the normal environment from personal, statistical, socio-cultural and professional standpoints. Abnormal situations are classified into intrusion, arson, loitering, fall and assault. Various algorithms exist for detecting abnormal situations depending on the type of detection, and existing intelligent surveillance systems run on either computers or small devices.

Abnormal situation detection algorithm
The earliest face recognition method for detecting intruders uses principal component analysis (PCA): faces are recognized using eigenfaces obtained by applying PCA to face images [6]. Other methods include using face images reconstructed from discrete cosine transform (DCT) coefficients to improve PCA performance [7]; combining PCA with linear discriminant analysis (LDA) [8]; and applying PCA and LDA to face images captured at distances of 1 to 5 m [9]. Another method extracts features with local binary patterns (LBP) and classifies them with convolutional neural networks (CNN) [10]. Finally, a deep learning method recognizes faces with restricted Boltzmann machines (RBM) using images with facial expression changes, lighting changes and varying angles [11]. Fire detection methods are divided into sensor-based and image-based methods. Sensor-based methods include detecting fires by using the values of temperature, smoke and CO2 sensors as parameters of fuzzy logic [12], and using a device that combines eight sensors, including AMS MOX sensors, PID sensors and NDIR CO2 sensors [13].
Color-based fire detection methods include detecting fire colors in the RGB color space and then using additional information such as fire spreading [14]; calculating the standard deviations of colors in various fire environments [15]; and using HSI color space features together with optical flow [16]. Another method detects fires with a support vector machine (SVM) using HSI color space features, the 2-dimensional discrete wavelet transform (DWT), pixel ratios and optical flow [17].
Loitering identification methods include measuring the time an object stays in the image [1,18,19], and dividing the input image into n × m blocks and measuring the time spent in each block [20,20]. Another method exploits the fact that a loitering person changes direction more often than other pedestrians and measures object angles [22]. Methods that use two conditions to determine loitering include measuring both the object time and the block time [23], measuring the object time and using angles [24], and using angles and the object movement distance [25]. Fall detection methods are divided into sensor-based and image-based methods. Sensor-based methods include attaching 3-axis accelerometers to the body and using the sensor and acceleration values [26], and using accelerometers together with pressure sensors [27]. Another method uses the accelerometer of a mobile phone: normal acceleration patterns are stored as activities of daily living (ADL), and real-time data are compared with the stored ADL using the nearest neighbor rule (NNR) [28].
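As a rough illustration of the block-based approach in [20], the Python sketch below (a simplification under stated assumptions, not the cited authors' code) splits the frame into n rows by m columns, adds 1/fps seconds to whichever block contains the object's center in each frame, and flags blocks whose dwell time reaches a threshold. The function names, the row/column reading of "n × m" and the threshold value are hypothetical.

```python
from collections import defaultdict


def block_dwell_times(track, frame_w, frame_h, n, m, fps):
    """Seconds an object's center spends in each block of an n x m grid.

    track: list of (x, y) object centers, one per frame.
    n rows and m columns is an assumed reading of "n by m blocks".
    Returns {(row, col): seconds}.
    """
    dwell = defaultdict(float)
    block_w, block_h = frame_w / m, frame_h / n
    for x, y in track:
        # Clamp so points on the right/bottom border fall in the last block.
        col = min(int(x // block_w), m - 1)
        row = min(int(y // block_h), n - 1)
        dwell[(row, col)] += 1.0 / fps
    return dict(dwell)


def loitering_blocks(dwell, threshold_s):
    """Blocks where the object stayed at least threshold_s seconds (hypothetical rule)."""
    return [block for block, secs in dwell.items() if secs >= threshold_s]
```

For example, a center that stays near (10, 10) for 30 frames at 10 fps in a 320 × 240 frame split 2 × 2 accumulates about 3 s in block (0, 0).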
Image-based methods include setting up a circle that changes with object motion and using the circle changes and vertical and horizontal histogram features to determine falls with an SVM [29]; setting up a bounding box around the detected object and using changes in the acceleration of the bounding box [30]; and using the aspect ratio, effective area ratio, feature points, axis angle and contour ratio of the object to determine falls with an SVM [31].

Intelligent video surveillance system
An intelligent video surveillance system, which issues an alert when an abnormal situation occurs in video, runs either on computers or on small devices. Computer-based intelligent video surveillance systems, which perform various types of detection, have been widely studied. However, they incur installation and maintenance costs, consume considerable power and risk leaking personal information, and are thus not ideal for use in real environments.
One example of a computer-based intelligent video surveillance system is an object tracking system that uses multiple cameras. Videos captured by digital signal processor (DSP) based IP cameras are encoded with the audio video coding standard (AVS) and sent over the IP network using the real-time streaming protocol (RTSP), and GPUs perform distributed processing to reduce processing time [32]. Another example is a fall detection system based on body contours. In this system, a computer detects falls in videos captured by cameras in real time and sends the information over the network to a hospital server, where it is stored and monitored. The system uses a Gaussian mixture model (GMM) to detect objects in the input images and determines falls from the aspect ratio and tilt angle of the human body contour, which require little computation [33]. In computer-based intelligent video surveillance systems, the video from each camera is sent to a central server, which combines various types of detection, for example, intruder, fire, loitering and fall detection.
One example of an intelligent surveillance system based on small devices is an intruder detection system using a Raspberry Pi and an Arduino. The system detects motion in input images with MOG2 and identifies human bodies by their size. It detects faces with Haar-like features and identifies intruders with Fisherface-based face recognition. When an intruder is identified, the user is alerted by e-mail and can view the video remotely through a web interface [34].
Another intruder detection system uses a Raspberry Pi. When motion is sensed in the input video, the system stores the video on a cloud server for later examination. It detects objects with difference images and uses Haar-like features to determine whether an object is a human body. When the object is identified as a human body, the system uses a GSM module to send a message notifying the user of the abnormal situation [35]. Finally, an unmanned aerial vehicle (UAV) fire detection system uses a single QuaRC-based Gumstix. This system determines fires from fire color information, obtained in the Lab color space, and motion information, obtained with optical flow. It uses sliding mode control (SMC) and a linear quadratic regulator (LQR) to reduce calculation time and prevent chattering [36].

Optimization
Several studies have reduced processing time with various optimization methods on small devices whose specifications are lower than those of a computer. First, one study reduced processing time in low-specification mobile environments by changing the front-end browser loading method. Data are classified by type, such as text and images; the relatively lightweight text layout is displayed first to improve perceived speed, and rendering is reconstructed after the screen is displayed. In addition, when rendering takes a long time, image size or quality is reduced to shorten processing time [37].
T.W. Lee conducted research to reduce the boot time of embedded modules and speed up application processing. The techniques included software suspend, which relies on saving the memory state and registers when a program changes from the running state to the suspended state; a lightweight root file system with improved decompression efficiency; and the JFFS2 file system, which offers high compression efficiency and low process usage. The embedded module used in the experiment was the XP-100 model manufactured by Huins [38].
J.W. Kang analyzed the structural features and speed-reducing factors of the mobile enterprise application platform (MEAP) and then reduced the processing time of mobile applications using front-end optimization techniques. An Expires or Cache-Control header is added so that resources are requested from a local cache rather than from the server; components are compressed with gzip to save server resources; and HTTP requests are minimized by merging separate scripts [39].
Finally, C. C. Paglinawan performed optimization on a Raspberry Pi for the real-time operation of a vehicle speed calculation system, shortening the vehicle detection time by implementing a GMM for vehicle detection and a Kalman filter (KF) for vehicle tracking in OpenCV. In addition, sparse random projection (SRP) with scikit-learn reduced processing time by compressing images, projecting high-dimensional video frames into low-dimensional subspaces [40].
Intelligent video surveillance systems based on small devices use camera video to perform a single type of detection, for example, intruder detection, fire detection or motion detection. Further study is needed on small-device-based intelligent video surveillance systems that perform integrated detection, in order to avoid the personal information leaks and high power consumption of computer-based systems and to use such systems efficiently in real environments. In addition, an optimization process is required for intelligent video surveillance systems to operate in real time on low-specification devices.

Proposed method for intelligent video surveillance system
Figure 1 shows a flowchart of the proposed intelligent video surveillance system. The system detects moving objects in the video captured by the embedded module using adaptive difference images, and applies erosion to improve accuracy. When no object is detected for a given time, the background image is updated; when an object is detected, its size information is used to determine whether it is a human body. If it is not a human body, color information and motion information are used to detect fires. If it is determined to be a human body, the system performs intruder detection using learned information, loitering detection using changes in the balance point angle and movement distance, and fall detection using human body motion information and acceleration information. When an abnormal situation is identified, the system uses ssmtp and the mock library to send the images and thus notify the relevant user of the abnormal situation.
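The object detection front end of Fig. 1 can be sketched as follows. This is a minimal pure-Python illustration on small grayscale (or depth) frames stored as lists of rows; a real implementation would more likely use OpenCV calls such as cv2.absdiff, cv2.threshold and cv2.erode. The thresholds, the 3x3 erosion kernel, the idle-frame count and the area-based human test are hypothetical choices, and "adaptive" is interpreted here as replacing the background after a quiet period, as the text describes.

```python
# Hypothetical parameters; the paper does not report its values.
DIFF_TH = 30          # |frame - background| threshold per pixel
IDLE_FRAMES = 50      # consecutive empty frames before the background is refreshed
MIN_HUMAN_AREA = 400  # foreground pixels (after erosion) treated as a person


def difference_mask(frame, background):
    """Binary mask of pixels that differ from the background by more than DIFF_TH."""
    return [[1 if abs(f - b) > DIFF_TH else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def erode(mask):
    """3x3 binary erosion: keep a pixel only if its whole neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


class ObjectDetector:
    """Difference-image detector whose background is refreshed after idle periods."""

    def __init__(self, first_frame):
        self.background = [row[:] for row in first_frame]
        self.idle = 0

    def step(self, frame):
        """Classify one frame as 'none', 'human' or 'non-human'."""
        mask = erode(difference_mask(frame, self.background))
        area = sum(sum(row) for row in mask)
        if area == 0:
            self.idle += 1
            if self.idle >= IDLE_FRAMES:
                # No object for a while: adopt the current frame as the background.
                self.background = [row[:] for row in frame]
                self.idle = 0
            return "none"
        self.idle = 0
        # 'non-human' objects go to fire detection; 'human' to the person-based detectors.
        return "human" if area >= MIN_HUMAN_AREA else "non-human"
```

Erosion removes isolated noise pixels before the size test, so a small flickering patch shrinks sharply while a person-sized blob keeps most of its area.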

Intruder detection
As shown in Fig. 1, the system detects faces with Haar-like features to detect intruders. When a face is detected, the system uses deep learning to recognize the face based on learned information. Deep learning is a type of machine learning that recognizes images by combining multiple non-linear transformations, and frameworks for it include TensorFlow, Caffe and Torch.
TensorFlow is an open-source library developed by Google and widely used worldwide. Because TensorFlow represents data flow and parameter changes as graphs, connections and data flows are easy to follow. This paper uses TensorFlow and AlexNet for face recognition. AlexNet, originally designed for GPU training, consists of 5 convolution layers, 3 max pooling layers and 3 fully connected layers. The convolution layers extract features and consist of filters and an activation function that makes the filter outputs non-linear. Max pooling reduces the size of the data produced by the convolution filters. The fully connected layers classify the data using the extracted features (Fig. 2).
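To make the layer structure concrete, the sketch below traces spatial sizes through a standard AlexNet-style stack of 5 convolution and 3 max pooling layers, followed by 3 fully connected layers. It assumes the commonly cited 227 × 227 × 3 input and the original filter counts; the paper does not state the input size or filter counts used for face recognition, so these numbers are illustrative only.

```python
"""Trace feature-map sizes through an AlexNet-style network (illustrative)."""


def out_size(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding, output channels); None = pooling keeps depth.
FEATURE_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]
FC_UNITS = (4096, 4096)  # hidden fully connected layers fc6 and fc7


def trace_shapes(size: int = 227, channels: int = 3):
    """Return (name, height, width, channels) after each feature layer."""
    shapes = []
    for name, kernel, stride, pad, out_ch in FEATURE_LAYERS:
        size = out_size(size, kernel, stride, pad)
        if out_ch is not None:
            channels = out_ch
        shapes.append((name, size, size, channels))
    return shapes


def flattened_size(size: int = 227) -> int:
    """Number of inputs to the first fully connected layer."""
    _, h, w, c = trace_shapes(size)[-1]
    return h * w * c


def classifier_dims(num_identities: int, size: int = 227):
    """(in, out) pairs for fc6, fc7 and the output layer fc8."""
    dims = [flattened_size(size), *FC_UNITS, num_identities]
    return list(zip(dims[:-1], dims[1:]))


if __name__ == "__main__":
    for row in trace_shapes():
        print(row)
    print(classifier_dims(num_identities=10))
```

With these defaults the last feature map is 6 × 6 × 256, so fc6 receives 9,216 inputs; the output layer has one unit per enrolled identity.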

Fire detection
In this paper, as shown in the flow of Fig. 1, fires are detected by combining fire color information in the HSV color space with fire motion information obtained from optical flow [16]. Although RGB values range from 0 to 255, non-uniform surface lighting can make the same color appear with different brightness or saturation across a surface. The image is therefore converted to the HSV color space, which consists of hue, saturation and value (brightness), for color-based detection [41].
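A minimal sketch of the HSV color test, using Python's standard colorsys module (which returns hue, saturation and value in [0, 1]). The thresholds, including the stand-in for TH_H, are hypothetical values chosen for illustration, not the paper's; red hues that wrap around past 360 degrees are ignored in this simplification.

```python
import colorsys

# Hypothetical thresholds; the paper's TH_H and other values are not given here.
TH_H_DEG = 60.0  # hue upper bound in degrees (red through yellow)
TH_S = 0.4       # minimum saturation
TH_V = 0.5       # minimum value (brightness)


def is_fire_color(r: int, g: int, b: int) -> bool:
    """True if an 8-bit RGB pixel falls in an assumed fire-colored HSV range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    return hue_deg <= TH_H_DEG and s >= TH_S and v >= TH_V


def fire_color_mask(image):
    """image: list of rows of (r, g, b) tuples; returns a same-shaped boolean mask."""
    return [[is_fire_color(*px) for px in row] for row in image]
```

An orange pixel such as (255, 128, 0) has a hue of about 30 degrees and full saturation and value, so it passes; white fails on saturation and dark red-brown fails on value, which is the kind of lighting robustness the HSV conversion is meant to provide.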
Optical flow estimates the motion of objects in an image. The motion of brightness patterns in an image is described by velocity components u and v, and regions moving at the same velocity are treated as one object for detection [42]. To detect fire motion, feature points are detected with the Shi-Tomasi method and tracked repeatedly with Lucas-Kanade optical flow. The motion speeds u and v of the brightness patterns are used in Eqs. (1) to (3) to determine fires, and Eq. (4) combines fire color information in the HSV color space with this fire motion information. In Eq. (4), TH_H is a threshold on H, the hue component.
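The core of the Lucas-Kanade step is a 2 × 2 least-squares solve of the brightness-constancy equation Ix·u + Iy·v + It = 0 over a small window around each feature point. The Python sketch below shows that solve and a hypothetical motion rule; it is not a reconstruction of Eqs. (1) to (4). In OpenCV, Shi-Tomasi corners and pyramidal Lucas-Kanade tracking are available as cv2.goodFeaturesToTrack and cv2.calcOpticalFlowPyrLK. The threshold TH_MOTION, the moving-point ratio and the color-ratio rule in is_fire are assumptions.

```python
import math

TH_MOTION = 0.5  # hypothetical minimum speed (pixels/frame) for a "moving" point


def lucas_kanade_window(samples):
    """Estimate (u, v) for one window from (Ix, Iy, It) derivative samples.

    Solves [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] [u, v] = -[sum IxIt, sum IyIt].
    Returns None when the system is singular (aperture problem).
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for ix, iy, it in samples:
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
        sxt += ix * it
        syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def is_fire_motion(flows, min_moving_ratio=0.3):
    """flows: (u, v) per tracked point; True if enough points move (flicker)."""
    if not flows:
        return False
    moving = sum(1 for u, v in flows if math.hypot(u, v) >= TH_MOTION)
    return moving / len(flows) >= min_moving_ratio


def is_fire(color_mask, flows, min_color_ratio=0.01):
    """Hypothetical combination: enough fire-colored pixels AND fire-like motion."""
    total = sum(len(row) for row in color_mask)
    fire_px = sum(sum(row) for row in color_mask)
    return total > 0 and fire_px / total >= min_color_ratio and is_fire_motion(flows)
```

Requiring both cues mirrors the idea that a static orange object (color only) or a moving non-fire object (motion only) should not trigger an alarm.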

Loitering detection
Loitering is defined as moving from place to place within a specific space without a fixed plan [3], and before most crimes the offender loiters around the area surrounding the target location. Crimes can therefore be prevented beforehand by detecting and closely watching loitering people [4]. In this paper, the following two conditions, based on the walking patterns of abnormal pedestrians described above, are applied to detect loitering people [43].
Condition 1: Movement distance of the object's balance point.
Condition 2: Change in the angle of the object's balance point.
Figure 3 shows the method for measuring the movement distance of the object. The distance moved by the object from t−1 to t, L(t), and the previous distance L(t−1) are calculated from the coordinate changes of the object's balance point. To reduce false detections caused by noise, the balance point trajectory is normalized by using the current center of gravity in each predetermined interval. Equation 5 is used to calculate the distance L(t) moved from t−1 to t, where X and Y are the horizontal and vertical changes of the balance point, and H is the object height, which corrects for differences in measured movement distance depending on the input video and the distance of the object from the camera.
Figure 4 shows the method for measuring changes in the object's angle. The angle created when the object moves from t−2 to t−1 is θ(t−1), and the angle created when it moves from t−1 to t is θ(t). The angle is calculated from the X and Y changes of the object's balance point using Eq. 6, and the angle change is the difference between θ(t−1) and θ(t).
Table 2 lists the angle change ranges and the corresponding weights. Angle changes to the left and right of the object's current direction of travel are grouped into five levels, and a weight is assigned to each group: small angle changes receive small weights and large angle changes receive large weights, which makes the detection sensitive to directional changes of the object.
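The two loitering conditions can be sketched in Python as follows. Because Eqs. 5 and 6 and Table 2 are not reproduced here, the sketch assumes that Eq. 5 is the Euclidean displacement of the balance point divided by the object height H and that Eq. 6 is the heading atan2(Y, X); the five angle levels, their weights and both decision thresholds are hypothetical placeholders.

```python
import math

# Hypothetical stand-ins for Table 2 and the decision thresholds.
# (upper bound of |angle change| in degrees, weight): small turns weigh little.
ANGLE_LEVELS = [(15.0, 0), (45.0, 1), (90.0, 2), (135.0, 3), (180.0, 4)]
DIST_THRESHOLD = 20.0  # accumulated distance, in units of object height
WEIGHT_THRESHOLD = 25  # accumulated angle-change weight


def normalized_distance(p0, p1, height):
    """Assumed Eq. 5: balance-point displacement divided by object height H."""
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / height


def heading_deg(p0, p1):
    """Assumed Eq. 6: direction of travel from the X and Y changes."""
    return math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0]))


def angle_change(theta_prev, theta_cur):
    """Smallest absolute difference between two headings, in [0, 180]."""
    d = abs(theta_cur - theta_prev) % 360.0
    return 360.0 - d if d > 180.0 else d


def angle_weight(delta):
    """Map an angle change to the weight of its level (5 levels)."""
    for upper, weight in ANGLE_LEVELS:
        if delta <= upper:
            return weight
    return ANGLE_LEVELS[-1][1]


def is_loitering(track, heights):
    """track: normalized balance points per interval; heights: H at each point."""
    total_dist, total_weight = 0.0, 0
    for t in range(1, len(track)):
        total_dist += normalized_distance(track[t - 1], track[t], heights[t])
        if t >= 2:
            delta = angle_change(heading_deg(track[t - 2], track[t - 1]),
                                 heading_deg(track[t - 1], track[t]))
            total_weight += angle_weight(delta)
    # Condition 1 (distance) and Condition 2 (direction changes) must both hold.
    return total_dist >= DIST_THRESHOLD and total_weight >= WEIGHT_THRESHOLD
```

With these placeholders, a person walking straight through the scene accumulates distance but no angle weight and is not flagged, while a person pacing back and forth satisfies both conditions.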

Fall detection
As shown in the flow of Fig. 1, falls are detected using the motion information and the acceleration information of the object [5]. The reference points used to calculate the motion and acceleration information are Ymin, Ymax, Xmin and Xmax, the minimum and maximum Y coordinates and the minimum and maximum X coordinates of the object, as shown in Fig. 5. Figure 6 shows the coordinate changes when the object changes from the normal state to a fall state. Figure 6a shows a backward fall, in which Ymin and Ymax increase. Figure 6b shows a forward fall, in which Ymin increases. Figure 6c shows a left fall, in which Ymin increases while Xmin and Xmax decrease. Figure 6d shows a right fall, in which Ymin, Xmin and Xmax increase. In every fall state Ymin increases, and in left and right falls Xmin and Xmax clearly decrease or increase, respectively. These features are used to detect falls based on the motion information obtained by Eq. 7. Equation 8 calculates the acceleration information of the object, which is used as an additional condition for determining a fall; in Eq. 8, t is time and V(t) is the acceleration at t. Equation 9 determines a fall by combining the motion information and the acceleration information of the object.
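A sketch of the fall rule under stated assumptions: bounding boxes are (Xmin, Ymin, Xmax, Ymax) in image coordinates with Y increasing downward, the direction tests follow the coordinate patterns described for Fig. 6, and acceleration is approximated as the second difference of Ymin over three frames. Eqs. 7 to 9 are not reproduced, and every threshold here is a hypothetical value.

```python
# Hypothetical thresholds (pixels, per-frame units); Eqs. 7-9 are not reproduced.
TH_YMIN = 20  # minimum rise of Ymin between consecutive frames
TH_YMAX = 5   # minimum rise of Ymax that marks a backward fall
TH_X = 10     # minimum shift of both Xmin and Xmax for a sideways fall
TH_ACC = 5.0  # minimum acceleration of Ymin (pixels/frame^2)


def fall_direction(prev_box, cur_box):
    """Boxes are (xmin, ymin, xmax, ymax); y grows downward in image coordinates."""
    dxmin = cur_box[0] - prev_box[0]
    dymin = cur_box[1] - prev_box[1]
    dxmax = cur_box[2] - prev_box[2]
    dymax = cur_box[3] - prev_box[3]
    if dymin < TH_YMIN:
        return None          # every fall pattern in Fig. 6 raises Ymin
    if dxmin <= -TH_X and dxmax <= -TH_X:
        return "left"        # Fig. 6c: Xmin and Xmax decrease
    if dxmin >= TH_X and dxmax >= TH_X:
        return "right"       # Fig. 6d: Xmin and Xmax increase
    if dymax >= TH_YMAX:
        return "backward"    # Fig. 6a: Ymin and Ymax increase
    return "forward"         # Fig. 6b: only Ymin increases


def ymin_acceleration(ymins):
    """Second difference of the last three Ymin values (unit frame interval assumed)."""
    if len(ymins) < 3:
        return 0.0
    v_prev = ymins[-2] - ymins[-3]
    v_cur = ymins[-1] - ymins[-2]
    return float(v_cur - v_prev)


def is_fall(boxes):
    """Combine the motion (direction) and acceleration conditions over recent boxes."""
    if len(boxes) < 3:
        return False
    direction = fall_direction(boxes[-2], boxes[-1])
    acc = ymin_acceleration([box[1] for box in boxes])
    return direction is not None and acc >= TH_ACC
```

The acceleration condition separates a sudden fall from slow movements such as sitting or lying down, which also raise Ymin but gradually.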

Embedded module implementation and optimization
Intelligent video surveillance systems are classified into computer-based systems and systems based on small devices. Although computer-based systems are actively studied, their high power consumption and risk of personal information leaks make them less suitable for actual deployment. Furthermore, current small-device-based systems perform only a single type of detection, for example, motion detection or person detection. The system implemented in this paper therefore performs intruder, fire, loitering and fall detection on small devices, taking advantage of their low cost, light weight and low power consumption. Five algorithm optimization methods and three embedded module optimization methods are applied to reduce the processing time of the implemented system.

Embedded module implementation
An embedded module is a system built around a microprocessor, the "brain" of a machine or device, to perform specific tasks. Embedded modules offer low cost, light weight and low power consumption. Available open-source hardware platforms include Raspberry Pi, Orange Pi, BeagleBoard and Arduino; Raspberry Pi is the most widely used in many fields because of its versatility and performance. Raspberry Pi is a credit-card-sized single-board computer, and Raspberry Pi 3 is well suited as the small device for a real-time intelligent video surveillance system because, compared with Raspberry Pi 2, its processor was upgraded from 32 to 64 bits and its clock speed from 0.9 GHz to 1.2 GHz [44].
When objects are detected in video from ordinary RGB cameras, detection can fail or produce false detections in low-light conditions and at night, as shown in Fig. 7. To address this issue, a Kinect depth camera is used as the video input device of the embedded module (Fig. 8).
Libfreenect is used so that a Kinect can serve as the video input device of the embedded module. Libfreenect is a library developed by OpenKinect for using Kinect in the Linux environment. Five required libraries, including python-dev and ipython for Python and OpenCV, and 11 related libraries are installed to use Libfreenect. The TensorFlow environment for face-recognition-based intruder detection on the embedded module is built by installing 5 required libraries, including pip and linux-armv7l. Fifteen libraries, including matplotlib and pkg-config, are installed for fire, loitering and fall detection. Load balancing, swapping, overclocking and Cython must be configured to optimize the embedded module and the algorithms: cgroup-bin is used for load balancing, the swappiness setting for swapping, the arm_freq setting in config.txt for overclocking, and setuptools for building Cython code. Finally, ssmtp for sending alerts when an abnormal situation occurs and the mock library for sending videos are installed.

Algorithm optimization
To detect abnormal situations in real time on an embedded module, which has lower performance than a computer, both the algorithms and the embedded module must be optimized. In this paper, 5 algorithm optimization methods and 3 embedded module optimization methods are applied to enhance the processing performance of the embedded module and reduce algorithm processing time, as shown in Fig. 9.
The first algorithm optimization method is to use positive integers. The program's basic operations are integer operations, so declaring values as a different variable type, converting them during the algorithm's calculations and then converting them back to the original type adds unnecessary processing time. Moreover, because the process is faster than the …
Second, division and remainder operators are minimized. Because a standard 32-bit division takes 20 to 140 execution cycles, it is slower than other operators. Therefore, a shift operation, one of the fastest operators, is used for operations with powers of 2, and multiplication is used instead of division otherwise.
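The integer-only and shift-instead-of-division ideas can be illustrated with a fixed-point RGB-to-gray conversion, shown here in Python. The luma weights 0.299, 0.587 and 0.114 are scaled by 256 and rounded to 77, 150 and 29 (which sum to 256), so the final division becomes a right shift by 8. Python integers are arbitrary precision, so this demonstrates equivalence rather than a guaranteed speed-up; the gain the text describes applies to compiled code such as C, or Cython with C integer types. Keeping operands non-negative also matters in C, where a right shift of a negative signed value is implementation-defined while division truncates toward zero, so the two are not interchangeable for negative numbers.

```python
def gray_float(r: int, g: int, b: int) -> int:
    """Reference conversion with floating-point luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def gray_fixed(r: int, g: int, b: int) -> int:
    """Integer-only version: weights scaled by 256, division replaced by >> 8.

    Adding 128 before the shift rounds to the nearest integer.
    Inputs are assumed to be non-negative 8-bit channel values.
    """
    return (77 * r + 150 * g + 29 * b + 128) >> 8


def halve(x: int) -> int:
    """Divide a non-negative integer by 2 with a shift instead of '/'."""
    return x >> 1
```

Across 8-bit inputs the fixed-point result stays within one gray level of the floating-point result, which is negligible for thresholding and difference images.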
Third, direct calculation with global variables is minimized. Because global variables cannot be allocated to registers, they must be accessed indirectly through pointers or function calls. These pointer dereferences and function calls add memory accesses that increase processing time, and the compiler must reload a global variable every time it is used directly. Therefore, global variables are avoided where possible; when they are needed, their values are first copied into local variables, and the calculations use the local variables.
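The same principle carries over to Python code, although the mechanism differs: CPython resolves local names by index (LOAD_FAST) but must look up globals and module attributes in dictionaries (LOAD_GLOBAL, LOAD_ATTR). The sketch below copies two hypothetical global parameters and math.hypot into locals before the loop; the results are identical, and the size of any speed-up depends on the Python version.

```python
import math

# Hypothetical global parameters read inside a hot loop.
SCALE = 0.5
OFFSET = 3.0


def norms_global(points):
    """Looks up SCALE, OFFSET and math.hypot on every iteration."""
    out = []
    for x, y in points:
        out.append(math.hypot(x, y) * SCALE + OFFSET)
    return out


def norms_local(points):
    """Copies the globals into locals once, then computes with the locals."""
    scale, offset, hypot = SCALE, OFFSET, math.hypot
    out = []
    append = out.append
    for x, y in points:
        append(hypot(x, y) * scale + offset)
    return out
```

The standard timeit module can be used to compare the two functions on a long list of points on the target device.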
Fourth, the number of arguments passed in function calls is minimized. When a function has more than four arguments, the arguments beyond the fourth are passed on the stack, which requires additional memory accesses. Moreover, when a structure is passed as an argument, all of its values are copied through the stack. Therefore, when a function has four or more arguments, a structure is declared and a pointer to the structure is passed instead.
Last, the Cython language is used. Cython combines the productivity of Python with the execution speed of C and is a gradually typed language [45]. Figure 10 shows the processing time of the fire detection algorithm with and without algorithm optimization; the vertical axis shows processing time and the horizontal axis the n-th run. The average processing time is 0.32 s before optimization and 0.24 s after optimization, a 25% reduction.

Embedded module optimization
The first embedded module optimization method is to remove bottlenecks. A bottleneck occurs when the performance or capacity of a system is limited by a single component; if all the load falls on one CPU core, processing slows down. Processing speed is also limited by the CPU clock frequency. Therefore, load balancing is applied to keep the load from concentrating on one CPU core, and overclocking is applied to raise the clock frequency; arm_freq was set to 1300 (MHz) in the experiment. Figure 11 shows CPU usage while running the algorithms; CPU usage increases after the bottleneck is removed.
Second, memory is used for calculation. Algorithms can run with disk-based or memory-based calculation, and because memory access is much faster than disk access, swapping is configured to favor memory-based calculation: a lower swappiness value makes the Linux kernel less inclined to move memory pages to swap space on disk. The experiment was conducted with swappiness set to 20. Figure 12 shows memory usage while running the algorithms; memory usage increases after the swapping setting is applied.
Last, zRam is used. zRam compresses data in memory and moves it to the zRam area when memory usage exceeds a specified level [46]. Although this data movement can lower speed on a computer with sufficient memory, on an embedded module with insufficient memory it mitigates the slowdown caused by a lack of memory (Table 3). Figure 13 compares processing time with and without embedded module optimization: the average processing time is 0.25 s without optimization and 0.24 s with it, a 4% decrease.

Experimental results and discussion
The proposed intelligent video surveillance system based on embedded modules was implemented on a Raspberry Pi 3, with a Kinect v1 depth camera as the video input device. Table 4 shows the performance of the abnormal situation detection algorithms; the detection performance is the same on the embedded module and on a PC. For intruder detection, training used face images rotated at each distance, both original and with changed lighting, as shown in Fig. 14.
Intruder detection performance was evaluated through real-time face recognition experiments with the intelligent video surveillance system. Fire detection performance was evaluated with the NIST fire database, which consists of videos of a burning sofa and dry wood in an indoor residential environment. Loitering detection performance was evaluated with the PETS database and a self-created database. The PETS database contains videos of indoor and outdoor environments with at least one person, some of whom are loitering; the self-created database shows one person loitering in indoor and outdoor environments. Fall detection performance was evaluated while running the system in real time (Figs. 15, 16). Figures 17, 18, 19 and 20 show the detection results when intruders, fires, loitering and falls occur, and the abnormal situations being sent remotely to notify the relevant user. Table 5 compares performance before and after applying the 5 algorithm optimization methods and 3 embedded module optimization methods of this paper. The integrated processing time of the abnormal situation detection algorithms was 0.95 s before optimization and 0.47 s after optimization, a decrease of 50.53%. When each algorithm was run individually, the processing time was 0.58 s before and 0.55 s after optimization for intruder detection (a 5.17% decrease), 0.32 s and 0.24 s for fire detection (a 25% decrease), and 0.21 s and 0.15 s for loitering detection (a 28.57% decrease).
The processing time of the fall detection algorithm was 0.2 s before optimization and 0.15 s after optimization, a 25% decrease. These results confirm that the embedded-module-based intelligent video surveillance system can operate in real time and perform multiple types of detection. Table 6 compares the processing time of the embedded-module-based system and a PC running the same abnormal situation detection algorithms. Image input/output on the PC was 6.46 times faster than on the embedded module before optimization and 4.12 times faster after optimization; algorithm execution on the PC was 17.9 times faster before optimization and 8.86 times faster after optimization. Table 7 compares the specifications of the embedded module and the PC used in the experiment. The PC performed about 12 times better because of its CPU, number of cores and clock speed. However, the embedded module consumes 44 times less power, is 230 times lighter and occupies 365 times less volume, so it can be used without environmental restrictions. Table 8 compares processing times before and after optimization for small-device systems. Optimization using algorithm and embedded module optimization methods reduced processing time by more than 25%. With the optimization methods proposed in this paper, the intelligent video surveillance system reduced processing time by 50.53%, although an exact comparison is not possible because the devices differ.
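The reported reductions are relative decreases, (before - after) / before × 100. A short Python check recomputes them from the timings given in the text (Table 5 and Fig. 13).

```python
# Processing times in seconds (before, after optimization) as reported in the text.
TIMINGS = {
    "integrated": (0.95, 0.47),
    "intruder": (0.58, 0.55),
    "fire": (0.32, 0.24),
    "loitering": (0.21, 0.15),
    "fall": (0.20, 0.15),
    "embedded_only": (0.25, 0.24),  # Fig. 13, embedded module optimization alone
}


def reduction_pct(before: float, after: float) -> float:
    """Relative decrease in processing time, in percent."""
    return (before - after) / before * 100.0


for name, (before, after) in TIMINGS.items():
    print(f"{name:>13}: {reduction_pct(before, after):.2f}%")
```

Running it prints 50.53, 5.17, 25.00, 28.57, 25.00 and 4.00 percent, matching the figures reported above.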

Conclusions
This paper proposes an intelligent video surveillance system based on embedded modules and applies 5 algorithm optimization methods and 3 embedded module optimization methods. The system recognizes faces based on learned information to detect intruders, and detects fires using color and motion information. Loitering is detected using the balance point angle changes and movement distances of objects, and falls are detected using the motion features and acceleration of the object. The 5 algorithm optimization methods for real-time processing are the use of positive integers, minimization of division and remainder operators, minimization of direct calculation with global variables, minimization of function arguments and recursive functions, and the use of the Cython language; the 3 embedded module optimization methods are bottleneck removal, the use of memory and the use of zRam. The proposed system achieved detection performance of 88.51% for intruder detection, 92.63% for fire detection, 80% for loitering detection and 93.54% for fall detection. Moreover, comparing processing time before and after optimization showed that the processing time of the integrated algorithm decreased by 50.53%. When each abnormal situation algorithm was run individually, processing time decreased by 5.17% for intruder detection, 25% for fire detection, 28.57% for loitering detection and 25% for fall detection.
Intruder detection showed the smallest change in processing time among the four algorithms because neither the Haar-like-feature-based learning file used for face detection nor the deep learning file used for face recognition was optimized for the input images. The future plan is to optimize learning files for