Open Access

Three Novel Analog-Domain Algorithms for Motion Detection in Video Surveillance

  • Arnaud Verdant1,
  • Patrick Villard1,
  • Antoine Dupret2 (corresponding author) and
  • Hervé Mathias3
EURASIP Journal on Image and Video Processing 2011, 2011:698914

https://doi.org/10.1155/2011/698914

Received: 1 May 2010

Accepted: 8 December 2010

Published: 18 January 2011

Abstract

To reduce the processing load of embedded video surveillance systems, three low-level motion detection algorithms designed for implementation on an analog CMOS image sensor are presented. These algorithms allow on-chip segmentation of moving targets and are both robust and adaptable to various environments while remaining power efficient. They feature different trade-offs between detection performance and the number of a priori choices. Detailed processing steps are presented for each algorithm, and a comparative study is carried out with respect to some reference algorithms. The best algorithm choice, which depends on the application, is then discussed.

1. Introduction

Motion detection in video surveillance with CMOS Image Sensors (CIS) requires high performance but it also needs to meet power consumption constraints, especially for remote sensing applications.

One way to address this issue is to design ASICs with specific image processing architectures. This allows some low-level local analog processing to be performed at the sensor level (prior to A/D conversion), which is particularly power efficient. Thanks to submicron CMOS processes, in-sensor processing can be performed without significantly impairing the device resolution and sensitivity. For embedded video surveillance, where autonomy is a major concern, such a physical implementation of motion detection is particularly worth investigating since it allows relevant information to be extracted from a scene prior to broadcasting. This information could be used to adapt the sensor's performance, such as the ADC resolution. The power consumed in capturing, storing, and transmitting the video would thus be reduced. However, specifically adapted algorithms have to be developed concurrently. Since such sensors have to be fully autonomous, these algorithms have to be both robust and adaptable to various environments while being computationally and power efficient.

In the case of a quasi-static camera (video still), adaptive environment modeling is a key point in motion segmentation for surveillance systems. Among the many works on computer vision, the visual surveillance problem is discussed in [1], where conventional approaches to motion detection are presented. Optical flow measurement is another well-known technique, implemented in [2, 3]. These approaches focus on optimizing motion detection in CIS but are not concerned with very low power image processing. In addition, optical flow methods based on the two-frame differential method (i.e., Lucas and Kanade [4] or Horn and Schunck [5]) rely on hypotheses such as illumination steadiness. Such hypotheses are not always relevant, especially when objects move fast with respect to the frame rate. The aperture problem also limits their straightforward implementation. Hence, these algorithms require iterative multiresolution processing to extract information.

On the other hand, motion detection achieved by estimating the background is based on weaker hypotheses. Background updating is an essential task since real-time algorithms for embedded systems have to be efficient in a large number of situations, that is, able to adapt their sensitivity to the scene. Image segmentation using the difference to the background and an adaptive threshold has been studied in [6], where the signal variance is computed from recursive averages and then compared to a threshold obtained by averaging the background variance over all the pixels. This method has been improved in [7], where its inherent trailing effect is compensated by a weight representing the confidence that a pixel is part of the foreground. Adaptive thresholding for motion detection in outdoor environments has been explored in [8]. The histogram of a distance matrix (obtained with the Principal Component Analysis technique) and the variance of a mean image allow the threshold level to be adapted to outdoor conditions. Other approaches based on multiple background estimations [9] or adaptive background estimation [10] have also been proposed.

All the preceding methods are efficient but require many operations. Given the reduced processing resources available in CMOS Image Sensors, computational efficiency is required while keeping enough robustness. In order to perform low power motion detection in CIS, other methods based on background modeling have been proposed. In [11], low-level motion detection algorithms are presented, and in [12], an efficient algorithm based on Σ-Δ modulation for artificial retinas is described. In the latter work, robustness to false positives is improved with local thresholding. For each pixel, the background estimation and variance are computed with nonlinear operations to perform adaptive local thresholding.

In our proposed motion detection scheme for increased autonomy, such algorithms [11, 12] need to be improved in terms of false positives and detection efficiency while using only low power operations. The developed algorithms, based on low-level computations, are designed to be implemented on a versatile analog architecture allowing a wide range of operators and compact processing steps. In this paper, after a short presentation of our architectural choices and their consequences on the associated algorithms (part 2), we describe the motion detection algorithms we take as reference (part 3). We then present the developed motion detection algorithms with associated results and estimated power consumption (part 4). Finally, we discuss the performance of the algorithms from different points of view in order to weigh the purely simulated results against the targeted application.

2. Constraints and Targeted Architecture

2.1. Programmable Architecture

The considered programmable computational unit (Figure 1) is a low power SIMD machine based on analog processing [13]. It is composed of an array of photosensors with an associated array of analog memory points (Analog RAM) providing a fixed number of memory elements per pixel. In our implementation, we have chosen three memory elements per pixel. Indeed, the analog memory is constrained by technological trade-offs such as silicon area and immunity to noise. The capacitive density is linked to technological parameters (with a typical value of 0.9 fF/μm²). The temporal noise specifications of our architecture also impose a lower bound on the capacitance value (with a typical value of about 500 fF). Given these two parameters, three memory elements keep the memory area reasonable with regard to the pixel matrix, while providing enough robustness with regard to noise and the impact of parasitic capacitances. The array dimensions may be up to 1024. The matrix thus formed is bordered on one side by a vector of switched capacitor analog processors. A column of multiplexers selects the column of pixels or memories to be used by the processor. A sequencer, implemented by a digital IP CPU, delivers the successive processor instructions. For each processor instruction, the switch configurations for the OTA and for the associated analog registers are fixed. Hence, motion detection is directly performed on the pixel gray levels (voltage signals). The matrix does not embed a Bayer filter; thus, demosaicing is not required.
Figure 1

Sensor architecture.

This architecture is implemented using a 0.35 μm CMOS process. It features a 10 μm pixel pitch with a standard fill factor (30%). With small parasitic capacitors and a 3.3 V voltage swing, it constitutes a good compromise with respect to larger or deep sub-micrometer processes. Moreover, leakage is reduced compared to more advanced technologies, thus reducing static power consumption as well as defects in the Analogue RAM (ARAM).

In order to take advantage of the parallelism of the SIMD architecture, the motion segmentation has to be performed independently for each pixel. The corresponding processing therefore requires many identical operations to be performed iteratively. Provided that the variables involved in the computations are independent, a parallel implementation of the algorithms is possible and helps to reduce the global power consumption. An analog-based computational system is an efficient response to these constraints.

With such an architecture, motion detection algorithms can be performed in the analog domain with low power requirements. For example, mixing capacitor charges at pixel level [14] efficiently performs pixel averaging. A digital counterpart would require numerous computations and power-consuming data transfers.

The chosen programmable architecture globally enables the implementation of "simple" algorithms at a much reduced power cost. "Simple" is to be understood as stepwise linear algorithms based on a reduced temporal or spatial convolution kernel. From available basic operators, different low level algorithms can be implemented by suitably programming the architecture. The various operations required by our algorithms can be performed with this parallel architecture, relying on
  (i) pixel average,
  (ii) recursive average (i.e., weighted sums),
  (iii) fixed step increments/decrements,
  (iv) storage (state).

The most used operators are addition, multiplication of a variable by a fixed coefficient, increment, absolute value, and comparison. Conditional operations are also needed; their execution depends on comparison results stored as states.

Our analog-based architecture has been shown to outperform its digital counterparts in [15], in the context of a low power CMOS image sensor based on a wake-up scheme for which the presented algorithms have been optimized.

2.2. Methodology

Algorithm performance is assessed by measuring motion detection performance in Matlab, and by evaluating the induced power consumption and the temporal noise effects of CMOS devices with a SystemC model of the system (architecture and algorithm).

To validate the performance of our algorithms, we have used different 8-bit sequences representative of indoor and outdoor conditions: Walk (IEF's sequence, rustling foliage), Pets 2002 (strobe light), dtneu_schnee (falling snow), and kwbB (i21http://www.ira.uka.de/), respectively (a), (b), (c), and (d) in Figure 2, and Hall Monitor (Figure 4). For instance, the falling snow in the dtneu_schnee sequence and the rustling foliage in the Walk sequence both introduce parasitic changes in the pixels' grey level and constitute realistic tests for the robustness of our algorithms. In our sequences, the objects to be detected are humans or cars.
Figure 2

Tested sequences for motion detection.

2.3. Metrics Choice and Performance Evaluation

Performance metrics are based on [16]. During the simulation, motion segmentation is performed on gray level images, resulting in binary images containing "moving" and "static" pixels. Each image is then divided into blocks of pixels. If a block contains more than a predefined number of moving pixels, it is considered a region of interest (ROI). From experimental evaluations based on a hand-generated ground truth, an ROI can be considered active when 5 to 10% of its pixels are "moving". Measurements for the reference algorithms as well as the proposed ones are based on this value. For each frame, the state of each block is stored in a vector. This vector is compared to a reference vector holding the ground truth for the current frame. The numbers of true positives, true negatives, false positives, and false negatives (TP, TN, FP, FN) can thus be counted.
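The block-level decision described above can be sketched as follows. This is a minimal illustration only: the function name `roi_flags`, the block side `block`, and the default `min_fraction` are hypothetical choices (the text gives a 5 to 10% range but this sketch does not claim to reproduce the authors' Matlab code).

```python
def roi_flags(mask, block=8, min_fraction=0.05):
    """Return one boolean per block, in row-major order.

    mask: list of rows, each a list of 0/1 values (1 = "moving" pixel).
    block: block side in pixels (hypothetical value).
    min_fraction: share of moving pixels above which a block is an ROI.
    """
    rows, cols = len(mask), len(mask[0])
    flags = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Gather the pixels of this block (edge blocks may be smaller).
            cells = [mask[r][c]
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            flags.append(sum(cells) > min_fraction * len(cells))
    return flags
```

The resulting vector of flags is what would be compared, frame by frame, with the ground truth vector.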

Our considered performance criteria are
  (i) Detection Rate (DR), which is the ability of the algorithm to detect moving objects,
  (ii) False Alarm Rate (FAR), which estimates detection quality,
  (iii) False Positive Rate (FPR), which is representative of algorithm robustness.

In our sequences, nonrelevant motion concerns static elements of the scene or other elements such as snow in the dtneu_schnee sequence, rustling foliage in the Walk and kwbB sequences, and strobe light in the Pets 2002 sequence.
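As an illustration of how the three criteria can be derived from the block vectors, the sketch below counts the four outcomes and computes the rates. The formulas used here are the commonly used definitions (DR = TP/(TP+FN), FAR = FP/(TP+FP), FPR = FP/(FP+TN)); they are an assumption, since the exact expressions are not reproduced in this text, and the function names are hypothetical.

```python
def count_outcomes(detected, truth):
    """Count (tp, fp, tn, fn) from two equal-length sequences of block states."""
    tp = fp = tn = fn = 0
    for d, t in zip(detected, truth):
        if d and t:
            tp += 1
        elif d and not t:
            fp += 1
        elif not d and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (DR, FAR, FPR) as fractions, using the assumed usual definitions."""
    dr = tp / (tp + fn) if tp + fn else 0.0
    far = fp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return dr, far, fpr
```

With these definitions, a high FAR means that many of the blocks flagged as moving are not relevant, whereas a high FPR means that many static blocks are wrongly flagged.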

We have developed a faithful, cycle-accurate SystemC behavioral model of the architecture [17]. This model makes it possible to simulate the proposed algorithms and the processing architecture jointly. It is used to determine the number of instructions and the instruction rate required for each algorithm. It also makes it possible to check the consistency between the results obtained by the model and the purely algorithmic results. A log file traces instructions and data, so that the whole architecture can be checked for conflicts during parallel processing.

In order to take into account the impact of the nonidealities introduced by the analog parts and to get an accurate evaluation of power consumption, the analog blocks composing the architecture have been described at a low level, down to simple components such as switches, capacitors, and OTAs. For all these elementary blocks, relevant nonidealities have been modeled with respect to the target CMOS technology and validated with classical electrical simulations (Spice-like). The power consumption figures given in the next parts derive from this SystemC model of our architecture. Some details on these aspects of the work are given in [17].

3. Starting Point: Recursive Average and Σ-Δ Algorithms

The embedded low-power motion detection algorithms have to meet two requirements: limited complexity, to comply with the computational limitations of our CIS, and high performance. In order to perform adaptive motion detection, background modeling has been chosen because of its computationally efficient implementation. In [11], two techniques allowing adaptive background modeling are presented. These algorithms perform local computations (i.e., from each pixel value) in order to low pass filter the observed scene. Approaches based on connected-component extraction, object merging, or clustering are not explored here because they require calculations that are too intensive for the targeted architecture.

3.1. Background Estimation Using Σ-Δ and Recursive Average Algorithms

The autonomous remote CIS we develop must perform motion detection in unknown and potentially changing environments. In such configurations, algorithms must meet hard constraints of robustness and adaptability. Markovian algorithms are generally used to face these situations. However, with respect to the considered power consumption and computational constraints, we had to simplify algorithms of this class while preserving their robustness.

As reference algorithms, we consider the Recursive Average algorithm and the Σ-Δ algorithm, presented in [11] and [12], respectively. Both feature simple arithmetic computations. Moreover, the Σ-Δ algorithm, which follows the Markov model and has been used for real-time implementations in [18, 19], provides high robustness.

3.1.1. Recursive Average: Principle

A first technique described in [11] relies on recursive operations. For a given pixel value (from 0 to 255), the background estimate is obtained from (1), with a large time constant fixed by the filter parameter.
(1)
To evaluate the impact of time constants and other algorithm parameters, we plot the temporal variations of a pixel grey level along with its filtered output. The slower the object to be detected, the larger the required time constant. Figure 3 illustrates low pass filtering of a pixel signal using the Recursive Average. As expected, Figure 3 shows that a proper choice of the time constant, depending on the frame rate, makes it possible to separate the background from moving objects. This representation will also help us explain the other algorithms. The visual impact of the time constant is shown in Figure 4, which shows the estimated background for two different time constants.
Figure 3

Background estimation with recursive average filtering for a temporal pixel variation, as a function of time.

Figure 4

Estimated background from an original image (a) (Hall Monitor sequence), with two different time constants (b) and (c).

Motion is then considered when the absolute difference between the estimated background and the processed pixel level is greater than a static global threshold (2).
(2)
This algorithm thus performs basic motion detection while being well suited to our analog implementation. However, local thresholding must be considered to improve robustness. Motion detection performance is given in Table 1.
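The recursive average with a static global threshold can be sketched for a single pixel as below. This is a minimal sketch under stated assumptions: the update form `background += (s - background) / n` is one standard way of writing a recursive average and is assumed here rather than taken from (1), and the default values of `n` and `threshold` are hypothetical.

```python
def recursive_average_detect(pixels, n=16, threshold=20):
    """Flag motion on one pixel's grey-level sequence (values 0 to 255).

    n: recursive average constant (larger n = longer time constant); hypothetical.
    threshold: static global threshold on |pixel - background|; hypothetical.
    """
    background = float(pixels[0])  # start from the first observed level
    flags = []
    for s in pixels:
        # Low pass filter: move the background a fraction 1/n toward the pixel.
        background += (s - background) / n
        # Static global threshold on the absolute difference.
        flags.append(abs(s - background) > threshold)
    return flags
```

Because the same threshold is applied to every pixel, a noisy region and a quiet region are treated identically, which is the limitation that local thresholding addresses.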
Table 1

Motion detection performance of two state-of-the-art algorithms.

Performance metrics (%)    Grey level sequence   Recursive Average   Σ-Δ
Detection Rate (DR)        Hall                  97.3                94.2
                           kwbB                  97.8                94.6
                           Walk                  100                 99.1
                           Pets 2002             95.8                93.3
                           dtneu_schnee          99.9                91.6
False Alarm Rate (FAR)     Hall                  79.3                16.3
                           kwbB                  81.7                32.4
                           Walk                  84.8                86.7
                           Pets 2002             85.0                28.3
                           dtneu_schnee          54.8                43.7
False Positive Rate (FPR)  Hall                  42.0                2.5
                           kwbB                  15.4                2.7
                           Walk                  59.2                60.5
                           Pets 2002             16.5                1.6
                           dtneu_schnee          24.3                14.5

3.1.2. Σ-Δ: Principle

The second method, presented in [12], is based on nonlinear operations with Σ-Δ modulations. Depending on successive comparisons with the signal value (3), a background variable is incremented (4) or decremented (5) by a constant value so as to track the pixel level.
(3)
(4)
(5)
As Figure 3 does for the Recursive Average, Figure 5 illustrates low pass filtering of a pixel signal with the Σ-Δ modulation method.
Figure 5

Background estimation with Σ-Δ modulation: pixel gray level value and background estimate as a function of time.

For an analogue implementation, the main advantage of this method is that it offers more flexibility than the Recursive Average algorithm. Indeed, the variations of the estimated background can be adjusted through the increment/decrement steps, whereas the time constants of recursive averages are limited by the physical implementation of the computation. In our architecture, these time constants are fixed by the ratios of the capacitances across which the signal charges are shared.
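The link between capacitance ratios and the recursive average can be illustrated with ideal charge sharing. The sketch below assumes ideal capacitors with no parasitics or leakage, and the function names and capacitor values are hypothetical; it only shows why connecting a memory capacitor to a sampling capacitor behaves like one recursive average step.

```python
def share_charge(v_mem, c_mem, v_in, c_in):
    """Voltage after connecting two ideal capacitors (charge conservation)."""
    return (c_mem * v_mem + c_in * v_in) / (c_mem + c_in)


def equivalent_n(c_mem, c_in):
    """Constant n such that the shared voltage equals v_mem + (v_in - v_mem) / n."""
    return (c_mem + c_in) / c_in
```

For instance, a memory capacitor three times larger than the sampling capacitor gives n = 4, so the set of available time constants is bounded by the capacitor ratios that can be laid out and matched.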

Figure 6 shows the estimated background obtained with Σ-Δ modulations on the Hall Monitor sequence.
Figure 6

Result of background estimation on the Hall Monitor sequence with Σ-Δ modulations. Notice the trailing effect generating a "ghost".

For motion detection, a second variable is generated with the same modulations as in (4) and (5). It can be interpreted as the signal variance and is used to threshold the absolute difference between the pixel signal and the estimated background (Figure 7). Motion is detected when this absolute difference is higher than the variance estimate.
(6)
Figure 7

Σ-Δ algorithm: pixel gray level value, background estimation, and threshold applied to their absolute difference.

Instead of the global threshold used in the Recursive Average algorithm, the Σ-Δ algorithm thus computes a local adaptive threshold for each pixel to achieve more robustness on noisy elements, while keeping enough sensitivity on static background. Because the observed scene is not uniform, local thresholding is computed according to the temporal activity of each zone. Moreover, this algorithm features no trailing effects, at the cost of a poor band pass filtering capability.
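A per-pixel sketch of this kind of Σ-Δ background and variance estimation is given below. It follows the commonly described form of the scheme rather than equations (3) to (6), which are not reproduced in this text: in particular, the way the constant `n` scales the difference before the variance update, the floor `v_min`, and the default values are assumptions.

```python
def sigma_delta_detect(pixels, n=2, step=1, v_min=1):
    """Flag motion on one pixel's grey-level sequence with Σ-Δ style updates.

    n: amplification applied to the difference when updating the variance (assumed).
    step: fixed increment/decrement level.
    v_min: lower bound kept on the variance estimate (assumed).
    """
    background = pixels[0]
    variance = v_min
    flags = []
    for s in pixels:
        # Move the background one fixed step toward the pixel level.
        if background < s:
            background += step
        elif background > s:
            background -= step
        diff = abs(s - background)
        # Same modulation applied to n * diff yields the local threshold.
        if variance < n * diff:
            variance += step
        elif variance > n * diff:
            variance -= step
        variance = max(variance, v_min)
        flags.append(diff > variance)
    return flags
```

Only comparisons and fixed-step increments are involved, apart from the product `n * diff`, which is why the scheme suits architectures with very limited arithmetic.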

3.1.3. Recursive Average and Σ-Δ Performance

Table 1 presents the motion detection performance of the state-of-the-art algorithms. The constant used for the Recursive Average algorithm is 25. The N value used for the Σ-Δ algorithm (required for threshold processing) is 15.

The Recursive Average algorithm exhibits poor robustness. Indeed, it requires setting a global threshold, which is the main limitation of this method since sensitivity cannot be adapted to scene activity. Moreover, the Recursive Average exhibits phase shifting, resulting in trailing effects and poor band pass filtering. More specifically, this algorithm does not allow high frequency rejection along with background subtraction.

The motion detection performance reported for the Σ-Δ algorithm clearly shows the benefit of local adaptive thresholding compared to the global threshold used by the Recursive Average algorithm.

However, the on-chip motion detection information can be used to adapt the sensor performance (e.g., higher ADC accuracy on moving pixels). In order to keep a reasonable global power consumption (a few mW), an improved robustness of these on-chip motion detection analog domain algorithms is still required while keeping high detection rate.

4. Algorithms

We now describe our three designed motion segmentation algorithms for CIS:
  (i) a first algorithm running with no a priori determination of constant, based on scene activity to adapt its sensitivity,
  (ii) a second algorithm using band pass filtering in order to reduce false positives upon high frequency pixel variations,
  (iii) finally, an algorithm featuring only one constant to determine a priori, and reducing the trailing effect induced by recursive averaging.

4.1. Scene-Based Adaptive Algorithm (SBA)

In order to improve adaptability, we now present the Scene-Based Adaptive (SBA) algorithm. This algorithm derives from the Σ-Δ algorithm in [12]. It performs motion segmentation on gray level sequences with no a priori constant determination, such as the N constant used in Σ-Δ. Based on Σ-Δ modulations, the SBA algorithm is also compatible with the reduced computational resources available in CIS architectures, which rule out true Markovian approaches.

Our idea is to get rid of constants related to the background of the scene. The detection of grey level variations resulting from motion derives from the absolute difference between the last extremum and the current pixel value (Figure 8). Unlike the detection of grey level variations in (4) and (5), this filter requires no constant setting.
Figure 8

Extracting the signal's variations according to SBA.
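One possible reading of this variation extraction is sketched below for a single pixel: the last local extremum is updated whenever the slope of the signal reverses, and the output is the absolute difference between that extremum and the current value. The function name and the way slope reversals are detected are assumptions made for illustration; the actual filter may differ.

```python
def variation_from_last_extremum(pixels):
    """Return |pixel - last local extremum| for each sample of one pixel signal."""
    last_extremum = pixels[0]
    previous = pixels[0]
    direction = 0  # +1 rising, -1 falling, 0 not yet known
    out = []
    for s in pixels:
        step = (s > previous) - (s < previous)
        if step != 0 and direction != 0 and step != direction:
            # The slope reversed, so the previous sample was a local extremum.
            last_extremum = previous
        if step != 0:
            direction = step
        out.append(abs(s - last_extremum))
        previous = s
    return out
```

No constant has to be chosen in this step: the reference level is taken from the signal itself.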

The variation value thus generated is now used to perform adaptive motion detection with the technique presented below.

First, the mean value of this variation is computed (7). Considering that insignificant motions of the background introduce only small variations, the idea is to favor large signal variations at the expense of small ones. A convex function is therefore needed to amplify the variation. Accordingly, (8) introduces a variable that approximates this convex amplification. Indeed, our switched capacitor architecture only enables multiplication between a digital number and an analog value.

In order to reduce the trailing effects, the next step consists in building an adjustable increment, much like in adaptive Σ-Δ modulation. A third variable is thus obtained from the signal value (9): it derives from a Σ-Δ modulation of the signal value using the increment defined in (8). When the comparison in (10) shows an absolute difference larger than the estimated one, the pixel variation is reckoned as relevant and motion is detected.
(7)
(8)
(9)
(10)

The absolute difference used as reference in (10) can be seen as the maximal estimated signal dispersion. A variation larger than the estimated one is attributed to a relevant moving object (10). Apart from the increment or decrement level, this algorithm runs without any a priori fixed constant.

Figure 9 illustrates SBA computations on a pixel signal. In the absence of motion, the background estimator closely follows the pixel signal. Compared to Σ-Δ, the background estimator can have a steeper slope when large signal variations occur. Conversely, small changes of the pixel grey level lead to long time constants.
Figure 9

Second computation of a pixel signal with the SBA algorithm: pixel gray level value together with the variables expressed in (8) and (9).

Figure 10 illustrates motion detection performed with the Σ-Δ and SBA algorithms. With the SBA algorithm, some trailing effect can be observed, but robustness is better: in this illustration, the rustling foliage is filtered out while motion detection is preserved on the pedestrian.
Figure 10

(a) Original image, (b) motion detection with Σ-Δ, and (c) motion detection with SBA.

4.2. Recursive Average with Estimator Algorithm (RAE)

In various outdoor situations, many false alarm sources can be encountered. Although the static background encountered in urban areas does not impose such constraints, weather conditions in the same areas can lead to increased FPR and FAR. In [12], no high frequency rejection is performed, which leads to numerous false positives.

Figure 12(b) illustrates motion detection performed at a crossroad under falling snow with the Σ-Δ algorithm. In order to improve motion detection robustness by rejecting high frequency variations, we have designed an algorithm featuring band pass filtering. It is also based on recursive averages, which can be compactly implemented through charge transfer between capacitances. Though it has the same degree of complexity, the designed algorithm is thus better suited to an analog-based architecture than delta modulation.

This algorithm is thus based on a background estimation extracted from the difference between two low pass filters. The computation of two recursive averages, (12) and (13), each with its own time constant (fixed by its own parameter), defines a band pass filter: the slower one brings out the background while the other, with a short lag, filters out the fast perturbations of the signal. For each pixel, the main computation steps are described below. The equations involve the frame index, the current gray level value for the considered block, and a local threshold (14).
(11)
(12)
(13)
(14)
An adaptive threshold based on the temporal variations of this absolute difference allows motion to be detected. If this estimator becomes larger than a local threshold, which depends on the temporal activity, motion is detected. The estimator acts as a band-pass filter selecting only moving objects of interest in the scene. The adaptive threshold is obtained by using the recursive average of the estimator as a variable amplifying gain for the threshold (17). The increase of the threshold level due to signal variations can be seen in Figure 11. With this method, the threshold directly depends on the perturbation level, periodicity, or persistence. To prevent saturation (in either an analog or a fixed point implementation), care is taken over which of the two compared quantities is amplified. The time constant of this threshold must be quite large with respect to pertinent scene motions in order to adapt the sensitivity to persistent perturbations only.
Figure 11

Computation of a pixel signal with the RAE algorithm: pixel gray level value together with the variables expressed in (12), (13), (14), and (17).

Figure 12

Motion segmentation with the Σ-Δ algorithm (b) and the RAE algorithm (c).

These recursive operations, with few memory requirements, make this algorithm easy to implement on our architecture. The time constant of the fast recursive average can be chosen so as to filter fast perturbations efficiently without inducing a significant trailing effect. Considering the z-transform of the recursive average, the time constant is given as follows:
(15)
The response of the resulting band pass transfer function to a step of given amplitude is expressed in (16), using the two constants of (12) and (13).
(16)
In this algorithm, the two constants depend on the properties of the objects to be detected (i.e., size and speed) and on the frame rate. However, once the type of object to be detected is known, local adaptive thresholding is achieved. In the following section, these two constants have been set to (2², 2⁴), respectively, for the simulations performed on the reference sequences, with a 25 Hz frame rate. The class of objects to detect here is cars or pedestrians. Sizing both constants as powers of two facilitates our analog implementation with regard to component matching. With the slow constant set to 2⁴, the 95% rise time corresponds approximately to 50 frames at 25 fps. On the tested videos, this value has experimentally provided efficient background estimation. Choosing 2² for the fast constant is a good compromise between implementation constraints and filtering efficiency (in order not to reduce DR while improving FAR).
(17)

The time constant of the threshold average has been set to 2⁶ (about 200 frames). The threshold gain can typically be set around 2 and can be increased in order to reduce false positives.
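The orders of magnitude quoted for the rise times can be checked numerically. The sketch below assumes the recursive average is written as `y += (x - y) / n`, which has a single pole at 1 - 1/n; this form is an assumption, since the equations are not reproduced in this text, and the function names are hypothetical.

```python
import math


def time_constant_frames(n):
    """Time constant, in frames, of y += (x - y) / n (pole at 1 - 1/n)."""
    return -1.0 / math.log(1.0 - 1.0 / n)


def rise_time_95(n):
    """Number of frames for the step response to reach 95% of its final value."""
    return math.log(0.05) / math.log(1.0 - 1.0 / n)
```

Under this assumption, n = 2⁴ gives a 95% rise time between 45 and 50 frames and n = 2⁶ gives a little under 200 frames, which is consistent with the approximate figures quoted in the text.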

Figure 11 illustrates computations of a pixel signal using the proposed algorithm.

One can notice that this algorithm efficiently filters high frequency perturbations. However, some trailing effect is observed with the RAE algorithm (not obtained with Σ-Δ). Figure 12 illustrates RAE applied to the dtneu_schnee sequence with falling snow. With the same sensitivity as Σ-Δ, this algorithm filters out these high frequency perturbations.
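The overall RAE structure (two recursive averages forming a band pass estimator, compared with a threshold driven by the recursive average of that estimator) can be sketched for one pixel as follows. This is a sketch under stated assumptions, not the authors' equations (12) to (17): the update form of the averages, the way the gain is applied, and the `floor` parameter (added here to avoid triggering on vanishingly small differences) are all hypothetical.

```python
def rae_detect(pixels, n_fast=4, n_slow=16, n_thr=64, gain=2.0, floor=1.0):
    """Flag motion on one pixel's grey-level sequence with an RAE-like scheme.

    n_fast, n_slow: constants of the two recursive averages (2**2 and 2**4).
    n_thr: constant of the threshold average (2**6).
    gain: threshold gain, typically around 2.
    floor: minimum threshold (hypothetical, not from the source).
    """
    fast = slow = float(pixels[0])
    activity = 0.0
    flags = []
    for s in pixels:
        fast += (s - fast) / n_fast      # short lag: removes fast perturbations
        slow += (s - slow) / n_slow      # long lag: follows the background
        estimator = abs(fast - slow)     # band pass output
        flags.append(estimator > max(floor, gain * activity))
        # Slow average of the estimator: raises the threshold on persistent activity.
        activity += (estimator - activity) / n_thr
    return flags
```

A persistent perturbation such as falling snow keeps `activity` high and therefore desensitizes the pixel, whereas an isolated object crossing a quiet pixel is compared with a low threshold.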

4.4. Adaptive Wrapping Thresholding Algorithm (AWT)

Although robust and computationally efficient, the Σ-Δ and RAE algorithms require some constants to be determined. Given the known frame rate, the three time constants of RAE as well as the increment level of Σ-Δ can be determined a priori. However, the RAE gain constant or the Σ-Δ N constant adjusts the algorithm sensitivity in accordance with the amplitude of noisy elements. In order to avoid defining such a priori constants, an Adaptive Wrapping Thresholding motion detection algorithm (AWT), based on recursive average operations with a reduced number of constants, is presented in this section. Unlike common algorithms based on recursive low pass filtering [6], this algorithm also limits the trailing effect due to phase shifting.

We thus propose an algorithm based on recursive average operations performing local adaptive thresholding from each pixel signal (Figure 13). In the two preceding algorithms (SBA and RAE), motion detection is performed by thresholding temporal variations. We propose here to compute two wrapping variables in order to detect significant variations of the signal. These two variables define the upper and lower bounds between which the grey level of the signal should remain. In order to take into account the variations of the background, those two variables are updated using a low pass filter. Yet the time constant of these filters can be much larger than the ones used in Σ-Δ and even SBA.
Figure 13

Computation of a pixel signal with the AWT algorithm: pixel gray level value together with the variables expressed in (19), (21), (22), (23), and (20).

This algorithm relies on a background estimation for each pixel signal from which we estimate the signal standard deviation. This standard deviation is then used to estimate a maximum range for background variations. If the value of a considered pixel moves outside this estimated range of background variations, we consider that motion occurs.

First of all, the background estimation is computed recursively (19). The temporal variations are extracted as the absolute difference between the pixel signal and the background estimation (20). The mean deviation of the estimated background variations is then calculated from these temporal variations (21). In a fourth step, two wrapping variables are computed, (22) and (23), which define the estimated range of maximum background variations. Motion is then detected according to (24).
(18)
(19)
(20)
(21)
(22)
(23)
(24)

Hence this algorithm relies on a constant that sets the time constant of the recursive averages (equivalent to the increment/decrement levels of the Σ-Δ algorithm [12]). However, no additional constant is required to handle sensitivity, unlike Σ-Δ or RAE, where a coefficient is required to set the threshold level. The computation of the two wrapping variables defines an adaptive threshold directly from the signal variations (Figure 13).
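The processing chain described above can be illustrated with a minimal per-pixel sketch. Equations (19) to (24) are not reproduced here, so the update rules below are assumptions: every recursive average is taken as an exponential average with weight 1/n, and the two wrapping variables are taken as low-pass filtered versions of the background estimation plus or minus the mean deviation. The names `awt_detect` and `n` are hypothetical, and the false-alarm behavior of the actual algorithm depends on the exact form of (22) and (23).

```python
def awt_detect(samples, n=16):
    """Flag motion on one pixel signal with an adaptive wrapping band.

    Assumed update rules (not taken from the paper's equations):
    all recursive averages use the weight 1/n, and the bounds track
    the background estimate plus/minus the mean deviation.
    """
    background = float(samples[0])   # background estimation, cf. (19)
    deviation = 0.0                  # mean deviation, cf. (21)
    upper = lower = background       # wrapping variables, cf. (22), (23)
    flags = []
    for value in samples:
        # Recursive average of the pixel signal.
        background += (value - background) / n
        # Temporal variation as an absolute difference, cf. (20).
        variation = abs(value - background)
        # Recursive average of the temporal variations.
        deviation += (variation - deviation) / n
        # Low-pass filtered upper and lower bounds.
        upper += (background + deviation - upper) / n
        lower += (background - deviation - lower) / n
        # Motion when the pixel leaves the band, cf. (24).
        flags.append(value > upper or value < lower)
    return flags


# A static pixel followed by a sudden change of gray level.
signal = [100] * 50 + [150] * 5
motion = awt_detect(signal)
```

In this sketch the only tunable value is `n`, which plays the role of the time constant; no threshold coefficient appears. If `n` is restricted to a power of two, the division reduces to a fixed ratio, which is one possible reading of the remark that no multiplication is needed.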

Furthermore, this method reduces the trailing effect observed with common motion detection algorithms based on recursive averaging. Indeed, a recursive average of the signal level induces phase shifting and a trail behind the target. With this algorithm, the double condition on the two wrapping variables in the motion detection step reduces the trailing effect (Figure 14).
Figure 14

Comparison between the Σ-Δ algorithm (b) and the AWT algorithm (c) on the kwbB sequence.

Unlike Σ-Δ, SBA, or RAE, there is no need for a multiplication operation. From the point of view of our analog implementation, this constitutes an improvement since there is no need to implement multiple capacitors to get a wide range of constants for multiplication.

5. Results

5.1. Algorithms Performance

Table 2 reports the results of the two state-of-the-art reference algorithms as well as those of the new ones (SBA, RAE, and AWT).
Table 2

Motion detection performance.

 

| Performance metrics (%) | Grey level sequence |  | Σ-Δ | SBA | RAE | AWT |
|---|---|---|---|---|---|---|
| Detection Rate (DR) | Hall | 97.3 | 94.2 | 93.5 | 94.8 | 92.8 |
|  | kwbB | 97.8 | 94.6 | 94 | 96.4 | 96.6 |
|  | Walk | 100 | 99.1 | 99.3 | 99.5 | 99.3 |
|  | Pets 2002 | 95.8 | 93.3 | 94.1 | 93 | 94.6 |
|  | dtneu_schnee | 99.9 | 91.6 | 90.1 | 87.5 | 90.1 |
| False Alarm Rate (FAR) | Hall | 79.3 | 16.3 | 14.9 | 12.6 | 16.7 |
|  | kwbB | 81.7 | 32.4 | 27.4 | 26.4 | 36.8 |
|  | Walk | 84.8 | 86.7 | 83.4 | 85.7 | 85 |
|  | Pets 2002 | 85.0 | 28.3 | 43.4 | 26.2 | 29.8 |
|  | dtneu_schnee | 54.8 | 43.7 | 54.9 | 11.9 | 45.2 |
| False Positive Rate (FPR) | Hall | 42.0 | 2.5 | 2.2 | 1.8 | 2.5 |
|  | kwbB | 15.4 | 2.7 | 1.7 | 1.7 | 3.0 |
|  | Walk | 59.2 | 60.5 | 46.7 | 56 | 52.9 |
|  | Pets 2002 | 16.5 | 1.6 | 3.9 | 1.2 | 1.6 |
|  | dtneu_schnee | 24.3 | 14.5 | 22.1 | 1.8 | 13.3 |
| Number of Instructions |  | 6 | 30 | 43 | 21 | 32 |
Simulations performed on the sequences with the SBA algorithm, which uses no arbitrary constant (Table 3), provide a quite similar detection rate, along with close FAR and FPR values, compared to the Σ-Δ measurements (Table 2). This algorithm thus provides equivalent detection efficiency and robustness with no need for constant setting, and therefore shows improved adaptability. Although it does not feature high frequency rejection, a satisfactory detection performance is achieved on gray level sequences.
Table 3

Motion detection performance.

Average parameter variation on 5 sequences (%)

| Algorithm | DR | FAR | FPR |
|---|---|---|---|
| Σ-Δ | −0.9 | 50.8 | 161.3 |
| SBA | −13.3 | 9.2 | −11.6 |
| RAE | 0.7 | 4.7 | 8.9 |
| AWT | −0.2 | −4.4 | −8.3 |

The results reported in Table 4 show that RAE is equivalent to Σ-Δ in terms of DR for all sequences. However, better results are obtained by our algorithm with respect to FPR and FAR. This algorithm thus features different variables allowing motion segmentation on gray level sequences with good sensitivity and high frequency rejection. However, a constant k for threshold setting is required and some trailing effect is generated.
Table 4

Average motion detection performance.

Performance metrics (%)

| Algorithm | FAR | DR | FPR |
|---|---|---|---|
| Σ-Δ | 41.5 | 94.6 | 16.3 |
| SBA | 44.8 | 94.2 | 15.3 |
| RAE | 32.6 | 94.2 | 12.5 |
| AWT | 42.7 | 94.7 | 14.6 |
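The averages of Table 4 can be checked against the per-sequence values of Table 2. The sketch below copies those values and averages them over the five sequences; the key `reference` designates the second reference column of Table 2, which is the one whose averages match the first row of Table 4. The names `TABLE2` and `average_metrics` are hypothetical.

```python
# Per-sequence values from Table 2, in the order
# Hall, kwbB, Walk, Pets 2002, dtneu_schnee.
TABLE2 = {
    "reference": {"FAR": [16.3, 32.4, 86.7, 28.3, 43.7],
                  "DR": [94.2, 94.6, 99.1, 93.3, 91.6],
                  "FPR": [2.5, 2.7, 60.5, 1.6, 14.5]},
    "SBA": {"FAR": [14.9, 27.4, 83.4, 43.4, 54.9],
            "DR": [93.5, 94.0, 99.3, 94.1, 90.1],
            "FPR": [2.2, 1.7, 46.7, 3.9, 22.1]},
    "RAE": {"FAR": [12.6, 26.4, 85.7, 26.2, 11.9],
            "DR": [94.8, 96.4, 99.5, 93.0, 87.5],
            "FPR": [1.8, 1.7, 56.0, 1.2, 1.8]},
    "AWT": {"FAR": [16.7, 36.8, 85.0, 29.8, 45.2],
            "DR": [92.8, 96.6, 99.3, 94.6, 90.1],
            "FPR": [2.5, 3.0, 52.9, 1.6, 13.3]},
}


def average_metrics(table):
    """Average each metric of each algorithm over the sequences."""
    return {
        algo: {name: sum(values) / len(values)
               for name, values in metrics.items()}
        for algo, metrics in table.items()
    }


averages = average_metrics(TABLE2)
for algo, metrics in averages.items():
    print(algo, {name: round(value, 2) for name, value in metrics.items()})
```

The computed averages agree with Table 4 to within 0.1 percentage point; the small residual differences are presumably due to rounding of the per-sequence values.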

The AWT algorithm results are slightly below the performance levels of RAE. However, no a priori choice of threshold sensitivity has been made. These results therefore show that useful motion detection performance can be obtained without prior knowledge of the environment.

The Walk sequence shows reduced robustness here. Although rustling foliage is efficiently filtered out by our algorithms, the motion of the tree branches has the same speed and amplitude characteristics as the objects to be detected (e.g., humans). This processing alone is not robust to such motion.

The power consumption is proportional to the Number of Instructions (NOI). From SystemC simulations applied to 30 fps video sequences, we have estimated a power consumption below 5 mW for the worst case (SBA algorithm). This is less than the power consumption of a state-of-the-art 3 Msamples/s 10-bit Successive Approximation Register (SAR) ADC designed in the same technology, which lies between 10 and 20 mW. SAR converters are known to be the least power-consuming ADC architectures. This validates the relevance of the algorithm-architecture codesign, since a digital implementation of those algorithms would require such an ADC plus a digital processing unit. Furthermore, our analog processing unit derives from a SAR ADC; therefore, the scaling of the CMOS technology brings the same improvements as for a classical SAR ADC.
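Since the power consumption is stated to be proportional to the NOI, the 5 mW worst-case figure can be scaled to the other two new algorithms. The sketch below assumes strict proportionality with no fixed offset and the same 30 fps frame rate; the resulting values are therefore only indicative upper bounds, and the name `power_bound_mw` is hypothetical.

```python
# Number of instructions of the new algorithms, from Table 2.
NOI = {"SBA": 43, "RAE": 21, "AWT": 32}

# Upper bound stated for the worst case (SBA), in mW.
WORST_CASE_MW = 5.0


def power_bound_mw(algorithm):
    """Scale the worst-case bound by the ratio of instruction counts.

    Assumes power is strictly proportional to the NOI (no offset).
    """
    return WORST_CASE_MW * NOI[algorithm] / NOI["SBA"]


for name in NOI:
    print(f"{name}: below {power_bound_mw(name):.2f} mW")
```

Under this assumption, RAE would stay below about 2.4 mW and AWT below about 3.7 mW.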

In order to take technological parameters into account in these simulations, temporal noise was added to the sequences via our SystemC model. Indeed, in our architecture, several noise sources create signal variations that can be interpreted as relevant motion. In our model, the 8-bit images are converted into a voltage signal on a 1.8 V dynamic range. Gaussian noise with a 1.1 mV standard deviation is added to each image. During processing, a second Gaussian noise source with a 0.25 mV standard deviation is added to each operation to model the nonidealities of the analog processor.
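The noise model described above can be sketched in a few lines. This is not the SystemC model itself: it is a simplified illustration that assumes gray level 255 maps to the full 1.8 V range and that each analog operation receives one independent Gaussian noise sample. All function names are hypothetical.

```python
import random

V_RANGE = 1.8            # dynamic range in volts (from the text)
SENSOR_SIGMA = 1.1e-3    # per-image noise standard deviation, in volts
OP_SIGMA = 0.25e-3       # per-operation noise standard deviation, in volts


def to_voltage(gray_level):
    """Map an 8-bit gray level to a voltage (assumes 255 -> 1.8 V)."""
    return gray_level / 255.0 * V_RANGE


def noisy_frame(gray_frame, rng):
    """Convert a frame to voltages and add the per-image noise."""
    return [to_voltage(g) + rng.gauss(0.0, SENSOR_SIGMA) for g in gray_frame]


def noisy_abs_diff(a, b, rng):
    """Absolute difference with the per-operation noise added."""
    return abs(a - b) + rng.gauss(0.0, OP_SIGMA)


rng = random.Random(0)                 # fixed seed for repeatability
previous = noisy_frame([10, 128, 255], rng)
current = noisy_frame([10, 128, 255], rng)
variations = [noisy_abs_diff(c, p, rng) for c, p in zip(current, previous)]
```

With this mapping, one gray level corresponds to about 7.06 mV, so the 1.1 mV image noise amounts to roughly 0.16 gray level.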

Table 3 presents the impact of noise in the analog processing on the motion detection parameters considered.

We can see that in the case of SBA and Σ-Δ, DR is reduced while FAR is increased. For these two algorithms, noise reduces the sensitivity on the relevant parts of the scene while decreasing global robustness. These results highlight the lower robustness of these two algorithms when implemented in our analog architecture. Concerning the RAE algorithm, both DR and FAR are increased. This may be due to an insufficient threshold amplification. For the AWT algorithm, all the parameters are decreased. The threshold amplification is too high in this case, leading to less sensitivity over the whole image. However, the noise added to the recursive average-based processing (RAE, AWT) induces smaller variations of the selected parameters. We can thus consider that the recursive average-based methods are more robust than the ones based on modulations (Σ-Δ, SBA) when implemented in our analog architecture.

5.2. Discussion

In the previous section, we have presented three robust and fast new algorithms and compared them to the reference algorithm. Based on parameters that measure motion detection performance, such as the detection rate or the false positive rate, we have determined the robustness and detection efficiency of these algorithms. The average results for the tested sequences are presented in Table 4.

However, these results must be weighed against other factors. Indeed, we can define criteria that take into account implementation constraints such as power consumption, or other limitations such as the kind of application targeted by the motion detection algorithm. We list below some of these criteria, which depend on the motion detection context. Table 5 gives the rating of each algorithm according to these criteria.
  (1) settings: the fewer the required constants for adapting threshold level or time constants, the more autonomous the left-behind sensor,
  (2) adaptation: threshold level evolution according to pixel temporal activity,
  (3) high frequency rejection: high frequency noise filtering of pixel signal (band pass filtering),
  (4) trailing effect: artefacts or motion segmentation distortion due to phase shifting induced by algorithm,
  (5) robustness: number of generated false positives,
  (6) computational efficiency: induced power consumption (mainly depending on the number of instructions in our implementation),
  (7) robustness with regard to analog implementation (temporal noise).

Table 5

Balanced algorithm performance according to selected criteria.

Algo.

Criteria

 

1

2

3

4

5

6

7

+ +

 

+

+

±

+

SBA

+

+

+

RAE

+

+ +

+

+

+

AWT

+

+

+

+

±

+

These qualitative results show that, depending on the targeted application, one algorithm can prevail over another even if its motion detection performance is worse. However, AWT and RAE are better suited to an analog implementation.

6. Conclusion

Three algorithms developed using a codesign approach have been presented. They perform motion detection at reduced power consumption while ensuring fast and robust computation. Compared to classical sensors performing motion detection downstream of image acquisition, the offered processing capabilities are somewhat limited, but the chosen analog architecture on which they are implemented offers a better compromise between power consumption and algorithm performance. Moreover, considering only the algorithmic aspect of the work, significant improvements have been made in terms of self-adaptability to the scene. The constants involved in the presented algorithms indeed depend mostly on the nature of the objects to be detected (speed and size).

Though these algorithms have been tailored for a dedicated architecture, a real-time implementation on a standard digital processor (e.g., an ARM920T) is nevertheless possible, but at a significantly higher power consumption (roughly some 100 mW for the processor alone).

Finally, an ASIC is currently being designed to provide an experimental validation of the concept. One of its main features is that the pixel area ( μm2) is very close to that of state-of-the-art pixels in a similar technology (0.35 μm CMOS).

Authors’ Affiliations

(1)
CEA, LETI, MINATEC
(2)
ESYCOM-ESIEE Paris, 2, Boulevard Blaise Pascal, Cité DESCARTES
(3)
IEF, Bâtiment 220

References

  1. Hu W, Tan T, Wang L, Maybank S: A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man and Cybernetics Part C 2004, 34(3):334-352. doi:10.1109/TSMCC.2004.829274
  2. Moini A, Bouzerdoum A, Eshraghian K, Yakovleff A, Nguyen XT, Blanksby A, Beare R, Abbott D, Bogner RE: An insect vision-based motion detection chip. IEEE Journal of Solid-State Circuits 1997, 32(2):279-284. doi:10.1109/4.551924
  3. Mehta S, Etienne-Cummings R: Normal optical flow measurement on a CMOS APS imager. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '04), May 2004 4: 848-851.
  4. Lucas BD, Kanade T: An iterative image registration technique with an application to stereo vision. Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81), April 1981 674-679.
  5. Horn BKP, Schunck BG: Determining optical flow. Artificial Intelligence 1981, 17(1-3):185-203. doi:10.1016/0004-3702(81)90024-2
  6. Joo S, Zheng Q: A temporal variance-based moving target detector. Proceedings of the IEEE Workshop on Performance Analysis of Video Surveillance and Tracking (PETS '05), January 2005.
  7. Abdelkader MF, Chellappa R, Zheng Q, Chan AL: Integrated motion detection and tracking for visual surveillance. Proceedings of the 4th IEEE International Conference on Computer Vision Systems (ICVS '06), January 2006 28.
  8. Vázquez JF, Mazo M, Lázaro JL, Luna CA, Ureña J, Garcia JJ, Guillan E: Adaptive threshold for motion detection in outdoor environment using computer vision. Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE '05), June 2005 3: 1233-1237.
  9. Pan W, Wu K, Chai Z, You ZS: A background reconstruction method based on double-background. Proceedings of the 4th International Conference on Image and Graphics (ICIG '07), August 2007 502-507.
  10. Guo J, Rajan D, Chng ES: Motion detection with adaptive background and dynamic thresholds. Proceedings of the 5th International Conference on Information, Communications and Signal Processing, December 2005 41-45.
  11. Richefeu J, Manzanera A: Motion detection with smart sensor. Proceedings of the 9th Congress Young Searchers in Computer Vision (ORASIS '05), May 2005.
  12. Manzanera A, Richefeu JC: A new motion detection algorithm based on Σ-Δ background estimation. Pattern Recognition Letters 2007, 28(3):320-328. doi:10.1016/j.patrec.2006.04.007
  13. Moutault S, Mathias H, Klein JO, Dupret A: An improved analog computation cell for Paris II, a programmable vision chip. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '04), May 2004 453-456.
  14. Massie M, Baxter C, Curzan JP, McCarley P, Etienne-Cummings R: Vision chip for navigating and controlling micro unmanned aerial vehicles. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS '03), May 2003 3: 786-789.
  15. Verdant A, Dupret A, Mathias H, Villard P, Lacassagne L: Adaptive multiresolution for low power CMOS image sensor. Proceedings of the 14th IEEE International Conference on Image Processing (ICIP '06), September-October 2007, San Antonio, Tex, USA 5: 185-188.
  16. Black J, Ellis TJ, Rosin P: A novel method for video tracking performance evaluation. Proceedings of the IEEE Workshop on Performance Analysis of Video Surveillance and Tracking (PETS '03), October 2003 125-132.
  17. Verdant A, Villard P, Dupret A, Mathias H: SystemC validation of a low power analog CMOS image sensor architecture. Proceedings of the IEEE North-East Workshop on Circuits and Systems (NEWCAS '07), August 2007 903-906.
  18. Lacassagne L, Milgram M, Garda P: Motion detection, labeling, data association and tracking, in real-time on RISC computer. Proceedings of International Conference on Image Analysis and Processing (ICIP '99), 1999, Venice, Italy 520-525.
  19. Denoulet J, Mostafaoui G, Lacassagne L, Mérigot A: Implementing motion Markov detection on general purpose processor and associative mesh. Proceedings of the 7th International Workshop on Computer Architecture for Machine Perception (CAMP '05), July 2005, Palermo, Italy 288-293.

Copyright

© Arnaud Verdant et al. 2011

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.