- Research Article
- Open Access
Human Motion Analysis via Statistical Motion Processing and Sequential Change Detection
© Alexia Briassouli et al. 2009
- Received: 31 January 2009
- Accepted: 15 July 2009
- Published: 13 October 2009
The widespread use of digital multimedia in applications, such as security, surveillance, and the semantic web, has made the automated characterization of human activity necessary. In this work, a method for the characterization of multiple human activities based on statistical processing of the video data is presented. First the active pixels of the video are detected, resulting in a binary mask called the Activity Area. Sequential change detection is then applied to the data examined in order to detect the time instants at which the activity taking place changes. This leads to the separation of the video sequence into segments with different activities. The change times are examined for periodicity or repetitiveness in the human actions. The Activity Areas and their temporally weighted versions, the Activity History Areas, for the extracted subsequences are used for activity recognition. Experiments with a wide range of indoor and outdoor videos of various human motions, including challenging videos with dynamic backgrounds, demonstrate the proposed system's good performance.
- False Alarm
- Activity Area
- Change Point
- Activity Recognition
- Translational Motion
The area of human motion analysis is one of the most active research areas in computer vision, with applications in numerous fields such as surveillance, content-based retrieval, storage, and virtual reality. A wide range of methods has been developed over the years to deal with problems like human detection, tracking, recognition, the analysis of activity in video, and the characterization of human motions.
One large category of approaches for the analysis of human motions is structure-based, using cues from the human body for tracking and action recognition. The human body can be modeled in 2D or 3D, with or without explicit shape models. Model-based methods include the representation of humans as stick figures, cardboard models, volumetric models, as well as hybrid methods that track both edges and regions. Structure-based approaches that do not use explicit models detect features, objects, or silhouettes, which are then tracked and their motion is classified. Feature-based methods are sensitive to local noise and occlusions, and the number of features is not always sufficient for tracking or recognition. Statistical shape models such as Active Contours have also been examined for human motion analysis, but they are sensitive to occlusions and require good initialization.
Another large category of approaches extracts cues about the activity taking place from motion information [12]. One such approach examines the global shape of motion features, which are found to provide enough information for recognition [13]. The periodicity of human motions is used in [14] to derive templates for each action class, but at a high computational cost, as it is based on the correlation of successive video frames. In [15], actions are modeled by temporal templates, that is, binary and grayscale masks that characterize the area of activity. Motion Energy Images (MEIs) are binary masks indicating which pixels are active throughout the video, while Motion History Images (MHIs) are grayscale, as they incorporate history information, that is, which pixels moved most recently. This approach is computationally efficient, but cannot deal with repetitive actions, as their signatures overwrite each other in the MHI. In [16], spatiotemporal information from the video is used to create "space-time shapes" which characterize human activities in space and time. However, these spatiotemporal characteristics are specific to human actions, limiting the method to this domain only. Additionally, the translational component of motions cannot be dealt with in [16].
Both structure and motion information can be taken into account for human action analysis using Hidden Markov Models (HMMs), which model the temporal evolution of events [17, 18]. However, the HMM approach requires significant training to perform well [19] and, like all model-based methods, its performance depends on how well the chosen model parameters represent the human action.
In this work, a novel, motion-based nonparametric approach to the problem of human motion analysis is presented. Since it is not model-based, it does not suffer from sensitivity to the correct choice of model, nor is it constrained by it. Additionally, it is based on generally applicable statistical techniques, so it can be extended to a wide range of videos, in various domains. Finally, it does not require extensive training for recognition, so it is not computationally intensive, nor dependent on the training data available.
1.1. Proposed Framework
The second stage of the system is one of the main novel points of this framework, as it leads to the detection of changes in activity in a non ad-hoc manner. In the current literature, temporal changes in video are only found in the context of shot detection, where the video is separated into subsequences that have been filmed in different manners. However, this separation is not always useful, as a shot may contain several activities. The proposed approach separates the video in a meaningful manner, into subsequences corresponding to different activities, by applying sequential change detection methods. The input, that is, interframe illumination variations, is processed sequentially as it arrives, to decide if a change has occurred at each frame. Thus, changes in activity can be detected in real time, and the video sequence can then be separated into segments that contain different actions. The times of change are further examined to see if periodicity or repetitiveness is present in the actions.
After the change detection step, the data in each subsequence between the detected change points is processed for more detailed analysis of the activity in it. Activity Areas and a temporally weighted version of them called the Activity History Areas are extracted for the resulting subsequences. The shape of the Activity Areas is used for recognition of the activities taking place: the outline of each Activity Area is described by the Fourier Shape Descriptors (see Section 5), which are compared to each other using the Euclidean distance, for recognition. When different activities have a similar Activity Area (e.g., a person walking and running), the Activity History Areas (AHAs) are used to discriminate between them, as they contain information about the temporal evolution of these actions. This is achieved by estimating the Mahalanobis distance between appropriate features of the AHAs, like their slope and magnitude (see Section 5 for details). It is important to note that Activity History Areas would have the same limitations as MHIs [15] if they were applied on the entire video sequence: the repetitions of an activity would overwrite the previous activity history information, so the Activity History Area would not provide any new information. This issue is overcome in the proposed system, as the video is already divided into segments containing different activities, so that Activity History Areas are extracted for each repeating component of the motion separately, and no overwriting takes place.
In the proposed system, the interframe illumination variations are initially processed statistically in order to find the Activity Area, a binary mask similar to the MEIs of [15], which can be used for activity recognition. Unlike the MEI, the Activity Areas are extracted via higher-order statistical processing, which makes them more robust to additive noise and small background motions. Interframe illumination variations, resulting from frame differences or optical flow estimates (both referred to as "illumination variations" in the sequel), can be mapped to the following two hypotheses:
\[
\begin{aligned}
H_0 &: v_k^0(\bar{r}) = z_k(\bar{r}), \\
H_1 &: v_k^1(\bar{r}) = u_k(\bar{r}) + z_k(\bar{r}),
\end{aligned}
\tag{1}
\]
where $v_k^i(\bar{r})$, $i \in \{0, 1\}$, are the illumination variations for a static/active pixel, respectively, at frame $k$ and pixel $\bar{r}$. The term $z_k(\bar{r})$ corresponds to measurement noise and $u_k(\bar{r})$ is caused by pixel motion. The background is considered to be static, so only the pixels of moving objects correspond to $H_1$. The distribution of the measurement noise $z_k(\bar{r})$ is unknown; however, it can be sufficiently well modeled by a Gaussian distribution, as in [20, 21]. In the literature, the background is often modeled by mixtures of Gaussian distributions [22], but this modeling is computationally costly and not reliable in the presence of significant background changes (e.g., a change in lighting), as it does not always adapt to them quickly enough. The method used here is actually robust to deviations of the data from the simple Gaussian model [23, 24], so even in such cases, it provides accurate, reliable results at a much lower computational cost.
The illumination variations of static pixels are caused by measurement noise, so their values over time should follow a Gaussian distribution. A classical test of data Gaussianity is the kurtosis, which is equal to zero for Gaussian data, and defined, for zero-mean data $y$, as
\[
\mathrm{kurt}(y) = E[y^4] - 3\left(E[y^2]\right)^2. \tag{2}
\]
In order to find the active pixels, that is, Activity Areas, the illumination variations at each pixel are accumulated over the entire video and their kurtosis is estimated from (2). Even if in practice the static pixels do not follow a strictly Gaussian distribution, their kurtosis is still significantly lower (by orders of magnitude) than that of active pixels. This is clearly obvious in the experimental results, where the regions of activity are indeed correctly localized, as well as in the simulations that follow.
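The kurtosis-based extraction of the Activity Area can be illustrated with a minimal sketch. It assumes that frame differences serve as the illumination variations, that the temporal mean is removed so that the zero-mean kurtosis applies, and that a fixed threshold binarizes the result; the function name `activity_area`, the threshold value, and the synthetic video are illustrative choices, not taken from the original system.

```python
import numpy as np

def activity_area(frames, threshold):
    """Return (binary mask, kurtosis map) for a (T, H, W) grayscale video."""
    # Illumination variations approximated by frame differences (assumption).
    diffs = np.diff(np.asarray(frames, dtype=np.float64), axis=0)
    # Remove each pixel's temporal mean so the zero-mean formula applies.
    diffs = diffs - diffs.mean(axis=0, keepdims=True)
    m2 = np.mean(diffs ** 2, axis=0)
    m4 = np.mean(diffs ** 4, axis=0)
    kurt = m4 - 3.0 * m2 ** 2          # close to zero for Gaussian data
    return kurt > threshold, kurt

# Synthetic example: Gaussian noise plus a bright 4x4 block moving to the right.
rng = np.random.default_rng(0)
frames = rng.normal(0.0, 1.0, size=(200, 16, 16))
for t in range(200):
    c = (t // 10) % 12                 # block column changes every 10 frames
    frames[t, 6:10, c:c + 4] += 100.0

mask, kurt = activity_area(frames, threshold=1000.0)
```

In this toy example the kurtosis of the pixels crossed by the block is several orders of magnitude larger than that of the noise-only pixels, so the exact threshold is not critical.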
As mentioned in Section 1.1, the Activity Area is not always sufficient for recognizing activities, as some actions can lead to Activity Areas with very similar shapes. For example, different translational motions like jogging, running, and walking have similar Activity Areas, although they evolve differently in time. Thus, information about their temporal evolution should be used to discriminate amongst them. The temporal evolution of activities is captured by the Activity History Area (AHA), which is similar to the Motion History Image of [15], but extracted using the kurtosis, as in Section 2, rather than straightforward frame differencing. If the Activity Area value (binarized kurtosis value) on pixel $\bar{r}$ is $AA(\bar{r}, t)$ at frame $t$, the AHA is defined as
\[
AHA(\bar{r}, t) =
\begin{cases}
\tau, & \text{if } AA(\bar{r}, t) = 1, \\
\max\left(0,\; AHA(\bar{r}, t-1) - 1\right), & \text{otherwise},
\end{cases}
\tag{3}
\]
where $\tau$ is the maximum history value, as in the MHI.
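A temporally weighted mask of this kind can be sketched as follows. The update rule is an assumption modeled on the Motion History Image (a pixel is set to a maximum value when active and decays by one per frame otherwise), and the per-frame activity masks and the choice of the maximum value as the number of frames are hypothetical inputs for illustration.

```python
import numpy as np

def activity_history_area(active_masks):
    """active_masks: (T, H, W) boolean array, True where a pixel is active at frame t."""
    active_masks = np.asarray(active_masks, dtype=bool)
    num_frames = active_masks.shape[0]
    aha = np.zeros(active_masks.shape[1:], dtype=np.float64)
    for t in range(num_frames):
        decayed = np.maximum(aha - 1.0, 0.0)          # older activity fades
        aha = np.where(active_masks[t], float(num_frames), decayed)
    return aha

# A single active pixel moving one column to the right per frame.
masks = np.zeros((5, 1, 5), dtype=bool)
for t in range(5):
    masks[t, 0, t] = True
aha = activity_history_area(masks)   # values increase toward the most recent position
```

Pixels that were active most recently carry the highest values, so the result encodes the direction and speed of the motion within the subsequence.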
One of the main novel points of the proposed system is the detection of the times at which the activity taking place changes. The input data for the change detection is a sequence of illumination variations from frame $k_0$ to $k$, that is, $V_{k_0,k} = [v_{k_0}, v_{k_0+1}, \ldots, v_k]$. If only the pixels inside the Activity Area are being examined, the data $v_k$ from each frame contains the illumination variations of that frame's pixels, for the pixels inside the Activity Area. Thus, if the Activity Area contains $N$ pixels, we have $v_k = [v_k(\bar{r}_1), \ldots, v_k(\bar{r}_N)]$. In this work we examine the case where only the pixels inside the Activity Area are processed. It is considered that the data follows a distribution $f_0$ before a change occurs, and $f_1$ after the change, at an unknown time instant $k_{ch}$. This is expressed by the following two hypotheses:
\[
\begin{aligned}
H_0 &: v_k \sim f_0, \\
H_1 &: v_k \sim f_1.
\end{aligned}
\tag{4}
\]
At each frame $k$, the data $v_k$ is an input into a test statistic to determine whether or not a change has occurred until then, as detailed in Section 4.1. If a change is detected, only the data after frame $k$ is processed to detect new changes, and this is repeated until the entire video has been examined.
4.1. Cumulative Sum (CUSUM) for Change Detection
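The sequential change detection algorithm [26] uses the log-likelihood ratio (LLRT) of the input data as a test statistic. For the detection of a change between frames $k_0$ and $k$, we estimate
\[
T_k = \ln \frac{f_1(V_{k_0,k})}{f_0(V_{k_0,k})} = \sum_{i=k_0}^{k} \sum_{n=1}^{N} \ln \frac{f_1(v_i(\bar{r}_n))}{f_0(v_i(\bar{r}_n))}, \tag{5}
\]
where it has been assumed that the frame samples are independent and identically distributed (i.i.d.) under each hypothesis, so that $f_H(V_{k_0,k}) = \prod_{i=k_0}^{k} f_H(v_i)$, $H \in \{0, 1\}$. Similarly, it is assumed that the illumination variations of the pixels inside the Activity Area are i.i.d., so $f_H(v_i) = \prod_{n=1}^{N} f_H(v_i(\bar{r}_n))$.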
Pixels in highly textured areas can be considered to have i.i.d. values of illumination variations, as they correspond to areas of the moving object with a different appearance, which may be subject to local sources of noise, shadow, or occlusion. In homogeneous image regions that move in the same manner this assumption does not necessarily hold, however, even these pixels can be subject to local sources of noise, which remove correlations between them. The approximation of the data distribution for data that is not considered i.i.d. is very cumbersome, making this assumption necessary for practical purposes as well. Such assumptions are often made in the change detection literature to ensure tractability of the likelihood test.
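Under the i.i.d. assumption, the test statistic of (5) obtains the recursive form:
\[
T_k = \max\left(0,\; T_{k-1} + \ln \frac{f_1(v_k)}{f_0(v_k)}\right) \tag{6}
\]
\[
\phantom{T_k} = \max\left(0,\; T_{k-1} + \sum_{n=1}^{N} \ln \frac{f_1(v_k(\bar{r}_n))}{f_0(v_k(\bar{r}_n))}\right). \tag{7}
\]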
A change is detected at this frame when the test statistic becomes higher than a predefined threshold. Unlike the threshold for sequential probability ratio testing [27, 28], the threshold for the CUSUM testing procedure cannot be determined in a closed form manner. It has been proven in [29] that the optimal threshold for the CUSUM test for a predefined false alarm rate $\gamma$ is the threshold that leads to an average number of changes equal to $\gamma$ under $H_0$, that is, when there are no real changes. In the general case examined here, the optimal threshold needs to be estimated empirically from the data being analyzed. In Section 6 we provide more details about how we determine the threshold experimentally.
In practice, illumination variations of only one pixel over time do not provide enough samples to detect changes effectively, so the illumination variations of all active pixels in each frame are used. If an Activity Area contains $N$ pixels, this gives $N(k - k_0 + 1)$ samples from frame $k_0$ to $k$, which leads to improved approximations of the data distributions, as well as better change detection performance.
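The recursive CUSUM test described in this section can be sketched in a few lines. This is a generic illustration rather than the exact procedure of the proposed system: it assumes that the per-frame log-likelihood ratio (summed over the pixels of the Activity Area) has already been computed, and the function name and the threshold value are hypothetical.

```python
def cusum_first_change(llr_per_frame, threshold):
    """Return the index of the first frame where the CUSUM statistic
    exceeds the threshold, or None if no change is detected."""
    statistic = 0.0
    for k, llr in enumerate(llr_per_frame):
        # Recursive update: accumulate evidence for a change, reset at zero.
        statistic = max(0.0, statistic + llr)
        if statistic > threshold:
            return k
    return None

# Toy input: the ratio favors "no change" for 20 frames, then "change".
llr_values = [-1.0] * 20 + [2.0] * 10
change_frame = cusum_first_change(llr_values, threshold=5.0)
```

After a detection, the same function would be applied again to the frames following `change_frame`, mirroring the restart described at the beginning of this section.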
4.2. Data Modeling
As (6) shows, in order to implement the CUSUM test, knowledge about the family of distributions before and after the change is needed, even if the time of change itself is not known. For the case where only the pixels in the Activity Area are being examined, it is known that they are active, and hence do not follow a Gaussian distribution (see Section 2). The distribution of active pixels over time contains outliers introduced by a pixel's change in motion, which lead to a more heavy-tailed distribution than the Gaussian, such as the Laplacian or generalized Gaussian. The Laplacian distribution is given by
\[
f(x) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right), \tag{8}
\]
where $\mu$ is the data mean and $b = \sigma / \sqrt{2}$ is its scale, for variance $\sigma^2$. The tails of this distribution decay more slowly than those of the Gaussian, since its exponent contains an absolute difference instead of the difference squared. Its tails are consequently heavier, indicating that data following the Laplace distribution contains more outlier values than Gaussian data. The test statistic of (7) for $N$ data samples can then be written as
\[
T_k = \max\left(0,\; T_{k-1} + \sum_{n=1}^{N} \left[ \ln \frac{b_0}{b_1} - \frac{|v_k(\bar{r}_n) - \mu_1|}{b_1} + \frac{|v_k(\bar{r}_n) - \mu_0|}{b_0} \right] \right), \tag{9}
\]
where $(\mu_0, b_0)$ and $(\mu_1, b_1)$ are the Laplacian parameters of $f_0$ and $f_1$, respectively.
Gaussian and Laplace modeling errors.
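Under the Laplacian model, the log-likelihood ratio contributed by the samples of one frame has a simple closed form, sketched below. The parameters are fitted here by moment matching (data mean, and scale derived from the variance, which equals twice the squared scale for a Laplacian); this fitting choice and the function names are assumptions made for illustration.

```python
import numpy as np

def fit_laplace(samples):
    """Moment-matching fit: mean, and scale b with variance = 2 * b**2."""
    samples = np.asarray(samples, dtype=np.float64)
    mu = float(np.mean(samples))
    b = float(np.sqrt(np.var(samples) / 2.0))
    return mu, b

def laplace_llr(frame_samples, mu0, b0, mu1, b1):
    """Sum over samples of ln f1(x) - ln f0(x) for Laplacian f0, f1."""
    x = np.asarray(frame_samples, dtype=np.float64)
    per_sample = np.log(b0 / b1) - np.abs(x - mu1) / b1 + np.abs(x - mu0) / b0
    return float(np.sum(per_sample))

# Parameters before (f0) and after (f1) a hypothetical change in the mean.
mu0, b0 = fit_laplace([-1.0, 1.0])
llr = laplace_llr([3.0], mu0=0.0, b0=1.0, mu1=3.0, b1=1.0)
```

A positive value indicates that the frame's samples are better explained by the distribution after the change; these per-frame values are what the recursive test statistic accumulates.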
- Activity Areas are extracted to find the active pixels.
- The illumination variations of the pixels inside the Activity Area over time are estimated.
- Sequential change detection is applied to the illumination variations, to detect changes.
- If the change points are (nearly) equidistant, the motion is considered to be (near) periodic.
- The Activity Areas and Activity History Areas for the frames (subsequences) between change points are extracted. The shape of the Activity Areas and the direction and magnitude of motion are derived from the Activity History Area, to be used for recognition.
- False alarms are removed: if motion characteristics of successive subsequences are similar, those subsequences are merged and the change point between them is deleted.
- Multiple Activity Areas and Activity History Areas originating from the same activity are detected and merged if their motion and periodicity characteristics coincide.
- Shape descriptors of the resulting Activity Areas and motion information from the Activity History Areas are used for recognition.
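The periodicity check on the change points can be sketched as a test of whether the intervals between successive change points are nearly equal. The relative tolerance used below is a hypothetical parameter, not a value given in the text; the two example inputs are the change points listed for two of the walking videos in Section 6.1.

```python
import numpy as np

def is_near_periodic(change_points, tolerance=0.2):
    """True if successive change points are (nearly) equidistant."""
    gaps = np.diff(np.asarray(change_points, dtype=np.float64))
    if gaps.size < 2:
        return False                      # not enough intervals to compare
    mean_gap = gaps.mean()
    return bool(np.all(np.abs(gaps - mean_gap) <= tolerance * mean_gap))

periodic = is_near_periodic([35, 69, 104])     # intervals 34 and 35
not_periodic = is_near_periodic([18, 30, 89])  # intervals 12 and 59
```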
The detection of different activities between change points increases the usefulness and accuracy of the system for many reasons. The proposed system avoids the drawback of "overwriting" that characterizes MHIs that are extracted using the entire sequence. In periodic motions, for example, where an activity takes place from left to right, then from right to left, and so on, all intermediate changes of direction are lost in the temporal history image if all video frames are used. This is overcome in our approach, as Activity History Areas are estimated over segments with one kind of activity, giving a clear indication of the activity's direction and temporal evolution. This also allows the extraction of details about the activity taking place, such as the direction of translational motions, periodicity of motions like boxing, or of more complex periodic motions, containing similarly repeating components (see Section 6.2). Finally, the application of recognition techniques to the extracted sequences would not be meaningful if the sequence had not been correctly separated into subsequences with one activity each.
Both the shape of the Activity Area and motion information from the Activity History Area are used for accurate activity recognition, as detailed in the sections that follow.
5.1. Fourier Shape Descriptors of Activity Area
The shape of the Activity Areas can be described by estimating the Fourier Descriptors (FDs) of their outlines. The FDs are preferred as they provide better classification results than other shape descriptors. Additionally, they are rotation, translation, and scale invariant, and inherently capture some perceptual shape characteristics: their lower frequencies correspond to the average shape, while higher frequencies describe shape details. The FDs are derived from the Fourier Transform (FT) $F_0, F_1, \ldots, F_{N-1}$ of each shape outline's boundary coordinates. The DC component $F_0$ is not used, as it only indicates the shape position. All values are divided by the magnitude of $F_1$ to achieve scale invariance, and rotation invariance is guaranteed by using their magnitude. Thus, the FDs are given by
\[
f = \left[ \frac{|F_2|}{|F_1|}, \frac{|F_3|}{|F_1|}, \ldots, \frac{|F_{N-1}|}{|F_1|} \right].
\]
Only the first terms of the FD, corresponding to the lowest frequencies, are used in the recognition experiments, as they capture the most important shape information. The comparison of the FDs for different activities takes place by estimating their Euclidean distance, since they are scale, translation, and rotation invariant. When $M$ elements of the FDs are retained, the Euclidean distance between two FDs $f$, $g$ is given by
\[
d(f, g) = \sqrt{\sum_{i=1}^{M} \left(f_i - g_i\right)^2},
\]
and each activity is matched to that with the shortest Euclidean distance.
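The descriptor computation and matching can be sketched with NumPy's FFT. The sketch assumes that the boundary is given as an ordered, counter-clockwise list of (x, y) points treated as complex numbers, that the DC coefficient has index 0, and that normalization uses the first non-DC coefficient; the function names and the synthetic outlines are illustrative only.

```python
import numpy as np

def fourier_descriptors(boundary_xy, num_terms):
    """Low-frequency Fourier descriptors of an ordered closed boundary."""
    pts = np.asarray(boundary_xy, dtype=np.float64)
    z = pts[:, 0] + 1j * pts[:, 1]          # boundary as complex coordinates
    magnitudes = np.abs(np.fft.fft(z))[1:]  # drop DC (position), keep magnitudes
    normalized = magnitudes / magnitudes[0] # divide by |F_1| for scale invariance
    return normalized[1:num_terms + 1]

def fd_distance(fd_a, fd_b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return float(np.linalg.norm(np.asarray(fd_a) - np.asarray(fd_b)))

def outline(lobes, scale=1.0, angle=0.0, shift=0j, n=128):
    """Synthetic closed outline with a given number of lobes (illustrative)."""
    t = 2.0 * np.pi * np.arange(n) / n
    z = (1.0 + 0.3 * np.cos(lobes * t)) * np.exp(1j * t)
    z = scale * np.exp(1j * angle) * z + shift
    return np.column_stack([z.real, z.imag])

fd_ref = fourier_descriptors(outline(3), num_terms=5)
fd_moved = fourier_descriptors(outline(3, scale=2.5, angle=0.7, shift=10 + 5j), num_terms=5)
fd_other = fourier_descriptors(outline(5), num_terms=5)
```

The scaled, rotated, and translated copy of the three-lobed outline yields essentially the same descriptors as the reference, whereas the five-lobed outline lies at a clearly larger distance.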
5.2. Activity History Area for Motion Magnitude and Direction Detection
- The sign of the slope shows the direction of motion: it is negative for a person moving to the left and positive for motion to the right.
- The magnitude of the slope is inversely proportional to the velocity, that is, higher magnitudes correspond to slower activities.
The values of the Activity History Area are higher in pixels that were active recently; here the pixel locations correspond to the horizontal axis, and the slope is estimated by
\[
s = \frac{t_2 - t_1}{x_2 - x_1}, \tag{13}
\]
where $t_1$ and $t_2$ are the Activity History Area values (frame indices) at the horizontal pixel positions $x_1$ and $x_2$.
The Activity History Area of a fast activity (e.g., running) contains a small range of frames (from $t_1$ to $t_2$), since it takes place in a short time, whereas the Activity History Area of a slow activity occurs during more frames, since the motion lasts longer. In order to objectively discriminate between fast and slow actions, the same number of pixels must be traversed in each direction. Thus, in (13), $x_2 - x_1$ is the same for all activities, and $t_2 - t_1$ has high values for slow actions and low values for fast ones. Consequently, higher magnitudes of the slope of (13) correspond to slower motions and lower magnitudes correspond to faster ones.
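A slope of this kind can be estimated from an Activity History Area as in the following sketch. It assumes that the history values are reduced to one value per column (the most recent activity in that column) and that the slope is taken between the leftmost and rightmost active columns; both choices, and the function name, are assumptions for illustration.

```python
import numpy as np

def aha_slope(aha):
    """Slope of history value versus horizontal position, from the
    leftmost to the rightmost active column of a 2D history array."""
    profile = np.asarray(aha, dtype=np.float64).max(axis=0)  # one value per column
    cols = np.flatnonzero(profile > 0)
    if cols.size < 2:
        raise ValueError("need at least two active columns")
    x1, x2 = cols[0], cols[-1]
    return float((profile[x2] - profile[x1]) / (x2 - x1))

slope_right = aha_slope([[1, 2, 3, 4, 5]])   # recent activity on the right
slope_left = aha_slope([[5, 4, 3, 2, 1]])    # recent activity on the left
slope_slow = aha_slope([[1, 3, 5, 7, 9]])    # two frames per column
```

The sign separates rightward from leftward motion, and the slower motion, which needs more frames to traverse the same columns, gives the larger magnitude.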
For the classification of a test video, its Activity History Area is extracted, and its mean is estimated. The sign of its slope indicates whether the person is moving to the right or left, and its magnitude is compared to the average slope of the three baseline categories of Table 2 using the Mahalanobis distance. For a baseline set with mean $\mu$ and covariance matrix $\Sigma$, the Mahalanobis distance of data $y$ from it is defined as $d_M(y) = \sqrt{(y - \mu)^T \Sigma^{-1} (y - \mu)}$. The Mahalanobis distance is used as a distance metric as it incorporates data covariance, which is not taken into account by the Euclidean distance. In this case the data is one dimensional (the slope), so its variance is used instead of the covariance matrix.
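In the one-dimensional case the Mahalanobis distance reduces to the absolute deviation from the baseline mean divided by the baseline standard deviation, as the following sketch shows. The baseline slope values are invented purely for illustration and are not the values of Table 2; the function names are likewise hypothetical.

```python
import numpy as np

def mahalanobis_1d(value, baseline):
    """One-dimensional Mahalanobis distance of a value from a baseline set."""
    baseline = np.asarray(baseline, dtype=np.float64)
    mu = baseline.mean()
    variance = baseline.var(ddof=1)        # sample variance replaces the covariance
    return float(abs(value - mu) / np.sqrt(variance))

def classify_speed(slope_magnitude, baselines):
    """Return the closest baseline category and all distances."""
    distances = {name: mahalanobis_1d(slope_magnitude, values)
                 for name, values in baselines.items()}
    return min(distances, key=distances.get), distances

# Hypothetical baseline slope magnitudes (illustrative numbers only).
baselines = {
    "run": [0.9, 1.0, 1.1],
    "jog": [1.9, 2.0, 2.1],
    "walk": [3.9, 4.0, 4.1],
}
label, distances = classify_speed(2.05, baselines)
```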
Experiments with real videos take place to examine the performance of the change detection module. These videos can be found on http://mklab.iti.gr/content/temporal-templates-human-activity-recognition, so that the reader can observe the ground truth and verify the validity of the experiments. The ground truth for the times of change is extracted manually and compared to the estimated change points to evaluate the detection performance.
We model the data by a Laplacian distribution (Section 4.2) to approximate and of (5), which are unknown and need to be estimated from the data at each time . The distribution of the "current" data is extracted from the first samples of , in order to take into account samples that belong to the old distribution, while is approximated using the most recent samples. There could be a change during the first samples used to approximate , but there is no way to determine this a priori, so there is the implicit assumption that no change takes place in the first frames. Currently, there is no theoretically founded way to determine the optimal length of the windows and , as stated in the change detection literature . Consequently, the best possible solution is to empirically determine the window lengths that give the best change detection results for certain categories of videos, and use them accordingly. After extensive experimentation, and are found to give the best detection results with the fewest false alarms, for detecting a change between successive activities. For periodic motions, the changes occur more often, so smaller windows are used, namely .
At each frame , the test statistic is estimated and compared against a threshold in order to determine whether or not a change has occurred. Due to the sequential nature of the system, there is no closed form expression for this threshold, so an optimal value cannot be determined for it a priori . It is found empirically that for videos of human motions like the ones examined here, the threshold which leads to the highest detection rate with the fewest false alarms is given by
6.1. Experiments with Translational Motions
Change points for videos with translational motions.
Jog LR 1
Jog LR 2
Run LR 1
Run LR 2
Walk LR 1
Walk LR 2
Walk LR 3
18, 30, 89
Walk LR 4
Walk LR 5
Walk LR 6
Walk LR 7
Walk LR 8
Walk LR 9
35, 69, 104
Figures 9(e)–9(i) contain frames from a walking sequence, where the pixels around the person's neck are mistaken for static pixels, leading to two Activity Areas, one corresponding to the head and one to the body, shown in Figures 9(f), 9(g). When there is more than one Activity Area, the sequential testing is applied to each Activity Area separately, since there could be more than one activity taking place. In this example, the area corresponding to the head is too small to provide enough samples for a reliable estimate of the change point, so only the likelihood ratio values for the Activity Area corresponding to the body of the person with the coat are shown in Figures 9(h), 9(i). Even in this case, the change points are correctly found.
6.2. Experiments with Nontranslational Motions
6.2.1. Periodic Motions
The values of the data windows , chosen for approximating respectively, affect the resolution of the system. When have higher values, they detect changes at a coarse granularity, but at the cost of missing small changes inside each individual activity. In this section, we present experiments where these windows are set to , enabling the detection of changes in repeating activities with good accuracy.
6.3. Experiments with Multiple Activity Areas
6.4. Experiments with Dynamic Backgrounds
Change points for dynamic backgrounds.
23, 39, 61, 72, 80
15, 62, 77, 87, 110, 130, 153
10, 17, 23, 37, 45, 56
13, 40, 58, 68, 81, 110, 121, 133, 146
Experimental results for recognition based on the Activity Area and Activity History Area information are presented here. It should be emphasized that the activity recognition results are good although there is no training stage, so the proposed method is applicable to various kinds of activity, without restrictions imposed by the training set.
7.1. Recognition Using Fourier Shape Descriptors of Activity Area
Recognition for boxing, handclapping, handwaving (%).
Different methods have also used this dataset for activity recognition. In , excellent recognition results of for boxing, for clapping, and for waving are achieved. However, that method is based on extracting motion templates (motion images and motion context) using very simple processing, which would fail for more challenging sequences, like those in Section 6.4: the standard deviation of the illumination over successive video frames is estimated to find active pixels, a measure which can easily lead to false alarms in the presence of noise. In , Support Vector Machines (SVMs) are used, so training is required in their method. They achieve recognition of for boxing, but for clapping and for waving, that is, worse than our results. Finally, in  volumetric features are used, leading to a higher computational cost, but achieving recognition results of only for boxing, for clapping and for waving (which is comparable to our result). Overall our approach has a consistently good performance, with recognition rates above , despite its simplicity, low computational cost, and the fact that it does not require any training or prior knowledge.
7.2. Recognition Using Activity History Area Features
Mahalanobis distance for running videos.
Mahalanobis distance for jogging videos.
In this work, a novel approach for the analysis of human motion in video is presented. The kurtosis of interframe illumination variations leads to binary masks, the Activity Areas, which indicate which pixels are active throughout the video. The temporal evolution of the activities is characterized by temporally weighted versions of the Activity Areas, the Activity History Areas. Changes in the activity taking place are detected via sequential change detection, applied on the interframe illumination variations. This separates the video into sequences containing different activities, based on changes in their motion. The activity taking place in each subsequence is then characterized by the shape of its Activity Area or by the magnitude and direction of its motion, derived from the Activity History Area. For nontranslational activities, Fourier Shape Descriptors represent the shape of each Activity Area, and are compared with each other for recognition. Translational motions are characterized based on their relative magnitude and direction, which are retrieved from their Activity History Areas. The combined use of the aforementioned recognition techniques with the proposed sequential change detection for the separation of the video into sequences containing separate activities leads to successful recognition results at a low computational cost. Future work includes the development of more sophisticated and complex recognition methods, so as to achieve even better recognition rates. The application of change detection on video is also to be extended to a wider range of videos, as it is a generally applicable method, not limited to the domain of human actions.
The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7/2007-2013 under grant agreement FP7-214306-JUMAS, from FP6 under contract no. 027685-MESH and FP6-027026-K-Space.
- Wang L, Hu W, Tan T: Recent developments in human motion analysis. Pattern Recognition 2003,36(3):585-601. 10.1016/S0031-3203(02)00100-0View ArticleGoogle Scholar
- Aggarwal JK, Cai Q: Human motion analysis: a review. Computer Vision and Image Understanding 1999,73(3):428-440. 10.1006/cviu.1998.0744View ArticleGoogle Scholar
- Gavrila DM: The visual analysis of human movement: a survey. Computer Vision and Image Understanding 1999,73(1):82-98. 10.1006/cviu.1998.0716View ArticleMATHGoogle Scholar
- Akita K: Image sequence analysis of real world human motion. Pattern Recognition 1984,17(1):73-83. 10.1016/0031-3203(84)90036-0View ArticleGoogle Scholar
- Haritaoglu I, Harwood D, Davis LS: W4: real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000,22(8):809-830. 10.1109/34.868683View ArticleGoogle Scholar
- Bottino A, Laurentini A: A silhouette based technique for the reconstruction of human movement. Computer Vision and Image Understanding 2001, 83: 79-95. 10.1006/cviu.2001.0918View ArticleMATHGoogle Scholar
- Green RD, Guan L: Quantifying and recognizing human movement patterns from monocular video imagespart I: a new framework for modeling human motion. IEEE Transactions on Circuits and Systems for Video Technology 2004,14(2):179-189. 10.1109/TCSVT.2003.821976View ArticleGoogle Scholar
- Laptev I, Lindeberg T: Space-time interest points. Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV '03), October 2003, Nice, France 1: 432-439.View ArticleGoogle Scholar
- Oren M, Papageorgiou C, Sinha P, Osuna E, Poggio T: Pedestrian detection using wavelet templates. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '97), June 1997, San Juan, Puerto Rico, USA 193-199.View ArticleGoogle Scholar
- Singh M, Basu A, Mandal M: Human activity recognition based on silhouette directionality. IEEE Transactions on Circuits and Systems for Video Technology 2008,18(9):1280-1292.View ArticleGoogle Scholar
- Cootes T, Taylor C, Cooper D, Graham J: Active shape models-their training and application. Computer Vision and Image Understanding 1995,61(1):38-59. 10.1006/cviu.1995.1004View ArticleGoogle Scholar
- Cedras C, Shah M: Motion-based recognition a survey. Image and Vision Computing 1995,13(2):129-155. 10.1016/0262-8856(95)93154-KView ArticleGoogle Scholar
- Boyd J, Little J: Global versus structured interpretation of motion: moving light displays. Proceedings of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects (NAM '97), 1997 18-25.View ArticleGoogle Scholar
- Polana R, Nelson R: Detecting activities. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '93), June 1993, New York, NY, USA 2-7.View ArticleGoogle Scholar
- Bobick AF, Davis JW: The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001,23(3):257-267. 10.1109/34.910878View ArticleGoogle Scholar
- Gorelick L, Blank M, Shechtman E, Irani M, Basri R: Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 2007,29(12):2247-2253.View ArticleGoogle Scholar
- Yamato J, Obya J, Ishii K: Recognizing human action in time sequential images using hidden markov model. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR '92), 1992, The Hague, The Netherlands 379-385.Google Scholar
- Kale A, Sundaresan A, Rajagopalan AN, et al.: Identification of humans using gait. IEEE Transactions on Image Processing 2004,13(9):1163-1173. 10.1109/TIP.2004.832865View ArticleGoogle Scholar
- Sun X, Chen CW, Manjunath BS: Probabilistic motion parameter models for human activity recognition. Proceedings of the International Conference on Pattern Recognition (ICPR '02), August 2002, Quebec, Canada 16(1):443-446.Google Scholar
- Wren CR, Azarbayejani A, Darrell T, Pentland AP: P finder: real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence 1997,19(7):780-785. 10.1109/34.598236View ArticleGoogle Scholar
- Aach T, Dümbgen L, Mester R, Toth D: Bayesian illumination-invariant motion detection. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, Thessaloniki, Greece 3: 640-643.
- Stauffer C, Grimson W: Adaptive background mixture models for real-time tracking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), June 1999, Fort Collins, Colo, USA 2: 246-252.
- El Hassouni M, Cherifi H, Aboutajdine D: HOS-based image sequence noise removal. IEEE Transactions on Image Processing 2006, 15(3):572-581.
- Giannakis GB, Tsatsanis MK: Time-domain tests for Gaussianity and time-reversibility. IEEE Transactions on Signal Processing 1994, 42(12):3460-3472. 10.1109/78.340780
- Stauffer C, Grimson W: Adaptive background mixture models for real-time tracking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), June 1999, Fort Collins, Colo, USA 2: 246-252.
- Page ES: Continuous inspection scheme. Biometrika 1954, 41(1):100-115.
- Poor HV: An Introduction to Signal Detection and Estimation. 2nd edition. Springer, New York, NY, USA; 1994.
- Wald A: Sequential Analysis. Dover Publications, New York, NY, USA; 2004.
- Moustakides GV: Optimal stopping times for detecting changes in distributions. Annals of Statistics 1986, 14(4):1379-1387. 10.1214/aos/1176350164
- Basseville M, Nikiforov I: Detection of Abrupt Changes: Theory and Application. Prentice-Hall, Englewood Cliffs, NJ, USA; 1993.
- Aiazzi B, Alparone L, Baronti S: Estimation based on entropy matching for generalized Gaussian PDF modeling. IEEE Signal Processing Letters 1999, 6(6):138-140. 10.1109/97.763145
- Nolan JP: Stable Distributions—Models for Heavy Tailed Data. Birkhäuser, Boston, Mass, USA; 2010.
- Briassouli A, Tsakalides P, Stouraitis A: Hidden messages in heavy-tails: DCT-domain watermark detection using alpha-stable models. IEEE Transactions on Multimedia 2005, 7: 700-715.
- Simitopoulos D, Tsaftaris SA, Boulgouris NV, Briassouli A, Strintzis MG: Fast watermarking of MPEG-1/2 streams using compressed-domain perceptual embedding and a generalized correlator detector. EURASIP Journal on Applied Signal Processing 2004, 8: 1088-1106.
- Bober M: MPEG-7 visual shape descriptors. IEEE Transactions on Circuits and Systems for Video Technology 2001, 11(6):716-719. 10.1109/76.927426
- Zhang DS, Lu G: A comparative study of Fourier descriptors for shape representation and retrieval. Proceedings of the 5th Asian Conference on Computer Vision (ACCV '02), January 2002, Melbourne, Australia 646-651.
- Hory C, Kokaram A, Christmas WJ: Threshold learning from samples drawn from the null hypothesis for the generalized likelihood ratio CUSUM test. Proceedings of the IEEE Workshop on Machine Learning for Signal Processing, September 2005 111-116.
- Nikiforov IV: A generalized change detection problem. IEEE Transactions on Information Theory 1995, 41(1):171-187. 10.1109/18.370109
- Zhang ZM, Hu YQ, Chan S, Chia LT: Motion context: a new representation for human action recognition. Proceedings of the European Conference on Computer Vision (ECCV '08), October 2008, Marseille, France, Lecture Notes in Computer Science 5305: 817-829.
- Schuldt C, Laptev I, Caputo B: Recognizing human actions: a local SVM approach. Proceedings of the 17th International Conference on Pattern Recognition, August 2004, Cambridge, UK.
- Ke Y, Sukthankar R, Hebert M: Efficient visual event detection using volumetric features. Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), October 2005, Beijing, China 1: 166-173.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.