Thermal spatio-temporal data for stress recognition


Stress is a serious concern facing our world today, motivating the development of a better objective understanding of it through non-intrusive means of stress recognition that place fewer restrictions on natural human behavior. As an initial step in computer vision-based stress detection, this paper proposes a temporal thermal spectrum (TS) and visible spectrum (VS) video database, ANUStressDB, a major contribution to stress research. The database contains videos of 35 subjects watching stressed and not-stressed film clips validated by the subjects. We present the experiment and the process conducted to acquire videos of subjects' faces while they watched the films for the ANUStressDB. Further, a baseline model for stress detection is presented, based on computing the local binary patterns on three orthogonal planes (LBP-TOP) descriptor on VS and TS videos. An LBP-TOP-inspired descriptor, histograms of dynamic thermal patterns (HDTP), was used to capture dynamic thermal patterns by exploiting spatio-temporal characteristics in TS videos. Support vector machines were used for our stress detection model. A genetic algorithm was used to select salient facial block divisions for stress classification and to determine whether certain regions of the face of subjects showed better stress patterns. Results showed that a fusion of facial patterns from VS and TS videos produced statistically significantly better stress recognition rates than patterns from VS or TS videos used in isolation. Moreover, the genetic algorithm selection method led to statistically significantly better stress detection rates than classifiers that used all the facial block divisions. In addition, the best stress recognition rate was obtained from HDTP features fused with LBP-TOP features for TS and VS videos using a hybrid of a genetic algorithm and a support vector machine stress detection model. The model produced an accuracy of 86%.

1 Introduction

Stress is a part of everyday life, and it has been widely accepted that stress, which leads to less favorable states (such as anxiety, fear, or anger), is a growing concern for a person's health and well-being, functioning, social interaction, and finances. The term stress was coined by Hans Selye, who defined it as ‘the non-specific response of the body to any demand for change’ [1]. Stress is a natural alarm, resistance, and exhaustion system [2] through which the body prepares for a fight or flight response, either to defend itself or to adjust to threats and changes. The body shows stress through symptoms such as frustration, anger, agitation, preoccupation, fear, anxiety, and tenseness [3]. When chronic and left untreated, stress can lead to incurable illnesses (e.g., cardiovascular diseases [4], diabetes [5], and cancer [6]), relationship deterioration [7, 8], and high economic costs, especially in developed countries [9, 10]. It is important to recognize stress early to diminish these risks. Stress research therefore benefits society in a range of ways, motivating interest and posing technical challenges in computer science in general and affective computing in particular.

Various computational techniques have been used to objectively recognize stress using models based on techniques such as Bayesian networks [11], decision trees [12], support vector machines [13], and artificial neural networks [14]. These techniques have used a range of physiological (e.g., heart activity [15, 16], brain activity [17, 18], galvanic skin response [19], and skin temperature [12, 20]) and physical (e.g., eye gaze [11], facial information [21]) measures for stress as inputs. Physiological signal acquisition requires sensors to be in contact with a person, and this can be obtrusive [3]. In addition, the physiological sensors usually have to be placed on specific locations of the body, and sensor calibration time is usually required as well, e.g., approximately 5 min is needed for the isotonic gel to settle before galvanic skin response readings can be taken satisfactorily using the BIOPAC System [22]. The trend in this area of research is towards measuring symptoms of stress through less intrusive or non-intrusive methods. This paper proposes a stress recognition method that uses facial imaging and, unlike the usual physiological sensors, does not require body contact with sensors.

A relatively new area of research is recognition of stress using facial data in the thermal (TS) and visible (VS) spectrums. Blood flow through superficial blood vessels, which are situated under the skin and above the bone and muscle layer of the human body, allows TS images to be captured. It has been reported in the literature that stress can be successfully detected from thermal imaging [23] due to changes in skin temperature under stress. In addition, facial expressions have been analyzed [24] and classified [25–27] using TS imaging. Commonly, VS imaging has been used for modeling facial expressions, and associated robust facial recognition techniques have been developed [28–30]. However, to our knowledge, the literature has not yet developed computational models for stress recognition that use TS and VS imaging together. This paper addresses the gap and presents a robust method that uses information from temporal and texture characteristics of facial regions for stress recognition.

Automatic facial expression analysis is a long researched problem. Techniques have been developed for analyzing the temporal dynamics of facial muscle movements. A detailed survey of facial expression recognition methods can be found in [31]. Further, vision-based facial dynamics have been used for affective computing tasks such as pain monitoring [32] and depression analysis [30]. This motivated us to explore vision-based stress analysis where inspiration can be taken from the vast field of facial expression analysis. Descriptors such as the local binary pattern (LBP) have been developed for texture analysis and have been successfully applied to facial expression analysis [25, 33, 34], depression analysis [30], and face recognition [35]. A particular LBP extension for analysis of temporal data - local binary patterns on three orthogonal planes (LBP-TOP) - has gained attention and is suitable for the work in this study. LBP-TOP provides features that incorporate appearance and motion, and is robust to illumination variations and image transformations [25]. This paper presents an application of LBP-TOP to TS and VS videos.

Various facial dynamics databases have been proposed in the literature. For facial expression analysis, one of the most popular databases is the Cohn-Kanade+ [32], which contains facial action coding system (FACS) and generic expression labels. Subjects were asked to pose and display various expressions. There are other databases in the literature which are spontaneous or close to spontaneous, such as RU-FACS [36], Belfast [37], VAM [38], and AFEW [39]. However, these are limited to emotion-related labels, which do not serve the problem addressed in this paper, i.e., stress classification. Lucey et al. [32] proposed the UNBC-McMaster database comprising video clips where patients were asked to move the arm up and their reaction was recorded. For creating ANUStressDB, subjects were shown stressful and non-stressful video clips. This database is similar to that in [32].

There are various forms of stressors, i.e., demands or stimuli that cause stress [23, 40–42] validated by self-reports (e.g., self-assessment [43, 44]) and observer reports (e.g., human behavior coder [42]). Some examples of stressors are playing video (action) games [45, 46], solving difficult mathematical/logical problems [47], and listening to energetic music [45]. Among these stressors are films, which were used to stimulate stress in this work. In this work, we develop a computed stress measure [3] using facial imaging in VS and TS. Our work analyzes dynamic facial expressions that are as natural as possible, elicited by a typical stressful, tense, or fearful environment from film clips. Unlike the previous work in the literature that uses posed facial expressions for classification [48], the work presented in this paper provides an investigation of spontaneous facial expressions as responses or reactions to environments portrayed by the films.

This paper describes a method for collecting and computationally analyzing data for stress recognition from TS and VS videos. A stress database (ANUStressDB) of videos of faces is presented. An experiment was conducted to collect the data where experiment participants watched stressful and non-stressful film clips. ANUStressDB contains videos of 35 subjects watching film clips that created stressed and not-stressed environments validated by the person. Facial expressions in the videos were stimulated by the film clips. Spatio-temporal features were extracted from the TS and VS videos, and these features were provided as inputs to a support vector machine (SVM) classifier to recognize stress patterns. A hybrid of a genetic algorithm (GA) and SVM was used to select salient divisions of facial block regions and determine whether using the block regions improved the stress recognition rate. The paper compares the quality of the stress classifications produced from using LBP-TOP and HDTP (our thermal spatio-temporal descriptor) features from TS and VS data with and without using facial block selection.

The organization of the paper is as follows: Section 2 presents the experiment for TS, VS, and self-reported data collection. Section 3 describes the facial imaging processing steps for the TS and VS data. The new thermal spatio-temporal descriptor, HDTP, is proposed in Section 4. Stress classification models are described in Section 5. Section 6 presents the results, an analysis of the results, and suggestions for future work.

2 Data collection from the film experiment

After receiving approval from the Australian National University Human Research Ethics Committee, an experiment was conducted to collect TS and VS videos of faces of individuals while they watched films. Thirty-five graduate students consisting of 22 males and 13 females between the ages of 23 and 39 years old volunteered to be experiment participants. Each participant had to understand the experiment requirements from written experiment instructions with the guidance of an experiment instructor before they filled in the consent form. The participant was provided information about the experiment and its purpose from a script to ensure that the experiment information was consistent across all participants. After providing consent, the participant was seated in front of an LCD display (placed between two speakers). The distance between the screen and subject was between 70 and 90 cm. The instructor started the films, which triggered a blank screen with a countdown of the numbers 3, 2, and 1 slowly transitioning in and out one after the other. The countdown display and the blank screen were intended to move participants away from their thoughts at the time and get them ready to pay attention to the films that were about to start. This approach was similar to that used in experiments for similar work in [49]. Subsequent to the countdown display, a blank screen was shown for 15 s, which was followed by a sequence of film clips with 5-s blank screens in between. After watching the films, the participant was asked to do a survey, which related to the films they watched and provided validation for the film labels. The experiment took approximately 45 min for each participant. An outline of the process of the experiment for an experiment participant is shown in Figure 1.

Figure 1 An outline of the process followed by each experiment participant in the film experiment.

Participants watched two types of films either labeled as stressed or not-stressed. Stressed films had stressful content (e.g., suspense with jumpy music), whereas not-stressed films created illusions of meditative environments (e.g., swans and ducks paddling in a lake) and had content that was not stressful or at least was relatively less stressful compared with films labeled as stressed. There were six film clips for each type of film. The survey done by experiment participants validated the film labels. The survey asked participants to rate the films they watched in terms of levels of stress portrayed by the film and the degree of tension and relaxation they felt. Participants found the films that were labeled stressed as stressful and films labeled not-stressed as not stressful with a statistical significance of p < 0.001 according to the Wilcoxon test.

While the participants watched the film clips, TS and VS videos of their faces were recorded. A schematic diagram of the experiment setup is shown in Figure 2. TS videos were captured using a FLIR infrared camera (model number SC620, FLIR Systems, Inc., Notting Hill, Australia), and VS videos were recorded using a Microsoft webcam (Microsoft Corporation, Redmond, WA, USA). Both videos were recorded with a sampling rate of 30 Hz, and the frame width and height were 640 and 480 pixels, respectively. Each participant had a TS and VS video for each film they watched. As a consequence, a participant had 12 video clips made up of six stressed videos and six not-stressed videos. We name the database that has the collected labeled video data and its protocols the ANU Stress database (ANUStressDB).

Figure 2 Setup for the film experiment to obtain facial video data in thermal and visible spectrums.

Note the usage of the terms film and video in this paper. We use the term film to refer to a video portraying entertaining content, colloquially called a ‘film’ or ‘movie’, which a participant watched during the experiment. We use the term video to refer to a visual recording of a participant's face and its movement during the time period while they watched a film. Thus in this paper, a film is something which is watched, while a video is something recorded about the watcher.

3 Face pre-processing pipeline

Facial regions in VS videos were detected using the Viola-Jones face detector. However, facial regions could not be recognized satisfactorily using the Viola-Jones algorithm in TS videos, so a face detection method based on eye coordinates [50, 51] and a template matching algorithm was used. A template of a facial region was developed from the first frame of a TS video. The facial region was extracted using the Pretty Helpful Development Functions toolbox for Face Recognition [50–52], which calculated the intraocular displacement to detect a facial region in an image. This facial region formed a template for the facial regions in each frame of the TS video, which were extracted using MATLAB's Template Matcher system [53]. The Template Matcher was set to search for the minimum pixel-by-pixel difference to find the area of the frame that best matched the template. Examples of facial regions that were detected in the VS and TS videos for a participant are presented in Figure 3.
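The minimum-difference search described above can be illustrated with a small sketch. This is not MATLAB's Template Matcher; it is a plain Python illustration that assumes the "difference" is the sum of absolute pixel differences, and the function name `match_template` and the toy data are hypothetical.

```python
def match_template(frame, template):
    """Return ((row, col), cost) for the best position of the template.

    The best position is the top-left corner with the minimum sum of
    absolute pixel differences (an assumed difference measure).
    """
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, float("inf")
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            cost = sum(
                abs(frame[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            if cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


# Toy 4x5 "thermal frame" with a warm 2x2 patch whose top-left corner is (1, 2).
frame = [
    [10, 10, 10, 10, 10],
    [10, 10, 30, 32, 10],
    [10, 10, 31, 33, 10],
    [10, 10, 10, 10, 10],
]
template = [[30, 32], [31, 33]]
position, cost = match_template(frame, template)
```

In practice the template would be the facial region taken from the first TS frame, and the search would be repeated for every later frame of the same video.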

Figure 3 Examples of facial regions extracted from the ANUStressDB database. The facial regions are of an experiment participant watching the different types of film clips. (a) The participant was watching a not-stressed film clip. (b) The participant was watching a stressed film clip. (i) A frame in the visual spectrum. (ii) The corresponding frame in the thermal spectrum. The crosshairs in the thermal frame were added by the recording software and represent the camera auto-focus.

Facial regions were extracted from each frame of a VS video and its corresponding TS video. Grouped and arranged in order of time of appearance in a video, the facial regions formed volumes of the facial region frames. Examples of facial volumes in TS and VS are shown in Figure 4.

Figure 4 Examples of facial volumes extracted from the ANUStressDB database. The facial volumes are of an experiment participant watching the different types of film clips. (a) The participant was watching a not-stressed film clip. (b) The participant was watching a stressed film clip. (i) A facial volume in the visual spectrum. (ii) The corresponding facial volume in the thermal spectrum.

4 Spatio-temporal features

There are claims in the literature that features from segmented image blocks of a facial image region can provide more information than features directly extracted from an image of a full facial region in VS [25]. Examples of full facial regions are shown in Figure 4, and blocks of a full facial region are presented in Figure 5. To illustrate the claim, features from each of the blocks used in conjunction with features from the other blocks in Figure 5 (i) can offer more information than features obtained from Figure 4a (i). The claim aligns with the results from classifying stress based on facial thermal characteristics [23]. As a consequence, each facial volume (video segment) in this work was segmented into a grid of 3 × 3 blocks. A block has X, Y, and T components, which represent the width, height, and time components of an image sequence, respectively. Each block represented a division of a full facial volume. LBP-TOP features were calculated for each block.
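As a minimal sketch of the segmentation step, the following Python function splits a facial volume into a 3 × 3 grid of blocks that each span the full time axis. The data layout (`volume[t][y][x]` as nested lists) and the function name `split_into_blocks` are assumptions made for illustration only.

```python
def split_into_blocks(volume, rows=3, cols=3):
    """Split a facial volume (T frames, each H x W) into rows x cols blocks.

    Every block keeps all frames, so it has X, Y, and T extent. Block edges
    are placed at integer fractions of the frame height and width.
    """
    height, width = len(volume[0]), len(volume[0][0])
    row_edges = [r * height // rows for r in range(rows)] + [height]
    col_edges = [c * width // cols for c in range(cols)] + [width]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = [
                [row[col_edges[c]:col_edges[c + 1]]
                 for row in frame[row_edges[r]:row_edges[r + 1]]]
                for frame in volume
            ]
            blocks.append(block)
    return blocks


# Toy volume: 2 frames of 6 x 9 pixels; each value encodes (t, y, x).
volume = [[[100 * t + 9 * y + x for x in range(9)] for y in range(6)]
          for t in range(2)]
blocks = split_into_blocks(volume)
```

Blocks are returned in row-major order, so `blocks[0]` is the top-left region and `blocks[8]` the bottom-right region of the face.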

Figure 5 The facial region in Figure 4a segmented into 3 × 3 blocks. (i) Blocks of the frame in the visual spectrum. (ii) Blocks of the corresponding frame in the thermal spectrum.

LBP-TOP is the temporal variant of local binary patterns (LBP). In LBP-TOP, LBP is applied to three planes - XY, XT, and YT - to describe the appearance of an image, horizontal motion, and vertical motion, respectively. For a center pixel $O_p$ of an orthogonal plane $O$ and its neighboring pixels $N_i$, a decimal value is assigned to it:

$$d = \bigcup_{O}^{XY,\,XT,\,YT} \sum_{p} \sum_{i=1}^{k} 2^{i-1}\, I(O_p, N_i)$$
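The decimal value above can be sketched in a few lines of Python. The sketch makes several assumptions that the text does not state: k = 8 neighbors at radius 1, the comparison I returns 1 when a neighbor is at least as large as the center pixel (the usual LBP convention), border pixels are skipped, and each plane contributes a 256-bin histogram. The names `lbp_code` and `lbp_top` are hypothetical.

```python
# Offsets of the k = 8 neighbours on a plane, as (first axis, second axis).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(center, neighbours):
    """Decimal LBP value: sum of 2^(i-1) * I(center, N_i) for i = 1..k."""
    # Assumed comparison: I = 1 when the neighbour is >= the center pixel.
    return sum(1 << i for i, n in enumerate(neighbours) if n >= center)


def lbp_top(volume):
    """Concatenate 256-bin LBP histograms from the XY, XT, and YT planes.

    volume[t][y][x] holds a grey-level or thermal value; border voxels
    are skipped so that every center has all eight neighbours.
    """
    n_t, n_y, n_x = len(volume), len(volume[0]), len(volume[0][0])
    hist_xy, hist_xt, hist_yt = [0] * 256, [0] * 256, [0] * 256
    for t in range(1, n_t - 1):
        for y in range(1, n_y - 1):
            for x in range(1, n_x - 1):
                c = volume[t][y][x]
                xy = [volume[t][y + a][x + b] for a, b in NEIGHBOURS]
                xt = [volume[t + a][y][x + b] for a, b in NEIGHBOURS]
                yt = [volume[t + a][y + b][x] for a, b in NEIGHBOURS]
                hist_xy[lbp_code(c, xy)] += 1
                hist_xt[lbp_code(c, xt)] += 1
                hist_yt[lbp_code(c, yt)] += 1
    return hist_xy + hist_xt + hist_yt


# A constant 3x3x3 volume has one interior voxel whose code is 255 on each plane.
constant = [[[7] * 3 for _ in range(3)] for _ in range(3)]
features = lbp_top(constant)
```

Applied per block, the concatenated histograms of all blocks would form the feature vector for one video.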

According to a study that investigated facial expression recognition using LBP-TOP features, VS and near-infrared images produced similar facial expression recognition rates, provided that VS images had strong illumination [33]. Because TS videos are defined by colors and color variations, LBP-TOP features may not be able to fully exploit the thermal information provided in TS videos and, in particular, to capture thermal patterns for stress. In addition, LBP-TOP features have mainly been extracted from image sequences of people told to show some facial expression, which is unlike the image sequences obtained from our film experiment. In our film experiment, participants watched films and involuntary facial expressions were captured. The recordings may contain subtler versions of the facial expressions analyzed in the literature using LBP-TOP. Given this subtlety in facial movement, LBP-TOP may not be able to offer as much information for stress analysis. These points motivate the development of a new set of features that exploits thermal patterns in TS videos for stress recognition. We propose a new type of feature for TS videos that captures dynamic thermal patterns in histograms (HDTP). This feature makes use of thermal data in each frame of a TS video of a face over the course of the video.

4.1 Histogram of dynamic thermal patterns

HDTP captures normalized dynamic thermal patterns, which enables individual-independent stress analysis. Some people may be more tolerant to some stressors than others [54, 55]. This could mean that some people may show higher degree responses to stress than others. Additionally in general, the baseline for human response can vary from person to person. To consider these characteristics in features used for individual-independent stress analysis, ways have been developed to normalize data for each participant for their type of data [42]. HDTP is defined in terms of a participant's overall thermal state to minimize individual bias in stress analysis.

An HDTP feature is calculated for each facial block region. Firstly, a statistic (consider the standard deviation) is calculated for each facial region frame for a participant for a particular block (e.g., the facial block region situated at the top right corner of the facial region in the XY plane) for all the videos. The range of the statistic values from all these frames is partitioned to define empty bins. A bin has a continuous value range with a location defined from the statistic values. The bins are used to partition statistic values for each facial block region, where the value for each bin is the frequency of statistic values in the block that fall within the bounds of the bin range. Consequently, a histogram for each block can be formed from the frequencies. An algorithm presenting the approach for developing histograms of dynamic thermal patterns in thermal videos for a participant who has a set of facial videos is provided in Figure 6.

Figure 6 The HDTP algorithm captures dynamic thermal patterns in histograms from thermal image sequences.

As an illustration, consider that the statistic used is the standard deviation and the facial block region for which we want to develop a histogram is situated at the top right corner of the facial region in the XY plane (FBR1) for video V1 when a participant $P_i$ was watching film F1. In order to create a histogram, the bin locations and sizes need to be calculated. To do this, the standard deviation needs to be calculated for all frames in FBR1 in all videos (V1-12) for $P_i$. This will give standard deviation values from which the global minimum and maximum can be obtained and used to calculate the bin locations and sizes. Then, the histogram for FBR1, for V1, and for $P_i$ is calculated by filling the bins with the standard deviation values for each frame in FBR1. This method then provides normalized features that also take into account the image and motion, and can be used as inputs to a classifier.
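The illustration can be sketched for a single facial block region as follows. This is a reading of the description, not the algorithm in Figure 6: the number of bins (4), equal-width bins between the global minimum and maximum, and the use of the population standard deviation are all assumptions, and the names `frame_statistic` and `hdtp_for_block` are hypothetical.

```python
from statistics import pstdev


def frame_statistic(frame):
    """Statistic of one block frame: here the (population) standard deviation."""
    return pstdev([value for row in frame for value in row])


def hdtp_for_block(block_videos, n_bins=4):
    """Return one histogram per video for a single facial block region.

    block_videos[v][t] is the 2-D block frame t of video v for one
    participant. Bin edges come from that participant's global minimum
    and maximum over all videos, which normalizes the histograms per person.
    """
    stats = [[frame_statistic(f) for f in video] for video in block_videos]
    lo = min(min(s) for s in stats)
    hi = max(max(s) for s in stats)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    histograms = []
    for video_stats in stats:
        hist = [0] * n_bins
        for s in video_stats:
            # The global maximum falls into the last bin.
            index = min(int((s - lo) / width), n_bins - 1)
            hist[index] += 1
        histograms.append(hist)
    return histograms


# Two toy videos of two 2x2 block frames each; frame std devs are 0, 1 and 2, 4.
video_1 = [[[0, 0], [0, 0]], [[0, 2], [0, 2]]]
video_2 = [[[0, 4], [0, 4]], [[0, 8], [0, 8]]]
histograms = hdtp_for_block([video_1, video_2])
```

Because the bin edges are shared across all of a participant's videos, a video with larger thermal variation fills the higher bins, which is what makes the histograms comparable between the stressed and not-stressed recordings of the same person.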

5 Stress classification system using a hybrid of a support vector machine and a genetic algorithm

SVMs have been widely used in the literature to model classification problems including facial expression recognition [27, 33, 34]. Given a set of training samples, an SVM transforms the data samples using a nonlinear mapping to a higher dimension with the aim of determining a hyperplane that partitions the data by class label. The hyperplane is chosen to maximize the margin between it and the support vectors, which are the training samples closest to it, to form the best decision boundary.

It has been reported in the literature that thermal patterns for certain regions of a face provide more information for stress than other regions [23]. The performance of the stress classifier can degrade if irrelevant features are provided as inputs. As a consequence and due to its benefits noted in literature, the classification system was extended to include a feature selection component, which used a GA to select facial block regions appropriate for the stress classification. GAs are inspired by biological evolution and the concept of survival of the fittest. A GA is a global search technique and has been shown to be useful for optimization problems and problems concerning optimal feature selection for classification [56].

The GA evolves a population of candidate solutions, represented by chromosomes, using crossover, mutation, and selection operations in search for a better quality population based on some fitness measure. Crossover and mutation operations are applied to chromosomes to achieve diversity in the population and reduce the risk of the search being stuck with a local optimal population. After each generation during the search, the GA selects chromosomes, probabilistically mostly made up of better quality chromosomes, for the population in the next generation to direct the search to more favorable chromosomes.

Given a population of subsets of facial block regions with corresponding features, a GA was defined to evolve sets of blocks by applying crossover and mutation operations, and selecting block sets during each iteration of the search to determine sets of blocks that produce better quality SVM classifications. Each block set was represented by a binary fixed-length chromosome where an index or locus symbolized a facial block region, its value or allele depicted whether or not the block was used in the classification, and the length of the chromosome matched the number of blocks for a video. The search space had 3 × 3 blocks (as shown in Figure 5) with an addition of blocks that overlapped each other by 50%. The architecture for the GA-SVM classification system is shown in Figure 7. The characteristics of the GA implemented for facial block region selection are provided in Table 1.
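The chromosome encoding can be illustrated with a small GA sketch. It does not reproduce the settings in Table 1: tournament selection, single-point crossover, bit-flip mutation, elitism, and every numeric parameter are assumptions, and `toy_fitness` is a hypothetical stand-in for the cross-validated SVM recognition rate that the actual system would use as the fitness measure.

```python
import random


def evolve_block_selection(fitness, n_blocks=9, pop_size=20, generations=30,
                           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve binary chromosomes; bit i == 1 means block i is used."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_blocks)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(candidates):
        # The fitter of two randomly drawn chromosomes is selected.
        a, b = rng.sample(candidates, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_population = [best[:]]  # elitism: keep the best found so far
        while len(next_population) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_blocks - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation keeps diversity in the population.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Hypothetical fitness: rewards three "informative" blocks and lightly
# penalises every other selected block. A real run would train an SVM here.
INFORMATIVE = {1, 4, 7}


def toy_fitness(chromosome):
    hits = sum(chromosome[i] for i in INFORMATIVE)
    extras = sum(chromosome) - hits
    return hits - 0.25 * extras


best_blocks = evolve_block_selection(toy_fitness)
```

With overlapping blocks added to the search space, only `n_blocks` would change; the encoding of one bit per candidate block stays the same.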

Figure 7 The architecture of the GA-SVM hybrid stress classification system.

Table 1 GA implementation settings for facial block region selection

In summary, various stress classification systems using an SVM were developed which differed in terms of the following input characteristics:

  • VSLBP-TOP: LBP-TOP features for VS videos

  • TSLBP-TOP: LBP-TOP features for TS videos

  • TSHDTP: HDTP features (as described in Section 4.1) for TS videos





These inputs were also provided as inputs to the GA-SVM classification systems to determine whether the system produced better stress recognition rates.

6 Results and discussion

Each of the different features was derived from VS and TS facial videos using LBP-TOP and HDTP facial descriptors on standardized data and provided as input to an SVM for stress classification. Facial videos of participants watching stressed films were assigned to the stressed class, and videos associated with not-stressed films were assigned to the not-stressed class. Furthermore, their corresponding features were assigned to corresponding classes. Recognition rates and F-scores for the classifications were obtained using 10-fold cross-validation for each type of input. The results are shown in Figure 8.
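The two performance measures and the fold structure can be sketched as below. The text does not say how samples were assigned to folds or which class counts as positive for the F-score, so the interleaved split, the choice of "stressed" as the positive class, and the sample count of 420 (35 participants × 12 videos, used purely for illustration) are assumptions. The function names are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold j tests samples j, j+k, j+2k, ..."""
    for j in range(k):
        test = list(range(j, n_samples, k))
        train = [i for i in range(n_samples) if i % k != j]
        yield train, test


def recognition_rate_and_f_score(true_labels, predicted_labels,
                                 positive="stressed"):
    """Return (recognition rate, F-score) for one set of predictions."""
    tp = fp = fn = correct = 0
    for truth, guess in zip(true_labels, predicted_labels):
        correct += truth == guess
        tp += truth == positive and guess == positive
        fp += truth != positive and guess == positive
        fn += truth == positive and guess != positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return correct / len(true_labels), f_score


true_labels = ["stressed"] * 3 + ["not-stressed"] * 3
predicted = ["stressed", "stressed", "not-stressed",
             "not-stressed", "not-stressed", "stressed"]
rate, f_score = recognition_rate_and_f_score(true_labels, predicted)
folds = list(k_fold_indices(420))
```

In a full pipeline, a classifier would be trained on each `train` list and evaluated on the matching `test` list, and the measures would be averaged over the ten folds.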

Figure 8 Performance measures for SVM and GA-SVM stress recognition systems. The measures were obtained for various input features based on 10-fold cross-validation. The labels on the horizontal axes are shortened to improve readability. L and H stand for LBP-TOP and HDTP, respectively. (a) Recognition rate measure for the stress recognition systems. (b) F-score measure for the stress recognition systems.

Results show that when HDTP features for TS videos (TSHDTP) were provided as input to the SVM classifier, there were improvements in the stress recognition measures. The best recognition measures for the SVM were obtained when VSLBP-TOP + TSHDTP was provided as input. It produced a recognition rate that was at least 0.10 greater than the recognition rate for inputs without TSHDTP where the range for recognition rates was 0.13. This provides evidence that TSHDTP had a significant contribution towards the better classification performance and suggests that TSHDTP captured more patterns associated with stress than VSLBP-TOP and TSLBP-TOP. The performance for the classification was the lowest when TSLBP-TOP was provided as input.

The features were also provided as inputs to a GA which selected facial block regions with a goal to disregard irrelevant facial block regions for stress recognition and to improve the SVM-based recognition measures. Performances of the classifications using 10-fold cross-validation on the different inputs are provided in Figure 8. For all types of inputs, GA-SVM produced significantly better stress recognition measures. According to the Wilcoxon non-parametric statistical test, the statistical significance was p < 0.01. Similar to the trend observed for stress recognition measures produced by the SVM, TSHDTP also contributed to the improved results in GA-SVM. The best recognition measures were obtained when VSLBP-TOP + TSLBP-TOP + TSHDTP was provided as input to the GA-SVM classifier. The performance of the classifier was highly similar when it received VSLBP-TOP + TSHDTP as inputs with a difference of 0.01 in the recognition rate. Results show that when a combination of at least two of VSLBP-TOP, TSLBP-TOP, and TSHDTP was provided as input, then it performed better than when only one of VSLBP-TOP, TSLBP-TOP, or TSHDTP was used.

Further, stress recognition systems provided with TSHDTP as input produced significantly better stress recognition measures than inputs with TSHDTP replaced by TSLBP-TOP (p < 0.01). This suggests that stress patterns were better captured by TSHDTP features than TSLBP-TOP features.

In addition, blocks selected by the GA in the GA-SVM classifier for the different inputs were recorded. When VSLBP-TOP was given as input to a GA, the blocks that produced better recognition results were the blocks that corresponded to the cheeks and mouth regions on the XY plane. For TSLBP-TOP, fewer blocks were selected and they were situated around the nose. On the other hand, for TSHDTP, more blocks were used in the classification: nose, mouth, and cheek regions and regions on the forehead were selected by the GA. Future work could extend the investigation with more complex block definitions to find and use more precise regions showing symptoms of stress for classification.

Future work could also investigate other block selection methods different from the GA used in this work. The GA search took approximately 5 min to reach convergence, but it could take longer if the chromosome is extended to encode more general information for a block, e.g., coordinate values and the size for the block. The literature has claimed that a GA usually takes longer execution times than other types of feature selection techniques, such as correlation analysis [57]. Therefore in future, other block selection methods could be investigated that do not require execution times as long as a GA and still produce stress recognition measures comparable to the GA hybrid.

7 Conclusions

The ANU Stress database (ANUStressDB) was presented, which has videos of faces in temporal thermal (TS) and visible (VS) spectrums for stress recognition. A computational classification model of stress using spatial and temporal characteristics of facial regions in the ANUStressDB was successfully developed. In the process, a new method for capturing patterns in thermal videos, HDTP, was defined. The approach was defined so that it reduced individual bias in the computational models and enhanced participant-independent recognition of symptoms of stress. For computing the baseline for stress classification, an SVM was used. Selecting facial block regions with a genetic algorithm improved the classification rates regardless of the type of video, TS or VS. The best recognition rates, however, were obtained when features from TS and VS videos were provided as inputs to the GA-SVM classifier. In addition, stress recognition rates were significantly better for classifiers provided with HDTP features instead of LBP-TOP features for TS. Future work could extend the investigation by developing features for facial block regions to capture more complex patterns and examining different forms of facial block regions for stress recognition.


  1. Selye H: The stress syndrome. Am. J. Nurs. 1965, 65: 97-99. 10.1097/00000446-196505000-00023

  2. Hoffman-Goetz L, Pedersen BK: Exercise and the immune system: a model of the stress response? Immunol. Today 1994, 15: 382-387. 10.1016/0167-5699(94)90177-5

  3. Sharma N, Gedeon T: Objective measures, sensors and computational techniques for stress recognition and classification: a survey. Comput. Methods Prog. Biomed. 2012, 108: 1287-1301. 10.1016/j.cmpb.2012.07.003

  4. Miller GE, Cohen S, Ritchey AK: Chronic psychological stress and the regulation of pro-inflammatory cytokines: a glucocorticoid-resistance model. Health Psychology Hillsdale 2002, 21: 531-541.

  5. Surwit RS, Schneider MS, Feinglos MN: Stress and diabetes mellitus. Diabetes Care 1992, 15: 1413-1422. 10.2337/diacare.15.10.1413

  6. Vitetta L, Anton B, Cortizo F, Sali A: Mind body medicine: stress and its impact on overall health and longevity. Ann. N. Y. Acad. Sci. 2005, 1057: 492-505. 10.1196/annals.1322.038

  7. Seltzer JA, Kalmuss D: Socialization and stress explanations for spouse abuse. Social Forces 1988, 67: 473-491. 10.1093/sf/67.2.473

  8. Johnson PR, Indvik J: Stress and violence in the workplace. Employee Counsell. Today 1996, 8: 19-24.

  9. The American Institute of Stress. (05/08/10), America's no. 1 health problem - why is there more stress today? Accessed 5 August 2010

  10. Lifeline Australia, Stress costs taxpayer $300K every day, 2009 Accessed 10 August 2010

  11. Liao W, Zhang W, Zhu Z, Ji Q: A real-time human stress monitoring system using dynamic Bayesian network. San Diego, CA, USA, 25 June 2005. Computer Vision and Pattern Recognition - Workshops, CVPR Workshops

  12. Zhai J, Barreto A: Stress recognition using non-invasive technology. Melbourne Beach, Florida, 2006. Proceedings of the 19th International Florida Artificial Intelligence Research Society Conference FLAIRS 395-400.

  13. Wang J, Korczykowski M, Rao H, Fan Y, Pluta J, Gur RC, McEwen BS, Detre JA: Gender difference in neural response to psychological stress. Soc. Cogn. Affect. Neurosci. 2007, 2: 227. 10.1093/scan/nsm018

  14. Sharma N, Gedeon T: Stress Classification for Gender Bias in Reading - Neural Information Processing vol. 7064. Edited by: Lu B-L, Zhang L, Kwok J. Springer, Berlin; 2011:348-355.

  15. Ushiyama T, Mizushige K, Wakabayashi H, Nakatsu T, Ishimura K, Tsuboi Y, Maeta H, Suzuki Y: Analysis of heart rate variability as an index of noncardiac surgical stress. Heart Vessel. 2008, 23: 53-59. 10.1007/s00380-007-0997-6

  16. Seong H, Lee J, Shin T, Kim W, Yoon Y: The analysis of mental stress using time-frequency distribution of heart rate variability signal. San Francisco, CA, USA, 1–4 September 2004, vol 1. Annual International Conference of Engineering in Medicine and Biology Society, 2004 283-285.

  17. Morilak DA, Barrera G, Echevarria DJ, Garcia AS, Hernandez A, Ma S, Petre CO: Role of brain norepinephrine in the behavioral response to stress. Prog. Neuro-Psychopharmacol. Biol. Psychiatry 2005, 29: 1214-1224. 10.1016/j.pnpbp.2005.08.007

  18. Haak M, Bos S, Panic S, Rothkrantz LJM: Detecting stress using eye blinks and brain activity from EEG signals. Czech Technical University, Prague, 2008. Proceeding of the 1st Driver Car Interaction and Interface (DCII 2008)

  19. Shi Y, Ruiz N, Taib R, Choi E, Chen F: Galvanic skin response (GSR) as an index of cognitive load. San Jose, CA, USA, 2007, 28 April - 3 May 2007. CHI '07 extended abstracts on Human factors in computing systems 2651-2656.

  20. Reisman S: Measurement of physiological stress. 1997, 4–6 April 1997. Bioengineering Conference 21-23.

  21. Dinges DF, Rider RL, Dorrian J, McGlinchey EL, Rogers NL, Cizman Z, Goldenstein SK, Vogler C, Venkataraman S, Metaxas DN: Optical computer recognition of facial expressions associated with stress induced by performance demands. Aviat. Space Environ. Med. 2005, 76: B172-B182.

  22. BIOPAC Systems Inc, BIOPAC Systems, 2012. Accessed 10 February 2011

  23. Yuen P, Hong K, Chen T, Tsitiridis A, Kam F, Jackman J, James D, Richardson M, Williams L, Oxford W: Emotional & physical stress detection and classification using thermal imaging technique. London, 2009, 3 December 2009. 3rd International Conference on Crime Detection and Prevention (ICDP 2009) 1-6.

  24. Jarlier S, Grandjean D, Delplanque S, N'Diaye K, Cayeux I, Velazco MI, Sander D, Vuilleumier P, Scherer KR: Thermal analysis of facial muscles contractions. IEEE Trans. Affect. Comput. 2011, 2: 2-9.

  25. Zhao G, Pietikainen M: Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29: 915-928.

  26. Hernández B, Olague G, Hammoud R, Trujillo L, Romero E: Visual learning of texture descriptors for facial expression recognition in thermal imagery. Comput. Vis. Image Underst. 2007, 106: 258-269. 10.1016/j.cviu.2006.08.012

  27. Trujillo L, Olague G, Hammoud R, Hernandez B: Automatic feature localization in thermal images for facial expression recognition. San Diego, CA, USA, 2005, 20, 21 and 25 June 2005. IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 2005. CVPR Workshops 14.

  28. Manglik PK, Misra U, Maringanti HB: Facial expression recognition. 2004, The Hague, Netherlands, 10–13 October 2004. IEEE International Conference on Systems, Man and Cybernetics 2220-2224.

  29. Neggaz N, Besnassi M, Benyettou A: Application of improved AAM and probabilistic neural network to facial expression recognition. J. Appl. Sci. 2010, 10: 1572-1579.

  30. Sandbach G, Zafeiriou S, Pantic M, Rueckert D: Recognition of 3D facial expression dynamics. Image Vis. Comput. 2012, 30: 762-773. 10.1016/j.imavis.2012.01.006

  31. Zeng Z, Pantic M, Roisman GI, Huang TS: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31: 39-58.

  32. Lucey P, Cohn JF, Prkachin KM, Solomon PE, Matthews I: Painful data: The UNBC-McMaster shoulder pain expression archive database. 2011, Santa Barbara, CA, USA, 21–25 March 2011. IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011) 57-64.

  33. Taini M, Zhao G, Li SZ, Pietikainen M: Facial expression recognition from near-infrared video sequences. 2008, Tampa, Florida, USA, 8–11 December 2008. 19th International Conference on Pattern Recognition (ICPR) 1-4.

  34. Michel P, Kaliouby RE: Real time facial expression recognition in video using support vector machines. Vancouver, British Columbia, Canada, 5–7 November 2003. the Proceedings of the 5th International Conference on Multimodal Interfaces, 2003

  35. Ahonen T, Hadid A, Pietikainen M: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28: 2037-2041.

  36. Bartlett MS, Littlewort GC, Frank MG, Lainscsek C, Fasel IR, Movellan JR: Automatic recognition of facial actions in spontaneous expressions. J. Multimed. 2006, 1: 22-35.

  37. Douglas-Cowie E, Cowie R, Schröder M: A new emotion database: considerations, sources and scope. ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion 2000, 39-44.

  38. Grimm M, Kroschel K, Narayanan S: The Vera am Mittag German audio-visual emotional speech database. Hannover, Germany, 23–26 June 2008. IEEE International Conference on Multimedia and Expo 2008, 865-868.

  39. Dhall A, Goecke R, Lucey S, Gedeon T: A semi-automatic method for collecting richly labelled large facial expression databases from movies. IEEE Multimedia 2012, 19: 34-41.

  40. Zhai J, Barreto A: Stress detection in computer users based on digital signal processing of noninvasive physiological variables. 2006, New York City, NY, USA, 30 August - 3 September 2006. Proceedings of the 28th IEEE EMBS Annual International Conference 1355-1358.

  41. Hjortskov N, Rissén D, Blangsted A, Fallentin N, Lundberg U, Søgaard K: The effect of mental stress on heart rate variability and blood pressure during computer work. Eur. J. Appl. Physiol. 2004, 92: 84-89. 10.1007/s00421-004-1055-z

  42. Healey JA, Picard RW: Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans. Intell. Transport. Syst. 2005, 6: 156-166. 10.1109/TITS.2005.848368

  43. Niculescu A, Cao Y, Nijholt A: Manipulating stress and cognitive load in conversational interactions with a multimodal system for crisis management support. In Development of Multimodal Interfaces: Active Listening and Synchrony. Springer, Dublin Ireland; 2010:134-147.

  44. Vizer LM, Zhou L, Sears A: Automated stress detection using keystroke and linguistic features: an exploratory study. Int. J. Hum. Comput. Stud. 2009, 67: 870-886. 10.1016/j.ijhcs.2009.07.005

  45. Lin T, John L: Quantifying mental relaxation with EEG for use in computer games. Las Vegas, NV, USA, 26–29 June 2006. International Conference on Internet Computing, 2006 409-415.

  46. Lin T, Omata M, Hu W, Imamiya A: Do physiological data relate to traditional usability indexes? In Proceedings of the 17th Australia Conference on Computer-Human Interaction: Citizens Online: Considerations for Today and the Future. Narrabundah, Australia; 2005:1-10.

  47. Lovallo WR: Stress & Health: Biological and Psychological Interactions. Sage Publications, Inc., California; 2005.

  48. Lucey P, Cohn JF, Kanade T, Saragih J, Ambadar Z, Matthews I: The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). San Francisco, CA, USA; 2010:94-101.

  49. Gross JJ, Levenson RW: Emotion elicitation using films. Cognit. Emot. 1995, 9: 87-108. 10.1080/02699939508408966

  50. Struc V, Pavesic N: The complete Gabor-Fisher classifier for robust face recognition. EURASIP Advances in Signal Processing 2010, 2010: 26.

  51. Struc V, Pavesic N: Gabor-based kernel partial-least-squares discrimination features for face recognition. Informatica (Vilnius) 2009, 20: 115-138.

  52. Struc V: The PhD Toolbox: Pretty Helpful Development Functions for Face Recognition. 2012. Accessed 12 September 2012

  53. Mathworks, Vision TemplateMatcher System Object R2012a 2012. Accessed 12 September 2012

  54. APA: American Psychological Association, Stress in America. APA, Washington, DC; 2012.

  55. Holahan CJ, Moos RH: Life stressors, resistance factors, and improved psychological functioning: an extension of the stress resistance paradigm. J. Pers. Soc. Psychol. 1990, 58: 909.

  56. Frohlich H, Chapelle O, Scholkopf B: Feature selection for support vector machines by means of genetic algorithm. Sacramento, California, USA, 3–5 November 2003. 15th IEEE International Conference on Tools with Artificial Intelligence 2003, 142-148.

  57. Yu L, Liu H: Feature selection for high-dimensional data: a fast correlation-based filter solution. Los Angeles, CA, 23–24 June 2003. 12th International Conference on Machine Learning 2003, 856-863.

Author information

Corresponding author

Correspondence to Nandita Sharma.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sharma, N., Dhall, A., Gedeon, T. et al. Thermal spatio-temporal data for stress recognition. J Image Video Proc 2014, 28 (2014).
