Beyond the visible: thermal data for facial soft biometric estimation

Abstract

In recent years, the estimation of biometric parameters from facial visuals, including images and videos, has emerged as a prominent area of research. However, the robustness of deep learning-based models is challenged, particularly in the presence of changing illumination conditions. To overcome these limitations and unlock new opportunities, thermal imagery has arisen as a viable alternative. Nevertheless, the limited availability of datasets containing thermal data, and the scarcity of annotations in those that exist, restricts the exploration of this spectrum. Motivated by this gap, this paper introduces the Label-EURECOM Visible and Thermal (LVT) Face Dataset for face biometrics. This pioneering dataset includes paired visible and thermal images and videos from 52 subjects, along with metadata for 22 soft biometrics and health parameters. Given the small number of existing datasets in this domain, the LVT Face Dataset aims to facilitate further research and advancements in the use of thermal imagery for diverse eHealth applications and soft biometric estimation. Moreover, we present the first comparative study between the visible and thermal spectra as input for soft biometric estimation, namely gender, age, and weight, from face images on our collected dataset.

1 Introduction

Facial processing from visual content has gained significant attention in recent years, as the estimation of soft biometrics from faces has proven important in supporting various biometric systems [1, 2]. Moreover, remote estimation of health parameters from facial multimedia offers a non-invasive, contactless means of assessing a subject’s health status, with applications ranging from medical emergencies and road accidents to at-home daily monitoring and telehealth.

Automatic face recognition has consistently been one of the most active research areas in computer vision [3]. Beyond identifying people, the estimation of soft biometric traits such as gender and age has been well established in the literature. Moreover, a vast amount of soft biometric and health information belonging to a subject has been shown to be embedded in face visuals [4]. The estimation of health indicators such as height, weight, and body mass index (BMI) from a single facial shot has been explored by training a regression method based on the 50-layer ResNet architecture [5]. Beyond physical attributes, researchers have extracted so-called micro-signals from faces, information that has played important roles in media security and forensics [6]. The concept of remote photoplethysmography (rPPG) has evolved over the past 15 years; it exploits the fact that blood absorbs more light than the surrounding tissue, so subtle changes in blood volume can be captured by ordinary cameras. Research has shown that a mobile phone camera has enough resolution to capture the rPPG signal from faces, leading to successful heart rate (HR) estimation [7]. Following the same principle, recent works have successfully approximated the blood pressure (BP) of a subject from the difference between the times a pulse wave reaches two different parts of the face [8]. More recent investigations employ Convolutional Neural Networks (CNNs) to compute the ratio of oxygenated to total hemoglobin (SpO2) from facial videos, considering direct and alternating current components extracted from RGB facial videos [9].
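
The rPPG principle just described can be made concrete with a minimal sketch (our own illustration, not the pipeline of any cited work): spatially average the green channel over a cropped face, band-pass the resulting trace to plausible heart rates, and read the dominant spectral peak. The function name, filter order, and frequency band are our assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_hr_bpm(frames: np.ndarray, fps: float) -> float:
    """Toy rPPG estimator. frames: (T, H, W, 3) RGB array of a cropped face."""
    # Blood absorbs green light strongly, so track the mean green intensity.
    trace = frames[:, :, :, 1].mean(axis=(1, 2))
    trace = trace - trace.mean()
    # Keep only plausible heart-rate frequencies: 0.7-4 Hz (42-240 bpm).
    b, a = butter(3, [0.7, 4.0], btype="band", fs=fps)
    filtered = filtfilt(b, a, trace)
    # The dominant spectral peak corresponds to the pulse rate.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(filtered.size, d=1.0 / fps)
    return 60.0 * freqs[spectrum.argmax()]
```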

Facial processing models have traditionally based their estimations on images acquired in the visible spectrum. Although these networks have reached a significant level of maturity and practical success, deep learning approaches relying on visible-spectrum data are affected by compromising factors such as occlusion and changes in illumination. Thermal imagery has proven itself a powerful alternative capturing tool [3]. Computer vision researchers have affirmed its superiority over visible imaging in challenging conditions, such as the presence of smoke or dust and the absence of light sources [10]. Thermal imagery operates by detecting electromagnetic radiation in the medium-wave (MWIR, \(3 - 8\mu m\)) and long-wave (LWIR, \(8 - 15\mu m\)) infrared spectrum [11], where the heat emitted by human skin is concentrated. This capability enables thermal images to overcome the lack of illumination or certain types of occlusion. However, studies have highlighted that the heat captured by thermal cameras can be influenced by factors such as ambient temperature or intense physical activity [3].
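
As a back-of-the-envelope check (ours, not taken from the cited works), Wien's displacement law places the emission peak of skin at around \(33^{\circ }\)C (\(\approx 306\) K) well inside the LWIR band:

\[ \lambda_{\text{peak}} = \frac{b}{T} = \frac{2898\ \mu \text{m} \cdot \text{K}}{306\ \text{K}} \approx 9.5\ \mu \text{m}. \]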

To move toward more accurate facial processing models, and because we believe in the potential of thermal imagery, in this paper we present a comparative study between the visible and thermal spectra for the estimation of different soft biometrics on our new dataset. The main contributions of this work are as follows:

  • We present our Label-EURECOM Visible and Thermal (LVT) Face dataset for face biometrics, composed of 612 images and 408 videos from 52 different subjects, together with a compendium of 22 health metrics and soft biometrics annotated per person.

  • We compare the performance of state-of-the-art deep learning-based models for the estimation of three soft biometric traits, gender, age, and weight, when trained with visible versus thermal data.

  • We offer insights into the strengths and weaknesses of each modality by testing the models under the three conditions in our dataset: studio lights, presence of eyeglasses, and ambient illumination.

The rest of the paper is organized as follows. Section 2 reviews existing works on facial processing for soft biometric and health parameter estimation and lists existing datasets containing thermal visuals along with some of their descriptors. In Sect. 3, our newly collected LVT Face dataset for face biometrics is presented in detail. Section 4 describes the networks studied as well as the evaluation protocol and metrics used in our experiments, while in Sect. 5 the results of testing the three networks on our LVT dataset are discussed. Finally, Sect. 6 summarizes and concludes with the future directions of our work.

The LVT Face dataset for face biometrics is publicly available upon request (see Note 1).

2 Potential of visible and thermal paired data

Deep learning-based biometric systems and facial eHealth models are traditionally trained on datasets acquired in the visible or, more recently, the near-infrared (NIR) spectra. In this section, we present existing thermal datasets as well as various studies that have focused on the thermal spectrum for facial processing applications such as cross-spectrum face recognition algorithms or HR estimation.

2.1 Relevant thermal datasets

Interest in employing thermal face images has grown in recent years; nevertheless, their use has been mostly confined to tasks such as landmark detection, face detection, and face recognition (FR) [3, 12]. A relevant subset of FR is cross-spectral FR (CFR), which aims to identify a person imaged in the thermal spectrum from a gallery of face images acquired in the visible spectrum [13]. Only a few datasets involving visuals acquired in the thermal spectrum have been released, and fewer still cover soft and hidden biometric metadata. In Table 1, we present a selection of relevant datasets that include visuals in the thermal spectrum together with some key descriptors: year of release, number of subjects, number of images and videos, and initial intended purpose.

One of the first datasets containing thermal visual data was presented in 2003 [14]. The data were acquired at the University of Notre Dame and contain images from 240 distinct subjects with four views under different lighting conditions and facial expressions, collected for person recognition. Beyond people recognition, Wang et al. established a similar dataset for expression recognition, containing both spontaneous and posed expressions of more than 100 subjects [15], while Gault et al. recorded thermal videos from 32 subjects under three imaging scenarios together with paired rPPG signals for HR estimation [16]. In 2018, two new datasets were acquired for FR with multiple illumination, pose, and occlusion variations [3], one of them including imagery from different modalities, namely visible, thermal, near-infrared, a computerized facial sketch, and 3D images of each volunteer’s face [17]. In the same year, Barbosa et al. collected thermal videos from 20 healthy subjects in two phases, phase A (frontal view acquisitions) and phase B (side view acquisitions), while the corresponding PPG and thoracic effort were simultaneously recorded for HR and respiratory rate (RR) estimation [18]. More recently, two large-scale visible and thermal datasets have been assembled: Abdrakhmanova et al. gathered a combination of thermal, visual, and audio data streams to support machine learning-based biometric applications [19], and Poster et al. presented the largest collection of paired visible and thermal face images to date, recording variability in expression and pose [20]. Subsequently, a thermal face dataset of 2556 images with annotated face bounding boxes and facial landmarks was introduced [12].

Table 1 Relevant face datasets containing visuals in thermal spectra

2.2 Thermal data in facial processing tasks

Thermal data for soft biometrics: Recent works have initiated the exploration of the thermal spectrum for facial processing models. In the literature, two soft biometrics, namely gender and ethnicity, have been estimated from thermal input data. In [21], the authors presented the first work on gender classification from faces in the thermal spectrum. They proposed a pipeline consisting of the Local Binary Pattern (LBP) method to detect edges in an image, followed by principal component analysis for dimensionality reduction and, finally, a Support Vector Machine as the classifier estimating the subjects’ gender. Similarly, Abouelenien et al. utilized the Eigenfaces method for visible faces and statistical measurements of pixel color for thermal faces, employing decision trees for gender classification; fusion between visible and thermal data was integrated within the decision tree model [22]. More generally, works have explored gender estimation from nine different narrow spectral bands [23]. Deep learning structures began to be explored in [24], where the authors trained a VGG-CNN structure on visible faces and tested it on thermal faces for gender and ethnicity classification. Farooq et al. performed transfer learning from nine well-known architectures, including ResNet-50, ResNet-101, Inception-V3, MobileNet-V2, VGG-19, AlexNet, DenseNet-121, DenseNet-201, and EfficientNet-B4, to estimate gender from thermal data [25]; they also proposed GENNet for the same task. More recently, Abdrakhmanova et al. proposed a combination of a bidirectional recurrent neural network and a CNN to extract features from visible, thermal, and audio streams, fused at the feature level, with a final decision layer classifying the subject’s gender [19].

Thermal data for eHealth: Although facial thermal imagery has traditionally been employed for face recognition or gender estimation, some researchers have explored its use for eHealth parameter estimation, highlighting the potential of this type of data. In 2017, Rai et al. suggested that thermal imaging systems can provide details about physiological processes through skin temperature distributions, which are influenced by factors such as blood perfusion; thermal cameras are commonly employed in the medical field to observe minute temperature variations, with applications including the detection of malignant tumors [11]. The assessment of eHealth parameters such as heart rate from face videos has been studied in depth in recent years; to the best of the authors’ knowledge, however, all existing methods require proper illumination, posing challenges in uncontrolled environments. In 2018, Barbosa et al. introduced a novel method for remote HR monitoring based on the periodic head movements caused by the cyclical ejection of blood from the heart to the head, using thermal images as input data [18]. Furthermore, they demonstrated the feasibility of measuring a subject’s respiratory rate by analyzing temperature fluctuations under the nose during the respiratory cycle. Thermal imagery has thus proven to be of high value in overcoming illumination constraints, given its light-invariant nature. In a similar line of research, ongoing works explore the potential of deep learning approaches for extracting heart rate and blood pressure information from thermal images [26].

To the best of our knowledge, the current literature has predominantly focused on extracting gender, ethnicity, and eHealth parameters, namely HR, RR, and BP, from thermal face data. The estimation of other traits such as age and weight from thermal images remains unexplored by the community. The motivation behind collecting a new dataset of visible face data paired with thermal counterparts arises from the potential of using thermal images and videos as input data in facial processing tasks. Furthermore, existing datasets are often limited to visual face content and one or two biometric or health parameters. We believe that a dataset comprising more than 20 different soft biometric and health measures is therefore a valuable contribution to the biometric community.

3 LVT Face dataset description

In this section, we first introduce the recording setup of the dataset and the characteristics of the acquisition devices. We then elaborate on the data collection methodology, as well as the dataset design and the associated subjects’ metadata.

3.1 Acquisition material

Fig. 1 FLIR Duo R camera (left) and acquisition setup (right)

The visible and thermal face data were obtained using the dual sensor of the FLIR Duo R camera developed by FLIR Systems. This camera is specifically designed to capture visible and thermal visuals simultaneously and is particularly suitable for unmanned aerial vehicles. The FLIR Duo R dual camera has been employed in recent research due to its appropriateness for data collection in various tasks such as face recognition and cross-spectrum applications [3, 10]. The visible and thermal sensors of this camera consist of a CCD sensor with a pixel resolution of 1920\(\times\)1080 and an uncooled VOx microbolometer with a pixel resolution of 640\(\times\)512, respectively.

To assess the health status of the subjects, various devices were utilized. A contactless infrared thermometer, with a precision of \(\pm 0.2^{\circ }\)C between \(34^{\circ }\)C and \(42.0^{\circ }\)C and of \(\pm 0.3^{\circ }\)C between \(42.1^{\circ }\)C and \(43.0^{\circ }\)C, was employed to measure the user’s body temperature. For measuring BP, an OMRON HEM-7155-E tensiometer was used, along with an LED finger oximeter with a precision of ±2% for SpO2 measurement. To track HR, subjects wore a Garmin Vivoactive® 4 smartwatch equipped with an optical PPG sensor that detects the heart rate by shining a green light through the subject’s skin and measuring the light reflected by the red blood cells in the skin’s blood vessels. For quantifying body weight-related measures, the RENPHO® Body Fat Smart scale was utilized: when a subject steps on the device and enters their gender, age, and height into the system, the scale returns 13 metrics, including weight and BMI.
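
Of the scale's 13 metrics, only BMI has a public closed-form definition (the bioimpedance-derived metrics are proprietary to the device). For reference, a one-line computation:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

print(round(bmi(70.0, 1.75), 1))  # 22.9
```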

3.2 Visuals collection protocol

The image and video acquisition took place in an indoor environment with the ambient temperature set to \(25^{\circ }\)C; the arrangement is shown in Fig. 1. The acquisition setup included a white wall serving as background and a chair positioned at a fixed distance of 0.25 m from the camera, which was placed at a height of 1 m from the ground. Additionally, a two-point lighting kit was strategically placed to minimize shadows, facilitating the segmentation of the subject from the background. Each volunteer participated in two separate acquisition sessions, with an average time interval of 6 weeks between them. Before the acquisition process, volunteers were requested to fill out and sign consent forms.

The visual data comprise 6 images per person and session (3 visible and their 3 associated thermal pairs), encompassing three different conditions: neutral (N), ambient light (A), and occlusion in the form of eyeglasses (O), resulting in a total of 612 images. Figure 2 illustrates example images of an individual from the dataset. Additionally, four 60-second videos were recorded per subject in each session under neutral (N) conditions. The first pair of videos (one in the visible spectrum and its thermal counterpart) was captured after the subject had been resting for at least 5 min, while the second pair followed moderate exercise in the form of climbing stairs to elevate the subject’s HR, resulting in a total of 408 60-second videos. Regarding video length, several studies have highlighted the feasibility of estimating the PPG signal, leading to successful HR and BP estimation (among other parameters), from videos ranging from 5 s [27] to 60 s [28, 29].

Fig. 2 Example images from the LVT Face dataset. The three variations are displayed in the visible (upper row) and thermal (bottom row) spectra, from left to right: neutral conditions (N), occlusion in the form of eyeglasses (O), and ambient light (A), i.e., no dedicated light sources

3.3 Subjects’ metadata

Several pieces of metadata describing the subjects were collected, including gender, age, and height. Additional parameters were measured to assess their health status, such as body temperature, HR, BP, SpO2, weight, and BMI. In addition to weight and BMI, the smart scale provided 11 other variables: body fat and body water percentages, skeletal muscle, fat-free weight, muscle mass, bone mass, protein, subcutaneous and visceral fat, basal metabolic rate (BMR), and metabolic age. The filenames of images and videos are constructed by indicating the spectrum of the visual data, the subject ID, the session ID (1 or 2), and, in the case of images, the acquisition conditions (N, O, or A).
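
The exact naming convention is not spelled out beyond the fields listed above, so the parser below is a hypothetical illustration; the field order, separators, tags, and extensions are our assumptions, not the dataset's actual scheme.

```python
import re

# Hypothetical filename pattern consistent with the fields described
# (spectrum, subject ID, session, optional condition); the real convention
# may differ.
PATTERN = re.compile(
    r"(?P<spectrum>VIS|TH)_(?P<subject>\d+)_(?P<session>[12])"
    r"(?:_(?P<condition>[NOA]))?\.(jpg|mp4)$"
)

def parse_name(filename: str) -> dict:
    """Split a filename into its metadata fields; raise on a mismatch."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized filename: {filename}")
    return match.groupdict()

print(parse_name("TH_023_1_O.jpg"))
# {'spectrum': 'TH', 'subject': '023', 'session': '1', 'condition': 'O'}
```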

3.4 Summary

The presented dataset is designed as a collection of images, videos, soft biometrics, and health parameters recorded from 52 different subjects across two sessions. It consists of 612 face-and-shoulders images and 408 60-second videos, totaling approximately 285 GB of disk space. The 52 recorded participants, comprising 38 males and 14 females, come from 13 different countries spanning four continents, with ages ranging between 22 and 51 years. Out of the 52 subjects, 50 were present for both sessions, while 2 attended only one. An executive summary of the dataset is provided in Table 2.

Table 2 (Table is read horizontally) Summary of the information contained in the LVT Face dataset

4 Methodology

In this section, we describe the models implemented and compared in our experiments, the evaluation metrics, and the experimental setup of the networks.

4.1 Soft biometric estimation models

VGGNet [30] was developed by the Visual Geometry Group of the University of Oxford to improve computer vision tasks by increasing the depth of an architecture using small convolutional filters of size 3 \(\times\) 3. In addition, VGG incorporates 1 \(\times\) 1 convolutional layers to make the decision function more non-linear without changing the receptive fields. The VGG architecture has proven powerful in the literature for estimating gender and age from face images [31]. Moreover, in their comparative study of architectures for gender estimation from thermal data, Farooq et al. [25] revealed the high performance of VGGNet for this task. To the authors’ knowledge, no study has been conducted on the feasibility of thermal imagery for age estimation. We therefore select the VGGNet network with 16 weight layers, i.e., the VGG16 model, for our gender and age estimation models. We use the VGG16 base model as a feature extractor and add custom fully connected layers on top for binary classification and regression for gender and age prediction, respectively.

Residual Neural Networks (ResNet) [32] are convolutional neural networks that introduce the concept of residual learning. Instead of learning a direct mapping between layers, ResNet learns the difference between the input and the desired output of a layer by using shortcut connections, also known as skip connections, that bypass one or more layers and directly connect the input of a layer to its output. In the literature, face image-based weight estimation has been demonstrated using ResNet architectures with 50 layers and a final regression layer [5, 33, 34]. For our experiments, similar to those studies, we select a ResNet50 model.
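
A minimal Keras sketch of a residual connection conveys the idea (illustrative only; ResNet50 itself stacks deeper bottleneck blocks with batch normalization):

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Minimal ResNet-style block: the shortcut adds the input to the
    transformed output, so the conv layers learn a residual F(x) = H(x) - x."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:  # match channel count for the addition
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])    # the skip connection
    return layers.Activation("relu")(y)
```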

4.2 Evaluation metrics

Accuracy is used as the metric for assessing the gender classifier. For age and weight, we report the mean absolute error (MAE) and root mean square error (RMSE), in years and kilograms (kg) respectively, and Pearson’s correlation coefficient (\(\rho\)). Additionally, for age, we provide the standard deviation (StD) of the difference between the predicted and the real age of the subjects. Finally, we include the percentage of acceptable predictions (PAP) for the weight estimation network; this metric represents the percentage of predictions with an error smaller than 10% of the true weight, which indicates a reasonable error for medical applications.
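
These metrics are straightforward to compute; the helper below is our own sketch (not the paper's code), with PAP implemented exactly as defined above:

```python
import numpy as np
from scipy.stats import pearsonr

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, Pearson's rho, StD of the error, and PAP as defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "rho": pearsonr(y_true, y_pred)[0],
        "StD": np.std(err),
        # PAP: share of predictions within 10% of the subject's true weight
        "PAP": 100.0 * np.mean(np.abs(err) <= 0.10 * y_true),
    }
```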

4.3 Experimental setup

Fig. 3 Transfer learning protocol for soft biometric estimation from visible and thermal images. Acronyms: Transfer Learning (TL)

The VGG16 and ResNet50 architectures were implemented using the TensorFlow and Keras frameworks. The VGG16 model was initialized with pre-trained weights obtained from the ImageNet [35] dataset, with its final fully connected layers excluded so that layers designed for the specific tasks under consideration could be added. The output of the VGG16 base model was flattened and passed through a fully connected layer of 256 neurons with ReLU activation, followed by a dropout of 0.5 to prevent overfitting. For the binary gender classification task, a final output layer with a single neuron and sigmoid activation was added, while for the age regression task, an output layer with a single neuron and linear activation was used. The ResNet50 model was initialized with pre-trained weights obtained from the UTK [36] dataset.
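
A Keras sketch of this head construction, consistent with the description above (the paper's exact code is not published, so the function name and minor details are ours):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vgg16_head(task: str) -> tf.keras.Model:
    """VGG16 (ImageNet weights, no top) plus the custom head described above:
    Flatten -> Dense(256, relu) -> Dropout(0.5) -> task-specific output."""
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    x = layers.Flatten()(base.output)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    if task == "gender":   # binary classification head
        out = layers.Dense(1, activation="sigmoid")(x)
    else:                  # age regression head
        out = layers.Dense(1, activation="linear")(x)
    return models.Model(base.input, out)
```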

The input images were resized to 224 \(\times\) 224 pixels. Each network underwent two training sessions with identical configurations, one with visible images and another with thermal images, utilizing Transfer Learning (TL). A subject-exclusive split of the dataset was conducted, allocating 240 images for training and 60 for testing per spectrum. The training and testing pipeline is illustrated in Fig. 3. The selected VGGNet is pre-trained on ImageNet, while the ResNet50 architecture is pre-trained on the UTK dataset. Since our study aims to compare the performance of visible and thermal data for soft biometric estimation, each of the selected networks is fine-tuned on the training set of the LVT dataset in two different manners, resulting in two new versions: Version A was fine-tuned with visible data and Version B with thermal data. We then tested Version A on the visible test data and Version B on the thermal test data, ensuring a fair comparison between the visible and thermal spectra for each soft biometric trait studied.
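
A subject-exclusive split can be sketched as follows; this is our own illustration of the principle (no subject appears in both partitions), with a generic test fraction rather than the paper's exact 240/60 allocation:

```python
import numpy as np

def subject_exclusive_split(subject_ids, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) such that no subject
    contributes images to both the training and the test partition."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    test_mask = np.isin(subject_ids, list(test_subjects))
    return np.where(~test_mask)[0], np.where(test_mask)[0]
```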

The VGG16 networks were trained for 20 epochs using the Adam optimizer with a learning rate of 0.001; the binary cross-entropy loss function was selected for gender classification, while mean squared error was employed for age estimation. Each ResNet50 model was re-trained for 10 epochs, followed by an additional 10 epochs for training the final regression layer. During each TL step, the first 20 layers were frozen. The Adam optimizer was used with a learning rate of 0.01, and the Huber loss function was selected with \(\delta =1\).
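
Putting the stated hyperparameters together, a hedged sketch (it reuses the hypothetical build_vgg16_head from above, and `resnet` stands in for the weight estimator; the paper initializes it from UTK-pretrained weights, which would be loaded separately):

```python
import tensorflow as tf

# Gender/age: VGG16 heads trained with Adam(1e-3), BCE or MSE, 20 epochs.
gender_model = build_vgg16_head("gender")
age_model = build_vgg16_head("age")
gender_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                     loss="binary_crossentropy", metrics=["accuracy"])
age_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="mse")

# Weight: ResNet50 with a single regression output; weights=None here is a
# placeholder for the UTK-pretrained weights used in the paper.
resnet = tf.keras.applications.ResNet50(weights=None, classes=1,
                                        classifier_activation=None,
                                        input_shape=(224, 224, 3))
for layer in resnet.layers[:20]:   # freeze the first 20 layers per TL step
    layer.trainable = False
resnet.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2),
               loss=tf.keras.losses.Huber(delta=1.0))
```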

5 Experiments

In this section, we present a comparative study of state-of-the-art networks for soft biometric estimation using visible versus thermal images as input data. Additionally, we compare the thermal and visible domains across the facial variations introduced in our dataset, reflecting the performance of both modalities in practical scenarios. Results are presented in Tables 3, 4, and 5, where the best performance per modality is highlighted in bold.

Table 3 Accuracy of VGG-16 for gender estimation in thermal and visible spectra face images from the LVT test set
Table 4 Performance of VGG-16 for age estimation in thermal and visible spectra face images from the LVT test set
Table 5 Performance of ResNet50 for weight estimation in thermal and visible spectra face images from the LVT test set

It has been shown that bone, muscle, and body fat do not conduct heat equally [37]. Heat emission patterns can therefore be used to characterize a person, as they provide information about the location of major blood vessels, skeleton thickness, amount of tissue, and muscle and fat distribution (see Note 2). Additionally, it is known that male and female bodies differ in bone mineral and muscle density, and their facial appearance differs even at the same weight [38]. We therefore believe thermal imagery gives access to crucial information for the various soft biometric tasks considered in this research.

In Table 3, we present the results of the VGG16 network trained on the visible (VIS) and thermal (TH) images of our dataset for gender classification. We observe that under neutral conditions (N), where studio lights are used, visible data outperform thermal imagery. This trend continues when occlusions in the form of eyeglasses (O) are present: in the thermal spectrum, glasses act as opaque barriers, causing a loss of information around the eyes and penalizing this modality. However, in scenarios where no dedicated light sources are applied, i.e., ambient light (A), VGG16 trained with thermal data delivers better results.

Table 4 displays the results of the VGG16 networks trained for age estimation. In contrast to gender estimation, thermal imagery exhibits clear superiority across all data variations. Even in the presence of eyeglasses, the error is only slightly higher compared to other variations.

Finally, Table 5 presents the results of the weight estimation network. As with age estimation, the metrics demonstrate that ResNet50 performs better in weight estimation when using thermal data, particularly in the N and A conditions. This confirms the potential of thermal imagery in capturing hidden and detailed information from human faces, especially for age and weight estimation using the VGG and ResNet50 architectures, respectively.

6 Conclusion

This article introduces the LVT Face dataset for face biometrics, containing visuals from 52 subjects captured under various conditions. The dataset comprises 306 visible and 306 thermal images, along with 204 visible and 204 thermal videos collected simultaneously using a paired camera (FLIR Duo R). This setup allows for the comparison or fusion of the different data types. The acquired visuals are associated with metadata covering both biometric and health-related information. To the best of our knowledge, this is the first dataset providing visible and thermal face images and recordings accompanied by gender, age, body temperature, SpO2, blood pressure, heart rate (resting and after physical activity), height, weight, BMI, and 11 additional health metrics. The extensive annotations for each subject aim to unlock the potential of thermal data for assessing a person’s health status. Furthermore, experiments conducted on this novel dataset demonstrate the feasibility of estimating three biometric traits, gender, age, and weight, from facial thermal data. We partition the test set into three subsets based on the three variabilities present in the LVT dataset: studio lights, occlusion in the form of eyeglasses, and ambient light. The results highlight the advantages of thermal imagery, especially for age and weight estimation from faces, and demonstrate that thermal imaging is superior when no dedicated light sources are used, as in ambient light conditions. Building on these promising results, future work will explore thermal imagery not only as an alternative but also as a complement to visible data, and will investigate the estimation of other parameters, such as SpO2 or height, from thermal depictions.

Availability of data and materials

To allow other scholars to reproduce our results, we have made the LVT Face dataset publicly available. More information can be found on the webpage https://lvt.eurecom.fr/. A download link for the compressed dataset and a password for decrypting the compressed LVT ZIP files will be provided upon receipt of the duly signed license agreement. Please fill in the license agreement and send a scanned copy by e-mail to lvt@eurecom.fr.

Notes

  1. https://lvt.eurecom.fr/

  2. https://biometrics.mainguet.org/types/face.htm#thermogram.

References

  1. A. Dantcheva, C. Velardo, A. D’Angelo, J.-L. Dugelay, Bag of soft biometrics for person identification. Multimedia Tools Appl. 51, 739 (2011)

  2. A.K. Jain, S.C. Dass, K. Nandakumar, Soft biometric traits for personal recognition systems, in International conference biometric authentication. ed. by A.K. Jain (Springer, Berlin, 2004), pp.731–738

  3. K. Mallat, J.-L. Dugelay, A benchmark database of visible and thermal paired face images across multiple variations. In: 2018 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–5 (2018). IEEE

  4. A. Ross, S. Banerjee, A. Chowdhury, Deducing health cues from biometric data. Comput. Vis. Image Understanding 221, 103438 (2022)

  5. A. Dantcheva, F. Bremond, P. Bilinski, Show me your face and i will tell you your height, weight and body mass index. In: 2018 24th International Conference on Pattern Recognition (ICPR) (2018). IEEE

  6. M. Wu, Exploiting micro-signals for physiological forensics. In: Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security, pp. 1–1 (2020)

  7. H. Rahman, M.U. Ahmed, S. Begum, P. Funk, Real time heart rate monitoring from facial rgb color video using webcam. In: The 29th Annual Workshop of the Swedish Artificial Intelligence Society (SAIS), 2–3 June 2016, Malmö, Sweden (2016). Linköping University Electronic Press

  8. Y. Lu, C. Wang, M.Q.-H. Meng, Video-based contactless blood pressure estimation: A review. In: 2020 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 62–67 (2020). IEEE

  9. Y. Akamatsu, Y. Onishi, H. Imaoka, Blood oxygen saturation estimation from facial video via dc and ac components of spatio-temporal map. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5 (2023). IEEE

  10. M.J. Eddine, J.-L. Dugelay, Gait3: an event-based, visible and thermal database for gait recognition. In: 2022 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–5 (2022). IEEE

  11. M. Rai, T. Maity, R. Yadav, Thermal imaging system and its real time applications: a survey. J. Eng. Technol. 6(2), 290–303 (2017)

  12. A. Kuzdeuov, D. Koishigarina, D. Aubakirova, S. Abushakimova, H.A. Varol, Sf-tl54: a thermal facial landmark dataset with visual pairs. In: 2022 IEEE/SICE International Symposium on System Integration (SII), pp. 748–753 (2022). IEEE

  13. D. Anghelone, C. Chen, P. Faure, A. Ross, A. Dantcheva, Explainable thermal to visible face recognition using latent-guided generative adversarial network. In: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), pp. 1–8 (2021). IEEE

  14. X. Chen, P.J. Flynn, K.W. Bowyer, Visible-light and infrared face recognition. In: Workshop on Multimodal User Authentication, p. 48 (2003). Citeseer

  15. S. Wang, Z. Liu, S. Lv, Y. Lv, G. Wu, P. Peng, F. Chen, X. Wang, A natural visible and infrared facial expression database for expression recognition and emotion inference. IEEE Trans. Multimed. 12(7), 682–691 (2010)

  16. T. Gault, A. Farag, A fully automatic method to extract the heart rate from thermal video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 336–341 (2013)

  17. K. Panetta, Q. Wan, S. Agaian, S. Rajeev, S. Kamath, R. Rajendran, S.P. Rao, A. Kaszowska, H.A. Taylor, A. Samani et al., A comprehensive database for benchmarking imaging systems. IEEE Trans. Pattern Anal. Mach. Intell. 42(3), 509–520 (2018)

  18. C. Barbosa Pereira, M. Czaplik, V. Blazek, S. Leonhardt, D. Teichmann, Monitoring of cardiorespiratory signals using thermal imaging: a pilot study on healthy human subjects. Sensors 18(5), 1541 (2018)

  19. M. Abdrakhmanova, A. Kuzdeuov, S. Jarju, Y. Khassanov, M. Lewis, H.A. Varol, Speakingfaces: a large-scale multimodal dataset of voice commands with visual and thermal video streams. Sensors 21(10), 3465 (2021)

  20. D. Poster, M. Thielke, R. Nguyen, S. Rajaraman, X. Di, C.N. Fondje, V.M. Patel, N.J. Short, B.S. Riggan, N.M. Nasrabadi et al., A large-scale, time-synchronized visible and thermal face dataset. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1559–1568 (2021)

  21. C. Chen, A. Ross, Evaluation of gender classification methods on thermal and near-infrared face images. In: 2011 International Joint Conference on Biometrics (IJCB), pp. 1–8 (2011). IEEE

  22. M. Abouelenien, V. Pérez-Rosas, R. Mihalcea, M. Burzo, Multimodal gender detection. In: Proceedings of the 19th ACM International Conference on Multimodal Interaction, pp. 302–311 (2017)

  23. N. Vetrekar, A. Naik, R. Gad, Cross-spectral gender classification using multi-spectral face imaging. In: Journal of Physics: Conference Series, vol. 1921, p. 012048 (2021). IOP Publishing

  24. N. Narang, T. Bourlai, Gender and ethnicity classification using deep learning in heterogeneous face recognition. In: 2016 International Conference on Biometrics (ICB), pp. 1–8 (2016). IEEE

  25. M.A. Farooq, H. Javidnia, P. Corcoran, Performance estimation of the state-of-the-art convolution neural networks for thermal images-based gender classification system. J. Electron. Imaging 29(6), 063004–063004 (2020)

  26. K.S. Nair, S. Sarath, Illumination invariant non-invasive heart rate and blood pressure estimation from facial thermal images using deep learning. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7 (2021). IEEE

  27. N. Mirabet-Herranz, K. Mallat, J.-L. Dugelay, Deep learning for remote heart rate estimation: a reproducible and optimal state-of-the-art framework. In: International Conference on Pattern Recognition, pp. 558–573 (2022). Springer

  28. B. Lokendra, G. Puneet, And-rppg: A novel denoising-rppg network for improving remote heart rate estimation. Comput. Biol. Med. 141, 105146 (2022)

  29. X. Niu, H. Han, S. Shan, X. Chen, Synrhythm: learning a deep heart rate estimator from general to specific. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3580–3585 (2018). IEEE

  30. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations (ICLR 2015) (2015). Computational and Biological Learning Society

  31. D. Gyawali, P. Pokharel, A. Chauhan, S.C. Shakya, Age range estimation using mtcnn and vgg-face model. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–6 (2020). IEEE

  32. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  33. N. Mirabet-Herranz, K. Mallat, J.-L. Dugelay, New insights on weight estimation from face images. In: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1–6 (2023). IEEE

  34. N. Mirabet-Herranz, J.-L. Dugelay, Lvt face database: A benchmark database for visible and hidden face biometrics. In: 2023 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–6 (2023). IEEE

  35. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). IEEE

  36. Z. Zhang, Y. Song, H. Qi, Age progression/regression by conditional adversarial autoencoder. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). IEEE

  37. M. Morley, Thermal conductivities of muscles, fats and bones. Int. J. Food Sci. Technol. 1(4), 303–311 (1966)

  38. D. Han, J. Zhang, S. Shan, Leveraging auxiliary tasks for height and weight estimation by multi task learning. In: 2020 IEEE International Joint Conference on Biometrics (IJCB), pp. 1–7 (2020). IEEE

Acknowledgements

Not applicable.

Funding

This work has been partially supported by the European CHIST-ERA program via the French National Research Agency (ANR) within the XAIface project (grant agreement CHIST-ERA-19-XAI-011).

Author information

Contributions

Nelida Mirabet-Herranz worked on the dataset collection, model implementation and results. Jean-Luc Dugelay provided assistance on the dataset and methodology design and supervision. Both authors worked on the final manuscript.

Corresponding author

Correspondence to Nelida Mirabet-Herranz.

Ethics declarations

Competing interests

The co-author of this paper, Jean-Luc Dugelay, is the founding Editor-in-Chief of this journal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Mirabet-Herranz, N., Dugelay, JL. Beyond the visible: thermal data for facial soft biometric estimation. J Image Video Proc. 2024, 27 (2024). https://doi.org/10.1186/s13640-024-00640-5
