 Review
 Open Access
Age estimation via face images: a survey
EURASIP Journal on Image and Video Processing volume 2018, Article number: 42 (2018)
Abstract
Facial aging adversely affects the performance of face recognition, face verification, and authentication systems that rely on facial features. This stochastic, personalized, and inevitable process poses dynamic theoretical and practical challenges to the computer vision and pattern recognition community. Age estimation is the task of labeling a face image with an exact age or an age group. How do humans recognize faces across ages? Do they learn the aging pattern or use age-invariant features? What are the age-invariant features that uniquely identify a person across ages? These and related questions have attracted significant interest in the computer vision and pattern recognition research community. In this paper, we present a thorough analysis of recent research in aging and age estimation. We discuss popular algorithms used in age estimation, existing models and how they compare with each other, the performance of various systems and how they are evaluated, challenges in age estimation, and insights for future research.
Introduction
You can never see the same face twice. This statement holds because facial appearance varies continually under the influence of several factors, including pose, facial expression, head profile, illumination, aging, occlusion, mustache, beard, makeup (cosmetics), and hairstyle. Major factors that influence facial aging include gravity, exposure to ultraviolet (UV) rays from the sun, maturity of soft tissues, bone restructuring, and facial muscular activities [1]. These factors cause variations in facial appearance. For instance, a face seen under blue light looks markedly different from the same face seen under red light. Another factor that constantly and permanently causes variations in facial appearance is age. Aging is an inevitable stochastic process that affects facial appearance. It involves variations in both the soft tissues and the bony structure of the human face. A face seen at one age can look very different from the face of the same individual at a different age. Therefore, these age-induced variations could be learned and used to estimate facial age.
The human face provides prior perceptible information about one’s age, gender, identity, ethnicity, and mood. Alley [2] asserts that attributes derived from human facial appearance, like mood and perceived age, significantly impact interpersonal behavior and are considered essential contextual cues in social networks [3, 4]. Information rendered by the human face has attracted significant attention in the face image processing research community. Image-based age and age-group estimation in particular has attracted enormous research interest because of its many application areas, such as age-invariant face recognition and face verification across age, among other commercial and law enforcement areas [5–9]. Age estimation has been extensively studied with the aim of finding aging patterns and variations and determining how best to characterize an aging face for accurate age estimation.
Age estimation research has gained significant attention in recent years, with many journal and conference papers published annually and Masters and PhD theses defended [10]. Age estimation is the technique of automatically labeling a human face with an exact age or an age group. This age can be the actual age, appearance age, perceived age, or estimated age [11]. Actual age is the number of years a person has accumulated since birth, denoted as a real number. Appearance age and perceived age are estimated from the visual age information portrayed on the face, while estimated age is a subject’s age estimated by a machine from facial visual appearance. Appearance age is assumed to be consistent with actual age, although there are variations due to the stochastic nature of aging among individuals. Estimated age and perceived age are defined on visual artifacts of appearance age. There have been relatively few publications on age and age-group estimation [11]. This could be attributed to age estimation not being a classical classification problem: it can be approached as a multiclass classification problem, as a regression problem, or as a hierarchical ensemble of both. Another reason that could be affecting research in age estimation is the difficulty of collecting a large database with chronological images of each subject. The prolific and diverse information conveyed by faces also makes it hard to capture the specific attributes of aging variations accurately [11]. Uncontrollable and personalized age progression information displayed on faces further complicates the age estimation problem [12–14].
Facial aging
Aging is a stochastic, uncontrollable, inevitable, and irreversible process that causes variations in facial shape and texture. Although aging is stochastic with different people having different aging patterns, there are some general variations and similarities that can be modeled [15, 16]. There are two stages in human life that are distinct with regard to facial growth: formative or childhood stage and adulthood or aging stage [17].
Aging introduces significant change in facial shape in the formative years and relatively large texture variations, with only minor change in shape, in older age groups [11, 18]. Shape variations in younger age groups are caused by craniofacial growth. Craniofacial studies have shown that human faces change from circular to oval as one ages [19]. These changes lead to variations in the position of fiducial landmarks [20]. During craniofacial development, the forehead slopes back, releasing space on the cranium. The eyes, ears, mouth, and nose expand to cover the interstitial space created. The chin becomes protrusive as the cheeks extend. Facial skin remains relatively unchanged compared with shape. More literature on craniofacial development is found in [16].
As one ages, facial blemishes like wrinkles, freckles, and age spots appear. Underneath the skin, melanin-producing cells are damaged by exposure to the sun’s ultraviolet (UV) rays. Freckles and age spots appear due to overproduction of melanin. Consequently, light-reflecting collagen not only decreases but also becomes non-uniformly distributed, making facial skin tone non-uniform [1]. The parts most adversely affected by sunlight are the upper cheek, nose, nose bridge, and forehead.
The most visible variations from adulthood to old age are skin variations exhibited as texture change. There is still minimal facial shape variation in these age groups. Biologically, as the skin grows old, collagen underneath the skin is lost [11]. Loss of collagen and the effect of gravity make the skin darker, thinner, leathery, and less elastic. Facial spots and wrinkles appear gradually. The framework of bones beneath the skin may also start deteriorating, leading to accelerated development of wrinkles and variations in skin texture. These variations in shape and texture across ages could be modeled and used to automatically estimate someone’s age. We refer readers to [16] for more details on facial aging, including aging in adulthood. Facial aging has three unique attributes [13]:

1. Aging is inevitable and uncontrollable. No one can avoid, advance, or delay aging. The aging process is slow but irreversible.
2. Aging patterns are personalized. People age differently. An individual’s aging pattern depends on his or her genetic makeup as well as on various extrinsic factors such as health, environmental conditions, and lifestyle.
3. Achieved aging patterns are temporal. Facial variations caused by aging are not permanent. Furthermore, facial variation at a particular point in time affects future appearance but does not affect earlier appearance of the same face.
These facial aging attributes, among other factors, make automatic age estimation a difficult and challenging task. Since individuals cannot voluntarily control aging, collecting data for automatic age estimation is hard. This problem was slightly alleviated by the release of the FG-NET Aging Dataset [21] in 2002. Although this dataset has images of subjects at different ages, several images are missing, which makes the aging patterns incomplete. Fortunately, a complete aging face dataset is not needed, since humans, whom computers try to mimic, also learn to process face image patterns from incomplete patterns. An age estimation technique should be capable of handling various aging patterns, since each individual has his or her own aging pattern.
Image-based age and age-group estimation has many application areas, including age-invariant face recognition, face verification across ages, commercial and law enforcement areas [5–9], security control and surveillance [11, 22], age-based image retrieval [23], biometrics [11, 24, 25], human-computer interaction [26, 27], and electronic customer relationship management (ECRM) [11]. The main aim of studying age estimation is to find aging patterns and variations in facial appearance and to determine how best to characterize an aging face for accurate age estimation. Although this problem has attracted significant research, automatic age estimation accuracies are still far below human accuracy.
Age estimation application areas
Characterizing variations in facial appearance across age has many significant real-world applications. Computer-based age estimation is useful in situations where one’s age is to be determined. There are several application areas for age estimation including the following:
3.0.1 Age simulation
Characterization of facial appearance at different ages could be used effectively in simulating or modeling one’s appearance at a particular point in time. Ages estimated at different times could help in learning the aging pattern of an individual, which could assist in simulating the facial appearance of the individual at some unseen age. More details on facial aging simulation can be found in [28, 29]. By observing aging patterns at different ages, unseen appearance could be simulated and used to find missing persons.
3.0.2 Electronic customer relationship management (ECRM)
ECRM [11] is the use of Internet-based technologies such as websites, emails, forums, and chat rooms to manage differentiated interactions with clients effectively and to communicate with them individually. Customers of different ages may have diverse preferences and expectations of a product [30]. Therefore, companies may use automatic age estimation to monitor market trends and customize their products and services to meet the needs and preferences of customers in different age groups. The problem here is how to acquire and analyze substantive personal data from all client groups without infringing on their privacy rights. With automatic age estimation, a camera can take pictures of clients and automatically estimate their age groups in addition to collecting demographic data.
3.0.3 Security and surveillance
Age estimation can be used in surveillance and monitoring of alcohol and cigarette vending machines and bars to prevent minors from accessing alcoholic drinks and cigarettes, and to restrict children’s access to adult websites and movies [23, 31]. Age estimation can also be significant in controlling ATM money transfer fraud by monitoring a particular age group that is prone to the vice [11]. It can also be used to improve the accuracy and robustness of face recognition, hence improving homeland security. In healthcare systems, age estimation can support robotic nurses and doctors’ expert systems that provide customized medical services. For instance, a customized avatar can be automatically selected from a database for interacting with patients from various age groups depending on their preferences.
3.0.4 Biometrics
Age estimation via faces is a soft biometric [32] that can be used to complement biometric techniques like face recognition, fingerprints, or iris recognition in order to improve recognition, verification, or authentication accuracies. Age estimation can be applied in age-invariant face recognition [10], iris recognition, hand geometry recognition, and fingerprint recognition in order to improve the accuracy of a hard (primary) biometric system [11].
3.0.5 Employment
Some government jobs, such as those in the military and police, have age requirements. Age estimation systems could be used to determine the age of recruits during the recruitment process. It is also a policy of several governments that employees should retire after reaching a particular age. Age estimation systems could also play a significant role in determining whether one has reached retirement age.
3.0.6 Content access
With the proliferation of diverse content on television (TV) and the Internet, age estimation can be used to restrict children’s access to unsuitable content. A camera could be mounted on a TV to monitor the people watching it, such that the TV switches off if unsuitable content is being streamed while the viewers are children.
3.0.7 Missing persons
The role of age estimation in age simulation goes a step further in aiding the identification of missing persons. Age simulation can be used to identify older people from their earlier images.
Factors affecting facial aging
Facial aging is affected by several factors, including lifestyle, natural, occupational, psychological, and environmental factors. These factors can be categorized as intrinsic or extrinsic. Extrinsic factors are those that are external to the human body, like environmental and occupational factors, while intrinsic factors are internal ones, like bone structure and genetic influence, which occur naturally over time [1, 33]. In childhood, facial changes are mainly caused by craniofacial development, which leads to changes in facial shape [16] due to growth, modeling, and deposition of bony tissues in the face. This leads to changes in the height and shape of the face [34]. The forehead slopes back, releasing space on the cranium. Drifting and expansion of facial landmarks to occupy this space cause variations in facial shape in childhood. In adulthood, facial aging is mainly manifested in texture variations, which are caused by a wide variety of factors.
Taister et al. [34] found that general exposure to wind and arid air influences facial aging. An arid environment and wind dehydrate the skin, leading to wrinkle formation. Air pollution has also been found to affect aging by accelerating wrinkle development [35–37]. Research on air pollution and aging has shown that city dwellers who are exposed to air pollution from industries develop deeper wrinkles than individuals who are not exposed to pollution. The influence of smoking on aging has also been cited in [34, 38–40], although [41] asserts that smoking has a negligible effect on facial wrinkling compared to the effect of UV rays. Smoking interrupts skin microvasculature, which affects elastin and collagen production and functioning and leads to wrinkles around the mouth, but photoaging leads to more facial wrinkling than smoking does [34, 41]. It is therefore evident that facial skin aging does not provide an objective measure of cumulative exposure to UV rays. Taister et al. [34] also assert that exposure to drugs and psychological stress affects skin texture and pigmentation, making the skin complexion spotted and blemished.
Exposure to ultraviolet (UV) rays influences production of collagen, making the skin darker. UV rays dry and destroy cells and the underlying skin structure, giving the skin a furrowed and thickened appearance and hastening the development of wrinkles, especially around the eyes due to squinting [42]. Long exposure to UV rays leads to photoaging effects such as skin wrinkling, elastosis, actinic keratosis, and irregular pigmentation [43]. With long exposure to UV rays, skin texture and color change, becoming blotchy, yellowish, leathery, loose, inelastic, and hyperpigmented. Blood veins close to the skin surface become protrusive, forming a “spider vein” network in addition to an overall speckled skin appearance [44]. Naturally, with lower production of collagen and elastin, the skin becomes leathery and less elastic. Fat cells begin to disappear, leading to skin sagging. Fat deposits in some areas like the eye lobe region also affect skin texture. The force of gravity makes the skin leathery and less elastic, hence accelerating skin wrinkling.
Internally, changes in bone structure and subsequent variations in musculature cause skin wrinkling [16]. Loss of skin elasticity makes the skin leathery leading to formation of wrinkles [45]. Aging was also found to be different between males and females with female faces tending to age faster compared to male faces [16].
Aging in males and females shares many common characteristics, but there are some differences. Although it is generally acknowledged that females age faster than males, it is not yet clear whether these gender differences are caused by the rate of aging or by sexual dimorphism [16]. Investigation into differences in aging between males and females is necessary [46]. Differences in male facial aging include manifestation of facial hair like beards, increased thickness, facial vascularity, sebaceous content, and potential differences in fat and bone absorption rates [47]. Development of deeper wrinkles around the perioral region is more pronounced in women than in men [47], since women’s skin has fewer appendages than men’s [48]. Some women look younger than their actual age, have large lips, and are genetically protected from wrinkle and gray hair development [49].
Other factors affecting perceived facial aging include diet, genetic makeup, ethnicity (race), skin infections, and cosmetics. Cosmetics are generally used to conceal the perceived age of an individual by hiding wrinkles and age spots and brightening wrinkle shadows around the eyes, mouth, and nose regions [50]. Chen et al. [50] found that facial makeup significantly impacts age estimation. Guo and Wang [51] and Nguyen et al. [52] investigated the effect of facial expression on age estimation. Through quantitative evaluations on the Lifespan [53] and FACES [54] datasets, Guo and Wang [51] found that facial expression influences age estimation. The same findings were reported by Nguyen et al. [52]. Voelkle and Ebner [55] investigated the effect of age, gender, and facial expression on perceived age. They found that facial expression influences age estimation, with the ages of faces showing happy expressions being underestimated the most. Some facial expressions like smiling, frowning, surprise, and laughing may introduce wrinkle-like lines on some regions of the face like the forehead, cheek bone area, mouth region, and nose-bridge region. These wrinkle-like lines may be registered as wrinkles during age estimation, hence affecting age estimation performance.
Image representation for age modeling
In this section, we present different approaches used for image representation for age estimation. Age estimation can be modeled using anthropometric data, active appearance model (AAM) parameters, aging pattern subspace (AGES), manifold learning, appearance features, or a hybrid of two or more modeling techniques. We present an overview of these modeling techniques in the subsequent sections.
Anthropometric models
Anthropometric modeling of facial aging focuses on distance measurements between facial points. Face anthropometry is the study of measuring sizes and proportions on human faces [56]. Farkas [56] defined face anthropometry based on measurements taken from 57 landmark points on human faces. Figure 1 shows some of the points used to describe a face. Landmark points are identified by abbreviation of their respective anatomical names. For instance, the eye inner corner is en for endocanthion while front of the ear is t for tragion.
Farkas defined five measurements between landmarks: shortest distance, axial distance, tangential distance, angle of inclination, and angle between locations. Figure 2 shows sample measurements of these distances.
A total of 132 facial measurements were defined by Farkas [56], whereby some corresponding measurements on the left and right of the face were paired. The measurements can be taken by hand by experienced anthropometrists or with 3D scanners [56–58].
Facial measurements could be taken at different ages for instance from childhood to old age. Ratios of distances between facial landmarks like the eyes, nose, mouth, ear, chin, and forehead are measured across age. Facial measurements are used to determine the aging pattern of an individual at a particular age and hence used to discriminate between ages and age groups. This approach embraces studies in craniofacial development theory [2].
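As a rough illustration of the ratio-based measurements described above, the following Python sketch computes a few distance ratios from 2D landmark coordinates. The landmark names, the coordinates, and the particular ratios are hypothetical choices made for illustration; they are not the measurements defined by Farkas [56].

```python
import math

def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def facial_ratios(landmarks):
    """Return scale-invariant ratios of inter-landmark distances.

    `landmarks` maps a landmark name to an (x, y) pixel coordinate.
    Names and ratios are illustrative assumptions, not a standard set.
    """
    left, right = landmarks["en_left"], landmarks["en_right"]
    eye_dist = distance(left, right)
    eye_mid = ((left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0)
    eye_to_nose = distance(eye_mid, landmarks["nose_tip"])
    eye_to_mouth = distance(eye_mid, landmarks["mouth_center"])
    eye_to_chin = distance(eye_mid, landmarks["chin"])
    return {
        "eye_dist/eye_to_nose": eye_dist / eye_to_nose,
        "eye_dist/eye_to_mouth": eye_dist / eye_to_mouth,
        "eye_to_nose/eye_to_chin": eye_to_nose / eye_to_chin,
    }

# Hypothetical frontal-face landmarks in pixel coordinates.
example = {
    "en_left": (40.0, 50.0), "en_right": (80.0, 50.0),
    "nose_tip": (60.0, 80.0), "mouth_center": (60.0, 100.0),
    "chin": (60.0, 130.0),
}
ratios = facial_ratios(example)
```

Because ratios are used rather than raw distances, the features do not change when the image is rescaled, although they remain sensitive to head pose.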
Craniofacial development theory uses the cardioidal strain transformation, a mathematical model, to describe a person’s facial growth from infancy to adulthood. This model defines a circle and tracks facial growth through variations in the radius of the circle as
\(R^{\prime } = R\left (1 + k\left (1 - \cos \theta \right)\right)\) (1)
where R is the initial radius of the circle, θ is the initial angle formed with the vertical axis, k is a parameter that increases with time, and R^{′} is the radius of the circle after successive growth over time. Figure 3 shows simulated face profiles using cardioidal strain transformations.
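The growth model can be illustrated with a short Python sketch. It assumes the commonly cited form of the cardioidal strain transformation, R′ = R(1 + k(1 − cos θ)) with θ left unchanged and measured from the vertical axis; the sample profile points and the value of k are hypothetical.

```python
import math

def cardioidal_strain(points, k):
    """Apply R' = R * (1 + k * (1 - cos(theta))) to profile points.

    `points` are (x, y) coordinates relative to the circle's origin.
    theta is measured from the vertical (y) axis and is not changed
    by the transformation (assumed form of the model).
    """
    grown = []
    for x, y in points:
        r = math.hypot(x, y)
        theta = math.atan2(x, y)  # angle from the vertical axis
        r_new = r * (1.0 + k * (1.0 - math.cos(theta)))
        grown.append((r_new * math.sin(theta), r_new * math.cos(theta)))
    return grown

# Hypothetical points on a unit circle: top of head, side, and chin.
profile = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0)]
grown = cardioidal_strain(profile, k=0.1)
```

Under this form, the point at the top of the head (θ = 0) does not move, while points toward the chin (θ = π) move the most, which is consistent with a face elongating downward as k increases.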
The mathematical formulation in Eq. 1 is not commonly used for age estimation because it does not encode head profile, especially in adults [59], and head profiles are hard to estimate from 2D facial images [11]. Furthermore, anthropometric models cannot be used for age modeling in adult and old-age face images since there are no significant changes in facial shape at these stages. This approach is also only appropriate for frontal face images, since distances between landmarks are sensitive to head pose. This modeling technique has not been evaluated on a large publicly available database; the few studies reported in the literature work on small private datasets. Another limitation of this approach is that it only considers distances between facial landmarks, with no consideration of facial appearance. Measurements and landmark points defined by Farkas in [56], which often guide anthropometric modeling, are from people in one ethnic group (European) and may not be representative of all other races.
Active shape models
Active shape model (ASM) [60] is a statistical model that characterizes the shape of an object. ASM builds a model by learning patterns of variability from a training set of correctly annotated images. ASMs are able to capture the natural variability of images of the same class, unlike active contour models (ACMs) [61]. ASMs are specific to images of the class of objects they represent. The shape of a face image is denoted by a collection of landmark points. Good choices for landmark points are points at clear corners of the face and on facial landmark boundaries. These points can be determined using an appropriate 2D landmarking algorithm like the one proposed in [62]. The sets of points are automatically aligned to reduce the variance in distance between equivalent points. The number of landmark points must be adequate to show the overall shape of the face images. Each face is then represented by a predefined number of landmark points depending on the complexity of the facial shape and the desired level of descriptive information. A point distribution model (PDM) is derived by examining the spatial statistics of the labeled points. The PDM gives the mean locations of the points and a set of parameters that control the main modes of variability found in the training set.
Given such a model and a test image, image interpretation involves choosing values for each of the parameters such that the best fit of the model to the image is found. ASM allows an initial rough guess of the best shape, orientation, scale, and position, which is refined by comparing the hypothesized model instance to the image data and using the difference between model and image to deform the shape. ASM is similar to AAM but differs in the sense that instances in ASM can only deform according to variations found in the training set. ASM is not commonly used in age estimation; hence, more investigations adopting this modeling strategy are necessary.
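The point distribution model described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration that assumes the shapes are already aligned; the function names, the 95% variance threshold, and the toy four-landmark shapes are assumptions made for the example rather than details taken from [60].

```python
import numpy as np

def build_pdm(shapes, variance_kept=0.95):
    """Build a point distribution model from aligned shapes.

    `shapes` has shape (n_samples, 2 * n_landmarks); each row is one
    aligned shape flattened as (x1, y1, x2, y2, ...).
    Returns the mean shape, the retained modes (as columns), and the
    variance explained by each retained mode.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # SVD of the centered data yields the principal modes of variation.
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular ** 2 / max(len(shapes) - 1, 1)
    cumulative = np.cumsum(variances) / variances.sum()
    n_modes = int(np.searchsorted(cumulative, variance_kept) + 1)
    return mean_shape, vt[:n_modes].T, variances[:n_modes]

def synthesize(mean_shape, modes, b):
    """Generate a shape from mode weights b: x = mean + P b."""
    return mean_shape + modes @ np.asarray(b, dtype=float)

# Toy data: four landmarks whose x-coordinates spread apart or together.
base = np.array([0.0, 0.0, 4.0, 0.0, 4.0, 6.0, 0.0, 6.0])
widen = np.array([-1.0, 0.0, 1.0, 0.0, 1.0, 0.0, -1.0, 0.0])
shapes = np.array([base + t * widen for t in (-1.0, -0.5, 0.0, 0.5, 1.0)])
mean_shape, modes, variances = build_pdm(shapes)
new_shape = synthesize(mean_shape, modes, [2.0 * np.sqrt(variances[0])])
```

Restricting the weights b to a few standard deviations of each mode is what keeps model instances within the variation seen in the training set.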
Active shape model has the following limitations [63]:

1. It results in poor matching of boundaries in an image due to the parametric description of shape. It is not robust when new images are introduced. This leads to problems during subsequent image analysis.
2. It needs many landmark points and training samples to represent shape and its variations. This makes ASM costly and time consuming during training.
3. Its segmentation results are sensitive to the local search region around landmarks.
Active appearance model
Active appearance models (AAMs) [64] are statistical facial image coding models. Using principal component analysis (PCA), AAM learns shape model and intensity model from a set of training images. AAMs have been used extensively in modeling facial shape for face recognition, face verification, age estimation, and gender estimation among other tasks. AAM considers both facial shape and texture unlike anthropometric models that consider shape parameters only. This makes AAMs appropriate for age estimation modeling at all stages from infancy to old age. Labeling each test image with a definite age label from continuous age range makes AAM approaches give precise age estimations [11].
Annotated sets of training images marked with points defining facial main features are needed to build AAM. Figure 4 shows a sample of annotated face and points used for annotation.
These points can be determined using an appropriate 2D landmarking algorithm like the one proposed in [62]. These sets of points are represented as a vector and aligned before a statistical shape model is built. Each training image is then warped so that the annotated points match the points of the mean shape, which yields a shape-free image patch. The shape-free raster is pushed into a texture vector, g, which is normalized by applying a linear transformation, \(g \gets \frac {\left (g - \mu _{g}\mathbf {1}\right)}{\sigma _{g}}\), where \(\mathbf {1}\) is a vector of ones and μ_{g} and \(\sigma _{g}^{2}\) are the mean and variance of the elements of g, respectively. After normalization, \(g^{T}\mathbf {1}=0\) and \(|g|=1\). Principal component analysis (PCA) is then used to build a texture model. Finally, connections between shape and texture are learned to produce a combined appearance model as detailed in [65].
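The texture normalization step can be illustrated with a minimal Python sketch. It implements the linear transformation as written, subtracting the mean and dividing by the standard deviation of the gray-level samples; the sample values are hypothetical.

```python
import math

def normalize_texture(g):
    """Return (g - mean(g)) / std(g) for a list of gray-level samples."""
    n = len(g)
    mean = sum(g) / n
    variance = sum((v - mean) ** 2 for v in g) / n
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("constant texture vector cannot be normalized")
    return [(v - mean) / std for v in g]

# Hypothetical gray levels sampled from a shape-free patch.
normalized = normalize_texture([10.0, 20.0, 30.0, 40.0])
```

With this scaling the elements have zero mean and unit variance, so the length of the vector equals the square root of the number of samples; dividing by that length as well would give a unit-length vector.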
The generated appearance model has parameters, c, controlling the shape and texture according to:
\(x = \bar {x} + Q_{s}c\), \(g = \bar {g} + Q_{g}c\)
where \(\bar {x}\) is the mean shape, \(\bar {g}\) is the mean texture in a mean-shaped patch, and Q_{s} and Q_{g} are matrices describing the modes of variation derived from the training set. AAMs are slower than active shape models (ASMs) [60]. Details of AAM implementation can be found in [64].
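As a small illustration of how the appearance parameters drive both shape and texture, the sketch below assumes the usual linear form x = x̄ + Q_s c and g = ḡ + Q_g c. The matrices and vectors are toy values chosen for the example, not a trained model.

```python
import numpy as np

def synthesize_appearance(x_mean, g_mean, Q_s, Q_g, c):
    """Generate a shape x and texture g from appearance parameters c.

    Assumes the linear model x = x_mean + Q_s c and g = g_mean + Q_g c.
    """
    c = np.asarray(c, dtype=float)
    x = x_mean + Q_s @ c
    g = g_mean + Q_g @ c
    return x, g

# Toy model: 2 landmarks (4 coordinates), 3 texture samples, 2 parameters.
x_mean = np.array([0.0, 0.0, 1.0, 0.0])
g_mean = np.array([0.5, 0.5, 0.5])
Q_s = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
Q_g = np.array([[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
x, g = synthesize_appearance(x_mean, g_mean, Q_s, Q_g, c=[1.0, -2.0])
```

A single parameter vector c changes shape and texture together, which is what distinguishes the combined appearance model from separate shape and texture models.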
Lanitis et al. [66] extended AAM by proposing an aging function age=f(b). In this function, age is the subject’s actual age, b is an AAM-learned vector of 50 raw model parameters, and f is the aging function. The function f describes the association between an individual’s age and the vector of parameters.
AAM face encoding considers both shape and texture, unlike anthropometric techniques that only represent shape. This makes AAM approaches appropriate for age estimation, since both texture and shape features are necessary for precise age estimation. However, evidence is needed to show that aging patterns can be modeled as a quadratic function and to highlight the effect of outliers in age estimation. The active appearance model is computationally intensive. The training phase requires a substantial number of images for the model to learn robust shape and appearance features. The active appearance model uses gray-level intensities of the image to train an intensity model. Gray-level intensities may be affected by noise, leading to a weak intensity model. Performance of AAM depends on the quality of the images used. Images with significantly different background and scale inhibit model fitting, resulting in poor performance of AAM-based systems.
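An aging function of this kind can be fitted by least squares. The sketch below assumes a quadratic form without cross terms, age = w_0 + w_1^T b + w_2^T b^2, and uses synthetic two-dimensional parameter vectors in place of the 50 AAM parameters; the function names and data are hypothetical.

```python
import numpy as np

def fit_quadratic_aging_function(B, ages):
    """Fit age = w0 + w1.b + w2.(b**2) by ordinary least squares.

    B: (n_samples, n_params) model parameters; ages: (n_samples,).
    The quadratic form without cross terms is an assumption.
    """
    B = np.asarray(B, dtype=float)
    design = np.hstack([np.ones((len(B), 1)), B, B ** 2])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(ages, dtype=float),
                                 rcond=None)
    return coeffs

def predict_age(coeffs, b):
    """Evaluate the fitted aging function for one parameter vector b."""
    b = np.asarray(b, dtype=float)
    features = np.concatenate([[1.0], b, b ** 2])
    return float(features @ coeffs)

# Synthetic, noise-free data generated from a known quadratic function.
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 2))
ages = 30.0 + 5.0 * B[:, 0] - 2.0 * B[:, 1] + 1.5 * B[:, 0] ** 2
coeffs = fit_quadratic_aging_function(B, ages)
predicted = predict_age(coeffs, [1.0, 0.0])
```

With real data the fit would not be exact, and outliers can pull a least-squares fit noticeably, so a robust loss may be preferable.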
Aging pattern subspace
Geng et al. [13, 26] proposed the aging pattern subspace (AGES) for automatic age estimation using the appearance of face images. A series of an individual’s images arranged in temporal order makes up an aging pattern. An aging pattern is defined in [13] as “…a sequence of personal face images sorted in time order.” All images in a pattern must come from the same individual and must be ordered by time. An aging pattern is called a complete pattern if images at all ages are available for the individual; otherwise, it is referred to as an incomplete pattern. AGES compensates for missing ages by learning a subspace representation of one’s images when modeling a series of a subject’s aging faces. To estimate age, the test image is positioned at each possible location in the aging pattern to find the position at which it is best reconstructed. The position in the aging subspace that minimizes the reconstruction error determines the age of the test image. Figure 5 shows vectorization of an aging pattern, with missing images in the aging pattern vector marked with m. Available face images in the pattern (ages 2, 5, and 8) are placed at their respective positions, and the positions for ages at which images are not available are left blank.
After vectorization of the aging pattern, face images at ages 2, 5, and 8 are represented by feature vectors b_{2},b_{5}, and b_{8}, respectively. Representing aging pattern using AGES ensures that label age(I) and id(I) are integrated into the data whereby each pattern implies an ID and each age is fixed at a particular timeordered position in the aging pattern.
AGES has two steps: a learning step, in which the aging pattern subspace is learned, followed by age estimation. The subspace representation is obtained in the learning stage using PCA. Because age images may be missing, the reconstruction error between the available face images and the reconstructed face images is minimized by an expectation maximization (EM) iterative learning technique. The average of the available face images is used to initialize values for the missing faces. Thereafter, the mean, covariance matrix, and eigenvectors of all face images are computed. Faces are then reconstructed using the mean face and the eigenvectors. This process is repeated until the reconstruction error is sufficiently small. During age estimation, the test image is matched to the aging pattern subspace and to the position in that pattern that minimizes its reconstruction error. The position that gives the minimal reconstruction error is returned as the estimated age of the probe image. Ghost-like twisted faces are reconstructed when the test image is positioned at a wrong location in the aging pattern subspace [13, 26].
AGES was evaluated on FG-NET [21], and a mean absolute error (MAE) of 6.77 years was reported [13, 26]. This performance was superior to that of previously used approaches reported in the literature. In AGES, face images are first encoded with AAM. AGES assumes that multiple images of the same person at various ages exist, or that the aging patterns of the faces in a given training dataset are similar. This assumption may not be satisfied in aging datasets like Yamaha gender and age (YGA) [12]. Collecting a face dataset with individuals’ face images at several ages and with comparable image quality may not be possible. AAM cannot encode wrinkles on the face, since AAM only encodes image gray values without the spatial neighborhood information needed for texture pattern calculation. Intensities of individual pixels cannot describe local texture, which limits the applicability of AGES for age and age-group estimation. Techniques like the Gabor filter [67] may be appropriate for encoding wrinkle features on elderly faces.
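The learning and estimation steps can be sketched with NumPy as follows. This is a simplified illustration of the idea and not the implementation of [13, 26]: it uses plain iterative PCA imputation in place of the full EM formulation, short synthetic feature vectors instead of AAM parameters, and hypothetical function names.

```python
import numpy as np

def learn_aging_subspace(patterns, observed, n_components=1, n_iter=30):
    """Iteratively learn a PCA subspace from patterns with missing ages.

    patterns: (n_subjects, n_ages * d) concatenated per-age features.
    observed: boolean array of the same shape, True where available.
    """
    # Initialise missing entries with the average of the available ones.
    filled = np.where(observed, patterns, 0.0)
    column_mean = filled.sum(axis=0) / np.maximum(observed.sum(axis=0), 1)
    X = np.where(observed, patterns, column_mean)
    for _ in range(n_iter):
        mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        W = vt[:n_components].T  # eigenvectors as columns
        reconstruction = mean + (X - mean) @ W @ W.T
        X = np.where(observed, patterns, reconstruction)  # refill missing only
    return mean, W

def estimate_age(b, mean, W, n_ages):
    """Place feature vector b at each age slot; return the best-fitting slot."""
    d = len(b)
    errors = []
    for a in range(n_ages):
        rows = slice(a * d, (a + 1) * d)
        y, *_ = np.linalg.lstsq(W[rows], b - mean[rows], rcond=None)
        errors.append(float(np.linalg.norm(b - mean[rows] - W[rows] @ y)))
    return int(np.argmin(errors)), errors

# Synthetic aging patterns: 30 subjects, 5 age slots, 4 features per slot.
rng = np.random.default_rng(0)
n_subjects, n_ages, d = 30, 5, 4
base = 5.0 * rng.normal(size=(n_ages, d))
direction = rng.normal(size=(n_ages, d))
z = rng.normal(size=n_subjects)
full = (base[None] + z[:, None, None] * direction[None]).reshape(n_subjects, -1)
observed = np.repeat(rng.random((n_subjects, n_ages)) < 0.7, d, axis=1)
patterns = np.where(observed, full, np.nan)  # about 30% of images missing
mean, W = learn_aging_subspace(patterns, observed)
true_age = 3
probe = base[true_age] + 0.5 * direction[true_age]
estimated_age, errors = estimate_age(probe, mean, W, n_ages)
```

In the sketch the age slots are indices 0 to 4; in practice each slot corresponds to a specific age in years, and each feature vector is far longer.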
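For reference, the mean absolute error (MAE) used to report such results is the average absolute difference between estimated and actual ages. A minimal Python sketch, with hypothetical ages, is:

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between actual and estimated ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)

# Hypothetical actual and estimated ages in years.
mae = mean_absolute_error([5, 20, 34, 60], [7, 18, 40, 55])
```

Here the absolute errors are 2, 2, 6, and 5 years, giving an MAE of 3.75 years.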
Age manifold
In age manifold, a common aging pattern is learned from images of many individuals at different ages. Several face images are used to represent each age, and each subject may be represented by one image or by several images at different ages. These images form a set, referred to as a manifold, whose members are points in a high-dimensional vector space. Age manifold learning offers a more flexible means of face representation than AGES [13]. An age manifold [68] can be used to learn a low-dimensional aging pattern from several faces at every age. Individuals may have as few as one image at each age in the dataset, which makes it simpler to collect a large facial aging dataset. Scherbaum et al. [69] proposed statistical age estimation using manifold learning on a 3D morphable model. Isosurfaces of a nonlinear support vector regression (SVR) function formed the manifold, and the aging pattern was found by identifying a trajectory orthogonal to the isosurfaces. Guo et al. [31] proposed discriminative subspace learning based on a manifold criterion for low-dimensional representation of the aging manifold. The relationship between the coded face representation and age is learned by applying regression to the aging manifold patterns. This approach consisted of two SVR stages: one for rough age-group estimation, followed by one for refined age estimation within the initially obtained age group.
Given an age-ordered image space \(X = \{x_{i}:x_{i} \in \text {I\!R}^{D}\}_{i=1}^{n}\) with image dimension D and a vector \(L = \{l_{i}:l_{i} \in \text {I\!N}\}_{i=1}^{n}\) of labels associated with the images in the image space, the objective is to learn a low-dimensional manifold in the embedded subspace, its data distribution, and its representation \(Y = \{y_{i}:y_{i} \in \text {I\!R}^{d}\}_{i=1}^{n}\) with d ≤ D, which is a direct mapping of X. Therefore, the projection from image space to manifold space can be modelled as Y = P(X,L), where P(·) denotes the projection function, which can be linear or nonlinear. Figure 6 shows a simple nonlinear projection function that maps an image space into a 2D age manifold. The respective ages are shown in the top-left corner of each image.
The objective of manifold embedding is to find a D × d matrix P that satisfies \(Y = P^{T}X\), or to find Y directly, where Y = {y_{1}, y_{2}, …, y_{n}}, X = {x_{1}, x_{2}, …, x_{n}}, P = {p_{1}, p_{2}, …, p_{d}}, and d ≤ D. PCA, locally linear embedding (LLE), and orthogonal locality preserving projections (OLPP) are examples of techniques used for dimensionality reduction and manifold embedding. PCA finds the embedding that maximizes the projected variance, \(P = \arg \max _{\|p\| = 1} p^{T}Sp\), where \(S = \sum _{i=1}^{n}\left (x_{i} - \bar {x}\right)\left (x_{i} - \bar {x}\right)^{T}\) is the scatter matrix and \(\bar {x}\) is the mean of the vectors \(\{x_{i}\}_{i=1}^{n}\). LLE seeks a nonlinear embedding in a neighborhood-preserving way by exploiting the local symmetries of linear reconstructions and finding the optimal local reconstruction weights. Based on locality preserving projections (LPP), the OLPP technique produces orthogonal basis functions [70, 71] to find additional discriminating information for embedding. LPP looks for the embedding that preserves the essential manifold structure by measuring distance information in a local neighborhood. Affinity weights are defined as \(s_{ij} = \exp \left (-\frac {\|x_{i} - x_{j}\|^{2}}{t}\right)\) when x_{i} and x_{j} are k nearest neighbors of each other; otherwise, s_{ij} = 0. The matrix S = [s_{ij}] is symmetric. LPP also defines a diagonal matrix D with entries \(D_{ii} = \sum _{j} s_{ij}\) and a Laplacian matrix L = D − S. LPP represents the age manifold well and performs better in age estimation than traditional PCA.
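The PCA embedding and the LPP affinity, degree, and Laplacian matrices can be sketched in a few lines. The sketch assumes images are stored as columns of a D × n matrix, reads "k nearest neighbors of each other" as a mutual-neighbor test, and uses hypothetical function names; it stops short of solving the LPP eigenproblem itself.

```python
import numpy as np

def pca_projection(X, d):
    """X: (D, n) matrix, one image vector per column.
    Returns P (D, d) and the embedding Y = P^T (X - mean), shape (d, n)."""
    centered = X - X.mean(axis=1, keepdims=True)
    S = centered @ centered.T                  # scatter matrix
    _, eigvecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :d]                # d leading eigenvectors
    return P, P.T @ centered

def lpp_affinity(X, k=3, t=1.0):
    """Heat-kernel affinity S, degree matrix D, and Laplacian L = D - S."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    ranked = dist2.copy()
    np.fill_diagonal(ranked, -np.inf)          # force self to rank first
    order = np.argsort(ranked, axis=1)[:, 1:k + 1]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), order.ravel()] = True
    mutual = knn & knn.T                       # neighbors of each other (assumed reading)
    S = np.where(mutual, np.exp(-dist2 / t), 0.0)
    D = np.diag(S.sum(axis=1))
    return S, D, D - S
```

Because the Laplacian is built from a symmetric affinity matrix, each of its rows sums to zero, which is a quick sanity check on the construction.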
There is a connection between age manifold and subspace analysis for aging patterns. This technique finds an embedded low-dimensional manifold when each age is represented by many faces in the database. By using LPP for manifold embedding, age labels can be incorporated into the embedding process in a supervised manner, which improves results compared to PCA embedding. Age manifold, unlike AGES [13], does not learn a subject-specific aging pattern; rather, it uses all available ages from different individuals. However, age manifold requires a large dataset in order to learn the embedded manifold satisfactorily.
Huang et al. [72] proposed multi-manifold metric learning (MMML) for face recognition based on image sets. In MMML, several person-specific distance metrics in different manifolds are learned by modeling each image set as a manifold, minimizing intra-class manifold variations and maximizing inter-class manifold variations. Figure 7 shows the multi-manifold metric learning.
MMML could be applied to age estimation by grouping images at the same age into one set and learning distance metrics between these sets. Each class (as shown in Fig. 7) could consist of images at a particular age. The limitation of age manifold models is that they are computationally intensive.
Appearance models
Appearance models represent facial appearance using texture, shape, and wrinkle features for age estimation, face recognition, face verification, and gender estimation, among other tasks. An image is represented by vectorizing both its shape and its texture [73]. Appearance models are similar to AAM [64], which builds a statistical model using the shape and texture of the face. Both global and local texture, shape, and wrinkle features are extracted and modelled for age estimation. Texture and shape have been used for age and gender estimation [74, 75]. Age estimation using appearance features can be improved by performing gender estimation first, since males and females exhibit different aging patterns.
Given a set of facial images \(X = \{x_{i}:x_{i} \in \text {I\!R}\}_{i=1}^{n}\) and a vector of age labels \(L = \{l_{i}:l_{i} \in \text {I\!N}\}_{i=1}^{n}\), facial features are extracted from the images \(\{x_{i}\}_{i=1}^{n}\) at a particular age. Every feature F_{i} has a one-to-one mapping with one of the age labels l_{i}. After features are extracted and associated with an age label, they are used for age estimation with either a regression model or a classifier. The effectiveness of LBP [76] in texture characterization has made it popular for extracting appearance features for age estimation. LBP was used in [94] and achieved 80% accuracy in age estimation on FERET [77] with a nearest neighbor classifier and 80–90% accuracy with an AdaBoost classifier [78]. Gao and Ai [79] used the Gabor filter [67] for appearance feature extraction and reported better age estimation results than with LBP. BIF [80, 81] is also used in appearance-based models, as in [82]. Using age manifold, BIF, and an SVM classifier, MAEs of 2.61 and 2.58 years for females and males, respectively, can be achieved on the YGA database [11]. This shows the superior performance of BIF in age estimation. The spatially flexible patch (SFP) proposed in [83, 84] is another feature descriptor that can be used to characterize appearance for age estimation. Other techniques that can be used to build appearance models for age estimation are linear discriminant analysis (LDA) and principal component analysis (PCA). A detailed description of these techniques is presented in Section 6.
Hybrid models
What is the best modeling approach for age estimation? It is hard to answer this question with certainty, since each of the modeling approaches discussed has its inherent strengths and limitations. To find an answer, one may try different modeling approaches on representative images and compare their performance, which reveals the strengths and limitations of each model. Modeling approaches that complement each other can then be combined to form a hybrid modeling approach. Hybrid age estimation modeling combines several modeling techniques to take advantage of the strengths of each. By combining different modeling techniques, age estimation is expected to become not only more accurate but also more robust. The models can be combined hierarchically or in parallel, with the results from the different models fused for the final age estimate.
Aging feature extraction techniques
Gabor filters
Originally introduced by Dennis Gabor in 1946 [67], Gabor filters have been extensively used for wrinkle, edge, and texture feature extraction because of their capability of determining the orientation and magnitude of wrinkles [70]. The Gabor filter has been regarded as the best texture descriptor in object recognition, segmentation, motion tracking, and image registration [71]. Gabor features have been used in age estimation [27] and demonstrated to be a more effective texture descriptor than LBP. Since wrinkles appear as edge-like components with high frequency, Gabor edge analysis has been commonly used for wrinkle feature extraction. The Sobel filter [85, 86], Hough transform [74], and active contours [87] are among the most commonly used texture edge descriptors. Edges in a face image also include noise from beards, mustaches, hair, and shadows; to reduce the effect of this noise, [70] proposes that the predominant orientation of wrinkles be considered in wrinkle feature extraction. The 2D spatial domain Gabor function is defined as:
\[ g(x,y) = \frac{1}{2\pi \sigma_{x}\sigma_{y}} \exp\left[-\frac{1}{2}\left(\frac{x^{2}}{\sigma_{x}^{2}} + \frac{y^{2}}{\sigma_{y}^{2}}\right) + 2\pi j W x\right] \]
where σ_{ x } and σ_{ y } are the standard deviations of the distribution along x and y axes, respectively, and W is the sinusoidal radial frequency.
The general equation for creating a Gabor filter bank can be expressed as:
\[ g_{m,k}(x,y) = a^{-m}\, g(\bar{x}, \bar{y}) \]
where \(\bar {x} = x\cos \theta + y\sin \theta \) and \(\bar {y} = -x\sin \theta + y\cos \theta \), with \(\theta _{k} = \pi \frac {\left (k-1\right)}{n}, k = 1, 2, 3, \dots, n\), where n is the number of orientations used and a^{−m} is the filter scale for m = 0, 1, 2, …, S − 1 for S scales. Redundancy in the frequency domain is prevented by designing Gabor wavelets as:
where U_{ l } and U_{ h } denote lower and higher average frequencies, respectively, and W = U_{ h }. We refer readers to [71] and [88] for more details on Gabor wavelets.
Linear discriminant analysis
Linear discriminant analysis (LDA) [89, 90] is a feature extraction technique that searches for the features that best discriminate between classes. Given a set of independent features, LDA creates a linear combination of these features such that the largest mean differences between classes are achieved. LDA defines two measures: the within-class scatter matrix, given by
\[ S_{w} = \sum_{j=1}^{c}\sum_{i=1}^{N_{j}} \left(x_{i}^{j} - \mu_{j}\right)\left(x_{i}^{j} - \mu_{j}\right)^{T} \]
where \(x_{i}^{j}\) is the ith sample of class j, μ_{j} is the mean of class j, c is the number of classes, and N_{j} is the number of samples in class j, and the between-class scatter matrix, given by
\[ S_{b} = \sum_{j=1}^{c} \left(\mu_{j} - \mu\right)\left(\mu_{j} - \mu\right)^{T} \]
where μ is the mean of all classes. The main objective of LDA is to maximize the between-class scatter while minimizing the within-class scatter.
One way of doing this is maximizing the ratio \(\frac {\det S_{b}}{\det S_{w}}\). Given that S_{w} is nonsingular, it has been proven [89] that this ratio is maximized when the column vectors of the projection matrix are the eigenvectors of \(S_{w}^{-1}S_{b}\). The maximum rank of S_{w} is N − c, with N samples and c classes. This therefore requires N = t + c samples to guarantee that S_{w} does not become singular, where t is the dimensionality of the input data. In practice, the number of samples N is almost always smaller than t, making the scatter matrix S_{w} singular. To solve this problem, Belhumeur et al. [91] and Swets and Weng [92] proposed projecting the input data to a PCA subspace, to reduce the dimensionality to N − c or less, before applying LDA. PCA and LDA are widely used appearance feature extraction methods in pattern recognition [93]. Consequently, we adopt LDA for extraction of global face appearance features for age-group estimation.
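As a rough illustration of the procedure, the sketch below builds the two scatter matrices and takes the leading eigenvectors of the product of the inverse within-class scatter and the between-class scatter. It assumes the within-class scatter is nonsingular (more samples than dimensions), uses the mean of all samples for μ, and the function name `lda_projection` is hypothetical; the PCA preprocessing step for the singular case is not shown.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """X: (N, t) samples in rows; y: (N,) integer class labels.
    Returns a (t, n_components) projection matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    t = X.shape[1]
    mu = X.mean(axis=0)                        # overall mean (assumed definition of mu)
    Sw = np.zeros((t, t))
    Sb = np.zeros((t, t))
    for label in np.unique(y):
        Xc = X[y == label]
        mu_c = Xc.mean(axis=0)
        diff = Xc - mu_c
        Sw += diff.T @ diff                    # within-class scatter
        m = (mu_c - mu)[:, None]
        Sb += m @ m.T                          # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real
```

With c classes, the between-class scatter has rank at most c − 1, so at most c − 1 projection directions carry discriminative information.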
Local binary patterns
Texture features have been extensively used in age estimation techniques [10]. Local binary pattern (LBP) is a texture description technique that can detect microstructure patterns like spots, edges, lines, and flat areas on the skin [76]. LBP is used to describe texture for face recognition, gender classification, age estimation, face detection, and face and facial component tracking. Gunay and Nabiyev [94] used LBP to characterize texture features for age estimation. They reported accuracy of 80% on FERET [77] dataset using nearest neighbor classifier and 80–90% accuracy on FERET and PIE datasets using AdaBoost classifier [78]. Figure 8 shows a sample of 3 × 3 LBP operation.
Concatenating all 8 bits gives a binary number. The resulting binary number is converted to a decimal and assigned to center pixel as its LBP code.
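A minimal sketch of the basic 3 × 3 operator follows. It assumes the usual thresholding rule (a neighbor greater than or equal to the center pixel contributes a 1 bit) and a clockwise reading order starting at the top-left neighbor; both the ordering and the function name `lbp_code` are illustrative choices, since different orderings only permute the bits.

```python
def lbp_code(region):
    """region: 3x3 nested list of gray values. Returns the LBP code of the center pixel."""
    center = region[1][1]
    # Neighbors read clockwise, starting at the top-left pixel (assumed ordering).
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if region[r][c] >= center else 0 for r, c in coords]
    # The first neighbor read becomes the most significant bit.
    return sum(bit << (7 - i) for i, bit in enumerate(bits))

sample = [[6, 5, 2],
          [7, 6, 1],
          [9, 8, 7]]
print(lbp_code(sample))  # bits 10001111 -> 143
```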
Ojala et al. [95] found that when using eight neighbors and radius 1, 90% of all patterns are uniform patterns. The original LBP operator was limited in capturing dominant features with large-scale structures. The operator was later extended to capture texture features with neighborhoods of different radii [95]. The neighborhood is defined by a set of sampling pixels distributed evenly along the circumference of a circle centered at the pixel to be labeled. Sampling points that do not fall at the center of a pixel are bilinearly interpolated, which allows any radius and any number of sampling pixels.
Uniform patterns may represent microstructures as line, spot, edge, or flat area. Figure 9 shows microstructure pattern representation.
Ojala et al. [76] further categorized LBP codes as uniform and nonuniform patterns. An LBP pattern with at most two bitwise transitions from 0 to 1 or 1 to 0, when the bit string is read circularly, is categorized as a uniform pattern. For instance, 00000000, 00010000, and 11011111 are uniform patterns, while 01010000, 11100101, and 10101001 are nonuniform patterns. For an n-bit pattern representation, there are n(n − 1) + 2 uniform patterns. Figure 9 shows LBP codes for sample uniform patterns in the LBP(8,1) neighborhood. To extract rotation-invariant features using LBP, the generated LBP code is circularly rotated until its minimum value is obtained [96].
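The uniformity test and the rotation-invariant mapping can be checked with a short sketch. The helper names below are hypothetical, and the bit string is treated as circular when counting transitions.

```python
def transitions(code, n_bits=8):
    """Number of 0/1 transitions in the circular bit string of `code`."""
    bits = [(code >> i) & 1 for i in range(n_bits)]
    return sum(bits[i] != bits[(i + 1) % n_bits] for i in range(n_bits))

def is_uniform(code, n_bits=8):
    """A pattern is uniform when it has at most two circular transitions."""
    return transitions(code, n_bits) <= 2

def rotation_invariant(code, n_bits=8):
    """Smallest value reachable by circularly rotating the bit string."""
    mask = (1 << n_bits) - 1
    return min(((code >> s) | (code << (n_bits - s))) & mask for s in range(n_bits))

# For n = 8 the count of uniform patterns matches n(n - 1) + 2 = 58.
print(sum(is_uniform(c) for c in range(256)))
```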
Extended LBP operator could capture more texture features on an image but still it could not preserve spatial information about these features. Ahonen et al. [97] proposed a technique of dividing a face image into n cells. Histograms are generated for each cell then concatenated to a single spatial histogram. Spatial histogram preserves both spatial and texture descriptions of an image. Image texture features are finally represented by histogram of LBP codes. LBP histogram contains detailed texture descriptor for all structures on the face image like spots, lines, edges, and flat areas. More details on the use of LBP on facial image analysis could be found in [76, 96–98].
Local directional pattern
Local binary patterns (LBP) [99] were found to be unstable to image noise and variations in illumination. Jabid et al. [100] proposed local directional pattern (LDP) which is robust to image noise and nonmonotonic variations in illumination. Figure 10 shows robustness of LDP operator to noise compared to LBP.
LDP computes an 8-bit binary code for each pixel in the image by comparing the edge responses of each pixel in different orientations instead of comparing raw pixel intensities as LBP does. The Kirsch edge detector [101], Prewitt edge detector [102], and Sobel edge detector [103] are some of the edge detectors that can be used [104]. Among them, the Kirsch edge detector is known to detect different directional edge responses more accurately than the others because it considers all eight neighbors [105]. Figure 11 shows the Kirsch edge detector response masks (kernels) for eight orientations.
Given a center pixel P(i,j) in an image, eight directional responses are computed by convolving the 3 × 3 image region of neighboring pixels with each of the Kirsch masks. For each center pixel, there will be eight directional response values. The presence of an edge or a corner produces high (absolute) response values in that particular direction. LDP determines the k most significant directional responses, sets their corresponding bits to 1, and sets the remaining 8 − k bits to 0. These binary bits are converted to a decimal value, which is assigned to the center pixel. This process is repeated for all pixels in an image to obtain the LDP representation of the image. Figure 12 shows the process of encoding an image using the LDP operator.
Given an image region as shown in Fig. 12a, the LDP response in the east direction is obtained by convolving the 3 × 3 image region shown in Fig. 10 with the East mask M_{0}, shown in the top-left corner of Fig. 11, as:
The absolute values of the directional responses are arranged in descending order. For k = 3 significant responses, the binary response bit for each of the eight neighboring pixels shown in Fig. 12b is calculated as:
\[ LDP_{k} = \sum_{i=0}^{7} b_{i}\left(m_{i} - m_{k}\right) \times 2^{i}, \qquad b_{i}(a) = \begin{cases} 1, & a \geq 0 \\ 0, & a < 0 \end{cases} \]
where m_{k} is the kth most significant directional response (in the example of Fig. 12, m_{k} = −399) and m_{i} is the response of Kirsch mask M_{i}.
For k = 3, the LDP operator generates \(C_{3}^{8}=\frac {8!}{3!\times \left (8-3\right)!}=56\) distinct values in the LDP-encoded image. The resultant histogram will have values between 0 and 56. A histogram H(i) with \(C_{k}^{8}\) bins can be used to represent the input image of size M × N as:
\[ H(i) = \sum_{m=1}^{M}\sum_{n=1}^{N} f\left(p(m,n), i\right), \qquad f(p,i) = \begin{cases} 1, & p = i \\ 0, & \text{otherwise} \end{cases} \]
where f(p,i) is a logical function that tests whether the LDP code at location p(m,n) of the LDP-encoded image is equal to the current LDP pattern i, for all i in the range \(0\leq i \leq C_{k}^{8}\). The resultant histogram has dimensions \(1 \times C_{k}^{8}\) and is used to represent the image. The resultant feature contains spot, corner, edge, and texture information about the image [106].
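The encoding of a single 3 × 3 region can be sketched as below. The sketch makes several assumptions: the eight Kirsch masks are generated by rotating the border of the east mask in 45° steps (East, North-East, North, and so on), the response is computed as an element-wise product sum rather than a flipped convolution, and ties among responses are broken by the lower mask index. Function names are hypothetical.

```python
from math import comb

def kirsch_masks():
    """Eight 3x3 Kirsch masks M0..M7, obtained by rotating the east mask's border."""
    # Border positions listed counter-clockwise, starting at the top-right corner.
    ring = [(0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2)]
    east = [5, -3, -3, -3, -3, -3, 5, 5]       # east mask values along the ring
    masks = []
    for k in range(8):
        mask = [[0] * 3 for _ in range(3)]
        for i, (r, c) in enumerate(ring):
            mask[r][c] = east[(i - k) % 8]     # rotate by k steps of 45 degrees
        masks.append(mask)
    return masks

def ldp_code(region, k=3):
    """LDP code of the center pixel of a 3x3 region: bits of the k strongest responses set."""
    responses = [sum(m[r][c] * region[r][c] for r in range(3) for c in range(3))
                 for m in kirsch_masks()]
    top = sorted(range(8), key=lambda i: -abs(responses[i]))[:k]
    return sum(1 << i for i in top)

edge = [[10, 10, 90],
        [10, 10, 90],
        [10, 10, 90]]
code = ldp_code(edge, k=3)
print(bin(code), comb(8, 3))  # a code with exactly three set bits; 56 possible codes
```

Because exactly k bits are set in every code, only 56 of the 256 possible 8-bit values can occur for k = 3, which is why the histogram needs far fewer bins than an LBP histogram.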
Local ternary patterns
LBP is sensitive to noise and illumination, especially in nearly uniform image blocks. Local ternary patterns (LTP) [107] seek to improve the robustness of image features in fairly uniform regions. LTP extends LBP to a three-value code by comparing the values of the neighboring pixels with the value of the central pixel, using a preset threshold τ. Neighbors whose values lie within ± τ of the central value are set to 0, those at least τ above it are set to + 1, and those at least τ below it are set to − 1. The thresholding function is defined as
\[ s(x_{i}, x_{c}, \tau) = \begin{cases} 1, & x_{i} \geq x_{c} + \tau \\ 0, & |x_{i} - x_{c}| < \tau \\ -1, & x_{i} \leq x_{c} - \tau \end{cases} \]
where τ is a preset threshold, x_{c} is the value of the central pixel, and x_{i} for i = 0, 1, 2, …, 7 are the neighboring pixels of x_{c}. Although this extension makes LTP robust to noise and able to encode more patterns, it is not easy in practice to select an optimum τ for all images in a dataset or for all datasets, and the resultant code is not invariant to pixel value transformations. LTP can encode 3^{8} patterns. The LTP codes are split into their positive and negative parts, and two histograms are generated, one for the negative part and the other for the positive part. These histograms are concatenated and used as the feature descriptor for pattern recognition. Figure 13 shows LTP codes for a 3 × 3 sample image region.
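The ternary coding and its split into positive and negative binary parts can be sketched as follows. The clockwise neighbor ordering from the top-left pixel and the function name `ltp_codes` are assumptions made for illustration.

```python
def ltp_codes(region, tau):
    """Return (ternary, upper, lower) for the center pixel of a 3x3 region.

    ternary: list of +1 / 0 / -1 per neighbor
    upper:   binary code of the +1 positions (positive part)
    lower:   binary code of the -1 positions (negative part)
    """
    center = region[1][1]
    # Neighbors read clockwise, starting at the top-left pixel (assumed ordering).
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    ternary = []
    for r, c in coords:
        value = region[r][c]
        if value >= center + tau:
            ternary.append(1)
        elif value <= center - tau:
            ternary.append(-1)
        else:
            ternary.append(0)
    upper = sum((t == 1) << (7 - i) for i, t in enumerate(ternary))
    lower = sum((t == -1) << (7 - i) for i, t in enumerate(ternary))
    return ternary, upper, lower

sample = [[60, 52, 40],
          [48, 50, 56],
          [45, 55, 70]]
print(ltp_codes(sample, tau=5))  # ([1, 0, -1, 1, 1, 1, -1, 0], 156, 34)
```

The two binary codes never share a set bit, so each can be histogrammed with the ordinary LBP machinery before the histograms are concatenated.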
Gray-level co-occurrence matrix
Statistical moments of the intensity histogram of an image are commonly used to describe the texture of an image [108]. Using histograms to describe texture results in texture descriptors that convey information about the gray-level intensity distribution but carry no information about the relative spatial positions of pixels. Haralick et al. [109] introduced the gray-level co-occurrence matrix (GLCM) in 1973.
GLCM describes image texture by comparing each pixel with its neighboring pixel at a specified distance and orientation. This technique extracts second-order statistical texture features from grayscale images. GLCM is a square matrix whose numbers of rows and columns are equal to the number of quantized gray levels, N_{g}. The entry p(i,j) is the second-order statistical probability for changes between gray-level values i and j at a particular distance d and orientation θ.
Suppose we have an image I(i,j) with N_{x} columns and N_{y} rows, in which the gray level appearing at each pixel is quantized to N_{g} levels. Let the rows of the image be L_{y} = {1, 2, …, N_{y}}, the columns be L_{x} = {1, 2, …, N_{x}}, and the set of N_{g} quantized gray levels be G = {0, 1, 2, …, N_{g} − 1}. The image can be represented as a function that assigns some gray level in G to each pixel or pair of coordinates in L_{y} × L_{x}; I: L_{y} × L_{x} → G. Texture information is specified by the GLCM matrix of relative frequencies C(i,j). The value at GLCM(i,j) represents the number of occurrences of gray-level value i at a reference pixel and gray-level value j at a neighbor pixel at a certain distance d and orientation θ. The probability measure can be defined as:
where p(i,j) is defined as:
\[ p(i,j) = \frac{C(i,j)}{\sum_{i \in G}\sum_{j \in G} C(i,j)} \]
The sum in the denominator represents the total number of gray-level pairs (i,j) within the image and is bounded by N_{g} × N_{g}. Dividing every entry in the GLCM matrix by the denominator results in a normalized GLCM matrix. Figure 14 shows an example of calculating GLCM from an image region at distance 1 and angle θ = 0°, and Fig. 15 shows an example of calculating GLCM from an image region at distance 1 and angle θ = 45°.
The orientation of the neighbor pixel from the reference pixel can be θ = (0°, 45°, 90°, 135°), and the distance can vary over d = (1, 2, 3, …, n), where n is any reasonable distance bounded by N_{x} and N_{y}.
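Counting co-occurrences for a given distance and orientation can be sketched as below. The sketch assumes gray levels are already quantized to integers from 0 to N_{g} − 1, counts pairs in one direction only (reference pixel to neighbor, so the matrix is not symmetrized), and uses hypothetical function names.

```python
def glcm(image, levels, d=1, angle=0):
    """Count co-occurrences of gray levels (i at reference, j at neighbor).

    image:  2D nested list of integers in range(levels)
    angle:  one of 0, 45, 90, 135 (degrees)
    """
    # (row offset, column offset) of the neighbor for each orientation.
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dr, dc = offsets[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def normalize(counts):
    """Divide every entry by the total number of counted pairs."""
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
print(glcm(image, levels=4, d=1, angle=0))
# [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
```

Haralick-style features such as homogeneity or correlation would then be computed from the normalized matrix.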
Haralick et al. [109] defined 14 statistical features that can be used to describe texture. Table 1 shows some of the Haralick features used for texture description [110] where:
and
Haralick features have been successfully used in brain tumor classification [111], texture description [112], and remote sensing [113], among other fields. GLCM has not been investigated for aging feature extraction. Haralick features like homogeneity, variance, and correlation could be extracted from age-separated faces and used for age estimation.
Spatially flexible patch
The spatially flexible patch (SFP) proposed in [83] and [84] is another feature descriptor that can be used for feature extraction for age estimation. SFP is effective for capturing local variations in facial appearance as one ages. SFP encodes local appearance and its spatial information. SFP solves the problem of local variations in appearance during aging since SFPs similar in appearance and slightly different in position can provide similar confidence for age estimation. By considering local patches and their spatial information, SFP can effectively characterize facial images with slight disorientation, occlusion, and head pose disparities. Another advantage of SFP is that it alleviates the problem of insufficient samples by enriching the discriminating characteristics of the feature vector.
Grassmann manifold
Grassmann manifold is the space G(k,n) of all k-planes through the origin in IR^{n}, k ≤ n, which generalizes real projective spaces [114]. It consists of the set of all k-dimensional subspaces of IR^{n}. With each k-plane v in IR^{n}, an n × k matrix Y can be associated whose columns form an orthonormal basis that spans the subspace. Therefore, each k-plane v in G(k,n) is associated with an equivalence class of n × k matrices YR in IR^{n×k}, for R ∈ SO(k), where Y is an orthonormal basis for the k-plane. G(k,n) is not a vector space, but points on G(k,n) can be projected onto the tangent space at the mean point, and standard vector-space methods can be used on the tangent space. Geodesic distances between points on the manifold are used for classification or regression problems. Wu [115] used a Grassmann manifold tangent-space regression approach for age estimation.
Grassmann manifold can be used in age estimation by representing each face by a deformation that warps an average face to a given face. This requires defining what an average face is and how to quantify the deformation between the average face and the given face. Average face can be represented by computing a mean point from all the (landmark) points on G(k,n). This can be done by calculating Karcher mean [116]. Age estimation can be performed using the Grassmann nearest neighbor (GNN) classification approach. In GNN, Karcher mean is computed for every age. During testing, compare the Karcher mean of the probe image with the mean of every age using one defined distance on Grassmann manifold. The closest mean to the probe gives the target age.
Biologically inspired features
Biologically inspired features (BIFs) were first proposed in 1999 by Riesenhuber and Poggio (R and P model) [80]. BIFs are derived from a feedforward model of the primate visual object recognition pipeline, referred to as the HMAX model [117]. Primates are known to be able to recognize visual patterns with high accuracy. Recent studies in computer vision and brain cognition show that biologically inspired models (BIM) improve face identification [118], object recognition [119], and scene classification [120]. Applying visual cortex models to age estimation has also improved estimation accuracy.
The visual model of primates contains alternating layers of simple (S) and complex (C) cell units. The complexity of these cells increases as the layers advance from the primary visual cortex (V1) to the inferior temporal cortex (IT). In the primary visual cortex, S units use a bell-shaped tuning function to combine input intensities to increase scale and orientation selectivity. Using MAX, STD, AVG, or any other pooling operation, C units pool inputs from S units, thereby introducing gradual invariance to scale, rotation, and translation.
Gabor functions [121, 122] are used to model simple cells (S) in the visual cortex of mammalian brains. The frequency and orientation representations of Gabor filters are similar to those of the human visual system. It is therefore thought that Gabor filter image analysis is similar to perception in the human visual system. BIFs have demonstrated success in age estimation tasks [82, 123, 124]. BIF feature extraction encompasses two layers of computational units, with simple cell units (S1) in layer one followed by complex cell units (C1) in the subsequent layer.
S1 units (simple cells): These represent the receptive field in the primary visual cortex (V1) [121], which has the basic attributes of multi-orientation, multi-frequency, and multi-scale selection [125]. S1 units are commonly described by a bank of Gabor filters [81]. Gabor filters are appropriate for modeling cortical simple-cell receptive fields. The 2D spatial domain Gabor function is defined as:
\[ G(x,y) = \exp\left(-\frac{X^{2} + \gamma^{2}Y^{2}}{2\sigma^{2}}\right) \times \cos\left(\frac{2\pi}{\lambda}X\right) \]
where X = x cosθ + y sinθ and Y = −x sinθ + y cosθ are the coordinates rotated by the filter orientation θ, which varies from 0 to π; γ and σ are the aspect ratio and standard deviation of the Gaussian envelope, respectively; and λ is the wavelength, which determines the spatial frequency 1/λ.
Useful discriminating features are extracted using Gabor filters with different orientations and frequencies [126]. Previous studies [126, 127] suggest that spatial frequency processing is done in the primary visual cortex. Spatial frequency analysis extracts discriminative features that are more robust to distortions [128]. Daugman [129] found that the visual system in primates extracts information in both the 2D spatial and frequency domains, and Shapley [38] showed that spatial frequency analysis helps the brain understand an image.
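A bank of S1 filters at several orientations can be sketched as below. The sketch assumes the commonly used real-valued form of the Gabor function, a Gaussian envelope in the rotated coordinates X and Y multiplied by a cosine carrier of wavelength λ; the default kernel size and parameter values are illustrative assumptions, not values taken from this survey, and the function names are hypothetical.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma):
    """Real Gabor kernel: exp(-(X^2 + gamma^2 Y^2) / (2 sigma^2)) * cos(2 pi X / wavelength)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    X = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    Y = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(X ** 2 + gamma ** 2 * Y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * X / wavelength)

def gabor_bank(size=7, wavelength=3.5, sigma=2.8, gamma=0.3, n_orientations=4):
    """Kernels at orientations evenly spaced in [0, pi). Parameter values are assumptions."""
    thetas = [np.pi * k / n_orientations for k in range(n_orientations)]
    return [gabor_kernel(size, wavelength, t, sigma, gamma) for t in thetas]
```

S1 responses would be obtained by filtering the face image with each kernel in the bank at each scale.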
C1 units (cortical complex cells): These units receive responses from S1 units and perform linear feature integration. C1 units represent complex cells that are shift invariant. Lampl et al. [130] proposed that the spatial integration of complex cells in the visual cortex can be described by a series of pooling operations. Riesenhuber and Poggio [80] demonstrated the merits of the MAX pooling operator compared to SUM, while Guo et al. [82] showed that the standard deviation (STD) pooling operator outperforms the MAX operator. Cai et al. [125] improved on STD by using a cell grid of 4 × 4 in normalization. The MAX operator returns the maximum value at each index i of the features at two consecutive scales. Given features at scale S_{x} and scale S_{x+1}, the maximum value F_{i} at index i is given by:
\[ F_{i} = \max\left(S_{x}^{i}, S_{x+1}^{i}\right) \]
where \(S_{x}^{i}\) and \(S_{x+1}^{i}\) are the filtered values at the position i of features from scale x and x + 1 respectively.
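Guo et al. [82] defined the STD operator to incorporate the mean of the values in a particular neighborhood. The STD operator was defined as:
\[ std = \sqrt{\frac{1}{n_{s} \times n_{s}} \sum_{i=1}^{n_{s} \times n_{s}} \left(F_{i} - \bar{F}\right)^{2}} \]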
where maximum value at index i between two consecutive S_{1} scales is represented by F_{ i } and \(\bar {F}\) is the mean of filtered values within n_{ s }×n_{ s } neighborhood. Given two N × N features at scales S_{ x } and S_{x+1}, STD operator with n_{ s }×n_{ s } grid returns ⌊N/n_{ s }⌋×⌊N/n_{ s }⌋ features. STD operator captures local texture and wrinkle variations which are significant for subtle age estimation.
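The two pooling steps can be sketched together. The sketch assumes non-overlapping n_{s} × n_{s} cells and the population standard deviation (dividing by the number of values in the cell); the function names are hypothetical.

```python
import numpy as np

def max_across_scales(s_x, s_x1):
    """Element-wise maximum of two filtered maps from consecutive scales."""
    return np.maximum(s_x, s_x1)

def std_pool(f, n_s):
    """Standard deviation of each non-overlapping n_s x n_s cell of an N x N map.

    Returns a floor(N / n_s) x floor(N / n_s) array.
    """
    n = f.shape[0] // n_s
    cropped = f[:n * n_s, :n * n_s]            # drop rows/columns that do not fill a cell
    cells = cropped.reshape(n, n_s, n, n_s).swapaxes(1, 2)
    return cells.reshape(n, n, -1).std(axis=2)

def c1_std(s_x, s_x1, n_s):
    """C1 feature map: MAX across two scales followed by STD pooling."""
    return std_pool(max_across_scales(s_x, s_x1), n_s)
```

For two 8 × 8 maps and a 4 × 4 grid, `c1_std` returns a 2 × 2 feature map, in line with the output size stated above.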
Serre et al. [81, 131] extended the HMAX model [80] to include two layers, S_{2} and C_{2} for object recognition. In S_{2}, template matching is done to match patches from C_{1} layer with some prelearned patches extracted from images. The S_{2} layer gets more selective intermediate features capable of discriminating between object classes. The S_{2} units are convolved over an entire image, and maximum response values of S_{2} are assigned to C_{2} units. Mutch and Lowe [132] extended the model in [81] by reducing the number of output units in S_{1} and C_{1} and picking features that are highly weighted by support vector machines (SVMs) [133].
Age estimation algorithms
Once aging features are extracted and represented, the subsequent phase is age estimation. Age estimation is a special pattern recognition task in which age labels can be viewed either as classes or as a set of sequential values. When age labels are viewed as classes, age estimation is approached as a classification problem, whereas when age labels are viewed as a sequential chronological series, a regression approach is used. A hybrid approach can also be employed, in which classification and regression techniques are integrated, mostly hierarchically, to find the relationship between the extracted feature vectors and the age labels. We present an analysis of existing approaches and suggest what is, in our opinion, an effective approach.
Classification
Lanitis et al. [23] explored the performance of nearest neighbor, artificial neural network (ANN), and quadratic function classifiers in age estimation tasks. Although the quadratic function used to relate face representations to age labels is a regression function, the authors referred to it as a quadratic function classifier [23]. The quadratic function reported an MAE of 5.04, which was superior to the MAE reported by the nearest neighbor classifier. ANN and self-organizing maps (SOMs) performed better than the quadratic function. The authors proposed clustering and hierarchical age estimation for improving performance. The error rates of the extended techniques were lower, although the evaluations were done on small datasets. Humans and computers were also compared in age estimation, and it was found that computers can estimate age almost as reliably as humans.
Ueki et al. [134] built 11 Gaussian models in a low-dimensional 2DLDA and LDA feature space using expectation maximization (EM). The age group was determined by fitting the probe image to each cluster and comparing the probabilities. They reported higher accuracy, 82% for males and 74% for females, with wide age groups of 15 years, compared to 50% for males and 43% for females with age groups of a 5-year range. This demonstrates that the approach achieves good accuracy only when the age groups have wide ranges and hence is not applicable to narrow-range age-group estimation.
Fusing texture and local appearance, Huerta et al. [135] used deep learning classification for age estimation. Using LBP [95], speeded-up robust features (SURF) [136], and histogram of oriented gradients (HOG) [137], they evaluated the performance of deep learning on two large datasets and achieved an MAE of 3.31. Hu et al. [138] used Kullback-Leibler/raw intensities for face representation before using a convolutional neural network (CNN) for age estimation. Their approach achieved an MAE of 2.8 on FG-NET and 2.78 on MORPH II. These results suggest that deep learning (deep neural networks or CNNs) achieves better MAE than traditional classification methods.
Regression
Using 50 raw model parameters, Lanitis et al. [66] investigated linear, quadratic, and cubic formulations of the aging function. A genetic algorithm was used to learn optimal model parameters from training face images of different ages. Quadratic and cubic aging functions achieved better MAE, 0.86 and 0.75, respectively, compared to 1.39 for the linear function. This suggests that the quadratic function offers the best alternative since its MAE was not significantly different from that of the cubic function and it is not as computationally intensive as the cubic function. Guo et al. [31, 139] used linear support vector regression (SVR) on age manifold for age estimation. They reported MAE of 7.47 and 7.00 years for males and females, respectively, on YGA dataset and MAE of 5.16 on FGNET dataset. Yan et al. [140] formulated a regression problem for age estimation using semidefinite programming (SDP). The regressor was learned from uncertain nonnegative labels. They reported MAE of 10.36 and 9.79 years for males and females, respectively, on YGA. They further demonstrated that age estimation by SDP formulation achieves better results compared to ANN. The limitation of SDP is that it is computationally expensive, especially when the training set is large.
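To make the difference between the linear, quadratic, and cubic formulations concrete, the sketch below evaluates an aging function of increasing polynomial degree. It assumes a form in which each degree has its own weight vector and powers of the parameter vector are taken element-wise; that form, the function name, and all numeric values are illustrative assumptions, and the weights here are fixed by hand rather than learned by a genetic algorithm.

```python
def aging_function(b, offset, weights):
    """Evaluate age = offset + sum over degrees d of (w_d . b**d).

    `weights` holds one weight vector per polynomial degree: one vector
    gives the linear form, two the quadratic form, three the cubic form.
    Powers of b are taken element-wise (an assumption of this sketch).
    """
    age = offset
    for degree, w in enumerate(weights, start=1):
        age += sum(wi * bi ** degree for wi, bi in zip(w, b))
    return age

b = [0.5, -1.0]  # hypothetical 2-parameter face vector (the survey mentions 50)
linear_age = aging_function(b, 20.0, [[4.0, 2.0]])
quadratic_age = aging_function(b, 20.0, [[4.0, 2.0], [8.0, 1.0]])
print(linear_age, quadratic_age)
```

Each added degree adds one more weight vector to learn, which is why the cubic form costs more to optimize than the quadratic one.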
Nguyen et al. [141] used a regression model for age estimation. The face image was represented by a multi-level local binary pattern (MLBP). Their approach achieved a MAE of 6.6. Guo and Mu [124] achieved a MAE of 4.0 by using BIF to model a regression model for age estimation. Using manifold of raw pixel intensities to represent face image, Lu and Tan [142] evaluated their regression model on MORPH II dataset and obtained a MAE of 5.2 for White ethnic group and 4.2 for Black ethnic group. Onifade et al. [143] applied a boosted regressor on age-rank local binary patterns (arLBP). They reported a MAE of 2.34 on FGNET using LOPO validation protocol. Their approach demonstrated that age ranking with correlation of aging patterns across age groups improves performance of age estimation. Using raw pixel features, Akinyemi and Onifade [144] investigated ethnic-specific age group ranking for age estimation. This approach learns ethnic parameters in addition to the parameters learned in [143]. They evaluated this technique on FGNET and FAGE datasets and reported a MAE of 3.19 years. Their findings show that incorporating ethnic parameters improves performance of age estimation approaches. This could be attributed to the fact that people in different ethnic groups age differently.
Hybrid approach
As discussed in the preceding sections, the age estimation task can be approached as either a classification or a regression problem. To choose between the two, one may perform an experiment by selecting representative classifiers and regressors and comparing their performance on the same dataset using the same features. Guo et al. [31, 139] compared an SVM classifier to an SVR regressor. This experiment showed that SVM performs better compared to SVR on YGA dataset, with SVM achieving a MAE of 5.55 for females and 7.00 for males while SVR achieved 5.52 for females and 7.47 for males. It was also reported that SVM performed poorly on FGNET compared to SVR (MAE 7.16 against 5.16 years). This experiment shows that the classification approach to age estimation may perform better or worse than the regression approach depending on other aspects such as the quality of images in the dataset used, the feature selection and feature extraction techniques used, and the distribution of images across ages, among other factors.
Combining classification and regression may result in robust and more accurate age estimation systems. Guo et al. [31, 139] therefore proposed age estimation using locally adjusted robust regression (LARR). LARR first performs regression using all existing aging images. Regression results are then used to restrict a classifier to a small search range. They demonstrated that better age estimation performance can be achieved by combining classification and regression schemes. By combining regression and classification, the MAE improved to 5.30 and 5.25 years for females and males, respectively, on YGA dataset and 5.07 on FGNET dataset. The limitation of LARR method [139] is that it cannot automatically determine the local search range for a classifier. The range is determined by heuristically trying different ranges and requires the user to experimentally choose the best solution. To automatically determine the limited search range, Guo et al. [145] proposed a likelihood-based approach for combining classification and regression outcomes. Using a uniform distribution, regression results are transformed into likelihoods, then likelihoods from the classification outcome are cut off by the uniform distribution. This further improved accuracies by achieving MAE 5.12 and 5.11 for males and females, respectively, on YGA and 4.97 on FGNET.
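The two-stage idea, a global regression estimate that limits where a classifier is allowed to search, can be sketched as below. This is only an illustration of the principle and not the LARR implementation of [139]: the regression estimate and the per-age classifier scores are passed in as plain values, and the function name, scores, and search range are hypothetical.

```python
def locally_adjusted_age(regressed_age, class_scores, search_range):
    """Pick the best-scoring age label within +/- search_range of the regression estimate.

    `class_scores` maps each candidate age label to a classifier score
    (higher means more likely). Labels outside the window are ignored,
    which corresponds to cutting the scores off with a uniform window.
    """
    low = regressed_age - search_range
    high = regressed_age + search_range
    candidates = {age: score for age, score in class_scores.items()
                  if low <= age <= high}
    if not candidates:
        # No label falls inside the window: fall back to the regression result.
        return round(regressed_age)
    return max(candidates, key=candidates.get)

# Hypothetical classifier scores; age 60 scores highest globally.
scores = {20: 0.10, 25: 0.30, 30: 0.15, 60: 0.45}
print(locally_adjusted_age(27.4, scores, search_range=5))
print(locally_adjusted_age(27.4, scores, search_range=40))
```

The two calls show why the choice of range matters: a narrow window suppresses the implausible label 60, whereas a very wide window reduces the method to plain classification.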
Gunay et al. [146] represented the aging face by fusing AAM, LBP, and Gabor features. They used an ensemble of three SVMs arranged in a hierarchical manner to build an age estimation model. The first step of their model was to perform age-group estimation by SVM classification. A linear regression was then performed to estimate age within the age group. Their approach achieved a MAE of 4.13 on FGNET. These results show that feature and decision fusion used in a hybrid hierarchical age estimation can reduce estimation errors compared to classification approaches.
Han et al. [147] performed hierarchical demographic estimation and compared machine and human performance. They extracted BIF features and demographic informative features using a boosting algorithm. They then performed a hierarchical age estimation using between-group classification followed by within-group regression. Evaluating this technique on MORPH II and FGNET, they achieved MAE of 3.6 and 3.8, respectively. Choi et al. [70] used AAM, Gabor, and LBP to represent face image. Their hybrid age estimation model achieved a MAE of 4.7 on FGNET, 4.3 on PAL, and 4.7 on BERC datasets.
The hybrid approach to age estimation demonstrates better performance compared to regression and classification when used alone. To combine classification and regression, one may test extracted features on both techniques separately before combining them. One may then arrange regression and classification in a hierarchical order and compare performance when regression is done before classification and when it is done after classification.
Facial aging databases
Precise age and age-group estimation requires a database with good quality facial images at different ages. It is hard to collect a large aging database with a series of chronometric images from an individual. Age and age-group estimation often uses previously collected and published databases. Brief descriptions of these databases are found in [11]. Table 2 gives the summary of some of the aging databases available.
FGNET aging database
FGNET [21] contains 1002 color and grayscale images of 82 individuals from age 0 to 69 years. Each individual has 12 images on average. Images are collected from multi-race subjects and have great inconsistencies in head pose, facial expression, and illumination. Some images are in poor condition because they were scanned. There are 68 landmark points provided which can be used to model facial shape. Age features can be modelled as AAM or as appearance model using texture and wrinkle features.
MORPH database
MORPH [148] is a publicly available aging database created by the Face Aging Group at the University of North Carolina. This dataset is split into two albums. Album 1 has 1724 images collected between 1962 and 1998 from 515 individuals. Ages in this album range from 27 to 68 years. There are 1430 images for males and 294 images for females with age gap ranging from 46 days to 29 years. Album 2 contains 55,134 images of 13,000 individuals collected over 4 years. Both albums contain metadata for race, gender, date of birth, and date of acquisition. The eye coordinates of the dataset can be requested. A commercial version of album 2 contains a larger set of images collected over a longer time span and includes information like the height and weight of each individual.
Yamaha gender and age (YGA) database
The YGA database [12, 68] has 8000 high-resolution color images of 1600 individuals consisting of 800 males and 800 females of Asian race, aged between 0 and 93 years. Each subject has approximately five nearly frontal face images at the same age and a label of his or her approximate age. The images have high variations in illumination and facial expression. Haar cascade face detector [149] is used to crop and resize images to 60 × 60 grayscale patches.
WITDB database
The Waseda human-computer interaction technology [134] dataset consists of 12,008 face images of 2500 females and about 14,214 images of 3000 males from the Japanese race, with age ranging between 3 and 85 years. The ages are arranged in 11 non-overlapping age groups. The dataset has wide variations in illumination on unoccluded frontal view faces with neutral facial expression. Face images are cropped and resized to 32 × 32 grayscale patches.
AI & R Asian face database
AI & R Asian [150] dataset contains images of different expressions, ages, poses, and illuminations. There are 34 frontal-view images collected from 17 individuals with ages ranging from 22 to 61 years. There are on average two images per individual, making this database unsuitable for age or age-group estimation.
Burt’s Caucasian face database
This database was collected and used in [151] by Burt and Perrett to investigate visual cues to age by blending color and shape of facial components. The database contains 147 images of European males aged between 20 and 62 years. Faces had neutral expressions, with beards shaved and no glasses or makeup. There are 208 landmark points placed manually in standardized positions. These points can be used to encode facial shape.
LHI face database
Lotus Hill Research Institute (LHI) database contains 50,000 images of Asian adults at different ages. The images have slight dissimilarities in pose and lighting. Part of this database was used in [152] by Suo et al. to build a hierarchical face model for age estimation. The part used consists of 8000 color images of individuals aged between 9 and 89 years with one image per person. This database is therefore not appropriate for subject-based age estimation since it does not provide multiple face images of the same individual at different ages.
HOIP face database
Human and object interaction processing (HOIP) database consists of 306,600 images of 300 individuals aged between 15 and 64 years. The database is divided into 10 age groups. Each age group has 30 subjects, 15 females and 15 males [11].
Iranian face database
Iranian face database [153] has 3600 color images from 616 individuals aged between 2 and 85 years, of which 487 are males and 129 are females. The images have variations in pose and facial expression. At least one image with glasses was also taken. The majority of the images are of subjects in the age group of 1–40 years. This database can therefore be appropriate for modelling aging and age estimation in formative and middle-age years.
Gallagher’s web-collected database
This database was collected by Gallagher and Chen [4] from Flickr.com image search engine. The database has 28,231 faces in 5080 images. It is divided into seven age groups: 0–2, 3–7, 8–12, 13–19, 20–36, 37–65, and 66+. This dataset is suitable for age-group estimation although the age groups are wider in older ages.
Ni’s web-collected database
This database was collected from the web by Ni et al. [154, 155] using Google.com and Flickr.com image search engines. The database has 219,892 faces in 77,021 images with age range between 1 and 80 years. This is the largest aging database ever reported. The wide age range in this database makes it suitable for age estimation in child, adult, and old age groups.
Kyaw’s web-collected database
This database was collected from the web by Kyaw et al. [156] using API services provided by Microsoft Search Engine Bing. The images in the collected database are aligned with eye corner points captured manually and cropped to 65 by 75 patches. The database contains 963 images divided in four age groups of 3–13, 23–33, 43–53, and 63–73. The database is not appropriate for agegroup estimation since there are missing images between age groups.
BERC database
BERC database [70] was collected by the Biometric Engineering Research Center (BERC). The database contains images of 390 subjects with age ranging from 3 to 83 years. Images are of high resolution, 3648 × 2736 pixels. There are no variations in light and facial expression across the images, and subjects are uniformly distributed with respect to age and gender. These properties make the database suitable for age estimation, although it is comparatively small.
3D morphable database
The database contains 3D scans of 100 male adults and 100 female adults’ faces and 238 teenage faces aged between 8 and 16 years consisting of 113 females and 125 males [69, 157]. All faces were without makeup, accessories, and facial hair. In 3D morphable face models, individual faces are represented as face vector in 3D. By caricaturing texture and shape feature vectors, the model can transform one’s face. As one ages, each face will transform along a curved trajectory in a high dimensional space. Faces are represented by shape and texture vectors such that each linear combination of different faces is a new realistic face.
Summary
FGNET, MORPH, and Gallagher’s web-collected databases are publicly available. Other databases can be obtained by contacting the owners. MORPH, Ni’s, YGA, LHI, and Gallagher’s web-collected databases are large databases and well suited for regression-based age estimation using statistical algorithms like AAM and age manifold. FGNET is a suitable database for evaluations with several age estimation methods like AGES. AI & R, LHI, and Iranian datasets comprise comparatively high-resolution 2D face images. Other datasets stated here were not extensively used but may be appropriate for some application areas.
Age estimation evaluation protocols
An evaluation protocol determines the system test, the criteria for test data selection, and the system performance measure. A good validation strategy should be independent of training data and representative of the population from which it has been drawn [158]. An age estimation technique needs to be validated using previously unseen data to avoid overfitting and to improve its generalization capability. Cross-validation is a popular strategy for age estimation evaluation. In cross-validation, data is split into two subsets; one segment is used to train or learn the age estimation model and the other segment is used to validate or evaluate the model. In classic cross-validation, training and validation datasets must cross over in consecutive rounds such that every data point has an equal chance of being validated or evaluated against the other. The basic form of validation is holdout.
The holdout strategy is the simplest and most computationally efficient strategy [159] used for validating age estimation techniques. The dataset is randomly split into two sets: a training subset and a validation subset. Commonly, the training subset consists of two thirds of the original data, and the remaining one-third of the samples constitute the validation subset. The age estimation model is then fitted using the training subset and validated on the validation subset. In this strategy, the model is trained and validated only once. Although this method is widely preferred and takes a shorter time to compute, its evaluation depends on the data in the respective subsets and has high variance, so it gives different evaluation results depending on how the dataset is divided [160]. Another validation strategy commonly used is repeated random subsampling (RSS) [161, 162]. In the RSS validation technique, the holdout strategy is iterated a number of times and the results are averaged. The dataset is randomly split into two subsets (train and validation) with a fixed number of samples for each phase of validation. For each data split, the age estimation model is retrained on the train subset and validated using the validation subset. The advantage of this strategy over k-fold validation is that the sizes of the training and validation subsets are independent of the number of validation iterations. However, this strategy has the limitation that some samples may never be selected for validation while other samples may be selected repeatedly, leading to overlapping validation subsets [163]. With a sufficiently large number of iterations, however, RSS is likely to achieve results as good as k-fold validation [164].
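Holdout and repeated random subsampling can be sketched as below. This is a minimal illustration under stated assumptions: the ages are made-up values, the "model" is a trivial baseline that predicts the mean training age, and the function names are hypothetical rather than taken from any cited work.

```python
import random

def holdout_split(n_samples, train_fraction=2 / 3, rng=None):
    """Randomly split sample indices into a training and a validation list."""
    rng = rng or random.Random()
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

def repeated_random_subsampling(n_samples, evaluate, n_iterations=10, seed=0):
    """Average evaluate(train_idx, val_idx) over several random holdout splits."""
    rng = random.Random(seed)
    errors = [evaluate(*holdout_split(n_samples, rng=rng))
              for _ in range(n_iterations)]
    return sum(errors) / len(errors)

# Hypothetical ground-truth ages, one per image.
ages = [3, 8, 15, 22, 30, 41, 47, 55, 63]

def mean_age_baseline(train_idx, val_idx):
    """Predict the mean training age and return the MAE on the validation part."""
    guess = sum(ages[i] for i in train_idx) / len(train_idx)
    return sum(abs(ages[i] - guess) for i in val_idx) / len(val_idx)

print(repeated_random_subsampling(len(ages), mean_age_baseline, n_iterations=20))
```

Because every iteration draws a fresh split, the validation subsets may overlap, and nothing guarantees that each sample is validated at least once.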
Cross-validation [163] is a standard statistical technique used to assess model generalization ability, with wide application in classification and regression problems [165]. It involves dividing the dataset into two subsets; one subset is used to train an estimator while the other subset is used to test it [166]. Cross-validation is used to assess how a model generalizes to initially unseen data [163, 167]. Cross-validation strategies can be categorized into two: (i) exhaustive (compute all possible ways of splitting the data) and (ii) non-exhaustive (do not compute all possible ways of splitting the data). Exhaustive cross-validation algorithms include leave-one-out (LOO) and leave-p-out (LPO), while non-exhaustive ones include k-fold and repeated random subsampling (RSS) [160, 168]. Cross-validation [169] consists of averaging multiple holdout validation results from different subsets of data.
k-fold cross-validation is the basic form of cross-validation. Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold validation. In k-fold cross-validation [169], the original data is randomly split into k equal subsets. Then, k iterations of training and validation are performed such that in every iteration, a different fold of data is reserved for validation while the remaining k − 1 folds are used to learn a model. The estimated error is the mean of all validation errors. The standard deviation of these errors can be used to approximate the confidence range of the estimate. The main advantage of k-fold cross-validation is that eventually all samples will be used for both learning and validating models. The common value of k used in various techniques is 10 as a compromise between efficiency and accuracy. A stratified cross-validation is commonly used in order to improve accuracy of the estimation [163].
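The k-fold procedure described above can be sketched as follows. The splitter below is unstratified, the ages are made-up values, and the baseline "model" simply predicts the mean training age; all function names are hypothetical.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def k_fold_cross_validate(n_samples, k, evaluate, seed=0):
    """Return the mean and standard deviation of the k validation errors."""
    folds = k_fold_indices(n_samples, k, seed)
    errors = []
    for i, val_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        errors.append(evaluate(train_idx, val_idx))
    return statistics.mean(errors), statistics.stdev(errors)

# Hypothetical ground-truth ages, one per image.
ages = [3, 8, 15, 22, 30, 41, 47, 55, 63, 70]

def mean_age_baseline(train_idx, val_idx):
    """Predict the mean training age and return the MAE on the validation fold."""
    guess = sum(ages[i] for i in train_idx) / len(train_idx)
    return sum(abs(ages[i] - guess) for i in val_idx) / len(val_idx)

mean_error, error_spread = k_fold_cross_validate(len(ages), 5, mean_age_baseline)
print(mean_error, error_spread)
```

Unlike repeated random subsampling, every sample appears in exactly one validation fold, so all samples are eventually used for both learning and validation.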
Leave-one-out (LOO) [166, 169, 170] is a special type of cross-validation in which, given a dataset with C classes, C validation experiments are performed. For each experiment, data from C−1 classes is used for training and data from the one class that was left out is used for validation. Therefore, given a dataset of S subjects from age 0→A_{ n }, LOO cross-validation will perform S validation experiments. In each experiment i, facial images of subject S_{ i } are used for validation while images of the remaining S−1 subjects are used for learning a model. In this approach, images of each subject will be used for both training and validation. This way, the technique is validated in the same way as its application scenario, where the subject whose age is to be estimated is previously unseen in the system. Although LOO is almost unbiased, it may give unreliable estimates due to its high variance [171]. Leave-p-out (LPO) [172] with p∈{1,2,3,…,n−1} successively leaves out every possible subset of p data samples to be used for validation. In age estimation, given a set of images of N subjects, LPO can be used by leaving out the images of p subjects, where p≤(N−1), to be used for validation and using the images of the remaining N−p subjects for training. Elisseeff and Pontil [173] showed that LPO cross-validation is less biased compared to LOO. LPO will have \(\binom{n}{p}\) iterations where n is the number of images. For p > 1, this is almost always much higher than the n iterations in LOO, leading to high computation time. LPO with p = 1 is the same as LOO. Unlike the other methods, LOO and LPO are exhaustive cross-validation strategies. Further information on LPO can be found in [174]. Detailed information on cross-validation can be found in [172] and [175].
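Subject-wise leave-one-out, in which all images of one person are held out per round, can be sketched as below, together with a count of the rounds that leave-p-out would require. The subject labels are hypothetical, the function name is illustrative, and the figure of 82 is used only because FGNET is described as having 82 individuals.

```python
import math

def leave_one_person_out(subject_ids):
    """Yield (held_out_subject, train_idx, val_idx), one round per distinct subject."""
    for subject in sorted(set(subject_ids)):
        val_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, val_idx

# Hypothetical subject label for each of six images.
subject_ids = ["A", "A", "B", "C", "C", "C"]
rounds = list(leave_one_person_out(subject_ids))
for subject, train_idx, val_idx in rounds:
    print(subject, train_idx, val_idx)

# Number of leave-p-out rounds over 82 subjects for p = 1, 2, and 3.
lpo_rounds = [math.comb(82, p) for p in (1, 2, 3)]
print(lpo_rounds)
```

The binomial growth in the number of rounds is what makes LPO with p > 1 expensive compared to leaving out one subject at a time.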
Bootstrap is a strategy introduced by Efron and Tibshirani [176, 177]. Bootstrap is commonly used when working on a small dataset [159]. In this strategy, a bootstrap set is created by uniformly sampling, with replacement, n instances from the original data to make a training set. The remaining samples not selected are used as the testing set. The number of distinct samples selected is likely to change from fold to fold. Since data is sampled with replacement, the probability of any data sample not being selected is given by \(\left(1-\frac{1}{n}\right)^{n}\approx e^{-1}\approx 0.368\). The chance of a data sample being selected into the train set is therefore (1−0.368) = 0.632, and the expected number of distinct samples appearing in the train set is 0.632 × n. Since the error estimate obtained by using test data will be too pessimistic (since only 63.2% of instances are used for training), the error is calculated as \(\text{error} = 0.632\times e_{0} + 0.368\times e_{bs}\), where e_{0} is the rate of error obtained from bootstrap sets not having the instance being predicted (test set error) and e_{ bs } is the error obtained on the bootstrap sets themselves, both averaged over all data samples and bootstrap samples. The accuracy of the estimate improves as the process is repeated more times. More details on the bootstrap validation technique can be found in [177]. Bootstrapping increases the variance that can occur in each fold, which makes this strategy more representative of the real application situation [177]. This validation strategy is rarely used in age estimation.
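The weighted bootstrap error described above can be sketched as follows. The sketch follows the description in the text (test-set error weighted by 0.632, error on the bootstrap set itself weighted by 0.368); the ages, the mean-age baseline model, and the function names are hypothetical.

```python
import math
import random

def bootstrap_632(n_samples, fit, error, n_rounds=50, seed=0):
    """Return 0.632 * e0 + 0.368 * e_bs averaged over bootstrap rounds.

    fit(train_idx) returns a model; error(model, idx) returns its mean
    error on the samples in idx.
    """
    rng = random.Random(seed)
    e0_values, ebs_values = [], []
    for _ in range(n_rounds):
        # Sample n indices uniformly with replacement to form the training set.
        train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        chosen = set(train_idx)
        test_idx = [i for i in range(n_samples) if i not in chosen]
        if not test_idx:
            continue  # rare case: every sample was drawn, so nothing is left to test on
        model = fit(train_idx)
        e0_values.append(error(model, test_idx))
        ebs_values.append(error(model, train_idx))
    e0 = sum(e0_values) / len(e0_values)
    e_bs = sum(ebs_values) / len(ebs_values)
    return 0.632 * e0 + 0.368 * e_bs

# Hypothetical ground-truth ages, one per image.
ages = [3, 8, 15, 22, 30, 41, 47, 55, 63]

def fit_mean(idx):
    """Baseline model: the mean age of the training samples."""
    return sum(ages[i] for i in idx) / len(idx)

def mae(model, idx):
    """Mean absolute error of the constant prediction `model` on idx."""
    return sum(abs(ages[i] - model) for i in idx) / len(idx)

estimate = bootstrap_632(len(ages), fit_mean, mae)
print(estimate)

# Probability that a given sample is never drawn, for n = 1000.
p_not_selected = (1 - 1 / 1000) ** 1000
print(p_not_selected, math.exp(-1))
```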
In most cases, a dataset is split into three subsets: a validation subset, a training subset, and a testing subset [167]. In this approach, the validation subset is used to tune the system and to determine the termination point of the training phase, when overfitting starts occurring on the training subset. The testing subset is used to validate the trained model using data samples not in the validation and training subsets. Kiline and Uysal [164] proposed a technique of splitting the dataset with samples from specific subjects rotationally left out of training and validation sets. Budka and Gabrys [158] proposed a density-preserving sampling (DPS) technique that eliminates the need for repeating error estimation procedures by dividing the dataset into subsets that are guaranteed to be representative of the population the dataset is drawn from. These newly proposed approaches to model validation could be tried on the age estimation problem and the results compared with those of other common methods. Cross-validation and bootstrap strategies are commonly used when one has limited data such that the holdout strategy cannot provide sufficient data representativeness in both training and test sets. With abundant data with a stable distribution over time, a single stratified random split is able to provide the required representativeness [158].
For purposes of comparing the performance metric of two or more learning algorithms, Salzberg [178] proposed the use of k-fold cross-validation followed by appropriate hypothesis testing instead of comparing their average accuracies. This strategy can be used to compare two age estimation techniques.
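As one possible instance of such a test, the sketch below computes a paired t statistic on per-fold MAEs of two techniques evaluated on the same folds. The paired t-test is an assumption made here for illustration, since the text does not name a specific test and other tests may be more appropriate; the per-fold values and the function name are made up.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for paired per-fold errors of two methods."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    t = mean_diff / (std_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical per-fold MAEs from the same 10 folds for two techniques.
mae_method_a = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.4, 5.2, 5.0, 5.5]
mae_method_b = [4.6, 4.7, 5.0, 4.8, 4.9, 4.5, 5.1, 4.8, 4.9, 5.0]

t, dof = paired_t_statistic(mae_method_a, mae_method_b)
# 2.262 is the two-sided 5% critical value of Student's t with 9 degrees of freedom.
print(t, dof, abs(t) > 2.262)
```

If the absolute t value exceeds the critical value for the chosen significance level, the difference in mean error is unlikely to be due to the particular folds alone.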
In each iteration of validation, absolute error (AE) for each estimated age is defined as:
\[ \mathrm{AE}_{i} = \left| a_{i} - \bar{a}_{i} \right| \]
where a_{ i } is the ground truth age and \(\bar {a}_{i}\) is the estimated age. After all validation iterations, mean absolute error (MAE) is defined as the average of all absolute errors between estimated and ground truth age as:
\[ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left| a_{i} - \bar{a}_{i} \right| \]
where N is the total number of test images, a_{ i } is the ground truth age of image i, and \(\bar {a}_{i}\) the estimated age of image i. Although this performance measure is commonly used, it does not give age estimation performance for a specific age but rather gives the general performance of the technique over all ages. This approach could be slightly modified such that it gives the MAE for every age as well as the general MAE of the technique.
Given a set of testing images \(a_{1}^{n_{1}}, a_{2}^{n_{2}}\dots a_{k}^{n_{k}}\) belonging to k ages to be estimated, with n_{ j } representing the number of test images known to belong to age a_{ j }, MAE for every age can be defined as:
\[ \mathrm{MAE}_{j} = \frac{1}{n_{j}}\sum_{i=1}^{n_{j}}\left| a_{j} - \bar{a}_{i} \right| \]
where \(\bar {a}_{i}\) is the estimated age for image i of age a_{ j } and n_{ j } is the number of test images belonging to age a_{ j }. This will give the age-specific performance of the age estimation technique. The overall MAE can be found by weighting each age-specific MAE by its number of test images, summing over all ages tested, and dividing by the total number of test images:
\[ \mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{k} n_{j}\,\mathrm{MAE}_{j} \]
where N=n_{1}+n_{2}+⋯+n_{ k }.
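The overall and age-specific MAE can be computed as in the sketch below, which also checks that weighting each age-specific MAE by its number of test images reproduces the overall MAE. The ground-truth and estimated ages are made-up values, and the function names are hypothetical.

```python
from collections import defaultdict

def mean_absolute_error(true_ages, estimated_ages):
    """Average absolute difference between ground-truth and estimated ages."""
    errors = [abs(a - e) for a, e in zip(true_ages, estimated_ages)]
    return sum(errors) / len(errors)

def per_age_mae(true_ages, estimated_ages):
    """Map each ground-truth age to (MAE for that age, number of test images)."""
    grouped = defaultdict(list)
    for a, e in zip(true_ages, estimated_ages):
        grouped[a].append(abs(a - e))
    return {age: (sum(errs) / len(errs), len(errs))
            for age, errs in grouped.items()}

# Hypothetical ground-truth and estimated ages for six test images.
true_ages = [10, 10, 30, 30, 30, 50]
estimated = [12, 9, 33, 30, 27, 58]

overall = mean_absolute_error(true_ages, estimated)
by_age = per_age_mae(true_ages, estimated)
# Weight each age-specific MAE by its image count and divide by the total count.
weighted = sum(mae * n for mae, n in by_age.values()) / len(true_ages)
print(overall, by_age, weighted)
```

In this toy example the single image of age 50 has a much larger error than the others, which the overall MAE alone would hide.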
Age estimation technique performance is evaluated based on MAE. The smaller the MAE, the better the age estimation performance. MAE only shows average performance of the age estimation technique. MAE is the appropriate measure of age estimation when the training data has missing images [10]. The overall accuracy of the estimator is given by cumulative score (CS) [12, 31] which is defined as:
\[ \mathrm{CS}(x) = \frac{N_{e\leq x}}{N}\times 100\% \]
where N_{e≤x} is the number of images on which the age estimation technique makes an absolute error no higher than the error tolerance of x years and N is the total number of test images.
In age-group estimation, the age-group label represents a range of ages; hence, the cumulative scores are compared at error level 0, i.e., the percentage of exactly correct age-group estimation. Therefore, the CS equation becomes:
\[ \mathrm{CS}(x) = \frac{n_{x}}{N_{x}}\times 100\% \]
where n_{ x } is the number of test images correctly recognized as belonging to age group x and N_{ x } is the total number of test images in age group x. Therefore, CS is used as an indicator of accuracy of the age-group estimator [13]. CS is a useful measure of performance in age estimation when the training dataset has samples at almost every age [11]. MAE is a good evaluation measure when the training set has a lot of missing ages. However, in age estimation, both MAE and CS are used, since the datasets used for evaluation may be extremely imbalanced or skewed.
A review of age estimation studies
Age-group estimation
Global, local, and hybrid features have been previously used in age and agegroup estimation. Ramanathan et al. [179] present a recent survey in automated age estimation techniques.
An age group is a range of ages. Persons whose real ages are within the defined range are said to be in the same age group. A significant amount of research has been done to automatically extract visual artifacts from faces and group persons in respective age groups. Kwon and Lobo [87] estimated age group based on anthropometry and density of wrinkles. They separated adults from babies using distance ratios between frontal face landmarks on a small dataset of 47 images. They also extracted wrinkle features from specific regions using snakes. Young adults were differentiated from senior adults using these wrinkle indices. Baby group classification accuracy was lower than 68%, but overall performance of their experiments was not reported. Furthermore, ratios used were mainly from baby faces. Horng et al. [85] used geometric features and Sobel filter for texture analysis to classify face images into four groups. They used Sobel edge magnitude to extract and analyze wrinkles and skin variance. They achieved an accuracy of 81.6% on subjectively labeled age groups.
Ramanathan and Chellappa [59] computed eight distance ratios for modelling age progression in young faces, from 0 to 18 years. Their objective was to predict one’s appearance and improve face recognition across age progression. Using 233 images, of which 109 were from FGNET aging dataset and the rest from their private dataset, they reported improvement in face recognition from 8 to 15%. Dehshibi and Bastanfard [20] used distance ratios between landmarks to classify human faces in various age groups. Using a back propagation neural network with distance ratios as inputs, they classified face images into four age groups of below 15, 16–30, 31–50, and above 50. Using a private dataset, they reported 86% accuracy. Thukral et al. [180] used geometric features and decision fusion for age-group estimation. They achieved 70% overall performance for 0–15, 15–30, and above 30 age groups. Farkas et al. [181] used 10 anthropometric measurements of the face to classify individuals in various ethnic groups. They analyzed these measurements and identified the ones that contribute significantly to diversity in facial shape in different ethnic groups. They also found that horizontal measurements differed more between ethnic groups than vertical measurements.
Tiwari et al. [182] developed a morphological-based face recognition technique using Euclidean distance measurements between fiducial facial landmarks. Using morphological features with a back propagation neural network, they reported a recognition rate superior to that of principal component analysis (PCA) [90] with a back propagation neural network. This technique recognized faces, but it was not independent of the aging factor because these distances vary as one ages. This signifies that distances between facial landmarks differ at different ages, especially in young age groups, and therefore could be used in age estimation. Gunay and Nabiyev [94] used spatial LBP [76] histograms to classify faces into six age groups. Using nearest neighbor classifiers, they achieved accuracy of 80% on age groups 10 ± 5, 20 ± 5, 30 ± 5, 40 ± 5, 50 ± 5, and 60 ± 5. In [146], Gunay and Nabiyev trained three support vector machine (SVM) models for age-group estimation using AAM [64], LBP, and Gabor filter [67] features. They fused decisions from these classifiers to obtain the final decision. Although they reported 90% accuracy of subsequent age estimation, overall performance of age-group estimation was not reported.
Hajizadeh and Ebrahimnezhad [183] represented facial features using histogram of oriented gradients (HOG) [137]. Using probabilistic neural network (PNN) to classify HOG features extracted from several regions, they achieved 87% accuracy in classifying face images into four groups. Liu et al. [184] built a region of certainty (ROC) to link uncertainty-driven shape features with particular surface features. Two shape features are first designed to determine face certainty and classify it. Thereafter, SVM is trained on gradient orientation pyramid (GOP) [185] features for age-group classification. Testing this method on three age groups, 95% accuracy was reported. They further used GOP in [186] with analysis of variance (ANOVA) for feature selection to classify faces into age groups using linear SVM [187] to model features from the eyes, nose, and mouth regions. They achieved 91% on four age groups on FGNET dataset and 82% on MORPH dataset. It was also found that the overall performance of age estimation decreases as the number of age groups increases. This is because the number of images in each age group reduces drastically as the number of groups increases.
Lanitis et al. [66] adopted AAM to represent face image as a vector of combined shape and texture parameters. They defined aging as a linear, cubic, or quadratic function. For automatic age estimation, they further evaluated quadratic function, nearest neighbor, and artificial neural network (ANN) in [23]. They found that hierarchical age estimation achieves better results with quadratic function and ANN classifiers. Although AAM has been extensively used, it does not extract texture information. This problem is avoided by using hybrid feature extraction techniques to combine both shape and texture features for age and agegroup estimation.
Sai et al. [188] used LBP, Gabor, and biologically inspired features for face representation. They used extreme learning machines (ELM) [189] for age-group estimation. Their approach achieved accuracy of about 70%. Using LBP and a bank of Gabor filters, Wang et al. [190] classified images into four age groups. They used SVM, error-correcting output codes (ECOC), and AdaBoost for age-group estimation. Table 3 shows the summary of age and age-group estimation studies.
Age estimation
Age is a real number that signifies the number of years elapsed from one’s birth to a given point in life. Age estimation is the process of estimating one’s actual age using visual artifacts on the face. These visual artifacts are extracted and used to estimate one’s age.
Lanitis et al. [66] adapted the active appearance model (AAM) to aging faces by proposing an aging function. They defined age as a function age = f(b) to cater for age-introduced variations. In this function, age is the real estimated age of a subject, b is a feature vector of 50 AAM-learned parameters, and f is the aging function. They performed experiments on 500 images of 60 individuals, of which 45 subjects had images at different ages. Focusing on small age variations, they demonstrated that simulation of age improves performance of face recognition from 63% to 71% and from 51% to 66% when training and testing datasets are used interchangeably.
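To make the aging function age = f(b) concrete, the sketch below evaluates a quadratic form, age = offset + w1·b + w2·(b squared element-wise). The survey states only that aging was defined as a linear, quadratic, or cubic function, so this exact parameterization, the function name, and the toy weights are assumptions for illustration. A real b would hold 50 AAM parameters and the weights would be learned from training faces.

```python
def quadratic_aging_function(b, offset, w1, w2):
    """Evaluate age = offset + w1 . b + w2 . (b squared element-wise).

    b  : list of model parameters describing one face (assumed given)
    w1 : weights for the linear terms (hypothetical, normally learned)
    w2 : weights for the squared terms (hypothetical, normally learned)
    """
    if not (len(b) == len(w1) == len(w2)):
        raise ValueError("b, w1 and w2 must have the same length")
    linear = sum(w * x for w, x in zip(w1, b))
    quadratic = sum(w * x * x for w, x in zip(w2, b))
    return offset + linear + quadratic


# Toy example with a two-parameter face vector.
estimated_age = quadratic_aging_function(
    b=[1.0, -2.0], offset=20.0, w1=[3.0, 1.0], w2=[0.5, 0.25]
)
print(estimated_age)  # 22.5
```

Setting every entry of w2 to zero reduces this to the linear aging function; a cubic variant would add a third weight vector applied to the cubed parameters.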
Adopting aging pattern subspace (AGES), Geng et al. [13, 26] proposed automatic age estimation using the appearance of face images. Evaluating AGES on the FGNET aging database, they used 200 AAM parameters to characterize each image for age estimation. They reported a mean absolute error (MAE) of 6.77 years. Fu and Huang [12] used age-separated face images to model a low-dimensional manifold. Age was estimated by linear and quadratic regression analysis of feature vectors derived from the respective low-dimensional manifold. The same manifold learning approach was used by Guo et al. in [31]. They extracted face aging features using an age manifold learning scheme and performed learning and age prediction using a locally adjusted regressor. Their approach reported better performance than support vector regression (SVR) and SVM.
Guo et al. [31] used locally adjusted robust regression (LARR) to estimate age. Evaluating their approach on a large dataset, they reported MAE of 5.30 and 5.07 years on FGNET. Guo et al. [82] further proposed age estimation using biologically inspired features (BIF) [80, 81]. BIF features with support vector machine (SVM) achieved MAE of 4.77 years on FGNET aging dataset and 3.91 and 3.47 years on females and males, respectively, on YGA dataset. Combining gender and age estimation, Guo et al. [191] used BIF and age manifold feature extraction with an SVM classifier. They reported superior MAE of 2.61 years for females and 2.58 years for males on the YGA database (see also [11]). Yan et al. [192] performed person-independent age image encoding using synchronized submanifold embedding (SME). SME considers both individuals’ identities and age labels to improve generalization ability on age estimation. Evaluating this technique on FGNET, they reported a MAE of 5.21 years. Yan et al. [83, 84] used spatially flexible patch (SFP) for feature description. SFP considers not only local patches but also their spatial information. With SFP, slight misalignment, pose variations, and occlusion can be effectively handled. Furthermore, this technique can improve the discriminating characteristics of the feature vector when limited samples are available. Adopting Gaussian mixture model (GMM), they achieved a MAE of 4.95 years on FGNET aging dataset and 4.94 and 4.38 years on females and males, respectively, on YGA dataset.
Suo et al. [152] designed a graphical facial feature topology based on a hierarchical face model [193]. They applied specific filters to different features at various stages of their hierarchical feature extraction design. Using a multilayer perceptron (MLP), they reported MAE of 5.97 years on FGNET and 4.68 years on their private dataset.
A craniofacial aging model that combines psychophysical and anthropometric evidence was proposed in [59]. The model was used to simulate the perceived age of a subject across ages to improve the accuracy of face recognition. Choi et al. [70] proposed an age estimation approach using hierarchical classifiers with local and global facial features. Using Gabor filters for wrinkle extraction and LBP for skin feature extraction, they classified face images into age groups with SVM. This approach is error prone because it depends on only a single classifier: a wrong age-group classification leads to a wrong age estimate. For accurate age estimation, age-group classification must be robust, and this can be achieved by using an ensemble of classifiers. Chao et al. [194] determined the relationship between age labels and facial features by merging distance metric learning and dimensionality reduction. They used label-sensitive learning with K-nearest neighbor (KNN) and SVR for age estimation. Chang et al. [195] proposed an ordinal hyperplane ranker for age estimation. Using AAM and SVM, their approach achieved 4.48 MAE on FGNET and MORPH II datasets. Guo et al. [123] built a regression model using BIF and partial least squares (PLS) for age estimation. Their approach achieved 4.43 MAE on MORPH II dataset and showed that learning label distribution improves age estimation. Lu and Tan [142] investigated age estimation using an ordinary preserving manifold analysis approach. They found that gait can be used as an effective cue for age estimation at a distance, which can enhance the understanding capabilities of existing visual surveillance systems. They further found that discriminating age information can be better exploited in the low-dimensional manifold to achieve better age estimation performance.
Using uniform ternary patterns (UTP) and AAM, Tan et al. [107] and Luu et al. [196] proposed a spectral regressor for age estimation. Evaluating their technique, they achieved a MAE of 6.17. Further work by Luu et al. [197] using contourlet transform achieved a MAE of 6.0 on FGNET and PAL datasets, which was better than using UTP. Using Gabor wavelets and orthogonal locality preserving projections (OLPP), Lin et al. [198] developed an automatic age estimation system. They evaluated their technique on the FGNET dataset with SVM as a classifier and achieved a MAE of 5.71 years. Wu et al. [115] used 2D points to model facial shape for age estimation. Choober et al. [199] proposed use of an ensemble of classifiers for improving automatic age estimation. The limitation of this work is that only neural networks were used to make the ensemble. An ensemble can be made more robust if different classifiers are used so that each complements the others. Guo and Mu [124] compared canonical correlation analysis (CCA) and partial least squares (PLS) performance in age, gender, and ethnicity estimation. Using BIF as a feature extractor, they found that CCA performs better than PLS. Hadid and Pietikainen [200] experimented with manifold learning for age and gender estimation. They reported 83.1% age estimation accuracy on images extracted from video. Geng et al. [201] learned label distributions and used them for age estimation. Their technique was evaluated on both FGNET and MORPH datasets.
Guo et al. [82] first introduced BIF in the image-based age estimation domain. They reported that using a Gabor filter bank starting from smaller sizes like 5×5 can characterize aging. Later, Guo and Mu [123] used kernel partial least squares (KPLS) for simultaneous dimensionality reduction of BIF features and age estimation using a regressor. They also showed that partial least squares (PLS) performs better in dimensionality reduction than traditional techniques like principal component analysis (PCA). They later [124] used canonical correlation analysis (CCA) to model age estimation as a multiple-label regression problem. They reported that CCA-based methods work better than KPLS-based methods. Spizhevoi and Bovyrin [202] used RBF SVM to learn BIF features for age estimation. Han et al. [203] proposed hierarchical age estimation and analyzed how aging affects distinct facial components. They used SVM for both classification and regression on each face component. Their component localization was not accurate, which affected the features subsequently extracted from these components. They later [147] compared human and machine performance on demographic (age, gender, and ethnicity) estimation. They modelled age estimation in particular as a hierarchical problem that consists of between-class classification and within-class regression of boosted BIF and demographic informative features extracted from a face image.
Deep learning schemes, especially convolutional neural networks (CNN), have been successfully used in face analysis tasks including face detection, face alignment [204], face verification [205], and demographic estimation [206]. Wang et al. [207] used feature maps obtained from different layers of a deep learning model as age features. Huerta et al. [135] provide a thorough evaluation of deep learning for age estimation using fused features and compare it with handcrafted fusion features. CNNs have been used in several recent studies on age estimation and have demonstrated superior performance compared to other methods. Niu et al. [208] used ordinal regression and a multiple-output CNN for age estimation and reported a MAE of 3.27 on MORPH II and a private Asian Face Age Dataset (AFAD). Chen et al. [209] presented a cascaded CNN that had 0.297 Gaussian error on age estimation. As further demonstrated in [210–212], CNNs have posted better results in age estimation tasks. Although CNNs perform better than traditional methods, their applicability is limited by the high processing demand of their implementation. Table 3 shows a summary of studies in age-group and age estimation.
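One common way to cast ordinal regression as a multiple-output problem is to replace the single age label with a series of binary questions of the form "is the age greater than k?". The sketch below shows only this label encoding and decoding step, under that assumption. It is not taken from [208], does not include any network, and the function names are hypothetical.

```python
def encode_ordinal(age, min_age, max_age):
    """Turn an integer age into max_age - min_age binary targets.

    The target at position i answers: is age greater than min_age + i?
    """
    return [1 if age > threshold else 0
            for threshold in range(min_age, max_age)]


def decode_ordinal(probabilities, min_age, cutoff=0.5):
    """Recover an age from per-threshold probabilities.

    The predicted age is min_age plus the number of binary outputs
    whose probability exceeds the cutoff.
    """
    return min_age + sum(1 for p in probabilities if p > cutoff)


# Ages 16..20 give four binary outputs (thresholds 16, 17, 18, 19).
print(encode_ordinal(18, 16, 20))                # [1, 1, 0, 0]
print(decode_ordinal([0.9, 0.8, 0.3, 0.1], 16))  # 18
```

Because each output only has to decide whether the age exceeds one threshold, neighbouring ages share most of their targets, which is how the ordering of age labels is preserved.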
Conclusions
A comprehensive survey of various techniques and approaches used for age estimation has been presented. There has been enormous effort from both academia and industry dedicated towards modelling age estimation, designing algorithms, collecting aging face datasets, and defining protocols for evaluating system performance. Table 3 summarizes the findings of recent studies in age estimation, the evaluation protocol used, the dataset used, the age estimation approach used (regression, classification, or hybrid), and the feature extraction or age face representation used.
The main issues to consider in age estimation via faces are image representation and estimation techniques. AAM provides parametric modelling for face representation. A face is represented as a set of shape and texture parameters learned from a face image. AAM can represent both young and old faces since model parameters encode both facial shape and texture. AAM is often used together with regression-based age estimation approaches. Anthropometric face representation encodes change in facial shape and can be very significant in capturing change in facial shape in young faces. AGES can be used to extract subjects’ aging patterns when a dataset has sequential aging face images, while age manifold is convenient when a large dataset with wide age ranges has missing aging face images. Age manifold learning entwines aging feature extraction and dimensionality reduction. Age manifold can be used in both classification and regression-based approaches. Appearance models often extract facial features that can be used in regression or classification-based age estimation approaches. These features represent facial appearance and could be texture, shape, or wrinkle. Feature extraction techniques like LBP, Gabor, BIF, LDA, PCA, and LDP have often been used for appearance face modeling.
Age estimation can be approached either as age-group estimation or as exact age estimation. Age-group estimation approaches approximate the age range in which a face image falls. Exact age estimation approaches estimate a single label (value) that represents the age of a face image. Both exact age and age-group estimation can be classification-based, regression-based, or a hybrid of classification and regression. The choice between regression and classification may be guided by face image representation and by the size and age distribution of the dataset. For big datasets with sequential age labels, both classification and regression can be used, while for datasets with only age-group labels or with significantly missing images at some ages, a classification-based approach may be more appropriate. Classification and regression can also be combined in a hierarchical manner. In this hybrid approach, classification is often used for age-group estimation, followed by exact age estimation within the age group using regression techniques.
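The hybrid scheme described above, age-group classification followed by regression within the selected group, can be sketched as follows. This is a deliberately small illustration under strong assumptions: each face is reduced to a single scalar feature, the group classifier is a nearest-centroid rule, and the within-group regressor is a least-squares line. None of these choices come from a specific study in this survey, and all names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b for one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def train_hierarchical(samples, group_of):
    """samples: list of (feature, age); group_of maps an age to a group."""
    by_group = {}
    for x, age in samples:
        by_group.setdefault(group_of(age), []).append((x, age))
    model = {}
    for group, items in by_group.items():
        xs = [x for x, _ in items]
        ys = [age for _, age in items]
        # Stage 1 data: group centroid. Stage 2 data: per-group regressor.
        model[group] = {"centroid": sum(xs) / len(xs),
                        "line": fit_line(xs, ys)}
    return model


def predict_hierarchical(model, x):
    """Classify into the nearest group, then regress inside that group."""
    group = min(model, key=lambda g: abs(x - model[g]["centroid"]))
    a, b = model[group]["line"]
    return group, a * x + b


def young_or_old(age):
    """Hypothetical two-group split used only for this example."""
    return "young" if age < 30 else "old"


toy_samples = [(1.0, 10), (2.0, 20), (8.0, 40), (10.0, 60)]
toy_model = train_hierarchical(toy_samples, young_or_old)
print(predict_hierarchical(toy_model, 9.0))
```

The sketch also shows the weakness noted earlier for hierarchical methods: if the first stage picks the wrong group, the second stage regresses with the wrong model, so the group classifier needs to be robust.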
Age estimation techniques can be evaluated using mean absolute error (MAE) or cumulative score (CS). MAE is appropriate when the training set has many missing ages, while CS is used when the training dataset has samples at almost every age. Overall performance of the system is represented by CS. In practice, both MAE and CS are used because different techniques and datasets may bias the evaluation. The most often used evaluation protocols are LOPO and cross-validation.
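The two measures can be computed as in the sketch below, using their usual definitions: MAE is the average absolute difference between estimated and true ages, and CS at tolerance j is the percentage of test images whose absolute error does not exceed j years. The function names and sample values are illustrative only.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average of |predicted - true| over all test images."""
    errors = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def cumulative_score(true_ages, predicted_ages, tolerance):
    """Percentage of test images with absolute error <= tolerance years."""
    hits = sum(1 for t, p in zip(true_ages, predicted_ages)
               if abs(p - t) <= tolerance)
    return 100.0 * hits / len(true_ages)


true_ages = [20, 30, 40, 50]
predicted_ages = [22, 29, 47, 50]  # absolute errors: 2, 1, 7, 0
print(mean_absolute_error(true_ages, predicted_ages))  # 2.5
print(cumulative_score(true_ages, predicted_ages, 2))  # 75.0
```

Plotting CS for a range of tolerances gives a curve, so two systems with the same MAE can still be told apart by how many of their estimates fall within a small error level.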
There are a number of promising future directions for age estimation. The following research directions may improve age estimation performance:

Fusion—Feature and decision fusion for age estimation has not been extensively investigated. Fusing shape, wrinkle, and texture features may result in a rich feature set that can distinguish faces at different ages or in different age groups. Decisions from multiple classifiers or regressors could also be fused to see how they impact age estimation performance.

Multi-instance—Facial landmarks can be extracted and each considered as an instance for age estimation. Which parts of the face age faster and how? A face can be broken down into its components (eyes, forehead, nose, nose bridge, mouth, and cheeks) and aging investigated on each component. Both geometric and anthropometric appearance face modeling can be used on each component.

Ethnic—Faces of subjects from different ethnic groups age differently. Incorporating ethnic parameters as in [144] improves age estimation performance. This approach has not been fully investigated due to lack of large datasets with images from different ethnic groups like African, Asian, and Caucasian.

Lifestyle—One’s lifestyle affects how the face ages. Faces of individuals of the same age but with different lifestyles will appear different. Research has shown that smoking has an influence on facial aging [34, 38–41]. It may be interesting to investigate aging and age estimation among a smoking population and how it compares to a non-smoking population. Taister et al. [34] assert that exposure to drugs and psychological stress affects skin texture and color, making the skin complexion spotted and blemished. Drug use and stress could also be investigated to determine their effect on age estimation.

Environment—Taister et al. [34] found that general exposure to wind and arid air influences facial aging. Arid environments and wind dehydrate the skin, leading to wrinkle formation. An investigation of age estimation in populations living in different environments is an interesting direction for further research.

Databases—A large multiracial database is needed for effective investigation of aging in different ethnic groups and genders. Collecting a large database with well-distributed age labels is essential. Web image collection is an efficient way of achieving this [154, 155].

Profile face aging—How do non-frontal parts of the face age? How can age be estimated from non-frontal face images? Investigations to answer these two questions are needed, though they depend on the availability of databases of non-frontal face images. 3D face modelling could be vital in investigating profile face aging and age estimation.

Multisensor—Image collection from multiple imaging sensors could be appropriate for mitigating degrading factors from uncontrollable and personalized attributes. Fusion could be done on the image features for age estimation.
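As a small illustration of the decision-level fusion raised in the Fusion direction above, the sketch below combines age-group labels from several classifiers by majority vote and exact age estimates from several regressors by a weighted mean. The fusion rules, the tie-breaking policy, and the function names are assumptions chosen for simplicity; the survey does not prescribe a particular rule.

```python
from collections import Counter


def fuse_group_decisions(labels):
    """Majority vote over age-group labels from several classifiers.

    Ties are broken in favour of the label that appears first.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def fuse_age_estimates(estimates, weights=None):
    """Weighted mean of exact ages predicted by several regressors."""
    if weights is None:
        weights = [1.0] * len(estimates)
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


print(fuse_group_decisions(["adult", "teen", "adult"]))  # adult
print(fuse_age_estimates([30.0, 34.0]))                  # 32.0
print(fuse_age_estimates([30.0, 40.0], [3.0, 1.0]))      # 32.5
```

The weights could, for example, reflect each regressor's validation MAE, so that more reliable models contribute more to the fused estimate.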
Abbreviations
AAM: Active appearance model
ASM: Active shape model
AGES: Aging pattern subspace
ACM: Active contour model
ANN: Artificial neural network
ANOVA: Analysis of variance
BIF: Biologically inspired feature
CS: Cumulative score
CCA: Canonical correlation analysis
CNN: Convolutional neural network
2D: Two dimensional
ECRM: Electronic customer relationship management
EM: Expectation maximization
ELM: Extreme learning machines
FGNET: Face and gesture network
GLCM: Gray-level co-occurrence matrix
GNN: Grassmann nearest neighbor
GMM: Gaussian mixture models
HOG: Histogram of oriented gradients
KNN: K-nearest neighbour
KPLS: Kernel partial least squares
LBP: Local binary patterns
LDP: Local directional patterns
LDA: Linear discriminant analysis
LLE: Locally linear embedding
LPP: Locality preserving projections
LTP: Local ternary patterns
LOPO: Leave-one-person-out
LARR: Locally adjusted robust regression
LOO: Leave-one-out
LPO: Leave-p-out
MMML: Multi-manifold metric learning
MAE: Mean absolute error
MLBP: Multi-level local binary pattern
MLP: Multilayer perceptron
OLPP: Orthogonal locality preserving projections
PDM: Point distribution model
PCA: Principal component analysis
PLS: Partial least squares
RSS: Random subsampling
ROC: Region of certainty
SVM: Support vector machines
SVR: Support vector regression
SFP: Spatially flexible patch
STD: Standard deviation
SOM: Self-organizing maps
SURF: Speeded-up robust features
SDP: Semidefinite programming
SME: Submanifold embedding
TV: Television
UV: Ultraviolet
UTP: Uniform ternary patterns
YGA: Yamaha gender and age
References
MS Zimbler, MS Kokosa, JR Thomas, Anatomy and pathophysiology of facial aging. Facial Plast. Surg. Clin. N. Am.9:, 179–187 (2001).
R Alley, Social and Applied Aspects of Perceiving Faces (Lawrence Erlbaum Associates, Inc, Hillsdale, 1998).
A Gallagher, T Chen, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Estimating age, gender and identity using first name priors (IEEEAnchorage, 2008).
A Gallagher, T Chen, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Understanding images of groups of people, (2009).
N Ramanathan, R Chellappa, S Biswas, Computational methods for modeling facial aging: a survey. J. Vis. Lang. Comput.20:, 131–144 (2009).
MJ Raval, P Shankar, Age invariant face recognition using artificial neural network. Int. J. Advance Eng. Res. Dev.2:, 121–128 (2015).
A Sonu, K Sushil, K Sanjay, A novel idea for age invariant face recognition. Int. J. Innov. Res. Sci. Eng. Technol.3:, 15618–15624 (2014).
SN Jyothi, M Indiramma, Stable local feature based age invariant face recognition. Int. J. Appl. Innov. Eng. Manag.2:, 366–371 (2013).
S Jinli, C Xilin, S Shiguang, G Wen, D Qionghai, A concatenational graph evolution aging model. IEEE Trans. Pattern Anal. Mach. Intell.34:, 2083–2096 (2012).
G Panis, A Lanitis, N Tsapatsoulis, TF Cootes, Overview of research on facial ageing using the FGNET ageing database. IET Biometrics. 5:, 37–46 (2016).
Y Fu, G Guo, T Huang, Age synthesis and estimation via faces: a survey. IEEE Trans. Pattern Anal. Mach. Intell.32:, 1955–1976 (2010).
Y Fu, TS Huang, Human age estimation with regression on discriminative aging manifold. IEEE Trans. Multimedia. 10:, 578–584 (2008).
X Geng, ZH Zhou, K Smith-Miles, Automatic age estimation based on facial aging patterns. IEEE Trans. Pattern Anal. Mach. Intell.29:, 2234–2240 (2007).
N Ramanathan, R Chellappa, Face verification across age progression. IEEE Trans. Image Process.15:, 3349–3361 (2006).
LA Zebrowitz, Reading Faces: Window to the Soul (Westview Press, Washington DC,1997).
AM Albert, K Ricanek, E Patterson, A review of the literature on the aging adult skull and face: implications for forensic science research and applications. Forensic Sci. Int.172:, 1–9 (2007).
LS Mark, JB Pittenger, H Hines, C Carello, RE Shaw, JT Todd, Wrinkling and head shape as coordinated sources for agelevel information. Percept. Psychophys.27:, 117–124 (1980).
U Park, Y Tong, AK Jain, Age invariant face recognition. IEEE Trans. Pattern Anal. Mach. Intell.32:, 947–954 (2010).
N Ramanathan, R Chellappa, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Face verification across age progression (IEEESan Diego, 2005), pp. 462–469.
MM Dehshibi, A Bastanfard, A new algorithm for age recognition from facial images. Signal Process.90:, 2431–2444 (2010).
FGNET: Face and Gesture Recognition Working Group (2002). http://wwwprima.inrialpes.fr/FGnet/. Accessed 10 Apr 2017.
Z Song, B Ni, D Guo, T Sim, S Yan, in Proceedings of IEEE International Conference on Computer Vision (ICCV). Learning universal multiview age estimator using video context (IEEEBarcelona, 2011), pp. 241–248.
A Lanitis, C Draganova, C Christodoulou, Comparing different classifiers for automatic age estimation. IEEE Trans. Man Syst. Cybern.34:, 621–628 (2004).
E Patterson, A Sethuram, M Albert, K Ricanek, M King, in Proceedings of 1ST IEEE Conference on Biometrics, Theory and Application Systems. Aspects of age variation in facial morphology affecting biometrics (IEEECrystal City, 2007), pp. 1–6.
K Ricanek, E Boone. The effect of normal adult aging on standard PCA face recognition accuracy rates (IEEEMontreal, 2005), pp. 2018–2023.
X Geng, ZH Zhou, Y Zhang, G Li, H Dai, in Proceedings of ACM Conference on Multimedia. Learning from facial aging patterns for automatic age estimation (ACMSanta Barbara, 2006), pp. 307–316.
F Gao, H Ai, in Proceedings of 3rd International Conference on Advances in Biometrics. Face age classification on consumer images with gabor feature and fuzzy lda method: lecture notes in computer science (SpringerAlghero, 2009), pp. 132–141.
H Yang, D Huang, Y Wang, H Wang, Y Tang, Face aging effect simulation using hidden factor analysis joint sparse representation. IEEE Trans. Image Process.25(6), 2493–2507 (2016). https://doi.org/10.1109/TIP.2016.2547587.
H Wang, D Huang, Y Wang, H Yang, Facial aging simulation via tensor completion and metric learning. IET Comput. Vis.11(1), 78–86 (2017). https://doi.org/10.1049/ietcvi.2016.0074.
B Bruyer, JC Scailquin, Person recognition and ageing: the cognitive status of addresses - an empirical question. Int. J. Psychol.29:, 351–366 (1994).
G Guo, Y Fu, C Dyer, T Huang, Imagebased human age estimation by manifold learning and locally adjusted robust regression. IEEE Trans. Image Process.17:, 1178–1188 (2008).
AK Jain, SC Dass, K Nandakumar, in Proceedings of International Conference on Biometric Authentication. Soft biometrics traits for personal recognition systems (SpringerBerlin, 2004), pp. 731–738.
I Macleod, B Hill, Heads and Tales: Reconstructing Faces (National Museums of Scotland, Edinburgh, 2001).
MA Taister, SD Holliday, HIM Borman, Comments in facial aging in law enforcement investigation. Forensic Sci. Commun.2:, 1–11 (2000).
E Drakaki, C Dessinioti, CV Antoniou, Air pollution and the skin. Environ. Sci.2(11), 1–6 (2014).
D Zoe, MD Draelos, Aging in a polluted world. J. Cosmet. Dermatol.13:, 85 (2014).
A Vierkotter, T Schikowski, U Ranft, D Sugiri, M Matsui, U Kramer, J Krutmann, Airborne particle exposure and extrinsic skin aging. J. Investig. Dermatol.130(12), 2719–2726 (2010).
FG Fedok, The aging face. Facial Plast. Surg.12:, 107–115 (1996).
WC Leung, I Harvey, Is skin ageing in the elderly caused by sun exposure or smoking?Br. J. Dermatol.147:, 1187–1191 (2002).
HB López, J Tercedor, JM Ródenas, LFRM Simón, OS Serrano, Skin aging and smoking. Rev. Clin. Esp., 147–149 (1995).
PM O’Hare, AB Fleischer, RB D’Agostino, SR Feldman, MA Hinds, AA Rassette, A Mcmichael, PM Williford, Tobacco smoking contributes little to facial wrinkling. J. Eur. Acad. Dermatol. Venereol.12:, 133–139 (1999).
RB Shaw, EB Katzel, PF Koltz, DM Kahn, JA Girotto, HN Langstein, Aging of the mandible and its aesthetic implications. Plast. Reconstr. Surg.125:, 332–342 (2010).
M Situm, M Buljan, V Cavka, V Bulat, I Krolo, LL Mihic, Skin changes in the elderly people—how strong is the influence of the UV radiation on skin aging?Coll. Anthropol.34:, 9–13 (2010).
R Neave, in Proceedings of Forensic Medicine (J.G. Clement and D. L. Ranson, eds). Age Changes to the Face in Adulthood (Oxford University PressNew York, 1998), pp. 225–231.
S Coleman, R Grover, The anatomy of the aging face: volume loss and changes in 3dimensional topography. Aesthet. Surg. J. Am. Soc. Aesthet. Plast. Surg.26:, 4–9 (2006).
PL Leong, Aging changes in the male face. Facial Plast. Surg. Clin. N. Am.16:, 277–279 (2008).
K Sveikata, I Balciuniene, J Tutkuviene, Factors influencing face aging. Literature review. Stomatologija Baltic Dental Maxillofac. J.13:, 113–116 (2011).
EC Paes, HJ Teepen, WA Koop, M Kon, Perioral wrinkles: histologic differences between men and women. Aesthet. Surg. J.29:, 467–472 (2009).
DA Gunn, H Rexbye, CE Griffiths, PG Murray, A Fereday, SD Catt, CC Tomlin, BH Strongitharm, DI Perrett, M Catt, AE Mayes, AG Messenger, MR Green, F van der Ouderaa, JW Vaupel, K Christensen, Why some women look young for their age. PloS ONE. 4(12), e8021 (2009).
C Chen, A Dantcheva, A Ross, in Proceedings of 9th International Conference on Computer Vision Theory and Applications. Impact of facial cosmetics on automatic gender and age estimation algorithms (IEEELisbon, 2014), pp. 182–190.
G Guo, X Wang, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). A study on human age estimation under facial expression changes (IEEEProvidence, 2012), pp. 2547–2553.
DT Nguyen, SR Cho, KY Shin, JW Bang, KR Park, Comparative study of human age estimation with or without preclassification of gender and facial expression. Sci. World J.1–15 (2014).
M Minear, D Park, A lifespan database of adult facial stimuli. Behav. Res. Methods Instrum. Comput.36:, 630–633 (2004).
N Ebner, M Riediger, U Lindenberger, FACES—a database of facial expressions in young, middleaged, and older women and men: development and validation. Behav. Res. Methods. 42:, 351–362 (2010).
MC Voelkle, NC Ebner, Let me guess how old you are: effects of age, gender, and facial expression on perceptions of age. Psychol. Aging. 27:, 265–277 (2012).
L Farkas, Anthropometry of the Head and Face (Raven Press, New York, 1994).
K Bush, O Antonyshyn, 3-dimensional facial anthropometry using a laser-surface scanner: validation of the technique. Plast. Reconstr. Surg.98(2), 226–235 (1996).
J Kolar, E Salter, Craniofacial Anthropometry: Practical Measurement of the Head and Face for Clinical, Surgical and Research Use (Charles C. Thomas Publisher LTD, 1996).
N Ramanathan, R Chellappa, in Proceedings of IEEE Conference Computer Vision and Pattern Recognition. Modeling age progression in young faces (IEEENew York, 2006), pp. 384–394.
TF Cootes, CJ Taylor, DH Cooper, J Graham, Active shape models—their training and application. Comp. Vision Image Underst.61:, 38–59 (1995).
M Kass, A Witkin, D Terzopoulos, Snakes: active contour models. Int. J. Comput. Vis.1(321), 321–331 (1988).
N Duta, AK Jain, MP DubuissonJolly, Automatic construction of 2d shape model. IEEE Trans. Pattern Anal. Mach. Intell.23:, 433–446 (2001).
J Liu, JK Udupa, Oriented active shape models. IEEE Trans. Med. Imaging. 28(4), 571–584 (2009).
TF Cootes, GJ Edwards, CJ Taylor, Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell.23:, 681–685 (2001).
G Edwards, A Lanitis, C Taylor, T Cootes, Statistical models of face images—improving specificity. Image Vis. Comput.16:, 203–211 (1998).
A Lanitis, J Taylor, TF Cootes, Toward automatic simulation of aging effects on face images. IEEE Trans. Pattern Anal. Mach. Intell.24:, 442–455 (2002).
D Gabor, Theory of communication. J. Inst. Electr. Eng.93:, 429–457 (1946).
Y Fu, Y Xu, TS Huang, in Proceedings of IEEE Conference Multimedia and Expo. Estimating human ages by manifold analysis of face pictures and regression on aging features (IEEEBeijing, 2007), pp. 1383–1386.
K Scherbaum, M Sunkel, HP Seidel, V Blanz, Prediction of individual nonlinear aging trajectories of faces. Computer Graphics Forum. 26(3), 285–294 (2007).
SE Choi, YJ Lee, SJ Lee, KR Park, J Kim, Age estimation using hierarchical classifier based on global and local features. Pattern Recogn.44:, 1262–1281 (2011).
BS Manjunath, WY Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell.18:, 837–842 (1996).
L Huang, J Lu, YP Tan, Multimanifold metric learning for face recognition based on image sets. J. Vis. Commun. Image Represent.25(7), 1774–1783 (2014).
D Beymer, T Poggio, Image representations for visual learning. Science. 272:, 1905–1909 (1996).
J Hayashi, M Yasumoto, H Ito, H Koshimizu, in Proceedings of the 7th International Conference on Virtual Systems and Multimedia. A method for estimating and modeling age and gender using facial image processing (IEEEBerkeley, 2001).
J Hayashi, M Yasumoto, H Ito, Y Niwa, H Koshimizu, in Proceedings of SICE Annual Conference. Age and gender estimation from facial image processing (IEEEOsaka, 2002), pp. 13–18.
T Ojala, M Pietikainen, D Harwood, A comparative study of texture measures with classification based on featured distribution. Pattern Recogn.29:, 51–59 (1996).
PJ Phillips, H Moon, SA Rizvi, PJ Rauss, The FERET evaluation methodology for face recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell.22:, 1090–1104 (2000).
Z Yang, H Ai, in Proceedings of International Conference on Biometrics. Demographic classification with local binary patterns (SpringerSeoul, 2007), pp. 464–473.
F Gao, H Ai, in Proceedings of International Conference on Advances in Biometrics. Face age classification on consumer images with gabor feature and fuzzy lda method (SpringerAlghero, 2009), pp. 132–141.
M Riesenhuber, T Poggio, Hierarchical models of object recognition in cortex. Nature Neuroscience. 2:, 1019–1025 (1999).
T Serre, L Wolf, S Bilechi, M Riesenhuber, T Poggio, Robust object recognition with cortexlike mechanism. IEEE Trans. Pattern Anal. Mach. Intell.29:, 411–426 (2007).
G Guo, G Mu, Y Fu, TS Huang, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Human age estimation using bio inspired features (IEEEMiami, 2009), pp. 112–119.
S Yan, X Zhou, M Liu, M HasegawaJohnson, TS Huang, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Regression from patchkernel (IEEEAnchorage, 2008).
S Yan, M Liu, TS Huang, in Proceedings of IEEE Conference on Acoustics, Speech and Signal Processing. Extracting age information from local spatially flexible patches (IEEELas Vegas, 2008), pp. 737–740.
WB Horng, CP Lee, CW Chen, Classification of age groups based on facial features. Tamkang J. Sci. Eng.4:, 183–192 (2001).
H Takimoto, Y Mitsukura, M Fukumi, N Akamatsu, Robust gender and age estimation under varying facial poses. Electron. Commun. Jpn.91:, 32–40 (2008).
Y Kwon, N Lobo, Age classification from facial images. Comp. Vision Image Underst.74:, 1–21 (1999).
HG Jung, J Kim, Constructing a pedestrian recognition system with a public open database, without the necessity of retraining: an experimental study. Pattern. Anal. Applic.13:, 223–233 (2010).
RA Fisher, The statistical utilization of multiple measurements. Ann. Eugenics. 8:, 376–386 (1938).
K Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edn. (Academic Press, San Diego, 1990).
PN Belhumeour, JP Hespanda, DJ Kriegman, Eigenfaces vs fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell.19:, 711–720 (1997).
DL Swets, JJ Weng, Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell.18:, 71–86 (1996).
AM Martinez, AC Kak, PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell.23:, 228–233 (2001).
A Gunay, VV Nabiyev, in Proceedings of 23rd International Symposium of Computer and Information Sciences. Automatic age classification with LBP (IEEEIstanbul, 2008), pp. 1–4.
T Ojala, M Pietikainen, T Maenpaa, Multiresolution grayscale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell.24:, 971–987 (2002).
T Maenpaa, M Pietikainen, Texture Analysis with Local Binary Patterns: Handbook of Pattern Recognition and Computer Vision (World Scientific, 2005).
T Ahonen, A Hadid, M Pietikainen, in Proceedings of European Conference on Computer Vision. Face recognition with local binary patterns (SpringerPrague, 2004), pp. 469–481.
D Huang, C Shan, M Ardabilian, Y Wang, L Chen, Local binary patterns and its application to facial image analysis: a survey. IEEE Trans. Syst. Man Cybernet. Part C (Appl. Rev.)41(6), 765–781 (2011). https://doi.org/10.1109/TSMCC.2011.2118750.
T Ojala, M Pietikäinen, Mäenpäa, T̈, in Proceedings of 2nd ICAPR. A generalized local binary pattern operator for multiresolution gray scale and rotation invariant texture classification (SpringerRio de Janeiro, 2001), pp. 397–406.
T Jabid, MH Kabir, O Chae, in Proceedings of 2010 Digest of Technical Papers International Conference on Consumer Electronics (ICCE). Local directional pattern (LDP) for face recognition (IEEELas Vegas, 2010). https://doi.org/10.1109/ICCE.2010.5418801.
RA Kirsch, Computer determination of the constituent structure of biological images. Computers Biomed. Res.4(3), 315–328 (1971). https://doi.org/10.1016/00104809(71)900346.
JMS Prewitt, Object Enhancement and Extraction (Academic Press, New York, 1970).
I Sobel, F G, in Presented at the Stanford Artificial Intelligence Project (SAIL). A 3 X 3 isotropic gradient operator for image processing, (1968).
WK Pratt, Digital Image Processing (Wiley, New York, 1978).
SW Lee, Offline recognition of totally unconstrained handwritten numerals using multilayer cluster neural network. IEEE Trans. Pattern Anal. Mach. Intell.18(6), 648–652 (1996). https://doi.org/10.1109/34.506416.
T Jabid, MH Kabir, O Chae, in Proceedings of 2010 20th International Conference on Pattern Recognition. Gender classification using local directional pattern (LDP) (IEEEIstanbul, 2010). https://doi.org/10.1109/ICPR.2010.373.
X Tan, B Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process.19:, 1635–1650 (2010).
RC Gonzalez, RE Woods, Digital Image Processing, 3rd edn. (Pearson, 2008).
MH R, K Shanmugam, I Dinstein, Texture features for image classification. IEEE Trans. Syst. Man Cybernet.3:, 610–621 (1973).
M Unser, Sum and difference histograms for texture classification. IEEE Trans. Pattern Anal. Mach. Intell.8:, 118–125 (1986).
N Zulpe, V Pawar, GLCM texture features for brain tumor classification. Int. J. Comput. Sci. Issues. 3(3), 354–359 (2012).
FR Siquira, WR Schwatz, H Pedrini, Multiscale gray level cooccurrence matrices for texture description. Neurocomputing. 120:, 336–345 (2013).
LK Soh, C Tsatsoulis, Texture analysis of SAR sea ice imagery using grey level cooccurrence matrices. IEEE Trans. Geosci. Remote Sensing. 37:, 780–795 (1999).
A Edelman, TA Arias, ST Smith, The geometry of algorithms with orthogonality constrains. SIAM J. Matrix Anal. Appl.20:, 303–353 (1999).
T Wu, P Turaga, R Chellappa, Age estimation and face verification across aging using landmasks. IEEE Trans. Inf. Forensics Secur.7:, 1780–1788 (2012).
H Karcher, Riemannian center of mass and mollifier smoothing. Commun. Pur. Appl. Math.30:, 509–541 (1977).
T Serre, M Riesenhuber, Realistic modeling of simple and complex cell tuning in the hmax model and implications for invariant object recognition in cortex. Technical report, Massachusetts Institute of Tech Cambridge Computer Science Artificial Intelligence Lab DTIC Washington DC USA Tech. Rep. AIMEMO2004017 (2004).
B Ma, Y Su, F Jurie, Covariance descriptor based on bioinspired features for person reidentification and face verification. Image Vis. Comput.32:, 379–390 (2014).
Y Huang, K Huang, D Tao, T Tan, X Li, Enhanced biologically inspired model for object recognition. IEEE Trans. Syst. Man Cybern.41:, 1668–1680 (2011).
D Song, D Tao, Biologically inspired feature manifold for scene classification. IEEE Trans. Image Process.19:, 174–184 (2010).
GD J, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by twodimensional visual cortical filters. J.Optic. Soc. Am.2:, 1160–1169 (1985).
S Marcelja, Mathematical description of the responses of simple cortical cells. J. Optic. Soc. Am.70:, 1297–1300 (1980).
G Guo, G Mu, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Simultaneous dimensionality reduction and human age estimation via kernel partial least squares regression (IEEEProvidence, 2011), pp. 657–664.
G Guo, G Mu, in Proceedings of IEEE Conference on Face and Gesture Recognition. Joint estimation of age, gender and ethnicity:cca vs pls (IEEEShanghai, 2013), pp. 1–6.
B Cai, X Xu, X Xing, BIT: Biologically inspired tracker. IEEE Trans. Image Process.25:, 1327–1339 (2016).
M Haghighat, S Zonouz, M AbdelMottaleb, in Proceedings of Computer Analysis of Images and Patterns Conference. Identification using encrypted biometrics (SpringerYork, 2013), pp. 440–448.
MR Turner, Texture discrimination by gabor functions. Biol. Cybernet.55:, 71–82 (1986).
L Shen, L Bai, A review on gabor wavelets for face recognition. Pattern Anal. Appl.9:, 273–292 (2006).
JG Daugman, Spatial visual channels in the Fourier plane. Vis. Res. 24:, 891–910 (1984).
I Lampl, D Ferster, T Poggio, M Riesenhuber, Intracellular measurements of spatial integration and the max operation in complex cells of the cat primary visual cortex. J. Neurophysiol.92:, 2704–2713 (2004).
T Serre, L Wolf, T Poggio (eds.), Object Recognition with Features Inspired by Visual Cortex (IEEE, San Diego, 2005).
J Mutch, D Lowe, Object class recognition and localization using sparse features with limited receptive fields. Int. J. Comput. Vis.80:, 45–57 (2008).
VN Vapnik, An overview of statistical learning theory. IEEE Trans. Neural Netw.10(5), 988–999 (1999). https://doi.org/10.1109/72.788640.
K Ueki, T Hayashida, T Kobayashi, in Proceedings of IEEE Conference on Automatic Face and Gesture Recognition. Subspacebased age group classification using facial images under various lighting conditions (IEEESouthampton, 2006), pp. 43–48.
I Huerta, C Fernandez, C Segura, J Hernando, A Prati, A deep analysis on age estimation. Pattern Recognit. Lett.68:, 239–249 (2015).
H Bay, T Tuytelaars, LV Gool, Surf: Speeded up robust features. Comput. Vis.ECCV. 3951:, 404–417 (2006).
B Triggs, N Dalal, in Proceedings of IEEE on Compter Vision and Pattern Recognition. Histograms of oriented gradients for human detection (IEEESan Diego, 2005), pp. 886–893.
Z Hu, Y Wen, J Wang, M Wang, R Hong, S Yan, Facial age estimation with age difference. IEEE Trans. Image Process.1–13 (2016).
G Guo, Y Fu, TS Huang, C Dyer, in Proceedings of IEEE Workshop on Applications of Computer Vision. Locally adjusted robust regression for human age estimation (IEEECopper Mountain, 2008).
S Yan, H Wang, X Tang, TS Huang, in Proceedings of IEEE Conference on Computer Vision. Learning autostructured regressor from uncertain nonnegative labels, (2007).
DT Nguyen, SR Cho, KR Park, Age estimationbased soft biometrics considering optical blurring based on symmetrical subblocks for mlbp. Symmetry, 1882–1913 (2015).
J Lu, Y Tan, Ordinary preserving manifold analysis for human age and head pose estimation. IEEE Trans. Hum. Mach. Syst. 43:, 249–258 (2013).
OFW Onifade, DJ Akinyemi, A groupwise age ranking framework for human age estimation. Int. J. Image Graphics Signal Process.5:, 1–12 (2015).
JD Akinyemi, OFW Onifade, in Proceedings of IEEE Symposium on Technologies for Homeland Security. An ethnicspecific age group ranking approach to facial age estimation using raw pixel features (IEEEWaltham, 2016), pp. 1–6.
G Guo, y Fu, TS Huang, C Dyer, in Proceedings of IEEE Conference on Computer Vision and Pattern RecognitionSemantic Learning and Applications in Multimedia Workshop. A probabilistic fusion approach to human age prediction, (2008), pp. 1–6.
A Gunay, VV Nabiyev, Facial age estimation based on decision level fusion of amm, lbp and gabor features. Int. J. Adv. Comput. Sci. Appl.6:, 19–26 (2015).
H Han, XO CLiu, AK Jain, Demographic estimation from face images: human vs machine performance. IEEE Trans. Pattern Anal. Mach. Intell.37:, 1148–1161 (2015).
K Ricanek, T Tesafaye, in Proceedings of IEEE 7th International Conference on Automatic Face and Gesture Recognition. Morph: A longitudinal image database of normal adult ageprogression (IEEESouthampton, 2006), pp. 341–345.
P Viola, M Jones, Robust realtime object detection. Int. J. Comput. Vis.57(2), 137–154 (2004).
Y Fu, N Zheng, Mface: An appearancebased photorealistic model for multiple facial attributes rendering. IEEE Trans. Circ. Syst. Video Technol.16:, 830–842 (2006).
DM Burt, DI Perrett, in Proceedings of Royal Society of London Series B Biological Sciences. Perception of age in adult Caucasian male faces: computer graphic manipulation of shape and colour information, (1995), pp. 137–143.
J Suo, T Wu, S Zhu, S Shan, X Chen, W Gao, in Proceedings of IEEE Conference in Automatic Face and Gesture Recognition. Design sparse features for age estimation using hierarchical face model (IEEEAmsterdam, 2008).
A Bastanfard, MA Nik, MM Dehshibi, in Proceedings of International Conference on Machine Vision. Iranian face database with age, pose and expression (IEEEIslamabad, 2007), pp. 50–55.
B Ni, Z Song, S Yan, in Proceedings of ACM International Conference on Multimedia. Web image mining towards universal age estimator (ACM PressBeijing, 2009), pp. 85–94.
B Ni, Z Song, S Yan, Web image and video mining towards universal and robust age estimator. IEEE Trans. Multimedia. 13:, 1217–1229 (2011).
SP Kyaw, JG Wang, EK Teoh, in Proceedings of IEEE International Conference on Information, Communication and Signal Processing. Web image mining for facial age estimation (IEEETainan, 2013).
V Blanz, T Vetter, in Proceedings of ACM Conference SIGGRAPH. A morphable model for the synthesis of 3D faces (ACM PressNew York, 1999), pp. 187–194.
M Budka, B Gabrys, Densitypreserving sampling: robust and efficient alternative to crossvalidation for error estimation. IEEE Trans. Neural Netw. Learn. Syst.24:, 22–34 (2013).
S Weiss, C Kulikowski, Computer Systems that Learn (Morgan Kaulmann, 1991).
Y Bengio, Y Grandvalet, No unbiased estimator of the variance of kfold cross validation. J. Mach. Learn. Res.5:, 1089–1105 (2004).
A Jain, R Duin, J Mao, Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell.22:, 4–37 (2000).
RR Picard, RD Cook, Cross validation of regression models. J. Am. Stat. Assoc.79:, 575–583 (1984).
R Kohavi, in Proceedings of International Joint Conference on Artificial Intelligence. A study of crossvalidation and bootstrap for accuracy estimation and model selection (Morgan Kaufmann Publishers Inc.San Francisco, 1995), pp. 1137–1143.
O Kiline, I Uysal, in Proceedings of 14th International Conference on Machine Learning and Applications. Sourceaware partitioning for robust crossvalidation (IEEEMiami, 2015).
R Duda, P Hart, D Stork, Pattern Classification, 2nd edn. (Wiley, Menlo Park, 2001).
M Stone, Crossvalidatory choice and assessment of statistical predictions. J. R. Stat. Soc. Series B (Methodological). 36:, 111–147 (1974).
T Hastic, R Tibshirani, J Friedman, J Franklin, The elements of statistical learning: data mining inferences and prediction. Math. Intell.27:, 83–85 (2005).
P Rafaeilzadeh, L Tang, H Liu, Crossvalidation. Encycl. Database Syst., 532–538 (2009).
S Geisser, The predictive sample reuse method with applications. J. Am. Stat. Assoc.70:, 320–328 (1975).
DM Allen, The relationship between variable selection and data augmentation and a method for prediction. Technometrics. 16:, 125–127 (1974).
B Efron, Estimating the error rate of a prediction rule: Improvement on crossvalidation. J. Am. Stat. Assoc.78:, 316–331 (1983).
J Shao, Linear model selection by crossvalidation. J. Am. Stat. Assoc.88:, 486–494 (1993).
A Elisseeff, M Pontil, LeaveOneOut Error and Stability of Learning Algorithms with Applications (IOS Press, 2003).
A Celisse, S Robin, Nonparametric density estimation by exact leavepout crossvalidation. Comput. Stat. Data Anal.52:, 2350–2368 (2008).
S Arlot, A survey of crossvalidation procedures for model selection. Stat. Surv.4:, 40–79 (2010).
B Efron, R Tibshirani, Bootstrap methods for standard errors, confidence intervals and other measures of statistical accuracy. Stat. Sci.1:, 54–77 (1986).
B Efron, R Tibshirani, An Introduction to the Bootstrap (Chapman & Hall, New York, 1993).
S Salzberg, On comparing classifiers: pitfalls to avoid and a recommended approach. Data Mining Knowl. Discov.1:, 317–328 (1997).
N Ramanathan, R Chellapa, W Biswas, Age progression in human faces: a survey. J. Vis. Lang. Comput.15:, 3349–3361 (2009).
P Thukral, K Mitra, R Chellappa, in Proceedings of IEEE International Conference on Acoustic, Speech and Signal Processing. A hierarchical approach for human age estimation (IEEEKyoto, 2012), pp. 1529–1532.
LG Farkas, MJ Katic, CR Forrest, International anthropometric study of facial morphology in various ethnic groups/races. J. Craniofacila Surg., 615–646 (2005).
R Tiwari, A Shukla, C Prakash, D Sharma, R Kumar, S Sharma, in Proceedings of IEEE International Advance Computing Conference. Face recognition using morphological methods (IEEEPatiala, 2009), pp. 529–534.
MA Hajizadeh, H Ebrahimnezhad, in Proceedings of 7th Iranian Machine Vision and Image Processing Conference. Classification of age groups from facial image using histogram of oriented gradient (IEEETehran, 2011), pp. 1–5.
KH Liu, S Yan, JCC Kuo, in Proceeding of IEEE Winter Conference on Applications of Computer Vision (WACV). Age group classification via structured fusion of uncertaintydriven shape features and selected surface features (IEEESteamboat Springs, 2014), pp. 445–452.
H Ling, S Soatto, N Ramanathan, D Jacobs, Face verification across age progression using discriminative methods. IEEE Trans. Inf. Forensics Secur.5:, 82–91 (2010).
KH Liu, S Yan, JCC Kuo, Age estimation via grouping and decision fusion. IEEE Trans. Inf. Forensics Secur.10:, 2408–2423 (2015).
CC Chang, CJ Lin, Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol.2:, 27 (2011).
P Sai, J Wang, E Teoh, Facial age range estimation with extreme learning machines. Neurocomputing. 149:, 364–372 (2015).
GB Huang, QY Zhu, CK Siew, Extreme learning machines: theory and applications. Neurocomputing. 70:, 489–501 (2006).
J Wang, W Yau, HL Wang, in Proceedings of IEEE Applications of Computer Vision (WACV). Age categorization via ecoc with fused gabor and lbp features, (2009), pp. 313–318.
G Guo, G Mu, Y Fu, C Dyer, TS Huang, in Proceedings of IEEE Conference on Computer Vision. A study on automatic age estimation using a large database (IEEEKyoto, 2009).
S Yan, H Wang, Y Fu, J Yan, X Tang, TS Huang, Synchronized submanifold embedding for person independent pose estimation and beyond. IEEE Trans. Image Process.18:, 202–210 (2009).
J Suo, F Min, S Zhu, S Shan, X Chen, in Proceedings of IEEE Conference Computer Vision and Pattern Recognition. A multiresolution dynamic model for face aging simulation (IEEEMinneapolis, 2007).
W Chao, J Liu, J Ding, Facial age estimation based on labelsensitive learning and ageoriented regression. Pattern Recogn.46:, 628–641 (2013).
KY Chang, CS Chen, YP Hung, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Ordinal hyperplane ranker with cost sensitivities for age estimation (IEEEProvidence, 2011), pp. 585–592.
K Luu, TD Bui, CY Suen, in Proceedings of IEEE Conference on Automatic Face & Gesture Recognition and Workshops. Kernel spectral regression of perceived age from hybrid facial features (IEEESanta Barbara, 2011).
K Luu, K Seshadri, M Savvides, T Bui, C Suen, in Proceedings of IJCB. Contourlet appearance model for facial age estimation (IEEEWashington, DC, 2011), pp. 1–8.
CT Lin, DL Li, JH Lai, MF Han, JY Chang, Automatic age estimation system for face images. Int. J. Adv. Robot. Syst.9:, 1–9 (2012).
AK Choober, Improving automatic age estimation algorithms using an efficient ensemble technique. Int. J. Mach. Learn. Comput.2:, 118–122 (2012).
A Hadid, M Pietikanen, Demographic classification from face videos using manifold learning. Neuracomputing. 100:, 197–205 (2013).
X Geng, C Yin, ZH Zhou, Facial age estimation by learning from label distribution. IEEE Trans. Pattern Anal. Mach. Intell.35:, 2401–2412 (2013).
AS Spizhevoi, AV Bovyrin, Estimating human age using bioinspired features and the ranking method. Pattern Recognit. Image Anal.25:, 547–552 (2015).
H Han, AKO CJain, in Proceedings of IAPR International Conference on Biometrics. Age estimation from face images: human vs. machine performance (IEEEMadrid, 2013).
Y Sun, X Wang, X Tang, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Deep convolutional network cascade for facial point detection (IEEEPortland, 2013), pp. 3476–3483.
Y Taigman, M Yang, M Ranzato, L Wolf, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Deepface: closing the gap to human level performance in face verification (Columbus, 2014), pp. 1701–1708.
M Yang, S Zhu, F Lv, K Yu, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Correspondence driven adaptation for human profile recognition (IEEEColorado Springs, 2011), pp. 505–512.
X Wang, R Guo, C Kambhamettu, in Proceedings of IEEE Winter Conference on Applications of Computer Vision. Deeplylearned feature for age estimation (IEEEWaikoloa, 2015), pp. 534–541.
Z Niu, M Zhou, L Wang, X Gao, G Hua, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Ordinal regression with multiple output cnn for age estimation (IEEELas Vegas, 2016), pp. 4920–4928.
JC Chen, A Kumar, R Ranjan, VM Patel, A Alavi, R Chellappa, in Proceedings of IEEE Conference on Biometrics, Theory, Applications and Systems. A cascaded convolutional neural network for age estimation of unconstrained faces (IEEENiagara Falls, 2016).
Z Huo, X Yang, C Xing, Y Zhou, P Hou, J Lv, X Geng, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Deep age distribution learning for apparent age estimation (IEEELas Vegas, 2016), pp. 722–729.
RC Malli, M Aygun, HK Ekenel, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Apparent age estimation using ensemble of deep learning models (IEEELas Vegas, 2016), pp. 714–721.
B Hebda, T Kryjak, in Proceedings of IEEE Conference on Computer Science and Information Systems. A compact deep convolutional neural network architecture for video based age and gender estimation (IEEEGdansk, 2016), pp. 787–790.
T Todd, SM Leonard, RE Shaw, JB Pittenger, The perception of human growth. Sci. Am.242(2), 132–144 (1980).
S k. Kim, H Lee, S Yu, S Lee, in Proceedings of 2006 1ST IEEE Conference on Industrial Electronics and Applications. Robust face recognition by fusion of visual and infrared cues (IEEESingapore, 2006), pp. 1–5. https://doi.org/10.1109/ICIEA.2006.257072.
T Kanno, M Akiba, Y Teramachi, H Nagahashi, T Agui, Classification of agegroup based on facial images of young males using neural networks. IEICE Trans. Inf. Syst.E84D:, 1094–1101 (2001).
R Iga, K Izumi, H Hayashi, G Fukano, T Ohtani, in Proceedings of SICE Annual Conference. A gender and age estimation system from face images (IEEEFukui, 2003), pp. 756–761.
SK Zhou, B Georgescu, XS Zhou, D Comaniciu, in Proceedings of IEEE International Conference on Computer Vision. Image based regression using boosting method (IEEEBeijing, 2005), pp. 541–548.
H Takimoto, Y Mitsukura, M Fukumi, N Akamatsu, in Proceedings of ICASE International Joint Conference. A design of gender and age estimation system based on facial knowledge (IEEEBusan, 2006), pp. 3883–3886.
H Takimoto, T Kuwano, Y Mitsukura, H Fukai, M Fukumi, in Proceedings of SICE Annual Conference. Appearanceage feature extraction from facial image based on age perception (IEEETakamatsu, 2007), pp. 2813–2818.
S Yan, H Wang, TS Huang, X Tang, in Proceedings of IEEE Conference on Multimedia and Expo. Ranking with uncertain labels (IEEEBeijing, 2007), pp. 96–99.
X Zhuang, X Zhou, M HasegawaJohnson, TS Huang, in Proceedings of International Conference on Pattern Recognition. Face age estimation using patchbased hidden markov model supervectors (IEEETampa, 2008).
B Xiao, X Yang, Y Xu, in Proceedings of ACM International Conference on Multimedia. Learning distance metric for regression by semidefinite programming with application to human age estimation (ACM PressBeijing, 2009).
H Han, AK Jain, Age, gender and race estimation from unconstrained face images. Technical report MSUCSE145 (Michigan State University, 2014).
C Li, Q Liu, W Dong, X Zhu, J Liu, H Lu, Human age estimation based on locality and ordinal information. IEEE Trans. Cybernet.45:, 2522–2534 (2015).
X Yang, BB Gao, C Xing, et al, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Deep label distribution learning for apparent age estimation (IEEESantiago, 2015), pp. 344–350.
Contributions
All authors had equal contribution towards this work. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Angulu, R., Tapamo, J.R. & Adewumi, A.O. Age estimation via face images: a survey. J Image Video Proc. 2018, 42 (2018). https://doi.org/10.1186/s13640-018-0278-6
DOI: https://doi.org/10.1186/s13640-018-0278-6
Keywords
 Age estimation
 Face
 Anthropometry
 Models
 Feature
 Classification