Age estimation algorithm of facial images based on multilabel sorting
EURASIP Journal on Image and Video Processing volume 2018, Article number: 114 (2018)
Abstract
Multilabel sorting learning has been successful in many fields. It can express the complex semantic information of learning objects and generalizes well on complex problems. This paper proposes an age estimation algorithm for facial images based on multilabel sorting. The algorithm targets the shortage of facial age data: it replaces the traditional multiclass classification approach, simplifies the otherwise tedious steps of age estimation, and shortens model training time. A series of experiments on two age datasets shows that the algorithm performs well on three evaluation indicators: MAE (mean absolute error), CS (cumulative score), and convergence rate. Comparison with several classic age estimation algorithms verifies its efficiency and accuracy.
Introduction
In recent years, multilabel sorting learning has been widely used in document classification, image recognition, gene function prediction, and other research fields. However, it has seen relatively little use in age estimation from facial images. In age estimation datasets, each facial image is usually annotated with a single exact age value, and this simple annotation scheme has several problems. The most important is that a single exact age is an unreliable and unstable representation of the true age of a face: facial appearance changes slowly with age, and images of similar ages differ only slightly, so neighboring ages are easily confused during classification. In addition, how to build a good age estimation model from a limited facial dataset has long been a question of interest in facial age estimation research, and it remains unresolved because facial age datasets contain very few samples.
Researchers have proposed a number of classical algorithms for face age estimation. Geng et al. proposed the Aging Patterns Subspace (AGES) algorithm [1], which models the aging pattern of the human face by constructing a representative subspace, but its modeling time is long. Guo et al. proposed support vector machine (SVM) [2, 3] and support vector regression (SVR) [4, 5] algorithms; these are simple and robust but consume a lot of memory and computing time. Hong et al. proposed a k-nearest neighbor (kNN) sorting algorithm [6], which is simple and accurate but computationally expensive. In addition, Li et al. proposed the Ordinal Hyperplanes Ranker (OHRank) algorithm [7], Geng et al. proposed the Improved Iterative Scaling-Learning from Label Distribution (IIS-LLD) algorithm [8], and Yin et al. proposed the Conditional Probability Neural Network (CPNN) algorithm [9]. IIS-LLD and CPNN are both label distribution (LD) algorithms [10]. All of the algorithms in [7,8,9] can be used for face age recognition, but they require a large number of training samples, and the number of training samples directly affects recognition accuracy. Liao et al. proposed a face age feature extraction method based on a deep convolutional neural network [11]; it is discriminative and robust, but the network is more complex to implement and the model takes longer to train.
This paper proposes a face age estimation algorithm based on multilabel sorting that addresses both insufficient training samples and long model training time. It also discriminates well across races and genders. Compared with other classical algorithms, it achieves better face age recognition results.
A review of multilabel sorting learning
Overview of multilabel sorting learning
Multilabel sorting learning can naturally express the complex semantic information of learning objects and has been successful in many fields. It was originally applied to text processing, where document classification often produces wrong labels because of word ambiguity. To address this challenge, Schapire and Singer [12] improved the AdaBoost algorithm and achieved significant results with BoosTexter, a Boosting-based multilabel document classification system. After that, multilabel learning attracted growing attention and has appeared in many research fields such as multimedia content identification, bioinformatics, and information retrieval.
Learning a good classifier requires a lot of training data. In practice, however, some labels have very few samples, so a highly accurate classification model cannot be trained for them. Traditional classification learning treats the labels of different categories as independent, with almost no correlation between them. Multilabel learning instead assumes that labels may have intrinsic interrelationships that are useful for model training. Making full use of this information between class labels can effectively reduce the difficulty of the learning task when training data are limited.
Theoretical content of multilabel sorting learning
The key to multilabel learning is to learn the possible relationships [13] between the output labels of the training samples and to use these relationships to optimize the mathematical model and improve its accuracy. First, the training samples need a mathematical representation. Let X = R^{d} denote the d-dimensional feature space in which each training sample lies, and let Y = {y_{1}, y_{2}, ⋯, y_{m}} denote the set of all category labels. The training set is D = {(x_{1}, Y_{1}), (x_{2}, Y_{2}), ⋯, (x_{n}, Y_{n})}, in which each sample carries multiple labels; x_{i} = [x_{i1}, x_{i2}, ⋯, x_{id}]^{T} is the feature vector of the ith training sample, and Y_{i} ⊂ Y is the set of categories annotated for the ith sample, a subset of the label set. For model training, every sample must have a label representation of the same length, so the label vector y_{i} = [y_{i1}, y_{i2}, ⋯, y_{im}]^{T} with y_{ij} ∈ {−1, +1} is used: if y_{ij} = +1, sample x_{i} is marked as category y_{j}; otherwise, it is not. The purpose of multilabel learning is to obtain, from the training set, a mapping from input to output, g : X → 2^{Y}. In practice this is usually converted into finding an optimal real-valued function f : X × Y → R. The value f(x, y) indicates how likely sample x is to be marked as y: the greater f(x, y), the greater the probability that x belongs to class y. Training therefore aims to give each sample high confidence on its relevant labels and low confidence on its irrelevant labels.
After learning the mapping function, an evaluation indicator is needed to measure its performance. Because each sample has multiple labels and the correlation between labels is learned jointly, multilabel sorting learning adopts the ranking loss (RL) [14] as the standard measure of model quality. It is calculated as shown in Eq. (1):
where Y_{i} and \( {\overline{Y}}_i \) are the relevant and irrelevant label sets of x_{i}, respectively. The indicator measures the proportion of label pairs that are ordered incorrectly, that is, pairs in which an irrelevant label is ranked ahead of a relevant one. The smaller the value of RL, the better. When RL is 0, every relevant label is ranked ahead of every irrelevant label on all samples.
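As an illustration of the ranking loss described above, the following is a minimal sketch. Because Eq. (1) is not reproduced in this text, it assumes the usual definition, \( RL = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{|Y_i||\overline{Y}_i|} |\{(y_j, y_k) \in Y_i \times \overline{Y}_i : f(x_i, y_j) \le f(x_i, y_k)\}| \). The function name `ranking_loss` and the choice to skip samples whose relevant or irrelevant set is empty are assumptions made for this example.

```python
import numpy as np

def ranking_loss(scores, Y):
    """Average fraction of (relevant, irrelevant) label pairs ranked wrongly.

    scores: n x m array, scores[i, j] = f(x_i, y_j)
    Y:      n x m array with +1 for relevant and -1 for irrelevant labels
    """
    scores = np.asarray(scores, dtype=float)
    Y = np.asarray(Y)
    total, counted = 0.0, 0
    for s, y in zip(scores, Y):
        rel = s[y == 1]    # scores of relevant labels
        irr = s[y == -1]   # scores of irrelevant labels
        if rel.size == 0 or irr.size == 0:
            continue  # assumption: the loss is undefined here, so skip the sample
        # a pair is wrong when the relevant label does not score strictly higher
        wrong = (rel[:, None] <= irr[None, :]).sum()
        total += wrong / (rel.size * irr.size)
        counted += 1
    return total / counted if counted else 0.0

scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.4]]
Y = [[1, -1, -1], [1, -1, 1]]
rl = ranking_loss(scores, Y)  # first sample is ranked perfectly, second is not
```

In this toy input the first sample contributes 0 and the second contributes 1, so the average is 0.5.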
Method: age estimation model based on multilabel sorting
The primary difficulty in age estimation from face images is the lack of training samples. To obtain better results, each face sample is represented with multiple labels instead of the original single label, and sorting learning is carried out on the correlation between age labels and the face image. In this way, an age estimation algorithm based on multilabel sorting learning can establish the mapping between face images and ages well.
Multilabel sorting learning is an effective way to alleviate the inaccurate age estimates caused by the particular nature of the age estimation problem and by insufficient training data. The algorithm first re-expresses each facial image, which originally carries a single age label, as a vector whose elements represent the degree of correlation between the image and the corresponding age labels. It then sorts the ages in ascending order and makes full use of the ordinal information between age labels [15, 16] to integrate the estimation models of all ages into one model matrix. The age estimation model is implemented through this matrix rather than through multiple binary classifiers. During model building, the trace norm [17] of the model matrix is introduced to control model complexity, and matrix recovery theory is used to obtain the optimal solution [18, 19]. At test time, the model predicts a correlation vector for the face sample, the elements of the vector are sorted in descending order, and the age label with the largest correlation is selected as the estimated age, since a larger value indicates higher relevance between the face image and the age label. This model makes full use of the limited face age dataset by introducing multilabel sorting, and because it operates on matrices it is simple and shortens model training time [20].
Based on multilabel face samples
To learn the age estimation model with multilabel sorting, each face image is marked with multiple age labels. Let X = [x_{1}, x_{2}, ⋯, x_{n}] ∈ R^{d × n} be the inputs of the training set, let T = {t_{1}, t_{2}, ⋯, t_{m}} be the set of all age labels with order relation t_{1} ≺ t_{2} ≺ ⋯ ≺ t_{m}, and let Y = [y_{1}, y_{2}, ⋯, y_{n}] ∈ {0, 1}^{m × n} be the age labeling of the training set. If sample x_{i} is marked with age label t_{j}, the two are related and the element y_{ij} of the label vector y_{i} is 1; otherwise they are unrelated and y_{ij} = 0. With this representation, multiple age labels replace the single age label in the age dataset.
Each single-label sample is converted to a multilabel sample, so that each face image corresponds to a label vector. Traditionally, face age estimation is converted into a multiclass classification problem on individual age labels, and positive and negative samples are then divided for each age value to construct binary classifiers. With the multilabel representation, simple and effective matrix operations both implement age estimation by multilabel learning and learn the relationship between ages.
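The text does not spell out which additional age labels are attached to each image. The sketch below shows one possible conversion, under the assumption that every age within a fixed distance of the annotated age is marked as related. The function name `ages_to_multilabel` and the parameter `radius` are hypothetical and are not taken from the paper.

```python
import numpy as np

def ages_to_multilabel(ages, age_labels, radius=1):
    """Build the m x n label matrix Y in {0, 1} from single age annotations.

    ages:       length-n sequence of annotated ages, one per face image
    age_labels: length-m sequence t_1 < t_2 < ... < t_m of all age labels
    radius:     hypothetical neighborhood size; radius=0 gives a one-hot label
    """
    ages = np.asarray(ages)
    age_labels = np.asarray(age_labels)
    # Y[j, i] = 1 when age label t_j lies within `radius` years of sample i's age
    Y = (np.abs(age_labels[:, None] - ages[None, :]) <= radius).astype(int)
    return Y

Y = ages_to_multilabel(ages=[2, 0], age_labels=range(5), radius=1)
# column 0 marks ages 1, 2, 3; column 1 marks ages 0, 1
```

Each column of `Y` is the label vector y_{i} of one face image, so the whole training set is described by a single matrix, as in the notation above.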
The establishment of age estimation model
After the multilabel representation of face images, all age labels are integrated in ascending order, and the traditional face age estimation problem based on multiclass classification is converted into the learning of an age matrix. To control the complexity of the age estimation model, a matrix norm of the model is introduced, and matrix estimation theory is used to solve for the age estimation function. The overall process of the model is shown in Fig. 1.
First, the relevant functions are described mathematically before the age estimation function is established. Let ℓ(z) be a loss function, and let the prediction function f_{i}(x) give the confidence that the age of a test sample is t_{i}; the confidence value generally lies between 0 and 1. The learned model should give a higher score to the age labels that mark the sample. To measure whether the age predicted by the estimation function is accurate, a pairwise loss is used to calculate the sorting loss of face sample x_{i}, marked with vector y_{i}, between ages t_{j} and t_{k}. It is calculated as in Eq. (2):
Here I(z) is the indicator function, which takes the value 1 when z is true and 0 when z is false. The function states that if the age label of face sample x_{i} is t_{j} and not t_{k}, that is, y_{ij} = 1 and y_{ik} = 0, then the prediction functions should satisfy f_{j}(x_{i}) > f_{k}(x_{i}). The smaller the age difference between t_{j} and t_{k}, the closer the predicted correlations and the smaller the sorting loss between them; conversely, the larger the age gap, the larger the difference in predicted correlation and the more likely a large sorting loss. This matches the requirement that the correct age label be ranked ahead of incorrect age labels in face age estimation. When the age label of face sample x_{i} is neither t_{j} nor t_{k}, there is no sorting loss between t_{j} and t_{k}. The sorting loss of a single face image sample over all age labels is then given by Eq. (3):
The sorting loss over the entire training set is obtained by summing the loss of each face sample, \( \sum \limits_{i=1}^n \varepsilon \left({x}_i,{y}_i\right) \). To simplify the calculation, the prediction function is restricted to a linear function \( {f}_i(x)={w}_i^Tx \). The prediction functions of all age labels are combined into a parameter matrix W = [w_{1}, w_{2}, ⋯, w_{m}] ∈ R^{d × m}, which is the parameter to be learned. Let f(W) denote the sorting loss over the entire training set; following the multilabel sorting theory above, f(W) is defined as shown in Eq. (4):
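The sorting loss over the training set can be sketched in a few lines. Since Eqs. (2) to (4) are not reproduced in this text, the form used here is an assumption inferred from the coefficient \( \alpha_{jk}^i \) defined in the optimization section: \( f(W) = \sum_i \sum_{j,k} I(y_{ij} \ne y_{ik})\, \ell((y_{ij} - y_{ik})\, x_i^T (w_j - w_k)) \) with the logistic loss ℓ(z) = log(1 + e^{−z}). The sum runs over ordered pairs (j, k), which is also an assumption, and the function names are illustrative.

```python
import numpy as np

def logistic_loss(z):
    # l(z) = log(1 + exp(-z)), evaluated in a numerically stable way
    return np.logaddexp(0.0, -z)

def sorting_loss(W, X, Y):
    """Pairwise sorting loss f(W) under the assumed form stated above.

    W: d x m parameter matrix, one column w_j per age label
    X: d x n feature matrix, one column x_i per face sample
    Y: m x n label matrix in {0, 1}
    """
    scores = W.T @ X  # scores[j, i] = w_j^T x_i
    total = 0.0
    for i in range(X.shape[1]):
        y = Y[:, i].astype(float)
        s = scores[:, i]
        dy = y[:, None] - y[None, :]   # y_ij - y_ik for every ordered pair
        ds = s[:, None] - s[None, :]   # x_i^T (w_j - w_k)
        mask = dy != 0                 # indicator I(y_ij != y_ik)
        total += logistic_loss(dy[mask] * ds[mask]).sum()
    return total

X = np.array([[1.0], [0.0]])             # one sample, d = 2
Y = np.array([[1], [0], [0]])            # m = 3, only the first age is related
W_zero = np.zeros((2, 3))
W_good = np.array([[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
loss_zero = sorting_loss(W_zero, X, Y)   # every discordant pair costs log 2
loss_good = sorting_loss(W_good, X, Y)   # related age scores higher, smaller loss
```

With all-zero parameters each of the four discordant ordered pairs contributes log 2, and a matrix that scores the related age higher gives a smaller loss.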
When training samples are severely lacking, directly searching for the W that minimizes the sorting loss f(W) easily leads to overfitting. Long-term studies of age estimation have found that facial aging is a slow process and that age labels are correlated in an orderly way. To take full advantage of this correlation, we assume that the prediction functions in W are linearly dependent, which makes the matrix W low rank. The final optimization problem can be expressed as Eq. (5):
Here ‖·‖_{2} denotes the matrix spectral norm and ψ denotes the feasible domain of W, which is defined by the constraint that controls the complexity of the prediction model and by the low-rank property of the matrix. Because the domain is nonconvex, solving Eq. (5) directly is computationally very expensive. For convenience, the inequality in Eq. (6) is introduced:
Here ‖·‖_{∗} denotes the matrix trace norm. Using inequality (6), the nonconvex term is replaced as in Eq. (7):
The original optimization problem thus becomes that of solving \( \min_{W\in \psi } f(W) \).
To further simplify the model, the constraint on the domain of the matrix is moved into the objective as a regularization term, and the final age estimation objective function is the optimization problem shown in Eq. (8):
Here the regularization parameter satisfies λ > 0. It balances the regularization term against the loss on the training set and prevents the objective function from overfitting.
Optimal solution
The multilabel sorting objective function established above is convex, so it can be minimized with a gradient descent algorithm. To simplify the calculation, the logistic loss [21] ℓ(z) = log(1 + e^{−z}) is used as the loss function in Eq. (2). At iteration t, let W_{t} be the current solution of Eq. (8). The gradient of the objective function F(W) at W = W_{t}, denoted ∇F(W_{t}), is calculated first, and the solution is then updated as shown in Eq. (9):
Here η_{t} > 0 is the step size at the tth iteration. Let \( {W}_t={U}_t{\Sigma}_t{V}_t^T \) be the singular value decomposition (SVD) of W_{t}. Since \( {U}_t{V}_t^T \) is a subgradient of ‖W‖_{∗} at W = W_{t}, the gradient is given by Eq. (10):
where \( {\alpha}_{jk}^i=I\left({y}_{ij}\ne {y}_{ik}\right){\ell}^{\prime}\left(\left({y}_{ij}-{y}_{ik}\right){x}_i^T\left({w}_j-{w}_k\right)\right) \) and \( {e}_j^m \) is an m-dimensional vector whose jth element is 1 and whose other elements are 0.
Because SVD is expensive to compute, evaluating the gradient in Eq. (10) at every iteration incurs a large computational cost. For smooth objective functions, gradient-based methods are known to reach a convergence rate of O(T^{−2}). More recent work has found that a similar rate holds when the objective consists of a smooth term plus a trace norm regularization term, in which case it can be solved with an accelerated scheme. Therefore, this paper uses the accelerated proximal gradient (APG) algorithm [22] to solve the optimization problem of the objective in Eq. (9). According to the APG algorithm, the update is given by Eq. (11):
Here \( {W}_t^{\prime }={W}_{t-1}-{\eta}_t\nabla f\left({W}_{t-1}\right) \), and the optimal solution of Eq. (11) is obtained from the SVD of \( {W}_t^{\prime } \), as shown in Eq. (12):
In Eq. (12), Σ_{λη_{t}} is a diagonal matrix with entries (Σ_{λη_{t}})_{ij} = max {0, Σ_{ij} − λη_{t}}.
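The shrinkage of singular values described for Eq. (12) is commonly called singular value thresholding. The sketch below shows that one step with NumPy; the function name `svt` and the variable names are illustrative, and the gradient step that produces the input matrix is assumed to have been computed elsewhere.

```python
import numpy as np

def svt(W_prime, tau):
    """Shrink every singular value of W_prime by tau and clip at zero.

    W_prime: matrix after the gradient step, W_{t-1} - eta_t * grad f(W_{t-1})
    tau:     threshold, lambda * eta_t in the notation of the text
    """
    U, s, Vt = np.linalg.svd(W_prime, full_matrices=False)
    s_shrunk = np.maximum(0.0, s - tau)  # max{0, sigma - lambda * eta_t}
    return (U * s_shrunk) @ Vt           # rebuild U diag(s_shrunk) V^T

W_prime = np.diag([3.0, 1.0])
W_next = svt(W_prime, tau=2.0)  # singular values 3 and 1 become 1 and 0
```

Singular values at or below the threshold are set to zero, which is how the trace norm term pushes the solution toward a low-rank matrix.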
After the optimal solution is obtained, the step size η_{t} of each iteration must be determined; it plays an important role in the accelerated algorithm. This paper uses a simple line search to find a suitable η_{t}. Let P_{η}(W_{t − 1}) be the optimal solution of Eq. (11) and Q_{η}(P_{η}(W_{t − 1}), W_{t − 1}) the optimal value of Eq. (11). Starting from an initial step size, the step size for each iteration is searched by testing the condition F(P_{η}(W_{t − 1})) > Q_{η}(P_{η}(W_{t − 1}), W_{t − 1}). The optimal matrix is obtained by running the optimization procedure in Table 1.
Age estimation prediction and model evaluation criteria
Age estimation prediction
The facial age recognition algorithm based on multilabel sorting makes full use of the ordinal information between age labels in the training samples. Solving the multilabel sorting objective established in the previous section yields the age characteristic matrix. Denoting this matrix by W, the prediction function for age estimation is given by Eq. (13):
\[ y_t = W^T x_t \qquad (13) \]
Here x_{t} is the facial feature vector of the test sample and y_{t} is the age label relevance vector calculated by Eq. (13). Each element of this vector represents the degree of correlation between the test sample and the corresponding age label. The elements are sorted in descending order of correlation, and the top-ranked age value, whose correlation with the test sample is largest, is taken as the one most likely to be close to the real age. The age of the facial image is thus estimated, which completes the design and implementation of the face age estimation algorithm based on multilabel sorting.
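The prediction step can be sketched as follows, assuming the linear prediction functions \( f_i(x) = w_i^T x \) introduced earlier, so that the relevance vector is the product of the transposed parameter matrix and the feature vector. The function name `predict_age` and the toy numbers are illustrative only.

```python
import numpy as np

def predict_age(W, x, age_labels):
    """Return the top-ranked age label and the full relevance vector.

    W:          d x m learned matrix, one column per age label
    x:          length-d feature vector of the test face
    age_labels: length-m array of ages t_1 < ... < t_m
    """
    relevance = W.T @ x             # one relevance score per age label
    order = np.argsort(-relevance)  # indices sorted by descending relevance
    return age_labels[order[0]], relevance

W = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])     # toy matrix with d = 2 and m = 3
x = np.array([1.0, 3.0])
ages = np.array([20, 21, 22])
age, relevance = predict_age(W, x, ages)  # relevance is [1, 3, 2]
```

Here the second age label has the largest relevance, so 21 is returned as the estimate.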
Estimation model evaluation criteria
Face age estimation is mainly evaluated with mean absolute error (MAE) and cumulative score (CS) as measures of accuracy [23].
Mean absolute error (MAE)
The mean absolute error is the average of the absolute errors between the predicted age and the true age over all tested face images. It is calculated as in Eq. (14):
\[ \mathrm{MAE} = \frac{1}{\widehat{N}} \sum_{i=1}^{\widehat{N}} \left| a_i - {\widehat{a}}_i \right| \qquad (14) \]
Here \( \widehat{N} \) is the number of tested face samples, and a_{i} and \( {\widehat{a}}_i \) are the true age of the ith tested face sample and the age predicted by the age estimation algorithm, respectively. MAE gives a direct picture of how accurately the algorithm estimates ages over the sample set. The smaller the MAE value, the higher the accuracy of the age estimation.
Cumulative score (CS)
The cumulative score is the proportion of test face images for which the difference between the age estimated by the algorithm and the true age is less than or equal to a predetermined threshold, that is, the ratio of the number of such test samples to the total number of test samples. It is calculated as in Eq. (15):
\[ \mathrm{CS}(e) = \frac{1}{\widehat{N}} \sum_{i=1}^{\widehat{N}} g\left( \left| a_i - {\widehat{a}}_i \right| - e \right) \qquad (15) \]
Here g(·) is a Boolean function with g(x) = 1 if x ≤ 0 and g(x) = 0 otherwise; e is the error tolerance, that is, the chosen threshold; \( \widehat{N} \) is the number of test samples; a_{i} and \( {\widehat{a}}_i \) are the true age and the predicted age of the ith tested face sample, respectively. For a fixed threshold e, the larger the CS value, the more samples satisfy the condition and the better the age estimation. The threshold is generally set to 10 years, which is taken as the upper limit of the age estimation error.
The performance of an age estimation method can be evaluated from different perspectives with these two criteria. MAE reflects the overall error level of the algorithm; CS reflects its accuracy through the cumulative error curve over each error range. The two evaluation methods are complementary.
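Both criteria are simple to compute. The sketch below follows the definitions in the text: MAE is the mean of the absolute age errors, and CS at threshold e is the fraction of test samples whose absolute error does not exceed e. The function names and the toy ages are illustrative.

```python
import numpy as np

def mae(true_ages, pred_ages):
    """Mean absolute error between true and predicted ages."""
    a = np.asarray(true_ages, dtype=float)
    p = np.asarray(pred_ages, dtype=float)
    return float(np.mean(np.abs(a - p)))

def cumulative_score(true_ages, pred_ages, e):
    """Fraction of samples whose absolute age error is at most e years."""
    a = np.asarray(true_ages, dtype=float)
    p = np.asarray(pred_ages, dtype=float)
    return float(np.mean(np.abs(a - p) <= e))

true_ages = [20, 30, 40, 50]
pred_ages = [22, 30, 35, 61]   # absolute errors: 2, 0, 5, 11
error = mae(true_ages, pred_ages)
cs_curve = [cumulative_score(true_ages, pred_ages, e) for e in range(0, 11)]
```

Evaluating `cumulative_score` over thresholds from 0 to 10 years yields the points of a CS curve like the ones compared in the experiments.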
Experimental result and discussions
To verify the accuracy of the proposed age estimation algorithm, a series of experiments was conducted on two authoritative age datasets, FG-NET and RefinedMORPH. To test its performance, the proposed algorithm was compared experimentally with several mainstream algorithms that achieve high age estimation accuracy.
Experiment setup
To test the performance of the multilabel sorting algorithm, this paper selects two public age datasets, FG-NET and RefinedMORPH. Different facial features were extracted according to the characteristics of each dataset, and the experiments were organized and tested accordingly.
(1) The FG-NET dataset contains 1002 face images of 82 different individuals, scanned from old photographs. Each individual has between 6 and 18 face images at different ages, and the ages range from 0 to 69 years. AAM (active appearance model) features [24] are extracted from the face images using am_tools; they incorporate both face shape and texture information and thus capture the changes in skull shape and skin slackness during human growth. To extract the AAM features, 68 key feature points are first calibrated on the face, facial features are then extracted with the AAM, and finally the AAM features at the 95% level are retained. The features are selected with a PCA (principal component analysis) dimension reduction algorithm, and the resulting feature vector has 200 dimensions. Given the characteristics of the dataset, the LOPO (leave-one-person-out) protocol is used to divide training and test sets. That is, all face images of one individual are selected as the test set, and all remaining face images are used as the training set. After 82 rounds, the average of all results is taken as the final age estimation result. Since the FG-NET dataset has few images and is organized by individual, the LOPO protocol is the more rigorous choice for experimental design.
(2) To avoid the influence of gender and ethnicity on face age estimation, this paper uses the RefinedMORPH dataset. The dataset consists of 21,060 face samples carefully selected from the MORPH-II dataset: 2570 white female (WF), 7960 white male (WM), 2570 black female (BF), and 7960 black male (BM) face photos. The dataset is balanced between white and black subjects. To verify the influence of gender and ethnicity on face age estimation, a total of 16 experiments were set up covering four settings: ① same gender and race, ② cross-race, ③ cross-gender, and ④ cross-race and cross-gender. Before the experiments, 4096-dimensional BIF features, which have been favored by many researchers in face age estimation in recent years, were extracted from this dataset.
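The leave-one-person-out protocol described in item (1) can be sketched as a generator of train and test index sets, one round per individual. The function name `lopo_splits` and the toy identifiers are illustrative; in the FG-NET setting there would be 82 rounds.

```python
import numpy as np

def lopo_splits(person_ids):
    """Yield (person, train_idx, test_idx), holding out one person per round.

    person_ids: length-n sequence giving the identity of each face image
    """
    person_ids = np.asarray(person_ids)
    for pid in np.unique(person_ids):
        test_idx = np.flatnonzero(person_ids == pid)   # all images of this person
        train_idx = np.flatnonzero(person_ids != pid)  # images of everyone else
        yield pid, train_idx, test_idx

ids = ["a", "a", "b", "c", "c", "c"]   # toy identities for six images
splits = list(lopo_splits(ids))        # three rounds, one per person
```

The final score is then the average of the per-round results, as the text describes.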
To ensure that the comparison is rigorous and consistent and to eliminate confounding influences, the proposed algorithm and the comparison algorithms are run under the same conditions, including the dataset and the facial feature extraction. To verify the performance of multilabel sorting learning on age estimation, several classic age estimation algorithms are selected for comparison, including commonly used classification and regression algorithms. In the experiments, the LIBSVM tool was used to train the SVM and SVR age estimation models. For the kNN algorithm, k is set to 30 following common practice. The AGES, OHRank, and LD experiments follow the algorithms and parameters designed by their authors. In addition, the convergence of the proposed algorithm is verified on both datasets.
Meanwhile, to further verify the validity of the proposed multilabel learning and to evaluate the impact of the ranking loss (R) and the trace norm (T) in the objective function on age estimation, they are compared with the original classification loss (C) and the Frobenius norm (F), giving four objective loss functions: C&F, C&T, R&F, and R&T.
Analysis of results
Following the above experimental setup, the proposed age estimation algorithm is compared with the other algorithms on the FG-NET and RefinedMORPH datasets and analyzed in detail. Two evaluation indexes, MAE and CS, are used.
FG-NET dataset
MAE values and CS curves were calculated on the FG-NET dataset. The MAE values of all algorithms are shown in Table 2.
Table 2 shows the MAE values of the AGES, SVM, SVR, kNN, OHRank, IIS-LLD, and CPNN algorithms on the FG-NET dataset. Although the more recently proposed OHRank and CPNN algorithms have reduced the age estimation error, the multilabel sorting algorithm proposed in this paper reduces the error by a further 3% and 9%, respectively, compared with these two, and it achieves the best result among all compared algorithms. The CS curves on the FG-NET dataset, calculated according to the CS evaluation index, are shown in Fig. 2; the maximum accepted age error is 10 years.
It can be seen from Fig. 2 that although the FG-NET dataset is small, the multilabel sorting learning algorithm proposed in this paper achieves nearly the highest CS value at every error level. Overall, the proportion of samples it estimates within tolerance grows steadily as the acceptable error increases.
RefinedMORPH dataset
Table 3 shows the MAE values of the AGES, SVM, SVR, kNN, OHRank, IIS-LLD, and CPNN algorithms on the RefinedMORPH dataset, based on face experiments across different genders and races.
The more recently proposed OHRank and label distribution algorithms are generally better than the earlier algorithms, but overall they are not as good as the multilabel sorting learning algorithm proposed in this paper. In addition, the smallest and second-smallest MAE values occur between groups of the same gender and ethnicity, while the largest MAE value occurs when gender and ethnicity are disregarded. This finding shows that face age estimation is sensitive to gender and ethnicity. From the data in Table 3, it can be concluded that the proposed algorithm outperforms the other algorithms in face age recognition experiments across genders and races.
Convergence detection
Figures 3 and 4 show the convergence of the algorithm’s objective function on the FG-NET and RefinedMORPH datasets, respectively. The objective converges quickly on both datasets. This illustrates that translating age estimation into a matrix model through multilabel sorting learning not only simplifies the steps of the age estimation model but also shortens its construction time.
Comparison between different loss functions
To verify the effect of the ranking loss (R) and the trace norm (T) in the objective loss function on age estimation, classification loss (C) and the Frobenius norm (F) are used as benchmarks, forming four loss functions for age estimation: C&F, C&T, R&F, and R&T. Table 4 shows the MAE values obtained on the FG-NET and RefinedMORPH datasets. R&T achieves a smaller MAE value than the other loss functions, which demonstrates the effectiveness of the algorithm.
Based on the above experimental analysis, the proposed multilabel sorting age estimation algorithm achieves good results in terms of MAE, the CS curve, and convergence speed, which is mainly attributable to the effectiveness of multilabel sorting learning. First, representing face samples with multiple age labels when training samples are limited not only enriches the representation of the dataset to a certain extent but also transforms the traditional multiclass age estimation method into solving for an age matrix, thereby shortening the training time of the model. Second, to learn the ordinal information between age labels, the algorithm uses the sorting loss function and introduces the matrix trace norm, which reduces the age estimation error; the fourth set of experiments verifies the effectiveness of this objective loss function.
Conclusions
Although multilabel learning is widely used in fields such as text analysis and bioinformatics and generalizes well on complex problems, whether it can improve the accuracy of facial age estimation has remained unclear. To address the shortage of age data, this paper first transforms the single age label of each face image sample into a multilabel vector according to the degree of correlation and then assembles a matrix of age characteristics ordered by age, which replaces the traditional method of multiple binary classifications. Working with this age matrix simplifies the tedious steps of the age estimation problem and shortens the training time of the age estimation model. In addition, to take full advantage of the ordering information in the age labels and compensate for the lack of training samples, the multilabel learning method uses a ranking loss function to learn the order among all age labels and introduces a matrix trace norm to control the complexity of the age estimation model. The model is optimized with the accelerated proximal gradient (APG) algorithm. For the proposed multilabel learning algorithm, this paper conducted a series of experiments on two age datasets and verified its efficiency and accuracy against several classical age estimation algorithms.
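The label transformation summarized above can be sketched as follows. The sketch assumes a simple rule in which every age within a fixed window of the true age is marked as relevant; the paper's actual correlation-based rule is not reproduced here, and the window size, age range, and function names are hypothetical.

```python
def age_to_multilabel(true_age, min_age, max_age, window=2):
    """Return a 0/1 vector over [min_age, max_age].

    Ages within `window` years of true_age are marked 1, all others 0.
    """
    return [1 if abs(age - true_age) <= window else 0
            for age in range(min_age, max_age + 1)]


def stack_labels(true_ages, min_age, max_age, window=2):
    """Stack one label vector per sample into a label matrix (list of rows)."""
    return [age_to_multilabel(a, min_age, max_age, window) for a in true_ages]


# Hypothetical age range 0..9 and a window of 2 years, chosen only for readability.
label_matrix = stack_labels([0, 4, 9], min_age=0, max_age=9, window=2)
for row in label_matrix:
    print(row)
```

Each row keeps the true age together with its neighbouring ages, so a single annotated image contributes information about several adjacent age labels.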
Abbreviations
AAM: Active appearance model
AGES: Aging pattern subspace
CPNN: Conditional probability neural network
CS: Cumulative score
IISLLD: Improved iterative scaling-learning from label distribution
kNN: k-nearest neighbors
LD: Label distribution
LOPO: Leave-one-person-out
MAE: Mean absolute error
OHRank: Ordinal hyperplanes ranker
PCA: Principal component analysis
RL: Ranking loss
SVD: Singular value decomposition
SVM: Support vector machine
SVR: Support vector regression
WM: White male
References
 1.
X. Geng, Z.H. Zhou, K. Smithmiles, Correction to “automatic age estimation based on facial aging patterns”. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 368–381 (2008)
 2.
G. Guo, G. Mu, Y. Fu, Human age estimation using bioinspired features. Comput. Vis. Pattern Recognit. 2009. CVPR 2009. IEEE, 112–119 (2009)
 3.
G. Guo, G. Mu, Simultaneous dimensionality reduction and human age estimation via kernel partial least squares regression. Comput. Vis. Pattern Recognit. IEEE, 42(7), 657–664 (2011).
 4.
G. Guo, Y. Fu, T.S. Huang, Locally adjusted robust regression for human age estimation. IEEE Trans. Pattern Anal. Mach. Intell. 76(6), 331–346 (2014)
 5.
C. Li, Q. Liu, Imagebased human age estimation by manifold learning and locally adjusted robust regression. IEEE Trans. Image Process. 64(1), 1176–1188 (2016)
 6.
R. Hong, Z. Hu, L. Liu, Understanding blooming human groups in social networks. IEEE Trans. Multimedia 17(11), 1–15 (2016)
 7.
C. Li, Q. Liu, W. Dong, Human age estimation based on locality and ordinal information. IEEE Trans. Cybern 45(11), 2522–2534 (2017)
 8.
X. Geng, Z.H. Zhou, K.S. Miles, Facial Age Estimation by Learning from Lable Distributions (Proceedings of the 24th AAAI Conference on Artificial Intelligence, Atlanta, 2010), pp. 451–456
 9.
C. Yin, X. Geng, Facial age estimation by conditional probability neural network. Pattern Recognit. Springer, Berlin Heidelberg 15(2), 243–250 (2012)
 10.
Q. Zhao, X. Geng, Selection of objective functions in marketdistributive learning. Com. Sci. Explor. 11(5), 708–719 (2017)
 11.
Liao H B, Yan Y C, Dai W H and Fan P: Age Estimation of Face Images Based on CNN and DivideAndRule Strategy, Mathematical Problems in Engineering, 2018
 12.
R.E. Schapire, Y. Singer, Improved boosting algorithms using confidencerated predictions. Mach. Learn. 37(3), 297–336 (1999)
 13.
G. Liu, Z. Lin, S. Yan, Robust recovery of subspace structures by lowrank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 171–184 (2010)
 14.
Z.B. Ren, L.L. Wang, Z.L. Fu, Multilabel classification integration learning algorithm based on ranking loss. Comput. Appl. 33((S1)), 40–42 (2013) 68
 15.
C.W.L. Chao, J.Z. Liu, J.J. Ding, Facial age estimation based on labelsensitive learning and ageoriented regression. Pattern Recogn. 46(3), 628–641 (2013)
 16.
K. Chen, S. Gong, T. Xiang, Cumulative attribute space for age and crowd density estimation. IEEE Conf. Comput. Vis. Pattern Recognit. IEEE Comput Soc. 9(4), 2467–2474 (2013)
 17.
K. Yu, X.J. Wu, Semisupervised community discovery of latent mapping based on KL divergence matrix traces. Comput. Eng. 12, 296–302 (2017)
 18.
K.C. Toh, Yun, An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pacif. J. Optim. 6(3), 615–640 (2010)
 19.
S. Ji, J. Ye, An Accelerated Gradient Method for Trace Norm Minimization (International Conference on Machine Learning, ICML 2009, Montreal, 2009), pp. 457–464
 20.
S.J. Huang, Demilitarization of the Use of Marker Relationships in MultiMarker Learning. Journal of Nanjing University (Natural Science Edition). 56(8), 882890 (2015)
 21.
Y.F. Guo, F.Y. Ning, H.H. Chao, A socialized matrix decomposition recommendation algorithm based on logistic function. J. Beijing Inst. Technol. 36(1), 70–74 (2016)
 22.
Y.M. Wang, J.P. Zhai, Y. Mo, 3D reconstruction of human body based on orthogonal matching tracking and accelerating proximal gradient. Chin. J. Biomed. Eng. 36(4), 385–393 (2017)
 23.
Q. Wang, Face Age Estimation Based on Adaptive Marker Distribution Learning. Journal of Southeast University (Natural Science Edition). 3, 475–479 (2017)
 24.
L.F. Xu, J.Y. Wang, J.N. Cui, Dynamic expression recognition based on dynamic time warping and active appearance model. Chinese. J. Electron. Inf. (EIS) 40(2), 338–345 (2018)
Acknowledgements
The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Funding
This work was supported in part by a grant from the Characteristics innovation project of colleges and universities of Guangdong Province (Natural Science, No. 2016KTSCX182, 2016) and a grant from the Youth Innovation Talent Project of colleges and universities of Guangdong Province (No. 2016KQNCX230, 2016).
Availability of data and materials
We can provide the data.
About the authors
Zijiang Zhu received a master’s degree in software engineering from Wuhan University in 2009 and a senior engineer title in 2008. From 2012 to 2016, he was selected in the seventh batch of school-level trainees of the “thousand, 100 and ten” project in Guangdong higher education institutions. He is a senior member of the China Computer Federation. He is currently an associate professor, dean of the School of Information Science and Technology, and deputy director of the Institute for Intelligent Information Processing, South China Business College of Guangdong University of Foreign Studies, Guangzhou, China. His current research areas include image processing, machine learning, and big data technology.
Hang Chen received a master’s degree in software engineering from Guangdong University of Technology in 2012 and a senior engineer title in 2007. From 2012 to 2016, he was selected in the seventh batch of school-level trainees of the “thousand, 100 and ten” project in Guangdong higher education institutions. He is currently an associate professor and associate dean of the School of Computer Science and Engineering, Tianhe College of Guangdong Polytechnic Normal University, Guangzhou, China. His current research areas include cloud computing technology, image recognition, data mining, and personalized recommendation technology.
Yi Hu received a master’s degree in software engineering from South China University of Technology in 2018 and a senior engineer title for Information System Project Management in 2015. He is currently a lecturer and assistant dean of the School of Information Science and Technology, South China Business College of Guangdong University of Foreign Studies, Guangzhou, China. His current research areas include image processing, machine learning, and big data technology.
Junshan Li was promoted to professor in 1999. He obtained a doctorate in computer system structure in 2001 and was elected as a provincial and ministerial expert in 2002. He is the head of the national boutique resource sharing course and the head of the national boutique course, the director of the China Computer Society, and the director of the Chinese Society of Image and Graphics. He is currently the director of the Institute for Intelligent Information Processing of South China Business College of Guangdong University of Foreign Studies. His current research interests include image processing and image understanding, intelligent computing, and intelligent systems.
Author information
Contributions
All authors took part in the discussion of the work described in this paper. ZZ wrote the first version of the paper and performed part of the experiments. HC, JL, and YH revised successive versions of the paper. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Zijiang Zhu.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Multilabel sorting
 Age estimation of facial images
 Mean absolute error
 Cumulative score