Image offset density distribution model and recognition of hand knuckle


The accurate description of hand posture plays an important role in the man-machine interaction involved in coordinated assembly. Knuckle image extraction and recognition are of great significance to refine and enrich hand-pose information. These are based on nonparametric density kernel estimation observation sets corresponding to unilateral and bilateral excursion of the hand knuckle gray image. In this paper, sets of pixel positions belonging to the upper- and middle-density intervals are used as two types of image targets. Random clustering and random field multi-classification target modeling are used to learn and estimate the two target distributions of the image. The discriminant field classification learning method is used to fuse the two kinds of target models. A comprehensive representation of the image offset features is obtained. Finally, the knuckle image sample set is used to train the model, and the adaptive threshold is used to identify the hand knuckle image. The results show that the proposed method is feasible.

1 Introduction

In intelligent manufacturing systems, the development of detection technology with high intelligence and strong environmental adaptability is of great significance to improve production efficiency and enhance the flexibility of manufacturing systems and product quality [1, 2]. Machine vision-based human-computer interaction coordination assembly technology uses human assembly gestures obtained from image analysis as input information for robot task planning [3] to realize an efficient and flexible coordinated assembly process [4]. The overall information, including the biological structure of the human hand image and associated hand assembly posture [5], is the basis for inferring the gesture intention.

Gesture-recognition research has two main directions. One uses sensors, detectors, and other peripheral tools to achieve gesture recognition. Lee and You [6] identified complex static gestures using wrist band-based contour features (WBCFs). The user must wear black wristbands to accurately segment the hand area. Moschetti et al. [7] recognized nine gestures with inertial sensors placed on the index finger and wrist. This kind of method, which uses external equipment to extract the hand position and posture to achieve more accurate gesture recognition, lacks convenience. Another research direction is unmarked gesture recognition with captured images. Bao et al. [8] classified images of gestures using deep convolutional neural networks. This method requires no segmentation or detection to distinguish irrelevant non-hand regions. Dehankar et al. [9] used accurate end-point identification (AEPI) to recognize hand-gesture images against varying backgrounds and blurred images. However, the above unmarked gesture-recognition methods are not sufficiently accurate. Their robustness and stability are insufficient, and the pose cannot be completely extracted. Further research is needed to improve their ability to accurately extract hand positions.

Many new identification technologies have emerged in the field of image-feature detection. These methods are used in different fields and vary in their focus. Some focus on feature-extraction techniques. Ding et al. [10] used double local binary patterns (DLBPs) to detect frame peaks in video. Yao et al. [11] presented a feature-selection method based on filters. Some have focused on model building, such as Wang and Wang [12], who modeled an action class of body space configuration with flexible quantities. A hierarchical spatial SPN method was developed to simulate the spatial relationships among sub-images, and sub-image correlation was modeled by additional layers of the SPN. Panda et al. [13] proposed a feature-driven selection classification algorithm (FALCON) to optimize the energy efficiency of machine-learning classifiers. The study of feature clustering is helpful for image-feature classification. Li et al. [14] used an unsupervised principal component analysis (PCA)-based feature clustering algorithm to automatically select the optimal number of clusters to solve the problem of automatic anomaly detection in monitoring applications. Jiang et al. [15] proposed a self-organizing feature clustering algorithm based on fuzzy similarity to extract text features. This method is fast and can extract features better than other methods. Rahmani and Akbarizadeh [16] proposed a spectral clustering method using unsupervised feature learning (UFL).

There is a strong correlation between the structured information of the hand image and the biological structure of the hand. The specific structural information varies with the gesture model, depending on the simplified biological structure. Under static conditions, vision-based gesture-structure modeling is mainly classified as either feature template representation based on two-dimensional models or hand geometry representation based on three-dimensional models, depending on the dimension of the investigated spatial domain [17, 18]. The latter is divided into a volumetric model that considers the surface structure of the hand [19] and a joint-linking model that considers the anatomy of the hand, according to the established differences in the geometric characteristics [20]. The template-based modeling is characterized by gesture-contour information, which makes it difficult to provide detailed kinematic parameter information, and is suitable for scenes where the gesture is simple and the semantic features are clear. For complex situations in which the hand posture is variable, semantic features and time are related, and the structural parameters of the “joint-link” model of the hand are modeled. The overall kinematic representation of the hand can be obtained through structural parameter detection.

The biometric identification of the hand includes skin-color location, fingertip root detection [21], knuckle recognition, finger positioning, and kinematic correlation between features. The knuckle position feature has an important influence on the accuracy of hand-pose inference. Knuckle image detection methods are mainly classified as geometric analysis or texture recognition [22,23,24]. Current research on knuckle images focuses on the use of knuckles for identification, sometimes combined with fingerprints; its methods and purpose are similar to those of fingerprint detection [25]. Usha and Ezhilarasan [26] used feature-extraction methods based on angle geometry analysis (AGFEM) and contourlet transform (CTFEM) to authenticate the finger back surface (FBKS) [27], and pointed out that the distal phalangeal region of the FBKS, the finger joint area near the tip of the finger, has great potential for recognition. Recognition performance is improved by extracting knuckle geometry and texture features simultaneously and integrating them with fractional fusion. Lin et al. [28] provided a practical solution for biometric systems based on the back of the finger through the FKP recognition algorithm. Gao et al. [29] used an adaptive binary fusion rule to adaptively fuse the matching distances before and after reconstruction, reducing the false-rejection rate. Kumar and Xu [30] studied automatic recognition based on the lowest finger joint pattern, formed between the metacarpal and the proximal phalanx.

Image segmentation based on a skin-color model can provide an initial solution to the problem of locating the hand in the image. The important image features that characterize the biological structure of the hand, such as finger posture and knuckle position, must still be identified. The finger knuckle is the important positioning point for the human hand posture. Gesture recognition requires accurate knuckle position information for three-dimensional reconstruction to restore the hand biostructure. In the half- and full-grip postures of the hand, corresponding to the joint structure at the joint position of the hand, the grayscale distribution of the knuckle image presents an irregular convex-hull structure near the local position of the finger. A non-deterministic irregular convex hull can be used as a kind of random hidden structure of knuckle images. In a previous article [31], the author took the finger joint image as an example of a random image with the above gray-structure ambiguity, feature ambiguity, and difficulty of extraction. The hidden feature observation of the image is obtained by density estimation of the gray distribution. This observation is used to establish the framework of learning and estimation algorithms for implicit image feature patterns. The extraction and analysis methods of the offset features on random images are given.

In this paper, human-computer coordinated assembly is assumed to take place in an indoor environment where the light intensity is relatively stable and the camera angle is relatively fixed. The research is based on the image offset density distribution. First, the upper-level density feature of the image is modeled and analyzed with an infinite Dirichlet process mixture model. Then, the middle-density feature is modeled and analyzed with a Gaussian process classification model. Finally, the two levels of density features are fused by binary Gaussian process classification. Experiments are carried out to verify the feasibility of the process.

2 Infinite Dirichlet process knuckle image high-level data hybrid model

According to the extraction of the offset feature in a previous article [31], the likelihood representation of the test image A in the random image grayscale distribution model is

$$ \widehat{\mu}\approx \mathbb{D}\left({{\tilde{\mu}}_{{\mathbb{G}}_1\left|\mathrm{\mathbb{P}}\right.}}^{\hbox{'}}\cdot {\delta}_{\left({{\tilde{\mu}}_{\left.\mathrm{\mathbb{P}}\right|}}^{\hbox{'}},{c}_1\right)},{{\tilde{\mu}}_{{\mathbb{G}}_2\left|\mathrm{\mathbb{P}}\right.}}^{\hbox{'}}\cdot {\delta}_{\left({{\tilde{\mu}}_{\left.\mathrm{\mathbb{P}}\right|}}^{\hbox{'}},{c}_{21},{c}_{22}\right)}\right), $$

where \( \widehat{\mu} \) is the approximation form of the offset measure, \( \mathbb{D} \) is the fusion structure between different offset set models under different offset parameters, \( {{\tilde{\mu}}_{{\mathbb{G}}_1\mid \mathrm{\mathbb{P}}}}^{\hbox{'}} \) is the high-level offset set probability measure, and \( {{\tilde{\mu}}_{{\mathbb{G}}_2\left|\mathrm{\mathbb{P}}\right.}}^{\hbox{'}} \) is the middle offset set probability measure.

For ease of calculation and presentation, the conditional random measure is expressed as

$$ p\left(\cdot \right)\propto {{\tilde{\mu}}_{{\mathbb{G}}_1\mid \mathrm{\mathbb{P}}}}^{\hbox{'}}\left(\cdot \right), $$

where p(·) is the non-negative two-dimensional density function corresponding to the target distribution.

For the learning problem of unilateral offset density in image stochastic models, this section uses an infinite Dirichlet process mixture model. Based on the gray-level position data extracted from the nonparametric kernel density estimate, the probability measure \( {{\tilde{\mu}}_{{\mathbb{G}}_1\mid \mathrm{\mathbb{P}}}}^{\hbox{'}} \) of the offset set at the fixed threshold c in the image domain is learned. The number of clusters is treated as random, and Gibbs sampling is used to iteratively learn the density structure in hierarchical probabilistic form under the Markov neighborhood assumption. By learning and modeling the offset set distribution, the unilateral estimation of the gray particle random model is realized.

2.1 Horizontal density clustering and Markov assumptions for discrete observations

In the layered observations of the density estimate fK, the process of determining the positions of the unilaterally offset grid points that belong to the horizontal parameter c is equivalent to the marking process on the discrete grid points of the image:

$$ \mathrm{V}=\left\{\left(x,y\right)\in Z|{f}_K\left(\left(x,y\right);{t}_X\right)/\left(\underset{Z}{\max }{f}_K\right)\ge c,c>0\right\} $$
$$ Z=\mathrm{V}\cup \left(Z\backslash \mathrm{V}\right). $$

Here, the marks constitute a hidden variable on the observation grid Z. To learn the distribution model from the observations, the relationship between observations, label classes, and offset measures on the grid Z must be established. Positions in the observed set V carry the definite marker class 1 on the image, whereas the label class on the unobserved position set Z\V is uncertain. Under the assumption that the distribution model is continuous, a position whose marker class is indefinite should be understood as not observed, and the 0 marker cannot directly determine the corresponding observation result. The label class indicates whether the observation position belongs to the offset set at level c. However, when the overall offset measure is estimated from the observation data, the mark relationship between the elements of V and Z\V must be further specified so that the mark relationships are integrated over the entire grid Z. In connection with the data-extraction process in the previous article [31], the dependency between grid observations can be established on Z using a mixed graph structure as the basis for subsequent inference from the observation data. The relationship between marker classes and grid positions is established through a directed graph structure, while a pairwise Markov random field establishes the dependency between the discrete grid points of the imaging domain, i.e., the distribution of p(·) on Z.
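As a minimal sketch of the marking process defined by V, the following Python snippet labels the grid points whose normalised density reaches the level c. The grid values and the function names `excursion_set` and `label_grid` are illustrative assumptions and are not taken from the source.

```python
def excursion_set(density, c):
    """Return the set V of grid positions (x, y) with f_K / max(f_K) >= c."""
    if c <= 0:
        raise ValueError("level c must be positive")
    peak = max(max(row) for row in density)
    return {
        (x, y)
        for y, row in enumerate(density)
        for x, value in enumerate(row)
        if value / peak >= c
    }


def label_grid(density, c):
    """Mark each point of the grid Z: 1 if it lies in V, 0 if unobserved."""
    v = excursion_set(density, c)
    return [
        [1 if (x, y) in v else 0 for x in range(len(row))]
        for y, row in enumerate(density)
    ]


# Hypothetical 3 x 4 density estimate f_K on the grid Z (made-up values).
f_K = [
    [0.1, 0.2, 0.4, 0.2],
    [0.2, 0.9, 1.8, 0.5],
    [0.1, 0.6, 2.0, 0.3],
]
V = excursion_set(f_K, c=0.85)
labels = label_grid(f_K, c=0.85)
```

Note that a label of 0 only records that the position lies in Z\V; as discussed above, it does not by itself fix the class of that position.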

Thus, a hidden Markov model with observation markers is constructed on the grid Z, as shown in Fig. 1. The hidden variable is the mark type, and the correlation factor is the local dependency on the offset set. The observations extracted from the density estimate have neighborhood structures similar to those of the original grayscale image. Therefore, p(·) on the grid Z is meaningful not only over the image as a whole but also carries dependencies in the local area, which are expressed by the local Markov assumption on the corresponding undirected graph model:

$$ p\left\{{x}_i\in \mathrm{V}|{x}_{Z\backslash i}\right\}=p\left\{{x}_i\in \mathrm{V}|{x}_{\Gamma (i)}\right\}. $$
Fig. 1 Hidden Markov model of excursion set observation on discrete lattice in image fields

That is to say, observations that depend on the overall distribution are separated from the whole in the form of local associations.

According to the above analysis, on the one hand, the offset measure on the random hyperparameter field f reflects the classification of the observation marks and the agglomeration of the density distribution under the local relation. On the other hand, when the level parameter of the offset set is higher, the Euler characteristic of the offset set is larger, which indicates that the local coverage of the high-level offset set in the planar domain is more complete and shows a stronger clustering trend. Therefore, the learning problem for p(·) can be transformed into a random clustering problem. This section uses the infinite Dirichlet process mixture model to construct the probability density p(·) in Eq. 2. Gibbs sampling is used to iteratively learn the density structure in hierarchical probabilistic form under the Markov neighborhood assumption.

2.2 Nonparametric distributions and infinite Dirichlet processes

To improve the adaptability of the model to the target distribution, the model of the target distribution p is represented as a nonparametric hybrid model,

$$ p\left({x}_{\tau (1)},{x}_{\tau (2)},\dots, {x}_{\tau (N)}\right)\kern0.6em =\underset{\Theta}{\int }p\left(\theta \right)\prod \limits_{i=1}^Np\left({x}_i|\theta \right) d\theta, $$

where θ is a hyperparameter whose distribution is not restricted to a finite-dimensional parametric form, which improves the learning effectiveness and the image-recognition rate.

In particular, the Dirichlet process defines a distribution over random probability measures and is an effective alternative to parametric model learning. The nonparametric method constructs a stochastic process on the infinite-dimensional parameter space Θ and characterizes it through its finite-dimensional statistics, where Θ is a measurable space. The Dirichlet process is defined by the base measure H on Θ and the concentration parameter α. Let (T1,  … , TK) be a finite partition of Θ:

$$ {\cup}_{k=1}^K{T}_k=\Theta, \kern0.4em {T}_k\cap {T}_l=\varnothing, k\ne l $$

The random probability distribution G on Θ, evaluated on any such finite partition, follows the Dirichlet distribution:

$$ \left(G\left({T}_1\right),\dots, G\left({T}_K\right)\right)\sim Dir\left(\alpha H\left({T}_1\right),\dots, \alpha H\left({T}_K\right)\right). $$

The random process DP(α, H) is thus defined by the concentration parameter α and the base measure H.
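The finite-partition property can be illustrated numerically. The sketch below draws the masses (G(T1), …, G(TK)) for a hypothetical three-cell partition by normalising independent Gamma variates, which is a standard way to sample a Dirichlet vector; the partition masses `H`, the value of α, and the function name are assumptions made for illustration only.

```python
import random


def sample_partition_masses(alpha, base_masses, rng):
    """Draw (G(T_1), ..., G(T_K)) ~ Dir(alpha*H(T_1), ..., alpha*H(T_K))."""
    if abs(sum(base_masses) - 1.0) > 1e-9:
        raise ValueError("H must assign total mass 1 to the partition")
    # Normalised independent Gamma(alpha * H(T_k), 1) variates are Dirichlet.
    gammas = [rng.gammavariate(alpha * h, 1.0) for h in base_masses]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
H = [0.5, 0.3, 0.2]          # hypothetical base-measure masses H(T_k)
G = sample_partition_masses(alpha=2.0, base_masses=H, rng=rng)
```

Averaged over many draws, each G(Tk) concentrates around H(Tk), and a larger α makes individual draws lie closer to the base measure.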

Since the parameter α controls the probability distribution of random parameter sets in the Dirichlet process, its subsequent update and accurate sampling have a decisive effect on the convergence of iterative learning. Because the sampling strategy for α depends on the generation mechanism of the random measure in the Dirichlet process, the sampling details differ slightly between generation mechanisms. In this section, following the idea of discrete approximation based on lattice Gibbs mixed sampling, the prior distribution is taken to be the Gamma distribution,

$$ \alpha \sim G\left(a,b\right). $$

The posterior conditional update, expressed as a mixture of two Gamma distributions, is

$$ \left(\alpha |\eta, k\right)\sim {\pi}_{\eta }G\left(a+k,b-\log \left(\eta \right)\right)+\left(1-{\pi}_{\eta}\right)G\left(a+k-1,b-\log \left(\eta \right)\right), $$

where G is the Gamma distribution, K (written k in the conditional above) is the current number of clusters in the Dirichlet process mixture model (DPMM), and n is the number of observations, with

$$ \eta \sim Beta\left(\alpha +1,n\right) $$
$$ {\pi}_{\eta }=\frac{a+K-1}{a+K-1+n\ast \left(b-\log \left(\eta \right)\right)}. $$

To improve the sampling accuracy and stability, the Monte Carlo sampling method is used and the sample mean of the above conditional distribution is taken as the final sampling result:

$$ p\left(\alpha |{D}_n\right)\approx {N}^{-1}{\sum}_{s=1}^Np\left(\alpha |{\eta}_s,{k}_s\right), $$

where N is the number of samples and ks may have a degeneration value of K.
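The auxiliary-variable update of α can be sketched as follows. This is a minimal illustration that assumes the second argument of G(·,·) is a rate parameter (so the scale passed to the sampler is its reciprocal) and that the number of clusters k stays fixed while α is resampled; the function names and the numerical settings are hypothetical.

```python
import math
import random


def resample_alpha(alpha, k, n, a, b, rng):
    """One draw of alpha given k clusters and n observations."""
    # Auxiliary variable eta ~ Beta(alpha + 1, n); guard against log(0).
    eta = max(rng.betavariate(alpha + 1.0, n), 1e-300)
    rate = b - math.log(eta)                 # eta < 1, so rate > b
    weight = a + k - 1.0
    pi_eta = weight / (weight + n * rate)    # mixture weight pi_eta
    shape = a + k if rng.random() < pi_eta else a + k - 1.0
    # random.gammavariate takes (shape, scale); the scale is 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)


def monte_carlo_alpha(alpha, k, n, a, b, num_samples, rng):
    """Average a short chain of draws to stabilise the estimate of alpha."""
    total = 0.0
    for _ in range(num_samples):
        alpha = resample_alpha(alpha, k, n, a, b, rng)
        total += alpha
    return total / num_samples


alpha_hat = monte_carlo_alpha(
    alpha=1.0, k=3, n=50, a=1.0, b=1.0, num_samples=200,
    rng=random.Random(0),
)
```

Averaging several draws, rather than keeping a single one, mirrors the Monte Carlo mean used above to reduce the variance of the update.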

For the within-cluster parameters, the components of the target distribution of the image offset set can be taken as two-dimensional Gaussian distributions. So that the update law of the Gaussian parameters satisfies the requirements of posterior maximization, the normal-inverse-Wishart distribution [32] is taken as the conjugate prior for the joint distribution of the two quantities to be learned, namely the mean and the covariance. The posterior update law of its parameters is:

$$ \left\{\begin{array}{l}{\mu}_n=\frac{K_0}{K_0+n}\kern0.5em {\mu}_0+\frac{n}{K_0+n}\;\overline{x}\\ {}{K}_n={K}_0+n\\ {}{v}_n={v}_0+n\\ {}{\Lambda}_n={\Lambda}_0+S+\frac{K_0\;n}{K_0+n}\;\left(\overline{x}-{\mu}_0\right){\left(\overline{x}-{\mu}_0\right)}^T\end{array}\right. $$

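where μ0, Λ0, K0, and ν0 are the initial mean, the scale matrix, the prior strength of the mean (a pseudo-observation count), and the degrees of freedom, respectively; n is the number of observations, \( \overline{x} \) is their sample mean, and S is their scatter matrix about \( \overline{x} \). The posterior joint distribution is:

$$ \rho \left(\mu, \Sigma |\kern0.24em D,{\mu}_0,{\mathrm{K}}_0,{\Lambda}_0,{\nu}_0\right)= NIW\left(\mu, \Sigma |\kern0.24em {\mu}_n,{\mathrm{K}}_n,{\Lambda}_n,{\nu}_n\right) $$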

By sampling the above distribution, an effective clustering parameter update can be obtained, and the update learning of parameters in each mixed Gaussian component can be realized.
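The posterior update law above is simple enough to write out directly. The following sketch implements the four update equations for points in the plane using plain lists; the function name `niw_posterior`, the variable names (`kappa0` stands for K0, `lam0` for Λ0), and the two sample points are illustrative assumptions.

```python
def niw_posterior(points, mu0, kappa0, nu0, lam0):
    """Normal-inverse-Wishart posterior parameters for the given points."""
    n = len(points)
    d = len(mu0)
    xbar = [sum(p[j] for p in points) / n for j in range(d)]
    # Scatter matrix S = sum_i (x_i - xbar)(x_i - xbar)^T
    scatter = [
        [sum((p[r] - xbar[r]) * (p[s] - xbar[s]) for p in points)
         for s in range(d)]
        for r in range(d)
    ]
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = [(kappa0 * mu0[j] + n * xbar[j]) / kappa_n for j in range(d)]
    shrink = kappa0 * n / kappa_n
    diff = [xbar[j] - mu0[j] for j in range(d)]
    lam_n = [
        [lam0[r][s] + scatter[r][s] + shrink * diff[r] * diff[s]
         for s in range(d)]
        for r in range(d)
    ]
    return mu_n, kappa_n, nu_n, lam_n


# Two hypothetical offset positions and a weak prior centred at the origin.
mu_n, kappa_n, nu_n, lam_n = niw_posterior(
    points=[(1.0, 2.0), (3.0, 4.0)],
    mu0=[0.0, 0.0], kappa0=1.0, nu0=4.0,
    lam0=[[1.0, 0.0], [0.0, 1.0]],
)
```

With these inputs the posterior mean is pulled two thirds of the way from the prior mean towards the sample mean (2, 3), since the prior counts as one pseudo-observation against two real ones.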

2.3 Infinite Dirichlet process mixed model based on collapsed Gibbs sampling

Given the N observations \( x={\left\{{x}_i\right\}}_{i=1}^N \) of the Dirichlet process mixture model, the hidden labels zi, the total number of clusters, and the corresponding parameters \( {\left\{{\theta}_k\right\}}_{k=1}^K \) are inferred. The exact posterior distribution p(π, θ| x) contains the distributions corresponding to all possible label configurations, so a collapsed Gibbs sampling algorithm is used to implement iterative learning of the infinite mixture model. First, the hidden variable zi of every observation is sampled; then the posterior marginal of the multinomial weights π corresponding to the current label distribution and all cluster hyperparameters \( {\left\{{\theta}_k\right\}}_{k=1}^K \) are calculated.

With the labels z\i of all other observations held fixed, the conditional distribution of the hidden variable of the current observation is

$$ p\left({z}_i|{z}_{\backslash i},x,\alpha, \lambda \right)\propto p\left({z}_i|{z}_{\backslash i},\alpha \right)p\left({x}_i|z,{x}_{\backslash i},\lambda \right). $$

Under the assumption of exchangeability, the first factor in the above formula can be expressed as

$$ p\left({z}_i|{z}_{\backslash i},\alpha \right)=\frac{1}{\alpha +N-1}\left({\sum}_{k=1}^K{N}_k^{-i}\delta \left({z}_i,k\right)+\alpha \delta \left({z}_i,\overline{k}\right)\right), $$

where \( \overline{k} \) denotes a new cluster label, standing for any of the infinitely many currently empty categories. As in the finite mixture model, the likelihood of the observation xi under an existing cluster k is

$$ p\left({x}_i|{z}_i=k,{z}_{\backslash i},{x}_{\backslash i},\lambda \right)=p\left({x}_i|\left\{{x}_j|{z}_j=k,j\ne i\right\},\lambda \right). $$

Similarly, the predicted likelihood of the current observation xi under the new marker \( \overline{k} \) is

$$ p\left({x}_i|{z}_i=\overline{k},{z}_{\backslash i},{x}_{\backslash i},\lambda \right)=p\left({x}_i|\lambda \right)=\underset{\Theta}{\int }f\left({x}_i|\theta \right)h\left(\theta |\lambda \right) d\theta, $$

where H(λ) is the specified conjugate prior with density h(θ| λ). The Dirichlet process mixture model contains infinitely many parameters to be learned and generalizes the inference of the finite mixture model. The specific flow is as follows:

  • Resampling of the labels \( {z}_i^{(t)} \) starts from the Dirichlet hyperparameter \( {\alpha}_0^{\left(t-1\right)} \) and the labels \( {z}_i^{\left(t-1\right)}\left(i=1,\dots, N\right) \) of the previous iteration.

  • A random permutation τ(·) of the observation indices {1, 2,  … , N} is sampled.

  • The parameters are initialized from the last iteration: z = z(t − 1) and \( {\alpha}_0={\alpha}_0^{\left(t-1\right)} \).

  • For each i in the random order τ(1), … , τ(N):

    (a) Remove the observation xi from its class zi and update the sufficient statistics \( {S}_{z_i} \) and \( {n}_{z_i} \) of that class.

    (b) If xi was the only observation in its class, clear the class label and all corresponding cluster parameters. Update the statistics \( {S}_{z_i} \) and \( {n}_{z_i} \) and reduce the number of classes: K = K − 1.

    (c) Relabel all non-empty active classes as 1, …, K.

    (d) Calculate the predictive likelihood for all K active clusters from the statistics \( {\left\{{S}_k\right\}}_{k=1}^K \) and \( {\left\{{n}_k\right\}}_{k=1}^K \):

$$ {f}_k\left({x}_i\right)=p\left({x}_i|\left\{{x}_j|{z}_j=k,j\ne i\right\},\lambda \right). $$

    At the same time, calculate the predictive likelihood under a potential new label:

$$ {f}_{K+1}\left({x}_i\right)=\int F\left({x}_i|\theta \right){G}_0\left(\theta \right) d\theta . $$

    (e) Sample the new class of zi from the (K + 1)-dimensional multinomial distribution:

$$ {z}_i\sim \left(\alpha {f}_{K+1}\left({x}_i\right)\delta \left({z}_i,\overline{k}\right)+{\sum}_{k=1}^K{N}_k^{-i}{f}_k\left({x}_i\right)\delta \left({z}_i,k\right)\right)/{Z}_i $$
$$ {Z}_i=\alpha {f}_{K+1}\left({x}_i\right)+{\sum}_{k=1}^K{N}_k^{-i}{f}_k\left({x}_i\right), $$

    where \( {N}_k^{-i} \) is the number of observations other than xi that are currently assigned to label k.

    (f) If zi = K + 1, a new cluster is opened and labeled K + 1. The parameters of the new cluster are sampled from H(ϕi| xi).

    (g) Update the sufficient statistics \( {\left\{{S}_k\right\}}_{k=1}^K \) and \( {\left\{{n}_k\right\}}_{k=1}^K \) for all class labels.

  • Check whether all observations have been resampled. If not, return to the start of the loop for the next observation.

  • Sample all clustering parameters for all tagged classes:

$$ {\theta}_k^{(t)}\sim p\left({\theta}_k|\left\{{x}_i|{z}_i^{(t)}=k\right\},\lambda \right). $$
  • Sample using the auxiliary variable method:

$$ \alpha \sim Gamma\left(a,b\right),{\alpha}_0^{(t)}\sim p\left({\alpha}_0|K,n,a,b\right). $$
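The flow above can be sketched in a few dozen lines. To keep the example short, the sketch below makes several simplifying assumptions that differ from the model in this section: clusters are isotropic Gaussians with a known variance `SIGMA2`, the cluster mean has a zero-mean Normal prior with variance `TAU2` (instead of the normal-inverse-Wishart prior), sufficient statistics are recomputed from scratch for each observation, cluster parameters are integrated out rather than sampled, and α is held fixed. All names and data are hypothetical.

```python
import math
import random

SIGMA2 = 0.25   # assumed known within-cluster variance (illustrative)
TAU2 = 25.0     # assumed prior variance of cluster means about the origin


def log_predictive(x, n_k, sum_k):
    """log p(x | points already in the cluster), Normal-Normal model."""
    prec = 1.0 / TAU2 + n_k / SIGMA2
    var = SIGMA2 + 1.0 / prec
    logp = 0.0
    for xj, sj in zip(x, sum_k):
        mean = (sj / SIGMA2) / prec          # prior mean is zero
        logp += -0.5 * math.log(2 * math.pi * var) - 0.5 * (xj - mean) ** 2 / var
    return logp


def gibbs_sweep(data, z, alpha, rng):
    """One collapsed Gibbs sweep; z[i] is the cluster label of data[i]."""
    dim = len(data[0])
    order = list(range(len(data)))
    rng.shuffle(order)                       # random permutation tau
    for i in order:
        z[i] = None                          # (a) remove x_i from its class
        active = sorted({k for k in z if k is not None})
        relabel = {old: new for new, old in enumerate(active)}
        for j in range(len(z)):              # (b), (c) drop empty, relabel
            if z[j] is not None:
                z[j] = relabel[z[j]]
        K = len(active)
        counts = [0] * K
        sums = [[0.0] * dim for _ in range(K)]
        for j, label in enumerate(z):
            if label is not None:
                counts[label] += 1
                for t in range(dim):
                    sums[label][t] += data[j][t]
        # (d) predictive likelihoods; the last entry is the new cluster.
        logw = [math.log(counts[k]) + log_predictive(data[i], counts[k], sums[k])
                for k in range(K)]
        logw.append(math.log(alpha) + log_predictive(data[i], 0, [0.0] * dim))
        top = max(logw)
        weights = [math.exp(v - top) for v in logw]
        # (e) sample the new label from the (K + 1)-way distribution.
        u = rng.random() * sum(weights)
        acc = 0.0
        choice = K
        for k, w in enumerate(weights):
            acc += w
            if u < acc:
                choice = k
                break
        z[i] = choice                        # (f) choice == K opens a cluster
    return z


# Two hypothetical, well-separated groups of grid positions.
data = [(0.0, 0.1), (0.1, -0.1), (-0.1, 0.0), (0.05, 0.05), (-0.05, -0.1),
        (5.0, 5.1), (5.1, 4.9), (4.9, 5.0), (5.05, 5.05), (4.95, 4.9)]
rng = random.Random(0)
z = [0] * len(data)
for _ in range(20):
    z = gibbs_sweep(data, z, alpha=0.1, rng=rng)
```

Starting from a single cluster, the sampler is free to open new clusters whenever the prior predictive term outweighs every existing cluster, which is how the number of clusters adapts to the data.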

3 Method—knuckle image mid-level data model

In view of the complexity of the random offset set itself, the difference between the offset characteristics corresponding to different offset parameter intervals is relatively large. And the further the offset parameter is from the standard value of 1, the more complex the corresponding feature. Therefore, in the learning process of random image bilateral offset measurement, especially for the case of small offset parameters, it is necessary to deeply analyze the random distribution characteristics of the actual offset observations in the training image database and to select an appropriate model for learning. In this section, we obtain the \( {{\tilde{\mu}}_{{\mathbb{G}}_2\mid \mathrm{\mathbb{P}}}}^{\hbox{'}} \) equivalent density estimate \( p\left(\cdot \right)\propto {{\tilde{\mu}}_{{\mathbb{G}}_2\mid \mathrm{\mathbb{P}}}}^{\hbox{'}}\left(\cdot \right) \) by learning the multi-label distribution random field model for the mid-density location.

3.1 Mid-level data distribution training based on Gaussian process classification

Because of the complexity of the distribution patterns in the middle-level data, it is difficult to obtain a mid-level offset density distribution model with relatively obvious features and sufficient resolution. According to the nonparametric kernel density estimate, the image gray position data whose offset parameter lies in the selected interval are taken as an observation of the bilateral offset set of the random image. The endpoints of the offset parameter interval are c21 = 0.50 and c22 = 0.85, and the middle-layer data are further divided into a multilayer structure with three types of labels according to density level, as shown in Fig. 2. Figure 2a–c correspond to observations in the intervals 0.70–0.85, 0.60–0.75, and 0.50–0.65, respectively. As the offset parameter decreases, the distribution pattern in Fig. 2a has certain regression characteristics, Fig. 2b shows a clustering trend, and Fig. 2c shows diffusion features. Comparing the transitions between the three panels shows that the mid-level data distribution does not obviously have the clustering patterns or trends implicit in the high-level data distribution. Instead, it reflects the characteristics of a random field; that is, the overall distribution of the middle-level data exhibits a transition from clustering to irregular diffusion. Given this assessment of the mid-level data, a random-field modeling method can be used to learn the typical distribution states of the three parameter segments in the “clustering-diffusion” classification mode and to obtain the estimate of \( {{\tilde{\mu}}_{{\mathbb{G}}_2\mid \mathrm{\mathbb{P}}}}^{\hbox{'}} \) in Eq. 1 on the plane domain.

Fig. 2 Three-class label observation of middle-layer data from knuckle image. From left to right: a, b, and c

For multi-classification problems on the random field, different label values correspond to different level-parameter ranges, and the label class values are taken in a finite discrete space. At the same time, the relationship between multi-class tags at an image position is not completely determined. Therefore, the random distribution of all class labels must be modeled and expressed in a unified way to better restore the overall characteristics of the mid-level data distribution. In this section, a Gaussian process model is used with the observation data as the training sample set X. The Bernoulli distribution represents the probability of a single-class label at a fixed position of the image, and the probability of the class label y on the field is used as the training output. The distribution pattern among the three types of tags contains two kinds of information: one is the activation and transformation of state tags at the same location, and the other is the distribution relationship between different locations and multiple states. For the former, the Gibbs form is used to represent the parameter association in the corresponding multinomial distribution of the label. To limit the complexity of the learning process, this paper assumes that different tag classes at different image locations are uncorrelated and that the joint distribution of tags of the same class is Gaussian. The Gaussian field function f is used to represent tag associations within the same class:

$$ {y}_i^c\mid {f}_i^c\sim \mathrm{Bern}\left(\sigma \left({f}_i|{y}_i^c=1\right)\right) $$
$$ p\left({y}_i^c|{f}_i^c\right)={\pi}_i^c=\sigma \left({y}_i^c{f}_i^c\right)=\frac{\exp \left({f}_i^c\right)}{\sum_{c\hbox{'}}\exp \left({f}_i^{c\hbox{'}}\right)}, $$

where the tag \( {y}_i^c \) at location i takes values in {0, 1}, and f has the vector form \( f=\left({f}_1^1,\dots, {f}_n^1,{f}_1^2,\dots, {f}_n^2,{f}_1^3,\dots, {f}_n^3\right) \). Its prior is \( f\mid X\sim \mathcal{N}\left(0,K\right) \), where K is the corresponding covariance matrix and n is the number of training data. Because the classes are assumed to be uncorrelated, K is block diagonal, K =  diag {k1, k2,  … , kC}, where kc is the covariance matrix among the data of tag class c. Therefore, learning the middle-level offset measure is transformed into learning the random quantity f.

3.2 Posterior calculations on Gaussian fields with multiple binary classifications

Since the prior on the field fi = f(xi) is Gaussian, the posterior is approximated by a Gaussian centered at its mode:

$$ q\left(f|X,y\right)=\mathcal{N}\left(f|\widehat{f},{A}^{-1}\right)\propto \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right). $$

The maximum a posteriori estimate of the latent function f is defined as \( \widehat{f}=\arg {\max}_fp\left(f|X,y\right) \), and \( A=-\mathrm{\nabla \nabla}\log p\left(f|X,y\right) \) evaluated at \( f=\widehat{f} \).

Under the Bayesian framework,

$$ p\left(f|X,y\right)=p\left(y|f\right)p\left(f|X\right)/p\left(y|X\right). $$

Since the marginal likelihood p(y| X) of the training data does not depend on f, maximizing the posterior over f is equivalent to maximizing the unnormalized log posterior:

$$ {\displaystyle \begin{array}{l}\Psi (f)\triangleq \log \left(p\left(y|f\right)p\left(f|X\right)\right)\\ {}\kern2em =-\frac{1}{2}{f}^{\mathrm{T}}{K}^{-1}f+{y}^{\mathrm{T}}f-{\sum}_{i=1}^n\log \left({\sum}_{c=1}^C\exp {f}_i^c\right)-\frac{1}{2}\log \left|K\right|-\frac{Cn}{2}\log 2\pi .\end{array}} $$

The posterior mode \( \widehat{f} \) corresponds to the zero of the gradient, ∇Ψ = 0. Differentiating the above formula gives

$$ \mathrm{\nabla \Psi }=-{K}^{-1}f+y-\pi . $$

The zero point of this type is the prediction solution \( \widehat{f}=K\left(y-\widehat{\pi}\right) \) of the implicit function f variable. We further use the following differential relationship:

$$ \frac{\partial^2}{\partial {f}_i^c\partial {f}_i^{c\hbox{'}}}\log {\sum}_j\exp \left({f}_i^j\right)={\pi}_i^c{\delta}_{cc\hbox{'}}-{\pi}_i^c{\pi}_i^{c\hbox{'}} $$
$$ \mathrm{\nabla \nabla \Psi }=-{K}^{-1}-W,\kern0.4em W\triangleq \mathit{\operatorname{diag}}\left(\pi \right)-{\Pi \Pi}^{\hbox{'}}, $$

where Π is the Cn × n matrix obtained by vertically stacking the diagonal matrices \( \operatorname{diag}\left({\pi}^c\right) \) of the Gibbs (softmax) probabilities π.
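The structure of W can be checked numerically. The sketch below assumes NumPy and class-major stacking; the sizes and random values are illustrative only. It builds Π and W and confirms that W is symmetric, positive semidefinite, and annihilates the stacked identity matrix (because the class probabilities at each location sum to one).

```python
import numpy as np

n, C = 3, 3                                  # illustrative sizes
rng = np.random.default_rng(1)
F = rng.normal(size=(C, n))
P = np.exp(F - F.max(axis=0))
P /= P.sum(axis=0)                           # P[c, i] = pi_i^c
pi = P.reshape(-1)                           # class-major stacking

# Pi stacks the C diagonal matrices diag(pi^c) vertically: shape (C*n, n)
Pi = np.vstack([np.diag(P[c]) for c in range(C)])
W = np.diag(pi) - Pi @ Pi.T                  # shape (C*n, C*n)
R = np.vstack([np.eye(n)] * C)               # C stacked identity matrices
```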

We use the Newton iteration format to obtain implicit function updates:

$$ {f}^{new}=f-{\left(\mathrm{\nabla \nabla \Psi}\right)}^{-1}\mathrm{\nabla \Psi } $$
$$ {\displaystyle \begin{array}{l}f-{\left(\mathrm{\nabla \nabla \Psi}\right)}^{-1}\mathrm{\nabla \Psi }=f+{\left({K}^{-1}+W\right)}^{-1}\left(-{K}^{-1}f+y-\pi \right)\\ {}\kern11.5em ={\left({K}^{-1}+W\right)}^{-1}\left( Wf+y-\pi \right).\end{array}} $$

Because K is a large Cn × Cn block-diagonal matrix with a large bandwidth, the following decomposition is used to improve the accuracy and speed of the inversion:

$$ {\displaystyle \begin{array}{l}{\left({K}^{-1}+W\right)}^{-1}=K-K{\left(K+{W}^{-1}\right)}^{-1}K\\ {}=K-K{\left(K+{D}^{-1}-{RO}^{-1}{R}^{\mathrm{T}}\right)}^{-1}K\\ {}=K-K\left(E- ER{\left(O+{R}^{\mathrm{T}} ER\right)}^{-1}{R}^{\mathrm{T}}E\right)K\\ {}=K-K\left(E- ER{\left({\sum}_c{E}_c\right)}^{-1}{R}^{\mathrm{T}}E\right)K\end{array}}, $$

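where \( E={\left(K+{D}^{-1}\right)}^{-1}={D}^{1/2}{\left(I+{D}^{1/2}K{D}^{1/2}\right)}^{-1}{D}^{1/2} \) is block diagonal with blocks \( {E}_c \). Here, following the standard multi-class Laplace derivation, \( D=\operatorname{diag}\left(\pi \right) \), R is the Cn × n matrix of C stacked identity matrices \( {I}_n \), and O is the n × n zero matrix.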

The convergence monitoring of the above iterative process is represented by the likelihood value of the training data:

$$ p\left(y|X\right)=\int p\left(y|f\right)p\left(f|X\right) df=\int \exp \left(\Psi (f)\right) df. $$

Under Laplacian approximation, the form of Eq. 30 under local approximation of Ψ() is

$$ \Psi (f)\approx \Psi \left(\widehat{f}\right)-\Psi \left(\varDelta f|\widehat{f}\right)\simeq \Psi \left(\widehat{f}\right)-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right). $$

The likelihood of the training data for the posterior model can be approximated as

$$ p\left(y|X\right)\simeq q\left(y|X\right)=\exp \left(\Psi \left(\widehat{f}\right)\right)\int \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right) df. $$

The Gaussian integral in this expression evaluates to

$$ \int \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right) df\propto \frac{1}{\sqrt{\left|{K}^{-1}+W\right|}}. $$

The logarithmic form of the likelihood can be expressed as

$$ {\displaystyle \begin{array}{l}\log p\left(y|X\right)\simeq \log q\left(y|X\right)=-\frac{1}{2}{\widehat{f}}^{\mathrm{T}}{K}^{-1}\widehat{f}+\log p\left(y|\widehat{f}\right)-\frac{1}{2}\log \left|K\right|-\frac{1}{2}\log \left|{K}^{-1}+W\right|\\ {}=-\frac{1}{2}{\widehat{f}}^{\mathrm{T}}{K}^{-1}\widehat{f}+{y}^{\mathrm{T}}\widehat{f}-{\sum}_{i=1}^n\log \left({\sum}_{c=1}^C\exp {\widehat{f}}_i^c\right)-\frac{1}{2}\log \left|{I}_{Cn}+{W}^{1/2}K{W}^{1/2}\right|\end{array}} $$
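The step between the two lines of this expression can be made explicit. The following is a sketch using the standard Gaussian integral and Sylvester's determinant identity, where \( {W}^{1/2} \) denotes a symmetric square root of the positive semidefinite matrix W (an assumption of this note, not a definition from the text):

$$ \int \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right) df={\left(2\pi \right)}^{Cn/2}{\left|A\right|}^{-1/2},\kern0.5em A={K}^{-1}+W, $$

$$ -\frac{1}{2}\log \left|K\right|-\frac{1}{2}\log \left|{K}^{-1}+W\right|=-\frac{1}{2}\log \left|{I}_{Cn}+KW\right|=-\frac{1}{2}\log \left|{I}_{Cn}+{W}^{1/2}K{W}^{1/2}\right|. $$

The factor \( {\left(2\pi \right)}^{Cn/2} \) cancels against the term \( -\frac{Cn}{2}\log 2\pi \) contained in \( \Psi \left(\widehat{f}\right) \), which is why no constant appears in the final expression.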

From this, an implicit function posterior update based on the overall training dataset is obtained. The algorithm is as follows:

  • Input the observed labels y and covariance matrix K, and initialize the latent function \( f:=0 \).

  • Calculate the label distribution law of the current observation variable:

$$ \Pi :\kern0.5em p\left({y}_i^c|{f}_i\right)={\pi}_i^c=\exp \left({f}_i^c\right)/{\sum}_{c\hbox{'}}\exp \left({f}_i^{c\hbox{'}}\right) $$
$$ \mathrm{\nabla \Psi }=-{K}^{-1}f+y-\pi, \kern0.5em \mathrm{\nabla \nabla \Psi }=-{K}^{-1}-W, $$

where \( W\triangleq \operatorname{diag}\left(\pi \right)-\Pi {\Pi}^{\mathrm{T}} \).

  • For each class of implicit labels c = 1,2, …, C, calculate:

$$ L:= \mathrm{Cholesky}\left({I}_n+{D}_c^{1/2}{K}_c{D}_c^{1/2}\right) $$
$$ {E}_c={D}_c^{1/2}{L}^T\backslash \left(L\backslash {D}_c^{1/2}\right),\kern0.5em {z}_c:= {\sum}_i\log {L}_{ii}. $$
  • Calculate transition parameters:

$$ M:= \mathrm{Cholesky}\left({\sum}_i{E}_i\right) $$
$$ b:= \left(D-{\Pi \Pi}^T\right)f+y-\pi, \kern0.5em c:= EKb $$
$$ a:= b-c+{ERM}^T\backslash \left(M\backslash \left({R}^Tc\right)\right)\kern0.5em f:= Ka. $$
  • Calculate the objective function and determine whether it has converged. If it has not converged, return to the label-distribution step.

$$ \mathrm{objective}\kern0.3em \mathrm{function}=-\frac{1}{2}{a}^Tf+{y}^Tf-{\sum}_i\log \left({\sum}_c\exp \left({f}_i^c\right)\right) $$
  • Compute the approximate log marginal likelihood and return the posterior mode of the hidden label function:

$$ \log q\left(y|X,\theta \right):= -\frac{1}{2}{a}^Tf+{y}^Tf-{\sum}_i\log \left({\sum}_c\exp \left({f}_i^c\right)\right)-{\sum}_c{z}_c $$
$$ \widehat{f}:= f(label). $$
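The iteration above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: it replaces the per-class Cholesky decomposition with one dense linear solve (adequate only for small problems), adds a simple step-halving safeguard that the text does not mention, and uses a toy one-dimensional dataset with the same kernel for every class. All function names are hypothetical.

```python
import numpy as np

def softmax_cm(f, n, C):
    F = f.reshape(C, n)
    F = F - F.max(axis=0)
    P = np.exp(F)
    return (P / P.sum(axis=0)).reshape(-1)

def log_lik(f, y, n, C):
    # y^T f - sum_i log sum_c exp(f_i^c), with a stable log-sum-exp
    F = f.reshape(C, n)
    m = F.max(axis=0)
    return y @ f - np.sum(m + np.log(np.exp(F - m).sum(axis=0)))

def laplace_mode(K, y, n, C, max_iter=100, tol=1e-12):
    f = np.zeros(C * n)
    a = np.zeros(C * n)                      # a = K^{-1} f throughout
    obj = log_lik(f, y, n, C)
    for _ in range(max_iter):
        pi = softmax_cm(f, n, C)
        Pi = np.vstack([np.diag(pi[c * n:(c + 1) * n]) for c in range(C)])
        W = np.diag(pi) - Pi @ Pi.T
        b = W @ f + y - pi
        # Full Newton step: K a = (K^{-1} + W)^{-1} b  <=>  (I + W K) a = b
        a_newton = np.linalg.solve(np.eye(C * n) + W @ K, b)
        step = 1.0
        while True:                          # halve the step until Psi does not decrease
            a_new = a + step * (a_newton - a)
            f_new = K @ a_new
            obj_new = -0.5 * a_new @ f_new + log_lik(f_new, y, n, C)
            if obj_new >= obj or step < 1e-6:
                break
            step *= 0.5
        done = abs(obj_new - obj) < tol
        a, f, obj = a_new, f_new, obj_new
        if done:
            break
    return f, a, obj

# Toy data (illustrative): six 1-D locations, three classes.
x = np.linspace(0.0, 1.0, 6)
labels = np.array([0, 0, 1, 1, 2, 2])
n, C = len(x), 3
k = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1) + 1e-8 * np.eye(n)
K = np.kron(np.eye(C), k)                    # block-diagonal covariance
y = np.zeros(C * n)
y[labels * n + np.arange(n)] = 1.0           # one-hot labels, class-major order
f_hat, a_hat, obj = laplace_mode(K, y, n, C)
pi_hat = softmax_cm(f_hat, n, C)
```

At convergence the result satisfies the fixed-point relation \( \widehat{f}=K\left(y-\widehat{\pi}\right) \) stated earlier, which gives a convenient correctness check.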

3.3 Structure of positive definite kernel function of random information in the middle level

The design of the difference matrix under different systems contains different understanding of training data sets. On the one hand, there is an association between the extraction process of image data and its physical meaning. On the other hand, the actual image data has the characteristics of being distributed near higher layer data during the extraction process. Therefore, the covariance matrix used in this section of the multi-class learning algorithm references the results of high-level data learning. The high-level data model information is substituted into the covariance matrix of the middle-level data learning to further improve the learning effect of the GP model. For the three-category learning process used in this section,

$$ K\left(x,{x}^{\hbox{'}}\right)=\mathit{\operatorname{diag}}\left\{{k}_1,{k}_2,{k}_3\right\}, $$

and the design parameter array is p = [1, 0.5, 0.25; 0.5, 1, 0.5; 0.25, 0.5, 1].

In the above equation, the sub-diagonal array k1 corresponds to the position of the image observation data at the density estimation level of 75 to 85% in the mid-level dataset of the image. After testing, it was found that although the aggregation level of this category dataset is weaker than the aforementioned high-level data model, it still has a certain clustering trend. Therefore, the associated credits between this category of data can be designed as an exponential clustering pattern, and the closest clustering component of the observation data can be found. The final correlation result of the clustering trend between x and x' is determined using an isotropic exponential function. The design of the parameter array pij further confirms the labeling of the best high-level clustering component to which the two observations belong, i.e., when the data in the two images belong to the same clustering component in a high-level model, they have a higher degree of trust.

In the design of the sub-diagonal array k2, the image data corresponding to the 60–75% density estimation level in the mid-level dataset of the image no longer have a clustering trend, but surround the cluster centers. The clustering center both attracts and repels this category of data. Therefore, we add an empirical offset Δ to the likelihood value of the high-level model at the observation position of this category of image data. The offset is taken as the empirical value 0.7. For the underlying data, i.e., the 50–65% density estimation range of the mid-level data in the image, the position distribution is essentially unrelated to the clustering, and only the distance form is used:

$$ {k}_1\left(x,{x}^{\hbox{'}}\right)=\exp \left\{-\min \left\{\left(x-{\mu}_i\right){k}_i{\left(x-{\mu}_i\right)}^{\hbox{'}}/\Delta {l}_i\right\}-\min \left\{\left({x}^{\hbox{'}}-{\mu}_i\right){k}_i{\left({x}^{\hbox{'}}-{\mu}_i\right)}^{\hbox{'}}/\Delta {l}_i\right\}-\frac{\left\Vert x-{x}^{\hbox{'}}\right\Vert }{2{p}_{ij}}\right\} $$
$$ {k}_2\left(x,{x}^{\hbox{'}}\right)=\exp \left\{-\left(\Delta -{\sum}_i{\omega}_i\mathcal{N}\left(x,{\mu}_i,{\sigma}_i\right)\right)-\left(\Delta -{\sum}_i{\omega}_i\mathcal{N}\left({x}^{\hbox{'}},{\mu}_i,{\sigma}_i\right)\right)-\left\Vert x-{x}^{\hbox{'}}\right\Vert /l\right\} $$
$$ {k}_3\left(x,{x}^{\hbox{'}}\right)=\exp \left\{-\left\Vert x-{x}^{\hbox{'}}\right\Vert /l\right\}. $$
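The two simpler sub-kernels can be sketched directly from their formulas. The code below assumes NumPy, two-dimensional positions, and isotropic Gaussian mixture components (so that \( {\sigma}_i \) is read as a scalar standard deviation); the mixture parameters and the length scale l are made-up illustrative values, and only Δ = 0.7 comes from the text. The kernel k1 is omitted because it depends on the per-cluster matrices and on \( {p}_{ij} \) from the learned high-level model.

```python
import numpy as np

def pairwise_dist(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def k3(X, l):
    # Pure distance form: exp(-||x - x'|| / l)
    return np.exp(-pairwise_dist(X) / l)

def mixture_density(X, weights, mus, sigmas):
    # sum_i w_i N(x; mu_i, sigma_i^2 I), isotropic components (assumption)
    d = X.shape[1]
    dens = np.zeros(len(X))
    for w, mu, s in zip(weights, mus, sigmas):
        sq = ((X - mu) ** 2).sum(axis=1)
        dens += w * np.exp(-sq / (2 * s ** 2)) / (2 * np.pi * s ** 2) ** (d / 2)
    return dens

def k2(X, weights, mus, sigmas, delta, l):
    # exp{-(delta - p(x)) - (delta - p(x')) - ||x - x'|| / l}
    g = delta - mixture_density(X, weights, mus, sigmas)
    return np.exp(-(g[:, None] + g[None, :]) - pairwise_dist(X) / l)

rng = np.random.default_rng(2)
X = rng.uniform(size=(5, 2))                       # illustrative positions
weights = [0.6, 0.4]                               # hypothetical mixture
mus = np.array([[0.3, 0.3], [0.7, 0.7]])
sigmas = [0.2, 0.2]
K2 = k2(X, weights, mus, sigmas, delta=0.7, l=0.5)
K3 = k3(X, l=0.5)
```

Both matrices are symmetric and positive semidefinite: k3 is the exponential kernel, and k2 multiplies it by a rank-one factor of the form g(x)g(x').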

Figure 3 shows an example of the covariance matrix in the learning process of a multi-class model of middle-finger data in the distal phalanx and middle-finger images. The covariance scale is 3n × 3n, where n is the data volume of the layer density observation set in the training image. It can be seen that the degree of trust differs between the diagonal block arrays for class positions. Only the sub-array k3, in the form of a distance, has a stronger associative relationship with respect to the positions of gray particles in the same class. The degree of data association carried by the clustering-form sub-array k1 is the smallest, which indicates that the overall diffusion trend of the middle-level data is strong and the clustering trend is relatively weak.

Fig. 3

Covariance matrix of GP learning on middle-layer data from knuckle image. a Covariance matrix on GCP model on far knuckles 1 and 2. b Covariance matrix on GCP Model on middle knuckles 1 and 2

3.4 Model prediction process

The implicit vector function \( {f}_{\ast } \) predicted for the markers of the test data \( {x}_{\ast } \) obeys the approximate distribution:

$$ {f}_{\ast}\sim q\left({f}_{\ast }|X,y,{x}_{\ast}\right). $$

Under the Bayesian conditional distribution, the prediction at the test position \( {x}_{\ast } \) given the training dataset X is expressed in integral form as

$$ q\left({f}_{\ast }|X,y,{x}_{\ast}\right)=\int p\left({f}_{\ast }|X,{x}_{\ast },f\right)q\left(f|X,y\right) df. $$

Since \( p\left({f}_{\ast }|X,{x}_{\ast },f\right) \) and \( q\left(f|X,y\right) \) are Gaussian distributions, the c-label prediction of test data \( {x}_{\ast } \) is

$$ {\mathbb{E}}_q\left[{f}^c\left({x}_{\ast}\right)|X,y,{x}_{\ast}\right]={k}_c{\left({x}_{\ast}\right)}^{\mathrm{T}}{K}_c^{-1}{\widehat{f}}^c={k}_c{\left({x}_{\ast}\right)}^{\mathrm{T}}\left({y}^c-{\widehat{\pi}}^c\right), $$

where \( {k}_c\left({x}_{\ast}\right) \) is the c-type marker covariance vector between test data \( {x}_{\ast } \) and all training set data X. The prediction covariance matrix is

$$ {\displaystyle \begin{array}{l}{\operatorname{cov}}_q\left({f}_{\ast }|X,y,{x}_{\ast}\right)=\Sigma +{Q}_{\ast}^{\mathrm{T}}{K}^{-1}{\left({K}^{-1}+W\right)}^{-1}{K}^{-1}{Q}_{\ast}\\ {}=\mathit{\operatorname{diag}}\left(k\left({x}_{\ast },{x}_{\ast}\right)\right)-{Q}_{\ast}^{\mathrm{T}}{\left(K+{W}^{-1}\right)}^{-1}{Q}_{\ast}\end{array}} $$

where Σ is a C × C matrix and the sub-diagonal matrix has the form \( {\Sigma}_{cc}={k}_c\left({x}_{\ast },{x}_{\ast}\right)-{k}_c^{\mathrm{T}}\left({x}_{\ast}\right){K}_c^{-1}{k}_c\left({x}_{\ast}\right) \). In this section, the Monte Carlo method is used to sample the above prediction mean and prediction covariance matrix and obtain the sample mean value as an a posteriori prediction. The forecasting process based on the training set at the random field is:

  • Input the posterior mode \( \widehat{f} \), covariance matrix K, and test input \( {x}_{\ast } \).

  • Calculate the current observation variable label distribution law Π:

$$ p\left({y}_i^c\left|{\widehat{f}}_i\right.\right)={\pi}_i^c=\exp \left({\widehat{f}}_i^c\right)/{\sum}_{c\kern0.5em \hbox{'}}\exp \left({\widehat{f}}_i^{c\kern0.5em \hbox{'}}\right) $$
$$ \mathrm{\nabla \Psi }=-{K}^{-1}\widehat{f}+y-\pi, \kern0.5em \mathrm{\nabla \nabla \Psi }=-{K}^{-1}-W, $$

where \( W\triangleq \operatorname{diag}\left(\pi \right)-\Pi {\Pi}^{\mathrm{T}} \).

  • For each class of implicit labels c = 1,2, …, C, calculate

$$ L:= \mathrm{Cholesky}\left({I}_n+{D}_c^{1/2}{K}_c{D}_c^{1/2}\right) $$
$$ {E}_c={D}_c^{1/2}{L}^T\backslash \left(L\backslash {D}_c^{1/2}\right) $$
$$ M:= \mathrm{Cholesky}\left({\sum}_i{E}_i\right) $$
$$ {\mu}_{\ast}^c:= {\left({y}^c-{\pi}^c\right)}^T{k}_{\ast}^c $$
$$ b:= {E}_c{k}_{\ast}^c,\kern0.5em c:= {E}_c\left(R\left({M}^T\backslash \left(M\backslash \left({R}^Tb\right)\right)\right)\right). $$
  • For each type of implicit label c' = 1,2, …, C, calculate

$$ {\Sigma}_{cc\hbox{'}}:= {c}^T{k}_{\ast}^{c\hbox{'}};\kern0.5em {\Sigma}_{cc}:= {\Sigma}_{cc}+{k}_c\left({x}_{\ast },{x}_{\ast}\right)-{b}^T{k}_{\ast}^c. $$
  • Initialize Monte Carlo posterior sampling: \( {\pi}_{\ast}:= 0 \).

  • Sample the posterior distribution of the test position markers, repeating for each of the S Monte Carlo samples:

$$ {f}_{\ast}\sim N\left({\mu}_{\ast },\Sigma \right),\kern0.5em {\pi}_{\ast}:= {\pi}_{\ast }+\exp \left({f}_{\ast}^c\right)/{\sum}_{c\hbox{'}}\exp \left({f}_{\ast}^{c\hbox{'}}\right). $$
  • Calculate the normalized estimate vector, where S is the number of samples:

$$ {\tilde{\pi}}_{\ast}:= {\pi}_{\ast }/S. $$
  • Calculate the tag category prediction vector:

$$ {\mathbb{E}}_{q(f)}\left[\pi \left(f\left({x}_{\ast}\right)\right)\left|{x}_{\ast },X,y\right.\right]:= {\tilde{\pi}}_{\ast } $$
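The sampling steps at the end of this procedure can be sketched as follows, assuming NumPy and that the predictive mean \( {\mu}_{\ast } \) and C × C covariance Σ have already been computed. The function name, sample count, and example values are illustrative assumptions.

```python
import numpy as np

def mc_class_probs(mu, Sigma, S=2000, seed=0):
    """Average the softmax of S draws f* ~ N(mu, Sigma)."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mu, Sigma, size=S)   # shape (S, C)
    samples = samples - samples.max(axis=1, keepdims=True) # stabilise exp
    P = np.exp(samples)
    P = P / P.sum(axis=1, keepdims=True)                   # softmax per draw
    return P.sum(axis=0) / S                               # pi_tilde = pi / S

# Illustrative predictive moments for one test position and C = 3 classes.
mu_star = np.array([2.0, 0.0, 0.0])
Sigma_star = 0.1 * np.eye(3)
pi_tilde = mc_class_probs(mu_star, Sigma_star)
```

The returned vector sums to one, and its largest entry indicates the predicted hidden label.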

4 Knuckle image recognition based on learning results of two layers of observation data

In the previous section, based on the high- and mid-level data in gray image density estimation, offset measurement estimation under different offset level parameters was implemented. At the same time, the learning results of the two-layer data model were used as two kinds of offset information features on the grayscale image. Since the above two types of migration features are the specific forms of the overall random set migration characteristics of the image in the interval, it is obviously necessary to integrate the above two features as the characteristics of the overall image features. According to the process of data-extraction and model-generation, it can be seen that there is a strong correlation between the two features, and there is even consistency in the overlapping range of the horizontal parameters. From the modeling process on the Poisson Gaussian field of random images, the two kinds of offset information also have strong compatibility.

From the perspective of information fusion and feature learning, two types of feature models that have been learned can be used as detectors of two kinds of offset features on the image, and the detection results are two likelihood values of a specific image under the above model. The likelihood value corresponding to the positive sample image is higher, and the negative sample image is the opposite. Therefore, the fusion of offset features is the learning process of jointly distributing the two likelihood values on the offset eigenvalue plane. Furthermore, since the size of the training library in the aforementioned model learning process is not large, the amount of information provided by the training results is not sufficient, and the learning result is not perfect. Also, the offset feature itself has a strong random feature. Based on the above analysis, the likelihood value fusion process in the feature plane is not suitable for learning with a generative model. Therefore, in this section, the likelihood values of the two models labeled with positive and negative samples are used as input. The Gaussian process classification in the discriminative learning method is used to fuse the likelihood values of the two types of images in the feature plane. The estimation of the joint overall distribution of the two types of features is obtained, and the joint image is directly identified based on the estimation results.

4.1 Binary classification and image offset information fusion based on Gaussian process

Depending on whether it belongs to the hand joint image, the test image is given the mark y ∈ {−1, 1} at the corresponding observed data point in the two-layer model likelihood space. In this way, the aforementioned fusion process can be presented as binary Gaussian process learning. The learning result is the probability distribution of the marker y = 1 on the discriminant field for the joint and non-joint targets. Unlike the middle-level information modeling, in the learning process of the classification information for the marker y in this section, the sample domain is a training dataset generated from the two types of model likelihood values corresponding to the test image set. The learning domain is a normalized feature plane.

In the binary Gaussian classification process, an offset model likelihood set \( X={\left\{{x}_i\right\}}_{i=1,\dots, n} \) with labels \( y={\left\{{y}_i\right\}}_{i=1,\dots, n} \) is used as the training dataset. Taking X as the model input and the marks y as the final observations of the fusion model, discriminative learning on the field amounts to constructing and learning the association between the input and the observations. This association has two main aspects: the classification of tags under specific inputs, and the distribution relationship between the tags of different inputs. The former is naturally expressed in conditional probability form, while the latter is the joint distribution of the marker variables in the discriminant random field. The Gaussian random field provides an effective way to represent this association comprehensively. By building the Gaussian implicit function f on the feature plane, the conditional distribution of the marker classification is decomposed into two independent parts: y|f and f|X. At the same time, the joint distribution of marker variables is transformed into a description of the structure of the field function f, instead of directly modeling the associated structure on the conditional field y|X. Because the label is binary, the label value of the discriminant field on the lattice point domain can be expressed as a Bernoulli distribution by using the implicit function f:

$$ {y}_i\mid {f}_i\sim Bern\left(\sigma \left({f}_i|{y}_i=1\right)\right). $$

The logistic transformation σ(·) maps the Gaussian variable \( {f}_i \) to the range (0, 1) as the control parameter of the activation label \( {y}_i=1 \):

$$ p\left({y}_i|{f}_i\right)=\sigma \left({y}_i{f}_i\right)=\frac{1}{1+\exp \left(-{y}_i{f}_i\right)}. $$

Since the value of label y is 0 in the range, the specific form of the conditional field f|X can be given by using the Gaussian process prior \( f\mid X\sim \mathcal{N}\left(0,K\right) \), where K is the binary covariance function on field f. It can be seen that the main content of classification learning is the posterior update of f and the prediction of \( p\left({f}_{\ast }|f,X,y,{x}_{\ast}\right) \) at the test position \( {x}_{\ast } \).

Since the Gaussian field \( {f}_i=f\left(\cdot |{x}_i\right) \) is a Gaussian function, the posterior form also maintains a Gaussian form:

$$ f\sim N\left(\widehat{f}|f,{A}^{-1}\right)\propto \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right), $$

where \( \widehat{f}=\arg {\max}_fp\left(f|X,y\right) \) and \( A=-\mathrm{\nabla \nabla}\log p\left(f=\widehat{f}|X,y\right) \).

According to Bayes' rule, the posterior of the implicit function f in the above formula is

$$ p\left(f|X,y\right)=p\left(y|f\right)p\left(f|X\right)/p\left(y|X\right). $$

In Eq. 47, because the classification mark y of test dataset X is not directly related to f, p(y| X) does not include f. The posterior maximization solution \( \widehat{f} \) therefore need only consider the numerator, and the corresponding logarithmic form is

$$ {\displaystyle \begin{array}{l}\Psi (f)\triangleq \log \left(p\left(y|f\right)p\left(f|X\right)\right)\\ {}=\log p\left(y|f\right)-\frac{1}{2}{f}^{\mathrm{T}}{K}^{-1}f-\frac{1}{2}\log \left|K\right|-\frac{n}{2}\log 2\pi \end{array}}. $$

\( \widehat{f} \) is the solution of \( \nabla \Psi (f)=0 \):

$$ \mathrm{\nabla \Psi }(f)=\nabla \log p\left(y|f\right)-{K}^{-1}f $$
$$ \widehat{f}=K\left(\nabla \log p\left(y|\widehat{f}\right)\right). $$

The standard Newton-Raphson iterative format can be used to solve the nonlinear equation \( \nabla \Psi (f)=0 \):

$$ {f}^{new}=f-{\left({\nabla}^2\Psi (f)\right)}^{-1}\mathrm{\nabla \Psi }(f) $$
$$ {\displaystyle \begin{array}{l}f-{\left({\nabla}^2\Psi \right)}^{-1}\mathrm{\nabla \Psi }=f+{\left({K}^{-1}+W\right)}^{-1}\left(\nabla \log p\left(y|f\right)-{K}^{-1}f\right)\\ {}={\left({K}^{-1}+W\right)}^{-1}\left( Wf+\nabla \log p\left(y|f\right)\right)\end{array}}, $$

where \( \mathrm{\nabla \nabla \Psi }(f)=\mathrm{\nabla \nabla}\log p\left(y|f\right)-{K}^{-1}=-W-{K}^{-1} \), so that the matrix A in Eq. 46 is

$$ A={K}^{-1}+W. $$

Equations 51 and 53 yield the posterior format \( q\left(f|X,y\right)=N\left(\widehat{f},{\left({K}^{-1}+W\right)}^{-1}\right) \) of \( \widehat{f} \).

Considering the numerical stability in learning, the numerical characteristics of the important matrices in the iterative process must be analyzed. The way the inverse matrix is computed is changed to keep the eigenvalues of the matrix away from 0, so as to ensure the accuracy of the solution. According to the aforementioned model construction, the relationship between \( p\left({y}_i|{f}_i\right) \) and \( p\left({y}_j|{f}_j\right) \) has been transferred to the structure of field f. Therefore, \( \partial \log p\left({y}_i|{f}_i\right)/\partial {f}_j \) is 0 for j ≠ i, and W has the diagonal form \( W=\operatorname{diag}\left({\pi}_1\left(1-{\pi}_1\right),\dots, {\pi}_n\left(1-{\pi}_n\right)\right) \), where \( {\pi}_i=p\left({y}_i=1|{f}_i\right) \). In combination with Eq. 45, the numerical form of the derivative of the objective function Ψ(·) is

$$ {\displaystyle \begin{array}{l}\frac{\partial }{\partial {f}_i}\log p\left({y}_i|{f}_i\right)={t}_i-{\pi}_i\\ {}{t}_i=\left({y}_i+1\right)/2\end{array}} $$
$$ {\left.\mathrm{\nabla \nabla}\log p\left(y|f\right)\right|}_{ii}=\frac{\partial^2}{\partial {f}_i^2}\log p\left({y}_i|{f}_i\right)=-{\pi}_i\left(1-{\pi}_i\right). $$

In addition, the matrices K and W in the Newton iteration of Eq. 52 are both larger n × n sparse squares. (K−1 + W)−1 can be decomposed by using the positive definite matrix B:

$$ B=I+{W}^{\frac{1}{2}}{KW}^{\frac{1}{2}} $$
$$ {\displaystyle \begin{array}{l}{\left({K}^{-1}+W\right)}^{-1}=K-K{\left(K+{W}^{-1}\right)}^{-1}K\\ {}=K-K{W}^{\frac{1}{2}}{W}^{-\frac{1}{2}}{\left(K+{W}^{-1}\right)}^{-1}{W}^{-\frac{1}{2}}{W}^{\frac{1}{2}}K\\ {}=K-K{W}^{\frac{1}{2}}{\left({W}^{\frac{1}{2}}\left(K+{W}^{-1}\right){W}^{\frac{1}{2}}\right)}^{-1}{W}^{\frac{1}{2}}K\\ {}=K-K{W}^{\frac{1}{2}}{\left(I+{W}^{\frac{1}{2}}K{W}^{\frac{1}{2}}\right)}^{-1}{W}^{\frac{1}{2}}K=K-K{W}^{\frac{1}{2}}{B}^{-1}{W}^{\frac{1}{2}}K\end{array}}. $$

Equation 56 obviously produces a diagonal band matrix. The inverse matrix can be quickly calculated by means of Cholesky decomposition. The inversion format in Eq. 57 is more stable than the directly solved inverse matrix of A.
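The decomposition can be verified numerically on a small example. The sketch below assumes NumPy and uses a random positive definite K and random probabilities \( {\pi}_i \) purely for illustration; it compares the Cholesky-based form with a direct inversion of \( {K}^{-1}+W \).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A0 = rng.normal(size=(n, n))
K = A0 @ A0.T + n * np.eye(n)             # random positive definite covariance
pi = rng.uniform(0.1, 0.9, size=n)
w = pi * (1 - pi)                         # diagonal of W
sw = np.sqrt(w)                           # diagonal of W^{1/2}

B = np.eye(n) + sw[:, None] * K * sw[None, :]   # I + W^{1/2} K W^{1/2}
L = np.linalg.cholesky(B)                       # B = L L^T
V = np.linalg.solve(L, sw[:, None] * K)         # L \ (W^{1/2} K)
stable = K - V.T @ V                            # K - K W^{1/2} B^{-1} W^{1/2} K

direct = np.linalg.inv(np.linalg.inv(K) + np.diag(w))
```

Because the eigenvalues of B are bounded below by 1, its Cholesky factor is well conditioned even when some \( {\pi}_i\left(1-{\pi}_i\right) \) are close to zero, which is the practical reason for preferring this form.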

The convergence monitoring of the above a posteriori iteration is given by the model likelihood of the label value:

$$ p\left(y|X\right)=\int p\left(y|f\right)p\left(f|X\right) df=\int \exp \left(\Psi (f)\right) df. $$

Under the Laplacian approximation, the form of Eq. 57 under the local approximation of Ψ() is

$$ \Psi (f)\approx \Psi \left(\widehat{f}\right)-\Psi \left(\varDelta f|\widehat{f}\right)\simeq \Psi \left(\widehat{f}\right)-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right) $$
$$ p\left(y|X\right)\simeq q\left(y|X\right)=\exp \left(\Psi \left(\widehat{f}\right)\right)\int \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right) df. $$

The integral term in Eq. 60 can be simplified to

$$ \int \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right) df\propto \frac{1}{\sqrt{\left|{K}^{-1}+W\right|}}. $$

The logarithmic form of the a posteriori prediction is expressed as:

$$ {\displaystyle \begin{array}{l}\log q\left(y|X\right)=-\frac{1}{2}{\widehat{f}}^{\mathrm{T}}{K}^{-1}\widehat{f}+\log p\left(y|\widehat{f}\right)-\frac{1}{2}\log \left|K\right|-\frac{1}{2}\log \left|{K}^{-1}+W\right|\\ {}=-\frac{1}{2}{\widehat{f}}^{\mathrm{T}}{K}^{-1}\widehat{f}+\log p\left(y|\widehat{f}\right)-\frac{1}{2}\log \left|B\right|\end{array}}. $$
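Putting the Newton update, the B-based inversion, and the log marginal likelihood together gives the following sketch. It assumes NumPy, labels in {−1, 1}, a dense covariance matrix, and plain Newton steps without a line search; the function name and toy data are illustrative and not the authors' code.

```python
import numpy as np

def laplace_binary(K, y, max_iter=50, tol=1e-10):
    n = len(y)
    t = (y + 1) / 2                        # targets in {0, 1}
    f = np.zeros(n)
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-f))
        grad = t - pi                      # gradient of log p(y|f)
        w = pi * (1 - pi)                  # diagonal of W
        sw = np.sqrt(w)
        B = np.eye(n) + sw[:, None] * K * sw[None, :]
        L = np.linalg.cholesky(B)
        b = w * f + grad
        # a = b - W^{1/2} B^{-1} W^{1/2} K b (generic solves; a triangular
        # solver would exploit the structure of L)
        a = b - sw * np.linalg.solve(L.T, np.linalg.solve(L, sw * (K @ b)))
        f_new = K @ a
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    log_lik = -np.sum(np.logaddexp(0.0, -y * f))
    # -0.5 log|B| equals -sum(log(diag(L)))
    log_q = -0.5 * a @ f + log_lik - np.sum(np.log(np.diag(L)))
    return f, log_q

# Toy data (illustrative): negatives on the left, positives on the right.
x = np.linspace(-2.0, 2.0, 8)
y = np.array([-1.0] * 4 + [1.0] * 4)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0) + 1e-8 * np.eye(len(x))
f_hat, log_q = laplace_binary(K, y)
pi_hat = 1.0 / (1.0 + np.exp(-f_hat))
```

The returned mode satisfies \( \widehat{f}=K\nabla \log p\left(y|\widehat{f}\right) \), and `log_q` is the quantity used above for convergence monitoring.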

For test data \( {x}_{\ast } \), the posterior mean of \( {f}_{\ast } \) under the Laplace approximation is expressed as

$$ {\mathbb{E}}_q\left[{f}_{\ast }|X,y,{x}_{\ast}\right]=k{\left({x}_{\ast}\right)}^{\mathrm{T}}{K}^{-1}\widehat{f}=k{\left({x}_{\ast}\right)}^{\mathrm{T}}\nabla \log p\left(y|\widehat{f}\right), $$

and the forecasting variance of Gaussian approximation is

$$ {\displaystyle \begin{array}{l}{\mathbb{V}}_q\left[{f}_{\ast }|X,y,{x}_{\ast}\right]={\mathbb{E}}_{p\left({f}_{\ast }|X,{x}_{\ast },f\right)}\left[{\left({f}_{\ast }-\mathbb{E}\left[{f}_{\ast }|X,{x}_{\ast },f\right]\right)}^2\right]\\ {}+{\mathbb{E}}_{q\left(f|X,y\right)}\left[{\left(\mathbb{E}\left[{f}_{\ast }|X,{x}_{\ast },f\right]-\mathbb{E}\left[{f}_{\ast }|X,y,{x}_{\ast}\right]\right)}^2\right]\end{array}}. $$

Under the Gaussian process assumption, the above equation has the following analytical form:

$$ {\displaystyle \begin{array}{l}{\mathbb{V}}_q\left[{f}_{\ast }|X,y,{x}_{\ast}\right]=k\left({x}_{\ast },{x}_{\ast}\right)-{k}_{\ast}^{\mathrm{T}}{K}^{-1}{k}_{\ast }+{k}_{\ast}^{\mathrm{T}}{K}^{-1}{\left({K}^{-1}+W\right)}^{-1}{K}^{-1}{k}_{\ast}\\ {}=k\left({x}_{\ast },{x}_{\ast}\right)-{k}_{\ast}^{\mathrm{T}}{\left(K+{W}^{-1}\right)}^{-1}{k}_{\ast}\end{array}}. $$

Using Eq. 56,

$$ {\displaystyle \begin{array}{l}{\mathbb{V}}_q\left[{f}_{\ast }|y\right]=k\left({x}_{\ast },{x}_{\ast}\right)-k{\left({x}_{\ast}\right)}^{\mathrm{T}}{W}^{\frac{1}{2}}{B}^{-1}{W}^{\frac{1}{2}}k\left({x}_{\ast}\right)\\ {}=k\left({x}_{\ast },{x}_{\ast}\right)-k{\left({x}_{\ast}\right)}^{\mathrm{T}}{W}^{\frac{1}{2}}{\left({LL}^{\mathrm{T}}\right)}^{-1}{W}^{\frac{1}{2}}k\left({x}_{\ast}\right)\\ {}=k\left({x}_{\ast },{x}_{\ast}\right)-{v}^{\mathrm{T}}v\end{array}}, $$

where \( v=L\backslash \left({W}^{\frac{1}{2}}k\left({x}_{\ast}\right)\right) \). The positive marker class probability corresponding to the Bernoulli distribution is then calculated from the predicted mean and predicted variance:

$$ \tilde{\pi_{\ast }}\simeq {\mathbb{E}}_q\left[{\pi}_{\ast }|X,y,{x}_{\ast}\right]=\int \sigma \left({f}_{\ast}\right)q\left({f}_{\ast }|X,y,{x}_{\ast}\right){df}_{\ast }. $$

In summary, the two-class Gaussian process prediction algorithm based on Laplace is:

  • Input the posterior mode \( \widehat{f} \), covariance function k, and test input \( {x}_{\ast } \).

  • \( W:= -\mathrm{\nabla \nabla}\log p\left(y|\widehat{f}\right) \).

  • \( L:= \mathrm{Cholesky}\left(I+{W}^{1/2}K{W}^{1/2}\right) \).

  • \( {\overline{f}}_{\ast}:= k{\left({x}_{\ast}\right)}^T\nabla \log p\left(y|\widehat{f}\right) \).

  • \( v:= L\backslash \left({W}^{1/2}k\left({x}_{\ast}\right)\right) \).

  • \( \mathbb{V}\left[{f}_{\ast}\right]:= k\left({x}_{\ast },{x}_{\ast}\right)-{v}^Tv \).

  • \( {\overline{\pi}}_{\ast}:= \int \sigma (z)\mathcal{N}\left(z|{\overline{f}}_{\ast },\mathbb{V}\left[{f}_{\ast}\right]\right) dz \).

The predicted marginal probability of tag category 1 is \( {\overline{\pi}}_{\ast } \).
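The prediction steps above can be sketched as follows, assuming NumPy. The final one-dimensional integral is evaluated here with Gauss-Hermite quadrature, which is one common choice and is an assumption of this sketch; the text does not state how the integral is computed. The function name and the toy inputs are hypothetical and serve only as an arithmetic check.

```python
import numpy as np

def predict_binary(K, y, f_hat, k_star, k_ss, deg=32):
    """Laplace-approximation prediction at one test input.

    k_star: covariances between the test input and the training inputs.
    k_ss:   prior variance k(x*, x*).
    """
    t = (y + 1) / 2
    pi = 1.0 / (1.0 + np.exp(-f_hat))
    sw = np.sqrt(pi * (1 - pi))                      # diagonal of W^{1/2}
    L = np.linalg.cholesky(np.eye(len(y)) + sw[:, None] * K * sw[None, :])
    mean = k_star @ (t - pi)                         # k*^T grad log p(y|f_hat)
    v = np.linalg.solve(L, sw * k_star)              # L \ (W^{1/2} k*)
    var = k_ss - v @ v
    # E[sigma(z)] for z ~ N(mean, var) via Gauss-Hermite quadrature
    nodes, weights = np.polynomial.hermite.hermgauss(deg)
    z = mean + np.sqrt(2.0 * var) * nodes
    prob = np.sum(weights / (1.0 + np.exp(-z))) / np.sqrt(np.pi)
    return mean, var, prob

# Toy inputs chosen so the result can be checked by hand.
K = np.eye(2)
y = np.array([1.0, -1.0])
f_hat = np.zeros(2)
k_star = np.array([0.5, 0.5])
mean, var, prob = predict_binary(K, y, f_hat, k_star, k_ss=1.0)
```

For these inputs the two training points pull in opposite directions, so the predictive mean is 0 and the averaged class probability is 0.5, while the variance drops from 1 to 0.9.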

4.2 Knuckle target recognition algorithm based on offset feature distribution

Based on the learned image layered offset fusion GP model, the model likelihood of the fusion image of the test image is used as the image feature. In the range of the test image domain, according to this feature combined with the maximum between-class variance method, self-adaptive threshold recognition is performed for the far finger and middle finger in the image. The concrete manifestation is that sub-image extraction is performed after the template data are calculated from the test image data. The nonparametric density kernel estimation calculations and the evolution of the level set of interest regions are performed on the obtained sub-images to obtain high- and mid-level data for density kernel estimation. High-level data are used to calculate the high-level data model likelihood through the high-level data model (DPMM). After the middle-level data are transformed by the data discrete group, the mid-level data model is used to calculate the mid-level data model likelihood. We then combine the two to calculate the two-level data fusion GP model likelihood.

The likelihood value of the fusion GP model is calculated at each template position of the test image, and this likelihood value is used as the information matrix corresponding to the image feature. Because the model is incomplete, detection points with higher likelihood values may appear at non-joint target locations. The maximum between-class variance method is used to eliminate the positions with the highest GP likelihood under an adaptive threshold. Then, nonlinear evolution is performed on the feature information matrix with the high likelihood values removed, the features are enhanced, and the joint target is detected again using the adaptive threshold detection method. Because high-threshold detections at non-joint positions are mostly unstable, they are difficult to recover from the neighborhood information after rejection by the threshold. In contrast, high threshold values at the joint location are more stable, so they can be recovered by the evolution of the neighborhood information.
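The adaptive threshold step can be illustrated with a small implementation of the maximum between-class variance (Otsu) criterion. The sketch assumes NumPy and a synthetic likelihood map; it covers only the threshold selection, not the rejection of unstable detections or the nonlinear evolution, whose details are not specified here.

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Threshold maximising the between-class variance of a 1-D sample."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    w0 = np.cumsum(p)                      # weight of the lower class
    w1 = 1.0 - w0                          # weight of the upper class
    m = np.cumsum(p * centers)             # cumulative mean
    between = np.zeros(bins)
    valid = (w0 > 1e-12) & (w1 > 1e-12)    # skip splits with an empty class
    between[valid] = (m[-1] * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    return edges[np.argmax(between) + 1]   # upper edge of the best split bin

# Synthetic likelihood map: low background with one 3 x 3 high-likelihood patch.
likelihood_map = np.full((8, 8), 0.1)
likelihood_map[2:5, 3:6] = 0.9
thr = otsu_threshold(likelihood_map.ravel())
mask = likelihood_map > thr
```

Here `mask` marks the nine patch pixels as candidate joint positions.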

5 Analysis of results and discussions

5.1 DPMM model learning process

Examples of iterative learning monitoring of high-level density data DPMM in far- and middle-knuckle gray scale images are shown in Figs. 4 and 5. Using the above collapsed Gibbs sampling method, the Dirichlet process model for high-level data distribution of knuckle images is iteratively learned. Iterative initialization uses the k-means algorithm to classify the results and records the first 300 steps of likelihood monitoring values. The a priori parameters in the normal-inverse-Wishart distribution are taken as κ = 0.1, ν = 4; the a priori hyper parameters in the mixed Gamma distribution are taken as a = 0.1, b = 0.1; and the parameters of the Dirichlet process are initialized as α = 10. To improve the sampling accuracy of matrix parameters, the Cholesky decomposition of the covariance matrix obtained by iterative updating is performed. Sample moment statistics are made on its eigenvalues and feature directions, and matrix sampling is used to recover valid matrix samples.

Fig. 4

Convergence of DPMM random cluster of far knuckles. a DPMM random clustering example far knuckles 1. b DPMM random clustering example far knuckles 2. c DPMM random clustering example far knuckles 3. d DPMM random clustering example far knuckles 4

Fig. 5

Convergence of DPMM random cluster of middle knuckles. a DPMM random clustering example of middle knuckles 1. b DPMM random clustering example of middle knuckles 2. c DPMM random clustering example of middle knuckles 3. d DPMM random clustering example of middle knuckles 4

From the results, the convergence speed of DPMM is faster and the smoothness of the likelihood curve is greater. On the one hand, because the number of clusters is flexible, the model has further improved the identification of the structure within the training dataset. The process of testing the number of linked random clusters can further clarify the sampling results. In the initial phase of the iterative process, the number of clusters suddenly increases by several times the convergence value. As shown in Figs. 4 and 5, different from the parameter optimization in the traditional finite mixture model, this stage corresponds to the sampling algorithm performing a random search in a wide range of clustering models, so that the model can quickly determine a more stable clustering mode. On the other hand, the Dirichlet distribution uses an a priori structure, so that the update process of the DPMM internal parameters can be more effectively controlled under higher-level conditional distributions, manifesting that the convergence curve has higher smoothness in the stable region.

5.2 Offset measurement data learning results

Examples of the DPMM model learning results on the training image library are shown in Fig. 6. Under the condition of K = 3 clustering initialization, flexible random clustering modeling of the high-offset density position distribution of the image is realized. A high-level distribution likelihood model of knuckle images with a more complex internal structure is obtained. The far phalanx learning results shown in the figure clearly show that the self-adaptive clustering process gives results similar to data density clustering. The model scale adapts strongly to the training set. The distribution of clustering represented by the model likelihood results is not only consistent with the observed characteristics on the whole, but the characteristic orientation of the internal clustering components also reflects the features of the knuckles under the grip of the hand. This shows that the algorithm has better modeling ability for the high-level distribution of far-knuckle images.

Fig. 6 Results of DPMM random clustering of knuckles. a Result of DPMM random clustering of far knuckles 1 and 2. b Result of DPMM random clustering of far knuckles 3 and 4. c Result of DPMM random clustering of middle knuckles 1 and 2. d Result of DPMM random clustering of middle knuckles 3 and 4

Using the aforementioned middle-level distribution learning and prediction algorithm, a multi-class model of the middle-level data is learned for each image in 51 positive samples of distal-phalanx images and 51 positive samples of middle-phalanx images, as shown in Fig. 7. From left to right, Fig. 7 shows the first-, second-, and third-class hidden labels.

Fig. 7 Example of hidden-label Gaussian process learning on middle-layer image data for knuckles. a Example of learning based on middle-layer data on a far-knuckle image. b Example of learning based on middle-layer data on a middle-knuckle image

The prediction results in Fig. 7 show that the three-class label learning results for the middle-layer data in the knuckle images clearly reflect the design goals of the model. The predicted labels also conform to the hypothesized distribution of the middle-layer data in the knuckle image.

Considering the learning accuracy and computational complexity of the fusion process, the lattice field [1 : 1 : 101] × [1 : 1 : 101] generated by discretizing the feature plane is used as the discriminant field. We take the covariance function to have an isotropic exponential form:

$$ k\left({x}_1,{x}_2\right)=\exp \left(-\frac{{\left\Vert {x}_1-{x}_2\right\Vert}^2}{\kappa}\right), $$

where the scale parameter is κ = 0.007. On the distal-phalanx and middle-phalanx image test libraries (51 positive and negative samples), the high- and middle-level data extraction, the feature likelihood calculation, and the fusion model learning are carried out. The high-level data likelihoods are obtained from the DPMM learning results for the high-level data. We then use the middle-level three-class model to calculate the likelihood of the corresponding observation data, and normalize the two likelihood values to form a labeled data set for supervised learning of the two-class Gaussian process. The multi-offset feature likelihood distributions, covariance functions, and fusion model (GP) learning results are shown in Fig. 8; in each panel, the left graph shows the far-knuckle results and the right graph shows the middle-knuckle results.
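
The covariance function above can be evaluated on the discretized feature plane as in the following minimal sketch. It assumes that the lattice indices 1 to 101 are mapped linearly to [0, 1] before the kernel is applied; this normalization is an assumption, as are the helper names `lattice_axis`, `iso_exp_kernel`, and `gram_matrix`. Only the kernel form and κ = 0.007 come from the text.

```python
import math

KAPPA = 0.007  # scale parameter quoted in the text


def lattice_axis(n=101):
    """Map lattice indices 1..n to [0, 1] (assumed normalization)."""
    return [(i - 1) / (n - 1) for i in range(1, n + 1)]


def iso_exp_kernel(x1, x2, kappa=KAPPA):
    """k(x1, x2) = exp(-||x1 - x2||^2 / kappa) for points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / kappa)


def gram_matrix(points, kappa=KAPPA):
    """Covariance (Gram) matrix of the kernel over a list of points."""
    return [[iso_exp_kernel(p, q, kappa) for q in points] for p in points]


if __name__ == "__main__":
    axis = lattice_axis()
    # The full 101 x 101 lattice has 10201 points; a coarse subsample keeps
    # the pure-Python Gram matrix small enough for a quick check.
    coarse = [(u, v) for u in axis[::25] for v in axis[::25]]
    K = gram_matrix(coarse)
    print(len(coarse), K[0][0], K[0][1])
```

Under this assumed normalization, neighboring lattice points are 0.01 apart, so their covariance is exp(-0.0001 / 0.007), which is close to 1; the kernel decays quickly for points further apart.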

Fig. 8 Synthesis learning of multilayer data from knuckle image. a Covariance of knuckle image. b Result of learning on knuckle image

According to Fig. 8b, in the normalized feature plane, the first feature direction of the positive-sample fusion distribution follows the feature line from (0, 0) to (100, 100), and the second feature direction is nearly perpendicular to it. The first feature direction of the negative-sample fusion distribution is close to perpendicular to the feature line. This angular relationship between the feature directions and the feature line indicates that the two fused offset features provide a certain degree of discrimination between positive and negative samples, and that the fusion result predicts this distinction more strongly. Comparing the left and right graphs in Fig. 8b, the high-level model of the far-knuckle images discriminates between positive and negative samples better than the middle-level model, while the fusion prediction for the middle-knuckle images shows the opposite. The main reason for this difference is that the random structures of the distal and middle phalanx images differ markedly, which appears as different distribution patterns at the different offset levels.

5.3 Recognition for various algorithms

Under a fixed threshold, we briefly analyze the recognition ability of the high-level data DPMM model, the middle-level data hidden-label GP model, and the DPMM+GP model that combines the two. We manually constructed four finger-knuckle image libraries containing 330, 1344, 1896, and 1400 images. The knuckle images were captured with industrial cameras and come from people of different genders, ages, and sizes. In each library, positive and negative samples each make up half. Two of the libraries contain far-knuckle images and two contain middle-knuckle images. To test how well the recognition algorithms adapt to indistinct targets, the knuckle images in all four libraries were deliberately selected so that their joint features are weaker than those in the training library; testing on images with richer joint features would therefore yield higher recognition rates. The three models are applied to the four image libraries to compare the best recognition ability of each algorithm on each library. For each model and library, the threshold that gives the highest recognition ability is taken as the best recognition threshold, and a receiver operating characteristic (ROC) curve is plotted. Measurements also show that the best recognition thresholds of the same algorithm differ little across image libraries. Therefore, the four sets of ROC curves can be compared and analyzed together, as shown in Fig. 9.

Fig. 9 Constant threshold image detection for knuckles based on different data models. a ROC of far-knuckle detection based on multi-model 1 and 2. b ROC of middle-knuckle detection based on multi-model 1 and 2

Since the area under the ROC curve (AUC) measures the recognition ability of a classifier, Fig. 9 clearly shows that, in the far-knuckle test libraries, the middle-level data model and the fused two-layer model do not recognize as well as the high-level data model. In the middle-knuckle test libraries, the ROC curves of the two high-level data models are low, with AUC values close to or below 0.5. Thus, the high-level data model recognizes markedly better in the far-knuckle libraries, while the middle-level data model recognizes better in the middle-knuckle libraries. This shows that the existing data models differ greatly in their ability to identify different types of knuckle. It also suggests that the far- and middle-knuckle image data differ in the category of underlying distribution model that suits them.
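
As a minimal sketch of how an ROC curve and its AUC can be computed from model scores by sweeping a fixed threshold, the following example may help. It assumes that a larger score means "more knuckle-like" and that labels are 1 for positive and 0 for negative samples; the function names `roc_points` and `auc_trapezoid` are illustrative and do not come from the paper.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) from sweeping a threshold over the scores.

    A sample is predicted positive when its score is >= the threshold.
    Assumes labels are 1 (positive) or 0 (negative).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is detected
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An AUC of 0.5 corresponds to chance-level ranking, and an AUC below 0.5 indicates that the scores tend to rank negative samples above positive ones.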

In Fig. 9b, the AUC values of the high-level data model for the two ROC curves are 0.5134 and 0.2332. The high-level model therefore has little recognition effect on middle-knuckle image library 2, and even produces wrong classifications. Further tests show that the region to which the high-level model assigns high likelihood is the middle segment of the finger, not the finger joint area. This happens because the middle segment has a smooth grayscale distribution and hence relatively small local information entropy. Its high-level data are larger in volume and denser than those of typical joint images, which violates the model's assumptions about the data distribution and leads to poor recognition. Fig. 9 also shows that the ROC curve of the two-layer data fusion model lies between those of the high- and middle-level models, which indicates that recognition based on the fusion model judges the two features jointly. When the high- and middle-level models differ greatly in recognition ability, the fusion model can still provide an effective overall evaluation; this is most prominent in Fig. 9b. For the fusion model, the minimum AUC is 0.4512 (left graph of Fig. 9a) and the maximum is 0.7880 (right graph of Fig. 9b). The results show that, with the limited data and test sets available, the fixed-threshold identification method classifies stably and correctly in environments where the light intensity is relatively stable and the imaging angle changes little.

To further improve the recognition ability of the fusion model, adaptive-threshold joint detection is performed with the learned DPMM+GP model on hand test images that contain a finger joint. The test image is sized to cover about 1000 templates, as shown in Fig. 10. In the figure, black indicates positions without a finger joint, gray marks the manually labeled finger joint region, and white marks the joint positions identified by adaptive threshold segmentation. A detected joint position that falls within the manually labeled gray region counts as a correct identification; a position away from the gray region deviates substantially from the true position. With the aid of diffusion evolution, the recognition result moves closer to the true target region, which clearly improves the accuracy of finger joint recognition. Note that the detection results and the marked regions in Fig. 10 are all at the pixel level, so the detection error of the algorithm in the actual image is also at the pixel level. For practical detection tasks, this is sufficient for a preliminary implementation of joint detection.
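
The correctness criterion described above (a detected position is correct when it lies inside the manually marked region) can be sketched as a simple pixel-level check. The mask layout, the `(row, col)` convention, and the function name `detection_hit_rate` are assumptions for illustration; the adaptive threshold segmentation itself is not reproduced here.

```python
def detection_hit_rate(marked_mask, detections):
    """Fraction of detected (row, col) positions inside the marked region.

    marked_mask is a list of rows; a truthy entry means the pixel belongs
    to the manually marked joint region (the gray area in the figure).
    Detections outside the image bounds count as misses.
    """
    if not detections:
        return 0.0
    n_rows = len(marked_mask)
    n_cols = len(marked_mask[0]) if n_rows else 0
    hits = 0
    for r, c in detections:
        inside_image = 0 <= r < n_rows and 0 <= c < n_cols
        if inside_image and marked_mask[r][c]:
            hits += 1
    return hits / len(detections)
```

For example, with a 4 × 4 mask whose central 2 × 2 block is marked, the detections `[(1, 1), (2, 2), (0, 3)]` contain two positions inside the marked region and one outside it.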

Fig. 10 Adaptive threshold image detection for knuckles. a Adaptive threshold image detection for far knuckles based on DPMM+GP. b Adaptive threshold image detection for middle knuckles based on DPMM+GP

6 Conclusions

In this paper, nonparametric kernel density estimation results are used as observation sets, and the multi-level offset distributions of knuckle images are estimated using both iterative random-clustering learning and a multi-class random field model. Through fusion learning of the multilayer offset features, an overall characterization of knuckle images is constructed, and the detection and recognition capabilities of the models under fixed and adaptive thresholds are compared. A knuckle position recognition algorithm based on the offset feature fusion model under adaptive threshold conditions is also presented. Threshold recognition is carried out on images with relatively stable light intensity, and the results show that the algorithm is feasible. For environments with large changes in light intensity or camera angle, the adaptability of the image threshold requires further study.


  1. Y. Wang, T. Chen, Z. He, C. Wu, Review on the machine vision measurement and control technology for intelligent manufacturing equipment. Control Theory Appl. 32(3), 273–286 (2015)

    Google Scholar 

  2. M. Liu, J. Ma, M. Zhang, Z. Zhao, D. Yang, Q. Wang, Online operation method for assembly system of mechanical products based on machine vision. Comput. Integr. Manuf. Syst. 21(9), 2343–2353 (2015)

    Google Scholar 

  3. Y. Wang, D. Ewert, R. Vossen, S. Jeschke, A visual servoing system for interactive human-robot object transfer. J. Autom. Control Eng 3(4), 277–283 (2015)

    Article  Google Scholar 

  4. J.T.C. Tan, F. Duan, R. Kato, T. Arai, Safety strategy for human-robot collaboration: design and development in cellular manufacturing. Adv. Robot. 24, 839–860 (2010)

    Article  Google Scholar 

  5. M.K. Bhuyan, K.F. MacDorman, M.K. Kar, D.R. Neog, Hand pose recognition from monocular images by geometrical and texture analysis. J. Vis. Lang. Comput. 28, 39–55 (2015)

    Article  Google Scholar 

  6. D.-L. Lee, W.-S. You, Recognition of complex static hand gestures by using the wristband-based contour features. IET Image Process. 12(1), 80–87 (2018)

    Article  Google Scholar 

  7. A. Moschetti, L. Fiorini, D. Esposito, P. Dario, F. Cavallo, Toward an unsupervised approach for daily gesture recognition in assisted living applications. IEEE Sensors J. 17(24), 8395–8403 (2017)

    Article  Google Scholar 

  8. P. Bao, A.I. Maqueda, C.R. del Blanco, N. García, Tiny hand gesture recognition without localization via a deep convolutional network. Consum. Electron. 63(3), 251–257 (2017)

    Article  Google Scholar 

  9. A.V. Dehankar, S. Jain, V.M. Thakare, Using AEPI Method for Hand Gesture Recognition in Varying Background and Blurred Images, 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA), vol 1 (2017), pp. 404–409

    Google Scholar 

  10. Y. Ding, Q. Zhao, B. Li, X. Yuan, Facial expression recognition from image sequence based on LBP and Taylor expansion. IEEE Access 5, 19409–19419 (2017)

    Article  Google Scholar 

  11. C. Yao, Y.-F. Liu, B. Jiang, J. Han, J. Han, LLE score: A new filter-based unsupervised feature selection method based on nonlinear manifold embedding and its application to image recognition. IEEE Trans. Image Process. 26(11), 5257–5269 (2017)

    Article  MathSciNet  Google Scholar 

  12. J. Wang, G. Wang, Hierarchical spatial sum–product networks for action recognition in still images. IEEE Trans. Circuits Syst. Video Technol. 28(1), 90–100 (2018)

    Article  Google Scholar 

  13. P. Panda, A. Ankit, P. Wijesinghe, K. Roy, FALCON: feature driven selective classification for energy-efficient image recognition. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 36(12), 2017–2029 (2017)

    Article  Google Scholar 

  14. H. Li, A. Achim, D. Bull, Unsupervised video anomaly detection using feature clustering. IET Signal Process. 6(5), 521–533 (2012)

    Article  MathSciNet  Google Scholar 

  15. J.-Y. Jiang, R.-J. Liou, S.-J. Lee, A fuzzy self-constructing feature clustering algorithm for text classification. IEEE Trans. Knowl. Data Eng. 23(3), 335–349 (2011)

    Article  Google Scholar 

  16. M. Rahmani, G. Akbarizadeh, Unsupervised feature learning based on sparse coding and spectral clustering for segmentation of synthetic aperture radar images. IET Comput. Vis. 9(5), 629–638 (2015)

    Article  Google Scholar 

  17. R.B. Dan, P.S. Mohod, Survey on Hand Gesture Recognition Approaches [J]. Int. J. Comput. Sci. Inf. Technol. 5(2), 2050–2052 (2014)

    Google Scholar 

  18. P. Garg, N. Aggarwal, S. Sofat, Visual based hand gesture recognition. Int. J. Comput. Electr. Autom. Control Inform. Eng. 3(1), 186–191 (2009)

    Google Scholar 

  19. Fai CC, Silvia A, Alessandro B, Alain F, Mehdi M, Francesco, P. Constraint study for a hand exoskeleton: human hand kinematics and dynamics. J Robotics. 2013:1-17.

  20. S. Kang, B. Choi, D. Jo, Faces detection method based on skin color modeling. J. Syst. Archit. 64(C), 100–109 (2016)

    Article  Google Scholar 

  21. G. Wu, W. Kang, Robust fingertip detection in a complex environment. IEEE Trans. Multimedia 18(6), 978–987 (2016)

    Article  Google Scholar 

  22. A. Kumar, C. Ravikanth, Personal authentication using finger knuckle surface. IEEE Trans. Inf. Forensics Secur. 4(1), 98–110 (2009)

    Article  Google Scholar 

  23. K. Usha, M. Ezhilarasan. Hybrid Detection of Convex Curves for Biometric Authentication Using Tangents and Secants. The 3rd IEEE International Advanced Computer Conference, Ghaziabad, India, February 22–23, 2013:763–768. Adv. Comput. Conf., 2013 , 7903 (5) :763–768

  24. K. Usha, M. Ezhilarasan, Finger knuckle biometrics–a review. Comput. Electr. Eng. 45(C), 249–259 (2014)

    Google Scholar 

  25. H.-C. Huanga, C.-T. Hsiehb, M.-N. Hsiao b, C.-H. Yehb, A study of automatic separation and recognition for overlapped fingerprints. Appl. Soft Comput. 71, 127–140 (2018)

    Article  Google Scholar 

  26. K. Usha, M. Ezhilarasan, Fusion of geometric and texture features for finger knuckle surface recognition. Alex. Eng. J. 55(1), 683–697 (2016)

    Article  Google Scholar 

  27. K. Usha, M. Ezhilarasan, Robust personal authentication using finger knuckle geometric and texture features. Ain Shams Eng. J. (2016) In press

  28. Z. Lin, L. Zhang, D. Zhang, H. Zhu, Online finger-knuckle-print verification for personal authentication. Pattern Recogn. 43, 2560–2571 (2010)

    Article  Google Scholar 

  29. G. Gao, L. Zhang, J. Yang, L. Zhang, D. Zhang, Reconstruction based finger-knuckle-print verification with score level adaptive binary fusion. IEEE Trans. Image Process. 22(12), 5050–5062 (2013)

    Article  MathSciNet  Google Scholar 

  30. A. Kumar, Z. Xu, Personal identification using minor knuckle patterns from palm dorsal surface. IEEE Trans. Inf. Forensics Secur. 11(10), 2338–2348 (2016)

    Article  Google Scholar 

  31. S. Yang, L. Gong, Excursion characteristic learning and recognition for hand image knuckles based on log Gaussian Cox field. Trans. Chin. Soc. Agric. Machinery 48(1), 353–360 (2017)

    Google Scholar 

  32. Stanley Sawyer. Wishart Distribution and Inverse-Wishart Sampling. Washington University. 2007, Technical Report

    Google Scholar 

Download references


The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions. We would also like to acknowledge all our team members, especially Luqi Gong. These authors contributed equally to this work.

About the authors

Shiqiang Yang was born in Baiyin, Gansu, P.R. China, in 1973. He received the Ph.D. degree in mechanical engineering from Xi’an University of Technology of China, Xi’an, China, in 2010. From 2005 to 2018, he was with Xi’an University of Technology of China. Since 2009, he has been an associate professor with the School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology, Xi’an, China. From 2011 to 2018, he conducted master's research with the School of Mechanical and Precision Instrument Engineering, Xi’an University of Technology, Xi’an. His current research interests include intelligent robot control, image recognition, and behavior detection and recognition.

Luqi Gong was born in Xianyang, Shaanxi, P.R. China, in 1991. He received the Master's degree from Xi’an University of Technology of China, Xi’an, China, in 2016. His research interests include image recognition, image processing, and biometric detection.

Dan Qiao was born in Handan, Hebei, P.R. China, in 1994. He received the Bachelor's degree from Xi’an University of Technology of China, Xi’an, China, in 2016. He is currently a master's student at Xi’an University of Technology of China. His research interests include image recognition, image processing, and biometric detection.


This work is supported by the National Natural Science Foundation of China (Grant No. 51475365).

Availability of data and materials

Please contact author for data requests.

Author information



All authors took part in the discussion of the work described in this paper. These authors contributed equally to this work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shiqiang Yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Yang, S., Gong, L. & Qiao, D. Image offset density distribution model and recognition of hand knuckle. J Image Video Proc. 2019, 23 (2019).
