In view of the complexity of the random offset set itself, the difference between the offset characteristics corresponding to different offset parameter intervals is relatively large. And the further the offset parameter is from the standard value of 1, the more complex the corresponding feature. Therefore, in the learning process of random image bilateral offset measurement, especially for the case of small offset parameters, it is necessary to deeply analyze the random distribution characteristics of the actual offset observations in the training image database and to select an appropriate model for learning. In this section, we obtain the \( {{\tilde{\mu}}_{{\mathbb{G}}_2\mid \mathrm{\mathbb{P}}}}^{\hbox{'}} \) equivalent density estimate \( p\left(\cdot \right)\propto {{\tilde{\mu}}_{{\mathbb{G}}_2\mid \mathrm{\mathbb{P}}}}^{\hbox{'}}\left(\cdot \right) \) by learning the multi-label distribution random field model for the mid-density location.
Mid-level data distribution training based on Gaussian process classification
Due to the complexity of the distribution patterns in the middle-level data, it is difficult to obtain a mid-level migration density distribution model with relatively obvious features and a certain resolution. According to the nonparametric density kernel estimation result, the image gray position data corresponding to the offset parameter in the selected interval segment are taken as an observation of the random offset image bilateral offset set. The endpoints of the offset parameter interval are c21 = 0.50 and c22 = 0.85, and the middle-layer data are further divided into a multilayer structure corresponding to the three types of labels, according to the level of the corresponding density level, as shown in Fig. 2. Figure 2 a–c respectively correspond to observations in the intervals 0.70–0.85, 0.60–0.75, and 0.50–0.65. With the decrease of the offset parameter, the distribution pattern in Fig. 2a has certain regression characteristics. Figure 2b shows a clustering trend, and Fig. 2c shows the spread features. Comparing the transitions between the three graphs, it can be seen that the mid-level data distribution does not obviously have the clustering patterns or trends implicit in the high-level data distribution. Instead, it reflects the characteristics of random fields, i.e., the overall distribution of middle-level data, has the transition characteristics from clustering to irregular diffusion. In the above judgment of the distribution characteristics of the mid-level data, a random distribution modeling method can be used to learn the typical distribution states of the three parameter segments in the “clustering-diffusion” classification mode and to obtain \( {{\tilde{\mu}}_{{\mathbb{G}}_2\mid \mathrm{\mathbb{P}}}}^{\hbox{'}} \) in Eq. 1, the estimated results on the plane domain.
For multi-classification problems on the random field, consider that different label values correspond to different horizontal parameter ranges, and the label class values are taken in a limited discrete space. At the same time, the relationship between multi-category tags at the image position is not completely determined. Therefore, the random distribution of all category labels must be uniformly modeled and expressed to better restore the overall characteristics of the mid-level data distribution. In this section, the Gaussian process model is used to take the observation data as the training sample set X. The Bernoulli distribution is used to represent the probability of a single-class label at a fixed position of the image, and the probability result of the class label y at the airport is used as the training output. The distribution pattern among the three types of tags further contains two types of information: one is the activation and transformation of state tags at the same location and the other is the distribution relationship between different locations and multiple states. For the former, the Gibbs form is used to represent the parameter association in the corresponding polynomial distribution of the label. Accounting for the limitations of the complexity of the learning process, this paper assumes that the different tag classes between image locations are irrelevant, and the joint distribution of tags of the same type has Gaussian characteristics. The Gaussian field function f is used to represent tag associations between the same classes:
$$ {y}_i^c\mid {f}_i^c\sim \mathrm{Bern}\left(\sigma \left({f}_i|{y}_i^c=1\right)\right) $$
(20)
$$ p\left({y}_i^c|{f}_i^c\right)={\pi}_i^c=\sigma \left({y}_i^c{f}_i^c\right)=\frac{\exp \left({f}_i^c\right)}{\sum_{c\hbox{'}}\exp \left({f}_i^{c\hbox{'}}\right)}, $$
(21)
where the location i tag \( {y}_i^c \) has {0, 1} value, and the f vector form is \( f=f\left({f}_1^1,\dots, {f}_n^1,{f}_1^2,\dots, {f}_n^2,{f}_1^3,\dots, {f}_n^3\right) \). It has a prior form \( f\mid X\sim \mathcal{N}\left(0,K\right) \), where K is the corresponding covariance function and n is the amount of training data. Assuming the category information is not related, K has the form of a diagonal matrix, K = diag {k1, k2, … , kc}, where kc represents the trust relationship between each type of tag data. Therefore, the learning of the middle-level migration measure is transformed to the learning of the random quantity f.
Posterior calculations on Gaussian fields with multiple binary classifications
Since the field fi = f(⋅| xi) is a Gaussian function, the posterior form is also Gaussian:
$$ f\sim N\left(\widehat{f}|f,{A}^{-1}\right)\propto \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right). $$
(22)
The maximum posterior estimate of the implicit function f is defined as \( \widehat{f}=\arg {\max}_fp\left(f|X,y\right) \), \( A=-\mathrm{\nabla \nabla}\log p\left(f=\widehat{f}|X,y\right). \)
Under the Bayesian framework,
$$ p\left(f|X,y\right)=p\left(y|f\right)p\left(f|X\right)/p\left(y|X\right). $$
(23)
Since the classification mark y of test dataset X is not directly related to f, i.e., p(y| X) does not include f, the posterior maximum solution of f corresponds to the log likelihood of \( \widehat{f} \):
$$ {\displaystyle \begin{array}{l}\Psi (f)\triangleq \log \left(p\left(y|f\right)p\left(f|X\right)\right)\\ {}\kern4em =-\frac{1}{2}{f}^T{K}^{-1}f+{y}^Tf-{\sum}_{i=1}^n\begin{array}{l}\log \left({\sum}_{c=1}^C\exp {f}_i^c\right)\\ {}-\frac{1}{2}\log \left|K\right|-\frac{Cn}{2}\log 2\pi \end{array}\end{array}}. $$
(24)
The posterior solution \( \widehat{f} \) corresponds to the zero of ∇Ψ = 0. After differentiation of the above formula, it is obtained that
$$ \mathrm{\nabla \Psi }=-{K}^{-1}f+y-\pi . $$
(25)
The zero point of this type is the prediction solution \( \widehat{f}=K\left(y-\widehat{\pi}\right) \) of the implicit function f variable. We further use the following differential relationship:
$$ -\frac{\partial^2}{\partial {f}_i^c\partial {f}_i^{c\hbox{'}}}\log {\sum}_j\exp \left({f}_i^j\right)={\pi}_i^c{\delta}_{cc\hbox{'}}+{\pi}_i^c{\delta}_{cc\hbox{'}}+{\pi}_i^c{\pi}_i^{c\hbox{'}} $$
(26)
$$ \mathrm{\nabla \nabla \Psi }=-{K}^{-1}-W,\kern0.4em W\triangleq \mathit{\operatorname{diag}}\left(\pi \right)-{\Pi \Pi}^{\hbox{'}}, $$
(27)
where Π is a Gibbs distribution π corresponding to a cn × n scale column block matrix.
We use the Newton iteration format to obtain implicit function updates:
$$ {f}^{new}=f-{\left(\mathrm{\nabla \nabla \Psi}\right)}^{-1}\mathrm{\nabla \Psi } $$
(28)
$$ {\displaystyle \begin{array}{l}f-{\left(\mathrm{\nabla \nabla \Psi}\right)}^{-1}\mathrm{\nabla \Psi }=f+{\left({K}^{-1}+W\right)}^{-1}\left(-{K}^{-1}f+y-\pi \right)\\ {}\kern11.5em ={\left({K}^{-1}+W\right)}^{-1}\left( Wf+y-\pi \right).\end{array}} $$
(29)
Because the matrix K is a larger cn × cn diagonal block matrix and the bandwidth is large, to improve the accuracy and speed of the inversion, the following decomposition is used:
$$ {\displaystyle \begin{array}{l}{\left({K}^{-1}+W\right)}^{-1}=K-K{\left(K+{W}^{-1}\right)}^{-1}K\\ {}=K-K{\left(K+{D}^{-1}-{RO}^{-1}{R}^{\mathrm{T}}\right)}^{-1}K\\ {}=K-K\left(E- ER{\left(O+{R}^{\mathrm{T}} ER\right)}^{-1}{R}^{\mathrm{T}}E\right)K\\ {}=K-K\left(E- ER{\left({\sum}_c{E}_c\right)}^{-1}{R}^{\mathrm{T}}E\right)K\end{array}}, $$
(30)
where E = (K + D−1)−1 = D1/2(I + D1/2KD1/2)−1D1/2.
The convergence monitoring of the above iterative process is represented by the likelihood value of the training data:
$$ p\left(y|X\right)=\int p\left(y|f\right)p\left(f|X\right) df=\int \exp \left(\Psi (f)\right) df. $$
(31)
Under Laplacian approximation, the form of Eq. 30 under local approximation of Ψ(⋅) is
$$ \Psi (f)\approx \Psi \left(\widehat{f}\right)-\Psi \left(\varDelta f|\widehat{f}\right)\simeq \Psi \left(\widehat{f}\right)-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right). $$
(32)
The likelihood of the training data for the posterior model can be approximated as
$$ p\left(y|X\right)\simeq q\left(y|X\right)=\exp \left(\Psi \left(\widehat{f}\right)\right)\int \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right) df. $$
(33)
The likelihood of the training data for the posterior model can be approximated as
$$ \int \exp \left(-\frac{1}{2}{\left(f-\widehat{f}\right)}^{\mathrm{T}}A\left(f-\widehat{f}\right)\right) df\propto \frac{1}{\sqrt{{\left|{K}^{-1}+W\right|}^{-1}}}. $$
(34)
The logarithmic form of the likelihood can be expressed as
$$ {\displaystyle \begin{array}{l}\log p\left(y|X\right)\simeq \log q\left(y|X\right)=-\frac{1}{2}{\widehat{f}}^{\mathrm{T}}{K}^{-1}\widehat{f}+\log p\left(y|\widehat{f}\right)-\frac{1}{2}\log \left|K\right|-\frac{1}{2}\log \left|{K}^{-1}+W\right|\\ {}=-\frac{1}{2}{\widehat{f}}^{\mathrm{T}}{K}^{-1}\widehat{f}+{y}^{\mathrm{T}}\widehat{f}-{\sum}_{i=1}^n\log \left({\sum}_{c=1}^C\exp {\widehat{f}}_i^c\right)-\frac{1}{2}\log \left|{I}_{Cn}+{W}^{1/2}{kW}^{1/2}\right|\end{array}} $$
(35)
From this, an implicit function posterior update based on the overall training dataset is obtained. The algorithm is as follows:
-
① Input observation measurement marker y, covariance matrix K, and probability marker function initialization f ≔ 0.
-
② Calculate the label distribution law of the current observation variable:
$$ \Pi :\kern0.5em p\left({y}_i^c|{f}_i\right)={\pi}_i^c=\exp \left({f}_i^c\right)/{\sum}_{c\hbox{'}}\exp \left({f}_i^{c\hbox{'}}\right) $$
$$ \mathrm{\nabla \Psi }=-{K}^{-1}f+y-\pi, \kern0.5em \mathrm{\nabla \nabla \Psi }=-{K}^{-1} $$
where W ≜ diag (π) − ΠΠT.
-
③ For each class of implicit labels c = 1,2, …, C, calculate:
$$ L:= \mathrm{Cholesky}\left({I}_n+{D}_c^{1/2}{K}_c{D}_c^{1/2}\right) $$
$$ {E}_c={D}_c^{1/2}{L}^T\backslash \left(L\backslash {D}_c^{1/2}\right),\kern0.5em {z}_c:= {\sum}_i\log {L}_{ii}. $$
$$ M:= \mathrm{Cholesky}\left({\sum}_i{E}_i\right) $$
$$ b:= \left(D-{\Pi \Pi}^T\right)f+y-\pi, \kern0.5em c:= EKb $$
$$ a:= b-c+{ERM}^T\backslash \left(M\backslash \left({R}^Tc\right)\right)\kern0.5em f:= Ka. $$
$$ \mathrm{objective}\kern0.3em \mathrm{function}=-\frac{1}{2}{a}^Tf+{y}^Tf+{\sum}_i\log \left({\sum}_c\exp \left({f}_c^i\right)\right) $$
$$ \log q\left(y|X,\theta \right):= -\frac{1}{2}{a}^Tf+{y}^Tf+{\sum}_i\log \left({\sum}_c\exp \left({f}_c^i\right)\right)-{\sum}_c{z}_c $$
$$ \widehat{f}:= f(label). $$
Structure of positive definite kernel function of random information in the middle level
The design of the difference matrix under different systems contains different understanding of training data sets. On the one hand, there is an association between the extraction process of image data and its physical meaning. On the other hand, the actual image data has the characteristics of being distributed near higher layer data during the extraction process. Therefore, the covariance matrix used in this section of the multi-class learning algorithm references the results of high-level data learning. The high-level data model information is substituted into the covariance matrix of the middle-level data learning to further improve the learning effect of the GP model. For the three-category learning process used in this section,
$$ K\left(x,{x}^{\hbox{'}}\right)=\mathit{\operatorname{diag}}\left\{{k}_1,{k}_2,{k}_3\right\}, $$
(36)
and the design parameter array is p = [1, 0.5 , 0.25; 0.5 , 1 , 0.5; 0.25 , 0.5 , 1].
In the above equation, the sub-diagonal array k1 corresponds to the position of the image observation data at the density estimation level of 75 to 85% in the mid-level dataset of the image. After testing, it was found that although the aggregation level of this category dataset is weaker than the aforementioned high-level data model, it still has a certain clustering trend. Therefore, the associated credits between this category of data can be designed as an exponential clustering pattern, and the closest clustering component of the observation data can be found. The final correlation result of the clustering trend between x and x' is determined using an isotropic exponential function. The design of the parameter array pij further confirms the labeling of the best high-level clustering component to which the two observations belong, i.e., when the data in the two images belong to the same clustering component in a high-level model, they have a higher degree of trust.
In the design of sub-diagonal arrays, the image data corresponding to the 60–75% density estimation level in the mid-level dataset of the image no longer have a clustering trend, but surround the cluster centers. The clustering center has both attractiveness and repulsiveness to the category data. Therefore, we can consider adding a certain empirical offset Δ to the likelihood value of the high-level model corresponding to the observation position of the category image. The offset is taken as the empirical value 0.7. In the underlying data, i.e., the 50–65% density estimation range of the mid-level data in the image, the position distribution has basically been irrelevant to the clustering, and only the distance form is used:
$$ {k}_1\left(x,{x}^{\hbox{'}}\right)=\exp \Big\{\left\{-\min \left\{\left(x-{\mu}_i\right){k}_i{\left(x-{\mu}_i\right)}^{\hbox{'}}/\Delta {l}_i\right\}-\min \left\{\left({x}^{\hbox{'}}-{\mu}_i\right){k}_i{\left({x}^{\hbox{'}}-{\mu}_i\right)}^{\hbox{'}}/\Delta {l}_i\right\}-\frac{\left\Vert x-{x}^{\hbox{'}}\right\Vert }{2{p}_{ij}}\right\} $$
(37)
$$ {k}_2\left(x,{x}^{\hbox{'}}\right)=\exp \left\{-\left(\Delta -{\sum}_i{\omega}_i\mathcal{N}\left(x,{\mu}_i,{\sigma}_i\right)\right)-\left(\Delta -{\sum}_i{\omega}_i\mathcal{N}\left({x}^{\hbox{'}},{\mu}_i,{\sigma}_i\right)\right)-\left\Vert x-{x}^{\hbox{'}}\right\Vert /l\right\} $$
(38)
$$ {k}_3\left(x,{x}^{\hbox{'}}\right)=\exp \left\{-\left\Vert x-{x}^{\hbox{'}}\right\Vert /l\right\}. $$
(39)
Figure 3 shows an example of the covariance matrix in the learning process of a multi-class model of middle-finger data in the distal phalanx and middle-finger images. The covariance scale is 3n × 3n, and n is the data volume of the layer density observation set in the training image. It can be seen that there is a certain difference in the degree of trust between the different diagonal block arrays for class positions. Only the sub-array k3, in the form of a distance has a stronger associative relationship with respect to the positions of gray particles in the same class. Considering that the degree of data association carried by the sub-array k1 in the form of clustering is the smallest, it indicates that the overall diffusion trend of the middle-level data is strong and the clustering trend is relatively weak.
Model prediction process
The marker-predicted implicit vector functionsf∗ of the test data x∗ obey the approximation distribution:
$$ {f}_{\ast}\sim q\left({f}_{\ast }|X,y,{x}_{\ast}\right). $$
(40)
Under the Bayesian conditional distribution, the prediction of the test position x∗ in the training dataset X is expressed in integral form as
$$ q\left({f}_{\ast }|X,y,{x}_{\ast}\right)=\int p\left({f}_{\ast }|X,{x}_{\ast },f\right)q\left(f|X,y\right) df. $$
(41)
Since p(f∗| X, x∗, f) and q(f| X, y) are Gaussian distributions, the c-label prediction of test data x∗ is
$$ {\mathbb{E}}_q\left[{f}^c\left({x}_{\ast}\right)|X,y,{x}_{\ast}\right]={k}_c{\left({x}_{\ast}\right)}^{\mathrm{T}}{K}_c^{-1}{\widehat{f}}^c={k}_c{\left({x}_{\ast}\right)}^{\mathrm{T}}\left({y}^c-{\widehat{\pi}}^c\right), $$
(42)
where kc(x∗) is the c-type marker covariance vector between test data x∗ and all training set data X. The prediction covariance matrix is
$$ {\displaystyle \begin{array}{l}{\operatorname{cov}}_q\left({f}_{\ast }|X,y,{x}_{\ast}\right)=\Sigma +{Q}_{\ast}^{\mathrm{T}}{K}^{-1}{\left({K}^{-1}+W\right)}^{-1}{K}^{-1}{Q}_{\ast}\\ {}=\mathit{\operatorname{diag}}\left(k\left({x}_{\ast },{x}_{\ast}\right)\right)-{Q}_{\ast}^{\mathrm{T}}{\left(K+{W}^{-1}\right)}^{-1}{Q}_{\ast}\end{array}} $$
(43)
where Σ is a C × C matrix and the sub-diagonal matrix has the form \( {\Sigma}_{cc}={k}_c\left({x}_{\ast },{x}_{\ast}\right)-{k}_c^{\mathrm{T}}\left({x}_{\ast}\right){K}_c^{-1}{k}_c\left({x}_{\ast}\right) \). In this section, the Monte Carlo method is used to sample the above prediction mean and prediction covariance matrix and obtain the sample mean value as an a posteriori prediction. The forecasting process based on the training set at the random field is:
-
① Input posterior edge prediction \( \widehat{f} \), covariance matrix K, detection x.
-
② Calculate the current observation variable label distribution law Π:
$$ p\left({y}_i^c\left|{\widehat{f}}_i\right.\right)={\pi}_i^c=\exp \left({\widehat{f}}_i^c\right)/{\sum}_{c\kern0.5em \hbox{'}}\exp \left({\widehat{f}}_i^{c\kern0.5em \hbox{'}}\right) $$
$$ \mathrm{\nabla \Psi }=-{K}^{-1}\widehat{f}+y-\pi, \kern0.5em \mathrm{\nabla \nabla \Psi }=-{K}^{-1} $$
where W ≜ diag (π) − ΠΠT.
-
③ For each class of implicit labels c = 1,2, …, C, calculate
$$ L:= \mathrm{Cholesky}\left({I}_n+{D}_c^{1/2}{K}_c{D}_c^{1/2}\right) $$
$$ {E}_c={D}_c^{1/2}{L}^T\backslash \left(L\backslash {D}_c^{1/2}\right) $$
$$ M:= \mathrm{Cholesky}\left({\sum}_i{E}_i\right) $$
$$ {\mu}_{\ast}^c:= {\left({y}^c-{\pi}^c\right)}^T{k}_{\ast}^c $$
$$ b:= {E}_c{k}_{\ast}^c,\kern0.5em c:= {E}_c\left(R\left({M}^T\backslash \left(M\backslash \left({R}^Tb\right)\right)\right)\right). $$
-
④ For each type of implicit label c' = 1,2, …, C, calculate
$$ {\sum}_{cc\kern0.5em \hbox{'}}:= {c}^T{k}_{\ast}^{c\kern0.5em \hbox{'}};\cdot {\sum}_{cc\kern0.5em \hbox{'}}:= {\sum}_{cc}+{k}_c\left({x}_{\ast },{x}_{\ast}\right)-{b}^T{k}_{\ast}^c. $$
$$ {f}_{\ast}\sim N\left({\mu}_{\ast },\Sigma \right),{\pi}_{\ast}:= {\pi}_{\ast }+\exp \left({f}_{\ast}^c\right)/{\Sigma}_{c\kern0.5em \hbox{'}}\exp \left({f}_{\ast}^{c\kern0.5em \hbox{'}}\right). $$
$$ {\tilde{\pi}}_{\ast}:= {\pi}_{\ast }/S. $$
$$ {\mathbb{E}}_{q\kern0.5em (f)}\left[\pi \left(f\left({x}_{\ast}\right)\right)\left|{x}_{\ast },X,y\right.\right]:= {\pi}_{\ast } $$