We propose an exemplar-based image inpainting algorithm with angle-aware patch matching, which recovers missing regions that contain both textural and structural components and makes the inpainted result connect more naturally with its surroundings. The overall architecture of our inpainting system is shown in Fig. 3. In the first step, all unknown pixels in the missing region are initialized by a surface fitting technique. In the second step, we calculate the priority function to determine the filling order of the pixels on the boundary, and we select the size of the target patch to be filled dynamically according to the gradient value of the filled points. Next, using an angle rotation strategy, we search the source region for multiple matching patches whose features are most similar to those of the target patch. Finally, the optimal source patch is chosen according to the proposed similarity metric, which yields satisfactory inpainted results.
Initialization by surface fitting method
The aim of this subsection is to estimate pixel values for the missing region of an image. These values are plausible but carry some randomness. We therefore apply a surface fitting technique in 3D subspace to initialize the pixel values within the missing region. Specifically, we use the moving least squares (MLS) method [25] to fit a surface in 3D subspace to the pixels surrounding the missing region.
Given an image I viewed as a 2D matrix, we project its pixels to a 3D subspace that keeps the same geometrical structure and regard the gray value of each pixel as the height in the 3D coordinate system. The missing pixels of the image form holes in the 3D point cloud. Figure 4a shows the point cloud of the incomplete image, in which the black part is the missing region in 3D subspace, and Fig. 4b shows the surface fitted by MLS. We fill the hole with the surface generated by the moving least squares method and thus obtain a complete image I' with the estimated pixels. This initialization preserves some structural features of the damaged regions.
In Fig. 5, we fit a real color image called ‘Lena.’ Figure 5a is the original image. Figure 5b is an incomplete image with a small missing block. In Fig. 5c, the hole is filled by quadric fitting. We can observe that the intensity of the fitted region is similar to that of the surrounding pixels. The estimated pixels can therefore be regarded as prior knowledge for inpainting and provide certain structural information. More importantly, this prior allows the proposed algorithm to restore the damaged region more precisely.
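As a rough illustration of this initialization, the following Python sketch fills a hole in a gray-level image by fitting a local quadric with moving least squares. The function name `mls_fill`, the Gaussian weight function, and the parameters `radius` and `h` are assumptions made for this example; they are illustrative choices and are not specified by the method above.

```python
import numpy as np

def mls_fill(image, mask, radius=6, h=3.0):
    """Estimate missing pixels (mask == True) with a moving least squares quadric.

    `radius` (neighbourhood size) and `h` (weight bandwidth) are hypothetical
    parameters chosen only for this sketch.
    """
    out = image.astype(float)
    known_y, known_x = np.nonzero(~mask)
    known_z = image[~mask].astype(float)      # same row-major order as np.nonzero
    for y, x in zip(*np.nonzero(mask)):
        dx = known_x - x                       # coordinates centred at the query pixel
        dy = known_y - y
        d2 = dx**2 + dy**2
        near = d2 <= radius**2
        if np.count_nonzero(near) < 6:         # a quadric has 6 coefficients
            continue                           # leave the pixel unchanged
        u = dx[near].astype(float)
        v = dy[near].astype(float)
        # Quadric basis: z = a0 + a1*u + a2*v + a3*u^2 + a4*u*v + a5*v^2
        A = np.stack([np.ones_like(u), u, v, u**2, u * v, v**2], axis=1)
        sw = np.sqrt(np.exp(-d2[near] / h**2))  # sqrt of the Gaussian weights
        coef, *_ = np.linalg.lstsq(A * sw[:, None], known_z[near] * sw, rcond=None)
        out[y, x] = coef[0]                    # surface height at u = v = 0
    return out
```

Because the coordinates are centred at each query pixel, the constant coefficient is the fitted height there. For holes wider than `radius`, a larger neighbourhood would be needed so that enough known pixels fall inside it.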
Calculation of the target patch priorities
For all points belonging to Ω, the points filled earlier influence the points filled afterward, so determining the filling order of these points is extremely important. The priorities of the target patches centered at these points are also critical to preserving structural information during inpainting. However, the confidence values given by Eq. (2) decrease so rapidly that the priority order becomes insignificant. Recently, Wang et al. [18] proposed an inpainting algorithm based on a space-varying updating strategy and structure-consistent patch matching, which address the dropping problem of the confidence term and improve matching quality, respectively. In this method, instead of initializing the confidences of newly filled pixels to the same value as in Eq. (6), they consider that the priority of the center pixel \( \hat{p} \) is higher than that of its surroundings. Consequently, an upper bound and a lower bound are defined to restrain the space-varying confidence of points in \( {\Psi}_{\hat{p}}\cap \Omega \). However, the confidence values in this method account only for the known pixels in \( {\Psi}_{\hat{p}} \) and ignore the effect of the unknown pixels in \( {\Psi}_{\hat{p}} \).
In our algorithm, the values of the unknown pixels have already been estimated roughly, as presented in Section 3.1. Thus, we need to consider the contribution of all pixels in \( {\Psi}_{\hat{p}} \). However, the estimated pixels in the unknown region are not precise enough. We therefore introduce weight factors to balance the importance of pixels between \( {\Psi}_{{\hat{p}}_{\Phi}} \) and \( {\Psi}_{{\hat{p}}_{\Omega}} \), where \( {\Psi}_{{\hat{p}}_{\Phi}} \) and \( {\Psi}_{{\hat{p}}_{\Omega}} \) denote the known and unknown regions in \( {\Psi}_{\hat{p}} \), respectively. A large weight is assigned to \( {\Psi}_{{\hat{p}}_{\Phi}} \), whereas a small weight is assigned to \( {\Psi}_{{\hat{p}}_{\Omega}} \), in order to preserve the structural information to the utmost extent. To keep the confidences of the newly filled pixels smaller than those of the existing pixels, the upper bound is rewritten as
$$ {C}_{\mathrm{up}}=\frac{\sum_{k\in {\Psi}_{\hat{p}}}C(k)}{\left|{\lambda}_1{\Psi}_{{\hat{p}}_{\Phi}}+{\lambda}_2{\Psi}_{{\hat{p}}_{\Omega}}\right|} $$
(7)
where \( \left|{\lambda}_1{\Psi}_{{\hat{p}}_{\Phi}}+{\lambda}_2{\Psi}_{{\hat{p}}_{\Omega}}\right| \) denotes the number of all pixels, counting both known pixels and estimated pixels. λ1 and λ2 are balancing factors that control the status of the confidence; they are adjusted dynamically in the range (0, 1), with λ1 = 1 − λ2 and λ1 > λ2. The lower bound is set as \( C\left(\hat{p}\right) \).
The new confidence term can be summarized as
$$ {C}_n(k)=\max \left(-\beta \cdot \mathrm{dis}{\left(\hat{p},k\right)}^2+{C}_{\mathrm{up}},C\left(\hat{p}\right)\right),\kern0.5em k\in {\Psi}_{{\hat{p}}_{\Omega}} $$
(8)
where \( \mathrm{dis}\left(\hat{p},k\right) \) is the Euclidean distance between the pixels \( \hat{p} \) and k, used to differentiate pixels at different locations in the patch, and β is the decreasing factor that limits the dropping rate of the confidence term, set to 0.02 empirically in our experiments.
The data term D(p) helps to reconstruct local linear structure and texture. However, the priority value approaches zero when the data term is zero. To eliminate this drawback, we add a curvature factor to the data term based on [26]. Hence, D(p) can be rewritten as
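The confidence update of Eqs. (7) and (8) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the denominator of Eq. (7) is read as the weighted pixel count λ1·(number of known pixels) + λ2·(number of unknown pixels), the default `lam1=0.7` is a hypothetical value (the text only requires λ1 > λ2), the patch is assumed to lie fully inside the image, and the lower bound \( C\left(\hat{p}\right) \) is passed in as `c_center`.

```python
import numpy as np

def update_confidence(conf, unknown, center, half, c_center,
                      lam1=0.7, beta=0.02):
    """Apply Eq. (8) to the unknown pixels of the patch centred at `center`.

    conf     : 2D array of current confidences C(k)
    unknown  : 2D boolean array, True for pixels in the missing region
    half     : half-width of the square patch (side = 2*half + 1)
    c_center : lower bound C(p_hat), supplied by the caller
    lam1     : hypothetical default; lam2 follows from lam1 = 1 - lam2
    """
    lam2 = 1.0 - lam1
    cy, cx = center
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    c_patch, u_patch = conf[win], unknown[win]

    # Eq. (7): weighted pixel count in the denominator (an assumed reading).
    n_known = np.count_nonzero(~u_patch)
    n_unknown = np.count_nonzero(u_patch)
    c_up = c_patch.sum() / (lam1 * n_known + lam2 * n_unknown)

    # Eq. (8): confidence decays with the squared distance to the centre
    # but never drops below the lower bound.
    dy, dx = np.mgrid[-half:half + 1, -half:half + 1]
    c_new = np.maximum(-beta * (dx**2 + dy**2) + c_up, c_center)

    out = conf.astype(float)
    out[win] = np.where(u_patch, c_new, c_patch)  # known pixels stay untouched
    return out
```

Only pixels in the unknown part of the patch receive the new value, which matches the index set of Eq. (8).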
$$ D(p)=D(p)+1/S(p) $$
(9)
$$ S(p)=\nabla \cdot \left[\frac{\nabla {I}_p}{\left|\nabla {I}_p\right|}\right] $$
(10)
where S(p) is the curvature of the isophote through the center pixel p, which gives a better effect where the linear structure changes significantly. Besides, we give preference to patches that lie along the direction of the isophotes and have a higher data-term value. We denote the intensity of point p in the image I as I(xp, yp), where (xp, yp) is the coordinate of point p, and the intensity is propagated along the direction of the isophotes.
The priority function defining the optimal filling order can be rewritten as
$$ P(p)=C(p)\left(D(p)+\frac{1}{S(p)}\right) $$
(11)
Finally, we find the pixel \( \hat{p} \) on the contour δΩ with the highest priority according to Eq. (11).
After finding the point \( \hat{p} \), we need to construct the target patch \( {\Psi}_{\hat{p}} \) centered at it. In previous methods, the size of the target patch is fixed, which increases both runtime and matching inaccuracy. For these reasons, we propose to select the size of the target patch dynamically according to the frequency information of the image content. The high-frequency region contains more edge details and structural information, while the low-frequency region represents the smooth part of the image. Therefore, we first divide an image into two components: the low-frequency component RL and the high-frequency component RH. To speed up convergence and enhance global consistency, a small patch is used in RH to improve the restoration of edge and structure details, while a large patch is employed in RL to reduce runtime. Here, we apply a threshold γ, set to 0.3 empirically, to determine the size of each target patch by comparing it with the gradient value at point \( \hat{p} \). If the gradient value is larger than γ, we construct a smaller target patch; otherwise, we construct a larger one. The gradient is computed as
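Equations (10) and (11) can be illustrated with the following Python sketch, which estimates the isophote curvature by finite differences and then picks the front pixel with the highest priority. The function names and the small constant `eps`, which guards against division by zero where the gradient or the curvature vanishes, are assumptions of this example and are not part of the formulation above.

```python
import numpy as np

def isophote_curvature(img, eps=1e-8):
    """Eq. (10): S = div(grad I / |grad I|), using central differences."""
    gy, gx = np.gradient(img.astype(float))   # derivatives along rows, columns
    norm = np.sqrt(gx**2 + gy**2) + eps       # eps avoids 0/0 in flat areas
    nx, ny = gx / norm, gy / norm
    return np.gradient(nx, axis=1) + np.gradient(ny, axis=0)

def select_front_pixel(conf, data, curv, front, eps=1e-8):
    """Eq. (11): P = C * (D + 1/S), maximised over the fill front.

    `front` is a boolean mask of the contour pixels; the eps guard on the
    curvature is an assumption made for this sketch.
    """
    safe = np.where(np.abs(curv) < eps, eps, curv)
    prio = conf * (data + 1.0 / safe)
    prio = np.where(front, prio, -np.inf)     # ignore pixels off the contour
    idx = np.unravel_index(np.argmax(prio), prio.shape)
    return (int(idx[0]), int(idx[1])), prio
```

As a sanity check, for an image whose intensity equals the distance to a fixed center, the isophotes are circles and the computed curvature is close to 1/r.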
$$ {\displaystyle \begin{array}{l}{G}_{\hat{p}}\left(x,y\right)=\sqrt{{g_x}^2+{g_y}^2}\\ {}{g}_x\left(i,j\right)=g\left(i+1,j\right)-g\left(i,j\right)\\ {}{g}_y\left(i,j\right)=g\left(i,j+1\right)-g\left(i,j\right)\end{array}} $$
(12)
where gx and gy denote the gradients at point (i, j) in the horizontal and vertical directions, respectively.
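The patch-size rule can be sketched in Python as follows. The forward differences follow Eq. (12), taking i as the horizontal (column) index. The concrete patch sizes `small=7` and `large=11` are hypothetical, since the text fixes only the threshold γ = 0.3, and the sketch assumes intensities normalized to [0, 1] so that this threshold is meaningful. Gradients on the last row and column are set to zero, which is also an assumption.

```python
import numpy as np

def gradient_magnitude(gray):
    """Eq. (12): forward-difference gradient magnitude of a gray image."""
    g = gray.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, :-1] = g[:, 1:] - g[:, :-1]   # g(i+1, j) - g(i, j), horizontal
    gy[:-1, :] = g[1:, :] - g[:-1, :]   # g(i, j+1) - g(i, j), vertical
    return np.sqrt(gx**2 + gy**2)

def choose_patch_size(grad_value, gamma=0.3, small=7, large=11):
    """Small patch in high-gradient regions, large patch in smooth ones.

    `small` and `large` are illustrative values, not taken from the text.
    """
    return small if grad_value > gamma else large
```

For example, a pixel lying on a unit step edge has gradient magnitude 1 and receives the small patch, whereas a pixel in a flat region receives the large one.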
Finding the optimal source patch by angle-aware patch matching scheme
Once the size of the target patch to be filled has been defined, the optimal source patch is searched for in the whole source region. The similarity metric is crucial to finding the most similar patch in the source region. Many traditional similarity metrics, including Euclidean distance, mean squared error (MSE), and the sum of squared differences (SSD), fail to adequately consider matching coherence and ignore the differences in direction and structure between two patches. Because the same pattern may appear at different angles in two patches, such metrics can give rise to unreasonable matching results. Hence, we propose an angle-aware similarity metric to avoid the inexact matching caused by different pattern angles in two patches. In our algorithm, the purpose of rotation is to find more similar content, which ensures the consistency of the matching results. To this end, we search for multiple source patches and select the best one after rotation, as shown in Fig. 6. Meanwhile, we introduce a Jaccard similarity coefficient to enhance the similarity between patches; its value stays fixed while we calculate the similarity between the target patch and each rotated source patch. The similarity metric is defined as
$$ {\displaystyle \begin{array}{l}D\left({\Psi}_{\hat{p}},{\Psi}_q\right)=\tau {\left\Vert {G}_{\sigma}\otimes \left({\Psi}_{\hat{p}}-\mathbf{R}\left({\Psi}_{q_i},\varDelta \theta \right)\right)\right\Vert}^2+\eta {D}_{SSD}\left(\nabla {\Psi}_{\hat{p}},\nabla {\Psi}_{q_i}\right),\\ {}\kern21em i=1,2,\cdots, n\end{array}} $$
(13)
where τ is a Jaccard similarity coefficient defined as \( \frac{\left|{\Psi}_{\hat{p}}\cap {\Psi}_q\right|}{\left|{\Psi}_{\hat{p}}\cup {\Psi}_q\right|} \), in which \( \left|{\Psi}_{\hat{p}}\cap {\Psi}_q\right| \) and \( \left|{\Psi}_{\hat{p}}\cup {\Psi}_q\right| \) are the number of identical pixels and the number of all pixels within \( {\Psi}_{\hat{p}} \) and Ψq, respectively. The larger the coefficient is, the more similar the two patches are. Gσ is a Gaussian filter with standard deviation σ, and ⊗ is the convolution operator. Values of σ in the interval [0.4, 0.6] lead to good inpainting results. When σ > 0.6, the difference between the target patch and a source patch is over-smoothed by the Gaussian filter, which yields more matching errors. When σ < 0.4, the result is poor. R(⋅, ⋅) is a rotation function that rotates each source patch to guarantee the consistency of the patterns in the matching results. Δθ is the rotation angle for each source patch; it is set to 20°, and the rotation range is from −90° to 90°. This value is chosen because a rotation angle that is too large or too small amplifies inaccurate matching results. In addition to color features, we also apply gradient features between the two patches in the proposed distance metric. DSSD is the sum of squared differences over the gradients of the two patches, and the gradient term is weighted by η.
The similarity metric takes the properties of textures and structures into account and also incorporates gradient information to make edges more prominent. Unlike previous methods, which find a single source patch in the source region, we find the n nearest neighbors of each target patch. In this search, we use a nearest neighbor field (NNF) [23], defined as a multi-valued function f(⋅) that maps each target patch coordinate to multiple source patch coordinates, so that the matching results are more accurate. The multi-valued mapping is as follows
$$ f\left({\Psi}_{\hat{p}}\right)={\Psi}_{q_i},i=1,2,\dots, n $$
(14)
We store the distance values between \( {\Psi}_{\hat{p}} \) and \( {\Psi}_{q_i} \) in an additional array. Then, we compare these distance values, computed by Eq. (13), through a competitive mechanism and select the optimal source patch for each target patch, that is, the one with the smallest distance value.
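The matching step of Eqs. (13) and (14) can be sketched in Python for gray-level patches as below. Several points are assumptions of this example rather than part of the method: rotation uses nearest-neighbor sampling with clamped borders, the Gaussian filter is a truncated separable kernel, the candidate angles are the multiples of 20° within [−90°, 90°] (that is, −80°, …, 80°, including 0°), the default `eta=1.0` is hypothetical, and τ is passed in as a precomputed number per candidate because its value stays fixed across rotations. The target patch is assumed to contain the initialized estimates, so all of its pixels take part in the comparison.

```python
import numpy as np

def rotate_nn(patch, angle_deg):
    """Rotate a square patch about its centre (nearest neighbour, clamped)."""
    n = patch.shape[0]
    c = (n - 1) / 2.0
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:n, 0:n]
    # Inverse mapping: source coordinate sampled for every output pixel.
    xr = np.cos(t) * (xs - c) + np.sin(t) * (ys - c) + c
    yr = -np.sin(t) * (xs - c) + np.cos(t) * (ys - c) + c
    xi = np.clip(np.rint(xr).astype(int), 0, n - 1)
    yi = np.clip(np.rint(yr).astype(int), 0, n - 1)
    return patch[yi, xi]

def gaussian_smooth(a, sigma):
    """Separable Gaussian filter with edge padding (kernel radius 3*sigma)."""
    r = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    p = np.pad(a, r, mode="edge")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, p, k, mode="valid")

def grad_ssd(a, b):
    """Sum of squared differences over the gradients of two patches."""
    return sum(np.sum((ga - gb) ** 2)
               for ga, gb in zip(np.gradient(a), np.gradient(b)))

def angle_aware_distance(target, source, tau, eta=1.0, sigma=0.5,
                         angles=range(-80, 81, 20)):
    """Eq. (13), minimised over the candidate rotation angles."""
    grad_term = eta * grad_ssd(target, source)   # uses the unrotated source
    best_d, best_angle = np.inf, None
    for angle in angles:
        diff = gaussian_smooth(target - rotate_nn(source, angle), sigma)
        d = tau * np.sum(diff ** 2) + grad_term
        if d < best_d:
            best_d, best_angle = d, angle
    return best_d, best_angle

def best_source_patch(target, candidates, taus, **kwargs):
    """Store the distances of the n candidates and return the smallest one."""
    dists = [angle_aware_distance(target, s, t, **kwargs)[0]
             for s, t in zip(candidates, taus)]
    return int(np.argmin(dists)), dists
```

Here `candidates` plays the role of the n patches returned by the multi-valued mapping of Eq. (14); how those candidates are gathered from the nearest neighbor field is outside the scope of this sketch.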
Updating the pixels of target patch
So far, we have found the best matching patch \( {\Psi}_{\hat{q}} \) for the target patch \( {\Psi}_{\hat{p}} \). In the last step, previous methods directly copy the intensity values of the pixels within \( {\Psi}_{\hat{q}} \) to the corresponding pixels of the unknown part of the target patch [13, 15, 27]. In contrast, we add the intensity values of the pixels in the optimal matching patch to the corresponding initialized pixels in the unknown part of the target patch and use their average values to fill the missing region within the target patch, which reduces the inconsistency of structures and textures. Then, we update the confidence values of the newly filled pixels as
$$ C(k)={C}_n(k),\forall k\in {\Psi}_{{\hat{p}}_{\Omega}} $$
(15)
The boundary δΩ is updated as well, and the whole process is repeated until the final result is obtained.
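The averaging update described in this subsection can be sketched in Python as follows. The function name `fill_target_patch` and the representation of the target patch as a pair of slices `win` are assumptions of this example; the image is assumed to already hold the initialized estimates in the missing region.

```python
import numpy as np

def fill_target_patch(image, unknown, win, source_patch):
    """Fill the unknown part of the target patch with the average of the
    initialized estimate and the best-matching source patch.

    image        : 2D array holding known pixels and initialized estimates
    unknown      : 2D boolean array, True for pixels still to be filled
    win          : (row_slice, col_slice) locating the target patch
    source_patch : best-matching patch, same shape as image[win]
    """
    out = image.astype(float)
    new_unknown = unknown.copy()
    m = unknown[win]                       # unknown pixels inside the patch
    tgt = out[win]                         # view into `out`
    tgt[m] = 0.5 * (tgt[m] + source_patch[m])
    new_unknown[win] = False               # these pixels are now filled
    return out, new_unknown
```

Known pixels inside the patch are left untouched, and the returned mask can be used to recompute the boundary δΩ for the next iteration.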