Image restoration based on sparse representation using feature classification learning

In image inpainting methods based on sparse representation, the adaptability of the over-complete dictionary has a great influence on the restoration result. If the over-complete dictionary cannot effectively reflect the differences between different local features, texture details may be lost, resulting in blurred or over-smoothed regions in restored images. In view of these problems, we propose an image restoration method based on sparse representation using feature classification learning. Firstly, we perform singular value decomposition on the local gradient vectors. According to the relationship between the main orientation and the secondary orientation, we classify all local patches into three categories: smooth patches, edge patches, and texture patches. Secondly, we use the K-Singular Value Decomposition (K-SVD) method to learn over-complete dictionaries adapted to the different features. Finally, we use the Orthogonal Matching Pursuit (OMP) method to calculate the sparse coding of target patches with different local features on their corresponding over-complete dictionaries, and use each over-complete dictionary and the corresponding sparse coding to restore the damaged pixels. A series of experiments on various restoration tasks shows the superior performance of the proposed method.


Introduction
Image restoration, also called image inpainting, originates from the restoration of damaged works of art [1,2]. Its main purpose is the automatic restoration of damaged images by computers [3]. For an image damaged by scratches, text overlays, holes, and so on, the computer automatically restores the damaged region according to certain principles or algorithms, so that the resulting image looks natural and people who are not familiar with the original image cannot notice the restoration traces [4]. So far, image restoration technology has been applied in many fields, such as the restoration of old photographs and precious documents, digital protection of cultural relics [5], and special effects in film and television production.

Shi et al [25] proposed to constrain dictionary atom selection by adding double regularization terms, but using only the consistency of known information cannot guarantee the similarity of unknown information, so problems such as the loss of texture details may remain. Newson et al [26] proposed a non-local patch-based image inpainting method: they used the PatchMatch algorithm to search for nearest neighbors, a weighted mean scheme to improve the comparison of textured patches, and initialization together with a multi-scale scheme to achieve more satisfactory solutions. Li et al [27] applied an operator splitting method to obtain a relaxed minimization problem with two variables, and then used a universal framework called the iterative coupled inpainting algorithm to restore damaged images.
By studying classical restoration methods based on sparse representation, we find that the adaptability of the over-complete dictionary has a great influence on the restoration result. If the over-complete dictionary cannot effectively reflect the differences between different local features, texture details may be lost, resulting in blurred or over-smoothed regions in restored images. In the proposed method, we perform singular value decomposition on the local gradient vectors of each image patch, estimate the local main orientation from the calculated singular values, and judge the local feature of the patch. The image patches are divided into three categories: smooth patches, edge patches, and texture patches. Then, in order to make the over-complete dictionaries adapt to the different features, we use image patches with the same local feature as one group of training samples and use the K-SVD method to learn a dictionary for each group; we therefore obtain three over-complete dictionaries. Finally, we use the Orthogonal Matching Pursuit (OMP) method to calculate the sparse coding of target patches with different local features on their corresponding over-complete dictionaries, and use each over-complete dictionary and the corresponding sparse coding to restore the damaged pixels.
The rest of this paper is organized as follows: In Section 2, we introduce the sparse representation model of images. In Section 3, we give a detailed description of the proposed method, including the method of classifying image patches, the method of learning dictionaries, and the method of restoring damaged image patches using the obtained over-complete dictionaries. The experiments and discussions are performed in Section 4. Finally, we conclude this work in Section 5.

Image sparse representation model
In sparse representation theory, using an over-complete dictionary D ∈ ℝⁿˣᴸ (L > n), a signal x can be represented as a linear combination of dictionary atoms [28]:

x = Dα,  (1)

where D is the over-complete dictionary, x ∈ ℝⁿ is the signal, and α = [α₁, α₂, ⋯, α_L]ᵀ is the sparse representation coefficient vector, which contains only a handful of non-zero elements.
In the image sparse representation model, it is assumed that there exists an over-complete dictionary on which the image can be represented as a linear combination of dictionary atoms. Therefore, the sparse representation model of an image is:

min_α ‖α‖₀  s.t.  x ≈ Dα,  (2)

where ‖⋅‖₀ represents the ℓ₀ norm. Under the condition x ≈ Dα, we seek the representation coefficient α with the fewest non-zero elements.
In the specific calculation process, the constraint is often converted into a penalty term, and Eq. (2) is transformed into the unconstrained optimization problem:

min_α ‖Dα − x‖₂² + λ‖α‖₀,  (3)

where ‖Dα − x‖₂² represents the reconstruction error and ‖α‖₀ represents the sparsity.
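Problems of the form of Eqs. (2)-(3) are typically attacked with a greedy pursuit such as OMP, which is also used later in this paper. As a rough illustration of the idea (a sketch, not the authors' implementation), a minimal OMP in Python/NumPy might look like:

```python
import numpy as np

def omp(D, x, sparsity):
    """Greedy Orthogonal Matching Pursuit: approximately solve
    min_a ||D a - x||_2  subject to  ||a||_0 <= sparsity,
    where D is n x L with (ideally) unit-norm columns."""
    n, L = D.shape
    residual = x.astype(float).copy()
    support = []
    alpha = np.zeros(L)
    for _ in range(sparsity):
        # pick the atom most correlated with the current residual
        correlations = np.abs(D.T @ residual)
        correlations[support] = -np.inf  # do not reselect atoms
        support.append(int(np.argmax(correlations)))
        # least-squares fit of x on the atoms chosen so far
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha[support] = coef
    return alpha
```

Each iteration adds one atom and refits all selected coefficients, so the residual never increases; the loop count plays the role of the sparsity constraint on ‖α‖₀.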

Proposed method
In recent years, because methods based on sparse representation can effectively avoid the mismatch between exemplar patches and target patches, researchers have conducted extensive and in-depth research on them. In such methods, the over-complete dictionary is one of the most important factors, and its quality has a great influence on the restoration results. The redundant DCT (Discrete Cosine Transform) dictionary is one of the most commonly used. It achieves good restoration for smooth images, but for texture-rich images it leads to the loss of texture details, and its restoration effect needs to be improved [29]. Shen et al [23] directly used undamaged patches to form an over-complete dictionary; although constructing the dictionary in this way is simple, the resulting dictionary has limited ability to restore the image.
In the proposed method, in order to further enhance the adaptability of the overcomplete dictionary, we classify the image patches into three categories according to the local features, and then use the K-SVD algorithm to train dictionaries separately, so that these dictionaries can better adapt to the image patches with different features. Finally, we use OMP method to calculate the sparse coding of target patches on their corresponding over-complete dictionaries, and use the over-complete dictionary and corresponding sparse coding to restore the damaged pixels. In this way, the adaptability of the over-complete dictionary can be effectively improved, and better restoration effect can be obtained.

Image patch classification
As we know, for smooth image patches, the gray values of the pixels are relatively close, while for non-smooth patches, the difference between the gray values of pixels is relatively large. Therefore, the variance of an image patch can effectively reflect its smoothness.
The local variance of an image patch is defined as:

σ² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)²,  (4)

where n is the number of pixels in the image patch and x̄ is the average of all pixels:

x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ.  (5)

Chang and Zhang EURASIP Journal on Image and Video Processing (2020) 2020:50, Page 4 of 18

In the proposed method, we first use the variance to classify image patches into smooth patches and non-smooth patches. We set a threshold; if the variance of an image patch is less than the threshold, the gray values of its pixels are relatively close, and we consider it a smooth patch. Otherwise, the differences between pixels are relatively large, and we consider it a non-smooth patch.
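The smooth/non-smooth split amounts to thresholding the per-patch variance of Eqs. (4)-(5). A minimal sketch (the function name and the threshold value are illustrative assumptions, not values from the paper):

```python
import numpy as np

def is_smooth(patch, threshold=25.0):
    """Classify a patch as smooth if its gray-value variance
    sigma^2 = (1/n) * sum_i (x_i - mean)^2 is below a threshold.
    The threshold here is an illustrative choice, not the paper's."""
    values = np.asarray(patch, dtype=float).ravel()
    variance = np.mean((values - values.mean()) ** 2)
    return variance < threshold
```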
In addition, we noticed that in non-smooth patches, there are two kinds of image patches with different local features: edge patches and texture patches. However, the ability of local variance to discriminate these two types of image patches is limited. Therefore, we cannot distinguish edge patches from texture patches using the local variance.
How can we effectively distinguish edge patches from texture patches? By observing these two kinds of image patches, we find that the local main orientations of edge patches and texture patches are quite different. As shown in Fig. 1, (a) and (b) are edge patches, (c) and (d) are texture patches, they are all 8 × 8 image patches. In order to display clearly, we magnified them by 10 times in equal proportion. As can be seen from Fig. 1, in edge patches, the consistency of main orientations is clear and obvious, while in texture patches, the consistency of main orientations is relatively blurred.
Based on the above analysis, in the proposed method we use principal component analysis to estimate the local main orientation of an image patch. According to the relationship between the main orientation and the secondary orientation, edge patches and texture patches can be distinguished. If the difference between the main orientation and the secondary orientation is relatively large, that is, the consistency of main orientations is clear and obvious, we consider the patch an edge patch. Otherwise, the difference between the main orientation and the secondary orientation is relatively small, the consistency of main orientations is blurred, and we consider the patch a texture patch.
For an image patch f(x, y), its local main orientation should be orthogonal to the gradient vectors gᵢ = ∇f(xᵢ, yᵢ) of all pixels in the patch. Therefore, estimating the main orientation can be transformed into finding a unit vector a that minimizes its inner products with the gᵢ, that is, we need to solve the following problem [30]:

min_a Σᵢ (aᵀgᵢ)²  s.t.  ‖a‖ = 1,  (6)

which can be written as

min_a aᵀCa  s.t.  ‖a‖ = 1,  with  C = Σᵢ gᵢgᵢᵀ.  (7)

It can be seen from Eq. (7) that the unit vector a that minimizes Eq. (6) is the eigenvector corresponding to the minimum eigenvalue of the matrix C. For an image patch f(x, y) with n pixels, we stack the gradient vectors of all pixels into an n × 2 matrix G:

G = [g_{x₁} g_{y₁}; g_{x₂} g_{y₂}; ⋯; g_{xₙ} g_{yₙ}],  (8)

where g_{xᵢ} and g_{yᵢ} are the components of gᵢ. We perform singular value decomposition on the matrix G:

G = U S Vᵀ.  (9)

By the decomposition, we obtain two singular values s₁ and s₂ (s₁ ≥ s₂): s₁ represents the energy in the main orientation, and s₂ represents the energy orthogonal to the main orientation. Therefore, edge patches and texture patches can be distinguished according to the difference between s₁ and s₂. Let:

r = (s₁ − s₂)/(s₁ + s₂).  (10)

We set a threshold β. If r > β, the difference between s₁ and s₂ is relatively large, that is, the consistency of main orientations is clear and obvious, and we judge the current patch to be an edge patch. Otherwise, the difference between s₁ and s₂ is relatively small, the consistency of main orientations is blurred, and we judge the current patch to be a texture patch.
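The edge/texture test above can be sketched as follows. The gradient filter (simple finite differences), the ratio r = (s₁ − s₂)/(s₁ + s₂), and the threshold β = 0.5 are illustrative assumptions consistent with the text, not the paper's exact choices:

```python
import numpy as np

def orientation_ratio(patch):
    """Stack per-pixel gradients into an n x 2 matrix G, take its
    singular values s1 >= s2, and return r = (s1 - s2)/(s1 + s2).
    r near 1: one dominant orientation (edge-like);
    r near 0: no dominant orientation (texture-like)."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)  # simple finite differences
    G = np.column_stack([gx.ravel(), gy.ravel()])
    s = np.linalg.svd(G, compute_uv=False)  # s[0] >= s[1]
    return (s[0] - s[1]) / (s[0] + s[1] + 1e-12)

def classify_nonsmooth(patch, beta=0.5):
    """Judge a non-smooth patch: edge if r > beta, else texture.
    beta is an illustrative threshold, not the paper's value."""
    return "edge" if orientation_ratio(patch) > beta else "texture"
```

A vertical gray-value ramp has all gradients aligned (r ≈ 1), while a checkerboard spreads its gradient energy equally over both axes (r ≈ 0), matching the edge/texture intuition in the text.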
For example, we use the proposed classification method to divide the image patches of two images (standard test image Lena and House) into three categories, and the results are shown in Fig. 2, where (a) shows the obtained smooth patches, (b) shows the obtained edge patches, and (c) shows the obtained texture patches. It should be mentioned that there are many image patches for each type. In order to facilitate display, we only show 100 image patches for each type. Besides, we also show the classification results of the whole image in (d), where black indicates smooth patches, gray indicates edge patches, and white indicates texture patches.
From Fig. 2 we can see that, using the above methods, we can effectively classify the image patches into three categories: smooth patches, edge patches, and texture patches. In the image patches in (a), the first column of Fig. 2, the gray values of the pixels are relatively close, so they are judged as smooth patches. In the image patches in (b), the second column of Fig. 2, the consistency of main orientations is clear and obvious, so they are judged as edge patches. In the image patches in (c), the third column of Fig. 2, although the gray values of the pixels differ considerably, the consistency of main orientations is not obvious, so they are judged as texture patches. Besides, as can be seen from the classification results for the entire image in (d), the black patches (smooth patches) lie in the smooth areas of the image, the gray patches (edge patches) lie in the edge areas, and the white patches (texture patches) lie in the texture areas. These results verify the effectiveness of the classification method used in this paper.

Dictionary learning
After classifying image patches according to local features, we take image patches with the same features as a group and learn the over-complete dictionary respectively, so that the obtained over-complete dictionaries can be adapted to image patches with different local features.
At present, the K-SVD algorithm is the most commonly used method for learning dictionaries. Given a set of signal samples X = {x₁, x₂, ⋯, x_N} ∈ ℝⁿˣᴺ, we need to find an over-complete dictionary D so that X can be sparsely represented on D, i.e.:

min_{D,α} ‖X − Dα‖²  s.t.  ∀i, ‖αᵢ‖₀ ≤ T₀,  (11)

where T₀ is the sparsity constraint. The difference between Eq. (11) and Eq. (2) is that in Eq. (2) the dictionary is known and only the coefficients are unknown, while in Eq. (11) both the dictionary and the coefficients are unknown. Therefore, solving Eq. (11) consists of two stages, sparse coding and dictionary updating, in which the dictionary atoms and coefficients are updated iteratively. Suppose X_s = {xᵢ | i ∈ Ω_s} represents a set of training samples randomly extracted from smooth patches, X_e = {xᵢ | i ∈ Ω_e} a set randomly extracted from edge patches, and X_t = {xᵢ | i ∈ Ω_t} a set randomly extracted from texture patches. Using the K-SVD algorithm to learn over-complete dictionaries adapted to the different local features can then be described as follows:

min_{D_s,α_s} ‖X_s − D_sα_s‖²  s.t.  ∀i, ‖αᵢ‖₀ ≤ T₀,  (12)

min_{D_e,α_e} ‖X_e − D_eα_e‖²  s.t.  ∀i, ‖αᵢ‖₀ ≤ T₀,  (13)

min_{D_t,α_t} ‖X_t − D_tα_t‖²  s.t.  ∀i, ‖αᵢ‖₀ ≤ T₀.  (14)

In the learning processes of Eqs. (12), (13), and (14), each over-complete dictionary requires an initial value. Some methods use the DCT dictionary as the initial value. In the proposed method, we randomly select image patches from Ω_s, Ω_e, and Ω_t, respectively, to form the initial dictionaries D_s, D_e, and D_t, so as to accelerate the convergence of the algorithm. The learning process is iterative and consists of two stages: sparse coding and dictionary updating. In the sparse coding stage, we use the OMP algorithm to calculate the representation coefficients of each image patch on the current over-complete dictionary. In the dictionary updating stage, dictionary atoms and representation coefficients are updated at the same time.
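The alternation described above can be sketched as a toy K-SVD. The per-atom rank-1 SVD update follows the standard K-SVD formulation; the sparsity level `t0`, iteration counts, and function names are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def omp_codes(D, X, t0):
    """Sparse-code every column of X on D with a small OMP, sparsity t0."""
    A = np.zeros((D.shape[1], X.shape[1]))
    for j in range(X.shape[1]):
        x, r, support = X[:, j], X[:, j].astype(float).copy(), []
        for _ in range(t0):
            c = np.abs(D.T @ r)
            c[support] = -np.inf
            support.append(int(np.argmax(c)))
            coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
            r = x - D[:, support] @ coef
        A[support, j] = coef
    return A

def ksvd(X, n_atoms, t0=2, n_iter=10, seed=0):
    """Toy K-SVD: alternate OMP sparse coding with per-atom rank-1
    updates via SVD of the restricted residual. Initial atoms are
    randomly selected training samples, as in the paper."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12
    for _ in range(n_iter):
        A = omp_codes(D, X, t0)
        for k in range(n_atoms):
            users = np.flatnonzero(A[k])  # patches that use atom k
            if users.size == 0:
                continue
            # residual without atom k, restricted to patches using it
            E = X[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]           # updated unit-norm atom
            A[k, users] = s[0] * Vt[0]  # updated coefficients
    return D, A
```

Each rank-1 update is the best replacement for atom k given all other atoms, so the overall fit never degrades from one sweep to the next.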
After the learning process, we obtain three dictionaries adapted to different characteristics: smooth dictionary D s , edge dictionary D e , and texture dictionary D t . In Fig. 3, we show three dictionaries learned from the standard test image Lena using the above method, where (a) is smooth dictionary D s , (b) is edge dictionary D e , and (c) is texture dictionary D t .

Image patch restoration
After obtaining the three over-complete dictionaries, we can use them to restore image patches with different features. Suppose x is the current damaged image patch and M is the binary mask, in which one indicates the pixels that need to be restored and zero indicates the already existing pixels. The sparse coding of x on the corresponding over-complete dictionary can be described as:

min_α ‖D′α − x′‖₂²  s.t.  ‖α‖₀ ≤ T₀.  (15)

In Eq. (15), since the image patch x includes damaged pixels, in order to restore them we extract the effective (undamaged) information from x to form x′. Accordingly, the corresponding entries of each atom are extracted from the dictionary D to form D′.
In the proposed method, image patches with different features are sparsely coded on different dictionaries. Therefore, the sparse coding of smooth, edge, and texture patches can be described as:

min_{α_s} ‖D_s′α_s − x_s′‖₂²  s.t.  ‖α_s‖₀ ≤ T₀,  (16)

min_{α_e} ‖D_e′α_e − x_e′‖₂²  s.t.  ‖α_e‖₀ ≤ T₀,  (17)

min_{α_t} ‖D_t′α_t − x_t′‖₂²  s.t.  ‖α_t‖₀ ≤ T₀.  (18)

We use the OMP algorithm to solve Eqs. (16), (17), and (18) and obtain the corresponding sparse representation coefficients α_s, α_e, and α_t.
Finally, we use the over-complete dictionaries D_s, D_e, and D_t and the corresponding representation coefficients α_s, α_e, and α_t to restore the damaged image patches. The restoration of smooth, edge, and texture patches is described as follows:

x_s^i = { x_s^i, i ∉ M;  (D_sα_s)^i, i ∈ M },  (19)

x_e^i = { x_e^i, i ∉ M;  (D_eα_e)^i, i ∈ M },  (20)

x_t^i = { x_t^i, i ∉ M;  (D_tα_t)^i, i ∈ M },  (21)

where i indicates the location of a pixel in the image patch. If the current pixel is undamaged, it remains unchanged; otherwise, it is restored using the dictionary and the corresponding coefficients.
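The restoration step amounts to sparse-coding on the known rows of the dictionary and copying the reconstruction into the masked rows. An illustrative sketch under these assumptions (the function name and sparsity level are hypothetical, not the authors' code):

```python
import numpy as np

def restore_patch(x, mask, D, t0=4):
    """Restore the masked pixels of patch vector x. mask==True marks
    damaged pixels. We sparse-code the known rows of x on the known
    rows of D with a small OMP, then fill x[mask] from D @ alpha."""
    mask = np.asarray(mask, dtype=bool)
    known = ~mask
    Dk, xk = D[known], x[known].astype(float)
    r, support = xk.copy(), []
    alpha = np.zeros(D.shape[1])
    for _ in range(t0):
        c = np.abs(Dk.T @ r)
        c[support] = -np.inf
        support.append(int(np.argmax(c)))
        coef, *_ = np.linalg.lstsq(Dk[:, support], xk, rcond=None)
        r = xk - Dk[:, support] @ coef
    alpha[support] = coef
    out = x.astype(float).copy()
    out[mask] = (D @ alpha)[mask]  # known pixels stay unchanged
    return out
```

Because the coefficients are estimated only from undamaged pixels, the full reconstruction D·α provides a plausible value for each masked pixel, exactly as in Eqs. (19)-(21).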

Experimental results and discussion
In order to verify the feasibility and effectiveness of the proposed method, we selected six standard test images for the experiments, as shown in Fig. 4, where (a) is Baboon, (b) is Barbara, (c) is Cameraman, (d) is Couple, (e) is House, and (f) is Lena. We performed simulation experiments on three tasks: removing text coverage, removing scratches, and filling holes. For comparison and analysis, we restored each damaged image using the BSCB method [7], TV method [8], DCT method, K-SVD method [19], NLPB (Non-Local Patch-Based image inpainting) method [26], and the proposed method, respectively. The experimental environment is Matlab 2014 on a computer with an Intel(R) Core(TM) i5-6200U 2.3 GHz processor and 4 GB of memory. The main parameters are set as follows: the size of the image patches is 8 × 8, and the size of all over-complete dictionaries is 64 × 256. The numbers of iterations of the BSCB method and the TV method are set to 2000 and 1000, respectively.
In order to objectively and quantitatively compare the restored images, we compare the results of each method in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM). On the one hand, PSNR is calculated from the differences between the gray values of two images and is often used to measure the degree of signal distortion. It reflects the difference between corresponding pixels of the restored image and the original image, so we use it to measure the restoration effect of each method.
Assuming that f(x, y) and f̂(x, y) are the original and restored pixels respectively, for an M × N image the PSNR is defined as:

PSNR = 10 log₁₀(255²/MSE),  (22)

where MSE is the Mean Square Error:

MSE = (1/(MN)) Σₓ Σ_y [f(x, y) − f̂(x, y)]².  (23)

On the other hand, SSIM measures the similarity of the structural information of two images from three aspects: luminance, contrast, and structure. The SSIM is defined as:

SSIM(f, f̂) = [l(f, f̂)]^α [c(f, f̂)]^β [s(f, f̂)]^γ,  (24)

where α > 0, β > 0, and γ > 0 adjust the weights of the three parts. l(f, f̂) is the luminance function, defined as:

l(f, f̂) = (2u_f u_f̂ + C₁)/(u_f² + u_f̂² + C₁),  C₁ = (K₁L)²,  (25)

where L is the dynamic range of the pixel values (255 for gray images), K₁ ≪ 1 is a constant, and u_f and u_f̂ are the average luminances of the two images.
c(f, f̂) is the contrast function, defined as:

c(f, f̂) = (2σ_f σ_f̂ + C₂)/(σ_f² + σ_f̂² + C₂),  C₂ = (K₂L)²,  (26)

where σ_f and σ_f̂ are the standard deviations of the two images, and K₂ ≪ 1 is a constant.
s(f, f̂) is the structure function, defined as:

s(f, f̂) = (σ_{f f̂} + C₃)/(σ_f σ_f̂ + C₃),  (27)

where σ_{f f̂} is the covariance of the two images and C₃ = C₂/2.
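With α = β = γ = 1 and C₃ = C₂/2, Eqs. (24)-(27) collapse to the familiar two-term SSIM expression. Both metrics can be sketched as follows; note this computes SSIM in a single global window for illustration, whereas standard SSIM implementations average over local windows:

```python
import numpy as np

def psnr(f, g):
    """PSNR in dB for 8-bit images, Eqs. (22)-(23)."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    mse = np.mean((f - g) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def ssim_global(f, g, K1=0.01, K2=0.03, L=255.0):
    """Single-window SSIM with alpha = beta = gamma = 1 and
    C3 = C2/2, which reduces Eqs. (24)-(27) to two factors."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_f, mu_g = f.mean(), g.mean()
    var_f, var_g = f.var(), g.var()
    cov = np.mean((f - mu_f) * (g - mu_g))
    return ((2 * mu_f * mu_g + C1) * (2 * cov + C2)) / \
           ((mu_f ** 2 + mu_g ** 2 + C1) * (var_f + var_g + C2))
```

For identical images SSIM equals 1, and a uniform brightness shift of s gray levels gives PSNR = 20 log₁₀(255/s), which is a convenient sanity check.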

Removing text coverage
In some shared pictures or documents, special marks or text are often added, which will affect the use of pictures or documents to a certain extent. Therefore, we often need to remove these marks or text from images. In this group of experiments, we artificially generate some texts and add them to each image, obtaining the damaged images we are going to restore, as shown in Fig. 5. We use each method to remove the covered text from these images. Finally, we respectively calculate the PSNR and SSIM between the original images and the restored images to quantitatively compare the performance of each method. The PSNR of each method is shown in Table 1, where the maximum PSNR value of each image is indicated in bold.
The SSIM of each method is shown in Table 2, where the maximum SSIM value of each image is indicated in bold. It should be noted that because the value of SSIM is between 0 and 1, we reserve five decimal places for better differentiation.
In Table 1, the BSCB, DCT, and NLPB methods did not obtain the highest PSNR value on any image. The TV method obtained the highest PSNR value on Cameraman and Couple, and the K-SVD method on Baboon and Barbara. The proposed method obtained the highest PSNR value on three images (Couple, House, and Lena). In Table 2, the proposed method obtained the highest SSIM value on all images. It can be seen from the data in Tables 1 and 2 that our method achieves better restoration effects than the other methods. The reason is that our method classifies the sample patches and learns a corresponding over-complete dictionary for each class, so that the over-complete dictionaries can adapt to different local features. Thus the proposed method can better restore the texture and structure information of the damaged image.

Removing scratches
In the old photos or precious historical documents, some scratches often appear due to long-term use or limitations of the preservation environment. These scratches have a great influence on the historical value of photos or materials. Therefore, we often need to remove scratches from images. In this group of experiments, we artificially generate broken scratches and add them to each image, obtaining the damaged images we are going to restore, as shown in Fig. 6. We use each method to remove the scratches from the images. Finally, we respectively calculate the PSNR and SSIM between the original images and the restored images to quantitatively compare the performance of each method. The PSNR of each method is shown in Table 3, where the maximum PSNR value of each image is indicated in bold.
The SSIM of each method is shown in Table 4, where the maximum SSIM value of each image is indicated in bold. It should be noted that because the value of SSIM is between 0 and 1, we reserve five decimal places for better differentiation.
As can be seen from Table 3, the BSCB method and the DCT method did not obtain the highest PSNR value on any image. The TV method obtained the highest PSNR value on Couple, and the proposed method on Barbara. The K-SVD method obtained the highest PSNR value on Baboon and Lena, and the NLPB method on Cameraman and House. As can be seen from Table 4, the BSCB, TV, and DCT methods did not obtain the highest SSIM value on any image. The K-SVD method obtained the highest SSIM value on Lena, and the NLPB method on Cameraman. However, the proposed method obtained the highest SSIM value on four images (Baboon, Barbara, Couple, and House). This shows that although our method does not obtain the highest PSNR value on some images, it recovers the most similar structural information on most images and can effectively restore the texture and edge structure of damaged images.

Filling holes
The long-term preservation or repeated use of photos or documents can cause some pixels to be lost, forming holes in the photos or documents. Therefore, we often need to fill the holes in images. In this group of experiments, we artificially generate some holes and add them to each image, obtaining the damaged images to restore, as shown in Fig. 7. We use each method to fill the holes in the images. Finally, we calculate the PSNR and SSIM between the original images and the restored images to quantitatively compare the performance of each method. The PSNR of each method is shown in Table 5, where the maximum PSNR value for each image is indicated in bold.
The SSIM of each method is shown in Table 6, where the maximum SSIM value of each image is indicated in bold. It should be noted that because the value of SSIM is between 0 and 1, we reserve five decimal places for better differentiation.
In Table 5, the BSCB method and the K-SVD method did not obtain the highest PSNR value on any image. The DCT method obtained the highest PSNR value on Baboon, the NLPB method on Barbara, the TV method on Cameraman and Couple, and the proposed method on House and Lena. In Table 6, the BSCB, TV, DCT, and NLPB methods did not obtain the highest SSIM value on any image. The K-SVD method obtained the highest SSIM value on Baboon, Barbara, and Lena, and the proposed method on Cameraman, Couple, and House. It can be seen from the data in Tables 5 and 6 that, compared with the other methods, the proposed method obtains better results, retains more texture and structure information, and makes the repaired image more consistent with the requirement of visual consistency.

Discussion
Based on the data in Tables 1-6, we calculated the average PSNR and SSIM values for each method, as shown in Table 7. For a more intuitive comparison, we display the data in Table 7 graphically in Fig. 8 and Fig. 9.
It can be seen from Table 7, Fig. 8 and Fig. 9 that the PSNR and SSIM of the first three methods (BSCB method, TV method, and DCT method) are smaller than those of the latter three methods (K-SVD method, NLPB method, and proposed method). The first three methods are original and classic image restoration methods. The latter three methods are improved image restoration methods proposed by researchers. In particular, the proposed method achieves the highest PSNR and SSIM among all methods.
The PSNR and SSIM of BSCB method are the lowest, which means that its restoration effect is worse than other methods. The reason is that it spreads the effective information smoothly to the inside of the damaged area according to the direction of the isophotes, which will lead to the loss of texture details. However, as the most classical method of image restoration, it provides the idea of automatic image restoration, which makes image restoration become a research hotspot of computer vision.
The restoration effect of the TV method is better than that of the BSCB method because, by solving a Partial Differential Equation, the effective information is diffused anisotropically, which improves the restoration effect to a certain extent.
As a simple method based on sparse representation, the PSNR and SSIM of DCT method are smaller than those of K-SVD method. The reason is that although the construction of DCT dictionary is relatively simple, its adaptability is poor and it cannot adapt to different features of images.
Compared with the first three methods, the PSNR and SSIM of the latter three methods are higher. The reason is that these three methods have improved the classic method from different aspects and improved the effect of image restoration. The K-SVD method has achieved higher PSNR values and obtained a better restoration effect. The reason is that it uses the sample patches extracted from the original image to learn the over-complete dictionary, which can improve the ability of the over-complete dictionary to represent the natural image, so that it can adapt to the images of different features and improve the reconstruction effect. The NLPB method also obtained better restoration result. The reason is that it improves the patch-based restoration method from many aspects. It uses PatchMatch algorithm to search for nearest neighbors, uses a weighted mean scheme to improve the comparison of textured patches, and uses initialization and a multi-scale scheme to achieve more satisfactory solutions. Based on the above improvements, it has also achieved higher PSNR and SSIM.
Compared with other methods, the proposed method has achieved good restoration effect, and the PSNR and SSIM are the highest of all methods. The reason is that, we use local features to divide the image patches into different classes, and then learn the corresponding over-complete dictionary according to different classes, which can further improve the adaptability of the over-complete dictionary, so the image patch can be better sparsely expressed. Therefore, texture or structure information can be better preserved in the restored image.
In addition, it should be noted that in recent years, with the rapid development of deep learning technology, researchers have introduced deep learning into the field of image inpainting and proposed methods based on deep neural networks, which provide a new and broad direction for image inpainting research [31]. Yu et al [32] proposed a coarse-to-fine generative image inpainting method with a novel contextual attention module, which significantly improves inpainting results by learning feature representations for explicitly matching and attending to relevant background patches. Sagong et al [33] proposed a network structure called PEPSI, consisting of a single shared encoding network and a parallel decoding network, to reduce the number of convolution operations. Zeng et al [34] proposed a Pyramid-context ENcoder Network (PEN-Net) for image inpainting with deep generative models; they used a pyramid-context encoder to progressively learn region affinity by attention from a high-level semantic feature map and to transfer the learned attention to the preceding low-level feature maps. Besides, seven papers on deep-learning-based image inpainting methods appeared at CVPR 2020. It should also be mentioned that in this article we did not compare our method with deep-learning-based methods, because the basic ideas of the two types of methods are different.
The method based on sparse representation calculates the sparse coding of the damaged patch on the over-complete dictionary and then reconstructs the damaged patch using the sparse coding and the over-complete dictionary. The method based on deep learning, in contrast, uses a large number of real images to train a generative model and a discriminative model, so that the deep network learns the feature distribution of real images, and then uses the generative model to generate the content of the damaged region automatically. However, we have realized that deep neural networks, especially generative adversarial networks, have very powerful capabilities in image restoration. Therefore, we have studied many references and started in-depth research on restoration methods based on deep learning, hoping to further improve the restoration effect for large-scale damaged images.

Conclusions
In order to improve the adaptability of the over-complete dictionary and thereby the restoration effect of sparse-representation-based methods, we proposed a novel method based on feature classification learning. We perform singular value decomposition on the local gradient vectors of each image patch and, according to the relationship between the main orientation and the secondary orientation, effectively distinguish edge patches from texture patches. Then, using sample patches with different features, we learn over-complete dictionaries adapted to those features. Finally, we use the obtained over-complete dictionaries to sparsely encode and reconstruct the damaged image patches. Simulation results demonstrate the feasibility and effectiveness of our method. In future work, we will conduct in-depth research on inpainting methods based on generative adversarial networks to further improve the restoration effect for object removal.