Single image super-resolution by directionally structured coupled dictionary learning

Abstract

In this paper, a new algorithm based on coupled dictionary learning with mapping functions is proposed for the problem of single-image super-resolution. Dictionaries are designed for a set of clustered data. The training data is structured into nine clusters based on the correlation between the data patches and already developed directional templates. The invariance of the sparse representations is assumed for the task of super-resolution. For each cluster, a pair of high-resolution and low-resolution dictionaries is designed along with their mapping functions. This coupled dictionary learning with a mapping function helps in strengthening the invariance of sparse representation coefficients across resolution levels. During the reconstruction phase, a given low-resolution patch is coded with each of the directional clustered dictionaries, and the cluster that gives the least sparse representation error is selected. Then, the pair of dictionaries and the mapping functions of that cluster are used for the high-resolution patch approximation. The proposed algorithm is compared with earlier work, including the currently top-ranked super-resolution algorithm. With the proposed mechanism, the recovery of directional fine features becomes prominent.

1 Introduction

Super-resolution (SR) has been an active area of research for some years because of the demand for high-resolution (HR) images in many applications. HR images can, of course, be captured directly with a high-definition (HD) camera. In some applications, however, installing such a camera is not yet practical (e.g., because of the limited capacity of the data channel) or simply not cost-efficient, as in particular contexts of computer vision, medical imaging, or satellite imaging.

Recently proposed image representation approaches sometimes use sparse representation models for storage or transmission reasons. According to the so-called Sparseland model [1], a set of signals called the dictionary is created for linearly representing the signals of interest. These dictionaries are designed by selecting image patches from a set of natural images and iteratively minimizing the representation error. Sparsity is used as the regularizing technique (for achieving SR image representations) by enforcing the concept that low-resolution (LR) projections are preserved in linear relations of their HR counterparts [2].

Earlier dictionary learning algorithms for super-resolution focused on learning separate HR and LR dictionaries. In [3], the authors propose a joint dictionary learning mechanism for learning HR and LR dictionaries in a joint feature space, thus enforcing the similarity between HR and LR sparse coefficients. At the image reconstruction stage, the authors assume the invariance of sparse coefficients for HR and LR patches. In [4], the authors propose a multi-scale dictionary learning approach in which wavelets are used for the analysis of the LR images and dictionaries are learned at different resolution levels. By doing so, they design compact dictionaries at different resolution levels and reduce the computational cost. In [5], the authors propose multi-scale dictionary learning with local and non-local priors for the task of single-image super-resolution. These priors are used to recover SR images by suppressing artifacts and estimating the required HR image pixels. In more recent work, however, authors propose classifying the training data based on scale-invariant features and learning class-dependent dictionaries instead of a single universal dictionary or multi-scale ones.

Related work includes Dong et al. [6], who divide the training data into different sets by the k-means algorithm and then apply dictionary learning to obtain compact dictionaries. In [7], Feng et al. use k-subspace clustering to divide the data into different subspaces and then learn dictionaries from those subspaces in a shared-bases manner. More recently, Yu et al. [8] consider the design of structural dictionaries: they take orthogonal bases for the dictionary atoms and design structured dictionaries from those bases. Yang et al. [9] propose the use of multiple patch-based clustered dictionaries instead of a single universal one. They study the geometric properties of image patches, cluster the patches according to these properties, and train a dictionary on each cluster. In [10], the authors design nine LR directional dictionaries for solving the single-image super-resolution (SISR) problem. There, the LR dictionaries are learned by the K-SVD algorithm [1], and the HR dictionaries are obtained by solving a pseudo-inverse problem. It is important to note that, despite clustering the data with directional templates, the dictionary learning process in [10] has no coupling between the HR and LR sparse representation coefficients. This is a limitation because the SISR problem depends on the invariance of the sparse coefficients. The idea of single dictionary learning with no coupling between the sparse coefficients has already been superseded by [11, 12].

In [12], the authors propose a coupled dictionary learning mechanism for training the HR and LR dictionaries. In this setup, the sparse coefficients of the HR and LR patches are used alternately: in each iteration, one set of sparse coefficients (either HR or LR) is chosen and used to update both the HR and LR dictionaries. In doing so, the authors achieve a slight improvement in forcing the HR and LR sparse coefficients to be the same and thus produce results on par with the state-of-the-art algorithm published in [11].

In this paper, the basic idea of Yang et al. [11] is adopted for the task of SISR: the HR and LR patches are assumed to have the same sparse coefficients. Instead of a single pair of dictionaries as in [11] and [12], multiple directional dictionaries are proposed, as in [10]. The training data is divided into eight directional clusters and one non-directional cluster by correlating the training patches with already developed directional templates. It is shown that these templates are helpful in creating compact and directional dictionaries.

For each cluster, a pair of compact directional dictionaries is designed along with their mapping functions. In the image recovery stage, each patch at hand is recovered with each cluster dictionary by calculating the sparse representation and using the already designed HR dictionary and mapping functions. Then, based on the sparse representation error, the proper dictionary pair is selected along with its mapping matrix. Sparse coefficients are calculated for the LR patch using the selected LR dictionary and mapping matrix. Then, HR patches are reconstructed from the sparse representation with the corresponding HR dictionary and mapping matrix. This clustering mechanism, along with the mapping function paradigm, allows us to super-resolve patches with high-frequency components. Experimental results show that the proposed algorithm is on par with the existing state-of-the-art algorithms and shows improvement in recovering images with directional fine features.

The rest of the paper is structured as follows. Section 2 presents the super-resolution via sparse representations. Section 3 describes the proposed algorithm. Section 4 reports simulations. Section 5 concludes. Section 5.1 gives future recommendations.

2 Image super-resolution

SISR is an ill-posed problem, so researchers have tried to regularize the solution process. Sparsity has recently been proposed as a very effective regularizer. Sparse representations have the useful property of being invariant (to some extent) to scale under resolution blur [11]. Using sparsity as a regularizer, one can therefore estimate HR images from LR images by exploiting the scale invariance of the sparse coefficients.

Let \(x_{H}\) be the HR signal vector extracted from an HR image as a 2-D patch and then vectorized into column form. Let \(D_{H}\) be the corresponding HR dictionary whose columns represent atoms. This signal vector can be represented sparsely as \(x_{H} \approx D_{H}\alpha_{H}\), where \(\alpha_{H}\) is the sparse coefficient vector of the HR signal vector, with only a few non-zero elements.

Let \(x_{L}\) be the corresponding LR signal vector, extracted in the same manner after blurring and down-sampling the HR images. The sparse representation of this LR signal vector is \(x_{L} \approx D_{L}\alpha_{L}\), where \(D_{L}\) is the dictionary for the LR signal vector and \(\alpha_{L}\) is its sparse coefficient vector.

The LR signal vectors are generated by blurring and down-sampling the HR images. Let ψ be this blurring and down-sampling operator applied on HR images to generate the LR signal vectors. Using this operator we can relate the HR and LR signal vectors. This operation is expressed in Eq. 1. This concept also extends to the sparse representation of HR and LR signal vectors. Considering the invariance of the sparse coefficients due to resolution blur, we can also relate the HR and LR dictionaries by the same operator. This operation is expressed in Eq. 2.

$$ {x}_{L} \approx \psi {x}_{H}, $$
(1)
$$ {D}_{L} \approx \psi {D}_{H}. $$
(2)

See [11]. It follows that

$$ {x}_{L} \approx \psi {x}_{H} \approx \psi {D}_{H}\alpha_{H} \approx {D}_{L}\alpha_{L}. $$
(3)

From Eq. 3, it is concluded that \(\alpha_{H} \approx \alpha_{L}\).

This is the key idea behind solving the SISR problem by sparse representations. If \(x_{L}\), \(D_{L}\), and \(D_{H}\) are given, then one can calculate \(\alpha_{L}\) with a vector selection algorithm, using either greedy or relaxation methods. Finally, the HR patches can be estimated by

$$ {x}_{H} \approx {D}_{H}\alpha_{L}. $$
(4)
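The recovery step of Eq. 4 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: orthogonal matching pursuit (OMP) is used as one example of a greedy vector selection method, and the function names and the sparsity level `k` are hypothetical choices that do not come from the paper.

```python
import numpy as np

def omp(D, x, k):
    """Greedy sparse coding: orthogonal matching pursuit with at most k atoms."""
    support, coef = [], np.zeros(0)
    residual = x.astype(float)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def super_resolve_patch(x_L, D_L, D_H, k=3):
    """Eq. 4: code the LR patch over D_L, then reuse the coefficients with D_H."""
    alpha_L = omp(D_L, x_L, k)
    return D_H @ alpha_L
```

The HR estimate is only as good as the assumption \(\alpha_{H} \approx \alpha_{L}\); the coupled training of Section 3 is meant to strengthen exactly this invariance.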

3 The proposed method

This proposal addresses dictionary learning and image reconstruction by multiple dictionary learning and selective sparse coding, and it is outlined in the algorithms presented in Figs. 1 and 2. During the dictionary learning process, a set of directionally structured dictionaries is learned along with a non-directional one. These learned dictionaries along with their inherent mapping functions are used for the reconstruction of the desired images.

Fig. 1 Proposed dictionary learning algorithm

Fig. 2 Proposed image reconstruction

3.1 The proposed dictionary learning algorithm

In the training phase, patches are extracted from a number of natural images taken from the set provided by Yang et al. [11]. These images are rich in high-frequency content and are therefore suitable for dictionary training. To obtain the training set, the LR counterparts of the HR images are first generated by blurring and down-sampling. For convenience, these LR images are then up-scaled by bicubic interpolation to match the dimensions of the HR images; the results are called mid-resolution (MR) images. Patches are extracted from the HR and MR images at the same spatial locations and classified into nine clusters. The patch templates for clustering are designed with eight directional orientations that cover the two-dimensional image space, y={0°,22.5°,45°,67.5°,90°,112.5°,135°,157.5°}, and each directional template is considered with all possible shifts. These orientations were selected after various tests and experiments aimed at the best performance. The current angular spacing between the templates is 22.5°. A larger spacing would give fewer clusters and lower performance; a smaller spacing would give more clusters and thereby a higher computational cost at the image recovery stage. Some of the directional templates are shown in Fig. 3 along with their shifted versions.

Fig. 3 Samples of some directional templates showing 0°, 90°, and 45° orientations

The patches are extracted and assigned to these directional template clusters according to the correlation between a given patch and each template. The decision uses a threshold chosen empirically from a histogram of the correlation values. After evaluating results for different patch sizes and numbers of training samples, the threshold value 0.69 was selected for the best performance of the algorithm. Next, a coupled dictionary learning problem is formulated and solved to obtain the clustered dictionary pairs and their mapping functions.
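The correlation-based assignment can be sketched as follows. The exact templates are not reproduced in the text, so this sketch assumes simple step-edge templates (one per orientation, shifts omitted for brevity), takes the angle as the direction of the edge normal, and uses the absolute normalized correlation; all function names are hypothetical. Only the eight angles, the ninth non-directional cluster, and the threshold 0.69 come from the description above.

```python
import numpy as np

ANGLES_DEG = [0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5]
THRESHOLD = 0.69                    # empirical threshold quoted in the text
NON_DIRECTIONAL = len(ANGLES_DEG)   # index 8: the non-directional cluster

def edge_template(size, angle_deg):
    """Assumed template: a centred step edge whose normal points along angle_deg."""
    c = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(c, c)
    theta = np.deg2rad(angle_deg)
    return (xx * np.cos(theta) + yy * np.sin(theta) >= 0).astype(float)

def normalize(p):
    """Zero-mean, unit-norm vector; flat patches map to the zero vector."""
    v = p.ravel() - p.mean()
    n = np.linalg.norm(v)
    return v / n if n > 1e-12 else np.zeros_like(v)

def assign_cluster(patch, templates, threshold=THRESHOLD):
    """Return the index of the best-correlated template, or the non-directional cluster."""
    p = normalize(patch)
    corr = np.array([abs(p @ normalize(t)) for t in templates])
    best = int(np.argmax(corr))
    return best if corr[best] >= threshold else NON_DIRECTIONAL
```

A patch whose best correlation stays below the threshold falls into the non-directional cluster, which is how the ninth cluster is populated in this sketch.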

Let \({W^{y}_{H}}\) and \({W^{y}_{L}}\) be the HR and LR training data of cluster y, respectively. The following energy function is proposed and (approximately) minimized; its solution yields the compact directional dictionaries along with the required mapping function [13].

$$ \begin{aligned} \min_{{D}^{y}_{H}, {D}^{y}_{L}, f(\cdot)} \; & E_{\text{data}}({D}^{y}_{H},{W}^{y}_{H}) +E_{\text{data}}({D}^{y}_{L},{W}^{y}_{L})\\ &+\gamma E_{\text{map}}(f({\alpha^{y}_{H}}),{\alpha^{y}_{L}}) +\lambda E_{\text{reg}}({\alpha^{y}_{H}}, {\alpha^{y}_{L}}, f(\cdot), {D}^{y}_{H}, {D}^{y}_{L}), \end{aligned} $$
(5)

where \(E_{\text{data}}(\cdot,\cdot)\) is the data fidelity term, \(E_{\text{map}}(\cdot,\cdot)\) is the mapping fidelity term, and \(E_{\text{reg}}\) is the regularizer. The sparse coefficients of the HR and LR data over their dictionaries are coupled by the mapping function \(f(\cdot)\). The HR and LR dictionaries are optimized concurrently with the mapping function.

The problem in Eq. 5 can be converted into a ridge regression and dictionary learning problem considering the mapping to be a linear function as:

$$ \begin{aligned} \min_{{D}^{y}_{H}, {D}^{y}_{L}, f(\cdot)} \; & \lVert {W^{y}_{H}}-{D}^{y}_{H}{\alpha^{y}_{H}}\rVert_{F}^{2} +\lVert {W^{y}_{L}}-{D}^{y}_{L}{\alpha^{y}_{L}}\rVert_{F}^{2}\\ &+\gamma \lVert {\alpha^{y}_{L}}-{M}^{y}{\alpha^{y}_{H}}\rVert_{F}^{2} +{\lambda^{y}_{H}}\lVert {\alpha^{y}_{H}}\rVert_{1} +{\lambda^{y}_{L}}\lVert {\alpha^{y}_{L}}\rVert_{1} +{\lambda^{y}_{m}}\lVert {M}^{y}\rVert_{F}^{2}\\ \text{s.t.} \;\; & \lVert {D}^{y}_{H,i}\rVert_{l_{2}}\leq 1 \wedge \lVert {D}^{y}_{L,i}\rVert_{l_{2}}\leq 1 \;\; \text{for all} \; i, \end{aligned} $$
(6)

where \(\gamma, {\lambda ^{y}_{H}}, {\lambda ^{y}_{m}}\), and \( {\lambda ^{y}_{L}}\) are the regularization parameters, tuned for the best performance, and \({D}^{y}_{H,i}\) and \({D}^{y}_{L,i}\) are the atoms of \({D}^{y}_{H}\) and \({D}^{y}_{L}\), respectively.

The problem formulated by Eq. 6 can be solved by optimizing one variable at a time while keeping the others fixed. As the mapping function (matrix) \({M}^{y}\) is linear, bi-directional transforms are learned: \({M}^{y}_{H}\) from \({\alpha ^{y}_{H}}\) to \({\alpha ^{y}_{L}}\) and \({M}^{y}_{L}\) from \({\alpha ^{y}_{L}}\) to \({\alpha ^{y}_{H}}\).

After initializing matrix M and dictionary D, one can find the sparse coefficients α by applying:

$$ \begin{aligned} &\min_{{\alpha^{y}_{H}}} \lVert {W^{y}_{H}}-{D}^{y}_{H}{\alpha_{H}^{y}}\rVert_{F}^{2} + \gamma \lVert {\alpha^{y}_{L}}-{M}_{H}^{y}{\alpha_{H}^{y}}\rVert_{F}^{2} + {\lambda^{y}_{H}}\lVert {\alpha^{y}_{H}}\rVert_{1},\\ &\min_{{\alpha^{y}_{L}}} \lVert {W^{y}_{L}}-{D}^{y}_{L}{\alpha_{L}^{y}}\rVert_{F}^{2} + \gamma \lVert {\alpha_{H}^{y}}-{M}^{y}_{L}{\alpha_{L}^{y}}\rVert_{F}^{2} + {\lambda^{y}_{L}}\lVert {\alpha^{y}_{L}}\rVert_{1}. \end{aligned} $$
(7)

The problems in Eq. 7 can be solved with an \(\ell_{1}\)-norm minimization algorithm such as least-angle regression (LARS) [14].
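To hand either subproblem of Eq. 7 to a standard \(\ell_{1}\) solver, the data term and the mapping term can be stacked into a single least-squares term. As a sketch for the HR subproblem (the stacked quantities \(\tilde{W}^{y}_{H}\) and \(\tilde{D}^{y}_{H}\) are notation introduced here for illustration only):

$$ \tilde{W}^{y}_{H} = \begin{bmatrix} {W}^{y}_{H} \\ \sqrt{\gamma}\,{\alpha}^{y}_{L} \end{bmatrix}, \quad \tilde{D}^{y}_{H} = \begin{bmatrix} {D}^{y}_{H} \\ \sqrt{\gamma}\,{M}^{y}_{H} \end{bmatrix}, \quad \min_{{\alpha}^{y}_{H}} \lVert \tilde{W}^{y}_{H} - \tilde{D}^{y}_{H}{\alpha}^{y}_{H} \rVert_{F}^{2} + {\lambda}^{y}_{H} \lVert {\alpha}^{y}_{H} \rVert_{1}. $$

Expanding the stacked norm gives back \(\lVert {W}^{y}_{H}-{D}^{y}_{H}{\alpha}^{y}_{H}\rVert_{F}^{2} + \gamma \lVert {\alpha}^{y}_{L}-{M}^{y}_{H}{\alpha}^{y}_{H}\rVert_{F}^{2}\), so the two formulations agree. The LR subproblem follows by exchanging the roles of H and L.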

Now for the dictionary update stage using the current sparse coefficients, the following problem is solved as:

$$ \begin{aligned} \min_{{D}^{y}_{H},{D}^{y}_{L}} \; & \lVert {W}^{y}_{H}-{D}^{y}_{H}{\alpha_{H}^{y}}\rVert_{F}^{2} +\lVert {W}^{y}_{L}-{D}^{y}_{L}{\alpha^{y}_{L}}\rVert_{F}^{2}\\ \text{s.t.} \;\; & \lVert {D}^{y}_{H,i}\rVert_{l_{2}}\leq 1 \wedge \lVert {D}^{y}_{L,i}\rVert_{l_{2}}\leq 1 \;\; \text{for all} \; i. \end{aligned} $$
(8)

The problem in Eq. 8 is a quadratically constrained quadratic program (QCQP), which can be solved as done in [11]. Finally, keeping the dictionaries and the sparse coefficients fixed, the matrix M can be updated as:

$$ \min_{{M}^{y}} \lVert {\alpha^{y}_{L}}-{M}^{y}{\alpha^{y}_{H}}\rVert_{F}^{2} +({\lambda^{y}_{m}}/\gamma) \lVert {M}^{y}\rVert_{F}^{2}. $$
(9)

The problem in Eq. 9 is a ridge regression problem with the closed-form solution:

$$ {M}^{y} = {\alpha^{y}_{L}}({\alpha^{y}_{H}})^{T}({\alpha^{y}_{H}}({\alpha^{y}_{H}})^{T}+({\lambda^{y}_{m}}/\gamma)\cdot I)^{-1}, $$
(10)

where I represents the identity matrix. By this strategy, a set of directional dictionaries is developed along with their mapping function (matrix). The proposed training algorithm is summarized in Algorithm 1.
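The two update steps can be sketched in NumPy as follows. `update_mapping` implements the closed form of Eq. 10 directly. `update_dictionary` is only a heuristic stand-in for Eq. 8: it fits the dictionary by regularized least squares and then projects every atom onto the unit \(\ell_{2}\) ball, which is not the QCQP solver referred to in the text. The function names and the small stabilizer `eps` are assumptions made for illustration.

```python
import numpy as np

def update_dictionary(W, alpha, eps=1e-8):
    """Heuristic stand-in for Eq. 8 (assumption, not the QCQP solver):
    least-squares fit of D to W ~ D @ alpha, then projection of each atom
    onto the unit l2 ball so that every column norm is at most 1."""
    K = alpha.shape[0]
    G = alpha @ alpha.T + eps * np.eye(K)
    # D = W alpha^T G^{-1}; G is symmetric, so solve G X = alpha W^T and transpose.
    D = np.linalg.solve(G, alpha @ W.T).T
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, 1.0)

def update_mapping(alpha_L, alpha_H, lam_m, gamma):
    """Closed-form ridge regression of Eq. 10: the M that maps alpha_H to alpha_L."""
    K = alpha_H.shape[0]
    G = alpha_H @ alpha_H.T + (lam_m / gamma) * np.eye(K)
    # M = alpha_L alpha_H^T G^{-1}; solve instead of forming the inverse explicitly.
    return np.linalg.solve(G, alpha_H @ alpha_L.T).T
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience; the result is the same matrix as in Eq. 10.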

Figure 4 shows the convergence curves of the proposed algorithm. Here mean squared error is calculated from HR to LR and then LR to HR sparse representations of the training patch pairs after updating the HR and LR mapping matrices in each iteration. The mapping functions are initialized as the identity matrices and our proposed algorithm converges stably.

Fig. 4 Convergence curves of the proposed algorithm

3.2 The proposed image reconstruction algorithm

During the reconstruction stage, a set of test images is selected from different datasets [15, 16], and some benchmark images are also selected for testing the proposed algorithm. Figure 5 shows the images used in the testing phase. Care has been taken that the test images differ from those in the training set. At the image recovery stage, a given LR image is first up-converted to the MR level by bicubic interpolation so that its size matches that of the HR image. Patches and features are extracted from this up-converted image with a full-overlap selection scheme. This is followed by a selective sparse coding step, in which it must be identified which dictionary pair, along with its mapping function, gives the least sparse representation error.

Fig. 5 Images used in testing phase. From left to right and top to bottom: AnnieYukiTim, Barbara, BooksCIMAT, Butterfly, Fence, ForbiddenCity, HowMany, Kodak-05, Kodak-08, Michoacan, MissionBay, NuRegions, Peppers, Rocio, Starfish, Yan

This corresponds to a model selection scenario: we need to find which dictionary pair among the nine clusters gives the least sparse representation error and hence the best HR patch recovery. This is done by recovering the HR patch from the LR patch at hand with each directional dictionary pair and its mapping function. For patch-based sparse recovery, the sparse coefficients of the LR patch are first calculated by LARS [14] using the LR patch and the LR dictionary. Then the HR dictionary is used along with the mapping functions to recover the HR patch, assuming the invariance property of the sparse coefficients. The dictionary and mapping pair that gives the least sparse representation error is chosen for the HR patch estimation. This very basic approach is presented to show the need for and the effect of directional clustering. Using all dictionaries for HR patch recovery serves as a perfect model selection (PMS), which can be used as a reference when designing different cluster selection models. In this case, the results show peak signal-to-noise ratio (PSNR) improvements of 1 dB over the baseline algorithms.
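The selection rule can be sketched as a loop over the cluster dictionaries. Several assumptions are made here: OMP stands in for LARS as the sparse coder, `M` denotes the matrix that maps LR coefficients to HR coefficients (the role of \({M}^{y}_{L}\) in Eq. 7), and the function names and the sparsity level `k` are hypothetical.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit with at most k atoms (stand-in for LARS)."""
    support, coef = [], np.zeros(0)
    residual = x.astype(float)
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def reconstruct_patch(x_L, D_L_list, D_H_list, M_list, k=3):
    """Try every cluster and keep the one with the least LR representation error."""
    best = None
    for c, (D_L, D_H, M) in enumerate(zip(D_L_list, D_H_list, M_list)):
        alpha_L = omp(D_L, x_L, k)
        err = np.linalg.norm(x_L - D_L @ alpha_L)
        if best is None or err < best[0]:
            # HR estimate: map the LR coefficients, then synthesize with D_H.
            best = (err, c, D_H @ (M @ alpha_L))
    _, cluster, x_H = best
    return cluster, x_H
```

Because every patch is coded once per cluster, the cost of this exhaustive selection grows linearly with the number of clusters.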

Finally, the approximated HR patch vectors are reshaped into two-dimensional form. Because the patches were extracted with full overlap, the overlap-add method of [11] is employed at the end to obtain the approximate HR image. The reconstruction process is summarized in Algorithm 2.
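Full-overlap extraction and the final merging step can be sketched as below. The sketch assumes that overlapping estimates are averaged pixel-wise, which is one common form of overlap-add and may differ in detail from the scheme of [11]; the function names are hypothetical.

```python
import numpy as np

def extract_patches(img, p):
    """All p x p patches with full overlap (stride 1), one vectorized patch per column."""
    H, W = img.shape
    cols = [img[i:i + p, j:j + p].ravel()
            for i in range(H - p + 1) for j in range(W - p + 1)]
    return np.stack(cols, axis=1)

def overlap_add(patches, shape, p):
    """Put patches back at their positions and average wherever they overlap."""
    H, W = shape
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    n = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            acc[i:i + p, j:j + p] += patches[:, n].reshape(p, p)
            cnt[i:i + p, j:j + p] += 1
            n += 1
    return acc / cnt
```

If the patches are left unchanged, `overlap_add(extract_patches(img, p), img.shape, p)` returns the original image, which is a quick sanity check for the indexing.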

4 Results and discussion

The proposed algorithm is compared with the algorithm of Yang et al. [11], the algorithm of Xu et al. [12], and the bicubic technique (Bic.).

Tables 1 and 2 list the PSNR, structural similarity index measurement (SSIM) [17], sharpness, and contrast [18] measures for the compared algorithms at different scale parameters. Table 1 shows the results for scale parameter 2, and Table 2 shows the results for scale parameters 3 and 4. The proposed algorithm uses a patch size of 6×6 with 216 dictionary atoms for each directional cluster. The baseline algorithm of Yang et al. [11] and the algorithm of Xu et al. [12] use a patch size of 6×6 with 216 dictionary atoms in the spatial domain. The compared algorithms differ in nature, and care has been taken to use the parameter values that give each of them its best performance. Full overlap is employed for all algorithms to achieve the best performance. A single set of training images, the one used by [11], is used for patch extraction: around 10,000 patches were extracted for each cluster for the proposed algorithm, and around 100,000 patches were extracted for the spatial domain algorithms of Yang et al. [11] and Xu et al. [12]. All other parameters were kept the same in the simulations. Images are super-resolved by all algorithms at each scale parameter. The bicubic technique is implemented with Matlab's imresize function. Both the baseline algorithm of Yang et al. [11] and the algorithm of [12] use a single universal dictionary for the task of SISR. The proposed algorithm uses nine compact dictionaries: eight covering different orientations of the image feature space and one non-directional.

Table 1 PSNR (top left), sharpness (top right), SSIM (bottom left), and contrast (bottom right) for scale factor 2: comparison of the bicubic (Bic.) technique, the algorithm of Yang et al. [11], the algorithm of Xu et al. [12], and the proposed algorithm
Table 2 PSNR (top left), sharpness (bottom left), SSIM (top right), and contrast (bottom right); for each image, the first row is for scale factor 3 and the second row for scale factor 4: comparison of the bicubic (Bic.) technique, the algorithm of Yang et al. [11], the algorithm of Xu et al. [12], and the proposed algorithm

4.1 Quantitative experimentation

LR images are reconstructed by the three algorithms and bicubic technique to their original sizes. The PSNR and SSIM as given in [11] and [17] are used along with sharpness and contrast measures used by Liu et al. [18] for the quantitative performance evaluation.

The PSNR measure for a reconstructed image is calculated as follows:

$$ {\mathcal{M}}_{\text{PSNR}}(\mathbf{x},\hat{\mathbf{x}})=10 \log_{10} \frac{255^{2}}{{\mathcal{E}}_{\text{MSE}}(\mathbf{x},\hat{\mathbf{x}})}, $$
(11)

where x is the original HR image having size of M×N, \(\hat {\mathbf {x}}\) is the estimation, and \({\mathcal {E}}_{\text {MSE}}(\mathbf {x},\hat {\mathbf {x}})\) is the mean square error (MSE) given for x and \(\hat {\mathbf {x}}\) as follows:

$$ {\mathcal{E}}_{\text{MSE}}(\mathbf{x},\hat{\mathbf{x}})=\frac{1}{MN}\sum\limits_{i=1}^{M}\sum\limits_{j=1}^{N} (\mathbf{x}_{ij}-\hat{\mathbf{x}}_{ij})^{2}. $$
(12)
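Eqs. 11 and 12 translate directly into code. A minimal sketch, assuming 8-bit images with peak value 255 as in Eq. 11 (the function names and the `peak` parameter are illustrative):

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error of Eq. 12 between the original and the estimate."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))

def psnr(x, x_hat, peak=255.0):
    """PSNR of Eq. 11 in dB; infinite when the two images are identical."""
    err = mse(x, x_hat)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))
```

Converting to floating point before subtracting avoids wrap-around errors when the inputs are stored as unsigned 8-bit integers.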

The SSIM [17] is used as a perceptual quality metric, which is more compatible with human image quality perception than the PSNR measure. The sharpness and contrast measures, as introduced by Liu et al. [18], are at first calculated as s(i,j) and c(i,j), respectively, for each pixel position (i,j) and then averaged for the whole image.

Regarding s(i,j) and c(i,j), consider an image I and A(i,j) as being the 8-adjacent pixels “around” (i,j) (not including (i,j)); then

$$ s(i,j)=\lVert I(i,j)-\mu_{A(i,j)}\rVert_{1} $$
(13)

where s(i,j) is the sharpness value of image I at (i,j), and \(\mu_{A(i,j)}\) is the mean value of I over the pixel locations in A(i,j). For the contrast, let

$$ c(i,j)=\frac{1}{MN}\sum\limits_{x=1}^{M}\sum\limits_{y=1}^{N} \lVert I(i,j)-I(x,y)\rVert_{1}, $$
(14)

where c(i,j) is the contrast value of image I at (i,j).

The sharpness and contrast values of the reconstructed images are compared with those of the original images. The tables show absolute relative errors (i.e., the absolute difference in contrast or sharpness from the original value, divided by the original value). Smaller values indicate less deviation from the true contrast and sharpness.
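The two measures of Eqs. 13 and 14 and the reported deviation can be sketched as follows. Border handling is not specified in the text, so this sketch assumes that sharpness is averaged over interior pixels only; the contrast is computed by brute force over all pixel pairs, which is practical only for small images. The function names are hypothetical.

```python
import numpy as np

def mean_sharpness(img):
    """Average of s(i,j) from Eq. 13 over interior pixels (border handling assumed)."""
    I = np.asarray(img, dtype=float)
    H, W = I.shape
    # Sum of each 3x3 neighbourhood, built from nine shifted views of the image.
    nb = sum(I[1 + di:H - 1 + di, 1 + dj:W - 1 + dj]
             for di in (-1, 0, 1) for dj in (-1, 0, 1))
    center = I[1:-1, 1:-1]
    mu = (nb - center) / 8.0          # mean of the 8 neighbours, centre excluded
    return float(np.mean(np.abs(center - mu)))

def mean_contrast(img):
    """Average of c(i,j) from Eq. 14; O((MN)^2) memory, small images only."""
    v = np.asarray(img, dtype=float).ravel()
    return float(np.mean(np.abs(v[:, None] - v[None, :])))

def relative_deviation(value, reference):
    """Absolute difference from the original value, divided by the original value."""
    return abs(value - reference) / abs(reference)
```

For example, `relative_deviation(mean_sharpness(sr), mean_sharpness(hr))` gives the kind of sharpness deviation described above for a super-resolved image `sr` and its original `hr`.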

Tables 1 and 2 indicate that images reconstructed by the proposed algorithm deviate less from the original sharpness value. This corresponds to the observation that the proposed algorithm recovers high-frequency components better than the other algorithms. However, there is slightly more deviation from the original contrast value when compared with the other algorithms.

Tables 1 and 2 list the comparison results for the proposed algorithm, the spatial domain state-of-the-art algorithm of [11], and the bicubic technique. The proposed algorithm produces better results than that of Yang et al. [11] because of the directional clustered dictionary learning. It gives an average PSNR gain of 1.01, 0.59, and 0.72 dB for scale parameters 2, 3, and 4 over the state-of-the-art algorithm of Yang et al. [11], with SSIM improvements of 0.0127, 0.0182, and 0.0442, respectively, when tested on the data sets of [15, 16] and some other benchmark images. The improvements over the coupled K-SVD algorithm of Xu et al. [12] are 0.5, 0.66, and 0.72 dB in terms of PSNR and 0.0002, 0.0207, and 0.0447 in terms of SSIM for scale parameters 2, 3, and 4. The improvements over the bicubic technique on this set of test images are 2.32, 3.18, and 2.07 dB in terms of PSNR and 0.0623, 0.1412, and 0.1017 in terms of SSIM for scale parameters 2, 3, and 4, respectively. This supports the claim that directional clustered dictionaries better recover some of the high-frequency components of the LR image.

From Table 1, one can see that the average PSNR and SSIM results of the proposed algorithm are lower than those of the algorithm of [12] for scale parameter 2. This is because the algorithm of [12] uses a coupled K-SVD approach in the dictionary update stage and, after recovering the HR patches, applies a geometric mean algorithm to obtain the HR image estimate, which serves as additional post-processing. However, the proposed algorithm with PMS clearly outperforms the compared algorithms for all scale parameters.

Table 3 shows the comparison on noisy natural images in terms of PSNR and SSIM for scale-up factor 2. From Table 3, one can observe that the proposed algorithm gives an average PSNR gain of 2.56 dB over the bicubic technique, 1.01 dB over the algorithm of [11], and 1.04 dB over the algorithm of [12]. In terms of SSIM, the proposed algorithm gives an average gain of 0.0750 over the bicubic technique, 0.0189 over the algorithm of [11], and 0.0190 over the algorithm of [12].

Table 3 PSNR (top) and SSIM (bottom): comparison of the bicubic (Bic.) technique, the algorithm of Yang et al. [11], the algorithm of Xu et al. [12], and the proposed algorithm

It is noted here that the computational cost of the proposed algorithm is nine times that of the algorithms of [11] and [12]. It is well known that the most expensive stage in the dictionary learning process is the sparse representation stage, which is a vector selection process. Recovering the HR patch with every directional dictionary and its mapping increases the computational cost, given that the proposed algorithm uses the same number of dictionary atoms and the same patch size. However, in some applications, the additional computation is acceptable when the improvement in quality is considerable.

We also tested other dictionary model selection approaches that can reduce the computational cost. One approach used during testing selects the cluster solely by correlating the LR patch at hand with each directional cluster template and then uses the corresponding dictionary pair for HR patch reconstruction. With this very simple approach, on the same test images and for scale parameter 2, the average improvements over the algorithm of [11] were 0.3 dB in PSNR and 0.0031 in SSIM. These results are given in the last column of Table 1. In this case, the computational cost is the same as that of the baseline algorithms; the only extra cost is the correlation computation for the cluster decision. In the same way, one can use different probabilistic models for deciding which cluster to use during the reconstruction phase, given that the clustering is carried out by correlation with designed templates. One can also exploit hidden Markov trees (HMT) between the HR and LR training data and develop suitable models.

4.2 Qualitative experimentation

Here, zoomed versions of the reconstructed images for scale parameter 3 are shown for comparison. Figure 6 shows the zoomed original images and the images reconstructed by the compared algorithms. In Fig. 6, the reconstruction by the bicubic technique shows a significant amount of blur, and the images reconstructed by the algorithm of Yang et al. [11] are only slightly clearer. In the zoomed Barbara and Kodak-05 images, the reconstruction by the proposed algorithm is much sharper around the edges (best viewed on an HD device). The proposed algorithm recovers sharp patches more effectively than the baseline algorithm.

Fig. 6 Visual comparison of Barbara, Kodak-05, Starfish, and Yan. From left to right: original, bicubic, [11], [12], and proposed method

5 Conclusions

Directional clustering with coupled dictionary learning is proposed for the problem of SISR. Nine pairs of dictionaries are designed, eight directional and one non-directional. The proposed algorithm uses a patch size of 6×6 with 216 dictionary atoms to keep the computational cost manageable. The proposed algorithm outperforms the spatial domain baseline algorithm of Yang et al. [11]. It also performs well when compared with the algorithm of Xu et al. [12], owing to clustering and coupled dictionary learning with mapping functions.

From the results, it can be seen that the proposed idea of clustering-based coupled dictionary learning and mapping functions can produce better results when compared with the state-of-the-art algorithms.

For scale parameter 2 compared to the bicubic interpolation, the proposed algorithm gives 2.32 dB improvement as tested over the set of benchmark images. The proposed algorithm provides a 1.01 dB improvement over the baseline algorithm of Yang et al. [11], and 0.5 dB improvement over the algorithm of Xu et al. [12] as tested over the image data sets [15, 16]. Visual results also verify those quantitative results.

5.1 Future recommendations

Considering the possibilities of the extension of this work, it is suggested that in the process of designing dictionaries, one can employ the model selection from LR to HR by learning hidden Markov models [19]. Moreover, to generate the LR images, the blur filter is assumed as the bicubic filter. This work can be extended to include and compare the accurate camera blur models as in [20].

References

  1. M Elad, M Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15, 3736–3745 (2006).

  2. DL Donoho, Compressed sensing. IEEE Trans. Inf. Theory. 52, 1289–1306 (2006).

  3. J Yang, J Wright, T Huang, Y Ma, Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010).

  4. B Ophir, M Lustig, M Elad, Multiscale dictionary learning using wavelets. IEEE J. Sel. Topics in Signal Process. 5(5), 1014–1024 (2011).

  5. K Zhang, X Gao, D Tao, X Li, in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR’12). Multi-scale dictionary for single image super-resolution (IEEE, Providence, RI, USA, 2012), pp. 1114–1121.

  6. W Dong, L Zhang, G Shi, X Wu, Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans. Image Process. 20, 1838–1857 (2011).

  7. J Feng, L Song, X Yang, W Zhang, in Proceedings of the IEEE International Conference on Image Processing (ICIP’11). Learning dictionaries via subspace segmentation for sparse representation (IEEE, Brussels, Belgium, 2011), pp. 1245–1248.

  8. G Yu, G Sapiro, S Mallat, in Proceedings of IEEE International Conference on Image Processing (ICIP’10). Image modelling and enhancement via structured sparse model selection (IEEE, Hong Kong, 2010), pp. 1641–1644.

  9. S Yang, M Wang, Y Chen, Y Sun, Single image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding. IEEE Trans. Image Process. 21, 4016–4028 (2012).

  10. F Farhadifard, E Abar, M Nazzal, H Ozkaramanli, in Proceedings IEEE Signal Processing Communication Applications Conference (SIU’2014). Single image super-resolution based on sparse representation via directionally structured dictionaries (IEEE, Trabzon, Turkey, 2014), pp. 1718–1721.

  11. J Yang, Z Wang, Z Lin, S Cohen, T Huang, Coupled dictionary training for image super-resolution. IEEE Trans. Image Process. 21, 3467–3478 (2012).

  12. J Xu, C Qi, Z Chang, in Proceedings of IEEE International Conference on Image Processing (ICIP’14). Coupled K-SVD dictionary training for super-resolution (IEEE, Paris, France, 2014), pp. 3910–3914.

  13. S Wang, L Zhang, Y Liang, Q Pan, in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR’12). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis (IEEE, Providence, RI, USA, 2012), pp. 2216–2223.

  14. B Efron, T Hastie, I Johnstone, R Tibshirani, Least angle regression. Ann. Stat. 32, 407–499 (2004).

  15. R Franzen, Kodak lossless true color image suite (2014). online: r0k.us/graphics/kodak/index.html. Accessed 20 Jan 2016.

  16. R Klette, Concise computer vision (Springer, London, 2014). Single images. online: ccv.wordpress.fos.auckland.ac.nz/data/single-images/. Accessed 20 Jan 2016.

  17. Z Wang, AC Bovik, HR Sheikh, EP Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004).

  18. D Liu, R Klette, in Proceedings of International Conference on Image Vision Computing New Zealand (IVCNZ’15). Sharpness and contrast measures on videos (IEEE, Auckland, New Zealand, 2015). IEEE online.

  19. RK Lama, MR Choi, GR Kwon, Image interpolation for high-resolution display based on the complex dual tree wavelet transform and hidden Markov. Multimedia Tools Appl. online, 1–12 (2016).

  20. N Efrat, D Glasner, A Apartsin, B Nadler, A Levin, in Proceedings of IEEE International Conference on Computer Vision (ICCV’13). Accurate blur models vs. image priors in single image super-resolution (IEEE, Sydney, Australia, 2013), pp. 2832–2839.

Authors’ contributions

Both authors contributed equally to the text; JA implemented the algorithms and performed most of the tests. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Junaid Ahmed.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Ahmed, J., Shah, M.A. Single image super-resolution by directionally structured coupled dictionary learning. J Image Video Proc. 2016, 36 (2016). https://doi.org/10.1186/s13640-016-0141-6

  • DOI: https://doi.org/10.1186/s13640-016-0141-6

Keywords