Single-frame super-resolution for remote sensing images based on improved deep recursive residual network


Single-frame image super-resolution (SISR) technology in remote sensing is advancing rapidly. Deep learning methods have been widely used in SISR to improve the details of reconstructed images and to speed up network training. However, these supervised techniques tend to overfit quickly because of model complexity and the lack of training data. In this paper, an Improved Deep Recursive Residual Network (IDRRN) super-resolution model is proposed to reduce the difficulty of network training. The deep recursive structure is configured to control the number of model parameters while increasing the network depth. At the same time, short-path recursive connections are used to alleviate vanishing gradients and enhance feature propagation. Comprehensive experiments show that IDRRN achieves improvements in both quantitative metrics and visual perception.

1 Introduction

Remote sensing applications mainly process and analyze images acquired by satellites to extract useful information about the ground, with uses including disaster monitoring, environmental detection, geology, and resource exploration [1]. As a key indicator of satellite remote sensing performance, the spatial resolution of remote sensing images is very important in practical applications. High-resolution (HR) images are usually desired for remote sensing analysis and processing. However, remote sensing images are always degraded by the limitations of the image sensors and by other factors such as optical system aberration, atmospheric disturbance, motion, and noise in the imaging system. The simplest way to improve the resolution is to increase the sensor density of the image acquisition equipment. However, this generates shot noise, raises hardware costs considerably, increases the weight and volume of the sensor, and makes the satellite launch more difficult, which is not conducive to the application and popularization of high-resolution sensors [2,3,4]. In this respect, SISR is a better approach. It is an image post-processing technology that is based on digital signal processing theory and can improve image resolution effectively and conveniently. In remote sensing applications, without increasing hardware investment, it can obtain high-resolution images of regions of interest, improve the recognition accuracy of targets of interest in images, and increase the value of image applications [5]. SISR is mainly divided into two types: reconstruction-based SISR and learning-based SISR.

The reconstruction-based method mainly models the imaging process of low-resolution (LR) images and imposes a series of constraints on the reconstructed image. The classic algorithms include iterative backward projection (IBP) [6], projection onto convex sets (POCS) [7], and Bayesian maximum a posteriori (MAP) estimation [8]. Among these, the MAP method is the most widely used, usually with a regularization term [9] to build the MAP solution framework. The total variation (TV) regularization method [10] assumes that the total variation of a noisy image is always greater than that of a clean image, so noise is suppressed during reconstruction by constraining the total variation of the image; general total variation (GTV) regularization [11] describes the distance relationship between the point of interest and its neighborhood more accurately. Gradually, more reasonable and effective regularized image models [12, 13] have been used for super-resolution restoration. However, reconstruction-based SISR algorithms make insufficient use of the prior information in the image itself. Most of these methods use prior knowledge of image edges and local smoothness to form constraints and then solve the optimization problem with iterative algorithms; when the magnification is large, the reconstructed image is often too smooth and lacks sharpness.

The learning-based method learns the mapping between LR and HR images from a training set in advance and then uses the learned mapping to restore the high-resolution image. The learning-based SISR algorithm was first developed by Freeman et al. [14] and then applied by Baker et al. [15] to reconstruct face images. Super-resolution reconstruction based on clustering [16, 17] has achieved good results, and learning based on sparse representation [18, 19] is the most widely used approach. Reference [20] improves image feature extraction and dimensionality reduction during dictionary training so that the reconstructed image retains more high-frequency detail; reference [21] proposes a sparse representation of a sample database composed of low-resolution and high-resolution image blocks, using the over-complete dictionary that corresponds to the training image pairs. In recent years, super-resolution restoration using deep learning has begun to appear. Reference [22] proposes a three-layer convolutional network whose layers correspond to image block extraction, non-linear feature mapping, and final reconstruction; the interpolated, enlarged LR image is the network input. Reference [23] proposes a feedback residual network based on deep edge guidance, in which images are trained along different routes according to frequency band through a recursive residual network. Reference [24] puts forward the idea of using residual learning for image reconstruction. Reference [25] performs the convolution operations on the low-resolution image and performs the upsampling operation, that is, the operation that increases the resolution, only at the end of the network. Reference [26] introduces the generative adversarial idea into super-resolution, with a generator network and a discriminator network competing against each other: the discriminator judges the predicted high-resolution image produced by the generator. However, these learning-based SISR techniques require sufficient HR training examples to perform properly and generalize well. In addition, they tend to overfit quickly because of model complexity and the lack of training data.

To overcome the problems mentioned above, we propose a novel fusion SR method named IDRRN in this paper. A recursive residual network is introduced into the super-resolution restoration of remote sensing images. In this network model, global residual learning and local residual learning are introduced to reduce the difficulty of training deep networks, and a recursive block composed of residual units is used. Because the network learns the residual image between the high-resolution and low-resolution images, accuracy can be increased by deepening the network without adding any weight parameters. Without loss of image restoration quality, the deep learning model is improved to make its network structure more concise and compact. By connecting multiple secondary filters in the deep network, the accuracy is significantly improved. The model uses local residual learning instead of global residual learning to train deep networks, which is more conducive to information transmission and gradient flow. Introducing a recursive structure into the residual block reduces the number of parameters and makes the model more compact. Because the network takes the uninterpolated LR image as input and uses a deconvolution layer at the end of the network to upsample directly to the SR output image, the computational complexity is greatly reduced.

The algorithm has been adapted to be executed efficiently in parallel and includes some methodological improvements that make the model more efficient and effective. Experimental results show that the proposed method performs significantly better than existing methods in both evaluation indicators and visual effect.

1.1 Related works

We briefly review the ideas and work progress related to this paper in this section. Firstly, we discuss the image degradation in remote sensing and get the mathematical model of LR images. Next, we describe the main idea of deep learning and its application in SISR algorithms. Finally, we illustrate the image restoration model of learning the residual by the convolutional neural network (CNN), in which the corruption is considered as “residual information.”

1.2 Image degradation in remote sensing

The formation of remote sensing images goes through several stages, and image degradation and loss of quality inevitably occur in these stages. In order to obtain high-quality spatial images, the acquired remote sensing images need to be denoised and deblurred [27]. As shown in Fig. 1, a degradation model is first established from the original image to the actually acquired image, where the original image is an HR image and the actually acquired image is an LR image.

Fig. 1 Degradation model of remote sensing images

Each time the scene is imaged, the HR image undergoes a motion deformation \( M_i \), a blur described by the point spread function \( B_i \), and a downsampling \( D_i \), so that an LR image sequence is finally obtained. After the image degradation model is established, the mathematical model of the low-resolution image can be expressed as follows:

$$ {g}_i={D}_i{B}_i{M}_i\boldsymbol{f}+{n}_i,i=1,2,\dots, q $$

Among them, \( g_i \) is the vectorized representation of the i-th low-resolution image, q is the number of LR image frames, \( \boldsymbol{f} \) is the vectorized representation of the HR image, m and n represent the spatial dimensions of the real image, \( M_i \) is the motion matrix, \( B_i \) is the blur matrix, \( D_i \) is the downsampling matrix, and \( n_i \) is the vectorized representation of the (m × n) × 1 dimensional noise.


Let

$$ \boldsymbol{g}=\left[\begin{array}{c}{g}_1\\ {g}_2\\ \vdots \\ {g}_q\end{array}\right],\quad \boldsymbol{H}=\left[\begin{array}{c}{D}_1{B}_1{M}_1\\ {D}_2{B}_2{M}_2\\ \vdots \\ {D}_q{B}_q{M}_q\end{array}\right],\quad \boldsymbol{n}=\left[\begin{array}{c}{n}_1\\ {n}_2\\ \vdots \\ {n}_q\end{array}\right] $$

then the degradation model of the q LR remote sensing images can be abbreviated as follows:

$$ \boldsymbol{g}=\boldsymbol{Hf}+\boldsymbol{n} $$

Among them, \( \boldsymbol{g} \) is the vectorized representation of the LR images, \( \boldsymbol{H} \) is the degradation matrix, and \( \boldsymbol{n} \) is the vectorized representation of the noise.
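The degradation model can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the motion operator is taken to be a horizontal shift, the blur a 3×3 box filter, the downsampling simple decimation, and the noise Gaussian. None of these specific choices comes from the paper, and the function names are hypothetical.

```python
import random

def shift(img, dx):
    """M_i (assumed): horizontal translation by dx pixels with edge replication."""
    w = len(img[0])
    return [[row[min(max(c - dx, 0), w - 1)] for c in range(w)] for row in img]

def blur3x3(img):
    """B_i (assumed): 3x3 box blur with edge replication, standing in for the PSF."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += img[rr][cc]
            out[r][c] = total / 9.0
    return out

def downsample(img, n):
    """D_i (assumed): keep every n-th pixel in both directions."""
    return [row[::n] for row in img[::n]]

def degrade(f, dx, n, sigma, rng):
    """One LR frame g_i = D_i B_i M_i f + n_i with Gaussian noise of std sigma."""
    g = downsample(blur3x3(shift(f, dx)), n)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in g]

rng = random.Random(0)
f = [[float((r * 8 + c) % 256) for c in range(8)] for r in range(8)]  # toy 8x8 HR image
frames = [degrade(f, dx, n=2, sigma=1.0, rng=rng) for dx in (0, 1, 2)]  # q = 3 frames
print(len(frames), len(frames[0]), len(frames[0][0]))  # 3 4 4
```

Each call to `degrade` plays the role of one row block \( D_iB_iM_i \) of the stacked matrix \( \boldsymbol{H} \); applying the operators as functions avoids building the large matrices explicitly.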

1.3 Deep learning for SISR in remote sensing

High-resolution remote sensing images play an important role in agricultural and forestry monitoring, urban planning, and military reconnaissance. Spatial resolution, the smallest size of target detail that can be distinguished in the image, is one of the key indicators for evaluating the quality of a remote sensing image. However, because developing HR remote sensing satellites is costly and time-consuming, obtaining HR images economically and conveniently has always been a major challenge in the field of remote sensing. Super-resolution reconstruction technology is a popular solution to such problems. The general objective in SR is to improve the image resolution beyond the sensor limits, that is, to increase the number of image pixels while providing finer spatial details than those captured by the original acquisition instrument.

The SISR of remote sensing images is an ill-conditioned inverse problem, so a reasonable image feature representation is particularly important in the reconstruction process. Deep learning methods, especially CNNs, can perform feature transformation and non-linear mapping on LR images to obtain complex feature representations and then build the complex mapping from LR images to HR images. The essence of deep learning is a self-learning method for data representation, which replaces manual feature extraction with unsupervised or semi-supervised feature learning and hierarchical feature acquisition.

The super-resolution convolutional neural network (SRCNN) [22] began the era of deep convolutional neural networks for super-resolution problems. The algorithm takes the interpolated LR image as the network input and obtains an HR image after three convolutional transformations. After the three steps of feature extraction, nonlinear transformation, and feature restoration, a very good restoration effect is obtained. The first convolution layer extracts image features: image blocks are extracted from the LR image, and each block is represented as a high-dimensional vector. Given a low-resolution image x, the process can be expressed as follows:

$$ {N}_1(x)=\max \left(0,{f}_1x+{d}_1\right) $$

Among them, \( f_1 \) is the convolution kernel of the first convolution layer, which can be regarded as a filter, so that \( f_1x \) denotes the convolution of x with \( f_1 \); \( d_1 \) represents the bias of the first layer.

The second convolution layer is a non-linear mapping between features, mapping each high-dimensional vector to another high-dimensional vector. Each mapping vector is a conceptual representation of HR blocks, which can be expressed as follows:

$$ {N}_2(x)=\max \left(0,{f}_2{N}_1(x)+{d}_2\right) $$

Here, \( f_2 \) and \( d_2 \) represent the filter and bias of the second convolution layer.

The third convolution layer reconstructs the HR image. This operation stitches the above HR image blocks together to generate the final HR image, which can be expressed as:

$$ {N}_3(x)={f}_3{N}_2(x)+{d}_3 $$

Here, \( f_3 \) and \( d_3 \) represent the filter and bias of the third convolution layer.

The entire convolutional neural network model continuously reduces the loss of the network through iteration. When the loss value is minimized and stabilized, the corresponding weight and bias of each layer of convolution are the optimal results of the network.
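The three-layer mapping described above can be sketched in a few lines. This is a simplified single-channel illustration and not the SRCNN implementation: each layer here has one hypothetical 3×3 kernel, whereas the real network uses many filters per layer, and the helper names are assumptions made for this example.

```python
def conv2d_same(img, kernel, bias):
    """Single-channel 2D convolution (cross-correlation form) with zero padding."""
    h, w = len(img), len(img[0])
    k = len(kernel)
    p = k // 2
    out = [[bias] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = bias
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - p, c + j - p
                    if 0 <= rr < h and 0 <= cc < w:  # outside the image counts as 0
                        acc += kernel[i][j] * img[rr][cc]
            out[r][c] = acc
    return out

def relu(img):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in img]

def srcnn_forward(x, params):
    """N1 = max(0, f1 x + d1), N2 = max(0, f2 N1 + d2), N3 = f3 N2 + d3."""
    (f1, d1), (f2, d2), (f3, d3) = params
    n1 = relu(conv2d_same(x, f1, d1))   # feature extraction
    n2 = relu(conv2d_same(n1, f2, d2))  # non-linear mapping
    return conv2d_same(n2, f3, d3)      # reconstruction (no activation)

identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
params = [(identity, 0.0)] * 3  # hypothetical weights: every layer passes its input through
print(srcnn_forward([[1.0, 2.0], [3.0, 4.0]], params))  # [[1.0, 2.0], [3.0, 4.0]]
```

In training, the kernels and biases would be the quantities adjusted iteratively to reduce the loss; here they are fixed by hand only to make the data flow visible.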

With the robust development of deep learning algorithms and the great success of SRCNN, super-resolution algorithms based on deep convolutional networks have developed rapidly, and various improved variants and new network structures have appeared, such as the fast super-resolution convolutional neural network (FSRCNN) [28], very deep convolutional networks for image super-resolution (VDSR) [24], the super-resolution generative adversarial network (SRGAN) [26], end-to-end deep and shallow networks (EEDS) [29], and the enhanced deep super-resolution network (EDSR) [30]. These have greatly advanced the practical application of deep learning to SISR.

1.4 Deep residual network

Residual network (ResNet) is proposed to solve the problem of network degradation when the deep neural network has too many hidden layers. Its main idea is to learn the residual function instead of the original function based on the input, which makes the training of the deeper network simpler, and can get better performance from the deeper network [31,32,33]. Its network structure is shown in Fig. 2.

Fig. 2 Network architecture of ResNet

Reference [34] regards two weight layers and a ReLU activation function as a basic unit. The input and output of the unit are added at the pixel level through a skip connection, that is, the corresponding pixels in the feature maps are added, and the residual operation is performed as follows:

$$ H(x)=F(x)+x $$

Among them, x represents the input of a basic unit, H(x) represents the result of the residual calculation, and F(x) represents the basic unit calculation result.

The residual block structure is as follows:

$$ {x}_o=U(x)=\sigma \left(F\left(x,W\right)+h(x)\right) $$

Among them, \( x_o \) represents the output of the residual block, h(x) is an identity mapping with h(x) = x, W is a set of weights, F(x, W) is the residual mapping to be learned, σ represents the ReLU activation function, and U represents the residual block function. The residual mapping is easier to optimize than the original mapping.
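As an illustration of the residual block equation, the sketch below applies it to a one-dimensional feature vector. It is a minimal example under assumptions: F is taken to be convolution, ReLU, convolution with hypothetical three-tap kernels, and the 1D setting is chosen only to keep the code short.

```python
def conv1d_same(x, kernel, bias=0.0):
    """1D convolution (cross-correlation form) with zero padding, same output length."""
    p = len(kernel) // 2
    n = len(x)
    return [
        bias + sum(kernel[j] * x[i + j - p]
                   for j in range(len(kernel)) if 0 <= i + j - p < n)
        for i in range(n)
    ]

def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]

def residual_unit(x, w1, w2):
    """x_o = sigma(F(x, W) + h(x)) with h(x) = x and W = (w1, w2)."""
    fx = conv1d_same(relu(conv1d_same(x, w1)), w2)  # residual mapping F(x, W)
    return relu([a + b for a, b in zip(fx, x)])     # add identity branch, then ReLU

# With w2 set to zero the residual mapping vanishes and the unit reduces to ReLU(x).
print(residual_unit([1.0, -2.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]))  # [1.0, 0.0, 3.0]
```

The zero-kernel case shows why the residual form is convenient: a block can fall back to (nearly) the identity simply by driving its weights toward zero.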

The residual network refutes the argument that adding layers to a network cannot improve performance. Moreover, the structure of the deep residual network is simple, it solves the problem of performance degradation of convolutional neural networks at extreme depths, and its classification performance is excellent.

1.5 Proposed improved method

1.5.1 Recursive structure

Reference [35] proposed the deeply-recursive convolutional network (DRCN) algorithm, which introduced recursion into the residual network. The recursive structure consists of 16 chain structures. DRCN passes the result of each recursion through the reconstruction layer, generating intermediate HR image results. DRCN’s recursive structure allows weight parameters to be shared in the convolutional layers, effectively controlling the number of model parameters. However, to address the vanishing or exploding gradients to which deep model training is prone, every recursion needs to be supervised, which undoubtedly increases the burden on the network.

In response to the above issues, in this paper, the improved recursive structure is introduced into the residual block to reduce the network scale and make the model more compact. At the same time, the weights are shared among the residual blocks, reducing the number of model parameters. The residual block function is defined as:

$$ {H}^{\mu }=R\left({H}^{\mu -1}\right)=F\left({H}^{\mu -1},W\right)+{H}^0 $$

\( H^{\mu} \) is the output of the μth residual block, R represents the residual block function, \( F(H^{\mu -1}, W) \) is the residual mapping to be learned, W is the set of weights, and \( H^0 \) is the feature image output by the first convolution layer.

A convolution layer and a ReLU layer are placed at the beginning of the recursive block, and multiple residual blocks are then stacked, which forms the recursive structure. Among them, \( H^0 \) serves as the identity mapping of each residual block, and B represents the number of residual blocks contained in the recursive structure. The recursive structure of the algorithm is shown in Fig. 3.

Fig. 3 Recursive block structure of DRCN

The result of the μth residual block can be obtained by the residual block function R recursively.
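The recursion can be sketched as a loop that reuses one weight set and always adds the same \( H^0 \). The example below is an illustration only: it works on a 1D feature vector, uses hypothetical three-tap kernels, and assumes a ReLU, convolution, ReLU, convolution ordering inside F, which the text does not specify.

```python
def conv1d_same(x, kernel, bias=0.0):
    """1D convolution (cross-correlation form) with zero padding, same output length."""
    p = len(kernel) // 2
    n = len(x)
    return [
        bias + sum(kernel[j] * x[i + j - p]
                   for j in range(len(kernel)) if 0 <= i + j - p < n)
        for i in range(n)
    ]

def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]

def recursive_block(h0, w1, w2, num_units):
    """Apply H^u = F(H^(u-1), W) + H^0 for u = 1..num_units with shared W = (w1, w2)."""
    h = h0
    for _ in range(num_units):
        # Assumed ordering inside F: ReLU -> conv -> ReLU -> conv (two convs, two ReLUs).
        fx = conv1d_same(relu(conv1d_same(relu(h), w1)), w2)
        # The identity branch always carries H^0, not the previous output.
        h = [a + b for a, b in zip(fx, h0)]
    return h

identity = [0.0, 1.0, 0.0]  # hypothetical kernel that passes its input through
print(recursive_block([1.0, 2.0], identity, identity, 3))  # [4.0, 8.0]
```

With pass-through kernels each recursion adds one more copy of \( H^0 \), so three recursions give four times the input. The number of weights, `len(w1) + len(w2)`, does not depend on `num_units`, which is the point of sharing W across the recursion.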

1.5.2 Network structure optimization

The algorithm introduces local residual learning to reduce the difficulty of training the deep network. First, the high-frequency features of the LR input image are extracted by the first convolution layer; then, after every two convolution layers, the feature image extracted by the first convolution layer is added. That is, the inputs of all identity branches in the residual blocks remain the same. In this way, more image information can be transmitted to the deeper layers of the network, and the identity branch also helps the back propagation of gradients during training, avoiding the overfitting phenomenon [36]. The improved residual block structure consists of two convolutional layers and two ReLU layers. The residual block structure is shown in Fig. 4.

Fig. 4 Improved residual block structure. It consists of two convolutional layers and two ReLU layers

A recursive structure is introduced in the residual block. This reduces the number of parameters and is more helpful for information transmission and gradient flow. The LR input image goes through a convolutional layer and a ReLU layer to extract features; the extracted features are then fed into several residual blocks, which recursively learn the residual mapping function. Finally, at the end of the network, a deconvolution layer directly upsamples the learned residual image and restores the SR output image. The optimized network structure is shown in Fig. 5.

Fig. 5 Improved network of IDRRN. It consists of three parts: feature extraction, nonlinear mapping of residual function, and SR image reconstruction

The figure shows the number of convolutional layers in each residual network unit. In the improved network, the residual network units at the front of the network have more layers and those toward the back have fewer. This design lets the entire network contain deeper branches while using the same number of parameters, thereby improving the quality of the generated images. With more deep branches, the optimized network can work more efficiently. At the same time, to avoid gradient dispersion and overfitting in deep networks, a pooling layer is added to the branches with deeper network layers, that is, the residual network units near the output end.

The whole network is composed of three parts: feature extraction, nonlinear mapping of the residual function, and SR image reconstruction. The LR input image passes through a convolutional layer and a ReLU layer to extract features; the extracted features are then input into several residual blocks, and the residual mapping function is learned recursively. Finally, at the end of the network, a deconvolution layer directly upsamples the learned residual image to reconstruct the SR output image.

1.6 Evaluation criteria

Objectively, the deviation error between the restored image and the original image is generally used to evaluate the quality of the image restoration. In this paper, peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS) are used as reference evaluation indicators for image quality.

The larger the PSNR value, the smaller the difference between the reconstruction result and the original image, and the better the reconstruction. The calculation formula is as follows:

$$ PSNR=10\ast {\log}_{10}\left\{\frac{{\left[\max \left({X}_i\right)\right]}^2\ast MN}{{\left\Vert {X}_i-{Y}_i\right\Vert}^2}\right\} $$

Among them, \( X_i \) is the original reference HR image, \( Y_i \) is the reconstructed image, and M and N are the height and width of the image. For 8-bit images, \( \max(X_i) \) is generally 255, which can be substituted directly into the formula.
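The PSNR formula translates directly into code. The following is a minimal sketch, assuming 8-bit images stored as nested lists and a peak value of 255; the function name and the handling of identical images (returning infinity) are choices made for this example.

```python
import math

def psnr(x, y, peak=255.0):
    """PSNR = 10 * log10(peak^2 * M * N / ||X - Y||^2) for two M x N images."""
    m, n = len(x), len(x[0])
    sq_err = sum((a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry))
    if sq_err == 0:
        return math.inf  # identical images: the ratio is unbounded
    return 10.0 * math.log10(peak ** 2 * m * n / sq_err)

reference = [[100, 120], [140, 160]]
restored = [[101, 119], [141, 159]]  # every pixel is off by exactly 1
print(round(psnr(reference, restored), 2))  # 48.13
```

An error of one gray level on every pixel gives a mean squared error of 1, so the result is \( 10\log_{10}(255^2) \approx 48.13 \) dB.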

PSNR is mainly based on the comparison between pixels, and the evaluation of the local structure of the image is relatively weak. Sometimes the PSNR values of the two images are close, but the visual effects of the images are very different. Images generally have their own structures, and there is more or less correlation between adjacent pixels. SSIM is a structural parameter between the reconstruction result and the reference high-resolution image. The calculation formula is as follows:

$$ SSIM\left(X,Y\right)=\frac{\left(2{\mu}_X{\mu}_Y+{C}_1\right)\left(2{\sigma}_{XY}+{C}_2\right)}{\left({\mu}_X^2+{\mu}_Y^2+{C}_1\right)\left({\sigma}_X^2+{\sigma}_Y^2+{C}_2\right)} $$

Among them, X and Y are the reference HR image and the restored result image, respectively. \( \mu_X \) and \( \mu_Y \) are the mean pixel values of the two images, which are defined as follows:

$$ {\mu}_X=\frac{1}{N}\sum \limits_{i=1}^NX(i) $$
$$ {\mu}_Y=\frac{1}{N}\sum \limits_{i=1}^NY(i) $$

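Here, N is the number of pixels of the image after it is vectorized column by column (not the image width used in the PSNR formula). \( \sigma_X \) and \( \sigma_Y \) are the corresponding standard deviations, defined as follows: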

$$ {\sigma}_X={\left(\frac{1}{N-1}\sum \limits_{i=1}^N{\left(X(i)-{\mu}_X\right)}^2\right)}^{\frac{1}{2}} $$
$$ {\sigma}_Y={\left(\frac{1}{N-1}\sum \limits_{i=1}^N{\left(Y(i)-{\mu}_Y\right)}^2\right)}^{\frac{1}{2}} $$

\( \sigma_{XY} \) is the covariance, which is defined as:

$$ {\sigma}_{XY}=\frac{1}{N-1}\sum \limits_{i=1}^N\left(X(i)-{\mu}_X\right)\left(Y(i)-{\mu}_Y\right) $$

\( C_1 \) and \( C_2 \) are small positive constants that keep the denominator from being zero. The value of SSIM typically ranges from 0 to 1. The closer the value is to 1, the more similar the two images are, and the better the reconstruction result is.
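The SSIM definition above can be written out as follows. This sketch computes the statistics over the whole vectorized image, exactly as in the formulas given here; common implementations instead average SSIM over local windows. The constants \( C_1=(0.01\cdot 255)^2 \) and \( C_2=(0.03\cdot 255)^2 \) are widely used defaults and are an assumption, since the text does not state the values used.

```python
def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM from global statistics of two vectorized images of equal length N."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased estimates with the 1/(N-1) factor used in the definitions above.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)   # sigma_X squared
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)   # sigma_Y squared
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

reference = [10.0, 20.0, 30.0, 40.0]
restored = [12.0, 18.0, 33.0, 37.0]
print(ssim_global(reference, reference))  # 1.0 up to rounding
print(ssim_global(reference, restored) < 1.0)  # True
```

Comparing an image with itself makes the numerator and denominator coincide, which is why a perfect reconstruction scores 1.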

ERGAS is a quality evaluation method proposed for image fusion research, which reflects the degree of spectral distortion between the restored image and the reference image. It is also commonly used in the super-resolution restoration quality evaluation of images. The calculation formula is as follows:

$$ ERGAS=100\frac{h}{l}\sqrt{\frac{1}{K}\sum \limits_{k=1}^K{\left(\frac{RMSE(k)}{\mu (k)}\right)}^2} $$

Here, l and h represent the resolution before and after image reconstruction, K represents the number of bands, μ(k) represents the mean of band k, and RMSE(k) represents the root mean square error of band k. The ideal value of ERGAS is 0.
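A direct transcription of the ERGAS formula is sketched below. It assumes that each band is passed as a flat list of pixel values, that μ(k) is the mean of the reference band, and that the ratio h/l is supplied by the caller; the function name and these conventions are assumptions made for the example.

```python
import math

def ergas(ref_bands, rec_bands, ratio):
    """ERGAS = 100 * (h / l) * sqrt(mean over bands of (RMSE(k) / mu(k))^2).

    ref_bands, rec_bands: lists of K bands, each a flat list of pixel values.
    ratio: the factor h / l from the formula.
    """
    total = 0.0
    for ref, rec in zip(ref_bands, rec_bands):
        n = len(ref)
        rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, rec)) / n)
        mu = sum(ref) / n  # assumed: mean of the reference band
        total += (rmse / mu) ** 2
    return 100.0 * ratio * math.sqrt(total / len(ref_bands))

reference = [[100.0, 100.0]]  # K = 1 band with mean 100
restored = [[110.0, 90.0]]    # RMSE = 10
print(round(ergas(reference, restored, ratio=0.5), 6))  # 5.0
```

In the toy case the relative error RMSE/μ is 0.1, so the score is 100 · 0.5 · 0.1 = 5; a perfect reconstruction gives 0.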

2 Results and discussion

2.1 Experimental environment and settings

The experimental software environment is Ubuntu 14.04, Python 2.7, and TensorFlow 1.4; the hardware environment is an Intel Core i7-6700K CPU with 16 GB of RAM and an NVIDIA GTX1080 GPU. We use the remote sensing image scene classification data set NWPU-RESISC45 [37] created by Northwestern Polytechnical University. The data set includes 45 scenes, each scene has 700 images, and each image is 256×256 pixels, ensuring the authenticity and diversity of the experimental data.

From each type of remote sensing image, 100 images with obvious features are selected, for a total of 4500 images. These images constitute the training data set used to train the algorithm model. In addition, a total of 450 images of each type are chosen as test data sets, and different SR algorithms (SRCNN, FSRCNN, DRCN, VDSR, EDSR, and IDRRN) are evaluated on them. Some of the training images are shown in Fig. 6, comprising the following scenes: airplane, basketball_court, bridge, circular_farmland, harbor, industrial_area, intersection, and parking_lot.

Fig. 6 Scenes used in the experiments. a Airplane, b basketball_court, c bridge, d circular_farmland, e harbor, f industrial_area, g intersection, and h parking_lot

For the input images, the original training image is first downsampled by the magnification factor n to obtain an LR image. The LR image is then cropped into a set of \( f_{sub}\times f_{sub} \)-pixel sub-images with stride s, and the corresponding \( {\left(n{f}_{sub}\right)}^2 \)-pixel HR sub-images are cropped from the original image. These LR/HR sub-image pairs are the training samples. To ensure that the image size does not change during the mapping process, the convolutional layers are zero-padded. When training IDRRN, the deconvolution filter generates an output image of size \( {\left(n{f}_{sub}-n+1\right)}^2 \). Therefore, we need to crop n − 1 boundary pixels from each dimension of the HR sub-image.
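The size bookkeeping in this step can be checked with a short calculation. The sketch below uses the standard output-size rule for a transposed convolution, (input − 1) · stride + kernel − 2 · padding. The sub-image size, the crop stride, and the padding of 2 are hypothetical values chosen for illustration; the padding is picked so that a 5×5 filter reproduces the \( n{f}_{sub}-n+1 \) output size stated above.

```python
def crop_origins(length, f_sub, stride):
    """Start coordinates of f_sub-wide windows taken every `stride` pixels."""
    return list(range(0, length - f_sub + 1, stride))

def deconv_output_size(f_sub, n, kernel=5, padding=2):
    """Transposed-convolution output width: (in - 1) * stride + kernel - 2 * padding."""
    return (f_sub - 1) * n + kernel - 2 * padding

def hr_target_size(f_sub, n):
    """Width of the HR sub-image after cropping n - 1 boundary pixels."""
    return n * f_sub - (n - 1)

hr_size, f_sub, stride = 256, 16, 8  # f_sub and stride are hypothetical
for n in (2, 4):
    lr_size = hr_size // n
    origins = crop_origins(lr_size, f_sub, stride)
    print(n, len(origins) ** 2, deconv_output_size(f_sub, n), hr_target_size(f_sub, n))
# 2 225 31 31
# 4 49 61 61
```

The last two columns agree for every n, which confirms that the network output and the cropped HR target have the same size and can be compared pixel by pixel in the loss.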

2.2 Quantitative results of SR methods

The network depth of the IDRRN algorithm proposed in this paper has 12 layers. The filter size should be odd so that it has a center, such as 3×3, 5×5, or 7×7. The use of smaller convolution kernels is one of the current trends to reduce parameters while ensuring network accuracy.

The parameter settings of the convolution layers are the same as in VDSR [24]. All convolutional layer filters are 3×3 in size, and the number of filters is 64. The deconvolution layer is initialized randomly from a Gaussian distribution with mean 0 and standard deviation 0.001, and the ReLU function is used as the activation function. The size of the deconvolution filter follows the DRCN algorithm [35] and is 5×5. Its stride is equal to the magnification factor n. During training, the batch size is 128, the momentum is 0.9, and the weight decay parameter is 0.0001. The initial learning rate is set to 0.1 and is halved every 15 epochs; training stops after 120 epochs, and the loss function is the mean square error (MSE).
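The learning-rate schedule described above is a step decay and can be written as a one-line function. The sketch assumes zero-based epoch indices, which the text does not specify; the function and parameter names are hypothetical.

```python
def learning_rate(epoch, initial=0.1, drop_every=15, factor=0.5):
    """Step decay: start at `initial` and halve every `drop_every` epochs."""
    return initial * factor ** (epoch // drop_every)

total_epochs = 120
schedule = [learning_rate(e) for e in range(total_epochs)]
print(schedule[0], schedule[15], schedule[119])  # 0.1 0.05 0.00078125
```

Over 120 epochs the rate is halved seven times, so training ends at 0.1 · 0.5⁷ ≈ 7.8 × 10⁻⁴.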

The performance of the proposed approach has been compared with the results obtained by six different SR methods available in the literature (Bicubic, SRCNN [22], FSRCNN [28], DRCN [35], VDSR [24], and EDSR [30]). Three different scaling factors, ×2, ×3, and ×4, have been tested over the considered image data set (airplane, bridge, harbor, intersection, and parking_lot). All the tested methods have been used with the default settings suggested by their authors for each particular scaling ratio. Table 1 summarizes the PSNR/SSIM results of the SR techniques.

Table 1 PSNR/SSIM values of state-of-the-art SR methods. The bold number indicates the best performance

As shown in Table 1, the average PSNR and SSIM values of the images generated by the proposed method are higher than those of the other current mainstream SISR algorithms. The PSNR values are optimal in the 5 types of scenes. The maximum PSNR gain is 5.19 dB at ×2 magnification, 3.99 dB at ×3 magnification, and 2.74 dB at ×4 magnification. In terms of SSIM, all values are optimal except for harbor and intersection at ×4 magnification. The maximum SSIM gain of the proposed algorithm is 0.1088 at ×2 magnification, 0.1839 at ×3 magnification, and 0.0759 at ×4 magnification.

Because of the particularity of remote sensing images, this paper also uses the ERGAS value defined in Sect. 1.6 (Formula (17)) to compare the SR effect and further verify the effectiveness of the improved algorithm. Table 2 shows that among the 15 ERGAS results, the IDRRN algorithm obtains 11 optimal values.

Table 2 ERGAS values of state-of-the-art SR methods. The bold number indicates the best performance

By analyzing and comparing the SR results of Tables 1 and 2, we find that the recursive residual learning can transfer more effective image information to the depth of the network, learn more image features, and make the image restoration quality improve greatly.

Furthermore, owing to its inherent parameter sharing, the proposed IDRRN approach achieves higher parameter efficiency than other learning-based methods. In Fig. 7, we illustrate the parameters-to-PSNR relationship of our model and several state-of-the-art methods, including SRCNN, FSRCNN, DRCN, VDSR, and EDSR. Our method represents a favorable trade-off between model size and SR performance and has a modest processing time.

Fig. 7 Average PSNR and number of parameters for scale factor ×3 of various SISR methods

Adding the improved recursive structure does not increase the number of parameters, and it improves the restoration quality of the image. The network structure is more compact, and the objective performance is better.
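The effect of weight sharing on model size can be made concrete with a parameter count. The sketch below counts only the residual units, using the 3×3 filters and 64 channels stated in the experimental settings and two convolution layers per unit; the function names and the unit counts tried are hypothetical, and the figures are not the reported model sizes.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

def residual_units_params(num_units, shared, kernel=3, channels=64):
    """Parameters of `num_units` residual units with two conv layers each."""
    per_unit = 2 * conv_params(kernel, channels, channels)
    return per_unit if shared else num_units * per_unit

for units in (1, 3, 5):
    print(units,
          residual_units_params(units, shared=True),
          residual_units_params(units, shared=False))
# 1 73856 73856
# 3 73856 221568
# 5 73856 369280
```

With shared weights the count stays at 73,856 however many recursions are unrolled, whereas independent units grow linearly with depth.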

2.3 Visual results and discussion

In order to demonstrate the effectiveness of our approach more fully, we also show some of the visual comparisons on three scales ×2, ×3, and ×4. Figures 8, 9, and 10 show the qualitative evaluation results of various algorithms. By enlarging the details of the image, the quality of image restoration of several SISR methods can be intuitively evaluated from the visual effect.

Fig. 8 Comparison of restored HR images of “airplane_061” obtained via various methods with a scale factor of ×2. a Bicubic, b SRCNN, c FSRCNN, d DRCN, e VDSR, f EDSR, g IDRRN, and h original

Fig. 9 Comparison of restored HR images of “harbor_342” obtained via various methods with a scale factor of ×3. a Bicubic, b SRCNN, c FSRCNN, d DRCN, e VDSR, f EDSR, g IDRRN, and h original

Fig. 10 Comparison of restored HR images of “bridge_142” obtained via various methods with a scale factor of ×4. a Bicubic, b SRCNN, c FSRCNN, d DRCN, e VDSR, f EDSR, g IDRRN, and h original

It can be seen from the figures that our method significantly improves both image sharpness and clarity. After processing, it is easier to identify multiple object categories in the remote sensing image. IDRRN overcomes the overly smooth reconstruction results of the traditional methods, and its reconstruction results restore more high-frequency details.

In addition, from the comparison of the enlarged parts of the tail of the aircraft in Fig. 8, the ships in the port in Fig. 9, and the vehicles on the bridge in Fig. 10, it can be seen that the image after the SR reconstruction by the IDRRN generation network is sharper compared with other mainstream algorithms. It has a better performance in the restoration of remote sensing image details, and it is more effective in repairing complex textures in damaged images. After repairing, the details in the image are richer and more consistent with the visual characteristics of the human eye. With the SR restoration of the remote sensing image, the texture and edges are clearer, and the objects in the output image are easier to recognize.

3 Conclusion

In this paper, we propose a residual network that introduces an improved recursive structure into the residual block. The skip connections and the recursive structure reduce the amount of feature information that the network must carry through its layers, which enables high-quality SR recovery of remote sensing images. Experiments were performed on the NWPU-RESISC45 remote sensing image data set, with PSNR, SSIM, and ERGAS as the objective quality metrics. The results show that, compared with other CNN-based super-resolution methods, the proposed method has a more compact network structure and fewer model parameters and reconstructs richer detail. Moreover, the restored images look better and are better suited to subsequent remote sensing image analysis.
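As an illustration of two of the objective metrics named above, the following is a minimal NumPy sketch of PSNR and ERGAS. It assumes the standard textbook definitions, which may differ in detail from the formulas used in the experiments; the function names `psnr` and `ergas`, the `max_value` default of 255, and the `scale` argument (the SR factor, used as the ratio of low-resolution to high-resolution pixel size) are choices made for this example only. SSIM is omitted because it requires windowed local statistics.

```python
import numpy as np


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB (assumes pixel values in [0, max_value])."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


def ergas(reference, restored, scale):
    """ERGAS under the usual definition:
    100 * (1 / scale) * sqrt(mean over bands of RMSE_k^2 / mean_k^2).
    Assumes every band of the reference has a non-zero mean.
    """
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    if reference.ndim == 2:  # treat a grayscale image as a single band
        reference = reference[..., np.newaxis]
        restored = restored[..., np.newaxis]
    rmse_sq = np.mean((reference - restored) ** 2, axis=(0, 1))  # per band
    mean_sq = np.mean(reference, axis=(0, 1)) ** 2               # per band
    return 100.0 / scale * np.sqrt(np.mean(rmse_sq / mean_sq))
```

A higher PSNR and a lower ERGAS both indicate a reconstruction that is closer to the reference image.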

In future work, we will try to generalize the proposed IDRRN method to color images by designing a more compact network structure and improving the loss function of the model. We also hope to further improve the detail of the super-resolved images and the recovery of complex textures.

Availability of data and materials

The supporting data involved in the current study are available from the corresponding author on reasonable request.



Abbreviations

SISR: Single-frame image super-resolution
IDRRN: Improved deep recursive residual network
HR: High resolution
LR: Low resolution
IBP: Iterative backward projection
POCS: Projection onto convex sets
MAP: Maximum a posteriori
TV: Total variation
GTV: General total variation
CNN: Convolutional neural network
SRCNN: Super-resolution convolutional neural network
FSRCNN: Fast super-resolution convolutional neural network
VDSR: Very deep convolutional networks for image super-resolution
SRGAN: Super-resolution generative adversarial network
EEDS: End-to-end deep and shallow networks
EDSR: Enhanced deep super-resolution network
ResNet: Residual network
DRCN: Deeply-recursive convolutional network
PSNR: Peak signal-to-noise ratio
SSIM: Structural similarity
ERGAS: Erreur Relative Globale Adimensionnelle de Synthèse
MSE: Mean square error


References

  1. H. Ghassemian, A review of remote sensing image fusion methods. Inf. Fusion 32, 75–89 (2016)

  2. J.C. White, N.C. Wulder, M. Vastaranta, T. Hilker, P. Tompalski, Remote sensing technologies for enhancing forest inventories: a review. Can. J. Remote Sens. 42(5), 619–641 (2016)

  3. J.M. Haut, R. Fernandez-Beltran, M.E. Paoletti, J. Plaza, A. Plaza, F. Pla, A new deep generative network for unsupervised remote sensing single-image super-resolution. IEEE Trans. Geosci. Remote Sens. 56(11), 6792–6810 (2018)

  4. W. Ma, Z. Pan, J. Guo, B. Lei, Achieving super-resolution remote sensing images via the wavelet transform combined with the recursive res-net. IEEE Trans. Geosci. Remote Sens. 57(6), 3512–3527 (2019)

  5. J. Gu, X. Sun, Y. Zhang, K. Fu, L. Wang, Deep residual squeeze and excitation network for remote sensing image super-resolution. Remote Sens. 11(15), 1817 (2019)

  6. S. Singh, M.K. Kalra, J. Hsieh, P.E. Licato, S. Do, H.H. Pien, M.A. Blake, Abdominal CT: comparison of adaptive statistical iterative and filtered back projection reconstruction techniques. Radiology 257(2), 373–383 (2010)

  7. X. Li, W. Fu, Regularized super-resolution restoration algorithm for single medical image based on fuzzy similarity fusion. J. Image Video Proc. 2019, 83 (2019)

  8. H. Shen, L. Zhang, B. Huang, P. Li, A MAP approach for joint motion estimation, segmentation, and super resolution. IEEE Trans. Image Process. 16(2), 479–490 (2007)

  9. S. Huang, J. Sun, Y. Yang, Y. Fang, Y. Que, Robust single-image super-resolution based on adaptive edge-preserving smoothing regularization. IEEE Trans. Image Process. 27(6), 2650–2663 (2018)

  10. X. Li, Y. Hu, X. Gao, D. Tao, B. Ning, A multi-frame image super-resolution method. Signal Process. 90(2), 405–414 (2010)

  11. S. Farsiu, M.D. Robinson, M. Elad, P. Milanfar, Fast and robust multiframe super resolution. IEEE Trans. Image Process. 13(10), 1327–1344 (2004)

  12. N. Del Gallego, J. Ilao, Multiple-image super-resolution on mobile devices: an image warping approach. J. Image Video Proc. 2017, 8 (2017)

  13. L. Zhou, X. Lu, L. Yang, A local structure adaptive super-resolution reconstruction method based on BTV regularization. Multimed. Tools Appl. 71(3), 1879–1892 (2014)

  14. W.T. Freeman, E.C. Pasztor, O.T. Carmichael, Learning low-level vision. Int. J. Comput. Vision 40(1), 25–47 (2000)

  15. C. Liu, H.Y. Shum, W.T. Freeman, Hallucinating faces: theory and practice. Int. J. Comput. Vision 52(4), 1289–1306 (2007)

  16. J.J. Li, X.H. Li, Super-resolution reconstruction method for single frame image based on clustering. Comput. Eng. 39(7), 284–287 (2013)

  17. J. Gao, Y. Wang, M. Cai, Y. Pan, H. Xu, J. Jiang, H. Ji, H. Wang, Mechanistic insights into EGFR membrane clustering revealed by super-resolution imaging. Nanoscale 7(6), 2511–2519 (2015)

  18. Q. Dai, S. Yoo, A. Kappeler, A.K. Katsaggelos, Sparse representation-based multiple frame video super-resolution. IEEE Trans. Image Process. 26(2), 765–781 (2017)

  19. J.C. Ferreira, E. Vural, C. Guillemot, Geometry-aware neighborhood search for learning local models for image super-resolution. IEEE Trans. Image Process. 25(3), 1354–1367 (2016)

  20. Z. Zhu, F. Guo, H. Yu, C. Chen, Fast single image super-resolution via self-example learning and sparse representation. IEEE Trans. Multimedia 16(8), 2178–2190 (2014)

  21. H. Yin, S. Li, L. Fang, Simultaneous image fusion and super-resolution using sparse representation. Inf. Fusion 14(3), 229–240 (2013)

  22. C. Dong, C.C. Loy, K.M. He, X.O. Tang, Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)

  23. W. Yang, J. Feng, J. Yang, F. Zhao, J. Liu, Z. Guo, S. Yan, Deep edge guided recurrent residual learning for image super-resolution. IEEE Trans. Image Process. 26(12), 5895–5907 (2017)

  24. J. Kim, J.K. Lee, K.M. Lee, Accurate image super-resolution using very deep convolutional networks, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Las Vegas, 2016), pp. 1646–1654

  25. W. Shi, J. Caballero, F. Huszár, J. Totz, A.P. Aitken, R. Bishop, D. Rueckert, Z. Wang, Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Las Vegas, 2016), pp. 1874–1883

  26. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi, Photo-realistic single image super-resolution using a generative adversarial network, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Honolulu, 2017), pp. 4681–4690

  27. U. Mudenagudi, S. Banerjee, P.K. Kalra, Space-time super-resolution using graph-cut optimization. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 995–1008 (2011)

  28. C. Dong, C.C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, in Proc. European Conf. Comput. Vis. (ECCV) (Springer, Amsterdam, 2016), pp. 391–407

  29. Y. Wang, L. Wang, H. Wang, P. Li, End-to-End image super-resolution via deep and shallow convolutional networks. IEEE Access 7, 31959–31970 (2019)

  30. B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, Enhanced deep residual networks for single image super-resolution, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Honolulu, 2017), pp. 136–144

  31. C. Yan, B. Gong, Y. Wei, Y. Gao, Deep multi-view enhancement hashing for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. (2020)

  32. C. Yan, B. Shao, H. Zhao, R. Ning, Y. Zhang, F. Xu, 3D room layout estimation from a single RGB image. IEEE Trans. Multimedia (2020)

  33. C. Yan, Z. Li, Y. Zhang, Y. Liu, X. Ji, Y. Zhang, Depth image denoising using nuclear norm and learning graph model. ACM Trans. Multimedia Comput. Commun. Appl. (2020)

  34. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Las Vegas, 2016), pp. 770–778

  35. J. Kim, J.K. Lee, K.M. Lee, Deeply-recursive convolutional network for image super-resolution, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Las Vegas, 2016), pp. 1637–1645

  36. Y. Tai, J. Yang, X. Liu, Image super-resolution via deep recursive residual network, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) (Honolulu, 2017), pp. 3147–3155

  37. G. Cheng, J. Han, X. Lu, Remote sensing image scene classification: benchmark and state of the art. Proc. IEEE 105(10), 1865–1883 (2017)



Acknowledgements

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.


Funding

This work was supported by the Opening Project of Jiangsu Key Laboratory of Advanced Numerical Control Technology (SYKJ201804), by the Jiangsu Postdoctoral Science Foundation (2019K041), by the Nanjing Subsidy Project of IUR Cooperation (201722043), and by the Changzhou Sci&Tech Program (CE20195030). It was also supported in part by the Deanship of Scientific Research at King Saud University through research group No. RG-1441-331.

Author information

Contributions

Jiali Tang and Jie Zhang designed the algorithm and revised the article content. Jiali Tang carried out the experiments and wrote the manuscript. Dan Chen and Najla Al-Nabhan offered suggestions on the structure of the manuscript and participated in revising it. Chenrong Huang approved the final manuscript and polished the English. The authors read and approved the final manuscript.

Authors’ information

Jiali Tang received the Ph.D. degree in Mechatronic Engineering from Jiangsu University, China, in 2016. He is currently an Associate Professor at the College of Computer Engineering, Jiangsu University of Technology. His research interests include image processing, computer vision, and pattern recognition.

Jie Zhang received the M.S. degree in Computer Applications Technology from Yangzhou University, China, in 2012. He is currently an Associate Professor at the College of Computer Engineering, Jiangsu University of Technology. His research interests include image processing and intelligent information systems.

Dan Chen received the M.S. degree in Computer Application Technology from Nanjing University of Aeronautics and Astronautics, China, in 2009. She is currently a lecturer with the College of Computer Engineering, Jiangsu University of Technology. Her research interests include embedded hardware and intelligent information systems.

Najla Al-Nabhan is an Assistant Professor at the Computer Science Department (CS), College of Computer and Information Sciences (CCIS), King Saud University (KSU), Riyadh, Saudi Arabia. She received her Ph.D. degree in CS from KSU. She was also a visiting Ph.D. student at George Washington University. Her research interests include wireless sensor networks, the Internet of Things, networking and systems, communication modeling, mobile computing, cloud computing, emergency management, and UAVs.

Chenrong Huang is a professor at the Nanjing Institute of Technology. She received the M.S. and Ph.D. degrees in Computer Application Technology from the Nanjing University of Science & Technology, China, in 1992 and 2005, respectively. Her current research interests include computer vision and image understanding.

Corresponding author

Correspondence to Chenrong Huang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Tang, J., Zhang, J., Chen, D. et al. Single-frame super-resolution for remote sensing images based on improved deep recursive residual network. J Image Video Proc. 2021, 20 (2021).
