Pansharpening based on convolutional autoencoder and multi-scale guided filter

AL Smadi, Ahmad; Yang, Shuyuan; Kai, Zhang; Mehmood, Atif; Wang, Min; Alsanabani, Ala

doi:10.1186/s13640-021-00565-3

Research
Open access
Published: 19 July 2021

Ahmad AL Smadi¹,
Shuyuan Yang¹,
Zhang Kai²,
Atif Mehmood¹,
Min Wang³ &
…
Ala Alsanabani¹

EURASIP Journal on Image and Video Processing volume 2021, Article number: 25 (2021)

Abstract

In this paper, we propose a pansharpening method based on a convolutional autoencoder. The convolutional autoencoder is a type of convolutional neural network (CNN) that aims to reduce the input dimension and represent image features with high accuracy. First, the autoencoder network is trained to minimize the difference between its reconstruction of the degraded panchromatic image patches and the original panchromatic image patches. The intensity component, which is obtained by adaptive intensity-hue-saturation (AIHS), is then fed into the trained convolutional autoencoder network to generate an enhanced intensity component of the multi-spectral image. Pansharpening is accomplished by improving the panchromatic image from the enhanced intensity component using a multi-scale guided filter; then, the semantic detail is injected into the upsampled multi-spectral image. Real and degraded datasets are used for the experiments, which show that the proposed technique is able to preserve high spatial detail and high spectral characteristics simultaneously. Furthermore, the experimental results demonstrate that the proposed method achieves state-of-the-art results in terms of subjective and objective assessments on remote sensing data.

1 Introduction

Many applications based on remote sensing satellites require observation of changes on the earth, such as image fusion [1–3] and land cover mapping [4]. Consequently, pansharpening is of essential interest to many researchers. Because of data transmission constraints, it is difficult for remote sensing satellites to acquire a single image with both high spatial resolution and high spectral resolution; instead, they provide a panchromatic (PAN) image with high spatial resolution and a multi-spectral (MS) image with high spectral resolution. The main objective of pansharpening is to fuse the high spatial resolution PAN image with the corresponding high spectral resolution MS image to obtain an MS image with both high spatial and high spectral resolution [5].

As indicated by [6–8], a wide assortment of image fusion techniques can be classified into two classes according to how the spatial detail is extracted from the PAN image: (1) component substitution (CS) and (2) multi-resolution analysis (MRA). Some methods do not belong to either category, such as model-based pansharpening methods [9, 10]. Conventional component substitution-based methods include intensity-hue-saturation (IHS) [11], principal component analysis (PCA) [12], Gram-Schmidt [13], and Brovey transform [14], in which the detail information is extracted as the difference between the PAN image and a linear combination of the upsampled MS bands; as a result, the component substitution-based methods tend to introduce spectral distortion in the fused image. In contrast, the multi-resolution analysis-based methods, such as Smoothing Filter-based Intensity Modulation (SFIM) [15], generalized Laplacian pyramid (MTF-GLP) [16], and Indusion [17], extract the detail information as the difference between the PAN image and its low-resolution version. These methods offer outstanding spectral quality, but they suffer from spatial distortion in the fused image. Edge-preserving filtering techniques play an important role in pansharpening, and the guided image filter [18] is one of the best known. Yang et al. [19] introduced a multi-scale guided filter based on adaptive intensity-hue-saturation (MSGF); they used the intensity image as a guidance image to enhance the PAN image. In our work, the multi-scale guided filter is used to enhance the semantic detail map, with the enhanced intensity image obtained by the convolutional autoencoder (CAE) serving as the guidance image.

Recently, the use of deep neural networks has been a hot topic in many fields [20–25]. Researchers have started investigating this topic for pansharpening. Scarpa et al. [21] proposed a convolutional neural network-based pansharpening method.

A residual convolutional neural network (RCNN) was utilized to achieve pansharpening [26]. Huang et al. [27] introduced a pansharpening model using deep neural networks (DNN), which utilized the relationship between PAN image patches and MS image patches for training the neural network. More recently, in [28], convolutional autoencoder (CAE)-based multi-spectral image fusion was introduced, in which the low-resolution MS images are fed into the trained CAE to generate estimated high-resolution MS images; the fusion is then achieved by injecting the detail map of each image into the corresponding estimated high-resolution MS bands. Inspired by this, we propose a pansharpening technique based on a convolutional autoencoder. First, the convolutional autoencoder is trained to generate the original PAN image patches from the degraded PAN image patches; the AIHS intensity component is then fed to the trained network to obtain an enhanced intensity component. Further, the guided filter is employed to enhance the PAN image using the enhanced intensity component. Finally, experiments are conducted on both real and degraded datasets. We show that combining the convolutional autoencoder with a guided filter preserves high spatial detail and high spectral characteristics simultaneously, achieving state-of-the-art results on multiple datasets. Our method is also more robust against spectral and spatial distortions.

1.1 Convolutional autoencoder

An autoencoder is an unsupervised learning model that takes an input image and attempts to reconstruct it. The convolutional autoencoder is a type of convolutional neural network that reproduces the input image patches at the output. The design of a convolutional autoencoder comprises two fundamental phases: the encoding phase and the decoding phase. The encoding phase represents half of the network and incorporates convolution and max-pooling layers. In contrast, the decoding phase, which recreates the input image patches from the encoded representation, comprises deconvolution and upscaling layers [29].

1.1.1 Encoding phase

Consider an input volume I={I₁,⋯,I_D} of depth D. Each convolutional layer is composed of n convolutional filters $F^{(1)}=\left \{F_{1}^{(1)}, \ldots, F_{\mathrm {n}}^{(1)}\right \}$, which are convolved with the input to produce n feature maps:

$$ O_{m}=a\left(I * F_{m}^{(1)}+b_{m}^{(1)}\right) \quad m=1,2, \cdots, n $$

(1)

Here, O_m represents the m-th feature map of the input I, b_m represents the bias, and a denotes an activation function.

1.1.2 Decoding phase

The n feature maps produced by the encoder are used as input to the decoder to reconstruct the input image. The reconstruction is obtained by convolving $O=\left \{O_{i}\right \}_{i=1}^{n}$ with the convolutional filters $F^{(2)}=\left \{F_{1}^{(2)}, \ldots, F_{\mathrm {n}}^{(2)}\right \}$ as follows:

$$ \tilde{I}=a\left(O * F_{\mathrm{m}}^{(2)}+b_{\mathrm{m}}^{(2)}\right) $$

(2)

Since the output image patches and the input have the same dimensions, I and $\tilde {I}$ can be related through a loss function that is used to update the weights during training, for example, the mean square error (MSE):

$$ \mathcal{L}(I, \tilde{I})=\frac{1}{2}\|I-\tilde{I}\|_{2}^{2} $$

(3)
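Equations (1) to (3) can be illustrated with a minimal NumPy sketch for a single-channel patch. This is only an illustration under stated assumptions, not the network used in the paper: the helper names (`conv2d_same`, `encode`, `decode`, `reconstruction_loss`) are hypothetical, the "convolution" is implemented as cross-correlation with zero padding (as is customary in CNN libraries), ReLU is assumed for the encoder activation, and the decoder output is left linear.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Cross-correlate a 2D array with a small odd-sized kernel, zero-padded
    so that the output has the same size as the input."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out

def relu(z):
    return np.maximum(z, 0.0)

def encode(patch, filters, biases):
    """Eq. (1): one feature map O_m = a(I * F_m + b_m) per filter (a = ReLU, assumed)."""
    return [relu(conv2d_same(patch, f) + b) for f, b in zip(filters, biases)]

def decode(feature_maps, filters, bias):
    """Eq. (2) with a linear output activation (assumed): the filtered feature
    maps are summed into a single reconstructed patch."""
    return sum(conv2d_same(o, f) for o, f in zip(feature_maps, filters)) + bias

def reconstruction_loss(patch, recon):
    """Eq. (3): half the squared L2 distance between input and reconstruction."""
    return 0.5 * np.sum((patch - recon) ** 2)

# Usage with random (untrained) filters on an 8x8 patch.
rng = np.random.default_rng(0)
patch = rng.random((8, 8))
enc_filters = [rng.normal(size=(3, 3)) for _ in range(4)]
dec_filters = [rng.normal(size=(3, 3)) for _ in range(4)]
features = encode(patch, enc_filters, [0.0] * 4)
recon = decode(features, dec_filters, 0.0)
loss = reconstruction_loss(patch, recon)
```

During training, the filters and biases would be adjusted to reduce `loss`; that optimization step is omitted here.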

1.2 Adaptive intensity-hue-saturation

The IHS technique, introduced in [30], belongs to the CS-based methods and is only suitable for MS images with three bands [11]. Even though the IHS strategy offers high spatial quality, it suffers severely from spectral distortion. The general formula for generating an intensity component is as follows:

$$ I=\sum_{i=1}^{n} \alpha_{i} M_{i} $$

(4)

where α_i denotes the weight coefficients, n represents the number of spectral bands, and M_i indicates the i-th band of the upsampled MS image. Rahmani et al. [31] introduced AIHS, in which the optimal weights are obtained by solving the following optimization problem:

$$ \alpha_{i}^{\ast}=\arg\min_{\alpha_{i}}\left\|PAN-\sum_{i=1}^{n} \alpha_{i} M_{i}\right\|^{2} $$

(5)

where PAN denotes panchromatic image.
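As a rough illustration of Eqs. (4) and (5), the weights can be estimated by ordinary least squares over all pixels. The sketch below assumes the upsampled MS image is stored as an array of shape `(n_bands, H, W)`; the function names are hypothetical, and any constraints on the weights (for example non-negativity) that a full AIHS implementation may impose are ignored.

```python
import numpy as np

def aihs_weights(ms_up, pan):
    """Unconstrained least-squares solution of Eq. (5).

    ms_up: upsampled MS image, shape (n_bands, H, W)
    pan:   panchromatic image, shape (H, W)
    """
    n_bands = ms_up.shape[0]
    design = ms_up.reshape(n_bands, -1).T      # one row per pixel, one column per band
    target = pan.reshape(-1)
    alpha, *_ = np.linalg.lstsq(design, target, rcond=None)
    return alpha

def intensity(ms_up, alpha):
    """Eq. (4): weighted sum of the upsampled MS bands."""
    return np.tensordot(alpha, ms_up, axes=1)

# Usage on synthetic data: the PAN image is built from known weights.
rng = np.random.default_rng(0)
ms_up = rng.random((4, 16, 16))
true_alpha = np.array([0.1, 0.2, 0.3, 0.4])
pan = np.tensordot(true_alpha, ms_up, axes=1)
alpha = aihs_weights(ms_up, pan)
I = intensity(ms_up, alpha)
```

On this synthetic example the estimated weights recover `true_alpha` up to numerical precision, because the PAN image is an exact linear combination of the bands.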

1.3 Guided filter

The guided filter (GF) was introduced by He et al. [32]. It has been widely used in image processing tasks such as detail enhancement and image fusion, and it can avoid ringing artifacts. The GF relies on a local linear model that uses the guidance image gui to filter the input image inp. Therefore, the output image Out can preserve the essential information of inp and, at the same time, follow the variation trend of gui [19]. Mathematically, the guided filter finds a pair of scalar values a_i and b_i that solves the following problem [33]:

$$ \underset{a_{i}, b_{i}}{\operatorname{argmin}} \frac{1}{n}\left\|\mathbf{inp}_{i}-\left(a_{i} \mathbf{gui}_{i}+b_{i}\right)\right\|_{2}^{2}+\zeta\left|a_{i}\right|_{2}^{2} $$

(6)

Here, n denotes the number of pixels in a square window w of size (2r+1)×(2r+1), and ζ is a small regularization constant that prevents a_i from becoming too large. The solution is given by:

$$ a_{i}=\frac{\frac{1}{n}\left(\mathbf{inp}_{i}-\bar{\mathbf{inp}}_{i}\right)^{\mathrm{T}}\left(\mathbf{\mathbf{gui}}_{i}-\bar{\mathbf{gui}}_{i}\right)}{\frac{1}{n}\left(\mathbf{gui}_{i}-\bar{\mathbf{gui}}_{i}\right)^{\mathrm{T}}\left(\mathbf{gui}_{i}-\bar{\mathbf{gui}}_{i}\right)+\mathrm{\zeta}} $$

(7)

$$ =\frac{\operatorname{cov}\left(\mathbf{inp}_{i}-\bar{\mathbf{inp}}_{i}, \mathbf{gui}_{i}-\bar{\mathbf{gui}}_{i}\right)}{\operatorname{var}\left(\mathbf{gui}_{i}-\bar{\mathbf{gui}}_{i}\right)+\mathrm{\zeta}} $$

(8)

$$ b_{i}=\bar{\mathbf{inp}}_{i}-a_{i} \bar{\mathbf{gui}}_{i} $$

(9)

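Here, $\bar {\mathbf {inp}}_{i}$ and $\bar {\mathbf {gui}}_{i}$ represent the mean of the input image and the mean of the guidance image in the window, respectively. After computing a_i and b_i for all windows in the image, the filtering output is computed from the local linear model of Eq. (6) as follows:

$$ \mathbf{Out}_{i}=\bar{a}_{i} \mathbf{gui}_{i}+\bar{b}_{i} $$

(10)

where $\bar {a}_{i}$ and $\bar {b}_{i}$ are the averages of the coefficients over all windows that contain pixel i.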

In this paper, the guided filter operation is denoted as follows:

$$ \mathbf{Out}=\mathbf{GF} (\mathbf{gui}, \mathbf{inp}) $$

(11)
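The operation Out = GF(gui, inp) can be sketched in NumPy using box means, following the local linear model of Eq. (6) and the closed-form coefficients of Eqs. (7) to (9). This is a minimal sketch rather than the authors' implementation: the function names and the edge-replication border handling are assumptions, and the default values r = 8 and ζ = 0.8² are the settings reported in Section 3.1, which depend on the intensity range of the images.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, with edge replication at the borders."""
    padded = np.pad(img, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    k = 2 * r + 1
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

def guided_filter(gui, inp, r=8, zeta=0.8 ** 2):
    """Filter `inp` under the guidance of `gui` (both 2D float arrays)."""
    mean_g = box_mean(gui, r)
    mean_i = box_mean(inp, r)
    cov_gi = box_mean(gui * inp, r) - mean_g * mean_i   # local covariance
    var_g = box_mean(gui * gui, r) - mean_g ** 2        # local variance of the guidance
    a = cov_gi / (var_g + zeta)                         # Eqs. (7)-(8)
    b = mean_i - a * mean_g                             # Eq. (9)
    # Average the coefficients over all windows covering each pixel, then apply
    # the linear model to the guidance image, as in He et al. [32].
    return box_mean(a, r) * gui + box_mean(b, r)

# Usage on random data.
rng = np.random.default_rng(0)
gui = rng.random((64, 64))
inp = rng.random((64, 64))
out = guided_filter(gui, inp)
```

A larger ζ pulls a toward zero, so the output approaches a plain box-filtered version of the input; a smaller ζ lets the output follow the edges of the guidance image more closely.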

2 Methodology

In this paper, we propose a pansharpening technique based on a convolutional autoencoder and a CS-based method. The main steps of the proposed technique are as follows:

Utilize the convolutional autoencoder to enhance the intensity component, which is obtained by AIHS from the MS and PAN images. The model is trained to enhance the spatial resolution of the degraded PAN image.
Generate the intensity component of the MS image by utilizing the AIHS-based method, and feed it to the trained convolutional autoencoder as the testing step.
Utilize the estimated intensity component to enhance the PAN image by using the guided filter.
Perform the fusion step, which is the last phase of the proposed technique and is explained in detail in Section 2.2.

Figure 1 illustrates the schematic of the proposed method.

2.1 Enhancing the spatial detail

To enhance the spatial detail of the intensity component, we utilize a convolutional autoencoder network that learns the relationship between the PAN image patches and their degraded form. Note that the degraded PAN image is generated using bi-cubic interpolation. The convolutional autoencoder is trained to minimize the difference between its reconstruction of the input image patches and the original image patches. Figure 2 illustrates the applied structure of the convolutional autoencoder.

Following [28], the same training setup is applied here: the PAN image and its spatially degraded version are partitioned into 8 ×8 patches with 5 overlapping pixels, yielding 500,000 patch pairs, and the network is trained for 30 epochs to learn the relationship between the PAN image patches and the degraded image patches. The following equation gives the output patches of the convolutional autoencoder network at each iteration:

$$ \left\{\tilde{P}_{\mathrm{i}}\right\}_{\mathrm{i}=1}^{\mathrm{n}}=\text{Dec}\left(\text{Enc}\left(\left\{P_{\mathrm{i}}^{\mathrm{L}}\right\}_{\mathrm{i}=1}^{\mathrm{n}}\right)\right) $$

(12)

where $\left \{\tilde {P}_{\mathrm {i}}\right \}_{\mathrm {i}=1}^{\mathrm {n}}$ and $\left \{P_{\mathrm {i}}^{\mathrm {L}}\right \}_{\mathrm {i}=1}^{\mathrm {n}}$ represent the output and input patches, respectively, and Enc and Dec denote the encoding and decoding processes, respectively. The encoding process consists of the following layers: (1) the 8 ×8 input image patch; (2) a Conv2D layer, that is, a 2D convolutional layer, with 16 filters, 3 ×3 kernel size, “ReLU” activation, and “same” padding; (3) a Max-Pooling layer, that is, 2D max-pooling over a 2 ×2 region with “same” padding; (4) a Conv2D layer with 8 filters, 3 ×3 kernel size, “ReLU” activation, and “same” padding; (5) a Max-Pooling layer over a 2 ×2 region with “same” padding; and (6) a Conv2D layer with 8 filters, 3 ×3 kernel size, “ReLU” activation, and “same” padding. The “ReLU” activation is used because of its simplicity and computational efficiency compared with other activation functions [34]. CAEs are fully convolutional networks; thus, the decoding process also consists of convolutional layers: (1) a Conv2D layer with 8 filters, 3 ×3 kernel size, “ReLU” activation, and “same” padding; (2) an UpSampling layer, that is, 2D upsampling by a 2 ×2 factor; (3) a Conv2D layer with 8 filters, 3 ×3 kernel size, “ReLU” activation, and “same” padding; (4) an UpSampling layer with a 2 ×2 factor; (5) a Conv2D layer with 16 filters, 3 ×3 kernel size, “ReLU” activation, and “same” padding; and (6) a Conv2D layer with 1 filter, 3 ×3 kernel size, “linear” activation, and “same” padding. Adadelta optimization is used throughout training, and the MSE between the reconstructed output patches and the target patches $\left \{P_{\mathrm {i}}^{\mathrm {H}}\right \}_{\mathrm {i}=1}^{\mathrm {n}}$ is used for updating the weights as follows:

$$ \mathcal{L}\left(\left\{\tilde{P}_{\mathrm{i}}\right\}_{\mathrm{i}=1}^{\mathrm{n}},\left\{P_{\mathrm{i}}^{\mathrm{H}}\right\}_{\mathrm{i}=1}^{\mathrm{n}}\right) =\frac{1}{2} \sum_{i=1}^{\mathrm{n}}\left\|\tilde{P}_{\mathrm{i}}-P_{\mathrm{i}}^{\mathrm{H}}\right\|_{2}^{2} $$

(13)
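The layer description above can be checked with a small shape-tracing sketch in plain Python. It does not build or train a network; it only propagates the tensor shape through the listed layers and counts trainable parameters, assuming one bias per filter. The names `ENCODER`, `DECODER`, and `trace` are hypothetical.

```python
# Each entry is (kind, argument): "conv" -> number of 3x3 filters,
# "pool" / "up" -> spatial factor. The order follows the text.
ENCODER = [("conv", 16), ("pool", 2), ("conv", 8), ("pool", 2), ("conv", 8)]
DECODER = [("conv", 8), ("up", 2), ("conv", 8), ("up", 2), ("conv", 16), ("conv", 1)]

def trace(layers, shape, kernel=3):
    """Return (shapes after each layer, number of trainable parameters)."""
    h, w, c = shape
    shapes, params = [], 0
    for kind, arg in layers:
        if kind == "conv":
            # "same" padding keeps h and w; weights plus one bias per filter (assumed).
            params += kernel * kernel * c * arg + arg
            c = arg
        elif kind == "pool":
            # With "same" padding the output size is ceil(size / factor).
            h, w = -(-h // arg), -(-w // arg)
        elif kind == "up":
            h, w = h * arg, w * arg
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((h, w, c))
    return shapes, params

enc_shapes, enc_params = trace(ENCODER, (8, 8, 1))
dec_shapes, dec_params = trace(DECODER, enc_shapes[-1])
for name, shapes in (("encoder", enc_shapes), ("decoder", dec_shapes)):
    print(name, shapes)
print("trainable parameters:", enc_params + dec_params)
```

Under these assumptions the bottleneck is 2 ×2 ×8 and the decoder output returns to 8 ×8 ×1, so the reconstructed patch can be compared directly with the 8 ×8 target patch in Eq. (13).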

The weights of the convolutional autoencoder network are updated with the back-propagation algorithm during training. In the testing stage, because the PAN image and the corresponding intensity component of the MS image have similar characteristics, the trained network is expected to improve the intensity component of the MS image. First, the intensity component I, which is generated by AIHS (Eqs. (4) and (5)), is partitioned into patches $\left \{I_{\mathrm {i}}\right \}_{\mathrm {i}=1}^{\mathrm {n}}$, which are then fed to the trained network to generate the estimated intensity patches $\left \{E_{I_{i}}\right \}_{\mathrm {i}=1}^{\mathrm {n}}$. Finally, the patches $\left \{E_{I_{i}}\right \}_{\mathrm {i}=1}^{\mathrm {n}}$ are tiled to form the estimated intensity component E_I.
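The partitioning and tiling steps can be sketched as follows. An 8 ×8 patch with 5 overlapping pixels corresponds to a stride of 3 pixels. How overlapping predictions are merged when tiling is not specified in the text; the sketch assumes simple averaging, and it adds one extra patch at the border when the image size does not fit the stride. The function names are hypothetical.

```python
import numpy as np

def _starts(length, size, stride):
    """Start positions of the patches along one axis (assumes length >= size)."""
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] != length - size:
        starts.append(length - size)   # extra patch so the border is covered (assumption)
    return starts

def extract_patches(img, size=8, overlap=5):
    """Partition a 2D image into overlapping size x size patches."""
    stride = size - overlap
    h, w = img.shape
    coords = [(y, x) for y in _starts(h, size, stride) for x in _starts(w, size, stride)]
    patches = np.stack([img[y:y + size, x:x + size] for y, x in coords])
    return patches, coords

def tile_patches(patches, coords, shape):
    """Reassemble patches into an image, averaging where patches overlap (assumption)."""
    size = patches.shape[1]
    acc = np.zeros(shape, dtype=float)
    count = np.zeros(shape, dtype=float)
    for patch, (y, x) in zip(patches, coords):
        acc[y:y + size, x:x + size] += patch
        count[y:y + size, x:x + size] += 1
    return acc / count

# Usage: a round trip without any network in between returns the original image.
rng = np.random.default_rng(0)
intensity_img = rng.random((64, 64))
patches, coords = extract_patches(intensity_img)
restored = tile_patches(patches, coords, intensity_img.shape)
```

In the testing stage, the trained network would be applied to `patches` before calling `tile_patches`.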

2.2 Fusion process

The estimated intensity component E_I is employed to enhance the PAN image by using the two-scale guided filter. First, E_I is used as the guidance image and the PAN image as the input image.

$$ O_{1}=\mathbf{GF} (E_{I}, PAN) $$

(14)

The difference between the input image PAN and the approximation image O₁ is represented by the spatial detail D₁. However, D₁ may blend with the low-frequency component and cause serious spectral distortion [35]; therefore, a second scale of the guided filter is applied, with O₁ as the input image, to obtain O₂.

$$ D_{1}=PAN - O_{1} $$

(15)

$$ O_{2}=\mathbf{GF} (E_{I}, O_{1}) $$

(16)

The difference between O₁ and O₂ is represented by the spatial detail D₂.

$$ D_{2}=O_{1} - O_{2} $$

(17)

The total semantic detail map D_Total is injected into the upsampled MS image through the injection gains g_i, which are computed by Eq. (19).

$$ D_{Total}=D_{1} + D_{2} $$

(18)

$$\mathrm{g}_{\mathrm{i}}=\frac{\operatorname{cov}\left(MS_{i},E_{I}\right)}{\operatorname{var}(E_{I})} $$

(19)

Each band of the high-resolution multi-spectral (HRMS) fused image is obtained by the following equation:

$$ \mathbf{HRMS}_{\mathrm{i}}=MS_{\mathrm{i}}+\mathrm{g}_{\mathrm{i}}D_{Total} $$

(20)
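Equations (14) to (20) can be put together in a short NumPy sketch. The guided filter is passed in as a callable `gf(gui, inp)` so that the sketch stays independent of any particular implementation; the function name `fuse`, the array layout `(n_bands, H, W)`, and the use of global (whole-image) covariance and variance in Eq. (19) are assumptions.

```python
import numpy as np

def fuse(ms_up, pan, e_i, gf):
    """Two-scale detail extraction and injection.

    ms_up: upsampled MS image, shape (n_bands, H, W)
    pan:   PAN image, shape (H, W)
    e_i:   estimated intensity component E_I, shape (H, W)
    gf:    callable gf(gui, inp) returning the filtered image
    """
    o1 = gf(e_i, pan)            # Eq. (14)
    d1 = pan - o1                # Eq. (15)
    o2 = gf(e_i, o1)             # Eq. (16)
    d2 = o1 - o2                 # Eq. (17)
    d_total = d1 + d2            # Eq. (18)

    e_centered = e_i - e_i.mean()
    var_e = np.mean(e_centered ** 2)
    fused = np.empty(ms_up.shape, dtype=float)
    for i, band in enumerate(ms_up):
        gain = np.mean((band - band.mean()) * e_centered) / var_e   # Eq. (19)
        fused[i] = band + gain * d_total                             # Eq. (20)
    return fused

# Usage with a trivial stand-in filter (NOT a guided filter) that halves its input.
rng = np.random.default_rng(0)
e_i = rng.random((16, 16))
pan = rng.random((16, 16))
ms_up = np.stack([2.0 * e_i, e_i + 1.0])
hrms = fuse(ms_up, pan, e_i, gf=lambda gui, inp: 0.5 * inp)
```

Note that D₁ + D₂ = PAN − O₂, so the injected detail is the part of the PAN image removed by two successive filtering passes.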

3 Results and discussion

In this section, several experiments were performed on different datasets to evaluate the performance of the model using a set of quality metrics. Here, 500,000 pairs of 8×8 patches with 5 overlapping pixels, taken from the degraded PAN and the original PAN images, were used for training the network. In total, six datasets were selected from the QuickBird and GeoEye satellites: three degraded datasets (full reference), for which a reference image is available, and three real datasets (no reference image).

We compared our technique with several conventional pansharpening methods, such as IHS [11], PCA [12], BDSD [36], PRACS [37], and AIHS [31], and several state-of-the-art methods such as SFIM [15], MTF-GLP [16], Indusion [17], MSGF [19], CAE [28], and PNN [38]. Seven widely used image quality indexes are employed to assess the quality of the fused images:

1. Correlation coefficient (CC) [39]
2. Universal Image Quality Index (UIQI) [40]
3. Quaternion Theory-based Quality Index (Q4) [40]
4. Root mean square error (RMSE) [41]
5. Relative average spectral error (RASE) [42]
6. Spectral Angle Mapper (SAM) [43]
7. Erreur Relative Globale Adimensionnelle de Synthese (ERGAS) [44]
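Four of these indexes (CC, RMSE, SAM, and ERGAS) are simple enough to sketch in NumPy. The definitions below follow common usage and may differ in detail from the cited references (for example, SAM is reported here in degrees and CC is averaged over bands); the array layout `(n_bands, H, W)` and the resolution ratio of 4 between MS and PAN are assumptions based on the image sizes used in this paper.

```python
import numpy as np

def cc(ref, fused):
    """Pearson correlation coefficient, averaged over bands (ideal value 1)."""
    values = []
    for r, f in zip(ref, fused):
        r0, f0 = r - r.mean(), f - f.mean()
        values.append(np.sum(r0 * f0) / np.sqrt(np.sum(r0 ** 2) * np.sum(f0 ** 2)))
    return float(np.mean(values))

def rmse(ref, fused):
    """Root mean square error over all bands and pixels (ideal value 0)."""
    return float(np.sqrt(np.mean((ref - fused) ** 2)))

def sam(ref, fused, eps=1e-12):
    """Mean angle, in degrees, between per-pixel spectral vectors (ideal value 0)."""
    dot = np.sum(ref * fused, axis=0)
    norms = np.linalg.norm(ref, axis=0) * np.linalg.norm(fused, axis=0)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return float(np.degrees(angles.mean()))

def ergas(ref, fused, ratio=4):
    """100 / ratio * sqrt(mean over bands of (RMSE_k / mean_k)^2) (ideal value 0)."""
    band_rmse = np.sqrt(np.mean((ref - fused) ** 2, axis=(1, 2)))
    band_mean = ref.mean(axis=(1, 2))
    return float(100.0 / ratio * np.sqrt(np.mean((band_rmse / band_mean) ** 2)))

# Usage: compare a reference image with a slightly perturbed copy.
rng = np.random.default_rng(0)
reference = rng.random((4, 32, 32)) + 0.5
noisy = reference + 0.01 * rng.normal(size=reference.shape)
scores = {"CC": cc(reference, noisy), "RMSE": rmse(reference, noisy),
          "SAM": sam(reference, noisy), "ERGAS": ergas(reference, noisy)}
```

UIQI, Q4, and RASE are omitted from this sketch.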

To assess the quality of the fused images for the real datasets, D_s, D_λ, and QNR [45] were employed. The ideal value of each quality index is shown in parentheses in the tables.
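For reference, the QNR index is commonly defined by combining the two distortion indexes as shown below. The exponents α and β are weighting parameters that are often set to 1; the values used in this paper are not stated here, so this setting is an assumption.

$$ QNR=\left(1-D_{\lambda}\right)^{\alpha}\left(1-D_{s}\right)^{\beta} $$

Under this definition, QNR reaches its ideal value of 1 when both D_λ and D_s are 0.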

3.1 Parameter investigation

Here, we study the influence of the guided filter parameters, namely the window size r and the regularization parameter ζ, on the fusion of the degraded QuickBird-1 dataset. Figures 3, 4, and 5 illustrate the influence of these parameters, where the horizontal axis is the regularization parameter ζ for three cases of window size r and the vertical axis is the quality index value. As can be seen, the best performance is obtained by setting the parameters r and ζ to 8 and 0.8², respectively.

3.2 Fusion results of degraded datasets (full reference)

In this section, the simulations were carried out on degraded datasets that have the reference image to evaluate our proposed method according to Wald’s protocol [46]. Regarding the degraded datasets (QuickBird, GeoEye), the sizes of the MS image and the PAN image are 64 ×64 and 256 ×256, respectively. The descriptions of the experimental datasets are shown in Table 1.
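The reduced-resolution setting of Wald's protocol can be sketched as follows: both images are degraded by the resolution ratio, the degraded pair is fused, and the original MS image serves as the reference. The sketch uses block averaging as a simple stand-in for the low-pass filtering and decimation step, which is an assumption; the function name `degrade` is hypothetical, and the image sizes are taken from the text.

```python
import numpy as np

def degrade(img, ratio=4):
    """Reduce the spatial size by `ratio` using block averaging.

    Works on (H, W) or (n_bands, H, W) arrays; rows and columns that do not
    fill a complete block are cropped.
    """
    h, w = img.shape[-2:]
    h, w = h - h % ratio, w - w % ratio
    img = img[..., :h, :w]
    blocks = img.reshape(*img.shape[:-2], h // ratio, ratio, w // ratio, ratio)
    return blocks.mean(axis=(-3, -1))

# Usage: original 256x256 MS and 1024x1024 PAN become 64x64 and 256x256.
rng = np.random.default_rng(0)
ms = rng.random((4, 256, 256))
pan = rng.random((1024, 1024))
ms_low = degrade(ms)      # input MS for the full-reference experiment
pan_low = degrade(pan)    # input PAN for the full-reference experiment
reference = ms            # the fused result is compared against the original MS
```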

Table 1 Descriptions of the experimental datasets

3.2.1 Experiments on degraded QuickBird datasets

In this section, two pairs of QuickBird satellite datasets were examined; Fig. 6 illustrates the fusion results of the degraded QuickBird-1 dataset. For better comparison, the red square area is enlarged and then displayed at the bottom left of the fusion image. As can be observed, the methods shown in Fig. 6d–j produce inferior pansharpening results compared with the CAE and proposed methods.

Figure 6i–j suffer from spatial distortion. Figure 6m suffers from spatial and spectral distortions. The fusion result of the PNN method is depicted in Fig. 6n, which shows some unnatural colors compared with the reference image. Furthermore, the results of the CAE method (Fig. 6l) and the proposed method (Fig. 6o) look most similar to the reference image (Fig. 6a), but the proposed method performs better in terms of spectral and spatial fidelity. Similar observations can be made regarding the experimental results from the QuickBird-2 dataset. Figure 7 displays the fusion results of the degraded QuickBird-2 dataset. For better visual comparison, the red rectangle area is enlarged and then displayed at the bottom of the selected area; the proposed and CAE methods achieve better visual quality.

In terms of objective evaluation, the numerical indexes of the fused images in Figs. 6 and 7 are computed and reported in Tables 2 and 3, respectively. Both tables show that our method achieves the best values of the quality indexes.

Table 2 Numerical results of the full reference QuickBird-1 dataset

Table 3 Numerical results of the full reference QuickBird-2 dataset

3.2.2 Experiment on degraded GeoEye dataset

Figure 8 displays the fusion results of the degraded GeoEye-1 dataset. The red square area is enlarged and then displayed at the bottom left of the fusion image. As shown in Fig. 8f, PCA produced seedy color in the fused image, and Fig. 8f–h suffer from spectral distortion. Here, it can be seen that the SFIM, Indusion, and MTF-GLP methods perform well, as shown in Fig. 8i–k. We can also observe from Fig. 8l that the result of the CAE method has a color problem in the vegetation area compared with the reference image. The colors of the fused images for the MSGF and PNN methods show remarkable distortion, as shown in Fig. 8m, n. Overall, compared with the other methods, the proposed method produces a fused image with appropriate spectral and spatial resolution, as shown in Fig. 8o.

The numerical indexes of the fused images in Fig. 8 are computed and reported in Table 4. The table shows that our method achieves the best values for most of the quality indexes.

Table 4 Numerical results of the full reference GeoEye-1 dataset

3.3 Fusion results of real datasets (no reference)

For the real datasets, two kinds of data (QuickBird and GeoEye) were used, and the sizes of the MS image and the PAN image are 256 ×256 and 1024 ×1024, respectively.

3.3.1 Experiments on real QuickBird datasets

Two pairs of real QuickBird satellite datasets were examined; for better visual comparison, the red square area is enlarged and then displayed at the bottom left of the fusion image. Figure 9 displays the fusion results of real QuickBird-1 dataset.

All methods improve on the input MS image, but the CS-based methods and the CAE method suffer from spectral distortion, as shown in Fig. 9c, e, and k. The BDSD fusion method shows remarkable distortions. The SFIM, Indusion, and MTF-GLP methods achieve relatively better spectral results than the others, as shown in Fig. 9h–j. The MSGF method suffers from spatial distortion, as shown in Fig. 9l, and the colors of the fused image for the PNN method show remarkable distortions. The proposed method performs better than the others, as shown in Fig. 9o. Similar observations can be made regarding the experimental results from the real QuickBird-2 dataset. Figure 10 displays the fusion results of the real QuickBird-2 dataset. The CS-based methods suffer from spectral distortion, as shown in Fig. 10c, e. The BDSD fusion method shows remarkable distortions, as shown in Fig. 10e. The CAE method performs well in the spatial aspect but still has a lighter color in the vegetation area compared with the upsampled MS image, as shown in Fig. 10k.

The fusion results of the SFIM, Indusion, MTF-GLP, MSGF, PNN, and proposed methods are improved in both the spectral and spatial aspects.

The numerical measurements of real data fused images for Figs. 9 and 10 are computed and listed in Tables 5 and 6, respectively.

Table 5 Numerical results of the real QuickBird-1 dataset

Table 6 Numerical results of the real QuickBird-2 dataset

Table 5 shows that the proposed method achieves the best values in terms of D_λ and D_s. Likewise, our method achieves the best values in terms of D_λ and QNR, as reported in Table 6.

3.3.2 Experiment on real GeoEye dataset

Figure 11 displays the fusion results of the real GeoEye-1 dataset. The selected red square area is enlarged and then displayed at the bottom right of the fusion image for better visual comparison. The methods shown in Fig. 11c–e perform well in the spatial aspect but suffer from spectral distortion, and those in Fig. 11f–i and l suffer from notable spectral and spatial distortion. Here, it can be seen that the MTF-GLP, CAE, and proposed methods perform well, as shown in Fig. 11j, k, and o.

Overall, the proposed method created the fused image, with appropriate spectral and spatial resolution.

The numerical indexes of the fused images in Fig. 11 are computed and reported in Table 7. According to Table 7, the PNN method achieves the best value in terms of D_λ, followed by our method. Overall, our method still achieves the best values of the remaining quality indexes.

Table 7 Numerical results of the real GeoEye-1 dataset

4 Conclusion

In this paper, we have proposed a pansharpening technique based on a convolutional autoencoder with AIHS and a multi-scale guided filter. The proposed method first trains the convolutional autoencoder to learn the relationship between the panchromatic image and its degraded version. The trained network is then used to enhance the intensity component. Furthermore, the multi-scale guided filter is used to enhance the original panchromatic image. Several experiments were conducted, and their results were reported. The outcomes of this research are twofold. First, in terms of visual quality, the proposed method retains more of the spectral detail of the MS image and the spatial detail of the panchromatic image than existing fusion methods. Second, the quality indexes of our method show significant improvements over the compared methods. Overall, the model developed in this research preserves appropriate spatial and spectral characteristics in the fused image compared with the other methods, in both subjective and objective evaluations.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

PAN: Panchromatic image
MS: Multi-spectral image
CNN: Convolutional neural network
AIHS: Adaptive intensity-hue-saturation
CS: Component substitution
MRA: Multi-resolution analysis
PCA: Principal component analysis
GS: Gram-Schmidt
BT: Brovey transform
SFIM: Smoothing Filter-based Intensity Modulation
MTF-GLP: Generalized Laplacian pyramid
MSGF: Multi-scale guided filter
RCNN: Residual convolutional neural network
CAE: Convolutional autoencoder
GF: Guided filter
BDSD: Band-dependent spatial-detail
PRACS: Partial replacement adaptive CS
PNN: Pansharpening by convolutional neural networks
CC: Correlation coefficient
UIQI: Universal Image Quality Index
RMSE: Root mean square error
RASE: Relative average spectral error
SAM: Spectral Angle Mapper
ERGAS: Erreur Relative Globale Adimensionnelle de Synthese
D_s: Spatial distortion
D_λ: Spectral distortion
QNR: Quality with no reference

References

K. Zhang, M. Wang, S. Yang, L. Jiao, Spatial–spectral-graph-regularized low-rank tensor decomposition for multispectral and hyperspectral image fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.11(4), 1030–1040 (2018).
Article Google Scholar
A. Al Smadi, A. Abugabah, in Proceedings of the 2018 the 2nd International Conference on Video and Image Processing. Intelligent information systems and image processing: a novel pan-sharpening technique based on multiscale decomposition, (2018), pp. 208–212.
F. Zhang, K. Zhang, Superpixel guided structure sparsity for multispectral and hyperspectral image fusion over couple dictionary. Multimedia Tools Appl.79(7), 4949–4964 (2020).
Article Google Scholar
J. Xu, H. Zhao, P. Yin, D. Jia, G. Li, Remote sensing classification method of vegetation dynamics based on time series Landsat image: a case of opencast mining area in China. EURASIP J. Image Video Process.2018(1), 113 (2018).
Article Google Scholar
A. Alsmadi, S. Yang, K. Zhang, Pansharpening via deep guided filtering network. Int. J. Image Process. Vis. Commun.5:, 1–8 (2018).
Google Scholar
G. Vivone, L. Alparone, J. Chanussot, M. Dalla Mura, A. Garzelli, G. A. Licciardi, R. Restaino, L. Wald, A critical comparison among pansharpening algorithms. IEEE Trans. Geosci. Remote Sens.53(5), 2565–2586 (2014).
Article Google Scholar
L. Alparone, L. Wald, J. Chanussot, C. Thomas, P. Gamba, L. M. Bruce, Comparison of pansharpening algorithms: outcome of the 2006 GRS-S data-fusion contest. IEEE Trans. Geosci. Remote Sens.45(10), 3012–3021 (2007).
Article Google Scholar
A. Mookambiga, V. Gomathi, Comprehensive review on fusion techniques for spatial information enhancement in hyperspectral imagery. Multidim. Syst. Sign. Process.27(4), 863–889 (2016).
Article MathSciNet MATH Google Scholar
F. Palsson, J. R. Sveinsson, M. O. Ulfarsson, J. A. Benediktsson, in 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). Model based pansharpening method based on TV and MTF deblurring (IEEE, 2015), pp. 33–36.
W. Li, Y. Li, Q. Hu, L. Zhang, Model-based variational pansharpening method with fast generalized intensity–hue–saturation. J. Appl. Remote. Sens.13(3), 036513 (2019).
Article Google Scholar
T. -M. Tu, S. -C. Su, H. -C. Shyu, P. S. Huang, A new look at IHS-like image fusion methods. Inf. Fusion. 2(3), 177–186 (2001).
Article Google Scholar
P. Kwarteng, A. Chavez, Extracting spectral contrast in Landsat Thematic Mapper image data using selective principal component analysis. Photogramm. Eng. Remote Sens.55(1), 339–348 (1989).
Google Scholar
B. Aiazzi, S. Baronti, M. Selva, Improving component substitution pansharpening through multivariate regression of ms + pan data. IEEE Trans. Geosci. Remote Sens.45(10), 3230–3239 (2007).
Article Google Scholar
A. R. Gillespie, A. B. Kahle, R. E. Walker, Color enhancement of highly correlated images. II. Channel ratio and “chromaticity” transformation techniques. Remote Sens. Environ.22(3), 343–365 (1987).
Article Google Scholar
J. Liu, Smoothing filter-based intensity modulation: a spectral preserve image fusion technique for improving spatial details. Int. J. Remote Sens.21(18), 3461–3472 (2000).
Article Google Scholar
B. Aiazzi, L. Alparone, S. Baronti, A. Garzelli, M. Selva, MTF-tailored multiscale fusion of high-resolution MS and Pan imagery. Photogramm. Eng. Remote Sens.72(5), 591–596 (2006).
Article Google Scholar
M. M. Khan, J. Chanussot, L. Condat, A. Montanvert, Indusion: fusion of multispectral and panchromatic images using the induction scaling technique. IEEE Geosci. Remote Sens. Lett.5(1), 98–102 (2008).
Article Google Scholar
K. He, J. Sun, X. Tang, Guided image filtering. IEEE Trans. Pattern. Anal. Mach. Intell.35(6), 1397–1409 (2012).
Article Google Scholar
Y. Yang, W. Wan, S. Huang, F. Yuan, S. Yang, Y. Que, Remote sensing image fusion based on adaptive IHS and multiscale guided filter. IEEE Access. 4:, 4573–4582 (2016).
Article Google Scholar
W. Shi, S. Liu, F. Jiang, D. Zhao, Z. Tian, Anchored neighborhood deep network for single-image super-resolution. EURASIP J. Image Video Process. 2018(1), 34 (2018).
G. Scarpa, S. Vitale, D. Cozzolino, Target-adaptive CNN-based pansharpening. IEEE Trans. Geosci. Remote Sens. 56(9), 5443–5457 (2018).
S. Huang, J. Wu, Y. Yang, P. Lin, Multi-frame image super-resolution reconstruction based on spatial information weighted fields of experts. Multidim. Syst. Sign. Process. 31(1), 1–20 (2020).
S. Baghersalimi, B. Bozorgtabar, P. Schmid-Saugeon, H. K. Ekenel, J.-P. Thiran, Dermonet: densely linked convolutional neural network for efficient skin lesion segmentation. EURASIP J. Image Video Process. 2019(1), 71 (2019).
A. Mehmood, M. Maqsood, M. Bashir, Y. Shuyuan, A deep Siamese convolution neural network for multi-class classification of Alzheimer disease. Brain Sci. 10(2), 84 (2020).
Y. Wang, H. Bai, L. Zhao, Y. Zhao, Cascaded reconstruction network for compressive image sensing. EURASIP J. Image Video Process. 2018(1), 77 (2018).
Y. Rao, L. He, J. Zhu, in 2017 International Workshop on Remote Sensing with Intelligent Processing (RSIP). A residual convolutional neural network for pan-shaprening (IEEE, 2017), pp. 1–4.
W. Huang, L. Xiao, Z. Wei, H. Liu, S. Tang, A new pan-sharpening method with deep neural networks. IEEE Geosci. Remote Sens. Lett. 12(5), 1037–1041 (2015).
A. Azarang, H. E. Manoochehri, N. Kehtarnavaz, Convolutional autoencoder-based multispectral image fusion. IEEE Access. 7, 35673–35683 (2019).
S. Dolgikh, Spontaneous concept learning with deep autoencoder. Int. J. Comput. Intell. Syst. 12(1), 1–12 (2018).
W. Carper, T. Lillesand, R. Kiefer, The use of intensity-hue-saturation transformations for merging SPOT panchromatic and multispectral image data. Photogramm. Eng. Remote Sens. 56(4), 459–467 (1990).
S. Rahmani, M. Strait, D. Merkurjev, M. Moeller, T. Wittman, An adaptive IHS pan-sharpening method. IEEE Geosci. Remote Sens. Lett. 7(4), 746–750 (2010).
K. He, J. Sun, X. Tang, in European Conference on Computer Vision. Guided image filtering (Springer, 2010), pp. 1–14.
C. N. Ochotorena, Y. Yamashita, Anisotropic guided filtering. IEEE Trans. Image Process. 29, 1397–1412 (2019).
Y. Bengio, I. Goodfellow, A. Courville, Deep Learning, vol. 1 (MIT Press, Massachusetts, USA, 2017).
Y. Song, W. Wu, Z. Liu, X. Yang, K. Liu, W. Lu, An adaptive pansharpening method by using weighted least squares filter. IEEE Geosci. Remote Sens. Lett. 13(1), 18–22 (2015).
A. Garzelli, F. Nencini, L. Capobianco, Optimal MMSE pan sharpening of very high resolution multispectral images. IEEE Trans. Geosci. Remote Sens. 46(1), 228–236 (2007).
J. Choi, K. Yu, Y. Kim, A new adaptive component-substitution-based satellite image fusion by using partial replacement. IEEE Trans. Geosci. Remote Sens. 49(1), 295–309 (2010).
G. Masi, D. Cozzolino, L. Verdoliva, G. Scarpa, Pansharpening by convolutional neural networks. Remote Sens. 8(7), 594 (2016).
M. Imani, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 11(12), 4994–5004 (2018).
Z. Wang, A. C. Bovik, A universal image quality index. IEEE Signal Process. Lett. 9(3), 81–84 (2002).
P. Jagalingam, A. V. Hegde, A review of quality metrics for fused image. Aquat. Procedia. 4, 133–142 (2015).
P. Mhangara, W. Mapurisa, N. Mudau, Comparison of image fusion techniques using satellite pour l’Observation de la Terre (SPOT) 6 satellite imagery. Appl. Sci. 10(5), 1881 (2020).
G. P. Petropoulos, K. P. Vadrevu, C. Kalaitzidis, Spectral angle mapper and object-based classification combined with hyperspectral remote sensing imagery for obtaining land use/cover mapping in a Mediterranean region. Geocarto Int. 28(2), 114–129 (2013).
F. Palsson, J. R. Sveinsson, M. O. Ulfarsson, J. A. Benediktsson, Quantitative quality evaluation of pansharpened imagery: consistency versus synthesis. IEEE Trans. Geosci. Remote Sens. 54(3), 1247–1259 (2015).
L. Alparone, B. Aiazzi, S. Baronti, A. Garzelli, F. Nencini, M. Selva, Multispectral and panchromatic data fusion assessment without reference. Photogramm. Eng. Remote Sens. 74(2), 193–200 (2008).
T. Ranchin, B. Aiazzi, L. Alparone, S. Baronti, L. Wald, Image fusion–the ARSIS concept and some successful implementation schemes. ISPRS J. Photogramm. Remote. Sens. 58(1-2), 4–18 (2003).

Acknowledgements

No other acknowledgments.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 61771380, 61906145, U1730109, 91438103, 61771376, 61703328, 91438201, U1701267, 61703328), the Equipment Pre-research Project of the 13th Five-Year Plan (Nos. 6140137050206, 414120101026, 6140312010103, 6141A020223, 6141B06160301, 6141B07090102), the Major Research Plan in Shaanxi Province of China (Nos. 2017ZDXM-GY-103, 017ZDCXL-GY-03-02), the Foundation of the State Key Laboratory of CEMEE (Nos. 2017K0202B, 2018K0101B, 2019K0203B, 2019Z0101B), and the Science Basis Research Program in Shaanxi Province of China (Nos. 16JK1823, 2017JM6086, 2019JQ-663).

Author information

Authors and Affiliations

School of Artificial Intelligence, Xidian University, No. 2 South Taibai Road, Xian, 710071, China
Ahmad AL Smadi, Shuyuan Yang, Atif Mehmood & Ala Alsanabani
School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, Shandong, China
Zhang Kai
Key Laboratory of Radar Signal Processing, Xidian University, No. 2 South Taibai Road, Xian, China
Min Wang

Contributions

AAL and YS conceptualized and carried out the implementation; AAL, KZ, and AM wrote and reviewed the paper; AS and MW were in charge of the overall research and contributed to the paper writing; YS contributed to funding acquisition. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Shuyuan Yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

AL Smadi, A., Yang, S., Kai, Z. et al. Pansharpening based on convolutional autoencoder and multi-scale guided filter. J Image Video Proc. 2021, 25 (2021). https://doi.org/10.1186/s13640-021-00565-3

Received: 20 October 2020
Accepted: 10 June 2021
Published: 19 July 2021
DOI: https://doi.org/10.1186/s13640-021-00565-3
