Cascaded reconstruction network for compressive image sensing

The theory of compressed sensing (CS) has been successfully applied to image compression in the past few years, but its traditional iterative reconstruction algorithms are time-consuming. Fortunately, it has been reported that deep learning-based CS reconstruction algorithms can greatly reduce the computational complexity. In this paper, we propose two efficient cascaded reconstruction network structures corresponding to two different sampling methods in the CS process. The first is a compatibly sampling reconstruction network (CSRNet), which recovers an image from its compressively sensed measurement sampled by a traditional random matrix. In CSRNet, a deep reconstruction network module obtains an initial image of acceptable quality, which is further improved by a residual reconstruction network module based on a convolutional neural network. The second is an adaptively sampling reconstruction network (ASRNet), which matches an automatically learned sampling module with a corresponding residual reconstruction module. The experimental results show that the two proposed reconstruction networks outperform several state-of-the-art compressive sensing reconstruction algorithms. Moreover, ASRNet achieves a gain of more than 1 dB over CSRNet.


Introduction
In the traditional Nyquist sampling theory, the sampling rate must be at least twice the signal bandwidth in order to reconstruct the original signal losslessly. In contrast, compressive sensing (CS) is a signal acquisition paradigm that samples a signal at sub-Nyquist rates while still enabling high-quality recovery [1]. Later, Gan et al. proposed block compressed sensing to reduce computational complexity and avoid applying CS directly to large images [2]. Owing to its excellent sampling performance, CS has been widely used in many fields, such as communication and signal processing.
In the past decades, CS theory has advanced considerably, especially in the development of reconstruction algorithms [3][4][5][6][7][8][9][10]. Compressive sensing reconstruction aims to recover the original signal x ∈ R^(n×1) from the compressive sensing measurement y ∈ R^(m×1) (m ≪ n). The CS measurement is obtained by y = Φx, where Φ ∈ R^(m×n) is a CS measurement matrix. The reconstruction process is highly ill-posed, because more than one solution x ∈ R^(n×1) can generate the same CS measurement y. To address this problem, early reconstruction algorithms typically assume that the original image signal is sparse in the l_p-norm sense. Based on this assumption, several iterative reconstruction algorithms have been explored, such as orthogonal matching pursuit (OMP) [3] and approximate message passing (AMP) [4]. Notably, an extension of AMP, denoising-based AMP (D-AMP) [5], employs denoising algorithms for CS recovery and achieves high performance on natural images. Furthermore, many works incorporate prior knowledge of the original image signal, such as the total variation sparsity prior [6] and KSVD [7], into the CS recovery framework, which improves reconstruction performance. In particular, TVAL3 [8] combines the augmented Lagrangian method with total variation regularization and is also an effective CS image reconstruction algorithm. However, almost all of these reconstruction algorithms require solving an optimization problem, and most need hundreds of iterations, which inevitably leads to high computational complexity. In recent years, deep learning-based methods have been introduced into low-level vision problems with excellent performance, such as image super-resolution [11, 12], image artifact removal [13], and CS image reconstruction [14][15][16][17]. (*Correspondence: hhbai@bjtu.edu.cn, Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.)
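To make the measurement model y = Φx and its ill-posedness concrete, here is a minimal NumPy sketch. The dimensions (n = 1024, m = 256, i.e., a 25% measurement rate) are illustrative choices, not values fixed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 1024, 256          # signal length and number of measurements (m << n)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # random Gaussian measurement matrix

x = rng.standard_normal(n)      # stand-in for a vectorized image block
y = Phi @ x                     # CS measurement y = Phi x

# The system is underdetermined: infinitely many x' satisfy Phi x' = y,
# so a naive minimum-norm inverse does not recover x without a sparsity prior.
x_pinv = np.linalg.pinv(Phi) @ y
print(np.allclose(x_pinv, x))   # False
```

This is exactly the gap that iterative algorithms (OMP, AMP) and the learned networks discussed below try to close by exploiting image structure.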
Recently, several deep network-based algorithms for CS image reconstruction have been proposed. ReconNet [14] takes the CS measurement of an image patch as input and outputs the corresponding image reconstruction. For patch-based CS measurements, ReconNet, inspired by SRCNN [11], can retain rich semantic content at low measurement rates compared with traditional methods. In [15], a framework is proposed to recover images from CS measurements without dividing images into small blocks, but its reconstruction performance offers no competitive advantage over other algorithms. In [16, 17], residual convolutional neural networks are introduced into CS image reconstruction; they preserve information from previous layers, improve the convergence rate, and accelerate training. Unlike optimization-based CS recovery methods, neural network-based methods directly learn the inverse mapping from the CS measurement domain to the original image domain. As a result, they effectively avoid expensive computation and achieve promising reconstruction performance.
In this paper, two different cascaded reconstruction networks are proposed to match different sampling methods. First, we propose a compatibly sampling reconstruction network (CSRNet), which reconstructs high-quality images from compressively sensed measurements sampled by a random sampling matrix. In CSRNet, a deep reconstruction network module obtains an initial image of acceptable quality, which is further improved by a residual network module based on a convolutional neural network. Second, to improve the sampling efficiency of CS, an automatically learned sampling module is designed, which uses a fully connected layer to learn a sampling matrix. In addition, a residual reconstruction module is presented to match the sampling module. Together, the sampling module and its matching residual reconstruction module form a complete compressive sensing image reconstruction network, named ASRNet. Compared with CSRNet, ASRNet achieves a gain of more than 1 dB. The experimental results demonstrate that the proposed networks outperform several state-of-the-art iterative reconstruction algorithms and deep learning-based approaches in both objective and subjective quality.
The rest of this paper is organized as follows. In Section 2, two novel networks are proposed for different sampling methods. In Section 3, the performance of the proposed networks is examined. We conclude the paper in Section 4.

The methods of proposed networks
In this section, we describe the two proposed networks, CSRNet (Fig. 1) and ASRNet (Fig. 4). The first network, CSRNet, is designed to reconstruct an image from a CS measurement sampled by a random matrix. The second, ASRNet, is a complete compressive sensing image reconstruction network consisting of both a sampling module and a reconstruction module. Here, our sampling module contains only one fully connected layer (FC), which is powerful enough to imitate the traditional block-CS sampling process.

CSRNet
Our proposed CSRNet consists of three modules: an initial reconstruction module, a deep reconstruction module, and a residual reconstruction module. The initial reconstruction module takes the CS measurement y as input and outputs a B×B preliminary reconstructed image. As shown in Fig. 1, the deep reconstruction module takes the preliminary reconstructed image as input and outputs an image of the same size. The deep reconstruction module contains three convolutional layers, as shown in Fig. 2. The first layer generates 64 feature maps with 11×11 kernels. The second layer uses 1×1 kernels and generates 32 feature maps. The third layer produces one feature map with a 7×7 kernel, which is the output of this module. All convolutional layers use a stride of 1 without pooling, and appropriate zero padding keeps the feature map size constant in all layers. Each convolutional layer except the last is followed by a ReLU layer. Here, the deep reconstruction module obtains an initial image of acceptable quality, which is more suitable for the residual network module than a cascaded residual network module [16]. The residual reconstruction network has an architecture similar to the deep reconstruction network, as shown in Fig. 3, and learns the residual between the input data and the ground truth. In our model, we set B = 32.
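The three-layer deep reconstruction module described above can be sketched in PyTorch as follows. This is a minimal illustration of the stated layer specification (11×11/64, 1×1/32, 7×7/1, stride 1, zero padding, ReLU after all but the last layer); the class name and padding values are our own choices, with padding derived from the kernel sizes to keep the B×B spatial size:

```python
import torch
import torch.nn as nn

class DeepReconstruction(nn.Module):
    """Three-conv-layer module: 11x11/64 -> 1x1/32 -> 7x7/1.

    Stride 1, no pooling; zero padding of (kernel-1)/2 keeps the
    feature maps at B x B throughout. ReLU follows every layer
    except the last one.
    """
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=11, padding=5),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=7, padding=3),
        )

    def forward(self, x):
        return self.body(x)

net = DeepReconstruction()
patch = torch.randn(1, 1, 32, 32)   # a B = 32 preliminary reconstruction
out = net(patch)
print(out.shape)                    # torch.Size([1, 1, 32, 32])
```

The residual reconstruction module of Fig. 3 would follow the same pattern, with its output added to its input rather than used directly.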
In order to train our CSRNet, we need CS measurements corresponding to each of the extracted patches. For a given measurement rate, we construct a measurement matrix Φ_B by first generating a random Gaussian matrix of the appropriate size and then orthonormalizing its rows. We then apply y_i = Φ_B × x_vec-i to obtain the set of CS measurements, where x_vec-i is the vectorized version of an image patch x_i. Thus, an input-label pair in the training set can be represented as {y_i, x_i}. The loss function is the average reconstruction error over all training image blocks, given by

L({W_1, W_2, W_3}) = (1/N) Σ_{i=1}^{N} ‖f_3(f_2(f_1(y_i; W_1); W_2); W_3) − x_i‖²,

where N is the total number of image patches in the training dataset, x_i is the ith patch, and y_i is the corresponding CS measurement. The initial reconstruction mapping, the deep reconstruction mapping, and the residual reconstruction mapping are represented as f_1, f_2, and f_3, respectively, and {W_1, W_2, W_3} are the network parameters obtained during training.
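The construction of Φ_B (a random Gaussian matrix with orthonormalized rows) can be sketched with NumPy. The function name is our own; the orthonormalization here uses a QR decomposition of the transpose, one standard way to orthonormalize rows:

```python
import numpy as np

def make_measurement_matrix(block_size=32, mr=0.25, seed=0):
    """Random Gaussian measurement matrix Phi_B with orthonormal rows.

    block_size: B, so each vectorized patch has n = B*B entries.
    mr: measurement rate, giving m = round(mr * n) rows.
    """
    n = block_size * block_size
    m = int(round(mr * n))
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((m, n))
    # QR on the transpose yields orthonormal columns; transposing back
    # gives a matrix whose rows are orthonormal.
    q, _ = np.linalg.qr(g.T)
    return q.T

phi = make_measurement_matrix()
print(phi.shape)                              # (256, 1024) at MR = 0.25, B = 32
print(np.allclose(phi @ phi.T, np.eye(256)))  # True: rows are orthonormal
```

Each training measurement is then simply `y = phi @ x_vec` for a vectorized 32 × 32 patch `x_vec`.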

ASRNet
Our proposed ASRNet contains three modules: a sampling module, an initial reconstruction module, and a residual reconstruction module, as shown in Fig. 4. In the sampling module, we use a fully connected layer to imitate the traditional compressed sampling process, which is expressed as y_i = Φ_B x_i in traditional block-CS. If the image is divided into B×B blocks, the input of the fully connected layer is a B²×1 vector. For a sampling ratio α, we obtain n_B = B² × α sampling measurements. The initial reconstruction module and the residual reconstruction module are matched to the sampling module. The initial reconstruction module takes the sampling measurements as input and outputs a B×B preliminary reconstructed image. Similar to the sampling module, we also use a fully connected layer to imitate the traditional initial reconstruction process, which can be expressed as x̃_i = Φ̃_B × y_i. In our design, Φ̃_B is learned automatically instead of being computed by complicated MMSE linear estimation. The residual reconstruction module is similar to the one in CSRNet, shown in Fig. 3, and its output is the final output of the network. Given an original image x_i, our goal is to obtain the highly compressed measurement y_i with the compressed sampling module and then accurately recover the original image from it with the reconstruction modules. Since the sampling module, the initial reconstruction module, and the residual reconstruction module form an end-to-end network, they can be trained together without any concern for what the compressed measurement is during training. Therefore, both the input and the label for training our ASRNet are the image itself.
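The two fully connected layers described above, the learned sampling matrix Φ_B and the learned initial reconstruction matrix Φ̃_B, can be sketched in PyTorch. The class name and the bias-free choice are our own illustrative assumptions; what matters is the layer sizes, which follow directly from B and α:

```python
import torch
import torch.nn as nn

B = 32                      # block size
alpha = 0.25                # sampling ratio
n_B = int(B * B * alpha)    # number of measurements per block: 256

class SampleAndInitRecon(nn.Module):
    """FC sampling layer (learned Phi_B) followed by the FC initial
    reconstruction layer (learned tilde-Phi_B)."""
    def __init__(self):
        super().__init__()
        self.sample = nn.Linear(B * B, n_B, bias=False)      # y = Phi_B x
        self.init_recon = nn.Linear(n_B, B * B, bias=False)  # x~ = Phi~_B y

    def forward(self, blocks):           # blocks: (N, B*B) vectorized patches
        y = self.sample(blocks)          # compressed measurements
        x0 = self.init_recon(y)          # preliminary reconstruction
        return x0.view(-1, 1, B, B)      # reshape back to B x B images

net = SampleAndInitRecon()
x = torch.randn(4, B * B)
print(net(x).shape)          # torch.Size([4, 1, 32, 32])
```

The residual reconstruction module (Fig. 3) would then refine this B×B preliminary output.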
Following most deep learning-based image restoration methods, we use the average reconstruction error as the loss function:

L({W_4, W_5, W_6}) = (1/N) Σ_{i=1}^{N} ‖f_6(f_5(f_4(x_i; W_4); W_5); W_6) − x_i‖²,

where {W_4, W_5, W_6} are the network parameters to be trained, f_4 is the sampling mapping, and f_5 and f_6 correspond to the initial reconstruction mapping and the residual reconstruction mapping, respectively. It should be noted that we train the compressed sampling network and the reconstruction network together, but they can be used independently.
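Because the image block is both the input and the label, a training step needs no explicit measurement matrix in the loss. The following minimal sketch illustrates this end-to-end step; the two-layer `model` is only a hypothetical stand-in for the full sampling-plus-reconstruction pipeline, and the optimizer settings are illustrative:

```python
import torch
import torch.nn as nn

# Stand-in for the full f_6(f_5(f_4(x))) pipeline: an FC sampling layer
# followed by an FC reconstruction layer (residual module omitted here).
model = nn.Sequential(nn.Linear(1024, 256, bias=False),
                      nn.Linear(256, 1024, bias=False))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(8, 1024)     # batch of vectorized 32x32 blocks
loss = loss_fn(model(x), x)  # the label is the input image itself
opt.zero_grad()
loss.backward()              # gradients flow through sampling AND reconstruction
opt.step()
```

After training, `model[0]` (the learned sampling) could be deployed at the sensor while the reconstruction layers run at the decoder, which is what "used independently" means here.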

Results and discussion
In this section, we evaluate the performance of the proposed methods for CS reconstruction. We first introduce the training and testing details, and then present quantitative and qualitative comparisons with four state-of-the-art methods.

Training
The training dataset is the set of 91 images used in [14], and Set5 from [14] serves as our validation set. We use only the luminance component of the images. We uniformly extract 32 × 32 patches from these images, with a stride of 14 for training and 21 for validation, forming a training dataset of 22,144 patches and a validation dataset of 1112 patches. Both CSRNet and ASRNet use the same dataset. We train the proposed networks at measurement rates (MR) of 0.25, 0.10, 0.04, and 0.01. Caffe is used to train the proposed models.
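The patch extraction step can be sketched in a few lines of NumPy. The function name is our own, and the 128 × 128 test image is illustrative; the extraction logic (uniform grid with a configurable stride) matches the description above:

```python
import numpy as np

def extract_patches(img, patch=32, stride=14):
    """Uniformly extract patch x patch crops from a 2-D luminance image."""
    h, w = img.shape
    patches = [img[r:r + patch, c:c + patch]
               for r in range(0, h - patch + 1, stride)
               for c in range(0, w - patch + 1, stride)]
    return np.stack(patches)

img = np.random.rand(128, 128)       # stand-in luminance image
p = extract_patches(img)             # 7 x 7 grid of positions
print(p.shape)                       # (49, 32, 32)
```

Using stride 14 for training and 21 for validation simply changes the grid density, trading dataset size against patch overlap.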

Objective quality comparisons
Our proposed algorithms are compared with four representative CS recovery methods: TVAL3 [8], D-AMP [5], ReconNet [14], and DR²-Net [16]. The first two are traditional optimization-based methods, while the last two are recent network-based methods. For the simulated data in our experiments, we evaluate the proposed methods on the same test images as in [14], which consist of 11 grayscale images; nine images are 256 × 256 and two are 512 × 512. We compute the PSNR for all 11 images, and the results are shown in Table 1. We use BM3D [18] as the denoiser to remove the artifacts caused by patch processing. It is clear that ASRNet achieves a gain of more than 1 dB over CSRNet. We also add an SSIM comparison between our proposed networks and the network-based methods, ReconNet and DR²-Net, as shown in Table 2. From Tables 1 and 2, it can be seen that our proposed CSRNet and ASRNet outperform the other methods.
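For reference, the PSNR metric used in Tables 1 and 3 is the standard definition below; the function name and the uniform-error example are our own illustration, assuming 8-bit images (peak value 255):

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between reference and reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((256, 256), 128.0)
rec = ref + 1.0                   # uniform error of one gray level -> MSE = 1
print(round(psnr(ref, rec), 2))   # 48.13
```

On this scale, the reported 1 dB gain of ASRNet over CSRNet corresponds to roughly a 20% reduction in mean squared error.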

Time complexity
Time complexity is a key factor in image compressive sensing. In reconstruction, network-based algorithms are much faster than traditional iterative methods, so we only compare time complexity with ReconNet and DR²-Net. Table 3 shows the average time these network-based methods take to reconstruct the nine 256 × 256 images. From Tables 1, 2, and 3, we observe that the proposed CSRNet and ASRNet outperform ReconNet and DR²-Net in terms of PSNR, SSIM, and time complexity. Our ASRNet obtains the best performance in all objective quality assessments. Notably, ASRNet runs fastest, which is very important for real-time applications.

Visual quality comparisons
Our proposed algorithms are visually compared with four representative CS recovery methods: TVAL3, D-AMP, ReconNet, and DR²-Net. Figures 5 and 6 show visual comparisons on Parrots at a measurement rate of 0.1, with and without BM3D, respectively. It is evident that the proposed CSRNet and ASRNet reconstruct more details and sharper edges, offering better visual reconstruction results than the other network-based algorithms. The other three groups are shown in Figs. 7, 8

Evaluation on our proposed architectures
To verify the innovation and rationality of our network architectures in more detail, we add a comparison between the intermediate outputs and the final outputs of our methods in both objective and subjective quality. In addition to the four measurement rates above, we train CSRNet and ASRNet models at MR = 0.2 and 0.15. We calculate the mean PSNR and SSIM values over all 11 test images at each measurement rate.

Conclusion
In this paper, two cascaded reconstruction networks are proposed for different CS sampling methods. In most previous works, the sampling matrix in the CS process is a random matrix; accordingly, the first network is a compatibly sampling reconstruction network (CSRNet), which reconstructs a high-quality image from its compressively sensed measurement sampled by a traditional random matrix. The second network is an adaptively sampling reconstruction network (ASRNet), which matches an automatically learned sampling module with a corresponding residual reconstruction module; this sampling module effectively addresses the problem of sampling efficiency in compressive sensing. Experimental results show that the proposed networks, CSRNet and ASRNet, achieve significant improvements over traditional and neural network-based CS reconstruction algorithms in terms of both quality and time complexity. Furthermore, ASRNet achieves a gain of more than 1 dB over CSRNet.