Performing scalable lossy compression on pixel encrypted images

Compression of encrypted data has drawn much attention in recent years due to security concerns in service-oriented environments such as cloud computing. We propose a scalable lossy compression scheme for images whose pixel values are encrypted with a standard stream cipher. The encrypted data are compressed simply by transmitting a uniformly subsampled portion of the encrypted data together with some bit planes of another uniformly subsampled portion. At the receiver side, a decoder performs content-adaptive interpolation based on the decrypted partial information, where the received bit plane information serves as side information reflecting the image edges, making the image reconstruction more precise. The more bit planes are transmitted, the higher the quality of the decompressed image. The experimental results show that our proposed scheme performs much better than the existing lossy compression scheme for pixel-value encrypted images and comparably to the state-of-the-art lossy compression for pixel permutation-based encrypted images. In addition, our proposed scheme has the following advantages: at the decoder side, no computationally intensive iteration and no additional public orthogonal matrix are needed, and it works well for both smooth and texture-rich images.


Introduction
Compression of encrypted data has drawn much attention in recent years due to security concerns in service-oriented environments such as cloud computing [1,2]. The traditional way of transmitting redundant data securely and efficiently is to first compress the data to reduce redundancy and then encrypt the compressed data. At the receiver side, decryption is performed prior to decompression. However, in some application scenarios (e.g., sensor networks), a sender may first encrypt the data with a simple cipher and then send them to a network provider. The network provider has an interest in reducing the transmission rate, so it is desirable to be able to compress the encrypted data without the key, avoiding additional security concerns. At the receiver side, joint decryption and decompression are used to reconstruct the original data.
It has been proved in [1] that the overall performance of such an approach can be as good as that of the conventional approach; that is, neither security nor compression efficiency is sacrificed by performing compression in the encrypted domain. Two practical approaches, one for lossless compression of encrypted binary images and one for lossy compression of an encrypted Gaussian sequence, are also presented in [1]. In the first approach, the original binary image is encrypted by adding a pseudorandom string, and the encrypted data are compressed by finding their syndromes with respect to a low-density parity-check (LDPC) code [3]. In the second approach, the original data are encrypted by adding an i.i.d. Gaussian sequence, and the encrypted data are quantized and compressed as the syndromes of a trellis code. In [4], compression of encrypted data from both memoryless sources and sources with hidden Markov correlation using LDPC codes is studied. A study [5] introduces several methods for lossless compression of encrypted grayscale and color images by applying LDPC codes to the various bit planes and exploiting the spatial and cross-plane correlation among pixels. In [6], Liu et al. proposed to decompose the encrypted image in a progressive manner, with the most significant bits in the higher levels compressed using rate-compatible punctured turbo codes. The decoder observes a low-resolution version of the image, studies the local statistics based on it, and uses these statistics to estimate the content in the higher levels.
Another study [7] presents algorithms for compressing encrypted data and demonstrates blind compression of encrypted video by developing statistical models for the source data and extending these models to video. All of the works mentioned above use the distributed source coding (DSC) technique. However, frequent backward-channel communication is needed for the joint decryption and decoding at the receiver, so large delays may be a concern. DSC-based methods may therefore not be a desirable choice in some practical network transmission scenarios.
There are a few works on lossy compression of encrypted data. In [8], the authors introduce a compressive sensing technique to achieve lossy compression of encrypted image data, and a basis pursuit algorithm is modified appropriately to enable joint decompression and decryption. In the state-of-the-art work [2], lossy compression with iterative reconstruction is proposed for permutation-based encrypted images. However, permutation-based encryption only permutes the pixel positions without masking the pixel values, so the histogram of the encrypted image remains identical to that of the original image, revealing significant information. Moreover, the iterative reconstruction may struggle to converge for texture-rich images, and an additional public orthogonal matrix of huge size is needed for decompression at the receiver side: if the to-be-compressed image is of size 512×512, the public orthogonal matrix is about (512×512)×(512×512), and each target rate requires a distinct matrix. Such a huge public orthogonal matrix results in a huge storage requirement and computational load. Note that such a public orthogonal matrix cannot be used in the compression of pixel-value encrypted images. Another lossy compression scheme with iterative reconstruction for encrypted images is proposed in the state-of-the-art work [9], where the image is encrypted by modulo-256 addition of pseudorandom numbers to the original pixel values. That scheme is scalable and performs very well with iteration.
In this paper, we propose a scalable lossy compression scheme for images whose pixel values are encrypted with a standard stream cipher. At the receiver side, a decoder performs content-adaptive interpolation based on the decrypted partial information, with the received bit plane information serving as side information to facilitate accurate image reconstruction. The experimental results show that our proposed scheme performs much better than the existing lossy compression scheme for pixel-value encrypted images and comparably to the state-of-the-art lossy compression for permutation-based encrypted images. The rest of this paper is organized as follows: the 'The proposed scalable compression scheme' section describes the proposed compression scheme for encrypted images in detail, the 'Experimental results' section presents the experimental results with comparisons to the state-of-the-art works, and conclusions are drawn in the 'Conclusions' section.

The proposed scalable compression scheme
We assume the images have been encrypted by applying a standard stream cipher to the pixel values in the spatial domain. Even though the pixel values have been encrypted, the resulting encrypted data preserve some inherent properties of the original image, e.g., the spatial relationship of the pixels, the bit plane structure, and the relative importance of the bit planes. This leads us to adopt a multi-resolution, bit plane-based scalable approach to compression. The basic idea is to package and transmit a downsampled version of the encrypted image as the base layer and then selectively transmit additional bit plane information from another downsampled version (with a different spatial offset) of the encrypted image to facilitate the interpolation/reconstruction of the higher-resolution image at the receiver. This process can be applied recursively in a multi-layer structure, resulting in an embedded, compressed, and encrypted bitstream that can be cut off flexibly to meet a target bit rate constraint without the complex communication/negotiation between the encoder and decoder required in some prior DSC-based work, e.g., [1,3-7]. In the following, we describe our proposed scheme in a two-layer scenario.
Suppose the size of an original 8-bit grayscale image is N1×N2. It is encrypted with a standard stream cipher, resulting in an encrypted image E.
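The pixel-value encryption can be sketched as a bytewise XOR with a keyed pseudorandom stream. The sketch below is illustrative: a real deployment would use a standard stream cipher (e.g., AES in CTR mode), and the seeded NumPy generator, the function names, and the integer key are our assumptions.

```python
import numpy as np

def encrypt_pixels(image, key):
    """XOR every 8-bit pixel with a keyed pseudorandom byte stream.

    NumPy's seeded generator stands in for a real stream cipher's
    keystream here (an illustrative assumption, not the paper's cipher).
    """
    rng = np.random.default_rng(key)
    keystream = rng.integers(0, 256, size=image.shape, dtype=np.uint8)
    return image ^ keystream  # E = I XOR keystream

def decrypt_pixels(encrypted, key):
    # XOR with the same keystream inverts the encryption
    rng = np.random.default_rng(key)
    keystream = rng.integers(0, 256, size=encrypted.shape, dtype=np.uint8)
    return encrypted ^ keystream
```

Since XOR is an involution, decrypting with the same key recovers the original pixels exactly.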
To compress, we downsample the encrypted image by a factor of two in both dimensions and generate four sub-images, denoted E00, E01, E10, and E11, where the first digit '1' (or '0') denotes that the horizontal offset for downsampling is 1 (or 0), and the second digit denotes the vertical offset. As shown in Figure 1, each icon represents a pixel, and the four icons distinguish the four sub-images after downsampling. After decryption and decompression, they are denoted as the 00, 01, 10, and 11 sub-images, respectively.
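The polyphase split above can be sketched with strided slicing (the function name is ours, and mapping array rows to the vertical axis is a layout assumption):

```python
import numpy as np

def split_subimages(E):
    """Split an encrypted image into the four polyphase sub-images.

    Names follow the paper's convention: first digit = horizontal
    offset, second digit = vertical offset. We assume array axis 0
    is vertical (rows) and axis 1 is horizontal (columns).
    """
    return {
        "00": E[0::2, 0::2],
        "01": E[1::2, 0::2],  # horizontal offset 0, vertical offset 1
        "10": E[0::2, 1::2],  # horizontal offset 1, vertical offset 0
        "11": E[1::2, 1::2],
    }
```

Each sub-image has one quarter of the pixels of E, and the four together tile E exactly.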
The uncompressed E00 sub-image is transmitted to the decoder. Some of the E11 sub-image's bit planes are transmitted too, according to the target bit rate. The target bit rate R per information source bit is

R = [(N1/2)(N2/2) × 8 + (N1/2)(N2/2) × N] / (N1 × N2 × 8) = (8 + N)/32, (1)

where N is the number of bit planes of sub-image E11 to be transmitted. We predict the 11 sub-image from the 00 sub-image with the context-adaptive interpolation (CAI) scheme proposed in [6]. In this work, we propose to use the received bit plane values of sub-image 11 as side information to facilitate the estimation of the image edge information in the context-adaptive interpolation, thus improving the prediction. For the 10 sub-image, there are also four neighboring pixels in the 00 and 11 sub-images, as shown in the bottom right of Figure 1. So, once the receiver obtains the 00 and 11 sub-images, the 10 sub-image (and the 01 sub-image) can be predicted by the conventional CAI (please refer to [6] for a detailed description of the conventional CAI). In the following, we only present the improved CAI prediction of the 11 sub-image with the received bit plane information as side information.
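The rate computation and the bit-plane selection described above can be sketched as follows (the helper names are ours; that the N most significant planes, b7 down to b(8-N), are the ones transmitted follows the N = 2 example given later, which uses b7 and b6):

```python
def rate_per_source_bit(N):
    """Bits sent per original-image bit: the full 8-bit E00 sub-image
    (1/4 of the pixels) plus N bit planes of E11 (another 1/4 of the
    pixels), over 8 bits per original pixel: (8 + N) / 32."""
    return (8 + N) / 32.0

def top_bitplanes(pixel, N):
    """Decimal value of the N most significant bit planes
    (b7 ... b(8-N)) of an 8-bit pixel value."""
    return pixel >> (8 - N)
```

For N = 0 (only E00 sent) the rate is 0.25, and for N = 6 it is about 0.44, matching the rate range reported in the experiments.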
Let 0 be a pixel in the 11 sub-image which is to be predicted, and let t = [t1, t2, t3, t4] be the vector of its four neighboring pixels (please refer to the top left of Figure 1). The preliminary prediction pred0 of pixel 0 is computed with CAI [6], which classifies the local region into four types: smooth, horizontally edged, vertically edged, and other median-related edge, and interpolates accordingly. With the received bit plane values of sub-image 11, we match the corresponding bit plane values of pred0 against the received ones. If they match, we accept the preliminary prediction value; otherwise, we search for a better-matching prediction along image edge directions other than the four local types considered by CAI.
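A sketch of the four-way CAI classification is given below. The threshold value and the exact case conditions are our illustrative assumptions, not the precise rule of [6]; only the four named region types come from the text.

```python
def cai_predict(t):
    """Context-adaptive interpolation sketch: predict a 11-sub-image
    pixel from its four neighbours t = [t1, t2, t3, t4].

    The threshold T and the case conditions are illustrative
    assumptions; see [6] for the exact CAI rule.
    """
    T = 20  # assumed smoothness/edge threshold
    t1, t2, t3, t4 = t
    if max(t) - min(t) <= T:                    # smooth region
        return round(sum(t) / 4)
    if abs(t1 - t3) - abs(t2 - t4) > T:         # strong edge along t2-t4
        return round((t2 + t4) / 2)
    if abs(t2 - t4) - abs(t1 - t3) > T:         # strong edge along t1-t3
        return round((t1 + t3) / 2)
    # remaining "median-related" case: mean of the two middle values
    return round((sum(t) - max(t) - min(t)) / 2)
```

In a smooth region the predictor averages all four neighbours; across a strong edge it averages only the neighbour pair lying along the edge.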
The decoder receives the E00 sub-image and some bit planes of the E11 sub-image. After decryption, the decoder obtains the 00 sub-image and some bit planes of the 11 sub-image. We denote by w the decimal value of the bit planes which are transmitted and decrypted. Take N = 2 for example: b7b6 of the 11 sub-image is considered, and if b7b6 = (10) in binary, then w = 2. Let Δ be the step size corresponding to the most significant bit plane of the side information; in this paper, we adopt Δ = 2^7 when N < 8. With M = 2^N, define the matching distance d as

d = mod(w − v, M),

where mod( ) is the modulo operation and v is the decimal value of the same bit planes of pred0. As b7b6 is known, the distance d between w and v can be used to judge whether pred0 matches well. If the circular distance min(|d|, M − |d|) is large, i.e., exceeds a preset threshold, we consider that pred0 does not match well. Then, two other prediction values pred1 and pred2, computed from sum(t), max(t), and min(t) and corresponding to other image edge directions, compete with the preliminary prediction value pred0 for the best match with the side information; here, sum( ) denotes the summation operation, and max( ) and min( ) denote taking the maximum and minimum, respectively. We find the best match by seeking the minimum value of min(|d|, M − |d|) among the three prediction values pred0, pred1, and pred2, and obtain the final best-matching prediction pred. Finally, with the side information, the corresponding prediction value r can be refined to

r = floor(pred/Δ) × Δ + w × Δ/M,

where floor(pred/Δ) × Δ is the value of the most significant bit of the prediction in the pixel, and w × Δ/M is the value of the transmitted bit planes in the pixel.
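The matching test and the final refinement above can be sketched as follows (the helper names are ours, and the acceptance threshold is left to the caller; M = 2^N and Δ = 2^7 follow the definitions in the text):

```python
def circular_distance(w, v, M):
    """Wrapped distance between the received side information w and
    the decimal value v of the same bit planes of a prediction,
    modulo M = 2**N (both w and v lie in [0, M))."""
    d = (w - v) % M
    return min(d, M - d)

def refine(pred, w, N, delta=2 ** 7):
    """Refine the best-matching prediction with the transmitted bit
    planes, following r = floor(pred/delta)*delta + w*delta/M
    (delta = 2**7 when N < 8, as stated in the text)."""
    M = 2 ** N
    return (pred // delta) * delta + w * delta // M
```

The decoder would evaluate `circular_distance` for pred0, pred1, and pred2, keep the prediction with the smallest distance, and pass it to `refine`.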

Experimental results
In this section, we examine the performance of our proposed method and compare it with the existing state-of-the-art works. The proposed compression scheme is applied to a variety of images of different sizes; we show the test results for four selected standard images with varying texture content: Lena, Baboon, Man, and Hill. All the test images are 8-bit grayscale images of size 512×512, and results for a two-layer decomposition structure are presented. Table 1 shows the peak signal-to-noise ratio (PSNR) of the decompressed images at varying bit rates (per information source bit). The bit rate is determined by N, the number of transmitted bit planes of the E11 sub-image, and a higher rate leads to a higher PSNR. For example, the PSNR of the reconstructed Lena reaches 39.7 dB (Figure 2d). The decompressed images (Figure 2c,d) are very similar to the original image, with no visible compression artifacts. As N increases from 2 to 6, the PSNR increases significantly, but from 6 to 7 it increases only slowly because the least significant bit plane b0 has little effect on the pixel value. Figure 3 illustrates the PSNR of the reconstructed images at varying rates for Lena and Baboon, showing that the proposed scheme works for both smooth and texture-rich images.
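PSNR here is the standard quality measure for 8-bit images; for reference, a minimal computation (the helper name is ours):

```python
import math

def psnr(original, reconstructed):
    """Peak signal-to-noise ratio in dB for 8-bit images, given as
    flat sequences of pixel values (peak value 255)."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(255.0 ** 2 / mse)
```

Higher PSNR indicates a reconstruction closer to the original; identical images give infinite PSNR.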
There are few existing works on lossy compression of pixel-value encrypted images. We compare our proposed method with the method in [8], which applies a compressive sensing technique to compress the encrypted image. Figure 4 shows that our method performs much better than the method in [8] on the same Lena image: as the bit rate varies from 0.25 to 0.44, the PSNRs of our method all exceed 34 dB, while those of the method in [8] stay below 30 dB.
We also compare our method to the methods in [2] and [9]. Our proposed method achieves performance similar to that of the method in [2], which operates on permutation-based encrypted images whose pixel values are left unencrypted (Figure 4). A public orthogonal matrix is used in [2] to disperse the estimation error in the permutation-based encrypted domain; note that such a matrix cannot be used in the pixel value-encrypted domain. Compared with the most recent method [9], our method performs slightly worse, but it is not iterative, whereas the methods in [2] and [9] are both iterative and may therefore face convergence issues for texture-rich images as well as intensive computation.

Conclusions
In this paper, we propose a lossy compression scheme for pixel-value encrypted images. The main contributions are as follows:
1. At the receiver side, the received bit plane information serves as side information to facilitate the estimation of image edge information, making the image reconstruction more precise. The more bit planes are transmitted, the higher the quality of the reconstructed image.
2. The experimental results show that our proposed scheme performs much better than the existing lossy compression scheme for pixel-value encrypted images and comparably to the state-of-the-art lossy compression for pixel permutation-based encrypted images.
3. Compared with the state-of-the-art works, our proposed scheme has the following additional advantages: at the decoder side, no computationally intensive iteration and no additional public orthogonal matrix are needed, and the scheme can be applied to both smooth and texture-rich images.
In the future, we plan to extend our work to the compression of encrypted video.