Defeating data hiding in social networks using generative adversarial network

As countless images are transmitted through social networks every moment, terrorists may hide data in these images to convey secret messages. Images of various types are mixed together in social networks, and it is difficult for the servers of social networks to detect whether an image is clean. To prevent such illegal communication, this paper proposes a method of defeating data hiding by removing the secret data without impairing the original media content. The method separates clean images from illegal images using a generative adversarial network (GAN), in which a deep residual network serves as the generator. Hidden data can thus be removed while the quality of the processed images is well maintained. Experimental results show that the proposed method prevents secret transmission effectively and preserves the processed images with high quality.


Introduction
With the fast development of information technology, online social networks (OSNs) provide convenient transmission of various messages. However, terrorists can also use OSNs to transmit secret messages by hiding data inside posted images. Generally, it is difficult for a server to detect whether an image carries secret messages in its content. One possible solution is to interfere with the image content in the OSN and destroy any hidden data that might be embedded.
There are two categories of data hiding technologies, i.e., steganography and watermarking [1]. The former hides a large amount of data in a cover while aiming to avoid detection. In most cases, steganography is fragile to common attacks, and the hidden data can be removed easily. The latter focuses on embedding data robustly, making the hidden data difficult to destroy. However, less data can be hidden in a cover by watermarking, which is widely used for copyright protection in social networks [2,3]. Steganalysis is a technique to detect whether an image contains hidden data [4,5]. However, steganalysis is not precise enough, especially at small embedding rates [6]. Besides, as there are many processed images in OSNs, it would inevitably produce high false alarm rates. Therefore, it is more reliable to defeat covert transmission by interfering with the image content.
Typical image processing operations, e.g., recompression, down-sampling, and beautification [7], can defeat most non-robust steganography methods. However, it is difficult to remove messages hidden by robust steganography or watermarking tools. In previous studies, researchers have proposed several methods of destroying digital watermarks. In [8], an attacking method is proposed that removes redundancy through the self-similarities of image pixels. In [9,10], wavelet transform-based watermarking and singular value decomposition-based watermarking are defeated, respectively. These methods are mainly useful against specific watermarking algorithms [11].
The development of deep learning has brought more tools for image processing, e.g., image classification, reconstruction, and recognition [12][13][14][15]. As most data-hiding methods can be viewed as adding noise, it would be natural to remove the hidden data by image denoising. Although many methods in [16][17][18][19][20] offer better denoising performance than traditional methods, they are not good at removing hidden data, especially data hidden by robust information-hiding tools. In this paper, we propose a new framework for defeating covert transmission in OSNs. Inspired by the generative adversarial network (GAN) [21], we design a generator and a discriminator to destroy the secret data that might be hidden in OSN images. After the images are processed by the proposed method, a receiver cannot extract the hidden data from them, even if robust data hiding methods are used by the sender. Meanwhile, the image quality is well preserved. The rest of the paper is organized as follows. Section 2 introduces the related works. In Section 3, we present a detailed implementation of the proposed framework. Experimental results are presented in Section 4, and Section 5 concludes the paper.

Related works
Social networks such as Weibo, Twitter, and Instagram apply various image processing functions. General steganography algorithms are not robust; in other words, social networks can easily destroy the secret information in stego images. Therefore, we use watermarking algorithms to verify the performance of our method, considering that terrorists may apply robust and imperceptible watermarking for covert communication. The algorithms chosen for testing should be typical and should not fail over a normal lossy channel. After comprehensive consideration, we chose three classic algorithms in the field of digital image watermarking, based on quantization index modulation (QIM) [22], spread spectrum (SS) [23], and uniform log-polar mapping (ULPM) [24], respectively. Brief introductions are given in this section.
The QIM algorithm quantizes the original cover into different index intervals using different quantizers, which constitutes the embedding process. Since the embedded information is binary, two quantizers are generally used, and their quantization cells are disjoint. The watermark is extracted according to the index interval of the modulated data: the receiver can recover the hidden bits by minimum-distance decoding when the channel interference is not severe.
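As an illustration, the quantizer pair can be sketched as follows. This is a minimal scalar QIM on a single coefficient with an assumed step size, not the exact implementation of [22]:

```python
import numpy as np

def qim_embed(x, bit, delta=8.0):
    # Quantize sample x with one of two interleaved quantizers:
    # bit 0 uses even multiples of delta, bit 1 uses odd multiples,
    # so the two quantization lattices are disjoint.
    return 2 * delta * np.round((x - bit * delta) / (2 * delta)) + bit * delta

def qim_extract(y, delta=8.0):
    # Minimum-distance decoding: choose the quantizer whose lattice
    # point is closest to the received sample y.
    d0 = abs(y - qim_embed(y, 0, delta))
    d1 = abs(y - qim_embed(y, 1, delta))
    return int(d1 < d0)
```

As long as the channel perturbs the sample by less than delta/2, the embedded bit is still decoded correctly, which is why QIM survives mild processing.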
As an important work in frequency-domain watermarking, the contribution of the SS algorithm lies in its introduction of spread spectrum communication technology. A spreading code with pseudorandom and cross-correlation properties plays a key role in the system: the energy of the embedded watermark signal is spread over a wide spectrum, which improves both security and robustness.
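A toy version of this idea, with an assumed embedding strength and a single bit spread over a block of transform coefficients (not the exact scheme of [23]), can be sketched as:

```python
import numpy as np

def ss_embed(coeffs, bit, key=1234, alpha=2.0):
    # Spread one bit over all coefficients using a pseudorandom
    # +/-1 chip sequence derived from a shared key.
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return coeffs + alpha * (1.0 if bit else -1.0) * chips

def ss_detect(coeffs, key=1234):
    # Correlate with the same chip sequence; the host acts as noise,
    # and the sign of the correlation reveals the bit.
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=coeffs.shape)
    return int(float(np.dot(coeffs, chips)) > 0.0)
```

Because the chip energy alpha·N grows linearly with the block length while the host correlation only grows like √N, detection becomes reliable for long spreading sequences.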
The ULPM algorithm provides a watermark that is simultaneously robust to rotation, scaling, translation, cropping, and general print-scan distortion. The study eliminates interpolation distortion and expands the embedding space. A discrete log-polar point is obtained by applying the ULPM to a frequency index in the Cartesian system, and the data is then embedded into the corresponding DFT coefficient in the Cartesian system, ensuring both robustness and efficiency.
Although the above three watermarking algorithms differ in robustness, they do not easily fail under the lightweight image processing of social networks. Our method can prevent illegal communication that relies on robust watermarking, breaking hidden data that traditional attacks cannot affect, while only slightly degrading the quality of the processed images. Meanwhile, the method can also be regarded as a new way to evaluate the robustness of information hiding.

Overall framework
The flowchart of the proposed method is illustrated in Fig. 1. We provide a holistic approach to preventing security risks in social networks that no longer relies on steganalysis, owing to its possible failure in detection. We first generate watermarked image sets by randomly adding watermarks to normal images, using binary random sequences as the secret data, i.e., 0 and 1 are equally probable. We send all pairs of image sets to the GAN and obtain the processing models by learning the mapping from watermarked images to clean images. Subsequently, all models are integrated into the social network to block illegal communication hidden in transmitted images. The detailed steps are described as follows:
1) In the initial stage, the clean images of the database DS_C are first watermarked with random binary messages. We denote the watermarking algorithms as ϕ_1, ϕ_2, …, ϕ_n, and the corresponding watermarked image datasets as DS_ϕ1, DS_ϕ2, …, DS_ϕn. The watermarking algorithms should cover both classical and state-of-the-art algorithms.
2) In the training phase, DS_C is sent to the generator G and discriminator D n times, once with each of the n watermarked datasets. We follow the optimization objective proposed in [21]:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))],  (1)

where p_data(x) and p_z(z) denote the distribution of the real data and the prior distribution of the generator input, respectively, and E denotes mathematical expectation. The value function V represents the performance of D. For each training objective, G fits the prior distribution on DS_C, making the expected error of D on the generated data as large as possible, while D learns to distinguish real samples from generated samples more accurately through the log-likelihood. Model-ϕ_1, Model-ϕ_2, …, Model-ϕ_n record the trained parameters of each session. The details of the network design are introduced in Sections 3.2 and 3.3.

3) In the application process, we deploy all the aforementioned trained models on the social network. One rule is that we never judge whether a transmitted image contains a watermark, or which type of watermark, which guarantees the practicality of our method. Owing to the characteristics of the data distributions, each model is only effective on images carrying its corresponding or similar watermarking algorithm. Therefore, for every transmitted image we shuffle all models by sampling n times at random without replacement, obtaining a rearrangement Model-1, Model-2, …, Model-n, and apply the n models in that order. For example, Model-ϕ_1, trained on DS_C and DS_ϕ1, should have little influence on a ϕ_2-watermarked image; however, an image watermarked with ϕ_2 will in any case be processed by Model-ϕ_2 during the n passes, so the secret data is removed.
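The per-image shuffling step can be sketched as below, assuming each trained model is callable on an image; the function name is illustrative only:

```python
import random

def process_transmitted_image(image, models):
    # Draw a fresh random order (sampling without replacement) for
    # every transmitted image and apply all n models in that order,
    # so no fixed processing pattern can be learned by re-uploading.
    order = random.sample(range(len(models)), k=len(models))
    for idx in order:
        image = models[idx](image)
    return image
```

Since every upload triggers a new shuffle, repeated upload/download probing by a sender reveals no stable processing pipeline.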
It should be noted that our training scheme does not mix the watermarked images of all labels, because different types of watermarked images differ greatly in data distribution, which may destabilize network learning and cause the models to fail. The framework avoids this problem to some extent. Meanwhile, the data distribution of clean images clearly differs from that of watermarked images, which also guarantees that clean images are not strongly affected. We obtain the result under n random samplings to ensure the randomness of the processed image at the pixel level. Besides, in many cases data senders can discover the image processing patterns of a social network by repeatedly uploading and downloading; the framework effectively prevents this.

The architecture of generator G
We use the method in [18] as the generator to learn the mapping from watermarked images to clean images. The applied convolutional neural network (CNN) can efficiently and flexibly mine deep image features by combining residual learning and batch normalization (BN). Since merely adding layers to deepen a network does not always bring benefits, this combination avoids convergence difficulties as well as the saturation or even degradation of network performance.
We synchronize the training errors of deep and shallow networks by introducing shortcut connections on the stacked layers. Specifically, for network input x and output y we denote the original mapping to be learned as H(x), while the residual mapping is F(x) = H(x) − x. When the residual is zero, the network is not negatively optimized, because an identity mapping occurs on the stacked layers. In theory, the most intuitive benefit is that the amount of learning is reduced, making training more tractable. Unlike the classic residual network with multiple shortcut connections, we output the residual image directly through only one residual unit. At the same time, BN layers are employed to improve generalization and to reduce the training pressure caused by adapting to the distribution changes of each iteration.
Figure 2 shows the architecture of the generator and discriminator networks. The network depth is set to 21, determined by balancing model effect against training time. We apply 64 filters of size 3 × 3 to the input watermarked image I_W. The resulting 64 feature maps are fed into 19 repeated convolutional layers, each composed of 64 kernels of size 3 × 3 followed by batch normalization. The residual image I_R is reconstructed with the corresponding number of image channels, aiming to approximate the real residual between I_W and the clean image I_C. Except for the hyperbolic tangent (TanH) function used on the output layer, all other layers use rectified linear units (ReLU) as the activation function for training stability. At the end of the network, the generated image I_G is obtained by subtracting I_R from I_W. We denote the training parameters of the generator G as θ_G = {ω_1~L, b_1~L}, where ω_1~L and b_1~L represent the weights and biases of the L layers, respectively. The relationship between the above images is expressed by Eq. (2):

I_G = I_W − I_R = I_W − G_θG(I_W).  (2)
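The 21-layer layout described above can be summarized schematically. The helper below merely enumerates our reading of the layer configuration in Fig. 2; the tuple format (name, in_channels, out_channels) is an assumption for illustration:

```python
def generator_layer_config(in_channels=3, features=64, depth=21):
    # First layer: 64 conv 3x3 filters + ReLU applied to the
    # watermarked image I_W.
    layers = [("conv3x3 + ReLU", in_channels, features)]
    # 19 middle layers: conv 3x3 + batch normalization + ReLU.
    for _ in range(depth - 2):
        layers.append(("conv3x3 + BN + ReLU", features, features))
    # Last layer: conv 3x3 + TanH reconstructs the residual image I_R;
    # the generated image is then I_G = I_W - I_R.
    layers.append(("conv3x3 + TanH", features, in_channels))
    return layers
```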
We use a real-valued tensor of size N×H×W×C, where each image has height H, width W, and C channels, and N is the training batch size.
Our learning goal is guided by the loss function, which consists of a content loss and an adversarial loss. The content loss adopts the mean-squared error (MSE) between the output residual image and the real residual as the optimization objective, the most frequently used form of perceptual loss. Since it can be intuitively regarded as the pixelwise difference, it is calculated by Eq. (3):

L_MSE = (1 / (N·H·W·C)) Σ_k ||G_θG(I_W^k) − (I_W^k − I_C^k)||²,  (3)

where k indexes the images in the batch. However, the gradient descent direction obtained by back-propagating the MSE alone is not accurate enough, especially when there is little visual disparity between the watermarked image and the target clean image. We want the probability that a generated image is judged clean by the discriminator to be high, consistent with the minimization of the MSE. Therefore, the adversarial loss is added to update the gradient more precisely and to make the generated image as similar as possible to the ground truth. The adversarial loss is calculated as

L_adv = Σ_k −log D_θD(I_G^k).  (4)

Finally, we define the total generator loss as

L_G = L_MSE + β·L_adv,  (5)

where β = 10^−3. Empirically, the proportion of the adversarial loss is kept slightly smaller to balance the generator and discriminator.
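The combined generator loss (content MSE plus a β-weighted adversarial term) can be sketched numerically as follows; this is a simplified single-image version with assumed array arguments, where d_fake stands for the discriminator's output on the generated image:

```python
import numpy as np

def generator_loss(residual_pred, watermarked, clean, d_fake, beta=1e-3):
    # Content loss: pixelwise MSE between the predicted residual
    # and the true residual I_W - I_C.
    content = np.mean((residual_pred - (watermarked - clean)) ** 2)
    # Adversarial loss: -log D(I_G), small when the discriminator
    # believes the generated image is clean (d_fake close to 1).
    adversarial = -np.log(d_fake + 1e-12)
    # Total loss: adversarial term down-weighted by beta = 1e-3.
    return content + beta * adversarial
```

The small β keeps the pixelwise term dominant, so the adversarial signal refines rather than overrides the content reconstruction.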

The architecture of discriminator D
We set up a pre-processing layer based on prior knowledge before an image is formally input to the discriminator. Under normal circumstances, image quality affects the results of an algorithm, and processing the database is not restricted to normalizing the image pixels. It is crucial to eliminate irrelevant information and exploit the useful information while simplifying the data as much as possible. Because the difference between a watermarked image and a clean image is very small in our task, it can be regarded as a weak high-frequency noise signal. A high-pass filtering operation can amplify this signal by weakening the other image content, which drives the subsequent network to classify better. We denote the high-pass filter as F; the filtered images R of a batch of size N are obtained by Eq. (6):

R_k^label = F ⊗ I_k^label,  (6)

where k = 1, 2, …, N, the symbol ⊗ represents the convolution operation, and the superscript label denotes either the generated image G or the clean image C. We use a filter kernel that is commonly employed in steganalysis.
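A common choice in the steganalysis literature is the 5 × 5 KV kernel; we assume this is the kind of filter meant here, and sketch the high-pass filtering step with it:

```python
import numpy as np

# KV high-pass kernel widely used in steganalysis (an assumption:
# the paper only says "commonly employed in steganalysis").
KV = (1.0 / 12.0) * np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=np.float64)

def highpass(image, kernel=KV):
    # 'valid'-mode 2D filtering: the kernel weights sum to zero, so
    # flat regions map to ~0 and only high-frequency residuals survive.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Because the kernel sums to zero, smooth image content is suppressed while the weak watermark residual is passed through to the discriminator.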
Inspired by the principles summarized in DCGAN [25], the core of the discriminator network consists of 8 convolutional layers, with the number of kernels increasing from 64 to 512 by a factor of 2. We use stacked convolution kernels of size 3 × 3 instead of the size 5 × 5 used in the original method, without changing the receptive field. This setting allows the mapping to contain more nonlinearities and to represent more features with fewer parameters. After the 512 feature maps pass through a fully connected layer and a sigmoid activation, the classification probability is evaluated with the cross-entropy error function. Moreover, for discrimination stability we add a BN layer and the LeakyReLU activation to all convolutional layers except the input layer.
Similarly, we parameterize the discriminator by θ_D as D_θD. The optimization goal is defined as

max_θD E_{I_C}[log D_θD(I_C)] + E_{I_W}[log(1 − D_θD(G_θG(I_W)))].  (7)

The discriminator should output a probability of being real that is as high as possible when the input image is clean, and a low probability for a generated fake image. The network reaches a Nash equilibrium through the interaction between discriminator and generator, and the final generated images are sufficient to deceive the discriminator.
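In practice this objective is maximized by minimizing the equivalent binary cross-entropy; a minimal numeric sketch over batches of discriminator outputs:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy form of the discriminator objective:
    # push D(clean) toward 1 and D(generated) toward 0.
    # eps avoids log(0).
    eps = 1e-12
    return float(-np.mean(np.log(d_real + eps))
                 - np.mean(np.log(1.0 - d_fake + eps)))
```

The loss is near zero when the discriminator is confident and correct, and large when it is fooled, which is exactly the adversarial pressure the generator trains against.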

Experimental setting
We test three classic watermarking algorithms based on QIM, SS, and ULPM, respectively. The image dataset employed in our experiments is COCO [26], which contains about 200,000 color images. In practice, we randomly select 10,000 images from the training set and 1000 images from the testing set for the experiments. A larger training set would naturally increase the computational complexity, though it might further improve the results. All images are resized to 192 × 192 for simplicity.
In the initial stage before training, we first label the original training images as clean. Next, the three watermarking algorithms mentioned above are used to generate watermarked images, denoted ϕ_QIM, ϕ_SS, and ϕ_ULPM. The length of the message sequence is chosen randomly from 40 to 120 bits; this range takes the payload capacity of each algorithm into account, so that the models remain effective on watermarked images with various payloads. Though these watermarking algorithms are mainly designed for grayscale images, they can easily be applied to color images by embedding the data in the Y channel. We separately send the clean images and the three watermarked datasets to the GAN to obtain three processing models, named Model-ϕ_QIM, Model-ϕ_SS, and Model-ϕ_ULPM. The image pre-processing of the network includes normalizing the pixels to [−1, 1] and high-pass filtering. Our models are trained for 7500 iterations with the Adam optimizer, whose momentum hyperparameter is set to 0.9. The learning rate is decayed exponentially from 1e−4 to 1e−6. To avoid oscillation of the loss, all weights are initialized from a normal distribution with mean 0 and standard deviation 0.02. The slope is 0.2 in all layers activated by LeakyReLU. We conduct the experiments on a PC with an Intel(R) Core(TM) i7-6850K CPU at 3.60 GHz and a GTX 1080Ti GPU. On average, training each model takes about 1.5 days on the GPU.
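One way to realize the exponential decay of the learning rate is geometric interpolation between the two endpoints; the exact schedule is not specified in the text, so the formula below is our assumption:

```python
def learning_rate(step, total_steps=7500, lr_start=1e-4, lr_end=1e-6):
    # Geometric interpolation: starts at lr_start and reaches lr_end
    # exactly at the final iteration, decaying by a constant factor
    # per step.
    return lr_start * (lr_end / lr_start) ** (step / total_steps)
```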

Evaluations on process effectiveness
For objective image assessment, we use three metrics to assess both the degree of damage to the hidden data and the impact on the quality of the watermarked images. The value of each objective metric is the mean over the testing set. The first is the data extraction error rate of the processed images. Denoting the number of wrong message bits as n_error and the length of the embedded message as n_m, the error rate is calculated by Eq. (9):

R_error = n_error / n_m,  (9)

where a result approaching 50% means the secret data is completely destroyed. Peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), two universal criteria, are also applied: the former measures the fidelity between watermarked and processed images, while the latter evaluates visual loss. A higher PSNR or SSIM generally indicates better visual quality.
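The error rate of Eq. (9) and PSNR can be computed as follows (PSNR shown for 8-bit images; SSIM is omitted here since it is usually taken from a library such as scikit-image):

```python
import numpy as np

def bit_error_rate(extracted, embedded):
    # Eq. (9): fraction of wrongly recovered bits; a value near 0.5
    # means extraction is no better than random guessing.
    extracted = np.asarray(extracted)
    embedded = np.asarray(embedded)
    return float(np.mean(extracted != embedded))

def psnr(img_a, img_b, peak=255.0):
    # Peak signal-to-noise ratio in dB between two images.
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```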
We first test the effectiveness of each processing model to ensure that the saved models can process the corresponding watermarked images. The lengths of the secret message are 40, 60, 80, 100, and 120 bits, respectively. As in the training phase, the message is embedded in the Y channel. Figure 3 shows the relationship between the data extraction error and the payload. For the testing images of the ϕ_QIM watermarking scheme, the average error rate reaches around 40% or higher, which indicates that the secret data has essentially been destroyed. The watermarked images of ϕ_SS and ϕ_ULPM show slightly better fault tolerance than ϕ_QIM owing to non-blind extraction and error-correction coding; however, the data error rate exceeds 30% for every payload tested, indicating that the extracted data has lost its original meaning. Figure 4 shows the effect of each model on the quality of the watermarked images. As the payload increases, the influence of Model-ϕ_QIM and Model-ϕ_SS grows, while Model-ϕ_ULPM stabilizes. Nevertheless, the high SSIM demonstrates the strong imperceptibility of the proposed framework. As we reconstruct the pixel content of watermarked images to approximate their original images, the degree of impact on image quality depends on the principle of the watermarking algorithm.
As mentioned above, it is not practical to apply a single model only to images watermarked by its corresponding algorithm, because we cannot classify the type of a transmitted image. Hence, we chain all models in random order so that every image is processed three times; obviously, there are six possible orderings, denoted P_QIM−SS−ULPM, P_QIM−ULPM−SS, P_SS−QIM−ULPM, P_SS−ULPM−QIM, P_ULPM−QIM−SS, and P_ULPM−SS−QIM. Next, we embed 80-bit and 100-bit messages into the images of the testing set with ϕ_QIM, ϕ_SS, and ϕ_ULPM to generate the watermarked images. For comparison, under JPEG compression with a high quality factor the processed images are almost identical to those before processing, but as the quality factor decreases the perceived visual distortion increases gradually; meanwhile, the results of image filtering, noising, and gamma correction are clearly not promising. We compare the recovery of the 80-bit message and the image quality variation of the six processes against JPEG compression; the results are shown in Table 1. Different orderings of the three models yield different results. For a watermarked image, the best case is that the model trained on the corresponding watermarked images is applied first, since the other models act incorrectly on what is, to them, a clean image, causing a chain reaction in the pixel content. JPEG compression, the most common image processing operation, only becomes effective at QF = 10, where the image quality deteriorates drastically, which is unacceptable in real social network applications. Even the worst randomized ordering of our method ensures channel security without much change in image quality.
The other traditional attacks mentioned above and the six processes are then applied to watermarked images carrying a 100-bit message, and the testing results are listed in Table 2. Although the watermarking algorithms differ in their resistance to the various traditional attacks, the data extraction error rates of those attacks remain inferior to our method even when the watermarked images are already seriously distorted, as shown in Fig. 5. We can conclude that traditional attacks cause intolerable distortion to watermarked images before achieving a sufficient data error rate, which further proves the effectiveness of the proposed method. Finally, the PSNR between the processed image and the watermarked image is used to distinguish the six outcomes; the results are shown in Fig. 6. Although human eyes can barely perceive the differences, the image quality measurements confirm that the internal pixel distributions of the outcomes differ.

Anti-analyzability of process and impact on clean images
On the other hand, the majority of images transmitted over social networks are free of secretly embedded data, so it is also necessary to verify that the models have little effect on these pure images. We process the non-watermarked images of the testing set with the six processes defined above and list the average PSNR and SSIM in Table 3. The results in Table 3 show that the impact of defeating potential data hiding is minor and controllable. Moreover, better performance in removing secret data comes with lower influence on the non-watermarked images.

Conclusion
In this paper, we observe that social networks are weak in the face of illegal communication hidden by robust algorithms, and that steganalysis does not perform well at small payloads. We propose a GAN-based method to defeat data hiding, which learns the mapping from watermarked images to the corresponding clean images. The experiments show that the trained processing models effectively destroy hidden data while preserving the quality of the processed images. To resist collusion attacks, we repeatedly sample the processing models without replacement, which raises the cost for communication channel analysts. In future work, we will try to improve the breaking rate and to integrate more robust data hiding schemes by designing more efficient ways of combining all watermarking algorithms.