A coverless steganography method based on generative adversarial network

Traditional information hiding embeds secret information into a multimedia carrier, which inevitably leaves modification traces in the carrier. This paper proposes a new coverless information hiding method. First, an improved Wasserstein GAN (WGAN-GP) model is constructed and trained with disguised images and secret images. Then, once the model is stable, a disguised image is passed to the generator. Finally, the generator produces an image that is visually the same as the secret image, thereby achieving the same effect as transmitting the secret image itself. Experimental results show that this method not only secures the transmission of secret information but also increases the information hiding capacity.


Introduction
The rapid development of computer network technology has enabled multimedia information such as videos, texts, and images to be transmitted quickly over the network. However, while the network enables information sharing and brings convenience to people, it also carries many security risks. Information hiding is an important modern information security technology: it transmits secret information by hiding it in texts, images, videos, and other carriers [1]. According to its use, information hiding can be divided into steganography and digital watermarking. Steganography is used for the transmission of secret information, while digital watermarking is used for copyright protection and other scenarios [2][3][4]. According to the hiding protocol, it can be divided into no-secret-key, public-key, and private-key steganography systems [5][6][7]. According to the technology, it can be divided into spatial domain-based [8][9], frequency domain-based [10], and structure-based steganography. The simplest steganography algorithm is least significant bit (LSB) information hiding, but it leaves significant modification traces in the stego image [11]. As steganography has developed, newer algorithms such as HUGO [12], S-UNIWARD [13], and WOW [14] preserve more complex image statistical features; the paper [15] proposed a scheme that encrypts, compresses, and finally reconstructs the secret image; the paper [16] proposed a non-uniform watermark sharing scheme based on optimal iterative BTC for image tampering recovery, and so on. The adaptive embedding strategy [17] can automatically embed secret information into texture-rich and noise-rich image regions, so as to maintain complex higher-order statistical features [18].
To combat more advanced adaptive steganography, the features used in steganalysis are gradually becoming more complex and higher-dimensional. In recent years, high-order statistical features based on complex correlation modeling in the image domain have become the main features studied in steganalysis [19]. PSRM (projection spatial rich model) [20], the selection of rich model steganalysis features based on decision rough set α-positive region reduction [21], and other models are based on such high-order, high-dimensional features and have achieved good detection results. At present, neural networks have become a research hotspot in various fields [22][23][24]. In this paper, neural networks are introduced into the research of image information hiding. First, a WGAN-GP model is built. Then, the disguised image is fed to the generation network, the secret image is input to the discrimination network as a real image, and the two networks are trained against each other in a minimax game. The discrimination network tries to distinguish the real image from the generated image as well as possible; eventually it can no longer tell them apart, and the generated image is visually identical to the secret image. The contributions of our work are as follows:
- A WGAN-GP model is constructed. For the first time, the WGAN-GP model is used for image information hiding, and an image is input into the generator instead of random noise. The generated image is visually identical to the secret image, which achieves the same effect as transmitting the secret image.
- The recipient can generate the same image as the secret image by using this separate generation network and the received disguised image. The disguised image is transmitted without any modification or embedding operation, which effectively avoids detection by steganalysis algorithms.
- The generated image is visually the same as the secret image. Only the camouflage image together with the corresponding generator can produce the secret image, which makes the method highly secure.
The rest of this article is organized as follows. Several related models are introduced in Section 2. The proposed method and experimental environment are introduced in Section 3. Experimental results and discussions are shown in Section 4. Finally, the conclusions are shown in Section 5.

Generative adversarial network
The generative adversarial network (GAN) was proposed in 2014 and has attracted much attention in various fields; new GAN models and applications have been emerging continuously [25]. WGAN-GP [26] is one such derivative of the GAN, and this paper proposes to use it for image information hiding. A GAN consists of a generator and a discriminator. The generator learns the distribution of real images so that the images it generates look more real, which makes it difficult for the discriminator to distinguish true from false. The discriminator must judge whether a received picture is real or fake. Throughout training, the generator strives to make the generated images more realistic, while the discriminator strives to tell real images from fake ones, like a two-player game. The generator and the discriminator continuously compete with each other until the two networks reach a dynamic balance: the distribution of the generated images matches the distribution of the real images, the discriminator cannot judge whether an image is real or fake, and its predicted probability for any given image is approximately 0.5. An example explains the GAN more intuitively. A gang making counterfeit money plays the role of the generator: it wants to deceive the bank so that the counterfeit money can be traded normally. The bank plays the role of the discriminator: it must judge whether money is real or counterfeit. The gang aims to create counterfeit currency that the bank cannot identify, while the bank must identify the counterfeit currency accurately. In summary, with true = 1 and false = 0, the discriminator assigns the label 1 to a real image and the label 0 to a generated image. The structure of the GAN is shown in Fig. 1.
GANs place no mandatory restriction on the choice of the generative model (generator) and the discriminative model (discriminator); in [25], both are multilayer perceptrons. A GAN defines a prior $p_z(z)$ on the input noise, which is used to learn the generator's distribution $p_g$ over the training data $x$. $G(z)$ denotes the mapping from input noise to data (e.g., the generated image), and $D(x)$ denotes the probability that $x$ comes from the real data distribution $p_{data}$ rather than from $p_g$. The optimization objective is therefore the following minimax problem:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$

Minimax means that (1) is maximized when the discriminator is updated and minimized when the generator is updated. When the generator is fixed and the discriminator is updated, the optimal solution is $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$. When the generator is updated, the objective function reaches its global minimum if and only if $p_g = p_{data}$. The outcome of the game between the two models is that the generator creates fake data that the discriminator can hardly tell from real data, that is, $D(G(z)) = 0.5$. In a GAN, if the discriminator is trained too well, the generator cannot obtain enough gradient to continue optimizing. If the discriminator is trained too weakly, its feedback is not informative, and the generator cannot learn effectively. The training of the discriminator is therefore difficult to control, which is the root of the difficulty of GAN training. WGAN was introduced to solve this problem.
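The value function in Eq. (1) and the optimal discriminator $D^*(x) = p_{data}(x) / (p_{data}(x) + p_g(x))$ can be checked numerically on a toy example. The sketch below is only an illustration: it assumes $x$ takes three discrete values, rewrites the second expectation over $x \sim p_g$, and uses made-up probabilities; the function names are hypothetical and do not come from the paper.

```python
import math

def value_function(p_data, p_g, d):
    """V(D, G) of Eq. (1) when x takes finitely many values.

    p_data[i] and p_g[i] are the real and generated probabilities of value i,
    and d[i] = D(x_i) is the discriminator output, strictly between 0 and 1.
    """
    return sum(pd * math.log(dx) + pg * math.log(1.0 - dx)
               for pd, pg, dx in zip(p_data, p_g, d))

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

p_data = [0.5, 0.3, 0.2]      # toy real distribution (assumed values)
p_g_bad = [0.2, 0.3, 0.5]     # generator that has not converged yet
p_g_good = list(p_data)       # generator that matches the real data

d_star_bad = optimal_discriminator(p_data, p_g_bad)
d_star_good = optimal_discriminator(p_data, p_g_good)
v_bad = value_function(p_data, p_g_bad, d_star_bad)
v_good = value_function(p_data, p_g_good, d_star_good)
```

When the generator matches the data, every entry of `d_star_good` is 0.5 and `v_good` equals $-\log 4$, the global minimum of the objective against an optimal discriminator; for the mismatched generator, `v_bad` is larger.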

Earth mover's distance
Wasserstein GAN (WGAN) [27] explored a more appropriate measure of the difference between the generated distribution and the real distribution. The earth mover's (EM) distance, also known as the Wasserstein distance [28], is defined as:

$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}[\|x - y\|] \tag{2}$$

Here $\Pi(P_r, P_g)$ is the set of all joint distributions whose marginals are $P_r$ and $P_g$. For each such joint distribution $\gamma$, sampling $(x, y) \sim \gamma$ yields a real sample $x$ and a generated sample $y$. The distance $\|x - y\|$ of the pair is computed, so the expected distance $\mathbb{E}_{(x, y) \sim \gamma}[\|x - y\|]$ under $\gamma$ can be calculated. The infimum of this expected value over all possible joint distributions is defined as the EM distance. The name "earth mover" refers to a bulldozer, which is apt: intuitively, the EM distance measures the minimum cost of moving the pile of "sand" $P_r$ to the "location" $P_g$, where $\gamma$ is a "bulldozing" plan.
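For one-dimensional histograms, the EM distance has a simple closed form that makes the bulldozer picture concrete: sweep the bins from left to right and add up how much mass has to be carried across each bin boundary. The sketch below assumes both histograms live on the same grid with unit bin spacing and have equal total mass; the function name is hypothetical.

```python
def em_distance_1d(p_r, p_g):
    """EM distance between two histograms on the grid 0, 1, ..., n-1.

    Assumes unit spacing between bins and equal total mass in p_r and p_g.
    """
    if len(p_r) != len(p_g):
        raise ValueError("histograms must have the same number of bins")
    carried = 0.0  # signed mass that must be moved on to the next bin
    cost = 0.0     # total mass-times-distance moved so far
    for r, g in zip(p_r, p_g):
        carried += r - g
        cost += abs(carried)
    return cost

far = em_distance_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])    # all mass moves 2 bins
near = em_distance_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])   # half the mass moves 2 bins
same = em_distance_1d([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])   # nothing to move
```

Unlike a divergence that only compares overlapping mass, this distance keeps growing as the two piles move further apart, which is the property WGAN relies on.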

Wasserstein GAN
The EM distance is then applied to GANs. Solving the EM distance directly is difficult, but the problem can be transformed into formula (3) using the Kantorovich-Rubinstein duality [29]:

$$W(P_r, P_g) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \tag{3}$$

This formula takes the supremum of $\mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]$ over all functions $f$ that satisfy the 1-Lipschitz limit. The Lipschitz limit bounds the maximum local variation of a continuous function; a function $f$ is K-Lipschitz [30] if there is a constant $K \ge 0$ such that:

$$|f(x_1) - f(x_2)| \le K \|x_1 - x_2\|$$

A neural network $f_w$ with parameters $w$ is then used to solve the above optimization problem approximately:

$$K \cdot W(P_r, P_g) \approx \max_{w : \|f_w\|_L \le K} \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{x \sim P_g}[f_w(x)]$$

This neural network is very similar to the discriminator in a GAN. There are only a few subtle differences, and it is named the critic to distinguish it from the discriminator. The differences between the two are:
1. The last layer of the critic drops the sigmoid, because the critic outputs a general score rather than the probability that the discriminator outputs.
2. The critic's objective function has no log term, which follows from the derivation above.
3. After each update, the critic's parameters are truncated to a fixed range (weight clipping) in order to guarantee the Lipschitz limit mentioned above.
4. The better the critic is trained, the more it helps to improve the generator, so the critic can safely be trained well.
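The differences listed above can be sketched with a toy critic. The code below is a minimal illustration, not the paper's implementation: it assumes a linear critic $f_w(x) = w \cdot x$, uses plain gradient descent instead of the optimizer of the original WGAN algorithm, and takes the clipping threshold 0.01 from the defaults of the original WGAN paper. All function names are hypothetical.

```python
def critic_score(weights, x):
    """Linear critic f_w(x) = w . x: an unbounded score, with no sigmoid."""
    return sum(w * xi for w, xi in zip(weights, x))

def critic_loss(weights, real_batch, fake_batch):
    """Negated WGAN objective: mean f_w(fake) - mean f_w(real), no log terms."""
    real = sum(critic_score(weights, x) for x in real_batch) / len(real_batch)
    fake = sum(critic_score(weights, x) for x in fake_batch) / len(fake_batch)
    return fake - real

def clip_weights(weights, c=0.01):
    """Weight clipping: force every parameter into the range [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

def critic_step(weights, real_batch, fake_batch, lr=0.05, c=0.01):
    """One gradient-descent step on critic_loss, followed by weight clipping."""
    n = len(weights)
    mean_real = [sum(x[i] for x in real_batch) / len(real_batch) for i in range(n)]
    mean_fake = [sum(x[i] for x in fake_batch) / len(fake_batch) for i in range(n)]
    # For a linear critic, d(loss)/d(w_i) = mean_fake_i - mean_real_i.
    updated = [w - lr * (mf - mr) for w, mf, mr in zip(weights, mean_fake, mean_real)]
    return clip_weights(updated, c)

real_batch = [[1.0, 0.0], [1.0, 0.0]]   # toy "real" samples (assumed)
fake_batch = [[0.0, 1.0], [0.0, 1.0]]   # toy "generated" samples (assumed)
w0 = [0.0, 0.0]
w1 = critic_step(w0, real_batch, fake_batch)
```

After one step the unclipped weights would be `[0.05, -0.05]`; clipping pulls them back to `[0.01, -0.01]`, which shows how strongly clipping restricts the critic.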
Although the mathematical proof is very complicated, the final change is very simple. The structure of WGAN is shown in Fig. 2:

Improved training of Wasserstein GANs
WGAN sometimes suffers from low sample quality and difficulty in converging. To enforce the Lipschitz constraint, WGAN uses weight clipping, but weight clipping causes two major problems: the modeling capacity is weakened, and gradients may explode or vanish. The alternative proposed in [26] is to add a gradient penalty (GP) to the critic loss. The resulting network model is called WGAN-GP.
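The critic loss of WGAN-GP [26] is:

$$L = \underbrace{\mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_r}[D(x)]}_{\text{original critic loss}} + \underbrace{\lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2]}_{\text{gradient penalty}} \tag{5}$$

where $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated samples, and $\lambda$ is the penalty coefficient. Inspired by the fact that WGAN-GP can generate handwritten characters from the MNIST data set, this paper constructs a WGAN-GP model that belongs exclusively to the sender and the receiver for image information hiding. Instead of random noise, a camouflage image is fed to the generator to generate an image that looks the same as the secret image. The structure of the WGAN-GP model used in this paper is shown in Fig. 3.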
Once the model is stable, a disguised image passed to the generator can only yield an image that looks the same as the secret image, which ensures the security of the information.
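The gradient penalty term of loss (5) can be sketched as follows. This is a hedged illustration only: real implementations obtain the gradient of the critic by automatic differentiation, whereas the sketch substitutes a central finite difference so that it runs with the standard library alone. The penalty coefficient 10 is the default of the WGAN-GP paper [26]; the function names and the toy critics in the usage note are hypothetical.

```python
import math
import random

def numerical_gradient(critic, x, eps=1e-5):
    """Central-difference estimate of the gradient of critic at point x."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((critic(hi) - critic(lo)) / (2 * eps))
    return grad

def gradient_penalty(critic, real_batch, fake_batch, lam=10.0, rng=random):
    """lam * mean((||grad D(x_hat)||_2 - 1)^2) over interpolated points x_hat."""
    total = 0.0
    for x, x_tilde in zip(real_batch, fake_batch):
        t = rng.random()  # uniform position on the line between x and x_tilde
        x_hat = [t * a + (1 - t) * b for a, b in zip(x, x_tilde)]
        grad = numerical_gradient(critic, x_hat)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return lam * total / len(real_batch)

def wgan_gp_critic_loss(critic, real_batch, fake_batch, lam=10.0, rng=random):
    """Original critic loss plus the gradient penalty."""
    fake = sum(critic(x) for x in fake_batch) / len(fake_batch)
    real = sum(critic(x) for x in real_batch) / len(real_batch)
    return fake - real + gradient_penalty(critic, real_batch, fake_batch, lam, rng)
```

For example, a linear critic `lambda x: 0.6 * x[0] + 0.8 * x[1]` has gradient norm 1 everywhere and incurs almost no penalty, while `lambda x: 3 * x[0] + 4 * x[1]` has gradient norm 5 and is penalized by about 10 × (5 − 1)² = 160.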

Experimental environment and data set
The data set for this experiment consists of 10,000 images extracted from the LFW [31] data set: 5000 are used as disguised images and 5000 as secret images, and all are 256 × 256 grayscale images. The Python version is 3.5, the TensorFlow version is 1.10, and the GPU is a 1080.

Structure of generators and discriminators
In this paper, the generator of the WGAN-GP model has 65,536 neurons in the input layer (one per pixel of a 256 × 256 image), 64 neurons in the hidden layer, and 65,536 neurons in the output layer. The ReLU activation function is used in the input layer and the hidden layer, while the sigmoid activation function is used in the output layer. The discriminator has 65,536 neurons in the input layer, 64 neurons in the hidden layer, and 1 neuron in the output layer. The ReLU activation function is used in its input layer and hidden layer.
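The layer sizes above can be turned into a small sketch. The code below is an assumption-laden illustration, not the authors' TensorFlow code: it assumes fully connected layers with biases, a single ReLU hidden layer, a sigmoid generator output, and a linear critic output. It counts the parameters for the stated sizes and runs the forward pass on a tiny 4 × 4 example so that it stays fast in pure Python. All names are hypothetical.

```python
import math
import random

def dense_param_count(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def init_dense(n_in, n_out, rng):
    """Small random weights and zero biases (initialization is an assumption)."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

def dense(x, layer):
    weights, biases = layer
    return [b + sum(xi * row[j] for xi, row in zip(x, weights))
            for j, b in enumerate(biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def generator_forward(image, hidden_layer, output_layer):
    """Disguised image in, generated image out (pixel values in (0, 1))."""
    return sigmoid(dense(relu(dense(image, hidden_layer)), output_layer))

def critic_forward(image, hidden_layer, output_layer):
    """Image in, single unbounded score out."""
    return dense(relu(dense(image, hidden_layer)), output_layer)[0]

# Parameter counts for the sizes stated in the text (256 * 256 = 65,536 pixels).
PIXELS, HIDDEN = 256 * 256, 64
generator_params = dense_param_count(PIXELS, HIDDEN) + dense_param_count(HIDDEN, PIXELS)
critic_params = dense_param_count(PIXELS, HIDDEN) + dense_param_count(HIDDEN, 1)

# Tiny forward pass on a 4 x 4 "image" with 4 hidden units.
rng = random.Random(0)
g_hidden, g_out = init_dense(16, 4, rng), init_dense(4, 16, rng)
c_hidden, c_out = init_dense(16, 4, rng), init_dense(4, 1, rng)
disguised = [rng.random() for _ in range(16)]
generated = generator_forward(disguised, g_hidden, g_out)
score = critic_forward(generated, c_hidden, c_out)
```

Under these assumptions the generator has 8,454,208 parameters and the critic 4,194,433; the 64-unit hidden layer is a narrow bottleneck compared with the 65,536-pixel input and output.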

Information hiding and extraction process
The model is trained with disguised images and secret images. When the model is stable, the image generated by the generator is visually the same as the secret image. The receiver receives the disguised image and feeds it to the generator to generate an image that looks the same as the original secret image, thereby obtaining the secret image. Transmitting the disguised image thus achieves the same effect as transmitting the secret image. The overall process of the experiment is shown in Fig. 4.

Results and discussion
As the number of training iterations increases, the image generated by the generator gets closer and closer to the secret image. As can be seen from the example in Fig. 5, after 1000 training iterations the generated image is a noise image. After 5000-10,000 iterations, the approximate image content becomes visible. After 50,000 iterations, the generated image is visually identical to the secret image and can replace it. In Fig. 5, the disguised image is in the first column, the original secret image is in the sixth column, and the images generated after different numbers of training iterations are in the second to fifth columns. When the model is stable, it is difficult to distinguish the generated image from the original secret image. In addition to the visual comparison between the generated images and the secret images, 1000 images were randomly selected from the LFW data set. A few examples are taken to analyze the generated images, and the disguised images are shown in Fig. 7.
To demonstrate the practicality and generalization of this method, 1000 images were randomly selected from the CelebA [32] and ImageNet [33] data sets for experimental verification, and the images generated after the model stabilized were analyzed by histogram. In Figs. 8 and 9, the disguised images are in the first column, the secret images are in the second column, the generated images are in the third column, the histograms of the secret images are in the fourth column, and the histograms of the generated images are in the fifth column.
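The histogram analysis can be made quantitative with a simple similarity score. The sketch below is an illustration under stated assumptions: images are flat lists of 8-bit grayscale values, and histogram intersection is used as the similarity measure. The paper compares the histograms visually, so this metric and the function names are additions for illustration, not the authors' procedure.

```python
def gray_histogram(pixels, levels=256):
    """Count how many pixels take each gray level 0 .. levels-1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist

def histogram_intersection(h1, h2):
    """Overlap of two histograms in [0, 1]; 1.0 means identical histograms."""
    total = sum(h1)
    if total == 0 or total != sum(h2):
        raise ValueError("histograms must describe images of the same nonzero size")
    return sum(min(a, b) for a, b in zip(h1, h2)) / total

secret = [0, 0, 1, 1]        # toy 2 x 2 "secret image" (assumed values)
generated = [0, 1, 1, 1]     # toy "generated image" that differs in one pixel
overlap = histogram_intersection(gray_histogram(secret, levels=2),
                                 gray_histogram(generated, levels=2))
```

A score close to 1.0 for a secret image and its generated counterpart would support the visual impression that the two histograms match; note that equal histograms do not by themselves prove that the images are identical pixel by pixel.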
The generator is obtained by training the model with disguised images and secret images; it is saved once training is stable, and a mapping is constructed between each trained generator and its corresponding disguised images. To demonstrate the security of this scheme, we use disguised images and trained generators to recover secret images. As can be seen from Fig. 10, only a disguised image combined with its corresponding generator yields an image visually the same as the secret image. Otherwise, only a noise image is obtained, which shows that the method is secure.
Information hiding capacity is one of the key indicators of an information hiding system. This method realizes the secure transmission of secret images by transmitting disguised images without any modification. The receiver obtains an image visually the same as the secret image by feeding the received disguised image to the trained generator. The method therefore increases the information hiding capacity. The definition of information hiding capacity is given in Eq. 6, and a simple comparison of hiding capacity with several common information hiding methods is shown in Table 1.

$$\text{Relative capacity} = \frac{\text{Absolute capacity}}{\text{The size of the image}} \tag{6}$$
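Eq. 6 can be illustrated with a short calculation. The units are an assumption here: absolute capacity is counted in bytes and the size of the image in pixels, so relative capacity comes out in bytes per pixel. The two cases below are hypothetical worked examples and are not the entries of Table 1.

```python
def relative_capacity(absolute_capacity_bytes, image_width, image_height):
    """Eq. (6): absolute capacity divided by the size of the image in pixels."""
    pixels = image_width * image_height
    if pixels <= 0:
        raise ValueError("image must contain at least one pixel")
    return absolute_capacity_bytes / pixels

# Hypothetical case 1: LSB hiding with 1 bit per pixel in a 256 x 256 carrier.
lsb_bytes = 256 * 256 * 1 // 8
lsb_relative = relative_capacity(lsb_bytes, 256, 256)

# Hypothetical case 2: a whole 256 x 256 8-bit grayscale secret image is
# recovered from one 256 x 256 disguised image (1 byte per secret pixel).
secret_bytes = 256 * 256
whole_image_relative = relative_capacity(secret_bytes, 256, 256)
```

Under these assumptions, 1-bit LSB hiding gives 0.125 bytes per pixel, while recovering a full same-size grayscale image corresponds to 1 byte per pixel.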

Conclusions
In this paper, a WGAN-GP model was constructed. The model is trained using disguised images and secret images so that a transmitted disguised image, once passed to the generator, yields an image that is visually the same as the secret image. What is transmitted is a disguised image without any modification, which is unlikely to arouse the suspicion of an attacker. This method not only solves