JPEG image steganography payload location based on optimal estimation of cover co-frequency sub-image

Accurate cover estimation is very important for payload location in JPEG image steganography, but it is still hard to estimate the quantized DCT coefficients of a cover JPEG image exactly. Therefore, this paper proposes a JPEG image steganography payload location method based on optimal estimation of cover co-frequency sub-images, which estimates the cover JPEG image based on a Markov model of the co-frequency sub-images. The proposed method gathers the coefficients at the same position of every 8 × 8 block in the JPEG image to obtain 64 co-frequency sub-images, and then uses the maximum a posteriori (MAP) probability algorithm to find the optimal estimates of the cover co-frequency sub-images under the Markov model. Then, the residual of each DCT coefficient is obtained by computing the absolute difference between it and its estimated cover version, and the average residual over the coefficients at the same position of multiple stego images embedded along the same path is used to decide whether that position is a stego position. The experimental results show that the proposed payload location method can significantly improve the accuracy of locating stego positions in low frequencies.

scheme of stego positions is known, if the investigator can locate the steganography payload with an accuracy higher than random guessing, he (or she) can extract the hidden information by a collision attack.
Although Quach [17] has proved the locatability of modified pixels in a single stego image, the actual payload location algorithms designed for a single stego image can only locate the steganography payload with low accuracy, because it is very difficult to precisely estimate the cover of the given stego image and about half of the stego elements remain unchanged [18]. However, for convenience of communication, many participants use the same key over a certain period of time and limit the embedding ratio. If they then embed a large amount of data into multiple images of the same size, the investigator may possess a number of stego images that each contain payload at the same locations. Under such a scenario, in 2008, Ker [19] first proposed a payload location algorithm based on weighted stego-image (WS) residuals for least significant bit (LSB) replacement. Since then, many payload location algorithms have been proposed for spatial image steganography under this condition. Chiew and Pieprzyk [20] modified Ker's algorithm to locate the payload of binary image replacement steganography under the same condition. Ker and Lubenko [21] proposed a payload location algorithm for LSB matching, which filters the horizontal, vertical, and diagonal wavelet subbands of the stego images with a Wiener filter and locates the stego pixel positions according to the absolute sum of the wavelet residuals at the same positions of multiple images carrying messages at the same positions. Quach [22,23] proposed several payload location algorithms for LSB replacement and LSB matching, which employ the Viterbi decoding algorithm or the Quadratic Pseudo-Boolean Optimization (QPBO) algorithm to find the optimal estimate of the cover image, and compute the residuals between the estimated cover images and the stego images to locate the payload. Gui et al. [24] proposed a payload location algorithm for LSB matching steganography by fusing the mean of the 4-neighborhood pixels and 8 residuals computed along 8 different directions by the algorithm proposed by Quach [22]. Liu et al. [25] proposed a payload location algorithm for messages embedded by LSB replacement or LSB matching into spatial images that were previously JPEG compressed, which estimates the cover images by JPEG recompressing the stego images and decompressing the recompressed versions. Yang et al. [15] proved the properties of the optimal stego subset of multiple least significant bits (MLSB) steganography, and then proposed a payload location algorithm and a stego key recovery algorithm based on the optimal stego subset. Sun et al. [26] proposed a payload location algorithm based on a tailored deep neural network (DNN) equipped with an improved feature named the "mean square of adjacency pixel difference." The above algorithms can locate the payload of LSB replacement, LSB matching, and MLSB replacement steganography with high accuracy, and can even be used to estimate groups in group parity steganography or to extract the hidden message in some special cases. However, they do not work for steganography algorithms that use JPEG images as covers.
For messages embedded into JPEG images, the authors recently proposed a payload location method based on co-frequency sub-image filtering for a category of pseudo-random scrambled JPEG image steganography [27]. The accuracy of this payload location method is influenced by the fidelity of the estimated cover images and can be improved if a more precise estimator can be designed.
Motivated by the optimal cover estimation method proposed by Quach in [22] for spatial image steganography, this paper proposes a payload location method for JPEG image steganography based on the optimal estimation of cover co-frequency sub-images. Instead of directly applying the maximum a posteriori (MAP) probability algorithm to the given stego spatial image to estimate the cover spatial image as in [22], the proposed method divides the stego JPEG image into 64 co-frequency sub-images, then applies the MAP algorithm to estimate the optimal cover co-frequency sub-images, and combines them to obtain the optimal cover JPEG image. This makes use of the correlation between the coefficients at the same position of adjacent 8 × 8 blocks.
The structure of this paper is as follows: Section 2 briefly introduces the random JPEG image steganography targeted in this paper. Section 3 proposes the payload location method based on the optimal estimation of cover co-frequency sub-image. Section 4 gives a specific payload location algorithm for F5 steganography. Section 5 presents the experimental results and the discussions. Finally, the paper is summarized in Section 6.

Related work-Pseudo-random JPEG image steganography
In order to improve the security of JPEG image steganography, the steganographer often embeds secret messages into quantized DCT coefficients that have been scrambled pseudo-randomly. Because JPEG images contain many quantized DCT coefficients with value 0, embedding messages into these coefficients would leave suspicious artifacts that a steganalyzer can detect. Thus, many JPEG image steganography methods neither embed message bits into zero coefficients nor accept an embedding that changes a coefficient value to 0. These JPEG image steganography methods can be described as follows.
Input: a cover JPEG image $C = c_1 c_2 \ldots c_N$, a secret message bit sequence $M = m_1 m_2 \ldots m_L$, and a stego key $K$.
Output: a stego JPEG image.
Steps:
1. Scramble the quantized DCT coefficients in the cover JPEG image $C$ according to the stego key $K$ to generate the scrambled coefficient sequence.
2.5. Embed the $i$th message bit into the $j$th coefficient $c'_j$.
2.6. If the embedding changes the value of coefficient $c'_j$ to a value that cannot carry a message (for example, F5 steganography changes the coefficient value 1 to 0), advance the index of the scrambled coefficient, viz. $j = j + 1$. If $j > N$, return 0; otherwise go to step 2.3.
2.7. Advance the index of the secret message bit, viz. $i = i + 1$. If $i > L$, go to step 3.
2.8. Advance the index of the scrambled coefficient, viz. $j = j + 1$. If $j > N$, return 0; otherwise go to step 2.2.
3. Inverse scramble the coefficient sequence after embedding according to the stego key $K$.
4. Encode the obtained coefficient sequence to a stego JPEG image, and return the generated stego JPEG image.
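The embedding loop above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not taken from the steps themselves: the stego key seeds NumPy's random generator to produce the path, the F5 bit convention (positive odd and negative even carry bit 1) is used as the embedding rule, DC coefficients are assumed to have been excluded by the caller, and failure is signalled with `None` rather than 0. The names `f5_bit`, `embed_along_path`, and `extract_along_path` are hypothetical. Visiting coefficients in permuted order is used in place of explicit scrambling and inverse scrambling.

```python
import numpy as np

def f5_bit(c: int) -> int:
    """Bit carried by a non-zero coefficient: positive odd / negative even -> 1."""
    return (c % 2) if c > 0 else 1 - (c % 2)

def embed_along_path(cover, bits, key):
    """Embed `bits` into a 1-D array of quantized coefficients along a keyed path.

    Returns the stego array, or None if the coefficients run out first.
    """
    stego = np.array(cover, dtype=np.int64)
    path = np.random.default_rng(key).permutation(stego.size)  # keyed scrambling
    i = 0                                   # index of the next message bit
    for pos in path:
        if i == len(bits):
            break
        c = int(stego[pos])
        if c == 0:
            continue                        # zero coefficients carry no message
        if f5_bit(c) != bits[i]:
            c -= 1 if c > 0 else -1         # decrement the absolute value
            stego[pos] = c
        if c == 0:
            continue                        # shrinkage: re-embed the same bit
        i += 1
    return stego if i == len(bits) else None

def extract_along_path(stego, n_bits, key):
    """Read the first n_bits message bits back along the same keyed path."""
    path = np.random.default_rng(key).permutation(len(stego))
    return [f5_bit(int(stego[p])) for p in path if stego[p] != 0][:n_bits]
```

Because every stego image produced with the same key visits the coefficients in the same order, the first positions on the path are the ones that carry payload in all of them, which is the property the payload location method relies on.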
3 Methods-Payload location based on optimal estimation of cover co-frequency sub-image

Principle
When the secret messages are embedded into the pseudo-randomly scrambled coefficients as described in Section 2, and the investigator possesses $T$ stego images $S_1, S_2, \ldots, S_T$ embedded along the same embedding path, one of the following two cases applies to the coefficients $S_1(i, j), S_2(i, j), \ldots, S_T(i, j)$ at the same position $(i, j)$ of the $T$ stego images:
1) If the position $(i, j)$ is a stego position, the steganographer decides whether to embed a message bit into the coefficient at this position according to whether the coefficient is available. Thus, each of $S_1(i, j), S_2(i, j), \ldots, S_T(i, j)$ is either an unavailable coefficient or a stego coefficient containing a message bit.
2) If the position $(i, j)$ is a non-stego position, the steganographer does not embed a message bit into the coefficient at this position regardless of whether the coefficient is available. Thus, none of $S_1(i, j), S_2(i, j), \ldots, S_T(i, j)$ contains a message bit.
Let $C_1, C_2, \ldots, C_T$ denote the corresponding cover images of the stego images $S_1, S_2, \ldots, S_T$. The residual $r_t(i, j)$ of the coefficient at position $(i, j)$ of the $t$th stego image is defined as

$$r_t(i, j) = \left| S_t(i, j) - C_t(i, j) \right|. \tag{1}$$

Let $\bar{r}(i, j)$ denote the mean of all $r_t(i, j)$ over the $T$ stego images at position $(i, j)$.
If the position $(i, j)$ is a non-stego position, $\bar{r}(i, j)$ must equal 0, viz. $\bar{r}(i, j) = 0$. If the position $(i, j)$ is a stego position, $\bar{r}(i, j)$ must be larger than or equal to 0, viz. $\bar{r}(i, j) \geq 0$, where equality holds only when none of the coefficients $C_1(i, j), C_2(i, j), \ldots, C_T(i, j)$ is modified. When one possesses enough stego images, the probability that none of the coefficients $C_1(i, j), C_2(i, j), \ldots, C_T(i, j)$ is modified is small. Thus, the investigator should be able to distinguish the stego positions from the non-stego positions according to the means of the residuals if he can obtain the cover images.
However, the investigator usually does not know the cover JPEG images. In this case, if the investigator can estimate the cover images, denoted by $\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_T$, he can compute the mean of the estimated residuals at the same position $(i, j)$ of the different stego images as follows:

$$\hat{r}(i, j) = \frac{1}{T} \sum_{t=1}^{T} \left| S_t(i, j) - \hat{C}_t(i, j) \right|. \tag{2}$$

If the investigator possesses enough stego images embedded along the same path and can estimate their covers accurately enough, he may also be able to distinguish the stego positions from the non-stego positions with a success rate higher than a random guess, based on the averaged estimated residuals, as follows:

$$f(i, j) = \begin{cases} 1, & \hat{r}(i, j) \geq Thr, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

where $f(i, j) = 1$ denotes that the position $(i, j)$ is determined to be a stego position, $f(i, j) = 0$ denotes that the position $(i, j)$ is determined to be a non-stego position, and $Thr$ is a decision threshold.
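The averaging and thresholding step can be illustrated with a short Python sketch. It assumes that the stego images and the estimated covers are available as integer arrays of shape (T, H, W) and that a position is flagged when the mean residual is at least the threshold; the function name `locate_payload` and the array layout are illustrative choices, not part of the paper.

```python
import numpy as np

def locate_payload(stegos: np.ndarray, cover_estimates: np.ndarray, thr: float) -> np.ndarray:
    """Flag stego positions from T stego images and their estimated covers.

    stegos, cover_estimates: integer arrays of shape (T, H, W).
    Returns an (H, W) array with 1 for positions judged to be stego positions.
    """
    # Per-image absolute residual between each stego coefficient and its cover estimate.
    residuals = np.abs(stegos.astype(np.int64) - cover_estimates.astype(np.int64))
    # Mean residual over the T images at every position.
    mean_residual = residuals.mean(axis=0)
    # Threshold decision (the >= comparison is an assumption of this sketch).
    return (mean_residual >= thr).astype(np.uint8)
```

With few images the mean residual at a stego position can still be zero by chance, which is why the decision becomes more reliable as T grows.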
Certainly, the more accurately the cover JPEG images are estimated, the higher the accuracy of payload location is. Therefore, in the following subsection of this section, a method is proposed to estimate the optimal cover co-frequency sub-images, then combine them to estimate the cover JPEG image.

Optimal cover JPEG image estimation
In [22], Quach et al. considered the strong correlation between neighboring pixels of a spatial image and used the maximum a posteriori (MAP) probability algorithm to estimate the optimal cover image corresponding to a stego image of LSB replacement or LSB matching steganography, which was then used to locate the hidden information. In JPEG compression, the DCT transformation of pixel values greatly reduces the correlation between adjacent coefficients. In addition, to improve the efficiency of JPEG compression, the DCT transformation is performed on each non-overlapping 8 × 8 pixel block. Since the coefficients at the same position represent the magnitude of energy at the same frequency, and adjacent blocks in an image are still strongly similar, the coefficients at the same position of adjacent blocks still have a strong correlation. Based on this property, this section uses the same method as in [27] to divide the given JPEG images into 64 co-frequency sub-images, then uses the maximum a posteriori probability algorithm to estimate the optimal cover co-frequency sub-images, and combines them to get the optimal estimate of the cover JPEG image.
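As an illustration of the division into co-frequency sub-images and the recombination, the following Python sketch uses NumPy reshaping. It assumes the quantized DCT coefficients are stored as a single (H, W) array laid out block by block, with H and W multiples of 8; the function names are hypothetical, and the sub-image index runs from 0 to 63 here rather than 1 to 64.

```python
import numpy as np

def split_cofrequency(coeffs: np.ndarray) -> np.ndarray:
    """Split an (H, W) array of quantized DCT coefficients into 64 sub-images.

    Returns an array of shape (64, H // 8, W // 8); sub-image d holds the
    coefficient at in-block position (d // 8, d % 8) of every 8 x 8 block.
    """
    h, w = coeffs.shape
    assert h % 8 == 0 and w % 8 == 0, "dimensions must be multiples of 8"
    blocks = coeffs.reshape(h // 8, 8, w // 8, 8)   # axes: block row, u, block col, v
    return blocks.transpose(1, 3, 0, 2).reshape(64, h // 8, w // 8)

def merge_cofrequency(subs: np.ndarray) -> np.ndarray:
    """Inverse of split_cofrequency: put every coefficient back in its block."""
    _, bh, bw = subs.shape
    blocks = subs.reshape(8, 8, bh, bw).transpose(2, 0, 3, 1)  # block row, u, block col, v
    return blocks.reshape(bh * 8, bw * 8)
```

For a 512 × 512 image this yields 64 sub-images of 64 × 64 coefficients each, and `merge_cofrequency(split_cofrequency(x))` returns the original array.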

Markov model of co-frequency sub-image
Let $S_t^d$ and $C_t^d$ denote the co-frequency sub-images composed of the $d$th quantized DCT coefficients in all 8 × 8 blocks of the $t$th stego image and its cover image, $d = 1, 2, \ldots, 64$. In a statistical sense, the optimal estimate of the cover co-frequency sub-image corresponding to $S_t^d$ should be the estimate $\hat{C}_t^d$ with the maximum posterior probability:

$$\hat{C}_t^d = \arg\max_{C_t^d} P\left(C_t^d \mid S_t^d\right). \tag{4}$$

Then, the optimal cover co-frequency sub-image estimation is transformed into a maximum a posteriori probability estimation problem.
Similar to [22], the following two assumptions are made:

$$P\left(S_t^d \mid C_t^d\right) = \prod_{n} P(s_n \mid c_n), \tag{5}$$

$$P(c_n \mid c_1, c_2, \ldots, c_{n-1}) = P(c_n \mid c_{n-k}, \ldots, c_{n-1}), \tag{6}$$

where $s_n$ and $c_n$ denote the $n$th elements of $S_t^d$ and $C_t^d$ in scan order, and $k$ is a given positive integer. Eq. (5) indicates that each quantized DCT coefficient in the stego co-frequency sub-images is only related to the corresponding quantized DCT coefficient in the cover co-frequency sub-images, while Eq. (6) indicates that the cover co-frequency sub-image $C_t^d$ is modeled with a $k$-order Markov model. For a given steganography algorithm, one can calculate the probabilities that a quantized DCT coefficient value changes to each possible value under a specific embedding rate α, viz. the transition probability in assumption (5). Besides, the prior probability in (6) can be computed from a large number of cover images.
After dividing all quantized DCT coefficients into 64 co-frequency sub-images, each sub-image is scanned by four modes as shown in Fig. 1 to calculate the co-occurrence matrices of the adjacent elements.
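A first-order model of this kind can be estimated by counting adjacent pairs, as in the Python sketch below. It covers a single scan mode only (left to right along each row, which is an assumption, since the four modes of Fig. 1 are not reproduced here), clips coefficient values to a fixed range to keep the table small, and uses the hypothetical name `transition_probabilities`. Passing sub-images from several block positions at once gives a model merged over those positions.

```python
import numpy as np

def transition_probabilities(sub_images, clip=8):
    """Estimate p(next | previous) from horizontally adjacent coefficient pairs.

    sub_images: iterable of 2-D integer arrays (cover co-frequency sub-images).
    Values are clipped to [-clip, clip]; entry [a, b] of the result estimates
    p(next = b - clip | previous = a - clip). Rows without samples stay zero.
    """
    n = 2 * clip + 1
    counts = np.zeros((n, n), dtype=np.float64)
    for sub in sub_images:
        shifted = np.clip(np.asarray(sub), -clip, clip) + clip   # map values to 0..2*clip
        prev, nxt = shifted[:, :-1].ravel(), shifted[:, 1:].ravel()
        np.add.at(counts, (prev, nxt), 1)                        # co-occurrence counts
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

The unnormalized `counts` array corresponds to the co-occurrence matrix of adjacent elements; normalizing each row turns it into the conditional probabilities needed by the Markov model.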
In a JPEG image, the distributions of coefficient values in different co-frequency sub-images show obvious differences. As shown in Fig. 2, the absolute values of the coefficients in the low frequencies (corresponding to the upper-left positions) are usually larger and are the least likely to be zero, while most of the coefficients in the high frequencies (corresponding to the lower-right positions) are equal to zero. Figure 3 presents the frequencies of zero coefficients in the different sub-images, where 10,000 images with a size of 512 × 512 from BOSSbase 1.01 (http://agents.fel.cvut.cz/stegodata/) are JPEG compressed with a quality factor of 75. The abscissa is the index of the position in the 8 × 8 block, counted from left to right and top to bottom. It can be seen that the relative frequencies of zero coefficients in the sub-images corresponding to the lower-right positions are close to 1.
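The per-position zero frequency plotted in Fig. 3 can be computed along the following lines. This is only a sketch for one image, assuming the quantized coefficients are available as an (H, W) array with H and W multiples of 8; the name `zero_frequency` is hypothetical, and averaging the result over a whole image set would give the kind of curve the figure describes.

```python
import numpy as np

def zero_frequency(coeffs: np.ndarray) -> np.ndarray:
    """Relative frequency of zero coefficients at each of the 64 block positions.

    coeffs: (H, W) array of quantized DCT coefficients, H and W multiples of 8.
    Returns an 8 x 8 array; entry (u, v) is the fraction of blocks whose
    coefficient at in-block position (u, v) equals zero.
    """
    h, w = coeffs.shape
    blocks = coeffs.reshape(h // 8, 8, w // 8, 8)   # axes: block row, u, block col, v
    return (blocks == 0).mean(axis=(0, 2))
```

Flattening the 8 × 8 result row by row gives the left-to-right, top-to-bottom position index used on the abscissa.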

Optimal cover JPEG image estimation based on first-order Markov model
In theory, we should compute the probabilities of all possible covers and search for the cover that satisfies Eq. (4). But there are too many possible coefficient values in the cover image to search the whole space. Fortunately, the co-frequency sub-image can be modeled by a hidden Markov model, and the Viterbi algorithm is a common method for decoding hidden Markov models. It has been used in cover image estimation for spatial steganography such as LSB replacement and LSB matching in [22]. Therefore, the Viterbi algorithm is also adopted here to search for the optimal cover co-frequency sub-image. The Viterbi algorithm first computes the scores of the possible values of the first cover element as follows:

$$\mathrm{score}(c_{1,i}) = p(c_{1,i})\, p(s_{1,i} \mid c_{1,i}). \tag{7}$$

Then, the scores of the possible values of the subsequent cover elements are computed as follows:

$$\mathrm{score}(c_{k,i}) = \max_{c_{k-1,i}} \left[ \mathrm{score}(c_{k-1,i})\, p(c_{k,i} \mid c_{k-1,i}) \right] p(s_{k,i} \mid c_{k,i}), \tag{8}$$

where $c_{k,i}$ is a possible value of the $k$th cover element in the $i$th image and $s_{k,i}$ is the corresponding stego element. Take as an example a stego co-frequency sub-image with four quantized DCT coefficients $S = (2, 0, -1, 1)$ produced by typical F5 steganography, where the embedding ratio is 0.5.
According to the embedding rule of F5 steganography, the possible values of the four cover coefficients are $c_1 \in \{2, 3\}$, $c_2 \in \{-1, 0, 1\}$, $c_3 \in \{-1, -2\}$, and $c_4 \in \{1, 2\}$. Figure 4 shows the trellis for the Viterbi algorithm, which takes the possible values of the four cover coefficients as nodes. The Viterbi algorithm first computes the scores of the nodes in the first column of the trellis, where the value of $p(c_1)$ can be obtained from the statistics of a large number of cover JPEG images. For ease of understanding, it is assumed that the values of $p(c_1)$ are as shown in the second column of Table 1. When the embedding ratio of F5 steganography is $q$, the coefficient value transition probability of F5 steganography is given by Eq. (9). Then the scores of the subsequent nodes are computed in sequence by Eq. (8), and each node is connected with the previous node that maximizes its score. The values of $p(c_k \mid c_{k-1})$ can also be obtained from the statistics of a large number of cover JPEG images. It is assumed that the values of $p(c_k \mid c_{k-1})$ are as shown in the last column of Table 1.

Fig. 4 The trellis for Viterbi algorithm based on the first-order cover probability model

Table 1 Example of the first-order cover probability model

Finally, the coefficient values on the path ending at the node with the largest score in the last column are taken as the optimal estimate of the cover coefficients, as shown by the gray node in Fig. 4. It can be seen that when the embedding ratio is 0.5, the optimal estimate of the cover coefficient sequence of $S = (2, 0, -1, 1)$ is $\hat{c} = (3, -1, -2, 2)$.
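The trellis search can be sketched in Python as below. Several parts are assumptions rather than the paper's definitions: the likelihood `f5_likelihood` is a guess at the F5 transition probability (a non-zero coefficient is left unchanged with probability 1 − q/2 and has its absolute value decremented with probability q/2, and a zero stays zero), the prior and the first-order transition model are supplied by the caller as functions returning positive probabilities, scores are accumulated in the log domain, and all function names are hypothetical. Since the values of Table 1 are not available here, the sketch is not expected to reproduce the worked example's numbers.

```python
import math

def f5_candidates(s):
    """Cover values that F5 embedding can turn into the stego value s."""
    if s == 0:
        return [-1, 0, 1]
    return [s, s + (1 if s > 0 else -1)]

def f5_likelihood(s, c, q):
    """Assumed p(s | c) for F5 without matrix encoding, 0 < q <= 1."""
    if c == 0:
        return 1.0 if s == 0 else 0.0
    if s == c:
        return 1.0 - q / 2
    if s == c - (1 if c > 0 else -1):
        return q / 2
    return 0.0

def viterbi_cover(stego, prior, trans, q):
    """Most probable cover sequence for a 1-D stego sequence.

    prior(c) approximates p(c_1); trans(c, prev) approximates p(c_k | c_{k-1}).
    """
    cands = [f5_candidates(s) for s in stego]
    # Scores of the first column of the trellis.
    score = {c: math.log(prior(c)) + math.log(f5_likelihood(stego[0], c, q))
             for c in cands[0]}
    back = []
    for k in range(1, len(stego)):
        new_score, pointers = {}, {}
        for c in cands[k]:
            # Connect each node to the previous node that maximizes its score.
            best_prev = max(score, key=lambda p: score[p] + math.log(trans(c, p)))
            new_score[c] = (score[best_prev] + math.log(trans(c, best_prev))
                            + math.log(f5_likelihood(stego[k], c, q)))
            pointers[c] = best_prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best node in the last column.
    c = max(score, key=score.get)
    path = [c]
    for pointers in reversed(back):
        c = pointers[c]
        path.append(c)
    return path[::-1]
```

With a flat prior and flat transitions the search simply returns the most likely unmodified values; a cover model estimated from real images is what can pull the estimate toward modified values such as 3 for a stego value of 2.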
After the optimal estimate of each cover co-frequency sub-image is obtained by the Viterbi algorithm, the coefficients of all estimated cover co-frequency sub-images are placed back at their original positions to assemble the optimal estimate of the cover JPEG image. The whole process is shown in Fig. 5 and described in Algorithm 1.
In theory, each cover co-frequency sub-image may be estimated more precisely by the first-order Markov model of the corresponding frequency. However, at many frequencies a large number of coefficients have the value 0, so the statistics of the non-zero coefficients are not reliable. Thus, in what follows, a first-order Markov model merged over different positions is used to estimate the cover co-frequency sub-images.

Payload location algorithm for F5 steganography without Matrix Encoding
The F5 steganography algorithm improves F4 by using shuffling. In F5 steganography, positive odd and negative even coefficients represent the bit 1, while positive even and negative odd coefficients represent the bit 0, and the DCT coefficients with value 0 and the DC coefficients do not carry secret information. The coefficient value transition probability of F5 steganography is given by Eq. (9). When T stego JPEG images of F5 steganography are given, we can adopt existing quantitative steganalysis algorithms to estimate the embedding ratios and then use Algorithm 1 proposed in Section 3 to estimate the corresponding cover JPEG images. Each given stego JPEG image can be scanned in the 4 different modes shown in Fig. 1, so that 4 estimated cover JPEG images are obtained by Algorithm 1.
After that, the residuals between the given stego image and the estimated cover JPEG images are computed by Eq. (10), which is slightly different from the previous residual calculation in Eq. (1). For each position, 4T residuals can be computed from the given T stego JPEG images and the 4T estimated cover JPEG images by Eq. (10), and then averaged. The averaged value is used to determine whether this position is a stego position. The detailed steps of the payload location for F5 steganography are given in Algorithm 2.

Experimental setup
In total, 10,000 PGM images with a size of 512 × 512 were downloaded from BOSSbase 1.01 and converted to cover JPEG images with a quality factor of 75. Nine thousand images were randomly selected from the generated cover JPEG images to estimate the first-order Markov model of the cover co-frequency sub-images. The remaining 1000 images were used to test the performance of the proposed algorithm. A pseudo-random path was generated by scrambling the integer sequence 1, 2, …, 512 × 512. Then, along the generated path, pseudo-random message bits were embedded into the remaining 1000 images by F5 steganography (without matrix encoding) with ratio q = 0.5.

Markov model selection
From Algorithms 1 and 2, it can be seen that the payload location accuracy is strongly affected by the adopted first-order Markov model. In Section 3, we suggested merging the Markov models over different frequencies to estimate the cover co-frequency sub-images more precisely. Thus, we searched for a suitable set of Markov models to merge.
Firstly, the 64 Markov models $m_1, \ldots, m_{64}$ estimated from the sub-images corresponding to the 64 positions in the 8 × 8 block were each applied separately to estimate the cover JPEG images, and the Markov model $m_i$ with the highest payload location accuracy was selected. Then, each of the remaining 63 models was merged with $m_i$ to obtain 63 new merged models $m_{i1}, \ldots, m_{i63}$, and the merged Markov model $m_{ij}$ with the highest payload location accuracy was selected. This operation was repeated until all models were merged. The merged model with the highest payload location accuracy was selected as the final model.
One thousand test stego JPEG images with embedding ratio 0.5 were used to select the proper merged Markov model. Table 2 presents the location correctness for each co-frequency sub-image with its own Markov model, namely, the 64 co-frequency sub-image models are used for the corresponding sub-images respectively. Table 3 shows the results when the optimal merged Markov model was used.
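The selection procedure is a greedy forward search, which the following Python sketch illustrates. The callable `accuracy` is a placeholder assumed to build the merged model from a tuple of model indices, run the payload location, and return the resulting accuracy; how models are merged and evaluated is left to it, and the name `greedy_merge` is hypothetical.

```python
def greedy_merge(model_ids, accuracy):
    """Greedy forward selection of Markov models to merge.

    model_ids: iterable of model identifiers (e.g. block positions 1..64).
    accuracy: callable taking a tuple of identifiers and returning the payload
              location accuracy obtained with the model merged over them.
    Returns (best_subset, best_accuracy) over all merging stages.
    """
    remaining = set(model_ids)
    current = []
    best_subset, best_acc = None, float("-inf")
    while remaining:
        # Try adding each remaining model and keep the one that helps most.
        cand = max(sorted(remaining), key=lambda m: accuracy(tuple(current + [m])))
        current.append(cand)
        remaining.remove(cand)
        acc = accuracy(tuple(current))
        # Remember the best merged model seen at any stage.
        if acc > best_acc:
            best_subset, best_acc = tuple(current), acc
    return best_subset, best_acc
```

With 64 models this evaluates on the order of 64 × 65 / 2 merged candidates, far fewer than the number of all possible subsets, at the cost of possibly missing the globally best combination.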
In Tables 2 and 3, the correctness for the upper-left position is not shown because the DC coefficients are not changed by F5 steganography. Comparing Table 2 with Table 3, we can see that for most positions, the location accuracy with the optimal merged Markov model is much higher than that with the individual models. In particular, the algorithm with the optimal merged Markov model can correctly distinguish the stego positions in low frequencies with accuracy close to 90%, and in some cases close to 95%. For the high-frequency positions, because there are very few available coefficients, it is still hard to distinguish the stego positions.

Figure 6 shows the payload location accuracy of MAP-F5 with the optimal merged Markov model for different numbers of stego images when the embedding ratio is 0.5. It can be seen that the larger the number of stego images, the higher the accuracy. As the number of images increases, the fluctuation of the residual means becomes smaller, and the residual means are closer to the change caused by information embedding. Therefore, the number of stego images is very important for locating the stego positions.

Figure 7 compares the accuracies of the proposed algorithm and the payload location algorithm based on co-frequency sub-image wavelet filtering (CSW-F5) [27]. The 1000 stego images are generated with the same embedding path and an embedding ratio of 0.5. In the upper-left corner of the 8 × 8 block, where the number of zero coefficients is relatively small, MAP-F5 obtains better results than CSW-F5. In practice, the results of the two payload location algorithms can be further combined.

This paper proposes a payload location method based on optimal estimation of cover co-frequency sub-images. The proposed method divides each given stego JPEG image into 64 co-frequency sub-images, then estimates the optimal cover JPEG image by applying the maximum a posteriori probability algorithm to the co-frequency sub-images, and finally determines the stego positions according to the averaged residuals between the multiple given stego images embedded along the same path and the estimated cover images. The proposed method is applied to payload location for F5 steganography without matrix encoding, and the experimental results show that the proposed algorithm can locate the stego positions with higher accuracy than prior works. However, the proposed payload location method does not work for modern adaptive JPEG image steganography such as J-UNIWARD, UERD, and GUED. Therefore, in the future, we will try to adapt the proposed cover JPEG image estimation method to modern adaptive JPEG steganography. Besides, we will also try to improve the performance by using unsupervised learning to cluster the image blocks with similar contents [28].