Compared with existing reversible data hiding (RDH) methods, the proposed method embeds considerably more data with less distortion. The proposed framework is mainly based on [12] and is divided into six sub-sections: prediction via a multi-directional gradient scheme, the embedding algorithm, embedding selection by non-linear regression analysis and self-block standard deviation statistics, automatic embedding range decision, the extracting and reversing algorithm, and the overflow and underflow problem.
Prediction via multi-directional gradient scheme
The accuracy of the prediction method determines both the embedding capacity of an RDH system and the image quality after embedding. In this paper, we propose a multi-directional gradient prediction method. The original image is divided into four sets, cross, star, circle, and square, which are embedded in four passes. The block diagram of the embedding process is shown in Fig. 1, and the block diagrams of the extracting and reversing process are shown in Fig. 2.
The prediction procedure is described as follows:
- 1.
Assume I is a 5×5 8-bit grayscale original image, where I(i,j) is one pixel of the image, as shown in Fig. 3a. First, all pixels of the image I are divided into four groups, “square,” “cross,” “star,” and “circle,” as shown in Fig. 3b. We define the four groups as G1, G2, G3, and G4, respectively. Because the groups are independent of one another, we can utilize G2, G3, and G4 to predict G1. We discuss only G1 in this section, since G2, G3, and G4 are handled in the same way.
- 2.
Mirror the 5×5 original image I into a 7×7 mirror image MI, as shown in Fig. 3c.
- 3.
G1 is treated as the missing image, and the four neighboring pixels are utilized to predict each central pixel by Eq. (10), where MI(i,j) is the position of the predicted central pixel, as shown in Fig. 3d. Image PI is the prediction result of the 7×7 MI, as shown in Fig. 3e.
$$ {}\begin{aligned} {\text{PI}}(i,j)&={\text{round}}\left(\left({\text{MI}}(i,j-1)+{\text{MI}}(i+1,j)+{\text{MI}}(i,j+1)\right.\right.\\ &\quad\left.\left.+{\text{MI}}(i-1,j)\right)/4\right) \end{aligned} $$
(10)
- 4.
In order to calculate the gradient information of the border pixels, the prediction image PI is mirrored into a 9×9 mirroring prediction image MPI, as shown in Fig. 3f. Afterwards, the multi-directional gradient information is generated through four kinds of Sobel masks, as shown in Fig. 4. The four masks are defined as mx, my, mxy, and myx, where mx is the horizontal mask, my is the vertical mask, mxy is the 45∘ mask, and myx is the 135∘ mask.
We use Eqs. (11)–(14) to calculate the gradient information of the vertical direction Δx, the gradient information of the horizontal direction Δy, the gradient information of the 45∘ direction Δxy, and the gradient information of the 135∘ direction Δyx.
$$ \Delta x = |mx\times {\text{MPI}}| $$
(11)
$$ \Delta y = |my\times {\text{MPI}}| $$
(12)
$$ \Delta xy = |mxy\times {\text{MPI}}| $$
(13)
$$ \Delta yx = |myx\times {\text{MPI}}| $$
(14)
- 5.
In order to generate the estimated image P, we process the missing image MI with the four kinds of gradient information Δx, Δy, Δxy, and Δyx, as indicated in Eqs. (15)–(22). With Eqs. (15)–(22), we generate the eight weights of the eight neighboring positions, x_weight1, x_weight2, y_weight1, y_weight2, xy_weight1, xy_weight2, yx_weight1, and yx_weight2, as shown in Fig. 5.
$$ \begin{aligned} x\_\text{weight1}&=\text{Weight}/\left(\Delta x(i,j)+{\text{Coe}}\times \Delta x(i,j-1)\right.\\ &\quad\left.+\Delta x(i,j-2)+1\right) \end{aligned} $$
(15)
$$ \begin{aligned} x\_\text{weight2}&=\text{Weight}/\left(\Delta x(i,j)+{\text{Coe}}\times \Delta x(i,j+1)\right.\\ &\quad\left.+\Delta x(i,j+2)+1\right) \end{aligned} $$
(16)
$$ \begin{aligned} y\_\text{weight1}&=\text{Weight}/\left(\Delta y(i,j)+{\text{Coe}}\times \Delta y(i-1,j)\right.\\ &\quad\left.+\Delta y(i-2,j)+1\right) \end{aligned} $$
(17)
$$ \begin{aligned} y\_\text{weight2}&=\text{Weight}/\left(\Delta y(i,j)+{\text{Coe}}\times \Delta y(i+1,j)\right.\\ &\quad\left.+\Delta y(i+2,j)+1\right) \end{aligned} $$
(18)
$$ {}\begin{aligned} xy\_\text{weight1}&=\text{Weight}/\left(\Delta xy(i,j)+{\text{Coe}}\!\times\! \Delta xy(i\,-\,1,j\,-\,1)\right.\\ &\quad\left.+\Delta xy(i-2,j-2)+1\right) \end{aligned} $$
(19)
$$ {}\begin{aligned} xy\_\text{weight2}&\,=\,\text{Weight}/\!\left(\!\Delta xy(i,j)+{\text{Coe}}\times \Delta xy(i+1,j+1)\right.\\ &\quad\left.+\Delta xy(i+2,j+2)+1\right) \end{aligned} $$
(20)
$$ {}\begin{aligned} yx\_\text{weight1}&=\text{Weight}/\left(\Delta yx(i,j)+{\text{Coe}}\times \Delta yx(i-1,j+1)\right.\\ &\quad\left.+\Delta yx(i-2,j+2)+1\right) \end{aligned} $$
(21)
$$ {}\begin{aligned} yx\_\text{weight2}&=\text{Weight}/\left(\Delta yx(i,j)+{\text{Coe}}\times \Delta yx(i+1,j-1)\right.\\ &\quad\left.+\Delta yx(i+2,j-2)+1\right) \end{aligned} $$
(22)
where Weight and Coe are two weight parameters. In general, the closer a position is, the more information it provides, as with the vertical and horizontal weights; the farther a position is, the less information it provides, as with the 45∘ and 135∘ weights. Therefore, we use the two parameters Weight and Coe to adjust the weights according to this rule. In this paper, we apply the PSO algorithm [30], a method for solving optimization problems, to estimate the most appropriate weight values; refer to Section 4.3. On the other hand, if the gradient magnitude around a neighboring pixel is large, that pixel contributes less to the prediction of the central pixel; conversely, a small gradient magnitude means it contributes more. Finally, the eight weights of the eight neighboring pixels are used in Eq. (23) to compute the 5×5 estimated image P, where MI(i,j) represents one pixel of the missing image MI.
$$ {}\begin{aligned} P(i,j)&=\left\lfloor \left(x\_{{\text{weight}}1}\times {\text{MI}}(i,j-1)+x\_{{\text{weight}}2}\times {\text{MI}}(i,j+1)\right.\right. \\ &+y\_{{\text{weight}}1}\times {\text{MI}}(i-1,j)+y\_{{\text{weight}}2}\times {\text{MI}}(i+1,j) \\ &+xy\_{{\text{weight}}1}\times {\text{MI}}(i-1,j-1)+xy\_{{\text{weight}}2}\times {\text{MI}}(i+1,j+1) \\ &\left.+yx\_{{\text{weight}}1}\times {\text{MI}}(i-1,j+1)+yx\_{{\text{weight}}2}\times {\text{MI}}(i+1,j-1)\right)\\ &\left./ {\text{weight}}\_{{\text{sum}}}\right\rfloor \\ {\text{where}} \\ {\text{weight}}\_{{\text{sum}}} &= x\_{{\text{weight}}1} + x\_{{\text{weight}}2} + y\_{{\text{weight}}1} + y\_{{\text{weight}}2} \\ &+ xy\_{{\text{weight}}1} + xy\_{{\text{weight}}2} + yx\_{{\text{weight}}1} + yx\_{{\text{weight}}2} \end{aligned} $$
(23)
Then, the 5×5 difference histogram e is generated from the differences between the original image I and the estimated image P by Eq. (24).
$$ e(i,j)=I(i,j)-P(i,j) $$
(24)
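As an illustration of the weighting rule, the following Python sketch computes one directional weight (Eqs. (15)–(22)) and the final weighted prediction (Eq. (23)). The values of WEIGHT and COE and the toy gradient/neighbor numbers are illustrative assumptions, not the PSO-optimized parameters of the paper.

```python
import numpy as np

# Illustrative sketch; WEIGHT and COE are assumed values, not the
# PSO-optimized parameters from Section 4.3.
WEIGHT, COE = 12.0, 2.0

def directional_weight(g0, g1, g2):
    """Eqs. (15)-(22): weight from three gradient magnitudes along one
    direction. The middle sample is emphasized by COE, and the +1
    prevents division by zero in flat regions."""
    return WEIGHT / (g0 + COE * g1 + g2 + 1.0)

def predict(neighbors, weights):
    """Eq. (23): floor of the weighted average of the 8 neighbors."""
    return int(np.floor(np.dot(weights, neighbors) / np.sum(weights)))

# A flat direction (zero gradients) gets a large weight; an edge
# direction (large gradients) gets a small one, so the prediction
# follows the smooth side of the neighborhood.
w_smooth = directional_weight(0.0, 0.0, 0.0)   # 12 / 1  = 12.0
w_edge   = directional_weight(8.0, 4.0, 8.0)   # 12 / 25 = 0.48
```

With four smooth-side neighbors at 100 and four edge-side neighbors at 200, the prediction lands near the smooth side, matching the intuition stated above.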
Embedding algorithm
Assume T1 and T2 are two thresholds, where T1≥0 and T2<0. Before embedding, the thresholds T1 and T2 are decided appropriately as described in Section 3.4. Next, each embedding position is classified as allowing embedding or non-allowing embedding by the method of Section 3.3. If the position allows embedding, the message is embedded by Eq. (26); otherwise, the position is skipped. A pseudo-random binary sequence key generated from the encryption key is used to encrypt the secret message w through an exclusive-or operation, producing the encrypted data h, as shown in Eq. (25).
$$ h=w \oplus {\text{key}} $$
(25)
The procedure of embedding message is described below:
$$ {}e'(i,j)= \left\{\begin{array}{ccc} e(i,j)+T1+1 & if & e(i,j)>T1\ {\text{and}}\ T1\geq0 \\ e(i,j)+T2 & if & e(i,j)<T2\ {\text{and}}\ T2<0 \\ 2e(i,j)+h & if & T2\leq e(i,j)\leq T1 \end{array}\right. $$
(26)
where h∈{0,1} is the current encrypted hidden bit.
After embedding the hidden data, the stego-image S is obtained as
$$ S(i,j)=e'(i,j)+P(i,j) $$
(27)
Likewise, the three sets cross, star, and circle are embedded as above. Finally, the stego-image S and the two threshold values T1 and T2 are output.
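The per-position embedding rule can be sketched as follows, treating e, h, T1, and T2 as scalars; the position selection of Section 3.3 is omitted here, and Eqs. (25) and (27) are reduced to one-liners.

```python
def embed(e, h, T1, T2):
    """Eq. (26): embed bit h in {0, 1} if e lies in [T2, T1], else shift."""
    if e > T1:
        return e + T1 + 1      # right outer region: shift only
    if e < T2:
        return e + T2          # left outer region: shift only
    return 2 * e + h           # inner region: histogram expansion

def encrypt(bits, key_stream):
    """Eq. (25): XOR the secret message with the key sequence."""
    return [b ^ k for b, k in zip(bits, key_stream)]

def stego_pixel(e_prime, p):
    """Eq. (27): add the prediction back to form the stego pixel."""
    return e_prime + p
```

For example, with T1 = 1 and T2 = −1, a difference of 2 is shifted to 4, while a difference of 0 carrying bit 1 becomes 1.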
Embedding selection by non-linear regression analysis and self-block standard deviation statistics
From Section 3.2, we know that the difference histogram e and the two thresholds T1 and T2 are utilized to embed messages. In general, the difference histogram is concentrated at 0, so the 0 bin is embedded first. Next, the current position moves to the left or right for further embedding, as shown in Section 3.4. When embedding messages, not all positions can carry data, but to comply with the reversibility rule of a reversible data hiding method, all positions must be shifted, which reduces the image quality after embedding. Therefore, we design a rule that classifies positions into allowing-embedding and non-allowing-embedding ones, reducing unnecessary shifting and thus improving the image quality after embedding.
Training stage
First, we choose 30 natural images. Each image is passed through stages 1 to 4 of Section 3.1 to generate the mirroring prediction image MPI and to calculate the standard deviation value σi,j of the current position by Eq. (28)
$$ \begin{aligned} \sigma_{i,j}&=\sqrt{\frac{1}{8}\sum_{(p,q)\in\omega}\left[{\text{MPI}}_{p,q}-\overline{{\text{MPI}}}_{i,j}\right]^{2}} \\ {\text{where}} \\ \overline{{\text{MPI}}}_{i,j}&=\frac{1}{8}\sum_{(p,q)\in\omega}{\text{MPI}}_{p,q} \\ \omega &= \left\{(i-1,j-1), (i+1,j+1), (i-1,j+1), (i+1,j-1), \right.\\ & \quad\left.(i-1,j), (i+1,j), (i,j-1), (i,j+1)\right\} \end{aligned} $$
(28)
Next, calculate the difference histogram e by stage 5 of Section 3.1. The two thresholds T1=4 and T2=−4 are chosen, and for each standard deviation value σ we calculate the probability that the current position satisfies T2≤e(i,j)≤T1, i.e., the embedding probability EP(σ), as shown in Eq. (29)
$$\begin{array}{@{}rcl@{}} {\text{EP}}(\sigma) &=& \frac{{\text{EC}}(\sigma)}{{\text{EC}}(\sigma)+{\text{NEC}}(\sigma)} \\ {\text{where}}\ 0\leq \sigma \leq 49 \end{array} $$
(29)
where EC is the embedding capacity and NEC is the non-embedding capacity. Figure 6 shows the histogram of the embedding probability, where the x axis is the standard deviation value σ and the y axis is the embedding probability EP. We find that a lower standard deviation leads to a higher embedding probability: such a position lies in a smooth region and is easy to predict. Otherwise, the position lies in a complex area and is difficult to predict. We set a threshold th; when σ(i,j)≤th, the position is used to embed the message h by Eq. (26), otherwise the position is skipped. During embedding, the embedding range T1=4 and T2=−4 is applied. We also use the 30 natural images to collect embedding statistics for thresholds th = 2, 4, 6, 8, 12, 16, 20, and 24. For each threshold th and embedding range T1=0∼4, T2=0∼−4, we record the relation between the PSNR and the embedding capacity, as shown in Fig. 7, where the label “origination” denotes embedding without the threshold. Figure 7 shows that the best embedding threshold differs with the embedding capacity. It also shows that the image quality after embedding is clearly improved when the embedding threshold is used. We use the non-linear regression analysis method to fit a quadratic curve function for each embedding threshold th, as shown in Eq. (30), where th = 2, 4, 6, 8, 12, 16, 20, and 24 and x is the embedding capacity.
$$ y({\text{th}},x)=a_{{\text{th}}}+b_{{\text{th}}}x+c_{{\text{th}}}x^{2} $$
(30)
Figure 8 shows the relation between the embedding capacity and the image quality when the threshold th = 12. The solid line is the line graph from the actual statistics, and the dotted line is the quadratic curve obtained by non-linear regression analysis. In this way, we generate eight quadratic curve functions.
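The training-stage computations can be sketched in a few lines: local_sigma implements the 8-neighbor standard deviation of Eq. (28), and fit_quadratic recovers the coefficients of Eq. (30) by least squares. The sample data points used below are hypothetical.

```python
import numpy as np

def local_sigma(MPI, i, j):
    """Eq. (28): standard deviation over the 8 neighbors of (i, j)."""
    window = MPI[i - 1:i + 2, j - 1:j + 2].astype(float)
    nb = np.delete(window.flatten(), 4)   # drop the center pixel
    return float(np.sqrt(np.mean((nb - nb.mean()) ** 2)))

def fit_quadratic(capacity, psnr):
    """Eq. (30): fit y(th, x) = a + b*x + c*x^2 to (capacity, PSNR)
    samples gathered for one threshold th."""
    c, b, a = np.polyfit(capacity, psnr, 2)  # polyfit returns high-to-low
    return a, b, c
```

On exactly quadratic sample data the fitted coefficients reproduce the generating curve, which is how one curve per threshold th is obtained.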
Testing stage
Before embedding, if the image is of a type with many edges, the number of small standard deviation values (e.g., σ = 1, 2, and 3) is relatively small. This can cause problems: the embedding amount is limited, or the embedding range is increased too much, which reduces the embedding quality. To avoid this special case, after generating the mirroring prediction image MPI, we count the occurrences of each standard deviation value σi,j in advance, and then set the largest such standard deviation value as the initial value init_th.
Next, employ the capacity of the embedding message x to find the best threshold best_th by Eq. (31).
$$ {\text{best\_{th}}}=\arg\max_{{\text{th}}} y({\text{th}},x) $$
(31)
If best_th<init_th, the best threshold is set to best_th=init_th; if best_th≥init_th, best_th is unchanged. Finally, we utilize the best threshold best_th to make the decision that differentiates embedding regions from non-embedding regions, thereby reducing needless shifting.
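The selection rule can be sketched as follows, assuming the eight fitted curves are given as (a, b, c) coefficient triples keyed by th; the constant curves below are made-up stand-ins.

```python
def best_threshold(curves, x, init_th):
    """Eq. (31) plus the init_th lower bound: pick the th whose fitted
    curve y(th, x) = a + b*x + c*x**2 is highest, then clamp to init_th."""
    best = max(curves, key=lambda th: curves[th][0]
               + curves[th][1] * x + curves[th][2] * x * x)
    return max(best, init_th)

# Hypothetical constant curves reproducing the y values quoted in the
# Fig. 11 example at x = 20,000 bits.
curves = {2: (53.5, 0, 0), 4: (55.3, 0, 0), 6: (53.8, 0, 0),
          8: (53.0, 0, 0), 12: (52.2, 0, 0), 16: (51.8, 0, 0),
          20: (53.6, 0, 0), 24: (53.4, 0, 0)}
```

With these curves the argmax is th = 4; raising init_th above 4 overrides the argmax, as the rule requires.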
Automatic embedding range decision
In Sections 3.1 and 3.3, our method needs to decide the embedding range T1 and T2. Therefore, we propose a method that utilizes the size of the embedding message to generate the best range T1 and T2 automatically and achieve the best embedding quality, as shown in Figs. 9 and 10.
The proposed methodology is described in two stages:
- 1.
Generating an initial embedding range
First, input the embedding message x and the best_th generated by Section 3.3. Next, initialize F_T1=0, F_T2=0, and D=R, and then employ F_T1, F_T2, and th to embed messages by the embedding method of Section 3.2. While the amount that can be embedded with the range F_T1, F_T2 is less than the capacity of the embedded message x, the embedding range expands to the left or right. Conversely, when the amount that can be embedded is greater than or equal to x, F_T1, F_T2, and D are output, where D records whether the next expansion goes left or right. The aim is that, when increasing the range, expansion proceeds in a balanced way to the left and right from the center to achieve the best image quality, as shown in Fig. 9.
- 2.
Generating the best embedding range
First, input the F_T1, F_T2, and D generated in stage 1 and the best_th generated in Section 3.3. We use best_th to differentiate the embedding positions into allowing embedding and non-allowing embedding. Next, the range is again expanded to the left or right from the center in a balanced way to generate another embedding range S_T1 and S_T2. We then compare the image quality obtained with the range F_T1, F_T2 against that obtained with S_T1, S_T2. Sometimes, increasing the embedding range increases the total number of small standard deviation values, which raises the success rate of distinguishing embedded from non-embedded positions and thereby improves the image quality after embedding. However, the same condition may also introduce unnecessary shifting that reduces the image quality after embedding. Therefore, we compare the image quality after embedding under these two configurations and output the embedding range of the better one, as shown in Fig. 10.
Figure 11 shows an example of embedding hidden messages into a difference image. Assume x is 20,000 bits, and we obtain the results of the 8 quadratic curve functions by Eq. (30), as follows:
- 1.
th2:y=53.5,th4:y=55.3,th6:y=53.8,th8:y=53,
- 2.
th12:y=52.2,th16:y=51.8,th20:y=53.6,th24:y=53.4
Among them, the maximum y is at th4, so best_th = 4.
If the standard deviation value of an embedding position satisfies σ≤4, the position is an allowing embedding position; otherwise, it is a non-allowing embedding position. Assume a message h=0101 and T1=1, T2=−1. Since σ(1)≤4, this position is an allowing embedding position. By Eq. (26), a position can carry an embedded bit only when −1≤e≤1; otherwise it cannot carry a bit but still must be shifted. Thus e(1)=2 cannot carry a bit, but we still shift it and calculate e′(1)=4 by Eq. (26). Since σ(2)≤4, it is an allowing embedding position; e(2)=−1 can carry the bit h(1)=0, and we calculate e′(2)=−2 by Eq. (26). Similarly, σ(3)≤4 marks an allowing embedding position; e(3)=0 carries h(2)=1, so e′(3)=1. σ(4) marks an allowing embedding position, but e(4) cannot carry a bit, so e′(4)=5. σ(5)>4, so it is a non-allowing embedding position; e′(5)=4 and no shifting is needed. σ(6)≤4 marks an allowing embedding position; e(6)=1 carries h(3)=0, so e′(6)=2. σ(7) marks a non-allowing embedding position, so e′(7)=0 and no shifting is needed. σ(8) and σ(9) mark allowing embedding positions; e(8) cannot carry a bit, so e′(8)=−3, while e(9) carries h(4)=1, so e′(9)=−1. Finally, the embedded difference image e′ is generated.
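The walk-through above can be reproduced with a short script, assuming T1 = 1, T2 = −1, and best_th = 4 as in the Fig. 11 example; the σ values below are stand-ins chosen only for their relation to best_th.

```python
T1, T2, BEST_TH = 1, -1, 4  # thresholds from the Fig. 11 example

def embed_sequence(e_vals, sigmas, bits):
    """Apply Eq. (26) position by position, skipping positions whose
    local standard deviation exceeds BEST_TH (Section 3.3)."""
    out, k = [], 0
    for e, s in zip(e_vals, sigmas):
        if s > BEST_TH:              # non-allowing: neither embed nor shift
            out.append(e)
        elif e > T1:                 # allowing but not embeddable: shift
            out.append(e + T1 + 1)
        elif e < T2:
            out.append(e + T2)
        else:                        # embeddable: hide the next bit
            out.append(2 * e + bits[k])
            k += 1
    return out

e     = [2, -1, 0, 3, 4, 1, 0, -2, -1]
sigma = [1, 2, 3, 4, 9, 2, 7, 1, 3]   # values > 4 mark positions 5 and 7
h     = [0, 1, 0, 1]
```

Running embed_sequence(e, sigma, h) yields the e′ sequence of the example, with positions 5 and 7 left untouched.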
Extracting and reversing algorithm
The procedure of message extraction and recovery are described below:
- 1.
Divide the 5×5 stego-image S into 4 groups: square, cross, star, and circle. We define the 4 groups as G1, G2, G3, and G4, respectively. We discuss only G4 in this sub-section because G1, G2, and G3 are the same cases.
- 2.
Mirror the stego-image S into a 7×7 mirror image MS.
- 3.
Treat G4 as the missing image, utilize the four neighboring pixels to predict each central pixel, and generate a 5×5 prediction image PS.
- 4.
Mirror the prediction image PS into a 9×9 mirror prediction image MPS.
- 5.
Calculate the weights of the eight neighboring pixels. Then a 5×5 stego-estimated image P′ is generated.
- 6.
A 5×5 difference histogram e′ is generated by Eq. (32).
$$ e'(i,j)=S(i,j)-P'(i,j) $$
(32)
- 7.
Each embedding position is differentiated into allowing embedding or non-allowing embedding by the method of Section 3.3. If the position allows embedding, the hidden bit h is extracted by Eq. (33); otherwise, the position is skipped.
$$ h=e'(i,j)\ {\text{mod}}\ 2 \qquad {\text{if}} \quad 2\times T2 \leq e'(i,j) \leq 2\times T1 + 1 $$
(33)
- 8.
The pseudo-random binary sequence key is utilized to decrypt h through an exclusive-or operation to obtain the original secret message w.
- 9.
If the position allows embedding, the original prediction error e(i,j) is obtained by Eq. (34). If the position does not allow embedding, e(i,j)=e′(i,j).
$$ {}e(i,j)\,=\, \left\{\!\!\begin{array}{ccc} e'(i,j)-T1-1 & {\text{if}} & e'(i,j)>2\times T1+1 \\ e'(i,j)-T2 & {\text{if}} & e'(i,j)<2\times T2 \\ \lfloor e'(i,j)/2 \rfloor & {\text{if}} & 2\times T2 \leq e'(i,j) \leq 2\!\times\! T1 + 1 \end{array}\right. $$
(34)
Recovery of the value of the original image I(i,j) is as follows:
$$ I(i,j)=e(i,j)+P(i,j) $$
(35)
Figure 12 shows an example of message extraction and image recovery. Assume x is 20,000 bits; best_th = 4 is obtained by Eq. (31), and T1=1, T2=−1. σ(1) and σ(2) are less than or equal to best_th, so e′(1) and e′(2) are at allowing embedding positions. By Eq. (34), a position carries an embedded bit when −2≤e′≤3; otherwise it is a non-embedded position. Therefore, e′(1)=4 is a non-embedded position, and we calculate e(1)=2 by Eq. (34). e′(2)=−2 is an embedded position: h′(1)=0 is extracted by Eq. (33), and e(2)=−1 is recovered by Eq. (34). Similarly, e′(3) is at an allowing embedding position and is an embedded position, so h′(2)=1 and e(3)=0. σ(4) marks an allowing embedding position, but e′(4) is a non-embedded position, so e(4)=3. σ(5)>4 marks a non-allowing embedding position, so e(5)=4 and no shifting is needed. σ(6) marks an allowing embedding position, and e′(6) is an embedded position, so h′(3)=0 and e(6)=1. σ(7) marks a non-allowing embedding position, so e(7)=0 and no shifting is needed. σ(8) and σ(9) mark allowing embedding positions: e′(8) is a non-embedded position, so e(8)=−2, while e′(9) is an embedded position, so h′(4)=1 and e(9)=−1. Finally, the message h′=0101 and the recovered difference image e are generated.
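The extraction walk-through can likewise be scripted; it inverts the embedding rule via Eqs. (33) and (34), using the same illustrative σ stand-ins as before.

```python
T1, T2, BEST_TH = 1, -1, 4  # thresholds from the Fig. 12 example

def extract_sequence(ep_vals, sigmas):
    """Eqs. (33)-(34): recover the bit stream and the original
    differences from the embedded difference image e'."""
    bits, e = [], []
    for ep, s in zip(ep_vals, sigmas):
        if s > BEST_TH:                  # non-allowing: left untouched
            e.append(ep)
        elif ep > 2 * T1 + 1:            # undo the right shift
            e.append(ep - T1 - 1)
        elif ep < 2 * T2:                # undo the left shift
            e.append(ep - T2)
        else:                            # embedded position
            bits.append(ep % 2)          # Eq. (33)
            e.append(ep // 2)            # Eq. (34), floor division
    return bits, e
```

Feeding it the e′ sequence from the embedding example returns the message 0101 and the original differences, demonstrating reversibility.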
Overflow and underflow problem
A stego-image S is generated by Section 3.2. If a pixel falls outside the range 0∼255, an overflow or underflow occurs, and the pixel cannot be recovered after embedding. Therefore, we must consider this problem in the embedding stage. This study uses the solution proposed in [13], described below:
- 1.
Construct the m×n location map L, where m and n are the length and width of the original image I, respectively. Then, set all positions L(i)=1.
- 2.
If the pixel at embedding position (i,j) lies in [1,254], set L(i)=0 and embed the message. Otherwise, set L(i)=1 and switch to the next embeddable position.
- 3.
Encode the location map L by the lossless compression.
- 4.
Record the least significant bits of the first 2⌈log2(m×n)⌉+LS image pixels, where LS is the length of the compressed location map L.
During decoding, the compressed location map L is first reconstructed from the first 2⌈log2(m×n)⌉+LS pixels of the marked image. Then, the original location map is recovered by lossless decompression. Finally, the secret message is extracted and the host image is recovered.
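The location-map bookkeeping above can be sketched as follows, assuming Python's zlib as the lossless compressor; the pixel values are illustrative, and the exact compressor used in [13] may differ.

```python
import math
import zlib

def build_location_map(pixels):
    """Flag pixels that could overflow or underflow under a +/-1 shift:
    L = 1 for risky values (0 or 255), L = 0 for safe ones in [1, 254]."""
    return [0 if 1 <= p <= 254 else 1 for p in pixels]

def reserved_pixel_count(m, n, ls_bits):
    """Number of leading pixels whose LSBs carry the header plus the
    compressed map of length LS = ls_bits."""
    return 2 * math.ceil(math.log2(m * n)) + ls_bits

flat = [0, 17, 255, 128, 254, 1]          # illustrative pixel values
lmap = build_location_map(flat)
packed = zlib.compress(bytes(lmap))       # lossless, hence reversible
```

Because the compression is lossless, the decoder can rebuild the exact map from the reserved LSBs and skip the flagged positions during recovery.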