A context-adaptive SPN predictor for trustworthy source camera identification

Sensor pattern noise (SPN) has been recognized as a reliable device fingerprint for camera source identification (CSI) and image origin verification. However, the SPN extracted from a single image can be heavily contaminated by scene details. For example, an image edge can be much stronger than the SPN and hard to separate from it. Identification performance therefore depends heavily on the purity of the estimated SPN. In this paper, we propose an effective SPN predictor based on an eight-neighbor context-adaptive interpolation algorithm that suppresses the effect of the image scene. We also propose a source camera identification method built on this predictor that improves the receiver operating characteristic (ROC) performance of CSI. Experimental results on different image databases and image sizes show that the proposed method has the best ROC performance among all existing CSI schemes, as well as the best performance in resisting mild JPEG compression, especially when the false-positive rate is held low. Because trustworthy CSI must often be performed at low false-positive rates, these results demonstrate that the proposed technique is better suited to real-world scenarios than existing techniques. However, the proposed method needs many original images (no fewer than 100) to create a camera fingerprint, and its advantage decreases when the fingerprint is created from fewer original images.


Introduction
Digital images are easy to modify with image-editing software, so image content can no longer be taken at face value. Such forged images should not be used as evidence in a court of law, as news, as part of a medical record, or as financial documents. Several recent works have focused on image component forensics [1][2][3]. The work in [3] first proposed using the imaging sensor pattern noise (SPN) to trace an image back to its imaging device and thereby solve the camera source identification (CSI) problem. The authors extracted the SPN from high-frequency wavelet coefficients using a wavelet-based denoising filter [4], and a camera reference SPN is built by averaging the residual noise of multiple images taken by the same camera. In [5], a recently introduced denoising filter based on sparse 3D transform-domain collaborative filtering (BM3D) [6] is used to extract the SPN. This filter relies on an enhanced sparse representation in a transform domain. A maximum likelihood method for estimating the camera reference SPN is proposed in [7]; we call it the MLE CSI method in this paper. Later, [8] proposed a more stable detection statistic, the peak-to-correlation energy (PCE) measure, to suppress periodic noise contamination and improve CSI performance. The authors of [9] proposed a forgery-detection method that uses the SPN to determine whether an image has been tampered with. Li [10] showed that the SPN extracted from a single image can be contaminated by scene details and proposed several models to attenuate the strong signal components of the noise residue. However, attenuating the strong components caused by scene details may also attenuate useful SPN components [11]. Kang et al. [11] proposed the correlation over circular correlation norm (CCN) detection statistic to lower the false-positive rate, and a white camera reference SPN to improve the ROC performance [12].
The noise residues extracted from the original images are first whitened and then averaged to generate the white camera phase reference SPN. We call this approach the phase CSI method in the rest of this paper.
Although several prior studies have aimed to improve SPN-based CSI in recent years, an effective method for eliminating contamination by image scene details is still lacking. To reduce the impact of scene details while preserving the SPN, we proposed an edge-adaptive SPN predictor based on four-neighbor context-adaptive interpolation (PCAI4) [13], and extensive experiments confirmed that it improves CSI performance. This paper extends our conference paper [13]. Because PCAI4 predicts the center pixel only from its four neighboring pixels, we extend it here to use all eight neighboring pixels. We propose an edge-adaptive SPN predictor based on eight-neighbor context-adaptive interpolating prediction, together with a CSI method that uses this predictor. We have also conducted extensive experiments on different image datasets and report the new results in this paper. Because the predictor adapts to image edges and context, the predicted SPN is much purer and performs better for CSI. The experimental results on different image databases show that the proposed method achieves the best ROC performance among all existing CSI schemes for different image sizes and has the best performance in resisting mild JPEG compression.
The rest of this paper is organized as follows. In Section II, we first introduce our context-adaptive interpolating prediction algorithm and then propose an eight-neighbor SPN predictor to improve CSI performance. In Section III, we evaluate the proposed algorithm and compare it with state-of-the-art CSI methods on different image databases. Section IV concludes the paper.

Context-adaptive interpolator
The context-adaptive interpolation (CAI) method predicts a center pixel from its four neighboring pixels; we call it 'CAI4' in this paper. The SPN predictor using CAI4 [13] is based on the CAI interpolation algorithm [14], which is adapted from the gradient-adjusted predictor (GAP) [15]. In the CAI4 method, local regions are classified into four types: smooth, horizontally edged, vertically edged, and other. A mean filter estimates the center-pixel value in smooth regions; in edged regions, the center pixel is predicted along the edge; in other regions, a median filter is applied. Let p be the center-pixel value to be predicted and t = [n, s, e, w]^T the vector of its four neighboring pixels, as in Figure 1. The predicted pixel value p̂ of the CAI4 method can be formulated as

$$\hat{p} = \begin{cases} \operatorname{mean}(\mathbf{t}), & \max(\mathbf{t}) - \min(\mathbf{t}) \le 20 \\ (w+e)/2, & |n-s| - |e-w| > 20 \\ (n+s)/2, & |e-w| - |n-s| > 20 \\ \operatorname{median}(\mathbf{t}), & \text{otherwise} \end{cases} \tag{1}$$

In (1), a smooth region is never treated as an edged region, and the interpolation in edged regions is adapted from GAP [15]. The type of edge region, which determines how the center pixel is predicted, is classified from the four neighboring pixel values with an empirical threshold. The threshold has little impact on the experimental results and is set to 20 following earlier work [15].
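The rule in (1) can be sketched in NumPy as below. This is a minimal sketch, not the authors' code, assuming a grayscale image stored as a 2-D float array, symmetric padding at the image border (the text does not say how borders are handled), and that the first matching case wins, which is what keeps smooth regions from being treated as edged. The function name `cai4_predict` is hypothetical.

```python
import numpy as np

def cai4_predict(img, thr=20.0):
    """Predict every pixel from its four neighbours n, s, e, w (CAI4 rule)."""
    x = np.pad(np.asarray(img, dtype=float), 1, mode="symmetric")  # assumed border rule
    n, s = x[:-2, 1:-1], x[2:, 1:-1]      # pixel above / below
    w, e = x[1:-1, :-2], x[1:-1, 2:]      # pixel left / right
    t = np.stack([n, s, e, w])
    dv, dh = np.abs(n - s), np.abs(e - w)  # vertical / horizontal differences
    conds = [
        t.max(axis=0) - t.min(axis=0) <= thr,  # smooth region
        dv - dh > thr,                          # horizontal edge: average along the row
        dh - dv > thr,                          # vertical edge: average along the column
        np.ones_like(dv, dtype=bool),           # other regions
    ]
    choices = [t.mean(axis=0), (w + e) / 2, (n + s) / 2, np.median(t, axis=0)]
    return np.select(conds, choices)  # the first true condition wins
```

The residual of an image under this sketch would be `img - cai4_predict(img)`. On an image whose lower rows are 100 and upper rows are 0, the pixels just above and just below the edge are predicted exactly, because they are averaged along the row rather than across the edge.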

Extending CAI4 to CAI8
The CAI4 method predicts the center pixel only from its four neighboring pixels because it was proposed as an adaptive interpolation algorithm and does not consider the four diagonal pixels. Since we use it to predict the SPN, and all the neighboring pixels in Figure 1 are known, we can extend and enhance CAI4 by using all eight neighboring pixels. We call this method 'CAI8'.
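In the CAI8 method, local regions are classified into six types: smooth, horizontally edged, vertically edged, left-diagonally edged, right-diagonally edged, and other. In smooth regions, a mean filter estimates the center pixel from the eight neighboring pixels. In horizontally and vertically edged regions, the center-pixel value is predicted along the edge, as in CAI4. In diagonally edged regions, the center-pixel value is likewise estimated along the corresponding edge. In other regions, a median filter is applied. Let p′ be the center-pixel value to be predicted by CAI8 and t′ = [n, s, e, w, en, es, wn, ws]^T the vector of its eight neighboring pixels, as shown in Figure 1. The predicted pixel value p̂′ of the CAI8 method can then be formulated as

$$\hat{p}' = \begin{cases} \operatorname{mean}(\mathbf{t}'), & \max(\mathbf{t}') - \min(\mathbf{t}') \le 20 \\ (w+e)/2, & |n-s| - |e-w| > 20 \\ (n+s)/2, & |e-w| - |n-s| > 20 \\ (\mathit{wn}+\mathit{es})/2, & |\mathit{en}-\mathit{ws}| - |\mathit{wn}-\mathit{es}| > 20 \\ (\mathit{en}+\mathit{ws})/2, & |\mathit{wn}-\mathit{es}| - |\mathit{en}-\mathit{ws}| > 20 \\ \operatorname{median}(\mathbf{t}'), & \text{otherwise} \end{cases} \tag{2}$$

Figure 1 Neighborhood of the center pixel to be predicted (3 × 3 layout: top row wn, n, en; middle row w, p, e; bottom row ws, s, es).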
In (2), the center-pixel value is predicted along the edge direction, including diagonal edges, which CAI4 ignores. The prediction therefore suppresses the interference of image edges better and has a smaller prediction error.
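The six-case rule can be sketched the same way. This is a minimal sketch, not the authors' implementation, and it rests on stated assumptions: symmetric border padding; cases tested in the order smooth, horizontal, vertical, the two diagonals, other (the first true case wins); and a diagonal edge running from wn to es is detected by a large |en − ws| and predicted as (wn + es)/2, and symmetrically for the other diagonal. The function name `cai8_predict` is hypothetical.

```python
import numpy as np

def cai8_predict(img, thr=20.0):
    """Predict every pixel from its eight neighbours (CAI8-style rule)."""
    x = np.pad(np.asarray(img, dtype=float), 1, mode="symmetric")  # assumed border rule
    n, s = x[:-2, 1:-1], x[2:, 1:-1]      # above / below
    w, e = x[1:-1, :-2], x[1:-1, 2:]      # left / right
    wn, en = x[:-2, :-2], x[:-2, 2:]      # upper-left / upper-right
    ws, es = x[2:, :-2], x[2:, 2:]        # lower-left / lower-right
    t = np.stack([n, s, e, w, en, es, wn, ws])
    dv, dh = np.abs(n - s), np.abs(e - w)
    d1 = np.abs(wn - es)                  # difference along the wn-es diagonal
    d2 = np.abs(en - ws)                  # difference along the en-ws diagonal
    conds = [
        t.max(axis=0) - t.min(axis=0) <= thr,  # smooth region
        dv - dh > thr,                          # horizontal edge
        dh - dv > thr,                          # vertical edge
        d2 - d1 > thr,                          # edge along wn-es
        d1 - d2 > thr,                          # edge along en-ws
        np.ones_like(dv, dtype=bool),           # other regions
    ]
    choices = [t.mean(axis=0), (w + e) / 2, (n + s) / 2,
               (wn + es) / 2, (en + ws) / 2, np.median(t, axis=0)]
    return np.select(conds, choices)  # the first true condition wins
```

For a pixel on a diagonal step edge (value 100 above the main diagonal, 0 on and below it), the four-neighbor differences |n − s| and |e − w| are equal, so a four-neighbor rule falls back to the median. Under this sketch, the diagonal case instead predicts the pixel exactly from wn and es.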

Source camera identification with SPN predictor based on CAI8
The SPN can be heavily contaminated by the image scene, especially in textured regions. Because CAI8 adapts to image edges and local context, it can predict the center-pixel value accurately for different types of local regions. The difference between the actual and predicted values therefore suppresses the impact of image edges better while preserving the SPN components.
Let y = {y_i | i = 0, 1, …, N − 1} be the camera reference SPN and x = {x_i} the noise residue extracted from a test image. Under the null hypothesis, y is not the correct camera reference SPN for x, i.e., the test image was not taken by the reference camera; in other words, x is a negative sample for y. Under the alternative hypothesis, y is the correct camera reference SPN for x, i.e., the test image was taken by the reference camera; in other words, x is a positive sample for y.
In the following, we will propose a context-adaptive SPN predictor based on CAI8, which is called PCAI8 in short form, and a source camera identification method with PCAI8.
(1) First, we take the difference D between the actual and predicted values,

$$D = I - \mathrm{CAI}(I), \tag{3}$$

where I denotes the image and CAI(⋅) denotes the pixel-wise CAI8 prediction of Equation 2.
(2) To further eliminate the impact of the image scene and extract a more accurate camera reference SPN, we then apply a pixel-wise adaptive Wiener filter based on statistics estimated from the neighborhood of each pixel, assuming that the SPN is a white Gaussian signal corrupted by image content. For each pixel (i, j), the optimal predictor of the SPN is

$$W(i,j) = D(i,j)\,\frac{\sigma_0^2}{\hat{\sigma}^2(i,j) + \sigma_0^2}, \tag{4}$$

where σ̂²(i, j) is the estimated local variance of the original noise-free image and σ₀² is the overall variance of the additive white Gaussian noise (AWGN) signal, i.e., the SPN here. To a large extent, the performance of the predictor depends on the accuracy of the estimated local variance. We use maximum a posteriori (MAP) estimation to estimate the local variance as follows:

$$\hat{\sigma}^2(i,j) = \max\left(0,\; \frac{1}{m^2}\sum_{(u,v)\in N_m(i,j)} D^2(u,v) - \sigma_0^2\right), \tag{5}$$

where m is the size of the m × m neighborhood N_m(i, j) of each pixel.
Here, we take m = 3. The overall SPN variance σ₀² is also unknown. The choice of σ₀² is discussed in detail in [3]; the authors of [3] found that it has little impact on the experimental results, and our experiments confirmed this. Following [3], we use σ₀² = 9 in all experiments so that the predictor extracts a relatively consistent level of SPN.
Our proposed SPN predictor PCAI8 adapts to different image edge regions using all eight neighboring pixels and classifies edge regions more accurately than PCAI4. The predicted SPN is therefore expected to contain less scene noise from the original image than the SPN from PCAI4 and other denoising filters.
(3) The estimated camera reference SPN y′ is obtained by averaging the residual noise W_k = {W_k(i, j)} (the SPN estimated from the k-th image) over all images from the same camera:

$$y' = \frac{1}{L}\sum_{k=1}^{L} W_k, \tag{6}$$

where L is the total number of images used to extract the camera reference SPN. The residual noise W_k(i, j) is extracted pixel-wise according to Equation 4.
(4) To further suppress unwanted artifacts caused by camera processing operations such as color interpolation and JPEG blocking, we adopt the two pre-processing operations proposed in [7] to enhance the estimated SPN before it is used for identification. The final estimated camera reference SPN y can be expressed as

$$y = \mathrm{WF}\big(\mathrm{ZM}(y')\big), \tag{7}$$

where ZM(⋅) makes y′ zero-mean in every row and column, and WF(⋅) flattens the frequency spectrum of ZM(y′) using a Wiener filter in the Fourier domain.
(5) Finally, we calculate the detection statistic c(x, y) between the camera reference SPN y and the noise residue x, which is extracted from a test image with Equation 4. We use CCN as the detection statistic to measure the similarity between x and y. We prefer CCN to PCE [8] because it lowers the false-positive rate at the same true-positive rate (see [11] for details). The CCN value c(x, y) is defined as

$$c(x,y) = \frac{r_{xy}(0)}{\sqrt{\dfrac{1}{N-|A|}\displaystyle\sum_{m\notin A} r_{xy}^2(m)}}, \tag{8}$$

where A is a small neighborhood around the zero shift and |A| is its size. A is chosen to be a block of 11 × 11 pixels. The circularly shifted vector is y_m = {y_{i⊕m}}, where ⊕ denotes addition modulo N in ℤ_N. The circular cross-correlation r_xy(m) is defined as

$$r_{xy}(m) = \frac{\sum_{i=0}^{N-1}(x_i - \bar{x})(y_{i\oplus m} - \bar{y})}{\lVert x - \bar{x}\rVert\,\lVert y - \bar{y}\rVert}, \tag{9}$$

where x̄ and ȳ are the means of x and y. In the next section, we evaluate the CSI performance of the proposed method.
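Steps (2) to (5) can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch under several assumptions. D is the residual from step (1) as a 2-D float array. Window borders are handled by reflection. CCN uses 2-D circular shifts computed with the FFT, which matches the 11 × 11 block A, whereas the text writes the shift with a single index modulo N. WF is a simplified stand-in that applies a 3 × 3 local-variance Wiener gain to the DFT magnitude and may differ in detail from the filter of [7]. All function names are hypothetical.

```python
import numpy as np

def box_mean(a, m=3, mode="reflect"):
    """Mean of `a` over an m x m window centred on each pixel (m odd)."""
    r = m // 2
    p = np.pad(a, r, mode=mode)
    out = np.zeros(a.shape, dtype=float)
    for di in range(m):
        for dj in range(m):
            out += p[di:di + a.shape[0], dj:dj + a.shape[1]]
    return out / (m * m)

def wiener_spn(D, sigma0_sq=9.0, m=3):
    """Step (2): local-variance Wiener attenuation of the residual D."""
    D = np.asarray(D, dtype=float)
    local_var = np.maximum(0.0, box_mean(D ** 2, m) - sigma0_sq)  # local variance estimate
    return D * sigma0_sq / (local_var + sigma0_sq)                # attenuated residual

def reference_spn(residuals):
    """Step (3): average of the per-image residuals W_k."""
    return np.mean(np.stack(residuals), axis=0)

def zero_mean(y):
    """ZM: make every column, then every row, zero-mean (columns stay zero-mean)."""
    y = y - y.mean(axis=0, keepdims=True)
    return y - y.mean(axis=1, keepdims=True)

def wiener_dft(y, m=3):
    """WF (simplified): attenuate DFT coefficients whose magnitude stands out locally."""
    F = np.fft.fft2(y)
    mag = np.abs(F) / np.sqrt(y.size)
    noise_var = mag.var()
    if noise_var == 0:
        return y.copy()
    # a circular ("wrap") window keeps the gain Hermitian-symmetric, so the result is real
    local_var = np.maximum(0.0, box_mean(mag ** 2, m, mode="wrap") - noise_var)
    return np.real(np.fft.ifft2(F * noise_var / (local_var + noise_var)))

def ccn(x, y, a=11):
    """CCN with 2-D circular shifts; A is an a x a block around the zero shift."""
    x = np.asarray(x, dtype=float); x = x - x.mean()
    y = np.asarray(y, dtype=float); y = y - y.mean()
    # r[m] = sum_i x_i * y_{i+m} for every circular shift m, then normalised
    r = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(y)))
    r /= np.linalg.norm(x) * np.linalg.norm(y)
    h = a // 2
    mask = np.ones(r.shape, dtype=bool)
    mask[np.ix_(np.arange(-h, h + 1) % r.shape[0], np.arange(-h, h + 1) % r.shape[1])] = False
    return r[0, 0] / np.sqrt(np.mean(r[mask] ** 2))
```

Under these assumptions, a camera's reference would be `wiener_dft(zero_mean(reference_spn([wiener_spn(D) for D in residuals])))`. A test residue `wiener_spn(D_test)` would then be scored against it with `ccn`, and larger scores favor the hypothesis that the test image came from that camera. As in the text, ZM and WF are applied only to the reference, not to the test residue.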

Experimental results
In this section, we compare the CSI performance of the proposed PCAI8 method with existing state-of-the-art methods on two different image databases. In the 'Part A' section, we use an image database that we built ourselves; in this database, blue sky images can be used to extract more accurate reference patterns. In the 'Part B' section, we use a public image database, the 'Dresden Image Database' (DID) [16], which can be downloaded from the internet [17]. The cameras in this database cover different camera brands and models as well as different devices of the same camera model. Of Li's models, we choose 'model 3' and 'model 5' for the comparison because they performed better in Li's work [10]. All model parameters are the same as in Li's work, and 'model 3' and 'model 5' in our results denote the image noise residue attenuated by the respective model. In total, we compare our PCAI8 method with the MLE method [7], the BM3D method [5], the PCAI4 method [13], the phase method [11], and Li's method [10]. For a fair comparison, we apply the same pre-processing operations of (7) to the estimated reference PRNU/SPN y for all methods before calculating the detection statistic. Experiments on different image databases show that our method always performs best among all existing methods, whether CCN, PCE, or correlation is used as the detection statistic. We therefore report the results with CCN as the detection statistic measuring the similarity between the image noise residue x and a camera's reference SPN y for all methods.

Figure 2 The overall ROC curves on 128 × 128 image blocks in our own database.

Part A
On the first image database, we use seven different cameras. Table 1 lists the image format, native resolution, and imaging sensor properties of each camera (PS means PowerShot). All images are JPEG files with the highest JPEG quality factor the camera provides, except those from the Nikon D40 (Shanghai, China) and Minolta A2 (Konica, Tokyo, Japan), which are in raw format. For each camera, there are two sub-datasets: a test image dataset and an original image dataset. The original image dataset is used to extract the camera reference SPN. It has been shown that a more accurate camera reference SPN can be extracted from blue sky images [7], so the original images are photographs of blue sky taken on a sunny day, whose content is flat or nearly flat. The test images, taken in a variety of environments ranging from indoor furniture to outdoor scenes, serve as test samples for CSI. The CSI experiment is performed on image blocks of sizes from 128 × 128 to 512 × 512 pixels, each cropped from the center of a full-size photo.
For each chosen camera, we extract the camera reference SPN […] the total true-positive rate (TPR) and total false-positive rate (FPR) are calculated to draw the overall ROC curve. The overall ROC curves of the proposed PCAI8 method and the other SPN-based CSI methods are shown in Figures 2, 3 and 4. In practical applications, a sufficiently low FPR must often be guaranteed, so ROC performance at low FPR is the most critical. The horizontal axis of all ROC curves in this paper is therefore on a logarithmic scale to show the details of the curves at low FPR.
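Because the ROC curves are read at very low FPR (for example, the 10⁻³ operating point quoted in Part B), it may help to see how one such point is obtained from the positive and negative CCN scores. This is a minimal sketch, assuming that a test image is attributed to the reference camera when its score is strictly greater than a threshold. The function name `tpr_at_fpr` is hypothetical.

```python
import numpy as np

def tpr_at_fpr(pos_scores, neg_scores, max_fpr):
    """TPR at the threshold that lets at most max_fpr of the negatives through.

    A test image is accepted as coming from the reference camera when its
    score is strictly greater than the threshold.
    """
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # descending
    k = int(np.floor(max_fpr * neg.size))                      # false positives allowed
    thr = neg[k] if k < neg.size else -np.inf
    return float(np.mean(np.asarray(pos_scores, dtype=float) > thr))
```

Sweeping `max_fpr` over `np.logspace(-4, 0, 50)` and plotting the resulting TPRs against a logarithmic horizontal axis gives the low-FPR view described above. With ties among negative scores, the achieved FPR can fall below `max_fpr`.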
The experimental results show that the proposed PCAI8 method outperforms the other methods and improves the ROC performance of CSI for images of different sizes. In our experiments, the proposed PCAI8 method, the PCAI4 method, and the phase SPN method all achieve a 100% TPR at a low FPR on 512 × 512 image blocks. Figures 2, 3 and 4 also show that both PCAI methods (PCAI4 and PCAI8) achieve better ROC performance than the other methods because the SPN they predict contains less scene noise. PCAI8 consistently outperforms PCAI4, which indicates that PCAI8 suppresses scene noise better.

When an image is JPEG-compressed, its SPN is impaired as well, which makes SPN-based CSI more difficult. Figure 5 shows the overall ROC curves on JPEG-compressed 512 × 512 image blocks with a quality factor (QF) of 90%. The number of test images is the same as above. The results for the other sizes are similar and are not shown here. The experimental results show that the proposed PCAI8 method also performs best in resisting mild JPEG compression and achieves perfect detection.

Figure 6 The overall ROC curves on images with size of 256 × 256 pixels.
Although a camera fingerprint should be created from as many original images as possible, sometimes fewer than 100 original images are available. We therefore also investigate the performance when the camera fingerprint is extracted from fewer than 100 (here, 30) original images of the original image dataset; the rest of the setup is the same as for Figure 3. Figure 6 shows that the advantage of the proposed PCAI8 method decreases when the camera fingerprint is extracted from only 30 original images, but it still performs similarly to the state-of-the-art MLE method.

Part B
In this part, we report experimental results on 3,320 images from 17 cameras in the Dresden Image Database. This database contains some images taken in special shooting environments and with special settings, such as high ISO values, which produce strong noise. This makes CSI on this database challenging. The 17 camera devices belong to four camera brands or models, and each camera model has 3 to 5 different devices. Different devices of the same camera model share the same in-camera processing, such as JPEG compression and color filter array (CFA) interpolation. Table 3 shows the information for each device: Device ID is the unique identifier of each camera device, Image no. is the number of images taken by the device, and the resolution is the native resolution of the device.
Most experimental settings in this part are similar to those in the 'Part A' section. We use the luminance channel of all images to extract the sensor pattern noise of the test images and the reference SPN of each camera device. Image blocks have three sizes (128 × 128, 256 × 256, and 512 × 512 pixels) and are all cropped from the center of full-size images. Blue sky images are not available in this database; all images are ordinary scenes from daily life. There are about 200 images per camera device (Table 3).
In our experiments, we use five-fold cross-validation. Assume that a database contains N × K images taken by N cameras, with K images per camera. First, we divide the images of each camera device evenly into five groups. In each fold, one group (about K/5 images per camera) serves as the test image dataset, and the other four groups (about 4K/5 images per camera) serve as the original image dataset; over the five folds, each group is used as the test image dataset once. The original image dataset is used to extract the camera reference SPN, and images from the test image dataset may be used as positive or negative test samples. For each chosen camera, we extract the camera reference SPN from its original image dataset. The test images of this camera (about K/5) are the positive samples, and the test images of the other N − 1 cameras (about K/5 per camera) are the negative samples. We thus obtain K/5 CCN values of positive samples and K/5 × (N − 1) CCN values of negative samples for each chosen camera in each fold. After five folds, we obtain K CCN values of positive samples and K × (N − 1) CCN values of negative samples for each camera. Finally, the overall ROC curve is obtained in the same way as in the 'Part A' section.
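The bookkeeping of this protocol can be sketched as follows. This is a minimal sketch, assuming a random but fixed split of each camera's images into five near-equal groups. The function names and the per-camera image counts in the usage note are hypothetical; the counts are chosen only to total 3,320 and are not the DID's actual per-device numbers.

```python
import numpy as np

def make_folds(n_images, n_folds=5, seed=0):
    """Split image indices 0..n_images-1 of one camera into near-equal random groups."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_images), n_folds)

def train_test(groups, f):
    """Fold f: group f is the test set; the other groups form the original-image set."""
    test = groups[f]
    train = np.concatenate([g for i, g in enumerate(groups) if i != f])
    return train, test

def count_samples(images_per_camera, n_folds=5):
    """Count positive and negative CCN scores produced over all folds."""
    folds = [make_folds(k, n_folds, seed=c) for c, k in enumerate(images_per_camera)]
    pos = neg = 0
    for f in range(n_folds):
        for c, groups in enumerate(folds):
            _, test = train_test(groups, f)  # the train part would build camera c's reference SPN
            pos += len(test)                 # camera c's own test images
            neg += sum(len(other[f]) for o, other in enumerate(folds) if o != c)
    return pos, neg
```

With 17 hypothetical cameras holding `[200] * 16 + [120]` images, `count_samples` returns `(3320, 53120)`. These are the positive and negative totals quoted for the whole DID database, since every image is tested once as a positive and once against each of the other 16 references.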
A notable characteristic of this database is that some camera devices belong to the same brand. Most previous works, including the experiments in the 'Part A' section, only considered cameras of different brands. This can blur the distinction between camera source identification and camera model identification, because the extracted SPN may contain camera-model noise, which could act as a fingerprint of a particular camera model. Such noise plays different roles in experiments depending on the models of the tested cameras. If all tested cameras come from different brands, an SPN containing more camera-model noise may perform better than a more accurate SPN containing less of it. Consequently, the results of such experiments are not very reliable when different devices of the same camera model are considered.
To make the experiments more convincing, we first compare our method with the other methods on cameras of the same brand. Figure 7 shows the overall ROC curves on images from five camera devices (device IDs C0 to C4) of the Casio_EX-Z150 model. Five-fold cross-validation is used in this experiment. Only the results on 512 × 512 blocks are shown, since the results for the other sizes are similar.
The experimental results show that the proposed method performs best in identifying the source of images taken by cameras of the same brand and model. It achieves a TPR of 97% at an FPR of 10⁻³ for 512 × 512 images, which means that only a few images are misclassified.
In the following, we report the CSI results on the whole DID database. To plot the overall ROC curves for all images in the DID database, we obtain a total of 3,320 CCN values of positive samples and 53,120 CCN values of negative samples. The results for the three image sizes are shown in Figures 8, 9 and 10.
The experimental results also show that both PCAI8 and PCAI4 outperform the other methods in identifying images from different source camera models, regardless of image size. Table 4 shows the TPR […].8%, respectively. The improvement is 7.1%, 9.5%, 3.9%, 9.5%, 14.0%, and 19.4%, respectively. PCAI8 performs slightly better than PCAI4. The experimental results in both the 'Part A' and 'Part B' sections show that the proposed method achieves better CSI performance whether or not the influence of the camera model is considered. In the 'Part A' section, we compare all methods on seven cameras of different camera models in our own image database. In the 'Part B' section, we test all methods on five camera devices of the same model and on 17 camera devices of the same or different models. All experiments on images of different sizes show that our proposed method has the best ROC performance among all existing CSI schemes.
The computation time for extracting the noise residue x from a test image with each method, measured on an Intel® (Santa Clara, CA, USA) Xeon® CPU E5-2603 at 1.80 GHz with Matlab (MathWorks, Bangalore, India), is shown in Table 5. PCAI4 and PCAI8 are the most efficient of all the methods.

Conclusion
In this paper, we propose a source camera identification scheme based on an eight-neighbor context-adaptive SPN predictor to improve the ROC performance of CSI. Because it adapts to different image edge regions, the SPN predictor suppresses the effect of image content better and yields a more accurate SPN estimate. Extensive experiments on different image databases and image sizes show that the proposed PCAI8 method achieves the best ROC performance among the state-of-the-art CSI schemes. It also performs best in resisting mild JPEG compression (e.g., with a quality factor of 90%), especially when the false-positive rate is held low (e.g., P_fp = 10⁻³). Because trustworthy CSI must often be performed at low false-positive rates, these results demonstrate that the proposed technique is better suited to real-world scenarios than existing techniques. However, the proposed method needs many original images (no fewer than 100) to create a camera fingerprint, and its advantage decreases when the fingerprint is created from fewer original images.