A study of hepatic fibrosis staging methods using diffraction enhanced imaging

Early staging of hepatic fibrosis is very important for the timely diagnosis, prognosis, and treatment of all chronic liver diseases. Diffraction-enhanced imaging (DEI), which can provide much more information on soft tissue morphology than conventional absorption radiography, is a potential noninvasive technique for diagnosing and staging hepatic fibrosis. This paper presents different feature extraction strategies and classification methods to automatically classify hepatic fibrosis using DEI images. Texture features are obtained using three methods: first order features, the gray level co-occurrence matrix, and the grayscale gradient co-occurrence matrix. The fusion of these texture features is also studied. Principal component analysis is used to reduce the dimension of the features and the redundant information among them. The features are classified using two popular classification techniques, namely, K-nearest neighbors and support vector machines. By comparing the different feature strategies and classification methods, we identify suitable methods for grading hepatic fibrosis. The proposed approach efficiently classifies the hepatic fibrosis DEI images into four classes with the highest classification accuracy of 99.99%. We further demonstrate the potential of DEI images in staging hepatic fibrosis.

and poor compliance [1]. Hemorrhage, infection, and biliary peritonitis may occur. As a matter of fact, liver biopsy carries a 3% morbidity rate and a 0.03% mortality rate [2]. In addition, the sampling volume of liver biopsy is only about 1/50,000 of the liver parenchymal tissue, leading to inevitable sampling errors [3,4]. Imaging techniques have been widely used to evaluate the progress of liver fibrosis during the past decade because they are noninvasive and highly repeatable [5]. Conventional imaging modalities, including but not limited to ultrasonography (US), computed tomography (CT), and magnetic resonance imaging (MRI), provide morphological features of the liver for evaluating liver fibrosis and can differentiate cirrhosis or severe fibrosis from normal liver. However, accurate staging of fibrosis or diagnosis of mild fibrosis is often not achievable [6,7]. X-ray diffraction-enhanced imaging (DEI) is a phase contrast technology [8,9]. For biological soft tissues, the phase shift term is more than 1000 times larger than the absorption term. Therefore, DEI can generate excellent contrast in biological soft tissues compared with conventional absorption radiography. Li et al. have demonstrated that DEI images show great visibility of fibrous liver architectures and that DEI may be a potential noninvasive technique to diagnose and stage hepatic fibrosis [10].
In clinical practice, the interpretation of hepatic fibrosis stages from medical images largely depends on the experience of radiologists. Machine learning methods may assist radiologists in analyzing medical images, easing their workload and improving the accuracy of the diagnosis results [11][12][13][14][15][16][17][18][19][20][21]. Lee et al. [11] assessed gradations of hepatic fibrosis using a deep convolutional neural network (DCNN) with US. The DCNN showed high accuracy and achieved better performance than radiologists in the diagnosis of cirrhosis. Choi et al. [13] developed a deep learning system (DLS) for staging liver fibrosis using CT images of the liver. Results showed that the DLS had a staging accuracy of 79.4% for diagnosing significant fibrosis, advanced fibrosis, and cirrhosis. Yasaka et al. [14] staged liver fibrosis by using a deep learning model based on CT images, with moderate performance. Kayaalti et al. [15] obtained a comprehensive feature set from CT images and used support vector machines (SVM) and K-nearest neighbors (KNN) to classify those features. Their method was successfully applied to pairwise fibrosis staging. However, its classification results for multi-class fibrosis still need to be further improved. Yasaka et al. [16] investigated the performance of a DCNN model in the staging of liver fibrosis using gadoxetic acid-enhanced hepatobiliary phase magnetic resonance (MR) imaging. The DCNN model exhibited a high diagnostic performance in the staging of liver fibrosis. House et al. [18] extracted texture features from MRI acquired with a combined T2-weighted, high-resolution, spin echo sequence. The receiver operating characteristic curve was employed to assess the diagnostic performance of texture. However, the performance of texture in MRI was modest when classifying patients with mild and intermediate fibrosis stages. Zhang et al. [21] used texture features calculated from the gray level co-occurrence matrix (GLCM) as the input to an SVM to differentiate fibrosis using MRI and CT. The experimental results showed that the MRI images provided a higher classification accuracy than the CT images. Li et al. [10] applied the DEI method to the imaging of hepatic fibrosis in rat liver sections and showed that DEI images can discriminate the degree of fibrosis. As a follow-up to [10], the work in [22] applied a back propagation (BP) network to automatically discriminate liver fibrosis using texture features and achieved an accuracy rate of 95.1%.
As mentioned above, current work on staging liver fibrosis using machine learning mainly focuses on ultrasound images [11,12], CT images [13][14][15][21], and MRI images [16][17][18][19][20][21]. Most imaging techniques available at present can accurately detect cirrhosis or severe fibrosis, but usually fail to determine the stage of fibrosis because of their limited sensitivity and specificity. In [10,22], the potential of DEI images in the classification of hepatic fibrosis was explored. However, only texture features extracted from the GLCM were used to describe the characteristics of liver fibers, and the extracted texture features were only preliminarily analyzed by human vision [10] or by a BP network [22]. This paper is the continuation of [22]. In this paper, more kinds of texture features, fusion features, and features after dimensionality reduction are developed to describe the hepatic fibers in DEI images, and more machine learning algorithms are applied to automatically classify the extracted features into different fibrosis stages for the automatic staging of hepatic fibrosis. Compared with [22], the performance of liver fiber grading has been significantly improved. We also identify the potential of DEI in liver fibrosis classification, providing a reference for the selection of optimal features, classification methods, and parameters in the automatic staging of liver fibrosis. Our paper is structured as follows. Section 2 introduces the research background and relevant methods. Section 3 presents the experimental results and discussions. Finally, Section 4 draws some remarks and conclusions.

Materials
The hepatic fibrosis samples of rats are prepared at the Peking University Health Science Center. Hepatic fibrosis in rats is induced by human albumin. The liver samples are cut into sections about 2 mm thick for DEI imaging. Procedures involving animals and their care are carried out in accordance with the NIH guidelines (NIH Pub. No. 85-23, revised 1996). The DEI images of rats are collected at the 4W1A topography and imaging station of the Beijing Synchrotron Radiation Facility. There are four different grades of liver fibrosis: no fibrosis (normal), mild fibrosis, moderate fibrosis, and severe fibrosis. Figure 1 shows representative DEI images of each class. After the imaging examination, the specimens are made into histology pictures, which are reviewed by three pathologists from the Third Hospital of Peking University. Fibrosis stages are evaluated according to the METAVIR scoring system. The diagnostic results obtained by the three pathologists are used as the gold standard for assessment.

Methods
The staging of hepatic fibrosis can be considered as a classification problem. The classification framework is shown in Fig. 2. All the algorithms are implemented using the Matlab R2016b software. More details are given as follows.

Image pre-processing stage
Pre-processing is performed on the DEI images to achieve reliable and acceptable classification accuracy. The main objectives of this stage are to eliminate the background noise, reduce differences in illumination, and normalize the DEI images. Background images are collected under the same experimental conditions and parameters. Each DEI image is divided by the background image acquired under the same experimental conditions, which removes the illumination differences and reduces the background noise. The image gray levels are then standardized to lie between 0 and 1.
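As an illustration of the two pre-processing operations described above, the following minimal Python sketch divides an image by its background image and then rescales the result to the range 0 to 1. The function name `preprocess`, the small `eps` guard against division by zero, and the use of min-max rescaling are assumptions made for this example; the text only states that the gray level is standardized to be between 0 and 1.

```python
def preprocess(image, background, eps=1e-12):
    """Divide a DEI image by its background image, then rescale to [0, 1]."""
    # Pixel-wise division by the background acquired under the same conditions.
    # eps is an illustrative guard against zero-valued background pixels.
    ratio = [[p / max(b, eps) for p, b in zip(img_row, bg_row)]
             for img_row, bg_row in zip(image, background)]
    lo = min(min(row) for row in ratio)
    hi = max(max(row) for row in ratio)
    span = hi - lo
    if span == 0:
        # A perfectly flat image carries no contrast; map it to zeros.
        return [[0.0 for _ in row] for row in ratio]
    # Min-max rescaling (assumed normalization) to the range [0, 1].
    return [[(v - lo) / span for v in row] for row in ratio]


# Hypothetical 2 x 2 image and background, for illustration only.
image = [[10.0, 40.0], [90.0, 40.0]]
background = [[10.0, 20.0], [30.0, 40.0]]
normalized = preprocess(image, background)
```

In this toy case the ratio image is [[1, 2], [3, 1]], so the brightest pixel maps to 1 and the darkest to 0.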
The liver region is segmented manually. The regions of interest (ROIs) are taken from liver regions of the normalized images that do not contain large vessels. The size of each ROI is 100 × 100 pixels. A total of 11,169 ROIs are selected in the rat data set, including 2090 normal samples, 2246 mild samples, 3306 moderate samples, and 3527 severe samples. These ROI samples are used for feature extraction.

Feature extraction stage
Feature extraction is a critical step in fibrosis staging. The texture pattern of hepatic fibrosis is one of the most important biomarkers in the diagnosis of hepatic fibrosis grade, because different stages of liver fibrosis exhibit different texture structures. Therefore, texture features are extracted to describe the hepatic fibrosis. First order statistic features (FO), texture features based on the gray level co-occurrence matrix (GLCM) [23], and texture features based on the grayscale gradient co-occurrence matrix (GGCM) [24] are employed separately.
The FO features describe the distribution of image pixel intensities and are computed directly from the ROI images. A total of 14 first order texture features, including energy, entropy, kurtosis, maximum, mean, mean absolute deviation, median, minimum, range, root mean square, skewness, standard deviation, uniformity, and variance, are calculated. GLCM describes the distribution of co-occurring pixel values at a given offset and reflects the relationship between neighboring pixels. The gray level of the ROI images is reduced to 32 levels to save computation time. The GLCMs are calculated symmetrically at 0°, 45°, 90°, and 135° with a 1-pixel distance and normalized thereafter. For one ROI image, four GLCMs are obtained. Nine features including angular second moment (energy), inertia, inverse difference moment, entropy, sum average, difference average, sum entropy, and difference average are extracted from each GLCM. Thus, a total of 36 features can be extracted from each ROI image.
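The GLCM construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab implementation: the function names, the mapping of angles to pixel offsets, the natural logarithm in the entropy, and the textbook definitions used for the four features shown (angular second moment, inertia, inverse difference moment, entropy) are assumptions, and only four of the nine features are computed.

```python
import math

# (row, column) offsets for a 1-pixel distance at each angle (assumed convention).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def quantize(roi, levels=32):
    """Map gray values in [0, 1] to integer levels 0..levels-1."""
    return [[min(int(v * levels), levels - 1) for v in row] for row in roi]


def glcm(q, levels, angle):
    """Symmetric, normalized co-occurrence matrix of a quantized image."""
    dr, dc = OFFSETS[angle]
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(q), len(q[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = q[r][c], q[r2][c2]
                counts[i][j] += 1  # pair (i, j)
                counts[j][i] += 1  # mirrored pair makes the matrix symmetric
    total = sum(map(sum, counts)) or 1.0  # guard for images with no valid pair
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Four illustrative texture features of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "asm": sum(v * v for _, _, v in cells),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


def roi_features(roi, levels=32):
    """Concatenate the features of the four directional GLCMs of one ROI."""
    q = quantize(roi, levels)
    feats = []
    for angle in (0, 45, 90, 135):
        f = glcm_features(glcm(q, levels, angle))
        feats.extend([f["asm"], f["inertia"], f["idm"], f["entropy"]])
    return feats


# Hypothetical 2 x 2 image with two gray levels, for illustration only.
toy = [[0, 0], [1, 1]]
p0 = glcm(toy, 2, 0)
p90 = glcm(toy, 2, 90)
```

With four features per matrix this sketch yields 16 values per ROI; with all nine features per matrix it would yield the 36 values stated in the text.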
GGCM captures the joint distribution of the gray level and the gradient of the pixels in an image and reflects the spatial relationship between each pixel and its neighboring pixels, which depicts the image texture clearly. The GGCM-based texture features are calculated from the ROI images. A gradient image is obtained by computing the gradient of each pixel in the ROI image with a 3 × 3 Sobel operator. The gradient images are normalized to 32 gray levels to reduce the computation time. The GGCM is then built from the ROI image and the gradient image. A total of 15 statistical characteristics including small gradient advantage, big gradient advantage, inhomogeneity of grayscale distribution, non-uniformity of gradient distribution, energy, gray mean, gradient mean, grayscale standard deviation, gradient standard deviation, correlation, gray entropy, gradient entropy, entropy of mixing, differential moment, and inverse moment can be acquired by using GGCM.
Each kind of feature represents one aspect of the image. Therefore, it is necessary to study the fusion of features. To simplify our presentation, we introduce the following abbreviations: the fusion of FO and GLCM features is abbreviated as FO_GLCM, the fusion of FO and GGCM features as FO_GGCM, the fusion of GLCM and GGCM features as GLCM_GGCM, and the fusion of FO, GLCM, and GGCM features as FO_GLCM_GGCM.
The next step is dimension reduction for improving the classification efficiency. Principal component analysis (PCA) [25] is adopted here to reduce the feature dimension and minimize the feature space.
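To make the GGCM construction described in this section concrete, the following Python sketch builds the joint gray level and gradient matrix of a small image. It is an illustrative sketch under stated assumptions: border pixels are replicated before applying the Sobel operator, the gradient magnitude is used as the gradient value, both images are linearly rescaled to integer levels, and only two of the 15 characteristics (gray mean and gradient mean) are computed. None of these choices is specified in the text, and all function names are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude from the 3 x 3 Sobel operator."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Replicate border pixels (an assumption; the text gives no border rule).
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    out = []
    for r in range(rows):
        line = []
        for c in range(cols):
            gx = sum(SOBEL_X[a][b] * px(r + a - 1, c + b - 1)
                     for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * px(r + a - 1, c + b - 1)
                     for a in range(3) for b in range(3))
            line.append(math.hypot(gx, gy))
        out.append(line)
    return out


def to_levels(values, levels):
    """Linearly rescale a 2-D array to integer levels 0..levels-1."""
    lo = min(map(min, values))
    hi = max(map(max, values))
    if hi == lo:
        return [[0] * len(row) for row in values]
    return [[min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in row]
            for row in values]


def ggcm(gray, levels=32):
    """Normalized joint histogram of gray level (rows) and gradient level (columns)."""
    g = to_levels(gray, levels)
    d = to_levels(sobel_magnitude(gray), levels)
    n = len(gray) * len(gray[0])
    h = [[0.0] * levels for _ in range(levels)]
    for g_row, d_row in zip(g, d):
        for i, j in zip(g_row, d_row):
            h[i][j] += 1.0 / n
    return h


def marginal_means(h):
    """Gray mean and gradient mean of a normalized GGCM."""
    gray_mean = sum(i * v for i, row in enumerate(h) for v in row)
    gradient_mean = sum(j * v for row in h for j, v in enumerate(row))
    return gray_mean, gradient_mean


# Hypothetical 4 x 4 image with a vertical edge, for illustration only.
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
h_edge = ggcm(edge, levels=2)
```

In the edge image the two columns next to the edge receive the high gradient level, so each of the four (gray, gradient) combinations holds a quarter of the pixels.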

Classification stage
The K-nearest neighbors (KNN) [26] and support vector machines (SVM) [27,28] are used as classification approaches in our work.
In KNN classification, a sample is classified according to its distances to the samples in the training data set: it is assigned the majority class among its k nearest neighbors. In this paper, different neighbor numbers k and different distance functions are studied to obtain the best classification result.
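The KNN rule with interchangeable distance functions can be sketched as follows. This is a minimal Python illustration rather than the Matlab implementation used in the paper; the function names and the toy two-dimensional feature vectors are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_x, train_y, query, k=1, distance=euclidean):
    """Return the majority label among the k training samples nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda s: distance(s[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors and labels, for illustration only.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["normal", "normal", "severe", "severe"]
label_1nn = knn_predict(train_x, train_y, [5.0, 4.0], k=1)
label_3nn = knn_predict(train_x, train_y, [5.0, 4.0], k=3, distance=cityblock)
```

Swapping the `distance` argument is all that is needed to compare metrics for a fixed k.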
The SVM classifier aims at finding the hyperplane with the largest margin between two classes of samples in the feature space. Kernel functions project the data from a low-dimensional space into a high-dimensional feature space to improve linear separability as much as possible. Different kernel functions and kernel scales are used. Liver fibrosis staging is a multi-class problem, so a suitable multi-class classifier must be constructed. In this paper, the one-versus-one method is applied to combine multiple binary classifiers into a multi-class classifier.
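The one-versus-one scheme mentioned above trains one binary classifier per pair of classes and lets them vote, which gives six binary classifiers for the four fibrosis grades. The Python sketch below illustrates only this combination step. To stay short and self-contained it uses a nearest-centroid rule as a stand-in binary learner instead of an SVM; that substitution, the function names, and the one-dimensional toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def train_centroid_pair(xs, ys, a, b):
    """Stand-in binary learner (NOT an SVM): picks the nearer class mean."""
    def mean(label):
        rows = [x for x, y in zip(xs, ys) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    ca, cb = mean(a), mean(b)

    def decide(x):
        da = sum((u - v) ** 2 for u, v in zip(x, ca))
        db = sum((u - v) ** 2 for u, v in zip(x, cb))
        return a if da <= db else b

    return decide


def train_one_vs_one(xs, ys, train_pair=train_centroid_pair):
    """Train one binary classifier for every unordered pair of classes."""
    classes = sorted(set(ys))
    return [train_pair(xs, ys, a, b) for a, b in combinations(classes, 2)]


def predict_one_vs_one(models, x):
    """Each pairwise classifier casts one vote; the most voted class wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-dimensional features for the four grades.
xs = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0], [30.0], [31.0]]
ys = ["normal", "normal", "mild", "mild",
      "moderate", "moderate", "severe", "severe"]
models = train_one_vs_one(xs, ys)
predicted = predict_one_vs_one(models, [19.0])
```

Any binary learner with the same interface, such as a kernel SVM, could be passed as `train_pair` without changing the voting logic.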
The performance of classification depends on both the training and test samples. Ten-fold cross validation is used to train and evaluate the classification models in our work. The total number of samples is 11,169. These samples are divided into 10 groups, of which nine groups have 1118 samples and one group has 1107 samples. Cross validation is repeated 10 times, with each group used once for validation. The experimental results reported in our paper are the average of the 10 trials.
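The fold sizes quoted above can be checked with a short Python sketch. The group size of 1118 is taken from the text; splitting the samples in consecutive index order is a simplification for illustration, since the text does not say how samples are assigned to groups (in practice the samples would normally be shuffled first). The function names are hypothetical.

```python
def split_into_folds(n_samples, fold_size):
    """Cut sample indices 0..n_samples-1 into consecutive groups of fold_size."""
    indices = list(range(n_samples))
    return [indices[s:s + fold_size] for s in range(0, n_samples, fold_size)]


def cross_validation_rounds(folds):
    """Yield (train_indices, test_indices), holding out each fold exactly once."""
    for held_out in range(len(folds)):
        test = folds[held_out]
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, test


folds = split_into_folds(11169, 1118)
sizes = [len(fold) for fold in folds]
rounds = list(cross_validation_rounds(folds))
```

Here `sizes` contains nine groups of 1118 samples and one group of 1107 samples, and each of the 10 rounds trains on the nine groups that are not held out.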
Accuracy, specificity, and sensitivity, together with training time and prediction speed, are used to evaluate the performance of the classifiers. Parallel computing can be utilized to improve the efficiency [29,30].
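For a four-class problem, sensitivity and specificity are usually computed per class by treating that class as positive and all other classes as negative. The Python sketch below follows this one-versus-rest convention, which is an assumption, since the text does not state how the multi-class values are obtained. The function name and the toy labels are hypothetical.

```python
def per_class_metrics(y_true, y_pred):
    """Overall accuracy plus one-versus-rest sensitivity and specificity."""
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        tn = n - tp - fn - fp                          # true negatives
        report[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        }
    return accuracy, report


# Hypothetical labels, for illustration only.
y_true = ["normal", "normal", "mild", "mild", "severe", "severe"]
y_pred = ["normal", "mild", "mild", "mild", "severe", "severe"]
accuracy, report = per_class_metrics(y_true, y_pred)
```

In this toy case one normal sample is predicted as mild, which lowers the sensitivity of the normal class and the specificity of the mild class.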

Results and analysis
In our experiments, FO, GLCM, GGCM, FO_GLCM, FO_GGCM, GLCM_GGCM, and FO_GLCM_GGCM are separately calculated. Each type of features is classified using the KNN and SVM classifier separately.

Performances of different features using KNN classifier
The performance of the KNN classifier is affected by the number of neighbors (k) and the distance metric. The KNN classifiers are tested with different numbers of neighbors and different distance metrics to find the best parameters for staging hepatic fibrosis. The number of neighbors is varied between 1 and 50, and the distance metrics considered are Euclidean distance, cosine distance, Minkowski distance, Mahalanobis distance, city block distance, Chebychev distance, correlation distance, Hamming distance, Jaccard distance, and standardized Euclidean distance. Experimental results show that the KNN classification accuracy is the highest when the number of neighbors is set to 1. Therefore, in the following experiments, the number of neighbors is set to 1. Figure 3 shows the accuracy of KNN classifiers with different distance metrics. Minkowski distance, Euclidean distance, and cosine distance achieve the best classification accuracy among the ten distance metrics. Table 1 shows the performance of the KNN classifiers with Euclidean distance, cosine distance, and Minkowski distance in more detail. We can observe that when the KNN classifier is used in staging fibrosis, Minkowski distance gives the highest accuracy and sensitivity, cosine distance the highest specificity, and Euclidean distance the highest efficiency. Although the accuracy of the KNN classification obtained with Euclidean distance is slightly lower than that obtained with Minkowski distance, its training time is shorter and its prediction speed is much faster. Thus, when KNN is used for staging fibrosis, we can choose Minkowski distance to pursue accuracy and sensitivity, Euclidean distance to pursue efficiency, and cosine distance to pursue specificity.
The performance of KNN with each feature set is provided in Table 2. In the KNN, the number of nearest neighbors is set to 1, and Euclidean distance is used as the distance metric. The accuracies of all KNN classifiers based on different types of features are higher than 98.5%. Among the classification results of single type features, GGCM achieves the highest classification accuracy and specificity by using only 15 features, and GLCM achieves the highest accuracy and sensitivity by using 36 features. Compared with GLCM, the training time of GGCM is shorter, and the prediction speed is faster. Thus, although the accuracies obtained by GLCM and GGCM are comparable, it is more suitable to use GGCM. GGCM features are the best choice for grading liver fibrosis with single type features and KNN. The accuracy, sensitivity, and specificity of KNN classifiers are improved by the fusion of different types of features. Among the classification results of the fusion features with two types of features, the accuracies are the same. However, FO_GLCM obtains the highest sensitivity, FO_GGCM obtains the highest specificity, and FO_GGCM obtains the highest efficiency. With regard to the fusion of FO, GLCM, and GGCM, the accuracy is improved to 99.8%, and the specificity is improved to 99.92%. FO_GLCM_GGCM obtains the highest accuracy and specificity among all the features using the KNN classifier. To conclude, if the KNN classifier is used for staging hepatic fibrosis, the FO_GLCM_GGCM feature is the best choice for pursuing high accuracy and a low misdiagnosis rate, the FO_GLCM feature is the best choice for pursuing diagnosis accuracy, and GGCM is the best choice for pursuing efficiency. Figure 4a, b shows the training time and testing speed of KNN classifiers with different numbers of features, respectively. The training time increases, and the testing speed decreases with the growth of the number of features.
We can conclude that the training time of KNN classifier with fusion feature is longer, and the prediction speed is slower, because the number of fusion features is more than that of single type of features.

Performances of different types of features using SVM classifier
The kernel function and kernel scale have a significant influence on the performance of SVM classifiers. SVM classifiers with different kernel functions and kernel scales are trained to find the parameters that are best suited to classifying hepatic fibrosis. Four kernel functions are adopted, i.e., the linear kernel, quadratic polynomial kernel, cubic polynomial kernel, and Gaussian kernel. Kernel scales are varied between 0.01 and 2 at an interval of 0.01. The performances of the SVM classifiers with the best kernel scale and different kernel functions are given in Table 3. When the SVM classifier is used in staging fibrosis, the cubic kernel function yields the highest accuracy and specificity among the four kernel functions, but its training time is longer. The Gaussian kernel obtains the highest sensitivity, and it is the most efficient in the training phase as its training time is the shortest. However, its accuracy, specificity, and prediction speed are slightly lower than those of the cubic kernel function. Therefore, when the SVM classifier is applied to staging fibrosis, the cubic kernel can be chosen to pursue accuracy and specificity. On the other hand, if the sensitivity and training speed of the algorithm are considered, the Gaussian kernel function is a competitive candidate. The performances of SVM with different types of features are given in Table 4. In these SVM classifiers, the Gaussian kernel is adopted.
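The four kernel functions and the grid of kernel scales can be written down as in the Python sketch below. The text does not give the kernel formulas, so the definitions here are assumptions: they follow the convention documented for Matlab's SVM training function, in which the inputs are divided by the kernel scale, the polynomial kernel of order q is (1 + x·y)^q, and the Gaussian kernel is exp(-||x - y||^2). The function names are hypothetical.

```python
import math


def _scaled(x, scale):
    # Assumed convention: every input element is divided by the kernel scale.
    return [v / scale for v in x]


def linear_kernel(x, y, scale=1.0):
    xs, ys = _scaled(x, scale), _scaled(y, scale)
    return sum(a * b for a, b in zip(xs, ys))


def polynomial_kernel(x, y, order, scale=1.0):
    # order=2 gives the quadratic kernel, order=3 the cubic kernel.
    return (1.0 + linear_kernel(x, y, scale)) ** order


def gaussian_kernel(x, y, scale=1.0):
    xs, ys = _scaled(x, scale), _scaled(y, scale)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xs, ys)))


# Grid of kernel scales from 0.01 to 2 at an interval of 0.01.
kernel_scales = [round(0.01 * i, 2) for i in range(1, 201)]
```

A parameter search in this setting would train one classifier per kernel and kernel scale and keep the combination with the best cross-validated result.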
The accuracies of all SVM classifiers based on different types of features are higher than 98.4%. Among the SVM classification results of single type features, GLCM achieves the highest accuracy and highest sensitivity; GGCM obtains the highest specificity with relatively shorter training time and faster prediction speed. Although the classification accuracy and sensitivity obtained by using GGCM features are slightly lower than those of GLCM features, the number of features is far smaller, the training time is shorter, and the prediction speed is faster. Hence, using GGCM features is more efficient for feature extraction and classification than using GLCM features. The accuracy, sensitivity, and specificity of SVM classifiers are improved by the fusion of two types of features. Among the classification results of the fusion features with two types of features, FO_GLCM and GLCM_GGCM obtain the highest accuracy, FO_GLCM the highest sensitivity, and FO_GGCM the highest specificity. Moreover, with the fusion of all the types of features, the accuracy is improved to 99.81%, the sensitivity to 99.69%, and the specificity to 99.69%. The training time is 48.903 s, and the prediction speed is 10,000 objects per second. The conclusion is that when the SVM classifier is used for staging liver fibrosis, the highest accuracy, sensitivity, and specificity can be obtained with FO_GLCM_GGCM. In other words, the FO_GLCM_GGCM feature is the best choice when pursuing accuracy, a low misdiagnosis rate, and high diagnosis accuracy.

Feature dimension reduction
The above experimental results show that the accuracy, sensitivity, and specificity of KNN classifier are improved by the fusion of different types of features compared with single type features. However, with the growth of fusion feature dimension, the efficiency of classification may decrease, which means that the training time will increase, and prediction speed will decrease. To cope with this issue, PCA method is used to reduce dimensions as well as improve the efficiency of classifier. The relationship between the accuracy of KNN classifier and PCA dimensionality reduction is shown in Fig. 5. It is clearly visible that the classification accuracy increases with the growth of feature dimension. However, when the feature dimension increases to a certain amount, accuracy becomes stable. Figure 6 shows the performance of the classifiers before and after dimensionality reduction using KNN classifiers with different types of features. From Fig. 6, it can be seen that PCA dimension reduction improves classification accuracy, sensitivity, and specificity, reduces training time, and speeds up prediction. With the PCA dimensionality reduction, we can use fewer features to achieve higher accuracy, sensitivity, and specificity. This indicates that the feature has redundant information that can be removed by PCA. After dimension reduction of FO_GLCM_GGCM feature by PCA, the highest accuracy reaches 99.99%, the highest sensitivity 100%, and the highest specificity 99.97%.
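The dimension reduction step evaluated in this section can be illustrated with a small PCA sketch in Python. To remain self-contained it finds the principal components by power iteration with deflation, which is a technique swapped in for illustration; a practical implementation would rely on a library eigenvalue or singular value decomposition, and the text does not say which algorithm the authors' Matlab code uses. The function names, the starting vector, and the toy data are assumptions.

```python
import math


def covariance(rows):
    """Center the data and return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return centered, cov


def top_component(cov, iterations=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(cov)
    # Non-uniform start vector; it could still be orthogonal to the target
    # direction in rare cases, which this sketch does not handle.
    v = [1.0 / (j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue, v


def pca(rows, n_components):
    """Return (components, scores) for the leading principal components."""
    centered, cov = covariance(rows)
    d = len(cov)
    components = []
    for _ in range(n_components):
        lam, v = top_component(cov)
        components.append(v)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centered]
    return components, scores


# Hypothetical, perfectly correlated two-feature data (second = 2 x first).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
components, scores = pca(data, n_components=1)
```

Because the two toy features are perfectly correlated, a single component along the direction (1, 2) carries all of the variance, which is the kind of redundancy that PCA removes from the fused feature vectors.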

Discussions
The experimental results given in the above sections are discussed as follows.

Performance of different classifiers
Figure 7 shows the performance of KNN and SVM for different feature sets. It is clearly visible that (1) for 6/7 feature sets, the accuracy of SVM is better than that of KNN; (2) for 5/7 feature sets, the sensitivity of SVM is better than that of KNN; (3) for 6/7 feature sets, the specificity of SVM is better than that of KNN; (4) the training time of KNN is far lower than that of SVM, so the training speed of KNN is much faster; and (5) for the two feature sets with low feature dimension, FO and GGCM, the prediction speed of KNN is faster than that of SVM, while for the feature sets with high feature dimension, the prediction speed of SVM is faster than that of KNN. In general, SVM is better than KNN in classification performance, and KNN is better than SVM in efficiency.

Performance of different features
It can be discovered from Tables 2 and 4 that GLCM and GGCM are capable of achieving better accuracy, sensitivity, and specificity. On the other hand, the performance of FO is worse. This indicates that texture features extracted from the GLCM and GGCM are more discriminative than the FO in staging of hepatic fibrosis. Therefore, GGCM and GLCM are more suitable than FO for characterizing different grades of liver fibrosis, when a single type of feature is chosen to represent the degree of fibrosis.
Fusion of different types of features can improve the classification performance. Especially, the best classification results are obtained by the fusion of three types of features. There is complementary information useful for grading liver fibers among different types of features. Although the classification performance of single FO feature is not ideal, the classification performance is improved, when combined with GLCM or GGCM. Thus, we can conclude that FO, GLCM, and GGCM all contain valuable information for fibrosis grading.

Best classifier and feature combination
Tables 2 and 4 show that the combination of the SVM classifier and the FO_GLCM_GGCM feature achieves the highest accuracy; the combination of the KNN classifier and the FO_GLCM feature achieves the highest sensitivity; and the combination of the KNN classifier and FO_GLCM_GGCM achieves the highest specificity. The shortest training time is achieved by the KNN classifier with FO features. When both the performance and the efficiency of the classifier are taken into consideration, combining the KNN classifier with GGCM or FO_GLCM_GGCM is a good choice.

The effect of dimension reduction
After dimensionality reduction by PCA, a higher accuracy can be achieved by using fewer features. This shows that the features contain redundant information that can be removed, and that dimension reduction by PCA can improve the efficiency of grading liver fibrosis. From the experimental results presented in this paper, we find that using FO_GLCM_GGCM after dimensionality reduction by PCA with the KNN classifier achieves the highest accuracy, sensitivity, and specificity.

Performance comparison with the previous study
Fig. 6 The performances of KNN classifier with different types of features before and after dimensionality reduction. a Feature numbers. b Accuracy. c Sensitivity. d Specificity. e Training time. f Prediction speed.

In our preliminary work [22], the GLCM texture feature and a BP classifier were used to classify liver fibrosis. Figure 8 shows the results when the GLCM texture feature is classified by the BP classifier in [22] and by the KNN and SVM classifiers in this paper. It can be seen that the accuracy, sensitivity, and specificity of KNN and SVM in this paper are better than those of the BP classifier in [22].
In addition to the further study of GLCM from [22], this paper also studies FO, GGCM, and fusion features. Figure 9 shows the performance of different features using the same classifier. It can be seen that, except for the FO feature, the accuracy and specificity of the other features are better than those of GLCM. The sensitivity of FO_GLCM, FO_GGCM, and GLCM_GGCM is better than that of GGCM. This paper therefore presents feature extraction methods that describe hepatic fibrosis better than GLCM alone. Table 5 summarizes similar studies that have been carried out on the staging of liver fibrosis. Most of the results are based on US [11,12], CT [13,15,21], and MRI [21] images. The work in [22] and this paper explore the DEI-based staging of hepatic fibrosis. According to the accuracies in Table 5, DEI images have certain advantages over US, CT, and MRI in hepatic fibrosis staging. In our preliminary work [22], a BP network was applied to GLCM texture features of DEI images to discriminate liver fibrosis, achieving an accuracy rate of 95.1%. In the present paper, more types of texture features, fusion features, and features after dimensionality reduction are proposed to describe the hepatic fibers in DEI images, and more machine learning algorithms are applied to automatically classify different fibrosis stages. The classification performance has been significantly improved.

Conclusions
DEI images are used in this paper because they contain much more information on soft tissue morphology than conventional absorption radiography. In the proposed framework, different feature extraction strategies and classification methods are adopted and compared, which can provide a reference for the appropriate selection of models and parameter settings in hepatic fibrosis grading. Our approach can efficiently classify the hepatic fibrosis DEI images into four classes with a classification accuracy of 99.99%. DEI has been shown to be a promising noninvasive technique for staging hepatic fibrosis. The experimental results also demonstrate that grading liver fibrosis using DEI images has potential in clinical applications. In our future work, new liver fibrosis grading approaches based on deep learning techniques will be investigated.