
Hybrid model-based early diagnosis of esophageal disorders using convolutional neural network and refined logistic regression

Abstract

Accurate diagnosis of the stage of esophageal disorders is crucial in the treatment planning for patients with esophageal cancer and in improving the 5-year survival rate. The progression of esophageal cancer typically begins with precancerous esophageal disorders such as gastroesophageal reflux disease (GERD), esophagitis, and non-dysplasia Barrett’s esophagus, eventually advancing to low- and high-dysplasia Barrett’s esophagus and ultimately to esophageal adenocarcinoma (EAC). The majority of prior research efforts have primarily focused on the identification of general gastrointestinal (GI) tract diseases and the detection of esophageal cancer, with limited attention to the diverse spectrum of esophageal disorders. To address this gap, an innovative framework called Hybrid Model-Based Esophageal Disorder Diagnosis (HMEDD) is developed in this work. The primary goal of HMEDD is to enable early diagnosis of various esophageal disorders using gastroscopic images. HMEDD combines the feature extraction capabilities of an Esophageal Convolutional Neural Network (EsoNet) with the high classification accuracy of a Refined Logistic Regression (RLR) model. EsoNet comprises 14 weight layers and kernels \(\left(3\times 3\right)\) used for high-level deep feature learning. Esophageal disorders are classified using the RLR model, which is developed by fine-tuning hyperparameters in the traditional Logistic Regression (LR) model using Random Search Cross-Validation (RandomizedSearchCV). HMEDD is extensively validated using a data set containing numerous esophageal abnormalities captured through gastroscopic images. The results of this work demonstrate the effectiveness of HMEDD in accurately classifying different esophageal disorders, with an impressive accuracy of 92.15%. These findings will assist physicians in the accurate early diagnosis of esophageal disorders, ultimately preventing their progression to cancer.

1 Introduction

A gastroscope is a sophisticated diagnostic imaging instrument that offers high-quality imaging of vital esophageal tissues [1, 2]. It uses a flexible optical fiber to guide light into the esophageal cavity and an image sensor, such as a complementary metal oxide semiconductor (CMOS) or a charge-coupled device (CCD), to capture reflections from the mucous membrane within the cavity [3, 4]. It then transforms these reflections into electronic signals through a conversion process. Following a sequence of electrical signal processing steps, it generates gastroscopic images depicting esophageal mucosa [5, 6]. Currently, these gastroscopic images play a crucial role in the examination and diagnosis of upper gastrointestinal (GI) disease [7].

In clinical settings, a range of esophageal diseases can be identified by gastroscopic analysis of the esophagus. Based on this analysis, the esophagus can be broadly categorized into the following four groups: normal esophagus, esophagus with precancerous diseases, esophagus with early stage cancer, and esophagus with advanced-stage cancer. The latter two categories depend on the depth of cancer cell invasion beneath the mucous membrane [8]. Patients diagnosed with advanced-stage esophageal cancer have a 5-year survival rate ranging from 15% to 25%, while those with early stage esophageal cancer exhibit a significantly higher 5-year survival rate, reaching as high as 92–93% [9,10,11]. Thus, precise identification of esophageal disorders is essential for facilitating appropriate treatment planning and improving the 5-year survival rate among esophageal cancer patients [12]. Nevertheless, the diagnostic procedure using a gastroscope poses various challenges, including clinician fatigue, lack of experience, and diverse appearance of lesions. These factors may increase the risk of misdiagnosis and missed diagnoses [7, 8, 13, 14]. Computer-aided diagnosis (CAD) has proved to be effective in addressing many of these challenges, significantly enhancing the accuracy and efficiency of the diagnosis of esophageal disorders [15, 16].

The application of CAD methods in gastroscopic image processing primarily involves three tasks: classification, segmentation, and object detection. In the classification task, an imagewise categorization of the lesion type is carried out, requiring only imagewise labeled data for training [17,18,19]. In the segmentation task, a pixelwise categorization is conducted, indicating the type of lesion for each pixel [20, 21]. In the object detection task, both the lesion type and its location within the image are predicted [15, 22].

Automated classification helps doctors swiftly screen large sets of gastroscopic images for lesions, enabling them to distinguish between various diseases and saving significant time, which is crucial in clinical settings. Conventional classification approaches generally use manually designed algorithms for image feature extraction and a classifier, e.g., a support vector machine (SVM), for classification [23, 24]. Fundamental handcrafted features such as color and texture are extracted from images and then processed using classifiers such as SVM and random forest (RF) [25,26,27]. Esophageal images are characterized using classic descriptors such as scale-invariant feature transform (SIFT) and speeded-up robust features (SURF). Methods such as Bayesian classifiers and optimum path forests are used to carry out patch-based classification [28]. Conversely, deep learning can learn discriminative features directly from unprocessed (raw) images, without the need for manual design of descriptive features. This method eliminates the reliance on prior knowledge and shows significant performance enhancements in comparison with traditional methods [29], e.g., cervix-type image classification using deep convolutional neural networks (CNNs) [30], herniation detection using a hybrid deep learning model [31], and face recognition using self-attention [32].

For example, in a previous study [17], a classification approach based on GoogleNet was developed to differentiate between non-cancerous lesions and malignant esophageal squamous cell carcinoma in endoscopic images. In another study [18], a pretrained CNN was fine-tuned to classify three gastric diseases using magnifying narrow-band images. Furthermore, a CNN model based on the same approach as [18] was developed in another study [19] to ascertain the invasion depth of gastric cancer from gastroscopic images. Another diagnostic system, utilizing a two-stream CNN based on Inception-ResNet, was developed [33]. This system, which was discussed in detail in [34], effectively identifies precancerous disease and esophageal cancer from endoscopic images. In another study that focused on data augmentation, an advanced convolutional generative adversarial network model was introduced [35], and LeNet-5 was subsequently used as a classifier to identify adenocarcinoma and Barrett’s esophagus. Moreover, esophageal adenocarcinoma (EAC) regions were identified in a study by extracting features using a 50-layer deep residual network (ResNet) [36]. This network was initialized with ImageNet parameters, using transfer learning (TL) to learn features from endoscopic images. Similarly, a previous study used TL with ImageNet parameters and architectures such as GoogleNet, VGG16, and AlexNet to extract features [37]. Then, to detect EAC, the extracted features were classified using traditional classifiers such as SVM and RF.

In a previous study [38], various deep learning techniques such as region-based CNN (R-CNN) [39], fast R-CNN [40], faster R-CNN [41], and single-shot detector (SSD) [42] were evaluated. In these methods, features obtained from the VGG16 network were used, and bounding boxes were generated to locate EAC regions in endoscopic images. In another study [43], handcrafted Gabor filter responses were computed from endoscopic images and then combined with features extracted by a DenseNet CNN. The merged features were fed into faster R-CNN to identify esophagitis and EAC. In another study [44], a hybrid deep learning model was used to classify esophageal lesions into three categories: normal, esophagitis, and esophageal cancer. In addition, a novel and highly efficient deep-dense CNN designed specifically for classifying esophageal diseases was proposed in a study [45]. This network classified esophageal images into four primary types: normal esophagus, precancerous esophageal conditions, early stage esophageal cancer, and advanced esophageal cancer. However, the existing methods, which are primarily based on machine learning (ML) and deep learning, are limited in their ability to simultaneously categorize a comprehensive range of esophageal disorders. Previous approaches primarily focused on identifying specific abnormal regions, predominantly cancerous conditions such as EAC. Notably, the literature lacks studies that identify diverse esophageal disorders, especially in their early precancerous stages. The majority of previous studies focused only on cancerous stages, with a few addressing one or two precancerous disorders. To address these gaps, in this work, an innovative hybrid model is introduced that aims at early diagnosis of a variety of esophageal disorders using gastroscopic images.

The key contributions of the work are as follows:

  1. Integrating the EsoNet and RLR models, a novel HMEDD model is developed for the early diagnosis of a diverse range of esophageal disorders.

  2. A new EsoNet architecture is proposed to learn high-level features from preprocessed images.

  3. An RLR model is presented to classify images into various types of esophageal disorders based on the high-level features learned.

  4. The performance of the proposed architecture is assessed using accuracy, precision, recall, F1 score, and error rate metrics on a test set.

The structure of this paper unfolds as follows: Sect. 2 elaborates on the proposed methodology. Section 3 outlines the data set used for evaluation, whereas Sect. 4 presents the experimental results and a comprehensive discussion. Section 5 presents the conclusions drawn from this work.

2 Proposed methodology

The HMEDD framework for the automatic early diagnosis of a diverse range of esophageal disorders is presented in Fig. 1, which consists of two primary stages: deep feature extraction and classification of esophageal disorders. In the former, high-level features are extracted using the EsoNet architecture. In the latter, the extracted features, which encapsulate essential information from gastroscopic images, are fed into the RLR model for the classification of various esophageal disorders. The combination of these two stages forms the core of the HMEDD framework. In this section, a comprehensive exploration of each subtask within the hybrid model is presented.

Fig. 1 Overall proposed methodology for early diagnosis of various esophageal disorders using HMEDD

2.1 Preprocessing

Image augmentation was used as a preprocessing step when training the EsoNet model on gastroscopic images to reduce overfitting. It is a vital technique for training deep learning models because it increases the diversity of the training data set without collecting additional data manually. This in turn helps the model generalize to real-world scenarios and reduces the risk of overfitting to the original training data. For the classification of esophageal disorders, six augmentation techniques, namely, horizontal flip, vertical flip, brightness adjustment, zoom range, rotation, and shear range, implemented via the Keras preprocessing library, enhance the diversity and robustness of the training data set by simulating various real-world conditions that may be encountered in gastroscopic images. In the horizontal flip technique, the image is flipped along its horizontal axis, creating a mirrored version. For esophageal disorders, this technique can be beneficial when handling variations in the orientation of abnormalities within the esophagus, as it improves the model’s ability to recognize features from different perspectives. Similarly, in the vertical flip technique, a vertically mirrored representation of the image is generated. This technique can be especially useful in scenarios where the orientation of lesions or abnormalities in the esophagus may be inverted. In the brightness adjustment technique, the brightness of the image is altered within a specified range (with values between 1.0 and 1.2), thus mimicking changes in lighting conditions during gastroscopic procedures. This technique helps the model adapt to varying levels of illumination, a common scenario in medical imaging.
Introducing a zoom effect with a specified range (zoom of 0.2) allows the model to learn from images with different scales. This technique is relevant for the classification of esophageal disorders as abnormalities may appear at various distances from the gastroscopic camera. By rotating the image by 30 degrees, the model can recognize features from different angles. In esophageal disorders, where the shape and orientation of abnormalities vary, this technique aids in enhancing the model’s robustness. Shear transformations with a specified range (shear of 0.2) simulate deformations in the image. This is particularly important while handling variations in the appearance of lesions or abnormalities, such as distortions caused by the gastroscopic procedure. The outcomes of these augmentation techniques are demonstrated in Fig. 2, showcasing their impact on the appearance of esophageal gastroscopic images. The augmented data set, which is enriched with these variations, enhances the model’s ability to generalize and accurately classify a diverse range of esophageal disorders during training.
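To make the flip and brightness transformations concrete, the following is a minimal library-free sketch on a toy grayscale image stored as nested lists. It is not the Keras pipeline used in this work: the function names, the 0.5 flip probability, and the toy image are assumptions made for illustration, and zoom, rotation, and shear are omitted.

```python
import random

def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def vertical_flip(image):
    """Reverse the order of the rows (top-to-bottom mirror)."""
    return image[::-1]

def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clipping to the 8-bit maximum."""
    return [[min(255, round(p * factor)) for p in row] for row in image]

def augment(image, rng):
    """Randomly flip, then brighten by a factor drawn from [1.0, 1.2].

    The 0.5 flip probability is an assumption, not a value from the paper.
    """
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = vertical_flip(image)
    return adjust_brightness(image, rng.uniform(1.0, 1.2))

# Tiny 2 x 3 grayscale "image" used only for illustration.
toy = [[10, 20, 30],
       [40, 50, 250]]
augmented = augment(toy, random.Random(0))
```

Each call to `augment` yields a different variant of the same image, which is how augmentation enlarges the effective training set without new acquisitions.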

Fig. 2 Different image augmentations on esophageal images for the early diagnosis of various esophageal disorders: a input image; b horizontal flip; c shear 0.2; d rotation 30 degrees; e brightness adjustment (1.2, 1.0); f vertical flip; and g zoom 0.2

2.2 Feature extraction using the EsoNet architecture

The primary purpose of using the EsoNet architecture is to extract features for classifying esophageal disorders, which can be achieved by constructing feature vectors from the output of the last dense layer in the network. Both training and testing subset images are sent through the feature extractor, resulting in the generation of a fixed-length feature vector for each image. These feature vectors effectively capture relevant and discriminative information extracted by the EsoNet model and subsequently serve as the input data for the classification model, which is responsible for predicting esophageal disorders based on the extracted features. This two-step process, involving feature extraction by the EsoNet architecture followed by classification, allows for a robust and effective classification of esophageal disorders.

The design of a general CNN architecture typically incorporates both convolutional and pooling layers, with the former being responsible for extracting intricate features from input images and the latter serving to reduce the dimensionality of feature maps. Following convolutional layers, feature maps are flattened into a one-dimensional array using a flatten layer and then fed into fully connected layers. The final output layer makes predictions based on input images. A similar approach is followed in the case of the proposed EsoNet architecture. The earlier layers consist of convolutional layers that capture important features from input data. However, unlike a traditional CNN for classification, the EsoNet architecture is specifically designed for feature extraction and does not include an output layer for class prediction. The proposed EsoNet architecture comprises an input layer, 12 convolutional layers, 4 max-pooling layers, 14 Leaky ReLU activation layers, 2 dropout layers, and 2 dense layers. The architecture is organized into 5 convolutional slabs, each with a varying number of convolutional layers. After convolutional layers, a flatten layer is applied, which is followed by a sequence of dense layers. Batch normalization is applied after specific layers to enhance training stability, while max-pooling layers reduce the dimensionality of feature maps in subsequent layers. A flow diagram that illustrates the proposed architecture is presented in Fig. 3. The layers of the EsoNet architecture and their activations and parameters are shown in Table 1.

Fig. 3 The proposed EsoNet architecture for deep feature learning in gastroscopic esophageal images

Table 1 Detailed description of proposed EsoNet architecture layers, activations, and parameters

The proposed EsoNet architecture processes input images of 128 × 128 pixels. Before input images are fed into the EsoNet architecture, the six data augmentation techniques are applied, and the resulting images have dimensions 128 × 128 × 3. In the first convolutional slab, which consists of a single convolutional layer, 2D convolution is performed using 32 filters of size 3 × 3. The Leaky Rectified Linear Unit (Leaky ReLU) is used as the activation function at this layer. It extends the standard ReLU activation function: it behaves identically for positive inputs but returns a small negative value for negative inputs instead of zero. This modification helps prevent the dying ReLU issue, where some neurons can become inactive during training. The Leaky ReLU activation function can be expressed as follows:

$$LeakyReLU\left( v \right) = \begin{cases} v, & v > 0 \\ \alpha v, & v \le 0 \end{cases}$$
(1)

where \(v\) is the input value and \(\alpha\) is a small positive slope constant, typically a small fraction like 0.01.
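As a quick illustration of Eq. (1), the activation can be sketched as a scalar function. The default slope of 0.01 is the typical value mentioned above and is assumed here; the slope actually used in EsoNet is not stated in this section.

```python
def leaky_relu(v, alpha=0.01):
    """Eq. (1): identity for positive inputs, slope `alpha` otherwise."""
    return v if v > 0 else alpha * v

# Negative inputs are scaled down rather than zeroed out.
outputs = [leaky_relu(v) for v in (-2.0, 0.0, 3.5)]
```

Because the negative branch keeps a nonzero slope, the gradient does not vanish for negative inputs, which is what mitigates the dying ReLU issue.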

Given that this is the first layer responsible for extracting features from input images, smaller filters such as 3 × 3 are recommended instead of larger filters such as 5 × 5 or 7 × 7. The output size of the convolutional layer in the first slab is 128 × 128 × 32, which is determined by the number of filters (32) and the dimensions of the input (128 × 128). While this yields the initial feature maps extracted from input images, it is important to note that the distribution of inputs can vary significantly between batches, depending on the included image type. This variability can pose challenges for the convergence of optimizer algorithms, potentially destabilizing the training process. To address this issue and facilitate a more stable training process, batch normalization (BN) is applied, which can be expressed as follows:

$$BN\left( v \right) = \frac{v - \mu }{\sqrt {\sigma^{2} + \tau } }$$
(2)

where \(v\) is the input value, \(\mu\) is the mean of the batch, \({\sigma }^{2}\) is the variance of the batch, and \(\tau\) is a small constant added for numerical stability.
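The normalization in Eq. (2) can be sketched on a one-dimensional batch as follows. This is a simplified illustration: the value of `tau` is an assumption, and the learnable scale and shift parameters that framework implementations of batch normalization usually add are left out because Eq. (2) does not include them.

```python
import math

def batch_norm(values, tau=1e-5):
    """Eq. (2): subtract the batch mean, divide by sqrt(variance + tau)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # population variance of the batch
    return [(v - mu) / math.sqrt(var + tau) for v in values]

normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
```

After normalization the batch has zero mean and a variance very close to one, regardless of the scale of the original values.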

Batch normalization keeps the input to each layer at approximately zero mean and unit variance. Hence, it accelerates the training process by promoting faster convergence and reduces the reliance on specific weight initialization strategies. Convolution of an output feature map \(\left(F\right)\) at a given position \(\left(x,y\right)\) in the convolutional slab can be expressed as follows:

$$F\left[ {x,y} \right] = \sum \left( {W*P} \right)\left[ {x,y} \right] + b$$
(3)

where \(W\) is the filter of dimensions 3 × 3 in the convolutional slab, \(P\) is the input feature map at the same position \(\left(x,y\right)\), and \(b\) is the bias term.
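One way to read Eq. (3) is as a sum of elementwise products between the filter and the input patch centered at \(\left(x,y\right)\), plus the bias. The sketch below assumes a single input channel, zero padding outside the image, and the cross-correlation form commonly used in deep learning frameworks; the function name and toy values are illustrative only.

```python
def conv2d_at(P, W, b, x, y):
    """Eq. (3): filter response at (x, y); positions outside P count as zero."""
    k = len(W) // 2  # half-width, 1 for a 3 x 3 filter
    total = b
    for i in range(len(W)):
        for j in range(len(W[0])):
            r, c = x + i - k, y + j - k
            if 0 <= r < len(P) and 0 <= c < len(P[0]):
                total += W[i][j] * P[r][c]
    return total

P = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
W = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]
center = conv2d_at(P, W, 0.5, 1, 1)   # all nine inputs contribute
corner = conv2d_at(P, W, 0.5, 0, 0)   # only the four in-bounds inputs contribute
```

Evaluating the function at every position yields an output map with the same height and width as the input, consistent with the 128 × 128 output of the first slab.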

The batch-normalized tensor, with initial dimensions of 128 × 128 × 32, is forwarded to the second slab, where two consecutive convolutional layers are employed. In these layers, convolution is carried out using 3 × 3 filters, and each step utilizes 64 different filters. Leaky ReLU activation functions are applied throughout the process. This setup enables the extraction of deeper features that may carry more significance for classifying esophageal disorders. The output dimensions of each convolutional layer within the second slab are 128 × 128 × 64. Then, batch normalization is applied once again to the extracted feature maps. In deeper CNNs, it is common to use a larger number of filters in deeper layers to capture deep features. However, this can lead to an increase in the dimensions of feature maps and computational complexity. To address this, max-pooling layers are introduced to downsample feature maps. In this architecture, max-pooling layers of dimensions 2 × 2 are used, reducing the dimensions of feature maps by half. Max-pooling operation for downsampling can be written as follows:

$$P = \max \left( {F\left[ {x:x + 2,\;y:y + 2} \right]} \right)$$
(4)

where \(P\) is the output feature map, \(F\) is the input feature map, and \(x\) and \(y\) iterate over the input feature map with a step size of 2.
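The downsampling in Eq. (4) can be sketched for a single-channel feature map stored as nested lists. The toy values are illustrative; windows are non-overlapping because the step size equals the window size.

```python
def max_pool_2x2(F):
    """Eq. (4): maximum over non-overlapping 2 x 2 windows (step size 2)."""
    return [
        [max(F[x][y], F[x][y + 1], F[x + 1][y], F[x + 1][y + 1])
         for y in range(0, len(F[0]) - 1, 2)]
        for x in range(0, len(F) - 1, 2)
    ]

F = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [7, 0, 3, 3],
     [1, 2, 9, 4]]
P = max_pool_2x2(F)  # a 4 x 4 map becomes 2 x 2
```

Each output value keeps only the strongest response in its window, so height and width are halved while the number of channels is unchanged.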

The output from the preceding block, after max-pooling layers, results in feature maps of dimensions 64 × 64 × 64. These are given as input for the third slab, which mirrors the structure of the previous block. The primary difference lies in having three convolutional layers, each with 3 × 3 filters and 128 different filters, while still employing Leaky ReLU activation. Following convolutional layers, batch normalization and max-pooling layers are applied. Max-pooling layers further reduce the dimensions of feature maps to 32 × 32 × 128.

The fourth slab consists of four consecutive convolutional layers with 256 filters of size 3 × 3, using Leaky ReLU activation. The output size of each convolutional layer in the fourth slab is 32 × 32 × 256. Consistent with the previous slabs, max-pooling is then applied, reducing the feature maps to 16 × 16 × 256.

In the fifth and final slab, which receives input of dimensions 16 × 16 × 256, larger 5 × 5 filters are used. This slab features two convolutional layers with 512 filters and Leaky ReLU activation, followed by a max-pooling layer that reduces the output dimensions to 6 × 6 × 512. This output is flattened into a one-dimensional array using a flatten layer. Then, a dense layer with 2048 neurons and Leaky ReLU activation is applied, followed by a dropout layer with a rate of 0.2. This process is repeated with a second dense layer of 2048 neurons and a final dropout layer with a rate of 0.2. The dense layer operation can be expressed as follows:

$$Dense\left( v \right) = T*v + b$$
(5)

where \(v\) is the input vector, \(T\) is the weight matrix, and \(b\) is the bias term.

These layers collectively enable EsoNet to efficiently learn deep features for the classification of esophageal disorders. The corresponding steps are summarized in Algorithm 1.
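The feature-map dimensions quoted for the five slabs can be checked with a small shape trace. Slabs 1 to 4 use size-preserving ("same") padding, as implied by the reported output sizes. For slab 5, the padding is not stated in this section, so the combination below (one unpadded and one padded 5 × 5 convolution) is an assumption chosen only because it reproduces the reported 6 × 6 × 512 output; Table 1 remains the authoritative source.

```python
def conv_out(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

# (filters, kernel size, padding of each conv layer, max-pool after the slab)
SLABS = [
    (32, 3, ["same"], False),
    (64, 3, ["same"] * 2, True),
    (128, 3, ["same"] * 3, True),
    (256, 3, ["same"] * 4, True),
    (512, 5, ["valid", "same"], True),  # ASSUMPTION: chosen to match 6 x 6 x 512
]

def trace_shapes(size=128):
    """Return the (height, width, channels) produced by each slab."""
    shapes = []
    for filters, kernel, paddings, pool in SLABS:
        for padding in paddings:
            size = conv_out(size, kernel, padding)
        if pool:
            size //= 2  # 2 x 2 max-pooling halves height and width
        shapes.append((size, size, filters))
    return shapes

shapes = trace_shapes()
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
conv_layers = sum(len(paddings) for _, _, paddings, _ in SLABS)
```

Under these assumptions the trace gives 12 convolutional layers, which together with the 2 dense layers matches the 14 weight layers of EsoNet, and a flattened vector of 18,432 values entering the first dense layer.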

A visual representation of feature maps within each convolutional slab in the EsoNet architecture is presented in Fig. 4, illustrating the sequential transformations applied to the input image during feature extraction for the classification of esophageal disorders. Convolutional slabs collectively form the backbone of feature extraction in the EsoNet architecture. In visualization, each layer of convolutional slabs captures specific features from the input image, which are progressively abstracted and refined as the image passes through subsequent slabs. Due to the hierarchical nature of this process, the network learns increasingly complex and discriminative representations of the input and captures intricate patterns associated with different esophageal disorders. The extracted features, which are represented in feature maps, serve as a structured and informative encoding of the input image. These features are then utilized for the classification of esophageal disorders in the subsequent stages. This visual representation helps us understand the EsoNet architecture, which transforms raw gastroscopic images into a structured set of features that can be leveraged for accurate and effective classification of esophageal disorders.

Fig. 4 Visual representation of feature maps of each convolutional slab in the EsoNet architecture: a input image; b conv slab1; c conv slab2; d conv slab3; e conv slab4; and f conv slab5

Algorithm 1 Deep Feature Learning using EsoNet Architecture for Gastroscopic Images

2.3 Classification using Refined Logistic Regression (RLR)

In the proposed esophageal disorder classification model, the classification stage marks the final step. The extracted features are categorized into distinct classes, such as normal esophagus, gastroesophageal reflux disease (GERD), polyps, esophagitis, non-dysplasia Barrett’s esophagus, and dysplasia Barrett’s esophagus. In this work, an RLR model is introduced to accurately classify these esophageal disorders.

RLR is an extension of traditional LR where hyperparameters are tuned using random search cross-validation (RandomizedSearchCV) to improve model performance, as shown in Fig. 5. The key hyperparameters that can be tuned include regularization strength \(\left(\gamma \right)\), solver type, and other LR parameters. The objective function of RLR remains similar to that of traditional LR but includes the tuned hyperparameters, and it can be expressed as follows:

$$Q\left( {\theta ,\gamma ,solver} \right) = - \frac{1}{x}\sum\limits_{j = 1}^{x} \left[ {b^{\left( j \right)} \log \left( {g_{\theta } \left( {a^{\left( j \right)} } \right)} \right) + \left( {1 - b^{\left( j \right)} } \right)\log \left( {1 - g_{\theta } \left( {a^{\left( j \right)} } \right)} \right)} \right] + \frac{\gamma }{2x}\left\| \theta \right\|_{2}^{2}$$
(6)

where \(Q\left(\theta ,\gamma ,solver\right)\) is the RLR objective function, \(\theta\) is the vector of model parameters (coefficients), \(\gamma\) is the regularization strength hyperparameter of the L2 penalty, \(solver\) is the optimization solver used by LR (e.g., liblinear, lbfgs, newton-cg), \(x\) is the total number of samples in the data set, \(j\) is the index for individual samples, \({b}^{(j)}\) denotes the actual label (0 or 1) of the \(j\)th sample, \({g}_{\theta }{(a}^{\left(j\right)})\) is the predicted probability of the \(j\)th sample being class 1 based on parameters \(\theta\), and \(\frac{1}{2x}\) is the normalization factor of the regularization term.
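The objective in Eq. (6) can be sketched numerically under the standard reading of regularized LR: the negative mean log-likelihood plus \(\gamma /2x\) times the sum of squared coefficients. The sketch covers the binary form only, omits an intercept, and uses illustrative function names and toy data.

```python
import math

def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def rlr_objective(theta, samples, labels, gamma):
    """Mean binary log-loss plus (gamma / 2x) * sum of squared coefficients."""
    x = len(samples)
    loss = 0.0
    for a, b in zip(samples, labels):
        g = sigmoid(sum(t * f for t, f in zip(theta, a)))  # predicted P(class 1)
        loss -= b * math.log(g) + (1 - b) * math.log(1 - g)
    penalty = gamma / (2 * x) * sum(t * t for t in theta)
    return loss / x + penalty

samples = [[1.0, 2.0], [-1.0, 0.5]]
labels = [1, 0]
value = rlr_objective([1.0, -1.0], samples, labels, gamma=2.0)
```

A larger \(\gamma\) penalizes large coefficients more strongly. Note that scikit-learn's `LogisticRegression` is parameterized by `C`, the inverse of the regularization strength, so a larger \(\gamma\) corresponds to a smaller `C`.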

Fig. 5 Generation of Refined Logistic Regression (RLR) model by hyperparameter tuning using RandomizedSearchCV

Algorithm 2 Refined Logistic Regression (RLR)

The objective function consists of two terms:

  a) logistic loss, which measures the difference between the predicted probabilities and the true labels; and

  b) the L2 regularization term, which adds the squared values of feature weights to the loss, where regularization strength \(\gamma\) is a hyperparameter.

The RandomizedSearchCV process is carried out to find optimal hyperparameters \(\gamma\) and \(solver\) that minimize the objective function and improve the model’s performance in classification. RandomizedSearchCV explores a range of hyperparameters by sampling different combinations of specified parameter distributions and evaluates each combination through cross-validation before selecting the best-performing set of hyperparameters. In RLR, hyperparameters such as \(\gamma\) and \(solver\) are tuned to achieve better model performance. The exact search space and distributions for hyperparameters are defined while setting up RandomizedSearchCV. Algorithm 2 describes the steps to implement the RLR model.
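The search loop that RandomizedSearchCV automates can be sketched without any library as follows. This is an illustration of the procedure, not the scikit-learn implementation: the log-uniform range for \(\gamma\), the number of iterations, the number of folds, and `toy_score` (a stand-in for fitting LR on the training folds and scoring on the validation fold) are all assumptions.

```python
import math
import random

SOLVERS = ["liblinear", "lbfgs", "newton-cg"]

def sample_params(rng):
    """Draw one candidate: log-uniform gamma and a uniformly chosen solver."""
    return {"gamma": 10 ** rng.uniform(-3, 2), "solver": rng.choice(SOLVERS)}

def k_folds(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    fold = n_samples // k
    for i in range(k):
        val = indices[i * fold:(i + 1) * fold]
        train = indices[:i * fold] + indices[(i + 1) * fold:]
        yield train, val

def random_search(score_fn, n_samples, n_iter=20, k=5, seed=0):
    """Return (best_mean_score, best_params) over n_iter random candidates."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_iter):
        params = sample_params(rng)
        scores = [score_fn(params, train, val)
                  for train, val in k_folds(n_samples, k)]
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_score, best_params = mean_score, params
    return best_score, best_params

def toy_score(params, train, val):
    """Hypothetical placeholder score that peaks at gamma = 1."""
    return 1.0 / (1.0 + abs(math.log10(params["gamma"])))

best_score, best_params = random_search(toy_score, n_samples=100)
```

Unlike an exhaustive grid search, only `n_iter` sampled combinations are evaluated, which keeps the cost fixed as the search space grows.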

3 Materials

The HMEDD approach was experimentally validated using the Hyper Kvasir data set, along with additional open-source images (Kaggle). The complete Hyper Kvasir [46] data set, comprising images, videos, and metadata, can be accessed via the Open Science Framework (OSF) at https://osf.io/mh9sj/ or directly from https://datasets.simula.no/hyper-kvasir/. This data set is open access and facilitates research and development in the field of medical image analysis and classification of gastrointestinal conditions. This data set is a comprehensive collection of images captured from the GI tract, focusing on diseases related to esophageal disorders. It is rich and well annotated, with images meticulously reviewed and verified by medical experts, including experienced endoscopists. It encompasses diseases in both the lower and upper GI tracts; however, in this work, only esophageal disorders are considered. The data set comprises six distinct classes: normal esophagus, GERD, polyps, esophagitis, non-dysplasia Barrett’s esophagus, and dysplasia Barrett’s esophagus. The images in this data set are of varying resolutions and use the RGB color format. The data set comprises a total of 2,198 images after applying various augmentations, including horizontal flip, vertical flip, brightness adjustment, zoom range, rotation, and shear range, implemented via the Keras preprocessing library (described in Sect. 2.1). A detailed description of the data set is provided in Table 2. For the experimental evaluation of the HMEDD framework, the data set was partitioned into a standard 80:20 ratio, where 80% of the images were used for training and the remaining 20% were reserved for testing. This partition allows for robust model training and thorough evaluation while ensuring a balanced representation of data for both the training and testing phases. To provide a visual reference, a subset of sample images from the data set is presented in Fig. 6, demonstrating the diverse range of images included in the data set.
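The 80:20 partition can be sketched as a shuffled index split. The seed, the rounding rule (flooring the training count), and the absence of per-class stratification are assumptions of this sketch; the split procedure actually used may differ.

```python
import random

def train_test_split_indices(n_images, train_fraction=0.8, seed=42):
    """Shuffle image indices and cut them at the training fraction."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    cut = int(n_images * train_fraction)  # floor of the training count
    return indices[:cut], indices[cut:]

# 2,198 is the total number of images reported after augmentation.
train_idx, test_idx = train_test_split_indices(2198)
```

With these assumptions, 2,198 images yield 1,758 training indices and 440 test indices, and no index appears in both subsets.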

Table 2 Comprehensive data set description of esophageal disorders before and after augmentation

Fig. 6 Data set samples showing various types of esophageal disorders (Normal Esophagus, Gastroesophageal Reflux Disease, Polyps, Esophagitis, Non-dysplasia Barrett’s Esophagus, Dysplasia Barrett’s Esophagus)

4 Results and discussion

The outcomes of the proposed method are explored in this section, revealing the robust performance of the HMEDD framework in classifying gastroscopic esophageal images into six categories. In this evaluation, accuracy, precision, recall, F1-score, error rate, and the area under the receiver operating characteristic (ROC) curve (AUC) are used to assess the model’s effectiveness across various aspects of classification. In addition, a comparison with existing methods highlights the superiority of the HMEDD framework in recognizing different types of esophageal disorders. Through a detailed examination of results and comparisons, valuable insights are gained into the effectiveness and potential applications of the proposed approach in the diagnosis of esophageal disorders. The experimental results were obtained using the Python programming language with the open-source Keras library and TensorFlow as the backend. The Python code was written in Jupyter Notebook. The prototype was evaluated on a Lenovo laptop running 64-bit Windows 10, equipped with an Intel(R) Core(TM) i3-7130U processor running at 2.70 GHz, 8 GB of RAM, and 400 GB of hard disk capacity.

4.1 Performance metrics

The evaluation metrics for the proposed method include accuracy \((Ar)\), precision \((Pc)\), recall \((Rc)\), F1-score \((F1s)\), and error rate \((Er)\), which are calculated using Eqs. (7)–(11):

$$Ar = \frac{{\mathop \sum \nolimits_{a} m_{aa} }}{{\mathop \sum \nolimits_{a,b} m_{ab} }}$$
(7)
$$Pc_{b} = \frac{{m_{bb} }}{{\mathop \sum \nolimits_{a} m_{ab} }}$$
(8)
$$Rc_{b} = \frac{{m_{bb} }}{{\mathop \sum \nolimits_{a} m_{ba} }}$$
(9)
$$F1s_{b} = \frac{{2 \times Pc_{b} \times Rc_{b} }}{{Pc_{b} + Rc_{b} }}$$
(10)
$$Er = 1 - Ar$$
(11)

where \(a\) and \(b\) denote the indices of specific categories, while \({m}_{ab}\) represents the number of instances where the \(a\) th category is predicted as the \(b\) th category. Similarly, \({m}_{aa}\) and \({m}_{bb}\) denote the correct predictions for the \(a\) th and \(b\) th categories, respectively. The metrics \(Ar\) (accuracy) and \(F1s\) (F1-score) evaluate the overall classification capabilities. \(Er\) represents the error rate, which indicates the percentage of incorrectly classified instances in the entire set of occurrences. \(Pc\) represents the precision rate, which indicates the accuracy of the identification of disorders, while \(Rc\) represents the sensitivity to disorders, which indicates the ability to correctly identify the instances of disorders.
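The metric definitions can be illustrated with a minimal sketch that uses the same indexing convention, where entry `m[a][b]` counts instances of category `a` predicted as category `b`. Precision for class `b` is taken over everything predicted as `b` (a column sum) and recall over everything that actually belongs to `b` (a row sum). The function name `classification_metrics` and the 3 × 3 toy matrix are illustrative assumptions and are not taken from the study.

```python
def classification_metrics(m):
    """Compute Ar, Er and per-class Pc, Rc, F1s from a square confusion matrix.

    m[a][b] is the number of instances of category a predicted as category b.
    """
    n = len(m)
    total = sum(sum(row) for row in m)
    accuracy = sum(m[a][a] for a in range(n)) / total

    precision, recall, f1 = [], [], []
    for b in range(n):
        predicted_as_b = sum(m[a][b] for a in range(n))  # column sum
        actually_b = sum(m[b][a] for a in range(n))      # row sum
        pc = m[b][b] / predicted_as_b if predicted_as_b else 0.0
        rc = m[b][b] / actually_b if actually_b else 0.0
        f1.append(2 * pc * rc / (pc + rc) if (pc + rc) else 0.0)
        precision.append(pc)
        recall.append(rc)

    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy three-class matrix (made-up counts, for illustration only).
toy_matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
toy_metrics = classification_metrics(toy_matrix)
print(round(toy_metrics["accuracy"], 4))
```

For a six-class problem such as the one studied here, the same function applies unchanged to a 6 × 6 matrix.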

4.2 Results

To evaluate the proposed HMEDD model, the data set was separated into training and testing subsets. The HMEDD model was designed to extract features and classify gastroscopic esophageal images into six distinct classes. The feature extraction process uses EsoNet features, while the classification task is accomplished using RLR. EsoNet can extract high-level features from gastroscopic esophageal images. Through its deep layers, EsoNet effectively learns intricate patterns and representations from raw images and thus enhances the accuracy of the classification process. This efficient representation learning is crucial for capturing subtle details in images, thus enabling the model to distinguish between various esophageal disorders accurately. Moreover, the use of EsoNet results in optimized classification. By concatenating the features extracted from EsoNet, a robust input for the RLR model is generated. This integration significantly improves the accuracy and reliability of the final classification results.

To validate the transition from LR to RLR during training, the training subset is divided into a training set and a validation set. This validation is conducted by estimating accuracy scores for various hyperparameters, using the RandomizedSearchCV technique, and cross-validation. The following primary hyperparameters are considered for tuning: regularization strength, solver, and max iterations. In this tuning process, both L1 and L2 regularization strengths were explored, and their impact on model accuracy is depicted in Fig. 7. Similarly, the accuracy score results for different solvers are presented in Fig. 8, providing insights into their impact on model accuracy. These figures collectively offer a comprehensive understanding of changes in hyperparameters that influence the accuracy of the RLR model. The accuracy scores obtained for L1 and L2 regularization strength parameters in cross-validation with five folds (K = 5) are as follows: 0.7667 with the newton-cg solver, 0.7236 with lbfgs, and 0.706 with liblinear. Furthermore, the accuracy scores achieved with different optimization solvers, namely, newton-cg, lbfgs, and liblinear, are 0.7667, 0.7236, and 0.7536, respectively, in the context of cross-validation with five folds. The optimal hyperparameter configuration, specifically L2 regularization, newton-cg solver, and 124 max iterations, is determined through this process, representing the optimal setup to enhance the accuracy of the model in the classification of esophageal disorders.
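The tuning procedure can be sketched without any library to show what a randomized search with K-fold cross-validation does conceptually: sample a configuration, score it on each fold, and keep the configuration with the best mean score. This is a simplified illustration and not the study's implementation, which used scikit-learn's RandomizedSearchCV. The candidate value ranges, the helper names, and the `score_fn` callback are assumptions; the validity rule reflects the fact that, in scikit-learn, the newton-cg and lbfgs solvers do not support the L1 penalty.

```python
import random
from statistics import mean

# Hyperparameters named in the text; the candidate values are assumptions.
SEARCH_SPACE = {
    "penalty": ["l1", "l2"],
    "solver": ["newton-cg", "lbfgs", "liblinear"],
    "max_iter": list(range(50, 501)),
}


def is_valid(config):
    # newton-cg and lbfgs support only L2; liblinear supports L1 and L2.
    return config["penalty"] == "l2" or config["solver"] == "liblinear"


def sample_config(rng):
    """Draw one valid configuration uniformly at random."""
    while True:
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        if is_valid(config):
            return config


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def random_search_cv(score_fn, n_samples, n_iter=20, k=5, seed=0):
    """Return (best_config, best_mean_score) over n_iter sampled configurations.

    score_fn(config, train_idx, val_idx) is a hypothetical callback that fits
    a classifier on train_idx and returns its accuracy on val_idx.
    """
    rng = random.Random(seed)
    folds = k_fold_indices(n_samples, k, rng)
    best_config, best_score = None, float("-inf")
    for _ in range(n_iter):
        config = sample_config(rng)
        fold_scores = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            fold_scores.append(score_fn(config, train_idx, val_idx))
        mean_score = mean(fold_scores)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score
```

In practice, `score_fn` would train a logistic regression model on the EsoNet features of the training folds and report validation accuracy; K = 5 matches the five-fold setting described above.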

Fig. 7 Results of accuracy scores corresponding to regularization strength during hyperparameter tuning

Fig. 8 Results of accuracy score for different optimization solvers during hyperparameter tuning

The performance of the HMEDD model was thoroughly evaluated on the testing subset of the data set using a comprehensive set of metrics, such as Ar, Pc, Rc, F1s, Er, and AUCs. A confusion matrix was used as a visual representation of the performance of the classification algorithm, indicating true-positive, true-negative, false-positive, and false-negative values for each class in the classification task. In the classification of esophageal disorders, initially the performance of the HMEDD model is precisely evaluated using a confusion matrix based on the testing subset. Each row of the matrix represents instances predicted for a specific class, while each column represents instances belonging to an actual class. This matrix clearly illustrates the proficiency of the HMEDD model in categorizing gastroscopic images into various esophageal disorders, namely, normal esophagus (C-0), GERD (C-1), polyps (C-2), esophagitis (C-3), non-dysplasia Barrett’s esophagus (C-4), and dysplasia Barrett’s esophagus (C-5).

The outcomes of the HMEDD model’s esophageal disorder classification are presented as a confusion matrix in Fig. 9. The results affirm that the HMEDD model classified esophageal images into six classes, namely, C-0, C-1, C-2, C-3, C-4, and C-5. In this matrix, the x-axis represents the target class, whereas the y-axis represents the output class. For instance, in the 20% testing subset, out of a total of 433 cases, the HMEDD model correctly identified 76 instances as C-0, 70 as C-1, 72 as C-2, 71 as C-3, 49 as C-4, and 61 as C-5. The confusion matrix not only offers a detailed view of the number of correctly and incorrectly identified images but also provides the data needed to compute the performance metrics. It also highlights the impact of deep feature learning from gastroscopic images on the predictive performance of the model.
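As a quick arithmetic check, the overall accuracy and error rate can be recovered from the correctly classified counts quoted above. The sketch below uses only the figures stated in the text (the six per-class correct counts and the 433 test cases); the variable names are illustrative.

```python
# Correctly classified test images per class, as reported in the text.
correct_per_class = {
    "C-0": 76,
    "C-1": 70,
    "C-2": 72,
    "C-3": 71,
    "C-4": 49,
    "C-5": 61,
}
total_test_cases = 433

n_correct = sum(correct_per_class.values())   # sum of the matrix diagonal
accuracy = n_correct / total_test_cases       # Ar
error_rate = 1 - accuracy                     # Er

print(n_correct, round(accuracy, 4), round(error_rate, 4))  # 399 0.9215 0.0785
```

The result agrees with the reported Ar of 0.9215 and Er of 0.0785.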

Fig. 9 Confusion matrix results for the proposed HMEDD approach using 20% of the testing subset

Table 3 presents a comprehensive breakdown of metrics such as Ar, Pc, Rc, F1s, and Er derived from the corresponding confusion matrices of the 20% testing subset in the database and describes the performance of the HMEDD model in classifying each esophageal disorder category. The rich and diverse features acquired through EsoNet played a pivotal role in enhancing the performance of the HMEDD model. This robustness was particularly important as it allowed the model to handle variations in imaging conditions such as lighting and angles, ensuring consistent and accurate classification under different scenarios. The output feature layer of EsoNet is 16 times smaller than the input size of the gastroscopic image due to downsampling operations applied during the network’s convolutional and pooling layers. EsoNet processes input images through a series of convolutional filters and pooling operations, which reduce the spatial dimensions of feature maps. Specifically, in the case of an input image with dimensions of 128 × 128 pixels, convolutional layers employ filters to extract features, and the subsequent pooling layers progressively decrease the size of the feature maps. Hence, the output feature layer, representing the learned features, is 16 times smaller in both width and height than the input image. This size reduction allows the network to focus on efficiently capturing essential patterns and details in the input images. Furthermore, the effectiveness of EsoNet in multiclass classification is a significant advantage. The RLR model is used for the classification of various esophageal disorders based on the extracted deep features. RLR performance was enhanced by refining the hyperparameters of the LR model based on RandomizedSearchCV. The integration of EsoNet features in the RLR model significantly improves the accuracy of categorization in the proposed HMEDD approach.
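The size reduction can be checked with a short calculation. The sketch below assumes that the factor of 16 comes from four downsampling stages that each halve the width and height (for example, 2 × 2 pooling with stride 2); this section does not state the number of pooling stages, so that count and the helper name `pooled_side` are assumptions.

```python
def pooled_side(input_side, n_stages, stride=2):
    """Side length of a square feature map after n_stages stride-`stride` poolings."""
    side = input_side
    for _ in range(n_stages):
        side //= stride  # each stage divides width and height by the stride
    return side


side = pooled_side(128, n_stages=4)
print(side, 128 // side)  # 8 16
```

Under this assumption, a 128 × 128 input yields 8 × 8 feature maps, which is 16 times smaller along each spatial dimension.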

Table 3 Performance analysis of esophageal disorder categories using HMEDD approach

As shown in Table 4, the combination of EsoNet with RLR significantly improved the overall performance of the proposed HMEDD approach, facilitating the precise categorization of various esophageal disorders. The HMEDD model shows exceptional performance, achieving values of 0.9215 for Ar, 0.9200 for Pc, 0.9200 for Rc, 0.9200 for F1s, and 0.0785 for Er on the 20% testing subset. The high accuracy rate (Ar) is attributable to the effective utilization of EsoNet, which comprises five convolutional slabs for extracting deep features. The first convolutional slab captures low-level features from the input image, such as basic shapes, edges, and textures relevant to the characteristics of esophageal tissues. Building upon the features extracted in the previous slab, the second convolutional slab performs additional convolutions to capture more complex patterns and structures, including learning hierarchical representations. The third convolutional slab further refines feature maps and extracts higher level features that are increasingly abstract and representative of more intricate details in the gastroscopic image. This may involve recognizing specific patterns indicative of different esophageal disorders. As the next step in the hierarchical feature extraction, the fourth convolutional slab refines and combines abstract features. It aims to enhance the network’s understanding of complex relationships between different elements in input images, which is crucial for accurate classification. The fifth and final convolutional slab extracts the most abstract and high-level features from input images, which is essential for capturing the nuanced characteristics of esophageal disorders, thus enabling the model to make informed decisions during classification. While classifying esophageal disorders, achieving high Pc and Rc is crucial for both minimizing misclassifications and ensuring that the model effectively captures instances of different disorders present in the data set. The HMEDD model classifies all six classes precisely, as evidenced by its outstanding performance across various evaluation criteria.

Table 4 Overall performance of HMEDD approach using different metrics

Finally, the ROC curve, as illustrated in Fig. 10, serves as a visual representation of the performance of the proposed HMEDD model across six distinct categories. Each category corresponds to a specific type of esophageal disorder, and the ROC curve provides insights into the model’s ability to distinguish between different classes. AUC scores associated with each category quantify the overall discriminatory power of the model for that particular class. AUCs for each category are as follows: C-0 (0.96), C-1 (0.95), C-2 (0.91), C-3 (0.83), C-4 (0.84), and C-5 (0.86). The average AUC calculated across all categories is 0.8917, indicating the efficiency of the HMEDD model. These results, along with metrics such as Ar, Pc, Rc, F1s, and Er, collectively highlight the effectiveness of the HMEDD model in the accurate and comprehensive classification of various types of esophageal disorders at an early stage.
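The reported average can be reproduced as an unweighted mean of the six per-class AUC values listed above. The snippet below is a small arithmetic check using only those figures; the variable names are illustrative.

```python
from statistics import mean

# Per-class AUC values as reported in the text.
auc_per_class = {
    "C-0": 0.96,
    "C-1": 0.95,
    "C-2": 0.91,
    "C-3": 0.83,
    "C-4": 0.84,
    "C-5": 0.86,
}

average_auc = mean(list(auc_per_class.values()))
weakest_class = min(auc_per_class, key=auc_per_class.get)

print(round(average_auc, 4), weakest_class)  # 0.8917 C-3
```

The check also makes visible that esophagitis (C-3) and the two Barrett’s esophagus classes (C-4, C-5) have the lowest AUC values.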

Fig. 10 ROC curve analysis of the proposed HMEDD model across six esophageal disorder types: (a) normal esophagus; (b) GERD; (c) polyps; (d) esophagitis; (e) non-dysplasia Barrett’s esophagus; and (f) dysplasia Barrett’s esophagus

4.3 Ablation studies

To evaluate the effectiveness of the HMEDD model, ablation studies were conducted using EsoNet as the benchmark. The evaluation began with EsoNet with Softmax; the Softmax classifier was then replaced with different ML classifiers, namely, Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and finally RLR (HMEDD, proposed). In each case, classification was performed on the high-level features learned by the dense layers. The classification performance of EsoNet–Softmax, EsoNet–SVM, EsoNet–RF, EsoNet–LR, and HMEDD on the esophageal disorders data set was then assessed. In a visual comparison of the confusion matrices (shown in Figs. 9 and 11) and in the computed values of Ar, Pc, Rc, F1s, and Er, the proposed method consistently achieved the best results (indicated in boldface in Table 5).

Fig. 11 Confusion matrices for EsoNet with different types of classifiers: (a) EsoNet–Softmax; (b) EsoNet–SVM; (c) EsoNet–RF; and (d) EsoNet–LR

Table 5 Performance comparison of esophageal disorders classification using different classifiers and metrics

The ROC curve analysis of different classifiers is shown in Fig. 12. The efficiency of the HMEDD model is evident from the average AUC calculated across all categories, with values of 0.8100, 0.8833, 0.7933, 0.8600, and 0.8917 for EsoNet–Softmax, EsoNet–SVM, EsoNet–RF, EsoNet–LR, and HMEDD, respectively. These findings underscore the superior overall classification performance of the proposed model, highlighting the efficacy of the novel HMEDD approach.

Fig. 12 Analysis of ROC curve for EsoNet with different types of classifiers: (a) EsoNet–Softmax; (b) EsoNet–SVM; (c) EsoNet–RF; (d) EsoNet–LR

These results indicate that the proposed HMEDD model outperforms other models in terms of Ar, Pc, Rc, F1s, Er, and AUCs. As shown in Fig. 11d, the LR method’s confusion matrix illustrates consistent results across all categories, particularly for non-dysplasia Barrett’s esophagus and dysplasia Barrett’s esophagus. Based on this observation, LR was used for further refinement aimed at enhancing the classification performance. The integration of RLR with EsoNet features contributed to the robustness and stability of the classification process. The adaptability of RLR, achieved through hyperparameter tuning during training, allowed the model to fine-tune its performance, making it effective in handling complexities and variabilities present in medical imaging data. This adaptability is crucial for achieving optimal generalization to a diverse range of esophageal disorders. The proposed HMEDD approach not only shows superior classification performance but also emphasizes the significance of incorporating adaptive hyperparameter tuning, especially through the use of RLR, in achieving an accurate and robust diagnosis of esophageal disorders. The model’s ability to effectively tune hyperparameters enhances its capability to precisely classify a diverse range of esophageal disorders, making it a promising solution for early stage diagnosis.

4.4 Discussions

This section delves into a comprehensive analysis of the results obtained using the proposed HMEDD model, focusing on feature extraction, hyperparameter tuning, classification model performance, and comparative studies. The integration of EsoNet in the HMEDD model for feature extraction proved instrumental as the model achieved accurate classifications of gastroscopic images of the esophagus. The deep layers of EsoNet effectively learned intricate patterns and representations from raw images (as shown in Fig. 4) and significantly enhanced the accuracy of the classification process. The concatenated features from EsoNet provided a robust input for the RLR model, which in turn contributed to improved classification results. The transition from LR to RLR in the HMEDD model involved a meticulous hyperparameter tuning process. The following parameters were considered crucial for optimization: regularization strength, solver, and max iterations. The impact of L1 and L2 regularization on accuracy scores is shown in Fig. 7, whereas the influence of different solvers is presented in Fig. 8. The optimal configuration, i.e., L2 regularization, newton-cg solver, and 124 max iterations, significantly enhanced the accuracy of the model in classifying esophageal disorders. The model’s performance was thoroughly evaluated on the testing subset, using a comprehensive set of metrics such as Ar, Pc, Rc, F1s, Er, and AUCs. The confusion matrix, as shown in Fig. 9, visually represents the model’s proficiency in categorizing gastroscopic images into various esophageal disorders. A detailed breakdown of metrics derived from the corresponding confusion matrices is presented in Table 3, offering insights into the model’s robustness and handling of variations in imaging conditions. The rich and diverse features obtained through EsoNet played a pivotal role in enhancing the performance of the HMEDD model. 
A comprehensive breakdown of metrics such as Ar, Pc, Rc, F1s, and Er is presented in Table 4, highlighting the effectiveness of EsoNet features in facilitating the precise categorization of various esophageal disorders. Downsampling operations applied during EsoNet’s convolutional and pooling layers were crucial to ensuring consistent and accurate classification across different scenarios. Compared with other methods, the proposed method consistently demonstrated higher F1s values across various esophageal disorders. The F1s value, a composite parameter of Pc and Rc, reflects the overall classification ability. The ROC curve, as shown in Fig. 10, provides a visual representation of the performance of the proposed HMEDD across six distinct categories. The AUCs associated with these classes quantify the overall discriminatory power of the model for each class. The average AUC of 0.8917 indicates the efficiency of the HMEDD model in accurately classifying various types of esophageal disorders at an early stage. Ablation studies were conducted using EsoNet as the benchmark, in which the classification performance of various ML classifiers was assessed. A statistical comparison of EsoNet with Softmax; EsoNet with different classifiers (SVM, RF, and LR); and the proposed HMEDD with RLR is presented in Table 5. The consistently superior results of the HMEDD model underscore its overall classification performance, emphasizing the efficacy of this novel approach. This superior classification performance makes HMEDD a promising solution for early stage diagnosis of esophageal disorders.

Furthermore, to assess the performance of the HMEDD method, a comparative analysis was conducted using other established methods, as shown in Table 6. The methods used for comparison encompass diagnosing esophageal squamous cell carcinoma using the following: endocytoscopic system images without the need for a biopsy-based histological reference [17]; a CNN, which is a deep learning methodology, to autonomously categorize esophageal cancer and discern it from premalignant lesions [34]; deep convolutional generative adversarial networks for data augmentation and LeNet-5 and AlexNet in the classification process, enabling the identification of Barrett’s esophagus and adenocarcinoma [35]; and a densely constructed CNN tailored for the classification of esophageal diseases [45]. As shown in Table 6, the proposed HMEDD method consistently demonstrated higher accuracy in classifying various esophageal disorders than other methods. Specifically, the proposed method achieved optimal values in classifying different esophageal disorders compared with others that focused on a limited range of esophageal disorders and esophageal cancer only.

Table 6 Comparison of the proposed HMEDD method with the existing methods

Although the proposed approach yielded better results than other relevant methods in the classification of esophageal disorders, it has notable constraints as well. The original data set used in the proposed method comprised 314 gastroscopic images. Moreover, Barrett’s esophagus images are not available in large numbers, leading to an imbalance in the data set. To address this, a balanced data set was generated by selecting an equal number of images for each category and augmenting them to a total of 2,198 images. Despite these efforts, the size of the data set remains the primary limitation. Future work will focus on expanding the data set with a more extensive collection of images for each category. Additionally, refining the model to address specific characteristics of esophageal disorders will be crucial for enhancing overall classification performance. Incorporating preprocessing techniques [47,48,49] such as image denoising to reduce noise and enhance critical details, along with contrast adjustment and sharpening for improved image quality, is expected to significantly enhance the model's ability to accurately distinguish between different esophageal conditions.

5 Conclusion

The HMEDD model, powered by deep learning, significantly contributes to the early diagnosis of various esophageal disorders. The framework developed in this study encompasses key stages, including image augmentation in preprocessing, deep feature extraction with EsoNet, hyperparameter tuning through RandomizedSearchCV, and classification via RLR. It shows remarkable performance in the classification of various esophageal disorders, which is attributable to the integration of the advanced feature extraction technique of EsoNet and the accurate classification capabilities of RLR. Experimental validation conducted on the Hyper Kvasir data set and open-source images showcases the effectiveness of the HMEDD model, with accuracy (Ar), precision (Pc), recall (Rc), F1-score (F1s), and error rate (Er) values of 0.9215, 0.9200, 0.9200, 0.9200, and 0.0785, respectively. Upon comparison with the existing methods in the literature, the proposed method shows superior performance, surpassing previous studies in accuracy. This model thus empowers physicians to better identify and diagnose esophageal disorders, reducing both false positives and false negatives, which in turn leads to more accurate treatment plans. Incorporating deep learning into this domain has the potential to bolster early detection efforts, mitigating the risk of esophageal disorders and cancer, and improving patient outcomes.

Availability of data and materials

The data sets used and analyzed during the current study are available from the corresponding author on reasonable request. The Hyper Kvasir data set, including images, videos, and metadata, is accessible via the Open Science Framework (OSF) at https://osf.io/mh9sj/ or directly from https://datasets.simula.no/hyper-kvasir/.

Abbreviations

Ar: Accuracy
AUC: Area under the receiver operating characteristic (ROC) curve
BN: Batch normalization
CAD: Computer-aided diagnosis
CCD: Charge-coupled device
CMOS: Complementary metal oxide semiconductor
CNNs: Convolutional neural networks
DF: Deep features
EAC: Esophageal adenocarcinoma
Er: Error rate
EsoNet: Esophageal convolutional neural network
F1s: F1-score
GERD: Gastroesophageal reflux disease
GI: Gastrointestinal
HMEDD: Hybrid model-based esophageal disorder diagnosis
Leaky ReLU: Leaky rectified linear unit
LR: Logistic regression
ML: Machine learning
OSF: Open Science Framework
Pc: Precision
R-CNN: Region-based convolutional neural network
RandomizedSearchCV: Random search cross-validation
ResNet: Deep residual network
RF: Random forest
RLR: Refined logistic regression
ROC: Receiver operating characteristic
Rc: Recall
SSD: Single-shot detector
SIFT: Scale-invariant feature transform
SURF: Speeded-up robust features
SVM: Support vector machine
TL: Transfer learning

References

  1. T. Keen, C. Brooks, Principles of gastrointestinal endoscopy. Surgery 38(3), 155–160 (2020)

  2. W. Du, N. Rao, D. Liu, H. Jiang, C. Luo, Z. Li, T. Gan, B. Zeng, Review on the applications of deep learning in the analysis of gastrointestinal endoscopy images. IEEE Access 7, 142053–142069 (2019)

  3. J. Kim, H. Al Faruque, S. Kim, E. Kim, J.Y. Hwang, Multimodal endoscopic system based on multispectral and photometric stereo imaging and analysis. Biomed. Opt. Express 10(5), 2289–2302 (2019)

  4. M. Vasilakakis, A. Koulaouzidis, D.E. Yung, J.N. Plevris, E. Toth, D.K. Iakovidis, Follow-up on: optimizing lesion detection in small bowel capsule endoscopy and beyond: from present problems to future solutions. Expert Rev. Gastroenterol. Hepatol. 13(2), 129–141 (2019)

  5. M.V. Sivak, Gastrointestinal endoscopy: past and future. Gut 55(8), 1061–1064 (2005)

  6. S. Haase, A. Maier, in Endoscopy, ed. by A. Maier, S. Steidl, V. Christlein, J. Hornegger, Medical Imaging Systems: An Introductory Guide [Internet], Chapter 4. (Springer, Cham (CH), 2018), PMID: 31725221

  7. J. Mannath, K. Ragunath, Role of endoscopy in early oesophageal cancer. Nat. Rev. Gastroenterol. Hepatol. 13(12), 720–730 (2016)

  8. R.M. Gore, M.S. Levine, in Diseases of the upper GI tract, ed. by J. Hodler, R.A. Kubik-Huch, G.K. von Schulthess. Diseases of the Abdomen and Pelvis 2018–2021: Diagnostic Imaging - IDKD Book [Internet], Chapter 10. (Springer, Cham (CH), 2018), PMID: 31314379

  9. M. José, D. Arnal, Á.F. Arenas, Á.L. Arbeloa, Esophageal cancer: risk factors, screening and endoscopic treatment in Western and Eastern countries. World J. Gastroenterol. 21(26), 7933–7943 (2015)

  10. F.L. Huang, S.J. Yu, Esophageal cancer: risk factors, genetic association, and treatment. Asian J. Surg. 41(3), 210–215 (2018)

  11. A. Mocanu, R. Bârla, P. Hoara, S. Constantinoiu, Current endoscopic methods of radical therapy in early esophageal cancer. J. Med. Life 8, 150–156 (2015)

  12. Y. Sun, T. Zhang, W. Wu, D. Zhao, N. Zhang, Y. Cui, Y. Liu, J. Gu, P. Lu, F. Xue, J. Yu, J. Wang, Risk factors associated with precancerous lesions of esophageal squamous cell carcinoma: A screening study in a high risk Chinese population. J. Cancer 10(14), 3284–3290 (2019)

  13. S. Mönig, M. Chevallay, N. Niclauss, T. Zilli, W. Fang, A. Bansal, J. Hoeppner, Early esophageal cancer: the significance of surgery, endoscopy, and chemoradiation. Ann. N. Y. Acad. Sci. 1434, 115–123 (2018)

  14. K. Goda, A. Dobashi, N. Yoshimura, M. Kato, H. Aihara, K. Sumiyama, H. Toyoizumi, T. Kato, M. Ikegami, H. Tajiri, Narrow-band imaging magnifying endoscopy versus lugol chromoendoscopy with pink-color sign assessment in the diagnosis of superficial esophageal squamous neoplasms: a randomised noninferiority trial. Gastroenterol. Res. Pract. 2015, 1–10 (2015)

  15. Y. Horie, T. Yoshio, K. Aoyama, S. Yoshimizu, Y. Horiuchi, A. Ishiyama, T. Hirasawa, T. Tsuchida, T. Ozawa, S. Ishihara, Y. Kumagai, M. Fujishiro, I. Maetani, J. Fujisaki, T. Tada, Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest. Endosc. 89(1), 25–32 (2019)

  16. Y. Mori, S. Ei Kudo, H.E.N. Mohmed, M. Misawa, N. Ogata, H. Itoh, M. Oda, K. Mori, Artificial intelligence and upper gastrointestinal endoscopy: current status and future perspective. Dig. Endosc. 31(4), 378–388 (2019)

  17. Y. Kumagai, K. Takubo, K. Kawada, K. Aoyama, Y. Endo, T. Ozawa, T. Hirasawa, T. Yoshio, S. Ishihara, M. Fujishiro, J. Ichi Tamaru, E. Mochiki, H. Ishida, T. Tada, Diagnosis using deep-learning artificial intelligence based on the endocytoscopic observation of the esophagus. Esophagus 16(2), 180–187 (2019)

  18. X. Liu, C. Wang, J. Bai, G. Liao, Fine-tuning pre-trained convolutional neural networks for gastric precancerous disease classification on magnification narrow-band imaging images. Neurocomputing 392, 253–267 (2020)

  19. Y. Zhu, Q.C. Wang, M.D. Xu, Z. Zhang, J. Cheng, Y.S. Zhong, Y.Q. Zhang, W.F. Chen, L.Q. Yao, P.H. Zhou, Q.L. Li, Application of convolutional neural network in the diagnosis of the invasion depth of gastric cancer based on conventional endoscopy. Gastrointest. Endosc. 89(4), 806-815.e1 (2019)

  20. D. Liu, H. Jiang, N. Rao, W. Du, C. Luo, Z. Li, L. Zhu, T. Gan, Depth information-based automatic annotation of early esophageal cancers in gastroscopic images using deep learning techniques. IEEE Access 8, 97907–97919 (2020)

  21. L.J. Guo, X. Xiao, C.C. Wu, X. Zeng, Y. Zhang, J. Du, S. Bai, J. Xie, Z. Zhang, Y. Li, X. Wang, O. Cheung, M. Sharma, J. Liu, B. Hu, Real-time automated diagnosis of precancerous lesions and early esophageal squamous cell carcinoma using a deep learning model (with videos). Gastrointest. Endosc. 91(1), 41–51 (2020)

  22. M. Ohmori, R. Ishihara, K. Aoyama, K. Nakagawa, H. Iwagami, N. Matsuura, S. Shichijo, K. Yamamoto, K. Nagaike, M. Nakahara, T. Inoue, K. Aoi, H. Okada, T. Tada, Endoscopic detection and differentiation of esophageal lesions using a deep neural network. Gastrointest. Endosc. 91(2), 301-309.e1 (2020)

  23. D.Y. Liu, T. Gan, N.N. Rao, Y.W. Xing, J. Zheng, S. Li, C.S. Luo, Z.J. Zhou, Y.L. Wan, Identification of lesion images from gastrointestinal endoscope based on feature extraction of combinational methods with and without learning process. Med. Image Anal. 32, 281–294 (2016)

  24. F. Riaz, F.B. Silva, M.D. Ribeiro, M.T. Coimbra, Invariant Gabor texture descriptors for classification of gastroenterology images. IEEE Trans. Biomed. Eng. 59(10), 2893–2904 (2012)

  25. F. Van Der Sommen, S. Zinger, E.J. Schoon, P. De With, Supportive automatic annotation of early esophageal cancer using local gabor and color features. Neurocomputing 144, 92–106 (2014)

  26. A. Setio et al., Evaluation and comparison of textural feature representation for the detection of early stage cancer in endoscopy. In Proc. 8th Int. Conf. Comput. Vis. Theory Appl. 238–243 (2013).

  27. L.A. de Souza et al., A survey on Barrett’s esophagus analysis using machine learning. Comput. Biol. Med. 96, 203–213 (2018)

  28. L. A. De Souza, L. C. S. Afonso, C. Palm, and J. P. Papa, Barrett’s esophagus identification using optimum-path forest. In Proc. 30th SIBGRAPI Conf. Graph., Patterns Images. 308–314 (2017).

  29. G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, J.A.W.M. van der Laak, B. van Ginneken, C.I. Sánchez, A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)

  30. T. Natarajan, L. Devan, Transfer learning supported accurate assessment of multiclass cervix type images. Proc. Inst. Mech. Eng. [H] 237(2), 265–281 (2023)

  31. G. Valarmathi, S. Nirmala Devi, Automatic localization and classification of intervertebral disc herniation using hybrid classifier. Biomed. Signal Process. Control 86, 105291 (2023)

  32. C. Yan, L. Meng, L. Li, J. Zhang, Z. Wang, J. Yin, J. Zhang, Y. Sun, B. Zheng, Age-invariant face recognition by multi-feature fusion and decomposition with self-attention. ACM Trans. Multimedia Comput. Commun. Appl. 18, 1–18 (2022)

  33. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, inception-ResNet and the impact of residual connections on learning. In Proc. 31st AAAI Conf. Artif. Intell. 4278–4284 (2017).

  34. G. Liu, J. Hua, Z. Wu, T. Meng, M. Sun, P. Huang, X. He, W. Sun, X. Li, Y. Chen, Automatic classification of esophageal lesions in endoscopic images using a convolutional neural network. Ann. Transl. Med. 8(7) (2020)

  35. L.A. de Souza et al., Assisting Barrett’s esophagus identification using endoscopic data augmentation based on generative adversarial networks. Comput. Biol. Med. 126, 104029 (2020)

  36. R. Mendel, A. Ebigbo, A. Probst, H. Messmann, and C. Palm. Barrett’s esophagus analysis using convolutional neural networks. In Proc. Bildverarbeitung für die Medizin. 80–85 (2017).

  37. S. Van Riel, F. Van Der Sommen, S. Zinger, E. J. Schoon, and P. H. de With. Automatic detection of early esophageal cancer with CNNs using transfer learning. In Proc. 25th IEEE Int. Conf. Image Process. 1383–1387 (Oct. 2018).

  38. N. Ghatwary, M. Zolgharni, X. Ye, Early esophageal adenocarcinoma detection using deep learning methods. Int. J. Comput. Assisted Radiol. Surg. 14(4), 611–621 (2019)

  39. R. Girshick, J. Donahue, T. Darrell, J. Malik, Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 142–158 (2016)

  40. R. Girshick. Fast R-CNN. In Proc. IEEE Int. Conf. Comput. Vision. 1440–1448 (2015).

  41. S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proc. Adv. Neural Inf. Process. Syst. 91–99 (2015).

  42. W. Liu et al. SSD: Single shot multibox detector. In Proc. Eur. Conf. Comput. Vision. 21–37 (2016).

  43. N. Ghatwary, X. Ye, M. Zolgharni, Esophageal abnormality detection using DenseNet based faster R-CNN with Gabor features. IEEE Access 7, 84374–84385 (2019)

  44. T. Prince, B. WondmanehGetahun, K. AmbachewGoshu, C. DessalewMengesha, G. WorkuMuche, and G. Ramkumar. Multi-classification and segmentation of esophageal lesions using an improved deep learning model from endoscopic images. In 2023 Eighth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), Chennai, India. 1–8, https://doi.org/10.1109/ICONSTEM56934.2023.10142773 (2023).

  45. W. Du, N. Rao, C. Dong, Y. Wang, D. Hu, L. Zhu, B. Zeng, T. Gan, Automatic classification of esophageal disease in gastroscopic images using an efficient channel attention deep dense convolutional neural network. Biomed. Opt. Express 12, 3066–3081 (2021)

  46. H. Borgli, V. Thambawita, P.H. Smedsrud, S. Hicks, D. Jha, S.L. Eskeland, K.R. Randel, K. Pogorelov, M. Lux, D.T.D. Nguyen, D. Johansen, HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci. Data 7(1), 283 (2020)

  47. C. Yan, B. Gong, Y. Wei, Y. Gao, Deep multi-view enhancement hashing for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 43(4), 1445–1451 (2020)

  48. C. Yan, Z. Li, Y. Zhang, Y. Liu, X. Ji, Y. Zhang, Depth image denoising using nuclear norm and learning graph model. ACM Trans. Multimedia Comput. Commun. Appl. 16(4), 1–17 (2020)

  49. C. Yan, T. Teng, Y. Liu, Y. Zhang, H. Wang, X. Ji, Precise no-reference image quality evaluation based on distortion identification. ACM Trans. Multimedia Comput. Commun. Appl. 17(3s), 1–21 (2021)

Acknowledgements

The authors would like to acknowledge Dr Ilakkiya Sekar (129179), MBBS, MS, FIAGES, Tagore Medical College and Hospital, Chennai, and Dr R. Vedanayagi (166168), MBBS, MMST (pursuing), Indian Institute of Technology, Kharagpur, for their assistance in all aspects of the study and for validating the results.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Contributions

Data collection, conception and design of work—R Janaki. Analysis tools and performance of analysis—R Janaki & D Lakshmi. Drafted the work—R Janaki & D Lakshmi. Grammar and supervision—D Lakshmi.

Corresponding author

Correspondence to R. Janaki.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Janaki, R., Lakshmi, D. Hybrid model-based early diagnosis of esophageal disorders using convolutional neural network and refined logistic regression. J Image Video Proc. 2024, 19 (2024). https://doi.org/10.1186/s13640-024-00634-3

  • DOI: https://doi.org/10.1186/s13640-024-00634-3

Keywords