 Research
 Open Access
A probabilistic segmentation and entropy-rank correlation-based feature selection approach for the recognition of fruit diseases
EURASIP Journal on Image and Video Processing volume 2021, Article number: 14 (2021)
Abstract
Agriculture plays a critical role in the economy of several countries by providing the main sources of income, employment, and food to their rural populations. In recent years, however, plants and fruits have been widely damaged by different diseases, causing huge losses to farmers. This loss can be minimized by detecting plant diseases at an early stage using pattern recognition (PR) and machine learning (ML) techniques. In this article, an automated system is proposed for the identification and recognition of fruit diseases. Our approach is distinctive in that it overcomes challenges such as convex edges, inconsistency between colors, irregularity, visibility, scale, and origin. The proposed approach incorporates five primary steps: preprocessing, disease identification through segmentation, feature extraction and fusion, feature selection, and classification. The infected regions are extracted using the proposed adaptive and quartile deviation-based segmentation approaches, and the resultant binary images are fused by employing the weighted coefficient of correlation (CoC). Then the most appropriate features are selected using a novel framework of entropy and rank-based correlation (EaRbC). Finally, the selected features are classified using a multi-class support vector machine (MC-SVM). The PlantVillage dataset is utilized for the evaluation of the proposed system, achieving an average segmentation and classification accuracy of 93.74% and 97.7%, respectively. Based on the set of statistical measures, we believe that our proposed method outperforms existing methods in terms of accuracy.
Introduction
Plant diseases affect both the quality and quantity of agricultural products by interfering with a set of processes including plant growth, flower and fruit development, and absorbent capacity, to name but a few [1]. Therefore, early detection and classification of plant diseases play a vital role in agriculture. Two options are available: manual inspection and computer vision techniques. The former is quite difficult and requires a lot of effort and time [2], while the latter is mostly followed because of its improved performance [3]. Plants show a range of symptoms from their early to final stages, which can be easily observed on fruits, leaves, and stems with the naked eye. Therefore, these symptoms can be categorized using computer vision (CV) and other machine learning (ML) methods [4].
A great effort has been made in the field of CV to process visual features extracted from fruit images for the recognition of multiple diseases [5]. Several existing methods work well but do not consider various constraints, specifically those related to image quality [6–11], training/testing samples, number of labels, and disease complexity, to name but a few [12]. In this article, two fruits are selected and four different types of fruit diseases are initially targeted: apple scab, apple rust, grapes rot leaves, and grapes leaf blight. Most existing methods follow a typical architecture, which includes (a) a preprocessing block, (b) a segmentation block, (c) a feature extraction block, and (d) a classification block. Several detection methods are employed by scholars working in this domain, including clustering, thresholding, color-, shape-, and texture-based methods, adaptive approaches, etc. All these methods are somewhat problem dependent and follow the same trend: addressing one sort of problem while keeping the parameters of other problems fixed. Therefore, no universal mechanism exists that efficiently deals with all kinds of problems.
In this article, we are primarily focusing on the classification of aforementioned diseases by following fundamental steps. Our primary contributions are enumerated below:
Major contributions
In this article, we introduce a new automated method for the identification and recognition of apple and grape diseases. The proposed method consists of five major steps: (a) contrast stretching; (b) identification of the diseased part by a fusion of novel adaptive and quartile deviation (QD)-based segmentation, which performs efficiently under changes in scale, origin, and irregularity of the infected regions; (c) feature extraction and fusion; (d) feature selection through an integrated framework of entropy and rank correlation; and (e) classification. Our major contributions are listed below.

1. A contrast stretching technique based on global min and max values is proposed, which defines a contrast range to determine lower and upper threshold values.

2. An adaptive thresholding method following the trapezoidal rule is proposed, which works in two steps: (1) locating the infected regions and (2) computing a threshold based on maxima and minima calculated after taking the second derivative.

3. A parallel feature fusion methodology is adopted, which jointly takes advantage of three sets of features (color, texture, and shape) to select the most discriminant values.

4. To overcome the curse of dimensionality, a feature selection methodology is proposed, which efficiently assigns ranks to the set of features based on entropy.
Literature review
Several methods exist in the literature that accurately classify fruit diseases using computer vision methods [13–18]. Specifically, for the identification of apple and grape diseases, various methods have been proposed that classify a set of diseases with acceptable accuracy and sensitivity [19–22]. Among unsupervised methods, a range of algorithms has been proposed, including K-means clustering [23], global thresholding with morphological operations [24], graph cut methods [25], color segmentation [26], CLPSO-based fuzzy color segmentation [27], and adaptive approaches [28], to name but a few.
Bhivini et al. [2] introduced a framework to classify infected regions in apples. In the first stage, they utilized K-means clustering to extract the infected region and then extracted color and texture features from the segmented part. Subsequently, feature fusion was performed using simple concatenation prior to classification using the random forest method. Similarly, Shiv et al. [5] introduced a novel method to classify apple diseases based on color, texture, and shape features. The method comprises three fundamental steps: segmentation using K-means; extraction of color, texture, and shape features; and classification using a multi-class SVM. Following the same trend, Shiv et al. [28] introduced an adaptive approach to detect infected regions, including apple scab, rot, and blotch, achieving a classification accuracy of 93%. Their method incorporates three primary steps: segmentation using K-means, feature extraction, and classification using a multi-class SVM.
Zhang et al. [29] followed a novel machine learning method for detecting apple diseases. They made use of the HSI, YUV, and gray color spaces for background removal via thresholding. The infected regions are extracted by a region growing method, and shape, color, and texture features are calculated for each region. Finally, the most prominent features, selected using a genetic algorithm (GA) and the correlation-based feature selection (CFS) method, are classified using SVM. Similarly, Soni et al. [30] identified plant diseases by following two fundamental steps of segmentation and classification. In the first step, ring-based segmentation is performed to identify infected regions, followed by the feature extraction step. A probabilistic neural network is used for the final classification of diseases from randomly selected images acquired from the web. Lee et al. [31] implemented a swarm optimization-based method for the identification of apple diseases. A stochastic PSO algorithm finds 10 spectral features based on pairs of bands that discriminate between each pair of classes. The selected features are later utilized by SVM to achieve improved performance. Harshal et al. [32] introduced a framework for the identification and classification of grape diseases. They implemented a background subtraction method for segmentation and later analyzed the regions after passing them through a high-pass filter. Thereafter, unique fractal-based texture features are extracted and finally classified through a multi-class SVM. They selected downy mildew and black rot diseases for evaluation and achieved a classification accuracy of 96.6%.
Pranjali et al. [33] introduced a novel approach of fused classifiers for efficient classification of grape diseases. Initially, both SVM and ANN are utilized independently, and then a new ensemble classifier is constructed for final classification. Similarly, Awate et al. [34] introduced an approach in which K-means is utilized for segmentation. Later, texture, color, morphological, and structural features are calculated and then passed to an ANN classifier for final classification. A general comparison with recent methods is also provided in Table 1 in terms of segmentation technique, type of features, feature selection, classification method, disease type, and classification accuracy.
From recent studies, it is quite clear that methods including fuzzy, thresholding, and K-means are mostly utilized for the identification of infected regions. Recently, the inclusion of saliency and CNN-based techniques has shown improved performance in this domain of agricultural farming [38]. Moreover, color and texture features are mostly utilized for final classification, but the “curse of dimensionality” is largely ignored. In this article, we primarily focus on contrast stretching, infected region segmentation, and ultimately feature selection to avoid the aforementioned problem. The contrast stretching technique improves the visual characteristics of an input image, which can help in the segmentation phase. The proposed feature selection algorithm aids in improving the overall classification accuracy.
Proposed method
In this section, the proposed method is explained, which incorporates a series of steps including preprocessing, image segmentation and fusion, feature extraction, fusion and selection, and a final step of classification. Figure 1 shows the working framework of the proposed method and illustrates the series of aforementioned steps.
Contrast stretching
Contrast stretching is mostly applied to images in which the visual content needs to be enhanced. In this article, a global contrast stretching technique is proposed, which directly affects the infected regions by making them maximally differentiable from the background. This method initially finds the global maxima and minima of each of the red, green, and blue channels to generate new global minimum and maximum values. These calculated values are later utilized to find a new range of intensity values for each channel, which in turn locates new low and high threshold values.
Let ψ(i,j,k) be an original three-dimensional RGB image of size (256×256×3), where \(\psi _{1}(i, j)=\frac {r}{\sum _{k=1}^{3}r^{k}},\psi _{2}(i, j)=\frac {g}{\sum _{k=1}^{3}g^{k}}\), and \(\psi _{3}(i, j)=\frac {b}{\sum _{k=1}^{3}b^{k}}\) represent the modified red, green, and blue channels. Here, the red channel is the fraction \(red=\frac {red}{red+green+blue}\); therefore, we use \(\sum \) to denote the sum of the pixel values of the three channels. The corresponding histograms are shown in Fig. 2.
Suppose T_{L} and T_{H} are the low and high threshold values, which are initialized as 0.01 and 1, respectively. Then the global maxima and minima are calculated using the initial T_{L} and T_{H} values as follows:
where ϕ_{max} and ϕ_{min} are the global maximum and minimum values, and Max and Min represent the max and min functions, which select the maximum and minimum values from each channel k, where k∈{1:3} indexes the three respective channels red, green, and blue denoted by ψ_{1}, ψ_{2}, and ψ_{3}.
The initial values of the global maximum and minimum are 1 and 0. A new global minimum pixel image is then calculated by subtracting ϕ_{min} from the original image ψ(i,j,k); the effects are shown in Fig. 3b. The subtracted image is stored in a temporary array (T_{ar}) of size 256×256, and the maximum and minimum pixel values of the entire processed image are found by Eqs. 2 and 3:
These values are utilized to calculate the range of contrast by Eq. 4.
where R_{ctr} denotes the contrast range image of dimension 256×256 as shown in Fig. 3c. To control the variation of contrast stretching, the low threshold (T_{L}) and high threshold values (T_{H}) are updated by Eqs. 5 and 6.
The low and high threshold values are utilized in the contrast stretching cost function to concatenate the results of each channel. The cost function produces a new image, which is enhanced compared to the original image. The cost function is defined by Eq. 7:
where F_{cost}(i,j,k) is the resultant contrast stretched image and R_{ctr} is the contrast range value, which lies between 0 and 1. Equation 7 shows that if \(\frac { T_{ar}}{T_{H}-T_{L}}\geq R_{ctr}\), then the diseased region in the image is enhanced; otherwise, the background is improved. The final contrast stretching results are shown in Figs. 3 and 4, which are later processed in the segmentation phase.
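As a rough illustration of the global stretching idea described in this subsection, the following Python sketch normalizes each channel by the per-pixel channel sum and then rescales intensities between a low and a high threshold. It is a minimal sketch under stated assumptions and not the cost function of Eq. 7: the function names `normalize_channels` and `stretch_contrast` and the clipping rule are hypothetical, and only the initial thresholds 0.01 and 1 are taken from the text.

```python
import numpy as np


def normalize_channels(rgb):
    """Divide each channel by the per-pixel sum r + g + b."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero on pure-black pixels
    return rgb / total


def stretch_contrast(img, t_low=0.01, t_high=1.0):
    """Generic global min-max stretch (assumed form, not the paper's Eq. 7).

    The global minimum is subtracted, the result is divided by the global
    range, and the output is clipped to [t_low, t_high].
    """
    img = img.astype(np.float64)
    g_min, g_max = img.min(), img.max()
    if g_max == g_min:  # flat image: nothing to stretch
        return np.full_like(img, t_low)
    shifted = img - g_min                  # global-minimum-subtracted image
    stretched = shifted / (g_max - g_min)  # rescale to [0, 1]
    return np.clip(stretched, t_low, t_high)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(50, 200, size=(256, 256, 3), dtype=np.uint8)
    enhanced = stretch_contrast(normalize_channels(image))
    print(enhanced.shape)
```

A single global range is used for all three channels here so that the relative balance between channels is preserved; a per-channel range would be an equally plausible reading of the description.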
Disease identification
In this section, the proposed disease identification approach is explained, which comprises the proposed segmentation and fusion methods. In the former, a trapezoidal-based adaptive thresholding method and a quartile deviation (Q.D)-based segmentation method are employed independently, while in the latter, the binary images are fused using the proposed weighted coefficient of correlation method. Figure 1 shows the steps for image segmentation and fusion.
Trapezoidal based adaptive thresholding
Let F_{cost}(i,j,k) be a contrast stretched image. To identify the infected regions, the trapezoidal rule is employed [39], which calculates the area of infection by utilizing the max and min pixel values.
where Total_{n} denotes the total number of pixels in F_{cost}(i,j,k). A second derivative of an image is later computed and Eq. 8 is updated to find max and min pixel values. The obtained pixel values are finally embedded into a cost function to extract the infectious regions.
where D(i,j) and D^{2}(i,j) represent the first and second derivatives of an input image, and Max_{up} and Min_{up} are the updated max and min pixel values. These updated values are initially compared with the old max and min values, defined in Eq. 8, and later updated to calculate the area of infection.
\( \int _{\alpha }^{\beta }f(i)di\) represents the area of the infected region, which is further utilized in the threshold function.
where ξ denotes pixels which are directly linked to \(\int _{\alpha }^{\beta }f(i)di\), and T(i,j) represents an optimized adaptive segmented image; sample results are shown in Fig. 5.
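The two numerical ingredients named in this subsection, the trapezoidal rule for the area term and a second derivative for locating maxima and minima, can be sketched in Python as follows. This is a minimal sketch of the standard textbook forms on a one-dimensional intensity profile; the function names are hypothetical, and the article's actual threshold function is not reproduced.

```python
def trapezoid_area(samples, step=1.0):
    """Composite trapezoidal rule for equally spaced samples of f."""
    if len(samples) < 2:
        return 0.0
    interior = sum(samples[1:-1])
    return step * ((samples[0] + samples[-1]) / 2.0 + interior)


def second_difference(samples):
    """Discrete second derivative: f[i-1] - 2*f[i] + f[i+1]."""
    return [samples[i - 1] - 2 * samples[i] + samples[i + 1]
            for i in range(1, len(samples) - 1)]


if __name__ == "__main__":
    profile = [0, 1, 4, 9, 16]  # samples of f(i) = i**2
    print(trapezoid_area(profile))
    print(second_difference(profile))
```

For a two-dimensional image, the same operations would be applied along rows and columns (or replaced by a Laplacian), which is a design choice the text leaves open.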
Quartile deviationbased segmentation
Quartile deviation-based segmentation is a new segmentation method that can be directly mapped onto the input image prior to the thresholding step to generate a binary image. This method works on the basis of coupling, depending on the curve changes. The coupling points are utilized with the normalization function because Q.D is a property of the normal distribution. Let f(t)∈F_{cost}(i,j,k) having dimension (256×256×3); then the initial function is defined as:
where (μ−r) and (μ+r) represent the points of inflection. Taking the L.H.S and substituting the normalization function in Eq. 15:
Setting \(\frac {t-\mu }{\sigma }=X\), so that dt=σdX, gives a new equation:
According to the even property of the normal distribution, it becomes:
where r denotes final Q.D value, which is finally utilized in desired cost function for the extraction of infectious regions in fruits and plants. The output of the cost function is in the form of infectious and normal pixels.
where t∈F_{cost}(i,j,k) and F_{out}(t) represents the pixels showing infection, which are set in the threshold function to obtain a binary segmented image.
where F_{QD}(i,j) represents the final Q.D-based segmented image and t_{i} denotes the current enhanced image pixel. The Q.D segmentation results, including their contour, mesh graph, and 3D contour images, are shown in Fig. 6.
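For a normal distribution, the quartile deviation r is the half-width of the central interval (μ−r, μ+r) that contains half of the probability mass, which gives the well-known value r ≈ 0.6745σ. The Python sketch below checks this with the standard library and also computes an empirical Q.D from sample values. The function names are hypothetical, and the code only illustrates the Q.D value itself, not the article's cost or threshold functions.

```python
from statistics import NormalDist, quantiles


def normal_quartile_deviation(sigma):
    """Q.D of a normal distribution: r with P(mu - r < t < mu + r) = 1/2."""
    # By symmetry, mu + r is the 75th percentile, so r = sigma * z_0.75.
    return NormalDist(0.0, sigma).inv_cdf(0.75)


def sample_quartile_deviation(values):
    """Empirical Q.D, (Q3 - Q1) / 2, from a list of pixel intensities."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2.0


if __name__ == "__main__":
    print(normal_quartile_deviation(1.0))
    print(sample_quartile_deviation([1, 2, 3, 4, 5, 6, 7]))
```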
Image fusion
Image fusion is mostly employed where information from multiple sources (images) is consolidated into fewer images, usually a single one. In this article, a weighted coefficient of correlation (WCoC)-based technique is implemented for pixel-based fusion of the two segmented images. The actual range of the CoC lies in (−1 : 1), but in this work we operate on binary images; therefore, the resultant image is binary. This method finds strongly correlated pixels between both images. The most highly correlated pixels are assigned higher weights, while less correlated pixels are considered background and eliminated. Suppose \(\{p_{1}, p_{2}, \dots, p_{n}\}\) are uncorrelated pixels from both segmented images T(i,j) and F_{QD}(i,j) having the same standard deviation; the correlation coefficient is defined as:
where γ_{12} denotes a correlation between pixels which is initialized as γ_{12}=0.
Let σ^{2}(p_{1})=S^{2} and σ^{2}(p_{2})=S^{2} so {i,j}=(u_{1}+u_{2}). Let (i,j)∈(x,y), then the mathematical formulation is done as:
Then assign the weight and bias values which are selected to be 0.8 and 2.5.
The above equation is simplified as:
where \(\sigma ^{2}(p_{1})=\frac {\sum (p_{1}-\bar {p_{1}})^{2}}{n}, S^{2}=\frac {\sum (p_{1}-\bar {p_{1}})^{2}}{n}, nS^{2}=\sum (p_{1}-\bar {p_{1}})^{2}, \sum (x-\bar {x})^{2}=2nS^{2}, \sum (y-\bar {y})^{2}=2nS^{2}, \sum (x-\bar {x})(y-\bar {y})=nS^{2}\), and R_{i,j} correlates those pixels whose sum is 1. The final fusion results are shown in Fig. 7.
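As a worked step, and assuming the coefficient takes the standard Pearson form (an assumption made here for illustration), substituting the three sums listed above gives an unweighted correlation of one half. The weight and bias values are not included in this sketch.

```latex
\gamma_{12}
  = \frac{\sum (x-\bar{x})(y-\bar{y})}
         {\sqrt{\sum (x-\bar{x})^{2}\,\sum (y-\bar{y})^{2}}}
  = \frac{nS^{2}}{\sqrt{2nS^{2}\cdot 2nS^{2}}}
  = \frac{nS^{2}}{2nS^{2}}
  = \frac{1}{2}
```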
Analysis of segmentation results
For the analysis of the proposed segmentation technique, we selected 400 image samples, 100 for each disease (apple scab, apple rust, grapes rot leaves, and grapes leaf blight); a few can be seen in Fig. 8. Three measures are used to assess the performance of the proposed method, namely accuracy, Jaccard Index, and false negative rate, calculated as follows:
where R_{i,j} is the proposed segmented image, S(i,j) is the ground truth, and TP_{l} represents the correlated pixels. Tabular results are provided in Table 2, and graphical results along with their ground truths are shown in Figs. 9 and 10. Additionally, a few other sample segmentation results are provided in Fig. 11. The maximum accuracy achieved on the tested images is 95.63%; moreover, the minimum reported false negative rate is 4.37%, the maximum Jaccard Index is 99.26%, the overall average accuracy is 93.74%, the average Jaccard Index is 94.17%, and the average false negative rate is 6.26%. Average results are also plotted in Fig. 12, which shows the range of segmentation accuracy on all selected images.
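The three segmentation measures can be sketched in Python from a predicted binary mask and its ground truth. The sketch uses the common pixel-wise definitions, which is an assumption and may differ from the article's own equations; the function name `segmentation_scores` is hypothetical.

```python
import numpy as np


def segmentation_scores(pred, truth):
    """Pixel-wise accuracy, Jaccard Index, and false negative rate."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # infected pixels found
    tn = int(np.sum(~pred & ~truth))  # background pixels kept as background
    fp = int(np.sum(pred & ~truth))   # background marked as infected
    fn = int(np.sum(~pred & truth))   # infected pixels missed
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "jaccard": tp / union if union else 1.0,
        "fnr": fn / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [1, 0]]
    print(segmentation_scores(predicted, reference))
```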
Feature extraction
Features play a vital role in recognizing the primary contents of images or signals. Therefore, in the field of pattern recognition and machine learning, a range of techniques has been proposed [40–45]. On the one hand, an optimal set of features leads to accurate classification, while, on the other hand, irrelevant and redundant features are one of the causes of high misclassification. In this article, we not only utilize multiple sets of features but also avoid feature redundancy by implementing a suitable feature selection method. We extract three different types of features from the segmented images: statistical, color [46], and texture (segmented local binary patterns (SLBP)).
For color features, the RGB, HSV, LAB, and YCbCr color spaces are used, and four measures (mean, standard deviation, entropy, and skewness) are calculated for each channel. From each color space, we obtain a feature vector of size 1×12, which increases to 1×48 for all selected color spaces and N×48 for N images.
For statistical features, the Haralick method [47] is implemented, which originally used 14 features; we added 8 new features: correlation 2, cluster prominence, cluster shade, dissimilarity, energy, homogeneity 1, homogeneity 2, and max probability. The addition of these features improves the overall classification accuracy but also increases the computational time. A complete mathematical description of each feature is provided in Table 3, and the final vector size is 1×88.
LBP [48] belongs to the category of texture features and captures information related to neighboring pixels. In this work, the ‘A’ channel of the LAB color space is utilized as the input for feature extraction because it provides more information than the other channels. The proposed segmented local binary pattern features (SLBPF) are based on three steps: (a) calculate the distance between the extracted LBP features, (b) calculate the statistical features of LBP, and (c) calculate the entropy features of their 8 neighborhood features. The extracted features are simply concatenated to form a new feature vector of size 1×72.
where Ψ_{LBP} is a feature vector and \( S(u)=\left \{ \begin {array}{ll} 1 & if \ \ u\geq 0 \\ 0 & if \ \ u<0 \end {array}\right \}\) is a threshold function, n=8 is the total number of neighbors, g_{p} denotes the neighboring pixels, and g_{c} is the pivot location [49]. The distance between features is calculated using the relation:
where \(\vec {D}_{ij}\) denotes the distance matrix, which is later utilized to compute the mean, variance, skewness, and kurtosis. These metrics are then concatenated to generate a new vector of dimension 1×64. The entropy features of each of the 8 neighboring features are computed as:
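The 1×12 color vector for a single color space can be sketched in Python as below: four measures per channel, three channels per color space. This is a minimal sketch; the function names, the 256-bin histogram used for entropy, and the population form of skewness are assumptions, and the color space conversions (HSV, LAB, YCbCr) are not shown.

```python
import numpy as np


def channel_stats(channel, bins=256):
    """Mean, standard deviation, entropy, and skewness of one channel."""
    x = np.asarray(channel, dtype=np.float64).ravel()
    mean = x.mean()
    std = x.std()
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / x.size          # probabilities of non-empty bins
    entropy = -np.sum(p * np.log2(p))    # Shannon entropy in bits
    skew = 0.0 if std == 0 else np.mean(((x - mean) / std) ** 3)
    return [mean, std, entropy, skew]


def color_features(image):
    """H x W x 3 image in one color space -> feature vector of length 12."""
    feats = []
    for k in range(image.shape[2]):
        feats.extend(channel_stats(image[:, :, k]))
    return np.array(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    rgb = rng.integers(0, 256, size=(64, 64, 3))
    print(color_features(rgb).shape)
```

Concatenating the vectors obtained from the four color spaces would give the 1×48 vector mentioned in the text.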
where a_{x} and a_{y} denote the neighboring ith and jth features; 8 entropy features are extracted and concatenated with the previous vector to obtain a new feature vector having size 1×72. Finally, all features are fused [50] to generate a resultant vector of size 1×208. The core architecture of feature extraction and selection is shown in Fig. 13.
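The basic 8-neighbor LBP code on which the SLBP features build can be sketched in Python as follows. It applies the threshold function S(u) to the difference between each neighbor and the center pixel and weights the result by a power of two, which is the standard LBP form. The function name and the clockwise neighbor ordering are assumptions, and the distance, statistical, and entropy steps of SLBP are not shown.

```python
import numpy as np


def lbp_8_neighbors(gray):
    """Basic 3x3 LBP code for every interior pixel of a 2-D array."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbor offsets (row, col), clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for p, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # S(u) = 1 if u >= 0 else 0, placed at bit position p (weight 2**p).
        codes |= ((neighbor - center) >= 0).astype(np.uint8) << p
    return codes


if __name__ == "__main__":
    patch = np.array([[9, 0, 0],
                      [0, 5, 0],
                      [0, 0, 0]])
    print(lbp_8_neighbors(patch))
```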
Feature selection
To avoid redundancy, the feature selection step plays a primary role by discarding irrelevant and repeated information, hence selecting the most discriminant information. In this article, we implement a new method based on the rank correlation and entropy value of the fused vector. The proposed method incorporates three fundamental steps: (a) calculate the rank correlation between the fused features; (b) find the entropy value of the fused features and multiply it by the rank correlation; and (c) set a threshold function to select the features with minimum entropy-correlation values. The extracted fused features f_{1},f_{2},...f_{n} are ranked from 1 to n, and we need to find the correlation between the ranks of the given features. The rank correlation is defined as:
where f_{1} and f_{2} represent the fused feature vectors. The above equation is simplified using \(\sum f_{1}=\sum f_{2}=\frac {n(n+1)}{2}\) and \(\sum (f_{1})^{2}=\sum (f_{2})^{2}=\frac {n(n+1)(2n+1)}{6}\). The difference between the fused features is then φ=f_{1}−f_{2}. Squaring both sides, applying \(\sum \), and dividing both sides by 2 gives \(\sum f_{1}f_{2}=\frac {n(n+1)(2n+1)}{6}-\frac {\sum \varphi ^{2}}{2}\) and \(n\sum f_{1}f_{2}-\sum (f_{1})\sum (f_{2})=\frac {n^{2}(n^{2}-1)}{12}-\frac {n\sum \varphi ^{2}}{2}\). Similarly, \(n\sum f_{1}^{2}-(\sum f_{1})^{2}\) and \(n\sum f_{2}^{2}-(\sum f_{2})^{2}\) both equal \(\frac {n^{2}(n^{2}-1)}{12}\). Substituting these simplifications in Eq. 36 gives:
where \(\sum \varphi ^{2}=\sum (f_{1})^{2}+ \sum (f_{2})^{2}-2\sum f_{1}f_{2}\). The entropy value of the fused feature vector is then calculated and multiplied by the correlation. The obtained value is compared with each feature of the fused vector, and the features are selected based on the final threshold function as follows:
The resultant vector \(\overrightarrow {F(Vec)}\) is utilized for final classification. We performed the simulations several times and found that the size of the selected vector mostly lies in the range of 180–195. Finally, the multi-class SVM [51] is used as the base classifier for the classification of apple and grape diseases, and its classification results are compared with other well-known classification methods such as ensemble methods and decision trees. Two SVM kernel functions are utilized in this work: linear and radial basis function (RBF). The linear kernel is used for the binary class problem with the following parameters: the kernel scale is automatic, the classification method is one vs one, and data standardization is enabled. For the RBF kernel, the kernel scale is manual, the box constraint level is 4, the multi-class method is one vs all, and gamma is initialized as 0.3.
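The rank correlation derivation in this section leads to Spearman's well-known formula \(\rho =1-\frac {6\sum \varphi ^{2}}{n(n^{2}-1)}\), where φ is the difference between ranks. The Python sketch below computes this value together with a histogram-based Shannon entropy, the two quantities that the selection step multiplies. It is a minimal sketch under stated assumptions: the function names and the 10-bin entropy estimate are hypothetical, ties are assumed absent (as in the derivation), and the article's threshold function is not reproduced.

```python
import math


def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(f1, f2):
    """rho = 1 - 6 * sum(phi^2) / (n * (n^2 - 1)); requires n >= 2."""
    n = len(f1)
    phi_sq = sum((a - b) ** 2 for a, b in zip(ranks(f1), ranks(f2)))
    return 1.0 - 6.0 * phi_sq / (n * (n * n - 1))


def shannon_entropy(values, bins=10):
    """Entropy (bits) of a feature vector from an equal-width histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


if __name__ == "__main__":
    a = [0.2, 0.9, 0.4, 0.7]
    b = [1.0, 3.5, 2.0, 3.0]
    print(spearman_rho(a, b) * shannon_entropy(a))
```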
Experimental results and discussion
In this section, the proposed method is validated on a publicly available dataset, PlantVillage [52], which contains diseased and healthy images (Fig. 14). To validate the proposed algorithm, the individual feature sets are classified first, and fusion and selection are applied later. A 10-fold cross-validation methodology is adopted, along with a fair comparison with other state-of-the-art classifiers including decision trees (DT), quadratic discriminant analysis (QDA), quadratic SVM (QSVM), cubic SVM (CSVM), fine KNN (FKNN), weighted KNN (WKNN), ensemble boosted trees (EBT), and ensemble subspace discriminant analysis (ESDA). Six statistical measures are considered for the performance comparison of the proposed method: sensitivity (Sen), specificity (Spec), precision (Prec), false positive rate (FPR), false negative rate (FNR), and accuracy. The training/testing ratio is 50:50, with 50% of the images used for training and 50% for testing. All experiments are performed in Matlab 2017b on a personal Intel Core i7 computer.
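The six measures listed above, plus the F1 score used later in the discussion, can be sketched in Python from the counts of a binary confusion matrix. The sketch uses the standard textbook definitions, which is an assumption because the article does not state its formulas here; the function names are hypothetical.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard measures from true/false positive/negative counts."""
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "fpr": _ratio(fp, fp + tn),
        "fnr": _ratio(fn, fn + tp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```

For a multi-class problem, the same function can be applied per class in a one-vs-rest fashion and the results averaged.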
Apple scab disease
In this section, the classification results for apple scab disease are presented. A total of 2275 images of apple scab (630) and healthy apple (1645) are collected from the PlantVillage dataset. The results are obtained in two phases. In the first phase, the results are obtained from each extracted set of features, as depicted in Table 4, with maximum accuracies on the multi-class SVM of 94.1%, 86.3%, and 72.0% for SLBP, statistical, and color features, respectively. These results are then compared with the proposed entropy-rank correlation-based selection method. Table 5 shows a maximum accuracy of 97.1%, FNR of 2.9%, sensitivity of 96.15%, specificity of 96.2%, FPR of 0.039, and precision of 96.10%. The proposed results are confirmed by the confusion matrix of apple scab given in Table 6. Tables 4 and 5 clearly show that the proposed feature selection method produces better results than the individual sets of features. Moreover, the proposed method is also compared with previous state-of-the-art methods in Table 7, which supports the validity of the proposed entropy-rank correlation method.
Apple rust disease
A total of 1920 images are collected from the PlantVillage dataset, containing apple rust (275) and healthy apple (1645) images. The experiments are performed in two steps. In the first step, classification results are obtained on each extracted set of features (Table 8). The maximum accuracy is achieved with the MSVM classifier: 93.2%, 90.9%, and 95.8% for SLBP, Haralick, and color features, respectively. In the second step, the selected features are utilized for classification using the proposed method, showing improved performance (Table 9). The classification results are also confirmed using the confusion matrix given in Table 6. From Tables 8 and 9, it is quite clear that performance improves significantly with the proposed feature selection method. Additionally, the proposed classification results are compared with the existing methods given in Table 7.
Grape diseases
Two types of grape diseases, grapes rot leaves and grapes leaf blight, are selected in this section for classification. A total of 2679 images are collected from the PlantVillage dataset, which include grapes black rot (1180), grapes leaf blight (1076), and healthy (423). The same trend is followed: in the first step, classification results are obtained on each extracted set of features (Table 10). In Table 10, the classification results on grapes rot leaves show accuracies of 93.2%, 90.9%, and 95.8% for SLBP, Haralick, and color features, respectively. The proposed classification results for grapes leaf blight are presented in Table 11, with a maximum accuracy of 96.30%, which is also confirmed by the confusion matrix (Table 6). Finally, the proposed results are compared with the existing methods described in Table 7, which shows that the proposed method performs significantly well compared to existing methods.
Final classification
In this section, all selected diseases are utilized for classification, and the proposed method is applied to them directly. The testing results are given in Table 12, with a maximum accuracy of 97.1% on the multi-class SVM. The testing results are confirmed by the confusion matrix given in Table 13, which supports the validity of the proposed method.
Discussion
From a broader perspective, two primary domains are covered: (1) infected region segmentation and (2) discriminant feature selection. The proposed segmentation method relies directly on image fusion from two different sources. Selected results can be seen in Figs. 7, 9, 10, and 11 and Table 2, with a maximum achieved accuracy of 95.63% and an average accuracy of 93.45%. In the latter phase, feature selection, three types of features are fused by a simple serial-based method and then finalized using the entropy-rank correlation method. Five experiments are performed on the selected diseases (apple scab, apple rust, grapes rot leaves, grapes leaf blight, and the final classification on all diseases), achieving accuracies of 97.1%, 94.70%, 96.60%, 96.30%, and 97.7%, respectively. For validation, the classification results are obtained on the individual feature types, as presented in Tables 4, 8, and 10. The proposed entropy-rank correlation results are presented in Tables 5, 9, 14, 11, and 12 and are confirmed by the confusion matrices given in Tables 6 and 13, which support the validity of the proposed method. Additionally, the 8 new statistical features improve the overall accuracy by embedding a set of unique features (Fig. 15). Figure 15 shows that when the 14 texture features are computed, the achieved accuracies are 81.9%, 82.7%, 81.8%, and 84.5% for apple scab, apple rust, grapes rot, and grapes leaf blight, respectively, whereas the addition of the 8 features increases the accuracies to 86.3%, 87.2%, 90.9%, and 91.7%, respectively.
In Fig. 16, the F1 score is calculated for the proposed feature selection approach for all selected diseases: apple scab, apple rust, grapes rot, and grapes leaf blight. The results in terms of sensitivity, precision, F1 score, and accuracy show that the proposed feature selection method performs better than the individual feature sets. Finally, a comparison with the latest techniques in Table 7 shows that the proposed method performs significantly well compared to existing methods.
Conclusion
Detection and classification of fruit diseases is an important research area in computer vision and pattern recognition. Due to the complexity and irregularity of diseases in apple and grape leaves and fruits, several existing methods are unable to achieve the required classification accuracy. Therefore, in this article, a new technique is implemented for apple and grape disease detection and classification, based on the fusion of a novel adaptive thresholding and QD-based segmentation. Subsequently, a set of different features is extracted and combined through serial-based fusion. A novel entropy-rank correlation technique is implemented for robust feature selection, which performs well compared to individual features and existing related methods in terms of accuracy, sensitivity, precision, and FPR. The proposed method works well not only on web images but also on publicly available datasets, which pose challenges such as noise and background complexity. From this research, we conclude that combining a set of different features increases the overall accuracy but also increases the computational time and complexity. Therefore, a feature selection method is necessary. The segmentation step also contributes by enabling the extraction of better features, which leads to better classification. As future work, deep features will be utilized instead of conventional ones and the number of diseases will be increased, but the selection step remains necessary even with deep features.
Availability of data and materials
Not applicable
Declarations
Abbreviations
HOG: Histogram of oriented gradients
QD: Quartile deviation
CLPSO: PSO
SVM: Support vector machine
ISADH: Improved sum and difference of histogram
RGB: Red, green, blue
GA: Genetic algorithm
CFS: Correlation-based feature selection
GCH: NN
CCV: Color coherence vector
LBP: Local binary patterns
CLBP: Complete local binary patterns
SGDM: NN
GLCM: Gray-level co-occurrence matrix
SLBP: Segmented local binary patterns
PSO: Particle swarm optimization
ANN: Artificial neural network
SLBPF: SLBP features
DT: Decision tree
QDA: Quadratic discriminant analysis
QSVM: Quadratic SVM
CSVM: Cubic SVM
FKNN: Fine K-nearest neighbor
WKNN: Weighted KNN
EBT: Ensemble boosted tree
ESDA: Ensemble subspace discriminant analysis
FPR: False positive rate
FNR: False negative rate
Acknowledgements
This research work is partially sponsored by the Deanship of Scientific Research at the University of Hail, Kingdom of Saudi Arabia. The authors are grateful for this financial support. We would also like to thank the PlantVillage community for developing this large dataset, and the VC, HITEC University, Taxila, Pakistan.
Funding
Not applicable
Author information
Contributions
MAK and TA developed the idea and performed the simulations by developing different patches of code with full integration. They are also responsible for the complete write-up. The different accuracy criteria were finalized and simulated by these authors. MS provided technical support throughout the paper. AM gave the article its complete shape, identified several issues, and helped the primary authors overcome those shortcomings. Moreover, this author is also responsible for the funding of this manuscript. TS is responsible for the final proofreading, along with technical support in the classification step, which is her research specialty. NN provided technical support in different sections, including feature extraction and fusion, along with the issues raised in the development of the feature selection method. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Khan, M.A., Akram, T., Sharif, M. et al. A probabilistic segmentation and entropy-rank correlation-based feature selection approach for the recognition of fruit diseases. J Image Video Proc. 2021, 14 (2021). https://doi.org/10.1186/s13640-021-00558-2
DOI: https://doi.org/10.1186/s13640-021-00558-2
Keywords
 Contrast stretching
 Segmentation
 Fusion
 Feature extraction
 Feature selection
 Classification