An evolutionary classifier for steel surface defects with small sample set
© The Author(s) 2017
Received: 29 February 2016
Accepted: 4 July 2017
Published: 17 July 2017
Nowadays, surface defect detection systems for steel strip have replaced traditional manual inspection, and automatic defect detection systems offer good performance when the sample set is large and the model is stable. However, the trained model does not work well when a new production line is initiated with different equipment, processes, or detection devices. These variables make only tiny changes to the real-world model but have a significant impact on the classification result. To overcome these problems, we propose an evolutionary classifier with a Bayes kernel (BYEC) that can be adjusted with a small sample set to better adapt the model for a new production line. First, abundant features were introduced to cover detailed information about the defects. Second, we constructed a series of support vector machines (SVMs) with a random subspace of the features. Then, a Bayes classifier was trained as an evolutionary kernel fused with the results from the sub-SVMs to form an integrated classifier. Finally, we proposed a method to adjust the Bayes evolutionary kernel with a small sample set. We compared the performance of this method with that of various algorithms; the experimental results demonstrate the robustness, low sample requirement, and adaptiveness of the proposed method, which can be adjusted with a small sample set to fit the changed model.
With growing competition among producers of steel strip, the quality of the steel strip surface has become very important. Steel strip quality and the surface quality of structural products have assumed increasing importance. This importance requires that effective and efficient methods be implemented to replace conventional manual visual inspection, during which an expert can inspect only 0.05% of the total steel surface and which is easily affected by fatigue and other unfavorable conditions. Manual inspection cannot satisfy the quality requirements. Therefore, automatic, high-accuracy steel surface inspection systems have become essential to the production system.
Surface defect inspection systems mainly consist of two parts: defect segmentation and defect processing. Defect processing entails feature extraction and defect classification. In recent years, abundant research has been conducted, and different kinds of feature extraction and classification methods have been introduced to classify steel strip surface defects. In [2, 3], the defects were classified by using K-nearest neighbor (KNN) methods with a co-occurrence matrix. Ghorai et al. described an automated visual inspection system with discrete wavelet transform (DWT) features and a support vector machine (SVM). In 2008, Wu et al. described an algorithm that uses an undecimated wavelet transform (UWT) and mathematical morphology to detect geometric defects; it achieved 90.23% accuracy, which was not yet sufficient for real industrial application. However, in 2013, a noise-robust method based on completed local binary patterns, proposed by Song, averaged 98% accuracy and was shown to be effective enough to be applied to real production. A review of vision-based steel surface inspection systems indicates that most such systems have achieved >90% accuracy.
For application to real production, some difficulties for steel surface defect detection remain. In a real production environment, different products can be produced on a single production line, and some products are manufactured on several production lines at the same time. With the passage of time, the physical conditions of the equipment and of the detection device will both change. Each of these variables has only a small impact on the real-world model; however, the classifier trained on the original database will not work well on the changed real-world model. If we trained every production line and every product with its own database, it would take a long time for a steel company to collect a large enough sample set for a newly built production line without a trained inspection system.
Unfortunately, most research has focused on algorithms that work only when a large number of samples is available, and little research has focused on this production problem. Solly et al. proposed a rapidly evolving system with an expert's feedback; however, that research describes an adaptive interactive evolution methodology for determining parameters to control segmentation of surface defects on images, and the classification accuracy will still be affected by changes to the real-world model.
Therefore, to solve these issues, we propose an evolutionary classifier with a Bayes kernel (BYEC) that can be adjusted with small sample sets to fit the changed model. Because small changes to the real-world model affect only some of the features, the misclassifications of the classifier are mainly caused by these features. If we reduce the impact of these features, we should be able to restore the classifier to a relatively high accuracy. Thus, we built a classic classifier with an evolutionary Bayes kernel and adjusted the kernel with a small sample set for the changed production model rather than training every product and every production line with its own data set.
Our technical contribution includes three points. First, we propose an evolutionary classifier with a Bayes kernel (BYEC) for steel defect classification. Second, we adopt multiple SVMs to predict individual features, enabling our evolutionary Bayes kernel to fit well with a small sample set for the changed production model. Third, and most importantly, we combine five selected features and prove that our method has high accuracy, is adaptive, and has a low requirement for samples in our experiments.
The rest of the paper is organized as follows: Section 2 depicts the procedures of the surface defect inspection system. Section 3 presents the features we used in this paper. The method to build SVM subclassifiers on a random subspace of features is provided in Section 4. The method to build the Bayes kernel and evolve the kernel is discussed in Section 5. Section 6 gives comparative experimental results of this algorithm and research on factors that affect the results. Finally, we conclude this paper in Section 7 and give some suggestions for future work.
2 Structure of the evolutionary classifier for steel surface defects
As illustrated in Fig. 1, five kinds of features were introduced in this system. Because penalizing some features during the adjustment process discards useful information about the defect and may lead to misclassification, redundant features were included. We integrated uniform local binary patterns (ULBP), gray-level co-occurrence matrix (GLCM) features, the histogram of oriented gradient (HOG) feature, the gray-level histogram, and Gabor filter features. These abundant features guarantee enough information for the classifier. To utilize these features, a multiple classifier system was incorporated into this classifier.
The classification part has two components in Fig. 1: multiple SVM classifiers and a Bayes kernel classifier. Compared with the classic steel surface inspection system, the fuser Bayes kernel makes the key contribution to this system by fusing the results from the multiple classifiers and adjusting the hybrid parameters.
The SVM is a popular small sample set learning method. It offers very good performance for pattern classification problems by minimizing the Vapnik-Chervonenkis (VC) dimension and achieving a minimal structural risk. Because of its small sample set requirement, fast learning capability, and good performance, SVM is a good choice for this method. Because each SVM classifier is trained on a subspace of the feature space, we can evaluate and penalize the features by using the corresponding SVM classifier.
A multiple classifier system (MCS) offers many alternatives for the unorthodox handling of realistic complex problems, allowing us to exploit the potential of the individual classifiers and get enhanced performance by their combination. It can perform better than the individual classifiers and is easily implemented on parallel, multithreaded, and distributed architectures, which is very important for the real-time production environment. The critical aspect of an MCS is that it is well suited to treating drift, that is, the statistical dependencies between object features and their classification may change over time, so that future data may be badly processed if we maintain the same classifier. Because drift decreases the accuracy of the classification result, the individual classifiers are evaluated by their accuracy on the new data. The best-performing classifiers are selected to constitute the MCS committee after every loop. Kolter et al. described a dynamic weighted majority algorithm. We also use an evolvable weighted method to change our classifier with time.
The method to fuse the subclassifiers must be adjusted with a small sample set and the accuracy of the subclassifiers must be evaluated; therefore, the Bayes classifier is a good candidate as it builds a reference from prior probability to posterior probability. We can score the performance of the classifiers on the changed model by the posterior probability. With a small sample set, we can get the approximate posterior of the classifiers and use it to reweight the integrated classifier to fit the real model. A Bayes kernel was built based on this concept.
3 Extraction of features
To avoid a lack of information after integration, we introduce redundant features. Five different kinds of features are extracted in the inspection system to describe the properties of texture, color, and shape. The feature space consists of a gray-level co-occurrence matrix, a uniform local binary pattern, a histogram of oriented gradient, a gray-level histogram, and a Gabor filter.
3.1 Gray-level co-occurrence matrix
The GLCM is sensitive to rotation, so we compute it in four directions (0°, 45°, 90°, and 135°) at a distance of 8 to cover more information. The GLCM itself is too large to process directly, so we describe it with Haralick features, namely the correlation, energy, contrast, entropy, and inertia quadrature of the GLCM. Finally, we obtain a vector that describes the GLCM feature.
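The GLCM description above can be illustrated with a minimal, dependency-free sketch. It assumes the image is a list of rows of integers already quantized to `levels` gray levels, and it computes correlation, energy, contrast, and entropy with their usual textbook definitions; the fifth statistic is left out because its exact definition is not given in this section. The function names and the `levels` parameter are illustrative and not taken from the paper.

```python
import math

# Unit pixel offsets (dx, dy) for the four directions, with y growing downward.
DIRECTIONS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def glcm(image, dx, dy, levels):
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    h, w = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1.0
                counts[b][a] += 1.0  # count each pair in both orders
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick_stats(p):
    """Correlation, energy, contrast, and entropy of a normalized, symmetric GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu = sum(i * v for i, _, v in cells)  # row mean equals column mean (symmetry)
    var = sum((i - mu) ** 2 * v for i, _, v in cells)
    cov = sum((i - mu) * (j - mu) * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    correlation = cov / var if var > 0 else 0.0  # flat image: define as 0
    return [correlation, energy, contrast, entropy]


def glcm_feature_vector(image, distance=8, levels=16):
    """Concatenate the statistics for 0, 45, 90, and 135 degrees."""
    vector = []
    for angle in (0, 45, 90, 135):
        ux, uy = DIRECTIONS[angle]
        matrix = glcm(image, ux * distance, uy * distance, levels)
        vector += haralick_stats(matrix)
    return vector
```

With four directions and four statistics, this sketch yields a 16-element vector per image; the image must be larger than the chosen distance in both dimensions so that every direction has at least one pixel pair.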
3.2 Histogram of oriented gradient and gray level
The histogram is a key tool in image processing and one of the most useful techniques for gathering information about a matrix. The gray-level histogram of the defect image represents the distribution of the pixels over the gray-level scale and reflects the contrast, gray level, and other information of the image. It can be visualized as if each pixel is placed in a bin corresponding to the intensity of that pixel. We divide the histogram of the defect image into 40 bins and use the result as the gray-level feature vector.
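A 40-bin gray-level histogram can be sketched as follows. The sketch assumes 8-bit pixel values (0 to 255) stored as a list of rows, and it normalizes the counts so that the vector sums to 1; both the value range and the normalization are assumptions, since the text only fixes the number of bins.

```python
def gray_level_histogram(image, bins=40, max_level=256):
    """Return a normalized histogram of pixel intensities with `bins` equal-width bins."""
    counts = [0] * bins
    n_pixels = 0
    for row in image:
        for value in row:
            # Map the intensity to its bin; clamp so the top value stays in range.
            index = min(int(value) * bins // max_level, bins - 1)
            counts[index] += 1
            n_pixels += 1
    return [c / n_pixels for c in counts]
```

Normalizing by the pixel count makes the feature independent of the defect image size, which matters if segmented defects vary in area.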
3.3 Uniform local binary pattern
3.4 Gabor filter
4 The SVM subclassifiers on a random subspace of features
After feature extraction, we obtain a single feature vector for each defect image. In general, when the feature dimension is too large compared with the scale of the sample set, overfitting problems can arise. Therefore, instead of training one classifier to cover the whole feature space, we separate the features with a random sampling scheme without replacement and keep almost equal dimensions for every subspace. We choose a feature subspace with the random sampling scheme rather than by feature category or in sequence for two reasons: (1) penalizing a feature also penalizes the neighboring features in its subspace, which may contain useful information and may not be disturbed by the change in the real-world model, and (2) neighboring features may contain the same information, so a classifier trained only on neighboring features will not give a good result.
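The random partition of the feature indices can be sketched in a few lines. This is a minimal illustration under the assumption that every feature index is assigned to exactly one subspace; the function name and the `seed` parameter are hypothetical.

```python
import random


def random_subspaces(n_features, k, seed=0):
    """Split feature indices 0..n_features-1 into k disjoint random subsets
    whose sizes differ by at most one (sampling without replacement)."""
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    # Slice i::k deals the shuffled indices round-robin across the k subsets.
    return [sorted(indices[i::k]) for i in range(k)]
```

For example, `random_subspaces(212, 25)` gives 25 index lists of 8 or 9 features each; one sub-SVM would then be trained on the columns selected by each list.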
To overcome the small sample set issue on the production line, we introduce a support vector machine, which has been widely used in many areas, such as computer vision, natural language processing, and neuroimaging, for its good performance, fast training capability, and small sample set requirements for labeled samples.
Cortes and Vapnik describe the standard SVM for two-class classification within the framework of structural risk minimization. The key idea of the SVM is to find the hyperplane that maximizes the margin between the two classes to be separated.
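For reference, the standard soft-margin SVM dual problem, in the form commonly used by SVM solvers, can be written as below. The multipliers $\alpha$, the labels $y$, the number of training samples $l$, and the penalty parameter $C$ follow the usual notation and are assumed here rather than taken from the text.

```latex
\min_{\alpha} \quad \frac{1}{2}\,\alpha^{T} Q\, \alpha - e^{T} \alpha
\qquad \text{subject to} \quad y^{T} \alpha = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l
```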
In which $e = [1, \ldots, 1]^T$ is the vector of all ones, $Q$ represents an $l \times l$ positive semidefinite matrix, $Q_{ij} \equiv y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is the kernel function.
We combine the results from the binary classifiers with a voting scheme: every binary classifier has a vote, and a data point is classified into the class with the maximum number of votes. To resolve ties, we simply choose the class with the greater sequence number. In addition, we increase the sequence number of the classes with every multiclass SVM to avoid accumulated deviations. A simple example indicates the risk without this strategy: assume we have three classes, A, B, and C, all with the same accuracy and scale. If ties were always resolved toward the greater sequence number, the fewest samples would be classified into A.
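The voting and tie-breaking rule can be sketched as follows, assuming a pairwise (one-vs-one) decomposition in which one binary classifier exists for every pair of classes. The `pairwise_winner` callback and the `shift` parameter are hypothetical: `shift` models one possible reading of increasing the sequence numbers for each multiclass SVM, namely rotating the class order used for tie-breaking.

```python
from itertools import combinations


def one_vs_one_vote(pairwise_winner, n_classes, shift=0):
    """Combine pairwise decisions by majority vote.

    pairwise_winner(a, b) must return a or b, the class preferred by the
    binary classifier trained on classes a < b.
    """
    votes = [0] * n_classes
    for a, b in combinations(range(n_classes), 2):
        votes[pairwise_winner(a, b)] += 1
    best = max(votes)
    tied = [c for c, v in enumerate(votes) if v == best]
    # Tie-break: greatest sequence number, after rotating the order by `shift`.
    return max(tied, key=lambda c: (c + shift) % n_classes)
```

Giving each sub-SVM a different `shift` spreads tied samples across the classes instead of always favoring the last one.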
5 Combining subclassifiers by using the Bayes kernel
The combination of the subclassifiers is very important to this evolutionary classifier because it is responsible not only for improving the performance of the final integrated classifier but also for the ability to evolve to fit the changed model. Many ensemble methods have been presented [21–23]; however, these methods do not suit this adaptive inspection system. This is because (1) these methods are sensitive to the size of the training sample set (and, even for a mature production line, there are not many labeled samples), and (2) the fusion of classifiers may be biased by the combination of samples from the changed model.
Bayes' theorem was proposed about 300 years ago by Thomas Bayes and has developed into a major branch of machine learning. In some domains, its performance is comparable to that of neural networks and other machine learning methods. The naive Bayes classifier applies when each instance $x$ is described by a conjunction of attribute values and the target function $f(x)$ takes its values from a finite set $V$.
In Bayes learning, the training examples are described by a feature vector $(a_1, a_2, a_3, \ldots, a_n)^T$; the Bayes classifier makes decisions based on the probability of every possible target value and selects the most probable target.
In this model, we treat every subclassifier as a feature in the feature vector and describe it with a probability matrix. The feature vector of the Bayes classifier is defined as $(D_1, D_2, D_3, \ldots, D_m)^T$, where $D_i$ is the decision made by the $i$th classifier, and the decision of the Bayes classifier is taken from a finite set $V = \{v_1, v_2, \ldots, v_n\}$. The prior probability that the $i$th classifier makes decision $k$ and the real class is $j$ is defined as Eq. 13.
The key structure of the naive Bayes fusion kernel in this model is the posterior probability matrix for the individual classifiers, which we trained with labeled samples. To adjust our model to fit the changed real-world model, we changed the posterior probability matrix.
As depicted in Fig. 2, to evolve the Bayes kernel, a new posterior probability matrix is trained with the labeled samples from the changed production line to replace the old one. The new classifier model combines the naive Bayes kernel, with its new posterior probability matrix, and the SVM subclassifiers. The Bayes classifier takes advantage of not only the true positive results but also the true negative results from the subclassifiers. Because some subclassifiers may lose efficacy after a real-world model change and make biased classifications, this information can also be utilized by the integrated classifier.
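The fusion and adjustment steps can be sketched as a small naive Bayes combiner over subclassifier decisions. This is an illustration under stated assumptions, not the paper's exact formulation: each subclassifier gets a table estimating P(decision = k | true class = j) from labeled samples, Laplace smoothing (the `smoothing` parameter) is added to avoid zero probabilities, and "evolving" the kernel simply means re-estimating the tables from the small labeled set of the changed line while the subclassifiers stay fixed. All function names are hypothetical.

```python
import math


def fit_kernel(decisions, labels, n_classes, smoothing=1.0):
    """Estimate the class prior and one probability table per subclassifier.

    decisions[s][i] is the class chosen by subclassifier i for sample s;
    tables[i][j][k] approximates P(subclassifier i says k | true class j).
    """
    m = len(decisions[0])
    class_counts = [0] * n_classes
    counts = [[[smoothing] * n_classes for _ in range(n_classes)] for _ in range(m)]
    for row, j in zip(decisions, labels):
        class_counts[j] += 1
        for i, k in enumerate(row):
            counts[i][j][k] += 1
    total = len(labels) + smoothing * n_classes
    prior = [(c + smoothing) / total for c in class_counts]
    tables = [[[v / sum(r) for v in r] for r in per_clf] for per_clf in counts]
    return prior, tables


def fuse(row, prior, tables):
    """Return the most probable class given one row of subclassifier decisions."""
    scores = []
    for j in range(len(prior)):
        score = math.log(prior[j])
        for i, k in enumerate(row):
            score += math.log(tables[i][j][k])
        scores.append(score)
    return max(range(len(scores)), key=scores.__getitem__)
```

To adapt to a changed line under this sketch, one would call `fit_kernel` again on the decisions that the unchanged sub-SVMs produce for the small labeled set and pass the new tables to `fuse`. A subclassifier that has become systematically wrong still contributes, because its table records which wrong answer it tends to give for each true class.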
6 Experimental results
To evaluate the effectiveness of the inspection system for surface defects, a surface defect data set was used. We then compared this approach with some other classification methods. In addition, some factors were examined that demonstrate how they affected classification accuracy.
6.1 Experiment implementation details
6.2 Adaptiveness of the classifier
To evaluate the adaptiveness of this evolutionary integrated classifier, the original NEU defect data set and 13 defect data sets formed by adding noise to the NEU defect data set were used. The added noise had a mean of 0 in every case, and its standard deviation ranged from 1 to 13 (0 denotes the original data set). A total of 212 features were extracted from every defect image.
A BYEC classifier composed of 25 SVM subclassifiers was trained on 70% of the original NEU defect data set, and the remaining 30% of the data were used to evaluate the accuracy of the BYEC classifier on the original data set. Then, we randomly sampled 10% of each processed data set to adjust our BYEC classifier. Finally, the accuracy of the adjusted BYEC classifier and of the original classifier on the processed data set was tested on the remaining 90% of the processed data. As with the BYEC classifier, 70% of the original data set was used to train the KNN, BPNN, and SVM classifiers and 30% was used to evaluate their accuracy on the original data set; the accuracy of these classifiers on each processed data set was then tested on 90% of that data set. The BPNN and KNN parameters were determined by cross-validation testing. Every experiment was run 100 times with independently sampled data sets, and the average accuracy of each classifier on every data set is reported.
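The repeated sampling in this protocol can be sketched as follows. The sketch covers only the index bookkeeping: `run_once` is a hypothetical callback that would adjust the classifier on the first index list, test it on the second, and return an accuracy.

```python
import random


def split_indices(n_samples, fraction, seed):
    """Randomly split sample indices into two disjoint parts of sizes
    round(fraction * n_samples) and the remainder."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * fraction)
    return indices[:cut], indices[cut:]


def average_accuracy(n_samples, run_once, adjust_fraction=0.10, runs=100):
    """Average run_once(adjust_idx, test_idx) over independently drawn splits."""
    total = 0.0
    for seed in range(runs):
        adjust_idx, test_idx = split_indices(n_samples, adjust_fraction, seed)
        total += run_once(adjust_idx, test_idx)
    return total / runs
```

The same `split_indices` helper with `fraction=0.70` would give the 70%/30% split used to train and evaluate the classifiers on the original data set.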
However, the increase in standard deviation has little impact on our BYEC classifier. It can be observed that the proposed method is more adaptive to different standard deviations than the other classifiers. The accuracy of the BYEC classifier on the original data is lower than that of the SVM classifier, perhaps because of information loss from the combination of SVM classifiers. The original BYEC classifier without an adjustment process also shows relatively high adaptiveness compared with the other classifiers.
6.3 Number of the sub-SVM classifiers
The number of sub-SVM classifiers, defined as k, is a key parameter for our BYEC classifier. It determines the granularity at which our system can be adjusted. The purpose of this section is to examine how k affects the accuracy and the adaptiveness of the BYEC classifier.
As the results in Fig. 9 indicate, k should be set according to the real production environment. The BYEC classifier should be set with a larger k value when higher adaptive performance is needed. However, to avoid information loss, we should set the BYEC classifier with a smaller k value when the changed model deviates only slightly from the original model.
6.4 Set size of the sub-SVM classifiers
An important characteristic of a classifier is the size of the sample set that is used to train it. The more samples that are supplied, the more information the classifier can learn about the model. For our BYEC classifier, the accuracy of the subclassifiers can be evaluated more precisely with more samples.
Figure 10 depicts the accuracy of BYEC classifiers adjusted with sample sets of different sizes. All the classifiers were used on the data set described in Section 6.2. We can see that the classifiers adjusted with 10% of the samples reach the lowest accuracy and are the least adaptive, but the classifiers adjusted with 30% reach fairly good performance, and classifiers adjusted with more samples have little advantage over them in terms of accuracy and adaptiveness. Even in the defect set with a standard deviation of 13%, the gap between the BYEC classifier adjusted with 10% and that adjusted with 50% is only 1.01%. This illustrates that our BYEC classifier converges very fast and has a low requirement for sample size.
6.5 Evaluating the effect of features
The five types of features selected are able to capture the properties of texture, color, and shape. Those features are described in detail by Neogi and have proved to be very important for classifying steel defects. When fewer than four feature types are used, the accuracy of our method drops significantly, barely reaching 70%. Therefore, to evaluate the effect of each feature, we discuss only the meaningful situation in which four of the five feature types are used to classify the steel defects.
As shown in Fig. 11, the accuracy of EFIC classifiers without the Gabor feature is the lowest. This indicates that the Gabor feature is the most suitable for texture representation and discrimination in steel defect classification.
In contrast, the accuracy of EFIC classifiers without GLH is the highest. This demonstrates that the GLH feature is the least suitable for steel defect classification. The main reason for this is that the GLH feature can capture only gray-level information but not texture or shape. The four-feature classifiers without HOG, GLCM, or LBP fall between these two cases.
In conclusion, in our experiments, the absence of any one feature of EFIC classifiers significantly reduced the accuracy. Therefore, by combining these five features, our method can obtain satisfactory accuracy for steel defect classification.
7 Conclusions
Because accuracy decreases in steel surface classification systems when the production line model changes, in this research, we propose an evolutionary method that can be adjusted with a small sample set to fit a changed model and maintain relatively high accuracy. First, to overcome information loss in the process of evolution, we used five kinds of features that cover texture, color, and shape. Second, random subspace SVM classifiers are proposed to overcome the overfitting problem and to suit adjustment. Then, we introduced a naive Bayes machine to fuse the results from the SVM subclassifiers; it suits the adjustment and requires only a small sample set. Finally, we introduced a simple method to adjust the Bayes kernel. The experimental results indicate that the BYEC algorithm adapts better to changed steel surface defect data sets than the other algorithms. Our research suggests that the adaptiveness of the classifier is highly related to the parameter k; with the growth of k, the BYEC classifier shows greater adaptiveness but, unfortunately, some accuracy loss on the original data set. The experimental results also show that the small sample set requirement is fulfilled.
Given the advantages and disadvantages of the BYEC algorithm, in a new production line, we can use the original BYEC algorithm without any labeled samples from the changed model; as the sample set grows, we can adjust the BYEC model to become more adaptive. Once a large sample set is available, a new classifier can be trained to replace the old one, because the BYEC classifier has relatively low accuracy on large sample sets. Our future work will focus on increasing the accuracy on both a large sample set and a changed production model. Meanwhile, more noise-robust methods can be combined with this method to increase the adaptiveness.
1 The NEU surface defect database is the Northeastern University (NEU) surface defect database; download link: http://faculty.neu.edu.cn/yunhyan/NEU_surface_defect_database.html.
We thank Dr. Qiaochuan cheng from the Department of Electronics and Information Engineering, Tongji University, and anonymous reviewers for their useful comments and language editing which have greatly improved the manuscript.
This work is supported by the National Natural Science Foundation of China (NSFC) (60771065, 51378365) and the Foundation of Shanghai Institute of Technology (YJ2017-5).
MX and MJ conceived and designed the study. MX, MJ, LX, and LY performed the experiments. MX and MJ wrote the paper. GL, MX, MJ, LX, and LY reviewed and edited the manuscript. All authors read and approved the manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- N Neogi, DK Mohanta, PK Dutta, Review of vision-based steel surface inspection systems. EURASIP J Image Video Process. 2014(1), 1–19 (2014).
- F Dupont, C Odet, M Cartont, Optimization of the recognition of defects in flat steel products with the cost matrices theory. NDT & E International. 30(1), 3–10 (1997).
- C Unsalan, A Erci, Automated inspection of steel structures. Recent Advances in Mechatronics (Springer-Verlag Ltd, Singapore, 1999).
- S Ghorai, A Mukherjee, M Gangadaran, PK Dutta, Automatic defect detection on hot-rolled flat steel products. IEEE Trans. Instrum. Meas. 62(3), 612–621 (2013). doi:10.1109/TIM.2012.2218677.
- X-Y Wu, K Xu, J-W Xu, in Image and Signal Processing, 2008. CISP’08. Congress on, 4. Application of undecimated wavelet transform to surface defect detection of hot rolled steel plates (IEEE, 2008), pp. 528–532. http://ieeexplore.ieee.org/document/4566708/.
- K Song, Y Yan, A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surface Sci. 285, 858–864 (2013).
- C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Signal Process. Lett. 21(5), 573–576 (2014).
- G Wu, H Kwak, S Jang, K Xu, J Xu, in Automation and Logistics, 2008 ICAL 2008. IEEE International Conference on. Design of online surface inspection system of hot rolled strips (IEEE, 2008), pp. 2291–2295. http://ieeexplore.ieee.org/document/4636548/.
- Y-J Liu, J-Y Kong, X-D Wang, F-Z Jiang, in Advanced Computer Theory and Engineering (ICACTE) 2010 3rd International Conference on, 6. Research on image acquisition of automatic surface vision inspection systems for steel sheet (IEEE, 2010), pp. V6–189. http://ieeexplore.ieee.org/document/5579393/.
- M Muehlemann, Standardizing defect detection for the surface inspection of large web steel (Illumination Technologies Inc, 2000). http://www.il-photonics.com/cdv2/Illumination%20tech-Light%20Sources/white%20papers/surface_inspection.PDF.
- T Ojala, M Pietikäinen, T Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002).
- RM Haralick, K Shanmugam, IH Dinstein, Textural features for image classification. IEEE Trans. Syst. Man Cybern. (6), 610–621 (1973). http://ieeexplore.ieee.org/document/4309314/.
- VN Vapnik, V Vapnik, Statistical learning theory, vol. 1 (Wiley, New York, 1998).
- M Woźniak, M Graña, E Corchado, A survey of multiple classifier systems as hybrid systems. Inf. Fusion. 16, 3–17 (2014).
- JZ Kolter, M Maloof, et al., in Data Mining, 2003. ICDM 2003. Third IEEE International Conference on. Dynamic weighted majority: a new ensemble method for tracking concept drift (IEEE, 2003), pp. 123–130. http://ieeexplore.ieee.org/document/1250911/.
- N Dalal, B Triggs, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 1. Histograms of oriented gradients for human detection (IEEE, 2005), pp. 886–893. http://ieeexplore.ieee.org/document/1467360/.
- JJ Henriksen, 3d surface tracking and approximation using gabor filters (South Denmark University, 2007). https://www.yumpu.com/en/document/view/44234347/3d-surface-tracking-and-approximation-using-gabor-filters-covil.
- UH-G Kreßel, in Advances in kernel methods. Pairwise classification and support vector machines (MIT Press, 1999), pp. 255–268. http://dl.acm.org/citation.cfm?id=299108.
- C-W Hsu, C-J Lin, A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002).
- C Cortes, V Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995).
- M Göksedef, Ş Gündüz-Öğüdücü, Combination of web page recommender systems. Expert Syst. Appl. 37(4), 2911–2922 (2010).
- C Porcel, A Tejeda-Lorente, M Martínez, E Herrera-Viedma, A hybrid recommender system for the selective dissemination of research resources in a technology transfer office. Inf. Sci. 184(1), 1–19 (2012).
- C Cabral, M Silveira, P Figueiredo, Decoding visual brain states from fMRI using an ensemble of classifiers. Pattern Recognit. 45(6), 2064–2074 (2012).
- M Vangelis, L Androutsopoulos, P Georgios, in CEAS 2006 Third Conference on Email and AntiSpam (CEAS 2006). Spam filtering with naive bayes-which naive bayes? (Mountain View, 2006). www2.aueb.gr/users/ion/docs/ceas2006_paper.pdf.
- Y-J Jeon, D-C Choi, JP Yun, C Park, SW Kim, in Control, Automation and Systems (ICCAS) 2011 11th International Conference on. Detection of scratch defects on slab surface (IEEE, 2011), pp. 1274–1278. http://ieeexplore.ieee.org/document/6106307/.
- M Yazdchi, M Yazdi, AG Mahyari, in Digital Image Processing, 2009 International Conference on. Steel surface defect detection using texture segmentation based on multifractal dimension (IEEE, 2009), pp. 346–350. http://ieeexplore.ieee.org/document/5190595/.
- RC Gonzalez, Digital image processing (Pearson Education, India, 2009). http://web.ipac.caltech.edu/staff/fmasci/home/astro_refs/Digital_Image_Processing_3rdEd_truncated.pdf.