1. Introduction
Process automation is a current trend and a main focus of many industries seeking to improve their competitiveness. Labor shortages, increasing labor costs, and the demand for high-quality products are the forces driving this movement. Locating and hiring experienced workers has become a challenging administrative task worldwide. Of the many areas in manufacturing that require an upgrade to cope with these challenges, visual inspection cannot be ignored. It is used to prevent defective or low-quality products from reaching the market and to detect and correct problematic processes in the early processing stages to reduce waste. Visual inspection is a labor-intensive process and constitutes a sizable portion of production expense. These challenges have become even more pressing in recent years.
There are many off-the-shelf vision systems designed to perform vision tasks at an affordable price. These systems have simple built-in software tools that allow an experienced end user to install, configure, and operate them. Most of these tools use common image processing techniques or depend on a human expert to design the relevant features for the desired object recognition or visual inspection tasks. Although hand-crafted features can describe the object of interest well and produce sound accuracy, specific features created by human experts that are good for one class of products may do poorly for others. This manual process often involves redesigning the algorithms or fine-tuning inspection parameters, which requires unique skills and extensive training. Even when this is possible, these added challenges make off-the-shelf systems unsuitable for many vision applications that require sophisticated image classification and visual inspection capabilities.
The key to building a visual inspection system is to construct distinctive features or representations of data from high-dimensional observations such as images and videos. A good set of relevant visual features must be constructed in order to have a good representation of the input data. Methods such as feature selection, feature extraction, and feature construction have been used to obtain high-quality features [1,2,3]. Feature selection is the process of selecting a subset of distinctive features from a large set of features [1]. A subset of features can be generated by gradually adding selective features to an empty set or removing less effective features from the original feature space according to some predetermined criteria. Feature extraction is the process of extracting a set of new features from the original features by applying functional mappings. Feature construction is the process of discovering discriminative information or subtle differences among images. All three types of approaches are developed to improve feature quality and achieve accurate image classification.
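To make the distinction between the first two approaches concrete, the following is a minimal, illustrative Python sketch using scikit-learn; the dataset, feature counts, and estimators are arbitrary choices and are not part of the methods cited above.

```python
# Illustrative only: feature selection vs. feature extraction in scikit-learn.
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)        # 64 raw pixel features per image

# Feature selection: keep a subset of the original features.
selector = SelectKBest(score_func=f_classif, k=16)
X_selected = selector.fit_transform(X, y)  # shape (n_samples, 16)

# Feature extraction: map the original features to a new feature space.
extractor = PCA(n_components=16)
X_extracted = extractor.fit_transform(X)   # 16 new composite features
```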
Many feature learning approaches using evolutionary techniques have been proposed over the past few years. A comprehensive survey of the state-of-the-art work on evolutionary computation for feature selection is presented in Reference [4]. Krawiec and Bhanu [5] used a genetic algorithm to obtain a fixed-length feature vector that is defined as a sequence of image processing operations and feature extraction steps. Wang et al. [6] used particle swarm optimization to perform feature selection and showed that particle swarm optimization is effective for rough set-based feature selection. Sun et al. used a genetic algorithm to select a subset of eigenvectors rather than simply taking the top eigenvectors as in the traditional method; their approach was tested on vehicle and face detection [7]. Sherrah et al. also used a genetic algorithm to determine whether a feature pre-processing step is necessary before classification [8]. A genetic algorithm has also been used to automatically select the optimal features for the classification of remote sensing data [9]; this method reduces the feature dimensionality by exploiting the genetic algorithm. Sian and Alfred proposed an evolutionary-based feature construction approach for relational data summarization [10]. Their method addresses the many-to-one relationships that exist between non-target tables and target tables and improves the accuracy of the summarized data.
None of the aforementioned feature learning approaches aims at constructing or selecting features for general image classification. They mostly perform well on a single dataset for a specific application. We believe that a good feature learning algorithm should be able to automatically generate sufficient high-quality and unique features to perform accurate image classification.
Genetic algorithms [11], as one of the most popular evolutionary computation techniques, are inspired by the process of natural selection. In a genetic algorithm, a population of candidate solutions to a specific optimization problem is evolved toward better solutions. Each candidate solution has a set of properties that can be altered during the evolution process.
Evolutionary learning techniques are a good choice for image classification. The genetic algorithm (GA) is an evolutionary computation technique that automatically evolves solutions based on the idea of survival of the fittest. It is a very flexible heuristic technique that allows various types of feature representations to be used in the learning process. Applying a GA to object recognition offers many advantages, the main one being its flexibility, which enables the technique to be adapted to each particular problem [12]. The flexibility of the GA in representing and evolving a wide range of models makes it a very powerful and convenient method for classification.
Tran et al. used a GA with a multi-tree representation to construct a much smaller number of new features than the original set on high-dimensional classification datasets [13]. A comprehensive survey on the application of GAs to classification is given in Reference [12]. Genetic algorithms have been successfully applied to feature construction [14,15]; in those methods, richer features are constructed by applying operators to primitive features. Genetic algorithms have also been used to construct multiple high-level features through an embedded approach, and the results show that the high-level features constructed by the GA are, in most cases, effective in improving classification performance over the original set of features [16].
In this work, we aimed at developing an efficient evolutionary learning algorithm for embedded visual inspection and mobile computing applications. Compared with more general object recognition tasks, visual inspection applications often have a small number of classes, and it is often inconvenient, if not impossible, to obtain a large number of images for training. Our goal was to find an efficient algorithm that can achieve an acceptable balance between the accuracy and simplicity of the system for embedded visual inspection applications.
This work built upon our previous success in using a genetic algorithm for feature construction for object recognition and classification [17,18]. As a data-driven approach, the ECO-Feature algorithm automatically discovers salient features from a training dataset without the involvement of a human expert. It is a fully automated feature construction method. Experimental results confirmed that it is capable of constructing non-intuitive features that are often overlooked by human experts. This unique capability allows easy adaptation of this technology to various object classes when the differentiation among them is not defined or cannot be well described.
We have improved our algorithm since its first introduction by constructing global features from the whole image, making it more robust to image distortions than the original version that used local features [19]. We then reduced the feature dimensionality to speed up the training process [20]. Constructing global features was proven to increase classification performance [19]. The main weakness of the ECO-Feature algorithm is that it is designed only for binary classification and cannot be directly applied to multi-class cases. We also observed that the recognition performance depends heavily on the size of the feature pool from which features can be selected and on the ability to select the best features. This makes it difficult to determine how many features need to be constructed from the images in order to achieve the best performance.
An enhanced evolutionary learning method for multi-class image classification was developed to address the aforementioned challenges. Among other improvements, boosting was added to select the features and combine them into a strong classifier for multi-class classification. These characteristics allow the proposed algorithm to require fewer features for classification and, hence, make it more efficient. There is no doubt that other classification methods can perform as well as the proposed evolutionary learning algorithm, but most of them require either handcrafted features or significant computational power. We included three experiments in this paper to demonstrate the performance of the proposed algorithm.
Most egg farms use manual labor to grade and inspect eggs. Human visual inspection only detects broken eggs during the process. Very few egg farms use machines to measure the size of an egg, detect defects, or separate fertilized eggs to improve hatching efficiency. To increase revenue and ensure product quality, an automated grading and inspection machine is needed. We created one dataset for the combined experiment of two applications. It included images of fertilized and unfertilized (normal) eggs for the unfertilized egg detection and removal application in the chicken hatching process. It also included images of normal eggs and eggs with dirt or cracks on the shell for the egg quality evaluation application in egg production. These were two completely different applications; we combined them to make a challenging experiment for testing the robustness of the proposed evolutionary learning algorithm.
Just as a human driver adapts his or her driving to road surface conditions, a self-driving car or an advanced driver assistance system (ADAS) must do the same. Without input on the road condition, the system cannot adapt its control of the vehicle to different road conditions. Using the same autonomous driving actions for all road conditions increases the risk of accidents. Automatic evaluation of the road surface condition is considered an important part of an autonomous vehicle driving system [21]. We created two datasets for this experiment. One dataset included images of dry, wet, and sandy roads as well as roads covered with snow, ice, and puddles of water for road condition detection. The other dataset included images of pavement with different levels of damage for pavement quality evaluation.
2. Algorithm
Although quite successful, our original ECO-Feature algorithm was designed only for binary classification and cannot be directly applied to multi-class cases [17,18]. As noted above, its recognition performance also depends heavily on the size of the feature pool from which features can be selected and on the ability to select the best features, which makes it difficult to determine how many features need to be constructed to achieve the best performance. We developed an enhanced evolutionary learning method for multi-class image classification to address these challenges and applied it to four applications.
Our evolutionary learning algorithm uses evolutionary computation to construct a series of image transforms that convert the input raw pixels into high-quality image representations or features. This unique method extracts features that are often overlooked by humans and is robust to minor image distortion and geometric transformations. It uses boosting techniques to automatically construct features from training data without the use of a human expert. It is developed for multi-class image classification.
In this work, we aimed to develop an efficient image classification algorithm using evolutionary strategies for multi-class visual inspection applications on embedded vision systems. We recognize that there are many more sophisticated and powerful methods, such as other machine learning techniques and the popular deep learning approaches, that could be more capable than the proposed evolutionary learning algorithm [22,23]. However, most of them are overkill for our applications: they require expensive computational power, a large number of training images, and long training times, and they classify slowly. Because of their complexity, they can hardly be considered friendly for hardware implementation. All of these requirements are constraints on providing a low-cost embedded vision system for automatic visual inspection, which is the main objective of this research. Our algorithm is able to run at close to 100 frames per second on an ARM processor running Linux.
2.1. Evolutionary Learning
In our algorithm, a phenotype in the genetic algorithm represents an evolutionary image transformation or, as it is often called, an image feature. The genes that make up a phenotype consist of a number of selected basic image transforms and their corresponding parameters. The number of genes that can be modified for each evolutionary transformation varies, since both the number of transforms and the number of parameters for each transform vary.
Each phenotype in the population represents a possible evolutionary image transformation. A fitness score is computed for each phenotype using a fitness function. A portion of the population is then selected to create a new generation. We used a tournament selection method to select phenotypes from the overall population in the genetic algorithm. In order to produce a new transformation, a pair of parent phenotypes is selected to undergo certain evolution operations including crossover and mutation.
The crossover in our method is done by rearranging the image transforms from both parents. By using crossover and mutation, a new phenotype, typically sharing many characteristics of its parents, is created. This process results in the next generation of image transformations that are different from their parent generation. Hence, the new phenotypes are likely to have a different number of image transforms and parameters. This process is repeated for several generations, evolving image features with high fitness. The evolution is terminated when a satisfactory fitness score is reached or the best fitness score remains stable for several iterations.
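To make this loop concrete, the following is a hedged Python sketch of the evolution process described above; it is not the authors' exact implementation. A phenotype is encoded as a variable-length list of (transform, parameters) genes; random_gene() (which samples one transform with random parameters; one possible version is sketched in Section 2.2) and fitness (Section 2.3) are assumed helpers, and the fixed generation count stands in for the stopping criteria described above.

```python
import random

def tournament_select(population, fitness, k=3):
    """Return the fittest of k phenotypes drawn at random."""
    return max(random.sample(population, k), key=fitness)

def crossover(parent_a, parent_b):
    """Splice the parents' transform sequences at random cut points."""
    cut_a = random.randint(1, len(parent_a))
    cut_b = random.randint(0, len(parent_b) - 1)
    return parent_a[:cut_a] + parent_b[cut_b:]

def mutate(phenotype, rate=0.1):
    """Replace individual genes with freshly sampled transforms."""
    return [random_gene() if random.random() < rate else g for g in phenotype]

def evolve(population, fitness, generations=50):
    """Run a fixed number of generations (a simplified stopping criterion)."""
    for _ in range(generations):
        next_gen = []
        while len(next_gen) < len(population):
            a = tournament_select(population, fitness)
            b = tournament_select(population, fitness)
            next_gen.append(mutate(crossover(a, b)))
        population = next_gen
    return max(population, key=fitness)
```

A production version would also clamp the child sequence length to the 2 to 8 range used in Section 2.2 and cache fitness evaluations, since each one trains a classifier.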
Figure 1 shows an overview of the training process of the proposed algorithm.
2.2. Evolutionary Image Transformation
The representation of a phenotype in our genetic algorithm is an important factor influencing the performance of the algorithm. An evolutionary image transformation, or an image feature, is constructed using a series of image transforms that is created by the genetic algorithm. Rather than applying a single image transform to the image, we constructed the transformation by applying a sequence of basic image transforms to the input image. This process is defined in Equation (1):

$$V = T_n(T_{n-1}(\cdots T_2(T_1(I, \theta_1), \theta_2) \cdots, \theta_{n-1}), \theta_n) \tag{1}$$

where $V$ is the evolutionary image transformation output vector, $n$ is the number of transforms an evolutionary image transformation is composed of, $I$ is the input image, $T_i$ is the transform at step $i$, and $\theta_i$ is the corresponding parameter vector of the transform at step $i$.
Based on this definition of the evolutionary image transformation, the output of one transform is the input to the next transform. An evolutionary image transformation is learned through the evolution of this sequence of transforms. All transforms included in our algorithm are basic image transforms that can be found in almost all open-source computer vision libraries. Twenty-four transforms were included in our original version [17,18]. In this improved version, we focused on efficiency and included only those that are filter-based and hardware friendly. The six transforms chosen for our genetic algorithm were Gabor (two wavelengths and four orientations), Gaussian (kernel size), Laplacian (kernel size), Median Blur (kernel size), Sobel (depth, kernel size, x order, y order), and Gradient (kernel size), with their parameters in parentheses. We chose these six transforms for their simplicity and their noise removal, edge detection, and texture and shape analysis characteristics. They were not chosen for any specific application. This set of transforms can be extended to include almost any image transform, but we are mostly interested in transforms that extract important information from images and are simple and efficient. Using these image transforms in our method provides the system with the basic tools for feature extraction and helps speed up the evolutionary learning process.
For each transformation sequence during the evolution, its length, or the number of transforms that make up the sequence, is not fixed. Rather, it is randomly selected by the genetic algorithm. During training, for example, our algorithm tries out different kernel sizes for the Gaussian (variance depends on the kernel size), Median (blurring level depends on the kernel size), Laplacian (blurring and sharpening level depends on the kernel size), and Gradient (gradient depends on the kernel size) filters and automatically selects the sizes that are the most discriminant for classification. The other two transforms (Gabor and Sobel) have six and four parameters, respectively, which are also tried and selected automatically by the algorithm. The type and order of the transforms in a transformation sequence and their parameters are determined by the evolution process. As a result, any one transform from our list can be used more than once in a sequence, but with different parameters.
The number of transforms, n in Equation (1), used to create an initial evolutionary transformation varies from 2 to 8. This range allows the algorithm to yield good results while keeping the computation efficient and simple.
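As an illustration of how such a phenotype might be initialized and executed, the sketch below composes the six transforms with OpenCV. The exact calls and parameter ranges are assumptions rather than the paper's implementation; for example, Gradient is interpreted here as a morphological gradient, and the Gabor wavelengths are arbitrary.

```python
import random
import cv2
import numpy as np

ODD_KERNELS = [3, 5, 7]  # odd sizes, as required by the OpenCV filters below

def random_gene():
    """Sample one transform with random parameters (illustrative ranges)."""
    name = random.choice(["gabor", "gaussian", "laplacian",
                          "median", "sobel", "gradient"])
    k = random.choice(ODD_KERNELS)
    if name == "gabor":
        return (name, {"wavelength": random.choice([4.0, 8.0]),
                       "theta": random.choice([0, np.pi/4, np.pi/2, 3*np.pi/4])})
    if name == "sobel":
        dx, dy = random.choice([(1, 0), (0, 1), (1, 1)])
        return (name, {"ksize": k, "dx": dx, "dy": dy})
    return (name, {"ksize": k})

def apply_gene(img, gene):
    """Apply one basic transform to a single-channel uint8 image."""
    name, p = gene
    if name == "gabor":
        kern = cv2.getGaborKernel((21, 21), 4.0, p["theta"],
                                  p["wavelength"], 0.5)
        return cv2.filter2D(img, -1, kern)
    if name == "gaussian":
        return cv2.GaussianBlur(img, (p["ksize"], p["ksize"]), 0)
    if name == "laplacian":
        return cv2.Laplacian(img, cv2.CV_8U, ksize=p["ksize"])
    if name == "median":
        return cv2.medianBlur(img, p["ksize"])
    if name == "sobel":
        return cv2.Sobel(img, cv2.CV_8U, p["dx"], p["dy"], ksize=p["ksize"])
    # "gradient" interpreted as a morphological gradient (an assumption)
    se = np.ones((p["ksize"], p["ksize"]), np.uint8)
    return cv2.morphologyEx(img, cv2.MORPH_GRADIENT, se)

def random_phenotype():
    """Initial sequence of 2 to 8 transforms, per Equation (1)."""
    return [random_gene() for _ in range(random.randint(2, 8))]

def transform(img, phenotype):
    """Feed each transform's output into the next; return feature vector V."""
    for gene in phenotype:
        img = apply_gene(img, gene)
    return img.flatten()
```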
2.3. Fitness Score Calculation
As described above, a candidate image transformation or feature is constructed using a sequence of selected image transforms with their selected parameter values. Its performance must be evaluated to determine if it is indeed a good and discriminative feature for classification. A fitness function, which is designed specifically for classification, is used to compute a fitness score to evaluate how good the image transformation is during the evolution process.
We calculated a fitness score for each evolutionary image transformation based on its classification performance using a simple classifier. The classifier is trained for each image transformation using the training images and the classification performance is computed with the validation images. The fitness score is an indication of how well this classifier performs on the images.
The fitness score for each evolutionary image transformation is computed using Equation (2). It is defined based on the classification accuracy of the classifier that is associated with each constructed evolutionary image transformation. It reflects how well the classifier classifies images. Classification accuracy is a fairly generic measure for optimization; quality measures other than accuracy can be used depending on the application.
Because the evolution requires iterating over several generations to produce satisfactory results, this classification-based fitness function must be efficiently computable. It should also be a multi-class classifier that can be easily applied to many multi-class image classification cases. The random forest classifier was chosen to evaluate the performance of each image transformation in our method. It is ranked as one of the top classifiers among the 179 classifiers that have been evaluated in extensive experiments [24]; the authors claim that the best results are achieved by the random forest classifier. Random forest shows the advantage of high computational efficiency over other popular classifiers such as the SVM [25]. Random forest is regarded as a multi-class classifier that is robust against noise, has high discrimination performance, and is capable of training and classifying at high speed [26].
Random forest is an ensemble learning algorithm that constructs multiple decision trees. It works by randomly selecting training samples for tree construction, resulting in a classifier that is robust against noise. The random forest classifier has been shown to be efficient with good generalization performance due to the randomness in its training [25]. Also, the random selection of features to be used at each node enables fast training, even when the dimensionality of the feature vector is high [24].
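A possible shape of this fitness evaluation, reusing the transform() helper from the previous sketch, is shown below. The tree count is an arbitrary choice, the fitness is taken to be plain validation accuracy as described for Equation (2), and all images are assumed to share the same dimensions so the feature vectors have equal length.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def fitness_score(phenotype, train_imgs, train_labels, val_imgs, val_labels):
    """Train a random forest on transformed training images and score it
    on the validation split; the accuracy serves as the fitness."""
    X_train = [transform(img, phenotype) for img in train_imgs]
    X_val = [transform(img, phenotype) for img in val_imgs]
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X_train, train_labels)
    return accuracy_score(val_labels, clf.predict(X_val))
```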
2.4. Selection and Combination of Learned Features
The genetic algorithm simulates the survival of the fittest among individuals over generations, and one best evolutionary image transformation is obtained at the end of each simulated evolution run. Though a single evolutionary image transformation usually has high discrimination ability in image classification, using multiple transformations can often increase performance. It is difficult to determine how many learned transformations are needed for a specific application, and due to the randomness of our method, a large pool of learned transformations may be required to maintain stable performance. For this reason, boosting is employed in our framework to maintain high classification performance even with a small number of learned image transformations, because the sequential training in boosting constructs complementary classifiers from the training examples.
AdaBoost, a popular boosting algorithm proposed by Freund and Schapire [27], is used to combine a number of weak classifiers into a stronger classifier. Various multi-class boosting algorithms, such as AdaBoost.M1 and AdaBoost.MH, have been proposed [28,29]. They both deal with multi-class classification problems through combinations of binary classifications. Reference [30] proposed a novel multi-class boosting algorithm referred to as Stagewise Additive Modeling using a Multi-class Exponential loss function (SAMME). The SAMME algorithm directly extends the AdaBoost algorithm to multi-class classification rather than reducing it to two-class classifications. It is the natural extension of the AdaBoost algorithm to the multi-class case, and its performance is comparable with that of AdaBoost.MH and sometimes slightly better [30].
The training process of the SAMME classifier in our method involves reweighting the training examples within each iteration, right after an evolutionary image feature is learned from the raw pixel images. SAMME iteratively builds an ensemble of multi-class classifiers and adjusts the weight of each training example based on the performance of the weak classifier associated with the learned image feature in the current iteration. Examples that are misclassified have their weights increased, while those that are correctly classified have their weights decreased. Therefore, in subsequent iterations, the resulting classifiers are more likely to correctly classify examples that were misclassified in the current iteration.
In particular, given a training dataset $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the input signal and $y_i \in \{1, \dots, K\}$ is the corresponding class label, SAMME sequentially trains $M$ weak classifiers with the training dataset and calculates a weight $\alpha_m$ for the $m$th classifier $h_m$ based on the following equation:

$$\alpha_m = \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m} + \log (K - 1) \tag{3}$$

where $\mathrm{err}_m$ is the weighted error calculated for the weak classifier $h_m$.
The SAMME algorithm is very similar to AdaBoost but with a major difference in Equation (3). For $\alpha_m$ to be positive, we only need the accuracy of each weak classifier to be better than the random-guessing rate $1/K$, which depends on the number of classes $K$. In comparison, AdaBoost stops once the error of a weak classifier exceeds 50%; thus, AdaBoost requires the error of each weak classifier to be less than 50%, a requirement that is much harder to satisfy than $1/K$ in SAMME. For the SAMME algorithm, because of the extra $\log(K-1)$ term in Equation (3), $\alpha_m$ can still be positive even when the error of a weak classifier exceeds 50%.
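A compact sketch of this weight update, following the SAMME formulation in [30], is given below; the variable names are illustrative, and the guard against a zero error is omitted for brevity.

```python
import numpy as np

def samme_update(w, y_true, y_pred, K):
    """One SAMME iteration: weighted error, classifier weight (Equation (3)),
    and reweighting of the training examples."""
    miss = (y_pred != y_true)                           # misclassified examples
    err = np.sum(w * miss) / np.sum(w)                  # weighted error err_m
    alpha = np.log((1.0 - err) / err) + np.log(K - 1.0) # Equation (3)
    w = w * np.exp(alpha * miss)                        # boost missed examples
    return w / np.sum(w), alpha                         # renormalized weights
```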
After training, the resulting SAMME model consists of a list of weak classifiers and coefficients $\alpha_m$ that indicate how much to trust each weak classifier in the model. The outputs of all weak classifiers are combined into a weighted sum that represents the final output of the boosted classifier:

$$C(x) = \arg\max_k \sum_{m=1}^{M} \alpha_m \cdot \mathbb{1}\left(h_m(x) = k\right) \tag{4}$$

where $M$ is the number of weak classifiers in the SAMME model, $h_m$ is the $m$th weak classifier, $\alpha_m$ is the weight for classifier $h_m$, and $k$ is the classification label for each class.
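A minimal sketch of this weighted vote is given below; it assumes integer class labels 0 to K−1 and weak classifiers that return a predicted label.

```python
import numpy as np

def samme_predict(x, classifiers, alphas, K):
    """Weighted vote of Equation (4): each weak classifier h_m adds its
    weight alpha_m to the class it predicts; return the argmax class."""
    votes = np.zeros(K)
    for h, alpha in zip(classifiers, alphas):
        votes[h(x)] += alpha
    return int(np.argmax(votes))
```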
4. Conclusions
The aim of this work was to develop an efficient evolutionary algorithm specifically for embedded vision applications for visual inspection. The goal was to trade a slight loss in accuracy for processing efficiency and real-time performance. The proposed algorithm is not only capable of automatically learning global information from training images but also of improving performance through the use of boosting techniques. The main advantage of the proposed method is its simplicity and computational efficiency, which make it suitable for real-time embedded vision applications. We used 375, 104, 175, and 374 images for good, cracked, dirty, and fertilized eggs, respectively, and 150 images for each class in both the road condition and pavement quality datasets for training. These are reasonably small numbers of images for data collection and training. The training time for our three example applications was 10 to 30 min on a PC, depending on the number of iterations needed to reach steady-state performance. The processing speed was approximately 100 frames per second on an ARM processor running Linux, demonstrating the algorithm's efficiency.
This algorithm accurately separated fertilized and unfertilized eggs and detected two common eggshell defects (i.e., dirt and cracks). It distinguished six road surface conditions: dry, wet, snow, ice, puddles of water, and sand. It also recognized different road pavement qualities to detect cracks and potholes. For the three datasets created for the experiments, it obtained almost perfect classification performance, comparable to the performance obtained using other more sophisticated classification methods.
We recognize that these three sample applications might be viewed as straightforward cases and many other approaches could work just as well as the proposed method. As emphasized previously, our aim was to develop an efficient (fast and requiring very few training images) and easy-to-configure method that is versatile and does not depend on handcrafted features or manual tuning of parameters. These three applications clearly demonstrate the versatility of the proposed evolutionary learning algorithm and prove its suitability for real-time visual inspection applications that need to classify a small number of classes.
A hardware-friendly version of the algorithm was developed [32]. Our hardware simulation shows that the modifications and optimizations made to the algorithm for hardware implementation did not affect its performance. The hardware-friendly version was implemented in a field-programmable gate array (FPGA) [32]. Our VHDL design can be used to build application-specific integrated circuits (ASICs) for applications that require a compact, low-cost, and low-power vision sensor.