1. Introduction
The waste generation rate is reported to have increased over the last couple of decades, mainly because of economic development and urbanization [1,2]. Increased waste volumes are causing problems for governments in managing and processing waste efficiently [3,4]. Although developed countries have proper waste classification systems in place (i.e., red, green, yellow), most waste still ends up either landfilled or incinerated, mainly because of contamination (Ziouzios et al. [5] suggest that 75% of recyclable municipal waste is wasted). Therefore, it is of significant importance for any country to improve its waste recycling and waste management mechanisms. Both of the existing waste management techniques, landfilling and incineration, pose serious environmental and health threats to the community [3,6,7,8].
In the context of sustainable development, efficient waste management is one of the key agendas that directly influences the Sustainable Development Goals (SDGs) [9,10]. However, despite its importance to global sustainability, waste management has been given lower priority than other factors such as water and energy. Specifically, in the Australian context, limited resources are allocated for waste management (e.g., 250 million AUD were allocated for the waste recycling and policy action plan [9]). The Chinese waste ban in 2018 and the Council of Australian Governments (COAG) export ban in 2020 have caused a national waste crisis. At present, local governments are individually responsible for the management of waste (i.e., collection, disposal, recycling).
At the scale of local waste management, contamination in household waste is one of the highlighted challenges that significantly impacts the waste recycling process [11]. As a standard within Australia, a 6% to 10% contamination rate is regarded as acceptable; however, in recent times, average contamination rates of around 15% have been reported, far above the recycling waste import threshold of only 0.5% imposed by China. Educating the community through various activities, workshops and webinars is one of the commonly suggested approaches to reducing household contamination. However, such an initiative may only be successful if accurate and widespread data is shared with the community to motivate them [12,13].
At the local government scale, bin-tagging or waste auditing is the adopted approach for collecting waste contamination-related data and reporting the contamination to the corresponding customers [9]. However, bin-tagging is done mainly by the waste collection truck driver, who manually inspects camera footage of the waste truck hopper [9,14]. Remondis is the leading waste management organization within the Illawarra, New South Wales (NSW), Australia, and has camera-based systems installed to facilitate the manual bin-tagging process. However, manual bin-tagging is a labor-intensive process that also impacts the driving capabilities of waste collection truck drivers. Manual data collection involves the subjective visual observations of the driver, which may result in high data variance requiring further analysis and time resources. Therefore, there is a dire need for a unified automated waste contamination detection system using state-of-the-art technologies for efficient and sustainable waste management. In this context, detection of plastic-bag contamination, one of the most common forms of contamination in household waste, is considered the first step in developing an automated system.
Artificial Intelligence (AI), edge-computing, Artificial Intelligence of Things (AIoT), computer vision and the Internet of Things (IoT) are disruptive technologies that have achieved considerable success in dealing with complex real-world problems [15,16,17,18,19]. In the context of waste management, various studies have addressed waste detection and classification [5,20,21,22,23,24,25,26,27,28,29]; however, there is still a gap in the development of a practical solution. This paper presents an edge-computing video analytics solution for automated plastic-bag contamination detection, to be used in waste collection trucks for efficient contamination detection. The proposed solution implements state-of-the-art object detection algorithms to detect plastic-bag contamination in household waste. Multiple variants of the Faster R-CNN and You Only Look Once version 4 (YOLOv4) models were trained for plastic-bag contamination detection. For training the computer vision models, a real utility-oriented dataset (i.e., the Remondis Contamination Dataset (RCD)) was developed from the manually tagged records of the Remondis collection trucks. The following are the anticipated contributions of the presented research:
- 1. Development of a challenging utility-oriented waste contamination dataset (i.e., RCD) from the Remondis manual bin-tagging historical records, annotated with plastic-bag contamination bounding boxes (bboxes);
- 2. Development, validation, and analysis of a practical edge-computing solution for automated plastic-bag contamination detection in waste collection trucks.
The rest of the article is organized as follows. Section 2 presents a review of the most relevant benchmark literature on the use of computer vision technologies for waste detection and classification. Section 3 provides details about the dataset used for training and validating the computer vision models. Section 4 presents details about the proposed automated plastic-bag contamination detection system, including the software and hardware components. Section 5 provides information about the experimental protocols and evaluation measures. Section 6 details the software and hardware evaluation results for the proposed system, mainly for the computer vision models. Section 7 discusses the results and highlights the potential challenges of the problem. Section 8 presents information about the field data collection and retraining of the model for improved performance, an essential step from an enterprise solution development perspective to ensure admissible field performance. Section 9 provides a detailed cost analysis for the proposed plastic-bag contamination detection system. Finally, Section 10 concludes the study by highlighting the important insights and listing potential future research directions.
2. Related Work
This section presents a review of the benchmark literature regarding waste detection and classification using computer vision and edge-computing technologies. The review is organized chronologically to highlight the advancements made over time in the domain of waste detection.
Rad et al. [20], in 2017, proposed a computer vision-based litter localization and classification system using the OverFeatGoogleNet model. A custom-collected dataset of around 4000 images was used to train the computer vision models, and the proposed approach achieved a detection precision of 63%. The detected litter objects (e.g., leaves, cigarette butts) are not directly related to waste contamination; however, detecting small litter objects in an image makes it a relevant problem from a computer vision perspective. Ibrahim et al. [21], in 2019, developed a comprehensive waste-contamination dataset (i.e., ContamiNet) for detecting contamination in solid waste. The dataset consists of 30,000 images from multiple sources in which contamination was identified within the waste. A CNN model was trained and compared against manual labeling; the trained CNN model achieved an AUC of 0.86, compared to the manual AUC of 0.88.
Kumar et al. [22], in 2020, proposed the use of a computer vision object detection model (i.e., YOLOv3) for efficient waste classification. A custom-developed dataset of approximately 8000 images of waste from six different classes was used to train the object detection model. Most of the images in the dataset contained a single object belonging to one class; however, a few test images were also captured from the real world, with multiple objects belonging to multiple classes. From the experimental investigations, an mAP of 95% was achieved by the YOLOv3 model. Later in the same year, Li et al. [23] proposed a YOLOv3-based computer vision solution to detect water-surface garbage. A custom-developed dataset of 1200 images was used to train the waste detection model. From the experimental analysis, the proposed YOLOv3 model achieved an mAP of 91% across three garbage classes (i.e., bottle, plastic, Styrofoam). Although high detection performance was reported, the dataset used for training was not challenging enough and involved very little background noise, because the presence of water made the waste objects distinct for the detector.
Panwar et al. [24], in 2020, proposed a dataset called AquaVision to facilitate the use of deep transfer learning for detecting waste objects in water. The dataset comprised 369 images annotated for four waste categories (i.e., glass, metal, paper, plastic). The RetinaNet model was implemented to detect the waste objects in images and reported an mAP of 81%. Although the implemented model performed well, the dataset was very limited and the reported performance cannot be considered generalized. The images in the dataset were of good resolution, with distinct waste objects and, in most cases, no noise (i.e., only the waste objects were present in the image). The visual similarity between paper and plastic bags is one of the challenges to address in this case. White et al. [25], in 2020, developed a novel CNN model, referred to as WasteNet, for classifying waste objects in the context of smart bins. The proposed model was based on the transfer-learned VGG16 architecture and was trained using the TrashNet dataset, consisting of 2500 images across six different trash classes. From the results, the proposed WasteNet model achieved 97% prediction accuracy. Although high performance was reported, the model was not compared to other literature using the TrashNet dataset. Further, the dataset was not complex: each image contained a single-class object without any noise, making it a simpler problem for a CNN-based classifier.
Kraft et al. [26], in 2021, developed an edge-computing solution for unmanned aerial vehicles to detect trash from low altitudes. An NVIDIA Jetson Xavier NX edge-computer running an object detection model was used to detect small trash objects from the air. The computer vision models were trained using the UAVVaste dataset, consisting of 774 images with 3716 bbox annotations of trash. YOLOv4, EfficientDet and Single Shot Detector (SSD) object detection models were trained and compared for their performance; the YOLOv4 model achieved an mAP@50 of 78%. Patel et al. [27], in 2021, used multiple computer vision object detection models to detect garbage. The dataset consisted of 544 images with bbox annotations of garbage material. EfficientDet, RetinaNet, CenterNet and YOLOv5 models were trained and their performance compared; from the experimental analysis, the YOLOv5 model achieved an mAP of 61%. The dataset used was very limited and the reported performance cannot be considered generalized; however, the images in the dataset came from challenging real-world scenarios.
Chazhoor et al. [28], in 2022, performed a comprehensive benchmark study on classifying plastic waste using transfer-learned CNN models. The WaDaBa dataset, consisting of around 4000 images from seven different plastic waste classes, was used to train the CNN models (i.e., AlexNet, ResNet50, ResNeXt, MobileNetv2, DenseNet, SqueezeNet). From the experimental analysis, ResNeXt was reported to perform best, with an AUC of 94.8%. The high classification performance reported for the transfer-learned CNN models may be attributed to the noise-free dataset. Furthermore, the results were not discussed in relation to the literature in which the WaDaBa dataset had already been used. Radzi et al. [29], in 2022, proposed the use of CNN classification models (e.g., ResNet50) to classify plastic waste images into seven classes (i.e., PET, HDPE, PVC, LDPE, PP, PS, others). A custom dataset of 2110 images was developed and manually annotated for the seven plastic-type classes. From the results, the ResNet50 model achieved a classification accuracy of 94%. Although the implemented model achieved high accuracy, the dataset was very simple, consisting of cropped images of individual plastic objects (i.e., only one object class per image), which is not the case in most practical applications, where such models are prone to fail drastically. Most recently, Ziouzios et al. [5] developed a real-time waste-detection and classification system for efficient solid waste management. The dataset used to train the models consisted of 1500 images from the TACO dataset and 2500 images from a local waste-treatment agency, belonging to four waste classes (i.e., plastic, glass, aluminum, other). A YOLOv4 model with CSPDarkNet backbone was trained and reported to achieve an mAP of 92%. Although the reported accuracy is towards the higher end, the images from the TACO dataset are not very challenging, which is anticipated to be the reason for the higher accuracy.
In summary of the literature review (see Table 1), the waste detection problem has been addressed either as an image classification problem or as an object detection problem. However, the object detection approach to detecting waste objects is more suitable for real-world scenarios. OverFeatGoogleNet, CNN, WasteNet, AlexNet, ResNeXt, ResNet50, DenseNet, SqueezeNet and MobileNet are the highlighted image classification models used in the literature, while YOLOv3, YOLOv4, YOLOv5, RetinaNet, EfficientDet, CenterNet and SSD are the highlighted object detection models. In most cases, the datasets were either not comprehensive or not challenging enough (i.e., a single object per image with no background noise). This critical analysis clearly suggests the need to develop a practical solution, trained on challenging real-world data, for identifying contamination within solid waste.
3. Remondis Contamination Dataset (RCD)
The Remondis Contamination Dataset (RCD) used for the development of the computer vision models (i.e., training, testing) was established from the historical records of Remondis, in which drivers had manually labeled images as contaminated. All the images are stored in JPEG format at 72 pixels-per-inch resolution, with an RGB color scheme. The images are taken by the camera installed on the waste collection truck, pointing towards the truck hopper where waste is emptied from the bins before being moved to the main compartment. A portion of the images were also captured by a camera pointing towards the bins. The images in the dataset are diverse, covering at least three different camera zoom levels and different angles depending on the camera setup on each truck, and they present challenging blur noise. The dataset contains various waste contaminants, including plastic bags, plastic bottles and food waste. The RCD is a novel dataset, presented for the first time in this manuscript, and can serve as a benchmark for practical waste segregation tasks, including detection of different waste contaminants, characterization of waste contents and counting occurrences of a given waste content. The main differences between the existing waste contamination datasets and the RCD are the actual real-world visuals and the presence of contamination alongside non-contaminated waste. For the presented research, the raw dataset was labeled to detect plastic-bag contamination only.
In terms of plastic waste contamination, the dataset is highly challenging, mainly because of visual similarities between some types of plastic bags and non-contaminants. For example, a white plastic bag often resembles white paper, black plastic bags often resemble dark portions of the image, and packaging material often resembles the reflective surface of the truck hopper. Some clear plastic-bag candidates include colored bags (blue, yellow, purple), Coles bags and Woolies bags. As a labeling schema, six types of plastic-bag candidates were annotated for bounding box detection: Coles bags, Woolies bags, colored bags, white bags, black bags and packaging material. Annotations were made using the labelImg tool and saved in .xml format, then converted to KITTI format for training purposes (see Figure 1).
The plastic-bag contamination detection dataset was generated and curated following a number of standard steps. As a first step, the raw images captured by the cameras installed on the waste collection trucks were acquired from the Remondis repository. These raw images were then sorted manually to select training candidates that included visible plastic-bag contamination. The sorted images were then annotated with plastic-bag bounding boxes using the defined labeling criteria. The final annotated dataset was converted to KITTI format and split into training and validation subsets. The validation subset consisted of images withheld from training (i.e., unseen by the model) and was used for the performance evaluation of the trained computer vision models. The final dataset consisted of 1125 samples (i.e., 968 for training, 157 for validation) with a total of 1851 bbox annotations (i.e., 1588 for training, 263 for validation).
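To make the annotation-conversion step concrete, the sketch below converts labelImg's Pascal VOC .xml files into per-image KITTI label files of the kind consumed by the TAO toolkit. It is a minimal illustration under assumed defaults: the directory names are placeholders, and the class string (e.g., plastic_bag) is hypothetical and must match whatever label names were used during annotation.

```python
# Minimal VOC-XML -> KITTI conversion sketch (directory names are illustrative).
import glob
import os
import xml.etree.ElementTree as ET

VOC_DIR = "annotations_xml"   # labelImg Pascal VOC output (assumed location)
KITTI_DIR = "labels_kitti"    # one .txt per image, as expected for KITTI training
os.makedirs(KITTI_DIR, exist_ok=True)

for xml_path in glob.glob(os.path.join(VOC_DIR, "*.xml")):
    root = ET.parse(xml_path).getroot()
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name")              # e.g., "plastic_bag" (assumed label)
        box = obj.find("bndbox")
        x1, y1 = box.findtext("xmin"), box.findtext("ymin")
        x2, y2 = box.findtext("xmax"), box.findtext("ymax")
        # KITTI 2D line: class, truncation, occlusion, alpha, bbox corners,
        # then the 3D fields zeroed out since only 2D boxes are labeled.
        lines.append(f"{name} 0.00 0 0.00 {x1} {y1} {x2} {y2} "
                     "0.00 0.00 0.00 0.00 0.00 0.00 0.00")
    stem = os.path.splitext(os.path.basename(xml_path))[0]
    with open(os.path.join(KITTI_DIR, stem + ".txt"), "w") as f:
        f.write("\n".join(lines))
```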
5. Experimental Protocols and Evaluation Measures
A standard three-stage data-driven research approach was used for the development of the automated plastic-bag contamination detection system (see Figure 6). The first stage is the data preparation stage, where raw images collected from the Remondis records were sorted, filtered and processed. At this stage, images were also annotated for plastic-bag bboxes using the LabelImg [32] annotation tool, and the labels were converted to KITTI format to meet the requirements of the training platform. The second stage is the model training phase, where the computer vision models were first selected with the literature as reference (i.e., Faster R-CNN, YOLOv4) and the training hyperparameters were decided. The NVIDIA TAO toolkit was used to train the selected models, and training performance was assessed using the training loss, validation loss and validation mAP values to ensure that training followed the standard patterns. The final stage is the testing and validation stage, where the trained models were tested and evaluated using multiple software and hardware performance metrics. Furthermore, a detailed cost analysis was also carried out at this stage to demonstrate usability for real-world application.
All the computer vision object detection models used in this research were trained using the NVIDIA TAO toolkit with TensorFlow and Python at the back-end. An NVIDIA A100 Graphics Processing Unit (GPU)-powered Linux machine was used to train the models. A data split of 80:20 was used for training and validation, respectively. The Faster R-CNN model was trained using three different backbones (i.e., DarkNet53, ResNet50, MobileNet), while the YOLOv4 model was trained using two different backbones (i.e., CSPDarkNet53, CSPDarkNet_tiny). All the models were initially trained with a batch size of 1 for 200 epochs, then pruned (i.e., a pruning threshold of 0.2 for the Faster R-CNN models and 0.1 for the YOLOv4 models) and retrained for 100 more epochs. Pruning is a commonly adopted approach in neural networks in which unnecessary connections between neurons are removed to reduce model complexity/size without impacting the overall model integrity. This results in better memory usage, shorter training times and faster inference. However, the pruning threshold should be selected carefully, since higher thresholds tend to reduce prediction accuracy: a pruned model may lose accuracy because some important weights may have been removed during the pruning process. Therefore, it is recommended to retrain the model after pruning to recover accuracy. For the Faster R-CNN models, the Stochastic Gradient Descent (SGD) optimizer was used with 0.9 momentum and a base learning rate of 0.02 with L2 regularization. Multiple data augmentation techniques, including scaling, contrast change and image flipping, were incorporated into the training. For the YOLOv4 models, the Adaptive Moment Estimation (Adam) optimizer was used with L1 regularization and a base learning rate of . Image flip, color variation and jitter data augmentation approaches were used during the training.
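Since all later comparisons are reported as mAP, the following sketch shows the IoU-based matching rule that underlies TP/FP/FN counting at the conventional 0.5 threshold (mAP@50). This is a generic illustration of the metric, not the TAO toolkit's own evaluator, and the detection structure (a dict with "box" and "score" keys) is assumed for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match(detections, ground_truths, thr=0.5):
    """Greedy TP/FP/FN counting: each ground-truth box may match one detection."""
    matched, tp, fp = set(), 0, 0
    for det in sorted(detections, key=lambda d: -d["score"]):
        # Find the unmatched ground-truth box with the highest IoU >= thr.
        best, best_iou = None, thr
        for i, gt in enumerate(ground_truths):
            overlap = iou(det["box"], gt)
            if i not in matched and overlap >= best_iou:
                best, best_iou = i, overlap
        if best is None:
            fp += 1                   # detection with no matching ground truth
        else:
            matched.add(best)
            tp += 1
    fn = len(ground_truths) - len(matched)  # ground truths never detected
    return tp, fp, fn
```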
7. Discussion of the Results
Results presented in Section 6 show that computer vision object detection models have considerable potential for automating the detection of plastic-bag contamination in waste collection trucks. The hardware testing results further provided evidence that such models are practical to deploy in real-world scenarios. Overall, the YOLOv4 model with CSPDarkNet_tiny backbone emerged as the most balanced model in terms of accuracy (i.e., 63%), speed (i.e., 24.8 FPS on the Jetson TX2) and power consumption (i.e., 10.68 watts on the TX2). The Faster R-CNN model with MobileNet backbone and the YOLOv4 model with CSPDarkNet backbone were identified as potential second and third choices, respectively, for deployment on the TX2 edge-computer.
Figure 12 and Figure 13 show true and false detections, respectively, for the YOLOv4 model with CSPDarkNet_tiny backbone. In Figure 12, it can be observed that the model accurately detected the plastic bags in the image; although the predicted bboxes did not exactly match the ground truths, they captured most of each plastic bag.
In terms of false detections (see Figure 13), three examples are included: first, where the model failed to detect any plastic bag in the image; second, where the model wrongly classified other objects as plastic bags; and third, where the model failed partially, detecting only a few of the many plastic bags present. One reason for the misdetections may be the noise and visually similar objects within the dataset. However, it is expected that, with more training images and a few iterations of retraining, the model will reach a level of accuracy acceptable for real-world application. The existing model has been deployed on actual waste trucks as a pilot project to test the functionality of the hardware and to collect more images for fine-tuning the object detection model. Highlighted challenges of the dataset identified from the analysis include the low pixel resolution of the images (i.e., low level of visual detail), the presence of noise (i.e., light reflections, glare, low lighting) and the visual similarity of plastic bags to other objects in the image (e.g., white bags to white boxes and white paper, black plastic bags to dark portions of the scene, packaging material to shiny reflective surfaces).
8. Field Data Collection and Model Retraining
The developed edge-computing hardware was deployed in the field on three waste trucks with the aim of validating the functionality of the developed solution and collecting more data. The DeepStream application was configured to save the image and the corresponding labels in KITTI format for each detection to an external USB drive. The idea behind this activity was to monitor the performance of the deployed model and to retrain the model using the collected data. In total, 2325 images were extracted from the field deployment; of these, 314 images were set aside for testing, while 2011 images were used for retraining the model. In addition to the images collected from the field, a set of images was also extracted from open source videos captured by waste collection trucks. In total, 2224 images were extracted from the video source and used for retraining the model. All the images were annotated with plastic-bag bounding box instances.
The YOLOv4 model with CSPDarkNet_tiny backbone (i.e., the best-performing base model reported in Section 7) was retrained with the additional images collected from the field and extracted from the open source videos. In total, an additional 4235 images were used alongside the original 968 images to retrain the model towards improved performance. The same experimental protocols described in Section 5 were adopted for the retraining. From the retraining results, an improved performance of 73% mAP was achieved for YOLOv4 with CSPDarkNet_tiny backbone. In addition to the training performance, to better gauge the improvement of the retrained model, both the base and retrained models were subjected to an unseen test dataset of 314 images collected from the field. The performance was compared in terms of mAP, True Positives (TP), False Positives (FP) and False Negatives (FN).
Table 7 summarizes the field testing results for the base and retrained models. From the results, it can be observed that the retrained model achieved an mAP of 69%, compared to 58% for the base model (i.e., an improvement of 11 percentage points). Furthermore, the number of FPs was reduced to 112 for the retrained model from 176 for the base model (i.e., a reduction of 36.4% in FPs). The FNs also decreased, by 8.29%, for the retrained model, and the TPs increased by 6.21%. The improved performance of the retrained model suggests that a few more retraining iterations in the future, using data collected from the field, will further improve the performance of the computer vision model for plastic-bag contamination detection.
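As a quick arithmetic cross-check of the comparison above, using only the figures quoted in the text (mAP 58% → 69%, FPs 176 → 112), the snippet below makes explicit that the mAP improvement is measured in percentage points while the FP reduction is a relative change:

```python
# Sanity check of the base-vs-retrained figures quoted above.
base_map, retrained_map = 0.58, 0.69
print(f"mAP gain: {100 * (retrained_map - base_map):.0f} percentage points")  # -> 11

base_fp, retrained_fp = 176, 112
print(f"FP reduction: {100 * (base_fp - retrained_fp) / base_fp:.1f}%")       # -> 36.4%
```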
9. Cost Analysis
Cost analysis for the developed edge-computing solution for plastic-bag contamination detection is presented in Table 8 to inform stakeholders and to define a baseline for deploying similar solutions in other geographical locations. The presented cost analysis is for the developed prototype, built on R&D principles, and could be reduced by at least a factor of three once an optimized version of the product is produced at scale. Overall, the costs are divided into non-recurring costs (i.e., hardware cost, software cost, services cost) and recurring costs (i.e., software maintenance cost, hardware maintenance cost, operational cost). Non-recurring costs are estimated at $22,245 (i.e., a hardware cost of $2245, a software development cost of $15,000 and an installation cost of $5000) and are spent one time. Recurring costs are estimated at $15,225 per year (i.e., a software maintenance cost of $10,000, a hardware maintenance cost of $225 and an operational cost of $5000).
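The totals are straightforward to verify, and a total cost over a deployment horizon follows directly from them; the multi-year projection below is an illustrative extrapolation under the assumption of flat costs, not a figure from the study:

```python
# Cross-check of the cost figures reported above (all amounts in dollars).
non_recurring = {"hardware": 2245, "software development": 15_000, "installation": 5_000}
recurring_per_year = {"software maintenance": 10_000, "hardware maintenance": 225, "operations": 5_000}

assert sum(non_recurring.values()) == 22_245       # one-time total
assert sum(recurring_per_year.values()) == 15_225  # per-year total

def total_cost(years: int) -> int:
    """Illustrative total cost of ownership over a given number of years."""
    return sum(non_recurring.values()) + years * sum(recurring_per_year.values())

print(total_cost(3))  # -> 67920 for a three-year deployment (assumed flat costs)
```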
10. Conclusions
An edge-computing video analytics solution has been successfully developed and validated for automated plastic-bag contamination detection in waste collection trucks. Multiple variants of the Faster R-CNN and YOLOv4 models were trained using real waste data collected from Remondis historical manual tagging records (i.e., the RCD). In terms of training performance, the YOLOv4 model with CSPDarkNet53 backbone achieved the best results (i.e., validation mAP of 67%); however, it took the longest of all models to train (i.e., 132 seconds per training epoch). On the other hand, YOLOv4 with CSPDarkNet_tiny backbone achieved comparable training performance (i.e., mAP of 65%) but was the fastest to train (i.e., 48 seconds per training epoch). A similar trend was observed in testing, where the YOLOv4 model with CSPDarkNet_tiny backbone was second best (i.e., 63% mAP compared to 64% for the best-performing model). From a hardware deployment perspective, the YOLOv4 model with CSPDarkNet_tiny backbone was the fastest (i.e., 24.8 FPS on the TX2) and consumed the least power (i.e., 10.68 watts on the TX2) of all the implemented models; therefore, it is suggested as the most suitable model to deploy on TX2 edge-computers for real-time plastic-bag contamination detection in waste collection trucks. The proposed edge-computing solution was deployed on waste collection trucks to assess the functionality of the system and to collect more data for model fine-tuning. As a result, around 4235 additional images were collected from field testing and open source videos, with which the YOLOv4 model with CSPDarkNet_tiny backbone was retrained for improved performance. The retrained model achieved improved performance compared to the base model in terms of mAP (11-percentage-point increase), FPs (36.4% decrease), TPs (6.21% increase) and FNs (8.29% decrease). For the proposed prototype, a one-time cost of $22,245 USD is estimated to deploy the system, with recurring costs estimated at $15,225 USD per year. The visual similarity of other objects to plastic bags was highlighted as one of the critical limitations of the presented research, along with low lighting conditions and the presence of reflections. In the future, it is planned to annotate images for multiple types of plastic bags (e.g., white bags, black bags, colored bags, Coles bags, Woolies bags) for improved performance. Furthermore, as an extension of this research, it is intended to make use of the other cameras installed on the truck to detect potholes and roadside trash.