1. Introduction
There is a growing need for more intelligent and adaptive industrial robot capabilities within robotic work cells in scenarios involving servicing, managing end-of-life (EOL) components, and reclaiming e-waste materials in a non-destructive manner. A large portion of the preliminary research in this field focuses on feasibility assessments of product disassembly for component re-purposing or recovery by analyzing product waste streams and the complexity of high-value component assemblies [1,2]. Other areas of focus are various non-destructive or semi-destructive initiatives for reclaiming materials from EOL electric vehicle waste streams [1,3,4]. Some proposed methods include variations in human–machine collaboration for EOL battery recycling, where humans handle non-trivial tasks while collaborative robot platforms manage high-volume repetitive tasks such as removing screws and bolts [4,5]. Other areas of automated disassembly focus on consumer products such as LCD screens [6,7], which, given the lack of predetermined knowledge of fastener positions, still rely on semi-destructive methods like angle grinders and power drills to separate components.
Mechanical fastener extraction constitutes approximately 40% of all disassembly operations [8]. Several researchers have addressed the automated disassembly of fasteners. DiFilippo and Jouaneh [9] developed a system to remove screws from the backs of laptops automatically using non-AI vision algorithms. Li et al. [8] presented a device for automated hexagonal-headed threaded fastener removal that used a spiral search strategy to overcome uncertainties in the location of the fasteners. Chen et al. [10] introduced a method that utilized images of a screw to assist in aligning the screw–tool interaction, with torque monitoring employed throughout the screw removal process.
With recent advances in vision-based AI for the classification and accurate identification of mechanical components using various region-based convolutional neural network (R-CNN) and deep convolutional neural network (DCNN) methods, many researchers [11,12,13,14,15,16,17,18,19] have reported on the use of vision-based AI in screw detection. This makes it possible to integrate the technology with robotic systems for applications such as reclaiming e-waste components, a task that is traditionally prohibitively expensive when performed by human operators and remains inefficient when relying on existing automated methods.
While there is significant interest in the use of AI methods for screw detection, only a few studies [20,21] have addressed the development of AI-based robotic screw disassembly systems. This paper focuses on developing a robotic system capable of extracting cross-recessed mechanical fasteners through an application programming interface (API) layer. The system integrates a standalone DCNN vision system, explicitly trained to detect cross-recessed screws, with robotic simulation software designed for industrial robot applications. The system was tested on cross-recessed screw (CRS) targets with a 0.5-inch (12.7 mm) diameter at camera ranges of 280–500 mm, using a stereoscopic camera system with an RGB (red, green, blue) imager. The primary contribution of this work lies in the integration of a DCNN vision-based system, simulation software, and robotic framework for screw removal, as well as the development of a two-stage imaging process for screw disassembly. Unlike previous work [20,21], which focused on the disassembly of hexagonal-headed screws, this research explicitly targets CRS disassembly.
Figure 1 illustrates the input images used to train this study’s DCNN implemented for fastener detection.
The remainder of this paper is organized as follows. Section 2 covers the methodology, introducing the simulation software and describing how the other software components were integrated to provide detection and fastener removal capabilities. Section 3 provides a system overview, which includes a discussion of the design of the hardware and software components and introduces some of the mathematical and robotics concepts for the problem domain. Section 3 also introduces the two state-machine-based extraction methods tested throughout the evaluation phase of this work, followed by a brief discussion of managing object detections and how the DCNN predictions relate to the video stream provided by the robot's tooling. The evaluation in Section 4 provides a single sample extraction plot that incorporates readings from the custom tooling integrated with current and force sensors, a state sequence index plot, and readings from the robot position and camera detections. Bulk run data against the testing artifact introduced later in this work are also provided to give a general sense of the system's capability with respect to the two extraction processes designed and used in this paper. Section 5 concludes with an evaluation and discussion of the aggregate data from bulk CRS extraction testing and a summary of what was accomplished in this work.
2. Methodology
In this work, we used the RoboDK software (v5.5.2) as the simulator for our system. RoboDK is software for simulating, programming, and implementing program logic for industrial robots before deploying to physical hardware. This software platform allows users to create and simulate robotic operations for various robot brands and applications, and it also generates robot programs for the specified robots, thereby reducing downtime and improving productivity. With its extensive library of robot models and post-processors, RoboDK is easily adaptable via a standard API to different robot controllers and platforms, requiring only minimal source-code changes to retrofit.
RoboDK also supports online programming through its user interface, which allows users to control their robots in real time, adapt to changing manufacturing requirements, and troubleshoot any issues that may arise during functional testing. Leveraging the online programming functionality and extensive API, users can stream the same code used for offline testing in a physical simulation directly to the robot’s controller via one of the software libraries of drivers.
Figure 2 illustrates the two communication methods for interfacing with physical robot hardware. The online programming method in the right half of the figure is primarily employed throughout this work.
In this project, the simulated robot continuously received commands, such as inverse kinematic endpoints, movement constraints (e.g., linear or joint move limits), tool change commands, and speed commands, through the API. These commands were converted in real time into driver calls, synchronizing the movements of the physical system. The driver also provided feedback such as error states, driver status, and real-time data like the actual robot position. This feedback was integrated into the state logic, enabling the system to queue commands (to prevent sending a new move command while others are pending) and facilitate decision-making in the state-based program logic. For instance, the system can avoid specific joint movements based on constraints set on the simulated robot's joints to prevent potential collisions. The primary functionalities that enable this simulation suite's capabilities are given below; a minimal sketch of the command streaming loop follows the list.
Ease of use through extensive API (C#, Python (3.7.8), MATLAB (R2022a));
Robotics tools (kinematic model, trajectory planning, pose transformations);
Comprehensive post-processor system (offline programming);
Extensive driver support for major brands of industrial and collaborative robots (online programming).
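The snippet below is a minimal, illustrative sketch of this online command streaming using the RoboDK Python API; the item name 'UR5', the speed value, and the example pose offset are placeholders, and the state logic used in this work is considerably more involved.

```python
# Minimal sketch of streaming commands to the (simulated or physical) robot
# through the RoboDK Python API. Item names and pose values are placeholders.
from robodk.robolink import Robolink, ITEM_TYPE_ROBOT, RUNMODE_RUN_ROBOT
from robodk.robomath import transl

RDK = Robolink()                               # connect to the running RoboDK instance
robot = RDK.Item('UR5', ITEM_TYPE_ROBOT)       # fetch the robot item by name

# Switch from pure simulation to online programming: commands are forwarded
# to the physical controller through the RoboDK driver.
RDK.setRunMode(RUNMODE_RUN_ROBOT)
robot.ConnectSafe()                            # establish the driver connection

robot.setSpeed(50)                             # conservative linear speed (mm/s)

# Queue a non-blocking linear move and poll the driver so a new command is
# not issued while the previous one is still pending.
target_pose = robot.Pose() * transl(0, 0, -25)  # example: 25 mm along tool -Z
robot.MoveL(target_pose, blocking=False)
while robot.Busy():
    joints = robot.Joints().list()             # real-time feedback for the state logic
```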
The physical system to be simulated consisted of a UR5 robot equipped with task-oriented tooling and stereoscopic camera equipment.
Figure 3 shows the physical system and its simulated version in RoboDK.
The simulated variant of the UR5 is shown with collision detection boundaries, which are evaluated when executing moves and halt the online robot if a collision is predicted. Joint limits were implemented at the shoulder and wrist joints within the simulation so that only kinematic solutions that would not be obstructed by the robot's wrist-mounted electronics assembly or cable spine were provided. In addition, the RoboDK simulation suite employs the UR5's Denavit–Hartenberg parameters for an accurate kinematic model. These parameters are used for analytical kinematic solutions to the tool center points (TCPs) defined on the tool carriage in the simulation shown in Figure 4.
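For reference, the sketch below shows how a forward kinematic pose can be computed directly from the UR5's published Denavit–Hartenberg parameters; this is a simplified standalone illustration, not the calibrated analytical solver used by RoboDK.

```python
# Forward kinematics of the UR5 from its standard Denavit-Hartenberg parameters.
# Illustrative only; RoboDK maintains its own kinematic model internally.
import numpy as np

# Published UR5 DH parameters (meters, radians): d, a, alpha per joint.
D     = [0.089159, 0.0,      0.0,      0.10915,  0.09465, 0.0823]
A     = [0.0,     -0.425,   -0.39225,  0.0,      0.0,     0.0]
ALPHA = [np.pi/2,  0.0,      0.0,      np.pi/2, -np.pi/2, 0.0]

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for a single DH link."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def forward_kinematics(joints):
    """Base-to-flange pose for a list of six joint angles (radians)."""
    T = np.eye(4)
    for theta, d, a, alpha in zip(joints, D, A, ALPHA):
        T = T @ dh_transform(theta, d, a, alpha)
    return T

# Example: flange pose at the UR5 zero position.
print(forward_kinematics([0.0] * 6))
```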
A tool assembly was designed to carry an electric screwdriver, a stereoscopic camera, and a cross-profile laser and mount assembly. A compact suite of electronics (not shown) was also implemented for ingesting incoming control signals, providing signals to the motor and adjacent tools, and measuring the compressive force and torque experienced by the tool during operation, which are returned to the control computer. A view of the completed tool assembly is shown in Figure 5a, followed by an exploded assembly view in Figure 5b.
Figure 4 shows the tool frame definitions of the tool assembly in the RoboDK software. Three frames were defined in the RoboDK simulation application with respect to the CAD geometry: the laser, the electric screwdriver, and the camera frame. The frame definition of the camera was placed with respect to the RGB imager origin (as defined by the Intel RealSense D435 camera datasheet) and matched the camera coordinate system. The pose of the camera imager frame was stored by the camera streaming script alongside each recorded target position in the camera frame, for use in deprojection.
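A minimal sketch of this deprojection step is given below, assuming the Intel RealSense Python bindings (pyrealsense2); the pixel coordinates stand in for a DCNN detection center, and the stream configuration is illustrative.

```python
# Minimal sketch: deproject a detected screw's pixel coordinates to a 3D point
# in the camera frame using the RealSense D435 intrinsics (pyrealsense2).
# The pixel coordinates below are placeholders for a DCNN detection center.
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)              # align depth pixels to the RGB imager
frames = align.process(pipeline.wait_for_frames())
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()

intrinsics = color_frame.profile.as_video_stream_profile().get_intrinsics()

u, v = 320, 240                                # placeholder detection center (pixels)
depth_m = depth_frame.get_distance(u, v)       # metric depth at that pixel

# 3D point (x, y, z) in the camera coordinate system, in meters.
point_cam = rs.rs2_deproject_pixel_to_point(intrinsics, [u, v], depth_m)
print(point_cam)

pipeline.stop()
```

The resulting camera-frame point would then be transformed into the robot's base frame using the stored pose of the camera imager frame, yielding the globalized target coordinates referenced later in this work.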
RoboDK was chosen as the foundation for the simulation software in this work. The primary reasons for its selection include its support for online driver programming, built-in driver support for the UR product line, extensive documentation of its Python API, and a cost-effective licensing scheme suitable for academic applications. Similarly, the Intel RealSense D435 was selected over various industrial stereoscopic camera systems based on several criteria: an easily implementable and open-source programming API, a calibration suite, widespread academic use, extensive documentation, and a compact form factor. The DCNN was employed within the tiny-yolo v2 framework using a model we previously developed [12]; this model was chosen for its compatibility with the Python version that constituted most of the project's code base, rather than retraining on a newer model. The DCNN model yielded an average precision (AP) between 92.6% and 99.2% across the different testing datasets. Further information on the DCNN model and its training details is provided in [12]. The UR5 robotic arm was utilized as an available laboratory asset; nevertheless, the project's code base was designed to leverage RoboDK's features, enabling the UR5 to be substituted in the software suite with a different 6-axis robotic arm by incorporating new geometry, drivers, and a kinematic model in the future.
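As an illustration of how such a detector can be queried at runtime, the sketch below loads a tiny-yolo v2 Darknet model through OpenCV's DNN module and returns detection centers above a confidence threshold; the file names and threshold are placeholders, and the project's actual inference pipeline may differ.

```python
# Illustrative sketch: run a tiny-yolo v2 Darknet model through OpenCV's DNN
# module and return detection centers above a confidence threshold.
# File names and the threshold are placeholders; the actual pipeline may differ.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet('tiny-yolov2-crs.cfg', 'tiny-yolov2-crs.weights')

def detect_screws(image_bgr, conf_threshold=0.5):
    h, w = image_bgr.shape[:2]
    blob = cv2.dnn.blobFromImage(image_bgr, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    centers = []
    for output in outputs:
        for det in output:                 # row: [cx, cy, w, h, objectness, class scores...]
            scores = det[5:]
            confidence = float(np.max(scores))
            if confidence >= conf_threshold:
                cx, cy = int(det[0] * w), int(det[1] * h)   # relative -> pixel coordinates
                centers.append((cx, cy, confidence))
    return centers
```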
Additionally, tool assembly components, such as the laser assembly, electric screwdriver, and controller electronics, were either custom-made or employed educational-grade components (as opposed to industry-grade) for cost efficiency. Hardware solutions for power conversion, force and torque sensing, switching, and control were custom-implemented using consumer off-the-shelf components such as buck converters, relays, a microcontroller, and a motor driver shield.
4. Results and Discussion
Figure 10 shows a sample extraction run using method #2's extraction process, which corresponds to a single execution of the state machine in Figure 8. This figure gives time series data for the tooling's force sensor and current sensor, the active state, and the TCP z-coordinate.
In Figure 10, the top subplot illustrates the current data in milliamps (mA) recorded by the motor driver board during the fastener extraction process. Two distinct peaks are visible. The first peak represents the initial current detection state, where a low-RPM spin is used as the tool lowers to the known position of the CRS to check for thread engagement. This peak, reaching nearly 500 mA, indicates the tool's recognition of engagement with the fastener profile, triggering the screwdriver to shut down promptly. The second peak illustrates the motor current fluctuations as the tool loosens and disengages the fastener during the extraction phase. The second subplot displays the force-sensitive resistor (FSR) data, which measure force as the tool compresses against and contacts the testing artifact. An initial high peak at the 25 s mark indicates the first contact, followed by a slight decrease in force and a plateau around the 27 s mark. This change occurs as the tool bit engages with the screw threads and jogs slightly upwards to reduce the force on the fastener. The sharp decline and fluctuating readings between 30 and 40 s mark the extraction phase, where the pressure is relieved as the tool unthreads the fastener and gradually ascends along the z-axis. The third subplot shows the state index time series data, where individual integers represent the discrete states detailed in Table 2. The final subplot displays the z-position of the active tool, as reported by the robot driver and the known TCP in the RoboDK software. Here, the z-axis is oriented relative to the robot base and decreases to the second imaging state z-coordinate defined in Table 3. The abrupt changes in the z-axis observed between the 10 and 20 s marks correspond to the software switching the active tool between the laser, RGB camera, and electric screwdriver.
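A simplified sketch of the thread-engagement check performed during the current detection state is shown below; the serial port, message format, and threshold value are illustrative assumptions rather than the exact tooling firmware interface.

```python
# Simplified sketch of the thread-engagement check in the current detection
# state: the screwdriver spins at low RPM while the tool lowers, and it is
# commanded to stop once the reported motor current crosses a threshold.
# Serial port, message format, and the ~500 mA value are illustrative only.
import serial
import time

ENGAGE_CURRENT_MA = 500.0          # approximate engagement threshold (mA)

def wait_for_engagement(port='/dev/ttyACM0', baud=115200, timeout_s=10.0):
    """Return True once the tooling reports a motor current above the threshold."""
    deadline = time.monotonic() + timeout_s
    with serial.Serial(port, baud, timeout=0.1) as ser:
        while time.monotonic() < deadline:
            line = ser.readline().decode(errors='ignore').strip()
            if line.startswith('I:'):                  # e.g., "I:483.2" from the tooling MCU
                try:
                    current_ma = float(line.split(':', 1)[1])
                except ValueError:
                    continue
                if current_ma >= ENGAGE_CURRENT_MA:
                    ser.write(b'STOP\n')               # halt the screwdriver promptly
                    return True
    return False
```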
The cycle time for the run shown in Figure 10 is 50.5 s, which is consistent across runs (almost all runs took less than 55 s per target). Roughly 40% of the time in this method is spent re-imaging the desired target and traversing to and from the robot's staging position. This cycle time can be reduced in future iterations of this system by (a) using higher robot speeds during the traverse and screw extraction phases, as the speed was kept conservatively low during early-stage testing, (b) replacing the force-sensing resistor with a force/torque sensor, which gives more reliable force detection and a faster response, and (c) deactivating the laser on the tool assembly, which only provides a visual cue of the intended target and its accuracy and is not used in the logic of the extraction process.
As previously defined, Table 2 lists the states on the y-axis of the State Sequence Run Data subplot in Figure 10. While the logic of the state program controlling the robot's actions is continuously re-executing, time series data indicating the robot's current state are continuously aggregated for status monitoring.
Table 3 lists additional operational parameters collected for each CRS extraction attempt. These data include the relative current and conductance thresholds used while the physical tooling executes, the initial and re-acquired globalized x, y, and z target coordinates, and the DCNN detection confidence threshold.
Figure 11 shows the nine targets on the testing artifact used in this work. Experiments were conducted with one type of fastener material to minimize possible surface reflection effects on detection.
Table 4 gives a sample single run of single-shot, high-level imaging of all targets within the robot's workspace. Three failures were recorded for this sample extraction process, which, unlike the process outlined in Figure 8, omitted any form of camera repositioning for more optimized target deprojection. Targets 1, 2, and 9 failed due to general inaccuracy, which prevented the threads from engaging during the CurrentDetect state, in which the tool attempts to tighten the CRS to detect thread engagement and properly seat the TCP if necessary. The overall first-pass yield for this sample run was 66.67%.
Table 5 presents a sample bulk testing run within the robot's workspace. The sample extraction data are given for the nine targets over two extraction passes. After each target is attempted, if the vision system detects any remaining targets, it executes another extraction attempt on them. It should also be noted that targets 1–9 are not in any spatial order (left to right or top to bottom) for the sample extraction run, because each coordinate was initially assigned at random until all targets were detected and the vision system updated the targets according to a predefined global tolerance.
The two sets of coordinates for each target represent the globalized coordinates from the initial workspace staging state and the re-imaging state for the first and second image numbers, respectively. These coordinates are for the extraction that succeeded, so if it failed on the first pass, both sets of coordinates are for the second pass. The “X” in the Successful Extract columns indicates that the cross-recessed screw extraction was successful on that pass. Success is defined as the CRS threads fully disengaging from the heat-set threaded inserts on the testing artifact.
The FP and SP columns indicate "first pass" and "second pass", respectively, and refer exclusively to Table 5. The total FPY (first-pass yield) ranges from 78% to 89%, and the total SPY (second-pass yield) is 100% in all three instances. An interesting trend observed in all method #2 bulk run data, and in Table 5, is that the detection confidence generally increases or stays the same on the second pass, which coincides with the camera and DCNN system performing better overall (more consistent deprojection behavior) when the RGB imager is directly in line with the target. This is evident for targets 1, 2, 4, 6, and 8.
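For clarity, the yield figures can be reproduced with the standard first-pass/second-pass yield definitions, which are assumed here to match the paper's usage; the pass results in the sketch below are made up and do not reproduce Table 5.

```python
# Illustrative computation of first-pass and second-pass yield using the
# standard definitions (assumed to match the paper's usage). The pass results
# below are made up and do not reproduce Table 5.
first_pass  = [True, True, False, True, True, False, True, True, True]   # 9 targets
second_pass = {2: True, 5: True}   # retry results for the targets that failed pass 1

fpy = sum(first_pass) / len(first_pass)                     # extracted on first attempt
retried = [i for i, ok in enumerate(first_pass) if not ok]
spy = sum(second_pass[i] for i in retried) / len(retried)   # extracted on the retry

print(f"FPY = {fpy:.0%}, SPY = {spy:.0%}")                  # FPY = 78%, SPY = 100%
```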
Table 6 compares the results obtained from this work with robotic screw disassembly systems reported in the literature. Note that references [20,21] address the removal of hexagonal-headed screws, while our work targets CRSs. All the works in the table have some form of force or torque sensing to help the tool engage with the screw. In reference [20], a 100% extraction accuracy rate is attributed to the combined use of a deep learning model, a force/torque-based analytical model for the tool–screw interaction, and an optimization algorithm.
It should be noted that the failures and inconsistencies observed in this work are mainly due to limitations in the imaging system’s accuracy and the tolerances within the tooling assembly. The most prevalent failure mode in the automated screw disassembly is the screwdriver slipping away from the center of the cross-recessed screw when attempting to spin into the correct orientation after the force sensor detects contact with the part. This is related to inaccuracy with the deprojection system, as trying to seat the driver off-center would make the screwdriver re-orientation process more likely to slip. These limitations combined to prevent the testing and extraction of smaller-diameter cross-recessed fasteners. The stereoscopic camera’s vision system introduced errors in its deprojection model, attributable to an inherent 2% depth sensing error specified by the manufacturer and the inability to image objects closer than approximately 280 mm. An empirical compensation method was applied to enhance target positioning accuracy. Testing was conducted to derive a 3D calibration curve, which compensated for the X and Y distances (relative to the camera imagers) as a function of deprojection distance. This curve was developed for the camera system’s valid operational range (280–500 mm), where the DCNN model demonstrated reasonable target detection confidence. The calibration curve was constructed using three targets arranged vertically on a placard with known dimensions. The placard was evaluated at six positions across four depths relative to the RGB imager. Data were collected for 15 min at each position to capture a broad range of discrete detection coordinates.
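As an illustration of this style of empirical compensation, the sketch below fits simple second-order corrections for the X and Y deprojection errors as a function of depth; the polynomial form and sample data are assumptions for illustration, not the calibration curve actually derived in this work.

```python
# Illustrative empirical compensation: fit polynomial corrections for the X and
# Y deprojection errors as a function of depth, then apply them to new targets.
# The polynomial order and sample data are assumptions, not the paper's actual fit.
import numpy as np

# Measured deprojection depth (mm) and observed X/Y errors (mm) from known targets.
depth_mm = np.array([280, 340, 400, 460, 500], dtype=float)
err_x_mm = np.array([1.8, 2.4, 3.1, 3.9, 4.5])
err_y_mm = np.array([0.9, 1.3, 1.6, 2.2, 2.6])

# Second-order fits of error versus depth over the valid 280-500 mm range.
coef_x = np.polyfit(depth_mm, err_x_mm, 2)
coef_y = np.polyfit(depth_mm, err_y_mm, 2)

def compensate(x_mm, y_mm, z_mm):
    """Subtract the depth-dependent error estimate from a deprojected X/Y position."""
    return (x_mm - np.polyval(coef_x, z_mm),
            y_mm - np.polyval(coef_y, z_mm))

print(compensate(120.0, -45.0, 385.0))
```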
The variability in the position of the tool center point defined on the electric screwdriver was estimated to be between 0.5 and 1.0 mm, which limited the overall process testing to the larger CRS targets discussed in this work. The inconsistency introduced as the tool changes orientation would necessitate a screwdriver with less variability in the chuck design, which secures the bit. This issue warrants further investigation for a more robust tooling design. Overall, the failures introduced by the prototype tooling and imaging system could be mitigated through a process design incorporating an iterative imaging technique, albeit at the expense of higher cycle times and lower first-pass yield.
5. Conclusions
This article describes a novel integration effort to combine an advanced DCNN vision system with robotic simulation software (RoboDK) that emulates a physical robot. The robot is equipped with a tool suite designed specifically for extracting cross-recessed screws. An innovative software package was developed that interfaces with all the physical components, including the tooling, camera system, UR5 robot, robot controller, and simulation software that communicates with the physical UR5. The custom-built modular software package can run the DCNN, interface with the RoboDK simulation software, and manage the physical tooling through serial communication.
The system was tested extensively, and the results presented in the results and discussion section cover the camera's performance, the operation of the DCNN, and the functionality of the simulation software. The software suite, which bridges the state-based control of the robot with the control of all other pieces of equipment, was able to convert received API calls into driver commands issued to the robot while synchronously providing real-time feedback. Additionally, a substantial amount of work went into driving several evolutions of state-based logic to formulate an overall process for CRS extraction. Ultimately, it was shown that the system could extract most 12.7 mm diameter cross-recessed screws from a test artifact with high consistency after a single cycle of the extraction process. Using method #1, which incorporated only a single imaging stage for all targets without adjusting the camera position, a first-pass yield typically ranging between 44 and 67% was achieved. When utilizing method #2, which allowed the system to re-image the targets on the stepped artifact from multiple positions, a first-pass yield of 78–89% and a second-pass yield of 100% were achieved.
This work encountered several limitations and identified opportunities for more refined study, particularly regarding equipment, software implementation, and testing constraints. Using consumer off-the-shelf components, like the motor controller, screwdriver, and force-sensing resistor (FSR), while cost-effective and easy to implement, led to inconsistent readings. This inconsistency required the state-based logic to rely on trend observation rather than individual readings for real-time decision-making. The DCNN was trained on a limited dataset of approximately 1000 images and three sets of cross-recessed fasteners. This restricted the vision system’s confidence and versatility, especially for larger targets and perpendicular viewing angles. Adding more pictures, especially ones taken at distances corresponding to the 300–500 mm detection range of the stereoscopic camera systems, would improve the detection accuracy of the DCNN model. Adding a higher-precision camera and more precise positioning methods would also enhance the detection accuracy. Analyzing the potential impact of different materials and surface treatments on detection is also essential to consider in future work. Given the project’s scope and the extensive integration effort, several calibration methods, like ChArUco calibration for the stereoscopic camera, were only moderately explored. Implementing these could refine the TCP in the robotic simulation software, improving target deprojection and fastener localization in the robot’s workspace.