1. Introduction
According to [1], in the majority of cases, industrial robots are deployed to replace humans in carrying out cyclical, dirty, and dangerous activities. Such tasks are ideally performed in closed robotic cells, which prevent the accidental entry of humans or other objects into the workplace. The philosophy and design principles of robotic cells are proposed in [2]. Problems occur when it is impossible to create a closed cell, or when interaction between a robot and a human is essential. According to [3], the current trends in making collaboration safe consist of pressure mats, light barriers, laser scanners, and the use of collaborative robots. A comparison between classic robots and collaborative robots is provided in [4], showing that the disadvantages of collaborative robots include their lower working speed and higher price. Hence, it is increasingly worthwhile to use external security elements to ensure the safety of the workplace.
An interesting approach to Human–Robot collaboration (HRC) is a contact-type security system called AIRSKIN by Blue Danube Robotics, described in [5]; the system is a wearable safety cover for robotic manipulators consisting of pressure-sensitive pads that detect collisions using air as the sensing medium. The drawbacks of this technology are the limit it places on the maximum operating speed and the fact that objects or persons are detected only after contact with the pressure-sensitive pads.
As mentioned in [6], a requirement for implementing HRC is to create a supervised, secure, and co-operative production envelope between the robot system and a person in the workspace through a virtual protective room, referred to as SafetyEYE by the company PILZ. The SafetyEYE system can detect object intrusions into adjustable zones and, based on the type of zone intruded into, responds accordingly, for example by modifying the operational speed of the robotic system. This safety solution also has its drawbacks: it does not provide the possibility to grant security exceptions for expected intrusions within its predefined zones, and it lacks human recognition algorithms.
Another current state-of-the-art safety system, mentioned in [7], is FreeMove from Veo Robotics. This system monitors work cells in 3D and implements dynamic speed and separation monitoring. According to [8], this safety system is still awaiting certification. It promises a real-time understanding of the workspace with the possibility of excluding large objects, such as workpieces, so that they do not trigger the safety system. Its disadvantage lies in the versions of robotic controllers it supports; it is therefore suitable only for newer robotic systems.
Such external security elements create zones in the vicinity of the robotic workplace. If something enters these zones, work is slowed or completely halted, as described in more detail in [9]. Security elements based on infrared/light detection only monitor their surroundings in one plane. This limitation has resulted in the idea of monitoring workplaces not only in a single plane but in 3D.
Methods for obtaining 3D data, where every object’s distance from the point of observation is known, have been understood for decades. The most common method is to use two cameras; the basic principles of the geometry involved are dealt with in more detail in [10,11]. Another approach for obtaining 3D data is to use a monocular camera and estimate the distance; this approach is described in more detail in [12,13,14,15]. Other methods include the use of depth cameras. The depth camera principle of operation may be based on the time of flight of a beam of light (explained in more detail in [16,17]) or on the deformation of structured light [18]. The current problems with using camera systems in robotics primarily concern the speed of processing and evaluating the 3D data and measurement noise. Despite the current hardware deficiencies, camera image depth processing has been used in interactions between humans and machines for several years. In [19], a stereo camera is used for the teleoperation of a robotic manipulator. Camera systems co-operating with robots also have applications in medicine, where, according to [20], a camera has been used to calibrate a robot arm during minimally invasive surgery.
The available safety features for collaboration and zoning in the workplace are intended to identify any movement in and intrusion into the work zone. Papers [21,22] address the identification of human beings based on the image processing of human faces or using only images of the ears. Another work [23] addresses the identification of moving objects, together with an estimation of their distance from the camera, using a single camera. Such techniques make it possible to assign different authorizations depending on the object identified. However, if the objects are externally identical, additional elements are necessary for unambiguous identification. One option is to use Real-Time Location System (RTLS) technology, which assigns each object a unique identification number. The primary function of RTLS technology is to track the position and trajectory of objects, as mentioned in [24]. This technology can also be used to optimize trajectories, as stated in [25,26].
This article aims to verify the application of depth cameras and RTLS to make robotic workplaces with zoning safe. The benefit of this method over the above-mentioned state-of-the-art technologies lies in incorporating RTLS technology, which allows selected moving objects to pass through the workplace safety zones without limiting the speed of production. During the work process, moving objects are evaluated, primarily with respect to whether or not the object is human; if it is, for safety reasons, no exemption can be granted for entry into or transit through the work zones without the production speed being restricted.
The novelty lies in the synergy of interconnecting two co-existing technologies, each intended in principle for a different deployment, in order to increase the safety of robotic workplaces. This design was practically implemented for an experimental robotic workplace, and our results and evaluation are presented below.
This paper is divided into four parts. The first one presents a short theoretical background. The second one contains the main idea of the work and the methodology used. The third part discusses the results. Lastly, the fourth concludes this paper.
3. Problem Statement and Main Work
Following the introduction, this paper proposes an experimental vision- and RTLS-based HRC setup that provides an approach different from the reviewed state-of-the-art technologies. Our aim was to analyze their downsides and try to improve on them. Other state-of-the-art technologies provide multiple ways to ignore or exempt areas of the workplace from monitoring by the security system. This is beneficial, but it also has drawbacks: excluding certain spots within the workspace from the virtual fencing can create security risks. This paper’s approach addresses the issue of dynamically recognizing, with a depth camera, objects that may come from any direction. The solution also extracts image features and checks, with a certain confidence level, whether the recognized object is a person. If the recognized object is a person, no exceptions are granted, and the system reacts like any speed and separation monitoring (SSM) HRC system, in which the distance from the person to the robotic system is cyclically compared. If the person enters a warning zone, the system slows down the robotic system, and if the intrusion continues into the danger zone, it completely stops the robotic system. The recognized object is also checked against the RTLS tag position data to determine whether the tag ID has been granted an exception. If so, the security system ignores these intrusions, but only if the object is not detected as a person, and allows it to move around the workplace without modifying the speed of the robotic system. The results of this experimental collaborative robotic workplace are presented in the corresponding section, which is followed by a discussion and conclusion.
3.1. Test Workplace Specifications
The HRC safeguarding of the operator in the test workplace is achieved through SSM by maintaining a certain minimum separation distance during operation. The intelligent zoning design was implemented in a functioning robotic workplace whose function is the bin picking of objects that are then assembled using a robotic manipulator. The maximum reach of the manipulator is 540 mm; its maximum speed in T1 mode is 250 mm/s, and in T2 mode it should achieve a theoretical maximum of up to 2 m/s, based on [47,48]. A 3D scanner located above the robotic workplace scans a bin containing objects. After identification, the scanner sends the object’s coordinates to the robot control system, which sends the trajectory information to the robotic manipulator. The manipulator grabs the object and assembles it with another object. The supply part of the process is carried out using an autonomous supply trolley. The workplace floor plan is shown in Figure 4. As the workplace area is limited, it was necessary to modify the sizes of the individual zones to perform experiments with various depth camera settings.
The primary workplace components used in the preparation of this article were a KUKA KR3 R540 robotic manipulator (the main specifications can be found in the datasheets [47,48]), a Stereolabs ZED depth camera (the datasheet can be found in [49]), a Pozyx UWB real-time location system (details can be found in [50]), a Photoneo PhoXi 3D scanner (details can be found in [51]), an internally developed supply trolley, and a computer. The entire process can be split into two parts. The first is robot control, consisting of the bin picking of objects performed by the robotic manipulator and an automated supply by the mobile trolley. The second part is the supervisory control that monitors and zones the robotic manipulator’s working environment and, depending on external stimuli, adjusts the speed of the performed task. Safety zones are created using the input data from the depth camera. The UWB-based location system operates concurrently to monitor the spatial position of the tags. The depth camera is connected directly to the computer, whereas the RTLS is first connected to an embedded platform that uses the OPC Unified Architecture (UA) protocol to communicate with the computer. The computer then evaluates the input data and sends the information using the TCP/IP protocol to the robot control system. The data received by the robot control system are processed by the KUKA user interface KSS (KUKA.SystemSoftware), more precisely by a submit file (SF). The SF is a background process inside the user interface that responds cyclically to changes using its preprogrammed logic; in other words, the SF is superior to the user interface. In this case, the SF sets the speed or halts the robot motors based on the computer’s input data. The KSS user interface contains functions for the trajectory planning of the robotic manipulator, and these functions are subordinate to the SF settings. A simplified architecture and the method of communication between the components of the security workplace are shown in Figure 5.
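To make the data flow concrete, the following minimal Python sketch illustrates the role of the supervisory computer under assumed endpoint addresses, node identifiers, and message format (none of which appear in the original setup description): it reads a tag position from the embedded RTLS gateway over OPC UA using the python-opcua package and forwards a speed override to the robot controller over TCP/IP, where the submit file consumes it.

```python
import socket
from opcua import Client  # python-opcua package

# Hypothetical endpoint of the embedded RTLS gateway (Raspberry Pi running an OPC UA server)
OPCUA_ENDPOINT = "opc.tcp://192.168.1.50:4840"
# Hypothetical address of the robot controller's TCP/IP server (read by the submit file)
ROBOT_ADDR = ("192.168.1.10", 54600)

def read_tag_position(client, node_id="ns=2;s=Tag1.Position"):
    """Read one tag position [x, y, z] in mm from the OPC UA server (node id is an assumption)."""
    return client.get_node(node_id).get_value()

def send_speed_override(speed_percent):
    """Send the requested robot speed (%) to the controller; the message format is illustrative."""
    with socket.create_connection(ROBOT_ADDR, timeout=1.0) as sock:
        sock.sendall(f"OV_PRO {int(speed_percent)}\n".encode("ascii"))

if __name__ == "__main__":
    client = Client(OPCUA_ENDPOINT)
    client.connect()
    try:
        x, y, z = read_tag_position(client)
        print(f"Tag position: {x} mm, {y} mm, {z} mm")
        send_speed_override(100)  # full speed while no intrusion is detected
    finally:
        client.disconnect()
```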
RGB-D workplace image processing is performed by a ZED depth camera; the area covered by the depth camera is presented in Figure 4. The depth camera itself can operate in four resolution modes, ranging from the lowest-quality Wide Video Graphics Array (WVGA) output at a side-by-side resolution of 2 × (672 × 376) pixels up to a 2.2K output at a resolution of 2 × (2208 × 1242) pixels. Depending on the mode, the frame rate can be set between 15 and 100 fps. Depth recording uses the same output resolutions as the RGB recording. The depth range of the camera is between 0.2 and 20 m. The camera’s field of view (FOV) is 110° × 70° × 120° (horizontal × vertical × diagonal). An 8-element wide-angle all-glass lens with optically corrected distortion and an f/1.8 aperture is used. A USB 3.0 port is used for both the connection and the power supply. The camera is compatible with the Windows 10, 8, and 7; Ubuntu 18 and 16; and Jetson L4T operating systems. The camera enables integration with third-party software, including Unity, OpenCV, ROS, Matlab, Python, and C++. The manufacturer’s minimum system requirements are a dual-core 2.3 GHz processor with 4 GB RAM and an NVIDIA graphics card with a compute capability of over 3.0. The camera’s position was chosen so that it was possible to capture the available space around the robotic workplace.
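For illustration, a camera configuration along the lines described above can be set up through the ZED Python SDK (pyzed). The snippet below is a sketch, not the configuration used in the experiments; the enum names follow SDK 3.x, and the depth mode and unit settings are assumptions.

```python
import pyzed.sl as sl

# Open the ZED camera in the lowest-resolution WVGA mode at a high frame rate.
init_params = sl.InitParameters()
init_params.camera_resolution = sl.RESOLUTION.VGA   # 2 x (672 x 376) side-by-side
init_params.camera_fps = 100
init_params.depth_mode = sl.DEPTH_MODE.ULTRA        # depth mode is an assumption
init_params.coordinate_units = sl.UNIT.METER
init_params.depth_minimum_distance = 0.2            # matches the 0.2-20 m depth range

zed = sl.Camera()
if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    raise RuntimeError("Could not open the ZED camera")

point_cloud = sl.Mat()
if zed.grab() == sl.ERROR_CODE.SUCCESS:
    # Retrieve the X, Y, Z, RGBA point cloud for the current frame
    zed.retrieve_measure(point_cloud, sl.MEASURE.XYZRGBA)
    pc = point_cloud.get_data()   # NumPy array of shape (H, W, 4)
zed.close()
```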
The Pozyx location system based on UWB radio waves forms a separate part of the workplace. This location system requires fixed anchors, tags, and a computing unit to process the data. The placement of the anchors influences the measurement accuracy. Figure 4 shows how the anchors are arranged on the ceiling of the room to illustrate their position as accurately as possible. When placing the anchors, the essential advice in [52] was followed to achieve maximum measurement accuracy. The anchors use the two-way ranging (TWR) positioning principle and need a continuous 3.3 V power supply, as they do not have batteries. The Decawave DW1000 transceiver allows a communication speed of up to 6.8 Mbps. The selected tags are Arduino-compatible shields connected via an I2C bus; the board also includes an STM32F4 microcontroller. The tags contain the same transceiver as the anchors and also require a continuous power supply. The manufacturer states an accuracy of 10 cm in a noise-free environment, which is sufficient for our purposes. The initial data processing is performed on an embedded Raspberry Pi 3 platform. Several tasks are performed on this embedded platform, e.g., reading data, noise filtering, and then sending the processed data using the OPC UA protocol to the computer. The use of an embedded platform has proven to be a satisfactory solution.
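A minimal reading step on the embedded platform could look like the following sketch based on the pypozyx library; the choice of positioning algorithm is an assumption, and the filtering and OPC UA publishing performed in the real system are only indicated by comments.

```python
from pypozyx import (PozyxSerial, get_first_pozyx_serial_port,
                     Coordinates, PozyxConstants, POZYX_SUCCESS)

port = get_first_pozyx_serial_port()   # tag connected to the Raspberry Pi over USB
pozyx = PozyxSerial(port)

position = Coordinates()
# 3D positioning with the UWB-only algorithm (algorithm choice is an assumption)
status = pozyx.doPositioning(position,
                             dimension=PozyxConstants.DIMENSION_3D,
                             algorithm=PozyxConstants.POSITIONING_ALGORITHM_UWB_ONLY)

if status == POZYX_SUCCESS:
    # Raw coordinates in mm; in the real system these values are filtered
    # and published to the supervisory computer over OPC UA.
    print(f"x={position.x} mm, y={position.y} mm, z={position.z} mm")
```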
The main part of the algorithm runs on a computer with a 10th-generation quad-core Intel Core i5 processor running at 2.5 GHz with the option of boosting to 4.5 GHz. The computer has 16 GB of DDR4 RAM running at 2.93 GHz. A 1 TB SSD provides sufficient storage and enables high read and write speeds. A graphics card that supports the Compute Unified Device Architecture (CUDA) platform is an essential part of the computer; in this case, an Nvidia GeForce RTX 2060 with 6 GB of RAM was used. This graphics solution was needed to provide compatibility between the camera system and the computer itself.
3.2. Depth Recording Processing
Depth monitoring at the robotic workplace is achieved using a depth camera working on the passive stereo principle. The camera manufacturer offers interfaces for several programming languages; Python was used here. An API was created to read data from the depth camera with the subsequent detection of moving objects. As the camera uses the passive stereo principle, it is possible to access the left and right video streams separately as well as the depth recording. The depth recording from the camera is stored as a Point Cloud $PC$, which can be represented as a $W \times H \times 4$ array, where the user selects the width $W$ and height $H$ of the stream (see Section 3.1). The Point Cloud is composed of data from four 32-bit channels containing the $X$, $Y$, $Z$ coordinates of each pixel relative to the center of the camera and an $RGBA$ component, which contains the colour information, where $A$ is the alpha channel.
To create the safety zones, an initial reference recording of the room is created, to which each subsequent measurement is compared. The reference recording contains the distance of each pixel to the center of the camera. The quality of the initial scan is critical, as any error is transferred to all the other measurements. The OpenCV library for Python makes it possible to compare two images and find their differences. This work does not compare camera recordings based on changes in the colour component but rather on changes in depth. Therefore, it is necessary to convert the Point Cloud data into a form used by the OpenCV library by removing the RGBA element with the NumPy library. The depth change between the reference recording $PC_{ref}$ and the actual recording $PC_{act}$ is defined as $\Delta = PC_{ref} - PC_{act}$; this subtraction is performed with the subtract method included in the NumPy library. The result $\Delta$ has the same dimensions as the image, with the information about the RGBA component eliminated and the size of the coordinate change in the $X$, $Y$, $Z$ axes retained for each pixel, i.e., $\Delta \in \mathbb{R}^{W \times H \times 3}$. Figure 6 illustrates the output from the depth camera.
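A sketch of this preprocessing step is given below. The array names mirror the notation introduced above, the (H, W, 4) layout follows the NumPy index order of the camera output, and taking the absolute value of the difference is an assumption made here so that changes in either direction register.

```python
import numpy as np

def depth_change(pc_ref, pc_act):
    """Depth change between the reference and the actual point cloud.

    Both inputs are (H, W, 4) arrays from the depth camera; the last channel
    holds the packed RGBA value and is discarded, so only the X, Y, Z
    coordinate change per pixel is retained.
    """
    xyz_ref = pc_ref[:, :, :3]
    xyz_act = pc_act[:, :, :3]
    delta = np.abs(np.subtract(xyz_ref, xyz_act))   # per-axis change in metres
    return delta
```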
The values in $\Delta$ can be immediately adjusted and possible measurement-related errors eliminated. The maximum change value is set based on the geometric dimensions of the monitored room; if this value is exceeded, it is clear that an erroneous measurement has occurred for the relevant pixel. Another type of undesirable measured value is a very small number resulting from measurement noise. A minimum change value of 0.3 m for a shift in one axis was set experimentally; at this value, all the undesirable noise that occurred was eliminated, while a human and a mobile supply trolley could still be identified with confidence. The last type of undesirable value is NaN (not a number); by observation, it was established that these values tended to occur at the edges of the recording. All of these undesirable values form a negligible part of the depth recording and can be eliminated without adversely affecting the image recognition. They were eliminated by replacing them with the value 0, the value acquired by pixels where there was no change. After the adjustment, $\Delta$ contains real values ranging from 0.3 m to the room’s maximum distance. As the RGB component of each image contains values from 0 to 255, scaling is required using the following relationship:

$\Delta_{scaled} = \dfrac{(\Delta - \Delta_{min})\,(255 - 0)}{\Delta_{max} - \Delta_{min}} + 0,$   (4)

where 0 is the lower value of the new interval, 255 is the upper value of the new interval, and $\Delta_{min}$ and $\Delta_{max}$ are the limit values determined from the geometry of the room. After rounding to whole numbers and conversion to grayscale, it is possible to proceed to thresholding. The threshold function compares each pixel value with a threshold value; if the pixel value is higher than the threshold, it is replaced. This replacement creates a new image consisting of only two colours. The OpenCV library provides several threshold types; after testing various threshold methods, the most suitable choice proved to be Otsu’s binarization. Otsu’s binarization works with bimodal images, i.e., images whose histogram has two peaks. According to [53], the threshold value is set as the mean value between these two peaks.
As mentioned in [54], Otsu’s algorithm tries to find a threshold value $t$ which minimizes the weighted within-class variance given by the equation:

$\sigma_w^2(t) = q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t),$

where

$q_1(t) = \sum_{i=1}^{t} P(i), \qquad q_2(t) = \sum_{i=t+1}^{I} P(i)$

are the probabilities of the two classes separated by the threshold $t$, $\sigma_1^2(t)$ and $\sigma_2^2(t)$ are the variances,

$\mu_1(t) = \sum_{i=1}^{t} \dfrac{i\,P(i)}{q_1(t)}, \qquad \mu_2(t) = \sum_{i=t+1}^{I} \dfrac{i\,P(i)}{q_2(t)}$

are the means, $P(i)$ is the probability of intensity $i$, $i$ indexes the pixel intensities, and $I$ is the maximum pixel value (255).
The OpenCV library also includes another essential function: findContours. A contour is a curve connecting all the continuous points with the same colour. This function requires three parameters: a grayscale (binary) image, a retrieval mode defining the hierarchy between contours, and an approximation method. The contour hierarchy is the relationship between the parent (external) and child (internal) contours. In this work, we selected the retrieval mode RETR_EXTERNAL for the hierarchy; this setting only takes parent contours into account, as it is necessary to identify an object as a whole. The last parameter determines whether the contour should remember all the boundary points. This option was not needed, so the approximation method CHAIN_APPROX_SIMPLE, which only remembers the two endpoints, was selected. Figure 7 shows the highlighting of an object that entered the workplace after the threshold application, and Figure 8 shows the same identified object on the depth recording of the robotic workplace.
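Assuming the scaled change image has already been rounded and converted to an 8-bit grayscale image, the thresholding and contour extraction described above can be expressed with OpenCV (4.x return signature) roughly as follows.

```python
import cv2

def find_object_contours(delta_gray):
    """Binarize the scaled depth-change image with Otsu's method and return
    the external contours together with their bounding boxes."""
    # delta_gray: 8-bit single-channel image (0-255) after scaling and rounding
    _, binary = cv2.threshold(delta_gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary,
                                   cv2.RETR_EXTERNAL,        # parent contours only
                                   cv2.CHAIN_APPROX_SIMPLE)  # keep only the endpoints
    boxes = [cv2.boundingRect(c) for c in contours]          # (x, y, w, h) per object
    return contours, boxes
```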
Once an object is identified, it is possible to proceed to the computation of each pixel’s distance to the robotic workplace and establish the shortest distance. It is necessary to find the shortest distance for each object as a whole; each created contour represents one object. The algorithm cyclically scans all the values of the pixels in the contour and calculates the distance $D$ of the given pixel to the center of the robotic workplace using the formula:

$D = \sqrt{(X + X_{CR})^2 + (Y + Y_{CR})^2 + (Z + Z_{CR})^2},$

where $X$, $Y$, and $Z$ represent the displacement for each pixel read from $\Delta$, and $X_{CR}$, $Y_{CR}$, and $Z_{CR}$ are the distances between the center of the camera and the center of the robotic workplace. The camera is 2.5 m above the ground and 0.9 m behind the robotic workplace and, in addition, is rotated at an angle of 45°. The camera axis $X$ coincides with the robot axis $X$, and thus $X_{CR} = 0$. The values $Y_{CR}$ and $Z_{CR}$ are subsequently computed using Formulae (11) and (12), which stem from Figure 9. Figure 9 shows the relative positions of the robot and camera coordinate systems.
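The shortest object distance can then be computed per contour as sketched below; the function and argument names are illustrative, and the offsets follow the $X_{CR}$, $Y_{CR}$, $Z_{CR}$ notation used above.

```python
import numpy as np

def min_distance_to_robot(delta_xyz, contour_mask, x_cr, y_cr, z_cr):
    """Shortest distance D between a detected object and the robot centre.

    delta_xyz:    (H, W, 3) array of per-pixel X, Y, Z displacements.
    contour_mask: boolean (H, W) mask of the pixels belonging to one contour.
    x_cr, y_cr, z_cr: offsets between the camera centre and the robot centre.
    """
    pts = delta_xyz[contour_mask]                  # (N, 3) displacements inside the contour
    d = np.sqrt((pts[:, 0] + x_cr) ** 2 +
                (pts[:, 1] + y_cr) ** 2 +
                (pts[:, 2] + z_cr) ** 2)
    return float(d.min())
```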
3.3. Human Recognition
After successfully processing the depth image and identifying the objects moving in the robotic workplace, it is vital to establish whether a given object is a human. Human recognition is not suitable to apply directly to the depth recording. As a stereo depth camera is used, left and right RGB streams are available, and the data from only one of the streams are sufficient; the data contain the RGB component of each pixel. The actual identification of humans is performed using the Histogram of Oriented Gradients (HOG). HOG is a feature descriptor that focuses on the structure or shape of an object. Unlike edge features, which can only establish whether a given pixel is an edge, HOG is capable of establishing both an edge and its direction. This is achieved by dividing the image into small regions; the gradient and orientation are computed independently for each region, and finally a histogram is created for each region separately. More information about the functioning of HOG and its application to identifying humans can be found in [55].
The HOG algorithm is part of the OpenCV library. When initializing HOG, it is possible to select a preprepared human detector that has already been trained to identify humans. The function used for detecting humans, detectMultiScale, has four input parameters: the image, winStride, padding, and scale. The winStride parameter defines the step size of the detection window in the horizontal and vertical directions; smaller values lead to a more accurate result but also extend the computational time. The padding parameter indicates the number of pixels added in the x and y directions around the detection window. The last parameter, scale, determines the factor by which the window is resized in each layer of the image pyramid; again, the smaller the parameter value, the more accurate the result, but the longer the computation takes. The function outputs two attributes. The first attribute contains information about all the bounding boxes, in the format of the coordinates of the lower left point and the width and height of each box. The second attribute is the weights, containing the confidence scores.
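The detector is initialized and applied roughly as follows; the winStride, padding, and scale values shown are placeholders, not the experimentally selected ones.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame_bgr):
    """Run the pretrained HOG people detector on one RGB frame.

    Returns bounding boxes as (x, y, w, h) and the corresponding confidence weights.
    """
    boxes, weights = hog.detectMultiScale(frame_bgr,
                                          winStride=(8, 8),   # placeholder step size
                                          padding=(8, 8),     # placeholder padding
                                          scale=1.05)         # placeholder pyramid factor
    return boxes, weights
```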
The human recognition subroutine is part of the algorithm for identifying objects from the depth recording. Every object that enters the workplace is identified through the change in depth and marked with a contour. The contour is defined using the coordinates of its lower left point and its width and height. The identification of humans alone is not sufficient to verify whether there really is a person in the workplace; a better option is to check whether the pixels representing the depth change and those representing the human detection overlap. Overlap can be verified using the conditions $x_{1,min} > x_{2,max}$, $x_{2,min} > x_{1,max}$, $y_{1,min} > y_{2,max}$, and $y_{2,min} > y_{1,max}$, where $(x_{1,min}, y_{1,min})$ and $(x_{1,max}, y_{1,max})$ are the coordinates of the lower left and upper right points of the box indicating a potential person in the image. Similarly, $(x_{2,min}, y_{2,min})$ and $(x_{2,max}, y_{2,max})$ are the coordinates of the contour bounding the depth change. If none of these conditions applies, there has been an overlap. Figure 10 and Figure 11 show the identification of an object from the depth recording and the verification of a person’s presence in the same recording.
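A straightforward implementation of this overlap test, assuming the (x, y, w, h) box format returned by OpenCV, might look like this.

```python
def boxes_overlap(person_box, contour_box):
    """Return True if the HOG person box and the depth-change contour box overlap.

    Both boxes are given as (x, y, w, h). The boxes do NOT overlap if one lies
    completely to one side of the other; overlap holds when none of those
    separating conditions applies.
    """
    x1, y1, w1, h1 = person_box
    x2, y2, w2, h2 = contour_box
    separated = (x1 > x2 + w2 or   # person box entirely to one side of the contour
                 x2 > x1 + w1 or   # contour entirely to one side of the person box
                 y1 > y2 + h2 or   # person box entirely past the contour vertically
                 y2 > y1 + h1)     # contour entirely past the person box vertically
    return not separated
```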
3.4. Supply Trolley Recognition Using RTLS
Autonomous supply trolleys are nothing new in the industry. There are usually several such trolleys in a factory, and they all generally look alike. Although a camera can identify them in space, it cannot determine which of the trolleys it sees. The trolleys need to be equipped with unique ID numbers. In such cases, it is possible to apply a location system based on the UWB principle.
The RTLS sends information about each tag’s ID and location via the embedded platform to the computer. The RTLS works with its own coordinate system, so the coordinate systems need to be unified. A procedure was chosen whereby the camera coordinate system is converted to the RTLS coordinate system. This conversion is performed only for the point with the smallest value of $D$ for each identified object, since that point represents the shortest distance to the robotic workplace. The location data are read from $\Delta$. Using basic trigonometric functions and the robot’s known location in the RTLS coordinate system, the coordinates of the nearest point are subsequently converted to the RTLS coordinate system. It is crucial to consider several factors during the actual comparison of whether the location from the RTLS is the same as the location from the camera, as shown in Figure 12. Both systems work with a certain degree of inaccuracy. The inaccuracy of the camera increases with distance, while the RTLS inaccuracy depends on several factors: it increases in a working environment where wave reflections occur, and another possible factor is the location of the tag on the supply trolley. As the RTLS is not used in this application for precise positioning but to identify a supply trolley, accuracy is less critical.
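A simplified matching step could be written as follows. The reduction of the camera-to-RTLS transformation to a planar rotation and translation, as well as the tolerance value, are assumptions made purely for illustration.

```python
import math

def camera_to_rtls(x_cam, y_cam, robot_rtls_xy, yaw_deg):
    """Convert a camera-frame ground-plane point to the RTLS frame.

    robot_rtls_xy: known (x, y) of the robot in the RTLS coordinate system.
    yaw_deg:       rotation between the camera and RTLS frames (assumed planar).
    """
    yaw = math.radians(yaw_deg)
    x = robot_rtls_xy[0] + x_cam * math.cos(yaw) - y_cam * math.sin(yaw)
    y = robot_rtls_xy[1] + x_cam * math.sin(yaw) + y_cam * math.cos(yaw)
    return x, y

def tag_matches_object(tag_xy, object_xy, tolerance_m=0.5):
    """True if the RTLS tag position and the camera-detected object agree within
    a tolerance that absorbs the inaccuracy of both systems (value is illustrative)."""
    return math.dist(tag_xy, object_xy) <= tolerance_m
```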
4. Results and Discussion
During the experiments, the workplace’s primary safety elements—including the safety laser scanner—were switched off. The robotic manipulator worked in automatic mode but at a reduced maximum speed. An authorized person supervised the workplace for the whole period to avoid damage to health or property. Due to a lack of space, the workplace was split into three zones: danger, warning, and safe. However, the safety algorithm is not limited by the number of zones. The dynamic safety system’s functionality using the depth camera and RTLS was verified for two simulation scenarios.
The first scenario was the regular operation of the robotic workplace, including its supply. In this option, people were not allowed to enter the robotic workplace, and hence the part of the safety algorithm used to identify humans was omitted, as shown in Algorithm 1. Trolleys that do not have a tag can enter the workplace and are not detected by the RTLS, so using only the RTLS to determine the objects in the workplace is not a satisfactory safety source. The objective of this scenario was to verify the reduction in the computational time needed for the safety algorithm. The processing and evaluation of the data from the depth camera were performed on a computer. The computation time needed for the safety algorithm differs depending on the number of objects for which it is necessary to calculate the smallest distance to the robotic workplace. In this scenario, the measurement was carried out separately for two cases. In the first case, supply trolleys were carrying out their tasks at the workplace; in the second, no supply trolleys were at the workplace and the only thing moving was the robotic manipulator. The movement of the robotic manipulator itself is evaluated as movement around the workplace; therefore, the area in which the manipulator performs its tasks is ignored, resulting in a further reduction in the computation time. The resolution of the depth camera also influences the speed of the safety algorithm.
Figure 13 and Figure 14 show the time it took for the safety algorithm to process a single image for different depth camera settings. It is also possible to see the percentage deviation between the actual measurement and the reference recording.
Another objective of this scenario was to grant an exception for a supply trolley whose movement trajectory was known. The supply trolley had a unique ID and did not supply the given workplace; it only passed through the safety zones and did not affect the operation or safety of the robotic workplace. According to the workplace’s safety elements, the manipulator speed should be adapted as soon as the trolley enters a safety zone. The granting of the exception meant that even if the trolley passed through the safety zone, the robotic workplace’s speed did not change; of course, this applies only if no other object enters the zone. A trolley supplying the given workplace directly interferes in the production process and does not have an exception at this workplace, so in this case the speed changed. Many factors are taken into account when evaluating exceptions, including the size of the material supplied, the distance between the supply route and the robotic system, and even the possibility of collision. Hence, in practice a safety technician must be responsible for granting exceptions. The presented safety system results in an increase in the rate of work without the need for downtime.
The second simulation scenario anticipates the presence of a human at the workplace in addition to supply trolleys. The primary part of the safety algorithm is the same for both scenarios. In this case, there is also an incorporated subroutine for recognizing humans, as represented by Algorithm 2. The speed and quality of human detection are directly dependent on the input parameters winStride and scale, whose values were selected experimentally. When setting the parameter values, the emphasis was on ensuring that the camera identified a person in every accessible part of the robotic workplace. On the other hand, these parameter values resulted in an increase in the safety algorithm’s computational time. A longer processing time for each image increases the total response time, which is part of Equation (3), and thereby increases the size of the zone itself. The security algorithm was tested using two cases, as in the first simulation scenario. In the first case, there were no trolleys or people in the workplace and the robotic manipulator was the only thing that moved. In the second case, supply trolleys moved around the workplace and people randomly passed through the workplace. The experiment was performed with different camera settings. The results in Figure 15 and Figure 16 show the computing time per cycle and, in addition, the deviation parameter between the reference recording and the actual measurement.
Algorithm 1 Security algorithm without human recognition
Input: Stereo stream from the depth camera and tag positions from the RTLS system. Output: Robotic system speed (%).
1: Initialize the depth camera parameters (resolution) and the RTLS system (OPC UA).
2: Set the known dimensions between the depth camera (C), the robotic system (R), and the origin of the RTLS coordinate system.
3: Save the reference depth image into $PC_{ref}$.
4: while the robotic workplace is switched on do
5:  Save the actual depth image into $PC_{act}$.
6:  Determine $\Delta = PC_{ref} - PC_{act}$.
7:  Convert $\Delta$ to grayscale.
8:  Eliminate initial measurement errors.
9:  Scale the values using Equation (4).
10:  Convert $\Delta$ into a binary image using the threshold function.
11:  Perform a contour search with the findContours function.
12:  Calculate the distance $D$ between each value in the contour and the robotic system.
13:  Convert the coordinates to RTLS system coordinates.
14:  Find all contours where the RTLS system coordinates match the contour coordinates and the RTLS tag ID is set to have an exception.
15:  if no other contours are found then
16:   Set the full robotic system speed.
17:  else
18:   Find $D_{min}$ of the remaining contours.
19:   if $D_{min} \le r_{danger}$ then
20:    Set the speed for the danger zone.
21:   else if $r_{danger} < D_{min} \le r_{warning}$ then
22:    Set the speed for the warning zone.
23:   else
24:    Set the speed for the safe zone.
25:   end if
26:  end if
27: end while
Algorithm 2 Security algorithm with human recognition
Input: Stereo stream from the depth camera and tag positions from the RTLS system. Output: Robotic system speed (%).
1: Initialize the depth camera parameters (resolution) and the RTLS system (OPC UA).
2: Set the known dimensions between the depth camera (C), the robotic system (R), and the origin of the RTLS coordinate system.
3: Save the reference depth image into $PC_{ref}$.
4: while the robotic workplace is switched on do
5:  Save the actual depth image into $PC_{act}$.
6:  Save the actual RGB image.
7:  Search for the coordinates of a potential human in the picture using the detectMultiScale function.
8:  Assign the coordinates of the person box to $(x_{1,min}, y_{1,min})$ and $(x_{1,max}, y_{1,max})$.
9:  Determine $\Delta = PC_{ref} - PC_{act}$.
10:  Convert $\Delta$ to grayscale.
11:  Eliminate initial measurement errors.
12:  Scale the values using Equation (4).
13:  Convert $\Delta$ into a binary image using the threshold function.
14:  Perform a contour search with the findContours function.
15:  Assign the contour coordinates to $(x_{2,min}, y_{2,min})$ and $(x_{2,max}, y_{2,max})$.
16:  Verify whether the contour overlaps the box of the identified person.
17:  if the contour corresponds to a person then
18:   Mark the contour as a person.
19:  end if
20:  Calculate the distance $D$ between each value in the contour and the robotic system, for every contour.
21:  Find the $D_{min}$ of each contour.
22:  Convert the coordinates to RTLS system coordinates.
23:  Find all contours where the RTLS system coordinates match the contour coordinates and the RTLS tag ID is set to have an exception.
24:  if no other contours are found and no contour is marked as a person then
25:   Set the full robotic system speed.
26:  else
27:   Find $D_{min}$ of the remaining contours.
28:   if $D_{min} \le r_{danger}$ then
29:    Set the speed for the danger zone.
30:   else if $r_{danger} < D_{min} \le r_{warning}$ then
31:    Set the speed for the warning zone.
32:   else
33:    Set the speed for the safe zone.
34:   end if
35:  end if
36: end while
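The zone decision at the end of Algorithms 1 and 2 reduces to comparing the minimum object-to-robot distance against the zone radii. A sketch with illustrative speed overrides is given below; the actual override values are set by the safety technician and are not taken from the experiments.

```python
def speed_from_distance(d_min, r_danger, r_warning,
                        v_safe=100, v_warning=50, v_danger=0):
    """Map the minimum object-to-robot distance to a robot speed override (%).

    r_danger and r_warning are the zone radii; the override values are
    illustrative placeholders, not the ones used in the experiments.
    """
    if d_min <= r_danger:
        return v_danger      # danger zone: stop the robotic system
    if d_min <= r_warning:
        return v_warning     # warning zone: reduced speed
    return v_safe            # safe zone: full speed
```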
The granting of an exception to transit or enter the safety zone worked on the same principle as in the first simulation scenario. In this case, however, it was ensured that the person’s location took priority during the granting of the exception. This change means that an exception cannot be granted for a supply trolley if a person enters a safety zone with a trolley in it. Consider safety zone 1 as the nearest to the robotic system and safety zone 3 as the farthest from it. If a supply trolley with an exception is in safety zone 2 and a person is in safety zone 3, then the trolley will be granted an exception. We have also considered that the person will have a tag on them, and this tag has an exception. In this case, the algorithm will cancel the exception and evaluate the situation in the same way as when a person without a tag enters the workplace. This simulation scenario allocates a higher priority to a person than to other objects.
Both scenarios were tested for the presence of multiple people or trolleys at the same time. The algorithm can identify several objects simultaneously. The more objects there are in the workplace, the more distances between the object and the robotic workplace must be calculated by the algorithm, which is reflected in the speed of the algorithm. However, the safety algorithm is not intended for human–trolley interaction.
A switch to a higher camera resolution causes an increase in measurement or background noise, which can be seen in the graphs in Figure 13, Figure 14, Figure 15 and Figure 16. The testing workspace does not have adequate protection against adverse lighting effects. The depth camera’s principle is based on passive stereo, so sunlight strongly affects the lighting conditions. This impact is shown in the second graph in Figure 15, where the brightness changes continuously during the measurement as external lighting conditions interfere with the robotic workplace. Meanwhile, the graph in Figure 17 shows the effect of outdoor lighting changes during the day when comparing the reference image with the actual image. The reference record was made in the morning when the sun was shining into the room. A subsequent comparison of the current record with the reference shows that the sun’s rays stopped entering the room at around 09:30. Between 09:30 and 14:30, the lighting conditions were relatively constant, and sunset began around 14:30. To solve this problem, it is necessary to add a reference image refresh function that is triggered after some time when no moving objects are detected; such a refresh could solve the issue of significant light differences over time in workplaces. A more extended algorithm computation time means an increase in the safety system’s total reaction time and thus a necessary enlargement of the safety zones.
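The proposed reference refresh could be implemented along the following lines; the refresh period and the "no moving objects detected" criterion are assumptions.

```python
import time

REFRESH_PERIOD_S = 600  # assumed interval between reference updates (10 min)

def maybe_refresh_reference(pc_ref, pc_act, contours, last_refresh):
    """Replace the reference point cloud when no moving objects are detected
    and the refresh period has elapsed, compensating for slow changes in the
    outdoor lighting during the day."""
    now = time.monotonic()
    if not contours and now - last_refresh > REFRESH_PERIOD_S:
        return pc_act.copy(), now   # new reference and refresh timestamp
    return pc_ref, last_refresh
```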
The evolution of the computational time per cycle with respect to image quality is shown in Figure 18. Based on the approximation of the data obtained from the measurements, it is possible to predict the average calculation time for the depth camera’s 2.2K resolution. The prediction is shown in Figure 18 and should be around 0.66 s without HOG and around 1.89 s with HOG. A higher camera resolution does not allow for the quick detection of a person’s presence within the monitored workspace. However, this deficiency can be compensated for by increasing the safety zone.
Human recognition around the robotic workplace was verified for various scenarios. Figure 19a–c aims to identify a person in different lighting conditions, different positions in the room, and different clothing colours. Meanwhile, Figure 19d–f focuses on identifying a person who is not fully visible in the camera stream. We found that the HOG algorithm can identify moving people just as successfully as static ones. Identification problems occurred when the person was directly under the camera, regardless of whether they were moving or not. The reason for this is that, when viewed from above, human proportions are lost from the camera’s view.
The speed and accuracy of the RTLS have a significant impact on the correct identification of the supply trolley. Based on the advice in [56], three measurement points were selected for testing the accuracy. These points reflect the most common cases that can occur. The positions of the measurement points are shown in Figure 4. During measurement, the tag was placed on a static supply trolley. Point A in Figure 4 represents the ideal case, where no object other than the trolley causes interference in the proximity of the tag—a clear line of sight. The most common variant in practice is that there are interfering objects, such as metals, around the tag—measurement point B. The last point, C, represents the case where the tag is very close to one of the anchors.
Different UWB communication settings result in different accuracies and total measurement times. Key parameters are the bitrate and preamble length. Parameters such as channel and pulse repetition frequency do not significantly affect the accuracy and measurement speed.
Table 3 compares the differences between the various UWB settings. The bitrate of 110 kbit/s and preamble length of 2048 symbols represent slow communication with a large amount of transmitted data; the opposite values and behavior apply to the bitrate of 6810 kbit/s and the preamble length of 64 symbols. The last option lies between these two settings. The third column of Table 3 shows the measurement accuracy at the different settings and locations. The smaller the dispersion, the more accurate the measurement, because the points are closer to each other. The most suitable parameters are a bitrate of 850 kbit/s and a preamble length of 64 symbols, which were also used during the testing of the security algorithms. The update rate did not fall below 35 Hz for these values. As the supply trolley has a maximum speed of 0.65 m/s, the selected UWB parameters are sufficient for the given application. The trajectory of the trolley’s movement during the measurement did not affect its identification by the RTLS.
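For reference, UWB parameters of this kind can be applied to a Pozyx device with pypozyx roughly as shown below; the bitrate and preamble-length encodings and the remaining parameter values are assumptions that should be checked against the library documentation.

```python
from pypozyx import PozyxSerial, get_first_pozyx_serial_port, UWBSettings

pozyx = PozyxSerial(get_first_pozyx_serial_port())

# 850 kbit/s bitrate with a 64-symbol preamble, as selected for the experiments
# (channel, PRF, gain, and the code values below are assumptions, not measured settings).
settings = UWBSettings(channel=5,
                       bitrate=1,      # assumed encoding: 0 = 110, 1 = 850, 2 = 6810 kbit/s
                       prf=2,          # pulse repetition frequency code
                       plen=0x04,      # assumed preamble length code for 64 symbols
                       gain_db=11.5)
pozyx.setUWBSettings(settings)
```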
5. Conclusions and Future Work
The contribution and novelty of this paper can be seen in the practical feasibility of the proposed approach and its improvement on the state-of-the-art technologies mentioned in the introduction. We have thus aimed to improve and expand these concepts and create a different view of the safety of collaborative robotic workspaces. This concept can revamp existing security solutions through the possibility of granting exceptions for moving objects entering danger zones. The downtimes created by incorrect intrusion detection will be reduced, and the RTLS technology can also be used to monitor the flow of material in the process and optimize the trajectories of autonomous trolleys; the implementation of RTLS thus has multiple benefits for companies. The whole system can be modified according to a company’s needs. Another advantage is its compatibility with older robotic controllers. Its disadvantages are its lack of safety certification, the need to deploy RTLS (if not already available), and the longer per-cycle computing time at higher camera resolutions.
The results of the experiments show how the camera resolution and number of objects in the workplace impact the image processing speed. The results show that lowering the resolution of the depth camera shortens the computation time. Keeping other things equal at the workplace, the computation time increased 9.60 times without HOG and 9.64 times with HOG when the depth camera resolution increased from 672 × 376 to 1920 × 1080. The noise rose from 3% to 10% for the mentioned resolutions. A more extended algorithm computation time means an increase in the safety system’s total reaction time and thus a necessary enlargement of the safety zones.
For the selected workplace and using the lowest camera resolution of 672 × 376, the radius of the danger zone is 3200 mm, according to Equation (3). For a 1920 × 1080 resolution, the danger zone’s radius is increased to 5500 mm due to the longer processing time of the depth record.
In summary, we proposed an interesting and viable HRC concept that applies alternative technologies to improve the safety and efficiency of robotic workplaces. Using a depth camera, safety zones were created around the robotic workplace while objects moving around the workplace were concurrently identified. The depth camera was also used to verify whether an identified object at the workplace is a person by using a Histogram of Oriented Gradients. The unequivocal identification of supply trolleys was achieved with the help of a real-time location system based on ultra-wideband technology. The robotic manipulator’s speed of movement was regulated based on the objects identified and their distance from the robotic workplace. The benefit of this method lies in its ability to grant exceptions for supply trolleys that do not disrupt or jeopardize the operation of the robotic workplace, thus avoiding undesirable downtime. Based on our practical experience, we have not seen the interconnection of RTLS technology and depth cameras used in industrial applications for the purpose of increasing the safety of robotic workplaces, which we consider our main contribution.
Future work could focus on transferring the demanding image processing from a Central Processing Unit (CPU) to a Graphics Processing Unit (GPU), which could significantly reduce the computation time and provide an opportunity to increase the resolution of the processed image. One interesting alternative would be to use a CPU and GPU for simultaneous computation. The depth camera API offers the option to save a captured point cloud in the GPU memory. This would enable us to utilize the GPU’s computational power for some of the computational work, and the result would be sent to the CPU for further evaluation. It would be worth considering using the NVIDIA Jetson platform for image processing and evaluation or using cloud-based computing options for decentralized computations.