In this section, we show how to estimate the target’s 3D position with motion stereo vision. Firstly, the proposed flow of the robot’s movement and calculation is described. Following this, a practical estimation is conducted to verify the proposed approach.
3.1. The Proposed Flow of Robot Movement and Calculation
First, the static mapping between the 3D real-world coordinates and the 2D image plane is shown in Figure 5, where f is the focal length and a pinhole camera model is assumed. There are three coordinate systems in Figure 5, each with its own coordinate origin.
The projection between the target object Q = (Q_X, Q_Y, Q_Z) in the real world and the point P = (P_X, P_Y) in the image can be derived by simple triangulation [14], as shown in Equation (1). It is worth mentioning that the point P has the same coordinate values in the X and Y directions for both the real-world coordinates and the image-plane coordinates; that is, P_X and P_Y are identical in the two systems.

$$P_X = f\,\frac{Q_X}{Q_Z}, \qquad P_Y = f\,\frac{Q_Y}{Q_Z} \tag{1}$$
Equation (1) indicates the importance of Q_Z, the distance in the Z direction. If we can obtain the coordinates of the point P in the image plane, we can recover all of the coordinates of the point Q once Q_Z is known. However, as is well known, it is hard to obtain the distance to an object with just a single camera. One more transformation is also needed between the image-plane coordinates and the pixel plane due to the camera pixel specification, as shown in Equation (2):

$$P_X^{pixel} = \alpha\,P_X, \qquad P_Y^{pixel} = \alpha\,P_Y \tag{2}$$

where α is defined as the number of pixels in 1 cm, and P_X^{pixel} and P_Y^{pixel} denote the coordinates of P on the pixel plane.
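As a minimal illustration of Equations (1) and (2), the following Python sketch projects a real-world point onto the image plane and converts it to pixel coordinates; the function names and the sample values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of Equations (1) and (2): pinhole projection and
# image-plane (cm) to pixel-plane conversion. Function names and the
# sample values below are illustrative assumptions, not from the paper.

def project_to_image_plane(qx, qy, qz, f):
    """Equation (1): project the real-world point (qx, qy, qz) onto the image plane (cm)."""
    px = f * qx / qz
    py = f * qy / qz
    return px, py

def image_plane_to_pixels(px, py, alpha):
    """Equation (2): convert image-plane coordinates (cm) to pixel coordinates."""
    return alpha * px, alpha * py

if __name__ == "__main__":
    f = 24.03       # focal length in cm (value estimated later in Section 3.2)
    alpha = 48.7    # pixels per cm (value given later in Section 3.2)
    px, py = project_to_image_plane(qx=36.0, qy=50.0, qz=100.0, f=f)
    print(image_plane_to_pixels(px, py, alpha))
```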
To estimate the distance of a static object with a single camera, we instead move the robot so that the distance information appears on the image plane, that is, on the visual interface.
Figure 6 shows the proposed flow of robot movement used to obtain motion stereo vision for the target's 3D position. The user's touchscreen is designed to display a vertical line in the middle of the screen. The proposed steps are outlined as follows:
Step 1: The user has to control the robot to face the target, that is, the center of the target should be located on the vertical line on the touchscreen. Then the user must click on the target and the system will check and record this point. Once the check has passed, the process will be successfully activated.
Step 2: The robot automatically rotates θ degrees to the left, and the target moves to the right on the tablet screen.
Step 3: The robot automatically advances forward by distance d, and the target moves farther to the right on the tablet screen. The user clicks on the target on the screen again to end the process, and the target's 3D coordinates are then estimated.
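To make the interaction concrete, the following Python skeleton sketches how the three steps could be sequenced on the control side; the robot and touchscreen interfaces (rotate_left, advance, wait_for_click, is_on_center_line) are hypothetical placeholders, not the actual API of the proposed system.

```python
# Hypothetical sketch of the three-step flow; rotate_left(), advance(),
# wait_for_click() and is_on_center_line() stand in for whatever
# robot/touchscreen API is actually used.

def run_estimation(robot, screen, theta_deg, d_cm):
    # Step 1: user centers the target on the vertical line and clicks it.
    first_click = screen.wait_for_click()      # pixel coordinates (x, y)
    if not screen.is_on_center_line(first_click):
        raise ValueError("Target must be centered before activation")

    # Step 2: rotate theta degrees to the left (projection center kept fixed).
    robot.rotate_left(theta_deg)

    # Step 3: advance by d cm, then wait for the user's second click.
    robot.advance(d_cm)
    second_click = screen.wait_for_click()

    # The two clicks, theta and d feed the triangulation in Equations (3)-(10).
    return first_click, second_click
```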
Figure 7 clearly depicts the robot movement viewed from above. The starting point O is the projection center of the camera in the mobile robot. In Step 2, the robot rotates θ degrees to the left, and the target points Q1 and Q2 project to the same position with respect to coordinate X. Only when the robot continues to advance towards the target in Step 3 do the target points Q1 and Q2 separate on the image plane in the X direction. Let us take target Q1 as an example, where w is the distance from its projection to the middle of the new image plane. The value of w contains the distance information of the target in the Z direction.
Firstly, the distance from the starting point O to Q1 can be calculated by:

$$\overline{OQ_1} = \frac{d\,\sin\beta}{\sin\varphi} \tag{3}$$

The distance $\overline{O_{NEW}Q_1}$ can be calculated by:

$$\overline{O_{NEW}Q_1} = \frac{d\,\sin\theta}{\sin\varphi} \tag{4}$$

Moreover, the angle φ can be calculated by:

$$\varphi = \beta - \theta \tag{5}$$

Finally, we can finish the calculation by obtaining the angle β through the following equation:

$$\beta = \tan^{-1}\!\left(\frac{w}{f}\right) \tag{6}$$
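For completeness, the triangle relation behind Equations (3)–(6) can be made explicit; the short law-of-sines derivation below follows directly from the top-view geometry of Figure 7 described above.

```latex
% Triangle O--O_NEW--Q1 in the top view of Figure 7:
% the angle at O is theta, the angle at O_NEW is 180 degrees - beta,
% so the angle at Q1 is phi = beta - theta, and the law of sines gives
\begin{align*}
\frac{\overline{OQ_1}}{\sin(180^\circ-\beta)}
   = \frac{\overline{O_{NEW}Q_1}}{\sin\theta}
   = \frac{d}{\sin\varphi}
\;\;\Longrightarrow\;\;
\overline{OQ_1} = \frac{d\,\sin\beta}{\sin\varphi},\quad
\overline{O_{NEW}Q_1} = \frac{d\,\sin\theta}{\sin\varphi},
\end{align*}
% while beta itself is read from the new image plane through tan(beta) = w/f.
```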
Based on the above derivation, it is now easier to obtain the 3D position of target Q1 with respect to origin O or O_NEW. As shown in Figure 7, P_O and P_N are the projected points of Q1 in the original pixel plane and the new pixel plane, respectively. The respective coordinates (x_{P_O}, y_{P_O}), where x_{P_O} = 0 because the target lies on the vertical center line, and (x_{P_N}, y_{P_N}) are recorded during the user's two clicks in Step 1 and Step 3. Then, the real-world coordinate of Q1 with respect to origin O will be:

$$Q_1\big|_{O} = \left(0,\; Q_Y,\; \overline{OQ_1}\right) \tag{7}$$

where the Y coordinate value is obtained by Equations (1) and (2), as shown below:

$$Q_Y = \frac{\overline{OQ_1}\; y_{P_O}}{\alpha f} \tag{8}$$
We can also obtain the real-world coordinate of Q1 with respect to origin O_NEW. First, the Z direction is changed to the Z_NEW direction, as shown in Figure 7. The distance of Q1 in the Z_NEW direction can be calculated by:

$$Q_{Z_{NEW}} = \overline{O_{NEW}Q_1}\,\cos\beta \tag{9}$$

Then, by applying Equations (1) and (2), we can get the real-world coordinate of Q1 with respect to origin O_NEW as in the following equation:

$$Q_1\big|_{O_{NEW}} = \left(\frac{Q_{Z_{NEW}}\; x_{P_N}}{\alpha f},\; \frac{Q_{Z_{NEW}}\; y_{P_N}}{\alpha f},\; Q_{Z_{NEW}}\right) \tag{10}$$
Finally, let us note that the Y coordinate values of Q1 with respect to O in Equation (7) and with respect to O_NEW in Equation (10), which both represent the height of the target, should be the same. So we have the following equation by Equations (7) and (10):

$$\frac{\overline{OQ_1}\; y_{P_O}}{\alpha f} = \frac{Q_{Z_{NEW}}\; y_{P_N}}{\alpha f}$$
3.2. Measurement Verification
To verify the proposed prototype method and the derived calculation, a tablet with a 9.7 inch display and a 960 × 720 pixel camera was used as the visual control interface.
Figure 8 shows the tablet on top of a metal box which was sandwiched between two bookshelves. Using a long ruler and protractor, we moved the box to simulate the movement and rotation of the mobile robot.
First, the parameter α is about 48.7 pixel/cm according to the tablet screen size and the pixel specification (960 × 720 pixels) of the camera; mapping the 1200-pixel image diagonal onto the roughly 24.6 cm (9.7 inch) screen diagonal gives approximately this value. Furthermore, the measurement of the focal length f is necessary for a calibrated camera system. In our measurement, the floor of a room was marked with stickers at different distances, and a movable vertical pole was labelled with different heights. Without loss of generality, we focus on the relation P_X = f·Q_X/Q_Z in Equation (1) to estimate the focal length f, so f can be calculated by f = P_X·Q_Z/Q_X. In Figure 9, the vertical pole was placed at distances ranging from near to far away, with Q_Z ranging from 100 cm to 250 cm, while Q_X was kept at 36 cm. The distance between the white double arrows in each picture indicates P_X^{pixel} (pixel) or P_X (cm). All of the data for the four different cases is shown in Table 1. The average f was calculated as 24.03, which is adopted in the subsequent calculations.
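As a sketch of this calibration step, the snippet below averages f = P_X·Q_Z/Q_X over several measurements; the pixel readings used here are placeholders for illustration only, since the actual readings are listed in Table 1.

```python
# Focal-length estimation sketch: f = P_X * Q_Z / Q_X (Equation (1) rearranged).
# The pixel readings below are illustrative placeholders, not the values in Table 1.

ALPHA = 48.7  # pixels per cm on the tablet screen
Q_X = 36.0    # cm, horizontal offset of the pole (kept fixed)

# (Q_Z in cm, measured P_X in pixels) -- placeholder readings
measurements = [(100.0, 420), (150.0, 280), (200.0, 210), (250.0, 168)]

estimates = []
for q_z, px_pixels in measurements:
    p_x_cm = px_pixels / ALPHA            # convert pixel reading to cm (Equation (2))
    estimates.append(p_x_cm * q_z / Q_X)  # Equation (1) solved for f

f = sum(estimates) / len(estimates)
print(f"estimated focal length: {f:.2f} cm")
```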
Next, we will verify the proposed flow for estimating the 3D position of a target object. The angle θ is set to 12° and d is 20 cm. For simplicity, we calculate $\overline{OQ_1}$ in Equation (3) for the following verification. For the first step in the proposed process, we put the vertical pole in the middle of the screen. Then, in Step 2, the robot was rotated 12° to the left and the vertical pole moved to the right on the screen. Finally, as the robot advanced 20 cm in Step 3, the vertical pole moved to the right again. Figure 10 shows two extreme cases for the target distance measurement after Step 3 was finished, where $\overline{OQ_1}$ equals 50 cm and 250 cm, which correspond to the nearest and farthest cases, respectively. The distance between the white double arrows in each picture indicates x_{P_N} (pixel) or w (cm).
Using the derived equations from the last subsection, the step-by-step procedure of the $\overline{OQ_1}$ calculation for the target's 3D estimation algorithm can be summarized as follows:
Given constant values: θ, d, α, f.
Get the coordinates (x_{P_O}, y_{P_O}) and (x_{P_N}, y_{P_N}) at the start (Step 1) and end (Step 3) of the proposed approach. The variable w equals x_{P_N}/α.
Calculate β by Equation (6).
Calculate φ by Equation (5).
Calculate $\overline{O_{NEW}Q_1}$ by Equation (4).
Calculate $\overline{OQ_1}$ by Equation (3).
The 3D coordinates of target Q1 with respect to origin O or O_NEW can then be estimated by Equations (7) and (10).
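The following Python sketch puts the whole procedure together using the equations reconstructed above; the function name, the click coordinates, and the demonstration values are illustrative assumptions rather than the authors' implementation.

```python
import math

# Sketch of the target 3D position estimation pipeline (Equations (3)-(10)).
# Function name and demo values are illustrative assumptions.

def estimate_target_position(x_pn, y_pn, y_po, theta_deg, d, alpha, f):
    """Return (distance OQ1, coords w.r.t. O, coords w.r.t. O_NEW).

    x_pn, y_pn: pixel coordinates of the second click (Step 3), relative to
                the image center; y_po: pixel y of the first click (Step 1).
    theta_deg:  rotation angle in degrees; d: advance distance in cm;
    alpha:      pixels per cm; f: focal length in cm.
    """
    w = x_pn / alpha                          # pixel -> cm on the image plane
    beta = math.atan2(w, f)                   # Equation (6)
    phi = beta - math.radians(theta_deg)      # Equation (5)
    o_new_q1 = d * math.sin(math.radians(theta_deg)) / math.sin(phi)  # Eq. (4)
    o_q1 = d * math.sin(beta) / math.sin(phi)                         # Eq. (3)

    q_z_new = o_new_q1 * math.cos(beta)       # Equation (9)
    q_wrt_o = (0.0, o_q1 * (y_po / alpha) / f, o_q1)                  # Eqs. (7)-(8)
    q_wrt_o_new = (q_z_new * (x_pn / alpha) / f,
                   q_z_new * (y_pn / alpha) / f,
                   q_z_new)                                           # Eq. (10)
    return o_q1, q_wrt_o, q_wrt_o_new

if __name__ == "__main__":
    # Demo with made-up clicks: with theta = 12 deg, d = 20 cm, f = 24.03 cm and
    # alpha = 48.7 pixel/cm, an x_pn of about 421 pixels corresponds to a target
    # roughly 50 cm away (the nearest case in Figure 10).
    o_q1, q_o, q_o_new = estimate_target_position(
        x_pn=421, y_pn=225, y_po=130, theta_deg=12, d=20, alpha=48.7, f=24.03)
    print(f"estimated distance OQ1: {o_q1:.1f} cm")
```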
Table 2 shows the input x_{P_N}, the resulting parameters, and the estimated $\overline{OQ_1}$ for the two cases in Figure 10. The estimation results seem to meet the expectation that the farther away the target is, the larger the error becomes, which is likely true for any 2D image method used for 3D distance estimation.
Let us review Figure 6 and Figure 7 again, which show the whole process and the mathematical derivation of the target's 3D position estimation. To make the process work, "the robot automatically rotates θ degrees to the left" in Step 2 is an important step. However, the key point in Step 2 is that the projection center of the camera, the point O in Figure 7, must remain at the same position after the rotation of the mobile robot. As mentioned earlier, we can check this by observing target points at different distances, such as Q1 and Q2 in Figure 7, which should project to points on the screen with the same X coordinate value. In practice, however, it may be hard to drive a mobile robot to achieve this. Therefore, in practice, Step 2 and Step 3 can be combined to first obtain the final correct position of the mobile robot by manual operation. After the final ideal position of the robot is obtained, we can then design a method to make the mobile robot reach the correct position automatically.
To implement the automatic process for the target's 3D position estimation, in addition to the above points for Step 2 and Step 3, we should also reconsider the hardware design of the mobile robot. For example, to improve accuracy, the DC motor which controls the wheels may need to be replaced by a stepping motor. Furthermore, the odometer data often includes errors due to tire slip, so high-friction tires may be necessary.
A remote-controlled mobile robot with video transmission is practical and effective when the operator is unable to see the environment or visibility is poor, but when the mobile robot is fitted with a robotic arm, it could be difficult and time-consuming for an inexperienced user to manipulate the robotic arm to grasp an object. With the proposed target’s 3D position estimation, the user can activate the automatic process when approaching the target, then the mobile robot and robotic arm may reach the appropriate position by inverse dynamics in a short time automatically. Therefore, it has functional applications for robotic grasping tasks, such as obstacle removal or bomb disposal.
For the proposed Wi-Fi mobile robot design in Section 2, it appears that communication is smooth when using the video transmission and control commands. However, when the target's 3D position estimation in this section is added, the computational load may cause a time delay between the click on the touchscreen and the motor reaction. Fortunately, the proposed three steps for the target's 3D position estimation are activated by the user and then executed by the system automatically, so the interactive design can be incorporated to reduce the effect of the time delay. Finally, the pitch angle of the camera will affect the positioning performance; thus, the camera and the robotic arm should be reset to the standard position once the target's 3D position estimation is activated, to avoid any error caused by the pitch angle of the camera.