
ROS Tutorial: How to use OpenCV in a Robot Pick and Place task for Computer Vision

Computer Vision is an essential part of robotics. It helps the robot extract information from camera data to understand its environment. Applications range from extracting an object and its position, to inspecting manufactured parts for production errors, to detecting pedestrians in autonomous driving applications.

In this article I will show how to use Computer Vision in robotics to make a robot arm perform a somewhat intelligent pick and place task. For that we will use the popular open source library OpenCV. We will learn how to install OpenCV, connect it to ROS and a Kinect depth camera, apply Computer Vision algorithms to extract the necessary information and send it to our robot. This will be a longer article in which I explain in detail how everything works from start to finish. For those of you only interested in specific topics, I assembled a list of the contents before we dive in.

In this tutorial we will discuss:

  1. What OpenCV is and how to install it
  2. How to set up and run the Pick and Place task
  3. How to create the OpenCV node that extracts the object positions from the camera image
  4. How to call the OpenCV node service from the Pick and Place node

Let’s get started!

What is OpenCV and how to install it?

OpenCV was initially started at Intel and later also developed by Willow Garage (that’s where ROS was invented). It provides a collection of optimized Computer Vision algorithms that are portable and easy to use. It is open source and, since 2012, the non-profit organization OpenCV.org has taken over its support.

If ROS is properly installed on your machine, OpenCV should already be installed as well. You can check that by running:

$ pkg-config --modversion opencv

If that doesn’t yield any results you can try:

$ dpkg -l | grep libopencv

If you find that OpenCV is not installed yet, please follow the instructions in the following link.

I work with ROS Melodic which uses OpenCV 3 by default, so that’s the version used in this tutorial. It is also possible to use the newer OpenCV 4 with Melodic but it needs more effort to get it running. One of the additional steps concerns the cv_bridge package we are using to convert the images: to work with OpenCV 4 it needs to be cloned and compiled from source. This is one of the issues you will run into and there might be more. If you still want to try to set it up with OpenCV 4, you might want to start from this question on stackoverflow.

How to set up and run the Pick and Place task

We want to develop our computer vision solution and apply it in a real robot scenario. Therefore we choose the pick and place task that we also used in the previous tutorials where we worked on a pick and place task with the Moveit C++ interface and on how to do collision avoidance with Moveit and a depth camera.

The below picture shows the setup. The blue box on the left should be placed on the white plate on the right. In the first tutorial we used the C++ interface of Moveit to program the task and hardcoded the positions of the box (start position) and plate (goal position). We also told Moveit the exact position and size of the green obstacle in the middle such that it can find a path around it. In the second tutorial we added a depth camera to automatically detect the obstacle in the middle without telling Moveit about it beforehand. Now we want to use the images we get from the camera and extract the position of the box and the plate automatically by using computer vision algorithms!

We will first go through the setup and run it such that you can see what the program actually does. Then I will explain all the details of the implementation and walk you through the code.

The project consists of three main packages:

  1. The Moveit configuration package for a UR5 robot, a gripper and a Kinect depth camera. If you did the previous tutorials, this package should already be installed. If not, please go to the Quickstart section of the tutorial on how to create a Moveit configuration package and follow the steps. After that, check out the section on how to add a Kinect 3D camera to the environment in the tutorial on collision avoidance and replicate the first two steps.
  2. The OpenCV node that extracts the object positions from the camera image by applying computer vision algorithms and sends them to our application. Later in this article I will explain the exact implementation in detail. Now please go ahead and clone this repository into the src folder of your catkin workspace:
~/catkin_ws/src$ git clone https://github.com/dairal/opencv_services.git

3. The Pick and Place node that contains the actual implementation of the pick and place task. This will be a new package but the implementation will be very similar to the one we used in the previous two tutorials. To install the package, please also clone the following repository into the source folder of your catkin workspace:

~/catkin_ws/src$ git clone https://github.com/dairal/ur5_pick_and_place_opencv.git

Now that we have all the packages we need, go ahead and build your catkin workspace from its root directory:

~/catkin_ws$ catkin_make

In order to run the task we need to execute three commands in our Linux terminal. First, we need to launch our environment to simulate the Robot in Rviz and Gazebo. We do that by running the demo_gazebo.launch file of our ur5_gripper_moveit_config package with two additional arguments:

$ roslaunch ur5_gripper_moveit_config demo_gazebo.launch world_name:=$(rospack find ur5_pick_and_place_opencv)/world/simple_pick_and_place_collision robot_spawn_position_z_axis:=1.21

Hopefully your Gazebo environment will look similar to the picture above and you see the depth camera data visualized in Rviz. Second, we launch the OpenCV node. The node we want to run is called opencv_extract_object_positions. Start it by executing the following command in a new terminal:

$ rosrun opencv_services opencv_extract_object_positions

Apart from the output of the node you will see an image that gets automatically displayed on your machine (more accurately, it is a stream of images that are constantly replaced by new ones). In this picture you can already see the output of our computer vision algorithms: the dots that are drawn in the center of the box and the plate are the features we are extracting from the image. The opencv node is now ready to send the extracted positions to our pick and place node. So finally, in our third command, we launch the pick_and_place_opencv node that is contained in the ur5_pick_and_place_opencv package. Therefore please execute:

$ rosrun ur5_pick_and_place_opencv pick_and_place_opencv 

Now you should be able to watch the robot move above the box, pick it up with the gripper, move it above the plate and drop it there. Fingers crossed that everything worked for you.

If you did the previous tutorials you might think: well, that’s nice, but we could already do that before, without applying some fancy-pantsy computer vision algorithms to the camera image. That’s true, but now you can try modifying the positions of the blue box and the plate in the setup. However, don’t go too crazy on that. Just shift the box and plate a small bit from their initial starting positions and start the pick and place node again; later we will talk about the reason why we can’t alter the positions too much.

I admit that it is not the most robust application and it might fail at times. More effort would be necessary to make it more robust and reliable. However, the implementation is mainly for educational purposes and I tried to make it as easily understandable as possible so let’s go ahead and I will walk you through every detail of the code.

Creating the OpenCV node

This node should give us the start and goal position of our pick and place task. In our case that’s the position of the blue box and the plate. Let me give you an overview of the steps that are necessary to achieve that:

  1. We need to fetch the camera data that is published by the Kinect camera in our setup. That requires some conversion because OpenCV works with a different image format than the one published by the camera.
  2. We apply our Computer Vision algorithms to the images. I will give you an explanation of the chosen algorithms, explain why I selected them and how to use them.
  3. Afterwards we will be able to extract boundaries and their center from the images.
  4. An intermediate step, which I will explain later, is necessary to extract the object positions of the box and the target.
  5. At this point we have the 2D start and goal positions in the image we got from the camera. However, if we want to send a position command to the robot we need to convert the 2D image position to a 3D position in the coordinate frame of the robot. We will do this transformation using the ROS tf2 package.
  6. After we have the final result we want to send it to the pick and place node. We implement a service call in the OpenCV node that can be called from external modules and gives back the start and goal position.

Now that you have an overview of the necessary steps, let’s start looking at the implementation. You can find the complete code for the OpenCV node in the opencv_extract_object_positions.cpp file which you can view on Github following this link.

As always we start in our main function by initializing the node and creating a node handler:

 // Main function
int main(int argc, char **argv)
{
  // The name of the node
  ros::init(argc, argv, "opencv_services");
   
  // Default handler for nodes in ROS
  ros::NodeHandle nh("");

1. Converting the camera images between ROS and OpenCV with the cv_bridge package

First we need to get the camera data that is published by the Kinect camera. For image subscribing and publishing it is recommended to use the image_transport package. The usage is quite similar to subscribing to the raw sensor_msgs/Image message. We include the package:

#include <image_transport/image_transport.h>

Then we define a constant string called IMAGE_TOPIC that holds the name of the topic the Kinect data is published on:

static const std::string IMAGE_TOPIC = "/camera1/rgb/image_raw";

Next we create an ImageTransport object, which is the image_transport equivalent of the node handle, and use it to create a subscriber to the IMAGE_TOPIC topic. The second argument is the queue size, i.e. how many incoming images are buffered before old ones are dropped; we pass 1 so that we always process the most recent image. The third argument image_cb is the callback function we want to call whenever a new image arrives.

  // Used to publish and subscribe to images
  image_transport::ImageTransport it(nh);

  // Subscribe to the /camera raw image topic
  image_transport::Subscriber image_sub = it.subscribe(IMAGE_TOPIC, 1, image_cb);

Let’s now have a look at the image_cb function. ROS uses its own message format for images, defined in sensor_msgs/Image, while OpenCV uses a different format implemented as cv::Mat. They are not compatible, but luckily there is the cv_bridge package that does the conversion between the formats for us, so we happily include cv_bridge in our .cpp file:

 #include <cv_bridge/cv_bridge.h>

Inside the image_cb function we instantiate a cv image pointer cv_ptr. Then we call the toCvCopy function which takes the incoming image message and the type of encoding the message should be converted to as input arguments. After conversion it returns a cv image pointer that we assign to the cv_ptr we created before. If an exception is thrown, we catch it.

  cv_bridge::CvImagePtr cv_ptr;
  try
  {
    cv_ptr = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::BGR8);
  }
  catch (cv_bridge::Exception& e)
  {
    ROS_ERROR("cv_bridge exception: %s", e.what());
    return;
  }

  float image_size_y = cv_ptr->image.rows;
  float image_size_x = cv_ptr->image.cols;

Finally, the image we receive from the depth camera looks like this:

2. Applying Computer Vision algorithms to the images

Now we want to extract the information from this image. There are different ways to go about this. You can think of the Computer Vision algorithms like a toolbox that is available to you for getting the information you want. You can combine these algorithms in different ways and there is often more than one way to reach your goal.

Inside the apply_cv_algorithms function (line 66), the first method we want to use is grayscaling. It makes applying further algorithms easier. Also, we don’t need the color information although one could think of getting the box and plate position by extracting certain colors (blue and white, respectively) from the image. This is how grayscaling is done with OpenCV:

  // convert the image to grayscale format
  cv::Mat img_gray;
  cv::cvtColor(cv_ptr->image, img_gray, cv::COLOR_BGR2GRAY);

OpenCV’s cvtColor function converts an image from one color space to another. With the third argument cv::COLOR_BGR2GRAY we choose the conversion to a gray scaled image which is then stored in the img_gray object.

Next, we employ an edge detection algorithm to detect the outlines of the objects. We use the Canny algorithm that extracts edges based on the intensity gradient. This means that it will detect significant differences between pixels next to each other as edges. You can see from the above picture that the box and the plate have different levels of brightness but they both stand out against the background which makes this algorithm especially suited. We apply the algorithm like this:

  cv::Mat canny_output;
  cv::Canny(img_gray,canny_output,10,350);

The cv::Canny function takes four arguments. The first two are the input and output images. The third and fourth define threshold values and influence how significant an intensity gradient has to be in order to be interpreted as an edge; especially the fourth argument determines this sensitivity. In our case we use 350, which is a rather high value. The strong contrast between the objects and the background allows us to set such a high value, which has the advantage that we “filter out” the collision box in the middle.

I won’t get into the details on how exactly the Canny algorithm is working but here you can find more information if you are interested. This is the result we get:

3. Extracting boundaries and their center from the image

Even if the bright regions in the resulting picture look like solid lines, to the computer they are so far only a bunch of pixels with a steep intensity gradient. Inside the extract_centroids function (line 77), we now want to connect these points to get the actual boundaries of the objects. Therefore, we use OpenCV’s findContours function which joins all continuous pixels that have the same intensity into a curve:

  // detect the contours on the binary image using cv::CHAIN_APPROX_NONE
  std::vector<std::vector<cv::Point>> contours;
  std::vector<cv::Vec4i> hierarchy;
  cv::findContours(canny_output, contours, hierarchy, cv::RETR_TREE, cv::CHAIN_APPROX_NONE);

Let’s have a look at the last three arguments the findContours function takes. With cv::RETR_TREE we choose the contour retrieval mode. This argument is important if there are shapes that lie inside other shapes. In that case you might want to know how they relate to each other, i.e. which shape is contained within which one. With this argument we choose how much information about these relations we want to store. The results are then stored in the hierarchy matrix that is instantiated in line 81. Our objects are well separated, so we don’t really care about that part; this is just to give you some context.

With the last argument we can define whether to use some approximation when storing the boundaries, which would save memory. By setting cv::CHAIN_APPROX_NONE, we don’t. The resulting contours are stored in the contours vector.

Now we found the boundaries but in the end we want to know the object positions. How do we do that? By computing the center of the contours:

  // get the moments
  std::vector<cv::Moments> mu(contours.size());
  for( int i = 0; i<contours.size(); i++ )
  { mu[i] = cv::moments( contours[i], false ); }
  
  // get the centroid of contours.
  std::vector<cv::Point2f> mc(contours.size());
  for( int i = 0; i<contours.size(); i++) {
    float mc_x = mu[i].m10/mu[i].m00;
    float mc_y = mu[i].m01/mu[i].m00;
    mc[i] = cv::Point2f(mc_x, mc_y);
  }

The algorithm is not as complicated as it looks. In line 86 we loop over all the contours we found and compute the image moments. Image moments are the weighted average of the intensities of the pixels in a certain shape. Out of these moments M we can compute the center of the shape (the centroid) as follows:

\[C_x = \frac{M_{10}}{M_{00}} \] \[C_y = \frac{M_{01}}{M_{00}} \]

We do this calculation for all the moments in lines 90 to 94. This is all we need to do in order to get the centroids of our objects.

With the following few lines we draw the contours as well as the centroids and show them in a separate window:

  // draw contours
  cv::Mat drawing(canny_output.size(), CV_8UC3, cv::Scalar(255, 255, 255));

  for( int i = 0; i<contours.size(); i++ )
  {
    cv::Scalar color = cv::Scalar(167, 151, 0); // B G R values
    cv::drawContours(drawing, contours, i, color, 2, 8, hierarchy, 0, cv::Point());
    cv::circle(drawing, mc[i], 4, color, -1, 8, 0);
  }

  cv::namedWindow("Extracted centroids", cv::WINDOW_AUTOSIZE);
  cv::imshow("Extracted centroids", drawing);
  cv::waitKey(3);

This is the result we get:

The centroids are drawn as small circles and we can see that the centers of our objects are well detected. However, on the bottom of the image we can see that other centers of shapes are detected as well. The final step before we get the results will be to choose the correct ones.

4. Extracting the object positions

For selecting the right centroids we use a simple approach. We just assume that our start position (the blue box) is in the upper right part of the image and our goal position (the plate) is in the upper left part.

For this purpose I wrote a simple function that takes the vector of centroids and a user-defined rectangle as input. It then computes the average of all centroids that lie inside the rectangular area. Why do we need to compute the average? More than one contour, and thus more than one centroid, might be detected for the same object. To tackle this problem, we simply take the average over them. The additional centroids cannot be clearly seen in the images because they are very close to each other, but they show up in the vector.

cv::Point2f search_centroid_in_area(std::vector<cv::Point2f> centroid_vector, cv::Rect area) {
  float sum_x = 0.0;
  float sum_y = 0.0;
  int number_of_centroids_in_area = 0;

  for( int i = 0; i<centroid_vector.size(); i++) {
    if(centroid_vector[i].inside(area)) {
      sum_x += centroid_vector[i].x;
      sum_y += centroid_vector[i].y;
      number_of_centroids_in_area++;
    }
  }
  cv::Point2f extracted_point(sum_x/number_of_centroids_in_area, sum_y/number_of_centroids_in_area);
  return extracted_point;
}

This function is called twice: once with the search area for the start position and once with the area for the goal position:

  // get box location in 2d image
  cv::Rect box_search_area((image_size_x/2), 0, (image_size_x/2), 255);
  box_centroid = search_centroid_in_area(centroids, box_search_area);

  // get plate location in 2d image
  cv::Rect target_search_area(0, 0, (image_size_x/2), 255);
  target_centroid = search_centroid_in_area(centroids, target_search_area);

We simply choose the upper left and the upper right quarter as our search areas:

This approach is very simplistic and there are of course more elaborate solutions. We could e.g. try to detect the objects based on their shape or based on their color.
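
As a rough illustration of the color-based idea (this is not part of the actual node, and the HSV thresholds below are only guesses that would need tuning for the real scene), one could threshold the image for blue pixels and take the centroid of the resulting mask to locate the box:

  // Hypothetical alternative: find the blue box by color instead of by edges
  cv::Mat img_hsv, blue_mask;
  cv::cvtColor(cv_ptr->image, img_hsv, cv::COLOR_BGR2HSV);

  // keep only pixels whose hue falls into a "blue" band (threshold values are guesses)
  cv::inRange(img_hsv, cv::Scalar(100, 100, 50), cv::Scalar(130, 255, 255), blue_mask);

  // the centroid of the binary mask is a candidate for the box position
  cv::Moments m = cv::moments(blue_mask, true);
  if (m.m00 > 0) {
    cv::Point2f blue_centroid(m.m10 / m.m00, m.m01 / m.m00);
  }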

5. Converting the 2D pixel to a 3D position in the world frame

Let’s take a moment to pat ourselves on the back. We successfully obtained the x and y coordinates of the objects in our 2D image! But how do these relate to the robot? In the end we want to command a 3D position to the robot where it should pick or place the object. So before we send the positions to the pick and place node, we need to convert a pixel position in a 2D image to a 3D position in the robot frame. This requires two steps:

  1. We are using the point cloud data to get a 3D position in the camera frame from the 2D pixel information.
  2. We transform the 3D position in the camera frame into the robot frame using the tf2 package.

In order to get the 3D position in the camera frame, we need to use the point cloud data from the Kinect depth camera. Therefore, we subscribe to it in the main function:

  // Subscribe to the /camera PointCloud2 topic
  ros::Subscriber point_cloud_sub = nh.subscribe(POINT_CLOUD2_TOPIC, 1, point_cloud_cb);
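
The POINT_CLOUD2_TOPIC constant holds the topic name and is declared analogously to IMAGE_TOPIC (shown here as a small sketch; the exact line in the source file may differ slightly):

static const std::string POINT_CLOUD2_TOPIC = "/camera1/depth/points";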

The POINT_CLOUD2_TOPIC is defined as "/camera1/depth/points". As with the image subscriber, we use a queue size of 1, and our callback function is called point_cloud_cb. Let’s have a look at the first part of this function right away:

void point_cloud_cb(const sensor_msgs::PointCloud2 pCloud) {

    geometry_msgs::Point box_position_camera_frame;
    pixel_to_3d_point(pCloud, box_centroid.x, box_centroid.y, box_position_camera_frame);

    geometry_msgs::Point target_position_camera_frame;
    pixel_to_3d_point(pCloud, target_centroid.x, target_centroid.y, target_position_camera_frame);

When we receive a new point cloud, we call the pixel_to_3d_point function twice, once for the box centroid and once for the target centroid. This function converts a 2D pixel to a 3D position in the camera frame and looks as follows:

void pixel_to_3d_point(const sensor_msgs::PointCloud2 pCloud, const int u, const int v, geometry_msgs::Point &p)
{
  // get width and height of 2D point cloud data
  int width = pCloud.width;
  int height = pCloud.height;

  // Convert from u (column / width), v (row / height) to position in array
  // where X,Y,Z data starts
  int arrayPosition = v*pCloud.row_step + u*pCloud.point_step;

  // compute position in array where x,y,z data start
  int arrayPosX = arrayPosition + pCloud.fields[0].offset; // X has an offset of 0
  int arrayPosY = arrayPosition + pCloud.fields[1].offset; // Y has an offset of 4
  int arrayPosZ = arrayPosition + pCloud.fields[2].offset; // Z has an offset of 8

  float X = 0.0;
  float Y = 0.0;
  float Z = 0.0;

  memcpy(&X, &pCloud.data[arrayPosX], sizeof(float));
  memcpy(&Y, &pCloud.data[arrayPosY], sizeof(float));
  memcpy(&Z, &pCloud.data[arrayPosZ], sizeof(float));

  p.x = X;
  p.y = Y;
  p.z = Z;
}

We need to find the point in the point cloud data that corresponds to our 2D image pixel. The data is stored in one flat array and we have to find the right element. The parameter pCloud.row_step gives us the number of bytes in one row, while pCloud.point_step tells us the number of bytes per point.

To compute our array position in line 154, we first multiply the variable v, which is the y coordinate of the pixel position, by pCloud.row_step. That means we jump forward in the array to where the row containing our pixel begins. Then we jump to the correct column in that row by adding u (the x coordinate of the pixel) multiplied by pCloud.point_step, which gives us the arrayPosition.
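
As a quick numeric illustration (the numbers are only an assumption, the actual values depend on the camera configuration): for an organized 640×480 cloud with a point_step of 32 bytes, row_step would be 640 · 32 = 20480 bytes, so the pixel u = 100, v = 50 would start at

\[ \text{arrayPosition} = 50 \cdot 20480 + 100 \cdot 32 = 1027200 \]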

Every 3D point consists of three coordinates. Each of these coordinates is a float value that is stored in our byte array. One float usually takes 4 bytes, so the y coordinate has an offset of 4 because it is stored right after the x coordinate. The exact offset per coordinate is stored in the pCloud.fields array, and in lines 157 to 159 we add that offset value to the arrayPosition for each coordinate. Then we copy sizeof(float) bytes from the right position in the data array into each coordinate and store them in our geometry_msgs::Point variable p.

In the second part of the point_cloud_cb function we transform the 3D position from the camera frame to the robot (base) frame:

    box_position_base_frame = transform_between_frames(box_position_camera_frame, from_frame, to_frame);
    target_position_base_frame = transform_between_frames(target_position_camera_frame, from_frame, to_frame);

    ROS_INFO_STREAM("3d box position base frame: x " << box_position_base_frame.x << " y " << box_position_base_frame.y << " z " << box_position_base_frame.z);
    ROS_INFO_STREAM("3d target position base frame: x " << target_position_base_frame.x << " y " << target_position_base_frame.y << " z " << target_position_base_frame.z);
}

Lucky for us there is the tf2 transform library which makes transforming between frames easy. Before we use the transform functionality we include some library headers:

 #include <tf2_ros/buffer.h>
 #include <tf2_ros/transform_listener.h>
 #include <tf2_geometry_msgs/tf2_geometry_msgs.h>

Then we create a tf_buffer object:

tf2_ros::Buffer tf_buffer;

In the main function we create a transform listener that takes the tf_buffer object as an argument:

  tf2_ros::TransformListener listener(tf_buffer);

Now we are ready to use the transform functionality. Let’s have a look at the implementation in the transform_between_frames function:

geometry_msgs::Point transform_between_frames(geometry_msgs::Point p, const std::string from_frame, const std::string to_frame) {
    
    geometry_msgs::PoseStamped input_pose_stamped;
    input_pose_stamped.pose.position = p;
    input_pose_stamped.header.frame_id = from_frame;
    input_pose_stamped.header.stamp = ros::Time::now();

    geometry_msgs::PoseStamped output_pose_stamped = tf_buffer.transform(input_pose_stamped, to_frame, ros::Duration(1));
    return output_pose_stamped.pose.position;
}

The function takes the 3D point and the frames between which we want to transform. The transform function requires a pose of type geometry_msgs::PoseStamped (similar to a point object but with additional information) as input.

We simply construct input_pose_stamped out of the 3D point by adding information about the frame and the time. Then we can call the transform function of the tf_buffer object we created before. It takes the constructed input_pose_stamped, the frame we want to transform to and some timeout value. As a result we get the transformed output pose, also of type geometry_msgs::PoseStamped, from which we return only the position.

Cool, we finally have a 3D position that the robot will understand.

6. Sending the resulting positions as a service call

Remember that we are doing all of this to automatically detect the start and goal positions for our pick and place task. Therefore, we need to communicate the results to our pick and place node. Until now we mostly used topics to exchange data between nodes which is great for many-to-many and one way communication. However, only the pick and place node needs to get the extracted positions upon request once before starting the motions. For this purpose, services are more suitable than topics.

We name our service opencv_services/box_and_target_position and create the following box_and_target_position.srv file inside the srv folder of the OpenCV package:

---
geometry_msgs/Point box_position
geometry_msgs/Point target_position

We also need to add the service in our CMakeLists.txt file to make sure the opencv_services/box_and_target_position.h header file is created:

 add_service_files(
   FILES
   box_and_target_position.srv
 )
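
Because the service uses geometry_msgs/Point in its response, message generation has to be set up as well. A minimal sketch of the relevant CMakeLists.txt parts (assuming the standard catkin setup; the repository may already contain equivalent lines):

 find_package(catkin REQUIRED COMPONENTS
   geometry_msgs
   message_generation
   ...
 )

 generate_messages(
   DEPENDENCIES
   geometry_msgs
 )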

After including this header in our .cpp file we can use the service. To offer it, we advertise it in the main function:

  ros::ServiceServer service = nh.advertiseService("box_and_target_position",  get_box_and_target_position);
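
At the very end of the main function (not shown in the snippets here) the node has to hand control over to ROS so that the image, point cloud and service callbacks actually get processed; in a typical single-threaded node this is simply:

  ros::spin();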

The get_box_and_target_position function will be called when another node calls the service:

bool get_box_and_target_position(opencv_services::box_and_target_position::Request  &req,
                                 opencv_services::box_and_target_position::Response &res) {
  res.box_position = box_position_base_frame;
  res.target_position = target_position_base_frame;
  return true;
}

In it we simply assign the extracted positions which are stored in the global box_position_base_frame and target_position_base_frame variables to the response we send back to the node that requested the service.
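
If you want to test the service in isolation, you can also call it from the command line while the OpenCV node is running (with the default namespace it resolves to /box_and_target_position):

$ rosservice call /box_and_target_position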

Phew, that was a lot of work! Plenty of steps need to be done in the OpenCV node but now we can finally use it for our pick and place task.

How to call the OpenCV node service from the Pick and Place node

As I mentioned in the beginning we will use nearly the same pick and place implementation as in the last tutorial and only replace the hardcoded position values with the values we receive from the OpenCV node. For this tutorial I will only discuss the lines we need to add compared to the previous tutorial. If you are interested how exactly the pick and place task is implemented you can go through the last two tutorials (Pick and Place task with the Moveit C++ interface and How to use a depth camera with Moveit for collision avoidance). You can also find the new pick and place node in the Github repository. The pick_and_place_opencv.cpp file contains the complete implementation including descriptive comments.

To use the service, we need to add the opencv_services package as a dependency in the CMakeLists.txt:

find_package(catkin REQUIRED
  COMPONENTS
    opencv_services
    ...

and in the package.xml:

...
  <build_depend>opencv_services</build_depend>
...
  <exec_depend>opencv_services</exec_depend>

The generated service header needs to be included in the .cpp file:

#include "opencv_node/box_and_target_position.h"

In the main function of the pick and place node we construct a service client box_and_target_position_srv_client for the box_and_target_position service and create the service object in line 79. Now we call the service and, if successful, print out the received positions which are stored in srv.response.

    // Get the box and the target position from the opencv node
    ros::ServiceClient box_and_target_position_srv_client = n.serviceClient<opencv_services::box_and_target_position>("box_and_target_position");

    opencv_services::box_and_target_position srv;
  
    if(box_and_target_position_srv_client.call(srv)) {
      ROS_INFO_STREAM("3d target position base frame: x " << srv.response.target_position.x << " y " << srv.response.target_position.y << " z " << srv.response.target_position.z);
      ROS_INFO_STREAM("3d box position base frame: x " << srv.response.box_position.x << " y " << srv.response.box_position.y << " z " << srv.response.box_position.z);
    } else {
      ROS_INFO_STREAM("Failed to call box and target position service");
    }

Finally, we assign the received values, instead of the hardcoded positions, to our box pose:

    box_pose.position.x = srv.response.box_position.x;
    box_pose.position.y = srv.response.box_position.y;
    box_pose.position.z = srv.response.box_position.z;

And to the first target pose, which places the gripper right above the box (hence the offset in z):

    target_pose1.position.x = srv.response.box_position.x;
    target_pose1.position.y = srv.response.box_position.y;
    target_pose1.position.z = srv.response.box_position.z + 0.34;

Conclusion

That was it: this is how you can use OpenCV together with ROS to integrate Computer Vision into your robot application. You saw that it takes quite a bit of work to extract information from a camera image and convert it into useful data for a robot task. The approach used in this tutorial is certainly not the most elegant or most robust one, but it shows you one way to do it from start to finish, and that was the goal.

OpenCV is a powerful library with a lot more functionality to discover. In this tutorial I intentionally used classical Computer Vision algorithms but more and more machine learning methods are used in modern Computer Vision. In one of the next tutorials we will discuss how to use OpenCV to employ machine learning methods in your robot application. Until then thank you for sticking with me till the very end and make sure to check out the other tutorials in the Robotics Tutorials overview. As always I’m happy to receive your feedback in the comments! 🙂