Week 6: Cameras, Machine Vision & OpenCV
Info
You should be able to complete all the exercises on this page within a two-hour lab session.
Introduction¶
Aims¶
This week we are finally going to make use of our TurtleBot3's camera, and look at how to work with images in ROS! Here we'll look at how to build ROS nodes that capture images and process them. We'll explore some ways in which this data can be used to inform decision-making in robotics applications.
Intended Learning Outcomes¶
By the end of this session you will be able to:
- Use a range of ROS tools to interrogate camera image topics on a ROS Network and view the images being streamed to them.
- Use the computer vision library OpenCV with ROS, to obtain camera images and process them in real-time.
- Apply filtering processes to isolate objects of interest within an image.
- Develop object detection nodes and harness the information generated by these processes to control a robot's position.
- Use camera data as a feedback signal to implement a line following behaviour using proportional control.
Quick Links¶
- Exercise 1: Using the `rqt_image_view` node whilst changing the robot's viewpoint
- Exercise 2: Object Detection
- Exercise 3: Locating image features using Image Moments
- Exercise 4: Line following
Additional Resources¶
- The Initial Object Detection Code (for Exercise 2)
- A Complete Worked Example of the `object_detection.py` Node
- A `line_follower` Template (for Exercise 4)
Getting Started¶
Step 1: Launch WSL-ROS
Launch your WSL-ROS environment by running the WSL-ROS shortcut in the Windows Start Menu (if you haven't already done so). Once installed, the Windows Terminal app should launch with an Ubuntu terminal instance ready to go (TERMINAL 1).
Step 2: Restore your work
When prompted (in TERMINAL 1), enter `Y` to restore your work from last time.
Step 3: Launch VS Code
Follow these steps to launch VS Code correctly within the WSL-ROS environment.
Step 4: Launch the Robot Simulation
In this session we'll start by working with the mystery world environment from last week. In TERMINAL 1, use the following `roslaunch` command to load it:
TERMINAL 1:
...and then wait for the Gazebo window to open.

Working with Cameras and Images in ROS¶
Camera Topics and Data¶
There are a number of tools that we can use to view the live images that are being captured by a robot's camera in ROS. As with all robot data, these streams are published to topics, so we firstly need to identify those topics.
In a new terminal instance (TERMINAL 2), run `rostopic list` to see the full list of topics that are currently active on the system. Conveniently, all the topics related to our robot's camera are prefixed with `camera`! Use `grep` (a Linux command) to filter the `rostopic list` output so that only the topics prefixed with `/camera` are shown:
TERMINAL 2:
This should provide the following filtered list:
/camera/depth/camera_info
/camera/depth/image_raw
/camera/depth/points
/camera/parameter_descriptions
/camera/parameter_updates
/camera/rgb/camera_info
/camera/rgb/image_raw
/camera/rgb/image_raw/compressed
/camera/rgb/image_raw/compressed/parameter_descriptions
/camera/rgb/image_raw/compressed/parameter_updates
/camera/rgb/image_raw/compressedDepth
/camera/rgb/image_raw/compressedDepth/parameter_descriptions
/camera/rgb/image_raw/compressedDepth/parameter_updates
/camera/rgb/image_raw/theora
/camera/rgb/image_raw/theora/parameter_descriptions
/camera/rgb/image_raw/theora/parameter_updates
The real TurtleBot3 Waffles that we have at the university have a slightly different camera module to the one used by the simulated robots that we are working with here. Despite this, the camera data on our real robots is published to topics using the same ROS message formats as used in simulation, making it fairly straightforward to transfer nodes that we develop in simulation onto the real robots, when the time comes.
The first items in the list of camera topics above tell us that depth information is available here. Much like the real robots, the simulated versions that we are working with here also have a camera module capable of determining depth information as well as simply capturing images. Remember from Week 3 though, that we also have a very capable LiDAR sensor to give us this type of information too, and so we won't really be using the depth capabilities of our camera in this session.
The main thing we are actually interested in here is the RGB images that are captured by the camera, and the key topic that we'll therefore be using this week is:
/camera/rgb/image_raw
Run `rostopic info` on this topic to identify the message type.

Now, run `rosmsg info` on this message type to find out exactly what information is published to the topic. You should end up with a list that looks like this:
std_msgs/Header header
uint32 seq
time stamp
string frame_id
uint32 height
uint32 width
string encoding
uint8 is_bigendian
uint32 step
uint8[] data
Questions
- What type of message is used on this topic, and which package is this message derived from?
- Using `rostopic echo` and the information about the topic message (as shown above), determine the size of the images that our robot's camera will capture (i.e. its dimensions, in pixels). It will be quite important to know this when we start manipulating these camera images later on.
- Finally, considering the list above again, which part of the message do you think contains the actual image data?
Visualising Camera Streams¶
We can view the images being streamed to the above camera topic (in real-time) in a variety of different ways, and we'll explore a couple of these now.
One way is to use RViz, which can be launched using the following `roslaunch` command:
TERMINAL 2:
Once RViz launches, find the camera item in the left-hand panel and tick the checkbox next to it. This should open up a camera panel with a live stream of the images being obtained from the robot's camera! The nice thing about this is that the real-time LiDAR data will also be overlaid on top of the images too!
Close down RViz by entering `Ctrl+C` in TERMINAL 2.
Exercise 1: Using the `rqt_image_view` node whilst changing the robot's viewpoint¶
Another tool we can use to view camera data-streams is the `rqt_image_view` node.
- To launch this, use `rosrun` as follows:

  TERMINAL 2:

  This is a nice tool that allows us to easily view images that are being published to any camera topic on the ROS network. Another useful feature is the ability to save these images (as `.jpg` files) to the filesystem: see the "Save as image" button highlighted in the figure above. This might be useful later on in this week's session.
- Click the drop-down box in the top left of the window to select an image topic to display. Select `/camera/rgb/image_raw` (if it's not already selected).
- Keep this window open now, and launch a new terminal instance (TERMINAL 3).
- Launch the `turtlebot3_teleop` node, either using the full command or the handy alias: `tb3_teleop`! Rotate your robot on the spot, keeping an eye on the `rqt_image_view` window as you do this. Stop the robot once one of the coloured pillars in the arena is roughly in the centre of the robot's field of vision, then close the `turtlebot3_teleop` node and the `rqt_image_view` node by entering `Ctrl+C` in TERMINAL 3 and TERMINAL 2 respectively.
OpenCV and ROS¶
OpenCV is a mature and powerful computer vision library designed for performing real-time image analysis, and it is therefore extremely useful for robotic applications. The library is cross-platform and there is a Python API (`cv2`), which we'll be using to do some computer vision tasks of our own during this lab session. While we can work with OpenCV using Python straight away (via the API), the library can't directly interpret the native image format used by ROS, so there is an interface that we need to use. That interface is called CvBridge: a ROS package that handles the conversion between ROS and OpenCV image formats. We'll therefore need to use these two libraries (OpenCV and CvBridge) hand-in-hand when developing ROS nodes to perform computer vision related tasks.
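As a minimal sketch of how these two libraries fit together (the node name and the logging here are illustrative assumptions, not the code used in the exercises below), a subscriber to the camera topic can hand each incoming `sensor_msgs/Image` to CvBridge for conversion into an OpenCV array:

```python
#!/usr/bin/env python3
# A minimal sketch (not the exercise code): subscribe to the camera topic and
# convert each ROS image message into an OpenCV (numpy) array via CvBridge.

import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge, CvBridgeError

bridge = CvBridge()

def camera_callback(img_msg):
    try:
        # "bgr8" gives a standard 8-bit, 3-channel OpenCV image
        cv_img = bridge.imgmsg_to_cv2(img_msg, desired_encoding="bgr8")
    except CvBridgeError as e:
        rospy.logwarn(f"CvBridge conversion failed: {e}")
        return
    height, width, channels = cv_img.shape
    rospy.loginfo(f"Received a {width}x{height} image with {channels} channels")

rospy.init_node("camera_sketch_node")
rospy.Subscriber("/camera/rgb/image_raw", Image, camera_callback)
rospy.spin()
```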
Exercise 2: Object Detection¶
One common job that we often want a robot to perform is object detection, and we will illustrate how this can be achieved by colour filtering to detect the coloured pillar that your robot should now be looking at. In this exercise you will learn how to use OpenCV to capture images, filter them and perform other analysis to confirm the presence and location of features that we might be interested in.
Step 1
- First create a new package in your `catkin_ws/src` directory called `week6_vision` with `rospy`, `cv_bridge`, `sensor_msgs` and `geometry_msgs` as dependencies.
- Then, run `catkin build` on the package and re-source your environment (as you've done so many times by now!)
- In the `src` folder of the package you have just created, create a new Python file called `object_detection.py`. What else do we need to do to this file before we can run it? Do it now!
- Copy the code here, save the file, then read the annotations so that you understand how this node works and what should happen when you run it.
- Run the node using `rosrun`.

  Warning

  This node will capture an image and display it in a pop-up window. Once you've viewed the image in this pop-up window MAKE SURE YOU CLOSE THE POP-UP WINDOW DOWN so that the node can complete its execution!

- As you should know from reading the explainer, the node has just obtained an image and saved it to a location on the filesystem. Navigate to this filesystem location and view the image using `eog`.
What you may have noticed from the terminal output when you ran the `object_detection.py` node is that the robot's camera captures images at a native size of 1080x1920 pixels (you should already know this from interrogating the `/camera/rgb/image_raw/width` and `/height` values using `rostopic echo` earlier, right?!). That's over 2 million pixels in total in a single image (2,073,600 pixels per image, to be exact), each pixel having a blue, green and red value associated with it - so that's a lot of data in a single image file!
Question
The size of the image file (in bytes) was actually printed to the terminal when you ran the `object_detection.py` node. Did you notice how big it was exactly?
Processing an image of this size is therefore hard work for a robot: any analysis we do will be slow and any raw images that we capture will occupy a considerable amount of storage space. The next step then is to reduce this down by cropping the image to a more manageable size.
Step 2
We're going to modify the `object_detection.py` node now to:

- Capture a new image in its native size
- Crop it down to focus in on a particular area of interest
- Save both of the images (the cropped one should be much smaller than the original).
- In your `object_detection.py` node, locate the line:
- Underneath this, add the following additional lines of code:
- Run the node again.
Remember
Make sure you close all of these pop-up windows down after viewing them to ensure that all your images are saved to the filesystem and the node completes all of its tasks successfully.
The code that you have just added here has created a new image object called `cropped_img` from a subset of the original, by specifying a desired `crop_height` and `crop_width` relative to the original image dimensions. Additionally, we have also specified where in the original image (in terms of pixel coordinates) we want this subset to start, using `crop_y0` and `crop_z0`. This may seem quite confusing, but hopefully the figure below illustrates what's going on here: the original image (`cv_img`) is cropped using a process called "slicing".
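As a rough, self-contained sketch of the slicing operation described above (the filenames and the specific crop values are assumptions for illustration, not the course code; only the variable names follow the text):

```python
import cv2

# Load the full-size image saved earlier (hypothetical filename)
cv_img = cv2.imread("the_original_image.jpg")
height, width, channels = cv_img.shape

# Example crop dimensions and start coordinates (values are illustrative only)
crop_width = 800
crop_height = 400
crop_y0 = int((width / 2) - (crop_width / 2))    # horizontal start point (pixels)
crop_z0 = int((height / 2) - (crop_height / 2))  # vertical start point (pixels)

# "Slicing": rows (vertical, z) first, then columns (horizontal, y)
cropped_img = cv_img[crop_z0:crop_z0 + crop_height, crop_y0:crop_y0 + crop_width]

cv2.imwrite("the_cropped_image.jpg", cropped_img)
```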
Step 3
As discussed above, an image is essentially a series of pixels each with a blue, green and red value associated with it to represent the actual image colours. From the original image that we have just obtained and cropped, we then want to get rid of any colours other than those associated with the pillar that we want the robot to detect. We therefore need to apply a filter to the pixels, which we will ultimately use to discard any pixel data that isn't related to the coloured pillar, whilst retaining data that is.
This process is called masking and, to achieve this, we need to set some colour thresholds. This can be difficult to do in a standard Blue-Green-Red (BGR) or Red-Green-Blue (RGB) colour space, and you can see a good example of this in this article from RealPython.com. We will apply some steps discussed in this article to convert our cropped image into a Hue-Saturation-Value (HSV) colour space instead, which makes the process of colour masking a bit easier.
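As a one-line illustration of that conversion step (the filename is a hypothetical placeholder; in the exercise this happens inside your node), OpenCV's `cv2.cvtColor()` moves an image from its default BGR representation into HSV:

```python
import cv2

# Load the cropped image saved earlier (hypothetical filename) and convert it
# from OpenCV's default BGR colour space into HSV.
cropped_img = cv2.imread("the_cropped_image.jpg")
hsv_img = cv2.cvtColor(cropped_img, cv2.COLOR_BGR2HSV)
```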
- First, analyse the Hue and Saturation values of the cropped image. To do this, navigate to the "myrosdata/week6_images" directory, where the raw image has been saved:

  TERMINAL 2:

  Then, run the following ROS node (from the `tuos_examples` package), supplying the name of the cropped image as an additional argument:

- The node should produce a scatter plot illustrating the Hue and Saturation values of each of the pixels in the image. Each data point in the plot represents a single image pixel, and each is coloured to match its RGB value.
- You should see from the plot that all the pixels related to the coloured pillar that we want to detect are clustered together. We can use this information to specify a range of Hue and Saturation values that can be used to mask our image: filtering out any colours that sit outside this range and thus allowing us to isolate the pillar itself. The pixels also have a Value (or "Brightness"), which isn't shown in this plot. As a rule of thumb, a range of brightness values between 100 and 255 generally works quite well.

  In this case then, we select upper and lower HSV thresholds as follows:

  lower_threshold = (115, 225, 100)
  upper_threshold = (130, 255, 255)

  Use the plot that has been generated here to determine your own upper and lower thresholds.
OpenCV contains a built-in function to detect which pixels of an image fall within a specified HSV range: `cv2.inRange()`. This outputs a matrix with the same dimensions as the image, but containing only `True` (`1`) or `False` (`0`) values, illustrating which pixels do have a value within the specified range and which don't. This is known as a Boolean Mask (essentially, a series of ones and zeroes). We can then apply this mask to the image, using a Bitwise AND operation, to get rid of any image pixels whose mask value is `False` and keep any flagged as `True` (i.e. in range). A worked sketch of both operations is provided at the end of this exercise.

- To do this, first locate the following line in your `object_detection.py` node:
- Underneath this, add the following:
- Now, run the node again. Three images should be generated and saved now. As shown in the figure below, the third image should simply be a black and white representation of the cropped image, where the white regions indicate the areas of the image where pixel values fall within the HSV range specified earlier. Notice (from the text printed to the terminal) that the cropped image and the image mask have the same dimensions, but the image mask file has a significantly smaller file size. While the mask contains the same number of pixels, these pixels only have a value of `1` or `0`, whereas - in the cropped image of the same pixel size - each pixel has a Red, Green and Blue value, each ranging between `0` and `255`, which represents significantly more data.
Finally, we can apply this mask to the cropped image, generating a final version of it where only pixels marked as `True` in the mask retain their RGB values, and the rest are simply removed. As discussed earlier, we use a Bitwise AND operation to do this and, once again, OpenCV has a built-in function for it: `cv2.bitwise_and()`.
- Locate the following line in your `object_detection.py` node:
- And, underneath this, add the following:
- Run this node again, and a fourth image should also be generated now, this time showing the cropped image taken from the robot's camera, but only containing data related to the coloured pillar, with all other background image data removed (and rendered black):
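For reference, here is a rough, self-contained sketch of the whole thresholding and masking sequence described in this step, using the blue-pillar thresholds quoted earlier (the filenames are hypothetical; in the exercise these calls live inside your `object_detection.py` node):

```python
import cv2

# Load the cropped image saved earlier (hypothetical filename)
cropped_img = cv2.imread("the_cropped_image.jpg")

# Convert to HSV and build a Boolean mask of the pixels that fall within
# the blue-pillar thresholds quoted earlier on this page
hsv_img = cv2.cvtColor(cropped_img, cv2.COLOR_BGR2HSV)
lower_threshold = (115, 225, 100)
upper_threshold = (130, 255, 255)
img_mask = cv2.inRange(hsv_img, lower_threshold, upper_threshold)

# Apply the mask with a Bitwise AND: only in-range pixels keep their values
filtered_img = cv2.bitwise_and(cropped_img, cropped_img, mask=img_mask)

cv2.imwrite("the_image_mask.jpg", img_mask)
cv2.imwrite("the_filtered_image.jpg", filtered_img)
```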
Image Moments¶
You have now successfully isolated an object of interest within your robot's field of vision, but perhaps we want to make our robot move towards it, or - conversely - make our robot navigate around it and avoid crashing into it! We therefore also need to know the position of the object in relation to the robot's viewpoint, and we can do this using image moments.
The work we have just done above led to us obtaining what is referred to as a colour blob. OpenCV also has built-in tools to allow us to calculate the centroid of a colour blob like this, allowing us to determine where exactly within an image the object of interest is located (in terms of pixels). This is done using the principle of image moments: essentially statistical parameters related to an image, telling us how a collection of pixels (i.e. the blob of colour that we have just isolated) are distributed within it. You can read more about Image Moments here and - from this - we can learn that the central coordinates of a colour blob can be obtained by considering some key moments of the image mask that we obtained from thresholding earlier:
- `M00`: the sum of all the non-zero pixels in the image mask (i.e. the size of the colour blob, in pixels)
- `M10`: the sum of all the non-zero pixels, weighted by their horizontal (y-axis) coordinate (i.e. their column number)
- `M01`: the sum of all the non-zero pixels, weighted by their vertical (z-axis) coordinate (i.e. their row number)
Remember
We refer to the horizontal as the y-axis and the vertical as the z-axis here, to match the terminology that we have used previously to define our robot's principal axes.
We don't really need to worry about the derivation of these moments too much though. OpenCV has a built-in `moments()` function that we can use to obtain this information from an image mask (such as the one that we generated earlier):
m = cv2.moments(img_mask)
So, using this, we can obtain the `y` and `z` coordinates of the blob centroid quite simply:
cy = m['m10']/(m['m00']+1e-5)
cz = m['m01']/(m['m00']+1e-5)
Notice that we are adding a very small number to the `M00` moment here to make sure that the divisor in the above equations is never zero, thus ensuring that we never get caught out by any "divide-by-zero" errors. When might this be the case?
Once again, there is a built-in OpenCV tool that we can use to add a circle onto an image to illustrate the centroid location within the robot's viewpoint: `cv2.circle()`. This is how we produced the red circle that you can see in the figure above. You can see how this is implemented in a complete worked example of the `object_detection.py` node from the previous exercise.
In our case, we can't actually change the position of our robot in the z axis, so the `cz` centroid component here might not be that important to us for navigation purposes. We may however want to use the centroid coordinate `cy` to understand where a feature is located horizontally in our robot's field of vision, and use this information to turn towards it (or away from it, depending on what we are trying to achieve). We'll look at this in a bit more detail now.
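Putting the pieces of this section together, here is a rough sketch (the filenames, circle radius and colour are illustrative assumptions) of obtaining the blob centroid from a mask and marking it on the image:

```python
import cv2

# Load a previously saved cropped image and its mask (hypothetical filenames)
cropped_img = cv2.imread("the_cropped_image.jpg")
img_mask = cv2.imread("the_image_mask.jpg", cv2.IMREAD_GRAYSCALE)

# Image moments of the mask give us the centroid of the colour blob
m = cv2.moments(img_mask)
cy = m['m10'] / (m['m00'] + 1e-5)   # horizontal (y) centroid coordinate
cz = m['m01'] / (m['m00'] + 1e-5)   # vertical (z) centroid coordinate

# Draw a red circle at the centroid; cv2.circle expects (column, row),
# which corresponds to (cy, cz) here
cv2.circle(cropped_img, (int(cy), int(cz)), 10, (0, 0, 255), 2)
cv2.imwrite("centroid_marked.jpg", cropped_img)
```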
Exercise 3: Locating image features using Image Moments¶
Inside the `tuos_examples` package there is a node that has been developed to illustrate how all the OpenCV tools that you have explored so far could be used to search an environment and stop a robot when it is looking directly at an object of interest. All the tools used in this node should be familiar to you by now, and in this exercise you're going to make a copy of this node and modify it to enhance its functionality.
- The node is called `colour_search.py`, and it is located in the `src` folder of the `tuos_examples` package. Copy this into the `src` folder of your own `week6_vision` package by first ensuring that you are located in the desired destination folder:
TERMINAL 2:
- Then, copy the `colour_search.py` node using `cp` as follows:
TERMINAL 2:
- In the same way as last week, you'll also need to copy the `tb3.py` module across from the `tuos_examples` package too, as this is used by the `colour_search.py` node to make the robot move:
TERMINAL 2:
- Open up the `colour_search.py` file in VS Code to view the content. Have a look through it and see if you can make sense of how it works. The overall structure should be fairly familiar to you by now: we have a Python class structure, a Subscriber with a callback function, a main loop where all the robot control takes place and a lot of the OpenCV tools that you have explored so far in this session. Essentially this node functions as follows:
  - The robot turns on the spot whilst obtaining images from its camera (by subscribing to the `/camera/rgb/image_raw` topic).
  - Camera images are obtained, cropped, then a threshold is applied to the cropped images to detect the blue pillar in the simulated environment.
  - If the robot can't see a blue pillar then it turns on the spot quickly.
  - Once detected, the centroid of the blue blob representing the pillar is calculated to obtain its current location in the robot's viewpoint.
  - As soon as the blue pillar comes into view the robot starts to turn more slowly instead.
  - The robot stops turning as soon as it determines that the pillar is situated directly in front of it (determined using the `cy` component of the blue blob centroid).
  - The robot then waits for a while and then starts to turn again.
  - The whole process repeats until it finds the blue pillar once again.
- Run the node as it is to see this in action. Observe the messages that are printed to the terminal throughout execution.
- Your task is to then modify the node so that it stops in front of every coloured pillar in the arena (there are four in total). For this, you may need to use some of the methods that we have explored so far this week.
- You might first want to use some of the methods that we used to obtain and analyse some images from the robot's camera:
  - Use the `turtlebot3_teleop` node to manually move the robot, making it look at every coloured pillar in the arena individually.
  - Run the `object_detection.py` node that you developed in the previous exercise to capture an image, crop it, save it to the filesystem and then feed this cropped image into the `image_colours.py` node from the `tuos_examples` package (as you did earlier).
  - From the plot that is generated by the `image_colours.py` node, determine some appropriate HSV thresholds to apply for each coloured pillar in the arena.
- Once you have the right thresholds, you can add these to your `colour_search.py` node so that it has the ability to detect every pillar in the same way that it currently detects the blue one. A rough sketch of one way to structure this is shown below.
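As a very rough sketch of one possible structure (the colour names and all thresholds other than the blue ones quoted earlier are placeholders that you would need to determine yourself from the `image_colours.py` plots; this is not the course solution):

```python
import cv2

# Hypothetical mapping of pillar colour -> (lower, upper) HSV thresholds.
# Only the blue values come from earlier on this page; the rest are
# placeholders (and the colour names themselves are illustrative).
colour_thresholds = {
    "blue":    ((115, 225, 100), (130, 255, 255)),
    "colour2": ((0, 0, 0), (0, 0, 0)),   # TODO: determine from the plots
    "colour3": ((0, 0, 0), (0, 0, 0)),   # TODO: determine from the plots
    "colour4": ((0, 0, 0), (0, 0, 0)),   # TODO: determine from the plots
}

def detect_pillar(hsv_img):
    """Return (colour_name, mask) for the first colour with any in-range
    pixels, or (None, None) if no pillar is detected."""
    for name, (lower, upper) in colour_thresholds.items():
        mask = cv2.inRange(hsv_img, lower, upper)
        if mask.any():
            return name, mask
    return None, None
```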
Exercise 4: Line following¶
OK, time for something a bit more interesting now: line following!
- Make sure that your `colour_search.py` node is no longer running, and also close down the Gazebo simulation that is currently active by entering `Ctrl+C` in TERMINAL 1.
- In TERMINAL 2 you should still be located in your `week6_vision/src` directory, but if not then go there now:
TERMINAL 2:
- Then, perform the necessary steps to create a new empty Python file called `line_follower.py` and prepare it for execution.
- Once that's done, open up the empty file in VS Code.
- Then, have a look at the template Python code for this exercise, and be sure to read the annotations too.
- Copy and paste this into your empty `line_follower.py` file and save it.
- Now, in TERMINAL 1, run a new simulation:

  TERMINAL 1:

  Your robot should be launched into an environment with a coloured track painted on the floor.

- In TERMINAL 2, run the `line_follower.py` node as it is (using `rosrun`) and see what happens... Not very impressive eh?! That's where you come in! There are a few issues with this that you will need to address:
  - Image Cropping: You might want to have a look at the cropping that is currently being performed to hone in on a particular region of the robot's viewpoint... does this look optimal? Could it be improved?
  - HSV Thresholds: Do these look appropriate for detecting the colour of the line? Determine some appropriate HSV thresholds to apply, using methods that you have used earlier in this session.
  - Velocity Limiting: The node uses a proportional controller. Essentially, we specify a proportional gain (`kp`) and multiply this gain by a position error to obtain an angular velocity. What happens if this calculation outputs an angular velocity that is excessive, or is larger than the maximum angular velocity that our robot can achieve? Consider, therefore, doing some error checking on the `ang_vel` calculation output to ensure that any angular velocity commands are kept within some sensible limits before being published to the `/cmd_vel` topic (see the sketch after this list for one possible approach).
  - Tuning the Gain: PID gains need to be tuned in order to be appropriate for a particular system. Make adjustments to the proportional gain parameter (`kp`) until a sensible response is achieved and the robot follows the line effectively.
  - Conditions where the Line Can't be Seen: The angular velocity is determined by considering the `y` coordinate of a colour blob centroid. What happens in situations where the blob of colour can't be seen in the robot's field of vision though? What influence would this have on the velocity commands that are published to the `/cmd_vel` topic? Consider a situation where the robot can't see the line to begin with... Try launching the robot at a different location in the arena instead, and think about how you might approach this situation:

    TERMINAL 1:

  - Stopping: Finally, what happens when the robot reaches the finish line? How could you add additional functionality to ensure that the robot stops when it reaches this point? What features of the arena could you use to trigger this?
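To illustrate the velocity-limiting point from the list above, here is a rough sketch of a clamped proportional controller (the gain, setpoint and the 1.82 rad/s limit are assumptions for the sketch, not values taken from the template):

```python
# A minimal sketch of proportional control with clamping (not the course
# template; kp, y_setpoint and max_ang_vel are illustrative assumptions).
kp = 0.005            # proportional gain - needs tuning for your setup
y_setpoint = 320      # e.g. the horizontal centre of the cropped image (pixels)
max_ang_vel = 1.82    # approximate maximum angular velocity of a Waffle (rad/s)

def compute_ang_vel(cy):
    """Turn the blob centroid's horizontal coordinate into a clamped angular velocity."""
    error = y_setpoint - cy     # how far the line is from the centre of view
    ang_vel = kp * error        # proportional control
    # Keep the command within sensible limits before publishing to /cmd_vel
    return max(-max_ang_vel, min(max_ang_vel, ang_vel))
```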
Wrapping Up¶
In this session you have learnt how to use data from a robot's camera to extract further information about its environment. The camera allows our robot to "see", and the information that we obtain from this device can allow us to develop more advanced robotic behaviours such as searching for objects, following things or - conversely - moving away from or avoiding them. You have learnt how to do some basic tasks with OpenCV, but this is a huge and very capable library of computer vision tools, and we encourage you to explore it further yourselves to build on some of the basic principles that we have shown you today.
Saving your work¶
Remember, the work you have done in the WSL-ROS environment during this session will not be preserved for future sessions or across different University machines automatically! To save the work you have done here today you should now run the following script in any idle WSL-ROS Terminal Instance:
This will export your home directory to your University U: Drive, allowing you to restore it again later.