Basics before starting with Robotics — Part 5

I am currently trying to learn ROS (Robot Operating System) by reading the book “Programming Robots with ROS: A Practical Introduction to the Robot Operating System”. It’s a good read and I have enjoyed it thus far.
Here is what I have learned so far:

The ROS Graph

Some of the design goals of ROS are:
1. The application task can be decomposed into many independent subsystems, such as navigation, computer vision, grasping, and so on.
2. These subsystems can be used for other tasks, such as doing security patrols, cleaning, delivering mail, and so on.
3. With proper hardware and geometry abstraction layers, the vast majority of the application software can run on any robot.

These goals can be illustrated by the fundamental rendering of a ROS system: its graph. A ROS system is made up of many different programs running simultaneously and communicating with one another by passing messages. It is convenient to use a mathematical graph to represent this collection of programs and messages: the programs are the graph nodes, and programs that communicate with one another are connected by edges.

To reiterate: a ROS graph node represents a software module that is sending or receiving messages, and a ROS graph edge represents a stream of messages between two nodes.
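To make the graph idea concrete, here is a rough plain-Python sketch (it does not use ROS at all, and the node and topic names are made up for illustration). It treats programs as nodes and derives an edge wherever one node publishes a topic that another subscribes to:

```python
from collections import defaultdict

# Hypothetical nodes and the topics they publish / subscribe to.
publishers = {"camera": ["image"], "planner": ["cmd_vel"]}
subscribers = {
    "detector": ["image"],
    "base_driver": ["cmd_vel"],
    "logger": ["image", "cmd_vel"],
}


def build_edges(publishers, subscribers):
    """Return graph edges as (publisher_node, topic, subscriber_node) tuples."""
    subs_by_topic = defaultdict(list)
    for node, topics in subscribers.items():
        for topic in topics:
            subs_by_topic[topic].append(node)

    edges = []
    for pub_node, topics in publishers.items():
        for topic in topics:
            for sub_node in subs_by_topic[topic]:
                edges.append((pub_node, topic, sub_node))
    return sorted(edges)


for pub, topic, sub in build_edges(publishers, subscribers):
    print(f"{pub} --[{topic}]--> {sub}")
```

Each printed line is one edge of the graph: a stream of messages flowing from one node to another.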

Among all the traffic flying around a busy network, how do nodes find one another, so they can start passing messages? The answer lies in a program called roscore.


roscore is a service that provides connection information to nodes so that they can transmit messages to one another. Every node connects to roscore at startup to register details of the message streams it publishes and the streams to which it wishes to subscribe. When a new node appears, roscore provides it with the information that it needs to form a direct peer-to-peer connection with other nodes publishing and subscribing to the same message topics. Every ROS system needs a running roscore, since without it, nodes cannot find other nodes.

However, a key aspect of ROS is that the messages between nodes are transmitted peer-to-peer. The roscore is only used by nodes to know where to find their peers.

The ROS architecture is a hybrid between a classical client/server system and a fully distributed one, due to the presence of a central roscore that provides a name service for the peer-to-peer message streams.

When a ROS node starts up, it expects its process to have an environment variable named ROS_MASTER_URI. This variable is expected to contain a string of the form http://hostname:11311/, which in this case would imply that there is a running instance of roscore accessible on port 11311 somewhere on a host called hostname that can be accessed over the network.

Port 11311 was chosen as the default port for roscore because it was a palindromic prime that was not being used by other popular applications in the early days of ROS, circa 2007. It has no particular significance. Any unprivileged port number (1024–65535) can be used instead. Different ports can be specified in the roscore startup command and in the ROS_MASTER_URI environment variable to allow multiple ROS systems to coexist on a single network.
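To see what a node has to extract from that variable, here is a small standard-library sketch. The function name master_location is made up for this post, and falling back to 11311 when the URI omits a port is a choice of the sketch, not something I have checked against the ROS client libraries:

```python
import os
from urllib.parse import urlparse

DEFAULT_MASTER_PORT = 11311


def master_location(env=None):
    """Return (hostname, port) parsed from ROS_MASTER_URI."""
    env = os.environ if env is None else env
    uri = env.get("ROS_MASTER_URI")
    if not uri:
        raise RuntimeError("ROS_MASTER_URI is not set")
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError(f"cannot find a hostname in {uri!r}")
    # Assumption of this sketch: use the default port if none is given.
    return parsed.hostname, parsed.port or DEFAULT_MASTER_PORT


# Two ROS systems on one network, told apart by host and port.
print(master_location({"ROS_MASTER_URI": "http://hostname:11311/"}))
print(master_location({"ROS_MASTER_URI": "http://robot2:11312/"}))
```

Passing a dictionary instead of the real environment keeps the example runnable on a machine without ROS installed.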

With knowledge of the location of roscore on the network, nodes register themselves at startup with roscore and then query roscore to find other nodes and data streams by name. Each ROS node tells roscore which messages it provides and which it would like to subscribe to. roscore then provides the addresses of the relevant message producers and consumers.
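The register-then-lookup flow described above can be imitated with a toy in-memory registry. This is only a sketch of the idea: the class ToyMaster, its method names, and the addresses are invented here, and the real roscore speaks a network protocol that this does not attempt to reproduce:

```python
from collections import defaultdict


class ToyMaster:
    """A toy stand-in for roscore's name service (not the real master API)."""

    def __init__(self):
        self._publishers = defaultdict(set)   # topic -> publisher addresses
        self._subscribers = defaultdict(set)  # topic -> subscriber addresses

    def register_publisher(self, topic, address):
        """Record a publisher and return the subscribers it could serve."""
        self._publishers[topic].add(address)
        return sorted(self._subscribers[topic])

    def register_subscriber(self, topic, address):
        """Record a subscriber and return the publishers it should contact."""
        self._subscribers[topic].add(address)
        return sorted(self._publishers[topic])


master = ToyMaster()
master.register_publisher("chatter", "http://host-a:40001")
peers = master.register_subscriber("chatter", "http://host-b:40002")
print(peers)  # the subscriber would now contact these peers directly
```

Note that the registry only hands out addresses. The messages themselves never pass through it, which is the peer-to-peer point made earlier.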

roscore also provides a parameter server, which is used extensively by ROS nodes for configuration. The parameter server allows nodes to store and retrieve arbitrary data structures, such as descriptions of robots, parameters for algorithms, and so on.

catkin, Workspaces, and ROS Packages

catkin is the ROS build system: the set of tools that ROS uses to generate executable programs, libraries, scripts, and interfaces that other code can use.


catkin comprises a set of CMake macros and custom Python scripts to provide extra functionality on top of the normal CMake workflow. CMake is a commonly used open source build system.

There are two files, CMakeLists.txt and package.xml, that you need to add some specific information to in order to have things work properly. You then call the various catkin tools to generate the directories and files you’re going to need as you write code for your robots.


A workspace is simply a set of directories in which a related set of ROS code lives. You can have multiple ROS workspaces, but you can only work in one of them at any one time. The simple way to think about this is that you can only see code that lives in your current workspace.

Start by making sure that you’ve added the system-wide ROS setup script to your .bashrc file or source the file by hand:

source /opt/ros/kinetic/setup.bash

Now, we’re going to make a catkin workspace and initialize it:

mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src
catkin_init_workspace

This creates a workspace directory called catkin_ws (although you can call it anything you like), with a src directory inside it for your code. The catkin_init_workspace command creates a CMakeLists.txt file for you in the src directory, where you invoked it. Next, we’re going to create some other workspace files:

cd ~/catkin_ws
catkin_make

Running catkin_make will generate a lot of output as it does its work. When it’s done, you’ll end up with two new directories: build and devel. build is where catkin is going to store the results of some of its work, like libraries and executable programs if you use C++. devel contains a number of files and directories, the most interesting of which are the setup files. Running these configures your system to use this workspace, and the code that’s (going to be) contained inside it. Assuming you’re using the default command-line shell (bash) and are still in the top-level directory of your workspace, you can do this with:

source devel/setup.bash

Congratulations! You’ve just created your first ROS workspace.

If you open a new shell (or Linux terminal), you have to source the setup.bash file for the workspace you want to work with. If you don’t do this, then the shell won’t know where to find your code. This can be annoying, since it’s an easy thing to forget. One way to get around this if you only have one workspace is to add the source ~/catkin_ws/devel/setup.bash command to your .bashrc file (with the appropriate filename, of course). This will automatically set up your workspace for you when you open a new shell.

ROS Packages

ROS software is organized into packages, each of which contains some combination of code, data, and documentation.

Packages sit inside workspaces, in the src directory. Each package directory must include a CMakeLists.txt file and a package.xml file that describes the contents of the package and how catkin should interact with it. Creating a new package is easy:

cd ~/catkin_ws/src
catkin_create_pkg my_awesome_code rospy

This changes the directory to src (where packages live) and invokes catkin_create_pkg to make the new package called my_awesome_code, which depends on the (already existing) rospy package.

The catkin_create_pkg command makes a directory with the same name as the new package (my_awesome_code) with a CMakeLists.txt file, a package.xml file, and a src directory in it.

The package.xml file contains a bunch of metadata about your new package, including:
1. The name of your package. You shouldn’t change this.
2. The version number.
3. A short description of what’s in the package and what it’s for.
4. The maintainer: who is responsible for maintaining the package and fixing bugs.
5. The license under which you are releasing the code.
6. A URL, often pointing at the ROS wiki page for the package.
7. The authors of the package, with one set of tags per author.
8. The dependencies of the package.
9. An export section, which holds information used by other tools external to catkin.
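To connect that list to an actual file, here is a sketch that reads a hand-written package.xml with Python's standard XML parser. The sample values (version, email, license) are invented, and the dependency tags follow the older build_depend/run_depend style; newer package formats name some of these tags differently, so treat the sample as illustrative only:

```python
import xml.etree.ElementTree as ET

# Hand-written sample; all values are made up for illustration.
SAMPLE_PACKAGE_XML = """<package>
  <name>my_awesome_code</name>
  <version>0.0.0</version>
  <description>The my_awesome_code package</description>
  <maintainer email="user@example.com">user</maintainer>
  <license>BSD</license>
  <buildtool_depend>catkin</buildtool_depend>
  <build_depend>rospy</build_depend>
  <run_depend>rospy</run_depend>
  <export></export>
</package>
"""


def summarize(xml_text):
    """Pull a few of the metadata fields out of a package.xml string."""
    root = ET.fromstring(xml_text)
    # Collect every tag whose name ends in "_depend", without duplicates.
    depends = sorted({el.text for el in root if el.tag.endswith("_depend")})
    return {
        "name": root.findtext("name"),
        "version": root.findtext("version"),
        "license": root.findtext("license"),
        "maintainers": [el.text for el in root.findall("maintainer")],
        "depends": depends,
    }


info = summarize(SAMPLE_PACKAGE_XML)
for key, value in info.items():
    print(f"{key}: {value}")
```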

Once you have created a package, you can put your Python nodes in the src directory. Other files go in directories under the package directory, too. For instance, launch files conventionally go in a directory called launch.


Since ROS has a large, distributed community, its software is organized into packages that are independently developed by community members. A package can be thought of as a collection of resources that are built and distributed together. Packages are just locations in the filesystem, and because ROS nodes are typically executable programs, one could manually cd around the filesystem to start all the ROS nodes of interest.

For example, the talker program lives in a package named rospy_tutorials, and its executable programs are found in /opt/ros/kinetic/share/rospy_tutorials. However, chasing down these long paths would become tiresome in large filesystems, since nodes can be deeply buried in large directory hierarchies. To automate this task, ROS provides a command-line utility called rosrun that will search a package for the requested program and pass it any parameters supplied on the command line.

To run the talker program in the rospy_tutorials package, no matter where one happened to be in the filesystem, one would first start a roscore instance in a terminal emulator window:

roscore

Then, in another terminal window, run:

rosrun rospy_tutorials talker

In the terminal running talker, a sequence of timestamped messages will start printing to the console.

The talker program is the ROS equivalent of the canonical first program whose task is to print “Hello, world!” to the console. In the ROS case, since we are dealing with message streams rather than single statements, talker sends a stream of “hello world” messages 10 times per second, appending the Unix timestamp so that it’s easy to tell that the messages are changing over time. talker prints these messages to the console as well as sending them via ROS to any nodes that are listening.

It is instructive to think about how this is implemented. In Unix, every program has a stream called “standard output,” or stdout . When an interactive terminal runs a “Hello, world!” program, its stdout stream is received by its parent terminal program, which renders the text in a terminal emulator window. In ROS, this concept is extended so that programs have an arbitrary number of streams, connected to an arbitrary number of other programs running on machines anywhere in the network, any of which can start up or shut down at any time.

Therefore, creating a minimal “Hello, world!” system in ROS requires two nodes, with one node sending a stream of string messages to the other. As we have seen, talker will periodically send “hello world” as a text message. Simultaneously, we will start a listener node, which will await new string messages and print them to the console as they arrive. Once both of these programs have advertised themselves to the same roscore, ROS will connect them.

To create this graph on your own computer, you’ll need three terminal windows. The first two, as before, will run roscore and talker, and the third one will run listener:

rosrun rospy_tutorials listener

Hooray! The talker node is now sending messages to the listener node. We can now use some ROS command-line tools to query the system and understand more about what’s happening. One of them is rostopic, an extremely useful tool for introspecting running ROS systems; its simplest and most commonly used subcommand, rostopic list, prints the list of current message topics to the console. Another is the Qt-based graph visualizer, rqt_graph. While leaving the other three terminals open and running (that is, the terminals with roscore, talker, and listener), open a fourth terminal window and launch it:

rqt_graph

This will bring up a window that renders the current ROS graph. The rendering does not refresh automatically: when you add a node to the graph (by running its program via rosrun) or remove one (by terminating its program, e.g., pressing Ctrl-C), click the refresh icon in the upper-left corner of the rqt_graph window, and the graph will be redrawn to represent the current state of the system.

Although rosrun is great for starting single ROS nodes during debugging sessions, most robot systems end up consisting of tens or hundreds of nodes, all running at the same time. Since it wouldn’t be practical to call rosrun on each of these nodes, ROS includes a tool for starting collections of nodes, called roslaunch.

Names, Namespaces, and Remapping

Names are a fundamental concept in ROS. Nodes, message streams (often called “topics”), and parameters must all have unique names. For example, the camera node on a robot could be named camera, and it could output a message topic named image and read a parameter named frame_rate to know how fast to send images.

So far, so good. But, what happens when a robot has two cameras? We wouldn’t want to have to write a separate program for each camera, nor would we want the output of both cameras to be interleaved on the image topic, since that would require all subscribers to image to have logic that separates the image streams.

More generally, namespace collisions are extremely common in robotic systems, which often contain identical hardware or software subsystems to simplify their engineering, such as identical left and right arms, cameras, or wheels. ROS provides two mechanisms to handle these situations: namespaces and remapping.

Namespaces are a fundamental concept throughout computer science. Following the convention of Unix paths and Internet URIs, ROS uses the forward slash (/) to delimit namespaces. Just as two files named readme.txt can exist in separate paths, such as /home/user1/readme.txt and /home/user2/readme.txt, ROS can launch identical nodes into separate namespaces to avoid name collisions.
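The file-path analogy carries over almost literally. Below is a simplified sketch of how a relative name could be resolved against a namespace. The function resolve_name is made up for this post and leaves out parts of the real ROS naming rules, such as private names:

```python
def resolve_name(name, namespace="/"):
    """Resolve a ROS-style graph name against a namespace.

    Names starting with "/" are global and returned unchanged; anything
    else is treated as relative to `namespace`. Private names are left
    out of this sketch.
    """
    if name.startswith("/"):
        return name
    # Normalize the namespace so that "left", "/left" and "/left/" agree.
    namespace = "/" + namespace.strip("/")
    return namespace.rstrip("/") + "/" + name


# The same camera program, started once per namespace.
print(resolve_name("image", "/left"))
print(resolve_name("image", "/right"))
print(resolve_name("/image", "/left"))
```

The first two calls give two distinct topic names for the same relative name image, so the streams from the two cameras no longer collide.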

For example, launching the same camera program once in a namespace called left and once in a namespace called right gives two topics, left/image and right/image. This avoids a topic name collision, but how could we send these data streams to another program that was still expecting to receive messages on the topic image? One answer would be to launch this other program in the same namespace as the first, but perhaps this program needs to “reach into” more than one namespace. Enter remapping.

In ROS, any string in a program that defines a name can be remapped at runtime. As one example, there is a commonly used program in ROS called image_view that renders a live video window of images being sent on the image topic. At least, that is what is written in the source code of the image_view program. Using remapping, we can instead cause the image_view program to render the right/image topic, or the left/image topic, without having to modify the source code of image_view!

Because ROS design patterns try to encourage reuse of software, remapping names is very common when developing and deploying ROS software. To simplify this operation, ROS provides a standard syntax to remap names when starting nodes on the command line. For example, if the working directory contains the image_view program, one could type the following to map image to right/image:

./image_view image:=right/image
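To get a feel for what the old:=new syntax means to a program, here is a toy parser written for this post. The function names and the --verbose flag are hypothetical, and real ROS client libraries handle more cases than this:

```python
def parse_remappings(argv):
    """Split command-line arguments into remappings and ordinary arguments."""
    remappings, others = {}, []
    for arg in argv:
        if ":=" in arg:
            old, new = arg.split(":=", 1)
            remappings[old] = new
        else:
            others.append(arg)
    return remappings, others


def apply_remapping(name, remappings):
    """Return the remapped name, or the original when no rule matches."""
    return remappings.get(name, name)


remappings, rest = parse_remappings(["image:=right/image", "--verbose"])
print(apply_remapping("image", remappings))        # name the program ends up using
print(apply_remapping("camera_info", remappings))  # untouched: no rule for it
```

The source code still says image everywhere; only the lookup table built at startup changes which topic that string refers to.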


roslaunch

roslaunch is a command-line tool designed to automate the launching of collections of ROS nodes. On the surface, it looks a lot like rosrun, needing a package name and a filename.

However, roslaunch operates on launch files, rather than nodes. Launch files are XML files that describe a collection of nodes along with their topic remappings and parameters. By convention, these files have a suffix of .launch. For example, the rospy_tutorials package includes a launch file called talker_listener.launch, which starts the talker and listener nodes together.

In a launch file, each <node> tag includes attributes declaring the ROS graph name of the node, the package in which it can be found, and the type of node, which is simply the filename of the executable program. In talker_listener.launch, the output="screen" attributes indicate that the talker and listener nodes should dump their console outputs to the current console, instead of only to log files. This is a commonly used setting for debugging; once things start working, it is often convenient to remove this attribute so that the console has less noise.
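Since a launch file is plain XML, its structure is easy to inspect with Python's standard library. The XML below is my reconstruction from the description above, not a copy of the file shipped with rospy_tutorials; in particular, the executable names talker.py and listener.py are an assumption:

```python
import xml.etree.ElementTree as ET

# Assumed contents, reconstructed from the description in the text.
# The executable names talker.py and listener.py are a guess.
LAUNCH_XML = """<launch>
  <node name="talker" pkg="rospy_tutorials" type="talker.py" output="screen" />
  <node name="listener" pkg="rospy_tutorials" type="listener.py" output="screen" />
</launch>
"""


def list_nodes(launch_text):
    """Return one dict of attributes per <node> tag in a launch file."""
    root = ET.fromstring(launch_text)
    return [dict(node.attrib) for node in root.iter("node")]


nodes = list_nodes(LAUNCH_XML)
for node in nodes:
    print(f'{node["name"]}: runs {node["type"]} from package {node["pkg"]}')
```

Each dictionary holds the attributes discussed above: name is the graph name, pkg the package, and type the executable.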

roslaunch has many other important features, such as the ability to launch programs on other computers across the network via ssh, to automatically respawn nodes that crash, and so on. The book describes these features as they become necessary to accomplish various tasks. One of the most useful features of roslaunch is that it closes all of its nodes when Ctrl-C is pressed in the console containing roslaunch. Ctrl-C is a common way to force programs to exit on the Linux/Unix command line, and roslaunch follows this convention by closing its collection of launched nodes and then finally exiting roslaunch itself when Ctrl-C is typed into its console. For example, the following command would cause roslaunch to spawn two nodes to form a talker-listener pair, as described in the talker_listener.launch file mentioned earlier:

roslaunch rospy_tutorials talker_listener.launch

And, equally importantly, pressing Ctrl-C would cause the nodes to exit. Virtually every time you use ROS, you’ll be invoking roslaunch and eventually typing Ctrl-C in the roslaunch terminal(s) to create and destroy various collections of nodes.

roslaunch will automatically instantiate a roscore if one does not exist when roslaunch is invoked. However, this roscore will exit when Ctrl-C is pressed in the roslaunch window. If you have more than one terminal open when launching ROS programs, it’s often easier to remember to launch a roscore in a separate terminal, which is left open during the entire ROS session. Then, you can roslaunch and Ctrl-C with abandon in all other consoles, without risk of losing the roscore tying the whole system together.

The Tab Key

The ROS command-line tools have tab-completion support. When using rosrun, for example, hitting the Tab key in the middle of typing a package name will auto-complete it for you; or, if there are multiple potential completions, pressing Tab again will present you with a list of possible completions. As with many other Linux commands, using tab completion with ROS will save you a massive amount of typing, and help avoid spelling errors when trying to type long package or message names.

tf: Coordinate Transforms

One problem that might not be immediately obvious, but is extremely important, is the management of coordinate frames. Seriously, coordinate frames are a big deal in robotics.

Poses, Positions, and Orientations

Your average item-fetching robot will have a bunch of subsystems, such as a mobile base, a laser scanner attached to the base to allow it to navigate through the world, a camera (visual and/or depth) attached elsewhere to the base to find items to be fetched, and a manipulator arm with a hand that will do the actual grabbing of those items. A really good item-fetching robot might have many more features, but these are already plenty to make coordinate frames an important concern.

Let’s start with the laser on the base. To correctly interpret a range scan produced by the laser, we need to know exactly where on the base the laser is attached. Is it mounted at the front of the base? The back? Is it facing backward? Is it mounted upside-down (which is not uncommon)? More generally, we could ask: what are the position and orientation of the laser with respect to the base?

We actually need to be a bit more careful than that, asking: what are the position and orientation of the origin of the laser with respect to the origin of the base? Before we can talk about physical relationships between components on our robot, we need to pick for each component a coordinate frame of reference, or origin. In general, you can choose the origin arbitrarily, though there’s usually a widely used convention that should be followed. For example, a mobile base should have its origin at the geometric centroid of the base, with the positive x-axis pointing forward, the positive y-axis pointing left, and the positive z-axis pointing up (you could have inferred the z-axis direction because we always use right-handed coordinate systems). Other than following such conventions, the important thing is that everyone understands and agrees on (usually via documentation) where each component’s origin is.

Let’s establish some terminology. In our 3D world, a position is a vector of three numbers (x, y, z) that describe how far we have translated along each axis, with respect to some origin. Similarly, an orientation is a vector of three numbers (roll, pitch, yaw) that describe how far we have rotated about each axis, again with respect to some origin. Taken together, a (position, orientation) pair is called a pose. For clarity, this kind of pose, which varies in six dimensions (three for translation plus three for rotation) is sometimes called a 6D pose. Given the pose of one thing relative to another, we can transform data between their frames of reference, a process that usually involves some matrix multiplications.
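Here is what “some matrix multiplications” can look like in practice, as a plain-Python sketch with no ROS involved. It assumes one common convention, R = Rz(yaw) · Ry(pitch) · Rx(roll) with angles in radians; other conventions exist, and the mounting numbers for the laser are invented:

```python
import math


def pose_to_matrix(x, y, z, roll, pitch, yaw):
    """Build a 4x4 homogeneous transform from a 6D pose.

    Assumes roll about x, pitch about y, yaw about z, composed as
    R = Rz(yaw) * Ry(pitch) * Rx(roll). Angles are in radians.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, x],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, y],
        [-sp, cp * sr, cp * cr, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(matrix, point):
    """Express a point given in the child frame in the parent frame."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(row[i] * vec[i] for i in range(4)) for row in matrix[:3])


# Hypothetical mounting: laser 0.2 m ahead of and 0.1 m above the base
# origin, facing backward (yaw of 180 degrees).
laser_in_base = pose_to_matrix(0.2, 0.0, 0.1, 0.0, 0.0, math.pi)

# An obstacle 1 m straight ahead of the laser, expressed in the base frame.
obstacle = transform_point(laser_in_base, (1.0, 0.0, 0.0))
print([round(value, 3) for value in obstacle])
```

Because the laser faces backward, an obstacle 1 m in front of it ends up 0.8 m behind the base origin, which is exactly the kind of detail that goes wrong when the mounting pose is ignored.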

Restating our earlier question, we need to know: what is the pose (of the origin) of the laser with respect to the pose (of the origin) of the base? That’s not all, of course. And if we’re going to use the base-mounted camera to find items in the environment, then we likely need to know the camera’s pose with respect to the base. If we’re going to use the locations of items found by the camera to send goals to the hand, then we further need to know the pose of the camera with respect to the hand. This case is especially interesting because the camera-to-hand relationship might be changing all the time as the arm moves the hand with respect to the camera. Then you have the mobile base moving around in the world (e.g., defined by a map), so there’s a base-to-world relationship that is also constantly changing.

You will, eventually, want to be able to compute the pose of every component of your robot with respect to every other component. Some relationships are static (e.g., a laser bolted to a base), while others are dynamic (e.g., a hand reaching to grasp an item). We need to capture and combine all of these relationships, ideally in such a way that we can easily convert sensor data and actuator commands among them, while doing as little math as possible (because if we do the math ourselves, we’ll just get it wrong). Enter tf.


There are many ways to manage coordinate frames and transforms between them. In ROS, continuing with the philosophy of keeping things small and modular, we take a distributed approach, using ROS topics to share transform data. Any node can be the authority that publishes the current information for some transform(s), and any node can subscribe to transform data, gathering from all the various authorities a complete picture of the robot. This system is implemented in the tf (short for transform) package, which is extremely widely used throughout ROS software.

This approach makes a lot of sense when you consider that there’s usually one place where the information for a given transform is most easily acquired or computed. For example, the driver that talks to a robot arm and has direct access to its joint encoder data might be the best node to publish the information about the transform from the start of the arm to the hand at the other end. Similarly, the node that is performing localization of the base with respect to a map is the best authority for the base-to-world transform.

We need names for coordinate frames. In tf, we use strings. The frame of the laser attached to the base might be called “laser” or, if there’s the potential for confusion, “front_laser”. You can pick any names you like, so long as they’re unique (and you should follow established naming conventions wherever they exist).

We also need a message format to use when publishing information about transforms. In tf, we use tf/tfMessage, sent over the /tf topic. You don’t need to know the details of this message, because you’re unlikely to ever manipulate one manually. It’s enough to know that each tf/tfMessage message contains a list of transforms, specifying for each one the names of the frames involved (referred to as parent and child), their relative position and orientation, and the time at which that transform was measured or computed.

Time turns out to be extremely important when we talk about sensor data and coordinate frames. If you want to combine a laser scan from one second ago with a scan from five seconds ago, then you had better keep track of where that laser was over time and be able to convert the scan data between its one-second-ago pose and its five-seconds-ago pose.

We don’t want every node that works with transform data to reinvent the publishing, subscribing, remembering, or computing of transforms. So, tf also provides a set of libraries that can be used in any node to perform those common tasks. For example, if you create a tf listener in your node, then, behind the scenes, your node will subscribe to the /tf topic and maintain a buffer of all the tf/tfMessage data published by other nodes in the system. Then you can ask questions of tf, like: Where is the laser with respect to the base? Or, where was the hand with respect to the map two seconds ago? Or, how does this point cloud taken from the depth camera look in the frame of the laser? In each case, the tf libraries handle all the matrix manipulations for you, chaining together transforms and going back in time through the buffer as needed.
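To get an intuition for “chaining together transforms and going back in time”, here is a toy buffer in plain Python. It is not the tf API: the class and method names are invented, it works in 2D with (x, y, theta) transforms, it only walks from a frame up to one of its ancestors, and it picks the newest sample at or before the requested time rather than interpolating:

```python
import math
from bisect import bisect_right


def compose(a, b):
    """Compose 2D transforms (x, y, theta): apply b first, then a."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,
    )


class ToyTransformBuffer:
    """Keeps timestamped parent->child transforms, a little like a tf listener."""

    def __init__(self):
        self._parent = {}   # child frame -> parent frame
        self._history = {}  # child frame -> sorted list of (stamp, transform)

    def set_transform(self, parent, child, stamp, transform):
        self._parent[child] = parent
        history = self._history.setdefault(child, [])
        history.append((stamp, transform))
        history.sort(key=lambda item: item[0])

    def _sample(self, child, stamp):
        """Return the newest transform recorded at or before `stamp`."""
        history = self._history[child]
        index = bisect_right([s for s, _ in history], stamp) - 1
        if index < 0:
            raise LookupError(f"no data for {child!r} at time {stamp}")
        return history[index][1]

    def lookup(self, target, source, stamp):
        """Pose of `source` expressed in `target` (an ancestor frame)."""
        result = (0.0, 0.0, 0.0)
        frame = source
        while frame != target:
            if frame not in self._parent:
                raise LookupError(f"{source!r} is not connected to {target!r}")
            result = compose(self._sample(frame, stamp), result)
            frame = self._parent[frame]
        return result


buffer = ToyTransformBuffer()
buffer.set_transform("base", "laser", 0.0, (0.3, 0.0, 0.0))       # static mount
buffer.set_transform("map", "base", 0.0, (0.0, 0.0, 0.0))         # robot at start
buffer.set_transform("map", "base", 5.0, (2.0, 0.0, math.pi / 2))  # after driving

early = buffer.lookup("map", "laser", 1.0)
late = buffer.lookup("map", "laser", 6.0)
print([round(v, 3) for v in early])
print([round(v, 3) for v in late])
```

The two lookups ask the same question at different times and get different answers, because the base moved in between while the laser stayed bolted to it.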

As is often the case for a powerful system, tf is relatively complex, and there are a variety of ways in which things can go wrong. Consequently, there are a number of tf-specific introspection and debugging tools to help you understand what’s happening, from printing a single transform on the console to rendering a graphical view of the entire transform hierarchy.

So that’s it for now. See you later.



