Basics before starting with Robotics — Part 4


So, I had my presentation at the German Research Center for Artificial Intelligence (DFKI) last Thursday. My aim is to work there alongside my Ph.D., as I have heard that it’s possible. Specifically, I have applied for a position in Terrestrial Robotics at the Robotics Innovation Center. I was going through the profile of Dr. Frank Kirchner and found a paper on PAZ, the details of which are as follows:

PAZ is a hierarchical perception software library that allows users to manipulate multiple levels of abstraction in accordance with their requirements or skill level. PAZ is divided into three hierarchical levels including pipelines, processors, and backends. These abstractions allow users to compose functions in a hierarchical modular scheme that can be applied for preprocessing, data augmentation, prediction, and postprocessing of inputs and outputs of machine learning (ML) models. PAZ uses these abstractions to build reusable training and prediction pipelines for multiple robot perception tasks such as 2D keypoint estimation, 2D object detection, 3D keypoint discovery, 6D pose estimation, emotion classification, face recognition, instance segmentation, and attention mechanisms.

The term design stamina hypothesis is used in software engineering to describe the capacity of software to quickly develop additional functionalities given that it contains an appropriate set of internal tools and abstractions. One of the main goals of PAZ is to create internal software structures that satisfy the design stamina hypothesis for perceptual algorithms.

As shown in Figure 1, PAZ focuses on extending multiple models across a diverse set of perception tasks. This broad generality of tasks and models is possible due to the hierarchical API, which allows users to re-use existing functions and construct entirely new ones in a modular scheme.

The main components of each hierarchical level and their corresponding software abstractions are as follows:


Pipelines

The highest API level, pipelines, contains application-ready functions for 2D object detection, 2D keypoint estimation, 6D pose estimation, emotion classification, data augmentation, and image pre-processing. The API allows the user to quickly instantiate out-of-the-box functions that can be applied directly to an image.
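The usage pattern at this level is "instantiate once, call on an image". The sketch below illustrates that pattern with a plain-Python stand-in; `DetectObjects` and the dummy output are assumptions for illustration, not PAZ's real class names or return format.

```python
# Sketch of the pipelines-level usage pattern: instantiate once, call on an image.
# "DetectObjects" and its output format are illustrative stand-ins, not PAZ's
# actual names.

class DetectObjects:
    """Application-ready callable: image in, predictions out."""
    def __call__(self, image):
        # A real pipeline would run pre-processing, a model, and post-processing;
        # here we return a fixed dummy detection to show the interface.
        return {"image": image, "boxes2D": [("person", (0, 0, 10, 10))]}

detect = DetectObjects()            # out-of-the-box instantiation
results = detect([[0, 0], [0, 0]])  # applied directly to an "image"
print(results["boxes2D"])
```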


Processors

The high-level API is useful for rapidly creating applications; however, it might not be flexible enough for the user’s specific purposes. Therefore, PAZ builds its high-level functions using a mid-level API, which allows the user to modify or extend existing pipelines. The abstraction for this mid-level is referred to as a Processor. Processors are meant to perform small computations that can be re-used in other applications or entirely new algorithms. PAZ includes the SequentialProcessor abstraction to sequentially apply processors to a set of inputs. The sequential API reveals some of the flexibility and reusability of PAZ. If, for example, a user wishes to input a dictionary, add a new data-augmentation function, or apply a normalization operation, one would only need to add a new processor. Furthermore, PAZ provides an abstract template class for creating any custom new logic. However, an important consideration is that the user can pass any Python function to a pr.Sequential pipeline and is not constrained to use the Processor base class. Another relevant aspect of the API is that it clearly separates the processing of data into well-defined modules; thus, PAZ creates a programming bias toward distributing computation across multiple simple functions. This allows users with limited experience, either with programming or with a specific new algorithm, to easily adapt, debug, or understand any aspect of the computation.
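The mid-level idea can be sketched in a few lines: small callables chained by a sequential container, where any plain Python function works as a step. The class below mirrors the spirit of PAZ's SequentialProcessor but is an illustrative re-implementation, not PAZ's actual code.

```python
# Minimal sketch of the mid-level idea: small Processor-like callables chained
# by a SequentialProcessor-like container. The name mirrors PAZ's abstraction,
# but this implementation is illustrative, not PAZ's actual code.

class SequentialProcessor:
    def __init__(self):
        self.processors = []

    def add(self, processor):
        # Any Python callable works; the Processor base class is not required.
        self.processors.append(processor)

    def __call__(self, x):
        for processor in self.processors:
            x = processor(x)
        return x

normalize = SequentialProcessor()
normalize.add(lambda image: [pixel / 255.0 for pixel in image])  # plain function
normalize.add(lambda image: [round(pixel, 2) for pixel in image])
print(normalize([0, 51, 255]))  # -> [0.0, 0.2, 1.0]
```

Extending the pipeline (say, with a new augmentation step) is then just one more `add` call, which is the reusability the text describes.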


Backends

Processors allow us to easily compose, compress, and abstract away parameters of functions; however, most processors are built using the low-level API (backend). The backend modules are backend.boxes, backend.image, backend.keypoints, and backend.quaternion. Each of these modules is meant to be expanded or entirely replaced without affecting the functionality of the higher levels. For example, if a camera has its own software API, one could wrap that camera-specific API with the appropriate fields and methods in order to re-use PAZ’s camera utilities, such as real-time prediction visualization or real-time prediction video recording.
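The camera example amounts to an adapter: a thin wrapper that gives a vendor SDK the interface the higher levels expect. The sketch below illustrates that idea; `VendorCamera`, `CameraAdapter`, and the method names are hypothetical, and PAZ's actual camera interface may differ.

```python
# Sketch of wrapping a vendor-specific camera API so higher-level utilities
# (e.g. real-time visualization) can stay unchanged. "VendorCamera" and the
# method names here are hypothetical, not PAZ's actual camera interface.

class VendorCamera:
    """Stand-in for a camera SDK with its own API."""
    def grab_frame(self):
        return [[0, 0], [0, 0]]  # dummy 2x2 frame

class CameraAdapter:
    """Exposes a uniform read() so pipelines never touch the vendor API."""
    def __init__(self, vendor_camera):
        self.vendor_camera = vendor_camera

    def read(self):
        return self.vendor_camera.grab_frame()

camera = CameraAdapter(VendorCamera())
frame = camera.read()
print(len(frame))  # -> 2
```

Swapping in a different camera then only means writing a new adapter; everything above the backend level is untouched.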

Some additional functionality of PAZ includes the following:

Built-in messages

PAZ includes built-in messages for common prediction types made in perceptual systems. These built-in messages include Box2D, Pose6D, and Keypoints3D, and they allow PAZ users to exchange data more easily with other robotic frameworks such as ROS or ROCK without installing any additional software.
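A typed message is just an explicit, framework-independent structure for a prediction, which is what makes the exchange with ROS or ROCK straightforward. The sketch below shows a Box2D-like message; the field names are assumptions for illustration and may not match PAZ's actual definition.

```python
# Sketch of a typed prediction message in the spirit of PAZ's Box2D.
# The field names are assumptions for illustration; PAZ's actual
# message definition may differ.

from dataclasses import dataclass

@dataclass
class Box2D:
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    score: float
    class_name: str

box = Box2D(10, 20, 110, 220, 0.93, "person")
print(box.class_name, box.score)  # -> person 0.93
```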


Dataset loaders

PAZ provides a common interface to load multiple datasets related to object detection, image segmentation, and image classification. The available datasets within PAZ are OpenImages, COCO, VOC, YCB-Video, FAT, FERPlus, and FER2013.
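A "common interface" here means every loader returns samples in one shared format, so downstream processors never care which dataset they came from. The sketch below illustrates that contract; the class and method names are illustrative, not PAZ's actual loader API.

```python
# Sketch of a common dataset-loading interface: every loader returns samples
# in one shared format so downstream processors don't depend on the source.
# Class and method names are illustrative, not PAZ's actual loader API.

class Loader:
    def load_data(self):
        raise NotImplementedError

class TinyClassificationLoader(Loader):
    """Toy loader standing in for e.g. a FER2013-style classification loader."""
    def load_data(self):
        return [{"image": [0, 0], "label": 0},
                {"image": [1, 1], "label": 1}]

samples = TinyClassificationLoader().load_data()
print(len(samples), samples[0]["label"])  # -> 2 0
```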

Automatic batch dispatching

Once a dataset has been loaded, we can pass it to the batch-dispatcher class (SequenceProcessing), along with any built-in or custom function for pre-processing or data augmentation. The batch-dispatcher class instantiates a generator that is ready to be used directly in a training scheme.
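The core of batch dispatching can be sketched as a generator that applies a processing function to each sample and yields fixed-size batches. This is an illustration of the idea only, not PAZ's SequenceProcessing implementation.

```python
# Sketch of automatic batch dispatching: a generator that applies a
# (possibly augmenting) processing function to each sample and yields
# fixed-size batches. An illustration of the idea, not PAZ's implementation.

def batch_dispatcher(samples, process, batch_size):
    batch = []
    for sample in samples:
        batch.append(process(sample))  # pre-processing / augmentation hook
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

double = lambda x: 2 * x  # stand-in for a real processing function
batches = list(batch_dispatcher([1, 2, 3, 4, 5], double, batch_size=2))
print(batches)  # -> [[2, 4], [6, 8], [10]]
```

Because it is a generator, batches are produced lazily, which is what makes it suitable for feeding a training loop directly.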

Some software engineering aspects of PAZ are as follows:

PAZ has only three dependencies: TensorFlow, OpenCV, and NumPy. Furthermore, it has continuous integration (CI) across multiple Python versions (3.5, 3.6, 3.7, and 3.8). PAZ has unit tests for all high-level application functions along with most of the major backend modules, and it currently has a test coverage of 47%. Additionally, PAZ generates documentation automatically from docstrings.

The paper is available here.

I will get the notification regarding my interview in a week or two. That’s it for now. See you later.




MS Thesis Student, CVGL, LUMS
