How I got my MS Thesis Idea

Camera Calibration through Camera Projection Loss

After Cross View Retrieval, while working on Cyclist Detection, I came across some functions as part of scripts provided by the authors of the Tsinghua-Daimler Cyclist Detection Dataset.

Since I wasn’t able to handle small objects well enough to improve cyclist detection, I thought about predicting the camera parameters indirectly while directly converting 2D points to 3D, by embedding the above equations in a CNN with the help of Lambda layers.
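To give a rough idea of the kind of 2D-to-3D conversion involved, here is a minimal sketch. It is not the code from the dataset scripts: it assumes a standard pinhole stereo model, and the function name `backproject_pixel` and its parameter names (focal length `f`, `baseline`, principal point `u0`, `v0`) are hypothetical names chosen for illustration.

```python
from typing import Tuple


def backproject_pixel(u: float, v: float, disparity: float,
                      f: float, baseline: float,
                      u0: float, v0: float) -> Tuple[float, float, float]:
    """Map a pixel (u, v) with a stereo disparity to a 3D point in the camera frame.

    Assumes a standard pinhole stereo model (an assumption for this sketch,
    not necessarily the exact equations used in the dataset scripts):
    f is the focal length in pixels, baseline is the stereo baseline in
    metres, and (u0, v0) is the principal point in pixels.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = f * baseline / disparity   # depth from stereo triangulation
    x = (u - u0) * z / f           # horizontal offset from the optical axis
    y = (v - v0) * z / f           # vertical offset from the optical axis
    return (x, y, z)


point = backproject_pixel(u=700, v=400, disparity=20,
                          f=1000, baseline=0.5, u0=600, v0=300)
print(point)
```

Every step here is plain arithmetic on the camera parameters, which is what makes it possible to express the same computation inside a network.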

But first I had to see whether what I was thinking had already been done or not. For this, I performed a literature review.

The configurations used for different aspects of camera calibration can be summarized in a table.

So after the literature review, I knew that I could go ahead with my idea, which is obviously what I did. The first step was to represent the mathematical equations in the form of Lambda layers.

I named the Lambda layer representation Camera Projection Loss.
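The idea of a loss computed through the projection can be sketched as follows. This is only an illustration under stated assumptions, not the actual network code: it assumes a pinhole stereo model with four parameters, a mean absolute error over the 3D coordinates, and hypothetical names (`Params`, `to_3d`, `projection_loss`). In the real network the same arithmetic would sit inside Lambda layers so that gradients flow back to the predicted parameters.

```python
from typing import Dict, Tuple

# Hypothetical parameter container for this sketch: f, baseline, u0, v0.
Params = Dict[str, float]


def to_3d(u: float, v: float, disparity: float, p: Params) -> Tuple[float, float, float]:
    """Back-project a pixel to 3D with an assumed pinhole stereo model."""
    z = p["f"] * p["baseline"] / disparity
    x = (u - p["u0"]) * z / p["f"]
    y = (v - p["v0"]) * z / p["f"]
    return (x, y, z)


def projection_loss(pixel: Tuple[float, float], disparity: float,
                    predicted: Params, actual: Params) -> float:
    """Mean absolute error between 3D points obtained with predicted and
    ground-truth camera parameters (the error metric is an assumption)."""
    pred_point = to_3d(pixel[0], pixel[1], disparity, predicted)
    true_point = to_3d(pixel[0], pixel[1], disparity, actual)
    return sum(abs(a - b) for a, b in zip(pred_point, true_point)) / 3


actual = {"f": 1000.0, "baseline": 0.5, "u0": 600.0, "v0": 300.0}
predicted = dict(actual, f=800.0)  # same parameters, but a wrong focal length
print(projection_loss((700.0, 400.0), 20.0, predicted, actual))
```

The point of this design is that the parameters are never compared directly: an error in any one of them shows up as an error in the reconstructed 3D point, so the network is penalized through the projection itself.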

After designing the network, we ran into another problem: there weren’t enough camera configurations available with all the required parameters, of which there were 13 in our case. So we decided to generate our own dataset using CARLA. We generated 48 camera configurations across 2 towns, with 24 configurations each. For the real dataset, we used the Tsinghua-Daimler Cyclist Detection Dataset.

Our method performed better than other methods on 7 out of 10 parameters on both synthetic and real data.

So that’s it for now. See you later.
