I recently worked on camera calibration for stereo vision, and studied the topic a bit along the way. What follows is my understanding of it.
A camera projects 3D world points onto the 2D image plane. Calibration is the process of finding the quantities that affect this imaging process.
The problem statement of camera calibration is essentially this: write the projection equations that link the known coordinates of a set of 3D points to their projections in the image, then solve those equations for the camera parameters.
Why it is required
Using a calibrated camera, height and distance of an object can be measured.
Calibration parameters can be used to recover 3D quantitative measures from 2D images.
Precise calibration is required for 3D interpretation of images.
Camera calibration involves 5 intrinsic parameters and 6 extrinsic parameters.
Intrinsic parameters depend only on the camera's own characteristics, while extrinsic parameters depend on the camera's position and orientation.
Intrinsic parameters describe the relationship between camera coordinates and pixel coordinates. They are usually collected into a 3 by 3 matrix, commonly written K.
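For reference, a common way to write the intrinsic matrix under the pinhole model is sketched here. The symbol names are the usual textbook ones and are my assumption: f_x and f_y are the focal length in pixels along each image axis, s is the skew between the axes, and (c_x, c_y) is the principal point.

```latex
K = \begin{bmatrix}
f_x & s   & c_x \\
0   & f_y & c_y \\
0   & 0   & 1
\end{bmatrix}
```

These five entries are the five intrinsic parameters. For most modern sensors s is close to zero.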
The focal length is the distance between the pinhole and the image plane.
Principal point offset
The camera’s “principal axis” is the line perpendicular to the image plane that passes through the pinhole. Its intersection with the image plane is referred to as the “principal point.”
The principal point offset is the location of the principal point relative to the film’s (image plane) origin.
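To make the roles of the focal length and the principal point offset concrete, here is a minimal sketch of how a point in camera coordinates could be turned into pixel coordinates. The function name `camera_to_pixel` and the numbers used are hypothetical, chosen only for illustration, and lens distortion is ignored.

```python
def camera_to_pixel(point_cam, fx, fy, cx, cy, skew=0.0):
    """Project a 3D point given in camera coordinates to pixel coordinates.

    fx, fy : focal length in pixels along the image x and y axes
    cx, cy : principal point offset in pixels
    skew   : skew between the image axes (usually 0)
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # Perspective division by depth, then shift by the principal point.
    u = (fx * x + skew * y) / z + cx
    v = (fy * y) / z + cy
    return u, v


# Hypothetical intrinsics for a 640x480 image.
u, v = camera_to_pixel((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(u, v)  # 520.0 140.0
```

A point lying on the principal axis (x = y = 0) lands exactly on the principal point, which is why the offset appears as a plain addition.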
Extrinsic parameters describe the camera’s location and orientation in the world. They relate the camera's pose to a known reference frame. In matrix form they are written [R|T]: a 3 by 3 rotation matrix R alongside a 3 by 1 translation vector T.
The rotation matrix can also be expressed as a composition of three elementary rotations.
Three rotation angles are required as extrinsic parameters: roll, pitch and yaw.
Rotation around the front-to-back axis is called roll.
Rotation around the side-to-side axis is called pitch.
Rotation around the vertical axis is called yaw.
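The three rotations above can be combined into a single rotation matrix. The sketch below assumes one particular convention, which is my choice and not something fixed by the text: x is the front-to-back axis (roll), y is the side-to-side axis (pitch), z is the vertical axis (yaw), angles are in radians, and the rotations are composed as R = Rz(yaw) · Ry(pitch) · Rx(roll). Other libraries and camera setups use different axis assignments and orders, so check the convention before reusing this.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_from_rpy(roll, pitch, yaw):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]   # roll about x
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]   # pitch about y
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]   # yaw about z
    return matmul3(rz, matmul3(ry, rx))


# A pure 90 degree yaw turns the x axis into the y axis.
R = rotation_from_rpy(0.0, 0.0, math.pi / 2)
for row in R:
    print([round(value, 6) for value in row])
```

Whatever the convention, the result should be orthonormal (R times its transpose is the identity), which is a handy sanity check on calibration output.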
Three values are also required for the translation, one each along the x, y and z axes.
In stereo vision, the baseline is the translation between the two cameras, that is, the distance between their optical centers.
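As an illustration of how rotation and translation act together, the sketch below applies X_cam = R · X_world + T to a point and uses it for a simple stereo pair. It assumes an idealized, rectified rig: both cameras share the same orientation, the left camera sits at the world origin, and the right camera is shifted along x by the baseline. The helper name `world_to_camera` and the 0.12 m baseline are hypothetical.

```python
def world_to_camera(point_world, R, T):
    """Apply the extrinsic parameters: X_cam = R * X_world + T."""
    return tuple(
        sum(R[i][k] * point_world[k] for k in range(3)) + T[i]
        for i in range(3)
    )


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

baseline = 0.12            # metres, hypothetical value
point = (0.3, 0.0, 2.0)    # a world point, expressed in the left camera's frame

# Left camera: world frame and camera frame coincide.
left = world_to_camera(point, IDENTITY, (0.0, 0.0, 0.0))
# Right camera centre is at (+baseline, 0, 0), so T = -R * C = (-baseline, 0, 0).
right = world_to_camera(point, IDENTITY, (-baseline, 0.0, 0.0))

print(left)
print(right)
```

The point keeps the same depth in both cameras and only its x coordinate shifts by the baseline, which is what produces the disparity that stereo matching relies on.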
The combination of intrinsic and extrinsic parameters is known as the projection matrix. It’s a 3 by 4 matrix, obtained by multiplying the 3 by 3 intrinsic matrix K by the 3 by 4 extrinsic matrix [R|T]:
P = K[R|T]
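If we have the 3D coordinates of a point, the calibration parameters map it to 2D image coordinates through x = PX in homogeneous coordinates, where the equality holds up to a scale factor. Going the other way, a 2D image point maps back to a ray in 3D; pinning down the point on that ray needs extra information, such as its depth or a second (stereo) view.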
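Written out in full, the standard pinhole form of this mapping looks as follows. The symbols are my own labels rather than anything fixed above: (u, v) are pixel coordinates, (X, Y, Z) are world coordinates, and λ is the unknown scale factor that comes with homogeneous coordinates.

```latex
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \, [\, R \mid T \,]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
```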
The above representation involves three coordinate systems:
World Coordinate System
Camera Coordinate System
Image Coordinate System
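The chain between these three coordinate systems can be sketched in a few lines: the extrinsics take a point from world to camera coordinates, and the intrinsics take it from camera to image coordinates. This is a minimal illustration with hypothetical values and a hypothetical function name `project`; it ignores lens distortion, which a real calibration would also estimate.

```python
def project(point_world, K, R, T):
    """World -> camera -> image, following x ~ K [R|T] X."""
    # World coordinates -> camera coordinates (extrinsic parameters).
    cam = [sum(R[i][k] * point_world[k] for k in range(3)) + T[i]
           for i in range(3)]
    # Camera coordinates -> homogeneous image coordinates (intrinsic parameters).
    uvw = [sum(K[i][k] * cam[k] for k in range(3)) for i in range(3)]
    if uvw[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Divide out the scale factor to get pixel coordinates.
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


# Hypothetical calibration: 800 px focal length, principal point at (320, 240),
# camera aligned with the world frame and located at its origin.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = (0.0, 0.0, 0.0)

print(project((0.5, -0.25, 2.0), K, R, T))  # (520.0, 140.0)
```

Multiplying K and [R|T] once up front gives the 3 by 4 projection matrix, so the two steps collapse into a single matrix product per point.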
I am currently working on object detection and will try to explain my approach soon, if it works, hopefully.