This section presents the formulation that we use to derive the depth estimate from the disparity data.
We now describe the geometry that defines the relationship between two corresponding pixels and the depth of a 3D point. Let us consider a 3D Euclidean point $(X,Y,Z)^T$ captured by two cameras and the two corresponding projected pixels $p_1$ and $p_2$. Using the camera parameters (see Equation (2.18)), the pixel positions $p_1 = (x_1, y_1, 1)^T$ and $p_2 = (x_2, y_2, 1)^T$ can be written as

$$\lambda_i \, p_i = K_i R_i \big( (X,Y,Z)^T - C_i \big) \quad \text{for } i \in \{1,2\},$$

where $K_i$, $R_i$ and $C_i$ denote the internal parameter matrix, the rotation matrix and the center of camera $i$, respectively, and $\lambda_i$ is a non-zero scaling factor.
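As a concrete illustration of this projection model, the following Python sketch projects a single 3D point onto one camera's image plane. All numeric values (focal length, principal point, rotation and camera center) are made-up for illustration and are not taken from the text:

```python
import numpy as np

# Illustrative internal parameters: focal length f (in pixels) and
# principal point (o_x, o_y). These values are assumptions.
f, o_x, o_y = 800.0, 320.0, 240.0
K = np.array([[f,   0.0, o_x],
              [0.0, f,   o_y],
              [0.0, 0.0, 1.0]])

theta = np.deg2rad(5.0)                        # small rotation about the Y axis
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
C = np.array([0.2, 0.0, 0.0])                  # camera center

P = np.array([1.0, 0.5, 5.0])                  # 3D point (X, Y, Z)^T

# Pinhole projection: the result is only defined up to the scale factor
# lambda, so we divide by the third coordinate to obtain p = (x, y, 1)^T.
p_tilde = K @ R @ (P - C)
p = p_tilde / p_tilde[2]
print(p)                                       # homogeneous pixel position
```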
Figure 3.2: Two aligned cameras capturing rectified images can be employed to estimate the depth using triangulation.
The previous relation can be simplified by considering the restricted case of two rectified views, for which the following assumptions can be made. First, without loss of generality, the world coordinate system is selected such that it coincides with the coordinate system of camera 1 (see Figure 3.2), so that $C_1 = 0_3$ and $R_1 = I_{3\times3}$. Second, because the images are rectified, both rotation matrices are equal: $R_1 = R_2 = I_{3\times3}$. Third, camera 2 is located on the $X$ axis: $C_2 = (t_{x2}, 0, 0)^T$. Fourth and finally, both cameras are identical, so that the internal camera parameter matrices are equal: $K_1 = K_2 = K$, with focal length $f$. Under these assumptions, the projection relation reduces to

$$p_2 = p_1 - \left( \frac{f \cdot t_{x2}}{Z},\; 0,\; 0 \right)^T. \qquad (3.4)$$
Equation (3.4) provides the relationship between two corresponding pixels and the depth $Z$ of the 3D point (for the simplified case of two rectified views). The quantity $f \cdot t_{x2}/Z$ is typically called the disparity. In practice, the disparity corresponds to the parallax motion of objects.¹
¹ The parallax motion is the motion of objects that are observed from a moving viewpoint. This can be illustrated by the example of a viewer sitting in a moving train: the motion parallax of the foreground grass along the train tracks is larger than that of a tree far away in the background.
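A small numeric check can make Equation (3.4) concrete. The sketch below projects one 3D point through two rectified cameras (identity rotations, camera 2 shifted by $t_{x2}$ along the $X$ axis) and verifies that the two pixel positions differ only by the disparity $f \cdot t_{x2}/Z$; the focal length, baseline and 3D point are illustrative values:

```python
import numpy as np

f, o_x, o_y = 800.0, 320.0, 240.0   # illustrative internal parameters
K = np.array([[f,   0.0, o_x],
              [0.0, f,   o_y],
              [0.0, 0.0, 1.0]])

t_x2 = 0.1                          # camera 2 center: (t_x2, 0, 0)^T
P = np.array([1.0, 0.5, 5.0])       # 3D point with depth Z = 5

def project(P, C):
    """Rectified case: R = I, so the projection is p ~ K (P - C)."""
    p = K @ (P - C)
    return p / p[2]

p1 = project(P, np.zeros(3))                  # camera 1 at the world origin
p2 = project(P, np.array([t_x2, 0.0, 0.0]))   # camera 2 on the X axis

# Equation (3.4): p2 = p1 - (f * t_x2 / Z, 0, 0)^T.
disparity = f * t_x2 / P[2]
assert np.isclose(p1[0] - p2[0], disparity)   # x-offset equals the disparity
assert np.isclose(p1[1], p2[1])               # y-coordinates coincide
```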
It can be noted that the disparity is inversely proportional to the depth, so that a small disparity value corresponds to a large depth. To emphasize the difference between both quantities, the following two terms are used distinctly throughout this thesis (a short numeric sketch of the inverse relation is given after the definitions).
Disparity image/map: an image that stores the disparity values of all pixels.
Depth image/map: an image that represents the depth of all pixels.
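The following sketch illustrates the inverse relation by converting a toy disparity map into a depth map via $Z = f \cdot t_{x2} / d$; the focal length and baseline are hypothetical values chosen for illustration:

```python
import numpy as np

f = 800.0      # focal length in pixels (illustrative value)
t_x2 = 0.1     # baseline in metres (illustrative value)

# Toy disparity map; in practice it is estimated from a rectified pair.
disparity = np.array([[16.0, 32.0],
                      [ 8.0,  4.0]])

# Depth is inversely proportional to disparity: Z = f * t_x2 / d.
# Zero disparity corresponds to a point at infinite depth.
depth = np.where(disparity > 0.0,
                 f * t_x2 / np.maximum(disparity, 1e-9),
                 np.inf)
print(depth)   # the smallest disparity (4 px) yields the largest depth (20 m)
```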
The reason that we emphasize this difference so explicitly is that we exploit it in the sequel of this thesis. Typically, a depth image is estimated by first computing a disparity image from two rectified images and afterwards converting the disparity values into depth values. In the second part of this chapter, we show that such a two-stage computation can be circumvented by estimating the depth directly, using an alternative technique based on an appropriate geometric formulation of the framework.
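For reference, here is a minimal sketch of that conventional two-stage pipeline. It assumes a rectified image pair on disk and uses OpenCV's block-matching stereo as one possible disparity estimator (the text does not prescribe a particular matcher); the file names, focal length and baseline are placeholders:

```python
import cv2
import numpy as np

# Stage 1: estimate a disparity image from two rectified views.
# "left.png" and "right.png" are placeholder names for a rectified pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
# StereoBM returns fixed-point disparities scaled by a factor of 16.
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Stage 2: convert the disparity image into a depth image, Z = f * t_x2 / d.
f, t_x2 = 800.0, 0.1   # illustrative focal length (px) and baseline (m)
depth = np.where(disparity > 0.0,
                 f * t_x2 / np.maximum(disparity, 1e-9),
                 0.0)
```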
The link between the disparity and the depth is as follows: