Don Herbison-Evans, donherbisonevans@yahoo.com

updated 24 September 2005, 27 May 2017

We all learn early in life that images of nearer objects appear larger than images of more distant objects. This has led to the science of Perspective, and is epitomised by the mathematical technique known as the Perspective Projection.

The basic way of making images of objects that are distant along the z axis appear smaller than those nearer to the origin is to divide the coordinates of every point in each object by that point's value of z. This projects each point, along the straight line joining it to an eyepoint at the origin, onto a screen at z=1. Mathematically, this can be subsumed under the use of homogeneous coordinates by:

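- transforming all the coordinates (x, y, z, h), with h=1, with the matrix:

  $$
  \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
  $$

- then dividing all four of the resulting coordinates by the fourth one (thus ensuring that the fourth one is unity)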
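The two steps above can be sketched in a few lines of Python. This is only an illustration: it assumes the basic matrix is the one that exchanges z with the fourth coordinate h, and the names `PERSPECTIVE`, `mat_vec` and `perspective` are invented for this example. Exact fractions are used so that the results are not blurred by rounding.

```python
from fractions import Fraction

# Assumed form of the basic perspective matrix: it exchanges z and the
# fourth (homogeneous) coordinate h, and leaves x and y alone.
PERSPECTIVE = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def perspective(point):
    """Apply the matrix to (x, y, z, 1), then divide by the fourth coordinate."""
    x, y, z = (Fraction(c) for c in point)
    hx, hy, hz, hh = mat_vec(PERSPECTIVE, [x, y, z, Fraction(1)])
    if hh == 0:
        # The point lies in the z=0 plane through the eyepoint: no image.
        raise ZeroDivisionError("point lies in the plane z=0")
    return (hx / hh, hy / hh, hz / hh)

# A point twice as far away as the screen has its x and y halved.
print(perspective((2, 4, 2)))
```

After the division the x and y values are the screen coordinates, and the third value is 1/z.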

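To see what this matrix does, it helps to look first at the 2D matrix that exchanges x and y. That matrix may be written as a product:

$$
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
=
\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
$$

so for operation on column vectors to the right of the matrix, it can represent a rotation of 90 degrees in the x-y plane followed by a reflection along the x axis.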

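Similarly, the perspective matrix may be viewed as a product:

$$
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
$$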

so it can represent a rotation of 90 degrees in the z-h (4th dimension) plane followed by a reflection along the z axis. It actually performs a transformation of one 3D space into another, sending (x, y, z) to (x/z, y/z, 1/z). The space from z=1 to z=infinity is transformed to fit into the space between z=1 and z=0, and vice versa. This may be understood perhaps as the effect of the rotation of 90 degrees in the z-h plane. Points on the z=1 plane transform into themselves. Points with negative z keep a negative z, because 1/z has the same sign as z, but have their x and y values negated. This may be understood as the effect of dividing by a negative fourth coordinate.
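The factorisation can be checked numerically. The sketch below assumes the rotation sends (z, h) to (-h, z) and the reflection sends z to -z. The names `ROTATE_ZH`, `REFLECT_Z` and `transform` are invented for this illustration.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Rotation of 90 degrees in the z-h plane: (z, h) -> (-h, z).
ROTATE_ZH = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
# Reflection along the z axis: z -> -z.
REFLECT_Z = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]

# Column vectors sit to the right, so the rotation is the right-hand factor.
PERSPECTIVE = mat_mul(REFLECT_Z, ROTATE_ZH)

def transform(point):
    """Apply PERSPECTIVE to (x, y, z, 1) and divide by the fourth coordinate."""
    x, y, z = (Fraction(c) for c in point)
    v = [x, y, z, Fraction(1)]
    out = [sum(PERSPECTIVE[i][j] * v[j] for j in range(4)) for i in range(4)]
    return tuple(c / out[3] for c in out[:3])

# Depths from z=1 outwards are squeezed into the interval from 1 down to 0.
for z in (1, 2, 10, 1000):
    print(z, "->", transform((0, 0, z))[2])
```

The product comes out as the matrix that simply exchanges z and h, and points on the plane z=1 are left where they are.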

One curious effect is that objects that straddle the z=0 plane are split into two parts.

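Different eyepoints, viewing planes, and viewing directions can all be accommodated by performing an appropriate similarity transformation on this basic perspective matrix, that is, by multiplying it on one side by a matrix that moves the desired view into the standard position and on the other side by the inverse of that matrix.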
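As a sketch of such a similarity transformation, the code below moves the eyepoint to an arbitrary position using a translation and its inverse. It assumes the basic matrix is the one that exchanges z and h, and it keeps the viewing direction along z. The helper names (`translation`, `perspective_from`, `apply`) are invented for this example.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(dx, dy, dz):
    """Homogeneous matrix that shifts points by (dx, dy, dz)."""
    return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]]

# Assumed basic perspective matrix (eyepoint at the origin, screen at z=1).
BASIC = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def perspective_from(eye):
    """Similarity transform T * BASIC * T^-1, with T the shift to the eyepoint."""
    ex, ey, ez = eye
    to_eye = translation(ex, ey, ez)
    from_eye = translation(-ex, -ey, -ez)   # inverse of to_eye
    return mat_mul(to_eye, mat_mul(BASIC, from_eye))

def apply(m, point):
    """Apply a 4x4 matrix to (x, y, z, 1) and divide by the fourth coordinate."""
    v = [Fraction(c) for c in point] + [Fraction(1)]
    out = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return tuple(c / out[3] for c in out[:3])

# Eyepoint two units behind the origin: the screen is then the plane z=-1.
m = perspective_from((0, 0, -2))
print(apply(m, (2, 4, 0)))
```

A rotation and its inverse could be used in the same way to change the viewing direction.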

The transformation becomes a projection onto the z=1 viewplane by simply discarding all but the x and y values of the coordinates, which then have a 2D origin at the intersection of the z axis with this plane. Further transformations can accommodate other requirements for these two-dimensional coordinates.
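One such further transformation is the mapping from viewplane coordinates to pixels. The sketch below is purely illustrative: the window size, the `half_width` of the visible part of the viewplane, and the convention that row 0 is at the top are all assumptions made for this example.

```python
def project(point):
    """Perspective divide, keeping only the x and y values on the z=1 plane."""
    x, y, z = point
    return (x / z, y / z)

def to_pixels(x, y, width=640, height=480, half_width=1.0):
    """Map viewplane (x, y) to (column, row) with square pixels.

    Assumed conventions: the viewplane origin lands at the centre of the
    window, x in [-half_width, +half_width] spans the full width, and
    row 0 is the top row, so y is flipped.
    """
    scale = width / (2.0 * half_width)
    column = width / 2.0 + x * scale
    row = height / 2.0 - y * scale
    return (column, row)

print(to_pixels(*project((2.0, 1.0, 4.0))))
```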

The transformation is projective rather than affine: it transforms straight lines into straight lines, but it does not in general keep parallel lines parallel. The preservation of straight lines can be understood because transforming the coefficients of a flat plane gives another flat plane. As any straight line can be written as the intersection of two such planes, the intersection of the two transformed planes will still be a straight line.
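The preservation of straight lines can be checked on a particular example. The sketch below takes three equally spaced points on a line, transforms them as (x/z, y/z, 1/z), and tests the images for collinearity with a cross product. The sample line and the helper names are arbitrary choices for this illustration.

```python
from fractions import Fraction

def perspective(p):
    """Basic perspective transformation (x, y, z) -> (x/z, y/z, 1/z)."""
    x, y, z = (Fraction(c) for c in p)
    return (x / z, y / z, 1 / z)

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def collinear(a, b, c):
    """Three points are collinear when (b - a) x (c - a) vanishes."""
    return cross(sub(b, a), sub(c, a)) == (0, 0, 0)

# Three equally spaced points on a line that stays in front of the eye.
start, step = (1, 2, 1), (3, -1, 2)
points = [tuple(s + k * d for s, d in zip(start, step)) for k in range(3)]
images = [perspective(p) for p in points]

print(collinear(*points), collinear(*images))
```

The images are still collinear, although the image of the middle point is no longer midway between the other two.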

For points between z=1 and z=infinity, the transformation inverts the z order, so that points nearer to infinity are transformed to be nearer to zero. Hence in computer graphics it is convenient also to reflect the result along the z axis, moving these points to the volume between z=-1 and z=0, and then to add 1 to the z coordinates to make them positive and bounded between 0 and +1. Conventional hidden line and hidden surface algorithms, which select for display those items with lower z values, can then be used. This reflection of course cancels the effect of the z reflection component in the product, leaving the provision of perspective as the z-h rotation followed by a translation of 1 along the z axis:

$$
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
$$
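A short sketch of this variant follows. It assumes the rotation sends (z, h) to (-h, z) and builds the combined matrix as the translation times the rotation. The names `TRANSLATE_Z`, `DEPTH_PERSPECTIVE` and `transform` are invented for this example.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Rotation of 90 degrees in the z-h plane: (z, h) -> (-h, z).
ROTATE_ZH = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
# Translation of 1 along the z axis.
TRANSLATE_Z = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]

DEPTH_PERSPECTIVE = mat_mul(TRANSLATE_Z, ROTATE_ZH)

def transform(point):
    """Apply the matrix to (x, y, z, 1) and divide by the fourth coordinate."""
    x, y, z = (Fraction(c) for c in point)
    v = [x, y, z, Fraction(1)]
    out = [sum(DEPTH_PERSPECTIVE[i][j] * v[j] for j in range(4))
           for i in range(4)]
    return tuple(c / out[3] for c in out[:3])

# Depth becomes 1 - 1/z: 0 on the screen, approaching 1 far away,
# so nearer points keep the lower depth values.
depths = [transform((0, 0, z))[2] for z in (1, 2, 5, 100)]
print(depths)
```

The x and y values are the same as with the basic matrix; only the depth value differs.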

Thus various texts confusingly give quite different matrices for the perspective projection; which one they offer depends on which subsequent transformations the authors consider desirable. Nevertheless, the basic transformation is the same.

This mathematical formulation of the perspective transformation does have two minor problems.

The first is that the perspective is for viewing from only one point in space: the eyepoint at the centre of the projection, which in this case is the origin of the coordinates. If the observer's eye is placed anywhere else when viewing the image, the image is incorrect, and the 3D effect is strained.

The second problem is that the perspective image is projected mathematically onto a flat plane, whereas the retina of the eye is approximately hemispherical. This leads to a distortion around the periphery of images generated by the flat perspective projection. For example, imagine looking at a pair of railway lines while standing at the gate of a level crossing. The lines will converge towards vanishing points at both your left and your right. A little thought will then show that, for this to happen, the projections of the lines must be curved. Our eyes do not actually perform a projective transformation onto a plane but do something more complicated. Mathematical perspective is only an approximation to real life.
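The railway example can be made concrete with a rough sketch. A single rail is taken to run left to right across the view, below eye level and some distance ahead. Its flat-plane image is compared with its viewing direction expressed as two angles, which is used here only as a simple stand-in for an image on a spherical surface, not as a model of the eye. The eye height and distance are arbitrary assumed values.

```python
import math

def flat_image(t, height=1.5, distance=3.0):
    """Flat projection onto z=1 of the rail point (t, -height, distance)."""
    return (t / distance, -height / distance)

def angular_image(t, height=1.5, distance=3.0):
    """Direction of the same rail point as (azimuth, elevation) in radians."""
    azimuth = math.atan2(t, distance)
    elevation = math.atan2(-height, math.hypot(t, distance))
    return (azimuth, elevation)

for t in (-1000.0, -10.0, 0.0, 10.0, 1000.0):
    print(t, flat_image(t), angular_image(t))
```

In the flat image the rail stays at a constant height however far it runs to either side, so it never approaches the horizon. In the angular description its elevation rises towards zero on both the left and the right, which is the curved path the text describes.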