Saturday, December 03, 2011

Reason for homogeneous (4D) coordinates in computer graphics

Homogeneous coordinates are used computer graphics - you can read this statement in every 3D computer graphics related book or article. If you ever asked yourself why this is the case, then you are at the right place...
The reason for this is to handle rotation, scaling and translation in a common way. While you can handle rotation and scaling using 3x3 matrices and vectors with 3 components, translation cannot be processed - unfortunately this is normally not further explained although it is quite easy to see:
Suppose we have a point p with coordinate (x, y, z) and we want to translate it by a distance defined by the translation vector t=(tx, ty, tz).
Of course, what you need to do to get new position p' of point p after translation is to add t to p:
p' = p + t, which means
p'.x = p.x + tx
p'.y = p.y + ty
p'.z = p.z + tz
If you try to create a 3x3 matrix M to perform this operation by p' = M * p, you get into trouble. What we require is:

x'   x + tx   |a b c|   |x|
y' = y + ty = |d e f| * |y|
z'   z + tz   |g h i|   |z|

x + tx = ax + by + cz (1)
y + ty = dx + ey + fz
z + tz = gx + hy + iz

Looking at equation (1) we directly see that a = 1 and we are left with tx = by + cz. But the translation in x-direction must be indepent of y and z coordinates, so b and c must be zero.

x + tx = 1*x + 0*y + c*z = x   => tx = 0
So only the primitive translation by zero is possible. The same applies for the equations for y and z.

Using homogeneous coordinates, that is to add a forth component w = 1 to each point, we can derive a 4x4 matrix for translation without any problems.


x + tx   |a b c d|   |x|
y + ty = |e f g h| * |y|
z + tz   |i j k l|   |z|
1        |m n o p|   |1|

we get e.g. for x-coordinate:

x + tx = ax + by + cz + d, 
so a = 1 and d = tx which leads us to the final translation matrix:

    |1 0 0 tx|
M = |0 1 0 ty|
    |0 0 1 tz|
    |0 0 0 1 |

That's it :-)