Core Animation's 3D Model

2D planes in a 3D world, simples.

Core Animation was publicly released along with Mac OS X 10.5 (Leopard) and has been extensively used by many products ever since. In order to take full advantage of the framework, it’s good to know its projection model.

You might have seen several apps which make use of its 3D properties to do some nifty effects (like flipping visual elements, etc). If you tried to recreate those effects, you might have stumbled upon advice similar to:

Set the m34 component of the transformation matrix to a small negative value (e.g., -1.0 / 2000.0) and you will get the desired effect.

Advice like that might satisfy some people, but it hints at a lack of understanding of what’s going on behind the scenes and of why it achieves the desired effect.

I will try to shed some light on the details of the projection model employed by Core Animation.

3D Rendering

Before we get started, I assume that you have some familiarity with Core Animation, 3D models, transformations and coordinate systems.

From a very high level, 3D rendering frameworks perform the task of taking 3D points and producing a 2D representation of the scene. The objects in the virtual world usually undergo several transformation stages in a pipeline - OpenGL, for example, has 4 of them. You can think of the pipeline as a black box that takes those points and projects them onto a 2D plane (which gets drawn on screen). Fortunately, the transformations have geometric interpretations, which makes it a lot easier for us to reason about and visualise their effects.

In the world of Core Animation, we have layers that are 2D planes living in a 3D space - their position is defined by the frame and zPosition properties (technically, frame is a calculated property; check out the documentation for the details). The programmer is responsible for specifying the positions of the layers, and Core Animation takes care of rendering them.
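As a minimal sketch (the layer names are made up), placing a layer in its superlayer’s 3D space looks like this:

    import QuartzCore

    // A layer is a 2D plane: frame places it on the XY plane of its
    // superlayer's coordinate system, zPosition moves it along the Z axis.
    let card = CALayer()
    card.frame = CGRect(x: 0, y: 0, width: 100, height: 80)
    card.zPosition = -50

    let container = CALayer()
    container.addSublayer(card) // card's coordinates are relative to container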

(World) Coordinate System

Core Animation uses a left-handed coordinate system with the Z-axis’ positive space extending from the screen towards the viewer (same as OpenGL). Now, this is crucial - each layer defines a “world” (its own coordinate system), and the positions of its sublayers are relative to that world.

The Camera

If you imagine an abstract 3D space containing 2D planes, there are infinitely many points from which you can look at the scene. In OpenGL, you can set the camera explicitly, but in Core Animation there’s no obvious way to do it. Fortunately, there’s a way to adjust the camera position, which we will cover further down.

Keep in mind that the notion of a camera is just another layer of abstraction - it’s just an affine transformation that gets appended to the stack of transforms.

Transforms

Before we delve into projecting a 3D scene onto a 2D plane, we need to quickly recap 3D transforms.

Affine transformations in N-dimensional space use an (N+1) x (N+1) matrix - the reason being that this allows us to perform translation, which is not possible with a plain N x N matrix (translation is not a linear map, so it cannot be expressed as an N x N matrix multiplication).

A problem arises if we use (N+1) x (N+1) transform matrices coupled with N-dimensional points - how are those transforms applied? We use homogeneous coordinates - you can think of it as “lifting” a point into another dimension. For example, the 2D coordinate (5, 3) has a homogeneous coordinate of (5, 3, 1) - you can visualise all points in the original 2D world as living on the Z = 1 plane in the 3D world. It’s the same principle for 3D coordinates - we can transform them to live on the W = 1 plane in 4 dimensions, so that (2, 5, 3) becomes (2, 5, 3, 1).

The advantage is that you can now apply the transform matrices to those homogeneous coordinates. It’s very important to “scale” them back afterwards to the same (N+1)th plane (so that the transformed coordinates map directly back to our original N-space). Quick example - let’s say that (2, 5, 3, 1) gets transformed to (6, 8, 18, 2); we need to divide by 2 in order to get (3, 4, 9, 1), so that we can use the 3D coordinate (3, 4, 9) as our transformed point.
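Here is a small sketch (in Swift, using the simd module) of the lift / transform / normalise round trip described above; the doubling matrix is an arbitrary example, not anything Core Animation uses:

    import simd

    // Lift a 3D point onto the W = 1 plane, apply a 4x4 transform
    // (column-vector convention here: matrix * vector), then scale
    // back so that w == 1 again.
    func transform(_ m: simd_double4x4, _ p: SIMD3<Double>) -> SIMD3<Double> {
        let lifted = SIMD4<Double>(p.x, p.y, p.z, 1)
        let t = m * lifted
        return SIMD3<Double>(t.x, t.y, t.z) / t.w
    }

    // A matrix that merely doubles every component maps (2, 5, 3, 1)
    // to (4, 10, 6, 2); after normalising we are back at (2, 5, 3),
    // i.e., all scalar multiples denote the same point.
    let doubled = simd_double4x4(diagonal: SIMD4<Double>(2, 2, 2, 2))
    print(transform(doubled, SIMD3<Double>(2, 5, 3))) // (2.0, 5.0, 3.0)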

There are plenty of resources about homogeneous coordinates on the net, so feel free to dig a bit further if the explanation above was not sufficient.

3D Projection

Now, let’s do a quick recap. We have 2D planes in a 3D space and we want to render them on a 2D surface (our screen). What’s the model (series of transformations) that Core Animation performs to display results on the screen?

Orthographic Projection

Core Animation uses a type of projection called orthographic - you can think of it as projecting the 2D planes from infinity without taking the Z distance into account (see this page for a visualisation). Orthographic projection is the reason why, by default, changing the Z position of layers has no effect on their rendered size - no matter how far away they are, their size never changes (you certainly don’t expect this behaviour in the real world).

The orthographic projection matrix can be defined as:

(1 0 0 0)
(0 1 0 0)
(0 0 0 0)
(0 0 0 1)

The effect of this matrix is that it completely ignores the Z coordinates of the objects which are rendered, which is what we see in Core Animation.
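Just to make this concrete, here is what that matrix looks like as a CATransform3D - not something you ever install yourself, since Core Animation performs the orthographic step for you:

    import QuartzCore

    // Start from the identity and zero out m33: the Z coordinate is
    // simply discarded, so (x, y, z, 1) maps to (x, y, 0, 1).
    var orthographic = CATransform3DIdentity
    orthographic.m33 = 0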

It is very important to note at this point that the framework always uses the orthographic projection in one of its final pipeline stages. In order to achieve a more realistic rendering, we can add a transform that modifies the 3D points before they reach Core Animation’s orthographic projection pipeline stage. How can we do this? Conveniently, CALayer has a sublayerTransform property which is documented as:

This property is typically used as the projection matrix to add perspective and other viewing effects to the receiver. Defaults to the identity transform.

We can just simulate perspective by setting the matrix to the standard one-point perspective transformation matrix.

Camera Position

Our sublayer transformation matrix is applied to camera (eye) coordinates, which we never actually see or set (layers’ positions are specified using world coordinates). Note that:

  1. The projection plane’s size and position (in world coordinates) are defined by the layer’s bounds property.
  2. The “camera” is positioned at the anchor point of the projection plane using the standard left-handed coordinate system.

Having defined the projection plane’s dimensions and position, and knowing where the “camera” is, we can just apply the standard one-point perspective transform, which will essentially map all 3D points onto a 2D projection plane at some distance from the “camera”.

The standard one-point perspective projection matrix for a plane at distance D away from the camera (and away from the viewer as well) can be defined as:

(1 0 0 0)
(0 1 0 0)
(0 0 1 -1/D)
(0 0 0 0)
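As a sketch, here is the same matrix expressed as a CATransform3D and installed on a superlayer (the 500-point distance and the layer name are arbitrary choices):

    import QuartzCore

    let D: CGFloat = 500

    // Identity, except m34 = -1/D and m44 = 0 - exactly the matrix above.
    var perspective = CATransform3DIdentity
    perspective.m34 = -1 / D
    perspective.m44 = 0 // flattens everything onto the Z = -D plane

    let scene = CALayer() // the superlayer hosting the 3D scene
    scene.sublayerTransform = perspective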

This matrix will map all points onto the Z=-D plane. If you remember, Core Animation will then apply an orthographic projection and capture the contents of the projection plane. One way to think about its effect is to imagine the existence of a 2D plane at -D units: a layer positioned at Z=-D with frame (0, 0, 100, 80) will look exactly the same as a layer positioned at Z=0 with frame (0, 0, 100, 80) under the default identity sublayerTransform. There are a couple of reasons why you might want to use the standard one-point perspective transform:

  1. Rotations will look realistic and show perspective.
  2. Altering the Z positions of the layers will behave as expected - the further away layers are (along the Z axis), the smaller they will look.

A problem with using the transformation matrix above is that it maps all points onto a single 2D plane which means that Core Animation won’t be able to properly draw overlapping layers.

Demystifying m34

I have seen a lot of code floating around that simply sets the m34 component of an identity CATransform3D to a small negative value like -1.0 / 2000.0.
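As a sketch, the snippet in question usually looks something like this (the layer name is illustrative):

    import QuartzCore

    var t = CATransform3DIdentity // note: m44 stays at 1.0
    t.m34 = -1.0 / 2000.0

    let container = CALayer()
    container.sublayerTransform = t

You might notice that this looks very similar to the standard one-point perspective projection - but what about the extra 1.0 in m44? How can we think about this transformation?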

The answer is quite simple - geometrically, setting the m34 component of the 3D identity transformation matrix to -1.0 / D shifts the projection plane to Z=0. In other words, a layer with Z=0 and frame (0, 0, 100, 80) and a superlayer sublayerTransform matrix equal to

(1 0 0 0)
(0 1 0 0)
(0 0 1 -1/D)
(0 0 0 1)

would be rendered in exactly the same way as a layer with Z=-D and frame (0, 0, 100, 80) and a superlayer sublayerTransform matrix equal to

(1 0 0 0)
(0 1 0 0)
(0 0 1 -1/D)
(0 0 0 0)

Another way to think about it is in terms of offsets - each layer gets an implicit offset of -D added to its Z position, and layers are rendered as if a projection plane at Z=-D were used. The advantage of using the former matrix is that it preserves the correct Z-ordering.

Let’s prove that the above two transformations produce the same X and Y coordinates for an arbitrary point (remember how orthographic projections ignore the Z component?).

Take the same point twice, once at Z=z and once at Z=z-D, i.e., (x, y, z) and (x, y, z - D). Lifting them into 4D produces (x, y, z, 1) and (x, y, z - D, 1). Applying the former transform to the first point and the latter transform to the second yields (x, y, z, 1 - z/D) and (x, y, z - D, (D - z)/D). Now we need to normalise those coordinates (i.e., ensure w = 1), which results in (xD / (D - z), yD / (D - z), zD / (D - z), 1) and (xD / (D - z), yD / (D - z), -D, 1). As you can see, projecting both using the orthographic matrix produces the exact same 2D point on screen.
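If you prefer a numeric spot check, the following sketch (with arbitrary values) evaluates both transforms directly:

    // Former matrix (m44 = 1) applied to (x, y, z, 1) gives w = 1 - z/D;
    // latter matrix (m44 = 0) applied to (x, y, z - D, 1) gives w = (D - z)/D.
    let (x, y, z, D) = (10.0, 20.0, 100.0, 500.0)

    let w1 = 1 - z / D
    let p1 = (x / w1, y / w1)

    let w2 = (D - z) / D
    let p2 = (x / w2, y / w2)

    print(p1, p2) // both are (12.5, 25.0)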

Sample Code

The sample code is currently unavailable as it stopped working in later versions of Mac OS X.

Grabbing any good OpenGL book should provide you with enough information to deal with CA’s 3D model, which is vastly simpler. As always, the Core Animation Programming Guide is a must-read for anyone working with the framework. This OpenGL tutorial has nice diagrams showing the orientations of the coordinate systems.
