Coordinate Systems

The Problem

Your window is a grid of pixels: 800×600 in our configuration. The 3D scene you want to render spans from -∞ to +∞ in every direction. The GPU cannot reason in window pixels because every window has a different size. It cannot reason in world space because that is application-defined. The GPU needs a standard intermediate coordinate space.

That space is NDC, Normalized Device Coordinates.

NDC Definition

NDC is a fixed, standardized cube:

| Axis | Min  | Max  | Meaning       |
|------|------|------|---------------|
| X    | -1.0 | +1.0 | Left to right |
| Y    | -1.0 | +1.0 | Bottom to top |
| Z    | 0.0  | 1.0  | Near to far   |

Any geometry inside this cube is visible: X and Y within [-1, +1], and Z between the near plane (Z ≥ 0) and the far plane (Z ≤ 1). Anything outside is clipped away by the GPU hardware before rasterization.
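As a rough illustration of the visibility rule, the check below tests a single point against the NDC cube. The function name `is_inside_ndc` is made up for this sketch, and real hardware clips whole primitives in clip space rather than testing points one by one.

```rust
/// Returns true when an NDC point lies inside the visible cube
/// (wgpu convention: X and Y in [-1, 1], Z in [0, 1]).
/// `is_inside_ndc` is a hypothetical helper, not part of wgpu.
fn is_inside_ndc(p: [f32; 3]) -> bool {
    let [x, y, z] = p;
    (-1.0..=1.0).contains(&x)
        && (-1.0..=1.0).contains(&y)
        && (0.0..=1.0).contains(&z)
}
```

For example, `is_inside_ndc([0.0, 0.0, 0.0])` is true because Z=0 lies on the near plane, while `is_inside_ndc([0.0, 0.0, -0.1])` is false.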

Visual Map

  (-1,+1) ────────── (+1,+1)
      │                  │
      │       (0,0)      │  ← origin = center of screen
      │                  │
  (-1,-1) ────────── (+1,-1)

Notice the origin is at the center of the screen, not the top-left. This is deliberate: 3D scenes are easier to reason about when (0,0) is the center. In NDC, depth increases into the screen, from Z=0 at the near plane to Z=1 at the far plane.

Our Triangle In NDC

Because this is the simplest possible renderer, our triangle vertices are specified directly in NDC. No projection matrix. No camera transform. No model matrix. Just three points in GPU-native space:

| Corner       | X    | Y    | Z   |
|--------------|------|------|-----|
| Bottom-left  | -0.5 | -0.5 | 0.0 |
| Bottom-right | +0.5 | -0.5 | 0.0 |
| Top-center   | 0.0  | +0.5 | 0.0 |

The triangle sits in the center of the screen. The base runs from left to right along Y=-0.5. The peak sits on the center axis at Y=+0.5. All three vertices are at Z=0, sitting exactly on the near plane.

Plot this in the NDC box above and you will see how much of the screen the triangle covers. It spans 50% of the X axis (from -0.5 to +0.5) and 50% of the Y axis (from -0.5 to +0.5), centered on the origin, so it covers one eighth of the screen area.
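To check the triangle's footprint numerically, here is a small sketch that stores the three vertices from the table and computes what fraction of the visible NDC square (2 × 2 units) they cover. The names `TRIANGLE`, `triangle_area`, and `screen_fraction` are illustrative only and are not taken from the tutorial's source code.

```rust
/// The three triangle vertices from the table, as [x, y, z] in NDC.
const TRIANGLE: [[f32; 3]; 3] = [
    [-0.5, -0.5, 0.0], // bottom-left
    [0.5, -0.5, 0.0],  // bottom-right
    [0.0, 0.5, 0.0],   // top-center
];

/// Area of the triangle in the XY plane via the shoelace formula (Z is ignored).
fn triangle_area(t: &[[f32; 3]; 3]) -> f32 {
    let [a, b, c] = *t;
    0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])).abs()
}

/// Fraction of the visible NDC square (area 2 x 2 = 4) covered by the triangle.
fn screen_fraction(t: &[[f32; 3]; 3]) -> f32 {
    triangle_area(t) / 4.0
}
```

With the vertices above, `triangle_area(&TRIANGLE)` is 0.5 and `screen_fraction(&TRIANGLE)` is 0.125: the triangle spans half the width and half the height, but covers one eighth of the screen.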

In a real application, vertices live in arbitrary world units and you apply a series of matrix transformations to bring them into clip space, from which the GPU produces NDC. Here we skip all of that and place the vertices directly in NDC. The vertex shader still outputs vec4<f32> and the pipeline is structurally identical.

Homogeneous Coordinates

The GPU vertex shader outputs a vec4<f32>, not a vec3<f32>. The fourth component, w, is the homogeneous coordinate that enables the clip space → NDC conversion.

When the vertex shader outputs vec4<f32>(x, y, z, w), the GPU performs a step called perspective division: it divides every component by w. The result is (x/w, y/w, z/w) — this is what lands in NDC.

For our triangle, we set w = 1.0:

vec4<f32>(position, 1.0)  =  vec4<f32>(pos.x, pos.y, pos.z, 1.0)

Division by 1.0 is the identity — the position passes through unchanged. But why four components?

A w value of 1.0 means "this is a point in space." A w value of 0.0 would mean "this is a direction vector." This encoding lets the GPU handle both positions and directions with the same data type. More importantly, when you use a perspective projection matrix, the matrix produces a different w value for each vertex (typically the vertex's distance from the camera along the view axis). After perspective division, the resulting NDC coordinates automatically produce the foreshortening effect that makes distant objects appear smaller. That is how perspective works on the GPU.
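The division step is easy to model on the CPU. The sketch below is a plain-Rust stand-in for what the hardware does after the vertex shader; `perspective_divide` is a hypothetical name, and the code assumes w is non-zero.

```rust
/// Perspective division: clip space (x, y, z, w) -> NDC (x/w, y/w, z/w).
/// Assumes w != 0. Hypothetical helper for illustration; the GPU does this
/// in fixed-function hardware after clipping.
fn perspective_divide(clip: [f32; 4]) -> [f32; 3] {
    let [x, y, z, w] = clip;
    [x / w, y / w, z / w]
}
```

With w = 1.0 the position passes through unchanged: `perspective_divide([0.5, 0.5, 0.0, 1.0])` gives `[0.5, 0.5, 0.0]`. Doubling w to 2.0 for the same x and y gives `[0.25, 0.25, 0.0]`, so the point moves toward the center of the screen. That shrinking with larger w is the foreshortening effect described above.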

Our triangle uses w = 1.0 because we have no camera and no perspective, just vertices placed directly in NDC. The value exists because the pipeline requires clip-space vec4 output, not because we need perspective.

Clip Space

Before NDC, there is clip space. This is the coordinate space the vertex shader outputs into. The GPU clips against a volume defined relative to each vertex's w: -w ≤ x ≤ w, -w ≤ y ≤ w, and 0 ≤ z ≤ w. With a perspective projection this volume corresponds to a truncated pyramid (the view frustum) in front of the camera; with an orthographic projection it corresponds to a box. Geometry outside the clip volume is discarded by hardware before perspective division. Our triangle has w = 1.0 and lies entirely inside the clip volume, so nothing is clipped.
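The clip test can be written directly in terms of w, which is why it can run before the division. The following is a per-vertex sketch that assumes the WebGPU clip volume (-w ≤ x ≤ w, -w ≤ y ≤ w, 0 ≤ z ≤ w). The name `is_inside_clip_volume` is invented here, and real hardware clips primitives that straddle the boundary instead of simply rejecting vertices.

```rust
/// True when a clip-space position lies inside the clip volume:
/// -w <= x <= w, -w <= y <= w, 0 <= z <= w.
/// Hypothetical helper; a per-vertex check only, not full primitive clipping.
fn is_inside_clip_volume(clip: [f32; 4]) -> bool {
    let [x, y, z, w] = clip;
    -w <= x && x <= w && -w <= y && y <= w && 0.0 <= z && z <= w
}
```

All three triangle vertices pass, for example `is_inside_clip_volume([-0.5, -0.5, 0.0, 1.0])`. A position such as `[2.0, 0.0, 0.0, 1.0]` fails because x exceeds w, while `[2.0, 0.0, 1.0, 4.0]` passes because the same x is well inside the volume once w is 4.0.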

Viewport Transform (Automatic)

After perspective division produces NDC coordinates, the GPU maps them to the actual window dimensions. This is the viewport transform:

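screen_x = (ndc_x + 1.0) / 2.0 * window_width
screen_y = (1.0 - ndc_y) / 2.0 * window_height

The Y term is flipped because NDC Y points up, while pixel rows are counted downward from the top-left corner of the window.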

This step is automatic. You never write it in code. By default the viewport covers the whole render target, whose size comes from the width and height fields of your SurfaceConfiguration. When the surface configuration says 800×600, the GPU maps the NDC X range [-1, +1] onto 800 pixels and the NDC Y range [-1, +1] onto 600 pixels.

You do write code that affects this mapping, but only when the window size changes. At that point, you update the width and height in your SurfaceConfiguration and configure the surface again. The GPU then uses the updated mapping on subsequent frames.
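To see the mapping with concrete numbers, here is a CPU-side sketch of the viewport transform for a viewport covering the whole window. It assumes the wgpu/WebGPU convention that pixel Y grows downward from the top-left corner, which is why the Y term is flipped. `ndc_to_screen` is a hypothetical helper; in a real program the GPU performs this step for you.

```rust
/// Maps an NDC (x, y) pair to pixel coordinates for a full-window viewport.
/// Y is flipped because framebuffer coordinates start at the top-left corner.
/// Hypothetical helper for illustration only.
fn ndc_to_screen(ndc: [f32; 2], width: f32, height: f32) -> [f32; 2] {
    let [x, y] = ndc;
    [(x + 1.0) / 2.0 * width, (1.0 - y) / 2.0 * height]
}
```

At 800×600, the NDC origin `[0.0, 0.0]` maps to pixel `[400.0, 300.0]` and the top-left corner `[-1.0, 1.0]` maps to `[0.0, 0.0]`. After a resize to 1024×768, the same origin maps to `[512.0, 384.0]`: the NDC values do not change, only the width and height fed into the mapping.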

Depth and the Z Coordinate

All three vertices of our triangle sit at Z=0 — exactly on the near plane. This is a simplification that works fine for a flat 2D triangle, but it means we carry no depth information. In a 3D scene with overlapping geometry, you need varying Z values so the GPU can decide which surfaces are in front of others.

The mechanism that resolves this is the depth buffer. When enabled, the render pass uses a per-pixel buffer storing the Z value of the closest surface rendered to that pixel so far. Each new fragment is compared against the stored depth: with the usual "less than" comparison, a closer fragment overwrites the pixel and updates the depth value, and a farther one is silently discarded. This is how 3D scenes achieve correct occlusion.
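The comparison can be modeled in a few lines. The sketch below is a software stand-in for a depth buffer, assuming a "less than" depth test and a clear value of 1.0 (the far plane). The `DepthBuffer` type and its methods are invented for this example and are unrelated to wgpu's actual depth texture API.

```rust
/// A minimal software model of a depth buffer (hypothetical, not a wgpu type).
struct DepthBuffer {
    width: usize,
    depth: Vec<f32>,
    color: Vec<u32>,
}

impl DepthBuffer {
    /// Clears every pixel to depth 1.0 (the far plane) and color 0.
    fn new(width: usize, height: usize) -> Self {
        Self {
            width,
            depth: vec![1.0; width * height],
            color: vec![0; width * height],
        }
    }

    /// Applies a "less than" depth test to one fragment.
    /// Returns true if the fragment was closer and has been written.
    fn write_fragment(&mut self, x: usize, y: usize, z: f32, rgba: u32) -> bool {
        let i = y * self.width + x;
        if z < self.depth[i] {
            self.depth[i] = z;
            self.color[i] = rgba;
            true
        } else {
            false
        }
    }
}
```

Writing fragments at depths 0.8, 0.3, and 0.5 to the same pixel, in that order, leaves the 0.3 fragment visible: the first two writes succeed and the third is discarded, regardless of draw order.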

Our current pipeline does not use a depth buffer. For flat 2D rendering, draw order alone determines which geometry appears on top. Depth buffering will be covered in a future tutorial when we render 3D geometry.

Summary: The Coordinate Journey

For our triangle, every vertex follows this path:

  1. Vertex data: Stored as vec3<f32> in the vertex buffer. Values are already in NDC.
  2. Vertex shader: Wraps in vec4<f32> by appending w = 1.0. This is clip space (which, for w = 1.0, equals NDC).
  3. Perspective division: GPU divides by w = 1.0 → identity. Vertex is now in NDC.
  4. Viewport transform (automatic): GPU scales NDC to window pixel coordinates. The triangle appears on screen.
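The four steps can be traced on the CPU for a single vertex. This is a sketch only: `vertex_to_pixel` is a hypothetical function, the viewport is assumed to cover the whole window, and pixel Y is assumed to grow downward as in wgpu's framebuffer convention.

```rust
/// Follows one vertex through the four steps listed above and returns
/// its pixel coordinates. Hypothetical helper; the GPU does steps 2-4.
fn vertex_to_pixel(position: [f32; 3], width: f32, height: f32) -> [f32; 2] {
    // Steps 1-2: vertex data is already in NDC; the vertex shader appends
    // w = 1.0 to form a clip-space vec4.
    let clip = [position[0], position[1], position[2], 1.0];
    // Step 3: perspective division (identity here because w = 1.0).
    let ndc = [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]];
    // Step 4: viewport transform, with Y flipped so that +1 is the top row.
    [(ndc[0] + 1.0) / 2.0 * width, (1.0 - ndc[1]) / 2.0 * height]
}
```

For the 800×600 window, the bottom-left vertex `[-0.5, -0.5, 0.0]` lands at pixel `[200.0, 450.0]`, the bottom-right vertex at `[600.0, 450.0]`, and the top-center vertex `[0.0, 0.5, 0.0]` at `[400.0, 150.0]`.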

In a real 3D application, this journey includes model, view, and projection matrices before clip space. For the rainbow triangle, every step before the viewport transform is an identity transform. The hardware pipeline stages are the same regardless.