docs: append sections S4-S6 (shaders, vertex data, render pipeline)

2026-05-30 17:44:31 -05:00
parent 4d429cf212
commit de38f526b9


@@ -438,3 +438,450 @@ back pressure, smoothing out frame time spikes.
Steps 6 through 8 — shader module compilation, vertex buffer upload, and render
pipeline assembly — will be explored in detail in the next sections.
## S4: Writing the Shaders
New concept: **shaders are GPU programs.** A [shader](concepts/GLOSSARY.md#shader)
is a function or set of functions that runs on the GPU, compiled once at pipeline
creation time, then executed thousands of times in parallel. Each invocation
operates on different data but follows the identical instruction sequence. There
is no heap allocation, no recursion, no I/O, and no shared mutable state. The
GPU runs invocations in small lockstep groups (wavefronts or warps): if threads
within one group take different branches, that group executes both paths in
turn and masks off the inactive lanes. This is why you write shaders
differently from CPU code: you optimize for parallelism and avoid divergent
branches where you can.
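
To make the cost of divergence concrete, here is a deliberately simplified cost model for one lockstep group. The function name `group_cost` and the abstract cost units are assumptions made for illustration; real hardware scheduling is more involved.

```rust
/// Simplified cost model for one lockstep group (wavefront or warp).
/// `takes_branch[i]` says whether lane `i` takes the `then` path.
/// If every lane agrees, only one path is executed; if the lanes
/// disagree, the group pays for both paths, one after the other.
fn group_cost(takes_branch: &[bool], then_cost: u32, else_cost: u32) -> u32 {
    let any_then = takes_branch.iter().any(|&t| t);
    let any_else = takes_branch.iter().any(|&t| !t);
    let mut cost = 0;
    if any_then {
        cost += then_cost;
    }
    if any_else {
        cost += else_cost;
    }
    cost
}
```

With a `then` path of cost 10 and an `else` path of cost 30, a uniform group pays 10 or 30, while a group with even a single dissenting lane pays 40.
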
A [shader module](concepts/GLOSSARY.md#shader) can contain multiple entry points.
For rendering, the two mandatory entry points are the [vertex shader](concepts/GLOSSARY.md#vertex-shader)
and the [fragment shader](concepts/GLOSSARY.md#fragment-shader). The vertex
shader runs once per [vertex](concepts/GLOSSARY.md#vertex). The fragment shader
runs once per [fragment](concepts/GLOSSARY.md#fragment) — that is, once per pixel
covered by the rasterized [primitive](concepts/GLOSSARY.md#primitive).
> **Key insight #1 — Interpolation is free hardware:** The vertex shader outputs
> per-vertex colors at `@location(0)`. The [rasterizer](concepts/GLOSSARY.md#rasterizer)
> automatically interpolates them across the triangle surface using
> [barycentric coordinates](concepts/GLOSSARY.md#barycentric-coordinates). The
> fragment shader just returns whatever it receives. The rainbow gradient is not
> programmed — it is a consequence of the pipeline architecture. You supply
> colors at three points; the hardware computes every color in between at zero
> shader cost.
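
The interpolation described above can be reproduced on the CPU. The sketch below is an illustration only: the helper names (`edge`, `barycentric`, `interpolate`) are hypothetical, and it performs plain affine interpolation in 2D, which matches what the hardware produces for this triangle because every vertex has w = 1.0.

```rust
/// Twice the signed area of the triangle (a, b, c).
fn edge(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// Barycentric weights of point `p` with respect to triangle `tri`.
/// The weights sum to 1.0 and are all non-negative inside the triangle.
fn barycentric(tri: [[f32; 2]; 3], p: [f32; 2]) -> [f32; 3] {
    let area = edge(tri[0], tri[1], tri[2]);
    [
        edge(tri[1], tri[2], p) / area,
        edge(tri[2], tri[0], p) / area,
        edge(tri[0], tri[1], p) / area,
    ]
}

/// Blend three per-vertex RGB colors using the given weights.
fn interpolate(colors: [[f32; 3]; 3], w: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for ch in 0..3 {
        out[ch] = w[0] * colors[0][ch] + w[1] * colors[1][ch] + w[2] * colors[2][ch];
    }
    out
}
```

At a vertex, one weight is 1.0 and the others are 0.0, so the color is exact. Halfway along the bottom edge between a red and a blue vertex, the weights are 0.5, 0.5, 0.0 and the result is an even mix of red and blue.
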
**Why WGSL:** WebGPU Shading Language ([WGSL](concepts/GLOSSARY.md#wgsl)) is
the single source format. wgpu translates it at runtime into whatever each
backend consumes: SPIR-V for Vulkan, MSL for Metal, and HLSL (which is then
compiled to DXIL or DXBC) for Direct3D 12. You write one shader file and wgpu
produces the right output for every backend.
**Why `include_str!("shader.wgsl")`:** This Rust macro embeds the file contents
at compile time. The shader source becomes a string literal inside your binary.
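At runtime there is zero file I/O. No paths to resolve, no loading failures,
no async reads. The path is resolved relative to the source file that contains
the macro, and if the file is missing the build fails rather than the runtime.
Malformed WGSL is a different matter: the macro only embeds text, so shader
syntax errors still surface when `create_shader_module` runs.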
### The Complete Shader
Create `shader.wgsl` in the same directory as `main.rs` (typically `src/`), because `include_str!` resolves paths relative to the file that invokes it:
```wgsl
struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) vertex_color: vec3<f32>,
};

@vertex
fn vs_main(
    @location(0) position: vec3<f32>,
    @location(1) color: vec3<f32>,
) -> VertexOutput {
    var out: VertexOutput;
    out.clip_position = vec4<f32>(position, 1.0);
    out.vertex_color = color;
    return out;
}

@fragment
fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
    return vec4<f32>(input.vertex_color, 1.0);
}
```
### Line-by-Line Walkthrough
**`struct VertexOutput { ... }`** — Defines the data flowing between the vertex
shader and the fragment shader. This struct is not a Rust type and not a buffer
layout — it is the output contract of the vertex shader that the rasterizer
carries through to the fragment shader.
**`@builtin(position) clip_position: vec4<f32>`** — `@builtin(position)` is a
reserved GPU output slot. Every vertex shader must produce a `vec4<f32>` at
this slot. This value is the vertex position in [clip space](concepts/GLOSSARY.md#clip-space).
The GPU uses it for clipping and for perspective division (dividing x, y, z by
w to produce [NDC](concepts/GLOSSARY.md#ndc)). In our triangle, the w
component is 1.0, so perspective division is the identity operation and our
positions are already in the right space.
**`@location(0) vertex_color: vec3<f32>`** — `@location(0)` marks this field
for interpolation. Any `@location(n)` output from the vertex shader that is
not a builtin is automatically interpolated by the rasterizer using barycentric
weights. At each vertex, the value is exact. Inside the triangle, it is the
weighted blend of all three vertex values. The fragment shader receives a
different `vertex_color` for every pixel, without any manual interpolation code.
> **Key insight #2 — THE LOCATIONS MUST MATCH:** `shader_location: 0` in
> Rust's `VertexAttribute` MUST equal `@location(0)` in WGSL's parameter
> annotation. If a shader input has no attribute with the same location, wgpu
> rejects the pipeline at creation time. The dangerous case is a mismatch that
> still validates, such as two `vec3<f32>` attributes with swapped locations:
> there is no type error and no panic, and the shader silently receives the
> wrong bytes interpreted as floats.
**`@vertex fn vs_main(...)`** — `@vertex` declares this function as the vertex
shader entry point. The function is invoked once per vertex in the draw call.
For our triangle with three vertices, `vs_main` runs exactly three times.
**`@location(0) position: vec3<f32>`** — This input parameter receives data
from the vertex buffer mapped by `shader_location: 0`. In our Rust
`VertexBufferLayout`, the first `VertexAttribute` reads 3 floats at offset 0
and delivers them to the shader at location 0. This is the raw NDC position.
**`@location(1) color: vec3<f32>`** — The second vertex buffer attribute
mapped to location 1. Reads 3 floats at the offset after the position
(12 bytes into each vertex) — the per-vertex RGB color.
**`var out: VertexOutput;`** — Local variable declaration. `var` creates a
mutable function-local variable; `let` would create an immutable one.
**`out.clip_position = vec4<f32>(position, 1.0);`** — Converts the `vec3`
input into [homogeneous coordinates](concepts/GLOSSARY.md#homogeneous-coordinates)
by appending w = 1.0. This promotes the position from 3D to clip space. With
w = 1.0, perspective division (x/w, y/w, z/w) leaves the coordinates unchanged.
If we were using perspective projection, the vertex shader would compute a
nontrivial w value from the depth.
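
As a small CPU-side illustration of the perspective division described above, the following sketch divides a clip-space position by its w component. The function name `clip_to_ndc` is hypothetical; on the GPU this step is fixed-function and never appears in shader code.

```rust
/// Perspective division: convert a clip-space position (x, y, z, w)
/// to normalized device coordinates (x/w, y/w, z/w).
/// Assumes w is non-zero.
fn clip_to_ndc(clip: [f32; 4]) -> [f32; 3] {
    let [x, y, z, w] = clip;
    [x / w, y / w, z / w]
}
```

With w = 1.0 the input comes back unchanged, which is why the triangle's positions can be written directly in NDC. With w = 2.0 every component is halved, which is how a perspective projection makes distant geometry appear smaller.
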
**`out.vertex_color = color;`** — Passes the input color through to the output.
The rasterizer picks this field up, interpolates it across the triangle surface,
and delivers the interpolated value to every fragment.
**`@fragment fn fs_main(input: VertexOutput)`** — `@fragment` declares the
fragment shader entry point. `input` is the rasterizer's interpolated output
from the vertex shader. Every `@location(n)` field in `VertexOutput` is now
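pre-blended. The `@builtin(position)` field changes meaning at this stage: in
the fragment shader it holds the fragment's framebuffer-space coordinates
(pixel x and y, plus depth in z), not the clip-space position written by the
vertex shader.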
**`-> @location(0) vec4<f32>`** — The fragment shader writes its color to
`@location(0)`. The location index selects the color target at the same index
in the `targets` array of the [render pipeline](concepts/GLOSSARY.md#pipeline)
descriptor. The return type is `vec4<f32>`, an RGBA value. When the target
format is an sRGB format, the shader supplies linear values and the hardware
encodes them on write.
**`return vec4<f32>(input.vertex_color, 1.0);`** — Promotes the interpolated
RGB color to RGBA by setting alpha = 1.0 (fully opaque). The
[rasterizer](concepts/GLOSSARY.md#rasterizer) interpolated `input.vertex_color`
across the triangle; we just attach an alpha channel and return it. The output
merge stage writes this color directly to the framebuffer.
### Rust Shader Module Creation
The Rust side loads the shader file at compile time and feeds the source to wgpu:
```rust
let shader_module = device.create_shader_module(
    wgpu::ShaderModuleDescriptor {
        label: Some("Rainbow Triangle Shader"),
        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
    }
);
```
- **`ShaderModuleDescriptor`** — has two fields: `label` (debug string, shown
in graphics debuggers and validation messages) and `source` (the shader
text).
- **`ShaderSource::Wgsl(...)`** — wraps the WGSL string. wgpu also accepts
SPIR-V binary source via `ShaderSource::SpirV`, but WGSL is the native
path.
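- **`device.create_shader_module()`** — takes the descriptor and parses +
validates the shader. On Vulkan, wgpu translates WGSL to SPIR-V internally.
The call returns a `ShaderModule` directly rather than a `Result`: syntax
errors and type mismatches are reported through wgpu's error handling (an
error scope if one is active, otherwise the uncaptured-error handler, which
panics by default on native targets). A misspelled entry point name is only
detected later, at pipeline creation.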
- **`&shader_module`** — the resulting handle is passed by reference into the
render pipeline descriptor. The pipeline keeps what it needs internally, so the
module only has to stay alive until `create_render_pipeline` returns.
## S5: Uploading Vertex Data to the GPU
New concept: **GPU memory isolation.** The GPU cannot read Rust heap or stack
memory directly. Vertex data must be laid out as a flat byte array and uploaded
into a dedicated GPU [buffer](concepts/GLOSSARY.md#buffer-slice). The
pipeline configuration then describes how to interpret those bytes: how many
bytes each vertex occupies, what format each attribute has, and where within
each vertex stride the attribute begins.
> **Key insight #3 — `create_buffer_init` is an extension trait:** The method
> lives in `wgpu::util::DeviceExt`, not on `Device` directly. If you call
> `device.create_buffer_init(...)` without importing the trait, the compiler
> reports "method not found." This is a Rust trait-discovery issue, not a wgpu
> API issue. Add `use wgpu::util::DeviceExt;` to bring the method into scope.
### The Vertex Struct
```rust
#[repr(C)]
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}
```
- **`#[repr(C)]`** — Forces the Rust compiler to lay out the struct fields in
declaration order with no padding reordering. Without this, Rust is free to
reorder fields for optimal alignment, which would break the byte layout the
shader expects.
- **`bytemuck::Pod`** — "Plain Old Data." The derive checks at compile time
that the struct is `Copy`, has a fixed layout (`repr(C)`), contains only `Pod`
fields, and has no padding bytes. wgpu itself only sees `&[u8]`; it is
`bytemuck::cast_slice` that requires `Pod` so the cast to bytes is sound.
- **`bytemuck::Zeroable`** — Guarantees that initializing the struct's memory
to all-zero bytes produces a valid instance. `Zeroable` is a supertrait of
`Pod`, so it must be derived alongside it. Together they let
`bytemuck::cast_slice` convert `&[Vertex]` to `&[u8]` without an `unsafe`
block.
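
The layout guarantees above can be checked with the standard library alone. The sketch below redeclares `Vertex` without the bytemuck derives so that it compiles on its own; the helper name `vertex_layout` is hypothetical, and `std::mem::offset_of!` assumes Rust 1.77 or newer.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Returns (size, offset of `position`, offset of `color`, alignment),
/// all in bytes, as the compiler actually laid the struct out.
fn vertex_layout() -> (usize, usize, usize, usize) {
    (
        std::mem::size_of::<Vertex>(),
        std::mem::offset_of!(Vertex, position),
        std::mem::offset_of!(Vertex, color),
        std::mem::align_of::<Vertex>(),
    )
}
```

For this struct the size is 24 bytes, `position` sits at offset 0, `color` at offset 12, and the alignment is 4. Because 24 is exactly the sum of the field sizes, there is no padding, which is the property the `Pod` derive insists on.
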
### Vertex Data
```rust
const VERTICES: &[Vertex] = &[
    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
];
```
- **Positions are in NDC:** In [normalized device coordinates](concepts/GLOSSARY.md#ndc),
x and y range from -1.0 (left/bottom) to +1.0 (right/top), and z ranges from
0.0 to 1.0. Our triangle covers the middle of the screen: the bottom-left
corner at (-0.5, -0.5), the bottom-right at (0.5, -0.5), and the top center at
(0.0, 0.5). This produces an upright, centered triangle.
- **CCW winding order:** The vertices are listed counter-clockwise:
red → blue → green. With the y axis pointing up, as it does in NDC, connecting
the vertices in this sequence traces the triangle counter-clockwise. The
winding determines which face is "front" and which is "back", which matters
for [culling](concepts/GLOSSARY.md#rasterizer) and for deriving consistent
surface normals.
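
One way to verify the winding order claimed above is the sign of the triangle's signed area. This is a CPU-side sketch with hypothetical helper names (`signed_area`, `is_ccw`); it assumes a y-up coordinate system such as NDC.

```rust
/// Signed area of the 2D triangle (a, b, c).
/// Positive when the vertices run counter-clockwise in a y-up system,
/// negative when they run clockwise, zero when they are collinear.
fn signed_area(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
}

/// True when the vertices are listed counter-clockwise.
fn is_ccw(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> bool {
    signed_area(a, b, c) > 0.0
}
```

For the red, blue, green order used in `VERTICES`, the signed area is +0.5, so the triangle is counter-clockwise. Swapping any two vertices flips the sign, and with back-face culling enabled the triangle would disappear.
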
### Buffer Upload
```rust
use wgpu::util::DeviceExt;

let vertex_buffer = device.create_buffer_init(
    &wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(VERTICES),
        usage: wgpu::BufferUsages::VERTEX,
    }
);
```
- **`use wgpu::util::DeviceExt`** — imports the extension trait that adds
`create_buffer_init` to `Device`. Without this import, the method is not
visible.
- **`device.create_buffer_init(...)`** — combined allocate-and-upload. It
creates the buffer with `mapped_at_creation: true`, copies the `contents`
slice into the mapped range, and unmaps the buffer, at which point the data
belongs to the GPU. This is a convenience wrapper around `create_buffer` plus
a manual map, copy, and unmap; no queue is involved.
- **`bytemuck::cast_slice(VERTICES)`** — converts `&[Vertex]` to `&[u8]`
by reinterpreting the same memory at a byte level. The GPU receives 72 bytes:
three vertices × 24 bytes per vertex (6 × `f32` = 6 × 4 bytes). The cast itself
makes no copy and does no serialization; it is just a pointer reinterpretation.
- **`BufferUsages::VERTEX`** — declares this buffer will be bound as a vertex
buffer in the pipeline. wgpu's validation layer will reject any attempt to
use this buffer for staging, uniform, or storage access. Usage bits
are chosen at creation and cannot be changed.
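
To see what the 72 bytes look like, the sketch below builds the same byte sequence by hand using only the standard library. Unlike `bytemuck::cast_slice`, it copies the data, so treat it as an illustration of the byte layout and not as a replacement. The names `triangle` and `vertex_bytes` are hypothetical.

```rust
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// The three vertices of the tutorial triangle.
fn triangle() -> [Vertex; 3] {
    [
        Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] },
        Vertex { position: [0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] },
        Vertex { position: [0.0, 0.5, 0.0], color: [0.0, 1.0, 0.0] },
    ]
}

/// Serialize vertices in declaration order (position, then color),
/// each component as 4 native-endian bytes.
fn vertex_bytes(vertices: &[Vertex]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(vertices.len() * 24);
    for v in vertices {
        for component in v.position.iter().chain(v.color.iter()) {
            bytes.extend_from_slice(&component.to_ne_bytes());
        }
    }
    bytes
}
```

Calling `vertex_bytes(&triangle())` yields 72 bytes. Bytes 0 to 3 hold the first x coordinate, and bytes 12 to 15 hold the first red component, which is where the second vertex attribute begins.
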
## S6: Compiling the Render Pipeline
New concept: **the render pipeline is a compiled GPU configuration.** A
[render pipeline](concepts/GLOSSARY.md#pipeline) bundles every decision the GPU
needs to execute a draw: which shaders to run, how to interpret vertex buffer
bytes, what [topology](concepts/GLOSSARY.md#topology) to use, whether to cull
back faces, what blend mode to apply, and where to write the output. Pipeline
creation is not a simple struct allocation — it compiles these choices into a
GPU-executable configuration. Errors in any field are caught at creation time,
not at draw time. This validation-upfront model is what makes pipelines expensive
to create but cheap to execute.
### Vertex Buffer Layout
Before the pipeline descriptor, you must tell wgpu how to parse the byte stream
in the vertex buffer into per-vertex attributes:
```rust
let vertex_buffer_layout = wgpu::VertexBufferLayout {
    // Byte distance between consecutive vertices (24).
    array_stride: std::mem::size_of::<Vertex>() as u64,
    step_mode: wgpu::VertexStepMode::Vertex,
    attributes: &[
        // position: three f32 values at the start of each vertex
        wgpu::VertexAttribute {
            offset: 0,
            format: wgpu::VertexFormat::Float32x3,
            shader_location: 0,
        },
        // color: three f32 values right after the position (offset 12)
        wgpu::VertexAttribute {
            offset: std::mem::size_of::<[f32; 3]>() as u64,
            format: wgpu::VertexFormat::Float32x3,
            shader_location: 1,
        },
    ],
};
```
- **`array_stride: 24`** — `size_of::<Vertex>()` = 24 bytes (6 `f32` values × 4 bytes each).
This is the byte distance from one vertex to the next in the buffer. The GPU
uses this to step through the buffer: vertex 0 starts at byte 0, vertex 1
at byte 24, vertex 2 at byte 48.
- **`step_mode: Vertex`** — advance the buffer by one stride for every vertex
the vertex shader processes. The other option is `Instance`, which advances
per draw instance in instanced rendering. For a single triangle, `Vertex` is
correct: each of the three vertices has its own position and color.
- **First attribute — `shader_location: 0`**: reads 3 floats (`Float32x3`) at byte
offset 0 of each vertex. These 3 floats map to the
[shader location](concepts/GLOSSARY.md#shader-location) `@location(0)` in the
vertex shader — the `position` parameter. The GPU delivers `[x, y, z]` to
that function argument.
- **Second attribute — `shader_location: 1`**: reads 3 floats at offset 12
(`size_of::<[f32; 3]>()` = 3 × 4 = 12). Skips past the position array to
the color array inside each vertex. Maps to `@location(1)` in the shader —
the `color` parameter. If the offset were 0 instead of 12, the shader would
receive the position values as the color input, rendering a triangle with
gradient colors derived from position data.
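
The stride and offset rules above can be emulated on the CPU. The sketch below mimics, in a simplified way, how the vertex fetch stage would locate each attribute in the byte buffer; the function names and constants are hypothetical and mirror the layout described in this section.

```rust
/// Read three consecutive native-endian f32 values starting at `offset`.
fn read_f32x3(buffer: &[u8], offset: usize) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for i in 0..3 {
        let s = offset + i * 4;
        out[i] = f32::from_ne_bytes([buffer[s], buffer[s + 1], buffer[s + 2], buffer[s + 3]]);
    }
    out
}

/// Fetch (position, color) for vertex `index`, using a 24-byte stride
/// and attribute offsets 0 and 12, as in the layout above.
fn fetch_vertex(buffer: &[u8], index: usize) -> ([f32; 3], [f32; 3]) {
    const ARRAY_STRIDE: usize = 24;
    const POSITION_OFFSET: usize = 0;
    const COLOR_OFFSET: usize = 12;
    let base = index * ARRAY_STRIDE;
    (
        read_f32x3(buffer, base + POSITION_OFFSET),
        read_f32x3(buffer, base + COLOR_OFFSET),
    )
}
```

Fetching vertex 1 starts at byte 24 and returns its position from bytes 24 to 35 and its color from bytes 36 to 47. Changing `COLOR_OFFSET` to 0 reproduces the bug described above: the color would simply be a second copy of the position.
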
### The Complete Render Pipeline Descriptor
```rust
let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
    label: Some("Triangle Pipeline"),
    layout: None,
    vertex: wgpu::VertexState {
        module: &shader_module,
        entry_point: Some("vs_main"),
        buffers: &[vertex_buffer_layout],
        compilation_options: Default::default(),
    },
    primitive: wgpu::PrimitiveState {
        topology: wgpu::PrimitiveTopology::TriangleList,
        strip_index_format: None,
        front_face: wgpu::FrontFace::Ccw,
        cull_mode: Some(wgpu::Face::Back),
        unclipped_depth: false,
        polygon_mode: wgpu::PolygonMode::Fill,
        conservative: false,
    },
    depth_stencil: None,
    multisample: wgpu::MultisampleState {
        count: 1,
        mask: !0,
        alpha_to_coverage_enabled: false,
    },
    fragment: Some(wgpu::FragmentState {
        module: &shader_module,
        entry_point: Some("fs_main"),
        targets: &[Some(wgpu::ColorTargetState {
            format: config.format,
            blend: None,
            write_mask: wgpu::ColorWrites::ALL,
        })],
        compilation_options: Default::default(),
    }),
    multiview_mask: None,
    cache: None,
});
```
### Field-by-Field Walkthrough
**`RenderPipelineDescriptor` has 9 fields.** Every field must be present. The
structure does not use `..Default::default()` at the descriptor level — each
field is filled explicitly.
**`label: Some("Triangle Pipeline")`** — Debug string. Shown in GPU profilers
(RenderDoc, NVIDIA Nsight) and wgpu validation error messages. Omitting it
produces anonymous pipelines that are much harder to trace during debugging.
**`layout: None`** — Asks wgpu to derive the pipeline layout from the shader
module automatically. Our shader uses no push constants or bind groups, so the
derived layout is empty. Once you add `@group(n)` bindings to your shader, you
will usually want an explicit `wgpu::PipelineLayout` created with
`device.create_pipeline_layout()`, which also lets several pipelines share
bind group layouts.
**`vertex` — [`VertexState`](concepts/GLOSSARY.md#vertex-shader) (4 fields):**
- **`module: &shader_module`** — references the compiled shader module from S4.
- **`entry_point: Some("vs_main")`** — selects which function in the module is
the vertex shader entry point. Must match the `@vertex fn vs_main(...)`
declaration exactly.
- **`buffers: &[vertex_buffer_layout]`** — array of vertex buffer layouts.
Multiple layouts are used rarely (multi-mesh, GPU instancing with separate
instance buffers). For a single vertex buffer, one layout suffices.
- **`compilation_options: Default::default()`** — per-stage compilation
settings (`wgpu::PipelineCompilationOptions`), such as values for
pipeline-overridable constants. The default overrides nothing.
**`primitive` — [`PrimitiveState`](concepts/GLOSSARY.md#primitive) (7 fields):**
- **`topology: TriangleList`** — every 3 consecutive vertices form one
triangle. For 3 vertices, this produces exactly 1 triangle. If we had 6
vertices, it would produce 2 independent triangles.
- **`strip_index_format: None`** — only set for `TriangleStrip` or `LineStrip`
topologies when using restart indices. Not applicable to `TriangleList`.
- **`front_face: Ccw`** — counter-clockwise winding defines the front face of
a triangle. Combined with `cull_mode`, this determines which triangles are
visible. Because our vertices are listed CCW in S5, triangles drawn in that
order face toward the viewer.
- **`cull_mode: Some(wgpu::Face::Back)`** — discard triangles whose winding
indicates a back face. For a single triangle viewed from the front, this is
harmless, and it establishes correct culling for closed, opaque 3D meshes,
where back faces are never visible.
- **`unclipped_depth: false`** — depth values outside [0.0, 1.0] are clipped
(the standard behavior). `true` allows depth values beyond the normal range
to pass through — used for specific depth-testing tricks.
- **`polygon_mode: Fill`** — render the full interior of the triangle. Other
options are `Line` (wireframe edges) and `Point` (vertex points only).
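- **`conservative: false`** — standard rasterization: a fragment is generated
only for pixels whose sample point (the pixel center) lies inside the
triangle. `true` generates a fragment for every pixel the triangle touches at
all, however slightly. Conservative rasterization is used for techniques such
as voxelization and GPU occlusion culling, and it requires the optional
`Features::CONSERVATIVE_RASTERIZATION` device feature.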
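
The `topology` setting above decides how the vertex stream is grouped into primitives. The following CPU-side sketch shows that grouping for two topologies. The function names are hypothetical, and the strip version ignores the winding flip that hardware applies to every other strip triangle.

```rust
/// TriangleList: every 3 consecutive vertices form one independent
/// triangle. Leftover vertices that do not complete a triangle are dropped.
fn assemble_triangle_list(vertex_count: u32) -> Vec<[u32; 3]> {
    (0..vertex_count / 3)
        .map(|t| [3 * t, 3 * t + 1, 3 * t + 2])
        .collect()
}

/// TriangleStrip: each new vertex forms a triangle with the previous two.
/// Simplification: vertex order is not swapped for odd triangles here.
fn assemble_triangle_strip(vertex_count: u32) -> Vec<[u32; 3]> {
    if vertex_count < 3 {
        return Vec::new();
    }
    (0..vertex_count - 2).map(|t| [t, t + 1, t + 2]).collect()
}
```

Six vertices produce two triangles as a list, `[0, 1, 2]` and `[3, 4, 5]`, but four triangles as a strip, because a strip reuses the two previous vertices for each new triangle.
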
**`depth_stencil: None`** — No depth buffer or stencil buffer. Without depth
testing, triangles are drawn in submission order: later draws overwrite earlier
draws at the same pixel. For a single triangle this is not a concern.
**`multisample` — [`MultisampleState`](concepts/GLOSSARY.md#fragment) (3 fields):**
- **`count: 1`** — no multisampling. Each pixel produces one fragment. Higher
values activate MSAA, sampling multiple points per pixel and reducing aliasing
at the cost of framebuffer bandwidth. A count of 4 is the one WebGPU
guarantees; other counts depend on the adapter and texture format.
- **`mask: !0`** — all sample bits are enabled. This mask allows you to
selectively disable individual MSAA samples (advanced use case).
- **`alpha_to_coverage_enabled: false`** — do not use the alpha channel of the
fragment color as a coverage mask. Typically enabled with MSAA to antialias
the edges of alpha-tested geometry (e.g., foliage or fences).
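
The `mask` field is a plain bit mask with one bit per sample, so `!0` is simply "all bits set". The sketch below, with the hypothetical helper `enabled_samples`, shows which sample indices a given mask leaves enabled.

```rust
/// Indices of the MSAA samples that `mask` leaves enabled.
/// Bit `i` of the mask controls sample `i`.
/// Assumes `sample_count` is at most 64.
fn enabled_samples(sample_count: u32, mask: u64) -> Vec<u32> {
    (0..sample_count)
        .filter(|&i| mask & (1u64 << i) != 0)
        .collect()
}
```

With 4 samples, a mask of `!0` enables samples 0 through 3, while `0b0101` would enable only samples 0 and 2.
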
**`fragment` — [`FragmentState`](concepts/GLOSSARY.md#fragment-shader) (4 fields):**
- **`module: &shader_module`** — same shader module as the vertex shader.
- **`entry_point: Some("fs_main")`** — selects the fragment shader entry point.
Must match `@fragment fn fs_main(...)` in the WGSL.
- **`targets`** — array of color target states, one per render pass output
attachment. `&[Some(...)]` means one color target present. `None` at this
index would mean a render pass with no color output (e.g., depth-only pass).
- **`ColorTargetState` has exactly 3 fields** (no `view_formats` field):
- **`format: config.format`** — must match the format of the texture view
bound as the color attachment, which here is the surface format from
`SurfaceConfiguration`. A mismatch produces a validation error when the
pipeline is used in the render pass. If you change the surface format, you
must recreate the pipeline.
- **`blend: None`** — disables blending. Without blending, every fragment
color replaces the existing framebuffer pixel (`REPLACE` mode). With
blending, new and existing colors are combined according to a blend
equation (useful for transparency).
- **`write_mask: ColorWrites::ALL`** — write all four RGBA channels.
You can mask out individual channels (e.g., write only R and G) if you
need to preserve certain framebuffer channels across draw calls.
- **`compilation_options: Default::default()`** — compilation settings for the
fragment stage, same as the vertex compilation options above.
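
To illustrate what `blend` and `write_mask` control, here is a CPU-side sketch of the per-pixel arithmetic. The function names are hypothetical; `alpha_blend` implements the classic "source over" equation as one example of what a non-`None` blend state might compute, and is not taken from this tutorial's pipeline.

```rust
/// `blend: None` behaves like replace: the destination is ignored.
fn blend_replace(src: [f32; 4], _dst: [f32; 4]) -> [f32; 4] {
    src
}

/// Classic "source over" alpha blending:
/// rgb = src.rgb * src.a + dst.rgb * (1 - src.a)
/// a   = src.a + dst.a * (1 - src.a)
fn alpha_blend(src: [f32; 4], dst: [f32; 4]) -> [f32; 4] {
    let a = src[3];
    [
        src[0] * a + dst[0] * (1.0 - a),
        src[1] * a + dst[1] * (1.0 - a),
        src[2] * a + dst[2] * (1.0 - a),
        a + dst[3] * (1.0 - a),
    ]
}

/// Write mask: only channels whose flag is true take the new value;
/// the others keep what was already in the framebuffer.
fn apply_write_mask(new: [f32; 4], old: [f32; 4], mask: [bool; 4]) -> [f32; 4] {
    let mut out = old;
    for ch in 0..4 {
        if mask[ch] {
            out[ch] = new[ch];
        }
    }
    out
}
```

A half-transparent red fragment drawn over opaque blue comes out as an even purple with source-over blending, but as plain half-transparent red with replace. A mask that enables only R and G leaves the existing B and A values untouched.
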
**`multiview_mask: None`** — no multiview rendering. Multiview is for
stereoscopic (VR) or multi-viewport single-pass rendering. Not used here.
**`cache: None`** — no pipeline cache. A pipeline cache stores compiled shader
binaries to speed up subsequent pipeline creation. Useful when creating many
pipelines dynamically; for a single pipeline, caching has no practical benefit.