From de38f526b9597fbbef8f93e052e519444f3f8d9d Mon Sep 17 00:00:00 2001
From: Krishna Ayyalasomayajula
Date: Sat, 30 May 2026 17:44:31 -0500
Subject: [PATCH] docs: append sections S4-S6 (shaders, vertex data, render pipeline)

---
 docs/01-rainbow-triangle.md | 447 ++++++++++++++++++++++++++++++++++++
 1 file changed, 447 insertions(+)

diff --git a/docs/01-rainbow-triangle.md b/docs/01-rainbow-triangle.md
index 8a78303..e94a35b 100644
--- a/docs/01-rainbow-triangle.md
+++ b/docs/01-rainbow-triangle.md
@@ -438,3 +438,450 @@ back pressure, smoothing out frame time spikes.
 
 Steps 6 through 8 — shader module compilation, vertex buffer upload, and render
 pipeline assembly — will be explored in detail in the next sections.
+
+## S4: Writing the Shaders
+
+New concept: **shaders are GPU programs.** A [shader](concepts/GLOSSARY.md#shader)
+is a function or set of functions that runs on the GPU, compiled once at pipeline
+creation time, then executed thousands of times in parallel. Each invocation
+operates on different data but follows the identical instruction sequence. There
+is no heap allocation, no recursion, and no I/O, and the shaders in this chapter
+share no mutable state. The GPU executes invocations in groups (wavefronts or
+warps) that run in lockstep: if threads in a group take different branches, the
+group executes both paths in turn and masks off the results that do not apply.
+This is why you write shaders differently from CPU code — you optimize for
+parallelism and avoid divergent branches.
+
+A [shader module](concepts/GLOSSARY.md#shader) can contain multiple entry points.
+For rendering, the two entry points this chapter needs are the [vertex shader](concepts/GLOSSARY.md#vertex-shader)
+and the [fragment shader](concepts/GLOSSARY.md#fragment-shader). The vertex
+shader runs once per [vertex](concepts/GLOSSARY.md#vertex). The fragment shader
+runs once per [fragment](concepts/GLOSSARY.md#fragment) — that is, once per pixel
+covered by the rasterized [primitive](concepts/GLOSSARY.md#primitive).
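The "once per vertex, once per covered pixel" split can be made concrete with a small CPU-side sketch. It is only an illustration under stated assumptions: `Vertex`, `vertex_stage`, `fragment_stage`, `edge`, and `count_invocations` are hypothetical stand-ins written for this example, and real GPUs rasterize very differently.

```rust
// CPU-side sketch of the two shader stages. Illustrative only: the names
// below are hypothetical and this is not how wgpu or a GPU is implemented.

#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Stand-in for the vertex stage: one call per vertex.
fn vertex_stage(v: &Vertex) -> ([f32; 4], [f32; 3]) {
    let p = v.position;
    ([p[0], p[1], p[2], 1.0], v.color)
}

/// Stand-in for the fragment stage: one call per covered pixel.
fn fragment_stage(color: [f32; 3]) -> [f32; 4] {
    [color[0], color[1], color[2], 1.0]
}

/// Twice the signed area of the triangle (a, b, p).
fn edge(a: [f32; 2], b: [f32; 2], p: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
}

/// Rasterizes one triangle onto a `size` x `size` pixel grid and returns
/// (vertex stage calls, fragment stage calls).
fn count_invocations(tri: [Vertex; 3], size: usize) -> (usize, usize) {
    let mut vertex_calls = 0;
    let mut clip = [[0.0f32; 4]; 3];
    let mut colors = [[0.0f32; 3]; 3];
    for (i, v) in tri.iter().enumerate() {
        let (position, color) = vertex_stage(v);
        clip[i] = position;
        colors[i] = color;
        vertex_calls += 1;
    }

    // Perspective division (a no-op here because w is 1.0).
    let ndc: Vec<[f32; 2]> = clip.iter().map(|c| [c[0] / c[3], c[1] / c[3]]).collect();
    let area = edge(ndc[0], ndc[1], ndc[2]);

    let mut fragment_calls = 0;
    for row in 0..size {
        for col in 0..size {
            // Pixel center mapped into NDC (x right, y up).
            let x = (col as f32 + 0.5) / size as f32 * 2.0 - 1.0;
            let y = 1.0 - (row as f32 + 0.5) / size as f32 * 2.0;
            let p = [x, y];
            // Barycentric weights of the pixel center.
            let w = [
                edge(ndc[1], ndc[2], p) / area,
                edge(ndc[2], ndc[0], p) / area,
                edge(ndc[0], ndc[1], p) / area,
            ];
            if w.iter().all(|&wi| wi >= 0.0) {
                // Blend the three vertex colors with the weights.
                let mut color = [0.0f32; 3];
                for c in 0..3 {
                    color[c] = w[0] * colors[0][c] + w[1] * colors[1][c] + w[2] * colors[2][c];
                }
                let _rgba = fragment_stage(color);
                fragment_calls += 1;
            }
        }
    }
    (vertex_calls, fragment_calls)
}
```

Calling `count_invocations` with three vertices always reports three vertex stage calls, while the fragment count grows with the grid resolution and the triangle's screen coverage. That asymmetry is why per-pixel work belongs in the fragment shader and per-vertex work in the vertex shader.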
+
+> **Key insight #1 — Interpolation is free hardware:** The vertex shader outputs
+> per-vertex colors at `@location(0)`. The [rasterizer](concepts/GLOSSARY.md#rasterizer)
+> automatically interpolates them across the triangle surface using
+> [barycentric coordinates](concepts/GLOSSARY.md#barycentric-coordinates). The
+> fragment shader just returns whatever it receives. The rainbow gradient is not
+> programmed — it is a consequence of the pipeline architecture. You supply
+> colors at three points; the hardware computes every color in between at zero
+> shader cost.
+
+**Why WGSL:** WebGPU Shading Language ([WGSL](concepts/GLOSSARY.md#wgsl)) is
+the single source format. wgpu translates it at runtime into what each backend
+consumes: SPIR-V for Vulkan, MSL for Metal, HLSL for DirectX 12. You write one
+shader file and wgpu produces the right shader for every backend.
+
+**Why `include_str!("shader.wgsl")`:** This Rust macro embeds the file contents
+at compile time. The shader source becomes a string literal inside your binary.
+At runtime there is zero file I/O. No paths to resolve, no loading failures,
+no async reads. If the file is missing or is not valid UTF-8, the build fails.
+The macro does not parse WGSL, so shader errors still surface at runtime when
+the shader module is created.
+
+### The Complete Shader
+
+Create `shader.wgsl` in the same directory as `main.rs`, because `include_str!`
+resolves its path relative to the source file that invokes it:
+
+```wgsl
+struct VertexOutput {
+    @builtin(position) clip_position: vec4<f32>,
+    @location(0) vertex_color: vec3<f32>,
+};
+
+@vertex
+fn vs_main(
+    @location(0) position: vec3<f32>,
+    @location(1) color: vec3<f32>,
+) -> VertexOutput {
+    var out: VertexOutput;
+    out.clip_position = vec4<f32>(position, 1.0);
+    out.vertex_color = color;
+    return out;
+}
+
+@fragment
+fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
+    return vec4<f32>(input.vertex_color, 1.0);
+}
+```
+
+### Line-by-Line Walkthrough
+
+**`struct VertexOutput { ... }`** — Defines the data flowing between the vertex
+shader and the fragment shader.
+This struct is not a Rust type and not a buffer
+layout — it is the output contract of the vertex shader that the rasterizer
+carries through to the fragment shader.
+
+**`@builtin(position) clip_position: vec4<f32>`** — `@builtin(position)` is a
+reserved GPU output slot. Every vertex shader must produce a `vec4<f32>` at
+this slot. This value is the vertex position in [clip space](concepts/GLOSSARY.md#clip-space).
+The GPU uses it for clipping and for perspective division (dividing x, y, z by w
+to produce [NDC](concepts/GLOSSARY.md#ndc)). In our triangle, the w
+component is 1.0, so perspective division is the identity operation — our
+positions are already in the right space.
+
+**`@location(0) vertex_color: vec3<f32>`** — `@location(0)` marks this field
+for interpolation. Any `@location(n)` output from the vertex shader that is
+not a builtin is interpolated by the rasterizer using barycentric weights
+(the default behavior for floating-point outputs). At each vertex, the value
+is exact. Inside the triangle, it is the weighted blend of all three vertex
+values. The fragment shader receives a different `vertex_color` for every
+pixel, without any manual interpolation code.
+
+> **Key insight #2 — THE LOCATIONS MUST MATCH:** `shader_location: 0` in
+> Rust's `VertexAttribute` MUST equal `@location(0)` in WGSL's parameter
+> annotation. If the shader reads a location that no attribute supplies, wgpu
+> rejects the pipeline at creation time. The dangerous case is two attributes
+> with compatible formats and swapped locations: validation passes, and the
+> shader receives position bytes as color (or the reverse). That is not a type
+> error or a runtime panic. The GPU reads the bytes it was pointed at and
+> interprets them as floats, so the output is silently wrong.
+
+**`@vertex fn vs_main(...)`** — `@vertex` declares this function as the vertex
+shader entry point. The function is invoked once per vertex in the draw call.
+For our triangle with three vertices, `vs_main` runs exactly three times.
+
+**`@location(0) position: vec3<f32>`** — This input parameter receives data
+from the vertex buffer mapped by `shader_location: 0`.
+In our Rust
+`VertexBufferLayout`, the first `VertexAttribute` reads 3 floats at offset 0
+and delivers them to the shader at location 0. This is the raw NDC position.
+
+**`@location(1) color: vec3<f32>`** — The second vertex buffer attribute
+mapped to location 1. Reads 3 floats at the offset after the position
+(12 bytes into each vertex) — the per-vertex RGB color.
+
+**`var out: VertexOutput;`** — Local variable declaration. `var` creates a
+mutable function-local variable, which is needed here because the fields are
+assigned one at a time. `let` would declare an immutable value instead.
+
+**`out.clip_position = vec4<f32>(position, 1.0);`** — Converts the `vec3<f32>`
+input into [homogeneous coordinates](concepts/GLOSSARY.md#homogeneous-coordinates)
+by appending w = 1.0. This promotes the position from 3D to clip space. With
+w = 1.0, perspective division (x/w, y/w, z/w) leaves the coordinates unchanged.
+If we were using perspective projection, the vertex shader would compute a
+nontrivial w value from the depth.
+
+**`out.vertex_color = color;`** — Passes the input color through to the output.
+The rasterizer picks this field up, interpolates it across the triangle surface,
+and delivers the interpolated value to every fragment.
+
+**`@fragment fn fs_main(input: VertexOutput)`** — `@fragment` declares the
+fragment shader entry point. `input` is the rasterizer's interpolated output
+from the vertex shader. Every `@location(n)` field in `VertexOutput` is now
+pre-blended. The `@builtin(position)` field is different: in the fragment
+stage it holds the fragment's own framebuffer-space position (pixel-center x
+and y, plus depth), not the clip-space value the vertex shader wrote.
+
+**`-> @location(0) vec4<f32>`** — The fragment shader must output at least one
+color value. The location index selects the color target: `@location(0)` writes
+to the first entry of `targets` in the
+[render pipeline](concepts/GLOSSARY.md#pipeline) descriptor. The
+return type is `vec4<f32>` — RGBA with linear-space components.
+
+**`return vec4<f32>(input.vertex_color, 1.0);`** — Promotes the interpolated
+RGB color to RGBA by setting alpha = 1.0 (fully opaque).
+The
+[rasterizer](concepts/GLOSSARY.md#rasterizer) interpolated `input.vertex_color`
+across the triangle; we just attach an alpha channel and return it. Because
+blending is disabled in S6, the output merge stage writes this color directly
+to the framebuffer.
+
+### Rust Shader Module Creation
+
+The Rust side loads the shader file at compile time and feeds the source to wgpu:
+
+```rust
+let shader_module = device.create_shader_module(
+    wgpu::ShaderModuleDescriptor {
+        label: Some("Rainbow Triangle Shader"),
+        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
+    }
+);
+```
+
+- **`ShaderModuleDescriptor`** — has two fields: `label` (debug string, shown
+  in graphics debuggers and validation messages) and `source` (the shader
+  text).
+- **`ShaderSource::Wgsl(...)`** — wraps the WGSL string. wgpu also accepts
+  SPIR-V binary source via `ShaderSource::SpirV`, but WGSL is the native
+  path.
+- **`device.create_shader_module()`** — takes the descriptor and parses +
+  validates the shader. On Vulkan, wgpu translates WGSL to SPIR-V internally.
+  The call returns a `ShaderModule`, not a `Result`: syntax errors and type
+  mismatches are reported through wgpu's error handling (an error scope, or
+  the uncaptured-error handler, which panics by default on native targets).
+  A misspelled entry point name is caught later, at pipeline creation.
+- **`&shader_module`** — the resulting handle is passed by reference into the
+  render pipeline descriptor. The module remains valid for the lifetime of the
+  pipeline.
+
+## S5: Uploading Vertex Data to the GPU
+
+New concept: **GPU memory isolation.** The GPU cannot read Rust heap or stack
+memory directly. Vertex data must be laid out as a flat byte array and uploaded
+into a dedicated GPU [buffer](concepts/GLOSSARY.md#buffer-slice). The
+pipeline configuration then describes how to interpret those bytes: how many
+bytes each vertex occupies (the stride), what format each attribute has, and at
+which offset within the stride each attribute begins.
+
+> **Key insight #3 — `create_buffer_init` comes from an extension trait:** The method
+> lives in `wgpu::util::DeviceExt`, not on `Device` directly.
+> If you call
+> `device.create_buffer_init(...)` without importing the trait, the compiler
+> reports "method not found." This is a Rust trait-discovery issue, not a wgpu
+> API issue. Add `use wgpu::util::DeviceExt;` to bring the method into scope.
+
+### The Vertex Struct
+
+```rust
+#[repr(C)]
+#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
+struct Vertex {
+    position: [f32; 3],
+    color: [f32; 3],
+}
+```
+
+- **`#[repr(C)]`** — Forces the Rust compiler to lay out the struct fields in
+  declaration order using C layout rules. Without this, Rust is free to
+  reorder fields, which would break the byte layout the shader expects.
+- **`bytemuck::Pod`** — "Plain Old Data." Guarantees the struct has no padding
+  holes, no destructors, and a trivial memory representation. wgpu itself only
+  accepts bytes (`&[u8]`); `Pod` is what lets `bytemuck` reinterpret the vertex
+  slice as bytes safely.
+- **`bytemuck::Zeroable`** — Guarantees that initializing the struct's memory
+  to all-zero bytes produces a valid instance. `Pod` lists `Zeroable` as a
+  supertrait, so both must be derived. Together they enable
+  `bytemuck::cast_slice` to convert `&[Vertex]` to `&[u8]` without an
+  `unsafe` block.
+
+### Vertex Data
+
+```rust
+const VERTICES: &[Vertex] = &[
+    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
+    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
+    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
+];
+```
+
+- **Positions are in NDC:** The [normalized device coordinates](concepts/GLOSSARY.md#ndc)
+  range from -1.0 (left/bottom) to +1.0 (right/top). Our triangle covers the
+  middle of the screen, half the viewport wide and half the viewport tall: the
+  bottom-left corner at (-0.5, -0.5), the bottom-right at (0.5, -0.5), and the
+  top center at (0.0, 0.5). This produces an upright, centered triangle.
+- **CCW winding order:** The vertices are listed counter-clockwise:
+  red → blue → green. With x pointing right and y pointing up, connecting
+  the vertices in this sequence traces the triangle counter-clockwise. This
+  determines which face is "front" and which is "back", which matters for
+  [culling](concepts/GLOSSARY.md#rasterizer).
+
+### Buffer Upload
+
+```rust
+use wgpu::util::DeviceExt;
+let vertex_buffer = device.create_buffer_init(
+    &wgpu::util::BufferInitDescriptor {
+        label: Some("Vertex Buffer"),
+        contents: bytemuck::cast_slice(VERTICES),
+        usage: wgpu::BufferUsages::VERTEX,
+    }
+);
+```
+
+- **`use wgpu::util::DeviceExt`** — imports the extension trait that adds
+  `create_buffer_init` to `Device`. Without this import, the method is not
+  visible.
+- **`device.create_buffer_init(...)`** — combined allocate-and-upload. It
+  creates a GPU buffer that is mapped at creation, copies the `contents` slice
+  into the mapped range, and unmaps the buffer so the data is available to the
+  GPU. It saves you a separate `create_buffer` call followed by
+  `queue.write_buffer`.
+- **`bytemuck::cast_slice(VERTICES)`** — converts `&[Vertex]` to `&[u8]`
+  by reinterpreting the same memory at a byte level. The GPU receives 72 bytes:
+  three vertices × 24 bytes per vertex (6 × `f32` = 6 × 4 bytes). No copy, no
+  serialization — just a pointer reinterpretation.
+- **`BufferUsages::VERTEX`** — declares this buffer will be bound as a vertex
+  buffer in the pipeline. wgpu's validation layer will reject any attempt to
+  use this buffer as a copy source or destination, or as a uniform or storage
+  buffer. Usage bits are chosen at creation and cannot be changed.
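The byte arithmetic in S5 (24 bytes per vertex, 72 bytes total, color starting 12 bytes into each vertex) can be checked without a GPU. The following is a minimal standard-library sketch, not the tutorial's upload path: `vertices_to_bytes` and `read_component` are hypothetical helpers, and the manual copy stands in for what `bytemuck::cast_slice` achieves without copying.

```rust
use std::mem::size_of;

#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Hypothetical helper: serializes vertices into one flat byte array,
/// field by field, in native byte order. bytemuck::cast_slice yields the
/// same bytes for this struct without copying.
fn vertices_to_bytes(vertices: &[Vertex]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(vertices.len() * size_of::<Vertex>());
    for v in vertices {
        for component in v.position.iter().chain(v.color.iter()) {
            bytes.extend_from_slice(&component.to_ne_bytes());
        }
    }
    bytes
}

/// Hypothetical helper: reads one f32 the way a vertex fetch would, using
/// a stride, a vertex index, an attribute offset, and a component index.
fn read_component(
    bytes: &[u8],
    stride: usize,
    index: usize,
    attribute_offset: usize,
    component: usize,
) -> f32 {
    let start = index * stride + attribute_offset + component * size_of::<f32>();
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[start..start + 4]);
    f32::from_ne_bytes(raw)
}
```

Reading vertex 1 with a stride of 24 and an attribute offset of 12 returns that vertex's color components, while offset 0 returns its position. This is the same addressing that the vertex buffer layout in the next section describes to wgpu.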
+
+## S6: Compiling the Render Pipeline
+
+New concept: **the render pipeline is a compiled GPU configuration.** A
+[render pipeline](concepts/GLOSSARY.md#pipeline) bundles every decision the GPU
+needs to execute a draw: which shaders to run, how to interpret vertex buffer
+bytes, what [topology](concepts/GLOSSARY.md#topology) to use, whether to cull
+back faces, what blend mode to apply, and where to write the output. Pipeline
+creation is not a simple struct allocation — it compiles these choices into a
+GPU-executable configuration. Errors in any field are caught at creation time,
+not at draw time. This validation-upfront model is what makes pipelines expensive
+to create but cheap to execute.
+
+### Vertex Buffer Layout
+
+Before the pipeline descriptor, you must tell wgpu how to parse the byte stream
+in the vertex buffer into per-vertex attributes:
+
+```rust
+let vertex_buffer_layout = wgpu::VertexBufferLayout {
+    array_stride: std::mem::size_of::<Vertex>() as u64,
+    step_mode: wgpu::VertexStepMode::Vertex,
+    attributes: &[
+        wgpu::VertexAttribute {
+            offset: 0,
+            format: wgpu::VertexFormat::F32x3,
+            shader_location: 0,
+        },
+        wgpu::VertexAttribute {
+            offset: std::mem::size_of::<[f32; 3]>() as u64,
+            format: wgpu::VertexFormat::F32x3,
+            shader_location: 1,
+        },
+    ],
+};
+```
+
+- **`array_stride: 24`** — `size_of::<Vertex>()` = 24 bytes (6 `f32` values × 4 bytes).
+  This is the byte distance from one vertex to the next in the buffer. The GPU
+  uses this to step through the buffer: vertex 0 starts at byte 0, vertex 1
+  at byte 24, vertex 2 at byte 48.
+- **`step_mode: Vertex`** — advance the buffer by one stride for every vertex
+  the vertex shader processes. The other option is `Instance`, which advances
+  per draw instance in instanced rendering. For a single triangle, `Vertex` is
+  correct: each of the three vertices has its own position and color.
+- **First attribute — `shader_location: 0`**: reads 3 floats (`F32x3`) at byte
+  offset 0 of each vertex.
+  These 3 floats map to the
+  [shader location](concepts/GLOSSARY.md#shader-location) `@location(0)` in the
+  vertex shader — the `position` parameter. The GPU delivers `[x, y, z]` to
+  that function argument.
+- **Second attribute — `shader_location: 1`**: reads 3 floats at offset 12
+  (`size_of::<[f32; 3]>()` = 3 × 4 = 12). Skips past the position array to
+  the color array inside each vertex. Maps to `@location(1)` in the shader —
+  the `color` parameter. If the offset were 0 instead of 12, the shader would
+  receive the position values as the color input, rendering a triangle with
+  gradient colors derived from position data.
+
+### The Complete Render Pipeline Descriptor
+
+```rust
+let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
+    label: Some("Triangle Pipeline"),
+    layout: None,
+    vertex: wgpu::VertexState {
+        module: &shader_module,
+        entry_point: Some("vs_main"),
+        buffers: &[vertex_buffer_layout],
+        compilation_options: Default::default(),
+    },
+    primitive: wgpu::PrimitiveState {
+        topology: wgpu::PrimitiveTopology::TriangleList,
+        strip_index_format: None,
+        front_face: wgpu::FrontFace::Ccw,
+        cull_mode: Some(wgpu::Face::Back),
+        unclipped_depth: false,
+        polygon_mode: wgpu::PolygonMode::Fill,
+        conservative: false,
+    },
+    depth_stencil: None,
+    multisample: wgpu::MultisampleState {
+        count: 1,
+        mask: !0,
+        alpha_to_coverage_enabled: false,
+    },
+    fragment: Some(wgpu::FragmentState {
+        module: &shader_module,
+        entry_point: Some("fs_main"),
+        targets: &[Some(wgpu::ColorTargetState {
+            format: config.format,
+            blend: None,
+            write_mask: wgpu::ColorWrites::ALL,
+        })],
+        compilation_options: Default::default(),
+    }),
+    multiview_mask: None,
+    cache: None,
+});
+```
+
+### Field-by-Field Walkthrough
+
+**`RenderPipelineDescriptor` has 9 fields.** Every field must be present. The
+structure does not use `..Default::default()` at the descriptor level — each
+field is filled explicitly.
+
+**`label: Some("Triangle Pipeline")`** — Debug string. Shown in graphics
+debuggers and profilers (RenderDoc, NVIDIA Nsight) and in wgpu validation error
+messages. Omitting it produces anonymous pipelines that are hard to trace
+during debugging.
+
+**`layout: None`** — Asks wgpu to derive the pipeline layout from the shader
+entry points. That is sufficient here because the shader uses no bind groups
+or push constants. Once you add `@group(n)` bindings to your shader, an
+explicit `wgpu::PipelineLayout` created with `device.create_pipeline_layout()`
+is usually the better choice, because it lets several pipelines share the same
+bind group layouts.
+
+**`vertex` — [`VertexState`](concepts/GLOSSARY.md#vertex-shader) (4 fields):**
+- **`module: &shader_module`** — references the compiled shader module from S4.
+- **`entry_point: Some("vs_main")`** — selects which function in the module is
+  the vertex shader entry point. Must match the `@vertex fn vs_main(...)`
+  declaration exactly.
+- **`buffers: &[vertex_buffer_layout]`** — array of vertex buffer layouts.
+  Multiple layouts are needed when attributes come from several buffers, for
+  example a separate per-instance buffer for instanced rendering. For a single
+  vertex buffer, one layout suffices.
+- **`compilation_options: Default::default()`** — per-stage compilation
+  settings, such as values for WGSL `override` constants. The defaults are
+  sufficient here.
+
+**`primitive` — [`PrimitiveState`](concepts/GLOSSARY.md#primitive) (7 fields):**
+- **`topology: TriangleList`** — every 3 consecutive vertices form one
+  triangle. For 3 vertices, this produces exactly 1 triangle. If we had 6
+  vertices, it would produce 2 independent triangles.
+- **`strip_index_format: None`** — only set for `TriangleStrip` or `LineStrip`
+  topologies when using restart indices. Not applicable to `TriangleList`.
+- **`front_face: Ccw`** — counter-clockwise winding defines the front face of
+  a triangle. Combined with `cull_mode`, this determines which triangles are
+  visible.
+  Because our vertices are listed CCW in S5, triangles drawn in that
+  order face toward the viewer.
+- **`cull_mode: Some(wgpu::Face::Back)`** — discard triangles whose winding
+  indicates a back face. For a single triangle viewed from the front, this is
+  harmless, and it establishes correct culling for closed 3D meshes, where
+  back faces are hidden behind front faces.
+- **`unclipped_depth: false`** — depth values outside [0.0, 1.0] are clipped
+  (the standard behavior). `true` allows depth values beyond the normal range
+  to pass through — used for specific depth-testing tricks, and only available
+  when the device supports the `DEPTH_CLIP_CONTROL` feature.
+- **`polygon_mode: Fill`** — render the full interior of the triangle. Other
+  options are `Line` (wireframe edges) and `Point` (vertex points only), both
+  of which depend on optional device features.
+- **`conservative: false`** — standard rasterization: a fragment is generated
+  only for pixels whose sample point (the pixel center when `count` is 1) lies
+  inside the triangle. `true` generates a fragment for every pixel the
+  triangle touches at all. That mode, conservative rasterization, is used for
+  techniques such as voxelization and occlusion tests and requires an optional
+  device feature.
+
+**`depth_stencil: None`** — No depth buffer or stencil buffer. Without depth
+testing, triangles are drawn in submission order: later draws overwrite earlier
+draws at the same pixel. For a single triangle this is not a concern.
+
+**`multisample` — [`MultisampleState`](concepts/GLOSSARY.md#fragment) (3 fields):**
+- **`count: 1`** — no multisampling. Each pixel produces one fragment. Higher
+  values such as 4 activate MSAA, sampling multiple points per pixel and
+  reducing aliasing at the cost of framebuffer bandwidth. Which counts are
+  available depends on the texture format and the adapter.
+- **`mask: !0`** — all sample bits are enabled. This mask allows you to
+  selectively disable individual MSAA samples (advanced use case).
+- **`alpha_to_coverage_enabled: false`** — do not use the alpha channel of the
+  fragment color as a coverage mask. Enabled for transparent edge antialiasing
+  (e.g., font rendering).
+
+**`fragment` — [`FragmentState`](concepts/GLOSSARY.md#fragment-shader) (4 fields):**
+- **`module: &shader_module`** — same shader module as the vertex shader.
+- **`entry_point: Some("fs_main")`** — selects the fragment shader entry point.
+  Must match `@fragment fn fs_main(...)` in the WGSL.
+- **`targets`** — array of color target states, one per render pass color
+  attachment. `&[Some(...)]` means one color target present. `None` at an
+  index marks that attachment slot as unused.
+  - **`ColorTargetState` has exactly 3 fields** (no `view_formats` field):
+    - **`format: config.format`** — MUST match the surface format from
+      `SurfaceConfiguration`. The pipeline writes in this format, and the
+      render pass attachment it draws into must use the same one. A mismatch
+      produces a validation error when the pipeline is used in the render
+      pass. If you change the surface format, you must recreate the pipeline.
+    - **`blend: None`** — disables blending. Without blending, every fragment
+      color replaces the existing framebuffer pixel (the same result as
+      `BlendState::REPLACE`). With blending, new and existing colors are
+      combined according to a blend equation (useful for transparency).
+    - **`write_mask: ColorWrites::ALL`** — write all four RGBA channels.
+      You can mask out individual channels (e.g., write only R and G) if you
+      need to preserve certain framebuffer channels across draw calls.
+- **`compilation_options: Default::default()`** — fragment shader compilation
+  settings, same as the vertex compilation options above.
+
+**`multiview_mask: None`** — no multiview rendering. Multiview is for
+stereoscopic (VR) or multi-viewport single-pass rendering. Not used here.
+
+**`cache: None`** — no pipeline cache. A pipeline cache stores compiled shader
+binaries to speed up subsequent pipeline creation. Useful when creating many
+pipelines dynamically; for a single pipeline, caching has no practical benefit.
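The interplay of `front_face: Ccw` and `cull_mode: Some(Face::Back)` described in the primitive state walkthrough can be checked on the CPU with a signed-area test. This is a sketch under stated assumptions: positions are 2D NDC coordinates with y pointing up (as the vertex data in S5 is specified), and `Winding`, `winding`, and `survives_back_face_culling` are hypothetical names, not part of wgpu.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Winding {
    CounterClockwise,
    Clockwise,
    Degenerate,
}

/// Classifies the winding of triangle (a, b, c) from the sign of twice its
/// signed area, assuming x points right and y points up.
fn winding(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> Winding {
    let twice_area = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    if twice_area > 0.0 {
        Winding::CounterClockwise
    } else if twice_area < 0.0 {
        Winding::Clockwise
    } else {
        Winding::Degenerate
    }
}

/// Mirrors a pipeline configured with front_face = Ccw and
/// cull_mode = Some(Back): only counter-clockwise triangles are kept.
fn survives_back_face_culling(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> bool {
    winding(a, b, c) == Winding::CounterClockwise
}
```

Passing the tutorial's positions in their listed order (bottom-left, bottom-right, top) classifies the triangle as counter-clockwise, so it is kept. Swapping any two vertices flips the sign of the area, which is a common reason for a triangle that silently disappears when back-face culling is enabled.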