From de38f526b9597fbbef8f93e052e519444f3f8d9d Mon Sep 17 00:00:00 2001
From: Krishna Ayyalasomayajula <krishna@ayyalasomayajula.net>
Date: Sat, 30 May 2026 17:44:31 -0500
Subject: [PATCH] docs: append sections S4-S6 (shaders, vertex data, render
 pipeline)

---
 docs/01-rainbow-triangle.md | 447 ++++++++++++++++++++++++++++++++++++
 1 file changed, 447 insertions(+)
diff --git a/docs/01-rainbow-triangle.md b/docs/01-rainbow-triangle.md
index 8a78303..e94a35b 100644
--- a/docs/01-rainbow-triangle.md
+++ b/docs/01-rainbow-triangle.md
@@ -438,3 +438,450 @@ back pressure, smoothing out frame time spikes.
 
 Steps 6 through 8 — shader module compilation, vertex buffer upload, and render
 pipeline assembly — will be explored in detail in the next sections.
+
+## S4: Writing the Shaders
+
+New concept: **shaders are GPU programs.** A [shader](concepts/GLOSSARY.md#shader)
+is a function or set of functions that runs on the GPU, compiled once at pipeline
+creation time, then executed thousands of times in parallel. Each invocation
+operates on different data but follows the identical instruction sequence. There
+is no heap allocation, no recursion, and no file or network I/O, and this shader
+shares no mutable state between invocations. The GPU runs invocations in
+lockstep groups (warps or wavefronts): if threads in one group branch
+differently, the group executes both paths in turn and masks off inactive lanes.
+This is why shaders favor parallelism and uniform or branchless control flow.
+
+A [shader module](concepts/GLOSSARY.md#shader) can contain multiple entry points.
+For this triangle, the two entry points we need are the [vertex shader](concepts/GLOSSARY.md#vertex-shader)
+and the [fragment shader](concepts/GLOSSARY.md#fragment-shader). The vertex
+shader runs once per [vertex](concepts/GLOSSARY.md#vertex). The fragment shader
+runs once per [fragment](concepts/GLOSSARY.md#fragment) — that is, once per pixel
+covered by the rasterized [primitive](concepts/GLOSSARY.md#primitive).
+
+> **Key insight #1 — Interpolation is free hardware:** The vertex shader outputs
+> per-vertex colors at `@location(0)`. The [rasterizer](concepts/GLOSSARY.md#rasterizer)
+> automatically interpolates them across the triangle surface using
+> [barycentric coordinates](concepts/GLOSSARY.md#barycentric-coordinates). The
+> fragment shader just returns whatever it receives. The rainbow gradient is not
+> programmed — it is a consequence of the pipeline architecture. You supply
+> colors at three points; the hardware computes every color in between at zero
+> shader cost.
+
+**Why WGSL:** WebGPU Shading Language ([WGSL](concepts/GLOSSARY.md#wgsl)) is
+the single source format. wgpu translates it at runtime into what each backend
+consumes: SPIR-V for Vulkan, MSL for Metal, HLSL for DirectX 12. You write one
+shader file and wgpu targets every backend from it.
+
+**Why `include_str!("shader.wgsl")`:** This Rust macro embeds the file contents
+at compile time. The shader source becomes a string literal inside your binary.
+At runtime there is zero file I/O. No paths to resolve, no loading failures,
+no async reads. If the file is missing or is not valid UTF-8, the build fails;
+WGSL errors inside it still surface at runtime, when wgpu parses the source.
+
+### The Complete Shader
+
+Create `shader.wgsl` next to `main.rs` (`include_str!` resolves paths relative to the invoking source file):
+
+```wgsl
+struct VertexOutput {
+    @builtin(position) clip_position: vec4<f32>,
+    @location(0) vertex_color: vec3<f32>,
+};
+
+@vertex
+fn vs_main(
+    @location(0) position: vec3<f32>,
+    @location(1) color: vec3<f32>,
+) -> VertexOutput {
+    var out: VertexOutput;
+    out.clip_position = vec4<f32>(position, 1.0);
+    out.vertex_color = color;
+    return out;
+}
+
+@fragment
+fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
+    return vec4<f32>(input.vertex_color, 1.0);
+}
+```
+
+### Line-by-Line Walkthrough
+
+**`struct VertexOutput { ... }`** — Defines the data flowing between the vertex
+shader and the fragment shader. This struct is not a Rust type and not a buffer
+layout — it is the output contract of the vertex shader that the rasterizer
+carries through to the fragment shader.
+
+**`@builtin(position) clip_position: vec4<f32>`** — `@builtin(position)` is a
+reserved GPU output slot. Every vertex shader must produce a `vec4<f32>` at
+this slot. This value is the vertex position in [clip space](concepts/GLOSSARY.md#clip-space).
+The GPU uses it for perspective division (dividing x, y, z by w to produce
+[NDC](concepts/GLOSSARY.md#ndc)) and clipping. In our triangle, the w
+component is 1.0, so perspective division is the identity operation — our
+positions are already in the right space.
+
+**`@location(0) vertex_color: vec3<f32>`** — `@location(0)` marks this field
+for interpolation. Any `@location(n)` output from the vertex shader that is
+not a builtin is automatically interpolated by the rasterizer using barycentric
+weights. At each vertex, the value is exact. Inside the triangle, it is the
+weighted blend of all three vertex values. The fragment shader receives a
+different `vertex_color` for every pixel, without any manual interpolation code.
+
+> **Key insight #2 — THE LOCATIONS MUST MATCH:** `shader_location: 0` in
+> Rust's `VertexAttribute` MUST equal `@location(0)` in WGSL's parameter
+> annotation. If the shader declares a location that no attribute supplies,
+> pipeline creation fails validation. The dangerous case is two attributes of
+> the same format with swapped locations: nothing fails, and the shader
+> silently reads position bytes as color and color bytes as position.
+
+**`@vertex fn vs_main(...)`** — `@vertex` declares this function as the vertex
+shader entry point. The function is invoked once per vertex in the draw call.
+For our triangle with three vertices, `vs_main` runs exactly three times.
+
+**`@location(0) position: vec3<f32>`** — This input parameter receives data
+from the vertex buffer mapped by `shader_location: 0`. In our Rust
+`VertexBufferLayout`, the first `VertexAttribute` reads 3 floats at offset 0
+and delivers them to the shader at location 0. This is the raw NDC position.
+
+**`@location(1) color: vec3<f32>`** — The second vertex buffer attribute
+mapped to location 1. Reads 3 floats at the offset after the position
+(12 bytes into each vertex) — the per-vertex RGB color.
+
+**`var out: VertexOutput;`** — Local variable declaration. `var` creates a
+mutable local, which we need because the fields are assigned one by one below.
+
+**`out.clip_position = vec4<f32>(position, 1.0);`** — Converts the `vec3`
+input into [homogeneous coordinates](concepts/GLOSSARY.md#homogeneous-coordinates)
+by appending w = 1.0. This promotes the position from 3D to clip space. With
+w = 1.0, perspective division (x/w, y/w, z/w) leaves the coordinates unchanged.
+If we were using perspective projection, the vertex shader would compute a
+nontrivial w value from the depth.
+
+**`out.vertex_color = color;`** — Passes the input color through to the output.
+The rasterizer picks this field up, interpolates it across the triangle surface,
+and delivers the interpolated value to every fragment.
+
+**`@fragment fn fs_main(input: VertexOutput)`** — `@fragment` declares the
+fragment shader entry point. `input` is the rasterizer's interpolated output
+from the vertex shader. Every `@location(n)` field in `VertexOutput` is now
+pre-blended. The `@builtin(position)` field changes meaning here: it holds the
+fragment's framebuffer-space position (pixel x and y, plus depth), not clip space.
+
+**`-> @location(0) vec4<f32>`** — The fragment shader writes its color output
+to `@location(0)`. The index must match the position of the color target in the
+`targets` array of the [render pipeline](concepts/GLOSSARY.md#pipeline)
+descriptor. The return type is `vec4<f32>`: RGBA with linear-space components.
+
+**`return vec4<f32>(input.vertex_color, 1.0);`** — Promotes the interpolated
+RGB color to RGBA by setting alpha = 1.0 (fully opaque). The
+[rasterizer](concepts/GLOSSARY.md#rasterizer) interpolated `input.vertex_color`
+across the triangle; we just attach an alpha channel and return it. The output
+merge stage writes this color directly to the framebuffer.
+
+### Rust Shader Module Creation
+
+The Rust side loads the shader file at compile time and feeds the source to wgpu:
+
+```rust
+let shader_module = device.create_shader_module(
+    wgpu::ShaderModuleDescriptor {
+        label: Some("Rainbow Triangle Shader"),
+        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
+    }
+);
+```
+
+- **`ShaderModuleDescriptor`** — has two fields: `label` (debug string, shown
+  in graphics debuggers and validation messages) and `source` (the shader
+  text).
+- **`ShaderSource::Wgsl(...)`** — wraps the WGSL string. wgpu also accepts
+  SPIR-V binary source via `ShaderSource::SpirV`, but WGSL is the native
+  path.
+- **`device.create_shader_module()`** — takes the descriptor, then parses and
+  validates the shader. On Vulkan, wgpu translates WGSL to SPIR-V internally.
+  The call returns a `ShaderModule`, not a `Result`: syntax and type errors go
+  through wgpu's error handling, which panics by default on native targets.
+- **`&shader_module`** — the resulting handle is passed by reference into the
+  render pipeline descriptor. The module remains valid for the lifetime of the
+  pipeline.
+
+## S5: Uploading Vertex Data to the GPU
+
+New concept: **GPU memory isolation.** The GPU cannot read Rust heap or stack
+memory directly. Vertex data must be laid out as a flat byte array and uploaded
+into a dedicated GPU [buffer](concepts/GLOSSARY.md#buffer-slice). The
+pipeline configuration then describes how to interpret those bytes: how many
+bytes per vertex, what format each attribute has, and where within each
+vertex's stride the attribute begins.
+
+> **Key insight #3 — `create_buffer_init` is an extension trait:** The method
+> lives in `wgpu::util::DeviceExt`, not on `Device` directly. If you call
+> `device.create_buffer_init(...)` without importing the trait, the compiler
+> reports "method not found." This is a Rust trait-discovery issue, not a wgpu
+> API issue. Add `use wgpu::util::DeviceExt;` to bring the method into scope.
+
+### The Vertex Struct
+
+```rust
+#[repr(C)]
+#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
+struct Vertex {
+    position: [f32; 3],
+    color: [f32; 3],
+}
+```
+
+- **`#[repr(C)]`** — Forces the Rust compiler to lay out the struct fields in
+  declaration order with C-compatible, predictable padding. Without this, Rust
+  is free to reorder fields, which would break the byte layout the shader
+  expects.
+- **`bytemuck::Pod`** — "Plain Old Data." Guarantees the struct has no padding
+  holes, no destructors, and a trivial memory representation. wgpu itself only
+  takes `&[u8]`; `bytemuck::cast_slice` requires `Pod` to produce it safely.
+- **`bytemuck::Zeroable`** — Guarantees that initializing the struct's memory
+  to all-zero bytes produces a valid instance. Required because `Zeroable` is a
+  supertrait of `Pod`, so the `Pod` derive does not compile without it.
+  Together they let `bytemuck::cast_slice` convert between `&[Vertex]` and
+  `&[u8]` without an `unsafe` block.
+
+### Vertex Data
+
+```rust
+const VERTICES: &[Vertex] = &[
+    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
+    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
+    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
+];
+```
+
+- **Positions are in NDC:** The [normalized device coordinates](concepts/GLOSSARY.md#ndc)
+  range from -1.0 (left/bottom) to +1.0 (right/top) in x and y. Our triangle
+  covers the middle of the screen: the bottom-left corner at (-0.5, -0.5), the
+  bottom-right at (0.5, -0.5), and the top center at (0.0, 0.5). This
+  produces an upright, centered triangle.
+- **CCW winding order:** The vertices are listed counter-clockwise:
+  red → blue → green. With +y pointing up, as in NDC, connecting the
+  vertices in this sequence traces the triangle counter-clockwise. This
+  determines which face is "front" and which is "back" — critical for
+  [culling](concepts/GLOSSARY.md#rasterizer) and correct normal computation.
+
+### Buffer Upload
+
+```rust
+use wgpu::util::DeviceExt;
+let vertex_buffer = device.create_buffer_init(
+    &wgpu::util::BufferInitDescriptor {
+        label: Some("Vertex Buffer"),
+        contents: bytemuck::cast_slice(VERTICES),
+        usage: wgpu::BufferUsages::VERTEX,
+    }
+);
+```
+
+- **`use wgpu::util::DeviceExt`** — imports the extension trait that adds
+  `create_buffer_init` to `Device`. Without this import, the method is not
+  visible.
+- **`device.create_buffer_init(...)`** — combined allocate-and-upload. It
+  creates the buffer already mapped (`mapped_at_creation: true`), copies the
+  `contents` slice into the mapped range, and unmaps it. This is a convenience
+  wrapper around `create_buffer` plus that copy; no queue submission is needed.
+- **`bytemuck::cast_slice(VERTICES)`** — converts `&[Vertex]` to `&[u8]`
+  by reinterpreting the same memory at a byte level. The GPU receives 72 bytes:
+  three vertices × 24 bytes per vertex (6 × `f32` = 6 × 4 bytes). No copy, no
+  serialization — just a pointer reinterpretation.
+- **`BufferUsages::VERTEX`** — declares this buffer will be bound as a vertex
+  buffer in the pipeline. wgpu's validation layer will reject any attempt to
+  use this buffer for staging, uniform, or storage access. Usage bits
+  are chosen at creation and cannot be changed.
+
+## S6: Compiling the Render Pipeline
+
+New concept: **the render pipeline is a compiled GPU configuration.** A
+[render pipeline](concepts/GLOSSARY.md#pipeline) bundles every decision the GPU
+needs to execute a draw: which shaders to run, how to interpret vertex buffer
+bytes, what [topology](concepts/GLOSSARY.md#topology) to use, whether to cull
+back faces, what blend mode to apply, and where to write the output. Pipeline
+creation is not a simple struct allocation — it compiles these choices into a
+GPU-executable configuration. Errors in any field are caught at creation time,
+not at draw time. This validation-upfront model is what makes pipelines expensive
+to create but cheap to execute.
+
+### Vertex Buffer Layout
+
+Before the pipeline descriptor, you must tell wgpu how to parse the byte stream
+in the vertex buffer into per-vertex attributes:
+
+```rust
+let vertex_buffer_layout = wgpu::VertexBufferLayout {
+    array_stride: std::mem::size_of::<Vertex>() as u64,
+    step_mode: wgpu::VertexStepMode::Vertex,
+    attributes: &[
+        wgpu::VertexAttribute {
+            offset: 0,
+            format: wgpu::VertexFormat::F32x3,
+            shader_location: 0,
+        },
+        wgpu::VertexAttribute {
+            offset: std::mem::size_of::<[f32; 3]>() as u64,
+            format: wgpu::VertexFormat::F32x3,
+            shader_location: 1,
+        },
+    ],
+};
+```
+
+- **`array_stride: 24`** — `size_of::<Vertex>()` = 24 bytes (6 `f32` values × 4 bytes each).
+  This is the byte distance from one vertex to the next in the buffer. The GPU
+  uses this to step through the buffer: vertex 0 starts at byte 0, vertex 1
+  at byte 24, vertex 2 at byte 48.
+- **`step_mode: Vertex`** — advance the buffer by one stride for every vertex
+  the vertex shader processes. The other option is `Instance`, which advances
+  per draw instance in instanced rendering. For a single triangle, `Vertex` is
+  correct: each of the three vertices has its own position and color.
+- **First attribute — `shader_location: 0`**: reads 3 floats (`F32x3`) at byte
+  offset 0 of each vertex. These 3 floats map to the
+  [shader location](concepts/GLOSSARY.md#shader-location) `@location(0)` in the
+  vertex shader — the `position` parameter. The GPU delivers `[x, y, z]` to
+  that function argument.
+- **Second attribute — `shader_location: 1`**: reads 3 floats at offset 12
+  (`size_of::<[f32; 3]>()` = 3 × 4 = 12). Skips past the position array to
+  the color array inside each vertex. Maps to `@location(1)` in the shader —
+  the `color` parameter. If the offset were 0 instead of 12, the shader would
+  receive the position values as the color input, rendering a triangle with
+  gradient colors derived from position data.
+
+### The Complete Render Pipeline Descriptor
+
+```rust
+let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
+    label: Some("Triangle Pipeline"),
+    layout: None,
+    vertex: wgpu::VertexState {
+        module: &shader_module,
+        entry_point: Some("vs_main"),
+        buffers: &[vertex_buffer_layout],
+        compilation_options: Default::default(),
+    },
+    primitive: wgpu::PrimitiveState {
+        topology: wgpu::PrimitiveTopology::TriangleList,
+        strip_index_format: None,
+        front_face: wgpu::FrontFace::Ccw,
+        cull_mode: Some(wgpu::Face::Back),
+        unclipped_depth: false,
+        polygon_mode: wgpu::PolygonMode::Fill,
+        conservative: false,
+    },
+    depth_stencil: None,
+    multisample: wgpu::MultisampleState {
+        count: 1,
+        mask: !0,
+        alpha_to_coverage_enabled: false,
+    },
+    fragment: Some(wgpu::FragmentState {
+        module: &shader_module,
+        entry_point: Some("fs_main"),
+        targets: &[Some(wgpu::ColorTargetState {
+            format: config.format,
+            blend: None,
+            write_mask: wgpu::ColorWrites::ALL,
+        })],
+        compilation_options: Default::default(),
+    }),
+    multiview_mask: None,
+    cache: None,
+});
+```
+
+### Field-by-Field Walkthrough
+
+**`RenderPipelineDescriptor` has 9 fields.** Every field must be present. The
+structure does not use `..Default::default()` at the descriptor level — each
+field is filled explicitly.
+
+**`label: Some("Triangle Pipeline")`** — Debug string. Shown in GPU profilers
+(RenderDoc, Nvidia Nsight) and wgpu validation error messages. Passing `None`
+produces anonymous pipelines that are much harder to trace during debugging.
+
+**`layout: None`** — Derives the pipeline layout from the shader module
+automatically. Our shader declares no bind groups, so the derived layout is
+empty. Once you add `@group(n)` bindings, you will usually build an explicit
+`PipelineLayout` with `device.create_pipeline_layout()` and pass it as `Some(...)`.
+
+**`vertex` — [`VertexState`](concepts/GLOSSARY.md#vertex-shader) (4 fields):**
+- **`module: &shader_module`** — references the compiled shader module from S4.
+- **`entry_point: Some("vs_main")`** — selects which function in the module is
+  the vertex shader entry point. Must match the `@vertex fn vs_main(...)`
+  declaration exactly.
+- **`buffers: &[vertex_buffer_layout]`** — array of vertex buffer layouts.
+  Multiple layouts are used rarely (multi-mesh, GPU instancing with separate
+  instance buffers). For a single vertex buffer, one layout suffices.
+- **`compilation_options: Default::default()`** — per-stage shader options,
+  such as values for pipeline-overridable constants. The default overrides
+  nothing.
+
+**`primitive` — [`PrimitiveState`](concepts/GLOSSARY.md#primitive) (7 fields):**
+- **`topology: TriangleList`** — every 3 consecutive vertices form one
+  triangle. For 3 vertices, this produces exactly 1 triangle. If we had 6
+  vertices, it would produce 2 independent triangles.
+- **`strip_index_format: None`** — only set for `TriangleStrip` or `LineStrip`
+  topologies when using restart indices. Not applicable to `TriangleList`.
+- **`front_face: Ccw`** — counter-clockwise winding defines the front face of
+  a triangle. Combined with `cull_mode`, this determines which triangles are
+  visible. Because our vertices are listed CCW in S5, triangles drawn in that
+  order face toward the viewer.
+- **`cull_mode: Some(wgpu::Face::Back)`** — discard triangles whose winding
+  indicates a back face. For a single triangle viewed from the front, this is
+  harmless but establishes correct culling for 3D geometry where back faces
+  are guaranteed not to be visible.
+- **`unclipped_depth: false`** — depth values outside [0.0, 1.0] are clipped
+  (the standard behavior). `true` disables that clipping and requires the
+  `Features::DEPTH_CLIP_CONTROL` device feature.
+- **`polygon_mode: Fill`** — render the full interior of the triangle. Other
+  options are `Line` (wireframe edges) and `Point` (vertex points only).
+- **`conservative: false`** — the rasterizer emits fragments only for pixels
+  whose sample point lies inside the triangle. `true` emits a fragment for every
+  pixel the triangle touches at all, which is conservative rasterization (used
+  for shadow volumes and occlusion queries).
+
+**`depth_stencil: None`** — No depth buffer or stencil buffer. Without depth
+testing, triangles are drawn in submission order: later draws overwrite earlier
+draws at the same pixel. For a single triangle this is not a concern.
+
+**`multisample` — [`MultisampleState`](concepts/GLOSSARY.md#fragment) (3 fields):**
+- **`count: 1`** — no multisampling. Each pixel produces one fragment. Higher
+  values activate MSAA (4 is the portable choice; other counts depend on
+  adapter support), trading framebuffer bandwidth for smoother edges.
+- **`mask: !0`** — all sample bits are enabled. This mask allows you to
+  selectively disable individual MSAA samples (advanced use case).
+- **`alpha_to_coverage_enabled: false`** — do not use the alpha channel of the
+  fragment color as a coverage mask. Enabled for transparent edge antialiasing
+  (e.g., font rendering).
+
+**`fragment` — [`FragmentState`](concepts/GLOSSARY.md#fragment-shader) (4 fields):**
+- **`module: &shader_module`** — same shader module as the vertex shader.
+- **`entry_point: Some("fs_main")`** — selects the fragment shader entry point.
+  Must match `@fragment fn fs_main(...)` in the WGSL.
+- **`targets`** — array of color target states, one per render pass output
+  attachment. `&[Some(...)]` means one color target present. `None` at an
+  index leaves that attachment slot without a color output.
+  - **`ColorTargetState` has exactly 3 fields** (no `view_formats` field):
+    - **`format: config.format`** — MUST match the surface format from
+      `SurfaceConfiguration`. The pipeline writes in this format; the surface
+      reads in this format. A mismatch at render time produces an error. If
+      you change the surface format, you must recreate the pipeline.
+    - **`blend: None`** — disables blending. Without blending, every fragment
+      color replaces the existing framebuffer pixel (`REPLACE` mode). With
+      blending, new and existing colors are combined according to a blend
+      equation (useful for transparency).
+    - **`write_mask: ColorWrites::ALL`** — write all four RGBA channels.
+      You can mask out individual channels (e.g., write only R and G) if you
+      need to preserve certain framebuffer channels across draw calls.
+- **`compilation_options: Default::default()`** — fragment shader compilation
+  flags, same as the vertex compilation options above.
+
+**`multiview_mask: None`** — no multiview rendering. Multiview is for
+stereoscopic (VR) or multi-viewport single-pass rendering. Not used here.
+
+**`cache: None`** — no pipeline cache. A pipeline cache stores compiled shader
+binaries to speed up subsequent pipeline creation. Useful when creating many
+pipelines dynamically; for a single pipeline, caching has no practical benefit.