docs: elevate critical WHY explanations to callout blocks

docs: add simplification caveat to graphics pipeline model
docs: add coordinate space journey diagram
2026-05-30 21:01:23 -05:00 · 2026-05-30 20:59:58 -05:00 · 2026-05-30 20:59:18 -05:00 · 2026-05-30 20:59:06 -05:00 · 2026-05-30 20:57:00 -05:00 · 2026-05-30 20:53:58 -05:00
6 changed files with 399 additions and 266 deletions
--- a/docs/01-rainbow-triangle.md
+++ b/docs/01-rainbow-triangle.md
@@ -13,7 +13,7 @@ blending the three vertex colors proportionally to their distance. The result
 is a smooth rainbow gradient across a single primitive. We do not need a texture,
 a colormap, or a fragment shader with any branching — just three colored
 vertices and the default linear interpolation the [rasterizer](concepts/GLOSSARY.md#rasterizer)
-applies to every [varying](concepts/GLOSSARY.md#varying).
+applies to every [interpolated value](concepts/GLOSSARY.md#interpolation).

 If you haven't read the [concept overview](concepts/graphics-pipeline.md), do so
 now. [Coordinate systems](concepts/coordinate-systems.md) explains how the GPU
@@ -143,7 +143,7 @@ impl ApplicationHandler<()> for App {
        let Some(window) = self.window.as_ref() else { return };

        match event {
-            WindowEvent::Resized(size) => state.resize(window, size),
+            WindowEvent::Resized(size) => state.resize(size),
            WindowEvent::CloseRequested { .. } => event_loop_ctl.exit(),
            WindowEvent::RedrawRequested => {
                state.render();
@@ -159,23 +159,17 @@ impl ApplicationHandler<()> for App {
 }
 ```

-**Why `spawn_blocking`:** The display server event loop must run to completion
-and cannot be interrupted. If we ran `run_app()` on the tokio runtime thread,
-no other async tasks could execute. By spawning it on a blocking thread, the
-tokio runtime remains free for GPU queries, driver I/O, and future background
-tasks.
+> **WHY: `spawn_blocking` for winit**
+>
+> The display server event loop must run to completion and cannot be interrupted. If we ran `run_app()` on the tokio runtime thread, no other async tasks could execute. By spawning it on a blocking thread, the tokio runtime remains free for GPU queries, driver I/O, and future background tasks.

-**Why `Handle::block_on`:** wgpu's `request_adapter` and `request_device` query
-the driver over async D-Bus/Wayland/Vulkan entrypoints. These futures must be
-polled by a runtime executor. `block_on` attaches temporarily to the runtime
-thread via its handle, polls the future to completion (~50ms), then returns the
-result.
+> **WHY: `Handle::block_on` for async GPU init**
+>
+> wgpu's `request_adapter` and `request_device` query the driver over async D-Bus/Wayland/Vulkan entrypoints. These futures must be polled by a runtime executor. `block_on` attaches temporarily to the runtime thread via its handle, polls the future to completion (~50ms), then returns the result.

-**Why `ControlFlow::Poll`:** winit supports `ControlFlow::Poll` (continuous
-redraw) and `ControlFlow::Wait` (idle until next event). A graphics application
-needs a steady render loop. `Poll` tells winit to keep firing `RedrawRequested`
-events. We re-queue ourselves inside the handler via `window.request_redraw()`,
-matching the wgpu swapchain presentation rhythm.
+> **WHY: `ControlFlow::Poll` for the render loop**
+>
+> winit supports `ControlFlow::Poll` (start the next loop iteration immediately) and `ControlFlow::Wait` (sleep until the next event). A graphics application needs a steady render loop, so we use `Poll` to keep the event loop from going idle. `Poll` does not emit `RedrawRequested` by itself: we re-queue each frame inside the handler via `window.request_redraw()`, matching the wgpu swapchain presentation rhythm.

 **Why `request_redraw()`:** After presenting a frame to the display, we ask
 winit to schedule the next `RedrawRequested` frame. This creates an explicit
@@ -192,6 +186,18 @@ exits.

 New concept: **5-layer GPU connection.** Each layer adds a capability:

+```text
+Instance
+   │
+   ├──> Surface          (winit window → GPU surface)
+   │
+   ├──> Adapter          (select GPU: integrated vs discrete)
+   │
+   ├──> Device + Queue   (GPU connection + command submission)
+   │
+   └──> SurfaceConfiguration (swapchain: format, size, present mode)
+```
+
 1. **[Instance](concepts/GLOSSARY.md#instance)** — opens a connection to the
   graphics driver. On Vulkan this loads the Vulkan loader and registers
   instance-level extensions. On WebGL this picks the browser GPU context.
@@ -217,6 +223,7 @@ struct State {
    device: wgpu::Device,
    queue: wgpu::Queue,
    config: wgpu::SurfaceConfiguration,
+    window: Arc<Window>,
    pipeline: wgpu::RenderPipeline,
    vertex_buffer: wgpu::Buffer,
 }
@@ -235,10 +242,15 @@ struct State {
 - **`config`** — holds the surface's current width, height, pixel format, and
  [present mode](concepts/GLOSSARY.md#present-mode). When the window is resized,
  we reconfigure the surface with updated dimensions.
- **`pipeline`** — the compiled [render pipeline](concepts/GLOSSARY.md#render-pipeline).
+- **`window`** — shared reference to the winit window. Stored as an `Arc` so
+  the `resize()` method and the `CurrentSurfaceTexture::Outdated` recovery handler can
+  access the window's current dimensions. When the surface becomes outdated
+  (e.g., after a compositor restart or display hotplug), recovery requires
+  reconfiguring the swapchain with the window's live size — and that requires
+  holding a reference to the window itself.
+- **`pipeline`** — the compiled [render pipeline](concepts/GLOSSARY.md#pipeline-render).
  A render pipeline is an immutable configuration combining a shader, a vertex
-  buffer layout, a primitive topology, and a [color target](concepts/GLOSSARY.md#color-target)
-  setup. Switching pipelines mid-frame is expensive; most applications use a few
+  buffer layout, a primitive topology, and a color target setup. Switching pipelines mid-frame is expensive; most applications use a few
  pipelines and change them between draw calls.
 - **`vertex_buffer`** — GPU memory holding our vertex data. The GPU reads
  position and color data directly from this buffer during the vertex shader
@@ -389,6 +401,7 @@ impl State {
            device,
            queue,
            config,
+            window: Arc::clone(&window),
            pipeline,
            vertex_buffer,
        })
@@ -425,10 +438,20 @@ becomes a [command buffer](concepts/GLOSSARY.md#command-buffer) that is submitte
 to this queue. On Vulkan, the device corresponds to `VkDevice` and the queue
 to a `VkQueue`.

+> **Key insight — Validation layers catch GPU errors at runtime:** wgpu ships
+> with built-in validation layers that inspect your API calls for common
+> mistakes: incorrect buffer bindings, mismatched pipeline state, out-of-bounds
+> buffer slices, and resource lifecycle violations. These layers run
+> automatically during development and surface errors as log messages or
+> panics, saving hours of debugging silent GPU corruption. The tradeoff:
+> validation adds measurable overhead to every frame. In release builds,
+> disable validation by omitting `InstanceFlags::VALIDATION` when creating the
+> `Instance`, or set `WGPU_VALIDATION=0` if the instance flags are read from the environment.
+
 **Step 5 — SurfaceConfiguration:** This allocates the
 [swapchain](concepts/GLOSSARY.md#swapchain) [framebuffers](concepts/GLOSSARY.md#framebuffer).
 We negotiate the pixel format with the driver (preferring an
-[sRGB](concepts/GLOSSARY.md#srgb) format for correct color display), pick the
+[sRGB](concepts/GLOSSARY.md) format for correct color display), pick the
 window dimensions (clamped to at least 1x1 to allow minimize-and-restore on some
 platforms), and select the [present mode](concepts/GLOSSARY.md#present-mode).
 `PresentMode::Mailbox` is a triple-buffered present mode that provides
@@ -527,12 +550,9 @@ weights. At each vertex, the value is exact. Inside the triangle, it is the
 weighted blend of all three vertex values. The fragment shader receives a
 different `vertex_color` for every pixel, without any manual interpolation code.

-> **Key insight #2 — THE LOCATIONS MUST MATCH:** `shader_location: 0` in
-> Rust's `VertexAttribute` MUST equal `@location(0)` in WGSL's parameter
-> annotation. If they differ, the shader reads from the wrong memory offset
-> and produces garbage. This is not a type error or a runtime panic — it is
-> silent data corruption. The GPU reads whatever bytes live at the mismatched
-> offset and interprets them as floats.
+> **WHY: `@location` must match between Rust and WGSL**
+>
+> `shader_location: 0` in Rust's `VertexAttribute` MUST equal `@location(0)` in WGSL's parameter annotation. If they differ, the shader reads from the wrong memory offset and produces garbage. This is not a type error or a runtime panic — it is silent data corruption. The GPU reads whatever bytes live at the mismatched offset and interprets them as floats.

 **`@vertex fn vs_main(...)`** — `@vertex` declares this function as the vertex
 shader entry point. The function is invoked once per vertex in the draw call.
@@ -564,12 +584,26 @@ and delivers the interpolated value to every fragment.
 **`@fragment fn fs_main(input: VertexOutput)`** — `@fragment` declares the
 fragment shader entry point. `input` is the rasterizer's interpolated output
 from the vertex shader. Every `@location(n)` field in `VertexOutput` is now
-pre-blended. The `@builtin(position)` field is not interpolated — it is the
-original vertex position.
+pre-blended with barycentric weights.
+
+> **Key insight — TWO `@builtin(position)` builtins, zero connection:**
+> Vertex `@builtin(position)` and fragment `@builtin(position)` are two
+> completely separate builtins that happen to share the same name. The vertex
+> shader outputs clip-space coordinates into `@builtin(position)` for the
+> rasterizer to perform perspective division and viewport transform. The
+> fragment shader receives an entirely different `@builtin(position)` injected
+> by the fragment stage, providing framebuffer pixel coordinates: `x`/`y` are
+> the pixel center within the viewport, `z` is the depth value (typically
+> [0, 1]), and `w` is the interpolated reciprocal of the vertex clip-space
+> w-coordinate (1/w). The vertex shader's position output is NOT passed to the
+> fragment shader's position input. They are independent builtins from
+> different pipeline stages. If you need to pass data from vertex to fragment
+> with interpolation, use `@location(N)` on regular struct fields — which is
+> exactly what `vertex_color` does in our shader.

 **`-> @location(0) vec4<f32>`** — The fragment shader must output at least one
 color value at `@location(0)`. This number must match the corresponding color
-target in the [render pipeline](concepts/GLOSSARY.md#pipeline) descriptor. The
+target in the [render pipeline](concepts/GLOSSARY.md#pipeline-render) descriptor. The
 return type is `vec4<f32>` — RGBA with linear-space components.

 **`return vec4<f32>(input.vertex_color, 1.0);`** — Promotes the interpolated
@@ -609,7 +643,7 @@ let shader_module = device.create_shader_module(

 New concept: **GPU memory isolation.** The GPU cannot read Rust heap or stack
 memory directly. Vertex data must be laid out as a flat byte array and uploaded
-into a dedicated GPU [[buffer slice]](concepts/GLOSSARY.md#buffer-slice). The
+into a dedicated GPU [buffer slice](concepts/GLOSSARY.md#buffer-slice). The
 pipeline configuration then describes how to interpret those bytes: how many
 bytes per vertex, what format each attribute has, and where in the vertex
 strides the attribute begins.
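To make the byte-level description above concrete, here is a minimal sketch using only the standard library. The `Vertex` fields (`position` and `color`, both `[f32; 3]`) and the helpers `vertex_stride`, `attribute_offsets`, `to_bytes`, and `read_f32` are hypothetical names chosen for illustration; the tutorial's actual struct may differ, and real code would normally use `bytemuck::cast_slice` instead of the manual loop shown here.

```rust
use std::mem::size_of;

/// Hypothetical vertex type for illustration: position followed by color.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Number of bytes from the start of one vertex to the start of the next.
fn vertex_stride() -> usize {
    size_of::<Vertex>()
}

/// Byte offsets of (position, color) inside one vertex.
/// With `#[repr(C)]` and only `f32` fields there is no padding,
/// so `color` starts right after the three position floats.
fn attribute_offsets() -> (usize, usize) {
    (0, size_of::<[f32; 3]>())
}

/// Flatten vertices into the raw native-endian bytes a GPU buffer would hold.
fn to_bytes(vertices: &[Vertex]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(vertices.len() * vertex_stride());
    for v in vertices {
        for component in v.position.iter().chain(v.color.iter()) {
            bytes.extend_from_slice(&component.to_ne_bytes());
        }
    }
    bytes
}

/// Read one `f32` back out of the flat byte array at a given byte offset,
/// mimicking how the pipeline locates an attribute component.
fn read_f32(bytes: &[u8], offset: usize) -> f32 {
    f32::from_ne_bytes([
        bytes[offset],
        bytes[offset + 1],
        bytes[offset + 2],
        bytes[offset + 3],
    ])
}
```

Under these assumptions, the stride and the color offset are the same numbers a vertex buffer layout would carry: component `k` of an attribute in vertex `i` lives at `i * stride + attribute_offset + 4 * k`.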
@@ -631,18 +665,9 @@ struct Vertex {
 }
 ```

- **`#[repr(C)]`** — Forces the Rust compiler to lay out the struct fields in
-  declaration order with no padding reordering. Without this, Rust is free to
-  reorder fields for optimal alignment, which would break the byte layout the
-  shader expects.
- **`bytemuck::Pod`** — "Plain Old Data." Guarantees the struct has no padding
-  holes, no destructors, and a trivial memory representation. wgpu requires
-  all vertex types to be Pod so they can be safely transmuted to bytes.
- **`bytemuck::Zeroable`** — Guarantees that initializing the struct's memory
-  to all-zero bytes produces a valid instance. Required because `Pod` alone
-  does not guarantee zero is a valid discriminant for enums or optional types.
-  Combined with Pod, it enables `bytemuck::cast_slice` to convert between
-  `&[Vertex]` and `&[u8]` without a `unsafe` block.
+> **WHY: `#[repr(C)]` + bytemuck for GPU data layout**
+>
+> `#[repr(C)]` forces the Rust compiler to lay out the struct fields in declaration order using C layout rules. Without it, Rust is free to reorder fields, which would break the byte layout the shader expects. `bytemuck::Pod` ("Plain Old Data") asserts that the struct has no padding holes, no destructors, and that every bit pattern is a valid value; the derive macro checks these conditions at compile time. `bytemuck::cast_slice` requires `Pod` so that `&[Vertex]` can be viewed as `&[u8]` without an `unsafe` block. `bytemuck::Zeroable` asserts that all-zero bytes form a valid instance; it is a supertrait of `Pod`, so it must be derived alongside it.

 ### Vertex Data

@@ -662,8 +687,8 @@ const VERTICES: &[Vertex] = &[
 - **CCW winding order:** The vertices are listed counter-clockwise:
  red → blue → green. In a standard right-handed coordinate system, connecting
  vertices in this sequence traces the triangle counter-clockwise. This
-  determines which face is "front" and which is "back" — critical for
-  [culling](concepts/GLOSSARY.md#rasterizer) and correct normal computation.
+  determines which face is "front" and which is "back" — critical for
+  [culling](concepts/GLOSSARY.md) and correct normal computation.

 ### Buffer Upload

@@ -697,7 +722,7 @@ let vertex_buffer = device.create_buffer_init(
 ## S6: Compiling the Render Pipeline

 New concept: **the render pipeline is a compiled GPU configuration.** A
-[render pipeline](concepts/GLOSSARY.md#pipeline) bundles every decision the GPU
+[render pipeline](concepts/GLOSSARY.md#pipeline-render) bundles every decision the GPU
 needs to execute a draw: which shaders to run, how to interpret vertex buffer
 bytes, what [topology](concepts/GLOSSARY.md#topology) to use, whether to cull
 back faces, what blend mode to apply, and where to write the output. Pipeline
@@ -904,6 +929,19 @@ continues the next frame while the GPU works in the background.
 > `encoder.finish()` seals the script. `queue.submit()` dispatches it. The GPU
 > executes it later, in parallel. There is no `.await` on a draw call.

+### Render Loop Cycle
+
+```text
+[RedrawRequested event]
+         │
+         ▼
+get_current_texture() → [Success?] → Yes → record commands
+         │                                        │
+         └── No (Timeout/Occluded) → skip frame    │
+                                                    ▼
+device.poll() → encoder.begin_render_pass() → draw() → submit() → present()
+```
+
 ### The `render(&mut self)` Method Signature

 ```rust
@@ -921,7 +959,7 @@ wait for GPU completion.
 ### Acquiring a Back Buffer from the Swapchain

 ```rust
-let status = self.surface.get_current_texture();
+let frame = self.surface.get_current_texture();
 ```

 `get_current_texture()` is how you acquire a back buffer from the
@@ -930,104 +968,105 @@ into for this frame. In a triple-buffered swapchain (`PresentMode::Mailbox`),
 there are up to two spare back buffers waiting for you. `get_current_texture()`
 hands you the next available one.

-In wgpu 29+, this method returns a `CurrentSurfaceTexture` **enum** — not a
-`Result`. The swapchain can be in seven distinct states, and every state is a
-valid, non-error condition:
+In wgpu 29, this method returns `CurrentSurfaceTexture`, a standalone enum with
+7 variants describing the state of the swapchain's next back buffer:

-> **Key insight #5 — 7 swapchain states you must handle:** `Success(buf)` —
-> render normally. `Suboptimal(buf)` — render but reconfig is advisable.
-> `Timeout` — skip frame (GPU late). `Occluded` — skip frame (window behind
-> another). `Outdated` — `self.resize()` to reconfigure. `Lost` — skip frame
-> (display server restarted). `Validation` — skip frame (API misuse; check
-> logs).
+> **Key insight #5 — 7 surface texture variants you must handle:**
+> `CurrentSurfaceTexture::Success(frame)` — render normally.
+> `CurrentSurfaceTexture::Suboptimal(frame)` — render (buffer usable, but the
+> surface should be reconfigured soon). `CurrentSurfaceTexture::Timeout` — skip
+> frame (GPU late). `CurrentSurfaceTexture::Occluded` — skip frame (window
+> fully covered). `CurrentSurfaceTexture::Outdated` — surface changed,
+> reconfigure. `CurrentSurfaceTexture::Lost` — surface destroyed, cannot
+> recover without re-init.
+> `CurrentSurfaceTexture::Validation { source, description }` — API
+> validation caught an error, skip frame and log.

-WHY `match` on 7 variants: `get_current_texture()` does not return a `Result`.
-All 7 states are valid and the match must be exhaustive. The Rust compiler
-enforces this — you cannot miss a variant.
+WHY `match` on the enum: `get_current_texture()` returns a
+`CurrentSurfaceTexture` enum, not a `Result`. You match on the variant
+directly. `Success` and `Suboptimal` both carry a `SurfaceTexture` you can
+render into — the only difference is that `Suboptimal` signals the surface
+should be reconfigured when convenient. The Rust compiler enforces exhaustive
+matching across all 7 variants.

 ### The Complete `render` Implementation

 ```rust
 fn render(&mut self) {
-    let status = self.surface.get_current_texture();
-
-    match status {
-        wgpu::SurfaceStatus::Success(surface_texture)
-        | wgpu::SurfaceStatus::Suboptimal(surface_texture) => {
-            // Drive GPU work: shader compilation, memory allocation, fence signaling
-            if let Err(e) = self.device.poll(wgpu::PollType::Wait { submission_index: None, timeout: None }) {
-                log::error!("Device poll failed: {e}");
-                return;
-            }
-
-            let texture_view = surface_texture.texture.create_view(&Default::default());
-
-            let mut encoder = self.device.create_command_encoder(
-                &wgpu::CommandEncoderDescriptor {
-                    label: Some("Main Command Encoder"),
-                },
-            );
-
-            {
-                let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
-                    label: Some("Main Render Pass"),
-                    color_attachments: &[Some(wgpu::RenderPassColorAttachment {
-                        view: &texture_view,
-                        depth_slice: None,
-                        resolve_target: None,
-                        ops: wgpu::Operations {
-                            load: wgpu::LoadOp::Clear(wgpu::Color {
-                                r: 0.1,
-                                g: 0.1,
-                                b: 0.1,
-                                a: 1.0,
-                            }),
-                            store: wgpu::StoreOp::Store,
-                        },
-                    })],
-                    depth_stencil_attachment: None,
-                    timestamp_writes: None,
-                    occlusion_query_set: None,
-                    multiview_mask: None,
-                });
-
-                render_pass.set_pipeline(&self.pipeline);
-                render_pass.set_vertex_buffer(0, self.vertex_buffer.slice(..));
-                render_pass.draw(0..3, 0..1);
-            } // render_pass drops here — render pass ends automatically
-
-            self.queue.submit(std::iter::once(encoder.finish()));
-            surface_texture.present();
+    let frame = match self.surface.get_current_texture() {
+        wgpu::CurrentSurfaceTexture::Success(frame)
+        | wgpu::CurrentSurfaceTexture::Suboptimal(frame) => frame,
+        wgpu::CurrentSurfaceTexture::Timeout => {
+            log::warn!("Surface timeout — skipping frame");
+            return;
        }
-
-        wgpu::SurfaceStatus::Timeout => {
-            // GPU took too long to finish previous work. Skip this frame.
-            log::warn!("Surface status: Timeout — skipping frame");
+        wgpu::CurrentSurfaceTexture::Occluded => {
+            log::warn!("Surface occluded — skipping frame");
+            return;
        }
-
-        wgpu::SurfaceStatus::Occluded => {
-            // Window is fully occluded by another window. Skip rendering.
-            log::debug!("Surface status: Occluded — skipping frame");
+        wgpu::CurrentSurfaceTexture::Outdated => {
+            log::warn!("Surface outdated — resizing");
+            let size = self.window.inner_size();
+            self.resize(size);
+            return;
        }
-
-        wgpu::SurfaceStatus::Outdated => {
-            // Swapchain resolution no longer matches window. Reconfigure.
-            log::warn!("Surface status: Outdated — resizing");
-            if let Some(window) = &self.window {
-                self.resize(window.inner_size());
-            }
+        wgpu::CurrentSurfaceTexture::Lost => {
+            log::error!("Surface lost — GPU resources invalidated; full re-init required");
+            // Production recovery: signal App to drop `self.state`,
+            // then recreate on the next RedrawRequested or in a
+            // dedicated recovery callback. See callout below.
+            return;
        }
-
-        wgpu::SurfaceStatus::Lost => {
-            // Display server restarted or GPU lost. Fatal without re-init.
-            log::error!("Surface status: Lost — cannot recover without re-creating State");
+        wgpu::CurrentSurfaceTexture::Validation { source, description } => {
+            log::error!("Surface validation error: {:?} — {}", source, description);
+            return;
        }
+    };

-        wgpu::SurfaceStatus::Validation { source, description } => {
-            // wgpu validated your descriptor and found it invalid.
-            log::error!("Surface validation: {source} — {description}");
-        }
+    // Drive GPU work: shader compilation, memory allocation, fence signaling
+    if let Err(e) = self.device.poll(wgpu::PollType::Wait { submission_index: None, timeout: None }) {
+        log::error!("Device poll failed: {e}");
+        return;
    }
+
+    let texture_view = frame.texture.create_view(&Default::default());
+
+    let mut encoder = self.device.create_command_encoder(
+        &wgpu::CommandEncoderDescriptor {
+            label: Some("Main Command Encoder"),
+        },
+    );
+
+    {
+        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
+            label: Some("Main Render Pass"),
+            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
+                view: &texture_view,
+                depth_slice: None,
+                resolve_target: None,
+                ops: wgpu::Operations {
+                    load: wgpu::LoadOp::Clear(wgpu::Color {
+                        r: 0.1,
+                        g: 0.1,
+                        b: 0.1,
+                        a: 1.0,
+                    }),
+                    store: wgpu::StoreOp::Store,
+                },
+            })],
+            depth_stencil_attachment: None,
+            timestamp_writes: None,
+            occlusion_query_set: None,
+            multiview_mask: None,
+        });
+
+        render_pass.set_pipeline(&self.pipeline);
+        render_pass.set_vertex_buffer(0, self.vertex_buffer.slice(..));
+        render_pass.draw(0..3, 0..1);
+    } // render_pass drops here — render pass ends automatically
+
+    self.queue.submit(std::iter::once(encoder.finish()));
+    frame.present();
 }
 ```
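The control flow of the match above can be exercised without a GPU. The following is a minimal sketch, not the wgpu API: `AcquireOutcome` is a hypothetical stand-in that mirrors the seven variants described in this section, a plain `u32` stands in for the `SurfaceTexture` payload, and `FrameAction` and `decide` are illustrative names that do not appear in the tutorial.

```rust
/// Hypothetical stand-in for the seven outcomes of acquiring a back buffer.
/// This is NOT the wgpu type; the payloads are simplified for illustration.
#[derive(Debug)]
enum AcquireOutcome {
    Success(u32),    // u32 stands in for a surface texture
    Suboptimal(u32), // still renderable
    Timeout,
    Occluded,
    Outdated,
    Lost,
    Validation { description: String },
}

/// What the render function should do for this frame.
#[derive(Debug, PartialEq)]
enum FrameAction {
    Render(u32),
    SkipFrame,
    ReconfigureThenSkip,
    Reinitialize,
}

/// Map each outcome to an action. The match has no wildcard arm, so the
/// compiler rejects the code if a variant is ever left unhandled.
fn decide(outcome: AcquireOutcome) -> FrameAction {
    match outcome {
        AcquireOutcome::Success(frame) | AcquireOutcome::Suboptimal(frame) => {
            FrameAction::Render(frame)
        }
        AcquireOutcome::Timeout | AcquireOutcome::Occluded => FrameAction::SkipFrame,
        AcquireOutcome::Outdated => FrameAction::ReconfigureThenSkip,
        AcquireOutcome::Lost => FrameAction::Reinitialize,
        AcquireOutcome::Validation { description } => {
            eprintln!("surface validation error: {}", description);
            FrameAction::SkipFrame
        }
    }
}
```

Separating the decision from the side effects like this is one possible design choice; it keeps the per-variant policy testable on machines without a display.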

@@ -1038,6 +1077,17 @@ from the [swapchain](concepts/GLOSSARY.md#swapchain). The swapchain cycles throu
 2–3 pre-allocated back buffers. This call returns immediately if a buffer is
 available; it does not block on the GPU.

+> **Surface Lost recovery pattern:** `Lost` means the compositor destroyed the
+> surface (display server restart, GPU reset, hotplug, etc.). Every GPU
+> resource tied to that surface — the `Surface`, `Device`, `Queue`, pipeline,
+> buffers — is irrecoverably invalidated. You cannot reuse any of them. The
+> production pattern is to set `self.state = None` in `App`, then on the next
+> `RedrawRequested` (or in a dedicated recovery callback), re-run the full
+> `State::new()` initialization chain from S3. This recreates the adapter,
+> device, surface, and all child resources. Without this, continued renders
+> against a lost surface will either panic or silently produce corrupted
+> output.
+
 **`device.poll(wgpu::PollType::Wait { submission_index: None, timeout: None })`** — **Synchronous** call that drives
 in-flight GPU work to completion: shader compilation fences, memory allocation,
 and queue signaling. Without this, resources accumulate because the device does
@@ -1083,13 +1133,13 @@ color_attachments: &[Some(wgpu::RenderPassColorAttachment {
 **`RenderPassColorAttachment` has exactly 4 fields:**

 - **`view: &texture_view`** — the framebuffer we draw into. Must match the
-  color target format in the [render pipeline](concepts/GLOSSARY.md#render-pipeline).
+  color target format in the [render pipeline](concepts/GLOSSARY.md#pipeline-render).
 - **`depth_slice: None`** — only used for 3D texture slices. Not applicable
  to 2D rendering.
 - **`resolve_target: None`** — only used for MSAA resolve. When multisampling
  is active, the render pass writes to a multisampled buffer and resolves into
  this target. We have no MSAA, so `None`.
- **`ops`** — [operations](concepts/GLOSSARY.md#render-pass) controlling load
+- **`ops`** — [operations](concepts/GLOSSARY.md#operations) controlling load
  and store behavior. Two sub-fields:
  - **`load: LoadOp::Clear(color)`** — before drawing, fill the entire
    framebuffer with this color. **This IS your background color.** Dark gray.
@@ -1114,7 +1164,7 @@ that pass the depth test). Useful for visibility-based culling.
 ### Binding State and Drawing

 **`render_pass.set_pipeline(&self.pipeline)`** — Tells the GPU which
-[render pipeline](concepts/GLOSSARY.md#render-pipeline) to use for subsequent
+[render pipeline](concepts/GLOSSARY.md#pipeline-render) to use for subsequent
 draw calls. The pipeline encapsulates the shader programs, vertex format,
 primitive topology, and output configuration. Must be set before any draw call
 in a render pass. Switching pipelines mid-pass is expensive and should be
@@ -1165,22 +1215,26 @@ buffer from "render target" to "front buffer" on the next vsync.

 ### Why the Match Arms Differ

- **`Success` / `Suboptimal`** — both deliver a `SurfaceTexture` you can render
-  into. The difference: `Suboptimal` means the current swapchain configuration
-  is not ideal for the GPU (e.g., format mismatch). You render normally but
-  should consider reconfiguring the surface during idle time.
- **`Timeout`** — the GPU exceeded the wait threshold for a back buffer. Skip
-  the frame. The GPU will catch up.
- **`Occluded`** — another fully covers your window. Skip rendering entirely —
-  the display server will not show your output. Saves GPU work.
- **`Outdated`** — the swapchain was created for a resolution that no longer
-  matches the window. Reconfigure the surface to match.
- **`Lost`** — the GPU or display server has been reset. Without re-creating
-  the device and surface, you cannot recover. In a real application, you'd
-  trigger a full re-initialization.
- **`Validation`** — wgpu rejected the surface configuration due to API misuse.
-  Check logs for the description. This is a programming error, not a runtime
-  condition.
+- **`CurrentSurfaceTexture::Success(frame)` / `Suboptimal(frame)`** — the
+  swapchain delivered a `SurfaceTexture` you can render into. `Success` means
+  the buffer is ideal. `Suboptimal` means the buffer is still usable, but the
+  surface should be reconfigured when convenient. Both carry the
+  same `SurfaceTexture`. Extract `frame.texture` to create a view, render,
+  then call `frame.present()`.
+- **`CurrentSurfaceTexture::Timeout`** — the GPU exceeded the wait threshold
+  for a back buffer. Skip the frame. The GPU will catch up.
+- **`CurrentSurfaceTexture::Occluded`** — the window is fully covered by
+  another window. Skip the frame; there's no point rendering to an invisible
+  surface.
+- **`CurrentSurfaceTexture::Outdated`** — the swapchain was created for a
+  resolution that no longer matches the window. Reconfigure the surface
+  using `self.window.inner_size()` to match the current dimensions.
+- **`CurrentSurfaceTexture::Lost`** — the GPU or display server has been
+  reset. Without re-creating the device and surface, you cannot recover. In
+  a real application, you'd trigger a full re-initialization.
+- **`CurrentSurfaceTexture::Validation { source, description }`** — the wgpu
+  validation layer caught an API misuse. Log the diagnostic and skip the
+  frame.

 ## S8: Handling Window Resize

@@ -1249,8 +1303,9 @@ returns.
 ### When `resize` Is Called

 In our `App::window_event` handler (S2), the `WindowEvent::Resized(size)` arm
-calls `state.resize(window, size)`. The resize fires once for every dimension
-change. On fast window resizing, you may receive dozens of resize events in
+calls `state.resize(size)`. Since `State` owns an `Arc<Window>` (see S3),
+`resize()` has access to the window internally and needs only the new
+dimensions. The resize event fires once for every dimension change. On fast window resizing, you may receive dozens of resize events in
 succession. `surface.configure()` is fast enough to handle this — each call
 discards old buffers and allocates new ones. The GPU continues processing
 in-flight frames with the old buffer dimensions; there is no visual glitch
@@ -1347,4 +1402,19 @@ With the render loop and pipeline foundation in place, the next steps are:
 - **Compute shaders and GPU compute
  pipelines** — general-purpose GPU computation outside the graphics pipeline

+> **Prerequisite note — matrix math:** Every topic above ultimately depends on
+> matrix mathematics. Transforms (model, view, and projection matrices) move
+> geometry from local object space through world space, camera space, and
+> finally into clip space. In this tutorial, all vertex positions are hardcoded
+> NDC coordinates so we can focus on the rendering pipeline itself. Real
+> applications compute these coordinates via matrix multiplication: a
+> transformation matrix is uploaded to the GPU as a uniform, and the vertex
+> shader multiplies each vertex by that matrix before outputting
+> `clip_position`. If linear algebra is unfamiliar, study it before diving
+> into the next tutorials. Recommended resources: [Learn
+> OpenGL's Coordinate Systems chapter](https://learnopengl.com/Getting-started/Coordinate-Systems)
+> for a graphics-oriented treatment, and
+> [3Blue1Brown's Essence of Linear Algebra](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
+> for an intuitive visual foundation.
+
 Keep [concepts/GLOSSARY.md](concepts/GLOSSARY.md) handy as you move forward.
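As a small illustration of the matrix prerequisite mentioned above, the sketch below multiplies a position by a 4x4 transform on the CPU, which is the same arithmetic a vertex shader would perform with a uniform matrix. The type aliases `Vec4` and `Mat4` and the functions `translation`, `uniform_scale`, `mul_vec`, and `mul_mat` are hypothetical names for this example; a real project would more likely use a math crate. Matrices are assumed to be stored column-major (`m[col][row]`).

```rust
type Vec4 = [f32; 4];
/// Column-major 4x4 matrix: `m[col][row]`.
type Mat4 = [[f32; 4]; 4];

/// Matrix that moves a point by (tx, ty, tz).
fn translation(tx: f32, ty: f32, tz: f32) -> Mat4 {
    [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [tx, ty, tz, 1.0],
    ]
}

/// Matrix that scales x, y, and z by the same factor.
fn uniform_scale(s: f32) -> Mat4 {
    [
        [s, 0.0, 0.0, 0.0],
        [0.0, s, 0.0, 0.0],
        [0.0, 0.0, s, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

/// Matrix-vector product `m * v`.
fn mul_vec(m: &Mat4, v: Vec4) -> Vec4 {
    let mut out = [0.0; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

/// Matrix-matrix product `a * b`: applying the result is the same as
/// applying `b` first and then `a`.
fn mul_mat(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0; 4]; 4];
    for col in 0..4 {
        out[col] = mul_vec(a, b[col]);
    }
    out
}
```

For example, `mul_vec(&translation(0.25, -0.5, 0.0), [0.5, 0.5, 0.0, 1.0])` shifts the point to `x = 0.75, y = 0.0`, and `mul_mat(&translation(0.25, 0.0, 0.0), &uniform_scale(0.5))` builds a single matrix that scales first and translates second.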
--- a/docs/TROUBLESHOOTING.md
+++ b/docs/TROUBLESHOOTING.md
@@ -57,7 +57,7 @@ vulkaninfo  # verify installation

 **Symptom:** Program crashes with a surface lost error during rendering.

-**Cause:** Display server restarted or GPU context was reset. The [Surface](concepts/GLOSSARY.md#Surface) is permanently invalidated.
+**Cause:** Display server restarted or GPU context was reset. The [Surface](concepts/GLOSSARY.md#surface) is permanently invalidated.

 **Fix:** In the tutorial, this means the window needs to be reopened. In production code, handle the `Lost` variant of `CurrentSurfaceTexture` by recreating the surface via `Instance::create_surface()`.

@@ -108,7 +108,7 @@ fn exiting(&mut self, event_loop_ctl: &ActiveEventLoop) {

 **Symptom:** Triangle renders but is a single uniform color instead of smoothly blending.

-**Cause:** Fragment shader returns a constant color instead of passing through `input.vertex_color`. The [rasterizer](concepts/GLOSSARY.md#Rasterizer) interpolates vertex colors automatically, but only if the fragment shader uses them.
+**Cause:** Fragment shader returns a constant color instead of passing through `input.vertex_color`. The [rasterizer](concepts/GLOSSARY.md#rasterizer) interpolates vertex colors automatically, but only if the fragment shader uses them.

 **Fix:** Ensure fragment shader returns the interpolated vertex color:
 ```wgsl
@@ -123,6 +123,31 @@ Not this (which returns a solid color):
 return vec4<f32>(1.0, 1.0, 1.0, 1.0);  // wrong: solid white
 ```

+## 11. WGSL shader compilation errors
+
+**Symptom:** Program panics or logs errors during `device.create_shader_module()` or `create_render_pipeline()` with messages referencing WGSL validation failures.
+
+**Cause:** wgpu validates WGSL shaders when the shader module or render pipeline is created, not when a draw call executes, so errors surface as soon as `create_shader_module()` or `create_render_pipeline()` runs. Common causes include:
+- Syntax errors: typos in WGSL (missing semicolons, mismatched braces, incorrect keywords)
+- Type mismatches: passing `vec2<f32>` where `vec4<f32>` is expected, or mixing signed/unsigned types
+- Missing `@location` or `@builtin` attributes: vertex/fragment shader I/O without proper decoration
+- Entry point not found: the `@vertex` or `@fragment` function name doesn't match the pipeline descriptor's `entry_point` field
+
+**Fix:** Read wgpu's error messages carefully — they include the shader source line and column where the issue was detected. Follow this checklist:
+- Check the reported line/column in your WGSL source for syntax issues
+- Verify all type signatures match between vertex shader output and fragment shader input
+- Ensure every inter-stage value uses `@location(N)` and the position output uses `@builtin(position)`
+- Confirm the function name in your `@vertex`/`@fragment` declaration matches the string you pass as `entry_point` in the pipeline descriptor's `VertexState` and `FragmentState`
+- If the error message is unclear, validate the shader on its own to separate syntax errors from pipeline-binding issues
+
+## 12. GPU debugging with RenderDoc
+
+**Symptom:** Rendering issues that are difficult to diagnose (artifacts, wrong output, silent failures).
+
+**Cause:** GPU debugging is hard with standard tools. Graphics pipeline state, shader execution, and buffer contents are not easily inspectable at runtime.
+
+**Fix:** Use [RenderDoc](https://renderdoc.org/), a standalone GPU debugging tool supporting frame capture, pipeline state inspection, and shader debugging. It supports Vulkan, Direct3D 11/12, and OpenGL on Windows and Linux. Launch your wgpu application from within RenderDoc and capture frames to inspect the full graphics pipeline step by step.
+
 ## Additional Resources

 - [Glossary](concepts/GLOSSARY.md) — Every term defined
--- a/docs/concepts/GLOSSARY.md
+++ b/docs/concepts/GLOSSARY.md
@@ -14,73 +14,97 @@ Three weights (w0, w1, w2) that sum to 1, computed by the rasterizer for every f

 A view into GPU buffer memory defined by an offset and a length. `buffer.slice(..)` returns the full buffer. Buffer slices are used when mapping buffers for CPU read/write access or when copying data between buffers. They do not own the underlying memory — they are a window into an existing buffer.

+## Bind group
+
+A collection of GPU resources (buffers, textures, samplers) grouped together and bound to a [shader](#shader) at once. Bind groups correspond to shader `@group` declarations in [WGSL](#wgsl) and are created via `device.create_bind_group()`. They provide efficient resource switching without rebuilding the entire [pipeline](#pipeline).
+
 ## Clip space

-The [[homogeneous coordinates]](#homogeneous-coordinates) coordinate space that the [[vertex shader]](#vertex-shader) outputs into (`vec4<f32>`). The GPU clips geometry against the clip-space boundaries before performing perspective division (dividing x, y, z by w) to produce [[ndc]](#ndc). For perspective projection, clip space is a pyramid. For orthographic projection, it is a box. Geometry outside these boundaries is discarded by hardware.
+The four-component ([homogeneous](#homogeneous-coordinates)) coordinate space that the [vertex shader](#vertex-shader) outputs into (`vec4<f32>`). The GPU clips geometry against the clip-space boundaries before performing perspective division (dividing x, y, z by w) to produce [NDC](#ndc). For perspective projection, clip space is a pyramid. For orthographic projection, it is a box. Geometry outside these boundaries is discarded by hardware.

 ## Command buffer

-A recorded sequence of GPU commands — buffer copies, render passes, compute dispatches — analogous to a bash script listing operations to execute. You create a command buffer, encode operations into it via a `CommandEncoder`, then submit it to the [[queue]](#queue). The GPU executes the recorded sequence asynchronously. One submission is one unit of GPU work.
+A recorded sequence of GPU commands — buffer copies, render passes, compute dispatches — analogous to a bash script listing operations to execute. You create a command buffer, encode operations into it via a `CommandEncoder`, then submit it to the [queue](#queue). The GPU executes the recorded sequence asynchronously. One submission is one unit of GPU work.
+
+## Compute shader
+
+A programmable GPU shader that operates on data in buffers without producing geometry or fragments. Compute shaders run in parallel workgroups and are used for general-purpose GPU computation (physics simulations, image processing, etc.). They use `@compute` entry points instead of `@vertex` or `@fragment`.
+
+## Culling
+
+A fixed-function optimization that discards primitives that won't appear in the final image. Back-face culling removes triangles whose winding order indicates they face away from the camera, saving [fragment shader](#fragment-shader) invocations. Culling is configured via the `cull_mode` field (an `Option<Face>`) of the [pipeline](#pipeline)'s `PrimitiveState`.
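
As a rough illustration of how winding order decides facing, the sketch below computes the signed area of a triangle in 2D. The helper names are hypothetical, and the convention (counter-clockwise is front-facing, +Y points up) is an assumption of this example; the GPU performs an equivalent test in fixed-function hardware.

```rust
/// Twice the signed area of a 2D triangle (the "shoelace" formula).
/// Positive means counter-clockwise when +Y points up, negative means clockwise.
fn signed_area_2x(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
}

/// Hypothetical helper: would this triangle be discarded if counter-clockwise
/// triangles count as front-facing and back faces are culled?
fn is_back_face_ccw_front(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> bool {
    signed_area_2x(a, b, c) < 0.0
}
```

Listing the vertices as top, bottom-left, bottom-right gives a positive area (front-facing under this convention); swapping the last two vertices flips the sign, and the triangle would be culled.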

 ## Device

-The logical connection to the GPU. Created from an [[adapter]](#adapter), the device owns all GPU resources: buffers, textures, [[pipeline]](#pipeline) objects, shader modules, and bind groups. It is analogous to a file descriptor — the handle through which you allocate and manage GPU memory. All resource creation and destruction flows through the device.
+The logical connection to the GPU. Created from an [adapter](#adapter), the device owns all GPU resources: buffers, textures, [pipeline](#pipeline) objects, shader modules, and bind groups. It is analogous to a file descriptor — the handle through which you allocate and manage GPU memory. All resource creation and destruction flows through the device.

 ## Device poll

-`device.poll(PollType::Wait)` — a synchronous call that tells wgpu to drive all in-flight GPU work toward completion. This includes shader compilation, memory allocation on the GPU side, fence signaling, and surface frame acquisition. Without polling, wgpu's internal work queues stall. The [[polltype]](#polltype) `Wait` variant blocks the CPU thread until pending GPU tasks are done.
+`device.poll(...)` with a `PollType::Wait` strategy is a synchronous call that tells wgpu to drive all in-flight GPU work toward completion. This includes shader compilation, memory allocation on the GPU side, fence signaling, and surface frame acquisition. Without polling, wgpu's internal work queues stall. The [PollType](#polltype) `Wait` variant blocks the CPU thread until pending GPU tasks are done.
+
+## Depth buffer
+
+A per-pixel buffer that stores the depth (Z) value of each rendered fragment, enabling the GPU to determine which surfaces are in front of others. Without a depth buffer, draw order determines visibility, which breaks 3D scenes with complex overlap. Configured via the `depth_stencil_attachment` in the [render pass](#render-pass).
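
The depth test itself is simple enough to sketch on the CPU. The `Framebuffer` type below is a hypothetical stand-in, assuming a "less" comparison and a depth range of 0.0 (near) to 1.0 (far); on the GPU this runs in fixed-function hardware.

```rust
/// A tiny software framebuffer: one depth value and one color per pixel.
struct Framebuffer {
    depth: Vec<f32>,
    color: Vec<u32>,
}

impl Framebuffer {
    /// Clear every pixel's depth to 1.0 (the far plane) and its color to `clear_color`.
    fn new(pixel_count: usize, clear_color: u32) -> Self {
        Framebuffer {
            depth: vec![1.0; pixel_count],
            color: vec![clear_color; pixel_count],
        }
    }

    /// "Less" depth test: keep the fragment only if it is closer than what is stored.
    /// Returns true if the fragment was written.
    fn write_fragment(&mut self, pixel: usize, z: f32, color: u32) -> bool {
        if z < self.depth[pixel] {
            self.depth[pixel] = z;
            self.color[pixel] = color;
            true
        } else {
            false
        }
    }
}
```

With this test in place, a near fragment drawn first is not overwritten by a farther fragment drawn later, so visibility no longer depends on draw order.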

 ## Fragment

-A potential pixel produced by the [[rasterizer]](#rasterizer). One fragment is generated per screen pixel that a [[primitive]](#primitive) covers. A fragment carries interpolated [[vertex]](#vertex) shader outputs, a depth value, and a color. The fragment may be later discarded by depth testing, stencil testing, or alpha testing during the [[output merge]](#output-merge)(graphics-pipeline.md#stage-5-output-merge) stage. Not every fragment becomes a visible pixel.
+A potential pixel produced by the [rasterizer](#rasterizer). One fragment is generated per screen pixel that a [primitive](#primitive) covers. A fragment carries interpolated [vertex](#vertex) shader outputs, a depth value, and a color. The fragment may be later discarded by depth testing, stencil testing, or alpha testing during the [output merge](#output-merge) ([Stage 5](graphics-pipeline.md#stage-5-output-merge)) stage. Not every fragment becomes a visible pixel.

 ## Fragment shader

-GPU program running once per [[fragment]](#fragment). It receives pre-interpolated vertex shader outputs from the rasterizer and computes the final RGBA color for that fragment. This is where texture sampling, lighting calculations, and pixel-level effects happen. For the rainbow triangle, the fragment shader passes the interpolated vertex color through unchanged.
+GPU program running once per [fragment](#fragment). It receives pre-interpolated vertex shader outputs from the rasterizer and computes the final RGBA color for that fragment. This is where texture sampling, lighting calculations, and pixel-level effects happen. For the rainbow triangle, the fragment shader passes the interpolated vertex color through unchanged.

 ## Framebuffer

-The color buffer that appears on screen. During [[swapchain]](#swapchain) double-buffering, the framebuffer being drawn to is the back buffer. Once the render pass completes and you submit the buffer, it becomes the front buffer and is displayed. The framebuffer is a [[texture view]](#texture-view) tied to a surface frame.
+The color buffer that appears on screen. During [swapchain](#swapchain) double-buffering, the framebuffer being drawn to is the back buffer. Once the render pass completes and you submit the command buffer and present the frame, it becomes the front buffer and is displayed. The framebuffer is a [texture view](#texture-view) tied to a surface frame.

 ## Homogeneous coordinates

-A four-component representation (x, y, z, w) that enables perspective projection via the divide-by-w step. When w=1, the coordinates represent a point in 3D space. When w=0, they represent a direction vector. Perspective division (x/w, y/w, z/w) transforms clip-space coordinates into [[ndc]](#ndc). With w=1.0, division is the identity transform.
+A four-component representation (x, y, z, w) that enables perspective projection via the divide-by-w step. When w=1, the coordinates represent a point in 3D space. When w=0, they represent a direction vector. Perspective division (x/w, y/w, z/w) transforms clip-space coordinates into [ndc](#ndc). With w=1.0, division is the identity transform.
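
The divide-by-w step can be written out directly. This is a minimal sketch with a hypothetical function name; it assumes `w` is non-zero, since dividing by zero would produce infinities or NaN.

```rust
/// Perspective division: convert a clip-space position (x, y, z, w)
/// into normalized device coordinates (x/w, y/w, z/w).
/// Assumes w != 0.0.
fn perspective_divide(clip: [f32; 4]) -> [f32; 3] {
    let [x, y, z, w] = clip;
    [x / w, y / w, z / w]
}
```

With `w = 1.0` the position passes through unchanged, which is the case for the rainbow triangle. With `w = 2.0` every component is halved, which is how more distant geometry ends up closer to the center of the screen.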
+
+## Index buffer
+
+A GPU buffer containing vertex indices (integer offsets) that define the order in which vertices form [primitive](#primitive)s. Index buffers allow vertex reuse across multiple triangles, reducing memory usage and bandwidth. Used with `render_pass.draw_indexed()` instead of `draw()`.
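
To see the vertex reuse in action, the sketch below expands an indexed quad into the flat vertex list a non-indexed draw would need. `expand_indices` is a hypothetical helper and `u16` indices are an assumption of this example.

```rust
/// Expand indexed geometry into the flat vertex sequence that a
/// non-indexed draw call would require.
fn expand_indices<T: Copy>(vertices: &[T], indices: &[u16]) -> Vec<T> {
    indices.iter().map(|&i| vertices[i as usize]).collect()
}

/// Four unique corner positions of a quad (x, y).
const QUAD_VERTICES: [[f32; 2]; 4] = [
    [-0.5, -0.5], // 0: bottom-left
    [0.5, -0.5],  // 1: bottom-right
    [0.5, 0.5],   // 2: top-right
    [-0.5, 0.5],  // 3: top-left
];

/// Two triangles that share vertices 0 and 2.
const QUAD_INDICES: [u16; 6] = [0, 1, 2, 2, 3, 0];
```

The quad stores four vertices plus six small indices; without an index buffer it would need six full vertices, and the saving grows with mesh size because interior vertices are typically shared by several triangles.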

 ## Interpolation

-The rasterizer's automatic blending of vertex shader outputs across the surface of a triangle. For every `@location(n)` value output by the vertex shader, the [[rasterizer]](#rasterizer) computes a linear blend using [[barycentric coordinates]](#barycentric-coordinates): `value = w0*v0 + w1*v1 + w2*v2`. This is a free, hardware-accelerated feature. No shader code is required to perform interpolation.
+The rasterizer's automatic blending of vertex shader outputs across the surface of a triangle. For every `@location(n)` value output by the vertex shader, the [rasterizer](#rasterizer) computes a linear blend using [barycentric coordinates](#barycentric-coordinates): `value = w0*v0 + w1*v1 + w2*v2`. This is a free, hardware-accelerated feature. No shader code is required to perform interpolation.
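
The blend formula above can be reproduced on the CPU. This is a sketch with a hypothetical function name; it implements the plain linear blend as stated and leaves out any perspective correction real hardware may apply.

```rust
/// Blend three per-vertex values using barycentric weights:
/// value = w0*v0 + w1*v1 + w2*v2. The weights are assumed to sum to 1.
fn interpolate(weights: [f32; 3], v0: [f32; 3], v1: [f32; 3], v2: [f32; 3]) -> [f32; 3] {
    let [w0, w1, w2] = weights;
    let mut out = [0.0f32; 3];
    for i in 0..3 {
        out[i] = w0 * v0[i] + w1 * v1[i] + w2 * v2[i];
    }
    out
}
```

At a vertex the weights are `[1, 0, 0]`, so the result is exactly that vertex's color. Halfway along the edge between a red and a green vertex the weights are `[0.5, 0.5, 0]`, giving an even mix of the two.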

 ## Instance

-The root wgpu object representing the connection to the system's graphics drivers. Created via `Instance::new()`, the instance discovers available [[adapter]](#adapter)s and manages [[surface]](#surface) creation. It is the first object created in the wgpu initialization chain.
+The root wgpu object representing the connection to the system's graphics drivers. Created via `Instance::default()`, the instance discovers available [adapter](#adapter)s and manages [surface](#surface) creation. It is the first object created in the wgpu initialization chain. (Note: `Instance::default()` calls `Instance::new(Default::default())` internally; both forms produce an equivalent instance.)

 ## Loadop

-Controls what happens to the [[framebuffer]](#framebuffer) at the start of a render pass. `LoadOp::Clear(color)` fills the entire framebuffer with a solid color — this produces your scene background. `LoadOp::Load` keeps whatever pixels are already in the framebuffer — used for multi-pass rendering where the second pass draws on top of the first.
+Controls what happens to the [framebuffer](#framebuffer) at the start of a render pass. `LoadOp::Clear(color)` fills the entire framebuffer with a solid color — this produces your scene background. `LoadOp::Load` keeps whatever pixels are already in the framebuffer — used for multi-pass rendering where the second pass draws on top of the first.

 ## Output merge

-The final GPU pipeline stage. It applies per-fragment tests (depth, stencil, alpha) and blending operations before writing pixels to the [[framebuffer]](#framebuffer). The blend state (configured in the [[pipeline]](#pipeline)) determines whether new colors replace, add, or multiply with existing framebuffer colors. For the rainbow triangle, blending is REPLACE — new pixels overwrite old ones.
+The final GPU pipeline stage. It applies per-fragment tests (depth, stencil, alpha) and blending operations before writing pixels to the [framebuffer](#framebuffer). The blend state (configured in the [pipeline](#pipeline)) determines whether new colors replace, add, or multiply with existing framebuffer colors. For the rainbow triangle, blending is `BlendState::REPLACE`: new pixels overwrite old ones.

 ## Pipeline (render)

-A compiled GPU configuration bundling: [[vertex shader]](#vertex-shader) + [[fragment shader]](#fragment-shader) + [[topology]](#topology) + blend state + depth/stencil state + vertex buffer layout. Created once via `device.create_render_pipeline()` and reused for every frame. Changing any of these parameters requires creating a new pipeline. Pipeline creation is expensive; do not create one per frame.
+A compiled GPU configuration bundling: [vertex shader](#vertex-shader) + [fragment shader](#fragment-shader) + [topology](#topology) + blend state + depth/stencil state + vertex buffer layout. Created once via `device.create_render_pipeline()` and reused for every frame. Changing any of these parameters requires creating a new pipeline. Pipeline creation is expensive; do not create one per frame.

 ## Polltype

-The strategy passed to `device.poll()`. `PollType::Wait` blocks the calling thread until all pending GPU work finishes — equivalent to a fence wait. `PollType::Poll` checks for completed work once and returns immediately, regardless of whether work is done. For the rainbow triangle, `Wait` is correct: we need the GPU to finish the frame before requesting the next surface texture.
+The strategy passed to `device.poll()`, which processes all completed GPU commands and optionally waits for new work. `PollType::Poll` is non-blocking: it checks for completed work once and returns immediately. `PollType::Wait { submission_index, timeout }` can optionally block until a specific command submission completes (via `submission_index: Option<T>`) or until a duration elapses (via `timeout: Option<Duration>`); passing `None` for both blocks indefinitely. For the rainbow triangle, `Wait` with a short timeout is correct: we need the GPU to finish the frame before requesting the next surface texture.

 ## Present Mode

 How the display compositor handles frame buffer presentation: `PresentMode::Mailbox` uses triple buffering for tear-free rendering, `PresentMode::Fifo` provides VSYNC-locked double buffering, `PresentMode::Immediate` renders without synchronization (may show tearing).

-## Ndc
+## NDC

-Normalized Device Coordinates. The GPU's native intermediate coordinate space. X and Y range from -1.0 (left/bottom) to +1.0 (right/top). Z ranges from 0.0 (near clipping plane) to 1.0 (far clipping plane). Geometry is mapped into NDC by the GPU after perspective division. Anything outside this cube is clipped. See [[coordinate-systems.md]](coordinate-systems.md).
+Normalized Device Coordinates. The GPU's native intermediate coordinate space. X and Y range from -1.0 (left/bottom) to +1.0 (right/top). Z ranges from 0.0 (near clipping plane) to 1.0 (far clipping plane). Geometry is mapped into NDC by the GPU after perspective division. Anything outside this cube is clipped. See [coordinate-systems.md](coordinate-systems.md).
+
+## Normal vector
+
+A unit vector perpendicular to a surface at a given point, used for lighting calculations. Normals determine how light interacts with a surface (diffuse/specular reflection). In a [vertex buffer](#vertex-buffer), normals are stored per-[vertex](#vertex) and interpolated across triangles.

 ## Operations

-Paired `LoadOp` + `StoreOp` controlling [[framebuffer]](#framebuffer) behavior at [[render pass]](#render-pass) boundaries. `LoadOp` defines the pre-draw state (clear or load). `StoreOp` defines the post-draw state (store or discard). Together they form `Operations { load, store }` passed to `RenderPassColorAttachment`.
+Paired `LoadOp` + `StoreOp` controlling [framebuffer](#framebuffer) behavior at [render pass](#render-pass) boundaries. `LoadOp` defines the pre-draw state (clear or load). `StoreOp` defines the post-draw state (store or discard). Together they form `Operations { load, store }` passed to `RenderPassColorAttachment`.

 ## Primitive

@@ -88,19 +112,27 @@ A geometric shape the GPU can render: point list, line list, line strip, triangl

 ## Queue

-The submission channel to the GPU. You push [[command buffer]](#command-buffer)s into the queue via `queue.submit()`. The queue executes them asynchronously on the GPU. The queue also handles buffer uploads via `queue.write_buffer()` — these are synchronous copy operations that block until the data lands in GPU memory.
+The submission channel to the GPU. You push [command buffer](#command-buffer)s into the queue via `queue.submit()`. The queue executes them asynchronously on the GPU. The queue also handles buffer uploads via `queue.write_buffer()`: the call copies your data into staging memory and returns without waiting for the GPU, and the write is applied before the commands of the next `queue.submit()` execute.

 ## Rasterizer

-Hardware stage that converts [[primitive]](#primitive) geometry into [[fragment]](#fragment)s. For each triangle, determines which screen pixels it covers, generates one fragment per covered pixel, and computes interpolated vertex attributes using [[barycentric coordinates]](#barycentric-coordinates). The rasterizer is a fixed-function unit: no user code runs here. You configure its behavior (culling, fill mode, scissor test) via the pipeline descriptor.
+Hardware stage that converts [primitive](#primitive) geometry into [fragment](#fragment)s. For each triangle, determines which screen pixels it covers, generates one fragment per covered pixel, and computes interpolated vertex attributes using [barycentric coordinates](#barycentric-coordinates). The rasterizer is a fixed-function unit: no user code runs here. You configure its behavior (culling, fill mode, scissor test) via the pipeline descriptor.
+
+## Perspective projection
+
+A transformation matrix that maps camera-space coordinates into [clip space](#clip-space) with depth-based foreshortening, simulating how the human eye perceives distance. Objects farther from the camera appear smaller. The perspective projection matrix is typically combined with model and view matrices, which first move geometry from object space through world space into camera space.

 ## Render pass

-A scoped section of a [[command buffer]](#command-buffer) that groups draw operations sharing the same target [[framebuffer]](#framebuffer) attachments. Entered via `command_encoder.begin_render_pass()` and ended by dropping the `RenderPass` variable. Between begin and end, you set the pipeline, bind vertex buffers, and issue draw calls. Everything drawn in one render pass targets the same framebuffer with the same [[operations]](#operations).
+A scoped section of a [command buffer](#command-buffer) that groups draw operations sharing the same target [framebuffer](#framebuffer) attachments. Entered via `command_encoder.begin_render_pass()` and ended by dropping the `RenderPass` variable. Between begin and end, you set the pipeline, bind vertex buffers, and issue draw calls. Everything drawn in one render pass targets the same framebuffer with the same [operations](#operations).
+
+## Sampler
+
+A GPU resource that configures how a [texture](#texture) is read during shader execution, including filtering mode (nearest/linear), addressing mode (clamp/wrap/mirror), and comparison function. Samplers are bound via [bind groups](#bind-group) and referenced in WGSL with `textureSample()`.

 ## Shader

-GPU program written in [[wgsl]](#wgsl). No heap allocation, no recursion, no I/O. The only output channel is the return value. A shader module may contain multiple entry points (`@vertex`, `@fragment`, `@compute`). The GPU runs thousands of shader invocations in parallel, each operating on different data but executing the identical program.
+GPU program written in [WGSL](#wgsl). No heap allocation, no recursion, no I/O. The only output channel is the return value. A shader module may contain multiple entry points (`@vertex`, `@fragment`, `@compute`). The GPU runs thousands of shader invocations in parallel, each operating on different data but executing the identical program.

 ## Shader location

@@ -108,11 +140,11 @@ A numeric binding label (`@location(n)`) used to tie Rust vertex buffer attribut

 ## Storeop

-Controls what happens to the [[framebuffer]](#framebuffer) at the end of a render pass. `StoreOp::Store` keeps the written pixels — this is what you want for visible frames. `StoreOp::Discard` discards the framebuffer contents — used for offscreen renders where you do not need the result on screen, saving a memory barrier.
+Controls what happens to the [framebuffer](#framebuffer) at the end of a render pass. `StoreOp::Store` keeps the written pixels — this is what you want for visible frames. `StoreOp::Discard` discards the framebuffer contents — used for offscreen renders where you do not need the result on screen, saving a memory barrier.

 ## Surface

-wgpu's connection to a window's display buffer. Created via `instance.create_surface(window)`, the surface is like a bound socket — it is tied to a specific window and cannot be unlinked. The surface manages the [[swapchain]](#swapchain) and provides new framebuffers via `surface.get_current_texture()`. If the window is resized, the surface must be reconfigured with a new `SurfaceConfiguration`.
+wgpu's connection to a window's display buffer. Created via `instance.create_surface(window)`, the surface is like a bound socket — it is tied to a specific window and cannot be unlinked. The surface manages the [swapchain](#swapchain) and provides new framebuffers via `surface.get_current_texture()`. If the window is resized, the surface must be reconfigured with a new `SurfaceConfiguration`.

 ## Surface Configuration

@@ -120,39 +152,43 @@ The `SurfaceConfiguration` struct that allocates swapchain framebuffers: format,

 ## Swapchain

-A ring buffer of 2-3 [[framebuffer]](#framebuffer) textures managed by the GPU driver. The display hardware reads from the front buffer. The application renders to the back buffer. When the frame is complete, the buffers swap: the back buffer becomes the front (displayed), and the old front becomes the available back buffer for the next frame. This prevents screen tearing by ensuring the display never reads a frame mid-update.
+A ring buffer of 2-3 [framebuffer](#framebuffer) textures managed by the GPU driver. The display hardware reads from the front buffer. The application renders to the back buffer. When the frame is complete, the buffers swap: the back buffer becomes the front (displayed), and the old front becomes the available back buffer for the next frame. This prevents screen tearing by ensuring the display never reads a frame mid-update.

 ## Texture view

-A handle referencing a region of [[texture]](#texture) memory for use inside a [[render pass]](#render-pass) or bind group. Created via `texture.create_view()`, texture views define the mip level range, aspect, and dimensionality (2D, cube, array) of the binding. Surface framebuffers are accessed as texture views inside render passes.
+A handle referencing a region of [texture](#texture) memory for use inside a [render pass](#render-pass) or bind group. Created via `texture.create_view()`, texture views define the mip level range, aspect, and dimensionality (2D, cube, array) of the binding. Surface framebuffers are accessed as texture views inside render passes.

 ## Texture

-GPU memory region storing color data. Used for both render targets (framebuffers) and samplers (loaded images). In wgpu, a texture is created from the [[device]](#device) with a defined size, format, and usage flags. You never read texture memory directly from the CPU — you access it through [[texture view]](#texture-view) bindings in shaders.
+GPU memory region storing color data. Used for both render targets (framebuffers) and samplers (loaded images). In wgpu, a texture is created from the [device](#device) with a defined size, format, and usage flags. You never read texture memory directly from the CPU — you access it through [texture view](#texture-view) bindings in shaders.

 ## Topology

-The rule for grouping vertices into [[primitive]](#primitive) shapes. `TriangleList` means every 3 consecutive vertices form one independent triangle. `TriangleStrip` means each new vertex combined with the previous two forms a triangle. `PointList` renders individual points. `LineList` renders pairs of connected vertices. Topology is set once on the [[pipeline]](#pipeline) descriptor.
+The rule for grouping vertices into [primitive](#primitive) shapes. `TriangleList` means every 3 consecutive vertices form one independent triangle. `TriangleStrip` means each new vertex combined with the previous two forms a triangle. `PointList` renders individual points. `LineList` renders pairs of connected vertices. Topology is set once on the [pipeline](#pipeline) descriptor.
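
The grouping rules for the two triangle topologies can be sketched as index arithmetic. The function names are hypothetical, and this sketch ignores the alternating winding order that real hardware applies to strips.

```rust
/// TriangleList: every 3 consecutive vertices form one independent triangle.
/// Leftover vertices that do not complete a triangle are ignored.
fn triangle_list(vertex_count: u32) -> Vec<[u32; 3]> {
    (0..vertex_count / 3)
        .map(|t| [3 * t, 3 * t + 1, 3 * t + 2])
        .collect()
}

/// TriangleStrip: each new vertex forms a triangle with the previous two.
/// Fewer than 3 vertices produce no triangles.
fn triangle_strip(vertex_count: u32) -> Vec<[u32; 3]> {
    (0..vertex_count.saturating_sub(2))
        .map(|i| [i, i + 1, i + 2])
        .collect()
}
```

Six vertices yield two triangles as a list, while five vertices already yield three triangles as a strip, which is why strips can be more compact for connected surfaces.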
+
+## UV coordinate
+
+A 2D texture coordinate (u, v) in the range [0, 1] that maps a [vertex](#vertex) to a location on a [texture](#texture). UVs are passed from the [vertex shader](#vertex-shader) to the [fragment shader](#fragment-shader) via `@location` attributes and used to sample texture colors. Also called "texture coordinates."
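
As a rough picture of what sampling with a UV does, the sketch below maps a UV pair to an integer texel position. `uv_to_texel` is a hypothetical helper that assumes nearest-neighbor lookup with clamping and a texture of at least 1×1 texels; real samplers offer other filtering and addressing modes.

```rust
/// Map a UV coordinate in [0, 1] to the integer texel it falls in,
/// clamping out-of-range inputs to the texture edge.
/// Assumes width >= 1 and height >= 1.
fn uv_to_texel(u: f32, v: f32, width: u32, height: u32) -> (u32, u32) {
    let x = ((u.clamp(0.0, 1.0) * width as f32) as u32).min(width - 1);
    let y = ((v.clamp(0.0, 1.0) * height as f32) as u32).min(height - 1);
    (x, y)
}
```

On a 4×4 texture, `(0.0, 0.0)` lands on texel `(0, 0)` and `(1.0, 1.0)` lands on the last texel `(3, 3)`, so the same UVs work regardless of the texture's resolution.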

 ## Vertex

-A data point containing one or more attributes: position, color, UV coordinates, normals, tangents. All attributes for one vertex are stored contiguously in a [[vertex buffer]](#vertex-buffer). The stride (total bytes per vertex) is determined by the sum of all attribute sizes. In the rainbow triangle, each vertex has three `f32` position components and three `f32` color components: 24 bytes per vertex.
+A data point containing one or more attributes: position, color, UV coordinates, normals, tangents. All attributes for one vertex are stored contiguously in a [vertex buffer](#vertex-buffer). The stride (total bytes per vertex) is determined by the sum of all attribute sizes. In the rainbow triangle, each vertex has three `f32` position components and three `f32` color components: 24 bytes per vertex.

 ## Vertex buffer

-GPU [[buffer slice]](#buffer-slice) containing [[vertex]](#vertex) attribute data in a tightly packed layout. Created via `device.create_buffer()` and populated via `queue.write_buffer()`. The pipeline's vertex state describes how to interpret the buffer: stride, attribute count, and per-attribute format + [[shader location]](#shader-location) mapping.
+GPU [buffer slice](#buffer-slice) containing [vertex](#vertex) attribute data in a tightly packed layout. Created via `device.create_buffer()` and populated via `queue.write_buffer()`. The pipeline's vertex state describes how to interpret the buffer: stride, attribute count, and per-attribute format + [shader location](#shader-location) mapping.

 ## Vertex shader

-GPU program running once per [[vertex]](#vertex). It reads vertex attributes from the [[vertex buffer]](#vertex-buffer), transforms the position into [[clip space]](#clip-space), and outputs any per-vertex data the downstream pipeline stages need. The mandatory output is `@builtin(position) vec4<f32>`. Optional outputs use `@location(n)` annotations and flow into the rasterizer for interpolation.
+GPU program running once per [vertex](#vertex). It reads vertex attributes from the [vertex buffer](#vertex-buffer), transforms the position into [clip space](#clip-space), and outputs any per-vertex data the downstream pipeline stages need. The mandatory output is `@builtin(position) vec4<f32>`. Optional outputs use `@location(n)` annotations and flow into the rasterizer for interpolation.

 ## Viewport transform

-Automatic GPU step mapping [[ndc]](#ndc) coordinates (-1..+1) to [[window]](#window) pixel coordinates. Configured via `SurfaceConfiguration` `width` and `height` fields. The GPU performs: `screen_x = (ndc_x + 1) / 2 * width; screen_y = (ndc_y + 1) / 2 * height`. This step happens after perspective division, between NDC and the rasterizer. You never write this math in shader code.
+Automatic GPU step mapping [ndc](#ndc) coordinates (-1..+1) to [window](#window) pixel coordinates. Configured via `SurfaceConfiguration` `width` and `height` fields. The GPU performs: `screen_x = (ndc_x + 1) / 2 * width; screen_y = (ndc_y + 1) / 2 * height`. This step happens after perspective division, between NDC and the rasterizer. You never write this math in shader code.

 ## Window

-The operating system window created by the windowing library. In wgpu, the window is passed to `instance.create_surface()` to bind the GPU to a display target. The window dimensions dictate the [[viewport transform]](#viewport-transform) and thus the size of the rendered image. Resizing the window requires creating a new `SurfaceConfiguration` with updated dimensions.
+The operating system window created by the windowing library. In wgpu, the window is passed to `instance.create_surface()` to bind the GPU to a display target. The window dimensions dictate the [viewport transform](#viewport-transform) and thus the size of the rendered image. Resizing the window requires creating a new `SurfaceConfiguration` with updated dimensions.

 ## WGSL

--- a/docs/concepts/coordinate-systems.md
+++ b/docs/concepts/coordinate-systems.md
@@ -4,7 +4,7 @@

 Your window is a grid of pixels: 800×600 in our configuration. The 3D scene you want to render spans from -∞ to +∞ in every direction. The GPU cannot reason in window pixels because every window has a different size. It cannot reason in world space because that is application-defined. The GPU needs a standard intermediate coordinate space.

-That space is [[ndc]](GLOSSARY.md#ndc), Normalized Device Coordinates.
+That space is [NDC](GLOSSARY.md#ndc), Normalized Device Coordinates.

 ## NDC Definition

@@ -48,7 +48,7 @@ In a real application, vertices live in arbitrary world units and you apply a se

 ## Homogeneous Coordinates

-The GPU vertex shader outputs a `vec4<f32>`, not a `vec3<f32>`. The fourth component `w` is the [[homogeneous coordinates]](GLOSSARY.md#homogeneous-coordinates) value that enables the clip space → NDC conversion.
+The GPU vertex shader outputs a `vec4<f32>`, not a `vec3<f32>`. The fourth component `w` is the [homogeneous coordinates](GLOSSARY.md#homogeneous-coordinates) value that enables the clip space → NDC conversion.

 When the vertex shader outputs `vec4<f32>(x, y, z, w)`, the GPU performs a step called **perspective division**: it divides every component by `w`. The result is `(x/w, y/w, z/w)` — this is what lands in NDC.
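
The division described above can be sketched on the CPU in a few lines of Rust. This is only an illustration of the arithmetic the hardware performs; the function name `perspective_divide` and the choice to return `None` for `w = 0` are assumptions made for this sketch, not part of wgpu.

```rust
/// Divide a clip-space position by its `w` component to obtain NDC.
/// Hypothetical helper: returns `None` when `w` is zero, where the
/// division is undefined.
fn perspective_divide(clip: [f32; 4]) -> Option<[f32; 3]> {
    let [x, y, z, w] = clip;
    if w == 0.0 {
        return None;
    }
    Some([x / w, y / w, z / w])
}
```

With `w = 1.0` the result equals the input `x, y, z`, which is why the triangle's positions pass through this step unchanged.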

@@ -66,7 +66,7 @@ Our triangle uses `w = 1.0` because we have no camera and no perspective — jus

 ## Clip Space

-Before NDC, there is [[clip space]](GLOSSARY.md#clip-space). This is the coordinate space the vertex shader outputs into. Clip space is a pyramid (for perspective projection) or a box (for orthographic projection) that the GPU clips against. Geometry outside the clip-space boundaries is discarded by hardware before perspective division. Our triangle is entirely inside the clip space pyramid, so nothing is clipped.
+Before NDC, there is [clip space](GLOSSARY.md#clip-space). This is the coordinate space the vertex shader outputs into. Clip space is a pyramid (for perspective projection) or a box (for orthographic projection) that the GPU clips against. Geometry outside the clip-space boundaries is discarded by hardware before perspective division. Our triangle is entirely inside the clip space pyramid, so nothing is clipped.

 ## Viewport Transform (Automatic)

@@ -77,17 +77,47 @@ screen_x = (ndc_x + 1.0) / 2.0 * window_width
 screen_y = (1.0 - ndc_y) / 2.0 * window_height
 ```

-This step is automatic. You never write it in code. It is configured by the [[viewport transform]](GLOSSARY.md#viewport-transform) fields in your `SurfaceConfiguration`, specifically the `width` and `height` values. When the surface configuration says 800×600, the GPU maps NDC `[-1, +1]` onto `[0, 800]` and `[0, 600]`.
+This step is automatic. You never write it in code. The [viewport transform](GLOSSARY.md#viewport-transform) is configured by the `width` and `height` fields of your `SurfaceConfiguration`. When the surface configuration says 800×600, the GPU maps NDC `[-1, +1]` onto `[0, 800]` horizontally and `[0, 600]` vertically.

 You do write code to update the viewport transform — but only when the window size changes. At that point, you create a new `SurfaceConfiguration` with the new dimensions and configure the surface. The GPU then uses the updated mapping on subsequent frames.
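
To make the mapping concrete, here is a minimal CPU-side sketch in Rust. It assumes the viewport covers the whole surface with its origin at the top-left pixel, and it flips Y because NDC Y points up while pixel rows count downward in WebGPU. The function name `ndc_to_pixels` is hypothetical; the GPU does this in fixed-function hardware.

```rust
/// Map NDC x/y in [-1, +1] to pixel coordinates for a surface of the
/// given size. Assumes a full-surface viewport with a top-left origin.
fn ndc_to_pixels(ndc_x: f32, ndc_y: f32, width: f32, height: f32) -> (f32, f32) {
    let screen_x = (ndc_x + 1.0) / 2.0 * width;
    // NDC +Y is up, pixel rows grow downward, so Y is flipped here.
    let screen_y = (1.0 - ndc_y) / 2.0 * height;
    (screen_x, screen_y)
}
```

For an 800×600 surface, the NDC origin lands at the center of the window, and the NDC corner `(-1, +1)` lands at the top-left pixel.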

+## Depth and the Z Coordinate
+
+All three vertices of our triangle sit at Z=0 — exactly on the near plane. This is a simplification that works fine for a flat 2D triangle, but it means we carry no depth information. In a 3D scene with overlapping geometry, you need varying Z values so the GPU can decide which surfaces are in front of others.
+
+The mechanism that resolves this is the [depth buffer](GLOSSARY.md#depth-buffer). When enabled, the GPU allocates a per-pixel buffer storing the Z value of the closest surface rendered to that pixel. Each new fragment is compared against the stored depth: if the fragment is closer, it overwrites the pixel and updates the depth value; if it is farther away, it is silently discarded. This is how 3D scenes achieve correct occlusion.
+
+Our current pipeline does not use a depth buffer. For flat 2D rendering, draw order alone determines which geometry appears on top. Depth buffering will be covered in a future tutorial when we render 3D geometry.
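
The compare-and-overwrite rule described above can be sketched in plain Rust. This is a CPU model for illustration only: the `DepthBuffer` type and its methods are hypothetical, and the sketch assumes a "less than" comparison with the buffer cleared to `1.0` (the far plane), which is a common configuration but not the only one.

```rust
/// Hypothetical CPU model of a depth buffer: one f32 per pixel.
struct DepthBuffer {
    width: usize,
    depths: Vec<f32>,
}

impl DepthBuffer {
    /// Start every pixel at 1.0, the farthest depth in the [0, 1] range.
    fn new(width: usize, height: usize) -> Self {
        DepthBuffer { width, depths: vec![1.0; width * height] }
    }

    /// Returns true and stores `z` if the fragment is closer than what
    /// is already recorded; returns false (fragment discarded) otherwise.
    fn test_and_write(&mut self, x: usize, y: usize, z: f32) -> bool {
        let idx = y * self.width + x;
        if z < self.depths[idx] {
            self.depths[idx] = z;
            true
        } else {
            false
        }
    }
}
```

A fragment at depth `0.5` passes against the cleared value, a later fragment at `0.8` on the same pixel is discarded, and one at `0.2` replaces it.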
+
 ## Summary: The Coordinate Journey

 For our triangle, every vertex follows this path:

 1. **Vertex data:** Stored as `vec3<f32>` in the vertex buffer. Values are already in NDC.
 2. **Vertex shader:** Wraps in `vec4<f32>` by appending `w = 1.0`. This is clip space (which, for identity `w`, equals NDC).
-3. **Perspective division:** GPU divides by `w = 1.0` → identity. Vertex is now in [[ndc]](GLOSSARY.md#ndc).
+3. **Perspective division:** GPU divides by `w = 1.0` → identity. Vertex is now in [ndc](GLOSSARY.md#ndc).
 4. **Viewport transform (automatic):** GPU scales NDC to window pixel coordinates. The triangle appears on screen.

 In a real 3D application, this journey includes model, view, and projection matrices before clip space. For the rainbow triangle, the journey is three steps through identity transforms. The hardware pipeline stages are the same regardless.
+
+```text
+Vertex Buffer (NDC positions)
+       │
+       ▼
+Vertex Shader → Clip Space (vec4<f32>, homogeneous)
+       │
+       ▼
+[Fixed: Perspective Division: (x,y,z) ÷ w]
+       │
+       ▼
+NDC Cube [-1,+1]×[-1,+1]×[0,1]
+       │
+       ▼
+[Fixed: Viewport Transform: scale + translate]
+       │
+       ▼
+Viewport/Framebuffer (pixel coordinates)
+       │
+       ▼
+Rasterization → Fragments → Fragment Shader → Output Merge → Screen
+```
--- a/docs/concepts/graphics-pipeline.md
+++ b/docs/concepts/graphics-pipeline.md
@@ -27,11 +27,11 @@ Each stage is a pipeline filter. Data flows through; nothing flows backward. Thi

 ### Stage 1: Vertex Shader

-[[vertex shader]](GLOSSARY.md#vertex-shader) — a GPU program running once per input [[vertex]](GLOSSARY.md#vertex).
+[vertex shader](GLOSSARY.md#vertex-shader) — a GPU program running once per input [vertex](GLOSSARY.md#vertex).

-Input: vertex attributes read from the [[vertex buffer]](GLOSSARY.md#vertex-buffer). In our case: position and color.
+Input: vertex attributes read from the [vertex buffer](GLOSSARY.md#vertex-buffer). In our case: position and color.

-Output: mandatory clip-space position (`vec4<f32>`) plus any per-vertex data the [[fragment shader]](GLOSSARY.md#fragment-shader) needs downstream: color, UV coordinates, normals, etc.
+Output: mandatory clip-space position (`vec4<f32>`) plus any per-vertex data the [fragment shader](GLOSSARY.md#fragment-shader) needs downstream: color, UV coordinates, normals, etc.

 The vertex shader is the only place you transform geometry. In complex scenes this means multiplying by model-view-projection matrices. For our triangle, the vertices are already in the GPU's native coordinate space, so the vertex shader passes the position through unchanged.

@@ -39,21 +39,31 @@ The vertex shader is the only place you transform geometry. In complex scenes th

 Hardware only. No user code runs here.

-The GPU takes vertices in the order you submitted them and groups them into [[primitive]](GLOSSARY.md#primitive) shapes. With [[topology]](GLOSSARY.md#topology) set to `TriangleList`, every group of 3 consecutive vertices becomes one triangle. Vertex 0, 1, 2 → triangle A. Vertex 3, 4, 5 → triangle B.
+The GPU takes vertices in the order you submitted them and groups them into [primitive](GLOSSARY.md#primitive) shapes. With [topology](GLOSSARY.md#topology) set to `TriangleList`, every group of 3 consecutive vertices becomes one triangle. Vertex 0, 1, 2 → triangle A. Vertex 3, 4, 5 → triangle B.
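
The grouping rule for `TriangleList` can be mimicked on the CPU as a rough sketch. The function name `assemble_triangle_list` is hypothetical, and dropping leftover vertices that do not fill a complete triangle is an assumption of this sketch about how an incomplete trailing group is treated.

```rust
/// Group consecutive vertices into triangles, three at a time.
/// Any trailing vertices that do not form a full triangle are ignored.
fn assemble_triangle_list<T: Copy>(vertices: &[T]) -> Vec<[T; 3]> {
    vertices
        .chunks_exact(3)
        .map(|c| [c[0], c[1], c[2]])
        .collect()
}
```

Passing vertex indices `0..6` yields two triangles, `[0, 1, 2]` and `[3, 4, 5]`, matching triangle A and triangle B in the text.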

 ### Stage 3: Rasterizer

-[[rasterizer]](GLOSSARY.md#rasterizer) — hardware stage that converts triangles into fragments.
+> **Note:** The 5-stage model above is a simplification for conceptual clarity. The actual WebGPU and Vulkan pipelines define 11+ stages, including fixed-function vertex post-processing stages between the programmable stages. This section covers the essential stages relevant to writing shaders and configuring pipelines.

-For each submitted triangle, the rasterizer determines which screen pixels the triangle covers. For each covered pixel, it generates one [[fragment]](GLOSSARY.md#fragment) — a "potential pixel" carrying interpolated data.
+Before rasterization proper, the GPU performs several fixed-function vertex post-processing steps on the clip-space positions output by the vertex shader:

-The critical function here is [[interpolation]](GLOSSARY.md#interpolation). The rasterizer computes [[barycentric coordinates]](GLOSSARY.md#barycentric-coordinates) — three weights (w0, w1, w2) that sum to 1 — describing where inside the triangle the pixel falls. Then for every value the vertex shader output, the rasterizer computes: `value = w0 * value0 + w1 * value1 + w2 * value2`.
+- **Clipping:** Primitives that fall entirely outside the clip volume are discarded. Primitives that partially intersect it are clipped against its boundaries. This happens in clip space, before the divide.
+- **Perspective Division:** The clip-space `vec4` position is divided by its `w` component, converting it to normalized device coordinates (NDC): `[-1, 1]` in X and Y, `[0, 1]` in Z.
+- **Viewport Transform:** NDC coordinates are mapped to window pixel coordinates based on the configured viewport dimensions.

-This is the step that makes colors blend across the triangle. It is free, automatic, hardware-accelerated [[interpolation]](GLOSSARY.md#interpolation). You do not write the code. The GPU computes it because it is how the rendering pipeline architecture works.
+These stages are automatic and happen in hardware. You do not write code for them.
+
+Then the [rasterizer](GLOSSARY.md#rasterizer) takes over — the hardware stage that converts triangles into fragments.
+
+For each submitted triangle, the rasterizer determines which screen pixels the triangle covers. For each covered pixel, it generates one [fragment](GLOSSARY.md#fragment) — a "potential pixel" carrying interpolated data.
+
+The critical function here is [interpolation](GLOSSARY.md#interpolation). The rasterizer computes [barycentric coordinates](GLOSSARY.md#barycentric-coordinates) — three weights (w0, w1, w2) that sum to 1 — describing where inside the triangle the pixel falls. Then for every value the vertex shader output, the rasterizer computes: `value = w0 * value0 + w1 * value1 + w2 * value2`.
+
+This is the step that makes colors blend across the triangle. It is free, automatic, hardware-accelerated [interpolation](GLOSSARY.md#interpolation). You do not write the code. The GPU computes it because it is how the rendering pipeline architecture works.
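
The weighted sum above is easy to reproduce on the CPU. The following Rust sketch applies it to RGB colors; the function name `interpolate` is hypothetical, and the sketch ignores perspective correction, which makes no difference for our triangle because every vertex has `w = 1.0`.

```rust
/// Blend three per-vertex RGB values using barycentric weights.
/// `weights` are expected to sum to 1.0.
fn interpolate(weights: [f32; 3], values: [[f32; 3]; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for (w, v) in weights.iter().zip(values.iter()) {
        for i in 0..3 {
            // Accumulate this vertex's contribution to channel i.
            out[i] += w * v[i];
        }
    }
    out
}
```

With red, green, and blue vertices and weights `(0.5, 0.25, 0.25)`, the blended color is `(0.5, 0.25, 0.25)`: the fragment is mostly red because it sits closest to the red vertex.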

 ### Stage 4: Fragment Shader

-[[fragment shader]](GLOSSARY.md#fragment-shader) — a GPU program running once per [[fragment]](GLOSSARY.md#fragment).
+[fragment shader](GLOSSARY.md#fragment-shader) — a GPU program running once per [fragment](GLOSSARY.md#fragment).

 Input: the pre-interpolated values from the vertex shader, delivered by the rasterizer. The fragment shader receives one invocation per covered screen pixel. If a triangle covers 2000 pixels, the fragment shader runs 2000 times.

@@ -61,7 +71,7 @@ Output: the final RGBA color for that pixel. The fragment shader computes lighti

 ### Stage 5: Output Merge

-The final hardware stage before the color hits the [[framebuffer]](GLOSSARY.md#framebuffer).
+The final hardware stage before the color hits the [framebuffer](GLOSSARY.md#framebuffer).

 Per-fragment operations:

@@ -69,9 +79,7 @@ Per-fragment operations:
 - **Stencil test:** Mask drawing to specific screen regions via a stencil buffer. We disable this.
 - **Blend:** Combine the new fragment color with the existing framebuffer color. We use REPLACE — the fragment color overwrites whatever was there.

-Before the output merge, the GPU performs the [[viewport transform]](GLOSSARY.md#viewport-transform): mapping NDC coordinates to window pixel dimensions. This step is automatic and configured by your surface dimensions.
-
-After the output merge, the final color is written to the framebuffer. When you [[load op]](GLOSSARY.md#loadop) is `Clear`, the framebuffer is filled with your background color before the render pass begins. [[Storeop]](GLOSSARY.md#storeop) determines whether you keep or discard the results after the render pass.
+After the output merge, the final color is written to the framebuffer. When your [load op](GLOSSARY.md#loadop) is `Clear`, the framebuffer is filled with your background color before the render pass begins. The [store op](GLOSSARY.md#storeop) determines whether you keep or discard the results after the render pass.

 ## Why This Matters For The Rainbow Triangle

@@ -89,6 +97,12 @@ The rainbow gradient is not programmed. There is no loop, no formula, no color b

 ## The Pipeline Object In wgpu

-In wgpu, you compile all of this into a [[pipeline]](GLOSSARY.md#pipeline): a single opaque render pipeline object encoding your shaders, topology, blend state, vertex layout, and output format. It is created once during initialization and reused every frame. Creating a pipeline up-front saves per-frame compilation and state configuration. The [[device]](GLOSSARY.md#device) owns the pipeline, and you use the [[queue]](GLOSSARY.md#queue) to submit draw calls that reference it.
+In wgpu, you compile all of this into a [pipeline](GLOSSARY.md#pipeline): a single opaque render pipeline object encoding your shaders, topology, blend state, vertex layout, and output format. It is created once during initialization and reused every frame. Creating a pipeline up-front saves per-frame compilation and state configuration. The [device](GLOSSARY.md#device) owns the pipeline, and you use the [queue](GLOSSARY.md#queue) to submit draw calls that reference it.

-The [[adapter]](GLOSSARY.md#adapter) is the physical GPU or software renderer you select. There may be multiple on a single system — a dedicated NVIDIA card plus integrated Intel graphics. You pick one adapter, create a device from it, and all resources flow from that device.
+The [adapter](GLOSSARY.md#adapter) is the physical GPU or software renderer you select. There may be multiple on a single system — a dedicated NVIDIA card plus integrated Intel graphics. You pick one adapter, create a device from it, and all resources flow from that device.
+
+## A Note On The Pipeline Model
+
+The five-stage model presented here is a simplified educational abstraction. The actual WebGPU and Vulkan graphics pipelines define 11 or more stages. Between the programmable stages lie fixed-function hardware stages including clipping, perspective division, and viewport transform that operate automatically. The five stages above capture the essential flow relevant to writing shaders and configuring pipelines for common use cases.
+
+For the complete specification of the WebGPU rendering pipeline, consult the [WebGPU Specification](https://www.w3.org/TR/webgpu/).
--- a/docs/concepts/shader-basics.md
+++ b/docs/concepts/shader-basics.md
@@ -4,7 +4,7 @@

 A shader is a GPU program. It is a piece of code that runs on the GPU instead of the CPU. Unlike a CPU program, you do not call a shader function once. You configure it, bind data to it, and then the GPU runs thousands of copies simultaneously on different data elements. One shader invocation per vertex. One shader invocation per pixel.

-Shaders are written in WGSL — [[wgsl]](GLOSSARY.md#wgsl), the WebGPU Shading Language. WGSL is compiled down to the platform's native intermediate representation: SPIR-V for Vulkan, MSL for Metal, DXIL for DirectX. You write one shader; wgpu handles the translation.
+Shaders are written in WGSL — [wgsl](GLOSSARY.md#wgsl), the WebGPU Shading Language. WGSL is compiled down to the platform's native intermediate representation: SPIR-V for Vulkan, MSL for Metal, DXIL for DirectX. You write one shader; wgpu handles the translation.

 ## WGSL Constraints

@@ -14,7 +14,7 @@ WGSL is designed for parallel execution on hardware with severe restrictions:
 - **No recursion.** The GPU has a fixed, tiny stack. Recursive calls are banned.
 - **No I/O.** No `print`, no `println`, no file access, no `socket`. A shader communicates only through its return values and writes to bound buffers/textures.
 - **Static types.** `f32`, `i32`, `u32` for scalars. `vec2<T>`, `vec3<T>`, `vec4<T>` for vectors. `mat2x2<T>` through `mat4x4<T>` for matrices. Every expression has a known type at compile time. There is no `any` and no `dyn`.
- **No arbitrary memory access.** You read from structured inputs (vertex attributes, uniform buffers, textures) and write to defined outputs. Memory is laid out contiguously in [[buffer slice]](GLOSSARY.md#buffer-slice) regions.
+- **No arbitrary memory access.** You read from structured inputs (vertex attributes, uniform buffers, textures) and write to defined outputs. Memory is laid out contiguously in [buffer slice](GLOSSARY.md#buffer-slice) regions.

 These are not bugs. They are the GPU architecture. Every shader invocation runs in an identical sandbox. That identity is what enables 1000x throughput.

@@ -24,19 +24,19 @@ A shader module contains one or more entry point functions. Each entry point is

 ### `@vertex` — Vertex Shader Entry Point

-Runs once per input [[vertex]](GLOSSARY.md#vertex). The GPU calls this function for every vertex in your draw call.
+Runs once per input [vertex](GLOSSARY.md#vertex). The GPU calls this function for every vertex in your draw call.

-**Mandatory output:** `@builtin(position) vec4<f32>` — the [[clip space]](GLOSSARY.md#clip-space) position that the GPU uses for [[primitive]](GLOSSARY.md#primitive) assembly and rasterization. Without this output, the pipeline fails.
+**Mandatory output:** `@builtin(position) vec4<f32>` — the [clip space](GLOSSARY.md#clip-space) position that the GPU uses for [primitive](GLOSSARY.md#primitive) assembly and rasterization. Without this output, the pipeline fails.

 **Optional outputs:** Any number of `@location(n)` values that flow to the fragment shader. Color, UV coordinates, normals — everything downstream needs is passed through the vertex shader output.

 ### `@fragment` — Fragment Shader Entry Point

-Runs once per [[fragment]](GLOSSARY.md#fragment) produced by the rasterizer. For a triangle covering 500 pixels on screen, the fragment shader runs 500 times.
+Runs once per [fragment](GLOSSARY.md#fragment) produced by the rasterizer. For a triangle covering 500 pixels on screen, the fragment shader runs 500 times.

 **Input:** Interpolated values from the vertex shader. If the vertex shader output `@location(0) color: vec3<f32>`, the fragment shader receives that same `@location(0)` with hardware-interpolated values.

-**Output:** `@location(0) vec4<f32>` — the final RGBA color written to the [[framebuffer]](GLOSSARY.md#framebuffer).
+**Output:** `@location(0) vec4<f32>` — the final RGBA color written to the [framebuffer](GLOSSARY.md#framebuffer).

 ## The Location Contract

@@ -44,7 +44,7 @@ Runs once per [[fragment]](GLOSSARY.md#fragment) produced by the rasterizer. For

 > **LOCATION BINDING IS THE CRITICAL LINK BETWEEN RUST AND WGSL**
 >
-> Every value flowing between Rust buffers and WGSL shader functions is tied together by a numeric [[shader location]](GLOSSARY.md#shader-location) label. The number on the Rust side must match the number on the WGSL side.
+> Every value flowing between Rust buffers and WGSL shader functions is tied together by a numeric [shader location](GLOSSARY.md#shader-location) label. The number on the Rust side must match the number on the WGSL side.
 >
 > Rust: `VertexAttribute { shader_location: 0, ... }`
 >
@@ -56,7 +56,7 @@ Runs once per [[fragment]](GLOSSARY.md#fragment) produced by the rasterizer. For

 ## Interpolation Mechanism

-Between the vertex shader and the fragment shader, the [[rasterizer]](GLOSSARY.md#rasterizer) performs a computation that most graphics tutorials treat as magic. It is not magic. It is [[interpolation]](GLOSSARY.md#interpolation).
+Between the vertex shader and the fragment shader, the [rasterizer](GLOSSARY.md#rasterizer) performs a computation that most graphics tutorials treat as magic. It is not magic. It is [interpolation](GLOSSARY.md#interpolation).

 For every `@location(n)` value the vertex shader outputs, the rasterizer computes a triangle-wide linear blend:

@@ -64,73 +64,31 @@ For every `@location(n)` value the vertex shader outputs, the rasterizer compute
 fragment_value = w0 * vertex0_value + w1 * vertex1_value + w2 * vertex2_value
 ```

-where `w0 + w1 + w2 = 1.0` and the weights are [[barycentric coordinates]](GLOSSARY.md#barycentric-coordinates) computed from the fragment's position inside the triangle.
+where `w0 + w1 + w2 = 1.0` and the weights are [barycentric coordinates](GLOSSARY.md#barycentric-coordinates) computed from the fragment's position inside the triangle.

 This interpolation is free. It is a dedicated hardware unit inside every GPU. You do not write the code. You do not pay an algorithmic cost. The rasterizer hardware computes barycentric weights and blends every vertex shader output automatically. The fragment shader receives pre-blended values and does not need to know how they were computed.

-## Concrete Shader Walkthrough
+## How Shaders Work Together

-This is the complete shader for the rainbow triangle. Every line is explained below.
+A complete rendering shader is a two-stage program compiled into a single WGSL module. The **vertex shader** runs once per vertex in your draw call, transforming raw buffer data into GPU-ready outputs. The **fragment shader** runs once per fragment produced by the rasterizer, converting interpolated vertex data into the final color written to the [framebuffer](GLOSSARY.md#framebuffer). Within each stage, invocations run in parallel: the GPU spreads vertex invocations across many cores, and does the same for the fragments the rasterizer produces.

-```wgsl
-struct VertexOutput {
-    @builtin(position) clip_position: vec4<f32>,
-    @location(0) vertex_color: vec3<f32>,
-};
+Data flows between the vertex and fragment stages through a shared struct. The struct's fields are tagged with WGSL attributes that tell the GPU how to route each value:

-@vertex
-fn vs_main(
-    @location(0) position: vec3<f32>,
-    @location(1) color: vec3<f32>,
-) -> VertexOutput {
-    var out: VertexOutput;
-    out.clip_position = vec4<f32>(position, 1.0);
-    out.vertex_color = color;
-    return out;
-}
+- **`@location(n)`** marks values that bind to Rust vertex buffer attributes or flow between shader stages. The number `n` is a binding index: on the Rust side it appears as `shader_location: n` in a `VertexAttribute`, and in WGSL it appears as `@location(n)` on a parameter or struct field. If the numbers differ, the GPU reads from the wrong buffer offset and produces silent garbage. Between the vertex and fragment stages, `@location` values are automatically interpolated by the rasterizer using barycentric weights — the fragment shader receives a smooth blend without writing any interpolation code.
+- **`@builtin(position)`** is a reserved slot the vertex shader must output. It delivers the vertex's [clip space](GLOSSARY.md#clip-space) position as `vec4<f32>`, which the rasterizer uses for perspective division, viewport transform, and primitive assembly. The fragment shader receives its own independent `@builtin(position)` from the fragment pipeline stage — providing framebuffer pixel coordinates — not the vertex shader's output. The two builtins share a name but are completely separate values from different stages.

-@fragment
-fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
-    return vec4<f32>(input.vertex_color, 1.0);
-}
-```
+The vertex shader produces a struct containing a `@builtin(position)` output plus any number of `@location` interpolants. The rasterizer takes these outputs, assembles [primitives](GLOSSARY.md#primitive), and for every pixel inside the triangle computes [barycentric coordinates](GLOSSARY.md#barycentric-coordinates) and blends all `@location` fields. The fragment shader receives the fully interpolated struct and outputs a `vec4<f32>` color at `@location(0)`, which maps to the [pipeline](GLOSSARY.md#pipeline)'s color attachment target.

-### Line by line
-
-**`struct VertexOutput { ... }`** — The interface between vertex and fragment stages. This struct defines everything the vertex shader sends downstream. It is the contract the rasterizer enforces.
-
-**`@builtin(position) clip_position: vec4<f32>`** — The mandatory clip-space position output. The `@builtin(position)` annotation tells the GPU this value goes to the primitive assembly / rasterizer pipeline, not to another shader stage. The GPU reads this to know where each vertex sits in 3D space.
-
-**`@location(0) vertex_color: vec3<f32>`** — An interpolant flowing from vertex to fragment stage. The `@location(0)` annotation labels this value with binding index 0. Any `@location(0)` output here becomes the `@location(0)` input to the fragment shader.
-
-**`@vertex fn vs_main(...)`** — The vertex shader entry point. The `@vertex` attribute marks this as the function the vertex pipeline stage calls.
-
-**`@location(0) position: vec3<f32>`** — Vertex buffer input at location 0. In Rust, the vertex buffer's first attribute is declared with `shader_location: 0`. This is the first half of the location contract: the Rust buffer layout and WGSL input must agree.
-
-**`@location(1) color: vec3<f32>`** — Vertex buffer input at location 1. The second vertex attribute in the buffer. Each vertex stores two values: a 3-component position and a 3-component color, contiguous in memory.
-
-**`var out: VertexOutput;`** — Local variable holding the shader output. WGSL requires explicit variable declarations.
-
-**`out.clip_position = vec4<f32>(position, 1.0);`** — Wraps the 3D position in a [[homogeneous coordinates]](GLOSSARY.md#homogeneous-coordinates) `vec4` by appending `w = 1.0`. See [[coordinate-systems.md]](coordinate-systems.md) for why `w = 1.0` is the identity for our triangle.
-
-**`out.vertex_color = color;`** — Passes the vertex color through to the fragment shader. No transformation needed — the color is already the final per-vertex color. The rasterizer will blend across the triangle surface.
-
-**`@fragment fn fs_main(input: VertexOutput) -> ...`** — The fragment shader entry point. It receives one input struct per fragment. This struct contains the rasterizer's pre-interpolated values.
-
-**`input.vertex_color`** — The color value, already blended by the rasterizer. If the current fragment is 70% close to the red vertex, 20% close to green, 10% close to blue, this value is `(0.7*1.0 + 0.2*0.0 + 0.1*0.0, 0.7*0.0 + 0.2*1.0 + 0.1*0.0, 0.7*0.0 + 0.2*0.0 + 0.1*1.0)` = `(0.7, 0.2, 0.1)`. The interpolation was performed by hardware; the fragment shader does not compute it.
-
-**`-> @location(0) vec4<f32>`** — The fragment shader output signature. `@location(0)` maps to the color attachment in the [[pipeline]](GLOSSARY.md#pipeline) render pass. It is the pixel color written to the framebuffer.
-
-**`vec4<f32>(input.vertex_color, 1.0)`** — Wraps the interpolated RGB color in `vec4` by appending alpha = 1.0 (fully opaque). The framebuffer expects a 4-component color.
+For a complete line-by-line walkthrough of our rainbow triangle shader, see [Section 4](01-rainbow-triangle.md#s4-writing-the-shaders).

 ## WGSL Source Embedding

 In wgpu, the shader source code lives as a Rust string, embedded at compile time:

 ```rust
-const SHADER_SOURCE: &str = include_str!("shaders/main.wgsl");
+const SHADER_SOURCE: &str = include_str!("shader.wgsl");
 ```

-`include_str!` reads the WGSL file during Rust compilation and inlines it as a `&'static str`. There is no runtime file I/O. The shader text is part of the binary. When you create the shader module via `device.create_shader_module()`, wgpu compiles the string to the platform's GPU intermediate format (SPIR-V, MSL, or DXIL). The compilation happens asynchronously on the [[device]](GLOSSARY.md#device) — you drive it to completion with a [[device poll]](GLOSSARY.md#device-poll).
+`include_str!` reads the WGSL file during Rust compilation and inlines it as a `&'static str`. There is no runtime file I/O. The shader text is part of the binary. When you create the shader module via `device.create_shader_module()`, wgpu compiles the string to the platform's GPU intermediate format (SPIR-V, MSL, or DXIL). The compilation happens asynchronously on the [device](GLOSSARY.md#device) — you drive it to completion with a [device poll](GLOSSARY.md#device-poll).

 This is intentional: GPU drivers are slow to initialize file paths. Embedding the source at compile time is idiomatic wgpu and eliminates a class of runtime errors.
Author	SHA1	Message	Date
Krishna Ayyalasomayajula	fc2a04fe14	docs: elevate critical WHY explanations to callout blocks	2026-05-30 21:01:23 -05:00
Krishna Ayyalasomayajula	82ea34fe37	docs: add simplification caveat to graphics pipeline model	2026-05-30 20:59:58 -05:00
Krishna Ayyalasomayajula	2bc7b5c58e	docs: add coordinate space journey diagram	2026-05-30 20:59:18 -05:00
Krishna Ayyalasomayajula	72411b9786	docs: add ASCII diagram of GPU init chain to S3	2026-05-30 20:59:06 -05:00
Krishna Ayyalasomayajula	c2e1aa3bab	docs: fix validation layer disable mechanism in GPU init section	2026-05-30 20:57:00 -05:00
Krishna Ayyalasomayajula	ccd5f2a61a	docs: add 9 missing glossary entries	2026-05-30 20:53:58 -05:00
Krishna Ayyalasomayajula	31adc35da7	docs: add SurfaceStatus::Lost recovery strategy	2026-05-30 20:51:37 -05:00
Krishna Ayyalasomayajula	598708f111	docs: add RenderDoc GPU debugging tools mention	2026-05-30 20:50:03 -05:00
Krishna Ayyalasomayajula	689b43da98	docs: add validation layer discussion to GPU init section	2026-05-30 20:49:40 -05:00
Krishna Ayyalasomayajula	2586e9b813	docs: add shader compilation troubleshooting entry	2026-05-30 20:49:31 -05:00
Krishna Ayyalasomayajula	cb7c01754f	docs: add matrix math prerequisite note to What's Next	2026-05-30 20:49:19 -05:00
Krishna Ayyalasomayajula	865f8c1191	docs: add depth buffer acknowledgment to coordinate systems	2026-05-30 20:49:18 -05:00
Krishna Ayyalasomayajula	a8c64e3643	docs: fix pipeline and operations cross-reference anchors	2026-05-30 20:47:23 -05:00
Krishna Ayyalasomayajula	f9010f9234	docs: standardize cross-reference link format to markdown	2026-05-30 20:31:59 -05:00
Krishna Ayyalasomayajula	6d72ecf45d	docs: fix all broken cross-reference links	2026-05-30 20:30:24 -05:00
Krishna Ayyalasomayajula	ecea7ce77e	docs: eliminate shader walkthrough duplication between S4 and shader-basics	2026-05-30 20:22:51 -05:00
Krishna Ayyalasomayajula	cad48bd58d	docs: standardize Instance creation to default()	2026-05-30 20:20:56 -05:00
Krishna Ayyalasomayajula	8ee04f9dce	docs: standardize shader file path to shader.wgsl	2026-05-30 20:20:53 -05:00
Krishna Ayyalasomayajula	9051de0591	docs: correct @builtin(position) explanation per WGSL spec	2026-05-30 20:17:16 -05:00
Krishna Ayyalasomayajula	7f47641fdb	docs: fix pipeline stage ordering per WebGPU/Vulkan spec	2026-05-30 20:17:09 -05:00
Krishna Ayyalasomayajula	d7fc299d5a	docs: fix render loop to use CurrentSurfaceTexture enum per wgpu 29 API	2026-05-30 20:12:18 -05:00
Krishna Ayyalasomayajula	667dea3d52	docs: fix Polltype glossary entry to include submission_index field	2026-05-30 20:12:04 -05:00
Krishna Ayyalasomayajula	3369528679	docs: fix resize() signature mismatch between call site and definition	2026-05-30 20:03:47 -05:00
Krishna Ayyalasomayajula	4a4be8b307	docs: add window field to State struct for Outdated recovery	2026-05-30 20:02:59 -05:00
Krishna Ayyalasomayajula	23a7e3b151	docs: fix SurfaceStatus API to match wgpu 29 CurrentSurfaceTexture	2026-05-30 20:01:24 -05:00
Krishna Ayyalasomayajula	98cf438f88	docs: fix Polltype glossary entry to match wgpu 29 API	2026-05-30 20:00:39 -05:00