# Building a Rainbow Triangle

## S1: What We're Building

We're creating a window containing a single triangle with smoothly blended colors:

Red at the bottom-left corner, blue at the bottom-right corner, and green at the
top vertex. The gradient between each pair of vertices is not computed by you —
it is interpolated automatically by the GPU rasterizer in hardware. You provide
three vertices, each carrying a position and a color. The rasterizer determines
every pixel covered by the triangle and computes the color for that pixel by
blending the three vertex colors, weighting each by how close the pixel lies to
that vertex (its barycentric weight). The result
is a smooth rainbow gradient across a single primitive. We do not need a texture,
a colormap, or a fragment shader with any branching — just three colored
vertices and the default linear interpolation the [rasterizer](concepts/GLOSSARY.md#rasterizer)
applies to every [interpolated value](concepts/GLOSSARY.md#interpolation).
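
To make the interpolation concrete, here is a CPU-side sketch of the weighting the rasterizer applies. It is an illustration only: the names `edge`, `barycentric`, `blend`, `TRIANGLE`, and `COLORS` are made up for this example, the real work happens in fixed-function hardware, and the sketch ignores perspective correction (which changes nothing here, because every vertex has w = 1).

```rust
/// Our three vertex positions (x, y) and their colors (r, g, b).
const TRIANGLE: [[f32; 2]; 3] = [[-0.5, -0.5], [0.5, -0.5], [0.0, 0.5]];
const COLORS: [[f32; 3]; 3] = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]];

/// Twice the signed area of triangle (a, b, c).
fn edge(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// Barycentric weights of point `p` with respect to `tri`; they sum to 1.
fn barycentric(p: [f32; 2], tri: [[f32; 2]; 3]) -> [f32; 3] {
    let area = edge(tri[0], tri[1], tri[2]);
    [
        edge(tri[1], tri[2], p) / area, // weight of vertex 0
        edge(tri[2], tri[0], p) / area, // weight of vertex 1
        edge(tri[0], tri[1], p) / area, // weight of vertex 2
    ]
}

/// Blend three per-vertex colors with the given weights.
fn blend(weights: [f32; 3], colors: [[f32; 3]; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for (weight, color) in weights.iter().zip(colors.iter()) {
        for channel in 0..3 {
            out[channel] += *weight * color[channel];
        }
    }
    out
}
```

At the midpoint of the bottom edge, `blend(barycentric([0.0, -0.5], TRIANGLE), COLORS)` gives an even mix of red and blue, `[0.5, 0.0, 0.5]`; at a vertex it returns that vertex's color unchanged.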

If you haven't read the [concept overview](concepts/graphics-pipeline.md), do so
now. [Coordinate systems](concepts/coordinate-systems.md) explains how the GPU
positions geometry. [Shader basics](concepts/shader-basics.md) covers the GPU
programs that drive rendering.

## S2: The winit Application and Event Loop

New concept: **event-driven windowing.** winit is the bridge between your Rust
code and the display server (X11 or Wayland on Linux). Think of it like `epoll`
or `kqueue` but for windows, input, and display lifecycle events instead of file
descriptors.

The entire program runs on the tokio async runtime — wgpu's [adapter](concepts/GLOSSARY.md#adapter)
queries and [device](concepts/GLOSSARY.md#device) creation are async, and the
runtime is the natural home for the main event loop.

### Architecture Overview

- **`main()` is `#[tokio::main] async fn`** — the entry point runs on the tokio
  runtime, giving us access to tokio's task scheduler and I/O facilities.
- **`tokio::task::spawn_blocking`** — winit's `event_loop.run_app()` is synchronous
  and owns the display server connection. Blocking a tokio worker thread with
  an indefinite sync call would starve the tasks scheduled on it. We offload the
  blocking event loop to a dedicated thread, then await the join handle.
- **`Handle::block_on()` in `resumed()`** — wgpu initialization (adapter and
  device queries) is async, but winit's `resumed()` handler is synchronous. We
  bridge the two execution models exactly once at startup. This initial GPU
  setup usually completes within tens of milliseconds, depending on the driver.
- **`Arc<Window>`** — shared reference count to the window, needed because both
  winit event handlers and wgpu [surface](concepts/GLOSSARY.md#surface) state
  must hold a reference to the same window object across the event loop
  boundary.
- **`ControlFlow::Poll`** — continuous polling mode. winit keeps iterating the
  event loop instead of sleeping until the next event arrives. `Poll` does not
  emit `RedrawRequested` by itself; each new frame is scheduled by the
  `window.request_redraw()` call in the redraw handler, so no separate timer is
  needed. The surface [present mode](concepts/GLOSSARY.md#present-mode)
  controls the actual vsync behavior.

### Dependencies

Add these to your `Cargo.toml`:

```toml
wgpu = "29"
winit = "0.30"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
bytemuck = { version = "1", features = ["derive"] }
log = "0.4"
simple_logger = "5"
```

- `wgpu` — the GPU abstraction layer. Manages device lifecycles, shaders, buffers,
  pipelines, and command encoding.
- `winit` — cross-platform window creation and event dispatch. Owns the display
  server connection.
- `tokio` — async runtime for the main loop and all GPU queries.
- `bytemuck` — zero-copy casting between Rust structs and byte slices. Required
  for uploading vertex data to GPU buffers without manual serialization.
- `log` / `simple_logger` — structured logging. wgpu and winit emit diagnostic
  messages via `log` when misconfigurations or driver issues are detected.

### Complete Code

```rust
use std::sync::Arc;
use winit::application::ApplicationHandler;
use winit::dpi::LogicalSize;
use winit::event::WindowEvent;
use winit::event_loop::{ActiveEventLoop, ControlFlow, EventLoop};
use winit::window::{Window, WindowId};

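#[tokio::main]
async fn main() {
    simple_logger::init_with_level(log::Level::Debug).unwrap();

    let handle = tokio::runtime::Handle::current();

    tokio::task::spawn_blocking(move || {
        // Assumption: Linux (X11 or Wayland). `EventLoop` is not `Send`, so it
        // is built on the thread that runs it. winit only permits creating an
        // event loop off the main thread when `with_any_thread(true)` is set.
        use winit::platform::wayland::EventLoopBuilderExtWayland;
        let event_loop = EventLoop::builder()
            .with_any_thread(true)
            .build()
            .unwrap();

        event_loop.run_app(&mut App {
            handle,
            window: None,
            state: None,
        })
    })
    .await
    .unwrap() // the blocking task must not panic
    .unwrap(); // the event loop must exit without an error
}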

struct App {
    handle: tokio::runtime::Handle,
    window: Option<Arc<Window>>,
    state: Option<State>,
}

impl ApplicationHandler<()> for App {
    fn resumed(&mut self, event_loop_ctl: &ActiveEventLoop) {
        let window = Arc::new(
            event_loop_ctl
                .create_window(
                    Window::default_attributes()
                        .with_inner_size(LogicalSize::new(800.0, 600.0))
                        .with_title("Rainbow Triangle"),
                )
                .unwrap(),
        );
        event_loop_ctl.set_control_flow(ControlFlow::Poll);
        self.window = Some(window.clone());

        // `State::new` returns a `Result`; unwrap it once, after `block_on`.
        self.state = Some(
            self.handle
                .block_on(State::new(window))
                .expect("Failed to create wgpu State"),
        );
    }

    fn window_event(
        &mut self,
        event_loop_ctl: &ActiveEventLoop,
        _window_id: WindowId,
        event: WindowEvent,
    ) {
        let Some(state) = self.state.as_mut() else { return };
        let Some(window) = self.window.as_ref() else { return };

        match event {
            WindowEvent::Resized(size) => state.resize(size),
            WindowEvent::CloseRequested => event_loop_ctl.exit(),
            WindowEvent::RedrawRequested => {
                state.render();
                window.request_redraw();
            }
            _ => {}
        }
    }

    fn exiting(&mut self, event_loop_ctl: &ActiveEventLoop) {
        event_loop_ctl.exit();
    }
}
```

**Why `spawn_blocking`:** `run_app()` blocks its calling thread until the event
loop exits. If we called it directly on a tokio worker thread, that worker could
not run any other async task for the life of the window. `spawn_blocking` moves
the call onto tokio's pool of dedicated blocking threads, so the runtime's
workers remain free for GPU queries, driver I/O, and future background
tasks.
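
The same shape can be sketched with nothing but the standard library. This is an analogy, not the tokio implementation: `run_blocking_loop` and `offload_and_join` are hypothetical names, a channel stands in for the display server, and `thread::spawn` plus `join` plays the role of `spawn_blocking` plus `.await`.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a blocking event loop: handles events until the channel closes.
fn run_blocking_loop(events: mpsc::Receiver<&'static str>) -> usize {
    let mut handled = 0;
    // Iterating a receiver blocks while waiting for the next event.
    for _event in events {
        handled += 1;
    }
    handled
}

/// Runs the loop on its own thread and waits for it to finish.
fn offload_and_join(events: Vec<&'static str>) -> usize {
    let (sender, receiver) = mpsc::channel();
    let worker = thread::spawn(move || run_blocking_loop(receiver));

    // The calling thread stays free to do other work while the loop runs.
    for event in events {
        sender.send(event).unwrap();
    }
    drop(sender); // closing the channel ends the loop

    worker.join().unwrap()
}
```

Calling `offload_and_join(vec!["resized", "redraw", "close"])` returns the number of events the worker handled, which is 3 here.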

**Why `Handle::block_on`:** wgpu's `request_adapter` and `request_device` return
futures because the WebGPU API they mirror is asynchronous. On native backends
these futures usually resolve quickly, but something still has to poll them.
`Handle::block_on` enters the runtime's context, polls the future to completion
on the calling thread (here, the event-loop thread), then returns the
result.
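
What "polling a future to completion on the calling thread" means can be shown with a tiny executor built from the standard library. This is a sketch of the idea, not how tokio implements `Handle::block_on`; `ThreadWaker`, `block_on`, and `FakeAdapterRequest` are hypothetical names, and the fake request only imitates a future that is pending once before it resolves.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `fut` on the current thread until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(), // sleep until the waker fires
        }
    }
}

/// Hypothetical stand-in for an adapter request: pending once, then ready.
struct FakeAdapterRequest {
    polled_once: bool,
}

impl Future for FakeAdapterRequest {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.polled_once {
            Poll::Ready("hypothetical adapter")
        } else {
            self.polled_once = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}
```

`block_on(FakeAdapterRequest { polled_once: false })` polls twice and returns `"hypothetical adapter"`. The synchronous caller sees an ordinary return value, which is exactly what `resumed()` needs.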

**Why `ControlFlow::Poll`:** winit supports `ControlFlow::Poll` (keep iterating
the event loop) and `ControlFlow::Wait` (sleep until the next event). A graphics
application needs a steady render loop. `Poll` does not generate
`RedrawRequested` events on its own; it only prevents the loop from going idle.
The redraws come from `window.request_redraw()`, which we call at the end of
every `RedrawRequested` handler.

**Why `request_redraw()`:** After presenting a frame to the display, we ask
winit to schedule the next `RedrawRequested` frame. This creates an explicit
render loop: render → present → request redraw → render → repeat. The rate is
governed by the [swapchain](concepts/GLOSSARY.md#swapchain) [present mode](concepts/GLOSSARY.md#present-mode).
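
A toy model makes the self-scheduling loop visible. The `Event` enum and `run_loop` function below are hypothetical stand-ins, not winit types: the queue starts with one redraw event, and the handler either re-queues a redraw or lets the queue run dry.

```rust
use std::collections::VecDeque;

/// Toy events; these only mirror the names of the winit variants.
enum Event {
    RedrawRequested,
    CloseRequested,
}

/// Drains the queue and returns how many frames were "rendered".
/// `request_redraw` models whether the handler calls `window.request_redraw()`.
fn run_loop(request_redraw: bool, max_frames: usize) -> usize {
    let mut queue = VecDeque::new();
    queue.push_back(Event::RedrawRequested); // the first redraw after the window appears

    let mut frames = 0;
    while let Some(event) = queue.pop_front() {
        match event {
            Event::RedrawRequested => {
                frames += 1; // render and present would happen here
                if frames == max_frames {
                    queue.push_back(Event::CloseRequested); // stop the demo
                } else if request_redraw {
                    queue.push_back(Event::RedrawRequested); // schedule the next frame
                }
            }
            Event::CloseRequested => break,
        }
    }
    frames
}
```

With `run_loop(true, 5)` the handler keeps itself going and five frames are rendered. With `run_loop(false, 5)` only one frame is rendered, because nothing schedules the next one.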

**Why `exiting()`:** winit calls this handler when the event loop is shutting
down, and the loop is guaranteed to exit right after it returns. That makes it
the last clean opportunity to flush the queue and release GPU resources before
the process exits. This example has nothing extra to clean up, so the handler
only calls `exit()`, which has no further effect at this point.

## S3: Connecting to the GPU — The Init Chain

New concept: **5-layer GPU connection.** Each layer adds a capability:

1. **[Instance](concepts/GLOSSARY.md#instance)** — opens a connection to the
   graphics driver. On Vulkan this loads the Vulkan loader and registers
   instance-level extensions. On WebGL this picks the browser GPU context.
2. **[Surface](concepts/GLOSSARY.md#surface)** — binds the instance to a
   specific window's swapchain. The surface is the wgpu representation of the
   window's display buffer.
3. **[Adapter](concepts/GLOSSARY.md#adapter)** — selects the physical GPU
   hardware. An adapter wraps the actual driver + silicon pair (e.g., Mesa RADV
   on AMD, NVIDIA driver on NVIDIA silicon).
4. **[Device](concepts/GLOSSARY.md#device) + [Queue](concepts/GLOSSARY.md#queue)** — the
   device owns all GPU resources (buffers, textures, shaders, pipelines). The
   queue is the submission channel: you encode work into command buffers and
   submit them to the queue.
5. **[SurfaceConfiguration](concepts/GLOSSARY.md#surface-configuration)** —
   allocates the swapchain [framebuffers](concepts/GLOSSARY.md#framebuffer) for
   this window at a specific resolution and pixel format.

### The State Struct

```rust
struct State {
    surface: wgpu::Surface<'static>,
    device: wgpu::Device,
    queue: wgpu::Queue,
    config: wgpu::SurfaceConfiguration,
    window: Arc<Window>,
    pipeline: wgpu::RenderPipeline,
    vertex_buffer: wgpu::Buffer,
}
```

- **`surface`** — connects to the window's display buffer. The `'static` lifetime
  is possible because the surface is created from an `Arc<Window>` and keeps its
  own clone of that `Arc`, so it never borrows from a shorter-lived value. The
  surface mediates all [swapchain](concepts/GLOSSARY.md#swapchain)
  operations.
- **`device`** — owns all GPU resources. Every buffer, texture, shader module,
  and pipeline created in this guide is a child of the device. When the device
  is dropped, all its children are freed.
- **`queue`** — the command submission channel. You encode a frame's worth of
  work into a [command buffer](concepts/GLOSSARY.md#command-buffer), then submit
  that buffer to the queue. The queue pushes work to the GPU hardware.
- **`config`** — holds the surface's current width, height, pixel format, and
  [present mode](concepts/GLOSSARY.md#present-mode). When the window is resized,
  we reconfigure the surface with updated dimensions.
- **`window`** — shared reference to the winit window. Stored as an `Arc` so
  the `resize()` method and the `CurrentSurfaceTexture::Outdated` recovery handler can
  access the window's current dimensions. When the surface becomes outdated
  (e.g., after a compositor restart or display hotplug), recovery requires
  reconfiguring the swapchain with the window's live size, and that requires
  holding a reference to the window itself.
- **`pipeline`** — the compiled [render pipeline](concepts/GLOSSARY.md#pipeline-render).
  A render pipeline is an immutable configuration combining a shader, a vertex
  buffer layout, a primitive topology, and a color target setup. Switching
  pipelines has a cost, so most applications create a few pipelines up front and
  change them only between draw calls.
- **`vertex_buffer`** — GPU memory holding our vertex data. The GPU reads
  position and color data directly from this buffer during the vertex shader
  stage.

### Complete `State::new()` Implementation

```rust
use wgpu::Surface;

// --- Vertex type and data ---

#[repr(C)]
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

const VERTICES: &[Vertex] = &[
    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
];

impl State {
    async fn new(window: Arc<Window>) -> Result<Self, String> {
        // Step 1: Instance — connection to the graphics driver
        let instance = wgpu::Instance::default();

        // Step 2: Surface — binds our window to the GPU's swapchain
        let surface = instance
            .create_surface(window.clone()) // clone: `window` is used again below
            .map_err(|e| format!("Failed to create surface: {:?}", e))?;

        // Step 3: Adapter — selects the physical GPU
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions {
                power_preference: wgpu::PowerPreference::HighPerformance,
                force_fallback_adapter: false,
                compatible_surface: None,
            })
            .await
            // Recent wgpu releases return a `Result` here rather than an `Option`.
            .map_err(|e| format!("No GPU adapter found. Ensure Vulkan drivers are installed: {:?}", e))?;

        // Step 4: Device + Queue — resource owner + command submission
        let (device, queue) = adapter
            // Recent wgpu releases take only the descriptor (no trace-path argument).
            .request_device(&wgpu::DeviceDescriptor::default())
            .await
            .map_err(|e| format!("Failed to request device: {:?}", e))?;

        // Step 5: SurfaceConfiguration — allocates swapchain framebuffers
        let size = window.inner_size();
        let surface_caps = surface.get_capabilities(&adapter);
        let format = surface_caps.formats.iter()
            .find(|f| f.is_srgb())
            .copied()
            .unwrap_or(surface_caps.formats[0]);

        let config = wgpu::SurfaceConfiguration {
            usage: wgpu::TextureUsages::RENDER_ATTACHMENT | wgpu::TextureUsages::TEXTURE_BINDING,
            format,
            width: size.width.max(1),
            height: size.height.max(1),
            present_mode: wgpu::PresentMode::Mailbox,
            desired_maximum_frame_latency: 2,
            alpha_mode: surface_caps.alpha_modes[0],
            view_formats: vec![format.add_srgb_suffix()],
        };
        surface.configure(&device, &config);

        // Step 6: Compile the shader module
        let shader_module = device.create_shader_module(
            wgpu::ShaderModuleDescriptor {
                label: Some("Rainbow Triangle Shader"),
                source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
            }
        );

        // Step 7: Upload vertex data to GPU memory
        use wgpu::util::DeviceExt;
        let vertex_buffer = device.create_buffer_init(
            &wgpu::util::BufferInitDescriptor {
                label: Some("Vertex Buffer"),
                contents: bytemuck::cast_slice(VERTICES),
                usage: wgpu::BufferUsages::VERTEX,
            }
        );

        // Step 8: Create the render pipeline
        let vertex_buffer_layout = wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<Vertex>() as u64,
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    format: wgpu::VertexFormat::F32x3,
                    shader_location: 0,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as u64,
                    format: wgpu::VertexFormat::F32x3,
                    shader_location: 1,
                },
            ],
        };

        let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
            label: Some("Triangle Pipeline"),
            layout: None,
            vertex: wgpu::VertexState {
                module: &shader_module,
                entry_point: Some("vs_main"),
                buffers: &[vertex_buffer_layout],
                compilation_options: Default::default(),
            },
            primitive: wgpu::PrimitiveState {
                topology: wgpu::PrimitiveTopology::TriangleList,
                strip_index_format: None,
                front_face: wgpu::FrontFace::Ccw,
                cull_mode: Some(wgpu::Face::Back),
                unclipped_depth: false,
                polygon_mode: wgpu::PolygonMode::Fill,
                conservative: false,
            },
            depth_stencil: None,
            multisample: wgpu::MultisampleState {
                count: 1,
                mask: !0,
                alpha_to_coverage_enabled: false,
            },
            fragment: Some(wgpu::FragmentState {
                module: &shader_module,
                entry_point: Some("fs_main"),
                targets: &[Some(wgpu::ColorTargetState {
                    format: config.format,
                    blend: None,
                    write_mask: wgpu::ColorWrites::ALL,
                })],
                compilation_options: Default::default(),
            }),
            multiview_mask: None,
            cache: None,
        });

        Ok(Self {
            surface,
            device,
            queue,
            config,
            window: Arc::clone(&window),
            pipeline,
            vertex_buffer,
        })
    }
}
```

### Init Steps Explained

**Step 1 — Instance:** `Instance::default()` opens a connection to the graphics
driver on the current platform. On Linux with Vulkan, this loads `libvulkan.so`
and creates a Vulkan `VkInstance`. On Windows, it loads `vulkan-1.dll`. The
instance is the foundational wgpu object — every other wgpu operation requires
it.

**Step 2 — Surface:** `instance.create_surface(window)` binds the wgpu instance
to the winit `Window`. This tells the GPU: "the pixels of *this* window will be
the output of my rendering." In Vulkan terms, this creates a `VkSurfaceKHR`, the
prerequisite for the `VkSwapchainKHR` that is built when the surface is
configured. The surface must match the window platform type exactly (X11,
Wayland, Windows, macOS, etc.).

**Step 3 — Adapter:** `request_adapter()` queries available GPUs and returns the
best match for the given options. With
`PowerPreference::HighPerformance`, wgpu prefers a discrete GPU over an
integrated one on hybrid systems (e.g., NVIDIA + Intel Optimus). The
`compatible_surface: None` path works because our `Instance` was created without
a display handle; on Linux with Vulkan, the adapter selection remains correct
because the surface itself was created through a compatible instance.

**Step 4 — Device + Queue:** `request_device()` allocates the logical GPU
resource manager and its submission queue. The device tracks all GPU memory and
validates API calls. The queue is the submission endpoint — every rendered frame
becomes a [command buffer](concepts/GLOSSARY.md#command-buffer) that is submitted
to this queue. On Vulkan, the device corresponds to `VkDevice` and the queue
to a `VkQueue`.

> **Key insight — Validation catches GPU errors at runtime:** wgpu validates
> your API calls itself, checking for common mistakes: incorrect buffer
> bindings, mismatched pipeline state, out-of-bounds buffer slices, and
> resource lifecycle violations. This validation is part of wgpu's safe API and
> stays on in every build; errors surface as log messages or panics, saving
> hours of debugging silent GPU corruption. Separately,
> `InstanceFlags::VALIDATION` turns on the backend's own validation layers (for
> example the Vulkan validation layers), which add measurable overhead to every
> frame. wgpu enables that flag by default only in debug builds. When the
> instance flags are read from the environment (for example through
> `InstanceFlags::with_env()`), `WGPU_VALIDATION=0` switches the layers off.

**Step 5 — SurfaceConfiguration:** This allocates the
[swapchain](concepts/GLOSSARY.md#swapchain) [framebuffers](concepts/GLOSSARY.md#framebuffer).
We negotiate the pixel format with the driver (preferring an
[sRGB](concepts/GLOSSARY.md) format for correct color display), pick the
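window dimensions (clamped to at least 1x1, because a zero-sized surface is
invalid and a minimized window can report a size of 0x0 on some platforms), and
select the [present mode](concepts/GLOSSARY.md#present-mode).
`PresentMode::Mailbox` presents without tearing and always shows the most
recently completed frame, so rendering is not throttled to the display refresh
rate. Not every platform supports it: `PresentMode::Fifo` is the only mode
guaranteed to be available, and `surface_caps.present_modes` lists what the
surface actually offers.
`desired_maximum_frame_latency: 2` hints that at most two frames should be
queued for presentation, which smooths out frame time spikes.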
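
The negotiation logic in this step can be sketched without a GPU. The `PresentMode` enum below is a local stand-in that only mirrors the variant names of `wgpu::PresentMode`, and `pick_present_mode` and `clamp_surface_size` are hypothetical helpers; in the real code the supported list would come from `surface_caps.present_modes`.

```rust
/// Local stand-in mirroring three variants of `wgpu::PresentMode`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PresentMode {
    Fifo,
    Mailbox,
    Immediate,
}

/// Prefer Mailbox when the surface offers it, otherwise fall back to Fifo.
fn pick_present_mode(supported: &[PresentMode]) -> PresentMode {
    if supported.contains(&PresentMode::Mailbox) {
        PresentMode::Mailbox
    } else {
        PresentMode::Fifo
    }
}

/// A surface must be at least 1x1, even while the window reports 0x0.
fn clamp_surface_size(width: u32, height: u32) -> (u32, u32) {
    (width.max(1), height.max(1))
}
```

Falling back to `Fifo` rather than hard-coding `Mailbox` is one way to avoid a configuration error on surfaces that do not list `Mailbox` among their supported modes.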

Steps 6 through 8 — shader module compilation, vertex buffer upload, and render
pipeline assembly — will be explored in detail in the next sections.

## S4: Writing the Shaders

New concept: **shaders are GPU programs.** A [shader](concepts/GLOSSARY.md#shader)
is a function or set of functions that runs on the GPU, compiled once at pipeline
creation time, then executed thousands of times in parallel. Each invocation
operates on different data but follows the identical instruction sequence. There
is no heap allocation, no recursion, and no I/O. The GPU runs invocations in
groups (called warps or wavefronts) that execute in lockstep: if threads in one
group take different branches, the group executes both paths in turn and
discards the results that do not apply. This is why you write shaders
differently from CPU code: you optimize for parallelism and avoid divergent
branches.

A [shader module](concepts/GLOSSARY.md#shader) can contain multiple entry points.
For this triangle we need two of them: the [vertex shader](concepts/GLOSSARY.md#vertex-shader)
and the [fragment shader](concepts/GLOSSARY.md#fragment-shader). The vertex
shader runs once per [vertex](concepts/GLOSSARY.md#vertex). The fragment shader
runs once per [fragment](concepts/GLOSSARY.md#fragment) — that is, once per pixel
covered by the rasterized [primitive](concepts/GLOSSARY.md#primitive).

> **Key insight #1 — Interpolation is free hardware:** The vertex shader outputs
> per-vertex colors at `@location(0)`. The [rasterizer](concepts/GLOSSARY.md#rasterizer)
> automatically interpolates them across the triangle surface using
> [barycentric coordinates](concepts/GLOSSARY.md#barycentric-coordinates). The
> fragment shader just returns whatever it receives. The rainbow gradient is not
> programmed — it is a consequence of the pipeline architecture. You supply
> colors at three points; the hardware computes every color in between at zero
> shader cost.

**Why WGSL:** WebGPU Shading Language ([WGSL](concepts/GLOSSARY.md#wgsl)) is
the single source format. wgpu translates it to the backend's native shader
format at runtime: SPIR-V for Vulkan, MSL for Metal, HLSL for DirectX 12. You
write one shader file and wgpu produces the right output for every backend.

**Why `include_str!("shader.wgsl")`:** This Rust macro embeds the file contents
at compile time. The shader source becomes a string literal inside your binary.
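At runtime there is zero file I/O. No paths to resolve, no loading failures,
no async reads. If the file is missing, the build fails. WGSL syntax and type
errors are a different matter: the Rust compiler treats the source as an opaque
string, so they surface at runtime, when `create_shader_module` parses it.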

### The Complete Shader

Create `shader.wgsl` next to `main.rs` (normally in `src/`), because `include_str!` resolves paths relative to the source file that invokes it:

```wgsl
struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) vertex_color: vec3<f32>,
};

@vertex
fn vs_main(
    @location(0) position: vec3<f32>,
    @location(1) color: vec3<f32>,
) -> VertexOutput {
    var out: VertexOutput;
    out.clip_position = vec4<f32>(position, 1.0);
    out.vertex_color = color;
    return out;
}

@fragment
fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
    return vec4<f32>(input.vertex_color, 1.0);
}
```

### Line-by-Line Walkthrough

**`struct VertexOutput { ... }`** — Defines the data flowing between the vertex
shader and the fragment shader. This struct is not a Rust type and not a buffer
layout — it is the output contract of the vertex shader that the rasterizer
carries through to the fragment shader.

**`@builtin(position) clip_position: vec4<f32>`** — `@builtin(position)` is a
reserved GPU output slot. Every vertex shader must produce a `vec4<f32>` at
this slot. This value is the vertex position in [clip space](concepts/GLOSSARY.md#clip-space).
The GPU uses it for perspective division (dividing x, y, z by w to produce
[ndc](concepts/GLOSSARY.md#ndc)) and clipping. In our triangle, the w
component is 1.0, so perspective division is the identity operation — our
positions are already in the right space.

**`@location(0) vertex_color: vec3<f32>`** — `@location(0)` marks this field
for interpolation. Any `@location(n)` output from the vertex shader that is
not a builtin is automatically interpolated by the rasterizer using barycentric
weights. At each vertex, the value is exact. Inside the triangle, it is the
weighted blend of all three vertex values. The fragment shader receives a
different `vertex_color` for every pixel, without any manual interpolation code.

> **Key insight #2 — THE LOCATIONS MUST MATCH:** `shader_location: 0` in
> Rust's `VertexAttribute` MUST equal `@location(0)` in WGSL's parameter
> annotation. wgpu checks at pipeline creation that every shader input has a
> vertex attribute with a compatible format, so a missing location is reported
> as a validation error. What it cannot catch is two attributes of the same
> format with swapped locations: the pipeline is valid, but the shader reads
> color bytes as positions and position bytes as colors. That failure is
> silent and shows up only as wrong output on screen.

**`@vertex fn vs_main(...)`** — `@vertex` declares this function as the vertex
shader entry point. The function is invoked once per vertex in the draw call.
For our triangle with three vertices, `vs_main` runs exactly three times.

**`@location(0) position: vec3<f32>`** — This input parameter receives data
from the vertex buffer mapped by `shader_location: 0`. In our Rust
`VertexBufferLayout`, the first `VertexAttribute` reads 3 floats at offset 0
and delivers them to the shader at location 0. This is the raw NDC position.

**`@location(1) color: vec3<f32>`** — The second vertex buffer attribute
mapped to location 1. Reads 3 floats at the offset after the position
(12 bytes into each vertex) — the per-vertex RGB color.

**`var out: VertexOutput;`** — Local variable declaration. WGSL requires
explicit variable bindings. `var` creates a mutable local.

**`out.clip_position = vec4<f32>(position, 1.0);`** — Converts the `vec3`
input into [homogeneous coordinates](concepts/GLOSSARY.md#homogeneous-coordinates)
by appending w = 1.0. This promotes the position from 3D to clip space. With
w = 1.0, perspective division (x/w, y/w, z/w) leaves the coordinates unchanged.
If we were using perspective projection, the vertex shader would compute a
nontrivial w value from the depth.
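
The arithmetic the fixed-function stages apply after the vertex shader can be written out on the CPU. This is a sketch under simplifying assumptions: `clip_to_ndc` and `ndc_to_pixel` are hypothetical helper names, the viewport is assumed to cover the whole window with its origin at the top-left corner and y pointing down, and depth handling is left out.

```rust
/// Perspective division: clip space (x, y, z, w) to NDC (x/w, y/w, z/w).
fn clip_to_ndc(clip: [f32; 4]) -> [f32; 3] {
    [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]]
}

/// Viewport transform for a full-window viewport.
/// NDC y points up, framebuffer y points down, so y is flipped.
fn ndc_to_pixel(ndc: [f32; 3], width: f32, height: f32) -> [f32; 2] {
    [
        (ndc[0] + 1.0) * 0.5 * width,
        (1.0 - ndc[1]) * 0.5 * height,
    ]
}
```

For our bottom-left vertex, `clip_to_ndc([-0.5, -0.5, 0.0, 1.0])` leaves the coordinates unchanged, and `ndc_to_pixel` on an 800x600 window maps it to pixel (200, 450). With w = 2.0, a clip position of (1.0, 1.0) would land at NDC (0.5, 0.5) instead.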

**`out.vertex_color = color;`** — Passes the input color through to the output.
The rasterizer picks this field up, interpolates it across the triangle surface,
and delivers the interpolated value to every fragment.

**`@fragment fn fs_main(input: VertexOutput)`** — `@fragment` declares the
fragment shader entry point. `input` is the rasterizer's interpolated output
from the vertex shader. Every `@location(n)` field in `VertexOutput` is now
pre-blended with barycentric weights.

> **Key insight — TWO `@builtin(position)` builtins, zero connection:**
> Vertex `@builtin(position)` and fragment `@builtin(position)` are two
> completely separate builtins that happen to share the same name. The vertex
> shader outputs clip-space coordinates into `@builtin(position)` for the
> rasterizer to perform perspective division and viewport transform. The
> fragment shader receives an entirely different `@builtin(position)` injected
> by the fragment stage, providing framebuffer pixel coordinates: `x`/`y` are
> the pixel center within the viewport, `z` is the depth value (typically
> [0, 1]), and `w` is the interpolated reciprocal of the vertex clip-space
> w-coordinate (1/w). The vertex shader's position output is NOT passed to the
> fragment shader's position input. They are independent builtins from
> different pipeline stages. If you need to pass data from vertex to fragment
> with interpolation, use `@location(N)` on regular struct fields — which is
> exactly what `vertex_color` does in our shader.

**`-> @location(0) vec4<f32>`** — The return value is written to color target 0,
the first entry of `targets` in the
[render pipeline](concepts/GLOSSARY.md#pipeline-render) descriptor. The
return type is `vec4<f32>`: RGBA with linear-space components, which the GPU
encodes to sRGB on write when the target format is an sRGB format.

**`return vec4<f32>(input.vertex_color, 1.0);`** — Promotes the interpolated
RGB color to RGBA by setting alpha = 1.0 (fully opaque). The
[rasterizer](concepts/GLOSSARY.md#rasterizer) interpolated `input.vertex_color`
across the triangle; we just attach an alpha channel and return it. The output
merge stage writes this color directly to the framebuffer.
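
Assuming the surface format chosen in Step 5 is an sRGB format, the linear values returned by `fs_main` are encoded on write. The standard sRGB transfer function can be sketched on the CPU as follows; `linear_to_srgb` is a hypothetical helper for illustration, since the GPU does this conversion in hardware.

```rust
/// Encode one linear color channel (0.0..=1.0) with the sRGB transfer function.
fn linear_to_srgb(linear: f32) -> f32 {
    let c = linear.clamp(0.0, 1.0);
    if c <= 0.0031308 {
        12.92 * c // linear segment near black
    } else {
        1.055 * c.powf(1.0 / 2.4) - 0.055 // gamma segment
    }
}
```

A linear value of 0.5 is stored as roughly 0.735, which is why the midpoints of the gradient look brighter on an sRGB surface than the raw numbers suggest.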

### Rust Shader Module Creation

The Rust side loads the shader file at compile time and feeds the source to wgpu:

```rust
let shader_module = device.create_shader_module(
    wgpu::ShaderModuleDescriptor {
        label: Some("Rainbow Triangle Shader"),
        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
    }
);
```

- **`ShaderModuleDescriptor`** — has two fields: `label` (debug string, shown
  in graphics debuggers and validation messages) and `source` (the shader
  text).
- **`ShaderSource::Wgsl(...)`** — wraps the WGSL string. wgpu also accepts
  SPIR-V binary source via `ShaderSource::SpirV`, but WGSL is the native
  path.
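- **`device.create_shader_module()`** — takes the descriptor and parses +
  validates the shader. On Vulkan, wgpu translates WGSL to SPIR-V internally.
  The call returns a `ShaderModule` directly rather than a `Result`: syntax
  errors and type mismatches are reported through wgpu's error handling, which
  by default logs the error and panics. A missing entry point is detected
  later, when the pipeline that names it is created.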
- **`&shader_module`** — the resulting handle is passed by reference into the
  render pipeline descriptor. The module remains valid for the lifetime of the
  pipeline.

## S5: Uploading Vertex Data to the GPU

New concept: **GPU memory isolation.** The GPU cannot read Rust heap or stack
memory directly. Vertex data must be laid out as a flat byte array and uploaded
into a dedicated GPU buffer (bound later through a
[buffer slice](concepts/GLOSSARY.md#buffer-slice)). The pipeline configuration
then describes how to interpret those bytes: how many bytes each vertex occupies
(the stride), what format each attribute has, and at what offset within each
vertex the attribute begins.

> **Key insight #3 — `create_buffer_init` is an extension trait:** The method
> lives in `wgpu::util::DeviceExt`, not on `Device` directly. If you call
> `device.create_buffer_init(...)` without importing the trait, the compiler
> reports "method not found." This is a Rust trait-discovery issue, not a wgpu
> API issue. Add `use wgpu::util::DeviceExt;` to bring the method into scope.

### The Vertex Struct

```rust
#[repr(C)]
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}
```

- **`#[repr(C)]`** — Forces the Rust compiler to lay out the struct fields in
  declaration order with no padding reordering. Without this, Rust is free to
  reorder fields for optimal alignment, which would break the byte layout the
  shader expects.
- **`bytemuck::Pod`** — "Plain Old Data." Guarantees the struct has no padding
  holes, no destructors, and a trivial memory representation. wgpu itself only
  accepts bytes (`&[u8]`); it is `bytemuck` that requires `Pod` before it will
  reinterpret a `&[Vertex]` as bytes.
- **`bytemuck::Zeroable`** — Guarantees that initializing the struct's memory
  to all-zero bytes produces a valid instance. `Zeroable` is a supertrait of
  `Pod`, so the `Pod` derive does not compile without it. Together they let
  `bytemuck::cast_slice` convert `&[Vertex]` to `&[u8]` without an `unsafe`
  block.
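
The layout these attributes promise can be checked with the standard library alone. The sketch below repeats the `Vertex` struct without the `bytemuck` derives so that it compiles on its own; `stride`, `alignment`, `color_offset`, and `has_no_padding` are hypothetical helper names.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Bytes from one vertex to the next (the `array_stride`).
fn stride() -> usize {
    size_of::<Vertex>()
}

/// Alignment of the struct, inherited from `f32`.
fn alignment() -> usize {
    align_of::<Vertex>()
}

/// Byte offset of `color` inside a vertex, measured on a real value.
fn color_offset() -> usize {
    let vertex = Vertex { position: [0.0; 3], color: [0.0; 3] };
    let base = &vertex as *const Vertex as usize;
    let field = &vertex.color as *const [f32; 3] as usize;
    field - base
}

/// True when the struct is exactly as large as its fields (no padding).
fn has_no_padding() -> bool {
    size_of::<Vertex>() == 2 * size_of::<[f32; 3]>()
}
```

For this struct the stride is 24 bytes, `color` starts at byte 12, and there is no padding. Those are the same numbers that appear as `array_stride` and the second attribute's `offset` in `State::new()`.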

### Vertex Data

```rust
const VERTICES: &[Vertex] = &[
    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
];
```

- **Positions are in NDC:** The [normalized device coordinates](concepts/GLOSSARY.md#ndc)
  range from -1.0 (left/bottom) to +1.0 (right/top) in x and y. Our triangle
  sits in the middle of the screen: the bottom-left corner at (-0.5, -0.5), the
  bottom-right at (0.5, -0.5), and the top center at (0.0, 0.5). This
  produces an upright, centered triangle spanning half the window's width and
  height.
- **CCW winding order:** The vertices are listed counter-clockwise:
  red → blue → green. With +y pointing up, as in NDC, connecting the
  vertices in this sequence traces the triangle counter-clockwise. This
  determines which face is "front" and which is "back", which is critical for
  [culling](concepts/GLOSSARY.md) and correct normal computation: the pipeline
  uses `FrontFace::Ccw` with back-face culling, so a clockwise triangle would
  not be drawn.
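
Winding order can be verified numerically with the signed area of the triangle. This is a minimal sketch: `signed_area` and `is_ccw` are hypothetical helpers, and the test assumes a y-up coordinate system such as NDC.

```rust
/// Signed area of a 2D triangle: positive for counter-clockwise order (y up).
fn signed_area(tri: [[f32; 2]; 3]) -> f32 {
    let [a, b, c] = tri;
    0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
}

/// True when the vertices are listed counter-clockwise.
fn is_ccw(tri: [[f32; 2]; 3]) -> bool {
    signed_area(tri) > 0.0
}
```

For our vertex order, `signed_area([[-0.5, -0.5], [0.5, -0.5], [0.0, 0.5]])` is 0.5, so the triangle is front-facing. Swapping any two vertices flips the sign, and with back-face culling enabled the triangle would disappear.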

### Buffer Upload

```rust
use wgpu::util::DeviceExt;
let vertex_buffer = device.create_buffer_init(
    &wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(VERTICES),
        usage: wgpu::BufferUsages::VERTEX,
    }
);
```

- **`use wgpu::util::DeviceExt`** — imports the extension trait that adds
  `create_buffer_init` to `Device`. Without this import, the method is not
  visible.
- **`device.create_buffer_init(...)`** — combined allocate-and-upload. It
  creates a GPU buffer that is mapped at creation, copies the `contents` slice
  into the mapped range, and unmaps the buffer so the GPU can use it. This is
  a convenience wrapper around `create_buffer` with `mapped_at_creation: true`
  followed by a copy and `unmap()`.
- **`bytemuck::cast_slice(VERTICES)`** — converts `&[Vertex]` to `&[u8]`
  by reinterpreting the same memory at a byte level. The GPU receives 72 bytes:
  three vertices × 24 bytes per vertex (6 × `f32` = 6 × 4 bytes). No copy, no
  serialization, just a pointer reinterpretation.
- **`BufferUsages::VERTEX`** — declares this buffer will be bound as a vertex
  buffer in the pipeline. wgpu's validation layer will reject any attempt to
  use this buffer for staging, uniform, or storage access. Usage bits
  are chosen at creation and cannot be changed.
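
To make the 72-byte figure concrete, the following sketch builds the same byte stream by hand using only the standard library. Unlike `bytemuck::cast_slice`, this hypothetical `vertex_bytes` helper copies the data into a new `Vec<u8>`. It is meant to show what the bytes contain, not to replace the zero-copy cast.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

const VERTICES: &[Vertex] = &[
    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
    Vertex { position: [0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] },  // blue
    Vertex { position: [0.0, 0.5, 0.0], color: [0.0, 1.0, 0.0] },   // green
];

/// Illustrative helper: serializes each vertex field by field in native
/// byte order, which matches the in-memory layout of a `#[repr(C)]` struct
/// made only of `f32` values.
fn vertex_bytes(vertices: &[Vertex]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(std::mem::size_of_val(vertices));
    for vertex in vertices {
        for component in vertex.position.iter().chain(vertex.color.iter()) {
            bytes.extend_from_slice(&component.to_ne_bytes());
        }
    }
    bytes
}
```

`vertex_bytes(VERTICES)` produces 72 bytes. Bytes 0 to 3 hold the first `x` coordinate and bytes 12 to 15 hold the first red channel, which is the same offset the vertex buffer layout uses for the color attribute.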

## S6: Compiling the Render Pipeline

New concept: **the render pipeline is a compiled GPU configuration.** A
[render pipeline](concepts/GLOSSARY.md#pipeline-render) bundles every decision the GPU
needs to execute a draw: which shaders to run, how to interpret vertex buffer
bytes, what [topology](concepts/GLOSSARY.md#topology) to use, whether to cull
back faces, what blend mode to apply, and where to write the output. Pipeline
creation is not a simple struct allocation — it compiles these choices into a
GPU-executable configuration. Errors in any field are caught at creation time,
not at draw time. This validation-upfront model is what makes pipelines expensive
to create but cheap to execute.

### Vertex Buffer Layout

Before the pipeline descriptor, you must tell wgpu how to parse the byte stream
in the vertex buffer into per-vertex attributes:

```rust
let vertex_buffer_layout = wgpu::VertexBufferLayout {
    array_stride: std::mem::size_of::<Vertex>() as u64,
    step_mode: wgpu::VertexStepMode::Vertex,
    attributes: &[
        wgpu::VertexAttribute {
            offset: 0,
            format: wgpu::VertexFormat::F32x3,
            shader_location: 0,
        },
        wgpu::VertexAttribute {
            offset: std::mem::size_of::<[f32; 3]>() as u64,
            format: wgpu::VertexFormat::F32x3,
            shader_location: 1,
        },
    ],
};
```

- **`array_stride: 24`** — `size_of::<Vertex>()` = 24 bytes (6 `f32` values × 4
  bytes each). This is the byte distance from one vertex to the next in the
  buffer. The GPU uses this to step through the buffer: vertex 0 starts at
  byte 0, vertex 1 at byte 24, vertex 2 at byte 48.
- **`step_mode: Vertex`** — advance the buffer by one stride for every vertex
  the vertex shader processes. The other option is `Instance`, which advances
  per draw instance in instanced rendering. For a single triangle, `Vertex` is
  correct: each of the three vertices has its own position and color.
- **First attribute — `shader_location: 0`**: reads 3 floats (`F32x3`) at byte
  offset 0 of each vertex. These 3 floats map to the
  [shader location](concepts/GLOSSARY.md#shader-location) `@location(0)` in the
  vertex shader — the `position` parameter. The GPU delivers `[x, y, z]` to
  that function argument.
- **Second attribute — `shader_location: 1`**: reads 3 floats at offset 12
  (`size_of::<[f32; 3]>()` = 3 × 4 = 12). Skips past the position array to
  the color array inside each vertex. Maps to `@location(1)` in the shader —
  the `color` parameter. If the offset were 0 instead of 12, the shader would
  receive the position values as the color input, rendering a triangle with
  gradient colors derived from position data.

### The Complete Render Pipeline Descriptor

```rust
let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
    label: Some("Triangle Pipeline"),
    layout: None,
    vertex: wgpu::VertexState {
        module: &shader_module,
        entry_point: Some("vs_main"),
        buffers: &[vertex_buffer_layout],
        compilation_options: Default::default(),
    },
    primitive: wgpu::PrimitiveState {
        topology: wgpu::PrimitiveTopology::TriangleList,
        strip_index_format: None,
        front_face: wgpu::FrontFace::Ccw,
        cull_mode: Some(wgpu::Face::Back),
        unclipped_depth: false,
        polygon_mode: wgpu::PolygonMode::Fill,
        conservative: false,
    },
    depth_stencil: None,
    multisample: wgpu::MultisampleState {
        count: 1,
        mask: !0,
        alpha_to_coverage_enabled: false,
    },
    fragment: Some(wgpu::FragmentState {
        module: &shader_module,
        entry_point: Some("fs_main"),
        targets: &[Some(wgpu::ColorTargetState {
            format: config.format,
            blend: None,
            write_mask: wgpu::ColorWrites::ALL,
        })],
        compilation_options: Default::default(),
    }),
    multiview_mask: None,
    cache: None,
});
```

### Field-by-Field Walkthrough

**`RenderPipelineDescriptor` has 9 fields.** Every field must be present. The
structure does not use `..Default::default()` at the descriptor level — each
field is filled explicitly.

**`label: Some("Triangle Pipeline")`** — Debug string. Shown in GPU profilers
(RenderDoc, Nvidia Nsight) and wgpu validation error messages. Omitting it
produces anonymous pipelines that are much harder to trace during debugging.

**`layout: None`** — Derives the pipeline layout from the shader module
automatically. When no push constants or bind groups are used, `None` tells wgpu
to infer the layout. If you later add `@group(n)` bindings to your shader, the
usual approach is to pass an explicit `wgpu::PipelineLayout` created with
`device.create_pipeline_layout()`.

**`vertex` — [`VertexState`](concepts/GLOSSARY.md#vertex-shader) (4 fields):**
- **`module: &shader_module`** — references the compiled shader module from S4.
- **`entry_point: Some("vs_main")`** — selects which function in the module is
  the vertex shader entry point. Must match the `@vertex fn vs_main(...)`
  declaration exactly.
- **`buffers: &[vertex_buffer_layout]`** — array of vertex buffer layouts.
  Multiple layouts are used rarely (multi-mesh, GPU instancing with separate
  instance buffers). For a single vertex buffer, one layout suffices.
- **`compilation_options: Default::default()`** — per-stage compilation
  settings, such as values for pipeline-overridable constants declared in the
  shader. The default overrides nothing.

**`primitive` — [`PrimitiveState`](concepts/GLOSSARY.md#primitive) (7 fields):**
- **`topology: TriangleList`** — every 3 consecutive vertices form one
  triangle. For 3 vertices, this produces exactly 1 triangle. If we had 6
  vertices, it would produce 2 independent triangles.
- **`strip_index_format: None`** — only set for `TriangleStrip` or `LineStrip`
  topologies when using restart indices. Not applicable to `TriangleList`.
- **`front_face: Ccw`** — counter-clockwise winding defines the front face of
  a triangle. Combined with `cull_mode`, this determines which triangles are
  visible. Because our vertices are listed CCW in S5, triangles drawn in that
  order face toward the viewer.
- **`cull_mode: Some(wgpu::Face::Back)`** — discard triangles whose winding
  indicates a back face. For a single triangle viewed from the front, this is
  harmless, and it establishes correct culling for closed 3D meshes, whose
  back faces are never visible from outside.
- **`unclipped_depth: false`** — depth values outside [0.0, 1.0] are clipped
  (the standard behavior). `true` allows depth values beyond the normal range
  to pass through — used for specific depth-testing tricks.
- **`polygon_mode: Fill`** — render the full interior of the triangle. Other
  options are `Line` (wireframe edges) and `Point` (vertex points only).
- **`conservative: false`** — standard rasterization: a fragment is generated
  only for pixels whose center is covered by the triangle. `true` generates a
  fragment for every pixel the triangle touches at all, however slightly.
  Conservative rasterization is used for techniques such as shadow volumes and
  occlusion queries.
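
The interaction between `front_face` and `cull_mode` comes down to a small decision, sketched here with stand-in enums. `Winding`, `Face`, and `is_culled` are hypothetical names that mirror the wgpu settings. They are not the wgpu types themselves.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Winding {
    Ccw,
    Cw,
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum Face {
    Front,
    Back,
}

/// A triangle is front-facing when its on-screen winding matches `front_face`.
fn facing(triangle_winding: Winding, front_face: Winding) -> Face {
    if triangle_winding == front_face {
        Face::Front
    } else {
        Face::Back
    }
}

/// A triangle is discarded when its facing equals the culled face.
/// `None` disables culling entirely.
fn is_culled(triangle_winding: Winding, front_face: Winding, cull_mode: Option<Face>) -> bool {
    cull_mode == Some(facing(triangle_winding, front_face))
}
```

With the pipeline settings above, a CCW triangle survives: `is_culled(Winding::Ccw, Winding::Ccw, Some(Face::Back))` is false. Listing the same vertices clockwise would make it a back face, and it would be discarded.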

**`depth_stencil: None`** — No depth buffer or stencil buffer. Without depth
testing, triangles are drawn in submission order: later draws overwrite earlier
draws at the same pixel. For a single triangle this is not a concern.

**`multisample` — [`MultisampleState`](concepts/GLOSSARY.md#fragment) (3 fields):**
- **`count: 1`** — no multisampling. Each pixel produces one fragment. Higher
  values (2, 4, 8) activate MSAA, sampling multiple points per pixel and
  reducing aliasing at the cost of framebuffer bandwidth.
- **`mask: !0`** — all sample bits are enabled. This mask allows you to
  selectively disable individual MSAA samples (advanced use case).
- **`alpha_to_coverage_enabled: false`** — do not use the alpha channel of the
  fragment color as a coverage mask. Enabled for transparent edge antialiasing
  (e.g., font rendering).

**`fragment` — [`FragmentState`](concepts/GLOSSARY.md#fragment-shader) (4 fields):**
- **`module: &shader_module`** — same shader module as the vertex shader.
- **`entry_point: Some("fs_main")`** — selects the fragment shader entry point.
  Must match `@fragment fn fs_main(...)` in the WGSL.
- **`targets`** — array of color target states, one per render pass output
  attachment. `&[Some(...)]` means one color target present. `None` at this
  index would mean a render pass with no color output (e.g., depth-only pass).
  - **`ColorTargetState` has exactly 3 fields** (no `view_formats` field):
    - **`format: config.format`** — MUST match the surface format from
      `SurfaceConfiguration`. The pipeline writes in this format; the surface
      reads in this format. A mismatch at render time produces an error. If
      you change the surface format, you must recreate the pipeline.
    - **`blend: None`** — disables blending. Without blending, every fragment
      color replaces the existing framebuffer pixel (`REPLACE` mode). With
      blending, new and existing colors are combined according to a blend
      equation (useful for transparency).
    - **`write_mask: ColorWrites::ALL`** — write all four RGBA channels.
      You can mask out individual channels (e.g., write only R and G) if you
      need to preserve certain framebuffer channels across draw calls.
- **`compilation_options: Default::default()`** — fragment shader compilation
  flags, same as the vertex compilation options above.
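
To illustrate the difference between `blend: None` and a blend equation, here is a CPU-side sketch of two ways of combining a fragment color with the pixel already in the framebuffer. The function names are made up for this example. `alpha_over` implements standard source-over blending, which is the common choice for transparency and should correspond to the equation configured by wgpu's `BlendState::ALPHA_BLENDING` preset.

```rust
type Rgba = [f32; 4];

/// No blending: the new fragment color simply replaces the old pixel.
fn replace(src: Rgba, _dst: Rgba) -> Rgba {
    src
}

/// Source-over alpha blending:
///   out.rgb = src.rgb * src.a + dst.rgb * (1 - src.a)
///   out.a   = src.a           + dst.a   * (1 - src.a)
fn alpha_over(src: Rgba, dst: Rgba) -> Rgba {
    let a = src[3];
    [
        src[0] * a + dst[0] * (1.0 - a),
        src[1] * a + dst[1] * (1.0 - a),
        src[2] * a + dst[2] * (1.0 - a),
        a + dst[3] * (1.0 - a),
    ]
}
```

Drawing half-transparent red `[1.0, 0.0, 0.0, 0.5]` over opaque blue `[0.0, 0.0, 1.0, 1.0]` yields purple with `alpha_over`, while `replace` keeps the red fragment unchanged. Our triangle is fully opaque, so replacing is sufficient.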

**`multiview_mask: None`** — no multiview rendering. Multiview is for
stereoscopic (VR) or multi-viewport single-pass rendering. Not used here.

**`cache: None`** — no pipeline cache. A pipeline cache stores compiled shader
binaries to speed up subsequent pipeline creation. Useful when creating many
pipelines dynamically; for a single pipeline, caching has no practical benefit.

## S7: The Render Loop — Recording and Submitting Commands

New concept: **command buffers are scripts, not function calls.** You cannot call
GPU operations directly from CPU code. Instead, you record commands into a
[command buffer](concepts/GLOSSARY.md#command-buffer) — a script that the GPU
queue executes asynchronously. Think of it like building an assembly listing:
each recording method appends an instruction. When the script is complete, you
submit it atomically to the [queue](concepts/GLOSSARY.md#queue). The GPU executes
the script asynchronously with respect to the CPU: the results are as if the
commands ran in recorded order, although the hardware may overlap independent
work internally. There is no `.await` on a draw call. Submission returns to the
CPU immediately, without waiting for the GPU to finish.

> **Key insight #4 — Command buffers are scripts, not function calls:**
> `create_command_encoder()` opens a recording session. `begin_render_pass()`
> starts a scoped drawing block. `render_pass.draw()` appends a draw command.
> `encoder.finish()` seals the script. `queue.submit()` dispatches it. The GPU
> executes it later, in parallel with the CPU. There is no `.await` on a draw call.
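
The record-then-submit model can be mimicked in a few lines of plain Rust. This is a toy analogy with invented types (`ToyEncoder`, `ToyCommandBuffer`, `ToyQueue`), not wgpu's implementation. In particular, the toy queue "executes" at submit time, whereas a real GPU runs the work later.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
enum Command {
    SetPipeline(&'static str),
    Draw { vertices: Range<u32>, instances: Range<u32> },
}

/// Recording session: every method only appends to a list.
#[derive(Default)]
struct ToyEncoder {
    commands: Vec<Command>,
}

/// A sealed script that can no longer be modified.
struct ToyCommandBuffer {
    commands: Vec<Command>,
}

impl ToyEncoder {
    fn set_pipeline(&mut self, name: &'static str) {
        self.commands.push(Command::SetPipeline(name));
    }

    fn draw(&mut self, vertices: Range<u32>, instances: Range<u32>) {
        self.commands.push(Command::Draw { vertices, instances });
    }

    /// Consumes the encoder, so it cannot be reused after sealing.
    fn finish(self) -> ToyCommandBuffer {
        ToyCommandBuffer { commands: self.commands }
    }
}

#[derive(Default)]
struct ToyQueue {
    executed: Vec<Command>,
}

impl ToyQueue {
    /// Nothing runs until the whole script is handed over.
    fn submit(&mut self, buffer: ToyCommandBuffer) {
        self.executed.extend(buffer.commands);
    }
}
```

Recording `set_pipeline("triangle")` and `draw(0..3, 0..1)` leaves the queue untouched. The two commands only reach it, in recorded order, after `queue.submit(encoder.finish())`.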

### The `render(&mut self)` Method Signature

```rust
fn render(&mut self) {
    // ...
}
```

This is a **fully synchronous** method. It runs on the winit event loop thread
(triggered by `RedrawRequested`), has no `async` keyword, no `.await`, and takes
no tokio handle. wgpu recording and submission calls are synchronous and fast:
they only encode instructions and push them to the queue, and they do not wait
for GPU completion. The exception in this method is `device.poll` with
`PollType::Wait`, which blocks until previously submitted work has finished.

### Acquiring a Back Buffer from the Swapchain

```rust
let frame = self.surface.get_current_texture();
```

`get_current_texture()` is how you acquire a back buffer from the
[swapchain](concepts/GLOSSARY.md#swapchain). This is the framebuffer you render
into for this frame. In a triple-buffered swapchain (`PresentMode::Mailbox`),
there are up to two spare back buffers waiting for you. `get_current_texture()`
hands you the next available one.

In wgpu 29, this method returns `CurrentSurfaceTexture`, a standalone enum with
7 variants describing the state of the swapchain's next back buffer:

> **Key insight #5 — 7 surface texture variants you must handle:**
> `CurrentSurfaceTexture::Success(frame)` — render normally.
> `CurrentSurfaceTexture::Suboptimal(frame)` — render (buffer usable but no
> longer an exact match for the surface, e.g., after a resize).
> `CurrentSurfaceTexture::Timeout` — skip frame (GPU late).
> `CurrentSurfaceTexture::Occluded` — skip frame (window fully covered).
> `CurrentSurfaceTexture::Outdated` — surface changed, reconfigure.
> `CurrentSurfaceTexture::Lost` — surface destroyed, cannot recover without
> re-init.
> `CurrentSurfaceTexture::Validation { source, description }` — API
> validation caught an error, skip frame and log.

WHY `match` on the enum: `get_current_texture()` returns a
`CurrentSurfaceTexture` enum, not a `Result`. You match on the variant
directly. `Success` and `Suboptimal` both carry a `SurfaceTexture` you can
render into. The only difference is that `Suboptimal` signals the buffer no
longer matches the surface exactly (e.g., the window was resized) and the
surface should be reconfigured soon. The Rust compiler enforces exhaustive
matching across all 7 variants.

### The Complete `render` Implementation

```rust
fn render(&mut self) {
    let frame = match self.surface.get_current_texture() {
        wgpu::CurrentSurfaceTexture::Success(frame)
        | wgpu::CurrentSurfaceTexture::Suboptimal(frame) => frame,
        wgpu::CurrentSurfaceTexture::Timeout => {
            log::warn!("Surface timeout — skipping frame");
            return;
        }
        wgpu::CurrentSurfaceTexture::Occluded => {
            log::warn!("Surface occluded — skipping frame");
            return;
        }
        wgpu::CurrentSurfaceTexture::Outdated => {
            log::warn!("Surface outdated — resizing");
            let size = self.window.inner_size();
            self.resize(size);
            return;
        }
        wgpu::CurrentSurfaceTexture::Lost => {
            log::error!("Surface lost — GPU resources invalidated; full re-init required");
            // Production recovery: signal App to drop `self.state`,
            // then recreate on the next RedrawRequested or in a
            // dedicated recovery callback. See callout below.
            return;
        }
        wgpu::CurrentSurfaceTexture::Validation { source, description } => {
            log::error!("Surface validation error: {:?} — {}", source, description);
            return;
        }
    };

    // Block until previously submitted GPU work has finished; also runs pending callbacks and cleanup
    if let Err(e) = self.device.poll(wgpu::PollType::Wait { submission_index: None, timeout: None }) {
        log::error!("Device poll failed: {e}");
        return;
    }

    let texture_view = frame.texture.create_view(&Default::default());

    let mut encoder = self.device.create_command_encoder(
        &wgpu::CommandEncoderDescriptor {
            label: Some("Main Command Encoder"),
        },
    );

    {
        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("Main Render Pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: &texture_view,
                depth_slice: None,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color {
                        r: 0.1,
                        g: 0.1,
                        b: 0.1,
                        a: 1.0,
                    }),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            timestamp_writes: None,
            occlusion_query_set: None,
            multiview_mask: None,
        });

        render_pass.set_pipeline(&self.pipeline);
        render_pass.set_vertex_buffer(0, self.vertex_buffer.slice(..));
        render_pass.draw(0..3, 0..1);
    } // render_pass drops here — render pass ends automatically

    self.queue.submit(std::iter::once(encoder.finish()));
    frame.present();
}
```

### Step by Step

**`surface.get_current_texture()`** — Acquires the next available back buffer
from the [swapchain](concepts/GLOSSARY.md#swapchain). The swapchain cycles through
2–3 pre-allocated back buffers. This call returns immediately if a buffer is
available. Otherwise it can block until one is released or the wait times out,
which is what the `Timeout` variant reports.

> **Surface Lost recovery pattern:** `Lost` means the surface is no longer
> usable (display server restart, GPU reset, hotplug, etc.). This tutorial
> takes the conservative view that every GPU resource created alongside it
> (the `Surface`, `Device`, `Queue`, pipeline, buffers) should be treated as
> invalid, even though a lost surface does not by itself imply a lost device.
> The production pattern is to set `self.state = None` in `App`, then on the
> next `RedrawRequested` (or in a dedicated recovery callback), re-run the full
> `State::new()` initialization chain from S3. This recreates the adapter,
> device, surface, and all child resources. Without this, every subsequent
> attempt to acquire a frame from the lost surface is likely to keep failing.

**`device.poll(wgpu::PollType::Wait { submission_index: None, timeout: None })`** — **Synchronous** call that blocks
until previously submitted GPU work has finished, then runs pending callbacks
and releases resources the finished work no longer needs. Called once per
frame. Returns `Result<PollStatus, PollError>`. On `Err`, the code above logs
the failure and skips the frame.

WHY this is synchronous: `poll()` does not spawn a task or use `.await`. With
`PollType::Wait` it waits on the backend's synchronization primitives (fence
objects on Vulkan) until all in-flight work is done, then returns. On a busy
GPU this can take a few milliseconds per frame, which is normal.

**`texture.create_view(&Default::default())`** — A [texture view](concepts/GLOSSARY.md#texture-view)
is how wgpu references a texture's memory inside a render pass. The GPU does
not accept raw texture handles in render pass attachments. It requires a view
that describes the format, dimension, aspect, and the range of mip levels and
array layers to use. `Default::default()` creates a full view covering all mip
levels, all array layers, and all aspects.

**`device.create_command_encoder(&desc)`** — Opens a recording session. The
[command encoder](concepts/GLOSSARY.md#command-buffer) is where you append
instructions. Think of it as building a function body: you add statements, then
`finish()` closes the function and returns the compiled buffer.

**`encoder.begin_render_pass(&desc)`** — Starts a scoped drawing block. The
[render pass](concepts/GLOSSARY.md#render-pass) descriptor defines the target
attachments (color, depth, stencil). The returned `RenderPass` is a scoped
guard — when it drops, the render pass ends automatically.

### Render Pass Color Attachment

```rust
color_attachments: &[Some(wgpu::RenderPassColorAttachment {
    view: &texture_view,
    depth_slice: None,
    resolve_target: None,
    ops: wgpu::Operations {
        load: wgpu::LoadOp::Clear(wgpu::Color { r: 0.1, g: 0.1, b: 0.1, a: 1.0 }),
        store: wgpu::StoreOp::Store,
    },
})],
```

**`RenderPassColorAttachment` has exactly 4 fields:**

- **`view: &texture_view`** — the framebuffer we draw into. Must match the
  color target format in the [render pipeline](concepts/GLOSSARY.md#pipeline-render).
- **`depth_slice: None`** — only used for 3D texture slices. Not applicable
  to 2D rendering.
- **`resolve_target: None`** — only used for MSAA resolve. When multisampling
  is active, the render pass writes to a multisampled buffer and resolves into
  this target. We have no MSAA, so `None`.
- **`ops`** — [operations](concepts/GLOSSARY.md#operations) controlling load
  and store behavior. Two sub-fields:
  - **`load: LoadOp::Clear(color)`** — before drawing, fill the entire
    framebuffer with this color. **This IS your background color.** Dark gray.
    `LoadOp::Load` keeps existing pixels (used in UI compositing where you
    draw on top of previous content).
  - **`store: StoreOp::Store`** — after drawing, keep what was written. The
    GPU writes the result back to the texture so the swapchain can present it.
    `StoreOp::Discard` throws away the result — used for offscreen renders
    where only the depth/stencil result matters.
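
The effect of the load and store operations can be modeled with a tiny CPU framebuffer. This is a toy model with stand-in `LoadOp` and `StoreOp` enums and a hypothetical `run_pass` function. One simplification to keep in mind: after a real `StoreOp::Discard` the attachment contents are undefined, whereas this toy just leaves the old pixels in place.

```rust
type Color = [f32; 4];

enum LoadOp {
    Clear(Color),
    Load,
}

enum StoreOp {
    Store,
    Discard,
}

/// Runs one toy render pass over `target`. Each entry in `draws` writes
/// one color to one pixel index.
fn run_pass(target: &mut [Color], load: LoadOp, store: StoreOp, draws: &[(usize, Color)]) {
    // Load step: start from a cleared image or from the existing pixels.
    let mut working: Vec<Color> = match load {
        LoadOp::Clear(color) => vec![color; target.len()],
        LoadOp::Load => target.to_vec(),
    };
    // Draw step: later writes overwrite earlier ones.
    for &(pixel, color) in draws {
        working[pixel] = color;
    }
    // Store step: either keep the result or throw it away.
    match store {
        StoreOp::Store => target.copy_from_slice(&working),
        StoreOp::Discard => {}
    }
}
```

A pass with `LoadOp::Clear` replaces every pixel before drawing, which is why the clear color acts as the background. A following pass with `LoadOp::Load` draws on top of whatever the previous pass stored.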

**`depth_stencil_attachment: None`** — No depth or stencil buffer. When you
have a depth texture, it goes here.

**`timestamp_writes: None`** — GPU hardware timestamps for profiling. Not used
in production rendering; requires a query set.

**`occlusion_query_set: None`** — hardware occlusion queries (count fragments
that pass the depth test). Useful for visibility-based culling.

**`multiview_mask: None`** — multiview rendering mask for VR / multi-viewport.

### Binding State and Drawing

**`render_pass.set_pipeline(&self.pipeline)`** — Tells the GPU which
[render pipeline](concepts/GLOSSARY.md#pipeline-render) to use for subsequent
draw calls. The pipeline encapsulates the shader programs, vertex format,
primitive topology, and output configuration. Must be set before any draw call
in a render pass. Switching pipelines mid-pass is expensive and should be
minimized.

WHY this is necessary: bound state does not carry over between render passes.
Every render pass starts with no pipeline bound, so you must set it in every
pass, which here means every frame.

**`render_pass.set_vertex_buffer(0, self.vertex_buffer.slice(..))`** — Binds the
[vertex buffer](concepts/GLOSSARY.md#vertex-buffer) to slot 0.
`buffer.slice(..)` creates a [buffer slice](concepts/GLOSSARY.md#buffer-slice)
covering the full buffer (equivalent to `buffer.slice(0..)`). Slot 0 corresponds
to the first layout in the pipeline's vertex buffer layouts array. If you had
multiple vertex buffers (e.g., separate position and instance buffers), you'd
bind them to slots 0, 1, etc.

**`render_pass.draw(0..3, 0..1)`** — The draw command. Two `Range<u32>`
arguments:
- First range `0..3` — vertex range. Draw vertices 0, 1, 2 (three vertices
  forming one triangle).
- Second range `0..1` — instance range. Draw instance 0 (one instance).

WHY two ranges: the vertex range controls which vertices from the buffer are
read. The instance range controls instanced rendering — the same geometry drawn
multiple times with different instance-data attributes. For a single triangle,
one draw call with `0..1` instances is correct.
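
As a rough mental model, the two ranges behave like nested loops over instance and vertex indices. The sketch below lists the index pairs a draw call would hand to the vertex shader. `vertex_invocations` is a hypothetical helper, and a real GPU processes these invocations in parallel rather than in loop order.

```rust
use std::ops::Range;

/// Lists the (vertex_index, instance_index) pairs covered by a draw call.
fn vertex_invocations(vertices: Range<u32>, instances: Range<u32>) -> Vec<(u32, u32)> {
    let mut invocations = Vec::new();
    for instance_index in instances {
        for vertex_index in vertices.clone() {
            invocations.push((vertex_index, instance_index));
        }
    }
    invocations
}
```

`vertex_invocations(0..3, 0..1)` yields three pairs, one per triangle corner. Changing the instance range to `0..2` doubles that to six, with the same three vertices processed once per instance.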

**Render pass scope drop** — When the `render_pass` variable goes out of scope
(the closing `}` in the block), the drop implementation ends the render pass.
If you forgot to set the pipeline or bind a required buffer, wgpu does not
report it as a return value of `draw()`. The validation error surfaces later,
when the pass ends or the encoder is finished.

**`encoder.finish()`** — Seals the command encoder. Returns the finished
[command buffer](concepts/GLOSSARY.md#command-buffer) ready for submission.
After `finish()`, the encoder cannot be used again.

**`queue.submit(iter)`** — Dispatches one or more command buffers to the GPU.
Takes an iterator of command buffers. We submit exactly one: the frame's command
buffer. This is a fire-and-forget call — it queues the work and returns
immediately. The GPU executes it asynchronously, in parallel with your next
frame's CPU work.

**`frame.present()`** — Queues the rendered back buffer for display.
This tells the swapchain: "this buffer is done, show it on screen." **If you
forget this, you render to a buffer nobody sees.** The swapchain cycles the
buffer from "render target" to "front buffer" on the next vsync.

### Why the Match Arms Differ

- **`CurrentSurfaceTexture::Success(frame)` / `Suboptimal(frame)`** — the
  swapchain delivered a `SurfaceTexture` you can render into. `Success` means
  the buffer is ideal. `Suboptimal` means the buffer is usable but no longer
  matches the surface exactly (e.g., the window was resized or rotated), so
  the surface should be reconfigured soon. Both carry the same
  `SurfaceTexture`. Extract `frame.texture` to create a view, render, then
  call `frame.present()`.
- **`CurrentSurfaceTexture::Timeout`** — no back buffer became available
  within the wait threshold. Skip the frame. The GPU will catch up.
- **`CurrentSurfaceTexture::Occluded`** — the window is fully covered by
  another window. Skip the frame; there's no point rendering to an invisible
  surface.
- **`CurrentSurfaceTexture::Outdated`** — the swapchain was created for a
  resolution that no longer matches the window. Reconfigure the surface
  using `self.window.inner_size()` to match the current dimensions.
- **`CurrentSurfaceTexture::Lost`** — the surface can no longer be used,
  typically because the GPU or display server has been reset. Recovery
  requires re-creating the surface, and this tutorial re-creates the device as
  well. In a real application, you'd trigger a full re-initialization.
- **`CurrentSurfaceTexture::Validation { source, description }`** — the wgpu
  validation layer caught an API misuse. Log the diagnostic and skip the
  frame.
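
The arms above reduce to a small decision table. The sketch below captures it with stand-in enums so it can be read and tested without a GPU. `SurfaceStatus` mirrors the seven variant names from this section without their payloads, and `FrameAction` and `action_for` are invented for this illustration.

```rust
/// Payload-free mirror of the seven variants discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SurfaceStatus {
    Success,
    Suboptimal,
    Timeout,
    Occluded,
    Outdated,
    Lost,
    Validation,
}

/// What the render method does in response.
#[derive(Debug, PartialEq)]
enum FrameAction {
    Render,
    SkipFrame,
    Reconfigure,
    Reinitialize,
}

fn action_for(status: SurfaceStatus) -> FrameAction {
    match status {
        SurfaceStatus::Success | SurfaceStatus::Suboptimal => FrameAction::Render,
        SurfaceStatus::Timeout | SurfaceStatus::Occluded | SurfaceStatus::Validation => {
            FrameAction::SkipFrame
        }
        SurfaceStatus::Outdated => FrameAction::Reconfigure,
        SurfaceStatus::Lost => FrameAction::Reinitialize,
    }
}
```

Because the `match` has no wildcard arm, adding a variant to `SurfaceStatus` would stop the code from compiling until the new case is handled, which is the same exhaustiveness check the real `render` method benefits from.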

## S8: Handling Window Resize

WHY `surface.configure()` on resize: The swapchain allocates back buffers at a
fixed dimension. When the window size changes, the old back buffers no longer
match the window's display surface. What happens to a mismatched-size buffer is
platform-dependent: the display server may clip it, stretch it, or reject it.
`surface.configure()` allocates new back buffers matching the new dimensions and
discards the old ones.

WHY `width.max(1)`: On some display servers, minimizing a window briefly
reports `0 × 0` size before restoring. A zero-dimension surface allocation
panics. The `if` guard in `resize` already skips such events, and clamping to 1
is a second safeguard that keeps the dimensions valid even if the guard is
later removed.

WHY `std::mem::take(&mut self.config.view_formats)`: The `view_formats` field
of `SurfaceConfiguration` is an owned `Vec<TextureFormat>`. When constructing
the new configuration, you move the vector out of the old config rather than
cloning it. `mem::take` replaces the field with `Vec::new()` (zero allocation)
and returns the original vector. This avoids a heap allocation for what is
typically a 1-element vec.
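
The move performed by `mem::take` can be observed in isolation. The sketch below uses a hypothetical `ToyConfig` struct in place of `SurfaceConfiguration`, with strings standing in for texture formats, and an invented `resized` function that combines the clamp and the move.

```rust
/// Stand-in for `SurfaceConfiguration` with only the fields needed here.
struct ToyConfig {
    width: u32,
    height: u32,
    view_formats: Vec<&'static str>,
}

/// Builds a new config for the given size, moving `view_formats` out of
/// the old config instead of cloning it.
fn resized(old: &mut ToyConfig, width: u32, height: u32) -> ToyConfig {
    ToyConfig {
        width: width.max(1),
        height: height.max(1),
        // Leaves an empty Vec behind in `old` and returns the original Vec.
        view_formats: std::mem::take(&mut old.view_formats),
    }
}
```

After `resized(&mut old, 0, 720)`, the old config's `view_formats` is empty, the new config owns the very same heap buffer (its `as_ptr()` is unchanged), and the zero width has been clamped to 1.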

```rust
fn resize(&mut self, size: winit::dpi::PhysicalSize<u32>) {
    if size.width > 0 && size.height > 0 {
        let config = wgpu::SurfaceConfiguration {
            usage: self.config.usage,
            format: self.config.format,
            width: size.width.max(1),
            height: size.height.max(1),
            present_mode: self.config.present_mode,
            desired_maximum_frame_latency: self.config.desired_maximum_frame_latency,
            alpha_mode: self.config.alpha_mode,
            view_formats: std::mem::take(&mut self.config.view_formats),
        };
        self.surface.configure(&self.device, &config);
        self.config = config;
    }
}
```

FIELD BY FIELD:

**`usage` / `format` / `present_mode` / `alpha_mode`** — carried over from the
old config unchanged. These properties are negotiated once at init time
and do not change on resize.

**`width` / `height`** — the new dimensions, clamped to at least 1.

**`desired_maximum_frame_latency`** — swapchain back-pressure setting. Kept from
the old config. This value controls how many frames the swapchain buffers
between CPU submission and GPU presentation. A value of 2 (triple buffering)
provides smooth frame pacing under variable CPU/GPU load. See S3 init step 5.

**`view_formats`** — additional texture formats the surface can create views
with. `std::mem::take()` moves the owned vector from the old config into the
new config. After `take()`, the old config's `view_formats` is an empty `Vec`.
This avoids a `clone()` of the vector. Since the old config is about to be
overwritten by `self.config = config`, the emptied field is irrelevant.

**`surface.configure(&self.device, &config)`** — takes a reference to the
`Device` and the new `SurfaceConfiguration`. This is not async. It allocates the
new swapchain buffers and replaces the old ones. Any in-flight renders using
old buffers complete normally; the new buffers are available after this call
returns.

### When `resize` Is Called

In our `App::window_event` handler (S2), the `WindowEvent::Resized(size)` arm
calls `state.resize(size)`. Since `State` owns an `Arc<Window>` (see S3),
`resize()` has access to the window internally and needs only the new
dimension. The resize fires once for every dimension change. On fast window
resizing, you may receive dozens of resize events in succession.
`surface.configure()` is fast enough to handle this: each call discards old
buffers and allocates new ones. The GPU continues processing in-flight frames
with the old buffer dimensions, so the transition normally shows no visual
glitch.

## S9: Where All the Code Goes

The full source consists of the code blocks in sections S2–S8, assembled in
order into `src/main.rs` and `src/shader.wgsl`.

### File Structure

```
src/
├── main.rs          # Sections S2, S3, S5 (structs), S6 (pipeline), S7 (render), S8 (resize)
└── shader.wgsl      # Section S4 (the complete WGSL shader)
```

- `main.rs` combines the winit event loop (S2), the init chain and `State`
  struct (S3), the `Vertex` type and `VERTICES` constant (S5), the vertex
  buffer layout and render pipeline (S6), the `render` method (S7), and the
  `resize` method (S8).
- `shader.wgsl` is the single file from S4: vertex shader, fragment shader,
  and the `VertexOutput` struct.

Refer to [concepts/GLOSSARY.md](concepts/GLOSSARY.md) for term definitions used
throughout these sections. See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for
common issues and their fixes.

## S10: Running It

Run the project:

```bash
cargo run
```

**Expected console output:** wgpu adapter info (GPU model, driver name), shader
module compilation log, pipeline creation messages, and the `simple_logger`
debug lines from surface status and device polling.

**Expected visual:** A dark gray background (from `LoadOp::Clear`) with a
rainbow triangle centered in the window, spanning half its width and half its
height. Red at the bottom-left corner, blue at the bottom-right corner, green
at the top vertex. Colors blend smoothly across the triangle surface via
hardware interpolation.

**Expected CPU usage:** High, up to 100% of one core, because
`ControlFlow::Poll` drives a continuous redraw loop. This is normal for a demo
that redraws continuously.
See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for common issues.

## S11: What You've Learned and What's Next

### Summary

You have built a complete GPU-rendered application from scratch. Here is what
each piece does:

- **The 5-layer wgpu [init chain](concepts/GLOSSARY.md#instance):**
  [Instance](concepts/GLOSSARY.md#instance) →
  [Surface](concepts/GLOSSARY.md#surface) →
  [Adapter](concepts/GLOSSARY.md#adapter) →
  [Device](concepts/GLOSSARY.md#device) + [Queue](concepts/GLOSSARY.md#queue) →
  [SurfaceConfiguration](concepts/GLOSSARY.md#surface-configuration).
  Each layer adds a capability: driver connection, window binding, GPU selection,
  resource management, and swapchain allocation.

- **The [render pipeline](concepts/GLOSSARY.md#pipeline-render):** Shaders,
  topology, and vertex layout compiled into a GPU configuration. Created once,
  reused every frame. Expensive to create, cheap to execute.

- **The [command buffer](concepts/GLOSSARY.md#command-buffer) model:** Record
  instructions on the CPU, submit them atomically to the queue, and the GPU
  executes them asynchronously. A draw call only records work, so there is
  nothing to `.await` on it.

- **The [swapchain](concepts/GLOSSARY.md#swapchain) and
  [framebuffer](concepts/GLOSSARY.md#framebuffer):** Double-buffered rendering
  through [PresentMode::Mailbox](concepts/GLOSSARY.md#present-mode). Acquire a
  back buffer, render into it, present it to the display.

- **GPU [interpolation](concepts/GLOSSARY.md#interpolation):** Vertex attributes
  automatically blended across triangle surfaces. You supply values at three
  points; the rasterizer computes every value in between.
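
To make the interpolation step concrete, here is a minimal CPU-side sketch of the blending the rasterizer performs for each covered pixel. It is not how wgpu or the hardware implements it. The `Vertex` struct, the `edge` and `interpolate` helpers, and the vertex positions are assumptions made for illustration; only the corner colors (red bottom-left, blue bottom-right, green top) come from this tutorial. For a flat 2D triangle like ours, the weights are plain barycentric coordinates.

```rust
/// Hypothetical vertex type for this sketch: a 2D position plus an RGB color.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    position: [f32; 2], // x, y in normalized device coordinates
    color: [f32; 3],    // red, green, blue
}

/// Twice the signed area of the triangle (a, b, p).
fn edge(a: [f32; 2], b: [f32; 2], p: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
}

/// Blends the three vertex colors at point `p` using barycentric weights.
/// Returns `None` when `p` lies outside the triangle or the triangle is degenerate.
fn interpolate(tri: &[Vertex; 3], p: [f32; 2]) -> Option<[f32; 3]> {
    let [a, b, c] = *tri;
    let area = edge(a.position, b.position, c.position);
    if area == 0.0 {
        return None;
    }
    // Each weight is the share of the total area opposite that vertex.
    let w0 = edge(b.position, c.position, p) / area; // weight of vertex a
    let w1 = edge(c.position, a.position, p) / area; // weight of vertex b
    let w2 = edge(a.position, b.position, p) / area; // weight of vertex c
    if w0 < 0.0 || w1 < 0.0 || w2 < 0.0 {
        return None;
    }
    let mut color = [0.0_f32; 3];
    for channel in 0..3 {
        color[channel] =
            w0 * a.color[channel] + w1 * b.color[channel] + w2 * c.color[channel];
    }
    Some(color)
}

/// Assumed positions; the colors match the triangle described in this tutorial.
fn demo_triangle() -> [Vertex; 3] {
    [
        Vertex { position: [-0.5, -0.5], color: [1.0, 0.0, 0.0] }, // bottom-left, red
        Vertex { position: [0.5, -0.5], color: [0.0, 0.0, 1.0] },  // bottom-right, blue
        Vertex { position: [0.0, 0.5], color: [0.0, 1.0, 0.0] },   // top, green
    ]
}

fn main() {
    let tri = demo_triangle();
    // Midpoint of the bottom edge: an even mix of red and blue.
    println!("{:?}", interpolate(&tri, [0.0, -0.5]));
    // A point outside the triangle is not shaded at all.
    println!("{:?}", interpolate(&tri, [0.9, 0.9]));
}
```

The weights always sum to 1 inside the triangle, which is why a point at a corner takes exactly that corner's color and points in between take a proportional mix.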

### What's Next

With the render loop and pipeline foundation in place, the next steps are:

- **Textures and bind groups** — loading images onto the GPU and sampling them
  in fragment shaders
- **Uniforms and 3D transforms** — projection, view, and model matrices for
  positioning geometry in 3D space
- **Lighting and material models** — diffuse, specular, and PBR shading
- **Depth buffering and z-fighting** — per-pixel depth testing for correct
  overlap ordering
- **Compute shaders and GPU compute pipelines** — general-purpose GPU
  computation outside the graphics pipeline

> **Prerequisite note — matrix math:** Every topic above ultimately depends on
> matrix mathematics. Transforms (model, view, and projection matrices) move
> geometry from local object space through world space, camera space, and
> finally into clip space. In this tutorial, all vertex positions are hardcoded
> in normalized device coordinates (NDC) so we can focus on the rendering
> pipeline itself. Real applications compute these positions via matrix
> multiplication: a
> transformation matrix is uploaded to the GPU as a uniform, and the vertex
> shader multiplies each vertex by that matrix before outputting
> `clip_position`. If linear algebra is unfamiliar, study it before diving
> into the next tutorials. Recommended resources: [Learn
> OpenGL's Coordinate Systems chapter](https://learnopengl.com/Getting-started/Coordinate-Systems)
> for a graphics-oriented treatment, and
> [3Blue1Brown's Essence of Linear Algebra](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
> for an intuitive visual foundation.
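
As a rough preview of the matrix step described in the note, the sketch below multiplies a vertex by a 4x4 transform on the CPU, which is the same arithmetic a vertex shader would perform before writing `clip_position`. The `Mat4` and `Vec4` aliases and the `translation`, `scale`, `mul_mat4`, and `mul_mat4_vec4` helpers are hypothetical names for this example, not part of wgpu. The matrices are stored row-major here for readability; WGSL's `mat4x4<f32>` is column-major, so real code has to match that layout when uploading a uniform, and most projects use a math crate rather than hand-written helpers.

```rust
type Vec4 = [f32; 4];
type Mat4 = [[f32; 4]; 4]; // row-major in this sketch: m[row][col]

/// Matrix that moves a point by (tx, ty, tz).
fn translation(tx: f32, ty: f32, tz: f32) -> Mat4 {
    [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

/// Matrix that scales a point uniformly by `s`.
fn scale(s: f32) -> Mat4 {
    [
        [s, 0.0, 0.0, 0.0],
        [0.0, s, 0.0, 0.0],
        [0.0, 0.0, s, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

/// Matrix product `a * b`: applying the result is the same as applying `b`, then `a`.
fn mul_mat4(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0_f32; 4]; 4];
    for row in 0..4 {
        for col in 0..4 {
            for k in 0..4 {
                out[row][col] += a[row][k] * b[k][col];
            }
        }
    }
    out
}

/// Matrix-vector product `m * v`, treating `v` as a column vector.
fn mul_mat4_vec4(m: &Mat4, v: Vec4) -> Vec4 {
    let mut out = [0.0_f32; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[row][col] * v[col];
        }
    }
    out
}

fn main() {
    // Shrink the geometry to half size, then shift it right by 0.25.
    let model = mul_mat4(&translation(0.25, 0.0, 0.0), &scale(0.5));
    // A vertex at the top of a triangle, with w = 1 marking it as a position.
    let local_position: Vec4 = [0.0, 0.5, 0.0, 1.0];
    let clip_position = mul_mat4_vec4(&model, local_position);
    println!("{:?}", clip_position);
}
```

The order of multiplication matters: `translation * scale` scales first and then moves, while the reverse order would also scale the offset.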

Keep [concepts/GLOSSARY.md](concepts/GLOSSARY.md) handy as you move forward.