commit dbe6bdee9a094f54e66172cc7717f46d56f9f103
Author: Krishna Ayyalasomayajula <krishna@ayyalasomayajula.net>
Date:   Sat May 30 17:40:28 2026 -0500

    docs: add concept reference files (graphics-pipeline, coordinate-systems, shader-basics, GLOSSARY)

diff --git a/docs/concepts/GLOSSARY.md b/docs/concepts/GLOSSARY.md
new file mode 100644
index 0000000..e21c414
--- /dev/null
+++ b/docs/concepts/GLOSSARY.md
@@ -0,0 +1,151 @@
+# Glossary
+
+Click any term to jump to its definition. These terms are referenced across all concept files.
+
+## Adapter
+
+The physical GPU or software renderer wgpu communicates with. A single system may expose multiple adapters: a dedicated NVIDIA GPU, an integrated Intel GPU, and a software fallback (llvmpipe / SwiftShader). You select one adapter and build all resources from it. In wgpu, adapters are listed via `Instance::enumerate_adapters()` or chosen for you by `Instance::request_adapter()`. Picking the wrong adapter can mean running on the integrated GPU when the discrete GPU is available.
+
+## Barycentric coordinates
+
+Three weights (w0, w1, w2) that sum to 1, computed by the rasterizer for every fragment inside a triangle. Each weight represents the fragment's proximity to one of the three vertices. These weights drive the hardware interpolation of vertex attributes: `value = w0*value0 + w1*value1 + w2*value2`. At vertex 0, the weights are (1,0,0). At the triangle centroid, they are (1/3, 1/3, 1/3).
+
+## Buffer slice
+
+A view into GPU buffer memory defined by an offset and a length. `buffer.slice(..)` covers the full buffer. Buffer slices are used when mapping buffers for CPU read/write access and when binding vertex or index buffers in a render pass. They do not own the underlying memory; they are a window into an existing buffer.
+
+## Clip space
+
+The coordinate space, expressed in [[homogeneous coordinates]](#homogeneous-coordinates), that the [[vertex shader]](#vertex-shader) outputs into (`vec4<f32>`). The GPU clips geometry against the clip volume (`-w ≤ x ≤ w`, `-w ≤ y ≤ w`, `0 ≤ z ≤ w`) before performing perspective division (dividing x, y, z by w) to produce [[ndc]](#ndc). Under a perspective projection the visible region corresponds to a pyramid-shaped frustum in front of the camera. Under an orthographic projection it is a box. Geometry outside these boundaries is discarded by hardware.
+
+## Command buffer
+
+A recorded sequence of GPU commands (buffer copies, render passes, compute dispatches), analogous to a bash script listing operations to execute. You create a `CommandEncoder` via `device.create_command_encoder()`, record operations into it, call `finish()` to obtain the command buffer, then submit it to the [[queue]](#queue). The GPU executes the recorded sequence asynchronously. One submission is one unit of GPU work.
+
+## Device
+
+The logical connection to the GPU. Created from an [[adapter]](#adapter), the device owns all GPU resources: buffers, textures, [[pipeline]](#pipeline) objects, shader modules, and bind groups. It is analogous to a file descriptor — the handle through which you allocate and manage GPU memory. All resource creation and destruction flows through the device.
+
+## Device poll
+
+`device.poll(PollType::Wait)` is a synchronous call that blocks the CPU thread until previously submitted GPU work has finished. Polling is also how wgpu notices completed work on native platforms: it fires pending callbacks (for example buffer-mapping callbacks) and cleans up resources that are no longer in use. Submitted work runs on the GPU whether or not you poll, but those callbacks do not fire without it. See [[polltype]](#polltype) for the difference between `Wait` and `Poll`. The exact shape of `PollType` and the return type of `poll` vary between wgpu releases.
+
+## Fragment
+
+A potential pixel produced by the [[rasterizer]](#rasterizer). Without multisampling, one fragment is generated per screen pixel that a [[primitive]](#primitive) covers. A fragment carries interpolated [[vertex]](#vertex) shader outputs and a depth value; the [[fragment shader]](#fragment-shader) then computes its color. The fragment may later be discarded by depth or stencil testing during the [[output merge]](#output-merge) stage. Not every fragment becomes a visible pixel.
+
+## Fragment shader
+
+GPU program running once per [[fragment]](#fragment). It receives pre-interpolated vertex shader outputs from the rasterizer and computes the final RGBA color for that fragment. This is where texture sampling, lighting calculations, and pixel-level effects happen. For the rainbow triangle, the fragment shader passes the interpolated vertex color through unchanged.
+
+## Framebuffer
+
+The color buffer that appears on screen. During [[swapchain]](#swapchain) double-buffering, the framebuffer being drawn to is the back buffer. Once you submit the command buffer and call `present()` on the surface texture, it becomes the front buffer and is displayed. In wgpu there is no framebuffer object as such: the render target is a [[texture view]](#texture-view) created from the current surface texture.
+
+## Homogeneous coordinates
+
+A four-component representation (x, y, z, w) that enables perspective projection via the divide-by-w step. When w=1, the coordinates represent a point in 3D space. When w=0, they represent a direction vector. Perspective division (x/w, y/w, z/w) transforms clip-space coordinates into [[ndc]](#ndc). With w=1.0, division is the identity transform.
+
+## Interpolation
+
+The rasterizer's automatic blending of vertex shader outputs across the surface of a triangle. For every `@location(n)` value output by the vertex shader, the [[rasterizer]](#rasterizer) computes a linear blend using [[barycentric coordinates]](#barycentric-coordinates): `value = w0*v0 + w1*v1 + w2*v2`. This is a free, hardware-accelerated feature. No shader code is required to perform interpolation.
+
+## Instance
+
+The root wgpu object representing the connection to the system's graphics drivers. Created via `Instance::new()`, the instance discovers available [[adapter]](#adapter)s and manages [[surface]](#surface) creation. It is the first object created in the wgpu initialization chain.
+
+## LoadOp
+
+Controls what happens to the [[framebuffer]](#framebuffer) at the start of a render pass. `LoadOp::Clear(color)` fills the entire framebuffer with a solid color — this produces your scene background. `LoadOp::Load` keeps whatever pixels are already in the framebuffer — used for multi-pass rendering where the second pass draws on top of the first.
+
+## Output merge
+
+The final GPU pipeline stage. It applies per-fragment tests (depth, stencil) and blending operations before writing pixels to the [[framebuffer]](#framebuffer). The blend state (configured in the [[pipeline]](#pipeline)) determines how new colors combine with existing framebuffer colors: replace them, add to them, or blend by alpha. For the rainbow triangle, the blend state is `BlendState::REPLACE`, so new pixels overwrite old ones.
+
+## Pipeline
+
+A render pipeline is a compiled GPU configuration bundling: [[vertex shader]](#vertex-shader) + [[fragment shader]](#fragment-shader) + [[topology]](#topology) + blend state + depth/stencil state + vertex buffer layout. Created once via `device.create_render_pipeline()` and reused for every frame. Changing any of these parameters requires creating a new pipeline. Pipeline creation is expensive; do not create one per frame.
+
+## PollType
+
+The strategy passed to `device.poll()`. `PollType::Wait` blocks the calling thread until all pending GPU work finishes — equivalent to a fence wait. `PollType::Poll` checks for completed work once and returns immediately, regardless of whether work is done. For the rainbow triangle, `Wait` is correct: we need the GPU to finish the frame before requesting the next surface texture.
+
+## NDC
+
+Normalized Device Coordinates. The GPU's native intermediate coordinate space. X and Y range from -1.0 (left/bottom) to +1.0 (right/top). Z ranges from 0.0 (near clipping plane) to 1.0 (far clipping plane). Perspective division maps clip-space positions into NDC; geometry that would fall outside this volume has already been clipped away in [[clip space]](#clip-space). See [[coordinate-systems.md]](coordinate-systems.md).
+
+## Operations
+
+Paired `LoadOp` + `StoreOp` controlling [[framebuffer]](#framebuffer) behavior at [[render pass]](#render-pass) boundaries. `LoadOp` defines the pre-draw state (clear or load). `StoreOp` defines the post-draw state (store or discard). Together they form `Operations { load, store }` passed to `RenderPassColorAttachment`.
+
+## Primitive
+
+A geometric shape the GPU can render: point list, line list, line strip, triangle list, or triangle strip. Triangles are the universal primitive — every 3D surface is built from triangles. In wgpu, the primitive type is set on the pipeline descriptor. The rainbow triangle uses `PrimitiveTopology::TriangleList`, meaning every group of 3 consecutive vertices forms one triangle.
+
+## Queue
+
+The submission channel to the GPU. You push [[command buffer]](#command-buffer)s into the queue via `queue.submit()`. The queue executes them asynchronously on the GPU. The queue also handles buffer uploads via `queue.write_buffer()`: the call copies your data into staging memory and returns immediately, and the actual write is scheduled to happen before the commands of the next `queue.submit()`.
+
+## Rasterizer
+
+Hardware stage that converts [[primitive]](#primitive) geometry into [[fragment]](#fragment)s. For each triangle, it determines which screen pixels it covers, generates one fragment per covered pixel, and computes interpolated vertex attributes using [[barycentric coordinates]](#barycentric-coordinates). The rasterizer is a fixed-function unit: no user code runs here. You configure its behavior (culling, front-face winding, polygon mode) via the pipeline descriptor; the scissor rectangle is set per render pass with `set_scissor_rect()`.
+
+## Render pass
+
+A scoped section of a [[command buffer]](#command-buffer) that groups draw operations sharing the same target [[framebuffer]](#framebuffer) attachments. Entered via `command_encoder.begin_render_pass()` and ended by dropping the `RenderPass` variable. Between begin and end, you set the pipeline, bind vertex buffers, and issue draw calls. Everything drawn in one render pass targets the same framebuffer with the same [[operations]](#operations).
+
+## Shader
+
+GPU program written in [[wgsl]](#wgsl). No heap allocation, no recursion, no I/O. Output flows only through entry-point return values and writes to bound storage buffers or textures. A shader module may contain multiple entry points (`@vertex`, `@fragment`, `@compute`). The GPU runs thousands of shader invocations in parallel, each operating on different data but executing the identical program.
+
+## Shader location
+
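+A numeric binding label (`@location(n)`) used to tie Rust vertex buffer attributes to WGSL shader parameters. On the Rust side: `VertexAttribute { shader_location: 0, ... }`. On the WGSL side: `@location(0) my_value: vec3<f32>`. These numbers must match exactly. wgpu rejects a pipeline whose vertex shader reads a location that no attribute supplies, but it cannot detect two attributes of the same type with swapped locations or a wrong `offset`. Those mistakes produce silently wrong data, because the GPU reads from the wrong memory offset.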
+
+## StoreOp
+
+Controls what happens to the [[framebuffer]](#framebuffer) at the end of a render pass. `StoreOp::Store` keeps the written pixels, which is what you want for visible frames. `StoreOp::Discard` tells the GPU it may throw the contents away. It suits transient attachments, such as a depth buffer that is not read after the pass, and can save memory bandwidth, particularly on tile-based GPUs.
+
+## Surface
+
+wgpu's connection to a window's display buffer. Created via `instance.create_surface(window)`, the surface is like a bound socket — it is tied to a specific window and cannot be unlinked. The surface manages the [[swapchain]](#swapchain) and provides new framebuffers via `surface.get_current_texture()`. If the window is resized, the surface must be reconfigured with a new `SurfaceConfiguration`.
+
+## Swapchain
+
+A ring buffer of 2-3 [[framebuffer]](#framebuffer) textures managed by the GPU driver. The display hardware reads from the front buffer. The application renders to the back buffer. When the frame is complete, the buffers swap: the back buffer becomes the front (displayed), and the old front becomes the available back buffer for the next frame. This prevents screen tearing by ensuring the display never reads a frame mid-update.
+
+## Texture view
+
+A handle referencing a region of [[texture]](#texture) memory for use inside a [[render pass]](#render-pass) or bind group. Created via `texture.create_view()`, texture views define the mip level range, aspect, and dimensionality (2D, cube, array) of the binding. Surface framebuffers are accessed as texture views inside render passes.
+
+## Texture
+
+GPU memory region storing image data such as color or depth. Used both as render targets (framebuffers) and as sampled images read in shaders. In wgpu, a texture is created from the [[device]](#device) with a defined size, format, and usage flags. You never read texture memory directly from the CPU. Shaders access it through [[texture view]](#texture-view) bindings, and the CPU gets data back only by copying the texture into a mappable buffer.
+
+## Topology
+
+The rule for grouping vertices into [[primitive]](#primitive) shapes. `TriangleList` means every 3 consecutive vertices form one independent triangle. `TriangleStrip` means each new vertex combined with the previous two forms a triangle. `PointList` renders individual points. `LineList` means every 2 consecutive vertices form one independent line segment. Topology is set once on the [[pipeline]](#pipeline) descriptor.
+
+## Vertex
+
+A data point containing one or more attributes: position, color, UV coordinates, normals, tangents. All attributes for one vertex are stored contiguously in a [[vertex buffer]](#vertex-buffer). The stride (total bytes per vertex) is determined by the sum of all attribute sizes. In the rainbow triangle, each vertex has three `f32` position components and three `f32` color components: 24 bytes per vertex.
+
+## Vertex buffer
+
+GPU [[buffer slice]](#buffer-slice) containing [[vertex]](#vertex) attribute data in a tightly packed layout. Created via `device.create_buffer()` and populated via `queue.write_buffer()`. The pipeline's vertex state describes how to interpret the buffer: stride, attribute count, and per-attribute format + [[shader location]](#shader-location) mapping.
+
+## Vertex shader
+
+GPU program running once per [[vertex]](#vertex). It reads vertex attributes from the [[vertex buffer]](#vertex-buffer), transforms the position into [[clip space]](#clip-space), and outputs any per-vertex data the downstream pipeline stages need. The mandatory output is `@builtin(position) vec4<f32>`. Optional outputs use `@location(n)` annotations and flow into the rasterizer for interpolation.
+
+## Viewport transform
+
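+Automatic GPU step mapping [[ndc]](#ndc) coordinates (-1..+1) to [[window]](#window) pixel coordinates. By default the viewport covers the whole render target, whose size comes from the `SurfaceConfiguration` `width` and `height` fields; `render_pass.set_viewport()` overrides it. The GPU performs: `screen_x = (ndc_x + 1) / 2 * width; screen_y = (1 - ndc_y) / 2 * height`. The Y term is flipped because NDC Y points up while pixel rows count downward from the top of the window. This step happens after perspective division, between NDC and the rasterizer. You never write this math in shader code.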
+
+## Window
+
+The operating system window created by the windowing library. In wgpu, the window is passed to `instance.create_surface()` to bind the GPU to a display target. The window dimensions dictate the [[viewport transform]](#viewport-transform) and thus the size of the rendered image. Resizing the window requires creating a new `SurfaceConfiguration` with updated dimensions.
+
+## WGSL
+
+WebGPU Shading Language. The standardized shader language for WebGPU. Statically typed, no heap, no recursion, no I/O. wgpu translates it into what each backend consumes: SPIR-V (Vulkan), MSL (Metal), HLSL (Direct3D 12, where the DirectX shader compiler then lowers it to DXIL or DXBC). You write one WGSL module; wgpu translates it for the target backend. Syntax is similar to GLSL but stricter: all variables must be declared, all branches must terminate, and all entry points must be annotated.
diff --git a/docs/concepts/coordinate-systems.md b/docs/concepts/coordinate-systems.md
new file mode 100644
index 0000000..be3d91c
--- /dev/null
+++ b/docs/concepts/coordinate-systems.md
@@ -0,0 +1,93 @@
+# Coordinate Systems
+
+## The Problem
+
+Your window is a grid of pixels: 800×600 in our configuration. The 3D scene you want to render spans from -∞ to +∞ in every direction. The GPU cannot reason in window pixels because every window has a different size. It cannot reason in world space because that is application-defined. The GPU needs a standard intermediate coordinate space.
+
+That space is [[ndc]](GLOSSARY.md#ndc), Normalized Device Coordinates.
+
+## NDC Definition
+
+NDC is a fixed, standardized cube:
+
+| Axis | Min | Max | Meaning |
+|------|-----|-----|---------|
+| X | -1.0 | +1.0 | Left to right |
+| Y | -1.0 | +1.0 | Bottom to top |
+| Z | 0.0 | 1.0 | Near to far |
+
+Geometry inside this volume, between the near plane (Z = 0) and the far plane (Z = 1), is visible. Anything outside is clipped away by the GPU hardware before rasterization.
+
+### Visual Map
+
+```
+  (-1,+1) ────────── (+1,+1)
+      │                  │
+      │       (0,0)      │  ← origin = center of screen
+      │                  │
+  (-1,-1) ────────── (+1,-1)
+```
+
+Notice the origin is at the center of the screen, not the top-left. This is deliberate: 3D scenes are easier to reason about when (0,0) is the center. In NDC, depth increases away from the viewer along +Z, from 0 at the near plane to 1 at the far plane.
+
+## Our Triangle In NDC
+
+Because this is the simplest possible renderer, our triangle vertices are specified directly in NDC. No projection matrix. No camera transform. No model matrix. Just three points in GPU-native space:
+
+| Corner | X | Y | Z |
+|--------|-----|-----|-----|
+| Bottom-left | -0.5 | -0.5 | 0.0 |
+| Bottom-right | +0.5 | -0.5 | 0.0 |
+| Top-center | 0.0 | +0.5 | 0.0 |
+
+The triangle sits in the middle of the screen. The base runs from left to right along Y=-0.5. The peak sits on the center axis at Y=+0.5. All three vertices are at Z=0, sitting exactly on the near plane.
+
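+Plot this in the NDC box above and you will see how much of the screen the triangle covers. It spans 50% of the X axis (from -0.5 to +0.5) and 50% of the Y axis (from -0.5 to +0.5), so its bounding box covers a quarter of the screen and the triangle itself covers half of that: one eighth of the screen area.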
+
+In a real application, vertices live in arbitrary world units and you apply a series of matrix transformations to bring them into clip space, from which the GPU produces NDC. Here we skip all of that and place the vertices directly in NDC. The vertex shader still outputs `vec4<f32>` and the pipeline is structurally identical.
+
+## Homogeneous Coordinates
+
+The GPU vertex shader outputs a `vec4<f32>`, not a `vec3<f32>`. The fourth component `w` is the [[homogeneous coordinates]](GLOSSARY.md#homogeneous-coordinates) value that enables the clip space → NDC conversion.
+
+When the vertex shader outputs `vec4<f32>(x, y, z, w)`, the GPU performs a step called **perspective division**: it divides `x`, `y`, and `z` by `w`. The result, `(x/w, y/w, z/w)`, is what lands in NDC.
+
+For our triangle, we set `w = 1.0`:
+
+```
+vec4<f32>(position, 1.0)  =  vec4<f32>(pos.x, pos.y, pos.z, 1.0)
+```
+
+Division by 1.0 is the identity — the position passes through unchanged. But why four components?
+
+A `w` value of 1.0 means "this is a point in space." A `w` value of 0.0 would mean "this is a direction vector." This encoding lets the GPU handle both positions and directions with the same data type. More importantly, when you use a perspective projection matrix, the matrix encodes a varying `w` value per vertex (equal to the vertex's Z distance from the camera). After perspective division, the resulting NDC coordinates automatically produce the foreshortening effect that makes distant objects appear smaller. That is how perspective works on the GPU.
+
+Our triangle uses `w = 1.0` because we have no camera and no perspective, only vertices placed directly in NDC. The value exists because the pipeline requires clip-space `vec4` output, not because we need perspective.
+
+## Clip Space
+
+Before NDC, there is [[clip space]](GLOSSARY.md#clip-space). This is the coordinate space the vertex shader outputs into. The GPU clips against a volume bounded by `w` on each axis: the visible region is a pyramid-shaped frustum under perspective projection and a box under orthographic projection. Geometry outside the clip-space boundaries is discarded by hardware before perspective division. With `w = 1.0` the clip volume coincides with the NDC box, and our triangle lies entirely inside it, so nothing is clipped.
+
+## Viewport Transform (Automatic)
+
+After perspective division produces NDC coordinates, the GPU maps them to the actual window dimensions. This is the viewport transform:
+
+```
+screen_x = (ndc_x + 1.0) / 2.0 * window_width
+screen_y = (1.0 - ndc_y) / 2.0 * window_height   ← flipped: NDC +Y is up, pixel Y grows downward
+```
+
+This step is automatic. You never write it in code. By default the viewport covers the whole render target, whose size comes from the `width` and `height` values in your `SurfaceConfiguration` (see [[viewport transform]](GLOSSARY.md#viewport-transform)). When the surface configuration says 800×600, the GPU maps NDC X `[-1, +1]` onto pixels `[0, 800]` and NDC Y `[+1, -1]` onto pixels `[0, 600]`.
+
+You do write code to update the viewport transform — but only when the window size changes. At that point, you create a new `SurfaceConfiguration` with the new dimensions and configure the surface. The GPU then uses the updated mapping on subsequent frames.
+
+## Summary: The Coordinate Journey
+
+For our triangle, every vertex follows this path:
+
+1. **Vertex data:** Stored as `vec3<f32>` in the vertex buffer. Values are already in NDC.
+2. **Vertex shader:** Wraps in `vec4<f32>` by appending `w = 1.0`. This is clip space (which, for identity `w`, equals NDC).
+3. **Perspective division:** GPU divides by `w = 1.0` → identity. Vertex is now in [[ndc]](GLOSSARY.md#ndc).
+4. **Viewport transform (automatic):** GPU scales NDC to window pixel coordinates. The triangle appears on screen.
+
+In a real 3D application, this journey includes model, view, and projection matrices before clip space. For the rainbow triangle, every step up to NDC is an identity transform, and only the viewport transform changes the numbers. The hardware pipeline stages are the same regardless.
diff --git a/docs/concepts/graphics-pipeline.md b/docs/concepts/graphics-pipeline.md
new file mode 100644
index 0000000..585b809
--- /dev/null
+++ b/docs/concepts/graphics-pipeline.md
@@ -0,0 +1,94 @@
+# The Graphics Pipeline
+
+## GPU vs CPU
+
+If you are a backend or systems developer, the GPU is a foreign piece of hardware. The CPU is a master chef: it cooks one dish at a time, can follow any recipe, handles exceptions mid-stream, and adapts to every condition. The GPU is a brigade of a thousand short-order cooks: each one does the same simple task on a different ingredient, cannot improvise, and cannot branch on its own. Together they process 1000× more ingredients — but only if the recipe is identical for every single one.
+
+This is the throughput vs latency distinction. The CPU minimizes latency (finish one task fast). The GPU maximizes throughput (finish many identical tasks fast). This distinction drives every constraint and design choice in graphics programming.
+
+## Why Not GPU For Everything
+
+Branching kills parallelism. Inside a GPU warp (a group of typically 32 or 64 threads running the same shader in lockstep), if even one thread takes a different branch, the entire warp serializes: it runs the first branch, then the second. Divergent logic stalls the whole unit. This is why `if/else` on varying data is expensive, and why heavily branching, `match`-style logic is avoided in shader design.
+
+There is also PCIe transfer cost. Pushing megabytes of data to the GPU is relatively cheap — the bus was built for bulk transfers. Pulling results back, or transferring data back and forth per-frame, is a bottleneck you fight constantly.
+
+The GPU also has no heap, no recursion, no `stdio`, and no arbitrary memory allocation. Every vertex shader invocation gets the same static stack. Every fragment shader invocation is stateless. You design around this, not against it.
+
+## The Rendering Pipeline
+
+Rendering maps 3D geometry to a 2D framebuffer through five stages:
+
+```
+Vertex Shader ──→ Primitive Assembly ──→ Rasterizer ──→ Fragment Shader ──→ Output Merge
+once/vertex         groups→triangles          pixels/frag       once/fragment        depth/blend
+```
+
+Each stage is a pipeline filter. Data flows through; nothing flows backward. This is the hardware architecture of every GPU, from integrated Intel chips to RTX 5090s.
+
+### Stage 1: Vertex Shader
+
+[[vertex shader]](GLOSSARY.md#vertex-shader) — a GPU program running once per input [[vertex]](GLOSSARY.md#vertex).
+
+Input: vertex attributes read from the [[vertex buffer]](GLOSSARY.md#vertex-buffer). In our case: position and color.
+
+Output: mandatory clip-space position (`vec4<f32>`) plus any per-vertex data the [[fragment shader]](GLOSSARY.md#fragment-shader) needs downstream: color, UV coordinates, normals, etc.
+
+The vertex shader is the only place you transform geometry. In complex scenes this means multiplying by model-view-projection matrices. For our triangle, the vertices are already in the GPU's native coordinate space, so the vertex shader passes the position through unchanged.
+
+### Stage 2: Primitive Assembly
+
+Hardware only. No user code runs here.
+
+The GPU takes vertices in the order you submitted them and groups them into [[primitive]](GLOSSARY.md#primitive) shapes. With [[topology]](GLOSSARY.md#topology) set to `TriangleList`, every group of 3 consecutive vertices becomes one triangle. Vertex 0, 1, 2 → triangle A. Vertex 3, 4, 5 → triangle B.
+
+### Stage 3: Rasterizer
+
+[[rasterizer]](GLOSSARY.md#rasterizer) — hardware stage that converts triangles into fragments.
+
+For each submitted triangle, the rasterizer determines which screen pixels the triangle covers. For each covered pixel, it generates one [[fragment]](GLOSSARY.md#fragment) — a "potential pixel" carrying interpolated data.
+
+The critical function here is [[interpolation]](GLOSSARY.md#interpolation). The rasterizer computes [[barycentric coordinates]](GLOSSARY.md#barycentric-coordinates) — three weights (w0, w1, w2) that sum to 1 — describing where inside the triangle the pixel falls. Then for every value the vertex shader output, the rasterizer computes: `value = w0 * value0 + w1 * value1 + w2 * value2`.
+
+This is the step that makes colors blend across the triangle. It is free, automatic, hardware-accelerated [[interpolation]](GLOSSARY.md#interpolation). You do not write the code. The GPU computes it because it is how the rendering pipeline architecture works.
+
+### Stage 4: Fragment Shader
+
+[[fragment shader]](GLOSSARY.md#fragment-shader) — a GPU program running once per [[fragment]](GLOSSARY.md#fragment).
+
+Input: the pre-interpolated values from the vertex shader, delivered by the rasterizer. The fragment shader receives one invocation per covered screen pixel. If a triangle covers 2000 pixels, the fragment shader runs 2000 times.
+
+Output: the final RGBA color for that pixel. The fragment shader computes lighting, textures, and pixel-level effects. For our triangle, it receives the interpolated vertex color and returns it unchanged.
+
+### Stage 5: Output Merge
+
+The final hardware stage before the color hits the [[framebuffer]](GLOSSARY.md#framebuffer).
+
+Per-fragment operations:
+
+- **Depth test:** Compare the fragment's Z value against the depth buffer. Discard fragments behind already-drawn geometry. We disable this for our triangle — we only draw one primitive.
+- **Stencil test:** Mask drawing to specific screen regions via a stencil buffer. We disable this.
+- **Blend:** Combine the new fragment color with the existing framebuffer color. We use `BlendState::REPLACE`, so the fragment color overwrites whatever was there.
+
+Earlier in the pipeline, between perspective division and rasterization, the GPU performs the [[viewport transform]](GLOSSARY.md#viewport-transform): mapping NDC coordinates to window pixel dimensions. This step is automatic and configured by your surface dimensions.
+
+After the output merge, the final color is written to the framebuffer. When your [[load op]](GLOSSARY.md#loadop) is `Clear`, the framebuffer is filled with your background color at the start of the render pass, before any drawing. [[Storeop]](GLOSSARY.md#storeop) determines whether you keep or discard the results after the render pass.
+
+## Why This Matters For The Rainbow Triangle
+
+The entire rainbow triangle effect flows from the pipeline architecture:
+
+1. **Vertex shader** runs 3 times — once for each vertex. Each invocation outputs a position and a solid color: red, green, or blue.
+
+2. **Primitive assembly** groups those 3 vertices into one triangle.
+
+3. **Rasterizer** covers roughly 60,000 screen pixels at 800×600 (the triangle's bounding box is 400×300 pixels, and the triangle fills half of it), generating one fragment for each. For each fragment, it interpolates the three vertex colors using barycentric weights. A pixel near the red vertex gets mostly red. A pixel in the center gets roughly equal parts of all three. This produces the gradient automatically.
+
+4. **Fragment shader** runs once per fragment, so roughly 60,000 times. Each invocation receives the already-interpolated color and writes it to the output.
+
+The rainbow gradient is not programmed. There is no loop, no formula, no color blending logic. The gradient is a direct consequence of the pipeline architecture: the rasterizer interpolates vertex shader outputs across the triangle surface, and the fragment shader passes the interpolated value through. You supply three colors at three corners, and the GPU fills in the continuum between them.
+
+## The Pipeline Object In wgpu
+
+In wgpu, you compile all of this into a [[pipeline]](GLOSSARY.md#pipeline): a single opaque render pipeline object encoding your shaders, topology, blend state, vertex layout, and output format. It is created once during initialization and reused every frame. Creating a pipeline up-front saves per-frame compilation and state configuration. The [[device]](GLOSSARY.md#device) owns the pipeline, and you use the [[queue]](GLOSSARY.md#queue) to submit command buffers whose draw calls reference it.
+
+The [[adapter]](GLOSSARY.md#adapter) is the physical GPU or software renderer you select. There may be multiple on a single system — a dedicated NVIDIA card plus integrated Intel graphics. You pick one adapter, create a device from it, and all resources flow from that device.
diff --git a/docs/concepts/shader-basics.md b/docs/concepts/shader-basics.md
new file mode 100644
index 0000000..ca04e82
--- /dev/null
+++ b/docs/concepts/shader-basics.md
@@ -0,0 +1,136 @@
+# Shader Basics
+
+## What Is A Shader
+
+A shader is a GPU program. It is a piece of code that runs on the GPU instead of the CPU. Unlike a CPU program, you do not call a shader function once. You configure it, bind data to it, and then the GPU runs thousands of copies simultaneously on different data elements. One shader invocation per vertex. One shader invocation per pixel.
+
+Shaders are written in [[wgsl]](GLOSSARY.md#wgsl), the WebGPU Shading Language. wgpu translates WGSL into what each backend consumes: SPIR-V for Vulkan, MSL for Metal, HLSL for Direct3D 12. You write one shader; wgpu handles the translation.
+
+## WGSL Constraints
+
+WGSL is designed for parallel execution on hardware with severe restrictions:
+
+- **No heap allocation.** There is no `Box`, no `Vec`, no `String`. All memory is static and sized at compile time.
+- **No recursion.** The GPU has a fixed, tiny stack. Recursive calls are banned.
+- **No I/O.** No `print`, no `println`, no file access, no `socket`. A shader communicates only through its return values and writes to bound buffers/textures.
+- **Static types.** `f32`, `i32`, `u32` for scalars. `vec2<T>`, `vec3<T>`, `vec4<T>` for vectors. `mat2x2<T>` through `mat4x4<T>` for matrices. Every expression has a known type at compile time. There is no `any` and no `dyn`.
+- **No arbitrary memory access.** You read from structured inputs (vertex attributes, uniform buffers, textures) and write to defined outputs. Memory is laid out contiguously in [[buffer slice]](GLOSSARY.md#buffer-slice) regions.
+
+These are not bugs. They are the GPU architecture. Every shader invocation runs in an identical sandbox. That uniformity is what lets the hardware run thousands of invocations in parallel.
+
+## Shader Entry Points
+
+A shader module contains one or more entry point functions. Each entry point is tagged with an attribute that tells the GPU when to run it and what pipeline stage it belongs to.
+
+### `@vertex` — Vertex Shader Entry Point
+
+Runs once per input [[vertex]](GLOSSARY.md#vertex). The GPU calls this function for every vertex in your draw call.
+
+**Mandatory output:** `@builtin(position) vec4<f32>`, the [[clip space]](GLOSSARY.md#clip-space) position that the GPU uses for [[primitive]](GLOSSARY.md#primitive) assembly and rasterization. A vertex entry point without this output fails shader validation, so the pipeline cannot be created.
+
+**Optional outputs:** Any number of `@location(n)` values that flow to the fragment shader. Color, UV coordinates, normals — everything downstream needs is passed through the vertex shader output.
+
+### `@fragment` — Fragment Shader Entry Point
+
+Runs once per [[fragment]](GLOSSARY.md#fragment) produced by the rasterizer. For a triangle covering 500 pixels on screen, the fragment shader runs about 500 times (multisampling and helper invocations along triangle edges can change the exact count).
+
+**Input:** Interpolated values from the vertex shader. If the vertex shader outputs `@location(0) color: vec3<f32>`, the fragment shader receives a `@location(0)` input of the same type, carrying the interpolated value for that fragment.
+
+**Output:** `@location(0) vec4<f32>` — the final RGBA color written to the [[framebuffer]](GLOSSARY.md#framebuffer).
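+
+To make the per-invocation model concrete, here is a minimal CPU-side sketch in Rust. It is not wgpu code: `VertexInput`, `VertexOutput`, `vertex_stage`, `fragment_stage`, and `run_vertex_stage` are hypothetical names invented for illustration, and a real GPU runs the invocations in parallel with a rasterizer between the two stages.
+
+```rust
+// CPU-side model of the two entry points. Illustrative only.
+
+/// Per-vertex input, standing in for two `@location(n)` vertex attributes.
+#[derive(Clone, Copy)]
+struct VertexInput {
+    position: [f32; 3],
+    color: [f32; 3],
+}
+
+/// Vertex stage output: a clip-space position plus one interpolant.
+#[derive(Clone, Copy, Debug, PartialEq)]
+struct VertexOutput {
+    clip_position: [f32; 4], // plays the role of @builtin(position)
+    color: [f32; 3],         // plays the role of an @location(0) output
+}
+
+/// Stand-in for an `@vertex` entry point: a pure function of one vertex.
+fn vertex_stage(v: VertexInput) -> VertexOutput {
+    let [x, y, z] = v.position;
+    VertexOutput { clip_position: [x, y, z, 1.0], color: v.color }
+}
+
+/// Stand-in for an `@fragment` entry point: a pure function of one fragment.
+/// It receives an already-interpolated color and appends alpha = 1.0.
+fn fragment_stage(interpolated_color: [f32; 3]) -> [f32; 4] {
+    let [r, g, b] = interpolated_color;
+    [r, g, b, 1.0]
+}
+
+/// "One invocation per vertex": apply the vertex function to every vertex.
+fn run_vertex_stage(vertices: &[VertexInput]) -> Vec<VertexOutput> {
+    vertices.iter().map(|&v| vertex_stage(v)).collect()
+}
+```
+
+Each function sees only its own vertex or fragment and shares no state with its neighbors, which is why the GPU is free to run the invocations in any order and all at once.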
+
+## The Location Contract
+
+---
+
+> **LOCATION BINDING IS THE CRITICAL LINK BETWEEN RUST AND WGSL**
+>
+> Every value flowing between Rust buffers and WGSL shader functions is tied together by a numeric [[shader location]](GLOSSARY.md#shader-location) label. The number on the Rust side must match the number on the WGSL side.
+>
+> Rust: `VertexAttribute { shader_location: 0, ... }`
+>
+> WGSL: `@location(0) color: vec3<f32>`
+>
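+> If the numbers differ, the shader reads the wrong data. wgpu catches some of these mistakes when the render pipeline is created: a shader input with no attribute at that location, or with an incompatible format, is a validation error. It cannot catch a swap between two compatible attributes. Give position and color the same `Float32x3` format, exchange their `shader_location` values, and the pipeline validates cleanly while the shader treats colors as positions. There is no runtime warning for that case. The responsibility sits with the developer.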
+
+---
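+
+The contract can be sketched in plain Rust without pulling in wgpu. In the block below, `Attribute` is a hypothetical stand-in for wgpu's `VertexAttribute` (not the real type), and `vertex_attributes`, `vertex_stride`, and `check_contract` are illustrative helpers. The layout assumes an interleaved buffer holding a position followed by a color for each vertex.
+
+```rust
+use std::mem::size_of;
+
+/// One vertex as stored in the vertex buffer: position, then color.
+#[allow(dead_code)]
+#[repr(C)]
+struct Vertex {
+    position: [f32; 3],
+    color: [f32; 3],
+}
+
+/// Hypothetical stand-in for wgpu's `VertexAttribute`.
+#[derive(Debug, PartialEq)]
+struct Attribute {
+    shader_location: u32,
+    offset: usize,     // byte offset inside one vertex
+    components: usize, // number of f32 components
+}
+
+/// Layout for `Vertex`: location 0 is the position, location 1 is the color.
+fn vertex_attributes() -> [Attribute; 2] {
+    [
+        Attribute { shader_location: 0, offset: 0, components: 3 },
+        Attribute { shader_location: 1, offset: size_of::<[f32; 3]>(), components: 3 },
+    ]
+}
+
+/// Byte distance between consecutive vertices in the buffer.
+fn vertex_stride() -> usize {
+    size_of::<Vertex>()
+}
+
+/// Checks that every (location, component count) pair declared by the shader
+/// has a matching attribute on the Rust side.
+fn check_contract(attrs: &[Attribute], shader_inputs: &[(u32, usize)]) -> Result<(), String> {
+    for &(location, components) in shader_inputs {
+        match attrs.iter().find(|a| a.shader_location == location) {
+            None => return Err(format!("no attribute for @location({location})")),
+            Some(a) if a.components != components => {
+                return Err(format!(
+                    "@location({location}) expects {components} components, buffer has {}",
+                    a.components
+                ));
+            }
+            Some(_) => {}
+        }
+    }
+    Ok(())
+}
+```
+
+A check of this kind catches a missing location or a size mismatch. It cannot notice that position and color were exchanged, because both are three floats, so keeping the numbers in sync still falls to whoever edits the layout.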
+
+## Interpolation Mechanism
+
+Between the vertex shader and the fragment shader, the [[rasterizer]](GLOSSARY.md#rasterizer) performs a computation that most graphics tutorials treat as magic. It is not magic. It is [[interpolation]](GLOSSARY.md#interpolation).
+
+For every `@location(n)` value the vertex shader outputs, the rasterizer computes a triangle-wide linear blend:
+
+```
+fragment_value = w0 * vertex0_value + w1 * vertex1_value + w2 * vertex2_value
+```
+
+where `w0 + w1 + w2 = 1.0` and the weights are [[barycentric coordinates]](GLOSSARY.md#barycentric-coordinates) computed from the fragment's position inside the triangle.
+
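+This interpolation is effectively free for the shader author. It is fixed behavior of the pipeline: you do not write the code and you do not call it. The GPU computes barycentric weights and blends every vertex shader output automatically. The fragment shader receives pre-blended values and does not need to know how they were computed. By default WGSL interpolates with perspective correction (`@interpolate(perspective)`), which reduces to the plain linear blend above when every vertex has `w = 1.0`, as in this triangle.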
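+
+The blend can be reproduced on the CPU to see what the fragment shader is handed. The following is a minimal Rust sketch under stated assumptions: it works in 2D screen space, ignores perspective correction, and assumes a non-degenerate triangle. The names `edge`, `barycentric`, and `interpolate` are illustrative and not part of any API.
+
+```rust
+type Vec2 = [f32; 2];
+type Vec3 = [f32; 3];
+
+/// Twice the signed area of triangle (a, b, c).
+fn edge(a: Vec2, b: Vec2, c: Vec2) -> f32 {
+    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
+}
+
+/// Barycentric weights (w0, w1, w2) of point `p` in triangle (v0, v1, v2).
+/// Each weight is the area of the sub-triangle opposite its vertex,
+/// divided by the area of the whole triangle.
+fn barycentric(p: Vec2, v0: Vec2, v1: Vec2, v2: Vec2) -> [f32; 3] {
+    let area = edge(v0, v1, v2);
+    [
+        edge(v1, v2, p) / area,
+        edge(v2, v0, p) / area,
+        edge(v0, v1, p) / area,
+    ]
+}
+
+/// fragment_value = w0 * value0 + w1 * value1 + w2 * value2, per component.
+fn interpolate(w: [f32; 3], value0: Vec3, value1: Vec3, value2: Vec3) -> Vec3 {
+    [
+        w[0] * value0[0] + w[1] * value1[0] + w[2] * value2[0],
+        w[0] * value0[1] + w[1] * value1[1] + w[2] * value2[1],
+        w[0] * value0[2] + w[1] * value1[2] + w[2] * value2[2],
+    ]
+}
+```
+
+For a triangle with corners `[0.0, 0.0]`, `[4.0, 0.0]`, `[0.0, 4.0]` colored red, green, and blue, the point `[1.0, 1.0]` has weights `[0.5, 0.25, 0.25]`, so its interpolated color is `[0.5, 0.25, 0.25]`.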
+
+## Concrete Shader Walkthrough
+
+This is the complete shader for the rainbow triangle. Every line is explained below.
+
+```wgsl
+struct VertexOutput {
+    @builtin(position) clip_position: vec4<f32>,
+    @location(0) vertex_color: vec3<f32>,
+};
+
+@vertex
+fn vs_main(
+    @location(0) position: vec3<f32>,
+    @location(1) color: vec3<f32>,
+) -> VertexOutput {
+    var out: VertexOutput;
+    out.clip_position = vec4<f32>(position, 1.0);
+    out.vertex_color = color;
+    return out;
+}
+
+@fragment
+fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
+    return vec4<f32>(input.vertex_color, 1.0);
+}
+```
+
+### Line by line
+
+**`struct VertexOutput { ... }`** — The interface between vertex and fragment stages. This struct defines everything the vertex shader sends downstream. It is the contract the rasterizer enforces.
+
+**`@builtin(position) clip_position: vec4<f32>`** — The mandatory clip-space position output. The `@builtin(position)` annotation tells the GPU this value goes to the primitive assembly / rasterizer pipeline, not to another shader stage. The GPU reads this to know where each vertex sits in [[clip space]](GLOSSARY.md#clip-space), and from there where it lands on screen.
+
+**`@location(0) vertex_color: vec3<f32>`** — An interpolant flowing from vertex to fragment stage. The `@location(0)` annotation labels this value with binding index 0. Any `@location(0)` output here becomes the `@location(0)` input to the fragment shader.
+
+**`@vertex fn vs_main(...)`** — The vertex shader entry point. The `@vertex` attribute marks this as the function the vertex pipeline stage calls.
+
+**`@location(0) position: vec3<f32>`** — Vertex buffer input at location 0. In Rust, the vertex buffer's first attribute is declared with `shader_location: 0`. This is the first half of the location contract: the Rust buffer layout and WGSL input must agree.
+
+**`@location(1) color: vec3<f32>`** — Vertex buffer input at location 1. The second vertex attribute in the buffer. Each vertex stores two values: a 3-component position and a 3-component color, contiguous in memory.
+
+**`var out: VertexOutput;`** — Local variable holding the shader output. WGSL requires explicit variable declarations.
+
+**`out.clip_position = vec4<f32>(position, 1.0);`** — Wraps the 3D position in a [[homogeneous coordinates]](GLOSSARY.md#homogeneous-coordinates) `vec4` by appending `w = 1.0`. With `w = 1.0`, perspective division divides by one and leaves x, y, z unchanged. See [[coordinate-systems.md]](coordinate-systems.md) for the details.
+
+**`out.vertex_color = color;`** — Passes the vertex color through to the fragment shader. No transformation needed — the color is already the final per-vertex color. The rasterizer will blend across the triangle surface.
+
+**`@fragment fn fs_main(input: VertexOutput) -> ...`** — The fragment shader entry point. It receives one input struct per fragment. This struct contains the rasterizer's pre-interpolated values.
+
+**`input.vertex_color`** — The color value, already blended by the rasterizer. If the current fragment has barycentric weights 0.7 for the red vertex, 0.2 for the green vertex, and 0.1 for the blue vertex, this value is `(0.7*1.0 + 0.2*0.0 + 0.1*0.0, 0.7*0.0 + 0.2*1.0 + 0.1*0.0, 0.7*0.0 + 0.2*0.0 + 0.1*1.0)` = `(0.7, 0.2, 0.1)`. The interpolation was performed by hardware; the fragment shader does not compute it.
+
+**`-> @location(0) vec4<f32>`** — The fragment shader output signature. `@location(0)` maps to the first color target of the render [[pipeline]](GLOSSARY.md#pipeline), which corresponds to color attachment 0 of the render pass. It is the pixel color written to the framebuffer.
+
+**`vec4<f32>(input.vertex_color, 1.0)`** — Wraps the interpolated RGB color in `vec4` by appending alpha = 1.0 (fully opaque). The framebuffer expects a 4-component color.
+
+## WGSL Source Embedding
+
+In a wgpu application, the shader source is typically a Rust string embedded at compile time:
+
+```rust
+const SHADER_SOURCE: &str = include_str!("shaders/main.wgsl");
+```
+
+`include_str!` reads the WGSL file during Rust compilation and inlines it as a `&'static str`. The path is resolved relative to the source file that contains the macro call. There is no runtime file I/O. The shader text is part of the binary. When you create the shader module via `device.create_shader_module()`, wgpu parses and validates the WGSL and, on native targets, returns the module synchronously. Translation to the backend's format (SPIR-V, MSL, or HLSL) happens inside wgpu no later than pipeline creation. Neither step needs a [[device poll]](GLOSSARY.md#device-poll); a shader that fails validation is reported through the [[device]](GLOSSARY.md#device)'s error handling.
+
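+This is a deliberate choice. With the source embedded, a missing or misnamed shader file is a compile error instead of a runtime failure, and the binary ships without external shader assets. Embedding is a common wgpu pattern, not a requirement: the same string can be loaded at run time instead, for example to support shader hot reloading.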
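+
+Because the source is an ordinary `&'static str`, it can be inspected in a unit test before it ever reaches the GPU. The sketch below uses an inline string as a stand-in for `include_str!("shaders/main.wgsl")` so that it is self-contained. `has_entry_point` is a hypothetical helper, and it performs a textual check only: it does not parse or validate WGSL.
+
+```rust
+/// Stand-in for `include_str!("shaders/main.wgsl")`, inlined for this sketch.
+const SHADER_SOURCE: &str = r#"
+@vertex
+fn vs_main(@location(0) position: vec3<f32>) -> @builtin(position) vec4<f32> {
+    return vec4<f32>(position, 1.0);
+}
+
+@fragment
+fn fs_main() -> @location(0) vec4<f32> {
+    return vec4<f32>(1.0, 0.0, 0.0, 1.0);
+}
+"#;
+
+/// Returns true if `source` contains `@<stage>` followed, after optional
+/// whitespace, by `fn <name>(`.
+fn has_entry_point(source: &str, stage: &str, name: &str) -> bool {
+    let attribute = format!("@{stage}");
+    let signature = format!("fn {name}(");
+    source.match_indices(attribute.as_str()).any(|(index, _)| {
+        source[index + attribute.len()..]
+            .trim_start()
+            .starts_with(signature.as_str())
+    })
+}
+```
+
+A test such as `assert!(has_entry_point(SHADER_SOURCE, "vertex", "vs_main"))` fails early if an entry point is renamed in the WGSL file while the Rust side still refers to the old name.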