Shader Basics

What Is A Shader

A shader is a program that runs on the GPU instead of the CPU. You do not call it the way you call a CPU function. You configure it, bind data to it, and issue a draw; the GPU then runs many copies of it (invocations) in parallel, each on a different data element. A vertex shader runs one invocation per vertex. A fragment shader runs one invocation per fragment, which is roughly one per covered pixel.
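To make the execution model concrete, here is a CPU-side analogy in plain Rust, with no wgpu involved. A "vertex shader" is a pure function from one vertex's inputs to that vertex's outputs, and the "GPU" applies it independently to every element. The names `VertexIn`, `VertexOut`, `vertex_shader`, and `run_all` are hypothetical, and a real GPU runs the invocations in parallel rather than in a loop.

```rust
/// One vertex's inputs (hypothetical names, mirroring typical vertex attributes).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct VertexIn {
    pub position: [f32; 3],
    pub color: [f32; 3],
}

/// What one invocation produces: a clip-space position plus a color.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct VertexOut {
    pub clip_position: [f32; 4],
    pub color: [f32; 3],
}

/// The "shader": a pure function of a single element. It sees only its own
/// vertex and keeps no state between calls.
pub fn vertex_shader(v: VertexIn) -> VertexOut {
    let [x, y, z] = v.position;
    VertexOut {
        clip_position: [x, y, z, 1.0],
        color: v.color,
    }
}

/// The "GPU": one invocation per element. Because each call is independent,
/// running them in a different order (or in parallel) gives the same result.
pub fn run_all(vertices: &[VertexIn]) -> Vec<VertexOut> {
    vertices.iter().copied().map(vertex_shader).collect()
}
```

Three input vertices produce exactly three outputs, and no invocation can influence another.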

Shaders in wgpu are written in WGSL, the WebGPU Shading Language. Through its naga shader translator, wgpu converts WGSL into each backend's native shader format: SPIR-V for Vulkan, the Metal Shading Language (MSL) for Metal, and HLSL for DirectX 12, which is then compiled by the DirectX shader compiler. You write one shader; wgpu handles the translation.

WGSL Constraints

WGSL is designed for massively parallel execution and inherits the restrictions of the hardware it targets:

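No heap allocation. There is no Box, no Vec, no String. Every variable's size is known at compile time; the one exception is a runtime-sized array as the last member of a storage buffer, whose length comes from the buffer you bind.
No recursion. WGSL forbids a function from calling itself, directly or indirectly. GPUs typically lack a general-purpose call stack, and shader compilers usually inline every call.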
No I/O. No print, no println, no file access, no socket. A shader communicates only through its return values and writes to bound buffers/textures.
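Static types. bool, f32, i32, and u32 for scalars. vec2<T>, vec3<T>, vec4<T> for vectors. mat2x2<f32> through mat4x4<f32> for matrices (matrices are floating-point only). Every expression has a type known at compile time. There is no any and no dyn.
No arbitrary memory access. You read from declared inputs (vertex attributes, uniform buffers, storage buffers, textures) and write to declared outputs. Pointers exist but cannot be cast or offset, so there is no pointer arithmetic. Buffer contents follow fixed layout rules (a size and an alignment for every type) that the Rust side must reproduce byte for byte.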

These are not bugs; they follow from GPU architecture. Every invocation runs the same code with the same fixed resource budget, which is what lets the hardware schedule thousands of invocations side by side.
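The layout rules are where Rust and WGSL meet. This is a minimal sketch that assumes the two-attribute vertex (position and color) used throughout this section, plus a hypothetical uniform block holding a single `vec3<f32>` tint. Vertex buffers are read with tightly packed vertex formats, so `#[repr(C)]` with `[f32; 3]` fields matches directly. Uniform buffers follow WGSL's alignment rules instead. Under those rules `vec3<f32>` is 12 bytes but aligned to 16, so the Rust struct needs explicit padding.

```rust
use std::mem::size_of;

/// Vertex data as uploaded to a vertex buffer. `#[repr(C)]` fixes field order
/// and offsets: position at byte 0, color at byte 12, 24 bytes per vertex.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Vertex {
    pub position: [f32; 3], // read in WGSL as vec3<f32> (vertex format Float32x3)
    pub color: [f32; 3],    // read in WGSL as vec3<f32> (vertex format Float32x3)
}

/// Hypothetical uniform block mirroring this WGSL declaration:
///     struct Uniforms { tint: vec3<f32> }
/// In the uniform address space vec3<f32> has size 12 but alignment 16,
/// so the WGSL struct occupies 16 bytes and the Rust side must pad to match.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Uniforms {
    pub tint: [f32; 3],
    pub _padding: f32, // fills bytes 12..16; never read by the shader
}

/// Byte offset of `color` inside `Vertex`: it directly follows `position`.
pub const COLOR_OFFSET: usize = size_of::<[f32; 3]>();

/// Distance in bytes between consecutive vertices in the buffer.
pub const VERTEX_STRIDE: usize = size_of::<Vertex>();
```

These two constants are exactly the numbers a vertex buffer layout on the Rust side has to state: the stride and each attribute's offset.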

Shader Entry Points

A shader module contains one or more entry point functions. Each entry point is tagged with an attribute that tells the GPU when to run it and what pipeline stage it belongs to.

`@vertex` — Vertex Shader Entry Point

Runs once per input vertex. The GPU calls this function for every vertex in your draw call.

Mandatory output: @builtin(position) vec4<f32>, the clip-space position the GPU uses for primitive assembly and rasterization. A vertex entry point that does not output it fails shader validation.

Optional outputs: additional @location(n) values, up to the device's inter-stage limit, that flow to the fragment shader. Colors, UV coordinates, normals: anything the fragment stage needs must pass through the vertex shader's output.

`@fragment` — Fragment Shader Entry Point

Runs once per fragment produced by the rasterizer. For a triangle covering 500 pixels on screen (without multisampling), the fragment shader runs once for each of those 500 fragments.

Input: Interpolated values from the vertex shader. If the vertex shader outputs @location(0) color: vec3<f32>, the fragment shader declares a matching @location(0) input and receives hardware-interpolated values.

Output: @location(0) vec4<f32> — the final RGBA color written to the framebuffer.
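To show both entry points side by side, here is a minimal sketch of one WGSL module, held in a Rust string constant as wgpu code commonly does. The names `VertexOutput`, `vs_main`, and `fs_main` are assumptions for illustration, as is the choice of location 0 for position and 1 for color on the vertex inputs. They are not taken from the rainbow triangle walkthrough. rustc only stores this text; wgpu validates it as WGSL when the shader module is created.

```rust
/// A minimal two-stage WGSL module embedded as a Rust string.
/// Entry-point and struct names are illustrative.
pub const SHADER_SOURCE: &str = r#"
struct VertexOutput {
    // Mandatory for the vertex stage: the clip-space position.
    @builtin(position) clip_position: vec4<f32>,
    // Interpolant passed on to the fragment stage.
    @location(0) color: vec3<f32>,
}

@vertex
fn vs_main(
    @location(0) position: vec3<f32>,
    @location(1) color: vec3<f32>,
) -> VertexOutput {
    var vs_out: VertexOutput;
    vs_out.clip_position = vec4<f32>(position, 1.0);
    vs_out.color = color;
    return vs_out;
}

@fragment
fn fs_main(frag: VertexOutput) -> @location(0) vec4<f32> {
    // frag.color arrives already interpolated by the rasterizer.
    return vec4<f32>(frag.color, 1.0);
}
"#;
```

The vertex inputs at @location(0) and @location(1) are fed from vertex buffers. The output @location(0) is a separate inter-stage slot that the fragment shader reads through the same struct.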

The Location Contract

LOCATION BINDING IS THE CRITICAL LINK BETWEEN RUST AND WGSL

Every vertex attribute flowing from a Rust vertex buffer into a WGSL vertex shader is matched by a numeric shader location. The number on the Rust side must match the number on the WGSL side.

Rust: VertexAttribute { shader_location: 0, ... }

WGSL: @location(0) color: vec3<f32>

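Suppose the shader reads a location that no VertexAttribute provides, or an attribute's format is incompatible with the WGSL type. In either case wgpu rejects the pipeline with a validation error when you create it. What validation cannot catch is a mismatch in meaning. If two attributes of the same type are swapped, or an attribute's offset points at the wrong field, the pipeline is accepted and the shader silently reads the wrong bytes. You get garbage output with no warning, and catching those mistakes is the developer's responsibility.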
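The "is every location supplied?" part of that check can be pictured with a small stand-alone model. This is a hypothetical sketch, not wgpu code. `AttributeSpec` mirrors only the `offset` and `shader_location` fields of wgpu's `VertexAttribute`, and `find_missing_location` reports the first location the shader reads that no attribute supplies. Note what it cannot detect: swapping the offsets of two same-typed attributes still passes.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the parts of wgpu's VertexAttribute that matter
/// here: where the attribute starts in the vertex, and which @location it feeds.
#[derive(Clone, Copy, Debug)]
pub struct AttributeSpec {
    pub offset: u64,
    pub shader_location: u32,
}

/// Returns the first @location the shader reads that no attribute provides,
/// or None if every location is covered. Models only presence, not format
/// compatibility.
pub fn find_missing_location(shader_locations: &[u32], attrs: &[AttributeSpec]) -> Option<u32> {
    let provided: HashSet<u32> = attrs.iter().map(|a| a.shader_location).collect();
    shader_locations.iter().copied().find(|loc| !provided.contains(loc))
}

/// Attributes for a vertex laid out as position: [f32; 3] then color: [f32; 3].
pub const ATTRIBUTES: [AttributeSpec; 2] = [
    AttributeSpec { offset: 0, shader_location: 0 },  // position -> @location(0)
    AttributeSpec { offset: 12, shader_location: 1 }, // color    -> @location(1)
];
```

A shader reading locations 0 and 2 gets `Some(2)`. A layout with the two offsets swapped returns `None`, even though every vertex would then deliver color as position.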

Interpolation Mechanism

Between the vertex shader and the fragment shader, the rasterizer performs a computation that most graphics tutorials treat as magic. It is not magic. It is interpolation.

For every @location(n) value the vertex shader outputs, the rasterizer computes a weighted blend of the three vertices' values:

fragment_value = w0 * vertex0_value + w1 * vertex1_value + w2 * vertex2_value

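where w0 + w1 + w2 = 1.0 and the weights are barycentric coordinates of the fragment's position inside the triangle. By default, WGSL applies perspective correction to floating-point values: the weights are adjusted by each vertex's clip-space w so that values vary correctly across surfaces viewed at an angle. @interpolate(linear) uses plain screen-space weights, and @interpolate(flat) skips blending and uses a single vertex's value (required for integer types).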
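Here is a minimal CPU sketch of that math, assuming 2D pixel-space vertex positions for computing the weights. `barycentric` derives the screen-space weights from signed triangle areas, and `interpolate_linear` applies the formula above directly. `interpolate_perspective` shows the perspective-correct variant, which WGSL uses by default for floating-point values: it divides each vertex's contribution by that vertex's clip-space w, then renormalizes. All names are illustrative.

```rust
/// 2D point in framebuffer (pixel) coordinates.
pub type Point = [f32; 2];

/// Twice the signed area of triangle (a, b, c); positive for counter-clockwise order.
fn edge(a: Point, b: Point, c: Point) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// Screen-space barycentric weights (w0, w1, w2) of point p in triangle (v0, v1, v2).
/// The weights sum to 1 (up to rounding); all three are >= 0 exactly when p is inside.
pub fn barycentric(p: Point, v0: Point, v1: Point, v2: Point) -> [f32; 3] {
    let area = edge(v0, v1, v2);
    [
        edge(v1, v2, p) / area, // weight of v0
        edge(v2, v0, p) / area, // weight of v1
        edge(v0, v1, p) / area, // weight of v2
    ]
}

/// @interpolate(linear): blend the three vertex values with screen-space weights.
pub fn interpolate_linear(w: [f32; 3], values: [f32; 3]) -> f32 {
    w[0] * values[0] + w[1] * values[1] + w[2] * values[2]
}

/// @interpolate(perspective), WGSL's default for floating-point values:
/// weight each vertex value by 1 / clip_w, then renormalize.
pub fn interpolate_perspective(w: [f32; 3], values: [f32; 3], clip_w: [f32; 3]) -> f32 {
    let mut num = 0.0_f32;
    let mut den = 0.0_f32;
    for i in 0..3 {
        num += w[i] * values[i] / clip_w[i];
        den += w[i] / clip_w[i];
    }
    num / den
}
```

When all three vertices share the same w, as with an orthographic projection, the two interpolation modes produce the same result. They diverge only when vertices sit at different depths under a perspective projection.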

This interpolation costs you no code. Depending on the GPU, it is performed by fixed-function hardware or by instructions the driver inserts automatically, and its cost is small. The rasterizer computes the barycentric weights and blends every vertex shader output. The fragment shader receives pre-blended values and does not need to know how they were computed.

How Shaders Work Together

A complete rendering shader is a two-stage program; in this guide both stages live in a single WGSL module. The vertex shader runs once per vertex in your draw call, transforming raw buffer data into GPU-ready outputs. The fragment shader runs once per fragment produced by the rasterizer, converting interpolated vertex data into the final color written to the framebuffer. Each stage runs many invocations in parallel, and the GPU overlaps the stages: fragments from early triangles can be shaded while later vertices are still being transformed.

Data flows between the vertex and fragment stages through a shared struct (individual parameters work too). The struct's fields are tagged with WGSL attributes that tell the GPU how to route each value:

@location(n) marks values that bind to Rust vertex buffer attributes or flow between shader stages. The number n is a binding index: on the Rust side it appears as shader_location: n in a VertexAttribute, and in WGSL it appears as @location(n) on a parameter or struct field. A location the shader reads but Rust never supplies fails pipeline validation. A location supplied with the wrong data (for example, two same-typed attributes swapped) produces silent garbage. Between the vertex and fragment stages, the rasterizer automatically interpolates @location values using barycentric weights, so the fragment shader receives a smooth blend without any interpolation code of its own.
@builtin(position) is a reserved slot the vertex shader must output. It delivers the vertex's clip-space position as vec4<f32>, which the fixed-function stages use for primitive assembly, clipping, perspective division, and the viewport transform. In the fragment stage, @builtin(position) means something different: the fragment's framebuffer coordinates, with x and y at the pixel center (0.5, 0.5 for the top-left pixel) and z holding depth. The rasterizer derives this value from the vertex positions, but it is not the vertex shader's output passed through unchanged. The two builtins share a name, not a coordinate space.
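The link between the two builtins is fixed-function math. This minimal sketch assumes a viewport covering the whole render target with the default depth range of 0 to 1, and the function name is hypothetical. Perspective division followed by the viewport transform maps a clip-space position to framebuffer coordinates, which is the space the fragment stage's @builtin(position) uses.

```rust
/// Maps a vertex shader's clip-space @builtin(position) to framebuffer
/// coordinates, assuming a full-target viewport with depth range 0..1.
pub fn clip_to_framebuffer(clip: [f32; 4], width: f32, height: f32) -> [f32; 3] {
    let [x, y, z, w] = clip;
    // Perspective division: clip space -> normalized device coordinates (NDC).
    // WebGPU NDC: x and y in [-1, 1] with +y up, z in [0, 1].
    let (nx, ny, nz) = (x / w, y / w, z / w);
    // Viewport transform: NDC -> pixels, with +y pointing down in the framebuffer.
    let fx = (nx + 1.0) * 0.5 * width;
    let fy = (1.0 - ny) * 0.5 * height;
    [fx, fy, nz]
}
```

On an 800x600 target, clip position (0, 0, 0.5, 1) lands at (400, 300) with depth 0.5, and (-1, 1, 0, 1) lands at the top-left corner (0, 0). A fragment shaded near that corner reports its pixel center, such as (0.5, 0.5), not the vertex's exact position.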

The vertex shader produces a struct containing a @builtin(position) output plus any number of @location interpolants. The rasterizer takes these outputs, assembles primitives, and for every pixel inside the triangle computes barycentric coordinates and blends all @location fields. The fragment shader receives the fully interpolated struct and outputs a vec4<f32> color at @location(0), which maps to the pipeline's color attachment target.
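The whole flow can be modeled on the CPU in a few dozen lines. This is a simplified sketch, not how any GPU is implemented. It assumes the vertices are already in pixel coordinates (skipping the clip-space step), uses the hypothetical names `Vert`, `rasterize`, and `fragment_shader`, and simplifies edge handling. It illustrates three points from the paragraph above. Only pixels whose centers fall inside the triangle produce fragments. Every vertex output is blended with the same barycentric weights. The fragment function runs exactly once per fragment.

```rust
/// Per-vertex data after the vertex stage, already in pixel coordinates.
#[derive(Clone, Copy)]
pub struct Vert {
    pub pos: [f32; 2],
    pub color: [f32; 3],
}

/// Twice the signed area of triangle (a, b, c).
fn edge(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// The "fragment shader": receives one interpolated color, returns RGBA.
fn fragment_shader(color: [f32; 3]) -> [f32; 4] {
    [color[0], color[1], color[2], 1.0]
}

/// Rasterizes one triangle into a width x height RGBA image, sampling at pixel
/// centers. Returns the image and the number of fragment-shader invocations.
/// Simplification: a center exactly on an edge counts as inside (real GPUs
/// apply a tie-breaking rule instead).
pub fn rasterize(tri: [Vert; 3], width: usize, height: usize) -> (Vec<[f32; 4]>, usize) {
    let mut image = vec![[0.0_f32; 4]; width * height];
    let mut invocations = 0;
    let area = edge(tri[0].pos, tri[1].pos, tri[2].pos);
    for y in 0..height {
        for x in 0..width {
            let p = [x as f32 + 0.5, y as f32 + 0.5];
            // Barycentric weights of this pixel center.
            let w = [
                edge(tri[1].pos, tri[2].pos, p) / area,
                edge(tri[2].pos, tri[0].pos, p) / area,
                edge(tri[0].pos, tri[1].pos, p) / area,
            ];
            if w.iter().any(|&wi| wi < 0.0) {
                continue; // center lies outside the triangle: no fragment
            }
            // Blend the vertex outputs with the weights for this center.
            let mut color = [0.0_f32; 3];
            for (wi, v) in w.iter().zip(tri.iter()) {
                for c in 0..3 {
                    color[c] += wi * v.color[c];
                }
            }
            image[y * width + x] = fragment_shader(color);
            invocations += 1;
        }
    }
    (image, invocations)
}
```

Consider a right triangle with 4.5-pixel legs on a 4x4 target. Ten of the 16 pixel centers fall inside it, so the fragment function runs 10 times and the remaining six pixels stay untouched.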

For a complete line-by-line walkthrough of our rainbow triangle shader, see Section 4.

WGSL Source Embedding

A common wgpu pattern embeds the shader source in the binary as a Rust string at compile time:

const SHADER_SOURCE: &str = include_str!("shader.wgsl");

include_str! reads the WGSL file (path relative to the current Rust source file) during compilation and inlines it as a &'static str. There is no runtime file I/O; the shader text is part of the binary. When you pass it to device.create_shader_module(), wgpu parses and validates the WGSL on the CPU. Translation to the backend's format (SPIR-V, MSL, or HLSL) happens at that point or when a pipeline using the module is created, depending on the backend. There is nothing to poll. Errors surface through wgpu's error handling, and by default an uncaptured validation error panics on native targets.

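Embedding is a convenience, not a requirement; loading the file at runtime also works. Embedding keeps the binary self-contained and eliminates a class of runtime errors, such as a missing file or an unexpected working directory. Cargo tracks the included file, so editing shader.wgsl triggers a rebuild. wgpu also provides an include_wgsl! macro that applies the same idea and produces a ready-made shader module descriptor.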
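Because the embedded source is an ordinary `&'static str`, plain Rust tests can inspect it before any GPU is involved. In this hypothetical sketch, `declares_entry_point` checks that the entry-point names your pipeline setup refers to actually exist under the expected stage attribute. The source is inlined here in place of `include_str!` so the example is self-contained. The check is a naive text scan, not validation; wgpu still validates the full module when it is created.

```rust
/// Stand-in for what include_str!("shader.wgsl") would produce: a &'static str.
pub const SHADER_SOURCE: &str = r#"
@vertex
fn vs_main(@builtin(vertex_index) i: u32) -> @builtin(position) vec4<f32> {
    // Three vertices from the index alone: (-1, -1), (0, 1), (1, -1).
    let x = f32(i32(i) - 1);
    let y = f32(i32(i & 1u) * 2 - 1);
    return vec4<f32>(x, y, 0.0, 1.0);
}

@fragment
fn fs_main() -> @location(0) vec4<f32> {
    return vec4<f32>(1.0, 0.0, 0.0, 1.0);
}
"#;

/// Returns true if `src` declares `name` as an entry point for `stage`
/// (e.g. "@vertex" or "@fragment"). Deliberately naive: it assumes the stage
/// attribute is followed, after whitespace only, by `fn name(`.
pub fn declares_entry_point(src: &str, stage: &str, name: &str) -> bool {
    let expected = format!("fn {name}(");
    src.match_indices(stage)
        .any(|(i, _)| src[i + stage.len()..].trim_start().starts_with(&expected))
}
```

A test like this fails at `cargo test` time if someone renames `vs_main` in the WGSL file but not in the Rust pipeline code.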
