Shader Basics

What Is A Shader

A shader is a program that runs on the GPU instead of the CPU. Unlike a CPU function, you do not call a shader once. You configure a pipeline, bind data to it, and issue a draw call; the GPU then runs thousands of invocations in parallel, each on a different data element. There is one vertex shader invocation per vertex and one fragment shader invocation per fragment (roughly, per covered pixel).

Shaders are written in [wgsl], the WebGPU Shading Language. wgpu translates WGSL into the shader format the active backend consumes: SPIR-V for Vulkan, MSL for Metal, and HLSL for DirectX 12 (which the DirectX shader compiler then lowers to bytecode such as DXIL). You write one shader; wgpu handles the translation.

WGSL Constraints

WGSL is designed for parallel execution on hardware with severe restrictions:

  • No heap allocation. There is no Box, no Vec, no String. All memory is static and sized at compile time.
  • No recursion. WGSL forbids cycles in the function call graph, because GPUs do not give each invocation a general-purpose call stack.
  • No I/O. No print, no println, no file access, no socket. A shader communicates only through its return values and writes to bound buffers/textures.
  • Static types. f32, i32, u32 for scalars. vec2<T>, vec3<T>, vec4<T> for vectors. mat2x2<T> through mat4x4<T> for matrices, which hold floating-point elements only. Every expression has a known type at compile time. There is no any and no dyn.
  • No arbitrary memory access. You read from structured inputs (vertex attributes, uniform buffers, textures) and write to defined outputs. Memory is laid out contiguously in [buffer slice] regions.

These are not bugs. They follow from the GPU architecture. Every shader invocation runs the same code under the same restrictions, and that uniformity is what lets the hardware run thousands of invocations in parallel.

Shader Entry Points

A shader module contains one or more entry point functions. Each entry point is tagged with an attribute that tells the GPU when to run it and what pipeline stage it belongs to.

@vertex — Vertex Shader Entry Point

Runs once per input [vertex]. The GPU calls this function for every vertex in your draw call.

Mandatory output: @builtin(position) vec4<f32>, the [clip space] position that the GPU uses for [primitive] assembly and rasterization. A vertex entry point without this output is rejected during validation.

Optional outputs: Any number of @location(n) values that flow to the fragment shader. Color, UV coordinates, normals: everything the fragment shader needs is passed through the vertex shader output.

@fragment — Fragment Shader Entry Point

Runs once per [fragment] produced by the rasterizer. For a triangle covering 500 pixels on screen, the fragment shader runs roughly 500 times; the exact count depends on multisampling and on extra invocations along triangle edges.

Input: Interpolated values from the vertex shader. If the vertex shader output @location(0) color: vec3<f32>, the fragment shader receives that same @location(0) with hardware-interpolated values.

Output: @location(0) vec4<f32> — the final RGBA color written to the [framebuffer].

The Location Contract


LOCATION BINDING IS THE CRITICAL LINK BETWEEN RUST AND WGSL

Every value flowing between Rust buffers and WGSL shader functions is tied together by a numeric [shader location] label. The number on the Rust side must match the number on the WGSL side.

Rust: VertexAttribute { shader_location: 0, ... }

WGSL: @location(0) color: vec3<f32>

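If the numbers differ, the shader input is fed from the wrong attribute, or from none at all. wgpu validates the pairing when the render pipeline is created: a shader input with no matching attribute, or with an incompatible format, produces a validation error. What validation cannot catch is two attributes of the same type with swapped locations, for example a position and a color that are both vec3<f32>. That pipeline is valid, and the result is silently wrong output. Getting the numbers right remains the developer's responsibility.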
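The Rust half of the contract can be sketched without pulling in wgpu itself. The sketch below is illustrative only: AttributeSketch is a hypothetical stand-in for the offset and shader_location fields of wgpu's VertexAttribute, and the Vertex struct assumes the position-then-color layout used by the rainbow triangle.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the two fields of wgpu's VertexAttribute
/// that matter for the location contract. Not the real wgpu type.
#[derive(Debug, PartialEq)]
struct AttributeSketch {
    /// Byte offset of the attribute inside one vertex.
    offset: u64,
    /// Must equal the n in the matching WGSL @location(n).
    shader_location: u32,
}

/// One vertex as stored in the buffer: position, then color.
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Describes where each shader input lives inside a Vertex.
fn vertex_attributes() -> [AttributeSketch; 2] {
    [
        // Pairs with `@location(0) position: vec3<f32>` in the shader.
        AttributeSketch { offset: 0, shader_location: 0 },
        // Pairs with `@location(1) color: vec3<f32>` in the shader.
        AttributeSketch {
            offset: size_of::<[f32; 3]>() as u64,
            shader_location: 1,
        },
    ]
}

/// Distance in bytes from one vertex to the next.
fn vertex_stride() -> u64 {
    size_of::<Vertex>() as u64
}
```

In this layout the color attribute starts 12 bytes into each 24-byte vertex. Swapping the two shader_location values would still describe two valid vec3 attributes, which is exactly the kind of mismatch that validation cannot flag.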


Interpolation Mechanism

Between the vertex shader and the fragment shader, the [rasterizer] performs a computation that most graphics tutorials treat as magic. It is not magic. It is [interpolation].

For every @location(n) value the vertex shader outputs, the rasterizer computes a weighted blend of the three vertex values across the triangle:

fragment_value = w0 * vertex0_value + w1 * vertex1_value + w2 * vertex2_value

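where w0 + w1 + w2 = 1.0 and the weights are [barycentric coordinates] computed from the fragment's position inside the triangle. WGSL's default interpolation is perspective-correct, which also factors each vertex's w into the weights; when every vertex has w = 1.0, as in this document's triangle, it reduces to exactly this formula.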
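To make the weights less abstract, here is a minimal CPU-side sketch of how barycentric weights can be derived from a 2D position. It uses the standard signed-area (edge function) formulation; the function names are illustrative, and real rasterizers do this in hardware with their own precision and fill rules.

```rust
type Vec2 = [f32; 2];

/// Twice the signed area of the triangle (a, b, c).
fn edge(a: Vec2, b: Vec2, c: Vec2) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// Barycentric weights [w0, w1, w2] of point p in triangle (v0, v1, v2).
/// Each weight is the area of the sub-triangle opposite its vertex,
/// divided by the area of the whole triangle, so the weights sum to 1.
/// Assumes the triangle is not degenerate (its area is non-zero).
fn barycentric(p: Vec2, v0: Vec2, v1: Vec2, v2: Vec2) -> [f32; 3] {
    let area = edge(v0, v1, v2);
    [
        edge(v1, v2, p) / area,
        edge(v2, v0, p) / area,
        edge(v0, v1, p) / area,
    ]
}
```

A point sitting exactly on vertex 0 gets weights (1, 0, 0), so the fragment there receives vertex 0's value unchanged. Points closer to the opposite edge get a smaller w0.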

This interpolation costs you no code. The rasterizer supplies the barycentric weights, and the GPU blends every vertex shader output before the fragment shader runs, either in fixed-function hardware or in instructions the driver inserts. The fragment shader receives pre-blended values and does not need to know how they were computed. The cost is small but not zero: every additional @location output is more data to carry between the two stages.

Concrete Shader Walkthrough

This is the complete shader for the rainbow triangle. Every line is explained below.

struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) vertex_color: vec3<f32>,
};

@vertex
fn vs_main(
    @location(0) position: vec3<f32>,
    @location(1) color: vec3<f32>,
) -> VertexOutput {
    var out: VertexOutput;
    out.clip_position = vec4<f32>(position, 1.0);
    out.vertex_color = color;
    return out;
}

@fragment
fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
    return vec4<f32>(input.vertex_color, 1.0);
}

Line by line

struct VertexOutput { ... } — The interface between vertex and fragment stages. This struct defines everything the vertex shader sends downstream. The rasterizer interpolates its @location fields before handing them to the fragment shader.

@builtin(position) clip_position: vec4<f32> — The mandatory clip-space position output. The @builtin(position) annotation tells the GPU this value goes to the primitive assembly / rasterizer pipeline, not to another shader stage. The GPU reads this to determine where each vertex lands on screen.

@location(0) vertex_color: vec3<f32> — An interpolant flowing from vertex to fragment stage. The @location(0) annotation labels this value with location index 0. Any @location(0) output here becomes the @location(0) input to the fragment shader.

@vertex fn vs_main(...) — The vertex shader entry point. The @vertex attribute marks this as the function the vertex pipeline stage calls.

@location(0) position: vec3<f32> — Vertex buffer input at location 0. In Rust, the vertex buffer's first attribute is declared with shader_location: 0. This is the first half of the location contract: the Rust buffer layout and WGSL input must agree.

@location(1) color: vec3<f32> — Vertex buffer input at location 1. The second vertex attribute in the buffer. Each vertex stores two values: a 3-component position and a 3-component color, contiguous in memory.

var out: VertexOutput; — Local variable holding the shader output. WGSL requires explicit variable declarations.

out.clip_position = vec4<f32>(position, 1.0); — Wraps the 3D position in a [homogeneous coordinates] vec4 by appending w = 1.0. With w = 1.0 the GPU's later divide by w leaves x, y, and z unchanged. See [coordinate-systems.md] for why that is the right choice for our triangle.

out.vertex_color = color; — Passes the vertex color through to the fragment shader. No transformation needed — the color is already the final per-vertex color. The rasterizer will blend across the triangle surface.

@fragment fn fs_main(input: VertexOutput) -> ... — The fragment shader entry point. It receives one input struct per fragment. This struct contains the rasterizer's pre-interpolated values.

input.vertex_color — The color value, already blended by the rasterizer. If the current fragment has barycentric weights 0.7 for the red vertex, 0.2 for the green vertex, and 0.1 for the blue vertex, this value is (0.7*1.0 + 0.2*0.0 + 0.1*0.0, 0.7*0.0 + 0.2*1.0 + 0.1*0.0, 0.7*0.0 + 0.2*0.0 + 0.1*1.0) = (0.7, 0.2, 0.1). The interpolation happened before the fragment shader ran; the fragment shader does not compute it.

-> @location(0) vec4<f32> — The fragment shader output signature. @location(0) maps to the first color target of the render [pipeline], which is color attachment 0 of the render pass. It is the pixel color written to the framebuffer.

vec4<f32>(input.vertex_color, 1.0) — Wraps the interpolated RGB color in vec4 by appending alpha = 1.0 (fully opaque). The framebuffer expects a 4-component color.
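The whole walkthrough can be mirrored on the CPU as a rough mental model. This is a sketch, not how wgpu or the GPU implements anything: vs_main and fs_main copy the WGSL bodies above, while interpolate_color is a hypothetical stand-in for the rasterizer that takes the barycentric weights as given.

```rust
/// Mirrors the WGSL VertexOutput struct.
#[derive(Clone, Copy, Debug, PartialEq)]
struct VertexOutput {
    clip_position: [f32; 4],
    vertex_color: [f32; 3],
}

/// Mirrors vs_main: append w = 1.0 and pass the color through.
fn vs_main(position: [f32; 3], color: [f32; 3]) -> VertexOutput {
    VertexOutput {
        clip_position: [position[0], position[1], position[2], 1.0],
        vertex_color: color,
    }
}

/// Hypothetical stand-in for the rasterizer: blends the three
/// vertex colors using the supplied barycentric weights.
fn interpolate_color(weights: [f32; 3], outputs: [VertexOutput; 3]) -> [f32; 3] {
    let mut blended = [0.0_f32; 3];
    for (w, out) in weights.iter().zip(outputs.iter()) {
        for channel in 0..3 {
            blended[channel] += w * out.vertex_color[channel];
        }
    }
    blended
}

/// Mirrors fs_main: append alpha = 1.0 to the interpolated color.
fn fs_main(vertex_color: [f32; 3]) -> [f32; 4] {
    [vertex_color[0], vertex_color[1], vertex_color[2], 1.0]
}
```

Feeding in a red, a green, and a blue vertex with weights (0.7, 0.2, 0.1) reproduces the (0.7, 0.2, 0.1) color from the walkthrough, and fs_main then emits it with alpha 1.0.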

WGSL Source Embedding

In wgpu, the shader source code lives as a Rust string, embedded at compile time:

const SHADER_SOURCE: &str = include_str!("shader.wgsl");

include_str! reads the WGSL file during Rust compilation and inlines it as a &'static str. There is no runtime file I/O; the shader text is part of the binary. When you create the shader module via device.create_shader_module(), wgpu parses and validates the WGSL. The call returns the module directly, with no [device poll] required. Translation to the backend's shader format (SPIR-V, MSL, or HLSL) happens either at this point or later, when a pipeline that uses the module is created, depending on the backend. Validation errors are reported through the [device]'s error handling rather than through a return value.

Embedding the source at compile time is a common wgpu pattern. The binary ships without separate shader files, so a missing or misplaced file cannot cause a failure at runtime.