marsultor/learn-wgpu

Krishna Ayyalasomayajula de38f526b9 docs: append sections S4-S6 (shaders, vertex data, render pipeline)

2026-05-30 17:44:31 -05:00

Building a Rainbow Triangle

S1: What We're Building

We're creating a window containing a single triangle with smoothly blended colors:

Red at the bottom-left corner, blue at the bottom-right corner, and green at the top vertex. You do not compute the gradient between the vertices yourself; the GPU rasterizer interpolates it in hardware. You provide three vertices, each carrying a position and a color. The rasterizer determines every pixel covered by the triangle and computes that pixel's color as a weighted blend of the three vertex colors, where the weights are the pixel's barycentric coordinates (the closer a pixel is to a vertex, the more that vertex's color contributes). The result is a smooth rainbow gradient across a single primitive. We do not need a texture, a colormap, or any branching in the fragment shader: just three colored vertices and the default interpolation the rasterizer applies to every vertex shader output.

If you haven't read the concept overview, do so now. Coordinate systems explains how the GPU positions geometry. Shader basics covers the GPU programs that drive rendering.

S2: The winit Application and Event Loop

New concept: event-driven windowing. winit is the bridge between your Rust code and the display server (X11 or Wayland on Linux). Think of it like epoll or kqueue but for windows, input, and display lifecycle events instead of file descriptors.

The entire program runs on the tokio async runtime — wgpu's adapter queries and device creation are async, and the runtime is the natural home for the main event loop.

Architecture Overview

main() is #[tokio::main] async fn — the entry point runs on the tokio runtime, giving us access to tokio's task scheduler and I/O facilities.
tokio::task::spawn_blocking — winit's event_loop.run_app() is synchronous and owns the display server connection. Blocking a tokio worker thread with an indefinite sync call would starve the tasks scheduled on it. We offload the blocking event loop to a dedicated thread, then await the join handle.
Handle::block_on() in resumed() — wgpu initialization (adapter and device queries) is async, but winit's resumed() handler is synchronous. We bridge the two execution models exactly once at startup. This initial GPU setup is a one-time cost on the order of ~50ms of wall time, depending on the driver.
Arc<Window> — shared reference count to the window, needed because both winit event handlers and wgpu surface state must hold a reference to the same window object across the event loop boundary.
ControlFlow::Poll — the event loop keeps iterating instead of sleeping until the next event. Poll does not generate RedrawRequested by itself; the redraw handler calls window.request_redraw() after each frame, which gives us a continuous render loop without a separate timer. The surface's present mode controls the actual vsync behavior.

Dependencies

Add these to your Cargo.toml:

wgpu = "29"
winit = "0.30"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
bytemuck = { version = "1", features = ["derive"] }
log = "0.4"
simple_logger = "5"

wgpu — the GPU abstraction layer. Manages device lifecycles, shaders, buffers, pipelines, and command encoding.
winit — cross-platform window creation and event dispatch. Owns the display server connection.
tokio — async runtime for the main loop and all GPU queries.
bytemuck — zero-copy casting between Rust structs and byte slices. Required for uploading vertex data to GPU buffers without manual serialization.
log / simple_logger — structured logging. wgpu and winit emit diagnostic messages via log when misconfigurations or driver issues are detected.

Complete Code

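use std::sync::Arc;
use tokio::runtime::Handle;
use winit::application::ApplicationHandler;
use winit::dpi::LogicalSize;
use winit::event::WindowEvent;
use winit::event_loop::{ActiveEventLoop, ControlFlow, EventLoop};
use winit::platform::wayland::EventLoopBuilderExtWayland;
use winit::window::{Window, WindowId};

#[tokio::main]
async fn main() {
    simple_logger::init_with_level(log::Level::Debug).unwrap();

    let handle = Handle::current();

    tokio::task::spawn_blocking(move || {
        // EventLoop is not Send, so it is built on the thread that runs it.
        // winit needs an explicit opt-in to run off the main thread (Linux
        // only); this flag is shared by the X11 and Wayland backends.
        let mut builder = EventLoop::builder();
        EventLoopBuilderExtWayland::with_any_thread(&mut builder, true);
        let event_loop = builder.build().unwrap();

        event_loop
            .run_app(&mut App {
                handle,
                window: None,
                state: None,
            })
            .unwrap();
    })
    .await
    .unwrap();
}

struct App {
    handle: Handle,
    window: Option<Arc<Window>>,
    state: Option<State>,
}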

impl ApplicationHandler<()> for App {
    fn resumed(&mut self, event_loop_ctl: &ActiveEventLoop) {
        let window = Arc::new(
            event_loop_ctl
                .create_window(
                    Window::default_attributes()
                        .with_inner_size(LogicalSize::new(800.0, 600.0))
                        .with_title("Rainbow Triangle"),
                )
                .unwrap(),
        );
        event_loop_ctl.set_control_flow(ControlFlow::Poll);
        self.window = Some(window.clone());

        // State::new returns a Result, so it is unwrapped exactly once.
        self.state = Some(
            self.handle
                .block_on(State::new(window))
                .expect("Failed to create wgpu State"),
        );
    }

    fn window_event(
        &mut self,
        event_loop_ctl: &ActiveEventLoop,
        _window_id: WindowId,
        event: WindowEvent,
    ) {
        let Some(state) = self.state.as_mut() else { return };
        let Some(window) = self.window.as_ref() else { return };

        match event {
            WindowEvent::Resized(size) => state.resize(window, size),
            WindowEvent::CloseRequested => event_loop_ctl.exit(),
            WindowEvent::RedrawRequested => {
                state.render();
                window.request_redraw();
            }
            _ => {}
        }
    }

    fn exiting(&mut self, event_loop_ctl: &ActiveEventLoop) {
        event_loop_ctl.exit();
    }
}

Why spawn_blocking: run_app() blocks until the window closes. If we called it directly inside an async task, it would tie up a runtime worker thread for the life of the program and stall any tasks scheduled on that worker. By running it on a blocking thread (tokio::task::spawn_blocking), the tokio runtime remains free for other async work and future background tasks.

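Why Handle::block_on: wgpu's request_adapter and request_device are async functions (the API mirrors WebGPU, where these calls are asynchronous), so something has to drive their futures to completion. winit's resumed() is a plain synchronous function and cannot .await them. Handle::block_on runs the future on the current thread, using the runtime behind the handle, and returns the output once the future completes. This happens once at startup and costs on the order of ~50ms, depending on the driver.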
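To make the sync-to-async bridge concrete, here is a minimal, std-only sketch of what a block_on call does: poll a future on the current thread and park until it is ready. It illustrates the idea only and is not tokio's implementation. ThreadWaker, block_on, and fake_gpu_init are hypothetical names, and fake_gpu_init merely stands in for an async initializer such as State::new.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future signals progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `fut` to completion on the calling thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for an async initializer (assumption, not wgpu).
async fn fake_gpu_init(width: u32, height: u32) -> Result<(u32, u32), String> {
    if width == 0 || height == 0 {
        return Err("surface size must be non-zero".to_string());
    }
    Ok((width, height))
}

fn main() {
    // A synchronous caller obtains the async result without `.await`.
    let state = block_on(fake_gpu_init(800, 600)).expect("init failed");
    println!("initialized with {state:?}");
}
```

A real runtime adds I/O and timer drivers on top of this loop, which is why the guide borrows tokio's Handle rather than rolling its own.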

Why ControlFlow::Poll: winit supports ControlFlow::Poll (keep iterating the event loop without waiting) and ControlFlow::Wait (sleep until the next event arrives). A graphics application needs a steady render loop, so we do not want the loop to go idle. Poll by itself does not emit RedrawRequested, however; it only keeps the loop running. The redraws come from window.request_redraw(), which we call at the end of every RedrawRequested handler so that the next frame is always scheduled.

Why request_redraw(): After presenting a frame to the display, we ask winit to schedule the next RedrawRequested frame. This creates an explicit render loop: render → present → request redraw → render → repeat. The rate is governed by the swapchain present mode.
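The render, request redraw, render cycle can be modeled without a window. The sketch below is a toy event queue, not winit: Event, Model, and run are hypothetical names, and "rendering" is just a counter. It shows how a handler that re-queues RedrawRequested keeps the loop alive until a close event arrives.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    RedrawRequested,
    CloseRequested,
}

struct Model {
    frames_rendered: u32,
    /// Number of frames after which the simulated user closes the window.
    close_after: u32,
}

/// Processes events until CloseRequested; returns the number of frames drawn.
fn run(mut model: Model) -> u32 {
    // The first redraw is what the windowing system sends when the window appears.
    let mut queue = VecDeque::from([Event::RedrawRequested]);
    while let Some(event) = queue.pop_front() {
        match event {
            Event::RedrawRequested => {
                model.frames_rendered += 1; // stands in for state.render()
                if model.frames_rendered >= model.close_after {
                    // Simulate the user closing the window.
                    queue.push_back(Event::CloseRequested);
                } else {
                    // Stands in for window.request_redraw().
                    queue.push_back(Event::RedrawRequested);
                }
            }
            Event::CloseRequested => break,
        }
    }
    model.frames_rendered
}

fn main() {
    let frames = run(Model { frames_rendered: 0, close_after: 3 });
    println!("rendered {frames} frames before exit");
}
```

If the handler never re-queued a redraw, the queue would drain after the first frame, which is the toy equivalent of a window that draws once and then freezes.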

Why exiting(): winit calls exiting() as the event loop shuts down, after exit() has been requested (here, from the CloseRequested handler). It is the last callback the application receives, which makes it the natural place to flush pending work or drop GPU state explicitly. In the code above the handler only calls exit() again, which has no further effect at that point; the GPU resources are released when App is dropped.

S3: Connecting to the GPU — The Init Chain

New concept: 5-layer GPU connection. Each layer adds a capability:

Instance — opens a connection to the graphics driver. On Vulkan this loads the Vulkan loader and registers instance-level extensions. On WebGL this picks the browser GPU context.
Surface — binds the instance to a specific window's swapchain. The surface is the wgpu representation of the window's display buffer.
Adapter — selects the physical GPU hardware. An adapter wraps the actual driver + silicon pair (e.g., Mesa RADV on AMD, NVIDIA driver on NVIDIA silicon).
Device + Queue — the device owns all GPU resources (buffers, textures, shaders, pipelines). The queue is the submission channel: you encode work into command buffers and submit them to the queue.
SurfaceConfiguration — allocates the swapchain framebuffers for this window at a specific resolution and pixel format.

The State Struct

struct State {
    surface: wgpu::Surface<'static>,
    device: wgpu::Device,
    queue: wgpu::Queue,
    config: wgpu::SurfaceConfiguration,
    pipeline: wgpu::RenderPipeline,
    vertex_buffer: wgpu::Buffer,
}

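surface — connects to the window's display buffer. The 'static lifetime is possible because create_surface receives an Arc<Window>: the surface keeps its own reference to the window, so the window cannot be destroyed while the surface is alive. The surface mediates all swapchain operations.
device — the logical GPU handle through which resources are created. Every buffer, texture, shader module, and pipeline in this guide is created through the device. Resources are reference-counted internally, so each one is freed once your handle and any in-flight GPU work that uses it are gone.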
queue — the command submission channel. You encode a frame's worth of work into a command buffer, then submit that buffer to the queue. The queue pushes work to the GPU hardware.
config — holds the surface's current width, height, pixel format, and present mode. When the window is resized, we reconfigure the surface with updated dimensions.
pipeline — the compiled render pipeline. A render pipeline is an immutable configuration combining shaders, a vertex buffer layout, a primitive topology, and a color target setup. Switching pipelines has a cost, so applications typically keep the number of pipelines small and group the draw calls that share one.
vertex_buffer — GPU memory holding our vertex data. The GPU reads position and color data directly from this buffer during the vertex shader stage.

Complete `State::new()` Implementation

// --- Vertex type and data ---

#[repr(C)]
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

const VERTICES: &[Vertex] = &[
    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
];

impl State {
    async fn new(window: Arc<Window>) -> Result<Self, String> {
        // Step 1: Instance — connection to the graphics driver
        let instance = wgpu::Instance::default();

        // Step 2: Surface — binds our window to the GPU's swapchain.
        // The Arc is cloned because `window` is used again in Step 5.
        let surface = instance
            .create_surface(window.clone())
            .map_err(|e| format!("Failed to create surface: {:?}", e))?;

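        // Step 3: Adapter — selects the physical GPU
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions {
                power_preference: wgpu::PowerPreference::HighPerformance,
                force_fallback_adapter: false,
                // Only consider adapters that can present to our surface.
                compatible_surface: Some(&surface),
            })
            .await
            // Recent wgpu releases return a Result here rather than an Option.
            .map_err(|e| {
                format!("No GPU adapter found ({:?}). Ensure Vulkan drivers are installed.", e)
            })?;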

        // Step 4: Device + Queue — resource owner + command submission
        // Recent wgpu releases take only the descriptor (no trace-path argument).
        let (device, queue) = adapter
            .request_device(&wgpu::DeviceDescriptor::default())
            .await
            .map_err(|e| format!("Failed to request device: {:?}", e))?;

        // Step 5: SurfaceConfiguration — allocates swapchain framebuffers
        let size = window.inner_size();
        let surface_caps = surface.get_capabilities(&adapter);
        let format = surface_caps.formats.iter()
            .find(|f| f.is_srgb())
            .copied()
            .unwrap_or(surface_caps.formats[0]);

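        // Mailbox is not available on every platform; Fifo always is.
        let present_mode = if surface_caps
            .present_modes
            .contains(&wgpu::PresentMode::Mailbox)
        {
            wgpu::PresentMode::Mailbox
        } else {
            wgpu::PresentMode::Fifo
        };

        let config = wgpu::SurfaceConfiguration {
            // We only render into the surface; we never sample from it.
            usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
            format,
            width: size.width.max(1),
            height: size.height.max(1),
            present_mode,
            desired_maximum_frame_latency: 2,
            alpha_mode: surface_caps.alpha_modes[0],
            view_formats: vec![format.add_srgb_suffix()],
        };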
        surface.configure(&device, &config);

        // Step 6: Compile the shader module
        let shader_module = device.create_shader_module(
            wgpu::ShaderModuleDescriptor {
                label: Some("Rainbow Triangle Shader"),
                source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
            }
        );

        // Step 7: Upload vertex data to GPU memory
        use wgpu::util::DeviceExt;
        let vertex_buffer = device.create_buffer_init(
            &wgpu::util::BufferInitDescriptor {
                label: Some("Vertex Buffer"),
                contents: bytemuck::cast_slice(VERTICES),
                usage: wgpu::BufferUsages::VERTEX,
            }
        );

        // Step 8: Create the render pipeline
        let vertex_buffer_layout = wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<Vertex>() as u64,
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    format: wgpu::VertexFormat::F32x3,
                    shader_location: 0,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as u64,
                    format: wgpu::VertexFormat::F32x3,
                    shader_location: 1,
                },
            ],
        };

        let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
            label: Some("Triangle Pipeline"),
            layout: None,
            vertex: wgpu::VertexState {
                module: &shader_module,
                entry_point: Some("vs_main"),
                buffers: &[vertex_buffer_layout],
                compilation_options: Default::default(),
            },
            primitive: wgpu::PrimitiveState {
                topology: wgpu::PrimitiveTopology::TriangleList,
                strip_index_format: None,
                front_face: wgpu::FrontFace::Ccw,
                cull_mode: Some(wgpu::Face::Back),
                unclipped_depth: false,
                polygon_mode: wgpu::PolygonMode::Fill,
                conservative: false,
            },
            depth_stencil: None,
            multisample: wgpu::MultisampleState {
                count: 1,
                mask: !0,
                alpha_to_coverage_enabled: false,
            },
            fragment: Some(wgpu::FragmentState {
                module: &shader_module,
                entry_point: Some("fs_main"),
                targets: &[Some(wgpu::ColorTargetState {
                    format: config.format,
                    blend: None,
                    write_mask: wgpu::ColorWrites::ALL,
                })],
                compilation_options: Default::default(),
            }),
            multiview_mask: None,
            cache: None,
        });

        Ok(Self {
            surface,
            device,
            queue,
            config,
            pipeline,
            vertex_buffer,
        })
    }
}

Init Steps Explained

Step 1 — Instance: Instance::default() opens a connection to the graphics driver on the current platform. On Linux with Vulkan, this loads libvulkan.so and creates a Vulkan VkInstance. On Windows, it loads vulkan-1.dll. The instance is the foundational wgpu object — every other wgpu operation requires it.

Step 2 — Surface: instance.create_surface(window) binds the wgpu instance to the winit Window. This tells the GPU: "the pixels of this window will be the output of my rendering." In Vulkan terms, this creates the VkSurfaceKHR; the swapchain itself (VkSwapchainKHR) is created later, when the surface is configured in Step 5. The surface must match the window platform type exactly (X11, Wayland, Windows, macOS, etc.).

Step 3 — Adapter: request_adapter() queries available GPUs and returns the best match for the given options. With PowerPreference::HighPerformance, wgpu prefers a discrete GPU over an integrated one on hybrid systems (e.g., NVIDIA + Intel Optimus). The compatible_surface option restricts the search to adapters that can present to a given surface: pass Some(&surface) to get that guarantee. With None, wgpu may return an adapter that cannot present to the window, and the problem only shows up later when the surface is configured.

Step 4 — Device + Queue: request_device() allocates the logical GPU resource manager and its submission queue. The device tracks all GPU memory and validates API calls. The queue is the submission endpoint — every rendered frame becomes a command buffer that is submitted to this queue. On Vulkan, the device corresponds to VkDevice and the queue to a VkQueue.

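Step 5 — SurfaceConfiguration: This allocates the swapchain framebuffers. We negotiate the pixel format with the driver (preferring an sRGB format for correct color display), pick the window dimensions (clamped to at least 1x1, because a zero-sized surface is invalid and some platforms report a size of zero while the window is minimized), and select the present mode. PresentMode::Mailbox keeps only the most recently finished frame queued for the next vertical blank, which avoids tearing and keeps latency low, but it is not supported on every platform and backend. Configuring a surface with an unsupported present mode fails, so check surface_caps.present_modes first; PresentMode::Fifo (classic vsync) is always available. desired_maximum_frame_latency: 2 allows up to two frames to be queued ahead of presentation, smoothing out frame time spikes at the cost of a little latency.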
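The three negotiation rules in Step 5 (prefer sRGB, fall back to a present mode that is always available, never configure a zero-sized surface) can be sketched without a GPU. The types below are simplified stand-ins, not wgpu's: PresentMode, Format, BGRA8, BGRA8_SRGB, and the three helper functions are all hypothetical names chosen for this illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PresentMode {
    Fifo,
    Mailbox,
    Immediate,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Format {
    name: &'static str,
    srgb: bool,
}

const BGRA8: Format = Format { name: "Bgra8Unorm", srgb: false };
const BGRA8_SRGB: Format = Format { name: "Bgra8UnormSrgb", srgb: true };

/// Prefer the first sRGB format; otherwise take whatever is listed first.
fn pick_format(supported: &[Format]) -> Option<Format> {
    supported
        .iter()
        .copied()
        .find(|f| f.srgb)
        .or_else(|| supported.first().copied())
}

/// Use Mailbox when the surface offers it, else fall back to Fifo.
fn pick_present_mode(supported: &[PresentMode]) -> PresentMode {
    if supported.contains(&PresentMode::Mailbox) {
        PresentMode::Mailbox
    } else {
        PresentMode::Fifo
    }
}

/// A surface must be at least 1x1, even while the window is minimized.
fn clamp_size(width: u32, height: u32) -> (u32, u32) {
    (width.max(1), height.max(1))
}

fn main() {
    let format = pick_format(&[BGRA8, BGRA8_SRGB]).expect("no formats");
    let mode = pick_present_mode(&[PresentMode::Fifo, PresentMode::Immediate]);
    let size = clamp_size(0, 0);
    println!("{} / {:?} / {:?}", format.name, mode, size);
}
```

In the real code the supported lists come from surface.get_capabilities(&adapter); here they are hard-coded so the selection logic can be read in isolation.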

Steps 6 through 8 — shader module compilation, vertex buffer upload, and render pipeline assembly — will be explored in detail in the next sections.

S4: Writing the Shaders

New concept: shaders are GPU programs. A shader is a function or set of functions that runs on the GPU, compiled once at pipeline creation time, then executed thousands of times in parallel. Each invocation operates on different data but follows the same instruction sequence. There is no heap allocation, no recursion, and no I/O, and the shaders in this guide share no mutable state. GPUs execute invocations in groups (called warps or wavefronts) that run in lockstep: if invocations within a group take different branches, the group executes both paths one after the other with the inactive lanes masked off. This is why you write shaders differently from CPU code: you optimize for parallelism and avoid divergent branches.

A shader module can contain multiple entry points. A render pipeline always needs a vertex shader, and it needs a fragment shader to produce color output, so this guide uses both. The vertex shader runs once per vertex. The fragment shader runs once per fragment, that is, once per pixel covered by the rasterized primitive.

Key insight #1 — Interpolation is free hardware: The vertex shader outputs per-vertex colors at @location(0). The rasterizer automatically interpolates them across the triangle surface using barycentric coordinates. The fragment shader just returns whatever it receives. The rainbow gradient is not programmed — it is a consequence of the pipeline architecture. You supply colors at three points; the hardware computes every color in between at zero shader cost.
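What the rasterizer does for each covered pixel can be reproduced on the CPU. The sketch below computes barycentric weights for a point inside the guide's triangle and blends the three vertex colors with them. It is a simplified model: the function names (edge, barycentric, interpolate) are hypothetical, and it ignores perspective correction, which makes no difference here because every vertex has w = 1.

```rust
type Vec2 = [f32; 2];
type Rgb = [f32; 3];

// The guide's triangle: positions (x, y) and colors.
const A: Vec2 = [-0.5, -0.5]; // red
const B: Vec2 = [0.5, -0.5]; // blue
const C: Vec2 = [0.0, 0.5]; // green
const COLORS: [Rgb; 3] = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]];

/// Twice the signed area of triangle (a, b, c).
fn edge(a: Vec2, b: Vec2, c: Vec2) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// Barycentric weights of `p` with respect to triangle (a, b, c).
/// Each weight is the area of the sub-triangle opposite a vertex,
/// divided by the area of the whole triangle.
fn barycentric(p: Vec2, a: Vec2, b: Vec2, c: Vec2) -> [f32; 3] {
    let area = edge(a, b, c);
    [
        edge(b, c, p) / area,
        edge(c, a, p) / area,
        edge(a, b, p) / area,
    ]
}

/// Weighted blend of the three vertex colors.
fn interpolate(weights: [f32; 3], colors: [Rgb; 3]) -> Rgb {
    let mut out = [0.0; 3];
    for (w, color) in weights.iter().zip(colors.iter()) {
        for ch in 0..3 {
            out[ch] += w * color[ch];
        }
    }
    out
}

fn main() {
    // Midpoint of the bottom edge: halfway between red and blue.
    let weights = barycentric([0.0, -0.5], A, B, C);
    let color = interpolate(weights, COLORS);
    println!("weights = {weights:?}, color = {color:?}");
}
```

At a vertex the weights collapse to (1, 0, 0), so the color there is exact; everywhere else the three weights sum to 1 and produce the gradient.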

Why WGSL: WebGPU Shading Language (WGSL) is the single source format. wgpu translates it at runtime into what each backend consumes: SPIR-V for Vulkan, MSL for Metal, HLSL for Direct3D 12. You write one shader file and wgpu produces the right form for every backend.

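Why include_str!("shader.wgsl"): This Rust macro embeds the file contents at compile time. The shader source becomes a string literal inside your binary. At runtime there is zero file I/O. No paths to resolve, no loading failures, no async reads. If the file is missing, the build fails instead of the program failing at runtime. The WGSL itself is not checked at build time, however: syntax and type errors are reported when create_shader_module parses the source at runtime.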

The Complete Shader

Create shader.wgsl in the src/ directory, next to main.rs (include_str! resolves its path relative to the source file that invokes it):

struct VertexOutput {
    @builtin(position) clip_position: vec4<f32>,
    @location(0) vertex_color: vec3<f32>,
};

@vertex
fn vs_main(
    @location(0) position: vec3<f32>,
    @location(1) color: vec3<f32>,
) -> VertexOutput {
    var out: VertexOutput;
    out.clip_position = vec4<f32>(position, 1.0);
    out.vertex_color = color;
    return out;
}

@fragment
fn fs_main(input: VertexOutput) -> @location(0) vec4<f32> {
    return vec4<f32>(input.vertex_color, 1.0);
}

Line-by-Line Walkthrough

struct VertexOutput { ... } — Defines the data flowing between the vertex shader and the fragment shader. This struct is not a Rust type and not a buffer layout — it is the output contract of the vertex shader that the rasterizer carries through to the fragment shader.

@builtin(position) clip_position: vec4<f32> — @builtin(position) is a reserved GPU output slot. Every vertex shader must produce a vec4<f32> at this slot. This value is the vertex position in clip space. The GPU uses it for clipping and for perspective division (dividing x, y, z by w to produce NDC, normalized device coordinates). In our triangle, the w component is 1.0, so perspective division is the identity operation and our positions are already in the right space.

@location(0) vertex_color: vec3<f32> — @location(0) marks this field for interpolation. Any @location(n) output from the vertex shader that is not a builtin is automatically interpolated by the rasterizer using barycentric weights. At each vertex, the value is exact. Inside the triangle, it is the weighted blend of all three vertex values. The fragment shader receives a different vertex_color for every pixel, without any manual interpolation code.

Key insight #2 — THE LOCATIONS MUST MATCH: shader_location: 0 in Rust's VertexAttribute MUST equal @location(0) in WGSL's parameter annotation. If the shader declares an input location that no vertex attribute provides, wgpu rejects the pipeline at creation time with a validation error. The dangerous case is the one validation cannot see: if two attributes of the same format swap their locations, the pipeline is valid, but the shader reads colors as positions and positions as colors. That is not a type error or a runtime panic; the output is simply wrong.

@vertex fn vs_main(...) — @vertex declares this function as the vertex shader entry point. The function is invoked once per vertex in the draw call. For our triangle with three vertices, vs_main runs exactly three times.

@location(0) position: vec3<f32> — This input parameter receives data from the vertex buffer mapped by shader_location: 0. In our Rust VertexBufferLayout, the first VertexAttribute reads 3 floats at offset 0 and delivers them to the shader at location 0. This is the raw NDC position.

@location(1) color: vec3<f32> — The second vertex buffer attribute mapped to location 1. Reads 3 floats at the offset after the position (12 bytes into each vertex) — the per-vertex RGB color.

var out: VertexOutput; — Local variable declaration. WGSL requires explicit variable bindings. var creates a mutable local.

out.clip_position = vec4<f32>(position, 1.0); — Converts the vec3 input into homogeneous coordinates by appending w = 1.0. This promotes the position from 3D to clip space. With w = 1.0, perspective division (x/w, y/w, z/w) leaves the coordinates unchanged. If we were using perspective projection, the vertex shader would compute a nontrivial w value from the depth.
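The two fixed-function steps that follow the vertex shader, perspective division and the viewport transform, are simple enough to write out. The sketch below assumes an 800x600 pixel framebuffer (the physical size may differ from the logical window size on scaled displays) and uses hypothetical function names. It follows the WebGPU convention that NDC y points up while framebuffer y points down from the top-left corner.

```rust
/// Clip space -> normalized device coordinates: divide x, y, z by w.
fn perspective_divide(clip: [f32; 4]) -> [f32; 3] {
    let [x, y, z, w] = clip;
    [x / w, y / w, z / w]
}

/// NDC -> framebuffer pixels. NDC x and y range over [-1, 1];
/// pixel (0, 0) is the top-left corner, so y is flipped.
fn ndc_to_pixel(ndc: [f32; 3], width: f32, height: f32) -> [f32; 2] {
    [
        (ndc[0] + 1.0) * 0.5 * width,
        (1.0 - ndc[1]) * 0.5 * height,
    ]
}

fn main() {
    // The guide's three vertices, promoted to clip space with w = 1.0.
    let clip_positions = [
        [-0.5, -0.5, 0.0, 1.0],
        [0.5, -0.5, 0.0, 1.0],
        [0.0, 0.5, 0.0, 1.0],
    ];
    for clip in clip_positions {
        let ndc = perspective_divide(clip);
        let pixel = ndc_to_pixel(ndc, 800.0, 600.0);
        println!("clip {clip:?} -> ndc {ndc:?} -> pixel {pixel:?}");
    }
}
```

With w = 1.0 the division changes nothing, which is why the guide can treat the vertex buffer positions as NDC directly. A w of 2.0 would halve every coordinate and shrink the triangle toward the center.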

out.vertex_color = color; — Passes the input color through to the output. The rasterizer picks this field up, interpolates it across the triangle surface, and delivers the interpolated value to every fragment.

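@fragment fn fs_main(input: VertexOutput) — @fragment declares the fragment shader entry point. input is the rasterizer's interpolated output from the vertex shader. Every @location(n) field in VertexOutput now holds the interpolated value for this fragment. The @builtin(position) field changes meaning: in the fragment shader it holds the fragment's framebuffer coordinates (x and y in pixels, measured from the top-left corner, plus the depth in z), not the clip-space position the vertex shader wrote.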

-> @location(0) vec4<f32> — The fragment shader must output at least one color value at @location(0). This number must match the corresponding color target in the render pipeline descriptor. The return type is vec4<f32> — RGBA with linear-space components.

return vec4<f32>(input.vertex_color, 1.0); — Promotes the interpolated RGB color to RGBA by setting alpha = 1.0 (fully opaque). The rasterizer interpolated input.vertex_color across the triangle; we just attach an alpha channel and return it. The output merge stage writes this color directly to the framebuffer.

Rust Shader Module Creation

The Rust side loads the shader file at compile time and feeds the source to wgpu:

let shader_module = device.create_shader_module(
    wgpu::ShaderModuleDescriptor {
        label: Some("Rainbow Triangle Shader"),
        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
    }
);

ShaderModuleDescriptor — has two fields: label (debug string, shown in graphics debuggers and validation messages) and source (the shader text).
ShaderSource::Wgsl(...) — wraps the WGSL string. wgpu also accepts SPIR-V binary source via ShaderSource::SpirV, but WGSL is the native path.
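device.create_shader_module() — takes the descriptor and parses + validates the shader. On Vulkan, wgpu translates WGSL to SPIR-V internally. The call does not return a Result: if the shader has syntax errors or type mismatches, wgpu reports a validation error through its error handling, and an uncaptured error panics by default. A missing or misspelled entry point is detected later, at pipeline creation.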
&shader_module — the resulting handle is passed by reference into the render pipeline descriptor. The module remains valid for the lifetime of the pipeline.

S5: Uploading Vertex Data to the GPU

New concept: GPU memory isolation. The GPU cannot read Rust heap or stack memory directly. Vertex data must be laid out as a flat byte array and uploaded into a dedicated GPU buffer. The pipeline configuration then describes how to interpret those bytes: how many bytes each vertex occupies, what format each attribute has, and where within each vertex the attribute begins.

Key insight #3 — create_buffer_init is an extension trait: The method lives in wgpu::util::DeviceExt, not on Device directly. If you call device.create_buffer_init(...) without importing the trait, the compiler reports "method not found." This is a Rust trait-discovery issue, not a wgpu API issue. Add use wgpu::util::DeviceExt; to bring the method into scope.

The Vertex Struct

#[repr(C)]
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

#[repr(C)] — Forces the Rust compiler to lay out the struct fields in declaration order with no padding reordering. Without this, Rust is free to reorder fields for optimal alignment, which would break the byte layout the shader expects.
bytemuck::Pod — "Plain Old Data." Asserts that the struct has no padding bytes, no destructor, and that every bit pattern is a valid value. wgpu itself does not require this trait; bytemuck::cast_slice does, so that the struct can be viewed as bytes safely.
bytemuck::Zeroable — Guarantees that initializing the struct's memory to all-zero bytes produces a valid instance. Zeroable is a supertrait of Pod, so it must be derived alongside it. Together they let bytemuck::cast_slice convert &[Vertex] to &[u8] without an unsafe block.
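The layout guarantees described above can be checked with the standard library alone. This sketch redeclares the Vertex struct without the bytemuck derives (so it needs no external crate) and reports its size, alignment, and field offsets; the helper function names are hypothetical.

```rust
use std::mem::{align_of, offset_of, size_of};

#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Total bytes per vertex: this is the value used for array_stride.
fn vertex_size() -> usize {
    size_of::<Vertex>()
}

/// Byte offset of each field within one vertex.
fn field_offsets() -> (usize, usize) {
    (offset_of!(Vertex, position), offset_of!(Vertex, color))
}

/// True when the fields fill the struct exactly, i.e. there is no padding.
fn has_no_padding() -> bool {
    size_of::<[f32; 3]>() * 2 == size_of::<Vertex>()
}

fn main() {
    let v = Vertex { position: [0.0, 0.5, 0.0], color: [0.0, 1.0, 0.0] };
    println!("{v:?}");
    println!("size = {} bytes, align = {}", vertex_size(), align_of::<Vertex>());
    println!("offsets (position, color) = {:?}", field_offsets());
    println!("padding-free: {}", has_no_padding());
}
```

These are the same numbers the vertex buffer layout relies on later: a 24-byte stride, position at offset 0, and color at offset 12.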

Vertex Data

const VERTICES: &[Vertex] = &[
    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
];

Positions are in NDC: The normalized device coordinates range from -1.0 (left/bottom) to +1.0 (right/top). Our triangle occupies the middle of the window, covering half of its width and half of its height: the bottom-left corner at (-0.5, -0.5), the bottom-right at (0.5, -0.5), and the top center at (0.0, 0.5). This produces an upright, centered triangle.
CCW winding order: The vertices are listed counter-clockwise: red → blue → green. In a standard right-handed coordinate system, connecting vertices in this sequence traces the triangle counter-clockwise. This determines which face is "front" and which is "back" — critical for culling and correct normal computation.
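Winding order can be verified numerically: the signed area of a triangle is positive when its vertices run counter-clockwise and negative when they run clockwise, assuming a y-up coordinate system such as NDC. The names below (signed_area, Winding, winding) are hypothetical helpers for this sketch.

```rust
#[derive(Debug, PartialEq)]
enum Winding {
    CounterClockwise,
    Clockwise,
    Degenerate,
}

/// Signed area of triangle (a, b, c) in a y-up coordinate system.
fn signed_area(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
}

fn winding(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> Winding {
    let area = signed_area(a, b, c);
    if area > 0.0 {
        Winding::CounterClockwise
    } else if area < 0.0 {
        Winding::Clockwise
    } else {
        Winding::Degenerate
    }
}

fn main() {
    let red = [-0.5, -0.5];
    let blue = [0.5, -0.5];
    let green = [0.0, 0.5];
    // Order used in VERTICES: red -> blue -> green.
    println!("{:?}", winding(red, blue, green));
    // Swapping two vertices flips the winding.
    println!("{:?}", winding(red, green, blue));
}
```

With front_face set to Ccw and back faces culled, the second ordering would make the triangle disappear, which is a common cause of a blank window.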

Buffer Upload

use wgpu::util::DeviceExt;
let vertex_buffer = device.create_buffer_init(
    &wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(VERTICES),
        usage: wgpu::BufferUsages::VERTEX,
    }
);

use wgpu::util::DeviceExt — imports the extension trait that adds create_buffer_init to Device. Without this import, the method is not visible.
device.create_buffer_init(...) — combined allocate-and-upload. It creates a buffer that is mapped at creation, copies the contents slice into the mapped memory, and unmaps the buffer so the GPU can use it. This is a convenience wrapper around create_buffer with mapped_at_creation: true, followed by the copy and unmap().
bytemuck::cast_slice(VERTICES) — converts &[Vertex] to &[u8] by reinterpreting the same memory as bytes. The slice covers 72 bytes: three vertices × 24 bytes per vertex (6 f32 values × 4 bytes each). The cast itself copies nothing and serializes nothing; it only reinterprets the pointer and length.
BufferUsages::VERTEX — declares this buffer will be bound as a vertex buffer in the pipeline. wgpu's validation layer will reject any attempt to use this buffer for staging, uniform, or storage access. Usage bits are chosen at creation and cannot be changed.
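To see which bytes actually end up in the buffer, the sketch below builds the same byte sequence with safe, std-only code. Unlike bytemuck::cast_slice it copies the data, and it represents each vertex as a flat [f32; 6] instead of the Vertex struct; both are simplifications for illustration, and vertices_to_bytes is a hypothetical helper.

```rust
/// Position (x, y, z) followed by color (r, g, b), as in the guide.
const VERTICES: [[f32; 6]; 3] = [
    [-0.5, -0.5, 0.0, 1.0, 0.0, 0.0], // red
    [0.5, -0.5, 0.0, 0.0, 0.0, 1.0],  // blue
    [0.0, 0.5, 0.0, 0.0, 1.0, 0.0],   // green
];

/// Flattens the vertices into native-endian bytes, the order in which
/// they sit in memory and therefore the order the GPU buffer receives.
fn vertices_to_bytes(vertices: &[[f32; 6]]) -> Vec<u8> {
    vertices
        .iter()
        .flatten()
        .flat_map(|value| value.to_ne_bytes())
        .collect()
}

fn main() {
    let bytes = vertices_to_bytes(&VERTICES);
    println!("total bytes: {}", bytes.len());
    // Bytes 0..4 hold the first vertex's x; bytes 12..16 hold its red channel.
    println!("x bytes   = {:?}", &bytes[0..4]);
    println!("red bytes = {:?}", &bytes[12..16]);
}
```

The byte at index 12 is where the color of the first vertex starts, which is exactly the offset the second vertex attribute uses.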

S6: Compiling the Render Pipeline

New concept: the render pipeline is a compiled GPU configuration. A render pipeline bundles every decision the GPU needs to execute a draw: which shaders to run, how to interpret vertex buffer bytes, what topology to use, whether to cull back faces, what blend mode to apply, and where to write the output. Pipeline creation is not a simple struct allocation — it compiles these choices into a GPU-executable configuration. Errors in any field are caught at creation time, not at draw time. This validation-upfront model is what makes pipelines expensive to create but cheap to execute.

Vertex Buffer Layout

Before the pipeline descriptor, you must tell wgpu how to parse the byte stream in the vertex buffer into per-vertex attributes:

let vertex_buffer_layout = wgpu::VertexBufferLayout {
    array_stride: std::mem::size_of::<Vertex>() as u64,
    step_mode: wgpu::VertexStepMode::Vertex,
    attributes: &[
        wgpu::VertexAttribute {
            offset: 0,
            format: wgpu::VertexFormat::F32x3,
            shader_location: 0,
        },
        wgpu::VertexAttribute {
            offset: std::mem::size_of::<[f32; 3]>() as u64,
            format: wgpu::VertexFormat::F32x3,
            shader_location: 1,
        },
    ],
};

array_stride: 24 — size_of::<Vertex>() = 24 bytes (6 f32 values × 4 bytes each). This is the byte distance from one vertex to the next in the buffer. The GPU uses this to step through the buffer: vertex 0 starts at byte 0, vertex 1 at byte 24, vertex 2 at byte 48.
step_mode: Vertex — advance the buffer by one stride for every vertex the vertex shader processes. The other option is Instance, which advances per draw instance in instanced rendering. For a single triangle, Vertex is correct: each of the three vertices has its own position and color.
First attribute — shader_location: 0: reads 3 floats (F32x3) at byte offset 0 of each vertex. These 3 floats map to the shader location @location(0) in the vertex shader — the position parameter. The GPU delivers [x, y, z] to that function argument.
Second attribute — shader_location: 1: reads 3 floats at offset 12 (size_of::<[f32; 3]>() = 3 × 4 = 12). Skips past the position array to the color array inside each vertex. Maps to @location(1) in the shader — the color parameter. If the offset were 0 instead of 12, the shader would receive the position values as the color input, rendering a triangle with gradient colors derived from position data.
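The way the GPU applies stride and offset can be imitated on the CPU. The sketch below is a simplified model of vertex fetch, not wgpu code: fetch_f32x3 and demo_buffer are hypothetical names, and the buffer is built from a flat f32 array so the example needs no external crate.

```rust
/// Builds the 72-byte buffer for the guide's three vertices.
fn demo_buffer() -> Vec<u8> {
    let floats: [f32; 18] = [
        -0.5, -0.5, 0.0, 1.0, 0.0, 0.0, // vertex 0: position, red
        0.5, -0.5, 0.0, 0.0, 0.0, 1.0, // vertex 1: position, blue
        0.0, 0.5, 0.0, 0.0, 1.0, 0.0, // vertex 2: position, green
    ];
    floats.iter().flat_map(|f| f.to_ne_bytes()).collect()
}

/// Reads one F32x3 attribute: start at `index * stride + offset`,
/// then decode three consecutive 4-byte floats.
fn fetch_f32x3(buffer: &[u8], stride: usize, offset: usize, index: usize) -> [f32; 3] {
    let start = index * stride + offset;
    let mut out = [0.0f32; 3];
    for (i, value) in out.iter_mut().enumerate() {
        let at = start + i * 4;
        let bytes = [buffer[at], buffer[at + 1], buffer[at + 2], buffer[at + 3]];
        *value = f32::from_ne_bytes(bytes);
    }
    out
}

fn main() {
    let buffer = demo_buffer();
    for index in 0..3 {
        let position = fetch_f32x3(&buffer, 24, 0, index); // shader_location 0
        let color = fetch_f32x3(&buffer, 24, 12, index); // shader_location 1
        println!("vertex {index}: position {position:?}, color {color:?}");
    }
}
```

Calling fetch_f32x3 with offset 0 for the color attribute returns the position values instead, which reproduces the mistake described in the last bullet.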

The Complete Render Pipeline Descriptor

let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
    label: Some("Triangle Pipeline"),
    layout: None,
    vertex: wgpu::VertexState {
        module: &shader_module,
        entry_point: Some("vs_main"),
        buffers: &[vertex_buffer_layout],
        compilation_options: Default::default(),
    },
    primitive: wgpu::PrimitiveState {
        topology: wgpu::PrimitiveTopology::TriangleList,
        strip_index_format: None,
        front_face: wgpu::FrontFace::Ccw,
        cull_mode: Some(wgpu::Face::Back),
        unclipped_depth: false,
        polygon_mode: wgpu::PolygonMode::Fill,
        conservative: false,
    },
    depth_stencil: None,
    multisample: wgpu::MultisampleState {
        count: 1,
        mask: !0,
        alpha_to_coverage_enabled: false,
    },
    fragment: Some(wgpu::FragmentState {
        module: &shader_module,
        entry_point: Some("fs_main"),
        targets: &[Some(wgpu::ColorTargetState {
            format: config.format,
            blend: None,
            write_mask: wgpu::ColorWrites::ALL,
        })],
        compilation_options: Default::default(),
    }),
    multiview_mask: None,
    cache: None,
});

Field-by-Field Walkthrough

RenderPipelineDescriptor has 9 fields. Every field must be present. The structure does not use ..Default::default() at the descriptor level — each field is filled explicitly.

label: Some("Triangle Pipeline") — Debug string. Shown in GPU profilers (RenderDoc, NVIDIA Nsight) and wgpu validation error messages. Omitting it produces anonymous pipelines that are hard to tell apart during debugging.

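layout: None — Derives the pipeline layout from the shader module automatically. This is the simplest choice when the shader declares no bind groups or push constants. If you later add @group(n) bindings to your shader, you can either keep None and fetch the derived layouts with pipeline.get_bind_group_layout(n), or pass an explicit wgpu::PipelineLayout created with device.create_pipeline_layout(), which is the usual choice once several pipelines share bind groups.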

vertex — VertexState (4 fields):

module: &shader_module — references the compiled shader module from S4.
entry_point: Some("vs_main") — selects which function in the module is the vertex shader entry point. Must match the @vertex fn vs_main(...) declaration exactly.
buffers: &[vertex_buffer_layout] — array of vertex buffer layouts, one per bound vertex buffer. Multiple layouts are needed only when attributes come from more than one buffer (for example, instancing with a separate per-instance buffer). For a single vertex buffer, one layout suffices.
compilation_options: Default::default() — per-stage pipeline compilation options, such as values for pipeline-overridable constants declared in the shader. The default supplies no overrides, which is correct here because the shader declares none.

primitive — PrimitiveState (7 fields):

topology: TriangleList — every 3 consecutive vertices form one triangle. For 3 vertices, this produces exactly 1 triangle. If we had 6 vertices, it would produce 2 independent triangles.
strip_index_format: None — only set for TriangleStrip or LineStrip topologies when using restart indices. Not applicable to TriangleList.
front_face: Ccw — counter-clockwise winding defines the front face of a triangle. Combined with cull_mode, this determines which triangles are visible. Because our vertices are listed CCW in S5, triangles drawn in that order face toward the viewer.
cull_mode: Some(wgpu::Face::Back) — discard triangles whose winding indicates a back face. For a single triangle viewed from the front, this is harmless, and it establishes correct culling for closed 3D meshes, where back faces are always hidden behind front faces.
unclipped_depth: false — geometry whose depth falls outside [0.0, 1.0] is clipped (the standard behavior). true disables that clipping so such geometry is still rasterized. It requires the DEPTH_CLIP_CONTROL device feature and is not needed for this example.
polygon_mode: Fill — render the full interior of the triangle. Other options are Line (wireframe edges) and Point (vertex points only).
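conservative: false — standard rasterization: a fragment is generated only for pixels whose sample point (the pixel center when count is 1) lies inside the triangle. true generates a fragment for every pixel the triangle overlaps at all, even partially. Conservative rasterization requires the CONSERVATIVE_RASTERIZATION device feature and is used by techniques such as voxelization and occlusion culling.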
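The interaction between topology, front_face, and cull_mode can be sketched with a signed-area test. This is an illustrative CPU-side model under stated assumptions, not how wgpu implements culling: coordinates are taken to be 2D positions with +x right and +y up (as in normalized device coordinates), and the function names and the sample coordinates used below are hypothetical.

```rust
/// Winding order of a 2D triangle as seen by the viewer.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Winding {
    Ccw,
    Cw,
    Degenerate,
}

/// Signed area of triangle (a, b, c), assuming +x points right and +y points up.
/// Positive means counter-clockwise, negative means clockwise.
fn signed_area(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
}

/// Classify the winding from the sign of the area.
fn winding(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> Winding {
    let area = signed_area(a, b, c);
    if area > 0.0 {
        Winding::Ccw
    } else if area < 0.0 {
        Winding::Cw
    } else {
        Winding::Degenerate
    }
}

/// Models front_face = Ccw with cull_mode = Back: only CCW triangles are kept.
/// Zero-area triangles cover no pixels, so they are reported as not drawn.
fn survives_back_face_culling(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> bool {
    winding(a, b, c) == Winding::Ccw
}

/// TriangleList: every 3 consecutive vertices form one triangle;
/// leftover vertices that do not complete a triangle are ignored.
fn triangle_count(vertex_count: usize) -> usize {
    vertex_count / 3
}
```

With the hypothetical positions (-0.5, -0.5), (0.5, -0.5), (0.0, 0.5) listed in that order, the triangle is counter-clockwise and is kept. Swapping the last two vertices flips the winding, and the same settings would cull it, which is a common cause of a blank window.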

depth_stencil: None — No depth buffer or stencil buffer. Without depth testing, triangles are drawn in submission order: later draws overwrite earlier draws at the same pixel. For a single triangle this is not a concern.

multisample — MultisampleState (3 fields):

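count: 1 — no multisampling. Each pixel produces one fragment. Higher values activate MSAA, sampling multiple points per pixel and reducing aliasing at the cost of framebuffer memory and bandwidth. A count of 4 is the widely supported choice; other counts (2, 8) depend on the adapter and texture format.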
mask: !0 — all sample bits are enabled. This mask allows you to selectively disable individual MSAA samples (advanced use case).
alpha_to_coverage_enabled: false — do not use the alpha channel of the fragment color as a coverage mask. Enabled for transparent edge antialiasing (e.g., font rendering).
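To make count and mask concrete, here is a small sketch of how a sample mask selects samples and how the sample count scales color buffer size. It is a simplified model with hypothetical function names, and the size estimate ignores any driver-side compression or padding.

```rust
/// Number of samples per pixel that remain enabled for a given sample count and mask.
/// Only the lowest `count` bits of the mask are meaningful.
fn enabled_samples(count: u32, mask: u64) -> u32 {
    let relevant = if count >= 64 {
        u64::MAX
    } else {
        (1u64 << count) - 1
    };
    (mask & relevant).count_ones()
}

/// Rough size in bytes of a color buffer with `count` samples per pixel.
/// Assumes a tightly packed layout, which real drivers do not guarantee.
fn color_buffer_bytes(width: u64, height: u64, bytes_per_pixel: u64, count: u64) -> u64 {
    width * height * bytes_per_pixel * count
}
```

With count: 1 and mask: !0, exactly one sample is enabled, matching the configuration used here. Under these assumptions, an 800 × 600 target with 4 bytes per pixel grows from 1,920,000 bytes to 7,680,000 bytes at 4 samples per pixel.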

fragment — FragmentState (4 fields):

module: &shader_module — same shader module as the vertex shader.
entry_point: Some("fs_main") — selects the fragment shader entry point. Must match @fragment fn fs_main(...) in the WGSL.
targets — array of color target states, one per render pass color attachment. &[Some(...)] means one color target is present. None at an index leaves that attachment slot unused, so the pipeline writes nothing to it.
- ColorTargetState has exactly 3 fields (no view_formats field):
  - format: config.format — must match the surface format from SurfaceConfiguration. The pipeline writes in this format; the surface reads in this format. A mismatch produces a validation error when the pipeline is used in a render pass that targets the surface. If you change the surface format, you must recreate the pipeline.
  - blend: None — disables blending. Without blending, every fragment color replaces the existing framebuffer pixel (REPLACE mode). With blending, new and existing colors are combined according to a blend equation (useful for transparency).
  - write_mask: ColorWrites::ALL — write all four RGBA channels. You can mask out individual channels (e.g., write only R and G) if you need to preserve certain framebuffer channels across draw calls.
compilation_options: Default::default() — fragment shader compilation flags, same as the vertex compilation options above.
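The effect of blend and write_mask on a single framebuffer pixel can be modeled on the CPU. The sketch below is an assumption-laden illustration rather than wgpu code: the channel bit values follow wgpu::ColorWrites (RED = 1, GREEN = 2, BLUE = 4, ALPHA = 8), the Blend enum and write_pixel function are hypothetical, and the blend equation shown is a common source-over formula, not necessarily the one you would configure.

```rust
/// Channel bits, laid out like wgpu::ColorWrites.
const RED: u8 = 1 << 0;
const GREEN: u8 = 1 << 1;
const BLUE: u8 = 1 << 2;
const ALPHA: u8 = 1 << 3;
const ALL: u8 = RED | GREEN | BLUE | ALPHA;

type Rgba = [f32; 4];

/// Hypothetical stand-in for a configured blend state.
#[derive(Clone, Copy)]
enum Blend {
    /// color = src * src_alpha + dst * (1 - src_alpha)
    /// alpha = src_alpha + dst_alpha * (1 - src_alpha)
    SourceOver,
}

/// Combine a fragment color `src` with the existing pixel `dst`.
/// `blend: None` models the pipeline above: the fragment replaces the pixel.
/// Channels whose bit is missing from `write_mask` keep their old value.
fn write_pixel(dst: Rgba, src: Rgba, blend: Option<Blend>, write_mask: u8) -> Rgba {
    let blended = match blend {
        None => src,
        Some(Blend::SourceOver) => {
            let a = src[3];
            [
                src[0] * a + dst[0] * (1.0 - a),
                src[1] * a + dst[1] * (1.0 - a),
                src[2] * a + dst[2] * (1.0 - a),
                a + dst[3] * (1.0 - a),
            ]
        }
    };
    let mut out = dst;
    for channel in 0..4 {
        if write_mask & (1u8 << channel) != 0 {
            out[channel] = blended[channel];
        }
    }
    out
}
```

For an opaque blue pixel and a half-transparent red fragment, blend: None with the ALL mask stores the fragment unchanged, source-over blending yields an even mix of red and blue, and a mask of RED | GREEN leaves the existing blue and alpha channels untouched.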

multiview_mask: None — no multiview rendering. Multiview is for stereoscopic (VR) or multi-viewport single-pass rendering. Not used here.

cache: None — no pipeline cache. A pipeline cache stores compiled shader binaries to speed up subsequent pipeline creation. Useful when creating many pipelines dynamically; for a single pipeline, caching has no practical benefit.
