Building a Rainbow Triangle
S1: What We're Building
We're creating a window containing a single triangle with smoothly blended colors:
Red at the bottom-left corner, blue at the bottom-right corner, and green at the top vertex. You do not compute the gradient between the vertices yourself; the GPU rasterizer interpolates it in hardware. You provide three vertices, each carrying a position and a color. The rasterizer determines every pixel covered by the triangle and computes that pixel's color as a weighted average of the three vertex colors. Each weight is the pixel's barycentric coordinate for that vertex: the closer the pixel lies to a vertex, the larger that vertex's share. The result is a smooth rainbow gradient across a single primitive. We do not need a texture, a colormap, or any branching in the fragment shader: just three colored vertices and the default interpolation the rasterizer applies to every value passed from the vertex shader to the fragment shader.
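To make the interpolation concrete, here is a minimal CPU-side sketch of the weighted average described above. It is an illustration only, not how the hardware is implemented. The names `edge`, `barycentric`, `interpolate_color`, and `shade` are hypothetical helpers, and the sketch ignores perspective correction, which does not matter for a flat triangle like this one.

```rust
/// A 2D point in the same coordinate space as the vertex positions.
type Point = [f32; 2];
/// An RGB color with components in 0.0..=1.0.
type Color = [f32; 3];

/// Twice the signed area of triangle (a, b, c); positive when counter-clockwise.
fn edge(a: Point, b: Point, c: Point) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// Barycentric weights of `p` with respect to triangle (a, b, c).
/// Returns None when the triangle is degenerate (zero area).
fn barycentric(p: Point, a: Point, b: Point, c: Point) -> Option<[f32; 3]> {
    let area = edge(a, b, c);
    if area == 0.0 {
        return None;
    }
    // Each weight is the area of the sub-triangle opposite that vertex.
    Some([edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area])
}

/// Weighted average of the three vertex colors.
fn interpolate_color(weights: [f32; 3], colors: [Color; 3]) -> Color {
    let mut out = [0.0; 3];
    for (w, color) in weights.iter().zip(colors.iter()) {
        for (o, c) in out.iter_mut().zip(color.iter()) {
            *o += w * c;
        }
    }
    out
}

/// Color at `p` for this tutorial's triangle, or None if `p` lies outside it.
fn shade(p: Point) -> Option<Color> {
    let (a, b, c) = ([-0.5, -0.5], [0.5, -0.5], [0.0, 0.5]);
    let colors = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]];
    let w = barycentric(p, a, b, c)?;
    if w.iter().any(|&x| x < 0.0) {
        return None; // a negative weight means the point is outside the triangle
    }
    Some(interpolate_color(w, colors))
}
```

Calling `shade([0.0, -0.5])`, the midpoint of the bottom edge, yields an even mix of red and blue, while a point outside the triangle yields `None`.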
If you haven't read the concept overview, do so now. Coordinate systems explains how the GPU positions geometry. Shader basics covers the GPU programs that drive rendering.
S2: The winit Application and Event Loop
New concept: event-driven windowing. winit is the bridge between your Rust
code and the display server (X11 or Wayland on Linux). Think of it like epoll
or kqueue but for windows, input, and display lifecycle events instead of file
descriptors.
The program is built around the tokio async runtime: wgpu's adapter and device requests are async, and the runtime supplies the executor that drives them. The winit event loop itself is synchronous and runs on a dedicated blocking thread, as described below.
Architecture Overview
- main() is #[tokio::main] async fn: the entry point runs on the tokio runtime, giving us access to tokio's task scheduler and I/O facilities.
- tokio::task::spawn_blocking: winit's event_loop.run_app() is synchronous and owns the display server connection. An indefinite synchronous call inside an async task would tie up that runtime thread for the life of the program, so we offload the blocking event loop to a dedicated thread and await the join handle.
- Handle::block_on() in resumed(): wgpu initialization (adapter and device requests) is async, but winit's resumed() handler is synchronous. We bridge the two execution models once at startup. This initial GPU setup takes about 50 ms of wall time.
- Arc<Window>: a shared, reference-counted handle to the window, needed because both the winit event handlers and the wgpu surface must hold a reference to the same window object.
- ControlFlow::Poll: continuous polling mode. winit keeps iterating the event loop instead of sleeping until the next event. RedrawRequested itself is triggered by window.request_redraw(), which the redraw handler calls after every frame, giving us a tight render loop without a separate timer. The surface's present mode controls the actual vsync behavior.
Dependencies
Add these to your Cargo.toml:
wgpu = "29"
winit = "0.30"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
bytemuck = { version = "1", features = ["derive"] }
log = "0.4"
simple_logger = "5"
- wgpu: the GPU abstraction layer. Manages device lifecycles, shaders, buffers, pipelines, and command encoding.
- winit: cross-platform window creation and event dispatch. Owns the display server connection.
- tokio: async runtime for the main loop and all GPU queries.
- bytemuck: zero-copy casting between Rust structs and byte slices. Required for uploading vertex data to GPU buffers without manual serialization.
- log / simple_logger: structured logging. wgpu and winit emit diagnostic messages via log when misconfigurations or driver issues are detected.
Complete Code
use std::sync::Arc;
use winit::application::ApplicationHandler;
use winit::dpi::LogicalSize;
use winit::event::WindowEvent;
use winit::event_loop::{ActiveEventLoop, ControlFlow, EventLoop};
use winit::window::{Window, WindowId};
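#[tokio::main]
async fn main() {
    simple_logger::init_with_level(log::Level::Debug).unwrap();
    let handle = tokio::runtime::Handle::current();
    tokio::task::spawn_blocking(move || {
        // Assumption: Linux only. `EventLoop` is not `Send`, so it is built on the
        // thread that runs it. `with_any_thread` lifts winit's main-thread check
        // for both the X11 and Wayland backends.
        use winit::platform::wayland::EventLoopBuilderExtWayland;
        let event_loop = EventLoop::builder()
            .with_any_thread(true)
            .build()
            .unwrap();
        event_loop
            .run_app(&mut App {
                handle,
                window: None,
                state: None,
            })
            .expect("event loop returned an error");
    })
    .await
    .expect("event loop thread panicked");
}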
struct App {
    handle: tokio::runtime::Handle,
    window: Option<Arc<Window>>,
    state: Option<State>,
}
impl ApplicationHandler<()> for App {
    fn resumed(&mut self, event_loop_ctl: &ActiveEventLoop) {
        let window = Arc::new(
            event_loop_ctl
                .create_window(
                    Window::default_attributes()
                        .with_inner_size(LogicalSize::new(800.0, 600.0))
                        .with_title("Rainbow Triangle"),
                )
                .unwrap(),
        );
        event_loop_ctl.set_control_flow(ControlFlow::Poll);
        self.window = Some(window.clone());
        // Bridge sync -> async once: drive wgpu's async setup to completion here.
        self.state = Some(
            self.handle
                .block_on(State::new(window))
                .expect("Failed to create wgpu State"),
        );
    }

    fn window_event(
        &mut self,
        event_loop_ctl: &ActiveEventLoop,
        _window_id: WindowId,
        event: WindowEvent,
    ) {
        let Some(state) = self.state.as_mut() else { return };
        let Some(window) = self.window.as_ref() else { return };
        match event {
            WindowEvent::Resized(size) => state.resize(window, size),
            WindowEvent::CloseRequested => event_loop_ctl.exit(),
            WindowEvent::RedrawRequested => {
                state.render();
                // Re-queue the next frame; this call is what keeps the loop going.
                window.request_redraw();
            }
            _ => {}
        }
    }

    fn exiting(&mut self, _event_loop_ctl: &ActiveEventLoop) {
        // The loop is already shutting down here, so calling exit() again is
        // unnecessary. Release the GPU state first, then the window handle.
        self.state = None;
        self.window = None;
    }
}
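Why spawn_blocking: run_app() does not return until the window closes.
Calling it directly inside an async task would occupy that runtime thread for
the life of the program and keep any other task scheduled on it from running.
Running it on a blocking thread keeps the runtime's workers free for GPU
queries, driver I/O, and future background tasks. Note that winit expects the
event loop to run on the main thread by default; running it elsewhere is only
supported on some platforms, such as X11 and Wayland on Linux.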
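The offloading pattern can be sketched with the standard library alone. This is an analogy, not the tutorial's code: `std::thread::spawn` plus `join` stands in for `spawn_blocking` plus awaiting the join handle, and the `Event` enum and both functions are hypothetical stand-ins rather than winit types.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for window events; not winit's `WindowEvent`.
enum Event {
    Redraw,
    Close,
}

/// Runs a blocking loop until `Close` arrives and returns how many redraws it handled.
fn blocking_event_loop(events: mpsc::Receiver<Event>) -> usize {
    let mut frames = 0;
    // `recv` blocks the calling thread, much as `run_app` blocks until exit.
    while let Ok(event) = events.recv() {
        match event {
            Event::Redraw => frames += 1,
            Event::Close => break,
        }
    }
    frames
}

/// Offloads the loop to its own thread, feeds it events, and waits for it to finish.
fn run_offloaded(redraws: usize) -> usize {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || blocking_event_loop(rx));
    for _ in 0..redraws {
        tx.send(Event::Redraw).unwrap();
    }
    tx.send(Event::Close).unwrap();
    // Joining plays the role of awaiting the `spawn_blocking` handle.
    worker.join().expect("event loop thread panicked")
}
```

The calling thread stays free to do other work between `spawn` and `join`, which is the property the tutorial relies on.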
Why Handle::block_on: wgpu's request_adapter and request_device are async
functions; the API mirrors WebGPU, where these requests are asynchronous. Their
futures must be polled by an executor, but resumed() is a synchronous callback.
Handle::block_on runs the future on the current thread (here, the blocking
event-loop thread) with access to the runtime's resources, blocks until the
future completes (about 50 ms for this setup), then returns the result.
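For readers new to bridging sync and async code, the following std-only sketch shows the idea behind a `block_on`: poll the future on the current thread and park until it is woken. It is a simplified illustration, not tokio's implementation, which also drives timers and I/O. `ThreadWaker` and `fake_request_device` are hypothetical names used only here.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `fut` on the current thread until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker unparks us, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async setup step standing in for a wgpu request.
async fn fake_request_device() -> Result<&'static str, String> {
    Ok("device")
}
```

A synchronous callback can then call `block_on(fake_request_device())` and receive the result directly, which is the shape of the call made in resumed().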
Why ControlFlow::Poll: winit supports ControlFlow::Poll (keep iterating the
loop even when no events are pending), ControlFlow::Wait (sleep until the next
event), and ControlFlow::WaitUntil (sleep until an event or a deadline). A
graphics application needs a steady render loop, so we pick Poll. Poll alone
does not emit RedrawRequested; the redraw events come from
window.request_redraw(), which we call inside the handler after each frame so
that the loop follows the wgpu swapchain presentation rhythm.
Why request_redraw(): After presenting a frame to the display, we ask
winit to schedule the next RedrawRequested event. This creates an explicit
render loop: render → present → request redraw → render → repeat. The rate is
governed by the swapchain present mode.
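The effect of re-queuing can be seen in a small simulation of the loop described above. This is a toy model with a hypothetical `SimEvent` queue, not winit's scheduler; it "closes the window" after `max_frames` frames so that it terminates.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the two events the loop cares about.
enum SimEvent {
    RedrawRequested,
    CloseRequested,
}

/// Drains a simulated event queue and returns the number of frames rendered.
/// When `requeue` is true, each redraw schedules the next one, as
/// `window.request_redraw()` does in the real handler.
fn simulate(requeue: bool, max_frames: u32) -> u32 {
    // Assume the windowing system delivers one initial redraw.
    let mut queue = VecDeque::from([SimEvent::RedrawRequested]);
    let mut frames = 0;
    while let Some(event) = queue.pop_front() {
        match event {
            SimEvent::RedrawRequested => {
                frames += 1; // render + present would happen here
                if frames >= max_frames {
                    queue.push_back(SimEvent::CloseRequested);
                } else if requeue {
                    queue.push_back(SimEvent::RedrawRequested);
                }
            }
            SimEvent::CloseRequested => break,
        }
    }
    frames
}
```

With `requeue` set to false the simulation renders a single frame and then idles, which is the behavior to expect if the request_redraw() call is left out of the handler.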
Why exiting(): winit calls this once the event loop is shutting down, after
exit() has been requested. On some display servers, CloseRequested fires on the
window but the event loop must still drain. exiting() is the last callback
before run_app() returns, which makes it the place to flush the queue and
release GPU resources in a controlled order before the process exits.
S3: Connecting to the GPU — The Init Chain
New concept: 5-layer GPU connection. Each layer adds a capability:
- Instance — opens a connection to the graphics driver. On Vulkan this loads the Vulkan loader and registers instance-level extensions. On WebGL this picks the browser GPU context.
- Surface — binds the instance to a specific window's swapchain. The surface is the wgpu representation of the window's display buffer.
- Adapter — selects the physical GPU hardware. An adapter wraps the actual driver + silicon pair (e.g., Mesa RADV on AMD, NVIDIA driver on NVIDIA silicon).
- Device + Queue — the device owns all GPU resources (buffers, textures, shaders, pipelines). The queue is the submission channel: you encode work into command buffers and submit them to the queue.
- SurfaceConfiguration — allocates the swapchain framebuffers for this window at a specific resolution and pixel format.
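The dependency order between the layers can be modeled with plain Rust types, where each constructor takes the layer it depends on. The types below are hypothetical stand-ins, not wgpu types, and the method bodies are placeholders; only the argument lists carry meaning.

```rust
/// Hypothetical stand-ins for the five layers. None of these are wgpu types;
/// they only model which layer needs which.
struct Instance;
struct Surface;
struct Adapter;
struct Device;
struct Queue;
struct SurfaceConfig {
    width: u32,
    height: u32,
}

impl Instance {
    /// Layer 2 is created from layer 1.
    fn create_surface(&self) -> Surface {
        Surface
    }

    /// Layer 3 is created from layer 1 and may consult the surface.
    fn request_adapter(&self, _compatible: Option<&Surface>) -> Option<Adapter> {
        Some(Adapter)
    }
}

impl Adapter {
    /// Layer 4 is created from layer 3.
    fn request_device(&self) -> (Device, Queue) {
        (Device, Queue)
    }
}

impl Surface {
    /// Layer 5 needs the device, so configuration has to come last.
    fn configure(&self, _device: &Device, config: &SurfaceConfig) -> (u32, u32) {
        (config.width.max(1), config.height.max(1))
    }
}

/// Walks the chain in the only order the signatures allow.
fn init_chain(width: u32, height: u32) -> Option<(Device, Queue, (u32, u32))> {
    let instance = Instance;
    let surface = instance.create_surface();
    let adapter = instance.request_adapter(Some(&surface))?;
    let (device, queue) = adapter.request_device();
    let size = surface.configure(&device, &SurfaceConfig { width, height });
    Some((device, queue, size))
}
```

Reordering the calls in `init_chain` fails to compile because a required argument does not exist yet, which mirrors why the real init chain has a fixed order.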
The State Struct
struct State {
surface: wgpu::Surface<'static>,
device: wgpu::Device,
queue: wgpu::Queue,
config: wgpu::SurfaceConfiguration,
pipeline: wgpu::RenderPipeline,
vertex_buffer: wgpu::Buffer,
}
- surface: connects to the window's display buffer. The 'static lifetime is possible because the surface is created from an Arc<Window> and keeps its own reference to the window, so the window cannot be destroyed while the surface is alive. The surface mediates all swapchain operations.
- device: owns all GPU resources. Every buffer, texture, shader module, and pipeline in this guide is created through the device, and wgpu releases each resource once nothing refers to it any longer.
- queue: the command submission channel. You encode a frame's worth of work into a command buffer, then submit that buffer to the queue. The queue pushes work to the GPU hardware.
- config: holds the surface's current width, height, pixel format, and present mode. When the window is resized, we reconfigure the surface with updated dimensions.
- pipeline: the compiled render pipeline. A render pipeline is an immutable configuration combining a shader, a vertex buffer layout, a primitive topology, and a color target setup. Switching pipelines has a cost, so most applications create a small number of pipelines up front and switch only between draw calls.
- vertex_buffer: GPU memory holding our vertex data. The GPU reads position and color data directly from this buffer during the vertex shader stage.
Complete State::new() Implementation
// --- Vertex type and data ---
#[repr(C)]
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

const VERTICES: &[Vertex] = &[
    Vertex { position: [-0.5, -0.5, 0.0], color: [1.0, 0.0, 0.0] }, // red
    Vertex { position: [ 0.5, -0.5, 0.0], color: [0.0, 0.0, 1.0] }, // blue
    Vertex { position: [ 0.0,  0.5, 0.0], color: [0.0, 1.0, 0.0] }, // green
];
impl State {
    async fn new(window: Arc<Window>) -> Result<Self, String> {
        // Step 1: Instance, the connection to the graphics driver
        let instance = wgpu::Instance::default();

        // Read the size first; the window handle itself moves into the surface.
        let size = window.inner_size();

        // Step 2: Surface, binds our window to the GPU's swapchain
        let surface = instance
            .create_surface(window)
            .map_err(|e| format!("Failed to create surface: {:?}", e))?;

        // Step 3: Adapter, selects the physical GPU
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions {
                power_preference: wgpu::PowerPreference::HighPerformance,
                force_fallback_adapter: false,
                // None skips the check that the adapter can present to `surface`;
                // Some(&surface) is the stricter option.
                compatible_surface: None,
            })
            .await
            // request_adapter returns a Result in wgpu 25 and later.
            .map_err(|e| format!("No GPU adapter found. Ensure Vulkan drivers are installed. ({:?})", e))?;

        // Step 4: Device + Queue, resource owner + command submission
        // (request_device takes only the descriptor in wgpu 25 and later)
        let (device, queue) = adapter
            .request_device(&wgpu::DeviceDescriptor::default())
            .await
            .map_err(|e| format!("Failed to request device: {:?}", e))?;

        // Step 5: SurfaceConfiguration, allocates swapchain framebuffers
        let surface_caps = surface.get_capabilities(&adapter);
        let format = surface_caps
            .formats
            .iter()
            .find(|f| f.is_srgb())
            .copied()
            .unwrap_or(surface_caps.formats[0]);
        // Fifo is the only present mode every surface supports, so fall back to it.
        let present_mode = if surface_caps.present_modes.contains(&wgpu::PresentMode::Mailbox) {
            wgpu::PresentMode::Mailbox
        } else {
            wgpu::PresentMode::Fifo
        };
        let config = wgpu::SurfaceConfiguration {
            // RENDER_ATTACHMENT is the only usage every surface is guaranteed to support.
            usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
            format,
            width: size.width.max(1),
            height: size.height.max(1),
            present_mode,
            desired_maximum_frame_latency: 2,
            alpha_mode: surface_caps.alpha_modes[0],
            view_formats: vec![format.add_srgb_suffix()],
        };
        surface.configure(&device, &config);

        // Step 6: Compile the shader module
        let shader_module = device.create_shader_module(
            wgpu::ShaderModuleDescriptor {
                label: Some("Rainbow Triangle Shader"),
                source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
            }
        );

        // Step 7: Upload vertex data to GPU memory
        use wgpu::util::DeviceExt;
        let vertex_buffer = device.create_buffer_init(
            &wgpu::util::BufferInitDescriptor {
                label: Some("Vertex Buffer"),
                contents: bytemuck::cast_slice(VERTICES),
                usage: wgpu::BufferUsages::VERTEX,
            }
        );

        // Step 8: Create the render pipeline
        let vertex_buffer_layout = wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<Vertex>() as u64,
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    format: wgpu::VertexFormat::F32x3,
                    shader_location: 0,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as u64,
                    format: wgpu::VertexFormat::F32x3,
                    shader_location: 1,
                },
            ],
        };

        let pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
            label: Some("Triangle Pipeline"),
            layout: None,
            vertex: wgpu::VertexState {
                module: &shader_module,
                entry_point: Some("vs_main"),
                buffers: &[vertex_buffer_layout],
                compilation_options: Default::default(),
            },
            primitive: wgpu::PrimitiveState {
                topology: wgpu::PrimitiveTopology::TriangleList,
                strip_index_format: None,
                front_face: wgpu::FrontFace::Ccw,
                cull_mode: Some(wgpu::Face::Back),
                unclipped_depth: false,
                polygon_mode: wgpu::PolygonMode::Fill,
                conservative: false,
            },
            depth_stencil: None,
            multisample: wgpu::MultisampleState {
                count: 1,
                mask: !0,
                alpha_to_coverage_enabled: false,
            },
            fragment: Some(wgpu::FragmentState {
                module: &shader_module,
                entry_point: Some("fs_main"),
                targets: &[Some(wgpu::ColorTargetState {
                    format: config.format,
                    blend: None,
                    write_mask: wgpu::ColorWrites::ALL,
                })],
                compilation_options: Default::default(),
            }),
            multiview_mask: None,
            cache: None,
        });

        Ok(Self {
            surface,
            device,
            queue,
            config,
            pipeline,
            vertex_buffer,
        })
    }
}
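The stride and offsets in the vertex buffer layout follow directly from the memory layout of Vertex. The std-only sketch below recomputes them and serializes vertices by hand, which is roughly the byte stream that bytemuck::cast_slice exposes without copying. The helper names `stride`, `attribute_offsets`, and `to_bytes` are hypothetical, and the struct omits the bytemuck derives so that the sketch has no dependencies.

```rust
use std::mem::{offset_of, size_of};

/// Same field layout as the tutorial's vertex type, without the bytemuck derives.
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

/// Byte distance between consecutive vertices (the `array_stride` value).
fn stride() -> u64 {
    size_of::<Vertex>() as u64
}

/// Byte offsets of the `position` and `color` attributes inside one vertex.
fn attribute_offsets() -> (u64, u64) {
    (
        offset_of!(Vertex, position) as u64,
        offset_of!(Vertex, color) as u64,
    )
}

/// Serializes vertices by hand into the byte stream the GPU would read.
fn to_bytes(vertices: &[Vertex]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(vertices.len() * size_of::<Vertex>());
    for v in vertices {
        // position first, then color, each component as 4 native-endian bytes
        for component in v.position.iter().chain(v.color.iter()) {
            bytes.extend_from_slice(&component.to_ne_bytes());
        }
    }
    bytes
}
```

Six f32 components at 4 bytes each give a 24-byte stride, with the color attribute starting at byte 12. That matches the value of size_of::<[f32; 3]>() used for the second attribute's offset.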
Init Steps Explained
Step 1 (Instance): Instance::default() creates the wgpu entry point and
initializes the enabled backends for the current platform. On Linux with
Vulkan, this loads libvulkan.so and creates a VkInstance; on Windows, the
Vulkan backend loads vulkan-1.dll. The instance is the root wgpu object:
surfaces and adapters are created from it, and everything else descends from
those.
Step 2 (Surface): instance.create_surface(window) binds the wgpu instance
to the winit Window. This tells the GPU: "the pixels of this window will be
the output of my rendering." In Vulkan terms, this creates the VkSurfaceKHR;
the swapchain itself (VkSwapchainKHR) is created later, when the surface is
configured. The surface must match the window platform type exactly (X11,
Wayland, Windows, macOS, etc.).
Step 3 (Adapter): request_adapter() queries the available GPUs and returns the
best match for the given options. With PowerPreference::HighPerformance, wgpu
prefers a discrete GPU over an integrated one on hybrid systems (e.g., NVIDIA +
Intel Optimus). With compatible_surface: None, wgpu does not check that the
chosen adapter can present to our surface. That usually works on a single-GPU
Linux machine with Vulkan, where every adapter can present to the window, but
passing Some(&surface) is the stricter choice on multi-GPU systems.
Step 4 (Device + Queue): request_device() creates the logical device and its
submission queue. The device tracks all GPU memory and validates API calls. The
queue is the submission endpoint: every rendered frame becomes a command buffer
that is submitted to this queue. On Vulkan, the device corresponds to a
VkDevice and the queue to a VkQueue.
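Step 5 (SurfaceConfiguration): This allocates the swapchain framebuffers.
We negotiate the pixel format with the driver (preferring an sRGB format for
correct color display), pick the window dimensions (clamped to at least 1x1,
because a zero-sized surface is invalid and a minimized window can report a
size of zero on some platforms), and select the present mode.
PresentMode::Mailbox keeps only the most recent frame queued for the next
vertical blank, which avoids tearing with low latency but does not cap the
render rate at the display refresh rate. Mailbox is not available on every
platform; Fifo is the only mode supported everywhere, so robust code checks
surface_caps.present_modes before choosing Mailbox.
desired_maximum_frame_latency: 2 caps how many frames may be queued ahead of
the display, trading a little latency for smoother frame pacing.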
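The three decisions in Step 5 can be sketched as small pure functions. The enums below are simplified, hypothetical stand-ins for wgpu's TextureFormat and PresentMode, and the Fifo fallback is an assumption of this sketch about how the choice is best made, not a description of wgpu internals.

```rust
/// Hypothetical, simplified stand-in for wgpu's texture formats.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Format {
    Bgra8Unorm,
    Bgra8UnormSrgb,
    Rgba8Unorm,
}

/// Hypothetical, simplified stand-in for wgpu's present modes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PresentMode {
    Fifo,
    Mailbox,
    Immediate,
}

impl Format {
    fn is_srgb(self) -> bool {
        matches!(self, Format::Bgra8UnormSrgb)
    }
}

/// Picks the first sRGB format, falling back to the first reported format.
/// Returns None if the surface reports no formats at all.
fn choose_format(supported: &[Format]) -> Option<Format> {
    supported
        .iter()
        .copied()
        .find(|f| f.is_srgb())
        .or_else(|| supported.first().copied())
}

/// Prefers Mailbox when the surface reports it, otherwise falls back to Fifo.
fn choose_present_mode(supported: &[PresentMode]) -> PresentMode {
    if supported.contains(&PresentMode::Mailbox) {
        PresentMode::Mailbox
    } else {
        PresentMode::Fifo
    }
}

/// Clamps a window size so that neither dimension is zero.
fn clamp_size(width: u32, height: u32) -> (u32, u32) {
    (width.max(1), height.max(1))
}
```

Unlike indexing with formats[0], `choose_format` returns `None` for an empty list, so the caller can report an incompatible adapter instead of panicking.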
Steps 6 through 8 — shader module compilation, vertex buffer upload, and render pipeline assembly — will be explored in detail in the next sections.