diff --git a/docs/01-rainbow-triangle.md b/docs/01-rainbow-triangle.md index 24ff2eb..5f6cef5 100644 --- a/docs/01-rainbow-triangle.md +++ b/docs/01-rainbow-triangle.md @@ -27,22 +27,18 @@ code and the display server (X11 or Wayland on Linux). Think of it like `epoll` or `kqueue` but for windows, input, and display lifecycle events instead of file descriptors. -The entire program runs on the tokio async runtime — wgpu's [adapter](concepts/GLOSSARY.md#adapter) -queries and [device](concepts/GLOSSARY.md#device) creation are async, and the -runtime is the natural home for the main event loop. +GPU initialization (adapter and device queries) is async, while the winit event loop runs synchronously. We bridge these two execution models once at startup using `pollster::block_on`. ### Architecture Overview -- **`main()` is `#[tokio::main] async fn`** — the entry point runs on the tokio - runtime, giving us access to tokio's task scheduler and I/O facilities. -- **`tokio::spawn_blocking`** — winit's `event_loop.run_app()` is synchronous - and owns the display server connection. Blocking the tokio runtime thread with - an indefinite sync call would starve other tasks. We offload the blocking event - loop to a dedicated thread, then await the join handle. -- **`Handle::block_on()` in `resumed()`** — wgpu initialization (adapter and +- **`main()` is synchronous** — the entry point initializes logging, creates + the event loop, and calls `run_app()`. The entire program runs on a single + thread. No async runtime is needed for the main loop. +- **`pollster::block_on()` in `resumed()`** — wgpu initialization (adapter and device queries) is async, but winit's `resumed()` handler is synchronous. We - bridge the two execution models exactly once at startup. This initial GPU - setup takes ~50ms of wall time. + bridge the two execution models exactly once at startup using `pollster`, + a minimal single-threaded async executor. This initial GPU setup takes + ~50ms of wall time. 
- **`Arc<Window>`** — shared reference count to the window, needed because both winit event handlers and wgpu [surface](concepts/GLOSSARY.md#surface) state must hold a reference to the same window object across the event loop @@ -60,17 +56,16 @@ Add these to your `Cargo.toml`: ```toml wgpu = "29" winit = "0.30" -tokio = { version = "1", features = ["rt", "macros"] } +pollster = "0.4" bytemuck = { version = "1", features = ["derive"] } log = "0.4" -simple_logger = "5" ``` - `wgpu` — the GPU abstraction layer. Manages device lifecycles, shaders, buffers, pipelines, and command encoding. - `winit` — cross-platform window creation and event dispatch. Owns the display server connection. -- `tokio` — async runtime for the main loop and all GPU queries. +- `pollster` — minimal single-threaded async executor. Bridges wgpu's async GPU queries with synchronous winit callbacks. Polls futures to completion (~50ms) during initial GPU setup, then returns. - `bytemuck` — zero-copy casting between Rust structs and byte slices. Required for uploading vertex data to GPU buffers without manual serialization. - `log` / `simple_logger` — structured logging. 
wgpu and winit emit diagnostic @@ -79,6 +74,7 @@ simple_logger = "5" ### Complete Code ```rust +use pollster::block_on; use std::sync::Arc; use winit::application::ApplicationHandler; use winit::dpi::LogicalSize; @@ -86,26 +82,19 @@ use winit::event::WindowEvent; use winit::event_loop::{ActiveEventLoop, ControlFlow, EventLoop}; use winit::window::{Window, WindowId}; -#[tokio::main] -async fn main() { +fn main() { simple_logger::init_with_level(log::Level::Debug).unwrap(); let event_loop = EventLoop::new().unwrap(); - let handle = tokio::Handle::current(); - tokio::spawn_blocking(move || { - event_loop.run_app(&mut App { - handle, - window: None, - state: None, - }) + event_loop.run_app(&mut App { + window: None, + state: None, }) - .await .unwrap(); } struct App { - handle: tokio::Handle, window: Option<Arc<Window>>, state: Option<State>, } @@ -125,11 +114,10 @@ impl ApplicationHandler<()> for App { self.window = Some(window.clone()); self.state = Some( - self.handle - .block_on(async { - State::new(window.clone()).await.expect("Failed to create wgpu State") - }) - .expect("Failed to create wgpu State"), + block_on(async { + State::new(window.clone()).await + }) + .expect("Failed to create wgpu State"), ); } @@ -159,13 +147,9 @@ impl ApplicationHandler<()> for App { } ``` -> **WHY: `spawn_blocking` for winit** +> **WHY: `pollster::block_on` for async GPU init** > -> The display server event loop must run to completion and cannot be interrupted. If we ran `run_app()` on the tokio runtime thread, no other async tasks could execute. By spawning it on a blocking thread, the tokio runtime remains free for GPU queries, driver I/O, and future background tasks. - -> **WHY: `Handle::block_on` for async GPU init** -> -> wgpu's `request_adapter` and `request_device` query the driver over async D-Bus/Wayland/Vulkan entrypoints. These futures must be polled by a runtime executor. 
`block_on` attaches temporarily to the runtime thread via its handle, polls the future to completion (~50ms), then returns the result. +> wgpu's `request_adapter` and `request_device` are async functions; the futures they return do nothing until an executor polls them. We use `pollster`, a minimal single-threaded async executor, to bridge wgpu's async GPU initialization with winit's synchronous `resumed()` callback. `pollster::block_on` polls the future to completion (~50ms) on the current thread, then returns the result: no background runtime, no spawn overhead, no cross-thread communication. > **WHY: `ControlFlow::Poll` for the render loop** > @@ -307,7 +291,7 @@ const VERTICES: &[Vertex] = &[ impl State { async fn new(window: Arc<Window>) -> Result { // Step 1: Instance — connection to the graphics driver - let instance = wgpu::Instance::default(); + let instance = wgpu::Instance::new(wgpu::InstanceDescriptor::new_without_display_handle()); // Step 2: Surface — binds our window to the GPU's swapchain let surface = instance @@ -375,12 +359,12 @@ impl State { attributes: &[ wgpu::VertexAttribute { offset: 0, - format: wgpu::VertexFormat::F32x3, + format: wgpu::VertexFormat::Float32x3, shader_location: 0, }, wgpu::VertexAttribute { offset: std::mem::size_of::<[f32; 3]>() as u64, - format: wgpu::VertexFormat::F32x3, + format: wgpu::VertexFormat::Float32x3, shader_location: 1, }, ], @@ -392,7 +376,7 @@ impl State { vertex: wgpu::VertexState { module: &shader_module, entry_point: Some("vs_main"), - buffers: &[vertex_buffer_layout], + buffers: &[Some(&vertex_buffer_layout)], compilation_options: Default::default(), }, primitive: wgpu::PrimitiveState { @@ -415,7 +399,7 @@ impl State { entry_point: Some("fs_main"), targets: &[Some(wgpu::ColorTargetState { format: config.format, - blend: None, + blend: Some(wgpu::BlendState::REPLACE), write_mask: wgpu::ColorWrites::ALL, })], compilation_options: Default::default(), @@ -751,7 +735,7 @@ the master 
`State::new()` block (S3, Step 8): the vertex shader processes. The other option is `Instance`, which advances per draw instance in instanced rendering. For a single triangle, `Vertex` is correct: each of the three vertices has its own position and color. -- **First attribute — `shader_location: 0`**: reads 3 floats (`F32x3`) at byte +- **First attribute — `shader_location: 0`**: reads 3 floats (`Float32x3`) at byte offset 0 of each vertex. These 3 floats map to the [shader location](concepts/GLOSSARY.md#shader-location) `@location(0)` in the vertex shader — the `position` parameter. The GPU delivers `[x, y, z]` to @@ -789,7 +773,7 @@ must provide a `RenderPipelineLayout` created with `device.create_render_pipelin - **`entry_point: Some("vs_main")`** — selects which function in the module is the vertex shader entry point. Must match the `@vertex fn vs_main(...)` declaration exactly. -- **`buffers: &[vertex_buffer_layout]`** — array of vertex buffer layouts. +- **`buffers: &[Some(&vertex_buffer_layout)]`** — array of optional vertex buffer layouts. Each layout is wrapped in `Some` to indicate it is present. Multiple layouts are used rarely (multi-mesh, GPU instancing with separate instance buffers). For a single vertex buffer, one layout suffices. - **`compilation_options: Default::default()`** — shader compilation backend @@ -846,10 +830,11 @@ draws at the same pixel. For a single triangle this is not a concern. `SurfaceConfiguration`. The pipeline writes in this format; the surface reads in this format. A mismatch at render time produces an error. If you change the surface format, you must recreate the pipeline. - - **`blend: None`** — disables blending. Without blending, every fragment - color replaces the existing framebuffer pixel (`REPLACE` mode). With - blending, new and existing colors are combined according to a blend - equation (useful for transparency). 
+ - **`blend: Some(wgpu::BlendState::REPLACE)`** — each fragment color + overwrites the existing framebuffer pixel. `None` disables blending and + gives the same result, but we state it explicitly for clarity. With a custom + blend state, new and existing colors are combined according to a blend + equation (useful for transparency). - **`write_mask: ColorWrites::ALL`** — write all four RGBA channels. You can mask out individual channels (e.g., write only R and G) if you need to preserve certain framebuffer channels across draw calls. @@ -903,8 +888,8 @@ fn render(&mut self) { ``` This is a **fully synchronous** method. It runs on the winit event loop thread -(triggered by `RedrawRequested`), has no `async` keyword, no `.await`, and takes -no tokio handle. All wgpu recording and submission operations are synchronous +(triggered by `RedrawRequested`), has no `async` keyword, no `.await`, and requires +no async runtime. All wgpu recording and submission operations are synchronous and fast — they only encode instructions and push them to the queue; they do not wait for GPU completion. @@ -997,12 +982,7 @@ fn render(&mut self) { depth_slice: None, resolve_target: None, ops: wgpu::Operations { - load: wgpu::LoadOp::Clear(wgpu::Color { - r: 0.1, - g: 0.1, - b: 0.1, - a: 1.0, - }), + load: wgpu::LoadOp::Clear(wgpu::Color::BLACK), store: wgpu::StoreOp::Store, }, })], @@ -1076,7 +1056,7 @@ color_attachments: &[Some(wgpu::RenderPassColorAttachment { depth_slice: None, resolve_target: None, ops: wgpu::Operations { - load: wgpu::LoadOp::Clear(wgpu::Color { r: 0.1, g: 0.1, b: 0.1, a: 1.0 }), + load: wgpu::LoadOp::Clear(wgpu::Color::BLACK), store: wgpu::StoreOp::Store, }, })], @@ -1094,7 +1074,7 @@ color_attachments: &[Some(wgpu::RenderPassColorAttachment { - **`ops`** — [operations](concepts/GLOSSARY.md#operations) controlling load and store behavior. Two sub-fields: - **`load: LoadOp::Clear(color)`** — before drawing, fill the entire - framebuffer with this color. 
**This IS your background color.** Dark gray. + framebuffer with this color. **This IS your background color.** Black. `LoadOp::Load` keeps existing pixels (used in UI compositing where you draw on top of previous content). - **`store: StoreOp::Store`** — after drawing, keep what was written. The @@ -1209,7 +1189,7 @@ and returns the original vector. This avoids a heap allocation for what is typically a 1-element vec. ```rust -fn resize(&mut self, size: wgpu::dpi::PhysicalSize<u32>) { +fn resize(&mut self, size: winit::dpi::PhysicalSize<u32>) { if size.width > 0 && size.height > 0 { let config = wgpu::SurfaceConfiguration { usage: self.config.usage, @@ -1300,7 +1280,7 @@ cargo run module compilation log, pipeline creation messages, and the `simple_logger` debug lines from surface status and device polling. -**Expected visual:** A dark gray background (from `LoadOp::Clear`) with a +**Expected visual:** A black background (from `LoadOp::Clear(wgpu::Color::BLACK)`) with a rainbow triangle spanning most of the window. Red at the bottom-left corner, blue at the bottom-right corner, green at the top vertex. Colors blend smoothly across the triangle surface via hardware interpolation.
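
The patch replaces the tokio handle with `pollster::block_on`. To make it concrete what a "minimal single-threaded async executor" amounts to, here is a std-only sketch of the same idea. It is not pollster's actual implementation, only an illustration under stated assumptions: `ThreadWaker`, `init_gpu`, and `resumed_like` are hypothetical names, and `init_gpu` merely stands in for the async `State::new` call.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `fut` on the current thread until it completes.
/// No background runtime and no extra threads are created.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the waker fires; a spurious wakeup only costs one extra poll.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for the async `State::new(window)` call.
async fn init_gpu() -> Result<&'static str, String> {
    Ok("device ready")
}

/// Hypothetical stand-in for the synchronous `resumed()` handler:
/// it bridges into async code exactly once and unwraps the result once.
fn resumed_like() -> &'static str {
    block_on(init_gpu()).expect("Failed to create wgpu State")
}
```

Calling `resumed_like()` from a plain `fn main()` yields the initialized value without starting any runtime, which is the same shape as the `resumed()` hunk in this patch: one `block_on` call, one `expect` on the `Result` it returns.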