Live Demo — Browser-Native GPU Inference

Zero server calls.
Everything runs on your GPU.

Model loaded into GPU memory via ONNX Runtime Web. WGSL compute shaders drive the particle stream — inference and rendering happen in the same browser tick. No API. No backend. Just your machine.

WebGPU

Initializing WebGPU...

WebGPU: checking…

Model: loading…

Inference: ready

— tok/s

Token Output

Model loading… inference pipeline will appear here when ready.

—

tokens/sec

—

ms/token

—

model loaded

—

GPU backend

Back to landing

Zero server calls.Everything runs on your GPU.

Zero server calls.
Everything runs on your GPU.