Live Demo — Browser-Native GPU Inference

Zero server calls.
Everything runs on your GPU.

Model loaded into GPU memory via ONNX Runtime Web. WGSL compute shaders drive the particle stream — inference and rendering happen in the same browser tick. No API. No backend. Just your machine.

WebGPU
Initializing WebGPU...
WebGPU: checking…
Model: loading…
Inference: ready

Token Output

Model loading… inference pipeline will appear here when ready.
tokens/sec
ms/token
model loaded
GPU backend
Back to landing