Live Demo — Browser-Native GPU Inference
Zero server calls.
Everything runs on your GPU.
Model loaded into GPU memory via ONNX Runtime Web. WGSL compute shaders drive the particle stream — inference and rendering happen in the same browser tick. No API. No backend. Just your machine.
WebGPU
Token Output
Model loading… inference pipeline will appear here when ready.
—
tokens/sec
—
ms/token
—
model loaded
—
GPU backend