Browser AI playground · Beta demo

Try small open-weight AI models in your browser

Test lightweight WebLLM-compatible models with WebGPU before installing a full local stack. This is a beta browser demo, not a replacement for Ollama, LM Studio, Open WebUI, or production serving.

What this playground does

  • Runs small model demos in the browser when WebGPU is available.
  • Shows model download progress, streaming output, and rough tokens/sec.
  • Keeps prompts client-side in this MVP; OpenSourcesAI does not run playground prompts on its own inference server.
  • Downloads compatible model files through the browser; those files may be cached locally by your browser and may involve requests to third-party model/CDN hosts.
  • Serves as a lightweight preview before you follow the full local setup guides.
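
As an illustration of the "rough tokens/sec" figure mentioned above, here is a minimal sketch of how such an estimate could be computed. The function names are hypothetical and not taken from the playground's source. The sketch assumes the caller counts streamed chunks and treats each chunk as roughly one token, which is only an approximation.

```typescript
// Hypothetical helpers: a rough output-speed estimate for a streamed reply.
// Assumption: tokenCount is the number of streamed chunks, counted as ~1 token each.

function roughTokensPerSec(tokenCount: number, elapsedMs: number): number {
  // Guard against division by zero and empty replies.
  if (elapsedMs <= 0 || tokenCount <= 0) return 0;
  return tokenCount / (elapsedMs / 1000);
}

function formatSpeed(tokenCount: number, elapsedMs: number): string {
  // The "~" prefix signals that this is an estimate, not a benchmark.
  return `~${roughTokensPerSec(tokenCount, elapsedMs).toFixed(1)} tok/s`;
}
```

For example, 50 chunks received over 2000 ms would be reported as about 25 tokens per second. The number reflects output speed only and says nothing about answer quality.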

Beta and privacy note: small browser models can be inaccurate, incomplete, slow, or unsupported on some devices. Prompt inference runs in your browser session in this MVP, but ordinary page-load metadata, hosting logs, analytics events, and third-party model download requests may still be processed. Check each model card and license before using outputs in production.

Included WebLLM model IDs

The selector uses WebLLM-compatible model IDs rather than generic model names. Compatibility, memory limits, browser support, and download size can change, so treat these as demo options and verify against WebLLM and model publisher documentation.

  • Llama 3.2 1B Instruct: Llama-3.2-1B-Instruct-q4f16_1-MLC · ~879 MB VRAM required · 4k context
  • Llama 3.2 1B Instruct q4f32: Llama-3.2-1B-Instruct-q4f32_1-MLC · ~1.1 GB VRAM required · 4k context
  • Llama 3.2 3B Instruct: Llama-3.2-3B-Instruct-q4f16_1-MLC · ~2.3 GB VRAM required · 4k context
  • Llama 3.2 3B Instruct q4f32: Llama-3.2-3B-Instruct-q4f32_1-MLC · ~3 GB VRAM required · 4k context
  • Llama 3.1 8B Instruct q4f16 1k: Llama-3.1-8B-Instruct-q4f16_1-MLC-1k · ~4.6 GB VRAM required · 1k context
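
The list above could be represented as a small typed catalog, which makes it easy to filter demo options by an approximate memory budget. This is a sketch only. The type, constant, and function names are hypothetical. The VRAM figures are the approximate values from the list, converted with 1 GB = 1000 MB, and "4k" and "1k" are assumed to mean 4096 and 1024 tokens.

```typescript
// Hypothetical catalog of the demo models listed above.
interface DemoModel {
  label: string;
  modelId: string;       // WebLLM-compatible model ID from the list
  approxVramMB: number;  // approximate figure, assuming 1 GB = 1000 MB
  contextTokens: number; // assumption: "4k" = 4096, "1k" = 1024
}

const DEMO_MODELS: DemoModel[] = [
  { label: "Llama 3.2 1B Instruct", modelId: "Llama-3.2-1B-Instruct-q4f16_1-MLC", approxVramMB: 879, contextTokens: 4096 },
  { label: "Llama 3.2 1B Instruct q4f32", modelId: "Llama-3.2-1B-Instruct-q4f32_1-MLC", approxVramMB: 1100, contextTokens: 4096 },
  { label: "Llama 3.2 3B Instruct", modelId: "Llama-3.2-3B-Instruct-q4f16_1-MLC", approxVramMB: 2300, contextTokens: 4096 },
  { label: "Llama 3.2 3B Instruct q4f32", modelId: "Llama-3.2-3B-Instruct-q4f32_1-MLC", approxVramMB: 3000, contextTokens: 4096 },
  { label: "Llama 3.1 8B Instruct q4f16 1k", modelId: "Llama-3.1-8B-Instruct-q4f16_1-MLC-1k", approxVramMB: 4600, contextTokens: 1024 },
];

// Models whose approximate VRAM need fits within a user-supplied budget.
function modelsWithinBudget(budgetMB: number): DemoModel[] {
  return DEMO_MODELS.filter((m) => m.approxVramMB <= budgetMB);
}

// The most demanding model that still fits, or undefined if none does.
function largestModelWithinBudget(budgetMB: number): DemoModel | undefined {
  return modelsWithinBudget(budgetMB).sort((a, b) => b.approxVramMB - a.approxVramMB)[0];
}
```

The budget here is a number the user supplies. Browsers do not reliably report available VRAM, so a filter like this is a starting guess and not a guarantee that a model will load.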

Model and project sources

Browser model demo beta

This playground loads a small WebLLM-compatible model in your browser with WebGPU. The first load can download hundreds of MB to several GB.

Prompts are processed client-side in this browser session. OpenSourcesAI does not run these prompts on a server in this MVP.

Small browser models can be inaccurate. Use this playground to test local inference behavior, speed, and prompt feel, not as a source of verified product facts.

Inference is powered by WebLLM from MLC AI. Model weights are provided by their respective publishers and are governed by their own licenses, including Meta Llama licenses where applicable.
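
The client-side flow described above, where a prompt is sent to an in-browser engine and the reply streams back, can be sketched roughly as follows. The `StreamingChatEngine` interface is a hypothetical, minimal stand-in modeled on the OpenAI-style streaming shape that WebLLM exposes. It is not the library's full type, and the playground's actual code may differ, so verify the details against the WebLLM documentation.

```typescript
// Hypothetical minimal types, assumed to mirror an OpenAI-style streaming chat API.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface StreamChunk {
  choices: Array<{ delta: { content?: string | null } }>;
}

interface StreamingChatEngine {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[]; stream: true }): Promise<AsyncIterable<StreamChunk>>;
    };
  };
}

// Sends one user prompt and reports the growing reply as chunks arrive.
async function streamReply(
  engine: StreamingChatEngine,
  prompt: string,
  onText: (partial: string) => void,
): Promise<string> {
  const stream = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  let reply = "";
  for await (const chunk of stream) {
    // Some chunks carry no text (for example the final one), so default to "".
    reply += chunk.choices[0]?.delta.content ?? "";
    onText(reply);
  }
  return reply;
}
```

Because the engine is passed in as a parameter, the same function can be exercised with a fake engine in tests and with a real in-browser engine in the page.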

Compatibility

Selected model: Llama 3.2 1B Instruct

Hardware note: Fast first test for most WebGPU laptops/desktops

Reference checklist

This fixed checklist is the reliable version. The chat below is only a small-model demo and may phrase things imperfectly.

  • WebGPU and browser support: Use a current Chrome or Edge build first, confirm WebGPU is detected, and check the browser console if loading fails.
  • Model loading and speed: Expect the first model load to download hundreds of MB to several GB. Treat tokens/sec as a rough output-speed estimate, not a quality score.
  • Browser cache and storage: Browser caching of model files is expected. Clear site data if you need to reclaim storage or reset a failed download.
  • Prompt privacy limits: Prompts run in this browser session in the MVP. Normal hosting logs, analytics events, and third-party model download requests may still occur.
  • When to move to a full local stack: Use Ollama, LM Studio, or Open WebUI when you need larger models, stable model storage, RAG over files, repeatable APIs, multi-user use, or better GPU control.
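
For the first checklist item, a WebGPU check could look something like the sketch below. It relies on the standard `navigator.gpu.requestAdapter()` call, but the `NavigatorLike` and `GpuLike` types, the status names, and the function name are hypothetical. They are defined locally so the example compiles without WebGPU type definitions.

```typescript
// Hypothetical structural types covering only what this sketch needs.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "supported" | "no-adapter" | "unavailable";

async function detectWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!nav.gpu) return "unavailable";
  // The API can exist while no usable adapter is available (driver or policy limits).
  const adapter = await nav.gpu.requestAdapter().catch(() => null);
  return adapter ? "supported" : "no-adapter";
}
```

In a page this might be called as `detectWebGpu(navigator as NavigatorLike)`. A "no-adapter" result is the case where checking the browser console, as the checklist suggests, is most likely to help.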

Choose model and behavior

Best for: fast browser sanity checks, local AI explanations, short chat, and beginner local AI demos.

Small instruction model for checking whether the browser playground works on your device before trying larger models.

Model loading

Model not loaded yet.

Chat

Output from small browser models may be incomplete or wrong. For factual setup guidance, use the linked OpenSourcesAI guides and tool pages.

Load a model, pick an example prompt, or type your own. If no model is loaded when you send a prompt, the playground loads the selected model first.
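
The load-before-first-prompt behavior described above is a common lazy-loading pattern. The sketch below shows one way it could work and is not the playground's actual implementation. `createLazyEngine` and its `load` parameter are hypothetical names, and the loader is injected so the example stays independent of any particular inference library.

```typescript
// Hypothetical loader signature: resolves to an engine for the given model ID.
type Loader<E> = (modelId: string) => Promise<E>;

// Returns a function that loads a model at most once per model ID in a row.
function createLazyEngine<E>(load: Loader<E>): (modelId: string) => Promise<E> {
  let loadedId: string | null = null;
  let pending: Promise<E> | null = null;

  return function ensureLoaded(modelId: string): Promise<E> {
    if (pending === null || loadedId !== modelId) {
      loadedId = modelId;
      const attempt: Promise<E> = load(modelId).catch((err: unknown) => {
        // Forget a failed attempt so the next prompt can retry the download.
        if (pending === attempt) pending = null;
        throw err;
      });
      pending = attempt;
    }
    return pending;
  };
}
```

A send handler would then await `ensureLoaded(selectedModelId)` before streaming a reply, so repeated prompts reuse the engine that is already loaded, and switching the selected model triggers a fresh load.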

Build the full local stack

The browser demo is useful for quick testing, but the full stack gives you more control over models, storage, privacy, GPU usage, documents, and integrations.