Starter stack
Ollama + Open WebUI Starter Stack
A practical local AI stack for beginners: Ollama runs the models, and Open WebUI gives you a private browser-based chat workspace on top.
Stack components
- Ollama: local model runtime and API.
- Open WebUI: self-hosted chat interface for local and hosted model backends.
- Models: start with smaller Qwen, Gemma, Phi, Mistral, or DeepSeek distilled models.
- Optional RAG layer: add Qdrant, Chroma, or another vector database later.
Hardware requirements
Start smaller than you think. A reliable 7B or 8B quantized model is a better first milestone than a large model that barely fits and makes every test slow.
TierHardwareModel guidance
Minimum16 GB RAM, CPU-only possibleUse 3B-8B quantized models
Recommended32 GB RAM, NVIDIA GPU with 12 GB+ VRAMUse 7B-14B models comfortably
Best64 GB RAM, 24 GB+ VRAM or server hardwareTry larger 32B-70B quantized models
Recommended first models
- Qwen3 Coder for coding experiments.
- Gemma 4 or Gemma-family models for general chat and local testing.
- Phi-4 Mini for small-model laptop tests.
- Mistral Small 3.1 for efficient local chat workflows.
Setup path
- Install Ollama on your local machine.
- Pull one small model first instead of starting with a huge checkpoint.
- Run a few prompts directly through Ollama.
- Install or run Open WebUI and connect it to Ollama.
- Create a test workspace and compare two or three models on the same prompts.
- Add RAG, vector search, or MCP tools only after the basic chat stack is stable.
Best for
- Beginners building their first private local AI setup
- Developers testing local models before wiring them into apps
- Teams that want a private browser chat interface over local models
- Builders who want a foundation for RAG, coding assistants, and MCP workflows