Ollama + Open WebUI Starter Stack

A practical local AI stack for beginners: Ollama runs the models, and Open WebUI gives you a private browser-based chat workspace on top.

Stack components

  • Ollama: local model runtime and API.
  • Open WebUI: self-hosted chat interface for local and hosted model backends.
  • Models: start with smaller Qwen, Gemma, Phi, Mistral, or DeepSeek distilled models.
  • Optional RAG layer: add Qdrant, Chroma, or another vector database later.
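Because Ollama exposes a local HTTP API, a quick way to confirm the runtime is up is to ask it which models are installed. The sketch below is a minimal example, assuming Ollama is listening on its default address (http://localhost:11434) and that the /api/tags endpoint returns a JSON object with a "models" list. The helper names are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(tags_response: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

Calling list_local_models() with Ollama running should return the tags of any models you have pulled. An empty list usually just means nothing has been pulled yet.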

Hardware requirements

Start smaller than you think. A reliable 7B or 8B quantized model is a better first milestone than a large model that barely fits and makes every test slow.

Tier         Hardware                                   Model guidance
Minimum      16 GB RAM, CPU-only possible               Use 3B-8B quantized models
Recommended  32 GB RAM, NVIDIA GPU with 12 GB+ VRAM     Use 7B-14B models comfortably
Best         64 GB RAM, 24 GB+ VRAM or server hardware  Try larger 32B-70B quantized models
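To get a feel for why these tiers line up with these model sizes, a back-of-the-envelope estimate helps. The sketch below is a rough rule of thumb, not a measurement: it multiplies parameter count by bits per weight and then applies an overhead factor for the context cache and runtime buffers. The 4-bit default and the 1.2 overhead factor are assumptions; real usage varies with quantization format and context length.

```python
def estimate_model_memory_gb(
    params_billions: float,
    bits_per_weight: float = 4.0,   # assumption: a typical 4-bit quantization
    overhead_factor: float = 1.2,   # assumption: ~20% extra for cache and buffers
) -> float:
    """Rough memory estimate in GB for loading a quantized model."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead_factor / 1e9


# Print rough estimates for the model sizes mentioned in the table.
for size in (3, 8, 14, 32, 70):
    print(f"{size}B at 4-bit: about {estimate_model_memory_gb(size):.1f} GB")
```

Under these assumptions an 8B model needs roughly 4.8 GB and a 32B model roughly 19.2 GB, which is consistent with treating 12 GB of VRAM as comfortable for 7B-14B models and 24 GB as the entry point for larger ones.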

Recommended first models

Setup path

  1. Install Ollama on your local machine.
  2. Pull one small model first instead of starting with a huge checkpoint.
  3. Run a few prompts directly through Ollama.
  4. Install or run Open WebUI and connect it to Ollama.
  5. Create a test workspace and compare two or three models on the same prompts.
  6. Add RAG, vector search, or MCP tools only after the basic chat stack is stable.
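Steps 3 and 5 above can also be scripted, which makes model comparisons repeatable. The following is a minimal sketch, assuming Ollama is running on its default address and using its /api/generate endpoint with streaming turned off. The function names and any model tags you pass in are placeholders for illustration; substitute the models you actually pulled.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_generate_payload(model: str, prompt: str) -> bytes:
    """Encode a non-streaming request body for POST /api/generate."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Send one prompt to one model and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: the first call may need to load the model into memory.
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)["response"]


def compare_models(models, prompts, generate_fn=generate) -> dict:
    """Run every prompt against every model and collect answers by prompt."""
    return {
        prompt: {model: generate_fn(model, prompt) for model in models}
        for prompt in prompts
    }
```

Passing two or three pulled model tags and a short list of prompts to compare_models() returns a nested dictionary keyed by prompt and then by model, which is easy to print side by side. The generate_fn parameter exists so the comparison logic can be tried out with a stub before any model is loaded.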

Best for

  • Beginners building their first private local AI setup
  • Developers testing local models before wiring them into apps
  • Teams that want a private browser chat interface over local models
  • Builders who want a foundation for RAG, coding assistants, and MCP workflows
