Guide
How to Choose a Model for Coding, RAG, Summarization, and Agents
The best model is the one that performs reliably on your task while staying within your budget, latency target, and license constraints.
Who this is for
Developers comparing open models for practical apps.
Recommended stack
- Qwen or DeepSeek for coding tests
- E5 or BGE for retrieval
- Qwen, Llama, Mistral, or Gemma for chat
Coding
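Evaluate on real tasks from your own repositories and measure patch quality (does the patch apply, pass the tests, and avoid regressions) rather than relying on published coding benchmark scores.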
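One possible way to turn "patch quality" into a number is to compare test results before and after a model's patch is applied. The sketch below assumes you already have per-test pass/fail results from your own test runner. The function name `score_patch`, the result format, and the acceptance rule are illustrative assumptions, not a standard metric.

```python
from typing import Dict, List, Union

PatchScore = Dict[str, Union[List[str], bool]]


def score_patch(before: Dict[str, bool], after: Dict[str, bool]) -> PatchScore:
    """Compare per-test results (True = pass) before and after a patch.

    A test missing from `after` is treated as failing, so a patch that
    deletes or breaks a test cannot score well by accident.
    """
    fixed = sorted(t for t, ok in before.items() if not ok and after.get(t, False))
    regressed = sorted(t for t, ok in before.items() if ok and not after.get(t, False))
    still_failing = sorted(
        t for t, ok in before.items() if not ok and not after.get(t, False)
    )
    # Assumed acceptance rule: something was fixed, nothing broke, nothing is left failing.
    accepted = bool(fixed) and not regressed and not still_failing
    return {
        "fixed": fixed,
        "regressed": regressed,
        "still_failing": still_failing,
        "accepted": accepted,
    }


# Hypothetical results for one repo task
before = {"test_parse": False, "test_render": True, "test_cli": True}
after = {"test_parse": True, "test_render": True, "test_cli": False}
result = score_patch(before, after)
```

In this example the patch fixes `test_parse` but breaks `test_cli`, so it is rejected even though the target test now passes. Counting regressions separately is the point: a benchmark-style pass rate alone would hide them.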
RAG
Separate embedding, retrieval, reranking, and answer generation choices. A better retriever can beat a bigger generator.
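To compare retrievers independently of the generator, you can score retrieval on its own with a metric such as recall@k. The following is a minimal sketch, assuming you have hand-labeled relevant document ids for each query. The query ids, document ids, and the two retriever outputs are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant document ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over every labeled query; unanswered queries score 0."""
    scores = [recall_at_k(runs.get(query, []), rel, k) for query, rel in gold.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical gold labels and ranked outputs from two retrievers
gold = {"q1": {"d1", "d2"}, "q2": {"d7"}}
retriever_a = {"q1": ["d1", "d9", "d2"], "q2": ["d3", "d7", "d8"]}
retriever_b = {"q1": ["d9", "d1", "d4"], "q2": ["d3", "d8", "d7"]}

score_a = mean_recall_at_k(retriever_a, gold, k=2)
score_b = mean_recall_at_k(retriever_b, gold, k=2)
```

Running the same scoring before and after adding a reranker shows whether the reranker helps, without any generator in the loop.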
Agents
Prioritize tool-call reliability, context handling, and recovery from mistakes.
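Tool-call reliability can be estimated by checking how often raw model outputs parse as a well-formed call. The sketch below assumes tool calls arrive as JSON objects with `name` and `arguments` keys. That wire format, the `TOOLS` registry, and the tool names are assumptions for illustration and will differ between agent frameworks.

```python
import json
from typing import Dict, List

# Hypothetical tool registry: tool name -> required argument names and types
TOOLS: Dict[str, Dict[str, type]] = {
    "search_docs": {"query": str, "top_k": int},
    "read_file": {"path": str},
}


def is_valid_tool_call(raw: str) -> bool:
    """Return True if `raw` is JSON naming a known tool with exactly the expected arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS or not isinstance(args, dict):
        return False
    schema = TOOLS[name]
    if set(args) != set(schema):
        return False
    # Exact type match, so that True/False is not accepted where an int is expected.
    return all(type(args[key]) is expected for key, expected in schema.items())


def tool_call_success_rate(outputs: List[str]) -> float:
    """Share of outputs that are valid tool calls."""
    if not outputs:
        return 0.0
    return sum(is_valid_tool_call(o) for o in outputs) / len(outputs)


# Hypothetical raw model outputs
outputs = [
    '{"name": "search_docs", "arguments": {"query": "license", "top_k": 3}}',
    '{"name": "read_file", "arguments": {"path": "README.md"}}',
    '{"name": "search_docs", "arguments": {"query": "license"}}',  # missing top_k
    "Sure! I will call search_docs now.",  # prose instead of JSON
]
rate = tool_call_success_rate(outputs)
```

This only checks that a call is well-formed. Whether the agent picked the right tool, or recovered after a failed call, needs task-level checks on top.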
Practical recommendations
- Build a 20-question eval set from real tasks
- Track latency and cost per request
- Record the model version and quantization for every run
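These recommendations can be combined into one small harness. The sketch below is a rough outline under stated assumptions: `generate` is a hypothetical callable wrapping your model that returns the answer text and a token count, correctness is approximated by a substring match, and the price per 1,000 tokens is a placeholder you would replace with your own figure. The model and quantization names are made up.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class EvalRecord:
    model: str
    quantization: str
    question: str
    correct: bool
    latency_s: float
    cost_usd: float


def run_eval(
    generate: Callable[[str], Tuple[str, int]],
    model: str,
    quantization: str,
    eval_set: List[Tuple[str, str]],
    usd_per_1k_tokens: float,
) -> List[EvalRecord]:
    """Run every (question, expected) pair and record one row per question."""
    records = []
    for question, expected in eval_set:
        start = time.perf_counter()
        answer, tokens_used = generate(question)
        latency_s = time.perf_counter() - start
        records.append(
            EvalRecord(
                model=model,
                quantization=quantization,
                question=question,
                # Assumption: a substring match stands in for a real grader.
                correct=expected.lower() in answer.lower(),
                latency_s=latency_s,
                cost_usd=tokens_used / 1000 * usd_per_1k_tokens,
            )
        )
    return records


def summarize(records: List[EvalRecord]) -> Dict[str, float]:
    """Aggregate accuracy, mean latency, and total cost for one run."""
    n = len(records)
    if n == 0:
        return {"accuracy": 0.0, "mean_latency_s": 0.0, "total_cost_usd": 0.0}
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "mean_latency_s": sum(r.latency_s for r in records) / n,
        "total_cost_usd": sum(r.cost_usd for r in records),
    }


def fake_generate(question: str) -> Tuple[str, int]:
    """Stand-in for a real model call; returns (answer, tokens used)."""
    canned = {"Capital of France?": "The capital is Paris."}
    return canned.get(question, "I don't know."), 100


# Two questions for brevity; a real set would hold the full 20.
eval_set = [("Capital of France?", "Paris"), ("Capital of Peru?", "Lima")]
records = run_eval(
    fake_generate, "example-model-7b", "q4_k_m", eval_set, usd_per_1k_tokens=0.002
)
summary = summarize(records)
```

Storing the model and quantization on every row means results from different builds can sit in the same file without being confused later.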
Tradeoffs
Leaderboard performance does not guarantee performance on your prompts, users, or documents.
FAQ
Should I trust public benchmarks?
Use them as a shortlist signal, then run your own evaluation on real tasks.
Next steps
Use the model and tool directories to choose the concrete pieces for your local AI stack.