Model family
Llama Models
Meta Llama models are widely supported open-weight options for local AI stacks, multimodal workflows, assistant prototypes, and guardrail experiments.
Best for
Chat
Use this family hub to compare Llama variants for chat workflows, then open the detail page for deeper deployment notes.
Multimodal
Use this family hub to compare Llama variants for multimodal workflows, then open the detail page for deeper deployment notes.
Local
Use this family hub to compare Llama variants for local workflows, then open the detail page for deeper deployment notes.
Safety
Use this family hub to compare Llama variants for safety workflows, then open the detail page for deeper deployment notes.
Variants
Llama models grouped by workflow
Latest / flagship
Llama 4 Scout
Meta · Llama
Best for: Teams evaluating Llama-family models for multimodal assistant, long-context, and application workflows.
Local: Evaluate local fit with the exact checkpoint and quantization available for your runtime.
Llama 4 Maverick
Meta · Llama
Best for: Builders comparing current Llama-family models for assistant, multimodal, and reasoning-oriented workflows.
Local: Evaluate local fit with the exact checkpoint and quantization available for your runtime.
Llama 3.1 405B Instruct
Meta · Llama
Best for: Server-class assistant evaluation and comparisons against smaller Llama variants.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
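Vision / multimodal
Llama 3.2 Vision
Meta · Llama
Best for: Vision-language experiments, screenshot reasoning, and multimodal app prototypes.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Safety / guardrails
Llama Guard 3
Meta · Llama
Best for: Safety checks, guardrail experiments, and policy classification workflows.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.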
Local-friendly
Llama 3 70B
Meta · Llama
Best for: Builders who want a widely supported open-weight chat model with broad runtime compatibility.
Local: Commonly used in local workflows through quantized builds, but 70B-class models run best on high-memory GPUs or workstation/server hardware.
Llama 3.3 70B Instruct
Meta · Llama
Best for: General assistant workflows, app prototypes, and Llama-family baseline comparisons.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Llama 3.1 70B Instruct
Meta · Llama
Best for: Teams comparing widely supported Llama-family 70B-class models.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Llama 3.1 8B Instruct
Meta · Llama
Best for: Local prototypes, small assistants, and lower-resource evaluation.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Llama 3 8B Instruct
Meta · Llama
Best for: Local baseline comparisons and lightweight app prototypes.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
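The Local notes above all depend on whether a given checkpoint's weights fit in memory at a given quantization. As a rough sketch, weight memory is approximately the parameter count times the bits per weight, divided by 8. The Python below applies that arithmetic to the dense models whose nominal sizes appear in their names (8B, 70B, 405B). The function names, the model table, and the bit widths are illustrative assumptions, not part of any Llama tooling. The estimate covers weights only: KV cache, activations, and runtime overhead add to it, and real quantized files often use somewhat more than their nominal bits per weight.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal parameter counts (in billions), read off the model names on this page.
MODELS = {
    "Llama 3.1 8B Instruct": 8,
    "Llama 3.3 70B Instruct": 70,
    "Llama 3.1 405B Instruct": 405,
}

# Illustrative bit widths (assumption): unquantized 16-bit plus two common
# nominal quantization levels.
BIT_WIDTHS = [16, 8, 4]


def footprint_rows():
    """Return (model, bits, GiB) tuples for every model and bit width."""
    return [
        (name, bits, weight_memory_gib(size, bits))
        for name, size in MODELS.items()
        for bits in BIT_WIDTHS
    ]


if __name__ == "__main__":
    for name, bits, gib in footprint_rows():
        print(f"{name:<26} {bits:>2}-bit  ~{gib:7.1f} GiB (weights only)")
```

Comparing the result against available GPU or system memory gives a first-pass answer about local fit. Treat it as a lower bound, and confirm against the actual file size of the checkpoint you plan to run.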
Compare
All Llama models in the directory
| Model | Type | Best for | Local runner notes | License | Detail |
|---|---|---|---|---|---|
| Llama 3 70B | Chat | Builders who want a widely supported open-weight chat model with broad runtime compatibility. | Commonly used in local workflows through quantized builds, but 70B-class models run best on high-memory GPUs or workstation/server hardware. | Llama 3 Community | Open |
| Llama 4 Scout | Multimodal | Teams evaluating Llama-family models for multimodal assistant, long-context, and application workflows. | Evaluate local fit with the exact checkpoint and quantization available for your runtime. | Llama license / check exact model card | Open |
| Llama 4 Maverick | Multimodal | Builders comparing current Llama-family models for assistant, multimodal, and reasoning-oriented workflows. | Evaluate local fit with the exact checkpoint and quantization available for your runtime. | Llama license / check exact model card | Open |
| Llama 3.3 70B Instruct | Chat | General assistant workflows, app prototypes, and Llama-family baseline comparisons. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Llama 3.1 405B Instruct | Chat | Server-class assistant evaluation and comparisons against smaller Llama variants. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Llama 3.1 70B Instruct | Chat | Teams comparing widely supported Llama-family 70B-class models. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Llama 3.1 8B Instruct | Edge | Local prototypes, small assistants, and lower-resource evaluation. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Llama 3 8B Instruct | Edge | Local baseline comparisons and lightweight app prototypes. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Llama Guard 3 | Safety | Safety checks, guardrail experiments, and policy classification workflows. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Llama 3.2 Vision | Vision | Vision-language experiments, screenshot reasoning, and multimodal app prototypes. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |