Model family
Gemma Models
Gemma is Google's family of open-weight models, suited to efficient local, app, and multimodal workflows. The variants listed here range from 1B to 27B parameters, small enough for developers to test on their own hardware.
Best for
Chat · Local · Efficient · Multimodal
Use this family hub to compare Gemma variants for each of these workflows, then open a variant's detail page for deeper deployment notes.
Variants
Gemma models grouped by workflow
Latest / flagship
Gemma 4
Google · Gemma
Best for: Developers evaluating Google-backed open-weight models for efficient local apps, hosted prototypes, and multimodal workflows where supported.
Local: Smaller Gemma variants are practical for local testing; larger variants need more VRAM or unified memory.
Gemma 3 4B IT
Google · Gemma
Best for: Efficient local prototypes, app workflows, and Gemma-family comparisons.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Coding
Vision / multimodal
Local-friendly
Gemma 3 27B
Google · Gemma
Best for: Developers testing capable medium-sized chat models with broad tooling support.
Local: Can be tested locally with quantized builds on higher-end consumer GPUs or unified-memory systems.
Gemma 3 12B IT
Google · Gemma
Best for: Efficient local prototypes, app workflows, and Gemma-family comparisons.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Gemma 3 1B IT
Google · Gemma
Best for: Efficient local prototypes, app workflows, and Gemma-family comparisons.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Gemma 2 27B Instruct
Google · Gemma
Best for: Efficient local prototypes, app workflows, and Gemma-family comparisons.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Gemma 2 9B Instruct
Google · Gemma
Best for: Efficient local prototypes, app workflows, and Gemma-family comparisons.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
Gemma 2 2B Instruct
Google · Gemma
Best for: Efficient local prototypes, app workflows, and Gemma-family comparisons.
Local: Use the exact checkpoint and quantization that match your hardware and latency target.
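The local notes above tie model size and quantization to available memory. As a rough illustration, the sketch below estimates the memory needed for the weights alone as parameters × bits per weight ÷ 8. The sizes are the nominal figures from the variant names (actual parameter counts differ somewhat), and the `overhead` multiplier for KV cache and runtime buffers is an assumed placeholder, not a measured value. Treat the result as a first filter, not a guarantee that a given build will run.

```python
# Nominal sizes in billions of parameters, read off the variant names.
# Real parameter counts differ somewhat; check each model card.
NOMINAL_PARAMS_B = {
    "Gemma 3 1B IT": 1.0,
    "Gemma 3 4B IT": 4.0,
    "Gemma 3 12B IT": 12.0,
    "Gemma 3 27B": 27.0,
}


def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def variants_that_fit(budget_gib: float, bits_per_weight: float,
                      overhead: float = 1.2) -> list[str]:
    """Return variant names whose estimated footprint fits the budget.

    `overhead` is an assumed multiplier for KV cache, activations, and
    runtime buffers; it is a placeholder, not a measured value.
    """
    return [
        name
        for name, size in NOMINAL_PARAMS_B.items()
        if weight_memory_gib(size, bits_per_weight) * overhead <= budget_gib
    ]


if __name__ == "__main__":
    # Compare 16-bit, 8-bit, and 4-bit weights against a 16 GiB budget.
    for bits in (16, 8, 4):
        print(bits, variants_that_fit(budget_gib=16, bits_per_weight=bits))
```

Context length, batch size, and the specific runner all change the real footprint, so confirm against the exact checkpoint and quantized build you plan to use.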
Compare
All Gemma models in the directory
| Model | Type | Best for | Local runner notes | License | Detail |
|---|---|---|---|---|---|
| Gemma 4 | Chat | Developers evaluating Google-backed open-weight models for efficient local apps, hosted prototypes, and multimodal workflows where supported. | Smaller Gemma variants are practical for local testing; larger variants need more VRAM or unified memory. | Apache 2.0 / check exact model card | Open |
| Gemma 3 27B | Chat | Developers testing capable medium-sized chat models with broad tooling support. | Can be tested locally with quantized builds on higher-end consumer GPUs or unified-memory systems. | Gemma Terms | Open |
| Gemma 3 12B IT | Edge | Efficient local prototypes, app workflows, and Gemma-family comparisons. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Gemma 3 4B IT | Edge | Efficient local prototypes, app workflows, and Gemma-family comparisons. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Gemma 3 1B IT | Edge | Efficient local prototypes, app workflows, and Gemma-family comparisons. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Gemma 2 27B Instruct | Chat | Efficient local prototypes, app workflows, and Gemma-family comparisons. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Gemma 2 9B Instruct | Chat | Efficient local prototypes, app workflows, and Gemma-family comparisons. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| Gemma 2 2B Instruct | Edge | Efficient local prototypes, app workflows, and Gemma-family comparisons. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| CodeGemma 7B | Code | Coding assistant experiments and developer workflow prototypes. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |
| PaliGemma 2 | Vision | Vision-language app prototypes and multimodal evaluation. | Use the exact checkpoint and quantization that match your hardware and latency target. | Check exact model card | Open |