Hardware tier · 48GB VRAM · Reviewed June 2026

What Can 48GB VRAM Run? Workstation-Class Local AI

48GB of GPU memory is the threshold at which a 70B model at Q4 quantization fits in a single memory pool. At 24GB, a 70B model does not fit without offloading layers to the CPU. At 48GB, 30B models at Q8 (near-lossless quality) also become possible. This tier covers three paths to 48GB: an NVIDIA RTX A6000 workstation card, two RTX 3090 cards joined by an NVLink bridge, or a 64GB Apple Silicon Mac, whose unified memory leaves roughly 48GB accessible to the GPU.

Model fit at 48GB VRAM

The table applies to any configuration with approximately 48 GB of contiguous accessible memory.

| Model size | Quantization | Memory used | Fits in 48GB? | Notes |
|---|---|---|---|---|
| 1B–14B | FP16 | 2–28 GB | Yes | Full precision on all models up to 14B. Large VRAM headroom. |
| 30B–32B | FP16 | ~60–64 GB | No (Q8 fits) | FP16 on 30B exceeds 48 GB. Q8 on 30B (~34 GB) fits with headroom. |
| 30B–32B | Q8 | ~34 GB | Yes | Near-lossless quality on 30B models. Impossible at 12 or 24 GB. |
| 70B | Q4_K_M | ~38–42 GB | Yes | The headline unlock. Fits with 6–10 GB headroom. 70B at interactive quality. |
| 70B | Q8 | ~74 GB | No | Q8 on 70B requires ~74 GB, which exceeds 48 GB. Needs 80 GB+ (H100, A100 80GB) or cloud. |
| 120B+ | Q4 | 70 GB+ | No | Frontier model sizes. Requires multi-GPU nodes or cloud. |
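
The figures in the table can be roughly reproduced from parameter count and bits per weight. The following is a minimal sketch, not a definitive calculator: the bits-per-weight values (especially ~4.8 for Q4_K_M), the 4 GB default headroom, and the function names are assumptions chosen for illustration, and real model files vary by architecture.

```python
# Approximate bits stored per weight for each format.
# These are assumed averages for illustration, not values from the article.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,    # 8-bit values plus a per-block scale
    "Q4_K_M": 4.8,  # rough average; differs between models
}


def weight_memory_gb(params_billions: float, quant: str) -> float:
    """Estimate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8.0


def fits(params_billions: float, quant: str,
         pool_gb: float = 48.0, headroom_gb: float = 4.0) -> bool:
    """True if weights plus an assumed KV-cache/runtime headroom fit in the pool."""
    return weight_memory_gb(params_billions, quant) + headroom_gb <= pool_gb


for size, quant in [(14, "FP16"), (32, "FP16"), (32, "Q8_0"),
                    (70, "Q4_K_M"), (70, "Q8_0")]:
    verdict = "fits" if fits(size, quant) else "does not fit"
    print(f"{size}B {quant}: ~{weight_memory_gb(size, quant):.0f} GB, {verdict} in 48 GB")
```

Under these assumptions the sketch agrees with the table: 14B at FP16, 32B at Q8, and 70B at Q4_K_M fit, while 32B at FP16 and 70B at Q8 do not.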

What 48GB unlocks over 24GB

  • 70B models at Q4: The critical unlock. A 70B model at Q4_K_M needs ~38–42 GB, which fits in a 48 GB pool with 6–10 GB of headroom. No 24 GB GPU can do this without CPU offload. Q4 is the highest-quality 70B quantization that is practical at this tier.
  • 30B and 32B at Q8: Q8 on a 32B model needs ~34 GB, which is impossible at 24 GB and comfortable at 48 GB. Near-lossless quality on a 30B model is a meaningful step up from the 24 GB tier, where Q4 is the ceiling for models of this size.
  • 14B at FP16 with full context headroom: 14B FP16 uses ~28 GB, leaving 20 GB of headroom for KV cache at large context windows. Very long context at FP16 precision on a 14B model becomes practical.
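
To get a feel for what 20 GB of KV-cache headroom means in tokens, the cache size per token can be estimated from the model's attention layout. This is a rough sketch: the layer count, KV-head count, and head dimension below are a hypothetical 14B-class configuration with grouped-query attention, not the specification of any particular model, and the estimate ignores runtime overhead and the model's own context limit.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of KV cache added per token (FP16 cache by default)."""
    # Both keys and values are cached for every layer, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_context_tokens(headroom_gb: float, per_token_bytes: int) -> int:
    """Tokens of context that fit in the given headroom (1 GB = 1e9 bytes)."""
    return int(headroom_gb * 1e9 // per_token_bytes)


# Hypothetical 14B-class layout, assumed for illustration only.
per_token = kv_bytes_per_token(n_layers=48, n_kv_heads=8, head_dim=128)
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"Tokens in 20 GB of headroom: {max_context_tokens(20, per_token):,}")
```

With this assumed layout the cache costs about 192 KiB per token, so 20 GB covers on the order of 100,000 tokens. A model without grouped-query attention would need several times more cache per token.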

Hardware paths to 48GB

| Hardware | Bandwidth | Type | Notes |
|---|---|---|---|
| NVIDIA RTX A6000 48GB | 768 GB/s | Professional workstation GPU | Single-card 48 GB GDDR6. Supports NVLink for 96 GB dual-card setups. Professional tier pricing. |
| Dual RTX 3090 NVLink | ~936 GB/s per card; ~112 GB/s over the NVLink bridge | Consumer dual-GPU (complex setup) | Two 24 GB RTX 3090 cards joined by an NVLink bridge provide 48 GB in total, with the model split across both cards. Requires NVLink bridge, 1000 W+ PSU, and runtime-level support. See RTX 3090 guide for full setup notes. |
| Apple Silicon 64GB Unified Memory | ~400 GB/s | Mac unified memory (macOS only) | The 64 GB pool is ~48 GB accessible for AI (~75%). Metal only, no CUDA runtimes. Single-device simplicity. See full Apple Silicon guide. |

Choosing a path to 48GB

  • RTX A6000 48GB: Single card, clean setup, full CUDA ecosystem (Ollama, LM Studio, vLLM, TGI). Professional pricing. Best choice if you want the workstation path with no architectural complexity.
  • Dual RTX 3090 NVLink: Two consumer cards bridged into a 48 GB pool. Lower cost than an A6000 in many markets but significantly more complex — requires an NVLink bridge, a high-wattage PSU, explicit runtime support for NVLink, and careful thermal management. Not a beginner setup. See the RTX 3090 guide for full NVLink requirements.
  • Apple Silicon 64GB: The simplest consumer path to ~48 GB of accessible AI memory on macOS. No CUDA runtimes, but Ollama, LM Studio, and llama.cpp all support Metal acceleration. Lower tokens-per-second than the A6000 or dual 3090 setup for models under 32B, but the easiest entry to 70B at Q4 without professional GPU pricing.
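
The tokens-per-second gap between these paths follows largely from memory bandwidth. A common rule of thumb is that generating each token requires reading all the weights once, so bandwidth divided by model size gives a ceiling on decode speed. The sketch below applies that rule to the single-device bandwidth figures quoted above; the function name and the 40 GB model size (the midpoint of the 70B Q4_K_M range) are assumptions for illustration, and measured throughput will be lower than this ceiling.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if every generated token reads all weights once."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 40.0  # assumed midpoint of the ~38-42 GB range for 70B at Q4_K_M

for name, bandwidth in [("RTX A6000", 768.0), ("Apple Silicon 64GB", 400.0)]:
    ceiling = decode_ceiling_tps(bandwidth, MODEL_GB)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s on a {MODEL_GB:.0f} GB model")
```

The dual RTX 3090 path is left out because splitting a model across two cards adds inter-GPU traffic that this single-pool estimate does not capture.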

What 48GB still cannot do

  • 70B at Q8: Q8 on a 70B model requires ~74 GB. Needs 80 GB+ (NVIDIA A100 80GB, H100 80GB) or cloud inference.
  • 30B at FP16: FP16 on 30B requires ~60 GB. Exceeds 48 GB. Q8 is the practical ceiling for 30B+ at this tier.
  • Multiple large models simultaneously: 70B at Q4 uses most of the 48 GB pool. Running a second model alongside it is not practical at this tier.
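
The last point is simple arithmetic and can be checked with a short sketch. The model sizes are the approximate figures from this page; the 4 GB reserve for KV cache and runtime overhead and the function name are assumptions for illustration.

```python
def co_resident(model_sizes_gb: list[float], pool_gb: float = 48.0,
                reserve_gb: float = 4.0) -> bool:
    """True if all models plus an assumed runtime reserve fit in the pool together."""
    return sum(model_sizes_gb) + reserve_gb <= pool_gb


LLAMA_70B_Q4 = 38.0   # lower bound of the ~38-42 GB range
MODEL_32B_Q8 = 34.0
MODEL_14B_FP16 = 28.0

print(co_resident([LLAMA_70B_Q4]))                   # 70B alone
print(co_resident([LLAMA_70B_Q4, MODEL_14B_FP16]))   # 70B plus 14B FP16
print(co_resident([MODEL_32B_Q8, MODEL_14B_FP16]))   # 32B Q8 plus 14B FP16
```

Even at the low end of its size range, 70B at Q4 leaves no room for a second model of 14B or larger at the precisions discussed here.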

Cloud fallback for above-48GB workloads

  • RunPod: A100 80GB and H100 80GB instances. A single 80 GB card covers 70B at Q8 (~74 GB); 70B at FP16 needs roughly 140 GB and therefore a multi-GPU instance.
  • Lambda: ML-focused cloud GPU compute with A100 and H100 instances.
  • Vast.ai: Marketplace pricing — often lowest cost for occasional large-model inference jobs.
