Hardware tier · 48GB VRAM · Reviewed June 2026

What Can 48GB VRAM Run? Workstation-Class Local AI

48GB of GPU memory is the threshold at which 70B models fit in a single memory pool at Q4 quantization. At 24GB, 70B models do not fit without CPU offload. At 48GB, 30B models at Q8 (near-lossless quality) also become possible. This tier covers three paths to 48GB: an NVIDIA RTX A6000 workstation card, two RTX 3090 cards joined by an NVLink bridge, or a 64GB Apple Silicon Mac, whose unified memory exposes roughly 48GB to the GPU.

Model fit at 48GB VRAM

The table applies to any configuration with approximately 48 GB of contiguous accessible memory.

Model size	Best quantization	Memory used	Fits in 48GB?	Notes
1B–14B	FP16	2–28 GB	Yes	Full precision on all models up to 14B. Large VRAM headroom.
30B–32B	FP16	~60–64 GB	No (Q8 fits)	FP16 on 30B exceeds 48 GB. Q8 on 30B (~34 GB) fits with headroom.
30B–32B	Q8	~34 GB	Yes	Near-lossless quality on 30B models. Impossible at 12 or 24 GB.
70B	Q4_K_M	~38–42 GB	Yes	The headline unlock. Fits with 6–10 GB headroom. 70B at interactive quality.
70B	Q8	~74 GB	No	Q8 on 70B requires ~74 GB — exceeds 48 GB. Needs 80 GB+ (H100, A100 80GB) or cloud.
120B+	Q4	70 GB+	No	Frontier model sizes. Requires multi-GPU nodes or cloud.
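
The memory figures in the table follow from parameter count times bits per weight. The sketch below reproduces that arithmetic. The function names and the bits-per-weight values for Q8 and Q4_K_M are assumptions chosen to approximate llama.cpp-style files; they are not exact for every model, and the result covers weights only, not KV cache or runtime overhead.

```python
# Approximate bits per weight for each format. FP16 is exact; the
# quantized values are assumptions, not exact figures for every model.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.5,      # assumed: 8-bit weights plus per-block scales
    "Q4_K_M": 4.8,  # assumed: average over mixed-precision blocks
}


def weight_memory_gb(params_billion: float, quant: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8.0


def fits(params_billion: float, quant: str, pool_gb: float = 48.0) -> bool:
    """True if the weights alone fit in the pool (ignores KV cache)."""
    return weight_memory_gb(params_billion, quant) <= pool_gb


if __name__ == "__main__":
    for size, quant in [(14, "FP16"), (32, "Q8"), (70, "Q4_K_M"), (70, "Q8")]:
        gb = weight_memory_gb(size, quant)
        print(f"{size}B {quant}: ~{gb:.0f} GB, fits in 48 GB: {fits(size, quant)}")
```

Under these assumptions, 14B at FP16 comes to 28 GB, 32B at Q8 to 34 GB, 70B at Q4_K_M to about 42 GB, and 70B at Q8 to about 74 GB, in line with the table.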

What 48GB unlocks over 24GB

70B models at Q4: The critical unlock. A 70B model at Q4_K_M needs ~38–42 GB, which fits in a 48 GB pool with 6–10 GB of headroom. No 24 GB GPU can do this without CPU offload. Q4 is the highest-quality 70B quantization that practically fits at this tier.
30B and 32B at Q8: Q8 on a 32B model needs ~34 GB, which is impossible at 24 GB and comfortable at 48 GB. Near-lossless quality on a 30B-class model is a meaningful step up from 24 GB, where Q4 is the ceiling.
14B at FP16 with full context headroom: 14B FP16 uses ~28 GB, leaving 20 GB of headroom for KV cache at large context windows. Very long context at FP16 precision on a 14B model becomes practical.
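
To get a feel for what 20 GB of KV cache headroom means in tokens, the sketch below uses the usual per-token estimate: one key and one value vector per layer. The layer count, KV head count, and head dimension are hypothetical values for a 14B-class model with grouped-query attention, picked for illustration only. Substitute the real values from the model's config, and treat the result as a rough ceiling, since runtimes also need memory for activations and buffers.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_context_tokens(headroom_gb: float, per_token_bytes: int) -> int:
    """Largest token count whose KV cache fits in the headroom (1 GB = 1e9 bytes)."""
    return int(headroom_gb * 1e9 // per_token_bytes)


# Hypothetical 14B-class shape, for illustration only.
per_token = kv_bytes_per_token(layers=48, kv_heads=8, head_dim=128)
print(per_token)                          # bytes of FP16 KV cache per token
print(max_context_tokens(20, per_token))  # tokens that fit in 20 GB
```

With these assumed dimensions, each token costs about 0.2 MB of cache, so 20 GB holds on the order of 100,000 tokens. A model without grouped-query attention would need several times more memory per token.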

Hardware paths to 48GB

Hardware	Bandwidth	Type	Notes
NVIDIA RTX A6000 48GB	768 GB/s	Professional workstation GPU	Single-card 48 GB GDDR6. Supports NVLink for 96 GB dual-card setups. Professional tier pricing.
Dual RTX 3090 NVLink	936 GB/s per card; ~112 GB/s over the NVLink bridge	Consumer dual-GPU (complex setup)	Two 24 GB RTX 3090 cards joined by an NVLink bridge provide 48 GB in total when the runtime splits the model across both. Requires the NVLink bridge, a 1000 W+ PSU, and runtime-level multi-GPU support. See RTX 3090 guide for full setup notes.
Apple Silicon 64GB Unified Memory	~400 GB/s	Mac unified memory (macOS only)	About 75% of the 64 GB pool (~48 GB) is accessible for AI. Metal only; no CUDA runtimes. Single-device simplicity. See full Apple Silicon guide.
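
The bandwidth column matters because single-stream token generation is usually limited by how fast the weights can be read from memory. A common rule of thumb, sketched below, divides memory bandwidth by model size to get a rough upper bound on tokens per second. This is an assumption-laden ceiling, not a benchmark: real throughput is lower and depends on the runtime, context length, and how the model is split across devices. The function name is illustrative.

```python
def decode_tokens_per_second_ceiling(bandwidth_gb_s: float,
                                     model_gb: float) -> float:
    """Upper bound on single-stream decode speed, assuming every weight
    is read from memory once per generated token."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 40.0  # 70B at Q4_K_M, midpoint of the ~38-42 GB range

for name, bandwidth in [("RTX A6000", 768.0), ("Apple Silicon 64GB", 400.0)]:
    ceiling = decode_tokens_per_second_ceiling(bandwidth, MODEL_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

By this estimate a 40 GB model tops out near 19 tokens/s on the A6000 and near 10 tokens/s on a 400 GB/s Mac, before any real-world overhead.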

Choosing a path to 48GB

RTX A6000 48GB: Single card, clean setup, full CUDA ecosystem (Ollama, LM Studio, vLLM, TGI). Professional pricing. Best choice if you want the workstation path with no architectural complexity.
Dual RTX 3090 NVLink: Two consumer cards bridged into a 48 GB pool. Lower cost than an A6000 in many markets but significantly more complex — requires an NVLink bridge, a high-wattage PSU, explicit runtime support for NVLink, and careful thermal management. Not a beginner setup. See the RTX 3090 guide for full NVLink requirements.
Apple Silicon 64GB: The simplest consumer path to ~48 GB of accessible AI memory on macOS. No CUDA runtimes, but Ollama, LM Studio, and llama.cpp all support Metal acceleration. Lower tokens-per-second than the A6000 or dual 3090 setup for models under 32B, but the easiest entry to 70B at Q4 without professional GPU pricing.

What 48GB still cannot do

70B at Q8: Q8 on a 70B model requires ~74 GB. Needs 80 GB+ (NVIDIA A100 80GB, H100 80GB) or cloud inference.
30B at FP16: FP16 on 30B requires ~60 GB. Exceeds 48 GB. Q8 is the practical ceiling for 30B+ at this tier.
Multiple large models simultaneously: 70B at Q4 uses most of the 48 GB pool. Running a second model alongside it is not practical at this tier.
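
The three limits above can be checked with a small budget calculation. The sketch below is illustrative only: the function name, the 4 GB reserve for KV cache and runtime overhead, and the 8 GB size of the second model are assumed placeholder values, while the other sizes come from the figures in this guide.

```python
def pool_can_hold(model_sizes_gb: list[float], pool_gb: float = 48.0,
                  reserve_gb: float = 4.0) -> bool:
    """True if all models fit together with reserve_gb left over for
    KV cache and runtime overhead. reserve_gb is an assumed placeholder."""
    return sum(model_sizes_gb) + reserve_gb <= pool_gb


print(pool_can_hold([74.0]))       # 70B at Q8
print(pool_can_hold([60.0]))       # 30B at FP16
print(pool_can_hold([40.0]))       # 70B at Q4 on its own
print(pool_can_hold([40.0, 8.0]))  # 70B at Q4 plus a hypothetical 8 GB model
```

Only the third case passes. Adjust reserve_gb to match the context length you plan to use.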

Cloud fallback for above-48GB workloads

RunPod: A100 80GB and H100 80GB instances. A single 80 GB card covers 70B at Q8 (~74 GB); 70B at FP16 (~140 GB) needs a multi-GPU instance.
Lambda: ML-focused cloud GPU compute with A100 and H100 instances.
Vast.ai: Marketplace pricing — often lowest cost for occasional large-model inference jobs.
