Models
1 selected
Llama
Qwen
Mistral
DeepSeek
Gemma
Phi
MiniMax
Yi
GLM
Command-R
Falcon
InternLM
SmolLM
Granite
StableLM
OLMo
Quantization & engine
vLLM (paged attention)
Workload
Constraints
Requirement
for the selected models + workload
Total VRAM
0.96 GB
System RAM
1.05 GB
Storage
20.9 GB
Est. PSU
700 W
balanced tier
Weights: 0.70 GB · KV cache: 0.14 GB · Overhead: 0.13 GB
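The breakdown above can be checked with a few lines of arithmetic. This is only a sketch: the names `components_gb`, `total_vram_gb`, and `fits` are hypothetical, and the rule "required VRAM must not exceed free VRAM" is an assumption about how the "Fits VRAM" badge is decided, not something the page states. Note that the rounded components add up to 0.97 GB, while the page reports 0.96 GB; presumably the page sums unrounded values before rounding.

```python
# Component sizes in GB, copied from the breakdown line (already rounded by the page).
components_gb = {"weights": 0.70, "kv_cache": 0.14, "overhead": 0.13}


def total_vram_gb(parts):
    """Sum the per-component VRAM estimates."""
    return sum(parts.values())


def fits(required_gb, free_gb):
    """Assumed fit rule: the requirement must not exceed the free VRAM."""
    return required_gb <= free_gb


required = total_vram_gb(components_gb)

# Free VRAM figures as listed on the three recommendation cards.
free_vram_gb = {
    "NVIDIA GeForce RTX 3060 12GB": 11.0,
    "NVIDIA GeForce RTX 5070 Ti": 15.0,
    "NVIDIA H200 141GB (SXM/NVL)": 140.0,
}

fit_report = {name: fits(required, free) for name, free in free_vram_gb.items()}

for name, ok in fit_report.items():
    print(f"{name}: needs {required:.2f} GB, fits={ok}")
```

With a requirement this small, every listed card passes the assumed check, which is consistent with all three cards showing the "Fits VRAM" badge.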
Budget
NVIDIA GeForce RTX 3060 12GB
$280
- VRAM: 12.0 GB (11.0 GB free)
- Throughput: 1.2k tok/s
- Est. PSU: 450 W
Fits VRAM · Meets concurrency · In budget
Balanced
NVIDIA GeForce RTX 5070 Ti
$979
- VRAM: 16.0 GB (15.0 GB free)
- Throughput: 3.1k tok/s
- Est. PSU: 700 W
Fits VRAM · Meets concurrency · In budget
Max performance
NVIDIA H200 141GB (SXM/NVL)
$31,000
- VRAM: 141 GB (140 GB free)
- Throughput: 16.4k tok/s
- Est. PSU: 1350 W
Fits VRAM · Meets concurrency · In budget
Matching hardware
41 options fit the requirement

| Hardware | Category | Tier | VRAM | Units | Throughput | PSU | Price |
|---|---|---|---|---|---|---|---|
| NVIDIA GeForce RTX 3060 12GB | consumer GPU | budget | 12.0 GB | 1 | 1.2k tok/s | 450 W | $280 |
| Apple M4 (10-core GPU) 16GB | Apple silicon | budget | 16.0 GB | 1 | 411 tok/s | 22 W | $599 |
| NVIDIA GeForce RTX 3090 | consumer GPU | budget | 24.0 GB | 1 | 3.2k tok/s | 750 W | $750 |
| Apple M4 (10-core GPU) 24GB | Apple silicon | budget | 24.0 GB | 1 | 411 tok/s | 22 W | $799 |
| NVIDIA GeForce RTX 4070 Ti SUPER | consumer GPU | balanced | 16.0 GB | 1 | 2.3k tok/s | 650 W | $880 |
| NVIDIA GeForce RTX 3090 Ti | consumer GPU | budget | 24.0 GB | 1 | 3.5k tok/s | 950 W | $900 |
| NVIDIA GeForce RTX 5070 Ti | consumer GPU | balanced | 16.0 GB | 1 | 3.1k tok/s | 700 W | $979 |
| Apple M4 (10-core GPU) 32GB | Apple silicon | budget | 32.0 GB | 1 | 411 tok/s | 22 W | $999 |
| NVIDIA GeForce RTX 4080 SUPER | consumer GPU | balanced | 16.0 GB | 1 | 2.5k tok/s | 700 W | $1,100 |
| NVIDIA GeForce RTX 5080 | consumer GPU | balanced | 16.0 GB | 1 | 3.3k tok/s | 800 W | $1,299 |
| Apple M4 Pro (20-core GPU) 24GB | Apple silicon | budget | 24.0 GB | 1 | 935 tok/s | 38 W | $1,399 |
| Apple M4 Pro (20-core GPU) 48GB | Apple silicon | balanced | 48.0 GB | 1 | 935 tok/s | 38 W | $1,799 |
| Apple M3 Pro (18-core GPU) 18GB | Apple silicon | budget | 18.0 GB | 1 | 513 tok/s | 35 W | $1,999 |
| Apple M4 Pro (20-core GPU) 64GB | Apple silicon | balanced | 64.0 GB | 1 | 935 tok/s | 38 W | $1,999 |
| NVIDIA GeForce RTX 4090 | consumer GPU | max | 24.0 GB | 1 | 3.5k tok/s | 950 W | $2,100 |
| Apple M3 Pro (18-core GPU) 36GB | Apple silicon | budget | 36.0 GB | 1 | 513 tok/s | 35 W | $2,399 |
| NVIDIA GeForce RTX 5090 | consumer GPU | max | 32.0 GB | 1 | 6.1k tok/s | 1150 W | $2,999 |
| Apple M3 Max (30-core GPU) 36GB | Apple silicon | balanced | 36.0 GB | 1 | 1.0k tok/s | 45 W | $2,999 |
| Apple M4 Max (32-core GPU) 36GB | Apple silicon | balanced | 36.0 GB | 1 | 1.4k tok/s | 50 W | $3,199 |
| Apple M3 Max (40-core GPU) 48GB | Apple silicon | balanced | 48.0 GB | 1 | 1.4k tok/s | 56 W | $3,499 |
| 2x RTX 4090 (48GB total) | multi-GPU | budget | 48.0 GB | 1 | 6.4k tok/s | 1700 W | $3,600 |
| Apple M4 Max (40-core GPU) 48GB | Apple silicon | balanced | 48.0 GB | 1 | 1.9k tok/s | 56 W | $3,699 |
| Apple M3 Max (40-core GPU) 64GB | Apple silicon | balanced | 64.0 GB | 1 | 1.4k tok/s | 56 W | $3,699 |
| Apple M4 Max (40-core GPU) 64GB | Apple silicon | balanced | 64.0 GB | 1 | 1.9k tok/s | 56 W | $3,999 |
| Apple M3 Ultra (80-core GPU) 96GB | Apple silicon | max | 96.0 GB | 1 | 2.8k tok/s | 270 W | $3,999 |
| Apple M3 Max (40-core GPU) 96GB | Apple silicon | max | 96.0 GB | 1 | 1.4k tok/s | 56 W | $4,099 |
| NVIDIA RTX A6000 (48GB) | workstation GPU | balanced | 48.0 GB | 1 | 2.6k tok/s | 700 W | $4,200 |
| 2x RTX 5090 (64GB total) | multi-GPU | balanced | 64.0 GB | 1 | 11.3k tok/s | 2150 W | $4,400 |
| Apple M3 Max (40-core GPU) 128GB | Apple silicon | max | 128 GB | 1 | 1.4k tok/s | 56 W | $4,499 |
| Apple M4 Max (40-core GPU) 128GB | Apple silicon | max | 128 GB | 1 | 1.9k tok/s | 56 W | $4,699 |
| Apple M3 Ultra (80-core GPU) 256GB | Apple silicon | max | 256 GB | 1 | 2.8k tok/s | 270 W | $5,999 |
| NVIDIA RTX 6000 Ada Generation (48GB) | workstation GPU | balanced | 48.0 GB | 1 | 3.3k tok/s | 700 W | $6,800 |
| 4x RTX 4090 (96GB total) | multi-GPU | balanced | 96.0 GB | 1 | 12.2k tok/s | 3250 W | $7,200 |
| NVIDIA A100 40GB (PCIe/SXM) | datacenter GPU | max | 40.0 GB | 1 | 5.3k tok/s | 850 W | $7,500 |
| NVIDIA L40S (48GB) | datacenter GPU | balanced | 48.0 GB | 1 | 3.0k tok/s | 750 W | $8,000 |
| NVIDIA RTX PRO 6000 Blackwell Workstation (96GB) | workstation GPU | max | 96.0 GB | 1 | 6.1k tok/s | 1200 W | $8,500 |
| Apple M3 Ultra (80-core GPU) 512GB | Apple silicon | max | 512 GB | 1 | 2.8k tok/s | 270 W | $9,499 |
| NVIDIA A100 80GB (PCIe/SXM) | datacenter GPU | max | 80.0 GB | 1 | 7.0k tok/s | 850 W | $12,000 |
| 2x RTX 6000 Ada (96GB total) | multi-GPU | max | 96.0 GB | 1 | 6.1k tok/s | 1200 W | $13,600 |
| NVIDIA H100 80GB (SXM/PCIe) | datacenter GPU | max | 80.0 GB | 1 | 11.5k tok/s | 1350 W | $25,000 |
| NVIDIA H200 141GB (SXM/NVL) | datacenter GPU | max | 141 GB | 1 | 16.4k tok/s | 1350 W | $31,000 |
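
One way to compare rows in the list above is throughput per dollar. The sketch below is illustrative only: the tok/s-per-dollar metric is not something the page computes, the helpers `parse_throughput` and `parse_price` are hypothetical, and the throughput figures are the rounded values shown in the table, so the ratios are approximate. Only four rows are copied here to keep the example short.

```python
def parse_throughput(text):
    """Convert a cell such as '1.2k tok/s' or '411 tok/s' to tokens per second."""
    value = text.split()[0]
    if value.endswith("k"):
        return float(value[:-1]) * 1000
    return float(value)


def parse_price(text):
    """Convert a cell such as '$31,000' to a number of dollars."""
    return float(text.replace("$", "").replace(",", ""))


# (hardware, throughput, price) copied from four rows of the table.
rows = [
    ("NVIDIA GeForce RTX 3060 12GB", "1.2k tok/s", "$280"),
    ("Apple M4 (10-core GPU) 16GB", "411 tok/s", "$599"),
    ("NVIDIA GeForce RTX 5070 Ti", "3.1k tok/s", "$979"),
    ("NVIDIA H200 141GB (SXM/NVL)", "16.4k tok/s", "$31,000"),
]

# Rank by tokens per second per dollar, best value first.
ranked = sorted(
    ((name, parse_throughput(tput) / parse_price(price)) for name, tput, price in rows),
    key=lambda item: item[1],
    reverse=True,
)

for name, ratio in ranked:
    print(f"{name}: {ratio:.2f} tok/s per dollar")
```

A ranking like this ignores VRAM headroom and power draw, so it complements the tier labels rather than replacing them.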