LLM Hardware Decision Dashboard

Last updated: 2026-03-29 13:03 | Tests: 110 | Cost: $0.0000


Quality by RAM Tier (% of ideal)

| Tier | Quality |
|---|---|
| 8GB | 66% |
| 16-32GB | 90% |
| 64GB | 83% |
| 128GB | 89% |
| Cloud (ref) | 91% |

Score Heatmap (Model × Task)

| Model | Tier | Infrastructure | Bug Fixing | Security | ERP Integration | Database Admin… | Documentation | Shell / DevOps | Content (French…) | Analysis / Reas… | Claude Code Spe… | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen 2.5 7B | 8GB | 91% | 76% | 90% | 100% | 91% | 92% | 100% | 100% | 57% | 70% | 87% |
| Qwen 3.5 9B | 8GB | 100% | 100% | 60% | 90% | 0% | 0% | 100% | 0% | 0% | 0% | 45% |
| Phi-4 14B | 16-32GB | 100% | 76% | 90% | 100% | 91% | 85% | 100% | 100% | 48% | 80% | 87% |
| Qwen3 Coder Next (3B active/80B) | 16-32GB | 91% | 100% | 100% | 90% | 100% | 100% | 100% | 100% | 65% | 90% | 94% |
| Qwen 2.5 72B | 64GB | 100% | 76% | 100% | 100% | 100% | 92% | 100% | 100% | 65% | 70% | 90% |
| DeepSeek V3.1 | 64GB | 91% | 88% | 90% | 90% | 100% | 85% | 100% | 100% | 48% | 80% | 87% |
| Qwen 3.5 27B | 64GB | 100% | 100% | 90% | 90% | 100% | 0% | 100% | 0% | 57% | 90% | 73% |
| DeepSeek R1 671B | 128GB | 100% | 88% | 90% | 70% | 100% | 92% | 100% | 100% | 40% | 70% | 85% |
| Qwen3 Max | 128GB | 91% | 100% | 90% | 90% | 100% | 100% | 100% | 100% | 57% | 80% | 91% |
| DeepSeek V3.2 | 128GB | 91% | 100% | 90% | 90% | 100% | 92% | 100% | 100% | 65% | 70% | 90% |
| Claude Sonnet 4.6 | Cloud (ref) | 100% | 100% | 80% | 80% | 100% | 92% | 100% | 100% | 65% | 90% | 91% |

Model Ranking

| # | Model | Tier | Avg Score | Avg Time |
|---|---|---|---|---|
| 1 | Qwen3 Coder Next (3B active/80B) | 16-32GB | 94% | 23.6s |
| 2 | Qwen3 Max | 128GB | 91% | 19.7s |
| 3 | Claude Sonnet 4.6 | Cloud (ref) | 91% | 20.4s |
| 4 | Qwen 2.5 72B | 64GB | 90% | 18.0s |
| 5 | DeepSeek V3.2 | 128GB | 90% | 44.4s |
| 6 | Qwen 2.5 7B | 8GB | 87% | 13.8s |
| 7 | Phi-4 14B | 16-32GB | 87% | 9.1s |
| 8 | DeepSeek V3.1 | 64GB | 87% | 27.1s |
| 9 | DeepSeek R1 671B | 128GB | 85% | 54.2s |
| 10 | Qwen 3.5 27B | 64GB | 73% | 18.6s |
| 11 | Qwen 3.5 9B | 8GB | 45% | 34.1s |

Hardware Options Compared

| Option | RAM | Price (CAD) | Best Models | Avg Score | vs Claude | Verdict |
|---|---|---|---|---|---|---|
| Proxmox Ollama (current) | 16GB CPU | $0 | Qwen 2.5 7B | 66% | 73% | Keep — 24/7 lightweight tasks |
| MacBook Pro 14" M5 Pro | 64GB | ~$4,300 | Qwen3 Coder 30B, Phi-4 14B | 90% | 99% | Sweet spot — 90% quality, portable, private |
| MacBook Pro M5 Max | 128GB | $8,350 | DeepSeek R1, Qwen3 Max | 89% | 98% | Overkill alone — pair with rack instead |
| Framework Desktop Max+ 395 | 128GB | $3,759 | Same 128GB tier models | 89% | 98% | Home rack — heavy inference, 405B capable |
| Claude Max (cloud) | N/A | $140/mo | Claude Sonnet 4.6 / Opus | 91% | 100% | Irreplaceable — agent mode, MCP, 1M context |

The Optimal Setup

Don't overspend on the laptop. Spend smart on laptop + rack.

PORTABLE — MacBook Pro 14" (M5 Pro, 64GB, 4TB) — ~$4,300
Private work: emails, translation, research, tab management, RAG — all local, no cloud. 32B models score 90%.

HOME RACK — Framework Desktop (Max+ 395, 128GB) — ~$5,744
Heavy inference: 70B dedicated, 405B distributed (add 2nd node later). Access from anywhere via Tailscale.

CLOUD (KEEP) — Claude Max (Sonnet / Opus, 1M context) — $140/mo
Irreplaceable: Claude Code, multi-file editing, agent mode, MCP, dev-agent. No local model can do this.

| Component | Cost | What It Does |
|---|---|---|
| MacBook Pro 14" M5 Pro 64GB 4TB | $4,300 | Portable dev + private AI (emails, tabs, research, translation) |
| Framework Desktop 128GB + PSU/NVMe/Fan | $3,759 | Home rack: 70B dedicated, 405B with 2nd node |
| MikroTik CRS812-DDQ switch | $1,815 | 400G switch — future-proof, shared across nodes |
| DeskPi T1 8U rack | $170 | Compact 10" rack — fits desk or closet |
| Claude Max (annual) | $1,680/yr | Agent mode, Claude Code, MCP — not replaceable |
| TOTAL (Year 1) | $11,724 | 192GB total (64 portable + 128 rack) + Claude |

Why NOT $8,350 on 128GB laptop alone:

  • 64GB laptop scores 90% — actually a point above the 128GB tier's 89% in our benchmark
  • $4,050 saved buys a 128GB rack node with money to spare
  • Rack runs 24/7 — agents, heartbeat, heavy inference even when laptop sleeps
  • Most private tasks (email, tabs, translation) need 7B-14B — runs on 24GB
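The cost argument can be checked directly with the CAD prices quoted in this dashboard:

```python
# Quick check of the "don't overspend on the laptop" math, using the
# prices (CAD) quoted elsewhere in this dashboard.
laptop_128 = 8350   # MacBook Pro M5 Max 128GB
laptop_64 = 4300    # MacBook Pro 14" M5 Pro 64GB
rack_node = 3759    # Framework Desktop Max+ 395, 128GB
switch = 1815       # MikroTik CRS812-DDQ
rack_case = 170     # DeskPi T1 8U

saved = laptop_128 - laptop_64
combo = laptop_64 + rack_node + switch + rack_case

print(f"Saved by choosing the 64GB laptop: ${saved}")           # $4,050
print(f"64GB laptop + 128GB rack, total:   ${combo}")           # $10,044
print(f"Extra over the 128GB laptop alone: ${combo - laptop_128}")  # $1,694
```

For $1,694 more than the 128GB laptop alone, the combo adds a 128GB always-on node.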

What the agent handles locally on 64GB:

  • Read 40 Safari tabs → summarize → save to Obsidian → close them
  • Scan emails → classify → draft replies (7B, instant)
  • Translate documents FR/EN (7B-14B, fast)
  • Research topics → generate notes (32B, high quality)
  • Code assistance while offline (Qwen3 Coder, 94% score)

$8,350 Two Ways

| Metric | 128GB MacBook alone | 64GB MacBook + Framework rack |
|---|---|---|
| Total cost | $8,350 | $10,044 (+$1,694) |
| Total RAM | 128GB | 192GB (64+128) |
| 405B capable | No | Yes (rack) |
| 24/7 inference server | No (laptop sleeps) | Yes (rack always on) |
| Multi-user serving | No | Yes (Open WebUI on rack) |
| Portable quality | 89% (72B) | 90% (32B MoE) |
| Battery life | ~15h | ~18h (Pro chip) |
| Expandable | No | Yes (add nodes) |

Claude Code Features (Not Replaceable Locally)

| Feature | Claude Code | Local Model |
|---|---|---|
| Multi-file editing | Reads entire project | Single-file context |
| Tool use (Bash, Read, Write) | Native | Not available |
| MCP integration | Native | Requires custom wrapper |
| Memory (CLAUDE.md, soul.md) | Auto-loaded | Manual injection |
| Agent mode (dev-agent) | Claude Agent SDK | Not available |
| Context window | 1M tokens | 32K-128K typically |
| Specs conformity check | Reads soul.md + specs.md | Must be prompted |

Mobile + Desktop Combo Setups

Laptop = terminal + private local LLM for sensitive data. Desktop/Rack = heavy inference server at home. Connected via Tailscale from anywhere.

| Combo | Laptop | Desktop/Rack | Total (CAD) | Local Private | Heavy Work | Rating |
|---|---|---|---|---|---|---|
| 1. Budget Terminal + Power Rack | MacBook Pro M5 Pro 64GB ($4,500) | Framework 2-node 256GB ($7,518 + switch $1,815) | $14,003 | 32B models (90%) | 405B distributed | Best value |
| 2. Strong Laptop + Mid Rack | MacBook Pro M5 Max 128GB ($8,350) | Framework 1-node 128GB ($3,759 + switch $1,815) | $14,094 | 72B models (89%) | Combined 256GB | Most balanced |
| 3. Strong Laptop + Power Rack | MacBook Pro M5 Max 128GB ($8,350) | Mac Studio M3 Ultra 192GB ($6,299 + TB5 $180) | $14,999 | 72B models | 405B single node (fastest) | Best performance |
| 4. Budget Terminal + Max Rack | MacBook Pro M4 Pro 48GB ($3,500) | Framework 4-node 512GB ($15,036 + switch $1,815) | $20,576 | 32B models | 1T models, multi-user | Client demo machine |
| 5. Cheapest That Works | MacBook Pro M5 Pro 64GB ($4,500) | Framework 1-node 128GB ($3,759 + 10G switch $210) | $8,554 | 32B models | 70B single node | Minimum viable |

Key Insight: Laptop vs Terminal

  • If you work with private client data — laptop needs at least 64GB (32B models run locally, no cloud)
  • If you travel/work from office — 128GB laptop means no dependency on home rack being reachable
  • If you're always home/office with good internet — 48GB laptop + powerful rack saves $4,000+
  • The 64GB "sweet spot": Our benchmark shows 16-32GB tier scored 90% — Qwen3 Coder Next runs on 16GB and beat 72B models. A 64GB laptop with smart model selection matches 128GB for most tasks.
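The laptop-sizing logic above can be sketched as a small helper; the helper itself is illustrative (not part of any shipped tool), and the RAM thresholds and model names come from the benchmark tiers in this document:

```python
# Illustrative mapping from laptop RAM to the model tier this benchmark
# found workable at that size. Thresholds are assumptions based on the
# tier boundaries used throughout this dashboard.
def local_tier(ram_gb: int) -> str:
    if ram_gb >= 128:
        return "70B dedicated (DeepSeek R1, Qwen3 Max)"
    if ram_gb >= 64:
        return "32B / MoE (Qwen3 Coder Next, Qwen 2.5 32B)"
    if ram_gb >= 16:
        return "14B (Phi-4) or Qwen3 Coder Next"
    return "7B (Qwen 2.5 7B)"

print(local_tier(64))   # the benchmark's "sweet spot" tier
```

The takeaway matches the bullet above: past 64GB, more laptop RAM mostly buys bigger models, not better scores.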

Cluster Node Pricing (Scale-Up Path)

Start with 1 node, add more as needed. Switch is a one-time cost shared across all nodes.

Framework Desktop Max+ 395 — 128GB/node, Linux

| Config | Total RAM | Models | Hardware (CAD) | + Switch + Rack |
|---|---|---|---|---|
| 1 node | 128GB | 70B Q4 | $3,759 | $5,744 |
| 2 nodes | 256GB | 405B Q4 distributed | $7,518 | $9,503 |
| 3 nodes | 384GB | 405B Q6 | $11,277 | $13,317 |
| 4 nodes | 512GB | 1T Q4, multi-user | $15,036 | $17,076 |

Mac Studio M3 Ultra — 192GB/node, macOS, TB5 RDMA

| Config | Total RAM | Models | Hardware (CAD) | + Switch + Rack |
|---|---|---|---|---|
| 1 node | 192GB | 405B Q4 (single node!) | $6,479 | $8,464 |
| 2 nodes | 384GB | 405B Q8 | $12,958 | $14,943 |
| 3 nodes | 576GB | 1T Q4 | $19,437 | $21,477 |
| 4 nodes | 768GB | 1T Q8, multi-user | $25,916 | $27,956 |

Switch: MikroTik CRS812-DDQ — $1,295 USD (~$1,815 CAD) — 2x400G + 2x200G + 8x50G + 2x10G. Rack: DeskPi T1 8U ($170) or T2 12U ($225).
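The pricing columns above follow a simple rule: per-node hardware plus the one-time switch, with the rack stepping up from the DeskPi T1 8U ($170) to the T2 12U ($225) at three or more nodes. A quick sanity check:

```python
# Reproduces the "+ Switch + Rack" column of the cluster pricing tables.
# Rack choice (T1 vs T2 at 3+ nodes) is inferred from the table totals.
SWITCH = 1815  # MikroTik CRS812-DDQ, one-time, shared across nodes

def cluster_cost(nodes: int, per_node: int) -> tuple[int, int]:
    hardware = nodes * per_node
    rack = 170 if nodes <= 2 else 225   # DeskPi T1 8U vs T2 12U
    return hardware, hardware + SWITCH + rack

for n in range(1, 5):
    hw, total = cluster_cost(n, 3759)   # Framework Desktop Max+ 395
    print(f"{n} node(s): hardware ${hw:,}, with switch + rack ${total:,}")
```

Swapping in the Mac Studio's $6,479/node reproduces the second table the same way.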

Platform Ranking

| Rank | Platform | Strengths | Weaknesses | Best For |
|---|---|---|---|---|
| 1 | Mac Studio (M3/M5 Ultra) | TB5 RDMA (120Gbps, 50µs); 800GB/s memory BW; 256GB unified memory; ultra quiet, low power; scales linearly with nodes | Expensive ($12K+ for 256GB); macOS only; Apple controls ecosystem; not repairable | Best-performance clusters; production multi-user LLM; 405B single node |
| 2 | Framework Desktop (Max+ 395) | Ultra quiet (no GPU fans); energy efficient (~120W); no thermal throttling; 128GB unified, repairable; Linux (full control); expandable (add nodes) | 25Gbps Ethernet max (no RDMA); multi-node degrades with TCP; 256GB/s BW (vs 800GB/s on Mac) | Best-value clusters; Linux-first workloads; client demos (open source story) |
| 3 | NVIDIA GPU (RTX 5090/A100) | Fastest raw inference; CUDA ecosystem; NVLink for multi-GPU | Loud, hot, power hungry; 24GB VRAM per card (consumer); 80GB per card (datacenter $$$); thermal throttling common | Training workloads; datacenter deployments; when speed > everything |
| 4 | Beelink/Mini PC (Ryzen Max+) | Same chip as Framework; pre-built (no assembly); dual 10GbE on some models | Not rack-mountable; harder to cool in a cluster; less repairable; fan noise varies | Single-node use; quick deployment; budget builds |

RDMA: The Game Changer

Without RDMA, adding cluster nodes makes LLM inference slower (Jeff Geerling: 4 Mac Studios over TCP = 15.2 t/s, worse than 1 node at 20.4 t/s). With TB5 RDMA: 31.9 t/s on 4 nodes — 2.1x faster. Framework clusters hit this TCP wall. Mac clusters scale linearly.

Sources: Jeff Geerling RDMA benchmark | Apple TN3205

Team Sizing Guide

How to size LLM infrastructure for teams sharing memory, models, and interfaces via Open WebUI.

| Team Size | Concurrent Requests | Min RAM | Recommended Setup | Model Tier | Est. Cost (CAD) |
|---|---|---|---|---|---|
| Solo (1) | 1 | 128GB | 1x Framework node or MacBook M5 Max | 70B dedicated | $3,759-$8,350 |
| Small (2-5) | 1-2 | 128GB | 1x Framework/Mac Studio + Open WebUI | 70B shared (queued requests) | $5,744-$8,464 |
| Team (5-15) | 3-5 | 256GB | 2x nodes + Open WebUI + model routing | 70B × 2 (load balanced) or 405B shared | $9,503-$14,943 |
| Department (15-30) | 5-10 | 384-512GB | 3-4 nodes + load balancer + RBAC | Multiple 70B concurrent + 405B for complex tasks | $13,317-$21,477 |
| Organization (30-50) | 10-20 | 512GB-1TB | 4-8 nodes + queue system + usage tracking | Tiered: 7B fast / 70B standard / 405B premium | $17,076-$55,000 |

Shared Infrastructure Stack

| Component | Purpose | Cost |
|---|---|---|
| Open WebUI | Multi-user chat interface, RBAC, per-user API keys, usage tracking | Free (Docker) |
| Ollama | Model serving, concurrent requests, model loading/unloading | Free |
| pgvector (shared memory) | Team knowledge base, RAG, semantic search across projects | Free (PostgreSQL) |
| Cloudflare Access | Zero-trust auth for remote access (email OTP) | Free (50 users) |
| Model tiering | Route: quick → 7B, standard → 70B, premium → 405B | Config only |
| Budget controls | Per-user daily token limits, model access restrictions | Open WebUI built-in |
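The model-tiering row above ("quick → 7B, standard → 70B, premium → 405B") is ordinarily plain configuration in Open WebUI or a proxy, not code; the sketch below just makes the routing rule concrete. The model tags are illustrative Ollama-style names, not a fixed deployment:

```python
# Sketch of the tiered-routing rule from the table above. In practice this
# lives in Open WebUI / proxy configuration; tags here are illustrative.
TIERS = {
    "quick":    "qwen2.5:7b",     # chat, email drafts, translation
    "standard": "llama3.1:70b",   # coding, analysis
    "premium":  "llama3.1:405b",  # by request, runs on the cluster
}

def route(tier: str) -> str:
    # Unknown labels fall back to the fast tier rather than failing.
    return TIERS.get(tier, TIERS["quick"])

print(route("standard"))
```

Falling back to the fast tier keeps the interface responsive even when a request is mislabeled.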

Subscription Replacement Calculator

Replace per-seat SaaS AI subscriptions with self-hosted infrastructure.

| Task Type | SaaS Cost/user/mo | Model Needed | RAM per User | Self-hosted Equivalent |
|---|---|---|---|---|
| Translation (FR/EN) | $20-30 (DeepL Pro) | 7B (Qwen 2.5 7B) | ~2GB | Ollama + Open WebUI |
| Document drafting | $20 (ChatGPT Plus) | 14B (Phi-4) | ~4GB | Open WebUI with templates |
| Email writing | $30 (Copilot Pro) | 7B-14B | ~2GB | Open WebUI + custom prompts |
| Data analysis / Excel | $30 (Copilot Pro) | 32B (Qwen 2.5 32B) | ~8GB | Open WebUI + file upload |
| Code assistance | $19 (Copilot) | 32B (Qwen3 Coder) | ~8GB | Continue.dev + Ollama |
| Meeting summaries | $10 (Otter.ai) | 7B + Whisper | ~4GB | Whisper + Ollama pipeline |
| Image generation | $20 (Midjourney) | Stable Diffusion / Flux | ~8GB VRAM | ComfyUI + GPU node |

User Profiles by Role

| Role | Typical Tasks | Model Tier | Replaces | Saved/user/mo |
|---|---|---|---|---|
| Developers | Code completion, debugging, refactoring, PR review, documentation | 32B-70B (Qwen3 Coder, DeepSeek) | Copilot ($19) + ChatGPT ($20) | $39 |
| Engineers | Technical docs, calculations, specs, diagrams, research | 32B-70B | Copilot Pro ($30) + ChatGPT ($20) | $50 |
| Admin Staff | Email drafting, translation FR/EN, meeting notes, scheduling | 7B-14B (lightweight, fast) | ChatGPT ($20) + DeepL ($25) | $45 |
| Accountants | Invoice processing, data extraction, report generation, compliance Q&A | 14B-32B | Copilot Pro ($30) | $30 |
| Managers | Strategy docs, presentations, market research, competitive analysis | 32B-70B | ChatGPT Plus ($20) + Perplexity ($20) | $40 |

Open WebUI handles all roles — admin assigns model tiers per user group. Developers get 70B access, admin staff gets 7B-14B (faster, cheaper). All data stays on-premises.

ROI Example: 20-person team

| Approach | Monthly | Annual | 3-Year |
|---|---|---|---|
| SaaS subscriptions (20 × $50 avg) | $1,000 | $12,000 | $36,000 |
| Self-hosted (Framework 2-node + Open WebUI) | $50 (electricity) | $600 | $11,303* |
| Savings | $950/mo | $11,400 | $24,697 |

* Includes hardware ($9,503) + electricity ($1,800). Break-even: 10 months.
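The break-even and savings figures follow directly from the table's own numbers:

```python
# Break-even check for the 20-person ROI example above.
saas_monthly = 20 * 50        # $1,000/mo in per-seat subscriptions
self_hosted_monthly = 50      # electricity for the 2-node rack
hardware = 9503               # Framework 2-node + switch + rack (CAD)

monthly_savings = saas_monthly - self_hosted_monthly       # $950/mo
break_even_months = hardware / monthly_savings             # ~10 months

three_year_saas = saas_monthly * 36                        # $36,000
three_year_self = hardware + self_hosted_monthly * 36      # $11,303
print(f"Break-even: {break_even_months:.0f} months, "
      f"3-year savings ${three_year_saas - three_year_self:,}")
```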

Scaling Rule of Thumb

  • 1 node per 5-8 concurrent users for 70B models
  • Add 128GB RAM per 5 users if running 70B concurrently
  • Queue system needed at 10+ concurrent — requests wait vs degrade
  • Separate "fast" and "deep" models — 7B for chat, 70B for coding, 405B by request
  • All data stays local — no cloud API calls for team members' prompts
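The rules of thumb above can be sketched as a sizing helper. The 8-users-per-node capacity is the optimistic end of the 5-8 range, and the 128GB/node figure assumes Framework nodes; real capacity depends on model size and context length:

```python
import math

# Rough sizing helper derived from the rules of thumb above.
# Assumptions: 8 concurrent 70B users per node (optimistic end of 5-8),
# Framework nodes at 128GB each, queueing needed at 10+ concurrent.
def size_cluster(concurrent_users: int) -> dict:
    nodes = max(1, math.ceil(concurrent_users / 8))
    return {
        "nodes": nodes,
        "ram_gb": nodes * 128,
        "queue_needed": concurrent_users >= 10,  # wait instead of degrade
    }

print(size_cluster(20))
```

For a 20-concurrent-user department this suggests 3 nodes (384GB) plus a queue, which matches the Team Sizing Guide table.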

Full ROI Analysis

Complete 3-year total cost of ownership including resale, tax, and revenue impact.

Costs

Year 1 (Hardware)

| Item | Cost |
|---|---|
| MacBook Pro 16" M5 Max 128GB 4TB | $8,350 |
| Framework Desktop 128GB + PSU/NVMe/Fan | $3,759 |
| MikroTik CRS812-DDQ switch | $1,815 |
| DeskPi T1 8U rack | $170 |
| Hardware subtotal | $14,094 |

Ongoing (3 years)

| Item | Cost |
|---|---|
| AppleCare+ ($189+tax/yr) | $651 |
| Claude Max ($140/mo) | $5,040 |
| Electricity (rack 24/7, ~150W) | $591 |
| API/OpenRouter savings | -$1,440 |
| GROSS 3-YEAR COST | $18,936 |

Returns

Resale Value (after 3 years)

| Item | Amount |
|---|---|
| MacBook Pro (75% retention) | -$6,263 |
| Framework Desktop (40%) | -$1,504 |
| Switch (60%) | -$1,089 |
| Resale subtotal | -$8,856 |

Tax Benefits (incorporated)

| Item | Amount |
|---|---|
| Hardware deduction (26.5% rate) | -$3,735 |
| Operating expenses deduction | -$1,665 |
| Tax savings | -$5,400 |
| EFFECTIVE 3-YEAR COST | $4,680 |
| Effective monthly cost | $130/mo |
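The effective-cost figure is just the gross 3-year cost net of resale and tax deductions:

```python
# Reproduces the effective-cost arithmetic from the Costs / Returns tables.
gross_3yr = 14094 + 651 + 5040 + 591 - 1440  # hardware + ongoing - API savings
resale = 6263 + 1504 + 1089                  # MacBook + Framework + switch
tax_savings = 3735 + 1665                    # hardware + operating deductions

effective = gross_3yr - resale - tax_savings
print(f"Gross 3-year:     ${gross_3yr:,}")   # $18,936
print(f"Effective 3-year: ${effective:,}")   # $4,680
print(f"Per month:        ${effective // 36}/mo")  # $130/mo
```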

Revenue Impact

  • Time saved per day: 2h — no cloud waiting, private local AI, agent handles routine tasks
  • Annual value at $150/hr: $72K (240 working days × 2h × $150)
  • Even at 10% utilization: $7.2K/yr — the setup pays for itself in 8 months
| KPI | MacBook Pro 16" M5 Max 128GB | Framework Desktop 128GB (rack) |
|---|---|---|
| LLM inference speed (32B) | 25-35 t/s (Metal GPU) | 15-20 t/s (CPU + iGPU) |
| Memory bandwidth | 546 GB/s | 256 GB/s |
| TB5 clustering | Yes — RDMA 120Gbps | No — Ethernet 25Gbps max |
| SSD speed (model loading) | 7.4 GB/s (70B loads in 6s) | ~5 GB/s (NVMe Gen4) |
| Battery life | 15-18h | N/A (always on) |
| Noise under load | Fan audible on 70B | Near silent |
| Power consumption | 30-100W | 120W sustained |
| Repairability | 0/10 (all soldered) | 10/10 (everything swappable) |
| Resale value (3yr) | 75% ($6,263 back) | 40% ($1,504 back) |
| AppleCare+ | $217/yr (holds resale value) | N/A (self-repair) |
| Warranty coverage | AppleCare+ until canceled | 3yr Framework warranty |
| 24/7 availability | No (laptop sleeps) | Yes (rack always on) |
| Multi-user serving | No (personal device) | Yes (Open WebUI + RBAC) |
| Expandable | No (sealed) | Yes (add nodes) |
| Linux native | Asahi (limited) | Full Linux support |
| Private data (offline) | 72B local, no network needed | Needs laptop to access |
| Client demo | Good (portable) | Impressive (rack with dashboard) |

Bottom Line

  • Effective cost: $130/mo for the entire setup (after resale + tax deduction)
  • Less than a Claude Max subscription — and you get 256GB of local compute
  • One client win ($5K-50K engagement from a self-hosted LLM demo) pays for everything
  • AppleCare+ at $189/yr is non-negotiable — it preserves the 75% resale value that makes the ROI work
  • The laptop and rack complement each other — portable private AI + always-on heavy inference. Neither alone is optimal.