Last updated: 2026-03-29 13:03 | Tests: 110 | Cost: $0.0000

| Model | Infrastructure | Bug Fixing | Security | ERP Integration | Database Administration | Documentation | Shell / DevOps | Content (French) | Analysis / Reasoning | Claude Code Specific | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen 2.5 7B 8GB | 91% | 76% | 90% | 100% | 91% | 92% | 100% | 100% | 57% | 70% | 87% |
| Qwen 3.5 9B 8GB | 100% | 100% | 60% | 90% | 0% | 0% | 100% | 0% | 0% | 0% | 45% |
| Phi-4 14B 16-32GB | 100% | 76% | 90% | 100% | 91% | 85% | 100% | 100% | 48% | 80% | 87% |
| Qwen3 Coder Next (3B active/80B) 16-32GB | 91% | 100% | 100% | 90% | 100% | 100% | 100% | 100% | 65% | 90% | 94% |
| Qwen 2.5 72B 64GB | 100% | 76% | 100% | 100% | 100% | 92% | 100% | 100% | 65% | 70% | 90% |
| DeepSeek V3.1 64GB | 91% | 88% | 90% | 90% | 100% | 85% | 100% | 100% | 48% | 80% | 87% |
| Qwen 3.5 27B 64GB | 100% | 100% | 90% | 90% | 100% | 0% | 100% | 0% | 57% | 90% | 73% |
| DeepSeek R1 671B 128GB | 100% | 88% | 90% | 70% | 100% | 92% | 100% | 100% | 40% | 70% | 85% |
| Qwen3 Max 128GB | 91% | 100% | 90% | 90% | 100% | 100% | 100% | 100% | 57% | 80% | 91% |
| DeepSeek V3.2 128GB | 91% | 100% | 90% | 90% | 100% | 92% | 100% | 100% | 65% | 70% | 90% |
| Claude Sonnet 4.6 Cloud (ref) | 100% | 100% | 80% | 80% | 100% | 92% | 100% | 100% | 65% | 90% | 91% |

| # | Model | Tier | Avg Score | Avg Time |
|---|---|---|---|---|
| 1 | Qwen3 Coder Next (3B active/80B) | 16-32GB | 94% | 23.6s |
| 2 | Qwen3 Max | 128GB | 91% | 19.7s |
| 3 | Claude Sonnet 4.6 | Cloud (ref) | 91% | 20.4s |
| 4 | Qwen 2.5 72B | 64GB | 90% | 18.0s |
| 5 | DeepSeek V3.2 | 128GB | 90% | 44.4s |
| 6 | Qwen 2.5 7B | 8GB | 87% | 13.8s |
| 7 | Phi-4 14B | 16-32GB | 87% | 9.1s |
| 8 | DeepSeek V3.1 | 64GB | 87% | 27.1s |
| 9 | DeepSeek R1 671B | 128GB | 85% | 54.2s |
| 10 | Qwen 3.5 27B | 64GB | 73% | 18.6s |
| 11 | Qwen 3.5 9B | 8GB | 45% | 34.1s |

| Option | RAM | Price (CAD) | Best Models | Avg Score | vs Claude | Verdict |
|---|---|---|---|---|---|---|
| Proxmox Ollama (current) | 16GB CPU | $0 | Qwen 2.5 7B | 66% | 73% | Keep — 24/7 lightweight tasks |
| MacBook Pro 14" M5 Pro | 64GB | ~$4,300 | Qwen3 Coder 30B, Phi-4 14B | 90% | 99% | Sweet spot — 90% quality, portable, private |
| MacBook Pro M5 Max | 128GB | $8,350 | DeepSeek R1, Qwen3 Max | 89% | 98% | Overkill alone — pair with rack instead |
| Framework Desktop Max+ 395 | 128GB | $3,759 | Same 128GB tier models | 89% | 98% | Home rack — heavy inference, 405B capable |
| Claude Max (cloud) | N/A | $140/mo | Claude Sonnet 4.6 / Opus | 91% | 100% | Irreplaceable — agent mode, MCP, 1M context |

Don't overspend on the laptop. Spend smart on laptop + rack.

| Component | Cost | What It Does |
|---|---|---|
| MacBook Pro 14" M5 Pro 64GB 4TB | $4,300 | Portable dev + private AI (emails, tabs, research, translation) |
| Framework Desktop 128GB + PSU/NVMe/Fan | $3,759 | Home rack: 70B dedicated, 405B with 2nd node |
| MikroTik CRS812-DDQ switch | $1,815 | 400G switch — future-proof, shared across nodes |
| DeskPi T1 8U rack | $170 | Compact 10" rack — fits desk or closet |
| Claude Max (annual) | $1,680/yr | Agent mode, Claude Code, MCP — not replaceable |
| TOTAL (Year 1) | $11,724 | 192GB total (64 portable + 128 rack) + Claude |

| Metric | 128GB MacBook alone | 64GB MacBook + Framework rack |
|---|---|---|
| Total cost | $8,350 | $10,044 (+$1,694) |
| Total RAM | 128GB | 192GB (64+128) |
| 405B capable | No | Yes (rack) |
| 24/7 inference server | No (laptop sleeps) | Yes (rack always on) |
| Multi-user serving | No | Yes (Open WebUI on rack) |
| Portable quality | 89% (72B) | 90% (32B MoE) |
| Battery life | ~15h | ~18h (Pro chip) |
| Expandable | No | Yes (add nodes) |

| Feature | Claude Code | Local Model |
|---|---|---|
| Multi-file editing | Reads entire project | Single file context |
| Tool use (Bash, Read, Write) | Native | Not available |
| MCP integration | Native | Requires custom wrapper |
| Memory (CLAUDE.md, soul.md) | Auto-loaded | Manual injection |
| Agent mode (dev-agent) | Claude Agent SDK | Not available |
| Context window | 1M tokens | 32K-128K typically |
| Specs conformity check | Reads soul.md + specs.md | Must be prompted |

Laptop = terminal + private local LLM for sensitive data. Desktop/Rack = heavy inference server at home. Connected via Tailscale from anywhere.
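One way this split can be wired up, as a sketch: the rack serves Ollama over the tailnet, and the laptop points its Ollama client at it. `OLLAMA_HOST` is Ollama's stock environment variable; the hostname `rack` stands in for whatever MagicDNS name the node gets and is a placeholder here.

```shell
# On the rack: join the tailnet, then expose Ollama on all interfaces
tailscale up
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On the laptop, from anywhere on the tailnet:
export OLLAMA_HOST=http://rack:11434   # "rack" = the node's MagicDNS name (placeholder)
ollama run qwen2.5:72b "Summarize the attached spec"
```

Sensitive prompts stay on the tailnet end to end; nothing transits a third-party API.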

| Combo | Laptop | Desktop/Rack | Total (CAD) | Local Private | Heavy Work | Rating |
|---|---|---|---|---|---|---|
| 1. Budget Terminal + Power Rack | MacBook Pro M5 Pro 64GB $4,500 | Framework 2-node 256GB $7,518 + switch $1,815 | $14,003 | 32B models (90%) | 405B distributed | Best value |
| 2. Strong Laptop + Mid Rack | MacBook Pro M5 Max 128GB $8,350 | Framework 1-node 128GB $3,759 + switch $1,815 | $14,094 | 72B models (89%) | Combined 256GB | Most balanced |
| 3. Strong Laptop + Power Rack | MacBook Pro M5 Max 128GB $8,350 | Mac Studio M3 Ultra 192GB $6,299 + TB5 $180 | $14,999 | 72B models | 405B single node (fastest) | Best performance |
| 4. Budget Terminal + Max Rack | MacBook Pro M4 Pro 48GB $3,500 | Framework 4-node 512GB $15,036 + switch $1,815 | $20,576 | 32B models | 1T models, multi-user | Client demo machine |
| 5. Cheapest That Works | MacBook Pro M5 Pro 64GB $4,500 | Framework 1-node 128GB $3,759 + 10G switch $210 | $8,554 | 32B models | 70B single node | Minimum viable |

Start with 1 node and add more as needed. The switch is a one-time cost shared across all nodes.

Framework Desktop Max+ 395 — 128GB/node, Linux

| Config | Total RAM | Models | Hardware (CAD) | + Switch + Rack |
|---|---|---|---|---|
| 1 node | 128GB | 70B Q4 | $3,759 | $5,744 |
| 2 nodes | 256GB | 405B Q4 distributed | $7,518 | $9,503 |
| 3 nodes | 384GB | 405B Q6 | $11,277 | $13,317 |
| 4 nodes | 512GB | 1T Q4, multi-user | $15,036 | $17,076 |

Mac Studio M3 Ultra — 192GB/node, macOS, TB5 RDMA

| Config | Total RAM | Models | Hardware (CAD) | + Switch + Rack |
|---|---|---|---|---|
| 1 node | 192GB | 405B Q4 (single node!) | $6,479 | $8,464 |
| 2 nodes | 384GB | 405B Q8 | $12,958 | $14,943 |
| 3 nodes | 576GB | 1T Q4 | $19,437 | $21,477 |
| 4 nodes | 768GB | 1T Q8, multi-user | $25,916 | $27,956 |

Switch: MikroTik CRS812-DDQ — $1,295 USD (~$1,815 CAD) — 2x400G + 2x200G + 8x50G + 2x10G. Rack: DeskPi T1 8U ($170) or T2 12U ($225).
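The "+ Switch + Rack" column is plain arithmetic over these prices; judging by the deltas, the 1-2 node configs are costed with the T1 rack ($170) and the 3-4 node configs with the larger T2 ($225). A quick check for the Framework tier:

```shell
# Recompute the Framework "+ Switch + Rack" totals:
# node price x N + switch ($1,815) + rack (T1 $170 for 1-2 nodes, T2 $225 for 3-4)
node=3759; switch=1815
for n in 1 2 3 4; do
  hw=$(( node * n ))
  if [ "$n" -le 2 ]; then rack=170; else rack=225; fi
  echo "$n node(s): hardware \$$hw, total \$$(( hw + switch + rack ))"
done
# -> 1 node(s): hardware $3759, total $5744
# -> 2 node(s): hardware $7518, total $9503
# -> 3 node(s): hardware $11277, total $13317
# -> 4 node(s): hardware $15036, total $17076
```

The same formula with a $6,479 node price reproduces the Mac Studio column.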

| Rank | Platform | Strengths | Weaknesses | Best For |
|---|---|---|---|---|
| 1 | Mac Studio (M3/M5 Ultra) | TB5 RDMA (120Gbps, 50µs)<br>800GB/s memory BW<br>256GB unified memory<br>Ultra quiet, low power<br>Scales linearly with nodes | Expensive ($12K+ for 256GB)<br>macOS only<br>Apple controls ecosystem<br>Not repairable | Best-performance clusters<br>Production multi-user LLM<br>405B on a single node |
| 2 | Framework Desktop (Max+ 395) | Ultra quiet (no GPU fans)<br>Energy efficient (~120W)<br>No thermal throttling<br>128GB unified, repairable<br>Linux (full control)<br>Expandable (add nodes) | 25Gbps Ethernet max (no RDMA)<br>Multi-node degrades over TCP<br>256GB/s memory BW (vs 800GB/s on Mac) | Best-value clusters<br>Linux-first workloads<br>Client demos (open-source story) |
| 3 | NVIDIA GPU (RTX 5090/A100) | Fastest raw inference<br>CUDA ecosystem<br>NVLink for multi-GPU | Loud, hot, power hungry<br>24-32GB VRAM per card (consumer)<br>80GB per card (datacenter $$$)<br>Thermal throttling common | Training workloads<br>Datacenter deployments<br>When speed > everything |
| 4 | Beelink/Mini PC (Ryzen Max+) | Same chip as Framework<br>Pre-built (no assembly)<br>Dual 10GbE on some models | Not rack-mountable<br>Harder to cool in a cluster<br>Less repairable<br>Fan noise varies | Single-node use<br>Quick deployment<br>Budget builds |

Without RDMA, adding cluster nodes makes LLM inference slower (Jeff Geerling: 4 Mac Studios over TCP = 15.2 t/s, worse than 1 node at 20.4 t/s). With TB5 RDMA: 31.9 t/s on 4 nodes — 2.1x faster. Framework clusters hit this TCP wall; Mac clusters scale linearly.

Sources: Jeff Geerling RDMA benchmark | Apple TN3205

How to size LLM infrastructure for teams sharing memory, models, and interfaces via Open WebUI.

| Team Size | Concurrent Requests | Min RAM | Recommended Setup | Model Tier | Est. Cost (CAD) |
|---|---|---|---|---|---|
| Solo (1) | 1 | 128GB | 1x Framework node or MacBook M5 Max | 70B dedicated | $3,759 - $8,350 |
| Small (2-5) | 1-2 | 128GB | 1x Framework/Mac Studio + Open WebUI | 70B shared (queued requests) | $5,744 - $8,464 |
| Team (5-15) | 3-5 | 256GB | 2x nodes + Open WebUI + model routing | 70B × 2 (load balanced) or 405B shared | $9,503 - $14,943 |
| Department (15-30) | 5-10 | 384-512GB | 3-4 nodes + load balancer + RBAC | Multiple 70B concurrent + 405B for complex tasks | $13,317 - $21,477 |
| Organization (30-50) | 10-20 | 512GB-1TB | 4-8 nodes + queue system + usage tracking | Tiered: 7B fast / 70B standard / 405B premium | $17,076 - $55,000 |

| Component | Purpose | Cost |
|---|---|---|
| Open WebUI | Multi-user chat interface, RBAC, per-user API keys, usage tracking | Free (Docker) |
| Ollama | Model serving, concurrent requests, model loading/unloading | Free |
| pgvector (shared memory) | Team knowledge base, RAG, semantic search across projects | Free (PostgreSQL) |
| Cloudflare Access | Zero-trust auth for remote access (email OTP) | Free (50 users) |
| Model tiering | Route: quick → 7B, standard → 70B, premium → 405B | Config only |
| Budget controls | Per-user daily token limits, model access restrictions | Open WebUI built-in |
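A minimal single-node version of this stack can be brought up with two containers. This is a sketch: the image names, ports, and the `OLLAMA_BASE_URL` variable follow the two projects' documented defaults, but verify against current docs before deploying.

```shell
# Ollama: model serving on its default port 11434
docker run -d --name ollama \
  -p 11434:11434 -v ollama:/root/.ollama \
  ollama/ollama

# Open WebUI: multi-user chat UI on http://localhost:3000,
# pointed at the Ollama container on the host
docker run -d --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

RBAC, per-user API keys, and daily token limits are then configured in the Open WebUI admin panel; pgvector and Cloudflare Access sit alongside as separate services.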

Replace per-seat SaaS AI subscriptions with self-hosted infrastructure.

| Task Type | SaaS Cost/user/mo | Model Needed | RAM per user | Self-hosted Equivalent |
|---|---|---|---|---|
| Translation (FR/EN) | $20-30 (DeepL Pro) | 7B (Qwen 2.5 7B) | ~2GB | Ollama + Open WebUI |
| Document drafting | $20 (ChatGPT Plus) | 14B (Phi-4) | ~4GB | Open WebUI with templates |
| Email writing | $30 (Copilot Pro) | 7B-14B | ~2GB | Open WebUI + custom prompts |
| Data analysis / Excel | $30 (Copilot Pro) | 32B (Qwen 2.5 32B) | ~8GB | Open WebUI + file upload |
| Code assistance | $19 (Copilot) | 32B (Qwen3 Coder) | ~8GB | Continue.dev + Ollama |
| Meeting summaries | $10 (Otter.ai) | 7B + Whisper | ~4GB | Whisper + Ollama pipeline |
| Image generation | $20 (Midjourney) | Stable Diffusion / Flux | ~8GB VRAM | ComfyUI + GPU node |
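Taking the translation row as an example, a 7B model behind Ollama's REST API can replace a per-seat translator for batch work. A sketch, assuming a local Ollama with `qwen2.5:7b` already pulled (`/api/generate` is Ollama's documented endpoint):

```shell
# One-shot FR -> EN translation against a local Ollama instance
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Translate to English: Veuillez trouver ci-joint le rapport trimestriel.",
  "stream": false
}'
```

The same call pattern, scripted over a folder of documents, is what Open WebUI wraps in a chat interface for non-technical staff.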

| Role | Typical Tasks | Model Tier | Replaces | Saved/user/mo |
|---|---|---|---|---|
| Developers | Code completion, debugging, refactoring, PR review, documentation | 32B-70B (Qwen3 Coder, DeepSeek) | Copilot ($19) + ChatGPT ($20) | $39 |
| Engineers | Technical docs, calculations, specs, diagrams, research | 32B-70B | Copilot Pro ($30) + ChatGPT ($20) | $50 |
| Admin Staff | Email drafting, translation FR/EN, meeting notes, scheduling | 7B-14B (lightweight, fast) | ChatGPT ($20) + DeepL ($25) | $45 |
| Accountants | Invoice processing, data extraction, report generation, compliance Q&A | 14B-32B | Copilot Pro ($30) | $30 |
| Managers | Strategy docs, presentations, market research, competitive analysis | 32B-70B | ChatGPT Plus ($20) + Perplexity ($20) | $40 |

Open WebUI handles all roles — admin assigns model tiers per user group. Developers get 70B access; admin staff get 7B-14B (faster, cheaper). All data stays on-premises.

| Approach | Monthly | Annual | 3-Year |
|---|---|---|---|
| SaaS subscriptions (20 × $50 avg) | $1,000 | $12,000 | $36,000 |
| Self-hosted (Framework 2-node + Open WebUI) | $50 (electricity) | $600 | $11,303* |
| Savings | $950/mo | $11,400 | $24,697 |

* Includes hardware ($9,503) + electricity ($1,800). Break-even: 10 months.
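The footnote's figures follow from simple arithmetic; a sketch of the check (break-even rounded to the nearest month):

```shell
saas_mo=1000        # 20 users x $50/mo SaaS
selfhost_mo=50      # electricity
hardware=9503       # Framework 2-node + switch + rack
save_mo=$(( saas_mo - selfhost_mo ))                        # monthly savings
breakeven=$(( (hardware + save_mo / 2) / save_mo ))         # rounded months
save_3yr=$(( 36 * saas_mo - hardware - 36 * selfhost_mo ))  # 3-year savings
echo "save \$$save_mo/mo, break-even $breakeven mo, 3yr savings \$$save_3yr"
# -> save $950/mo, break-even 10 mo, 3yr savings $24697
```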

Complete 3-year total cost of ownership including resale, tax, and revenue impact.

| Item | Cost (CAD) |
|---|---|
| Year 1 (Hardware) | |
| MacBook Pro 16" M5 Max 128GB 4TB | $8,350 |
| Framework Desktop 128GB + PSU/NVMe/Fan | $3,759 |
| MikroTik CRS812-DDQ switch | $1,815 |
| DeskPi T1 8U rack | $170 |
| Hardware subtotal | $14,094 |
| Ongoing (3 years) | |
| AppleCare+ ($189 + tax/yr) | $651 |
| Claude Max ($140/mo) | $5,040 |
| Electricity (rack 24/7, ~150W) | $591 |
| API/OpenRouter savings | -$1,440 |
| GROSS 3-YEAR COST | $18,936 |
| Resale Value (after 3 years) | |
| MacBook Pro (75% retention) | -$6,263 |
| Framework Desktop (40%) | -$1,504 |
| Switch (60%) | -$1,089 |
| Resale subtotal | -$8,856 |
| Tax Benefits (incorporated) | |
| Hardware deduction (26.5% rate) | -$3,735 |
| Operating expenses deduction | -$1,665 |
| Tax savings | -$5,400 |
| EFFECTIVE 3-YEAR COST | $4,680 |
| Effective monthly cost | $130/mo |

| KPI | MacBook Pro 16" M5 Max 128GB | Framework Desktop 128GB (rack) |
|---|---|---|
| LLM inference speed (32B) | 25-35 t/s (Metal GPU) | 15-20 t/s (CPU + iGPU) |
| Memory bandwidth | 546 GB/s | 256 GB/s |
| TB5 clustering | Yes — RDMA 120Gbps | No — Ethernet 25Gbps max |
| SSD speed (model loading) | 7.4 GB/s (70B loads in 6s) | ~5 GB/s (NVMe Gen4) |
| Battery life | 15-18h | N/A (always on) |
| Noise under load | Fan audible on 70B | Near silent |
| Power consumption | 30-100W | 120W sustained |
| Repairability | 0/10 (all soldered) | 10/10 (everything swappable) |
| Resale value (3yr) | 75% ($6,263 back) | 40% ($1,504 back) |
| AppleCare+ | $217/yr (holds resale value) | N/A (self-repair) |
| Warranty coverage | AppleCare+ until canceled | 3yr Framework warranty |
| 24/7 availability | No (laptop sleeps) | Yes (rack always on) |
| Multi-user serving | No (personal device) | Yes (Open WebUI + RBAC) |
| Expandable | No (sealed) | Yes (add nodes) |
| Linux native | Asahi (limited) | Full Linux support |
| Private data (offline) | 72B local, no network needed | Needs laptop to access |
| Client demo | Good (portable) | Impressive (rack with dashboard) |