Last updated: 2026-03-29 13:59 | Tests: 110 | Cost: $0.0000
This dashboard runs real coding and business tasks against open-source LLMs via the OpenRouter API, then scores each response on correctness, completeness, and code quality. Tasks are built from actual IT infrastructure, ERP integration, security auditing, DevOps, and business documentation work — not synthetic benchmarks.
Scoring method: Each model response is auto-scored against expected keywords (60%), code compilation check (20%), and response completeness (20%). Claude Sonnet 4.6 serves as the reference baseline (100%). Scores represent percentage of reference quality achievable at each RAM tier.
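As a sketch, the weighted score above could be computed like this (the keyword matching and length heuristic are illustrative assumptions, not the dashboard's actual implementation):

```python
def score_response(response: str, expected_keywords: list[str],
                   code_compiles: bool, min_length: int = 400) -> float:
    """Weighted auto-score: keywords 60%, compilation 20%, completeness 20%."""
    found = sum(1 for kw in expected_keywords if kw.lower() in response.lower())
    keyword_score = found / len(expected_keywords) if expected_keywords else 0.0
    compile_score = 1.0 if code_compiles else 0.0
    # Crude completeness proxy: response length relative to a minimum, capped at 1
    completeness = min(len(response) / min_length, 1.0)
    return 0.6 * keyword_score + 0.2 * compile_score + 0.2 * completeness
```

Each model's percentage in the tables below is its score divided by the Claude Sonnet 4.6 reference score on the same task.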
Role profiling: Per-role benchmarks use tasks specific to each position — developers get code generation and PR review tasks, accountants get invoice parsing and compliance questions, admin staff get translation and email drafting. Each role is tested against 4 model tiers to determine the minimum hardware that delivers acceptable quality.
Note: These benchmarks are grounded in real-world IT consulting, infrastructure management, and business operations work. Your specific workload may vary, but the patterns hold for 80%+ of companies in the SMB/mid-market segment: the task types (coding, documentation, translation, analysis, compliance) are universal, and the key insight — that most roles score 90%+ on small models — holds regardless of industry.
| Model | Infrastructure | Bug Fixing | Security | ERP Integration | Database Administration | Documentation | Shell / DevOps | Content (French) | Analysis / Reasoning | Claude Code Specific | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen 2.5 7B 8GB | 91% | 76% | 90% | 100% | 91% | 92% | 100% | 100% | 57% | 70% | 87% |
| Qwen 3.5 9B 8GB | 100% | 100% | 60% | 90% | 0% | 0% | 100% | 0% | 0% | 0% | 45% |
| Phi-4 14B 16-32GB | 100% | 76% | 90% | 100% | 91% | 85% | 100% | 100% | 48% | 80% | 87% |
| Qwen3 Coder Next (3B active/80B) 16-32GB | 91% | 100% | 100% | 90% | 100% | 100% | 100% | 100% | 65% | 90% | 94% |
| Qwen 2.5 72B 64GB | 100% | 76% | 100% | 100% | 100% | 92% | 100% | 100% | 65% | 70% | 90% |
| DeepSeek V3.1 64GB | 91% | 88% | 90% | 90% | 100% | 85% | 100% | 100% | 48% | 80% | 87% |
| Qwen 3.5 27B 64GB | 100% | 100% | 90% | 90% | 100% | 0% | 100% | 0% | 57% | 90% | 73% |
| DeepSeek R1 671B 128GB | 100% | 88% | 90% | 70% | 100% | 92% | 100% | 100% | 40% | 70% | 85% |
| Qwen3 Max 128GB | 91% | 100% | 90% | 90% | 100% | 100% | 100% | 100% | 57% | 80% | 91% |
| DeepSeek V3.2 128GB | 91% | 100% | 90% | 90% | 100% | 92% | 100% | 100% | 65% | 70% | 90% |
| Claude Sonnet 4.6 Cloud (ref) | 100% | 100% | 80% | 80% | 100% | 92% | 100% | 100% | 65% | 90% | 91% |
| # | Model | Tier | Avg Score | Avg Time |
|---|---|---|---|---|
| 1 | Qwen3 Coder Next (3B active/80B) | 16-32GB | 94% | 23.6s |
| 2 | Qwen3 Max | 128GB | 91% | 19.7s |
| 3 | Claude Sonnet 4.6 | Cloud (ref) | 91% | 20.4s |
| 4 | Qwen 2.5 72B | 64GB | 90% | 18.0s |
| 5 | DeepSeek V3.2 | 128GB | 90% | 44.4s |
| 6 | Qwen 2.5 7B | 8GB | 87% | 13.8s |
| 7 | Phi-4 14B | 16-32GB | 87% | 9.1s |
| 8 | DeepSeek V3.1 | 64GB | 87% | 27.1s |
| 9 | DeepSeek R1 671B | 128GB | 85% | 54.2s |
| 10 | Qwen 3.5 27B | 64GB | 73% | 18.6s |
| 11 | Qwen 3.5 9B | 8GB | 45% | 34.1s |
| Option | RAM | Price (CAD) | Best Models | Avg Score | vs Claude | Verdict |
|---|---|---|---|---|---|---|
| Proxmox Ollama (current) | 16GB CPU | $0 | Qwen 2.5 7B | 66% | 73% | Keep — 24/7 lightweight tasks |
| 14" MacBook Pro M5 Pro | 64GB max | ~$4,300 | Qwen3 Coder 30B, Phi-4 14B | 90% | 99% | Best portable — quiet, 18h battery, 90% quality |
| 16" MacBook Pro M5 Max | 128GB | $8,350 | 72B Q4, DeepSeek V3 67B | 89% | 98% | Only 16" has 128GB — never swaps, quiet under load |
| 14" MacBook Pro M5 Max | 64GB max | ~$5,500 | Same as 14" Pro but faster GPU | 83% | 91% | Avoid — Max thermal throttles in 14" body, Pro is better value |
| Mac Studio (used rack node) | 128-256GB | $3,500-9,000 | 405B single node (Ultra) | 89% | 98% | TB5 RDMA to MacBook — cluster as one machine |
| Framework Desktop Max+ 395 | 128GB | $3,759 | Same 128GB tier, Linux | 89% | 98% | Best value rack — teams, not personal (no RDMA) |
| Claude Max (cloud) | N/A | $140/mo | Claude Sonnet 4.6 / Opus | 91% | 100% | Irreplaceable — agent mode, MCP, 1M context |
Don't overspend on the laptop alone; spend smart on the laptop + rack combination.
| Component | Cost | What It Does |
|---|---|---|
| MacBook Pro 14" M5 Pro 64GB 4TB | $4,300 | Portable dev + private AI (emails, tabs, research, translation) |
| Mac Studio M4 Max 128GB (used, via /buyme) | ~$3,500 | TB5 RDMA to MacBook, 405B with Ultra upgrade later |
| MikroTik CRS812-DDQ switch | $1,815 | 400G switch — future-proof, shared across nodes |
| DeskPi T1 8U rack | $170 | Compact 10" rack — fits desk or closet |
| Claude Max (annual) | $1,680/yr | Agent mode, Claude Code, MCP — not replaceable |
| TOTAL (Year 1) | $11,465 | 192GB total (64GB laptop + 128GB rack via TB5 RDMA) + Claude |
| Metric | 128GB MacBook alone | 64GB MacBook + Framework rack |
|---|---|---|
| Total cost | $8,350 | $10,044 (+$1,694) |
| Total RAM | 128GB | 192GB (64+128) |
| 405B capable | No | Yes (rack) |
| 24/7 inference server | No (laptop sleeps) | Yes (rack always on) |
| Multi-user serving | No | Yes (Open WebUI on rack) |
| Portable quality | 89% (72B) | 90% (32B MoE) |
| Battery life | ~15h | ~18h (Pro chip) |
| Expandable | No | Yes (add nodes) |
| Feature | Claude Code | Local Model |
|---|---|---|
| Multi-file editing | Reads entire project | Single file context |
| Tool use (Bash, Read, Write) | Native | Not available |
| MCP integration | Native | Requires custom wrapper |
| Memory (CLAUDE.md, soul.md) | Auto-loaded | Manual injection |
| Agent mode (dev-agent) | Claude Agent SDK | Not available |
| Context window | 1M tokens | 32K-128K typically |
| Specs conformity check | Reads soul.md + specs.md | Must be prompted |
Laptop = terminal + private local LLM for sensitive data. Desktop/Rack = heavy inference server at home. Connected via Tailscale from anywhere.
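A minimal sketch of that split in practice: the laptop calls the rack's Ollama endpoint over the tailnet. The hostname `rack` and the model tag are assumptions for illustration; Ollama's `/api/generate` endpoint listens on port 11434 by default.

```python
import json
import urllib.request

# Tailscale gives the rack a stable hostname reachable from anywhere;
# Ollama listens on port 11434 by default. Hostname is hypothetical.
OLLAMA_URL = "http://rack:11434/api/generate"

def build_payload(prompt: str, model: str = "qwen2.5:72b") -> bytes:
    """JSON body for Ollama's /api/generate endpoint (non-streaming)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask_rack(prompt: str, model: str = "qwen2.5:72b") -> str:
    """Send a prompt to the always-on rack node and return the response text."""
    req = urllib.request.Request(OLLAMA_URL, data=build_payload(prompt, model),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Sensitive data stays on the laptop's local model; everything heavier goes to the rack over the encrypted tailnet.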
| Combo | Laptop | Desktop/Rack | Total (CAD) | Local Private | Heavy Work | Rating |
|---|---|---|---|---|---|---|
| 1. Portable Pro + Power Rack | 14" M5 Pro 64GB 4TB ~$4,300 — quiet, 18h battery | Framework 2-node 256GB $7,518 + switch $1,815 | $13,803 | 32B models (90%) | 405B distributed | Best value |
| 2. Powerhouse 16" + Mac Rack | 16" M5 Max 128GB 4TB $8,350 — never swaps, 72B local | Mac Studio M4 Max 128GB (used) ~$3,500 + TB5 cable $180 | $12,030 | 72B models (89%) | 256GB via TB5 RDMA | Most balanced |
| 3. Powerhouse 16" + Ultra Rack | 16" M5 Max 128GB 4TB $8,350 — TB5 to Studio | Mac Studio M3 Ultra 256GB (used) ~$9,000 + TB5 $180 | $17,530 | 72B models | 405B single node (fastest) | Best performance |
| 4. Budget Terminal + Max Rack | 14" M4 Pro 48GB (used/refurb) ~$2,800 — TB5, lightweight | Framework 4-node 512GB $15,036 + switch $1,815 | $19,651 | 32B models | 1T models, multi-user | Client demo machine |
| 5. Cheapest That Works | 14" M5 Pro 64GB 4TB ~$4,300 — sweet spot portable | Framework 1-node 128GB $3,759 + 10G switch $210 | $8,554 | 32B models | 70B single node | Minimum viable |
Start with 1 node, add more as needed. Switch is a one-time cost shared across all nodes.
Framework Desktop Max+ 395 — 128GB/node, Linux

| Config | Total RAM | Models | Hardware (CAD) | + Switch + Rack |
|---|---|---|---|---|
| 1 node | 128GB | 70B Q4 | $3,759 | $5,744 |
| 2 nodes | 256GB | 405B Q4 distributed | $7,518 | $9,503 |
| 3 nodes | 384GB | 405B Q6 | $11,277 | $13,317 |
| 4 nodes | 512GB | 1T Q4, multi-user | $15,036 | $17,076 |
Mac Studio M3 Ultra — 192GB/node, macOS, TB5 RDMA

| Config | Total RAM | Models | Hardware (CAD) | + Switch + Rack |
|---|---|---|---|---|
| 1 node | 192GB | 405B Q4 (single node!) | $6,479 | $8,464 |
| 2 nodes | 384GB | 405B Q8 | $12,958 | $14,943 |
| 3 nodes | 576GB | 1T Q4 | $19,437 | $21,477 |
| 4 nodes | 768GB | 1T Q8, multi-user | $25,916 | $27,956 |
Switch: MikroTik CRS812-DDQ — $1,295 USD (~$1,815 CAD) — 2x400G + 2x200G + 8x50G + 2x10G. Rack: DeskPi T1 8U ($170) or T2 12U ($225).
| Rank | Platform | Strengths | Weaknesses | Best For |
|---|---|---|---|---|
| 1 | Mac Studio (M3/M5 Ultra) | TB5 RDMA (120Gbps, 50µs); 800GB/s memory BW; 256GB unified memory; ultra quiet, low power; scales linearly with nodes | Expensive ($12K+ for 256GB); macOS only; Apple controls ecosystem; not repairable | Best-performance clusters; production multi-user LLM; 405B single node |
| 2 | Framework Desktop (Max+ 395) | Ultra quiet (no GPU fans); energy efficient (~120W); no thermal throttling; 128GB unified, repairable; Linux (full control); expandable (add nodes) | No RDMA (Ethernet only), though 100GbE NICs ($800/ea) close the gap significantly; 256GB/s BW (vs 800 on Mac); Alex Ziskind showed the 100G approach beats Jeff Geerling's 10G results | Best-value clusters; Linux-first workloads; client demos (open-source story) |
| 3 | NVIDIA GPU (RTX 5090/A100) | Fastest raw inference; CUDA ecosystem; NVLink for multi-GPU | Loud, hot, power hungry; 24GB VRAM per card (consumer); 80GB per card (datacenter $$$); thermal throttling common | Training workloads; datacenter deployments; when speed > everything |
| 4 | Beelink/Mini PC (Ryzen Max+) | Same chip as Framework; pre-built (no assembly); dual 10GbE on some models | Not rack-mountable; harder to cool in a cluster; less repairable; fan noise varies | Single-node use; quick deployment; budget builds |
Without RDMA, adding cluster nodes makes LLM inference slower on 10GbE (Jeff Geerling: 4 nodes TCP = 15.2 t/s, worse than 1 node at 20.4 t/s). With TB5 RDMA: 31.9 t/s — 2.1x faster.
However: Alex Ziskind showed that upgrading Framework nodes from 10GbE to 50/100GbE NICs significantly improves multi-node performance. 100GbE cards (~$800 USD each, cheaper used on eBay) close much of the gap with TB5. For teams, Framework + 100GbE is still the best value — $3,200 in NICs for a 4-node cluster vs $25K+ for Mac Studios.
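A quick sanity check on the cited throughput figures makes the trade-off concrete:

```python
# Figures from the Jeff Geerling benchmarks cited above (tokens/second)
single_node = 20.4   # 1 node
tcp_4node = 15.2     # 4 nodes over 10GbE TCP: slower than a single node
rdma_4node = 31.9    # 4 nodes over TB5 RDMA

assert tcp_4node < single_node          # naive clustering actively hurts
speedup = rdma_4node / tcp_4node        # RDMA vs TCP on the same 4 nodes
print(f"RDMA speedup over TCP: {speedup:.1f}x")
```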
Sources: Jeff Geerling RDMA benchmark | Apple TN3205
How to size LLM infrastructure for teams sharing memory, models, and interfaces via Open WebUI.
| Team Size | Concurrent Requests | Min RAM | Recommended Setup | Model Tier | Est. Cost (CAD) |
|---|---|---|---|---|---|
| Solo (1) | 1 | 128GB | 1x Framework node or MacBook M5 Max | 70B dedicated | $3,759 - $8,350 |
| Small (2-5) | 1-2 | 128GB | 1x Framework/Mac Studio + Open WebUI | 70B shared (queue requests) | $5,744 - $8,464 |
| Team (5-15) | 3-5 | 256GB | 2x nodes + Open WebUI + model routing | 70B × 2 (load balanced) or 405B shared | $9,503 - $14,943 |
| Department (15-30) | 5-10 | 384-512GB | 3-4 nodes + load balancer + RBAC | Multiple 70B concurrent + 405B for complex tasks | $13,317 - $21,477 |
| Organization (30-50) | 10-20 | 512GB-1TB | 4-8 nodes + queue system + usage tracking | Tiered: 7B fast / 70B standard / 405B premium | $17,076 - $55,000 |
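The sizing column follows a rough rule of one 128GB node per two concurrent 70B-class requests. A heuristic sketch (the two-requests-per-node figure is an assumption read off the table, not a guarantee):

```python
import math

def nodes_needed(concurrent_requests: int, requests_per_node: int = 2) -> int:
    """Estimate how many 128GB nodes a target 70B-model concurrency needs."""
    return max(1, math.ceil(concurrent_requests / requests_per_node))
```

For example, a department peaking at 5 concurrent requests lands at 3 nodes, matching the table's 3-4 node recommendation.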
| Component | Purpose | Cost |
|---|---|---|
| Open WebUI | Multi-user chat interface, RBAC, per-user API keys, usage tracking | Free (Docker) |
| Ollama | Model serving, concurrent requests, model loading/unloading | Free |
| pgvector (shared memory) | Team knowledge base, RAG, semantic search across projects | Free (PostgreSQL) |
| Cloudflare Access | Zero-trust auth for remote access (email OTP) | Free (50 users) |
| Model tiering | Route: quick → 7B, standard → 70B, premium → 405B | Config only |
| Budget controls | Per-user daily token limits, model access restrictions | Open WebUI built-in |
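The model-tiering row can be sketched as a simple router. The tier-to-model mapping below is illustrative (Open WebUI's per-group model access, or a gateway like LiteLLM, can implement the same idea without custom code):

```python
# Route each task to the cheapest tier that handles it well.
TIERS = {
    "quick":    "qwen2.5:7b",     # translation, email drafts
    "standard": "qwen2.5:72b",    # code, analysis, documents
    "premium":  "llama3.1:405b",  # complex multi-step reasoning
}

def route(task_kind: str) -> str:
    """Map a task category to an Ollama model tag; default to standard tier."""
    mapping = {"translation": "quick", "email": "quick",
               "code": "standard", "analysis": "standard",
               "architecture": "premium", "audit": "premium"}
    return TIERS[mapping.get(task_kind, "standard")]
```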
Replace per-seat SaaS AI subscriptions with self-hosted infrastructure.
| Task Type | SaaS Cost/user/mo | Model Needed | RAM per user | Self-hosted Equivalent |
|---|---|---|---|---|
| Translation (FR/EN) | $20-30 (DeepL Pro) | 7B (Qwen 2.5 7B) | ~2GB | Ollama + Open WebUI |
| Document drafting | $20 (ChatGPT Plus) | 14B (Phi-4) | ~4GB | Open WebUI with templates |
| Email writing | $30 (Copilot Pro) | 7B-14B | ~2GB | Open WebUI + custom prompts |
| Data analysis / Excel | $30 (Copilot Pro) | 32B (Qwen 2.5 32B) | ~8GB | Open WebUI + file upload |
| Code assistance | $19 (Copilot) | 32B (Qwen3 Coder) | ~8GB | Continue.dev + Ollama |
| Meeting summaries | $10 (Otter.ai) | 7B + Whisper | ~4GB | Whisper + Ollama pipeline |
| Image generation | $20 (Midjourney) | Stable Diffusion / Flux | ~8GB VRAM | ComfyUI + GPU node |
| Role | Typical Tasks | Model Tier | Replaces | Saved/user/mo |
|---|---|---|---|---|
| Developers | Code completion, debugging, refactoring, PR review, documentation | 32B-70B (Qwen3 Coder, DeepSeek) | Copilot ($19) + ChatGPT ($20) | $39 |
| Engineers | Technical docs, calculations, specs, diagrams, research | 32B-70B | Copilot Pro ($30) + ChatGPT ($20) | $50 |
| Admin Staff | Email drafting, translation FR/EN, meeting notes, scheduling | 7B-14B (lightweight, fast) | ChatGPT ($20) + DeepL ($25) | $45 |
| Accountants | Invoice processing, data extraction, report generation, compliance Q&A | 14B-32B | Copilot Pro ($30) | $30 |
| Managers | Strategy docs, presentations, market research, competitive analysis | 32B-70B | ChatGPT Plus ($20) + Perplexity ($20) | $40 |
For teams: Framework Desktop + 100GbE NICs is the best value. $9,503 for 2-node 256GB cluster serves 30 people. Add $1,600 for 100GbE NICs (2×$800) for near-Mac performance. Total: $11,103 — still 25% cheaper than one Mac Studio Ultra.
Open WebUI handles all roles — admin assigns model tiers per user group. Developers get 70B access, admin staff gets 7B-14B (faster, cheaper). All data stays on-premises.
| Approach | Monthly | Annual | 3-Year |
|---|---|---|---|
| SaaS subscriptions (20 × $50 avg) | $1,000 | $12,000 | $36,000 |
| Self-hosted (Framework 2-node + Open WebUI) | $50 (electricity) | $600 | $11,303* |
| Savings | $950/mo | $11,400 | $24,697 |
* Includes hardware ($9,503) + electricity ($1,800). Break-even: 10 months.
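The break-even and savings figures follow directly from the table's own numbers:

```python
saas_monthly = 1_000        # 20 users x $50 average
electricity_monthly = 50    # self-hosted running cost
hardware = 9_503            # Framework 2-node cluster

break_even_months = hardware / (saas_monthly - electricity_monthly)  # ~10
three_year_self_hosted = hardware + 36 * electricity_monthly         # $11,303
three_year_savings = 36 * saas_monthly - three_year_self_hosted      # $24,697
```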
Benchmark results per company role — what hardware each position actually needs.
| Role | 8GB (7B) | 16-32GB (14-32B) | 64GB (72B) | Cloud (Claude) | Min Laptop | Central Model |
|---|---|---|---|---|---|---|
| Admin Staff (email, translation, notes) | 90% | 100% | 95% | 95% | 16GB | 7B-14B |
| Accountant (invoices, compliance, reports) | 92% | 96% | 92% | 96% | 24GB | 14B-32B |
| Manager (strategy, presentations) | 96% | 96% | 100% | 92% | 24GB | 32B |
| Engineer (specs, calculations) | 92% | 92% | 89% | 79% | 48GB | 70B |
| Developer (code, debug, PR review) | 80% | 100% | 86% | 100% | 64GB | 70B |
SaaS cost: 10 admin ($45/ea) + 3 acct ($30) + 5 eng ($50) + 8 dev ($39) + 4 mgr ($40) = $1,262/mo = $45,432 over 3 years.
Self-hosted: Framework 2-node $9,503 + electricity $1,800 (3yr) = $11,303 total. Savings: $34,129.
Key insight: the majority of roles (admin, accountants, managers) score 90%+ on 7B-14B models, so they don't need expensive laptops. One central 128GB server running 70B for devs plus 14B for everyone else serves the entire company; most employees keep their existing PCs and access it via Open WebUI in the browser.
Complete 3-year total cost of ownership including resale, tax, and revenue impact.
| Item | Cost (CAD) |
|---|---|
| Year 1 (Hardware) | |
| MacBook Pro 16" M5 Max 128GB 4TB | $8,350 |
| Framework Desktop 128GB + PSU/NVMe/Fan | $3,759 |
| MikroTik CRS812-DDQ switch | $1,815 |
| DeskPi T1 8U rack | $170 |
| Hardware subtotal | $14,094 |
| Ongoing (3 years) | |
| AppleCare+ ($189+tax/yr) | $651 |
| Claude Max ($140/mo) | $5,040 |
| Electricity (rack 24/7 ~150W) | $591 |
| API/OpenRouter savings | -$1,440 |
| GROSS 3-YEAR COST | $18,936 |
| Resale Value (after 3 years) | |
| MacBook Pro (75% retention) | -$6,263 |
| Framework Desktop (40%) | -$1,504 |
| Switch (60%) | -$1,089 |
| Resale subtotal | -$8,856 |
| Tax Benefits (incorporated) | |
| Hardware deduction (26.5% rate) | -$3,735 |
| Operating expenses deduction | -$1,665 |
| Tax savings | -$5,400 |
| EFFECTIVE 3-YEAR COST | $4,680 |
| Effective monthly cost | $130/mo |
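The bottom line can be reproduced from the rows above (pure arithmetic on the table's own figures; 26.5% is the table's assumed combined corporate tax rate):

```python
hardware = 14_094
ongoing = 651 + 5_040 + 591            # AppleCare + Claude Max + electricity
gross = hardware + ongoing - 1_440     # minus API/OpenRouter savings

resale = 6_263 + 1_504 + 1_089         # MacBook 75%, Framework 40%, switch 60%
tax = round(0.265 * hardware) + round(0.265 * ongoing)  # deductions

effective = gross - resale - tax       # $4,680
monthly = effective / 36               # $130/mo
```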
| KPI | MacBook Pro 16" M5 Max 128GB | Framework Desktop 128GB (rack) |
|---|---|---|
| LLM inference speed (32B) | 25-35 t/s (Metal GPU) | 15-20 t/s (CPU + iGPU) |
| Memory bandwidth | 546 GB/s | 256 GB/s |
| TB5 clustering | Yes — RDMA 120Gbps | No — Ethernet only (up to 100GbE via add-in NICs) |
| SSD speed (model loading) | 7.4 GB/s (70B loads in 6s) | ~5 GB/s (NVMe Gen4) |
| Battery life | 15-18h | N/A (always on) |
| Noise under load | Fan audible on 70B | Near silent |
| Power consumption | 30-100W | 120W sustained |
| Repairability | 0/10 (all soldered) | 10/10 (everything swappable) |
| Resale value (3yr) | 75% ($6,263 back) | 40% ($1,504 back) |
| AppleCare+ | $217/yr (holds resale value) | N/A (self-repair) |
| Warranty coverage | AppleCare+ until canceled | 3yr Framework warranty |
| 24/7 availability | No (laptop sleeps) | Yes (rack always on) |
| Multi-user serving | No (personal device) | Yes (Open WebUI + RBAC) |
| Expandable | No (sealed) | Yes (add nodes) |
| Linux native | Asahi (limited) | Full Linux support |
| Private data (offline) | 72B local, no network needed | Needs laptop to access |
| Client demo | Good (portable) | Impressive (rack with dashboard) |