Run AI locally – without the guesswork
Skip local AI if you're a sole user with no privacy constraints and Claude Pro / ChatGPT Plus already covers your needs: the cheapest setup below is ~$5,400 CAD – that's 30+ months of a $140/month cloud subscription.
Clean French/English drafting, document Q&A, code review, transcript analysis, basic agent workflows – all of it private, all of it instant, all of it included once you've bought the hardware.
A laptop bought today should still be the right tool in 2029-2030. Three axes matter: storage (how much disk to buy), RAM (the real ceiling on what models you can run), and chip class (memory bandwidth = token generation speed). Buy too small and you'll feel pain by year 2; buy too big and you waste money on capacity you won't use. Here's how to think about it.
Realistic 4-year laptop disk budget assuming you have a home server or NAS for media + archive:
| Item | Year-4 estimate |
|---|---|
| macOS + apps | ~250 GB |
| Personal docs + mail archive | ~350 GB |
| Active dev workspace | ~150 GB |
| Local model library (laptop-runnable subset) | ~200 GB |
| HF cache + temp | ~60 GB |
| Buffer (snapshots, OS upgrades, growth) | ~400 GB |
| Year-4 total | ~1.4 TB |
Verdict: 2 TB leaves ~600 GB free at year 4. Comfortable, not luxurious.
Upgrade to 4 TB if: you record/edit video locally, you run multiple Parallels/Docker VMs, you don't have a home server (laptop = whole stack), or you want zero "manage disk space" thinking. Cost: +$1,400 CAD.
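The year-4 total is just the sum of the table above. A quick sketch makes it easy to plug in your own numbers (the sizes are the table's estimates, not measurements):

```python
# Year-4 disk budget from the table above, in GB. Swap in your own
# estimates; category names are just labels.
budget_gb = {
    "macOS + apps": 250,
    "personal docs + mail archive": 350,
    "active dev workspace": 150,
    "local model library": 200,
    "HF cache + temp": 60,
    "buffer (snapshots, upgrades, growth)": 400,
}

total_gb = sum(budget_gb.values())   # ~1.4 TB
free_on_2tb_gb = 2000 - total_gb     # headroom left on a 2 TB drive

print(total_gb, free_on_2tb_gb)      # 1410 590
```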
RAM is the real ceiling on what models you can run. Disk doesn't matter if the model can't fit in memory.
| Year | Mainstream "good" model | Resident RAM |
|---|---|---|
| 2026 (now) | Gemma 4 31B 8-bit | 31 GB |
| 2027 | 40-50B class quantized | ~40 GB |
| 2028 | 60-80B class | ~60 GB |
| 2029 | 100B class | ~70 GB |
Verdict: 64 GB starts feeling tight by year 2-3 as model sizes grow and you still need RAM for the OS, apps, and active work. 128 GB is the only laptop config that's safe through 2029-2030.
Bigger models (200B+ dense, 1T+ MoE) won't run on any laptop regardless of RAM – those live on home servers or clusters (Setup #5, #7).
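The "Resident" column follows from simple arithmetic: a model's weight footprint is roughly parameters × bytes per parameter. A minimal sketch for checking your own config (the ~16 GB OS/app allowance is an assumption, not a measured number):

```python
def resident_gb(params_billions, bits):
    """Approximate weight footprint in GB: billions of params x bytes each."""
    return params_billions * bits / 8

def fits(ram_gb, params_billions, bits, overhead_gb=16):
    """Does the model fit alongside OS + apps? overhead_gb is a rough guess."""
    return resident_gb(params_billions, bits) + overhead_gb <= ram_gb

# Matches the table: a 31B model at 8-bit needs ~31 GB just for weights.
print(resident_gb(31, 8))    # 31.0
print(fits(64, 31, 8))       # True  -- 2026's model on 64 GB
print(fits(64, 70, 4))       # True, but barely: 35 + 16 = 51 GB
print(fits(128, 100, 4))     # True  -- 2029's 100B class on 128 GB
```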
Memory bandwidth = token generation speed. The Pro / Max gap widens as model sizes grow because bigger models = more memory to read per token.
| Chip | Bandwidth | 2026 30B | 2029 100B |
|---|---|---|---|
| M5 Pro | 307 GB/s | ~30 t/s | ~5 t/s |
| M5 Max | 614 GB/s | ~60 t/s | ~12 t/s |
| M5 Ultra | ~820 GB/s | ~80 t/s | ~18 t/s |
Verdict: M5 Pro is fine for 2026 30B-class daily use. M5 Max stays usable on 100B-class models in 2029. M5 Pro at 100B class would be painful (5 t/s = ~3 minutes for a typical answer).
Estimates above assume 4-bit quantization. Lower quants (Q3, MXFP4) extend the runway by ~30%.
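These estimates follow a standard back-of-envelope rule: for a dense model, generating one token means streaming every weight from memory once, so tokens/sec is capped at bandwidth ÷ weight size. A hedged sketch (this is a ceiling; real throughput also depends on compute, KV-cache reads, and MoE sparsity, so measured numbers differ):

```python
def max_tps(bandwidth_gb_s, params_billions, bits=4):
    """Bandwidth ceiling on tokens/sec for a dense model:
    each generated token reads the full weight set from memory."""
    weights_gb = params_billions * bits / 8
    return bandwidth_gb_s / weights_gb

# The 2029 column above: a 100B-class model at 4-bit is ~50 GB of weights.
print(round(max_tps(307, 100)))   # 6   (M5 Pro)
print(round(max_tps(614, 100)))   # 12  (M5 Max)
print(round(max_tps(820, 100)))   # 16  (M5 Ultra est.)
```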
The 3-4 year safe pick: 128 GB RAM + M5 Max + 2 TB SSD (Setup #3 or #4) is the only laptop config that won't feel undersized by 2029. If your AI use is light and you're OK refreshing in 2-3 years, M5 Pro 64 GB (Setup #1 or #2) is fine. Don't pay for 4 TB unless you fit one of the upgrade-if cases above โ that money is better spent on the home server (Setup #5).
Each setup is a complete order: machine + accessories + tax + total. Pick the one that matches your situation, hand the list to your IT person, or order it yourself. Prices are CAD, snapshot 2026-05-03, including QC sales tax (TPS+TVQ 14.975%).
Pick this if: Solo mobile professional. Lawyer, accountant, agent, consultant. Light-to-medium AI use a few times a day. You travel.
| 14" MacBook Pro M5 Pro 64GB 2TB โ Apple Canada | $4,099 |
| AppleCare+ for Mac (3 years) โ Apple | $549 |
| LM Studio (free) โ lmstudio.ai | $0 |
| Cherry Studio chat UI (free) โ cherry-ai.com | $0 |
| Subtotal | $4,648 |
| QC sales tax (14.975%) | +$696 |
| TOTAL | $5,344 |
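Every card on this page uses the same arithmetic: sum the line items, apply the 14.975% combined TPS+TVQ, round to the dollar. A one-liner to sanity-check any quote (a sketch, assuming the combined rate applies to the full pre-tax subtotal):

```python
def qc_total(subtotal_cad):
    """Combined Quebec sales tax: TPS 5% + TVQ 9.975% = 14.975%."""
    tax = round(subtotal_cad * 0.14975)
    return tax, subtotal_cad + tax

print(qc_total(4648))   # (696, 5344) -- matches Setup #1's card above
```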
Pick this if: Solo desk-first professional. Want a bigger screen built-in, occasional travel. Same AI capability as Setup #1.
| 16" MacBook Pro M5 Pro 64GB 2TB โ Apple Canada | $4,699 |
| AppleCare+ for Mac (3 years) โ Apple | $599 |
| LM Studio + Cherry Studio (free) โ lmstudio.ai | $0 |
| Subtotal | $5,298 |
| QC sales tax (14.975%) | +$793 |
| TOTAL | $6,091 |
Pick this if: Solo road-warrior who needs Max performance in a smaller body. You travel a lot, you want to run every model on this page, and you accept the thermal compromise of the 14" chassis.
| 14" MacBook Pro M5 Max 128GB 2TB โ Apple Canada | $5,899 |
| AppleCare+ for Mac (3 years) โ Apple | $599 |
| LM Studio + Cherry Studio (free) โ lmstudio.ai | $0 |
| Subtotal | $6,498 |
| QC sales tax (14.975%) | +$973 |
| TOTAL | $7,471 |
Pick this if: Solo cockpit. Heavy AI all day. You want every model on this page to run, no thermal compromise, no swap, no waiting.
| 16" MacBook Pro M5 Max 128GB 2TB โ Apple Canada | $6,299 |
| AppleCare+ for Mac (3 years) โ Apple | $649 |
| LM Studio + Cherry Studio (free) โ lmstudio.ai | $0 |
| Subtotal | $6,948 |
| QC sales tax (14.975%) | +$1,040 |
| TOTAL | $7,988 |
Pick this if: Heavy AI user with stable home internet (>99% uptime). You want Studio-grade quality at desk + laptop autonomy on the road. Best long-term value if you can wait for Studio M5 Ultra (rumored June or Oct 2026).
| 14" MacBook Pro M5 Pro 64GB 2TB โ Apple Canada | $4,099 |
| Mac Studio M5 Ultra 256GB (when released, est.) โ Apple Canada | $7,500 |
| 10 GbE switch (TP-Link / MikroTik) โ Amazon.ca | $400 |
| APC UPS 1500VA โ Amazon.ca | $300 |
| Tailscale (free, โค3 users) โ tailscale.com | $0 |
| AppleCare+ on Studio + laptop โ Apple | $1,100 |
| Subtotal | $13,399 |
| QC sales tax (14.975%) | +$2,007 |
| TOTAL | $15,406 |
Pick this if: Maximum tokens-per-dollar. You're comfortable with Linux + vLLM serving stack. You don't need Mac apps. Best raw inference speed in this guide.
| Item | Price (CAD) |
|---|---|
| 4× Intel Arc Pro B70 32GB GPU – Canada Computers | $5,200 |
| AMD Threadripper 7960X + motherboard + 128GB DDR5 – Canada Computers / Newegg.ca | $3,500 |
| 4 TB NVMe Gen5 SSD – Newegg.ca | $600 |
| 1500W PSU + chassis + cooling – Newegg.ca | $700 |
| Ubuntu 24.04 + vLLM (free) – ubuntu.com | $0 |
| Subtotal | $10,000 |
| QC sales tax (14.975%) | +$1,498 |
| TOTAL | $11,498 |
Pick this if: Team of 15-50 people sharing private AI. You have rack space and someone comfortable with Linux ops. Scale by adding more nodes.
| Item | Price (CAD) |
|---|---|
| 4× Framework Desktop Ryzen AI Max+ 395 128GB – frame.work | $11,000 |
| 10" rack enclosure (DeskPi T2 12U) – Amazon.ca | $320 |
| 25 GbE switch (MikroTik CRS510) – Canada Computers | $700 |
| 4× 25 GbE NICs + DAC cables – Amazon.ca | $1,100 |
| UPS 2200VA rack-mount – Amazon.ca | $1,100 |
| Ubuntu + LibreChat + LiteLLM router (free) – librechat.ai | $0 |
| Subtotal | $14,220 |
| QC sales tax (14.975%) | +$2,129 |
| TOTAL | $16,349 |
The biggest under-explained spec when people compare Apple chips for AI: memory bandwidth. AI inference (the part where the model actually generates each word of the answer) is bottlenecked by how fast the chip can read the model's weights from memory – not by raw CPU/GPU compute. That makes bandwidth the single most important number for token generation speed.
| Chip | Memory bandwidth | vs M5 base |
|---|---|---|
| M5 base (MacBook Air, base MBP) | ~150 GB/s | 1× |
| M5 Pro (Setups #1, #2, #5 laptop) | 307 GB/s | ~2× |
| M5 Max (Setups #3, #4) | 614 GB/s | ~4× |
| M5 Ultra (Setup #5 Studio, when released) | ~820 GB/s (est.) | ~5.5× |
What this means for you: on the same model, an M5 Max generates tokens roughly twice as fast as an M5 Pro. M5 Pro is fine for occasional or light AI use. M5 Max is the floor for sustained or daily-driver AI work. Don't pay Pro prices and expect Max performance – and don't pay Max prices for a 14" body that throttles back to Pro speeds anyway (see Setup #3 trade-offs).
Source: Alex Ziskind, "Apple's New M5 Max Changes the Local AI Story" (March 2026, Stream Triad memory bandwidth measurements).
We didn't make these numbers up. The "AI quality vs cloud" % on every setup card above traces back to this eval: 7 prompts × 6 models = 42 calls, run on a real M5 Max 128 GB, scored 0-5 by hand against a clear rubric. Sonnet 4.6 (cloud) is the reference at 100%; everything else runs on-device. Setups #3-#5 use the same Apple silicon as the test bench. Setups #6-#7 use comparable hardware (Intel Arc, AMD Ryzen AI Max+) – quality scales with model size and bandwidth, not vendor branding.
Test bench: 14" MacBook Pro M5 Max 128 GB / 4 TB / macOS 26 / LM Studio 0.4.12 with MLX runtime app-mlx-generate-mac26-arm64@22. Real audio transcript from a French-speaking real-estate professional, PII-stripped to 615 words. Total OpenRouter cost for the Sonnet 4.6 baseline: under $0.50.
| Model | Reasoning (2 trains) | Code review (3 bugs) | FR email (follow-up) | Garage lift (2P vs 4P) | Consulting (3 questions) | FR extract (1 sentence) | FR analysis (structured) | Total |
|---|---|---|---|---|---|---|---|---|
| Sonnet 4.6 (cloud reference) | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 35 |
| Gemma 4 31B MLX (~16GB · dense) | 5 | 5 | 4 | 5 | 3 | 5 | 5 | 32 |
| Gemma 4 26B-A4B 8bit (~28GB · MoE, 4B active) | 5 | 5 | 4 | 1 | 4 | 5 | 5 | 29 |
| Qwen3-Coder 30B-A3B (~16GB · MoE, 3B active) | 5 | 5 | 5 | 1 | 4 | 5 | 4 | 29 |
| Llama 3.3 70B (~40GB · 4-bit) | 5 | 4 | 4 | 4 | 3 | 5 | 3 | 28 |
| Qwen3.6 27B MLX (~35GB · 8-bit · reasoning) | 5 | 5 | 3 | 5 | 0 | 0 | 5 | 23 |
Response time per prompt (seconds, same test bench):

| Model | Reasoning | Code rev | FR email | Garage lift | Consulting | FR extract | FR analysis | Total |
|---|---|---|---|---|---|---|---|---|
| Sonnet 4.6 | 9.2s | 8.1s | 16.5s | 11.7s | 9.9s | 3.6s | 29.9s | 88.9s |
| Qwen3-Coder 30B-A3B | 8.0s | 6.3s | 7.2s | 5.9s | 6.4s | 4.6s | 9.3s | 47.7s |
| Gemma 4 26B-A4B 8bit | 14.3s | 14.4s | 15.2s | 11.8s | 12.4s | 8.6s | 16.3s | 93.0s |
| Gemma 4 31B MLX | 48.8s | 34.2s | 48.6s | 27.8s | 25.1s | 14.2s | 55.3s | 254.0s |
| Llama 3.3 70B 4bit | 66.1s | 64.5s | 53.0s | 21.6s | 29.9s | 14.1s | 62.2s | 311.4s |
| Qwen3.6 27B MLX | 174.3s | 159.6s | 113.2s | 95.7s | 142.0s | 47.9s | 210.7s | 943.4s |
Hardware is half the story. Here's the open-source stack that runs on every setup above. All free unless noted.
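The glue for that stack is one local endpoint: LM Studio exposes an OpenAI-compatible API (default port localhost:1234), so every tool can share the same client. A minimal sketch, assuming that default port and a placeholder model name (use whatever model you've actually loaded):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio default

def build_payload(prompt, model="local-model", temperature=0.2):
    """OpenAI-style chat request body; the model name is a placeholder."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt, url=BASE_URL, timeout=120):
    """POST the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (needs a model loaded in LM Studio first):
# reply = chat("Draft a two-line follow-up email in French.")
```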
Defensive section – things that look impressive but won't help you:
These numbers help size large multi-user deployments. For SMB and solo use, jump back to Section 2 – the setup cards already incorporate this data.
Mac Cluster Scaling – Exo 1.0 + RDMA over Thunderbolt 5

| Cluster | Model | 1 node | 4 nodes | Scaling |
|---|---|---|---|---|
| 4× M3 Ultra | DevStroll 123B dense | 9.2 t/s | 22 t/s | 2.4× |
| 4× M3 Ultra | Qwen 235B MoE | 30 t/s | 37 t/s | 1.2× |

Framework Cluster (Setup #7 architecture)

| Cluster | Model | Generation | Prefill | Notes |
|---|---|---|---|---|
| 4× Framework Max+ 395 | MiniMax M2 Q6 | ~13 t/s | ~152 t/s | ~20% gen drop 2→4 nodes |

GPU Inference (Setup #6 architecture)

| GPU | Model | BF16 c1 | BF16 c4 | AWQ c1 |
|---|---|---|---|---|
| Intel Arc Pro B70 | Qwen3-4B | 56 t/s | 194 t/s | 72 t/s |
| NVIDIA RTX Pro 4000 | Qwen3-4B | 51 t/s | 173 t/s | 89 t/s |
Sources: Alex Ziskind benchmark videos (M5 Max + Apple cluster), Jeff Geerling RDMA benchmark.
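The "Scaling" column is just multi-node throughput divided by single-node throughput; dividing again by node count gives per-node efficiency, which is what tells you whether adding machines is worth it. A quick check against the dense-model row above:

```python
def scaling(single_tps, cluster_tps, nodes=4):
    """Speedup and per-node efficiency when sharding one model across nodes."""
    speedup = cluster_tps / single_tps
    return speedup, speedup / nodes

# Dense 123B on 4x M3 Ultra: 9.2 -> 22 t/s.
speedup, eff = scaling(9.2, 22)
print(round(speedup, 1), round(eff * 100))   # 2.4 60  -> ~60% efficiency
```

The MoE row (30 → 37 t/s, 1.2×) works out to ~31% efficiency, which is why clustering pays off far more for dense models than for MoE.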
The 7-prompt eval in Section 4 used a 0-5 rubric per prompt: 0 = no answer / hard fail, 5 = correct, insightful, would ship as-is. Every output was read end-to-end by a human (Artem) and scored against the same rubric. Sonnet 4.6 served as the cloud reference at 35/35.
Local models: LM Studio's built-in token timer, captured via the OpenAI-compatible /v1/chat/completions endpoint at localhost:1234, averaged across the 7 prompts. Cloud: OpenRouter response timing for the same prompts. All on a single test bench (M5 Max 128 GB / macOS 26 / LM Studio 0.4.12).
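If you want the same generation-speed number without LM Studio's built-in timer, the OpenAI-compatible response includes a `usage` block you can divide by wall-clock time. A sketch (endpoint and model name are placeholders; a server that omits `usage` needs a streaming token count instead):

```python
import json
import time
import urllib.request

def tokens_per_second(completion_tokens, elapsed_s):
    """Generation speed: tokens produced divided by wall-clock seconds."""
    return completion_tokens / elapsed_s

def measure(prompt, url="http://localhost:1234/v1/chat/completions",
            model="local-model"):
    """Time one completion and return (tokens/sec, reply text)."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    t0 = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - t0
    tps = tokens_per_second(body["usage"]["completion_tokens"], elapsed)
    return tps, body["choices"][0]["message"]["content"]
```

Note this measures end-to-end time (prefill + generation), so it will read slightly lower than a pure generation-phase t/s figure.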
The full test harness (prompts, scoring rubric, runner.py) is open-source: github.com/radionokia/homelab/projects/llm-benchmark. Run it on any of the setups above, score the outputs against your own rubric, and you'll get reproducible quality + latency data for your hardware.
Artem Kravchenko โ AI Automation Consultant + IT Director, 30 years in IT, based in Quebec City. Bilingual (English / French).
I run all seven setups in production: my own work, my homelab, and at client sites. Numbers on this page are what I measure, not what vendors promise. If a recommendation here costs you money you didn't need to spend, I want to hear about it โ that feedback makes the next revision better.
Not sure which setup fits your situation? Free consult, no commitment.
Want to see ROI analysis, per-role breakdowns, and KPI scorecard? → Coming soon at /roi