Piotr Migdał · 2026-06-29 · notable
Quesma: 'Qwen3.6 27B is the sweet spot for local development'
Piotr Migdał argues that Qwen3.6 27B is the sweet spot for local development: about 32 tok/s on a MacBook Pro with an M5 Max using llama.cpp at 8-bit quantization, roughly 42GB of memory, and quality close to mid-2025 frontier models. The post reached 875 points on the Hacker News front page.

A hands-on case that the dense 27B Qwen3.6 is now usable for daily coding work on a single laptop. 875 points on Hacker News.
What is it?
Quesma's founding engineer Piotr Migdał walks through running Alibaba's open-weight Qwen3.6 27B model locally on a MacBook Pro with an M5 Max chip. He argues the dense 27B variant is the new sweet spot: small enough to fit on consumer hardware, large enough to be useful for daily coding work.
How does it work?
On the M5 Max with multi-token prediction enabled, the article reports about 32 tokens per second using llama.cpp at 8-bit quantization, which needs roughly 42GB of unified memory. Migdał cites Artificial Analysis benchmarks that place Qwen3.6 27B above Gemma 4 31B and just below DeepSeek V4 Flash on overall quality.
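The reported figures can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, not the article's own calculation. It assumes the model has exactly 27e9 parameters and that "8-bit" means llama.cpp's Q8_0 format, which stores 32 int8 weights plus one fp16 scale per block (8.5 bits per weight). The article does not break down the 42GB figure, so whatever is left after the weights is only labeled as "everything else" (KV cache, activations, runtime overhead). The function names are hypothetical helpers for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to decode n_tokens at a steady throughput."""
    return n_tokens / tokens_per_second


# Figures reported in the article.
REPORTED_TOTAL_GB = 42
REPORTED_TOK_PER_S = 32

# Assumptions, not from the article: exact parameter count and Q8_0 layout.
ASSUMED_PARAMS = 27e9
ASSUMED_BITS_PER_WEIGHT = 8.5  # Q8_0: 34 bytes per block of 32 weights

weights_gb = weight_footprint_gb(ASSUMED_PARAMS, ASSUMED_BITS_PER_WEIGHT)
remainder_gb = REPORTED_TOTAL_GB - weights_gb
seconds_per_1k = generation_seconds(1000, REPORTED_TOK_PER_S)

print(f"weights alone:        ~{weights_gb:.1f} GB")
print(f"everything else:      ~{remainder_gb:.1f} GB")
print(f"1,000 output tokens:  ~{seconds_per_1k:.1f} s")
```

Under these assumptions the weights account for roughly two thirds of the reported memory, and a 1,000-token answer takes about half a minute to generate.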
Why does it matter?
Local agent backends used to mean either tiny models that hallucinated or 70B+ models that needed a multi-GPU rig. A dense 27B model that runs on one laptop and writes a playable hexagonal minesweeper from a single prompt changes that math. The 875-point Hacker News front-page reception, with nearly 600 comments, suggests the developer community treats this as a real inflection point.
Who is it for?
Local-AI developers working on Apple Silicon or high-end Linux laptops.