Piotr Migdał · 2026-06-29 · notable

Quesma: 'Qwen3.6 27B is the sweet spot for local development'

Piotr Migdał argues Qwen3.6 27B is the local-dev sweet spot: about 32 tok/s on a MacBook Pro with an M5 Max chip using llama.cpp at 8-bit quantization, a footprint of roughly 42GB of unified memory, and quality roughly on par with the mid-2025 frontier. The post reached 875 points on the Hacker News front page.

Thermal image of a MacBook running Qwen3.6 27B locally

Hands-on case that 27B-dense Qwen3.6 is now production-grade on a single laptop — 875 points on Hacker News.

What is it?

Quesma's founding engineer Piotr Migdał walks through running Alibaba's open-weight Qwen3.6 27B model locally on a MacBook Pro with an M5 Max chip, and argues the 27B-dense variant is the new sweet spot — small enough to fit consumer hardware, large enough to be useful for daily coding work.

How does it work?

On the M5 Max with multi-token prediction enabled, the article reports about 32 tokens per second using llama.cpp at 8-bit quantization, which needs roughly 42GB of unified memory. Migdał cites Artificial Analysis benchmarks that place Qwen3.6 27B above Gemma 4 31B and just below DeepSeek V4 Flash on overall quality.
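A rough back-of-envelope check of those figures, as a sketch rather than the article's own calculation. It assumes a nominal 27 billion parameters and llama.cpp's Q8_0 format, which stores about 8.5 bits per weight (32 int8 values plus one fp16 scale per block). The article only says "8-bit", so the exact format is an assumption, and the helper names are hypothetical.

```python
# Back-of-envelope estimate. The parameter count and the Q8_0 format are
# assumptions; only the 42GB and 32 tok/s figures come from the article.
PARAMS = 27e9             # nominal count implied by the "27B" name (assumed)
BITS_PER_WEIGHT = 8.5     # llama.cpp Q8_0: 32 int8 weights + one fp16 scale
REPORTED_TOTAL_GB = 42    # memory figure quoted in the article
REPORTED_TOK_PER_S = 32   # throughput figure quoted in the article


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` output tokens at a steady decode rate."""
    return tokens / tok_per_s


weights_gb = weight_memory_gb(PARAMS, BITS_PER_WEIGHT)
remainder_gb = REPORTED_TOTAL_GB - weights_gb

print(f"weights alone:        ~{weights_gb:.1f} GB")
print(f"left of the 42GB:     ~{remainder_gb:.1f} GB")
print(f"1,000 output tokens:  ~{generation_seconds(1000, REPORTED_TOK_PER_S):.0f} s")
```

Under these assumptions the weights account for a little under 29GB. The rest of the reported 42GB would then go to things like the KV cache and runtime buffers, though the article does not break the figure down.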

Why does it matter?

Local agent backends used to mean either tiny models that hallucinated or 70B+ models that needed a multi-GPU rig. A 27B-dense model that runs on one laptop and writes a playable hexagonal minesweeper from a single prompt changes that math. The reception, 875 points on the Hacker News front page and nearly 600 comments, suggests the developer community treats this as a real inflection point.

Who is it for?

Local-AI developers on Apple Silicon and high-end Linux laptops.

