Alex Ellis · 2026-06-17 · notable
Alex Ellis: 'Local Qwen Isn't a Worse Opus — It's a Different Tool'
Alex Ellis argues that local Qwen models are not stripped-down Opus stand-ins but a different tool: useful for bounded, private work such as telemetry analysis and code review, yet still prone to looping and hallucinating on open-ended tasks, even on the $12,000 RTX 6000 Pro he runs them on.

Open-source advocate Alex Ellis says local Qwen is the right tool for bounded private work, not a poor man's Opus.
What is it?
A first-person essay by Alex Ellis, the indie developer behind OpenFaaS and inlets. Ellis runs local Qwen 27B and 35B-A3B on his own RTX 6000 Pro GPU and writes up what they are and are not good for, based on real customer-telemetry analysis, diagnostic reports, and code-review tasks from his open-source projects and consulting work.
How does it work?
Ellis pushes back on benchmark headlines that claim Qwen is 'near-Opus level'. In his account, local Qwen excels at bounded, well-specified jobs where data privacy and predictable cost matter most and where the operator is willing to supervise. On open-ended assignments, he says, the local model falls into infinite loops, hallucinates solutions, and degrades sharply when heavily quantized. Even after investing $12,000 in the RTX 6000 Pro, he remains unwilling to leave Qwen running unsupervised on long-horizon work.
Why does it matter?
The post is a counterweight to the wave of 'local model X matches Opus' takes that have followed every Qwen and GLM release this year. Ellis frames the question as fitness-for-purpose, not raw benchmark parity, and gives concrete examples of work where a 27B local model is the right call and where it is not.
Who is it for?
Developers weighing self-hosted Qwen against hosted Claude or GPT for production