Vicki Boykis · 2026-06-15 · notable

Vicki Boykis: 'Running Local Models Is Good Now'

ML engineer Vicki Boykis argues local models finally cleared the practical bar — she runs agentic coding workflows on Gemma 4 and friends, hitting about 75% of frontier cloud accuracy without API dependencies.

Cover image for Vicki Boykis' essay on running local language models.

A working ML engineer says open-weights local models are finally usable for real coding work.

What is it?

An essay from Vicki Boykis, a founding ML engineer who builds recommendation systems and information retrieval at a startup and previously worked at Mozilla.ai. She walks through her current local-model setup and argues the on-device side has crossed a usability line in the last year.

How does it work?

Boykis runs her own 'Pi agent harness' against open-weights models — Mistral 7B, Gemma 3, GPT-OSS-20B, several Qwen variants, Gemma-4-26b-a4b, and gemma-4-12b-qat — through llama.cpp, Ollama, LM Studio, Open WebUI, and Docker. She estimates the models reach about 75% of the accuracy and speed of frontier cloud models on her agentic coding tasks. She notes she has 'no concrete scientific evidence' and frames this as a personal experiment, not a benchmark.

Why does it matter?

The post hit 653 points on Hacker News and is being shared as the new short answer to 'should I bother with local models yet.' For developers who care about privacy, offline work, or AI without API bills, the practical answer flipped from 'not really' to 'yes, with some patience.' It also confirms the impact of recent open-weights releases like Gemma 4.

Who is it for?

Indie developers, privacy-conscious teams, ML hobbyists

Vicki Boykis: 'Running Local Models Is Good Now'

What is it?

How does it work?

Why does it matter?

Who is it for?

Links

Tags