AI/TLDR

What Is Jan? An Offline-First AI Assistant

You will understand what Jan is, how its offline-first desktop app runs open models locally as a private ChatGPT alternative, and when it can also use cloud models.

Beginner · 11 min read · Updated 2026-06-14

In plain English

When you chat with a cloud assistant, your words travel to a company's servers, get processed there, and the reply comes back. That works well, but it means your conversations leave your machine, you need an internet connection, and you usually pay per use. For many tasks that is a fine trade. For private notes, sensitive documents, or just tinkering offline, it is not.


Jan is a free, open-source desktop app that flips this around. It downloads an open large language model onto your own computer and runs it locally, so the whole conversation happens on your hardware. Nothing has to leave the machine. It looks and feels like a familiar chat window — a message box, a list of past chats, a model picker — but the brain doing the answering is sitting on your own disk, not in a data center.

Think of the difference between calling a translator on the phone and hiring one to sit in your office. The phone translator (a cloud assistant) is always available and very capable, but every word you say goes down the line to someone else. The in-house translator (Jan) lives with you: a bit more setup, bounded by what one person can do, but everything stays in the room. Jan is the app that brings that in-house translator to your desktop.

Why it matters

Running a model on your own machine instead of through someone's API changes a few things that builders and privacy-conscious users care about a lot.

  • Privacy and data ownership. Your prompts, files, and the model's replies never have to touch a third-party server. For drafting personal notes, reviewing a contract, or working with health or legal text, that is the whole point — the data stays on hardware you control.
  • It works offline. Once the model is downloaded, you do not need a network connection. You can use it on a plane, in a basement, or anywhere the Wi‑Fi is bad, and there is no API outage that can take it away from you.
  • No per-token bill. Cloud APIs typically charge per request or per token. A local model costs nothing per message after the one-time download; you pay only in electricity and the hardware you already own.
  • No lock-in. Jan runs open-weight models in standard formats, and your chats are stored as plain files on your disk. You are not tied to one vendor's account, model, or pricing.

Who is it for? Anyone who wants a private, ChatGPT-style chat window without sending data to the cloud: developers prototyping with local LLMs, writers and researchers handling sensitive text, students learning how models behave, and tinkerers who simply want to own their stack. If you would reach for a cloud chatbot but the data is too private to upload, Jan is built for exactly that gap.

The honest trade-off: a model small enough to run on your laptop is generally less capable than a frontier cloud model, and it leans on your hardware — more memory and a decent GPU mean bigger, faster models. Jan does not remove that ceiling; it just makes the local option easy and pleasant to use, and lets you reach for a cloud model when you need the extra horsepower.

How it works

Under the friendly chat window, Jan is really three things bundled together: a way to get models, an engine that runs them, and a local server that speaks the same API language as the cloud services. You install one app and those pieces are wired up for you.

Get a model

Jan ships with a built-in model hub. You browse a list of open models, pick one that fits your machine, and click download. The model arrives as a quantized file — a compressed version of the model's weights that trades a little quality for a much smaller size and lower memory use, which is what lets a multi-billion-parameter model fit on an ordinary laptop. Jan stores these files in a normal folder on your disk; they are yours to keep, copy, or delete.
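
As a rough illustration of why quantization matters, the sketch below estimates the size of a model's weights from its parameter count and the number of bits stored per weight. The function name and the sample figures (a 7-billion-parameter model at 16 bits and at about 4.5 bits per weight) are assumptions for illustration; real model files carry metadata and mix precisions, so actual sizes will differ somewhat.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of a model's weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 7e9  # hypothetical 7-billion-parameter model
    full = estimate_weights_gb(params, 16)     # unquantized 16-bit weights
    quant = estimate_weights_gb(params, 4.5)   # assumed ~4.5 bits per weight
    print(f"16-bit: {full:.1f} GB, quantized: {quant:.1f} GB")
```

Under these assumptions the quantized weights come out at well under a third of the 16-bit size, which is the difference between needing a workstation and fitting on a laptop.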

Run it on your hardware

When you send a message, Jan loads the model into memory and an inference engine does the actual work of turning your prompt into tokens and generating a reply. Jan uses well-established open engines under the hood (the same llama.cpp family that powers much of the local-model world), and it can push the heavy math onto your GPU when you have one — see GPU offloading — or fall back to your CPU when you do not. From your side it is just typing and reading; the engine handles the math.
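
To make the GPU and CPU split concrete, here is a minimal sketch of the arithmetic behind partial offloading: engines in the llama.cpp family can place some of a model's layers on the GPU and leave the rest on the CPU. The helper name, the equal-sized-layers simplification, and the reserved headroom are all assumptions for illustration, not how Jan or llama.cpp actually budget memory.

```python
def layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                    reserve_gb: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers."""
    if n_layers <= 0 or model_gb <= 0:
        raise ValueError("model_gb and n_layers must be positive")
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)  # headroom for context and buffers
    return min(n_layers, int(usable_gb // per_layer_gb))
```

A result equal to the layer count means the whole model could sit on the GPU; zero means the engine would run entirely on the CPU.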

Talk to it like a cloud API

Jan can also expose a local server on your own computer that mimics the OpenAI-style chat API. That sounds technical, but it is powerful: any script or app that already knows how to call a cloud assistant can be pointed at Jan instead, just by changing the address to a local one. So you can build and test an app against a private local model, then swap to a cloud model later without rewriting your code.

Calling Jan's local server like any chat API (Python)

```python
from openai import OpenAI

# Jan runs a local, OpenAI-compatible server on your own machine.
# No data leaves your computer; the api_key is just a placeholder.
client = OpenAI(
    base_url="http://localhost:1337/v1",  # point at Jan, not the cloud
    api_key="local-no-key-needed",
)

resp = client.chat.completions.create(
    model="<the model you downloaded in Jan>",
    messages=[{"role": "user", "content": "Summarize this note in one line."}],
)
print(resp.choices[0].message.content)
```
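
Because the local server follows the OpenAI-style chat format, the same call can be made without any SDK. The sketch below builds the raw HTTP request using only the standard library. The address matches the example above, `/chat/completions` is the usual OpenAI-style route, and the model name is whatever you pass in, so treat all three as assumptions to check against your own Jan settings.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1337/v1"  # assumed local address, as in the example above


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

If your server is configured to require a key, an `Authorization: Bearer <key>` header would need to be added to the headers as well.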

Jan vs Ollama vs LM Studio

Jan is one of several popular ways to run open models on your own computer. The three names people compare most are Jan, Ollama, and LM Studio. They overlap heavily — all three download and run quantized open models locally — but they aim at slightly different users.

| Tool | Main interface | Best for |
| --- | --- | --- |
| Jan | Full chat app (GUI), open-source | A private, ChatGPT-style desktop app you own end to end |
| Ollama | Command line first, with a server | Developers who want a simple CLI and to wire models into other tools |
| LM Studio | Polished GUI, closed-source app | GUI users who want a rich model browser and tuning controls |

The big distinctions: Jan's headline is that it is fully open-source and built as a complete chat application you live in, with privacy as the first principle. Ollama leans developer-first — a clean command-line tool and background server that other apps build on top of; many people actually run an Ollama server and point a separate UI at it. LM Studio is a slick graphical app like Jan, but it is closed-source. If your priority is an open, self-contained desktop assistant, Jan is the natural pick; if you want a scriptable engine to embed elsewhere, Ollama fits better. They are not mutually exclusive — Jan can even connect to other local servers.

Getting started and picking a model

The first-run experience is deliberately simple. You install Jan, open it, choose a model from the hub, and start chatting. The only real decision is which model — and that comes down to your hardware.

  1. Install and open Jan. No account, no sign-up, no internet login required to use it locally.
  2. Check your memory. The single most important number is your RAM (and your GPU's VRAM if you have a dedicated card). Bigger models need more memory; if a model does not fit, it will be very slow or fail to load.
  3. Download a model that fits. Start small. A smaller quantized model runs fast and proves the setup works; you can always grab a bigger one later for higher quality.
  4. Chat — or start the local server. Use the chat window directly, or turn on Jan's local API server to call the model from your own code.
  5. Optional: add a cloud key. When a task is too hard for your local model, add an API key in settings and switch to a cloud model for that chat.
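
Step 4 can be checked from code before wiring up anything larger. The sketch below asks the local server which models it offers, assuming it exposes the OpenAI-style `/models` listing at the same local address used earlier in this article. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def parse_model_ids(body) -> list[str]:
    """Pull model ids out of an OpenAI-style model listing."""
    if not isinstance(body, dict):
        return []
    return [item["id"] for item in body.get("data", [])
            if isinstance(item, dict) and "id" in item]


def list_local_models(base_url: str = "http://localhost:1337/v1") -> list[str]:
    """Return model ids from the local server, or an empty list if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as response:
            return parse_model_ids(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        return []  # server not started, wrong port, or unexpected response
```

An empty list usually means the server is not switched on yet or is listening on a different port than the one assumed here.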

Memory really is the gatekeeper. A rough way to think about it: a model's download size is close to how much memory it needs just to load, before any conversation. So a model file that is several gigabytes wants at least that much free RAM or VRAM to run comfortably. For the full picture of what your machine can handle, see local LLM hardware requirements and the per-platform guides for Mac and Windows.
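
That rule of thumb can be written down as a quick check. In this sketch the 1.2 overhead factor (room for the conversation context and runtime buffers) is an assumed safety margin, not a figure from Jan, and the helper names are hypothetical.

```python
def fits_in_memory(model_file_gb: float, free_memory_gb: float,
                   overhead_factor: float = 1.2) -> bool:
    """True if the model file plus an assumed overhead fits in free RAM or VRAM."""
    return model_file_gb * overhead_factor <= free_memory_gb


def largest_that_fits(candidates_gb: dict[str, float], free_memory_gb: float):
    """Name of the largest candidate that fits, or None if none do."""
    fitting = {name: size for name, size in candidates_gb.items()
               if fits_in_memory(size, free_memory_gb)}
    return max(fitting, key=fitting.get) if fitting else None
```

Passing a dictionary of download sizes and your free memory gives a first guess at which model to try; the real test is still whether it loads and responds at a usable speed.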

Going deeper

Once the basic chat works, Jan opens up into the wider world of local AI. A few directions worth knowing as you go further.

Hybrid local-plus-cloud workflows. Because Jan can hold both local models and cloud API keys, a common pattern is to do everyday, private work locally and only reach for a frontier cloud model on the hard cases. You decide per-conversation where the data goes — sensitive material stays on-device, while a tricky reasoning task can borrow cloud power. That choice is yours to make explicitly, not a default buried in a vendor's settings.
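
One way to make that per-conversation decision explicit in your own code is a small routing function. This is a sketch under assumptions: the `Chat` fields, the backend labels, and the rule itself are hypothetical and not part of Jan.

```python
from dataclasses import dataclass


@dataclass
class Chat:
    title: str
    sensitive: bool              # private notes, contracts, health data, etc.
    needs_heavy_reasoning: bool  # a task the local model struggles with


def choose_backend(chat: Chat, cloud_key_configured: bool) -> str:
    """Return "local" or "cloud"; sensitive chats never leave the machine."""
    if chat.sensitive:
        return "local"
    if chat.needs_heavy_reasoning and cloud_key_configured:
        return "cloud"
    return "local"
```

The sensitivity check comes first on purpose, so a hard but private task still stays on-device.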

Jan as a backend for other apps. The local OpenAI-compatible server is the real unlock for builders. Anything that speaks that API — a coding assistant, a note app, a small RAG prototype that answers from your own documents — can be pointed at Jan and run entirely offline. You get to develop against a private model and switch the address to a cloud provider later without touching the rest of your code.
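
A minimal sketch of that swap: read the server address and key from environment variables so the calling code never changes. The variable names `LLM_BASE_URL` and `LLM_API_KEY` are made up for this example, and the local defaults mirror the values used earlier in the article.

```python
import os


def client_settings(env=None) -> dict:
    """Settings for an OpenAI-style client, defaulting to the local server."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LLM_BASE_URL", "http://localhost:1337/v1"),
        "api_key": env.get("LLM_API_KEY", "local-no-key-needed"),
    }
```

With the `openai` package from the earlier example, `OpenAI(**client_settings())` would then target whichever server the environment names, local by default.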

The shared foundation underneath. Jan, Ollama, LM Studio, and many other tools mostly sit on the same open engines and the same quantized model files. Learning Jan teaches you the whole local-model mental model — quantization, memory budgets, GPU offloading, OpenAI-compatible servers — which transfers directly to every other runner. Picking a tool is mostly about the interface you prefer, not a different technology underneath.

Honest limits to keep in mind. A locally run model is bounded by your machine: it will usually be smaller and slower than a top cloud model, its knowledge is frozen at training time unless you add retrieval, and a heavily quantized model can lose a little accuracy compared to its full-size original. None of that is a flaw in Jan specifically; it is the nature of running AI on your own hardware. Jan's job is to make that path private, open, and easy, and it does that well. The durable takeaway: local AI is a deliberate trade of raw capability for privacy, ownership, and offline use, and Jan is one of the friendliest on-ramps to that trade.

FAQ

What is Jan AI used for?

Jan is a free, open-source desktop app for chatting with AI models that run on your own computer. People use it as a private, offline ChatGPT alternative for tasks like drafting, summarizing, coding help, and working with sensitive documents — all without sending data to the cloud. It can also call cloud models with an API key when you want more power.

Is Jan free and open-source?

Yes. Jan is free to download and use, and its source code is open. Running models locally costs nothing per message after the one-time model download — you are only using your own hardware and electricity. There is no account or subscription required for local use.

Does Jan work completely offline?

Yes, once you have downloaded a model. Jan is offline-first: after the model file is on your disk, you can chat with no internet connection at all. You only need a network to download new models, or if you deliberately choose to connect Jan to a cloud model with an API key.

Jan vs Ollama — what is the difference?

Both run open models locally, but Jan is a full graphical chat application you live in, while Ollama is a developer-first command-line tool and server that other apps build on. Choose Jan for a self-contained, ChatGPT-style desktop app; choose Ollama if you want a scriptable engine to wire into your own tools. They often run the same underlying model files.

What hardware do I need to run Jan?

The key constraint is memory. A model needs roughly its download size in free RAM (or GPU VRAM) just to load, so smaller quantized models run on ordinary laptops while larger ones want more memory and ideally a dedicated GPU. Start with a small model to confirm the setup, then scale up if your machine has room.

Can Jan use cloud models like Claude or GPT?

Yes. Although Jan defaults to running models locally for privacy, you can add an API key in its settings and switch a conversation to a cloud model when a task needs more capability. This makes it easy to keep private work on-device and only send the hard cases to the cloud, on a per-chat basis.
