Overview
Ollama is a tool for downloading and running open large language models on your own computer. You install it on macOS, Windows, or Linux, then pull and chat with a model straight from the terminal with a single command.
It is aimed at developers who want to run models locally instead of calling a hosted service. Beyond the CLI, Ollama exposes a REST API on localhost and ships official Python and JavaScript libraries, so you can wire local models into your own apps and scripts.
As a local runtime, Ollama handles model downloads, serving, and the request loop for you, and connects to existing coding tools and agents such as Claude Code, Codex, and Copilot CLI through its launch integrations.
What it does
- One-line install commands for macOS, Windows, and Linux, plus an official Docker image
- Run any model from the library with a single `ollama run` command
- Built-in REST API on http://localhost:11434 for running and managing models
- Official Python (`pip install ollama`) and JavaScript (`npm i ollama`) client libraries
- Launch integrations for coding tools and agents like Claude Code, Codex, and Copilot CLI
- Built on the llama.cpp backend for local model inference
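As a rough illustration of the "managing models" side of the REST API, the sketch below asks a local server which models are installed. It uses only the Python standard library and assumes the `/api/tags` endpoint and its `{"models": [{"name": ...}]}` response shape as described in Ollama's API documentation; the helper names `model_names` and `list_local_models` are made up for this example.

```python
import json
import urllib.request

# Default local address taken from the feature list above.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_response: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response body."""
    # Assumed shape: {"models": [{"name": "..."}, ...]}
    return [entry["name"] for entry in tags_response.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask a running Ollama server which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model_names(json.load(resp))


if __name__ == "__main__":
    # Needs an Ollama server running on this machine.
    for name in list_local_models():
        print(name)
```

Keeping the parsing in `model_names` separate from the network call makes the response handling easy to check without a server.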
Getting started
Install Ollama, then pull and chat with a model from the terminal. The same models are reachable over a local REST API and the Python and JavaScript libraries.
Install Ollama
Run the install script on macOS or Linux. On Windows, use the PowerShell command instead.
curl -fsSL https://ollama.com/install.sh | sh
Run and chat with a model
Pull a model from the library and start chatting in the terminal.
ollama run gemma4
Call the REST API
Ollama serves a local REST API on port 11434 for running models from your own apps.
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
Use the Python library
Install the official client and send a chat request from Python.
pip install ollama

from ollama import chat

response = chat(model='gemma4', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response.message.content)

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
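For scripts that cannot take on an extra dependency, the same `/api/chat` request can be sent with the standard library alone. This is a minimal sketch, not the official client: the helper names (`build_chat_request`, `extract_reply`, `ask`) are hypothetical, and the response is assumed to carry the reply under `message.content`, matching what the Python library example prints.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address used in this guide


def build_chat_request(model: str, prompt: str,
                       base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the same POST /api/chat request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of a non-streaming chat response."""
    return body["message"]["content"]


def ask(model: str, prompt: str) -> str:
    """Send one prompt to a running Ollama server and return the reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    # Needs a running Ollama server with the model already pulled.
    print(ask("gemma4", "Why is the sky blue?"))
```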
When to use it
- Run open LLMs locally for privacy or offline work, without sending data to a hosted API
- Add a local model backend to a Python or JavaScript app through the official libraries
- Connect Ollama to coding tools and agents such as Claude Code, Codex, or Copilot CLI
- Prototype and test prompts against different models from the library before committing to a provider
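The prompt-prototyping use case above might look something like the following sketch, which sends one prompt to several models and collects the replies side by side. The `compare_models` and `ollama_sender` helpers are hypothetical names for this illustration, and `"another-model"` is a placeholder rather than a real library entry; only the `chat(...)` call and `response.message.content` come from the Python example in this guide.

```python
from typing import Callable, Dict, Iterable

# A sender takes (model, prompt) and returns the reply text.
Sender = Callable[[str, str], str]


def compare_models(models: Iterable[str], prompt: str, send: Sender) -> Dict[str, str]:
    """Run one prompt against each model and map model name to reply."""
    results: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = send(model, prompt)
        except Exception as exc:
            # Record the failure so one missing model does not stop the run.
            results[model] = f"<error: {exc}>"
    return results


def ollama_sender(model: str, prompt: str) -> str:
    """Sender backed by the official client (requires `pip install ollama`)."""
    from ollama import chat  # imported lazily so the helper above has no dependency

    response = chat(model=model, messages=[{"role": "user", "content": prompt}])
    return response.message.content


if __name__ == "__main__":
    # "another-model" is a placeholder; substitute models you have pulled.
    replies = compare_models(["gemma4", "another-model"],
                             "Why is the sky blue?", ollama_sender)
    for model, reply in replies.items():
        print(f"--- {model} ---\n{reply}\n")
```

Passing the sender in as a parameter keeps the comparison loop independent of the transport, so the same loop could later point at a different backend.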
How Ollama compares
Ollama alongside the other open-source local runtime tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Ollama | ★ 175k | Download and run open LLMs locally from your terminal. |
| llama.cpp | ★ 117k | A C/C++ inference engine that runs LLMs in the GGUF format on CPUs, Apple Silicon, and GPUs with low memory use. |
| GPT4All | ★ 77.4k | A free desktop app and Python client that runs large language models locally on your own computer, with no API calls or GPU required. |
| LocalAI | ★ 47k | A self-hosted server that exposes an OpenAI-compatible API for running text, vision, voice, and image models on local hardware. |
| Jan | ★ 43.1k | An open-source desktop app that runs LLMs fully offline as a ChatGPT-style assistant on your own computer. |
| llamafile | ★ 25k | A Mozilla project that packages a model and its runtime into one executable file you can copy and run on any OS. |
| MLC LLM | ★ 22.8k | A machine-learning compiler that builds and runs LLMs across browsers, phones, and desktops using TVM-based code generation. |
| KTransformers | ★ 17.3k | A framework for running large Mixture-of-Experts models locally by splitting work between CPU and GPU to fit limited VRAM. |