OpenAI API: A Beginner's Guide to Chat Completions

In plain English

The OpenAI API is how your code talks to OpenAI's family of GPT models. When you use ChatGPT in a browser, a person is typing. The API is the same intelligence, but a program sends the message and a program reads the answer — no human required.

Think of it as a delivery window for AI. You hand your request through the window (some text, a model name, a secret key), and a JSON response comes back containing the model's reply. OpenAI runs the model on its own servers — you don't download anything, you just send an HTTPS request and get an answer.

OpenAI currently offers two main request styles: the Chat Completions API, the classic endpoint that has powered GPT apps since 2023, and the newer Responses API, launched in 2025, which adds built-in tools and server-side state management. Beginners can start with Chat Completions because it is simpler, and the concepts transfer directly if you later switch. The general pattern shared by all LLM providers is explained in "What is an LLM API".

Why it matters

ChatGPT the website is for one person typing one question. The moment you want GPT inside something — a customer-support bot answering thousands of tickets, a script that classifies 100,000 rows of data overnight, a feature in your app that drafts replies — you need the API. A person can't sit there pasting text into a browser at that scale. Code can.

The API is what turns "GPT is a clever assistant" into "GPT is a building block I can ship". It's the difference between using a calculator and wiring a calculator chip into your own product.

What you can build with it

Chatbots and assistants — conversational UIs backed by GPT instead of canned responses.
Content pipelines — generate, summarize, translate, or rewrite text in bulk.
Code helpers — autocomplete, explain, or review code inside your own editor or CI tool.
Data extraction — parse unstructured text (emails, PDFs, reviews) into structured JSON.
Agents — programs that reason step-by-step and call external tools to complete open-ended tasks.

How it works

Every Chat Completions request goes to one endpoint: POST https://api.openai.com/v1/chat/completions. You supply a model name, a list of messages, and your API key as a header. OpenAI runs the model, then sends back a JSON object containing the reply, token counts, and the reason the model stopped.

One OpenAI API request, start to finish:

1. Your code: builds the request (model, messages, optional settings)
2. HTTPS + Authorization header: Bearer sk-... sent to api.openai.com
3. OpenAI runs the model: GPU cluster generates a reply token-by-token
4. JSON response: choices[0].message.content, usage, finish_reason
5. Your code reads it: extract the text and use it in your app
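To make the request anatomy concrete, here is a minimal sketch that builds the same call with only Python's standard library and stops short of sending it. The helper name `build_chat_request` and the placeholder key are illustrative assumptions; the endpoint, the Authorization header, and the body fields are the ones described above.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a Chat Completions HTTP request.

    build_chat_request is a hypothetical helper used for illustration only.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    api_key="sk-placeholder",  # placeholder, not a real key
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Inspect what would go over the wire.
print(request.get_method(), request.full_url)
print(json.loads(request.data))

# To actually send it, pass the request to urllib.request.urlopen(), parse the
# JSON body, and read ["choices"][0]["message"]["content"].
```

In practice the official SDK does all of this for you; the sketch only shows that nothing more exotic than an HTTPS POST is involved.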

The messages array

The core of every request is the messages array. Each element has a role and content. The three roles are:

system: background instructions that frame the whole conversation ("You are a terse SQL expert."). It appears once, usually at the top of the array, and is resent with every request like the rest of the history.
user: the human's input.
assistant: the model's previous replies. You resend these to give GPT memory of earlier turns.

json

{
  "model": "gpt-4o",
  "messages": [
    { "role": "system",    "content": "You are a helpful assistant." },
    { "role": "user",      "content": "What is a transformer model?" }
  ]
}

The API is stateless

OpenAI keeps no session between your calls. If you want GPT to remember turn 1 when you're on turn 5, you have to resend turns 1 through 4 every time. This is the single biggest surprise for beginners: you manage the conversation history yourself. The newer Responses API can optionally store state on OpenAI's side (more on that below), but Chat Completions has no memory at all.

Chat Completions vs. Responses API: key differences

Chat Completions (classic)

You maintain message history
Stateless — no server-side memory
Simple, widely-supported format
Best for lightweight, stateless flows

Responses API (2025+)

Optional server-side conversation state
Built-in tools: web search, code interpreter, file search
Better cache hit rates (40–80% improvement)
Where future OpenAI features will land
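As a rough illustration of the server-side state difference, the sketch below asks a question and a follow-up through the Responses API without resending the first turn. It assumes `client` is an `openai.OpenAI()` instance from a recent SDK version and that the first response is stored on OpenAI's side; the function name `two_turns_with_responses_api` is made up for this example.

```python
def two_turns_with_responses_api(client, model="gpt-4o"):
    """Ask a question, then a follow-up that relies on server-side state.

    `client` is assumed to be an openai.OpenAI() instance; this function
    name is hypothetical and exists only for illustration.
    """
    first = client.responses.create(
        model=model,
        input="My favorite language is Python.",
    )
    # Instead of resending the history, point at the previous response.
    second = client.responses.create(
        model=model,
        input="What's a quick tip for that language?",
        previous_response_id=first.id,
    )
    return second.output_text
```

With Chat Completions, the equivalent follow-up has to include every earlier message in the request body.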

Your first call, step by step

Four steps from nothing to a working request.

1. Get an API key. Sign up at platform.openai.com, open API keys, and click Create new secret key. The key starts with sk-. Treat it like a password: anyone who has it can bill your account.
2. Store it safely. Never paste your key into source code or commit it to git. Set it as an environment variable: export OPENAI_API_KEY=sk-... on macOS/Linux, or add it to a .env file with python-dotenv. The official SDK reads OPENAI_API_KEY automatically.
3. Install the SDK. OpenAI publishes official libraries. For Python: pip install openai.
4. Send the request and print the reply.

bash

pip install openai
export OPENAI_API_KEY=sk-...   # replace with your real key

python

from openai import OpenAI

# Reads OPENAI_API_KEY from the environment automatically.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "Explain what a neural network is in two sentences."},
    ],
)

# The reply text lives at choices[0].message.content.
print(response.choices[0].message.content)
print("Tokens used:", response.usage.prompt_tokens, "in /", response.usage.completion_tokens, "out")

Run it and you'll get a two-sentence answer plus token counts. That's a complete OpenAI integration. Everything else — streaming, tool calls, structured output — is variations on this.

Multi-turn conversation

To continue a conversation, append the model's reply to your messages list and send the whole thing again. Remember: the API is stateless — you own the history.

python

from openai import OpenAI

client = OpenAI()

# Build up the conversation list manually.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user",   "content": "My favorite language is Python."},
]

first = client.chat.completions.create(model="gpt-4o", messages=messages)
assistant_reply = first.choices[0].message.content

# Append GPT's reply, then ask a follow-up.
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user",      "content": "What's a quick tip for that language?"})

second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)  # GPT knows the context from turn 1.
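The append-and-resend pattern above can be folded into a small helper so the bookkeeping lives in one place. This is only a sketch: `chat_turn` is a hypothetical name, and `client` is assumed to be an `openai.OpenAI()` instance.

```python
def chat_turn(client, messages, user_text, model="gpt-4o"):
    """Send one user turn and record both sides in `messages` (mutated in place).

    chat_turn is a hypothetical helper; `client` is assumed to be openai.OpenAI().
    """
    messages.append({"role": "user", "content": user_text})
    # The full history goes out on every call because the API is stateless.
    response = client.chat.completions.create(model=model, messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply
```

A typical use is to start with `history = [{"role": "system", "content": "You are a concise assistant."}]` and call `chat_turn(client, history, "...")` once per user message. Note that the list grows by two entries per turn, so input tokens grow with every call.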

Model lineup and token billing

"GPT" isn't one model — it's a family. You choose a model per request via the model field. OpenAI's current lineup ranges from small, ultra-cheap models for high-volume simple tasks up to large frontier models for hard reasoning. As of mid-2026, the headline models include the GPT-4.1 series (strong coding and instruction-following, up to 1M token context) and the GPT-5 series (top reasoning capability).

OpenAI model tiers: speed vs. capability trade-off

GPT-5 / frontier: best reasoning, highest cost; use for genuinely hard tasks
GPT-4.1 / GPT-4o: strong all-round (coding, analysis, chat), mid price
GPT-4.1 mini / GPT-4o mini: balanced budget option for most everyday tasks
GPT-4.1 nano / micro: fastest, cheapest; classification, routing, simple extraction

A common beginner mistake is reaching for the biggest model for everything. Resist it. A nano or mini model handles classification or simple reformatting just as well for a fraction of the price. Start small and step up only when quality falls short.

How you're billed

OpenAI charges per token, split into two rates: a lower price for input tokens (everything you send — system prompt, user messages, and conversation history) and a higher price for output tokens (what the model generates). Output typically costs 4–8× more than input. The exact rates depend on the model — check platform.openai.com/pricing for current numbers, as they change with new model releases.

A token is roughly ¾ of an English word. Short, common words are usually a single token, while a longer word such as "unbelievable" might be split into several. The usage field in every response tells you exactly how many input and output tokens were consumed for that call.
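The billing arithmetic is simple enough to sketch. The function below turns the two token counts from a response's usage field into a dollar estimate. The per-million-token rates are made-up placeholders, not real OpenAI prices, and `estimate_cost_usd` is a hypothetical helper name.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimate the cost of one call from its token counts.

    Prices are expressed in USD per one million tokens.
    """
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical rates for illustration only; look up the real ones on the pricing page.
EXAMPLE_INPUT_PRICE = 1.00   # USD per 1M input tokens (made up)
EXAMPLE_OUTPUT_PRICE = 4.00  # USD per 1M output tokens (made up)

cost = estimate_cost_usd(1_200, 300, EXAMPLE_INPUT_PRICE, EXAMPLE_OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.6f}")
```

With a real response you would pass `response.usage.prompt_tokens` and `response.usage.completion_tokens` as the first two arguments.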

Prompt caching

OpenAI automatically caches the beginning of your prompt when the same prefix appears in multiple requests. Cached input tokens are discounted (often around 50% off). In practice, put your large, unchanging system prompt first in the messages array so that it qualifies for caching. The full picture of input, output, and cached pricing is in "LLM API pricing explained".
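One way to keep the prefix stable is to build every request from the same constant system prompt and put anything that varies at the end. The sketch below shows that ordering. `STATIC_SYSTEM_PROMPT`, its wording, and `build_messages` are illustrative assumptions, and whether a given request is actually served from cache is decided by OpenAI's eligibility rules, not by this code.

```python
# A constant prompt shared by every request (the wording is a made-up example).
STATIC_SYSTEM_PROMPT = (
    "You are a support assistant. Answer briefly and cite the relevant "
    "policy section when one applies."
)


def build_messages(user_text, history=()):
    """Put the unchanging system prompt first and the changing input last."""
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # identical every time
        *history,                                             # earlier turns, if any
        {"role": "user", "content": user_text},               # the part that changes
    ]


first = build_messages("Where is my order?")
second = build_messages("How do I reset my password?")

# Both requests begin with exactly the same message, so they share a prefix.
print(first[0] == second[0])
```

Anything that changes per request, such as a timestamp or a user name, would break the shared prefix if it were placed inside the system prompt.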

Going deeper

Once your first call works, the same API unlocks everything serious AI apps need. These are all extra parameters on the request you already know — not separate APIs.

Streaming

By default you wait for the entire reply and then receive it all at once, which can mean staring at a blank screen for several seconds on a long answer. Streaming sends tokens as they're generated, producing the typewriter effect you see in ChatGPT. Set stream=True in your request. The SDK provides a streaming helper that handles the server-sent-events protocol for you. Details are in "Streaming explained".
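A minimal sketch of consuming a stream with the Python SDK might look like the following. It assumes `client` is an `openai.OpenAI()` instance; `stream_reply` is a hypothetical helper name. Each chunk carries a small piece of text in `choices[0].delta.content`, which can be empty on some chunks.

```python
def stream_reply(client, messages, model="gpt-4o"):
    """Print a reply as it arrives and return the full text.

    stream_reply is a hypothetical helper; `client` is assumed to be openai.OpenAI().
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None, e.g. on the first or last chunk
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

Collecting the pieces in a list and joining them at the end gives you the complete reply for storing in the conversation history.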

Function calling / tool use

GPT can't check live data on its own. Function calling lets you describe tools the model may invoke (e.g., get_weather(city: str)). When GPT decides a tool is needed, it returns a structured JSON request instead of prose; your code executes the function and feeds the result back. This is the foundation of agentic apps. See "What is function calling".
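The round trip can be sketched as follows, under a few assumptions: `client` is an `openai.OpenAI()` instance, the `get_weather` implementation returns made-up data, and `answer_with_tools`, `WEATHER_TOOL`, and `TOOL_REGISTRY` are names invented for this example. The model's tool request arrives in `message.tool_calls`, with the arguments encoded as a JSON string.

```python
import json

# Tool description sent to the model (JSON Schema for the arguments).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city):
    """Stand-in implementation; a real app would query a weather service."""
    return {"city": city, "forecast": "sunny"}  # made-up data


TOOL_REGISTRY = {"get_weather": get_weather}


def answer_with_tools(client, messages, model="gpt-4o"):
    """Let the model call tools once, then return its final text answer."""
    response = client.chat.completions.create(
        model=model, messages=messages, tools=[WEATHER_TOOL],
    )
    message = response.choices[0].message
    if not message.tool_calls:
        return message.content  # the model answered directly

    messages.append(message)  # keep the assistant's tool request in the history
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)  # arguments are a JSON string
        result = TOOL_REGISTRY[call.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # ties the result to the request
            "content": json.dumps(result),
        })

    final = client.chat.completions.create(
        model=model, messages=messages, tools=[WEATHER_TOOL],
    )
    return final.choices[0].message.content
```

The sketch handles a single round of tool calls; an agent would loop until the model stops requesting tools, and should validate the arguments before executing anything.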

Structured outputs

Pass a JSON Schema in the response_format field (type json_schema, with strict mode enabled) and, on models that support Structured Outputs, OpenAI constrains the reply to conform to it, so you no longer need to regex-parse freeform text when a downstream program needs structured data. For extraction tasks ("parse this invoice into JSON"), this removes most malformed-output failures.
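Here is a small sketch of what that request might look like for the invoice example. The schema fields (`vendor`, `total`, `currency`) and the helper name `extract_invoice` are invented for illustration, and `client` is assumed to be an `openai.OpenAI()` instance. In strict mode every property is listed under `required` and `additionalProperties` is set to false.

```python
import json

# Hypothetical schema for the invoice example; the field names are made up.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total", "currency"],
    "additionalProperties": False,
}


def extract_invoice(client, invoice_text, model="gpt-4o-mini"):
    """Ask the model for invoice fields and return them as a Python dict."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Extract the invoice fields."},
            {"role": "user", "content": invoice_text},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "invoice",
                "strict": True,
                "schema": INVOICE_SCHEMA,
            },
        },
    )
    # The reply is a JSON string that should match the schema.
    return json.loads(response.choices[0].message.content)
```

It is still worth checking `finish_reason` before parsing, because a reply cut off by the token limit can be incomplete JSON.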

Error handling and rate limits

Real apps hit bumps. A 429 means you're sending too fast (rate-limited); a 500 or 503 means OpenAI's servers are struggling. The official SDK retries transient errors automatically with exponential backoff. Wrap calls in try/except for the rest:

python

import openai
from openai import OpenAI

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    # finish_reason is "stop" on a normal completion;
    # "length" means it hit max_tokens and was cut off.
    if response.choices[0].finish_reason == "length":
        print("Reply truncated — raise max_tokens.")
    print(response.choices[0].message.content)
except openai.RateLimitError:
    print("Rate-limited. The SDK already retried — back off further.")
except openai.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")

HTTP code	Means	What to do
401	Invalid API key	Check OPENAI_API_KEY is set and the key is active
404	Unknown model ID	Fix the model string — typos and old names both return 404
429	Rate limited	Slow down; SDK retries with backoff already
500/503	OpenAI server error	Retry later; the SDK handles this automatically
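When the SDK's built-in retries are not enough, "back off further" can be sketched as a generic wrapper with exponential delays and a little random jitter. Nothing here is part of the OpenAI SDK: `call_with_backoff` and its parameters are hypothetical, and the delays are arbitrary starting values.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper: waits roughly 1s, 2s, 4s, ... (plus jitter) between
    attempts and re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With the SDK you might call it as `call_with_backoff(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,))`. The jitter keeps many clients from retrying at the same instant.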

FAQ

What is the OpenAI API in simple terms?

It's a way for your code to send text to OpenAI's GPT models and get an AI-generated reply back — without using ChatGPT's website. You make an HTTPS request to api.openai.com with your question, a model name, and a secret API key. OpenAI runs the model and returns the answer as JSON. It's how you embed GPT inside an app, script, or automated workflow.

How do I get an OpenAI API key?

Sign up at platform.openai.com, open the API keys section under your account, and click Create new secret key. The key starts with sk-. Store it in an environment variable named OPENAI_API_KEY — never paste it into source code or commit it to a repository. If a key leaks, revoke it immediately in the dashboard.

What is the difference between Chat Completions and the Responses API?

Chat Completions (the classic endpoint at /v1/chat/completions) is stateless — you manage the conversation history yourself. The Responses API, launched in 2025, can store conversation state on OpenAI's servers so you don't have to resend the full history every turn. It also has built-in tools like web search and code execution, and better prompt caching efficiency. OpenAI recommends new projects use the Responses API, but Chat Completions remains fully supported.

How much does the OpenAI API cost?

You're billed per token with separate rates for input (what you send) and output (what the model generates). Output typically costs 4–8× more than input. Rates vary by model — cheaper mini and nano models cost a fraction of the flagship models. In multi-turn conversations, input tokens grow each call because you resend the whole history, which is why prompt caching (automatic on eligible requests, often ~50% off) matters. Check platform.openai.com/pricing for current numbers.

Which OpenAI model should I use as a beginner?

Don't default to the biggest model. Start with gpt-4o-mini or gpt-4.1-mini for most tasks — they're fast, cheap, and handle the vast majority of classification, extraction, and chat jobs well. Step up to gpt-4o or gpt-4.1 when you need stronger reasoning or better instruction-following. Reserve the frontier models (GPT-5 family) for genuinely hard tasks where quality differences are measurable.

Does the OpenAI Chat Completions API remember previous messages?

No. Chat Completions is stateless — each call starts fresh. If you want GPT to recall an earlier turn, you must resend the entire message history (all prior user and assistant turns) with every new request. You maintain this list yourself in your code. The newer Responses API can optionally manage this state on OpenAI's side, which is one reason it's recommended for new projects.
