What Is the Gemini API? Getting Started with Google's Models

Get a Gemini key from AI Studio and make your first call, with the free tier, long-context limits, and pricing demystified.

Beginner · 13 min read · Updated 2026-06-12

In plain English

The Gemini API is Google's public interface for running Gemini language models from your own code. When you chat with Gemini on Google's website, a person is typing and reading. The API is the same underlying intelligence, but a program sends the message and a program reads the answer — no browser required, no human in the loop.


A useful analogy: imagine Gemini is a very knowledgeable consultant locked inside a server room. The API is the pneumatic-tube system you use to pass notes in and receive answers back. You write your question on a slip of paper (your request payload), feed it into the tube, and a reply comes back a few seconds later. Google handles everything inside the room — the hardware, the model weights, the inference — you just deal with the tube.

The entry point for individual developers is Google AI Studio (aistudio.google.com). It is a free web interface where you prototype prompts, generate an API key, and inspect model responses — all without touching Google Cloud or setting up a billing account. Once you have a key, you send standard HTTPS requests from any language, or you use one of Google's official SDKs for Python and JavaScript/TypeScript.

Why it matters

Google's Gemini models, especially the Flash family, are among the most competitive models available in terms of price-to-capability ratio. Google is also one of the few major providers offering a genuinely free tier for development: real requests, real model intelligence, no credit card, and limits generous enough to build and test a prototype end to end.

Beyond the free tier, Gemini Flash offers one of the largest context windows in the industry — meaning you can feed the model an entire codebase, a long document, or hours of transcript without hitting a truncation wall. For applications that work with long documents, multi-step conversations, or multimodal inputs (text, images, audio, video), Gemini is a strong practical choice.

What you can build with it

Chatbots and assistants — conversational interfaces powered by Gemini instead of hand-coded responses.
Document analysis — summarize, extract, or query long PDFs, legal contracts, or research papers that exceed most models' context windows.
Multimodal apps — send images alongside text ("describe what's wrong with this photo") using the same API endpoint.
Content pipelines — rewrite, translate, classify, or generate text in bulk at low cost with Flash models.
Agentic workflows — combine Gemini with function calling to let the model invoke your tools and complete multi-step tasks.

How it works

Every Gemini API call follows the same pattern: your code sends a POST request to https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent, authenticated with your API key either in the URL (?key=...) or in the x-goog-api-key header. The request body contains a contents array — the conversation so far. Google runs the model, then returns a JSON object with the candidate reply, token counts, and a finish reason.

// One Gemini API request, start to finish

Your code builds the request: model name, contents array, optional config.
HTTPS request: POST to generativelanguage.googleapis.com with the API key header.
Google runs Gemini: the model generates reply tokens on Google's infrastructure.
JSON response: candidates[0].content.parts[0].text, usageMetadata, finishReason.
Your code reads it: extract the text and use it in your app.
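The sketch below shows the same flow without any SDK, using only Python's standard library. It builds the POST request with the key in the x-goog-api-key header. It then pulls the text, usageMetadata, and finishReason out of a response body. build_request and extract_reply are hypothetical helper names. The model string reuses this article's example name. The network call runs only when GEMINI_API_KEY is set.

```python
import json
import os
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Header auth keeps the key out of URLs, which often end up in logs.
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


def extract_reply(response: dict) -> tuple:
    """Return (text, usage metadata, finish reason) from a response body."""
    candidate = response["candidates"][0]
    # A reply can be split across several parts; join every text part.
    text = "".join(p.get("text", "") for p in candidate["content"]["parts"])
    return text, response.get("usageMetadata", {}), candidate.get("finishReason", "")


# Only hit the network when a real key is configured.
if os.environ.get("GEMINI_API_KEY"):
    req = build_request("gemini-3.5-flash", "Say hello in five words.", os.environ["GEMINI_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        text, usage, reason = extract_reply(json.load(resp))
    print(text, usage, reason)
```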

The contents array

Gemini uses a contents array instead of a messages array (the OpenAI convention), but the concept is identical. Each element has a role (user or model) and a parts array. Parts can be text, inline images, file references, or function results — the same object handles them all.

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "What is a transformer model, in two sentences?" }]
    }
  ]
}
```

For multi-turn conversations you resend the full history each call — Gemini, like most LLM APIs, is stateless by default. A system_instruction field at the top level of the request body sets the persona or background context (equivalent to the system role in other APIs).
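Because the API is stateless, your code owns the history. The sketch below uses plain REST-style dictionaries. The Conversation class is a hypothetical helper, not part of any SDK. Each turn is appended to contents, and every request body resends the whole list together with system_instruction.

```python
import json


class Conversation:
    """Hypothetical helper: the client keeps the history because the API does not."""

    def __init__(self, system_instruction: str):
        self.system_instruction = system_instruction
        self.contents: list = []

    def add_user(self, text: str) -> None:
        self.contents.append({"role": "user", "parts": [{"text": text}]})

    def add_model(self, text: str) -> None:
        # Gemini labels the assistant's turns with role "model".
        self.contents.append({"role": "model", "parts": [{"text": text}]})

    def request_body(self) -> dict:
        """Body for the next call: the system instruction plus the entire history."""
        return {
            "system_instruction": {"parts": [{"text": self.system_instruction}]},
            "contents": list(self.contents),
        }


chat = Conversation("You are a concise tutor.")
chat.add_user("What is a token?")
chat.add_model("A token is a chunk of text, often part of a word.")
chat.add_user("How many tokens does a typical English word take?")
print(json.dumps(chat.request_body(), indent=2))
```

The Python SDK's client.chats.create(model=...) does this bookkeeping for you. The full conversation is still sent on every call.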

Model naming convention

Google follows a versioned naming scheme: gemini-{generation}.{minor}-{variant}. Flash models are fast and cheap; Pro models are larger and more capable. You pass the model name as a path segment in the URL or as the model parameter in the SDK. Always check the official models reference for the current list, because Google deprecates older versions and releases new ones regularly.

// Gemini model tiers: speed vs. capability

Pro models: highest capability, 2M-token context; complex reasoning and advanced coding.
Flash models: balanced speed and capability, 1M-token context; most production use cases.
Flash-Lite models: fastest and lowest cost; classification, routing, simple extraction.
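The naming scheme can be checked in code, which helps when you filter the live model list. This sketch uses a hypothetical regex that mirrors the gemini-{generation}.{minor}-{variant} pattern described above. It is an illustration, not an official grammar. The second function filters the JSON returned by GET https://generativelanguage.googleapis.com/v1beta/models and keeps only models whose supportedGenerationMethods include generateContent. The sample data is made up.

```python
import re
from typing import Optional

# Illustrative pattern for names such as "gemini-2.5-flash", "gemini-2.5-flash-lite",
# or "gemini-3-pro-preview"; anything after the variant is kept as a free-form suffix.
MODEL_NAME = re.compile(r"^(?:models/)?gemini-(\d+)(?:\.(\d+))?-([a-z]+(?:-lite)?)(?:-(.+))?$")


def parse_model_name(name: str) -> Optional[dict]:
    m = MODEL_NAME.match(name)
    if m is None:
        return None  # not a Gemini generation model name (e.g. an embedding model)
    generation, minor, variant, suffix = m.groups()
    return {
        "generation": int(generation),
        "minor": int(minor) if minor is not None else 0,  # treat "gemini-3" as 3.0
        "variant": variant,
        "suffix": suffix,  # e.g. "preview" or a version number, if present
    }


def generate_capable(list_response: dict) -> list:
    """Names from a models.list response that support generateContent."""
    return [
        m["name"]
        for m in list_response.get("models", [])
        if "generateContent" in m.get("supportedGenerationMethods", [])
    ]


sample = {"models": [
    {"name": "models/gemini-2.5-flash", "supportedGenerationMethods": ["generateContent", "countTokens"]},
    {"name": "models/text-embedding-004", "supportedGenerationMethods": ["embedContent"]},
]}
for name in generate_capable(sample):
    print(name, parse_model_name(name))
```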

Your first call, step by step

Four steps from a blank slate to a working Gemini integration.

Get an API key. Go to aistudio.google.com, sign in with a Google account, click Get API key in the left sidebar, then Create API key. Google creates a new Cloud project and generates the key in one step. It starts with AIza followed by a long alphanumeric string.
Store it safely. Never put the key directly in source code or commit it to a repository. Set it as an environment variable: export GEMINI_API_KEY=AIza... on macOS/Linux. The official SDK reads GEMINI_API_KEY automatically.
Install the SDK. Google publishes google-genai for Python and @google/genai for JavaScript/TypeScript.
Send the request and print the reply.

```bash
pip install -U google-genai
export GEMINI_API_KEY=AIza...   # replace with your real key
```

```python
from google import genai

# Reads GEMINI_API_KEY from the environment automatically.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain what an API is in three simple sentences.",
)

# The reply text is here.
print(response.text)
print("Input tokens:", response.usage_metadata.prompt_token_count)
print("Output tokens:", response.usage_metadata.candidates_token_count)
```

That's a complete Gemini integration. Run it and a short answer appears alongside token counts. Every more advanced feature — streaming, multi-turn chat, function calling — is a variation on this same structure.

JavaScript / TypeScript

```typescript
import { GoogleGenAI } from "@google/genai";

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module").
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "What is a context window?",
});

console.log(response.text);
```

Raw HTTP with curl

You can also call the API directly without any SDK — useful for quick tests or in environments where installing a package is inconvenient.

```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"contents":[{"parts":[{"text":"Explain how AI works in two sentences."}]}]}'
```

Free tier, rate limits, and pricing

The Gemini API free tier requires no credit card and no billing account. It covers Flash and Flash-Lite models. Pro models moved behind paid billing in early 2026. The free tier is well suited for development, prototyping, and low-volume applications. When you need higher throughput or Pro model access, you enable Google Cloud billing and pay per token.

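| Model tier | Free RPM (requests/min) | Free RPD (requests/day) | Free TPM (tokens/min) | Paid input (USD per 1M tokens) | Paid output (USD per 1M tokens) |
|---|---|---|---|---|---|
| Flash-Lite | 15 | 1,000 | 250,000 | $0.10 | $0.40 |
| Flash | 10 | 250 | 250,000 | $0.30 | $2.50 |
| Pro | n/a (paid only) | n/a (paid only) | n/a (paid only) | $4.00 | $18.00 |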

How token billing works

You are billed separately for input tokens and output tokens. Input tokens cover everything you send: prompt, system instruction, conversation history, and any attached images, audio, video, or files, which are converted to tokens. Output tokens are what the model generates. Output costs roughly 4–8x more than input depending on the model tier. In a long multi-turn conversation, input tokens grow with every call because you resend the full history. Watch response.usage_metadata.prompt_token_count as conversations grow to avoid surprise costs.
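A rough cost model makes the history-growth effect visible. This sketch copies the paid-tier prices from the table above as placeholders. The per-message token counts are assumptions you supply. conversation_cost simulates a chat in which every call resends the whole history, so total input spend grows roughly with the square of the number of turns.

```python
# Paid-tier USD prices per 1M tokens (input, output), copied from the table above.
# Prices change often; treat these as placeholders and check Google's pricing page.
PRICES = {
    "flash-lite": (0.10, 0.40),
    "flash": (0.30, 2.50),
    "pro": (4.00, 18.00),
}


def call_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request; input and output are billed at different rates."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def conversation_cost(tier: str, turns: int, user_tokens: int, reply_tokens: int,
                      system_tokens: int = 0) -> float:
    """Total cost of a chat in which every call resends the full history as input."""
    total = 0.0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens  # the new user message joins the history
        total += call_cost(tier, history, reply_tokens)
        history += reply_tokens  # the reply is resent as input on the next call
    return total


# A 20-turn chat on Flash with ~150-token questions and ~400-token answers.
print(f"${conversation_cost('flash', 20, 150, 400):.4f}")
```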

Context caching

Gemini supports context caching: you upload a large, static chunk of content once (a long document, a system prompt, a code file), and the model caches it server-side. Subsequent calls that reference the cache pay a significantly lower rate for those cached tokens. If your application repeatedly uses the same large document across many requests, context caching can cut costs substantially.
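The sketch below shows explicit caching over REST, assuming the cachedContents resource. You POST the static content once to https://generativelanguage.googleapis.com/v1beta/cachedContents and receive a name such as cachedContents/abc123. Later generateContent calls pass that name in cachedContent. The helper functions and the one-hour TTL are illustrative. Google's docs also describe a minimum cacheable size and a storage charge tied to how long the cache lives, so check both before counting on savings. The Python SDK exposes the same flow through client.caches.create.

```python
import json

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def cache_create_body(model: str, document_text: str, ttl_seconds: int = 3600) -> dict:
    """Body for POST {API_ROOT}/cachedContents: the large, static content, sent once."""
    return {
        "model": f"models/{model}",
        "contents": [{"role": "user", "parts": [{"text": document_text}]}],
        "ttl": f"{ttl_seconds}s",  # the cache expires after this duration
    }


def cached_generate_body(cache_name: str, question: str) -> dict:
    """Body for a later generateContent call that reuses the cache by name."""
    return {
        "cachedContent": cache_name,  # e.g. "cachedContents/abc123" from the create response
        "contents": [{"role": "user", "parts": [{"text": question}]}],
    }


print(f"POST {API_ROOT}/cachedContents")
print(json.dumps(cache_create_body("gemini-3.5-flash", "<full contract text>"), indent=2))
print(json.dumps(cached_generate_body("cachedContents/abc123", "Summarize clause 7."), indent=2))
```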

Long context and multimodal inputs

One of Gemini's defining strengths is its context window: the number of input tokens the model can take in a single request. Output is capped by a separate, much smaller limit. As of mid-2026, Flash models support up to 1 million tokens and Pro models up to 2 million tokens. For reference, a million tokens is roughly 750,000 English words, enough to hold several full-length novels or an entire medium-sized codebase.

This changes what is practical. Instead of chunking a 300-page contract and summarizing each chunk separately, you can send the entire document in one call and ask a precise question. Instead of summarizing a codebase incrementally, you can provide all the relevant files at once. Long-context access is available on the free tier. However, the free tier's tokens-per-minute limit (see the table above) can stop a single request from using the whole window, and very large requests use up your quota quickly.
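Before sending a huge document, it helps to estimate whether it will fit. This sketch applies the rule of thumb above (about 0.75 English words per token). It compares the estimate to a limit you pass in, such as the inputTokenLimit field the models endpoint reports. It is only a heuristic. For an exact figure, call the countTokens method (client.models.count_tokens in the Python SDK) before the real request.

```python
# Rule of thumb from the text: 1M tokens is about 750,000 English words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (English prose only)."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text: str, token_limit: int) -> bool:
    """Heuristic check; use the countTokens API for an exact count."""
    return estimate_tokens(text) <= token_limit


document = "word " * 600_000  # stand-in for a long contract or transcript
print(estimate_tokens(document), fits_in_context(document, 1_000_000))
```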

Multimodal requests

The same generateContent endpoint accepts images, audio clips, and video alongside text, all mixed together in the parts array of a single request. You can pass images inline as Base64-encoded data; the Python SDK does the encoding for you when you pass raw bytes. You can also pass URIs pointing to files already uploaded to the Gemini File API. For example, sending a screenshot and asking "what's wrong with this UI?" uses exactly the same code structure as a plain text request, just with an extra part containing the image data.

```python
from google import genai
from google.genai import types

client = genai.Client()

with open("screenshot.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=[
        types.Content(
            role="user",
            parts=[
                types.Part(text="What is shown in this image and what could be improved?"),
                # Pass raw bytes: the SDK handles Base64 encoding for the JSON request,
                # so there is no need to call base64.b64encode yourself.
                types.Part(
                    inline_data=types.Blob(
                        mime_type="image/png",
                        data=image_bytes,
                    )
                ),
            ],
        )
    ],
)

print(response.text)
```
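When you call the REST API directly, you do the Base64 encoding yourself. This sketch builds the two kinds of image part described above: inline_data carrying Base64 bytes, and file_data carrying a file_uri returned by a File API upload. The 20 MB cutoff in needs_file_api is an assumed placeholder, not a documented guarantee, so check Google's current request-size limits. In the Python SDK, client.files.upload(file=...) handles the upload side.

```python
import base64
import mimetypes
from pathlib import Path

# Assumed cutoff for illustration only; check Google's docs for the real request-size limit.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def needs_file_api(path: str) -> bool:
    """Large files are better uploaded once through the File API than sent inline."""
    return Path(path).stat().st_size > INLINE_LIMIT_BYTES


def inline_part(path: str) -> dict:
    """REST part carrying the file's bytes, Base64-encoded into the JSON body."""
    data = Path(path).read_bytes()
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return {"inline_data": {"mime_type": mime, "data": base64.b64encode(data).decode("ascii")}}


def file_part(file_uri: str, mime_type: str) -> dict:
    """REST part referencing a file already uploaded via the File API."""
    return {"file_data": {"mime_type": mime_type, "file_uri": file_uri}}


def image_question_body(prompt: str, image: dict) -> dict:
    """Text and image travel as sibling parts in a single user turn."""
    return {"contents": [{"role": "user", "parts": [{"text": prompt}, image]}]}
```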

Going deeper

Once the basics work, several additional capabilities unlock more powerful applications — all as parameters on the same generateContent call you already know.

Streaming responses

By default you wait for the entire reply and receive it all at once. For long answers, that means staring at a blank screen for several seconds. Streaming delivers tokens as they are generated, producing the typewriter effect you see in chat interfaces. In the Python SDK, use client.models.generate_content_stream(). Each chunk contains only the newly generated text, not the full reply so far, so you print or append chunks as they arrive.

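```python
from google import genai

client = genai.Client()

for chunk in client.models.generate_content_stream(
    model="gemini-3.5-flash",
    contents="Write a short story about a robot learning to cook.",
):
    # Each chunk carries only new text; some chunks may carry none.
    if chunk.text:
        print(chunk.text, end="", flush=True)
print()  # finish the line once the stream ends
```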
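Over raw HTTP, streaming uses the streamGenerateContent method. With ?alt=sse appended to the URL, the response arrives as server-sent events, one "data: {...}" line per chunk. The parser sketch below assumes that event format. The sample stream is invented and shortened.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from a streamGenerateContent?alt=sse response body."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        event = json.loads(line[len("data:"):])
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]  # each event carries only new text


# Invented, shortened sample of what the stream looks like on the wire.
sample_stream = [
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": "Once upon"}]}}]}',
    "",
    'data: {"candidates": [{"content": {"role": "model", "parts": [{"text": " a time"}]}, "finishReason": "STOP"}]}',
    "",
]

for piece in iter_sse_text(sample_stream):
    print(piece, end="", flush=True)
print()
```

With urllib, the response yields byte lines, so decode them first: iter_sse_text(line.decode("utf-8") for line in resp).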

Function calling (tool use)

Gemini can't browse the web or query a database on its own. Function calling lets you describe tools the model may invoke. When Gemini decides a tool is needed, it returns a structured functionCall part instead of prose; your code runs the function and sends the result back in the next turn. This is the foundation of agentic apps where the model orchestrates real actions.
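The round trip can be sketched with REST-shaped dictionaries. Everything named here is hypothetical: get_order_status stands in for your real tool, and the model reply in the usage example is hand-written rather than returned by the API. The functionDeclarations schema goes in the request's tools field. When a reply contains functionCall parts, the code runs the matching function and appends a functionResponse part, so the next call can finish the answer.

```python
import json


# Hypothetical local tool; replace with a real lookup in your app.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


TOOLS = {"get_order_status": get_order_status}

# Sent as the request's "tools" field so the model knows the tool exists.
TOOL_DECLARATIONS = [{
    "functionDeclarations": [{
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }]
}]


def request_body(contents: list) -> dict:
    return {"contents": contents, "tools": TOOL_DECLARATIONS}


def handle_function_calls(contents: list, model_content: dict) -> bool:
    """Run any requested tools and append results; True means call the model again."""
    calls = [p["functionCall"] for p in model_content.get("parts", []) if "functionCall" in p]
    if not calls:
        return False  # plain text reply: the answer is final
    contents.append(model_content)  # the model's functionCall turn stays in the history
    results = []
    for call in calls:
        output = TOOLS[call["name"]](**call.get("args", {}))
        results.append({"functionResponse": {"name": call["name"], "response": output}})
    # Google's examples return tool results in a user-role turn.
    contents.append({"role": "user", "parts": results})
    return True


# Usage with a hand-written model reply standing in for a real API response:
history = [{"role": "user", "parts": [{"text": "Where is order 42?"}]}]
fake_reply = {"role": "model", "parts": [
    {"functionCall": {"name": "get_order_status", "args": {"order_id": "42"}}}
]}
if handle_function_calls(history, fake_reply):
    print(json.dumps(request_body(history), indent=2))  # body for the follow-up call
```

The Python SDK can also run this loop for you (automatic function calling) when you pass Python functions as tools in types.GenerateContentConfig. The manual version shows what travels over the wire.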

Gemini API vs. Vertex AI — when to switch

The Gemini Developer API (via AI Studio) is the right choice for individual developers, students, and startups building prototypes or early-stage products. The free tier, fast key generation, and minimal setup make it frictionless. Vertex AI is Google's enterprise platform. It runs the same Gemini models but adds SLAs, data residency controls, VPC networking, IAM, audit logging, and integration with the broader Google Cloud ecosystem. The recommended path is to start with the Developer API and migrate to Vertex AI when you need enterprise compliance or scale. The unified google-genai SDK supports both backends. To switch, create the client with vertexai=True plus a Cloud project and location, or set the GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, and GOOGLE_CLOUD_LOCATION environment variables.

// Gemini Developer API vs. Vertex AI

Developer API (AI Studio)

Free tier, no credit card required
Key generated in seconds
No Cloud project setup needed
Best for prototypes and small projects
Community-level support

Vertex AI (Google Cloud)

No free tier — enterprise billing
Full Cloud project + IAM setup
SLA, data residency, VPC support
Best for production at scale
Enterprise support + compliance

Error handling and common HTTP errors

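Real apps encounter failures. A 429 means you have exceeded a rate limit (requests per minute, requests per day, or tokens per minute). A 400 usually means a malformed request, so check the contents structure. An invalid API key is also typically reported as a 400 with an "API key not valid" message. A 404 usually means the model name does not exist or does not support generateContent. A 403 means the key was rejected or lacks permission for the request. The Python SDK raises typed exceptions you can catch: ClientError for 4xx responses and ServerError for 5xx, each carrying the HTTP status in a code attribute.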

```python
from google import genai
from google.genai import errors

client = genai.Client()

try:
    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents="Hello!",
    )
    print(response.text)
except errors.ClientError as e:  # 4xx: the request must change (or slow down)
    if e.code == 429:
        print("Rate limit hit; slow down or upgrade tier.")
    elif e.code in (401, 403):
        print("Key rejected; check GEMINI_API_KEY and its permissions.")
    else:
        print(f"Bad request ({e.code}): {e.message}")
except errors.ServerError as e:  # 5xx: transient problem on Google's side
    print(f"Server error ({e.code}); retry with backoff.")
```

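| HTTP code | Meaning | What to do |
|---|---|---|
| 400 | Malformed request or invalid contents (an invalid key can also appear here as "API key not valid") | Check the contents array structure and request fields; confirm the key value |
| 403 | Key rejected or lacks permission for this request | Confirm GEMINI_API_KEY holds the AI Studio key you intended; regenerate it in AI Studio if it was revoked |
| 404 | Model not found | Check the model string against the current models list |
| 429 | Rate limit exceeded (RPM, RPD, or TPM) | Add backoff/retry logic; consider upgrading to the paid tier |
| 500/503 | Google server error | Retry with exponential backoff; usually transient |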
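For 429 and 5xx errors, the advice is to retry with exponential backoff. This minimal, SDK-agnostic sketch retries only when the raised exception has a code attribute in a retryable set; google.genai.errors.APIError exposes the HTTP status this way. It doubles the maximum wait on each attempt and adds random jitter so many clients do not retry in lockstep. The attempt count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses worth retrying: rate limits and transient server errors.
RETRYABLE_STATUS = {429, 500, 503}


def with_backoff(
    call: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable API errors with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            code = getattr(exc, "code", None)
            # 400/403/404 will not fix themselves, and the last failure must surface.
            if code not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise
            # Upper bound doubles each attempt (1s, 2s, 4s, ...), capped at max_delay;
            # "full jitter" picks a random wait below that bound.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the SDK, wrap the call in a lambda: response = with_backoff(lambda: client.models.generate_content(model="gemini-3.5-flash", contents="Hello!")).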

FAQ

What is the Gemini API in simple terms?

It is a way for your code to send text (and optionally images, audio, or video) to Google's Gemini language models and receive an AI-generated reply. You make an HTTPS request to Google's servers with your question, a model name, and a secret API key. Google runs the model and returns the answer as JSON. It is how you embed Gemini inside an app, script, or automated workflow instead of using the Google Gemini website.

How do I get a Gemini API key for free?

Go to aistudio.google.com and sign in with a Google account. Click Get API key in the left sidebar, then Create API key. Google generates a key instantly. No credit card is required, and you don't have to set up a Google Cloud project yourself, because AI Studio creates one for you. Store the key in an environment variable (GEMINI_API_KEY) and never paste it into source code.

What models are available on the Gemini API free tier?

As of mid-2026, the free tier covers Flash and Flash-Lite models only. Pro models require a paid billing account. The free tier has real rate limits (requests per minute and per day) that are generous enough for development and light production use, but you will need to enable billing to scale up or access Pro-tier capability.

How large is the Gemini context window?

Flash models support up to 1 million tokens of context and Pro models support up to 2 million tokens. That is enough to hold an entire large codebase, a book-length document, or hours of transcript in a single call. Paid-tier requests can use the full context window. On the free tier, the tokens-per-minute limit can cap how large a single request is in practice.

What is the difference between the Gemini API and Vertex AI?

Both run Gemini models, but they target different audiences. The Gemini Developer API (accessed through Google AI Studio) is for individual developers and startups: free tier, instant key setup, no Cloud infrastructure. Vertex AI is Google's enterprise platform and adds SLAs, data residency, VPC networking, IAM, and audit logging. Start with the Developer API and migrate to Vertex AI only when you need enterprise compliance or Google Cloud integration. The google-genai SDK supports both. Switching is mostly a matter of how you construct the client, for example passing vertexai=True with a project and location.

Does the Gemini API support images and multimodal inputs?

Yes. The same generateContent endpoint accepts text, images, audio, and video mixed together in the parts array of a single request. You can pass images inline as Base64-encoded bytes or as URIs pointing to files uploaded via the Gemini File API. Multimodal support is available on Flash models and is included in the free tier.
