AI/TLDR

Using LLMs Through AWS Bedrock and Vertex AI

Learn why enterprises call Claude through Bedrock or Gemini through Vertex, and what changes versus hitting the provider directly.

INTERMEDIATE · 11 MIN READ · UPDATED 2026-06-12

In plain English

When you want to call Claude, the most direct path is Anthropic's own API. When you want Gemini, Google's own Gemini API is right there. So why do many enterprise teams end up calling these same models through a different door: Amazon Bedrock or Google Vertex AI?

Think of it like buying coffee beans. You could buy direct from the farm, but a corporate office usually orders through a supplier that already has a contract with procurement, gets invoiced on the same account as everything else, ships to a bonded warehouse in the approved region, and has a support line that knows who you are. The coffee is the same. The supply chain is what changed.

Amazon Bedrock is AWS's managed API gateway for foundation models. It lets you call Claude (from Anthropic), Llama (Meta), Mistral, and others through a single AWS endpoint — authenticated with IAM, billed on your AWS invoice, logged in CloudWatch, and contained inside your AWS security boundary. Google Vertex AI does the same job for Google Cloud: it exposes Gemini (and other models) through the Google Cloud platform, authenticated with service accounts, billed to your GCP project, and governed by the same IAM policies as your other cloud resources.

Why a builder cares

If you are a solo developer building a side project, the direct provider API is almost always simpler: one API key, one invoice, immediate access to every new feature the day it ships. The cloud-platform routes exist to solve problems that don't appear until you are operating inside a larger organisation.

Problems the direct API doesn't solve

  • Billing consolidation. Finance teams want one AWS or GCP invoice, not a separate Anthropic subscription alongside ten other SaaS tools.
  • IAM-native auth. Enterprise security policies often prohibit long-lived API keys. Bedrock and Vertex let you authenticate with short-lived IAM credentials, service accounts, and federated identity — the same patterns used for every other cloud service.
  • Data residency. Regulated industries (healthcare, finance, government) need to prove that prompts and completions never leave a specific geographic region. Both platforms expose regional endpoints and publish compliance documentation for HIPAA, SOC 2, ISO 27001, and similar frameworks.
  • VPC containment. Traffic to a provider API normally crosses the public internet. Bedrock can be reached over AWS PrivateLink, and a Vertex AI project can be wrapped in a VPC Service Controls perimeter, so model traffic stays inside your cloud perimeter.
  • Audit logging. CloudTrail (AWS) and Cloud Audit Logs (GCP) automatically record every API call with the caller identity, timestamp, and request metadata — essential for compliance reviews.
  • Multi-model routing. Bedrock hosts Claude, Llama, Mistral, and more under one endpoint. If you want to fall back from one model to another, or A/B test two providers, you do it without managing multiple vendor accounts.

How it works

Both platforms act as a proxy layer that sits between your application code and the model's inference infrastructure. Your request travels through your cloud's network fabric, is authenticated against cloud IAM, logged, optionally filtered by guardrails, and then forwarded to the model. The response takes the reverse path.
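The request path described above can be modelled as a small pipeline. This is a toy sketch, not either platform's implementation: `Request`, `Gateway`, the caller names, and the echo model are all hypothetical, chosen only to show the order of the steps.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    caller: str   # identity the cloud IAM layer resolved for this call
    prompt: str


@dataclass
class Gateway:
    """Toy stand-in for the managed layer in front of a model."""
    allowed_callers: set[str]
    model: Callable[[str], str]                  # hypothetical model function
    blocked_terms: tuple[str, ...] = ()
    audit_log: list[str] = field(default_factory=list)

    def handle(self, req: Request) -> str:
        # 1. Authenticate the caller against (simulated) IAM.
        if req.caller not in self.allowed_callers:
            self.audit_log.append(f"DENY {req.caller}")
            raise PermissionError(f"{req.caller} may not invoke the model")
        # 2. Record the call before doing any work.
        self.audit_log.append(f"ALLOW {req.caller}")
        # 3. Optional guardrail on the way in.
        if any(term in req.prompt.lower() for term in self.blocked_terms):
            return "[blocked by guardrail]"
        # 4. Forward to the model; the response takes the reverse path.
        return self.model(req.prompt)


gateway = Gateway(
    allowed_callers={"role/contract-summariser"},
    model=lambda prompt: f"echo: {prompt}",
    blocked_terms=("password",),
)
reply = gateway.handle(Request("role/contract-summariser", "Summarise this."))
```

The point of the sketch is that authentication and logging happen before the model sees anything, which is why a denied call still leaves an audit record.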

AWS Bedrock: how authentication works

Bedrock uses AWS Signature Version 4 (SigV4) signing instead of a static API key. The Anthropic Bedrock SDK handles signing automatically once you provide AWS credentials. Those credentials can come from environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), an EC2 instance role or ECS task role, an assumed IAM role, or AWS IAM Identity Center (formerly AWS SSO): the full standard AWS credential chain.
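You never write this yourself when using the SDK, but it helps to see why a SigV4 signature differs from a static key. The sketch below shows only the signing-key derivation step of SigV4 (the full algorithm also builds a canonical request and a string to sign). The secret is a placeholder, and the service name `bedrock` is an assumption for illustration.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key; `date` is YYYYMMDD in UTC."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder secret for illustration only; never hard-code a real one.
key = sigv4_signing_key("EXAMPLE-SECRET", "20260612", "us-east-1", "bedrock")
```

Because the date, region, and service are folded into the key, a leaked signature is only useful for that day, region, and service, unlike a long-lived API key.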

The new Bedrock integration (as of 2025) exposes Claude at a Messages API endpoint that uses the same request/response shape as Anthropic's direct API: https://bedrock-mantle.{region}.api.aws/anthropic/v1/messages. Model IDs carry an anthropic. prefix — for example, anthropic.claude-opus-4-8. Bedrock also retains a legacy InvokeModel / Converse integration with ARN-style model identifiers; new projects should use the Messages API path.
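As a small illustration of the two naming rules in the paragraph above, the helpers below fill in the endpoint template and add the `anthropic.` prefix. The function names are hypothetical; the URL pattern and prefix are taken from the text, not independently verified.

```python
def bedrock_messages_url(region: str) -> str:
    """Fill the region into the Messages API endpoint template quoted above."""
    return f"https://bedrock-mantle.{region}.api.aws/anthropic/v1/messages"


def to_bedrock_model_id(model_id: str) -> str:
    """Add the `anthropic.` prefix unless the ID already carries it."""
    prefix = "anthropic."
    return model_id if model_id.startswith(prefix) else prefix + model_id


url = bedrock_messages_url("us-east-1")
model = to_bedrock_model_id("claude-opus-4-8")
```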

Calling Claude through Bedrock (Python)

```python
from anthropic import AnthropicBedrockMantle

# Credentials come from the AWS credential chain; no hard-coded key
client = AnthropicBedrockMantle(aws_region="us-east-1")

message = client.messages.create(
    model="anthropic.claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarise this contract in three bullets."}],
)
print(message.content[0].text)
```

Vertex AI: how authentication works

Vertex AI uses Application Default Credentials (ADC) and Google Cloud service accounts instead of API keys. When your code runs inside GCP (Cloud Run, GKE, Compute Engine), ADC picks up credentials automatically from the instance's attached service account. Outside GCP, you run gcloud auth application-default login or set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point at a service-account JSON key file.
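To make the lookup order concrete, here is a sketch that mirrors how ADC searches for credentials. It does not call the google-auth library; `describe_adc_source` is a hypothetical helper, and the well-known file path shown is the Linux/macOS location.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def describe_adc_source(
    env: Mapping[str, str] = os.environ,
    file_exists: Callable[[Path], bool] = Path.exists,
    home: Optional[Path] = None,
) -> str:
    """Report which credential source ADC would be expected to use."""
    home = home or Path.home()
    # 1. An explicit key file wins.
    key_file = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_file:
        return f"key file from GOOGLE_APPLICATION_CREDENTIALS: {key_file}"
    # 2. Credentials written by `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if file_exists(well_known):
        return f"gcloud user credentials: {well_known}"
    # 3. Otherwise fall back to the service account attached to the workload.
    return "attached service account (metadata server), if running on GCP"
```

Calling `describe_adc_source()` with no arguments inspects the real environment; the parameters exist so the logic can be exercised without touching the filesystem.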

The Google Gen AI SDK supports both the Gemini Developer API and Vertex AI through a single unified interface — you switch between them with a flag rather than rewriting API calls. On the Vertex path, you supply a GCP project ID and a location (region), and the SDK routes requests to aiplatform.googleapis.com inside your project.

Calling Gemini through Vertex AI (Python)

```python
from google import genai

# ADC picks up credentials automatically inside GCP;
# outside GCP, ensure GOOGLE_APPLICATION_CREDENTIALS is set
client = genai.Client(
    vertexai=True,
    project="my-gcp-project",
    location="us-central1",
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarise this contract in three bullets.",
)
print(response.text)
```
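Because the SDK uses one client for both routes, the choice can live in configuration rather than code. The sketch below builds the keyword arguments for `genai.Client` from environment variables. The variable names (`USE_VERTEX`, `GCP_PROJECT`, `GCP_LOCATION`, `GEMINI_API_KEY`) are hypothetical names chosen for this example, not ones the SDK reads itself.

```python
import os
from typing import Any, Mapping


def genai_client_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Return kwargs for genai.Client for either the Vertex or developer route."""
    if env.get("USE_VERTEX", "").lower() in {"1", "true"}:
        return {
            "vertexai": True,
            "project": env["GCP_PROJECT"],
            "location": env.get("GCP_LOCATION", "us-central1"),
        }
    # Developer API route: a plain API key from Google AI Studio.
    return {"api_key": env["GEMINI_API_KEY"]}
```

With the SDK installed, usage would be `client = genai.Client(**genai_client_kwargs())`; the rest of the calling code stays the same on both routes.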

Bedrock vs direct Anthropic API: what actually changes

For the vast majority of API calls, the request and response JSON are identical. The differences are in the operational layer around those calls.

| Dimension | Direct Anthropic API | AWS Bedrock |
| --- | --- | --- |
| Authentication | Static API key (x-api-key header) | SigV4 signing with IAM credentials (short-lived) |
| Billing | Anthropic invoice / credit card | AWS invoice, supports reserved capacity and cost-allocation tags |
| Base pricing (on-demand) | Same per-token rate | Same per-token rate (no markup) |
| Batch inference discount | 50% off | 50% off |
| Regional endpoint premium | N/A (single global endpoint) | 10% premium for regional endpoints; global routing is free |
| New features / model access | Same-day on release | May lag by days to weeks for new capabilities |
| Data retention / residency | Governed by Anthropic | Governed by AWS, regional endpoints available |
| Audit logging | Not provided | CloudTrail + CloudWatch out of the box |
| Private networking | Not available | AWS PrivateLink supported |
| Guardrails / content filters | Anthropic safety layer only | Bedrock Guardrails (configurable enterprise-side filters) |
| Multi-model switching | Claude models only | Claude, Llama, Mistral, Titan, and others on one endpoint |
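The pricing rows reduce to simple arithmetic. The sketch below applies the 10% regional premium and the 50% batch discount from the table to a base rate. The base rate is a placeholder, and treating the two adjustments as multiplicative when both apply is an assumption, not something the table states.

```python
def effective_rate(base_rate: float, *, regional: bool = False, batch: bool = False) -> float:
    """Adjust a per-token (or per-million-token) rate for endpoint type and batch use."""
    rate = base_rate
    if regional:
        rate *= 1.10   # 10% premium for a regional endpoint
    if batch:
        rate *= 0.50   # 50% batch inference discount
    return rate


base = 10.0  # placeholder price in USD per million tokens, not a real quote
regional_rate = effective_rate(base, regional=True)
batch_rate = effective_rate(base, batch=True)
```

Under these assumptions, a regional batch job would cost a little over half the on-demand global rate, so the residency premium is small next to the batch saving.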

Vertex AI vs direct Gemini API: what actually changes

Google offers two separate routes to Gemini: the Gemini Developer API (via generativelanguage.googleapis.com, accessed with an API key from Google AI Studio) and the Vertex AI API (via aiplatform.googleapis.com, authenticated with IAM). The split mirrors the Bedrock situation: the developer API is optimised for simplicity, Vertex is optimised for enterprise governance.

| Dimension | Gemini Developer API | Vertex AI Gemini API |
| --- | --- | --- |
| Authentication | API key (Google AI Studio) | Service account / ADC / IAM |
| Free tier | Yes (rate-limited) | No; all usage billed to GCP project |
| IAM access control | Not available | Fine-grained IAM roles per resource |
| VPC Service Controls | Not available | Supported (prevents data exfiltration) |
| Customer-managed encryption (CMEK) | Not available | Supported |
| Data residency controls | Limited | Regional endpoints in EU, US, Asia |
| Audit logging | Not available | Cloud Audit Logs + Cloud Monitoring |
| HIPAA / compliance BAA | Not available | Available through Google Cloud agreement |
| Batch inference jobs | Not available | Supported |
| SLA | Best-effort | Google Cloud SLA applies |

The Gemini Developer API has a generous free tier and is the fastest way to prototype. It uses a simple API key, making it easy to get started in minutes. Vertex AI has no free tier but unlocks every enterprise control: HIPAA eligibility, VPC isolation, CMEK, and the full Google Cloud compliance posture.

Choosing the right route for your project

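The decision is rarely about model quality; it is about your operational context. As a practical heuristic: if none of the enterprise requirements listed earlier apply (consolidated billing, IAM-native auth, data residency, private networking, audit logging), use the direct provider API. If any of them do, use Bedrock when your organisation runs on AWS and Vertex AI when it runs on GCP.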
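One way to write such a heuristic down is as a small function. This is a sketch built only from the criteria this article lists under "Problems the direct API doesn't solve"; `Needs` and `choose_route` are hypothetical names, and a real decision would weigh more than booleans.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Needs:
    cloud: Optional[str] = None          # "aws", "gcp", or None
    consolidated_billing: bool = False
    iam_auth: bool = False
    data_residency: bool = False
    private_networking: bool = False
    audit_logging: bool = False


def choose_route(needs: Needs) -> str:
    """Suggest a route based on which enterprise requirements apply."""
    enterprise = any((
        needs.consolidated_billing,
        needs.iam_auth,
        needs.data_residency,
        needs.private_networking,
        needs.audit_logging,
    ))
    if not enterprise:
        return "direct provider API"
    if needs.cloud == "aws":
        return "Amazon Bedrock"
    if needs.cloud == "gcp":
        return "Vertex AI"
    return "no clear route: choose a cloud platform first"
```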

The lock-in consideration

Routing through a cloud platform does introduce mild lock-in to that cloud's tooling — IAM policies, billing constructs, and observability dashboards. The model itself is portable; the operational scaffolding around it is not. If multi-cloud portability matters, keep your model-calling code behind an abstraction layer that lets you swap the underlying client without touching business logic.
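A minimal sketch of such an abstraction layer, assuming a single `complete` method is enough for your business logic. `ChatModel`, the two fake clients, and `summarise` are hypothetical; in a real project each adapter would wrap the corresponding SDK client.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only interface business logic is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class FakeDirectClient:
    # Stand-in for an adapter around the direct provider SDK.
    def complete(self, prompt: str) -> str:
        return f"[direct] {prompt}"


class FakeBedrockClient:
    # Stand-in for an adapter around the Bedrock client.
    def complete(self, prompt: str) -> str:
        return f"[bedrock] {prompt}"


def summarise(model: ChatModel, text: str) -> str:
    # Business logic: knows nothing about auth, regions, or vendors.
    return model.complete(f"Summarise in three bullets: {text}")


direct_result = summarise(FakeDirectClient(), "contract")
bedrock_result = summarise(FakeBedrockClient(), "contract")
```

Swapping routes then means changing which adapter is constructed at startup, while `summarise` and everything above it stays untouched.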

Bedrock's multi-model menu actually reduces model-level lock-in: you can switch from Claude to Llama or Mistral without changing your auth setup, billing config, or logging pipeline. This is one of its genuine advantages over going direct to each individual provider.

Going deeper

Bedrock Provisioned Throughput

For workloads that consistently push high token volumes, Bedrock offers Provisioned Throughput: you commit to a reserved capacity unit (measured in model units, roughly 1 million tokens per minute per unit) for a fixed hourly rate. At high enough volume this can be cheaper than on-demand pricing. The trade-off is that you pay the hourly rate even when you are not using it, so it only makes sense if your utilisation is predictably high. Bedrock also charges for HTTP 500 errors under provisioned throughput, unlike the direct Anthropic API, which has a 3% error-rate forgiveness buffer.
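The "predictably high utilisation" condition can be expressed as a break-even calculation. The prices below are invented placeholders, not AWS quotes; the capacity figure uses the rough 1 million tokens per minute (60 million per hour) quoted above.

```python
def break_even_utilisation(
    hourly_rate_usd: float,
    on_demand_usd_per_mtok: float,
    capacity_mtok_per_hour: float,
) -> float:
    """Fraction of reserved capacity you must use before reserving beats on-demand."""
    # Tokens (in millions) per hour at which on-demand spend equals the hourly fee.
    break_even_mtok = hourly_rate_usd / on_demand_usd_per_mtok
    return break_even_mtok / capacity_mtok_per_hour


# Hypothetical numbers: $40/hour reserved, $10 per million tokens on demand.
utilisation = break_even_utilisation(40.0, 10.0, 60.0)
```

If your sustained utilisation sits below the returned fraction, on-demand is cheaper under these assumptions; above it, the reserved unit starts to pay off.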

Bedrock Guardrails

AWS Bedrock Guardrails is an operator-side content-filtering layer that sits in front of the model. You configure policies (blocked topics, PII redaction, profanity filters, grounding checks) in the AWS console, then pass a guardrail ID with each request. Guardrails run before the prompt reaches the model and after the response comes back, giving you a defence layer that is independent of the model provider's own safety systems. This is particularly useful in regulated industries where you need auditable proof that certain content was never processed.
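The before-and-after shape of a guardrail can be shown with a toy filter. This is not the Bedrock Guardrails API: the blocked-topic list, the email regex, and `guarded_call` are hypothetical, and real policies are configured in AWS rather than in application code.

```python
import re
from typing import Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")   # deliberately simple PII pattern
BLOCKED_TOPICS = ("insider trading",)


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Apply input policies, call the model, then apply output policies."""
    # Input side: refuse blocked topics before the model sees the prompt.
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "[request blocked by policy]"
    # Input side: redact PII so it never reaches the model.
    safe_prompt = EMAIL.sub("[EMAIL]", prompt)
    response = model(safe_prompt)
    # Output side: redact anything the model produced that matches the policy.
    return EMAIL.sub("[EMAIL]", response)


echoed = guarded_call("Email bob@example.com the summary", lambda p: p)
```

Because the redaction runs on both sides, the filter holds even if the model itself would have repeated or invented an address.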

Vertex AI CMEK and VPC Service Controls

Customer-managed encryption keys (CMEK) let you supply your own Cloud KMS key to encrypt data at rest in Vertex — prompts, cached content, fine-tune checkpoints. If your key is revoked, Google can no longer decrypt your data, giving you a cryptographic kill-switch. VPC Service Controls wrap your Vertex project in a service perimeter: API calls must originate from inside the perimeter, preventing data exfiltration even if a credential is compromised.

Feature-availability lag

Both cloud platforms typically lag the direct provider APIs by days to weeks when new features ship. Anthropic may announce prompt caching or a new tool type; it lands in the Anthropic API immediately and in Bedrock later once AWS validates and deploys the update. If staying on the bleeding edge of capabilities matters for your use case — for example, you are building a product demo around a just-launched feature — the direct API is the lower-friction choice.

Cross-region inference profiles on Bedrock

Bedrock's inference profiles (US, EU, JP, AU) let you route across multiple AWS regions within a geography for higher throughput and resilience, without pinning to a single region. This is distinct from the global endpoint, which can route anywhere in the world, and from single-region endpoints, which satisfy hard data-residency requirements. Inference profiles sit in between: geographic containment with multi-region redundancy.
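The three scopes map onto three levels of residency requirement. The sketch below encodes that mapping; the requirement labels (`"region"`, `"geography"`, `"none"`) and the function name are hypothetical.

```python
def pick_endpoint_scope(residency: str) -> str:
    """Map a data-residency requirement to the narrowest Bedrock routing scope that fits."""
    scopes = {
        "region": "single-region endpoint",             # hard residency: one region only
        "geography": "cross-region inference profile",  # stay within US, EU, JP, or AU
        "none": "global endpoint",                      # may route anywhere
    }
    try:
        return scopes[residency]
    except KeyError:
        raise ValueError(f"unknown residency requirement: {residency!r}") from None


scope = pick_endpoint_scope("geography")
```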

FAQ

Is Claude on Bedrock the same model as Claude on the Anthropic API?

Yes. The model weights are identical — anthropic.claude-opus-4-8 on Bedrock and claude-opus-4-8 on the Anthropic API run the same model. What differs is the infrastructure layer: authentication method, billing, logging, and which optional features (like server-side tools) are available on each route.

Is using Bedrock or Vertex AI more expensive than the direct API?

On-demand per-token pricing is identical across routes — there is no markup for using Bedrock or Vertex. Costs can diverge at high volume: Bedrock's Provisioned Throughput can be cheaper for sustained heavy use but adds data-transfer fees. Vertex AI's batch inference offers significant discounts for non-real-time workloads. Bedrock regional endpoints carry a 10% premium over the global endpoint.

Do I need an AWS account to use Claude through Bedrock?

Yes. You need an AWS account with Amazon Bedrock model access enabled for the specific Claude models you want to use. Claude model access is granted through the Bedrock console under Model access. There is no way to use Bedrock without an AWS account.

Which Gemini features are only available on Vertex AI and not the Gemini Developer API?

Vertex AI adds IAM-based access control, VPC Service Controls, customer-managed encryption keys (CMEK), data residency enforcement via regional endpoints, Cloud Audit Logs integration, HIPAA-eligible terms, and batch inference jobs. The developer API has none of these — it uses a simple API key and has no compliance SLA.

Can I switch from the direct Anthropic API to Bedrock without rewriting my code?

Mostly yes. The Bedrock Messages API uses the same request/response shape as the Anthropic API. You swap the client class (AnthropicBedrockMantle instead of Anthropic), add an AWS region, prefix the model ID with anthropic., and update your auth to use AWS credentials. Features not supported on Bedrock (Files API, server-side tools, Message Batches endpoint) will break and need an alternative approach.
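Before migrating, it can help to list which of those unsupported features your code depends on. The sketch below is a simple set intersection; the feature labels are hypothetical identifiers for the three features named in the answer above.

```python
# Features the answer above lists as unsupported on Bedrock (labels are illustrative).
UNSUPPORTED_ON_BEDROCK = {"files_api", "server_side_tools", "message_batches"}


def migration_blockers(features_in_use: set[str]) -> list[str]:
    """Return the features in use that would break after switching to Bedrock."""
    return sorted(features_in_use & UNSUPPORTED_ON_BEDROCK)


blockers = migration_blockers({"messages", "files_api", "message_batches"})
```

An empty list suggests the swap is limited to the client class, region, model ID prefix, and credentials.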

What is the difference between the Bedrock global endpoint and a regional endpoint?

The global endpoint dynamically routes your request to the best-available AWS region for capacity and latency, at no extra charge. A regional endpoint pins your traffic to a single specified region, which satisfies data-residency requirements (your prompts never leave that region), but costs 10% more and offers a lower throughput ceiling than the globally routed option.

Further reading