In plain English
When you want to call Claude, the most direct path is Anthropic's own API. When you want Gemini, Google's own Gemini API is right there. So why do most enterprise teams end up calling these same models through a different door — Amazon Bedrock or Google Vertex AI?
Think of it like buying coffee beans. You could buy direct from the farm, but a corporate office usually orders through a supplier that already has a contract with procurement, gets invoiced on the same account as everything else, ships to a bonded warehouse in the approved region, and has a support line that knows who you are. The coffee is the same. The supply chain is what changed.
Amazon Bedrock is AWS's managed API gateway for foundation models. It lets you call Claude (from Anthropic), Llama (Meta), Mistral, and others through a single AWS endpoint — authenticated with IAM, billed on your AWS invoice, logged in CloudWatch, and contained inside your AWS security boundary. Google Vertex AI does the same job for Google Cloud: it exposes Gemini (and other models) through the Google Cloud platform, authenticated with service accounts, billed to your GCP project, and governed by the same IAM policies as your other cloud resources.
Why a builder cares
If you are a solo developer building a side project, the direct provider API is almost always simpler: one API key, one invoice, immediate access to every new feature the day it ships. The cloud-platform routes exist to solve problems that don't appear until you are operating inside a larger organisation.
Problems the direct API doesn't solve
- Billing consolidation. Finance teams want one AWS or GCP invoice, not a separate Anthropic subscription alongside ten other SaaS tools.
- IAM-native auth. Enterprise security policies often prohibit long-lived API keys. Bedrock and Vertex let you authenticate with short-lived IAM credentials, service accounts, and federated identity — the same patterns used for every other cloud service.
- Data residency. Regulated industries (healthcare, finance, government) need to prove that prompts and completions never leave a specific geographic region. Both platforms expose regional endpoints and publish compliance documentation for HIPAA, SOC 2, ISO 27001, and similar frameworks.
- VPC containment. Traffic to a provider API crosses the public internet. Bedrock and Vertex can route calls over private networking (AWS PrivateLink on AWS; Private Service Connect, usually paired with a VPC Service Controls perimeter, on Google Cloud) so model traffic never leaves your cloud perimeter.
- Audit logging. CloudTrail (AWS) and Cloud Audit Logs (GCP) record API calls with the caller identity, timestamp, and request metadata, which is essential for compliance reviews. On GCP, Data Access audit logs may need to be switched on explicitly before model calls are captured.
- Multi-model routing. Bedrock hosts Claude, Llama, Mistral, and more under one endpoint. If you want to fall back from one model to another, or A/B test two providers, you do it without managing multiple vendor accounts.
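The fallback idea in the last bullet can be sketched in a few lines. This is an illustration of the pattern only, not Bedrock's API: `call_with_fallback` and the two stand-in backends are hypothetical names, and in a real system each callable would wrap one SDK call for one model ID.

```python
from typing import Callable, Sequence

# A backend is any callable that takes a prompt and returns text.
Backend = Callable[[str], str]


def call_with_fallback(
    prompt: str, backends: Sequence[tuple[str, Backend]]
) -> tuple[str, str]:
    """Try each (name, backend) in order; return (name, reply) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # real code should catch the SDK's specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for two models reachable through the same endpoint.
def primary_model(prompt: str) -> str:
    raise TimeoutError("simulated throttling")


def secondary_model(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    "hello", [("primary", primary_model), ("secondary", secondary_model)]
)
print(used, reply)
```

Because both models sit behind one account, the list of backends is the only thing that changes when you add or reorder providers.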
How it works
Both platforms act as a proxy layer that sits between your application code and the model provider's inference cluster. Your request travels through your cloud's network fabric, gets authenticated against cloud IAM, is logged, optionally filtered by guardrails, and then forwarded to the model. The response takes the reverse path.
AWS Bedrock: how authentication works
Bedrock uses AWS Signature Version 4 (SigV4) signing instead of a static API key. The Anthropic Bedrock SDK handles signing automatically once you provide AWS credentials. Those credentials can come from environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), an EC2/ECS instance role, an assumed IAM role, or AWS SSO — the full standard AWS credential chain.
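To see why SigV4 credentials are harder to misuse than a static key, it helps to look at the signing-key derivation step. The sketch below uses only the standard library and follows the published SigV4 HMAC chain; the SDK does all of this for you. The secret is a dummy value, and the service name `"bedrock"` is an assumption made for illustration.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key scoped to one date, one region and one service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Dummy inputs; "bedrock" as the service name is an assumption.
key = derive_signing_key("EXAMPLE-SECRET-NOT-REAL", "20250101", "us-east-1", "bedrock")
signature = sign(key, "example string to sign")
print(len(key), len(signature))
```

The derived key is only valid for the date, region, and service baked into it, so a leaked signature cannot be replayed against another region or service.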
The new Bedrock integration (as of 2025) exposes Claude at a Messages API endpoint that uses the same request/response shape as Anthropic's direct API: https://bedrock-mantle.{region}.api.aws/anthropic/v1/messages. Model IDs carry an anthropic. prefix — for example, anthropic.claude-opus-4-8. Bedrock also retains a legacy InvokeModel / Converse integration with ARN-style model identifiers; new projects should use the Messages API path.
from anthropic import AnthropicBedrockMantle
# Credentials come from the AWS credential chain, so no key is hard-coded
client = AnthropicBedrockMantle(aws_region="us-east-1")
message = client.messages.create(
    model="anthropic.claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarise this contract in three bullets."}],
)
print(message.content[0].text)
Vertex AI: how authentication works
Vertex AI uses Application Default Credentials (ADC) and Google Cloud service accounts instead of API keys. When your code runs inside GCP (Cloud Run, GKE, Compute Engine), ADC picks up credentials automatically from the instance's attached service account. Outside GCP, you run gcloud auth application-default login or set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point at a service-account JSON key file.
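The lookup order described above can be approximated with a small helper that reports which source ADC would reach first. This is a simplified sketch, not the Google auth library: `adc_source` is a hypothetical name, the well-known file path is the Linux/macOS location, and the helper never contacts the metadata server.

```python
import os
from pathlib import Path
from typing import Mapping


def adc_source(env: Mapping[str, str], home: Path) -> str:
    """Describe which credential source ADC would try first (simplified, offline)."""
    key_file = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_file:
        return f"service-account key file from env var: {key_file}"
    # File written by `gcloud auth application-default login` on Linux/macOS.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return f"gcloud user credentials: {well_known}"
    # Last resort: the service account attached to the GCP resource.
    return "attached service account via the metadata server (inside GCP only)"


print(adc_source(os.environ, Path.home()))
```

Running this on a laptop and then inside Cloud Run is a quick way to confirm that the two environments authenticate as different principals.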
The Google Gen AI SDK supports both the Gemini Developer API and Vertex AI through a single unified interface — you switch between them with a flag rather than rewriting API calls. On the Vertex path, you supply a GCP project ID and a location (region), and the SDK routes requests to aiplatform.googleapis.com inside your project.
from google import genai
# ADC picks up credentials automatically inside GCP;
# outside GCP, ensure GOOGLE_APPLICATION_CREDENTIALS is set
client = genai.Client(
    vertexai=True,
    project="my-gcp-project",
    location="us-central1",
)
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarise this contract in three bullets.",
)
print(response.text)
Bedrock vs direct Anthropic API: what actually changes
For the vast majority of API calls, the request and response JSON are identical. The differences are in the operational layer around those calls.
| Dimension | Direct Anthropic API | AWS Bedrock |
|---|---|---|
| Authentication | Static API key (x-api-key header) | SigV4 signing with IAM credentials (short-lived) |
| Billing | Anthropic invoice / credit card | AWS invoice, supports reserved capacity and cost-allocation tags |
| Base pricing (on-demand) | Same per-token rate | Same per-token rate — no markup |
| Batch inference discount | 50% off | 50% off |
| Regional endpoint premium | N/A (single global endpoint) | 10% premium for regional endpoints; global routing is free |
| New features / model access | Same-day on release | May lag by days to weeks for new capabilities |
| Data retention / residency | Governed by Anthropic | Governed by AWS, regional endpoints available |
| Audit logging | Not provided | CloudTrail + CloudWatch out of the box |
| Private networking | Not available | AWS PrivateLink supported |
| Guardrails / content filters | Anthropic safety layer only | Bedrock Guardrails (configurable enterprise-side filters) |
| Multi-model switching | Claude models only | Claude, Llama, Mistral, Titan, and others on one endpoint |
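The pricing rows in the table reduce to simple arithmetic. The sketch below applies the 50% batch discount and the 10% regional premium to a base per-token cost. The per-million-token rates are placeholders chosen for round numbers, not real prices, and applying both adjustments multiplicatively is an assumption.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
    *,
    batch: bool = False,
    regional: bool = False,
) -> float:
    """Estimate spend in dollars from token counts and per-million-token rates."""
    cost = (
        input_tokens / 1_000_000 * input_rate_per_mtok
        + output_tokens / 1_000_000 * output_rate_per_mtok
    )
    if batch:
        cost *= 0.5  # batch inference discount from the table
    if regional:
        cost *= 1.10  # regional endpoint premium from the table
    return cost


# Placeholder rates (hypothetical): $3 input, $15 output per million tokens.
base = estimate_cost(10_000_000, 2_000_000, 3.0, 15.0)
regional = estimate_cost(10_000_000, 2_000_000, 3.0, 15.0, regional=True)
batch = estimate_cost(10_000_000, 2_000_000, 3.0, 15.0, batch=True)
print(f"base={base:.2f} regional={regional:.2f} batch={batch:.2f}")
```

Plugging in your own token volumes and the current published rates shows quickly whether the regional premium is material for your workload.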
Vertex AI vs direct Gemini API: what actually changes
Google offers two separate routes to Gemini: the Gemini Developer API (via generativelanguage.googleapis.com, accessed with an API key from Google AI Studio) and the Vertex AI API (via aiplatform.googleapis.com, authenticated with IAM). The split mirrors the Bedrock situation: the Developer API is optimised for simplicity, while Vertex AI is optimised for enterprise governance.
| Dimension | Gemini Developer API | Vertex AI Gemini API |
|---|---|---|
| Authentication | API key (Google AI Studio) | Service account / ADC / IAM |
| Free tier | Yes (rate-limited) | No — all usage billed to GCP project |
| IAM access control | Not available | Fine-grained IAM roles per resource |
| VPC Service Controls | Not available | Supported — prevent data exfiltration |
| Customer-managed encryption (CMEK) | Not available | Supported |
| Data residency controls | Limited | Regional endpoints in EU, US, Asia |
| Audit logging | Not available | Cloud Audit Logs + Cloud Monitoring |
| HIPAA / compliance BAA | Not available | Available through Google Cloud agreement |
| Batch inference jobs | Not available | Supported |
| SLA | Best-effort | Google Cloud SLA applies |
The Gemini Developer API has a generous free tier and is the fastest way to prototype. It uses a simple API key, making it easy to get started in minutes. Vertex AI has no free tier but unlocks every enterprise control: HIPAA eligibility, VPC isolation, CMEK, and the full Google Cloud compliance posture.
Choosing the right route for your project
The decision is rarely about model quality — it is about your operational context. Here is a practical heuristic.
Go direct to the provider API if most of these apply:
- Solo developer or small team
- Prototyping or hackathon
- Need features the day they ship
- No corporate procurement requirements
- Fine with API-key auth
- Want free tier (Gemini only)
Go through Bedrock or Vertex AI if most of these apply:
- Already running on AWS or GCP
- Security team requires IAM auth (no static keys)
- Finance needs one consolidated invoice
- Data must stay in a specific region
- HIPAA, FedRAMP, or SOC 2 compliance required
- Want audit logs in CloudTrail or Cloud Audit Logs
The lock-in consideration
Routing through a cloud platform does introduce mild lock-in to that cloud's tooling — IAM policies, billing constructs, and observability dashboards. The model itself is portable; the operational scaffolding around it is not. If multi-cloud portability matters, keep your model-calling code behind an abstraction layer that lets you swap the underlying client without touching business logic.
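One way to build that abstraction layer is a small interface that business logic depends on, with one adapter per route. The sketch below is a minimal illustration: `ChatBackend`, `DirectBackend`, `CloudBackend`, and `make_backend` are hypothetical names, and the adapters return canned strings where real code would call the relevant SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatBackend(Protocol):
    """The only surface business logic is allowed to see."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class DirectBackend:
    model: str

    def complete(self, prompt: str) -> str:
        # Real code would call the provider's own SDK here.
        return f"[direct:{self.model}] {prompt}"


@dataclass
class CloudBackend:
    model: str
    region: str

    def complete(self, prompt: str) -> str:
        # Real code would call the cloud platform's client here.
        return f"[cloud:{self.region}:{self.model}] {prompt}"


def make_backend(route: str) -> ChatBackend:
    """Pick an adapter from configuration; callers never import an SDK directly."""
    if route == "direct":
        return DirectBackend(model="claude-opus-4-8")
    if route == "bedrock":
        return CloudBackend(model="anthropic.claude-opus-4-8", region="us-east-1")
    raise ValueError(f"unknown route: {route}")


def summarise(backend: ChatBackend, text: str) -> str:
    """Business logic: unchanged no matter which route is configured."""
    return backend.complete(f"Summarise: {text}")


print(summarise(make_backend("direct"), "quarterly report"))
print(summarise(make_backend("bedrock"), "quarterly report"))
```

Swapping clouds then means writing one new adapter and changing a configuration value, while IAM policies and dashboards still need to be rebuilt on the new platform.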
Bedrock's multi-model menu actually reduces model-level lock-in: you can switch from Claude to Llama or Mistral without changing your auth setup, billing config, or logging pipeline. This is one of its genuine advantages over going direct to each individual provider.
Going deeper
Bedrock Provisioned Throughput
For workloads that consistently push high token volumes, Bedrock offers Provisioned Throughput: you commit to a reserved capacity unit (measured in model units, roughly 1 million tokens per minute per unit) for a fixed hourly rate. At high enough volume this can be cheaper than on-demand pricing. The trade-off is that you pay the hourly rate even when you are not using it, so it only makes sense if your utilisation is predictably high. Bedrock also charges for HTTP 500 errors under provisioned throughput, unlike the direct Anthropic API which has a 3% error-rate forgiveness buffer.
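The break-even point is straightforward to compute once you have a quote. The sketch below assumes a single blended on-demand rate per million tokens and a fixed hourly commitment; both figures in the example are hypothetical placeholders, not AWS prices.

```python
def break_even_tokens_per_hour(hourly_rate: float, on_demand_rate_per_mtok: float) -> float:
    """Tokens per hour at which on-demand spend equals the fixed hourly commitment."""
    return hourly_rate / on_demand_rate_per_mtok * 1_000_000


def cheaper_option(
    tokens_per_hour: float, hourly_rate: float, on_demand_rate_per_mtok: float
) -> str:
    """Compare one hour of on-demand spend against the provisioned hourly rate."""
    on_demand_cost = tokens_per_hour / 1_000_000 * on_demand_rate_per_mtok
    return "provisioned" if on_demand_cost > hourly_rate else "on-demand"


# Hypothetical figures: $40/hour commitment, $5 blended rate per million tokens.
threshold = break_even_tokens_per_hour(40.0, 5.0)
print(f"break-even: {threshold:,.0f} tokens/hour")
print(cheaper_option(10_000_000, 40.0, 5.0))
print(cheaper_option(2_000_000, 40.0, 5.0))
```

The comparison should use your average sustained volume, not your peak, because the hourly rate is charged during idle hours too.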
Bedrock Guardrails
AWS Bedrock Guardrails is an operator-side content-filtering layer that sits in front of the model. You configure policies (blocked topics, PII redaction, profanity filters, grounding checks) in the AWS console, then pass a guardrail ID with each request. Guardrails run before the prompt reaches the model and after the response comes back, giving you a defence layer that is independent of the model provider's own safety systems. This is particularly useful in regulated industries where you need auditable proof that certain content was never processed.
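The before-and-after placement of the checks can be shown with a toy wrapper. This is not the Bedrock Guardrails API, only the shape of the pattern: the blocked-topic list, the email regex, and the `guarded_call` name are all hypothetical, and a real guardrail policy is configured in AWS rather than in application code.

```python
import re
from typing import Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKED_TOPICS = ("insider trading",)  # hypothetical policy


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Apply an input check, call the model, then apply an output check."""
    # Input side: refuse blocked topics before the model ever sees the prompt.
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "[blocked by input policy]"
    # Input side: redact email addresses from the prompt.
    safe_prompt = EMAIL.sub("[EMAIL]", prompt)
    reply = model(safe_prompt)
    # Output side: redact anything the model introduced itself.
    return EMAIL.sub("[EMAIL]", reply)


def fake_model(prompt: str) -> str:
    # Stand-in for a model call; echoes the prompt and leaks an address.
    return f"Received: {prompt} Reply-to: bob@example.com"


print(guarded_call("Email alice@example.com about the invoice", fake_model))
print(guarded_call("Give me tips on insider trading", fake_model))
```

Because the wrapper runs on both sides of the call, it catches sensitive content whether it originates from the user or from the model.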
Vertex AI CMEK and VPC Service Controls
Customer-managed encryption keys (CMEK) let you supply your own Cloud KMS key to encrypt data at rest in Vertex — prompts, cached content, fine-tune checkpoints. If your key is revoked, Google can no longer decrypt your data, giving you a cryptographic kill-switch. VPC Service Controls wrap your Vertex project in a service perimeter: API calls must originate from inside the perimeter, preventing data exfiltration even if a credential is compromised.
Feature-availability lag
Both cloud platforms typically lag the direct provider APIs by days to weeks when new features ship. Anthropic may announce prompt caching or a new tool type; it lands in the Anthropic API immediately and in Bedrock later once AWS validates and deploys the update. If staying on the bleeding edge of capabilities matters for your use case — for example, you are building a product demo around a just-launched feature — the direct API is the lower-friction choice.
Cross-region inference profiles on Bedrock
Bedrock's inference profiles (US, EU, JP, AU) let you route across multiple AWS regions within a geography for higher throughput and resilience, without pinning to a single region. This is distinct from the global endpoint, which can route anywhere in the world, and from single-region endpoints, which satisfy hard data-residency requirements. Inference profiles sit in between: geographic containment with multi-region redundancy.
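The three options map onto three levels of residency requirement, which can be written down as a small decision helper. This is a sketch of the reasoning in the paragraph above; `Residency` and `choose_endpoint` are hypothetical names, and the returned strings are labels rather than real endpoint identifiers.

```python
from enum import Enum


class Residency(Enum):
    NONE = "none"                    # data may be processed anywhere
    GEOGRAPHY = "geography"          # data must stay within e.g. the EU
    SINGLE_REGION = "single-region"  # data must stay in one named region


def choose_endpoint(requirement: Residency) -> str:
    """Map a residency requirement to the least restrictive endpoint type that meets it."""
    if requirement is Residency.SINGLE_REGION:
        return "single-region endpoint"
    if requirement is Residency.GEOGRAPHY:
        return "geographic inference profile"
    return "global endpoint"


for level in Residency:
    print(f"{level.value}: {choose_endpoint(level)}")
```

Picking the least restrictive option that still satisfies the requirement keeps the most throughput and redundancy available.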
FAQ
Is Claude on Bedrock the same model as Claude on the Anthropic API?
Yes. The model weights are identical — anthropic.claude-opus-4-8 on Bedrock and claude-opus-4-8 on the Anthropic API run the same model. What differs is the infrastructure layer: authentication method, billing, logging, and which optional features (like server-side tools) are available on each route.
Is using Bedrock or Vertex AI more expensive than the direct API?
On-demand per-token pricing is identical across routes — there is no markup for using Bedrock or Vertex. Costs can diverge at high volume: Bedrock's Provisioned Throughput can be cheaper for sustained heavy use but adds data-transfer fees. Vertex AI's batch inference offers significant discounts for non-real-time workloads. Bedrock regional endpoints carry a 10% premium over the global endpoint.
Do I need an AWS account to use Claude through Bedrock?
Yes. You need an AWS account with Amazon Bedrock model access enabled for the specific Claude models you want to use. Claude model access is granted through the Bedrock console under Model access. There is no way to use Bedrock without an AWS account.
Which Gemini features are only available on Vertex AI and not the Gemini Developer API?
Vertex AI adds IAM-based access control, VPC Service Controls, customer-managed encryption keys (CMEK), data residency enforcement via regional endpoints, Cloud Audit Logs integration, HIPAA-eligible terms, and batch inference jobs. The developer API has none of these — it uses a simple API key and has no compliance SLA.
Can I switch from the direct Anthropic API to Bedrock without rewriting my code?
Mostly yes. The Bedrock Messages API uses the same request/response shape as the Anthropic API. You swap the client class (AnthropicBedrockMantle instead of Anthropic), add an AWS region, prefix the model ID with anthropic., and update your auth to use AWS credentials. Features not supported on Bedrock (Files API, server-side tools, Message Batches endpoint) will break and need an alternative approach.
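A quick pre-migration check can flag the parts of a codebase that will not carry over. The sketch below encodes the two mechanical steps from the answer above: prefixing the model ID and listing unsupported features. The feature labels and function names are hypothetical, and the unsupported set is taken from this article rather than from a live compatibility list.

```python
# Features this article lists as unavailable on the Bedrock route (labels are hypothetical).
UNSUPPORTED_ON_BEDROCK = {"files_api", "server_side_tools", "message_batches"}


def to_bedrock_model_id(model: str) -> str:
    """Add the anthropic. prefix unless the ID already carries it."""
    return model if model.startswith("anthropic.") else f"anthropic.{model}"


def migration_blockers(features_used: set[str]) -> list[str]:
    """Return the features in use that need an alternative on Bedrock, sorted by name."""
    return sorted(features_used & UNSUPPORTED_ON_BEDROCK)


print(to_bedrock_model_id("claude-opus-4-8"))
print(migration_blockers({"streaming", "files_api", "message_batches"}))
```

An empty blocker list suggests the switch is limited to the client class, region, model ID, and credentials.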
What is the difference between the Bedrock global endpoint and a regional endpoint?
The global endpoint dynamically routes your request to the best-available AWS region for capacity and latency at no extra charge. A regional endpoint pins your traffic to a single specified region, which satisfies data-residency requirements (your prompts never leave that region), but it costs 10% more and has a lower throughput ceiling than the globally routed option.