Overview
Qwen3.7-Plus is Alibaba's Qwen team's multimodal agent model in the Qwen-Plus line, announced on June 2, 2026 and generally available from June 1, 2026 after a short public preview. It bolts image and video understanding onto the text-only Qwen 3.7 backbone and is positioned as the lower-cost sibling of the flagship Qwen3.7-Max — Alibaba lists it at roughly one-sixth the per-token price of Max.
Unlike a text model with a vision adapter, Qwen3.7-Plus is built to operate as an interactive agent: it perceives real-world scenes, reads screens and grounds clicks on GUIs, writes code from visual references, navigates mobile apps end to end, and answers questions about video frames — blending GUI and command-line actions inside a single agent loop with tool calls, self-testing, and autonomous iteration. It is a perception model only: it reads images and video but returns text, not generated pictures.
The model carries a 1-million-token context window and is exposed as the API endpoint qwen3.7-plus on Alibaba Cloud Model Studio (DashScope), reachable through OpenAI-compatible chat-completions and responses APIs across Beijing, Singapore, and US-Virginia endpoints, and resold through aggregators such as OpenRouter. It is proprietary and API-only — no open weights have been published.
| Released | 2026-06-02 |
|---|---|
| License | Proprietary (API-only) |
| Weights | API only |
| Parameters | Not disclosed |
| Context | 1M |
| Max output | 32,768 tokens |
| Architecture | Multimodal vision-language agent that extends the Qwen 3.7 text backbone with image and video understanding. It is a perception model — it accepts text, images, and video and returns text only (no image generation). Parameter count and exact architecture are not publicly disclosed. |
| Knowledge cutoff | Not disclosed |
| Modalities | Text, Vision, Video |
| Status | Generally available |
Benchmarks
- ScreenSpot Pro (GUI grounding)79%
- AndroidWorld (mobile agent)81%
- Terminal-Bench 2.0 (agentic terminal)70.3%
- GPQA Diamond (STEM reasoning)90.3%
- Artificial Analysis Intelligence Index39index
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.40 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.04 / 1M tokens (cache write) per 1M tokens |
| Output | $1.20 / 1M tokens per 1M tokens |
Singapore (international) region, non-thinking mode, for prompts up to 256K tokens. Above 256K tokens the rate rises to $1.20 input / $3.60 output per 1M. Thinking-mode output is billed higher ($4 / 1M up to 256K). Pricing is tiered by request length.
Strengths
- GUI grounding and on-screen agent control: 79.0 on ScreenSpot Pro and 81.0 on AndroidWorld (vendor-reported)
- Very large 1M-token context window for long documents, codebases, and multi-turn agent traces
- Native image and video input alongside text — reads screens, frames, and document images
- Strong agentic tool use, self-testing, and autonomous iteration inherited from the Qwen 3.7 agent backbone
- Low per-token price relative to flagship agent models (about one-sixth the cost of Qwen3.7-Max)
- OpenAI-compatible API and multi-region availability make integration straightforward
Best for
- Computer-use and mobile agents that read screens and click the right UI element
- Long-context document, codebase, and transcript analysis up to 1M tokens
- Visual question answering over screenshots, document images, and video frames
- Coding from visual references and UI mockups with built-in test-and-iterate loops
- Tool-calling agent workflows that mix GUI and command-line actions
- Cost-sensitive multimodal deployments that need vision without flagship-tier pricing
How to access
| Provider | Model ID |
|---|---|
| Alibaba Cloud Model Studio (DashScope) ↗ | qwen3.7-plus |
| OpenRouter ↗ | qwen/qwen3.7-plus |
Qwen-Plus (multimodal agent) — every version
The full lineage of the Qwen-Plus (multimodal agent) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Qwen3.7-Pluscurrent | 2026-06-02 | — | Proprietary |
| Qwen3.6-Plus | 2026-04 | — | Proprietary |
| Qwen3.5-Plus | 2026-02-16 | 1M | Proprietary |
FAQ
Is Qwen3.7-Plus open source?
No. Qwen3.7-Plus is a proprietary, API-only model. No open weights have been published; it is accessed through Alibaba Cloud Model Studio (DashScope) and resellers like OpenRouter.
What can Qwen3.7-Plus actually take as input?
It accepts text, images, and video, and returns text only. It is a perception and agent model — it reads screens, document images, and video frames, but it does not generate images.
How is Qwen3.7-Plus different from Qwen3.7-Max?
Qwen3.7-Plus adds vision and video understanding on top of the Qwen 3.7 backbone and is the lower-cost, multimodal agent tier. Alibaba lists it at roughly one-sixth the per-token price of the text-focused flagship Qwen3.7-Max.
How large is the context window?
Qwen3.7-Plus supports a 1-million-token context window, with a maximum output of 32,768 tokens per response according to Alibaba Cloud Model Studio documentation.