AI/TLDR

Qwen3.7-Plus

Alibaba's low-cost multimodal agent that sees screens, codes, and acts in one loop

Overview

Qwen3.7-Plus is Alibaba's Qwen team's multimodal agent model in the Qwen-Plus line, announced on June 2, 2026 and generally available from June 1, 2026 after a short public preview. It bolts image and video understanding onto the text-only Qwen 3.7 backbone and is positioned as the lower-cost sibling of the flagship Qwen3.7-Max — Alibaba lists it at roughly one-sixth the per-token price of Max.

Unlike a text model with a vision adapter, Qwen3.7-Plus is built to operate as an interactive agent: it perceives real-world scenes, reads screens and grounds clicks on GUIs, writes code from visual references, navigates mobile apps end to end, and answers questions about video frames — blending GUI and command-line actions inside a single agent loop with tool calls, self-testing, and autonomous iteration. It is a perception model only: it reads images and video but returns text, not generated pictures.

The model carries a 1-million-token context window and is exposed as the API endpoint qwen3.7-plus on Alibaba Cloud Model Studio (DashScope), reachable through OpenAI-compatible chat-completions and responses APIs across Beijing, Singapore, and US-Virginia endpoints, and resold through aggregators such as OpenRouter. It is proprietary and API-only — no open weights have been published.

Released2026-06-02
LicenseProprietary (API-only)
WeightsAPI only
ParametersNot disclosed
Context1M
Max output32,768 tokens
ArchitectureMultimodal vision-language agent that extends the Qwen 3.7 text backbone with image and video understanding. It is a perception model — it accepts text, images, and video and returns text only (no image generation). Parameter count and exact architecture are not publicly disclosed.
Knowledge cutoffNot disclosed
ModalitiesText, Vision, Video
StatusGenerally available

Benchmarks

  1. ScreenSpot Pro (GUI grounding)79%
  2. AndroidWorld (mobile agent)81%
  3. Terminal-Bench 2.0 (agentic terminal)70.3%
  4. GPQA Diamond (STEM reasoning)90.3%
  5. Artificial Analysis Intelligence Index39index

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.40 / 1M tokens per 1M tokens
Cached input$0.04 / 1M tokens (cache write) per 1M tokens
Output$1.20 / 1M tokens per 1M tokens

Singapore (international) region, non-thinking mode, for prompts up to 256K tokens. Above 256K tokens the rate rises to $1.20 input / $3.60 output per 1M. Thinking-mode output is billed higher ($4 / 1M up to 256K). Pricing is tiered by request length.

Pricing source ↗

Strengths

  • GUI grounding and on-screen agent control: 79.0 on ScreenSpot Pro and 81.0 on AndroidWorld (vendor-reported)
  • Very large 1M-token context window for long documents, codebases, and multi-turn agent traces
  • Native image and video input alongside text — reads screens, frames, and document images
  • Strong agentic tool use, self-testing, and autonomous iteration inherited from the Qwen 3.7 agent backbone
  • Low per-token price relative to flagship agent models (about one-sixth the cost of Qwen3.7-Max)
  • OpenAI-compatible API and multi-region availability make integration straightforward

Best for

  • Computer-use and mobile agents that read screens and click the right UI element
  • Long-context document, codebase, and transcript analysis up to 1M tokens
  • Visual question answering over screenshots, document images, and video frames
  • Coding from visual references and UI mockups with built-in test-and-iterate loops
  • Tool-calling agent workflows that mix GUI and command-line actions
  • Cost-sensitive multimodal deployments that need vision without flagship-tier pricing

How to access

ProviderModel ID
Alibaba Cloud Model Studio (DashScope) ↗qwen3.7-plus
OpenRouter ↗qwen/qwen3.7-plus

Qwen-Plus (multimodal agent) — every version

The full lineage of the Qwen-Plus (multimodal agent) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Qwen3.7-Pluscurrent2026-06-02Proprietary
Qwen3.6-Plus2026-04Proprietary
Qwen3.5-Plus2026-02-161MProprietary

FAQ

Is Qwen3.7-Plus open source?

No. Qwen3.7-Plus is a proprietary, API-only model. No open weights have been published; it is accessed through Alibaba Cloud Model Studio (DashScope) and resellers like OpenRouter.

What can Qwen3.7-Plus actually take as input?

It accepts text, images, and video, and returns text only. It is a perception and agent model — it reads screens, document images, and video frames, but it does not generate images.

How is Qwen3.7-Plus different from Qwen3.7-Max?

Qwen3.7-Plus adds vision and video understanding on top of the Qwen 3.7 backbone and is the lower-cost, multimodal agent tier. Alibaba lists it at roughly one-sixth the per-token price of the text-focused flagship Qwen3.7-Max.

How large is the context window?

Qwen3.7-Plus supports a 1-million-token context window, with a maximum output of 32,768 tokens per response according to Alibaba Cloud Model Studio documentation.