ZJU-REAL · 2026-04-13 · notable
ClawGUI — Unified Framework for Training, Evaluating, and Deploying GUI Agents
Open-source end-to-end framework for GUI agents: online RL training on real and emulated Android/iOS devices (GiGPO plus a Process Reward Model), reproducible evaluation on 6 benchmarks with a 95.8% match rate against published results, and deployment through 12+ chat platforms.
Train, benchmark, and deploy mobile GUI agents — all from one auditable open-source repo.
What is it?
ClawGUI is an end-to-end framework from Zhejiang University's REAL lab for building and studying agents that control graphical user interfaces. It has three parts: ClawGUI-RL (online RL training on real or emulated Android/iOS screens), ClawGUI-Eval (a standardized harness that reproduces published results on 6 existing GUI benchmarks with a 95.8% match rate), and ClawGUI-Agent (deployment on Android, HarmonyOS, and iOS, controlled through 12+ chat platforms including WeChat and Telegram).
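The summary does not say exactly how the 95.8% figure is computed. One common way to express such a reproduction rate is the fraction of (model, benchmark) pairs whose re-run score lands within a fixed tolerance of the originally reported score. The sketch below assumes that definition; the `Result` type, `match_rate` function, the 2-point default tolerance, and the example model and benchmark names and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    benchmark: str
    reported: float    # score published by the original authors (%)
    reproduced: float  # score obtained by re-running under one unified harness (%)


def match_rate(results, tolerance=2.0):
    """Fraction of (model, benchmark) pairs whose reproduced score is within
    `tolerance` absolute percentage points of the reported score.
    Assumption: this tolerance-based definition is illustrative, not ClawGUI's."""
    if not results:
        return 0.0
    matched = sum(abs(r.reproduced - r.reported) <= tolerance for r in results)
    return matched / len(results)


# Hypothetical toy data, not real benchmark numbers.
results = [
    Result("model-a", "bench-x", reported=50.0, reproduced=49.0),  # within 2 points
    Result("model-a", "bench-y", reported=30.0, reproduced=35.0),  # off by 5 points
]
print(match_rate(results))
```

Under this definition the rate depends directly on the chosen tolerance, so a reported match rate is only comparable across projects when the tolerance is stated alongside it.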
How does it work?
Training uses GiGPO (Group-in-Group Policy Optimization) together with a Process Reward Model (PRM), which scores intermediate steps rather than only the final outcome. Agents interact with touch interfaces running in parallel Docker Android containers, so rollouts can scale without a large hardware cluster. The eval harness runs every benchmark under one unified setup, so score differences reflect the models rather than incompatible evaluation methodologies that can inflate reported numbers. Deployed agents translate natural-language instructions into touch and swipe actions through the Android Accessibility API.
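To illustrate the "group-in-group" idea, here is a minimal sketch of the general GiGPO advantage recipe, not ClawGUI's implementation. For a group of rollouts of the same task, it computes an episode-level advantage by normalizing each trajectory's final outcome against the group, and a step-level advantage by comparing discounted returns only among steps that reached the same state. Assumptions: per-step rewards stand in for Process Reward Model scores, screens are reduced to hashable state keys, and the names `gigpo_advantages`, `gamma`, and `omega` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def discounted_returns(rewards, gamma):
    """G_t = sum_k gamma^k * r_{t+k} for every step t of one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalize(values, eps=1e-6):
    """Group-relative normalization: (x - mean) / (std + eps)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def gigpo_advantages(trajectories, gamma=0.95, omega=1.0):
    """Each trajectory is a dict with 'outcome' (episode reward), 'states'
    (hashable screen keys, hypothetical) and 'step_rewards' (same length as
    'states'; assumed here to come from a PRM)."""
    # Episode level: compare whole trajectories sampled for the same task.
    episode_adv = normalize([tr["outcome"] for tr in trajectories])

    # Step level: group every (trajectory, step) that visited the same state.
    groups = defaultdict(list)
    returns = []
    for i, tr in enumerate(trajectories):
        returns.append(discounted_returns(tr["step_rewards"], gamma))
        for t, state in enumerate(tr["states"]):
            groups[state].append((i, t))

    step_adv = [[0.0] * len(tr["states"]) for tr in trajectories]
    for members in groups.values():
        if len(members) < 2:
            continue  # a state seen only once has no peers to compare against
        scores = normalize([returns[i][t] for i, t in members])
        for (i, t), a in zip(members, scores):
            step_adv[i][t] = a

    # Final per-step advantage: episode term plus weighted step term.
    return [[episode_adv[i] + omega * a for a in step_adv[i]]
            for i in range(len(trajectories))]


# Two toy rollouts that share the same starting screen ("home").
trajs = [
    {"outcome": 1.0, "states": ["home", "settings", "wifi"], "step_rewards": [0.1, 0.2, 1.0]},
    {"outcome": 0.0, "states": ["home", "camera"], "step_rewards": [0.1, 0.0]},
]
adv = gigpo_advantages(trajs)
```

Grouping by identical states is what lets the step-level term assign credit without a learned critic. In a GUI setting, deciding when two screens count as "the same state" (exact hash, UI-tree match, or something looser) is itself a design choice this sketch leaves open.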
Why does it matter?
GUI agent research has fragmented into dozens of incompatible training setups with unreproducible benchmark numbers. ClawGUI consolidates training, evaluation, and deployment in one codebase. Its bundled 2B model reaches 17.1% on MobileWorld GUI-Only, 6 percentage points above comparable baselines, and the paper drew 307 upvotes on Hugging Face Papers within two days, a sign of strong community interest.
Who is it for?
ML researchers and engineers building or benchmarking agents that control mobile interfaces.
Try it
git clone https://github.com/ZJU-REAL/ClawGUI