Google · 2026-04-08 · notable
LiteRT-LM — Google's Cross-Platform Edge LLM Inference Framework
Open-source inference framework for deploying LLMs on edge devices across Android, iOS, Web, Desktop, and IoT with GPU/NPU acceleration, vision/audio input, and function calling.
One inference framework that deploys the same LLM to phones, browsers, desktops, and Raspberry Pis.
Key specs
| Spec | Value |
|---|---|
| License | Apache 2.0 |
| GitHub stars | 3.5k |
| Platforms | Android, iOS, Web, Desktop, IoT |
What is it?
LiteRT-LM is Google's open-source inference framework for running large language models on edge devices. It provides a unified API across Android (Kotlin), iOS (Swift, in development), Web, Desktop (Python, C++), and IoT platforms like Raspberry Pi. It powers the Google AI Edge Gallery app.
How does it work?
The framework handles model loading, tokenization, and hardware-accelerated inference on GPU and NPU backends. It accepts multimodal inputs (vision and audio), supports function calling for agentic workflows, and works with models from the Gemma, Llama, Phi-4, and Qwen families. Quantized models store weights at lower precision, which shrinks their memory footprint enough to fit on constrained hardware.
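To make the point about quantized models concrete, here is a back-of-the-envelope sketch of how weight precision affects memory footprint. It does not use the LiteRT-LM API; the helper `weight_footprint_gib` and the 2-billion-parameter model size are hypothetical, chosen only for illustration, and the estimate covers weights alone (no activations, KV cache, or runtime overhead).

```python
def weight_footprint_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical model size (an assumption for illustration, not from the source).
NUM_PARAMS = 2_000_000_000

# Common weight precisions, from full precision down to 4-bit quantization.
PRECISIONS = [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]

if __name__ == "__main__":
    for label, bits in PRECISIONS:
        size = weight_footprint_gib(NUM_PARAMS, bits)
        print(f"{label}: {size:.2f} GiB")
```

Under these assumptions, moving from 32-bit to 4-bit weights cuts the footprint by a factor of eight, from roughly 7.5 GiB to under 1 GiB, which is the difference between not fitting and fitting in the memory of a typical phone or single-board computer.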
Why does it matter?
Deploying an LLM to a phone is one problem; deploying the same model to a phone, a browser, a Raspberry Pi, and a desktop with one API is a different problem. LiteRT-LM solves the second one, removing the need to maintain separate inference stacks per platform.
Who is it for?
Mobile and IoT developers deploying LLMs to edge devices.
Try it
pip install litert-lm