Overview
Data Formulator is a Microsoft Research project for exploring data with visualizations powered by AI agents. It combines a visual UI with natural-language input, so you can connect a data source, ask questions, and get charts that you can edit, branch, and share on one interactive canvas.
It is aimed at two audiences: data and platform teams who wire up databases, warehouses, and BI sources once to give an org an AI-powered exploration layer, and analysts who want to ask, edit, and share insights without writing query and plotting code by hand.
As a data-app builder, it runs locally as a Python package and connects to your own model provider. It supports OpenAI, Azure, Ollama, and Anthropic through LiteLLM, and can read from sources like PostgreSQL, MySQL, MSSQL, BigQuery, S3, and Azure Blob, plus files, images, and text.
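LiteLLM, the library the overview names for model access, identifies a provider by a prefix on the model name (for example `ollama/llama3`) and exposes a single call, `litellm.completion(model=..., messages=...)`, for all of them. This overview does not say how Data Formulator itself stores or exposes the model setting. The following is therefore only a hypothetical, dependency-free sketch of that naming convention. `split_model_id` and `SUPPORTED_PROVIDERS` are invented for illustration, and treating an unprefixed name as an OpenAI model is an assumption of the sketch.

```python
from __future__ import annotations

# Hypothetical: the providers this overview lists as supported through LiteLLM.
SUPPORTED_PROVIDERS = {"openai", "azure", "ollama", "anthropic"}


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a LiteLLM-style "provider/model" string into (provider, model).

    Assumption for this sketch: a name with no prefix (e.g. "gpt-4o") is
    treated as an OpenAI model.
    """
    provider, sep, model = model_id.partition("/")  # split at the first "/"
    if not sep:
        return "openai", model_id
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if not model:
        raise ValueError(f"missing model name in {model_id!r}")
    return provider, model


print(split_model_id("ollama/llama3"))  # ('ollama', 'llama3')
print(split_model_id("gpt-4o"))  # ('openai', 'gpt-4o')
```

Keeping the provider inside the model string means one code path can switch between a hosted API and a local Ollama model by changing a single value.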
What it does
- Natural-language plus direct-manipulation UI for creating and editing charts on a visual canvas
- A unified Data Agent with thread memory that inspects data, runs sandboxed code, and recommends next steps
- Data Thread keeps questions, intermediate results, and charts navigable so you can revisit steps, branch alternatives, and compare side by side
- Connectors for databases, warehouses, BI systems, object stores, and files (PostgreSQL, MySQL, MSSQL, BigQuery, S3, Azure Blob, and more)
- 30+ chart types via a semantic chart engine, plus a style-refinement agent for presentation-ready visuals
- Bring-your-own model: OpenAI, Azure, Ollama, and Anthropic supported through LiteLLM
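The overview does not describe how Data Thread is implemented. As a rough mental model only, a branching exploration history behaves like a tree: each step keeps a link to the step it came from, walking those links back recovers the path of questions, and branching twice from one step produces alternatives to compare. The sketch below is a hypothetical Python data structure, not Data Formulator's code. `ThreadStep`, `branch`, and `history` are invented names, and the `result` strings stand in for real tables or charts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare steps by identity, avoiding recursive field comparison
class ThreadStep:
    """One step in a hypothetical exploration thread: a question and its result."""

    question: str
    result: str  # stand-in for an intermediate table or chart spec
    parent: Optional[ThreadStep] = field(default=None, repr=False)
    children: list[ThreadStep] = field(default_factory=list, repr=False)

    def branch(self, question: str, result: str) -> ThreadStep:
        """Record a follow-up step; branching twice from one step yields alternatives."""
        child = ThreadStep(question, result, parent=self)
        self.children.append(child)
        return child

    def history(self) -> list[str]:
        """Walk parent links back to the root and return the questions in order."""
        questions: list[str] = []
        node: Optional[ThreadStep] = self
        while node is not None:
            questions.append(node.question)
            node = node.parent
        return list(reversed(questions))


# Illustrative thread: two alternative views branch from the same loaded table.
root = ThreadStep("Load sales.csv", "table: sales")
by_region = root.branch("Total sales by region", "bar chart")
by_month = root.branch("Total sales by month", "line chart")
top5 = by_region.branch("Keep the top 5 regions", "filtered bar chart")

print(top5.history())
# ['Load sales.csv', 'Total sales by region', 'Keep the top 5 regions']
print([step.question for step in root.children])
# ['Total sales by region', 'Total sales by month']
```

Because every step points at its parent, revisiting an earlier step or starting a new branch from it never overwrites the other branches.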
Getting started
Data Formulator runs locally as a Python package; install it with pip or run it instantly with uvx, then open the local web UI.
Install with pip
Install the package from PyPI into your Python environment.
pip install data_formulator
Or run instantly with uvx
Use uvx to download and run it without a manual install step.
uvx data_formulator
Start the app
Launch Data Formulator, then open the local URL it prints in your browser and configure your model provider (OpenAI, Azure, Ollama, or Anthropic).
data_formulator
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Letting analysts ask questions of a connected database and get editable charts without writing SQL or plotting code
- Giving a data or platform team a reusable AI exploration layer over warehouses and BI sources
- Extracting structured data from Excel files, images, websites, or text and turning it into visualizations
- Branching into alternative views of the same dataset and comparing them side by side, then exporting a report as image or PDF
How Data Formulator compares
Data Formulator alongside the other open-source data-app builder tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Streamlit | ★ 45k | A Python framework that turns scripts into interactive data and ML web apps with simple widget calls and no frontend code. |
| Gradio | ★ 43k | A Python library for quickly building shareable web demos and UIs for machine learning models, APIs, and arbitrary functions. |
| Reflex | ★ 28.6k | A framework for building full-stack web apps entirely in Python, compiling component code to a React frontend and Python backend. |
| Dash | ★ 24.3k | A Python framework from Plotly for building analytical web dashboards and data apps with interactive charts and no JavaScript required. |
| marimo | ★ 21.5k | A reactive Python notebook stored as plain Python that can be run as a script or deployed as an interactive data app. |
| NiceGUI | ★ 15.9k | A backend-first Python UI framework built on FastAPI and Vue for creating web interfaces, dashboards, and internal tools. |
| Data Formulator | ★ 15.8k | An AI-powered tool for building and refining data visualizations on an interactive canvas. |
| Mesop | ★ 6.6k | A Python UI framework, started at Google, for rapidly building AI demos and internal web apps using composable components. |