Overview
Jina-serve is an open-source framework for building and deploying AI services that communicate over gRPC, HTTP, and WebSockets. It lets you focus on your core model logic while it handles the serving layer, so you can move the same code from local development to production.
You write your logic inside Executors that process Documents, connect them through a Gateway, and serve them as Deployments. To build a multi-step pipeline, you chain Executors together into a Flow. The framework adds scaling, streaming, and dynamic batching, plus built-in Docker, Kubernetes, and cloud deployment options.
What it does
- Native support for major ML frameworks and data types, with DocArray-based input and output using BaseDoc and DocList
- High-performance serving over gRPC, HTTP, and WebSockets with replicas, shards, and dynamic batching for higher throughput
- LLM serving with token-by-token streaming output for responsive applications
- Built-in Docker integration and an Executor Hub for sharing and pulling containerized services
- Export to Kubernetes manifests or Docker Compose files for production deployment
- One-command deployment to Jina AI Cloud (JCloud)
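Of the features listed above, dynamic batching is probably the least self-explanatory. The sketch below illustrates the general idea in plain Python with asyncio: individual requests are held briefly and passed to the model as one batch, either when a preferred batch size is reached or when a short timeout expires. It is a framework-independent illustration under stated assumptions, not Jina's implementation; `DynamicBatcher`, `fake_model`, and the parameter names are hypothetical.

```python
import asyncio
from typing import Callable, List, Optional, Tuple


class DynamicBatcher:
    """Hypothetical helper: groups single requests into batches for one model call."""

    def __init__(self, model_fn: Callable[[List[str]], List[str]],
                 preferred_batch_size: int, timeout_s: float) -> None:
        self.model_fn = model_fn
        self.preferred_batch_size = preferred_batch_size
        self.timeout_s = timeout_s
        self._pending: List[Tuple[str, asyncio.Future]] = []
        self._timer: Optional[asyncio.TimerHandle] = None
        self.batch_sizes: List[int] = []  # recorded only so the demo can inspect it

    async def submit(self, item: str) -> str:
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._pending.append((item, future))
        if len(self._pending) >= self.preferred_batch_size:
            self._flush()  # batch is full: run the model now
        elif self._timer is None:
            # first request of a new batch: start the timeout clock
            self._timer = loop.call_later(self.timeout_s, self._flush)
        return await future

    def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        self.batch_sizes.append(len(batch))
        outputs = self.model_fn([item for item, _ in batch])  # one call per batch
        for (_, future), output in zip(batch, outputs):
            future.set_result(output)


def fake_model(texts: List[str]) -> List[str]:
    """Stand-in for a real model: upper-cases every input in the batch."""
    return [text.upper() for text in texts]


async def demo() -> Tuple[List[str], List[int]]:
    batcher = DynamicBatcher(fake_model, preferred_batch_size=4, timeout_s=0.05)
    # six concurrent requests: four fill a batch, two are flushed by the timeout
    results = await asyncio.gather(*(batcher.submit(f"req{i}") for i in range(6)))
    return list(results), batcher.batch_sizes


results, batch_sizes = asyncio.run(demo())
print(batch_sizes)  # [4, 2]
```

The trade-off is latency against throughput: a larger batch size or longer timeout gives the model more work per call, while each request may wait slightly longer before it is processed.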
Getting started
Jina-serve is a Python package installed from PyPI. The example below builds a simple service from an Executor, serves it as a Deployment, then calls it with the client.
Install Jina
Install the jina package from PyPI. Separate setup guides are available for Apple Silicon and Windows.
```bash
pip install jina
```
Write an Executor
Define your data schemas with BaseDoc and put your model logic inside an Executor method marked with the @requests decorator. The method receives and returns a DocList of Documents.
```python
from docarray import BaseDoc, DocList
from jina import Executor, requests


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class MyExecutor(Executor):
    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        ...
```
Serve it as a Deployment
Wrap your Executor in a Deployment, choose a port, and call block() to keep the service running.
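```python
from jina import Deployment
from executor import MyExecutor  # assumes the Executor above is saved as executor.py

# timeout_ready=-1 waits indefinitely for the Executor to become ready
dep = Deployment(uses=MyExecutor, timeout_ready=-1, port=12345)

with dep:
    dep.block()  # keep serving until interrupted
```
Call the service or chain a Flow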
Use the Client to send Documents to your service. To build a pipeline, add several Executors to a Flow so requests pass through them in order.
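```python
from docarray import DocList
from jina import Client, Flow

# Prompt and Generation are the schemas defined with the Executor
# (assumed here to live in executor.py)
from executor import Generation, Prompt

# Single service: call the Deployment listening on port 12345
client = Client(port=12345)
response = client.post('/', inputs=[Prompt(text='hello')], return_type=DocList[Generation])

# Pipeline: StableLM and TextToImage stand for Executor classes defined elsewhere (not shown here)
flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
```
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.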
When to use it
- Serving an LLM or other model as a gRPC, HTTP, or WebSocket API with token-by-token streaming output
- Building multi-step AI pipelines, such as text generation followed by text-to-image, by chaining Executors into a Flow
- Scaling a model service with replicas, shards, and dynamic batching to handle higher request volume
- Deploying AI services to production by exporting to Kubernetes or Docker Compose, or shipping with one command to Jina AI Cloud
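Replicas and shards, both named in the list above, scale a service in different ways. The sketch below shows the usual distinction in plain Python, assuming the common meaning of the terms: replicas are identical copies that share incoming requests, while shards each hold part of the data, so a query fans out to every shard and the results are merged. It is not Jina's routing code, and `ReplicatedService`, `ShardedIndex`, and their methods are hypothetical names; how Jina distributes requests is configured in the framework and may differ.

```python
from itertools import cycle
from typing import Dict, List


class Replica:
    """One identical copy of a service; any replica can answer any request."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, text: str) -> str:
        return f"{self.name}:{text.upper()}"


class ReplicatedService:
    """Spreads requests over replicas in round-robin order to raise throughput."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas = [Replica(f"replica-{i}") for i in range(num_replicas)]
        self._next_replica = cycle(self.replicas)

    def handle(self, text: str) -> str:
        return next(self._next_replica).handle(text)


class ShardedIndex:
    """Splits stored documents across shards; a search asks every shard and merges."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[int, str]] = [{} for _ in range(num_shards)]

    def index(self, doc_id: int, text: str) -> None:
        # deterministic placement: the document id decides the shard
        self.shards[doc_id % len(self.shards)][doc_id] = text

    def search(self, word: str) -> List[int]:
        hits: List[int] = []
        for shard in self.shards:  # fan out to all shards
            hits.extend(doc_id for doc_id, text in shard.items() if word in text)
        return sorted(hits)  # merge partial results


service = ReplicatedService(num_replicas=3)
answers = [service.handle(text) for text in ["a", "b", "c", "d"]]
print(answers)  # ['replica-0:A', 'replica-1:B', 'replica-2:C', 'replica-0:D']

store = ShardedIndex(num_shards=2)
for doc_id, text in enumerate(["red apple", "red cherry", "green pear", "red plum"]):
    store.index(doc_id, text)
print(store.search("red"))  # [0, 1, 3]
```

Replicas help when request volume is the bottleneck; shards help when the data or index is too large for a single instance.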
How Jina compares
Jina alongside the other open-source app framework tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| LangChain | ★ 140k | A widely used Python and JavaScript framework for building LLM applications by composing models, prompts, tools, retrievers, and memory into chains. |
| LlamaIndex | ★ 50.2k | A data framework for connecting language models to your own documents and data sources, with built-in agent and retrieval (RAG) tooling. |
| Haystack | ★ 25.6k | An orchestration framework from deepset for building modular LLM pipelines and agents for search, RAG, and question answering. |
| Jina | ★ 21.9k | A framework for building and serving AI services that communicate over gRPC, HTTP, and WebSockets. |
| Prompt Flow | ★ 11.2k | Microsoft's toolkit for building LLM apps as executable flows that link prompts, Python code, and tools, with tracing, batch evaluation, and deployment. |