Jina

Build and serve AI services that talk over gRPC, HTTP, and WebSockets

github.com/jina-ai/serve | ★ 21.9k | jina.ai/serve

Overview

Jina-serve is an open-source framework for building and deploying AI services that communicate over gRPC, HTTP, and WebSockets. It lets you focus on your core model logic while it handles the serving layer, so you can move the same code from local development to production.

You write your logic inside Executors that process Documents, connect them through a Gateway, and serve them as Deployments. To build a multi-step pipeline, you chain Executors together into a Flow. The framework adds scaling, streaming, and dynamic batching, plus built-in Docker, Kubernetes, and cloud deployment options.

What it does

Native support for major ML frameworks and data types, with DocArray-based input and output using BaseDoc and DocList
High-performance serving over gRPC, HTTP, and WebSockets with replicas, shards, and dynamic batching for higher throughput
LLM serving with token-by-token streaming output for responsive applications
Built-in Docker integration and an Executor Hub for sharing and pulling containerized services
Export to Kubernetes manifests or Docker Compose files for production deployment
One-command deployment to Jina AI Cloud (JCloud)
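
Dynamic batching, listed above, groups requests that arrive separately so the model runs once per batch instead of once per request. The sketch below illustrates the idea in plain Python only; it is not how Jina implements the feature, and `DynamicBatcher`, `max_batch_size`, and `fake_model` are hypothetical names chosen for illustration.

```python
from typing import Callable, List


class DynamicBatcher:
    """Collects single inputs and runs the model once per full batch."""

    def __init__(self, model: Callable[[List[str]], List[str]], max_batch_size: int):
        self.model = model
        self.max_batch_size = max_batch_size
        self.pending: List[str] = []
        self.results: List[str] = []
        self.model_calls = 0

    def submit(self, item: str) -> None:
        # Queue the item and flush as soon as the batch is full.
        self.pending.append(item)
        if len(self.pending) >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        # Run the model on whatever is queued. A real server would also
        # flush on a timeout so a partial batch never waits forever.
        if not self.pending:
            return
        self.results.extend(self.model(self.pending))
        self.model_calls += 1
        self.pending = []


def fake_model(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for an expensive batched forward pass.
    return [text.upper() for text in batch]


batcher = DynamicBatcher(fake_model, max_batch_size=4)
for i in range(10):
    batcher.submit(f"req-{i}")
batcher.flush()  # drain the final partial batch
```

Ten requests with a batch size of four result in three model calls rather than ten, which is where the throughput gain comes from.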

Getting started

Jina-serve is a Python package installed from PyPI. The example below builds a simple service from an Executor, serves it as a Deployment, then calls it with the client.

Install Jina

Install the jina package from PyPI. Separate setup guides are available for Apple Silicon and Windows.

bash

pip install jina

Write an Executor

Define your data schemas with BaseDoc and put your model logic inside an Executor method marked with the @requests decorator. The method receives and returns a DocList of Documents.

python

from jina import Executor, requests
from docarray import DocList, BaseDoc

class Prompt(BaseDoc):
    text: str

class Generation(BaseDoc):
    prompt: str
    text: str

class MyExecutor(Executor):
    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        ...
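
The method body is left open in the snippet above. As a rough illustration of the input-to-output mapping such a method performs, the sketch below uses plain dataclasses as stand-ins for the BaseDoc schemas so that it runs without Jina installed. `echo_generate` and its echo logic are hypothetical placeholders for a real model call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prompt:  # stand-in for the BaseDoc schema of the same name
    text: str


@dataclass
class Generation:  # stand-in for the BaseDoc schema of the same name
    prompt: str
    text: str


def echo_generate(docs: List[Prompt]) -> List[Generation]:
    # One output Document per input Document, keeping the original prompt.
    # A real Executor would call a model here instead of echoing.
    return [Generation(prompt=d.text, text=f"echo: {d.text}") for d in docs]


outputs = echo_generate([Prompt(text="hello"), Prompt(text="world")])
```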

Serve it as a Deployment

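Wrap your Executor in a Deployment, choose a port, and call block() to keep the service running. Setting timeout_ready=-1 makes the Deployment wait indefinitely for the Executor to become ready, which helps when a model is slow to load.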

python

from jina import Deployment
from executor import MyExecutor

dep = Deployment(uses=MyExecutor, timeout_ready=-1, port=12345)

with dep:
    dep.block()

Call the service or chain a Flow

Use the Client to send Documents to your service. To build a pipeline, add several Executors to a Flow so requests pass through them in order.

python

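from jina import Client, Flow
from docarray import DocList
# Assumes the schemas live in executor.py next to MyExecutor
from executor import Prompt, Generation

# Single service: run this while the Deployment is serving on port 12345
client = Client(port=12345)
response = client.post('/', inputs=[Prompt(text='hello')], return_type=DocList[Generation])

# Pipeline: StableLM and TextToImage are Executor classes you define yourself
# (placeholders here); run this as a separate script from the client code
flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)
with flow:
    flow.block()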
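
To make "pass through them in order" concrete, here is a plain-Python sketch of the chaining idea. It is a toy model only, not how Flow works internally; `MiniFlow`, `run`, `generate_text`, and `text_to_image` are hypothetical names, and the steps are simple string transforms standing in for real Executors.

```python
from typing import Callable, List

Step = Callable[[List[str]], List[str]]


class MiniFlow:
    """Toy pipeline: each added step receives the previous step's output."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add(self, step: Step) -> "MiniFlow":
        self.steps.append(step)
        return self  # returning self allows chained .add() calls

    def run(self, docs: List[str]) -> List[str]:
        # Apply the steps in the order they were added.
        for step in self.steps:
            docs = step(docs)
        return docs


def generate_text(docs: List[str]) -> List[str]:
    # Hypothetical stand-in for a text-generation step.
    return [f"{d} -> story" for d in docs]


def text_to_image(docs: List[str]) -> List[str]:
    # Hypothetical stand-in for a text-to-image step.
    return [f"image({d})" for d in docs]


pipeline = MiniFlow().add(generate_text).add(text_to_image)
result = pipeline.run(["a cat"])
```

The second step only ever sees the first step's output, so the order of the add() calls defines the pipeline.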

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

Serving an LLM or other model as a gRPC, HTTP, or WebSocket API with token-by-token streaming output
Building multi-step AI pipelines, such as text generation followed by text-to-image, by chaining Executors into a Flow
Scaling a model service with replicas, shards, and dynamic batching to handle higher request volume
Deploying AI services to production by exporting to Kubernetes or Docker Compose, or shipping with one command to Jina AI Cloud
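
Token-by-token streaming, the first use case above, means the caller receives output incrementally instead of waiting for the full response. The sketch below shows that pattern with a standard-library async generator. It is a conceptual illustration rather than Jina's API; `stream_tokens` and `collect` are hypothetical names, and splitting on whitespace stands in for a model emitting tokens.

```python
import asyncio
from typing import AsyncIterator, List


async def stream_tokens(prompt: str) -> AsyncIterator[str]:
    # Yield one token at a time instead of returning the full text at once.
    for token in prompt.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def collect(prompt: str) -> List[str]:
    received: List[str] = []
    async for token in stream_tokens(prompt):
        received.append(token)  # a client could render each token here
    return received


tokens = asyncio.run(collect("streaming keeps the interface responsive"))
```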

How Jina compares

Jina alongside other open-source app-framework tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
LangChain	★ 140k	A widely used Python and JavaScript framework for building LLM applications by composing models, prompts, tools, retrievers, and memory into chains.
LlamaIndex	★ 50.2k	A data framework for connecting language models to your own documents and data sources, with built-in agent and retrieval (RAG) tooling.
Haystack	★ 25.6k	An orchestration framework from deepset for building modular LLM pipelines and agents for search, RAG, and question answering.
Jina	★ 21.9k	Build and serve AI services that talk over gRPC, HTTP, and WebSockets
Prompt Flow	★ 11.2k	Microsoft's toolkit for building LLM apps as executable flows that link prompts, Python code, and tools, with tracing, batch evaluation, and deployment.
