AI/TLDR

Jina

Build and serve AI services that talk over gRPC, HTTP, and WebSockets

Overview

Jina-serve is an open-source framework for building and deploying AI services that communicate over gRPC, HTTP, and WebSockets. It lets you focus on your core model logic while it handles the serving layer, so you can move the same code from local development to production.

You write your logic inside Executors that process Documents, connect them through a Gateway, and serve them as Deployments. To build a multi-step pipeline, you chain Executors together into a Flow. The framework adds scaling, streaming, and dynamic batching, plus built-in Docker, Kubernetes, and cloud deployment options.

What it does

  • Native support for major ML frameworks and data types, with DocArray-based input and output using BaseDoc and DocList
  • High-performance serving over gRPC, HTTP, and WebSockets with replicas, shards, and dynamic batching for higher throughput
  • LLM serving with token-by-token streaming output for responsive applications
  • Built-in Docker integration and an Executor Hub for sharing and pulling containerized services
  • Export to Kubernetes manifests or Docker Compose files for production deployment
  • One-command deployment to Jina AI Cloud (JCloud)
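
To make the dynamic batching idea above concrete, here is a minimal standard-library sketch. It is not Jina's implementation or API: `DynamicBatcher`, `fake_model`, and `preferred_batch_size` are hypothetical names chosen for illustration. The sketch assumes the simplest policy, which is to queue single requests and run the model once per full batch, with an explicit `flush()` standing in for the timeout a real server would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DynamicBatcher:
    """Queue single requests and run the model once per batch (conceptual sketch)."""

    model_fn: Callable[[List[str]], List[str]]  # processes a whole batch in one call
    preferred_batch_size: int = 4               # flush once this many requests are queued
    _pending: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)
    calls: int = 0                              # how many times the model was invoked

    def submit(self, request: str) -> None:
        self._pending.append(request)
        if len(self._pending) >= self.preferred_batch_size:
            self.flush()

    def flush(self) -> None:
        """Run the model on whatever is queued; a real server would also do this on a timeout."""
        if not self._pending:
            return
        self.results.extend(self.model_fn(self._pending))
        self.calls += 1
        self._pending = []


def fake_model(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for an expensive model call
    return [text.upper() for text in batch]


batcher = DynamicBatcher(model_fn=fake_model, preferred_batch_size=4)
for i in range(10):
    batcher.submit(f"request-{i}")
batcher.flush()  # simulate the timeout firing for the two leftover requests

print(batcher.calls)         # 3 model calls for 10 requests
print(len(batcher.results))  # 10
```

The point of the pattern is that ten requests cost three model calls instead of ten, which is where the throughput gain comes from when the model handles batches efficiently.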

Getting started

Jina-serve is a Python package installed from PyPI. The example below builds a simple service from an Executor, serves it as a Deployment, then calls it with the client.

Install Jina

Install the jina package from PyPI. Separate setup guides are available for Apple Silicon and Windows.

bash
pip install jina

Write an Executor

Define your data schemas with BaseDoc and put your model logic inside an Executor method marked with the @requests decorator. The method receives and returns a DocList of Documents.

python
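from jina import Executor, requests
from docarray import DocList, BaseDoc

# Input schema: one prompt per Document
class Prompt(BaseDoc):
    text: str

# Output schema: the original prompt plus the generated text
class Generation(BaseDoc):
    prompt: str
    text: str

class MyExecutor(Executor):
    # @requests registers this method as the request handler
    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        ...  # model logic goes here (elided in the source)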

Serve it as a Deployment

Wrap your Executor in a Deployment, choose a port, and call block() to keep the service running.

python
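from jina import Deployment
from executor import MyExecutor  # the Executor written in the previous step

# timeout_ready=-1 waits indefinitely for the Executor to finish loading
dep = Deployment(uses=MyExecutor, timeout_ready=-1, port=12345)

with dep:
    dep.block()  # keep serving until interrupted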

Call the service or chain a Flow

Use the Client to send Documents to your service. To build a pipeline, add several Executors to a Flow so requests pass through them in order.

python
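from jina import Client, Flow
from docarray import DocList

# Assumes the Prompt and Generation schemas live in executor.py alongside MyExecutor
from executor import Prompt, Generation

# Single service: send one Prompt to the running Deployment
client = Client(port=12345)
response = client.post('/', inputs=[Prompt(text='hello')], return_type=DocList[Generation])

# Pipeline: run this instead of the Deployment, since it binds the same port.
# StableLM and TextToImage are not defined in this snippet; they stand for Executors you write.
flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)
with flow:
    flow.block()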
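
As a rough illustration of what "requests pass through them in order" means, the sketch below chains two stand-in steps using only the standard library. It does not use Jina: `Doc`, `generate_text`, `render_image`, and `run_pipeline` are hypothetical names, and each stage is assumed to take a list of documents and return a list of documents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    text: str


# A stage takes a list of documents and returns a list of documents
Stage = Callable[[List[Doc]], List[Doc]]


def generate_text(docs: List[Doc]) -> List[Doc]:
    # Hypothetical stand-in for a text-generation step
    return [Doc(text=f"{d.text} -> generated") for d in docs]


def render_image(docs: List[Doc]) -> List[Doc]:
    # Hypothetical stand-in for a text-to-image step
    return [Doc(text=f"image({d.text})") for d in docs]


def run_pipeline(stages: List[Stage], docs: List[Doc]) -> List[Doc]:
    """Pass the documents through each stage in the order the stages were added."""
    for stage in stages:
        docs = stage(docs)
    return docs


result = run_pipeline([generate_text, render_image], [Doc(text="hello")])
print(result[0].text)  # image(hello -> generated)
```

The output of one stage becomes the input of the next, so the order in which stages are added determines the result.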

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Serving an LLM or other model as a gRPC, HTTP, or WebSocket API with token-by-token streaming output
  • Building multi-step AI pipelines, such as text generation followed by text-to-image, by chaining Executors into a Flow
  • Scaling a model service with replicas, shards, and dynamic batching to handle higher request volume
  • Deploying AI services to production by exporting to Kubernetes or Docker Compose, or shipping with one command to Jina AI Cloud
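
The token-by-token streaming mentioned in the list above can be pictured with a plain Python generator. This is a conceptual sketch only, not Jina's streaming API: `stream_tokens` and its `delay` parameter are hypothetical, and splitting on whitespace stands in for a real tokenizer.

```python
import time
from typing import Iterator, List


def stream_tokens(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield one token at a time instead of returning the full text at the end."""
    for token in text.split():
        time.sleep(delay)  # stand-in for per-token model latency
        yield token


received: List[str] = []
for token in stream_tokens("streaming lets clients render partial output"):
    received.append(token)  # a client could display each token as soon as it arrives

print(received)
```

Because the consumer sees each token as it is produced, the first output appears after one token's worth of latency instead of after the whole generation finishes.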

How Jina compares

Jina alongside the other open-source app framework tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
LangChain | ★ 140k | A widely used Python and JavaScript framework for building LLM applications by composing models, prompts, tools, retrievers, and memory into chains.
LlamaIndex | ★ 50.2k | A data framework for connecting language models to your own documents and data sources, with built-in agent and retrieval (RAG) tooling.
Haystack | ★ 25.6k | An orchestration framework from deepset for building modular LLM pipelines and agents for search, RAG, and question answering.
Jina | ★ 21.9k | Build and serve AI services that talk over gRPC, HTTP, and WebSockets
Prompt Flow | ★ 11.2k | Microsoft's toolkit for building LLM apps as executable flows that link prompts, Python code, and tools, with tracing, batch evaluation, and deployment.