OpenAI · 2026-04-22 · notable
OpenAI Privacy Filter — Apache 2.0 PII Detection Model for On-Prem Data Pipelines
OpenAI releases an Apache 2.0 PII detection model: 1.5B params, 50M active, 128k context, detects 8 PII categories via BIOES token classification. Designed for on-prem data sanitization — no data leaves your stack.
A small, fast Apache 2.0 model from OpenAI for detecting and masking PII in text — designed to run entirely on-premises.
Key specs
| Spec | Value |
|---|---|
| Parameters | 1.5B total, 50M active |
| Context window | 128,000 tokens |
| PII categories | 8 |
| GitHub stars | 204 |
| Hugging Face likes | 168 |
What is it?
OpenAI Privacy Filter is a token-classification model that identifies 8 types of personally identifiable information — names, private addresses, emails, phone numbers, account numbers, URLs, dates, and secrets — in long text documents. It outputs BIOES span labels (Begin, Inside, Outside, End, Single) and processes sequences in a single forward pass. Apache 2.0 licensed and designed to run on-prem without sending data to any external API.
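To make the BIOES scheme concrete, here is a minimal sketch of turning a tag sequence into spans. The label names (`NAME`, `EMAIL`) and the `bioes_to_spans` helper are illustrative assumptions, not the model's actual label set or API.

```python
from typing import List, Tuple

def bioes_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIOES tags into (start, end_exclusive, label) token spans."""
    spans = []
    start = None   # index where the currently open span began
    label = None   # category of the currently open span
    for i, tag in enumerate(tags):
        prefix, _, cat = tag.partition("-")
        if prefix == "S":                       # single-token span
            spans.append((i, i + 1, cat))
            start, label = None, None
        elif prefix == "B":                     # open a new multi-token span
            start, label = i, cat
        elif prefix == "I" and start is not None and cat == label:
            continue                            # span stays open
        elif prefix == "E" and start is not None and cat == label:
            spans.append((start, i + 1, cat))   # close the open span
            start, label = None, None
        else:
            # "O", or an ill-formed transition: drop any open span
            start, label = None, None
    return spans

# Hypothetical tokens and tags, for illustration only.
tokens = ["My", "name", "is", "Alice", "Smith", ",", "email", "alice@example.com"]
tags = ["O", "O", "O", "B-NAME", "E-NAME", "O", "O", "S-EMAIL"]
spans = bioes_to_spans(tags)
for start, end, label in spans:
    print(label, tokens[start:end])
```

The End and Single tags are what distinguish BIOES from plain BIO: a span boundary is marked explicitly rather than inferred from the next token.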
How does it work?
The model has 1.5B parameters, of which only 50M are active during inference, keeping latency low despite the large nominal size. A 128k-token context window lets it process long documents without chunking. Constrained Viterbi decoding selects the highest-scoring label sequence that obeys the BIOES rules, so an Inside or End tag never appears without a matching Begin. The precision/recall tradeoff is configurable at runtime via a threshold parameter.
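The constrained decoding step can be sketched in plain Python. This is a generic BIOES-constrained Viterbi under stated assumptions, not the model's own decoder: the single `NAME` category, the probabilities, and the function names are made up for illustration.

```python
import math
from typing import Dict, List

def allowed(prev: str, nxt: str) -> bool:
    """BIOES transition rule between two adjacent labels."""
    p, _, pc = prev.partition("-")
    n, _, nc = nxt.partition("-")
    if p in ("B", "I"):            # inside an open span: continue or close it
        return n in ("I", "E") and nc == pc
    return n in ("O", "B", "S")    # outside any span: stay out or open one

def constrained_viterbi(scores: List[Dict[str, float]]) -> List[str]:
    """scores[t][label] is the log-probability of label at token t."""
    labels = list(scores[0])
    # A sequence may only start with O, B-*, or S-*.
    best = {l: (scores[0][l] if l[0] in "OBS" else -math.inf) for l in labels}
    back = []
    for step in scores[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            prev_score, prev = max((best[p], p) for p in labels if allowed(p, cur))
            new_best[cur] = prev_score + step[cur]
            pointers[cur] = prev
        back.append(pointers)
        best = new_best
    # A sequence may only end with O, E-*, or S-*.
    last = max((l for l in labels if l[0] in "OES"), key=lambda l: best[l])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Made-up per-token probabilities for three tokens.
probs = [
    {"O": 0.10, "B-NAME": 0.70, "I-NAME": 0.05, "E-NAME": 0.05, "S-NAME": 0.10},
    {"O": 0.40, "B-NAME": 0.05, "I-NAME": 0.35, "E-NAME": 0.10, "S-NAME": 0.10},
    {"O": 0.20, "B-NAME": 0.05, "I-NAME": 0.05, "E-NAME": 0.60, "S-NAME": 0.10},
]
scores = [{l: math.log(p) for l, p in step.items()} for step in probs]

greedy = [max(step, key=step.get) for step in scores]   # per-token argmax
decoded = constrained_viterbi(scores)
print("greedy: ", greedy)
print("viterbi:", decoded)
```

Per-token argmax yields Begin, Outside, End here, which is not a valid span; the constrained path repairs the middle token to Inside because that is the best-scoring sequence the transition rules permit.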
Why does it matter?
Enterprises doing LLM fine-tuning or building RAG pipelines on sensitive data must strip PII before it touches a cloud model. On-prem PII detection has historically meant expensive proprietary tools or brittle regex heuristics. An Apache 2.0 model from a recognized lab that runs locally removes a real compliance barrier for teams working with healthcare, legal, or financial data.
Who is it for?
Data engineers and ML teams doing LLM fine-tuning or RAG on enterprise data with PII exposure constraints.
Try it
pip install transformers torch
python -c "from transformers import pipeline; p = pipeline('token-classification', model='openai/privacy-filter'); print(p('My name is Alice Smith'))"
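Once spans are detected, masking is a small post-processing step. The sketch below assumes each detection is a dict with `entity_group`, `start`, and `end` character offsets, which is the shape the Transformers token-classification pipeline returns when an `aggregation_strategy` is set; whether this model's output matches that shape is an assumption, and `mask_pii`, `span`, and the label names are hypothetical.

```python
from typing import Dict, List

def mask_pii(text: str, entities: List[Dict]) -> str:
    """Replace each detected character span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

def span(text: str, value: str, label: str) -> Dict:
    """Build a detection dict by locating value in text (demo helper only)."""
    start = text.index(value)
    return {"entity_group": label, "start": start, "end": start + len(value)}

text = "My name is Alice Smith, reach me at alice@example.com"
# Hand-built detections standing in for real model output.
entities = [
    span(text, "Alice Smith", "NAME"),
    span(text, "alice@example.com", "EMAIL"),
]
masked = mask_pii(text, entities)
print(masked)
```

Replacing from the end of the string backward avoids recomputing offsets after each substitution, which matters once documents contain many spans.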