Frequently asked questions

What is Prodigy primarily used for?

Prodigy is primarily used for efficient data annotation in machine learning workflows, especially for tasks requiring custom interfaces, active learning, and integration into Python-based pipelines.

Does Prodigy offer a free tier?

No, Prodigy does not offer a free tier. It is a commercial product with a perpetual license model.

What programming languages does Prodigy support?

Prodigy is primarily designed for interaction via Python, allowing users to script annotation workflows and integrate with Python-based machine learning libraries.

How does Prodigy support active learning?

Prodigy supports active learning by allowing users to integrate machine learning models to pre-label data, suggest examples, or prioritize which data points human annotators should review next, aiming to reduce manual effort.

What types of data can be annotated with Prodigy?

Prodigy can be used to annotate various data types, including text, images, and audio, making it suitable for a wide range of NLP, computer vision, and speech-related tasks.

Is Prodigy an open-source tool?

No, Prodigy is a commercial, proprietary tool. Its source code is not publicly available.

How is Prodigy priced?

Prodigy is priced per perpetual license, with separate tiers for personal and team use. Enterprise pricing is custom.

Prodigy — Scriptable Annotation Tool for Machine Learning

Overview

Prodigy is a commercial annotation tool developed by Explosion AI, designed to help machine learning practitioners prepare training data for AI models. First released in 2017, it provides a highly customizable and scriptable environment for data labeling and is particularly suited to active learning workflows. Unlike traditional annotation platforms that rely on graphical user interfaces for all tasks, Prodigy emphasizes programmatic control through Python, allowing users to integrate annotation steps directly into their existing machine learning pipelines.

The tool is optimized for scenarios where rapid iteration and custom annotation interfaces are required. Developers can define their own annotation tasks, integrate custom models for pre-labeling or active learning suggestions, and manage datasets programmatically. This approach aims to reduce the overhead associated with manual data labeling by leveraging machine learning models to assist human annotators, thereby accelerating the data collection and model training loop. Prodigy supports a range of data types, including text, images, and audio, and offers built-in recipes for common natural language processing (NLP) tasks such as named entity recognition (NER), text classification, and sentiment analysis.

Prodigy's architecture is built around a command-line interface (CLI) and a web-based annotation interface that streams data to annotators. This design allows for flexible deployment, from local development environments to cloud-based setups. Its strengths lie in its extensibility and its focus on the developer experience, enabling machine learning engineers and data scientists to maintain control over the annotation process. For teams requiring a highly tailored and integrated data labeling solution, especially those already proficient in Python, Prodigy offers a scalable option for managing annotation projects and improving model performance through targeted data acquisition.

While other tools like Label Studio offer open-source alternatives for data labeling, Prodigy differentiates itself through its emphasis on active learning and its Python-first approach, which can streamline the integration of human-in-the-loop processes into automated ML pipelines.

Key features

Scriptable Annotation Workflows: Define and control annotation tasks entirely through Python scripts, allowing for deep customization and integration into existing ML pipelines.
Active Learning Support: Integrate machine learning models to suggest labels, filter data, or prioritize examples for human annotation, reducing the amount of manual labeling required.
Customizable Web Interface: Tailor the annotation interface to specific task requirements, including custom components for displaying data and collecting annotations.
Command-Line Interface (CLI): Manage datasets, run annotation sessions, and export data using a set of command-line tools.
Real-time Feedback: Provides immediate feedback on annotation quality and progress, helping to maintain consistency and efficiency.
Multi-modal Data Support: Handles various data types, including text, images, audio, and video, for diverse machine learning applications.
Pre-built Recipes: Offers ready-to-use recipes for common NLP tasks (e.g., NER, text classification, sentiment analysis) and other domains, accelerating project setup.
Database Integration: Stores annotations in a local database (SQLite by default) or integrates with other databases for larger projects.
Version Control for Data: Facilitates tracking and managing different versions of annotated datasets.
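
To make the Active Learning Support item above more concrete, here is a minimal, standalone sketch of uncertainty sampling: examples whose predicted score is closest to 0.5 are shown to the annotator first. It does not use Prodigy's own API. The `toy_score` function is a hypothetical stand-in for a real model, and `rank_by_uncertainty` is an illustrative helper name, not part of Prodigy.

```python
from typing import Callable, Dict, List


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a model's P(positive); not a real classifier."""
    positive = {"great", "fantastic", "love"}
    negative = {"hated", "awful", "boring"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    # Smoothed ratio: texts with no cue words (or balanced cues) score 0.5.
    return (pos + 1) / (pos + neg + 2)


def rank_by_uncertainty(
    tasks: List[Dict], score: Callable[[str], float]
) -> List[Dict]:
    """Order tasks so the ones the model is least sure about come first."""
    scored = [(abs(score(t["text"]) - 0.5), i, t) for i, t in enumerate(tasks)]
    # Sort by distance from 0.5; the original index breaks ties stably.
    scored.sort(key=lambda item: (item[0], item[1]))
    return [task for _, _, task in scored]


tasks = [
    {"text": "This movie was fantastic!"},
    {"text": "I hated the ending."},
    {"text": "Neutral feelings about this."},
    {"text": "Great cast, boring plot."},
]
ranked = rank_by_uncertainty(tasks, toy_score)
for task in ranked:
    print(round(toy_score(task["text"]), 2), task["text"])
```

In a real project the scoring function would be a trained model, and the ranked stream would be fed to the annotation interface instead of printed.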

Pricing

Prodigy is a commercial product with a perpetual license model. Pricing is structured based on the type of license and the number of users. As of May 2026, the pricing details are as follows:

License Type	Price	Description
Prodigy Personal	$390	For individual use, includes all features and perpetual license.
Prodigy Team	$390 per user	For teams, priced per user, includes all features and perpetual license.
Prodigy Enterprise	Custom	For larger organizations, includes dedicated support, custom licensing, and advanced features. Contact sales for pricing.

For the most current pricing information, refer to the official Prodigy pricing page.

Common integrations

spaCy: Deep integration with the spaCy NLP library for pre-trained models, custom components, and efficient text processing. Refer to the Prodigy spaCy integration documentation.
Hugging Face Transformers: Can be used with models from the Hugging Face Transformers library for various NLP tasks, leveraging their pre-trained models for active learning.
PyTorch/TensorFlow: Direct integration with custom models built using popular deep learning frameworks like PyTorch or TensorFlow for active learning loops and model-assisted annotation.
Custom Python Scripts: Designed to integrate seamlessly with any Python-based script or library, allowing users to incorporate Prodigy into existing data pipelines.
Databases (e.g., SQLite, PostgreSQL): Stores annotation data, allowing for integration with various database systems for data management and export.
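
As a rough illustration of how a custom Python script can feed model-assisted annotation, the sketch below pre-labels raw text and writes tasks in the JSON format Prodigy uses for span annotations (a `text` field plus `spans` with character offsets and a `label`). The regular expression is a hypothetical stand-in for a real model such as a spaCy or Transformers pipeline, and the `prelabel` and `to_jsonl` helpers are illustrative names, not Prodigy functions.

```python
import json
import re
from typing import Dict, Iterable, Iterator

# Hypothetical rule standing in for a real model: tag @handles as USER.
HANDLE = re.compile(r"@\w+")


def prelabel(texts: Iterable[str], label: str = "USER") -> Iterator[Dict]:
    """Yield one task per text with suggested spans as character offsets."""
    for text in texts:
        spans = [
            {"start": match.start(), "end": match.end(), "label": label}
            for match in HANDLE.finditer(text)
        ]
        yield {"text": text, "spans": spans}


def to_jsonl(tasks: Iterable[Dict]) -> str:
    """Serialize tasks as newline-delimited JSON, one task per line."""
    return "\n".join(json.dumps(task) for task in tasks)


prelabeled = list(prelabel(["Thanks @alice and @bob!", "No mentions here."]))
print(to_jsonl(prelabeled))
```

The resulting JSONL can be saved to a file and used as the source of an annotation session, so annotators correct suggestions instead of labeling from scratch.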

Alternatives

Label Studio: An open-source data labeling tool that supports a wide range of data types and customizable annotation interfaces.
Scale AI: A platform offering human-powered data annotation services and tools for various AI applications.
Figure Eight (Appen): A comprehensive data annotation platform providing both tools and managed services for data labeling.
SuperAnnotate: An end-to-end platform for data annotation and dataset management, focusing on computer vision and NLP.
CVAT (Computer Vision Annotation Tool): An open-source web-based annotation tool primarily for computer vision tasks, supporting bounding boxes, polygons, and more.

Getting started

To get started with Prodigy, you typically install it via pip after purchasing a license. Once installed, you can begin by creating a simple annotation recipe. Here's an example of how to set up a basic text classification task using Prodigy to label tweets as positive or negative:

# 1. Save this as a Python file, e.g., classify_tweets.py
# The recipe has its own name so it does not clash with Prodigy's built-in textcat.manual.

import prodigy
from prodigy.components.loaders import JSONL  # check your Prodigy version's docs for the current loader API


@prodigy.recipe(
    "tweets.textcat",
    dataset=("The dataset to save to", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("Comma-separated label(s) to add to the annotation interface", "option", "l", str),
    exclude=("Comma-separated dataset(s) to exclude from the current dataset", "option", "e", str),
)
def tweets_textcat(dataset: str, source: str, label: str = None, exclude: str = None):
    """Manually annotate text classification tasks."""
    labels = label.split(",") if label else []

    def add_options(stream):
        # The choice interface reads its answer options from each task.
        for task in stream:
            task["options"] = [{"id": name, "text": name} for name in labels]
            yield task

    stream = add_options(JSONL(source))  # Load data from a JSONL file

    return {
        "dataset": dataset,
        "view_id": "choice",  # Built-in multiple-choice interface
        "stream": stream,
        "exclude": exclude.split(",") if exclude else None,  # Datasets whose examples to skip
        "config": {
            "choice_style": "single",  # Use "multiple" for multi-label classification
            "exclude_by": "input",  # Exclude previously annotated examples based on their input hash
        },
    }

# 2. Create a tweets.jsonl file with data to annotate:
# {"text": "This movie was fantastic!"}
# {"text": "I hated the ending."}
# {"text": "Neutral feelings about this."}

# 3. Run Prodigy from your terminal, pointing -F at the recipe file:
# prodigy tweets.textcat my_tweet_annotations tweets.jsonl --label POSITIVE,NEGATIVE -F classify_tweets.py

# This command will start a local web server (usually on http://localhost:8080)
# where you can access the annotation interface.
# After annotating, you can export your data:
# prodigy db-out my_tweet_annotations > annotated_tweets.jsonl

This example defines a custom recipe with the @prodigy.recipe decorator, loads data with the JSONL loader, attaches the label options to each task, and renders them with the built-in choice interface. The --label argument specifies the categories annotators can choose from, and -F tells Prodigy which file contains the custom recipe. After running the command, Prodigy launches a web server that provides a user interface for labeling. Annotated data can then be exported for model training or further analysis.
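
Once annotations have been exported with db-out, they usually need a small conversion step before training. The sketch below is one way to do that, assuming each exported record carries the original `text`, an `answer` of accept, reject, or ignore, and, for choice-style tasks, an `accept` list holding the selected option IDs. The `load_labeled` helper and the sample records are hypothetical and only illustrate the shape of the data.

```python
import json
from typing import Iterable, List, Tuple


def load_labeled(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn exported JSONL lines into (text, label) training pairs."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Keep only examples the annotator explicitly accepted.
        if record.get("answer") != "accept":
            continue
        selected = record.get("accept") or []
        # Single-label setup: skip records without exactly one selection.
        if len(selected) == 1:
            examples.append((record["text"], selected[0]))
    return examples


# Hypothetical sample of exported records, for illustration only.
sample_export = [
    '{"text": "This movie was fantastic!", "accept": ["POSITIVE"], "answer": "accept"}',
    '{"text": "I hated the ending.", "accept": ["NEGATIVE"], "answer": "accept"}',
    '{"text": "Neutral feelings about this.", "accept": [], "answer": "ignore"}',
]
training_pairs = load_labeled(sample_export)
print(training_pairs)
```

To use it on a real export, pass an open file handle instead of the sample list, for example `load_labeled(open("annotated_tweets.jsonl", encoding="utf-8"))`.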
