Overview

Together AI is a cloud platform for deploying, fine-tuning, and training open-source large language models (LLMs). Founded in 2022, it offers infrastructure that supports the full lifecycle of AI model development, from initial research and experimentation to scalable production inference. It is designed for developers and technical buyers who need access to a variety of pre-trained open-source models and the ability to customize them for specific applications.

The platform's core offerings start with the Together Inference API, which lets users run predictions on a range of open-source models such as Llama 2 and Mixtral; the company reports performance optimizations for these hosted models. Typical use cases include content generation, summarization, chatbots, and code completion. The Together Fine-tuning API provides tools for adapting these base models with custom datasets, so enterprises can improve model performance and relevance for a particular domain or task without training a model from scratch. Fine-tuning matters most for applications that need high accuracy on specialized knowledge or a specific output style. For more advanced use cases, the Together Training Platform offers resources for training entirely new AI models.

Together AI positions itself as a cost-effective solution for LLM inference and development, employing a pay-as-you-go pricing model based on token usage and GPU-hours for training and fine-tuning. This structure can benefit users by aligning costs directly with consumption, which is particularly relevant for projects with fluctuating demands or those in early development stages. The platform emphasizes developer experience, offering a straightforward REST API and Python SDK to facilitate integration and model management. Its focus on open-source models aligns with a growing trend towards more transparent and customizable AI solutions, giving users greater control over their models and data. For example, open-source models like Meta's Llama 2 provide foundational capabilities that can be extended, as detailed in the Meta AI Llama 2 research paper.

The platform is suited for a range of applications, including deploying custom chatbots, building intelligent search engines, creating content generation pipelines, and supporting AI research efforts. By providing access to high-performance inference and fine-tuning capabilities, Together AI enables organizations to integrate advanced AI functionalities into their products and services without incurring the substantial infrastructure costs associated with managing dedicated GPU clusters. Compliance with standards like SOC 2 Type II indicates an adherence to security and availability best practices, which is relevant for enterprise adoption. The platform's commitment to supporting the open-source AI ecosystem also positions it as a resource for innovation in AI model development.

Key features

  • Together Inference API: Provides access to a catalog of open-source large language models for real-time inference, including models like DeepSeek-Coder and Llama 3, which the platform describes as optimized for performance and low latency.
  • Together Fine-tuning API: Allows users to customize pre-trained open-source models with their own datasets to improve performance on specific tasks or domains.
  • Together Training Platform: Offers infrastructure and tools for training custom AI models from scratch, providing flexibility for specific research or application needs.
  • Broad Model Support: Supports a variety of open-source models from developers such as Meta, Mistral AI, and DeepSeek, allowing users to select models based on their performance and architectural requirements.
  • Pay-as-You-Go Pricing: Billing is based on token usage for inference and GPU-hours for fine-tuning and training, with volume discounts available for higher consumption.
  • Python SDK and REST API: Offers programmatic access to platform features through a dedicated Python SDK and a standard RESTful API, simplifying integration into existing workflows, as described in the Together AI Inference API reference.
  • SOC 2 Type II Compliance: Indicates that the platform's controls have been assessed over time against SOC 2 criteria, which span security, availability, processing integrity, confidentiality, and privacy. This is important for enterprise use cases.
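The Fine-tuning API entry above mentions adapting models with custom datasets. As a rough illustration of the preparation step, the sketch below writes text examples to a JSON Lines file, one JSON object per line. The "text" field name and the helper name write_training_file are assumptions made for this example, not a confirmed schema; check the fine-tuning documentation for the format the platform actually expects.

```python
import json


def write_training_file(examples, path):
    """Write one JSON object per line and return the number of lines written.

    The {"text": ...} record shape is an assumption for illustration only.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for example in examples:
            text = example.strip()
            if not text:
                continue  # skip empty examples rather than writing blank records
            handle.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical domain-specific examples.
    samples = [
        "Q: What is our refund window? A: 30 days from delivery.",
        "Q: Do you ship internationally? A: Yes, to selected countries.",
    ]
    written = write_training_file(samples, "train.jsonl")
    print(f"Wrote {written} examples to train.jsonl")
```

Keeping each example on its own line makes the file easy to validate and split before uploading it anywhere.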

Pricing

Together AI operates on a pay-as-you-go model, where costs are primarily determined by the number of input and output tokens for inference and the GPU-hours consumed for fine-tuning and training tasks. Volume discounts are available for higher usage tiers.

Together AI Inference Pricing (as of 2026-06-15)
Model                                  Input price (per 1M tokens)  Output price (per 1M tokens)
Together-Mixtral-8x7B-Instruct-v0.1    $0.35                        $0.35
Llama-3-8B-Instruct                    $0.30                        $0.30
Llama-3-70B-Instruct                   $0.59                        $0.79
Qwen/Qwen1.5-72B-Chat                  $0.60                        $0.70
DeepSeek/deepseek-coder-33b-instruct   $0.30                        $0.30

GPU-hour rates for fine-tuning and training vary by GPU type and duration. For current pricing, including specific GPU rates and volume discount tiers, refer to the Together AI pricing page.
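To make the token-based pricing concrete, here is a small worked calculation using prices copied from the table above. The workload figures and the function name estimate_inference_cost are hypothetical, and the estimate ignores volume discounts and any GPU-hour charges.

```python
# Per-1M-token prices in USD, copied from the pricing table: (input, output).
PRICES_PER_MILLION = {
    "Llama-3-8B-Instruct": (0.30, 0.30),
    "Llama-3-70B-Instruct": (0.59, 0.79),
    "Qwen/Qwen1.5-72B-Chat": (0.60, 0.70),
}


def estimate_inference_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of an inference workload for one model."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    cost = estimate_inference_cost("Llama-3-70B-Instruct", 2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.3f}")
```

For this workload the input side costs 2 x $0.59 = $1.18 and the output side costs 0.5 x $0.79 = $0.395, for a total of about $1.575.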

Common integrations

  • Python Applications: Developers can integrate Together AI's APIs into Python-based applications using the official Together AI Python SDK for inference and fine-tuning.
  • cURL: The platform's REST API can be accessed directly via cURL commands, allowing integration into shell scripts or any environment capable of making HTTP requests.
  • LangChain and LlamaIndex: Because the platform exposes a standard API interface, it can be used from popular LLM orchestration frameworks. LangChain, for example, provides an LLM integration for Together AI, which is useful when building more complex agent systems.
  • Custom Applications: Any application capable of sending HTTP requests can interact with the Together AI REST API, enabling integration with web applications (e.g., Flask, FastAPI, Django) or backend services written in various programming languages.
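The cURL and custom-application entries above both come down to sending an authenticated HTTP request. The sketch below shows one way this might look using only the Python standard library. The endpoint URL, the Bearer authorization header, the payload field names, and the TOGETHER_API_KEY environment variable are assumptions made for this example; confirm them against the Together AI API reference before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint; verify against the official API reference.
API_URL = "https://api.together.xyz/v1/completions"


def build_completion_request(api_key, prompt, model, max_tokens=100):
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Read the key from the environment instead of hard-coding it.
    request = build_completion_request(
        os.environ["TOGETHER_API_KEY"],
        "Explain HTTP status code 429 in one sentence.",
        "mistralai/Mistral-7B-Instruct-v0.2",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        print(json.load(response))
```

Separating request construction from sending keeps the network call in one place, which makes the request easy to inspect or unit test without an API key.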

Alternatives

  • Anyscale: Offers a platform for building and deploying AI applications, including LLMs, with a focus on Ray for scalable computing.
  • Perplexity AI: Provides an AI-powered answer engine and API, primarily focused on search and information synthesis.
  • Replicate: A platform for running and fine-tuning open-source machine learning models, similar to Together AI in its focus on model deployment.
  • OpenAI Platform: Offers access to proprietary models like GPT-4 and GPT-3.5 via API, along with fine-tuning capabilities.
  • Google AI Studio: Provides access to Google's Gemini models and other generative AI tools for development and deployment.

Getting started

To begin using Together AI for inference, you typically need to obtain an API key and then make a request to the inference endpoint. The following Python example demonstrates how to make a basic inference call to the Together Inference API using the together Python client to generate text with a specified model.

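# Note: this example uses the module-level interface (together.api_key,
# together.Complete.create) from earlier releases of the together package.
# Newer releases may expose a different client interface, so check the
# documentation for the version you have installed.
import together

together.api_key = "YOUR_API_KEY"  # Replace with your actual API key

def generate_text(prompt, model="mistralai/Mistral-7B-Instruct-v0.2", max_tokens=100):
    """Send a completion request and return the generated text, or an error message."""
    try:
        response = together.Complete.create(
            prompt=prompt,
            model=model,
            max_tokens=max_tokens,
            temperature=0.7,       # sampling temperature
            top_p=0.7,             # nucleus sampling threshold
            top_k=50,              # sample only from the 50 most likely tokens
            repetition_penalty=1   # 1 applies no repetition penalty
        )
        # The generated text is nested under output -> choices in the response.
        return response['output']['choices'][0]['text']
    except Exception as e:
        # Report the failure as a string so the caller can print it.
        return f"An error occurred: {e}"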

if __name__ == "__main__":
    user_prompt = "Explain the concept of quantum entanglement in simple terms."
    generated_content = generate_text(user_prompt)
    print(f"Prompt: {user_prompt}")
    print(f"Generated Content: {generated_content}")

    print("\n--- Another example ---")
    code_prompt = "Write a Python function to calculate the factorial of a number."
    code_content = generate_text(code_prompt, model="deepseek-ai/deepseek-coder-33b-instruct")
    print(f"Prompt: {code_prompt}")
    print(f"Generated Content: {code_content}")

This example first imports the together library and sets the API key. The generate_text function then uses together.Complete.create to send a request with a specified prompt, model, and generation parameters like max_tokens and temperature. The response object is then parsed to extract the generated text. This foundational code can be extended to handle more complex interactions, such as conversational AI, structured data generation, or integrating with web frameworks like FastAPI for building web services.
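As one illustration of the conversational extension mentioned above, the sketch below keeps a running history and flattens it into a single prompt on each turn. The names build_chat_prompt and chat_turn and the "User:" / "Assistant:" transcript layout are hypothetical. Many instruction-tuned models expect their own chat template, so treat the layout as a placeholder. The generate argument stands for any function that maps a prompt string to generated text, such as generate_text above.

```python
def build_chat_prompt(history, user_message):
    """Flatten prior turns plus the new message into a single prompt string."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)


def chat_turn(history, user_message, generate):
    """Run one conversational turn; `history` is updated in place."""
    prompt = build_chat_prompt(history, user_message)
    reply = generate(prompt).strip()
    history.append(("User", user_message))
    history.append(("Assistant", reply))
    return reply


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without network access.
    def fake_generate(prompt):
        return f" (reply to a {len(prompt)}-character prompt)"

    history = []
    print(chat_turn(history, "Hello!", fake_generate))
    print(chat_turn(history, "What did I just say?", fake_generate))
```

Passing the generator in as an argument keeps the history logic independent of any particular client library, so it can be exercised with a stub before a real model is wired in.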