Managed Inference and Agents API Model Cards
Last updated September 04, 2025
Table of Contents
Our model cards contain documentation for each available AI model.
Available Models
The Heroku Managed Inference and Agent add-on is hosted in two regions: us
and eu
. However, the add-on can be provisioned and accessed from apps in any Heroku region.
Each region offers slightly different models.
Region: us
Model Documentation | Type | API Endpoint | Model Source | Description |
---|---|---|---|---|
Claude 4 Sonnet | text → text |
/v1/chat/completions | Anthropic | A state-of-the-art large language model (LLM) that supports chat and tool-calling. |
Claude 3.7-sonnet | text → text |
/v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat and tool-calling. |
Claude 3.5 Sonnet Latest | text → text |
/v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat and tool-calling. |
Claude 3.5 Haiku | text → text |
/v1/chat/completions | Anthropic | A faster, more affordable LLM that supports chat and tool-calling. |
Amazon Nova Lite | text → text |
/v1/chat/completions | Amazon | A fast and cost-effective LLM. |
Amazon Nova Pro | text → text |
/v1/chat/completions | Amazon | A high-performance LLM designed for complex tasks. |
OpenAI gpt-oss-120b | text → text |
/v1/chat/completions | OpenAI | An open-weight LLM that supports chat and tool-calling. |
Cohere Embed Multilingual | text → embedding |
/v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing RAG (Retrieval Augmented Generation) search. |
Stable Image Ultra | text → image |
/v1/images/generations | Stability AI | A state-of-the-art diffusion (image generation) model. |
Region: eu
Model Documentation | Type | API Endpoint | Model Source | Description |
---|---|---|---|---|
Claude 4 Sonnet | text → text |
/v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat and tool-calling. |
Claude 3.7 Sonnet | text → text |
/v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat and tool-calling. |
Claude 3 Haiku | text → text |
/v1/chat/completions | Anthropic | A faster, more affordable LLM that supports chat and tool-calling. |
Amazon Nova Lite | text → text |
/v1/chat/completions | Amazon | A fast and cost-effective LLM. |
Amazon Nova Pro | text → text |
/v1/chat/completions | Amazon | A high-performance LLM designed for complex tasks. |
Cohere Embed Multilingual | text → embedding |
/v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing RAG (Retrieval Augmented Generation) search. |