AI Models

The Heroku Managed Inference and Agent add-on supports the following models. The add-on is hosted in two regions: us and eu. However, the add-on can be provisioned and accessed from apps in any Heroku region. Select a model to view information on rate limits, prompt caching, and implementation.

Model Documentation Region Supported Inputs Supported Outputs API Endpoint Model Source Description
Claude Opus 4.5 US, EU text, image text /v1/chat/completions Anthropic A next-generation, frontier LLM that supports chat, tool-calling, autonomous coding, effort control, and enhanced reasoning.
Claude 4.5 Sonnet US, EU text, image text /v1/chat/completions Anthropic A state-of-the-art LLM optimized for enterprise apps that supports chat, tool-calling, and enhanced reasoning.
Claude 4.5 Haiku US, EU text, image text /v1/chat/completions Anthropic A state-of-the-art LLM that supports chat, tool-calling, and enhanced reasoning.
Nova 2 Lite US, EU text, image, video text /v1/chat/completions Amazon A fast and cost-effective LLM that supports conversational chat, tool-calling, and advanced reasoning with extended context.
Kimi K2 Thinking US text text /v1/chat/completions Moonshot AI An open-weight LLM that supports conversational chat, tool-calling, and chain-of-thought processing.
MiniMax M2 US text text /v1/chat/completions MiniMax An open-weight LLM that supports conversational chat, tool-calling, and programming tasks.
Qwen3 Coder 480B US text text /v1/chat/completions Qwen An open-weight LLM that supports conversational chat, tool-calling, and agentic coding.
Qwen3 235B US text text /v1/chat/completions Qwen An open-weight LLM that supports conversational chat, tool-calling, complex reasoning, and agentic coding.
Claude 4 Sonnet US, EU text, image text /v1/chat/completions Anthropic An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 3.7 Sonnet US, EU text, image text /v1/chat/completions Anthropic An intelligent and detail-oriented LLM that supports chat, tool-calling, and enhanced reasoning.
Claude 3.5 Sonnet Latest US, EU text, image text /v1/chat/completions Anthropic A fast and affordable LLM that supports chat and tool-calling.
Claude 3.5 Haiku US, EU text text /v1/chat/completions Anthropic An affordable and straightforward LLM that supports chat and tool-calling.
Claude 3 Haiku EU text, image text /v1/chat/completions Anthropic A fast and affordable LLM that supports chat and tool-calling.
Nova Lite US, EU text, image, video text /v1/chat/completions Amazon A fast and cost-effective LLM.
Nova Pro US, EU text, image, video text /v1/chat/completions Amazon A high-performance LLM designed for complex tasks.
OpenAI gpt-oss-120b US, EU text text /v1/chat/completions OpenAI An open-weight LLM that supports chat and tool-calling.
Cohere Embed Multilingual US, EU text, image embedding /v1/embeddings Cohere A state-of-the-art embedding model that supports multiple languages and can be helpful for developing RAG search.
Stable Image Ultra US, EU text image /v1/images/generations Stability AI A state-of-the-art diffusion (image generation) model.
Cohere Rerank 3.5 US, EU text score /v1/rerank Cohere A reranking model that offers enhanced reasoning, broad data compatibility, and multilingual support.
Amazon Rerank 1.0 US, EU text score /v1/rerank Amazon A reliable, high-performing reranking model backed by AWS infrastructure.