Managed Inference and Agents Add-on

Last updated May 16, 2025

Table of Contents

  • Tools
  • Benefits
  • Available Models
  • Install the CLI Plugin
  • Provision Access to an AI Model Resource
  • Language-Specific Examples
  • Call an AI Model Resource
  • Monitoring and Logging
  • Deprovisioning an AI Model Resource

The Heroku Managed Inference and Agents add-on may employ third-party generative AI models to provide the Service. Due to the nature of generative AI, the output that it generates may be unpredictable, and may include inaccurate or harmful responses. Customer assumes all responsibility for such output, including ensuring its accuracy, safety, and compliance with applicable laws and third-party acceptable use policies. For more information, please see the Heroku Notices and License Information Documentation.

The Heroku Managed Inference and Agents add-on offers an easy way to access various large foundational AI models, including supported language (chat), embedding, and diffusion (image) models.

To use these models, attach one or more model resources from the Heroku Managed Inference and Agents (MIA) add-on to your Heroku app. The add-on adds config variables to your app that allow you to call the provisioned models. You can call models with the Heroku AI CLI plugin or via direct curl requests.

All available models are hosted on Amazon Bedrock. Heroku provides an API similar to OpenAI’s to access the models.

Check out the Python, Ruby, and JavaScript (Node.js) quick start guides.

Tools

This add-on can also enable your large language models (LLMs) to run tools on Heroku automatically, with built-in support for retries and error correction. It supports both custom tools you create and built-in tools, like code execution, provided by Heroku.

When enabled, your app’s LLM calls a tool that triggers Heroku’s control loop to provision, execute, and deprovision dynos in the background, with action traces included in the model’s output.

See Heroku Tools and Working With MCP to learn more.

Benefits

Heroku owns and maintains the add-on. Your data is never sent outside of secure AWS accounts. Inference prompts and completions are only logged temporarily via Heroku Logplex, which you control.

Available Models

The following models are available.

Region: us

| Model | Type | API Endpoint | Model Source | Description |
| --- | --- | --- | --- | --- |
| claude-3-5-sonnet-latest | text → text | v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat and tool-calling. |
| claude-3-5-haiku | text → text | v1/chat/completions | Anthropic | A faster, more affordable LLM that supports chat and tool-calling. |
| claude-3-7-sonnet | text → text | v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat, tool-calling, and extended thinking. |
| cohere-embed-multilingual | text → embedding | v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. Helpful for developing Retrieval Augmented Generation (RAG) search. |
| stable-image-ultra | text → image | v1/images/generations | Stability AI | A state-of-the-art diffusion (image generation) model. |

Region: eu

| Model | Type | API Endpoint | Model Source | Description |
| --- | --- | --- | --- | --- |
| claude-3-7-sonnet | text → text | v1/chat/completions | Anthropic | A state-of-the-art LLM that supports chat, tool-calling, and extended thinking. |
| claude-3-haiku | text → text | v1/chat/completions | Anthropic | A faster, more affordable LLM that supports chat and tool-calling. |
| cohere-embed-multilingual | text → embedding | v1/embeddings | Cohere | A state-of-the-art embedding model that supports multiple languages. Helpful for developing Retrieval Augmented Generation (RAG) search. |

Install the CLI Plugin

Heroku provides an AI CLI plugin to interact with your model resources.

Install the Heroku CLI if you haven’t installed it yet. Then, install the Heroku AI plugin:

heroku plugins:install @heroku/plugin-ai

See Heroku AI CLI Plugin Command Reference for details of all plugin commands.

Provision Access to an AI Model Resource

To use a model, first create a model resource for your chosen model ($MODEL_ID) and attach it to your app ($APP_NAME).

If you don’t have an app, you can create one with heroku create <your-new-app-name>.

To view the available models, you can run heroku ai:models:list. After deciding which model you want to use, run:

heroku ai:models:create -a $APP_NAME $MODEL_ID

By default, apps in the us region can only provision us models, and apps in the eu region can only provision eu models. Similarly, Private Space apps in the oregon, virginia, and montreal regions can only provision us models by default, while apps in the remaining Private Space regions can only provision eu models. To override this behavior, use addons:create instead of ai:models:create and pass a --region flag:

heroku addons:create heroku-inference:$MODEL_ID -a $APP_NAME -- --region=us/eu

You can attach multiple model resources to a single app. By default, the first model resource you attach to an app gets the alias INFERENCE; subsequent attachments get randomized aliases, so we recommend specifying an alias with the --as flag. Our example code assumes the alias INFERENCE for chat models, EMBEDDING for the embedding model (cohere-embed-multilingual), and DIFFUSION for the image model (stable-image-ultra), so using these aliases lets you copy and paste commands directly:

heroku ai:models:create -a $APP_NAME cohere-embed-multilingual --as EMBEDDING
heroku ai:models:create -a $APP_NAME stable-image-ultra --as DIFFUSION

If you attach more than one model resource of the same type to a single app, you must specify your own aliases and replace the config var names in any example code with the resulting ones.

Model Resource Config Vars

After attaching a model resource to your app, your app has three new config variables. You can view these variables by calling heroku config -a $APP_NAME. If your app’s model resource has an alias of INFERENCE, which is the default, your three new config variables are:

INFERENCE_KEY
INFERENCE_MODEL_ID
INFERENCE_URL

To save these config variables as environment variables in your current environment, you can run:

export INFERENCE_KEY=$(heroku config:get INFERENCE_KEY -a $APP_NAME)
export INFERENCE_MODEL_ID=$(heroku config:get INFERENCE_MODEL_ID -a $APP_NAME)
export INFERENCE_URL=$(heroku config:get INFERENCE_URL -a $APP_NAME)

Or you can view and export your config vars all at once with:

eval "$(heroku config -a $APP_NAME --shell | grep '^INFERENCE_' | tee /dev/tty | sed 's/^/export /')"

In subsequent commands, you can specify your app’s <MODEL_RESOURCE> either by its --as alias, which is "INFERENCE" by default, or by its model resource slug, for example, inference-production-curved-41276. Run heroku ai:models:info -a $APP_NAME to view the slug and alias of an attached model resource.

Language-Specific Examples

We have language-specific quick start guides in Python, Ruby, and JavaScript for each of our endpoints.

Call an AI Model Resource

Via API / curl Requests

A typical model call looks like this:

curl "$INFERENCE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $INFERENCE_KEY" \
  -d '{
    "model": "'"$INFERENCE_MODEL_ID"'",
    <other model keyword arguments, varies from model to model>
  }'

The full endpoint URL varies depending on the model you’re using:

  • Chat models (claude-3-5-sonnet-latest, claude-3-5-haiku, claude-3-7-sonnet, and claude-3-haiku) use the /v1/chat/completions endpoint.
  • The embedding model cohere-embed-multilingual uses the /v1/embeddings endpoint.
  • The diffusion model stable-image-ultra uses the /v1/images/generations endpoint.

See our model cards for details on each model.
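To make the endpoint differences concrete, here is a standard-library Python sketch that builds requests for the embedding and image endpoints. The payload keys "input" and "prompt" are assumptions borrowed from OpenAI’s conventions (which this API is described as resembling), and the EMBEDDING_*/DIFFUSION_* config vars assume the aliases recommended above; confirm the exact schema against each model card.

```python
import json
import os
import urllib.request


def build_request(base_url: str, api_key: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a POST request against a Managed Inference endpoint."""
    return urllib.request.Request(
        f"{base_url}/{endpoint}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Embedding request (alias EMBEDDING). "input" is an assumed OpenAI-style key.
# The placeholder URL keeps the sketch runnable without real config vars.
embed_req = build_request(
    os.environ.get("EMBEDDING_URL", "https://example.invalid"),
    os.environ.get("EMBEDDING_KEY", ""),
    "v1/embeddings",
    {"model": os.environ.get("EMBEDDING_MODEL_ID", ""), "input": ["Hello, world"]},
)

# Image generation request (alias DIFFUSION). "prompt" is an assumed OpenAI-style key.
image_req = build_request(
    os.environ.get("DIFFUSION_URL", "https://example.invalid"),
    os.environ.get("DIFFUSION_KEY", ""),
    "v1/images/generations",
    {"model": os.environ.get("DIFFUSION_MODEL_ID", ""), "prompt": "a watercolor dyno"},
)
```

With the config vars exported as shown earlier, pass either request to urllib.request.urlopen to send it.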

We recommend using streaming for all inference requests to prevent timeouts when a request exceeds 29 seconds. Important caveat: when you combine streaming with tool calling, the MIA add-on streams a complete response after each tool call rather than incremental updates, so if any individual tool call takes 55 seconds or longer, a timeout occurs.
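As a sketch of a streamed chat call from app code, the following standard-library Python mirrors the curl example above. The "messages" and "stream" payload keys follow OpenAI’s conventions, which this API is described as resembling; treat them as assumptions and confirm the schema against the model cards.

```python
import json
import os
import urllib.request


def build_chat_request(base_url, api_key, model_id, messages, stream=True):
    """Build an OpenAI-style chat completion request.

    Setting "stream" to true is recommended to avoid the 29-second timeout.
    """
    payload = {"model": model_id, "messages": messages, "stream": stream}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only attempt a live call when the INFERENCE_* config vars are set.
if "INFERENCE_URL" in os.environ:
    req = build_chat_request(
        os.environ["INFERENCE_URL"],
        os.environ["INFERENCE_KEY"],
        os.environ["INFERENCE_MODEL_ID"],
        [{"role": "user", "content": "What is 1+2?"}],
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # streamed responses arrive incrementally, line by line
            print(line.decode().rstrip())
```

Run it on a dyno, or locally after exporting the config vars as shown earlier.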

Via Heroku AI Plugin

A typical model call looks like this:

heroku ai:models:call <MODEL_RESOURCE> -a $APP_NAME --prompt 'What is 1+2?'

See our model cards for details on each model.

Monitoring and Logging

Display stats and the current state of your model resources via the ai plugin:

heroku ai:models:info <MODEL_RESOURCE> -a $APP_NAME # model resource can be the resource ID or alias.

Deprovisioning an AI Model Resource

If you only ever use heroku ai:models:create to create and attach model resources, you can destroy a resource with heroku ai:models:destroy.

However, you can also attach a single model resource to multiple apps via heroku ai:models:attach. To destroy a model resource attached to multiple apps, first detach it from all but one app with heroku ai:models:detach and then run heroku ai:models:destroy, or skip detaching and run the destroy command with the --force flag.

Destroy an AI Model Resource

This action destroys all associated data and you can’t undo it!

To destroy an AI model resource, run:

heroku ai:models:destroy <MODEL_RESOURCE> --app $APP_NAME

When destroying a model resource, you can specify the model resource’s alias or the resource ID.

Detach an AI Model Resource

If you chose to create and then attach model resources to certain apps, you can detach an AI model resource from a specific app with:

heroku ai:models:detach <MODEL_RESOURCE> --app $APP_NAME

When detaching a model resource, you can specify the model resource’s alias or the resource ID.

This add-on bills by usage. A detached AI model resource never incurs charges, and neither does an attached model resource that you’re not actively using.

