
Managed Inference and Agents API /v1/chat/completions

Last updated May 16, 2025

Table of Contents

  • Request Body Parameters
  • extended_thinking Object
  • tools Array of Objects
  • tool_choice Object
  • messages Array of Objects
  • Request Headers
  • Response Format
  • Example Request
  • Example Response
  • Example Request with Tools
  • Example Response with Tools

The /v1/chat/completions endpoint generates conversational completions for a provided set of input messages. You can specify the model, adjust generation settings such as temperature, and opt to stream responses in real time. You can also specify tools the model can choose to call.

Selecting a chat model: for the best intelligence, we recommend a Claude Sonnet model; for cost savings and fast inference, we recommend a Claude Haiku model.

Request Body Parameters

Use parameters to manage how conversational completions are generated.

Required Parameters

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| model | string | model used for completion, typically the value of your INFERENCE_MODEL_ID config var | "claude-3-7-sonnet" |
| messages | array | array of message objects (user-assistant conversational turns) used by the model to generate the next response | [{"role": "user", "content": "Why is Heroku so awesome?"}] |

Optional Parameters

| Field | Type | Description | Default | Example |
| --- | --- | --- | --- | --- |
| extended_thinking | object | (Claude 3.7 Sonnet only) enable extended thinking to perform internal reasoning steps (see Anthropic's extended thinking docs) | null | {"enabled": true, "budget_tokens": 1024, "include_reasoning": true} |
| max_tokens | integer | maximum number of tokens the model may generate before stopping (each token typically represents around 4 characters of text); max value: 4096 for Haiku models, 8192 for Sonnet models | varies | 1024 |
| stop | array | list of strings that stop the model from generating further tokens if any of them appears in the response | null | ["foo"] |
| stream | boolean | option to stream responses incrementally via server-sent events (useful for chat interfaces and for avoiding timeout errors) | false | true |
| temperature | float | controls the randomness of the response: values closer to 0 make the response more focused by favoring high-probability tokens, while values closer to 1.0 encourage more diverse responses by sampling from a broader range of possibilities for each generated token; range: 0.0 to 1.0 | 1.0 | 0.2 |
| tool_choice | enum or object | specifies how the model uses the tools listed in tools (see tool_choice) | "auto" | "required" |
| tools | array | list of tools the model may call (see tools) | [] | refer to the JSON example in the tools section |
| top_p | float | the proportion of tokens to consider when generating the next token, in terms of cumulative probability; range: 0 to 1.0 | 0.999 | 0.95 |
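As a concrete illustration, the required and optional parameters above can be assembled into a request body with a small helper. This is a sketch with a hypothetical `build_chat_payload` function (not part of the API); the field names and ranges come from the tables above:

```python
# Sketch: build a /v1/chat/completions request body from the documented
# parameters. build_chat_payload is a hypothetical helper, not part of the API.

def build_chat_payload(model, messages, temperature=1.0, top_p=0.999,
                       max_tokens=None, stream=False, stop=None):
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1.0")
    payload = {
        "model": model,        # required, e.g. your INFERENCE_MODEL_ID config var
        "messages": messages,  # required, list of role/content message objects
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stop is not None:
        payload["stop"] = stop
    return payload

payload = build_chat_payload(
    "claude-3-7-sonnet",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=1024,
)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the request body, as shown in the curl examples later in this article.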

extended_thinking Object

Extended thinking is only supported for Claude 3.7 Sonnet. Requests that include extended_thinking for unsupported models fail. The extended_thinking object lets you request that the model use additional internal tokens for reasoning steps before producing its final output. Enabling extended thinking typically improves reasoning ability on complex tasks.

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| enabled | boolean | indicates if extended thinking is enabled | false |
| budget_tokens | integer | maximum number of internal "thinking" tokens to use during internal reasoning; must be >= 1024 and < max_tokens | null |
| include_reasoning | boolean | indicates if the model's internal reasoning trace is included in the response | false |
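A request that enables extended thinking might attach the object like this sketch does. The `with_extended_thinking` helper is hypothetical; the constraint check mirrors the budget_tokens rule in the table above:

```python
# Sketch: attach an extended_thinking object to a request payload, enforcing
# the documented constraint 1024 <= budget_tokens < max_tokens.
# with_extended_thinking is a hypothetical helper, not part of the API.

def with_extended_thinking(payload, budget_tokens, include_reasoning=False):
    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and not (1024 <= budget_tokens < max_tokens):
        raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
    payload["extended_thinking"] = {
        "enabled": True,
        "budget_tokens": budget_tokens,
        "include_reasoning": include_reasoning,
    }
    return payload

req = with_extended_thinking(
    {"model": "claude-3-7-sonnet", "max_tokens": 8192}, budget_tokens=2048
)
```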

tools Array of Objects

tools lets you provide your model with an array of tools it can choose to call. Use tool_choice to specify how the model calls tools. When tools are provided, the model may send back tool_calls in a role="assistant" generated message, asking your system to run the specified tool and return the result in a role="tool" message.

Note that these tools are given to the model in the form of an extended prompt, and no further validation is performed. Models may invent tool names that don't exist in the tools array you gave them. To guard against this, we recommend validating tool names on your end whenever a model sends back a tool_calls assistant message.
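That validation step can be sketched as follows; `valid_tool_calls` is a hypothetical helper that keeps only the calls whose names appear in the tools array you actually sent:

```python
# Sketch: filter a tool_calls array against the tools you actually provided,
# discarding any invented tool names. valid_tool_calls is a hypothetical helper.

def valid_tool_calls(tool_calls, tools):
    known = {t["function"]["name"] for t in tools if t.get("type") == "function"}
    return [c for c in tool_calls if c["function"]["name"] in known]

tools = [{"type": "function",
          "function": {"name": "get_current_weather",
                       "description": "Get the current weather in a given location",
                       "parameters": {"type": "object", "properties": {}}}}]
calls = [
    {"id": "toolu_1", "type": "function",
     "function": {"name": "get_current_weather", "arguments": "{}"}},
    {"id": "toolu_2", "type": "function",
     "function": {"name": "made_up_tool", "arguments": "{}"}},  # invented name
]
accepted = valid_tool_calls(calls, tools)
```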

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| type | enum&lt;string&gt; | type of tool; always "function" | "function" |
| function | object | details about the function to call | see the function object below |

function Object

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| description | string | description of what the function does, used by the model to choose when and how to call the function | "This function calculates X" |
| name | string | name of the function to be called | "example_function" |
| parameters | object | parameters the function accepts, described as a JSON Schema object | {"type": "object", "properties": {}} |

Example tools Array

[
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. Portland, OR"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]

tool_choice Object

The tool_choice object specifies how the model should use the provided tools.

It can be either a string (none, auto, or required) or a tool_choice object. none means the model calls no tools, auto lets the model call any number (including zero) of the provided tools, and required forces the model to call at least one tool before responding to the user.

To force the model to call a specific tool, you can specify a single tool in the tools array and pass "tool_choice": "required", or you can pass a tool_choice object that names the required function.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| type | enum&lt;string&gt; | type of tool; always "function" | "function" |
| function | object | JSON object containing the function's name | {"name": "example_function"} |
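Putting those fields together, forcing the model to call the get_current_weather function from the earlier tools example would look like this request fragment:

```json
{
  "tool_choice": {
    "type": "function",
    "function": {"name": "get_current_weather"}
  }
}
```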

messages Array of Objects

The messages field is an array of message objects.

Each message must specify a role field that determines the message's schema (see below).

The supported roles are user, assistant, system, and tool.

If the most recent message uses the assistant role, the model continues its answer starting from the content in that message.
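As an illustration of that prefill behavior, a messages array can end with a partially written assistant turn that the model continues (the message contents here are illustrative, not from the API):

```python
# Sketch: a conversation that ends with a partial assistant message.
# Because the last message has role "assistant", the model continues its
# answer from that content instead of starting a fresh reply.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "List three benefits of background workers."},
    {"role": "assistant", "content": "The three main benefits are: 1."},  # prefill
]
```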

role=user message

user messages are the primary way to send queries to your model and prompt it to respond.

| Field | Type | Description | Required | Example |
| --- | --- | --- | --- | --- |
| role | string | role of the message (user) | yes | "user" |
| content | string | contents of the user message | yes | "What is the weather?" |

role=assistant message

Typically, assistant messages are only generated by the model. However, you can create your own, or pre-fill a partially completed assistant response, to influence the content the model generates on its next turn.

| Field | Type | Description | Required | Example |
| --- | --- | --- | --- | --- |
| role | string | role of the message (assistant) | yes | "assistant" |
| content | string | contents of the assistant message | yes, unless tool_calls is specified | "Here is the information:" |
| refusal | string or null | refusal message by the assistant | no | "I can't answer that." |
| tool_calls | array | tool calls generated by the model | no | [{"id": "tool_call_12345", "type": "function", "function": {"name": "example_tool", "arguments": "{\"example_input\": 123}"}}] |

Tool Call Object

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| id | string | unique ID for the tool call | "tooluse_abc123" |
| type | string | type of call, currently always "function" | "function" |
| function | object | function call details | see tool call example |

Tool Call Example

Here’s an example of what a tool_calls array might look like when your model decides to call a tool you’ve provided via tools.

"tool_calls": [
  {
    "id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": "{\"location\":\"Portland, OR\"}"
    }
  }
]

Function Object

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| name | string | name of the tool to invoke | "your_cool_tool" |
| arguments | string | JSON-encoded string of tool arguments | "{}" |
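Because arguments arrives as a JSON-encoded string rather than an object, decode it before invoking your tool. A minimal sketch, using the tool call example above:

```python
import json

# Sketch: decode the JSON-encoded "arguments" string from a tool call
# before dispatching to the actual tool implementation.

tool_call = {
    "id": "toolu_02F9GXvY5MZAq8Lw3PTNQyJK",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": "{\"location\":\"Portland, OR\"}",
    },
}

args = json.loads(tool_call["function"]["arguments"])
```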

role=system message

A system message is a prompt prefix given to the model to help shape its responses.

| Field | Type | Description | Required | Example |
| --- | --- | --- | --- | --- |
| role | string | role of the message (system) | yes | "system" |
| content | string or array | contents of the system message | yes | "You are a helpful assistant. You favor brevity and avoid hedging. You readily admit when you don't know an answer." |

role=tool message

A tool message object lets you communicate a specified tool’s result (output) to the model.

| Field | Type | Description | Required | Example |
| --- | --- | --- | --- | --- |
| role | string | role of the message (tool) | yes | "tool" |
| content | string or array | tool call result (output) | yes | "Rainy and 84º" |
| tool_call_id | string | ID of the tool call this message is responding to | yes | "toolu_02F9GXvY5MZAq8Lw3PTNQyJK" |
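Continuing the weather example, after running your tool you'd append a role="tool" message that echoes the tool call's id, so the model can match the result to its request. A sketch with a hypothetical `tool_result_message` helper:

```python
# Sketch: wrap a tool's output in a role="tool" message. tool_call_id must
# match the id of the tool_calls entry this result answers.
# tool_result_message is a hypothetical helper, not part of the API.

def tool_result_message(tool_call_id, output):
    return {
        "role": "tool",
        "content": output,
        "tool_call_id": tool_call_id,
    }

msg = tool_result_message("toolu_02F9GXvY5MZAq8Lw3PTNQyJK", "Rainy and 84º")
```

You would then append this message to the conversation and send the updated messages array back to the endpoint so the model can produce its final answer.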

Request Headers

In the following example, we assume your model resource has an alias of “INFERENCE” (the default).

| Header | Type | Description |
| --- | --- | --- |
| Authorization | string | your AI add-on's INFERENCE_KEY value (API bearer token) |

All inference curl requests must include an Authorization header containing your Heroku Inference key.

Response Format

When a request is successful, the API returns a JSON object with the following structure:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| id | string | unique identifier for the chat completion | "chatcmpl-12345" |
| object | string | the response object type; always "chat.completion" | "chat.completion" |
| created | integer | Unix timestamp when the completion was created | 1745623456 |
| model | string | model ID used to generate the response | "claude-3-7-sonnet" |
| system_fingerprint | string | (optional) fingerprint of the system version that generated the output | "heroku-inf-abc123" |
| choices | array of objects | list of generated message choices (always length 1) | see example response |
| usage | object | token usage statistics | {"prompt_tokens": 15, "completion_tokens": 13, "total_tokens": 28} |

Choice Object

The object inside the choices array (length 1) has the following structure:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| index | integer | index of the choice; always 0 | 0 |
| message | object | generated message content | see example response |
| finish_reason | enum&lt;string&gt; | reason the model stopped; one of "stop", "length", "tool_calls" | "stop" |

Message Object

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| role | enum&lt;string&gt; | role of the message sender; one of assistant, user, system, tool | assistant |
| content | string | text content of the message | "hello! how can I help you today?" |
| reasoning | object | internal reasoning trace, generated if extended_thinking.include_reasoning is true | |
| refusal | string | (optional) refusal message if the model declines to answer | "I can't answer that." |
| tool_calls | array of objects | (optional) tool call requests generated by the model | see example response |

Reasoning Object

If extended_thinking.include_reasoning is set to true, the model returns a reasoning object inside the message.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| thinking | string | internal chain-of-thought reasoning used to form the model's response | "The user is asking about the weather. I should call the get_weather function with Portland, Oregon." |
| signature | string | cryptographic signature verifying the reasoning contents | "ErcBCkgIAxABGAIi..." |
| redacted_thinking | string | (optional, typically omitted in responses) redacted version of thinking if any parts were removed for safety or privacy | null |

Usage Object

Information about token consumption.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| prompt_tokens | integer | number of tokens used in the input prompt | 407 |
| completion_tokens | integer | number of tokens generated in the response | 107 |
| total_tokens | integer | total number of tokens used (prompt + completion) | 514 |

Example Request

Let’s walk through an example /v1/chat/completions curl request.

First, use this command to set your Heroku config vars as local environment variables:

eval $(heroku config -a $APP_NAME --shell | grep '^INFERENCE_' | sed 's/^/export /' | tee >(cat >&2))

Next, send the curl request:

curl $INFERENCE_URL/v1/chat/completions \
 -H "Authorization: Bearer $INFERENCE_KEY" \
 -d @- <<EOF | jq
{
  "model": "$INFERENCE_MODEL_ID",
  "messages": [{"role": "user", "content": "Hello"}]
}
EOF

Example Response

{
  "id": "chatcmpl-1839afa8133ceda215788",
  "object": "chat.completion",
  "created": 1745619466,
  "model": "claude-3-7-sonnet",
  "system_fingerprint": "heroku-inf-1y38gdr",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi! How can I help you today?",
        "refusal": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 12,
    "total_tokens": 20
  }
}
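When "stream": true is set, the API returns server-sent events rather than a single JSON body, with each data: line carrying a JSON chunk. A minimal, offline sketch of parsing such lines; the chunk payload shown is illustrative, so verify the actual chunk shape against a real streamed response:

```python
import json

# Sketch: parse server-sent event lines from a streamed response.
# Assumption: each event is a "data: <json>" line and the stream ends with
# "data: [DONE]". Verify against a real stream before relying on this.

def parse_sse_lines(lines):
    chunks = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunks.append(json.loads(data))
    return chunks

# Illustrative sample, not captured from the API:
sample = [
    'data: {"id": "chatcmpl-1", "object": "chat.completion.chunk"}',
    "",
    "data: [DONE]",
]
chunks = parse_sse_lines(sample)
```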

Example Request with Tools

 curl $INFERENCE_URL/v1/chat/completions \
 -H "Authorization: Bearer $INFERENCE_KEY" \
 -d @- <<EOF | jq
 {
    "model": "$INFERENCE_MODEL_ID",
    "messages": [
        {
            "role": "user",
            "content": "What's the weather like in Portland?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. Portland, OR"
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ]
}
EOF

Example Response with Tools

{
  "id": "chatcmpl-1839adcc2079997417288",
  "object": "chat.completion",
  "created": 1745617422,
  "model": "claude-3-7-sonnet",
  "system_fingerprint": "heroku-inf-1y38gdr",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'll help you check the current weather in Portland. Since Portland could refer to either Portland, Oregon or Portland, Maine, I should specify the state.\nI'll check Portland, OR as it's the larger and more commonly referenced Portland.",
        "refusal": null,
        "tool_calls": [
          {
            "id": "tooluse_aFByQsacQ_2BmYMGHvkBmg",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Portland, OR\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 407,
    "completion_tokens": 107,
    "total_tokens": 514
  }
}
