Managed Inference and Agents API /v1/agents/heroku
Last updated May 15, 2025
The /v1/agents/heroku endpoint allows you to interact with an agentic system powered by large language models (LLMs) that can autonomously invoke tools based on your messages. Unlike /chat/completions, which generates a single model response, the /v1/agents/heroku endpoint supports automatic tool execution and multistep workflows.
Request Body Parameters
Use the following parameters to manage the behavior of the agent and which tools it can use.
Required Parameters
Field | Type | Description | Example |
---|---|---|---|
model | string | model used for inference, typically the value of your INFERENCE_MODEL_ID config var | |
messages | array | array of messages used by the agent to determine its response and next actions | [{"role": "user", "content": "Check my database schema."}] |
Optional Parameters
Field | Type | Description | Default | Example |
---|---|---|---|---|
max_tokens_per_inference_request | integer | max number of tokens the model can generate during each underlying inference request before stopping (a single call to /v1/agents/heroku can include multiple underlying inference requests); max value: 4096 for Haiku models, 8192 for Sonnet models | varies | 1024 |
stop | array | list of strings that stop the model from generating further tokens if any of the strings appear in the response (for example, ["foo"] causes the model to stop generating output only if it generates the string "foo") | null | ["foo"] |
temperature | float | controls randomness of the response: values closer to 0 make responses more focused by favoring high-probability tokens, while values closer to 1.0 encourage more diverse responses by sampling from a broader range of possibilities for each generated token; range: 0.0 to 1.0 | 1.0 | 0.2 |
tools | array | list of tools the agent is allowed to use | null | see tools field in the example request |
top_p | float | specifies the proportion of tokens to consider when generating the next token, in terms of cumulative probability; range: 0 to 1.0 | 0.999 | 0.95 |
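The required and optional parameters combine into a single JSON request body. A minimal sketch in Python (the model name and message are placeholders; the actual model ID comes from your INFERENCE_MODEL_ID config var):

```python
# Sketch: assembling a /v1/agents/heroku request body with optional
# tuning parameters. Values here are illustrative, not recommendations.
payload = {
    "model": "claude-3-7-sonnet",  # typically your INFERENCE_MODEL_ID
    "messages": [
        {"role": "user", "content": "Check my database schema."}
    ],
    # Optional parameters from the table above:
    "max_tokens_per_inference_request": 1024,
    "temperature": 0.2,   # favor focused, high-probability tokens
    "top_p": 0.95,        # sample from the top 95% of cumulative probability
    "stop": ["foo"],      # halt generation if this string is produced
}
```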
tools
Array of Objects
Each tool in the array allows the agent to call an action on your behalf. Heroku automatically executes tool calls via one-off dynos. The /v1/agents/heroku endpoint currently supports two types of tools:
- heroku_tool: first-party tools that Heroku Managed Inference and Agents natively supports
- MCP tools: custom MCP tools you deploy to Heroku, which Heroku runs automatically when your model calls them. To learn how to deploy your own custom MCP tools, see Heroku MCP Tools.
Field | Type | Description | Example |
---|---|---|---|
type | enum<string> | type of tool; one of: heroku_tool, mcp | "heroku_tool" |
name | string | name of tool (see Heroku Tools for available tools) | "code_exec_ruby" |
description | string | (optional) hint text to inform the model when to use this tool | "Runs SQL query on a Heroku database" |
runtime_params | object | configuration to control automatic execution of heroku_tool and mcp tools (see runtime parameters) | |
Runtime Parameters
The /v1/agents/heroku endpoint passes certain settings to the specified mcp or heroku_tool tools at runtime. The model can't modify these settings.
Field | Type | Description | Default | Example |
---|---|---|---|---|
target_app_name | string | (required) name of Heroku app to run the tool in | | "my-heroku-app" |
dyno_size | string | dyno size to use when running the tool | "standard-1x" | "standard-1x" |
ttl_seconds | integer | max seconds a dyno is allowed to run; max: 120 | 120 | 10 |
max_calls | integer | max number of times this tool can be called during the agent loop | 3 | 1 |
tool_params | object | additional parameters for the tool (for example, cmd, db_attachment); see tool-specific docs | (varies) | {} |
mcp type tools optionally allow ttl_seconds, max_calls, and dyno_size; the other parameters aren't supported. heroku_tool type tools require or allow certain parameters depending on the tool itself. See the tool-specific docs for more information.
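Putting the runtime parameters together, a sketch of a heroku_tool entry for the dyno_run_command tool (the app name is a placeholder; tool_params shapes vary per tool, so check the tool-specific docs):

```python
# Sketch: a heroku_tool entry with runtime_params controlling
# how Heroku executes the tool in a one-off dyno.
tool = {
    "type": "heroku_tool",
    "name": "dyno_run_command",
    "runtime_params": {
        "target_app_name": "my-heroku-app",  # required: app the one-off dyno runs in
        "dyno_size": "standard-1x",
        "ttl_seconds": 120,  # maximum allowed value
        "max_calls": 3,      # cap on invocations during the agent loop
        "tool_params": {
            "cmd": "echo hello && date",  # tool-specific parameter
        },
    },
}
```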
messages
Array of Objects
The messages field is an array of message objects. Each message must specify a role field that determines the message's schema. Currently, the supported roles are user, assistant, system, and tool.
If the most recent message uses the assistant role, the model continues its answer starting from the content in that most recent message.
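For example, a sketch of a messages array that prefills a partial assistant reply so the model resumes from it on its next turn (the content is illustrative):

```python
# Sketch: prefilling a partially completed assistant message.
# Because the assistant message is last, the model continues
# its answer from that content rather than starting fresh.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe Heroku dynos in one sentence."},
    {"role": "assistant", "content": "A Heroku dyno is"},  # model resumes here
]
```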
role=user
message
user
messages are the primary way to send queries to your model and prompt it to respond.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "user" | yes | "user" |
content | string | contents of user message | yes | "What is the weather?" |
role=assistant
message
Typically, the model only generates assistant
messages. However, you can create or prefill a partially completed assistant
response to influence the content a model generates on its next turn.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "assistant" | yes | "assistant" |
content | string | contents of assistant message | yes, unless tool_calls is specified | "Here is the information" |
refusal | string or null | refusal message by assistant | no | "I cannot answer that" |
tool_calls | array | array of tool call request objects | no | [{"id": "tool_call_12345", "type": "function", "function": {"name": "my_cool_tool", "arguments": "{\"some_input\": 123}"}}] |
Tool Call Object
Represents the model’s request to execute a specific tool.
Field | Type | Description | Example |
---|---|---|---|
id | string | unique ID for the tool call | "tooluse_abc123" |
type | string | type of call; always: "function" | "function" |
function | object | function call details | see tool call example |
Tool Call Example
"tool_calls": [
{
"id": "tooluse_abc123",
"type": "function",
"function": {
"name": "dyno_run_command",
"arguments": "{}"
}
}
]
Function Object
Field | Type | Description | Example |
---|---|---|---|
name | string | name of tool to invoke | "dyno_run_command" |
arguments | string | JSON-encoded string of tool arguments | "{}" |
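Because arguments is a JSON-encoded string rather than an object, decode it before acting on a tool call. A minimal sketch:

```python
import json

# The function object carries its arguments as a JSON-encoded string,
# so decode it into a dict before dispatching to the tool.
function = {"name": "dyno_run_command", "arguments": "{}"}
args = json.loads(function["arguments"])  # -> {} (empty dict in this case)
```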
role=system
message
A system
message is a special prompt given to the model to guide its responses.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "system" | yes | "system" |
content | string | contents of system message | yes | "You are a helpful assistant. You favor brevity and avoid hedging. You readily admit when you don't know an answer." |
role=tool
message
A tool message object represents a specified tool's output.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "tool" | yes | "tool" |
content | string | output of tool call | yes | "Rainy and 84º" |
tool_call_id | string | ID of the tool call the message is responding to | yes | "toolu_02F9GXvY5MZAq8Lw3PTNQyJK" |
Request Headers
Header | Type | Description |
---|---|---|
Authorization | string | Bearer token containing your Heroku Inference API key |
All /v1/agents/heroku
requests must include the following header:
-H "Authorization: Bearer $INFERENCE_KEY"
Response Format
Agent responses are streamed back over Server-Sent Events (SSE). Each event: message
includes a JSON payload representing a completion. The final event is event: done
with the data [DONE]
.
Completion Object
Field | Type | Description | Example |
---|---|---|---|
id | string | unique ID for the agent session | "chatcmpl-abc123" |
object | enum<string> | type of completion; one of: chat.completion, tool.completion | "tool.completion" |
created | integer | Unix timestamp when the chunk was created | 1746546550 |
model | string | model ID used to generate the message | "claude-3-7-sonnet" |
system_fingerprint | string | fingerprint of the system generating the output | "heroku-inf-abc123" |
choices | array of objects | array of length 1 containing a single choice object | see example response |
usage | object | token usage statistics; empty for tool completions (no tokens consumed) | {"prompt_tokens": 15, "completion_tokens": 13, "total_tokens": 28} |
Choice Object
Field | Type | Description | Example |
---|---|---|---|
index | integer | index of the choice; always: 0 | 0 |
message | object | message content (response messages are always of role assistant or tool) | see example response |
finish_reason | enum<string> | reason the model stopped; one of: stop, length, tool_calls, "" | "tool_calls" |
Usage Object
Field | Type | Description | Example |
---|---|---|---|
prompt_tokens | integer | tokens used in prompt | 397 |
completion_tokens | integer | tokens used in response | 65 |
total_tokens | integer | sum of prompt and completion tokens | 462 |
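A minimal sketch of parsing the SSE stream into completion objects, following the event/data framing described in Response Format (the sample lines are illustrative, not a real server response):

```python
import json

def parse_sse_completions(lines):
    """Pair 'event:' and 'data:' lines from the SSE stream, decoding
    each completion payload and stopping at the done sentinel."""
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if event == "done" and data == "[DONE]":
                break  # end of stream
            yield json.loads(data)

# Illustrative stream: one chat.completion followed by the done event.
sample = [
    "event:message",
    'data:{"object": "chat.completion", "choices": []}',
    "event:done",
    "data:[DONE]",
]
completions = list(parse_sse_completions(sample))
```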
Example Request
curl --location $INFERENCE_URL/v1/agents/heroku \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $INFERENCE_KEY" \
--data @- <<EOF
{
"model": "$INFERENCE_MODEL_ID",
"messages": [
{
"role": "user",
"content": "What is the current time and date?"
}
],
"tools": [
{
"type": "heroku_tool",
"name": "dyno_run_command",
"runtime_params": {
"target_app_name": "$APP_NAME",
"tool_params": {
"cmd": "echo hello && date",
"description": "Runs `echo hello && date` on one-off dyno.",
"parameters": {
"type": "object",
"properties": {},
"required": []
}
}
}
}
]
}
EOF
Example Response
event:message
data:{"id":"chatcmpl-183de038cafa9c3b09d8e","object":"chat.completion","created":1746798767,"model":"claude-3-7-sonnet","system_fingerprint":"heroku-inf-np7w0x","choices":[{"index":0,"message":{"role":"assistant","content":"I can help you find the current time and date by running a command on the system. Let me do that for you.","refusal":null,"tool_calls":[{"id":"tooluse_lgp6wvphSU-tz_8Ljp42Kg","type":"function","function":{"name":"dyno_run_command","arguments":"{}"}}]},"finish_reason":"tool_calls"}],"usage":{"prompt_tokens":397,"completion_tokens":65,"total_tokens":462}}
event:message
data:{"id":"chatcmpl-183de038cafa9c3b09d8e","object":"tool.completion","created":1746798768,"system_fingerprint":"heroku-inf-np7w0x","choices":[{"index":0,"message":{"role":"tool","content":"Tool 'dyno_run_command' returned result: hello\nFri May 9 13:52:48 UTC 2025","refusal":null,"tool_call_id":"tooluse_lgp6wvphSU-tz_8Ljp42Kg","name":"dyno_run_command"},"finish_reason":""}],"usage":{}}
event:message
data:{"id":"chatcmpl-183de038cafa9c3b09d8e","object":"chat.completion","created":1746798771,"model":"claude-3-7-sonnet","system_fingerprint":"heroku-inf-np7w0x","choices":[{"index":0,"message":{"role":"assistant","content":"The current time and date is:\nFriday, May 9, 2025, 13:52:48 UTC (Coordinated Universal Time)\n\nThis corresponds to:\n- 6:52:48 AM PDT (Pacific Daylight Time)\n- 9:52:48 AM EDT (Eastern Daylight Time)\n\nNote that the actual current time in your local timezone may differ depending on where you are located.","refusal":null},"finish_reason":"stop"}],"usage":{"prompt_tokens":509,"completion_tokens":99,"total_tokens":608}}
event:done
data:[DONE]