Managed Inference and Agents API /v1/agents/heroku
Last updated May 15, 2025
The /v1/agents/heroku endpoint allows you to interact with an agentic system powered by large language models (LLMs) that can autonomously invoke tools based on your messages. Unlike /chat/completions, which generates a single model response, the /v1/agents/heroku endpoint supports automatic tool execution and multistep workflows.
Request Body Parameters
Use the following parameters to manage the behavior of the agent and which tools it can use.
Required Parameters
Field | Type | Description | Example |
---|---|---|---|
model | string | model used for inference, typically the value of your INFERENCE_MODEL_ID config var | |
messages | array | array of messages used by the agent to determine its response and next actions | [{"role": "user", "content": "Check my database schema."}] |
Optional Parameters
Field | Type | Description | Default | Example |
---|---|---|---|---|
max_tokens_per_inference_request | integer | max number of tokens the model can generate during each underlying inference request before stopping (a single call to /v1/agents/heroku can include multiple underlying inference requests); max value: 4096 for Haiku models, 8192 for Sonnet models | varies | 1024 |
stop | array | list of strings that stop the model from generating further tokens if any of the strings appear in the response (for example, ["foo"] causes the model to stop generating output only if it generates the string "foo") | null | ["foo"] |
temperature | float | controls randomness of the response: values closer to 0 make responses more focused by favoring high-probability tokens, while values closer to 1.0 encourage more diverse responses by sampling from a broader range of possibilities for each generated token; range: 0.0 to 1.0 | 1.0 | 0.2 |
tools | array | list of tools the agent is allowed to use | null | see tools field in the example request |
top_p | float | specifies the proportion of tokens to consider when generating the next token, in terms of cumulative probability; range: 0 to 1.0 | 0.999 | 0.95 |
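The required and optional parameters combine into a single JSON request body. A minimal sketch in Python (the model name and message are placeholders; the actual model ID comes from your INFERENCE_MODEL_ID config var):

```python
# Sketch: assembling a /v1/agents/heroku request body with optional
# tuning parameters. Values here are illustrative, not recommendations.
payload = {
    "model": "claude-3-7-sonnet",  # typically your INFERENCE_MODEL_ID
    "messages": [
        {"role": "user", "content": "Check my database schema."}
    ],
    # Optional parameters from the table above:
    "max_tokens_per_inference_request": 1024,
    "temperature": 0.2,   # favor focused, high-probability tokens
    "top_p": 0.95,        # sample from the top 95% of cumulative probability
    "stop": ["foo"],      # halt generation if this string is produced
}
```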
tools
Array of Objects
Each tool in the array allows the agent to call an action on your behalf. Heroku automatically executes tool calls via one-off dynos. The /v1/agents/heroku endpoint currently supports two types of tools:
- heroku_tool: first-party tools that Heroku Managed Inference and Agents natively supports
- MCP tools: custom MCP tools you deploy to Heroku, which Heroku runs automatically when your model calls them. To learn how to deploy your own custom MCP tools, see Heroku MCP Tools.
Field | Type | Description | Example |
---|---|---|---|
type | enum<string> | type of tool; one of: heroku_tool, mcp | "heroku_tool" |
name | string | name of tool (see Heroku Tools for available tools) | "code_exec_ruby" |
description | string | (optional) hint text to inform the model when to use this tool | "Runs SQL query on a Heroku database" |
runtime_params | object | configuration to control automatic execution of heroku_tool and mcp tools (see runtime parameters) | |
Runtime Parameters
The /v1/agents/heroku endpoint passes certain settings to the specified mcp or heroku_tool tools at runtime. The model can't modify these settings.
Field | Type | Description | Default | Example |
---|---|---|---|---|
target_app_name | string | (required) name of Heroku app to run the tool in | | "my-heroku-app" |
dyno_size | string | dyno size to use when running the tool | "standard-1x" | "standard-1x" |
ttl_seconds | integer | max seconds a dyno is allowed to run; max: 120 | 120 | 10 |
max_calls | integer | max number of times this tool can be called during the agent loop | 3 | 1 |
tool_params | object | additional parameters for the tool (for example, cmd, db_attachment); see tool-specific docs | (varies) | {} |
mcp type tools optionally allow ttl_seconds, max_calls, and dyno_size; the other parameters aren't supported. heroku_tool type tools require or allow certain parameters depending on the tool itself. See the tool-specific docs for more information.
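Putting the runtime parameters together, a sketch of a heroku_tool entry for the dyno_run_command tool (the app name is a placeholder; tool_params shapes vary per tool, so check the tool-specific docs):

```python
# Sketch: a heroku_tool entry with runtime_params controlling
# how Heroku executes the tool in a one-off dyno.
tool = {
    "type": "heroku_tool",
    "name": "dyno_run_command",
    "runtime_params": {
        "target_app_name": "my-heroku-app",  # required: app the one-off dyno runs in
        "dyno_size": "standard-1x",
        "ttl_seconds": 120,  # maximum allowed value
        "max_calls": 3,      # cap on invocations during the agent loop
        "tool_params": {
            "cmd": "echo hello && date",  # tool-specific parameter
        },
    },
}
```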
messages
Array of Objects
The messages field is an array of message objects. Each message must specify a role field that determines the message's schema. Currently, the supported roles are user, assistant, system, and tool.
If the most recent message uses the assistant role, the model continues its answer starting from the content in that most recent message.
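For example, a sketch of a messages array that prefills a partial assistant reply so the model resumes from it on its next turn (the content is illustrative):

```python
# Sketch: prefilling a partially completed assistant message.
# Because the assistant message is last, the model continues
# its answer from that content rather than starting fresh.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe Heroku dynos in one sentence."},
    {"role": "assistant", "content": "A Heroku dyno is"},  # model resumes here
]
```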
role=user
message
user
messages are the primary way to send queries to your model and prompt it to respond.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "user" | yes | "user" |
content | string | contents of user message | yes | "What is the weather?" |
role=assistant
message
Typically, the model only generates assistant
messages. However, you can create or prefill a partially completed assistant
response to influence the content a model generates on its next turn.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "assistant" | yes | "assistant" |
content | string | contents of assistant message | yes, unless tool_calls is specified | "Here is the information" |
refusal | string or null | refusal message by assistant | no | "I cannot answer that" |
tool_calls | array | array of tool call request objects | no | [{"id": "tool_call_12345", "type": "function", "function": {"name": "my_cool_tool", "arguments": "{\"some_input\": 123}"}}] |
Tool Call Object
Represents the model’s request to execute a specific tool.
Field | Type | Description | Example |
---|---|---|---|
id | string | unique ID for the tool call | "tooluse_abc123" |
type | string | type of call; always: "function" | "function" |
function | object | function call details | see tool call example |
Tool Call Example
"tool_calls": [
{
"id": "tooluse_abc123",
"type": "function",
"function": {
"name": "dyno_run_command",
"arguments": "{}"
}
}
]
Function Object
Field | Type | Description | Example |
---|---|---|---|
name | string | name of tool to invoke | "dyno_run_command" |
arguments | string | JSON-encoded string of tool arguments | "{}" |
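Because arguments is a JSON-encoded string rather than an object, decode it before acting on a tool call. A minimal sketch:

```python
import json

# The function object carries its arguments as a JSON-encoded string,
# so decode it into a dict before dispatching to the tool.
function = {"name": "dyno_run_command", "arguments": "{}"}
args = json.loads(function["arguments"])  # -> {} (empty dict in this case)
```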
role=system
message
A system
message is a special prompt given to the model to guide its responses.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "system" | yes | "system" |
content | string | contents of system message | yes | "You are a helpful assistant. You favor brevity and avoid hedging. You readily admit when you don't know an answer." |
role=tool
message
A tool message object represents a specified tool's output.
Field | Type | Description | Required | Example |
---|---|---|---|---|
role | string | role of message; always: "tool" | yes | "tool" |
content | string | output of tool call | yes | "Rainy and 84º" |
tool_call_id | string | ID of the tool call the message is responding to | yes | "toolu_02F9GXvY5MZAq8Lw3PTNQyJK" |
Request Headers
Header | Type | Description |
---|---|---|
Authorization | string | Bearer token containing your Heroku Inference API key |
All /v1/agents/heroku
requests must include the following header:
-H "Authorization: Bearer $INFERENCE_KEY"
Response Format
Agent responses are streamed back over Server-Sent Events (SSE). Each event: message
includes a JSON payload representing a completion. The final event is event: done
with the data [DONE]
.
Completion Object
Field | Type | Description | Example |
---|---|---|---|
id | string | unique ID for the agent session | "chatcmpl-abc123" |
object | enum<string> | type of completion; one of: chat.completion, tool.completion | "tool.completion" |
created | integer | Unix timestamp when the chunk was created | 1746546550 |
model | string | model ID used to generate the message | "claude-3-7-sonnet" |
system_fingerprint | string | fingerprint of the system generating the output | "heroku-inf-abc123" |
choices | array of objects | array of length 1 containing a single choice object | see example response |
usage | object | token usage statistics; empty for tool completions (no tokens consumed) | {"prompt_tokens": 15, "completion_tokens": 13, "total_tokens": 28} |
Choice Object
Field | Type | Description | Example |
---|---|---|---|
index | integer | index of the choice; always: 0 | 0 |
message | object | message content (response messages are always of role assistant or tool) | see example response |
finish_reason | enum<string> | reason the model stopped; one of: stop, length, tool_calls, "" | "tool_calls" |
Usage Object
Field | Type | Description | Example |
---|---|---|---|
prompt_tokens | integer | tokens used in prompt | 397 |
completion_tokens | integer | tokens used in response | 65 |
total_tokens | integer | sum of prompt and completion tokens | 462 |
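A minimal sketch of parsing the SSE stream into completion objects, following the event/data framing described in Response Format (the sample lines are illustrative, not a real server response):

```python
import json

def parse_sse_completions(lines):
    """Pair 'event:' and 'data:' lines from the SSE stream, decoding
    each completion payload and stopping at the done sentinel."""
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if event == "done" and data == "[DONE]":
                break  # end of stream
            yield json.loads(data)

# Illustrative stream: one chat.completion followed by the done event.
sample = [
    "event:message",
    'data:{"object": "chat.completion", "choices": []}',
    "event:done",
    "data:[DONE]",
]
completions = list(parse_sse_completions(sample))
```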
Example Request
curl --location $INFERENCE_URL/v1/agents/heroku \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $INFERENCE_KEY" \
--data @- <<EOF
{
"model": "$INFERENCE_MODEL_ID",
"messages": [
{
"role": "user",
"content": "What is the current time and date?"
}
],
"tools": [
{
"type": "heroku_tool",
"name": "dyno_run_command",
"runtime_params": {
"target_app_name": "$APP_NAME",
"tool_params": {
"cmd": "echo hello && date",
"description": "Runs `echo hello && date` on one-off dyno.",
"parameters": {
"type": "object",
"properties": {},
"required": []
}
}
}
}
]
}
EOF
Example Response
event:message
data:{"id":"chatcmpl-183de038cafa9c3b09d8e","object":"chat.completion","created":1746798767,"model":"claude-3-7-sonnet","system_fingerprint":"heroku-inf-np7w0x","choices":[{"index":0,"message":{"role":"assistant","content":"I can help you find the current time and date by running a command on the system. Let me do that for you.","refusal":null,"tool_calls":[{"id":"tooluse_lgp6wvphSU-tz_8Ljp42Kg","type":"function","function":{"name":"dyno_run_command","arguments":"{}"}}]},"finish_reason":"tool_calls"}],"usage":{"prompt_tokens":397,"completion_tokens":65,"total_tokens":462}}
event:message
data:{"id":"chatcmpl-183de038cafa9c3b09d8e","object":"tool.completion","created":1746798768,"system_fingerprint":"heroku-inf-np7w0x","choices":[{"index":0,"message":{"role":"tool","content":"Tool 'dyno_run_command' returned result: hello\nFri May 9 13:52:48 UTC 2025","refusal":null,"tool_call_id":"tooluse_lgp6wvphSU-tz_8Ljp42Kg","name":"dyno_run_command"},"finish_reason":""}],"usage":{}}
event:message
data:{"id":"chatcmpl-183de038cafa9c3b09d8e","object":"chat.completion","created":1746798771,"model":"claude-3-7-sonnet","system_fingerprint":"heroku-inf-np7w0x","choices":[{"index":0,"message":{"role":"assistant","content":"The current time and date is:\nFriday, May 9, 2025, 13:52:48 UTC (Coordinated Universal Time)\n\nThis corresponds to:\n- 6:52:48 AM PDT (Pacific Daylight Time)\n- 9:52:48 AM EDT (Eastern Daylight Time)\n\nNote that the actual current time in your local timezone may differ depending on where you are located.","refusal":null},"finish_reason":"stop"}],"usage":{"prompt_tokens":509,"completion_tokens":99,"total_tokens":608}}
event:done
data:[DONE]