Managed Inference and Agents API /v1/embeddings

Last updated June 30, 2025


The /v1/embeddings endpoint generates vector embeddings (fixed-length arrays of numbers that represent each input text) for a provided set of input texts. These embeddings are optimized for use cases such as search, classification, and clustering. You can control how inputs are processed with `input_type` and choose the output format with `embedding_type` and `encoding_format`.

Request Body Parameters

Required Parameters

Field	Type	Description	Example
model	string	ID of the embedding model to use	`"cohere-embed-multilingual"`
input	string or array	a single string, or an array of strings, for the model to embed; maximum `96` strings of `2048` characters each; recommended: fewer than `512` tokens per string	`["example string 1", "example string 2"]`
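
The limits above apply per request. A minimal pre-flight check, assuming one input string per line on stdin (strings that themselves contain newlines are out of scope), could look like the sketch below. `check_inputs` is a hypothetical helper, not part of the Heroku CLI, and it does not count tokens, so the recommended 512-token limit is not checked. How the API itself counts characters is not specified here; the sketch counts characters as bash does.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not an official tool) for the documented
# /v1/embeddings input limits. Assumes one input string per line on stdin.

MAX_INPUTS=96    # maximum number of strings per request
MAX_CHARS=2048   # maximum characters per string

check_inputs() {
  local count=0 line
  # The `|| [[ -n $line ]]` clause also handles a last line without a newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    # ${#line} counts characters in a UTF-8 locale and bytes in the C locale.
    if (( ${#line} > MAX_CHARS )); then
      # $count is the zero-based position, matching the response's `index`.
      echo "input $count has ${#line} characters (max $MAX_CHARS)" >&2
      return 1
    fi
    count=$((count + 1))
  done
  if (( count > MAX_INPUTS )); then
    echo "$count inputs exceed the per-request maximum of $MAX_INPUTS; split them into batches" >&2
    return 1
  fi
  return 0
}
```

For example, `check_inputs < inputs.txt` (the file name is illustrative) returns a non-zero status and prints the reason when a limit is exceeded.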

Optional Parameters

Field	Type	Description	Default	Example
input_type	enum<string>	type of input passed to the model (prepends special tokens to the input); one of `search_document`, `search_query`, `classification`, `clustering`	`"search_document"`	`"search_query"`
encoding_format	enum<string>	encoding format of the output; one of `raw` or `base64`	`"raw"`	`"base64"`
embedding_type	enum<string>	type of embeddings to return; one of `float`, `int8`, `uint8`, `binary`, `ubinary`	`"float"`	`"int8"`
allow_ignored_params	boolean	ignore unsupported parameters in the request instead of returning an error	`false`	`true`
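
The optional parameters go into the same JSON body as `model` and `input`. As a hedged sketch, the hypothetical `build_embeddings_body` function below builds a body for a single string and rejects `input_type` or `embedding_type` values outside the enums listed above. It leaves `encoding_format` at its default (`raw`) and escapes only backslashes and double quotes; if `jq` is installed, `jq -n --arg` gives full JSON escaping instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of any Heroku tooling) for building a
# /v1/embeddings request body with the optional parameters.

json_escape() {
  # Escape backslashes first, then double quotes. Other control characters
  # (newlines, tabs, ...) are NOT handled by this sketch.
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  printf '%s' "$s"
}

build_embeddings_body() {
  local model=$1 input_type=$2 embedding_type=$3 text=$4
  # Only accept the documented enum values.
  case $input_type in
    search_document|search_query|classification|clustering) ;;
    *) echo "invalid input_type: $input_type" >&2; return 1 ;;
  esac
  case $embedding_type in
    float|int8|uint8|binary|ubinary) ;;
    *) echo "invalid embedding_type: $embedding_type" >&2; return 1 ;;
  esac
  printf '{"model":"%s","input":["%s"],"input_type":"%s","embedding_type":"%s"}\n' \
    "$(json_escape "$model")" "$(json_escape "$text")" "$input_type" "$embedding_type"
}
```

The output can be piped into `curl ... -d @-` in place of the heredoc used in the example request.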

Request Headers

The following examples assume your model resource has the alias `EMBEDDING` (that is, you created the model resource with the `--as EMBEDDING` flag), so its config vars are prefixed with `EMBEDDING_`.

Header	Type	Description
`Authorization`	string	`Bearer` followed by your AI add-on’s `EMBEDDING_KEY` value (the API bearer token)

Every inference request must include an `Authorization: Bearer $EMBEDDING_KEY` header containing the Heroku Inference key for the specified model.

Response Format

When a request is successful, the API returns a JSON object with the following structure:

Field	Type	Description
object	string	outer structure of the response; always `"list"`
data	array of objects	list of embeddings generated, one per input
model	string	ID of the model that generated the embeddings
usage	object	metadata about token usage (`prompt_tokens`, `total_tokens`)

Embedding Object

Each object inside the data array includes:

Field	Type	Description
object	string	type of object; always `"embedding"`
index	integer	zero-based index of the input string this embedding corresponds to
embedding	array or string	embedding vector in the requested `embedding_type`; an array of numbers with the default `raw` encoding, or a string when `encoding_format` is `base64`
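
With `embedding_type` set to `float` and the default `raw` encoding, each `embedding` is an array of numbers. A common way to rank `search_document` embeddings against a `search_query` embedding is cosine similarity. The sketch below computes it with awk for two vectors passed as comma-separated strings; the function name and the input format are assumptions of this example, not part of the API.

```shell
#!/usr/bin/env bash
# Cosine similarity of two equal-length vectors given as comma-separated numbers.
# Illustrative helper; prints the similarity with six decimal places.

cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "dimension mismatch: " n " vs " m > "/dev/stderr"; exit 1 }
    dot = 0; na = 0; nb = 0
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # running dot product
      na  += x[i] * x[i]   # squared norm of the first vector
      nb  += y[i] * y[i]   # squared norm of the second vector
    }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.6f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}
```

If `jq` is available, `jq -r '.data[0].embedding | map(tostring) | join(",")' response.json` turns a saved response (file name illustrative) into the comma-separated form this function expects.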

Example Request

Let’s walk through an example /v1/embeddings curl request.

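First, export your app’s `EMBEDDING_*` config vars as local shell variables (set `APP_NAME` to your app’s name beforehand). The command uses process substitution, so run it in bash or zsh; the `tee >(cat >&2)` step also prints the exported values, including your key, to the terminal.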

eval $(heroku config -a $APP_NAME --shell | grep '^EMBEDDING_' | sed 's/^/export /' | tee >(cat >&2))
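
If that command matched nothing (for example, because the alias is not `EMBEDDING`), the curl request below fails with a less obvious error. A small guard, sketched here as the hypothetical `require_embedding_env` function, checks that the three variables the example uses are non-empty before you send anything.

```shell
#!/usr/bin/env bash
# Sketch: fail early if the EMBEDDING_* variables used by the curl example
# are missing or empty in the current shell.

require_embedding_env() {
  local name missing=0
  for name in EMBEDDING_URL EMBEDDING_KEY EMBEDDING_MODEL_ID; do
    # ${!name:-} looks up the variable whose name is stored in $name.
    if [[ -z "${!name:-}" ]]; then
      echo "missing $name; run the heroku config command first" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `require_embedding_env && echo ready` reports every missing variable at once rather than stopping at the first.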

Next, send the curl request:

curl "$EMBEDDING_URL/v1/embeddings" \
 -H "Authorization: Bearer $EMBEDDING_KEY" \
 -H "Content-Type: application/json" \
 -d @- <<EOF
{
  "model": "$EMBEDDING_MODEL_ID",
  "input": "Hello, I am a long string (document) and I want to be turned into a searchable embedding vector! What fun!"
}
EOF
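
The request above embeds a single string with the default parameters. A variant that sends an array of two strings with `input_type` set to `search_query`, sets `Content-Type` explicitly, and makes curl exit non-zero on HTTP errors (`-sS --fail`) might look like the following. The function name `embed_queries` and the query strings are illustrative; it assumes the `EMBEDDING_*` variables were exported as shown above.

```shell
#!/usr/bin/env bash
# Sketch: embed two search queries in one request (not an official Heroku script).

embed_queries() {
  # -sS: hide the progress meter but still show errors.
  # --fail: return a non-zero exit status on HTTP error responses.
  curl -sS --fail "$EMBEDDING_URL/v1/embeddings" \
    -H "Authorization: Bearer $EMBEDDING_KEY" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
{
  "model": "$EMBEDDING_MODEL_ID",
  "input": ["how do I rotate my API key", "reset add-on credentials"],
  "input_type": "search_query",
  "embedding_type": "float"
}
EOF
}
```

The response contains one object per input in `data`, matched to its string by `index`.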

Example Response

{
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [
                -0.014755249,
                0.017410278,
                ...
                -0.041992188,
                0.006137848
            ]
        }
    ],
    "model": "cohere-embed-multilingual",
    "usage": {
        "prompt_tokens": 29,
        "total_tokens": 29
    }
}
