Managed Inference and Agents API /v1/embeddings
Last updated May 13, 2025
The /v1/embeddings endpoint generates vector embeddings (lists of numbers) for a provided set of input texts. These embeddings are optimized for use cases such as search, classification, and clustering. You can customize how inputs are processed and choose different embedding types to suit your needs.
Request Body Parameters
Required Parameters
Field | Type | Description | Example |
---|---|---|---|
model | string | ID of the embedding model to use | "cohere-embed-multilingual" |
input | array | single string or an array of strings for the model to embed. max of: 96 strings, 2048 characters each. recommended: length less than 512 tokens per string | ["example string 1", "example string 2"] |
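The limits in the table above (at most 96 strings per request, 2048 characters per string) can be checked on the client before a request is sent. The following is a minimal Python sketch of that idea; the names `build_payloads`, `MAX_STRINGS`, and `MAX_CHARS` are hypothetical helpers and not part of the API.

```python
MAX_STRINGS = 96   # per-request limit from the table above
MAX_CHARS = 2048   # per-string limit from the table above


def build_payloads(model, texts):
    """Split texts into request bodies that respect the documented limits."""
    if isinstance(texts, str):
        texts = [texts]  # the endpoint also accepts a single string
    for i, text in enumerate(texts):
        if len(text) > MAX_CHARS:
            raise ValueError(f"input {i} has {len(text)} characters (max {MAX_CHARS})")
    # One request body per chunk of at most MAX_STRINGS inputs.
    return [
        {"model": model, "input": texts[start:start + MAX_STRINGS]}
        for start in range(0, len(texts), MAX_STRINGS)
    ]


payloads = build_payloads("cohere-embed-multilingual", [f"doc {n}" for n in range(200)])
print(len(payloads), [len(p["input"]) for p in payloads])  # prints: 3 [96, 96, 8]
```

Each returned dictionary would be sent as its own request. The sketch does not check the recommended 512-token length, since counting tokens depends on the model's tokenizer.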
Optional Parameters
Field | Type | Description | Default | Example |
---|---|---|---|---|
input_type | enum<string> | specifies the type of input passed to the model (prepends special tokens to the input). one of: search_document, search_query, classification, clustering | "search_document" | "search_query" |
encoding_format | enum<string> | determines the encoding format of the output. one of: raw or base64 | "raw" | "base64" |
embedding_type | enum<string> | specifies the type(s) of embeddings to return (float, int8, uint8, binary, ubinary) | "float" | "int8" |
Request Headers
In the following example, we assume your model resource has an alias of "EMBEDDING" (meaning you created the model resource with an --as EMBEDDING flag).
Header | Type | Description |
---|---|---|
Authorization | string | your AI add-on’s ‘EMBEDDING_KEY’ value (API bearer token) |
Inference curl requests must include an Authorization header containing your Heroku Inference key for the specified model.
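For clients other than curl, the same header can be assembled from the config var. This is a small Python sketch under stated assumptions: `auth_headers` is a hypothetical helper, the alias is assumed to be "EMBEDDING" as above, and the Content-Type header is an addition that the documented curl request does not set.

```python
import os


def auth_headers(alias="EMBEDDING", env=os.environ):
    """Build request headers from the <ALIAS>_KEY config var."""
    key = env.get(f"{alias}_KEY")
    if not key:
        raise RuntimeError(f"{alias}_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        # Assumption: declaring a JSON body; not shown in the curl example.
        "Content-Type": "application/json",
    }


# A placeholder token is passed in here instead of reading the real environment.
print(auth_headers(env={"EMBEDDING_KEY": "example-token"}))
```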
Response Format
When a request is successful, the API returns a JSON object with the following structure:
Field | Type | Description |
---|---|---|
object | string | outer structure of the response. always: "list" |
data | array of objects | list of embeddings generated, one per input |
model | string | ID of the model that generated the embeddings |
usage | object | metadata about token usage (prompt_tokens, total_tokens) |
Embedding Object
Each object inside the data array includes:
Field | Type | Description |
---|---|---|
object | string | type of object. always: "embedding" |
index | integer | index of the input string this embedding corresponds to (starting from 0) |
embedding | array or string | embedding vector (of type embedding_type) |
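Because each embedding object carries the index of its input, a client can match vectors to inputs without relying on the order of the data array. The following Python sketch illustrates this; `vectors_in_input_order` is a hypothetical helper and the sample response uses made-up values.

```python
def vectors_in_input_order(response):
    """Return embeddings ordered by the index of the input they belong to."""
    if response.get("object") != "list":
        raise ValueError("unexpected response object")
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Made-up sample shaped like the response described above.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "cohere-embed-multilingual",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}
print(vectors_in_input_order(sample))  # prints: [[0.1, 0.2], [0.3, 0.4]]
```

The sketch assumes array-valued embeddings; a string-valued embedding would be returned unchanged and would need decoding separately.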
Example Request
Let’s walk through an example /v1/embeddings curl request.
First, use this command to set your Heroku environment variables as local variables.
eval $(heroku config -a $APP_NAME --shell | grep '^EMBEDDING_' | sed 's/^/export /' | tee >(cat >&2))
Next, send the curl request:
curl $EMBEDDING_URL/v1/embeddings \
  -H "Authorization: Bearer $EMBEDDING_KEY" \
  -d @- <<EOF
{
  "model": "$EMBEDDING_MODEL_ID",
  "input": "Hello, I am a long string (document) and I want to be turned into a searchable embedding vector! What fun!"
}
EOF
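The same request can be made from application code. The following is a rough Python equivalent of the curl command above, using only the standard library. It assumes the EMBEDDING_URL, EMBEDDING_KEY, and EMBEDDING_MODEL_ID variables are set as in the previous step; `build_request` and `embed` are hypothetical names, and the Content-Type header is an assumption that the curl example does not include.

```python
import json
import os
import urllib.request


def build_request(text):
    """Build (but do not send) the POST request for /v1/embeddings."""
    body = json.dumps({"model": os.environ["EMBEDDING_MODEL_ID"], "input": text})
    return urllib.request.Request(
        os.environ["EMBEDDING_URL"] + "/v1/embeddings",
        data=body.encode("utf-8"),
        headers={
            "Authorization": "Bearer " + os.environ["EMBEDDING_KEY"],
            # Assumption: declaring a JSON body; not set in the curl example.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(text):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.load(resp)
```

Calling `embed("Hello, I am a long string (document)")["data"][0]["embedding"]` would then return the vector for that input, provided the variables point at a provisioned model.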
Example Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.014755249,
        0.017410278,
        ...
        -0.041992188,
        0.006137848
      ]
    }
  ],
  "model": "cohere-embed-multilingual",
  "usage": {
    "prompt_tokens": 29,
    "total_tokens": 29
  }
}