POST /v1/chat/completions
Create chat completion
curl --request POST \
  --url https://api.edgee.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "<string>",
      "name": "<string>",
      "tool_call_id": "<string>",
      "refusal": "<string>",
      "tool_calls": [
        {
          "id": "<string>",
          "type": "function",
          "function": {
            "name": "<string>",
            "arguments": "<string>"
          }
        }
      ]
    }
  ],
  "max_tokens": 2,
  "stream": false,
  "stream_options": {
    "include_usage": true
  },
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {}
      }
    }
  ],
  "tool_choice": "none",
  "edgee_tool_ids": [
    "edgee_current_time",
    "edgee_generate_uuid"
  ],
  "edgee_pending_id": "<string>",
  "tags": [
    "<string>"
  ]
}
'
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 10,
    "total_tokens": 20,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "compression": {
    "saved_tokens": 450,
    "cost_savings": 27000,
    "reduction": 48,
    "time_ms": 12
  }
}
Creates a model completion for the given chat conversation. The Edgee API is OpenAI-compatible and works with any model and provider. Supports both streaming and non-streaming responses.
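Because the endpoint is OpenAI-compatible, it can be called from any HTTP client. Below is a minimal Python sketch using only the standard library; the API key is a placeholder, and the network call is left commented out so the snippet can be adapted with a real key.

```python
import json
import urllib.request

# Build an OpenAI-compatible chat completion request for the Edgee gateway.
# EDGEE_API_KEY is a placeholder -- substitute your real key.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
    "stream": False,
}

req = urllib.request.Request(
    "https://api.edgee.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer EDGEE_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# resp = urllib.request.urlopen(req)   # uncomment with a real key
# body = json.loads(resp.read())
```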

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your API key.

Headers

X-Edgee-Enable-Compression
boolean

Enable token compression for this request. When true, the gateway compresses the prompt at the edge before forwarding to the provider, reducing input token costs by up to 50%. When compression is applied, the response includes a compression object with savings metrics.

X-Edgee-Tags
string

Comma-separated list of tags for categorizing and filtering requests in analytics and logs. Example: production,chatbot,customer-support

X-Edgee-Debug
boolean

Enable debug mode to include additional debugging information in the response.
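The optional headers above can be combined on a single request. A small sketch of a header set follows; the token and tag values are illustrative, and note that the boolean headers are sent as the string "true".

```python
# Per-request headers enabling edge compression and tagging.
# Header names come from this page; the key and tags are placeholders.
headers = {
    "Authorization": "Bearer EDGEE_API_KEY",
    "Content-Type": "application/json",
    "X-Edgee-Enable-Compression": "true",
    "X-Edgee-Tags": "production,chatbot,customer-support",
}
```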

Body

application/json
model
string
required

ID of the model to use. Format: {author_id}/{model_id} (e.g. openai/gpt-4o)

Example:

"openai/gpt-4o"

messages
object[]
required

A list of messages comprising the conversation so far.

Minimum array length: 1
max_tokens
integer

The maximum number of tokens that can be generated in the chat completion.

Required range: x >= 1
stream
boolean
default:false

If set, partial message deltas are sent, as in the OpenAI API. Streamed chunks are delivered as Server-Sent Events (SSE).
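In an SSE stream, each event carries a data: line holding a JSON chunk, and the stream terminates with data: [DONE]. The Python sketch below parses that framing against a hard-coded sample rather than a live response; the delta shape is assumed to follow the standard OpenAI chunk format.

```python
import json

# Sample text mimicking the SSE wire format; real code would read
# these lines incrementally from the HTTP response body.
sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"lo!"}}]}\n\n'
    "data: [DONE]\n\n"
)

def collect_content(sse_text: str) -> str:
    """Concatenate the content deltas from an SSE chat completion stream."""
    parts = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separators and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# collect_content(sample) -> "Hello!"
```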

stream_options
object

Options for streaming response.

tools
object[]

A list of tools the model may call. Currently, only function type is supported.

tool_choice

Controls which tool is called by the model.

Available options:
none,
auto
edgee_tool_ids
string[]

List of Edge Tool IDs to inject (e.g. edgee_current_time, edgee_generate_uuid). Each ID must be activated for your API key. When omitted or empty, only tools with hydration enabled for your org or API key are auto-injected. Invalid or non-activated IDs return 400 with invalid_edgee_tool_ids.

Example:
["edgee_current_time", "edgee_generate_uuid"]
edgee_pending_id
string

Pending operation ID when continuing a conversation after Edge Tool execution (e.g. when mixing client-side and Edge Tools). The gateway injects stored Edge Tool results into the message history.
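A follow-up request in that flow might look like the sketch below. The pending ID value and message content are purely illustrative, since the real ID comes from the gateway's earlier response.

```python
# Hypothetical follow-up request continuing a conversation after
# Edge Tool execution. "pending-abc123" is an assumed placeholder;
# the gateway supplies the actual pending operation ID.
followup = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What time is it?"}],
    "edgee_tool_ids": ["edgee_current_time"],
    "edgee_pending_id": "pending-abc123",
}
```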

tags
string[]

Optional tags to categorize and label the request. Useful for filtering and grouping requests in analytics and logs. Can also be sent via the x-edgee-tags header as a comma-separated string.

Response

Chat completion created successfully

id
string
required

A unique identifier for the chat completion.

Example:

"chatcmpl-123"

object
enum<string>
required

The object type, which is always chat.completion.

Available options:
chat.completion
created
integer
required

The Unix timestamp (in seconds) of when the chat completion was created.

Example:

1677652288

model
string
required

The model used for the chat completion.

Example:

"openai/gpt-4o"

choices
object[]
required

A list of chat completion choices. Can be more than one if n is greater than 1.

usage
object
required

Usage statistics for the completion. In streaming responses, this is only present in the final chunk when stream_options.include_usage is true.

compression
object

Token compression metrics. Present in the response when token compression was applied to the request (via X-Edgee-Enable-Compression: true header or console settings). The usage.prompt_tokens field reflects the compressed token count actually billed by the provider.