How to Use Serverless Inference on DigitalOcean Gradient™ AI Platform
Validated on 9 Feb 2026 • Last edited on 27 Feb 2026
DigitalOcean Gradient™ AI Platform lets you build fully-managed AI agents with knowledge bases for retrieval-augmented generation, multi-agent routing, guardrails, and more, or use serverless inference to make direct requests to popular foundation models.
Serverless inference lets you send API requests directly to foundation models without creating or managing an AI agent. The model generates responses without any initial instructions or configuration.
The serverless inference API is available at https://inference.do-ai.run and has the following endpoints:
| Endpoint | Verb | Description |
|----------|------|-------------|
| /v1/models | GET | Returns a list of available models and their IDs. |
| /v1/chat/completions | POST | Sends chat-style prompts and returns model responses. |
| /v1/responses | POST | Sends chat-style prompts and returns text or multimodal model responses. |
| /v1/images/generations | POST | Generates images from text prompts. |
| /v1/async-invoke | POST | Sends text, image, or text-to-speech generation requests to fal models. |
We support both /v1/chat/completions and /v1/responses endpoints for sending prompts. Choose the endpoint that best fits your use case:
Use /v1/chat/completions when building or maintaining chat-style integrations that rely on structured messages with roles such as system, user, and assistant, or when migrating existing chat-based code with minimal changes.
Use /v1/responses when building new integrations or working with newer models that only support the Responses API. It’s also useful for multi-step tool use in a single request, preserving state across turns with store: true, and simplifying requests by using a single input field with improved caching efficiency.
You can use these endpoints through cURL, Python OpenAI, or Gradient SDK.
Retrieve Available Models
The following cURL, Python OpenAI, and Gradient SDK examples show how to retrieve available models.
Send a GET request to the /v1/models endpoint using your model access key.
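Assuming your model access key is exported as the MODEL_ACCESS_KEY environment variable, the request looks like this:

```bash
curl -X GET https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY"
```

The response is a JSON list of model objects whose id values you can pass as the model parameter in later requests.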
Send Prompt to a Model Using the Chat Completions API
The following cURL, Python OpenAI, and Gradient SDK examples show how to send a prompt to a model. Include your model access key and the following in your request:
model: The model ID of the model you want to use. Get the model ID using /v1/models or on the available models page.
messages: The input prompt or conversation history. Serverless inference does not have sessions, so include all relevant context using this field.
temperature: A value between 0.0 and 1.0 to control randomness and creativity.
max_completion_tokens: The maximum number of tokens to generate in the response. Use this to manage output length and cost.
For Anthropic models, we recommend you specify this parameter for better accuracy and control of the model response. For models by other providers, this parameter is optional and defaults to around 2048 tokens.
max_tokens: This parameter is deprecated. Use max_completion_tokens instead to control the size of the generated response.
You can also use prompt caching and reasoning parameters in your request. For examples, see Use Prompt Caching and Use Reasoning.
Send a POST request to the /v1/chat/completions endpoint using your model access key.
The following example request sends a prompt to a Llama 3.3 Instruct-70B model with the prompt What is the capital of France?, a temperature of 0.7, and maximum number of tokens set to 256.
curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.3-70b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 256
  }'
The response includes the generated text and token usage details:
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"audio":null,"content":"The capital of France is Paris.","refusal":null,"role":""}}],"created":1747247763,"id":"","model":"llama3.3-70b-instruct","object":"chat.completion","service_tier":null,"usage":{"completion_tokens":8,"prompt_tokens":43,"total_tokens":51}}
Using the OpenAI Python SDK:

from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)

resp = client.chat.completions.create(
    model="llama3-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about octopuses."},
    ],
)
print(resp.choices[0].message.content)

Using the Gradient SDK:

from gradient import Gradient
from dotenv import load_dotenv
import os

load_dotenv()

client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))

resp = client.chat.completions.create(
    model="llama3-8b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about octopuses."},
    ],
)
print(resp.choices[0].message.content)
Use Reasoning
For models that support reasoning, you can pass a reasoning parameter in the request body, either in the OpenAI format using reasoning_effort or in the Anthropic format using reasoning.effort. The reasoning effort can be set to none, low, medium, high, or max.
The following cURL example shows how to specify reasoning effort for the Claude Opus 4.5 model in the Anthropic format:
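This request matches the step-by-step output shown next; the prompt, effort level, and token limit are illustrative:

```bash
curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic-claude-opus-4.5",
    "messages": [
      {"role": "user", "content": "Calculate 27 * 453 step by step."}
    ],
    "reasoning": {"effort": "medium"},
    "max_completion_tokens": 1024
  }'
```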
The output shows the response broken down step by step, as requested in the model prompt:
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"# Calculating 27 × 453\n\nI'll break this into smaller parts:\n\n**Step 1:** Break down 453 into 400 + 50 + 3\n\n**Step 2:** Multiply each part by 27\n- 27 × 400 = 10,800\n- 27 × 50 = 1,350\n- 27 × 3 = 81\n\n**Step 3:** Add the results\n- 10,800 + 1,350 + 81 = **12,231**","reasoning_content":"I need to calculate 27 * 453.\n\nLet me break this down step by step.\n\n27 * 453 = 27 * (400 + 50 + 3)\n= 27 * 400 + 27 * 50 + 27 * 3\n\n27 * 400 = 10,800\n27 * 50 = 1,350\n27 * 3 = 81\n\n10,800 + 1,350 + 81 = 12,231","refusal":null,"role":"assistant"}}],"created":1771946745,"id":"","model":"anthropic-claude-opus-4.5",...}
Note
For Anthropic models, if you omit the max_tokens parameter for reasoning, we calculate the reasoning token budget as the following percentage of the total tokens passed in max_completion_tokens:

| Effort Level | Reasoning Token Budget (% of max_completion_tokens) |
|--------------|-----------------------------------------------------|
| low | 20% |
| medium | 50% |
| high | 80% |
| max | 95% |
The following cURL example shows how to specify reasoning effort for the Claude Sonnet 4.6 model in the OpenAI format:
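A representative request using the top-level reasoning_effort field; the anthropic-claude-sonnet-4.6 model ID is an assumption here, so confirm the exact ID with /v1/models:

```bash
curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic-claude-sonnet-4.6",
    "messages": [
      {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
    ],
    "reasoning_effort": "low",
    "max_completion_tokens": 1024
  }'
```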
Send Prompt to a Model Using the Responses API
The following cURL, Python OpenAI, and Gradient SDK examples show how to send a prompt using the /v1/responses endpoint. Include your model access key and the following in your request:
model: The model ID of the model you want to use. Get the model ID using /v1/models or on the available models page.
input: The prompt or input content you want the model to respond to.
max_output_tokens: The maximum number of tokens to generate in the response.
temperature: A value between 0.0 and 1.0 to control randomness and creativity.
Send a POST request to the /v1/responses endpoint using your model access key.
The following example request sends a prompt to an OpenAI GPT-OSS-20B model with the prompt What is the capital of France?, a temperature of 0.7, and maximum number of output tokens set to 50.
curl -sS -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-20b",
    "input": "What is the capital of France?",
    "max_output_tokens": 50,
    "temperature": 0.7,
    "stream": false
  }'
The response includes structured output and token usage details:
{..."output":[{"content":[{"text":"We need to answer: The capital of France is Paris. This is straightforward.","type":"reasoning_text"}],...},{"content":[{"text":"The capital of France is **Paris**.","type":"output_text"}],...}],..."usage":{"input_tokens":72,"input_tokens_details":{"cached_tokens":32},"output_tokens":35,"output_tokens_details":{"reasoning_tokens":17,"tool_output_tokens":0},"total_tokens":107},...}
Using the OpenAI Python SDK:

from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)

resp = client.responses.create(
    model="openai-gpt-oss-20b",
    input="What is the capital of France?",
    max_output_tokens=50,
    temperature=0.7,
)
print(resp.output[1].content[0].text)

Using the Gradient SDK:

from gradient import Gradient
from dotenv import load_dotenv
import os

load_dotenv()

client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))

resp = client.responses.create(
    model="openai-gpt-oss-20b",
    input="What is the capital of France?",
    max_output_tokens=50,
    temperature=0.7,
)
print(resp.output[1].content[0].text)
Use Prompt Caching in Chat Completions and Responses API
Anthropic Models
Use prompt caching for Anthropic models in the chat completions API. Specify the cache_control parameter with type: ephemeral and ttl in your JSON request body. The ttl value can be 5m (default) or 1h. The following request body examples show how to use the cache_control parameter.
...{"role":"user","content":{"type":"text","text":"This is cached for 1h.","cache_control":{"type":"ephemeral","ttl":"1h"}}}
...{"role":"developer","content":[{"type":"text","text":"Cache this segment for 5 minutes.","cache_control":{"type":"ephemeral","ttl":"5m"}},{"type":"text","text":"Do not cache this segment"}]}
...{"role":"tool","tool_call_id":"tool_call_id","content":[{"type":"text","text":"Tool output cached for 5m.","cache_control":{"type":"ephemeral","ttl":"5m"}}]}
The JSON response looks similar to the following and shows the number of input tokens cached during this request:
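An illustrative response; the values are representative, not exact, and the usage fields report how many input tokens were written to the cache:

```json
{"id":"chatcmpl-abc123","object":"chat.completion","model":"anthropic-claude-opus-4.5","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"..."}}],"usage":{"prompt_tokens":1500,"completion_tokens":42,"total_tokens":1542,"cache_read_input_tokens":0,"cache_created_input_tokens":1450,"cache_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":1450}}}
```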
OpenAI Models
Use prompt caching for OpenAI models for prompts containing 1024 tokens or more in both the chat completions and responses APIs. Caching applies when the input tokens of a request match the input tokens of a previous request, though caching is best-effort and not guaranteed.
To use prompt caching, specify the prompt_cache_retention parameter as either in_memory or 24h. The following request body example shows how to use the prompt_cache_retention parameter:
...{"model":"gpt-4o-mini","prompt_cache_retention":"24h","messages":[{"role":"system","content":"You are a helpful assistant that summarizes text."},{"role":"user","content":"Summarize the following text:\n\nArtificial intelligence is transforming industries by automating tasks, improving efficiency, and enabling new innovations..."}],"temperature":0.2}
The JSON response looks similar to the following and shows the number of input tokens cached during this request:
{"id":"chatcmpl-xyz789","object":"chat.completion","created":1772134300,"model":"gpt-4o-mini","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Artificial intelligence is reshaping industries by automating processes, increasing efficiency, and enabling innovation."}}],"usage":{"prompt_tokens":1200,"completion_tokens":35,"total_tokens":1235,"cache_read_input_tokens":0,"cache_created_input_tokens":1200,"cache_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":1200}}}
If you send the request again within the retention window, cached input tokens are used and the response looks like this:
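Illustratively, a repeat of the earlier request within the retention window reports the tokens under cache_read_input_tokens instead (values representative):

```json
{"id":"chatcmpl-xyz790","object":"chat.completion","model":"gpt-4o-mini","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Artificial intelligence is reshaping industries by automating processes, increasing efficiency, and enabling innovation."}}],"usage":{"prompt_tokens":1200,"completion_tokens":35,"total_tokens":1235,"cache_read_input_tokens":1200,"cache_created_input_tokens":0}}
```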
Generate Images
The following cURL, Python OpenAI, and Gradient SDK examples show how to generate an image from a text prompt. Include your model access key and the following in your request:
model: The model ID of the image generation model you want to use. Get the model ID using /v1/models or on the available models page.
prompt: The text prompt to generate the image from.
n: The number of images to generate. Must be between 1 and 10.
size: The desired dimensions of the generated image. Supported values are 256x256, 512x512, and 1024x1024.
Make sure to always specify n and size when generating images.
Send a POST request to the /v1/images/generations endpoint using your model access key.
The following example request sends a prompt to the openai-gpt-image-1 model to generate an image of a baby sea otter floating on its back in calm blue water, with an image size of 1024x1024:
curl -X POST https://inference.do-ai.run/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -d '{
    "model": "openai-gpt-image-1",
    "prompt": "A cute baby sea otter floating on its back in calm blue water",
    "n": 1,
    "size": "1024x1024"
  }'
The response includes a JSON object with a Base64 image string and other details such as image format and tokens used:
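An abridged, illustrative example of the response shape; the b64_json value is truncated here, and fields other than data[0].b64_json may vary by model:

```json
{"created":1772134300,"data":[{"b64_json":"iVBORw0KGgoAAAANSUhEUgAA..."}],"output_format":"png","usage":{"input_tokens":15,"output_tokens":1056,"total_tokens":1071}}
```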
If you want to save the image as a file, pipe the image string to a file using jq and base64:
curl -X POST https://inference.do-ai.run/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -d '{
    "model": "openai-gpt-image-1",
    "prompt": "A cute baby sea otter floating on its back in calm blue water",
    "n": 1,
    "size": "1024x1024"
  }' | jq -r '.data[0].b64_json' | base64 --decode > sea_otter.png
An image named sea_otter.png is created in your current directory after a few seconds.
Using the OpenAI Python SDK:

from openai import OpenAI
from dotenv import load_dotenv
import os, base64

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)

result = client.images.generate(
    model="openai-gpt-image-1",
    prompt="A cute baby sea otter, children’s book drawing style",
    size="1024x1024",
    n=1,
)

b64 = result.data[0].b64_json
with open("sea_otter.png", "wb") as f:
    f.write(base64.b64decode(b64))
print("Saved sea_otter.png")

Using the Gradient SDK:

from gradient import Gradient
from dotenv import load_dotenv
import os, base64

load_dotenv()

client = Gradient(model_access_key=os.getenv("MODEL_ACCESS_KEY"))

result = client.images.generations.create(
    model="openai-gpt-image-1",
    prompt="A cute baby sea otter, children’s book drawing style",
    size="1024x1024",
    n=1,
)

b64 = result.data[0].b64_json
with open("sea_otter.png", "wb") as f:
    f.write(base64.b64decode(b64))
print("Saved sea_otter.png")
Generate Image, Audio, or Text-to-Speech Using fal Models
The following examples show how to generate an image or audio clip, or use text-to-speech with fal models with the /v1/async-invoke endpoint.
The following example sends a request to generate an image using the fal-ai/flux/schnell model.
curl -X POST https://inference.do-ai.run/v1/async-invoke \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "fal-ai/flux/schnell",
    "input": { "prompt": "A futuristic city at sunset" }
  }'
You can update the image generation request to also include the output format, number of inference steps, guidance scale, number of images to generate, and safety checker option:
curl -X POST https://inference.do-ai.run/v1/async-invoke \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "fal-ai/fast-sdxl",
    "input": {
      "prompt": "A futuristic cityscape at sunset, with flying cars and towering skyscrapers.",
      "output_format": "landscape_4_3",
      "num_inference_steps": 4,
      "guidance_scale": 3.5,
      "num_images": 1,
      "enable_safety_checker": true
    },
    "tags": [
      {"key": "type", "value": "test"}
    ]
  }'
The following example sends a request to generate a 60-second audio clip using the fal-ai/stable-audio-25/text-to-audio model:
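A representative request; the seconds_total input field and the prompt are assumptions here, so confirm the model's input schema before using them:

```bash
curl -X POST https://inference.do-ai.run/v1/async-invoke \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "fal-ai/stable-audio-25/text-to-audio",
    "input": {
      "prompt": "Calm ambient music with soft synth pads",
      "seconds_total": 60
    }
  }'
```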
When you send a request to the /v1/async-invoke endpoint, it starts an asynchronous job for the image, audio, or text-to-speech generation and returns a request_id. The job status is QUEUED initially and the response looks similar to the following:
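An illustrative example (the request_id value is a placeholder):

```json
{"request_id":"4f6c9e2a-...","status":"QUEUED"}
```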
Poll the status endpoint periodically using the request_id to check the progress of the job:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>/status" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
When the job completes, the status updates to COMPLETE. You can then use the /v1/async-invoke/<request_id> endpoint to fetch the complete generated result:
curl -X GET "https://inference.do-ai.run/v1/async-invoke/<request_id>" \
-H "Authorization: Bearer $MODEL_ACCESS_KEY"
The response includes a URL to the generated image, audio, or text-to-speech file, which you can download or open directly in your browser or app:
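An illustrative example for an image job; the exact result structure varies by model, but image models typically return an images array of file URLs:

```json
{"request_id":"4f6c9e2a-...","status":"COMPLETE","result":{"images":[{"url":"https://...","content_type":"image/png","width":1024,"height":1024}]}}
```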
Alternatively, you can call serverless inference from your automation workflows. The n8n community node connects to any DigitalOcean-hosted model using your model access key. You can self-host n8n using the n8n Marketplace app.
Model Access Keys
You can create and manage model access keys in the Model Access Keys section of the Serverless inference page in the DigitalOcean Control Panel or using the API.
Create Keys
To create a model access key, click Create model access key to open the Add model access key window. In the Key name field, enter a name for your model access key, then click Add model access key.
Your new model access key with its creation date appears in the Model Access Keys section. The secret key is visible only once, immediately after creation, so copy and store it securely.
Model access keys are private and incur usage-based charges. Do not share them or expose them in front-end code. We recommend storing them using a secrets manager (for example, AWS Secrets Manager, HashiCorp Vault, or 1Password) or a secure environment variable in your deployment configuration.
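As a minimal sketch, assuming the key is exported in the MODEL_ACCESS_KEY environment variable as in the earlier examples, application code can read it at runtime instead of hard-coding it:

```python
import os

def get_model_access_key() -> str:
    # Read the key from the environment; fail fast if it is missing
    # rather than sending unauthenticated requests.
    key = os.environ.get("MODEL_ACCESS_KEY")
    if not key:
        raise RuntimeError("MODEL_ACCESS_KEY is not set")
    return key

# Pass the key as a bearer token at request time, never in source code:
# headers = {"Authorization": f"Bearer {get_model_access_key()}"}
```

The same pattern works with a secrets manager by swapping the environment lookup for the manager's client call.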
How to Create Model API Key Using the DigitalOcean API
Rename Keys
Renaming a model access key can help you organize and manage your keys more effectively, especially when using multiple keys for different projects or environments.
To rename a key, click … to the right of the key in the list to open the key’s menu, then click Rename. In the Rename model access key window that opens, in the Key name field, enter a new name for your key and then click UPDATE.
How to Rename Model API Key Using the DigitalOcean API
Regenerate Keys
Regenerating a model access key creates a new secret key and immediately and permanently invalidates the previous one. If a key has been compromised or you want to rotate keys for security purposes, regenerate the key, then update any applications using the previous key to use the new key.
To regenerate a key, click … to the right of the key in the list to open the key’s menu, then click Regenerate. In the Regenerate model access key window that opens, enter the name of your key to confirm the action, then click Regenerate access key. Your new secret key is displayed in the Model Access Keys section.
How to Regenerate Model API Key Using the DigitalOcean API
Delete Keys
Deleting a model access key permanently and irreversibly destroys it. Any applications using a destroyed key lose access to the API.
To delete a key, click … to the right of the key in the list to open the key’s menu, then click Delete. In the Delete model access key window that opens, type the name of the key to confirm the deletion, then click Delete access key.
How to Delete Model API Key Using the DigitalOcean API