The OpenAI provider contains language model support for the OpenAI responses, chat, and completion APIs, as well as embedding model support for the OpenAI embeddings API.
The OpenAI provider instance is a function that you can invoke to create a language model:
let model = openai("gpt-5")
It automatically selects the correct API based on the model id.
You can also pass additional settings in the second argument:
let model = openai("gpt-5", /* additional settings */)
The available options depend on the API that’s automatically chosen for the model (see below).
If you want to explicitly select a specific model API, you can use .responses, .chat, or .completion.
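For example:

let responsesModel = openai.responses("gpt-5")
let chatModel = openai.chat("gpt-5")
let completionModel = openai.completion("gpt-3.5-turbo-instruct")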
You can use the OpenAI responses API with the openai(modelId) or openai.responses(modelId) factory methods. It is the default API that is used by the OpenAI provider (since AI SDK 5).
let model = openai("gpt-5")
Further configuration can be done using OpenAI provider options.
import SwiftAISDK
import OpenAIProvider
let result = try await generateText(
    model: openai("gpt-5"), // or openai.responses("gpt-5")
    providerOptions: [
        "openai": [
            "parallelToolCalls": false,
            "store": false,
            "user": "user_123"
        ]
    ],
    prompt: "..."
)
The following provider options are available:
parallelToolCalls: boolean
Whether to use parallel tool calls. Defaults to true.
store: boolean
Whether to store the generation. Defaults to true.
maxToolCalls: integer
The maximum number of total calls to built-in tools that can be processed in a response. This maximum applies across all built-in tool calls, not per individual tool. Any further attempts by the model to call a tool will be ignored.
metadata: Record<string, string>
Additional metadata to store with the generation.
previousResponseId: string
The ID of the previous response. You can use it to continue a conversation. Defaults to undefined.
instructions: string
Instructions for the model. They can be used to change the system or developer message when continuing a conversation using the previousResponseId option. Defaults to undefined.
user: string
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Defaults to undefined.
reasoningEffort: 'minimal' | 'low' | 'medium' | 'high'
Reasoning effort for reasoning models. Defaults to 'medium'. If you use providerOptions to set the reasoningEffort option, this model setting will be ignored.
reasoningSummary: 'auto' | 'detailed'
Controls whether the model returns its reasoning process. Set to 'auto' for a condensed summary, 'detailed' for more comprehensive reasoning. Defaults to undefined (no reasoning summaries). When enabled, reasoning summaries appear in the stream as events with type 'reasoning' and in non-streaming responses within the reasoning field.
strictJsonSchema: boolean
Whether to use strict JSON schema validation. Defaults to false.
serviceTier: 'auto' | 'flex' | 'priority' | 'default'
Service tier for the request. Set to 'flex' for 50% cheaper processing at the cost of increased latency (available for o3, o4-mini, and gpt-5 models). Set to 'priority' for faster processing with Enterprise access (available for gpt-4, gpt-5, gpt-5-mini, o3, o4-mini; gpt-5-nano is not supported). Defaults to 'auto'.
textVerbosity: 'low' | 'medium' | 'high'
Controls the verbosity of the model's response. Lower values result in more concise responses, while higher values result in more verbose responses. Defaults to 'medium'.
include: Array<string>
Specifies additional content to include in the response. Supported values:
['file_search_call.results'] for including file search results in responses.
['message.output_text.logprobs'] for logprobs.
Defaults to undefined (see the example after this list).
promptCacheKey: string
A cache key for manual prompt caching control. Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.
safetyIdentifier: string
A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The ID should be a string that uniquely identifies each user.
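For example, to request logprobs and concise output, a minimal sketch combining the options documented above:

let result = try await generateText(
    model: openai("gpt-5"),
    prompt: "...",
    providerOptions: [
        "openai": [
            "include": ["message.output_text.logprobs"],
            "textVerbosity": "low"
        ]
    ]
)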
The OpenAI responses provider also returns provider-specific metadata:
let result = try await generateText(
    model: openai.responses("gpt-5")
)
let openaiMetadata = result.providerMetadata?.openai
The following OpenAI-specific metadata is returned:
responseId: string
The ID of the response. Can be used to continue a conversation (see the example after this list).
cachedPromptTokens: number
The number of prompt tokens that were a cache hit.
reasoningTokens: number
The number of reasoning tokens that the model generated.
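You can combine responseId and previousResponseId to continue a conversation across calls. A minimal sketch, assuming the metadata value can be passed back directly as a provider option:

let first = try await generateText(
    model: openai("gpt-5"),
    prompt: "Suggest a name for a hiking app."
)

// responseId comes from the provider metadata documented above.
if let responseId = first.providerMetadata?.openai?.responseId {
    let followUp = try await generateText(
        model: openai("gpt-5"),
        prompt: "Now write a tagline for it.",
        providerOptions: [
            "openai": ["previousResponseId": responseId]
        ]
    )
    print(followUp.text)
}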
For reasoning models like gpt-5, you can enable reasoning summaries to see the model’s thought process. Different models support different summarizers—for example, o4-mini supports detailed summaries. Set reasoningSummary: "auto" to automatically receive the richest level available.
import SwiftAISDK
import OpenAIProvider
let result = try streamText(
    model: openai("gpt-5"),
    prompt: "Tell me about the Mission burrito debate in San Francisco.",
    providerOptions: [
        "openai": [
            "reasoningSummary": "detailed" // 'auto' for condensed or 'detailed' for comprehensive
        ]
    ]
)
for try await part in result.fullStream {
    switch part {
    case .reasoning(let delta):
        print("Reasoning: \(delta)")
    case .textDelta(let delta):
        print(delta, terminator: "")
    default:
        break
    }
}
For non-streaming calls with generateText, the reasoning summaries are available in the reasoning field of the response:
import SwiftAISDK
import OpenAIProvider
let result = try await generateText(
    model: openai("gpt-5"),
    prompt: "Tell me about the Mission burrito debate in San Francisco.",
    providerOptions: [
        "openai": ["reasoningSummary": "auto"]
    ]
)
print(result.reasoning) // reasoning summaries, per the reasoning field described above
OpenAI’s Responses API supports multi-modal image generation as a provider-defined tool.
Availability is restricted to specific models (for example, gpt-5 variants).
You can use the image tool with either generateText or streamText:
import SwiftAISDK
import OpenAIProvider
let result = try await generateText(
    model: openai("gpt-5"),
    prompt: "Generate an image of an echidna swimming across the Mozambique channel.",
    tools: [
        "image_generation": openai.tools.imageGeneration(
            OpenAIImageGenerationArgs(outputFormat: "webp")
        )
    ]
)
for toolResult in result.staticToolResults {
    if toolResult.toolName == "image_generation" {
        let base64Image = toolResult.output.result
        // base64Image contains the generated image as base64-encoded data
    }
}
import SwiftAISDK
import OpenAIProvider

let result = try streamText(
    model: openai("gpt-5"),
    prompt: "Generate an image of an echidna swimming across the Mozambique channel.",
    tools: [
        "image_generation": openai.tools.imageGeneration(
            OpenAIImageGenerationArgs(outputFormat: "webp")
        )
    ]
)
The OpenAI responses API supports the code interpreter tool through the openai.tools.codeInterpreter tool.
This allows models to write and execute Python code.
import SwiftAISDK
import OpenAIProvider

let result = try await generateText(
    model: openai("gpt-5"),
    prompt: "Write and run Python code to calculate the factorial of 10",
    tools: [
        "code_interpreter": openai.tools.codeInterpreter()
    ]
)
The OpenAI responses API supports the local shell tool for Codex models through the openai.tools.localShell tool.
Local shell is a tool that allows agents to run shell commands locally on a machine you or the user provides.
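A minimal sketch, assuming openai.tools.localShell accepts an execute closure that receives the requested shell action and returns its output (the parameter shapes and the runCommand helper here are assumptions for illustration):

import SwiftAISDK
import OpenAIProvider

let result = try await generateText(
    model: openai("codex-mini-latest"),
    prompt: "List the files in my home directory.",
    tools: [
        "local_shell": openai.tools.localShell(
            execute: { action in
                // Hypothetical handler: run the requested command locally.
                // A real implementation should sandbox and validate commands.
                let output = try runCommand(action.command) // runCommand is a hypothetical helper
                return ["output": output]
            }
        )
    ]
)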
The Responses API also accepts PDF files as input. To reference a file uploaded to the OpenAI Files API, pass its file id:
[
    "type": "file",
    "data": "file-8EFBcWHsQxZV7YGezBC1fq",
    "mediaType": "application/pdf"
]
You can also pass the URL of a PDF:
[
    "type": "file",
    "data": "https://sample.edu/example.pdf",
    "mediaType": "application/pdf",
    "filename": "ai.pdf" // optional
]
The model will have access to the contents of the PDF file and
respond to questions about it.
The PDF file should be passed using the data field,
and the mediaType should be set to 'application/pdf'.
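Putting it together, a minimal sketch of a message that attaches a PDF, assuming messages accept the content-part dictionaries shown above:

import SwiftAISDK
import OpenAIProvider

let result = try await generateText(
    model: openai("gpt-5"),
    messages: [
        [
            "role": "user",
            "content": [
                ["type": "text", "text": "What is this paper about?"],
                [
                    "type": "file",
                    "data": "https://sample.edu/example.pdf",
                    "mediaType": "application/pdf"
                ]
            ]
        ]
    ]
)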
The OpenAI Responses API supports structured outputs. You can enforce structured outputs using generateObject or streamObject, which expose a schema option. Additionally, you can pass a schema to the experimental_output option when using generateText or streamText.
// Using generateObject
import SwiftAISDK
import OpenAIProvider
struct Ingredient: Codable, Sendable { let name: String; let amount: String }
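A minimal sketch of the rest of the call; the exact generateObject signature (a Codable type as the schema argument) is an assumption of this example:

struct Recipe: Codable, Sendable {
    let name: String
    let ingredients: [Ingredient]
    let steps: [String]
}

let result = try await generateObject(
    model: openai("gpt-5"),
    schema: Recipe.self, // assumed: schema derived from a Codable type
    prompt: "Generate a lasagna recipe."
)
print(result.object.name)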
You can create models that call the OpenAI chat API using the .chat() factory method.
The first argument is the model id, e.g. gpt-4.
The OpenAI chat models support tool calls and some have multi-modal capabilities.
let model = openai.chat("gpt-5")
OpenAI chat models also support some model-specific provider options that are not part of the standard call settings.
You can pass them in the providerOptions argument:
import SwiftAISDK
import OpenAIProvider
let model = openai.chat("gpt-5")
let result = try await generateText(
    model: model,
    prompt: "Hello!",
    providerOptions: [
        "openai": [
            "logitBias": [
                "50256": -100 // optional likelihood for specific tokens
            ],
            "user": "test-user" // optional unique user identifier
        ]
    ]
)
The following optional provider options are available for OpenAI chat models:
logitBias: Record<number, number>
Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use a tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.
logprobs: boolean | number
Return the log probabilities of the tokens. Including logprobs will increase the response size and can slow down response times. However, it can be useful to better understand how the model is behaving. Setting to true will return the log probabilities of the tokens that were generated. Setting to a number will return the log probabilities of the top n tokens that were generated.
parallelToolCalls: boolean
Whether to enable parallel function calling during tool use. Defaults to true.
user: string
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
reasoningEffort: 'minimal' | 'low' | 'medium' | 'high'
Reasoning effort for reasoning models. Defaults to 'medium'. If you use providerOptions to set the reasoningEffort option, this model setting will be ignored.
structuredOutputs: boolean
Whether to use structured outputs. Defaults to true. When enabled, tool calls and object generation will be strict and follow the provided schema.
maxCompletionTokens: number
Maximum number of completion tokens to generate. Useful for reasoning models.
serviceTier: 'auto' | 'flex' | 'priority' | 'default'
Service tier for the request. Set to 'flex' for 50% cheaper processing at the cost of increased latency (available for o3, o4-mini, and gpt-5 models). Set to 'priority' for faster processing with Enterprise access (available for gpt-4, gpt-5, gpt-5-mini, o3, o4-mini; gpt-5-nano is not supported). Defaults to 'auto'.
strictJsonSchema: boolean
Whether to use strict JSON schema validation. Defaults to false.
textVerbosity: 'low' | 'medium' | 'high'
Controls the verbosity of the model's responses. Lower values will result in more concise responses, while higher values will result in more verbose responses.
Tip: Instead of manually building providerOptions, you can use the Swift helper openai.options.responses(...) to produce the same dictionary. For example:
let options = openai.options.responses(
include: [.fileSearchCallResults],
serviceTier: "auto",
reasoningEffort: "high"
)
promptCacheKey: string
A cache key for manual prompt caching control. Used by OpenAI to cache responses for similar requests to optimize your cache hit rates (see the example after this list).
safetyIdentifier: string
A stable identifier used to help detect users of your application that may be violating OpenAI's usage policies. The ID should be a string that uniquely identifies each user.
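For example, to improve cache hit rates across similar requests, a minimal sketch using the two options above:

let result = try await generateText(
    model: openai.chat("gpt-5"),
    prompt: "...",
    providerOptions: [
        "openai": [
            "promptCacheKey": "customer-support-v1",
            "safetyIdentifier": "user_123"
        ]
    ]
)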
OpenAI has introduced the o1, o3, and o4 series of reasoning models.
Currently, o4-mini, o3, o3-mini, and o1 are available via both the chat and responses APIs. The
models codex-mini-latest and computer-use-preview are available only via the responses API.
Reasoning models currently only generate text, have several limitations, and are only supported using generateText and streamText.
They support additional settings and response metadata:
You can use providerOptions to set
the reasoningEffort option (or alternatively the reasoningEffort model setting), which determines the amount of reasoning the model performs.
You can use response providerMetadata to access the number of reasoning tokens that the model generated.
import SwiftAISDK
import OpenAIProvider

let result = try await generateText(
    model: openai.chat("gpt-5"),
    prompt: "Invent a new holiday and describe its traditions.",
    providerOptions: [
        "openai": ["reasoningEffort": "low"]
    ]
)

let openaiMetadata = result.providerMetadata?.openai
let reasoningTokens = openaiMetadata?.reasoningTokens
OpenAI supports predicted outputs for gpt-4o and gpt-4o-mini.
Predicted outputs help you reduce latency by allowing you to specify a base text that the model should modify.
You can enable predicted outputs by adding the prediction option to the providerOptions.openai object:
let result = try streamText(
    model: openai.chat("gpt-4o"),
    messages: [
        [
            "role": "user",
            "content": "Replace the Username property with an Email property."
        ],
        [
            "role": "user",
            "content": existingCode
        ]
    ],
    providerOptions: [
        "openai": [
            "prediction": [
                "type": "content",
                "content": existingCode
            ]
        ]
    ]
)
OpenAI provides usage information for predicted outputs (acceptedPredictionTokens and rejectedPredictionTokens).
You can access it in the providerMetadata object.
let openaiMetadata = try await result.providerMetadata?.openai
let acceptedPredictionTokens = openaiMetadata?.acceptedPredictionTokens
let rejectedPredictionTokens = openaiMetadata?.rejectedPredictionTokens
OpenAI supports model distillation for some models.
If you want to store a generation for use in the distillation process, you can add the store option to the providerOptions.openai object.
This will save the generation to the OpenAI platform for later use in distillation.
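For example (a minimal sketch; the model id is illustrative, and store and metadata are the provider options documented above):

import SwiftAISDK
import OpenAIProvider

let result = try await generateText(
    model: openai.chat("gpt-4o-mini"),
    prompt: "Who worked on the original macintosh?",
    providerOptions: [
        "openai": [
            "store": true,
            "metadata": ["custom": "value"]
        ]
    ]
)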
OpenAI has introduced Prompt Caching for supported models
including gpt-4o and gpt-4o-mini.
Prompt caching is automatically enabled for these models when the prompt is 1024 tokens or longer. It does not need to be explicitly enabled.
You can use response providerMetadata to access the number of prompt tokens that were a cache hit.
Note that caching behavior is dependent on load on OpenAI’s infrastructure. Prompt prefixes generally remain in the
cache following 5-10 minutes of inactivity before they are evicted, but during off-peak periods they may persist for up
to an hour.
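For example, following the metadata pattern shown earlier (longPrompt is a placeholder for a prompt of 1024 tokens or more):

let result = try await generateText(
    model: openai("gpt-4o-mini"),
    prompt: longPrompt // placeholder: a prompt of 1024+ tokens
)

let openaiMetadata = result.providerMetadata?.openai
let cachedPromptTokens = openaiMetadata?.cachedPromptTokens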
You can create models that call the OpenAI completions API using the .completion() factory method.
The first argument is the model id.
Currently only gpt-3.5-turbo-instruct is supported.
let model = openai.completion("gpt-3.5-turbo-instruct")
OpenAI completion models also support some model-specific settings that are not part of the standard call settings.
You can pass them as an options argument:
let model = openai.completion("gpt-3.5-turbo-instruct")
try await model.doGenerate(
    providerOptions: [
        "openai": [
            "echo": true, // optional, echo the prompt in addition to the completion
            "logitBias": [
                // optional likelihood for specific tokens
                "50256": -100
            ],
            "suffix": "some text", // optional suffix that comes after a completion of inserted text
            "user": "test-user" // optional unique user identifier
        ]
    ]
)
The following optional provider options are available for OpenAI completion models:
echo: boolean
Echo back the prompt in addition to the completion.
logitBias: Record<number, number>
Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use a tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.
logprobs: boolean | number
Return the log probabilities of the tokens. Including logprobs will increase the response size and can slow down response times. However, it can be useful to better understand how the model is behaving. Setting to true will return the log probabilities of the tokens that were generated. Setting to a number will return the log probabilities of the top n tokens that were generated.
suffix: string
The suffix that comes after a completion of inserted text.
user: string
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
You can pass optional providerOptions to the image model. These are prone to change by OpenAI and are model dependent. For example, the gpt-image-1 model supports the quality option:
let result = try await generateImage(
    model: openai.image("gpt-image-1"),
    prompt: "A salamander at sunrise in a forest pond in the Seychelles.",
    providerOptions: [
        "openai": ["quality": "high"]
    ]
)
You can create models that call the OpenAI transcription API
using the .transcription() factory method.
The first argument is the model id, e.g. whisper-1.
let model = openai.transcription("whisper-1")
You can also pass additional provider-specific options using the providerOptions argument. For example, supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.
import SwiftAISDK
import OpenAIProvider
let result = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: Data([1, 2, 3, 4]),
    providerOptions: ["openai": ["language": "en"]]
)
To get word-level timestamps, specify the granularity:
import SwiftAISDK
import OpenAIProvider
let result = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: Data([1, 2, 3, 4]),
    providerOptions: [
        "openai": [
            "timestampGranularities": ["word"]
        ]
    ]
)

// Access word-level timestamps
print(result.segments) // Array of segments with startSecond/endSecond
The following provider options are available:
timestampGranularities: string[]
The granularity of the timestamps in the transcription. Defaults to ['segment']. Possible values are ['word'], ['segment'], and ['word', 'segment'].
Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
language: string
The language of the input audio. Supplying the input language in ISO-639-1 format (e.g. 'en') will improve accuracy and latency. Optional.
prompt: string
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. Optional.
temperature: number
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Defaults to 0. Optional.
include: string[]
Additional information to include in the transcription response.
You can create models that call the OpenAI speech API
using the .speech() factory method.
The first argument is the model id, e.g. tts-1.
let model = openai.speech("tts-1")
You can also pass additional provider-specific options using the providerOptions argument. For example, supplying a voice to use for the generated audio.
import SwiftAISDK
import OpenAIProvider
let result = try await generateSpeech(
model: openai.speech("tts-1"),
text: "Hello, world!",
providerOptions: ["openai": [:]]
)
The following provider options are available (see the example after the list):
instructions: string
Control the voice of your generated audio with additional instructions, e.g. "Speak in a slow and steady tone". Does not work with tts-1 or tts-1-hd. Optional.
response_format: string
The format of the output audio. Supported formats are mp3, opus, aac, flac, wav, and pcm. Defaults to mp3. Optional.
speed: number
The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0. Optional.
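Putting these together, a minimal sketch; gpt-4o-mini-tts is assumed here because instructions does not work with tts-1 or tts-1-hd:

let result = try await generateSpeech(
    model: openai.speech("gpt-4o-mini-tts"), // assumed model id that supports instructions
    text: "Hello, world!",
    providerOptions: [
        "openai": [
            "instructions": "Speak in a slow and steady tone",
            "response_format": "wav",
            "speed": 1.2
        ]
    ]
)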