Speech

This page adapts the original AI SDK documentation: Speech.

Warning: Speech is an experimental feature.

The AI SDK provides the generateSpeech function to generate speech from text using a speech model.

import SwiftAISDK
import OpenAIProvider

let audio = try await generateSpeech(
  model: openai.speech("tts-1"),
  text: "Hello, world!",
  voice: "alloy"
)

Language Setting

You can specify the language for speech generation (provider support varies):

import SwiftAISDK
import LMNTProvider

let spanish = try await generateSpeech(
  model: lmnt.speech("aurora"),
  text: "Hola, mundo!",
  language: "es"
)

To access the generated audio (here, the `audio` result from the first example):

let data = audio.audio.data     // `Data` with audio bytes
let base64 = audio.audio.base64 // Base64 encoded audio
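
A common next step is writing the bytes to disk. The following is a minimal sketch that uses only Foundation and works on any `Data` value, such as `audio.audio.data`. The helper names `saveAudio` and `decodeAudio` are hypothetical (they are not part of the SDK), and the default `mp3` extension is an assumption, since the actual audio format depends on the model and provider settings.

```swift
import Foundation

/// Hypothetical helper (not part of the SDK): writes audio bytes to a file
/// in the temporary directory and returns the file URL.
/// The default "mp3" extension is an assumption; use the format your provider returns.
func saveAudio(_ data: Data, named name: String, fileExtension: String = "mp3") throws -> URL {
  let url = FileManager.default.temporaryDirectory
    .appendingPathComponent(name)
    .appendingPathExtension(fileExtension)
  // An atomic write avoids leaving a partially written file behind on failure.
  try data.write(to: url, options: .atomic)
  return url
}

/// Hypothetical helper: decodes a Base64 string (such as `audio.audio.base64`)
/// back into raw bytes. Returns nil if the string is not valid Base64.
func decodeAudio(base64: String) -> Data? {
  Data(base64Encoded: base64)
}
```

With the result from the first example, this could be called as `let url = try saveAudio(audio.audio.data, named: "hello")`.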

Settings

Provider-Specific Settings

You can set model-specific settings with the providerOptions parameter.

let customized = try await generateSpeech(
  model: openai.speech("tts-1"),
  text: "Hello, world!",
  voice: "alloy",
  providerOptions: ["openai": [
    "speed": 1.0
  ]]
)

Abort Signals and Timeouts

generateSpeech accepts an optional abortSignal closure of type @Sendable () -> Bool that you can use to abort the speech generation process or set a timeout. As in the example below, the closure returns true once generation should be aborted.

import Foundation // provides Date

// Abort once one second has elapsed
let deadline = Date().addingTimeInterval(1)

let timedAudio = try await generateSpeech(
  model: openai.speech("tts-1"),
  text: "Hello, world!",
  abortSignal: { Date() >= deadline }
)
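
If several calls need the same timeout logic, the deadline check can be factored into a reusable closure factory. The sketch below is self-contained and uses only Foundation; `timeout(after:)` and `anyOf(_:)` are hypothetical helper names, not SDK functions. It assumes, as in the example above, that a closure returning `true` means "abort".

```swift
import Foundation

/// Hypothetical helper (not part of the SDK): returns a closure that starts
/// reporting `true` once `seconds` have elapsed since this function was called.
func timeout(after seconds: TimeInterval) -> @Sendable () -> Bool {
  let deadline = Date().addingTimeInterval(seconds)
  return { Date() >= deadline }
}

/// Hypothetical helper: combines several abort conditions into one closure
/// that reports `true` as soon as any of them does.
func anyOf(_ signals: [@Sendable () -> Bool]) -> @Sendable () -> Bool {
  return { signals.contains { $0() } }
}
```

The result can be passed directly, for example `abortSignal: timeout(after: 5)`, or combined with another condition through `anyOf`.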

Custom Headers

generateSpeech accepts an optional headers parameter of type [String: String] that you can use to add custom headers to the speech generation request.

let headerAudio = try await generateSpeech(
  model: openai.speech("tts-1"),
  text: "Hello, world!",
  headers: ["X-Custom-Header": "custom-value"]
)
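
Because headers is a plain [String: String] dictionary, app-wide defaults and per-request values can be combined before the call. The following is a small sketch using the standard library's `merging(_:uniquingKeysWith:)`; the helper name `mergedHeaders` and the header names in the usage note are hypothetical.

```swift
/// Hypothetical helper (not part of the SDK): merges app-wide default headers
/// with per-request headers. Per-request values win when a key appears in both.
/// Note: dictionary keys are compared case-sensitively, whereas HTTP header
/// names are case-insensitive, so keep the casing of keys consistent.
func mergedHeaders(
  defaults: [String: String],
  overrides: [String: String]
) -> [String: String] {
  defaults.merging(overrides) { _, new in new }
}
```

For example, `mergedHeaders(defaults: ["X-App": "demo"], overrides: ["X-Custom-Header": "custom-value"])` yields a dictionary containing both entries, which could then be passed as the headers argument.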

Warnings

Warnings (e.g. unsupported parameters) are available on the warnings property of the result.

let speech = try await generateSpeech(
  model: openai.speech("tts-1"),
  text: "Hello, world!"
)

print(speech.warnings)

Error Handling

When generateSpeech cannot generate valid audio, it throws a NoSpeechGeneratedError.

This error can arise for any of the following reasons:

The model failed to generate a response
The model generated a response that could not be parsed

The error preserves the following information to help you log the issue:

responses: Metadata about the speech model responses, including timestamp, model, and headers.
cause: The cause of the error. You can use this for more detailed error handling.

import SwiftAISDK
import OpenAIProvider

do {
  _ = try await generateSpeech(
    model: openai.speech("tts-1"),
    text: "Hello, world!"
  )
} catch let error as NoSpeechGeneratedError {
  print("AI_NoSpeechGeneratedError")
  print("Cause:", error.cause ?? "none")
  print("Responses:", error.responses)
} catch {
  // Any other error type; keeps the do-catch exhaustive
  print("Unexpected error:", error)
}

Speech Models

Provider	Model
OpenAI	`tts-1`
OpenAI	`tts-1-hd`
OpenAI	`gpt-4o-mini-tts`
ElevenLabs	`eleven_v3`
ElevenLabs	`eleven_multilingual_v2`
ElevenLabs	`eleven_flash_v2_5`
ElevenLabs	`eleven_flash_v2`
ElevenLabs	`eleven_turbo_v2_5`
ElevenLabs	`eleven_turbo_v2`
LMNT	`aurora`
LMNT	`blizzard`
Hume	`default`

The table above lists a small subset of the speech models supported by the AI SDK providers. For more, see the respective provider documentation.