Speech
This page adapts the original AI SDK documentation: Speech.
Warning: Speech is an experimental feature.
The AI SDK provides the `generateSpeech` function to generate speech from text using a speech model.
```swift
import SwiftAISDK
import OpenAIProvider

let audio = try await generateSpeech(
    model: openai.speech(modelId: "tts-1"),
    text: "Hello, world!",
    voice: "alloy"
)
```

Language Setting
You can specify the language for speech generation (provider support varies):
```swift
import SwiftAISDK
import LMNTProvider

let spanish = try await generateSpeech(
    model: lmnt.speech("aurora"),
    text: "Hola, mundo!",
    language: "es"
)
```

To access the generated audio (using the `audio` result from the first example):

```swift
let data = audio.audio.data     // `Data` with audio bytes
let base64 = audio.audio.base64 // Base64 encoded audio
```
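Once you have the bytes, a common next step is to persist them or to turn a Base64 string back into `Data`. The following is a minimal sketch using only Foundation. The helpers `saveAudio` and `decodeAudio` are hypothetical names, not part of the SDK, and the `.mp3` extension is an assumption: the actual audio format depends on the provider and model.

```swift
import Foundation

/// Hypothetical helper (not part of the SDK): writes audio bytes to a file
/// in the temporary directory and returns the file URL.
func saveAudio(_ data: Data, fileName: String) throws -> URL {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(fileName)
    try data.write(to: url, options: .atomic)
    return url
}

/// Hypothetical helper: decodes a Base64 string back into raw bytes.
/// Returns nil if the string is not valid Base64.
func decodeAudio(base64: String) -> Data? {
    Data(base64Encoded: base64)
}

// Stand-in bytes so the sketch runs without a network call; in practice
// you would pass `audio.audio.data` (or `audio.audio.base64`) instead.
let sampleBytes = Data([0x49, 0x44, 0x33, 0x04])
let savedURL = try saveAudio(sampleBytes, fileName: "speech-sample.mp3")
let roundTripped = decodeAudio(base64: sampleBytes.base64EncodedString())
```

With a real result, the call would look like `try saveAudio(audio.audio.data, fileName: "hello.mp3")`.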
Settings
Provider-Specific settings
You can set model-specific settings with the `providerOptions` parameter.
```swift
let customized = try await generateSpeech(
    model: openai.speech(modelId: "tts-1"),
    text: "Hello, world!",
    voice: "alloy",
    providerOptions: ["openai": ["speed": 1.0]]
)
```

Abort Signals and Timeouts
`generateSpeech` accepts an optional `abortSignal` closure of type `@Sendable () -> Bool` that you can use to abort the speech generation process or set a timeout. In the example that follows, the closure returns `true` once a one-second deadline has passed.
```swift
// Abort once one second has elapsed.
let deadline = Date().addingTimeInterval(1)

let timedAudio = try await generateSpeech(
    model: openai.speech("tts-1"),
    text: "Hello, world!",
    abortSignal: { Date() >= deadline }
)
```
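The deadline pattern above can be factored into reusable helpers. This is a sketch under stated assumptions: `timeoutSignal` and `anySignal` are hypothetical names, not SDK functions. The only thing taken from the SDK is the closure shape `@Sendable () -> Bool`, where `true` is taken to mean "abort", as in the deadline example.

```swift
import Foundation

/// Hypothetical helper (not part of the SDK): builds a closure with the
/// `@Sendable () -> Bool` shape. The closure returns `true` once
/// `seconds` have elapsed since this function was called.
func timeoutSignal(after seconds: TimeInterval) -> @Sendable () -> Bool {
    let deadline = Date().addingTimeInterval(seconds)
    return { Date() >= deadline }
}

/// Hypothetical helper: combines several signals into one that
/// returns `true` as soon as any of them does.
func anySignal(_ signals: [@Sendable () -> Bool]) -> @Sendable () -> Bool {
    return { signals.contains { $0() } }
}

let expired = timeoutSignal(after: -1)    // deadline already in the past
let pending = timeoutSignal(after: 3600)  // deadline one hour away
let combined = anySignal([expired, pending])
```

A call site would then read `abortSignal: timeoutSignal(after: 1)`. Combining signals is useful when a timeout and a user-initiated cancel flag should both stop the request.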
Custom Headers
`generateSpeech` accepts an optional `headers` parameter of type `[String: String]` that you can use to add custom headers to the speech generation request.
```swift
let headerAudio = try await generateSpeech(
    model: openai.speech("tts-1"),
    text: "Hello, world!",
    headers: ["X-Custom-Header": "custom-value"]
)
```

Warnings
Warnings (e.g. unsupported parameters) are available on the `warnings` property of the result.
```swift
let speech = try await generateSpeech(
    model: openai.speech("tts-1"),
    text: "Hello, world!"
)

print(speech.warnings)
```

Error Handling
When `generateSpeech` cannot generate valid audio, it throws a `NoSpeechGeneratedError`.
This error can arise for any of the following reasons:
- The model failed to generate a response
- The model generated a response that could not be parsed
The error preserves the following information to help you log the issue:
- `responses`: Metadata about the speech model responses, including timestamp, model, and headers.
- `cause`: The cause of the error. You can use this for more detailed error handling.
```swift
import SwiftAISDK
import OpenAIProvider

do {
    _ = try await generateSpeech(
        model: openai.speech("tts-1"),
        text: "Hello, world!"
    )
} catch let error as NoSpeechGeneratedError {
    print("AI_NoSpeechGeneratedError")
    print("Cause:", error.cause ?? "none")
    print("Responses:", error.responses)
}
```
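If a failed generation is worth retrying in your application, the catch pattern above can be wrapped in a small retry loop. The sketch below is an illustration under assumptions, not an SDK feature or a recommendation from the SDK authors: `withRetries` and `TransientFailure` are hypothetical names, and a stand-in operation replaces the real `generateSpeech` call so the block runs on its own.

```swift
import Foundation

/// Hypothetical helper (not part of the SDK): runs `operation` up to
/// `maxAttempts` times. An error is rethrown immediately when
/// `shouldRetry` rejects it or when the last attempt has failed.
func withRetries<T>(
    maxAttempts: Int,
    shouldRetry: (Error) -> Bool,
    operation: () async throws -> T
) async throws -> T {
    precondition(maxAttempts >= 1, "maxAttempts must be at least 1")
    var attempt = 1
    while true {
        do {
            return try await operation()
        } catch {
            // Give up on the last attempt or on errors the caller rejects.
            if attempt >= maxAttempts || !shouldRetry(error) { throw error }
            attempt += 1
        }
    }
}

/// Stand-in error type used only by this demo.
struct TransientFailure: Error {}

/// Demo: the stand-in operation fails twice, then succeeds.
func demo() async throws -> (value: String, calls: Int) {
    var calls = 0
    let value: String = try await withRetries(
        maxAttempts: 3,
        shouldRetry: { $0 is TransientFailure }
    ) {
        calls += 1
        if calls < 3 { throw TransientFailure() }
        return "ok"
    }
    return (value, calls)
}
```

With the SDK, the predicate could be `{ $0 is NoSpeechGeneratedError }` and the operation the `generateSpeech` call itself. Whether retrying helps depends on the `cause` reported by the error.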
Speech Models

| Provider | Model |
|---|---|
| OpenAI | tts-1 |
| OpenAI | tts-1-hd |
| OpenAI | gpt-4o-mini-tts |
| ElevenLabs | eleven_v3 |
| ElevenLabs | eleven_multilingual_v2 |
| ElevenLabs | eleven_flash_v2_5 |
| ElevenLabs | eleven_flash_v2 |
| ElevenLabs | eleven_turbo_v2_5 |
| ElevenLabs | eleven_turbo_v2 |
| LMNT | aurora |
| LMNT | blizzard |
| Hume | default |
The table above lists a small subset of the speech models supported by the AI SDK providers. For more, see the respective provider documentation.