# Transcription
This page adapts the original AI SDK documentation: Transcription.
> **Warning:** Transcription is an experimental feature.
The AI SDK provides the `transcribe` function to transcribe audio using a transcription model.
```swift
import SwiftAISDK
import OpenAIProvider
```
```swift
let transcript = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3")))
)
```

The `audio` parameter accepts `.data(Data)`, `.base64(String)`, `.url(URL)`, or `.file(URL, removesAfterTranscription: Bool)`.
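For illustration, the other `audio` variants are passed the same way. The URL and base64 string below are placeholders, not real audio:

```swift
// Transcribe audio fetched from a remote URL (placeholder URL).
let fromURL = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .url(URL(string: "https://example.com/audio.mp3")!)
)

// Transcribe base64-encoded audio bytes (placeholder payload).
let fromBase64 = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .base64("SUQzBAAAAAAA...")
)
```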
To access the generated transcript:
```swift
let text = transcript.text // e.g. "Hello, world!"
let segments = transcript.segments // [TranscriptionSegment]
let language = transcript.language // Optional language code
let duration = transcript.durationInSeconds // Optional duration
```

## Settings
### Provider-Specific Settings
Transcription models often have provider- or model-specific settings, which you can set using the `providerOptions` parameter.
```swift
let detailed = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    providerOptions: ["openai": [
        "timestampGranularities": ["word"]
    ]]
)
```

### Abort Signals and Timeouts
`transcribe` accepts an optional `abortSignal` closure of type `@Sendable () -> Bool` that you can use to abort the transcription process or set a timeout.
```swift
let deadline = Date().addingTimeInterval(1)

let timed = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    abortSignal: { Date() >= deadline }
)
```

### Custom Headers
`transcribe` accepts an optional `headers` parameter of type `[String: String]` that you can use to add custom headers to the transcription request.
```swift
let customHeaderTranscript = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    headers: ["X-Custom-Header": "custom-value"]
)
```

## Warnings
Warnings (e.g. unsupported parameters) are available on the `warnings` property.
```swift
let result = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3")))
)

print(result.warnings)
```

## Error Handling
When `transcribe` cannot generate a valid transcript, it throws a `NoTranscriptGeneratedError`.
This error can arise for any of the following reasons:
- The model failed to generate a response
- The model generated a response that could not be parsed
The error preserves the following information to help you log the issue:
- `responses`: Metadata about the transcription model responses, including timestamp, model, and headers.
- `cause`: The cause of the error. You can use this for more detailed error handling.
```swift
import SwiftAISDK
import OpenAIProvider

let audioData = try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))

do {
    _ = try await transcribe(
        model: openai.transcription("whisper-1"),
        audio: .data(audioData)
    )
} catch let error as NoTranscriptGeneratedError {
    print("NoTranscriptGeneratedError")
    print("Cause:", error.cause ?? "none")
    print("Responses:", error.responses)
}
```
## Transcription Models

| Provider | Model |
|---|---|
| OpenAI | whisper-1 |
| OpenAI | gpt-4o-transcribe |
| OpenAI | gpt-4o-mini-transcribe |
| ElevenLabs | scribe_v1 |
| ElevenLabs | scribe_v1_experimental |
| Groq | whisper-large-v3-turbo |
| Groq | distil-whisper-large-v3-en |
| Groq | whisper-large-v3 |
| Azure OpenAI | whisper-1 |
| Azure OpenAI | gpt-4o-transcribe |
| Azure OpenAI | gpt-4o-mini-transcribe |
| Rev.ai | machine |
| Rev.ai | low_cost |
| Rev.ai | fusion |
| Deepgram | base (+ variants) |
| Deepgram | enhanced (+ variants) |
| Deepgram | nova (+ variants) |
| Deepgram | nova-2 (+ variants) |
| Deepgram | nova-3 (+ variants) |
| Gladia | default |
| AssemblyAI | best |
| AssemblyAI | nano |
| Fal | whisper |
| Fal | wizper |
The models above are a small subset of the transcription models supported by the AI SDK providers. For more, see the respective provider documentation.