
Transcription

This page adapts the original AI SDK documentation: Transcription.

Warning: Transcription is an experimental feature.

The AI SDK provides the transcribe function to transcribe audio using a transcription model.

import SwiftAISDK
import OpenAIProvider

let transcript = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3")))
)

The audio parameter accepts .data(Data), .base64(String), .url(URL), or .file(URL, removesAfterTranscription: Bool).
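If your audio arrives as a base64 string (for example inside a JSON payload), or you need to produce one from raw bytes, Foundation can convert between the two shapes that .data(Data) and .base64(String) expect. The following is a minimal sketch using only Foundation; the four sample bytes are a stand-in for real audio, and nothing here calls the SDK.

```swift
import Foundation

// Stand-in bytes for illustration; real code would load an actual audio file.
let audioBytes = Data([0x49, 0x44, 0x33, 0x04])

// Data -> base64 string, the shape a .base64(String) input would carry.
let encoded = audioBytes.base64EncodedString()

// base64 string -> Data, the shape a .data(Data) input would carry.
// Data(base64Encoded:) returns nil when the string is not valid base64.
let decoded = Data(base64Encoded: encoded)

print(encoded)
print(decoded == audioBytes)
```

Checking the decoded value for nil before passing it on avoids sending malformed input to the model.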

To access the generated transcript:

let text = transcript.text // e.g. "Hello, world!"
let segments = transcript.segments // [TranscriptionSegment]
let language = transcript.language // Optional language code
let duration = transcript.durationInSeconds // Optional duration
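As an illustration of working with segment timing, the sketch below renders segments as timestamped lines. It uses a local stand-in struct, Segment, with text, startSecond, and endSecond fields. These field names are assumptions modeled on the upstream AI SDK and may not match TranscriptionSegment exactly, so check the actual type before adapting it. formatTimestamp and renderTimeline are hypothetical helpers, not SDK functions.

```swift
import Foundation

// Hypothetical stand-in for the SDK's TranscriptionSegment; field names are assumed.
struct Segment {
    let text: String
    let startSecond: Double
    let endSecond: Double
}

// Formats a non-negative number of seconds as HH:MM:SS.mmm.
func formatTimestamp(_ seconds: Double) -> String {
    let totalMillis = Int((seconds * 1000).rounded())
    let hours = totalMillis / 3_600_000
    let minutes = (totalMillis / 60_000) % 60
    let secs = (totalMillis / 1000) % 60
    let millis = totalMillis % 1000
    return String(format: "%02d:%02d:%02d.%03d", hours, minutes, secs, millis)
}

// Renders one line per segment in the form "[start - end] text".
func renderTimeline(_ segments: [Segment]) -> String {
    segments
        .map { "[\(formatTimestamp($0.startSecond)) - \(formatTimestamp($0.endSecond))] \($0.text)" }
        .joined(separator: "\n")
}

// Sample values for illustration only.
let sample = [
    Segment(text: "Hello,", startSecond: 0.0, endSecond: 0.62),
    Segment(text: "world!", startSecond: 0.62, endSecond: 1.5),
]
print(renderTimeline(sample))
```

With the real SDK you would map transcript.segments into this shape, or adapt renderTimeline to read the segment type directly.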

Transcription models often have provider-specific or model-specific settings, which you can set using the providerOptions parameter.

let detailed = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    providerOptions: ["openai": [
        "timestampGranularities": ["word"]
    ]]
)

transcribe accepts an optional abortSignal closure of type @Sendable () -> Bool that you can use to abort the transcription process or set a timeout.

let deadline = Date().addingTimeInterval(1)
let timed = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    abortSignal: { Date() >= deadline }
)
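Because the abort signal is a plain closure of type @Sendable () -> Bool, timeouts can be built and combined with small helpers. The sketch below shows one possible approach using only Foundation. makeTimeoutSignal and anySignal are hypothetical names, not part of the SDK, and how often the SDK polls the closure is not specified here.

```swift
import Foundation

// Builds a closure matching the documented abortSignal type.
// It returns true once `timeout` seconds have passed since this function was called.
// Hypothetical helper, not part of the SDK.
func makeTimeoutSignal(after timeout: TimeInterval) -> @Sendable () -> Bool {
    let deadline = Date().addingTimeInterval(timeout)
    return { Date() >= deadline }
}

// Combines several signals: reports true as soon as any one of them does.
// Hypothetical helper, not part of the SDK.
func anySignal(_ signals: [@Sendable () -> Bool]) -> @Sendable () -> Bool {
    return { signals.contains { $0() } }
}

let expired = makeTimeoutSignal(after: -1)  // deadline already in the past
let pending = makeTimeoutSignal(after: 60)  // deadline one minute away
let combined = anySignal([expired, pending])

print(expired(), pending(), combined())
```

A closure built this way could be passed as the abortSignal argument, for example to pair a timeout with a user-initiated cancel flag.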

transcribe accepts an optional headers parameter of type [String: String] that you can use to add custom headers to the transcription request.

let customHeaderTranscript = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    headers: ["X-Custom-Header": "custom-value"]
)

Warnings (e.g. unsupported parameters) are available on the warnings property.

let result = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3")))
)
print(result.warnings)

When transcribe cannot generate a valid transcript, it throws a NoTranscriptGeneratedError.

This error can arise for any of the following reasons:

  • The model failed to generate a response
  • The model generated a response that could not be parsed

The error preserves the following information to help you log the issue:

  • responses: Metadata about the transcription model responses, including timestamp, model, and headers.
  • cause: The cause of the error. You can use this for more detailed error handling.

import SwiftAISDK
import OpenAIProvider

let audioData = try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))

do {
    _ = try await transcribe(
        model: openai.transcription("whisper-1"),
        audio: .data(audioData)
    )
} catch let error as NoTranscriptGeneratedError {
    print("NoTranscriptGeneratedError")
    print("Cause:", error.cause ?? "none")
    print("Responses:", error.responses)
}

Provider        Model
OpenAI          whisper-1
OpenAI          gpt-4o-transcribe
OpenAI          gpt-4o-mini-transcribe
ElevenLabs      scribe_v1
ElevenLabs      scribe_v1_experimental
Groq            whisper-large-v3-turbo
Groq            distil-whisper-large-v3-en
Groq            whisper-large-v3
Azure OpenAI    whisper-1
Azure OpenAI    gpt-4o-transcribe
Azure OpenAI    gpt-4o-mini-transcribe
Rev.ai          machine
Rev.ai          low_cost
Rev.ai          fusion
Deepgram        base (+ variants)
Deepgram        enhanced (+ variants)
Deepgram        nova (+ variants)
Deepgram        nova-2 (+ variants)
Deepgram        nova-3 (+ variants)
Gladia          default
AssemblyAI      best
AssemblyAI      nano
Fal             whisper
Fal             wizper

The models listed above are a small subset of the transcription models supported by the AI SDK providers. For more, see the respective provider documentation.