# Transcription
This page adapts the original AI SDK documentation: Transcription.
> **Warning:** Transcription is an experimental feature.
The AI SDK provides the `transcribe` function to transcribe audio using a transcription model.
```swift
import SwiftAISDK
import OpenAIProvider
```
```swift
let transcript = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3")))
)
```

The `audio` parameter accepts `.data(Data)`, `.base64(String)`, `.url(URL)`, or `.file(URL, removesAfterTranscription: Bool)`.
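For illustration, the other `audio` variants are passed the same way. The URL and base64 string below are placeholders, not real audio:

```swift
// Transcribe audio fetched from a remote URL (placeholder URL).
let fromURL = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .url(URL(string: "https://example.com/audio.mp3")!)
)

// Transcribe base64-encoded audio bytes (placeholder payload).
let fromBase64 = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .base64("SUQzBAAAAAAA...")
)
```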
To access the generated transcript:
```swift
let text = transcript.text // e.g. "Hello, world!"
let segments = transcript.segments // [TranscriptionSegment]
let language = transcript.language // Optional language code
let duration = transcript.durationInSeconds // Optional duration
```

## Settings
### Provider-Specific Settings
Transcription models often have provider- or model-specific settings, which you can set using the `providerOptions` parameter.
```swift
let detailed = try await transcribe(
    model: openai.transcription(modelId: "whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    providerOptions: ["openai": [
        "timestampGranularities": ["word"]
    ]]
)
```

### Abort Signals and Timeouts
`transcribe` accepts an optional `abortSignal` closure of type `@Sendable () -> Bool` that you can use to abort the transcription process or set a timeout.
```swift
let deadline = Date().addingTimeInterval(1)

let timed = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    abortSignal: { Date() >= deadline }
)
```

### Custom Headers
`transcribe` accepts an optional `headers` parameter of type `[String: String]` that you can use to add custom headers to the transcription request.
```swift
let customHeaderTranscript = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))),
    headers: ["X-Custom-Header": "custom-value"]
)
```

## Warnings
Warnings (e.g. unsupported parameters) are available on the `warnings` property.
```swift
let result = try await transcribe(
    model: openai.transcription("whisper-1"),
    audio: .data(try Data(contentsOf: URL(fileURLWithPath: "audio.mp3")))
)

print(result.warnings)
```

## Error Handling
When `transcribe` cannot generate a valid transcript, it throws a `NoTranscriptGeneratedError`.
This error can arise for any of the following reasons:
- The model failed to generate a response
- The model generated a response that could not be parsed
The error preserves the following information to help you log the issue:
- `responses`: Metadata about the transcription model responses, including timestamp, model, and headers.
- `cause`: The cause of the error. You can use this for more detailed error handling.
```swift
import SwiftAISDK
import OpenAIProvider

let audioData = try Data(contentsOf: URL(fileURLWithPath: "audio.mp3"))

do {
    _ = try await transcribe(
        model: openai.transcription("whisper-1"),
        audio: .data(audioData)
    )
} catch let error as NoTranscriptGeneratedError {
    print("NoTranscriptGeneratedError")
    print("Cause:", error.cause ?? "none")
    print("Responses:", error.responses)
}
```
## Transcription Models

| Provider | Model |
|---|---|
| OpenAI | whisper-1 |
| OpenAI | gpt-4o-transcribe |
| OpenAI | gpt-4o-mini-transcribe |
| ElevenLabs | scribe_v1 |
| ElevenLabs | scribe_v1_experimental |
| Groq | whisper-large-v3-turbo |
| Groq | distil-whisper-large-v3-en |
| Groq | whisper-large-v3 |
| Azure OpenAI | whisper-1 |
| Azure OpenAI | gpt-4o-transcribe |
| Azure OpenAI | gpt-4o-mini-transcribe |
| Rev.ai | machine |
| Rev.ai | low_cost |
| Rev.ai | fusion |
| Deepgram | base (+ variants) |
| Deepgram | enhanced (+ variants) |
| Deepgram | nova (+ variants) |
| Deepgram | nova-2 (+ variants) |
| Deepgram | nova-3 (+ variants) |
| Gladia | default |
| AssemblyAI | best |
| AssemblyAI | nano |
| Fal | whisper |
| Fal | wizper |
The models above are a small subset of the transcription models supported by the AI SDK providers. For more, see the respective provider documentation.