Kling AI

This page adapts the original AI SDK documentation: Kling AI.

The Kling AI provider contains support for Kling AI’s video generation models, including text-to-video, image-to-video, motion control, and multi-shot video generation.

The Kling AI provider is available in the KlingAIProvider module. Add it to your Swift package:

// Package.swift (excerpt)
dependencies: [
    .package(url: "https://github.com/teunlao/swift-ai-sdk", from: "0.17.5")
],
targets: [
    .target(
        name: "YourTarget",
        dependencies: [
            .product(name: "SwiftAISDK", package: "swift-ai-sdk"),
            .product(name: "KlingAIProvider", package: "swift-ai-sdk")
        ]
    )
]

You can import the default provider instance klingai from KlingAIProvider:

import SwiftAISDK
import KlingAIProvider
let model = klingai.video("kling-v2.6-t2v")

If you need a customized setup, you can use createKlingAI and create a provider instance with your settings:

import KlingAIProvider
let klingai = createKlingAI(settings: KlingAIProviderSettings(
    accessKey: "your-access-key", // optional, defaults to KLINGAI_ACCESS_KEY
    secretKey: "your-secret-key"  // optional, defaults to KLINGAI_SECRET_KEY
))

You can use the following optional settings to customize the Kling AI provider instance:

  • accessKey String

    Kling AI access key. Defaults to the KLINGAI_ACCESS_KEY environment variable.

  • secretKey String

    Kling AI secret key. Defaults to the KLINGAI_SECRET_KEY environment variable.

  • baseURL String

    Use a different URL prefix for API calls, e.g. to use proxy servers. The default prefix is https://api-singapore.klingai.com.

  • headers [String: String]

    Custom headers to include in the requests.

  • fetch FetchFunction

    Custom fetch implementation. You can use it as middleware to intercept requests, or to provide a custom fetch implementation, e.g. for testing.
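
For example, a provider instance routed through a proxy could combine several of these settings. This is a sketch: the proxy URL, header name, and values below are placeholders, not part of the SDK.

```swift
import KlingAIProvider

// Hypothetical setup: the proxy URL and header values are placeholders.
let proxiedKlingAI = createKlingAI(settings: KlingAIProviderSettings(
    baseURL: "https://my-proxy.example.com",
    headers: ["X-Request-Source": "my-app"]
))

let model = proxiedKlingAI.video("kling-v2.6-t2v")
```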

You can create Kling AI video models using the .video() factory method. For more on video generation with the Swift AI SDK see Video Generation.

This provider currently supports three video generation modes: text-to-video, image-to-video, and motion control.

Generate videos from text prompts:

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-t2v"),
    prompt: "A chicken flying into the sunset in the style of 90s anime.",
    aspectRatio: "16:9",
    duration: 5,
    providerOptions: ["klingai": [
        "mode": "std"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

Generate videos from a start frame image with an optional text prompt. The popular start+end frame feature is available via the imageTail option:

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-i2v"),
    prompt: .imageToVideo(
        image: .string("https://example.com/start-frame.png"),
        text: "The cat slowly turns its head and blinks"
    ),
    duration: 5,
    providerOptions: ["klingai": [
        // Pro mode required for start+end frame control (most models)
        "mode": "pro",
        // Optional: end frame image
        "imageTail": "https://example.com/end-frame.png"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

Generate videos with multiple storyboard shots, each with its own prompt and duration (Kling v3.0+):

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v3.0-t2v"),
    prompt: "",
    aspectRatio: "16:9",
    duration: 10,
    providerOptions: ["klingai": [
        "mode": "pro",
        "multiShot": true,
        "shotType": "customize",
        "multiPrompt": [
            [
                "index": 1,
                "prompt": "A sunrise over a calm ocean, warm golden light.",
                "duration": "4"
            ],
            [
                "index": 2,
                "prompt": "A flock of seagulls take flight from the beach.",
                "duration": "3"
            ],
            [
                "index": 3,
                "prompt": "Waves crash against rocky cliffs at sunset.",
                "duration": "3"
            ]
        ],
        "sound": "on"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

Multi-shot also works with image-to-video by combining a start frame image with per-shot prompts.
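
As a sketch of that combination, the call shape mirrors the examples above; the start-frame URL, shot prompts, and durations here are illustrative only:

```swift
import SwiftAISDK
import KlingAIProvider
import Foundation

// Sketch: multi-shot image-to-video, combining a start frame with
// per-shot prompts. The URL and prompts are placeholders.
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v3.0-i2v"),
    prompt: .imageToVideo(
        image: .string("https://example.com/start-frame.png"),
        text: "A quiet harbor at dawn"
    ),
    duration: 10,
    providerOptions: ["klingai": [
        "mode": "pro",
        "multiShot": true,
        "shotType": "customize",
        "multiPrompt": [
            ["index": 1, "prompt": "The camera pans across moored boats.", "duration": "5"],
            ["index": 2, "prompt": "Sunlight breaks over the water.", "duration": "5"]
        ]
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))
```

Note that the per-shot durations ("5" + "5") sum to the total duration of 10, as required for customized multi-shot generation.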

Generate video by transferring motion from a reference video to a character image:

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-motion-control"),
    prompt: .imageToVideo(
        image: .string("https://example.com/character.png"),
        text: "The character performs a smooth dance move"
    ),
    providerOptions: ["klingai": [
        "videoUrl": "https://example.com/reference-motion.mp4",
        "characterOrientation": "image",
        "mode": "std"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

The following provider options are available via providerOptions["klingai"]. Options vary by mode — see the KlingAI Capability Map for per-model support.

  • mode ‘std’ | ‘pro’

    Video generation mode. 'std' is cost-effective. 'pro' produces higher quality but takes longer.

  • pollIntervalMs number

    Polling interval in milliseconds for checking task status. Defaults to 5000.

  • pollTimeoutMs number

    Maximum wait time in milliseconds for video generation. Defaults to 600000 (10 minutes).

  • negativePrompt string

    A description of what to avoid in the generated video (max 2500 characters).

  • sound ‘on’ | ‘off’

    Whether to generate audio simultaneously. Supported only by V2.6 and later models; requires mode: 'pro'.

  • cfgScale number

    Flexibility in video generation. Higher values mean stronger prompt adherence. Range: [0, 1]. Not supported by V2.x models.

  • cameraControl object

    Camera movement control with a type preset ('simple', 'down_back', 'forward_up', 'right_turn_forward', 'left_turn_forward') and optional config with horizontal, vertical, pan, tilt, roll, zoom values (range: [-10, 10]).

  • multiShot boolean

    Enable multi-shot video generation (Kling v3.0+). When true, the video is split into up to 6 storyboard shots with individual prompts and durations.

  • shotType ‘customize’ | ‘intelligence’

    Storyboard method for multi-shot generation. 'customize' uses multiPrompt for user-defined shots. 'intelligence' lets the model auto-segment based on the main prompt. Required when multiShot is true.

  • multiPrompt Array<{index, prompt, duration}>

    Per-shot details for multi-shot generation. Each shot has an index (number), prompt (string, max 512 characters), and duration (string, in seconds). Shot durations must sum to the total duration. Required when multiShot is true and shotType is 'customize'.

  • voiceList Array<{voice_id: string}>

    Voice references for voice control (Kling v3.0+). Up to 2 voices. Reference via <<<voice_1>>> template syntax in the prompt. Requires sound: 'on'. Cannot coexist with elementList on the I2V endpoint.

  • imageTail string

    End frame image for start+end frame control. Accepts an image URL or raw base64-encoded data. Requires mode: 'pro' for most models.

  • staticMask string

    Static brush mask image for motion brush. Accepts an image URL or raw base64-encoded data.

  • dynamicMasks Array

    Dynamic brush configurations for motion brush. Up to 6 groups, each with a mask (image URL or base64) and trajectories (array of {x, y} coordinates).

  • elementList Array<{element_id: number}>

    Reference elements for element control (Kling v3.0+ I2V). Supports video character elements and multi-image elements. Up to 3 reference elements. Cannot coexist with voiceList.

  • videoUrl string (motion control, required)

    URL of the reference motion video. Supports .mp4/.mov, max 100MB, duration 3–30 seconds.

  • characterOrientation ‘image’ | ‘video’ (motion control, required)

    Orientation of the characters in the generated video. 'image' matches the reference image orientation (max 10s video). 'video' matches the reference video orientation (max 30s video).

  • keepOriginalSound ‘yes’ | ‘no’

    Whether to keep the original sound from the reference video. Defaults to 'yes'.

  • watermarkEnabled boolean

    Whether to generate watermarked results simultaneously.
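
For instance, a call that tunes the polling behavior and adds a negative prompt might look like this sketch (the prompt and values are illustrative):

```swift
import SwiftAISDK
import KlingAIProvider
import Foundation

// Sketch: slower polling, a longer timeout, and a negative prompt.
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-t2v"),
    prompt: "A time-lapse of clouds drifting over a mountain ridge.",
    duration: 5,
    providerOptions: ["klingai": [
        "mode": "std",
        "negativePrompt": "blur, distortion, text overlays",
        "pollIntervalMs": 10000,  // check task status every 10 seconds
        "pollTimeoutMs": 900000   // give up after 15 minutes
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))
```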

Text-to-Video Models

Model                   Description
kling-v3.0-t2v          Latest v3.0, multi-shot, voice control, sound (3-15s)
kling-v2.6-t2v          V2.6, sound in pro mode
kling-v2.5-turbo-t2v    Optimized for speed, std and pro
kling-v2.1-master-t2v   High-quality generation, pro only
kling-v2-master-t2v     Master-quality generation
kling-v1.6-t2v          V1.6 generation, std and pro
kling-v1-t2v            Original V1 model, supports camera control (std)

Image-to-Video Models

Model                   Description
kling-v3.0-i2v          Latest v3.0, multi-shot, element/voice control, sound (3-15s)
kling-v2.6-i2v          V2.6, sound and end-frame in pro mode
kling-v2.5-turbo-i2v    Optimized for speed, end-frame in pro
kling-v2.1-master-i2v   High-quality generation, pro only
kling-v2.1-i2v          V2.1 generation, end-frame in pro
kling-v2-master-i2v     Master-quality generation
kling-v1.6-i2v          V1.6 generation, end-frame in pro
kling-v1.5-i2v          V1.5 generation, end-frame and motion brush in pro
kling-v1-i2v            Original V1 model, end-frame and motion brush in std/pro

Motion Control Models

Model                        Description
kling-v2.6-motion-control    Transfers motion from a reference video to a character image