Kling AI

This page adapts the original AI SDK documentation: Kling AI.

The Kling AI provider contains support for Kling AI’s video generation models, including text-to-video, image-to-video, motion control, and multi-shot video generation.

The Kling AI provider is available in the KlingAIProvider module. Add it to your Swift package:

// Package.swift (excerpt)
dependencies: [
    .package(url: "https://github.com/teunlao/swift-ai-sdk", from: "0.17.5")
],
targets: [
    .target(
        name: "YourTarget",
        dependencies: [
            .product(name: "SwiftAISDK", package: "swift-ai-sdk"),
            .product(name: "KlingAIProvider", package: "swift-ai-sdk")
        ]
    )
]

You can import the default provider instance klingai from KlingAIProvider:

import SwiftAISDK
import KlingAIProvider
let model = klingai.video("kling-v2.6-t2v")

If you need a customized setup, you can use createKlingAI and create a provider instance with your settings:

import KlingAIProvider
let klingai = createKlingAI(settings: KlingAIProviderSettings(
    accessKey: "your-access-key", // optional, defaults to KLINGAI_ACCESS_KEY
    secretKey: "your-secret-key"  // optional, defaults to KLINGAI_SECRET_KEY
))

You can use the following optional settings to customize the Kling AI provider instance:

  • accessKey String

    Kling AI access key. Defaults to the KLINGAI_ACCESS_KEY environment variable.

  • secretKey String

    Kling AI secret key. Defaults to the KLINGAI_SECRET_KEY environment variable.

  • baseURL String

    Use a different URL prefix for API calls, e.g. to use proxy servers. The default prefix is https://api-singapore.klingai.com.

  • headers [String: String]

    Custom headers to include in the requests.

  • fetch FetchFunction

    Custom fetch implementation. You can use it as middleware to intercept requests, or to provide a custom fetch implementation, e.g. for testing.
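
For example, a provider instance routed through a proxy could combine several of these settings. This is a sketch: the proxy URL, header name, and values below are placeholders, not part of the SDK.

```swift
import KlingAIProvider

// Hypothetical setup: the proxy URL and header values are placeholders.
let proxiedKlingAI = createKlingAI(settings: KlingAIProviderSettings(
    baseURL: "https://my-proxy.example.com",
    headers: ["X-Request-Source": "my-app"]
))

let model = proxiedKlingAI.video("kling-v2.6-t2v")
```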

You can create Kling AI video models using the .video() factory method. For more on video generation with the Swift AI SDK see Video Generation.

This provider currently supports three video generation modes: text-to-video, image-to-video, and motion control.

Generate videos from text prompts:

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-t2v"),
    prompt: "A chicken flying into the sunset in the style of 90s anime.",
    aspectRatio: "16:9",
    duration: 5,
    providerOptions: ["klingai": [
        "mode": "std"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

Generate videos from a start frame image with an optional text prompt. The popular start+end frame feature is available via the imageTail option:

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-i2v"),
    prompt: .imageToVideo(
        image: .string("https://example.com/start-frame.png"),
        text: "The cat slowly turns its head and blinks"
    ),
    duration: 5,
    providerOptions: ["klingai": [
        // Pro mode required for start+end frame control (most models)
        "mode": "pro",
        // Optional: end frame image
        "imageTail": "https://example.com/end-frame.png"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

Generate videos with multiple storyboard shots, each with its own prompt and duration (Kling v3.0+):

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v3.0-t2v"),
    prompt: "",
    aspectRatio: "16:9",
    duration: 10,
    providerOptions: ["klingai": [
        "mode": "pro",
        "multiShot": true,
        "shotType": "customize",
        "multiPrompt": [
            [
                "index": 1,
                "prompt": "A sunrise over a calm ocean, warm golden light.",
                "duration": "4"
            ],
            [
                "index": 2,
                "prompt": "A flock of seagulls take flight from the beach.",
                "duration": "3"
            ],
            [
                "index": 3,
                "prompt": "Waves crash against rocky cliffs at sunset.",
                "duration": "3"
            ]
        ],
        "sound": "on"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

Multi-shot also works with image-to-video by combining a start frame image with per-shot prompts.
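
As a sketch of that combination, the call shape mirrors the examples above; the start-frame URL, shot prompts, and durations here are illustrative only:

```swift
import SwiftAISDK
import KlingAIProvider
import Foundation

// Sketch: multi-shot image-to-video, combining a start frame with
// per-shot prompts. The URL and prompts are placeholders.
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v3.0-i2v"),
    prompt: .imageToVideo(
        image: .string("https://example.com/start-frame.png"),
        text: "A quiet harbor at dawn"
    ),
    duration: 10,
    providerOptions: ["klingai": [
        "mode": "pro",
        "multiShot": true,
        "shotType": "customize",
        "multiPrompt": [
            ["index": 1, "prompt": "The camera pans across moored boats.", "duration": "5"],
            ["index": 2, "prompt": "Sunlight breaks over the water.", "duration": "5"]
        ]
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))
```

Note that the per-shot durations ("5" + "5") sum to the total duration of 10, as required for customized multi-shot generation.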

Generate video by transferring motion from a reference video to a character image:

import SwiftAISDK
import KlingAIProvider
import Foundation
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-motion-control"),
    prompt: .imageToVideo(
        image: .string("https://example.com/character.png"),
        text: "The character performs a smooth dance move"
    ),
    providerOptions: ["klingai": [
        "videoUrl": "https://example.com/reference-motion.mp4",
        "characterOrientation": "image",
        "mode": "std"
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))

The following provider options are available via providerOptions["klingai"]. Options vary by mode — see the KlingAI Capability Map for per-model support.

  • mode ‘std’ | ‘pro’

    Video generation mode. 'std' is cost-effective. 'pro' produces higher quality but takes longer.

  • pollIntervalMs number

    Polling interval in milliseconds for checking task status. Defaults to 5000.

  • pollTimeoutMs number

    Maximum wait time in milliseconds for video generation. Defaults to 600000 (10 minutes).

  • negativePrompt string

    A description of what to avoid in the generated video (max 2500 characters).

  • sound ‘on’ | ‘off’

    Whether to generate audio simultaneously. Supported only by V2.6 and later models; requires mode: 'pro'.

  • cfgScale number

    Flexibility in video generation. Higher values mean stronger prompt adherence. Range: [0, 1]. Not supported by V2.x models.

  • cameraControl object

    Camera movement control with a type preset ('simple', 'down_back', 'forward_up', 'right_turn_forward', 'left_turn_forward') and optional config with horizontal, vertical, pan, tilt, roll, zoom values (range: [-10, 10]).

  • multiShot boolean

    Enable multi-shot video generation (Kling v3.0+). When true, the video is split into up to 6 storyboard shots with individual prompts and durations.

  • shotType ‘customize’ | ‘intelligence’

    Storyboard method for multi-shot generation. 'customize' uses multiPrompt for user-defined shots. 'intelligence' lets the model auto-segment based on the main prompt. Required when multiShot is true.

  • multiPrompt Array<{index, prompt, duration}>

    Per-shot details for multi-shot generation. Each shot has an index (number), prompt (string, max 512 characters), and duration (string, in seconds). Shot durations must sum to the total duration. Required when multiShot is true and shotType is 'customize'.

  • voiceList Array<{voice_id: string}>

    Voice references for voice control (Kling v3.0+). Up to 2 voices. Reference via <<<voice_1>>> template syntax in the prompt. Requires sound: 'on'. Cannot coexist with elementList on the I2V endpoint.

  • imageTail string

    End frame image for start+end frame control. Accepts an image URL or raw base64-encoded data. Requires mode: 'pro' for most models.

  • staticMask string

    Static brush mask image for motion brush. Accepts an image URL or raw base64-encoded data.

  • dynamicMasks Array

    Dynamic brush configurations for motion brush. Up to 6 groups, each with a mask (image URL or base64) and trajectories (array of {x, y} coordinates).

  • elementList Array<{element_id: number}>

    Reference elements for element control (Kling v3.0+ I2V). Supports video character elements and multi-image elements. Up to 3 reference elements. Cannot coexist with voiceList.

  • videoUrl string (motion control, required)

    URL of the reference motion video. Supports .mp4/.mov, max 100MB, duration 3–30 seconds.

  • characterOrientation ‘image’ | ‘video’ (motion control, required)

    Orientation of the characters in the generated video. 'image' matches the reference image orientation (max 10s video). 'video' matches the reference video orientation (max 30s video).

  • keepOriginalSound ‘yes’ | ‘no’

    Whether to keep the original sound from the reference video. Defaults to 'yes'.

  • watermarkEnabled boolean

    Whether to generate watermarked results simultaneously.
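
For instance, a call that tunes the polling behavior and adds a negative prompt might look like this sketch (the prompt and values are illustrative):

```swift
import SwiftAISDK
import KlingAIProvider
import Foundation

// Sketch: slower polling, a longer timeout, and a negative prompt.
let result = try await experimental_generateVideo(
    model: klingai.video("kling-v2.6-t2v"),
    prompt: "A time-lapse of clouds drifting over a mountain ridge.",
    duration: 5,
    providerOptions: ["klingai": [
        "mode": "std",
        "negativePrompt": "blur, distortion, text overlays",
        "pollIntervalMs": 10000,  // check task status every 10 seconds
        "pollTimeoutMs": 900000   // give up after 15 minutes
    ]]
)
try result.video.data.write(to: URL(fileURLWithPath: "video.mp4"))
```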

Text-to-Video Models

Model                   Description
kling-v3.0-t2v          Latest v3.0, multi-shot, voice control, sound (3-15s)
kling-v2.6-t2v          V2.6, sound in pro mode
kling-v2.5-turbo-t2v    Optimized for speed, std and pro
kling-v2.1-master-t2v   High-quality generation, pro only
kling-v2-master-t2v     Master-quality generation
kling-v1.6-t2v          V1.6 generation, std and pro
kling-v1-t2v            Original V1 model, supports camera control (std)

Image-to-Video Models

Model                   Description
kling-v3.0-i2v          Latest v3.0, multi-shot, element/voice control, sound (3-15s)
kling-v2.6-i2v          V2.6, sound and end-frame in pro mode
kling-v2.5-turbo-i2v    Optimized for speed, end-frame in pro
kling-v2.1-master-i2v   High-quality generation, pro only
kling-v2.1-i2v          V2.1 generation, end-frame in pro
kling-v2-master-i2v     Master-quality generation
kling-v1.6-i2v          V1.6 generation, end-frame in pro
kling-v1.5-i2v          V1.5 generation, end-frame and motion brush in pro
kling-v1-i2v            Original V1 model, end-frame and motion brush in std/pro

Motion Control Models

Model                        Description
kling-v2.6-motion-control    Transfers motion from a reference video to a character image