Google Generative AI
The Google Generative AI provider contains language model, embedding model, and image model support for the Google Generative AI APIs.
The Google Generative AI provider is available in the GoogleProvider module. Add it to your Swift package:
dependencies: [
    .package(url: "https://github.com/teunlao/swift-ai-sdk", from: "1.0.0")
],
targets: [
    .target(
        name: "YourTarget",
        dependencies: [
            .product(name: "SwiftAISDK", package: "swift-ai-sdk"),
            .product(name: "GoogleProvider", package: "swift-ai-sdk")
        ]
    )
]

Provider Instance

You can import the default provider instance google from GoogleProvider:
import GoogleProvider
let google = Google()

If you need a customized setup, you can import createGoogleGenerativeAI and create a provider instance with your settings:
import GoogleProvider
let google = createGoogleGenerativeAI(
    // custom settings
)

You can use the following optional settings to customize the Google Generative AI provider instance:
- baseURL string
  Use a different URL prefix for API calls, e.g. to use proxy servers. The default prefix is https://generativelanguage.googleapis.com/v1beta.
- apiKey string
  API key that is sent using the x-goog-api-key header. It defaults to the GOOGLE_GENERATIVE_AI_API_KEY environment variable.
- headers Record<string, string>
  Custom headers to include in the requests.
- fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>
  Custom fetch implementation. Defaults to the global fetch function. You can use it as a middleware to intercept requests, or to provide a custom fetch implementation, e.g. for testing.
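For illustration, a customized instance could be configured along these lines. This is only a sketch: the option names come from the list above, but the exact Swift signature of createGoogleGenerativeAI (labeled arguments vs. a settings struct) is an assumption.

import Foundation
import GoogleProvider

// Sketch only: assumes createGoogleGenerativeAI accepts the options
// listed above as labeled arguments; adjust to the real initializer.
let google = createGoogleGenerativeAI(
    baseURL: "https://my-proxy.example.com/v1beta", // route calls through a proxy
    apiKey: ProcessInfo.processInfo.environment["GOOGLE_GENERATIVE_AI_API_KEY"],
    headers: ["x-request-source": "docs-example"]   // extra headers sent with every request
)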
Language Models
You can create models that call the Google Generative AI API using the provider instance.
The first argument is the model id, e.g. gemini-2.5-flash.
The models support tool calls and some have multi-modal capabilities.
let model = google("gemini-2.5-flash")

You can use Google Generative AI language models to generate text with the generateText function:
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-flash"),
    prompt: "Write a vegetarian lasagna recipe for 4 people."
)
let text = result.text

Google Generative AI language models can also be used in the streamText, generateObject, and streamObject functions (see AI SDK Core).
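For instance, streaming could look roughly like this. The sketch assumes streamText takes the same model and prompt arguments as generateText and exposes the partial output as an async textStream; the exact property names may differ in this SDK.

import SwiftAISDK
import GoogleProvider

// Sketch: stream the response incrementally (assumes a `textStream`
// async sequence on the streamText result).
let stream = try await streamText(
    model: google("gemini-2.5-flash"),
    prompt: "Write a vegetarian lasagna recipe for 4 people."
)

for try await delta in stream.textStream {
    print(delta, terminator: "")
}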
Google Generative AI also supports some model-specific settings that are not part of the standard call settings. You can pass them as a provider options argument:
let model = google("gemini-2.5-flash")
try await generateText(
    model: model,
    providerOptions: [
        "google": [
            "safetySettings": [
                [
                    "category": "HARM_CATEGORY_UNSPECIFIED",
                    "threshold": "BLOCK_LOW_AND_ABOVE"
                ]
            ]
        ]
    ]
)

The following optional provider options are available for Google Generative AI models:
- cachedContent string
  Optional. The name of the cached content used as context to serve the prediction. Format: cachedContents/{cachedContent}
- structuredOutputs boolean
  Optional. Enable structured output. Default is true.
  This is useful when the JSON Schema contains elements that are not supported by the OpenAPI schema version that Google Generative AI uses. You can use this to disable structured outputs if you need to.
  See Troubleshooting: Schema Limitations for more details.
- safetySettings Array<{ category: string; threshold: string }>
  Optional. Safety settings for the model.
  - category string
    The category of the safety setting. Can be one of the following:
    - HARM_CATEGORY_HATE_SPEECH
    - HARM_CATEGORY_DANGEROUS_CONTENT
    - HARM_CATEGORY_HARASSMENT
    - HARM_CATEGORY_SEXUALLY_EXPLICIT
  - threshold string
    The threshold of the safety setting. Can be one of the following:
    - HARM_BLOCK_THRESHOLD_UNSPECIFIED
    - BLOCK_LOW_AND_ABOVE
    - BLOCK_MEDIUM_AND_ABOVE
    - BLOCK_ONLY_HIGH
    - BLOCK_NONE
- responseModalities string[]
  The modalities to use for the response. The following modalities are supported: TEXT, IMAGE. When not defined or empty, the model defaults to returning only text. (See the sketch after this list.)
- thinkingConfig { thinkingBudget: number; includeThoughts: boolean }
  Optional. Configuration for the model's thinking process. Only supported by specific Google Generative AI models.
  - thinkingBudget number
    Optional. Gives the model guidance on the number of thinking tokens it can use when generating a response. Setting it to 0 disables thinking, if the model supports it. For more information about the possible value ranges for each model see the Google Generative AI thinking documentation.
  - includeThoughts boolean
    Optional. If set to true, thought summaries are returned, which are synthesized versions of the model's raw thoughts and offer insights into the model's internal reasoning process.
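As referenced above, here is a rough sketch of the responseModalities option. It assumes the option is passed through the same providerOptions dictionary as the other Google options, and it reuses the gemini-2.5-flash-image-preview model and the result.files loop from the Image Outputs section below.

import SwiftAISDK
import GoogleProvider

// Sketch: request both text and image output via provider options.
let result = try await generateText(
    model: google("gemini-2.5-flash-image-preview"),
    providerOptions: [
        "google": [
            "responseModalities": ["TEXT", "IMAGE"]
        ]
    ],
    prompt: "Sketch a small lighthouse on a cliff and describe it briefly."
)

for file in result.files where file.mediaType.starts(with: "image/") {
    print("Generated image:", file)
}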
Thinking
The Gemini 2.5 series models use an internal “thinking process” that significantly improves their reasoning and multi-step planning abilities, making them highly effective for complex tasks such as coding, advanced mathematics, and data analysis. For more information see the Google Generative AI thinking documentation.
You can control thinking budgets and enable a thought summary by setting the thinkingConfig parameter.
import SwiftAISDK
import GoogleProvider
let model = google("gemini-2.5-flash")
let result = try await generateText(
    model: model,
    prompt: "What is the sum of the first 10 prime numbers?",
    providerOptions: [
        "google": [
            "thinkingConfig": [
                "thinkingBudget": 8192,
                "includeThoughts": true
            ]
        ]
    ]
)
print(result.text)
print(result.reasoning) // Reasoning summary

File Inputs
The Google Generative AI provider supports file inputs, e.g. PDF files.
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-flash"),
    messages: [
        [
            "role": "user",
            "content": [
                [
                    "type": "text",
                    "text": "What is an embedding model according to this document?"
                ],
                [
                    "type": "file",
                    "data": try Data(contentsOf: URL(fileURLWithPath: "./data/ai.pdf")),
                    "mediaType": "application/pdf"
                ]
            ]
        ]
    ]
)

You can also use YouTube URLs directly:
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-flash"),
    messages: [
        [
            "role": "user",
            "content": [
                [
                    "type": "text",
                    "text": "Summarize this video"
                ],
                [
                    "type": "file",
                    "data": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
                    "mediaType": "video/mp4"
                ]
            ]
        ]
    ]
)

See File Parts for details on how to use files in prompts.
Cached Content
Google Generative AI supports both explicit and implicit caching to help reduce costs on repetitive content.
Implicit Caching
Gemini 2.5 models automatically provide cache cost savings without needing to create an explicit cache. When you send requests that share common prefixes with previous requests, you’ll receive a 75% token discount on cached content.
To maximize cache hits with implicit caching:
- Keep content at the beginning of requests consistent
- Add variable content (like user questions) at the end of prompts
- Ensure requests meet minimum token requirements:
- Gemini 2.5 Flash: 1024 tokens minimum
- Gemini 2.5 Pro: 2048 tokens minimum
import SwiftAISDK
import GoogleProvider
// Structure prompts with consistent content at the beginning
let baseContext = "You are a cooking assistant with expertise in Italian cuisine. Here are 1000 lasagna recipes for reference..."
let veggieLasagna = try await generateText(
    model: google("gemini-2.5-pro"),
    prompt: "\(baseContext)\n\nWrite a vegetarian lasagna recipe for 4 people."
)
// Second request with same prefix - eligible for cache hit
let meatLasagnaResult = try await generateText(
    model: google("gemini-2.5-pro"),
    prompt: "\(baseContext)\n\nWrite a meat lasagna recipe for 12 people."
)
// Check cached token count in usage metadata
print("Cached tokens:", meatLasagnaResult.providerMetadata?.google?.usageMetadata as Any)
// e.g.
// {
//   groundingMetadata: null,
//   safetyRatings: null,
//   usageMetadata: {
//     cachedContentTokenCount: 2027,
//     thoughtsTokenCount: 702,
//     promptTokenCount: 2152,
//     candidatesTokenCount: 710,
//     totalTokenCount: 3564
//   }
// }

Explicit Caching
For guaranteed cost savings, you can still use explicit caching with Gemini 2.5 and 2.0 models. See the models page to check if caching is supported for the used model:
import SwiftAISDK
import GoogleProvider
let cacheManager = GoogleAICacheManager(
    apiKey: ProcessInfo.processInfo.environment["GOOGLE_GENERATIVE_AI_API_KEY"]!
)
let model = "gemini-2.5-pro"
let cache = try await cacheManager.create(
    model: model,
    contents: [
        [
            "role": "user",
            "parts": [["text": "1000 Lasagna Recipes..."]]
        ]
    ],
    ttlSeconds: 60 * 5
)
let cachedContent = cache.name
let veggieLasagnaRecipe = try await generateText(
    model: google(model),
    prompt: "Write a vegetarian lasagna recipe for 4 people.",
    providerOptions: [
        "google": [
            "cachedContent": cachedContent
        ]
    ]
)
let meatLasagnaRecipe = try await generateText(
    model: google(model),
    prompt: "Write a meat lasagna recipe for 12 people.",
    providerOptions: [
        "google": [
            "cachedContent": cachedContent
        ]
    ]
)

Code Execution
With Code Execution, certain models can generate and execute Python code to perform calculations, solve problems, or provide more accurate information.
You can enable code execution by adding the code_execution tool to your request.
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-pro"),
    tools: ["code_execution": google.tools.codeExecution()],
    prompt: "Use python to calculate the 20th fibonacci number."
)
let text = result.text
let toolCalls = result.toolCalls
let toolResults = result.toolResults

The response will contain the tool calls and results from the code execution.
Google Search
With search grounding, the model has access to the latest information using Google Search. Google Search can be used to provide answers around current events:
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-flash"),
    tools: [
        "google_search": google.tools.googleSearch()
    ],
    prompt: "List the top 5 San Francisco news from the past week. You must include the date of each article."
)
let text = result.text
let sources = result.sources
// access the grounding metadata
let metadata = result.providerMetadata?.google
let groundingMetadata = metadata?.groundingMetadata
let safetyRatings = metadata?.safetyRatings

When Search Grounding is enabled, the model will include sources in the response.
Additionally, the grounding metadata includes detailed information about how search results were used to ground the model’s response. Here are the available fields:
- webSearchQueries (string[] | null)
  Array of search queries used to retrieve information.
  Example: ["What's the weather in Chicago this weekend?"]
- searchEntryPoint ({ renderedContent: string } | null)
  Contains the main search result content used as an entry point.
  The renderedContent field contains the formatted content.
- groundingSupports (Array of support objects | null)
  Contains details about how specific response parts are supported by search results.
  Each support object includes:
  - segment: Information about the grounded text segment
    - text: The actual text segment
    - startIndex: Starting position in the response
    - endIndex: Ending position in the response
  - groundingChunkIndices: References to supporting search result chunks
  - confidenceScores: Confidence scores (0-1) for each supporting chunk
Example response:
{ "groundingMetadata": { "webSearchQueries": ["What's the weather in Chicago this weekend?"], "searchEntryPoint": { "renderedContent": "..." }, "groundingSupports": [ { "segment": { "startIndex": 0, "endIndex": 65, "text": "Chicago weather changes rapidly, so layers let you adjust easily." }, "groundingChunkIndices": [0], "confidenceScores": [0.99] } ] }}URL Context
Google provides a provider-defined URL context tool.
The URL context tool allows you to provide specific URLs that you want the model to analyze directly from the prompt.
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-flash"),
    prompt: "Based on the document: https://ai.google.dev/gemini-api/docs/url-context. Answer this question: How many links we can consume in one request?",
    tools: [
        "url_context": google.tools.urlContext()
    ]
)
let text = result.text
let sources = result.sources
let metadata = result.providerMetadata?.google
let groundingMetadata = metadata?.groundingMetadata
let urlContextMetadata = metadata?.urlContextMetadata

The URL context metadata includes detailed information about how the model used the URL context to generate the response. Here are the available fields:
- urlMetadata ({ retrievedUrl: string; urlRetrievalStatus: string; }[] | null)
  Array of URL context metadata.
  Each object includes:
  - retrievedUrl: The URL of the context
  - urlRetrievalStatus: The status of the URL retrieval
Example response:
{ "urlMetadata": [ { "retrievedUrl": "https://ai-sdk.dev/providers/ai-sdk-providers/google-generative-ai", "urlRetrievalStatus": "URL_RETRIEVAL_STATUS_SUCCESS" } ]}With the URL context tool, you will also get the groundingMetadata.
"groundingMetadata": { "groundingChunks": [ { "web": { "uri": "https://ai-sdk.dev/providers/ai-sdk-providers/google-generative-ai", "title": "Google Generative AI - AI SDK Providers" } } ], "groundingSupports": [ { "segment": { "startIndex": 67, "endIndex": 157, "text": "**Installation**: Install the `@ai-sdk/google` module using your preferred package manager" }, "groundingChunkIndices": [ 0 ] }, ]}Combine URL Context with Search Grounding
You can combine the URL context tool with search grounding to provide the model with the latest information from the web.
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-flash"),
    prompt: "Based on this context: https://ai-sdk.dev/providers/ai-sdk-providers/google-generative-ai, tell me how to use Gemini with AI SDK. Also, provide the latest news about AI SDK V5.",
    tools: [
        "google_search": google.tools.googleSearch(),
        "url_context": google.tools.urlContext()
    ]
)
let text = result.text
let sources = result.sources
let metadata = result.providerMetadata?.google
let groundingMetadata = metadata?.groundingMetadata
let urlContextMetadata = metadata?.urlContextMetadata

Image Outputs
Gemini models with image generation capabilities (e.g. gemini-2.5-flash-image-preview) can generate images. Images are exposed as files in the response.
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemini-2.5-flash-image-preview"),
    prompt: "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"
)
for file in result.files {
    if file.mediaType.starts(with: "image/") {
        print("Generated image:", file)
    }
}

Safety Ratings
The safety ratings provide insight into the safety of the model’s response. See the Google AI documentation on safety settings.
Example response excerpt:
{ "safetyRatings": [ { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE", "probabilityScore": 0.11027937, "severity": "HARM_SEVERITY_LOW", "severityScore": 0.28487435 }, { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "HIGH", "blocked": true, "probabilityScore": 0.95422274, "severity": "HARM_SEVERITY_MEDIUM", "severityScore": 0.43398145 }, { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE", "probabilityScore": 0.11085559, "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.19027223 }, { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE", "probabilityScore": 0.22901751, "severity": "HARM_SEVERITY_NEGLIGIBLE", "severityScore": 0.09089675 } ]}Troubleshooting
Schema Limitations
The Google Generative AI API uses a subset of the OpenAPI 3.0 schema, which does not support features such as unions. The errors that you get in this case look like this:
GenerateContentRequest.generation_config.response_schema.properties[occupation].type: must be specified
By default, structured outputs are enabled (and for tool calling they are required). You can disable structured outputs for object generation as a workaround:
let result = try await generateObject(
    model: google("gemini-2.5-flash"),
    providerOptions: [
        "google": [
            "structuredOutputs": false
        ]
    ],
    schema: FlexibleSchema(jsonSchema(
        .object([
            "type": .string("object"),
            "properties": .object([
                "name": .object(["type": .string("string")]),
                "age": .object(["type": .string("number")]),
                "contact": .object([
                    "type": .string("object"),
                    "oneOf": .array([
                        .object([
                            "properties": .object([
                                "type": .object(["const": .string("email")]),
                                "value": .object(["type": .string("string")])
                            ])
                        ]),
                        .object([
                            "properties": .object([
                                "type": .object(["const": .string("phone")]),
                                "value": .object(["type": .string("string")])
                            ])
                        ])
                    ])
                ])
            ])
        ])
    )),
    prompt: "Generate an example person for testing."
)
let object = result.object

The following Zod features are known to not work with Google Generative AI:

- z.union
- z.record
Model Capabilities
| Model | Image Input | Object Generation | Tool Usage | Tool Streaming | Google Search | URL Context |
|---|---|---|---|---|---|---|
| gemini-2.5-pro | | | | | | |
| gemini-2.5-flash | | | | | | |
| gemini-2.5-flash-lite | | | | | | |
| gemini-2.5-flash-lite-preview-06-17 | | | | | | |
| gemini-2.0-flash | | | | | | |
| gemini-1.5-pro | | | | | | |
| gemini-1.5-pro-latest | | | | | | |
| gemini-1.5-flash | | | | | | |
| gemini-1.5-flash-latest | | | | | | |
| gemini-1.5-flash-8b | | | | | | |
| gemini-1.5-flash-8b-latest | | | | | | |
Gemma Models
You can use Gemma models with the Google Generative AI API.
Gemma models don’t natively support the systemInstruction parameter, but the provider automatically handles system instructions by prepending them to the first user message. This allows you to use system instructions with Gemma models seamlessly:
import SwiftAISDK
import GoogleProvider
let result = try await generateText(
    model: google("gemma-3-27b-it"),
    system: "You are a helpful assistant that responds concisely.",
    prompt: "What is machine learning?"
)
let text = result.text

The system instruction is automatically formatted and included in the conversation, so Gemma models can follow the guidance without any additional configuration.
Embedding Models
You can create models that call the Google Generative AI embeddings API using the .textEmbedding() factory method.
let model = google.textEmbedding("gemini-embedding-001")

The Google Generative AI provider sends API calls to the right endpoint based on the type of embedding:
- Single embeddings: When embedding a single value with embed(), the provider uses the single :embedContent endpoint, which typically has higher rate limits compared to the batch endpoint.
- Batch embeddings: When embedding multiple values with embedMany() or multiple values in embed(), the provider uses the :batchEmbedContents endpoint (see the sketch below).
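As mentioned in the list above, a batch call could look roughly like this. The sketch assumes embedMany mirrors embed, takes a values array, and returns an embeddings array.

import SwiftAISDK
import GoogleProvider

// Sketch: embedding several values in one call, which uses the
// :batchEmbedContents endpoint (assumed embedMany signature).
let result = try await embedMany(
    model: google.textEmbedding("gemini-embedding-001"),
    values: [
        "sunny day at the beach",
        "rainy afternoon in the city",
        "snowy night in the mountains"
    ]
)
let embeddings = result.embeddings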
Google Generative AI embedding models support additional settings. You can pass them as a provider options argument:
import SwiftAISDK
import GoogleProvider
let model = google.textEmbedding("gemini-embedding-001")
let result = try await embed(
    model: model,
    value: "sunny day at the beach",
    providerOptions: [
        "google": [
            "outputDimensionality": 512, // optional, number of dimensions for the embedding
            "taskType": "SEMANTIC_SIMILARITY" // optional, specifies the task type for generating embeddings
        ]
    ]
)
let embedding = result.embedding

The following optional provider options are available for Google Generative AI embedding models:
- outputDimensionality: number
  Optional reduced dimension for the output embedding. If set, excessive values in the output embedding are truncated from the end.
- taskType: string
  Optional. Specifies the task type for generating embeddings. Supported task types include:
  - SEMANTIC_SIMILARITY: Optimized for text similarity.
  - CLASSIFICATION: Optimized for text classification.
  - CLUSTERING: Optimized for clustering texts based on similarity.
  - RETRIEVAL_DOCUMENT: Optimized for document retrieval.
  - RETRIEVAL_QUERY: Optimized for query-based retrieval.
  - QUESTION_ANSWERING: Optimized for answering questions.
  - FACT_VERIFICATION: Optimized for verifying factual information.
  - CODE_RETRIEVAL_QUERY: Optimized for retrieving code blocks based on natural language queries.
Model Capabilities
| Model | Default Dimensions | Custom Dimensions |
|---|---|---|
| gemini-embedding-001 | 3072 | |
| text-embedding-004 | 768 | |
Image Models
You can create Imagen models that call the Google Generative AI API using the .image() factory method.
For more on image generation with the AI SDK see generateImage.
import SwiftAISDK
import GoogleProvider
let result = try await generateImage(
    model: google.image("imagen-3.0-generate-002"),
    prompt: "A futuristic cityscape at sunset",
    aspectRatio: "16:9"
)
let image = result.image

Further configuration can be done using Google provider options.
import SwiftAISDK
import GoogleProvider
let result = try await generateImage(
    model: google.image("imagen-3.0-generate-002"),
    providerOptions: [
        "google": [
            "personGeneration": "dont_allow"
        ]
    ]
    // ...
)
let image = result.image

The following provider options are available:
- personGeneration: allow_adult | allow_all | dont_allow
  Whether to allow person generation. Defaults to allow_adult.
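As a rough sketch, the aspect ratio argument and the personGeneration provider option from the examples above can be combined in a single call (parameter names follow the earlier snippets):

import SwiftAISDK
import GoogleProvider

// Sketch: combine aspectRatio with the personGeneration provider option.
let result = try await generateImage(
    model: google.image("imagen-3.0-generate-002"),
    prompt: "A crowded night market seen from above",
    aspectRatio: "3:4",
    providerOptions: [
        "google": [
            "personGeneration": "allow_adult"
        ]
    ]
)
let image = result.image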
Model Capabilities
| Model | Aspect Ratios |
|---|---|
| imagen-3.0-generate-002 | 1:1, 3:4, 4:3, 9:16, 16:9 |