diff --git a/src/content/docs/docs/integrations/google-genai.mdx b/src/content/docs/docs/integrations/google-genai.mdx index 86e044ee..7d03d488 100644 --- a/src/content/docs/docs/integrations/google-genai.mdx +++ b/src/content/docs/docs/integrations/google-genai.mdx @@ -952,88 +952,605 @@ TTS models automatically detect the input language. Supported languages include -The Google Generative AI plugin provides interfaces to Google's Gemini models through the Gemini API. +The Google AI plugin provides a unified interface to Google's generative AI models through the **Gemini Developer API** using API key authentication. -## Configuration +The plugin supports a wide range of capabilities: +- **Language Models**: Gemini models for text generation, reasoning, and multimodal tasks +- **Embedding Models**: Text and multimodal embeddings +- **Image Models**: Imagen for generation and Gemini for image analysis +- **Video Models**: Veo for video generation and Gemini for video understanding -To use this plugin, import the `googlegenai` package and pass -`googlegenai.GoogleAI` to `WithPlugins()` in the Genkit initializer: +## Setup -```go -import "github.com/firebase/genkit/go/plugins/googlegenai" +### Installation + +```bash +go get github.com/firebase/genkit/go/plugins/googlegenai ``` +### Configuration + ```go -g := genkit.Init(context.Background(), genkit.WithPlugins(&googlegenai.GoogleAI{})) +import "github.com/firebase/genkit/go/plugins/googlegenai" + +// ... init genkit ... +g := genkit.Init(ctx, genkit.WithPlugins(&googlegenai.GoogleAI{})) ``` -The plugin requires an API key for the Gemini API, which you can get from -[Google AI Studio](https://aistudio.google.com/app/apikey). +### Authentication -Configure the plugin to use your API key by doing one of the following: +Requires a Gemini API Key, which you can get from [Google AI Studio](https://aistudio.google.com/apikey). You can provide this key in several ways: + +1. 
**Environment variables**: Set `GEMINI_API_KEY`
+2. **Plugin configuration**: Pass `APIKey` when initializing the plugin:
-- Set the `GEMINI_API_KEY` environment variable to your API key.
+
+   ```go
+   genkit.WithPlugins(&googlegenai.GoogleAI{APIKey: "YOUR_API_KEY"})
+   ```
-- Specify the API key when you initialize the plugin:
+## Language Models
-  ```go
-  genkit.WithPlugins(&googlegenai.GoogleAI{APIKey: "YOUR_API_KEY"})
-  ```
+You can create models that call the Google Generative AI API. The models support tool calls, and some have multimodal capabilities.
-  However, don't embed your API key directly in code! Use this feature only
-  in conjunction with a service like Cloud Secret Manager or similar.
+### Available Models
-## Usage
+**Gemini 3 Series** - Latest experimental models with state-of-the-art reasoning:
+- `gemini-3-pro-preview` - Preview of the most capable model for complex tasks
+- `gemini-3-flash-preview` - Fast and intelligent model for high-volume tasks
+- `gemini-3-pro-image-preview` - Supports image generation outputs
-### Generative models
+
+**Gemini 2.5 Series** - Latest stable models with advanced reasoning and multimodal capabilities:
+- `gemini-2.5-pro` - Most capable stable model for complex tasks
+- `gemini-2.5-flash` - Fast and efficient for most use cases
+- `gemini-2.5-flash-lite` - Lightweight version for simple tasks
+- `gemini-2.5-flash-image` - Supports image generation outputs
-To get a reference to a supported model, specify its identifier to `googlegenai.GoogleAIModel`:
+
+**Gemma 3 Series** - Open models for various use cases:
+- `gemma-3-27b-it` - Large instruction-tuned model
+- `gemma-3-12b-it` - Medium instruction-tuned model
+- `gemma-3-4b-it` - Small instruction-tuned model
+- `gemma-3-1b-it` - Tiny instruction-tuned model
+- `gemma-3n-e4b-it` - Efficient model with 4B effective parameters
+
+:::note
+See the [Google Generative AI models documentation](https://ai.google.dev/gemini-api/docs/models) for a complete list of available models and their 
capabilities.
+:::
+
+### Basic Usage
 
 ```go
-model := googlegenai.GoogleAIModel(g, "gemini-2.5-flash")
+resp, err := genkit.Generate(ctx, g,
+    ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)),
+    ai.WithPrompt("Explain how neural networks learn in simple terms."),
+)
 ```
 
-Alternatively, you may create a `ModelRef` which pairs the model name with its config:
+### Model References and Configuration
+
+You can reference models in several ways. Using **`googlegenai.ModelRef`** is generally considered best practice because it provides **strong typing**, ensuring that you use the correct configuration type for the plugin:
+
+- **`googlegenai.ModelRef(name, config)`**: Creates a static reference that bakes in a specific, type-checked configuration.
+- **`ai.WithModelName(name)`**: Resolves a model by its string identifier (e.g., `"googleai/gemini-2.5-flash"`). A quick way to reference a model when no configuration is needed.
+- **`googlegenai.GoogleAIModel(g, name)`**: Returns a handle to a model registered with your Genkit instance. Use this when you want to provide configuration dynamically at the request level.
+
+You can provide configuration either when referencing the model or per-request in the `genkit.Generate` call:
 
 ```go
-modelRef := googlegenai.GoogleAIModelRef("gemini-2.5-flash", &genai.GenerateContentConfig{
+import "google.golang.org/genai"
+
+config := &genai.GenerateContentConfig{
     Temperature: genai.Ptr[float32](0.5),
-    MaxOutputTokens: genai.Ptr[int32](500),
-    // Other configuration... 
-}) +} + +// Option 1: Use a model reference with "baked-in" config +modelRef := googlegenai.ModelRef("gemini-2.5-flash", config) +resp, err := genkit.Generate(ctx, g, ai.WithModel(modelRef), ai.WithPrompt("...")) + +// Option 2: Pass configuration per-request +resp, err = genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)), + ai.WithConfig(config), // Pass config explicitly + ai.WithPrompt("..."), +) ``` -Model references have a `Generate()` method that calls the Google API: +### Structured Output + +Gemini models support structured output generation, which guarantees that the model output will conform to a specified JSON schema. ```go -resp, err := genkit.Generate(ctx, g, ai.WithModel(modelRef), ai.WithPrompt("Tell me a joke.")) +type CharacterProfile struct { + Name string `json:"name"` + Bio string `json:"bio"` + Age int `json:"age"` +} + +var profile CharacterProfile +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)), + ai.WithPrompt("Generate a profile for a fictional character"), + ai.WithOutputType(CharacterProfile{}), +) + if err != nil { - return err + log.Fatal(err) +} + +// Unmarshal the model output into the profile struct +if err := resp.Output(&profile); err != nil { + log.Fatal(err) } +``` + +Alternatively, you can use **`genkit.GenerateData`** for a more succinct call: + +```go +profile, resp, err := genkit.GenerateData[CharacterProfile](ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)), + ai.WithPrompt("Generate a profile for a fictional character"), +) +``` + +#### Schema Limitations + +The Gemini API relies on a specific subset of the OpenAPI 3.0 standard. When defining schemas for structured output, keep the following limitations in mind: + +**Supported Features** +- **Objects & Arrays**: Standard object properties and array items. +- **Enums**: Supported via `enum` tag or string slices in schema. 
+- **Nullable**: Supported (mapped to `nullable: true`).
+
+**Critical Limitations**
+- **Validation Keywords**: Keywords like `pattern`, `minLength`, `maxLength`, `minItems`, and `maxItems` are **not supported** by the Gemini API's constrained decoding. Including them may cause errors, or they may be silently ignored.
+- **Recursion**: Recursive schemas are generally not supported.
+- **Complexity**: Deeply nested schemas or schemas with hundreds of properties may trigger complexity limits.
 
-log.Println(resp.Text())
+**Best Practices**
+- Keep schemas simple and flat where possible.
+- Use property descriptions to guide the model instead of complex validation rules.
+- If you need strict validation (e.g., regex), perform it in your application code *after* receiving the structured response.
+
+### Thinking and Reasoning
+
+Gemini 2.5 and newer models use an internal thinking process that improves reasoning for complex tasks.
+
+**Thinking Level (Gemini 3):**
+
+```go
+resp, err := genkit.Generate(ctx, g,
+    ai.WithModel(googlegenai.ModelRef("gemini-3-pro-preview", &genai.GenerateContentConfig{
+        ThinkingConfig: &genai.ThinkingConfig{
+            ThinkingLevel: genai.Ptr("HIGH"), // Or "MEDIUM", "LOW", or "MINIMAL"
+        },
+    })),
+    ai.WithPrompt("What is heavier, one kilo of steel or one kilo of feathers?"),
+)
+```
+
+Gemini 3 models support the following configuration options for thinking:
+
+- **ThinkingLevel** _string_
+  The reasoning depth for the model (`"HIGH"`, `"MEDIUM"`, `"LOW"`, `"MINIMAL"`).
+
+**Thinking Budget (Gemini 2.5):**
+
+```go
+resp, err := genkit.Generate(ctx, g,
+    ai.WithModel(googlegenai.ModelRef("gemini-2.5-pro", &genai.GenerateContentConfig{
+        ThinkingConfig: &genai.ThinkingConfig{
+            ThinkingBudget:  genai.Ptr[int32](8192),
+            IncludeThoughts: true,
+        },
+    })),
+    ai.WithPrompt("What is heavier, one kilo of steel or one kilo of feathers?"),
+)
 ```
-See [Generating content with AI models](/docs/models) for more information. 
+Gemini 2.5 models support the following configuration options for thinking: -### Embedding models +- **ThinkingBudget** _int32_ + The number of tokens the model is allowed to use for internal thinking. -To get a reference to a supported embedding model, specify its identifier to `googlegenai.GoogleAIEmbedder`: +- **IncludeThoughts** _bool_ + Whether to include the model's internal thoughts in the response. If enabled, you can access thoughts via `response.Reasoning`. + +### Context Caching + +Gemini 2.5 and newer models automatically cache common content prefixes (min 1024 tokens for Flash, 2048 for Pro), providing a 75% token discount on cached tokens. ```go -embeddingModel := googlegenai.GoogleAIEmbedder(g, "text-embedding-004") +// Structure prompts with consistent content at the beginning +baseContext := strings.Repeat("You are a helpful cook... (large context) ...", 50) + +// First request - content will be cached +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)), + ai.WithPrompt(baseContext + "\n\nTask 1..."), +) + +// Second request with same prefix - eligible for cache hit +resp, err = genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)), + ai.WithPrompt(baseContext + "\n\nTask 2..."), +) ``` -Embedder references have an `Embed()` method that calls the Google AI API: +### Safety Settings + +You can configure safety settings to control content filtering for different harm categories: ```go -resp, err := genkit.Embed(ctx, g, ai.WithEmbedder(embeddingModel), ai.WithTextDocs(userInput)) -if err != nil { - return err +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", &genai.GenerateContentConfig{ + SafetySettings: []*genai.SafetySetting{ + { + Category: genai.HarmCategoryHateSpeech, + Threshold: genai.HarmBlockThresholdBlockMediumAndAbove, + }, + { + Category: genai.HarmCategoryDangerousContent, + Threshold: 
genai.HarmBlockThresholdBlockMediumAndAbove, + }, + }, + })), + ai.WithPrompt("..."), +) +``` + +Available harm categories: +- `HarmCategoryHarassment` +- `HarmCategoryHateSpeech` +- `HarmCategorySexuallyExplicit` +- `HarmCategoryDangerousContent` + +Available thresholds: +- `HarmBlockThresholdUnspecified` +- `HarmBlockThresholdBlockLowAndAbove` +- `HarmBlockThresholdBlockMediumAndAbove` +- `HarmBlockThresholdBlockOnlyHigh` +- `HarmBlockThresholdBlockNone` + +### Google Search Grounding + +Enable Google Search to provide answers with current information and verifiable sources. + +```go +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", &genai.GenerateContentConfig{ + Tools: []*genai.Tool{ + {GoogleSearch: &genai.GoogleSearch{}}, + }, + })), + ai.WithPrompt("What are the top tech news stories this week?"), +) +``` + +The following configuration options are available for Google Search grounding: + +- **GoogleSearch** _struct_ + + Enables Google Search grounding. + Example: `&genai.GoogleSearch{}` + + - **DynamicRetrievalConfig** _struct_ + - **Mode** _string_ + The retrieval mode (e.g., `"MODE_DYNAMIC"`). + - **DynamicThreshold** _float32_ + The threshold for dynamic retrieval (e.g., `0.7`). + +### Google Maps Grounding + +Enable Google Maps to provide location-aware responses. + +```go +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", &genai.GenerateContentConfig{ + Tools: []*genai.Tool{ + {GoogleMaps: &genai.GoogleMaps{}}, + }, + })), + ai.WithPrompt("Find a coffee shop nearby"), +) +``` + +The following configuration options are available for Google Maps grounding: + +- **GoogleMaps** _struct_ + + Enables Google Maps grounding. + Example: `&genai.GoogleMaps{EnableWidget: genai.Ptr(true)}` + + - **EnableWidget** _bool_ + Whether to include a widget token in the response. + +- **ToolConfig** _struct_ + + Additional configuration for provider tools. 
Can improve relevance by providing location context for Google Maps. + Example: `&genai.ToolConfig{RetrievalConfig: &genai.RetrievalConfig{...}}` + +### Code Execution + +Enable the model to write and execute Python code for calculations and logic. + +```go +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-pro", &genai.GenerateContentConfig{ + Tools: []*genai.Tool{ + {CodeExecution: &genai.ToolCodeExecution{}}, + }, + })), + ai.WithPrompt("Calculate the 100th prime number"), +) +``` + +The following configuration options are available for code execution: + +- **CodeExecution** _struct_ + + Enables code execution for reasoning and calculations. + Example: `&genai.ToolCodeExecution{}` + +### Generating Text and Images (Nano Banana) + +Some Gemini models (like `gemini-2.5-flash-image` AKA "Nano Banana") can output images natively alongside text: + +```go +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash-image", &genai.GenerateContentConfig{ + ResponseModalities: []string{"IMAGE", "TEXT"}, + })), + ai.WithPrompt("Create a picture of a futuristic city and describe it"), +) + +// Get all content +for _, part := range resp.Message.Content { + // ... process each part +} + +// Extract image +if img := resp.Media(); img != nil { + fmt.Printf("Image URL: %s\n", img.URL) } + +// Extract text +fmt.Printf("Text: %s\n", resp.Text()) +``` + +## Multimodal Input Capabilities + +### Video Understanding + +Gemini models can process videos passed as URIs or inline data. + +```go +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)), + ai.WithMessages( + ai.NewUserMessage( + ai.NewTextPart("What happens in this video?"), + ai.NewMediaPart("video/mp4", "https://example.com/video.mp4"), + ), + ), +) +``` + +### Image Understanding + +Gemini models can reason about images passed as inline data or URLs. 
+
+```go
+resp, err := genkit.Generate(ctx, g,
+    ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)),
+    ai.WithMessages(
+        ai.NewUserMessage(
+            ai.NewTextPart("Describe what is in this image"),
+            ai.NewMediaPart("image/jpeg", "https://example.com/image.jpg"),
+        ),
+    ),
+)
+```
+
+### Audio Understanding
+
+Gemini models can process audio files to transcribe speech to text or answer questions.
+
+```go
+resp, err := genkit.Generate(ctx, g,
+    ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)),
+    ai.WithMessages(
+        ai.NewUserMessage(
+            ai.NewTextPart("Transcribe this audio clip"),
+            ai.NewMediaPart("audio/mp3", "https://example.com/audio.mp3"),
+        ),
+    ),
+)
-}
 ```
 
-See [Retrieval-augmented generation (RAG)](/docs/rag) for more information.
+### PDF Support
+
+Gemini models can process PDF documents.
+
+```go
+resp, err := genkit.Generate(ctx, g,
+    ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash", nil)),
+    ai.WithMessages(
+        ai.NewUserMessage(
+            ai.NewTextPart("Summarize this document"),
+            ai.NewMediaPart("application/pdf", "https://example.com/doc.pdf"),
+        ),
+    ),
+)
+```
+
+## Embedding Models
+
+### Available Models
+
+- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions)
+- `text-embedding-004` - Text embedding model (768 dimensions)
+- `multimodalembedding` - Supports text, image, and video embeddings
+
+### Usage
+
+```go
+resp, err := genkit.Embed(ctx, g,
+    ai.WithEmbedderName("googleai/gemini-embedding-001"),
+    ai.WithTextDocs("Machine learning models process data to make predictions."),
+)
+```
+
+## Image Models
+
+### Available Models
+
+**Imagen 4 Series** - Latest generation with improved quality:
+- `imagen-4.0-generate-001` - Standard quality
+- `imagen-4.0-ultra-generate-001` - Ultra-high quality
+- `imagen-4.0-fast-generate-001` - Fast generation
+
+**Imagen 3 Series**:
+- `imagen-3.0-generate-002`
+
+### Usage
+
+```go
+resp, err := genkit.Generate(ctx, g,
+    ai.WithModel(googlegenai.ModelRef("imagen-3.0-generate-002", &genai.GenerateImagesConfig{
+        NumberOfImages: genai.Ptr[int32](4),
+        AspectRatio:    genai.Ptr[string]("16:9"),
+    })),
+    ai.WithPrompt("A serene Japanese garden with cherry blossoms and a koi pond."),
+)
+```
+
+## Video Models
+
+The Google AI plugin provides access to video generation capabilities through the Veo models.
+
+### Available Models
+
+**Veo 3.1 Series** - Latest generation with native audio and high fidelity:
+- `veo-3.1-generate-preview` - High-quality video and audio generation
+- `veo-3.1-fast-generate-preview` - Fast generation with high quality
+
+**Veo 3.0 Series**:
+- `veo-3.0-generate-001`
+- `veo-3.0-fast-generate-001`
+
+**Veo 2.0 Series**:
+- `veo-2.0-generate-001`
+
+### Usage
+
+Video generation returns an operation that you must poll for results.
+
+```go
+// Start video generation
+op, err := genkit.GenerateOperation(ctx, g,
+    ai.WithModelName("googleai/veo-3.0-fast-generate-001"),
+    ai.WithPrompt("A majestic dragon soaring over a mystical forest at dawn."),
+)
+if err != nil {
+    log.Fatal(err)
+}
+
+// Poll until the operation completes
+for !op.Done {
+    time.Sleep(5 * time.Second)
+    op, err = genkit.CheckOperation(ctx, g, op)
+    if err != nil {
+        log.Fatal(err)
+    }
+}
+
+if op.Error != nil {
+    log.Fatal(op.Error)
+}
+
+// Access generated video URI from the operation result
+videoURI := op.Output.Message.Content[0].Media.URL
+```
+
+The Veo models support the following configuration options via `genai.GenerateVideosConfig`:
+
+- **NegativePrompt** _string_
+
+  Text that describes anything you want to discourage the model from generating.
+
+- **AspectRatio** _string_
+
+  Changes the aspect ratio of the generated video.
+  - `"16:9"`
+  - `"9:16"`
+
+- **PersonGeneration** _string_
+
+  Controls whether the model is allowed to generate videos of people.
+  - **Text-to-video generation**:
+    - `"allow_all"`: Generate videos that include adults and children. Currently the only available value for Veo 3.
+    - `"dont_allow"` (Veo 2 only): Don't allow people or faces. 
+    - `"allow_adult"` (Veo 2 only): Generate videos with adults, but not children.
+  - **Image-to-video generation** (Veo 2 only):
+    - `"dont_allow"`: Don't allow people or faces.
+    - `"allow_adult"`: Generate videos with adults, but not children.
+
+- **NumberOfVideos** _int32_
+
+  The number of output videos to generate.
+  - `1`: Supported in Veo 3 and Veo 2.
+  - `2`: Supported in Veo 2 only.
+
+- **DurationSeconds** _int32_ (Veo 2 only)
+
+  Length of each output video in seconds (5 to 8). Not configurable for Veo 3.1/3.0 (defaults to 8 seconds).
+
+- **Resolution** _string_ (Veo 3.1 only)
+
+  Resolution of the generated video.
+  - `"720p"` (default)
+  - `"1080p"` (available for the 16:9 aspect ratio)
+
+- **Seed** _int32_ (Veo 3.1/3.0 only)
+
+  Sets the random seed for generation. Doesn't guarantee determinism, but improves consistency across runs.
+
+- **EnhancePrompt** _bool_ (Veo 2 only)
+
+  Enables or disables the prompt rewriter. Enabled by default. For Veo 3.1/3.0, the prompt enhancer is always on.
+
+## Speech Models
+
+The Google AI plugin provides access to text-to-speech capabilities through Gemini TTS models.
+
+### Available Models
+
+- `gemini-2.5-flash-preview-tts`
+- `gemini-2.5-pro-preview-tts`
+
+### Usage
+
+To convert text to single-speaker audio, set the response modality to `"AUDIO"` and pass a `SpeechConfig` object with `VoiceConfig` set. You'll need to choose a voice name from the prebuilt [output voices](https://ai.google.dev/gemini-api/docs/speech-generation#voices).
+
+The plugin returns raw PCM data in the response text (base64 encoded), which can then be converted to a standard format like WAV. 
+ +```go +import "google.golang.org/genai" + +resp, err := genkit.Generate(ctx, g, + ai.WithModel(googlegenai.ModelRef("gemini-2.5-flash-preview-tts", &genai.GenerateContentConfig{ + ResponseModalities: []string{"AUDIO"}, + SpeechConfig: &genai.SpeechConfig{ + VoiceConfig: &genai.VoiceConfig{ + PrebuiltVoiceConfig: &genai.PrebuiltVoiceConfig{ + VoiceName: "Algenib", + }, + }, + }, + })), + ai.WithPrompt("Say: Genkit is the best Gen AI library!"), +) + +if err != nil { + log.Fatal(err) +} + +// The model output will be a base64 encoded string in resp.Text() +// You can decode this and save it as a PCM file or convert to WAV. +``` ## Next Steps @@ -1041,7 +1558,6 @@ See [Retrieval-augmented generation (RAG)](/docs/rag) for more information. - Explore [creating flows](/docs/flows) to build structured AI workflows - To use the Gemini API at enterprise scale see the [Vertex AI plugin](/docs/integrations/vertex-ai) -