diff --git a/charts/backingservices/charts/autopilot/Chart.yaml b/charts/backingservices/charts/autopilot/Chart.yaml
new file mode 100644
index 000000000..bc4d04225
--- /dev/null
+++ b/charts/backingservices/charts/autopilot/Chart.yaml
@@ -0,0 +1,5 @@
+apiVersion: v1
+name: autopilot
+description: Pega Autopilot AI service for on-prem deployments with direct LLM provider connectivity
+version: 1.0.0
+appVersion: 1.0.0
diff --git a/charts/backingservices/charts/autopilot/README.md b/charts/backingservices/charts/autopilot/README.md
new file mode 100644
index 000000000..3779f196c
--- /dev/null
+++ b/charts/backingservices/charts/autopilot/README.md
@@ -0,0 +1,517 @@
+# Autopilot Service Helm chart
+
+The Pega `Autopilot Service` backing service provides GenAI-powered capabilities for the Pega Infinity Platform by connecting directly to LLM providers (Azure OpenAI, AWS Bedrock, Google Vertex AI). This chart deploys the Autopilot Service for on-premises environments.
+
+## Pega GenAI Features Support Matrix
+
+| Pega Version | GenAI Connect | GenAI Coach | GenAI Agent |
+|---|:---:|:---:|:---:|
+| 24.2 | Yes | | |
+| 24.2.4 | Yes | Yes | |
+| 25.1.1 *(requires HFIX-C4307)* | Yes | Yes | Yes |
+| 25.1.2 | Yes | Yes | Yes |
+
+## Configuring a backing service with your Pega environment
+
+You can provision the Autopilot Service into your `pega` environment namespace or any other namespace, with the Autopilot Service endpoint configured for your Pega Infinity environment.
+
+## Supported LLM Providers
+
+| Provider | Authentication Methods |
+|---|---|
+| Azure OpenAI | API Key, Pre-existing Secret |
+| AWS Bedrock | Access Key/Secret, Pre-existing Secret |
+| Google Vertex AI | Service Account JSON (base64-encoded), Pre-existing Secret |
+
+## Configuration settings
+
+| Configuration | Usage |
+|---|---|
+| `enabled` | Enable the Autopilot Service deployment as a backing service. Set this parameter to `true` to deploy the service. |
| +| `deployment.name` | Specify the name of your Autopilot Service deployment. Your deployment creates resources prefixed with this string. | +| `docker.registry.url` | Specify the image registry URL. | +| `docker.registry.username` | Specify the username for the Docker registry. | +| `docker.registry.password` | Specify the password for the Docker registry. | +| `docker.imagePullSecretNames` | List pre-existing secrets to be used for pulling Docker images. | +| `docker.autopilot.image` | Specify the Autopilot Service Docker image and tag. | +| `docker.autopilot.imagePullPolicy` | Specify the image pull policy. Default is `Always`. | +| `replicas` | Number of pod replicas to provision. Default is `2`. | +| `service.port` | Defines the port used by the Service. Default is `80`. | +| `service.targetPort` | Defines the port used by the Pod and Container. Default is `8080`. | +| `service.serviceType` | The [type of service](https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types) you wish to expose. Default is `ClusterIP`. | +| `enableGenaiHub` | Set to `false` for on-prem deployments with direct provider connectivity. Default is `false`. | +| `authEnabled` | Enable or disable authentication for the service. Default is `false`. | +| `isInternalDeployment` | Set to `false` for on-prem deployments. Default is `false`. | +| `modelProviders` | Comma-separated list of providers to enable (e.g., `"Azure,Vertex,Bedrock"`). Filters the model list to only include models from the specified providers. If empty, all models from the model list are returned. | +| `awsRegion` | AWS region for Bedrock. Default is `us-east-1`. | +| `affinity` | Define pod affinity so that it is restricted to run on particular node(s), or to prefer to run on particular nodes. | +| `tolerations` | Define pod tolerations so that it is allowed to run on node(s) with particular taints. 
|
+
+## Provider credentials
+
+The Autopilot Service supports two methods for providing LLM provider credentials.
+
+### Option 1: Inline credentials (auto-creates Kubernetes Secret)
+
+Specify provider credentials directly in your values file. The chart automatically creates a Kubernetes Secret containing these values.
+
+| Configuration | Usage |
+|---|---|
+| `azure.endpoint` | Azure OpenAI endpoint URL (e.g., `https://my-openai.openai.azure.com/`). |
+| `azure.apiKey` | Azure OpenAI API key. |
+| `azure.apiVersion` | Azure OpenAI API version. Default is `2024-10-21`. |
+| `aws.accessKeyId` | AWS access key ID for Bedrock. |
+| `aws.secretAccessKey` | AWS secret access key for Bedrock. |
+| `aws.sessionToken` | Optional AWS session token for temporary credentials. |
+| `vertex.credentials` | Base64-encoded Google service account JSON key. The `project_id` is automatically extracted from the JSON. |
+| `vertex.location` | Vertex AI location. Default is `us-central1`. |
+
+```yaml
+autopilot:
+  enabled: true
+  azure:
+    endpoint: "https://my-openai.openai.azure.com/"
+    apiKey: "your-azure-api-key"
+  aws:
+    accessKeyId: "AKIAIOSFODNN7EXAMPLE"
+    secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
+  vertex:
+    credentials: "base64-encoded-service-account-json"
+    location: "us-central1"
+```
+
+### Option 2: Pre-existing Kubernetes Secret
+
+Use a secret that you create and manage outside of this chart. Set `providerCredentialsSecret` to the name of your secret. The secret should contain keys matching the environment variable names used by the service.
+
+| Configuration | Usage |
+|---|---|
+| `providerCredentialsSecret` | Name of an existing Kubernetes Secret containing provider credentials. When set, inline credentials are ignored. |
+
+A single secret can hold credentials for all providers. 
Only the keys relevant to the providers you are using need to be present — all keys are mounted as `optional: true` so missing keys are silently ignored.
+
+| Key | Provider | Description |
+|---|---|---|
+| `AZURE_ENDPOINT` | Azure OpenAI | Azure OpenAI endpoint URL (e.g. `https://my-openai.openai.azure.com/`) |
+| `AZURE_OPENAI_KEY` | Azure OpenAI | Azure OpenAI API key |
+| `AWS_ACCESS_KEY_ID` | AWS Bedrock | AWS access key ID |
+| `AWS_SECRET_ACCESS_KEY` | AWS Bedrock | AWS secret access key |
+| `AWS_SESSION_TOKEN` | AWS Bedrock | AWS session token (optional, for temporary credentials) |
+| `VERTEX_AUTH` | Google Vertex AI | Base64-encoded Google service account JSON. The Autopilot service base64-decodes this value itself, so the secret must hold the base64 string — not the raw JSON. |
+
+```yaml
+autopilot:
+  enabled: true
+  providerCredentialsSecret: "my-provider-credentials"
+```
+
+Create the secret with credentials for all providers you intend to use. For `VERTEX_AUTH`, pass the base64-encoded JSON using `$(base64 -w0 ...)` so the pod receives the encoded string the service expects:
+
+```bash
+kubectl create secret generic my-provider-credentials \
+  --namespace <namespace> \
+  --from-literal=AZURE_ENDPOINT="https://my-openai.openai.azure.com/" \
+  --from-literal=AZURE_OPENAI_KEY="your-azure-api-key" \
+  --from-literal=AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE" \
+  --from-literal=AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
+  --from-literal=AWS_SESSION_TOKEN="your-session-token" \
+  --from-literal=VERTEX_AUTH="$(base64 -w0 /path/to/gcp-service-account.json)"
+```
+
+You can also create the secret from a YAML manifest to manage it in source control. 
For `VERTEX_AUTH`, put the base64-encoded JSON string directly in `stringData`:
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: my-provider-credentials
+  namespace: <namespace>
+type: Opaque
+stringData:
+  AZURE_ENDPOINT: "https://my-openai.openai.azure.com/"
+  AZURE_OPENAI_KEY: "your-azure-api-key"
+  AWS_ACCESS_KEY_ID: "AKIAIOSFODNN7EXAMPLE"
+  AWS_SECRET_ACCESS_KEY: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
+  AWS_SESSION_TOKEN: "your-session-token" # omit if not using temporary credentials
+  VERTEX_AUTH: "" # base64 encode the JSON file: base64 -w0 gcp-service-account.json
+```
+
+## Custom models configuration
+
+The Autopilot Service uses a model list to determine which LLM models are available. The service routes requests to the appropriate provider endpoint based on the model metadata in each entry. The `model_id` field format differs by provider — see [Building model list](#building-model-list) below for the rules per provider.
+
+## Building model list
+
+### For Azure OpenAI and Vertex AI
+
+- **`name`** — Must match the deployment name as it appears in the LLM provider console (e.g., the Azure OpenAI Studio deployment name, or the Vertex AI model ID shown in Model Garden). This value is used as the display name and routing key.
+- **`model_path`** — Must be provided as an array of API endpoint paths relative to the provider base URL. For Azure OpenAI the path embeds the Azure portal deployment name (e.g., `["/openai/deployments/gpt-5/chat/completions"]`). For Vertex AI it embeds the model identifier (e.g., `["/google/deployments/gemini-2.5-pro/chat/completions"]`).
+
+### For AWS Bedrock
+
+- **`model_id`** — Must be the **exact model ID as shown in the AWS Bedrock console**, including the cross-region inference prefix and version suffix (e.g., `us.anthropic.claude-3-7-sonnet-20250219-v1:0`, `us.amazon.nova-pro-v1:0`, `amazon.titan-embed-text-v2:0`). 
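
The per-provider rules above can be sanity-checked before a model list is deployed. A minimal sketch in Python — the field names come from the model JSON format documented below, but the checks themselves (and the Bedrock ID pattern) are illustrative assumptions, not part of the chart:

```python
import re

# Loose shape of a Bedrock console model ID: optional cross-region prefix
# (e.g. "us."), a creator segment, a model segment, and a ":<n>" version
# suffix. This pattern is an assumption for illustration, not an official
# AWS grammar.
BEDROCK_ID = re.compile(r"^(?:[a-z]{2}\.)?[a-z]+\.[A-Za-z0-9.-]+:\d+$")

def validate_model(entry: dict) -> list:
    """Return a list of problems found in one models.json entry."""
    problems = []
    provider = entry.get("provider")
    if provider in ("azure", "vertex"):
        # Azure OpenAI / Vertex AI route through deployment-style paths.
        paths = entry.get("model_path")
        if not isinstance(paths, list) or not paths:
            problems.append("model_path must be a non-empty array")
        elif not all("/deployments/" in p for p in paths):
            problems.append("model_path entries should be deployment endpoint paths")
    elif provider == "bedrock":
        # Bedrock needs the exact console ID, incl. prefix and version suffix.
        if not BEDROCK_ID.match(entry.get("model_id", "")):
            problems.append("model_id does not look like a Bedrock console ID")
    else:
        problems.append("unknown provider: %r" % provider)
    return problems
```

For example, a Bedrock entry whose `model_id` drops the `:0` version suffix is flagged, while `us.amazon.nova-pro-v1:0` passes.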
+
+### Option 1: Use the default models file bundled with the chart (recommended)
+
+The chart includes a `files/default-models.json` file containing a curated list of models for all supported providers. Setting `deployModelsConfigMap: true` (the default) automatically creates a ConfigMap from this file during `helm install` or `helm upgrade`.
+
+| Configuration | Usage |
+|---|---|
+| `deployModelsConfigMap` | Set to `true` to create a ConfigMap from the bundled `files/default-models.json`. Default is `true`. |
+
+```yaml
+autopilot:
+  enabled: true
+  deployModelsConfigMap: true
+```
+
+To customize the default model list before deployment, edit `files/default-models.json` in the chart directory.
+
+### Option 2: Provide a pre-existing ConfigMap
+
+Use a ConfigMap that you create and manage outside of this chart. The ConfigMap must contain a key named `models.json` with the model list as JSON content.
+
+| Configuration | Usage |
+|---|---|
+| `customModels.existingConfigMap` | Name of an existing ConfigMap containing `models.json`. |
+
+```yaml
+autopilot:
+  enabled: true
+  deployModelsConfigMap: false
+  customModels:
+    existingConfigMap: "my-models-configmap"
+```
+
+Create the ConfigMap:
+
+```bash
+kubectl create configmap my-models-configmap \
+  --namespace <namespace> \
+  --from-file=models.json=./my-models.json
+```
+
+### Option 3: Inline model list in values
+
+Provide the model JSON content directly in your values file. The chart creates a ConfigMap from this inline content.
+
+| Configuration | Usage |
+|---|---|
+| `customModels.inline` | JSON string containing the model list. 
|
+
+```yaml
+autopilot:
+  enabled: true
+  deployModelsConfigMap: false
+  customModels:
+    inline: |
+      [
+        {
+          "provider": "azure",
+          "creator": "openai",
+          "model_name": "GPT-5",
+          "model_mapping_id": "gpt-5-2025-08-07",
+          "name": "gpt-5-2025-08-07",
+          "model_id": "gpt-5-2025-08-07",
+          "input_tokens": 400000,
+          "output_tokens": 128000,
+          "type": "chat_completion",
+          "version": "2025-08-07",
+          "model_path": ["/openai/deployments/gpt-5-2025-08-07/chat/completions"],
+          "supported_capabilities": {
+            "streaming": true,
+            "functions": true,
+            "json_mode": true
+          }
+        }
+      ]
+```
+
+### Model JSON format
+
+Each model entry in the models file requires the following fields:
+
+| Field | Required | Description |
+|---|---|---|
+| `provider` | Yes | Cloud provider (`azure`, `bedrock`, `vertex`). |
+| `creator` | Yes | Model creator (e.g., `openai`, `anthropic`, `google`, `amazon`). |
+| `model_name` | Yes | Model identifier name. |
+| `name` | Yes | **For Azure OpenAI and Vertex AI:** must match the deployment name as shown in the LLM provider console. Used as the display name and routing key. |
+| `model_id` | Yes | The provider-native model identifier. **For Azure OpenAI and Vertex AI:** the deployment name exactly as shown in the provider console — same value as `model_mapping_id` (e.g., `gpt-5-2025-08-07`, `gemini-2.5-pro`). **For Bedrock:** the exact model ID from the AWS Bedrock console, including cross-region prefix and version suffix (e.g., `us.anthropic.claude-3-7-sonnet-20250219-v1:0`, `us.amazon.nova-pro-v1:0`, `amazon.titan-embed-text-v2:0`). |
+| `model_mapping_id` | Yes | Provider-specific deployment name used for API routing and for constructing `model_path` entries. |
+| `model_path` | Yes | Array of API endpoint paths for the model, embedding the `model_mapping_id` value. **Azure OpenAI:** `["/openai/deployments/<model_mapping_id>/chat/completions"]`. **Vertex AI:** `["/google/deployments/<model_mapping_id>/chat/completions"]`. 
**Bedrock:** `["/anthropic/deployments/<model_mapping_id>/converse", "/anthropic/deployments/<model_mapping_id>/converse-stream"]` (adjust creator prefix to match). |
+| `type` | Yes | Model type: `chat_completion`, `embedding`, or `image`. |
+| `version` | Yes | Model version string. |
+| `input_tokens` | No | Maximum input token count. |
+| `output_tokens` | No | Maximum output token count. |
+| `supported_capabilities` | No | Object describing model capabilities (streaming, functions, multimodal, etc.). |
+| `parameters` | No | Object describing tunable parameters (temperature, top_p, max_tokens, etc.). |
+
+## Default fast and smart models
+
+The service automatically assigns default `fast` and `smart` models from the model list:
+- **1st model** in the list -> `fast`
+- **2nd model** in the list -> `smart`
+- If only one model is available, it is used for both.
+
+To control which models are assigned as defaults, order your models file accordingly — the first two entries will be picked as `fast` and `smart` respectively.
+
+### Liveness and readiness probes
+
+The Autopilot Service uses liveness and readiness probes on the `/v1/health` endpoint. Configure probes as part of a `livenessProbe` or `readinessProbe` configuration.
+
+Parameter | Description | Default `livenessProbe` | Default `readinessProbe`
+--- | --- | --- | ---
+`initialDelaySeconds` | Number of seconds after the container has started before probes are initiated. | `10` | `10`
+`timeoutSeconds` | Number of seconds after which the probe times out. | `10` | `10`
+`periodSeconds` | How often (in seconds) to perform the probe. | `10` | `10`
+`successThreshold` | Minimum consecutive successes for the probe to be considered successful after having failed. | `1` | `1`
+`failureThreshold` | The number of consecutive probe failures after which Kubernetes restarts the container (liveness) or marks the Pod not ready (readiness). 
| `3` | `3`
+
+Example:
+
+```yaml
+livenessProbe:
+  initialDelaySeconds: 10
+  timeoutSeconds: 10
+  periodSeconds: 10
+  successThreshold: 1
+  failureThreshold: 3
+readinessProbe:
+  initialDelaySeconds: 10
+  timeoutSeconds: 10
+  periodSeconds: 10
+  successThreshold: 1
+  failureThreshold: 3
+```
+
+## Ingress
+
+To expose the Autopilot Service externally, enable the ingress configuration.
+
+| Configuration | Usage |
+|---|---|
+| `ingress.enabled` | Set to `true` to deploy an ingress. Default is `false`. |
+| `ingress.domain` | Specify your custom domain. |
+| `ingress.ingressClassName` | Ingress class to be used. |
+| `ingress.tls.enabled` | Specify the use of HTTPS for ingress connectivity. |
+| `ingress.tls.secretName` | Specify the Kubernetes secret containing your SSL certificate. |
+| `ingress.annotations` | Specify additional annotations to add to the ingress. |
+
+```yaml
+ingress:
+  enabled: true
+  domain: "autopilot.example.com"
+  ingressClassName: "nginx"
+  tls:
+    enabled: true
+    secretName: "autopilot-tls-secret"
+  annotations:
+    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
+```
+
+## Connecting Pega Infinity with Autopilot Service
+
+After deploying the Autopilot Service, configure your Pega Infinity environment to connect to it. There are two ways to do this.
+
+### Option 1: Dynamic System Setting (DSS)
+
+1. Log in to Pega Infinity as an administrator.
+2. Navigate to **Records > SysAdmin > Dynamic System Settings** and create a new DSS with the following details:
+
+| Field | Value |
+|---|---|
+| **Owning Ruleset** | `Pega-Engine` |
+| **Setting Purpose** | `prconfig/services/genai/autopilot/servicebaseurl/default` |
+| **Value** | `http://<service-name>.<namespace>.svc.cluster.local/` |
+
+3. Replace `<service-name>` with the Autopilot Service name (default: `autopilot`) and `<namespace>` with the Kubernetes namespace where the service is deployed.
+4. Save the DSS. 
+
+**Example:** If the Autopilot Service is deployed with the default name `autopilot` in the `autopilot` namespace:
+
+```
+http://autopilot.autopilot.svc.cluster.local/
+```
+
+5. **A restart of Pega Infinity nodes is required for the DSS change to take effect.**
+
+### Option 2: prconfig.xml
+
+Add the Autopilot service URL directly to `charts/pega/config/deploy/prconfig.xml` in the Pega Helm charts repository:
+
+```xml
+<env name="services/genai/autopilot/servicebaseurl" value="http://<service-name>.<namespace>.svc.cluster.local/" />
+```
+
+**Example:**
+
+```xml
+<env name="services/genai/autopilot/servicebaseurl" value="http://autopilot.autopilot.svc.cluster.local/" />
+```
+
+After editing `prconfig.xml`, apply the change with a `helm upgrade` followed by a rollout restart:
+
+```bash
+helm upgrade pega pega/pega -f my-values.yaml -n <namespace>
+kubectl rollout restart statefulset/<statefulset-name> -n <namespace>
+```
+
+**A restart of Pega Infinity is required whenever the prconfig.xml entry is added or changed.**
+
+### Precedence
+
+If the Autopilot service URL is configured by both methods, **`prconfig.xml` takes precedence over the DSS**.
+
+## OAuth authentication between Pega Infinity and the Autopilot service
+
+You can enable OAuth authentication to secure requests between Pega Infinity and the Autopilot service using an Identity Provider (IdP). The Autopilot service only supports the `private_key_jwt` authentication type.
+
+### How it works
+
+- Pega Infinity obtains a Bearer token from your IdP using an OAuth 2.0 `client_credentials` grant.
+- The token is attached as an `Authorization: Bearer <token>` header on every request to the Autopilot service.
+- The Autopilot service validates incoming tokens against the IdP public key endpoint (`oauthPublicKeyURL`).
+
+### Scopes
+
+The Autopilot service does not require any OAuth scopes. Leave `autopilot.autopilotAuth.scopes` empty (or omit it entirely) when configuring the Pega chart.
+
+### Shared credentials with SRS and token-minting precedence
+
+Pega Infinity uses a single set of `SERV_AUTH_*` environment variables to mint Bearer tokens for backing services. 
When both SRS auth (`pegasearch.srsAuth`) and Autopilot auth (`autopilot.autopilotAuth`) are enabled at the same time, **the SRS credentials take precedence** and are used to mint tokens sent to both SRS and the Autopilot service. + +This means: + +- Both SRS and Autopilot can share the same IdP client application and credentials. +- You do not need separate client registrations for each backing service if both are pointed at the same IdP authorization server. +- If only `autopilot.autopilotAuth` is enabled (SRS auth is disabled), the Autopilot credentials are used to mint tokens. +- If both are enabled and credentials differ, the SRS credentials win — the Autopilot-specific credentials are not used for token minting. + +In practice, configure both backing services to trust the same IdP public key endpoint and issue tokens from the same client application. The recommended setup when both services are deployed together: + +```yaml +# pega chart values +pegasearch: + srsAuth: + enabled: true + url: "https://your-idp-host/oauth2/v1/token" + clientId: "your-shared-client-id" + authType: "private_key_jwt" + privateKey: "LS0tLS1CRUdJTiBSU0Eg..." + +autopilot: + autopilotAuth: + enabled: true + url: "https://your-idp-host/oauth2/v1/token" + clientId: "your-shared-client-id" + privateKey: "LS0tLS1CRUdJTiBSU0Eg..." + scopes: "" # Autopilot requires no scopes +``` + +Because SRS takes precedence, the token sent to Autopilot is minted using the SRS credentials above. Both services must therefore trust tokens issued for the same client. + +### Autopilot service configuration (backingservices chart) + +Enable auth on the Autopilot service and set the IdP public key URL so it can validate incoming tokens: + +| Parameter | Description | Default | +|---|---|---| +| `authEnabled` | Enables token validation on incoming requests to the Autopilot service. | `false` | +| `oauthPublicKeyURL` | URL of the IdP public key endpoint used to verify Bearer tokens. 
Required when `authEnabled` is `true`. | `""` | + +```yaml +autopilot: + enabled: true + authEnabled: true + oauthPublicKeyURL: "https://your-idp-host/oauth2/v1/keys" +``` + +### Pega Infinity configuration (pega chart) + +Configure the Pega chart to mint tokens and attach them to Autopilot requests. The Autopilot service URL itself is configured via a DSS in Pega Infinity (see "Connecting Pega Infinity with Autopilot Service" above) — no URL parameter is needed in the pega chart. + +| Parameter | Description | Default | +|---|---|---| +| `autopilot.autopilotAuth.enabled` | Enables OAuth token minting on the Pega Infinity side. | `false` | +| `autopilot.autopilotAuth.url` | URL of the OAuth service endpoint to obtain a token. | `""` | +| `autopilot.autopilotAuth.clientId` | OAuth client ID. | `""` | +| `autopilot.autopilotAuth.authType` | Authentication type. Only `private_key_jwt` is supported. | `"private_key_jwt"` | +| `autopilot.autopilotAuth.privateKey` | Base64-encoded PKCS8 private key. | `""` | +| `autopilot.autopilotAuth.privateKeyAlgorithm` | Algorithm for the private key. Allowed values: `RS256`, `RS384`, `RS512`, `ES256`, `ES384`, `ES512`. Defaults to `RS256` if not set. | `""` | +| `autopilot.autopilotAuth.scopes` | OAuth scopes to request. The Autopilot service does not require any scopes — leave this empty. | `""` | +| `autopilot.autopilotAuth.external_secret_name` | Name of a pre-existing Kubernetes Secret containing the key (key: `AUTOPILOT_OAUTH_PRIVATE_KEY`). When set, `privateKey` is ignored and no secret is created by the chart. | `""` | + +```yaml +autopilot: + autopilotAuth: + enabled: true + url: "https://your-idp-host/oauth2/v1/token" + clientId: "your-client-id" + privateKey: "LS0tLS1CRUdJTiBSU0Eg..." 
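+    # Assumption (example commands, not from the chart docs): one way to
+    # produce the base64-encoded PKCS8 value above from a new RSA key:
+    #   openssl genpkey -algorithm RSA -out autopilot-oauth-key.pem
+    #   base64 -w0 autopilot-oauth-key.pem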
+    privateKeyAlgorithm: "RS256"
+    scopes: "" # no scopes required for Autopilot
+```
+
+### Using a pre-existing secret
+
+To avoid placing the private key directly in `values.yaml`, create a Kubernetes Secret beforehand and reference it:
+
+```bash
+kubectl create secret generic my-autopilot-auth-secret \
+  --namespace <namespace> \
+  --from-literal=AUTOPILOT_OAUTH_PRIVATE_KEY="<base64-encoded-private-key>"
+```
+
+Then set in the pega chart values:
+
+```yaml
+autopilot:
+  autopilotAuth:
+    enabled: true
+    external_secret_name: "my-autopilot-auth-secret"
+```
+
+## Example: Full deployment configuration
+
+```yaml
+autopilot:
+  enabled: true
+  deployment:
+    name: "autopilot"
+
+  docker:
+    registry:
+      url: YOUR_REGISTRY_URL
+      username: YOUR_REGISTRY_USERNAME
+      password: YOUR_REGISTRY_PASSWORD
+    autopilot:
+      image: YOUR_REGISTRY_URL/autopilot-service:latest
+      imagePullPolicy: Always
+
+  replicas: 2
+  enableGenaiHub: false
+  modelProviders: "Azure,Vertex,Bedrock"
+  deployModelsConfigMap: true
+
+  azure:
+    endpoint: "https://my-openai.openai.azure.com/"
+    apiKey: "your-azure-api-key"
+
+  aws:
+    accessKeyId: "your-access-key-id"
+    secretAccessKey: "your-secret-access-key"
+
+  vertex:
+    credentials: "base64-encoded-service-account-json"
+    location: "us-central1"
+
+  resources:
+    requests:
+      cpu: 500m
+      memory: "1Gi"
+    limits:
+      cpu: "1"
+      memory: "2Gi"
+```
diff --git a/charts/backingservices/charts/autopilot/files/default-models.json b/charts/backingservices/charts/autopilot/files/default-models.json
new file mode 100644
index 000000000..91f7a867a
--- /dev/null
+++ b/charts/backingservices/charts/autopilot/files/default-models.json
@@ -0,0 +1,2696 @@
+[
+  {
+    "provider": "azure",
+    "creator": "openai",
+    "model_name": "GPT-5",
+    "description": "OpenAI Chat Completions model",
+    "model_mapping_id": "gpt-5-2025-08-07",
+    "name": "gpt-5-2025-08-07",
+    "input_tokens": 400000,
+    "output_tokens": 128000,
+    "type": "chat_completion",
+    "model_id": "gpt-5-2025-08-07",
+    "default_model": false,
+    "version": 
"2025-08-07", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_completion_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 128000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 1, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 1, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 5", + "model_path": [ + "/openai/deployments/gpt-5/chat/completions" + ], + "lifecycle": "Generally 
Available" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-4.1", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-4.1-2025-04-14", + "name": "gpt-4.1-2025-04-14", + "input_tokens": 1047576, + "output_tokens": 32768, + "type": "chat_completion", + "model_id": "gpt-4.1-2025-04-14", + "default_model": false, + "version": "2025-04-14", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 32768, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", 
+ "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 4.1", + "model_path": [ + "/openai/deployments/gpt-4.1/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-4.1-Mini", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-4.1-mini-2025-04-14", + "name": "gpt-4.1-mini-2025-04-14", + "input_tokens": 1047576, + "output_tokens": 32768, + "type": "chat_completion", + "model_id": "gpt-4.1-mini-2025-04-14", + "default_model": false, + "version": "2025-04-14", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 32768, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": 
{ + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 4.1 Mini", + "model_path": [ + "/openai/deployments/gpt-4.1-mini/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "vertex", + "creator": "google", + "model_name": "Gemini-Flash", + "description": "Gemini Chat Completions model", + "model_mapping_id": "gemini-1.5-flash", + "name": "gemini-1.5-flash", + "input_tokens": 1048576, + "output_tokens": 8192, + "type": "chat_completion", + "model_id": "gemini-1.5-flash", + "default_model": false, + "version": "1.5-flash", + "deprecation_info": { + "is_deprecated": true, + "scheduled_deprecation_date": "2025-09-24" + }, + "supported_capabilities": { + "streaming": true, + "multimodal": null, + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 8192, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": 
"float", + "required": false + }, + "top_p": { + "default": 0.95, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "gemini-2.5-flash", + "provider": "vertex", + "creator": "google" + }, + "model_label": "Gemini 1.5 Flash", + "model_path": [ + "/google/deployments/gemini-1.5-flash/chat/completions" + ], + "lifecycle": "Deprecated", + "deprecation_date": "2025-09-24" + }, + { + "provider": "bedrock", + "creator": "anthropic", + "model_name": "Claude-Sonnet-4-6", + "description": "Claude Chat Completions model", + "model_mapping_id": "claude-sonnet-4-6", + "name": "claude-sonnet-4-6", + "input_tokens": 200000, + "output_tokens": 64000, + "type": "chat_completion", + "model_id": "us.anthropic.claude-sonnet-4-6", + "default_model": false, + "version": "v1", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/jpeg", + "image/png", + "image/gif", + "image/webp", + "application/pdf", + "application/docx", + "text/csv", + "application/doc", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": 64000, + "description": "Maximum number of tokens to generate", + "maximum": 64000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": 
false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Claude Sonnet 4.6", + "model_path": [ + "/anthropic/deployments/claude-sonnet-4-6/converse", + "/anthropic/deployments/claude-sonnet-4-6/converse-stream" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "vertex", + "creator": "google", + "model_name": "Gemini-Flash-Lite", + "description": "Gemini Chat Completions model", + "model_mapping_id": "gemini-2.5-flash-lite", + "name": "gemini-2.5-flash-lite", + "input_tokens": 1048576, + "output_tokens": 65535, + "type": "chat_completion", + "model_id": "gemini-2.5-flash-lite", + "default_model": false, + "version": "2.5-flash-lite", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/webp", + "audio/x-aac", + "audio/flac", + "audio/mp3", + "audio/m4a", + "audio/mpeg", + "audio/mpga", + "audio/mp4", + "audio/ogg", + "audio/pcm", + "audio/wav", + "audio/webm", + "video/x-flv", + "video/quicktime", + "video/mpeg", + "video/mpg", + "video/mp4", + "video/webm", + "video/wmv", + "video/3gpp", + "application/pdf", + "text/plain" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 65535, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + 
"json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "array", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "thinking_budget": { + "default": null, + "description": "Allows control over hov much model thinks during its responses", + "maximum": 24576, + "minimum": 1, + "title": "Thinking Budget", + "type": "integer", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": 0.95, + "description": "Nucleus sampling parameter controlling diversity of the output", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gemini 2.5 Flash-Lite", + "model_path": [ + "/google/deployments/gemini-2.5-flash-lite/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-5.2", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-5.2-2025-12-11", + "name": "gpt-5.2-2025-12-11", + "input_tokens": 400000, + "output_tokens": 128000, + "type": "chat_completion", + "model_id": "gpt-5.2-2025-12-11", + "default_model": false, + "version": "2025-12-11", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_completion_tokens": { + "default": null, + 
"description": "Maximum number of tokens to generate", + "maximum": 128000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 1, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 1, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 5.2", + "model_path": [ + "/openai/deployments/gpt-5.2/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "amazon", + "model_name": "Nova-Lite", + "description": "Amazon Nova Chat Completions model", + "model_mapping_id": "nova-lite-v1", + "name": "nova-lite-v1", + "input_tokens": 300000, + "output_tokens": 10000, + "type": "chat_completion", + "model_id": "us.amazon.nova-lite-v1:0", + "default_model": false, + "version": 
"1", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "application/docx", + "application/doc", + "text/csv", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown", + "image/png", + "image/jpeg", + "image/jpg", + "image/gif", + "image/webp", + "video/mp4", + "video/mov", + "video/mkv", + "video/webm", + "video/flv", + "video/mpeg", + "video/mpg", + "video/wmv", + "video/3gp" + ], + "functions": false, + "parallel_function_calling": false, + "json_mode": false, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 10000, + "title": "Max output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Nova Lite", + "model_path": [ + "/amazon/deployments/nova-lite-v1/converse", + "/amazon/deployments/nova-lite-v1/converse-stream" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-4o", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-4o-2024-08-06", + "name": "gpt-4o-2024-08-06", + "input_tokens": 128000, + "output_tokens": 16384, + "type": "chat_completion", + "model_id": "gpt-4o-2024-08-06", + "default_model": false, + "version": 
"2024-08-06", + "deprecation_info": { + "is_deprecated": false, + "scheduled_deprecation_date": "2026-05-01" + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 16384, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "gpt-4.1-2025-04-14", + "provider": "azure", + 
"creator": "openai" + }, + "model_label": "Gpt 4o", + "model_path": [ + "/openai/deployments/gpt-4o/chat/completions" + ], + "lifecycle": "Nearing Deprecation", + "deprecation_date": "2026-05-01" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-5-Nano", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-5-nano-2025-08-07", + "name": "gpt-5-nano-2025-08-07", + "input_tokens": 400000, + "output_tokens": 128000, + "type": "chat_completion", + "model_id": "gpt-5-nano-2025-08-07", + "default_model": false, + "version": "2025-08-07", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_completion_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 128000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 1, + "title": "Temperature", + "type": "float", + "required": 
false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 1, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 5 Nano", + "model_path": [ + "/openai/deployments/gpt-5-nano/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-5.1", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-5.1-2025-11-13", + "name": "gpt-5.1-2025-11-13", + "input_tokens": 400000, + "output_tokens": 128000, + "type": "chat_completion", + "model_id": "gpt-5.1-2025-11-13", + "default_model": false, + "version": "2025-11-13", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_completion_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 128000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + 
}, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 1, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 1, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 5.1", + "model_path": [ + "/openai/deployments/gpt-5.1/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "anthropic", + "model_name": "Claude-37-Sonnet", + "description": "Claude3 Chat Completions model", + "model_mapping_id": "claude-3-7-sonnet", + "name": "claude-3-7-sonnet", + "input_tokens": 200000, + "output_tokens": 128000, + "type": "chat_completion", + "model_id": "us.anthropic.claude-3-7-sonnet-20250219-v1:0", + "default_model": false, + "version": "3", + "deprecation_info": { + "is_deprecated": false, + "scheduled_deprecation_date": "2026-04-28" + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/jpeg", + "image/png", + "image/gif", + "image/webp", + "application/pdf", + "application/docx", + "text/csv", + "application/doc", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": 128000, + "description": "Maximum number of tokens to 
generate", + "maximum": 128000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "claude-sonnet-4-5", + "provider": "bedrock", + "creator": "anthropic" + }, + "model_label": "Claude 3.7 Sonnet", + "model_path": [ + "/anthropic/deployments/claude-3-7-sonnet/converse", + "/anthropic/deployments/claude-3-7-sonnet/converse-stream" + ], + "lifecycle": "Nearing Deprecation", + "deprecation_date": "2026-04-28" + }, + { + "provider": "vertex", + "creator": "google", + "model_name": "Gemini-Pro", + "description": "Gemini Chat Completions model", + "model_mapping_id": "gemini-1.5-pro", + "name": "gemini-1.5-pro", + "input_tokens": 1048576, + "output_tokens": 8192, + "type": "chat_completion", + "model_id": "gemini-1.5-pro", + "default_model": false, + "version": "1.5-pro", + "deprecation_info": { + "is_deprecated": true, + "scheduled_deprecation_date": "2025-09-24" + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "text/csv", + "text/plain", + "text/markdown", + "text/html", + "text/xml", + "audio/wav", + "audio/mp3", + "audio/aiff", + "audio/aac", + "audio/ogg", + "audio/flac", + "image/png", + "image/jpeg", + "image/webp", + "image/heic", + "image/heif", + "video/mp4", + "video/mpeg", + "video/mov", + "video/avi", + "video/x-flv", + "video/mpg", + "video/webm", + 
"video/wmv", + "video/3gpp" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 8192, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": 0.95, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "gemini-2.5-pro", + "provider": "vertex", + "creator": "google" + }, + "model_label": "Gemini 1.5 Pro", + "model_path": [ + "/google/deployments/gemini-1.5-pro/chat/completions" + ], + "lifecycle": "Deprecated", + "deprecation_date": "2025-09-24" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-5-Mini", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-5-mini-2025-08-07", + "name": "gpt-5-mini-2025-08-07", + "input_tokens": 400000, + "output_tokens": 128000, + "type": "chat_completion", + "model_id": "gpt-5-mini-2025-08-07", + "default_model": false, + "version": "2025-08-07", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_completion_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 128000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + 
"description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 1, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 1, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 5 Mini", + "model_path": [ + "/openai/deployments/gpt-5-mini/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "amazon", + "model_name": "Nova-Pro", + "description": "Amazon Nova Chat Completions model", + "model_mapping_id": "nova-pro-v1", + "name": "nova-pro-v1", + "input_tokens": 300000, + "output_tokens": 10000, + "type": "chat_completion", + "model_id": "us.amazon.nova-pro-v1:0", + "default_model": false, + "version": "1", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "application/docx", + 
"application/doc", + "text/csv", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown", + "image/png", + "image/jpeg", + "image/jpg", + "image/gif", + "image/webp", + "video/mp4", + "video/mov", + "video/mkv", + "video/webm", + "video/flv", + "video/mpeg", + "video/mpg", + "video/wmv", + "video/3gp" + ], + "functions": false, + "parallel_function_calling": false, + "json_mode": false, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 10000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Nova Pro", + "model_path": [ + "/amazon/deployments/nova-pro-v1/converse", + "/amazon/deployments/nova-pro-v1/converse-stream" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "amazon", + "model_name": "Nova-2-Lite", + "description": "Amazon Nova Chat Completions model", + "model_mapping_id": "nova-2-lite-v1", + "name": "nova-2-lite-v1", + "input_tokens": 1000000, + "output_tokens": 65535, + "type": "chat_completion", + "model_id": "us.amazon.nova-2-lite-v1:0", + "default_model": false, + "version": "1", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "application/docx", + "application/doc", + "text/csv", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown", + "image/png", + "image/jpeg", + "image/jpg", + 
"image/gif", + "image/webp", + "video/mp4", + "video/mov", + "video/mkv", + "video/webm", + "video/flv", + "video/mpeg", + "video/mpg", + "video/wmv", + "video/3gp" + ], + "functions": false, + "parallel_function_calling": false, + "json_mode": false, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 65535, + "title": "Max output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Nova 2 Lite", + "model_path": [ + "/amazon/deployments/nova-2-lite-v1/converse", + "/amazon/deployments/nova-2-lite-v1/converse-stream" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "vertex", + "creator": "google", + "model_name": "Gemini-Flash", + "description": "Gemini Chat Completions model", + "model_mapping_id": "gemini-2.5-flash", + "name": "gemini-2.5-flash", + "input_tokens": 1048576, + "output_tokens": 65535, + "type": "chat_completion", + "model_id": "gemini-2.5-flash", + "default_model": false, + "version": "2.5-flash", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "text/csv", + "text/plain", + "text/markdown", + "text/html", + "text/xml", + "audio/wav", + "audio/mp3", + "audio/aiff", + "audio/aac", + "audio/ogg", + "audio/flac", + 
"image/png", + "image/jpeg", + "image/webp", + "image/heic", + "image/heif", + "video/mp4", + "video/mpeg", + "video/mov", + "video/avi", + "video/x-flv", + "video/mpg", + "video/webm", + "video/wmv", + "video/3gpp" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 65535, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "array", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "thinking_budget": { + "default": null, + "description": "Allows control over hov much model thinks during its responses", + "maximum": 24576, + "minimum": 1, + "title": "Thinking Budget", + "type": "integer", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": 0.95, + "description": "Nucleus sampling parameter controlling diversity of the output", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gemini 2.5 Flash", + 
"model_path": [ + "/google/deployments/gemini-2.5-flash/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-4o-Mini", + "description": "OpenAI Chat Completions model", + "model_mapping_id": "gpt-4o-mini-2024-07-18", + "name": "gpt-4o-mini-2024-07-18", + "input_tokens": 128000, + "output_tokens": 16000, + "type": "chat_completion", + "model_id": "gpt-4o-mini-2024-07-18", + "default_model": false, + "version": "2024-07-18", + "deprecation_info": { + "is_deprecated": false, + "scheduled_deprecation_date": "2026-05-01" + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 16000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": 
null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "gpt-4.1-mini-2025-04-14", + "provider": "azure", + "creator": "openai" + }, + "model_label": "Gpt 4o Mini", + "model_path": [ + "/openai/deployments/gpt-4o-mini/chat/completions" + ], + "lifecycle": "Nearing Deprecation", + "deprecation_date": "2026-05-01" + }, + { + "provider": "bedrock", + "creator": "amazon", + "model_name": "Nova-Micro", + "description": "Amazon Nova Chat Completions model", + "model_mapping_id": "nova-micro-v1", + "name": "nova-micro-v1", + "input_tokens": 128000, + "output_tokens": 10000, + "type": "chat_completion", + "model_id": "us.amazon.nova-micro-v1:0", + "default_model": false, + "version": "1", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "application/docx", + "application/doc", + "text/csv", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown", + "image/png", + "image/jpeg", + "image/jpg", + "image/gif", + "image/webp", + "video/mp4", + "video/mov", + "video/mkv", + "video/webm", + "video/flv", + "video/mpeg", + "video/mpg", + "video/wmv", + "video/3gp" + ], + "functions": false, + "parallel_function_calling": false, + "json_mode": false, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 10000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the 
generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Nova Micro", + "model_path": [ + "/amazon/deployments/nova-micro-v1/converse", + "/amazon/deployments/nova-micro-v1/converse-stream" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "anthropic", + "model_name": "Claude-3-Haiku", + "description": "Claude3 Chat Completions model", + "model_mapping_id": "claude-3-haiku", + "name": "claude-3-haiku", + "input_tokens": 200000, + "output_tokens": 4096, + "type": "chat_completion", + "model_id": "us.anthropic.claude-3-haiku-20240307-v1:0", + "default_model": false, + "version": "3", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/jpeg", + "image/png", + "image/gif", + "image/webp", + "application/pdf", + "application/docx", + "text/csv", + "application/doc", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": 4096, + "description": "Maximum number of tokens to generate", + "maximum": 4096, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 0.7, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + 
"top_p": { + "default": 1, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "claude-haiku-4-5", + "provider": "bedrock", + "creator": "anthropic" + }, + "model_label": "Claude 3 Haiku", + "model_path": [ + "/anthropic/deployments/claude-3-haiku/converse", + "/anthropic/deployments/claude-3-haiku/converse-stream" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "anthropic", + "model_name": "Claude-35-Sonnet", + "description": "Claude3 Chat Completions model", + "model_mapping_id": "claude-3-5-sonnet", + "name": "claude-3-5-sonnet", + "input_tokens": 200000, + "output_tokens": 8192, + "type": "chat_completion", + "model_id": "anthropic.claude-3-5-sonnet-20241022-v2:0", + "default_model": false, + "version": "3", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/jpeg", + "image/png", + "image/gif", + "image/webp", + "application/pdf", + "application/docx", + "text/csv", + "application/doc", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": 8192, + "description": "Maximum number of tokens to generate", + "maximum": 8192, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 0.7, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false 
+ }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "claude-sonnet-4-5", + "provider": "bedrock", + "creator": "anthropic" + }, + "model_label": "Claude 3.5 Sonnet", + "model_path": [ + "/anthropic/deployments/claude-3-5-sonnet/converse-stream", + "/anthropic/deployments/claude-3-5-sonnet/converse" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "vertex", + "creator": "google", + "model_name": "Gemini-Pro", + "description": "Gemini Chat Completions model", + "model_mapping_id": "gemini-2.5-pro", + "name": "gemini-2.5-pro", + "input_tokens": 1048576, + "output_tokens": 65535, + "type": "chat_completion", + "model_id": "gemini-2.5-pro", + "default_model": false, + "version": "2.5-pro", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "text/csv", + "text/plain", + "text/markdown", + "text/html", + "text/xml", + "audio/wav", + "audio/mp3", + "audio/aiff", + "audio/aac", + "audio/ogg", + "audio/flac", + "image/png", + "image/jpeg", + "image/webp", + "image/heic", + "image/heif", + "video/mp4", + "video/mpeg", + "video/mov", + "video/avi", + "video/x-flv", + "video/mpg", + "video/webm", + "video/wmv", + "video/3gpp" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 65535, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + 
"title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "array", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "thinking_budget": { + "default": null, + "description": "Allows control over how much the model thinks during its responses", + "maximum": 32768, + "minimum": 128, + "title": "Thinking Budget", + "type": "integer", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": 0.95, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gemini 2.5 Pro", + "model_path": [ + "/google/deployments/gemini-2.5-pro/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "amazon", + "model_name": "Nova-Premier", + "description": "Amazon Nova Chat Completions model", + "model_mapping_id": "nova-premier-v1", + "name": "nova-premier-v1", + "input_tokens": 1000000, + "output_tokens": 10000, + "type": "chat_completion", + "model_id": "us.amazon.nova-premier-v1:0", + "default_model": false, + "version": "1", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "application/docx", + "application/doc", + "text/csv", + "application/xls", + "application/xlsx", + "text/html", + "text/plain",
+ "text/markdown", + "image/png", + "image/jpeg", + "image/jpg", + "image/gif", + "image/webp", + "video/mp4", + "video/mov", + "video/mkv", + "video/webm", + "video/flv", + "video/mpeg", + "video/mpg", + "video/wmv", + "video/3gp" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": false, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 10000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Nova Premier", + "model_path": [ + "/amazon/deployments/nova-premier-v1/converse-stream", + "/amazon/deployments/nova-premier-v1/converse" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "anthropic", + "model_name": "Claude-Sonnet-45", + "description": "Claude Chat Completions model", + "model_mapping_id": "claude-sonnet-4-5", + "name": "claude-sonnet-4-5", + "input_tokens": 200000, + "output_tokens": 64000, + "type": "chat_completion", + "model_id": "us.anthropic.claude-sonnet-4-5-20250929-v1:0", + "default_model": false, + "version": "1.0", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/jpeg", + "image/png", + "image/gif", + "image/webp", + "application/pdf", + "application/docx", + "text/csv", + "application/doc", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown" + ], + "functions": true, + "parallel_function_calling": 
false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": 64000, + "description": "Maximum number of tokens to generate", + "maximum": 64000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Claude Sonnet 4.5", + "model_path": [ + "/anthropic/deployments/claude-sonnet-4-5/converse-stream", + "/anthropic/deployments/claude-sonnet-4-5/converse" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "vertex", + "creator": "google", + "model_name": "Gemini-Flash", + "description": "Gemini Chat Completions model", + "model_mapping_id": "gemini-2.0-flash", + "name": "gemini-2.0-flash", + "input_tokens": 1048576, + "output_tokens": 8192, + "type": "chat_completion", + "model_id": "gemini-2.0-flash", + "default_model": false, + "version": "2.0-flash", + "deprecation_info": { + "is_deprecated": false, + "scheduled_deprecation_date": "2026-06-01" + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "application/pdf", + "text/csv", + "text/plain", + "text/markdown", + "text/html", + "text/xml", + "audio/wav", + "audio/mp3", + "audio/aiff", + "audio/aac", + "audio/ogg", + "audio/flac", + "image/png", + "image/jpeg", + "image/webp", + "image/heic", + "image/heif", + "video/mp4", + "video/mpeg", + "video/mov", + "video/avi", + "video/x-flv", + 
"video/mpg", + "video/webm", + "video/wmv", + "video/3gpp" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 8192, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": 0.95, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "alternate_model_info": { + "name": "gemini-2.5-flash", + "provider": "vertex", + "creator": "google" + }, + "model_label": "Gemini 2.0 Flash", + "model_path": [ + "/google/deployments/gemini-2.0-flash/chat/completions" + ], + "lifecycle": "Nearing Deprecation", + "deprecation_date": "2026-06-01" + }, + { + "provider": "azure", + "creator": "openai", + "model_name": "GPT-4.1-Nano", + "description": "OpenAI Chat Completions model", + 
"model_mapping_id": "gpt-4.1-nano-2025-04-14", + "name": "gpt-4.1-nano-2025-04-14", + "input_tokens": 1047576, + "output_tokens": 32768, + "type": "chat_completion", + "model_id": "gpt-4.1-nano-2025-04-14", + "default_model": false, + "version": "2025-04-14", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/png", + "image/jpeg", + "image/gif", + "image/webp" + ], + "functions": true, + "parallel_function_calling": true, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": null, + "description": "Maximum number of tokens to generate", + "maximum": 32768, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "n": { + "default": null, + "description": "number of completions to generate", + "title": "N", + "type": "integer", + "required": false + }, + "parallel_tool_calls": { + "default": null, + "description": "parallel tool calls", + "title": "Parallel Tool Calls", + "type": "boolean", + "required": false + }, + "response_format": { + "default": null, + "description": "response_format", + "title": "Response Format", + "type": "object", + "required": false, + "examples": [ + "text", + "json_object" + ] + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": 1, + "description": "Controls randomness of the generated output", + "maximum": 2, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "tool_choice": { + "default": null, + "description": "auto, none literals or valid function name string", + "title": "Tool Choice", + "type": "string", + "required": false, + "examples": [ + "auto", + "none", + "" + ] + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the 
output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Gpt 4.1 Nano", + "model_path": [ + "/openai/deployments/gpt-4.1-nano/chat/completions" + ], + "lifecycle": "Generally Available" + }, + { + "provider": "bedrock", + "creator": "anthropic", + "model_name": "Claude-Haiku-45", + "description": "Claude Chat Completions model", + "model_mapping_id": "claude-haiku-4-5", + "name": "claude-haiku-4-5", + "input_tokens": 200000, + "output_tokens": 64000, + "type": "chat_completion", + "model_id": "us.anthropic.claude-haiku-4-5-20251001-v1:0", + "default_model": false, + "version": "1.0", + "deprecation_info": { + "is_deprecated": false + }, + "supported_capabilities": { + "streaming": true, + "multimodal": [ + "image/jpeg", + "image/png", + "image/gif", + "image/webp", + "application/pdf", + "application/docx", + "text/csv", + "application/doc", + "application/xls", + "application/xlsx", + "text/html", + "text/plain", + "text/markdown" + ], + "functions": true, + "parallel_function_calling": false, + "json_mode": true, + "is_multimodal": true + }, + "parameters": { + "max_tokens": { + "default": 64000, + "description": "Maximum number of tokens to generate", + "maximum": 64000, + "title": "Max Output Tokens", + "type": "integer", + "required": false + }, + "stop": { + "default": null, + "description": "Allows to define sequences causing model to terminate generation", + "title": "Stop Sequences", + "type": "string", + "required": false + }, + "temperature": { + "default": null, + "description": "Controls randomness of the generated output", + "maximum": 1, + "minimum": 0, + "title": "Temperature", + "type": "float", + "required": false + }, + "top_p": { + "default": null, + "description": "Nucleus sampling parameter controlling diversity of the output.", + "maximum": 1, + "minimum": 0, + "title": "Top P", + "type": "float", + "required": false + } + }, + "model_label": "Claude Haiku 4.5", + 
"model_path": [ + "/anthropic/deployments/claude-haiku-4-5/converse", + "/anthropic/deployments/claude-haiku-4-5/converse-stream" + ], + "lifecycle": "Generally Available" + } +] diff --git a/charts/backingservices/charts/autopilot/templates/_helpers.tpl b/charts/backingservices/charts/autopilot/templates/_helpers.tpl new file mode 100644 index 000000000..882d0dd06 --- /dev/null +++ b/charts/backingservices/charts/autopilot/templates/_helpers.tpl @@ -0,0 +1,122 @@ +{{- /* +imagePullSecret +backingservicesRegistrySecret +deploymentName +tlssecretsnippet +backingservices.gke.backendConfig +podAffinity +tolerations +are copied from backingservices/templates/_supplemental.tpl because helm lint requires +charts to render standalone. See: https://github.com/helm/helm/issues/11260 for more details. +*/}} + +{{- define "imagePullSecret" }} +{{- printf "{\"auths\": {\"%s\": {\"auth\": \"%s\"}}}" .Values.docker.registry.url (printf "%s:%s" .Values.docker.registry.username .Values.docker.registry.password | b64enc) | b64enc }} +{{- end }} + +{{- define "backingservicesRegistrySecret" }} +{{- $depName := printf "%s" (include "deploymentName" (dict "root" .root "defaultname" .defaultname )) -}} +{{- $depName -}}-registry-secret +{{- end }} + +{{- define "deploymentName" }}{{ $deploymentNamePrefix := .defaultname }}{{ if (.root.deployment) }}{{ if (.root.deployment.name) }}{{ $deploymentNamePrefix = .root.deployment.name }}{{ end }}{{ end }}{{ if (.root.name) }}{{ $deploymentNamePrefix = .root.name }}{{ end }}{{ $deploymentNamePrefix }}{{- end }} + +{{- define "tlssecretsnippet" -}} +tls: +- hosts: + - {{ include "domainName" (dict "node" .node) }} + secretName: {{ .node.ingress.tls.secretName }} +{{- end -}} + +{{- define "domainName" }} + {{- if .node.ingress -}} + {{- if .node.ingress.domain -}} + {{ .node.ingress.domain }} + {{- end -}} + {{- else if .node.service.domain -}} + {{ .node.service.domain }} + {{- end -}} +{{- end }} + +{{- define "podAffinity" }} +{{- if 
.affinity }} +affinity: +{{- toYaml .affinity | nindent 2 }} +{{- end }} +{{ end }} + +{{- define "tolerations" }} +{{- if .tolerations }} +tolerations: +{{- toYaml .tolerations | nindent 2 }} +{{- end }} +{{ end }} + +{{/* +Validates and returns the OAuth public key URL when auth is enabled. +Fails render if authEnabled is true but oauthPublicKeyURL is not set. +*/}} +{{- define "autopilot.oauthPublicKeyUrl" -}} +{{- if .Values.authEnabled }} + {{- if .Values.oauthPublicKeyURL }} + {{- .Values.oauthPublicKeyURL | quote }} + {{- else }} + {{- fail "A valid entry is required for oauthPublicKeyURL when authEnabled is true. Set oauthPublicKeyURL to the IdP public key endpoint so the Autopilot service can validate incoming tokens from Pega Infinity." | quote }} + {{- end }} +{{- end }} +{{- end }} + +{{/* +Autopilot secret name - uses pre-existing secret or auto-generated one +*/}} +{{- define "autopilot.credentialsSecretName" -}} +{{- if .Values.providerCredentialsSecret -}} +{{- .Values.providerCredentialsSecret -}} +{{- else -}} +{{- $depName := include "deploymentName" (dict "root" .Values "defaultname" "autopilot") -}} +{{- printf "%s-provider-credentials" $depName -}} +{{- end -}} +{{- end -}} + +{{/* +Check if any inline credentials are provided +*/}} +{{- define "autopilot.hasInlineCredentials" -}} +{{- if or .Values.azure.endpoint .Values.azure.apiKey .Values.aws.accessKeyId .Values.vertex.credentials -}} +true +{{- end -}} +{{- end -}} + +{{/* +Check if any models config is provided (inline, existing configmap, or use default bundled file) +*/}} +{{- define "autopilot.hasCustomModels" -}} +{{- if and .Values.customModels (or .Values.customModels.existingConfigMap .Values.customModels.inline) -}} +true +{{- else if .Values.deployModelsConfigMap -}} +true +{{- end -}} +{{- end -}} + +{{/* +Check if we need to create a ConfigMap (inline or default file, but NOT existing configmap) +*/}} +{{- define "autopilot.hasModelsConfig" -}} +{{- if and .Values.customModels 
.Values.customModels.inline -}} +true +{{- else if .Values.deployModelsConfigMap -}} +true +{{- end -}} +{{- end -}} + +{{/* +Resolve the models ConfigMap name +*/}} +{{- define "autopilot.modelsConfigMapName" -}} +{{- if and .Values.customModels .Values.customModels.existingConfigMap -}} +{{- .Values.customModels.existingConfigMap -}} +{{- else -}} +{{- $depName := include "deploymentName" (dict "root" .Values "defaultname" "autopilot") -}} +{{- printf "%s-models" $depName -}} +{{- end -}} +{{- end -}} diff --git a/charts/backingservices/charts/autopilot/templates/autopilot-configmap.yaml b/charts/backingservices/charts/autopilot/templates/autopilot-configmap.yaml new file mode 100644 index 000000000..88a0e3b7d --- /dev/null +++ b/charts/backingservices/charts/autopilot/templates/autopilot-configmap.yaml @@ -0,0 +1,20 @@ +{{- $depName := printf "%s" (include "deploymentName" (dict "root" .Values "defaultname" "autopilot" )) }} +{{- if .Values.enabled }} +{{- if not (and .Values.customModels .Values.customModels.existingConfigMap) }} +{{- if (include "autopilot.hasModelsConfig" .) 
}} +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ $depName }}-models + namespace: {{ .Release.Namespace }} +data: + {{- if and .Values.customModels .Values.customModels.inline }} + models.json: |- +{{ .Values.customModels.inline | indent 4 }} + {{- else }} + models.json: |- +{{ .Files.Get "files/default-models.json" | indent 4 }} + {{- end }} +{{- end }} +{{ end }} +{{ end }} diff --git a/charts/backingservices/charts/autopilot/templates/autopilot-deployment.yaml b/charts/backingservices/charts/autopilot/templates/autopilot-deployment.yaml new file mode 100644 index 000000000..dae6a1d67 --- /dev/null +++ b/charts/backingservices/charts/autopilot/templates/autopilot-deployment.yaml @@ -0,0 +1,153 @@ +{{- $depName := printf "%s" (include "deploymentName" (dict "root" .Values "defaultname" "autopilot" )) }} +{{- if .Values.enabled }} +kind: Deployment +apiVersion: apps/v1 +metadata: + name: {{ $depName }} + labels: + app: {{ $depName }} + {{- if and (.Values.deployment) (.Values.deployment.labels) }} + {{ toYaml .Values.deployment.labels | nindent 4 }} + {{- end }} +spec: + replicas: {{ .Values.replicas }} + selector: + matchLabels: + app: {{ $depName }} + template: + metadata: + labels: + app: {{ $depName }} + {{- if .Values.podLabels }} + {{ toYaml .Values.podLabels | nindent 8 }} + {{- end }} + {{- if .Values.podAnnotations }} + annotations: + {{ toYaml .Values.podAnnotations | nindent 8 }} + {{- end }} + spec: + containers: + - name: autopilot + image: {{ .Values.docker.autopilot.image }} + imagePullPolicy: {{ .Values.docker.autopilot.imagePullPolicy }} + ports: + - containerPort: {{ .Values.service.targetPort }} + resources: + {{- if .Values.resources }} + {{ toYaml .Values.resources | nindent 10 }} + {{- end }} + securityContext: + {{- if .Values.securityContext }} + {{ toYaml .Values.securityContext | nindent 10 }} + {{- end }} + {{- if (include "autopilot.hasCustomModels" .) 
}} + volumeMounts: + - name: custom-models + mountPath: /config/models.json + subPath: models.json + readOnly: true + {{- end }} + env: + # Core service configuration + - name: ENABLE_GENAI_HUB + value: {{ .Values.enableGenaiHub | quote }} + - name: AUTH_ENABLED + value: {{ .Values.authEnabled | quote }} + - name: IS_INTERNAL_DEPLOYMENT + value: {{ .Values.isInternalDeployment | quote }} + - name: BLOCK_UNAUTH_REQUESTS + value: {{ .Values.authEnabled | quote }} + {{- if .Values.authEnabled }} + - name: SAX_JWKS_URL + value: {{ include "autopilot.oauthPublicKeyUrl" . }} + {{- end }} + {{- if .Values.modelProviders }} + - name: MODEL_PROVIDERS + value: {{ .Values.modelProviders | quote }} + {{- end }} + {{- if (include "autopilot.hasCustomModels" .) }} + - name: LOCAL_MODELS_FILE + value: "/config/models.json" + {{- end }} + # AWS Bedrock + {{- if .Values.awsRegion }} + - name: AWS_DEFAULT_REGION + value: {{ .Values.awsRegion | quote }} + - name: AWS_REGION + value: {{ .Values.awsRegion | quote }} + {{- end }} + {{- if or .Values.aws.accessKeyId .Values.providerCredentialsSecret }} + - name: AWS_ACCESS_KEY_ID + valueFrom: + secretKeyRef: + name: {{ include "autopilot.credentialsSecretName" . }} + key: AWS_ACCESS_KEY_ID + optional: true + - name: AWS_SECRET_ACCESS_KEY + valueFrom: + secretKeyRef: + name: {{ include "autopilot.credentialsSecretName" . }} + key: AWS_SECRET_ACCESS_KEY + optional: true + {{- if or .Values.aws.sessionToken .Values.providerCredentialsSecret }} + - name: AWS_SESSION_TOKEN + valueFrom: + secretKeyRef: + name: {{ include "autopilot.credentialsSecretName" . }} + key: AWS_SESSION_TOKEN + optional: true + {{- end }} + {{- end }} + # Azure OpenAI + {{- if or .Values.azure.endpoint .Values.providerCredentialsSecret }} + - name: AZURE_ENDPOINT + valueFrom: + secretKeyRef: + name: {{ include "autopilot.credentialsSecretName" . 
}} + key: AZURE_ENDPOINT + optional: true + - name: AZURE_OPENAI_KEY + valueFrom: + secretKeyRef: + name: {{ include "autopilot.credentialsSecretName" . }} + key: AZURE_OPENAI_KEY + optional: true + {{- if .Values.azure.apiVersion }} + - name: OPENAI_API_VERSION + value: {{ .Values.azure.apiVersion | quote }} + {{- end }} + {{- end }} + # Google Vertex AI + {{- if or .Values.vertex.credentials .Values.providerCredentialsSecret }} + - name: vertex_auth + valueFrom: + secretKeyRef: + name: {{ include "autopilot.credentialsSecretName" . }} + key: VERTEX_AUTH + optional: true + {{- end }} + {{- if .Values.vertex.location }} + - name: VERTEX_LOCATION + value: {{ .Values.vertex.location | quote }} + {{- end }} + livenessProbe: + initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }} + timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }} + periodSeconds: {{ .Values.livenessProbe.periodSeconds }} + successThreshold: {{ .Values.livenessProbe.successThreshold }} + failureThreshold: {{ .Values.livenessProbe.failureThreshold }} + httpGet: + path: /v1/health + port: {{ .Values.service.targetPort }} + readinessProbe: + initialDelaySeconds: {{ .Values.readinessProbe.initialDelaySeconds }} + timeoutSeconds: {{ .Values.readinessProbe.timeoutSeconds }} + periodSeconds: {{ .Values.readinessProbe.periodSeconds }} + successThreshold: {{ .Values.readinessProbe.successThreshold }} + failureThreshold: {{ .Values.readinessProbe.failureThreshold }} + httpGet: + path: /v1/health + port: {{ .Values.service.targetPort }} +{{- include "podAffinity" .Values | indent 6 }} +{{- include "tolerations" .Values | indent 6 }} +{{ end }} diff --git a/charts/backingservices/charts/autopilot/templates/autopilot-ingress.yaml b/charts/backingservices/charts/autopilot/templates/autopilot-ingress.yaml new file mode 100644 index 000000000..21d44305e --- /dev/null +++ b/charts/backingservices/charts/autopilot/templates/autopilot-ingress.yaml @@ -0,0 +1,36 @@ +{{- $depName := printf "%s" 
(include "deploymentName" (dict "root" .Values "defaultname" "autopilot" )) }} +{{- if .Values.enabled }} +{{- if and (.Values.ingress) (eq .Values.ingress.enabled true) }} +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: {{ $depName }} + labels: + app: {{ $depName }} + {{- if .Values.ingress.annotations }} + annotations: +{{ toYaml .Values.ingress.annotations | indent 4 }} + {{- end }} +spec: + {{- if .Values.ingress.ingressClassName }} + ingressClassName: {{ .Values.ingress.ingressClassName }} + {{- end }} + rules: + - host: {{ .Values.ingress.domain }} + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: {{ $depName }} + port: + number: {{ .Values.service.port }} + {{- if .Values.ingress.tls.enabled }} + tls: + - hosts: + - {{ .Values.ingress.domain }} + secretName: {{ .Values.ingress.tls.secretName }} + {{- end }} +{{ end }} +{{ end }} diff --git a/charts/backingservices/charts/autopilot/templates/autopilot-poddisruptionbudget.yaml b/charts/backingservices/charts/autopilot/templates/autopilot-poddisruptionbudget.yaml new file mode 100644 index 000000000..e692e5321 --- /dev/null +++ b/charts/backingservices/charts/autopilot/templates/autopilot-poddisruptionbudget.yaml @@ -0,0 +1,17 @@ +{{- $depName := printf "%s" (include "deploymentName" (dict "root" .Values "defaultname" "autopilot" )) }} +{{- if .Values.enabled }} +{{- if (semverCompare ">= 1.21.0-0" (trimPrefix "v" .Capabilities.KubeVersion.GitVersion)) }} +apiVersion: policy/v1 +{{- else }} +apiVersion: policy/v1beta1 +{{- end }} +kind: PodDisruptionBudget +metadata: + name: {{ $depName }} + namespace: {{ .Release.Namespace }} +spec: + minAvailable: 1 + selector: + matchLabels: + app: {{ $depName }} +{{ end }} diff --git a/charts/backingservices/charts/autopilot/templates/autopilot-registry-secret.yaml b/charts/backingservices/charts/autopilot/templates/autopilot-registry-secret.yaml new file mode 100644 index 000000000..62b07537a --- /dev/null +++ 
b/charts/backingservices/charts/autopilot/templates/autopilot-registry-secret.yaml @@ -0,0 +1,10 @@ +{{- if .Values.enabled }} +apiVersion: v1 +kind: Secret +metadata: + name: {{ include "backingservicesRegistrySecret" ( dict "root" .Values "defaultname" "autopilot" ) }} + namespace: {{ .Release.Namespace }} +type: kubernetes.io/dockerconfigjson +data: + .dockerconfigjson: {{ template "imagePullSecret" . }} +{{ end }} diff --git a/charts/backingservices/charts/autopilot/templates/autopilot-secret.yaml b/charts/backingservices/charts/autopilot/templates/autopilot-secret.yaml new file mode 100644 index 000000000..7dc57797c --- /dev/null +++ b/charts/backingservices/charts/autopilot/templates/autopilot-secret.yaml @@ -0,0 +1,32 @@ +{{- $depName := printf "%s" (include "deploymentName" (dict "root" .Values "defaultname" "autopilot" )) }} +{{- if .Values.enabled }} +{{- if not .Values.providerCredentialsSecret }} +{{- if (include "autopilot.hasInlineCredentials" .) }} +apiVersion: v1 +kind: Secret +metadata: + name: {{ $depName }}-provider-credentials + namespace: {{ .Release.Namespace }} +type: Opaque +data: + {{- if .Values.azure.endpoint }} + AZURE_ENDPOINT: {{ .Values.azure.endpoint | b64enc }} + {{- end }} + {{- if .Values.azure.apiKey }} + AZURE_OPENAI_KEY: {{ .Values.azure.apiKey | b64enc }} + {{- end }} + {{- if .Values.aws.accessKeyId }} + AWS_ACCESS_KEY_ID: {{ .Values.aws.accessKeyId | b64enc }} + {{- end }} + {{- if .Values.aws.secretAccessKey }} + AWS_SECRET_ACCESS_KEY: {{ .Values.aws.secretAccessKey | b64enc }} + {{- end }} + {{- if .Values.aws.sessionToken }} + AWS_SESSION_TOKEN: {{ .Values.aws.sessionToken | b64enc }} + {{- end }} + {{- if .Values.vertex.credentials }} + VERTEX_AUTH: {{ .Values.vertex.credentials | b64enc }} + {{- end }} +{{- end }} +{{- end }} +{{ end }} diff --git a/charts/backingservices/charts/autopilot/templates/autopilot-service.yaml b/charts/backingservices/charts/autopilot/templates/autopilot-service.yaml new file mode 100644 
index 000000000..801614e6e --- /dev/null +++ b/charts/backingservices/charts/autopilot/templates/autopilot-service.yaml @@ -0,0 +1,22 @@ +{{- $depName := printf "%s" (include "deploymentName" (dict "root" .Values "defaultname" "autopilot" )) }} +{{- if .Values.enabled }} +apiVersion: v1 +kind: Service +metadata: + name: {{ $depName }} + labels: + app: {{ $depName }} +{{- if and (.Values.service) (.Values.service.annotations) }} + annotations: +{{ toYaml .Values.service.annotations | indent 4 }} +{{- end }} +spec: + selector: + app: {{ $depName }} + ports: + - name: http + protocol: TCP + port: {{ .Values.service.port }} + targetPort: {{ .Values.service.targetPort }} + type: {{ .Values.service.serviceType }} +{{ end }} diff --git a/charts/backingservices/charts/autopilot/values.yaml b/charts/backingservices/charts/autopilot/values.yaml new file mode 100644 index 000000000..2bf2d909f --- /dev/null +++ b/charts/backingservices/charts/autopilot/values.yaml @@ -0,0 +1,161 @@ +--- +# Enable the Autopilot Service deployment as a backing service. +enabled: false + +deployment: + name: "autopilot" + +# ------------------------------------------------------------------- +# Docker image +# ------------------------------------------------------------------- +docker: + # If using a custom Docker registry, supply the credentials here to pull Docker images. + registry: + url: YOUR_DOCKER_REGISTRY_URL + username: YOUR_DOCKER_REGISTRY_USERNAME + password: YOUR_DOCKER_REGISTRY_PASSWORD + # List pre-existing secrets to be used for pulling docker images. 
+ imagePullSecretNames: [] + autopilot: + image: YOUR_AUTOPILOT_SERVICE_IMAGE:TAG + imagePullPolicy: IfNotPresent + +# ------------------------------------------------------------------- +# Replicas & Resources +# ------------------------------------------------------------------- +replicas: 2 + +resources: + requests: + cpu: 500m + memory: "1Gi" + limits: + cpu: "1" + memory: "2Gi" + +# ------------------------------------------------------------------- +# Service +# ------------------------------------------------------------------- +service: + port: 80 + targetPort: 8080 + serviceType: ClusterIP + +# ------------------------------------------------------------------- +# Pod security context +# ------------------------------------------------------------------- +securityContext: + seccompProfile: + type: RuntimeDefault + readOnlyRootFilesystem: false + allowPrivilegeEscalation: false + +# ------------------------------------------------------------------- +# Health probes +# ------------------------------------------------------------------- +livenessProbe: + initialDelaySeconds: 10 + timeoutSeconds: 10 + periodSeconds: 10 + successThreshold: 1 + failureThreshold: 3 +readinessProbe: + initialDelaySeconds: 10 + timeoutSeconds: 10 + periodSeconds: 10 + successThreshold: 1 + failureThreshold: 3 + +# ------------------------------------------------------------------- +# Autopilot service configuration +# ------------------------------------------------------------------- + +# Set to false for on-prem (direct provider connectivity). +# Set to true when using Pega GenAI Hub gateway. +enableGenaiHub: false + +# Disable internal auth for on-prem deployments. +authEnabled: false +isInternalDeployment: false + +# When authEnabled is true, set the URL of the Identity Provider (IdP) public key endpoint +# so the Autopilot service can validate incoming OAuth Bearer tokens from Pega Infinity. 
+# Example: "https://your-idp-host/oauth2/v1/keys" +oauthPublicKeyURL: "" + +# ------------------------------------------------------------------- +# Custom models file +# ------------------------------------------------------------------- +# Option 1: Use the default models file bundled with this chart (files/default-models.json). +# Set deployModelsConfigMap: true. To customize, edit files/default-models.json before helm install. +# +# Option 2: Provide a pre-existing ConfigMap name containing models.json +# customModels: +# existingConfigMap: "my-models-configmap" +# +# Option 3: Inline the JSON content directly (created as a ConfigMap) +# customModels: +# inline: | +# [{"provider":"azure","creator":"openai","model_name":"GPT-5",...}] +# +# If none of the above are set, the service uses its built-in model list +# and filters by available provider credentials. +# ------------------------------------------------------------------- +deployModelsConfigMap: true +customModels: + existingConfigMap: "" + inline: "" + +# ------------------------------------------------------------------- +# Provider selection +# ------------------------------------------------------------------- + +# Comma-separated list of providers to enable (e.g., "Azure,Vertex,Bedrock"). +# If empty, auto-detection based on available credentials is used. +modelProviders: "" + +# AWS Bedrock region +awsRegion: "us-east-1" + +# ------------------------------------------------------------------- +# Provider credentials +# ------------------------------------------------------------------- +# Option 1: Inline credentials (auto-creates a Kubernetes Secret) +# Option 2: Use a pre-existing secret (set providerCredentialsSecret) +# ------------------------------------------------------------------- + +# Pre-existing secret name containing provider credentials. +# If set, inline credentials below are ignored. 
+# The secret should contain keys matching the env var names +# (e.g., AWS_ACCESS_KEY_ID, AZURE_ENDPOINT, AZURE_OPENAI_KEY, etc.) +providerCredentialsSecret: "" + +# --- Azure OpenAI credentials --- +azure: + endpoint: "" # e.g., https://my-openai.openai.azure.com/ + apiKey: "" # Azure OpenAI API key + apiVersion: "2024-10-21" + +# --- AWS Bedrock credentials --- +aws: + accessKeyId: "" + secretAccessKey: "" + sessionToken: "" # Optional, for temporary credentials + +# --- Google Vertex AI credentials --- +vertex: + credentials: "" # Base64-encoded service account JSON key (project_id is auto-extracted) + location: "us-central1" + +# ------------------------------------------------------------------- +# Ingress (optional) +# ------------------------------------------------------------------- +ingress: + enabled: false + domain: "" + ingressClassName: "" + annotations: {} + labels: {} + tls: + enabled: false + secretName: "" diff --git a/charts/backingservices/requirements.yaml b/charts/backingservices/requirements.yaml index b9dc45f81..0be5d66c9 100644 --- a/charts/backingservices/requirements.yaml +++ b/charts/backingservices/requirements.yaml @@ -16,3 +16,5 @@ dependencies: version: "1.0.0" - name: srs version: "0.1.0" +- name: autopilot + version: "1.0.0" diff --git a/charts/pega/templates/_helpers.tpl b/charts/pega/templates/_helpers.tpl index 536c2e9ad..c6f4bc1cf 100644 --- a/charts/pega/templates/_helpers.tpl +++ b/charts/pega/templates/_helpers.tpl @@ -391,6 +391,26 @@ key: privateKey {{- end }} {{- end }} +{{- define "autopilotAuthPrivateKey" -}} +{{- if (.Values.autopilot.autopilotAuth).enabled }} + {{- if (.Values.autopilot.autopilotAuth).privateKey }} + {{- .Values.autopilot.autopilotAuth.privateKey | b64enc }} + {{- else }} + {{- fail "A valid entry is required for autopilot.autopilotAuth.privateKey or autopilot.autopilotAuth.external_secret_name, when OAuth authentication is enabled between Autopilot and Pega Infinity i.e. 
autopilot.autopilotAuth.enabled is true." | quote }} + {{- end }} +{{- end }} +{{- end }} + +{{- define "autopilotAuthEnvSecretFrom" }} +{{- if .Values.autopilot.autopilotAuth.external_secret_name }} +name: {{ .Values.autopilot.autopilotAuth.external_secret_name }} +key: AUTOPILOT_OAUTH_PRIVATE_KEY +{{- else }} +name: pega-autopilot-auth-secret +key: privateKey +{{- end }} +{{- end }} + {{- define "tcpKeepAliveProbe" }} {{- if .node.tcpKeepAliveProbe }} sysctls: diff --git a/charts/pega/templates/_pega-deployment.tpl b/charts/pega/templates/_pega-deployment.tpl index 9bcb4b603..e2b075c43 100644 --- a/charts/pega/templates/_pega-deployment.tpl +++ b/charts/pega/templates/_pega-deployment.tpl @@ -232,6 +232,11 @@ spec: {{- else }} {{- fail "pegasearch.srsAuth.authType must be either private_key_jwt or client_secret_basic." }} {{- end }} +{{- else if (.root.Values.autopilot.autopilotAuth).enabled }} + - name: SERV_AUTH_PRIVATE_KEY + valueFrom: + secretKeyRef: +{{- include "autopilotAuthEnvSecretFrom" .root | indent 14 }} {{- end }} envFrom: - configMapRef: diff --git a/charts/pega/templates/pega-autopilot-auth-secret.yaml b/charts/pega/templates/pega-autopilot-auth-secret.yaml new file mode 100644 index 000000000..7c4dccb65 --- /dev/null +++ b/charts/pega/templates/pega-autopilot-auth-secret.yaml @@ -0,0 +1,11 @@ +{{- if and ((.Values.autopilot.autopilotAuth).enabled) (not .Values.autopilot.autopilotAuth.external_secret_name) }} +# Secret for OAuth private key used to get an authorization token for Pega Infinity connection to the Autopilot service +apiVersion: v1 +kind: Secret +metadata: + name: pega-autopilot-auth-secret + namespace: {{ .Release.Namespace }} +type: Opaque +data: + privateKey: {{ template "autopilotAuthPrivateKey" . 
}} +{{- end }} diff --git a/charts/pega/templates/pega-environment-config.yaml b/charts/pega/templates/pega-environment-config.yaml index 23bede6a7..15b71ffa2 100644 --- a/charts/pega/templates/pega-environment-config.yaml +++ b/charts/pega/templates/pega-environment-config.yaml @@ -80,6 +80,7 @@ data: # URL to connect to Search and Reporting service SEARCH_AND_REPORTING_SERVICE_URL: {{ include "pegaSearchURL" $ }} # URL of the OAuth endpoint to get the token for Search and Reporting service + # Note: SRS OAuth takes precedence over Autopilot OAuth when both are enabled. SERV_AUTH_URL: "{{ .Values.pegasearch.srsAuth.url }}" # OAuth scopes to grant permissions for Pega Infinity in Search and Reporting service # The required value is "pega.search:full" @@ -101,6 +102,20 @@ data: SRS_TRUSTSTORE_PATH: "/opt/pega/certs/{{ .Values.pegasearch.srsMTLS.trustStore }}" # Path to a mounted key store file for MTLS connection to SRS SRS_KEYSTORE_PATH: "/opt/pega/certs/{{ .Values.pegasearch.srsMTLS.keyStore }}" +{{- end }} +{{- if and (.Values.autopilot.autopilotAuth).enabled (not (and .Values.pegasearch.externalSearchService (.Values.pegasearch.srsAuth).enabled)) }} + # URL of the OAuth endpoint to get the token for the Autopilot service + SERV_AUTH_URL: "{{ .Values.autopilot.autopilotAuth.url }}" + # OAuth scopes to grant permissions for Pega Infinity in the Autopilot service + {{- if .Values.autopilot.autopilotAuth.scopes }} + SERV_AUTH_SCOPES: {{ .Values.autopilot.autopilotAuth.scopes | quote }} + {{- end }} + # OAuth Client ID + SERV_AUTH_CLIENT_ID: "{{ .Values.autopilot.autopilotAuth.clientId }}" + # Algorithm used to generate a private key used by OAuth client + # Allowed values: RS256, RS384, RS512, ES256, ES384, ES512 + # Default value: RS256 + SERV_AUTH_PRIVATE_KEY_ALGORITHM: {{ .Values.autopilot.autopilotAuth.privateKeyAlgorithm | default "RS256" | quote }} {{- end }} # Whether to enable connecting to a cassandra cluster. "true" for enabled, "false for disabled". 
CASSANDRA_CLUSTER: "{{ include "cassandraEnabled" . }}" diff --git a/charts/pega/values.yaml b/charts/pega/values.yaml index dc6e3eb54..40b9ea21e 100644 --- a/charts/pega/values.yaml +++ b/charts/pega/values.yaml @@ -536,6 +536,31 @@ pegasearch: # Enter the external secret name below. external_secret_name: "" +# Autopilot GenAI service settings. +# Configure OAuth authentication between Pega Infinity and the on-premise Autopilot backing service. +# The Autopilot service URL is configured separately via a Dynamic System Setting (DSS) in Pega Infinity. +autopilot: + autopilotAuth: + # Set enabled to true to enable OAuth authentication between Pega Infinity and the Autopilot service. + enabled: false + # URL of the OAuth service endpoint to get the token for the Autopilot service. + url: "" + # Client ID used in the OAuth service. + clientId: "" + # Authentication type: only "private_key_jwt" is supported. + authType: "private_key_jwt" + # OAuth private PKCS8 key (Base64-encoded) used to get an authorization token. + # Required when external_secret_name is not set. + privateKey: "" + # Algorithm used to generate the private key. Allowed values: RS256, RS384, RS512, ES256, ES384, ES512. + privateKeyAlgorithm: "" + # Scopes set in the OAuth service required to grant access to the Autopilot service. + scopes: "" + # Name of a pre-existing Kubernetes Secret containing the OAuth private key. + # When set, the chart does not create a new secret and ignores the privateKey value above. + # The secret must contain the key "AUTOPILOT_OAUTH_PRIVATE_KEY". + external_secret_name: "" + # Pega Installer settings. installer: image: "YOUR_INSTALLER_IMAGE:TAG"
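Taken together, the chart in this diff can be enabled from the `backingservices` umbrella chart with a small values override. The sketch below is illustrative only: the field names come from the `values.yaml` added above, while the registry URL, image tag, and Azure endpoint are placeholder assumptions, not real defaults.

```yaml
# Hypothetical backingservices values override enabling the autopilot subchart.
# Field names match this chart's values.yaml; all credential/registry values
# are placeholders you must replace.
autopilot:
  enabled: true
  deployment:
    name: "autopilot"
  docker:
    registry:
      url: "registry.example.com"
      username: "deploy-user"
      password: "deploy-password"
    autopilot:
      image: "registry.example.com/pega/autopilot:1.0.0"
  replicas: 2
  # Restrict the model list to Azure OpenAI only.
  modelProviders: "Azure"
  azure:
    endpoint: "https://my-openai.openai.azure.com/"
    apiKey: "REPLACE_WITH_AZURE_OPENAI_KEY"
```

With a file like this, something along the lines of `helm install backingservices <chart-path> -f override.yaml -n <namespace>` would deploy the service; the exact release and chart names depend on your environment.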
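When `providerCredentialsSecret` is set, `autopilot-secret.yaml` above skips creating its own secret, so the keys in the pre-existing secret must match the environment-variable names that template would have produced (`AZURE_ENDPOINT`, `AZURE_OPENAI_KEY`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `VERTEX_AUTH`). A hypothetical pre-created secret for Azure plus AWS Bedrock might look like this; `stringData` lets Kubernetes perform the base64 encoding, and every value and name here is a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Set providerCredentialsSecret to this name in the autopilot values.
  name: my-autopilot-provider-credentials
  # Placeholder: use the namespace the autopilot chart is deployed into.
  namespace: mypega
type: Opaque
stringData:
  AZURE_ENDPOINT: "https://my-openai.openai.azure.com/"
  AZURE_OPENAI_KEY: "REPLACE_ME"
  AWS_ACCESS_KEY_ID: "REPLACE_ME"
  AWS_SECRET_ACCESS_KEY: "REPLACE_ME"
```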
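The `vertex.credentials` value expects the Google service-account JSON key already base64-encoded before it goes into `values.yaml` (the secret template then base64-encodes it again, because a Kubernetes Secret `data` field must itself be base64). A minimal Python sketch of that first encoding step — the helper name and the `project_id` check are illustrative, not part of the chart:

```python
import base64
import json


def encode_vertex_credentials(path: str) -> str:
    """Read a Google service-account JSON key file and return it as the
    base64 string expected by vertex.credentials in values.yaml.
    (Helper name and validation are illustrative, not part of the chart.)"""
    with open(path, "r", encoding="utf-8") as f:
        key = json.load(f)  # fail early on malformed JSON
    # The chart comment says project_id is auto-extracted from the key,
    # so verify it is present before encoding.
    if "project_id" not in key:
        raise ValueError("service-account key is missing project_id")
    return base64.b64encode(json.dumps(key).encode("utf-8")).decode("ascii")
```

The resulting string is pasted into `vertex.credentials`; base64-decoding the env var the pod receives yields the original JSON document.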