google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment module – Creates a GCP VertexAI.EndpointWithModelGardenDeployment resource
Note
This module is part of the google.cloud collection (version 1.12.0).
You might already have this collection installed if you are using the ansible package.
It is not included in ansible-core.
To check whether it is installed, run ansible-galaxy collection list.
To install it, use: ansible-galaxy collection install google.cloud.
You need further requirements to be able to use this module,
see Requirements for details.
To use it in a playbook, specify: google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment.
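If you manage collections through a requirements file, the installation step above can be expressed as a single entry. This is a minimal sketch: the file name and the choice to pin the version mentioned on this page are assumptions, not something the module requires.

```yaml
# requirements.yml (hypothetical file name)
collections:
  - name: google.cloud
    # Pinned to the version this page documents; remove the pin to take the latest release.
    version: "1.12.0"
```

It can then be installed with `ansible-galaxy collection install -r requirements.yml`.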
Synopsis
Create an Endpoint and deploy a Model Garden model to it.
Requirements
The below requirements are needed on the host that executes this module.
python >= 3.8
requests >= 2.18.4
google-auth >= 2.25.1
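One way to satisfy the Python library requirements is to install them with a task that runs before the module. The sketch below assumes `ansible.builtin.pip` is usable on the executing host and that installing into its default Python environment is acceptable; adjust it if you use a virtualenv.

```yaml
- name: Install the Python libraries the module needs (sketch)
  ansible.builtin.pip:
    name:
      - "requests>=2.18.4"
      - "google-auth>=2.25.1"
```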
Parameters
Parameter |
Comments |
|---|---|
The access token used to authenticate. |
|
The type of credential used. Choices:
|
|
The deploy config to use for the deployment. |
|
A description of resources that are dedicated to a DeployedModel or DeployedIndex, and that need a higher degree of manual configuration. |
|
The metric specifications that override the target value of a resource utilization metric (CPU utilization, accelerator’s duty cycle, and so on); the target defaults to 60 if not set. At most one entry is allowed per metric. If machine_spec.accelerator_count is above 0, autoscaling is based on both the CPU utilization and the accelerator’s duty cycle metrics: it scales up when either metric exceeds its target value and scales down when both metrics are under their target values. The default target value is 60 for both metrics. If machine_spec.accelerator_count is 0, autoscaling is based on the CPU utilization metric only, with a default target value of 60 if not explicitly set. For example, for Online Prediction, to override the target CPU utilization to 80, set autoscaling_metric_specs.metric_name to `aiplatform.googleapis.com/prediction/online/cpu/utilization` and autoscaling_metric_specs.target to `80`. |
|
The resource metric name. Supported metrics: * For Online Prediction: * `aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle` * `aiplatform.googleapis.com/prediction/online/cpu/utilization`. |
|
The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change. The default value is 60 (representing 60%) if not provided. |
|
Specification of a single machine. |
|
The number of accelerators to attach to the machine. |
|
Possible values: ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4, NVIDIA_TESLA_A100, NVIDIA_A100_80GB, NVIDIA_L4, NVIDIA_H100_80GB, NVIDIA_H100_MEGA_80GB, NVIDIA_H200_141GB, NVIDIA_B200, TPU_V2, TPU_V3, TPU_V4_POD, TPU_V5_LITEPOD. |
|
The type of the machine. See the [list of machine types supported for prediction](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#machine-types). See the [list of machine types supported for custom training](https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types). For DeployedModel this field is optional, and the default value is `n1-standard-2`. For BatchPredictionJob or as part of WorkerPoolSpec this field is required. |
|
The number of nodes per replica for multihost GPU deployments. |
|
A ReservationAffinity can be used to configure a Vertex AI resource (e.g., a DeployedModel) to draw its Compute Engine resources from a Shared Reservation, or exclusively from on-demand capacity. |
|
Corresponds to the label key of a reservation resource. To target a SPECIFIC_RESERVATION by name, use `compute.googleapis.com/reservation-name` as the key and specify the name of your reservation as its value. |
|
Specifies the reservation affinity type. Possible values: TYPE_UNSPECIFIED, NO_RESERVATION, ANY_RESERVATION, SPECIFIC_RESERVATION. |
|
Corresponds to the label values of a reservation resource. This must be the full resource name of the reservation or reservation block. |
|
The topology of the TPUs. Corresponds to the TPU topologies available from GKE. (Example: tpu_topology: “2x2x1”). |
|
The maximum number of replicas that the model may be deployed on when traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what the maximum number of replicas can handle, a portion of the traffic will be dropped. If this value is not provided, min_replica_count is used as the default. The value of this field impacts the charge against Vertex CPU and GPU quotas. Specifically, you will be charged for (max_replica_count * number of cores in the selected machine type) and (max_replica_count * number of GPUs per replica in the selected machine type). |
|
The minimum number of machine replicas that the model will always be deployed on. This value must be greater than or equal to 1. If traffic increases, the model may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed. |
|
Number of required available replicas for the deployment to succeed. This field is only needed when partial deployment/mutation is desired. If set, the deploy/mutate operation will succeed once available_replica_count reaches required_replica_count, and the rest of the replicas will be retried. If not set, the default required_replica_count will be min_replica_count. |
|
If true, schedule the deployment workload on [spot VMs](https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms). Choices:
|
|
If true, enable the QMT fast tryout feature for this model if possible. Choices:
|
|
System labels for Model Garden deployments. These labels are managed by Google and for tracking purposes only. |
|
The user-specified display name. This will be the default display name for both the endpoint and the deployed model. |
|
The endpoint config to use for the deployment. |
|
If true, the endpoint will be exposed through a dedicated DNS name (Endpoint.dedicated_endpoint_dns). Requests to the dedicated DNS name are isolated from other users’ traffic and have better performance and reliability. Note: Once you enable the dedicated endpoint, you can no longer send requests to the shared DNS name {region}-aiplatform.googleapis.com. The limitations will be removed soon. Choices:
|
|
The user-specified display name of the endpoint. If not set, a default name will be used. |
|
The configuration for Private Service Connect (PSC). |
|
If true, expose the IndexEndpoint via private service connect. Choices:
|
|
A list of Projects from which the forwarding rule will target the service attachment. |
|
PSC config that is used to automatically create PSC endpoints in the user projects. |
|
Output only. Error message if the PSC service automation failed. |
|
Output only. Forwarding rule created by the PSC service automation. |
|
Output only. IP address rule created by the PSC service automation. |
|
The full name of the Google Compute Engine network. Format: projects/{project}/global/networks/{network}. |
|
Project id used to create forwarding rule. |
|
Output only. The state of the PSC service automation. Choices:
|
|
Output only. The name of the generated service attachment resource. This is only populated if the endpoint is deployed with PrivateServiceConnect. |
|
Specifies which Ansible environment you’re running this module within. This should not be set unless you know what you’re doing. This only alters the User Agent string for any API requests. |
|
The Hugging Face model to deploy. Format: Hugging Face model ID like `google/gemma-2-2b-it`. |
|
Resource ID segment making up resource `location`. It identifies the resource within its parent collection as described in https://google.aip.dev/122. |
|
The model config to use for the deployment. |
|
Whether the user accepts the End User License Agreement (EULA) for the model. Choices:
|
|
Specification of a container for serving predictions. Some fields in this message correspond to fields in the [Kubernetes Container v1 core specification](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core). |
|
Specifies arguments for the command that runs when the container starts. This overrides the container’s [`CMD`](https://docs.docker.com/engine/reference/builder/#cmd). Specify this field as an array of executable and arguments, similar to a Docker `CMD`’s “default parameters” form. If you don’t specify this field but do specify the `command` field, then the command from the `command` field runs without any additional arguments. See the [Kubernetes documentation about how the `command` and `args` fields interact with a container’s `ENTRYPOINT` and `CMD`](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes). If you don’t specify this field and don’t specify the `command` field, then the container’s [`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#entrypoint) and `CMD` determine what runs based on their default behavior. See the Docker documentation about [how `CMD` and `ENTRYPOINT` interact](https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact). In this field, you can reference [environment variables set by Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables) and environment variables set in the `env` field. You cannot reference environment variables set in the Docker image. In order for environment variables to be expanded, reference them by using the following syntax: `$(VARIABLE_NAME)`. Note that this differs from Bash variable expansion, which does not use parentheses. If a variable cannot be resolved, the reference in the input string is used unchanged. To avoid variable expansion, you can escape this syntax with `$$`; for example: `$$(VARIABLE_NAME)`. This field corresponds to the `args` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core). |
|
Specifies the command that runs when the container starts. This overrides the container’s [`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#entrypoint). Specify this field as an array of executable and arguments, similar to a Docker `ENTRYPOINT`’s “exec” form, not its “shell” form. If you do not specify this field, then the container’s `ENTRYPOINT` runs, in conjunction with the `args` field or the container’s [`CMD`](https://docs.docker.com/engine/reference/builder/#cmd), if either exists. If this field is not specified and the container does not have an `ENTRYPOINT`, then refer to the Docker documentation about [how `CMD` and `ENTRYPOINT` interact](https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact). If you specify this field, then you can also specify the `args` field to provide additional arguments for this command. However, if you specify this field, then the container’s `CMD` is ignored. See the [Kubernetes documentation about how the `command` and `args` fields interact with a container’s `ENTRYPOINT` and `CMD`](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes). In this field, you can reference [environment variables set by Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables) and environment variables set in the `env` field. You cannot reference environment variables set in the Docker image. In order for environment variables to be expanded, reference them by using the following syntax: `$(VARIABLE_NAME)`. Note that this differs from Bash variable expansion, which does not use parentheses. If a variable cannot be resolved, the reference in the input string is used unchanged. To avoid variable expansion, you can escape this syntax with `$$`; for example: `$$(VARIABLE_NAME)`. This field corresponds to the `command` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core). |
|
Deployment timeout. Limit for deployment timeout is 2 hours. |
|
List of environment variables to set in the container. After the container starts running, code running in the container can read these environment variables. Additionally, the `command` and `args` fields can reference these variables. Later entries in this list can also reference earlier entries. For example, the following sets the variable `VAR_2` to have the value `foo bar`: ```json [ { "name": "VAR_1", "value": "foo" }, { "name": "VAR_2", "value": "$(VAR_1) bar" } ] ``` If you switch the order of the variables in the example, then the expansion does not occur. This field corresponds to the `env` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core). |
|
Name of the environment variable. Must be a valid C identifier. |
|
Variable references of the form $(VAR_NAME) are expanded using the previously defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, i.e. $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not. |
|
List of ports to expose from the container. Vertex AI sends gRPC prediction requests that it receives to the first port on this list. Vertex AI also sends liveness and health checks to this port. If you do not specify this field, gRPC requests to the container will be disabled. Vertex AI does not use ports other than the first one listed. This field corresponds to the `ports` field of the Kubernetes Containers v1 core API. |
|
The number of the port to expose on the pod’s IP address. Must be a valid port number, between 1 and 65535 inclusive. |
|
Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic. |
|
ExecAction specifies a command to execute. |
|
Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem. The command is simply exec’d and is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work. To use a shell, you need to explicitly call out to that shell. An exit status of 0 is treated as live/healthy and non-zero as unhealthy. |
|
Number of consecutive failures before the probe is considered failed. Defaults to 3. Minimum value is 1. Maps to Kubernetes probe argument ‘failureThreshold’. |
|
GrpcAction checks the health of a container using a gRPC service. |
|
Port number of the gRPC service. Number must be in the range 1 to 65535. |
|
Service is the name of the service to place in the gRPC HealthCheckRequest. See https://github.com/grpc/grpc/blob/master/doc/health-checking.md. If this is not specified, the default behavior is defined by gRPC. |
|
HttpGetAction describes an action based on HTTP Get requests. |
|
Host name to connect to, defaults to the model serving container’s IP. You probably want to set “Host” in httpHeaders instead. |
|
Custom headers to set in the request. HTTP allows repeated headers. |
|
The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header. |
|
The header field value. |
|
Path to access on the HTTP server. |
|
Number of the port to access on the container. Number must be in the range 1 to 65535. |
|
Scheme to use for connecting to the host. Defaults to HTTP. Acceptable values are “HTTP” or “HTTPS”. |
|
Number of seconds to wait before starting the probe. Defaults to 0. Minimum value is 0. Maps to Kubernetes probe argument ‘initialDelaySeconds’. |
|
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1. Must be less than timeout_seconds. Maps to Kubernetes probe argument ‘periodSeconds’. |
|
Number of consecutive successes before the probe is considered successful. Defaults to 1. Minimum value is 1. Maps to Kubernetes probe argument ‘successThreshold’. |
|
TcpSocketAction probes the health of a container by opening a TCP socket connection. |
|
Optional: Host name to connect to, defaults to the model serving container’s IP. |
|
Number of the port to access on the container. Number must be in the range 1 to 65535. |
|
Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. Must be greater than or equal to period_seconds. Maps to Kubernetes probe argument ‘timeoutSeconds’. |
|
HTTP path on the container to send health checks to. Vertex AI intermittently sends GET requests to this path on the container’s IP address and port to check that the container is healthy. Read more about [health checks](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#health). For example, if you set this field to `/bar`, then Vertex AI intermittently sends a GET request to the `/bar` path on the port of your container specified by the first value of this `ModelContainerSpec`’s ports field. If you don’t specify this field, it defaults to the following value when you deploy this Model to an Endpoint: `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL:predict`. The placeholders in this value are replaced as follows: * ENDPOINT: The last segment (following `endpoints/`) of the Endpoint.name field of the Endpoint where this Model has been deployed. (Vertex AI makes this value available to your container code as the [`AIP_ENDPOINT_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).) * DEPLOYED_MODEL: DeployedModel.id of the `DeployedModel`. (Vertex AI makes this value available to your container code as the [`AIP_DEPLOYED_MODEL_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).) |
|
URI of the Docker image to be used as the custom container for serving predictions. This URI must identify an image in Artifact Registry or Container Registry. Learn more about the [container publishing requirements](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#publishing), including permissions requirements for the Vertex AI Service Agent. The container image is ingested upon ModelService.UploadModel, stored internally, and this original path is afterwards not used. To learn about the requirements for the Docker image itself, see [Custom container requirements](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#). You can use the URI to one of Vertex AI’s [pre-built container images for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers) in this field. |
|
Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic. |
|
ExecAction specifies a command to execute. |
|
Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem. The command is simply exec’d and is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work. To use a shell, you need to explicitly call out to that shell. An exit status of 0 is treated as live/healthy and non-zero as unhealthy. |
|
Number of consecutive failures before the probe is considered failed. Defaults to 3. Minimum value is 1. Maps to Kubernetes probe argument ‘failureThreshold’. |
|
GrpcAction checks the health of a container using a gRPC service. |
|
Port number of the gRPC service. Number must be in the range 1 to 65535. |
|
Service is the name of the service to place in the gRPC HealthCheckRequest. See https://github.com/grpc/grpc/blob/master/doc/health-checking.md. If this is not specified, the default behavior is defined by gRPC. |
|
HttpGetAction describes an action based on HTTP Get requests. |
|
Host name to connect to, defaults to the model serving container’s IP. You probably want to set “Host” in httpHeaders instead. |
|
Custom headers to set in the request. HTTP allows repeated headers. |
|
The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header. |
|
The header field value. |
|
Path to access on the HTTP server. |
|
Number of the port to access on the container. Number must be in the range 1 to 65535. |
|
Scheme to use for connecting to the host. Defaults to HTTP. Acceptable values are “HTTP” or “HTTPS”. |
|
Number of seconds to wait before starting the probe. Defaults to 0. Minimum value is 0. Maps to Kubernetes probe argument ‘initialDelaySeconds’. |
|
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1. Must be less than timeout_seconds. Maps to Kubernetes probe argument ‘periodSeconds’. |
|
Number of consecutive successes before the probe is considered successful. Defaults to 1. Minimum value is 1. Maps to Kubernetes probe argument ‘successThreshold’. |
|
TcpSocketAction probes the health of a container by opening a TCP socket connection. |
|
Optional: Host name to connect to, defaults to the model serving container’s IP. |
|
Number of the port to access on the container. Number must be in the range 1 to 65535. |
|
Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. Must be greater than or equal to period_seconds. Maps to Kubernetes probe argument ‘timeoutSeconds’. |
|
List of ports to expose from the container. Vertex AI sends any prediction requests that it receives to the first port on this list. Vertex AI also sends [liveness and health checks](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#liveness) to this port. If you do not specify this field, it defaults to the following value: ```json [ { "containerPort": 8080 } ] ``` Vertex AI does not use ports other than the first one listed. This field corresponds to the `ports` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core). |
|
The number of the port to expose on the pod’s IP address. Must be a valid port number, between 1 and 65535 inclusive. |
|
HTTP path on the container to send prediction requests to. Vertex AI forwards requests sent using projects.locations.endpoints.predict to this path on the container’s IP address and port. Vertex AI then returns the container’s response in the API response. For example, if you set this field to `/foo`, then when Vertex AI receives a prediction request, it forwards the request body in a POST request to the `/foo` path on the port of your container specified by the first value of this `ModelContainerSpec`’s ports field. If you don’t specify this field, it defaults to the following value when you deploy this Model to an Endpoint: `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL:predict`. The placeholders in this value are replaced as follows: * ENDPOINT: The last segment (following `endpoints/`) of the Endpoint.name field of the Endpoint where this Model has been deployed. (Vertex AI makes this value available to your container code as the [`AIP_ENDPOINT_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).) * DEPLOYED_MODEL: DeployedModel.id of the `DeployedModel`. (Vertex AI makes this value available to your container code as the [`AIP_DEPLOYED_MODEL_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).) |
|
The amount of the VM memory to reserve as the shared memory for the model in megabytes. |
|
Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic. |
|
ExecAction specifies a command to execute. |
|
Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem. The command is simply exec’d and is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work. To use a shell, you need to explicitly call out to that shell. An exit status of 0 is treated as live/healthy and non-zero as unhealthy. |
|
Number of consecutive failures before the probe is considered failed. Defaults to 3. Minimum value is 1. Maps to Kubernetes probe argument ‘failureThreshold’. |
|
GrpcAction checks the health of a container using a gRPC service. |
|
Port number of the gRPC service. Number must be in the range 1 to 65535. |
|
Service is the name of the service to place in the gRPC HealthCheckRequest. See https://github.com/grpc/grpc/blob/master/doc/health-checking.md. If this is not specified, the default behavior is defined by gRPC. |
|
HttpGetAction describes an action based on HTTP Get requests. |
|
Host name to connect to, defaults to the model serving container’s IP. You probably want to set “Host” in httpHeaders instead. |
|
Custom headers to set in the request. HTTP allows repeated headers. |
|
The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header. |
|
The header field value. |
|
Path to access on the HTTP server. |
|
Number of the port to access on the container. Number must be in the range 1 to 65535. |
|
Scheme to use for connecting to the host. Defaults to HTTP. Acceptable values are “HTTP” or “HTTPS”. |
|
Number of seconds to wait before starting the probe. Defaults to 0. Minimum value is 0. Maps to Kubernetes probe argument ‘initialDelaySeconds’. |
|
How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1. Must be less than timeout_seconds. Maps to Kubernetes probe argument ‘periodSeconds’. |
|
Number of consecutive successes before the probe is considered successful. Defaults to 1. Minimum value is 1. Maps to Kubernetes probe argument ‘successThreshold’. |
|
TcpSocketAction probes the health of a container by opening a TCP socket connection. |
|
Optional: Host name to connect to, defaults to the model serving container’s IP. |
|
Number of the port to access on the container. Number must be in the range 1 to 65535. |
|
Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. Must be greater than or equal to period_seconds. Maps to Kubernetes probe argument ‘timeoutSeconds’. |
|
The Hugging Face read access token used to access the model artifacts of gated models. |
|
If true, the model will deploy with a cached version instead of directly downloading the model artifacts from Hugging Face. This is suitable for VPC-SC users with limited internet access. Choices:
|
|
The user-specified display name of the uploaded model. If not set, a default name will be used. |
|
The Google Cloud Platform project to use. |
|
The Model Garden model to deploy. Format: `publishers/{publisher}/models/{publisher_model}@{version_id}`, or `publishers/hf-{hugging-face-author}/models/{hugging-face-model-name}@001`. |
|
Array of scopes to be used. |
|
The contents of a Service Account JSON file, either in a dictionary or as a JSON string that represents it. |
|
An optional service account email address if machineaccount is selected and the user does not wish to use the default email. |
|
The path of a Service Account JSON file if serviceaccount is selected as type. |
|
Whether the resource should exist in GCP. Choices:
|
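The autoscaling and replica options described above could be combined as in the following sketch. The option names `deploy_config`, `dedicated_resources`, `machine_spec`, `machine_type`, `accelerator_type`, and `max_replica_count` are assumptions inferred from the parameter descriptions, and the machine type and accelerator are illustrative values only; confirm the names with `ansible-doc` before relying on them.

```yaml
- name: Deploy with a custom CPU autoscaling target (sketch)
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    model_config:
      accept_eula: true
    deploy_config:                      # assumed option name
      dedicated_resources:              # assumed option name
        machine_spec:                   # assumed option name
          machine_type: g2-standard-12  # illustrative value
          accelerator_type: NVIDIA_L4   # one of the listed possible values
          accelerator_count: 1
        min_replica_count: 1            # must be at least 1
        max_replica_count: 2            # quota is charged against this upper bound
        autoscaling_metric_specs:
          - metric_name: aiplatform.googleapis.com/prediction/online/cpu/utilization
            target: 80                  # overrides the default target of 60
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
```

Because `accelerator_count` is above 0 in this sketch, autoscaling would also take the accelerator duty cycle metric into account, which keeps its default target of 60 since it is not overridden.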
Notes
Note
API Reference: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations/deploy
Overview of Model Garden Guide: https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models
Overview of self-deployed models Guide: https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/self-deployed-models
Use models in Model Garden Guide: https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/use-models
For authentication, you can set auth_kind using the GCP_AUTH_KIND env variable.
For authentication, you can set service_account_file using the GCP_SERVICE_ACCOUNT_FILE env variable.
For authentication, you can set service_account_contents using the GCP_SERVICE_ACCOUNT_CONTENTS env variable.
For authentication, you can set service_account_email using the GCP_SERVICE_ACCOUNT_EMAIL env variable.
For authentication, you can set access_token using the GCP_ACCESS_TOKEN env variable.
For authentication, you can set scopes using the GCP_SCOPES env variable.
Environment variable values will only be used if the playbook values are not set.
The service_account_email, service_account_file and access_token options are mutually exclusive.
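As a sketch of the environment-variable fallback described in these notes, the task below omits `auth_kind` and `service_account_file` and supplies them through the task-level `environment` keyword instead. The credential path is a placeholder, and the sketch assumes the variables are read in the process where the module executes.

```yaml
- name: Deploy using credentials taken from environment variables (sketch)
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    model_config:
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
  environment:
    GCP_AUTH_KIND: serviceaccount
    GCP_SERVICE_ACCOUNT_FILE: /path/to/service-account.json  # placeholder path
```

If `auth_kind` or `service_account_file` were also set in the task, the playbook values would take precedence over the environment variables.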
Examples
- name: Deploy Basic Model
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    endpoint_config:
      endpoint_display_name: my-endpoint
    model_config:
      model_display_name: my-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
################################################################################
- name: Deploy Hugging Face Model
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    hugging_face_model_id: Qwen/Qwen3-0.6B
    endpoint_config:
      endpoint_display_name: huggingface-endpoint
    model_config:
      model_display_name: huggingface-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
################################################################################
- name: Deploy Basic Model with PSC Endpoint
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    display_name: my-psc-endpoint
    endpoint_config:
      private_service_connect_config:
        enable_private_service_connect: true
        project_allowlist:
          - my-project-id
    model_config:
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
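The `$(VAR_NAME)` expansion rules described for the container `env`, `command`, and `args` fields could look like this in a task. This is a sketch under assumptions: the option names `container_spec` and `image_uri`, the image URI, the variable names, and the `--greeting` and `--literal` flags are all hypothetical and serve only to illustrate the syntax.

```yaml
- name: Deploy with a custom serving container (sketch)
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    hugging_face_model_id: Qwen/Qwen3-0.6B
    model_config:
      accept_eula: true
      container_spec:                   # assumed option name
        # Hypothetical image; it must live in Artifact Registry or Container Registry.
        image_uri: us-docker.pkg.dev/my-project/my-repo/my-server:latest
        env:
          - name: VAR_1
            value: foo
          - name: VAR_2
            value: "$(VAR_1) bar"       # references the earlier entry VAR_1
        args:
          - "--greeting=$(VAR_2)"       # reference to a variable from the env list
          - "--literal=$$(VAR_2)"       # escaped with $$, so it is not expanded
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
```

The order of the `env` entries matters: if `VAR_2` were listed before `VAR_1`, the reference would be left unchanged rather than expanded.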
Return Values
Common return values are documented here; the following are the fields unique to this module:
Key |
Description |
|---|---|
Whether the resource was changed. Returned: always |
|
Output only. The display name assigned to the model deployed to the endpoint. This is not required to delete the resource but is used for debug logging. Returned: success |
|
Output only. The unique numeric ID that Vertex AI assigns to the model at the time it is deployed to the endpoint. It is required to undeploy the model from the endpoint during resource deletion as described in https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/undeployModel. Returned: success |
|
The current state of the resource. Returned: always |
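To inspect the values listed above, a task result can be registered and printed. This is a minimal sketch: `deployment` is a hypothetical variable name, and only the standard `changed` key is referenced by name.

```yaml
- name: Deploy and capture the result (sketch)
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    model_config:
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: deployment                  # hypothetical variable name

- name: Show whether the task changed anything
  ansible.builtin.debug:
    msg: "changed={{ deployment.changed }}"

- name: Show every returned field
  ansible.builtin.debug:
    var: deployment
```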