google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment module – Creates a GCP VertexAI.EndpointWithModelGardenDeployment resource

Note

This module is part of the google.cloud collection (version 1.13.0).

You might already have this collection installed if you are using the ansible package. It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install google.cloud. You need further requirements to be able to use this module, see Requirements for details.

To use it in a playbook, specify: google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment.

Synopsis 

Create an Endpoint and deploy a Model Garden model to it.

Requirements 

The below requirements are needed on the host that executes this module.

python >= 3.8
requests >= 2.18.4
google-auth >= 2.25.1

Parameters 

Parameter	Comments
access_token string	The access token used to authenticate.
auth_kind string / required	The type of credential used. Choices: `"accesstoken"` `"application"` `"machineaccount"` `"serviceaccount"`
deploy_config dictionary	The deploy config to use for the deployment.
dedicated_resources dictionary	A description of resources that are dedicated to a DeployedModel or DeployedIndex, and that need a higher degree of manual configuration.
autoscaling_metric_specs list / elements=dictionary	The metric specifications that override the target value of a resource utilization metric (CPU utilization, accelerator’s duty cycle, and so on); the target defaults to 60 if not set. At most one entry is allowed per metric. If machine_spec.accelerator_count is above 0, autoscaling is based on both the CPU utilization and the accelerator’s duty cycle metrics: it scales up when either metric exceeds its target value and scales down when both metrics are under their target values. The default target value is 60 for both metrics. If machine_spec.accelerator_count is 0, autoscaling is based on the CPU utilization metric only, with a default target value of 60 if not explicitly set. For example, in the case of Online Prediction, to override the target CPU utilization to 80, set autoscaling_metric_specs.metric_name to `aiplatform.googleapis.com/prediction/online/cpu/utilization` and autoscaling_metric_specs.target to `80`.
metric_name string / required	The resource metric name. Supported metrics for Online Prediction: `aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle` and `aiplatform.googleapis.com/prediction/online/cpu/utilization`.
target integer	The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change. The default value is 60 (representing 60%) if not provided.
machine_spec dictionary / required	Specification of a single machine.
accelerator_count integer	The number of accelerators to attach to the machine.
accelerator_type string	Possible values: ACCELERATOR_TYPE_UNSPECIFIED NVIDIA_TESLA_K80 NVIDIA_TESLA_P100 NVIDIA_TESLA_V100 NVIDIA_TESLA_P4 NVIDIA_TESLA_T4 NVIDIA_TESLA_A100 NVIDIA_A100_80GB NVIDIA_L4 NVIDIA_H100_80GB NVIDIA_H100_MEGA_80GB NVIDIA_H200_141GB NVIDIA_B200 TPU_V2 TPU_V3 TPU_V4_POD TPU_V5_LITEPOD.
machine_type string	The type of the machine. See the [list of machine types supported for prediction](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#machine-types) and the [list of machine types supported for custom training](https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types). For DeployedModel this field is optional, and the default value is `n1-standard-2`. For BatchPredictionJob or as part of WorkerPoolSpec this field is required.
multihost_gpu_node_count integer	The number of nodes per replica for multihost GPU deployments.
reservation_affinity dictionary	A ReservationAffinity can be used to configure a Vertex AI resource (e.g., a DeployedModel) to draw its Compute Engine resources from a Shared Reservation, or exclusively from on-demand capacity.
key string	Corresponds to the label key of a reservation resource. To target a SPECIFIC_RESERVATION by name, use `compute.googleapis.com/reservation-name` as the key and specify the name of your reservation as its value.
reservation_affinity_type string / required	Specifies the reservation affinity type. Possible values: TYPE_UNSPECIFIED NO_RESERVATION ANY_RESERVATION SPECIFIC_RESERVATION.
values list / elements=string	Corresponds to the label values of a reservation resource. This must be the full resource name of the reservation or reservation block.
tpu_topology string	The topology of the TPUs. Corresponds to the TPU topologies available from GKE. (Example: tpu_topology: “2x2x1”).
max_replica_count integer	The maximum number of replicas that the model may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what the replicas at maximum can handle, a portion of the traffic will be dropped. If this value is not provided, min_replica_count is used as the default value. The value of this field impacts the charge against Vertex CPU and GPU quotas. Specifically, you will be charged for (max_replica_count * number of cores in the selected machine type) and (max_replica_count * number of GPUs per replica in the selected machine type).
min_replica_count integer / required	The minimum number of machine replicas that the model will always be deployed on. This value must be greater than or equal to 1. If traffic increases, the model may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
required_replica_count integer	Number of required available replicas for the deployment to succeed. This field is only needed when partial deployment/mutation is desired. If set, the deploy/mutate operation will succeed once available_replica_count reaches required_replica_count, and the rest of the replicas will be retried. If not set, the default required_replica_count will be min_replica_count.
spot boolean	If true, schedule the deployment workload on [spot VMs](https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms). Choices: `false` `true`
fast_tryout_enabled boolean	If true, enable the QMT fast tryout feature for this model if possible. Choices: `false` `true`
system_labels dictionary	System labels for Model Garden deployments. These labels are managed by Google and for tracking purposes only.
display_name string / required	The user-specified display name. This will be the default display name for both the endpoint and the deployed model.
endpoint_config dictionary	The endpoint config to use for the deployment.
dedicated_endpoint_enabled boolean	If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns]. Your requests to the dedicated DNS will be isolated from other users’ traffic and will have better performance and reliability. Note: Once you enable the dedicated endpoint, you won’t be able to send requests to the shared DNS {region}-aiplatform.googleapis.com. This limitation will be removed soon. Choices: `false` `true`
endpoint_display_name string	The user-specified display name of the endpoint. If not set, a default name will be used.
private_service_connect_config dictionary	The configuration for Private Service Connect (PSC).
enable_private_service_connect boolean / required	If true, expose the IndexEndpoint via private service connect. Choices: `false` `true`
project_allowlist list / elements=string	A list of Projects from which the forwarding rule will target the service attachment.
psc_automation_configs dictionary	PSC config that is used to automatically create PSC endpoints in the user projects.
error_message string	Output only. Error message if the PSC service automation failed.
forwarding_rule string	Output only. Forwarding rule created by the PSC service automation.
ip_address string	Output only. IP address rule created by the PSC service automation.
network string / required	The full name of the Google Compute Engine network. Format: projects/{project}/global/networks/{network}.
project_id string / required	Project id used to create forwarding rule.
state string	Output only. The state of the PSC service automation. Choices: `"PSC_AUTOMATION_STATE_UNSPECIFIED"` `"PSC_AUTOMATION_STATE_SUCCESSFUL"` `"PSC_AUTOMATION_STATE_FAILED"`
service_attachment string	Output only. The name of the generated service attachment resource. This is only populated if the endpoint is deployed with PrivateServiceConnect.
env_type string	Specifies which Ansible environment you’re running this module within. This should not be set unless you know what you’re doing. This only alters the User Agent string for any API requests.
hugging_face_model_id string	The Hugging Face model to deploy. Format: Hugging Face model ID like `google/gemma-2-2b-it`.
location string / required	Resource ID segment making up resource `location`. It identifies the resource within its parent collection as described in https://google.aip.dev/122.
model_config dictionary	The model config to use for the deployment.
accept_eula boolean	Whether the user accepts the End User License Agreement (EULA) for the model. Choices: `false` `true`
container_spec dictionary	Specification of a container for serving predictions. Some fields in this message correspond to fields in the [Kubernetes Container v1 core specification](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).
args list / elements=string	Specifies arguments for the command that runs when the container starts. This overrides the container’s [`CMD`](https://docs.docker.com/engine/reference/builder/#cmd). Specify this field as an array of executable and arguments, similar to a Docker `CMD`’s “default parameters” form. If you don’t specify this field but do specify the `command` field, then the command from the `command` field runs without any additional arguments. See the [Kubernetes documentation about how the `command` and `args` fields interact with a container’s `ENTRYPOINT` and `CMD`](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes). If you don’t specify this field and don’t specify the `command` field, then the container’s [`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#entrypoint) and `CMD` determine what runs based on their default behavior. See the Docker documentation about [how `CMD` and `ENTRYPOINT` interact](https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact). In this field, you can reference [environment variables set by Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables) and environment variables set in the `env` field. You cannot reference environment variables set in the Docker image. In order for environment variables to be expanded, reference them by using the following syntax: `$(VARIABLE_NAME)`. Note that this differs from Bash variable expansion, which does not use parentheses. If a variable cannot be resolved, the reference in the input string is used unchanged. To avoid variable expansion, you can escape this syntax with `$$`; for example: `$$(VARIABLE_NAME)`. This field corresponds to the `args` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).
command list / elements=string	Specifies the command that runs when the container starts. This overrides the container’s [`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#entrypoint). Specify this field as an array of executable and arguments, similar to a Docker `ENTRYPOINT`’s “exec” form, not its “shell” form. If you do not specify this field, then the container’s `ENTRYPOINT` runs, in conjunction with the `args` field or the container’s [`CMD`](https://docs.docker.com/engine/reference/builder/#cmd), if either exists. If this field is not specified and the container does not have an `ENTRYPOINT`, then refer to the Docker documentation about [how `CMD` and `ENTRYPOINT` interact](https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact). If you specify this field, then you can also specify the `args` field to provide additional arguments for this command. However, if you specify this field, then the container’s `CMD` is ignored. See the [Kubernetes documentation about how the `command` and `args` fields interact with a container’s `ENTRYPOINT` and `CMD`](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes). In this field, you can reference [environment variables set by Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables) and environment variables set in the `env` field. You cannot reference environment variables set in the Docker image. In order for environment variables to be expanded, reference them by using the following syntax: `$(VARIABLE_NAME)`. Note that this differs from Bash variable expansion, which does not use parentheses. If a variable cannot be resolved, the reference in the input string is used unchanged. To avoid variable expansion, you can escape this syntax with `$$`; for example: `$$(VARIABLE_NAME)`. This field corresponds to the `command` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).
deployment_timeout string	Deployment timeout. Limit for deployment timeout is 2 hours.
env list / elements=dictionary	List of environment variables to set in the container. After the container starts running, code running in the container can read these environment variables. Additionally, the `command` and `args` fields can reference these variables. Later entries in this list can also reference earlier entries. For example, the following sets the variable `VAR_2` to have the value `foo bar`: ```json [ { "name": "VAR_1", "value": "foo" }, { "name": "VAR_2", "value": "$(VAR_1) bar" } ] ``` If you switch the order of the variables in the example, then the expansion does not occur. This field corresponds to the `env` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).
name string / required	Name of the environment variable. Must be a valid C identifier.
value string / required	Variables that reference a $(VAR_NAME) are expanded using the previously defined environment variables in the container and any service environment variables. If a variable cannot be resolved, the reference in the input string will be unchanged. The $(VAR_NAME) syntax can be escaped with a double $$, i.e. $$(VAR_NAME). Escaped references will never be expanded, regardless of whether the variable exists or not.
grpc_ports list / elements=dictionary	List of ports to expose from the container. Vertex AI sends gRPC prediction requests that it receives to the first port on this list. Vertex AI also sends liveness and health checks to this port. If you do not specify this field, gRPC requests to the container will be disabled. Vertex AI does not use ports other than the first one listed. This field corresponds to the `ports` field of the Kubernetes Containers v1 core API.
container_port integer	The number of the port to expose on the pod’s IP address. Must be a valid port number, between 1 and 65535 inclusive.
health_probe dictionary	Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.
exec dictionary	ExecAction specifies a command to execute.
command list / elements=string	Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem. The command is simply exec’d; it is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work. To use a shell, you need to explicitly call out to that shell. Exit status of 0 is treated as live/healthy and non-zero is unhealthy.
failure_threshold integer	Number of consecutive failures before the probe is considered failed. Defaults to 3. Minimum value is 1. Maps to Kubernetes probe argument ‘failureThreshold’.
grpc dictionary	GrpcAction checks the health of a container using a gRPC service.
port integer	Port number of the gRPC service. Number must be in the range 1 to 65535.
service string	Service is the name of the service to place in the gRPC HealthCheckRequest. See https://github.com/grpc/grpc/blob/master/doc/health-checking.md. If this is not specified, the default behavior is defined by gRPC.
http_get dictionary	HttpGetAction describes an action based on HTTP Get requests.
host string	Host name to connect to, defaults to the model serving container’s IP. You probably want to set “Host” in httpHeaders instead.
http_headers list / elements=dictionary	Custom headers to set in the request. HTTP allows repeated headers.
name string	The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header.
value string	The header field value.
path string	Path to access on the HTTP server.
port integer	Number of the port to access on the container. Number must be in the range 1 to 65535.
scheme string	Scheme to use for connecting to the host. Defaults to HTTP. Acceptable values are “HTTP” or “HTTPS”.
initial_delay_seconds integer	Number of seconds to wait before starting the probe. Defaults to 0. Minimum value is 0. Maps to Kubernetes probe argument ‘initialDelaySeconds’.
period_seconds integer	How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1. Must be less than timeout_seconds. Maps to Kubernetes probe argument ‘periodSeconds’.
success_threshold integer	Number of consecutive successes before the probe is considered successful. Defaults to 1. Minimum value is 1. Maps to Kubernetes probe argument ‘successThreshold’.
tcp_socket dictionary	TcpSocketAction probes the health of a container by opening a TCP socket connection.
host string	Optional: Host name to connect to, defaults to the model serving container’s IP.
port integer	Number of the port to access on the container. Number must be in the range 1 to 65535.
timeout_seconds integer	Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. Must be greater than or equal to period_seconds. Maps to Kubernetes probe argument ‘timeoutSeconds’.
health_route string	HTTP path on the container to send health checks to. Vertex AI intermittently sends GET requests to this path on the container’s IP address and port to check that the container is healthy. Read more about [health checks](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#health). For example, if you set this field to `/bar`, then Vertex AI intermittently sends a GET request to the `/bar` path on the port of your container specified by the first value of this `ModelContainerSpec`’s ports field. If you don’t specify this field, it defaults to the following value when you deploy this Model to an Endpoint: `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL:predict`. The placeholders in this value are replaced as follows. ENDPOINT: the last segment (following `endpoints/`) of the Endpoint.name field of the Endpoint where this Model has been deployed. (Vertex AI makes this value available to your container code as the [`AIP_ENDPOINT_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).) DEPLOYED_MODEL: DeployedModel.id of the `DeployedModel`. (Vertex AI makes this value available to your container code as the [`AIP_DEPLOYED_MODEL_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).)
image_uri string / required	URI of the Docker image to be used as the custom container for serving predictions. This URI must identify an image in Artifact Registry or Container Registry. Learn more about the [container publishing requirements](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#publishing), including permissions requirements for the Vertex AI Service Agent. The container image is ingested upon ModelService.UploadModel, stored internally, and this original path is afterwards not used. To learn about the requirements for the Docker image itself, see [Custom container requirements](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#). You can use the URI to one of Vertex AI’s [pre-built container images for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers) in this field.
liveness_probe dictionary	Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.
exec dictionary	ExecAction specifies a command to execute.
command list / elements=string	Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem. The command is simply exec’d; it is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work. To use a shell, you need to explicitly call out to that shell. Exit status of 0 is treated as live/healthy and non-zero is unhealthy.
failure_threshold integer	Number of consecutive failures before the probe is considered failed. Defaults to 3. Minimum value is 1. Maps to Kubernetes probe argument ‘failureThreshold’.
grpc dictionary	GrpcAction checks the health of a container using a gRPC service.
port integer	Port number of the gRPC service. Number must be in the range 1 to 65535.
service string	Service is the name of the service to place in the gRPC HealthCheckRequest. See https://github.com/grpc/grpc/blob/master/doc/health-checking.md. If this is not specified, the default behavior is defined by gRPC.
http_get dictionary	HttpGetAction describes an action based on HTTP Get requests.
host string	Host name to connect to, defaults to the model serving container’s IP. You probably want to set “Host” in httpHeaders instead.
http_headers list / elements=dictionary	Custom headers to set in the request. HTTP allows repeated headers.
name string	The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header.
value string	The header field value.
path string	Path to access on the HTTP server.
port integer	Number of the port to access on the container. Number must be in the range 1 to 65535.
scheme string	Scheme to use for connecting to the host. Defaults to HTTP. Acceptable values are “HTTP” or “HTTPS”.
initial_delay_seconds integer	Number of seconds to wait before starting the probe. Defaults to 0. Minimum value is 0. Maps to Kubernetes probe argument ‘initialDelaySeconds’.
period_seconds integer	How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1. Must be less than timeout_seconds. Maps to Kubernetes probe argument ‘periodSeconds’.
success_threshold integer	Number of consecutive successes before the probe is considered successful. Defaults to 1. Minimum value is 1. Maps to Kubernetes probe argument ‘successThreshold’.
tcp_socket dictionary	TcpSocketAction probes the health of a container by opening a TCP socket connection.
host string	Optional: Host name to connect to, defaults to the model serving container’s IP.
port integer	Number of the port to access on the container. Number must be in the range 1 to 65535.
timeout_seconds integer	Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. Must be greater than or equal to period_seconds. Maps to Kubernetes probe argument ‘timeoutSeconds’.
ports list / elements=dictionary	List of ports to expose from the container. Vertex AI sends any prediction requests that it receives to the first port on this list. Vertex AI also sends [liveness and health checks](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#liveness) to this port. If you do not specify this field, it defaults to the following value: ```json [ { "containerPort": 8080 } ] ``` Vertex AI does not use ports other than the first one listed. This field corresponds to the `ports` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).
container_port integer	The number of the port to expose on the pod’s IP address. Must be a valid port number, between 1 and 65535 inclusive.
predict_route string	HTTP path on the container to send prediction requests to. Vertex AI forwards requests sent using projects.locations.endpoints.predict to this path on the container’s IP address and port. Vertex AI then returns the container’s response in the API response. For example, if you set this field to `/foo`, then when Vertex AI receives a prediction request, it forwards the request body in a POST request to the `/foo` path on the port of your container specified by the first value of this `ModelContainerSpec`’s ports field. If you don’t specify this field, it defaults to the following value when you deploy this Model to an Endpoint: `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL:predict`. The placeholders in this value are replaced as follows. ENDPOINT: the last segment (following `endpoints/`) of the Endpoint.name field of the Endpoint where this Model has been deployed. (Vertex AI makes this value available to your container code as the [`AIP_ENDPOINT_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).) DEPLOYED_MODEL: DeployedModel.id of the `DeployedModel`. (Vertex AI makes this value available to your container code as the [`AIP_DEPLOYED_MODEL_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).)
shared_memory_size_mb string	The amount of the VM memory to reserve as the shared memory for the model in megabytes.
startup_probe dictionary	Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.
exec dictionary	ExecAction specifies a command to execute.
command list / elements=string	Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem. The command is simply exec’d; it is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work. To use a shell, you need to explicitly call out to that shell. Exit status of 0 is treated as live/healthy and non-zero is unhealthy.
failure_threshold integer	Number of consecutive failures before the probe is considered failed. Defaults to 3. Minimum value is 1. Maps to Kubernetes probe argument ‘failureThreshold’.
grpc dictionary	GrpcAction checks the health of a container using a gRPC service.
port integer	Port number of the gRPC service. Number must be in the range 1 to 65535.
service string	Service is the name of the service to place in the gRPC HealthCheckRequest. See https://github.com/grpc/grpc/blob/master/doc/health-checking.md. If this is not specified, the default behavior is defined by gRPC.
http_get dictionary	HttpGetAction describes an action based on HTTP Get requests.
host string	Host name to connect to, defaults to the model serving container’s IP. You probably want to set “Host” in httpHeaders instead.
http_headers list / elements=dictionary	Custom headers to set in the request. HTTP allows repeated headers.
name string	The header field name. This will be canonicalized upon output, so case-variant names will be understood as the same header.
value string	The header field value.
path string	Path to access on the HTTP server.
port integer	Number of the port to access on the container. Number must be in the range 1 to 65535.
scheme string	Scheme to use for connecting to the host. Defaults to HTTP. Acceptable values are “HTTP” or “HTTPS”.
initial_delay_seconds integer	Number of seconds to wait before starting the probe. Defaults to 0. Minimum value is 0. Maps to Kubernetes probe argument ‘initialDelaySeconds’.
period_seconds integer	How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1. Must be less than timeout_seconds. Maps to Kubernetes probe argument ‘periodSeconds’.
success_threshold integer	Number of consecutive successes before the probe is considered successful. Defaults to 1. Minimum value is 1. Maps to Kubernetes probe argument ‘successThreshold’.
tcp_socket dictionary	TcpSocketAction probes the health of a container by opening a TCP socket connection.
host string	Optional: Host name to connect to, defaults to the model serving container’s IP.
port integer	Number of the port to access on the container. Number must be in the range 1 to 65535.
timeout_seconds integer	Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1. Must be greater than or equal to period_seconds. Maps to Kubernetes probe argument ‘timeoutSeconds’.
hugging_face_access_token string	The Hugging Face read access token used to access the model artifacts of gated models.
hugging_face_cache_enabled boolean	If true, the model will deploy with a cached version instead of directly downloading the model artifacts from Hugging Face. This is suitable for VPC-SC users with limited internet access. Choices: `false` `true`
model_display_name string	The user-specified display name of the uploaded model. If not set, a default name will be used.
project string	The Google Cloud Platform project to use.
publisher_model_name string	The Model Garden model to deploy. Format: `publishers/{publisher}/models/{publisher_model}@{version_id}`, or `publishers/hf-{hugging-face-author}/models/{hugging-face-model-name}@001`.
scopes list / elements=string	Array of scopes to be used.
service_account_contents jsonarg	The contents of a Service Account JSON file, either in a dictionary or as a JSON string that represents it.
service_account_email string	An optional service account email address if machineaccount is selected and the user does not wish to use the default email.
service_account_file path	The path of a Service Account JSON file if serviceaccount is selected as type.
state string	Whether the resource should exist in GCP. Choices: `"present"` ← (default) `"absent"`
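
As a rough illustration of how the container_spec sub-options fit together, the task below sets two environment variables, references one of them from args using the `$(VAR_NAME)` syntax, and adds an HTTP health probe. It is a sketch under assumptions, not a tested configuration: the nesting of env, args, ports and health_probe under model_config.container_spec is inferred from the order of the parameter table, and the image URI, display name, `--greeting` flag and `/health` path are hypothetical placeholders.

```yaml
- name: Deploy a model with a custom serving container (illustrative sketch)
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    display_name: my-custom-container-endpoint  # hypothetical name
    model_config:
      accept_eula: true
      container_spec:
        # Hypothetical image; replace with an image in Artifact Registry.
        image_uri: us-docker.pkg.dev/my-project/my-repo/my-server:latest
        env:
          - name: VAR_1
            value: foo
          # Later entries may reference earlier ones, so VAR_2 becomes "foo bar".
          - name: VAR_2
            value: "$(VAR_1) bar"
        args:
          # $(VAR_2) is expanded; writing $$(VAR_2) would pass it through literally.
          - "--greeting=$(VAR_2)"  # hypothetical flag of the hypothetical server
        ports:
          - container_port: 8080
        health_probe:
          http_get:
            path: /health  # hypothetical health path
            port: 8080
          failure_threshold: 3
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
```

Swapping the two env entries would leave `$(VAR_1)` unexpanded in VAR_2, because only earlier entries are visible to later ones.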

Notes 

Note

API Reference: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations/deploy
Overview of Model Garden Guide: https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models
Overview of self-deployed models Guide: https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/self-deployed-models
Use models in Model Garden Guide: https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/use-models
For authentication, you can set auth_kind using the GCP_AUTH_KIND env variable.
For authentication, you can set service_account_file using the GCP_SERVICE_ACCOUNT_FILE env variable.
For authentication, you can set service_account_contents using the GCP_SERVICE_ACCOUNT_CONTENTS env variable.
For authentication, you can set service_account_email using the GCP_SERVICE_ACCOUNT_EMAIL env variable.
For authentication, you can set access_token using the GCP_ACCESS_TOKEN env variable.
For authentication, you can set scopes using the GCP_SCOPES env variable.
Environment variable values will only be used if the playbook values are not set.
The service_account_email, service_account_file, service_account_contents and access_token options are mutually exclusive.
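
The environment-variable fallback described in these notes can be sketched as follows. The task omits auth_kind and service_account_file and assumes that GCP_AUTH_KIND and GCP_SERVICE_ACCOUNT_FILE are already exported in the environment where the module executes; the key file path in the comment is a hypothetical placeholder.

```yaml
# Assumed to be exported before the play runs, for example:
#   GCP_AUTH_KIND=serviceaccount
#   GCP_SERVICE_ACCOUNT_FILE=/path/to/key.json   (hypothetical path)
- name: Deploy Basic Model using credentials from environment variables
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    endpoint_config:
      endpoint_display_name: my-endpoint
    model_config:
      model_display_name: my-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    # No auth_kind or service_account_file here: the module falls back to
    # the GCP_* environment variables because the playbook values are unset.
```

If the same option is set both in the task and in the environment, the playbook value is used.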

Examples 

- name: Deploy Basic Model
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    endpoint_config:
      endpoint_display_name: my-endpoint
    model_config:
      model_display_name: my-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"

################################################################################

- name: Deploy Hugging Face Model
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    hugging_face_model_id: Qwen/Qwen3-0.6B
    endpoint_config:
      endpoint_display_name: huggingface-endpoint
    model_config:
      model_display_name: huggingface-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"

################################################################################

- name: Deploy Basic Model with PSC Endpoint
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    display_name: my-psc-endpoint
    endpoint_config:
      private_service_connect_config:
        enable_private_service_connect: true
        project_allowlist:
          - my-project-id
    model_config:
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
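
The autoscaling override described under autoscaling_metric_specs (target CPU utilization of 80 instead of the default 60) might look like the sketch below. The nesting of machine_spec, the replica counts and autoscaling_metric_specs under deploy_config.dedicated_resources is inferred from the parameter table, and the machine type, accelerator choice, replica counts and display name are illustrative assumptions rather than recommended settings for this model.

```yaml
- name: Deploy with dedicated resources and a CPU autoscaling target of 80
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    display_name: my-autoscaled-endpoint  # hypothetical name
    deploy_config:
      dedicated_resources:
        machine_spec:
          machine_type: g2-standard-16  # illustrative choice
          accelerator_type: NVIDIA_L4
          accelerator_count: 1
        min_replica_count: 1
        max_replica_count: 3
        autoscaling_metric_specs:
          # At most one entry per metric; the accelerator duty cycle metric
          # keeps its default target of 60 because it is not listed here.
          - metric_name: aiplatform.googleapis.com/prediction/online/cpu/utilization
            target: 80
    model_config:
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
```

Because accelerator_count is above 0, scaling up would be triggered when either metric exceeds its target, and scaling down only when both are below their targets.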

Return Values 

Common return values are documented here; the following fields are unique to this module:

Key	Description
changed boolean	Whether the resource was changed. Returned: always
deployedModelDisplayName string	Output only. The display name assigned to the model deployed to the endpoint. This is not required to delete the resource but is used for debug logging. Returned: success
deployedModelId string	Output only. The unique numeric ID that Vertex AI assigns to the model at the time it is deployed to the endpoint. It is required to undeploy the model from the endpoint during resource deletion as described in https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/undeployModel. Returned: success
state string	The current state of the resource. Returned: always
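
A minimal sketch of reading these return values in a playbook follows. The registered variable name `deployment` is an arbitrary choice for illustration, and the second task only prints the fields listed above.

```yaml
- name: Deploy Basic Model and capture the result
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    endpoint_config:
      endpoint_display_name: my-endpoint
    model_config:
      model_display_name: my-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: deployment  # hypothetical variable name

- name: Show the deployed model details
  ansible.builtin.debug:
    msg: >-
      Deployed model {{ deployment.deployedModelDisplayName }}
      has ID {{ deployment.deployedModelId }}
      (changed: {{ deployment.changed }})
  # deployedModelId and deployedModelDisplayName are returned on success only.
  when: deployment is succeeded
```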

Authors

Google Inc. (@googlecloudplatform)
