google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment module – Creates a GCP VertexAI.EndpointWithModelGardenDeployment resource

Note

This module is part of the google.cloud collection (version 1.12.0).

You might already have this collection installed if you are using the ansible package. It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install google.cloud. You need further requirements to be able to use this module; see Requirements for details.

To use it in a playbook, specify: google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment.

Synopsis

  • Create an Endpoint and deploy a Model Garden model to it.

Requirements

The below requirements are needed on the host that executes this module.

  • python >= 3.8

  • requests >= 2.18.4

  • google-auth >= 2.25.1

Parameters

Parameter

Comments

access_token

string

The access token used to authenticate.

auth_kind

string / required

The type of credential used.

Choices:

  • "accesstoken"

  • "application"

  • "machineaccount"

  • "serviceaccount"

deploy_config

dictionary

The deploy config to use for the deployment.

dedicated_resources

dictionary

A description of resources that are dedicated to a DeployedModel or DeployedIndex, and that need a higher degree of manual configuration.

autoscaling_metric_specs

list / elements=dictionary

The metric specifications that override the target value of a resource utilization metric (CPU utilization, accelerator’s duty cycle, and so on). The target defaults to 60 if not set.

At most one entry is allowed per metric.

If machine_spec.accelerator_count is above 0, autoscaling is based on both the CPU utilization and the accelerator’s duty cycle metrics: it scales up when either metric exceeds its target value and scales down when both metrics are under their target values.

The default target value is 60 for both metrics.

If machine_spec.accelerator_count is 0, the autoscaling will be based on CPU utilization metric only with default target value 60 if not explicitly set.

For example, in the case of Online Prediction, if you want to override target CPU utilization to 80, you should set autoscaling_metric_specs.metric_name to `aiplatform.googleapis.com/prediction/online/cpu/utilization` and autoscaling_metric_specs.target to `80`.

metric_name

string / required

The resource metric name.

Supported metrics for Online Prediction:

  • `aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle`

  • `aiplatform.googleapis.com/prediction/online/cpu/utilization`

target

integer

The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change.

The default value is 60 (representing 60%) if not provided.
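As a minimal sketch of the override described for autoscaling_metric_specs, the fragment below raises the CPU utilization target to 80. It is a fragment of the module's arguments, not a complete task; the replica counts are illustrative assumptions, and the nesting follows the parameter order shown in this table.

```yaml
# Fragment of the module arguments; place it alongside the other task parameters.
deploy_config:
  dedicated_resources:
    min_replica_count: 1            # required; must be at least 1
    max_replica_count: 3            # assumed upper bound for this sketch
    machine_spec:
      machine_type: n1-standard-2   # the documented default for a DeployedModel
    autoscaling_metric_specs:
      # At most one entry per metric; target falls back to 60 when omitted.
      - metric_name: aiplatform.googleapis.com/prediction/online/cpu/utilization
        target: 80
```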

machine_spec

dictionary / required

Specification of a single machine.

accelerator_count

integer

The number of accelerators to attach to the machine.

accelerator_type

string

Possible values: ACCELERATOR_TYPE_UNSPECIFIED NVIDIA_TESLA_K80 NVIDIA_TESLA_P100 NVIDIA_TESLA_V100 NVIDIA_TESLA_P4 NVIDIA_TESLA_T4 NVIDIA_TESLA_A100 NVIDIA_A100_80GB NVIDIA_L4 NVIDIA_H100_80GB NVIDIA_H100_MEGA_80GB NVIDIA_H200_141GB NVIDIA_B200 TPU_V2 TPU_V3 TPU_V4_POD TPU_V5_LITEPOD.

machine_type

string

The type of the machine.

See the [list of machine types supported for prediction](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#machine-types) and the [list of machine types supported for custom training](https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types).

For DeployedModel this field is optional, and the default value is `n1-standard-2`.

For BatchPredictionJob or as part of WorkerPoolSpec this field is required.

multihost_gpu_node_count

integer

The number of nodes per replica for multihost GPU deployments.

reservation_affinity

dictionary

A ReservationAffinity can be used to configure a Vertex AI resource (e.g., a DeployedModel) to draw its Compute Engine resources from a Shared Reservation, or exclusively from on-demand capacity.

key

string

Corresponds to the label key of a reservation resource.

To target a SPECIFIC_RESERVATION by name, use `compute.googleapis.com/reservation-name` as the key and specify the name of your reservation as its value.

reservation_affinity_type

string / required

Specifies the reservation affinity type.

Possible values: TYPE_UNSPECIFIED NO_RESERVATION ANY_RESERVATION SPECIFIC_RESERVATION.

values

list / elements=string

Corresponds to the label values of a reservation resource.

This must be the full resource name of the reservation or reservation block.
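The fragment below is a hedged sketch of targeting one specific reservation by name. It assumes that reservation_affinity nests under machine_spec, as the parameter order in this table suggests. The machine shape and the reservation resource name are placeholders, not values taken from the module documentation.

```yaml
# Fragment of the module arguments; place it alongside the other task parameters.
deploy_config:
  dedicated_resources:
    min_replica_count: 1
    machine_spec:
      machine_type: a2-highgpu-1g           # illustrative; should match the reservation
      accelerator_type: NVIDIA_TESLA_A100
      accelerator_count: 1
      reservation_affinity:
        reservation_affinity_type: SPECIFIC_RESERVATION
        key: compute.googleapis.com/reservation-name
        values:
          # Placeholder: substitute your own project, zone, and reservation.
          - projects/my-project-id/zones/us-central1-a/reservations/my-reservation
```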

tpu_topology

string

The topology of the TPUs.

Corresponds to the TPU topologies available from GKE.

(Example: tpu_topology: “2x2x1”).

max_replica_count

integer

The maximum number of replicas onto which the model may be deployed when traffic against it increases.

If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages).

If traffic increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped.

If this value is not provided, min_replica_count is used as the default value.

The value of this field impacts the charge against Vertex CPU and GPU quotas.

Specifically, you will be charged for (max_replica_count * number of cores in the selected machine type) and (max_replica_count * number of GPUs per replica in the selected machine type).

min_replica_count

integer / required

The minimum number of machine replicas on which the model is always deployed.

This value must be greater than or equal to 1.

If traffic increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.

required_replica_count

integer

Number of required available replicas for the deployment to succeed.

This field is only needed when partial deployment/mutation is desired.

If set, the deploy/mutate operation will succeed once available_replica_count reaches required_replica_count, and the rest of the replicas will be retried.

If not set, the default required_replica_count will be min_replica_count.

spot

boolean

If true, schedule the deployment workload on [spot VMs](https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms).

Choices:

  • false

  • true

fast_tryout_enabled

boolean

If true, enable the QMT fast tryout feature for this model if possible.

Choices:

  • false

  • true

system_labels

dictionary

System labels for Model Garden deployments.

These labels are managed by Google and for tracking purposes only.

display_name

string / required

The user-specified display name.

This will be the default display name for both the endpoint and the deployed model.

endpoint_config

dictionary

The endpoint config to use for the deployment.

dedicated_endpoint_enabled

boolean

If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns].

Your request to the dedicated DNS will be isolated from other users’ traffic and will have better performance and reliability.

Note: Once you enable a dedicated endpoint, you won’t be able to send requests to the shared DNS {region}-aiplatform.googleapis.com.

This limitation will be removed soon.

Choices:

  • false

  • true

endpoint_display_name

string

The user-specified display name of the endpoint.

If not set, a default name will be used.

private_service_connect_config

dictionary

The configuration for Private Service Connect (PSC).

enable_private_service_connect

boolean / required

If true, expose the IndexEndpoint via private service connect.

Choices:

  • false

  • true

project_allowlist

list / elements=string

A list of Projects from which the forwarding rule will target the service attachment.

psc_automation_configs

dictionary

PSC config that is used to automatically create PSC endpoints in the user projects.

error_message

string

Output only.

Error message if the PSC service automation failed.

forwarding_rule

string

Output only.

Forwarding rule created by the PSC service automation.

ip_address

string

Output only.

IP address rule created by the PSC service automation.

network

string / required

The full name of the Google Compute Engine network.

Format: projects/{project}/global/networks/{network}.

project_id

string / required

Project id used to create forwarding rule.

state

string

Output only.

The state of the PSC service automation.

Choices:

  • "PSC_AUTOMATION_STATE_UNSPECIFIED"

  • "PSC_AUTOMATION_STATE_SUCCESSFUL"

  • "PSC_AUTOMATION_STATE_FAILED"

service_attachment

string

Output only.

The name of the generated service attachment resource.

This is only populated if the endpoint is deployed with PrivateServiceConnect.

env_type

string

Specifies which Ansible environment you’re running this module within.

This should not be set unless you know what you’re doing.

This only alters the User Agent string for any API requests.

hugging_face_model_id

string

The Hugging Face model to deploy.

Format: Hugging Face model ID like `google/gemma-2-2b-it`.

location

string / required

Resource ID segment making up resource `location`.

It identifies the resource within its parent collection as described in https://google.aip.dev/122.

model_config

dictionary

The model config to use for the deployment.

accept_eula

boolean

Whether the user accepts the End User License Agreement (EULA) for the model.

Choices:

  • false

  • true

container_spec

dictionary

Specification of a container for serving predictions.

Some fields in this message correspond to fields in the [Kubernetes Container v1 core specification](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).

args

list / elements=string

Specifies arguments for the command that runs when the container starts.

This overrides the container’s [`CMD`](https://docs.docker.com/engine/reference/builder/#cmd).

Specify this field as an array of executable and arguments, similar to a Docker `CMD`’s “default parameters” form.

If you don’t specify this field but do specify the command field, then the command from the `command` field runs without any additional arguments.

See the [Kubernetes documentation about how the `command` and `args` fields interact with a container’s `ENTRYPOINT` and `CMD`](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes).

If you don’t specify this field and don’t specify the `command` field, then the container’s [`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#cmd) and `CMD` determine what runs based on their default behavior.

See the Docker documentation about [how `CMD` and `ENTRYPOINT` interact](https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact).

In this field, you can reference [environment variables set by Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables) and environment variables set in the env field.

You cannot reference environment variables set in the Docker image.

In order for environment variables to be expanded, reference them by using the following syntax: `$(VARIABLE_NAME)`. Note that this differs from Bash variable expansion, which does not use parentheses.

If a variable cannot be resolved, the reference in the input string is used unchanged.

To avoid variable expansion, you can escape this syntax with `$$`; for example: `$$(VARIABLE_NAME)`.

This field corresponds to the `args` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).

command

list / elements=string

Specifies the command that runs when the container starts.

This overrides the container’s [ENTRYPOINT](https://docs.docker.com/engine/reference/builder/#entrypoint).

Specify this field as an array of executable and arguments, similar to a Docker `ENTRYPOINT`’s “exec” form, not its “shell” form.

If you do not specify this field, then the container’s `ENTRYPOINT` runs, in conjunction with the args field or the container’s [`CMD`](https://docs.docker.com/engine/reference/builder/#cmd), if either exists.

If this field is not specified and the container does not have an `ENTRYPOINT`, then refer to the Docker documentation about [how `CMD` and `ENTRYPOINT` interact](https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact).

If you specify this field, then you can also specify the `args` field to provide additional arguments for this command.

However, if you specify this field, then the container’s `CMD` is ignored.

See the [Kubernetes documentation about how the `command` and `args` fields interact with a container’s `ENTRYPOINT` and `CMD`](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes).

In this field, you can reference [environment variables set by Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables) and environment variables set in the env field.

You cannot reference environment variables set in the Docker image.

In order for environment variables to be expanded, reference them by using the following syntax: `$(VARIABLE_NAME)`. Note that this differs from Bash variable expansion, which does not use parentheses.

If a variable cannot be resolved, the reference in the input string is used unchanged.

To avoid variable expansion, you can escape this syntax with `$$`; for example: `$$(VARIABLE_NAME)`.

This field corresponds to the `command` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).

deployment_timeout

string

Deployment timeout.

Limit for deployment timeout is 2 hours.

env

list / elements=dictionary

List of environment variables to set in the container.

After the container starts running, code running in the container can read these environment variables.

Additionally, the command and args fields can reference these variables.

Later entries in this list can also reference earlier entries.

For example, the following example sets the variable `VAR_2` to have the value `foo bar`:

```json
[
  { "name": "VAR_1", "value": "foo" },
  { "name": "VAR_2", "value": "$(VAR_1) bar" }
]
```

If you switch the order of the variables in the example, then the expansion does not occur.

This field corresponds to the `env` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).

name

string / required

Name of the environment variable.

Must be a valid C identifier.

value

string / required

Variables that reference a $(VAR_NAME) are expanded using the previously defined environment variables in the container and any service environment variables.

If a variable cannot be resolved, the reference in the input string will be unchanged.

The $(VAR_NAME) syntax can be escaped with a double $$, i.e. $$(VAR_NAME).

Escaped references will never be expanded, regardless of whether the variable exists or not.
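The expansion rules for env can be sketched in the module's own YAML form. The fragment below mirrors the `VAR_1` / `VAR_2` case and adds an escaped reference; the image URI, the `VAR_3` entry, and the `--greeting` flag are hypothetical and only serve the illustration.

```yaml
# Fragment of the module arguments; place it alongside the other task parameters.
model_config:
  container_spec:
    image_uri: us-docker.pkg.dev/my-project-id/my-repo/my-server:latest  # placeholder image
    env:
      - name: VAR_1
        value: foo
      - name: VAR_2
        value: "$(VAR_1) bar"     # VAR_1 is defined earlier, so this becomes "foo bar"
      - name: VAR_3
        value: "$$(VAR_1) bar"    # escaped with $$, so the reference is not expanded
    args:
      - "--greeting=$(VAR_2)"     # hypothetical flag; args may reference env entries
```

Order matters: if `VAR_2` were listed before `VAR_1`, the reference would be left unchanged, as the env description states.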

grpc_ports

list / elements=dictionary

List of ports to expose from the container.

Vertex AI sends gRPC prediction requests that it receives to the first port on this list.

Vertex AI also sends liveness and health checks to this port.

If you do not specify this field, gRPC requests to the container will be disabled.

Vertex AI does not use ports other than the first one listed.

This field corresponds to the `ports` field of the Kubernetes Containers v1 core API.

container_port

integer

The number of the port to expose on the pod’s IP address.

Must be a valid port number, between 1 and 65535 inclusive.

health_probe

dictionary

Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.

exec

dictionary

ExecAction specifies a command to execute.

command

list / elements=string

Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem.

The command is simply exec’d; it is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work.

To use a shell, you need to explicitly call out to that shell.

Exit status of 0 is treated as live/healthy and non-zero is unhealthy.

failure_threshold

integer

Number of consecutive failures before the probe is considered failed.

Defaults to 3.

Minimum value is 1.

Maps to Kubernetes probe argument ‘failureThreshold’.

grpc

dictionary

GrpcAction checks the health of a container using a gRPC service.

port

integer

Port number of the gRPC service.

Number must be in the range 1 to 65535.

service

string

Service is the name of the service to place in the gRPC HealthCheckRequest.

See https://github.com/grpc/grpc/blob/master/doc/health-checking.md.

If this is not specified, the default behavior is defined by gRPC.

http_get

dictionary

HttpGetAction describes an action based on HTTP Get requests.

host

string

Host name to connect to, defaults to the model serving container’s IP.

You probably want to set “Host” in httpHeaders instead.

http_headers

list / elements=dictionary

Custom headers to set in the request.

HTTP allows repeated headers.

name

string

The header field name.

This will be canonicalized upon output, so case-variant names will be understood as the same header.

value

string

The header field value.

path

string

Path to access on the HTTP server.

port

integer

Number of the port to access on the container.

Number must be in the range 1 to 65535.

scheme

string

Scheme to use for connecting to the host.

Defaults to HTTP.

Acceptable values are “HTTP” or “HTTPS”.

initial_delay_seconds

integer

Number of seconds to wait before starting the probe.

Defaults to 0.

Minimum value is 0.

Maps to Kubernetes probe argument ‘initialDelaySeconds’.

period_seconds

integer

How often (in seconds) to perform the probe.

Defaults to 10 seconds.

Minimum value is 1.

Must be less than timeout_seconds.

Maps to Kubernetes probe argument ‘periodSeconds’.

success_threshold

integer

Number of consecutive successes before the probe is considered successful.

Defaults to 1.

Minimum value is 1.

Maps to Kubernetes probe argument ‘successThreshold’.

tcp_socket

dictionary

TcpSocketAction probes the health of a container by opening a TCP socket connection.

host

string

Optional: Host name to connect to, defaults to the model serving container’s IP.

port

integer

Number of the port to access on the container.

Number must be in the range 1 to 65535.

timeout_seconds

integer

Number of seconds after which the probe times out.

Defaults to 1 second.

Minimum value is 1.

Must be greater than or equal to period_seconds.

Maps to Kubernetes probe argument ‘timeoutSeconds’.
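A minimal sketch of an HTTP-based health_probe follows. The image URI, the `/health` path, and the 30-second initial delay are assumptions made for illustration; period_seconds and timeout_seconds are left unset so that the service applies its own defaults.

```yaml
# Fragment of the module arguments; place it alongside the other task parameters.
model_config:
  container_spec:
    image_uri: us-docker.pkg.dev/my-project-id/my-repo/my-server:latest  # placeholder image
    ports:
      - container_port: 8080
    health_probe:
      http_get:
        path: /health              # hypothetical path served by the container
        port: 8080
        scheme: HTTP
      initial_delay_seconds: 30    # assumed warm-up time before the first probe
      failure_threshold: 3         # documented default, written out for clarity
      success_threshold: 1         # documented default, written out for clarity
```

The same structure applies to liveness_probe and startup_probe, which accept the same sub-fields.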

health_route

string

HTTP path on the container to send health checks to.

Vertex AI intermittently sends GET requests to this path on the container’s IP address and port to check that the container is healthy.

Read more about [health checks](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#health).

For example, if you set this field to `/bar`, then Vertex AI intermittently sends a GET request to the `/bar` path on the port of your container specified by the first value of this `ModelContainerSpec`’s ports field.

If you don’t specify this field, it defaults to the following value when you deploy this Model to an Endpoint: `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL:predict`. The placeholders in this value are replaced as follows:

  • ENDPOINT: The last segment (following `endpoints/`) of the Endpoint.name field of the Endpoint where this Model has been deployed. (Vertex AI makes this value available to your container code as the [`AIP_ENDPOINT_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).)

  • DEPLOYED_MODEL: DeployedModel.id of the `DeployedModel`. (Vertex AI makes this value available to your container code as the [`AIP_DEPLOYED_MODEL_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).)

image_uri

string / required

URI of the Docker image to be used as the custom container for serving predictions.

This URI must identify an image in Artifact Registry or Container Registry.

Learn more about the [container publishing requirements](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#publishing), including permissions requirements for the Vertex AI Service Agent.

The container image is ingested upon ModelService.UploadModel, stored internally, and this original path is afterwards not used.

To learn about the requirements for the Docker image itself, see [Custom container requirements](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#).

You can use the URI to one of Vertex AI’s [pre-built container images for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers) in this field.

liveness_probe

dictionary

Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.

exec

dictionary

ExecAction specifies a command to execute.

command

list / elements=string

Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem.

The command is simply exec’d; it is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work.

To use a shell, you need to explicitly call out to that shell.

Exit status of 0 is treated as live/healthy and non-zero is unhealthy.

failure_threshold

integer

Number of consecutive failures before the probe is considered failed.

Defaults to 3.

Minimum value is 1.

Maps to Kubernetes probe argument ‘failureThreshold’.

grpc

dictionary

GrpcAction checks the health of a container using a gRPC service.

port

integer

Port number of the gRPC service.

Number must be in the range 1 to 65535.

service

string

Service is the name of the service to place in the gRPC HealthCheckRequest.

See https://github.com/grpc/grpc/blob/master/doc/health-checking.md.

If this is not specified, the default behavior is defined by gRPC.

http_get

dictionary

HttpGetAction describes an action based on HTTP Get requests.

host

string

Host name to connect to, defaults to the model serving container’s IP.

You probably want to set “Host” in httpHeaders instead.

http_headers

list / elements=dictionary

Custom headers to set in the request.

HTTP allows repeated headers.

name

string

The header field name.

This will be canonicalized upon output, so case-variant names will be understood as the same header.

value

string

The header field value.

path

string

Path to access on the HTTP server.

port

integer

Number of the port to access on the container.

Number must be in the range 1 to 65535.

scheme

string

Scheme to use for connecting to the host.

Defaults to HTTP.

Acceptable values are “HTTP” or “HTTPS”.

initial_delay_seconds

integer

Number of seconds to wait before starting the probe.

Defaults to 0.

Minimum value is 0.

Maps to Kubernetes probe argument ‘initialDelaySeconds’.

period_seconds

integer

How often (in seconds) to perform the probe.

Defaults to 10 seconds.

Minimum value is 1.

Must be less than timeout_seconds.

Maps to Kubernetes probe argument ‘periodSeconds’.

success_threshold

integer

Number of consecutive successes before the probe is considered successful.

Defaults to 1.

Minimum value is 1.

Maps to Kubernetes probe argument ‘successThreshold’.

tcp_socket

dictionary

TcpSocketAction probes the health of a container by opening a TCP socket connection.

host

string

Optional: Host name to connect to, defaults to the model serving container’s IP.

port

integer

Number of the port to access on the container.

Number must be in the range 1 to 65535.

timeout_seconds

integer

Number of seconds after which the probe times out.

Defaults to 1 second.

Minimum value is 1.

Must be greater than or equal to period_seconds.

Maps to Kubernetes probe argument ‘timeoutSeconds’.

ports

list / elements=dictionary

List of ports to expose from the container.

Vertex AI sends any prediction requests that it receives to the first port on this list.

Vertex AI also sends [liveness and health checks](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#liveness) to this port.

If you do not specify this field, it defaults to the following value:

```json
[
  { "containerPort": 8080 }
]
```

Vertex AI does not use ports other than the first one listed.

This field corresponds to the `ports` field of the Kubernetes Containers [v1 core API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#container-v1-core).

container_port

integer

The number of the port to expose on the pod’s IP address.

Must be a valid port number, between 1 and 65535 inclusive.

predict_route

string

HTTP path on the container to send prediction requests to.

Vertex AI forwards requests sent using projects.locations.endpoints.predict to this path on the container’s IP address and port.

Vertex AI then returns the container’s response in the API response.

For example, if you set this field to `/foo`, then when Vertex AI receives a prediction request, it forwards the request body in a POST request to the `/foo` path on the port of your container specified by the first value of this `ModelContainerSpec`’s ports field.

If you don’t specify this field, it defaults to the following value when you deploy this Model to an Endpoint: `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL:predict`. The placeholders in this value are replaced as follows:

  • ENDPOINT: The last segment (following `endpoints/`) of the Endpoint.name field of the Endpoint where this Model has been deployed. (Vertex AI makes this value available to your container code as the [`AIP_ENDPOINT_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).)

  • DEPLOYED_MODEL: DeployedModel.id of the `DeployedModel`. (Vertex AI makes this value available to your container code as the [`AIP_DEPLOYED_MODEL_ID` environment variable](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#aip-variables).)

shared_memory_size_mb

string

The amount of the VM memory to reserve as the shared memory for the model in megabytes.

startup_probe

dictionary

Probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic.

exec

dictionary

ExecAction specifies a command to execute.

command

list / elements=string

Command is the command line to execute inside the container; the working directory for the command is root (‘/’) in the container’s filesystem.

The command is simply exec’d; it is not run inside a shell, so traditional shell instructions (‘|’, etc.) won’t work.

To use a shell, you need to explicitly call out to that shell.

Exit status of 0 is treated as live/healthy and non-zero is unhealthy.

failure_threshold

integer

Number of consecutive failures before the probe is considered failed.

Defaults to 3.

Minimum value is 1.

Maps to Kubernetes probe argument ‘failureThreshold’.

grpc

dictionary

GrpcAction checks the health of a container using a gRPC service.

port

integer

Port number of the gRPC service.

Number must be in the range 1 to 65535.

service

string

Service is the name of the service to place in the gRPC HealthCheckRequest.

See https://github.com/grpc/grpc/blob/master/doc/health-checking.md.

If this is not specified, the default behavior is defined by gRPC.

http_get

dictionary

HttpGetAction describes an action based on HTTP Get requests.

host

string

Host name to connect to, defaults to the model serving container’s IP.

You probably want to set “Host” in httpHeaders instead.

http_headers

list / elements=dictionary

Custom headers to set in the request.

HTTP allows repeated headers.

name

string

The header field name.

This will be canonicalized upon output, so case-variant names will be understood as the same header.

value

string

The header field value.

path

string

Path to access on the HTTP server.

port

integer

Number of the port to access on the container.

Number must be in the range 1 to 65535.

scheme

string

Scheme to use for connecting to the host.

Defaults to HTTP.

Acceptable values are “HTTP” or “HTTPS”.

initial_delay_seconds

integer

Number of seconds to wait before starting the probe.

Defaults to 0.

Minimum value is 0.

Maps to Kubernetes probe argument ‘initialDelaySeconds’.

period_seconds

integer

How often (in seconds) to perform the probe.

Defaults to 10 seconds.

Minimum value is 1.

Must be less than timeout_seconds.

Maps to Kubernetes probe argument ‘periodSeconds’.

success_threshold

integer

Number of consecutive successes before the probe is considered successful.

Defaults to 1.

Minimum value is 1.

Maps to Kubernetes probe argument ‘successThreshold’.

tcp_socket

dictionary

TcpSocketAction probes the health of a container by opening a TCP socket connection.

host

string

Optional: Host name to connect to, defaults to the model serving container’s IP.

port

integer

Number of the port to access on the container.

Number must be in the range 1 to 65535.

timeout_seconds

integer

Number of seconds after which the probe times out.

Defaults to 1 second.

Minimum value is 1.

Must be greater than or equal to period_seconds.

Maps to Kubernetes probe argument ‘timeoutSeconds’.

hugging_face_access_token

string

The Hugging Face read access token used to access the model artifacts of gated models.

hugging_face_cache_enabled

boolean

If true, the model will deploy with a cached version instead of directly downloading the model artifacts from Hugging Face.

This is suitable for VPC-SC users with limited internet access.

Choices:

  • false

  • true

model_display_name

string

The user-specified display name of the uploaded model.

If not set, a default name will be used.

project

string

The Google Cloud Platform project to use.

publisher_model_name

string

scopes

list / elements=string

Array of scopes to be used.

service_account_contents

jsonarg

The contents of a Service Account JSON file, either in a dictionary or as a JSON string that represents it.

service_account_email

string

An optional service account email address if machineaccount is selected and the user does not wish to use the default email.

service_account_file

path

The path of a Service Account JSON file if serviceaccount is selected as type.

state

string

Whether the resource should exist in GCP.

Choices:

  • "present" ← (default)

  • "absent"

Examples

- name: Deploy Basic Model
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    endpoint_config:
      endpoint_display_name: my-endpoint
    model_config:
      model_display_name: my-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"

################################################################################

- name: Deploy Hugging Face Model
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    hugging_face_model_id: Qwen/Qwen3-0.6B
    endpoint_config:
      endpoint_display_name: huggingface-endpoint
    model_config:
      model_display_name: huggingface-model
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"

################################################################################

- name: Deploy Basic Model with PSC Endpoint
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    display_name: my-psc-endpoint
    endpoint_config:
      private_service_connect_config:
        enable_private_service_connect: true
        project_allowlist:
          - my-project-id
    model_config:
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
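
################################################################################

# Illustrative sketch, not one of the collection's own examples: deploys a gated
# Hugging Face model onto dedicated resources. The display name, machine shape,
# replica counts, and the hf_read_token variable are assumptions; adjust them
# to the model and quota you actually have.
- name: Deploy Gated Hugging Face Model on Dedicated Resources
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    hugging_face_model_id: google/gemma-2-2b-it
    display_name: gemma-dedicated
    deploy_config:
      dedicated_resources:
        min_replica_count: 1
        max_replica_count: 2
        machine_spec:
          machine_type: g2-standard-12
          accelerator_type: NVIDIA_L4
          accelerator_count: 1
    model_config:
      accept_eula: true
      # Read token for gated model artifacts; hf_read_token is a hypothetical variable.
      hugging_face_access_token: "{{ hf_read_token }}"
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"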

Return Values

Common return values are documented here; the following are the fields unique to this module:

Key

Description

changed

boolean

Whether the resource was changed.

Returned: always

deployedModelDisplayName

string

Output only.

The display name assigned to the model deployed to the endpoint.

This is not required to delete the resource but is used for debug logging.

Returned: success

deployedModelId

string

Output only.

The unique numeric ID that Vertex AI assigns to the model at the time it is deployed to the endpoint.

It is required to undeploy the model from the endpoint during resource deletion as described in https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/undeployModel.

Returned: success

state

string

The current state of the resource.

Returned: always
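The return values above can be captured with `register` and reused in later tasks. The following is a minimal sketch under stated assumptions: the display name and the `deployment` variable name are arbitrary, and the `gcp_*` variables are the same placeholders used in the Examples section.

```yaml
- name: Deploy a Model Garden model and keep the result
  google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
    state: present
    display_name: my-deployment            # hypothetical name
    publisher_model_name: publishers/google/models/paligemma@paligemma-224-float32
    model_config:
      accept_eula: true
    location: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: deployment                     # variable name is arbitrary

- name: Show the deployed model ID
  ansible.builtin.debug:
    msg: "Deployed model {{ deployment.deployedModelDisplayName }} has ID {{ deployment.deployedModelId }}"
  when: deployment is succeeded
```

Since deployedModelId is documented as required to undeploy the model, recording it somewhere durable may be worthwhile if the resource is later removed outside of this playbook.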

Authors

  • Google Inc. (@googlecloudplatform)