google.cloud.gcp_vertexai_endpoint module – Creates a GCP VertexAI.Endpoint resource

Note

This module is part of the google.cloud collection (version 1.13.0).

You might already have this collection installed if you are using the ansible package. It is not included in ansible-core. To check whether it is installed, run ansible-galaxy collection list.

To install it, use: ansible-galaxy collection install google.cloud. This module has further requirements; see Requirements for details.

To use it in a playbook, specify: google.cloud.gcp_vertexai_endpoint.
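As an alternative to installing the collection ad hoc, it can be pinned in a collection requirements file. The following is a minimal sketch; the file name `requirements.yml` is a common convention rather than something this page mandates, and pinning the version is optional.

```yaml
# requirements.yml (conventional file name, assumed here)
collections:
  - name: google.cloud
    version: "1.13.0"   # the collection version this page documents
```

The file can then be passed to `ansible-galaxy collection install -r requirements.yml`.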

Synopsis 

An Endpoint is the resource into which Models are deployed; once a Model is deployed, the Endpoint is called to obtain predictions and explanations.

Requirements 

The below requirements are needed on the host that executes this module.

python >= 3.8
requests >= 2.18.4
google-auth >= 2.25.1
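The Python packages above could be installed on the executing host with a task like the following. This is a sketch only: it assumes `pip` is available for the interpreter Ansible uses on that host, and that installing into that interpreter's environment is acceptable.

```yaml
- name: Install the Python libraries required by the module
  ansible.builtin.pip:
    name:
      - "requests>=2.18.4"
      - "google-auth>=2.25.1"
```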

Parameters 

Parameter	Comments
access_token string	The access token used to authenticate.
auth_kind string / required	The type of credential used. Choices: `"accesstoken"` `"application"` `"machineaccount"` `"serviceaccount"`
dedicated_endpoint_enabled boolean	If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns]. Requests to the dedicated DNS are isolated from other users’ traffic and have better performance and reliability. Note: Once you enable the dedicated endpoint, you will not be able to send requests to the shared DNS {region}-aiplatform.googleapis.com. The limitation will be removed soon. Choices: `false` `true`
description string	The description of the Endpoint.
display_name string / required	The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
encryption_spec dictionary	Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.
kms_key_name string / required	The Cloud KMS resource identifier of the customer managed encryption key used to protect a resource. Has the form: `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`. The key needs to be in the same region as where the compute resource is created.
env_type string	Specifies which Ansible environment you’re running this module within. This should not be set unless you know what you’re doing. This only alters the User Agent string for any API requests.
labels dictionary	The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
location string / required	The location for the resource.
name string / required	The resource name of the Endpoint. The name must be numeric with no leading zeros and can be at most 10 digits.
network string	The full name of the Google Compute Engine [network](https://cloud.google.com/compute/docs/networks-and-firewalls#networks) to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network. [Format](https://cloud.google.com/compute/docs/reference/rest/v1/networks/insert): `projects/{project}/global/networks/{network}`, where `{project}` is a project number, as in `12345`, and `{network}` is a network name. Only one of `network` or `private_service_connect_config` can be set.
predict_request_response_logging_config dictionary	Configures the request-response logging for online prediction.
bigquery_destination dictionary	BigQuery table for logging. If only given a project, a new dataset will be created with name `logging_<endpoint-display-name>_<endpoint-id>`, where `<endpoint-display-name>` will be made BigQuery-dataset-name compatible (e.g. most special characters will become underscores). If no table name is given, a new table will be created with name `request_response_logging`.
output_uri string	BigQuery URI to a project or table, up to 2000 characters long. When only the project is specified, the Dataset and Table are created. When the full table reference is specified, the Dataset must exist and the Table must not exist. Accepted form: a BigQuery path, for example `bq://projectId`, `bq://projectId.bqDatasetId`, or `bq://projectId.bqDatasetId.bqTableId`.
enabled boolean	If logging is enabled or not. Choices: `false` `true`
sampling_rate string	Percentage of requests to be logged, expressed as a fraction in the range (0, 1].
private_service_connect_config dictionary	Configuration for private service connect. `network` and `private_service_connect_config` are mutually exclusive.
enable_private_service_connect boolean / required	If true, expose the Endpoint via private service connect. Choices: `false` `true`
enable_secure_private_service_connect boolean	If set to true, enable secure private service connect with IAM authorization. Otherwise, private service connect will be done without authorization. Note latency will be slightly increased if authorization is enabled. Choices: `false` `true`
project_allowlist list / elements=string	A list of Projects from which the forwarding rule will target the service attachment.
psc_automation_configs list / elements=dictionary	List of projects and networks where the PSC endpoints will be created. This field is used by Online Inference(Prediction) only.
error_message string	Error message if the PSC service automation failed.
forwarding_rule string	Forwarding rule created by the PSC service automation.
ip_address string	IP address rule created by the PSC service automation.
network string / required	The full name of the Google Compute Engine [network](https://cloud.google.com/compute/docs/networks-and-firewalls#networks). [Format](https://cloud.google.com/compute/docs/reference/rest/v1/networks/get): projects/{project}/global/networks/{network}.
project_id string / required	Project id used to create forwarding rule.
state string	The state of the PSC service automation. Choices: `"PSC_AUTOMATION_STATE_FAILED"` `"PSC_AUTOMATION_STATE_SUCCESSFUL"`
project string	The Google Cloud Platform project to use.
region string	The region for the resource.
scopes list / elements=string	Array of scopes to be used.
service_account_contents jsonarg	The contents of a Service Account JSON file, either in a dictionary or as a JSON string that represents it.
service_account_email string	An optional service account email address if machineaccount is selected and the user does not wish to use the default email.
service_account_file path	The path of a Service Account JSON file if serviceaccount is selected as type.
state string	Whether the resource should exist in GCP. Choices: `"present"` ← (default) `"absent"`
traffic_split string	A map from a DeployedModel’s id to the percentage of this Endpoint’s traffic that should be forwarded to that DeployedModel. If a DeployedModel’s id is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment. See the `deployModel` [example](https://cloud.google.com/vertex-ai/docs/general/deployment#deploy_a_model_to_an_endpoint) and [documentation](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.endpoints/deployModel) for more information. Note: To set the map to empty, set `"{}"`, apply, and then remove the field from your config.
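The table above lists suboptions on the same level as their parent dictionaries. The sketch below shows how they would plausibly nest in a task, assuming the module mirrors the layout of the underlying Vertex AI Endpoint resource (for example, `kms_key_name` under `encryption_spec`, and `output_uri` under `bigquery_destination`). The variables `endpoint_id`, `psc_endpoint_id`, `gcp_project` and `gcp_cred_file`, the label, and the key ring and key names are placeholders, not values taken from this page.

```yaml
# Sketch only: nesting is assumed from the upstream API resource layout.
- name: Create an Endpoint with a customer-managed key and request-response logging
  google.cloud.gcp_vertexai_endpoint:
    name: "{{ endpoint_id }}"          # numeric, no leading zeros, at most 10 digits
    display_name: my-endpoint
    location: us-central1
    region: us-central1
    labels:
      team: ml-platform                # placeholder label
    encryption_spec:
      kms_key_name: "projects/{{ gcp_project }}/locations/us-central1/keyRings/my-kr/cryptoKeys/my-key"
    predict_request_response_logging_config:
      enabled: true
      sampling_rate: "0.1"             # string; fraction in the range (0, 1]
      bigquery_destination:
        output_uri: "bq://{{ gcp_project }}"
    project: "{{ gcp_project }}"
    auth_kind: serviceaccount
    service_account_file: "{{ gcp_cred_file }}"
    state: present

- name: Create an Endpoint exposed through private service connect
  google.cloud.gcp_vertexai_endpoint:
    name: "{{ psc_endpoint_id }}"
    display_name: my-psc-endpoint
    location: us-central1
    region: us-central1
    # network is deliberately omitted: it cannot be combined with
    # private_service_connect_config.
    private_service_connect_config:
      enable_private_service_connect: true
      project_allowlist:
        - "{{ gcp_project }}"
    project: "{{ gcp_project }}"
    auth_kind: serviceaccount
    service_account_file: "{{ gcp_cred_file }}"
    state: present
```

The entries `error_message`, `forwarding_rule`, `ip_address` and the `PSC_AUTOMATION_STATE_*` values of `state` describe the outcome of PSC automation, so they are left out of the sketch.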

Notes 

Note

API Reference: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints
Official Documentation Guide: https://cloud.google.com/vertex-ai/docs
For authentication, you can set auth_kind using the GCP_AUTH_KIND env variable.
For authentication, you can set service_account_file using the GCP_SERVICE_ACCOUNT_FILE env variable.
For authentication, you can set service_account_contents using the GCP_SERVICE_ACCOUNT_CONTENTS env variable.
For authentication, you can set service_account_email using the GCP_SERVICE_ACCOUNT_EMAIL env variable.
For authentication, you can set access_token using the GCP_ACCESS_TOKEN env variable.
For authentication, you can set scopes using the GCP_SCOPES env variable.
Environment variable values will only be used if the playbook values are not set.
The service_account_email, service_account_file, service_account_contents and access_token options are mutually exclusive.
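The environment-variable fallbacks listed above can be supplied per task with Ansible's `environment` keyword. The following is a sketch that assumes the module picks up `GCP_AUTH_KIND` and `GCP_SERVICE_ACCOUNT_FILE` as the notes describe; `endpoint_id`, `gcp_project` and `gcp_cred_file` are placeholder variables.

```yaml
- name: Create Endpoint with credentials taken from environment variables
  google.cloud.gcp_vertexai_endpoint:
    name: "{{ endpoint_id }}"
    display_name: my-endpoint
    location: us-central1
    region: us-central1
    project: "{{ gcp_project }}"
    state: present
    # auth_kind and service_account_file are not set here, so the
    # environment values below are the ones that apply.
  environment:
    GCP_AUTH_KIND: serviceaccount
    GCP_SERVICE_ACCOUNT_FILE: "{{ gcp_cred_file }}"
```

Because playbook values take precedence, setting `auth_kind` in the task would override `GCP_AUTH_KIND`.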

Examples 

- name: Create Endpoint
  google.cloud.gcp_vertexai_endpoint:
    name: "{{ resource_name }}"
    state: present
    display_name: "{{ resource_name }}"
    # network: "projects/{{ gcp_project_number }}/global/networks/{{ mynet }}"  # Network must be peered
    location: us-central1
    region: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: _myep

- name: Print Endpoint
  ansible.builtin.debug:
    var: _myep

- name: Delete Endpoint
  google.cloud.gcp_vertexai_endpoint:
    name: "{{ resource_name }}"
    state: absent
    display_name: "{{ resource_name }}"
    # network: "projects/{{ gcp_project_number }}/global/networks/{{ mynet }}"
    location: us-central1
    region: us-central1
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"

Return Values 

Common return values are documented in the Ansible documentation; the following fields are unique to this module:

Key	Description
changed boolean	Whether the resource was changed. Returned: always
createTime string	Output only. Timestamp when this Endpoint was created. Returned: success
dedicatedEndpointDns string	Output only. DNS of the dedicated endpoint. Will only be populated if dedicatedEndpointEnabled is true. Format: `https://{endpointId}.{region}-{projectNumber}.prediction.vertexai.goog`. Returned: success
deployedModels list / elements=dictionary	Output only. The models deployed in this Endpoint. To add or remove DeployedModels use EndpointService.DeployModel and EndpointService.UndeployModel respectively. Models can also be deployed and undeployed using the [Cloud Console](https://console.cloud.google.com/vertex-ai/). Returned: success
automaticResources dictionary	A description of resources that to a large degree are decided by Vertex AI, and require only a modest additional configuration. Returned: success
maxReplicaCount integer	The maximum number of replicas this DeployedModel may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the DeployedModel increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, no upper bound for scaling under heavy traffic will be assumed, though Vertex AI may be unable to scale beyond a certain replica number. Returned: success
minReplicaCount integer	The minimum number of replicas this DeployedModel will always be deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas up to max_replica_count, and as traffic decreases, some of these extra replicas may be freed. If the requested value is too large, the deployment will error. Returned: success
createTime string	Output only. Timestamp when the DeployedModel was created. Returned: success
dedicatedResources dictionary	A description of resources that are dedicated to the DeployedModel, and that need a higher degree of manual configuration. Returned: success
autoscalingMetricSpecs list / elements=dictionary	The metric specifications that override a resource utilization metric (CPU utilization, accelerator’s duty cycle, and so on) target value (defaults to 60 if not set). At most one entry is allowed per metric. If machine_spec.accelerator_count is above 0, autoscaling is based on both the CPU utilization and accelerator’s duty cycle metrics: it scales up when either metric exceeds its target value and scales down when both metrics are under their target values. The default target value is 60 for both metrics. If machine_spec.accelerator_count is 0, autoscaling is based on the CPU utilization metric only, with a default target value of 60 if not explicitly set. For example, in the case of Online Prediction, if you want to override target CPU utilization to 80, you should set autoscaling_metric_specs.metric_name to `aiplatform.googleapis.com/prediction/online/cpu/utilization` and autoscaling_metric_specs.target to `80`. Returned: success
metricName string	The resource metric name. Supported metrics: * For Online Prediction: * `aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle` * `aiplatform.googleapis.com/prediction/online/cpu/utilization`. Returned: success
target integer	The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change. The default value is 60 (representing 60%) if not provided. Returned: success
machineSpec dictionary	The specification of a single machine used by the prediction. Returned: success
acceleratorCount integer	The number of accelerators to attach to the machine. Returned: success
acceleratorType string	The type of accelerator(s) that may be attached to the machine as per accelerator_count. See possible values [here](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec#AcceleratorType). Returned: success
machineType string	The type of the machine. See the [list of machine types supported for prediction](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#machine-types) and the [list of machine types supported for custom training](https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types). For DeployedModel this field is optional, and the default value is `n1-standard-2`. For BatchPredictionJob or as part of WorkerPoolSpec this field is required. Returned: success
maxReplicaCount integer	The maximum number of replicas this DeployedModel may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the DeployedModel increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, min_replica_count is used as the default value. The value of this field impacts the charge against Vertex CPU and GPU quotas. Specifically, you will be charged for (max_replica_count * number of cores in the selected machine type) and (max_replica_count * number of GPUs per replica in the selected machine type). Returned: success
minReplicaCount integer	The minimum number of machine replicas this DeployedModel will always be deployed on. This value must be greater than or equal to 1. If traffic against the DeployedModel increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed. Returned: success
displayName string	The display name of the DeployedModel. If not provided upon creation, the Model’s display_name is used. Returned: success
enableAccessLogging boolean	These logs are like standard server access logs, containing information like timestamp and latency for each prediction request. Note that Stackdriver logs may incur a cost, especially if your project receives prediction requests at a high queries per second rate (QPS). Estimate your costs before enabling this option. Returned: success
enableContainerLogging boolean	If true, the container of the DeployedModel instances will send `stderr` and `stdout` streams to Stackdriver Logging. Only supported for custom-trained Models and AutoML Tabular Models. Returned: success
id string	The ID of the DeployedModel. If not provided upon deployment, Vertex AI will generate a value for this ID. This value should be 1-10 characters, and valid characters are /[0-9]/. Returned: success
model string	The name of the Model that this is the deployment of. Note that the Model may be in a different location than the DeployedModel’s Endpoint. Returned: success
modelVersionId string	Output only. The version ID of the model that is deployed. Returned: success
privateEndpoints dictionary	Output only. Provide paths for users to send predict/explain/health requests directly to the deployed model services running on Cloud via private services access. This field is populated if network is configured. Returned: success
explainHttpUri string	Output only. Http(s) path to send explain requests. Returned: success
healthHttpUri string	Output only. Http(s) path to send health check requests. Returned: success
predictHttpUri string	Output only. Http(s) path to send prediction requests. Returned: success
serviceAttachment string	Output only. The name of the service attachment resource. Populated if private service connect is enabled. Returned: success
serviceAccount string	The service account that the DeployedModel’s container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn’t have access to the resource project. Users deploying the Model must have the `iam.serviceAccounts.actAs` permission on this service account. Returned: success
sharedResources string	The resource name of the shared DeploymentResourcePool to deploy on. Format: projects/{project}/locations/{location}/deploymentResourcePools/{deployment_resource_pool}. Returned: success
etag string	Used to perform consistent read-modify-write updates. If not set, a blind “overwrite” update happens. Returned: success
modelDeploymentMonitoringJob string	Output only. Resource name of the Model Monitoring job associated with this Endpoint if monitoring is enabled by CreateModelDeploymentMonitoringJob. Format: `projects/{project}/locations/{location}/modelDeploymentMonitoringJobs/{model_deployment_monitoring_job}`. Returned: success
state string	The current state of the resource. Returned: always
updateTime string	Output only. Timestamp when this Endpoint was last updated. Returned: success
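The returned fields can be read from a registered result. The sketch below reuses the `_myep` variable registered in the Examples section and assumes the fields appear at the top level of that result; `deployedModels` is given an empty default because a freshly created Endpoint may have no deployed models.

```yaml
- name: Show when the Endpoint was created and whether this run changed it
  ansible.builtin.debug:
    msg: "Created at {{ _myep.createTime }}, changed: {{ _myep.changed }}"

- name: List the IDs of the models deployed on the Endpoint
  ansible.builtin.debug:
    msg: "{{ _myep.deployedModels | default([]) | map(attribute='id') | list }}"
```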

Authors

Google Inc. (@googlecloudplatform)
