What is Google-CloudVertex?

“Google-CloudVertex” is a Google Cloud web-crawler user agent used by Vertex AI (e.g., Vertex AI Search/Conversations) to fetch and index web content and files on behalf of customers for retrieval-augmented generation and enterprise search.

Legitimate uses:
– Ingest public websites/knowledge bases for Vertex AI Search and chatbots.
– Keep RAG indices fresh (docs, FAQs, product pages).
– Content classification/summarization pipelines in Vertex AI.
– Observability: security teams tune robots.txt, rate limits, and allowlists for this UA.

Abuse/misuse risks (illegal or fraudulent):
– Unauthorized scraping or terms/robots.txt violations.
– Large-scale data harvesting for spam, account takeover reconnaissance, or price/content theft.
– Collecting training or social-engineering material used to craft targeted phishing.
– Infrastructure masking: attackers spoof the UA to evade naive allowlists.

Notes:
– Identify this bot by the “Google-CloudVertex” user agent, but never trust the UA string alone: confirm the source IP (IP reputation, Google-owned address space) and pair robots.txt rules with server-side controls.

Why is Google-CloudVertex crawling my site?

It is most likely fetching content to power AI services (indexing pages, extracting structured data, updating models), testing reachability and latency, and discovering new or changed endpoints.

Risks for your site include resource strain (bandwidth/CPU spikes, cache churn), log noise that obscures real threats, skewed analytics, and accidental exposure of non-public or highly sensitive content if access controls are weak. The crawl can also aid reconnaissance: it maps URLs, parameters, and APIs, and attackers can piggyback on the surfaces it discovers. High-rate crawling can trigger WAF/CDN anomaly rules, throttle legitimate users, or degrade SEO signals when duplicate or faceted pages are over-sampled.

Your content and metadata may be incorporated into external AI outputs. That raises concerns about IP leakage, data licensing and compliance (PII, regulated data), cross-jurisdiction data transfer, and brand confusion if your material informs responses without attribution or context.

Ensure sensitive endpoints require authentication and that public content aligns with your policy and legal posture.
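To see whether this crawler is actually causing the load described above, requests carrying its user agent can be counted per minute from an access log. The following is a minimal sketch, assuming the common "combined" log format (timestamp in square brackets, user agent as the last quoted field); the function name `requests_per_minute` is illustrative and not part of any tool.

```python
import re
from collections import Counter

# Assumed log layout (combined format):
# 1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 123 "-" "User-Agent"
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def requests_per_minute(lines, ua_token="Google-CloudVertex"):
    """Count requests per minute whose User-Agent contains ua_token (case-insensitive)."""
    counts = Counter()
    token = ua_token.lower()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or token not in match.group("ua").lower():
            continue
        # "10/Oct/2024:13:55:36 +0000" -> "10/Oct/2024:13:55" (minute bucket)
        counts[match.group("ts")[:17]] += 1
    return counts
```

Called as `requests_per_minute(open("access.log"))`, it returns a `Counter` keyed by minute. Because the match is on the UA string only, the counts include any client spoofing this user agent.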

Threat research insights on Google-CloudVertex

All data in this section are produced by DataDome's Galileo Threat Research team from our proprietary detection network and reviewed by human analysts.

Verified Bot: Verified (a verified bot has high identification strength)
Robots.txt Compliance: Not respected (whether this bot respects robots.txt directives)
Identification Strength: High (how confidently DataDome can identify this bot)

Traffic origins

Top 15 countries by bot traffic

US: 100.0%

Most used autonomous system (AS)

Top 5 by traffic share

Google LLC: 100.0%

Traffic Occupancy: 0.03%

On average, this bot accounts for 0.03% of the traffic from bots in the directory.

Authorization Rate: 100%

Businesses decide to authorize this bot 100% of the time

How to block Google-CloudVertex?

Web server User-Agent block (hard block)
– Nginx (inside a server or location block):
if ($http_user_agent ~* "Google-CloudVertex") { return 403; }
– Apache (requires mod_rewrite):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google-CloudVertex [NC]
RewriteRule .* - [F]

IP/CIDR blocking (network layer)
– Block known Google Cloud IP ranges at your firewall or reverse proxy. This blocks all traffic from those ranges, not only this crawler, so scope the rule carefully.
– Automate fetching Google Cloud’s published CIDR list and update rules periodically to avoid gaps.
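The periodic refresh can be sketched in Python as follows. This assumes Google's published range file at https://www.gstatic.com/ipranges/cloud.json, with a top-level "prefixes" list whose entries carry an "ipv4Prefix" or "ipv6Prefix" field; verify the URL and field names against Google's current documentation before relying on them. The function names are illustrative.

```python
import ipaddress
import json
import urllib.request

# Assumed location and layout of Google Cloud's published range list;
# confirm both against Google's documentation before use.
CLOUD_RANGES_URL = "https://www.gstatic.com/ipranges/cloud.json"


def parse_networks(doc):
    """Extract ip_network objects from a parsed cloud.json-style document."""
    networks = []
    for entry in doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def fetch_networks(url=CLOUD_RANGES_URL, timeout=10):
    """Download the range list and return it as ip_network objects."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_networks(json.load(resp))


def ip_in_networks(ip, networks):
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

A scheduled job could call `fetch_networks()` and regenerate firewall rules from the result, while `ip_in_networks(client_ip, networks)` gives a per-request check at the application layer.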

Reverse DNS + forward-confirmation (verification-based block)
– On each request, perform a reverse DNS lookup on the client IP.
– Then forward-confirm that the hostname resolves back to the same IP (FCrDNS).
– If the confirmed hostname matches known GCP patterns (e.g., *.googleusercontent.com), deny.
– Because the decision rests on verified DNS rather than the User-Agent header, it is harder to evade by changing the UA and avoids overblocking non-GCP traffic.
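The FCrDNS steps above can be sketched with Python's standard `socket` module. This is a minimal illustration, not a production filter: the function names and the injectable `reverse`/`forward` resolver parameters are assumptions added so the logic can be exercised without live DNS, and real deployments would want caching and timeouts around the lookups.

```python
import ipaddress
import socket


def fcrdns_hostname(ip, reverse=socket.gethostbyaddr, forward=socket.getaddrinfo):
    """Return the PTR hostname for ip if it forward-resolves back to ip, else None."""
    try:
        hostname = reverse(ip)[0]          # reverse lookup (PTR)
        infos = forward(hostname, None)    # forward lookup (A/AAAA)
    except OSError:                        # covers socket.herror and socket.gaierror
        return None
    target = ipaddress.ip_address(ip)
    for info in infos:
        # info[4] is the sockaddr tuple; drop any IPv6 zone id before comparing.
        candidate = info[4][0].split("%")[0]
        if ipaddress.ip_address(candidate) == target:
            return hostname
    return None


def should_deny(ip, suffixes=(".googleusercontent.com",), **resolvers):
    """Deny only when the forward-confirmed hostname ends with a listed suffix."""
    hostname = fcrdns_hostname(ip, **resolvers)
    if hostname is None:
        return False
    return hostname.lower().rstrip(".").endswith(suffixes)
```

`should_deny(client_ip)` uses the system resolver by default. An IP with no PTR record, or one whose hostname does not resolve back to it, is not denied by this rule and should fall through to the other controls.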
