Splunk Synthetic Monitoring Tool

What is Splunk Synthetic Monitoring Tool?

There is no public, official “Splunk crawler” comparable to Googlebot. A “Splunk crawler” or “Splunk bot” seen in logs typically refers to one of the following:
– Splunk Synthetic Monitoring (formerly Rigor) or scripted checks run via Splunk
– Customer-built web probes using Splunk (e.g., Website Monitoring app, custom Python/Phantom/SOAR playbooks)
The user agent or label may contain “Splunk”, but the traffic originates either from customer infrastructure or from Splunk synthetic nodes.

Legitimate use cases
– Uptime/SLA and page performance checks
– Transaction synthetics (login/checkout flows)
– API health monitoring
– Security control validation and attack-surface discovery
– Data collection for analytics/dashboards

Fraud/illegal misuse (not guidance)
– UA spoofing as “Splunk” to bypass naive bot filters
– Reconnaissance and large-scale scraping
– Inventory scalping and price scraping
– Ad fraud and click automation
– Account takeover (ATO) preparation: enumerating endpoints, forms, and rate limits

Note: Validate the source via reverse DNS and IP ownership, known Splunk Synthetic node IPs, and behavior-based detection rather than relying on UA strings alone.
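As a rough illustration of the note above, the following Python sketch checks a request that claims to be Splunk against a list of expected source networks. The function name, the return labels, and the CIDR ranges are all hypothetical: the ranges are RFC 5737 documentation blocks used as placeholders, not real Splunk Synthetic node IPs, and would need to be replaced with the ranges that apply to your own monitoring setup. It covers only the IP check, not reverse DNS or behavioral signals.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real Splunk node IPs.
# Assumption: you maintain this list from the node IPs relevant to your checks.
HYPOTHETICAL_SPLUNK_NODE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def classify_splunk_claim(user_agent, remote_ip,
                          node_ranges=HYPOTHETICAL_SPLUNK_NODE_RANGES):
    """Classify a request whose user agent may claim to be Splunk.

    Returns "not_splunk" if the UA does not mention Splunk, "verified" if the
    source IP falls inside one of the expected ranges, and
    "spoofed_or_unknown" otherwise.
    """
    if "splunk" not in user_agent.lower():
        return "not_splunk"

    ip = ipaddress.ip_address(remote_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in node_ranges]

    # A UA string is trivial to forge, so the IP check decides the outcome.
    if any(ip in network for network in networks):
        return "verified"
    return "spoofed_or_unknown"
```

A request classified as "spoofed_or_unknown" is not necessarily malicious (it could be a customer-built probe running on its own infrastructure), so in practice this result would feed into further behavior-based checks rather than trigger an immediate block.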

Why is Splunk Synthetic Monitoring Tool crawling my site?

This traffic is typically driven by a Splunk customer performing external monitoring or research against your domains, for example synthetic uptime/performance checks, content/link validation, threat-intel reconnaissance, or fraud/security signal enrichment. The crawler fetches pages and resources to measure response times, content changes, or indicators, and may revisit at intervals or from multiple vantage points.

Potential negatives:
– Added, sometimes bursty, traffic load
– Skewed web analytics, conversion funnels, and A/B tests
– Pollution of fraud/behavioral baselines (e.g., session velocity, device fingerprint diversity)
– Inadvertent triggering of WAF rules or rate limiting that affects real users
– Consumption of API quotas or bandwidth
– Exposure of sensitive-but-public endpoints to wider correlation
– Possible SEO/crawl-budget side effects if it competes with search engine crawlers
– Higher spend if your site ties traffic to costs (CDN, serverless invocations)

Threat research insights on Splunk Synthetic Monitoring Tool

All data in this section are produced by DataDome's Galileo Threat Research team from our proprietary detection network and reviewed by human analysts.

Verified Bot: Verified (a verified bot has high identification strength)
Robots.txt Compliance: Not respected (whether this bot respects robots.txt directives)
Identification Strength: High (how confidently DataDome can identify this bot)

Traffic origins

Top 15 countries by bot traffic

US 38.73%
CA 25.47%
DE 13.38%
GB 9.68%
FR 5.61%
ES 3.7%
IT 2.75%
CH 0.29%
AU 0.13%
SG 0.12%
DK 0.04%
FI 0.04%
SE 0.04%
TW 0.01%

Most used autonomous system (AS)

Top 5 by traffic share

Amazon.com, Inc.: 100.0%
Traffic Occupancy: 0.02%

On average, this bot accounts for 0.02% of the traffic from bots in the directory.

Authorization Rate: 0%

Businesses choose to authorize this bot 0% of the time.

How to block Splunk Synthetic Monitoring Tool?

1) User-Agent filtering at the web server
Nginx (inside a server or location block):
if ($http_user_agent ~* "Splunk") { return 403; }
Apache (requires mod_rewrite):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Splunk [NC]
RewriteRule .* - [F]

2) IP/ASN/network blocking
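Block the IP ranges or hosting ASNs the bot uses once you have identified them and decided the traffic is unwanted. Because the observed traffic comes from Amazon's network, an ASN-wide block would also affect other services hosted there.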

3) Rate limiting and dynamic banning
Use Nginx limit_req or similar to throttle high-frequency requests from this bot; optionally use fail2ban for auto-blocking.
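For setups where throttling happens in application code rather than at the web server, the idea can be sketched in Python as a per-client sliding-window counter. This is an illustrative stand-in, not how Nginx limit_req works internally (limit_req uses a leaky-bucket method); the class name, parameters, and the choice of client key are assumptions made for this example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def allow(self, client_key):
        now = self.clock()
        hits = self._hits[client_key]

        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False  # over the limit: caller would answer 429 or ban

        hits.append(now)
        return True
```

The client key could be the source IP, or the IP combined with the user agent; repeated False results for the same key are the kind of signal a tool such as fail2ban would act on.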

4) JavaScript token + honeypot traps
Require JS-generated signed cookies/tokens; add honeypot URLs and block any Splunk agent that touches them.
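The server side of a signed-token check might look like the following Python sketch, which issues an HMAC-signed, expiring token and verifies it on later requests. It assumes the token reaches the client through a JavaScript step (so clients that do not run JS never obtain one); the function names, the token layout, the five-minute lifetime, and the secret are all hypothetical choices for illustration.

```python
import hashlib
import hmac
import time

# Assumption: a random secret kept only on the server. Placeholder value.
SECRET_KEY = b"replace-with-a-random-server-side-secret"


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(client_id, now=None, ttl_seconds=300):
    """Return "client_id:expiry:signature" for the given client."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{client_id}:{issued_at + ttl_seconds}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token, client_id, now=None):
    """Check signature, expiry, and that the token belongs to client_id."""
    try:
        # Split from the right so client IDs containing ":" (IPv6) survive.
        token_client, expires_str, signature = token.rsplit(":", 2)
        expires = int(expires_str)
    except ValueError:
        return False

    expected = _sign(f"{token_client}:{expires_str}")
    # Constant-time comparison to avoid leaking signature bytes via timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False

    current = time.time() if now is None else now
    return token_client == client_id and current <= expires
```

Binding the token to a client identifier (here assumed to be the source IP) limits reuse of a token harvested by one client and replayed from another. The honeypot half of the technique is not shown.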
