Splunk Synthetics
What is Splunk Synthetics?
The Splunk Synthetics crawler bot is the automated site-scanning component of Splunk Synthetic Monitoring, part of Splunk Observability Cloud. It crawls web properties, enumerates pages and resources, and runs scripted checks from locations around the world to measure availability, performance, and correctness.
Legitimate use cases
– Build a site map to seed synthetic monitors and user journeys
– Detect regressions (latency, errors, broken links, third-party tag impact)
– Validate content, redirects, and releases across regions
– Track SLAs/SLOs and catch incidents before users do
– Audit performance best practices and asset weights
Abuse and fraud risks (high-level)
– Reconnaissance to enumerate endpoints, hidden paths, and workflows
– Content and price scraping to fuel phishing or brand abuse
– Competitive intelligence gathering beyond fair use
– Identifying weak third-party scripts or misconfigs for exploitation
– Load probing to time attacks or bypass bot defenses
Mitigate with bot management, rate limiting, robots.txt, WAF rules, and anomaly detection.
Why is Splunk Synthetics crawling my site?
It is usually hitting your site because someone (your team, a vendor, or a partner) configured synthetic tests to validate availability, performance, or critical user flows against your endpoints. Potential downsides:
– Inflated traffic and conversion metrics if the traffic is not filtered
– Increased load on origin and CDN, API rate-limit consumption, and edge egress costs
– Noisy WAF/IDS signals and fraud false positives (e.g., CAPTCHA challenges, bot-score degradation) that create alert fatigue
– Test journeys that unintentionally traverse sensitive flows (checkout, auth, password reset) and trigger operational or compliance reviews
– Verbose error messages exposed if your app fails under scripted paths
– Log and SIEM cost spikes from high-frequency requests
– Interference with A/B experiments and personalization models
– Possible SLA or throttling contention with real users during peak periods
Make sure internal observability and analytics filters clearly distinguish this traffic so it is not misinterpreted.
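One way to keep this traffic out of analytics is to tag it before it reaches your reports. The sketch below is a minimal Python example, assuming the bot's User-Agent contains the substring "Splunk Synthetics" (verify this against your own access logs); the `Hit` record and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: the bot identifies itself with "Splunk Synthetics" in its
# User-Agent. Confirm the exact string against your own access logs.
SYNTHETIC_UA = re.compile(r"splunk synthetics", re.IGNORECASE)


@dataclass
class Hit:
    path: str
    user_agent: str


def is_synthetic(hit: Hit) -> bool:
    """Return True if the request looks like Splunk Synthetics traffic."""
    return bool(SYNTHETIC_UA.search(hit.user_agent))


def split_traffic(hits):
    """Partition hits into (real, synthetic) lists for separate reporting."""
    real, synthetic = [], []
    for hit in hits:
        (synthetic if is_synthetic(hit) else real).append(hit)
    return real, synthetic


if __name__ == "__main__":
    sample = [
        Hit("/checkout", "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"),
        Hit("/", "Mozilla/5.0 ... Splunk Synthetics"),  # hypothetical UA
        Hit("/pricing", "Mozilla/5.0 (Macintosh) Safari/605.1.15"),
    ]
    real, synthetic = split_traffic(sample)
    print(f"real={len(real)} synthetic={len(synthetic)}")  # prints: real=2 synthetic=1
```

The same predicate can drive a filter in a log pipeline or an analytics exclusion rule, so reports and alerts only count the `real` side.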
How to block Splunk Synthetics?
1) User-Agent filtering at the web server (match the exact string you see in your access logs)
Nginx (inside a server or location block): if ($http_user_agent ~* "Splunk Synthetics") { return 403; }
Apache:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Splunk Synthetics" [NC]
RewriteRule .* - [F]
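If you cannot change the Nginx or Apache configuration, the same User-Agent rule can be applied inside a Python WSGI application. This is a minimal sketch, assuming the same "Splunk Synthetics" substring; `BlockUserAgent` and `hello_app` are hypothetical names, not part of any library.

```python
import re
from typing import Callable, Iterable

# Assumed User-Agent token; confirm the exact string in your access logs.
BLOCKED_UA = re.compile(r"splunk synthetics", re.IGNORECASE)


class BlockUserAgent:
    """Hypothetical WSGI middleware: 403 for matching User-Agents, pass-through otherwise."""

    def __init__(self, app: Callable, pattern: "re.Pattern[str]" = BLOCKED_UA) -> None:
        self.app = app
        self.pattern = pattern

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes request headers as HTTP_* keys in environ.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if self.pattern.search(user_agent):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    # Stand-in for your real application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


app = BlockUserAgent(hello_app)
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the wrapped app; a request with a matching User-Agent should receive a 403, and everything else reaches `hello_app`.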
2) IP/ASN/network blocking
If you have identified the IP ranges or hosting ASNs that Splunk Synthetics traffic originates from and do not want it, block them at your firewall, CDN, or WAF.
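Once you have attributed address ranges to the bot, checking a client IP against them is a CIDR membership test. The sketch below uses Python's standard `ipaddress` module; the listed networks are placeholder documentation ranges (RFC 5737 and RFC 3849), not real Splunk Synthetics addresses. Resolving an ASN to its prefixes needs an external routing or GeoIP database and is not shown.

```python
import ipaddress

# Hypothetical placeholder ranges (documentation blocks, not real bot IPs).
# Replace with the ranges you have actually attributed to Splunk Synthetics.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(ip: str) -> bool:
    """Return True if `ip` falls inside any blocked network.

    Raises ValueError for a string that is not a valid IPv4/IPv6 address.
    """
    addr = ipaddress.ip_address(ip)
    # `in` simply returns False when address and network versions differ.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A linear scan is fine for a handful of prefixes; for large lists, a firewall or CDN rule set is the better place to enforce the block.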
3) Rate limiting and dynamic banning
Use Nginx limit_req (with a matching limit_req_zone) or an equivalent to throttle high-frequency requests from this bot; optionally add fail2ban to ban offending IPs automatically.
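The throttle-then-ban idea can be expressed as a small in-process limiter. This is a sliding-window sketch with a fail2ban-style temporary ban, not a reimplementation of Nginx `limit_req` (which uses a leaky-bucket method); the class name, thresholds, and keying by client IP are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per key; a key that
    exceeds the limit is banned for `ban_seconds` (a fail2ban-like rule)."""

    def __init__(self, limit: int, window: float, ban_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.ban_seconds = ban_seconds
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        # Reject immediately while a ban is still active.
        if self.banned_until.get(key, 0.0) > now:
            return False
        q = self.hits[key]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            # Over the limit: start a ban and reset this key's history.
            self.banned_until[key] = now + self.ban_seconds
            q.clear()
            return False
        q.append(now)
        return True
```

The injectable `clock` keeps the limiter testable; in a real deployment you might key on client IP (or IP plus User-Agent) and call `allow()` only for requests already classified as coming from this bot.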
4) JavaScript token + honeypot traps
Require a signed cookie or token that only JavaScript sets; add honeypot URLs (links hidden from human visitors) and block any Splunk Synthetics agent that requests them.
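The token half of this step comes down to signing a short-lived value on the server and checking it on later requests. The sketch below uses HMAC-SHA256 from Python's standard library; `SECRET`, `HONEYPOT_PATHS`, the token format, and the one-hour TTL are hypothetical choices. It covers only server-side issuing and verification: the JavaScript that would store the token as a cookie is omitted, and a bot that executes JavaScript in a real browser can still obtain a valid token, which is why the honeypot check sits alongside it.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Set

# Hypothetical settings: load SECRET from configuration in production so
# tokens survive restarts and validate across multiple servers.
SECRET = secrets.token_bytes(32)
TOKEN_TTL = 3600  # seconds a token stays valid
HONEYPOT_PATHS = {"/__trap/do-not-follow"}  # linked invisibly; humans never request it

banned_ips: Set[str] = set()


def _sign(client_ip: str, ts: int) -> str:
    # Bind the token to the client IP and its issue time.
    msg = f"{client_ip}|{ts}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def issue_token(client_ip: str, now: Optional[float] = None) -> str:
    """Create a '<timestamp>.<signature>' token for the JS snippet to set as a cookie."""
    ts = int(time.time() if now is None else now)
    return f"{ts}.{_sign(client_ip, ts)}"


def verify_token(token: str, client_ip: str, now: Optional[float] = None) -> bool:
    """Return True if the token is well-formed, unexpired, and signed for this IP."""
    try:
        ts_str, sig = token.split(".", 1)
        ts = int(ts_str)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if not 0 <= current - ts <= TOKEN_TTL:
        return False
    # Constant-time comparison on bytes (works even for non-ASCII input).
    expected = _sign(client_ip, ts).encode()
    return hmac.compare_digest(sig.encode("utf-8", "replace"), expected)


def check_request(path: str, client_ip: str, token: str) -> int:
    """Return the HTTP status this sketch would send for a request."""
    if client_ip in banned_ips:
        return 403
    if path in HONEYPOT_PATHS:
        # A client that follows the hidden link is banned from then on.
        banned_ips.add(client_ip)
        return 403
    if not verify_token(token, client_ip):
        return 403  # in practice, serve the JS challenge page instead
    return 200
```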
Block and Manage Splunk Synthetics with DataDome
See which bots and AI agents bypass your defenses
Create your account to start analyzing and mitigating malicious bots and AI-driven threats in real time