What is Site Improve?

The Siteimprove.com bot is an automated crawler operated by Siteimprove, a digital optimization platform, to analyze websites. It scans web pages to assess SEO, accessibility, content quality, and overall site performance, and helps organizations identify issues that could affect user experience or search engine rankings. Its detailed reports let website administrators make informed decisions about their site. The benefits include improved accessibility, better compliance with web standards, and greater visibility in search engines.

Why is Site Improve crawling my site?

The Siteimprove.com bot crawls your website to gather data for analysis on behalf of Siteimprove’s clients. A crawl is typically initiated by a website owner or administrator who uses Siteimprove’s services to monitor and optimize their site. The bot collects information on SEO, accessibility, and content quality to provide actionable insights for improving the website’s functionality and user experience.

Threat research insights on Site Improve

All data in this section are produced by DataDome's Galileo Threat Research team from our proprietary detection network and reviewed by human analysts.

Verified Bot (a verified bot has high identification strength): Verified
Robots.txt Compliance (whether this bot respects robots.txt directives): Not respected
Identification Strength (how confidently DataDome can identify this bot): High

Traffic origins

Top 15 countries by bot traffic

DE: 35.03%
US: 32.55%
CA: 30.79%
GB: 0.78%
AU: 0.7%
JP: 0.06%
NL: 0.06%
HK: 0.01%
SE: 0.01%
BR: 0.01%

Most used autonomous system (AS)

Top 5 by traffic share

Amazon.com, Inc.: 99.94%
Google LLC: 0.06%

Traffic Occupancy
<0.1%

On average, this bot accounts for <0.1% of the traffic from bots in the directory.

Authorization Rate
100%

Businesses that encounter this bot authorize it 100% of the time.

How to block Site Improve?

1. IP Address Blocking:
Identify the IP addresses used by Siteimprove and block them at the server level using firewall rules or web server configurations like .htaccess for Apache or nginx.conf for Nginx.
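
As a rough sketch of the Apache variant, the fragment below denies two address ranges while leaving the site open to everyone else. It assumes Apache 2.4 (mod_authz_core) and, if placed in .htaccess, that AllowOverride permits AuthConfig. The addresses are hypothetical placeholders taken from the reserved documentation ranges, not Siteimprove's real addresses; substitute the ones you observe in your own logs.

```apache
# Placeholder addresses (RFC 5737 documentation ranges), NOT real
# Siteimprove IPs. Replace them with the ranges seen in your access logs.
<RequireAll>
    # Allow all clients by default...
    Require all granted
    # ...except those in the listed range or at the listed address.
    Require not ip 192.0.2.0/24
    Require not ip 198.51.100.17
</RequireAll>
```

Because almost all of this bot's traffic originates from a large cloud provider, blocking narrow, observed ranges is likely safer than blocking the provider's entire address space, which would also affect unrelated services.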

2. User-Agent Filtering:
Configure your web server to deny requests from the Siteimprove user-agent string. This can be done in .htaccess for Apache:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Siteimprove [NC]
RewriteRule .* - [F,L]

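This returns 403 Forbidden for any request whose User-Agent header contains "Siteimprove", matched case-insensitively. Because the header is supplied by the client, the rule only stops crawlers that identify themselves honestly.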

3. WAF Rules:
Implement custom rules in your Web Application Firewall (WAF) to detect and block requests from the Siteimprove bot based on its user-agent or other identifiable characteristics.
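
For illustration only, here is what such a rule might look like in ModSecurity, an open-source WAF that can run inside Apache. This assumes ModSecurity is installed with SecRuleEngine On; the rule id and the log message are arbitrary placeholders, and other WAF products use their own rule syntax.

```apache
# Hypothetical ModSecurity rule: lowercase the User-Agent header, then
# deny with 403 if it contains "siteimprove".
# The id (1000001) is a placeholder; choose one unused in your rule set.
SecRule REQUEST_HEADERS:User-Agent "@contains siteimprove" \
    "id:1000001,phase:1,t:none,t:lowercase,deny,status:403,log,msg:'Blocked Siteimprove crawler by User-Agent'"
```

Running the rule in phase 1 means the request is rejected as soon as the headers are read, before the body is processed.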

4. JavaScript Challenge:
Serve a JavaScript challenge that the client must execute before it receives content. Crawlers that do not run JavaScript fail the challenge and never reach the page, although crawlers built on a headless browser may still pass.

5. CAPTCHA Implementation:
Integrate CAPTCHA mechanisms on key entry points of your website to deter automated access by bots like Siteimprove, so that a request must pass a human-verification step before it can proceed.
