What is Baidu?

Baiduspider is the web-crawling bot used by Baidu, a leading Chinese search engine. Like Google’s Googlebot, it systematically browses the web to index pages for Baidu’s search results: it sends HTTP requests to websites, retrieves their content, and analyzes it to update Baidu’s search index, which allows Baidu to return relevant results to its users. Baiduspider identifies itself through its user-agent string, which typically includes “Baiduspider”. It is intended to respect the robots.txt protocol, but webmasters report that it occasionally ignores these rules, so monitoring for unauthorized crawling is recommended. Because of its origin and focus on Chinese-language content, Baiduspider matters most for websites targeting users in China or Chinese-language audiences. It nevertheless visits many international sites and can consume bandwidth and server resources even if your primary audience is not in China.
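As a rough illustration of the user-agent identification and robots.txt rules described above, here is a minimal Python sketch using only the standard library. The robots.txt content, the example.com URLs, and the helper name is_baiduspider are assumptions made for this example, not something Baidu or DataDome publishes. A robots.txt rule only expresses a request, so this shows what a compliant crawler should do, not what Baiduspider is guaranteed to do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks Baiduspider to stay out of /private/,
# and leaves everything open to other crawlers.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /private/

User-agent: *
Disallow:
"""

# Main user-agent string commonly reported for Baiduspider.
BAIDUSPIDER_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)


def is_baiduspider(user_agent: str) -> bool:
    """Case-insensitive substring check. User-agents can be spoofed,
    so treat a match as a claim, not as proof of identity."""
    return "baiduspider" in user_agent.lower()


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# can_fetch() matches on the product token, so pass "Baiduspider"
# rather than the full "Mozilla/5.0 (...)" header value.
print(is_baiduspider(BAIDUSPIDER_UA))
print(parser.can_fetch("Baiduspider", "https://example.com/private/page"))
print(parser.can_fetch("Baiduspider", "https://example.com/public"))
```

Comparing what can_fetch reports against the paths Baiduspider actually requests in your access logs is one way to spot the non-compliant crawling mentioned above.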

Baiduspider is the foundation of Baidu’s search ecosystem, which powers results for over a billion Chinese users. Alongside the main crawler, Baidu operates specialized crawlers for images, video, news, and mobile content.

Why is Baidu crawling my site?

Baiduspider crawls your website primarily to index its content for Baidu’s search engine. This indexing helps improve the visibility of your site in search results for users of Baidu, particularly if your content is relevant to Chinese-speaking audiences. Additionally, if your website has inbound links from other sites that Baiduspider has already indexed, it may follow these links to discover and crawl your site. The bot aims to provide comprehensive search results by continuously updating its index with new and modified web pages.

Threat research insights on Baidu

All data in this section are produced by DataDome's Galileo Threat Research team from our proprietary detection network and reviewed by human analysts.

Verified Bot: Verified (a verified bot has high identification strength)
Robots.txt Compliance: Not respected (whether this bot respects robots.txt directives)
Identification Strength: High (how confidently DataDome can identify this bot)

Traffic origins

Top 15 countries by bot traffic

China (CN): 100.0%

Most used autonomous system (AS)

Top 5 by traffic share

CHINA UNICOM China169 Backbone: 88.81%
IDC, China Telecommunications Corporation: 11.19%
China Unicom Beijing Province Network: 0.0%
China Mobile Communications Group Co., Ltd.: 0.0%

Traffic Occupancy
0.42%

On average, this bot accounts for 0.42% of the traffic from bots in the directory

Authorization Rate
100%

Businesses decide to authorize this bot 100% of the time

How to block Baidu?

1. IP Blocking: Identify the IP ranges used by Baiduspider and configure your server’s firewall or .htaccess to block these addresses. Baiduspider’s IP ranges may change frequently, so keep your block list updated using official Baidu documentation or network monitoring.

2. User-Agent Filtering: Implement server-side filtering to deny requests whose user-agent string contains “Baiduspider”. Baiduspider’s main user-agent is Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html), but it may use multiple user-agent variants for desktop, mobile, and media-specific crawling. This filtering can be implemented in web server configurations such as Apache or Nginx.

3. CAPTCHA Implementation: Use CAPTCHA challenges for suspicious traffic patterns that resemble bot activity. This does not specifically target Baiduspider, but it can deter unwanted automated access.

4. Rate Limiting: Configure rate limiting on your server to restrict the number of requests from a single IP address over a specified period. This can help manage excessive crawling behavior.

5. Contact Baidu: If you have a legitimate reason to block Baiduspider, consider reaching out to Baidu directly to request exclusion from their crawling activities.
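Steps 1, 2, and 4 above can be combined into a single request-level decision. The following is a minimal Python sketch, not a production implementation: the names BLOCKED_NETWORKS, SlidingWindowLimiter, and decide are hypothetical, and the network shown is a reserved documentation range (RFC 5737) used as a placeholder, not a real Baidu range. In practice these checks usually live in the firewall or the Apache/Nginx configuration rather than in application code.

```python
import ipaddress
from collections import defaultdict, deque

# Placeholder block list. 203.0.113.0/24 is a documentation-only range,
# NOT a real Baiduspider range; replace it with ranges you have verified.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def ip_is_blocked(client_ip: str) -> bool:
    """Step 1: return True if the client IP falls in a blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


class SlidingWindowLimiter:
    """Step 4: allow at most max_requests per client IP within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> recent request times

    def allow(self, client_ip: str, now: float) -> bool:
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def decide(user_agent: str, client_ip: str, now: float,
           limiter: SlidingWindowLimiter,
           block_baiduspider: bool = False) -> str:
    """Return "deny", "throttle", or "allow" for one request."""
    if ip_is_blocked(client_ip):
        return "deny"
    # Step 2: substring match on the user-agent (which can be spoofed).
    if block_baiduspider and "baiduspider" in user_agent.lower():
        return "deny"
    if not limiter.allow(client_ip, now):
        return "throttle"
    return "allow"
```

Passing the timestamp in as now (for example from time.monotonic()) keeps the limiter easy to test. With SlidingWindowLimiter(max_requests=2, window_seconds=10.0), a third request from the same IP inside ten seconds is throttled rather than denied, which manages crawl load without removing the site from Baidu’s index.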
