What Is Bot Traffic? How to Detect & Stop Unwanted Bot Traffic
Bot traffic is internet traffic coming from automated software (bots) designed to perform repetitive, mostly simple tasks. These bots can work around the clock, often much faster than any human could.
Around half of all internet traffic comes from web bots. While there are good bots that can be beneficial for your website, approximately 30% of all traffic comes from bad bots. These bots are designed to perform all sorts of malicious tasks, from scraping web content to stealing user accounts and scalping inventory.
Even when bot attacks are unsuccessful in executing their malicious objectives, they can still strain your web servers and hurt your website’s performance, potentially making the website unavailable for human visitors. Effective management of bot traffic is therefore very important for any business with an online presence—but as we will see, this is not an easy task.
Table of Contents
- Types of Website Bot Traffic
- How to Identify Bot Traffic
- How to Stop Bot Traffic on Your Website
- Conclusion
To really understand what bot traffic is and how to effectively manage it, let’s first explore the different types of bots.
Types of Website Bot Traffic
Web bot traffic can be divided into three broad categories:
1. Good Bots
Recognizing the good, helpful bots is very important in managing bot traffic. Good bots are, in fact, key to the success and performance of your site.
Examples
Search Engine Bots:
The most important good bots are the crawlers owned and operated by Google, Bing, Baidu, Yandex, and other search engines. Their task is fairly obvious: they constantly crawl the internet to find content to show to people searching for information. Search engine bots help you get your website in front of potential buyers, and you definitely want their traffic.
Partner/Vendor Bots:
These bots are sent by various third-party service providers you use. For example, if you use SEO tools like Ahrefs or SEMRush, their bots crawl your site to check your SEO performance (link profile, traffic volume, etc.). Performance measurement tools such as Pingdom also fall in this category. Like search engine bots, partner bots render useful services. However, on certain occasions—such as a major sales event with a significant traffic spike—you may want to limit the number of requests they are allowed to make to your website in order to optimize the performance for human visitors.
2. Commercial Bots
The DataDome bot management solution classifies “commercial bots” in a separate category. These bots are operated by legitimate companies, typically for collecting and exploiting online content. They are mostly honest about their identity, but they may or may not be beneficial to your business. Commercial bot traffic can also drain your server resources and impact website performance.
Both good bots and commercial bots generally meet the following three main criteria:
- They come from well-known, legitimate sources (Google, Bing, etc.), and they are transparent about the owner/operator of the bot.
- They perform mostly beneficial tasks.
- They will follow the rules and policies in your robots.txt file.
Examples
Aggregator Bots:
These bots crawl websites to find attractive and relevant content to feature on aggregator sites and platforms. They can help promote your content and amplify its reach, but most website owners prefer to control which aggregator bots can access their content and at what rates.
Price Comparison Bots:
Price comparison bots are similar to aggregator bots, but instead of online content they are looking for prices. For example, a flight comparison website may use these bots to scan the websites of different airlines and compile the prices in a comparison tool. Price comparison bots can help get your offers in front of more buyers, but most website owners prefer to work with approved comparison partners who get their data from a pricing feed.
Copyright Bots:
These bots crawl the internet looking for copyrighted images, videos, and other content, to ensure nobody is using it without permission.
3. Bad Bots
Unlike good bots, bad (malicious) bots won’t follow your robots.txt rules. They also tend to hide their identity and origin, and often try to pass themselves off as legitimate human users.
The main thing differentiating these bad bots from good bots, however, is the type of tasks they perform: bad bots are programmed with malicious intent, to perform disruptive and even destructive tasks. Bad bots can cause a lot of permanent damage when left unchecked.
Examples
Web Scraping Bots:
These bots steal content and information from your site and then publish or sell it elsewhere. This can create content duplication issues, among other problems. A bot can, for example, scrape private pricing information about your products and release it to your competitors, costing you your competitive advantage. This is especially common for businesses where price is a key purchase decision factor, such as ticketing sites and travel agencies.
Credential Stuffing Bots:
These bots use stolen credentials (typically sourced from data breaches) to “stuff” known usernames and passwords into the login pages of other sites. The purpose is to gain access to (and abuse) user accounts. Many people reuse the same username-password combination across their accounts, so these attacks often have a high success rate.
Read more: Behind the scenes of a massively distributed credential stuffing attack.
Spam Bots:
These bots post spam content or send spam emails in bulk, often including links to fraudulent websites. We commonly see these bots leaving comments on blogs, social media posts, and forums, among other mediums.
Ad Fraud Bots:
These bots click on pay-per-click (PPC) ads to generate fraudulent ad revenue or to skew the cost of a campaign. As a result, the advertiser is charged high advertising fees for a campaign that is not actually effective.
Denial of Service (DoS) Bots:
In layer 7 DDoS attacks, bots make repeated requests to resource-hungry elements of a web application, such as large file downloads or form submissions. This causes slowdowns in performance, or even complete downtime.
Credit Card Fraud Bots:
These bots test combinations of credit card details, making repeated small transactions to identify missing data such as the CVV code or expiry date. Their activity can cause chargebacks for the e-commerce site and damage the fraud score of the business. Read more on preventing carding attacks.
Gift Card Fraud Bots:
These bad bots steal money from gift card accounts, which can lead to reputational damage and loss of future revenue.
Consequences of bad bots include:
- Sudden spikes and abnormal increases in page views.
- Higher bandwidth usage.
- Skewed Google Analytics reports and other KPIs, which may lead to important business decisions based on inaccurate data.
- Lower conversion rates.
- Poor website performance.
- Increased strain on data centers and higher server costs.
How to Identify Bot Traffic
Bot traffic must first be correctly identified before it can be managed. Here are a few things to look out for in your traffic and business metrics.
1. Increase in Traffic & Bounce Rate
If you notice a sudden rise in both traffic volume and bounce rate, it is a strong indication of bad bot traffic. An abnormal increase in traffic (or unpredictable traffic spikes) usually means a high number of bots are hitting your site, or a single bot is returning again and again. The increase in bounce rate indicates that these bots leave without exploring more pages once they have fulfilled their task.
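If you have access to your raw server logs, a quick script can help surface this pattern. Below is a minimal sketch, assuming a common/combined-format access log at access.log and a threshold you tune to your normal traffic, that flags client IPs generating an abnormal number of requests:

```python
# Minimal sketch: flag IPs that generate an abnormal number of requests in a
# web server access log. The file name and threshold are assumptions; adapt
# them to your own setup.
import re
from collections import Counter

LOG_FILE = "access.log"          # hypothetical path to your access log
REQUESTS_PER_IP_THRESHOLD = 500  # tune to your normal traffic levels

# In common/combined log format, the first field of each line is the client IP.
line_re = re.compile(r"^(\S+)\s")

ip_counts = Counter()
with open(LOG_FILE) as log:
    for line in log:
        match = line_re.match(line)
        if match:
            ip_counts[match.group(1)] += 1

# IPs far above the threshold deserve a closer look (bots, or shared proxies).
for ip, count in ip_counts.most_common():
    if count <= REQUESTS_PER_IP_THRESHOLD:
        break
    print(f"{ip}: {count} requests")
```

A fuller version could also count distinct URLs per IP, since a huge request volume spread over only one or two pages matches the high-traffic, high-bounce pattern described above.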
2. Page Load Speed
A dramatic dip in page load speed—especially if you haven’t made any significant changes to your website—is a telltale sign of bad bot traffic. Although bot traffic is not the only possible reason for slower site performance, it’s an indication that you should take a closer look at your other KPIs.
While one single bot is unlikely to make a significant impact on your site’s overall speed, many malicious activities involve a lot of bots entering a website at the same time, like in the event of Layer 7 DDoS attacks.
3. Abnormal Decrease in Bounce Rate
Another important indicator is a bounce rate that dips to a suspiciously low level. This strongly suggests web scraping or scalping bots, which scan a very large number of pages in a single session.
4. SEO Performance
This one might be more difficult to measure right away, but when web scraping bots steal your content and publish it on other sites, it might impact your site’s SERP ranking in the long run.
There’s a chance that your site might be outranked by the site dubiously republishing your content, and your site might also get penalized by Google for duplicate content. Make sure to set up a canonical tag (a link rel="canonical" element in the page head) on every blog post, so your article is treated as the original even when your content is stolen.
5. Customer Complaints About Unavailable Goods
If your customers repeatedly complain that they’re unable to purchase the products they want from your site, you may be the victim of scalper bots. These bots are designed for ultra-fast online purchasing, and can be a cause of great frustration for real customers who are unable to beat them to the checkout page.
How to Stop Bot Traffic on Your Website
If you’ve checked your stats and determined you have a bot traffic problem, what now? Well, you focus on how to stop bot attacks, of course.
Although the main focus should be to stop bad bot traffic, you also need to manage traffic from good and commercial bots. Not all good and commercial bots may be useful for your site—and while they won’t deliberately hurt your site, they might strain your site’s performance with unnecessary traffic. Properly managing these good bots will also help you differentiate them from bad bots.
Managing Good Bots
Thankfully, since good bots are open about their identity and are mostly willing to be managed, managing their traffic should be fairly easy. There are two main approaches we can use:
Robots.txt
The main approach to managing good bots is to set up rules and policies in your robots.txt file. The basic principle is to allow the good bots that will benefit your site and block the ones that won’t help it at all. For example, if you don’t serve the Chinese market and there’s no Chinese-language version of your site, there’s no need for Baidu’s bots to crawl it, and you may want to block them.
You can follow this guide on how to manage your robots.txt.
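As a minimal illustration (the specific rules are assumptions to adapt to your own site), a robots.txt file that turns away Baidu’s crawler while keeping the rest of the public site crawlable could look like this:

```txt
# Illustrative robots.txt. Good bots honor these directives; bad bots ignore them.

# Block Baidu's crawler if you don't serve the Chinese market.
User-agent: Baiduspider
Disallow: /

# Everyone else may crawl the public site, but not the admin area.
User-agent: *
Disallow: /admin/
```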
Block & Allow Lists
If you have a bot management solution, the other approach is to set up a block list and/or an allow list. For example, you can set up an allow list of the good bots permitted to roam your site, if you are certain those are the only bots that will benefit it. A good bot management solution should also let you manage good bot traffic with features such as rate limiting or timeboxing, so you can allow access on your own terms.
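As a rough sketch of the idea (the bot names, limits, and the hook into your request pipeline are assumptions, not any specific product’s API), an allow list with per-bot rate limiting might look like this:

```python
# Sketch of an allow list with a sliding-window rate limit per known good bot.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
ALLOW_LIST = {            # requests allowed per window (illustrative numbers)
    "Googlebot": 600,
    "Bingbot": 300,
    "AhrefsBot": 60,
}
_recent = defaultdict(deque)  # bot token -> timestamps of recent requests

def should_allow(user_agent: str) -> bool:
    """Allow listed bots within their rate limit; everything else gets challenged."""
    token = next((t for t in ALLOW_LIST if t.lower() in user_agent.lower()), None)
    if token is None:
        return False  # unknown bot: block, challenge, or run further checks

    now = time.time()
    window = _recent[token]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()           # drop timestamps outside the sliding window
    if len(window) >= ALLOW_LIST[token]:
        return False               # over the limit: defer this request
    window.append(now)
    return True

print(should_allow("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
```

Keep in mind that a user agent string is trivial to forge, so an allow list should be combined with an identity check such as the DNS verification shown further below.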
Managing the Bad Bots
In managing and stopping bad bot traffic, here are several different approaches to try:
Investing in a Bot Management Solution
With bad bots becoming more advanced and adept at imitating human behaviors, an advanced bot management solution is required. Bots may even use AI and machine learning technologies to achieve their tasks and to mask their identity, and so an AI-based bot management solution like DataDome is now a necessity.
DataDome performs real-time, behavioral-based bot detection to effectively identify even the most sophisticated bots, which can forge their user agent (UA) and rotate between hundreds if not thousands of perfectly clean IP addresses.
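To see why a forged user agent alone proves nothing, you can run a basic manual check that Google and Bing themselves document: a reverse DNS lookup on the client IP followed by a forward lookup to confirm the hostname resolves back to that IP. This is only a simple sketch of that check, not DataDome’s detection method:

```python
# Verify that an IP claiming to be a search engine crawler really belongs to one,
# using reverse DNS followed by a confirming forward lookup.
import socket

SEARCH_BOT_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_genuine_search_bot(ip: str) -> bool:
    try:
        hostname = socket.gethostbyaddr(ip)[0]               # reverse DNS
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward DNS
    except OSError:
        return False
    return hostname.endswith(SEARCH_BOT_SUFFIXES) and ip in forward_ips

# Example: an address from Google's published crawler ranges should pass.
print(is_genuine_search_bot("66.249.66.1"))
```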
A lot of these bot management solutions are now fairly affordable and easy to use, so if you are serious about your cybersecurity, investing in a proper bot detection and mitigation solution is a must.
CAPTCHA
A basic approach to stopping bot traffic is to use CAPTCHAs, but you shouldn’t think of them as a one-size-fits-all answer to bot management. There are three reasons for this:
- CAPTCHA can be a complicated concept, especially when you factor in reCAPTCHA v2 vs v3.
- Using too many CAPTCHAs on your site can ruin the user experience and increase your site’s bounce rate.
- Not only are today’s bots getting better at solving CAPTCHAs, but there are also various CAPTCHA farm services where humans solve the challenges on behalf of the bots.
So, think of CAPTCHAs as a baseline layer of protection, not the final answer to your bot management strategy. DataDome’s bot management solution comes with its own integrated DataDome CAPTCHA, which humans solve in less than 3 seconds on average.
Using a Web Application Firewall (WAF)
Another common solution for stopping bot traffic is a WAF, which is a firewall (shield) placed between clients and your web application (or web pages). Incoming requests pass through the WAF before they reach the application, so a WAF effectively acts as a reverse proxy server.
A WAF can be useful for protecting applications against the most common types of attacks, and may block a part of your unwanted bot traffic. However, WAFs are designed for application protection—not bot detection—and are powerless against sophisticated bots that actively try to circumvent your security solutions.
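To see the limitation concretely, here is the kind of static signature rule a WAF typically applies (the patterns are illustrative, not a real WAF ruleset). It blocks requests whose user agent matches obvious automation tools, and it is defeated the moment a bot forges a browser user agent:

```python
# Static user-agent signature filter, similar in spirit to a simple WAF rule.
import re

BAD_UA_PATTERNS = [
    re.compile(r"python-requests", re.I),
    re.compile(r"curl/", re.I),
    re.compile(r"\bscrapy\b", re.I),
]

def looks_like_bad_bot(user_agent: str) -> bool:
    return any(p.search(user_agent) for p in BAD_UA_PATTERNS)

print(looks_like_bad_bot("python-requests/2.31.0"))                     # True
print(looks_like_bad_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```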
IP-Based Management
Although today’s bots typically rotate through vast numbers of different IP addresses, making IP-based protection rather ineffective, you can still block IP addresses that are obvious sources of malicious bot traffic. Be very careful when blocking public IPs, since you could be blocking legitimate users as well.
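If you do maintain a block list, storing it as CIDR ranges rather than individual addresses lets you cover whole hosting subnets at once. A minimal sketch (the addresses below are reserved documentation ranges, not real attacker IPs):

```python
# Check an incoming client IP against a block list of CIDR ranges.
import ipaddress

BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # placeholder hosting subnet
    ipaddress.ip_network("198.51.100.42/32"),  # placeholder single address
]

def is_blocked(client_ip: str) -> bool:
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_NETWORKS)

print(is_blocked("203.0.113.7"))  # True
print(is_blocked("192.0.2.1"))    # False
```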
Stricter Access Controls
On sensitive areas of your website (e.g. admin areas or pages that can access your database), you can implement stricter access controls such as multi-factor authentication (MFA). This can be effective in stopping bots performing credential stuffing attacks and other malicious activities.
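As a purely illustrative example of such a control, a time-based one-time password (TOTP) second factor for an admin login can be added with the third-party pyotp library; in a real application the secret would be generated once per user and stored server-side:

```python
# Minimal TOTP second factor using pyotp (pip install pyotp).
import pyotp

secret = pyotp.random_base32()   # share with the user's authenticator app once
totp = pyotp.TOTP(secret)

# URI the user scans as a QR code to enroll their authenticator app.
print(totp.provisioning_uri(name="admin@example.com", issuer_name="ExampleSite"))

# At login time, verify the 6-digit code submitted alongside the password.
submitted_code = totp.now()      # in real life this comes from the user
print("Second factor OK:", totp.verify(submitted_code))
```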
Even if a bot manages to breach your site, it won’t be able to fully access your network, which minimizes the potential damage.
Conclusion
Unmanaged bot traffic can be very costly for any business with a website/online presence. Effectively identifying and stopping abusive bot traffic is therefore extremely important.
While there are various approaches you can use to mitigate bad bot traffic, investing in a specialized bot management solution remains the most effective. Since today’s most advanced bots make extensive use of machine learning, an AI-based bot management solution is preferred. The best bot management solutions leverage machine learning to analyze visitor behavior and stop malicious bots before they even reach your network.