What’s the difference between good bots and bad bots?


What is a bot?

A bot is a software application that automatically performs certain tasks quickly and at scale. It is a tool that can be used for good or bad purposes. Good bots are integral to our daily online lives, while bad bots can seriously damage your business if you don’t properly protect yourself.

This article will talk about the differences between good bots and bad bots. It will also cover the different types of bot attacks and how you can protect yourself from bad bots while letting good bots do their necessary work.

Good Bots vs. Bad Bots

The easiest way to classify bots is by their intent: have they been created to do good or bad? This heuristic works because bots vary so widely in complexity that other categorizations quickly fail. Bots can be a few lines of code, meant to automate a repetitive task, or multiple scripts working together to mimic the behavior of a human.

What is a good bot?

A good bot is a bot that performs a helpful or useful task for your company or website visitors. It is not built with bad intentions. Most of the time, it does not damage or worsen the user experience of the places it crawls.

A good bot is usually built by a reputable company. It respects the webmaster’s rules that regulate how often website bots should crawl and index a website. These rules are usually defined in a website’s robots.txt file. A good bot should be programmed to look for that file, read it, and follow its rules before it does anything else.
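As a rough illustration of that "read robots.txt first" behavior, here is a minimal sketch of a well-behaved crawler checking the rules before fetching a page. It uses Python’s standard urllib.robotparser module. The robots.txt content, the bot name ExampleBot, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real bot would download this file
# from the site's /robots.txt path before requesting any other URL.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set the crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent used only for this sketch.
for url in ("https://example.com/blog/", "https://example.com/private/data"):
    allowed = rules.can_fetch("ExampleBot", url)
    print(url, "allowed" if allowed else "disallowed")

# Seconds the site asks crawlers to wait between requests, if specified.
print("crawl delay:", rules.crawl_delay("ExampleBot"))
```

A polite crawler would skip every disallowed URL and pause for the requested delay between requests.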

Examples of good bots are:

Search engine bots, such as Googlebot, the Baiduspider, Bingbot, YandexBot, and others. These bots crawl the vast World Wide Web to find the content that will improve search engine results.
Social network bots, such as Facebook Crawler and Pinterest Crawler. These bots crawl the websites that have been shared on social media networks to create better recommendations, fight spam, build a safer online environment, and more.
Aggregator bots, such as the Feedly Fetcher. These bots crawl the RSS or Atom feeds of websites to build their automatically generated feeds as per the preferences of their users.
Marketing bots, such as SEMrush bot and AhrefsBot. All SEO and content marketing software will have bots that crawl websites for backlinks, organic and paid keywords, amount of traffic, and more.
Site monitoring bots, such as Uptimebot, WordPress pingbacks, and the PRTG Network Monitor. These bots ping your website to detect its overall performance and whether it’s down or not.
Voice engine bots, such as Alexa’s Crawler and Applebot (Siri). Similar to search engine bots, these bots crawl the web so they can give accurate answers to the questions your users ask their voice assistant devices.

As you can see, many different types of good bots will want to crawl your website. This doesn’t mean you should let them all do so. Good bots take up bandwidth too.

Sometimes, it doesn’t make sense for them to crawl your website. For example, if you don’t serve the Chinese or Russian markets, the Baiduspider and YandexBot shouldn’t crawl your website. You can easily forbid them from doing so with the robots.txt file. The fact that these bots will obey those rules makes them good bots.
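From the site owner’s side, such a rule set could look like the sketch below. It generates robots.txt text that bars the two crawlers named above while leaving the site open to everyone else, then checks the result with Python’s standard parser. The helper build_robots_txt is hypothetical, and the user-agent tokens are simply the bot names used in this article; check each crawler’s documentation for the exact token it honors.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_bots):
    """Return robots.txt text that bars the named bots and allows all others."""
    blocks = [f"User-agent: {bot}\nDisallow: /\n" for bot in blocked_bots]
    # An empty Disallow value means "nothing is disallowed" for everyone else.
    blocks.append("User-agent: *\nDisallow:\n")
    return "\n".join(blocks)


robots_txt = build_robots_txt(["Baiduspider", "YandexBot"])
print(robots_txt)

# Verify the rules the same way a compliant crawler would read them.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in ("Baiduspider", "YandexBot", "Googlebot"):
    print(bot, parser.can_fetch(bot, "https://example.com/"))
```

Keep in mind that this only works for bots that choose to obey the file.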

What is a bad bot?

A bad bot is programmed to perform a task that will hurt your company or website visitors. It is built with bad intentions and will directly or indirectly worsen the user experience of the places it crawls.

Bad bots are usually built by cybercriminals, fraudsters, or anyone else who’s involved in illegal activities. Your competitors, too, can use bad bots to hurt you. Bad bots either don’t read or simply ignore the rules in the robots.txt file.

Unfortunately, bad bots have become increasingly sophisticated. They used to be simple crawlers that were easily identifiable as bots. Now, a growing number of bots mimic human behavior and use software such as Chrome Headless, Playwright, and Puppeteer to trick all but the best bot management solutions.

Different Types of Bad Bot Attacks

1. Layer 7 DDoS Attacks

DDoS stands for Distributed Denial of Service. In Layer 7 DDoS attacks, bad bots will target specific application-layer processes and overwhelm those functions or features until your websites, apps, or APIs either slow down significantly or crash altogether. In 2017, Neustar estimated that these attacks cost an organization an average of $2.5 million in revenue.
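One common generic countermeasure at the application layer is per-client rate limiting. The sketch below is a simple sliding-window limiter, not DataDome’s method. The class name, the limits, and the choice of an IP address as the client key are assumptions made for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> recent request times

    def allow(self, client_key: str, now: float) -> bool:
        hits = self._hits[client_key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject or challenge the request
        hits.append(now)
        return True


# Hypothetical limit: 3 requests per 10 seconds for each client key.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(decisions)  # [True, True, True, False]
```

A limiter keyed on a single IP address only slows down an attack that comes from a few sources. A distributed attack that spreads requests over thousands of addresses can stay under every per-client limit.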

2. Web Scraping

Scraper bots will steal the prices, product descriptions, or other valuable content from your website and use it elsewhere without your permission. Competitors can use these bots to undercut your prices or quickly repurpose your content. Sometimes, the copied content will rank high in search engines too, hurting your own SEO rankings and stealing customers that should’ve been yours.

3. Click Fraud

Click or ad fraud means generating fake pageviews, clicks, and impressions to cost advertisers money without generating any sales. Bad bots do this at such a scale that it costs companies billions of dollars a year. This is bad not only for advertisers but for publishers too, as publishers want to maintain good relationships with their advertisers. Click fraud can seriously damage their reputation.

4. Account Takeover (ATO)

Account takeover happens when bad bots take over user accounts to access personal data, linked bank accounts, and credit cards. They do so through credential stuffing and credential cracking, where they replay username and password pairs leaked in data breaches or brute-force passwords at scale. Once they’re in, they can steal someone’s identity or fraudulently use their credit card.
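One simple signal of credential stuffing is a single source failing to log in to many different accounts. The sketch below flags such sources from a list of failed login attempts. The function name, the threshold, and the sample data are hypothetical, and a real attack that rotates IP addresses would need richer signals than this.

```python
from collections import defaultdict


def flag_stuffing_sources(failed_logins, min_distinct_usernames=5):
    """Return source IPs whose failed logins span many distinct usernames.

    failed_logins is an iterable of (source_ip, username) pairs.
    """
    usernames_by_source = defaultdict(set)
    for source_ip, username in failed_logins:
        usernames_by_source[source_ip].add(username)
    return sorted(
        ip
        for ip, names in usernames_by_source.items()
        if len(names) >= min_distinct_usernames
    )


# Hypothetical sample: one source tries six accounts, another user
# mistypes their own password three times.
attempts = [("203.0.113.7", f"user{i}") for i in range(6)]
attempts += [("198.51.100.20", "alice")] * 3
suspects = flag_stuffing_sources(attempts)
print(suspects)  # ['203.0.113.7']
```

Counting distinct usernames rather than raw failures keeps a forgetful human, who retries one account, from being flagged alongside a bot that cycles through many.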

5. Spam

Bad bots can swarm your website to post bad comments wherever possible, advertising illicit goods and services or distributing malware. They can also scrape your website for email addresses that they’ll then send unsolicited emails to. Statista observed that 53.95% of all email traffic in March 2020 was spam. While often more a nuisance than a genuine threat, spam can still damage your reputation and drain your resources.

How can you prevent bots?

Use a bot protection solution.

Ultimately, you need a dedicated bot management solution. There’s a reason leading independent research firms consider bot management increasingly mature and beneficial for companies to invest in. Bot management solutions have become required cybersecurity software to protect your websites, apps, and APIs. These solutions are especially critical for blocking AI bots, which use advanced techniques to mimic human behavior and bypass traditional defenses.

DataDome is real-time bot management software that protects your business against all the attacks listed above. It detects and blocks bad bots in less than 2ms, even if those bots rotate through thousands of IPs to make them harder to detect.

Additionally, DataDome understands that every company has a different tech stack. Your website architecture, no matter how complex, shouldn’t hold you back from investing in the right bot management solution. DataDome works on any web infrastructure and can be deployed in minutes. There’s no complex setup, although you can customize DataDome extensively if you wish.

The bad bot landscape is continuously evolving. If the DataDome algorithm detects a new bot threat on any of the properties it protects, it will propagate that knowledge in less than 50ms to all DataDome customers. This way, the more properties under its protection, the better it protects.

Exclude bots from Google Analytics.

Bots can make it seem like you’re receiving many more website visitors than you actually are, making it impossible to make data-driven decisions using Google Analytics. While Google Analytics offers an option in the Admin View settings to exclude hits from known bots and spiders, it isn’t bot detection software.

Google Analytics will miss many lesser-known, new, and advanced bots. Sometimes, such bots can make up the majority of your web traffic. It’s not uncommon for bots to account for a whopping seventy percent of all traffic (we’ve seen it happen). Additionally, even if Google Analytics were perfect at telling which visitor is a bot and which is a human, it does nothing to prevent a bad bot from trying to wreak havoc on your websites, apps, and APIs.

Use CAPTCHAs.

CAPTCHAs used to be reasonably effective protection against bots. In recent years, however, bots have found ways to circumvent them, to the point where CAPTCHAs have become hard for humans and easy for bots. Whereas CAPTCHAs used to be a nuisance that had at least some use, today they kill your conversions and make the internet less accessible while offering very little protection in exchange.

Implement a WAF.

A good Web Application Firewall (WAF) can block familiar threats that rely on known malicious user agents and IP addresses. However, bots now rotate through hundreds, if not thousands, of residential IP addresses with good reputations. WAFs and their IP-centric rules can no longer keep up with modern bots.
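To make that limitation concrete, here is a minimal sketch of the kind of static rule such a firewall might apply: a blocklist of IP ranges plus a list of user-agent substrings. The rule entries, the function name is_blocked, and the sample requests are hypothetical and do not reflect any particular WAF product.

```python
import ipaddress

# Hypothetical static rules, in the style of a simple WAF blocklist.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_AGENT_SUBSTRINGS = ["badbot", "headlesschrome"]


def is_blocked(ip: str, user_agent: str) -> bool:
    """Return True if the request matches a known-bad IP range or user agent."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    agent = user_agent.lower()
    return any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS)


# A request from a listed range is caught.
print(is_blocked("203.0.113.7", "Mozilla/5.0"))
# So is a request that announces a headless browser.
print(is_blocked("198.51.100.20", "Mozilla/5.0 HeadlessChrome/120.0"))
# The same bot on a fresh IP with a spoofed user agent passes.
print(is_blocked("198.51.100.20", "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))
```

The last call shows the gap: once a bot switches to an unlisted address and a browser-like user agent, a rule of this kind has nothing left to match on.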

Integrate MFA.

Multi-factor authentication is a great security tool to stop bad bots on your login pages. However, you cannot force your users to implement MFA, no matter how much you emphasize its importance in your UI. Unfortunately, many users won’t bother using it, because they feel it introduces unnecessary friction. This leaves them vulnerable to account takeovers.
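For readers curious about what a second factor looks like in practice, one widely used form is the time-based one-time password (TOTP) defined in RFC 6238. The sketch below computes and checks such a code with Python’s standard library. The function names are hypothetical, the shared secret is the RFC’s published test value, and a production system would rely on a vetted library and also handle clock drift and replay protection.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password using HMAC-SHA1."""
    counter = unix_time // step  # number of time steps since the Unix epoch
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, unix_time: int) -> bool:
    """Compare the submitted code with the expected one in constant time."""
    return hmac.compare_digest(totp(secret, unix_time), submitted)


# Shared secret from the RFC 6238 test vectors (never use it in production).
SECRET = b"12345678901234567890"
print(totp(SECRET, 59, digits=8))  # 94287082, matching the RFC test vector
print(verify(SECRET, "287082", 59))  # True
```

Because the code changes every 30 seconds and depends on a secret the attacker does not hold, a stolen password alone is no longer enough to log in.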

Conclusion

This article has gone over the definition of a bot, the difference between good bots and bad bots, the different types of bot attacks, and how you can prevent those bot attacks.

DataDome has a free 30-day trial that will show you how much bot traffic is already crawling your website. The trial is easy to install on your own and requires no credit card. Try it today to better understand the possible threats you’re under.