DataDome

The State of Bots 2024: Changes to the Bot Ecosystem

The bot ecosystem in 2024 is significantly more advanced than it was even a year ago: updates to Headless Chrome make automated browsers more difficult to catch, proxy networks give bots access to huge pools of reputable IPs, and AI advances make traditional CAPTCHAs easy to solve automatically.

Puppeteer Extra Stealth is Dead—Long Live Anti-CDP Bot Frameworks

When it comes to bot development, it’s difficult not to mention Puppeteer Extra Stealth, one of the most popular anti-detect bot frameworks. It offers bot developers several features to lie about a bot’s fingerprint and is even integrated with CAPTCHA farms. As of June 2024, it has 6.2K stars on GitHub.

Github Puppeteer Extra Stealth Page

Lately, Puppeteer Extra Stealth’s popularity has started to decline. No significant code change or update has been pushed within the last year, and the primary code maintainer has started his own paid bot product. But that’s not the only reason for the decline in popularity: the latest update of Headless Chrome makes automated browsers more difficult to detect by default.

With just a few changes, such as calling page.setUserAgent() to change the user agent and passing the --disable-blink-features=AutomationControlled argument to get rid of the navigator.webdriver signal, there are very few inconsistencies left in the fingerprint of Headless Chrome.

The lack of maintenance of Puppeteer Extra Stealth, combined with the major Headless Chrome update and new CDP (Chrome DevTools Protocol) detection techniques, has led the bot dev community to create new anti-detect bot frameworks. These new frameworks include nodriver (announced as the successor of undetected chromedriver) and Selenium driverless. To avoid being detected, these frameworks do not rely on Chromedriver and Selenium. Instead, they implement all the usual bot automation functions using low-level CDP commands that do not leverage Runtime.enable.

Even though these frameworks are quite new, they are already popular. For example, as of June 2024, nodriver has 590 stars on GitHub.

Github nodriver page

But It’s Not Just About Automated (Headless) Browsers

It has become easier to forge all kinds of signals, including low-level signals that used to be difficult to forge consistently.

For example, the Noble TLS library enables bot developers to replace their usual Python HTTP client with an HTTP client that has a consistent TLS fingerprint.

Developers just need to provide a user agent, e.g. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36 for Chrome 125 on macOS, and the library takes care of updating the TLS fingerprint accordingly. This is particularly useful for attackers who want to bypass server-side-only detection mechanisms and static signatures/rules implemented in WAFs.

Noble TLS Page
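To see why a consistent TLS fingerprint matters to defenders, consider the kind of static server-side rule it defeats. The sketch below is a minimal illustration, not any vendor's actual rule: the lookup table, the fingerprint labels (`tls-fp-chrome-a` and so on), and the function names are all hypothetical placeholders.

```python
from typing import Optional

# Hypothetical table: browser family -> TLS fingerprints expected from it.
# The labels are made-up placeholders, not real fingerprint values.
EXPECTED_TLS_FINGERPRINTS = {
    "chrome": {"tls-fp-chrome-a", "tls-fp-chrome-b"},
    "firefox": {"tls-fp-firefox-a"},
}


def claimed_browser(user_agent: str) -> Optional[str]:
    """Return the browser family the User-Agent header claims, if recognized."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    return None


def is_consistent(user_agent: str, observed_fingerprint: str) -> bool:
    """True if the observed TLS fingerprint matches the claimed browser family."""
    family = claimed_browser(user_agent)
    if family is None:
        return False  # unknown client: treated as inconsistent in this sketch
    return observed_fingerprint in EXPECTED_TLS_FINGERPRINTS[family]


CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)

# A default HTTP client that only spoofs the header is caught by the rule...
print(is_consistent(CHROME_UA, "tls-fp-python-client"))
# ...but a client whose TLS fingerprint follows the user agent is not.
print(is_consistent(CHROME_UA, "tls-fp-chrome-a"))
```

A rule like this only catches clients whose header and handshake disagree. Once a library keeps the two aligned, the check passes, which is why static signatures alone are not enough.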

Besides forging fingerprinting signals, several libraries—such as Ghost cursor—help bots generate more realistic and human-like mouse movements.

Github ghost cursor page

Instead of moving the mouse in a straight line, Ghost cursor generates mouse movements with a less suspicious trajectory using Bézier curves.

Bezier Curve
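The difference between a straight-line movement and a curved one can be made concrete with a little math. The sketch below evaluates a standard cubic Bézier curve and computes a simple "straightness" ratio (direct distance divided by distance travelled). The ratio is an illustrative metric chosen for this example, not a description of how Ghost cursor or any detection engine works, and the control points are arbitrary.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def sample_path(p0: Point, p1: Point, p2: Point, p3: Point, steps: int = 50) -> List[Point]:
    """Sample steps + 1 evenly spaced parameter values along the curve."""
    return [cubic_bezier(p0, p1, p2, p3, i / steps) for i in range(steps + 1)]


def straightness(path: List[Point]) -> float:
    """Direct distance divided by travelled distance: 1.0 is a perfectly straight path."""
    travelled = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    direct = math.dist(path[0], path[-1])
    return direct / travelled if travelled else 1.0


start, end = (0.0, 0.0), (300.0, 0.0)

# Control points on the segment itself give a straight line.
straight = sample_path(start, (100.0, 0.0), (200.0, 0.0), end)
# Control points off the segment bend the trajectory.
curved = sample_path(start, (100.0, 150.0), (200.0, 150.0), end)

print(round(straightness(straight), 3), round(straightness(curved), 3))
```

A naive check that flags only perfectly straight paths would catch the first trajectory and miss the second, which is why trajectory shape alone is a weak signal.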

Ghost cursor is also compatible with Puppeteer, one of the most popular bot automation frameworks. Thus, bot developers don’t need to modify all of their code to benefit from this library.

Using ghost cursor in puppeteer

Attackers Have Access to Millions of (Residential) IPs

Thanks to residential proxy services such as Brightdata, Smartproxy, and Oxylabs, bot developers have access to millions of residential IPs located all around the world:

Brightdata proxy locations by country

This enables bot developers to:

  1. Distribute their attacks across thousands of IPs, which helps them bypass IP-based rate limiting techniques.
  2. Use IPs that belong to well-known ISPs such as Comcast and AT&T, which helps them bypass all forms of detection that aim to block data center IPs.
  3. Use thousands of IPs located in the same country as the target website or mobile app, which lets them bypass all kinds of geo-blocking techniques.
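The first point can be illustrated with a toy per-IP rate limiter. This is a minimal sketch under stated assumptions: a single time window, a hypothetical limit of 10 requests per IP, and placeholder IP addresses. It is not a production rate limiter.

```python
from collections import Counter
from typing import Iterable


def blocked_requests(request_ips: Iterable[str], limit_per_ip: int) -> int:
    """Count the requests a per-IP limit would reject within one time window."""
    seen = Counter()
    blocked = 0
    for ip in request_ips:
        seen[ip] += 1
        if seen[ip] > limit_per_ip:
            blocked += 1
    return blocked


LIMIT = 10  # hypothetical limit: 10 requests per IP per window

# 1,000 requests from one address (placeholder documentation IP).
single_ip = ["203.0.113.7"] * 1000

# The same 1,000 requests spread over 1,000 distinct placeholder addresses.
distributed = [f"10.0.{i // 250}.{i % 250}" for i in range(1000)]

print(blocked_requests(single_ip, LIMIT))    # 990
print(blocked_requests(distributed, LIMIT))  # 0
```

With one source IP, the limiter rejects 990 of the 1,000 requests. With one request per IP, it rejects none, even though the total traffic is identical.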

Bots as a Service: Proxy Networks Make the Lives of Bot Developers Easier

Lately, most of the popular proxy networks have started to provide scraping bots as a service. Bot as a service (BaaS) providers let users run bots at scale without requiring any bot development or reverse engineering knowledge.

A BaaS is simply a REST API to which a user submits the URL they want to scrape. The business model is simple: users only pay when their request is successful. As long as the request is blocked, the user doesn’t pay for anything, and they don’t need to worry about proxy bandwidth, which can become expensive when using residential proxies.

If the request is blocked, the BaaS will make several requests in parallel in an attempt to bypass the protection. For example, they may:

  • Rotate the user agent.
  • Spoof new HTTP headers.
  • Change IP address by using new proxies.
  • Forge a CAPTCHA response.

If at some point the BaaS is able to get the content without being blocked, it returns the content to the user. The user only pays for a single API call, even though the BaaS had to make dozens of requests to successfully get the content of the page.

Traditional CAPTCHAs Are Definitely Out of Their League

Security researchers have shown that traditional CAPTCHAs that rely mostly on the difficulty of their challenge for security have become straightforward to solve using audio and image recognition techniques.

Traditional CAPTCHAs studied

CAPTCHA bot Solve time and accuracy

AI Helped to Scale CAPTCHA Farm Services

In the past, CAPTCHA farm services such as 2captcha relied on human workers from developing countries to solve CAPTCHAs on behalf of bots. With the recent progress in audio and image recognition techniques, new services such as CapSolver have been able to reduce both their own costs and the price they charge for solving CAPTCHAs.

In 2018, it cost around $3 to solve 1,000 reCAPTCHA v2 challenges, and each reCAPTCHA took around 45s. Now, CAPTCHA solving services can solve 1,000 reCAPTCHA v2 challenges for $0.80, spending 5x less time per challenge.

Cheaper CAPTCHA solver rates
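Working through the numbers quoted above makes the scale of the change clearer. The short calculation below uses only the figures from this section ($3 and $0.80 per 1,000 challenges, 45s per challenge, 5x speedup). The one-million-challenge campaign size is a hypothetical value added for illustration.

```python
OLD_PRICE_PER_1000 = 3.00   # USD, 2018 figure quoted above
NEW_PRICE_PER_1000 = 0.80   # USD, current figure quoted above
OLD_SECONDS_PER_CHALLENGE = 45
SPEEDUP = 5                 # "5x less time" per challenge


def cost_for(challenges: int, price_per_1000: float) -> float:
    """Total price in USD for solving the given number of challenges."""
    return challenges / 1000 * price_per_1000


price_ratio = OLD_PRICE_PER_1000 / NEW_PRICE_PER_1000
new_seconds_per_challenge = OLD_SECONDS_PER_CHALLENGE / SPEEDUP

# Hypothetical campaign size, for illustration only.
campaign = 1_000_000

print(f"Price drop: about {price_ratio:.2f}x cheaper")
print(f"Time per challenge: about {new_seconds_per_challenge:.0f}s instead of {OLD_SECONDS_PER_CHALLENGE}s")
print(f"Cost of {campaign:,} challenges: ${cost_for(campaign, OLD_PRICE_PER_1000):,.0f} then, "
      f"${cost_for(campaign, NEW_PRICE_PER_1000):,.0f} now")
```

In other words, the price per challenge falls from roughly a third of a cent to less than a tenth of a cent, and each solve takes about 9 seconds instead of 45.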

How to Protect Against Sophisticated Bots in 2024

To summarize the state of bot development this year:

  • Bots have access to tools that enable them to have a near-perfect fingerprint. They can easily forge low-level signals, such as TLS fingerprints.
  • Bots have access to millions of residential proxy IPs located all around the world. These residential proxies enable them to bypass traditional detection mechanisms such as IP-based rate limiting and geoblocking.
  • Traditional CAPTCHAs, whose security relies solely on the difficulty of the challenge, have become ineffective against bots.
  • The latest AI progress in audio and image recognition techniques has significantly reduced the time and monetary cost for bots to solve CAPTCHAs.

What countermeasures can I use to protect against sophisticated bots?

Depending on the type of attacks conducted by bots, you can implement certain countermeasures on your own. For example, to counter credential stuffing attacks, you could encourage your users to activate multi-factor authentication. While this is no silver bullet, it definitely raises the bar for attackers.

When it comes to fake account creation, you can try to detect disposable emails or enforce phone number verification. However, be careful with phone number verification, which can lead to significant SMS fees if attackers create accounts in bulk.
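As a rough idea of what disposable email detection can look like at its simplest, the sketch below checks the domain of an address against a blocklist. The domain names are hypothetical placeholders (`.example` is a reserved TLD), and a real deployment would need a maintained list and proper address validation. This shows the principle only.

```python
# Hypothetical blocklist: placeholder domains, not real disposable email providers.
DISPOSABLE_DOMAINS = {"disposable.example", "tempmail.example"}


def is_disposable_email(address: str) -> bool:
    """True if the address's domain, or one of its parent domains, is on the blocklist."""
    _, separator, domain = address.strip().lower().rpartition("@")
    if not separator or not domain:
        return False  # not a well-formed address; validation is out of scope here
    labels = domain.split(".")
    # Check the full domain and each parent domain so that subdomains also match.
    return any(".".join(labels[i:]) in DISPOSABLE_DOMAINS for i in range(len(labels)))


print(is_disposable_email("bot123@tempmail.example"))       # True
print(is_disposable_email("bot123@mail.tempmail.example"))  # True
print(is_disposable_email("alice@example.com"))             # False
```

A static list like this is easy to evade with fresh domains, so it works best as one signal among several rather than as a standalone gate.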

In general, it’s key to have a multi-layered approach against sophisticated threats. Since it has never been so easy for attackers to lie about and forge their fingerprint, behavior, and IPs, your bot detection should be able to:

  1. Leverage all detection signals and techniques available, from advanced browser fingerprinting challenges and ML-based behavioral analysis to sophisticated residential proxy detection.
  2. Analyze all these signals in real time to ensure bot requests are blocked as fast as possible, before they can cause any damage to your infrastructure.
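To give a feel for what combining signals means in practice, here is a deliberately simplified sketch that merges three signals into one score. It is not how DataDome's detection engine works: the signal names, weights, and threshold are all hypothetical values picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    fingerprint_inconsistent: bool   # e.g. user agent and TLS fingerprint disagree
    behavior_anomaly: float          # 0.0 (human-like) to 1.0 (bot-like), from a hypothetical model
    residential_proxy_likely: bool   # output of a hypothetical proxy detector


# Hypothetical weights and threshold, chosen only for this example.
WEIGHTS = {"fingerprint": 0.4, "behavior": 0.4, "proxy": 0.3}
CHALLENGE_THRESHOLD = 0.5


def bot_score(signals: RequestSignals) -> float:
    """Weighted sum of the individual signals."""
    return (
        WEIGHTS["fingerprint"] * signals.fingerprint_inconsistent
        + WEIGHTS["behavior"] * signals.behavior_anomaly
        + WEIGHTS["proxy"] * signals.residential_proxy_likely
    )


def should_challenge(signals: RequestSignals) -> bool:
    """Challenge only requests whose combined score crosses the threshold."""
    return bot_score(signals) >= CHALLENGE_THRESHOLD


human = RequestSignals(False, 0.1, False)
proxy_only = RequestSignals(False, 0.0, True)     # a real user may sit behind a residential IP
proxy_and_behavior = RequestSignals(False, 0.9, True)

for name, s in [("human", human), ("proxy_only", proxy_only), ("proxy_and_behavior", proxy_and_behavior)]:
    print(name, should_challenge(s))
```

With these placeholder weights, no single weak signal triggers a challenge, but a residential proxy combined with bot-like behavior does. That is the point of layering: each signal can be forged or can misfire on its own, while the combination is harder to fake and less likely to penalize real users.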

On top of that, you should also consider the effectiveness of your bot detection against distributed attacks. Since attackers have access to millions of IP addresses, it’s important to have approaches that can block bots from the first request, even when they leverage hundreds of thousands of IPs to distribute their attack. At DataDome, we developed several ML models specialized for this use case, including a novel ML approach that we presented at Black Hat Asia.

Finally, one of the most important points when it comes to security is the user experience (UX). Your bot detection should safeguard your UX: security shouldn’t come at the expense of it. You don’t want to bother all your human users with a CAPTCHA whenever they are about to spend money. That’s why DataDome analyzes thousands of signals in the background, for every request, and only challenges the requests that have been flagged as malicious by our detection engine.

To see firsthand how the DataDome Platform can protect your business from bad bots and online fraud, try it for free or book a demo today.
