Behind the scenes of a massively distributed credential stuffing attack.

A while back, one of DataDome’s customers came under an interesting attack.

The bot operators mostly leveraged common credential stuffing methods and tools; what makes this attack particularly noteworthy is the numbers.

Over approximately 48 hours, the attackers bombarded their target with:

More than 5.7 million requests.
From more than 250,000 different IP addresses.
Distributed across more than 8,000 autonomous systems.
In more than 215 countries (including dependent territories and special areas of geographical interest).

Make no mistake: this attack was definitely not the brainchild of a hooded teenager in his parents’ basement. Whoever was behind it had considerable human, technical, and financial resources at their disposal, and sufficient motivation to deploy them.

Besides the prodigious numbers, the attack represents an interesting case study because we were able to observe in real time how the perpetrators were trying to reverse engineer our bot protection solution: attacking from different angles, making educated hypotheses about our detection criteria, and testing out different strategies for working around the protection (spoiler alert: unsuccessfully).

Let’s dive in!

Background

The target of this particular attack is a pure-play online business based in the United States. The majority of its customers are based in North and Central America, with smaller numbers of users located primarily in the UK and other parts of Western Europe.

For reference, here is a snapshot of the website’s normal, legitimate traffic distribution.

In order to access the website’s services, customers must have an account. Like any website with a significant number of members, it is, therefore, an attractive target for hacker bots trying to access (and take over) user accounts via credential stuffing.

What is credential stuffing?
Credential stuffing is a type of cyberattack that uses malicious bots to attempt to log in to websites using stolen credentials (usernames, emails, passwords, etc.).

The company implemented the DataDome bot protection solution a few weeks prior to this attack. Our user dashboard enables real-time monitoring of all automated traffic, and we immediately noticed that the website had a high, regular volume of credential stuffing attempts. Once the protection was activated, however, these attempts were no longer a cause for concern.

Then one day …

Phase 1: A Massive Attack Distributed Across 215 Countries

In the days leading up to the attack, there was a bit of a lull in the bad bot traffic to the website: the calm before the storm, as it would turn out.

Late in the evening on August 23, our internal systems triggered a notification that something unusual was happening on this customer’s website.

This graph shows the number of requests to the website that the DataDome solution blocked. The algorithm was humming along, blocking approximately 20,000 requests per hour, when all of a sudden the volume of malicious requests quintupled.

This wasn’t a volumetric attack, however. Remember, the spike we’re seeing doesn’t show the total volume of website traffic, only the number of blocked requests. The impressive aspect of the attack wasn’t volume, but distribution.

A closer look at the geographic distribution revealed that the malicious traffic was coming from more than 215 different countries, with the highest concentration in South-East Asia.

Even more extraordinary, the traffic was spread across more than 250,000 IP addresses and more than 8,000 autonomous systems!

This gives us a first glimpse into how the bot operators were reasoning.

They had obviously discovered that our customer (their target) had implemented some kind of protection for its membership area, and was now blocking their bots. In order to continue their dodgy business, they would have to bypass this protection.

However, they probably figured that they were up against a WAF (Web Application Firewall). So they designed a low and slow attack, making sure that each unique IP address generated no more than 10-20 requests. As this would trigger neither user-level nor session-level alerts, the attack would be impossible to detect for a WAF.
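
To see why a low and slow attack slips past per-IP thresholds, here is a minimal sketch of a sliding-window rate limiter of the kind a rules-based setup might use. Everything in it is hypothetical and chosen purely for illustration: the class and function names, the threshold of 100 requests per hour, and the request spacing. It is not how any particular WAF, or DataDome, works.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Flags an IP once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_blocked(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests


def simulate(num_ips: int, requests_per_ip: int, limiter: PerIpRateLimiter) -> int:
    """Replays an attack round-robin across IPs; returns the blocked count."""
    blocked = 0
    now = 0.0
    for _ in range(requests_per_ip):
        for ip_index in range(num_ips):
            now += 0.01  # hypothetical spacing between requests, in seconds
            if limiter.is_blocked(f"ip-{ip_index}", now):
                blocked += 1
    return blocked


# Hypothetical threshold: 100 requests per IP per hour.
distributed = simulate(1000, 20, PerIpRateLimiter(100, 3600.0))
single_source = simulate(1, 150, PerIpRateLimiter(100, 3600.0))
print(distributed, single_source)
```

With these assumed numbers, 20,000 requests spread over 1,000 IPs never trip the limiter, while 150 requests from a single IP are cut off after the first 100. The total volume is irrelevant to a per-IP counter; only the per-IP rate matters, which is exactly what the attackers kept low.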

Of course, manually blocking malicious traffic from 250,000 different IP addresses is utterly unrealistic.

Thankfully for our customer, the DataDome credential stuffing protection did its job. Despite the attack’s spectacular distribution, the malicious traffic was promptly detected and blocked, as illustrated by the monumental spike in denied requests that we saw in the first graph above.

The bot operators, on the other hand, didn’t like this one bit. And so they devised the second phase of their attack.

Phase 2: Reverse Engineering the Bot Protection Solution

The next graph shows the distribution of the originating IPs by country for the duration of the attack. It offers a fascinating illustration of how the bot operators were trying to reverse engineer the protection solution to evade detection.

As discussed above, the first wave (August 24) was characterized by massive geographic distribution. The largest numbers of requests came from Vietnam (the purple curve) and Indonesia (the puke green one), but big bulks of traffic came from other countries as well:

As a side note, there’s no way to tell from our data precisely how the attackers got access to such a humongous number of IP addresses. We also don’t know where they were actually located — perhaps their random IP selection just has a proclivity for certain ranges?

Anyhow, exploiting IPs from 215 countries wasn’t doing the trick for our unfortunate friends: the pesky security system was still blocking their bots. And we can almost hear them pondering: “The protection solution must be deeming international traffic suspicious. After all, this is an American site with a mostly American audience — so let’s try to attack from only American IPs!”

And for the next 36 hours, that’s precisely what they did: the vast majority of requests were now coming from IPs located in the US (the red curve above).

Alas, our hard-working hackers had no more success with this strategy than with the previous one. And after one last effort with another multi-country set of IPs, they admitted defeat and went away to attack someone else. We hope it wasn’t you.

How DataDome Deals With Massively Distributed Attacks

As mentioned above, a rules-based security solution such as a WAF would have been futile against this attack. Sure, our customer does monitor failed login attempts independently of DataDome. Without us, they would still have discovered that a major attack was going on. However, they would have been powerless to stop it without causing significant inconvenience to real customers.

The case perfectly illustrates how easily bot operators can now bypass IP-centered solutions. Thanks to residential proxy networks, vulnerable IoT systems, and malicious applications installed on mobile devices, they can quickly rotate through hundreds of thousands of different IPs and adapt their attack strategies as they go along.

It now takes truly expert bot detection know-how to thwart attacks of this nature. Efficient protection requires tools that are capable of determining the visitors’ intent, rather than just analyzing volume and known bot signatures.

So how exactly do we achieve that?

The DataDome bot protection solution is built around a two-layer detection engine that makes extensive use of artificial intelligence and machine learning technologies.

In the first layer, known threats are detected in less than 2 milliseconds thanks to AI and custom-rule pattern matching and HTTP fingerprinting. This layer identifies and blocks 99 percent of all bad bot requests to our customers’ websites, mobile apps, and APIs.

New threats, which represent the real challenge, are identified via statistical and behavioral criteria. This analysis is based on an extensive set of signals, including (but not limited to) fake browser detection, browser automation detection, browser tracking, user event tracking, and device detection. This layer detects advanced new threats in less than 100 milliseconds, and it also feeds the first detection layer in real time.
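
As a rough illustration of the two-layer idea, the sketch below puts a fast lookup of known-bad fingerprints in front of a slower behavioral score, and lets the second layer feed new fingerprints back into the first. It is a toy model under stated assumptions, not DataDome's implementation: the names (Request, TwoLayerDetector), the signal keys, the 0.7 threshold, and the simple average that stands in for real statistical and machine learning models are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    fingerprint: str  # hypothetical stable hash of HTTP-level traits
    # Hypothetical behavioral signals, each scored in [0, 1] (1 = bot-like).
    signals: dict = field(default_factory=dict)


class TwoLayerDetector:
    def __init__(self, known_bad, threshold: float = 0.7) -> None:
        self.known_bad = set(known_bad)
        self.threshold = threshold

    def behavioral_score(self, request: Request) -> float:
        # Stand-in for a real model: the plain average of the signals.
        if not request.signals:
            return 0.0
        return sum(request.signals.values()) / len(request.signals)

    def classify(self, request: Request) -> str:
        # Layer 1: constant-time lookup of already known threats.
        if request.fingerprint in self.known_bad:
            return "blocked:layer1"
        # Layer 2: slower behavioral analysis for unknown traffic.
        if self.behavioral_score(request) >= self.threshold:
            # Feedback: layer 1 will catch this fingerprint next time.
            self.known_bad.add(request.fingerprint)
            return "blocked:layer2"
        return "allowed"


detector = TwoLayerDetector(known_bad={"fp-old"})
bot = Request("fp-new", {"automation": 1.0, "fake_browser": 0.9})
human = Request("fp-human", {"automation": 0.0, "fake_browser": 0.1})

first_seen = detector.classify(bot)
second_seen = detector.classify(bot)
human_verdict = detector.classify(human)
print(first_seen, second_seen, human_verdict)
```

The point of the design is that the expensive analysis only runs on traffic the cheap layer has never seen, and each new detection makes the cheap layer a little smarter.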

Thanks to these AI-based behavioral detection capabilities, we identify and block even the most sophisticated bots in real time, with no discernible impact on real, human users.

Read more: How to detect malicious bots.

A Note on Feedback Loops

In the relatively rare cases where our identification is still not conclusive after the initial detection steps, we present the visitor with a challenge that includes a CAPTCHA. DataDome uses the trusty CAPTCHA as a simple but effective tool for measuring false positives: human visitors that have incorrectly been flagged as bots. As such, it constitutes a valuable feedback mechanism for our algorithm. (Our false positive rate is an industry-beating 0.01%.)

The user dashboard that our customers see also includes a real-time report showing the number of CAPTCHAs served, failed, and passed. It’s how our customers can continuously monitor their false positive rate and assess the quality of the AI detection themselves. (Which certainly keeps us on our toes!)
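
The served, failed, and passed counters lend themselves to a simple calculation. The sketch below treats every passed CAPTCHA as a likely human who was wrongly challenged, which is an assumption made for illustration. The CaptchaStats name and the sample figures are hypothetical, and the denominator behind the 0.01% figure quoted above is not specified in this article, so this ratio should not be read as that metric.

```python
from dataclasses import dataclass


@dataclass
class CaptchaStats:
    served: int  # challenges displayed
    passed: int  # challenges solved correctly
    failed: int  # challenges answered incorrectly

    @property
    def unanswered(self) -> int:
        # Challenges that were displayed but never attempted.
        return self.served - self.passed - self.failed

    def pass_rate(self) -> float:
        # Share of served challenges that were solved; assumed here to be
        # a proxy for humans incorrectly flagged as bots.
        return self.passed / self.served if self.served else 0.0


# Hypothetical counters for illustration only.
sample = CaptchaStats(served=1000, passed=1, failed=400)
# A window in which no challenge was solved, as on the login endpoint.
attack_window = CaptchaStats(served=2_000_000, passed=0, failed=0)
print(sample.unanswered, sample.pass_rate(), attack_window.pass_rate())
```

A pass rate of zero over millions of challenges is what you would expect when the challenged traffic is entirely automated.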

We’re proud to declare that—out of the millions of CAPTCHAs that our solution displayed during this attack—zero CAPTCHAs were solved on the login endpoint.

In other words, while we were fighting off millions of malicious requests behind the scenes in the space of a couple of days, not a single real, human user was bothered by a CAPTCHA.

What We Could Have Done Better

So was everything 100% perfect? Of course not.

While we’re confident that we managed to prevent the attackers from taking over any of our customer’s user accounts, a few bots likely evaded detection in the very early phases of the attack.

When an attack is distributed on such a scale, detection does take a little more time. To ensure that zero bots will bypass the protection, even during major attacks, we need our algorithm to update even faster.

Thanks to the behavioral analysis, we currently detect a new bot pattern every 50 ms. That’s more than 1.2 million new bots detected every day, automatically and in real time. We analyze every request to our customers’ websites, apps, and APIs, and we collect 250 events for every request.

In our efforts to continuously improve our algorithm, we need to collect ever more event data to feed our real-time event analysis with an ultra-fast feedback loop—without increasing latency, of course.

Conclusion

So what are the key takeaways from our review of this impressive attack?

Here’s the good, the bad, and the ugly—in reverse order, for the sake of your good night’s sleep.

The stakes for attackers are very high.

Any cyberattack is the result of a cost vs value consideration, although “value” doesn’t always mean money. Prestige, revenge, and political influence are but a few examples of alternative currencies.

Either way, our data clearly demonstrate that the value our protagonists expected to generate from a successful breach was significant. Given the investment in time and money required to perpetrate such an attack, the potential upside must have been considerable.

If you’re in charge of data security on a website with lots of user accounts, take note.

Bot detection is extremely difficult.

Precisely because the stakes are so high, bot operators are continuously exploring and investing in new and sophisticated technologies, just like we are.

Current bots are already almost indistinguishable from human users, and impossible to detect without specialized knowledge. We expect the next generation of bots to make massive use of artificial intelligence, which will make it even more challenging to spot them.

Efficient bot protection exists!

As promised, we’ll end on a happy note. While accurate bot detection is hard, it’s not impossible.

As bots and humans are now using the same browsers and IPs, bot protection must be a real-time, automated process. Humans can no longer act fast enough to match the technical prowess of bots, but artificial intelligence can.

Thanks to real-time event tracking and behavioral detection, we are able to deflect even the most sophisticated attacks, as illustrated by this case study.

Also, there is strength in numbers. When a new malicious bot is detected, automatically and in real time, on one of the global enterprise online businesses we protect, all our customers are automatically protected in less than 50 milliseconds.

To stay ahead of bad bots, we continue to develop and scale the DataDome solution with ever-stronger event collection and event tracking capabilities. If you haven’t yet signed up for our free 30-day trial, what are you waiting for? 😉