Server-side bot detection is not enough. Here’s why…


Most IT security professionals are now well aware of bad bots and the persistent threat they represent to any business with an online presence. As a result, the demand for anti-bot software is rapidly increasing.

An efficient bot protection solution must be able to accurately distinguish between bad bots, good bots, and humans, ideally in real time. To determine whether a visitor is a human or a bot, we can collect information from the server side and the client side, in the browser or in a mobile application.

In the following, we will demonstrate why solutions that rely exclusively on server-side detection are powerless against certain types of bots, and why client-side signals must complete the analysis for truly efficient bot protection.

Server-side fingerprinting identifies basic bots.

Server-side detection is typically based on the following information:

HTTP fingerprint: a fingerprint built from the HTTP headers sent by the browser, such as the user agent or the supported compression algorithms.
TCP fingerprint: differences in how operating systems implement the TCP/IP stack, such as the initial window size, the TTL, or the order of TCP options in the SYN packet, reveal the nature of the device that sends the request.
TLS fingerprint: these fingerprints use the parameters of the TLS ClientHello, such as the supported cipher suites and extensions, to identify the software (e.g. a browser, a mobile app, or an HTTP library) making the request.
Server-side behavioral features: the number of requests, their frequency, and whether they follow a plausible browsing pattern can be used to determine whether a user is human.
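To illustrate the TLS fingerprint, here is a minimal sketch of the public JA3 convention: the ClientHello version, cipher suites, extensions, elliptic curves and point formats are joined into one string and hashed with MD5. It assumes that the TLS terminator already exposes the parsed ClientHello fields; the `hello` object shape and the `ja3` function name are hypothetical, and parsing the handshake itself is out of scope.

```javascript
const crypto = require('crypto');

// GREASE values (RFC 8701) such as 0x0a0a or 0x1a1a are random placeholders
// that browsers inject; JA3 ignores them so the fingerprint stays stable.
const isGrease = (v) => (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);

// Build a JA3-style fingerprint from already-parsed ClientHello fields
// (the field names of `hello` are an assumption of this sketch).
function ja3(hello) {
  const join = (values) => values.filter((v) => !isGrease(v)).join('-');
  const ja3String = [
    hello.version,
    join(hello.cipherSuites),
    join(hello.extensions),
    join(hello.ellipticCurves),
    join(hello.ecPointFormats),
  ].join(',');
  const hash = crypto.createHash('md5').update(ja3String).digest('hex');
  return { ja3String, hash };
}

const example = ja3({
  version: 771, // TLS 1.2 in the legacy version field
  cipherSuites: [0x0a0a, 4865, 4866],
  extensions: [0, 23],
  ellipticCurves: [29, 23],
  ecPointFormats: [0],
});
console.log(example.ja3String); // 771,4865-4866,0-23,29-23,0
```

Because a bot driving a real Chrome produces the same ClientHello as a human's Chrome, this hash alone cannot tell them apart.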

This server-side detection is a necessary primary measure, but it’s not sufficient.

What are the limits of server-side bot detection?

Faced with the most recent generations of bots, a security solution that relies only on server-side detection quickly reaches its limits. That’s because these advanced bots use exactly the same browsers as human users (Chrome, Firefox, Safari) or headless browsers such as Headless Chrome.

Unlike basic bots, which can’t execute JavaScript and are therefore easy to unmask, these advanced bots run a real browser engine and present consistent HTTP, TCP and TLS fingerprints.

Moreover, whenever small inconsistencies exist, such as a non-human user agent, they can easily be fixed by adding a few lines of code to the bot or by using open-source instrumentation frameworks that forge consistent fingerprints (we discuss this in more detail below).

If server-side detection is all you do, you are completely blind to these bots. Your only chance is to rely on server-side behavioral features, and wait for the bots to trigger your request volume threshold before you can block them.

This approach will invariably miss bots that frequently change their IP address using proxies. And even if they don’t, bots that target customer-critical touch points such as your login page may already have done a lot of harm by the time you identify and block them.
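To make this limitation concrete, here is a minimal sketch of a per-IP request threshold implemented as a sliding window. The `RequestThreshold` class, the limit of 100 requests per minute, and the IP labels are hypothetical. The sketch replays the same 1,000 login attempts twice: once from a single IP, and once spread across 50 proxy IPs.

```javascript
// Hypothetical per-IP threshold: at most `limit` requests per `windowMs`.
class RequestThreshold {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // ip -> timestamps (ms) of requests inside the window
  }

  // Record a request and return true if this IP is now over the threshold.
  isOverLimit(ip, now) {
    const recent = (this.hits.get(ip) || []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.hits.set(ip, recent);
    return recent.length > this.limit;
  }
}

// 1,000 login attempts in one minute (one every 60 ms) from a single IP...
const single = new RequestThreshold(100, 60000);
let blockedSingle = 0;
for (let i = 0; i < 1000; i++) {
  if (single.isOverLimit('203.0.113.7', i * 60)) blockedSingle++;
}

// ...versus the same attempts rotated across 50 proxy IPs (20 each).
const rotated = new RequestThreshold(100, 60000);
let blockedRotated = 0;
for (let i = 0; i < 1000; i++) {
  if (rotated.isOverLimit(`proxy-${i % 50}`, i * 60)) blockedRotated++;
}

console.log(blockedSingle, blockedRotated); // 900 0
```

Even in the single-IP case, the first 100 attempts get through before the threshold trips; with proxy rotation, none of the 1,000 attempts is blocked.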

This is why a truly efficient bot detection solution must combine server-side with client-side detection.

What client-side bot detection enables:

Client-side (in-browser) tracking makes it possible to record and analyze a wide range of low-level facts about the user device and the browser making requests, as well as behavioral signals.

For example:

Browser tracking: feature presence, JavaScript challenges…
App tracking: camera version, screen resolution, number of touch points…
Device tracking: the number of CPU cores, device memory, GPU…
User event tracking: mouse movements and touch events…
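The signals listed above can be gathered in the page with standard browser APIs. The sketch below is a minimal, hypothetical collector (`collectSignals` and `recordPointerEvents` are illustrative names, not DataDome's code): it takes a `window`-like object as a parameter so it can also be exercised outside a browser, and it leaves out GPU and camera probing as well as how the result is sent to the server.

```javascript
// Collect a few low-level device and browser signals.
// In a page you would call collectSignals(window).
function collectSignals(win) {
  const nav = win.navigator || {};
  const scr = win.screen || {};
  return {
    // Browser tracking: is a given property present at all?
    hasWebdriverProp: 'webdriver' in nav,
    // Device tracking: CPU cores and memory (deviceMemory is Chromium-only).
    cpuCores: nav.hardwareConcurrency ?? null,
    deviceMemoryGb: nav.deviceMemory ?? null,
    // Screen resolution and number of touch points.
    screen: [scr.width ?? null, scr.height ?? null],
    maxTouchPoints: nav.maxTouchPoints ?? 0,
  };
}

// User event tracking: append mouse and touch event timestamps to `sink`.
function recordPointerEvents(target, sink) {
  for (const type of ['mousemove', 'touchstart']) {
    target.addEventListener(type, (e) => sink.push({ type, t: e.timeStamp }));
  }
}

const signals = collectSignals({
  navigator: { hardwareConcurrency: 8, maxTouchPoints: 0, webdriver: false },
  screen: { width: 1920, height: 1080 },
});
console.log(signals.cpuCores, signals.screen); // 8 [ 1920, 1080 ]
```

A forged fingerprint has to keep all of these values mutually consistent, which is much harder than rewriting a single HTTP header.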

These client-side signals are crucial for detecting the most advanced bots, even when they forge their fingerprint to bypass less sophisticated security systems.

But don’t take our word for it: let us show you exactly what happens when you don’t collect any client-side signals, by zooming in on one specific use case and one method for client-side detection.

Use Case: Advanced Headless Chrome bots that modify their fingerprint to avoid detection.

In this use case, malicious actors attempt to conduct a credential stuffing attack using thousands of bots based on Headless Chrome and Puppeteer.

By default, Headless Chrome can be identified server-side via its user agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.88 Safari/537.36
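A server-side check for this default user agent can be as simple as looking for the `HeadlessChrome/` token. The sketch below is hypothetical (`isHeadlessChromeUA` is an illustrative name) and is only as reliable as the header itself: a bot that rewrites its user agent passes it, so a miss should be read as "unknown", not "human".

```javascript
// Naive server-side check: does the User-Agent advertise Headless Chrome?
function isHeadlessChromeUA(userAgent) {
  return /\bHeadlessChrome\//.test(userAgent || '');
}

const defaultHeadlessUA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'HeadlessChrome/79.0.3945.88 Safari/537.36';

console.log(isHeadlessChromeUA(defaultHeadlessUA)); // true
```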

However, popular open source libraries such as Puppeteer extra enable developers to erase these obvious detection signals.

The Puppeteer extra library adds more features to the Puppeteer instrumentation framework. Thanks to its stealth plugin, the hackers can easily modify the fingerprints of their Headless Chrome bots. This alone will be enough to bypass most existing bot detection systems.

By default, Puppeteer extra will change the bots’ user agents so that they are consistent with human visitors, and remove attributes such as navigator.webdriver that are traditionally used to detect Headless Chrome.

The library also enables the bot developers to forge several other attributes, such as the list of plugins, the available codecs, or the GPU.
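Every forged attribute is an opportunity for inconsistency. As an illustration only (this heuristic and the function names are hypothetical, not a description of a specific product), a client-side check can compare the operating system claimed in the user agent with `navigator.platform`: a bot running on a Linux server that claims a Mac user agent, without also forging the platform, stands out.

```javascript
// Operating system claimed by the User-Agent string.
function osFromUserAgent(ua) {
  if (/Windows NT/.test(ua)) return 'windows';
  if (/Mac OS X/.test(ua)) return 'mac';
  if (/Android/.test(ua)) return 'android'; // must precede the Linux test
  if (/Linux/.test(ua)) return 'linux';
  return 'unknown';
}

// Operating system reported by navigator.platform.
function osFromPlatform(platform) {
  if (/^Win/.test(platform)) return 'windows';
  if (/^Mac/.test(platform)) return 'mac';
  if (/Linux/.test(platform)) return 'linux'; // Android reports e.g. "Linux armv8l"
  return 'unknown';
}

// Returns false when the two sources clearly disagree.
function userAgentMatchesPlatform(ua, platform) {
  const claimed = osFromUserAgent(ua);
  const actual = osFromPlatform(platform);
  if (claimed === 'unknown' || actual === 'unknown') return true; // no evidence
  if (claimed === 'android') return actual === 'linux';
  return claimed === actual;
}

const macUA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/80.0.3987.0 Safari/537.36';
console.log(userAgentMatchesPlatform(macUA, 'Linux x86_64')); // false
```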

If you want to try for yourself, you can easily launch a crawler based on Puppeteer stealth using the code below. The main difference, compared to a traditional Puppeteer program, is simply that you don’t import puppeteer, but puppeteer-extra instead:

const puppeteer = require('puppeteer-extra');
// Enable the stealth plugin with all evasions
puppeteer.use(require('puppeteer-extra-plugin-stealth')());

(async () => {
  // Launch the browser in headless mode and set up a page.
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();

  // Navigate to the page that will perform the tests.
  const url = 'https://yourwebsite …';
  await page.goto(url);
  await browser.close();
})();

If you now verify the bot’s user agent, you can see that it’s become a legitimate one:

// Run inside the async function above, after page.goto(url) and before browser.close()
const userAgent = await page.evaluate(() => {
  return navigator.userAgent;
});
console.log(userAgent);
// Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.0 Safari/537.36

Headless Chrome has a navigator.webdriver property that is equal to true. When we verify this value in Puppeteer stealth, we can see that navigator.webdriver isn’t present anymore in the bot’s fingerprint, making it undetectable with this technique.

// Run inside the async function above, after page.goto(url) and before browser.close()
const webdriver = await page.evaluate(() => {
  return navigator.webdriver;
});
console.log(webdriver);
// undefined
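A naive client-side check that only tests `navigator.webdriver === true` therefore lets this bot through. The hypothetical sketch below distinguishes three cases instead. The WebDriver specification defines `navigator.webdriver` as `true` under automation and `false` otherwise, so a missing property means either an older browser or a property that was removed; whether that is suspicious depends on the browser version the visitor claims, which makes it a signal to combine with others rather than a verdict.

```javascript
// Classify the navigator.webdriver value reported by a browser.
//   'automated'    -> the browser declares it is under automation (true)
//   'not-declared' -> the flag exists and is false
//   'missing'      -> the property is absent (older browser, or removed by a stealth tool)
function classifyWebdriver(nav) {
  if (nav.webdriver === true) return 'automated';
  if (nav.webdriver === false) return 'not-declared';
  return 'missing';
}

// In a page you would call classifyWebdriver(navigator).
console.log(classifyWebdriver({ webdriver: true })); // automated
console.log(classifyWebdriver({})); // missing
```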

Since both the user agent and the HTTP headers are the same as those of a human user, simple server-side HTTP fingerprinting will not be enough to identify this visitor as a bot. If your bot protection solution relies exclusively on server-side detection, your only hope is that it will sooner or later display suspicious behavior.

On the other hand, thanks to advanced client-side detection, solutions like DataDome can identify these advanced bots upon their first request, even though they are deliberately crafted to avoid detection.

For example, one of the techniques used by Puppeteer stealth to bypass detection is to override the canPlayType function, which is used to test the presence of audio and video codecs.

However, doing this leaves a trace. Indeed, we can test if the canPlayType function has been overridden by executing the code below.

With version 2.4.5 of the Puppeteer stealth plugin (puppeteer-extra-plugin-stealth), running the following code gave:

const canPlayTypeTs = await page.evaluate(() => {
  const audioElt = document.createElement('audio');
  return audioElt.canPlayType.toString();
});
console.log(canPlayTypeTs);
// 'function () { [native code] }'

In the case of a legitimate Chrome browser used by a human, however, you obtained:

'function canPlayType() { [native code] }'

Verdict: the first visitor is a bot.
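The comparison above can be packaged as a small reusable check. This is a hypothetical sketch, not DataDome's implementation: it calls `Function.prototype.toString` directly rather than `fn.toString()`, so a `toString` patched onto the function itself is bypassed (a patched `Function.prototype.toString` could still fool it). The expected string format is engine-specific (Firefox, for instance, inserts line breaks around `[native code]`), so the sketch assumes a Chromium/V8 browser.

```javascript
// Does `fn` look like the engine's own native function named `expectedName`?
function looksNative(fn, expectedName) {
  const src = Function.prototype.toString.call(fn);
  return src === `function ${expectedName}() { [native code] }`;
}

// In a Chromium page, a genuine canPlayType passes, while the anonymous
// override reported above ('function () { [native code] }') would not:
//   looksNative(document.createElement('audio').canPlayType, 'canPlayType')

// Outside a browser, the same check works on any V8 built-in:
console.log(looksNative(Math.max, 'max')); // true
console.log(looksNative(function () { return ''; }, 'canPlayType')); // false
```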

Conclusion

As our use case demonstrates, server-side fingerprinting must be combined with client-side signals in order to accurately detect advanced bots.

Of course, this is only a very simple example of the kind of client-side detection DataDome performs. Our client-side module collects hundreds of different signals like the ones discussed above for every request, and leverages hundreds of billions of events that we are tracking every day.

We also have specific detection methods for all the major instrumentation frameworks, such as Selenium and Playwright. This enables us to detect even the most advanced bots in real time, including those that actively lie about their fingerprint in order to masquerade as humans.

Unfortunately, we can’t give away too much information about the exact signals we use to unmask even the stealthiest bots. But trust us: it works, as this Puppeteer extra stealth user discovered.