Why Client-Side Signals are a Must-Have for Detecting Sophisticated Attacks

When it comes to bot detection, you might hear arguments both for and against leveraging client-side signals. This article explains how client-side signals significantly improve accuracy in detecting bot frameworks that have been modified to evade detection.

What are client-side signals?

Client-side signals are signals collected from the end-user device. They can be collected in the browser using JavaScript (JS), or in mobile applications using an SDK. Some people advocate against the use of client-side signals because they are collected from the device of the user (or the potential attacker), and can therefore be manipulated (we address this limitation below).

At DataDome, on the other hand, we strongly advocate for collecting client-side signals in addition to server-side and reputational signals. To explain why collecting client-side signals is important, I cover the following topics:

How to Address the Limits of Client-Side Signal Collection

Client-Side JavaScript Signals for Bot Detection in the Wild

3 Types of Bot Detection Signals:

1. Behavior

Behavioral signals can be collected on both the server side and the client side.

For example:

On the server side, a detection engine can analyze the way the user browses a website or mobile app. The behavior can be analyzed as a time series, using unsupervised machine learning (ML) to detect outliers in the number of requests over time. We can also conduct more advanced graph-based detection that analyzes the transitions between URLs, for example whether or not the sequence of visited URLs contains cycles.

On the client side, behavioral signals are collected in the browser using JavaScript, or in a mobile application using an SDK. These signals often come from events linked to user interaction with the website/app, and include mouse movements, clicks, touch events, typing speed, and sensor signals (such as those from an accelerometer).
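The server-side example above can be made concrete with a minimal sketch. It is not DataDome's implementation: the modified z-score is a simple statistical stand-in for the unsupervised ML mentioned above, and the function names (`rate_outliers`, `transition_graph`, `has_cycle`) and the threshold are assumptions chosen for illustration.

```python
from statistics import median


def rate_outliers(counts, threshold=3.5):
    """Return the indexes of time buckets whose request count is an outlier.

    Uses the modified z-score (based on the median absolute deviation).
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # More than half of the buckets are identical: flag any deviation.
        return [i for i, c in enumerate(counts) if c != med]
    return [i for i, c in enumerate(counts)
            if 0.6745 * abs(c - med) / mad > threshold]


def transition_graph(urls):
    """Build a directed graph of page transitions from an ordered URL list."""
    graph = {}
    for src, dst in zip(urls, urls[1:]):
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    return graph


def has_cycle(graph):
    """Depth-first search that reports whether the graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:
                return True  # back edge: we returned to the current path
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)


# Requests per minute for one session; the sixth bucket is a burst.
print(rate_outliers([12, 9, 11, 10, 13, 240, 11, 10]))
print(has_cycle(transition_graph(["/", "/list", "/item/1", "/list"])))
```

A cycle is not suspicious in itself, since humans also return to listing pages. In practice it would be one feature among many rather than a verdict.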

Behavioral signals are fed to machine learning models that aim to detect whether or not the way a user interacts with a website or a mobile application is consistent with human behavior.
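As an illustration of what such a model might consume, the sketch below turns raw mouse events into two numeric features. The article does not describe the actual features, so both of them (path straightness and the regularity of event timing) and the `mouse_features` name are assumptions.

```python
from math import hypot
from statistics import mean, pstdev


def mouse_features(events):
    """Compute two features from (timestamp_ms, x, y) tuples in time order."""
    if len(events) < 2:
        return {"straightness": 0.0, "interval_cv": 0.0}
    pairs = list(zip(events, events[1:]))
    steps = [hypot(x2 - x1, y2 - y1) for (_, x1, y1), (_, x2, y2) in pairs]
    gaps = [t2 - t1 for (t1, _, _), (t2, _, _) in pairs]
    path_length = sum(steps)
    displacement = hypot(events[-1][1] - events[0][1],
                         events[-1][2] - events[0][2])
    mean_gap = mean(gaps)
    return {
        # 1.0 means a perfectly straight path from the first to the last point.
        "straightness": displacement / path_length if path_length else 0.0,
        # Coefficient of variation of the gaps; 0.0 means perfectly regular timing.
        "interval_cv": pstdev(gaps) / mean_gap if mean_gap else 0.0,
    }


scripted = [(0, 0, 0), (10, 10, 10), (20, 20, 20), (30, 30, 30)]
human_like = [(0, 0, 0), (12, 5, 9), (31, 7, 20), (38, 18, 24)]
print(mouse_features(scripted))
print(mouse_features(human_like))
```

A perfectly straight, evenly timed trajectory like `scripted` is unusual for a human hand, but neither feature is conclusive alone, which is why such values would feed a model rather than a fixed rule.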

2. Reputation

Reputational signals are computed only on the server side. Reputation can be computed at different levels of granularity (user session, IP address, autonomous system) and in different time windows (hours, days, months).

Reputational signals enable a detection engine to leverage prior knowledge to best adjust its decisions/aggressiveness. Thus, if an autonomous system is often linked to credential stuffing, then ML models would consider that information (along with thousands of other signals) and be more aggressive with the incoming traffic.
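A minimal sketch of reputation at several granularities and time windows follows. The `ReputationStore` class, its in-memory storage, and the "ratio of malicious events" metric are all assumptions for illustration. A production system would use a bounded, persistent store and far richer signals.

```python
from collections import defaultdict


class ReputationStore:
    """Keeps past verdicts per (level, key), e.g. ("asn", "AS64500")."""

    def __init__(self):
        # (level, key) -> list of (timestamp_seconds, was_malicious)
        self._events = defaultdict(list)

    def record(self, level, key, timestamp, was_malicious):
        self._events[(level, key)].append((timestamp, was_malicious))

    def malicious_ratio(self, level, key, now, window_seconds):
        """Share of malicious events in the window, or None if no history."""
        recent = [bad for ts, bad in self._events.get((level, key), [])
                  if now - window_seconds <= ts <= now]
        return sum(recent) / len(recent) if recent else None


HOUR = 3600
store = ReputationStore()
store.record("asn", "AS64500", 0, True)
store.record("asn", "AS64500", 10 * HOUR, True)
store.record("asn", "AS64500", 23 * HOUR, False)
store.record("asn", "AS64500", 24 * HOUR - 60, True)

now = 24 * HOUR
print(store.malicious_ratio("asn", "AS64500", now, HOUR))       # last hour
print(store.malicious_ratio("asn", "AS64500", now, 24 * HOUR))  # last day
```

The same key can look different depending on the window, which is one reason to keep several windows side by side instead of a single score.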

3. Signatures

Signature signals can be collected both on the server side and the client side. Signatures take diverse forms, including:

HTTP fingerprints, based on HTTP headers (server side).
TLS fingerprints, based on metadata extracted during the TLS handshake (server side).
Browser fingerprints, based on information about the browser, device, and operating system (OS) collected using JS (client side, in the browser).
Mobile fingerprints, based on information about the device and OS collected using an SDK (client side, in a mobile application).
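To illustrate the first item in the list, here is a minimal sketch of an HTTP fingerprint. The choice of inputs (header order plus two header values) and the `http_fingerprint` name are assumptions. Real fingerprints, including DataDome's, may use different fields.

```python
import hashlib


def http_fingerprint(headers):
    """Hash header order and a few stable values.

    headers: list of (name, value) pairs in the order they were received.
    """
    names = [name.lower() for name, _ in headers]
    values = {name.lower(): value for name, value in headers}
    parts = [
        ",".join(names),  # header order differs between HTTP clients
        values.get("accept-language", ""),
        values.get("accept-encoding", ""),
    ]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


request_a = [
    ("Host", "example.com"),
    ("User-Agent", "ExampleClient/1.0"),
    ("Accept-Language", "en-US"),
    ("Accept-Encoding", "gzip"),
    ("Cookie", "session=1"),
]
# Same client, different cookie value: the fingerprint stays the same.
request_b = request_a[:-1] + [("Cookie", "session=2")]
# Same headers sent in a different order: the fingerprint changes.
request_c = [request_a[1], request_a[0]] + request_a[2:]

print(http_fingerprint(request_a) == http_fingerprint(request_b))
print(http_fingerprint(request_a) == http_fingerprint(request_c))
```

Excluding per-request values such as cookies keeps the fingerprint stable across requests from the same client software.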

Browser fingerprints and mobile fingerprints are paramount to thorough detection. Advanced JS or SDK challenges make it possible to detect popular headless browsers and automation frameworks, such as:

Headless Chrome
Puppeteer
Selenium
Playwright

In addition to detecting generic bot automation technologies, client-side challenges are also highly efficient at detecting and tracking modified bot frameworks that aim to bypass traditional bot detection techniques. In particular, client-side challenges can detect:

Puppeteer Extra Stealth, a modified Puppeteer framework that lies about its fingerprint and forges CAPTCHA.
Modified Selenium Chrome Drivers
Modified versions of Playwright

To learn more about how client-side challenges help to quickly and accurately detect modified bot frameworks, you can read our two recent articles about Puppeteer extra stealth and modified Selenium.

To summarize, two types of bot detection signals can be collected on the client side:

Behavior: Mouse Movements, Touch Events, Sensors
Signatures: Browser & Mobile Fingerprinting

Behavioral and signature signals are imperfect, but DataDome has found that discarding client-side signals (~30-40% of all signals) is detrimental to high-quality bot detection. In particular, automated (headless) browsers like Headless Chrome inherently have HTTP and TLS fingerprints consistent with those of a regular browser, so server-side signatures alone often cannot tell them apart from human traffic. It is therefore crucial to gather as many signals as possible to detect bots swiftly.
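The sketch below illustrates the point with a server-side check on raw values reported by the browser. `navigator.webdriver` and `navigator.userAgent` are real browser properties, but the payload field names (`webdriver`, `userAgent`) and the `automation_hints` function are assumptions. Modified frameworks override exactly these properties, so real challenges have to go well beyond these basic checks.

```python
def automation_hints(http_user_agent, client_signals):
    """Return human-readable hints that a browser may be automated.

    http_user_agent: value of the User-Agent request header (server side).
    client_signals: raw values collected in the browser with JavaScript.
    """
    hints = []
    if client_signals.get("webdriver") is True:
        hints.append("navigator.webdriver is true")
    js_user_agent = client_signals.get("userAgent", "")
    if "HeadlessChrome" in js_user_agent:
        hints.append("HeadlessChrome in navigator.userAgent")
    if js_user_agent and js_user_agent != http_user_agent:
        # A hint rather than proof: some legitimate tools also cause this.
        hints.append("User-Agent header differs from navigator.userAgent")
    return hints


ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

# The HTTP header looks like a regular browser in both cases;
# only the client-side payload tells the two visitors apart.
print(automation_hints(ua, {"webdriver": False, "userAgent": ua}))
print(automation_hints(ua, {"webdriver": True, "userAgent": ua}))
```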

How to Address the Limits of Client-Side Signal Collection

Attackers can spoof or manipulate client-side signals using several techniques. One example is overriding native JS objects to avoid detection, as popular libraries such as Puppeteer extra stealth do.

A common rule in security is: Never trust user input. This includes not only text or parameters sent by the user, but also any information collected on the client side.

So, how do we address these limits?

We use a combination of techniques:

Code Obfuscation: The JavaScript code responsible for collecting the client-side signals in the browser is obfuscated to make it more difficult for attackers to understand what’s collected.
Collecting Raw Signals on the Client Side: All the detection logic is on the server side. That way, even if attackers can infer some collected signals, the purpose of collecting those signals and the way they are used are not exposed on the client side.
Frequently Updating the Client-Side Signals: We update our client-side signals often to stay ahead of bot developers trying to adapt their bots. Our client-side signal collection modules (SDK and JSTag) are designed to make it easy, fast, and safe for us to deploy new versions that collect new signals. The way our detection engine is architected also allows new signals to be quickly used by existing ML models and detection rules.

Finally, all signals are handled safely, which means that having non-suspicious client-side signals does not grant anyone any privileges. Users are not allowed in just for sending a legitimate-looking set of client-side signals. All requests from each user are still evaluated by our detection engine in real time based on all detection signals (behavior, reputation, server-side signatures, etc.) to detect malicious traffic.
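The "no privileges" principle can be sketched as a decision function in which client-side evidence can only count against a visitor. This is a deliberately simplified illustration, not DataDome's engine: the `decide` function, the score ranges, and the threshold are assumptions.

```python
def decide(client_hints, behavior_score, reputation_score, block_threshold=0.8):
    """Return "block" or "allow".

    client_hints: list of automation hints derived from client-side signals.
    behavior_score, reputation_score: floats in [0, 1], higher is more bot-like.
    """
    if client_hints:
        # Client-side evidence of an automated browser is enough to block.
        return "block"
    # An empty hint list earns no credit: the other signals are still
    # evaluated exactly as they would be without client-side data.
    risk = max(behavior_score, reputation_score)
    return "block" if risk >= block_threshold else "allow"


print(decide([], behavior_score=0.95, reputation_score=0.10))
print(decide([], behavior_score=0.10, reputation_score=0.20))
print(decide(["navigator.webdriver is true"], 0.0, 0.0))
```

Because clean client-side signals never lower the risk, an attacker who forges a perfect fingerprint still has to pass every other check.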

Client-Side JavaScript Signals for Bot Detection in the Wild

The two graphs below, taken from our recent posts about Puppeteer extra stealth (graph 1) and modified Selenium (graph 2), show the volume of requests we blocked using signals collected on the client side with JavaScript.

Graph 1: Number of Puppeteer extra stealth requests blocked over time using client-side JavaScript signals.

Graph 2: Number of modified Selenium requests blocked over time using client-side JavaScript signals. The traffic is split per type of modified Selenium.

Without JS/client-side signals, we might still have detected the bots above using other signals, but it likely would have taken longer: behavioral signals require more requests before an accurate decision can be made. Client-side signals, by contrast, are available immediately.

Whenever client-side signals are linked to a (modified) automated browser, we can confidently block the malicious session. Moreover, client-side signals help us provide more contextual information to our customers. Indeed, we can infer the type of technologies used by attackers (the automation framework, and whether or not it integrates with CAPTCHA farms) based on behavioral and reputational signals.

Conclusion

At DataDome, we argue that an efficient bot detection engine must leverage a large and diverse range of detection signals (client side, server side, behavior, reputation, signatures) in real time to detect advanced bots.

Even though client-side signals can sometimes be manipulated by attackers, they provide a significant benefit to threat detection and explainability when used with the right countermeasures (obfuscation, server-side detection logic, frequent updates, and no privileges granted for clean client-side signals).

In particular, client-side signals are highly effective at detecting and tracking bot frameworks that modify their fingerprints to evade detection. To get a complete picture of which sophisticated attacks are targeting you, try a free threat assessment and 30-day trial of DataDome.