Pagesjaunes.fr Relies on DataDome’s Expertise to Better Protect its Data

Control Over Bot Traffic
Protection of Valuable Data
GDPR Compliance
DataDome
7 Dec, 2017

Online directories are premium targets for web scrapers. What better sources for creating a commercially exploitable database, quickly and with minimum effort?

SoLocal, which includes the brands Pagesjaunes (France’s Yellow Pages), Mappy, and Ooreka, is the European leader in local digital communication and appointment scheduling on the internet.

In terms of audience, pagesjaunes.fr is one of the top 20 websites in France, and the SoLocal group was in 8th position on JDNet’s September 2017 ranking.

The main point of blocking bots for SoLocal is to be able to manage the use of its data.
Benjamin Letrou, Architecture, Performance, & Security Manager, Pagesjaunes.fr

The Problem: Demonetization of Copied Data

Benjamin Letrou is the architecture, performance, and security manager of the pagesjaunes.fr portal. He had known for a long time that a significant portion of the site’s traffic was generated by bots: some useful, such as Google’s indexing bot; others harmful, such as those run by hackers who exploited the data they scraped via the website or the mobile API.

In most cases, data extractions were made for lead list building purposes. These lists can then either be exploited directly or sold to third parties.

Directory data are so attractive that you can find specialized software for extracting data from Yellow Pages sites in any country. For a hundred dollars or so, any internet user with basic technical skills can extract the Yellow Pages data he or she wants, unless the data are protected.

Different methods were used to measure and track data leaks. For example, the use of decoys made it possible to follow the data all the way to transfer and redistribution.
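As a rough illustration of the decoy idea, the sketch below checks whether a suspect dataset contains any seeded decoy listings. The record layout, the made-up phone numbers, and the `normalize` and `find_decoys` helpers are all assumptions made for the example; the article does not describe how pagesjaunes.fr actually implemented its decoys.

```python
def normalize(phone):
    """Keep only the digits so that formatting differences do not matter."""
    return "".join(ch for ch in phone if ch.isdigit())


def find_decoys(suspect_records, decoy_phones):
    """Return the (normalized) decoy phone numbers found in a suspect dataset.

    suspect_records: list of dicts with an optional "phone" key (hypothetical layout).
    decoy_phones: phone numbers of fake listings seeded into the directory.
    """
    decoys = {normalize(p) for p in decoy_phones}
    found = set()
    for record in suspect_records:
        phone = normalize(record.get("phone", ""))
        if phone in decoys:
            found.add(phone)
    return sorted(found)


if __name__ == "__main__":
    decoy_phones = ["01 99 00 00 01", "01 99 00 00 02"]  # made-up decoy numbers
    suspect = [
        {"name": "Listing A", "phone": "0199000001"},
        {"name": "Listing B", "phone": "01 23 45 67 89"},
    ]
    print(find_decoys(suspect, decoy_phones))
```

Any decoy that turns up in a third-party list indicates that the list was copied from the directory, since the fake entry exists nowhere else.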

The technical team also monitored traffic and technical logs very closely, in order to detect abnormal behaviors such as aggressive IP addresses or unusual queries.
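This kind of log monitoring can be sketched in a few lines: count requests per IP address and flag those above a threshold. The log layout, the `find_aggressive_ips` helper, and the threshold value are assumptions made for the example, not details of the pagesjaunes.fr setup.

```python
from collections import Counter


def find_aggressive_ips(log_lines, max_requests=100):
    """Return (ip, request_count) pairs above max_requests, busiest first.

    Assumes each log line starts with the client IP followed by a space,
    as in the common Apache/Nginx access log layout.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ip = line.split(" ", 1)[0]
        counts[ip] += 1
    return [(ip, n) for ip, n in counts.most_common() if n > max_requests]


if __name__ == "__main__":
    # Sample lines using documentation-reserved IP ranges.
    sample = (
        ['203.0.113.7 - - "GET /search?q=plumber HTTP/1.1" 200'] * 5
        + ['198.51.100.2 - - "GET / HTTP/1.1" 200'] * 2
    )
    for ip, n in find_aggressive_ips(sample, max_requests=3):
        print(ip, n)
```

A simple per-IP count like this is easy to evade by spreading requests across many addresses, which is the limitation of IP-only protections that Letrou describes.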

The scale of the challenge was therefore well identified, but countermeasures still had to be found against the massive data extractions.

“We have always had infrastructure-level protections against massive attacks. They were efficient, but they were based on IP addresses only and not very sophisticated,” says Benjamin Letrou. “Other measures such as honeypots ensured additional protection, but they were cumbersome to set up.”

The Solution: Step 1. Identifying & Analyzing the Threat

For some time, therefore, Benjamin Letrou had been looking for a more efficient solution.

“I had a relatively clear idea of what I wanted, and I was contemplating developing the protection solution myself when someone on my team suggested I take a look at DataDome,” explains Benjamin. “I quickly understood that, technically, the solution met our needs. All that remained was to agree on the financial aspect!”

The integration was completed quickly, as the DataDome solution matched the technologies pagesjaunes.fr was already using. The DataDome module, which is compatible with most configurations including multi-cloud, required minimal modification to the technical infrastructure.

In a first phase, Mr. Letrou and his team only used the DataDome solution to observe the traffic, without activating the bot blocking function. Based on the results and analysis thus obtained, the solution was adapted to pagesjaunes.fr while remaining within the scope of a generic product.

“The integration went very well,” Letrou affirms. “The DataDome team was extremely responsive, whether we needed explanations or solutions that meet our demands.”

The result enabled the data analysis team, which is responsible for the quality of traffic data sent to advertisers, to corroborate and refine the filtering they had been carrying out for years. The information provided by the DataDome solution has fueled and supplemented existing tools, enabling the team to better understand their audiences and to strengthen the relationship with advertisers.

The Result: Protecting Data to Enhance Its Value

Following the analysis phase, DataDome’s smart protection was activated. Scraper bots are now blocked from accessing the site. Their attempts are identified and mapped on a dashboard, which allows the pagesjaunes.fr teams to measure bot activity in real time. DataDome now protects pagesjaunes.fr’s valuable data, bringing multiple benefits.

“Bot traffic and data mining were issues that required lots of resources, and sometimes manual processes,” observes Letrou. “Now, it’s managed. Our data are secure, and no longer accessible to bots.”

Last but not least: DataDome is part of pagesjaunes.fr’s process for compliance with the GDPR, the new European legislation that takes effect in May 2018, by ensuring, “among other things, the protection of personal data hosted online,” Benjamin Letrou confirms.
