Wait, wait … “bots”, “hits”, “scraping”… what does it all mean?
Do not worry. This FAQ contains all the necessary information to understand what we do, why we do it, and how. Just browse and see.
A bot (short for “robot”) is an automated or semi-automated program built to interact with webpages and web servers, accessing and scanning the data and content on those pages in a way meant to mimic the actions of a human visitor.
A good bot usually operates for a search engine, crawling the web regularly to index web pages and make them appear in online search results. A bad bot is usually programmed for malicious purposes such as data harvesting, content scraping, competitive intelligence, or marketing and advertising fraud.
Bad bots can be a threat for various reasons:
- Competitive intelligence: scraping and reusing your data, depriving you of traffic and sales revenue
- Content theft: republishing of proprietary content, hurting your SEO and depriving you of traffic and ad revenue
- Ad fraud: invalid clicks and impressions that waste advertising spend
- Hacking & brute force attacks: high-volume attacks on your website infrastructure, raising maintenance costs and degrading the customer experience
Web scraping is the act of extracting data from a website in order to store it or make it available elsewhere. While web indexing (as conducted by search engines) aims at making relevant data easier to find, web scraping exploits that data by harvesting it from its source.
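In its simplest form, scraping boils down to fetching a page and pulling values out of its markup. Here is a minimal, stdlib-only sketch; the markup and the `product-name` class are hypothetical, and a real scraper would fetch the page over HTTP (e.g. with `urllib.request`) instead of using an inline string:

```python
from html.parser import HTMLParser

class ProductScraper(HTMLParser):
    """Collects the text of every <span class="product-name"> element."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "product-name") in attrs:
            self.in_name = True

    def handle_data(self, data):
        if self.in_name:
            self.products.append(data.strip())
            self.in_name = False

# Hypothetical page fragment standing in for a fetched catalog page.
html = ('<ul><li><span class="product-name">Widget A</span></li>'
        '<li><span class="product-name">Widget B</span></li></ul>')

scraper = ProductScraper()
scraper.feed(html)
print(scraper.products)  # ['Widget A', 'Widget B']
```

Run at scale against a competitor's catalog, a loop like this is exactly the kind of traffic that looks harmless hit by hit but drains value from the site being scraped.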
Ad fraud is a pervasive threat to a company’s marketing efforts. Automated programs generate fake clicks and impressions on links and display ads, costing companies money for invalid views and actions.
In technical terms, a “hit” is generated for every element loading on a webpage visited by a human or an automated program. This includes the page itself, its CSS, hosted images, and advertisements.
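To make the notion concrete, the sketch below tallies hits from a few fabricated access-log paths: a single human page view produces one hit per resource the browser requests, not one hit total.

```python
from collections import Counter

# Fabricated log paths for ONE page view: the page plus every
# resource the browser requests counts as a separate hit.
log_paths = [
    "/products",           # the HTML page itself
    "/static/style.css",   # its stylesheet
    "/static/logo.png",    # a hosted image
    "/ads/banner.js",      # an advertisement script
]

hits_by_type = Counter(
    path.rsplit(".", 1)[-1] if "." in path else "html"
    for path in log_paths
)
total_hits = sum(hits_by_type.values())
print(total_hits)  # 4 hits for a single page view
```

This is why hit counts dwarf visitor counts, and why analyzing traffic at the hit level gives a much finer-grained signal than page views alone.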
We’ve analyzed billions of hits on our partners’ websites over the past months. Sorting through this massive database, we have been able to establish key characteristics – both technical and behavioral – of bad bot traffic. Any hit that matches our detection criteria with certainty can simply be blocked by our module, which denies it access.
Detection is the toughest part of the job. Once a bad bot has been identified by our API server, our module can deny access or serve a CAPTCHA to the bot in real time to prevent it from accessing your content.
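The enforcement side can be pictured as a small hook in the request path: ask the detection service for a verdict, then allow, block, or challenge accordingly. The function names and verdict values below are illustrative only, not DataDome’s actual API:

```python
# Illustrative enforcement hook. classify_hit() stands in for a
# real-time call to a detection API server; its logic and the
# verdict strings are hypothetical.
def classify_hit(request):
    user_agent = request.get("user_agent", "")
    if user_agent.startswith("EvilScraper"):
        return "block"      # certain bad bot: deny outright
    if not user_agent:
        return "challenge"  # suspicious: serve a CAPTCHA
    return "allow"

def handle_request(request):
    verdict = classify_hit(request)
    if verdict == "block":
        return {"status": 403, "body": "Access denied"}
    if verdict == "challenge":
        return {"status": 403, "body": "Please solve this CAPTCHA"}
    return {"status": 200, "body": "page content"}

print(handle_request({"user_agent": "EvilScraper/1.0"})["status"])  # 403
print(handle_request({"user_agent": "Mozilla/5.0"})["status"])      # 200
```

The key point is that the module only enforces a decision; the hard classification work happens on the detection side, out of the request path.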
You can simply download the module by applying for DataDome dashboard access. You will then be able to select the version of the module corresponding to your server, and integrate it directly by copying and pasting the code into your server configuration.
Our solution is fully compatible with 80% of web servers, including Nginx, Varnish, Apache, IIS, and with popular applications such as WordPress.
DataDome comes fully equipped with custom dashboards, allowing SysAdmins and security experts to get real-time insights on bot activity.
As a SaaS solution, we store all of our data in the cloud. As a result, it takes less than 2 milliseconds, on top of standard network latency, for us to match a hit on your website against our API server and let it decide whether to grant access or keep the intruder at bay. Your customers won’t notice a thing – as a matter of fact, they’ll even enjoy a smoother experience thanks to the reduced traffic load. With the KeepAlive connections created by our modules, we observe a maximum total latency of less than 30 milliseconds.
We also provide our customers with a timeout setting, which deactivates our module if the timeout is reached. This feature ensures that, in any case, our module will never degrade your customers’ experience.
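The fail-open behavior described above can be sketched as follows: if the detection call exceeds the configured time budget, the request is allowed through rather than delayed. The function names and the 30 ms budget below are illustrative:

```python
# Fail-open sketch: when the detection call takes longer than the
# configured timeout, let the request through instead of degrading
# the visitor's experience. Names and timings are illustrative.
TIMEOUT_MS = 30

def check_with_timeout(api_latency_ms, verdict):
    """Simulated detection call: the verdict only applies if it arrives in time."""
    if api_latency_ms > TIMEOUT_MS:
        return "allow"  # timeout reached: fail open, never block real users
    return verdict

print(check_with_timeout(api_latency_ms=2, verdict="block"))    # block
print(check_with_timeout(api_latency_ms=500, verdict="block"))  # allow
```

The design trade-off is deliberate: a slow detection backend should cost a missed bot at worst, never a lost customer.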
We care deeply about the protection of our customers’ data. Everything is stored in separate indexes, in multiple Tier 3 datacenters.
Not quite. But on top of our primary focus on content scraping, infrastructure maintenance, and advertising fraud, our solution also protects websites and applications from common attacks and exploits, such as SQL injection and application intrusion.
While DataDome doesn’t operate on the network layer, our solution can prevent DDoS attacks on your applications by protecting your assets against password brute force and injection brute force.
Hosting companies can protect your website on the network layer, ensuring the responsiveness of your pages by managing massive bot activity, but they do not provide specific protection for your applications and content. DataDome fills this need by detecting the subtler bot activity that hosting companies aren’t equipped to notice.
Blocking an IP is not enough to protect your content. By doing so, you run the risk of blocking legitimate search engine bots or human visitors, as a single IP can be used by hundreds of different users. Many bots are “clever” and run only a few hits per IP each day to remain undetected by volume-based protection solutions.
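A toy illustration of why per-IP volume thresholds fall short: a bot that rotates through many IP addresses, sending only a handful of requests from each, stays under any realistic per-IP limit while still harvesting the whole site. The numbers and addresses below are fabricated:

```python
from collections import Counter

# Volume-based rule: block any IP that exceeds this many hits per day.
PER_IP_LIMIT = 100

# A "clever" distributed bot: 2,000 rotating IPs, only 5 hits each.
bot_hits = [f"10.0.{i // 256}.{i % 256}" for i in range(2000) for _ in range(5)]

hits_per_ip = Counter(bot_hits)
flagged = [ip for ip, count in hits_per_ip.items() if count > PER_IP_LIMIT]

print(len(bot_hits), len(flagged))  # 10000 scraping hits, 0 IPs flagged
```

Catching this traffic requires looking at technical and behavioral fingerprints across hits, not just request volume per address.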
Want to know how bots are impacting your business? Create your DataDome account right now and get started for free.