Bot detection: how to detect bot traffic in 2021 | DataDome
Bot detection is (or should be) a cornerstone of any application security and online fraud prevention program. About a third of the world’s total web traffic is now made up of malicious bots, and bad bots are responsible for many of the most serious threats to online businesses.
However, accurately detecting bot traffic is harder than it has ever been. Bot developers are smart people and skilled engineers. They exploit the latest technologies and constantly devise new ways to circumvent security solutions. Many of them are also flush with cash: selling bots and associated services has become a thriving business in its own right.
So bad bot developers have motive, and they have means. Our task, then, is to deny them the opportunity to conduct their attacks.
In this article, we’ll explore the current state of bad bots and reveal (some of) the bot detection techniques we are using here at DataDome.
What security professionals expect from a bot detection solution
What does a bad bot look like in 2021?
Bot detection techniques: on signals, bot signatures & machine learning
Feedback loops & human user experience
Threat intelligence & bot SOC
Bringing it all together: real-time response at the edge
How to detect and manage web bots? – Wire19
What is Bot Traffic?
Bot traffic refers to any traffic to a website or platform that doesn’t originate from human users. Although the term ‘bot traffic’ might sound harmful at first, it’s important to remember that there are also good bots with beneficial purposes.
In fact, some bots, like Googlebot or Bingbot, are essential for the success of your website. However, there are also malicious bots, used for purposes ranging from launching DDoS attacks to scraping content and data, or even stealing data.
Around 40-50% of total internet traffic is bot traffic, and a large share of it comes from bad bots. This is why many businesses are concerned with detecting traffic from bad bots, distinguishing it from good bots and legitimate users, and managing it.
Current Challenges in Detecting Malicious Bot Traffic
As we have briefly discussed above, there are two layers to the challenges in detecting malicious bot traffic: differentiating between bot traffic and legitimate human traffic and distinguishing between good bots and bad bots.
Distinguishing bots from human users alone has become quite a complex task. Bad bots in particular are evolving rapidly, because their developers adopt the latest technologies faster than ever before. These malicious bots are also purposely designed to evade traditional bot detection systems. Telling them apart from good bots is even more difficult.
With that being said, internet bots have evolved dramatically in recent years, and we can classify these bots (especially bad bots) into four ‘generations’:
First-generation bots or gen-1: built with basic scripting tools, they mainly perform simple automated tasks like scraping, form spam, and carding. They used to be simple to mitigate because they often use inconsistent UAs (user agents) and make thousands of requests from just one or two IP addresses.
Third-generation bots or gen-3: gen-3 bots are used for what is known as low-and-slow DDoS attacks, but also for identity theft, API abuse, and other purposes. They are fairly difficult to detect based on device and browser characteristics, and identifying them requires proper behavioral and interaction-based analysis.
Fourth-generation bots or gen-4: currently the newest iteration, these bots can perform human-like interactions such as non-linear mouse movements, and they can also change their IP addresses. Detecting them requires advanced methods, often involving AI and machine learning technologies.
The latest generation of bots (gen-4) is very hard to differentiate from legitimate human users, and basic bot detection technologies are no longer sufficient. So, how can we detect and manage these bots properly? We discuss that in the next section.
How to Detect Bot Traffic? Differentiating Between Bots and Human Visitors
Here we will tackle the first layer of the challenge: how to detect bot traffic and distinguish it from human traffic.
Fortunately, any analytics tool that can analyze your website traffic will help here. Google Analytics, for example, is a good place to start. From there, we can check the following metrics:
Traffic spikes: If you see a surge in traffic lasting anywhere from a single day to a week, it can be a sign of bot traffic. Typically, your traffic should grow steadily over time in line with your marketing performance. If, for example, you’ve seen an improvement in SERP ranking, you can (and should) expect an increase in traffic. The same applies when there’s a new product launch. So if a spike has no correlation with your activities, take a closer look at that time period.
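Checking for unexplained spikes can be partly automated. The Python sketch below assumes you have exported daily session counts from your analytics tool into a list. The function name `find_traffic_spikes`, the 7-day window, and the 3x threshold are illustrative assumptions, not recommended values. Each day is compared to the median of the previous week, so a spike does not inflate the baseline the way a mean would.

```python
from statistics import median

def find_traffic_spikes(daily_sessions, window=7, ratio=3.0):
    """Return indexes of days whose sessions exceed `ratio` times the
    median of the preceding `window` days (a rolling baseline).

    Window size and ratio are illustrative defaults, not tuned values.
    """
    spikes = []
    for i in range(window, len(daily_sessions)):
        baseline = median(daily_sessions[i - window:i])
        # Require a positive baseline so a stretch of zero-traffic days
        # doesn't cause every later visit to be flagged.
        if baseline > 0 and daily_sessions[i] > ratio * baseline:
            spikes.append(i)
    return spikes

sessions = [120, 130, 125, 118, 140, 135, 128, 131, 900, 870, 133, 127]
print(find_traffic_spikes(sessions))  # [8, 9]
```

A flagged day only tells you where to look. You still need to check whether a campaign, ranking change, or launch explains it.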
Traffic sources: another important metric to look at. ‘Healthy’ traffic can come from a variety of channels depending on your marketing and promotional activities (organic search traffic, direct traffic, social media traffic, paid campaigns, and referral traffic). Bot traffic, however, commonly shows up as direct traffic consisting of new unique users and sessions.
Suspicious IPs, locations, and languages: Suspicious hits from single IP addresses are the most basic form of bot activity and should be the easiest to detect (and manage). You should also take note when there is increased activity on your site from locations you don’t cater to, or when you see hits in languages other than your site’s primary language.
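Google Analytics does not expose visitor IP addresses, so per-IP checks usually run on server access logs instead. Here is a minimal Python sketch, assuming the log has already been parsed into `(ip, user_agent)` pairs. The function name `suspicious_ips` and both thresholds are hypothetical. It flags IPs that send an unusually large number of requests or cycle through many user agents, the two gen-1 traits described earlier.

```python
from collections import Counter, defaultdict

def suspicious_ips(requests, max_requests=1000, max_user_agents=5):
    """Flag IPs that send too many requests or rotate through many UAs.

    `requests` is an iterable of (ip, user_agent) pairs, e.g. parsed from
    a server access log. Thresholds are illustrative, not recommendations.
    Returns {ip: [reason, ...]} for flagged IPs only.
    """
    counts = Counter()
    agents = defaultdict(set)
    for ip, user_agent in requests:
        counts[ip] += 1
        agents[ip].add(user_agent)

    flagged = {}
    for ip, n in counts.items():
        reasons = []
        if n > max_requests:
            reasons.append(f"{n} requests")
        if len(agents[ip]) > max_user_agents:
            reasons.append(f"{len(agents[ip])} user agents")
        if reasons:
            flagged[ip] = reasons
    return flagged

log = [("203.0.113.7", f"agent-{i % 8}") for i in range(1500)]
log += [("198.51.100.2", "Mozilla/5.0")] * 40
print(suspicious_ips(log))
# {'203.0.113.7': ['1500 requests', '8 user agents']}
```

Shared addresses such as corporate proxies or mobile carrier NAT can legitimately exceed limits like these. Treat a flag as a lead rather than a verdict.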
Bounce rate: An abnormally high bounce rate over a period of time, together with a surge in new sessions, can be a major sign of bot traffic. Conversely, a sudden, unnatural drop in bounce rate (below 25%, or well below your usual level) might also be a sign of malicious bot activity.
Site speed: A significant slowdown in your website’s performance might be a sign that your servers are under strain from bot traffic.
Regularly monitoring these metrics can be an effective way of detecting bot traffic and activity. However, it is largely a manual process. It is time-consuming and labor-intensive, and it is generally ineffective at distinguishing good bots from bad ones or at mitigating only the malicious ones. For that, we need a different approach, as discussed below.
Bot Detection Techniques
There are several types of bot detection techniques for distinguishing bots from humans and bad bots from good bots:
Behavioral Detection Technique
This technique analyzes behaviors typical of human users, such as non-linear mouse movements, certain typing habits, and browsing speed. Based on these behaviors, the detection system predicts whether the traffic comes from a human or a bot.
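One simple way to quantify ‘linear vs. non-linear’ mouse movement is a straightness ratio. This is the straight-line distance between the first and last points of a trajectory, divided by the length of the path actually travelled. The Python sketch below assumes the trajectory has been captured client-side as a list of `(x, y)` points. `path_straightness` is a hypothetical helper, and a real behavioral system would combine many such features rather than rely on one. Ratios at or extremely close to 1.0 across many movements suggest scripted, perfectly linear motion, while human paths usually curve.

```python
import math

def path_straightness(points):
    """Ratio of straight-line distance to travelled path length for a
    mouse trajectory [(x, y), ...]. 1.0 means a perfectly straight path.
    Returns None when the ratio is undefined (too few points, no motion).
    """
    if len(points) < 2:
        return None
    # Total distance travelled along consecutive points.
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return None
    return math.dist(points[0], points[-1]) / travelled

straight = [(0, 0), (50, 50), (100, 100)]   # scripted-looking move
curved = [(0, 0), (30, 60), (100, 100)]     # slightly curved move
print(path_straightness(straight))
print(path_straightness(curved))            # noticeably below 1.0
```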
As mentioned above, gen-4 bots are very good at mimicking human behavior, but advanced behavioral detection tools can still tell the difference. Here are some common activities tracked in the behavioral detection approach:
Mouse movements (non-linear, randomized vs linear, patterned)
Mouse clicks (bots might click with a uniform rhythm)
Total number of requests during a session
The number of pages seen during a session
The order of pages seen and the presence of a pattern
The average time between pages
Whether certain resources are blocked (some bots block resources not useful for their missions to save bandwidth)
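Several of the session-level signals above are easy to compute once requests are grouped into sessions: total requests, pages seen, time between pages, and rhythm. This Python sketch assumes a session is a time-ordered list of `(timestamp_in_seconds, page_path)` tuples. `session_features` and its output keys are hypothetical names. The `gap_variation` value is the coefficient of variation of the time between requests. A value near 0 means requests arrive at a metronome-like rhythm.

```python
from statistics import mean, pstdev

def session_features(events):
    """Summarise one session.

    `events` is a time-ordered list of (timestamp_in_seconds, page_path)
    tuples, e.g. reconstructed from server logs for a single visitor.
    """
    times = [t for t, _ in events]
    pages = [p for _, p in events]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    avg_gap = mean(gaps) if gaps else None
    # Coefficient of variation of the gaps (needs at least two gaps):
    # values near 0 indicate a very regular, machine-like rhythm.
    gap_variation = pstdev(gaps) / avg_gap if len(gaps) >= 2 and avg_gap else None

    return {
        "requests": len(events),           # total requests in the session
        "unique_pages": len(set(pages)),   # number of distinct pages seen
        "avg_seconds_between_pages": avg_gap,
        "gap_variation": gap_variation,
    }

bot_like = [(0, "/p/1"), (2, "/p/2"), (4, "/p/3"), (6, "/p/4")]
human_like = [(0, "/"), (14, "/pricing"), (95, "/docs"), (101, "/")]
print(session_features(bot_like))
print(session_features(human_like))
```

The function only extracts features. How they are weighed, whether through fixed rules or a trained model, is up to the detection system.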
While behavioral detection is mainly used to differentiate between bot and human traffic, it is also effective at recognizing bad bots, since malicious bots tend to exhibit certain behaviors (for example, a bot that is scraping data is almost certainly a bad bot).
Fingerprinting Detection Technique
In fingerprinting-based detection, the detection system collects information about the browser and device used to access the website and looks for signatures commonly carried by bad bots. The fingerprinting system usually collects multiple attributes and analyzes whether they are consistent with each other, to check for spoofing or modifications.
Here are the common approaches in fingerprinting bot traffic:
Browser fingerprinting: the main approach is to check for attributes added by headless (modified) browsers and automation tools like PhantomJS, Nightmare, Puppeteer (headless Chrome), Selenium (which can drive Firefox, Chrome, and other browsers), and others. However, advanced bot developers can remove these attributes.
Checking OS consistency: similar to the above, but here the aim is to check whether the OS claimed in the UA (user agent) is consistent with the other collected attributes.
Inconsistent behavior: comparing how browser features behave against how they behave in a headless browser, to determine whether the browser is running in headless or modified mode.
Red pills: checking whether the browser is running inside a virtual machine, which is a strong telltale sign of a bot.
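A fingerprint consistency check compares attributes that should agree with each other. The Python sketch below assumes a client-side script has already collected three values into a dict: the User-Agent string, the value of `navigator.platform`, and the value of `navigator.webdriver`. The key names and the `OS_RULES` table are hypothetical. The sketch covers only a few of the signals listed above: a headless marker in the UA, an automation flag, and UA/OS consistency. As noted, advanced bot developers can remove or spoof all of them.

```python
# Ordered (UA token, allowed navigator.platform prefixes). iOS is checked
# first because iPhone/iPad user agents also contain "Mac OS X".
OS_RULES = [
    ("iPhone", ("iPhone",)),
    ("iPad", ("iPad", "MacIntel")),
    ("Windows", ("Win",)),
    ("Mac OS X", ("Mac",)),
    ("Linux", ("Linux",)),  # also matches Android, whose UA contains "Linux"
]

def fingerprint_inconsistencies(fp):
    """Return a list of red flags found in a collected fingerprint.

    `fp` is a dict with hypothetical keys:
      user_agent -> User-Agent string sent by the browser
      platform   -> navigator.platform as reported by JavaScript
      webdriver  -> navigator.webdriver (True under WebDriver automation)
    """
    flags = []
    ua = fp.get("user_agent", "")
    platform = fp.get("platform", "")

    # Chrome's original headless mode advertises itself in the UA.
    if "HeadlessChrome" in ua:
        flags.append("headless browser in user agent")

    # navigator.webdriver is true when the browser is driven by WebDriver.
    if fp.get("webdriver") is True:
        flags.append("navigator.webdriver is true")

    # OS consistency: the first OS token found in the UA must match platform.
    for ua_token, allowed_prefixes in OS_RULES:
        if ua_token in ua:
            if not platform.startswith(allowed_prefixes):
                flags.append(f"UA claims {ua_token} but platform is {platform!r}")
            break
    return flags

suspect = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/90.0.4430.93 Safari/537.36",
    "platform": "Linux x86_64",
    "webdriver": True,
}
print(fingerprint_inconsistencies(suspect))  # three flags
```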
CAPTCHA Detection Technique
Most of us are familiar with the concept of CAPTCHA. The idea behind it is that the challenge presented should be (very) easy for human users but very difficult for bots or automated programs to complete.
Image and audio recognition challenges are popular in CAPTCHA applications. However, recent advances in automated image and audio recognition have made these approaches partly obsolete.
There is no one-size-fits-all approach to bot detection, and each technique has its own benefits and drawbacks depending on the specific use case. Some work better at identifying certain types of bots, while others are better at distinguishing bots from legitimate traffic.
However, today’s gen-4 bots imitate human behavior very well, and they are distributed across many IP addresses using sophisticated methods, so IP-based detection alone is no longer sufficient. Advanced bot detection and protection software is now necessary if you want to safeguard your system from cybersecurity threats related to malicious bot activity, such as DDoS, identity theft, and data/content scraping.
About Author: Mike Khorev is an SEO expert and marketing consultant.
What is bot traffic? How to detect bot traffic and block it – PPC …
Bot traffic is any internet traffic coming from automated bots. These bots can perform tasks quicker than any human, making them very efficient and popular.
Because bot traffic is so widely misunderstood, we’re taking a look at the different robots involved and what they mean for your website.
In a world where billions of users interact with each other online every single day, the internet can seem like a hectic place. With users liking pictures, retweeting messages, and upvoting comments, the amount of daily web traffic on the internet is at an all-time high.
But just how many of these visitors are actually real?
With more and more bots being launched onto the internet every single day, is this a good thing for website owners and users, or just
In order to fully understand what is bot traffic, we must first explore the different types of automated bots out there and what they do.
What Is Bot Traffic?
Bot traffic can be defined as any internet traffic that is not generated by a human. This usually means the traffic comes from an automated script or computer program designed to save a user the time of doing tasks manually. Although these bots try to mimic human behavior, they are most certainly not human.
These automated bots can do simple things like clicking links and downloading images, or more complicated jobs such as scraping or filling out forms. Whatever they are built to do, they usually do it at a large scale and run almost non-stop. If you’ve ever posted an image on social media like Instagram or Facebook and received hundreds of likes within seconds, those likes most likely came from bots.
With over 50% of internet traffic estimated to be bot traffic, it’s clear that bots can be found almost everywhere and on virtually every website.
To give you an idea of the different types of bots out there, here’s a quick breakdown of the good bots and what they do, as well as the bad bots.
The Good Bot Traffic
Although automated bot traffic does get quite a negative reputation from webmasters, there are in fact a range of legitimate bots out there that are only trying to help.
Search Engine Bots
The first and most obvious kind of good bot traffic is search engine bots. These bots crawl as much of the web as they can and help website owners get their sites listed on search engines such as Google, Yahoo, and Bing. Their requests are automated and count as bot traffic, but these bots are certainly good bots.
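Any bot can put ‘Googlebot’ in its User-Agent, so separating good bots from bad ones starts with verifying that a self-declared search engine crawler really is one. Google documents a reverse-then-forward DNS check for this. First look up the hostname for the visitor’s IP and confirm it belongs to a Google crawler domain. Then resolve that hostname and confirm it maps back to the same IP. The Python sketch below implements that check with the standard `socket` module. The domain list is an assumption, so confirm the exact domains against Google’s current crawler-verification documentation. The resolver functions are parameters so the logic can be tested without network access.

```python
import socket

# Assumed crawler domains; check Google's current documentation.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for a visitor claiming to be Googlebot."""
    try:
        hostname = reverse(ip)[0]            # PTR lookup: IP -> hostname
    except OSError:                          # covers socket.herror/gaierror
        return False
    # Suffix match, so "googlebot.com.attacker.example" does not pass.
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    try:
        forward_ips = forward(hostname)[2]   # A lookup: hostname -> IPs
    except OSError:
        return False
    # Anyone can set a fake PTR record; only Google controls the forward
    # records for its own domains, so the IP must map back.
    return ip in forward_ips
```

In production you would call `is_verified_googlebot(client_ip)` only for requests whose User-Agent claims to be Googlebot, and cache the result, since DNS lookups are slow.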
Website Monitoring Bots
If you own a website, making sure it is healthy and always online is often a priority. To help ensure your site is always accessible, there is a range of website monitoring bots that automatically ping your site to check it’s still online. If anything ever breaks or your website goes offline, you’ll be notified immediately and can do something about it.
SEO Crawlers
Getting your site to number one on search engines can be extremely difficult, especially when you don’t have much information. Luckily, there is a range of software that can help your SEO efforts by crawling your site and your competitors’ sites to see what you rank for and how well. Webmasters can then use this data to improve their search visibility and grow their organic web traffic.
Copyright Bots
Ensuring nobody has stolen your images and passed them off as their own can be a challenging task. With so many websites to check continually, the only sensible solution is to have an automated bot do it. These web robots crawl the web scanning for specific images to ensure nobody is using copyrighted content without permission.
The Bad Bot Traffic
Unlike the good bots mentioned above, bad bots do really bad things to your website and can cause a lot of damage if left to roam free. Their attacks range from sending fake and spam traffic to something much more disruptive, such as ad fraud.
Web Scrapers
Web scrapers are annoying internet bots that scrape websites looking for valuable information such as email addresses and contact details. In other cases, they steal content and images from websites and use them on their own site or social media accounts without permission. They don’t benefit anyone apart from the person using them to scrape data.
Spam Bots
If you’ve ever received a bizarre email or blog comment from someone, chances are a spam bot left it. These bots love to leave generated messages (that often make no sense) on a website’s blog. They also fill out contact forms on websites and spam owners with promotional messages.
DDoS Bots
One of the oldest and deadliest bad bots out there is the DDoS bot. These distributed denial of service bots are often installed on unsuspecting victims’ PCs and used to target a particular website or server with the aim of bringing it offline.
The resulting attacks are known as DDoS attacks, and there have been plenty of reports of them doing severe financial damage to sites that ended up offline for several days.
Vulnerability Scanners
These bots might look like good bots in a website’s server logs, but unfortunately that is not the case. There is a range of malicious bots out there that scan millions of sites for vulnerabilities and report them back to their creator. Unlike genuine bots that would inform the website owner, these malicious bots are built to report back to one person, who will then most likely sell the information or use it to hack websites.
Click Fraud Bots
Unknown to many, there are plenty of sophisticated bots that produce a huge amount of malicious bot traffic specifically targeting paid ads. Unlike robots that produce unwanted website traffic, these bots engage in something known as ad fraud.
Responsible for fraudulently clicking paid ads, this non-human traffic costs advertisers billions every year and is often disguised as legitimate traffic. Without good bot detection software, this bot activity can cost advertisers a large proportion of their ad budget.
How Can Traffic Bots Be Bad for Websites?
Now you know about the different types of good and malicious bots out there, how can bot traffic be bad for your site?
The important thing to understand about bots is that most of the scripts and programs are designed to do one job many times over. The creator of the bot obviously wants the job done as fast as possible, but this can bring up many problems for your site.
The biggest problem is that if a robot is continuously requesting information from your site, it can slow the whole site down. The site will then be slow for everyone accessing it, which can cause massive problems if, for example, you’re running an online store.
Constant scraping requests can also skew important KPIs and Google Analytics data such as your bounce rate.
In extreme cases, too much bot traffic can take your entire website offline, which is obviously not good. Thankfully, this only happens in extreme circumstances; most of the time, the effects of bot traffic on your website are very subtle.
Having lots of bot traffic on your website will usually lead to things such as:
More page views
Higher bandwidth usage
Incorrect Google Analytics
Skewed marketing data quality
Decrease in conversions
Junk emails
Longer load times
Higher server costs
Increased bounce rate
Increased strain on data centers
How to Detect Bot Traffic
If you want to check to see if your website is being affected by bot traffic, then the best place to start is Google Analytics.
In Google Analytics, you’ll be able to see all the essential site metrics, such as average time on page, bounce rate, the number of page views and other analytics data. Using this information you can quickly determine if your site’s analytics data has been skewed by bot traffic and to what extent.
Since you can’t see users’ IP addresses in Google Analytics, you’ll have to review these metrics to see whether they make sense. A very low time on site is a strong indicator that many of your visitors could be bots. It takes a bot only a few seconds to crawl a webpage before it leaves and moves on to its next target.
Another place to check in Google Analytics is the referrals section, to make sure you aren’t receiving referral spam. Many companies target other sites with a custom bot that spams the target’s analytics with the spammer’s own website URL.
When webmasters check their referral traffic in Google Analytics, they see the name of the referring website and are inclined to visit it. As crude as this sounds, it can bring the spammer’s site quite a lot of visitors (mainly out of curiosity!). It might not sound like it harms your website, but it skews all of your metrics, wastes your bandwidth, and clogs up your server in general.
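If you export referral data or read referrers from your own server logs, a simple blocklist check can flag known spam domains. In this Python sketch, the domains in `SPAM_REFERRERS` and the function name `is_spam_referral` are hypothetical placeholders. You would replace the domains with the spam domains that actually appear in your referral report. The check matches a listed domain and its subdomains, but not unrelated domains that merely contain the same text.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: fill it with domains seen in your referral report.
SPAM_REFERRERS = {"free-traffic.example", "seo-offers.example"}

def is_spam_referral(referrer_url, blocklist=SPAM_REFERRERS):
    """Return True if the referrer's host is a blocklisted domain or one of
    its subdomains. Unparseable referrers (no hostname) are not flagged."""
    host = (urlparse(referrer_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in blocklist)

print(is_spam_referral("https://www.free-traffic.example/page"))  # True
print(is_spam_referral("https://notfree-traffic.example/"))       # False
```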
How to Stop Bot Traffic
Filtering bad bot traffic and stopping automated robots from harming your website is completely possible, but the solution will depend on the type of traffic source that is affecting your site. Remember, not all bot traffic is bad, and blocking bots such as search engine crawlers is really not a good idea!
If your website is prone to being hit by scrapers, vulnerability scanners, and automated traffic bots, chances are you’ll want some bot filtering in the form of a firewall or CAPTCHA. The best way to do this is to set up a free bot filtering service called CloudFlare on your website.
Aside from being a Content Delivery Network (CDN), CloudFlare acts as an application firewall between your website and its visitors, designed to let legitimate users through while keeping suspicious ones out. Traffic it filters out can’t waste your bandwidth, ruin your analytics, or steal your content.
Another useful way to keep bots out is to list the user agents (names) of known bots in your website’s robots.txt file. Of course, this only works if the robot respects the file, which most genuine bots do. If you’re trying to get rid of a pesky bad bot, the CloudFlare option mentioned above is the better choice.
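To see how a well-behaved crawler interprets robots.txt rules, you can use `urllib.robotparser` from Python’s standard library. The sketch below parses a sample file that shuts out a bot named `BadBot` (a hypothetical name) entirely and keeps every other bot out of `/admin/`. A compliant crawler runs a check like this before fetching a page. A bad bot can simply skip the check, which is why robots.txt alone won’t stop one.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt; "BadBot" is a hypothetical user agent name.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("BadBot", "https://example.com/blog/"))       # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/x"))  # False
```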
However, if you’re looking to protect your website from other forms of bots such as fraudulent and repetitive clicks on your ads, then you’ll need something else.
Protect Your Ads From Bad Bot Traffic
Anyone who runs pay per click ads on Google is subject to bot traffic. With so many crawlers out there constantly scraping Google and its results, it’s only a matter of time before these bots click on your ads and ruin your analytics data and budget.
PPC Protect is an automated ad fraud detection tool that will identify any click fraud on your pay per click ads in real-time.
By collecting lots of data from every click, the software will be able to detect when an IP address is suspicious and block that particular user from seeing your ads in the future.
This helps combat bot traffic from SEO tools that crawl Google and other search engines looking for PPC ads. With plenty of these tools out there, you’d be surprised at how many times they crawl search results looking for ads and other information.
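The idea of blocking an IP address after suspicious ad clicks can be illustrated with a sliding-window counter. This Python sketch is not PPC Protect’s actual algorithm. `ClickMonitor`, its thresholds, and the assumption that timestamps arrive in non-decreasing order are all hypothetical. An IP that clicks more than `max_clicks` times within `window_seconds` is blocked. The block persists, mirroring the idea of blocking that user from seeing your ads in the future.

```python
from collections import defaultdict, deque

class ClickMonitor:
    """Block IPs that click more than `max_clicks` times within
    `window_seconds`. Assumes timestamps arrive in non-decreasing order.
    Thresholds are illustrative, not recommended values."""

    def __init__(self, max_clicks=3, window_seconds=3600):
        self.max_clicks = max_clicks
        self.window = window_seconds
        self.clicks = defaultdict(deque)  # ip -> timestamps of recent clicks
        self.blocked = set()              # blocking is permanent once set

    def record_click(self, ip, timestamp):
        """Record a click and return True if the IP is (now) blocked."""
        recent = self.clicks[ip]
        recent.append(timestamp)
        # Drop clicks that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        if len(recent) > self.max_clicks:
            self.blocked.add(ip)
        return ip in self.blocked

monitor = ClickMonitor(max_clicks=3, window_seconds=60)
print([monitor.record_click("203.0.113.5", t) for t in (0, 10, 20, 30)])
# [False, False, False, True]
```

Because sophisticated bots can rotate IP addresses, IP-based rules like this catch only the simpler click bots.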