Bot traffic is more than just an annoyance; it's a threat to your business. A website that sees a steady stream of bot visitors often has some sort of vulnerability that automated bots are probing or exploiting.
Unfortunately for the owners of sites with this problem, many crawlers used by search engines and other big-name companies are also automated bots.
To help you mitigate bot traffic, we've compiled a list of techniques you can use to stop unwanted visitors cold in their tracks.
1. Use CAPTCHA On Traffic Using Outdated Browser Versions
This is a pretty simple solution. Outdated browser versions (old releases of Internet Explorer, Google Chrome, or Safari, for example) are a favorite disguise for bots, so set up a CAPTCHA script and serve the challenge to any visitor whose User-Agent reports one of those old versions.
In the vast majority of cases, bots will not be able to read and solve CAPTCHAs, so they will be turned away.
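The detection half of this idea can be sketched as follows. The version thresholds are illustrative, not authoritative, and the actual CAPTCHA challenge is assumed to be handled elsewhere by whatever script you set up:

```python
import re

# Illustrative minimum major versions; anything older (or unrecognized)
# gets routed to a CAPTCHA challenge. Real thresholds would track current
# browser releases. Safari is omitted here because its User-Agent string
# encodes the version differently ("Version/15.6 ... Safari/605.1.15").
MIN_VERSIONS = {"Chrome": 100, "Firefox": 100}

def needs_captcha(user_agent: str) -> bool:
    """Return True when the User-Agent reports an outdated browser."""
    for browser, minimum in MIN_VERSIONS.items():
        match = re.search(rf"{browser}/(\d+)", user_agent)
        if match:
            return int(match.group(1)) < minimum
    # Agents we cannot classify get challenged as well.
    return True
```

A request handler would call `needs_captcha()` on the incoming `User-Agent` header and redirect flagged visitors to the CAPTCHA page. Remember that the header is trivially spoofable, so this is a filter, not proof of anything.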
2. Use A Honeypot To Find Bot IP Addresses
Using a honeypot is an easy way to intercept and gather data on the IP addresses of bots that are crawling your site. A honeypot is essentially a page full of trackable elements, such as graphics and scripts, that offers no value or interest to human visitors. The idea is simple: you create a couple of pages with this content, and bots will crawl them while harvesting data from your site.
With the information you gather, you'll be able to identify the clients probing your site, including their IP addresses, and react accordingly.
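A minimal sketch of the logging half, assuming a hypothetical trap URL that is linked only from a CSS-hidden anchor (so no human ever clicks it) and stored in memory rather than a real database:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical trap page: reachable only via a CSS-hidden link, so any
# visitor that requests it is almost certainly an automated crawler.
TRAP_PATH = "/promo-archive.html"

# IPs that requested the trap page, with UTC timestamps. A real setup
# would persist this; a dict keeps the sketch self-contained.
honeypot_hits: dict[str, list[str]] = defaultdict(list)

def record_hit(path: str, client_ip: str) -> bool:
    """Log a visit; return True when the visitor tripped the honeypot."""
    if path != TRAP_PATH:
        return False
    honeypot_hits[client_ip].append(datetime.now(timezone.utc).isoformat())
    return True
```

The collected IPs can then feed directly into your blocklist or rate-limiting rules.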
3. Pay Attention To Data Breaches In The News
As you probably know, search spiders tend to crawl the same websites over and over again. They do this by visiting a site's root directory and then following all of its links to gather as much information about the site as possible.
So if another website that's been compromised gets a mention in the news, chances are your site will start receiving bot traffic from that site and its search engine spiders. This is especially true if there's a natural link between you (i.e., a link on their site that points to yours). You can opt to restrict traffic arriving via those inbound links until the breach has been contained, as a way to head off likely bot traffic.
4. Block Popular Proxy Services
Bots love to route their attacks through proxy services. If you notice a significant amount of traffic coming from commonly used proxy services or the data centers that host them, you should consider blocking all of that traffic, or at least requiring a CAPTCHA from those ranges.
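A sketch of the IP check, using Python's standard `ipaddress` module. The CIDR blocks below are documentation ranges standing in for real ones; in practice you would load the published address ranges of the proxy or hosting providers you want to block:

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges). Real lists
# come from provider publications or threat-intelligence feeds.
PROXY_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]

def is_proxy_ip(ip: str) -> bool:
    """Return True if the address falls inside a known proxy range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in PROXY_NETWORKS)
```

Matching requests can then be dropped or diverted to a CAPTCHA page rather than served normally.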
5. Use Rate-Limiting to Block Traffic From IP Addresses That Behave Maliciously
Rate-limiting is an easy way to keep known and unknown bot traffic from flooding your site. It lets you set how many page views (or "hits") a particular IP address is allowed on your site in a given time period.
This forces any bot that hammers your site to wait a set amount of time before being allowed to view another page. It slows the process down enough that most bots trying to attack your site will eventually give up and move on to another target.
6. Use Robots.txt To Steer Crawlers Away From Duplicate Content
This is a very simple countermeasure that many websites don't use. If you have .html files on your site that duplicate articles published elsewhere, write a robots.txt file in your site's root directory telling crawlers not to crawl or index them. Keep in mind that robots.txt is purely advisory: well-behaved crawlers, including search engine spiders, will honor it without any additional configuration from you, but malicious bots are free to ignore it, so treat it as one layer alongside the other measures here rather than a hard block.
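A minimal robots.txt along these lines might look like the following; the directory and the bot name are hypothetical placeholders:

```
# robots.txt, served from the site root
User-agent: *
Disallow: /syndicated/

# Exclude one specific crawler from the whole site
User-agent: BadBot
Disallow: /
```

Directives apply to the most specific matching `User-agent` group, and only cooperative crawlers will read the file at all.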
7. Closely Examine Your Log Files
Check your log files often, both error.log and access.log. They show exactly what is hitting your site and how it is being indexed, which lets you determine whether someone is actually trying to break into your site or whether a bot has some more benign reason to be there. Knowing the difference between these scenarios can help you make more informed decisions about which countermeasures are right for each situation.
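As a starting point for that review, a short script can tally requests per client IP from an access.log in the common/combined log format, where the client address is the first field on each line. Heavy hitters in this list often turn out to be bots:

```python
import re
from collections import Counter

# Common/combined log format: the client address is the first
# whitespace-delimited field on each line.
LINE_RE = re.compile(r"^(\S+) ")

def top_clients(log_lines, n=5):
    """Return the n busiest client IPs as (ip, hit_count) pairs."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Cross-referencing the top addresses against your honeypot hits and proxy ranges quickly separates aggressive bots from busy human visitors.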
There are many other ways you can work to ensure the security of your website, but the seven listed here are among the most effective and will go a long way toward preventing serious damage. The best practice is to have a system in place so that if something does happen, it can be fixed quickly and you don't lose any time getting back to business.