
In the last few years, there has been a sudden increase in spam crawlers. Google Analytics consultants are well aware of the problem and are still working toward a complete solution, as these crawlers introduce significant noise into the collected data. However, there are a few methods by which such spam crawlers can be detected and blocked.
A bot's primary task is to crawl web pages. However, bots are also put to unethical and harmful uses, such as planting spam on other websites. Some of the activities carried out by spambots are as follows:
- Increasing website traffic by spurious methods
- Creating useless data
- Exposing sites and machines to malware
- Collecting email IDs and creating numerous fake accounts and domains
The list goes on. To begin with, when a user creates a Google Analytics account, it helps to set up several properties. Each property gets its own tracking ID, which makes it easier to spot spam domains, since ghost spam hits tracking IDs that are not actually linked to your site.
Some of the best Google Analytics consultants have shared tips and tricks for removing these bots and blocking spam domains.
Referral Spam
There are two types of referrals:
1. Ghost Referral
Spam domains that appear in your reports as referrals from valid websites and keep sending spam traffic without ever actually visiting your page. Since no real visit is recorded, the standard ways of blocking them do not work.
Therefore, create a list of the hostnames that legitimately serve your tracking code and use it in an include filter. Lists of spambots that have been active over the years are also easy to find on the Internet.
You can also check the hostnames that are sending traffic by going to
Audience > Technology > Network > Hostname
Never take referrals from famous websites at face value, since in most cases ghost spam masquerades as traffic from them.
After creating this list, open the Google Analytics account and go to
Administration > View filters > Edit Filters
Add a filter name, select Include as the filter type, choose Hostname as the filter field, and add the created list of valid hostnames as the filter pattern. Save this filter.
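Google Analytics filter patterns are regular expressions. As a rough sketch (the hostnames below are hypothetical; use your own), the pattern can be built and sanity-checked like this:

```python
import re

# Hypothetical list of hostnames that legitimately serve your tracking code
valid_hostnames = ["example.com", "shop.example.com", "translate.googleusercontent.com"]

# Escape the dots and join the hostnames with "|" so any one of them matches
pattern = "|".join(re.escape(h) for h in valid_hostnames)
print(pattern)

# A hit passes the include filter only when its hostname matches the pattern
print(bool(re.search(pattern, "example.com")))     # True  -> hit is kept
print(bool(re.search(pattern, "ghost-spam.xyz")))  # False -> hit is filtered out
```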
However, please note that this method does not remove spam that is already in your historical data.
2. Non-Ghost Referral
Bots from these domains actually visit the website, so the process for handling them differs from that for ghost referrals. Several methods are available:
Filter
A different type of filter is used for these visiting domains. They can be excluded by going to:
Administration > View Filters > Add filters to view
On this page, create a new filter and give it a name. Select Exclude as the filter type and Referral as the filter field. Add the URLs of the spam domains (separated by vertical bars if there is more than one) in the filter pattern.
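As a sketch, the exclude pattern can be assembled the same way; the spam domains below are only examples, so substitute the ones polluting your own reports:

```python
import re

# Example spam referrer domains (substitute the ones in your reports)
spam_domains = ["semalt.com", "buttons-for-website.com", "darodar.com"]

# Escape the dots and join with vertical bars to form a single filter pattern
filter_pattern = "|".join(re.escape(d) for d in spam_domains)
print(filter_pattern)

# Hits whose referral matches the pattern are excluded from the view
print(bool(re.search(filter_pattern, "semalt.com")))   # True  -> excluded
print(bool(re.search(filter_pattern, "example.com")))  # False -> kept
```

Note that Google Analytics limits the length of a single filter pattern, so a long list of domains may need to be split across several filters.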
.htaccess
This method uses Apache rewrite rules in the .htaccess file to block the spam domains altogether:
RewriteEngine On
# websitename.com is a placeholder; add one RewriteCond per spam domain,
# using [NC,OR] on every line except the last
RewriteCond %{HTTP_REFERER} ^http://.*websitename\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*another-spam-domain\.com [NC]
RewriteRule .* - [F]
It is advisable to back up the file beforehand, as any misplaced code can break the site.
Wp-Ban: Since WordPress is a widely used CMS, it is worth mentioning here. Wp-Ban is a plugin for WordPress users who prefer not to edit .htaccess directly. It blocks the offending sites entirely using their IP address, URL or IP range.
Google Analytics
Tick the checkbox 'Exclude all hits from known bots and spiders' on the View Settings page in Google Analytics. It filters hits from well-known spam bots out of your reports.
Other than these methods, staying alert about such issues helps a lot:
- Use a firewall and security system to avoid malware
- Keep a tab on server logs
- Check the site traffic for unusual spikes
The following links have lists of sites that are known sources of spam.
http://help.analyticsedge.com/spam-filter/definitive-guide-to-removing-google-analytics-spam/
https://www.optimizesmart.com/geek-guide-removing-referrer-spam-google-analytics/
These lists make it easier for users to identify the inappropriate domains and take the necessary measures to avoid data noise.