Bot Traffic and Referrer Spam: Cleaning Your Analytics Data
Your analytics dashboard shows 500 visitors yesterday. Were they all real people? Probably not.
Bot traffic and referrer spam are a persistent problem in web analytics. Understanding what they are — and how to handle them — keeps your data usable.
What bot traffic looks like
Bots are automated programs that visit websites. They range from legitimate crawlers (Googlebot, Bingbot) to malicious scrapers and click farms.
Signs of bot traffic in your analytics:
- Sudden spikes in traffic from a single source with no corresponding uptick in engagement (goal completions, time on site)
- Unfamiliar referrers with very high traffic that disappear after a few days
- Geographic concentrations from regions you don't normally serve
- Pageviews on non-existent paths — bots often probe for /admin, /wp-login.php, and similar paths
Legitimate bots (search crawlers) typically identify themselves via their User-Agent string. Most analytics tools, including Antlytics, filter known crawler User-Agents from your data.
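This kind of User-Agent filtering can be sketched in a few lines. The pattern list and function name below are illustrative only, not Antlytics' actual implementation — real crawler lists are much longer and updated regularly:

```python
import re

# Illustrative subset of crawler patterns; production lists contain
# hundreds of entries and are refreshed from public bot databases.
KNOWN_BOT_PATTERNS = re.compile(
    r"googlebot|bingbot|ahrefsbot|semrushbot|crawler|spider",
    re.IGNORECASE,
)

def is_known_bot(user_agent: str) -> bool:
    """Return True if the User-Agent matches a known crawler pattern."""
    return bool(KNOWN_BOT_PATTERNS.search(user_agent or ""))
```

Note that a bot sending a spoofed Chrome User-Agent sails straight through a check like this — User-Agent filtering only catches bots that identify themselves honestly.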
What referrer spam is
Referrer spam is a technique where bots make requests to your site with a fake referrer URL — typically a domain the spammer wants to promote. In older analytics tools, site owners would investigate unfamiliar referrers by visiting the URL, giving the spammer impressions.
In a privacy-first analytics context, referrer spam shows up as visits from suspicious-looking domains that you don't recognise. The referrer is fabricated — the spammer never actually sent anyone to your site.
Common patterns:
- Short sessions with no goal completions
- Domains that look like they're promoting services ("semalt.com", "free-buttons.xyz", and similar patterns)
- Very high visit counts from a single domain in a short window
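The patterns above can be combined into a rough heuristic. This is a sketch with illustrative thresholds, not Antlytics' actual detection rules:

```python
def looks_like_referrer_spam(visits: int, goal_completions: int,
                             window_hours: float) -> bool:
    """Flag a referrer as likely spam: many visits in a short window
    with zero conversions. Thresholds here are illustrative."""
    if visits < 50:
        return False  # too little data to judge
    visits_per_hour = visits / max(window_hours, 1.0)
    return goal_completions == 0 and visits_per_hour > 10
```

A referrer sending 200 visits in six hours with no goal completions would be flagged; a trickle of unconverted visits, or a burst that does convert, would not.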
How filtering works
Antlytics filters known bot User-Agents at the point of data ingestion. Requests from recognised crawler strings (Googlebot, Bingbot, AhrefsBot, and others) are dropped before they reach your dashboard.
What this does not catch:
- Bots that spoof legitimate User-Agent strings (e.g., pretending to be Chrome)
- Headless browser-based scrapers
- Click farms using real devices
No analytics tool catches everything. The goal is to filter the obvious noise, not to achieve perfect accuracy.
Identifying suspicious traffic patterns
When you see something unusual in your referrers, check a few things:
- Does the referrer domain exist? Open the domain in a new tab. A legitimate referrer has a real website.
- Did the traffic convert? Bot traffic almost never completes conversion goals. If 200 visits from a referrer produced zero goal completions, that's a signal.
- Is the spike consistent or a one-day event? Genuine traffic from a new source tends to have a natural decay curve. A single-day spike with no follow-up is suspicious.
- What pages did they visit? Bots probing for vulnerabilities visit paths like /admin, /wp-config.php, and .env. Real visitors visit your actual content.
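The checks above can be rolled into a simple score. Everything here is a hypothetical helper with illustrative thresholds — the point is the shape of the logic, not the exact numbers:

```python
from datetime import date

def suspicious_score(daily_visits: dict[date, int],
                     goal_completions: int,
                     probed_admin_paths: bool) -> int:
    """Count how many suspicious-traffic checks a referrer fails (0-3)."""
    score = 0
    total = sum(daily_visits.values())
    # Meaningful traffic but zero conversions
    if total >= 100 and goal_completions == 0:
        score += 1
    # Single-day spike: one day holds almost all the traffic
    if total and max(daily_visits.values()) / total > 0.9:
        score += 1
    # Visits to paths like /admin or /wp-config.php
    if probed_admin_paths:
        score += 1
    return score
```

A referrer scoring 2 or 3 is almost certainly noise; a score of 0 looks like genuine traffic.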
What Antlytics filters by default
Antlytics filters:
- Known crawler and bot User-Agents
- Requests with the X-Antlytics-Skip: 1 header (for first-party proxy setups where you want to exclude certain requests)
- Localhost and private IP addresses (development traffic)
Your own visits during development are not filtered by default. Use a staging environment, or browse your production site in an incognito window with extensions disabled to avoid inflating your own data. See troubleshooting for tips on verifying your install without polluting live counts.
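The localhost and private-IP check is straightforward with Python's standard library. This is an illustrative sketch, not Antlytics' ingestion code:

```python
import ipaddress

def is_internal_address(remote_addr: str) -> bool:
    """True for loopback, RFC 1918 private, and link-local addresses,
    i.e. traffic that is almost certainly local development."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a parseable IP; let other checks decide
    return ip.is_loopback or ip.is_private or ip.is_link_local
```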
When bot traffic is actually your problem to solve
Aggressive bot traffic can affect more than just your analytics — it can put load on your server and inflate your bandwidth costs. If you're seeing sustained high-volume bot requests:
- Review your web server logs (not analytics — logs capture everything including filtered bot requests)
- Consider rate limiting at the CDN or reverse proxy level
- For Next.js on Vercel, Vercel's built-in bot protection filters many bad actors before requests reach your application
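If you run your own reverse proxy, rate limiting can be as simple as an nginx config fragment. The zone name, rate, and burst values below are placeholders to tune against your own traffic:

```nginx
# Track request rate per client IP in a 10 MB shared zone
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    location / {
        # Allow short bursts; reject sustained floods with 429
        limit_req zone=perip burst=20 nodelay;
        limit_req_status 429;
    }
}
```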
Analytics cleaning is about data quality. Server-level protection is about performance and cost. Both matter, but they're different problems.
FAQ
Why is my traffic suddenly much lower than last week? A drop could mean a bot that was inflating your numbers has stopped, or that legitimate traffic has declined. Compare your goal completion rate — if it stayed steady or improved, the drop in raw visits is probably noise leaving the data.
Can I manually exclude a spam referrer? Antlytics does not currently support custom referrer exclusion rules in the dashboard. Known spammers are filtered at the ingestion layer. For one-off spam events, the cleanest approach is to note the date range and mentally exclude it when reviewing trends.
Is it worth worrying about bots if my site is small? If your site gets fewer than a few thousand visits per month, bots represent a larger percentage of that traffic. A single bot scan can make your traffic look significantly higher than it is. For small sites, treat raw visit counts with some scepticism and focus on conversion metrics instead.
Do privacy-first tools attract more bot traffic than GA? No. Bots target sites, not analytics tools. What changes is that GA has more sophisticated bot filtering than most independent tools. Antlytics filters known bots — the trade-off is that less-recognisable bots may pass through.
Related: What is privacy-first analytics? · How cookieless analytics counts visitors · Privacy-first web analytics guide