Bot Traffic and Referrer Spam: Cleaning Your Analytics Data
Your analytics dashboard shows 500 visitors yesterday. Were they all real people? Probably not.
Bot traffic and referrer spam are a persistent problem in web analytics. Understanding what they are — and how to handle them — keeps your data usable.
What bot traffic looks like
Bots are automated programs that visit websites. They range from legitimate crawlers (Googlebot, Bingbot) to malicious scrapers and click farms.
Signs of bot traffic in your analytics:
- Sudden spikes in traffic from a single source with no corresponding uptick in engagement (goal completions, time on site)
- Unfamiliar referrers with very high traffic that disappear after a few days
- Geographic concentrations from regions you don't normally serve
- Pageviews on non-existent paths — bots often probe for /admin, /wp-login.php, and similar paths
Legitimate bots (search crawlers) typically identify themselves via their User-Agent string. Most analytics tools, including Antlytics, filter known crawler User-Agents from your data.
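This kind of User-Agent filtering can be sketched in a few lines. The pattern list and function name below are illustrative only, not Antlytics' actual implementation — real crawler lists are much longer and updated regularly:

```python
import re

# Illustrative subset of crawler patterns; production lists contain
# hundreds of entries and are refreshed from public bot databases.
KNOWN_BOT_PATTERNS = re.compile(
    r"googlebot|bingbot|ahrefsbot|semrushbot|crawler|spider",
    re.IGNORECASE,
)

def is_known_bot(user_agent: str) -> bool:
    """Return True if the User-Agent matches a known crawler pattern."""
    return bool(KNOWN_BOT_PATTERNS.search(user_agent or ""))
```

Note that a bot sending a spoofed Chrome User-Agent sails straight through a check like this — User-Agent filtering only catches bots that identify themselves honestly.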
What referrer spam is
Referrer spam is a technique where bots make requests to your site with a fake referrer URL — typically a domain the spammer wants to promote. In older analytics tools, site owners would investigate unfamiliar referrers by visiting the URL, giving the spammer impressions.
In a privacy-first analytics context, referrer spam shows up as visits from suspicious-looking domains that you don't recognise. The referrer is fabricated — the spammer never actually sent anyone to your site.
Common patterns:
- Short sessions with no goal completions
- Domains that look like they're promoting services ("semalt.com", "free-buttons.xyz", and similar patterns)
- Very high visit counts from a single domain in a short window
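The patterns above can be combined into a rough heuristic. This is a sketch with illustrative thresholds, not Antlytics' actual detection rules:

```python
def looks_like_referrer_spam(visits: int, goal_completions: int,
                             window_hours: float) -> bool:
    """Flag a referrer as likely spam: many visits in a short window
    with zero conversions. Thresholds here are illustrative."""
    if visits < 50:
        return False  # too little data to judge
    visits_per_hour = visits / max(window_hours, 1.0)
    return goal_completions == 0 and visits_per_hour > 10
```

A referrer sending 200 visits in six hours with no goal completions would be flagged; a trickle of unconverted visits, or a burst that does convert, would not.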
How filtering works
Antlytics filters known bot User-Agents at the point of data ingestion. Requests from recognised crawler strings (Googlebot, Bingbot, AhrefsBot, and others) are dropped before they reach your dashboard.
What this does not catch:
- Bots that spoof legitimate User-Agent strings (e.g., pretending to be Chrome)
- Headless browser-based scrapers
- Click farms using real devices
No analytics tool catches everything. The goal is to filter the obvious noise, not to achieve perfect accuracy.
Identifying suspicious traffic patterns
When you see something unusual in your referrers, check a few things:
- Does the referrer domain exist? Open the domain in a new tab. A legitimate referrer has a real website.
- Did the traffic convert? Bot traffic almost never completes conversion goals. If 200 visits from a referrer produced zero goal completions, that's a signal.
- Is the spike consistent or a one-day event? Genuine traffic from a new source tends to have a natural decay curve. A single-day spike with no follow-up is suspicious.
- What pages did they visit? Bots probing for vulnerabilities visit paths like /admin, /wp-config.php, and .env. Real visitors visit your actual content.
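The checks above can be rolled into a simple score. Everything here is a hypothetical helper with illustrative thresholds — the point is the shape of the logic, not the exact numbers:

```python
from datetime import date

def suspicious_score(daily_visits: dict[date, int],
                     goal_completions: int,
                     probed_admin_paths: bool) -> int:
    """Count how many suspicious-traffic checks a referrer fails (0-3)."""
    score = 0
    total = sum(daily_visits.values())
    # Meaningful traffic but zero conversions
    if total >= 100 and goal_completions == 0:
        score += 1
    # Single-day spike: one day holds almost all the traffic
    if total and max(daily_visits.values()) / total > 0.9:
        score += 1
    # Visits to paths like /admin or /wp-config.php
    if probed_admin_paths:
        score += 1
    return score
```

A referrer scoring 2 or 3 is almost certainly noise; a score of 0 looks like genuine traffic.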
What Antlytics filters by default
Antlytics filters:
- Known crawler and bot User-Agents
- Requests with the X-Antlytics-Skip: 1 header (for first-party proxy setups where you want to exclude certain requests)
- Localhost and private IP addresses (development traffic)
Your own visits during development are not filtered by default. Use a staging environment, or browse your production site in an incognito window with extensions disabled to avoid inflating your own data. See troubleshooting for tips on verifying your install without polluting live counts.
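The localhost and private-IP check is straightforward with Python's standard library. This is an illustrative sketch, not Antlytics' ingestion code:

```python
import ipaddress

def is_internal_address(remote_addr: str) -> bool:
    """True for loopback, RFC 1918 private, and link-local addresses,
    i.e. traffic that is almost certainly local development."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a parseable IP; let other checks decide
    return ip.is_loopback or ip.is_private or ip.is_link_local
```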
When bot traffic is actually your problem to solve
Aggressive bot traffic can affect more than just your analytics — it can put load on your server and inflate your bandwidth costs. If you're seeing sustained high-volume bot requests:
- Review your web server logs (not analytics — logs capture everything including filtered bot requests)
- Consider rate limiting at the CDN or reverse proxy level
- For Next.js on Vercel, Vercel's built-in bot protection filters many bad actors before requests reach your application
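If you run your own reverse proxy, rate limiting can be as simple as an nginx config fragment. The zone name, rate, and burst values below are placeholders to tune against your own traffic:

```nginx
# Track request rate per client IP in a 10 MB shared zone
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    location / {
        # Allow short bursts; reject sustained floods with 429
        limit_req zone=perip burst=20 nodelay;
        limit_req_status 429;
    }
}
```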
Analytics cleaning is about data quality. Server-level protection is about performance and cost. Both matter, but they're different problems.
FAQ
Why is my traffic suddenly much lower than last week? A drop could mean a bot that was inflating your numbers has stopped, or that legitimate traffic has declined. Compare your goal completion rate — if it stayed steady or improved, the drop in raw visits is probably noise leaving the data.
Can I manually exclude a spam referrer? Antlytics does not currently support custom referrer exclusion rules in the dashboard. Known spammers are filtered at the ingestion layer. For one-off spam events, the cleanest approach is to note the date range and mentally exclude it when reviewing trends.
Is it worth worrying about bots if my site is small? If your site gets fewer than a few thousand visits per month, bots represent a larger percentage of that traffic. A single bot scan can make your traffic look significantly higher than it is. For small sites, treat raw visit counts with some scepticism and focus on conversion metrics instead.
Do privacy-first tools attract more bot traffic than GA? No. Bots target sites, not analytics tools. What changes is that GA has more sophisticated bot filtering than most independent tools. Antlytics filters known bots — the trade-off is that less-recognisable bots may pass through.
Related: What is privacy-first analytics? · How cookieless analytics counts visitors · Privacy-first web analytics guide