Wondering why your site logs show hits from odd-sounding user agents and referrers? Those could be unwanted requests from bad bots, scrapers and other unauthorized tools trying to access your site. Left unchecked, they can seriously impact performance, availability and even security.
This comprehensive guide will teach you how to reliably identify and block unwanted requests in popular platforms like Nginx, Apache and WordPress.
Why Block Unwanted Requests?
Here are five key reasons you should actively block unwanted traffic and requests:
- Improve site performance by reducing unnecessary load and junk requests. Quicker response times and lower resource usage.
- Preserve availability for legitimate human users by limiting abusive requests from bots. No more access denials during traffic floods.
- Enhance security by stopping common reconnaissance probes and attempted attacks. Prevent exploits.
- Get accurate site analytics without fake visits and junk traffic diluting metrics. Better data to inform business decisions.
- Reduce infrastructure costs by blocking requests before they waste resources and bandwidth. Scale capacity optimally.
According to a 2022 survey by Distil Networks, bad bots account for over 25% of all website traffic. An earlier Imperva report found over 55% of bot traffic as malicious. Left unchecked, these unwanted requests can cripple infrastructure and bankrupt businesses.
By proactively blocking scrapers, aggressive crawlers and other malicious requests, sites see faster response times, lower resource usage and better user experience for genuine human visitors. Let's examine common categories next.
Types of Unwanted Requests
Here are the usual suspects behind unwanted requests:
- Scrapers and Bots: Automatically copy and steal site content without permission. They target blogs, news sites and online stores.
- Vulnerability Scanners: Probe websites for exploitable security flaws such as SQL injection (SQLi) and cross-site scripting (XSS).
- Spam Referrers: Send fake referral visits to inflate visitor counts and push spam domains into your analytics and advertising metrics.
- Malware and Botnets: Infect websites to attack other sites or send spam. Leverage vulnerabilities.
- Aggressive SEO crawlers: Rapidly crawl sites risking denial of service. Stress test capacity limits.
- Cloud Instance Scanners: Identify misconfigured cloud assets to exploit like open S3 buckets.
- Credential Stuffers: Automatically try stolen username/password combos across sites. Check for reused credentials.
In 2022, bad bots made over 37 billion unauthorized access attempts across web applications according to Distil Networks. Who is hitting your site and how can you tell?
Identifying Unwanted Requests
Unusual spikes in traffic, odd browser signatures and suspicious access patterns are telltale signs of unwanted automated requests.
Log analyzers like GoAccess, or manually parsing your server access logs, can reveal both high-level trends and the specific signatures needed to block abuse.
Look for unusual spikes in:
- Traffic volume: Repeated surges in the same time windows indicate bot floods.
- Bandwidth usage: Unusually large spikes suggest bulk uploads or downloads.
- 4xx-5xx errors: A steep rise points to script-kiddie scans trying to exploit common vulnerabilities.
- 404 errors: A sharp surge indicates recon scans, brute force attempts and hacking activities.
Now extract suspect signatures for targeted blocking (a quick log-parsing sketch follows this list):
- User Agents: Unusual signatures like Python-urllib or SearchEngineSpider indicate scrapers and crawlers.
- IP addresses: The same source IP making hundreds of requests points to an attacker. Narrow it down further by geolocation, ASN and similar attributes.
- Request signatures: Odd patterns such as common malware paths or known vulnerable parameter combinations help craft precise blocking rules.
- Referrer domains: Sites that send fake traffic will appear consistently as top referrers.
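As a starting point, you can pull the most frequent user agents and client IPs straight from the access log. A minimal shell sketch, assuming Nginx's default combined log format and default log path (adjust both for your setup):

# Top 20 user agents by request count (the user agent is the 6th quote-delimited field)
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Top 20 client IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20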
Got suspects confirmed? Let's start blocking unwanted requests!
Block Requests in Nginx
Nginx powers over 35% of the top 10 million websites. With inbuilt features like access control and rewriting rules, Nginx makes it easy to block requests.
Step 1: Always take a backup before modifying configuration:
cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup
Step 2: Open nginx.conf and add blocking rules inside the relevant server { } block (Nginx does not allow if directives directly at the http level):
Block User Agents
if ($http_user_agent ~* "(badbot|scraper|nofollow)") {
return 403;
}
Block Referrers
if ($http_referer ~* "(semalt\.com|.*badref)") {
return 403;
}
Step 3: Test configuration and reload Nginx:
nginx -t
service nginx reload
Now abnormal user agents and referrers will start seeing 403 Forbidden errors. Watch your access logs to confirm.
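For example, to list recent blocked requests (assuming the default combined log format, where the status code is the 9th field):

awk '$9 == 403' /var/log/nginx/access.log | tail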
Based on traffic patterns, expand blocking rules for optimum protection while avoiding unintended impacts.
For example, to block a range of suspicious user agents (note that a broad pattern like "bot" also matches legitimate crawlers such as Googlebot, so pair it with an allow list if you rely on search indexing):
if ($http_user_agent ~* "(bot|crawler|spider|scan|sniff|audit|attack|slurp)") {
return 403;
}
And referrers:
if ($http_referer ~* "(semalt|buttons-for|site\\.com)") {
return 302 /blocked.html;
}
Custom Block Pages
To display a customized block page instead of the default error, use:
if ($http_user_agent ~* "badbot") {
return 302 /blocked.html;
}
This cleanly redirects blocked requests to a /blocked.html page explaining your access policies. Custom pages improve the experience for visitors caught by mistake; just make sure the block page itself is not matched by the same rule, or blocked clients will loop on the redirect.
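An alternative that sidesteps the loop entirely is to keep returning 403 and let Nginx's error_page serve the friendly page. A minimal sketch, assuming the user-agent check lives inside location / and that blocked.html sits under /var/www/html (an assumed document root):

error_page 403 /blocked.html;

location / {
    if ($http_user_agent ~* "badbot") {
        return 403;    # error_page converts this into the custom page below
    }
    # ... the site's normal root/proxy_pass configuration ...
}

location = /blocked.html {
    internal;              # reachable only via the error_page internal redirect
    root /var/www/html;    # assumed document root; adjust to your layout
}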
Advanced Blocking
Going beyond blacklisting, Nginx also allows whitelisting permitted user agents while blocking everything else.
For example, to allow only specified search engine crawlers and block every other user agent (this also blocks ordinary browsers, so apply it only to paths meant exclusively for crawlers):
if ($http_user_agent !~* "(Googlebot|bingbot|YandexBot)") {
    return 403;
}
Other advanced techniques include:
- Rate limiting: Allow only a fixed number of requests per time window from each client. Curbs abusive volumes while still permitting some access (see the sketch after this list).
- CAPTCHAs: Challenge clients with tests that automated bots struggle to pass. Can hurt user experience and accessibility.
- Client Puzzles: Make clients solve small computational puzzles before requests are served. Discourages abuse without having to identify and block specific clients.
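Rate limiting in Nginx uses the limit_req module. A minimal sketch, assuming the zone is declared in the http context and applied per location (the zone name, rate and burst values are illustrative):

# http context: track clients by IP, averaging 1 request per second each
limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/s;

server {
    location / {
        # allow short bursts of up to 20 requests, reject the excess
        limit_req zone=perip burst=20 nodelay;
        limit_req_status 429;
    }
}

Returning 429 Too Many Requests instead of the default 503 makes it clearer to legitimate clients behind shared IPs that they should slow down rather than assume an outage.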
Based on your site audience and content policies, choose appropriate blocking methods.
Block Requests in Apache
The Apache HTTP server powers over 33% of all active websites. Here is how to block requests in Apache:
Step 1: Backup configuration files
cp /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.backup
Step 2: Open httpd.conf and add blocking conditions.
To block user agents (mod_rewrite must be loaded, and the rewrite engine enabled once per context):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^JakartaCommons
RewriteRule ^ - [F]
To block referrers:
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
RewriteRule ^ - [F]
The [F] flag returns a 403 Forbidden error.
Step 3: Restart Apache for changes to take effect:
service httpd configtest
service httpd restart
Watch access logs to confirm blocked requests are stopped effectively while minimizing false positives.
For blocking a group of user agents, use:
RewriteCond %{HTTP_USER_AGENT} ^.*(bot|crawler|spider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*vermin.*
RewriteRule ^ - [F,L]
The case-insensitive [NC] flag catches variants, while the [OR] flag lets either condition trigger the rule.
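If you prefer to avoid mod_rewrite, mod_setenvif combined with Apache 2.4's Require directives achieves the same effect. A minimal sketch, assuming it goes inside the relevant <Directory> or <Location> block (the patterns are illustrative):

# Tag suspicious requests with an environment variable
SetEnvIfNoCase User-Agent "(badbot|crawler|scraper)" block_request
SetEnvIfNoCase Referer "semalt\.com" block_request

# Allow everyone except tagged requests
<RequireAll>
    Require all granted
    Require not env block_request
</RequireAll>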
Custom Error Pages
To display a custom block page for requests denied with [F], point Apache's ErrorDocument directive at it:
ErrorDocument 403 /blocked.html
This presents a cleaner message to blocked clients than the default error page.
Based on changing visitor patterns, evolve Apache blocking rules for optimal protection.
Block Requests in WordPress
For WordPress sites not on dedicated hosting, server access is limited. Instead, use .htaccess:
Step 1: Back up the .htaccess file in the site's root folder.
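If you have SSH or SFTP access, a simple copy in the site root is enough, for example:

cp .htaccess .htaccess.backup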
Step 2: Add blocking rules
# Block User Agents
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^.*badbot.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*crawler.*
RewriteRule ^.* - [F]
# Block Referrers
RewriteCond %{HTTP_REFERER} semalt\.com
RewriteRule ^.* - [F]
Step 3: Save changes and monitor effects on traffic.
This blocks unwanted user agents and referrers by throwing a 403 Forbidden error.
Instead of editing .htaccess directly, WordPress security plugins like Wordfence let you block requests from an intuitive admin dashboard while offering additional protections too.
For reliable blocking, migrate to dedicated hosting or cloud servers where full server access lets you block requests before they ever reach WordPress.
Complementary Defenses
For stronger protection, deploy these technologies that filter incoming traffic:
- Web Application Firewall (WAF): Analyzes requests according to rules to block threats at Layer 7. Protects against zero-day threats.
- Cloud CDN: Absorbs DDoS attacks and filters some automated requests before they reach origin infrastructure. Scales to absorb spikes.
- Server Hardening: Minimize vulnerabilities through patching, disabling unused protocols and firewall configuration to reduce the attack surface. Mandatory.
Also enforce site policies: a robots.txt file, restrictive crawling rules and Terms of Service that prohibit automation and scraping give you policy and legal grounds against bad bots (a short robots.txt sketch follows).
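Only well-behaved crawlers honor robots.txt, so treat it as a policy signal rather than enforcement. A minimal sketch (the BadBot name is a placeholder for whatever agent you want to exclude):

# robots.txt at the site root
User-agent: BadBot
Disallow: /

# Everyone else may crawl normally
User-agent: *
Disallow: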
Multilayered blocking maximizes protection while allowing legitimate requests.
Monitoring Effectiveness
Configuring request blocking is half the battle won. The real test lies in monitoring effectiveness.
Analyze traffic patterns after blocking specific suspects. Watch if the same spikes reappear over time as attack tactics evolve to evade defenses.
Proactively scan your site too using tools like Sucuri SiteCheck to uncover issues early before exploitation.
Based on shifting bad bot activities, keep optimizing filtering rules to prevent leakage while avoiding unintended impacts on normal traffic.
For instance, track block rates through the Nginx access log (assuming the default combined log format):
awk -F'"' '$3 ~ /^ 403 /{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
Each 403 response is a blocked request; the sixth quote-delimited field is the user agent, so the output counts which denied signatures appear most often.
Detailed logs inform fine-tuned blocking rule updates to fortify defenses.
Troubleshooting Issues
Server blocking rules are powerful, but when misconfigured they can inadvertently cause cascading issues:
- Poor customer experience from overly zealous blocking
- Loss of legitimate business requests leading to revenue impact
- CMS and app breakage stopping key functionality
- Domain/IP reputational issues triggering further blocks elsewhere
Debugging misconfigured blocking rules requires sifting through relevant log files like Nginx/Apache access logs, UFW and iptables firewall logs, OS syslog etc.
Analyze logs to pinpoint what is being blocked and why. Based on findings, refine rules for precision blocking while allowing legitimate traffic.
For instance, to exempt known good bots that a broad block pattern would otherwise catch:
set $block_bot 0;
if ($http_user_agent ~* "(bot|crawler|spider)") {
    set $block_bot 1;
}
# Exempt known good bots that the broad pattern would otherwise catch
if ($http_user_agent ~* "(Twitterbot|Slackbot)") {
    set $block_bot 0;
}
if ($block_bot = 1) {
    return 403;
}
Carefully crafted allow rules, combined with broader block rules, act as a firewall that keeps real threats out while letting legitimate traffic through.
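To confirm the rules behave as intended, you can impersonate blocked and allowed clients with curl (example.com stands in for your own site):

# Should be blocked (403): a bad user agent
curl -I -A "badbot" https://example.com/

# Should pass: an exempted user agent
curl -I -A "Slackbot" https://example.com/

# Should be blocked: a spam referrer
curl -I -e "http://semalt.com/" https://example.com/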
Conclusion
I hope this comprehensive guide helped you learn insider tricks to reliably identify and block unwanted requests.
New to combating bad bots? Start with basic blocking by user agents and referrers, then evolve to advanced techniques like rate limiting as your needs grow.
Combine server-side blocking with edge defenses like CDNs and WAFs for layered protection. And monitor logs vigilantly to plug leaks early before trouble starts.
Stopping threats like scrapers, spam bots and credential stuffers while welcoming genuine visitors is key for robust security, performance and reduced costs.
Aim for precision blocking to mitigate digital risks, while avoiding business disruption. Stay safe!