06-01-2011, 07:16 AM
admin (Site Staff | Web Development)
What Is This?
Inorganic traffic (note: humans = organic traffic), aka "bots", can really drag on a server and reduce how well your site performs at any given moment. Some inorganic traffic is good, such as the Bing or Google crawlers, and will help you. These smart bots catalog your site for the search engines, and tend to be fairly conservative when requesting data from your site, so as not to slow it down or outright crash it. Bad bots, however, don't care about your site -- they exist for far more nefarious reasons, such as harvesting email addresses, copying/stealing your content, or simply being a resource-eating nuisance. Bad bots, often identifiable by their User Agents, can eat your bandwidth, slow your site, and make overall performance lackluster.
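To make the idea concrete before the full list further down: a bad bot announces itself in the User Agent header of every request, so mod_rewrite can match on that string and refuse to serve it. A minimal sketch, using HTTrack (one of the site rippers on the list below) as the only example:
Code:
# Minimal example: refuse one offline ripper by its User Agent.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
RewriteRule ^.* - [F,L]
The [NC] flag makes the match case-insensitive, and [F] sends back a 403 Forbidden. The full ruleset below is just this same pattern, with a long list of conditions chained together with [OR].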

How Much Security Does It Add?
Like any other method used to protect a site, this isn't 100% coverage against problem traffic, but simply another layer of protection for website owners. Blocking something is better than blocking nothing. (Conversely, there are some REALLY BAD lists online that will block too much! You'll lose site visitors! YIKES! This list is very conservative and minimalist.)

How Well Does It Work? (aka "I Tried This and It Didn't Help!")
In an ideal world, every website on a server would have these rules, to globally block the problem User Agents. If you have a dedicated server, you can control incoming traffic completely. If you're using a VPS, you can block most of the resource hits caused by malicious/malformed traffic. On cheap shared hosting, it may not help much at all in terms of speeding up the site, if you're the only site out of 100 (or even 1,000) blocking such junk traffic.

How to Install Anti-Bot Protection
You'll need the ability to create and/or write to the root htaccess file. This comes standard with most Linux/Apache (or Litespeed) hosts, and can be added to Windows servers (if you have a VPS or dedicated server, or a really nice host that will allow it on shared hosting!). If you don't know what htaccess is, ask -- we'll explain in another post.
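One prerequisite worth mentioning (this is an assumption about your server setup, not part of the bot list itself): Apache only honors htaccess files if overrides are enabled for your document root. Shared hosts almost always have this on already; on a VPS or dedicated box, the vhost config needs something along these lines (the path shown is just a placeholder):
Code:
# Apache vhost/server config (not the htaccess file itself).
# "/home/youraccount/public_html" is a hypothetical document root.
<Directory "/home/youraccount/public_html">
    AllowOverride All
</Directory>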

Anyway, you'll want to add this to your rules:
Code:
# Block known bad bots, site rippers, and email harvesters by User Agent.
# Any request matching one of these patterns is refused.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
If it's not already in your htaccess file, you'll need this line before the above rules:
Code:
RewriteEngine On 
And at the end, instead of this:
Code:
RewriteRule ^.* - [F,L]
You may want to consider a URL from spampoison.com instead -- a service that works to fight back against spammers, email harvesters, and other bad bots. When you visit that site, you'll get a URL to use. For example:
Code:
RewriteRule /*$ http://english-1234567890.spampoison.com [L,R]
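Pieced together, the tail end of the ruleset would then look roughly like this (a sketch only; the spampoison subdomain is the same placeholder as above, so swap in the URL that spampoison.com actually gives you):
Code:
# ...all of the RewriteCond lines from the big list above go here...
# Note that the final condition has no [OR] flag:
RewriteCond %{HTTP_USER_AGENT} ^Zeus
# Redirect matching bots to the spam-poisoning trap instead of a plain 403:
RewriteRule /*$ http://english-1234567890.spampoison.com [L,R]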
And that's it.

Junk traffic will be blocked or redirected to spampoison, and your site will run a little healthier.
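If you want to double-check that the rules are firing, one quick test (assuming you have curl handy somewhere; www.yourdomain.com is a placeholder) is to send a request with a blocked User Agent and look at the status code:
Code:
# Pretend to be Wget, which is on the block list -- expect "403 Forbidden":
curl -I -A "Wget" http://www.yourdomain.com/

# A normal browser User Agent should still get "200 OK":
curl -I -A "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0" http://www.yourdomain.com/
(If you went with the spampoison redirect instead of the plain block, you'd see a 302 redirect to the spampoison URL rather than a 403.)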

When implemented on digitalFAQ.com, for example, site load times improved by anywhere from 100-400ms. (That's 0.1 to 0.4 seconds!) While that number may seem small to web hosting novices, it's a huge savings in load time and server resources!
