If you found this page through Google or another search engine, you are probably looking for a way to stop the bad guys from downloading your whole site. I know exactly how it feels to be ripped off, so I did some research in this area and found a few solutions that really work. I am sharing them here so that you can save your bandwidth from illegal downloaders.
1. Robots.txt
Robots.txt is the first choice and the easiest way to block some of
the agents. It is only a request, though: well-behaved agents obey it,
while others ignore it completely. Nevertheless it works for some of them.
Making a robots.txt is easy: open Notepad and type the following:
User-agent: HTTrack
Disallow: /icons
Save the file as robots.txt and upload it to the root (main folder) of your website. It should be in the same place as your index.html. Let's go through the code and explain what it means and does.
1. User-agent: HTTrack
This line names the agent that the rule applies to. If the visiting agent identifies itself as HTTrack (a website downloading tool), the rule below it applies.
2. Disallow: /icons
This line tells that agent not to access the icons folder, which means HTTrack will not download the icons folder.
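To check how a rule-abiding agent would read these two lines before uploading the file, here is a minimal sketch using Python's standard `urllib.robotparser` module. Python is used only for illustration, the `allowed` helper is a made-up name, and the URLs reuse the www.yoursite.com placeholder. It only shows what a polite agent would conclude; it cannot tell you whether a given downloader actually obeys the file.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as in the robots.txt above.
ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /icons
"""

def allowed(agent: str, url: str) -> bool:
    """Return True if a rule-abiding agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

print(allowed("HTTrack", "http://www.yoursite.com/icons/logo.gif"))    # False
print(allowed("HTTrack", "http://www.yoursite.com/index.html"))        # True
print(allowed("Googlebot", "http://www.yoursite.com/icons/logo.gif"))  # True
```

Only the icons folder is off limits, and only for HTTrack; every other agent is unaffected.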
When I was setting up robots.txt for HTTrack, I first wanted to block it from accessing all folders, so I did this:
User-agent: *
Disallow: /
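But HTTrack was smart enough to respond with something like "Robots.txt is too restrictive, proceeding with download". So I had to list every folder separately, as I did for the icons folder. Then I checked again, and it could not download any of those folders. Keep in mind that User-agent: * addresses every agent, including search engines such as Google, so a blanket Disallow: / under it asks them to stay away as well.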
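Listing every folder by hand gets tedious on a larger site. As a rough sketch, the per-folder rules could be generated with a few lines of Python. The function name `build_robots_txt` and the folder names other than icons are made up for the example; substitute your own folders.

```python
def build_robots_txt(agent: str, folders: list[str]) -> str:
    """Build robots.txt text with one Disallow line per folder for one agent."""
    lines = [f"User-agent: {agent}"]
    for folder in folders:
        # Normalise "icons", "/icons" and "/icons/" to the form "/icons".
        lines.append(f"Disallow: /{folder.strip('/')}")
    return "\n".join(lines) + "\n"

# Hypothetical folder list; only "icons" comes from the article.
print(build_robots_txt("HTTrack", ["icons", "images", "downloads"]))
```

Paste the printed text into robots.txt and upload it as before.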
For further reading on how to work with robots.txt, go to http://www.robotstxt.org/wc/robots.html
2. Include script at the top of the page (before <head></head>)
The idea is the same: detect the agents that download your whole
site and let the good ones through. Be very careful when implementing
any of these methods. If you wrongly put an agent like Google in the
script, your website might be blocked from indexing by Google and would
not come up in the Google search results. Here is the code to block
the agents. Put it at the very top of your page, before the <html> tag
and before any other output, because PHP can only send the redirect
header if nothing has been sent to the browser yet (unless output
buffering is enabled).
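<?php
// Read the user agent; it may be missing, so fall back to an empty string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// Redirect if the user agent exactly matches one of the known downloaders.
if (($agent == "WebCopier v.2.2") ||
    ($agent == "WebCopier v2.5") ||
    ($agent == "WebZIP/5.0 PR1 (http://www.spidersoft.com)")) {
    header("Location: http://www.yoursite.com/no_download.html");
    exit();
}
?>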
What you need to change here is the address in the header() line; it should be your own website address. Create a page called no_download.html, or give it any other name, and the blocked agent will be redirected to it. On that page you could put some text saying that downloading your site without your permission is illegal, and so on, so that the person downloading your site gets some feedback on his bad intentions.
To add more agents, check your website logs and identify which agents you should include in your script. Once you have identified them, copy and paste one line such as ($agent == "WebCopier v2.5")|| and change the name of the agent exactly as it is shown in your logs.
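If your host gives you raw access logs, a short script can list the user agents for you. The sketch below is in Python and assumes the common Apache "combined" log format, where the user agent is the last quoted field on each line; your server's format may differ. The sample lines, IP addresses, and the name `count_user_agents` are invented for the example.

```python
import re
from collections import Counter

# Assumption: the user agent is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(log_lines):
    """Count how many requests each user agent string made."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines; in practice read them from your access log file.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "WebCopier v2.5"',
    '10.0.0.1 - - [01/Jan/2024:10:00:01 +0000] "GET /icons/a.gif HTTP/1.1" 200 90 "-" "WebCopier v2.5"',
    '10.0.0.2 - - [01/Jan/2024:10:05:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
    print(hits, agent)
```

Agents that request a very large number of pages in a short time are the ones worth a closer look.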
3. Block through .htaccess
If you have access to your .htaccess file, you have one
more option to block those bad agents. Basically, .htaccess is a text
file that contains server settings, such as sending the user to a
404.html page when a page is not found. Put the following code in it,
together with any additional agent names that you would like to
block. These rules rely on Apache's mod_rewrite module. If you don't
have access or don't know what .htaccess is, contact your administrator.
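# Turn on mod_rewrite (needed once per .htaccess file).
RewriteEngine On
# Match the start of the user agent; NC ignores case, OR chains the conditions.
RewriteCond %{HTTP_USER_AGENT} ^WebZip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC]
RewriteRule ^.*$ no_download.html [L]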
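To add more agents, just add one more line such as RewriteCond %{HTTP_USER_AGENT} ^WebZip [OR] above the last condition and change the name of the agent. Every condition except the last one needs the OR flag, and each flag must sit on the same line as its RewriteCond.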
no_download.html is the page that has the legal notice or any other text you would like to display.
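Before relying on the patterns, it helps to test them against the user agent strings from your logs. The following is a small Python sketch that imitates the "starts with" matching of the three conditions; it is not how Apache evaluates them internally, and `is_blocked` is a made-up helper. It also shows that case matters: a case-sensitive ^WebZip does not catch the WebZIP/5.0 agent string quoted in method 2.

```python
import re

# The same three patterns as in the RewriteCond lines.
PATTERNS = ["^WebZip", "^WebCopier", "^Zeus"]

def is_blocked(user_agent: str, ignore_case: bool = False) -> bool:
    """Return True if the user agent starts with one of the blocked names."""
    flags = re.IGNORECASE if ignore_case else 0
    return any(re.search(p, user_agent, flags) for p in PATTERNS)

webzip = "WebZIP/5.0 PR1 (http://www.spidersoft.com)"
print(is_blocked("WebCopier v2.5"))            # True
print(is_blocked(webzip))                      # False: "WebZIP" is not "WebZip"
print(is_blocked(webzip, ignore_case=True))    # True
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # False
```

In .htaccess, case-insensitive matching corresponds to adding NC to the flags of a condition.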