An htaccess file is a simple ASCII file, such as you would create through a text editor like NotePad or SimpleText. Many people seem to have some confusion over the naming convention for the file, so let me get that out of the way.
.htaccess is the file extension. It is not file.htaccess or somepage.htaccess, it is simply named .htaccess
In order to create the file, open up a text editor and save an empty page as .htaccess (or type in one character, as some editors will not let you save an empty page). Chances are that your editor will append its default file extension to the name (ex: for Notepad it would call the file .htaccess.txt). You need to remove the .txt (or other) file extension in order to get yourself htaccessing--yes, I know that isn't a word, but it sounds keen, don't it? You can do this by right clicking on the file and renaming it by removing anything that doesn't say .htaccess. You can also rename it via telnet or your ftp program, and you should be familiar enough with one of those so as not to need explaining.
htaccess files must be uploaded as ASCII mode, not BINARY. You may need to CHMOD the htaccess file to 644 or (RW-R--R--). This makes the file usable by the server, but prevents it from being read by a browser, which can seriously compromise your security. (For example, if you have password protected directories, if a browser can read the htaccess file, then they can get the location of the authentication file and then reverse engineer the list to get full access to any portion that you previously had protected. There are different ways to prevent this, one being to place all your authentication files above the root directory so that they are not www accessible, and the other is through an htaccess series of commands that prevents itself from being accessed by a browser, more on that later)
Most commands in htaccess are meant to be placed on one line only, so if you use a text editor that uses word-wrap, make sure it is disabled or it might throw in a few characters that annoy Apache to no end, although Apache is typically very forgiving of malformed content in an htaccess file.
htaccess is an Apache thing, not an NT thing. There are similar capabilities for NT servers, though in my professional experience and personal opinion, NT's ability in these areas is severely handicapped. But that's not what we're here for.
htaccess files affect the directory they are placed in and all sub-directories, that is an htaccess file located in your root directory (yoursite.com) would affect yoursite.com/content, yoursite.com/content/contents, etc. It is important to note that this can be prevented (if, for example, you did not want certain htaccess commands to affect a specific directory) by placing a new htaccess file within the directory you don't want affected with certain changes, and removing the specific command(s) from the new htaccess file that you do not want affecting this directory. In short, the nearest htaccess file to the current directory is treated as the htaccess file. If the nearest htaccess file is your global htaccess located in your root, then it affects every single directory in your entire site.
Before you go off and plant htaccess everywhere, read through this and make sure you don't do anything redundant, since it is possible to cause an infinite loop of redirects or errors if you place something weird in the htaccess.
Also...some sites do not allow use of htaccess files, since depending on what they are doing, they can slow down a server overloaded with domains if they are all using htaccess files. I can't stress this enough: You need to make sure you are allowed to use htaccess before you actually use it. Some things that htaccess can do can compromise a server configuration that has been specifically setup by the admin, so don't get in trouble.
[size=12pt]Error Documents[/size]
This seems to be what people think htaccess was meant for, but it is only part of the general use. We'll be getting into progressively more advanced stuff after this.
In order to specify your own ErrorDocuments, you need to be slightly familiar with the server returned error codes. (List to the right). You do not need to specify error pages for all of these, in fact you shouldn't. An ErrorDocument for code 200 would cause an infinite loop, whenever a page was found...this would not be good.
You will probably want to create an error document for codes 404 and 500, at the least 404 since this would give you a chance to handle requests for pages not found. 500 would help you out with internal server errors in any scripts you have running. You may also want to consider ErrorDocuments for 401 - Authorization Required (as in when somebody tries to enter a protected area of your site without the proper credentials), 403 - Forbidden (as in when a file with permissions not allowing it to be accessed by the
In order to specify your own customized error documents, you simply need to add the following command, on one line, within your htaccess file:
ErrorDocument code /directory/filename.ext
OR
ErrorDocument 404 /errors/notfound.html
This would cause any error code resulting in 404 to be forward to yoursite.com/errors/notfound.html
Likewise with:
ErrorDocument 500 /errors/internalerror.html
You can name the pages anything you want (I'd recommend something that would prevent you from forgetting what the page is being used for), and you can place the error pages anywhere you want within your site, so long as they are web-accessible (through a URL). The initial slash in the directory location represents the root directory of your site, that being where your default page for your first-level domain is located. I typically prefer to keep them in a separate directory for maintenance purposes and in order to better control spiders indexing them through a ROBOTS.TXT file, but it is entirely up to you.
If you were to use an error document handler for each of the error codes I mentioned, the htaccess file would look like the following (note each command is on its own line):
You can specify a full URL rather than a virtual URL in the ErrorDocument string (http://yoursite.com/errors/notfound.html vs. /errors/notfound.html). But this is not the preferred method by the server's happiness standards.
You can also specify HTML, believe it or not!
The only time I use that HTML option is if I am feeling particularly saucy, since you can have so much more control over the error pages when used in conjunction with xSSI or CGI or both. Also note that the ErrorDocument starts with a " just before the HTML starts, but does not end with one...it shouldn't end with one and if you do use that option, keep it that way. And again, that should all be on one line, no naughty word wrapping!
Next, we are moving on to password protection, that last frontier before I dunk you into the true capabilities of htaccess. If you are familiar with setting up your own password protected directories via htaccess, you may feel like skipping ahead.
[size=12pt]
Password protection[/size]
Ever wanted a specific directory in your site to be available only to people who you want it to be available to? Ever got frustrated with the seeming holes in client-side options for this that allowed virtually anyone with enough skill to mess around in your source to get in? htaccess is the answer!
There are numerous methods to password protecting areas of your site, some server language based (such as ASP, PHP or PERL) and client side based, such as JavaScript. JavaScript is not as secure or foolproof as a server-side option, a server side challenge/response is always more secure than a client dependant challenge/response. htaccess is about as secure as you can or need to get in everyday life, though there are ways above and beyond even that of htaccess. If you aren't comfortable enough with htaccess, you can password protect your pages any number of ways, and JavaScript Kit has plenty of password protection scripts for your use.
The first thing you will need to do is create a file called .htpasswd. I know, you might have problems with the naming convention, but it is the same idea behind naming the htaccess file itself, and you should be able to do that by this point. In the htpasswd file, you place the username and password (which is encrypted) for those whom you want to have access.
For example, a username and password of wsabstract (and I do not recommend having the username being the same as the password), the htpasswd file would look like this:
Notice that it is UserName first, followed by the Password. There is a handy-dandy tool available for you to easily encrypt the password into the proper encoding for use in the httpasswd file.
For security, you should not upload the htpasswd file to a directory that is web accessible (yoursite.com/.htpasswd), it should be placed above your www root directory. You'll be specifying the location to it later on, so be sure you know where you put it. Also, this file, as with htaccess, should be uploaded as ASCII and not BINARY.
Create a new htaccess file and place the following code in it:
The first line is the full server path to your htpasswd file. If you have installed scripts on your server, you should be familiar with this. Please note that this is not a URL, this is a server path. Also note that if you place this htaccess file in your root directory, it will password protect your entire site, which probably isn't your exact goal.
The second to last line require user is where you enter the username of those who you want to have access to that portion of your site. Note that using this will allow only that specific user to be able to access that directory. This applies if you had an htpasswd file that had multiple users setup in it and you wanted each one to have access to an individual directory. If you wanted the entire list of users to have access to that directory, you would replace Require user xxx with require valid-user.
The AuthName is the name of the area you want to access. It could anything, such as "EnterPassword". You can change the name of this 'realm' to whatever you want, within reason.
We are using AuthType Basic because we are using basic HTTP authentication.
[size=12pt]Enabling SSI Via htaccess[/size]
Many people want to use SSI, but don't seem to have the ability to do so with their current web host. You can change that with htaccess. A note of caution first...definitely ask permission from your host before you do this, it can be considered 'hacking' or violation of your host's TOS, so be safe rather than sorry:
The first line tells the server that pages with a .shtml extension (for Server parsed HTML) are valid. The second line adds a handler, the actual SSI bit, in all files named .shtml. This tells the server that any file named .shtml should be parsed for server side commands. The last line is just techno-junk that you should throw in there.
And that's it, you should have SSI enabled. But wait...don't feel like renaming all of your pages to .shtml in order to take advantage of this neat little toy? Me either! Just add this line to the fragment above, between the first and second lines:
A note of caution on that one too, however. This will force the server to parse every page named .html for SSI commands, even if they have no SSI commands within them. If you are using SSI sparingly on your site, this is going to give you more server drain than you can justify. SSI does slow down a server because it does extra stuff before serving up a page, although in human terms of speed, it is virtually transparent. Some people also prefer to allow SSI in html pages so as to avoid letting anyone who looks at the page extension to know that they are using SSI in order to prevent the server being compromised through SSI hacks, which is possible. Either way, you now have the knowledge to use it either way.
If, however, you are going to keep SSI pages with the extension of .shtml, and you want to use SSI on your Index pages, you need to add the following line to your htaccess:
This allows a page named index.shtml to be your default page, and if that is not found, index.html is loaded. More on DirectoryIndex later.
[size=12pt]Blocking users by IP[/size]
Is there a pesky person perpetrating pain upon you? Stalking your site from the vastness of the electron void? Blockem! In your htaccess file, add the following code--changing the IPs to suit your needs--each command on one line each:
You can deny access based upon IP address or an IP block. The above blocks access to the site from 123.45.6.7, and from any sub domain under the IP block 012.34.5. (012.34.5.1, 012.34.5.2, 012.34.5.3, etc.) I have yet to find a useful application of this, maybe if there is a site scraping your content you can block them, who knows.
You can also set an option for deny from all, which would of course deny everyone. You can also allow or deny by domain name rather than IP address (allow from .javascriptkit.com works for www.javascriptkit.com or virtual.javascriptkit.com, etc.)
[size=12pt]Blocking users/ sites by referrer[/size]
Blocking users or sites that originate from a particular domain is another useful trick of .htaccess. Lets say you check your logs one day, and see tons of referrals from a particular site, yet upon inspection you can't find a single visible link to your site on theirs. The referral isn't a "legitimate" one, with the site most likely hot linking to certain files on your site such as images, .css files, or files you can't even make out. Remember, your logs will generate a referrer entry for any kind of reference to your site that has a traceable origin.
Before I get to the code itself, it's important to note that blocking access by referrer in .htaccess requires the help of the Apache module mod_rewrite to make out the referrer first. This module is installed by default on most servers (ask your host if you're not sure). So, to deny access all traffic that originate from a particular domain (referrers) to your site, use the following code:
Block traffic from a single referrer:
Block traffic from multiple referrers
In the "single referrer" case above, "badsite\.com" is the domain you wish to block. Note the backslash proceeding the period (".") to actually donate a period, as in Regular Expressions, a period donates any character, which is not what we want. The flag "[NC]" is added to the end of the domain to make it case insensitive, so whether the domain is "badsite.com", "Badsite.com" etc, however bad it gets, it gets blocked. Finally, the last line in the .htaccess file specifies that the action to take when a match is found is to fail the request, meaning the referrer traffic will hit a 403 Forbidden error. The only difference between blocking a single referrer and multiple referrers is the modified [NC, OR] flag in the later case to every domain but the last.
Now, you may have noticed the line "Options +FollowSymlinks" above, which is commented. Uncomment this line if your server isn't configured with FollowSymLinks in its <directory> section in httpd.conf, and you get a 500 Internal Server error when using the code above as is.
[size=12pt]Blocking bad bots and site rippers (aka offline browsers)[/size]
The definition of a "bad bot" varies depending on who you ask, but most would agree they are the spiders that do a lot more harm than good on your site (ie: an email harvester). A site ripper on the other hand are offline browsing programs that a surfer may unleash on your site to crawl and download every one of its pages for offline viewing. In both cases, both your site's bandwidth and resource usage are jacked up as a result, sometimes to the point of crashing your server. Bad bots typically ignore the wishes of your robots.txt file, so you'll want to ban them using means such as .htaccess. The trick is to identify a bad bot.
Below is a useful code block you can insert into.htaccess file for blocking a lot of the known bad bots and site rippers currently out there. It is derived from my reading of the excellent discussion "A close to perfect .htaccess file", specifically, "A close to perfect .htaccess file II." Simply add the below code to your .htaccess file:
Bots that are listed above will all receive a 403 Forbidden error when trying to view your site. The amount of bandwidth savings and decrease in server resource usage as a result may be significant in many cases.
[size=12pt]Change your default directory page[/size]
Some of you may be wondering, just what in the world is a DirectoryIndex? Well, grasshopper, this is a command which allows you to specify a file that is to be loaded as your default page whenever a directory or url request comes in, that does not specify a specific page. Tired of having yoursite.com/index.html come up when you go to yoursite.com? Want to change it to be yoursite.com/ILikePizzaSteve.html that comes up instead? No problem!
This would cause filename.html to be treated as your default page, or default directory page. You can also append other filenames to it. You may want to have certain directories use a script as a default page. That's no problem too!
Placing the above command in your htaccess file will cause this to happen: When a user types in yoursite.com, your site will look for filename.html in your root directory (or any directory if you specify this in the global htaccess), and if it finds it, it will load that page as the default page. If it does not find filename.html, it will then look for index.cgi; if it finds that one, it will load it, if not, it will look for index.pl and the whole process repeats until it finds a file it can use. Basically, the list of files is read from left to right.
Every once in a while, I use this method for the following needs: Say I keep all my include files in a directory called include, and that I keep all my image files in a directory called images, I don't want people to be able to directory browse through them (even though we can prevent that through another htaccess trick, more later) I would specify a DirectoryIndex entry, in a specific htaccess file for those two directories, for /redirect/index.pl that is a redirect page (as explained here) that redirects a request for those directories to be sent to the homepage. Or I could just specify a directory index of index.pl and upload an index.pl file to each of those directories. Or I could just stick in an htaccess redirect page, which is our next subject!
[size=12pt]Redirects[/size]
Ever go through the nightmare of changing significantly portions of your site, then having to deal with the problem of people finding their way from the old pages to the new? It can be nasty. There are different ways of redirecting pages, through http-equiv, javascript or any of the server-side languages. And then you can do it through htaccess, which is probably the most effective, considering the minimal amount of work required to do it.
htaccess uses redirect to look for any request for a specific page (or a non-specific location, though this can cause infinite loops) and if it finds that request, it forwards it to a new page you have specified:
Note that there are 3 parts to that, which should all be on one line : the Redirect command, the location of the file/directory you want redirected relative to the root of your site (/olddirectory/oldfile.html = yoursite.com/olddirectory/oldfile.html) and the full URL of the location you want that request sent to. Each of the 3 is separated by a single space, but all on one line. You can also redirect an entire directory by simple using Redirect /olddirectoryhttp://yoursite.com/newdirectory/
Using this method, you can redirect any number of pages no matter what you do to your directory structure. It is the fastest method that is a global affect.
[size=12pt]Prevent viewing of .htaccess file[/size]
If you use htaccess for password protection, then the location containing all of your password information is plainly available through the htaccess file. If you have set incorrect permissions or if your server is not as secure as it could be, a browser has the potential to view an htaccess file through a standard web interface and thus compromise your site/server. This, of course, would be a bad thing. However, it is possible to prevent an htaccess file from being viewed in this manner:
The first line specifies that the file named .htaccess is having this rule applied to it. You could use this for other purposes as well if you get creative enough.
If you use this in your htaccess file, a person trying to see that file would get returned (under most server configurations) a 403 error code. You can also set permissions for your htaccess file via CHMOD, which would also prevent this from happening, as an added measure of security: 644 or RW-R--R--
[size=12pt]Adding MIME Types[/size]
What if your server wasn't set up to deliver certain file types properly? A common occurrence with MP3 or even SWF files. Simple enough to fix:
AddType is specifying that you are adding a MIME type. The application string is the actual parameter of the MIME you are adding, and the final little bit is the default extension for the MIME type you just added, in our example this is swf for ShockWave File.
By the way, here's a neat little trick that few know about, but you get to be part of the club since JavaScript Kit loves you: To force a file to be downloaded, via the Save As browser feature, you can simply set a MIME type to application/octet-stream and that immediately prompts you for the download.
[size=12pt]Preventing hot linking of images and other file types[/size]
In the webmaster community, "hot linking" is a curse phrase. Also known as "bandwidth stealing" by the angry site owner, it refers to linking directly to non-html objects not on one own's server, such as images, .js files etc. The victim's server in this case is robbed of bandwidth (and in turn money) as the violator enjoys showing content without having to pay for its deliverance. The most common practice of hot linking pertains to another site's images.
Using .htaccess, you can disallow hot linking on your server, so those attempting to link to an image or CSS file on your site, for example, is either blocked (failed request, such as a broken image) or served a different content (ie: an image of an angry man) . Note that mod_rewrite needs to be enabled on your server in order for this aspect of .htaccess to work. Inquire your web host regarding this.
With all the pieces in place, here's how to disable hot linking of certain file types on your site, in the case below, images, JavaScript (js) and CSS (css) files on your site. Simply add the below code to your .htaccess file, and upload the file either to your root directory, or a particular subdirectory to localize the effect to just one section of your site:
Be sure to replace "mydomain.com" with your own. The above code creates a failed request when hot linking of the specified file types occurs. In the case of images, a broken image is shown instead.
Serving alternate content when hot linking is detected
You can set up your .htaccess file to actually serve up different content when hot linking occurs. This is more commonly done with images, such as serving up an Angry Man image in place of the hot linked one. The code for this is:
Same deal- replace mydomain.com with your own, plus angryman.gif.
Time to pour a bucket of cold water on hot linking!
[size=12pt]Preventing Directory Listing[/size]
Do you have a directory full of images or zips that you do not want people to be able to browse through? Typically a server is setup to prevent directory listing, but sometimes they are not. If not, become self-sufficient and fix it yourself:
The * is a wildcard that matches all files, so if you stick that line into an htaccess file in your images directory, nothing in that directory will be allowed to be listed.
On the other hand, what if you did want the directory contents to be listed, but only if they were HTML pages and not images? Simple says I:
This would return a list of all files not ending in .jpg or .gif, but would still list .txt, .html, etc.
And conversely, if your server is setup to prevent directory listing, but you want to list the directories by default, you could simply throw this into an htaccess file the directory you want displayed:
If you do use this option, be very careful that you do not put any unintentional or compromising files in this directory. And if you guessed it by the plus sign before Indexes, you can throw in a minus sign (Options -Indexes) to prevent directory listing entirely--this is typical of most server setups and is usually configured elsewhere in the apache server, but can be overridden through htaccess.
If you really want to be tricky, using the +Indexes option, you can include a default description for the directory listing that is displayed when you use it by placing a file called HEADER in the same directory. The contents of this file will be printed out before the list of directory contents is listed. You can also specify a footer, though it is called README, by placing it in the same directory as the HEADER. The README file is printed out after the directory listing is printed.
.htaccess is the file extension. It is not file.htaccess or somepage.htaccess, it is simply named .htaccess
In order to create the file, open up a text editor and save an empty page as .htaccess (or type in one character, as some editors will not let you save an empty page). Chances are that your editor will append its default file extension to the name (ex: for Notepad it would call the file .htaccess.txt). You need to remove the .txt (or other) file extension in order to get yourself htaccessing--yes, I know that isn't a word, but it sounds keen, don't it? You can do this by right clicking on the file and renaming it by removing anything that doesn't say .htaccess. You can also rename it via telnet or your ftp program, and you should be familiar enough with one of those so as not to need explaining.
htaccess files must be uploaded as ASCII mode, not BINARY. You may need to CHMOD the htaccess file to 644 or (RW-R--R--). This makes the file usable by the server, but prevents it from being read by a browser, which can seriously compromise your security. (For example, if you have password protected directories, if a browser can read the htaccess file, then they can get the location of the authentication file and then reverse engineer the list to get full access to any portion that you previously had protected. There are different ways to prevent this, one being to place all your authentication files above the root directory so that they are not www accessible, and the other is through an htaccess series of commands that prevents itself from being accessed by a browser, more on that later)
Most commands in htaccess are meant to be placed on one line only, so if you use a text editor that uses word-wrap, make sure it is disabled or it might throw in a few characters that annoy Apache to no end, although Apache is typically very forgiving of malformed content in an htaccess file.
htaccess is an Apache thing, not an NT thing. There are similar capabilities for NT servers, though in my professional experience and personal opinion, NT's ability in these areas is severely handicapped. But that's not what we're here for.
htaccess files affect the directory they are placed in and all sub-directories, that is an htaccess file located in your root directory (yoursite.com) would affect yoursite.com/content, yoursite.com/content/contents, etc. It is important to note that this can be prevented (if, for example, you did not want certain htaccess commands to affect a specific directory) by placing a new htaccess file within the directory you don't want affected with certain changes, and removing the specific command(s) from the new htaccess file that you do not want affecting this directory. In short, the nearest htaccess file to the current directory is treated as the htaccess file. If the nearest htaccess file is your global htaccess located in your root, then it affects every single directory in your entire site.
Before you go off and plant htaccess everywhere, read through this and make sure you don't do anything redundant, since it is possible to cause an infinite loop of redirects or errors if you place something weird in the htaccess.
Also...some sites do not allow use of htaccess files, since depending on what they are doing, they can slow down a server overloaded with domains if they are all using htaccess files. I can't stress this enough: You need to make sure you are allowed to use htaccess before you actually use it. Some things that htaccess can do can compromise a server configuration that has been specifically setup by the admin, so don't get in trouble.
[size=12pt]Error Documents[/size]
This seems to be what people think htaccess was meant for, but it is only part of the general use. We'll be getting into progressively more advanced stuff after this.
In order to specify your own ErrorDocuments, you need to be slightly familiar with the server returned error codes. (List to the right). You do not need to specify error pages for all of these, in fact you shouldn't. An ErrorDocument for code 200 would cause an infinite loop, whenever a page was found...this would not be good.
You will probably want to create an error document for codes 404 and 500, at the least 404 since this would give you a chance to handle requests for pages not found. 500 would help you out with internal server errors in any scripts you have running. You may also want to consider ErrorDocuments for 401 - Authorization Required (as in when somebody tries to enter a protected area of your site without the proper credentials), 403 - Forbidden (as in when a file with permissions not allowing it to be accessed by the
In order to specify your own customized error documents, you simply need to add the following command, on one line, within your htaccess file:
ErrorDocument code /directory/filename.ext
OR
ErrorDocument 404 /errors/notfound.html
This would cause any error code resulting in 404 to be forward to yoursite.com/errors/notfound.html
Likewise with:
ErrorDocument 500 /errors/internalerror.html
You can name the pages anything you want (I'd recommend something that would prevent you from forgetting what the page is being used for), and you can place the error pages anywhere you want within your site, so long as they are web-accessible (through a URL). The initial slash in the directory location represents the root directory of your site, that being where your default page for your first-level domain is located. I typically prefer to keep them in a separate directory for maintenance purposes and in order to better control spiders indexing them through a ROBOTS.TXT file, but it is entirely up to you.
If you were to use an error document handler for each of the error codes I mentioned, the htaccess file would look like the following (note each command is on its own line):
Code:
ErrorDocument 400 /errors/badrequest.html ErrorDocument 401 /errors/authreqd.html ErrorDocument 403 /errors/forbid.html ErrorDocument 404 /errors/notfound.html ErrorDocument 500 /errors/serverr.html
You can also specify HTML, believe it or not!
Code:
ErrorDocument 401 "<body bgcolor=#ffffff><h1>You have to actually <b>BE</b> a <a href="#">member</A> to view this page, Colonel!
Next, we are moving on to password protection, that last frontier before I dunk you into the true capabilities of htaccess. If you are familiar with setting up your own password protected directories via htaccess, you may feel like skipping ahead.
[size=12pt]
Password protection[/size]
Ever wanted a specific directory in your site to be available only to people who you want it to be available to? Ever got frustrated with the seeming holes in client-side options for this that allowed virtually anyone with enough skill to mess around in your source to get in? htaccess is the answer!
There are numerous methods to password protecting areas of your site, some server language based (such as ASP, PHP or PERL) and client side based, such as JavaScript. JavaScript is not as secure or foolproof as a server-side option, a server side challenge/response is always more secure than a client dependant challenge/response. htaccess is about as secure as you can or need to get in everyday life, though there are ways above and beyond even that of htaccess. If you aren't comfortable enough with htaccess, you can password protect your pages any number of ways, and JavaScript Kit has plenty of password protection scripts for your use.
The first thing you will need to do is create a file called .htpasswd. I know, you might have problems with the naming convention, but it is the same idea behind naming the htaccess file itself, and you should be able to do that by this point. In the htpasswd file, you place the username and password (which is encrypted) for those whom you want to have access.
For example, a username and password of wsabstract (and I do not recommend having the username being the same as the password), the htpasswd file would look like this:
Code:
wsabstract:y4E7Ep8e7EYV
For security, you should not upload the htpasswd file to a directory that is web accessible (yoursite.com/.htpasswd), it should be placed above your www root directory. You'll be specifying the location to it later on, so be sure you know where you put it. Also, this file, as with htaccess, should be uploaded as ASCII and not BINARY.
Create a new htaccess file and place the following code in it:
Code:
AuthUserFile /usr/local/you/safedir/.htpasswd AuthGroupFile /dev/null AuthName EnterPassword AuthType Basic require user wsabstract
The second to last line require user is where you enter the username of those who you want to have access to that portion of your site. Note that using this will allow only that specific user to be able to access that directory. This applies if you had an htpasswd file that had multiple users setup in it and you wanted each one to have access to an individual directory. If you wanted the entire list of users to have access to that directory, you would replace Require user xxx with require valid-user.
The AuthName is the name of the area you want to access. It could anything, such as "EnterPassword". You can change the name of this 'realm' to whatever you want, within reason.
We are using AuthType Basic because we are using basic HTTP authentication.
[size=12pt]Enabling SSI Via htaccess[/size]
Many people want to use SSI, but don't seem to have the ability to do so with their current web host. You can change that with htaccess. A note of caution first...definitely ask permission from your host before you do this, it can be considered 'hacking' or violation of your host's TOS, so be safe rather than sorry:
Code:
AddType text/html .shtml AddHandler server-parsed .shtml Options Indexes FollowSymLinks Includes
And that's it, you should have SSI enabled. But wait...don't feel like renaming all of your pages to .shtml in order to take advantage of this neat little toy? Me either! Just add this line to the fragment above, between the first and second lines:
Code:
AddHandler server-parsed .html
If, however, you are going to keep SSI pages with the extension of .shtml, and you want to use SSI on your Index pages, you need to add the following line to your htaccess:
Code:
DirectoryIndex index.shtml index.html
[size=12pt]Blocking users by IP[/size]
Is there a pesky person perpetrating pain upon you? Stalking your site from the vastness of the electron void? Blockem! In your htaccess file, add the following code--changing the IPs to suit your needs--each command on one line each:
Code:
order allow,deny deny from 123.45.6.7 deny from 012.34.5. allow from all
You can also set an option for deny from all, which would of course deny everyone. You can also allow or deny by domain name rather than IP address (allow from .javascriptkit.com works for www.javascriptkit.com or virtual.javascriptkit.com, etc.)
[size=12pt]Blocking users/ sites by referrer[/size]
Blocking users or sites that originate from a particular domain is another useful trick of .htaccess. Lets say you check your logs one day, and see tons of referrals from a particular site, yet upon inspection you can't find a single visible link to your site on theirs. The referral isn't a "legitimate" one, with the site most likely hot linking to certain files on your site such as images, .css files, or files you can't even make out. Remember, your logs will generate a referrer entry for any kind of reference to your site that has a traceable origin.
Before I get to the code itself, it's important to note that blocking access by referrer in .htaccess requires the help of the Apache module mod_rewrite to make out the referrer first. This module is installed by default on most servers (ask your host if you're not sure). So, to deny access all traffic that originate from a particular domain (referrers) to your site, use the following code:
Block traffic from a single referrer:
Code:
RewriteEngine on # Options +FollowSymlinks RewriteCond %{HTTP_REFERER} badsite\.com [NC] RewriteRule .* - [F]
Block traffic from multiple referrers
Code:
RewriteEngine on # Options +FollowSymlinks RewriteCond %{HTTP_REFERER} badsite\.com [NC,OR] RewriteCond %{HTTP_REFERER} anotherbadsite\.com RewriteRule .* - [F]
Now, you may have noticed the line "Options +FollowSymlinks" above, which is commented. Uncomment this line if your server isn't configured with FollowSymLinks in its <directory> section in httpd.conf, and you get a 500 Internal Server error when using the code above as is.
[size=12pt]Blocking bad bots and site rippers (aka offline browsers)[/size]
The definition of a "bad bot" varies depending on who you ask, but most would agree they are the spiders that do a lot more harm than good on your site (ie: an email harvester). A site ripper on the other hand are offline browsing programs that a surfer may unleash on your site to crawl and download every one of its pages for offline viewing. In both cases, both your site's bandwidth and resource usage are jacked up as a result, sometimes to the point of crashing your server. Bad bots typically ignore the wishes of your robots.txt file, so you'll want to ban them using means such as .htaccess. The trick is to identify a bad bot.
Below is a useful code block you can insert into.htaccess file for blocking a lot of the known bad bots and site rippers currently out there. It is derived from my reading of the excellent discussion "A close to perfect .htaccess file", specifically, "A close to perfect .htaccess file II." Simply add the below code to your .htaccess file:
Code:
RewriteEngine On RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR] RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR] RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] RewriteCond %{HTTP_USER_AGENT} ^Zeus RewriteRule ^.* - [F,L]
[size=12pt]Change your default directory page[/size]
Some of you may be wondering, just what in the world is a DirectoryIndex? Well, grasshopper, this is a command which allows you to specify a file that is to be loaded as your default page whenever a directory or url request comes in, that does not specify a specific page. Tired of having yoursite.com/index.html come up when you go to yoursite.com? Want to change it to be yoursite.com/ILikePizzaSteve.html that comes up instead? No problem!
Code:
DirectoryIndex filename.html
Code:
DirectoryIndex filename.html index.cgi index.pl default.htm
Every once in a while, I use this method for the following needs: Say I keep all my include files in a directory called include, and that I keep all my image files in a directory called images, I don't want people to be able to directory browse through them (even though we can prevent that through another htaccess trick, more later) I would specify a DirectoryIndex entry, in a specific htaccess file for those two directories, for /redirect/index.pl that is a redirect page (as explained here) that redirects a request for those directories to be sent to the homepage. Or I could just specify a directory index of index.pl and upload an index.pl file to each of those directories. Or I could just stick in an htaccess redirect page, which is our next subject!
[size=12pt]Redirects[/size]
Ever go through the nightmare of changing significantly portions of your site, then having to deal with the problem of people finding their way from the old pages to the new? It can be nasty. There are different ways of redirecting pages, through http-equiv, javascript or any of the server-side languages. And then you can do it through htaccess, which is probably the most effective, considering the minimal amount of work required to do it.
htaccess uses redirect to look for any request for a specific page (or a non-specific location, though this can cause infinite loops) and if it finds that request, it forwards it to a new page you have specified:
Code:
Redirect /olddirectory/oldfile.html http://yoursite.com/newdirectory/newfile.html
Using this method, you can redirect any number of pages no matter what you do to your directory structure. It is the fastest method that is a global affect.
[size=12pt]Prevent viewing of .htaccess file[/size]
If you use htaccess for password protection, then the location containing all of your password information is plainly available through the htaccess file. If you have set incorrect permissions or if your server is not as secure as it could be, a browser has the potential to view an htaccess file through a standard web interface and thus compromise your site/server. This, of course, would be a bad thing. However, it is possible to prevent an htaccess file from being viewed in this manner:
Code:
<Files .htaccess> order allow,deny deny from all </Files>
If you use this in your htaccess file, a person trying to see that file would get returned (under most server configurations) a 403 error code. You can also set permissions for your htaccess file via CHMOD, which would also prevent this from happening, as an added measure of security: 644 or RW-R--R--
[size=12pt]Adding MIME Types[/size]
What if your server wasn't set up to deliver certain file types properly? A common occurrence with MP3 or even SWF files. Simple enough to fix:
Code:
AddType application/x-shockwave-flash swf
By the way, here's a neat little trick that few know about, but you get to be part of the club since JavaScript Kit loves you: To force a file to be downloaded, via the Save As browser feature, you can simply set a MIME type to application/octet-stream and that immediately prompts you for the download.
[size=12pt]Preventing hot linking of images and other file types[/size]
In the webmaster community, "hot linking" is a curse phrase. Also known as "bandwidth stealing" by the angry site owner, it refers to linking directly to non-html objects not on one own's server, such as images, .js files etc. The victim's server in this case is robbed of bandwidth (and in turn money) as the violator enjoys showing content without having to pay for its deliverance. The most common practice of hot linking pertains to another site's images.
Using .htaccess, you can disallow hot linking on your server, so those attempting to link to an image or CSS file on your site, for example, is either blocked (failed request, such as a broken image) or served a different content (ie: an image of an angry man) . Note that mod_rewrite needs to be enabled on your server in order for this aspect of .htaccess to work. Inquire your web host regarding this.
With all the pieces in place, here's how to disable hot linking of certain file types on your site, in the case below, images, JavaScript (js) and CSS (css) files on your site. Simply add the below code to your .htaccess file, and upload the file either to your root directory, or a particular subdirectory to localize the effect to just one section of your site:
Code:
RewriteEngine on RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com/.*$ [NC] RewriteRule \.(gif|jpg|js|css)$ - [F]
Serving alternate content when hot linking is detected
You can set up your .htaccess file to actually serve up different content when hot linking occurs. This is more commonly done with images, such as serving up an Angry Man image in place of the hot linked one. The code for this is:
Code:
RewriteEngine on RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com/.*$ [NC] RewriteRule \.(gif|jpg)$ http://www.mydomain.com/angryman.gif [R,L]
Time to pour a bucket of cold water on hot linking!
[size=12pt]Preventing Directory Listing[/size]
Do you have a directory full of images or zips that you do not want people to be able to browse through? Typically a server is setup to prevent directory listing, but sometimes they are not. If not, become self-sufficient and fix it yourself:
Code:
IndexIgnore *
On the other hand, what if you did want the directory contents to be listed, but only if they were HTML pages and not images? Simple says I:
Code:
IndexIgnore *.gif *.jpg
And conversely, if your server is setup to prevent directory listing, but you want to list the directories by default, you could simply throw this into an htaccess file the directory you want displayed:
Code:
Options +Indexes
If you really want to be tricky, using the +Indexes option, you can include a default description for the directory listing that is displayed when you use it by placing a file called HEADER in the same directory. The contents of this file will be printed out before the list of directory contents is listed. You can also specify a footer, though it is called README, by placing it in the same directory as the HEADER. The README file is printed out after the directory listing is printed.
0 comments:
Post a Comment