.htaccess files [2009]

Introduction

.htaccess files are plain text files that contain Apache server directives (or commands) and are placed within the folder structure of a web site. These directives can be almost anything that can be configured in the main Apache configuration file (httpd.conf). The main difference is that while the httpd.conf directives are global (but can be assigned to individual folders), an .htaccess file only has jurisdiction over the folder it resides in and any subfolders in that same location.

Keep in mind that the httpd.conf file can do anything an .htaccess file can do, but not vice versa. This is to protect the global configurations on hosting services.

It is worth noting that separate directives for individual folders can (and should) be configured within the main configuration file. The general consensus is that .htaccess files should be avoided in favour of httpd.conf wherever possible. Creating a robust global configuration is definitely the best option. Nevertheless, there are situations where using .htaccess files is unavoidable. The most obvious is when a third party provider hosts your domain name and site. Under these circumstances, you will not have the privilege of editing the global configuration file.

As part of securing Apache, we used a configuration entry that disabled .htaccess files. In the httpd.conf file, we placed the following:

<Directory />
  AllowOverride None
</Directory>

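Therefore, to take advantage of .htaccess files, "AllowOverride" must be set to something other than "None" in httpd.conf for the directories concerned: either "All", or a list of only the directive classes you want to permit (such as "AuthConfig" or "Limit"). Most (if not all) hosting services should be configured to allow their customers to use .htaccess files in their hosted websites.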
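
As a rough sketch of the narrower alternative to "All", the following httpd.conf fragment permits only the directive classes used by the examples in this article. The /www path is borrowed from the examples later in the article and is an assumption; adjust it to your own document root.

<Directory /www>
  # Limit: Order/Allow/Deny. AuthConfig: AuthType, AuthName, Require, Satisfy.
  # Indexes: IndexIgnore. Options: the Options directive. FileInfo: ErrorDocument.
  AllowOverride Limit AuthConfig Indexes Options FileInfo
</Directory>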

Now we need to understand the security benefits and drawbacks of using .htaccess files.

The syntax of .htaccess files is the same as that of the main Apache configuration file, and similar to many other Linux configuration files: one directive per line, with lines beginning with "#" treated as comments and ignored by the server. So, if you’re experienced with directly editing Linux configurations, you will have little trouble understanding how .htaccess files are written. Using comments is highly recommended so you can come back to the file and understand what you have done.

Importantly, .htaccess files are "hidden" in a Unix/Linux environment, as are all files starting with a period ".", unless a command option such as "ls -a" is used to list them. In addition, the default Apache configuration reads .htaccess files but refuses to serve any file whose name starts with .ht to a browser. This may seem moot, but we must remember that in the past, site content was commonly viewable in a list or "index" format. This "feature" is now very rare. However, there are cases where a simple list can be useful (for directories full of downloadable files, for example); therefore, some webmasters turn it on using .htaccess directives.

To be absolutely sure that your .ht files are inaccessible, you can place some specific directives in your root .htaccess file that lock them down.

<Files ~ "^\.ht">
Order allow,deny
Deny from all
Satisfy all
</Files>

The above is probably not necessary, as most modern default Apache installations have this directive already set.

So, what can we do with .htaccess files and how do they work? Typically, we would restrict access to certain folders within the site: for example, a folder called "includes", where snippets of code and other scripts reside and need protecting. We would use a "deny from all" entry for this.
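
A minimal sketch of that entry, assuming the .htaccess file is placed inside the "includes" folder itself (the /www/includes location is an assumption based on the paths used later in this article):

# /www/includes/.htaccess (assumed location)
# Refuse every HTTP request for this folder and its subfolders.
# Server-side scripts can still read the files directly from disk.
order deny,allow
deny from all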

In the case of a local LAMP development server, we could use a top level .htaccess file that excludes all traffic except that from the local LAN (a useful second line of defence if someone managed to get through the firewall).

order deny,allow
deny from all
allow from 192.168.X.X/24

The above directive allows only the local LAN traffic (on the 192.168.X subnet) access to the folder (and subfolders) containing the .htaccess file.

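Alternatively, there may be a specific address or addresses we wish to explicitly deny (for whatever reason, possibly a persistent attack) while letting everyone else have access. Obviously you would substitute valid IP octets for the Xs. Note the order used here: with "allow,deny" the deny rules are evaluated last and win, whereas with "deny,allow" an "allow from all" line would override the deny and let the blocked address straight back in.

order allow,deny
allow from all
deny from X.X.X.X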
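
The "order", "allow" and "deny" directives shown above are the Apache 2.2 form. On Apache 2.4 and later, where they survive only through the mod_access_compat module, the same two rules could be sketched with "Require" instead. The Xs are placeholders as before, and the two rules are alternatives: use one or the other, not both in the same file.

# Alternative 1: allow only the local LAN
Require ip 192.168.X.X/24

# Alternative 2: allow everyone except one address
<RequireAll>
  Require all granted
  Require not ip X.X.X.X
</RequireAll>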

As mentioned above, we can configure our site, or parts of it, to be listed in an index format. This is as simple as adding the following line to the root .htaccess file:

Options +Indexes +MultiViews +FollowSymlinks

To ensure that certain folders and files are not listed we would add something like:

IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

There are other commands that can make the list look pretty and offer more functionality. But, because this article is security focussed, this "feature" is NOT recommended. After all, there are plenty of other ways to generate dynamic lists and make a site look nice and consistent without listing everything.
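
In keeping with that advice, a cautious root .htaccess file could switch listings off explicitly rather than rely on the server default. A minimal sketch (it needs "AllowOverride Options" or "All" to be in effect):

# Never generate a directory listing. A folder with no index page
# then returns 403 Forbidden instead of a list of its contents.
Options -Indexes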

Many sites neglect error handling. By default, visitors see only a generic page that reports the error number and some basic information about why the error may have occurred. However, webmasters can take control of these error messages with some basic entries in .htaccess files.

While there are a number of ways to handle errors, the simplest and most effective is to serve a pre-designed page with a nice message and a link back to somewhere useful on your site.

ErrorDocument 404 /includes/404.html

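Add as many ErrorDocument directives as you think are necessary. Error 404 is the most common. Bear in mind that the error page must itself be accessible to visitors: a page inside a folder blocked with "deny from all" (as suggested for "includes" above) will not be displayed, so keep error pages in a folder that is not blocked.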
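
For illustration, a slightly fuller set might look like the following. The /errors/ folder and the file names are hypothetical; the only requirement is that the pages exist and can be served to visitors.

# A local path (starting with /) is served in place of the default error page
ErrorDocument 403 /errors/403.html
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html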

Using .htaccess Files for Authentication

Now we move on to the authentication features of using .htaccess files in our sites. Remember that these authentication commands could (and should) be implemented from within the httpd.conf file if it is available to the site developer. This would offer slightly better security; however, when using a third party host, we are required to use .htaccess files for this.

When a page is requested from a directory covered by an .htaccess file containing authentication directives (in that directory or in one above it), a login dialogue opens in the user’s browser.

There are two types of authentication (or AuthTypes, after the directive name): Basic and Digest. The main difference is that Basic sends the password across the network effectively in plain text (it is only Base64-encoded), while Digest sends an MD5 digest computed from the password rather than the password itself.

The main configuration file for Apache must have the directive AllowOverride AuthConfig in it to enable authorisation via .htaccess. There are a number of Apache modules required to enable these features. However, we will assume that they are installed and working. We will also assume that we have run the htpasswd or htdigest commands to generate the appropriate password files.

An example of the directives to place in the .htaccess file for basic authentication would be:

AuthType Basic
AuthName "Password Required"
AuthUserFile /www/includes/.htpasswd
AuthGroupFile /www/includes/.htgroup
Require group admin

The above example uses .htpasswd and .htgroup as the files containing the appropriate passwords and groups. These files can be named anything you like; however, names starting with .ht give better security, because we have already set up our Apache server to avoid listing or serving up ALL files starting with .ht. To be doubly sure, we have also blocked direct access to the "includes" folder.

The same restriction using digest authentication would look like this (the password file must be one generated by htdigest, and AuthName must match the realm given to htdigest):

AuthType Digest
AuthName "Password Required"
AuthDigestFile /www/includes/.htpasswd
AuthDigestGroupFile /www/includes/.htgroup
Require group admin

If we decided that we only wanted to grant access to one or two users, we could simply list the users instead of using a group, something like this:

AuthType Digest
AuthName "Password Required"
AuthDigestFile /www/includes/.htpasswd
Require user fred george
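
Note that AuthDigestFile and AuthDigestGroupFile are the directive names used by Apache 2.0. From Apache 2.2 onwards, mod_auth_digest reads the same AuthUserFile and AuthGroupFile directives as Basic authentication. A sketch of the group example for those versions, reusing the file paths from the examples above, might be:

AuthType Digest
# Must match the realm given to htdigest when the password file was created
AuthName "Password Required"
# "file" is the default provider; it is spelled out here for clarity
AuthDigestProvider file
AuthUserFile /www/includes/.htpasswd
AuthGroupFile /www/includes/.htgroup
Require group admin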

Although Digest is a slightly more secure method of authentication, we must understand that there are still some security implications. Firstly, some older browsers do not support this method. Secondly, even the digest method communicates over unencrypted HTTP, so the traffic can be intercepted and the digest extracted and used for malicious access. It is highly recommended that some form of SSL is implemented to ensure encryption between server and client if your content is important enough to you.

Some issues with .htaccess files

Not surprisingly, there is a slight performance hit when using .htaccess directives as opposed to including them in the main configuration. This is because, on every request, the server must look for an .htaccess file in the requested directory and in each directory above it to work out which directives apply. This is not such an issue on modern, high powered servers, but worth considering.

Reading the files on every request has a side benefit: changes take effect immediately (unlike changes to the main configuration, which require the web server to be restarted or reloaded).
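
Where you do control the main configuration, the same rules can live there instead, and .htaccess lookups can be switched off altogether. As a sketch, the "includes" restriction described earlier could be written in httpd.conf as follows (the /www paths follow this article's examples and are assumptions about your layout):

<Directory /www>
  # Do not look for .htaccess files anywhere under /www
  AllowOverride None
</Directory>

<Directory /www/includes>
  # Same effect as an .htaccess file containing "deny from all"
  Order deny,allow
  Deny from all
</Directory>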

Remember that .htaccess files provide directives for the containing folder and all of its subfolders. Also, the .htaccess file closest to the requested file has priority. What does this mean? Well, consider the following hierarchy:

/www/.htaccess
/www/includes/.htaccess
/www/includes/specialstuff/.htaccess
/www/includes/morespecialstuff

The contents of specialstuff inherit the directives from both the .htaccess files above it as well as the one contained within it. No problem until there are conflicting directives. The bottom-most .htaccess file has priority and overrules any conflicting directives above it. Additionally, morespecialstuff inherits only the directives from /www/.htaccess and /www/includes/.htaccess, so it would have different directives assigned to it from those in specialstuff.
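
To make the conflict concrete, suppose (purely as an illustration) that the top-level file turns directory listings on while the file in specialstuff turns them off:

# /www/.htaccess
Options +Indexes

# /www/includes/specialstuff/.htaccess
Options -Indexes

Requests under specialstuff get no listing, because the deeper file is applied last. The morespecialstuff folder, which has no .htaccess file of its own, keeps the listing setting inherited from /www/.htaccess (unless /www/includes/.htaccess changes it).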

Although the above example is pretty straightforward, what if you were trying to manage a site that had dozens of folders and dozens of .htaccess files?  That could become very unwieldy and potentially open up some security holes.