CSCI E-12: Fundamentals of Web Site Development
Week 12 Lecture Notes
Configuring Apache with .htaccess files

Harvard University
Division of Continuing Education
Extension School

Course Web Site: http://www.courses.fas.harvard.edu/~cscie12/

Copyright © 1998-2001 David P. Heitmeyer

david_heitmeyer@harvard.edu

Apache Configuration Overview

.htaccess File Example

filename: .htaccess
location: /home/c/s/cscie12/public_html/apache/example/.htaccess
contents:
ErrorDocument 404 /~cscie12/status404.html
filename: status404.html
location: /home/c/s/cscie12/public_html/status404.html
contents:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML lang="en">
  <HEAD>
  <TITLE>
      CSCIE12: 404 Not Found
  </TITLE>
  <BASE href="http://www.courses.fas.harvard.edu/~cscie12/">
  </HEAD>
  <BODY bgcolor="#ffffff" link="#cc3333" vlink="#996633"
  background="images/background.gif">
    <H1>404 Not Found</H1>
    <H2>CSCIE12: Introduction to Web Site Development</H2>
      The resource you requested, <br>
      <strong><!--#echo var="REQUEST_URI"--></strong><br>
      cannot be found.
    <HR>
    The main areas of the site are:<p>
    <!--#include virtual="inc/nav.html"-->
    <HR>
    <!--#include virtual="inc/footer.html"-->
    <HR>
  </BODY>
</HTML>

.htaccess: Scope

Directives within .htaccess files apply to the directory that contains the .htaccess file and all its descendants.

Directives within the file,
/home/c/s/cscie12/public_html/.htaccess
would apply to all files within and "under" the public_html directory for the user cscie12.

Directives within the file,
/home/c/s/cscie12/public_html/assignments/.htaccess
would apply to all files within and "under" the public_html/assignments directory for the user cscie12.
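The scope rule can be sketched in a few lines of Python. This is only an illustration: the function name applicable_htaccess is made up for these notes, and it assumes Apache starts looking for .htaccess files at the public_html directory, which depends on how the main server is configured.

```python
from pathlib import PurePosixPath
from typing import List


def applicable_htaccess(doc_root: str, file_path: str) -> List[str]:
    """List the .htaccess files whose directives could apply to file_path.

    The list runs from the outermost directory (doc_root) down to the
    directory that contains the file. Hypothetical helper, not part of Apache.
    """
    root = PurePosixPath(doc_root)
    target_dir = PurePosixPath(file_path).parent
    # relative_to raises ValueError if the file is not under doc_root.
    relative = target_dir.relative_to(root)

    found = [str(root / ".htaccess")]
    current = root
    for part in relative.parts:
        current = current / part
        found.append(str(current / ".htaccess"))
    return found


if __name__ == "__main__":
    for path in applicable_htaccess(
        "/home/c/s/cscie12/public_html",
        "/home/c/s/cscie12/public_html/assignments/a1/index.html",
    ):
        print(path)
```

A file directly inside public_html is affected only by public_html/.htaccess; a file under assignments is affected by both files described above.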

Problems You will encounter when using .htaccess files

500 Internal Server Error
If you begin seeing 500 Internal Server Error responses from the server after you have created or edited an .htaccess file, the most likely cause is incorrect permissions on the file, an error in the directive syntax, or both.
fas% pwd
/home/j/h/jharvard/public_html
is03:~% ls -l .htaccess
-rw-------   1 jharvard  founder         349 Nov 27 00:03 .htaccess
is03:~% chmod o+r .htaccess
is03:~% ls -l ~/public_html/.htaccess
-rw----r--   1 jharvard  founder         349 Nov 27 00:03 .htaccess
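The same permission check can be scripted. The sketch below is a rough Python equivalent of reading the ls -l output above; the helper names are invented for illustration, and it assumes a Unix-style file system where the Web server reads the file as "other".

```python
import os
import stat


def is_world_readable(path: str) -> bool:
    """Return True when the "other" read bit is set (what chmod o+r turns on)."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def mode_string(path: str) -> str:
    """Return the permission string in the style shown by ls -l."""
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # The path is an example; substitute the location of your own file.
    path = os.path.expanduser("~/public_html/.htaccess")
    if os.path.exists(path):
        print(mode_string(path), "readable by server:", is_world_readable(path))
    else:
        print("no .htaccess file at", path)
```

If the check reports that the file is not readable, the chmod o+r command shown above fixes it.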

Problems You will encounter when using .htaccess files

You can't "see" your .htaccess file.
fas% ls
assignments
cgi-bin
faq
images
inc
index.html
instructors
lecture
schedule.html
section
syllabus

fas% ls -a
.
..
.htaccess
assignments
cgi-bin
faq
images
inc
index.html
instructors
lecture
schedule.html
section
syllabus

Apache Configuration Sections

Configuration directives can be limited by using "sections". Note that only Files and FilesMatch sections can be used within .htaccess files.

Examples:

<Files .htaccess>
    Order allow,deny
    Deny from all
</Files>
# deny access to any tilde backup files
<Files *~>
    Order allow,deny
    Deny from all
</Files>

Custom Error Documents

.htaccess file:
ErrorDocument 404 /~cscie12/status404.html

Redirecting Requests

HTTP Status Codes:
301 Moved permanently
302 Moved temporarily

Redirecting client requests can be very useful:

Note: redirection may also be achieved on some browsers by using the http-equiv attribute of the <META> element. More information and examples are provided at http://www.fas.harvard.edu/~web/tutorial/meta/refresh/. The recommended method is to do it at the server level.

Redirect

.htaccess file:
Redirect 302 /~cscie12/dce.html      http://www.dce.harvard.edu/
Redirect 301 /~cscie12/presentation  http://www.courses.fas.harvard.edu/~cscie12/lecture
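To see what these two directives do to an incoming request, here is a simplified Python model. It is a sketch of the prefix matching only, not Apache's own code: the function name redirect_for is invented, and it assumes a redirect matches on whole path segments and that anything after the matched prefix is appended to the target URL.

```python
from typing import Optional, Tuple

# (status, URL-path prefix, target URL), taken from the .htaccess file above.
REDIRECTS = [
    (302, "/~cscie12/dce.html", "http://www.dce.harvard.edu/"),
    (301, "/~cscie12/presentation",
     "http://www.courses.fas.harvard.edu/~cscie12/lecture"),
]


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for the first matching rule, else None."""
    for status, prefix, target in REDIRECTS:
        # Match the prefix itself, or the prefix followed by a "/" segment.
        if path == prefix or path.startswith(prefix + "/"):
            return status, target + path[len(prefix):]
    return None


if __name__ == "__main__":
    print(redirect_for("/~cscie12/dce.html"))
    print(redirect_for("/~cscie12/presentation/week12/"))
    print(redirect_for("/~cscie12/syllabus/"))
```

In this model a request for a page under /~cscie12/presentation is sent to the same page under /~cscie12/lecture with a 301, while unrelated paths are left alone.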

Rewrite

Examples of Rewrite Uses

Provide a standard mechanism to access course Web sites within Harvard College.

For example, Chemistry 5 has a catalog number of 5118, so the URL for the course Web site can be reached through:
The "real" location of the site is:

HASCS Site Restructure

Many rewrite directives were put in place when the HASCS site was restructured so that links to documents within the previous site would get redirected to the appropriate page in the new site.

Rewrite: Text-only sites

RewriteEngine On
RewriteBase /~cscie12
RewriteCond %{HTTP_USER_AGENT} ^Lynx
RewriteRule ^(index\.html)?$ text/ [R=302]
Here is what happens:
fas% lwp-request -USed -H"User-Agent: Lynx" \
     http://www.courses.fas.harvard.edu/~cscie12/index.html
GET http://www.courses.fas.harvard.edu/~cscie12/text/
User-Agent: Lynx

GET http://www.courses.fas.harvard.edu/~cscie12/index.html --> 302 Found
GET http://www.courses.fas.harvard.edu/~cscie12/text/ --> 200 OK
Connection: close
Date: Mon, 27 Nov 2001 19:47:04 GMT
Server: Apache/1.3.12 (Unix) secured_by_Raven/1.4.3 mod_perl/1.24
Content-Type: text/html
Author: David P. Heitmeyer
Client-Date: Mon, 27 Nov 2001 19:47:04 GMT
Client-Peer: 140.247.30.64:80
Title: Fundamentals of Web Site Development
X-Meta-Generator: HTML Tidy, see www.w3.org

An aside: Text-only sites and LINK

Meta-information can be used to describe alternate content.
In ~cscie12/public_html/index2.html
<link title="Text-only version"
         rel="alternate"
         href="http://www.courses.fas.harvard.edu/text/index.html"              
         media="aural, braille, tty">
Lynx view of index2.html provides the text-only version as a link:
                               Fundamentals of Web Site Development (p1 of 2)

   #Text-only version

                         Harvard University, DCE 
                                Fall 2001

                                 CSCIE12

                   Fundamentals of Web Site Development

   David P. Heitmeyer
     _______________________________________________________________

                           Week of November 20

     * Lecture 9 Handout: HTTP
     * Lecture 8 Video and Handouts: JavaScript, Usability,
       Accessibility, Other Content-Types
     * Assignment 4 and Submission Form Available.,
       Due Wednesday, November 15, 2001
     * Lecture 7 Video and Handouts: CSS and JavaScript
     * Lecture 6 Video and Handouts: Web Site Architecture and Design;
       special guest lecture by Elaine Benfatto, Harvard University
-- press space for next page --
  Arrow keys: Up and Down to move.  Right to follow a link; Left to go back.
 H)elp O)ptions P)rint G)o M)ain screen Q)uit /=search [delete]=history list

Directory Index and Listings

Note: Remember the difference between a directory having rwx-----x and rwx---r-x permissions?

DirectoryIndex

DirectoryIndex index.html main.html overview.html slide1.html
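The candidates are tried from left to right, and the first one that exists is served. A minimal Python sketch of that selection follows; pick_index is a hypothetical helper written for these notes, and it ignores everything else the server does when handling a directory request.

```python
import os
from typing import List, Optional


def pick_index(directory: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate that exists as a file in directory."""
    for name in candidates:
        if os.path.isfile(os.path.join(directory, name)):
            return name
    # No candidate found: the server would fall back to a directory
    # listing or an error, depending on its configuration.
    return None


if __name__ == "__main__":
    names = ["index.html", "main.html", "overview.html", "slide1.html"]
    print(pick_index(".", names))
```

With the directive above, a directory containing both main.html and slide1.html (but no index.html) would be served as main.html.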

More Control over Directory Listings

mod_autoindex

Setting HTTP Headers

Expires

.htaccess file:
ExpiresActive On
# HTML expires in 1 hour
ExpiresByType text/html   A3600
# GIF, JPEG, and PNG images expire in 30 days
ExpiresByType image/gif   A2592000
ExpiresByType image/jpeg  A2592000
ExpiresByType image/png   A2592000
# types not specified expire in 1 day
ExpiresDefault "now plus 1 day"
Or, expire based upon modification time of document:
ExpiresActive On
# HTML expires 1 day after it was last modified
ExpiresByType text/html   M86400
ExpiresDefault M86400
From the Apache mod_expires documentation:

This module controls the setting of the Expires HTTP header in server responses. The expiration date can set to be relative to either the time the source file was last modified, or to the time of the client access.

The Expires HTTP header is an instruction to the client about the document's validity and persistence. If cached, the document may be fetched from the cache rather than from the source until this time has passed. After that, the cache copy is considered "expired" and invalid, and a new copy must be obtained from the source.
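The arithmetic behind the A and M codes is simple enough to sketch in Python. The function expires_header is an illustration only, not how mod_expires is written: A adds the number of seconds to the time of the client access, and M adds it to the file's last-modified time.

```python
from email.utils import formatdate


def expires_header(code: str, access_time: float, modified_time: float) -> str:
    """Compute an Expires value for a code such as A3600 or M86400.

    Times are seconds since the Unix epoch. The result is an HTTP date
    string in GMT.
    """
    base_letter, seconds = code[0], int(code[1:])
    if base_letter == "A":
        base = access_time        # relative to the client access
    elif base_letter == "M":
        base = modified_time      # relative to the last modification
    else:
        raise ValueError("code must start with A or M: " + code)
    return formatdate(base + seconds, usegmt=True)


if __name__ == "__main__":
    import time
    now = time.time()
    print("Expires:", expires_header("A3600", now, now - 500000))
```

Note that with an M code, a document that was last modified long ago produces an Expires time that is already in the past.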

Headers

The optional headers module allows for the customization of HTTP response headers. Headers can be merged, replaced or removed. The server will always add a "Server" and "Date" header to the HTTP response.
Header set Author "David P. Heitmeyer"

asis Documents

Purpose Example:
fas% ls -l sendasiam.html.asis
-rw----r--   1 cscie12  courses       344 Nov 28 23:25 sendasiam.html.asis
sendasiam.html.asis file:
Status: 301 Now where did I leave that URL 
Location: http://www.joe.com/
Content-type: text/html 

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML> 
<HEAD> 
<TITLE>Lame excuses'R'us</TITLE> 
</HEAD> 
<BODY> 
<H1>Fred's exceptionally wonderful page has moved to 
<A HREF="http://www.joe.com/">Joe's</A> site. 
</H1> 
</BODY> 
</HTML> 

WWW Access Control

You can implement access control on all or part of your Web site so that:

Basic Authentication: Warning

Basic Authentication alone does not provide the security and privacy to adequately protect truly confidential or personal information.

Basic Authentication is analogous to simply "closing a door" to parts of your Web site. It will prevent casual or polite users from "opening the door", but will not prevent someone even mildly determined from walking in.

Two issues that contribute to the lack of security and privacy are:

HTTP: Authenticate

fas% telnet 140.247.30.64 80
Trying 140.247.30.64...
Connected to 140.247.30.64.
Escape character is '^]'.
HEAD /~cscie12/assignments/ HTTP/1.1
Host: www.courses.fas.harvard.edu

HTTP/1.1 401 Authorization Required
Date: Mon, 22 Nov 1999 17:42:29 GMT
Server: Apache/1.3.6 (Unix) mod_perl/1.21 secured_by_Raven/1.4.1
WWW-Authenticate: Basic realm="CSCIE12 Assignment Submission"

HTTP: Authentication/Authorization

The username:password pair is sent Base64 encoded (not encrypted).
fas% telnet 140.247.30.64 80
Trying 140.247.30.64...
Connected to 140.247.30.64.
Escape character is '^]'.
HEAD /~cscie12/assignments/ HTTP/1.1
Host: www.courses.fas.harvard.edu
Authorization: BASIC Z3Vlc3Q6a25vY2trbm9jaw== 

HTTP/1.1 200 OK
Date: Mon, 22 Nov 1999 17:46:42 GMT
Server: Apache/1.3.6 (Unix) mod_perl/1.21 secured_by_Raven/1.4.1
Author: David P. Heitmeyer
Content-Type: text/html
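Because the credentials are only encoded, anyone who sees the request can recover them. The short Python sketch below decodes the Authorization header from the session above using the standard base64 module; the function names are chosen for illustration.

```python
import base64
from typing import Tuple


def decode_basic(header_value: str) -> Tuple[str, str]:
    """Split an Authorization header value into (username, password)."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authentication header")
    decoded = base64.b64decode(token.strip()).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def encode_basic(username: str, password: str) -> str:
    """Build the header value a browser would send for these credentials."""
    pair = (username + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(pair).decode("ascii")


if __name__ == "__main__":
    print(decode_basic("BASIC Z3Vlc3Q6a25vY2trbm9jaw=="))
    print(encode_basic("guest2", "guest"))
```

No key or secret is needed for the decoding step, which is why Basic Authentication over plain HTTP offers so little privacy.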

Access Control Documentation

Apache

Implementing Access Control

To implement access control, you must create a file named ".htaccess" that contains the proper configuration instructions. You may also need to create a ".htpasswd" file using the utility "htpasswd" and a ".htgroup" file.

.htaccess

.htaccess
This file contains the instructions the WWW Server needs in order to implement access control. The directives contained within this file will apply to all the files and subdirectories at or below the level of the .htaccess file.

For example, /home/j/h/jharvard/public_html/private/.htaccess will apply to all files contained within the ~jharvard/public_html/private directory (and its subdirectories), but would not be applied to the file ~jharvard/public_html/index.html.

This file needs to be readable by the Web Server.

htpasswd file

.htpasswd
This file contains usernames and encrypted passwords (username:enc_passwd). It is created and managed with the utility, "htpasswd", which can be run from the command line on fas.harvard.edu and ice.fas.harvard.edu.

This file should not lie within your public_html. It should reside at the root level of your home directory (for example, /home/j/h/jharvard/.htpasswd).

This file needs to be readable by the Web Server.

fas% which htpasswd
/usr/local/bin/htpasswd

fas% htpasswd
Usage: htpasswd [-c] passwordfile username
The -c flag creates a new file.
Sample content:
fas% more ~cscie12/.htpasswd.demo
guest:79WeSn3vYGsKQ
guest2:wGcgIYLtHNIpM
guest3:j9VzpSX/C8Kr2
guest4:CjHmW1PWNFwXM

htgroup file

.htgroup
This file contains group definitions (group_name:member1 member2 ...).

This file should not lie within your public_html. It should reside at the root level of your home directory (for example, /home/j/h/jharvard/.htgroup).

This file needs to be readable by the Web Server.

Access Control Examples

For the examples given, the user "cscie12" is used. You should substitute your username and home directory appropriately.

The following .htpasswd.demo and .htgroup.demo files are used:

/home/c/s/cscie12/.htpasswd.demo
The .htpasswd.demo was generated by using the utility "htpasswd"
ice% htpasswd
Usage: htpasswd [-c] passwordfile username 
The -c flag creates a new file. 

ice% htpasswd -c /home/c/s/cscie12/.htpasswd.demo guest
Adding password for guest 
New password: *****
Re-type password: *****
Password for "guest" (and all other entries) is "guest". Entries for guest2, guest3, and guest4 are created without the "-c" flag, since the .htpasswd.demo file already exists.

Contents of file:

guest:79WeSn3vYGsKQ
guest2:PR4APgA.4CKO.
guest3:5DbCMPbSDstj2
guest4:htPnr8jT4bI5E

.htgroup.demo
Contents of file:

VIP: guest guest4

Access Control Example 1

Any valid user in .htpasswd.demo is allowed access

The "AuthName" is the description that is displayed by the browser in the Basic Authentication dialog box.

Contents of sample .htaccess file:
AuthName "Basic Authentication Tutorial 1"
AuthType Basic
AuthUserFile /home/c/s/cscie12/.htpasswd.demo
require valid-user
Demonstration of Example 1
You may login as any of the following users (username:password):
guest:guest
guest2:guest
guest3:guest
guest4:guest
fas% lwp-request -USed -Cguest2:iforgot \
     http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1
Authorization: Basic Z3Vlc3QyOmlmb3Jnb3Q=
User-Agent: lwp-request/1.38

GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1 --> 401 Authorization Required
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1 --> 401 Authorization Required
Connection: close
Date: Mon, 27 Nov 2001 19:13:51 GMT
Server: Apache/1.3.12 (Unix) secured_by_Raven/1.4.3 mod_perl/1.24
WWW-Authenticate: Basic realm="Basic Authentication Tutorial 1"
Content-Base: http://www.courses.fas.harvard.edu/~cscie12/
Content-Type: text/html
Author: David P. Heitmeyer
Client-Date: Mon, 27 Nov 2001 19:13:51 GMT
Client-Peer: 140.247.30.64:80
Client-Warning: Credentials for 'guest2' failed before
Title: CSCIE12: 401 Unauthorized

fas% lwp-request -USed -Cguest2:guest \
     http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1/
Authorization: Basic Z3Vlc3QyOmd1ZXN0
User-Agent: lwp-request/1.38

GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1 --> 401 Authorization Required
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1 --> 301 Moved Permanently
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example1/ --> 200 OK
Connection: close
Date: Mon, 27 Nov 2001 19:12:58 GMT
Server: Apache/1.3.12 (Unix) secured_by_Raven/1.4.3 mod_perl/1.24
Content-Type: text/html
Author: David P. Heitmeyer
Client-Date: Mon, 27 Nov 2001 19:12:58 GMT
Client-Peer: 140.247.30.64:80
Title:

Access Control Example 2

Only certain users in .htpasswd.demo are allowed access

Contents of sample .htaccess file:

AuthName "Basic Authentication Tutorial 2"
AuthType Basic
AuthUserFile /home/c/s/cscie12/.htpasswd.demo
require user guest2 guest3
Demonstration of Example 2
Only guest2 and guest3 are authorized:
guest2:guest
guest3:guest

Unauthorized:
guest:guest
guest4:guest

fas% lwp-request -USed -Cguest2:guest \
     http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/
Authorization: Basic Z3Vlc3QyOmd1ZXN0
User-Agent: lwp-request/1.38

GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/ --> 401 Authorization Required
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/ --> 200 OK
Connection: close
Date: Mon, 27 Nov 2001 19:15:11 GMT
Server: Apache/1.3.12 (Unix) secured_by_Raven/1.4.3 mod_perl/1.24
Content-Type: text/html
Author: David P. Heitmeyer
Client-Date: Mon, 27 Nov 2001 19:15:11 GMT
Client-Peer: 140.247.30.64:80
Title: 

fas% lwp-request -USed -Cguest4:guest \
     http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/
Authorization: Basic Z3Vlc3Q0Omd1ZXN0
User-Agent: lwp-request/1.38

GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/ --> 401 Authorization Required
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example2/ --> 401 Authorization Required
Connection: close
Date: Mon, 27 Nov 2001 19:15:22 GMT
Server: Apache/1.3.12 (Unix) secured_by_Raven/1.4.3 mod_perl/1.24
WWW-Authenticate: Basic realm="Basic Authentication Tutorial 2"
Content-Base: http://www.courses.fas.harvard.edu/~cscie12/
Content-Type: text/html
Author: David P. Heitmeyer
Client-Date: Mon, 27 Nov 2001 19:15:22 GMT
Client-Peer: 140.247.30.64:80
Client-Warning: Credentials for 'guest4' failed before
Title: CSCIE12: 401 Unauthorized

Access Control Example 3

Only members of a particular group are allowed access

Contents of .htaccess file:

AuthName "Basic Authentication Tutorial 3"
AuthType Basic
AuthUserFile /home/c/s/cscie12/.htpasswd.demo
AuthGroupFile /home/c/s/cscie12/.htgroup.demo
require group VIP

Contents of .htgroup.demo file:

VIP: guest guest4
Demonstration of Example 3
Only members of the group "VIP" (as defined by /home/c/s/cscie12/.htgroup.demo) are authorized (guest and guest4):
guest:guest
guest4:guest

Unauthorized:
guest2:guest
guest3:guest
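The group check in Example 3 amounts to a lookup in the group file. Here is a small Python sketch of that lookup; parse_groups and in_required_group are hypothetical helpers, and the sketch assumes the password has already been verified against the .htpasswd file.

```python
from typing import Dict, Set

HTGROUP_DEMO = "VIP: guest guest4\n"   # contents of .htgroup.demo


def parse_groups(text: str) -> Dict[str, Set[str]]:
    """Parse lines of the form group_name: member1 member2 ..."""
    groups: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, members = line.partition(":")
        groups.setdefault(name.strip(), set()).update(members.split())
    return groups


def in_required_group(user: str, group: str, groups: Dict[str, Set[str]]) -> bool:
    """Model of: require group <group>."""
    return user in groups.get(group, set())


if __name__ == "__main__":
    groups = parse_groups(HTGROUP_DEMO)
    for user in ("guest", "guest2", "guest3", "guest4"):
        print(user, in_required_group(user, "VIP", groups))
```

The output agrees with the lists above: guest and guest4 are authorized, guest2 and guest3 are not.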

Access Control Example 4

Only certain computers are allowed access

Contents of sample .htaccess file:

order deny,allow
deny from all
allow from 140.247
allow from 128.103
allow from .harvard.edu
Demonstration of Example 4
Computers that are on the Harvard network (computers with hostnames ending in .harvard.edu or with IP addresses beginning with 128.103 or 140.247) will have access; others will be denied.
fas% lwp-request -USed \
     http://www.courses.fas.harvard.edu/~cscie12/apache/access/example4/
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example4/
User-Agent: lwp-request/1.38

GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example4/ --> 200 OK
Connection: close
Date: Mon, 27 Nov 2001 19:17:01 GMT
Server: Apache/1.3.12 (Unix) secured_by_Raven/1.4.3 mod_perl/1.24
Content-Type: text/html
Author: David P. Heitmeyer
Client-Date: Mon, 27 Nov 2001 19:17:01 GMT
Client-Peer: 140.247.30.64:80
Title:

Access Control Example 5

Only certain computers are denied access

Contents of sample .htaccess file:

order allow,deny
allow from all
deny from .fas.harvard.edu
Demonstration of Example 5
Connections from within the domain 'fas.harvard.edu' will be denied.
fas% lwp-request -USed \
     http://www.courses.fas.harvard.edu/~cscie12/apache/access/example5/
GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example5/
User-Agent: lwp-request/1.38

GET http://www.courses.fas.harvard.edu/~cscie12/apache/access/example5/ --> 403 Forbidden
Connection: close
Date: Mon, 27 Nov 2001 19:17:53 GMT
Server: Apache/1.3.12 (Unix) secured_by_Raven/1.4.3 mod_perl/1.24
Content-Base: http://www.courses.fas.harvard.edu/~cscie12/
Content-Type: text/html
Author: David P. Heitmeyer
Client-Date: Mon, 27 Nov 2001 19:17:53 GMT
Client-Peer: 140.247.30.64:80
Title: CSCIE12: 403 Forbidden
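The difference between "order deny,allow" (Example 4) and "order allow,deny" (Example 5) is easiest to see when written out as code. The Python below is a simplified model, not Apache's implementation: the function names are invented, host specifications are limited to "all", partial IP addresses, and domain names, and the non-Harvard host in the usage lines is a made-up example.

```python
from typing import List


def host_matches(spec: str, ip: str, hostname: str) -> bool:
    """Does one Allow/Deny specification match this client?"""
    if spec == "all":
        return True
    if spec[0].isdigit():
        # Partial IP address: match whole dotted components only.
        return ip == spec or ip.startswith(spec + ".")
    domain = spec.lstrip(".").lower()
    host = hostname.lower()
    return host == domain or host.endswith("." + domain)


def access_allowed(order: str, allow: List[str], deny: List[str],
                   ip: str, hostname: str) -> bool:
    """Apply the Order rule to the Allow and Deny lists."""
    allowed = any(host_matches(s, ip, hostname) for s in allow)
    denied = any(host_matches(s, ip, hostname) for s in deny)
    if order == "deny,allow":
        # Deny is checked first; a matching Allow overrides it.
        return allowed or not denied
    if order == "allow,deny":
        # Allow is checked first; a matching Deny overrides it.
        return allowed and not denied
    raise ValueError("unknown order: " + order)


if __name__ == "__main__":
    ex4 = dict(order="deny,allow", deny=["all"],
               allow=["140.247", "128.103", ".harvard.edu"])
    ex5 = dict(order="allow,deny", allow=["all"], deny=[".fas.harvard.edu"])
    print(access_allowed(ip="140.247.30.64", hostname="is03.fas.harvard.edu", **ex4))
    print(access_allowed(ip="140.247.30.64", hostname="is03.fas.harvard.edu", **ex5))
    print(access_allowed(ip="192.0.2.10", hostname="host.example.org", **ex4))
```

A client on fas.harvard.edu is let in by Example 4 and turned away by Example 5, which matches the 200 and 403 responses shown in the two demonstrations.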

Access Control Example 6

Certain computers are allowed in; others must provide a username and password

Contents of sample .htaccess file:

order deny,allow
deny from all
allow from .yale.edu

AuthType Basic
AuthUserFile /home/c/s/cscie12/.htpasswd.demo
AuthName "Basic Authentication Tutorial 6"
require valid-user

satisfy any
Demonstration of Example 6
Connections from within ".yale.edu" will be allowed; others must provide a valid username and password (satisfy any).

Access Control Example 7

Only certain computers are allowed in and users must provide a valid username and password.

Contents of sample .htaccess file:

order deny,allow
deny from all
allow from .harvard.edu

AuthType Basic
AuthUserFile /home/c/s/cscie12/.htpasswd.demo
AuthName "Basic Authentication Tutorial 7"
require valid-user

satisfy all
Demonstration of Example 7
Only connections from within ".harvard.edu" will be allowed and users must provide a valid username and password (satisfy all).
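Examples 6 and 7 differ only in the satisfy directive, which decides how the host check and the user check are combined. As a sketch (the function name request_permitted is invented for these notes), the rule is just a boolean "or" versus a boolean "and":

```python
def request_permitted(satisfy: str, host_ok: bool, user_ok: bool) -> bool:
    """Combine the host-based check and the username/password check.

    host_ok: the client passed the order/allow/deny rules.
    user_ok: the client supplied credentials that passed the require rule.
    """
    if satisfy == "any":
        return host_ok or user_ok     # Example 6: either one is enough
    if satisfy == "all":
        return host_ok and user_ok    # Example 7: both are required
    raise ValueError("satisfy must be 'any' or 'all'")


if __name__ == "__main__":
    for host_ok in (True, False):
        for user_ok in (True, False):
            print(host_ok, user_ok,
                  "any:", request_permitted("any", host_ok, user_ok),
                  "all:", request_permitted("all", host_ok, user_ok))
```

Under "satisfy any" a client from an allowed host is never asked for a password; under "satisfy all" a client from outside the allowed hosts is refused even with a valid password.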

Requiring SSL (https://)

SSL (Secure Sockets Layer) is a protocol that encrypts data sent between the client and the server; https is HTTP over SSL. More details are in our last lecture, on Security and Privacy.

Contents of sample .htaccess file:

SSLRequireSSL

Details about enabling .htaccess and allowed directives

.htaccess files: Legal Directives I
Context

Certain Apache directives are legal within .htaccess files. Some are not.
See the Apache Documentation for details. Specifically, look at the Context line that is given for the directive in question. The following is an excerpt from the Apache HTTP Server Version 1.3 documentation:

ErrorDocument directive

Syntax: ErrorDocument error-code document
Context: server config, virtual host, directory, .htaccess
Status: core
Override: FileInfo
Compatibility: The directory and .htaccess contexts are only available in Apache 1.1 and later.

Also, the "a" indicator on the Apache Quick Reference Card indicates that the directive is valid within an .htaccess file.

.htaccess files: Legal Directives II
AllowOverride

Users are allowed to override certain aspects of the main server configuration.
The main server configuration file (httpd.conf) contains an AllowOverride directive that determines which directives within .htaccess files Apache will process. The Override line that is given for each directive in the Apache documentation indicates which configuration directive must be active in order to use that directive with an .htaccess file.

For the FAS system, the main server configuration file has the following directive in place for users' public_html directories:

AllowOverride FileInfo AuthConfig Limit Indexes Options
The following is an excerpt from the Apache HTTP Server Version 1.3 documentation

ErrorDocument directive

Syntax: ErrorDocument error-code document
Context: server config, virtual host, directory, .htaccess
Status: core
Override: FileInfo
Compatibility: The directory and .htaccess contexts are only available in Apache 1.1 and later.
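The relationship between a directive's Override line and the server's AllowOverride setting can be written as a table lookup. The Python sketch below is illustrative only: the ErrorDocument entry comes from the excerpt above, while the other entries in the table are recalled from the Apache 1.3 documentation and should be checked there before you rely on them.

```python
from typing import Set

# AllowOverride setting quoted above for users' public_html directories.
FAS_ALLOW_OVERRIDE = {"FileInfo", "AuthConfig", "Limit", "Indexes", "Options"}

# Directive -> value of its Override line (assumed, except ErrorDocument).
OVERRIDE_CLASS = {
    "ErrorDocument": "FileInfo",
    "Redirect": "FileInfo",
    "AuthType": "AuthConfig",
    "AuthName": "AuthConfig",
    "AuthUserFile": "AuthConfig",
    "AuthGroupFile": "AuthConfig",
    "require": "AuthConfig",
    "order": "Limit",
    "allow": "Limit",
    "deny": "Limit",
    "DirectoryIndex": "Indexes",
}


def usable_in_htaccess(directive: str, allow_override: Set[str]) -> bool:
    """True when the directive's Override class is enabled by AllowOverride."""
    needed = OVERRIDE_CLASS.get(directive)
    return needed is not None and needed in allow_override


if __name__ == "__main__":
    print(usable_in_htaccess("ErrorDocument", FAS_ALLOW_OVERRIDE))
    # A more restrictive, made-up server setting for comparison:
    print(usable_in_htaccess("ErrorDocument", {"AuthConfig"}))
```

On a server whose AllowOverride does not list FileInfo, an ErrorDocument line in an .htaccess file would not be accepted, even though the directive itself is legal in the .htaccess context.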

.htaccess: Legal Directives III
Apache Modules

Apache is distributed with several modules. These modules may or may not be active within the Apache server with which you are working. The Core features will always be available.

For example, if the Rewrite Module (mod_rewrite) has not been activated, none of the Rewrite directives will be available to use.

Refer to the Status and Module lines in the documentation for each directive and to the documentation for the specific Apache installation you are using.

Apache Modules

On the FAS Web servers, the following Apache modules are active:
mod_access
mod_actions
mod_alias
mod_asis
mod_auth
mod_auth_dbm
mod_autoindex
mod_cgi
mod_dir
mod_env
mod_expires
mod_headers
mod_imap
mod_include
mod_log_config
mod_mime
mod_negotiation
mod_perl
mod_rewrite
mod_setenvif
mod_so
mod_status
mod_unique_id
mod_userdir
mod_usertrack
raven_ssl