Session 11 - Server-Side: HTTP and Apache Web Server Configuration

Harvard University Extension School
Fall 2017

Course Web Site: http://cscie12.dce.harvard.edu/

Instructor email: david_heitmeyer@harvard.edu
Course staff email: cscie12@dce.harvard.edu

Topics

  1. The Internet and the Web
  2. HyperText Transfer Protocol
  3. Apache HTTP Server
  4. Caching - Don't deliver content unnecessarily
  5. Minify and Compress Content
  6. Friendly Errors
  7. Friendly Ways to Get There

The Internet and the Web

Internet Routing

Domain Names: Top Level Domains (TLD)

TLDs are managed by the Internet Assigned Numbers Authority (IANA)

Generic: .com, .org, .edu, .gov, etc.

Country codes: .ch, .cn, .de, .uk, .us, etc.

Full listing of TLDs

Getting Your Own Domain and Hosting

  1. Domain Name
  2. Hosting

A very short list of hosting companies as a place to start.

Web Server Software

Web Server Market Share

Netcraft HTTP Server Survey


Netcraft Web Server Survey

HyperText Transfer Protocol

GET /

GET / HTTP/1.1
Host: www.whitehouse.gov

HTTP/1.1 200 OK
Content-Length: 107981
Content-Type: text/html; charset=utf-8
X-Drupal-Cache: HIT
P3P: CP="NON DSP COR ADM DEV IVA OTPi OUR LEG"
X-Age: 67
X-Cache-Hits: 9
X-Varnish: 805108045 805108024
X-AH-Environment: prod
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Link: <https://www.whitehouse.gov/sites/whitehouse.gov/files/images/fb_share_postcard.jpg>; rel="image_src",<https://www.whitehouse.gov/homepage>; rel="canonical",<https://www.whitehouse.gov/node/5596>; rel="shortlink"
X-Generator: Drupal 7 (http://drupal.org)
X-UA-Compatible: IE=edge,chrome=1
Content-Language: en
Strict-Transport-Security: max-age=3600;include_subdomains
ETag: "1429045100.642-1"
Date: Tue, 14 Apr 2015 22:25:54 GMT
Connection: keep-alive

<!DOCTYPE html>
  <!-- truncated for example -->
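
The response above has a simple shape: a status line, header lines, a blank line, then the body. A minimal Python sketch of reading that shape follows; the function name `parse_response_head` and the shortened sample are illustrative assumptions, not part of the course materials, and repeated headers are not handled.

```python
def parse_response_head(raw: str):
    """Split the head of an HTTP/1.x response into status parts and headers."""
    # The head ends at the first blank line (CRLF CRLF); the body follows it.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: version, numeric status code, reason phrase.
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lowercase.
        # (A repeated header would overwrite the earlier value in this sketch.)
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers


# Shortened copy of the response shown above.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 107981\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Date: Tue, 14 Apr 2015 22:25:54 GMT\r\n"
    "\r\n"
    "<!DOCTYPE html>"
)
version, status, reason, headers = parse_response_head(sample)
print(version, status, reason)
print(headers["content-type"])
```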

HTTP Overview

HTTP is Stateless

Each requested resource is a separate, independent request to the server; HTTP is a stateless protocol.

HTTP Versions

The W3C and the Internet Engineering Task Force (IETF) oversee the Hypertext Transfer Protocol.

An HTTP Conversation

HTTP 1.1 Methods

HTTP Response Codes

HTTP 1.1 status codes commonly seen

The complete list:

Common Headers

Request (Browser)

Response (Server)

Looking at HTTP Under the Hood

Viewing HTTP Request and Response Headers

HTTP Header: Host

Problem: "Infinite" domain names; finite IP addresses.

Solution: "Virtual Hosts"

Example: all of the following names map to 140.247.197.241

Host Header

This is required for HTTP 1.1 requests.

HEAD /http/raspberry.gif HTTP/1.1
Host: cscie12.dce.harvard.edu 

HTTP/1.1 200 OK
Date: Tue, 8 Apr 2015 20:23:14 GMT
Server: Apache/2.2 (Fedora)
Last-Modified: Wed, 06 Apr 2005 19:30:42 GMT
ETag: "461fb8-348c-a0f67c80"
Accept-Ranges: bytes
Content-Length: 13452
Connection: close
Content-Type: image/gif

Connection closed by foreign host.
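
To make the virtual host idea concrete, here is a minimal Python sketch of how a server could choose a site from the Host header when several names share one IP address. The host-to-directory mapping, the second host name, and the function name are hypothetical; Apache's actual virtual host matching has more rules than this.

```python
# Hypothetical mapping from host name to document root (illustrative paths only).
VIRTUAL_HOSTS = {
    "cscie12.dce.harvard.edu": "/var/www/cscie12",
    "example.dce.harvard.edu": "/var/www/example",
}
DEFAULT_ROOT = "/var/www/default"


def document_root_for(request_head: str) -> str:
    """Pick a document root from the Host header of a raw HTTP request."""
    # Skip the request line, then look for the Host header.
    for line in request_head.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional port (":8080") and ignore case.
            host = value.strip().lower().split(":")[0]
            return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
    # No Host header at all: fall back to the default site.
    return DEFAULT_ROOT


request = (
    "HEAD /http/raspberry.gif HTTP/1.1\r\n"
    "Host: cscie12.dce.harvard.edu\r\n"
    "\r\n"
)
print(document_root_for(request))
```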

HTTP/2

What are the key differences to HTTP/1.x?

From the HTTP/2 FAQ:

At a high level, HTTP/2:

Apache HTTP Server

apache httpd

Apache Configuration Overview

Scope of .htaccess files

Directives within .htaccess files apply to the directory that contains the .htaccess file and all its descendants.

Directives within the file,
/home/courses/j/h/jharvard/public_html/.htaccess
would apply to all files within and "under" the public_html directory for the user jharvard.

Directives within the file,
/home/courses/j/h/jharvard/public_html/books/.htaccess
would apply to all files within and "under" the public_html/books directory for the user jharvard.
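
The scoping rule above can be sketched in Python: for a requested file, collect every .htaccess path from the top directory down to the file's own directory. This is a sketch that assumes overrides are honored starting at public_html; the function name `htaccess_candidates` is illustrative, and directives in deeper files are applied after those in shallower ones.

```python
from pathlib import PurePosixPath


def htaccess_candidates(doc_root: str, requested_file: str) -> list:
    """List the .htaccess paths that apply to a file, outermost first."""
    root = PurePosixPath(doc_root)
    target_dir = PurePosixPath(requested_file).parent
    # relative_to raises ValueError if the file is not under doc_root.
    relative = target_dir.relative_to(root)
    candidates = [str(root / ".htaccess")]
    current = root
    for part in relative.parts:
        current = current / part
        candidates.append(str(current / ".htaccess"))
    return candidates


root = "/home/courses/j/h/jharvard/public_html"
for path in htaccess_candidates(root, root + "/books/index.html"):
    print(path)
```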

Problems You Will Have with .htaccess files

500 Internal Server Error

:(

Problems You will encounter when using .htaccess files (Internal Server Error 500)

500 Internal Server Error
If you begin seeing 500 Internal Server Error responses from the server after you have created or edited an .htaccess file, the most likely cause is incorrect file permissions, an error in the directive syntax, or both. The web server must be able to read the file:
cscie12students% pwd
/home/courses/j/h/jharvard/public_html
cscie12students% ls -l .htaccess
-rw-------   1 jharvard  founder         349 Nov 27 00:03 .htaccess
cscie12students% chmod o+r .htaccess
cscie12students% ls -l ~/public_html/.htaccess
-rw----r--   1 jharvard  founder         349 Nov 27 00:03 .htaccess
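
The same permission check and fix can be sketched in Python with the standard `os` and `stat` modules. The helper names are illustrative, and the demonstration runs on a throwaway file in a temporary directory (on a POSIX system) rather than on a real .htaccess.

```python
import os
import stat
import tempfile


def is_world_readable(path: str) -> bool:
    """True when the 'other' read bit (the o+r in chmod) is set."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def add_world_read(path: str) -> None:
    """Equivalent of `chmod o+r path`: keep existing bits, add other-read."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IROTH)


# Demonstrate on a throwaway file rather than a real .htaccess.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, ".htaccess")
    with open(demo, "w") as fh:
        fh.write("# empty\n")
    os.chmod(demo, 0o600)            # -rw-------
    before = is_world_readable(demo)
    add_world_read(demo)             # -rw----r--
    after = is_world_readable(demo)
    final_mode = stat.S_IMODE(os.stat(demo).st_mode)

print(before, after, oct(final_mode))
```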

Problems You will encounter when using .htaccess files (Can't see the .htaccess file)

You can't "see" your .htaccess file.

Apache Configuration Sections

Configuration directives can be limited by using "sections", such as

Within .htaccess

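Note that, of the sections that scope directives by file or URL (Directory, Files, Location, and their Match variants), only Files and FilesMatch can be used within .htaccess files.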

Examples:

# Apache 2.2 syntax; on Apache 2.4 the equivalent is "Require all denied"
<Files .htaccess>
    Order allow,deny
    Deny from all
</Files>

Examples:

# deny access to any tilde backup files
<Files *~>
    Order allow,deny
    Deny from all
</Files>
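
To see which file names the two Files sections above would cover, here is a small Python sketch that uses shell-style wildcard matching from the standard `fnmatch` module. It only approximates Apache's own matching, and the function name `denied` is illustrative.

```python
from fnmatch import fnmatchcase

# The two patterns used in the <Files> sections above.
DENIED_PATTERNS = [".htaccess", "*~"]


def denied(filename: str) -> bool:
    """True if the bare file name matches one of the denied patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in DENIED_PATTERNS)


for name in [".htaccess", "index.html~", "index.html"]:
    print(name, denied(name))
```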

Caching - Don't deliver content unnecessarily

Types of Caching

Caching Related Headers

Local cache and proxy-server cache.

Proxy Servers

If-Modified-Since

A request for the Apache Software Foundation logo (http://apache.org/img/asf_logo.png) that is part of loading http://apache.org/foundation/

Initial request:


GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6

HTTP/1.1 200 OK
Date: Tue, 14 Apr 2015 22:40:52 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Tue, 14 Apr 2015 16:08:47 GMT
ETag: "751e-513b1721525d0"
Accept-Ranges: bytes
Content-Length: 29982
Cache-Control: max-age=3600
Expires: Tue, 14 Apr 2015 23:40:52 GMT
Keep-Alive: timeout=30, max=98
Connection: Keep-Alive
Content-Type: image/png

After the cached copy expires, if it is still in the local cache, the browser makes a conditional request:

GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Accept: image/webp,*/*;q=0.8
If-None-Match: "751e-513b1721525d0"
If-Modified-Since: Tue, 14 Apr 2015 16:08:47 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6

HTTP/1.1 304 Not Modified
Date: Tue, 14 Apr 2015 22:42:51 GMT
Server: Apache/2.4.7 (Ubuntu)
Connection: Keep-Alive
Keep-Alive: timeout=30, max=100
ETag: "751e-513b1721525d0"
Expires: Tue, 14 Apr 2015 23:42:51 GMT
Cache-Control: max-age=3600
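
The server's side of this exchange can be sketched in Python: compare the validators sent by the browser with the current ETag and Last-Modified values, and answer 304 when the cached copy is still good. This is a simplified sketch (one ETag per request, no `*` wildcard), and the function name `conditional_status` is illustrative.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # When If-None-Match is present it takes precedence over If-Modified-Since.
        return 304 if client_etag == etag else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # Unchanged means the resource was not modified after the client's date.
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since)
        return 304 if unchanged else 200
    # No validators sent: this is an ordinary request.
    return 200


# Values taken from the apache.org example above.
etag = '"751e-513b1721525d0"'
last_modified = "Tue, 14 Apr 2015 16:08:47 GMT"
request_headers = {
    "If-None-Match": '"751e-513b1721525d0"',
    "If-Modified-Since": "Tue, 14 Apr 2015 16:08:47 GMT",
}
print(conditional_status(request_headers, etag, last_modified))
```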

Expires HTTP Header

.htaccess
ExpiresActive On

# HTML expires in 1 hour
ExpiresByType text/html   A3600

# GIF expires in 30 days
ExpiresByType image/gif   A2592000

# JPEG expires in 30 days
ExpiresByType image/jpeg  A2592000

# PNG expires in 30 days
ExpiresByType image/png   A2592000

# types not specified expire in 1 day
ExpiresDefault "now plus 1 day"

Or, expire based upon modification time of document:

ExpiresActive On

# HTML expires 1 day after it was last modified
ExpiresByType text/html   M86400
ExpiresDefault M86400
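
The difference between the A (access time) and M (modification time) codes can be worked through in Python. This sketch computes the Expires value each code would produce; the function name `expires_header` is illustrative, and the two timestamps are borrowed from the apache.org exchange earlier in these notes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(code: str, accessed: datetime, modified: datetime) -> str:
    """Compute an Expires value from an 'A<seconds>' or 'M<seconds>' code."""
    # 'A' counts from the time of the request, 'M' from the file's last modification.
    base = {"A": accessed, "M": modified}[code[0]]
    expiry = base + timedelta(seconds=int(code[1:]))
    # HTTP dates are always expressed in GMT.
    return format_datetime(expiry, usegmt=True)


accessed = datetime(2015, 4, 14, 22, 40, 52, tzinfo=timezone.utc)
modified = datetime(2015, 4, 14, 16, 8, 47, tzinfo=timezone.utc)
print(expires_header("A3600", accessed, modified))
print(expires_header("M86400", accessed, modified))
```

With a modification-based code, a file that has not changed for longer than the interval is already expired on every request, which is why access-based codes are the more common choice for static assets.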

Do not cache

If you do not want your page cached, set these HTTP response headers:

Cache-Control: no-cache
Pragma: no-cache
Expires: <set to now>

In .htaccess in Apache (with mod_expires and mod_headers enabled), this would translate to:

ExpiresActive On
ExpiresDefault "now"
Header set Cache-Control "no-cache"
Header set Pragma "no-cache"

Minify and Compress Content

Compress Content

mod_deflate compresses content before sending to web browser.

Simple use:

AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE application/javascript
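
To get a feel for why text types are worth compressing, here is a small Python sketch that gzips a block of repetitive HTML and compares sizes. The HTML is a hypothetical stand-in for a real page, so the ratio it prints is only indicative; real pages are less repetitive and compress less dramatically.

```python
import gzip

# Hypothetical, repetitive HTML standing in for a long course-listing table.
row = "<tr><td>COURSE 101</td><td>Sample Course Title</td><td>4 credits</td></tr>\n"
html = (row * 500).encode("utf-8")

# gzip is the encoding a browser advertises with "Accept-Encoding: gzip".
compressed = gzip.compress(html)
ratio = len(compressed) / len(html)

print(f"original:   {len(html)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.3f}")

# Compression is lossless: the browser recovers exactly the original bytes.
restored = gzip.decompress(compressed)
```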

Does Compressing Help?


Harvard Summer School CSCI Course Listing

firebug - page weight is 172 KB

Savings with Apache DEFLATE output filter

Friendly Errors

Apache Default "Not Found" 404 document

"Not Found" 404 for Whitehouse

"Not Found" 404 for Harvard University

Custom Error Documents

.htaccess
ErrorDocument 401 /~jharvard/error/status401.html
ErrorDocument 403 /~jharvard/error/status403.html
ErrorDocument 404 /~jharvard/error/status404.html  

Friendly Ways to Get There

HTTP Redirect

Redirecting Requests

HTTP Status Codes:
301 Moved Permanently
302 Found (temporary redirect; called "Moved Temporarily" in HTTP/1.0)

Redirecting client requests can be very useful:

Redirect

For cscie12.dce.harvard.edu the .htaccess file contains:

Redirect 302 /syllabus    https://harvard.instructure.com/courses/1812/assignments/syllabus
Try it:
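
The behavior of the Redirect directive can be sketched in Python. The sketch assumes prefix matching on whole path segments, with any extra path beyond the prefix appended to the target; query strings are ignored, and the function name `redirect_target` is illustrative rather than anything Apache exposes.

```python
from typing import Optional


def redirect_target(request_path: str, url_path: str, target: str) -> Optional[str]:
    """Return the Location value for a prefix-style Redirect, or None if no match."""
    if request_path == url_path:
        return target
    # Match only on a segment boundary: /syllabus/x matches, /syllabusfoo does not.
    prefix = url_path.rstrip("/") + "/"
    if request_path.startswith(prefix):
        # Extra path information beyond the matched prefix is appended to the target.
        return target + request_path[len(url_path):]
    return None


TARGET = "https://harvard.instructure.com/courses/1812/assignments/syllabus"
print(redirect_target("/syllabus", "/syllabus", TARGET))
print(redirect_target("/slides", "/syllabus", TARGET))
```

The server would send the returned value in a Location header along with the 302 status from the directive.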

Rewrite

mod_rewrite uses regular expressions to match on a pattern and rewrite incoming URLs to a new URL location.


Using mod_rewrite from within .htaccess

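If you use RewriteRule from within an .htaccess file, you will usually need the RewriteBase directive; Apache requires it when the substitution is a relative path, with a few exceptions described in the documentation.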
See: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritebase

Example - Make Simple Links Instead of Complex Ones

Context: a Parks and Recreation department offers classes, and we want an easy way to link directly to a specific class.

Parks and Rec system:
https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html

Link I can use with the Rewrite rule:
http://littletontrack.org/lpr-303107

RewriteEngine On
RewriteBase /
RewriteRule ^lpr-(.*)$ https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html?per=10&xxsearch=yes&xxdispmap=no+&xxmulti-list=&xxmulti-lbls=&xxrowid=&xxmod=ar&xxactivitynumber=$1&xxage=&xxgrade=&xxkeyword=&xxkeywordoption=N&xxtype=&xxcategory=&xxsortoption=ActivityNumber&xxdisplayoption=D&xxsubmit=Search
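
The pattern-and-substitution step of the rule above can be tried out in Python with the `re` module. This sketch uses a shortened, hypothetical stand-in for the long search URL and assumes the leading slash has been removed before matching, as happens for rules in an .htaccess file at the site root.

```python
import re
from typing import Optional

PATTERN = re.compile(r"^lpr-(.*)$")
# Shortened, hypothetical stand-in for the long search URL in the rule above.
SUBSTITUTION = (
    "https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html"
    "?xxactivitynumber=$1&xxsubmit=Search"
)


def rewrite(request_path: str) -> Optional[str]:
    """Apply the lpr- rule to a request path; None means the rule did not match."""
    # In per-directory (.htaccess) context the pattern sees the path without its leading slash.
    relative = request_path[1:] if request_path.startswith("/") else request_path
    match = PATTERN.match(relative)
    if match is None:
        return None
    # $1 is the first captured group: everything after "lpr-".
    return SUBSTITUTION.replace("$1", match.group(1))


print(rewrite("/lpr-303107"))
print(rewrite("/about"))
```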

Example: Create Links that can always point to the correct place

Road race registration is done through a third-party service, SignMeUp.

Redirect /registration https://www.signmeup.com/site/reg/register.aspx?fid=B42VRH7

Redirect /map http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=101999702593116464805.00046f1a27a9feb5aacaf&ll=42.52946,-71.485934&spn=0.018975,0.018239&z=15

URL Shortener Services

Copyright © David Heitmeyer