Session 14 - Server-Side: HTTP and Apache Web Server Configuration

Harvard Extension School  
Fall 2020

Course Web Site: https://cscie12.dce.harvard.edu/

Topics

  1. The Internet and the Web
  2. HyperText Transfer Protocol
  3. Apache HTTP Server
  4. Caching - Don't deliver content unnecessarily
  5. Minify and Compress Content
  6. Friendly Errors
  7. Friendly Ways to Get There

Presentation contains 37 slides

The Internet and the Web

Internet Routing

Domain Names: Top Level Domains (TLD)

TLDs are managed by the Internet Assigned Numbers Authority (IANA)

Generic: .com, .org, .edu, .gov, etc.

Country codes: .ch, .cn, .de, .uk, .us, etc.

Full listing of TLDs

Getting Your Own Domain and Hosting

  1. Domain Name
    • Buy the domain through a "registrar"
    • Provide name servers
    • About $10/yr
  2. Hosting
    • Shared ($7-15/mo)
    • Private / Cloud

A very short list of hosting companies as a place to start.

Web Server Software

Web Server Market Share

Netcraft Web Server Survey

HyperText Transfer Protocol

GET

United States National Archives
www.archives.gov

GET / HTTP/1.1
Host: www.archives.gov
User-Agent: curl/7.49.0
Accept: */*

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Date: Thu, 30 Jul 2020 23:25:09 GMT
Content-Language: en
Set-Cookie: UUID=7efbfc41-6054-bf24-f977-24eb8d075e4e; expires=Fri, 30-Jul-2021 23:06:47 GMT; Max-Age=31536000; path=/; domain=.archives.gov; httponly
Last-Modified: Thu, 30 Jul 2020 23:06:47 GMT
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
ETag: W/"1596150407-0-gzip"
v-ttl: 2497
Cache-Control: public, max-age=60, s-maxage=180
v-cache-ttl: 2497
X-Frame-Options: SAMEORIGIN
Accept-Ranges: bytes
Vary: Cookie,Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 6c46ad9c24627fa8c065620a1a7a52a9.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: EWR52-C1
X-Amz-Cf-Id: LqRBsWPmMMWNU4m66BY-LRfHuL1LI8Xrcd6unFAZ0VJJWWdO_I--uA==

<!DOCTYPE html>
  <!-- truncated for example -->
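
The exchange above is plain text on the wire. As a rough sketch (not how curl or any browser is implemented), the request can be serialized and the response head parsed as follows. The function names are hypothetical and only the standard library is used.

```python
def build_get_request(host: str, path: str = "/", user_agent: str = "curl/7.49.0") -> bytes:
    """Serialize a minimal HTTP/1.1 GET request like the one shown above."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "",  # the empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes):
    """Split a raw response into (version, status, reason, headers).

    Simplification: a repeated header name keeps only its last value.
    """
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers
```

Header names are lower-cased on parsing because HTTP header names are case-insensitive.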

HTTP Overview

HTTP is Stateless

Each requested resource is a separate, independent request to the server; HTTP is a stateless protocol.

HTTP Versions

The W3C and the Internet Engineering Task Force (IETF) oversee the Hypertext Transfer Protocol.

An HTTP Conversation

HTTP 1.1 Methods

HTTP Response Codes

HTTP 1.1 status codes commonly seen
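
The first digit of a status code gives its class. A small sketch using Python's standard `http.HTTPStatus` to look up the reason phrase; the `describe` helper and the class labels are illustrative names, not part of any specification text.

```python
from http import HTTPStatus

# Class of a status code, keyed by its first digit
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the code, its reason phrase, and its class."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({STATUS_CLASSES[code // 100]})"


for code in (200, 301, 304, 404, 500):
    print(describe(code))
```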

The complete list:

Common Headers

Request (Browser)

Response (Server)

Looking at HTTP Under the Hood

Use your browser developer tools!

HTTP Header: Host

Problem: "Infinite" domain names; finite IP addresses.

Solution: "Virtual Hosts"

Example: all of the following names map to 140.247.197.241

Host Header

The Host header is required for HTTP/1.1 requests.

HEAD /http/raspberry.gif HTTP/1.1
Host: cscie12.dce.harvard.edu

HTTP/1.1 200 OK
Date: Tue, 8 Apr 2020 20:23:14 GMT
Server: Apache/2.2 (Fedora)
Last-Modified: Wed, 06 Apr 2015 19:30:42 GMT
ETag: "461fb8-348c-a0f67c80"
Accept-Ranges: bytes
Content-Length: 13452
Connection: close
Content-Type: image/gif

Connection closed by foreign host.
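
A minimal sketch of how a server might use the Host header to choose among virtual hosts that share one IP address. The second host name and all document roots below are hypothetical, and this is not Apache's actual implementation.

```python
# Hypothetical virtual hosts sharing one IP address
VHOSTS = {
    "cscie12.dce.harvard.edu": "/var/www/cscie12",
    "example.dce.harvard.edu": "/var/www/example",
}
DEFAULT_ROOT = "/var/www/default"


def document_root(raw_request: str) -> str:
    """Pick a document root based on the Host header of a raw request."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:
            break  # blank line: end of headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" suffix (IPv6 literals are not handled)
            host = value.strip().lower().split(":")[0]
            return VHOSTS.get(host, DEFAULT_ROOT)
    raise ValueError("HTTP/1.1 request is missing the required Host header")
```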

HTTP/2

What are the key differences to HTTP/1.x?

From the HTTP/2 FAQ:

At a high level, HTTP/2:

Apache HTTP Server

apache httpd

Apache Configuration Overview

Scope of .htaccess files

Directives within .htaccess files apply to the directory that contains the .htaccess file and all its descendants.

Directives within the file,
/home/courses/j/h/jharvard/public_html/.htaccess
would apply to all files within and "under" the public_html directory for the user jharvard.

Directives within the file,
/home/courses/j/h/jharvard/public_html/books/.htaccess
would apply to all files within and "under" the public_html/books directory for the user jharvard.
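
The scope rule can be sketched as a walk from the top directory down to the directory holding the requested file. This is an illustration only: the function name is hypothetical, it starts at the given top directory rather than the filesystem root, and the server would read only the files that actually exist and are permitted by its configuration.

```python
from pathlib import PurePosixPath
from typing import List


def applicable_htaccess(top_dir: str, file_path: str) -> List[str]:
    """List candidate .htaccess files, outermost first, that cover file_path."""
    top = PurePosixPath(top_dir)
    target_dir = PurePosixPath(file_path).parent
    relative = target_dir.relative_to(top)  # ValueError if outside top_dir
    found = [str(top / ".htaccess")]
    current = top
    for part in relative.parts:
        current = current / part
        found.append(str(current / ".htaccess"))
    return found


print(applicable_htaccess(
    "/home/courses/j/h/jharvard/public_html",
    "/home/courses/j/h/jharvard/public_html/books/index.html",
))
```

Directives in files later in the list (deeper directories) are applied after those earlier in the list.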

Problems You Will Have with .htaccess files

500 Internal Server Error

500 Internal Server Error

:(

Problems You will encounter when using .htaccess files (Internal Server Error 500)

500 Internal Server Error
If you begin seeing 500 Internal Server Error responses from the server after you have created or edited an .htaccess file, the most likely cause of the problem is incorrect permissions and/or an error in the directive syntax.
cscie12students% pwd
/home/courses/j/h/jharvard/public_html
cscie12students% ls -l .htaccess
-rw-------   1 jharvard  founder         349 Nov 27 00:03 .htaccess
cscie12students% chmod o+r .htaccess
cscie12students% ls -l ~/public_html/.htaccess
-rw----r--   1 jharvard  founder         349 Nov 27 00:03 .htaccess
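
The `chmod o+r` step sets the "other read" permission bit so that the web server process can read the file. A small sketch of the same check and change on a mode value, using the standard `stat` module; the helper names are hypothetical.

```python
import stat


def world_readable(mode: int) -> bool:
    """True if the 'other' read bit is set."""
    return bool(mode & stat.S_IROTH)


def add_other_read(mode: int) -> int:
    """Equivalent of 'chmod o+r' applied to a mode value."""
    return mode | stat.S_IROTH


before = stat.S_IFREG | 0o600  # regular file, owner read/write only
after = add_other_read(before)
print(stat.filemode(before), world_readable(before))
print(stat.filemode(after), world_readable(after))
```

To check a real file, pass `os.stat(path).st_mode` to `world_readable`.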

Problems You will encounter when using .htaccess files (Can't see the .htaccess file)

You can't "see" your .htaccess file.

Apache Configuration Sections

Configuration directives can be limited by using "sections", such as

Within .htaccess

Note that only Files and FilesMatch can be used within .htaccess files.

Examples:

# deny access to the .htaccess file itself
# (Apache 2.2 syntax; on Apache 2.4 use "Require all denied")
<Files .htaccess>
    Order allow,deny
    Deny from all
</Files>

# deny access to any tilde backup files
<Files *~>
    Order allow,deny
    Deny from all
</Files>
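
The Files sections above match on the file name using shell-style wildcards. As an approximation (Python's `fnmatch`, not Apache's own matcher), the effect of the two patterns can be sketched like this; `is_denied` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Patterns taken from the two Files sections
DENIED_PATTERNS = [".htaccess", "*~"]


def is_denied(filename: str) -> bool:
    """True if the file name matches any denied pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in DENIED_PATTERNS)


for name in (".htaccess", "index.html~", "index.html"):
    print(name, is_denied(name))
```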

Caching - Don't deliver content unnecessarily

Types of Caching

Caching Related Headers

Local cache and proxy-server cache.

Proxy Servers

Proxy Server

If-Modified-Since

A request for the Apache Software Foundation logo (http://apache.org/img/asf_logo.png), made as part of loading http://apache.org/foundation/

Initial request:


GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6

HTTP/1.1 200 OK
Date: Tue, 14 Apr 2015 22:40:52 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Tue, 14 Apr 2015 16:08:47 GMT
ETag: "751e-513b1721525d0"
Accept-Ranges: bytes
Content-Length: 29982
Cache-Control: max-age=3600
Expires: Tue, 14 Apr 2015 23:40:52 GMT
Keep-Alive: timeout=30, max=98
Connection: Keep-Alive
Content-Type: image/png

After expiration, if the image is still in the local cache, the browser will make a conditional request:

GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Accept: image/webp,*/*;q=0.8
If-None-Match: "751e-513b1721525d0"
If-Modified-Since: Tue, 14 Apr 2015 16:08:47 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6

HTTP/1.1 304 Not Modified
Date: Tue, 14 Apr 2015 22:42:51 GMT
Server: Apache/2.4.7 (Ubuntu)
Connection: Keep-Alive
Keep-Alive: timeout=30, max=100
ETag: "751e-513b1721525d0"
Expires: Tue, 14 Apr 2015 23:42:51 GMT
Cache-Control: max-age=3600
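
A simplified sketch of the server-side decision for a conditional request: compare the validators sent by the browser with the current ETag and Last-Modified values. The function name is hypothetical, and weak versus strong ETag comparison is ignored here.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        unchanged = (parsedate_to_datetime(last_modified)
                     <= parsedate_to_datetime(if_modified_since))
        if unchanged:
            return 304
    return 200


request = {
    "If-None-Match": '"751e-513b1721525d0"',
    "If-Modified-Since": "Tue, 14 Apr 2015 16:08:47 GMT",
}
print(conditional_status(request, '"751e-513b1721525d0"',
                         "Tue, 14 Apr 2015 16:08:47 GMT"))
```

A 304 response carries no body, so the browser reuses the cached image.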

Expires HTTP Header

.htaccess
ExpiresActive On

ExpiresByType text/html   A3600
# HTML expires in 1 hour

ExpiresByType image/gif   A2592000
# GIF  expires in 30 days

ExpiresByType image/jpeg  A2592000
# JPEG expires in 30 days

ExpiresByType image/png   A2592000
# PNG  expires in 30 days

# types not specified
ExpiresDefault "now plus 1 day"
#  expires in 1 day

Or, expire based upon modification time of document:

ExpiresActive On
ExpiresByType text/html   M86400
# HTML expires 1 day after it was last modified
ExpiresDefault M86400
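
In codes such as A3600 and M86400, the letter selects the base time (A for access time, M for modification time) and the number is a count of seconds. A sketch of the resulting Expires value, using the times from the asf_logo.png exchange; `expires_header` is a hypothetical helper, not part of mod_expires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(code: str, accessed: datetime, modified: datetime) -> str:
    """Compute an Expires value from a code such as 'A3600' or 'M86400'."""
    base = {"A": accessed, "M": modified}[code[0]]
    expires = base + timedelta(seconds=int(code[1:]))
    return format_datetime(expires, usegmt=True)  # HTTP date format, in GMT


accessed = datetime(2015, 4, 14, 22, 40, 52, tzinfo=timezone.utc)
modified = datetime(2015, 4, 14, 16, 8, 47, tzinfo=timezone.utc)
print(expires_header("A3600", accessed, modified))   # Tue, 14 Apr 2015 23:40:52 GMT
print(expires_header("M86400", accessed, modified))  # Wed, 15 Apr 2015 16:08:47 GMT
```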

Do not cache

If you do not want your page cached, set these HTTP response headers:

Cache-Control: no-cache
Pragma: no-cache
Expires: <set to now>

In .htaccess in Apache, this would translate to:

ExpiresActive On
ExpiresDefault "now"
Header set Cache-Control "no-cache"
Header set Pragma "no-cache"
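
For a page generated by application code rather than served from a file, the same three response headers could be built directly. A minimal sketch; `no_cache_headers` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def no_cache_headers(now: datetime) -> dict:
    """Response headers asking caches not to reuse the page without revalidating."""
    return {
        "Cache-Control": "no-cache",
        "Pragma": "no-cache",
        "Expires": format_datetime(now, usegmt=True),  # expires immediately
    }


for name, value in no_cache_headers(datetime.now(timezone.utc)).items():
    print(f"{name}: {value}")
```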

Minify and Compress Content

Compress Content

mod_deflate compresses content before sending it to the web browser.

Simple use:

AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE application/javascript
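
To get a feel for why text formats compress well, here is a small sketch using Python's `gzip` module on repetitive markup. The sample HTML is made up, and real savings depend on the actual content.

```python
import gzip


def compression_savings(body: bytes) -> float:
    """Fraction of bytes saved by gzip-compressing body."""
    compressed = gzip.compress(body)
    return 1 - len(compressed) / len(body)


# Made-up, repetitive markup similar to a long course listing
html = b"<li><a href='/course'>Course listing</a></li>\n" * 200
print(f"{len(html)} bytes, {compression_savings(html):.0%} saved")
```

Already-compressed formats such as JPEG and PNG gain little, which is why the directives above list only text types.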

Does Compressing Help?


Harvard Summer School CSCI Course Listing

Page weight as measured in Firebug: 172 KB

Savings with Apache DEFLATE output filter

Friendly Errors

Apache Default "Not Found" 404 document

"Not Found" 404 for Whitehouse

"Not Found" 404 for Harvard University

Custom Error Documents

.htaccess
ErrorDocument 401 /~jharvard/error/status401.html
ErrorDocument 403 /~jharvard/error/status403.html
ErrorDocument 404 /~jharvard/error/status404.html  

Friendly Ways to Get There

HTTP Redirect

Redirecting Requests

HTTP Status Codes:
301 Moved Permanently
302 Found (moved temporarily)

Redirecting client requests can be very useful:

Redirect

For cscie12.dce.harvard.edu the .htaccess file contains:

Redirect 302 /syllabus    https://harvard.instructure.com/courses/1812/assignments/syllabus

Try it: https://cscie12.dce.harvard.edu/syllabus

Rewrite

mod_rewrite uses regular expressions to match on a pattern and rewrite incoming URLs to a new URL location.


Using mod_rewrite from within .htaccess

If you use RewriteRule from within an .htaccess file, you will usually want to set the RewriteBase directive; it is required when the substitution is a relative path.
See: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritebase

Example - Make Simple Links Instead of Complex Ones

Context: A Parks and Recreation department offers classes, and we want an easy way to link directly to a specific class.

Park and Rec system:
https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html

Link I can use with Rewrite rule
http://littletontrack.org/lpr-303107

RewriteEngine On
RewriteBase /
RewriteRule ^lpr-(.*)$ https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html?per=10&xxsearch=yes&xxdispmap=no+&xxmulti-list=&xxmulti-lbls=&xxrowid=&xxmod=ar&xxactivitynumber=$1&xxage=&xxgrade=&xxkeyword=&xxkeywordoption=N&xxtype=&xxcategory=&xxsortoption=ActivityNumber&xxdisplayoption=D&xxsubmit=Search
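
The pattern-and-backreference idea in the rule above can be sketched with Python's `re` module. Assumptions: the target URL is shortened here to two query parameters for readability, `$1` becomes `\1` in Python syntax, and the path is given without its leading slash, as it is matched in an .htaccess context.

```python
import re
from typing import Optional

PATTERN = re.compile(r"^lpr-(.*)$")
# Shortened version of the target; \1 is the captured activity number
TARGET = (r"https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html"
          r"?xxmod=ar&xxactivitynumber=\1")


def rewrite(path: str) -> Optional[str]:
    """Return the rewritten URL, or None if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    return match.expand(TARGET)


print(rewrite("lpr-303107"))
print(rewrite("about"))
```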

Example: Create Links that can always point to the correct place

Road Race Registration is done through a 3rd party service, SignMeUp

Redirect  /registration https://www.signmeup.com/site/reg/register.aspx?fid=B42VRH7

Redirect /map http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=101999702593116464805.00046f1a27a9feb5aacaf&ll=42.52946,-71.485934&spn=0.018975,0.018239&z=15
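
A sketch of what a Redirect directive does with an incoming path: match a URL-path prefix and answer with a redirect status plus the target, appending any extra path beyond the prefix. This is a simplified model, not mod_alias itself. The default status of 302 is used, and `redirect` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Prefix and target taken from the registration example
REDIRECTS = {
    "/registration": "https://www.signmeup.com/site/reg/register.aspx?fid=B42VRH7",
}


def redirect(path: str, status: int = 302) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for a matching prefix, else None."""
    for prefix, target in REDIRECTS.items():
        # Match the prefix itself or anything below it at a "/" boundary
        if path == prefix or path.startswith(prefix + "/"):
            return status, target + path[len(prefix):]
    return None


print(redirect("/registration"))
print(redirect("/results"))
```

If the registration service changes, only the target in the configuration needs to change; printed and published links to /registration keep working.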

URL Shortener Services