Session 14 - Server-Side: HTTP and Apache Web Server Configuration

Harvard Extension School  
Fall 2021

Course Web Site: https://cscie12.dce.harvard.edu/

Topics

  1. Webpage and Website Optimization
  2. Web Browser and Web Server
  3. HyperText Transfer Protocol
  4. Apache HTTP Server
  5. Caching - Don't deliver content unnecessarily
  6. Typical Expiration / Cache Directives for Websites
  7. Minify and Compress Content
  8. Friendly Errors
  9. Friendly Ways to Get There
  10. HTTP Cookies
  11. Search Engines and Optimization
  12. HTML5 Boilerplate

Webpage and Website Optimization

webpage test results

lighthouse test results

Web Browser and Web Server

Domain Name System

Computers connect by IP address (number); Humans like names (e.g. www.harvard.edu).
Domain Name System (DNS) resolves names to IP addresses (and the other way too)
www.harvard.edu → 151.101.18.133

Domain Names: Top Level Domains (TLD)

TLDs are managed by the Internet Assigned Numbers Authority (IANA)

Generic: .com, .org, .edu, .gov, etc.

Country codes: .ch, .cn, .de, .uk, .us, etc.

Full listing of TLDs

Getting Your Own Domain and Hosting

Domain name registration and hosting are often set up together through the same company, but keep in mind that they are distinct and separate things!

  1. Domain Name
    • Buy the domain through a "registrar"
    • Provide name servers
    • About $10/yr
  2. Hosting
    • Shared ($7-15/mo)
    • Private / Cloud

A very short list of hosting companies as a place to start.

My playground domain: cs12.net

I registered "cs12.net", and from there I can control its subdomains. For example, natureofamerica.cs12.net, noa11ty.cs12.net, wptest.cs12.net.

Web Server Software

Web Server Market Share

Netcraft Web Server Survey

HyperText Transfer Protocol

GET

United States National Archives
www.archives.gov

GET / HTTP/1.1
Host: www.archives.gov
User-Agent: curl/7.64.1
Accept: */*

HTTP/2 200
content-type: text/html; charset=utf-8
content-length: 24409
date: Thu, 29 Jul 2021 20:15:11 GMT
content-language: en
last-modified: Thu, 29 Jul 2021 19:51:32 GMT
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
v-ttl: 2199
cache-control: public, max-age=60, s-maxage=180
v-cache-ttl: 2199
x-frame-options: SAMEORIGIN
accept-ranges: bytes
etag: W/"1627588292-0-gzip"
vary: Cookie,Accept-Encoding
x-cache: Hit from cloudfront
via: 1.1 6c46ad9c24627fa8c065620a1a7a52a9.cloudfront.net (CloudFront)
x-amz-cf-pop: EWR52-C1
x-amz-cf-id: EBLHmCxYUblJWcaXLd8N6BDnq32dUFGQEhalaplMVENQ2kY6HixP3A==
age: 163

<!doctype html>
<html lang="en" dir="ltr" prefix="fb: //www.facebook.com/2008/fbml">
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">  <!-- truncated for example -->
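To make the structure of the response above concrete, here is a minimal sketch that splits the head of an HTTP response into its status line and headers. The function name `parse_response_head` is made up for illustration, and the sample is an abbreviated copy of the response shown above; a real client library does considerably more.

```python
def parse_response_head(raw: str):
    """Split the head of an HTTP response into (version, status, headers)."""
    # HTTP uses CRLF line endings on the wire; normalize so both forms work.
    text = raw.replace("\r\n", "\n")
    # The headers end at the first blank line; the body follows it.
    head = text.split("\n\n", 1)[0]
    lines = head.split("\n")
    # Status line, e.g. "HTTP/2 200" or "HTTP/1.1 200 OK".
    status_parts = lines[0].split(" ", 2)
    version, status = status_parts[0], int(status_parts[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, status, headers


# Abbreviated copy of the response shown above.
sample = (
    "HTTP/2 200\n"
    "content-type: text/html; charset=utf-8\n"
    "content-length: 24409\n"
    "cache-control: public, max-age=60, s-maxage=180\n"
    "\n"
    "<!doctype html>\n"
)
version, status, headers = parse_response_head(sample)
print(version, status, headers["cache-control"])
```

The blank line is the only separator between headers and body, which is why the sketch splits on it first.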

HTTP Overview

HTTP is Stateless

Each requested resource is a separate, independent request to the server; HTTP is a stateless protocol.

HTTP Versions

The W3C and the Internet Engineering Task Force (IETF) oversee the Hypertext Transfer Protocol.

An HTTP Conversation

Common Headers

Request (Browser)
Response (Server)

HTTP 1.1 Methods

HTTP Response Codes

HTTP 1.1 status codes commonly seen

The complete list:

Looking at HTTP Under the Hood

Use your browser developer tools!
screenshot of http headers in browser dev tools

HTTP Header: Host

Problem: "Infinite" domain names; finite IP addresses.

Solution: "Virtual Hosts"

Example: all of the following names map to the same IP.

Host Header

The Host header is required for HTTP/1.1 requests.

HEAD /http/raspberry.gif HTTP/1.1
Host: cscie12.dce.harvard.edu

HTTP/1.1 200 OK
Date: Tue, 8 Apr 2020 20:23:14 GMT
Server: Apache/2.2 (Fedora)
Last-Modified: Wed, 06 Apr 2015 19:30:42 GMT
ETag: "461fb8-348c-a0f67c80"
Accept-Ranges: bytes
Content-Length: 13452
Connection: close
Content-Type: image/gif

Connection closed by foreign host.

Host Header: dph445.cs12students.dce.harvard.edu

That's how we have a unique hostname for each student!
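As a rough sketch of how name-based virtual hosting works, the server can read the Host header from the request and use it to pick a document root, even though every name resolves to the same IP address. The document-root paths and the helper names below are hypothetical and chosen only for illustration; Apache's real virtual-host matching is configured with VirtualHost sections and is more involved.

```python
# Hypothetical mapping of hostnames to document roots (illustrative paths only).
VIRTUAL_HOSTS = {
    "cscie12.dce.harvard.edu": "/var/www/cscie12",
    "dph445.cs12students.dce.harvard.edu": "/home/users/dph445/public_html",
}


def host_from_request(raw_request: str) -> str:
    """Return the value of the Host header from a raw HTTP request."""
    # Skip the request line, then read headers until the blank line.
    for line in raw_request.splitlines()[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower()
    raise ValueError("HTTP/1.1 requests must include a Host header")


def document_root(raw_request: str) -> str:
    """Pick the document root for a request based on its Host header."""
    return VIRTUAL_HOSTS[host_from_request(raw_request)]


request = (
    "HEAD /http/raspberry.gif HTTP/1.1\r\n"
    "Host: cscie12.dce.harvard.edu\r\n"
    "\r\n"
)
print(document_root(request))
```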

HTTP/2

What are the key differences to HTTP/1.x?

From the HTTP/2 FAQ:

At a high level, HTTP/2:

Apache HTTP Server

apache httpd

Apache Configuration Overview

Scope of .htaccess files

Directives within .htaccess files apply to the directory that contains the .htaccess file and all its descendants.

Directives within the file,
/home/users/jh1636/public_html/.htaccess
would apply to all files within and "under" the public_html directory for the user jh1635.

Directives within the file,
/home/users/jh1636/public_html/books/.htaccess
would apply to all files within and "under" the public_html/books directory for the user jh1635.
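The scoping rule can be sketched as a walk from the web root down to the directory that holds the requested file, collecting every .htaccess file that could apply along the way. This is a simplification under stated assumptions: the function name is made up, the paths reuse the examples above, and the real server also honors AllowOverride settings that this sketch ignores.

```python
from pathlib import PurePosixPath


def htaccess_candidates(web_root: str, requested_file: str) -> list:
    """List the .htaccess files that could apply to requested_file,
    from the web root down to the file's own directory."""
    root = PurePosixPath(web_root)
    target_dir = PurePosixPath(requested_file).parent
    # relative_to raises ValueError if the file is not under the web root.
    relative = target_dir.relative_to(root)
    candidates = [root / ".htaccess"]
    current = root
    for part in relative.parts:
        current = current / part
        candidates.append(current / ".htaccess")
    return [str(path) for path in candidates]


for path in htaccess_candidates(
    "/home/users/jh1636/public_html",
    "/home/users/jh1636/public_html/books/index.html",
):
    print(path)
```

Directives in files later in the list apply to a narrower part of the site and can override those set higher up.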

Problems You Will Have with .htaccess files

500 Internal Server Error

500 Internal Server Error

:(

Problems You will encounter when using .htaccess files (Internal Server Error 500)

500 Internal Server Error
If you begin seeing 500 Internal Server Error responses from the server after you have created or edited an .htaccess file, the most likely cause is incorrect permissions and/or an error in the directive syntax.
cs12% pwd
/home/users/jh1636/public_html
cs12% ls -l .htaccess
-rw-------   1 jh1635  founder         349 Nov 27 00:03 .htaccess
cs12% chmod o+r .htaccess
cs12% ls -l ~/public_html/.htaccess
-rw----r--   1 jh1635  founder         349 Nov 27 00:03 .htaccess
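The permission change in the session above can be checked in code as well. This is a small sketch, assuming (as the `chmod o+r` implies) that the web server reads the file as an "other" user; the helper name is made up for illustration.

```python
import stat


def readable_by_others(mode: int) -> bool:
    """True if the 'other' read bit is set in a file mode."""
    return bool(mode & stat.S_IROTH)


before = 0o600                  # owner read/write only
after = before | stat.S_IROTH   # the effect of: chmod o+r .htaccess

for mode in (before, after):
    # stat.filemode renders the mode the way `ls -l` does.
    print(stat.filemode(stat.S_IFREG | mode), readable_by_others(mode))
```

For a real file, `os.stat(path).st_mode` supplies the mode value.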

Problems You will encounter when using .htaccess files (Can't see the .htaccess file)

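You can't "see" your .htaccess file. Files whose names begin with a dot are hidden by default in directory listings; use ls -a to list them.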

Apache Configuration Sections

Configuration directives can be limited by using "sections", such as <Directory>, <Files>, <FilesMatch>, <Location>, and <VirtualHost>.

Within .htaccess

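Note that of the directory, file, and location sections, only <Files> and <FilesMatch> can be used within .htaccess files. (Conditional blocks such as <IfModule> are also permitted.)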

Examples:

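# deny access to the .htaccess file itself
# (Apache 2.2 syntax; on Apache 2.4 the equivalent is "Require all denied")
<Files .htaccess>
    Order allow,deny
    Deny from all
</Files>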

Examples:

# deny access to any tilde backup files
# (Apache 2.2 syntax; on Apache 2.4 the equivalent is "Require all denied")
<Files *~>
    Order allow,deny
    Deny from all
</Files>

Caching - Don't deliver content unnecessarily

Types of Caching

Caching Related Headers

Local cache and proxy-server cache.

Proxy Servers

Proxy Server

If-Modified-Since

A request for the Apache Software Foundation logo (http://apache.org/img/asf_logo.png) that is part of loading http://apache.org/foundation/
asf logo

Initial request:


GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6

HTTP/1.1 200 OK
Date: Tue, 14 Apr 2015 22:40:52 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Tue, 14 Apr 2015 16:08:47 GMT
ETag: "751e-513b1721525d0"
Accept-Ranges: bytes
Content-Length: 29982
Cache-Control: max-age=3600
Expires: Tue, 14 Apr 2015 23:40:52 GMT
Keep-Alive: timeout=30, max=98
Connection: Keep-Alive
Content-Type: image/png

After expiration, if the resource is still in the local cache, the browser will make a conditional request:

GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Accept: image/webp,*/*;q=0.8
If-None-Match: "751e-513b1721525d0"
If-Modified-Since: Tue, 14 Apr 2015 16:08:47 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6

HTTP/1.1 304 Not Modified
Date: Tue, 14 Apr 2015 22:42:51 GMT
Server: Apache/2.4.7 (Ubuntu)
Connection: Keep-Alive
Keep-Alive: timeout=30, max=100
ETag: "751e-513b1721525d0"
Expires: Tue, 14 Apr 2015 23:42:51 GMT
Cache-Control: max-age=3600
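The server's side of this exchange can be sketched as a small decision function: compare the validators the browser sent against the current state of the resource, and answer 304 when the cached copy is still good. This is a simplified illustration, not Apache's implementation; the function name is made up, and it ignores details such as weak ETag comparison.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, it takes precedence over the date.
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in client_tags or "*" in client_tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200
    # No validators sent: deliver the full resource.
    return 200


# Values taken from the exchange above.
request_headers = {
    "If-None-Match": '"751e-513b1721525d0"',
    "If-Modified-Since": "Tue, 14 Apr 2015 16:08:47 GMT",
}
print(conditional_status(
    request_headers,
    etag='"751e-513b1721525d0"',
    last_modified="Tue, 14 Apr 2015 16:08:47 GMT",
))
```

A 304 response carries no body, so the browser reuses the bytes it already has.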

Expires HTTP Header

.htaccess
ExpiresActive On

ExpiresByType text/html   A3600
# HTML expires in 1 hour

ExpiresByType image/gif   A2592000
# GIF  expires in 30 days

ExpiresByType image/jpeg  A2592000
# JPEG expires in 30 days

ExpiresByType image/png   A2592000
# PNG  expires in 30 days

# types not specified
ExpiresDefault "now plus 1 day"
# expires in 1 day

Or, expire based upon modification time of document:

ExpiresActive On
ExpiresByType text/html   M86400
# HTML expires 1 day after it was last modified
ExpiresDefault M86400
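The short-form codes used above combine a base time with a number of seconds: A counts from the time of access, M from the file's modification time. The following sketch shows the arithmetic under that reading; the function name is made up, and the two timestamps are borrowed from the Apache logo exchange earlier in these notes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(code: str, access_time: datetime, modified_time: datetime) -> str:
    """Compute an Expires value from a short code such as A3600 or M86400."""
    # First character selects the base time; the rest is a number of seconds.
    base = {"A": access_time, "M": modified_time}[code[0]]
    expires = base + timedelta(seconds=int(code[1:]))
    # usegmt=True produces the "GMT" suffix used in HTTP dates.
    return format_datetime(expires, usegmt=True)


access = datetime(2015, 4, 14, 22, 40, 52, tzinfo=timezone.utc)
modified = datetime(2015, 4, 14, 16, 8, 47, tzinfo=timezone.utc)

print(expires_header("A3600", access, modified))   # Tue, 14 Apr 2015 23:40:52 GMT
print(expires_header("M86400", access, modified))  # Wed, 15 Apr 2015 16:08:47 GMT
```

With modification-based expiry, every visitor gets the same expiration time, and a file that has not changed recently may already be past it.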

Do not cache

If you do not want your page cached, set these HTTP response headers:

Cache-Control: no-cache
Pragma: no-cache
Expires: <set to now>  

In .htaccess in Apache, this would translate to:

ExpiresActive On
ExpiresDefault "now"
Header set Cache-Control "no-cache"
Header set Pragma "no-cache"

Typical Expiration / Cache Directives for Websites

Expire static content a week or more into the future.

In .htaccess

# Turn on the module.
ExpiresActive on
# Set the default expiry times.
ExpiresDefault "now"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/svg+xml "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType image/ico "access plus 1 month"
ExpiresByType text/html "access plus 600 seconds"

What about site updates?

Caching and expiration are keyed on the full URL, so you can reflect the "version" within the URL, either as part of the path or as part of the query string.
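One common way to put a version in the URL is to derive it from the file's content, so the URL changes exactly when the file does. The sketch below assumes a `v` query parameter and a made-up helper name; a hand-maintained version number such as `?v=1.0` works on the same principle.

```python
import hashlib


def versioned_url(path: str, content: bytes) -> str:
    """Append a short content hash to a URL as a version query parameter."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={digest}"


old_css = b"body { color: black; }"
new_css = b"body { color: navy; }"

# Different content yields a different URL, so browsers fetch the new file
# even though the old URL is still cached with a far-future expiration.
print(versioned_url("css/styles.css", old_css))
print(versioned_url("css/styles.css", new_css))
```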

Minify and Compress Content

Compress Content

mod_deflate compresses content before sending to web browser.

Simple use:

AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE application/javascript

Does Compressing Help?

This can make a noticeable difference in the total page weight!
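A quick way to get a feel for the savings is to compress some markup locally. The sketch below uses Python's gzip module as a stand-in for what the server does; the sample markup is invented and highly repetitive, so it compresses better than a typical page would.

```python
import gzip


def compression_report(data: bytes) -> tuple:
    """Return (original size, compressed size) in bytes for gzip compression."""
    return len(data), len(gzip.compress(data))


# Invented, repetitive sample markup; real pages vary.
html = ('<li class="item"><a href="/page">A repeated list item</a></li>\n' * 200).encode("utf-8")

original, compressed = compression_report(html)
print(f"original: {original} bytes, compressed: {compressed} bytes")
```

Text formats such as HTML, CSS, and JavaScript tend to compress well; already-compressed formats such as JPEG and PNG gain little.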

Friendly Errors

Apache Default "Not Found" 404 document:
404

"Not Found" 404 for Whitehouse
404 Not Found for Whitehouse

"Not Found" 404 for Harvard University
404 Not Found for Harvard University

"404" for my project site
404

Custom Error Documents

.htaccess
ErrorDocument 401 /~jh1635/error/status401.html
ErrorDocument 403 /~jh1635/error/status403.html
ErrorDocument 404 /~jh1635/error/status404.html  

Friendly Ways to Get There

HTTP Redirect

Ways to Achieve this

Redirecting Requests

HTTP Status Codes:
301 Moved permanently
302 Moved temporarily

Redirecting client requests can be very useful:

Redirect

For cscie12.dce.harvard.edu the .htaccess file contains:

Redirect 302 /syllabus    https://harvard.instructure.com/courses/95649/assignments/syllabus
Try it:

Rewrite

mod_rewrite uses regular expressions to match on a pattern and rewrite incoming URLs to a new URL location.


Using mod_rewrite from within .htaccess

If you use RewriteRule from within an .htaccess file, you will usually need the RewriteBase directive.
See: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritebase

Example - Make Simple Links Instead of Complex Ones

Context: Parks and Recreation class offered and how to easily link directly to the class

Park and Rec system:
https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html

Link I can use with Rewrite rule
http://littletontrack.org/lpr-303107

RewriteEngine On
RewriteBase /
RewriteRule ^lpr-(.*)$ https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html?per=10&xxsearch=yes&xxdispmap=no+&xxmulti-list=&xxmulti-lbls=&xxrowid=&xxmod=ar&xxactivitynumber=$1&xxage=&xxgrade=&xxkeyword=&xxkeywordoption=N&xxtype=&xxcategory=&xxsortoption=ActivityNumber&xxdisplayoption=D&xxsubmit=Search
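The pattern-and-substitution idea behind the rule above can be sketched with an ordinary regular expression. This is an illustration only: the target URL is abbreviated to two of the query parameters, and the function name is made up. As in .htaccess context, the pattern is matched against the path without its leading slash.

```python
import re

PATTERN = re.compile(r"^lpr-(.*)$")

# Abbreviated target for illustration; the real rule carries many more parameters.
TARGET = (
    "https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html"
    "?xxactivitynumber={0}&xxsubmit=Search"
)


def rewrite(path: str):
    """Return the rewritten URL for a matching path, or None if no match."""
    match = PATTERN.match(path.lstrip("/"))
    if match is None:
        return None
    # group(1) plays the role of $1 in the RewriteRule substitution.
    return TARGET.format(match.group(1))


print(rewrite("/lpr-303107"))
print(rewrite("/about"))
```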

Example: Create Links that can always point to the correct place

Road Race Registration is done through a 3rd party service, SignMeUp

Redirect  /registration https://www.signmeup.com/site/reg/register.aspx?fid=B42VRH7

Redirect /map http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=101999702593116464805.00046f1a27a9feb5aacaf&ll=42.52946,-71.485934&spn=0.018975,0.018239&z=15

URL Shortener Services

My Example Project - .htaccess setting to improve Webpagetest scores!

Nature of America - My Example Project Site

.htaccess file:


# default to index.html
DirectoryIndex index.html

# BEGIN Expire headers
<IfModule mod_expires.c>
  # Turn on the module.
  ExpiresActive on
  # Set the default expiry times.
  ExpiresDefault "now"
  ExpiresByType image/jpg "access plus 1 month"
  ExpiresByType image/svg+xml "access plus 1 month"
  ExpiresByType image/gif "access plus 1 month"
  ExpiresByType image/jpeg "access plus 1 month"
  ExpiresByType image/png "access plus 1 month"
  ExpiresByType text/css "access plus 1 month"
  ExpiresByType text/javascript "access plus 1 month"
  ExpiresByType application/javascript "access plus 1 month"
  ExpiresByType image/ico "access plus 1 month"
  ExpiresByType image/x-icon "access plus 1 month"
  ExpiresByType text/html "access plus 600 seconds"
</IfModule>
# END Expire headers

# Security Policy that determines domains that resources can load from
<IfModule mod_headers.c>
  Header set Strict-Transport-Security "max-age=2592000; includeSubDomains; preload"
  Header set Content-Security-Policy: "default-src 'self'; img-src 'self' cdn.jsdelivr.net; script-src 'self' 'unsafe-eval' code.jquery.com cdn.jsdelivr.net *.cloudflare.com; style-src 'self' *.jsdelivr.net *.cloudflare.com fonts.gstatic.com fonts.googleapis.com; font-src 'self' fonts.gstatic.com fonts.googleapis.com"
  Header set X-Frame-Options: DENY
</IfModule>

# compress (DEFLATE) files that are text
<IfModule mod_deflate.c>
  AddOutputFilterByType DEFLATE text/html text/css text/javascript application/javascript application/json
</IfModule>
Options -Indexes

# All errors will go to a common error file
ErrorDocument 404 /underconstruction.html
ErrorDocument 403 /underconstruction.html
ErrorDocument 500 /underconstruction.html

# Shouldn't publish from a git checkout anyway,
#   but just in case, send requests trying to access .git to 404
RedirectMatch 404 /\.git

HTTP Cookies

HTTP is a stateless protocol. Cookies provide a mechanism to "maintain state", track users, and provide personalization.

Cookie Example

ESPN Cookies

Request made August 5, 2021

set-cookie: connectionspeed=full; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: country=us; path=/;
set-cookie: edition-view=espn-en-us; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: edition=espn-en-us; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: region=ccpa; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: _dcf=1; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: SWID=465CB153-A764-448D-C49D-10E2E4D42298; path=/; Expires=Mon, 29 Jul 2041 23:26:07 GMT; domain=espn.com;

Note the parts of each cookie: the name and value, the path, the expiration (a future date, or blank for a session cookie), and the domain.
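The parts of a cookie can be picked out with Python's standard `http.cookies` module. The sketch below parses two of the Set-Cookie values from the example above (with the trailing semicolon dropped); it only reads the values and is not meant as a full cookie jar.

```python
from http.cookies import SimpleCookie

# Values copied from the Set-Cookie headers above, minus the trailing semicolon.
long_lived = (
    "SWID=465CB153-A764-448D-C49D-10E2E4D42298; path=/; "
    "Expires=Mon, 29 Jul 2041 23:26:07 GMT; domain=espn.com"
)
session_only = "country=us; path=/"

cookie = SimpleCookie()
cookie.load(long_lived)
swid = cookie["SWID"]
print(swid.value)        # the cookie value
print(swid["path"])      # the path the cookie applies to
print(swid["expires"])   # the expiration date
print(swid["domain"])    # the domain the cookie applies to

cookie.load(session_only)
# No Expires attribute: the browser drops this cookie when the session ends.
print(repr(cookie["country"]["expires"]))
```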

Your Cookies

Firefox Cookies
Firefox Cookie Manager

Chrome - Cookies
cookies in chrome

Cookies and Session IDs

A UserID or SessionID (a long, uniquely assigned string of characters and numbers) is often stored in a cookie. The SessionID is used as the key or identifier when storing information about the user or session.

For example, a user logs in to a site. If the username and password match, the server sets a cookie ("Set-Cookie") in the browser that contains a session id; the server also makes an entry in the website's database that maps the session id to the username. When the browser returns the cookie on a later request, the server reads the session id and looks up the username in the database.

http cookie illustration
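The login flow can be sketched in a few lines, with a dictionary standing in for the website database. Everything here is hypothetical and for illustration only: the user, the cookie name `session_id`, and the function names are invented, and a real site would store hashed passwords rather than plain text.

```python
import secrets

# Hypothetical user table; plain-text passwords are for illustration only.
USERS = {"jharvard": "crimson"}
# Stand-in for the website database: session id -> username.
SESSIONS = {}


def log_in(username: str, password: str):
    """Return a Set-Cookie header value on success, or None on failure."""
    if USERS.get(username) != password:
        return None
    session_id = secrets.token_hex(16)   # long, random, hard to guess
    SESSIONS[session_id] = username
    return f"session_id={session_id}; Path=/; HttpOnly"


def user_for_cookie(cookie_header: str):
    """Look up the username for the session id in a Cookie request header."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "session_id":
            return SESSIONS.get(value)
    return None


set_cookie = log_in("jharvard", "crimson")
# On later requests the browser sends back only the name=value pair.
returned = set_cookie.split(";")[0]
print(user_for_cookie(returned))
```

The cookie itself carries only the random identifier; the information about the user stays on the server.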

Cookies and JavaScript

JavaScript gives you access to read and write cookies.

This can be used to record when a user dismisses an 'overlay' dialog such as "This site uses cookies", "Sign up for our email list", etc.

JavaScript can set a cookie upon "dismiss", and then only show the overlay dialog if the cookie is not present.

Search Engines and Optimization


Search Robots, Crawlers, Spiders

Three mechanisms to instruct robots that visit your site:

  1. robots.txt file
  2. robots meta tag
  3. rel="nofollow" for a elements

robots.txt and Examples

Two directives: User-agent and Disallow.
Note: robots.txt must be at the root level of the server.

Check out some real robots.txt files!
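To see how a well-behaved robot interprets a robots.txt file, Python's standard `urllib.robotparser` module can evaluate rules against a URL path. The rules and the robot name `ExampleBot` below are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented rules: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a made-up robot name.
print(parser.can_fetch("ExampleBot", "/private/notes.html"))  # False
print(parser.can_fetch("ExampleBot", "/index.html"))          # True
```

Keep in mind that robots.txt is a request, not access control; it relies on the robot choosing to comply.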

Robots meta element in markup

<meta name="robots" content="noindex,nofollow" />

The Robots meta element can be used on a per document basis.

HTTP Header: X-Robots-Tag

Content: meta tags

meta tags and Metadata Guidelines (W3 EOWG)

meta elements from Harvard University:

<title>Harvard University</title>
<meta name="description" content="Harvard University is devoted to excellence in teaching, learning, and research, and to developing leaders who make a difference globally." />
<link rel="canonical" href="https://www.harvard.edu/" />
<meta property="og:locale" content="en_US" />
<meta property="og:type" content="website" />
<meta property="og:title" content="Harvard University" />
<meta property="og:description" content="Harvard University is devoted to excellence in teaching, learning, and research, and to developing leaders who make a difference globally." />
<meta property="og:url" content="https://www.harvard.edu/" />
<meta property="og:site_name" content="Harvard University" />
<meta property="article:modified_time" content="2021-07-26T15:33:43+00:00" />
<meta property="og:image" content="https://www.harvard.edu/wp-content/uploads/2021/02/Shield_Social-1-1200x630.jpg" />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="Harvard University" />
<meta name="twitter:description" content="Harvard University is devoted to excellence in teaching, learning, and research, and to developing leaders who make a difference globally." />
<meta name="twitter:image" content="https://www.harvard.edu/wp-content/uploads/2021/02/Shield_Social-1-1024x512.jpg" />

HTML5 Boilerplate

A Basic HTML5 Template

<!doctype html>

<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <title>A Basic HTML5 Template</title>
  <meta name="description" content="A simple HTML5 Template for new projects.">
  <meta name="author" content="SitePoint">

  <meta property="og:title" content="A Basic HTML5 Template">
  <meta property="og:type" content="website">
  <meta property="og:url" content="https://www.sitepoint.com/a-basic-html5-template/">
  <meta property="og:description" content="A simple HTML5 Template for new projects.">
  <meta property="og:image" content="image.png">

  <link rel="icon" href="/favicon.ico">
  <link rel="icon" href="/favicon.svg" type="image/svg+xml">
  <link rel="apple-touch-icon" href="/apple-touch-icon.png">

  <link rel="stylesheet" href="css/styles.css?v=1.0">

</head>

<body>
  <!-- your content here... -->
  <script src="js/scripts.js"></script>
</body>
</html>