Session 14 - Server-Side: HTTP and Apache Web Server Configuration
Harvard Extension School
Fall 2021
Course Web Site: https://cscie12.dce.harvard.edu/
Topics
- Webpage and Website Optimization
- Web Browser and Web Server
- HyperText Transfer Protocol
- Apache HTTP Server
- Caching - Don't deliver content unnecessarily
- Typical Expiration / Cache Directives for Websites
- Minify and Compress Content
- Friendly Errors
- Friendly Ways to Get There
- HTTP Cookies
- Search Engines and Optimization
- HTML5 Boilerplate
Presentation contains 49 slides
Webpage and Website Optimization
- WebPage Test
- Lighthouse (part of browser developer tools)
- PageSpeed Insights
Web Browser and Web Server
Domain Name System
Computers connect by IP address (number); Humans like names (e.g. www.harvard.edu).
Domain Name System (DNS) resolves names to IP addresses (and the other way too): www.harvard.edu → 151.101.18.133
Domain Names: Top Level Domains (TLD)
TLDs are managed by the Internet Assigned Numbers Authority (IANA)
Generic: .com, .org, .edu, .gov, etc.
Country codes: .ch, .cn, .de, .uk, .us, etc.
Getting Your Own Domain and Hosting
Often domain name registration and hosting will be set up together through the same company, but keep in mind that they are distinct and separate things!
- Domain Name
- Buy the domain through a "registrar"
- Provide name servers
- About $10/yr
- Hosting
- Shared ($7-15/mo)
- Private / Cloud
A very short list of hosting companies as a place to start.
My playground domain: cs12.net
I registered "cs12.net", and from there I can control the subdomains. For example, natureofamerica.cs12.net, noa11ty.cs12.net, wptest.cs12.net.
Web Server Software
- Apache HTTP Server
- nginx
- "Other" - Microsoft, OpenResty, etc.
Web Server Market Share
HyperText Transfer Protocol
GET
United States National Archives
www.archives.gov
GET / HTTP/1.1
Host: www.archives.gov
User-Agent: curl/7.64.1
Accept: */*
HTTP/2 200
content-type: text/html; charset=utf-8
content-length: 24409
date: Thu, 29 Jul 2021 20:15:11 GMT
content-language: en
last-modified: Thu, 29 Jul 2021 19:51:32 GMT
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
v-ttl: 2199
cache-control: public, max-age=60, s-maxage=180
v-cache-ttl: 2199
x-frame-options: SAMEORIGIN
accept-ranges: bytes
etag: W/"1627588292-0-gzip"
vary: Cookie,Accept-Encoding
x-cache: Hit from cloudfront
via: 1.1 6c46ad9c24627fa8c065620a1a7a52a9.cloudfront.net (CloudFront)
x-amz-cf-pop: EWR52-C1
x-amz-cf-id: EBLHmCxYUblJWcaXLd8N6BDnq32dUFGQEhalaplMVENQ2kY6HixP3A==
age: 163
<!doctype html>
<html lang="en" dir="ltr" prefix="fb: //www.facebook.com/2008/fbml">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0"> <!-- truncated for example -->
HTTP Overview
HTTP is Stateless
Each requested resource is a separate, independent request to the server; HTTP is a stateless protocol.
HTTP Versions
- HTTP 1.0 (1996)
- HTTP 1.1 (1999)
- HTTP/2 (2015)
An HTTP Conversation
- Client Request
- METHOD Resource HTTP Version
- Client Generated Headers
- Request Body
- Server Response
- Status Line
- Server Generated Headers
- Data
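The request and response shapes listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a full HTTP implementation; the helper names build_request and parse_response_head are made up for this example, and the parser assumes an HTTP/1.x status line that includes a reason phrase.

```python
def build_request(method, resource, host, headers=None, body=""):
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {resource} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line (CRLF CRLF) separates the headers from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response_head(head):
    """Split a response head into (version, status code, reason, headers)."""
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        if line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


request = build_request("GET", "/", "www.archives.gov", {"Accept": "*/*"})
version, code, reason, headers = parse_response_head(
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 24409"
)
print(request.splitlines()[0])
print(code, reason, headers["content-type"])
```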
Common Headers
Request (Browser)
- Host
- User-Agent
- Referer
- Accept
- Accept-Language
- Accept-Encoding
- Accept-Charset
- Cookie
- If-Modified-Since
Response (Server)
- Last-Modified
- Content-Length
- Content-Type
- Connection
HTTP 1.1 Methods
- GET
- POST
- HEAD
- PUT
- DELETE
- TRACE
- OPTIONS
HTTP Response Codes
HTTP 1.1 status codes commonly seen
- 200 OK
- 301 Moved permanently
- 302 Found (moved temporarily)
- 304 Not modified
- 401 Unauthorized
- 403 Forbidden
- 404 Not found
- 500 Internal server error
The complete list:
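For a quick local look, Python's standard library carries the registered status codes in http.HTTPStatus. The sketch below prints the reason phrase for the codes listed above; the status_class helper is a made-up name for this example.

```python
from http import HTTPStatus

# Print the standard reason phrase for each commonly seen code.
for code in (200, 301, 302, 304, 401, 403, 404, 500):
    print(code, HTTPStatus(code).phrase)


def status_class(code):
    """Name the class of a status code from its first digit."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes[code // 100]


print(status_class(404))
```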
Looking at HTTP Under the Hood
Use your browser developer tools!
HTTP Header: Host
Solution: "Virtual Hosts"
Example: all of the following names map to the same IP.
- cscie12.dce.harvard.edu
- cscis12.dce.harvard.edu
- csci12.dce.harvard.edu
- cs12.dce.harvard.edu
- cs12students.dce.harvard.edu
- dph445.cs12students.dce.harvard.edu
- And your NetID too!
Host Header
This is required for HTTP 1.1 requests.
HEAD /http/raspberry.gif HTTP/1.1
Host: cscie12.dce.harvard.edu
HTTP/1.1 200 OK
Date: Tue, 8 Apr 2020 20:23:14 GMT
Server: Apache/2.2 (Fedora)
Last-Modified: Wed, 06 Apr 2015 19:30:42 GMT
ETag: "461fb8-348c-a0f67c80"
Accept-Ranges: bytes
Content-Length: 13452
Connection: close
Content-Type: image/gif
Connection closed by foreign host.
Host Header: dph445.cs12students.dce.harvard.edu
That's how we have a unique hostname for each student!
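A rough sketch of how a server might use the Host header to choose among virtual hosts that share one IP address. The mapping, the document root paths, and the function name document_root are all hypothetical; a real Apache server takes this information from its configuration.

```python
# Hypothetical mapping of hostnames to document roots.
VIRTUAL_HOSTS = {
    "cscie12.dce.harvard.edu": "/var/www/cscie12",
    "dph445.cs12students.dce.harvard.edu": "/home/users/dph445/public_html",
}
DEFAULT_ROOT = "/var/www/default"


def document_root(request_text):
    """Pick a document root from the Host header of a raw HTTP request."""
    for line in request_text.split("\r\n")[1:]:   # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]   # drop any :port
            return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
    return DEFAULT_ROOT


raw = "HEAD /http/raspberry.gif HTTP/1.1\r\nHost: cscie12.dce.harvard.edu\r\n\r\n"
print(document_root(raw))
```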
HTTP/2
- HTTP/2
- HPACK
What are the key differences to HTTP/1.x?
From the HTTP/2 FAQ:
At a high level, HTTP/2:
- is binary, instead of textual
- is fully multiplexed, instead of ordered and blocking
- can therefore use one connection for parallelism
- uses header compression to reduce overhead
- allows servers to “push” responses proactively into client caches
Apache HTTP Server
- Apache Software Foundation
- Apache HTTP Server Project
- Apache 2.x
- Apache Modules
- PHP
- Python
- many, many others
Apache Configuration Overview
- Server Configuration (httpd.conf): Unless you are the server administrator, you generally will not have access to this file. On the DCE systems, you do not have read or write access to this file. Server configuration is read at server start or restart.
- Per Directory (.htaccess): Certain configuration directives for Apache can be placed within per-directory .htaccess files. The .htaccess file is read on a per-request basis.
Scope of .htaccess files
.htaccess files apply to the directory that contains the .htaccess file and all its descendants.
Directives within the file /home/users/jh1635/public_html/.htaccess would apply to all files within and "under" the public_html directory for the user jh1635.
Directives within the file /home/users/jh1635/public_html/books/.htaccess would apply to all files within and "under" the public_html/books directory for the user jh1635.
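The scoping rule can be sketched as a walk from the web root down to the directory that holds the requested file. This is a simplified model that assumes overrides are enabled from the web root down; the function name applicable_htaccess is made up for this example.

```python
from pathlib import PurePosixPath


def applicable_htaccess(file_path, web_root):
    """List the .htaccess paths that could apply to file_path, outermost first."""
    file_path = PurePosixPath(file_path)
    web_root = PurePosixPath(web_root)
    # Directory names between the web root and the file's own directory.
    relative_dirs = file_path.parent.relative_to(web_root).parts
    candidates = [web_root / ".htaccess"]
    current = web_root
    for part in relative_dirs:
        current = current / part
        candidates.append(current / ".htaccess")
    return [str(p) for p in candidates]


root = "/home/users/jh1635/public_html"
for path in applicable_htaccess(root + "/books/index.html", root):
    print(path)
```

A file directly in public_html is covered only by the first .htaccess file; a file in public_html/books is covered by both.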
Problems You Will Have with .htaccess files
- Internal Server Error
- Can't "see" the file
- Incorrect Permissions
500 Internal Server Error
:(
Problems You will encounter when using .htaccess files (Internal Server Error 500)
If you begin seeing 500 Internal Server Error responses from the server after you have created or edited an .htaccess file, the most likely cause of the problem is incorrect permissions and/or an error in the directive syntax.
- Permissions on the .htaccess file are not set correctly. Just like HTML and image files, the server must be able to read the .htaccess file. The simplest way to allow that is to make your .htaccess file readable by "other".
cs12% pwd
/home/users/jh1635/public_html
cs12% ls -l .htaccess
-rw------- 1 jh1635 founder 349 Nov 27 00:03 .htaccess
cs12% chmod o+r .htaccess
cs12% ls -l .htaccess
-rw----r-- 1 jh1635 founder 349 Nov 27 00:03 .htaccess
- Syntax Error. An error in the syntax of a directive in the .htaccess file will result in a 500 Internal Server Error. In addition, correct usage of a directive that is not allowed in the .htaccess file will result in a 500 status code. Whether or not a directive is allowed depends upon the server configuration file (httpd.conf; AllowOverride) and the directive itself.
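The permissions check from the first item above can also be scripted. The sketch below is an illustration on a temporary file, not a tool used in the course; the helper names readable_by_other and add_other_read are made up, and it assumes a UNIX-like system.

```python
import os
import stat
import tempfile


def readable_by_other(path):
    """Return True if the 'other' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def add_other_read(path):
    """Equivalent of `chmod o+r`: add the 'other' read bit, keep the rest."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IROTH)


with tempfile.TemporaryDirectory() as tmp:
    htaccess = os.path.join(tmp, ".htaccess")
    with open(htaccess, "w") as f:
        f.write("Options -Indexes\n")
    os.chmod(htaccess, 0o600)            # -rw-------
    before = readable_by_other(htaccess)
    add_other_read(htaccess)             # -rw----r--
    after = readable_by_other(htaccess)
    print(before, after)
```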
Problems You will encounter when using .htaccess files (Can't see the .htaccess file)
You can't "see" your .htaccess file.
- HTTP: The web server is typically configured to deny requests for .htaccess files. For example, the file corresponding to the URL https://cscie12.dce.harvard.edu/.htaccess exists and is readable by the Web server, but if we try to follow the link, we get a 403 Forbidden response.
- UNIX: The ls command will not list files or directories that begin with a '.' (dot). In order to see the .htaccess file when you do a directory listing, use the -a (all) option: ls -a
- SFTP: Sometimes your SFTP program will hide the "dot" files unless explicitly told to show them.
Apache Configuration Sections
Within .htaccess
Note that only Files and FilesMatch can be used within .htaccess files.
Examples:
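# deny access to the .htaccess file itself
# (Apache 2.2 syntax; in Apache 2.4 the equivalent is "Require all denied")
<Files .htaccess>
Order allow,deny
Deny from all
</Files>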
Examples:
# deny access to any tilde backup files
<Files *~>
Order allow,deny
Deny from all
</Files>
Caching - Don't deliver content unnecessarily
- Fewer HTTP requests to load pages
- Faster Load Times
- Less Bandwidth
Types of Caching
- Local (user's computer)
- Proxy-server
Caching Related Headers
Local cache and proxy-server cache.
- If-Modified-Since
- Age
- Expires
- Last-Modified
- Cache-Control
- ETag
Proxy Servers
If-Modified-Since
A request for the Apache Software Foundation logo (http://apache.org/img/asf_logo.png) that is part of loading
http://apache.org/foundation/
Initial request:
GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6
HTTP/1.1 200 OK
Date: Tue, 14 Apr 2015 22:40:52 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Tue, 14 Apr 2015 16:08:47 GMT
ETag: "751e-513b1721525d0"
Accept-Ranges: bytes
Content-Length: 29982
Cache-Control: max-age=3600
Expires: Tue, 14 Apr 2015 23:40:52 GMT
Keep-Alive: timeout=30, max=98
Connection: Keep-Alive
Content-Type: image/png
After expiration, if still located in local cache, browser will make a conditional request:
GET /img/asf_logo.png HTTP/1.1
Host: apache.org
Connection: keep-alive
Accept: image/webp,*/*;q=0.8
If-None-Match: "751e-513b1721525d0"
If-Modified-Since: Tue, 14 Apr 2015 16:08:47 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://apache.org/foundation/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,ro;q=0.6
HTTP/1.1 304 Not Modified
Date: Tue, 14 Apr 2015 22:42:51 GMT
Server: Apache/2.4.7 (Ubuntu)
Connection: Keep-Alive
Keep-Alive: timeout=30, max=100
ETag: "751e-513b1721525d0"
Expires: Tue, 14 Apr 2015 23:42:51 GMT
Cache-Control: max-age=3600
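The server's side of this exchange can be sketched as a small decision function. This is a simplified model: it compares the ETag as an exact string (ignoring weak validators and lists of tags), and the function name conditional_status is made up for this example.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it takes precedence over the date check.
        return 304 if if_none_match == etag else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200
    return 200


ETAG = '"751e-513b1721525d0"'
LAST_MODIFIED = "Tue, 14 Apr 2015 16:08:47 GMT"
print(conditional_status({"If-None-Match": ETAG}, ETAG, LAST_MODIFIED))
print(conditional_status({}, ETAG, LAST_MODIFIED))
```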
Expires HTTP Header
.htaccess
ExpiresActive On
ExpiresByType text/html A3600
# HTML expires in 1 hour
ExpiresByType image/gif A2592000
# GIF expires in 30 days
ExpiresByType image/jpeg A2592000
# JPEG expires in 30 days
ExpiresByType image/png A2592000
# PNG expires in 30 days
# types not specified
ExpiresDefault "now plus 1 day"
# expires in 1 day
ExpiresActive On
ExpiresByType text/html M86400
# HTML expires 1 day after it was last modified
ExpiresDefault M86400
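To make the A (access) and M (modification) codes concrete, the sketch below computes the Expires value each would produce. It only models the arithmetic, not mod_expires itself; the function name expires_header is made up, and the timestamps are borrowed from the apache.org logo exchange for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(code, access_time, modified_time):
    """Compute an Expires value from an 'A<seconds>' or 'M<seconds>' code."""
    base = {"A": access_time, "M": modified_time}[code[0]]
    expires = base + timedelta(seconds=int(code[1:]))
    return format_datetime(expires, usegmt=True)


access = datetime(2015, 4, 14, 22, 40, 52, tzinfo=timezone.utc)
modified = datetime(2015, 4, 14, 16, 8, 47, tzinfo=timezone.utc)
print(expires_header("A3600", access, modified))    # one hour after access
print(expires_header("M86400", access, modified))   # one day after modification
```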
Do not cache
If you do not want your page cached, set these HTTP response headers:
Cache-Control: no-cache
Pragma: no-cache
Expires: <set to now>
In .htaccess in Apache, this would translate to:
ExpiresDefault "now"
# requires mod_headers
Header set Cache-Control "no-cache"
Header set Pragma "no-cache"
Typical Expiration / Cache Directives for Websites
Expire static content a week or more into the future.
In .htaccess
# Turn on the module.
ExpiresActive on
# Set the default expiry times.
ExpiresDefault "now"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/svg+xml "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType image/ico "access plus 1 month"
ExpiresByType text/html "access plus 600 seconds"
What about site updates?
Caching and expiration are based on the full URL, so you can reflect the "version" within the URL, either as part of the path or as part of the query string.
- Versioning as part of path
- jQuery 3.6.0: https://code.jquery.com/jquery-3.6.0.min.js
- jQuery 2.2.4: https://code.jquery.com/jquery-2.2.4.min.js
- Versioning in query string
- https://www[...]/site.min.css?ver=f1b2b77c823860edee8f0d253d01aaed or https://www[...]/site.css?ver=20211201
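One way to produce a query-string version is to derive it from the file's content, so the URL changes exactly when the file does. This is a sketch under assumptions: the slides do not say how their ver values were generated, and the function name versioned_url, the path, and the 12-character truncated SHA-256 digest are choices made for this example.

```python
import hashlib


def versioned_url(url, content):
    """Append a ?ver=<short content hash> query string to url."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    return f"{url}?ver={digest}"


css_v1 = b"body { color: black; }"
css_v2 = b"body { color: navy; }"
print(versioned_url("/css/site.min.css", css_v1))
print(versioned_url("/css/site.min.css", css_v2))
```

Because the browser caches by full URL, the edited stylesheet is fetched fresh even if the old URL has a long expiration time.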
Minify and Compress Content
- Fewer bytes == faster load time
- Happier Users :)
- Less Bandwidth
Compress Content
mod_deflate compresses content before sending to web browser.
Simple use:
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE application/javascript
Does Compressing Help?
- 70 to 75% reduction for text files (markup, CSS, non-minified JS)
- 50% reduction even for 'minified' JS
This can make a noticeable difference in the total page weight!
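The effect is easy to try locally with Python's gzip module, which produces the same kind of gzip encoding that mod_deflate typically sends. The sample markup below is made up and highly repetitive, so it compresses far better than a real page would; treat the printed number as an illustration, not a benchmark.

```python
import gzip

# Repetitive markup, standing in for a real HTML page.
html = ('<li class="item"><a href="/page">A list item link</a></li>\n' * 200).encode("utf-8")
compressed = gzip.compress(html)

reduction = 100 * (1 - len(compressed) / len(html))
print(f"{len(html)} bytes -> {len(compressed)} bytes ({reduction:.0f}% smaller)")

# Decompressing returns the original bytes, as the browser would see them.
restored = gzip.decompress(compressed)
```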
Friendly Errors
Apache Default "Not Found" 404 document:
"Not Found" 404 for Whitehouse
"404" for my project site
Custom Error Documents
.htaccess
ErrorDocument 401 /~jh1635/error/status401.html
ErrorDocument 403 /~jh1635/error/status403.html
ErrorDocument 404 /~jh1635/error/status404.html
Friendly Ways to Get There
- Short URLs
- Memorable URLs
- Don't break old URLs
HTTP Redirect
- Publish "clean" URLs, and redirect
- Site reorganization changes URL -- redirect old to new
Ways to Achieve this
- Redirect (Apache)
- Rewrite (Apache)
- Meta http-equiv refresh (in HTML)
- URL shortener services
Redirecting Requests
301 Moved permanently
302 Found (moved temporarily)
Redirecting client requests can be very useful:
- URL moves to a new location
- resource removed
- site structure is reorganized
- Provide "friendly" URLs to advertise, publish, or refer to foot-long URLs.
Redirect
For cscie12.dce.harvard.edu the .htaccess file contains:
Redirect 302 /syllabus https://harvard.instructure.com/courses/95649/assignments/syllabus
Rewrite
mod_rewrite uses regular expressions to match on a pattern and rewrite incoming URLs to a new URL location.
Using mod_rewrite from within .htaccess
If you use RewriteRule from within an .htaccess file, you must use the RewriteBase directive whenever a substitution uses a relative path (with a few exceptions described in the documentation).
See: http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritebase
Example - Make Simple Links Instead of Complex Ones
Context: a Parks and Recreation class is offered, and I want an easy way to link directly to the class.
Park and Rec system: https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html
Link I can use with the Rewrite rule: http://littletontrack.org/lpr-303107
RewriteEngine On
RewriteBase /
RewriteRule ^lpr-(.*)$ https://webtrac.littletonrec.com/wbwsc/webtrac.wsc/wbsearch.html?per=10&xxsearch=yes&xxdispmap=no+&xxmulti-list=&xxmulti-lbls=&xxrowid=&xxmod=ar&xxactivitynumber=$1&xxage=&xxgrade=&xxkeyword=&xxkeywordoption=N&xxtype=&xxcategory=&xxsortoption=ActivityNumber&xxdisplayoption=D&xxsubmit=Search
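The pattern-and-substitution idea behind this rule can be tried out with Python's re module. This sketch only mimics the matching step; the target URL is a hypothetical, shortened stand-in for the long search URL, and the function name rewrite is made up for this example.

```python
import re

# Hypothetical, shortened target; the real rule carries many more parameters.
TARGET = "https://example.org/search?activitynumber={}"


def rewrite(path):
    """Return the redirect target for a matching path, or None if no match."""
    # In .htaccess context the pattern is matched against the path
    # without its leading slash, so strip it first.
    match = re.match(r"^lpr-(.*)$", path.lstrip("/"))
    if match is None:
        return None
    return TARGET.format(match.group(1))   # group(1) plays the role of $1


print(rewrite("/lpr-303107"))
print(rewrite("/about"))
```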
Example: Create Links that can always point to the correct place
Road Race Registration is done through a 3rd party service, SignMeUp
Redirect /registration https://www.signmeup.com/site/reg/register.aspx?fid=B42VRH7
Redirect /map http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=101999702593116464805.00046f1a27a9feb5aacaf&ll=42.52946,-71.485934&spn=0.018975,0.018239&z=15
URL Shortener Services
- Bitly, TinyURL, Owly, and others
My Example Project - .htaccess settings to improve Webpagetest scores!
Nature of America - My Example Project Site
.htaccess file:
# default to index.html
DirectoryIndex index.html
# BEGIN Expire headers
<IfModule mod_expires.c>
# Turn on the module.
ExpiresActive on
# Set the default expiry times.
ExpiresDefault "now"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/svg+xml "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType image/ico "access plus 1 month"
ExpiresByType image/x-icon "access plus 1 month"
ExpiresByType text/html "access plus 600 seconds"
</IfModule>
# END Expire headers
# Security Policy that determines domains that resources can load from
<IfModule mod_headers.c>
Header set Strict-Transport-Security "max-age=2592000; includeSubDomains; preload"
Header set Content-Security-Policy: "default-src 'self'; img-src 'self' cdn.jsdelivr.net; script-src 'self' 'unsafe-eval' code.jquery.com cdn.jsdelivr.net *.cloudflare.com; style-src 'self' *.jsdelivr.net *.cloudflare.com fonts.gstatic.com fonts.googleapis.com; font-src 'self' fonts.gstatic.com fonts.googleapis.com"
Header set X-Frame-Options: DENY
</IfModule>
# compress (DEFLATE) files that are text
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/html text/css text/javascript application/javascript application/json
</IfModule>
Options -Indexes
# All errors will go to a common error file
ErrorDocument 404 /underconstruction.html
ErrorDocument 403 /underconstruction.html
ErrorDocument 500 /underconstruction.html
# Shouldn't publish from a git checkout anyway,
# but just in case, send requests trying to access .git to 404
RedirectMatch 404 /\.git
HTTP Cookies
- Server "sets" cookies
- Browser "returns" cookies
Cookie Example
- Server returns cookie to HTTP client ("Set-Cookie" response header)
- HTTP client returns cookie to server ("Cookie" request header)
ESPN Cookies
Request made August 5, 2021
set-cookie: connectionspeed=full; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: country=us; path=/;
set-cookie: edition-view=espn-en-us; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: edition=espn-en-us; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: region=ccpa; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: _dcf=1; path=/; Expires=Thu, 05 Aug 2021 23:26:07 GMT;
set-cookie: SWID=465CB153-A764-448D-C49D-10E2E4D42298; path=/; Expires=Mon, 29 Jul 2041 23:26:07 GMT; domain=espn.com;
Note the cookie name and value, the cookie path, the cookie expiration (in the future or blank), and the cookie domain.
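Those parts can be pulled out of a Set-Cookie value with Python's standard http.cookies module. A small sketch using the SWID cookie from the example above:

```python
from http.cookies import SimpleCookie

header_value = ("SWID=465CB153-A764-448D-C49D-10E2E4D42298; path=/; "
                "Expires=Mon, 29 Jul 2041 23:26:07 GMT; domain=espn.com")
cookie = SimpleCookie()
cookie.load(header_value)

morsel = cookie["SWID"]
print(morsel.value)        # cookie value
print(morsel["path"])      # cookie path
print(morsel["expires"])   # expiration date
print(morsel["domain"])    # cookie domain
```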
Your Cookies
- Firefox: about:preferences#privacy
- Chrome: chrome://settings/siteData?search=cookies
- Edge: Settings and more → Settings → Site permissions
- Safari: Preferences → Privacy
Firefox Cookies
Chrome - Cookies
Cookies and Session IDs
A UserID or SessionID (a long character/number string that is uniquely assigned) is often stored in a cookie. The SessionID is used as the key or identifier when storing information about the user or session.
For example, a user logs in to a site. If the username and password match, the server sets a cookie ("Set-Cookie") in the browser that contains a session id; the server also makes an entry in the website database that maps the session id to the username. When the cookie is returned, the session id is read and the username is looked up in the database.
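The login flow described above can be sketched with an in-memory dictionary standing in for the website database. Everything here is hypothetical (the cookie name sessionid, the user table, the function names), and a real site would store password hashes rather than plain-text passwords.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}                          # stands in for the database: session id -> username
USERS = {"alice": "correct horse"}     # hypothetical credentials, plain text only for brevity


def log_in(username, password):
    """Return a Set-Cookie header value on success, or None on failure."""
    if USERS.get(username) != password:
        return None
    session_id = secrets.token_hex(16)   # long, hard-to-guess identifier
    sessions[session_id] = username
    return f"sessionid={session_id}; Path=/; HttpOnly"


def user_for(cookie_header):
    """Look up the username for a Cookie request header value, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    return sessions.get(morsel.value) if morsel else None


set_cookie = log_in("alice", "correct horse")
cookie_header = set_cookie.split(";")[0]   # the browser returns only name=value
print(user_for(cookie_header))
```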
Cookies and JavaScript
JavaScript gives you access to read and write cookies.
This can be used to record when a user dismisses an 'overlay' dialog such as "This site uses cookies", "Sign up for our email list", etc.
JavaScript can set a cookie upon "dismiss", and then only show the overlay dialog if the cookie is not present.
Search Engines and Optimization
- Prepare
- Content: Valid markup (or at least well-formed!), semantic markup, <title>, meta tags, visual elements have text
- Site: robots.txt, sitemap.xml
- Submit your site
Search Robots, Crawlers, Spiders
Three mechanisms to instruct robots that visit your site:
- robots.txt file
- robots meta tag
- rel="nofollow" for a elements
robots.txt and Examples
- User-Agent
- Disallow
Check out some real robots.txt files!
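To see how a well-behaved robot interprets User-agent and Disallow lines, Python's standard urllib.robotparser can evaluate a robots.txt file without any network access. The file contents and the example.org URLs below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A small, made-up robots.txt: keep all robots out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

allowed = parser.can_fetch("*", "https://example.org/index.html")
blocked = parser.can_fetch("*", "https://example.org/private/notes.html")
print(allowed, blocked)
```

Keep in mind that robots.txt is advisory: it asks robots to stay away, but it does not prevent access.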
Robots meta element in markup
<meta name="robots" content="noindex,nofollow" />
- name="robots"
- content
- index or noindex
- follow or nofollow
- noarchive
- none
- all
The Robots meta element can be used on a per document basis.
HTTP Header: X-Robots-Tag
- index: index the page
- noindex: don't index the page
- follow: follow links from the page
- nosnippet: don't display descriptions or cached links
- nofollow: don't follow links from the page
- noarchive: don't cache/archive the page
- none: do nothing, ignore the page
- all: do whatever you want, default behavior
Content: meta tags
meta tags and Metadata Guidelines (W3 EOWG)
meta elements from Harvard University:
<title>Harvard University</title>
<meta name="description" content="Harvard University is devoted to excellence in teaching, learning, and research, and to developing leaders who make a difference globally." />
<link rel="canonical" href="https://www.harvard.edu/" />
<meta property="og:locale" content="en_US" />
<meta property="og:type" content="website" />
<meta property="og:title" content="Harvard University" />
<meta property="og:description" content="Harvard University is devoted to excellence in teaching, learning, and research, and to developing leaders who make a difference globally." />
<meta property="og:url" content="https://www.harvard.edu/" />
<meta property="og:site_name" content="Harvard University" />
<meta property="article:modified_time" content="2021-07-26T15:33:43+00:00" />
<meta property="og:image" content="https://www.harvard.edu/wp-content/uploads/2021/02/Shield_Social-1-1200x630.jpg" />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="Harvard University" />
<meta name="twitter:description" content="Harvard University is devoted to excellence in teaching, learning, and research, and to developing leaders who make a difference globally." />
<meta name="twitter:image" content="https://www.harvard.edu/wp-content/uploads/2021/02/Shield_Social-1-1024x512.jpg" />
HTML5 Boilerplate
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>A Basic HTML5 Template</title>
<meta name="description" content="A simple HTML5 Template for new projects.">
<meta name="author" content="SitePoint">
<meta property="og:title" content="A Basic HTML5 Template">
<meta property="og:type" content="website">
<meta property="og:url" content="https://www.sitepoint.com/a-basic-html5-template/">
<meta property="og:description" content="A simple HTML5 Template for new projects.">
<meta property="og:image" content="image.png">
<link rel="icon" href="/favicon.ico">
<link rel="icon" href="/favicon.svg" type="image/svg+xml">
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<link rel="stylesheet" href="css/styles.css?v=1.0">
</head>
<body>
<!-- your content here... -->
<script src="js/scripts.js"></script>
</body>
</html>