Session 01 - The Internet, World Wide Web, Web Browsers, Web Sites, and HTML
Harvard Extension School
Fall 2020
Course Web Site: https://cscie12.dce.harvard.edu/
Topics
- The Internet and the World Wide Web
- A Web Site over Time
- Components of the Web
- Client-side Web Parts: Markup, Style, Function
- HTML Introduction
- Markup Evolution and Standards
- HTML/SGML/XML — What's the Difference?
- HTML5
- File Management
- Relative URLs
- URL to Filename Mapping
Presentation contains 33 slides
The Internet and the World Wide Web
Image from the Opte Project, used under the Creative Commons Attribution-NonCommercial 4.0 International License.
The Internet: Schematic
Types of Traffic on the Internet
- Web
- HTTP, HTTPS
- Video
- HTTP, RTMP, RTSP
- Email
- SMTP, IMAP, POP
- File Transfer
- SFTP, FTP
- Login
- SSH
Tim Berners-Lee on The World Wide Web
Suppose all the information stored on computers everywhere were linked. Suppose I could program my computer to create a space in which everything could be linked to everything.
Tim Berners-Lee
Today, and throughout this year, we should celebrate the Web’s first 25 years. But though the mood is upbeat, we also know we are not done. We have much to do for the Web to reach its full potential. We must continue to defend its core principles and tackle some key challenges.
Tim Berners-Lee in Welcome to the Web's 25 Anniversary
The Web evolved into a powerful, ubiquitous tool because it was built on egalitarian principles and because thousands of individuals, universities and companies have worked, both independently and together as part of the World Wide Web Consortium, to expand its capabilities based on those principles.
Tim Berners-Lee in Long Live the Web (Scientific American, Nov/Dec 2010)
The irony is that in all its various guises -- commerce, research, and surfing -- the Web is already so much a part of our lives that familiarity has clouded our perception of the Web itself.
Tim Berners-Lee in Weaving the Web (1999)
Features of the World Wide Web
- HyperText Information System
- Cross-Platform and Cross-Device
...then and now
- Distributed
- 261 million unique domains, 191 million active sites, and 10 million "web-facing" computers
(Netcraft Web Server Survey, August 2020)
- Open Standards
- HTML, CSS, JavaScript, HTTP, TCP/IP
- Open Source
- Mozilla, WebKit, Apache HTTP Server, JavaScript, PHP, Python, etc.
- Web Browser: provides a single interface to many services
- Information, Shopping, Banking, Communication, Finance, Business, etc.
- Dynamic, Interactive, Evolving
Approaching the Web
- Variety of users, devices, platforms, connection speeds, displays, web browsers, browser settings, and languages
- Open Standards and Open Source
- Common expectations for user experience
A Web Site over Time
The White House Site (www.whitehouse.gov)
Screenshots of the home page: 1996, 1997, 1998, 1999, 2001, 2002, 2007, 2009, 2011, 2015 (early), 2015 (late), August 2017, January 2018.
The design as of August 2020 has remained essentially the same since January 2018.
A Web Address - URLs (and URIs)
URL/URI: https://www.archives.gov/historical-docs/voting-rights-act
- Scheme: https
- Host: www.archives.gov
- Path: /historical-docs/voting-rights-act
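As a rough illustration (not part of the original slides), the scheme, host, and path of this address can be pulled apart with Python's standard `urllib.parse` module. The variable names are chosen for this sketch only.

```python
from urllib.parse import urlsplit

# Split the example address from the slide into its components.
url = "https://www.archives.gov/historical-docs/voting-rights-act"
parts = urlsplit(url)

scheme = parts.scheme    # the protocol used to request the resource
host = parts.hostname    # the server that holds the resource
path = parts.path        # where the resource lives on that server

print(scheme, host, path, sep="\n")
```

The same split applies to any http or https address, which is why browsers can work out from a URL alone which server to contact and what to ask it for.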
Aside: Names and Locations: URLs, URIs, and URNs
- URI: Uniform Resource Identifier
- URL: Uniform Resource Locator
- URN: Uniform Resource Name
A book example ("Leadership in Turbulent Times" by Doris Kearns Goodwin).
urn:isbn:978-1476795928
- URN, "N" is for "Name"; uniquely identifies by "name"
https://www.barnesandnoble.com/w/leadership-doris-kearns-goodwin/1128008541?ean=9781476795928#/
- URL, "L" is for "Location"; tells you where something is "at"
Both are URIs: one is a URN and the other is a URL.
Components of the Web
1. HTTP Client
2. HTTP Server
3. Network
Network connecting HTTP client with server.
Client-side Web Parts: Markup, Style, Function
- Structure / Markup (HTML, XHTML)
- Structure
- Content
- Style / Presentation (CSS)
- Style
- Presentation
- Appearance
- Function (JavaScript)
- Actions
- Manipulations
Our Solar System: Markup
Our Solar System: Markup + Style
Our Solar System: Markup + Style + Function
HTML Introduction
Markup - HTML
The Code
<!DOCTYPE html>
<html lang="en">
<head>
<title>My Schools</title>
</head>
<body>
<h1>My Schools</h1>
<ul>
<li>
<a href="https://www.harvard.edu/">Harvard University</a><br/>
<img src="images/harvard-shield.png" alt="Harvard Shield" />
</li>
<li>
<a href="https://www.ku.edu/">University of Kansas</a><br/>
<img src="images/kansas-jayhawk.png" alt="University of Kansas Jayhawk" />
</li>
</ul>
</body>
</html>
How a Browser Displays It
How Your Browser Thinks About It
Essential HTML5 Document Structure
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Document Title</title>
</head>
<body>
<!-- content goes here -->
</body>
</html>
- html
  - head
    - meta
    - title
  - body
Components of HTML Elements
- Start Tag
- Element Name
- Attribute and Value Pairs
- Content
- End Tag
A Hypertext Link
Markup for a Hypertext link:
<a href="http://www.harvard.edu/">Harvard</a>
How it would render in a web browser:
Start Tag: <a href="http://www.harvard.edu/">
Element Name: a
Attribute: href
Attribute Value: http://www.harvard.edu/
Content: Harvard
End Tag: </a>
Elements, Start Tags, Attributes and values, End Tags, Content
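One way to see these parts is to let a parser report them. The sketch below uses Python's standard `html.parser` module; the class name `ElementParts` and the event labels are made up for this illustration.

```python
from html.parser import HTMLParser

class ElementParts(HTMLParser):
    """Record each part of the markup as the parser encounters it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # tag is the element name; attrs is a list of (attribute, value) pairs
        self.events.append(("start tag", tag, dict(attrs)))

    def handle_data(self, data):
        # text between the start tag and the end tag is the content
        self.events.append(("content", data))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))

parser = ElementParts()
parser.feed('<a href="http://www.harvard.edu/">Harvard</a>')
parser.close()

for event in parser.events:
    print(event)
```

The parser reports three events for the link: a start tag carrying the element name and its attribute/value pair, the text content, and the end tag.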
Element Names
- Element names are defined by the HTML5 Standard
- HTML5 has 116 elements defined (as of August 2020).
The elements that are part of HTML5 are fairly stable and are not rapidly evolving.
In our "skeleton" document, we've already encountered the html, head, title, meta, and body elements.
Attributes and Values
- Not all elements will have attributes
- Some start tags will have more than one attribute defined. For example:
<link rel="stylesheet" href="styles/site.css" />
Content
- Not all elements will have "content". These are often called "empty" elements.
Examples include: br, img, link, meta
- Content can be text
- Content can be other elements
End Tags
- For empty elements, the end tag is part of the start tag! For example:
<img src="images/harvard-shield.png" alt="Harvard Veritas Shield" />
- Some end tags are optional. But more about this later.
Markup Evolution and Standards
Markup Standards
- HTML5
HTML Living Standard (WHATWG), work ongoing
- XHTML 1.0, a reformulation of HTML 4.01 into XML 1.0
January 2000, revised August 2002
- HTML 4.01, December 1999
- HTML 4.0, December 1997
- HTML 3.2, January 1997
- HTML 2.0, November 1995
Benefits of Web Standards
- Markup (HTML)
- Style (CSS)
- Function (JavaScript)
- Improved Accessibility
- People (Section 508, WAI)
- Machines
- Search Engines
- Devices
- Stability
- Forward-compatible and backward-compatible.
- Separation of Concerns (Structure, Style, Function)
- lighter, cleaner pages
- easier maintenance
- easier redesign
- Validation
"Be liberal in what you accept, and conservative in what you send"
"Postel's Law" or the "Robustness Principle"
HTML/SGML/XML — What's the Difference?
Main differences between HTML/SGML and XML:
No. | HTML | XML |
---|---|---|
1. | End tags can be "implied" (closing elements that have implied end tags) | End tags always required (even for "empty" elements) |
2. | Start tags can be "implied" | Start tags always required |
3. | Element and attribute names are not case-sensitive | Element and attribute names are case-sensitive |
4. | Attribute values do not need to be in quotes if the values contain alpha-numeric characters only | Attribute values must always be in quotes |
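The stricter XML rules can be observed by handing small fragments to an XML parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed_xml` and the sample fragments are assumptions made for illustration, and the check only covers XML well-formedness, not HTML validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

samples = [
    "<ul><li>coffee</li><li>tea</li></ul>",    # explicit end tags
    "<ul><li>coffee<li>tea</ul>",              # implied end tags (HTML syntax only)
    '<img src="images/mug.jpg" alt="Mug" />',  # empty element closed in its start tag
    "<p class=intro>Hello</p>",                # unquoted attribute value
    "<P>Hello</p>",                            # names differ only by case
]

for markup in samples:
    print(is_well_formed_xml(markup), markup)
```

Fragments that rely on implied end tags, unquoted attribute values, or mismatched case are rejected by the XML parser, while an HTML parser would accept all five.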
Best Practices for Starting Out
- Use start and end tags, even if optional
- Lower case element and attribute names
- Use quotes around attribute values
- Preference note: David prefers the "XML" syntax, but that's a preference, not a mandate; also it is a preference not shared by everyone.
A Tale of Two Documents
XML Syntax
| SGML/HTML Syntax
|
Cleaner version of SGML/HTML Syntax
Of course, you can use the SGML/HTML syntax and write HTML that looks better. Just because the syntax allows you to shorten things and leave things out doesn't mean you have to.
Like this:
<!DOCTYPE html>
<html lang="en">
<head>
<title>My Document</title>
<meta charset="utf-8" >
</head>
<body>
<h1>My Document</h1>
<ul>
<li>coffee
<li>tea
</ul>
<img src="images/mug.jpg" alt="Mug" >
</body>
</html>
HTML5
116 elements defined in HTML5
More information: HTML5 Living Standard from the WHATWG. Section 4 contains the List of elements in HTML.
I've highlighted the 24 elements that you will use and/or see most commonly.
- The root element: html
- Document metadata: head, title, base, link, meta
- Style: style
- Sections: body, article, section, nav, aside, h1, h2, h3, h4, h5, h6, hgroup, header, footer, address
- Grouping content: p, hr, pre, blockquote, ol, ul, li, dl, dt, dd, figure, figcaption, main, div
- Text-level semantics: a, em, strong, small, s, cite, q, dfn, abbr, ruby, rt, rp, data, time, code, var, samp, kbd, sub, sup, i, b, u, mark, bdi, bdo, span, br, wbr
- Edits: ins, del
- Embedded content: picture, source, img, iframe, embed, object, param, video, audio, source, track, map, area
- Tabular data: table, caption, colgroup, col, tbody, thead, tfoot, tr, td, th
- Forms: form, label, input, button, select, datalist, optgroup, option, textarea, output, progress, meter, fieldset, legend
- Interactive elements: details, summary, menu, dialog
- Scripting: script, noscript, template, canvas, slot
Most commonly used or seen elements
Start with these 24 — these are elements you will use in most of your web pages, or that you'll find in a majority of web pages.
- The root element: html
- Document metadata: head, title, link, meta
- Sections: body, nav, h1, h2, header, footer
- Grouping content: p, ul, li, main, div
- Text-level semantics: a, span, br
- Embedded content: img
- Forms: form, label, input
- Scripting: script
How to find out more about them? Two places that I would start are:
- HTML Living Standard (WHATWG)
- MDN web docs, and specifically the MDN HTML elements reference
Page Structure - header, main, footer
First, recall the basic document structure:
- html
  - head
    - title
    - meta
  - body
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Document Title</title>
</head>
<body>
<!-- content goes here -->
</body>
</html>
header, main, footer
MDN HTML elements reference: header, main, footer.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Document Title</title>
</head>
<body>
<header> <!-- page header --> </header>
<main> <!-- main content goes here --> </main>
<footer> <!-- page footer --> </footer>
</body>
</html>
HTML5 Document Template
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Document Title</title>
</head>
<body>
<header> <!-- page header --> </header>
<main> <!-- main content goes here --> </main>
<footer> <!-- page footer --> </footer>
</body>
</html>
File Management
For Class
- Create a directory or folder for your class work.
- Create a "playground" area where you can play.
- Assignments - unzip/extract the materials, then move into your class work folder
For Web Sites
- Use folders or directories to help organize files. The recommendation is to adopt folder names of styles (for CSS files), scripts (for JavaScript files), and images (for images).
- Use the index.html filename as appropriate
- Prefer filenames that contain only lowercase letters, numbers, underscores, or dashes (e.g. avoid spaces and other characters like !@#$%^&*(){}\|?/>)
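The filename recommendation can be turned into a quick check. This is a sketch only: the pattern is one possible reading of the guideline, the allowance for dots as extension separators (as in index.html) is an assumption, and the function name `is_web_friendly` is invented here.

```python
import re

# Assumed pattern: lowercase letters, digits, underscores, and dashes,
# with dots allowed only as separators (as in "index.html").
WEB_FRIENDLY = re.compile(r"[a-z0-9_-]+(\.[a-z0-9_-]+)*")

def is_web_friendly(filename: str) -> bool:
    """Return True if the whole filename matches the assumed pattern."""
    return WEB_FRIENDLY.fullmatch(filename) is not None

for name in ["index.html", "harvard-shield.png", "My Photo.JPG", "notes!.txt"]:
    print(name, is_web_friendly(name))
```

Names with spaces, uppercase letters, or punctuation fail the check; these are the names most likely to need escaping in URLs or to behave differently on a case-sensitive server.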
Relative URLs
URL: https://www.archives.gov/historical-docs/voting-rights-act
- Scheme: https
- Host: www.archives.gov
- Path: /historical-docs/voting-rights-act
Absolute and Relative Locations
- Where does https://summer.harvard.edu/ go to?
- How about /images/mug.png?
- What about ../styles/site.css?
Absolute and Relative Locations
Relative locations (URLs) are resolved according to the location (URL) of the containing (starting) document!
Absolute or Fully Qualified URLs
Absolute, or fully-qualified, URLs specify the complete information (scheme, host, port, path).
https://news.harvard.edu/gazette/story/2020/07/public-health-experts-unite-to-bring-clarity-to-coronavirus-response/
Relative or Partial URLs
Relative, or partial, URLs specify partial information. The information not provided is resolved from the current location.
<a href="slide2.html">Slide 2</a>
Relative to Server Root
Is this relative or absolute? The scheme, host, and port are resolved from the current location, but the path is absolute.
<a href="/copyright.html">copyright information</a>
Relative Paths to Parent Locations
../ refers to the parent directory
./ refers to the current directory
Location: https://www.madeupschool.edu/museums/index.html
Relative URL | Resolved URL |
---|---|
../index.html | https://www.madeupschool.edu/index.html |
../arts/index.html | https://www.madeupschool.edu/arts/index.html |
../images/museum_building.jpg | https://www.madeupschool.edu/images/museum_building.jpg |
Relative links are "transportable":
Containing Page: https://stage.madeupschool.edu/museums/index.html
Relative Link | Document |
---|---|
../index.html | https://stage.madeupschool.edu/index.html |
../arts/index.html | https://stage.madeupschool.edu/arts/index.html |
../images/museum_building.jpg | https://stage.madeupschool.edu/images/museum_building.jpg |
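The resolution rules above can be tried out with `urljoin` from Python's standard `urllib.parse` module, which resolves a relative reference against a containing page in much the way a browser does. The helper name `resolve_all` is an assumption for this sketch; the URLs are the made-up ones from the slides.

```python
from urllib.parse import urljoin

relative_links = [
    "../index.html",
    "../arts/index.html",
    "../images/museum_building.jpg",
]

def resolve_all(containing_page, links=relative_links):
    """Resolve each relative link against the URL of the containing page."""
    return [urljoin(containing_page, link) for link in links]

# The same relative links work on both hosts: that is what "transportable" means.
production = resolve_all("https://www.madeupschool.edu/museums/index.html")
staging = resolve_all("https://stage.madeupschool.edu/museums/index.html")

# A bare filename resolves within the current directory;
# a leading slash resolves from the server root.
page = "https://www.madeupschool.edu/museums/index.html"
same_directory = urljoin(page, "slide2.html")
from_server_root = urljoin(page, "/copyright.html")

for resolved in production + staging + [same_directory, from_server_root]:
    print(resolved)
```

Because only the containing page differs between the two calls, moving a site from a staging host to a production host requires no changes to its relative links.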
URL to Filename Mapping
User directories in a shared environment
Web documents for each user are kept in the user's home directory, in a directory typically named public_html. As an example, for the user jharvard, whose home directory is /home/courses/j/h/jharvard:
URI | https://cs12students.dce.harvard.edu/~jharvard/index.html |
File | /home/courses/j/h/jharvard/public_html/index.html |
Document Root
The Web documents are typically kept under a single directory, traditionally named htdocs. The full path to this directory is called the "document root" of the Web server, for example, /www/htdocs.
URI | https://www.unicorns-r-us.com/jobs/index.html |
File | /www/unicorns-r-us.com//jobs/index.html |
Directory Requests and "index.html"
When a URL path maps to a directory, the server typically returns that directory's index.html page. For example, the request:
http://www.madeupschool.edu/museums/
would return the index.html page in the museums directory.
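The three mapping rules in this section (user directories, document root, and directory requests) can be combined into one small function. This is a simplified sketch, not how any particular web server is configured: the function name `url_to_filename`, the `HOME_DIRS` lookup table, and the default document root `/www/htdocs` (taken from the example in the text) are assumptions, and the sketch does no security checking of paths containing `..`.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table of user home directories.
HOME_DIRS = {"jharvard": "/home/courses/j/h/jharvard"}

def url_to_filename(url, doc_root="/www/htdocs", home_dirs=HOME_DIRS):
    """Map a URL to the file a server might return (simplified sketch)."""
    path = urlsplit(url).path or "/"
    if path.startswith("/~"):
        # /~user/... maps into that user's public_html directory
        user, _, rest = path[2:].partition("/")
        base = home_dirs[user] + "/public_html"
        relative = rest
    else:
        # everything else maps under the document root
        base = doc_root
        relative = path.lstrip("/")
    if relative == "" or relative.endswith("/"):
        # a directory request returns that directory's index.html
        relative += "index.html"
    return f"{base}/{relative}"

print(url_to_filename("https://cs12students.dce.harvard.edu/~jharvard/index.html"))
print(url_to_filename("http://www.madeupschool.edu/museums/"))
```

Real servers make each of these rules configurable (the name of the user directory, the document root, and the list of index filenames), so the defaults here are illustrative only.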