Collected website information from over 250 colleges (list from US News & World Report).
For each school, the following information was collected:
| Field | Example |
|---|---|
| URL | http://www.washington.edu/ |
| Title | University of Washington |
| Valid | valid |
| Errors | 0 |
| Doctype | -//W3C//DTD XHTML 1.0 Strict//EN |
| encoding | iso-8859-1 |
| error string | 0 error |
| URL_end | http://www.washington.edu/ |
| thumshot_url | http://open.thumbshots.org/image.pxf?url=http://www.washington.edu/ |
This was accomplished through the wonders of Perl and the XML output of the W3C Markup Validator (output=xml in the query string).
Note too that we are using Open Thumbshots, which allows us to grab a 120x90 pixel image of a URL.
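
As a rough illustration of the two request URLs involved, here is a minimal sketch in Python (not the Perl actually used). The validator endpoint `http://validator.w3.org/check` and its `uri` parameter are assumptions, since the text above only states that `output=xml` goes in the query string. The function names are hypothetical, and the thumbshot prefix is copied from the table row.

```python
from urllib.parse import urlencode

# Assumed endpoint of the W3C Markup Validator; the source only says
# that output=xml is passed in the query string.
VALIDATOR_ENDPOINT = "http://validator.w3.org/check"

# Prefix copied from the thumshot_url row of the table.
THUMBSHOT_PREFIX = "http://open.thumbshots.org/image.pxf?url="


def validator_url(site_url):
    """Build a validator request URL that asks for XML output."""
    query = urlencode({"uri": site_url, "output": "xml"})
    return VALIDATOR_ENDPOINT + "?" + query


def thumbshot_url(site_url):
    """Build the Open Thumbshots image URL (site URL left unencoded, as in the table)."""
    return THUMBSHOT_PREFIX + site_url


if __name__ == "__main__":
    site = "http://www.washington.edu/"
    print(validator_url(site))
    print(thumbshot_url(site))
```

Fetching and parsing the validator response is left out on purpose, because the layout of the XML output is not described here.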