Forever Knight Wiki

The Internet Archive is a non-profit digital library founded in 1996 by Brewster Kahle. It offers permanent storage and access to collections of digitized materials. In particular, its mandate includes the preservation of websites by taking regular "snapshots" of the World Wide Web. The data is collected by Alexa Internet, which repeatedly spiders the web to cache copies of sites.

The Wayback Machine (accessed at http://www.archive.org/) provides free public access to the Internet Archive's collection of these "snapshots", thus allowing users to see archived versions of web pages of the past. The site is named after the "WABAC Machine" from The Rocky and Bullwinkle Show cartoons of the 1960s (which in turn was a play on the Univac computer).[1]

Over the years, many fan websites have vanished from the open Web. However, it is often possible to use the Wayback Machine to retrieve a copy that has been recorded by the Internet Archive. In fact, the Wayback Machine can often provide not just a single copy of a given site but a years-long history, thus making it possible to track the changes in a site over time and determine the approximate date when it disappeared.
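
As a rough illustration of how such a lookup might be automated, the sketch below builds a query for the Wayback Machine's public availability endpoint and reads the closest snapshot out of the JSON it returns. The helper names (`availability_query`, `closest_snapshot`) and the sample response are hypothetical and made up for this example; the response shape shown is an assumption based on the endpoint's usual output and should be checked against a live reply before being relied on.

```python
import json
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD[hhmmss] date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) from a JSON reply, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


# Hypothetical reply, written by hand for illustration (not real data).
sample_reply = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "timestamp": "20010405120000",
            "url": "http://web.archive.org/web/20010405120000/http://example.com/",
            "status": "200",
        }
    }
})

query = availability_query("example.com", "20010101")
snapshot = closest_snapshot(sample_reply)
```

In practice the query URL would be fetched (for example with `urllib.request.urlopen`) and the body passed to `closest_snapshot`. Repeating the lookup with different timestamps is one way to step through a site's recorded history and narrow down when it vanished.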

Limitations

Although the Wayback Machine is invaluable, it is not infallible. Anyone wanting to use it should note the following:

  • Internet Archive policies respect "no robots" instructions in robots.txt files. If a site owner includes such an instruction, the Internet Archive will either not record the site or not display it.
  • The Internet Archive was founded in 1996.
    • Sites that disappeared prior to 1996 will obviously not be included.
    • Since its bots track sites through links, and originally worked outward from links known to the founders, many sites that disappeared shortly after the founding of the Internet Archive were also never included.
  • The bots cannot record a file unless they find it.
    • Sites that have never been linked to will never be recorded (at least, not unless the owner specifically puts in a request, which is now possible); and sites rarely linked to may also never be recorded.
    • Pages/graphics may not have been recorded if there was a typo in the link pointing to them.
    • In most cases, the bots record from the main directory through the links off the index page. If the site owner has left a dummy index page in situ and linked off a table of contents page with another name (e.g. "home.html"), then the internal pages may not be recorded.
    • Subsections of a site that were not linked into the index page may not have been recorded (though they may be, if they were tracked through an outside link from another website).
  • Because of size limitations, the bots have never recorded large files. What constitutes a "large" file has changed with time: for sites recorded in the early years of the Archive, size constraints were stricter than they are today. However, you will often find that the following have not been recorded:
    • zip files
    • clips from movies/TV
    • vids
    • sound captures
    • screen captures
    • fan art
    • background graphics
    • graphics (of all sorts)
    • longer fan fiction pages
  • In some cases, the bots do a shallow scan of the net that picks up the pages and graphics in the main directory, but not in subdirectories (or not in sub-subdirectories). As a result, files that have been organized into folders such as "graphics/" or "fanfic/" are not always recorded.
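
Several of the points above come down to the same thing: a page is only recorded if the bot can reach it by following links. The sketch below is a deliberately simplified model of that idea, not the Internet Archive's actual crawler. The function `reachable_pages`, the `max_depth` parameter, and the toy site map are all hypothetical, and the depth limit only stands in loosely for a "shallow scan".

```python
from collections import deque


def reachable_pages(links, start="index.html", max_depth=None):
    """Breadth-first walk of a link map, returning every page reached from `start`.

    `links` maps each page to the pages it links to. If `max_depth` is given,
    links are not followed from pages that are already that many hops away.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # shallow scan: record this page but go no deeper
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Toy fan site, invented for illustration.
site = {
    "index.html": ["episodes.html", "fanfic/index.html"],
    "episodes.html": ["index.html"],
    "fanfic/index.html": ["fanfic/story1.html"],
    "hidden/art.html": [],  # exists on the server, but nothing links to it
}

full_scan = reachable_pages(site)
shallow_scan = reachable_pages(site, max_depth=1)
never_found = set(site) - full_scan
```

Here `never_found` contains only the unlinked page, while the shallow scan also misses `fanfic/story1.html` because it sits two links away from the index page.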

References

  1. From the article, "Internet Archive", in Wikipedia.