LINK ROT

Link rot is the process by which links on a website gradually become irrelevant or broken as time goes on, because websites that they link to disappear, change their content or redirect to new locations.

The phrase also describes the effects of failing to update webpages so that they become out-of-date, containing information that is old and useless, and that clutters up search engine results. This process most frequently occurs in personal homepages and is prevalent in free webhosts such as GeoCities, where there is no financial incentive to fix link rot.

Discovering

Detecting link rot for a given URL can be difficult with automated methods. If a URL is requested and returns an HTTP 200 (OK) response, it may be considered accessible, but the contents of the page may have changed and may no longer be relevant. Some web servers also return a soft 404: a page served with a 200 (OK) response, instead of a 404 (Not Found), whose content indicates that the URL is no longer accessible. Bar-Yossef et al. (2004) developed a heuristic for automatically discovering soft 404s.
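
The distinction between a hard failure and a soft 404 can be illustrated with a small probe. The sketch below is a simplified illustration and not the published heuristic of Bar-Yossef et al.: it assumes that a server which answers 200 with an identical body for a randomly named sibling URL is serving an error page. The names classify_link and http_fetch are hypothetical, and the exact body comparison is a deliberately crude stand-in for a real similarity measure.

```python
import urllib.error
import urllib.request
import uuid
from typing import Callable, Tuple
from urllib.parse import urljoin

# A fetcher takes a URL and returns (HTTP status code, response body).
Fetcher = Callable[[str], Tuple[int, str]]


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch a URL with the standard library (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 410.
        return err.code, ""
    except OSError:
        # DNS failure, refused connection, timeout: report as status 0.
        return 0, ""


def classify_link(url: str, fetch: Fetcher = http_fetch) -> str:
    """Return 'broken', 'soft-404' or 'ok' for the given URL."""
    status, body = fetch(url)
    if status != 200:
        return "broken"
    # Probe a sibling URL whose random name almost certainly does not exist.
    probe_url = urljoin(url, uuid.uuid4().hex)
    probe_status, probe_body = fetch(probe_url)
    # Assumption: an identical 200 response for a nonsense URL means the
    # server answers everything with the same error page.
    if probe_status == 200 and probe_body == body:
        return "soft-404"
    return "ok"
```

Passing the fetcher as a parameter keeps the classification logic testable without network access. Note that a result of 'ok' only means the URL responds with its own content; it cannot tell whether that content is still relevant.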

Combating

Webmasters

A number of basic rules can help webmasters to reduce link rot, including:

  • Do not keep a hyperlink collection unless you are willing to look after it.
  • Design your hyperlinks so that they are easy to maintain, for example by keeping them in a central hyperlink collection.
  • Do not link to sub-pages ("deep linking") unless you are confident that they will remain stable.
  • Use hyperlink checker software or a Content Management System (CMS) with link checking included.
  • Use permalinks.
  • Put a working e-mail address or other contact information on the same page as the links, with specific instructions ("Found a bad link? Contact links@example.com and we'll fix it.").
  • When changing domains, help others fix their link pages by announcing the change well ahead of the migration, and use HTTP status codes to communicate that a page has moved (e.g. "301: Moved Permanently").
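
The last rule can be sketched with Python's standard http.server module. This is a minimal illustration, not a production setup: the MOVED table, its paths and the new.example.com host are hypothetical, and a real site would normally configure such redirects in its web server instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Optional, Tuple

# Hypothetical mapping from old paths to their new locations.
MOVED: Dict[str, str] = {
    "/links.html": "https://new.example.com/resources/links.html",
}


def redirect_for(path: str, moved: Dict[str, str] = MOVED) -> Tuple[int, Optional[str]]:
    """Return (301, new URL) for a moved path, or (404, None) otherwise."""
    target = moved.get(path)
    if target is not None:
        return 301, target
    return 404, None


class MovedHandler(BaseHTTPRequestHandler):
    """Answers every request with a 301 redirect or a 404."""

    def do_GET(self) -> None:
        # self.path includes any query string; this sketch matches it verbatim.
        status, location = redirect_for(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.end_headers()

    # Link checkers often use HEAD, so answer it the same way.
    do_HEAD = do_GET


def serve(port: int = 8000) -> None:
    """Run the redirect server on localhost until interrupted."""
    HTTPServer(("localhost", port), MovedHandler).serve_forever()
```

Calling serve() on the old host and requesting /links.html would return a 301 response whose Location header points at the new address, which browsers and link checkers can follow automatically.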

Authors citing URLs

A number of studies have shown how widespread link rot is in academic literature (see below). Authors of scholarly publications should avoid citing "unstable" Internet references. There are several approaches authors may take to avoid introducing link rot into their work:

Tools

There are a number of tools that can be used to combat link rot by archiving web resources:

  • WebCite, a tool specifically for scholarly authors, journal editors and publishers to permanently archive "on-demand" and retrieve cited Internet references (Eysenbach and Trudel, 2005).
  • Archive-It, a subscription service that allows institutions to build, manage and search their own web archive.
  • hanzo:web, a personal web archiving service created by Hanzo Archives that can archive a single web resource, a cluster of web resources, or an entire website, as a one-off collection, a scheduled or repeated collection, an RSS/Atom feed collection, or an on-demand collection via Hanzo's open API.
  • Spurl.net is a free on-line bookmarking service and search engine that allows users to save important web resources.

Modern management

On Wikipedia and other wiki-based websites, only external links still present a maintenance problem. Wikipedia uses a clear color system for internal links, so the user can see whether a link is live before clicking on it.
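
The idea behind such a color system can be sketched as follows: because a wiki knows which of its own pages exist, it can mark each internal link when the page is rendered. This is a hedged illustration and not Wikipedia's actual implementation; the [[Title]] syntax handling is simplified, and the CSS class names "existing" and "missing" and the /wiki/ URL scheme are assumptions made for the example.

```python
import html
import re
from typing import Set
from urllib.parse import quote

# Matches simple internal links of the form [[Page title]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def render_links(text: str, existing_pages: Set[str]) -> str:
    """Replace [[Title]] with an anchor whose class shows whether the page exists."""

    def replace(match: "re.Match[str]") -> str:
        title = match.group(1).strip()
        # The class is decided at render time, so readers see dead
        # internal links before they click on them.
        css = "existing" if title in existing_pages else "missing"
        href = "/wiki/" + quote(title.replace(" ", "_"))
        return f'<a class="{css}" href="{href}">{html.escape(title)}</a>'

    return WIKI_LINK.sub(replace, text)
```

A stylesheet could then color the two classes differently. External links cannot be handled this way, because the wiki has no local record of whether the remote page still exists.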

In academic citations

A number of studies have been performed showing the prevalence of link rot in academic literature:

References
