
Niels Brügger

Defining a National Web Sphere over time from the Perspectives of Collection, Technology and Scholarship

Research output: Contribution to conference, Paper, Research, peer-review

  • Eld Zierau, Det Kongelige Bibliotek, Denmark
  • Niels Brügger
  • Jakob Moesgaard, Det Kongelige Bibliotek, Denmark
This paper describes a framework for defining how to automatically identify national webpages outside a country’s top-level domain. The framework aims at a definition that can be put into operation to detect national webpages automatically. At the same time, it aims at a definition that can be reused regardless of changes in online behaviour, jurisdiction and technology. A crucial point of the framework is that the perspectives of collection, technology and scholarship are all present in decision making.
The framework originates from a study that aimed to evaluate two different strategies for automatic identification of national webpages outside a country’s top-level domain: one strategy was based on data from the Internet Archive’s wide_005 worldwide web crawl, and the other on a local web crawl based on bulk harvests from the Danish national web archive, Netarkivet. In both cases, however, a definition of national webpages was needed. Thus the creation of the framework was a prerequisite for the rest of the study.
The study and the framework are motivated by the fact that human communication activities are moving more and more onto the internet. This means that much present and future research on 21st-century information flow depends on optimised collection and archiving of such information in web archives. Web archives often reside within national cultural heritage institutions, which regularly have a collection scope outlined in some form of legal deposit legislation.
The challenge of defining “national webpages” turned out to be far from trivial, and in creating the framework it quickly became obvious that such a definition requires input from three important perspectives in order to make qualified decisions. In this paper, the definition is based on input from three fields, each represented by one of the authors: scholarship, the Danish web archive, and computer science. These correspond to the perspectives of scholarship, collection and technology, which are all very different but also crucial when formulating a definition of national webpages that is the basis for actual collection and consequently forms a web archive.
Besides the non-trivial need for a definition of “national webpages”, the study also found reason to argue that web collection strategies within a web archive must be adjusted repeatedly, because the conditions for web collection are constantly changing. Even over a five-year period we see changes in the technology that can assist in collection, changes in human behaviour as activity moves away from countries’ top-level domains and onto .com, .org etc., and changes in jurisdiction that influence the way the web can be collected. Regular adjustments of what counts as national webpages are therefore likely to be needed. Accordingly, the presented framework consists of a list of general criteria as a basis for adjusting web collection strategies, which can be made operational in a specific context taking the three perspectives into account.
Original language: English
Publication year: 10 Jun 2015
Number of pages: 7
Publication status: Published - 10 Jun 2015
Event: Web Archives as scholarly Sources: Issues, Practices and Perspectives - Aarhus University, Aarhus, Denmark
Duration: 8 Jun 2015 - 10 Jun 2015


Conference: Web Archives as scholarly Sources: Issues, Practices and Perspectives
Location: Aarhus University


ID: 96106758