
Digital humanities and web archives: Possible new paths for combining datasets

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; Peer-reviewed

This article discusses the importance of web archives making their collections available as data and not only as sources seen through the Wayback Machine’s interface, where only individual web pages are displayed. Doing so will help unlock the full potential of the treasure trove that web archives constitute, and thereby also open the way for methods from the wider field of digital humanities. Based on a case study of the entire Danish web domain .dk, the article discusses methodological challenges involved in combining large extracted datasets from web archives, namely metadata about the size of websites and data about hyperlinks from the same websites. The aim is to answer the following two questions: 1) How can two different types of datasets extracted from a web archive, in this case the Danish Netarkivet, be combined? 2) What can the result of such a combination teach us about the structural characteristics of the Danish web domain from 2006 to 2015? The article shows that it is indeed possible to go beyond the Wayback Machine as the prime interface to web archives by combining two distinct datasets, and that such a venture can provide valuable knowledge about the overall structure of the Danish web domain. In particular, it highlights that websites of the same size tend to constitute isolated ‘link islands’ and that big websites are also the most important in the hyperlink network, a pattern that is clearer in 2015 than in 2006.
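
To make the idea of combining the two kinds of data more concrete, the following is a minimal sketch and not the article's actual method. It assumes that both datasets can be keyed by domain name; the toy domains, the sizes, the 1 MB threshold, and the helper names `size_class` and `combine` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data; the real Netarkivet extracts are not reproduced here.
# Dataset 1: metadata about website size (domain -> size in bytes).
site_sizes = {
    "a.dk": 5_000_000,
    "b.dk": 4_000_000,
    "c.dk": 20_000,
    "d.dk": 10_000,
}

# Dataset 2: hyperlinks between the same websites, as (source, target) pairs.
links = [
    ("c.dk", "a.dk"),
    ("d.dk", "a.dk"),
    ("b.dk", "a.dk"),
    ("a.dk", "b.dk"),
    ("c.dk", "d.dk"),
]


def size_class(size_bytes, threshold=1_000_000):
    """Assign a coarse size class; the threshold is an arbitrary assumption."""
    return "big" if size_bytes >= threshold else "small"


def combine(site_sizes, links):
    """Join the two datasets on domain name.

    Returns (in_degree, class_pairs):
      in_degree   - number of inbound links per domain
      class_pairs - link counts keyed by (source class, target class)
    Links whose endpoints lack size metadata are skipped.
    """
    in_degree = defaultdict(int)
    class_pairs = defaultdict(int)
    for source, target in links:
        if source not in site_sizes or target not in site_sizes:
            continue
        in_degree[target] += 1
        pair = (size_class(site_sizes[source]), size_class(site_sizes[target]))
        class_pairs[pair] += 1
    return dict(in_degree), dict(class_pairs)


in_degree, class_pairs = combine(site_sizes, links)
print(in_degree)
print(class_pairs)
```

In a sketch like this, the inbound-link counts give a rough indication of which sites are central in the hyperlink network, and the counts per class pair indicate how often links stay within one size class or cross between classes.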
Journal: International Journal of Digital Humanities
Pages (from-to): 145-168
Number of pages: 24
Status: Published - Nov. 2021


  • web archive, metadata, hyperlink, network, national web domain, Denmark


ID: 216608491