Aarhus University

Digital humanities and web archives: Possible new paths for combining datasets

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer review

Standard

Digital humanities and web archives : Possible new paths for combining datasets. / Brügger, Niels.

In: International Journal of Digital Humanities, 29.05.2021.

BibTeX

@article{6d289cbf655e4cdfbfd2ffd492b4a120,
title = "Digital humanities and web archives: Possible new paths for combining datasets",
abstract = "This article discusses the importance of web archives making their collections available as data and not only as sources seen through the Wayback Machine{\textquoteright}s interface where only individual web pages are displayed. This will help unlock the full potential of the treasure trove that web archives constitute, and thereby also open up for methods from the wider field of digital humanities. Based on a case study of the entire Danish web domain .dk the article discusses methodological challenges involved in combining large extracted datasets from web archives, namely metadata about the size of websites and data about hyperlinks from the same websites. The aim is to answer the following two questions: 1) How to combine two different types of datasets extracted from a web archive, in this case the Danish Netarkivet? 2) What can the result of such a combination teach us about the structural characteristics of the Danish web domain from 2006 to 2015? The article shows that, indeed, it is possible to go beyond the Wayback Machine as the prime interface to web archives by combining two distinct datasets, and that such a venture can provide valuable knowledge about the overall structure of the Danish web domain, thus highlighting that websites of the same size tend to constitute isolated {\textquoteleft}link islands{\textquoteright}, and that big websites are also the most important in the hyperlink network, which is more clearly the case in 2015 than in 2006.",
keywords = "webarkiv, metadata, hyperlink, netv{\ae}rk, nationalt webdom{\ae}ne, danmark, web archive, metadata, hyperlink, network, national web domain, denmark",
author = "Niels Br{\"u}gger",
year = "2021",
month = may,
day = "29",
doi = "10.1007/s42803-021-00038-z",
language = "English",
journal = "International Journal of Digital Humanities",
issn = "2524-7840",
publisher = "Springer",

}

RIS

TY - JOUR

T1 - Digital humanities and web archives: Possible new paths for combining datasets

AU - Brügger, Niels

PY - 2021/5/29

Y1 - 2021/5/29

N2 - This article discusses the importance of web archives making their collections available as data and not only as sources seen through the Wayback Machine’s interface where only individual web pages are displayed. This will help unlock the full potential of the treasure trove that web archives constitute, and thereby also open up for methods from the wider field of digital humanities. Based on a case study of the entire Danish web domain .dk the article discusses methodological challenges involved in combining large extracted datasets from web archives, namely metadata about the size of websites and data about hyperlinks from the same websites. The aim is to answer the following two questions: 1) How to combine two different types of datasets extracted from a web archive, in this case the Danish Netarkivet? 2) What can the result of such a combination teach us about the structural characteristics of the Danish web domain from 2006 to 2015? The article shows that, indeed, it is possible to go beyond the Wayback Machine as the prime interface to web archives by combining two distinct datasets, and that such a venture can provide valuable knowledge about the overall structure of the Danish web domain, thus highlighting that websites of the same size tend to constitute isolated ‘link islands’, and that big websites are also the most important in the hyperlink network, which is more clearly the case in 2015 than in 2006.

AB - This article discusses the importance of web archives making their collections available as data and not only as sources seen through the Wayback Machine’s interface where only individual web pages are displayed. This will help unlock the full potential of the treasure trove that web archives constitute, and thereby also open up for methods from the wider field of digital humanities. Based on a case study of the entire Danish web domain .dk the article discusses methodological challenges involved in combining large extracted datasets from web archives, namely metadata about the size of websites and data about hyperlinks from the same websites. The aim is to answer the following two questions: 1) How to combine two different types of datasets extracted from a web archive, in this case the Danish Netarkivet? 2) What can the result of such a combination teach us about the structural characteristics of the Danish web domain from 2006 to 2015? The article shows that, indeed, it is possible to go beyond the Wayback Machine as the prime interface to web archives by combining two distinct datasets, and that such a venture can provide valuable knowledge about the overall structure of the Danish web domain, thus highlighting that websites of the same size tend to constitute isolated ‘link islands’, and that big websites are also the most important in the hyperlink network, which is more clearly the case in 2015 than in 2006.

KW - webarkiv

KW - metadata

KW - hyperlink

KW - netværk

KW - nationalt webdomæne

KW - danmark

KW - web archive

KW - metadata

KW - hyperlink

KW - network

KW - national web domain

KW - denmark

U2 - https://doi.org/10.1007/s42803-021-00038-z

DO - 10.1007/s42803-021-00038-z

M3 - Journal article

JO - International Journal of Digital Humanities

JF - International Journal of Digital Humanities

SN - 2524-7840

ER -
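The combination described in the abstract, which joins metadata about the size of websites with hyperlink data from the same websites, can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the article's method: the toy domains and sizes, the single size threshold, the two-class split, and the use of mean in-degree as a stand-in for a site's importance in the hyperlink network are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy data, NOT taken from Netarkivet: website -> size
# (e.g. number of archived objects). Names and values are invented.
SIZES = {"a.dk": 10, "b.dk": 12, "c.dk": 5000, "d.dk": 8000, "e.dk": 15}

# Hypothetical hyperlink dataset: (source website, target website) pairs.
LINKS = [
    ("a.dk", "b.dk"), ("b.dk", "a.dk"), ("e.dk", "a.dk"),
    ("c.dk", "d.dk"), ("d.dk", "c.dk"), ("a.dk", "c.dk"), ("e.dk", "d.dk"),
]


def size_class(size: int, threshold: int = 1000) -> str:
    """Assumed two-way split into small and big sites; the threshold is arbitrary."""
    return "small" if size < threshold else "big"


def combine(sizes: dict[str, int], links: list[tuple[str, str]]) -> list[tuple[str, str, str, str]]:
    """Join each link with the size class of its source and target website.

    Links whose source or target has no size metadata are dropped.
    """
    combined = []
    for src, dst in links:
        if src in sizes and dst in sizes:
            combined.append((src, dst, size_class(sizes[src]), size_class(sizes[dst])))
    return combined


def same_class_share(combined: list[tuple[str, str, str, str]]) -> float:
    """Fraction of links that stay within one size class (a 'link island' signal)."""
    if not combined:
        return 0.0
    same = sum(1 for _, _, src_cls, dst_cls in combined if src_cls == dst_cls)
    return same / len(combined)


def mean_indegree_by_class(combined: list[tuple[str, str, str, str]],
                           sizes: dict[str, int]) -> dict[str, float]:
    """Mean number of incoming links per website, grouped by size class."""
    indegree = Counter(dst for _, dst, _, _ in combined)
    per_class = defaultdict(list)
    for site, size in sizes.items():
        per_class[size_class(size)].append(indegree[site])
    return {cls: sum(vals) / len(vals) for cls, vals in per_class.items()}


if __name__ == "__main__":
    rows = combine(SIZES, LINKS)
    print("same-class share:", round(same_class_share(rows), 3))
    print("mean in-degree by class:", mean_indegree_by_class(rows, SIZES))
```

With the toy data, five of the seven links stay within a size class, and the two big sites receive more incoming links on average than the three small ones. Joining on the website key is the step that makes the two datasets comparable; dropping links to sites without size metadata is one simplifying choice in this sketch, and a study of real archive data would need to state how it handles such gaps.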