Big Data in the Arts and Humanities: Some Arts and Humanities Research Council Projects
‘Big Data’ as a shorthand term describes the recent explosion in digital data and the new computing methods developed to manage and exploit these vast amounts of data. Google indexes over 20 billion pages a day; Visa processes 300 million transactions a day; the Large Hadron Collider generated over 200 petabytes of data in the search for the Higgs Boson, while the Square Kilometre Array radio telescope will from 2024 produce over four times as much data as is currently carried on the World Wide Web. We usually assume that Big Data is chiefly connected with science or commerce, but arts and humanities researchers are also exploring the problems and opportunities offered by the exponential growth in digital data.
The datasets used by arts and humanities researchers are starting to display the classic Big Data characteristics of ‘high volume, high velocity, and/or high variety information assets that require new forms of processing’. The Holocaust Survivor Testimonials Collections by the Shoah Foundation contains about 200 terabytes of data, comparable in scale to such early examples of scientific Big Data as the Sloan Digital Sky Survey. Historians researching the presidency of George W. Bush are fortunate to have access to the e-mail archive of the Presidential office, but they will require new methods to analyse and explore the 200 million e-mails in the Bush archive. Linguistic corpora have for a long time been of great importance to researchers working on language and literature, but new web-based corpora, such as the American English dataset from Google Books containing 200 billion words, pose challenging problems of scale and variability. The World Wide Web itself is increasingly a fundamental resource for studying culture, history and literature, but how do we archive and analyse the web for such studies?
Recognising the urgency and significance of these issues for arts and humanities researchers, the Arts and Humanities Research Council (AHRC), the UK funding agency supporting arts and humanities research, launched calls in 2013 for proposals for research on Big Data in the arts and humanities as part of its strategic theme of ‘Digital Transformations’. In a number of the projects, academic researchers engaged in a process of co-production of research with community partners, as part of the Research Councils UK ‘Connected Communities’ programme. This small booklet provides an overview of the exciting and wide-ranging research undertaken by these projects.
Some of the projects described here engaged in research which will help shape future computing environments, such as a world wide web which can more easily understand the different meanings of words or can read music. Many of the largest data sets with which arts and humanities researchers deal consist of unstructured data such as images, films and sound, and a number of the projects funded by the AHRC explored ways in which researchers can more easily annotate, document and explore such multimedia files. The tools and methods developed by these projects will not only enable both academic researchers and the public to more easily explore our cultural heritage but will also help support innovation in the digital and creative economies.
Much of the enthusiasm for Big Data derives from the way in which techniques such as linking, geolocation, network analysis and visualisation can be used to undertake analyses that were not previously feasible, such as the prediction of pandemics or anticipating the properties of new materials. The AHRC projects illustrate how these methods can also be used to explore major issues in the Arts and Humanities such as reconstituting ancient populations, understanding the effect of plague on cultural activity or the behaviour of voters in elections. These methods can also be used to help capture the multi-vocal and richly layered cultural history of our cities, and a number of the web resources and apps produced by the AHRC projects illustrate how such community memories and engagement can interact with and enhance urban regeneration.
The use of predictive techniques by commercial and retail organisations to target advertising campaigns has been a controversial aspect of Big Data. Frequently, such predictive analytics can seem de-humanising. Some of the AHRC projects explored ways in which such probability-based methods can assist the everyday citizen in understanding and engaging with such huge databases as the UK legislation database. Others showed how probabilistic methods could be used to recapture the individual, as for example when they are used to trace obscure individuals in the National Archives. In these ways, Big Data methods can be used to re-humanise the environment, recording the memories of a local community about the history of a holly tree or adding depth to a poetry archive by incorporating the memories of the poets themselves. The AHRC projects illustrate how the memories enshrined in local museums, libraries and archives could become enriched by engagement with local communities who become co-curators in the enterprise, reshaping and extending our understanding of the process by which such institutions engage with our memories.
The scale and sophistication of such developments as predictive analytics or the internet of things make them seem remote and threatening to many local communities. If we are to fully exploit and benefit from the potential of such developments, we need to ensure that local communities are full partners in them. For a resident in a care home, the internet of things may seem an alien concept, but they can use such technologies to pass on their memories and maintain a sense of continuing identity and worth. The most striking aspect of all the AHRC Big Data projects was the vibrancy and richness of the community co-production work. Whether working with young coders in investigating the personal data produced by smartphones, preparing print-on-demand books for the elderly describing their life story, or encouraging young carers to use an online archive of ornithology, the most important message conveyed by the AHRC researchers is that it was the engagement with the community which was the most fascinating and rewarding aspect of the project.
The arts and humanities have a great deal to offer a Big Data economy and society. Among the many important contributions they can make are a strong awareness of the significance and cultural contexts of design and visualisation, a profound sense of the way in which data is culturally and socially situated, an awareness that data is never ‘raw’, and an ability to move between macro and micro perspectives. Arts and humanities researchers are inherently distrustful of grand sweeping narratives, and that critical eye will be essential as the world becomes more infatuated with data. But the most important contribution that the arts and humanities are able to make is that, by working with communities through shared memories and cultures, they will help to ensure that a Big Data society is one which retains a human perspective.
In a spirited intervention at a panel during the Newcastle Literary Festival, part of the Poetics of an Archive project, the poet and library curator Richard Price explained how a vibrant local poetry scene can contribute to a successful urban economy. The AHRC Big Data projects also showed how the spirit of poetry can contribute to a successful data economy.
Andrew Prescott
University of Glasgow
AHRC Theme Leader Fellow for Digital Transformations