  • Publication
    Mining the Cultural Memory of Irish Industrial Schools Using Word Embedding and Text Classification
    The Industrial Memories project aims for new distant (i.e., text analytic) and close readings (i.e., witnessing) of the 2009 Ryan Report, the report of the Irish Government’s investigation into abuse at Irish Industrial Schools. The project has digitised the Report and used techniques such as word embedding and automated text classification using machine learning to re-present the Report’s key findings in novel ways that better convey its contents. The Ryan Report exposes the horrific details of systematic abuse of children in Irish industrial schools between 1920 and 1990. It contains 2,600 pages with over 500,000 words detailing evidence from the 9-year-long investigation. However, the Report’s narrative form and its sheer length effectively make many of its findings quite opaque. The Industrial Memories project uses text analytics to examine the language of the Report, to identify recurring patterns and extract key findings. The project represents the Report via an exploratory web-based interface that supports further analysis of the text. The methodology outlined is scalable and suggests new approaches to such voluminous state documents.
      142
  • Publication
    Systems in Language: Text Analysis of Government Reports of the Irish Industrial School System with Word Embedding
    (Oxford University Press, 2019-06-03)
    Industrial Memories is a digital humanities initiative to supplement close readings of a government report with new distant readings, using text analytics techniques. The Ryan Report (2009), the official report of the Commission to Inquire into Child Abuse (CICA), details the systematic abuse of thousands of children from 1936 to 1999 in residential institutions run by religious orders and funded and overseen by the Irish State. Arguably, the sheer size of the Ryan Report, over 1 million words, warrants a new approach that blends close readings to witness its findings, with distant readings that help surface system-wide findings embedded in the Report. Although CICA has been lauded internationally for its work, many have critiqued the narrative form of the Ryan Report, for obfuscating key findings and providing poor systemic, statistical summaries that are crucial to evaluating the political and cultural context in which the abuse took place (Keenan, 2013, Child Sexual Abuse and the Catholic Church: Gender, Power, and Organizational Culture. Oxford University Press). In this article, we concentrate on describing the distant reading methodology we adopted, using machine learning and text-analytic methods, and report on what they surfaced from the Report. The contribution of this work is threefold: (i) it shows how text analytics can be used to surface new patterns, summaries and results that were not apparent via close reading, (ii) it demonstrates how machine learning can be used to annotate text by using word embedding to compile domain-specific semantic lexicons for feature extraction and (iii) it demonstrates how digital humanities methods can be applied to an official state inquiry with social justice impact.
      320
  • Publication
    Mitigating Gender Bias in Machine Learning Data Sets
    Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness to machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact in the context of search and recommender systems.
      268 Scopus© Citations 24
  • Publication
    Uncovering gender bias in newspaper coverage of Irish politicians using machine learning
    (Oxford University Press, 2018-06-09)
    This article presents a text-analytic approach to analysing media content for evidence of gender bias. Irish newspaper content is examined using machine learning and natural language processing techniques. Systematic differences in the coverage of male and female politicians are uncovered, and these differences are analysed for evidence of gender bias. A corpus of newspaper coverage of politicians over a 15-year period was created. Features of the text were extracted and patterns differentiating coverage of male and female politicians were identified using machine learning. Discriminative features were then analysed for evidence of gender bias. Findings showed evidence of gender bias in how female politicians were portrayed, the policies they were associated with, and how they were evaluated. This research also sets out a methodology whereby natural language processing and machine learning can be used to identify gender bias in media coverage of politicians.
      967 Scopus© Citations 12
  • Publication
    Data, Power and Bias in Artificial Intelligence
    Artificial Intelligence has the potential to exacerbate societal bias and set back decades of advances in equal rights and civil liberty. Data used to train machine learning algorithms may capture social injustices, inequality or discriminatory attitudes that may be learned and perpetuated in society. Attempts to address this issue are rapidly emerging from different perspectives involving technical solutions, social justice and data governance measures. While each of these approaches is essential to the development of a comprehensive solution, the discourse associated with each often seems disparate. This paper reviews ongoing work to ensure data justice, fairness and bias mitigation in AI systems from different domains, exploring the interrelated dynamics of each and examining whether the inevitability of bias in AI training data may in fact be used for social good. We highlight the complexity associated with defining policies for dealing with bias. We also consider technical challenges in addressing issues of societal bias.
      235
  • Publication
    Re-reading the Ryan report: Witnessing via close and distant reading
    (Irish-American Cultural Institute, 2017-10-04)
    In the days following the publication of the Final Report of the Commission to Inquire into Child Abuse (2009), also known as the Ryan Report, there was widespread national and international public reaction to the conclusions of the report that over the course of nine decades abuse had been severe and systemic in the Irish residential-institution system for children run by the religious congregations of the Catholic church.
      492 Scopus© Citations 7
  • Publication
    Industrial Memories: Exploring the Findings of Government Inquiries with Neural Word Embedding and Machine Learning
    We present a text mining system to support the exploration of large volumes of text detailing the findings of government inquiries. Despite their historical significance and potential societal impact, key findings of inquiries are often hidden within lengthy documents and remain inaccessible to the general public. We transform the findings of the Irish government’s inquiry into industrial schools and, through the use of word embedding, text classification and visualization, present an interactive web-based platform that enables the exploration of the text in new ways to uncover new historical insights.
      488 Scopus© Citations 1
  • Publication
    Curatr: A Platform for Exploring and Curating Historical Text Corpora
    The increasing availability of digital collections of historical texts presents a wealth of possibilities for new research in the humanities. However, the scale and heterogeneity of such collections raise significant challenges when researchers attempt to find and extract relevant content. This work describes Curatr, an online platform that incorporates domain expertise and methods from machine learning to support the exploration and curation of large historical corpora. We discuss the use of this platform in making the British Library Digital Corpus of 18th and 19th century books more accessible to humanities researchers.
      182
  • Publication
    Navigating Literary Text with Word Embeddings and Semantic Lexicons
    Word embeddings represent a powerful tool for mining the vocabularies of literary and historical text. However, there is little research demonstrating appropriate strategies for representing text and setting parameters when constructing embedding models within a digital humanities context. In this paper we examine the effects of these choices using a case study involving 18th and 19th century texts from the British Library. The study demonstrates the importance of examining implicit assumptions around default strategies when using embeddings with literary texts, and highlights the potential of quantitative analysis to inform critical analysis.
      286