  • Publication
    Curatr: A Platform for Exploring and Curating Historical Text Corpora
    The increasing availability of digital collections of historical texts presents a wealth of possibilities for new research in the humanities. However, the scale and heterogeneity of such collections raise significant challenges when researchers attempt to find and extract relevant content. This work describes Curatr, an online platform that incorporates domain expertise and methods from machine learning to support the exploration and curation of large historical corpora. We discuss the use of this platform in making the British Library Digital Corpus of 18th- and 19th-century books more accessible to humanities researchers.
  • Publication
    Novel2Vec: Characterising 19th Century Fiction via Word Embeddings
    Recently, considerable attention has been paid to word embedding algorithms inspired by neural network models. Given a large textual corpus, these algorithms attempt to derive a set of vectors which represent the corpus vocabulary in a new embedded space. This representation can provide a useful means of measuring the underlying similarity between words. Here we investigate this property in the context of annotated texts of 19th-century fiction by the authors Jane Austen, Charles Dickens, and Arthur Conan Doyle. We demonstrate that building word embeddings on these texts can provide us with an insight into how characters group differently under different conditions, allowing us to make comparisons across different novels and authors. These results suggest that word embeddings can potentially provide a useful tool in supporting quantitative literary analysis.
  • Publication
    Waking the Dead: Antigone, Ismene and Anne Enright's Narrators in Mourning
    (Irish Academic Press, 2011-10-31)
    Reflecting in 2008 on the link between her groundbreaking work on gender and her more recent work on war, Judith Butler proposed a relationship between liveable and grievable lives: 'it is very often a struggle to make certain kinds of lost life publicly grievable'. This essay takes Butler's exploration of the 'politics of mourning' as its starting place for a reading of The Gathering and of the short story, 'My Little Sister' from Taking Pictures.
  • Publication
    Navigating Literary Text with Word Embeddings and Semantic Lexicons
    Word embeddings represent a powerful tool for mining the vocabularies of literary and historical text. However, there is little research demonstrating appropriate strategies for representing text and setting parameters when constructing embedding models within a digital humanities context. In this paper we examine the effects of these choices using a case study involving 18th- and 19th-century texts from the British Library. The study demonstrates the importance of examining implicit assumptions around default strategies when using embeddings with literary texts, and highlights the potential of quantitative analysis to inform critical analysis.
  • Publication
    Mitigating Gender Bias in Machine Learning Data Sets
    Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact in the context of search and recommender systems.
  • Publication
    Long Day's Journey into Night: Modernism, Post-Modernism and Maternal Loss
    (Chelsea House, 2009)
    Long Day's Journey into Night may seem a strange starting place for a feminist analysis of modernism and post-modernism. Yet even the most conservative criticism reads this play as an enactment and embodiment of loss, specifically loss of the mother. That loss is rarely seen in the context of a more general "loss", a cultural loss of legitimacy and authenticity, endemic in and enabling modernism, articulated as "disinheritance" by an Other "coded as feminine."