  • Publication
    Novel2Vec: Characterising 19th Century Fiction via Word Embeddings
    Recently, considerable attention has been paid to word embedding algorithms inspired by neural network models. Given a large textual corpus, these algorithms attempt to derive a set of vectors which represent the corpus vocabulary in a new embedded space. This representation can provide a useful means of measuring the underlying similarity between words. Here we investigate this property in the context of annotated texts of 19th-century fiction by the authors Jane Austen, Charles Dickens, and Arthur Conan Doyle. We demonstrate that building word embeddings on these texts can provide us with an insight into how characters group differently under different conditions, allowing us to make comparisons across different novels and authors. These results suggest that word embeddings can potentially provide a useful tool in supporting quantitative literary analysis.
  • Publication
    Navigating Literary Text with Word Embeddings and Semantic Lexicons
    Word embeddings represent a powerful tool for mining the vocabularies of literary and historical text. However, there is little research demonstrating appropriate strategies for representing text and setting parameters when constructing embedding models within a digital humanities context. In this paper we examine the effects of these choices using a case study involving 18th and 19th century texts from the British Library. The study demonstrates the importance of examining implicit assumptions around default strategies when using embeddings with literary texts, and highlights the potential of quantitative analysis to inform critical analysis.
  • Publication
    Curatr: A Platform for Exploring and Curating Historical Text Corpora
    The increasing availability of digital collections of historical texts presents a wealth of possibilities for new research in the humanities. However, the scale and heterogeneity of such collections raise significant challenges when researchers attempt to find and extract relevant content. This work describes Curatr, an online platform that incorporates domain expertise and methods from machine learning to support the exploration and curation of large historical corpora. We discuss the use of this platform in making the British Library Digital Corpus of 18th and 19th century books more accessible to humanities researchers.
  • Publication
    Exploring the Role of Gender in 19th Century Fiction Through the Lens of Word Embeddings
    Within the last decade, substantial advances have been made in the field of computational linguistics, due in part to the evolution of word embedding algorithms inspired by neural network models. These algorithms attempt to derive a set of vectors which represent the vocabulary of a textual corpus in a new embedded space. This new representation can then be used to measure the underlying similarity between words. In this paper, we explore the role an author's gender may play in the selection of words that they choose to construct their narratives. Using a curated corpus of forty-eight 19th century novels, we generate, visualise, and investigate word embedding representations using a list of gender-encoded words. This allows us to explore the different ways in which male and female authors of this corpus use terms relating to contemporary understandings of gender and gender roles.
  • Publication
    Mitigating Gender Bias in Machine Learning Data Sets
    Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact in the context of search and recommender systems.
      Scopus© Citations 24
  • Publication
    Waking the Dead: Antigone, Ismene and Anne Enright's Narrators in Mourning
    (Irish Academic Press, 2011-10-31)
    Reflecting in 2008 on the link between her groundbreaking work on gender and her more recent work on war, Judith Butler proposed a relationship between liveable and grievable lives: 'it is very often a struggle to make certain kinds of lost life publicly grievable'. This essay takes Butler's exploration of the 'politics of mourning' as its starting place for a reading of The Gathering and of the short story, 'My Little Sister' from Taking Pictures.