  • Publication
    Novel2Vec: Characterising 19th Century Fiction via Word Embeddings
    Recently, considerable attention has been paid to word embedding algorithms inspired by neural network models. Given a large textual corpus, these algorithms attempt to derive a set of vectors which represent the corpus vocabulary in a new embedded space. This representation can provide a useful means of measuring the underlying similarity between words. Here we investigate this property in the context of annotated texts of 19th-century fiction by the authors Jane Austen, Charles Dickens, and Arthur Conan Doyle. We demonstrate that building word embeddings on these texts can provide us with an insight into how characters group differently under different conditions, allowing us to make comparisons across different novels and authors. These results suggest that word embeddings can potentially provide a useful tool in supporting quantitative literary analysis.
  • Publication
    Navigating Literary Text with Word Embeddings and Semantic Lexicons
    Word embeddings represent a powerful tool for mining the vocabularies of literary and historical text. However, there is little research demonstrating appropriate strategies for representing text and setting parameters when constructing embedding models within a digital humanities context. In this paper we examine the effects of these choices using a case study involving 18th and 19th century texts from the British Library. The study demonstrates the importance of examining implicit assumptions around default strategies when using embeddings with literary texts, and highlights the potential of quantitative analysis to inform critical analysis.
  • Publication
    Curatr: A Platform for Exploring and Curating Historical Text Corpora
    The increasing availability of digital collections of historical texts presents a wealth of possibilities for new research in the humanities. However, the scale and heterogeneity of such collections raise significant challenges when researchers attempt to find and extract relevant content. This work describes Curatr, an online platform that incorporates domain expertise and methods from machine learning to support the exploration and curation of large historical corpora. We discuss the use of this platform in making the British Library Digital Corpus of 18th and 19th century books more accessible to humanities researchers.
  • Publication
    Identifying representative textual sources in blog networks
    (University College Dublin. School of Computer Science and Informatics, 2011-02)
    We apply methods from social network analysis and visualization to facilitate a study of the Irish blogosphere from a cultural studies perspective. We focus on solving the practical issues that arise when the goal is to perform textual analysis of the corpus produced by a network of bloggers. Previous studies into blogging networks have noted difficulties arising when trying to identify the extent and boundaries of these networks. As a response to calls for increasingly data-led approaches in media and cultural studies, we discuss a variety of social network analysis methods that can be used to identify which blogs can be seen as members of a posited "Irish blogging network". We identify hub blogs, communities of sites corresponding to different topics, and representative bloggers within these communities. Based on this study, we propose a set of analysis guidelines for researchers who wish to map out blogging networks.
  • Publication
    Exploring the Role of Gender in 19th Century Fiction Through the Lens of Word Embeddings
    Within the last decade, substantial advances have been made in the field of computational linguistics, due in part to the evolution of word embedding algorithms inspired by neural network models. These algorithms attempt to derive a set of vectors which represent the vocabulary of a textual corpus in a new embedded space. This new representation can then be used to measure the underlying similarity between words. In this paper, we explore the role an author's gender may play in the selection of words that they choose to construct their narratives. Using a curated corpus of forty-eight 19th century novels, we generate, visualise, and investigate word embedding representations using a list of gender-encoded words. This allows us to explore the different ways in which male and female authors of this corpus use terms relating to contemporary understandings of gender and gender roles.
  • Publication
    Mitigating Gender Bias in Machine Learning Data Sets
    Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact in the context of search and recommender systems.
      Scopus© Citations 24
  • Publication
    Windows on Waverley: exploring the effect of variations in the construction of literary social networks
    In recent years, social network analysis (SNA) has become increasingly popular as a quantitative approach to the examination of literary works, allowing researchers to generate abstract models of character groupings and interactions that appear in texts, and providing new opportunities for the evaluation of theories about communities and societies in literature. The social networks that are generated for a given novel, however, will differ considerably depending on what choices are made in relation to their construction: what types of interactions or co-occurrences are examined, what characters or other entities are considered, whether full texts or subsections such as chapters are investigated, and what automated methods are utilised for extracting character data, among others. This paper examines the effect of varying one specific aspect of network construction, by applying different "sliding window" strategies in order to create variations on social networks in three rather different early 19th-century novels: Pride and Prejudice (1813), Waverley (1814), and Frankenstein (1818). Three window strategies (collinear, co-planar and combination) are discussed, each of which captures qualitatively different social links between characters. We argue that the resulting networks yield different insights into a variety of aspects of the novels' construction, including narrative style and interactions between characters of different social class. We also suggest that rather than seeking to determine a single best-practice methodology for literary SNA, it may instead be illuminating to experiment with different approaches to the modelling of literary texts as social networks.
  • Publication
    Discovering Structure in Social Networks of 19th Century Fiction
    Inspired by the increasing availability of large text corpora online, digital humanities scholars are adopting computational approaches to explore questions in the field of literature from new perspectives. In this paper, we examine detailed social networks of characters, extracted from several works of 19th century fiction by Jane Austen and Charles Dickens. This allows us to apply methodologies from social network analysis, such as community detection, to explore the structure of these networks. By evaluating the results in collaboration with literary scholars, we find that the structure of the character networks can reveal underlying structural aspects within a novel, particularly in relation to plot and characterisation.
      Scopus© Citations 10