  • Publication
    Lit@EVE: Explainable Recommendation based on Wikipedia Concept Vectors
    (Springer, 2017-12-30)
    We present an explainable recommendation system for novels and authors, called Lit@EVE, which is based on Wikipedia concept vectors. In this system, each novel or author is treated as a concept whose definition is extracted as a concept vector through the application of an explainable word embedding technique called EVE. Each dimension of the concept vector is labelled as either a Wikipedia article or a Wikipedia category name, making the vector representation readily interpretable. In order to recommend items, the Lit@EVE system uses these vectors to compute similarity scores between a target novel or author and all other candidate items. Finally, the system generates an ordered list of suggested items by showing the most informative features as human-readable labels, thereby making the recommendation explainable.
      441
  • Publication
    Detecting Attention Dominating Moments Across Media Types
    (CEUR Workshop Proceedings, 2016-03-20)
    In this paper we address the problem of identifying attention dominating moments in online media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse in diversity of content for a period of time. In this work we show that diversity at a topic level is effective for capturing this effect in blogs, in news articles, and on Twitter. The phenomenon is present in three distinctly different media types, each with their own unique features. We describe the phenomenon using case studies relating to major news stories from September 2015.
      180
  • Publication
    How Many Topics? Stability Analysis for Topic Models
    Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the over-clustering of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.
    Scopus© Citations 129  563
  • Publication
    Using crowdsourcing and active learning to track sentiment in online media
    Tracking sentiment in the popular media has long been of interest to media analysts and pundits. With the availability of news content via online syndicated feeds, it is now possible to automate some aspects of this process. There is also great potential to crowdsource much of the annotation work that is required to train a machine learning system to perform sentiment scoring. We describe such a system for tracking economic sentiment in online media that has been deployed since August 2009. It uses annotations provided by a cohort of non-expert annotators to train a learning system to classify a large body of news items. We report on the design challenges addressed in managing the effort of the annotators and in making annotation an interesting experience.
    Scopus© Citations 67  2915
  • Publication
    TwitterCracy: Exploratory Monitoring of Twitter Streams for the 2016 U.S. Presidential Election Cycle
    We present TwitterCracy, an exploratory search system that allows users to search and monitor across the Twitter streams of political entities. Its exploratory capabilities stem from the application of lightweight time-series based clustering together with biased PageRank to extract facets from tweets and present them in a manner that facilitates exploration.
      388
  • Publication
    Discovering Structure in Social Networks of 19th Century Fiction
    Inspired by the increasing availability of large text corpora online, digital humanities scholars are adopting computational approaches to explore questions in the field of literature from new perspectives. In this paper, we examine detailed social networks of characters, extracted from several works of 19th century fiction by Jane Austen and Charles Dickens. This allows us to apply methodologies from social network analysis, such as community detection, to explore the structure of these networks. By evaluating the results in collaboration with literary scholars, we find that the structure of the character networks can reveal underlying structural aspects within a novel, particularly in relation to plot and characterisation.
    Scopus© Citations 10  705
  • Publication
    A Latent Space Analysis of Editor Lifecycles in Wikipedia
    Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member turnover. In this paper we borrow ideas from topic analysis to map editor activity on Wikipedia over time into a latent space that offers an insight into the evolving patterns of editor behavior. This latent space representation reveals a number of different categories of editor (e.g. content experts, social networkers) and we show that it does provide a signal that predicts an editor's departure from the community. We also show that long term editors gradually diversify their participation by shifting edit preference from one or two namespaces to multiple namespaces and experience relatively soft evolution in their editor profiles, while short term editors generally distribute their contribution randomly among the namespaces and experience considerably fluctuating evolution in their editor profiles.
      355
  • Publication
    Genome-wide epistatic expression quantitative trait loci discovery in four human tissues reveals the importance of local chromosomal interactions governing gene expression
    Background: Epistasis (synergistic interaction) among SNPs governing gene expression is likely to arise within transcriptional networks. However, the power to detect it is limited by the large number of combinations to be tested and the modest sample sizes of most datasets. By limiting the interaction search space firstly to cis-trans and then cis-cis SNP pairs where both SNPs had an independent effect on the expression of the most variable transcripts in the liver and brain, we greatly reduced the size of the search space. Results: Within the cis-trans search space we discovered three transcripts with significant epistasis. Surprisingly, all interacting SNP pairs were located nearby each other on the chromosome (within 290 kb-2.16 Mb). Despite their proximity, the interacting SNPs were outside the range of linkage disequilibrium (LD), which was absent between the pairs (r² < 0.01). Accordingly, we redefined the search space to detect cis-cis interactions, where a cis-SNP was located within 10 Mb of the target transcript. The results of this show evidence for the epistatic regulation of 50 transcripts across the tissues studied. Three transcripts, namely, HLA-G, PSORS1C1 and HLA-DRB5 share common regulatory SNPs in the pre-frontal cortex and their expression is significantly correlated. This pattern of epistasis is consistent with mediation via long-range chromatin structures rather than the binding of transcription factors in trans. Accordingly, some of the interactions map to regions of the genome known to physically interact in lymphoblastoid cell lines while others map to known promoter and enhancer elements. SNPs involved in interactions appear to be enriched for promoter markers. Conclusions: In the context of gene expression and its regulation, our analysis indicates that the study of cis-cis or local epistatic interactions may have a more important role than interchromosomal interactions.
    Scopus© Citations 4  773
  • Publication
    Tracking the evolution of communities in dynamic social networks
    Real-world social networks from a variety of domains can naturally be modelled as dynamic graphs. However, approaches to detecting communities have largely focused on identifying communities in static graphs. Recently, researchers have begun to consider the problem of tracking the evolution of groups of users in dynamic scenarios. Here we describe a model for tracking the progress of communities over time in a dynamic network, where each community is characterised by a series of significant evolutionary events. This model is used to motivate a community-matching strategy for efficiently identifying and tracking dynamic communities. Evaluations on synthetic graphs containing embedded events demonstrate that this strategy can successfully track communities over time in volatile networks. In addition, we describe experiments exploring the dynamic communities detected in a real mobile operator network containing millions of users.
    Scopus© Citations 431  4860
  • Publication
    Mitigating Gender Bias in Machine Learning Data Sets
    Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact in the context of search and recommender systems.
    Scopus© Citations 23  266