  • Publication
    Extending Jensen Shannon Divergence to Compare Multiple Corpora
    Investigating public discourse on social media platforms has proven a viable way to reflect the impacts of political issues. In this paper we frame this as a corpus comparison problem in which the online discussions of different groups are treated as different corpora to be compared. We propose an extended version of the Jensen-Shannon divergence measure to compare multiple corpora and use the FP-growth algorithm to mix unigrams and bigrams in this comparison. We also propose a set of visualizations that can illustrate the results of this analysis. To demonstrate these approaches we compare the Twitter discourse surrounding Brexit in Ireland and Great Britain across a 14-week time period.
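    The abstract above does not give the extended formula, so the following is only a minimal sketch of one standard generalization of Jensen-Shannon divergence to more than two distributions: the entropy of the mixture minus the weighted mean of the individual entropies. Equal weights are an assumption here, and the function names (`generalized_jsd`, `corpus_distributions`) are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def generalized_jsd(distributions, weights=None):
    """JSD of n aligned probability vectors: H(mixture) - sum_i w_i * H(P_i).

    Equal weights are assumed when none are given (an illustrative choice).
    """
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n
    size = len(distributions[0])
    mixture = [
        sum(w * dist[i] for w, dist in zip(weights, distributions))
        for i in range(size)
    ]
    mean_entropy = sum(w * entropy(d) for w, d in zip(weights, distributions))
    return entropy(mixture) - mean_entropy


def corpus_distributions(corpora):
    """Turn token lists into probability vectors over a shared vocabulary."""
    counts = [Counter(tokens) for tokens in corpora]
    vocab = sorted(set().union(*counts))
    dists = [[c[t] / sum(c.values()) for t in vocab] for c in counts]
    return vocab, dists


if __name__ == "__main__":
    # Toy corpora, invented purely for illustration.
    corpora = [
        "border backstop trade border".split(),
        "trade deal vote vote".split(),
        "vote border deal trade".split(),
    ]
    vocab, dists = corpus_distributions(corpora)
    print(vocab, generalized_jsd(dists))
```

    With equal weights the score is 0 when all corpora share the same term distribution and reaches log2(n) bits when the n corpora have no terms in common.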
  • Publication
    A Topic-Based Approach to Multiple Corpus Comparison
    (CEUR Workshop Proceedings, 2019-12-06)
    Corpus comparison techniques are often used to compare different types of online media, for example social media posts and news articles. Most corpus comparison algorithms operate at the word level, and results are shown as lists of individual discriminating words, which makes identifying larger underlying differences between corpora challenging. Most corpus comparison techniques also work on pairs of corpora and do not easily extend to multiple corpora. To counter these issues, we introduce Multi-corpus Topic-based Corpus Comparison (MTCC), a corpus comparison approach that works at a topic level and that can compare multiple corpora at once. Experiments on multiple real-world datasets are carried out to demonstrate the effectiveness of MTCC and compare the usefulness of different statistical discrimination metrics; the χ² and Jensen-Shannon divergence metrics are shown to work well. Finally, we demonstrate the usefulness of reporting corpus comparison results via topics rather than individual words. Overall we show that the topic-level MTCC approach can capture the difference between multiple corpora, and show the results in a more meaningful and interpretable way than approaches that operate at the word level.
  • Publication
    Diverging Divergences: Examining Variants of Jensen Shannon Divergence for Corpus Comparison Tasks
    Jensen-Shannon divergence (JSD) is a distribution similarity measurement widely used in natural language processing. In corpus comparison tasks, where keywords are extracted to reveal the divergence between different corpora (for example, social media posts from proponents of different views on a political issue), two variants of JSD have emerged in the literature. One of these uses a weighting based on the relative sizes of the corpora being compared. In this paper we argue that this weighting is unnecessary and, in fact, can lead to misleading results. We recommend that this weighted version is not used. We base this recommendation on an analysis of the JSD variants and experiments showing how they impact corpus comparison results as the relative sizes of the corpora being compared change.
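    The abstract above does not spell out the two variants, so the sketch below assumes the usual reading: the unweighted variant gives each corpus a mixture weight of 0.5, while the size-weighted variant uses each corpus's share of the total token count. The function name `jsd` and the toy counts are hypothetical.

```python
import math
from collections import Counter


def jsd(counts_a, counts_b, size_weighted=False):
    """Jensen-Shannon divergence (bits) between two term-count mappings.

    size_weighted=False: both corpora get weight 0.5.
    size_weighted=True:  weights are the corpora's shares of all tokens
                         (assumed form of the weighted variant).
    """
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    w_a = n_a / (n_a + n_b) if size_weighted else 0.5
    w_b = 1.0 - w_a
    total = 0.0
    for term in set(counts_a) | set(counts_b):
        p = counts_a.get(term, 0) / n_a
        q = counts_b.get(term, 0) / n_b
        m = w_a * p + w_b * q  # mixture probability of this term
        if p > 0:
            total += w_a * p * math.log2(p / m)
        if q > 0:
            total += w_b * q * math.log2(q / m)
    return total


if __name__ == "__main__":
    # Invented counts: same term proportions, different corpus sizes.
    small = Counter({"leave": 10, "remain": 90})
    big = Counter({"leave": 900, "remain": 100})
    bigger = Counter({"leave": 9000, "remain": 1000})
    for other in (big, bigger):
        print(jsd(other, small), jsd(other, small, size_weighted=True))
```

    In this toy setup the unweighted score is the same for `big` and `bigger`, because only the term proportions matter, whereas the size-weighted score changes with corpus size alone.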
  • Publication
    Effect of Combination of HBM and Certainty Sampling on Workload of Semi-Automated Grey Literature Screening
    With the rapid increase of unstructured text data, grey literature has become an important source of information to support research and innovation activities. In this paper, we propose a novel semi-automated grey literature screening approach that combines a Hierarchical BERT Model (HBM) with active learning to reduce the human workload in grey literature screening. Evaluations over three real-world grey literature datasets demonstrate that the proposed approach can save up to 64.88% of the human screening workload, while maintaining high screening accuracy. We also demonstrate how the use of the HBM allows salient sentences within grey literature documents to be selected and highlighted to support workers in screening tasks.
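    The title above names certainty sampling as the active learning strategy. A minimal sketch of that selection step follows, assuming the common definition in which the next batch for human screening consists of the unlabelled documents the classifier scores as most likely relevant. The function name and the scores are hypothetical; in the paper the scores would come from the HBM, which is not reproduced here.

```python
def certainty_sample(relevance_probs, already_labelled, batch_size):
    """Pick the unlabelled documents with the highest predicted relevance.

    relevance_probs: per-document probability of being relevant
                     (assumed to come from some trained classifier).
    already_labelled: set of document indices that humans have screened.
    Returns up to batch_size indices, most confident first.
    """
    candidates = [
        i for i in range(len(relevance_probs)) if i not in already_labelled
    ]
    # Sort by descending predicted relevance; ties keep index order.
    candidates.sort(key=lambda i: relevance_probs[i], reverse=True)
    return candidates[:batch_size]


if __name__ == "__main__":
    probs = [0.10, 0.90, 0.50, 0.80]  # invented scores
    print(certainty_sample(probs, already_labelled={1}, batch_size=2))
```

    In a screening loop, the selected documents would be labelled by a reviewer, added to the training set, and the classifier retrained before the next batch is drawn.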
  • Publication
    Consumer evaluations of processed meat products reformulated to be healthier - A conjoint analysis study
    Recent innovations in processed meats focus on healthier reformulations through reducing negative constituents and/or adding health-beneficial ingredients. This study explored the influence of base meat product (ham, sausages, beef burger), salt and/or fat content (reduced or not), healthy ingredients (omega 3, vitamin E, none), and price (average or higher than average) on consumers' purchase intention and quality judgement of processed meats. A survey (n = 481) using conjoint methodology and cluster analysis was conducted. Price and base meat product were most important for consumers' purchase intention, followed by healthy ingredient and salt and/or fat content. In reformulation, consumers had a preference for ham and sausages over beef burgers, and for reduced salt and/or fat over no reduction. In relation to healthy ingredients, omega 3 was preferred over none, and vitamin E was least preferred. Healthier reformulations improved the perceived healthiness of processed meats. Cluster analyses identified three consumer segments with different product preferences.
  • Publication
    Factors that predict consumer acceptance of enriched processed meats
    The study aimed to understand predictors of consumers' purchase intention towards processed-meat-based functional foods (i.e. enriched processed meat). A cross-sectional survey was conducted with 486 processed meat consumers in spring 2016. Results showed that processed meats were perceived differently in healthiness, with sausage-type products perceived as less healthy than cured meat products. Consumers were in general more uncertain than positive about enriched processed meat, but differences existed in terms of attitudes and purchase intention. Following regression analysis, consumers' purchase intention towards enriched processed meat was primarily driven by their attitudes towards the product concept. Perceived healthiness of existing products and eating frequency of processed meat were also positively associated with the purchase intention. Other factors such as general food choice motives, socio-demographic characteristics, consumer health and the consumption of functional foods and dietary supplements in general, were not significant predictors of the purchase intention for enriched processed meat.
  • Publication
    A Sentence-level Hierarchical BERT Model for Document Classification with Limited Labelled Data
    Training deep learning models with limited labelled data is an attractive scenario for many NLP tasks, including document classification. Although, since the recent emergence of BERT, deep learning language models can achieve reasonably good performance in document classification with few labelled instances, there is a lack of evidence for the utility of applying BERT-like models to long document classification. This work introduces a long-text-specific model, the Hierarchical BERT Model (HBM), that learns sentence-level features of the text and works well in scenarios with limited labelled data. Various evaluation experiments have demonstrated that HBM can achieve higher performance in document classification than the previous state-of-the-art methods with only 50 to 200 labelled instances, especially when documents are long. As an extra benefit of HBM, a user study shows that the salient sentences identified by the trained HBM are useful as explanations for labelling documents.