MacNamee, Brian
Preferred name: MacNamee, Brian
Official Name: MacNamee, Brian
Research Output
Now showing 1 - 10 of 31
- Publication: Extending Jensen-Shannon Divergence to Compare Multiple Corpora
  Investigating public discourse on social media platforms has proven a viable way to reflect the impacts of political issues. In this paper we frame this as a corpus comparison problem in which the online discussions of different groups are treated as different corpora to be compared. We propose an extended version of the Jensen-Shannon divergence measure to compare multiple corpora, and use the FP-growth algorithm to mix unigrams and bigrams in this comparison. We also propose a set of visualizations that can illustrate the results of this analysis. To demonstrate these approaches we compare the Twitter discourse surrounding Brexit in Ireland and Great Britain across a 14-week period.
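A common way to generalise Jensen-Shannon divergence to n distributions is JSD(P1, ..., Pn) = H(sum_i w_i P_i) - sum_i w_i H(P_i), where H is Shannon entropy and the w_i are mixture weights. The sketch below illustrates this standard generalisation; it is not necessarily the exact extension proposed in the paper, and the toy term counts are invented.

```python
import numpy as np
from scipy.stats import entropy

def multi_jsd(counts, weights=None):
    """Generalised Jensen-Shannon divergence over n distributions:
    JSD(P1..Pn) = H(sum_i w_i * P_i) - sum_i w_i * H(P_i)."""
    P = np.asarray(counts, dtype=float)
    P = P / P.sum(axis=1, keepdims=True)            # term counts -> distributions
    w = np.full(len(P), 1.0 / len(P)) if weights is None else np.asarray(weights)
    mixture = (w[:, None] * P).sum(axis=0)          # weighted mixture distribution
    return entropy(mixture) - sum(wi * entropy(p) for wi, p in zip(w, P))

# Toy example: term counts for three corpora over a shared vocabulary
print(multi_jsd([[10, 5, 1], [2, 8, 4], [3, 3, 9]]))
```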
- Publication: Synthetic Dataset Generation for Online Topic Modeling
  Online topic modeling allows for the discovery of the underlying latent structure in a real-time stream of data. In the evaluation of such approaches it is common that a static value for the number of topics is chosen. However, we would expect the number of topics to vary over time due to changes in the underlying structure of the data, known as concept drift and concept shift. We propose a semi-synthetic dataset generator which can introduce concept drift and concept shift into existing annotated non-temporal datasets via user-controlled parameterization. This allows for the creation of multiple different artificial streams of data, where the “correct” number and composition of the topics is known at each point in time. We demonstrate how these generated datasets can be used as an evaluation strategy for online topic modeling approaches.
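As a hypothetical illustration of the idea (not the paper's generator), one can impose concept drift on a labelled, non-temporal corpus by sampling each document's topic from mixture weights that change over time; `docs_by_topic`, an assumed mapping from topic labels to document lists, stands in for the annotated dataset.

```python
import numpy as np

def drifting_stream(docs_by_topic, n_steps, seed=0):
    """Yield (step, topic, doc) with topic mixture weights drifting
    linearly from favouring the first topic to favouring the last."""
    rng = np.random.default_rng(seed)
    topics = list(docs_by_topic)
    for t in range(n_steps):
        alpha = t / max(n_steps - 1, 1)             # drift position in [0, 1]
        w = np.linspace(1.0 - alpha, alpha, num=len(topics))
        w /= w.sum()                                # valid mixture weights
        topic = topics[rng.choice(len(topics), p=w)]
        yield t, topic, rng.choice(docs_by_topic[topic])

# Ground-truth topic labels remain known at every point in the stream
for step, topic, doc in drifting_stream({"economy": ["doc a"], "sport": ["doc b"]}, n_steps=5):
    print(step, topic)
```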
- Publication: Graphical Perception of Value Distributions: An Evaluation of Non-Expert Viewers' Data Literacy
  An ability to understand the outputs of data analysis is a key characteristic of data literacy, and the inclusion of data visualisations is ubiquitous in the output of modern data analysis. Several aspects still remain unresolved, however, on the question of choosing data visualisations that lead viewers to an optimal interpretation of data. This is especially true when audiences have differing degrees of data literacy, and when the aim is to make sure that members of a community, who may differ in background and expertise, will make similar interpretations from data visualisations. In this paper we describe two user studies on perception from data visualisations, in which we measured the ability of participants to validate statements about the distributions of data samples visualised using different chart types. In the first user study, we find that histograms are the most suitable chart type for illustrating the distribution of values for a variable. We contrast our findings with previous research in the field, and posit three main issues identified from the study. Most notably, however, we show that viewers struggle to identify scenarios in which a chart simply does not contain enough information to validate a statement about the data that it represents. In the follow-up study, we ask viewers questions about quantification of frequencies, and identification of most frequent values, from different types of histograms and density traces showing one or two distributions of values. This study reveals that viewers do better with histograms when they need to quantify the values displayed in a chart. Among the different types of histograms, interspersing the bars of two distributions in a histogram leads to the most accurate perception. Even though interspersing bars makes them thinner, the advantage of having both distributions clearly visible pays off. The findings of these user studies provide insight to assist designers in creating optimal charts that enable comparison of distributions, and emphasise the importance of using an understanding of the limits of viewers' data literacy to design charts effectively.
- Publication: On the Validity of Bayesian Neural Networks for Uncertainty Estimation
  Deep neural networks (DNNs) are versatile parametric models utilised successfully in a diverse range of tasks and domains. However, they have limitations, particularly their lack of robustness and over-sensitivity to out-of-distribution samples. Bayesian neural networks, due to their formulation under the Bayesian framework, provide a principled approach to building neural networks that addresses these limitations. This work provides an empirical study evaluating and comparing Bayesian neural networks to their equivalent point estimate deep neural networks, quantifying the predictive uncertainty induced by their parameters as well as their performance in view of uncertainty. Specifically, we evaluated and compared three point estimate deep neural networks against comparable Bayesian neural network alternatives, utilising well-known benchmark image classification datasets.
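One widely used, lightweight approximation to a Bayesian predictive distribution is Monte Carlo dropout, which keeps dropout active at test time and averages several stochastic forward passes. The minimal PyTorch sketch below assumes `model` contains dropout layers; it is only one of many ways to realise the Bayesian networks the paper evaluates.

```python
import torch
import torch.nn.functional as F

def mc_dropout_predict(model, x, n_samples=50):
    """Approximate the Bayesian predictive distribution by keeping
    dropout active at test time and averaging stochastic forward passes."""
    model.train()                                   # keeps dropout layers active
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    mean = probs.mean(dim=0)                        # predictive mean
    uncertainty = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, uncertainty                        # predictive entropy per input
```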
- Publication: A Categorisation of Post-hoc Explanations for Predictive Models (Association for the Advancement of Artificial Intelligence, 2019-03-27)
  The ubiquity of machine learning based predictive models in modern society naturally leads people to ask how trustworthy those models are. In predictive modeling, it is quite common to encounter a trade-off between accuracy and interpretability. For instance, doctors would like to know how effective some treatment will be for a patient, or why the model suggested a particular medication for a patient exhibiting those symptoms. We acknowledge that the necessity for interpretability is a consequence of an incomplete formalisation of the problem, or more precisely of multiple meanings attached to a particular concept. For certain problems, it is not enough to get the answer (what); the model also has to provide an explanation of how it came to that conclusion (why), because a correct prediction only partially solves the original problem. In this article we extend the existing categorisation of techniques that aid model interpretability and test this categorisation.
- Publication: Extracting Pasture Phenotype and Biomass Percentages using Weakly Supervised Multi-target Deep Learning on a Small Dataset (2020-08-31)
  The dairy industry uses clover and grass as fodder for cows. Accurate estimation of grass and clover biomass yield enables smart decisions in optimizing fertilization and seeding density, resulting in increased productivity and positive environmental impact. Grass and clover are usually planted together, since clover is a nitrogen-fixing plant that brings nutrients to the soil. Adjusting the right percentages of clover and grass in a field reduces the need for external fertilization. Existing approaches for estimating the grass-clover composition of a field are expensive and time-consuming: random samples of the pasture are clipped and then the components are physically separated to weigh and calculate percentages of dry grass, clover and weeds in each sample. There is growing interest in developing novel deep learning based approaches to non-destructively extract pasture phenotype indicators and biomass yield predictions of different plant species from agricultural imagery collected from the field. Providing these indicators and predictions from images alone remains a significant challenge. Heavy occlusions in the dense mixture of grass, clover and weeds make it difficult to estimate each component accurately. Moreover, although supervised deep learning models perform well with large datasets, it is tedious to acquire large and diverse collections of field images with precise ground truth for different biomass yields. In this paper, we demonstrate that applying data augmentation and transfer learning is effective in predicting multi-target biomass percentages of different plant species, even with a small training dataset. The scheme proposed in this paper used a training set of only 261 images and provided predictions of biomass percentages of grass, clover, white clover, red clover, and weeds with mean absolute errors (MAE) of 6.77%, 6.92%, 6.21%, 6.89%, and 4.80% respectively. Evaluation and testing were performed on a publicly available dataset provided by the Biomass Prediction Challenge [Skovsen et al., 2019]. These results lay the foundation for our next set of experiments with semi-supervised learning to improve the benchmarks and will further the quest to identify phenotype characteristics from imagery in a non-destructive way.
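A typical way to realise this kind of multi-target prediction with transfer learning on a small dataset is to fine-tune a pretrained image backbone with a small regression head and train it with an L1 loss, which aligns with the MAE figures reported. The sketch below is an assumed illustration of that pattern, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

# Fine-tune a pretrained backbone to regress 5 biomass percentages
# (grass, clover, white clover, red clover, weeds).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

def predict_percentages(images):
    return torch.sigmoid(backbone(images)) * 100    # bound each target to [0, 100]

loss_fn = nn.L1Loss()   # L1 training loss matches the MAE evaluation metric
```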
- Publication: Using Icicle Trees to Encode the Hierarchical Structure of Source Code (Eurographics: European Association for Computer Graphics, 2016-06-10)
  This paper presents a study which evaluates the use of a tree visualisation (icicle tree) to encode the hierarchical structure of source code. The tree visualisation was combined with a source code editor in order to function as a compact overview to facilitate the process of comprehending the global structure of a source code document. Results from our study show that providing an overview visualisation led to an increase in accuracy and a decrease in completion time when participants performed counting tasks. However, in locating tasks, the presence of the visualisation led to a decrease in participants' performance.
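For context, an icicle tree assigns each node a rectangle whose horizontal extent is proportional to the size of its subtree and whose vertical band is given by its depth. A minimal, hypothetical layout sketch follows (the node dictionary format is an assumption; the paper pairs such a view with a source code editor):

```python
def count_leaves(node):
    """Number of leaves under a node, used to size its rectangle."""
    children = node.get("children", [])
    return 1 if not children else sum(count_leaves(c) for c in children)

def icicle_layout(node, x0=0.0, x1=1.0, depth=0, rects=None):
    """Assign each node a rectangle (name, x0, x1, depth): horizontal
    extent proportional to leaf count, vertical band given by depth.
    `node` is assumed to be {"name": str, "children": [node, ...]}."""
    rects = [] if rects is None else rects
    rects.append((node["name"], x0, x1, depth))
    children = node.get("children", [])
    sizes = [count_leaves(c) for c in children]
    total, x = sum(sizes), x0
    for child, size in zip(children, sizes):
        width = (x1 - x0) * size / total
        icicle_layout(child, x, x + width, depth + 1, rects)
        x += width
    return rects
```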
- Publication: Reformulation Strategies of Repeated References in the Context of Robot Perception Errors in Situated Dialogue (2015-10-02)
  We performed an experiment in which human participants interacted through a natural language dialogue interface with a simulated robot to fulfil a series of object manipulation tasks. We introduced errors into the robot’s perception, and observed the resulting problems in the dialogues and their resolutions. We then introduced different methods for the user to request information about the robot’s understanding of the environment. In this work, we describe the effects that the robot’s perceptual errors and the information request options available to the participant had on the reformulation of the referring expressions the participants used when resolving an unsuccessful reference.
- Publication: Benchmarking Multi-label Classification Algorithms
  Multi-label classification is an approach to classification problems that allows each data point to be assigned to more than one class at the same time. Real-life machine learning problems are often multi-label in nature, for example image labelling, topic identification in texts, and gene expression prediction. Many multi-label classification algorithms have been proposed in the literature and, although there have been some benchmarking experiments, many questions still remain about which approaches perform best for certain kinds of multi-label datasets. This paper presents a comprehensive benchmark experiment of eleven multi-label classification algorithms on eleven different datasets. Unlike many existing studies, we perform detailed parameter tuning for each algorithm-dataset pair so as to allow a fair comparative analysis of the algorithms. Also, we report on a preliminary experiment which seeks to understand how the performance of different multi-label classification algorithms changes as the characteristics of multi-label datasets are adjusted.
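Binary relevance, which trains one independent binary classifier per label, is a usual baseline in such benchmarks; here is a minimal scikit-learn sketch of one illustrative algorithm on synthetic data (the paper benchmarks eleven algorithms on real datasets):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic multi-label data: each sample can carry several of 5 labels
X, Y = make_multilabel_classification(n_samples=500, n_classes=5, n_labels=2,
                                      random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Binary relevance: one independent binary classifier per label
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)
print("Hamming loss:", hamming_loss(Y_te, clf.predict(X_te)))
```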
- Publication: Knowing What You Don't Know: Choosing the Right Chart to Show Data Distributions to Non-Expert Users
  An ability to understand the outputs of data analysis is a key characteristic of data literacy, and the inclusion of data visualisations is ubiquitous in the output of modern data analysis. Several aspects still remain unresolved, however, on the question of choosing data visualisations that lead viewers to an optimal interpretation of data, especially when audiences have differing degrees of data literacy. In this paper we describe a user study on perception from data visualisations, in which we measured the ability of participants to validate statements about the distributions of data samples visualised using different chart types. We find that histograms are the most suitable chart type for illustrating the distribution of values for a variable. We contrast our findings with previous research in the field, and posit three main issues identified from the study. Most notably, however, we show that viewers struggle to identify scenarios in which a chart simply does not contain enough information to validate a statement about the data that it represents. The results of our study emphasise the importance of using an understanding of the limits of viewers’ data literacy to design charts effectively, and we discuss factors that are crucial to this end.