  • Publication
    Knowing What You Don't Know: Choosing the Right Chart to Show Data Distributions to Non-Expert Users
    The ability to understand the outputs of data analysis is a key characteristic of data literacy, and data visualisations are ubiquitous in the outputs of modern data analysis. However, several questions remain unresolved about how to choose data visualisations that lead viewers to an optimal interpretation of the data, especially when audiences have differing degrees of data literacy. In this paper we describe a user study on perception of data visualisations, in which we measured the ability of participants to validate statements about the distributions of data samples visualised using different chart types. We find that histograms are the most suitable chart type for illustrating the distribution of values for a variable. We contrast our findings with previous research in the field, and identify three main issues arising from the study. Most notably, we show that viewers struggle to identify scenarios in which a chart simply does not contain enough information to validate a statement about the data it represents. The results of our study emphasise the importance of understanding the limits of viewers' data literacy when designing charts, and we discuss factors that are crucial to this end.
  • Publication
    Robot perception errors and human resolution strategies in situated human-robot dialogue
    (Taylor and Francis, 2017-01)
    We performed an experiment in which human participants interacted through a natural language dialogue interface with a simulated robot to fulfil a series of object manipulation tasks. We introduced errors into the robot’s perception, and observed the resulting problems in the dialogues and their resolutions. We then introduced different methods for the user to request information about the robot’s understanding of the environment. We quantify the impact of perception errors on the dialogues, and investigate resolution attempts by users at a structural level and at the level of referring expressions.
  • Publication
    Evaluating Citywide Bus Service Reliability Using Noisy GPS Data
    (IEEE, 2017-09-17)
    An increasing number of people use smartphone applications to plan their trips. Unfortunately, for various reasons, bus trips suggested by such applications are not as reliable as other trip types (e.g. by car, on foot, or by bicycle), which can result in excessive waiting time, or even the need to revise a planned trip. Traditional punctuality-based bus service reliability metrics do not capture route deviations, which are especially frequent in rapidly changing urban environments, due to traffic congestion, road maintenance, etc. The prevalence of GPS data allows buses to be tracked and route deviations to be captured. We use such data to propose and calculate a novel reliability score for bus trips. This score is a linear weighted combination of distance, time, and speed deviations from an expected, pre-defined bus trip. GPS trajectory data is large and noisy, which makes it challenging to process. This paper also presents an efficient framework that can de-noise and semantically split raw citywide GPS data by pre-defined bus trips. Finally, the paper presents a comparative case study that applies the proposed reliability score to publicly available open bus data from Rio de Janeiro in Brazil and Dublin in Ireland.
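The reliability score described in the abstract above can be sketched as follows. The abstract only states that the score is a linear weighted combination of distance, time, and speed deviations; the weight values, the function name, and the assumption that each deviation is pre-normalised to [0, 1] are illustrative, not taken from the paper.

```python
# Hypothetical sketch of a linear weighted reliability score combining
# distance, time, and speed deviations from an expected, pre-defined trip.
# Weights and normalisation are assumptions, not taken from the paper.

def reliability_score(dist_dev, time_dev, speed_dev,
                      w_dist=0.4, w_time=0.4, w_speed=0.2):
    """Combine normalised deviations (each in [0, 1]) into one score.

    0.0 means the observed trip matched the expected trip exactly;
    higher values indicate less reliable service.
    """
    assert abs(w_dist + w_time + w_speed - 1.0) < 1e-9, "weights must sum to 1"
    return w_dist * dist_dev + w_time * time_dev + w_speed * speed_dev

# A trip that stayed on route and near schedule, but ran noticeably slower:
print(round(reliability_score(0.05, 0.10, 0.50), 2))  # 0.16
```

Because the combination is linear, changing a weight trades off one kind of deviation against the others directly, which makes the score easy to tune per city or per route.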
  • Publication
    Ensemble Topic Modeling via Matrix Factorization
    (CEUR Workshop Proceedings, 2016-09-21)
    Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents, facilitating knowledge discovery and information summarization. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, these methods tend to have stochastic elements in their initialization, which can lead to their output being unstable. That is, if a topic modeling algorithm is applied to the same data multiple times, the output will not necessarily always be the same. With this idea of stability in mind we ask the question – how can we produce a definitive topic model that is both stable and accurate? To address this, we propose a new ensemble topic modeling method, based on Non-negative Matrix Factorization (NMF), which combines a collection of unstable topic models to produce a definitive output. We evaluate this method on an annotated tweet corpus, where we show that this new approach is more accurate and stable than traditional NMF.
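The ensemble idea in the abstract above can be sketched as a two-layer factorisation: run NMF several times with different random initialisations, stack the resulting topic-term matrices, and factorise the stacked matrix once more to obtain a consensus model. This is an illustrative sketch, not the authors' exact algorithm; the minimal multiplicative-update NMF, matrix sizes, and seeds are all assumptions.

```python
# Illustrative two-layer ensemble NMF (not the paper's exact method).
import numpy as np

def nmf(X, k, seed, iters=200, eps=1e-9):
    """Minimal multiplicative-update NMF: X ~ W @ H, all entries >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update topic-term factor
        W *= (X @ H.T) / (W @ (H @ H.T) + eps) # update document-topic factor
    return W, H

rng = np.random.default_rng(0)
X = rng.random((60, 40))        # stand-in document-term matrix
k, runs = 5, 8

# Base layer: topic-term matrices from several unstable, randomly seeded runs.
stacked = np.vstack([nmf(X, k, seed)[1] for seed in range(runs)])

# Ensemble layer: factorise the stacked topics into k consensus topics.
_, final_topics = nmf(stacked, k, seed=42)
print(final_topics.shape)       # (5, 40)
```

The second factorisation collapses near-duplicate topics found across runs into a single definitive set, which is what gives the ensemble its stability over any individual run.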
  • Publication
    Graphical Perception of Value Distributions: An Evaluation of Non-Expert Viewers' Data Literacy
    (Journal of Community Informatics, 2016-06-06)
    An ability to understand the outputs of data analysis is a key characteristic of data literacy, and data visualisations are ubiquitous in the output of modern data analysis. However, several questions remain unresolved about choosing data visualisations that lead viewers to an optimal interpretation of data. This is especially true when audiences have differing degrees of data literacy, and when the aim is to ensure that members of a community, who may differ in background and expertise, will make similar interpretations from data visualisations. In this paper we describe two user studies on perception of data visualisations, in which we measured the ability of participants to validate statements about the distributions of data samples visualised using different chart types. In the first user study, we find that histograms are the most suitable chart type for illustrating the distribution of values for a variable. We contrast our findings with previous research in the field, and identify three main issues arising from the study. Most notably, we show that viewers struggle to identify scenarios in which a chart simply does not contain enough information to validate a statement about the data it represents. In the follow-up study, we ask viewers to quantify frequencies and to identify the most frequent values from different types of histograms and density traces showing one or two distributions of values. This study reveals that viewers do better with histograms when they need to quantify the values displayed in a chart. Among the different types of histograms, interspersing the bars of two distributions leads to the most accurate perception. Even though interspersing bars makes them thinner, the advantage of having both distributions clearly visible pays off.
    The findings of these user studies provide insight to assist designers in creating optimal charts that enable comparison of distributions, and emphasise the importance of understanding the limits of viewers' data literacy when designing charts.
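The "interspersed bars" design that the study above found most accurate can be sketched by binning two samples on a shared grid and placing each distribution's half-width bar side by side within every bin. The bin count, sample sizes, and distributions here are illustrative assumptions, not the study's stimuli.

```python
# Sketch of an interspersed two-distribution histogram layout: shared bins,
# half-width bars placed side by side within each bin. All parameters are
# illustrative, not taken from the study.
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(1.0, 1.5, 500)

# Shared bin edges so both distributions are directly comparable.
bins = np.histogram_bin_edges(np.concatenate([a, b]), bins=10)
counts_a, _ = np.histogram(a, bins=bins)
counts_b, _ = np.histogram(b, bins=bins)

width = (bins[1] - bins[0]) / 2   # half-width bars, interleaved
left_a = bins[:-1]                # first half of each bin
left_b = bins[:-1] + width        # second half of each bin
# left_a/left_b, counts_a/counts_b can be fed to any bar-chart API,
# e.g. matplotlib's ax.bar(left_a, counts_a, width, align="edge").
print(len(left_a), len(counts_a))  # 10 10
```

Halving the bar width is what the study identifies as the cost of this design; the shared bin grid is what makes the two distributions directly comparable bin by bin.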
  • Publication
    Benchmarking Multi-label Classification Algorithms
    (CEUR Workshop Proceedings, 2016-09-21)
    Multi-label classification is an approach to classification problems that allows each data point to be assigned to more than one class at the same time. Real life machine learning problems are often multi-label in nature—for example image labelling, topic identification in texts, and gene expression prediction. Many multi-label classification algorithms have been proposed in the literature and, although there have been some benchmarking experiments, many questions still remain about which approaches perform best for certain kinds of multi-label datasets. This paper presents a comprehensive benchmark experiment of eleven multi-label classification algorithms on eleven different datasets. Unlike many existing studies, we perform detailed parameter tuning for each algorithm-dataset pair so as to allow a fair comparative analysis of the algorithms. Also, we report on a preliminary experiment which seeks to understand how the performance of different multi-label classification algorithms changes as the characteristics of multi-label datasets are adjusted.
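Benchmarks like the one above need a metric that scores multi-label predictions where each data point carries several labels. A common choice for such comparisons is Hamming loss, sketched below; the abstract does not name the paper's metrics, so its use here is an assumption, and the toy data is illustrative.

```python
# Hamming loss for multi-label predictions: the fraction of individual
# label assignments that disagree between ground truth and prediction.
# Metric choice and data are illustrative, not taken from the paper.

def hamming_loss(y_true, y_pred):
    """Each row is a binary label vector (1 = label applies)."""
    total = wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += (t != p)
    return wrong / total

# Two data points, three labels each; one of six assignments is wrong:
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 0]]
print(hamming_loss(y_true, y_pred))
```

Unlike plain accuracy on whole label sets, Hamming loss gives partial credit for partially correct label vectors, which matters when datasets differ widely in label cardinality, as in a benchmark across eleven datasets.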
  • Publication
    Using Icicle Trees to Encode the Hierarchical Structure of Source Code
    (Eurographics: European Association for Computer Graphics, 2016-06-10)
    This paper presents a study which evaluates the use of a tree visualisation (icicle tree) to encode the hierarchical structure of source code. The tree visualisation was combined with a source code editor in order to function as a compact overview to facilitate the process of comprehending the global structure of a source code document. Results from our study show that providing an overview visualisation led to an increase in accuracy and a decrease in completion time when participants performed counting tasks. However, in locating tasks, the presence of the visualisation led to a decrease in participants' performance.
  • Publication
    Reformulation Strategies of Repeated References in the Context of Robot Perception Errors in Situated Dialogue
    We performed an experiment in which human participants interacted through a natural language dialogue interface with a simulated robot to fulfil a series of object manipulation tasks. We introduced errors into the robot’s perception, and observed the resulting problems in the dialogues and their resolutions. We then introduced different methods for the user to request information about the robot’s understanding of the environment. In this work, we describe the effects that the robot’s perceptual errors and the information request options available to the participant had on the reformulation of the referring expressions the participants used when resolving an unsuccessful reference.