  • Publication
    On the Validity of Bayesian Neural Networks for Uncertainty Estimation
    (CEUR Workshop Proceedings, 2019-12-06)
    Deep neural networks (DNNs) are versatile parametric models utilised successfully across a diverse range of tasks and domains. However, they have limitations, particularly a lack of robustness and over-sensitivity to out-of-distribution samples. Bayesian Neural Networks, due to their formulation under the Bayesian framework, provide a principled approach to building neural networks that addresses these limitations. This work provides an empirical study evaluating and comparing Bayesian Neural Networks to their equivalent point estimate Deep Neural Networks, quantifying the predictive uncertainty induced by their parameters as well as their performance in view of this uncertainty. Specifically, we evaluated and compared three point estimate deep neural networks against their comparable Bayesian neural network alternatives using well-known benchmark image classification datasets.
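The kind of predictive uncertainty being compared can be illustrated with a minimal sketch (not the authors' exact protocol): a point estimate network emits a single softmax vector, while a Bayesian network's stochastic forward passes can be averaged and scored by predictive entropy. The probability values and the `predictive_entropy` helper below are hypothetical illustrations.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def predictive_entropy(samples):
    """Entropy of the mean predictive distribution over stochastic
    forward passes -- a common uncertainty score for Bayesian NNs."""
    n, k = len(samples), len(samples[0])
    mean_p = [sum(s[c] for s in samples) / n for c in range(k)]
    return entropy(mean_p)

# A point estimate network outputs a single, often confident, softmax vector,
# while a BNN's sampled passes may disagree on a hard or unfamiliar input.
point = [0.98, 0.01, 0.01]
bnn_samples = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.2, 0.5]]
print(entropy(point))                   # low: the network looks certain
print(predictive_entropy(bnn_samples))  # higher: disagreement surfaces as uncertainty
```

Disagreement between samples inflates the entropy of the averaged distribution, which is what lets a Bayesian model flag inputs it should not be confident about.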
  • Publication
    Knowing What You Don't Know: Choosing the Right Chart to Show Data Distributions to Non-Expert Users
    An ability to understand the outputs of data analysis is a key characteristic of data literacy, and the inclusion of data visualisations is ubiquitous in the output of modern data analysis. Several aspects remain unresolved, however, regarding the choice of data visualisations that lead viewers to an optimal interpretation of data, especially when audiences have differing degrees of data literacy. In this paper we describe a user study on perception from data visualisations, in which we measured the ability of participants to validate statements about the distributions of data samples visualised using different chart types. We find that histograms are the most suitable chart type for illustrating the distribution of values for a variable. We contrast our findings with previous research in the field, and identify three main issues arising from the study. Most notably, we show that viewers struggle to identify scenarios in which a chart simply does not contain enough information to validate a statement about the data that it represents. The results of our study emphasise the importance of understanding the limits of viewers’ data literacy in order to design charts effectively, and we discuss factors that are crucial to this end.
  • Publication
    Stacked-MLkNN: A stacking based improvement to Multi-Label k-Nearest Neighbours
    Multi-label classification deals with problems where each datapoint can be assigned to more than one class, or label, at the same time. The simplest approach for such problems is to train independent binary classification models for each label and use these models to independently predict a set of relevant labels for a datapoint. MLkNN is an instance-based lazy learning algorithm for multi-label classification that takes this approach. MLkNN, and similar algorithms, however, do not exploit associations that may exist among the set of potential labels. These methods also suffer from imbalance in the frequency of labels in a training dataset. This work attempts to improve the predictions of MLkNN by implementing a two-layer stack-like method, Stacked-MLkNN, which exploits the label associations. Experiments show that Stacked-MLkNN produces better predictions than MLkNN and several other state-of-the-art instance-based learning algorithms.
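A minimal sketch of the two-layer stacking idea, under the simplifying assumption of plain majority-vote kNN per label rather than MLkNN's Bayesian posterior rule, and without the leave-one-out fitting a rigorous implementation would use. The toy dataset and the `knn_predict`/`stacked_mlknn_predict` names are illustrative, not from the paper.

```python
def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return int(sum(votes) > len(votes) / 2)

def stacked_mlknn_predict(X, Y, x, k=3):
    """Layer 1 predicts each label independently (binary relevance);
    layer 2 re-predicts each label with layer-1 predictions for *all*
    labels appended to the features, so label associations can be used."""
    n_labels = len(Y[0])
    layer1_train = [[knn_predict(X, [row[l] for row in Y], X[i], k)
                     for l in range(n_labels)] for i in range(len(X))]
    layer1_x = [knn_predict(X, [row[l] for row in Y], x, k)
                for l in range(n_labels)]
    X2 = [X[i] + layer1_train[i] for i in range(len(X))]
    return [knn_predict(X2, [row[l] for row in Y], x + layer1_x, k)
            for l in range(n_labels)]

# Toy dataset with two perfectly correlated labels, both active near the origin:
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
Y = [[1, 1], [1, 1], [1, 1], [1, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
print(stacked_mlknn_predict(X, Y, [0.5, 0.5]))  # both labels predicted relevant
```

The second layer sees every label's first-layer prediction as an extra feature, which is what allows correlated labels to reinforce each other.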
  • Publication
    Extracting Pasture Phenotype and Biomass Percentages using Weakly Supervised Multi-target Deep Learning on a Small Dataset
    The dairy industry uses clover and grass as fodder for cows. Accurate estimation of grass and clover biomass yield enables smart decisions in optimizing fertilization and seeding density, resulting in increased productivity and positive environmental impact. Grass and clover are usually planted together, since clover is a nitrogen-fixing plant that brings nutrients to the soil. Adjusting the right percentages of clover and grass in a field reduces the need for external fertilization. Existing approaches for estimating the grass-clover composition of a field are expensive and time-consuming: random samples of the pasture are clipped and then the components are physically separated to weigh and calculate percentages of dry grass, clover and weeds in each sample. There is growing interest in developing novel deep learning-based approaches to non-destructively extract pasture phenotype indicators and biomass yield predictions of different plant species from agricultural imagery collected in the field. Providing these indicators and predictions from images alone remains a significant challenge. Heavy occlusions in the dense mixture of grass, clover and weeds make it difficult to estimate each component accurately. Moreover, although supervised deep learning models perform well with large datasets, it is tedious to acquire large and diverse collections of field images with precise ground truth for different biomass yields. In this paper, we demonstrate that applying data augmentation and transfer learning is effective in predicting multi-target biomass percentages of different plant species, even with a small training dataset. The scheme proposed in this paper used a training set of only 261 images and provided predictions of biomass percentages of grass, clover, white clover, red clover, and weeds with mean absolute errors (MAE) of 6.77%, 6.92%, 6.21%, 6.89%, and 4.80%, respectively.
Evaluation and testing were performed on a publicly available dataset provided by the Biomass Prediction Challenge [Skovsen et al., 2019]. These results lay the foundation for our next set of experiments with semi-supervised learning to improve the benchmarks and will further the quest to identify phenotype characteristics from imagery in a non-destructive way.
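The per-species MAE metric reported above can be sketched as follows; the `per_target_mae` helper and the percentage values in the example are hypothetical, not taken from the paper's dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two sequences of percentages."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def per_target_mae(Y_true, Y_pred, targets):
    """MAE per biomass target (e.g. grass, clover, white clover, ...),
    computed column-wise over rows of multi-target predictions."""
    return {t: mean_absolute_error([row[i] for row in Y_true],
                                   [row[i] for row in Y_pred])
            for i, t in enumerate(targets)}

# Hypothetical ground-truth and predicted biomass percentages for two images:
scores = per_target_mae([[50, 30], [60, 40]],
                        [[55, 28], [58, 44]],
                        ["grass", "clover"])
print(scores)  # {'grass': 3.5, 'clover': 3.0}
```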
  • Publication
    Degree Centrality and the Probability of an Infectious Disease Outbreak in Towns within a Region
    Agent-based models can be used to help study the spread of infectious diseases within a population. As no individual town is in isolation, commuting patterns into and out of a town or city are a vital part of understanding the course of an outbreak within a town. Thus the centrality of a town in a network of towns, such as a county or an entire country, should be an important influence on an outbreak. We propose looking at the probability that an outbreak enters a given town in a region and comparing that probability to the centrality of the town. Our results show that, as expected, there is a relationship between centrality and outbreaks. Specifically, we found that the degree centrality of a town affected the likelihood of an outbreak within the network spreading to the town. We also found that, for towns where an outbreak begins, the degree centrality of the town affects how the outbreak spreads in the network.
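The centrality measure in question can be sketched as follows. Normalised degree centrality (degree divided by n - 1) is one standard definition; the four-town commuting network below is hypothetical.

```python
def degree_centrality(edges, n_towns):
    """Normalised degree centrality of each town in an undirected
    commuting network: number of connections divided by (n - 1)."""
    degree = {t: 0 for t in range(n_towns)}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {t: d / (n_towns - 1) for t, d in degree.items()}

# Hypothetical four-town region where town 0 is a commuting hub:
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
print(degree_centrality(edges, 4))  # the hub scores 1.0, peripheral towns less
```

Under the paper's findings, the hub town (centrality 1.0) would be the most likely entry point for an outbreak circulating in the network.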
  • Publication
    Reformulation Strategies of Repeated References in the Context of Robot Perception Errors in Situated Dialogue
    We performed an experiment in which human participants interacted through a natural language dialogue interface with a simulated robot to fulfil a series of object manipulation tasks. We introduced errors into the robot’s perception, and observed the resulting problems in the dialogues and their resolutions. We then introduced different methods for the user to request information about the robot’s understanding of the environment. In this work, we describe the effects that the robot’s perceptual errors and the information request options available to the participant had on the reformulation of the referring expressions the participants used when resolving an unsuccessful reference.
  • Publication
    Extending Jensen Shannon Divergence to Compare Multiple Corpora
    Investigating public discourse on social media platforms has proven a viable way to reflect the impacts of political issues. In this paper we frame this as a corpus comparison problem in which the online discussions of different groups are treated as different corpora to be compared. We propose an extended version of the Jensen-Shannon divergence measure to compare multiple corpora and use the FP-growth algorithm to mix unigrams and bigrams in this comparison. We also propose a set of visualizations that can illustrate the results of this analysis. To demonstrate these approaches we compare the Twitter discourse surrounding Brexit in Ireland and Great Britain across a 14-week time period.
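The standard generalisation of Jensen-Shannon divergence to n distributions (entropy of the weighted mixture minus the weighted mean of the entropies) can be sketched as below; the paper's extension may differ in its details, and in practice the term distributions would come from unigram/bigram frequencies in each corpus.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(x * math.log(x, 2) for x in p if x > 0)

def multi_jsd(dists, weights=None):
    """Jensen-Shannon divergence generalised to n distributions:
    entropy of the weighted mixture minus the weighted mean entropy.
    0 for identical corpora; up to log2(n) for fully disjoint ones."""
    n = len(dists)
    w = weights or [1.0 / n] * n
    mix = [sum(w[i] * dists[i][j] for i in range(n))
           for j in range(len(dists[0]))]
    return entropy_bits(mix) - sum(w[i] * entropy_bits(dists[i])
                                   for i in range(n))

# Hypothetical term distributions over a shared vocabulary for three corpora:
corpora = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.1, 0.2, 0.7]]
print(multi_jsd(corpora))  # positive: the third corpus diverges from the others
```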
  • Publication
    Advanced Flight Efficiency Key Performance Indicators to support Air Traffic Analytics: Assessment of European flight efficiency using ADS-B data
    Flight efficiency is of great concern in the Air Traffic Management (ATM) community, since today’s ATM inefficiencies affect both airspace users (AUs) and Air Navigation Service Providers (ANSPs). Each actor has their own vision of flight efficiency: whereas airlines are concerned mainly with aspects that impact their business strategy (fuel consumption, schedule adherence and cost), ANSPs consider other aspects such as sector capacity, Air Traffic Controller (ATC) interventions, emissions and noise. Capturing both visions in new Key Performance Indicators (KPIs) is important to take new steps towards more sustainable air traffic operations. The current standard KPI used to measure flight efficiency is the “horizontal flight efficiency”, which measures the horizontal excess en-route distance compared to the orthodromic distance. This view of efficiency is very limited, since it does not take into account other sources of inefficiency, namely meteorological conditions or the vertical profile of the flight, which have a significant impact on the AUs’ operational objectives. Therefore, advanced metrics are being developed to include these objectives in the assessment of efficiency and to analyse how the inefficiencies are distributed among them, as well as new methodologies to calculate these advanced KPIs in real time. This paper presents a consolidated set of advanced user-centric cost-based efficiency and equity indicators which address different aspects of efficiency, such as the horizontal and vertical components, fuel consumption or cost of the flight, thus introducing the airspace user’s viewpoint into consideration. Also, the methodology followed for the calculation of the indicators, based on historical data and in real time, is demonstrated. For the evaluation of the indicators, Automatic Dependent Surveillance-Broadcast (ADS-B) data and a set of user-preferred trajectories (including flight plan, optimal cost and optimal distance) are used as references.
Finally, a flight efficiency and equity assessment of the European traffic flow for three different scenarios is presented, where two whole days of air traffic in the European Civil Aviation Conference (ECAC) area were used for the efficiency indicators, and one month of traffic for specific city pairs was used for the equity indicators. This demonstrates the added value of the newly introduced indicators, showing that different indicators account for different sources of inefficiency, and that ADS-B data could serve as a reliable source for performance monitoring.
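The baseline KPI discussed above, horizontal flight efficiency relative to the orthodromic (great-circle) distance, can be sketched with the haversine formula. The city-pair coordinates (roughly Dublin to London) and the 500 km flown distance are illustrative, not from the paper's dataset.

```python
import math

def orthodromic_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle (orthodromic) distance via the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def horizontal_flight_efficiency(flown_km, lat1, lon1, lat2, lon2):
    """Ratio of the distance actually flown to the orthodromic distance.
    1.0 means no horizontal excess; larger values mean more inefficiency."""
    return flown_km / orthodromic_km(lat1, lon1, lat2, lon2)

# Illustrative city pair (roughly Dublin to London) with 500 km flown:
print(orthodromic_km(53.35, -6.26, 51.51, -0.13))
print(horizontal_flight_efficiency(500.0, 53.35, -6.26, 51.51, -0.13))
```

As the abstract argues, this ratio ignores wind, the vertical profile, and cost, which is why the advanced KPIs extend it with those components.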
  • Publication
    COVID-19 modelling by time-varying transmission rate associated with mobility trend of driving via Apple Maps
    Compartment-based infectious disease models that treat the transmission rate (or contact rate) as a constant over the course of an epidemic can limit the effective capture of infectious disease dynamics. This study proposed a novel approach based on a dynamic time-varying transmission rate with a control rate governing the speed of disease spread, which may be associated with information related to infectious disease interventions. Integrating multiple sources of data with disease modelling has the potential to improve modelling performance. Taking the global mobility trend of vehicle driving available via Apple Maps as an example, this study explored different ways of processing the mobility trend data and investigated their relationship with the control rate. The proposed method was evaluated on COVID-19 data from six European countries. The results suggest that the proposed model with a dynamic transmission rate improved the performance of model fitting and forecasting during the early stage of the pandemic. A positive correlation was found between the average daily change of the mobility trend and the control rate. These results encourage the further incorporation of multiple data sources into infectious disease modelling in the future.
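One way to sketch a control rate damping the transmission rate is an SIR model where beta(t) decays exponentially at that rate. The exponential form and all parameter values here are assumptions for illustration, not the paper's exact formulation.

```python
import math

def sir_time_varying(beta0, control_rate, gamma, s0, i0, days, dt=0.1):
    """Forward-Euler SIR model whose transmission rate decays as
    beta(t) = beta0 * exp(-control_rate * t). In this sketch, stronger
    interventions (e.g. reduced driving mobility) mean a higher control
    rate. Returns the infectious fraction at each time step."""
    s, i, r = s0, i0, 0.0
    infected = []
    for step in range(int(days / dt)):
        beta = beta0 * math.exp(-control_rate * step * dt)
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return infected

# A higher control rate damps transmission and lowers the epidemic peak:
peak_free = max(sir_time_varying(0.5, 0.0, 0.2, 0.99, 0.01, 100))
peak_ctrl = max(sir_time_varying(0.5, 0.05, 0.2, 0.99, 0.01, 100))
print(peak_free, peak_ctrl)
```

In the paper's setting, the control rate would be fitted alongside the other parameters and related to the processed mobility-trend signal rather than fixed by hand.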
  • Publication
    Valve Health Identification Using Sensors and Machine Learning Methods
    Predictive maintenance models attempt to identify developing issues with industrial equipment before they become critical. In this paper, we describe both supervised and unsupervised approaches to predictive maintenance for subsea valves in the oil and gas industry. The supervised approach is appropriate for valves for which a long history of operation, along with manual assessments of the state of the valves, exists, while the unsupervised approach is suitable for addressing the cold start problem that arises when new valves, for which we do not have an operational history, come online. For the supervised prediction problem, we attempt to distinguish between healthy and unhealthy valve actuators using sensor data measuring hydraulic pressures and flows during valve opening and closing events. Unlike previous approaches that rely solely on raw sensor data, we derive frequency and time domain features, and experiment with a range of classification algorithms and different feature subsets. The best performing models for the supervised approach were found to be AdaBoost and Random Forest ensembles. In the unsupervised approach, the goal is to detect sudden abrupt changes in valve behaviour by comparing the sensor readings from consecutive opening or closing events. Our novel methodology works by comparing the sequences of sensor readings captured during these events using the raw sensor readings as well as normalised and first-derivative versions of the sequences. We evaluate the effectiveness of a number of well-known time series similarity measures and find that using discrete Fréchet distance or dynamic time warping leads to the best results, with the Bray-Curtis similarity measure leading to only marginally poorer change detection while requiring considerably less computational effort.
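Of the similarity measures named above, dynamic time warping can be sketched as follows; the valve pressure traces in the example are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sensor-reading sequences,
    tolerating events that unfold at slightly different speeds."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical hydraulic pressure traces for consecutive valve openings:
open_1 = [0, 1, 3, 5, 5, 4, 2, 0]
open_2 = [0, 0, 1, 3, 5, 5, 4, 2, 0]  # same event shape, slightly delayed
open_3 = [0, 2, 2, 2, 9, 9, 2, 0]     # abrupt change in behaviour
print(dtw_distance(open_1, open_2))   # small: no behaviour change flagged
print(dtw_distance(open_1, open_3))   # large: flags a sudden change
```

Because the warping path absorbs timing offsets, a merely delayed opening event scores near zero, while a genuine change in the pressure profile produces a large distance, which is what makes such measures suitable for the change detection described here.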