Now showing 1 - 10 of 47
  • Publication
    Identifying Urban Canopy Coverage from Satellite Imagery Using Convolutional Neural Networks
    (CEUR Workshop Proceedings, 2018-12-07)
    The availability of high resolution satellite imagery offers a compelling opportunity for the utilisation of state-of-the-art deep learning techniques in remote sensing applications. This research investigates the application of different Convolutional Neural Network (CNN) architectures for pixel-level segmentation of canopy coverage in urban areas. The performance of two established patch-based CNN architectures (LeNet and a pre-trained VGG16) and two encoder-decoder architectures (a simple 4-layer convolutional encoder-decoder and U-Net) was compared using two datasets (a large set of images of the German town of Vaihingen and a smaller set of the US city of Denver). Results show that the patch-based methods outperform the encoder-decoder methods. It is also shown that pre-training is only effective with the smaller dataset.
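    A hedged sketch of the encoder-decoder style of model compared above, assuming Keras; the 128x128 patch size and layer widths are illustrative assumptions, not the exact architectures evaluated in the paper.
      # Minimal encoder-decoder sketch for binary (canopy / non-canopy) segmentation.
      import tensorflow as tf
      from tensorflow.keras import layers, models

      def build_encoder_decoder(input_shape=(128, 128, 3)):
          inputs = layers.Input(shape=input_shape)
          # Encoder: two convolution blocks with downsampling
          x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
          x = layers.MaxPooling2D(2)(x)
          x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
          x = layers.MaxPooling2D(2)(x)
          # Decoder: two transposed-convolution blocks back to the input resolution
          x = layers.Conv2DTranspose(64, 3, strides=2, activation="relu", padding="same")(x)
          x = layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same")(x)
          # Per-pixel canopy probability
          outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
          model = models.Model(inputs, outputs)
          model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
          return model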
  • Publication
    Ensemble Topic Modeling via Matrix Factorization
    (CEUR Workshop Proceedings, 2016-09-21)
    Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents, facilitating knowledge discovery and information summarization. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, these methods tend to have stochastic elements in their initialization, which can lead to their output being unstable. That is, if a topic modeling algorithm is applied to the same data multiple times, the output will not necessarily always be the same. With this idea of stability in mind, we ask: how can we produce a definitive topic model that is both stable and accurate? To address this, we propose a new ensemble topic modeling method, based on Non-negative Matrix Factorization (NMF), which combines a collection of unstable topic models to produce a definitive output. We evaluate this method on an annotated tweet corpus, where we show that this new approach is more accurate and stable than traditional NMF.
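    One plausible sketch of an NMF ensemble in this spirit, assuming scikit-learn: run NMF with several random initialisations, stack the topic-term factors, and factorise the stacked matrix into a single consensus model. Function names and parameter values are illustrative, not the paper's exact procedure.
      # Illustrative NMF ensemble: combine topic-term factors from many unstable runs.
      import numpy as np
      from sklearn.decomposition import NMF

      def ensemble_nmf(X, k=10, runs=20):
          """X: document-term matrix (e.g. TF-IDF); k: number of topics."""
          topic_term_blocks = []
          for seed in range(runs):
              base = NMF(n_components=k, init="random", random_state=seed, max_iter=300)
              base.fit(X)
              topic_term_blocks.append(base.components_)   # (k, n_terms) per run
          stacked = np.vstack(topic_term_blocks)            # (runs * k, n_terms)
          # Factorise the stacked topics to obtain k consensus topic-term vectors
          consensus = NMF(n_components=k, init="nndsvd", max_iter=300)
          consensus.fit(stacked)
          return consensus.components_                      # (k, n_terms) consensus topics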
  • Publication
    COVID-19 modelling by time-varying transmission rate associated with mobility trend of driving via Apple Maps
    Compartment-based infectious disease models that treat the transmission rate (or contact rate) as constant over the course of an epidemic can limit how effectively the dynamics of an infectious disease are captured. This study proposed a novel approach based on a dynamic, time-varying transmission rate with a control rate governing the speed of disease spread, which may be associated with information related to infectious disease interventions. Integrating multiple sources of data with disease modelling has the potential to improve modelling performance. Taking the global mobility trend for vehicle driving available via Apple Maps as an example, this study explored different ways of processing the mobility trend data and investigated their relationship with the control rate. The proposed method was evaluated on COVID-19 data from six European countries. The results suggest that the proposed model with a dynamic transmission rate improved the performance of model fitting and forecasting during the early stage of the pandemic. A positive correlation was found between the average daily change of the mobility trend and the control rate. The results encourage further work on incorporating multiple data sources into infectious disease modelling.
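    A minimal sketch of a compartment model with a time-varying transmission rate, assuming SciPy; the exponential decay form beta(t) = beta0 * exp(-c * t), with c playing the role of a control rate, is an illustrative assumption rather than the exact formulation in the paper.
      # SIR model with a time-varying transmission rate beta(t) = beta0 * exp(-c * t).
      import numpy as np
      from scipy.integrate import odeint

      def sir_time_varying(y, t, beta0, c, gamma, N):
          S, I, R = y
          beta_t = beta0 * np.exp(-c * t)      # transmission rate falls as control measures take effect
          dS = -beta_t * S * I / N
          dI = beta_t * S * I / N - gamma * I
          dR = gamma * I
          return [dS, dI, dR]

      N = 1_000_000
      y0 = [N - 10, 10, 0]                     # initial susceptible, infected, recovered counts
      t = np.linspace(0, 120, 121)             # days since the start of the outbreak
      solution = odeint(sir_time_varying, y0, t, args=(0.4, 0.03, 1 / 7, N))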
  • Publication
    On the Validity of Bayesian Neural Networks for Uncertainty Estimation
    (CEUR Workshop Proceedings, 2019-12-06)
    Deep neural networks (DNNs) are versatile parametric models that have been used successfully in a diverse range of tasks and domains. However, they have limitations, particularly their lack of robustness and over-sensitivity to out-of-distribution samples. Bayesian Neural Networks, due to their formulation under the Bayesian framework, provide a principled approach to building neural networks that addresses these limitations. This work provides an empirical study evaluating and comparing Bayesian Neural Networks to their equivalent point estimate Deep Neural Networks, quantifying the predictive uncertainty induced by their parameters as well as their performance in light of that uncertainty. Specifically, we evaluated and compared three point estimate deep neural networks against comparable Bayesian neural network counterparts, using well-known benchmark image classification datasets.
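    A hedged sketch of one common way to obtain predictive uncertainty from an approximately Bayesian network, using Monte Carlo dropout in Keras; the paper may use a different posterior approximation, and the model passed in is assumed to be a classifier containing Dropout layers.
      # Monte Carlo dropout: keep dropout active at prediction time and average samples.
      import numpy as np

      def mc_dropout_predict(model, x, samples=50):
          # Assumes a Keras classifier with Dropout layers; training=True keeps dropout on.
          probs = np.stack([model(x, training=True).numpy() for _ in range(samples)])
          mean_probs = probs.mean(axis=0)                 # predictive mean per class
          # Predictive entropy as a scalar uncertainty score per input
          entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)
          return mean_probs, entropy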
  • Publication
    Stacked-MLkNN: A stacking based improvement to Multi-Label k-Nearest Neighbours
    Multi-label classification deals with problems where each datapoint can be assigned to more than one class, or label, at the same time. The simplest approach for such problems is to train independent binary classification models for each label and use these models to independently predict a set of relevant labels for a datapoint. MLkNN is an instance-based lazy learning algorithm for multi-label classification that takes this approach. MLkNN, and similar algorithms, however, do not exploit associations which may exist between the set of potential labels. These methods also suffer from imbalance in the frequency of labels in a training dataset. This work attempts to improve the predictions of MLkNN by implementing a two-layer stack-like method, Stacked-MLkNN, which exploits the label associations. Experiments show that Stacked-MLkNN produces better predictions than MLkNN and several other state-of-the-art instance-based learning algorithms.
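    A simplified sketch of the two-layer stacking idea, using plain binary-relevance k-NN from scikit-learn as a stand-in for MLkNN: first-layer label probabilities become the inputs to a second layer of per-label models so that label associations can be exploited. The k value and cross-validation setup are illustrative assumptions.
      # Two-layer stacked k-NN for multi-label data (illustrative stand-in for Stacked-MLkNN).
      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.model_selection import cross_val_predict

      def stacked_knn_fit_predict(X_train, Y_train, X_test, k=10):
          n_labels = Y_train.shape[1]
          # Layer 1: out-of-fold probability for each label (avoids leakage into layer 2)
          meta_train = np.column_stack([
              cross_val_predict(KNeighborsClassifier(n_neighbors=k), X_train, Y_train[:, j],
                                cv=5, method="predict_proba")[:, 1]
              for j in range(n_labels)
          ])
          layer1 = [KNeighborsClassifier(n_neighbors=k).fit(X_train, Y_train[:, j])
                    for j in range(n_labels)]
          meta_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in layer1])
          # Layer 2: one model per label, trained on the vector of first-layer probabilities
          layer2 = [KNeighborsClassifier(n_neighbors=k).fit(meta_train, Y_train[:, j])
                    for j in range(n_labels)]
          return np.column_stack([m.predict(meta_test) for m in layer2])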
  • Publication
    Knowing What You Don't Know: Choosing the Right Chart to Show Data Distributions to Non-Expert Users
    The ability to understand the outputs of data analysis is a key characteristic of data literacy, and the inclusion of data visualisations is ubiquitous in the output of modern data analysis. Several aspects remain unresolved, however, on the question of choosing data visualisations that lead viewers to an optimal interpretation of data, especially when audiences have differing degrees of data literacy. In this paper we describe a user study on the perception of data visualisations, in which we measured the ability of participants to validate statements about the distributions of data samples visualised using different chart types. We find that histograms are the most suitable chart type for illustrating the distribution of values for a variable. We contrast our findings with previous research in the field, and highlight three main issues identified by the study. Most notably, we show that viewers struggle to identify scenarios in which a chart simply does not contain enough information to validate a statement about the data that it represents. The results of our study emphasise the importance of using an understanding of the limits of viewers’ data literacy to design charts effectively, and we discuss factors that are crucial to this end.
  • Publication
    A Comparison of Bayesian Deep Learning for Out of Distribution Detection and Uncertainty Estimation
    Deep neural networks have been successful in diverse discriminative classification tasks. Despite their good prediction performance, they are poorly calibrated, i.e., they often assign high confidence to misclassified predictions. This has potential consequences for the trustworthiness and accountability of models deployed in real applications, where predictions are evaluated based on their confidence scores. In this work we propose to validate and test the efficacy of likelihood-based models in the task of out-of-distribution (OoD) detection. On different datasets and metrics we show that Bayesian deep learning models on certain occasions marginally outperform conventional neural networks, and that in the event of minimal overlap between in/out distribution classes, even the best models exhibit a reduction in AUC scores. Preliminary investigations indicate the potential inherent role of bias due to choices of initialisation, architecture or activation functions.
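    A minimal sketch of how OoD detection can be scored with AUC, assuming scikit-learn: predictive entropy serves as the OoD score and in- versus out-of-distribution membership is the binary target. How the class probabilities are produced (Bayesian or point-estimate network) is left open.
      # Score out-of-distribution detection by ranking samples with predictive entropy.
      import numpy as np
      from sklearn.metrics import roc_auc_score

      def predictive_entropy(probs):
          """probs: (n_samples, n_classes) predictive class probabilities."""
          return -np.sum(probs * np.log(probs + 1e-12), axis=1)

      def ood_auc(probs_in, probs_out):
          scores = np.concatenate([predictive_entropy(probs_in), predictive_entropy(probs_out)])
          labels = np.concatenate([np.zeros(len(probs_in)), np.ones(len(probs_out))])  # 1 = OoD
          return roc_auc_score(labels, scores)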
  • Publication
    Benchmarking Multi-label Classification Algorithms
    (CEUR Workshop Proceedings, 2016-09-21)
    Multi-label classification is an approach to classification problems that allows each data point to be assigned to more than one class at the same time. Real-life machine learning problems are often multi-label in nature, for example image labelling, topic identification in texts, and gene expression prediction. Many multi-label classification algorithms have been proposed in the literature and, although there have been some benchmarking experiments, many questions still remain about which approaches perform best for certain kinds of multi-label datasets. This paper presents a comprehensive benchmark experiment of eleven multi-label classification algorithms on eleven different datasets. Unlike many existing studies, we perform detailed parameter tuning for each algorithm-dataset pair so as to allow a fair comparative analysis of the algorithms. Also, we report on a preliminary experiment which seeks to understand how the performance of different multi-label classification algorithms changes as the characteristics of multi-label datasets are adjusted.
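    A minimal sketch of the per-algorithm, per-dataset tuning step, assuming scikit-learn with a binary label-indicator matrix as the target; the random forest and its parameter grid are illustrative stand-ins, not the eleven algorithms benchmarked in the paper.
      # Grid-search hyperparameters for one algorithm-dataset pair, scoring with micro-averaged F1.
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import GridSearchCV

      def tune_on_dataset(X, Y):
          """X: feature matrix; Y: binary label-indicator matrix of shape (n_samples, n_labels)."""
          grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
          search = GridSearchCV(RandomForestClassifier(random_state=0),
                                grid, scoring="f1_micro", cv=5)
          search.fit(X, Y)
          return search.best_estimator_, search.best_params_, search.best_score_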
  • Publication
    Extracting Pasture Phenotype and Biomass Percentages using Weakly Supervised Multi-target Deep Learning on a Small Dataset
    The dairy industry uses clover and grass as fodder for cows. Accurate estimation of grass and clover biomass yield enables smart decisions in optimizing fertilization and seeding density, resulting in increased productivity and positive environmental impact. Grass and clover are usually planted together, since clover is a nitrogen-fixing plant that brings nutrients to the soil. Adjusting the right percentages of clover and grass in a field reduces the need for external fertilization. Existing approaches for estimating the grass-clover composition of a field are expensive and time-consuming: random samples of the pasture are clipped and then the components are physically separated to weigh and calculate percentages of dry grass, clover and weeds in each sample. There is growing interest in developing novel deep learning based approaches to nondestructively extract pasture phenotype indicators and biomass yield predictions of different plant species from agricultural imagery collected from the field. Providing these indicators and predictions from images alone remains a significant challenge. Heavy occlusions in the dense mixture of grass, clover and weeds make it difficult to estimate each component accurately. Moreover, although supervised deep learning models perform well with large datasets, it is tedious to acquire large and diverse collections of field images with precise ground truth for different biomass yields. In this paper, we demonstrate that applying data augmentation and transfer learning is effective in predicting multi-target biomass percentages of different plant species, even with a small training dataset. The scheme proposed in this paper used a training set of only 261 images and provided predictions of biomass percentages of grass, clover, white clover, red clover, and weeds with mean absolute error (MAE) of 6.77%, 6.92%, 6.21%, 6.89%, and 4.80% respectively. Evaluation and testing were performed on a publicly available dataset provided by the Biomass Prediction Challenge [Skovsen et al., 2019]. These results lay the foundation for our next set of experiments with semi-supervised learning to improve the benchmarks and will further the quest to identify phenotype characteristics from imagery in a non-destructive way.
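    A hedged sketch of the transfer-learning-with-augmentation setup the abstract describes, assuming Keras; the ResNet50 backbone, augmentation choices, image size and output head are illustrative assumptions, not the exact model used in the paper.
      # Pretrained backbone + small regression head predicting five biomass fractions.
      import tensorflow as tf
      from tensorflow.keras import layers, models

      def build_biomass_model(image_size=(224, 224)):
          backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                                    input_shape=image_size + (3,))
          backbone.trainable = False                   # transfer learning: freeze pretrained weights
          inputs = layers.Input(shape=image_size + (3,))
          x = layers.RandomFlip("horizontal")(inputs)  # simple augmentation for a small dataset
          x = layers.RandomRotation(0.1)(x)
          x = tf.keras.applications.resnet50.preprocess_input(x)
          x = backbone(x)
          x = layers.GlobalAveragePooling2D()(x)
          # Five outputs in [0, 1]: grass, clover, white clover, red clover, weeds fractions
          outputs = layers.Dense(5, activation="sigmoid")(x)
          model = models.Model(inputs, outputs)
          model.compile(optimizer="adam", loss="mae")  # mean absolute error, as reported above
          return model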
  • Publication
    Degree Centrality and the Probability of an Infectious Disease Outbreak in Towns within a Region
    Agent-based models can be used to help study the spread of infectious diseases within a population. As no individual town is in isolation, commuting patterns into and out of a town or city are a vital part of understanding the course of an outbreak within a town. Thus the centrality of a town in a network of towns, such as a county or an entire country, should be an important influence on an outbreak. We propose looking at the probability that an outbreak enters a given town in a region and comparing that probability to the centrality of the town. Our results show that, as expected, there is a relationship between centrality and outbreaks. Specifically, we found that the degree centrality of a town affected the likelihood of an outbreak within the network spreading to the town. We also found that, for towns where an outbreak begins, the degree centrality of the town affects how the outbreak spreads through the network.
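    A minimal sketch of the network side of such a study, assuming NetworkX: towns are nodes, commuting links are edges, and degree centrality is computed per town so it can be compared against outbreak outcomes from an agent-based simulation. The toy town network below is an illustrative assumption.
      # Degree centrality of towns in a toy commuting network.
      import networkx as nx

      G = nx.Graph()
      G.add_edges_from([
          ("TownA", "TownB"), ("TownA", "TownC"), ("TownA", "TownD"),
          ("TownB", "TownC"), ("TownD", "TownE"),
      ])

      centrality = nx.degree_centrality(G)   # fraction of other towns each town is linked to
      # Rank towns by centrality; higher-centrality towns are expected to be reached
      # by an outbreak from elsewhere in the network more often.
      for town, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
          print(f"{town}: degree centrality = {score:.2f}")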