One-Class Time Series Classification
Author(s)
Date Issued
2020
Date Available
2022-04-29T15:07:10Z
Abstract
This thesis contributes to the state of the art of time series classification and machine learning by investigating three novel data-driven representations for time series in the context of one-class classification. The one-class assumption is useful for classification problems where only data of a single class is available for training a classifier, and for those where it is not known whether novel classes may appear at prediction time or what they could look like. Notable examples that can benefit from our research include anomaly or novelty detection, fault detection, and identity authentication. The common thread of our research is to represent time series as feature vectors, which are then used for classification. The features we extract are: (1) features constructed using dissimilarity measures; (2) features constructed using an evolutionary algorithm; (3) latent features constructed using neural networks. The proposed representations are thoroughly investigated in a variety of one-class classification experiments involving numerous benchmark methods, the 85 data-sets of the UCR/UEA archive and a data-set provided by ICON plc. The key difference between one-class classification and binary or multi-class classification lies in the amount of effort needed to gather training data. Binary and multi-class classifiers require exhaustively labelled training data. This can be difficult for problems where samples of all classes but one are scarce and ill-defined, e.g. anomaly detection. Gathering labelled data can also simply be impossible due to the cost of the expert labour required to construct an appropriate data-set. Conversely, one-class classifiers are trained using only samples from a single class. We present a subject authentication problem based on accelerometer data as a case study that motivates our research on one-class time series classification.
We argue that it is not realistic to assume we can gather labelled training data that represent well both the subject of interest and a fixed population of "others". Hence the need to learn a classifier using only data related to the subject of interest. We demonstrate that, compared with raw time series, feature-based representations allow substantial savings in storage and computational requirements, facilitate the interpretability of the solutions found, and enable visualisation of time series data-sets. We find that these advantages come at the cost of a slight loss in classification performance relative to a 1-nearest neighbour classifier on raw data. However, by examining data-sets one by one, we detail how our representations can outperform raw time series. Furthermore, for some applications, e.g. embedded systems, storage and computational requirements may be more important than a slight loss in classification performance.
Type of Material
Doctoral Thesis
Publisher
University College Dublin. School of Business
Qualification Name
Ph.D.
Copyright (Published Version)
2020 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License