Computer Science and Informatics Technical Reports
Permanent URI for this collection
A techincal report series created to coincide with the launch of the UCD School of Computer Science and Informatics. These can be downloaded freely. Queries about the technical report series should be addressed to Alexey Lastovetsky.
Please see original homepage, archived at: https://web.archive.org/web/20080226040105/http:/csiweb.ucd.ie/Research/TechnicalReports.html.
Browse
Browsing Computer Science and Informatics Technical Reports by Issue Date
Now showing 1 - 20 of 52
Results Per Page
Sort Options
- PublicationThe maximal segment sum revisited(University College Dublin. School of Computer Science and Informatics, 2004-04)The maximum segment sum is a well known problem. In this report we revisit the problem and focus on the algebraic properties of the operators which we exploit in constructing the solution. This allows us to produce a generic algorithm which solves a family of problems of which the maximum segment sum is just one instance. At the end we list a number of other instances.
23 - PublicationA Non-Intrusive and Incremental Approach to Enabling Direct Communications in RPC-based Grid Programming Systems(University College Dublin. School of Computer Science and Informatics, 2005-04)This paper advocates a non-intrusive and incremental approach to enabling existing Grid programming systems with new features. In particular, it presents a software component enabling NetSolve applications with direct communications between remote tasks. The software component is a supplementary one working on the top of the basic NetSolve system. Its design also allows remote tasks to be freely mixed in a single application, independent on whether each particular task is enabled for direct communications or not. Experiments with this software are also presented.
55 - PublicationA Parallel Algorithm for the Solution of the Deconvolution Problem in Heterogeneous Networks(University College Dublin. School of Computer Science and Informatics, 2005-12-19)
; ; In this work we present two parallel algorithms for the solution of a given least squares problem with structured matrices. This problem arises in many applications most related to digital signal processing, an example is given. Both parallel algorithms have been designed to speed–up the sequential one in a heterogeneous network of computers. They differ from the approximation followed to implement parallel algorithms on heterogeneous networks of computers known as HeHo and HoHe strategies. However, our study goes beyond the practical usefulness of our heterogeneous parallel application. One one hand, the results obtained validates the recent developed HeteroMPI as a very useful tool for programming heterogeneous parallel algorithms. On the other hand, although HeteroMPI has initially been designed to apply the HeHo strategy, we propose a way this tool can be used in the HoHe strategy. Pros and cons of the use of HeteroMPI for both strategies will be deeply study through the application example.60 - PublicationAdding real time into state machine analysis of digital evidenceThis report describes an extension of the finite state machine theory of digital event reconstruction. The proposed extension adds the known times of witness observations as formal objects in the theory and uses them to compute temporal bounds of events, whose time is not known.
21 - PublicationProtein Backbone Angle Prediction in Multidimensional φ-ψ Space(University College Dublin. School of Computer Science and Informatics, 2006-01-20)
; ; A significant step towards establishing the structure and function of a protein is the prediction of the local conformation of the polypeptide chain. In this article we present systems for the prediction of 3 new alphabets of local structural motifs. The motifs are built by applying multidimensional scaling (MDS) and clustering to pair-wise angular distances for multiple φ-ψ angle values collected from high-resolution protein structures. The predictive systems, based on ensembles of bidirectional recurrent neural network architectures, and trained on a large non-redundant set of protein structures, achieve 72%, 66% and 60% correct structural motif prediction on an independent test set for di-peptides (6 classes), tripeptides (8 classes) and tetra-peptides (14 classes), respectively, 28-30% above base-line statistical predictors. To demonstrate that structural motif predictions contain relevant structural information, we build a further system, based on ensembles of two-layered bidirectional recurrent neural networks, to map structural motif predictions into traditional 3-class (helix, strand, coil) secondary structure. This system achieves 79.5% correct prediction using the “hard” CASP 3-class assignment, and 81.4% with a more lenient assignment, outperforming a sophisticated state-of-the-art predictor (Porter) trained in the same experimental conditions. All the predictive systems will be provided free of charge to academic users and made publicly available at the address http://distill.ucd.ie/.15 - PublicationModeling Performance of Many-to-One Collective Communication Operations in Heterogeneous Clusters(University College Dublin. School of Computer Science and Informatics, 2006-05-30)
; ; This paper presents a performance model for Many-to-One type communications on a dedicated heterogeneous cluster of workstations based on a switched Ethernet network. This study finds that Many-to-One communication is more complex than One-to-Many and Point-to-Point communications as it does not show a linear or even continuous dependence of the execution time on message sizes. It displays a very high jump in execution time for a significant range of message sizes. As a result, the proposed model is divided into three parts. The first part is for small sized messages whose model is linear, the second part models the congestion region, and the last part is for large message sizes where linearity resumes. The proposed model is validated for accuracy by the experiments on various platforms with different MPI implementations.17 - PublicationFeatureless Similarity(University College Dublin. School of Computer Science and Informatics, 2007-02-23)
; Assessing the similarity between cases is a key aspect of the retrieval phase in Case-Based Reasoning (CBR). In most CBR work, similarity is assessed based on feature-value descriptions of cases using similarity metrics which use these feature values. In fact it might be said that this notion of a feature-value representation is a defining part of the CBR world-view – it underpins the idea of a problem space with cases located relative to each other in this space. Recently a variety of similarity mechanisms have emerged that are not feature-based. Some of these ideas have emerged in CBR research but many of them have arisen in other areas of data analysis. In fact research on Support Vector Machines(SVM) is a rich source of novel similarity representations because of the emphasis on encoding domain knowledge in the kernel function of the SVM. In this paper we review these novel featureless similarity measures and assess the implications these measures have for CBR research.29 - PublicationAn Assessment of Alternative Strategies for Constructing EMD-Based Kernel Functions for Use in an SVM for Image Classification(University College Dublin. School of Computer Science and Informatics, 2007-03-21)
; Because of their sound theoretical underpinnings, Support Vector Machines (SVMs) have very impressive performance in classification. However, the use of SVMs is constrained by the fact that the affinity measure that is used to build the classifier must produce a kernel matrix that is positive semi-definite (PSD). This is normally not a problem, however many very effective affinity measures are known that will not produce a PSD kernel matrix. One such measure is the EarthMover’s Distance (EMD) for quantifying the difference between images. In this paper we consider three methods for producing a PSD kernel from the EMD and compare SVM-based classifiers that use these measures against a Nearest Neighbour classifier built directly on the EMD. We find that two of these kernelised EMD measures are effective and the resulting SVMs are better than the Nearest Neighbour alternatives.74 - Publicationk-Nearest Neighbour Classifiers(University College Dublin. School of Computer Science and Informatics, 2007-03-27)
; Perhaps the most straightforward classifier in the arsenal or machine learning techniques is the Nearest Neighbour Classifier – classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance today because issues of poor run-time performance is not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data.57 - PublicationAn Ensemble Approach to Identifying Informative Constraints for Semi-Supervised Clustering(University College Dublin. School of Computer Science and Informatics, 2007-05-04)
; A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While the incorporation of pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significantly impact upon the level of improvement. We demonstrate that it is often possible to correctly “guess” a large number of constraints without supervision from the coassociations between pairs of objects in an ensemble of clusterings. Along the same lines, we establish that constraints based on pairs with uncertain co-associations are particularly informative, if known. An evaluation on text data shows that this provides an effective criterion for identifying constraints, leading to a reduction in the level of supervision required to direct a clustering algorithm to an accurate solution.24 - PublicationDimension Reduction(University College Dublin. School of Computer Science and Informatics, 2007-08-08)When data objects that are the subject of analysis using machine learning techniques are described by a large number of features (i.e. the data is high dimension) it is often beneficial to reduce the dimension of the data. Dimension reduction can be beneficial not only for reasons of computational efficiency but also because it can improve the accuracy of the analysis. The set of techniques that can be employed for dimension reduction can be partitioned in two important ways; they can be separated into techniques that apply to supervised or unsupervised learning and into techniques that either entail feature selection or feature extraction. In this paper an overview of dimension reduction techniques based on this organisation is presented and representative techniques in each category is described.
330 - PublicationAn Evaluation of One-Class Classification Techniques for Speaker Verification(University College Dublin. School of Computer Science and Informatics, 2007-08-13)
; ; Speaker verification is a challenging problem in speaker recognition where the objective is to determine whether a segment of speech in fact comes from a specific individual. In supervised machine learning terms this is a challenging problem as, while examples belonging to the target class are easy to gather, the set of counterexamples is completely open. In this paper we cast this as a one-class classification problem and evaluate a variety of state-of-the-art one-class classification techniques on a benchmark speech recognition dataset. We show that of the one-class classification techniques, Gaussian Mixture Models shows the best performance on this task.97 - PublicationAn Evaluation of Dimension Reduction Techniques for One-Class Classification(University College Dublin. School of Computer Science and Informatics, 2007-08-13)
; Dimension reduction (DR) is important in the processing of data in domains such as multimedia or bioinformatics because such data can be of very high dimension. Dimension reduction in a supervised learning context is a well posed problem in that there is a clear objective of discovering a reduced representation of the data where the classes are well separated. By contrast DR in an unsupervised context is ill posed in that the overall objective is less clear. Nevertheless successful unsupervised DR techniques such as Principal Component Analysis (PCA) exist – PCA has the pragmatic objective of transforming the data into a reduced number of dimensions that still captures most of the variation in the data. While one-class classification falls somewhere between the supervised and unsupervised learning categories, supervised DR techniques appear not to be applicable at all for one-class classification because of the absence of a second class label in the training data. In this paper we evaluate the use of a number of up-to-date unsupervised DR techniques for one-class classification and we show that techniques based on cluster coherence and locality preservation are effective.91 - PublicationReminding Short-Term Memory Sufferers to Complete Routine Tasks(University College Dublin. School of Computer Science and Informatics, 2007-09-28)
; With the general increase of life span that our advances in health care have afforded us, more people are suffering from short term memory loss than ever before. Short term memory sufferers often forget what they were doing in the middle of a task and can find themselves in dangerous situations, such as leaving the stove on and leaving the house. They could benefit from an RFID based reminder system that would determine what they were doing based on what objects they touch. To use the system, the user wears an RFID glove which has a reader in the palm. The RFID glove reads the tags on the nearby objects. Along with the RFID glove we are developing an application that enables the user to interact with a reminder application. The application alerts the user of important activities they may have forgotten they started and when an activity is interrupted. It also keeps a record of the list of activities they have performed and objects they have touched through out the day.21 - PublicationAnalogical Retrieval(University College Dublin. School of Computer Science and Informatics, 2007-11-29)
; We observe that thus far all computational models of analogy have modelled memory as a set of disjoint, encapsulated, domains. As there does not appear to be any psychological evidence for modelling memory in this way, we suggest that a more realistic model of analogy could be constructed if memory was modelled as one large data structure. We argue that the retrieval sub-process of analogy may not be independent of the mapping sub-process, and that both processes may well be governed by structural similarity. We describe a computational model of analogy which incorporates these three ideas; it models mapping and retrieval together, uses structural similarity to govern matching, and models memory as one large data structure. Retrieval in this system corresponds to the searching of the data structure for analogical matches to a supplied probe. We suggest a practical and efficient algorithm for such retrieval.37 - PublicationHeterogeneous PBLAS: A Set of Parallel Basic Linear Algebra Subprograms for Heterogeneous Computational Clusters(University College Dublin. School of Computer Science and Informatics, 2008)
; ; We present a package, called Heterogeneous PBLAS (HeteroPBLAS), which is built on top of PBLAS and provides optimized parallel basic linear algebra subprograms for Heterogeneous Computational Clusters. We present the user interface and the software hierarchy of the first research implementation of HeteroPBLAS. This is the first step towards the development of a parallel linear algebra package for Heterogeneous Computational Clusters. We demonstrate the efficiency of the HeteroPBLAS programs on a homogeneous computing cluster and a heterogeneous computing cluster.22 - PublicationA Taxonomy of Similarity Mechanisms for Case-Based Reasoning(University College Dublin. School of Computer Science and Informatics, 2008-01-06)Assessing the similarity between cases is a key aspect of the retrieval phase in casebased reasoning (CBR). In most CBR work, similarity is assessed based on feature-value descriptions of cases using similarity metrics which use these feature values. In fact it might be said that this notion of a feature-value representation is a defining part of the CBR world-view – it underpins the idea of a problem space with cases located relative to each other in this space. Recently a variety of similarity mechanisms have emerged that are not founded on this feature-space idea. Some of these new similarity mechanisms have emerged in CBR research and some have arisen in other areas of data analysis. In fact research on Support Vector Machines (SVM) is a rich source of novel similarity representations because of the emphasis on encoding domain knowledge in the kernel function of the SVM. In this paper we present a taxonomy that organises these new similarity mechanisms and more established similarity mechanisms in a coherent framework.
71 - PublicationThe Quantititive Estimation of Asynchrony Among Concurrent SpeakersA novel method for the estimation of asynchrony among two speakers reading together is proposed. Previous estimates of asynchrony were based only on the pointwise measurement of lag. We here adapt the well known method of dynamic time warping to align two utterances. The resulting warp path allows a quantitative estimate of asynchrony. Illustrative examples are provided, which demonstrate that the novel method can distinguish synchronization performance in a variety of speaking conditions.
17 - PublicationUnsupervised Retrieval of Attack Profiles in Collaborative Recommender Systems(University College Dublin. School of Computer Science and Informatics, 2008-04)
; ; Trust, reputation and recommendation are key components of successful ecommerce systems. However, ecommerce systems are also vulnerable in this respect because there are opportunities for sellers to gain advantage through manipulation of reputation and recommendation. One such vulnerability is the use of fraudulent user profiles to boost (or damage) the ratings of items in an online recommender system. In this paper we cast this problem as a problem of detecting anomalous structure in network analysis and propose a novel mechanism for detecting this anomalous structure. We present an evaluation that shows that this approach is effective at uncovering the types of recommender systems attack described in the literature.78 - PublicationCommunity Finding in Large Social Networks Through Problem Decomposition(University College Dublin. School of Computer Science and Informatics, 2008-08)
; ; ; The identification of cohesive communities is a key process in social network analysis. However, the algorithms that are effective for finding communities do not scale well to very large problems, as their time complexity is worse than linear in the number of edges in the graph. This is an important issue for those interested in applying social network analysis techniques to very large networks, such as networks of mobile phone subscribers. In this respect the contributions of this report are two-fold. First we demonstrate these scaling issues using a prominent community-finding algorithm as a case study. We then show that a twostage process, whereby the network is first decomposed into manageable subnetworks using a multilevel graph partitioning procedure, is effective in finding communities in networks with more than 106 nodes.31
- «
- 1 (current)
- 2
- 3
- »