Computer Science and Informatics Technical Reports
Permanent URI for this collection
A techincal report series created to coincide with the launch of the UCD School of Computer Science and Informatics. These can be downloaded freely. Queries about the technical report series should be addressed to Alexey Lastovetsky.
Please see original homepage, archived at: https://web.archive.org/web/20080226040105/http:/csiweb.ucd.ie/Research/TechnicalReports.html.
Browse
Browsing Computer Science and Informatics Technical Reports by Title
Now showing 1 - 20 of 52
Results Per Page
Sort Options
- PublicationActive Learning for Multi-Label Image Annotation(University College Dublin. School of Computer Science and Informatics, 2009-01)
; ; Active learning is useful in situations where labeled data is scarce, unlabeled data is available and labeling has some cost associated with it. In such situations active learning helps by identifying a minimal set of items to label that will allow the training of an effective classifier. Thus active learning is appropriate for annotation tasks in multimedia, particularly in image labeling. In this paper we address the challenge of using active learning for multi-labeling of images in personal image collections. Multi-label learning covers situations where objects can have more than one class label and a learner is trained to assign multiple labels simultaneously. In this paper we report results on a learning system for labeling personal image collections that is both active and multi-label. The focus of the research has been to reduce the overall number of images that are presented to the user for labeling.82 - PublicationAdding real time into state machine analysis of digital evidenceThis report describes an extension of the finite state machine theory of digital event reconstruction. The proposed extension adds the known times of witness observations as formal objects in the theory and uses them to compute temporal bounds of events, whose time is not known.
21 - PublicationAnalogical Retrieval(University College Dublin. School of Computer Science and Informatics, 2007-11-29)
; We observe that thus far all computational models of analogy have modelled memory as a set of disjoint, encapsulated, domains. As there does not appear to be any psychological evidence for modelling memory in this way, we suggest that a more realistic model of analogy could be constructed if memory was modelled as one large data structure. We argue that the retrieval sub-process of analogy may not be independent of the mapping sub-process, and that both processes may well be governed by structural similarity. We describe a computational model of analogy which incorporates these three ideas; it models mapping and retrieval together, uses structural similarity to govern matching, and models memory as one large data structure. Retrieval in this system corresponds to the searching of the data structure for analogical matches to a supplied probe. We suggest a practical and efficient algorithm for such retrieval.35 - PublicationAn Analysis of Current Trends in CBR Research Using Multi-View Clustering(University College Dublin. School of Computer Science and Informatics, 2009-03)
; ; ; The European Conference on Case-Based Reasoning (CBR) in 2008 marked 15 years of international and European CBR conferences where almost seven hundred research papers were published. In this report we review the research themes covered in these papers and identify the topics that are active at the moment. The main mechanism for this analysis is a clustering of the research papers based on both co-citation links and text similarity. It is interesting to note that the core set of papers has attracted citations from almost three thousand papers outside the conference collection so it is clear that the CBR conferences are a sub-part of a much larger whole. It is remarkable that the research themes revealed by this analysis do not map directly to the sub-topics of CBR that might appear in a textbook. Instead they reflect the applications-oriented focus of CBR research, and cover the promising application areas and research challenges that are faced.59 - PublicationAn Assessment of Alternative Strategies for Constructing EMD-Based Kernel Functions for Use in an SVM for Image Classification(University College Dublin. School of Computer Science and Informatics, 2007-03-21)
; Because of their sound theoretical underpinnings, Support Vector Machines (SVMs) have very impressive performance in classification. However, the use of SVMs is constrained by the fact that the affinity measure that is used to build the classifier must produce a kernel matrix that is positive semi-definite (PSD). This is normally not a problem, however many very effective affinity measures are known that will not produce a PSD kernel matrix. One such measure is the EarthMover’s Distance (EMD) for quantifying the difference between images. In this paper we consider three methods for producing a PSD kernel from the EMD and compare SVM-based classifiers that use these measures against a Nearest Neighbour classifier built directly on the EMD. We find that two of these kernelised EMD measures are effective and the resulting SVMs are better than the Nearest Neighbour alternatives.73 - PublicationA Characterization of Wikipedia Content Based on Motifs in the Edit Graph(University College Dublin. School of Computer Science and Informatics, 2011-02)
; ; Wikipedia works because of the many eyes idea. Good Wikipedia pages are authoritative sources because a number of knowledgeable contributors have collaborated to produce an authoritative article on a topic. In this paper we explore the hypothesis that the extent to which the many eyes idea is true for a specific article can be assessed by looking at the edit graph associated with that article, i.e. the network of contributors and articles. As a first step in this direction we show that different parts of Wikipedia have very different edit graph motif profiles. We show that articles on sociologists (a well-curated part of Wikipedia) are very different to articles on footballers in the English Premiership.18 - PublicationCommunity Finding in Large Social Networks Through Problem Decomposition(University College Dublin. School of Computer Science and Informatics, 2008-08)
; ; ; The identification of cohesive communities is a key process in social network analysis. However, the algorithms that are effective for finding communities do not scale well to very large problems, as their time complexity is worse than linear in the number of edges in the graph. This is an important issue for those interested in applying social network analysis techniques to very large networks, such as networks of mobile phone subscribers. In this respect the contributions of this report are two-fold. First we demonstrate these scaling issues using a prominent community-finding algorithm as a case study. We then show that a twostage process, whereby the network is first decomposed into manageable subnetworks using a multilevel graph partitioning procedure, is effective in finding communities in networks with more than 106 nodes.27 - PublicationConnecting Families by Sharing the Minutiae of their Lives(University College Dublin. School of Computer Science and Informatics, 2008-08)
; Recent studies have shown that in distributed families keeping in touch is essential; this calls for technologies that can connect family members and bring them closer virtually. There are several social networking technologies online, but they are seldom designed for family connectedness and do not cater for the needs of computer-novice relatives. We present Near Dear, an application that brings online tools to an ambient display at home. The ambient display makes it easy for computer-novices to update and access online networking tools. We also conducted a user trial and evaluation of this system which indicated that the developed system is convenient and intuitive.22 - PublicationDimension Reduction(University College Dublin. School of Computer Science and Informatics, 2007-08-08)When data objects that are the subject of analysis using machine learning techniques are described by a large number of features (i.e. the data is high dimension) it is often beneficial to reduce the dimension of the data. Dimension reduction can be beneficial not only for reasons of computational efficiency but also because it can improve the accuracy of the analysis. The set of techniques that can be employed for dimension reduction can be partitioned in two important ways; they can be separated into techniques that apply to supervised or unsupervised learning and into techniques that either entail feature selection or feature extraction. In this paper an overview of dimension reduction techniques based on this organisation is presented and representative techniques in each category is described.
330 - PublicationDistortion as a validation criterion in the identification of suspicious reviews(University College Dublin. School of Computer Science and Informatics, 2010-05-02)
; ; ; Assessing the trustworthiness of reviews is a key issue for the maintainers of opinion sites such as TripAdvisor. In this paper we propose a distortion criterion for assessing the impact of methods for uncovering suspicious hotel reviews in TripAdvisor. The principle is that dishonest reviews will distort the overall popularity ranking for a collection of hotels. Thus a mechanism that deletes dishonest reviews will distort the popularity ranking significantly, when compared with the removal of a similar set of reviews at random. This distortion can be quantified by comparing popularity rankings before and after deletion, using rank correlation. We present an evaluation of this strategy in the assessment of shill detection mechanisms on a dataset of hotel reviews collected from TripAdvisor.1051 - PublicationDoes TripAdvisor Makes Hotels Better?(University College Dublin. School of Computer Science and Informatics, 2010-12)
; ; ; It seems reasonable to expect that the emergence of opinion sites such as TripAdvisor should result in significant behavior changes among service providers. They might be expected to improve their service because disgruntled customers have the facility to share their impressions with a wider audience. There are two aspects to this, service providers are motivated to improve their services in order to avoid negative comment that can reach a wide audience and they are informed about what should be changed in order to improve their service. We report on an analysis of reviews relating to the hotel sector in Ireland that demonstrates this “TripAdvisor effect”. We compare these results with an analysis on Las Vegas hotels over a similar time period, where the effect is absent, presumably because sensitivity to reputation on TripAdvisor is already well-established there.210 - PublicationDynamic ant : introducing a new benchmark for genetic programming in dynamic environments(University College Dublin. School of Computer Science and Informatics, 2011-04-14)
; ; ; ; In this paper we present a new variant of the ant problem in the dynamic problem domain. This approach presents a functional dynamism to the problem landscape, where by the behaviour of the ant is driven by its ability to explore the search space being constrained. This restriction is designed in such a way as to ensure that no generalised solution to the problem is possible, thus providing a functional change in behaviour.226 - PublicationDynamic environments can speed up evolution with genetic programming(University College Dublin. School of Computer Science and Informatics, 2011)
; ; We present a study of dynamic environments with genetic programming to ascertain if a dynamic environment can speed up evolution when compared to an equivalent static environment. We present an analysis of the types of dynamic variation which can occur with a variable-length representation such as adopted in genetic programming identifying modular varying, structural varying and incremental varying goals. An empirical investigation comparing these three types of varying goals on dynamic symbolic regression benchmarks reveals an advantage for goals which vary in terms of increasing structural complexity. This provides evidence to support the added difficulty variable length representations incur due to their requirement to search structural and parametric space concurrently, and how directing search through varying structural goals with increasing complexity can speed up search with genetic programming.465 - PublicationThe Embodied Modelling of Gestural Sequencing in SpeechIn this work we formulate and examine the hypothesis that speech sequencing patterns are shaped by the requirements of production and perception efficiency. Many phenomena associated with sequencing in speech can be seen as emergent properties of skilled motor action system behaving according to biologically and evolutionarily plausible parsimonious criteria. The articulatory sequences under consideration are represented in this thesis in the manner of Articulatory Phonology as gestural scores, i.e., temporal and dynamical descriptions of activation patterns of primitive speech actions – gestures. In order to be able to introduce relevant optimality criteria, we thoroughly evaluate the Task Dynamical implementation of Articulatory Phonology. We modify the task dynamical model of the vocal tract so that it accounts not only for the dynamical nature of the abstract sequenced tasks but also for the physical properties of the end effectors participating in the production of a small, clearly defined set of speech gestures. This resulting embodied task dynamical implementation of gestural sequencing allows us to quantitatively evaluate gestural scores in terms of the cost of their realisation, encompassing both functional aspects and the underlying physical effort associated with speech production. In this work we consider three inter-linked cost functions associated with the production and perception of gestural sequences: articulatory effort, parsing cost, and overall utterance duration. The articulatory effort is a measure of expenditure of force exerted by the model muscles in order to achieve a given sequence of gestural targets. The parsing cost is related to the resulting precision of articulation as achieved by the system’s end effectors, and to the demands imposed on the listener in order to parse the utterance. The duration cost straightforwardly reflects the overall duration of the realised gestural score. The cost functions conceived in this way pose mutually contradicting requirements on the temporal alignments and dynamical parameterisations of the gestural sequence. When combined in a single parametric overall cost function used as an objective function of the optimisation problem, this approach leads to identification of sequencing patterns, gestural scores, that are optimal with respect to the realistic interplay of perception and production criteria. We find that this optimisation process results in stable gestural sequences that reproduce several known coarticulatory effects, such as the relative order of articulatory events in simple vowel-consonant-vowel sequences and global v gestural sequencing patterns involving consonant clusters. The emergent intergestural phasing patterns are evaluated in light of the leading theories of gestural sequencing in speech. Our results indicate a considerable dependency of intergestural phasing relations on the articulatory nature of the sequenced gestures and the physiological and anatomical properties of the articulators involved in their realisation. These results are sometimes at odds with some of the intergestural phasing principles postulated in Articulatory Phonology, but show a strong agreement with articulatory data collected by phoneticians.
35 - PublicationAn Ensemble Approach to Identifying Informative Constraints for Semi-Supervised Clustering(University College Dublin. School of Computer Science and Informatics, 2007-05-04)
; A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While the incorporation of pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significantly impact upon the level of improvement. We demonstrate that it is often possible to correctly “guess” a large number of constraints without supervision from the coassociations between pairs of objects in an ensemble of clusterings. Along the same lines, we establish that constraints based on pairs with uncertain co-associations are particularly informative, if known. An evaluation on text data shows that this provides an effective criterion for identifying constraints, leading to a reduction in the level of supervision required to direct a clustering algorithm to an accurate solution.24 - PublicationAn Evaluation of Dimension Reduction Techniques for One-Class Classification(University College Dublin. School of Computer Science and Informatics, 2007-08-13)
; Dimension reduction (DR) is important in the processing of data in domains such as multimedia or bioinformatics because such data can be of very high dimension. Dimension reduction in a supervised learning context is a well posed problem in that there is a clear objective of discovering a reduced representation of the data where the classes are well separated. By contrast DR in an unsupervised context is ill posed in that the overall objective is less clear. Nevertheless successful unsupervised DR techniques such as Principal Component Analysis (PCA) exist – PCA has the pragmatic objective of transforming the data into a reduced number of dimensions that still captures most of the variation in the data. While one-class classification falls somewhere between the supervised and unsupervised learning categories, supervised DR techniques appear not to be applicable at all for one-class classification because of the absence of a second class label in the training data. In this paper we evaluate the use of a number of up-to-date unsupervised DR techniques for one-class classification and we show that techniques based on cluster coherence and locality preservation are effective.86 - PublicationAn Evaluation of One-Class Classification Techniques for Speaker Verification(University College Dublin. School of Computer Science and Informatics, 2007-08-13)
; ; Speaker verification is a challenging problem in speaker recognition where the objective is to determine whether a segment of speech in fact comes from a specific individual. In supervised machine learning terms this is a challenging problem as, while examples belonging to the target class are easy to gather, the set of counterexamples is completely open. In this paper we cast this as a one-class classification problem and evaluate a variety of state-of-the-art one-class classification techniques on a benchmark speech recognition dataset. We show that of the one-class classification techniques, Gaussian Mixture Models shows the best performance on this task.95 - PublicationFeature Extraction from Product Reviews using Feature Similarity and Polarity(University College Dublin. School of Computer Science and Informatics, 2009-10)
; ; Research on developing techniques to access user generated content, and specifically user reviews on different products, came in the focus of the information research community in recent past. In particular, this paper addresses the problem of extracting the features from user comments of a particular product, taking advantage of a corpus with a semistructured format: pros, cons and summary. In this paper we propose a technique to extract a set of features based on user generated pros and cons for a particular product. Then using this set we test a feature similarity function to obtain new features from reviews (both from the pros/cons and from the free text summary) of the same and other products. Our experimental results have shown interesting conclusions.54 - PublicationFeatureless Similarity(University College Dublin. School of Computer Science and Informatics, 2007-02-23)
; Assessing the similarity between cases is a key aspect of the retrieval phase in Case-Based Reasoning (CBR). In most CBR work, similarity is assessed based on feature-value descriptions of cases using similarity metrics which use these feature values. In fact it might be said that this notion of a feature-value representation is a defining part of the CBR world-view – it underpins the idea of a problem space with cases located relative to each other in this space. Recently a variety of similarity mechanisms have emerged that are not feature-based. Some of these ideas have emerged in CBR research but many of them have arisen in other areas of data analysis. In fact research on Support Vector Machines(SVM) is a rich source of novel similarity representations because of the emphasis on encoding domain knowledge in the kernel function of the SVM. In this paper we review these novel featureless similarity measures and assess the implications these measures have for CBR research.27 - PublicationGEVA - Grammatical Evolution in Java (v1.0)(University College Dublin. School of Computer Science and Informatics, 2008-12-05)
; ; ; ; ; GEVA is an open source implementation of Grammatical Evolution in Java developed at UCD’s Natural Computing Research & Applications group. As well as providing the characteristic genotype-phenotype mapper of GE a search algorithm engine and a simple GUI are also provided. A number of sample problems and tutorials on how to use and adapt GEVA have been developed.56
- «
- 1 (current)
- 2
- 3
- »