An Ensemble Approach to Identifying Informative Constraints for Semi-Supervised Clustering
|Title:|An Ensemble Approach to Identifying Informative Constraints for Semi-Supervised Clustering|
|Authors:|Greene, Derek; Cunningham, Pádraig|
|Permanent link:|http://hdl.handle.net/10197/12355|
|Date:|4-May-2007|
|Online since:|2021-07-28T15:55:52Z|
|Abstract:|A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While the incorporation of pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significantly impact upon the level of improvement. We demonstrate that it is often possible to correctly “guess” a large number of constraints without supervision from the co-associations between pairs of objects in an ensemble of clusterings. Along the same lines, we establish that constraints based on pairs with uncertain co-associations are particularly informative, if known. An evaluation on text data shows that this provides an effective criterion for identifying constraints, leading to a reduction in the level of supervision required to direct a clustering algorithm to an accurate solution.|
|Type of material:|Technical Report|
|Publisher:|University College Dublin. School of Computer Science and Informatics|
|Series/Report no.:|UCD CSI Technical Reports; UCD-CSI-2007-6|
|Copyright (published version):|2007 the Authors|
|Keywords:|Clustering algorithms; Machine learning; Semi-supervised clustering; Ensemble-based clustering; Text corpora|
|Other versions:|https://web.archive.org/web/20080226040105/http:/csiweb.ucd.ie/Research/TechnicalReports.html|
|Language:|en|
|Status of Item:|Not peer reviewed|
|This item is made available under a Creative Commons License:|https://creativecommons.org/licenses/by-nc-nd/3.0/ie/|
|Appears in Collections:|CASL Research Collection; Computer Science and Informatics Technical Reports|
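The idea summarised in the abstract, reading likely constraints off the co-associations in an ensemble of clusterings and treating pairs with uncertain co-association as candidates for supervision, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy ensemble, and the `low`/`high` thresholds are all assumptions made for illustration, and the report itself may define uncertainty and select pairs differently.

```python
from itertools import combinations


def co_association(clusterings):
    """Fraction of clusterings in which each pair (i, j), i < j, shares a cluster."""
    n = len(clusterings[0])
    return {
        (i, j): sum(labels[i] == labels[j] for labels in clusterings) / len(clusterings)
        for i, j in combinations(range(n), 2)
    }


def split_pairs(coassoc, low=0.2, high=0.8):
    """Split pairs into guessed must-links, guessed cannot-links, and uncertain pairs.

    The thresholds are hypothetical; uncertain pairs are ordered so that those
    with co-association closest to 0.5 come first.
    """
    must = [p for p, v in coassoc.items() if v >= high]
    cannot = [p for p, v in coassoc.items() if v <= low]
    uncertain = sorted(
        (p for p, v in coassoc.items() if low < v < high),
        key=lambda p: abs(coassoc[p] - 0.5),
    )
    return must, cannot, uncertain


# Toy ensemble (made up): four clusterings of five objects, one label list each.
ensemble = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
]

coassoc = co_association(ensemble)
must, cannot, uncertain = split_pairs(coassoc)
print("guessed must-link:", must)
print("guessed cannot-link:", cannot)
print("most uncertain pair:", uncertain[0])
```

In this sketch, pairs that the ensemble members almost always group together or almost always separate are taken as guessed constraints, while the pairs at the front of `uncertain` are the ones for which a supervised label would, under the abstract's argument, be most informative.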