Causality-driven Ad-hoc Information Retrieval
Author(s)
Date Issued
2024
Date Available
2025-11-14T16:53:19Z
Abstract
Traditional information retrieval systems focus primarily on finding topically relevant documents, that is, documents that describe a particular query concept. However, when working with sources such as collections of news articles, users frequently seek not only documents that describe a news event but also documents that explain the chain of events that could have contributed to that event. These associations might be complex, involving a number of causal factors. Motivated by this information need, we formulate the task of causal information retrieval. First, we offer a comprehensive review of the existing literature on causality-related research, explaining how the proposed task differs from standard retrieval problems. We then conduct experiments to assess how effectively popular existing retrieval methods retrieve causally relevant documents. Our findings show that conventional methods are not suitable for this task, highlighting that causal information retrieval remains an open challenge that merits further research. To the best of our knowledge, the study of causal information retrieval, especially the extraction of information indicating causality directly from the documents, is a novel area of research. Consequently, no off-the-shelf benchmark dataset currently exists for evaluating such systems. This thesis contributes a new dataset specifically tailored for causal information retrieval, which is made available to the community to support further research. Additionally, we contend that although causally relevant documents partially overlap in terms with the documents that are topically relevant to a query, a substantial portion of them are expected to use a distinct set of terms to describe the potential causes that could lead to specific effects.
To address this issue, we propose an unsupervised feedback model that estimates a distribution of terms that are relatively infrequent but are associated with high weights in the topically relevant distribution, indicating potential causal relevance. Our experiments show that this feedback model is significantly more effective than conventional IR models and several other causality-related baseline heuristics. As a further contribution of this thesis, we introduce a supervised approach to enhance retrieval effectiveness in the context of causality. The fundamental idea is to analyze input queries and estimate their specificity to the collection, enabling us to determine whether or not to apply feedback in order to retrieve more causally relevant content at the top ranks. We introduce two such supervised query performance estimation models and demonstrate that these approaches yield significant performance improvements on a range of benchmark IR datasets. The effectiveness of the proposed query performance estimation models serves as motivation for the selective feedback model for causal information extraction. We illustrate how the intermediate decision of whether or not to apply query performance prediction ultimately increases downstream effectiveness.
Type of Material
Doctoral Thesis
Qualification Name
Doctor of Philosophy (Ph.D.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
2024 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
Suchana_Thesis_revised_april2024.pdf
Size
3.35 MB
Format
Adobe PDF
Checksum (MD5)
15d3272ac11be9e7a08c35ca69f76da3
Owning collection