Automatic Summarisation of Large-Scale News Content
Author(s)
Date Issued
2023
Date Available
2025-11-27T17:48:19Z
Abstract
Information overload is quickly becoming a problem for everyone. It is becoming increasingly difficult to keep track of ongoing stories when multiple publishers release news about multitudes of topics every day. Automated methods of structuring and summarising news can help to reduce redundant content and make key information more accessible to individuals and organisations alike. Text summarisation is a growing research area related to this issue. However, the text summarisation space lacks a few of the building blocks needed to structure and summarise large-scale news content: 1) approaches that summarise news events concisely while also being efficient, accurate, and customisable, 2) publicly available large-scale datasets containing concise summaries of news events to enable the training of high-quality supervised models, and 3) improved methods that structure and summarise long-term news topics. In this thesis, we study and develop approaches to automatically structure and summarise news content using natural language processing and machine learning. Firstly, due to their simplicity, we focus on timelines as a way to structure long-range news topics. We examine how well different strategies of building news timelines work and propose a simple and efficient method, which improves upon previous state-of-the-art methods. Secondly, we focus on multi-document summarisation of individual news events, which we see as a required building block for summarising large news collections. Due to the lack of large multi-document summarisation datasets containing concise summaries, we create a new publicly available dataset better suited to this task. This dataset reflects real-world collections of news articles and summaries. We show that our dataset enables the training of supervised abstractive models which produce concise, high-quality summaries of news events. Lastly, we focus on the problem of producing concise summaries with customisable settings, e.g. summary length. We propose an efficient and unsupervised method to compress sentences using reinforcement learning. Our overall aim is to contribute building blocks, insights, and ideas that can help to solve the challenges of efficiently structuring and processing large collections of news in a way that is beneficial to users. All code and datasets created for publications connected to this thesis are publicly available.
Type of Material
Doctoral Thesis
Qualification Name
Doctor of Philosophy (Ph.D.)
Publisher
University College Dublin. School of Computer Science
Copyright (Published Version)
2023 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
Ghalandari2023.pdf
Size
3.35 MB
Format
Adobe PDF
Checksum (MD5)
684b48dc0d6599d13fb3c491591fab6f
Owning collection