University College Dublin

A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

Author(s)
Ghalandari, Demian Gholipour  
Hokamp, Chris  
Pham, Nghia The  
Glover, John  
Ifrim, Georgiana  
URI
http://hdl.handle.net/10197/12036
Date Issued
2020-07-10
Date Available
2021-03-11T16:05:39Z
Abstract
Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters. We build this dataset by leveraging the Wikipedia Current Events Portal (WCEP), which provides concise and neutral human-written summaries of news events, with links to external source articles. We also automatically extend these source articles by looking for related articles in the Common Crawl archive. We provide a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.
Sponsorship
Irish Research Council
Science Foundation Ireland
Other Sponsorship
Aylien Ltd.
Type of Material
Conference Publication
Subjects
Multi-document summar...
News events
Deep learning methods...
DOI
10.18653/v1/2020.acl-main.120
Web versions
https://acl2020.org/
Language
English
Status of Item
Peer reviewed
Journal
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Conference Details
The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Online, 5-10 July 2020
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by/3.0/ie/
File(s)
Name
2005.10070v1.pdf
Size
435.69 KB
Format
Adobe PDF
Checksum (MD5)
db4307859235db3709b2b9f77e3e0091

Owning collection
Computer Science Research Collection
Mapped collections
Insight Research Collection

Item descriptive metadata is released under a CC-0 (public domain) license: https://creativecommons.org/public-domain/cc0/.
All other content is subject to copyright.

For all queries please contact research.repository@ucd.ie.
