XSum

Source: Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

The XSum dataset is a collection of news articles paired with one-sentence summaries, designed to evaluate abstractive single-document summarization systems. The goal is to produce a short, one-sentence summary that answers the question “What is the article about?”. The dataset is monolingual (English). It was created by EdinburghNLP and is available on Hugging Face.

The available splits are listed below.

Split	Details
test	Test set from the XSum dataset, containing 1,000 labeled examples
test-tiny	Truncated version of the test set from the XSum dataset, containing 50 labeled examples
bias	Manually annotated bias version of the XSum dataset, containing 382 labeled examples
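
To illustrate how the test and test-tiny splits relate, here is a minimal sketch of truncating a split to a fixed number of examples. It is not the procedure actually used to build test-tiny: the helper `truncate_split`, the toy records, and the assumption that truncation keeps the first 50 examples in their original order are all hypothetical. The `document` and `summary` field names are assumed to match the Hugging Face release of XSum.

```python
from typing import Dict, List

# One example: an article and its one-sentence summary.
# Field names are an assumption based on the Hugging Face release.
Example = Dict[str, str]


def truncate_split(examples: List[Example], size: int = 50) -> List[Example]:
    """Return the first `size` examples, preserving order (hypothetical helper)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return examples[:size]


# Toy stand-in for the 1,000-example test split; not real XSum data.
test_split = [
    {"document": f"Article text {i}", "summary": f"Summary {i}."}
    for i in range(1000)
]

# Assumed truncation: keep the first 50 examples.
test_tiny = truncate_split(test_split)

print(len(test_split), len(test_tiny))
```

A smaller split like this is mainly useful for quick smoke tests before running an evaluation on the full 1,000 examples.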