The XSum dataset is a collection of news articles, each paired with a one-sentence summary. It is designed to evaluate abstractive single-document summarization systems: the goal is to produce a short, one-sentence summary that answers the question “What is the article about?”. The dataset is monolingual (English). It was created by EdinburghNLP and is available on Hugging Face.
The available splits are listed below.
Split | Details |
---|---|
test | Test set from the XSum dataset, containing 1,000 labeled examples |
test-tiny | Truncated version of the test set from the XSum dataset, containing 50 labeled examples |
bias | Manually annotated bias version of the XSum dataset, containing 382 labeled examples |
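
To make the relationship between `test` and `test-tiny` concrete, here is a minimal sketch of how a truncated split could be derived from a larger one. It uses synthetic placeholder records rather than the real data. The `XSumExample` class, its `document` and `summary` field names, and the `truncate_split` helper are hypothetical names chosen for illustration, and taking the first 50 examples is an assumption: the table says only that `test-tiny` is truncated, not how.

```python
from dataclasses import dataclass

# Split sizes as listed in the table above.
SPLIT_SIZES = {"test": 1000, "test-tiny": 50, "bias": 382}


@dataclass(frozen=True)
class XSumExample:
    """One labeled example: a news article and its one-sentence summary.

    The field names are an assumption made for this illustration.
    """
    document: str
    summary: str


def truncate_split(examples, size):
    """Return the first `size` examples (assumed truncation strategy)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return list(examples[:size])


# Synthetic placeholder records standing in for the 1,000-example test split.
test_split = [
    XSumExample(document=f"Article text {i}", summary=f"Summary {i}")
    for i in range(SPLIT_SIZES["test"])
]

# Derive a 50-example split by keeping only the leading examples.
tiny_split = truncate_split(test_split, SPLIT_SIZES["test-tiny"])
print(len(test_split), len(tiny_split))
```

A smaller split like this is typically useful for quick smoke tests before running on the full 1,000 examples.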