Human-generated dataset 2, similar to the 2014 Provenance Reconstruction Challenge
We created additional human-generated datasets similar to those provided for the 2014 Provenance Reconstruction Challenge. The datasets were created by randomly scraping news articles from the Wikinews website, and the articles are downloaded as HTML files into a single folder. Each dataset contains the following:
- a list of URLs of the source article files
- the downloaded source article files (in HTML)
- a ground truth file (in Turtle notation)
These datasets are provided in various sizes: 10, 20, 50, 100, and 200 articles.
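The sketch below shows one way to collect the three parts of a single dataset directory in Python. The directory layout and every file name (`urls.txt`, an `articles/` folder of `.html` files, `ground_truth.ttl`) are hypothetical placeholders, as is the assumption that the URL list has one URL per line. Adjust them to match the files actually distributed.

```python
from pathlib import Path


def load_dataset(root, url_list_name="urls.txt",
                 articles_subdir="articles",
                 ground_truth_name="ground_truth.ttl"):
    """Collect the URL list, HTML articles and ground-truth path of one dataset.

    All default file and folder names are hypothetical placeholders.
    """
    root = Path(root)
    # Assumed format: one source-article URL per line; blank lines are skipped.
    text = (root / url_list_name).read_text(encoding="utf-8")
    urls = [line.strip() for line in text.splitlines() if line.strip()]
    # Downloaded articles are assumed to sit together in one folder as .html files.
    html_files = sorted((root / articles_subdir).glob("*.html"))
    # Ground truth in Turtle notation; None if the (hypothetical) file is absent.
    ground_truth = root / ground_truth_name
    return {
        "urls": urls,
        "html_files": html_files,
        "ground_truth": ground_truth if ground_truth.exists() else None,
    }
```

For a dataset of size 10, 20, 50, 100, or 200, one would expect `len(ds["urls"])` and `len(ds["html_files"])` to match that size. The ground-truth file could then be loaded with an RDF library such as rdflib, for example `Graph().parse(path, format="turtle")`.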