Human-generated dataset similar to 2014 Provenance Reconstruction Challenge

From Provenance Reconstruction Wiki
Revision as of 06:41, 3 June 2016 by Hazel

We created an additional human-generated dataset, similar to those provided for the 2014 Provenance Reconstruction Challenge. It was built by randomly scraping 20 news articles from the Wikinews website [1], each downloaded as an HTML file into a single folder. In addition to the articles, the dataset contains:

  • a list of the URLs of the source articles
  • a ground truth file (in Turtle RDF notation)
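This page does not describe the tooling used to fetch the articles. The following is only a minimal sketch of the download step, under stated assumptions: the URL list is a plain text file with one URL per line, and each page is saved under a file name derived from the last segment of its URL. The functions `url_to_filename` and `download_articles` and that naming scheme are hypothetical, and the random selection of articles is not shown.

```python
import os
import urllib.parse
import urllib.request


def url_to_filename(url: str) -> str:
    """Hypothetical naming scheme: last URL path segment, made file-system safe, with .html."""
    path = urllib.parse.urlparse(url).path
    name = urllib.parse.unquote(path.rstrip("/").split("/")[-1]) or "index"
    # Replace characters that are awkward in file names (e.g. commas, slashes) with "_".
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in name)
    return safe if safe.endswith(".html") else safe + ".html"


def download_articles(url_list_path: str, out_dir: str) -> list[str]:
    """Read one URL per line (assumed format) and save each page as an HTML file in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    with open(url_list_path, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]  # skip blank lines
    saved = []
    for url in urls:
        target = os.path.join(out_dir, url_to_filename(url))
        # Fetch the raw bytes and write them unchanged, so the HTML is kept as served.
        with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
            out.write(resp.read())
        saved.append(target)
    return saved
```

For example, `download_articles("urls.txt", "articles")` would place one `.html` file per listed URL in an `articles` folder. The bytes are written as received rather than decoded, which avoids guessing each page's character encoding.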