Datasets

From Provenance Reconstruction Wiki

This page aggregates information about datasets related to the problem of provenance reconstruction. We are primarily looking for datasets that can be used to evaluate provenance reconstruction approaches.

When adding new datasets to the table below, please be sure to:

  • create a new page on this wiki, describing the dataset;
  • add proper attribution to the people or institution who created or maintain the dataset;
  • add the intended purpose of the dataset;
  • add a link to a location where the dataset can be downloaded.

Provenance Reconstruction Datasets

| Dataset Name | Attributed To | Purpose | Download Link |
|---|---|---|---|
| 2014 Provenance Reconstruction Challenge (human-generated) | Ghent University - iMinds - Data Science Lab & VU Amsterdam | Gold standard benchmark for provenance reconstruction | [1] |
| 2014 Provenance Reconstruction Challenge (machine-generated) | Ghent University - iMinds - Data Science Lab & VU Amsterdam | Gold standard benchmark for provenance reconstruction | [2] |
| Human-generated dataset similar to 2014 Provenance Reconstruction Challenge | Ailifan Aierken, Delmar B. Davis, Qi Zhang, Kriti Gupta, Alex Wong, Hazeline U. Asuncion. A Multi-Level Funneling Approach to Data Provenance Reconstruction. e-Science Workshop of Works in Progress, October 2014. | Dataset used in the paper | [3] |
| Human-generated dataset 2 similar to 2014 Provenance Reconstruction Challenge | Provenance and Traceability Research Group (PTRG) at the University of Washington Bothell: Subha Vasudevan, William Pfeffer, Delmar Davis, Hazeline Asuncion | Additional dataset | [4] |