Workload based provenance capture reduction

Priyanka Jadhav
Multiple solutions have been developed to collect provenance in Data-Intensive Scalable Computing (DISC) systems such as Apache Spark and Apache Hadoop. Existing solutions include RAMP, Newt, Lipstick, and Titian. Although these solutions support debugging of dataflow programs, they introduce a space overhead of 30-50% of the input data size during provenance collection. In a production environment, this overhead is too high to track provenance permanently and to store all of the provenance information. That...