Recomputation-based data reliability for MapReduce using lineage

Sherif Akoush, Ripduman Sohan & Andy Hopper
Ensuring block-level reliability of MapReduce datasets is expensive due to the spatial overheads of replicating or erasure coding data. As the amount of data processed with MapReduce continues to increase, this cost will increase proportionally. In this paper we introduce Recomputation-Based Reliability in MapReduce (RMR), a system for mitigating the cost of maintaining reliable MapReduce datasets. RMR leverages record-level lineage of the relationships between input and output records in the job for the purposes of...