Data analysis for scientific experiments and enterprises, large-scale simulations, and machine learning tasks all entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous outputs, the pipeline may fail to execute or may produce incorrect results. Inferring the root cause(s) of such failures is challenging: it usually requires considerable time and human effort, and it remains error-prone. We propose a new approach that uses iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our experimental data and processing software are available for use, reproducibility, and enhancement.
Comment: To appear in SIGMOD 2020. arXiv admin note: text overlap with arXiv:2002.04640
Published on 01/01/2020
Volume 2020, 2020
DOI: 10.1145/3318464.3389763
Licence: Other