Abstract

Scientific workflows are often used to automate large-scale data analysis pipelines on clusters, grids, and clouds. However, because workflows can be extremely data-intensive, and are often executed on shared resources, it is critical to be able to limit or minimize the amount of disk space that workflows use on shared storage systems. This paper proposes a novel and simple approach that constrains the amount of storage space used by a workflow by inserting data cleanup tasks into the workflow task graph. Unlike previous solutions, the proposed approach provides guaranteed limits on disk usage, requires no new functionality in the underlying workflow scheduler, and does not require estimates of task runtimes. Experimental results show that this algorithm significantly reduces the number of cleanup tasks added to a workflow and yields better workflow makespans than the strategy currently used by the Pegasus Workflow Management System.
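The core idea in the abstract — inserting data cleanup tasks into the workflow task graph so that files are deleted as soon as they are no longer needed — can be illustrated with a minimal sketch. This is not the paper's actual algorithm; the task representation and the one-cleanup-task-per-file strategy below are illustrative assumptions.

```python
# Illustrative sketch only: add one cleanup task per file, depending on the
# file's producer and on every task that reads it, so the file can be
# deleted as soon as its last consumer finishes.
from collections import defaultdict

def add_cleanup_tasks(tasks):
    """tasks: dict mapping task name -> {"inputs": set, "outputs": set, "deps": set}.
    Returns a new task graph with cleanup tasks inserted."""
    producers = {}                 # file -> task that writes it
    consumers = defaultdict(set)   # file -> tasks that read it
    for name, t in tasks.items():
        for f in t["outputs"]:
            producers[f] = name
        for f in t["inputs"]:
            consumers[f].add(name)

    augmented = {
        n: {"inputs": set(t["inputs"]),
            "outputs": set(t["outputs"]),
            "deps": set(t["deps"])}
        for n, t in tasks.items()
    }
    for f, prod in producers.items():
        # The cleanup task becomes a successor of the producer and of
        # every consumer, so it runs only once the file is dead.
        augmented[f"cleanup_{f}"] = {
            "inputs": {f},
            "outputs": set(),
            "deps": {prod} | consumers[f],
        }
    return augmented
```

Because each cleanup task is an ordinary node in the task graph, the underlying workflow scheduler needs no new functionality — it simply executes the cleanup node when its dependencies complete, which is one of the properties the abstract highlights.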


Original document

The different versions of the original document can be found at:

http://dx.doi.org/10.1109/works.2014.8
http://raiith.iith.ac.in/1977
https://ieeexplore.ieee.org/document/7019861
https://academic.microsoft.com/#/detail/2105512718

Document information

Published on 01/01/2015

Volume 2015, 2015
DOI: 10.1109/works.2014.8
Licence: CC BY-NC-SA
