eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform

Latest revision as of 20:11, 3 February 2021

Abstract

The combination of low-cost sensors, low-cost commodity computing, and the Internet is enabling a new era of data-intensive science. The dramatic increase in data availability has created a new challenge for scientists: how to process the data. Scientists today envision scientific computations on large-scale data but have difficulty designing software architectures that can accommodate the large volume of often heterogeneous and inconsistent data. In this paper, we introduce a particular instance of this challenge and present our design and implementation of a MODIS satellite data reprojection and reduction pipeline on the Windows Azure cloud computing platform. This cloud-based pipeline is designed with the goal of hiding data complexities, and the subsequent data processing and transformation, from end users. The pipeline is highly flexible and extensible to accommodate different science data processing tasks, and can be dynamically scaled to fulfill scientists' various computational requirements in a cost-efficient way. Experiments show that by running a practical large-scale science data processing job in the pipeline using 150 moderately-sized Azure virtual machine instances, we were able to produce analytical results in nearly 90× less time than was possible with a high-end desktop machine. To our knowledge, this is one of the first eScience applications to use the Windows Azure platform.
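As a rough reading of the reported figures, the sketch below divides the stated speedup (nearly 90×) by the stated instance count (150). The function name `speedup_per_instance` is hypothetical and not from the paper. Because the baseline is a high-end desktop rather than a single Azure instance, the result is only an indicative ratio and should not be read as a true parallel efficiency.

```python
def speedup_per_instance(speedup: float, instances: int) -> float:
    """Return the overall speedup divided by the number of instances used.

    Illustrative helper only; the abstract reports the two inputs,
    not this ratio.
    """
    if instances <= 0:
        raise ValueError("instances must be a positive integer")
    return speedup / instances


# Figures taken from the abstract: ~90x speedup on 150 Azure instances.
ratio = speedup_per_instance(90.0, 150)
print(f"speedup per instance: {ratio:.2f}")
```

Under these numbers, each instance contributes roughly 0.6× the desktop's throughput on average, which would be consistent with moderately-sized instances plus some coordination overhead, although the abstract does not break the figure down.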

Original document

The different versions of the original document can be found in:

http://www.cs.virginia.edu/~humphrey/papers/MODIS_Azure_IPDPS_2010.pdf

https://dblp.uni-trier.de/db/conf/ipps/ipdps2010.html#LiHAJIR10

http://environment.snu.ac.kr/wp-content/uploads/2016/03/Li_2010_MODIS_Azure_IPDPS_2010.pdf

https://ieeexplore.ieee.org/document/5470418

http://ieeexplore.ieee.org/document/5470418

https://doi.org/10.1109/IPDPS.2010.5470418

https://academic.microsoft.com/#/detail/2135575903

http://xplorestaging.ieee.org/ielx5/5465899/5470342/05470418.pdf?arnumber=5470418

http://dx.doi.org/10.1109/ipdps.2010.5470418
