Lessons Learned from Applying Big Data Paradigms to a Large Scale Scientific Workflow

Kropf, Peter

Author(s)

Kropf, Peter

Institut d'informatique

Publisher

CEUR-WS.org

Date issued

November 14, 2016

From page

54

To page

58

Subjects

Scientific workflows; Big Data; Cloud Computing; Apache Spark; Hydrology

Abstract

The increasing amount of data related to the execution of scientific workflows has raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience of combining the traditional high-performance computing and grid-based approaches for scientific workflows with Big Data analytics paradigms. Our goal was to assess and discuss the suitability of such data-intensive mechanisms for production-ready workflows, especially in terms of scalability, focusing on a key element of the Big Data ecosystem: the data-centric programming model. To this end, we reproduced the functionality of an MPI-based iterative workflow from the hydrology domain, EnKF-HGS, using the Spark data analysis framework. We conducted experiments on a local cluster and relied on our results to discuss promising directions for further research.

Event name

11th Workshop on Workflows in Support of Large-Scale Science, Supercomputing

Location

Salt Lake City

Later version

http://ceur-ws.org/Vol-1800/short1.pdf

Publication type

conference paper

Identifiers

https://libra.unine.ch/handle/20.500.14713/20803

https://libra.unine.ch/handle/123456789/25176