Lessons Learned from Applying Big Data Paradigms to a Large Scale Scientific Workflow
Author(s)
Publisher
CEUR-WS.org
Date issued
November 14, 2016
From page
54
To page
58
Subjects
Scientific workflows; Big Data; Cloud Computing; Apache Spark; Hydrology
Abstract
The increasing amount of data related to the execution of scientific workflows has raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining traditional high-performance computing and grid-based approaches for scientific workflows with Big Data analytics paradigms. Our goal was to assess and discuss the suitability of such data-intensive mechanisms for production-ready workflows, especially in terms of scalability, focusing on a key element of the Big Data ecosystem: the data-centric programming model. To this end, we reproduced the functionality of an MPI-based iterative workflow from the hydrology domain, EnKF-HGS, using the Spark data analysis framework. We conducted experiments on a local cluster and used the results to discuss promising directions for further research.
Event name
11th Workshop on Workflows in Support of Large-Scale Science, Supercomputing
Location
Salt Lake City
Later version
http://ceur-ws.org/Vol-1800/short1.pdf
Publication type
conference paper