Showing items 1 - 2 of 2
Publication
Open access

Applying big data paradigms to a large scale scientific workflow: Lessons learned and future directions

2020-6-1, Kropf, Peter; Lapin, Andrei; Carretero, Jesus; Caíno-Lores, Silvina

The increasing amounts of data related to the execution of scientific workflows have raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining traditional high-performance computing and grid-based approaches with Big Data analytics paradigms in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements of the Big Data ecosystem: the data-centric programming model, and the underlying infrastructure that integrates storage and computation in each node. We experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKFHGS, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2). We analysed the results to synthesize the lessons learned from this experience and to discuss promising directions for further research.

Publication
Metadata only

Cloudification of a Legacy Hydrological Simulator using Apache Spark

2016-9-14, Kropf, Peter; Lapin, Andrei; Carretero, Jesus; Caíno-Lores, Silvina

The field of hydrology usually relies on complex multiphysics systems and data collected from geographically distributed sensors in order to obtain good-quality predictions and analyses of how water moves through the environment. Nowadays, the computational resources needed to run such complex simulators, and the increasing size of the datasets related to the models, have sparked interest in distributed infrastructures like clouds. This paper presents the results of applying a cloudification methodology to a legacy hydrological simulator (HydroGeoSphere), wrapped with an ensemble Kalman filter. This work describes how the methodology was applied, the particularities of its implementation and configuration for the Apache Spark iterative map-reduce platform, and the results of an evaluation on a commodity cluster against an MPI implementation of the simulator.