Showing items 1 - 2 of 2
  • Publication
    Metadata only
    Lessons Learned from Applying Big Data Paradigms to a Large Scale Scientific Workflow
    (CEUR-WS.org, 2016-11-14)
    The increasing amount of data related to the execution of scientific workflows has raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining traditional high-performance computing and grid-based approaches for scientific workflows with Big Data analytics paradigms. Our goal was to assess and discuss the suitability of such data-intensive mechanisms for production-ready workflows, especially in terms of scalability, focusing on a key element of the Big Data ecosystem: the data-centric programming model. To that end, we reproduced the functionality of an MPI-based iterative workflow from the hydrology domain, EnKF-HGS, using the Spark data analysis framework. We conducted experiments on a local cluster and used the results to discuss promising directions for further research.
  • Publication
    Metadata only
    Cloudification of a Legacy Hydrological Simulator using Apache Spark
    (2016-9-14)
    Carretero, Jesus; Caíno-Lores, Silvina
    The field of hydrology usually relies on complex multiphysics systems and data collected from geographically distributed sensors to obtain good-quality predictions and analyses of how water moves through the environment. Nowadays, the computational resources needed to run such complex simulators and the increasing size of the datasets related to the models have raised interest in distributed infrastructures like clouds. This paper presents the results of applying a cloudification methodology to a legacy hydrological simulator (HydroGeoSphere) wrapped with an ensemble Kalman filter. This work describes how the methodology was applied, the particularities of its implementation and configuration for the Apache Spark iterative map-reduce platform, and the results of an evaluation on a commodity cluster against an MPI implementation of the simulator.