Showing items 1 - 2 of 2
  • Publication
    Open access
    Applying big data paradigms to a large scale scientific workflow: Lessons learned and future directions
    (2020-6-1)
    Carretero, Jesus; Caíno-Lores, Silvina
    The increasing amounts of data related to the execution of scientific workflows have raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining traditional high-performance computing and grid-based approaches with Big Data analytics paradigms, in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements of the Big Data ecosystem: the data-centric programming model, and the underlying infrastructure that integrates storage and computation in each node. We experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKFHGS, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2). We analysed the results to synthesize the lessons learned from this experience and to discuss promising directions for further research.
  • Publication
    Open access
    Integrating hydrological modelling, data assimilation and cloud computing for real-time management of water resources
    (2017-7-1)
    Kurtz, Wolfgang; Braun, Torsten; Vereecken, Harry; Sudicky, Edward; Franssen, Harrie-Jan Hendricks
    Online data acquisition, data assimilation and integrated hydrological modelling have become increasingly important in hydrological science. In this study, we explore cloud computing for integrating field data acquisition and stochastic, physically-based hydrological modelling in a data assimilation and optimisation framework as a service to water resources management. For this purpose, we developed an ensemble Kalman filter-based data assimilation system for the fully-coupled, physically-based hydrological model HydroGeoSphere, which is able to run in a cloud computing environment. A synthetic data assimilation experiment based on the widely used tilted V-catchment problem showed that the computational overhead of running the data assimilation platform in a cloud computing environment is minimal, which makes it well-suited for practical water management problems. Advantages of the cloud-based implementation include independence from computational infrastructure and the straightforward integration of cloud-based observation databases with the modelling and data assimilation platform.