Showing items 1 - 10 of 135
  • Publication
    Open access
    Applying big data paradigms to a large scale scientific workflow: Lessons learned and future directions
    (2020-6-1)
    Carretero, Jesus; Caíno-Lores, Silvina
    The increasing amount of data related to the execution of scientific workflows has raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining the traditional high-performance computing and grid-based approaches with Big Data analytics paradigms, in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements in the Big Data ecosystem: the data-centric programming model, and the underlying infrastructure that integrates storage and computation in each node. We experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKFHGS, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2). We analysed the results to synthesize the lessons learned from this experience and to discuss promising directions for further research.
  • Publication
    Temporary restriction
  • Publication
    Open access
    THUNDERSTORM: A Tool to Evaluate Dynamic Network Topologies on Distributed Systems
    (2019-10-1)
    Liechti, Luca; Gouveia, Paulo; Neves, João; Matos, Miguel
    Network dynamics, such as sudden changes in latency or available bandwidth, have a significant impact on the performance of distributed systems. While such dynamics are common, especially in WAN deployments, existing tools lack the capabilities to systematically evaluate the impact of such changes in real systems. We present THUNDERSTORM, a tool to evaluate the impact of dynamic network topologies on the performance of large-scale distributed systems. THUNDERSTORM is a fully functional tool that integrates with Kubernetes and can be used to evaluate off-the-shelf applications. THUNDERSTORM defines an easy-to-use language to describe arbitrarily complex network topologies and dynamic events used to enrich the default container composition descriptors. Our evaluation, using micro- and macro-benchmarks, as well as off-the-shelf unmodified systems (e.g., Apache Cassandra, MariaDB), shows that THUNDERSTORM is easy to use and accurate in reproducing dynamic behaviours, and that it can help researchers uncover unexpected behaviours that are otherwise very costly to reproduce in real deployments and are typically captured only during malfunctioning periods.
  • Publication
    Metadata only
    A Roadmap for Research in Sustainable Ultrascale Systems
    (Bruxelles: EU-COST IC1305, 2018)
    Sousa, Leonel; Kuonen, Pierre; Prodan, Radu; Trinh, Tuan Anh; Carretero, Jesus
    In this research roadmap, the COST Action IC1305 (NESUS) proposes research objectives and twelve associated recommendations which, in combination, can help bring about the notable changes required to make sustainable ultrascale computing systems a reality. Moreover, they are useful for industry and stakeholders to define a path towards ultrascale systems.
  • Publication
    Metadata only
    Efficient Broadcasting Algorithm in Harary-like Networks
    (2017-8-1)
    Bhabak, Puspal; Harutyunyan, Hovhannes
    In this paper, we analyze the properties of Harary graphs and some derivatives with respect to the achievable performance of communication within network structures based on these graphs. In particular, we define Cordal-Harary graphs on n nodes, which can be constructed for any even n and for any odd degree between 3 and 2⌈log n⌉ - 1. We also present a simple algorithm for fast message broadcasting in this network. Our analysis shows that when the nodes of a Cordal-Harary graph have logarithmic degree, the broadcasting time will be as small as ⌈log n⌉, which is the minimum possible value for a network on n nodes. All these properties show that Cordal-Harary is a very good network architecture for parallel processing.
  • Publication
    Open access
    Integrating hydrological modelling, data assimilation and cloud computing for real-time management of water resources
    (2017-7-1)
    Kurtz, Wolfgang; Braun, Torsten; Vereecken, Harry; Sudicky, Edward; Franssen, Harrie-Jan Hendricks
    Online data acquisition, data assimilation and integrated hydrological modelling have become more and more important in hydrological science. In this study, we explore cloud computing for integrating field data acquisition and stochastic, physically-based hydrological modelling in a data assimilation and optimisation framework as a service to water resources management. For this purpose, we developed an ensemble Kalman filter-based data assimilation system for the fully-coupled, physically-based hydrological model HydroGeoSphere, which is able to run in a cloud computing environment. A synthetic data assimilation experiment based on the widely used tilted V-catchment problem showed that the computational overhead for the application of the data assimilation platform in a cloud computing environment is minimal, which makes it well-suited for practical water management problems. Advantages of the cloud-based implementation comprise the independence from computational infrastructure and the straightforward integration of cloud-based observation databases with the modelling and data assimilation platform.
  • Publication
    Metadata only
    A LRAAM-based Partial Order Function for Ontology Matching in the Context of Service Discovery
    (2017-6-14)
    Ludolph, Hendrik; Babin, Gilbert
    The demand for Software as a Service is heavily increasing in the era of Cloud. With this demand comes a proliferation of third-party service offerings to fulfill it. It thus becomes crucial for organizations to find and select the right services to be integrated into their existing tool landscapes. Ideally, this is done automatically and continuously. The objective is to always provide the best possible support to changing business needs. In this paper, we explore an artificial neural network implementation, an LRAAM, as the specific oracle to control the selection process. We implemented a proof of concept and conducted experiments to explore the validity of the approach. We show that our implementation of the LRAAM performs correctly under specific parameters. We also identify limitations in using LRAAM in this context.
  • Publication
    Metadata only
    Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows
    (Springer, LNCS 10048, 2016-12-14)
    The computational complexity and the constantly increasing amount of input data for scientific computing models are threatening their scalability. In addition, this is leading towards more data-intensive scientific computing, thus raising the need to combine techniques and infrastructures from the HPC and big data worlds. This paper presents a methodological approach to cloudify generalist iterative scientific workflows, with a focus on improving data locality and preserving performance. To evaluate this methodology, it was applied to a hydrological simulator, EnKF-HGS. The design was implemented using Apache Spark, and assessed in a local cluster and in Amazon Elastic Compute Cloud (EC2) against the original version to evaluate performance and scalability.
  • Publication
    Metadata only
    Lessons Learned from Applying Big Data Paradigms to a Large Scale Scientific Workflow
    (CEUR-WS.org, 2016-11-14)
    The increasing amount of data related to the execution of scientific workflows has raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining the traditional high-performance computing and grid-based approaches for scientific workflows with Big Data analytics paradigms. Our goal was to assess and discuss the suitability of such data-intensive-oriented mechanisms for production-ready workflows, especially in terms of scalability, focusing on a key element in the Big Data ecosystem: the data-centric programming model. Hence, we reproduced the functionality of an MPI-based iterative workflow from the hydrology domain, EnKF-HGS, using the Spark data analysis framework. We conducted experiments on a local cluster, and we relied on our results to discuss promising directions for further research.
  • Publication
    Metadata only
    Cloudification of a Legacy Hydrological Simulator using Apache Spark
    (2016-9-14)
    Carretero, Jesus; Caíno-Lores, Silvina
    The field of hydrology usually relies on complex multiphysics systems and data collected from geographically distributed sensors in order to obtain good quality predictions and analysis of how water moves through the environment. Nowadays, the computational resources needed to run such complex simulators, and the increasing size of the datasets related to the models, have sparked interest in distributed infrastructures like clouds. This paper presents the results of applying a cloudification methodology to a legacy hydrological simulator (HydroGeoSphere), wrapped with an ensemble Kalman filter. This work describes how the methodology was applied, the particularities of its implementation and configuration for the Apache Spark iterative map-reduce platform, and the results of an evaluation in a commodity cluster against an MPI implementation of the simulator.