SafeCloud: Secure and Resilient Cloud Architecture
Project title
SafeCloud: Secure and Resilient Cloud Architecture
Description
Cloud infrastructures, despite all their advantages and importance to the competitiveness of modern economies, raise fundamental questions related to the privacy, integrity, and security of offsite data storage and processing tasks. These questions are currently not answered satisfactorily by existing technologies. Furthermore, recent developments in the wake of the expansive and sometimes unauthorised government access to private and sensitive data raise major privacy and security concerns about data located in the cloud, especially when data is physically located, processed, or must transit outside the legal jurisdiction of its rightful owner. This is exacerbated by providers of cloud services that frequently move and process data without notice in ways that are detrimental to the users and their privacy.
SafeCloud will re-architect cloud infrastructures to ensure that data transmission, storage, and processing can be (1) partitioned across multiple administrative domains that are unlikely to collude, so that sensitive data is protected by design; and (2) entangled with inter-dependencies that make it impossible for any single domain to tamper with data integrity. These two principles (partitioning and entanglement) are applied holistically across the entire data management stack, from communication to storage and processing.
Users will control the choice of non-colluding domains for partitioning and the trade-offs between entanglement and performance, and thus will retain full control over what happens to their data. This will reduce users' privacy-driven reluctance to manage their personal data online and will generate important benefits for privacy-sensitive online applications such as distributed cloud infrastructures and medical record storage platforms.
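As an illustration of the partitioning principle, the minimal sketch below (Python; the two-domain setup and all names are hypothetical, not SafeCloud's actual design) splits a record into two XOR shares held by distinct administrative domains: either share alone is indistinguishable from random bytes, so both domains would have to collude to learn anything about the data.

    import secrets

    def split(data: bytes):
        """Split `data` into two XOR shares, one per administrative domain.
        Either share alone is indistinguishable from random bytes."""
        share_a = secrets.token_bytes(len(data))               # random pad held by domain A
        share_b = bytes(x ^ y for x, y in zip(data, share_a))  # data XOR pad held by domain B
        return share_a, share_b

    def reconstruct(share_a: bytes, share_b: bytes) -> bytes:
        """Recombining the shares requires the cooperation of both domains."""
        return bytes(x ^ y for x, y in zip(share_a, share_b))

    record = b"patient-42: blood type AB+"
    a, b = split(record)
    assert reconstruct(a, b) == record   # both shares together recover the data

This covers only the confidentiality side of partitioning; the entanglement mechanisms described above address integrity.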
Principal investigator
Status
Completed
Start date
1 September 2015
End date
31 August 2018
Project website
Internal identifier
31562
Identifier
6 Results
- Publication (open access): Have a Seat on the ErasureBench: Easy Evaluation of Erasure Coding Libraries for Distributed Storage Systems
  We present ErasureBench, an open-source framework to test and benchmark erasure coding implementations for distributed storage systems under realistic conditions. ErasureBench automatically instantiates and scales a cluster of storage nodes, and can seamlessly leverage existing failure traces. As a first example, we use ErasureBench to compare three coding implementations: a (10,4) Reed-Solomon (RS) code, a (10,6,5) locally repairable code (LRC), and a partition of the data source into ten pieces without error correction. Our experiments show that LRC and RS codes require the same repair throughput when used with small storage nodes, since cluster and network management traffic dominate in this regime. With large storage nodes, read and write traffic increases, and our experiments confirm the theoretical and practical trade-offs between the storage overhead and repair bandwidth of RS and LRC codes.
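A back-of-the-envelope sketch of the trade-off measured above, in Python: an MDS code such as RS must read k blocks to repair one lost block, while a locally repairable code reads only a small local group, at the price of extra storage. The RS numbers match the (10,4) code above; the LRC parameters are illustrative assumptions, not the paper's exact (10,6,5) convention.

    def mds_code(k, m):
        """(k+m, k) MDS code such as Reed-Solomon: repairing one lost
        block requires reading k surviving blocks."""
        return {"overhead": (k + m) / k, "repair_reads": k}

    def lrc_code(k, local, global_, locality):
        """Locally repairable code: k data blocks plus `local` local and
        `global_` global parities; a lost block is rebuilt from the
        `locality` other members of its local group."""
        return {"overhead": (k + local + global_) / k, "repair_reads": locality}

    rs = mds_code(k=10, m=4)                              # the paper's (10,4) RS code
    lrc = lrc_code(k=10, local=2, global_=4, locality=5)  # hypothetical LRC, locality 5
    for name, c in (("RS ", rs), ("LRC", lrc)):
        print(f"{name}: {c['overhead']:.2f}x storage, {c['repair_reads']} reads per repair")

With these (assumed) parameters, the LRC pays 1.6x instead of 1.4x storage overhead but halves the number of reads per repair, which is exactly the kind of regime-dependent trade-off the benchmark exposes.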
- Publication (open access): SafeFS: A Modular Architecture for Secure User-Space File Systems (One FUSE to rule them all) (ACM, 2017-05-22)
  Pontes, Rogério; Maia, Francisco; Paulo, João; Oliveira, Rui
- Publication (open access): Block placement strategies for fault-resilient distributed tuple spaces: an experimental study (Springer, 2017-06-19)
  Barbi, Roberta; Buravlev, Vitaly; Antares Mezzina, Claudio
  The tuple space abstraction provides an easy-to-use programming paradigm for distributed applications. Intuitively, it behaves like a distributed shared memory, where applications write and read entries (tuples). When deployed over a wide area network, the tuple space needs to cope efficiently with faults of links and nodes. Erasure coding techniques are increasingly popular for dealing with such catastrophic events, in particular due to their storage efficiency with respect to replication. When a client writes a tuple into the system, it is first striped into k blocks and encoded into n > k blocks in a fault-redundant manner. Then, any k out of the n blocks are sufficient to reconstruct and read the tuple. This paper presents several strategies for placing those blocks across the set of nodes of a wide area network that together form the tuple space. We present the performance trade-offs of different placement strategies by means of simulations and a Python implementation of a distributed tuple space. Our results reveal important differences in the efficiency of the different strategies, for example in terms of block fetching latency, and show that having some knowledge of the underlying network graph topology is highly beneficial.
- Publication (metadata only): A Performance Evaluation of Erasure Coding Libraries for Cloud-Based Data Stores
  Erasure codes have been widely used over the last decade to implement reliable data stores. They offer interesting trade-offs between efficiency, reliability, and storage overhead. Indeed, a distributed data store holding encoded data blocks can tolerate the failure of multiple nodes while requiring only a fraction of the space necessary for plain replication, albeit at an increased encoding and decoding cost. There exist nowadays a number of libraries implementing several variations of erasure codes, which notably differ in terms of complexity and implementation-specific optimizations. Seven years ago, Plank et al. [14] conducted a comprehensive performance evaluation of the open-source erasure coding libraries available at the time, comparing their raw performance and measuring the impact of different parameter configurations. In the present experimental study, we take a fresh perspective on the state of the art of erasure coding libraries. Not only do we cover a wider set of libraries running on modern hardware, but we also consider their efficiency when used in realistic settings for cloud-based storage, namely when deployed across several nodes in a data centre. Our measurements therefore account for the end-to-end costs of data accesses over several distributed nodes, including the encoding and decoding costs, and shed light on the performance one can expect from the various libraries when deployed in a real system. Our results reveal important differences in the efficiency of the different libraries, notably due to the type of coding algorithm and the use of hardware-specific optimizations.
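Both studies above rest on the same (k, n) erasure-coding mechanism: data is striped into k blocks, encoded into n > k blocks, and any k of them suffice for reconstruction. The toy sketch below (Python, polynomial interpolation over GF(257)) shows that mechanism end to end; it is an illustrative stand-in, not one of the optimized libraries the papers evaluate.

    # Toy (k, n) erasure code over GF(257) via polynomial interpolation.
    P = 257  # smallest prime above 255, so every byte value fits in the field

    def _interp(points, x):
        """Evaluate at `x` the unique degree-(k-1) polynomial through the
        k given (xi, yi) points, with all arithmetic mod P (Lagrange form)."""
        total = 0
        for xj, yj in points:
            num = den = 1
            for xm, _ in points:
                if xm != xj:
                    num = num * (x - xm) % P
                    den = den * (xj - xm) % P
            total = (total + yj * num * pow(den, P - 2, P)) % P
        return total

    def encode(data: bytes, k: int, n: int) -> dict:
        """Stripe `data` into k blocks (the polynomial's values at x=1..k)
        and extend to n blocks (x=1..n). Blocks are lists of field elements."""
        data += b"\x00" * (-len(data) % k)        # pad to a multiple of k
        stripes = [data[i::k] for i in range(k)]  # k systematic data blocks
        blocks = {x: [] for x in range(1, n + 1)}
        for pos in range(len(stripes[0])):
            points = [(x, stripes[x - 1][pos]) for x in range(1, k + 1)]
            for x in range(1, n + 1):
                blocks[x].append(_interp(points, x))
        return blocks

    def reconstruct(available: dict, k: int) -> bytes:
        """Rebuild the data (with padding) from any k surviving blocks."""
        chosen = sorted(available.items())[:k]    # any k blocks suffice
        out = bytearray()
        for pos in range(len(chosen[0][1])):
            points = [(x, blk[pos]) for x, blk in chosen]
            out.extend(_interp(points, x) for x in range(1, k + 1))
        return bytes(out)

    tuple_value = b"('temperature', 23.5)"         # 21 bytes, divides evenly by k=3
    blocks = encode(tuple_value, k=3, n=5)
    survivors = {x: blocks[x] for x in (2, 4, 5)}  # two of five blocks lost
    assert reconstruct(survivors, k=3) == tuple_value

Production libraries typically implement the same idea over GF(2^8) with vectorized, hardware-specific arithmetic, which is where the performance differences measured above come from.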
- Publication (open access): On the Cost of Safe Storage for Public Clouds: an Experimental Evaluation (IEEE, 2016-09-26)
  Pontes, Rogério; Maia, Francisco; Oliveira, Rui; Paulo, João
  Cloud-based storage services such as Dropbox, Google Drive and OneDrive are increasingly popular for storing enterprise data, and they have already become the de facto choice for cloud-based backup of hundreds of millions of regular users. Drawn by the wide range of services they provide, the absence of upfront costs, and 24/7 availability across all personal devices, customers are well aware of the benefits that these solutions can bring. However, most users tend to forget, or worse, ignore some of the main drawbacks of such cloud-based services, namely in terms of privacy. Data entrusted to these providers can be leaked by hackers, disclosed upon request from a governmental agency's subpoena, or even accessed directly by the storage providers (e.g., for commercial benefit). While there exist solutions to prevent or alleviate these problems, they typically require direct intervention from the clients, such as encrypting their data before storing it, and they reduce the benefits provided, such as easy sharing of data between users. This practical experience report studies a wide range of security mechanisms that can be used atop standard cloud-based storage services. We present the details of our evaluation testbed and discuss the design choices that have driven its implementation. We evaluate several state-of-the-art techniques with varying security guarantees responding to user-assigned security and privacy criteria. Our results reveal the various trade-offs of the different techniques by means of representative workloads on top of industry-grade storage services.
- Publication (open access): Worst-case, information and all-blocks locality in distributed storage systems: An explicit comparison
  Distributed storage systems often use erasure coding techniques to provide reliability while decreasing the storage overhead required by replication. Due to the drawbacks of standard MDS erasure-correcting codes, numerous coding schemes recently proposed for distributed storage systems target other metrics, such as repair locality and repair bandwidth. Unfortunately, these schemes are not always practical, and for most of them locality covers information data only. In this article, we compare three explicit linear codes for three types of locality: a Reed-Solomon code for worst-case locality, a recently proposed pyramid code for information locality, and the Hamming code HAM, an optimal locally repairable code directly built from its generator matrix for all-blocks locality. We also provide an efficient way of repairing HAM and show that, for the same level of storage overhead, HAM provides faster encoding, faster repair, and lower repair bandwidth than the other two solutions while requiring less than fifty lines of code.
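The all-blocks locality studied above can be illustrated with the classic binary Hamming(7,4) code applied block-wise, as in the hypothetical Python sketch below (not the paper's HAM implementation): every stored block, parities included, is repairable by XOR-ing just three others, whereas a comparable MDS code would read k = 4 blocks.

    def xor(*blocks: bytes) -> bytes:
        out = bytearray(blocks[0])
        for blk in blocks[1:]:
            for i, byte in enumerate(blk):
                out[i] ^= byte
        return bytes(out)

    def encode(d1, d2, d3, d4):
        """Hamming(7,4) block-wise: each parity covers 3 data blocks, and
        every one of the 7 blocks sits in at least one repair group."""
        return {
            "d1": d1, "d2": d2, "d3": d3, "d4": d4,
            "p1": xor(d1, d2, d4),  # repair group {p1, d1, d2, d4}
            "p2": xor(d1, d3, d4),  # repair group {p2, d1, d3, d4}
            "p3": xor(d2, d3, d4),  # repair group {p3, d2, d3, d4}
        }

    stored = encode(b"AAAA", b"BBBB", b"CCCC", b"DDDD")
    # A lost data block is rebuilt from the 3 other members of its group:
    assert xor(stored["p1"], stored["d1"], stored["d4"]) == stored["d2"]
    # A lost parity block is rebuilt the same way -- locality holds for all blocks:
    assert xor(stored["d2"], stored["d3"], stored["d4"]) == stored["p3"]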
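Finally, for the safe-storage evaluation above, the baseline client-side mechanism is encrypting data before it reaches the provider. Below is a minimal sketch with Python's `cryptography` package; the upload/download steps are placeholders, not the paper's testbed.

    from cryptography.fernet import Fernet    # pip install cryptography

    # The key never leaves the client, so the provider stores only ciphertext.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    plaintext = b"2018-Q2 sales figures"
    blob = fernet.encrypt(plaintext)          # authenticated encryption (AES-CBC + HMAC)
    # upload(blob) / download() against the provider would go here (placeholders)
    assert fernet.decrypt(blob) == plaintext  # integrity is verified on decryption

The trade-off the abstract points out follows directly: because the provider never sees the key, conveniences such as easy sharing between users stop working unless clients exchange keys out of band.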