Burihabwa, Dorian
Name
Burihabwa, Dorian
Main affiliation
Website
Position
Former staff member
Identifiers
Search results
Showing items 1 - 7 of 7
- Publication (Open access): Composing private and censorship-resistant solutions for distributed storage
Cloud storage is a commonly adopted practice for backing up private and professional data. With virtually unlimited capacity, cloud storage lets everyone focus on their work without fear of running out of space or losing data. But as users entrust more and more data to these online storage providers, they do so at the price of a certain loss of control. And at a time when many online services earn part of their revenue by exploiting user data and metadata, questions of confidentiality and security arise. Are the documents that users upload read by the service provider? Is the content of these documents shared by the service provider with third-party partners? What happens to the data when the service provider goes bankrupt? Although service providers strive to give their customers satisfactory answers, the loss of control over data continues to fuel real concerns.
Added to these concerns is the question of the reliability of the service offered by these storage providers. Most offerings are built on data centres spread across the world. While distributing data allows for better quality of service, it also brings its share of difficulties. Service providers must now anticipate and prevent problems arising both inside data centres and on the network between them. Striking a balance between security, confidentiality, resilience and performance while coordinating a large number of distributed storage nodes is no easy task. And even when a balanced formula is found, it is often paid for with higher storage costs.
In this thesis, we attempt to provide solutions to these problems by focusing on three aspects. First, we study flexible solutions that guarantee the security, integrity and redundancy of data for cloud storage. By taking advantage of consumer storage offerings, we show that it is possible to retain control of cloud storage from the client side.
Second, we build a distributed data archive whose resilience goes beyond standard redundancy techniques. To this end, we implement Recast, a prototype that uses data entanglement to encode and distribute data blocks over many storage nodes in order to ensure their durability. Finally, we examine how deduplication can reduce the increase in storage costs caused by the methods proposed above. More precisely, we make use of Generalised Deduplication, a method whose results go beyond classical deduplication thanks to similarity detection that is finer than exact matching.
Summary
Cloud storage has durably entered the stage as the go-to solution for business and personal storage. Virtually extending storage capabilities to infinity, cloud storage enables companies and individuals to focus on content creation without fear of running out of space or losing data. But as users entrust more and more data to the cloud, they also have to accept a loss of control over the data they offload to the cloud. At a time when online services seem to be making a significant part of their profits by exploiting customer data, concerns over privacy and integrity of said data naturally arise. Are users' online documents read by the storage provider or its employees? Is the content of these documents shared with third-party partners of the storage provider? What happens if the provider goes bankrupt? Whatever answer can be offered by the storage provider, the loss of control should be cause for concern.
But storage providers also have to worry about trust and reliability. As they build distributed solutions to accommodate their customers’ needs, these concerns of control extend to the infrastructure they operate on. Reconciling security, confidentiality, resilience and performance over large sets of distributed storage nodes is a tricky balancing act. And even when a suitable balance can be found, it often comes at the expense of increased storage overhead.
In this dissertation, we try to mitigate these issues by focusing on three aspects. First, we study solutions to empower users with flexible tooling ensuring security, integrity and redundancy in distributed storage settings. By leveraging public cloud storage offerings to build a configurable file system and storage middleware, we show that securing cloud storage from the client side is an effective way of maintaining control. Second, we build a distributed archive whose resilience goes beyond standard redundancy schemes. To achieve this, we implement Recast, which relies on a data entanglement scheme that encodes and distributes data over a set of storage nodes to ensure durability at a manageable cost. Finally, we look into offsetting the increase in storage overhead by means of data reduction. This is made possible by the use of Generalised Deduplication, a scheme that improves over classical data deduplication by detecting similarities beyond exact matches.
- Publication (Open access): On the Cost of Safe Storage for Public Clouds: an Experimental Evaluation (IEEE, 2016-9-26)
Pontes, Rogério; Maia, Francisco; Oliveira, Rui; Paulo, João
Cloud-based storage services such as Dropbox, Google Drive and OneDrive are increasingly popular for storing enterprise data, and they have already become the de facto choice for cloud-based backup of hundreds of millions of regular users. Drawn by the wide range of services they provide, no upfront costs and 24/7 availability across all personal devices, customers are well aware of the benefits that these solutions can bring. However, most users tend to forget, or worse ignore, some of the main drawbacks of such cloud-based services, namely in terms of privacy. Data entrusted to these providers can be leaked by hackers, disclosed in response to a governmental agency's subpoena, or even accessed directly by the storage providers (e.g., for commercial benefits). While there exist solutions to prevent or alleviate these problems, they typically require direct intervention from the clients, like encrypting their data before storing it, and reduce the benefits provided, such as easily sharing data between users. This practical experience report studies a wide range of security mechanisms that can be used atop standard cloud-based storage services. We present the details of our evaluation testbed and discuss the design choices that have driven its implementation. We evaluate several state-of-the-art techniques with varying security guarantees responding to user-assigned security and privacy criteria. Our results reveal the various trade-offs of the different techniques by means of representative workloads on top of industry-grade storage services.
- Publication (Open access): Blockchain-Based Metadata Protection for Archival Systems (IEEE, 2019-10-1)
L'Hutereau, Arnaud
- Publication (Metadata only): A Performance Evaluation of Erasure Coding Libraries for Cloud-Based Data Stores
Erasure codes have been widely used over the last decade to implement reliable data stores. They offer interesting trade-offs between efficiency, reliability, and storage overhead. Indeed, a distributed data store holding encoded data blocks can tolerate the failure of multiple nodes while requiring only a fraction of the space necessary for plain replication, albeit at an increased encoding and decoding cost. There now exist a number of libraries implementing several variations of erasure codes, which notably differ in terms of complexity and implementation-specific optimizations. Seven years ago, Plank et al. [14] conducted a comprehensive performance evaluation of the open-source erasure coding libraries available at the time to compare their raw performance and measure the impact of different parameter configurations. In the present experimental study, we take a fresh look at the state of the art of erasure coding libraries. Not only do we cover a wider set of libraries running on modern hardware, but we also consider their efficiency when used in realistic settings for cloud-based storage, namely when deployed across several nodes in a data centre. Our measurements therefore account for the end-to-end costs of data accesses over several distributed nodes, including the encoding and decoding costs, and shed light on the performance one can expect from the various libraries when deployed in a real system. Our results reveal important differences in the efficiency of the different libraries, notably due to the type of coding algorithm and the use of hardware-specific optimizations.
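The trade-off described in this abstract (tolerating node failures with less space than plain replication) can be illustrated with a minimal sketch. It uses a single XOR parity block, the simplest possible erasure code, and is not taken from any of the libraries evaluated in the paper; the function names `xor_blocks`, `encode`, `recover` and `storage_overhead` are hypothetical and chosen only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored_blocks):
    """Rebuild the single missing block (marked None) from the survivors."""
    missing = [i for i, block in enumerate(stored_blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("a single-parity code repairs exactly one lost block")
    survivors = [block for block in stored_blocks if block is not None]
    repaired = list(stored_blocks)
    # XOR of all surviving blocks equals the lost one, data or parity alike.
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


def storage_overhead(k, m):
    """Stored bytes per byte of user data for k data and m redundancy blocks."""
    return (k + m) / k


# Illustrative data: three equally sized blocks, each stored on its own node.
data = [b"node", b"fail", b"safe"]
stored = encode(data)
stored[1] = None              # simulate the loss of one storage node
restored = recover(stored)
```

Under these assumptions, three data blocks plus one parity block survive the loss of any one node with `storage_overhead(3, 1)` stored bytes per user byte, whereas keeping two full copies tolerates the same single failure at `storage_overhead(1, 1)`. Production libraries typically use codes such as Reed-Solomon that tolerate several simultaneous failures, at a higher encoding and decoding cost.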
- Publication (Open access)
- Publication (Open access): SafeFS: A Modular Architecture for Secure User-Space File Systems (One FUSE to rule them all) (ACM, 2017-5-22)
Pontes, Rogério; Maia, Francisco; Paulo, João; Oliveira, Rui