Next generation erasure coding methods for cloud storage
Project title
Next generation erasure coding methods for cloud storage
Description
This project proposes the study of next-generation erasure coding methods to preserve data efficiently in cloud storage systems. Cloud computing is built on inexpensive commodity hardware, and software and hardware failures may cause data loss, so storing redundant data is essential to preserve digital content. Replication is the de facto standard for creating redundancy; for example, triplication keeps three replicas in distinct places, and Google, Facebook and many other storage systems use it. Because current research and industry efforts focus on reducing storage overhead, erasure codes such as Reed-Solomon codes have become a popular alternative. Neither approach can practically tolerate a large number of simultaneous failures, as both consume substantial resources. Significant trade-offs among storage overhead, network bandwidth and disk I/O limit a system's fault tolerance, which therefore remains low: triplication tolerates 2 failures, and Reed-Solomon in a common setting used by Facebook tolerates 4 failures.

The main question this project addresses is: how can we improve the reliability of storage systems while using few resources? Increasing the fault tolerance brings multiple benefits. Notably, it helps with long-term data retention, may facilitate datacenter maintenance, and acts as a deterrent against malicious attacks such as tampering or data censorship. The hypothesis is that creating interdependencies between old and new content inserted in a system can be used to disperse redundant data across a large number of devices efficiently.
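The storage/fault-tolerance trade-off mentioned above can be made concrete with a small sketch. The Python snippet below is illustrative only; the Reed-Solomon parameters (10 data + 4 parity blocks) are an assumption about the Facebook setting, which the description only characterizes as tolerating 4 failures.

```python
# Illustrative sketch: storage overhead vs. tolerated failures for the two
# redundancy schemes discussed in the project description.
# RS(10, 4) is an assumed parameterization, not stated in the description.

def replication_stats(replicas: int):
    """Return (storage overhead, tolerated drive failures) for n-way replication."""
    return float(replicas), replicas - 1

def reed_solomon_stats(data_blocks: int, parity_blocks: int):
    """Return (storage overhead, tolerated drive failures) for an RS(k, m) code."""
    overhead = (data_blocks + parity_blocks) / data_blocks
    return overhead, parity_blocks

if __name__ == "__main__":
    trip_overhead, trip_failures = replication_stats(3)
    rs_overhead, rs_failures = reed_solomon_stats(10, 4)
    print(f"Triplication: {trip_overhead:.1f}x storage, tolerates {trip_failures} failures")
    print(f"RS(10,4):     {rs_overhead:.1f}x storage, tolerates {rs_failures} failures")
```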
Principal investigator
Status
Completed
Start date
1 February 2016
End date
31 July 2016
Researchers
Miller, Ethan
Organizations
Internal identifier
34790
Identifier
Keywords
1 Result
- Publication (metadata only): Simple Data Entanglement Layouts with High Reliability. We study the reliability of open and closed entanglements, two simple data distribution layouts for log-structured append-only storage systems. Both techniques use equal numbers of data and parity drives and generate their parity data by computing the exclusive or (XOR) of the most recently appended data with the contents of their last parity drive. While open entanglements maintain an open chain of data and parity drives, closed entanglements include the exclusive or of the contents of their first and last data drives. We evaluate the five-year reliabilities of open and closed entanglements, for two different array sizes and drive failure rates. Our results show that open entanglements provide much better five-year reliabilities than mirroring and reduce the probability of a data loss by at least 90 percent over a period of five years. Closed entanglements perform even better and reduce the same probability by at least 98 percent.
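The parity chain described in this abstract can be sketched in a few lines of Python. This is one possible reading of the abstract, not the paper's implementation: blocks are modelled as equal-length byte strings, the chain is assumed to start with the first data block as its first parity, and the closed layout is assumed to add one extra parity equal to the XOR of the first and last data blocks.

```python
# Minimal sketch of the open/closed entanglement parity chains, under the
# assumptions stated above. Real systems operate on whole drives, not blocks.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def open_entanglement(data_blocks):
    """Each new parity is the XOR of the newly appended data block with the
    previous parity block (chain start is an assumption: first parity = first
    data block)."""
    parities = []
    for block in data_blocks:
        parities.append(block if not parities else xor_blocks(block, parities[-1]))
    return parities

def closed_entanglement(data_blocks):
    """Open chain plus an extra parity XORing the first and last data blocks,
    closing the chain (one reading of the abstract)."""
    parities = open_entanglement(data_blocks)
    parities.append(xor_blocks(data_blocks[0], data_blocks[-1]))
    return parities

if __name__ == "__main__":
    data = [bytes([i]) * 4 for i in range(1, 5)]  # four toy 4-byte blocks
    print([p.hex() for p in open_entanglement(data)])
    print([p.hex() for p in closed_entanglement(data)])
```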