Slead: low-memory steady distributed systems slicing

Francisco Maia, Miguel Matos, Etienne Rivière & Rui Oliveira

Abstract Slicing a large-scale distributed system is the process of autonomously
partitioning its nodes into k groups, named slices. Slicing is
associated with an order on node-specific criteria, such as available storage,
uptime, or bandwidth. Each slice corresponds to the nodes between two
quantiles in a virtual ranking according to the criteria.
For instance, a system can be split into three groups: one with the nodes with
the lowest uptimes, one with the nodes with the highest uptimes, and one
in the middle. Such a partitioning can be used by applications to assign
different tasks to different groups of nodes, e.g., assigning critical tasks to
the more powerful or stable nodes and less critical tasks to other slices.
Assigning a slice to each node in a large-scale distributed system, where
no global knowledge of nodes’ criteria exists, is not trivial. Recently,
much research effort has been dedicated to guaranteeing fast and correct
convergence to the slicing that a global sort of the nodes would produce.
Unfortunately, state-of-the-art slicing protocols exhibit flaws that preclude
their application in real scenarios, in particular with respect to cost
and stability. In this paper, we identify two such flaws: steadiness issues, where
nodes at a slice border constantly switch slices, and large memory requirements
for adequate convergence. We provide practical solutions for both. Our
solutions are generic and can be applied to two different state-of-the-art
slicing protocols with little effort and while preserving the desirable properties
of each. The effectiveness of the proposed solutions is extensively
studied in several simulated experiments.
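
The definition of slicing given above can be illustrated with a minimal sketch. It assumes global knowledge of every node's attribute, which is exactly what a real deployment lacks, so it shows only the reference result that a slicing protocol tries to converge to, not the protocol described in the paper. The function name `assign_slices` and the sample uptimes are hypothetical and chosen for illustration.

```python
from typing import Dict


def assign_slices(attributes: Dict[str, float], k: int) -> Dict[str, int]:
    """Reference slicing under global knowledge (an assumption for illustration).

    Nodes are sorted by their attribute value, and the resulting ranking is
    cut into k consecutive quantile ranges. Slice 0 holds the lowest values.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Break ties on the node identifier so the ranking is deterministic.
    ranked = sorted(attributes, key=lambda node: (attributes[node], node))
    n = len(ranked)
    # A node at 0-based rank r falls into slice floor(r * k / n).
    return {node: (rank * k) // n for rank, node in enumerate(ranked)}


# Hypothetical uptimes (in hours) for six nodes, split into three slices.
uptimes = {"a": 5.0, "b": 120.0, "c": 42.0, "d": 7.5, "e": 300.0, "f": 61.0}
slices = assign_slices(uptimes, 3)
```

With these sample values, nodes a and d land in the lowest-uptime slice, c and f in the middle slice, and b and e in the highest-uptime slice.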
Citation F. Maia, et al., "Slead: low-memory steady distributed systems slicing," in DAIS'12: 12th IFIP International Conference on Distributed Applications and Interoperable Systems, Stockholm, Sweden, 2012.
Type Conference proceedings (English)
Conference name DAIS'12: 12th IFIP International Conference on Distributed Applications and Interoperable Systems (Stockholm, Sweden)
Conference date 1-6-2012
Publisher Springer
URL http://link.springer.com/chapter/10.1007%2F978-3-642-3082...