Showing items 1 - 10 of 23
  • Publication
    Metadata only
    BRISA: Combining Efficiency and Reliability in Epidemic Data Dissemination
    (2012-5-21)
    Matos, Miguel; Oliveira, Rui
    There is an increasing demand for efficient and robust systems able to cope with today's global needs for intensive data dissemination, e.g., media content or news feeds. Unfortunately, traditional approaches tend to focus on one end of the efficiency/robustness design spectrum, by either leveraging rigid structures such as trees to achieve efficient distribution, or using loosely-coupled epidemic protocols to obtain robustness. In this paper we present BRISA, a hybrid approach combining the robustness of epidemic-based dissemination with the efficiency of tree-based structured approaches. This is achieved by having dissemination structures such as trees implicitly emerge from an underlying epidemic substrate through a judicious selection of links. These links are chosen with local knowledge only, and in such a way that the completeness of data dissemination is not compromised, i.e., the resulting structure covers all nodes. Failures are treated as an integral part of the system, as the dissemination structures can be promptly compensated for and repaired thanks to the underlying epidemic substrate. Besides presenting the protocol design, we conduct an extensive evaluation in a real environment, analyzing the effectiveness of the structure creation mechanism and its robustness under faults and churn. Results confirm BRISA as an efficient and robust approach to data dissemination at large scale.
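    The link-selection idea can be pictured with a short sketch: a node keeps the gossip link over which it first receives a message as its tree edge, forwards over a few substrate links, and prunes links that only deliver duplicates. This is a rough illustration under assumed names (Node, on_receive, FANOUT); it is not the paper's actual algorithm or fanout policy.

      import random

      FANOUT = 3  # illustrative fanout; BRISA derives forwarding choices from the substrate

      class Node:
          """Rough sketch of tree emergence on top of a gossip (epidemic) overlay."""

          def __init__(self, node_id, neighbors):
              self.id = node_id
              self.neighbors = neighbors   # links supplied by the epidemic substrate
              self.parent = {}             # msg_id -> neighbor the message was first heard from
              self.children = {}           # msg_id -> links kept as tree edges

          def on_receive(self, msg_id, payload, sender):
              if msg_id in self.parent:
                  return "PRUNE"           # duplicate: this link is redundant for the tree
              self.parent[msg_id] = sender # local decision only: first link becomes the tree edge
              targets = [n for n in self.neighbors if n is not sender]
              self.children[msg_id] = random.sample(targets, min(FANOUT, len(targets)))
              for child in list(self.children[msg_id]):
                  if child.on_receive(msg_id, payload, self) == "PRUNE":
                      self.children[msg_id].remove(child)
              return "ACK"

    If the chosen parent link fails, a node can fall back on the remaining substrate links, which is the intuition behind the repair mechanism described in the abstract.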
  • Publication
    Metadata only
    Exploiting Node Connection Regularity for DHT Replication
    (IEEE, 2011-10)
    Pace, Alessio; Quema, Vivien
  • Publication
    Metadata only
    NAT-resilient Gossip Peer Sampling
    (IEEE, 2009-6-22)
    Kermarrec, Anne-Marie; Pace, Alessio; Quema, Vivien
  • Publication
    Metadata only
  • Publication
    Metadata only
    TOPiCo: Detecting Most Frequent Items from Multiple High-Rate Event Streams
    (ACM, 2015-6-29)
    Matos, Miguel; Oliveira, Rui
    Systems such as social networks, search engines or trading platforms operate geographically distant sites that continuously generate streams of events at a high rate. Such events can be access logs to web servers, feeds of messages from participants of a social network, or financial data, among others. The ability to detect trends and popularity variations in a timely manner is of paramount importance in such systems. In particular, determining which events are the most popular across all sites makes it possible to capture the most relevant information in near real-time and quickly adapt the system to the load. This paper presents TOPiCo, a protocol that computes the most popular events across geo-distributed sites in a low-cost, bandwidth-efficient and timely manner. TOPiCo starts by building the set of most popular events locally at each site. Then, it disseminates only events that have a chance to be among the most popular ones across all sites, significantly reducing the required bandwidth. We give a correctness proof of our algorithm and evaluate TOPiCo using a real-world trace of more than 240 million events spread across 32 sites. Our empirical results show that (i) TOPiCo is timely and cost-efficient for detecting popular events in a large-scale setting, (ii) it adapts dynamically to the distribution of the events, and (iii) our protocol is particularly efficient for skewed distributions.
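    A naive version of the two-step idea (local top-k per site, then a merge of the shipped candidates) can be sketched as follows. The names local_top and merge_candidates and the value of K are illustrative; the paper's pruning rule is more careful, since it must keep enough candidates to make the global result exact.

      from collections import Counter

      K = 5  # illustrative "top-k" size

      def local_top(events, k=K):
          """Per-site step: count local events and keep the k most frequent."""
          return dict(Counter(events).most_common(k))

      def merge_candidates(per_site_tops, k=K):
          """Merge step: sum the candidate counts shipped by each site and
          keep the k most frequent among them."""
          merged = Counter()
          for site_top in per_site_tops:
              merged.update(site_top)
          return merged.most_common(k)

      # Made-up streams for three sites:
      sites = [
          ["a", "a", "b", "c", "a"],
          ["b", "b", "c", "d"],
          ["a", "c", "c", "e"],
      ]
      print(merge_candidates([local_top(s) for s in sites]))

    Note that this naive merge can miss an item that is moderately frequent everywhere yet never enters a local top-k, which is precisely the case the protocol's candidate selection is designed to handle.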
  • Publication
    Metadata only
    Evaluating the Cost and Robustness of Self-organizing Distributed Hash Tables
    Self-organizing construction principles are a natural fit for large-scale distributed systems in unpredictable deployment environments. These principles allow a system to systematically converge to a global state by means of simple, uncoordinated actions by individual peers. Indexing services based on the distributed hash table (DHT) abstraction have been established as a solid foundation for large-scale distributed applications. For most DHTs, the creation and maintenance of the overlay structure rely on the exploration and update of an already stabilized structure. We evaluate in this paper the practical interest of self-organizing principles, and in particular gossip-based overlay construction protocols, to bootstrap and maintain various DHT implementations. Based on the seminal work on T-Chord, a self-organizing version of Chord using the T-Man overlay construction service, we contribute three additional self-organizing DHTs: T-Pastry, T-Kademlia and T-Kelips. We conduct an experimental evaluation of the cost and performance of each of these designs using a prototype implementation. Our conclusion is that, while providing equivalent performance in a stabilized system, self-organizing DHTs are able to sustain and recover from higher levels of churn than their explicitly-created counterparts, and should therefore be considered as a method of choice for deploying robust indexing layers in adverse environments.
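    The gossip-based overlay construction that bootstraps these DHTs can be illustrated by a T-Man-style round: nodes repeatedly exchange views and keep the peers that rank best under the target topology's distance function (here, clockwise distance on a Chord-like identifier ring). Class and constant names below are illustrative assumptions, not the paper's code.

      import random

      VIEW_SIZE = 8          # illustrative view size
      ID_SPACE = 2 ** 16     # illustrative identifier space

      def ring_distance(a, b):
          """Clockwise distance on the identifier ring (Chord-style ranking)."""
          return (b - a) % ID_SPACE

      class TManNode:
          def __init__(self, node_id, initial_view):
              self.id = node_id
              self.view = list(initial_view)   # peer ids obtained from the gossip substrate

          def select_view(self, candidates):
              # Keep the VIEW_SIZE peers closest to us in the target topology.
              ranked = sorted(set(candidates) - {self.id},
                              key=lambda p: ring_distance(self.id, p))
              return ranked[:VIEW_SIZE]

          def gossip_round(self, peers_by_id):
              if not self.view:
                  return
              peer = peers_by_id[random.choice(self.view)]
              # Exchange views; both sides re-rank over the merged candidate set.
              merged = self.view + peer.view + [self.id, peer.id]
              self.view = self.select_view(merged)
              peer.view = peer.select_view(merged)

    Once views converge, the best-ranked entries can be handed to the DHT as successor lists or routing-table entries, which is the role T-Man plays for T-Chord and for the variants contributed here.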
  • Publication
    Metadata only
    Lightweight, Efficient, Robust Epidemic Dissemination
    (2013-1-13)
    Matos, Miguel; Oliveira, Rui
    Gossip-based protocols provide a simple, scalable, and robust way to disseminate messages in large-scale systems. In such protocols, messages are spread in an epidemic manner. Gossiping may take place between nodes using push, pull, or a combination of both. Push-based systems achieve reasonable latency and high resilience to failures but may impose an unnecessarily large redundancy and overhead on the system. At the other extreme, pull-based protocols impose a lower overhead on the network at the price of increased latencies. A few hybrid approaches have been proposed, typically pushing control messages and pulling data, to avoid the redundancy of high-volume content and single-source streams. Yet, to the best of our knowledge, no other system intermingles push and pull in a multiple-senders scenario in such a way that the data messages of one dissemination help carry the control messages of others and adaptively adjust the rate of operation, further reducing the overall cost and improving both delays and robustness. In this paper, we propose an efficient generic push-pull dissemination protocol, Pulp, which combines the best of both worlds. Pulp exploits the efficiency of push approaches, while limiting redundant messages and therefore imposing a low overhead, as pull protocols do. Pulp leverages the dissemination of multiple messages from diverse sources: by exploiting the push phase of messages to transmit information about other disseminations, Pulp enables an efficient pulling of other messages, which themselves help in turn with the dissemination of pending messages. We deployed Pulp on a cluster and on PlanetLab. Our results demonstrate that Pulp achieves an appealing trade-off between coverage, message redundancy, and propagation delay.
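    The push-pull interplay can be sketched as follows: pushed messages travel only a few hops and piggyback the identifiers of messages the sender already holds, so receivers learn what they are missing and fetch it in periodic pull rounds. Names and constants (PulpNode, PUSH_TTL, FANOUT) are illustrative, and the adaptive pull-rate adjustment mentioned in the abstract is omitted.

      import random

      PUSH_TTL = 2   # illustrative: how far a message is pushed before pull takes over
      FANOUT = 3     # illustrative push fanout

      class PulpNode:
          def __init__(self, node_id, neighbors):
              self.id = node_id
              self.neighbors = neighbors
              self.store = {}          # msg_id -> payload
              self.known_ids = set()   # ids learned through piggybacked advertisements

          def push(self, msg_id, payload, ttl=PUSH_TTL):
              if msg_id in self.store:
                  return
              self.store[msg_id] = payload
              ad = set(self.store)     # advertise what we hold alongside the data
              if ttl > 0:
                  for n in random.sample(self.neighbors, min(FANOUT, len(self.neighbors))):
                      n.on_push(msg_id, payload, ad, ttl - 1)

          def on_push(self, msg_id, payload, advertised_ids, ttl):
              self.known_ids |= advertised_ids
              self.push(msg_id, payload, ttl)

          def pull_round(self):
              # Periodically fetch messages we know about but do not hold yet.
              missing = self.known_ids - set(self.store)
              if missing:
                  peer = random.choice(self.neighbors)
                  for mid in missing:
                      if mid in peer.store:
                          self.store[mid] = peer.store[mid]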
  • Publication
    Metadata only
    CoFeed: privacy-preserving Web search recommendation based on collaborative aggregation of interest feedback
    (2013-1-13)
    Leonini, Lorenzo; Luu, Toan; Rajman, Martin; Valerio, José
    Search engines essentially rely on the structure of the graph of hyperlinks. Although accurate for the main trend, this is not effective when a query is ambiguous. Leveraging semantic information by means of interest matching makes it possible to propose complementary results that are tailored to the user's expectations. This paper proposes a collaborative search companion system, CoFeed, that collects user search queries and considers feedback to build user-centric and document-centric profiling information. Over time, the system constructs ranked collections of elements that maintain the required information diversity and enhance the user search experience by presenting additional results tailored to the user's interest space. This collaborative search companion requires a supporting architecture adapted to large user populations generating high request loads. To that end, it integrates mechanisms for ensuring scalability and load balancing of the service under varying loads and user interest distributions. Moreover, collecting the recommendation data raises the problem of users' privacy and of the bias a single peer can induce in the system by sending fake recommendations. To address this, CoFeed ensures both publisher anonymity and rate limitation. With the former, the origin of the data is never known by the server that processes it, even if several servers collude to spy on a user. The latter, combined with decoupled authentication, makes it possible to minimize the influence of cheating peers sending fake recommendations. Experiments with a deployed prototype highlight the efficiency of the system by analyzing the improvement in search relevance, computational cost, scalability and load balancing.
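    The feedback-aggregation side can be pictured with a small sketch: interest feedback is accumulated per query into a ranked collection, and recommendations are the documents whose feedback best overlaps the requesting user's interest space. The FeedbackStore class and its methods are illustrative assumptions only; they do not model the anonymity or rate-limitation mechanisms described above.

      from collections import defaultdict

      class FeedbackStore:
          def __init__(self, max_results=5):
              self.max_results = max_results
              # query -> doc_id -> interest tag -> accumulated feedback score
              self.collections = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))

          def submit_feedback(self, query, doc_id, interests, weight=1.0):
              """Record that a user with the given interests found doc_id relevant for query."""
              for tag in interests:
                  self.collections[query][doc_id][tag] += weight

          def recommend(self, query, user_interests):
              """Rank documents for the query by overlap with the user's interest space."""
              scores = {
                  doc_id: sum(s for tag, s in tags.items() if tag in user_interests)
                  for doc_id, tags in self.collections[query].items()
              }
              ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
              return [doc for doc, score in ranked[:self.max_results] if score > 0]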
  • Publication
    Metadata only
    UniCrawl: A Practical Geographically Distributed Web Crawler
    (IEEE, 2015-6-27)
    Le Quoc, Do; Fetzer, Christof
    As the wealth of information available on the web keeps growing, being able to harvest massive amounts of data has become a major challenge. Web crawlers are the core components used to retrieve such vast collections of publicly available data. The key limiting factor of any crawler architecture is, however, its large infrastructure cost. To reduce this cost, and in particular the high upfront investments, we present in this paper a geo-distributed crawler solution, UniCrawl. UniCrawl orchestrates several geographically distributed sites. Each site operates an independent crawler and relies on well-established techniques for fetching and parsing the content of the web. UniCrawl splits the crawled domain space across the sites and federates their storage and computing resources, while minimizing the inter-site communication cost. To assess our design choices, we evaluate UniCrawl in a controlled environment using the ClueWeb12 dataset, and in the wild when deployed over several remote locations. We conducted several experiments over 3 sites spread across Germany. When compared to a centralized architecture with a crawler simply stretched over several locations, UniCrawl shows a performance improvement of 93.6% in terms of network bandwidth consumption, and a speedup factor of 1.75.
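    The domain-space split can be illustrated by hashing a URL's host to one of the participating sites, so that every page of a domain is crawled by the same site and newly discovered links are forwarded only when they belong elsewhere. Site names and function names below are illustrative, not UniCrawl's actual partitioning code.

      import hashlib
      from urllib.parse import urlsplit

      SITES = ["site-a", "site-b", "site-c"]  # illustrative names for the crawl sites

      def owning_site(url, sites=SITES):
          """Assign a URL to a crawl site by hashing its host, so all pages of one
          domain stay on one site and inter-site traffic stays low."""
          domain = urlsplit(url).hostname or ""
          digest = hashlib.sha1(domain.encode("utf-8")).hexdigest()
          return sites[int(digest, 16) % len(sites)]

      def route(url, local_site):
          """Newly discovered links are forwarded only when they belong to another site."""
          target = owning_site(url)
          return "crawl locally" if target == local_site else f"forward to {target}"

      print(route("https://example.org/page", "site-a"))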