Login
TOPiCo: Detecting Most Frequent Items from Multiple High-Rate Event Streams
Résumé Systems such as social networks, search engines or trading platforms operate geographically distant sites that continu- ously generate streams of events at high-rate. Such events can be access logs to web servers, feeds of messages from participants of a social network, or financial data, among others. The ability to timely detect trends and popularity variations is of paramount importance in such systems. In particular, determining what are the most popular events across all sites allows to capture the most relevant informa- tion in near real-time and quickly adapt the system to the load. This paper presents TOPiCo, a protocol that com- putes the most popular events across geo-distributed sites in a low cost, bandwidth-efficient and timely manner. TOPiCo starts by building the set of most popular events locally at each site. Then, it disseminates only events that have a chance to be among the most popular ones across all sites, significantly reducing the required bandwidth. We give a correctness proof of our algorithm and evaluate TOPiCo using a real-world trace of more than 240 million events spread across 32 sites. Our empirical results shows that
(i) TOPiCo is timely and cost-efficient for detecting popular events in a large-scale setting, (ii) it adapts dynamically to the distribution of the events, and (iii) our protocol is particularly efficient for skewed distributions.
   
Mots-clés top-k query, top-k frequency, distributed systems, event processing
   
Citation V. Schiavoni, et al., "TOPiCo: Detecting Most Frequent Items from Multiple High-Rate Event Streams," in DEBS 2015: 9th ACM International Conference on Distributed Event-Based Systems, Oslo, Norway, 2015.
   
Type Actes de congrès (Anglais)
Nom de la conférence DEBS 2015: 9th ACM International Conference on Distributed Event-Based Systems (Oslo, Norway)
Date de la conférence 29-6-2015
Editeur commercial ACM
URL http://www.debs2015.org
Liée au projet LEADS: Large-Scale Elastic Architecture for Data as a Ser...