Login
TOPiCo: Detecting Most Frequent Items from Multiple High-Rate Event Streams
Abstract Systems such as social networks, search engines or trading platforms operate geographically distant sites that continu- ously generate streams of events at high-rate. Such events can be access logs to web servers, feeds of messages from participants of a social network, or financial data, among others. The ability to timely detect trends and popularity variations is of paramount importance in such systems. In particular, determining what are the most popular events across all sites allows to capture the most relevant informa- tion in near real-time and quickly adapt the system to the load. This paper presents TOPiCo, a protocol that com- putes the most popular events across geo-distributed sites in a low cost, bandwidth-efficient and timely manner. TOPiCo starts by building the set of most popular events locally at each site. Then, it disseminates only events that have a chance to be among the most popular ones across all sites, significantly reducing the required bandwidth. We give a correctness proof of our algorithm and evaluate TOPiCo using a real-world trace of more than 240 million events spread across 32 sites. Our empirical results shows that
(i) TOPiCo is timely and cost-efficient for detecting popular events in a large-scale setting, (ii) it adapts dynamically to the distribution of the events, and (iii) our protocol is particularly efficient for skewed distributions.
   
Keywords top-k query, top-k frequency, distributed systems, event processing
   
Citation V. Schiavoni, et al., "TOPiCo: Detecting Most Frequent Items from Multiple High-Rate Event Streams," in DEBS 2015: 9th ACM International Conference on Distributed Event-Based Systems, Oslo, Norway, 2015, .
   
Type Conference paper (English)
Name of conference DEBS 2015: 9th ACM International Conference on Distributed Event-Based Systems (Oslo, Norway)
Date of conference 29-6-2015
Publisher ACM
URL http://www.debs2015.org
Related project LEADS: Large-Scale Elastic Architecture for Data as a Ser...