Showing items 1 - 7 of 7
- Publication (metadata only)
  Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine
  Publish/subscribe (pub/sub) infrastructures running as a service on cloud environments offer simplicity and flexibility for composing distributed applications. However, provisioning them appropriately is challenging. The number of stored subscriptions and incoming publications varies over time, and the computational cost depends on the nature of the applications, in particular on the filtering operation they require (e.g., content-based vs. topic-based, encrypted vs. non-encrypted filtering). The ability to elastically adapt the amount of resources needed to sustain given throughput and delay requirements is key to achieving cost-effectiveness for a pub/sub service running in a cloud environment. In this paper, we present the design and evaluation of an elastic content-based pub/sub system: E-STREAMHUB. Specific contributions of this paper include: (1) a mechanism for dynamic scaling, both out and in, of stateful and stateless pub/sub operators, (2) a local and global elasticity policy enforcer maintaining high system utilization and stable end-to-end latencies, and (3) an evaluation using a real-world tick workload from the Frankfurt Stock Exchange and encrypted content-based filtering.
- Publication (metadata only)
  Exploiting Concurrency in Domain-Specific Data Structures: A Concurrent Order Book and Workload Generator for Online Trading
  Concurrent programming is essential to exploit the parallel processing capabilities of modern multi-core CPUs. While many languages and tools exist to simplify the development of concurrent programs, they are not always readily applicable to domain-specific problems that rely on complex shared data structures with various semantics (e.g., priorities or consistency). In this paper, we explore such a domain-specific application from the financial field, where a data structure, an order book, is used to store and match orders from buyers and sellers arriving at a high rate. This application has interesting characteristics: it exhibits clear potential for parallelism, but at the same time it is relatively complex and must meet strict guarantees, notably w.r.t. the ordering of operations. We first present an accurate yet slightly simplified description of the order book problem and describe the challenges in parallelizing it. We then introduce several approaches for introducing concurrency into the shared data structure, in increasing order of sophistication, from lock-based techniques to partially lock-free designs. We propose a comprehensive workload generator for constructing histories of orders according to realistic models from the financial domain. Finally, we evaluate and compare the different concurrent designs.
- Publication (metadata only)
  Infrastructure Provisioning for Scalable Content-based Routing: Framework and Analysis
  Content-based publish/subscribe is an attractive paradigm for designing large-scale systems, as it decouples producers of information from consumers. This provides extensive flexibility for applications, which can use a modular architecture. Using this architecture, each participant expresses its interest in events by means of filters on the content of those events instead of using pre-established communication channels. However, matching events against filters has a non-negligible processing cost. Scaling the infrastructure with the number of users or events requires appropriate provisioning of resources for each of the operations involved: routing and filtering. In this paper, we propose and describe a generic, modular, and scalable infrastructure for supporting high-performance content-based publish/subscribe. We analyze its properties and show how it dynamically scales in a realistic setting. Our results provide valuable insights into the design and deployment of scalable content-based routing infrastructures.
- Publication (metadata only)
  StreamHub: A Massively Parallel Architecture for High-Performance Content-Based Publish/Subscribe
  By routing messages based on their content, publish/subscribe (pub/sub) systems remove the need to establish and maintain fixed communication channels. Pub/sub is a natural candidate for designing large-scale systems, composed of applications running in different domains and communicating via middleware solutions deployed on a public cloud. Such pub/sub systems must provide high throughput, filtering thousands of publications per second against hundreds of thousands of registered subscriptions with low and predictable delays, and must scale horizontally and vertically. As large-scale application composition may require complex publication and subscription representations, pub/sub system designs should not rely on the specific characteristics of a particular filtering scheme to achieve scalability. In this paper, we depart from the use of broker overlays, where each server must support the whole range of operations of a pub/sub service, as well as overlay management and routing functionality. Instead, we propose a novel and pragmatic tiered approach to obtain high-throughput and scalable pub/sub for cluster and cloud deployments. We separate the three operations involved in pub/sub and leverage their natural potential for parallelization. Our design, named StreamHub, is oblivious to the semantics of subscriptions and publications. It can support any type and number of filtering operations implemented by independent libraries. Experiments on a cluster with up to 384 cores indicate that StreamHub is able to register 150K subscriptions per second and filter nearly 2K publications against 100K stored subscriptions, resulting in nearly 400K notifications sent per second. Comparisons against a broker overlay solution show an improvement of two orders of magnitude in throughput when using the same number of cores.
- Publication (metadata only)
  Efficient and Confidentiality-Preserving Content-Based Publish/Subscribe with Prefiltering
  Content-based publish/subscribe provides a loosely coupled and expressive form of communication for large-scale distributed systems. Confidentiality is a major challenge for publish/subscribe middleware deployed over multiple administrative domains. Encrypted matching allows confidentiality-preserving content-based filtering but has high performance overheads. It may also prevent the use of classical optimizations based on subscription containment. We propose a support mechanism that reduces the cost of encrypted matching, in the form of a prefiltering operator using Bloom filters and simple randomization techniques. This operator greatly reduces the number of encrypted subscriptions that must be matched against incoming encrypted publications. It leverages subscription containment information when available, but also ensures that containment confidentiality is preserved otherwise. We propose containment obfuscation techniques and provide a rigorous security analysis of the information leaked by Bloom filters in this case. We conduct a thorough experimental evaluation of prefiltering under a large variety of workloads. Our results indicate that prefiltering successfully reduces the space of subscriptions to be tested in all cases. We show that while there is a tradeoff between prefiltering efficiency and information leakage when using containment obfuscation, it is practically possible to obtain good prefiltering performance while securing the technique against potential leakages.
- Publication (open access)
  Scalable content-based publish/subscribe and application to online trading
  Content-based pub/sub is an ideal candidate for implementing communication between large-scale applications. It decouples message producers (publishers) from consumers (subscribers), which then communicate indirectly. Producers generate a stream of information (publications) that is routed to subscribers according to their interests (expressed through subscriptions).
  Message filtering has a non-negligible processing cost. The first contribution of this thesis is the design and analysis of a generic, modular, and scalable infrastructure for a high-performance content-based pub/sub system.
  Such pub/sub systems must provide very high throughput, filtering thousands of publications against hundreds of thousands of subscriptions while guaranteeing low latency as well as horizontal and vertical scalability. Since large-scale application composition may require complex forms of publications and subscriptions, the design of a pub/sub system must not depend on particular filtering characteristics to achieve scalability. The second contribution of this thesis is the design and implementation of StreamHub, an innovative and pragmatic tiered approach offering high performance and scalability. We split the overall process into three operations and take advantage of their natural potential for parallelization.
  In real-world scenarios, the number of subscriptions and the publication rate vary over time, and so do the associated processing costs. The third contribution of this thesis is e-StreamHub, an elastic pub/sub system. This contribution includes: (1) a mechanism for decreasing and increasing the resources in use, (2) a global and local system for enforcing usage policies that maintains high system utilization and stable latencies, and (3) an evaluation using real data from the Frankfurt Stock Exchange.
  The fourth contribution focuses on a specific application from the financial world, in which a data structure called an order book is used to store and match buy and sell orders arriving at a sustained rate. This application has interesting properties, showing great potential for parallelization, but it is also relatively complex and must satisfy certain guarantees (notably regarding the ordering of operations). We propose several approaches to exploit concurrency in a shared data structure, with increasing levels of sophistication, from lock-based solutions to partially lock-free designs. As a corollary, we propose a generator of synthetic historical data following realistic models from econophysics.
  Publish/subscribe is a popular messaging pattern that provides efficient and decoupled information dissemination in distributed environments. Publishers generate a flow of information as publications, which are routed to subscribers based on their interests expressed as subscriptions.
  Matching events against filters has a non-negligible processing cost. Our first contribution in this thesis is the design and analysis of a generic, modular, and scalable infrastructure for supporting high-performance content-based publish/subscribe.
  Pub/sub systems must provide high throughput, filtering thousands of publications per second against hundreds of thousands of registered subscriptions with low and predictable delays, and must scale horizontally and vertically. As large-scale application composition may require complex publication and subscription representations, pub/sub system designs should not rely on the specific characteristics of a particular filtering scheme to achieve scalability. The second contribution of this thesis is the design and implementation of a novel and pragmatic tiered approach, StreamHub, that offers high throughput and scalability. We divide the whole process into the three operations involved in pub/sub and leverage their natural potential for parallelization.
  In many real-world scenarios, the number of stored subscriptions and the incoming publication rate vary over time, as does their associated computational cost. We propose e-StreamHub, an elastic content-based pub/sub system. The third contribution of this thesis includes: (1) a mechanism for scaling both out and in, of stateful and stateless pub/sub operators, (2) a local and global elasticity policy enforcer that maintains high system utilization and stable end-to-end latencies, and (3) an evaluation using a real-world workload from the Frankfurt Stock Exchange.
  Lastly, we focus on a domain-specific application from the financial field, where a data structure called an order book is used to store and match orders from buyers and sellers arriving at a high pace. This application has interesting characteristics: it exhibits clear potential for parallelism, but at the same time it is relatively complex and must meet strict guarantees (notably w.r.t. the ordering of operations). In this last contribution, we propose several approaches for introducing concurrency into the shared data structure, in increasing order of sophistication, from lock-based techniques to partially lock-free designs. As a corollary, we propose a workload generator for constructing histories according to realistic models from the financial domain.
- Publication (metadata only)
  Thrifty Privacy: Efficient Support for Privacy-Preserving Publish/Subscribe
  Content-based publish/subscribe is an appealing paradigm for building large-scale distributed applications. Such applications are often deployed over multiple administrative domains, some of which may not be trusted. Recent attacks in public clouds indicate that enforcing privacy is a major concern in untrusted domains. By routing data based on subscriptions evaluated on the content of publications, publish/subscribe systems can expose critical information to unauthorized parties. Information leakage can be avoided by means of privacy-preserving filtering, which is supported by several mechanisms for encrypted matching. Unfortunately, all existing approaches share a high performance overhead and make it difficult to use classical optimizations for content-based filtering, such as per-attribute containment. In this paper, we propose a novel mechanism that greatly reduces the cost of supporting privacy-preserving filtering based on encrypted matching operators. It is based on a pre-filtering stage that can be combined with containment graphs, if available. Our experiments indicate that pre-filtering significantly reduces the number of encrypted matching operations for a variety of workloads, and therefore the costs associated with the cryptographic mechanisms. Furthermore, our analysis shows that the additional data structures used for pre-filtering have very limited impact on the effectiveness of privacy preservation.
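The StreamHub and infrastructure-provisioning abstracts above describe splitting content-based pub/sub into separate operations that are parallelized independently. The following is a minimal single-process sketch of that idea, not the authors' implementation. It assumes that subscriptions are conjunctions of inclusive numeric ranges, that each subscription is stored in exactly one matcher (using a hypothetical `sub_id % num_matchers` partitioning rule), and that every publication is broadcast to all matchers; the three steps are labeled routing, filtering, and delivery as an assumption, and the names `Subscription`, `Matcher`, and `TieredEngine` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Publication = Dict[str, float]


@dataclass
class Subscription:
    sub_id: int
    subscriber: str
    # Conjunction of inclusive ranges: attribute -> (low, high).
    ranges: Dict[str, Tuple[float, float]]

    def matches(self, pub: Publication) -> bool:
        # Every constrained attribute must be present and inside its range.
        return all(
            attr in pub and low <= pub[attr] <= high
            for attr, (low, high) in self.ranges.items()
        )


class Matcher:
    """One matching slice holding a disjoint subset of the subscriptions."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []

    def store(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def match(self, pub: Publication) -> List[Subscription]:
        return [s for s in self.subscriptions if s.matches(pub)]


class TieredEngine:
    def __init__(self, num_matchers: int) -> None:
        self.matchers = [Matcher() for _ in range(num_matchers)]

    def subscribe(self, sub: Subscription) -> None:
        # Routing step (hypothetical rule): each subscription goes to exactly one matcher.
        self.matchers[sub.sub_id % len(self.matchers)].store(sub)

    def publish(self, pub: Publication) -> Dict[str, List[int]]:
        # Filtering step: every matcher tests the publication against its own partition.
        # Sequential here; in a cluster the matchers would run in parallel.
        matched = [s for m in self.matchers for s in m.match(pub)]
        # Delivery step: group matching subscription ids per subscriber.
        notifications: Dict[str, List[int]] = {}
        for sub in matched:
            notifications.setdefault(sub.subscriber, []).append(sub.sub_id)
        return notifications


engine = TieredEngine(num_matchers=4)
engine.subscribe(Subscription(1, "alice", {"price": (10.0, 20.0)}))
engine.subscribe(Subscription(2, "bob", {"price": (15.0, 30.0), "volume": (0, 500)}))
engine.subscribe(Subscription(3, "alice", {"volume": (1000, 5000)}))
print(engine.publish({"price": 17.5, "volume": 200}))  # {'alice': [1], 'bob': [2]}
```

Partitioning subscriptions shrinks the set each matcher stores and scans, while broadcasting publications costs one lookup per matcher; the sketch makes that trade-off visible without relying on any particular filtering scheme, in the spirit of the "oblivious to the semantics" claim above.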
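The E-STREAMHUB and thesis abstracts above mention local and global elasticity policies that keep utilization high and latencies stable, but they do not state the rules themselves. Below is a minimal threshold-based sketch, not the published policy. It assumes per-slice CPU utilization in [0, 1] as the only input, hypothetical bounds of 0.5 and 0.8 with a target of 0.65, hypothetical operator names (`access_point`, `matcher`, `exit_point`), and a hypothetical fixed number of slices per host for the global step.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ElasticityPolicy:
    low: float = 0.5      # hypothetical: scale in below this average utilization
    high: float = 0.8     # hypothetical: scale out above this average utilization
    target: float = 0.65  # hypothetical: utilization to aim for after resizing
    min_slices: int = 1

    def decide(self, utilizations: List[float]) -> int:
        """Local step: new slice count for one operator, from per-slice CPU utilization."""
        if not utilizations:
            raise ValueError("an operator needs at least one slice")
        n = len(utilizations)
        total = sum(utilizations)
        if self.low <= total / n <= self.high:
            return n  # inside the band: keep the current slices and avoid migrating state
        # Resize so that the same total load would sit near the target utilization.
        return max(self.min_slices, math.ceil(total / self.target))


def hosts_needed(plan: Dict[str, int], slices_per_host: int) -> int:
    """Global step (hypothetical): number of hosts required to place every slice."""
    return math.ceil(sum(plan.values()) / slices_per_host)


policy = ElasticityPolicy()
plan = {
    "access_point": policy.decide([0.6, 0.7]),
    "matcher": policy.decide([0.95, 0.9]),
    "exit_point": policy.decide([0.1, 0.2, 0.15]),
}
print(plan)                                   # {'access_point': 2, 'matcher': 3, 'exit_point': 1}
print(hosts_needed(plan, slices_per_host=4))  # 2
```

The gap between `low` and `high` acts as hysteresis: small load fluctuations do not trigger scaling, which matters when scaling a stateful operator means migrating its stored subscriptions.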
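The prefiltering and Thrifty Privacy abstracts above rely on Bloom filters to cut down the number of subscriptions passed to costly encrypted matching. The sketch below only illustrates why a Bloom-filter subset test can do this without missing true matches; it is not the papers' scheme. The token encoding (`attr=value` for equality constraints), the use of an HMAC key so that a broker sees only bit positions, and the parameters `M_BITS`, `K_HASHES`, and `KEY` are all assumptions, and the randomization and containment-obfuscation techniques are not reproduced.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List

M_BITS = 256            # hypothetical Bloom filter size in bits
K_HASHES = 3            # hypothetical number of hash functions
KEY = b"shared-secret"  # hypothetical key known to clients, not to the broker


def bloom(tokens: Iterable[str]) -> int:
    """Encode tokens as a Bloom filter stored in an int bitmask."""
    bits = 0
    for tok in tokens:
        for i in range(K_HASHES):
            digest = hmac.new(KEY, f"{i}:{tok}".encode(), hashlib.sha256).digest()
            bits |= 1 << (int.from_bytes(digest[:4], "big") % M_BITS)
    return bits


def tokens(constraints: Dict[str, str]) -> List[str]:
    # Hypothetical encoding: each equality constraint or attribute value becomes "attr=value".
    return [f"{attr}={value}" for attr, value in constraints.items()]


def prefilter(pub_bits: int, sub_filters: Dict[int, int]) -> List[int]:
    """Keep subscriptions whose bits are all set in the publication's filter.

    No false negatives: if a subscription's tokens are a subset of the
    publication's tokens, its bits are a subset too. False positives are
    possible and would be removed by the subsequent encrypted matching step.
    """
    return [sid for sid, sbits in sub_filters.items() if (sbits & pub_bits) == sbits]


subs = {
    1: bloom(tokens({"symbol": "ACME"})),
    2: bloom(tokens({"symbol": "ACME", "side": "buy"})),
    3: bloom(tokens({"symbol": "INIT", "side": "sell"})),
}
pub = bloom(tokens({"symbol": "ACME", "side": "buy"}))
candidates = prefilter(pub, subs)
print(candidates)  # always contains 1 and 2; 3 appears only on a false positive
```

Only the surviving candidates would be handed to the encrypted matcher; the filter size and number of hash functions trade false-positive rate (wasted encrypted matches) against how much the bit patterns can reveal.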
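The order book abstracts above describe matching buy and sell orders under strict ordering guarantees, with designs ranging from lock-based to partially lock-free. The sketch below covers only the simplest end of that range: a limit order book protected by a single coarse-grained lock. It assumes the common price-time priority rule (best price first, then earliest arrival) and that trades execute at the resting order's price; the names `Order` and `OrderBook` are illustrative, and neither the finer-grained concurrent designs nor the workload generator are reproduced.

```python
import heapq
import itertools
import threading
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Order:
    order_id: int
    side: str   # "buy" or "sell"
    price: int  # limit price in integer ticks
    qty: int


Trade = Tuple[int, int, int, int]  # (buy_id, sell_id, price, qty)


class OrderBook:
    """Limit order book with price-time priority behind one coarse-grained lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._bids: list = []  # heap of (-price, seq, order): highest price first
        self._asks: list = []  # heap of (price, seq, order): lowest price first
        self._seq = itertools.count()  # arrival counter gives time priority

    def submit(self, order: Order) -> List[Trade]:
        trades: List[Trade] = []
        with self._lock:  # the whole match-then-insert step is atomic
            is_buy = order.side == "buy"
            opposite = self._asks if is_buy else self._bids
            while order.qty > 0 and opposite:
                best = opposite[0][2]
                crosses = (best.price <= order.price) if is_buy else (best.price >= order.price)
                if not crosses:
                    break
                qty = min(order.qty, best.qty)
                if is_buy:
                    buy_id, sell_id = order.order_id, best.order_id
                else:
                    buy_id, sell_id = best.order_id, order.order_id
                trades.append((buy_id, sell_id, best.price, qty))  # trade at the resting price
                order.qty -= qty
                best.qty -= qty
                if best.qty == 0:
                    heapq.heappop(opposite)
            if order.qty > 0:  # the unmatched remainder rests in the book
                key = -order.price if is_buy else order.price
                heapq.heappush(self._bids if is_buy else self._asks, (key, next(self._seq), order))
        return trades


book = OrderBook()
book.submit(Order(1, "sell", 101, 5))
book.submit(Order(2, "sell", 100, 3))
book.submit(Order(3, "sell", 100, 4))
trades = book.submit(Order(4, "buy", 101, 10))
print(trades)  # [(4, 2, 100, 3), (4, 3, 100, 4), (4, 1, 101, 3)]
```

The buy order consumes the two orders at price 100 first (order 2 before order 3 because it arrived earlier) and then part of order 1 at 101. A single lock trivially preserves the ordering guarantees but serializes every submission, which is exactly the bottleneck that finer-grained and partially lock-free designs aim to relieve.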