Showing items 1 - 5 of 5
Publication
Open access

A Novel Individually Rational Objective In Multi-Agent Multi-Armed Bandits: Algorithms and Regret Bounds

2020, Aristide C. Y. Tossou, Christos Dimitrakakis, Jaroslaw Rzepecki, Katja Hofmann

We study a two-player stochastic multi-armed bandit (MAB) problem with different expected rewards for each player, a generalisation of two-player general-sum repeated games to stochastic rewards. Our aim is to find the egalitarian bargaining solution (EBS) for the repeated game, which can lead to much higher rewards than the maximin value of both players. Our main contribution is the derivation of an algorithm, UCRG, that achieves, simultaneously for both players, a high-probability regret bound of order $\tilde{O}(T^{2/3})$ after any $T$ rounds of play. We demonstrate that our upper bound is nearly optimal by proving a lower bound of $\Omega(T^{2/3})$ for any algorithm. Experiments confirm our theoretical results and the superiority of UCRG compared to the well-known explore-then-commit heuristic.

Publication
Open access

Thompson Sampling For Stochastic Bandits with Graph Feedback

2017-01-16T10:52:51Z, Aristide C. Y. Tossou, Christos Dimitrakakis, Devdatt Dubhashi

We present a novel extension of Thompson Sampling for stochastic sequential decision problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of the algorithm, linking its performance to the underlying properties of the graph. Thompson Sampling has the advantage of being applicable without the need to construct complicated upper confidence bounds for different problems. We illustrate its performance through extensive experimental results on real and simulated networks with graph feedback. More specifically, we tested our algorithms on power law, planted partition and Erdős-Rényi graphs, as well as on graphs derived from Facebook and Flixster data. These all show that our algorithms clearly outperform related methods that employ upper confidence bounds, even if the latter use more information about the graph.
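The idea of graph feedback can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes Bernoulli rewards, Beta(1, 1) priors, a fixed and known graph, and that playing an arm also reveals a reward for each of its neighbours. The function name `thompson_sampling_graph` and the `neighbours` mapping are hypothetical.

```python
import random

def thompson_sampling_graph(means, neighbours, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling where playing an arm also reveals
    a reward for each neighbouring arm (illustrative assumption)."""
    rng = random.Random(seed)
    k = len(means)
    successes = [1] * k  # Beta(1, 1) prior for every arm
    failures = [1] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the largest.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        pulls[arm] += 1
        # Side observations: update the played arm and all of its neighbours.
        for j in sorted({arm} | set(neighbours[arm])):
            if rng.random() < means[j]:
                successes[j] += 1
            else:
                failures[j] += 1
    return pulls, successes, failures
```

On a three-arm path graph such as `{0: [1], 1: [0, 2], 2: [1]}`, the middle arm is observed in every round regardless of which arm is played, so its posterior tightens without it having to be played.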

Publication
Open access

On The Differential Privacy of Thompson Sampling With Gaussian Prior

2018-06-24T18:37:09Z, Aristide C. Y. Tossou, Christos Dimitrakakis

We show that Thompson Sampling with Gaussian Prior, as detailed by Algorithm 2 in (Agrawal & Goyal, 2013), is already differentially private. Theorem 1 shows that it enjoys a very competitive privacy loss of only $O(\ln^2 T)$ after $T$ rounds. Finally, Theorem 2 shows that one can control the privacy loss to any desirable $\epsilon$ level by appropriately increasing the variance of the samples from the Gaussian posterior. This increases the regret only by a term of $O(\frac{\ln^2 T}{\epsilon})$. This compares favorably to the previous result for Thompson Sampling in the literature (Mishra & Thakurta, 2015), which adds a term of $O(\frac{K \ln^3 T}{\epsilon^2})$ to the regret in order to achieve the same privacy level. Furthermore, our result uses the basic Thompson Sampling with few modifications, whereas the result of (Mishra & Thakurta, 2015) required sophisticated constructions.
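A minimal sketch of Thompson Sampling with a Gaussian prior may help make the variance knob concrete. It follows the usual form of the Agrawal & Goyal sampling rule, drawing each arm's index from a Gaussian with mean $\hat{\mu}_i = S_i/(k_i+1)$ and variance $1/(k_i+1)$, where $S_i$ is the reward sum and $k_i$ the pull count. The `variance_scale` parameter is a hypothetical stand-in for "increasing the variance of the samples"; how to set it for a target $\epsilon$ is given by the paper's Theorem 2 and is not reproduced here. Bernoulli rewards are assumed for the simulation.

```python
import math
import random

def gaussian_thompson_sampling(means, horizon, variance_scale=1.0, seed=0):
    """Thompson Sampling with Gaussian samples on simulated Bernoulli arms.

    variance_scale is a hypothetical knob that inflates the sampling
    variance; it is not the calibration used in the paper.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    reward_sums = [0.0] * k
    for _ in range(horizon):
        samples = []
        for i in range(k):
            mu_hat = reward_sums[i] / (pulls[i] + 1)
            std = math.sqrt(variance_scale / (pulls[i] + 1))
            samples.append(rng.gauss(mu_hat, std))
        # Play the arm with the largest sampled index.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls
```

Raising `variance_scale` makes the choice of arm noisier, which is the intuition behind trading extra regret for a smaller privacy loss.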

Publication
Open access

Achieving Privacy in the Adversarial Multi-Armed Bandit

2017, Aristide C. Y. Tossou, Christos Dimitrakakis

In this paper, we improve the previously best known regret bound to achieve $\epsilon$-differential privacy in oblivious adversarial bandits from $O(T^{2/3}/\epsilon)$ to $O(\sqrt{T}\ln T/\epsilon)$. This is achieved by combining a Laplace Mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in $T$. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach $O(\sqrt{\ln T})$-DP, with a regret of $O(T^{2/3})$ that holds against an adaptive adversary, an improvement from the best known of $O(T^{3/4})$. This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis.
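The combination of a Laplace mechanism with EXP3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: rewards are assumed to lie in [0, 1], the noise scale is taken to be $1/\epsilon$, and the exploration rate `gamma` is an arbitrary constant rather than the tuned value from the analysis. The names `private_exp3`, `laplace_noise` and `reward_fn` are hypothetical.

```python
import math
import random

def laplace_noise(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_exp3(reward_fn, n_arms, horizon, epsilon, gamma=0.1, seed=0):
    """EXP3 whose updates only see rewards perturbed by Laplace noise."""
    rng = random.Random(seed)
    log_weights = [0.0] * n_arms  # log-space weights for numerical stability
    pulls = [0] * n_arms
    for t in range(horizon):
        top = max(log_weights)
        weights = [math.exp(lw - top) for lw in log_weights]
        total = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        noisy_reward = reward_fn(t, arm) + laplace_noise(rng, 1.0 / epsilon)
        # Importance-weighted estimate built from the noisy reward only.
        estimate = noisy_reward / probs[arm]
        log_weights[arm] += gamma * estimate / n_arms
        pulls[arm] += 1
    return pulls
```

Because the learner's state depends on the reward sequence only through the noisy values, the added noise is what limits how much any single reward can influence later actions.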

Publication
Open access

Learning to Match

2017-07-30T21:50:50Z, Philip Ekman, Sebastian Bellevik, Christos Dimitrakakis, Aristide C. Y. Tossou

Outsourcing tasks to previously unknown parties is becoming more common. One specific such problem involves matching a set of workers to a set of tasks. Even if the latter have precise requirements, the quality of individual workers is usually unknown. The problem is thus a version of matching under uncertainty. We believe that this type of problem is going to be increasingly important. When the problem involves only a single skill or type of job, it is essentially a type of bandit problem, and can be solved with standard algorithms. However, we develop an algorithm that can perform matching for workers with multiple skills hired for multiple jobs with multiple requirements. We perform an experimental evaluation in both single-task and multi-task problems, comparing with the bounded $\epsilon$-first algorithm, as well as an oracle that knows the true skills of workers. One of the algorithms we developed gives results approaching 85\% of the oracle's performance. We invite the community to take a closer look at this problem and develop real-world benchmarks.