Dimitrakakis, Christos
Name
Dimitrakakis, Christos
Primary affiliation
Position
Professor
Email
christos.dimitrakakis@unine.ch
Search results
Showing items 1 - 7 of 7
- Publication (open access): Minimax-Bayes Reinforcement Learning (PMLR, 2023)
Thomas Kleine Buening; Hannes Eriksson; Divya Grover; Emilio Jorge
While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
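As a toy illustration of the minimax-Bayes idea (not the paper's reinforcement-learning algorithm), the Python sketch below finds the worst-case prior for a one-step decision problem with two candidate models and two actions; the utility matrix is invented for the example.

    # A toy sketch of the minimax-Bayes idea: the worst-case prior is the one that
    # minimises the value achieved by the Bayes-optimal decision. Utilities are made up.
    import numpy as np

    # U[m, a] = utility of action a when model m is the true environment.
    U = np.array([[1.0, 0.0],
                  [0.2, 0.8]])

    def bayes_value(p):
        """Value of the Bayes-optimal action under prior P(model 0) = p."""
        expected = p * U[0] + (1 - p) * U[1]   # expected utility of each action
        return expected.max()

    grid = np.linspace(0.0, 1.0, 1001)
    values = np.array([bayes_value(p) for p in grid])
    worst = grid[values.argmin()]              # the minimax-Bayes (worst-case) prior
    print(f"worst-case prior P(model 0) = {worst:.3f}, Bayes value = {values.min():.3f}")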
- Publication (open access): High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling (2021)
Hannes Eriksson; Lars Carlsson
We study the problem of performing automated experiment design for drug screening through Bayesian inference and optimisation. In particular, we compare and contrast the behaviour of linear-Gaussian models and Gaussian processes, when used in conjunction with upper confidence bound algorithms, Thompson sampling, or bounded horizon tree search. We show that non-myopic, sophisticated exploration techniques using sparse tree search have a distinct advantage over methods such as Thompson sampling or upper confidence bounds in this setting. We demonstrate the significant superiority of the approach on existing and synthetic datasets of drug toxicity.
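One of the baselines compared in the paper, Thompson sampling with a linear-Gaussian model, can be sketched as follows; the features, noise level, and candidate pool are invented for illustration and do not come from the paper.

    # A sketch of Thompson sampling with a linear-Gaussian model (one of the baselines
    # discussed in the paper). Features, noise level, and candidate pool are invented.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_candidates, noise_sd = 5, 100, 0.1
    X = rng.normal(size=(n_candidates, d))     # candidate compounds as feature vectors
    true_w = rng.normal(size=d)                # unknown response weights (simulator only)

    # Gaussian prior N(0, I) on the weights; the posterior stays Gaussian.
    A = np.eye(d)                              # posterior precision
    b = np.zeros(d)                            # precision-weighted mean

    for t in range(50):
        mean = np.linalg.solve(A, b)
        w_sample = rng.multivariate_normal(mean, np.linalg.inv(A))  # posterior sample
        i = int(np.argmax(X @ w_sample))       # test the candidate best under the sample
        y = X[i] @ true_w + rng.normal(scale=noise_sd)              # simulated experiment
        A += np.outer(X[i], X[i]) / noise_sd**2
        b += X[i] * y / noise_sd**2

    print("posterior-mean error:", np.linalg.norm(np.linalg.solve(A, b) - true_w))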
- Publication (open access): Bayesian Reinforcement Learning via Deep, Sparse Sampling (2020)
Divya Grover; Debabrota Basu
We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration, with a theoretical bound on its performance relative to the Bayes-optimal policy and lower computational complexity. The main novelty is the use of a candidate policy generator to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that, in comparison to the state of the art, our algorithm is both computationally more efficient and obtains significantly higher reward in discrete environments.
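For intuition, the sketch below computes the exact Bayes-adaptive value of a tiny two-armed Bernoulli bandit by expanding the belief tree fully; the paper's algorithm targets the regime where such exact expansion is infeasible and the tree must instead be made sparser and deeper. The Beta(1,1) priors and horizon are illustrative.

    # Exact Bayes-adaptive planning for a two-armed Bernoulli bandit with Beta beliefs.
    # For binary rewards the look-ahead tree can be expanded fully at small horizons;
    # larger problems need sparse, deep approximations of this computation.
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def value(belief, depth):
        """Bayes-adaptive value of belief = ((a0, b0), (a1, b1)) over `depth` steps."""
        if depth == 0:
            return 0.0
        best = 0.0
        for arm, (a, b) in enumerate(belief):
            p = a / (a + b)                    # posterior predictive P(reward = 1)
            succ = belief[:arm] + ((a + 1, b),) + belief[arm + 1:]
            fail = belief[:arm] + ((a, b + 1),) + belief[arm + 1:]
            q = p * (1.0 + value(succ, depth - 1)) + (1 - p) * value(fail, depth - 1)
            best = max(best, q)
        return best

    print("Bayes-adaptive value from uniform priors, horizon 5:",
          value(((1, 1), (1, 1)), 5))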
- Publication (open access): Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost? (2019)
Debabrota Basu; Aristide Tossou
Based on the differential privacy (DP) framework, we introduce and unify privacy definitions for multi-armed bandit algorithms. We represent the framework with a unified graphical model and use it to connect the privacy definitions. We derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions, using a unified proving technique to obtain all of them. We show that, in every case, the learner's regret is increased by a multiplicative factor dependent on the privacy level ε. We observe that the dependency is weaker when we do not require local differential privacy for the rewards.
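The following sketch (not taken from the paper) shows what local differential privacy of the rewards means operationally: each reward is privatised with Laplace noise before the learner sees it, so every pull pays the noise cost. The privacy level ε and the two-armed instance are illustrative.

    # Local differential privacy for bandit rewards: each reward is privatised with
    # Laplace noise before the learner observes it. Privacy level and arms are toy values.
    import numpy as np

    rng = np.random.default_rng(1)
    eps = 1.0                                  # privacy level: smaller = more private
    means = [0.4, 0.6]                         # true Bernoulli arm means (hidden)

    def pull(arm):
        return float(rng.random() < means[arm])

    def privatise(reward):
        """Laplace mechanism on a single reward in [0, 1] (sensitivity 1)."""
        return reward + rng.laplace(scale=1.0 / eps)

    counts, sums = np.zeros(2), np.zeros(2)
    for t in range(2000):
        # explore round-robin first, then follow the noisy empirical leader
        arm = t % 2 if t < 20 else int(np.argmax(sums / counts))
        counts[arm] += 1
        sums[arm] += privatise(pull(arm))      # the learner only ever sees noisy rewards

    print("privatised empirical means:", sums / counts)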
- Publication (open access): Algorithms for Differentially Private Multi-Armed Bandits (2016)
Aristide Tossou
We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem is relevant to applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist (ε, δ)-differentially private variants of Upper Confidence Bound algorithms which have optimal regret, O(ε⁻¹ + log T). This is a significant improvement over previous results, which only achieve poly-log regret O(ε⁻² log² T), and is due to our use of a novel interval-based mechanism. We also substantially improve the bounds of the previous family of algorithms, which use a continual release mechanism. Experiments clearly validate our theoretical bounds.
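A crude sketch of the general recipe behind private UCB-style algorithms is given below: compute the usual confidence index from noised per-arm statistics. The paper's interval-based mechanism adds noise far more sparingly and handles the privacy accounting properly; this toy version only shows where the Laplace noise enters, and its parameters are invented.

    # A toy "noised UCB": the confidence index is computed from Laplace-noised per-arm
    # reward sums. This is only a schematic stand-in for the paper's interval-based
    # mechanism, which controls how often noise is added and tracks the privacy budget.
    import numpy as np

    rng = np.random.default_rng(2)
    eps, horizon = 1.0, 5000
    means = [0.3, 0.5, 0.7]                    # true Bernoulli arm means (hidden)
    k = len(means)
    counts, sums = np.zeros(k), np.zeros(k)

    for t in range(1, horizon + 1):
        if t <= k:                             # play every arm once to initialise
            arm = t - 1
        else:
            noisy_mean = (sums + rng.laplace(scale=1.0 / eps, size=k)) / counts
            bonus = np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(noisy_mean + bonus))
        counts[arm] += 1
        sums[arm] += float(rng.random() < means[arm])

    print("pulls per arm:", counts.astype(int))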
- Publication (open access): Personalized news recommendation with context trees (2013)
Florent Garcin; Boi Faltings
The profusion of online news articles makes it difficult to find interesting articles, a problem that can be assuaged by using a recommender system to bring the most relevant news stories to readers. However, news recommendation is challenging because the most relevant articles are often new content seen by few users. In addition, they are subject to trends and preference changes over time, and in many cases we do not have sufficient information to profile the reader. In this paper, we introduce a class of news recommendation systems based on context trees. They can provide high-quality news recommendations to anonymous visitors based on their present browsing behaviour. We show that context-tree recommender systems provide good prediction accuracy and recommendation novelty, and that they are sufficiently flexible to capture the unique properties of news articles.
- Publication (open access): Monte-Carlo utility estimates for Bayesian reinforcement learning (2013)
This paper introduces a set of algorithms for Monte-Carlo Bayesian reinforcement learning. Firstly, Monte-Carlo estimation of upper bounds on the Bayes-optimal value function is employed to construct an optimistic policy. Secondly, gradient-based algorithms for approximate upper and lower bounds are introduced. Finally, we introduce a new class of gradient algorithms for Bayesian Bellman error minimisation. We theoretically show that the gradient methods are sound. Experimentally, we demonstrate the superiority of the upper bound method in terms of reward obtained. However, we also show that the Bayesian Bellman error method is a close second, despite its significant computational simplicity.
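The context-tree idea behind "Personalized news recommendation with context trees" can be illustrated with a small variable-order model that counts which article follows which recent browsing context and blends predictions from contexts of increasing length; the blending weights and toy sessions below are invented and are not the paper's estimator.

    # A small context-tree-style recommender: count which article follows which recent
    # browsing context, for contexts of increasing length, and blend the predictions.
    from collections import defaultdict, Counter

    class ContextTree:
        def __init__(self, max_depth=3):
            self.max_depth = max_depth
            self.counts = defaultdict(Counter)   # context tuple -> next-article counts

        def update(self, session):
            for i, nxt in enumerate(session[1:], start=1):
                for d in range(1, min(self.max_depth, i) + 1):
                    self.counts[tuple(session[i - d:i])][nxt] += 1

        def recommend(self, recent, top=3):
            scores = Counter()
            for d in range(1, min(self.max_depth, len(recent)) + 1):
                ctx = tuple(recent[-d:])
                total = sum(self.counts[ctx].values())
                for article, c in self.counts[ctx].items():
                    scores[article] += (2 ** d) * c / total   # longer contexts weigh more
            return [a for a, _ in scores.most_common(top)]

    tree = ContextTree()
    for s in [["politics1", "politics2", "sports1"],
              ["politics1", "politics2", "economy1"],
              ["sports2", "politics2", "economy1"]]:
        tree.update(s)
    print(tree.recommend(["politics1", "politics2"]))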
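The first ingredient of "Monte-Carlo utility estimates for Bayesian reinforcement learning", a Monte-Carlo upper bound on the Bayes-optimal value, can be sketched for a simple Bernoulli bandit: sample arm means from the belief, take each sample's optimal value, and average. Since the expectation of a maximum is at least the maximum of expectations, this yields an upper bound. The Beta beliefs and horizon are illustrative.

    # Monte-Carlo upper bound on the Bayes-optimal value for a Bernoulli bandit:
    # sample arm means from the belief, take each sample's optimal value, and average.
    import numpy as np

    rng = np.random.default_rng(3)
    beliefs = [(3, 2), (4, 4)]                 # Beta(a, b) beliefs over two arm means
    horizon, n_samples = 10, 10_000

    samples = np.column_stack([rng.beta(a, b, size=n_samples) for a, b in beliefs])
    upper_bound = horizon * samples.max(axis=1).mean()   # E[max] >= max E => upper bound

    # For contrast, the value of committing to the best arm under the mean belief
    # is a lower bound on the Bayes-optimal value.
    lower_bound = horizon * max(a / (a + b) for a, b in beliefs)

    print(f"MC upper bound: {upper_bound:.2f}, simple lower bound: {lower_bound:.2f}")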