Isoperimetry is All We Need: Langevin Posterior Sampling for RL

Emilio Jorge; Christos Dimitrakakis; Debabrota Basu

Author(s)

Emilio Jorge

Christos Dimitrakakis

Chaire de science des données

Debabrota Basu

Date issued

2024

In

European Workshop on Reinforcement Learning

Abstract

In Reinforcement Learning theory, we often make restrictive assumptions, like linearity and RKHS structure on the model, or Gaussianity and log-concavity of the posteriors over models, to design an algorithm with provably sublinear regret. In this paper, we study whether we can design efficient low-regret RL algorithms for any isoperimetric distribution, which includes and extends the standard setups in the literature. Specifically, we show that the well-known PSRL (Posterior Sampling-based RL) algorithm yields sublinear regret if the posterior distributions satisfy the Log-Sobolev Inequality (LSI), which is a form of isoperimetry. Further, for the cases where we cannot compute or sample from an exact posterior, we propose a Langevin sampling-based algorithm design scheme, namely LaPSRL. We show that LaPSRL also achieves sublinear regret if the posteriors only satisfy LSI. Finally, we deploy a version of LaPSRL with a Langevin sampling algorithm, SARAH-LD. We numerically demonstrate their performance in different bandit and MDP environments. Experimental results validate the generality of LaPSRL across environments and its competitive performance with respect to the baselines.

Publication type

conference paper

Identifiers

https://libra.unine.ch/handle/20.500.14713/21786

File(s)

Name

34_isoperimetry_is_all_we_need_la.pdf

Type

Main Article

Size

366.34 KB

Format

Adobe PDF