Isoperimetry is All We Need: Langevin Posterior Sampling for RL
Author(s)
Publication date
2024
In
European Workshop on Reinforcement Learning
Abstract
In Reinforcement Learning theory, we often rely on restrictive assumptions, such as linearity or RKHS structure of the model, or Gaussianity and log-concavity of the posteriors over models, to design algorithms with provably sublinear regret. In this paper, we study whether we can design efficient low-regret RL algorithms for any isoperimetric distribution, which includes and extends the standard setups in the literature. Specifically, we show that the well-known PSRL (Posterior Sampling-based RL) algorithm yields sublinear regret if the posterior distributions satisfy the Log-Sobolev Inequality (LSI), a form of isoperimetry. Further, for the cases where we cannot compute or sample from an exact posterior, we propose a Langevin sampling-based algorithm design scheme, namely LaPSRL. We show that LaPSRL also achieves sublinear regret if the posteriors only satisfy LSI. Finally, we deploy a version of LaPSRL with a Langevin sampling algorithm, SARAH-LD, and numerically demonstrate its performance in different bandit and MDP environments. The experimental results validate the generality of LaPSRL across environments and its competitive performance with respect to the baselines.
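The core primitive in the abstract is approximate posterior sampling via Langevin dynamics. The sketch below is a minimal, illustrative unadjusted Langevin algorithm (ULA) draw for a hypothetical Gaussian linear-bandit posterior, not the paper's SARAH-LD variant; the function names, step size, and data are assumptions for illustration only.

```python
import numpy as np

def langevin_posterior_sample(grad_log_post, theta0, step_size=1e-3,
                              n_steps=1000, rng=None):
    """One approximate posterior draw via the unadjusted Langevin algorithm.

    Iterates theta <- theta + step_size * grad_log_post(theta)
                            + sqrt(2 * step_size) * N(0, I),
    whose stationary distribution approximates the posterior; convergence
    holds under isoperimetry (e.g. LSI) without requiring log-concavity.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = (theta + step_size * grad_log_post(theta)
                 + np.sqrt(2.0 * step_size) * noise)
    return theta

# Hypothetical example: posterior over a linear-bandit parameter with a
# Gaussian likelihood and a standard normal prior.
def make_grad_log_post(X, y, noise_var=1.0):
    def grad_log_post(theta):
        residual = y - X @ theta
        return (X.T @ residual) / noise_var - theta  # likelihood + prior terms
    return grad_log_post

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
theta_true = np.array([1.0, -0.5, 0.25])
y = X @ theta_true + 0.1 * rng.standard_normal(50)
sample = langevin_posterior_sample(make_grad_log_post(X, y), np.zeros(3), rng=rng)
print(sample)  # one approximate posterior draw, usable as a PSRL-style model sample
```

In a PSRL-style loop, one such draw per episode would replace exact posterior sampling; SARAH-LD as used in the paper additionally employs variance-reduced stochastic gradients, which this plain ULA sketch omits.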
Identifiers
Publication type
conference paper not in proceedings