Dimitrakakis, Christos
Search results
Environment Design for Inverse Reinforcement Learning
2024, Thomas Kleine Buening, Victor Villin, Christos Dimitrakakis
Learning a reward function from demonstrations suffers from low sample-efficiency. Even with abundant data, current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics. We tackle these challenges through adaptive environment design. In our framework, the learner repeatedly interacts with the expert, with the former selecting environments to identify the reward function as quickly as possible from the expert’s demonstrations in said environments. This results in improvements in both sample-efficiency and robustness, as we show experimentally, for both exact and approximate inference.
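A small toy version of the adaptive loop can make the idea concrete. The sketch below is not the paper's algorithm: it assumes linear rewards over two arm features, a perfectly optimal expert, and a crude rejection-style posterior update, and it simply queries the expert in whichever candidate environment the surviving reward samples disagree on most.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch only: linear reward over 2-d arm features, an optimal expert,
# and a rejection-style "posterior" that keeps reward samples consistent
# with the expert's observed choices.
true_w = np.array([1.0, -0.5])                                # hidden reward weights
environments = [rng.normal(size=(3, 2)) for _ in range(20)]   # each env: 3 arms x 2 features
posterior = [rng.normal(size=2) for _ in range(500)]          # prior samples over reward weights

def behaviour_entropy(samples, arms):
    """Disagreement of the samples about the optimal arm (proxy for informativeness)."""
    choices = [int(np.argmax(arms @ w)) for w in samples]
    _, counts = np.unique(choices, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

for _ in range(5):
    # Adaptive design: query the expert where the current posterior disagrees most.
    env = max(environments, key=lambda arms: behaviour_entropy(posterior, arms))
    expert_choice = int(np.argmax(env @ true_w))
    posterior = [w for w in posterior if int(np.argmax(env @ w)) == expert_choice]

print(len(posterior), "reward samples remain consistent with the expert")
```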
Better Luck Next Time: About Robust Recourse in Binary Allocation Problems
2024, Meirav Segal, Anne-Marie George, Ingrid Chieh Yu, Christos Dimitrakakis
In this work, we present the problem of algorithmic recourse in the setting of binary allocation problems. In this setting, the optimal allocation depends not only on the prediction model and the individual’s features, but also on the currently available resources, the utility function used by the decision maker, and the other individuals currently applying for the resource. We provide a method for generating counterfactual explanations under separable utilities that are monotonically increasing with prediction scores. Here, we assume that probabilities of “success”, together with some other parameters, can be translated into utility, such that the problem can be phrased as a knapsack problem and solved by known allocation policies: optimal 0–1 knapsack and greedy. We use the two policies in the use cases of loans and college admissions, respectively. Moreover, we address the problem of recourse invalidation due to changes in the allocation variables, under an unchanged prediction model, by presenting a method for robust recourse under distributions over these variables. Finally, we empirically compare our method with perturbation-robust recourse and show that our method can provide higher validity at a lower cost.
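As a concrete illustration of the allocation side, here is a minimal greedy 0-1 knapsack policy of the kind referred to above. The scores, costs, and the particular utility (score times cost) are invented for the example; the utility is monotonically increasing in the prediction score, as the abstract assumes.

```python
# Illustrative greedy 0-1 knapsack allocation (one of the two policies mentioned
# above); the applicants' scores and costs are made up, not the paper's data.

def greedy_allocation(utilities, costs, budget):
    """Allocate the resource to applicants in decreasing utility-per-cost order."""
    order = sorted(range(len(utilities)),
                   key=lambda i: utilities[i] / costs[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return chosen

scores = [0.9, 0.7, 0.6, 0.4]                        # predicted probabilities of "success"
costs = [50.0, 30.0, 30.0, 20.0]                     # e.g. requested loan amounts
utilities = [s * c for s, c in zip(scores, costs)]   # a separable utility, increasing in the score

print(greedy_allocation(utilities, costs, budget=80.0))   # indices of funded applicants
```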
Minimax-Bayes Reinforcement Learning
2023, Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, Emilio Jorge
While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
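A worked toy version of the worst-case-prior idea, for a finite set of models and policies rather than a full reinforcement learning problem: the worst-case prior is nature's maximin strategy in the zero-sum game whose payoff is the policy's regret, which can be computed here by linear programming. The regret matrix is made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Toy illustration (not the paper's setting): rows are candidate models, columns
# are policies, entries are the policy's expected regret in that model. Nature
# picks a prior over models to maximise the Bayes regret of the best policy.
R = np.array([[0.0, 0.8, 0.3],
              [0.9, 0.0, 0.4]])          # regret[model, policy]
n_models, n_policies = R.shape

# Variables (prior_1, ..., prior_m, v): maximise v subject to
# prior^T R[:, j] >= v for every policy j, prior on the simplex.
c = np.concatenate([np.zeros(n_models), [-1.0]])           # minimise -v
A_ub = np.hstack([-R.T, np.ones((n_policies, 1))])         # v - prior^T R[:, j] <= 0
b_ub = np.zeros(n_policies)
A_eq = np.concatenate([np.ones(n_models), [0.0]]).reshape(1, -1)
b_eq = np.array([1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n_models + [(None, None)])

worst_case_prior, game_value = res.x[:n_models], res.x[-1]
print("worst-case prior:", np.round(worst_case_prior, 3),
      "| minimax Bayes regret:", round(game_value, 3))
```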
Interactive Inverse Reinforcement Learning for Cooperative Games
2022, Thomas Kleine Büning, Anne-Marie George, Christos Dimitrakakis
We study the problem of designing autonomous agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov decision process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting so as to maximise expected utility given the first agent’s policy. How should the first agent act in order to learn the joint reward function as quickly as possible and so that the joint policy is as close to optimal as possible? We analyse how knowledge about the reward function can be gained in this interactive two-agent scenario. We show that when the learning agent’s policies have a significant effect on the transition function, the reward function can be learned efficiently.
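The sketch below is a deliberately simplified, one-shot version of that interaction, not the paper's episodic MDP setting: the first agent commits to an action, the second agent best-responds under a hidden joint reward, and observing that response rules out reward hypotheses that would have predicted a different best response.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sketch only: a one-shot matrix game in place of the episodic MDP above.
n_lead, n_follow = 4, 4
true_reward = rng.normal(size=(n_lead, n_follow))             # hidden joint reward
hypotheses = [rng.normal(size=(n_lead, n_follow)) for _ in range(200)] + [true_reward]

def most_informative_action(hyps):
    """Leader action on which the surviving hypotheses disagree most about the best response."""
    return max(range(n_lead),
               key=lambda a: len({int(np.argmax(R[a])) for R in hyps}))

for _ in range(n_lead):
    a = most_informative_action(hypotheses)
    b = int(np.argmax(true_reward[a]))                        # follower's best response
    hypotheses = [R for R in hypotheses if int(np.argmax(R[a])) == b]

print(len(hypotheses), "reward hypotheses remain consistent with the follower's behaviour")
```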
Eliciting Kemeny Rankings
2024, Anne-Marie George, Christos Dimitrakakis
We formulate the problem of eliciting agents’ preferences with the goal of finding a Kemeny ranking as a Dueling Bandits problem. Here the bandits’ arms correspond to alternatives that need to be ranked and the feedback corresponds to a pairwise comparison between alternatives by a randomly sampled agent. We consider both sampling with and without replacement, i.e., the possibility to ask the same agent about some comparison multiple times or not. We find approximation bounds for Kemeny rankings dependent on confidence intervals over the estimated winning probabilities of arms. Based on these, we state algorithms to find Probably Approximately Correct (PAC) solutions and elaborate on their sample complexity for sampling with or without replacement. Furthermore, if all agents’ preferences are strict rankings over the alternatives, we provide means to prune confidence intervals and thereby guide a more efficient elicitation. We formulate several adaptive sampling methods that use look-aheads to estimate how much confidence intervals (and thus approximation guarantees) might be tightened. All described methods are compared on synthetic data.
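A minimal sketch of the elicitation setup, with invented data: agents hold strict rankings, pairwise comparisons are sampled with replacement, winning probabilities receive Hoeffding confidence intervals, and the Kemeny ranking is taken by brute force over the point estimates (feasible for a handful of alternatives). It is illustrative only, not one of the PAC algorithms described above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

n_alt, n_agents = 4, 50
agent_rankings = [rng.permutation(n_alt) for _ in range(n_agents)]   # strict rankings

wins = np.zeros((n_alt, n_alt))
counts = np.zeros((n_alt, n_alt))
for _ in range(2000):                                     # sampling with replacement
    i, j = rng.choice(n_alt, size=2, replace=False)
    pos = np.argsort(agent_rankings[rng.integers(n_agents)])   # pos[x] = rank of alternative x
    wins[i, j] += float(pos[i] < pos[j])
    counts[i, j] += 1

p_hat = np.divide(wins, counts, out=np.full_like(wins, 0.5), where=counts > 0)
delta = 0.05
ci_half_width = np.sqrt(np.log(2 / delta) / (2 * np.maximum(counts, 1)))   # Hoeffding

def kemeny_score(order):
    """Estimated pairwise agreement with `order` (higher is better)."""
    return sum(p_hat[a, b] for a, b in itertools.combinations(order, 2))

best = max(itertools.permutations(range(n_alt)), key=kemeny_score)
print("estimated Kemeny ranking:", best,
      "| widest CI half-width:", round(float(ci_half_width[counts > 0].max()), 3))
```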
Isoperimetry is All We Need: Langevin Posterior Sampling for RL
2024, Emilio Jorge, Christos Dimitrakakis, Debabrota Basu
In Reinforcement Learning theory, we often make restrictive assumptions, such as linearity or RKHS structure on the model, or Gaussianity and log-concavity of the posteriors over models, in order to design an algorithm with provably sublinear regret. In this paper, we study whether we can design efficient low-regret RL algorithms for any isoperimetric distribution, which includes and extends the standard setups in the literature. Specifically, we show that the well-known PSRL (Posterior Sampling-based RL) algorithm yields sublinear regret if the posterior distributions satisfy the Log-Sobolev Inequality (LSI), which is a form of isoperimetry. Further, for the cases where we cannot compute or sample from an exact posterior, we propose a Langevin sampling-based algorithm design scheme, namely LaPSRL. We show that LaPSRL also achieves sublinear regret if the posteriors only satisfy LSI. Finally, we deploy a version of LaPSRL with a Langevin sampling algorithm, SARAH-LD. We numerically demonstrate its performance in different bandit and MDP environments. Experimental results validate the generality of LaPSRL across environments and its competitive performance with respect to the baselines.
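The sketch below illustrates the posterior-sampling-with-Langevin idea on a Gaussian bandit, using plain unadjusted Langevin dynamics rather than the SARAH-LD variant deployed in the paper, and a bandit rather than a full MDP; the arm means, step sizes, and chain length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy sketch only: each round, a posterior sample of the arm means is produced by
# a short unadjusted Langevin chain on the log-posterior, and the arm that looks
# best under that sample is pulled (posterior sampling / Thompson sampling).
true_means = np.array([0.2, 0.5, 0.8])
n_arms = len(true_means)
sums = np.zeros(n_arms)     # sum of observed rewards per arm
counts = np.zeros(n_arms)   # number of pulls per arm

def grad_log_posterior(theta):
    # Unit-variance Gaussian likelihood with a standard normal prior on each arm mean.
    return (sums - counts * theta) - theta

for t in range(500):
    theta = rng.normal(size=n_arms)            # start the chain from a prior draw
    step = 1.0 / (counts + 2.0)                # per-arm step size, shrinking with data
    for _ in range(50):                        # a few Langevin steps approximate a posterior sample
        theta = theta + step * grad_log_posterior(theta) \
                + np.sqrt(2 * step) * rng.normal(size=n_arms)
    arm = int(np.argmax(theta))                # act greedily w.r.t. the sampled model
    reward = rng.normal(true_means[arm], 1.0)
    sums[arm] += reward
    counts[arm] += 1

print("pull counts per arm:", counts)          # the best arm should dominate
```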
Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
2023, Hannes Eriksson, Debabrota Basu, Tommy Tram, Mina Alibeigi, Christos Dimitrakakis
In this paper, we study the problem of transferring the available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP. We refer to this as the Model Transfer Reinforcement Learning (MTRL) problem. First, we formulate MTRL for discrete MDPs and Linear Quadratic Regulators (LQRs) with continuous states and actions. Then, we propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete and continuous settings. In the first stage, MLEMTRL uses a constrained Maximum Likelihood Estimation (MLE)-based approach to estimate the target MDP model using a set of known MDP models. In the second stage, using the estimated target MDP model, MLEMTRL deploys a model-based planning algorithm appropriate for the MDP class. Theoretically, we prove worst-case regret bounds for MLEMTRL in both realisable and non-realisable settings. We empirically demonstrate that MLEMTRL allows faster learning in new MDPs than learning from scratch and achieves near-optimal performance depending on the similarity of the available MDPs and the target MDP.
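A toy sketch of the two-stage structure, not the MLEMTRL implementation: stage 1 fits simplex weights over known transition models by maximum likelihood (an EM-style multiplicative update, one simple way to solve the constrained MLE), and stage 2 runs value iteration in the weighted model. The MDP sizes, source models, and reward function are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

n_states, n_actions = 5, 2
sources = [rng.dirichlet(np.ones(n_states), size=(n_states, n_actions)) for _ in range(3)]
target = 0.7 * sources[0] + 0.3 * sources[2]     # unknown target lies in the span of the sources
rewards = rng.random((n_states, n_actions))

# Transitions observed in the target MDP.
data = []
for _ in range(2000):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    data.append((s, a, rng.choice(n_states, p=target[s, a])))

# Stage 1: constrained MLE of the mixture weights over the simplex.
probs = np.array([[P[s, a, s2] for P in sources] for s, a, s2 in data])   # (n_data, n_models)
w = np.ones(len(sources)) / len(sources)
for _ in range(200):
    resp = w * probs / (probs @ w)[:, None]      # responsibility of each source model per transition
    w = resp.mean(axis=0)

# Stage 2: value iteration in the estimated model.
P_hat = sum(wk * P for wk, P in zip(w, sources))
V = np.zeros(n_states)
for _ in range(200):
    V = (rewards + 0.9 * P_hat @ V).max(axis=1)

print("estimated mixture weights:", np.round(w, 2), "| value at state 0:", round(float(V[0]), 2))
```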
Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning under Policy Uncertainty
2022, Hannes Eriksson, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis
In stochastic games with incomplete information, uncertainty arises from the lack of knowledge about a player's own and the other players' types, i.e. the utility function and the policy space, as well as from the inherent stochasticity of the players' interactions. In the existing literature, risk in stochastic games has been studied in terms of the inherent uncertainty arising from the variability of transitions and actions. In this work, we instead focus on the risk associated with the uncertainty over types. We contrast this with the multi-agent reinforcement learning framework, where the other agents have fixed stationary policies, and investigate risk-sensitivity due to the uncertainty about the other agents' adaptive policies. We propose risk-sensitive versions of existing algorithms for risk-neutral stochastic games, such as Iterated Best Response (IBR), Fictitious Play (FP) and a general multi-objective gradient approach using dual ascent (DAPG). Our experimental analysis shows that risk-sensitive DAPG performs better than the competing algorithms for both social welfare and general-sum stochastic games.
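A small sketch of what a risk-sensitive best response under type uncertainty can look like, using an entropic risk measure (exponential utility) over a belief about the opponent's type; the payoff matrices, belief, and opponent policies are made up, and the snippet does not reproduce the risk-sensitive IBR, FP, or DAPG algorithms themselves.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy sketch only: a one-shot Bayesian game where the opponent's type (payoff
# matrix and stationary policy) is uncertain and risk-sensitivity is expressed
# through an entropic risk measure.
n_actions, n_types = 3, 4
payoff_by_type = rng.normal(size=(n_types, n_actions, n_actions))   # payoff[type, own action, their action]
type_belief = np.array([0.4, 0.3, 0.2, 0.1])                        # belief over the opponent's type
policy_by_type = rng.dirichlet(np.ones(n_actions), size=n_types)    # opponent's policy, per type

def entropic_value(action, beta):
    """Risk-sensitive value -(1/beta) * log E[exp(-beta * payoff)]; risk-averse for beta > 0."""
    payoffs = payoff_by_type[:, action, :].ravel()                   # over (type, opponent action)
    weights = (type_belief[:, None] * policy_by_type).ravel()
    return -np.log(np.sum(weights * np.exp(-beta * payoffs))) / beta

for beta in [1e-3, 1.0, 5.0]:     # beta -> 0 recovers the risk-neutral best response
    best = max(range(n_actions), key=lambda a: entropic_value(a, beta))
    print(f"beta={beta}: risk-sensitive best response is action {best}")
```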