Behavior Predictive Representations for Generalization in Reinforcement Learning
Siddhant Agarwal1
Aaron Courville2
Rishabh Agarwal2, 3
1 IIT Kharagpur
2 MILA, Université de Montréal
3 Google Research Brain Team
NeurIPS 2021 workshop on Deep Reinforcement Learning and Ecological Theory of Reinforcement Learning

Abstract

Deep reinforcement learning (RL) agents trained on a few environments often struggle to generalize to unseen environments, even when those environments are semantically equivalent to the training environments. Such agents learn representations that overfit to the characteristics of the training environments. We posit that generalization can be improved by assigning similar representations to scenarios with similar sequences of long-term optimal behavior. To do so, we propose behavior predictive representations (BPR), which capture long-term optimal behavior. BPR trains an agent to predict latent state representations multiple steps into the future such that these representations can predict the optimal behavior at those future steps. We demonstrate that BPR provides large gains on a jumping task from pixels, a problem designed to test generalization.
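To make the objective described in the abstract more concrete, here is a minimal sketch of one way such a loss could be computed. It is not the implementation from the paper: the linear encoder, the action-conditioned linear latent transition, the linear policy head, the cross-entropy form of the loss, and every name below (`init_params`, `behavior_prediction_loss`, the parameter keys) are assumptions made purely for illustration. The sketch encodes an observation once, rolls the latent forward several steps without looking at later observations, and scores how well each predicted latent predicts the optimal action at that future step.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def init_params(obs_dim, latent_dim, num_actions, seed=0):
    """Create small random linear maps (hypothetical stand-ins for networks)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "encoder": mat(latent_dim, obs_dim),                      # obs -> latent
        "transition": mat(latent_dim, latent_dim + num_actions),  # (latent, action) -> next latent
        "policy": mat(num_actions, latent_dim),                   # latent -> action logits
    }


def behavior_prediction_loss(params, obs, optimal_actions):
    """Average cross-entropy of predicted vs. optimal actions at future steps.

    obs: observation vector at step 0.
    optimal_actions: optimal action indices for steps 0..K (length K + 1).
    The latent is rolled forward K steps using only the step-0 observation.
    """
    horizon = len(optimal_actions) - 1
    if horizon < 1:
        raise ValueError("need at least two steps of optimal actions")
    num_actions = len(params["policy"])
    z = matvec(params["encoder"], obs)
    total = 0.0
    for k in range(horizon):
        # Predict the next latent from the current latent and the action taken.
        z = matvec(params["transition"], z + one_hot(optimal_actions[k], num_actions))
        # The predicted latent should be enough to recover the optimal action.
        log_probs = log_softmax(matvec(params["policy"], z))
        total -= log_probs[optimal_actions[k + 1]]
    return total / horizon


# Illustrative call with made-up sizes: 4-dim observation, 3-dim latent, 2 actions.
params = init_params(obs_dim=4, latent_dim=3, num_actions=2)
loss = behavior_prediction_loss(params, [1.0, 0.0, 0.5, 0.2], [0, 0, 1, 0])
```

In a real agent the three linear maps would be learned networks trained by gradient descent on this loss, typically alongside an RL or imitation objective. The sketch only shows how the loss couples multi-step latent prediction with prediction of optimal behavior.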

Paper & Code

Siddhant Agarwal, Aaron Courville, Rishabh Agarwal
Behavior Predictive Representations for Generalization in Reinforcement Learning
7th Workshop on Deep Reinforcement Learning and Ecological Theory of Reinforcement Learning at the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), 2021
[PDF] [Video Presentation] [Poster] [Slides]