Document Details


2201.02373v11.pdf
Clip: Mirror Learning: A Unifying Framework of Policy Optimisation. Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster. Abstract: Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL
Filename: 2201.02373v11.pdf
Filetype: application/pdf
Size: 3138600 bytes
Uploaded On: 2025-10-24
Abstract:
Summary:
Tags:
Notes:
Visible: 1
Status: Parsed
Author: Anonymous Submission
CreationDate: 2024-11-21T01:08:30+00:00
Creator: LaTeX with hyperref
Keywords:
ModDate: 2024-11-21T01:08:30+00:00
PTEX.Fullbanner: This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5
Producer: pdfTeX-1.40.25
Subject: Proceedings of the International Conference on Machine Learning 2022
Title:
Trapped: False
Pages: 20
