#psrl · IQ Lab

AI · 2026.05.03 · 11 min · Advanced RL Theory Deep Dive · 6

Starting from the role of the diameter D, which appears when bandit regret is extended to MDPs, this post traces how Bayesian posterior sampling and linear function approximation each compress regret scaling in a different way.