tag

#policy-optimization

2 posts in total

AI 2026.05.03 · 11 min Advanced RL Foundations Deep Dive · 6

From the chicken-and-egg problem of the Performance Difference Lemma to the mathematical bound on greedy policy loss, this post traces the single language that modern RL theory shares.

AI 2026.04.28 · 10 min Advanced Advanced RL Deep Dive · 1

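From the PDL, which decomposes the performance difference between two policies into advantages, through the surrogate objective, the trust region bound, and the monotonic improvement guarantee, this post traces the single theoretical framework of advanced RL.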