#advantage-estimation

AI 2026.05.03 · 8 min Advanced Policy Gradient Deep Dive · 5

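From the bootstrapping bias of the TD residual, through the derivation of GAE as an exponentially weighted average and the two limits of λ, to the reverse-order O(T) implementation: this post traces the core design of advantage estimation.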