tag

#thompson-sampling

2 posts in total

AI 2026.05.03 · 11 min Advanced RL Theory Deep Dive · 3

From the probability matching principle of posterior sampling to information-ratio minimization, this post traces the unifying principle behind Bayesian bandit algorithms.

AI 2026.05.03 · 13 min Advanced RL Theory Deep Dive · 7

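From the fundamental difference between the two pure-exploration frameworks (Fixed-Confidence vs. Fixed-Budget) to instance-optimal algorithms, this post traces the core structure of best-arm identification (BAI) theory.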