#prefill-decode

AI · 2026.05.03 · 10 min · Advanced LLM Inference Deep Dive · 3

From the 67% GPU waste of static batching to Prefill-Decode separation, this post traces the evolution of batching strategies that raise LLM inference throughput by 3-5x.