#batching

2 posts in total

AI 2026.05.03 · 12 min Advanced LLM Inference Deep Dive · 1

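Starting from why compute-bound prefill and memory-bound decode coexist in the same model, and moving through Roofline analysis to the limits of batch optimization, this post traces the physical constraints of LLM serving.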

AI 2026.05.03 · 10 min Advanced LLM Inference Deep Dive · 3

From the 67% GPU waste of static batching to Prefill-Decode separation, this post traces the evolution of batching strategies that raise LLM inference throughput by 3-5x.