tag

#continuous-batching

2 posts in total

AI 2026.05.03 · 10 min Advanced LLM Inference Deep Dive · 3

From the 67% GPU waste of static batching to Prefill-Decode separation, this post traces the evolution of batching strategies that raise LLM inference throughput by 3-5x.

AI 2026.05.03 · 13 min Advanced Efficient ML Deep Dive · 7

From resolving KV cache fragmentation to compiling for mobile NPUs, this post traces the design philosophy behind PagedAttention, Speculative Decoding, Continuous Batching, and Edge Deployment, the techniques that make LLM inference practical.