tag

#speculative-decoding

3 posts in total

AI 2026.05.03 · 11 min Advanced LLM Inference Deep Dive · 5

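From the system complexity of the draft-target dual structure to the design trade-offs of Medusa, EAGLE, and Lookahead and the economics of Best-of-N, this post traces the core principles of LLM inference acceleration.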

AI 2026.05.03 · 13 min Advanced Efficient ML Deep Dive · 7

From resolving KV cache fragmentation to compiling for mobile NPUs, this post traces the design philosophy of PagedAttention, Speculative Decoding, Continuous Batching, and Edge Deployment, the techniques that make LLM inference practical.

AI 2026.05.03 · 9 min Advanced LLM Efficiency Deep Dive · 7

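From the mathematical structure of the autoregressive bottleneck to the losslessness proof of rejection sampling, and on to Medusa, EAGLE, and Lookahead, this post traces the design philosophy behind draft strategies.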