Width vs. Depth: Speculating on the Margin
speculative decoding, large language models, width vs depth, confidence gating, throughput optimization
Author: somnial
Date: 7/2/2026
Article Summary:
The article examines speculative decoding in large language models, focusing on the trade-off between width (batching multiple sequences) and depth (speculating on a single sequence), and proposes confidence-gated speculation to optimize throughput.