Width vs. Depth: Speculating on the Margin

Other: AI & Machine Learning (blog.doubleword.ai)

speculative decoding, large language models, width vs depth, confidence gating, throughput optimization

Author: somnial

Date: 7/2/2026

Article Summary:

The article examines speculative decoding in large language models, focusing on the trade-off between width (batching multiple sequences) and depth (speculating on a single sequence). It proposes a confidence-gated speculation approach to optimize throughput.
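
This summary does not spell out the article's gating rule, so the following is only a minimal sketch of one way confidence gating could work: keep drafting tokens on a sequence while the draft model stays confident, and stop as soon as confidence drops. The function name `gated_draft_length`, the `threshold` and `max_depth` parameters, and their default values are all hypothetical and are not taken from the article.

```python
from typing import Sequence


def gated_draft_length(
    confidences: Sequence[float],
    threshold: float = 0.8,  # assumed cutoff, not from the article
    max_depth: int = 8,      # assumed cap on speculation depth
) -> int:
    """Return how many draft tokens to speculate before stopping.

    `confidences` holds the draft model's top-token probability at each
    successive draft position. Speculation continues while each value
    is at least `threshold`, up to `max_depth` tokens.
    """
    depth = 0
    for p in confidences[:max_depth]:
        if p < threshold:
            break  # confidence dropped: stop spending compute on this sequence
        depth += 1
    return depth


# Example: the third draft token is low-confidence, so only two are speculated.
print(gated_draft_length([0.95, 0.90, 0.60, 0.99]))
```

Under this reading, compute that is not spent going deeper on a low-confidence sequence remains available for width, that is, for serving other sequences in the batch.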