arxiv:2503.07807

Training Domain Draft Models for Speculative Decoding: Best Practices and Insights

Published on Mar 10, 2025

Abstract

Knowledge distillation improves speculative decoding efficiency for domain-specific large language models by training domain-specific draft models, with offline and white-box distillation performing best.
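The white-box and black-box settings differ in what the draft model can see from the target. White-box distillation has the target's full next-token distribution, while black-box distillation only has the tokens the target generated. Below is a minimal per-position sketch of the two losses. It assumes the draft and target share a vocabulary (token-level logit matching requires this), and it uses forward KL divergence for the white-box case, which is one common choice; the abstract does not name the exact loss the authors use. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def white_box_loss(target_logits: Sequence[float], draft_logits: Sequence[float]) -> float:
    """Forward KL(p_target || q_draft) at a single position.

    Assumption: the target's full logits are available (white-box) and
    both models index the same vocabulary.
    """
    log_p = log_softmax(target_logits)
    log_q = log_softmax(draft_logits)
    # Sum over the vocabulary: p(v) * (log p(v) - log q(v)).
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def black_box_loss(target_token: int, draft_logits: Sequence[float]) -> float:
    """Cross-entropy of the draft on the token the target actually emitted.

    Only the target's generated text is needed (black-box), so this is
    ordinary next-token training on target outputs.
    """
    return -log_softmax(draft_logits)[target_token]


# Identical distributions give zero KL; a uniform 2-way draft scores log(2).
print(white_box_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
print(black_box_loss(0, [0.0, 0.0]))  # ≈ 0.6931
```

The white-box loss gives the draft a training signal on every vocabulary entry at each position, whereas the black-box loss supervises only the single emitted token.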

Speculative decoding is an effective method for accelerating inference of large language models (LLMs) by employing a small draft model to predict the output of a target model. However, when adapting speculative decoding to domain-specific target models, the acceptance rate of the generic draft model drops significantly due to domain shift. In this work, we systematically investigate knowledge distillation techniques for training domain draft models to improve their speculation accuracy. We compare white-box and black-box distillation approaches and explore their effectiveness in various data accessibility scenarios, including historical user queries, curated domain data, and synthetically generated alignment data. Our experiments across Function Calling, Biology, and Chinese domains show that offline distillation consistently outperforms online distillation by 11% to 25%, white-box distillation surpasses black-box distillation by 2% to 10%, and data scaling trends hold across domains. Additionally, we find that synthetic data can effectively align draft models and achieve 80% to 93% of the performance of training on historical user queries. These findings provide practical guidelines for training domain-specific draft models to improve speculative decoding efficiency.
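The acceptance rate referred to in the abstract is the fraction of drafted tokens that the target model keeps. The sketch below assumes greedy (temperature-zero) verification, where a drafted token is accepted only if it equals the target's own argmax; sampling-based speculative decoding instead accepts a drafted token with probability min(1, p/q). `target_next` is a hypothetical stand-in for the target model, and a real system would score all drafted positions in one forward pass rather than one call per token.

```python
from typing import Callable, List, Sequence, Tuple


def verify_greedy(
    prefix: Sequence[int],
    draft_tokens: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> Tuple[List[int], int]:
    """Greedy verification of one block of drafted tokens.

    Returns (tokens emitted this step, number of drafted tokens accepted).
    The emitted tokens always end with one token chosen by the target.
    """
    emitted: List[int] = []
    context = list(prefix)
    for tok in draft_tokens:
        expected = target_next(context)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            emitted.append(expected)
            return emitted, len(emitted) - 1
        emitted.append(tok)
        context.append(tok)
    # Every drafted token matched, so the target adds one bonus token.
    emitted.append(target_next(context))
    return emitted, len(draft_tokens)


def acceptance_rate(accepted: Sequence[int], drafted: Sequence[int]) -> float:
    """Fraction of drafted tokens accepted, pooled over verification steps."""
    return sum(accepted) / sum(drafted)


# Toy target (hypothetical): always predicts the previous token plus one.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


tokens, n_ok = verify_greedy([1], [2, 3, 5], toy_target)
print(tokens, n_ok)  # → [2, 3, 4] 2
```

Under domain shift, a generic draft's guesses disagree with the target more often, so the loop exits earlier and `acceptance_rate` falls; raising this quantity is the goal of training a domain draft model.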
