arxiv:2503.07807

Training Domain Draft Models for Speculative Decoding: Best Practices and Insights

Published on Mar 10, 2025

Authors:

Fenglu Hong,

Ravi Raju,

Abstract

Knowledge distillation techniques improve speculative decoding efficiency for domain-specific large language models by training domain-specific draft models, with offline and white-box distillation showing superior performance.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Speculative decoding is an effective method for accelerating inference of large language models (LLMs) by employing a small draft model to predict the output of a target model. However, when adapting speculative decoding to domain-specific target models, the acceptance rate of the generic draft model drops significantly due to domain shift. In this work, we systematically investigate knowledge distillation techniques for training domain draft models to improve their speculation accuracy. We compare white-box and black-box distillation approaches and explore their effectiveness in various data accessibility scenarios, including historical user queries, curated domain data, and synthetically generated alignment data. Our experiments across Function Calling, Biology, and Chinese domains show that offline distillation consistently outperforms online distillation by 11% to 25%, white-box distillation surpasses black-box distillation by 2% to 10%, and data scaling trends hold across domains. Additionally, we find that synthetic data can effectively align draft models and achieve 80% to 93% of the performance of training on historical user queries. These findings provide practical guidelines for training domain-specific draft models to improve speculative decoding efficiency.
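To make the terms in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard speculative sampling rule, under which a draft token is accepted with probability min(1, p/q), so the expected per-token acceptance is the sum over tokens of min(p, q). It also assumes that white-box distillation can be modeled as a forward KL divergence over the target's full next-token distribution, and black-box distillation as the negative log-likelihood of the token the target actually emitted. The paper may use different acceptance criteria or divergences. All function names and the toy three-token distributions are hypothetical.

```python
import math

def acceptance_prob(p_target, q_draft):
    """Expected per-token acceptance under standard speculative sampling.

    Assumption: a draft token x ~ q is accepted with probability
    min(1, p(x) / q(x)), which gives sum_x min(p(x), q(x)) in expectation.
    """
    return sum(min(p, q) for p, q in zip(p_target, q_draft))

def white_box_loss(p_target, q_draft):
    """Forward KL(p || q); requires the target's full distribution (assumed form)."""
    return sum(p * math.log(p / q) for p, q in zip(p_target, q_draft) if p > 0)

def black_box_loss(target_token, q_draft):
    """NLL of the target's emitted token; requires only the target's output text."""
    return -math.log(q_draft[target_token])

# Toy next-token distributions over a 3-token vocabulary (illustrative values only).
p_target = [0.7, 0.2, 0.1]    # domain-specific target model
q_generic = [0.2, 0.5, 0.3]   # generic draft model, misaligned by domain shift
q_domain = [0.6, 0.3, 0.1]    # draft model after domain distillation

for name, q in [("generic", q_generic), ("domain", q_domain)]:
    print(name,
          "accept=%.2f" % acceptance_prob(p_target, q),
          "kl=%.4f" % white_box_loss(p_target, q),
          "nll=%.4f" % black_box_loss(0, q))
```

In this toy setting the domain-aligned draft has a higher expected acceptance and lower loss under both objectives. The practical difference is in what each objective needs: the white-box loss uses the target's full probability vector, while the black-box loss uses only the token the target produced.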
