Kimi-Linear-48B-A3B-DFlash-240k

A DFlash speculative-decoding drafter for moonshotai/Kimi-Linear-48B-A3B-Instruct — the first public Kimi-Linear DFlash drafter.

Headline

Metric	Value
Target	Kimi-Linear-48B-A3B-Instruct (48B total / 3B active MoE, hybrid linear attention: KDA layers interleaved with full-attention MLA layers)
Drafter	5-layer softmax DFlash, hidden size 2304, target_layer_ids=[3,7,15,19,23], block size 16, vocab size 163840, mask_token_id 163839
Training data	240k target-matched math examples (Moonlight556/kimi-linear-48b-a3b-target-matched-math-240k)
Training compute	1 epoch, 59,867 optimizer steps, 12 h 41 min wall-clock on 2× B200 (FSDP-2)
Math500 / N=64 offline acceptance length	5.2106 (greedy, no thinking; block_size=16)
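
The training-compute row implies a per-step cost and an approximate number of examples per optimizer step. The sketch below only works through that arithmetic. It assumes the dataset holds exactly 240,000 examples (the card says "240k") and that no examples were dropped or packed; `training_rates` is a hypothetical helper name, not part of la-draftery.

```python
def training_rates(hours: int, minutes: int, steps: int, examples: int):
    """Return (seconds per optimizer step, examples per optimizer step)."""
    wall_seconds = hours * 3600 + minutes * 60
    return wall_seconds / steps, examples / steps


# Figures from the headline table; 240_000 is an assumed exact value for "240k".
sec_per_step, ex_per_step = training_rates(12, 41, 59_867, 240_000)
print(f"{sec_per_step:.3f} s/step, {ex_per_step:.2f} examples/step")
# prints: 0.763 s/step, 4.01 examples/step
```

Under those assumptions the run averages roughly four examples per optimizer step across the two GPUs.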

Comparable benchmarks

Offline mean acceptance length on Math500 (N=64, shuffle seed 0, greedy, no thinking) across the la-draftery training family:

Drafter	Target	Training data	Offline acceptance length
la-draftery Phase 1	Qwen3.5-0.8B	25k target-matched × 2ep	3.99
la-draftery Phase 1.1	Qwen3.5-0.8B	25k target-matched × 6ep	5.02
la-draftery Phase 1.2	Qwen3.5-0.8B	240k target-matched × 1ep	5.71
Phase 2 30k proof	Kimi-Linear-48B-A3B	30k target-matched × 1ep	2.36
Phase 2 240k (this model)	Kimi-Linear-48B-A3B	240k target-matched × 1ep	5.21

Offline acceptance length is teacher-forced: in a single parallel block-wise pass, the drafter's argmax at each position is compared against the target's own greedy continuation. Serving (online) acceptance via SGLang spec-v2 is a separate measurement and may differ; see la-draftery's docs/015 and docs/016 for the Phase 1.2 serving story.
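
As a rough illustration of the teacher-forced metric, the sketch below splits a target continuation into fixed-size blocks and counts, per block, how many leading draft tokens match the target. This is not the logic of tools/bench/bench_dflash.py: the function names are hypothetical, and whether the reported number also counts a bonus token per verification step is an assumption exposed here as the `bonus` parameter (default 0).

```python
from typing import Sequence


def accepted_prefix_len(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading positions where the draft token equals the target token."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def mean_acceptance_length(
    draft_tokens: Sequence[int],
    target_tokens: Sequence[int],
    block_size: int = 16,
    bonus: int = 0,
) -> float:
    """Average accepted prefix length over consecutive blocks.

    draft_tokens[i] is the drafter's argmax at position i when the block it
    belongs to is conditioned on the target's own (teacher-forced) prefix.
    """
    if len(draft_tokens) != len(target_tokens) or not target_tokens:
        raise ValueError("need two non-empty sequences of equal length")
    lengths = []
    for start in range(0, len(target_tokens), block_size):
        d = draft_tokens[start:start + block_size]
        t = target_tokens[start:start + block_size]
        lengths.append(accepted_prefix_len(d, t) + bonus)
    return sum(lengths) / len(lengths)


# Two blocks of 4: the first mismatches at its 4th token, the second matches fully.
draft = [1, 2, 3, 9, 5, 6, 7, 8]
target = [1, 2, 3, 4, 5, 6, 7, 8]
print(mean_acceptance_length(draft, target, block_size=4))  # 3.5
```

Because every block is scored against the target's own continuation, one early mismatch does not shift later blocks, which is one reason online acceptance can differ.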

Architecture & infrastructure

The Kimi-Linear target needs an architectural bridge that the Qwen3.5 path doesn't: its custom forward returns outputs.hidden_states=None, so DFlash can't read per-layer hidden states the standard way. We capture them with forward hooks instead; see la-draftery/specforge/modeling/target/dflash_target_model_kda.py for the implementation and docs/017 for the full reproducibility recipe.
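
The general hook technique can be sketched with PyTorch's `register_forward_hook` on a stand-in model. Everything below is illustrative: `ToyLayer`, `ToyTarget`, and `capture_hidden_states` are hypothetical names, the toy has no relation to the real Kimi-Linear modules, and the actual capture code in dflash_target_model_kda.py may be organized differently. Only the layer indices are taken from this card.

```python
import torch
from torch import nn


class ToyLayer(nn.Module):
    """Stand-in decoder layer; returns a tuple, as many decoder layers do."""

    def __init__(self, hidden: int):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, x):
        return (torch.tanh(self.proj(x)),)


class ToyTarget(nn.Module):
    """Stand-in target whose forward exposes no per-layer hidden states."""

    def __init__(self, num_layers: int = 24, hidden: int = 8):
        super().__init__()
        self.layers = nn.ModuleList([ToyLayer(hidden) for _ in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)[0]
        return x


def capture_hidden_states(model, layer_ids, inputs):
    """Run one forward pass and return the outputs of the selected layers."""
    captured, handles = {}, []

    def make_hook(idx):
        def hook(module, args, output):
            # Unwrap tuple outputs; keep only the hidden-state tensor.
            hidden = output[0] if isinstance(output, tuple) else output
            captured[idx] = hidden.detach()
        return hook

    for idx in layer_ids:
        handles.append(model.layers[idx].register_forward_hook(make_hook(idx)))
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        for handle in handles:
            handle.remove()  # always detach hooks, even if the forward fails
    return [captured[i] for i in layer_ids]


model = ToyTarget()
x = torch.randn(2, 5, 8)  # (batch, sequence, hidden)
states = capture_hidden_states(model, [3, 7, 15, 19, 23], x)
print([tuple(s.shape) for s in states])
```

Removing the handles in a `finally` block keeps the model free of stale hooks between calls, which matters when the same target is reused for training and benchmarking.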

The drafter architecture itself (5-layer softmax DFlash, causal_head=False) is unchanged from Phase 1.2; only the target-side capture is different.

Files

File	Purpose
`config.json`	DFlash drafter config
`dflash.py`	Drafter modeling code (matches Phase 1.2)
`model.safetensors`	Drafter weights (902 MB)
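
The weight file size gives a quick sanity check on the drafter's parameter count. The sketch below assumes the file is almost entirely BF16 tensors (2 bytes per parameter), that "MB" means 10^6 bytes, and that the safetensors header is negligible; `estimate_param_count` is a hypothetical helper.

```python
def estimate_param_count(file_bytes: float, bytes_per_param: int = 2) -> float:
    """Approximate parameter count from a weight file's size."""
    return file_bytes / bytes_per_param


# 902 MB at 2 bytes per BF16 parameter.
params = estimate_param_count(902e6)
print(f"{params / 1e9:.2f}B parameters")  # prints: 0.45B parameters
```

That puts the drafter at roughly half a billion parameters, small next to the 3B active parameters of the target.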

Usage

See la-draftery/recipes/train_phase2_kimi_30k_proof.sh for the training launcher (the 240k run uses the same recipe with DATA_PATH pointing at the full 240k jsonl), and tools/bench/bench_dflash.py for offline benchmarking.

License

Apache 2.0.
