Training a Custom EAGLE-3 Head

last updated 2026-06-08

vllm

Notes in which I follow along with Baseten's "How to Train a Custom EAGLE-3 Head for Speculative Decoding".

They point to three training frameworks:

1. https://github.com/NVIDIA/Model-Optimizer
   1. https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/speculative_decoding
   2. The tutorial above covers draft models, EAGLE, and DFlash.
2. https://github.com/sgl-project/SpecForge
   1. https://sgl-project.github.io/SpecForge/basic_usage/training.html
3. https://github.com/torchspec-project/TorchSpec

Of the three, SpecForge seems to have the simplest "happy path" to training an EAGLE-3 head.

Recommended parameters:

1. Test-time-training length: 7-9
2. Number of draft tokens: 3-4, with the claim that "going higher rarely helps because prediction accuracy drops off and verification cost grows"
3. Learning rate: 1e-4 to 2e-5, depending on model size
4. Sampling parameters: sample at T=0; reportedly, sampling at T=1 instead of T=0 costs about 25% in speed.
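To get a feel for why 3-4 draft tokens might be the sweet spot, here is a back-of-the-envelope model of my own, not Baseten's or SpecForge's method. It assumes each draft position k has a conditional acceptance rate (the chance it is accepted given all earlier drafts were), that the target model always contributes one extra token per verification step, and that cost grows linearly with the number of drafted tokens. The acceptance rates and the `draft_cost` / `verify_cost` knobs are hypothetical, made-up numbers chosen only to show the shape of the trade-off.

```python
from math import prod


def expected_tokens_per_step(acceptance: list[float]) -> float:
    """Expected tokens emitted per verification step.

    acceptance[k] is a hypothetical conditional probability that draft token
    k+1 is accepted, given that every earlier draft token was accepted.
    The leading 1.0 is the token the target model always produces
    (either a correction or a bonus token).
    """
    return 1.0 + sum(prod(acceptance[: k + 1]) for k in range(len(acceptance)))


def estimated_speedup(acceptance: list[float], draft_cost: float, verify_cost: float) -> float:
    """Rough speedup over plain decoding, where one target pass costs 1.

    draft_cost: assumed cost of one draft-head pass relative to a target pass.
    verify_cost: assumed extra verification cost per drafted token.
    Both are illustrative knobs, not measured values.
    """
    n_draft = len(acceptance)
    step_cost = 1.0 + n_draft * (draft_cost + verify_cost)
    return expected_tokens_per_step(acceptance) / step_cost


if __name__ == "__main__":
    # Made-up acceptance rates that decay with draft depth.
    rates = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    for n in range(1, len(rates) + 1):
        s = estimated_speedup(rates[:n], draft_cost=0.05, verify_cost=0.05)
        print(f"{n} draft tokens -> ~{s:.2f}x")
```

With these invented numbers the estimate peaks at 3 draft tokens and slowly declines after 4, because each extra position adds a shrinking expected gain (the cumulative acceptance product) while adding a fixed cost. When every rate equals the same α, `expected_tokens_per_step` reduces to the geometric sum (1 - α^(n+1)) / (1 - α). Real curves depend on measured acceptance rates and hardware.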