September 24, 2024
Optimizing Token Initialization for Accelerated Pre-training of Large Language Models
Sonni Zamarian, Andrew Montgomery, James Schneider, et al.