AUTHOREA
Zhigao Huang

Public Documents 1
Curriculum-Grown Context Windows: Dynamic Adaptation...
Zhigao Huang
Musheng Chen

Zhigao Huang

and 2 more

March 25, 2025
Training transformers requires balancing computational efficiency against modeling capability when choosing the context window size: larger windows capture long-range dependencies but incur attention costs that grow quadratically with window length, while smaller windows train faster but limit the available context. We propose dynamic block size adaptation, a curriculum learning approach that gradually increases the context window during training. The method leaves the final model architecture unchanged while cutting training time by 21.3% (225.4 vs 286.4 minutes) and reaching a lower validation loss (1.450 vs 1.468) than fixed-window baselines on character-level language modeling with the shakespeare_char dataset. The dynamic approach also shows reduced loss fluctuation during training, with 38% lower variance in validation loss curves and 42% faster recovery after window size transitions. Empirical results across multiple random seeds support the approach’s effectiveness in balancing computational efficiency with modeling performance, and inference throughput improves by 2.9% (408.8 vs 397.3 tokens/sec).
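The idea of growing the context window over training can be illustrated with a minimal sketch. The abstract does not give the actual schedule, so everything here is an assumption made for illustration: the stage sizes (64, 128, 256), the equal split of training steps across stages, and the helper names `block_size_at` and `sample_batch`. The last stage is taken to be the full window, which is one way the final model architecture could stay unchanged.

```python
import random


def block_size_at(step, total_steps, stages=(64, 128, 256)):
    """Return the context window (in tokens) for a given training step.

    Hypothetical schedule: training is split into equal-length phases,
    one per entry in `stages`; the last entry is the full window.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so out-of-range steps fall back to the first or last stage.
    step = max(0, min(step, total_steps - 1))
    stage_index = step * len(stages) // total_steps
    return stages[stage_index]


def sample_batch(data, block_size, batch_size, rng):
    """Draw random (input, target) windows of length `block_size`.

    Targets are the inputs shifted by one position, as in next-token
    (here: next-character) prediction. Requires len(data) > block_size.
    """
    inputs, targets = [], []
    for _ in range(batch_size):
        start = rng.randrange(len(data) - block_size)
        inputs.append(data[start:start + block_size])
        targets.append(data[start + 1:start + block_size + 1])
    return inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    data = list(range(1000))  # stand-in for an encoded character stream
    total_steps = 300
    for step in (0, 100, 200):
        size = block_size_at(step, total_steps)
        x, y = sample_batch(data, size, batch_size=4, rng=rng)
        print(step, size, len(x), len(x[0]))
```

Because attention cost grows quadratically with window length, the early short-window phases in a schedule like this are much cheaper per step than the final phase, which is the intuition behind the reported reduction in training time.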
