Robustness remains a critical challenge in machine learning systems, where unexpected or malformed inputs can lead to unpredictable behavior and significant failures. Token-level fuzz testing addresses this challenge by systematically injecting perturbations into token inputs and analyzing the model's response. By measuring how slight token variations affect coherence, accuracy, and computational efficiency, the framework provides an in-depth evaluation of model vulnerabilities. Applied to Llama, the methodology revealed significant fragility in the model's token handling, particularly for non-standard inputs such as special characters and typographical errors. The analysis showed a clear pattern of minor perturbations cascading into larger errors, suggesting the need for more robust token embeddings and input pre-processing. The testing framework presented here offers a scalable way to identify token-level weaknesses in language models, contributing to the broader study of model robustness and reliability.
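The core loop of such a framework can be sketched as follows. This is a minimal illustration, not the paper's implementation: the perturbation operators (character swap, special-character insertion, deletion), the `perturb` and `fuzz_model` names, and the treatment of the model as an arbitrary text-to-text callable are all assumptions made for the example.

```python
import random

# Characters assumed to trigger non-standard tokenization; the set is
# illustrative (includes a zero-width space, a common fuzzing payload).
SPECIAL_CHARS = list("@#$%^&*~\u200b")


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Return a copy of `text` with roughly `rate` of positions perturbed.

    Each selected position undergoes one of three character-level
    operations: swap with the previous character, insertion of a
    special character, or deletion.
    """
    rng = random.Random(seed)  # seeded for reproducible fuzz cases
    out = []
    for ch in text:
        if rng.random() < rate:
            op = rng.choice(("swap", "insert", "delete"))
            if op == "insert":
                out.append(ch)
                out.append(rng.choice(SPECIAL_CHARS))
            elif op == "swap" and out:
                prev = out.pop()
                out.append(ch)
                out.append(prev)
            elif op == "swap":
                out.append(ch)  # nothing to swap with at position 0
            # "delete": drop the character entirely
        else:
            out.append(ch)
    return "".join(out)


def fuzz_model(model, prompts, rate=0.1, trials=5):
    """Run clean vs. perturbed prompts through `model` and log divergences.

    `model` stands in for real inference: any callable mapping input
    text to output text. Returns (prompt, trial) pairs where the
    perturbed output differed from the clean output.
    """
    failures = []
    for prompt in prompts:
        clean = model(prompt)
        for trial in range(trials):
            noisy = model(perturb(prompt, rate=rate, seed=trial))
            if noisy != clean:
                failures.append((prompt, trial))
    return failures
```

In practice the divergence check would compare coherence or accuracy metrics rather than exact string equality, and the failure log would be used to trace how a single-character perturbation cascades through tokenization into a degraded response.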