Large language models (LLMs) have revolutionized the field of artificial intelligence, achieving unprecedented performance on tasks such as text generation, translation, and reasoning. Despite these capabilities, the enormous size and computational demands of LLMs limit their accessibility and their deployment in resource-constrained settings. Knowledge distillation has emerged as a promising approach to these challenges: knowledge is transferred from a large, complex teacher model to a smaller, more efficient student model that retains much of the teacher's performance while requiring significantly less computation and memory. This survey provides a comprehensive overview of knowledge distillation techniques tailored to LLMs. We discuss foundational approaches, such as logit matching and feature alignment, alongside advanced methods that leverage intermediate-layer supervision, task-specific adaptations, and multimodal extensions. We explore applications of knowledge distillation in real-world scenarios, emphasizing its role in enabling efficient deployment of LLMs on edge devices and in low-latency environments. We identify key challenges, including the preservation of emergent behaviors, domain-specific generalization, and the scalability of distillation techniques to increasingly large models, and we highlight ethical and environmental considerations such as bias transfer and the carbon footprint of model compression. Finally, we outline future research directions, focusing on adaptive distillation frameworks, integration with other compression techniques, and the development of standardized benchmarks for evaluating distilled models.
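To make the foundational logit-matching objective concrete, the following is a standard sketch in notation of our choosing (student logits $z_s$, teacher logits $z_t$, softmax $\sigma$, temperature $T$, and mixing weight $\alpha$ are illustrative symbols, not notation fixed by this survey):

\[
\mathcal{L}_{\mathrm{KD}} \;=\; \alpha\, T^{2}\, \mathrm{KL}\!\left(\sigma(z_t/T)\,\middle\|\,\sigma(z_s/T)\right) \;+\; (1-\alpha)\,\mathcal{L}_{\mathrm{CE}}\!\left(y,\, \sigma(z_s)\right),
\]

where $y$ denotes the ground-truth labels. The temperature $T$ softens both distributions so that the student learns from the teacher's relative probabilities over incorrect classes, and the $T^{2}$ factor keeps the gradient magnitude of the soft-target term comparable to that of the hard-label cross-entropy term.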