Transformers have emerged as the foundation of modern deep learning, achieving state-of-the-art performance across diverse domains such as natural language processing, computer vision, and multimodal tasks. However, the time and memory cost of self-attention grows quadratically with sequence length, which poses significant challenges when scaling to long sequences or deploying on resource-constrained hardware. To address these limitations, the research community has proposed a variety of techniques to make Transformers more efficient. This survey provides a comprehensive overview of these advancements, categorizing them into five main strategies: sparsity-based methods, low-rank approximations, structured attention patterns, memory-efficient mechanisms, and hybrid approaches. For each category, we examine the underlying principles, highlight representative methods, and analyze their strengths and limitations. We also discuss the trade-offs among computational efficiency, memory savings, and model accuracy. Furthermore, we identify emerging trends and open challenges, such as unified frameworks, hardware-aware optimizations, and sustainable AI practices. By synthesizing the state of the art in efficient Transformers, this survey aims to guide future research and facilitate the adoption of these techniques in both academic and industrial settings.
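To make the quadratic bottleneck concrete, the sketch below (not taken from any surveyed method; shapes, names, and the use of NumPy are illustrative assumptions) implements standard scaled dot-product attention for a single head and shows where the cost arises: the intermediate n x n score matrix, whose size grows quadratically with sequence length n.

```python
# Minimal sketch of standard (dense) self-attention for one head, assuming
# NumPy and single-head inputs of shape (n, d). The (n, n) score matrix is
# the source of the quadratic time and memory cost discussed above.
import numpy as np

def naive_attention(Q, K, V):
    """Scaled dot-product attention over a length-n sequence."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # (n, n): O(n^2 * d) time, O(n^2) memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d) output

n, d = 4096, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)  # (4096, 64); the transient (4096, 4096) score matrix dominates the cost
```

Doubling n quadruples both the arithmetic in the score computation and the memory held by the score matrix, which is precisely what the sparsity-based, low-rank, structured, memory-efficient, and hybrid approaches surveyed here aim to reduce.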