Deep Neural Networks (DNNs) have revolutionized fields such as computer vision, natural language processing, and speech recognition. Despite their impressive performance, the high computational and memory demands of DNNs present significant challenges for deployment on resource-constrained devices such as mobile phones and edge computing platforms. Network quantization has emerged as a promising solution to these challenges: by reducing the numerical precision of DNN weights, activations, and gradients, it achieves substantial reductions in model size, computation, and energy consumption. This paper provides a comprehensive survey of state-of-the-art network quantization techniques for DNN compression. We categorize these techniques into uniform, non-uniform, and adaptive approaches, analyzing their theoretical foundations, practical implementations, and hardware considerations. Key evaluation metrics, including accuracy retention, computational efficiency, and energy savings, are discussed to highlight the trade-offs involved in applying quantization. Furthermore, we explore recent advancements, such as quantization-aware training, post-training quantization, and hybrid strategies, which aim to enhance the scalability and effectiveness of quantized models. In addition to presenting the current state of the art, this paper identifies critical challenges in the field, such as accuracy degradation, hardware compatibility, and scalability to larger and more complex models. We also outline future research directions, including the integration of neural architecture search, dynamic quantization methods, and innovations in hardware design that better support quantized models. By bridging theoretical insights with practical applications, this survey aims to guide researchers and practitioners in advancing efficient, scalable, and deployable DNN solutions.
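To make the notion of reduced numerical precision concrete, the sketch below illustrates one common scheme, uniform affine quantization to 8-bit integers, in NumPy. The function names and the min/max calibration used here are illustrative assumptions for exposition, not the method of any particular technique surveyed in this paper.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, num_bits: int = 8):
    """Uniform affine quantization: map float values onto an integer grid.

    Illustrative sketch only; min/max calibration is one common convention.
    """
    qmin, qmax = 0, 2 ** num_bits - 1                 # e.g. 0..255 for 8 bits
    x_min, x_max = float(x.min()), float(x.max())
    # Real-valued step size per integer level; guard against constant tensors.
    scale = max((x_max - x_min) / (qmax - qmin), 1e-8)
    zero_point = int(round(qmin - x_min / scale))     # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximation of the original float values."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantizing a small weight tensor introduces a bounded rounding error.
w = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_uniform(w)
w_hat = dequantize(q, s, z)
print("max abs error:", np.abs(w - w_hat).max())      # on the order of scale / 2
```

Storing `q` (one byte per value) plus a scale and zero point in place of 32-bit floats yields roughly a 4x reduction in weight storage, which is the basic mechanism behind the model-size and energy savings discussed throughout this survey.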