Large language models (LLMs) have achieved high accuracy across diverse NLP and computer vision tasks, owing to self-attention mechanisms that rely on GEMM and GEMV operations. However, scaling LLMs poses significant computational and energy challenges, particularly for traditional von Neumann architectures (CPUs/GPUs), which incur high latency and energy consumption from frequent data movement between memory and compute units. While high energy consumption is already a critical challenge for servers, these issues are exacerbated in edge applications with energy-constrained computational resources. DRAM-based near-memory architectures offer high energy efficiency and throughput, but their basic processing elements support only a limited set of operations due to stringent area, power, and timing constraints. This work introduces CIDAN-T3D, a novel Processing-in-Memory (PIM) architecture optimized for LLMs. It features an ultra-low-power Neuron Processing Element (NPE) with high compute density (#Operations/Area), enabling efficient in-situ execution of LLM operations by exploiting the high internal parallelism of DRAM. CIDAN-T3D minimizes data transfers, improves locality, and delivers significant gains in throughput and energy efficiency. Experiments show a 1.3× improvement in throughput and a 21.9× improvement in energy efficiency over existing near-memory designs, making CIDAN-T3D a scalable and efficient solution for LLM-based Gen-AI applications.