Quantum compilation requires solving the initial mapping problem, which is crucial for optimizing qubit assignment and maximizing the fidelity of compiled quantum circuits. This study investigates the application of Deep Reinforcement Learning (DRL) to initial mapping across various qubit topologies with varying qubit connection fidelities. Using four policy gradient algorithms, namely Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Trust Region Policy Optimization (TRPO), and PPO with action masking, DRL agents compute high-quality mappings for small- and medium-scale quantum architectures, but their efficiency decreases as the system size grows, highlighting the need for further optimization strategies. Hyperparameter fine-tuning and action masking, which prevents illegal actions, are shown to be essential for enhancing mapping accuracy, particularly in larger systems. This research also supports the continued development of initial mapping techniques by introducing a comprehensive qubit connectivity database for the systematic evaluation of DRL methods across diverse architectures.
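To illustrate what action masking means in this setting, the following minimal Python sketch treats initial mapping as a sequential assignment in which each logical qubit is placed on a free physical qubit. This formulation, the function names (`action_mask`, `masked_softmax`, `greedy_mapping`), and the example logits are assumptions made for illustration only; they are not the environment or the agents used in this study.

```python
import math

def action_mask(num_physical, assigned):
    """Return True for each physical qubit that is still free (a legal action)."""
    taken = set(assigned)
    return [p not in taken for p in range(num_physical)]

def masked_softmax(logits, mask):
    """Softmax restricted to legal actions; illegal actions get probability 0."""
    legal = [l for l, m in zip(logits, mask) if m]
    if not legal:
        raise ValueError("no legal action available")
    top = max(legal)  # subtract the largest legal logit for numerical stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_mapping(logits_per_step, num_physical):
    """Place logical qubit i on the most probable physical qubit that is free."""
    assigned = []
    for logits in logits_per_step:
        probs = masked_softmax(logits, action_mask(num_physical, assigned))
        assigned.append(max(range(num_physical), key=probs.__getitem__))
    return assigned

# Hypothetical policy outputs: one row of logits per logical qubit,
# one column per physical qubit (values are illustrative, not measured).
example_logits = [
    [2.0, 1.0, 0.5, 0.1],
    [3.0, 0.2, 0.1, 0.0],  # unmasked argmax would reuse physical qubit 0
    [1.0, 1.5, 0.3, 0.2],
]
mapping = greedy_mapping(example_logits, num_physical=4)
print(mapping)
```

In the second step the unmasked policy would again select physical qubit 0, which is already occupied; the mask sets that probability to zero, so the resulting mapping is always injective. A sampling policy trained with PPO would draw from the masked distribution rather than take the argmax, but the masking step itself would be the same under these assumptions.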