Recently, 3D occupancy prediction, a camera-only perception task, has garnered significant attention for addressing key limitations of traditional 3D object detection, such as overlooking uncommon categories and failing to capture complex geometric shapes. However, current occupancy prediction methods face two major challenges. First, they demand high computational resources, making deployment on non-GPU devices impractical. Second, height estimation is often inaccurate because the imbalanced height distribution within datasets is overlooked. To address these limitations, we propose two strategies. First, we eliminate computationally expensive components such as transformer operators, depth estimation modules, and 3D convolutions. Second, we introduce a focused weighting mechanism to improve height-related accuracy. Building on these strategies, we present ConvOcc, an efficient and deployment-friendly framework composed entirely of 2D convolutions. ConvOcc features: (1) a Feature Fuse Module for enhanced multi-scale 2D feature fusion, (2) a Voxel-to-Image View Transformation for rapid conversion of 2D image features into 3D voxel space, (3) a Squash and Stretch Module that simplifies expensive 3D voxel computations into a more efficient 2D BEV form, (4) a Height-Attention Multi-Scale BEV Fusion Module that dynamically reweights BEV features by height, and (5) a Multi-Frame Temporal Fusion Strategy for denser voxel feature extraction. Extensive ablation studies validate the effectiveness and efficiency of our approach. ConvOcc achieves a 2× improvement in FPS with a mean IoU of 36.1 on the Occ3D-nuScenes dataset, while remaining deployment-friendly. This work challenges the conventional reliance on 3D operations for occupancy prediction, demonstrating that the task can be addressed effectively and efficiently with a purely 2D approach.
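The core idea behind replacing 3D convolutions with 2D ones can be illustrated with a minimal sketch: fold the voxel height axis into the channel axis ("squash"), process the resulting BEV tensor with ordinary 2D convolutions, then restore the 3D voxel shape ("stretch"). This is an illustrative assumption based only on the abstract; the class name `SquashStretch2D`, the tensor layout, and all hyperparameters are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn


class SquashStretch2D(nn.Module):
    """Hypothetical sketch of a squash-and-stretch block: collapse the
    height dimension Z into channels, apply a 2D BEV convolution, and
    reshape back. Not the authors' code; shapes and names are assumed."""

    def __init__(self, channels: int, height: int):
        super().__init__()
        self.height = height
        # 2D convolution over the BEV plane; channels absorb the height axis.
        self.bev_conv = nn.Conv2d(channels * height, channels * height,
                                  kernel_size=3, padding=1)

    def forward(self, voxel: torch.Tensor) -> torch.Tensor:
        # voxel: (B, C, Z, X, Y) -> squash height into channels: (B, C*Z, X, Y)
        b, c, z, x, y = voxel.shape
        bev = voxel.reshape(b, c * z, x, y)
        bev = self.bev_conv(bev)
        # stretch back to the original 3D voxel layout
        return bev.reshape(b, c, z, x, y)


feat = torch.randn(1, 16, 8, 32, 32)  # (B, C, Z, X, Y)
out = SquashStretch2D(channels=16, height=8)(feat)
print(out.shape)  # torch.Size([1, 16, 8, 32, 32])
```

Because the height axis lives in the channel dimension, every operation on the BEV plane is a plain 2D convolution, which is widely supported by non-GPU inference backends and avoids the cost of 3D convolution kernels.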