Estimating the volume of transparent liquids is crucial for robot manipulation tasks such as pouring, yet it remains a challenging problem. Most existing methods rely on data collected in the real world and on sensors fixed to the robot body. These choices slow down the research process and limit the flexibility of perception. In this paper, we present SimLiquid20k, a high-fidelity synthetic dataset for liquid volume estimation, and propose a YOLO-based multi-modal network, trained entirely on synthetic data, for estimating the volume of transparent liquids. Extensive experiments demonstrate that our method transfers effectively from simulation to the real world. Across variations in background, viewpoint, and container, our approach achieves an average error of 5% in real-world volume estimation. In addition, we conduct two application experiments integrated with ChatGPT, showcasing the potential of our method in service robotics. The accompanying video and supplementary materials are available at https://simliquid.github.io/.