Accurate estimates of snow water equivalent (SWE) are essential for understanding hydrological processes and for managing water resources effectively, particularly in snow-dominated regions. Many methods for estimating SWE rely on in-situ measurements, numerical models, or both. In-situ measurements, such as those provided by the USDA SNOTEL network, have the advantage of being direct observations of SWE, but they are sparsely distributed and have limited spatial representativeness. Numerical models, in contrast, embed knowledge of the physical processes underlying snowpack accumulation and ablation but can be computationally expensive to run over large areas. In this study, we investigate the use of deep learning to predict the spatiotemporal distribution of SWE from a combination of atmospheric forcings derived from the Weather Research and Forecasting (WRF) model, geographic parameters related to topography and land cover that influence snow persistence, and historical observations of snow presence/absence from remote sensing data. Using these static variables and the dynamic atmospheric forcings from WRF as input features, we train a convolutional long short-term memory (ConvLSTM) network to predict SWE. The proposed deep learning model aims to accelerate the prediction of spatially distributed SWE relative to traditional methods and can complement the process-based land surface models often used to predict SWE. The computational savings associated with training and forward integration of machine-learning-based models open the door to high-resolution ensemble forecasting of SWE and to the assimilation of observations for real-time SWE estimation.