Dissolved oxygen (DO) sustains aquatic life and is an essential water quality measure. Our ability to forecast DO levels, however, remains limited. Unlike the increasingly dense earth-surface and hydroclimatic data, water quality data often have large temporal gaps and sparse areal coverage. Here we ask the question: can a Long Short-Term Memory (LSTM) deep learning model learn the spatio-temporal dynamics of stream DO from intensive hydroclimatic and sparse DO observations at the continental scale? That is, can the model harness the power of big hydroclimatic data for water quality forecasting? We used data from CAMELS-chem, a new dataset that includes sparse DO concentrations from 236 minimally disturbed watersheds. The trained model generally learns the dependence of DO solubility on temperature, pressure, and salinity. It captures the bulk variability and seasonality of DO and shows potential for forecasting water quality in ungauged basins without training data. It often misses concentration peaks and troughs, however, where DO levels depend on complex biogeochemical processes. Surprisingly, the model does not perform better where data are more intensive. It performs better in basins with low streamflow variation, low DO variability, high runoff ratios (> 0.45), and winter precipitation peaks. This work suggests that more frequent data collection under anticipated DO peak and trough conditions is essential to help overcome the issue of sparse data, an outstanding challenge in the water quality community.
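To illustrate the modeling setup described above, the following is a minimal sketch (not the authors' implementation) of an LSTM that maps daily hydroclimatic forcings and static catchment attributes to DO concentration, trained only on the sparse time steps where a DO observation exists. It assumes PyTorch; all names, input dimensions, and hyperparameters are hypothetical placeholders.

```python
# Minimal, illustrative sketch (PyTorch): an LSTM mapping daily hydroclimatic
# forcings to dissolved-oxygen (DO) concentration, trained with a masked loss
# so that only days with a DO measurement contribute (sparse labels).
# Variable names and hyperparameters are hypothetical, not from the paper.
import torch
import torch.nn as nn


class DOForecaster(nn.Module):
    def __init__(self, n_forcings=5, n_static=10, hidden_size=64):
        super().__init__()
        # Static catchment attributes are repeated along time and
        # concatenated with the dynamic forcings at every step.
        self.lstm = nn.LSTM(n_forcings + n_static, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, forcings, static_attrs):
        # forcings: (batch, time, n_forcings); static_attrs: (batch, n_static)
        t = forcings.shape[1]
        static_rep = static_attrs.unsqueeze(1).expand(-1, t, -1)
        x = torch.cat([forcings, static_rep], dim=-1)
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)          # (batch, time) DO estimates


def masked_mse(pred, obs, mask):
    # Only the sparse time steps with a DO observation enter the loss.
    return ((pred - obs) ** 2 * mask).sum() / mask.sum().clamp(min=1)


# Toy usage with random tensors standing in for CAMELS-chem style inputs.
model = DOForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
forcings = torch.randn(8, 365, 5)                # 8 basins, 1 year of daily forcings
static_attrs = torch.randn(8, 10)                # basin attributes (area, slope, ...)
obs = torch.randn(8, 365)                        # standardized DO observations
mask = (torch.rand(8, 365) < 0.05).float()       # ~5% of days actually observed

pred = model(forcings, static_attrs)
loss = masked_mse(pred, obs, mask)
loss.backward()
optimizer.step()
```

The masked loss is the key device for learning from sparse water quality records: the model still ingests the full, continuous hydroclimatic forcing sequence, but gradients flow only from the scattered days on which DO was actually measured.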