Can unsupervised profile classification help create interpretable and robust oceanographic knowledge?
Dan(i) Jones
British Antarctic Survey, NERC, UKRI

Corresponding Author: dcjones.work@gmail.com

Abstract

Oceanographic structure is often represented as a collection of vertical profiles, i.e., temperature, salinity, and/or biogeochemical values at various depths. These profiles contain information about water mass structures and the boundaries between them, which result from the integrated effects of water mass formation, advection, and destruction. In recent years, researchers have applied various unsupervised profile classification methods to identify a set of “profile types” and the spatially coherent regimes associated with them. These efforts have identified a number of regimes that are consistent with existing oceanographic knowledge, and they have also identified previously under-appreciated structural differences. However, as this application area matures, questions remain about the strengths and limitations of these methods as applied to oceanography. A key question is: “Under what circumstances does unsupervised profile classification produce interpretable and scientifically useful knowledge?” Here, I explore the mechanisms and parameters of various unsupervised learning approaches, in particular Gaussian Mixture Modeling, to clarify the conditions under which unsupervised learning produces robust, interpretable, and trustworthy understanding. As with pattern classification approaches in general, there is a tradeoff between interpretability and accuracy (the ability of the method to represent the full underlying structure of the system). As a case study, I explore an unsupervised profile classification application in the Weddell Gyre. I show that, using a combination of statistical guidance, expert judgment, and traditional oceanographic analysis, we can, in some cases, increase the interpretability of a profile classification model with acceptable losses in accuracy.
The goal is to elucidate the conditions under which unsupervised learning can be fully integrated into the oceanographic knowledge generation process, both by confronting existing understanding and by highlighting new avenues for exploration.
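To make the approach concrete, here is a minimal sketch of Gaussian Mixture Model profile classification. It is not the analysis from the study. The profiles are synthetic, and the function names (`make_synthetic_profiles`, `classify_profiles`), the use of scikit-learn, the PCA dimensionality reduction, and the Bayesian information criterion (BIC) as one possible form of "statistical guidance" are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def make_synthetic_profiles(n_per_regime=200, n_levels=30, seed=0):
    """Generate two hypothetical temperature-profile regimes (not real data)."""
    rng = np.random.default_rng(seed)
    depth = np.linspace(0.0, 1000.0, n_levels)  # metres
    # Regime A: warm surface layer decaying with depth.
    warm = 2.0 + 10.0 * np.exp(-depth / 200.0)
    # Regime B: cold surface over a warmer subsurface maximum near 300 m.
    cold = -1.0 + 2.0 * np.exp(-(((depth - 300.0) / 150.0) ** 2))
    noise = lambda: rng.normal(0.0, 0.3, size=(n_per_regime, n_levels))
    profiles = np.vstack([warm + noise(), cold + noise()])
    return depth, profiles


def classify_profiles(profiles, max_classes=6, n_pcs=3, seed=0):
    """Fit GMMs with 1..max_classes components and keep the lowest-BIC model."""
    # Standardize each depth level, then reduce dimensionality with PCA.
    scaled = StandardScaler().fit_transform(profiles)
    reduced = PCA(n_components=n_pcs, random_state=seed).fit_transform(scaled)

    models, bics = [], []
    for k in range(1, max_classes + 1):
        gmm = GaussianMixture(
            n_components=k, covariance_type="full", n_init=3, random_state=seed
        ).fit(reduced)
        models.append(gmm)
        bics.append(gmm.bic(reduced))

    # BIC is only a statistical guide; expert judgment may favour another k.
    best = models[int(np.argmin(bics))]
    labels = best.predict(reduced)            # hard class assignment
    posteriors = best.predict_proba(reduced)  # per-class membership probability
    return best, labels, posteriors, bics


if __name__ == "__main__":
    depth, profiles = make_synthetic_profiles()
    best, labels, posteriors, bics = classify_profiles(profiles)
    print("classes selected by BIC:", best.n_components)
    print("profiles per class:", np.bincount(labels))
```

The posterior probabilities give a rough handle on the tradeoff between interpretability and accuracy: profiles with low maximum posterior lie near regime boundaries, where a small number of classes represents the underlying structure least well.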