Dissecting unsupervised learning through hidden Markov modelling in
electrophysiological data
Abstract
Unsupervised, data-driven methods are commonly used in neuroscience to
automatically decompose data into interpretable patterns. The patterns
obtained depend on the assumptions of the model. How
these assumptions affect specific data decompositions in practice,
however, is often unclear, which hinders model applicability and
interpretability. For instance, the hidden Markov model (HMM)
automatically detects characteristic, recurring activity patterns
(so-called states) from time series data. Each state is defined by a
probability distribution whose state-specific parameters are estimated
from the data. But which specific features, of all those
that the data contain, do the states capture? That depends on the choice
of probability distribution and on other model hyperparameters. Using
both synthetic and real data, we aim to better characterise the
behaviour of two HMM types that can be applied to electrophysiological
data. Specifically, we study which differences in data features (such as
frequency, amplitude or signal-to-noise ratio) are more salient to the
models and therefore more likely to drive the state decomposition.
Overall, we aim to provide guidance on the appropriate use of this
type of analysis on one- or two-channel neural electrophysiological data,
and an informed interpretation of its results given the characteristics
of the data and the purpose of the analysis.
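
As a rough illustration of the idea that each state is defined by a probability distribution, the following minimal sketch generates a synthetic single-channel signal whose amplitude alternates between two regimes and decodes it with a two-state, zero-mean Gaussian HMM. It is not one of the models studied here: the observation model, the function names (simulate, log_gauss, viterbi), the amplitudes and the p_stay value are all assumptions chosen for illustration, and the state parameters are fixed by hand rather than estimated from the data.

```python
import math
import random


def simulate(n_per_segment=200, amplitudes=(1.0, 3.0), n_segments=4, seed=0):
    """Zero-mean Gaussian noise whose standard deviation alternates by segment."""
    rng = random.Random(seed)
    signal, truth = [], []
    for seg in range(n_segments):
        k = seg % len(amplitudes)
        for _ in range(n_per_segment):
            signal.append(rng.gauss(0.0, amplitudes[k]))
            truth.append(k)
    return signal, truth


def log_gauss(x, sd):
    """Log density of a zero-mean Gaussian with standard deviation sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * (x / sd) ** 2


def viterbi(signal, sds, p_stay=0.99):
    """Most likely state path for a zero-mean Gaussian HMM with fixed parameters.

    Assumption: every state has the same probability p_stay of persisting,
    and the remaining probability is shared equally among the other states.
    """
    n_states = len(sds)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (n_states - 1))
    # Uniform initial state probabilities.
    score = [math.log(1 / n_states) + log_gauss(signal[0], sd) for sd in sds]
    back = []
    for x in signal[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            cands = [
                score[i] + (log_stay if i == j else log_switch)
                for i in range(n_states)
            ]
            best = max(range(n_states), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + log_gauss(x, sds[j]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


signal, truth = simulate()
path = viterbi(signal, sds=(1.0, 3.0))
accuracy = sum(p == t for p, t in zip(path, truth)) / len(truth)
print(f"fraction of samples assigned to the generating regime: {accuracy:.2f}")
```

Because the two states in this sketch differ only in variance, the decoding is driven entirely by amplitude; a frequency difference between regimes would be invisible to this particular observation model, which is the kind of dependence on model choice discussed above.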