2.3 Automatic model structure identification
Two challenges are encountered when constructing a hybrid model, the
first being identifying the physically correct model structure of the
kinetic part, and the second being preventing the overfitting of the
data-driven part. For a purely kinetic model, the first challenge is
addressed via model discrimination which consists of several iterations
to develop various structures and test their data fitting performance31. For a purely data-driven model, the second
challenge is solved via a hyper-parameter selection framework to
identify the simplest model structure that well fits the data according
to certain criteria 32. In this work, these two
challenges are addressed simultaneously for the first time by creating
an automatic model structure detection strategy comprised of model
reformulation and sparse optimisation. The original idea of this
strategy was proposed to reveal natural laws based on experiment data
using basic mathematical operators without any prior knowledge33. This idea was then applied to nonlinear dynamic
systems in 2016 to identify the governing equations for fluid dynamics34. The current study launched the first investigation
to modify this strategy and adopt it into bioprocess hybrid modelling.
Details of this strategy are explained below.
Initially, the original hybrid model is reformulated into a
mixed-integer nonlinear programming (MINLP) problem by assigning binary
variables into each individual term in the hybrid model, as presented in
Eq. (4a)-(4c). By adding two equality constraints Eq. (4d)-(4e),
different kinetic equations for nitrate uptake and lutein production are
merged into one expression shown as Eq. (4b) and (4c), respectively. As
binary variables can only be either 0 or 1, imposing the equality
constraints will ensure that only one of the possible kinetic
formulations (e.g. the first term on the right-hand side of Eq.
(3b) or (3c)) is active, with the others being inactive as their binary
variables are 0. Through this way, kinetic model discrimination can be
executed more efficiently than the trial and error approach. This method
can also be applied to general cases in which multiple contradictory
kinetic hypotheses are present. Data nominalisation must be executed
before using the polynomial regression model. However, this is not
needed for kinetic model construction, as each parameter in the model
has its unique physical meaning and unit.
\begin{equation}
\frac{dc_{X}}{\text{dt}}=u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{s}}\ \bullet c_{X}+{b_{10}\bullet a_{10}\bullet c}_{1}+\sum_{j=1}^{4}{{b_{1j}\bullet a_{1j}\bullet c}_{1}{\bullet c}_{j}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(4a\right)\nonumber \\
\end{equation}\begin{equation}
\frac{dc_{N}}{\text{dt}}=b_{N1}\bullet\left(-Y_{N/X}\bullet\frac{u_{0}\bullet c_{N}}{c_{N}+K_{N}}\ \ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{s}}\bullet c_{X}\right)+b_{N2}\bullet\left(-Y_{N/X}\bullet\frac{u_{0}\bullet c_{N}}{c_{N}+K_{N}}\ \bullet c_{X}\right)+F_{\text{in}}\bullet c_{N,in}+{b_{20}\bullet a_{20}\bullet c}_{2}+\sum_{j=1}^{4}{{b_{2j}\bullet a_{2j}\bullet c}_{2}{\bullet c}_{j}}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4b)\nonumber \\
\end{equation}\begin{equation}
\frac{dc_{L}}{\text{dt}}=b_{L1}\bullet\left(Y_{L/X}\bullet u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{\text{sL}}}\bullet c_{X}\right)+b_{L2}\bullet\left(Y_{L/X}\bullet u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \ \bullet c_{X}\right)+{b_{30}\bullet a_{30}\bullet c}_{3}+\sum_{j=1}^{4}{{b_{3j}\bullet a_{3j}\bullet c}_{3}{\bullet c}_{j}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(4c\right)\nonumber \\
\end{equation}\begin{equation}
b_{N1}+b_{N2}=1\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(4d\right)\nonumber \\
\end{equation}\begin{equation}
b_{L1}+b_{L2}=1\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(4e\right)\nonumber \\
\end{equation}where \(b_{\text{Ni}}\) and \(b_{\text{Li}}\) (\(i=1,2\)) are binary
variables for the kinetic model, \(b_{\text{ij}}\) (\(i=1,2,3\),\(j=0,1,2,3\)) are binary variables for the data-driven model,\(a_{\text{ij}}\) (\(i=1,2,3\), \(j=0,1,2,3,4\)) are coefficients of
the polynomial terms, \(c_{i}\) (\(i=1,2,3,4\)) are normalised values
of concentrations of biomass, nitrate, and lutein and incident light
intensity, respectively.
Once reformulated, the next step is to implement sparse optimisation to
automatically identify the most probable kinetic model structure and
minimum number of polynomial terms that can well fit the multiple
generated datasets. Therefore, the objective function for parameter
estimation is designed such that in addition to minimising residues
between model prediction and experimental data, it also penalises the
total number of active binary variables in the polynomial terms to avoid
overfitting. In this way, the optimal hybrid model structure alongside
its parameter values can be simultaneously identified. The objective
function therefore is formulated as Eq. 5.
\begin{equation}
\min{}F=\sum_{k=1}^{n}{\sum_{p=1}^{m}{\sum_{i=1}^{3}{\left(c_{S_{i},p,k,E}-c_{S_{i},p,k,M}\right)^{2}\bullet w_{S_{i},j,k}}}}+w_{b}\bullet\sum_{j=0}^{4}{\sum_{i=1}^{3}b_{\text{ij}}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(5\right)\nonumber \\
\end{equation}where \(c_{S_{i},p,k,E}\) and \(c_{S_{i},p,k,E}\) are experimental and
model simulated value for concentration of state \(S_{i}\) at time step\(p\in m\) in the \(k\)th dataset (total number of
datasets is \(n\)), respectively, \(w_{S_{i},j,k}\) and \(w_{b}\) are
the weight for each data point and the sum of binary variables,
respectively.
In this work, parameter estimation was formulated as a weighted
nonlinear least squares optimisation problem. However, as the current
model is highly nonlinear due to its complex model structure and many
discrete (binary) variables, it must be relaxed so that it can be solved
using the standard dynamic parameter estimation method22. Conventionally, an MINLP problem is relaxed via
introducing new parameters and extra linear constraints to reduce
nonlinearity of the original problem 35. For example,
converting Eq. (4a) to Eq. (6a) where new continuous parameters\(B_{1j}\) are introduced to substitute the bilinear term\(b_{1j}\bullet a_{1j}\) and satisfy the constraints Eq. (6b)-(6c).
\begin{equation}
\frac{dc_{X}}{\text{dt}}=u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{s}}\bullet c_{X}+{B_{10}\bullet c}_{1}+\sum_{j=1}^{4}{{B_{1j}\bullet c}_{1}{\bullet c}_{j}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(6a\right)\nonumber \\
\end{equation}\begin{equation}
a_{1j}-a_{1j}^{U}\bullet\left(1-b_{1j}\right)\leq B_{1j}\leq a_{1j}-a_{1j}^{L}\bullet\left(1-b_{1j}\right)\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(6b\right)\nonumber \\
\end{equation}\begin{equation}
a_{1j}^{L}\bullet b_{1j}\leq B_{1j}\leq a_{1j}^{U}\bullet b_{1j}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(6c\right)\nonumber \\
\end{equation}where \(a_{1j}^{L}\) and \(a_{1j}^{U}\) are the lower and upper bound of\(a_{1j}\), respectively.
In spite of its uses over a range of chemical engineering research, this
relaxation still results in an MINLP. Given this hybrid dynamic model is
highly nonlinear, another relaxation is proposed in this study to
reformulate the MINLP into an NLP (nonlinear programming problem) by
converting binary variables into continuous variables with two extra
constraints Eq. (7a)-(7b). To satisfy constraints Eq. (7a)-(7b), values
of \(b_{\text{ij}}\) have to be either 0 or 1. In this way, the original
problem can be effectively solved via the continuous dynamic parameter
estimation method 22.
\begin{equation}
0\leq b_{\text{ij}}\leq 1\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(7a\right)\nonumber \\
\end{equation}\begin{equation}
b_{\text{ij}}\bullet\left(1-b_{\text{ij}}\right)\leq 0\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(7b\right)\nonumber \\
\end{equation}The NLP model was discretised by orthogonal collocation over finite
elements in time, and was then solved using the interior point nonlinear
optimisation solver IPOPT through a multi-start framework. This was
executed in the Python optimisation environment Pyomo36. The model was simulated in Mathematica 11. Details
regarding the this dynamic model parameter estimation procedure can be
found in 22.