2.3 Automatic model structure identification
Two challenges are encountered when constructing a hybrid model, the first being identifying the physically correct model structure of the kinetic part, and the second being preventing the overfitting of the data-driven part. For a purely kinetic model, the first challenge is addressed via model discrimination which consists of several iterations to develop various structures and test their data fitting performance31. For a purely data-driven model, the second challenge is solved via a hyper-parameter selection framework to identify the simplest model structure that well fits the data according to certain criteria 32. In this work, these two challenges are addressed simultaneously for the first time by creating an automatic model structure detection strategy comprised of model reformulation and sparse optimisation. The original idea of this strategy was proposed to reveal natural laws based on experiment data using basic mathematical operators without any prior knowledge33. This idea was then applied to nonlinear dynamic systems in 2016 to identify the governing equations for fluid dynamics34. The current study launched the first investigation to modify this strategy and adopt it into bioprocess hybrid modelling. Details of this strategy are explained below.
Initially, the original hybrid model is reformulated into a mixed-integer nonlinear programming (MINLP) problem by assigning binary variables into each individual term in the hybrid model, as presented in Eq. (4a)-(4c). By adding two equality constraints Eq. (4d)-(4e), different kinetic equations for nitrate uptake and lutein production are merged into one expression shown as Eq. (4b) and (4c), respectively. As binary variables can only be either 0 or 1, imposing the equality constraints will ensure that only one of the possible kinetic formulations (e.g. the first term on the right-hand side of Eq. (3b) or (3c)) is active, with the others being inactive as their binary variables are 0. Through this way, kinetic model discrimination can be executed more efficiently than the trial and error approach. This method can also be applied to general cases in which multiple contradictory kinetic hypotheses are present. Data nominalisation must be executed before using the polynomial regression model. However, this is not needed for kinetic model construction, as each parameter in the model has its unique physical meaning and unit.
\begin{equation} \frac{dc_{X}}{\text{dt}}=u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{s}}\ \bullet c_{X}+{b_{10}\bullet a_{10}\bullet c}_{1}+\sum_{j=1}^{4}{{b_{1j}\bullet a_{1j}\bullet c}_{1}{\bullet c}_{j}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(4a\right)\nonumber \\ \end{equation}\begin{equation} \frac{dc_{N}}{\text{dt}}=b_{N1}\bullet\left(-Y_{N/X}\bullet\frac{u_{0}\bullet c_{N}}{c_{N}+K_{N}}\ \ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{s}}\bullet c_{X}\right)+b_{N2}\bullet\left(-Y_{N/X}\bullet\frac{u_{0}\bullet c_{N}}{c_{N}+K_{N}}\ \bullet c_{X}\right)+F_{\text{in}}\bullet c_{N,in}+{b_{20}\bullet a_{20}\bullet c}_{2}+\sum_{j=1}^{4}{{b_{2j}\bullet a_{2j}\bullet c}_{2}{\bullet c}_{j}}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4b)\nonumber \\ \end{equation}\begin{equation} \frac{dc_{L}}{\text{dt}}=b_{L1}\bullet\left(Y_{L/X}\bullet u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{\text{sL}}}\bullet c_{X}\right)+b_{L2}\bullet\left(Y_{L/X}\bullet u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \ \bullet c_{X}\right)+{b_{30}\bullet a_{30}\bullet c}_{3}+\sum_{j=1}^{4}{{b_{3j}\bullet a_{3j}\bullet c}_{3}{\bullet c}_{j}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(4c\right)\nonumber \\ \end{equation}\begin{equation} b_{N1}+b_{N2}=1\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(4d\right)\nonumber \\ \end{equation}\begin{equation} b_{L1}+b_{L2}=1\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(4e\right)\nonumber \\ \end{equation}
where \(b_{\text{Ni}}\) and \(b_{\text{Li}}\) (\(i=1,2\)) are binary variables for the kinetic model, \(b_{\text{ij}}\) (\(i=1,2,3\),\(j=0,1,2,3\)) are binary variables for the data-driven model,\(a_{\text{ij}}\) (\(i=1,2,3\), \(j=0,1,2,3,4\)) are coefficients of the polynomial terms, \(c_{i}\) (\(i=1,2,3,4\)) are normalised values of concentrations of biomass, nitrate, and lutein and incident light intensity, respectively.
Once reformulated, the next step is to implement sparse optimisation to automatically identify the most probable kinetic model structure and minimum number of polynomial terms that can well fit the multiple generated datasets. Therefore, the objective function for parameter estimation is designed such that in addition to minimising residues between model prediction and experimental data, it also penalises the total number of active binary variables in the polynomial terms to avoid overfitting. In this way, the optimal hybrid model structure alongside its parameter values can be simultaneously identified. The objective function therefore is formulated as Eq. 5.
\begin{equation} \min{}F=\sum_{k=1}^{n}{\sum_{p=1}^{m}{\sum_{i=1}^{3}{\left(c_{S_{i},p,k,E}-c_{S_{i},p,k,M}\right)^{2}\bullet w_{S_{i},j,k}}}}+w_{b}\bullet\sum_{j=0}^{4}{\sum_{i=1}^{3}b_{\text{ij}}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(5\right)\nonumber \\ \end{equation}
where \(c_{S_{i},p,k,E}\) and \(c_{S_{i},p,k,E}\) are experimental and model simulated value for concentration of state \(S_{i}\) at time step\(p\in m\) in the \(k\)th dataset (total number of datasets is \(n\)), respectively, \(w_{S_{i},j,k}\) and \(w_{b}\) are the weight for each data point and the sum of binary variables, respectively.
In this work, parameter estimation was formulated as a weighted nonlinear least squares optimisation problem. However, as the current model is highly nonlinear due to its complex model structure and many discrete (binary) variables, it must be relaxed so that it can be solved using the standard dynamic parameter estimation method22. Conventionally, an MINLP problem is relaxed via introducing new parameters and extra linear constraints to reduce nonlinearity of the original problem 35. For example, converting Eq. (4a) to Eq. (6a) where new continuous parameters\(B_{1j}\) are introduced to substitute the bilinear term\(b_{1j}\bullet a_{1j}\) and satisfy the constraints Eq. (6b)-(6c).
\begin{equation} \frac{dc_{X}}{\text{dt}}=u_{0}\bullet\frac{c_{N}}{c_{N}+K_{N}}\ \bullet\frac{I_{0}e^{-\tau\bullet c_{X}\bullet z}}{I_{0}e^{-\tau\bullet c_{X}\bullet z}+k_{s}}\bullet c_{X}+{B_{10}\bullet c}_{1}+\sum_{j=1}^{4}{{B_{1j}\bullet c}_{1}{\bullet c}_{j}}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(6a\right)\nonumber \\ \end{equation}\begin{equation} a_{1j}-a_{1j}^{U}\bullet\left(1-b_{1j}\right)\leq B_{1j}\leq a_{1j}-a_{1j}^{L}\bullet\left(1-b_{1j}\right)\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(6b\right)\nonumber \\ \end{equation}\begin{equation} a_{1j}^{L}\bullet b_{1j}\leq B_{1j}\leq a_{1j}^{U}\bullet b_{1j}\text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ }\left(6c\right)\nonumber \\ \end{equation}
where \(a_{1j}^{L}\) and \(a_{1j}^{U}\) are the lower and upper bound of\(a_{1j}\), respectively.
In spite of its uses over a range of chemical engineering research, this relaxation still results in an MINLP. Given this hybrid dynamic model is highly nonlinear, another relaxation is proposed in this study to reformulate the MINLP into an NLP (nonlinear programming problem) by converting binary variables into continuous variables with two extra constraints Eq. (7a)-(7b). To satisfy constraints Eq. (7a)-(7b), values of \(b_{\text{ij}}\) have to be either 0 or 1. In this way, the original problem can be effectively solved via the continuous dynamic parameter estimation method 22.
\begin{equation} 0\leq b_{\text{ij}}\leq 1\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(7a\right)\nonumber \\ \end{equation}\begin{equation} b_{\text{ij}}\bullet\left(1-b_{\text{ij}}\right)\leq 0\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \left(7b\right)\nonumber \\ \end{equation}
The NLP model was discretised by orthogonal collocation over finite elements in time, and was then solved using the interior point nonlinear optimisation solver IPOPT through a multi-start framework. This was executed in the Python optimisation environment Pyomo36. The model was simulated in Mathematica 11. Details regarding the this dynamic model parameter estimation procedure can be found in 22.