2.2 Hybrid model construction
As stated previously, the complexity and nonlinearity of a bioprocess
arise from both identified physical mechanisms and undetermined process
dynamics. Therefore, as shown in Eq. (2a), the principle of a hybrid model
is to quantify the underlying bioprocess behaviour by using a kinetic
model (first term on the right-hand side of Eq. (2a)) to describe the
known dynamics (physical knowledge) and a data-driven model (second term
on the right-hand side of Eq. (2a)) to account for the unknown dynamics.
\begin{equation}
\frac{d\mathbf{S}}{dt}=K\left(\mathbf{S}\right)+D\left(\mathbf{S}\right) \tag{2a}
\end{equation}
\begin{equation}
D\left(S_{i}\right)=a_{i}\cdot S_{i}+\sum_{j=1}^{M}a_{ij}\cdot S_{i}\cdot S_{j} \tag{2b}
\end{equation}
where \(\mathbf{S}\) is the state variable vector, \(\mathbf{S}=\left(X,N,P\right)^{T}\), and \(X,N,P\) represent the
concentrations of biomass, nutrient, and product, respectively. \(K\left(\mathbf{S}\right)\) and \(D\left(\mathbf{S}\right)\) are
the kinetic model and the data-driven model, respectively. \(S_{i}\) is
a state variable, \(M\) is the total number of state and control
variables, and \(a_{i}\) and \(a_{ij}\) are the coefficients of the
polynomial terms.
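As an illustration of how Eq. (2a) and Eq. (2b) combine, the following minimal Python sketch evaluates the hybrid right-hand side for a given state. It is not the implementation used in this study: the function names `polynomial_mismatch` and `hybrid_rhs` are hypothetical, the kinetic model is passed in as an arbitrary callable, and, for simplicity, the sum in Eq. (2b) is assumed to run over the state variables only (control variables are omitted).

```python
from typing import Callable, List, Sequence


def polynomial_mismatch(s: Sequence[float],
                        a: Sequence[float],
                        a2: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (2b): D_i = a_i*S_i + sum_j a_ij*S_i*S_j, evaluated for every i.

    Assumption: j runs over the entries of s only (no control variables).
    """
    m = len(s)
    return [
        a[i] * s[i] + sum(a2[i][j] * s[i] * s[j] for j in range(m))
        for i in range(m)
    ]


def hybrid_rhs(s: Sequence[float],
               kinetic: Callable[[Sequence[float]], Sequence[float]],
               a: Sequence[float],
               a2: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (2a): dS/dt = K(S) + D(S)."""
    k = kinetic(s)                       # known dynamics (kinetic part)
    d = polynomial_mismatch(s, a, a2)    # unknown dynamics (data-driven part)
    return [k_i + d_i for k_i, d_i in zip(k, d)]
```

Setting all coefficients \(a_{i}\) and \(a_{ij}\) to zero recovers the pure kinetic model, which makes the role of \(D\left(\mathbf{S}\right)\) as a correction term explicit.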
Distinct from a pure kinetic model, the kinetic model used for hybrid
model construction does not require a complex model structure to fully
capture the process nonlinearity; it only aims to approximate the
overall trend of process dynamics. Thus, classic kinetic models such as
the Monod model and the Droop model can be directly adopted without
further modification based on more detailed physical
information. Similarly, compared to a pure data-driven model, the
data-driven model used in a hybrid model simulates only the unknown
terms, in other words, the mismatch between the kinetic model and the
process. The nonlinearity of this mismatch is greatly reduced compared
to the original bioprocess, as the general behaviour has already been described
by the kinetic model. A number of previous studies have shown that, after
subtracting the mechanistic process trajectory, even a simple
data-driven model, e.g. PLS, which is mainly used for linear
systems, can capture the process's physically undetermined
behaviour well [11]. As a result, it may not be necessary to
embed sophisticated data-driven models such as ANNs or GPs into a hybrid
model.
In this study, a second-degree polynomial regression
model, shown as Eq. (2b), was selected as the data-driven model to
estimate the mismatch between the kinetic model and the process data.
Second-degree polynomial regression models have been
predominantly used in response surface methodology for optimal
experimental design and analysis [23,24]. Their use has
been extended to dynamic systems through recent progress in
dynamic response surface methodology [25]. Moreover, a
second-degree dynamic polynomial model is also known as
an extension of the Lotka-Volterra model [26], a classic
model used in bioinformatics to simulate growth and competition amongst
different populations. Parameter estimation and uncertainty analysis for
a polynomial regression model are more straightforward to implement than
for a complex data-driven model, as extracting gradient
information is challenging in ANNs or GPs, as is their parameter
estimation and optimisation. This feature is particularly advantageous
for industrial applications, as estimating bioprocess uncertainty is a
major challenge for industrial systems operation and decision-making.
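One reason the parameter estimation is comparatively simple is that Eq. (2b) is linear in its coefficients. The sketch below illustrates this under a strong simplifying assumption that is not taken from this study: that samples of the mismatch rate (the measured rate of change minus the kinetic prediction) are available directly, so that ordinary least squares can be applied. In practice the coefficients may instead be estimated by fitting integrated trajectories. The names `solve_linear`, `mismatch_features`, and `fit_mismatch` are hypothetical, and only state variables are used as regressors.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: data are not informative enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mismatch_features(s: Sequence[float], i: int) -> List[float]:
    """Regressors of Eq. (2b) for variable i: S_i, then S_i*S_j for each j."""
    return [s[i]] + [s[i] * s_j for s_j in s]


def fit_mismatch(states: Sequence[Sequence[float]],
                 residuals: Sequence[float],
                 i: int) -> List[float]:
    """Least-squares estimate of (a_i, a_i1, ..., a_iM) for variable i.

    residuals[k] is the assumed mismatch rate of variable i at states[k].
    """
    rows = [mismatch_features(s, i) for s in states]
    p = len(rows[0])
    # Normal equations: (X^T X) theta = X^T r
    xtx = [[sum(row[u] * row[v] for row in rows) for v in range(p)]
           for u in range(p)]
    xtr = [sum(row[u] * r for row, r in zip(rows, residuals))
           for u in range(p)]
    return solve_linear(xtx, xtr)
```

Because the estimate is the solution of a linear system, standard regression results (for example, the coefficient covariance obtained from the inverse of \(X^{T}X\)) could be used for uncertainty analysis, which is not the case for ANNs or GPs.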
The aim of the kinetic part \(K\left(\mathbf{S}\right)\) is to
approximate the process trajectory. As the current system is
mainly affected by light intensity, light attenuation, and nitrate
supply, only these three factors are considered when building the simple
kinetic model. Photo-inhibition, biomass decay, and lutein
self-degradation are not included
in \(K\left(\mathbf{S}\right)\), as their effects are subtle
and including them would result in a highly complex model structure such as Eq.
(1a)-(1f). Instead, they are treated as model-plant mismatch and are
accounted for by the data-driven model. The kinetic part of the hybrid
model is presented in Eq. (3a)-(3e). It is worth noting that, for
algal photo-production systems specifically, although light intensity affects cell
growth, which in turn influences nitrate uptake, it has yet to be
confirmed whether light directly triggers nitrate consumption. In fact,
several previously proposed models do not link light intensity with
nitrate uptake (denoted by Eq. (3b)) [27,28], whilst
others have tried to establish a direct interaction between the two (seen
in Eq. (3c)) [29,30]. Similarly, it is still not clear
how significantly light affects lutein synthesis. Thus, Eq. (3d) and Eq.
(3e) are proposed in this study based on different hypotheses (no/weak
effect and strong effect, respectively). The final kinetic model
structure will be determined via the automatic model identification
framework presented in the next section.
\begin{equation}
\left.\frac{dc_{X}}{dt}\right|_{K}=u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot\frac{I_{0}e^{-\tau\cdot c_{X}\cdot z}}{I_{0}e^{-\tau\cdot c_{X}\cdot z}+k_{s}}\cdot c_{X} \tag{3a}
\end{equation}
\begin{equation}
\left.\frac{dc_{N}}{dt}\right|_{K}=-Y_{N/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot c_{X}+F_{\text{in}}\cdot c_{N,\text{in}} \tag{3b}
\end{equation}
\begin{equation}
\left.\frac{dc_{N}}{dt}\right|_{K}=-Y_{N/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot\frac{I_{0}e^{-\tau\cdot c_{X}\cdot z}}{I_{0}e^{-\tau\cdot c_{X}\cdot z}+k_{s}}\cdot c_{X}+F_{\text{in}}\cdot c_{N,\text{in}} \tag{3c}
\end{equation}
\begin{equation}
\left.\frac{dc_{L}}{dt}\right|_{K}=Y_{L/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot c_{X} \tag{3d}
\end{equation}
\begin{equation}
\left.\frac{dc_{L}}{dt}\right|_{K}=Y_{L/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot\frac{I_{0}e^{-\tau\cdot c_{X}\cdot z}}{I_{0}e^{-\tau\cdot c_{X}\cdot z}+k_{\text{sL}}}\cdot c_{X} \tag{3e}
\end{equation}
where the subscript \(K\) refers to the kinetic part of the hybrid
model.
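The candidate structures in Eq. (3a)-(3e) differ only in whether the light-limitation factor multiplies the nitrate uptake and lutein synthesis terms. The following Python sketch makes this explicit with two boolean switches. It is an illustration rather than the implementation used in this study: the names `KineticParams`, `light_term`, and `kinetic_rhs` and the two flags are hypothetical, and the parameter descriptions in the comments are interpretations assumed from conventional Monod and light-attenuation notation.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KineticParams:
    # Descriptions below are assumed interpretations of the symbols.
    u0: float    # u_0, specific growth rate constant
    K_N: float   # K_N, nitrate half-saturation constant
    k_s: float   # k_s, light saturation constant (growth, uptake in 3c)
    k_sL: float  # k_sL, light saturation constant for lutein (3e)
    tau: float   # tau, light attenuation coefficient
    z: float     # z, light path length
    Y_NX: float  # Y_{N/X}, nitrate-to-biomass yield coefficient
    Y_LX: float  # Y_{L/X}, lutein-to-biomass yield coefficient


def light_term(c_x: float, i0: float, p: KineticParams, k: float) -> float:
    """Light factor I/(I + k) with I = I_0 * exp(-tau * c_X * z)."""
    local = i0 * math.exp(-p.tau * c_x * p.z)
    return local / (local + k)


def kinetic_rhs(c_x: float, c_n: float, i0: float,
                f_in: float, c_n_in: float, p: KineticParams,
                light_on_uptake: bool = False,
                light_on_lutein: bool = False) -> Tuple[float, float, float]:
    """Return (dc_X/dt, dc_N/dt, dc_L/dt) for the kinetic part only."""
    monod = c_n / (c_n + p.K_N)

    # Eq. (3a): biomass growth, always light-limited.
    dcx = p.u0 * monod * light_term(c_x, i0, p, p.k_s) * c_x

    # Eq. (3b) by default; Eq. (3c) if light directly affects uptake.
    uptake = p.Y_NX * p.u0 * monod * c_x
    if light_on_uptake:
        uptake *= light_term(c_x, i0, p, p.k_s)
    dcn = -uptake + f_in * c_n_in

    # Eq. (3d) by default; Eq. (3e) if light strongly affects synthesis.
    synthesis = p.Y_LX * p.u0 * monod * c_x
    if light_on_lutein:
        synthesis *= light_term(c_x, i0, p, p.k_sL)
    dcl = synthesis

    return dcx, dcn, dcl
```

With both flags left at `False` the function evaluates Eq. (3a), (3b), and (3d); each flag switches one equation to its light-dependent alternative, so the four flag combinations enumerate the candidate kinetic structures.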