2.2 Hybrid model construction
As stated previously, the complexity and nonlinearity of a bioprocess arise from both identified physical mechanisms and undetermined process dynamics. Therefore, as shown in Eq. (2a), a hybrid model quantifies the underlying bioprocess behaviour by using a kinetic model (first term on the right-hand side of Eq. (2a)) to capture the known dynamics (physical knowledge) and a data-driven model (second term on the right-hand side of Eq. (2a)) to account for the unknown dynamics.
\begin{equation} \frac{d\mathbf{S}}{dt}=K\left(\mathbf{S}\right)+D\left(\mathbf{S}\right) \tag{2a} \end{equation}
\begin{equation} D\left(S_{i}\right)=a_{i}\cdot S_{i}+\sum_{j=1}^{M}a_{ij}\cdot S_{i}\cdot S_{j} \tag{2b} \end{equation}
where \(\mathbf{S}\) is the state variable vector \(\mathbf{S}=\left(X,N,P\right)^{T}\), and \(X\), \(N\), and \(P\) represent the concentrations of biomass, nutrient, and product, respectively. \(K\left(\mathbf{S}\right)\) and \(D\left(\mathbf{S}\right)\) are the kinetic model and the data-driven model, respectively. \(S_{i}\) is the \(i\)-th state variable, \(M\) is the total number of state and control variables, and \(a_{i}\) and \(a_{ij}\) are the coefficients of the polynomial terms.
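As a minimal sketch of how the polynomial term in Eq. (2b) can be evaluated for one state variable, the Python snippet below computes \(D\left(S_{i}\right)\) from a vector of state and control variables. The function name `data_driven_term`, the argument layout, and all numerical values are assumptions made purely for illustration; they are not taken from this study.

```python
def data_driven_term(i, s, a_lin, a_quad):
    """Evaluate D(S_i) = a_i*S_i + sum_j a_ij*S_i*S_j, following Eq. (2b).

    i      : index of the state variable of interest
    s      : list with the M state and control variable values
    a_lin  : list of linear coefficients a_i
    a_quad : M x M nested list of interaction coefficients a_ij
    """
    interaction = sum(a_quad[i][j] * s[i] * s[j] for j in range(len(s)))
    return a_lin[i] * s[i] + interaction


# Hypothetical values, for illustration only (not fitted parameters).
s = [2.0, 3.0, 0.5]
a_lin = [0.1, 0.0, 0.0]
a_quad = [
    [0.01, -0.02, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
d_x = data_driven_term(0, s, a_lin, a_quad)
print(d_x)
```

In a hybrid simulation this value would be added to the corresponding kinetic rate, as in Eq. (2a).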
Unlike a pure kinetic model, the kinetic model used for hybrid model construction does not require a complex structure to fully capture the process nonlinearity; it only aims to approximate the overall trend of the process dynamics. Thus, classic kinetic models such as the Monod model and the Droop model can be adopted directly, without further modification based on more detailed physical information. Similarly, compared to a pure data-driven model, the data-driven model used in a hybrid model simulates only the unknown terms, that is, the mismatch between the kinetic model and the process. The nonlinearity of this mismatch is greatly reduced compared to that of the original bioprocess, as the general behaviour has already been described by the kinetic model. A number of previous studies have shown that, after subtracting the mechanistic process trajectory, even a simple data-driven model such as PLS, which is mainly used for linear systems, can capture the physically undetermined behaviour of the process well 11. As a result, it may not be necessary to embed sophisticated data-driven models such as ANNs or GPs into a hybrid model.
In this study, a second-degree polynomial regression model, shown in Eq. (2b), was selected as the data-driven model to estimate the mismatch between the kinetic model and the process data. Second-degree polynomial regression models have been predominantly used in response surface methodology for optimal experimental design and analysis 23,24. Their use has been extended to dynamic systems through recent progress in dynamic response surface methodology 25. Moreover, a second-degree dynamic polynomial model is also known as an extension of the Lotka-Volterra model 26, a classic model used in bioinformatics to simulate growth and competition amongst different populations. Parameter estimation and uncertainty analysis for a polynomial regression model are more straightforward to implement than for a complex data-driven model such as an ANN or a GP, for which extracting gradient information, estimating parameters, and performing optimisation are all challenging. This feature is particularly advantageous for industrial applications, as estimating bioprocess uncertainty is a severe challenge for industrial systems operation and decision-making.
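One reason a polynomial term is easy to estimate is that Eq. (2b) is linear in its coefficients. The sketch below illustrates this with an ordinary least-squares fit solved through the normal equations. It assumes, purely for illustration, that the mismatch rate for one state variable is directly available as data; this is not necessarily the estimation procedure used in this study. The function names and the synthetic data are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_polynomial_term(i, samples, mismatch):
    """Least-squares estimate of [a_i, a_i1, ..., a_iM] in Eq. (2b).

    samples  : list of state/control vectors, each of length M
    mismatch : observed mismatch rates for variable i (assumed to be available)
    """
    # Regressors are S_i and S_i*S_j for every j: linear in the coefficients.
    rows = [[s[i]] + [s[i] * sj for sj in s] for s in samples]
    p = len(rows[0])
    # Normal equations: (X^T X) theta = X^T y.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(rows, mismatch)) for a in range(p)]
    return solve_linear(xtx, xty)


# Synthetic, noise-free data generated from hypothetical coefficients.
true_coeffs = [0.5, -0.1, 0.2]  # a_0, a_00, a_01
samples = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
mismatch = [
    true_coeffs[0] * s[0] + s[0] * (true_coeffs[1] * s[0] + true_coeffs[2] * s[1])
    for s in samples
]
est = fit_polynomial_term(0, samples, mismatch)
print(est)
```

With noisy data the same normal-equation matrix also gives access to standard linear-regression confidence intervals, which is far harder to obtain for an ANN or a GP.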
The aim of the kinetic part \(K\left(\mathbf{S}\right)\) is to approximate the process trajectory. As the current system is mainly affected by light intensity, light attenuation, and nitrate supply, only these three factors are considered when building the simple kinetic model. Photo-inhibition, biomass decay, and lutein self-degradation are not included in \(K\left(\mathbf{S}\right)\), as their effects are subtle and including them would result in a highly complex model structure such as Eq. (1a)-(1f). Instead, they are treated as model-plant mismatch and are accounted for by the data-driven model. The kinetic part of the hybrid model is presented in Eq. (3a)-(3e). It is worth noting that, for algal photo-production systems specifically, although light intensity affects cell growth, which in turn influences nitrate uptake, it has yet to be confirmed whether light directly triggers nitrate consumption. In fact, several previously proposed models do not link light intensity with nitrate uptake (as in Eq. (3b)) 27,28, whilst others have tried to establish a direct interaction between the two (as in Eq. (3c)) 29,30. Similarly, it is still not clear how significantly light affects lutein synthesis. Thus, Eq. (3d) and Eq. (3e) are proposed in this study based on different hypotheses (no or weak effect, and strong effect, respectively). The final kinetic model structure will be determined via the automatic model identification framework presented in the next section.
\begin{equation} \left.\frac{dc_{X}}{dt}\right|_{K}=u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot\frac{I_{0}e^{-\tau\cdot c_{X}\cdot z}}{I_{0}e^{-\tau\cdot c_{X}\cdot z}+k_{s}}\cdot c_{X} \tag{3a} \end{equation}
\begin{equation} \left.\frac{dc_{N}}{dt}\right|_{K}=-Y_{N/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot c_{X}+F_{\text{in}}\cdot c_{N,\text{in}} \tag{3b} \end{equation}
\begin{equation} \left.\frac{dc_{N}}{dt}\right|_{K}=-Y_{N/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot\frac{I_{0}e^{-\tau\cdot c_{X}\cdot z}}{I_{0}e^{-\tau\cdot c_{X}\cdot z}+k_{s}}\cdot c_{X}+F_{\text{in}}\cdot c_{N,\text{in}} \tag{3c} \end{equation}
\begin{equation} \left.\frac{dc_{L}}{dt}\right|_{K}=Y_{L/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot c_{X} \tag{3d} \end{equation}
\begin{equation} \left.\frac{dc_{L}}{dt}\right|_{K}=Y_{L/X}\cdot u_{0}\cdot\frac{c_{N}}{c_{N}+K_{N}}\cdot\frac{I_{0}e^{-\tau\cdot c_{X}\cdot z}}{I_{0}e^{-\tau\cdot c_{X}\cdot z}+k_{\text{sL}}}\cdot c_{X} \tag{3e} \end{equation}
where the subscript \(K\) refers to the kinetic part of the hybrid model.
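The candidate kinetic structures in Eq. (3a)-(3e) can be sketched in Python as a single rate function with two switches, one selecting Eq. (3b) or Eq. (3c) for nitrate and one selecting Eq. (3d) or Eq. (3e) for lutein. The function names, the dictionary keys, the switch arguments, and every parameter value below are hypothetical and serve only to illustrate the structure; \(z\) is treated here as a single fixed value, which is a simplifying assumption.

```python
import math


def light_term(c_x, i0, tau, z, k):
    """Light saturation term with exponential attenuation, as in Eq. (3a)."""
    local_light = i0 * math.exp(-tau * c_x * z)
    return local_light / (local_light + k)


def kinetic_rhs(c_x, c_n, p, light_on_nitrate=False, light_on_lutein=False):
    """Return (dc_X/dt, dc_N/dt, dc_L/dt) for the kinetic part K(S).

    light_on_nitrate : False selects Eq. (3b), True selects Eq. (3c)
    light_on_lutein  : False selects Eq. (3d), True selects Eq. (3e)
    """
    monod = c_n / (c_n + p["K_N"])
    f_x = light_term(c_x, p["I0"], p["tau"], p["z"], p["k_s"])
    base_rate = p["u0"] * monod * c_x  # shared by all five equations

    dcx = base_rate * f_x  # Eq. (3a)

    f_n = f_x if light_on_nitrate else 1.0
    dcn = -p["Y_NX"] * base_rate * f_n + p["F_in"] * p["c_N_in"]

    if light_on_lutein:
        f_l = light_term(c_x, p["I0"], p["tau"], p["z"], p["k_sL"])
    else:
        f_l = 1.0
    dcl = p["Y_LX"] * base_rate * f_l

    return dcx, dcn, dcl


# Hypothetical parameter values, for illustration only.
params = {
    "u0": 0.1, "K_N": 1.0, "I0": 100.0, "tau": 0.05, "z": 1.0,
    "k_s": 100.0, "k_sL": 300.0, "Y_NX": 2.0, "Y_LX": 0.5,
    "F_in": 0.0, "c_N_in": 0.0,
}
print(kinetic_rhs(1.0, 1.0, params))
print(kinetic_rhs(1.0, 1.0, params, light_on_nitrate=True, light_on_lutein=True))
```

Exposing the structural choices as switches makes it simple to enumerate the four candidate models when comparing them against data.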