1. Introduction
Developing industrially focused mathematical models is one of the grand
research challenges for the design, operation and commercialisation of
next-generation sustainable chemical and biochemical processes. Owing to
the scarcity of petroleum resources and the severe environmental issues
surrounding their use, microorganism-based bio-production processes have
become an attractive alternative to traditional chemical processes for
the industrial synthesis of platform chemicals and
high-value materials [1–3]. Because microbial metabolism is
sophisticated, most bio-production processes share two characteristics.
The first is that different strains and species behave similarly with
respect to biomass growth, nutrient consumption, and bioproduct
accumulation, owing to their delicate metabolic regulation mechanisms
[4,5]. The second is that bioprocesses are difficult to reproduce: their
performance varies from batch to batch even under similar operating
conditions, because metabolic reactions are sensitive to changes in the
culture environment [6,7].
To date, a range of predictive models has been proposed to account for
bioprocess complexities. On the one hand, elaborate kinetic models have
been developed by embedding new physical understanding into classic
models such as the Monod and Droop models [8,9]. These have been used to
simulate, optimise, and scale up both fermentation processes and algal
photo-production systems [7,10]. However, identifying a correct model
structure to quantify the physical knowledge is challenging and usually
requires long development times. It often results in a complex model
structure, which causes problems with parameter estimation and
identifiability and sacrifices the model's predictive capability [11]. On
the other hand, frontier machine learning models such as artificial
neural networks, Gaussian processes, and reinforcement learning have
been applied to bioprocess dynamic modelling and online optimisation,
and their competency has been reported in a number of publications
[12–14]. Although these data-driven models can capture complex process
behaviours well within a specific operating range without prior physical
knowledge, they suffer from other inherent weaknesses, such as the risk
of overfitting and difficulty in extrapolating to a broader range of
metabolism-governed process behaviours [11,15].
To resolve these challenges, a third modelling strategy, hybrid
modelling, has been proposed in recent years [16]. This strategy aims to
combine physical knowledge and machine learning in a hybrid model
structure that inherits the respective advantages of kinetic and
data-driven models. The structure of a hybrid model is flexible (e.g.
parallel or sequential) and depends on the amount of available physical
information and process data [17]. In spite of its merits and
industrial potential, only a few pioneering studies have attempted to
improve this technology and apply it to bioprocess engineering [18–20].
In addition, hybrid model identification remains a challenge: its
kinetic part suffers from difficulties in quantifying physical
knowledge, and its data-driven part risks overfitting. This study
therefore aims to develop a general framework that integrates
state-of-the-art automatic model structure identification technology
into the hybrid modelling strategy, to facilitate its future industrial
application in bioprocess engineering.
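
To make the distinction between the parallel and sequential structures
mentioned above concrete, a minimal Python sketch is given below. It is
purely illustrative and is not the framework developed in this study:
the function names, the parameter values, and the two stand-in
"data-driven" functions (which would be trained regressors in practice)
are all assumptions introduced for the example. Only the Monod rate
expression is a standard result.

```python
# Illustrative sketch only. All names, numbers, and the stand-in
# "learned" functions are hypothetical, not taken from this study.

def monod_rate(substrate, mu_max, k_s):
    """Monod specific growth rate: mu = mu_max * S / (K_s + S)."""
    return mu_max * substrate / (k_s + substrate)


def sequential_hybrid(substrate, temperature, learned_mu_max, k_s):
    """Sequential structure: a data-driven map supplies a kinetic
    parameter, which the kinetic equation then uses."""
    mu_max = learned_mu_max(temperature)
    return monod_rate(substrate, mu_max, k_s)


def parallel_hybrid(substrate, temperature, mu_max, k_s, learned_residual):
    """Parallel structure: the kinetic prediction is corrected
    additively by a data-driven residual term."""
    kinetic = monod_rate(substrate, mu_max, k_s)
    return kinetic + learned_residual(substrate, temperature)


# Stand-ins for trained data-driven models (assumed forms, not fitted).
def learned_mu_max(temperature):
    # Hypothetical: maximum growth rate peaks at 30 degrees C.
    return 0.5 - 0.001 * (temperature - 30.0) ** 2


def learned_residual(substrate, temperature):
    # Hypothetical: small correction for behaviour the Monod term misses.
    return -0.01 * substrate


if __name__ == "__main__":
    s, temp = 2.0, 30.0
    print("sequential:", sequential_hybrid(s, temp, learned_mu_max, k_s=2.0))
    print("parallel:  ", parallel_hybrid(s, temp, 0.5, 2.0, learned_residual))
```

In the sequential form the data-driven part only replaces a quantity
that is hard to describe mechanistically, so the kinetic structure
constrains the prediction. In the parallel form the data-driven part
corrects the kinetic output directly, which is more flexible but relies
more heavily on the data.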