
Genetic Algorithm based Semi-supervised Convolutional Neural Network for Real-time Monitoring of Escherichia Coli Fermentation of Recombinant Protein Production Using a Raman Sensor
  • Zhenguo WEN,
  • Yuan Liu,
  • Xiaotian Zhou,
  • Teng Wang,
  • An Luo,
  • Zhaojun Jia,
  • Xingquan Pan,
  • Weiqi Cai,
  • Mengge Sun,
  • Xuezhong Wang,
  • Guangzheng Zhou
Zhenguo WEN
Beijing Institute of Petrochemical Technology

Corresponding Author: wenzhenguo@bipt.edu.cn

Yuan Liu
Beijing Institute of Petrochemical Technology
Xiaotian Zhou
Beijing Institute of Petrochemical Technology
Teng Wang
Beijing Institute of Petrochemical Technology
An Luo
Beijing Institute of Petrochemical Technology
Zhaojun Jia
Beijing Institute of Petrochemical Technology
Xingquan Pan
Beijing Institute of Petrochemical Technology
Weiqi Cai
Beijing Institute of Petrochemical Technology
Mengge Sun
Beijing Institute of Petrochemical Technology
Xuezhong Wang
Beijing Institute of Petrochemical Technology
Guangzheng Zhou
Beijing Institute of Petrochemical Technology

Abstract

Raman spectroscopy, as a label-free sensor, is commonly used for real-time monitoring of key parameters in the cultivation of recombinant protein. However, ensuring accurate parameter values requires a large quantity of offline measurement data, whose collection is time-consuming and labor-intensive. To address the limitations of conventional, complex data preprocessing, this study presents a genetic algorithm-based semi-supervised convolutional neural network (GA-SCNN). The GA-SCNN facilitates feature extraction and unsupervised sequence labeling, and was applied to a model system of E. coli expressing recombinant ProA5M protein. By applying model prediction and sequence interpolation techniques, the GA-SCNN expanded the database for glucose, lactate, ammonium ions, and OD600 from 52 to 1302 samples. A comparative analysis against standard regression algorithms demonstrated the superior predictive performance of the GA-SCNN framework on a large volume of spectral data, without any requirement for preprocessing. Model cross-validation confirmed high accuracy and robustness in determining coefficients. In addition, a transfer learning strategy using the OD600 data and limited recombinant protein expression data was employed to develop a prediction model for the target protein. Validation experiments showed good agreement between model predictions and offline results.
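The sequence interpolation step mentioned in the abstract can be pictured with a small sketch. The abstract does not say which interpolation scheme was used, so the linear interpolation below is an assumption, and the function name `interpolate_labels` and all numbers are hypothetical, chosen only for illustration. The sketch assigns a label to every spectrum timestamp from a sparse set of offline measurements; it does not cover the model-prediction part of the labeling.

```python
from bisect import bisect_right


def interpolate_labels(sample_times, sample_values, spectrum_times):
    """Linearly interpolate sparse offline measurements onto spectrum timestamps.

    Assumes sample_times is strictly increasing. Timestamps outside the
    measured range are clamped to the nearest offline measurement.
    """
    if len(sample_times) != len(sample_values) or len(sample_times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    labels = []
    for t in spectrum_times:
        if t <= sample_times[0]:
            labels.append(sample_values[0])
        elif t >= sample_times[-1]:
            labels.append(sample_values[-1])
        else:
            # First index whose sample time is strictly greater than t.
            i = bisect_right(sample_times, t)
            t0, t1 = sample_times[i - 1], sample_times[i]
            v0, v1 = sample_values[i - 1], sample_values[i]
            labels.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return labels


# Hypothetical values for illustration only (not data from the study).
offline_hours = [0.0, 2.0, 4.0]          # times of offline sampling
glucose_g_per_l = [10.0, 6.0, 1.0]       # offline glucose measurements
spectrum_hours = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]  # Raman acquisition times

pseudo_labels = interpolate_labels(offline_hours, glucose_g_per_l, spectrum_hours)
print(pseudo_labels)  # [10.0, 9.0, 8.0, 6.0, 3.5, 1.0]
```

In this toy case three offline measurements yield six labeled spectra; the same idea, applied to densely acquired spectra, is one plausible way a small offline data set could grow to a much larger labeled one.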
18 Oct 2023: Submitted to Biotechnology and Bioengineering
18 Oct 2023: Submission Checks Completed
18 Oct 2023: Assigned to Editor
18 Oct 2023: Review(s) Completed, Editorial Evaluation Pending
31 Oct 2023: Reviewer(s) Assigned