Results
OPLS models were built to predict cell growth or mAb expression as a function of amino acid stoichiometric balances. Since cell growth was measured as viable cell density (VCD) throughout the culture, CVC was calculated to quantify the total cells that were present within a given time interval. VCD profiles of CHO cell culture typically follow four growth phases, namely, lag phase, log phase, stationary phase, and death phase (Dutton, Scharer, & Moo-Young, 2006). In the training dataset, the VCD profiles of the 25 batches reflected the same four phases but varied drastically in peak cell density (black lines in Fig. 1a). Peak VCD was observed at day 9 for the majority of the training batches; the few exceptions peaked near day 7 and were batches that received a lower percentage of the high nutrient feed. In contrast, CVC profiles increased over time because they represented a running sum of the total cells in the culture (green lines in Fig. 1a). Intuitively, modeling the time at which peak VCD was reached, or the end of the log phase, would highlight the time-dependent contributions by amino acids from the start of the culture, which would produce a model with narrower utility. Since >75% of the training runs showed peak VCD on day 9 and the remaining runs also achieved highest total cells on day 9, the CVC at day 9 was chosen as the response variable for the growth model. A similar OPLS model was created to predict titer; in contrast to VCD, however, mAb titer was measured as the cumulative total concentration of mAb within the culture at any given time. Since peak mAb titer was observed at the end of all cultures, day 14 titer was chosen as the response variable for the production model (Fig. 1b).
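The relationship between a VCD profile and its CVC profile can be illustrated with a minimal sketch. It assumes that CVC is obtained by trapezoidal integration of VCD over culture time, which is consistent with the cell-days per mL unit but is not confirmed by this section. The function name `cvc_from_vcd` and the example values are hypothetical and are not data from the study.

```python
import numpy as np

def cvc_from_vcd(days, vcd):
    """Running integral of VCD over time (cell-days per mL).

    Assumption: trapezoidal rule between consecutive sampling days.
    """
    days = np.asarray(days, dtype=float)
    vcd = np.asarray(vcd, dtype=float)
    # Area of each trapezoid between consecutive samples
    increments = 0.5 * (vcd[1:] + vcd[:-1]) * np.diff(days)
    # CVC is zero at the first sample and accumulates afterwards
    return np.concatenate(([0.0], np.cumsum(increments)))

# Hypothetical VCD profile in 1E6 cells/mL (illustrative only)
days = [0, 1, 2, 3]
vcd = [1.0, 2.0, 4.0, 4.0]
cvc = cvc_from_vcd(days, vcd)
```

Because every increment is non-negative, the resulting profile never decreases, even after VCD itself has peaked and started to fall.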
To measure the variability and generalization of a model prediction, a relatively large distribution space was required in the training dataset. Accordingly, the amino acid SBs from the 25 training batches in a BLM format were analyzed by PCA. Each observation or score of the PCA model in BLM format, which represented a single batch, was graphically analyzed in a score plot, and the 375 amino acid SBs were analyzed in a loadings plot to identify any collinearity or dependencies among the variables that could bias the prediction (Fig. 1c and Fig. 1d). The PCA model explained 38.8% of the variance in the first component and 14.7% in the second component (Fig. 1c). Although only two components are graphed, a 5-component model was built to ensure that greater than 70% of the variance was captured and the majority of the dataset was represented. Even based on the first two components alone, the batches showed a random dispersion and a lack of any specific clustering. In addition, the variable loadings plot did not highlight any time-dependent grouping, suggesting minimal collinearity or internal bias within the training dataset in terms of amino acid SBs (Fig. 1d). The generalized distribution of the variables provided a strong potential for the OPLS model to learn and predict across a varying space for future batches. The reliability of the PCA model was further supported by screening for outliers that could cause internal biases. Accordingly, a 95% Hotelling's T2 ellipse was drawn as a confidence region around the dataset to identify any batches that deviated from the majority (Rencher, 1993).
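The PCA screening described above can be sketched as follows. This is an illustration under stated assumptions, not the software or settings used in the study: the batch-level matrix is replaced by synthetic random numbers of the same shape (25 batches by 375 variables), the variables are assumed to be mean-centered and scaled to unit variance, and the helper names `pca_scores` and `hotelling_t2` are hypothetical. The 95% limit itself, which is derived from an F-distribution quantile, is not computed here.

```python
import numpy as np

def pca_scores(X, n_components):
    """PCA by SVD on mean-centered, unit-variance-scaled data (scaling is an assumption)."""
    X = np.asarray(X, dtype=float)
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Fraction of total variance explained by each component
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

def hotelling_t2(scores):
    """Hotelling's T2 per observation: squared scores weighted by score variance."""
    var = scores.var(axis=0, ddof=1)
    return np.sum(scores ** 2 / var, axis=1)

# Synthetic stand-in for the 25 x 375 batch-level matrix (not study data)
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 375))

scores, explained = pca_scores(X, n_components=5)
cumulative = np.cumsum(explained)   # compare against a target such as 0.70
t2 = hotelling_t2(scores)           # large values flag candidate outlier batches
```

Plotting the first two columns of `scores` gives the kind of score plot discussed above, and a batch whose `t2` exceeds the chosen limit would fall outside the ellipse.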
To ensure a strong fit for both models without a significant loss of predictive power, the OPLS model for day 9 CVC was built with 1 predictive component and 9 orthogonal components, resulting in an R2 of 0.912 and a Q2 of 0.726 (Supplementary Fig. S1a). The requirement for additional orthogonal components was further reflected by the diverse spread of CVC profiles across the 25 training batches, which ranged from 25E6 to 45E6 cell-days per mL at day 9. Similarly, there was a large distribution of day 14 titer, ranging from 0.2 to 1.05 relative titer values (Fig. 2b). Accordingly, the OPLS model for day 14 titer had 1 predictive component and 6 orthogonal components, with an R2 of 0.832 but a Q2 of 0.422. Although the predictive power of the production model to generalize to diverse future batches was not as strong as that of the growth model, it was able to highlight information on key variables for media optimization (Supplementary Fig. S1b).
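The distinction between R2 (goodness of fit) and Q2 (cross-validated predictive ability) can be made concrete with a short sketch. It assumes the common definitions R2 = 1 - RSS/SS_total and Q2 = 1 - PRESS/SS_total. It uses a ridge regression as a simple stand-in for OPLS, an arbitrary 7-fold split, and synthetic data, so the numbers it produces are unrelated to the values reported above. The names `ridge_fit_predict` and `r2_and_q2` are hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Stand-in regressor (ridge), NOT OPLS; centers X and y on the training set."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # Dual-form ridge solution, convenient when variables outnumber batches
    K = Xc @ Xc.T
    a = np.linalg.solve(K + alpha * np.eye(len(y_train)), y_train - y_mean)
    return (X_test - x_mean) @ (Xc.T @ a) + y_mean

def r2_and_q2(X, y, fit_predict, n_folds=7):
    """R2 from the full-data fit, Q2 from held-out predictions."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    # R2: residuals of the model fitted to all batches
    r2 = 1.0 - np.sum((y - fit_predict(X, y, X)) ** 2) / ss_tot
    # Q2: each batch is predicted by a model that never saw it (PRESS)
    y_cv = np.empty_like(y)
    for k in range(n_folds):
        test = np.arange(len(y)) % n_folds == k
        y_cv[test] = fit_predict(X[~test], y[~test], X[test])
    q2 = 1.0 - np.sum((y - y_cv) ** 2) / ss_tot
    return r2, q2

# Synthetic data: 25 batches, 40 variables, response driven by one variable
rng = np.random.default_rng(1)
X = rng.normal(size=(25, 40))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=25)

r2, q2 = r2_and_q2(X, y, ridge_fit_predict)
```

A large gap between the two values, as in the titer model, indicates that part of the fit does not carry over to batches outside the training set.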