Identifying Variable Importance with Stoichiometric Balances
Since amino acid SBs provided information on the cellular metabolic
state in terms of amino acid consumption, model-predicted key amino acid
additions at specific time points provided the potential to improve cell
growth and mAb productivity. However, to identify key stoichiometric
balances from the growth model and the mAb production model, three
aspects had to be considered: (1) a variable ranking heuristic that
could provide a mathematical method to evaluate the importance of each
variable at a given time point; (2) a directionality coefficient that
could provide information on each variable’s relationship towards the
response variable; and (3) the direction of the stoichiometric balance
since positive and negative balances represented two distinct metabolic
states. First, since each component in an OPLS model is described by a
weighted contribution of each variable in the dataset, the cumulative
squared sum of all the weighted contributions of each variable across
all the components was calculated to represent the variable importance
to projection (VIP). Since the VIP is a squared sum always resulting in
a positive magnitude, the VIP was used as a ranking system for all the
variables in terms of importance to the predictability of the model.
Moreover, Powers et al. reported that the average VIP value across all
variables is typically around 1 (Powers et al., 2020). Accordingly,
variables with a VIP greater than 1 were considered to contribute
significantly to the predictability of the response variable, whereas
those with a VIP less than 0.5 were presumed to make nominal
contributions to the overall model. Variables with VIP values between
0.5 and 1 added to the accuracy and reliability of the model but did not
contribute significantly to its predictive power. Second, correlative
directionality for each variable was
averaged across all the components and represented as the magnitude and
directionality of each variable’s coefficient. Positively correlated
variables therefore represented those stoichiometric balances that
increase together with the response variable, regardless of the actual
magnitude of the variable. Lastly, since a stoichiometric balance could
be either positive, representing consumption greater than the
theoretical demand, or negative, representing consumption below the
theoretical demand, the sign of the stoichiometric balance was used to
distinguish between nutrient-rich and nutrient-limited conditions.
However, since
the model-selected amino acid SBs were the weighted sums across all the
batches, the directionality sign of the amino acid SBs was based on the
process control from the training dataset. Accordingly, Table 2 shows
the directionality signs of all the amino acid SBs from the process
control condition and provides the reference values for biomass and mAb
composition for each amino acid.
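The VIP ranking described above can be illustrated with a short sketch. It assumes the VIP formula commonly used for PLS-type models, in which each variable's squared, length-normalised weight is weighted by the response variance explained by the corresponding component; the exact OPLS implementation used in this study may differ. The function name `vip_scores`, its arguments, and all numbers are hypothetical and serve only as an illustration.

```python
import math

def vip_scores(weights, scores, y_loadings):
    """Return one VIP value per variable (commonly used PLS formula, assumed here).

    weights    : list of p rows, each holding A component weights
    scores     : list of n rows, each holding A component scores
    y_loadings : list of A response loadings
    """
    n_vars = len(weights)
    n_comp = len(y_loadings)
    # Response sum of squares explained by each component: q_a^2 * sum_i t_ia^2
    ss = [y_loadings[a] ** 2 * sum(row[a] ** 2 for row in scores)
          for a in range(n_comp)]
    # Squared length of each component's weight vector, used for normalisation
    w_len_sq = [sum(row[a] ** 2 for row in weights) for a in range(n_comp)]
    total_ss = sum(ss)
    vips = []
    for row in weights:
        weighted = sum(ss[a] * row[a] ** 2 / w_len_sq[a] for a in range(n_comp))
        vips.append(math.sqrt(n_vars * weighted / total_ss))
    return vips

# Made-up values for three variables and two components (illustration only)
W = [[0.9, 0.1], [0.4, 0.2], [0.1, 0.05]]
T = [[1.0, 0.5], [-1.0, 0.2], [0.5, -0.7]]
q = [0.8, 0.3]
vip = vip_scores(W, T, q)
```

Under this formula the squared VIP values average exactly 1 across variables, which is consistent with using 1 as the cut-off for important variables. In the made-up example the first variable falls above 1, the second between 0.5 and 1, and the third below 0.5.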
Based on the selection factors for variable importance, three distinct
experimental criteria emerged for SBs to validate the growth and
production models. Criterion 1 consisted of positively correlated
stoichiometric balances with a VIP greater than 1 and a positive
stoichiometric balance sign, representing amino acids that are favorably
consumed in excess of the theoretical demand. Criterion 2 consisted of
positively correlated stoichiometric balances with a VIP greater than 1
and a negative stoichiometric balance sign, representing amino acids
that are favorably consumed below the theoretical demand. Criterion 3
consisted of positively correlated stoichiometric balances with a VIP
less than 0.5 and a negative stoichiometric balance sign, likewise
representing amino acids that are favorably consumed below the
theoretical demand but deemed unimportant by the model. For each
Criterion, however, the sign of the stoichiometric balance was taken
from the process control cultures of the training dataset rather than
from the average across all the training batches, since the goal of the
model was to improve cell growth and mAb production beyond the current
benchmark (Table 1). In all cases, however,
negatively correlated stoichiometric balances were disregarded since
removing nutrient components from an existing chemically defined medium
poses a greater operational challenge than supplementing additional
nutrients. Therefore, the scope of this study focused only on positively
correlated stoichiometric balances.
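The three selection rules can be written as a small decision function. This is only a sketch under the thresholds stated above: the function name `assign_criterion`, its argument names, and the example values are hypothetical and do not come from the study's data.

```python
def assign_criterion(vip, coefficient, control_sb_sign):
    """Return 1, 2, or 3 for the matching Criterion, or None if none applies.

    vip             : VIP value of the stoichiometric balance (SB)
    coefficient     : scaled OPLS coefficient of the SB
    control_sb_sign : sign of the SB in the process control cultures
                      (+1 for consumption above demand, -1 for below)
    """
    # Negatively correlated (or uncorrelated) SBs are outside the study scope
    if coefficient <= 0:
        return None
    if vip > 1 and control_sb_sign > 0:
        return 1  # important, consumed in excess of theoretical demand
    if vip > 1 and control_sb_sign < 0:
        return 2  # important, consumed below theoretical demand
    if vip < 0.5 and control_sb_sign < 0:
        return 3  # negative control: deemed unimportant by the model
    return None   # e.g. VIP between 0.5 and 1, or low VIP with positive sign

# Hypothetical example values (not taken from the study)
examples = [
    assign_criterion(vip=1.4, coefficient=0.12, control_sb_sign=+1),
    assign_criterion(vip=1.2, coefficient=0.08, control_sb_sign=-1),
    assign_criterion(vip=0.3, coefficient=0.02, control_sb_sign=-1),
    assign_criterion(vip=1.6, coefficient=-0.20, control_sb_sign=-1),
    assign_criterion(vip=0.7, coefficient=0.05, control_sb_sign=-1),
]
```

Checking the coefficient first mirrors the decision to disregard negatively correlated balances before any other factor is considered.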
For both the growth model and production model, scaled positive
coefficients from OPLS were plotted for each time-dependent
stoichiometric balance grouped by amino acid (Fig. 2). For each plot,
the variables displayed included those that fell within the VIP factor
for each Criterion and the variables highlighted included those that met
the stoichiometric balance sign factor for each Criterion (Growth Model:
Orange Bars; Production Model: Green Bars). For each model and
Criterion, amino acid cocktail feeds were developed for the
corresponding day based on the highlighted amino acid SBs.
Interestingly, most variables for both the growth and the production
models for Criterion 1 had positive stoichiometric balance signs (Fig.
2a and Fig. 2b). This reflected the high-nutrient feed conditions, in
which amino acids are routinely saturated in the extracellular
environment. Criterion 1 was nevertheless designed to test whether
providing an increased concentration of amino acids already consumed
beyond the theoretical demand is effective. For Criterion 2,
only a few variables were highlighted as important based on negative
stoichiometric balance signs. For instance, only alanine (days 1, 2, 4,
5, 6, and 7), cystine (days 3, 4, 6, 7, and 8), and glycine (days 1, 2,
3, and 7) were identified as important for the growth model (Fig. 2c),
whereas the production model identified the same variables together with
lysine (day 9) and methionine (day 9) (Fig. 2d). Interestingly, the high
overlap of amino acids between the production model and the cell growth
model supports the notion that a greater total cell number would
produce more antibody. Criterion 3, on the other hand, was primarily
designed to test the VIP value as a ranking heuristic and thus served as
a negative control: only those stoichiometric balances that contributed
minimally to the predictive power of the model were selected. The
selected Criterion 3 amino acids were similar to those of Criterion 2
but were identified on different days, providing additional
justification that stoichiometric balances can also
help highlight when a specific nutrient demand is needed by the cells
(Fig. 3e and Fig. 3f).
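As an illustration of how highlighted SBs map onto day-specific cocktail feeds, the sketch below regroups the Criterion 2 growth-model selections listed above (Fig. 2c) by culture day. Only the amino acids and days come from the text; the helper name `feeds_by_day` is hypothetical, and no feed concentrations are implied.

```python
from collections import defaultdict

# Days on which each amino acid SB was highlighted for the growth model
# under Criterion 2, as listed in the text (Fig. 2c).
growth_criterion2 = {
    "alanine": [1, 2, 4, 5, 6, 7],
    "cystine": [3, 4, 6, 7, 8],
    "glycine": [1, 2, 3, 7],
}

def feeds_by_day(selected):
    """Invert an amino-acid -> days mapping into day -> amino acids."""
    by_day = defaultdict(list)
    for amino_acid, days in selected.items():
        for day in days:
            by_day[day].append(amino_acid)
    # Sort days and names so the schedule is deterministic
    return {day: sorted(acids) for day, acids in sorted(by_day.items())}

schedule = feeds_by_day(growth_criterion2)
for day, acids in schedule.items():
    print(f"day {day}: {', '.join(acids)}")
```

Grouping by day makes it easy to see that day 7 is the only day on which all three amino acids are highlighted together for the growth model.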