Statistical Analysis
Descriptive statistics were computed separately for each influenza season (2017/2018 and 2018/2019) and reported as means with standard deviations, medians with interquartile ranges, or frequencies and percentages, as appropriate. Normality and the presence of outliers were assessed using histograms and box-and-whisker plots. Data were clustered as described above, separately for each season, using the partitioning around medoids (PAM) algorithm, and the model results were reported. Patient and clinical characteristics were compared between clusters using Chi-squared or Fisher’s exact tests for categorical variables and independent t-tests, ANOVA, Mann-Whitney U, or Kruskal-Wallis tests for continuous variables, as appropriate.
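For readers unfamiliar with PAM, the following is a minimal illustrative sketch of a k-medoids procedure in Python. It is not the code used in this study, which was run in R and SAS. The function names `total_cost` and `pam` are hypothetical, the dissimilarity matrix is assumed to be precomputed, and the swap phase accepts the first improving exchange rather than searching for the best one, which is a simplification of the classic algorithm.

```python
def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Simplified PAM on a precomputed, symmetric distance matrix.

    Returns (medoids, labels), where medoids are row indices and
    labels[i] is the position in `medoids` of point i's nearest medoid.
    """
    n = len(dist)
    medoids = []

    # BUILD: greedily add the point that lowers the total cost the most.
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda i: total_cost(dist, medoids + [i]))
        medoids.append(best)

    # SWAP: exchange a medoid for a non-medoid whenever that lowers the cost.
    # Assumption: first-improvement search, repeated until no swap helps.
    improved = True
    while improved:
        improved = False
        current = total_cost(dist, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < current:
                    medoids, current, improved = trial, cost, True

    # Assign every point to its nearest medoid.
    labels = [min(range(k), key=lambda j: row[medoids[j]]) for row in dist]
    return medoids, labels
```

Because the medoids are actual observations rather than averaged centroids, the procedure works with any dissimilarity measure, including ones suited to mixed categorical and continuous clinical data.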
To determine whether different classes of early hospitalization characteristics were associated with severe hospital sequelae, a series of models was constructed separately for each influenza season. For binary outcomes (ICU admission, ventilator use, and prolonged hospital length of stay), Firth’s logistic regression models were constructed. Generalized linear models were used for the continuous outcome of hospital length of stay. Variables chosen a priori for model inclusion were k-medoids cluster group, age, sex, CCI (as a continuous variable), hospital, and influenza vaccination status. In an exploratory analysis, the primary analysis was repeated with outliers removed prior to clustering. An outlier was conservatively defined as a value below the 1st quartile minus 1.5 times the interquartile range (IQR) or above the 3rd quartile plus 1.5 times the IQR. Outlying values were then replaced with the mean of the remaining values, stratified by age group and hospital. To maintain comparability with the primary analysis, the same number of clusters was used within a given influenza season.
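The outlier rule and mean imputation described above can be sketched as follows. This is an illustration only, not the study code. The names `iqr_fences` and `impute_outliers` and the record layout are hypothetical. The sketch assumes that the fences are computed over all values of a variable, that quartiles use the "inclusive" interpolation method, and that a value is left unchanged when its stratum has no non-outlying values; the text does not specify these details.

```python
from collections import defaultdict
from statistics import mean, quantiles


def iqr_fences(values):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Assumption: "inclusive" quartiles; other definitions shift the fences slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def impute_outliers(records, value_key, group_keys):
    """Replace outlying values with the mean of the remaining values
    in the same stratum (for example, age group and hospital).

    `records` is a list of dicts; a new list is returned and the
    input records are not modified.
    """
    lower, upper = iqr_fences([r[value_key] for r in records])

    # Collect the non-outlying values within each stratum.
    kept = defaultdict(list)
    for r in records:
        if lower <= r[value_key] <= upper:
            kept[tuple(r[k] for k in group_keys)].append(r[value_key])

    result = []
    for r in records:
        new = dict(r)
        if not (lower <= r[value_key] <= upper):
            stratum = tuple(r[k] for k in group_keys)
            # Assumption: leave the value as-is if the stratum has no remaining values.
            if kept[stratum]:
                new[value_key] = mean(kept[stratum])
        result.append(new)
    return result
```

Computing the stratum means only from non-outlying values keeps the replaced values from being pulled toward the extremes they are meant to correct.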
Analyses were conducted using RStudio version 1.2.5042 and SAS v9.4 (SAS Institute, Cary, NC). A p-value threshold of 0.05 was used to define statistical significance.