INTRODUCTION
Infectious respiratory diseases caused by influenza virus, respiratory syncytial virus, and SARS-CoV-2 can cause significant illness and are responsible for hundreds of thousands of hospitalizations in the United States annually1. Data on in-hospital disease progression and treatment course are broadly available and are used to evaluate severity of illness2,3 and the impact of vaccination4,5 and treatment6-8. However, particularly in patients with baseline multimorbidity, the primary cause of admission might not be the acute infection itself but another condition, either exacerbated by a milder respiratory tract infection (e.g., asthma) or possibly unrelated to infection (e.g., dehydration). Such admissions might bias estimates of vaccine or antiviral effectiveness in preventing or attenuating severe disease. Differences in general health and health care-seeking behavior are difficult to measure directly9,10, and individuals may present to the hospital and be admitted at different stages of their disease course and with varying severity. These patterns vary by population, health system, and specific etiology11-14. While patients hospitalized with respiratory diseases such as influenza have historically tended to be older and to have significant comorbidity11,15, this pattern has differed across phases of the COVID-19 pandemic16.
The heterogeneity of the hospitalized population at admission creates challenges when examining events that occur during hospitalization. Differences in baseline comorbidity and presenting symptomatology can substantially confound the use of hospital data as a surveillance metric for respiratory disease severity and can bias estimates of the effectiveness of interventions to reduce influenza morbidity or disease progression.
Unsupervised machine learning algorithms provide a way to derive and characterize groups of patients independently of any outcome or treatment framework17,18. When applied to clinical data, this approach can help identify distinct patient phenotypes driven by underlying relationships among health metrics. The aims of the current study were to derive clinically distinct clusters of patients based on laboratory and physiologic measurements obtained within the first 24 hours of hospitalization, to determine whether cluster membership was associated with worse in-hospital outcomes, and to evaluate the association of influenza vaccination with in-hospital outcomes within a given cluster.
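The general design in these aims (cluster patients on early measurements without reference to outcomes, then compare outcomes by vaccination status inside each cluster) can be sketched as below. This is a minimal illustration under stated assumptions, not the study's method: it assumes plain k-means on z-scored values with a deterministic farthest-point initialization, and the four measurements (heart rate, respiratory rate, SpO2, creatinine), the toy patient values, the ICU outcome flag, and all function names are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so measurements on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def rates_within_clusters(labels, vaccinated, outcome):
    """Outcome proportion by vaccination status inside each cluster."""
    result = {}
    for cluster in sorted(set(labels)):
        rates = {}
        for status in (True, False):
            ys = [y for lab, v, y in zip(labels, vaccinated, outcome)
                  if lab == cluster and v == status]
            rates[status] = sum(ys) / len(ys) if ys else None
        result[cluster] = rates
    return result


# Hypothetical first-24-hour values: [heart rate, resp. rate, SpO2 %, creatinine].
patients = [
    [80, 16, 97, 0.9], [82, 17, 96, 1.0], [78, 15, 98, 0.8], [84, 18, 95, 1.1],
    [115, 28, 88, 2.1], [120, 30, 86, 2.4], [118, 29, 87, 2.2], [112, 27, 89, 2.0],
]
vaccinated = [True, False, True, False, True, False, True, False]
icu = [0, 0, 0, 1, 0, 1, 1, 1]  # hypothetical in-hospital outcome flag

labels = kmeans(standardize(patients), k=2)
by_cluster = rates_within_clusters(labels, vaccinated, icu)
print(labels)
print(by_cluster)
```

Because the outcome and vaccination status play no role in forming the clusters, each within-cluster comparison contrasts patients with broadly similar presenting physiology. A real analysis would also need to handle missing measurements, choose the number of clusters, and adjust for remaining confounders.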