DISCUSSION
In this cohort of patients hospitalized with influenza during the 2017/2018 and 2018/2019 seasons, we used k-medoids clustering to create clinically meaningful groups and thereby improve the analysis of severity. Our results suggest that patients in clusters characterized by hyperglycemia and lower oxygen saturation at admission had a higher risk of adverse in-hospital sequelae, and are thus potential cohorts of interest for further study of vaccine or antiviral effects.
We found glucose to differ significantly between clusters, with one cluster having significantly higher glucose in both years. The distribution of diabetes was also consistent across years, with approximately 70% prevalence in the high-glucose clusters and 30% prevalence in the non-hyperglycemic clusters. Together, these results highlight that simple dichotomous classifications for complex conditions such as diabetes may not accurately indicate a patient’s risk for adverse outcomes. Indeed, controlling for such complex confounding has long been problematic within infectious disease severity research, most recently when examining treatments and hospital outcomes related to infection with SARS-CoV-2, leading to inconsistent results25-27. This challenge is due in part to differential measurement and management of confounding, including in analyses at the point of hospital admission, because models are limited in the number of confounders they can include and because those confounders often have complex interrelationships. Using techniques such as k-medoids clustering to account for multiple measures of comorbidity simultaneously, and to group similar patients together independently of any outcomes-based analysis, provides a tool to increase homogeneity within groups and heterogeneity across groups for a more robust confounding adjustment.
More traditional dimension reduction methods, such as propensity score matching, have often been used to account for differential patterns of comorbidities between groups of interest. While propensity score matching is useful in reducing heterogeneity in the presence of a single exposure of interest, it becomes complex when multiple treatments or exposures are being compared simultaneously. Additionally, matching inherently reduces sample size, because it is limited by the number of individuals with and without the exposure who have similar propensity scores; individuals in either group with uncommon comorbidity profiles may be overlooked and excluded from the matching if their propensity score does not align. For example, a 2020 study by Groeneveld et al. examining the effectiveness of oseltamivir lost 36% of oseltamivir patients and 65% of controls when matching, reducing the sample size to 88 pairs6. While propensity score matching has been shown to reduce bias28, such substantial loss of data, especially in a rare-outcomes setting, may lead to an increase in Type II error, and thus incorrect conclusions, due to inadequate power29,30. K-medoids clustering can be used to identify subgroups that are biologically different without such restrictions, maintaining sample size for more robust analysis of effect modification by multiple treatment types. It should be noted that outliers within the range of biologically normal values are of great clinical significance, as these individuals may be at higher risk for adverse outcomes. K-medoids clustering is robust to such outliers because each cluster center is a medoid, that is, an actual observation from the data, rather than a computed mean that extreme values can shift.
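The robustness to outliers described above can be illustrated with a minimal sketch of k-medoids clustering. This is not the implementation used in the study: the alternating assign-and-update scheme, the Manhattan distance, the function name k_medoids, and the eight toy admission records (glucose in mg/dL, oxygen saturation in %) are all assumptions made for illustration only.

```python
import numpy as np

def k_medoids(X, k, init=None, n_iter=100, seed=0):
    """Simple alternating k-medoids (illustrative sketch, not the study's code).

    X    : (n, p) array of observations
    k    : number of clusters
    init : optional list of k row indices to use as starting medoids
    Returns (medoid row indices, cluster label per observation).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Manhattan distances (an assumed metric for this sketch).
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    if init is None:
        rng = np.random.default_rng(seed)
        medoids = rng.choice(n, size=k, replace=False)
    else:
        medoids = np.asarray(init, dtype=int)
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: the new medoid is the member with the smallest
            # total distance to the other members of its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Hypothetical toy data: columns are glucose (mg/dL) and SpO2 (%).
# The last row is an extreme but plausible hyperglycemic patient.
X = np.array([
    [90, 97], [95, 97], [100, 98], [105, 96],
    [240, 90], [250, 89], [260, 91], [400, 85],
])

medoids, labels = k_medoids(X, k=2, init=[0, 4])
high = X[labels == labels[7]]          # cluster containing the outlier
mean_glucose = high[:, 0].mean()       # pulled upward by the outlier
medoid_glucose = X[medoids[labels[7]], 0]  # an actual patient's value
print(medoids, labels, mean_glucose, medoid_glucose)
```

In this toy example the extreme patient stays in the high-glucose cluster, but the cluster's representative is an observed patient rather than a mean that the extreme value has inflated. In practice the starting medoids, the distance metric, and any scaling of the variables would need to be chosen for the data at hand.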
This study has several strengths, most notably that the cohort was nested within a large prospective two-center study of influenza vaccine effectiveness across multiple seasons, allowing for a robust and diverse analytic cohort. Both case definition and EHR data capture were standardized across sites, reducing heterogeneity of data quality. Additionally, the use of two hospitals within our region allowed for a more generalizable analysis. The main limitations of the study are the small sample size and the small number of outcomes; however, we believe our analysis has minimized some of the bias arising from these limitations.
The use of k-medoids clustering to characterize heterogeneity in severity analysis has many direct and current applications. One of the most immediate is evaluating the effectiveness of new and existing antivirals for severe respiratory disease. Previous studies of such treatments have used traditional methods of covariate adjustment, which may contribute to heterogeneity of study findings31. Using this clustering method to phenotype baseline presentation can reduce this confounding and can be implemented quickly for these analyses. Such a technique will be needed as we continue to understand how new antiviral treatments affect severity, and how vaccination affects severity in instances of low vaccine effectiveness.