Challenges and pitfalls for AI application in medicine
AI is not without pitfalls, and serious challenges must be overcome for it to deliver its full potential. The most critical challenges are described below, together with potential directions to surmount them. For a more in-depth discussion of AI's most pressing issues, the reader is referred to several excellent reviews.
Data
AI systems and models are only as good as the data they learn from. This relates to the data's (1) quality and quantity, (2) suitability, and (3) availability. The first challenge concerns data quality and quantity. Poor-quality input data lead to biased outcomes, a phenomenon often referred to as the GIGO ('garbage in, garbage out') principle. Data quantity also remains challenging, since AI models, especially deep learning methods, are extremely 'data hungry'. The availability and quality of data labels are equally critical, as label inaccuracies directly impair model reliability, and manual labeling, in particular of images, is time-consuming. Combining and harmonizing multiple datasets is increasingly used to overcome these data limitations. Synthetic data may also help: additional data are generated by simulating from a known data distribution, which has been shown to improve model performance. Similarly, in image analysis, data augmentation is often used to (fictively) increase the sample size by applying transformations to existing (non-synthetic) data points. Another strategy to improve model reliability on relatively small datasets is transfer learning, especially popular in NLP and image analysis. This technique enables researchers to train a complex model on a relatively small dataset by fine-tuning the parameters of an existing, pretrained model.
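The augmentation idea above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: it assumes a grayscale image represented as a nested list, and uses two simple label-preserving transformations (horizontal flip and 90° rotation) to multiply the effective sample size.

```python
# Minimal data-augmentation sketch: each transformation yields an extra
# training example derived from the same original (non-synthetic) image.

def flip_horizontal(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the original image plus simple label-preserving variants."""
    return [img,
            flip_horizontal(img),
            rotate_90(img),
            rotate_90(rotate_90(img))]

# A toy 2x2 "image"; real pipelines would operate on pixel arrays.
image = [[1, 2],
         [3, 4]]

for variant in augment(image):
    print(variant)
```

In practice, frameworks apply such transformations randomly at training time, so the model sees a slightly different version of each image in every epoch.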
Data suitability poses a second challenge. Akin to traditional
analytical methods, AI approaches need adequate study designs to yield
reliable outcomes, from data collection to the appropriate analytical
strategy. Training algorithms on unsuitable data may yield biased outcomes. For example, it is increasingly clear that AI and ML algorithms can ingrain racial bias when models are trained on racially imbalanced datasets.
Data availability may pose a third challenge, as data are often siloed within individual institutions, and curated, publicly accessible clinical datasets remain rare. Reasons for this include patient privacy concerns, a lack of data-sharing infrastructure, and competition among institutions. In immunology, efforts are being made to break open silos
and democratize datasets. Examples include the National Institutes of
Health (NIH)-curated resources on open-access COVID-19 data, or the
European Health Data Space for the safe exchange and reuse of health
data. These developments are aided by novel data sharing and integration
approaches, such as federated learning, where a model is centrally
trained while the data are kept locally. Recently, Swarm Learning was introduced: a decentralized machine-learning approach that does not require central coordination. Its developers demonstrated that swarm-trained models outperform models trained at individual sites in disease classification while retaining complete data confidentiality.
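The core federated-learning loop can be sketched as follows. This is a deliberately simplified sketch of federated averaging: sites hold hypothetical one-dimensional datasets, each takes a few local gradient steps on a single shared parameter, and the server averages the resulting parameters with equal weight (real implementations typically weight by site sample size and exchange full model weight vectors).

```python
# Sketch of federated averaging: each site updates the shared model on its
# private data; only the updated parameter leaves the site, never the data.

def local_update(weight, data, lr=0.1, steps=5):
    """A few gradient steps minimizing mean squared error on local data."""
    for _ in range(steps):
        grad = sum(2 * (weight - x) for x in data) / len(data)
        weight -= lr * grad
    return weight

def federated_round(global_weight, sites):
    """Each site trains locally; the server averages the returned weights."""
    local_weights = [local_update(global_weight, data) for data in sites]
    return sum(local_weights) / len(local_weights)

# Hypothetical per-institution datasets, never pooled centrally.
sites = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

w = 0.0
for _ in range(20):
    w = federated_round(w, sites)
print(round(w, 2))  # converges toward the average of the site optima
```

Swarm Learning follows the same "move the model, not the data" principle, but replaces the central averaging server with peer-to-peer coordination among the sites.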
Explainability
The lack of explainability of AI algorithms hampers clinical
implementation. Unlike statistical methods such as regression, which are
inherently explainable, the learned patterns of AI models are more
complex, and their estimated parameters are not directly interpretable
(Figure 3).
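The contrast with regression can be made concrete. In the minimal sketch below (with hypothetical data), an ordinary-least-squares fit yields two parameters that each carry a direct clinical reading, e.g., "one unit more of the biomarker adds roughly `slope` points to the score"; a deep network's millions of weights admit no such reading.

```python
# Why regression is called inherently explainable: each fitted coefficient
# has a direct interpretation in the units of the problem.

def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: biomarker level (x) versus symptom score (y).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]

intercept, slope = fit_simple_regression(xs, ys)
print(f"score = {intercept:.2f} + {slope:.2f} * biomarker")
```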