Evaluation of the Quality and Reliability of ChatGPT-4's Responses on
Allergen Immunotherapy Using Validated Tools
Abstract
Background: Artificial Intelligence (AI) technologies could
change many aspects of clinical practice. While Allergen
Immunotherapy (AIT) can change the course of allergic diseases, providing
symptom relief that extends for many years after treatment
completion, it can also bring uncertainty to patients, who turn to
readily available resources such as ChatGPT-4 to address these doubts.
The aim of this study was to use validated tools to evaluate the
information provided by ChatGPT-4 regarding AIT in terms of quality,
reliability and readability. Methods: In accordance with AIT
clinical guidelines, 24 questions were selected and entered into
ChatGPT-4. Answers were evaluated by a panel of allergists using the
validated tools DISCERN, the JAMA Benchmark, and the Flesch Reading Ease
Score and Grade Level. Results: Questions were sorted into 6
categories. ChatGPT-4 provided poor-quality information according to
median DISCERN scores in the “Definition”, “Standardization and
Efficacy”, and “Safety and Adverse Reactions” categories. It provided
insufficient information according to the JAMA Benchmark across all
categories. Finally, ChatGPT-4's answers were very difficult to read,
requiring a “college graduate” level of education to be understood.
Conclusions: ChatGPT-4 exhibits potential as a valuable
complement to healthcare; however, it requires further refinement. The
information it provides should be approached with caution regarding its
quality, as significant details may be omitted or may not be fully
comprehensible. Artificial intelligence models continue to evolve and
affect various aspects of life, including health; medical professionals
should participate in this process to ensure the availability of optimal
information.