Evaluation of the Quality and Reliability of ChatGPT-4's Responses on Allergen Immunotherapy Using Validated Tools

Ivan Cherrez-Ojeda; Torsten Zuberbier; Gabriela Rodas-Valero; Jorge Sanchez; Michael Rudenko; Stephanie Dramburg; Pascal Demoly; Davide Caimmi; Maximiliano Gómez; German Ramón; Ghada Fouda E; Kim Quimby R; Herberto Chong-Neto; Oscar Calderón; Jose Ignacio Larco; Olga Patricia Monge Ortega; Marco Faytong-Haro; Oliver Pfaar; Jean Bousquet; Karla Robles-Velasco

doi:10.22541/au.173538245.59171682/v1

Abstract

Background: Artificial intelligence (AI) technologies may change many aspects of clinical practice. Allergen immunotherapy (AIT) can modify the course of allergic diseases, providing symptom relief that persists for many years after treatment ends, but it can also leave patients with uncertainties that they try to resolve with readily available resources such as ChatGPT-4. The aim of this study was to use validated tools to evaluate the quality, reliability and readability of the information ChatGPT-4 provides about AIT.

Methods: In accordance with AIT clinical guidelines, 24 questions were selected and entered into ChatGPT-4. A panel of allergists evaluated the answers using the validated tools DISCERN, the JAMA Benchmark, and the Flesch Reading Ease Score and Grade Level.

Results: The questions were grouped into 6 categories. According to median DISCERN scores, ChatGPT-4 provided poor-quality information in the “Definition”, “Standardization and Efficacy”, and “Safety and Adverse Reactions” categories. According to the JAMA Benchmark, the information was insufficient in all categories. Finally, the answers were very difficult to read, requiring a “college graduate” level of education to be understood.

Conclusions: ChatGPT-4 shows potential as a valuable complement to healthcare, but it requires further refinement. Its information should be approached with caution, because significant details may be omitted or may not be fully comprehensible. As AI models continue to evolve and affect many aspects of life, including health, medical professionals should take part in this process to ensure that optimal information is available.
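The readability result rests on the Flesch Reading Ease (FRE) formula and a grade-level companion score. The abstract does not say which calculator or grade-level variant was used, so the sketch below assumes the widely used Flesch-Kincaid Grade Level and the conventional FRE interpretation bands, in which scores from 10 to 30 correspond to "very difficult" text at a "college graduate" level. The helper names (flesch_reading_ease, flesch_kincaid_grade, interpret_fre, FRE_BANDS) are hypothetical, and word, sentence and syllable counts are assumed to be supplied by the caller, since syllable counting is not implemented here.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (assumed variant): approximate U.S. school grade."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Conventional FRE bands, checked from easiest to hardest:
# (lower bound of the band, education level, difficulty label).
FRE_BANDS = [
    (90.0, "5th grade", "very easy"),
    (80.0, "6th grade", "easy"),
    (70.0, "7th grade", "fairly easy"),
    (60.0, "8th-9th grade", "plain English"),
    (50.0, "10th-12th grade", "fairly difficult"),
    (30.0, "college", "difficult"),
    (10.0, "college graduate", "very difficult"),
    (float("-inf"), "professional", "extremely difficult"),
]


def interpret_fre(score: float) -> tuple[str, str]:
    """Map an FRE score to its (education level, difficulty label) band."""
    for lower, level, label in FRE_BANDS:
        if score >= lower:
            return level, label
    # Only reachable for NaN, which compares False against every bound.
    raise ValueError("score must be a number")


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    fre = flesch_reading_ease(words=200, sentences=8, syllables=360)
    fkgl = flesch_kincaid_grade(words=200, sentences=8, syllables=360)
    print(round(fre, 1), round(fkgl, 1), interpret_fre(fre))
```

With these hypothetical counts (25 words per sentence, 1.8 syllables per word), the FRE is about 29.2 and the grade level about 15.4, which falls in the "college graduate" / "very difficult" band. Long sentences and polysyllabic vocabulary both push the score into that band, which is typical of technical medical prose.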