Objectives: This study aims to evaluate and compare the accuracy, comprehensiveness, and readability of responses generated by ChatGPT-3.5 and ChatGPT-4.0 to common patient questions about pharyngitis.

Design: A cross-sectional design was employed in which 30 potential patient questions about pharyngitis were posed to both ChatGPT-3.5 and ChatGPT-4.0. The questions were grouped into four categories: general information, diagnosis, treatment and management, and complications.

Setting: The study was conducted online, with responses evaluated independently using a standardized questionnaire.

Participants: Five ENT specialists with over 10 years of experience and three ENT residents with more than three years of experience graded the responses.

Main Outcome Measures: Accuracy was assessed with a four-point grading system. Readability was evaluated using the Flesch-Kincaid Grade Level and Flesch Reading Ease scores. Accuracy was compared between models with chi-square tests, and readability with independent-samples t-tests.

Results: ChatGPT-4.0 provided more comprehensive and accurate responses than ChatGPT-3.5, with a significant overall improvement across the full question set (p = 0.01); differences within individual question categories, however, did not reach significance (p > 0.05). Readability scores indicated that both versions required a high educational level to comprehend, with ChatGPT-4.0 showing marginally better readability (p < 0.05).

Conclusion: Although ChatGPT-4.0 was more accurate than ChatGPT-3.5, both versions produced text with high reading difficulty, underscoring the need for more accessible language in AI-generated medical content. Further research is needed to improve readability and to define the safe integration of AI in healthcare.
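For reference, the two readability indices named in the outcome measures are computed from average sentence length and average syllables per word. The expressions below are the standard published formulas, not values derived from this study:

```latex
% Flesch Reading Ease (higher scores indicate easier text)
\mathrm{FRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)

% Flesch-Kincaid Grade Level (approximates a U.S. school grade)
\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```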
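As a minimal sketch of the statistical comparison described above, the snippet below applies a chi-square test to accuracy-grade distributions and an independent-samples t-test to readability scores using scipy.stats. All counts and scores are invented placeholders for illustration, not the study's data:

```python
# Hypothetical illustration of the study's statistical tests; the data
# below are made-up placeholders, not actual study results.
from scipy.stats import chi2_contingency, ttest_ind

# Accuracy: counts of responses falling into each grade (1-4) per model.
# Rows: ChatGPT-3.5, ChatGPT-4.0; columns: grades 1..4 (hypothetical).
accuracy_counts = [
    [4, 8, 12, 6],   # ChatGPT-3.5
    [1, 4, 10, 15],  # ChatGPT-4.0
]
chi2, p_acc, dof, _ = chi2_contingency(accuracy_counts)
print(f"Accuracy (chi-square): chi2={chi2:.2f}, df={dof}, p={p_acc:.3f}")

# Readability: Flesch-Kincaid Grade Level per response (hypothetical scores).
fkgl_gpt35 = [14.2, 13.8, 15.1, 14.7, 13.9]
fkgl_gpt40 = [13.1, 12.9, 13.6, 12.4, 13.3]
t_stat, p_read = ttest_ind(fkgl_gpt35, fkgl_gpt40)
print(f"Readability (t-test): t={t_stat:.2f}, p={p_read:.3f}")
```

Under this framing, the chi-square test compares the distribution of accuracy grades between the two models, while the t-test compares their mean readability scores.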