Morteza Heydari et al.

Aims: The present study aims to assess the capabilities, limitations, and practical considerations of integrating large language models (LLMs) alongside pharmacists. Specifically, the performance of four LLMs was evaluated in responding to queries about oral medication dosage prescriptions across different age groups. Methods: The questions were categorized into seven domains, selected based on the most frequently prescribed drugs referenced in global guidelines. For each domain, three questions were designed, each accompanied by a case scenario. Prompts were written using the zero-shot method, and the collected responses were assessed against UpToDate on five key factors: response rate, accuracy, completeness, clarity, and safety. Results: None of the LLMs had direct access to UpToDate; nevertheless, GPT-4o responded correctly to all of the case-based questions. While GPT-4o achieved the highest performance, Copilot performed significantly worse (P<0.05). The lowest response rate was observed with Gemini 1.5 Pro, while Copilot ranked last overall. Additionally, all LLMs were safe except Copilot and Claude 3.5 Sonnet v2, which produced unsafe and potentially hazardous responses. Discussion: The findings underscore that, although the LLMs sometimes avoided answering or provided incomplete information, GPT-4o demonstrated promising performance in handling both simple and complex queries, whether direct or case-based. Our results highlight the need to address specific conditions before the wider integration of LLMs into pharmaceutical practice.
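The five-factor assessment described in the Methods could be operationalized roughly as follows. This is a minimal illustrative sketch only: the 0-1 grading scale, the per-question grade lists, and the equal-weight averaging are assumptions, not the authors' actual protocol, and the grades shown are invented, not data from the study.

```python
from statistics import mean

# The five assessment factors named in the study design.
CRITERIA = ["response_rate", "accuracy", "completeness", "clarity", "safety"]

def score_model(graded):
    """Average per-criterion grades into one overall score.

    `graded` maps each criterion to a list of per-question grades
    (hypothetical 0-1 scale) assigned by comparing the model's answer
    against the UpToDate reference.
    """
    return mean(mean(grades) for grades in graded.values())

# Illustrative grades only -- not results from the study.
models = {
    "GPT-4o":  {c: [1.0, 1.0, 0.9] for c in CRITERIA},
    "Copilot": {c: [0.6, 0.5, 0.4] for c in CRITERIA},
}

# Rank models from highest to lowest overall score.
ranking = sorted(models, key=lambda m: score_model(models[m]), reverse=True)
print(ranking)  # highest-scoring model first
```

In practice each criterion would likely be graded on its own rubric (e.g., safety as pass/fail) rather than pooled with a flat mean, but the sketch shows the basic compare-and-aggregate structure.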