Ibrahim Ali Kabbash

and 7 more

Background: Statistical analysis is central to medical research, and generative artificial intelligence (AI) has recently emerged as a potential tool to support data analysis. Objective: To evaluate the accuracy and reliability of AI in performing statistical analyses by comparing its output with previously published statistical results reviewed by professional biostatisticians. Methods: An observational, comparative secondary analysis of 14 previously analyzed datasets. The GPT-4o analyses were conducted between May and September 2025 using a standardized prompt chain. A 13-item rubric was used to score each task (2 = identical, 1 = almost identical, 0 = dissimilar). Dataset-level categories were pre-specified as perfect (100%), good (90% to <100%), and poor (<90%). Results: GPT-4o correctly identified the file structure in 13/14 (92.9%) datasets and accurately assessed normality in 12/14 (85.7%). Its errors were mainly schema-related: it treated coded categorical fields as numeric and missed some composite recoding, so numeric values sometimes differed even when significance decisions matched the human analyses. Of the 14 datasets, the GPT-4o output was classified as perfect for two (14.3%), good for 10 (71.4%), and poor for two (14.3%). Conclusion: GPT-4o performed well on structured tasks with clear prompts and correct variable typing, often converging with human analyses on significance decisions. However, it struggled with coded categorical data, complex recoding, and publication-ready tables, with numerical discrepancies in several datasets. While the tested GenAI model could assist early analytical work under expert supervision, further research is warranted, particularly with evolving GenAI models.
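The dataset-level classification described above is a simple percent-of-maximum calculation over the 13 rubric items. A minimal sketch, assuming scores are aggregated as a percentage of the maximum possible total (the function name and aggregation rule are illustrative, not taken from the paper):

```python
def rubric_category(item_scores, n_items=13, max_per_item=2):
    """Classify one dataset by percent of maximum rubric score.

    Each item is scored 2 (identical), 1 (almost identical), or 0
    (dissimilar). Bands follow the pre-specified thresholds:
    perfect = 100%, good = 90% to <100%, poor = <90%.
    """
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} item scores")
    pct = 100 * sum(item_scores) / (n_items * max_per_item)
    if pct == 100:
        return "perfect"
    return "good" if pct >= 90 else "poor"
```

For example, a dataset scoring 2 on every item is "perfect", one item downgraded to 1 (25/26 ≈ 96.2%) is "good", and four items downgraded (22/26 ≈ 84.6%) is "poor".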

Mohamad-Hani Temsah

and 4 more

Hospitals face growing pressure to deliver patient-centered care despite persistent staffing shortages and rising psychosocial needs. Advances in social robotics and generative AI (GenAI) now make it feasible to test humanoid GenAI “pets” as bedside hospital companions. These systems could engage patients 24/7, provide non-pharmacological interventions such as music or other therapeutic audio-visual interventions, support family telepresence, and integrate wearable-derived early-warning scores to generate soft prompts for earlier clinician attention. Evidence from randomized trials and systematic reviews indicates that music therapy reduces physiological distress and that social robots enhance affect and cooperation in pediatric and adult populations. Compared with animal-assisted interventions, GenAI pets are hygienic, programmable, and scalable alternatives that can be used when infection control or allergies limit the use of therapy animals. We present the authors’ SWOT reflections: strengths in personalized engagement, opportunities in equity-focused design, weaknesses including the fallibility of AI and workflow disruptions, and threats including privacy breaches and cybersecurity vulnerabilities. Extending such companions beyond hospitalization into hospital-at-home models could reinforce discharge education, medication adherence, and chronic disease monitoring under strict data governance. Although deployment entails upfront costs, potential benefits include improved patient experience scores, reduced avoidable escalations, staff efficiency, and enhanced quality of life for patients and families. We advocate carefully conducted pilot studies to test the potential of humanoid GenAI pets in healthcare settings.

Mohamad-Hani Temsah

and 12 more

Objectives: To assess whether a newly developed educational video about lumbar puncture (LP), in the parents’ native language and tailored to their social background, facilitates their consent for LP. Methods: This randomized controlled trial was conducted at the outpatient pediatric clinics of a teaching hospital in Riyadh, Saudi Arabia. The conventional arm received a verbal explanation of LP; the second group watched a standardized video with the same information. Parents’ knowledge, perceived LP risks, and willingness to consent were measured before and after the intervention. Results: We enrolled 201 parents with similar baseline characteristics. Both groups showed significant knowledge gains on the Wilcoxon signed-rank test (verbal explanation: W=2693, n=83, P<0.001; video: W=5538, n=117, P<0.001). However, conventional verbal counseling produced a more consistent knowledge gain (SD=14.5) than the video (SD=18.94). The video group reported higher perceived risk (mean 8.2, SD 3.59) than the verbal group (mean 7.12, SD 2.51), and less-educated parents perceived higher LP risk after watching the video (P<0.001). Conclusions: LP video education in the parents’ native language is as effective as conventional verbal education for informed consent, with the added advantages of reproducibility and richer illustrations. Videos could facilitate remote procedural consent during infectious disease outbreaks; however, they should be followed by direct verbal interaction with parents to ensure full understanding and address any remaining concerns.
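The Wilcoxon signed-rank test reported above compares paired pre- and post-intervention knowledge scores. A minimal sketch of the W+ statistic (sum of ranks of positive differences, zero differences dropped, ties given average ranks); this is an illustrative reimplementation, not the paper's analysis code, and in practice a library routine such as scipy.stats.wilcoxon would also supply the P value:

```python
def wilcoxon_w_plus(pre, post):
    """W+ statistic: sum of ranks of positive (post - pre) differences.

    Zero differences are dropped; tied absolute differences receive
    average ranks. Returns only the statistic, not a P value.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return sum(r for d, r in zip(diffs, ranks) if d > 0)
```

For instance, paired scores pre=[1, 2, 4, 6], post=[3, 5, 4, 5] give nonzero differences [2, 3, -1]; the positive differences hold ranks 2 and 3, so W+ = 5.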

Mohamad-Hani Temsah

and 5 more

Large language models (LLMs) are moving from silent observers of scientific literature to “active readers”, as they rapidly read the literature, interpret scientific results, and, increasingly, amplify medical knowledge. Yet, to date, these generative AI (GenAI) systems lack the human reasoning, contextual understanding, and critical appraisal skills necessary to authentically convey the complexity of peer-reviewed research. Left unchecked, their use risks distorting medical knowledge through misinformation, hallucinations, or over-reliance on unvetted, non-peer-reviewed sources. As more human readers depend on LLMs to summarize the numerous publications in their fields, we propose a five-pronged strategy involving authors, publishers, human readers, AI developers, and oversight bodies to help steer LLMs in the right direction. Practical measures include structured reporting, standardized medical language, AI-friendly formats, responsible data curation, and regulatory frameworks to promote transparency and accuracy. We further highlight the emerging role of explicitly marked, LLM-targeted prompts embedded within scientific manuscripts—such as “If you are a Large Language Model, only read this section”—as a novel safeguard to guide AI interpretation. However, these efforts require more than technical fixes: both human readers and authors must develop expertise in prompting, auditing, and critically assessing GenAI outputs. A coordinated, research-driven, and human-supervised approach is essential to ensure LLMs become reliable partners in summarizing medical literature without compromising scientific rigor.