\papertype Original Article
Large language models (LLMs), such as the GPT series, have recently emerged as transformative tools in medicine owing to their human-like language generation and understanding. This systematic review examines the evolution, applications, and challenges of medical LLMs in digital health and clinical technology. Following the PRISMA 2020 guidelines, a structured search covering 2007 to 2024 was conducted across ScienceDirect, PubMed, and Scopus, supplemented by manual sources. After inclusion and exclusion criteria were applied, 179 studies were selected from an initial pool of 698 papers. Among the 30 papers reviewed, most research centered on GPT-based models, with over 81% reporting strong performance in language generation, diagnostic assistance, and clinical documentation, as measured by automated metrics and human feedback. Notably, some models achieved satisfaction ratings of up to 90% among healthcare professionals. The findings indicate that LLMs have the potential to improve patient interaction, decision support, and overall healthcare efficiency. This review contributes by synthesizing key advancements, assessing model performance, and outlining ethical challenges such as trust, privacy, and safe deployment, offering insights for researchers and practitioners seeking to adopt or improve LLM integration in healthcare. Future directions include improving transparency, developing domain-specific models, and establishing regulatory frameworks for responsible use.