Natural language processing models, particularly those built on deep learning architectures, generate coherent and contextually relevant text, yet they remain prone to hallucinations: responses that are fluent but factually inaccurate or misleading. Analyzing a model's internal states to detect and mitigate hallucination risk offers a way to probe the decision-making processes of systems such as ChatGPT and Gemini. This study systematically evaluated the internal mechanisms that drive hallucinations by combining internal state analysis, anomaly detection, and fact-checking algorithms. The findings reveal distinct patterns and correlations between specific internal state configurations and hallucination instances, providing actionable guidance for improving model robustness. Comparative metrics show that although both models achieve high accuracy, differences in architecture and training data influence their susceptibility to hallucination. For model development, the results suggest optimizing attention mechanisms and integrating more diverse training data to improve reliability. Overall, the research contributes to the broader goal of building trustworthy AI systems capable of high-fidelity language generation across diverse applications.
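To make the internal-state-plus-anomaly-detection idea concrete, the sketch below shows one minimal way such a pipeline could look; it is not the study's actual implementation. Because the internal states of ChatGPT and Gemini are not publicly accessible, an open model (here, gpt2) stands in, and the choice of features (mean-pooled final-layer hidden states) and of detector (scikit-learn's IsolationForest) are illustrative assumptions only.

```python
# Minimal illustrative sketch, not the authors' pipeline: extract hidden states
# from an open stand-in model and score them with an off-the-shelf anomaly detector.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from sklearn.ensemble import IsolationForest

model_name = "gpt2"  # assumed stand-in; ChatGPT/Gemini internals are not publicly accessible
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def internal_state_features(texts):
    """Mean-pool each text's final-layer hidden states into one feature vector."""
    feats = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        last_hidden = out.hidden_states[-1]  # shape: (1, seq_len, hidden_dim)
        feats.append(last_hidden.mean(dim=1).squeeze(0).numpy())
    return feats

# Fit the detector on responses assumed to be factual, then flag candidates
# whose internal-state signatures look atypical (higher hallucination risk).
reference_responses = [
    "Paris is the capital of France.",
    "Water boils at 100 degrees Celsius at sea level.",
]
candidate_responses = ["The Eiffel Tower was built in 1720 by Leonardo da Vinci."]

detector = IsolationForest(random_state=0).fit(internal_state_features(reference_responses))
scores = detector.decision_function(internal_state_features(candidate_responses))
print(scores)  # lower scores indicate more anomalous internal states
```

In a fuller pipeline of the kind the study describes, the anomaly scores would be combined with a fact-checking step on the generated text, so that flagged responses are verified against external sources before being surfaced to users.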