Omar Shalash

and 3 more

Deception detection matters to everyone in everyday life because deception strongly affects human interaction. Multiple automatic lie detection systems exist, but their accuracy still needs to improve, and the lack of adequate, realistic datasets hinders the development of reliable systems. This paper presents a new multimodal dataset that combines physiological data (heart rate, galvanic skin response, and body temperature) with demographic data (age, weight, and height), collected from 49 unique subjects. It also presents a polygraph-based lie detection system that uses multimodal sensor fusion. Several machine learning algorithms are evaluated: the Random Forest classifier, with 100 estimators, achieves the highest accuracy at 97%, outperforming Logistic Regression (58%), Support Vector Machine (58%, with a perfect recall of 1.00), and k-Nearest Neighbors (83%). The Random Forest model reaches a precision and recall of 0.97 each, making it suitable for applications such as criminal investigations. The results reveal that demographic factors, such as weight and height, contribute more to the model's predictions than the physiological signals do. With a computation time of 0.06 seconds, Random Forest is also efficient enough for real-time use. Finally, k-fold cross-validation, combined with Grid Search and Particle Swarm Optimization (PSO) for hyperparameter tuning, reduced the gap between training and validation accuracy from several percentage points to under 1%, indicating improved generalization and reliability in real-world scenarios.
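The abstract reports accuracy, precision, and recall side by side, including one classifier with 58% accuracy and a recall of 1.00. The sketch below shows how the three metrics are computed from binary predictions and why they can diverge. Everything in it is an illustrative assumption: the helper `binary_metrics`, the label encoding (1 = deceptive), and the 58/42 class split are hypothetical and are not taken from the paper's dataset, whose class balance the abstract does not state.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels (1 = deceptive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels, NOT the paper's data: 58 deceptive and 42 truthful samples.
y_true = [1] * 58 + [0] * 42

# A degenerate classifier that flags every sample as deceptive.
always_deceptive = [1] * 100

metrics = binary_metrics(y_true, always_deceptive)
print(metrics)
```

Under these assumed labels, the always-positive classifier misses no deceptive sample, so its recall is perfect, while its accuracy and precision only match the share of deceptive samples. This is one way a perfect recall can coexist with low accuracy, which is why precision and recall are worth reading together with accuracy when comparing classifiers.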