Ziyun Wang et al.

Machine learning models trained on vast amounts of data face privacy challenges, particularly when sensitive information in the training set can be unintentionally retrieved through adversarial methods. This work explores machine unlearning as an effective mechanism for mitigating such privacy leakage by selectively removing the influence of specific sensitive data points from a trained model. Unlike traditional approaches that retrain from scratch or inject broad noise, machine unlearning applies targeted weight modifications that enhance privacy protection without significantly degrading model utility. Extensive experiments on the Llama model demonstrate that machine unlearning substantially reduces privacy risk, as evidenced by a significant drop in the retrieval of sensitive data. The results show a balanced reduction in privacy leakage alongside preserved performance on non-sensitive tasks, positioning unlearning as a viable method for addressing privacy concerns in machine learning systems. The work also provides insight into the trade-off between privacy and utility, emphasizing the need to optimize both in practical applications where data security and performance must coexist. Through this contribution, the study adds to the evolving understanding of how privacy can be embedded within AI systems while minimizing adverse impacts on functionality.
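To make the idea of "targeted weight modifications" concrete, the sketch below unlearns one training point from a logistic-regression model by taking gradient *ascent* steps on that point's loss, rather than retraining from scratch. This is a minimal, hypothetical illustration on a toy model, not the authors' method (the paper works with Llama); the dataset, learning rates, and step counts are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, x, y):
    # Gradient of the binary cross-entropy loss for a single example.
    return (sigmoid(x @ w) - y) * x

# Train on a small synthetic dataset (label depends on the first feature).
X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(200):
    w -= 0.1 * np.mean([grad(w, xi, yi) for xi, yi in zip(X, y)], axis=0)

# "Sensitive" point to forget: ascend its loss, a targeted weight
# modification that degrades the model's memory of this one example.
x_s, y_s = X[0], y[0]
p_before = sigmoid(x_s @ w)
for _ in range(20):
    w += 0.05 * grad(w, x_s, y_s)   # gradient ASCENT on the forget point
p_after = sigmoid(x_s @ w)

# The model's confidence on the forgotten point moves away from its label,
# while the small, localized update leaves most other predictions usable.
print(p_before, p_after)
```

In this toy setting, the loss on the forgotten point rises while accuracy on the remaining points stays largely intact, mirroring the privacy/utility balance the abstract describes; real LLM unlearning methods add safeguards (e.g. regularizing toward the original weights) to keep that balance at scale.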