Yoshiki Nishikado
The growing sophistication of adversarial attacks on machine learning models has raised concerns about the security and robustness of widely deployed systems. Modality fusion offers a defense mechanism that strengthens model resilience against reverse preference attacks, a class of adversarial manipulation that alters preference signals to subvert model behavior. By integrating multiple data modalities, the Mistral LLM was equipped to process a richer, more complex set of features, distributing the impact of adversarial interference across several input channels. Experiments demonstrated that this multi-modal approach not only improved accuracy but also markedly reduced performance degradation under high-intensity attack scenarios. Attention mechanisms further enabled the model to dynamically prioritize information based on context, improving its adaptability and real-time performance under adversarial conditions. Although modality fusion introduced a moderate increase in computational overhead, the corresponding gains in robustness, particularly in mitigating reverse preference attacks, made it an effective defense against adversarial threats. These findings underscore the critical role that multi-modal processing can play in securing machine learning models against increasingly sophisticated attacks while preserving performance.
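The core idea of attention-weighted modality fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the scaled dot-product scoring, and the embedding dimensions are all assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_modalities(embeddings, query):
    """Attention-weighted fusion of per-modality embeddings (illustrative).

    embeddings: (num_modalities, dim) array, one embedding per modality.
    query: (dim,) context vector used to score each modality.
    Returns the fused (dim,) embedding and the attention weights.
    """
    # Scaled dot-product scores, one per modality.
    scores = embeddings @ query / np.sqrt(embeddings.shape[-1])
    # Normalized weights: the model "prioritizes" modalities by context,
    # so a perturbed channel's influence is bounded by its weight.
    weights = softmax(scores)
    fused = weights @ embeddings
    return fused, weights

# Example: two modalities (e.g. text and image) with hypothetical
# 8-dimensional embeddings and a context-derived query vector.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 8))
query = rng.normal(size=(8,))
fused, w = fuse_modalities(emb, query)
```

Because the fused representation is a convex combination of per-modality embeddings, adversarial interference injected through one channel is diluted by the weights assigned to the unperturbed channels, which is the intuition behind the robustness gains reported above.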