Teon Volkova

Public Documents 1
A Novel Approach to Optimize Large Language Models for Named Entity Matching with Mon...
Teon Volkova
Evander Delacruz

and 2 more

September 17, 2024
Named entity matching plays a critical role in data integration, where the challenge is to accurately identify and link records that refer to the same real-world entity across disparate data sources. This work integrates Monte Carlo Tree Search (MCTS) into hyperparameter tuning for fine-tuning language models such as Mistral, with the aim of improving both the accuracy and the efficiency of entity matching. The research uses the Mistral LLM to identify subtle variations in ransomware-related entities and relies on MCTS to explore the hyperparameter space systematically in search of optimal model configurations. In extensive experiments, the fine-tuned model outperformed traditional approaches, including rule-based systems and support vector machines, in precision, recall, and F1-score. The study also includes a detailed error analysis and a sensitivity analysis, which highlight how strongly hyperparameter selection affects model performance and how MCTS streamlines the optimization process. The findings offer insight into the potential of combining advanced search techniques with language models to address complex entity matching tasks in various domains.
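
To make the idea of exploring a hyperparameter space with MCTS concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small discrete search space in which each tree level fixes one hyperparameter. The names SEARCH_SPACE, toy_score, and mcts, as well as the specific hyperparameters and values, are hypothetical. toy_score is a stand-in for what would in practice be a validation metric (for example F1) obtained from an actual fine-tuning run.

```python
import math
import random

# Hypothetical search space: each tree level fixes one hyperparameter.
SEARCH_SPACE = [
    ("learning_rate", [1e-5, 5e-5, 1e-4]),
    ("batch_size", [8, 16, 32]),
    ("epochs", [1, 2, 3]),
]


def toy_score(config):
    """Stand-in for a validation score; a real run would fine-tune and evaluate."""
    target = {"learning_rate": 5e-5, "batch_size": 16, "epochs": 3}
    hits = sum(config[name] == value for name, value in target.items())
    return hits / len(target)


class Node:
    def __init__(self, choices=(), parent=None):
        self.choices = choices  # values chosen so far, one per level
        self.parent = parent
        self.children = {}      # chosen value -> child node
        self.visits = 0
        self.total = 0.0

    def is_terminal(self):
        return len(self.choices) == len(SEARCH_SPACE)

    def untried(self):
        options = SEARCH_SPACE[len(self.choices)][1]
        return [v for v in options if v not in self.children]


def ucb1(parent, child, c=1.4):
    """Mean reward plus an exploration bonus (UCB1)."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def to_config(choices):
    return {name: value for (name, _), value in zip(SEARCH_SPACE, choices)}


def mcts(score_fn, iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node()
    best_config, best_score = None, -1.0
    for _ in range(iterations):
        node = root
        # Selection: follow UCB1 while the node is fully expanded.
        while not node.is_terminal() and not node.untried():
            node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
        # Expansion: add one untried value for the next hyperparameter.
        if not node.is_terminal():
            value = rng.choice(node.untried())
            node.children[value] = Node(node.choices + (value,), parent=node)
            node = node.children[value]
        # Simulation: fill the remaining hyperparameters at random.
        choices = list(node.choices)
        for _, options in SEARCH_SPACE[len(choices):]:
            choices.append(rng.choice(options))
        config = to_config(choices)
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best_config, best_score


if __name__ == "__main__":
    print(mcts(toy_score))
```

In this sketch each call to the scoring function corresponds to one full evaluation of a configuration, so with a real fine-tuning run the iteration budget would need to be far smaller than the 200 used here.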
