
LLM-assisted topic modeling for hate speech characterization
  • Alejandro Buitrago López (Universidad de Murcia)
  • Javier Pastor-Galindo (Universidad de Murcia)
  • José A. Ruipérez-Valiente (Universidad de Murcia)

Corresponding author: alejandro.buitragol@um.es

Abstract

In the digital era, the internet and social media have transformed communication but have also facilitated the spread of hate speech and disinformation, leading to radicalization, polarization, and toxicity. This is especially concerning for media outlets given their significant role in shaping public discourse. This study examines the topics, sentiments, and prevalence of hate in 337,807 response messages (website comments and tweets) to news published by five Spanish media outlets (La Vanguardia, ABC, El País, El Mundo, and 20 Minutos) in January 2021. These public reactions had previously been labeled by experts for distinct types of hate following a purpose-built procedure; here they are further classified into three sentiment values (negative, neutral, or positive) and main topics. The unsupervised BERTopic framework was used to extract 81 topics, which were manually named with the help of Large Language Models (LLMs) and grouped into nine primary categories. Results show social issues (22.22%), expressions and slang (20.35%), and political issues (11.80%) as the most discussed. Content is mainly negative (62.7%) or neutral (28.57%), with low positivity (8.73%). Toxic narratives relate to conversational expressions, gender, feminism, and COVID-19. Despite the low overall level of hate speech (3.98%), the study confirms high toxicity in online responses to social and political topics.
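For readers unfamiliar with the pipeline the abstract describes, the sketch below illustrates a typical BERTopic workflow of this kind. It is a minimal sketch under assumed settings, not the authors' exact configuration: the loader load_spanish_comments is hypothetical, and the multilingual embedding choice and min_topic_size value are illustrative.

    # Minimal sketch (assumed configuration, not the authors' exact pipeline):
    # extract topics from Spanish comments with BERTopic, then print the top
    # keywords per topic so an LLM or a human can assign a readable name.
    from bertopic import BERTopic

    # Hypothetical loader standing in for the study's 337,807 comments and
    # tweets; BERTopic needs at least a few hundred documents to cluster well.
    docs = load_spanish_comments()

    # "multilingual" selects a sentence-transformer embedding model that
    # handles Spanish; BERTopic then clusters the embeddings (UMAP + HDBSCAN
    # by default) and extracts c-TF-IDF keywords for each cluster.
    topic_model = BERTopic(language="multilingual", min_topic_size=50)
    topics, probs = topic_model.fit_transform(docs)

    # One row per topic: id, size, and a default keyword-based name. The top
    # keywords can be fed to an LLM prompt to propose a human-readable label.
    print(topic_model.get_topic_info().head(10))
    print(topic_model.get_topic(0))  # (keyword, c-TF-IDF score) pairs

The LLM-assisted step in the paper corresponds to turning each topic's keyword list into a descriptive name, which the authors then grouped manually into nine primary categories.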
22 Oct 2024: Submitted to Expert Systems
23 Oct 2024: Submission Checks Completed
23 Oct 2024: Assigned to Editor
30 Oct 2024: Reviewer(s) Assigned