
Recent advances in artificial intelligence have expanded the possibilities for processing complex multi-modal documents, yet maintaining contextual coherence and factual accuracy across modalities remains a significant challenge (Tu et al. 2024; Es et al. 2024; Klein et al. 2006a). Building on foundational work in graph-based retrieval, particularly Microsoft's GraphRAG (Edge et al. 2024) and the open-source LightRAG project (Guo et al. 2024), we present LuminiRAG, a system that advances document understanding through several key innovations in content extraction and query processing.

Our primary contributions are: (1) an intelligent content relevance assessment mechanism that selectively extracts meaningful information from multi-modal documents (Ram et al. 2023; Fan et al. 2024); (2) an enhanced vision-language processing pipeline that preserves semantic relationships between textual and visual elements (Chan et al. 2024); (3) a dynamic knowledge graph construction approach that extends GraphRAG principles to maintain cross-modal relationships (Petroni et al. 2021); and (4) a query processing engine that handles complex, multi-hop queries through adaptive semantic thresholding and comprehensive response verification (Gao et al. 2023).

Through extensive evaluation on diverse financial-domain document sets, including financial reports, transaction documents, and contracts, LuminiRAG demonstrates substantial improvements over traditional RAG approaches (Es et al. 2024; Salemi and Zamani 2024). The gains are largest on complex queries involving tabular data, cross-references, and multi-modal content relationships, establishing new benchmarks in document understanding while maintaining practical processing efficiency.
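To make contribution (4) concrete, the sketch below illustrates one plausible reading of adaptive semantic thresholding; it is not the system's actual implementation, and the function name `adaptive_threshold_retrieve` and parameters `alpha` and `min_keep` are illustrative assumptions. The idea shown is that the similarity cutoff is derived per query from the candidate score distribution rather than fixed globally, so diffuse multi-hop queries retain more candidates for downstream response verification.

```python
# Hypothetical sketch of adaptive semantic thresholding (not LuminiRAG's
# actual code): the cutoff adapts to each query's similarity-score
# distribution instead of using one global threshold.
import numpy as np

def adaptive_threshold_retrieve(query_vec: np.ndarray,
                                doc_vecs: np.ndarray,
                                alpha: float = 0.5,
                                min_keep: int = 1) -> list[int]:
    """Return indices of documents whose cosine similarity to the query
    exceeds a per-query threshold derived from the score distribution."""
    # Cosine similarity between the query and every candidate document.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q

    # Adaptive cutoff: mean plus a fraction of the standard deviation.
    # A query with one dominant match yields a high cutoff; a diffuse
    # multi-hop query keeps more candidates for later verification.
    cutoff = scores.mean() + alpha * scores.std()
    keep = np.where(scores >= cutoff)[0]

    # Floor: always return at least `min_keep` top-scoring documents.
    if len(keep) < min_keep:
        keep = np.argsort(scores)[::-1][:min_keep]
    return sorted(keep.tolist(), key=lambda i: -scores[i])

# Usage: random vectors stand in for a real embedding model.
rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 64))
query = docs[2] + 0.1 * rng.normal(size=64)  # query near document 2
print(adaptive_threshold_retrieve(query, docs))
```

A distribution-derived cutoff of this kind trades a little precision on easy queries for higher recall on multi-hop ones, where the supporting evidence is spread across several weakly scoring chunks.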