AUTHOREA
SAAD BELEFQIH

Semantic Schema Extraction in NoSQL Databases using BERT Embedding
SAAD BELEFQIH, AHMED ZELLOU, and 2 more

June 09, 2024
NoSQL databases are valued for their flexibility and scalability, but their schema-less nature makes analytics difficult. Automatic schema extraction is therefore crucial, yet existing techniques handle nested structures poorly. Building on advances in Natural Language Processing (NLP), this paper introduces a novel approach based on BERT embeddings for extracting schemas from NoSQL databases. The method analyzes semantic relationships among triplets extracted from JSON documents in four stages: triplet extraction, preprocessing, BERT embedding generation, and similarity analysis. Evaluation on real datasets shows over 83% accuracy in extracting valid nested schema components. The study also highlights an interdisciplinary intersection: NLP techniques can reveal structure in data that lacks an explicit schema, pointing to significant potential for autonomous schema extraction from raw, unstructured data formats.
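The abstract names the four stages but not their internals, so the following is only a minimal sketch of how such a pipeline could be wired together, under loudly stated assumptions. A "triplet" is assumed here to be (parent path, key, value type). Preprocessing is assumed to split camelCase and snake_case keys into words. To keep the example dependency-free, a character-trigram count vector stands in for the BERT embedding; this is a swapped-in technique, not the paper's. The functions `extract_triplets`, `preprocess`, `embed`, `cosine` and `similar_fields`, and the 0.8 threshold, are hypothetical illustrations rather than the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations
from typing import Any, Iterator

# ASSUMPTION: triplet = (parent path, key, value type name); the paper's exact shape is not stated.
Triplet = tuple[str, str, str]


def extract_triplets(doc: Any, parent: str = "$") -> Iterator[Triplet]:
    """Stage 1: walk a JSON value, yielding one triplet per object field (recursing into nesting)."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield (parent, key, type(value).__name__)
            yield from extract_triplets(value, f"{parent}.{key}")
    elif isinstance(doc, list):
        for item in doc:  # all array elements share one "[]" path segment
            yield from extract_triplets(item, f"{parent}[]")


def preprocess(key: str) -> str:
    """Stage 2 (assumed): split camelCase / snake_case / kebab-case keys into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", key)
    return " ".join(w.lower() for w in re.split(r"[_\-\s]+", spaced) if w)


def embed(text: str) -> Counter:
    """Stage 3 STAND-IN: character-trigram counts, NOT a BERT embedding."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_fields(docs: list[Any], threshold: float = 0.8) -> list[tuple[str, str, str, float]]:
    """Stage 4: report same-path, type-compatible keys whose embeddings are similar."""
    types: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for doc in docs:
        for parent, key, vtype in extract_triplets(doc):
            types[parent][key].add(vtype)
    matches = []
    for parent in sorted(types):
        for a, b in combinations(sorted(types[parent]), 2):
            if not types[parent][a] & types[parent][b]:
                continue  # never merge, e.g., a string field with an object field
            score = cosine(embed(preprocess(a)), embed(preprocess(b)))
            if score >= threshold:
                matches.append((parent, a, b, score))
    return matches


if __name__ == "__main__":
    docs = [
        {"customer_name": "Ada", "address": {"zipCode": "10115"}},
        {"customerName": "Bob", "address": {"zip_code": "75001"}},
    ]
    for parent, a, b, score in similar_fields(docs):
        print(f"{parent}: {a} ~ {b} ({score:.2f})")
```

Run on the two sample documents, this prints `$: customerName ~ customer_name (1.00)` and `$.address: zipCode ~ zip_code (1.00)`, i.e., variant spellings of the same field are grouped at each nesting level. Trigrams only catch surface variants; replacing `embed` with BERT embeddings, as the abstract describes, is what may let semantically related but differently spelled keys be matched as well.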
