
Understanding and tracing semantics of concepts to application domains emerging from source code, documentation, and tests
  • Zaki Pauzi,
  • Andrea Capiluppi,
  • Cezar Sas
Zaki Pauzi
Rijksuniversiteit Groningen

Corresponding Author: a.z.bin.mohamad.pauzi@rug.nl

Andrea Capiluppi
Rijksuniversiteit Groningen
Cezar Sas
Rijksuniversiteit Groningen

Abstract

As software artifacts continuously evolve and grow in number, the trace links between them become more complex, which increases the need for automated traceability. Besides tracing components across different artifacts, tracing to application domains is critical for understanding how semantics are classified and how well they are covered (i.e., which application domains are present in each artifact). In this paper, we propose using natural language processing (NLP) to map concepts emerging from software artifacts to application domains, and to trace these concepts between artifacts. We extracted corpus keywords from source code, documentation, and tests, and ran an optimised Latent Dirichlet Allocation (LDA) model to generate the concepts emerging from each artifact. We then calculated the similarity score of each concept against each application domain and ranked the differences in these scores between pairs of artifacts. Results show that the ranking of the inverse of this difference represents the strength of semantic tracing, and that different embeddings yield varying results. We observed that the method is strongly applicable and can be replicated by other researchers and practitioners, particularly for detecting synchronised application domains that are traced between artifacts.
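A minimal sketch of the scoring and ranking step summarised in the abstract follows. It is not the authors' implementation. It assumes that each artifact has already been reduced to a single concept embedding (for example, by averaging the embeddings of its top LDA keywords) and that each application domain also has an embedding. The toy three-dimensional vectors, the function names `cosine`, `domain_scores`, and `rank_tracing`, the choice of cosine similarity, and the small `eps` guard against division by zero are all illustrative assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def domain_scores(concept_vec, domain_vecs):
    """Similarity of one concept embedding against every application domain."""
    return {d: cosine(concept_vec, v) for d, v in domain_vecs.items()}


def rank_tracing(artifact_concepts, domain_vecs, eps=1e-9):
    """Rank (artifact pair, domain) combinations by the inverse of their score difference.

    Assumption: a smaller difference between two artifacts' similarity to the same
    domain means a stronger semantic trace, so 1 / (difference + eps) is ranked
    in descending order. eps is a hypothetical guard against division by zero.
    """
    scores = {a: domain_scores(v, domain_vecs) for a, v in artifact_concepts.items()}
    ranking = []
    for a, b in combinations(sorted(scores), 2):
        for d in domain_vecs:
            diff = abs(scores[a][d] - scores[b][d])
            ranking.append((1.0 / (diff + eps), a, b, d))
    ranking.sort(reverse=True)  # strongest trace first
    return ranking


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would come from a word-embedding model.
    domains = {"web": [1.0, 0.0, 0.0], "database": [0.0, 1.0, 0.0]}
    artifacts = {
        "code": [1.0, 0.0, 0.0],
        "docs": [1.0, 0.0, 0.0],
        "tests": [0.0, 1.0, 0.0],
    }
    for strength, a, b, d in rank_tracing(artifacts, domains)[:3]:
        print(f"{a} <-> {b} on '{d}': inverse difference {strength:.3g}")
```

In this toy run, `code` and `docs` have identical similarity profiles, so their pairs rank first for both domains. Swapping in different embeddings changes the domain vectors and therefore the ranking, which is consistent with the abstract's observation that results vary across embeddings.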
27 Aug 2024: Submitted to Journal of Software: Evolution and Process
28 Aug 2024: Submission Checks Completed
28 Aug 2024: Assigned to Editor
12 Dec 2024: Reviewer(s) Assigned