Variability in data transformation: towards data migration product lines

David Romero-Organvidez, David Benavides, Jose-Miguel Horcas, María Teresa Gómez-López

Technical Track

Location: Haus der Universität, Schlösslistrasse 5, 3008 Bern, Switzerland
7 February 2024, 11:20 CET
Speaker: David Romero-Organvidez
Discussant: Paul Grünbacher
https://dl.acm.org/doi/10.1145/3634713.3634724

Software evolution often requires data management and, more concretely, data migration. Data migration follows an ETL process: extracting (E) data from a source, transforming (T) the data depending on migration needs, and loading (L) the data into a target data store. Data migration projects are recognised to be complex and challenging to manage, which can lead to resource loss and planning delays. Among the reasons for data migration project failure is the lack of systematic artifact reuse (e.g., transformation scripts) in the data migration process. Every new data migration project is often developed from scratch. Software product line (SPL) engineering has been applied in many different domains to systematically reuse artifacts (e.g., code platforms, test cases) in software development processes, and there are many positive experiences of applying SPL to reduce cost and time. In this paper, we present an approach that uses SPL techniques for data migration projects, specifically in the data transformation stage. Our solution facilitates the automated creation of scripts that can be reused in different data migration projects. The feasibility of the proposal is validated in the domain of web information systems modernization. The validation shows how various migration scripts can be created to transform data between different content management systems. This work opens new opportunities for studying the synergies between SPL and data migration. To the best of our knowledge, this is the first proposal that uses a complete SPL stack to materialize the reuse of artifacts for data migration.
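
To make the idea of reusing transformation artifacts more concrete, the following minimal Python sketch shows one way a transformation script could be derived from a selection of features. It is an illustration under stated assumptions, not the authors' implementation: the feature names (`strip_whitespace`, `rename_title`, `normalize_status`), the record fields (`post_title`, `post_status`), and the helper functions `derive_transformation` and `migrate` are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Transform = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Trim leading and trailing whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def rename_title(record: Record) -> Record:
    """Rename the hypothetical source field 'post_title' to 'title'."""
    out = dict(record)
    if "post_title" in out:
        out["title"] = out.pop("post_title")
    return out


def normalize_status(record: Record) -> Record:
    """Replace the hypothetical 'post_status' field with a boolean 'published'."""
    out = dict(record)
    out["published"] = out.pop("post_status", None) == "publish"
    return out


# Reusable transformation assets, keyed by (hypothetical) feature name.
FEATURES: Dict[str, Transform] = {
    "strip_whitespace": strip_whitespace,
    "rename_title": rename_title,
    "normalize_status": normalize_status,
}


def derive_transformation(selected: List[str]) -> Transform:
    """Compose the selected features, in order, into one transformation."""
    unknown = [name for name in selected if name not in FEATURES]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    steps = [FEATURES[name] for name in selected]

    def transform(record: Record) -> Record:
        for step in steps:
            record = step(record)
        return record

    return transform


def migrate(source: Iterable[Record], transform: Transform) -> List[Record]:
    """Extract records from the source, transform each, and load into a list."""
    return [transform(record) for record in source]


if __name__ == "__main__":
    source = [{"post_title": "  Hello ", "post_status": "publish"}]
    script = derive_transformation(
        ["strip_whitespace", "rename_title", "normalize_status"]
    )
    print(migrate(source, script))
```

In this sketch a different migration project would reuse the same assets and change only the list of selected features, which is the kind of reuse the abstract attributes to the SPL approach. A real product line would also need a variability model with constraints between features, which is omitted here.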