Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons
Abstract
:1. Introduction
- ETL process modeling based on UML;
- ETL process modeling based on ontology;
- ETL process modeling based on MDA;
- ETL process modeling based on graphical flow which includes BPMN, CPN, YAWL, and the data visualization flow;
- ETL process modeling based on ad hoc formalisms, which include conceptual constructs, CommonCube, and EMD;
- ELT process modeling approaches for Big Data.
- We perform an exhaustive study through a systematic literature review in the data warehousing modeling field;
- We propose a new classification system for ETL/ELT process modeling approaches;
- We identify a set of comparison criteria, on which we based our literature review;
- We define and compare the existing categories of approaches;
- We investigate the new trends of ETL/ELT, specifically in the context of Big Data warehousing;
- Finally, we provide a set of recommendations and an example for comparative study.
2. Comparison Criteria and Features for Modeling Data Warehousing Processes
3. Summary and Comparison of ETL/ELT Process Modeling Approaches
3.1. Proposed Classification of ETL/ELT Process Modeling Approaches
3.2. ETL Process Modeling Approaches Based on UML
3.2.1. Summary of ETL Process Modeling Approaches Based on UML
3.2.2. Comparison of UML-Based Approaches
3.3. ETL Process Modeling Approaches Based on Ontology
3.3.1. Summary of ETL Process Modeling Approaches Based on Ontology
3.3.2. Comparison of Ontology-Based Approaches
- Reusability: According to the authors, the proposed model (or a part of it) is reusable.
- Formal specification: In our context, this is the definition of requirements, tasks, and data schemas in a formal way, by defining a vocabulary and expressions dedicated to these purposes. Formal specification is used too much in ontology-based modeling to formalize the developed ontologies. Moreover, this method allows for simplification of the presented model and facilitates its understanding.
- Business requirement.
- The type of ontology: domain ontology or application ontology. The application ontology models the useful knowledge for specific applications and, according to [46], should provide the ability for modeling various types of information, including the concepts of the domain, the relationships between those concepts, the attributes characterizing each concept and, finally, the different representation formats and (ranges of) values for each attribute. In contrast, the domain ontology is a more general ontology, which may pre-exist and may be developed independently of the data repositories. It enables the reuse, organization, and communication of knowledge and semantics between information users and providers [59].
- The type of data heterogeneity treated: structural heterogeneity, semantic heterogeneity, or both. In [44], it was considered that structural heterogeneity arises from data in information systems being stored in different structures, such that they need homogenization; while semantic heterogeneity considers the intended meaning of the information items. In order to achieve semantic interoperability in a heterogeneous information system, the meaning of the interchanged information must be understood across the systems.
- The proposed ontological approach, either based on a single-ontology approach, a multiple-ontology approach, or a hybrid approach. According to [60], single-ontology approaches use one global ontology to provide a shared vocabulary for the specification of the semantics. All information sources are related to one global ontology. In multiple-ontology approaches, each information source is described by its separate ontology. In principle, the source ontology can be a combination of several other ontologies, but the fact that the different source ontologies share the same vocabulary is not guaranteed. In hybrid approaches, the semantics of each source is described by its ontology, but all of the local ontologies use the shared global vocabulary. Each type of approach has advantages and disadvantages. More details are provided in [60].
3.4. ETL Process Modeling Approaches Based on MDA
- CIM: A computation-independent model is placed on the top of the architecture, which is used only to describe the system requirements. This model helps to present exactly what the system is expected to do. It is also known in the literature as a “domain model” or “business model”.
- PIM: A platform-independent model is a model of a sub-system that contains no information specific to the platform or the technology used to realize it [61].
- PSM: A platform-specific model is a model of a sub-system that includes information about the specific technology for its implementation on a specific platform and, hence, possibly contains elements specific to the platform [61].
- QVT: Query, view, transformation is a Meta-Object Facility (MOF) standard for specifying model transformations [63]. The QVT language can ensure the formal transformations between the different models of MDA layers (CIM, PIM, and PSM).
- Code: An interpretation of the PSM model already obtained can be used to generate an application code and execute it using an appropriate tool.
3.4.1. Summary of ETL Process Modeling Approaches Based on MDA
3.4.2. Comparison of MDA-Based Approaches
3.5. ETL Process Modeling Approaches Based on Graphical Flow Formalism
3.5.1. Summary of ETL Process Modeling Approaches Based on BPMN
3.5.2. Summary of ETL Process Modeling Approaches Based on CPN
3.5.3. Summary of ETL Process Modeling Approaches Based on YAWL
3.5.4. Summary of ETL Process Modeling Approaches Based on Data Flow Visualization
3.5.5. Comparison of Graphical Flow Formalism-Based Approaches
3.6. ETL Process Modeling Approaches Based on Ad Hoc Formalisms
3.6.1. Summary of ETL Process Modeling Approaches Based on CommonCube
3.6.2. Summary of ETL Process Modeling Approaches Based on EMD
3.6.3. Comparison of Ad Hoc Formalism-Based Approaches
3.7. ELT Process Modeling Approaches for Big Data
3.7.1. Summary of ELT Process Modeling for Big Data
3.7.2. Comparison of ELT Process Modeling Approaches for Big Data
4. Discussion and Findings
- The modeling methods based on standard modeling languages for the software development, such as UML or BPEL, or based on the standard notation BPMN, were confirmed to be powerful methods, as they favor standardization of the ETL workflow design. In addition, these standard-based methods are easy to implement, as recognized tools support them; moreover, their validation and evaluation will be straightforward. First, UML is over-demanded, used, and counted among the first standard modeling languages, which make it possible to produce good documentation on its various diagrams, and several use cases were provided, which saves new users time and effort when deploying it. Second, it can be exploited by commercial tools, as long as it is a standard technology. More generally, the documentation provided with a standard languages facilitates user comprehension and handling, even if they are not an experienced designer. Third, UML provides a set of packages that decompose the design of an ETL process into simple sub-processes (i.e., different logical units), thus facilitating the creation of the ETL model and, subsequently, the maintenance of the ETL process, regardless of its degree of complexity. However, despite the efforts conducted in [22], in terms of proposing an extension mechanism that allows the UML to model the transformations of the ETL at the low “attribute” level, according to other authors [6,34,39], this gap still presents a constraint to them. They considered that modeling based on the UML at the attribute level will lead to overly complicated models, unlike if we use conceptual constructs to conceptually model the elements involved in the ETL process, as mentioned in [77,89].
- Several researchers favored the use of ontologies for data warehousing modeling, for various reasons: First, they can identify the schema of the data source and DW, enrich the metadata, and interchange these metadata among repositories [35,106]. Therefore, the supporting data classification, visual representation, and documentation are good. Second, according to [49,107], the use of an ontology is the best method for capturing the domain model’s semantics and resolving the semantic problems of both heterogeneity and interoperability. Third, by using an ontology, it is possible to define how two concepts of an ontology are structurally related, the type of relationship they have, and whether the relationship is symmetric, reflexive, or transitive. This is the way in which [45] defined the semantic integration of disparate data sources. Forth, they provide an explicit and formal representation, with well-defined semantics that allow for automated reasoning on metadata, including inference rules to derive new information from the available data [18,44,45]. Nevertheless, among the limits of semantic modeling, resolution of the heterogeneity of data sources, particularly semantic resolution, and mapping between these sources are very complex tasks. Furthermore, based on the OWL language, the ETL model can be redefined and reused during different stages of a DW design; however, this solution applies only to relational databases and does not support the semi-structured and unstructured data that the DW can receive. Indeed, according to [108], “In separated operational data sources, the syntax and semantics of data sources are extremely heterogeneous. In the ETL process, to establish a relationship between semantically similar data, the mapping between these sources can hardly be fully resolved by fixed met0amodels or frameworks”.
- As for CWM, from the literature, Simitsis [109] deduced that “There does not exist common model for the metadata of ETL processes and CWM is not sufficient for this purpose, and it is too complicated for real-world applications”. In addition, according to [49], the CWM is more appropriate for resolving schema conflicts than the underlying semantics of the domain being modeled, which leads us to deduce that this standard should always be coupled with other methods focusing on semantic integration, such as ontologies, as proposed in [49].
- According to [62], MDA models can represent systems at any level of abstraction or from different points of view, ranging from enterprise architectures to technological implementations. Further, from one PIM, one or more PSM can be derived by applying appropriate transformations. Therefore, the advantages of separating business logic and technology in the MDA by providing different layers (e.g., CIM, PIM, PSM, and code) lead toward interoperable, reusable, and portable software components and data models based on standard models [61]. In this context, from the comparison in Table 4, we noted that the studied works based on the MDA tended to model the three levels: conceptual, logical, and physical. Moreover, as previously mentioned, all contributions met the “QVT” criteria to ensure the transformations between the different MDA layers. Finally, the primary strength of MDA-based methods is the automated transformation of models to implementations through the use of model-to-text (M2T) transformations, which automatically generate code from models. This automatic code generation seems simple overall, but relying on reliable patterns and referring to rich and constantly updated libraries is necessary. Moreover, according to [110], this task is comparable to manual development of the ETL procedure.
- The BMPN is advantageous, thanks to the clarity and simplicity of notations for process representation and its powerful expressiveness, based on the use of a palette of conceptual tools to express business processes in an intuitive language. In addition to its description of the characterizations of ETL activities, it can express data and control objects, which are indispensable for the synchronization of the transformation flows. Moreover, the BPMN can be used to create a platform-independent conceptual model of an ETL workflow. We found works coupling BPMN and MDA or MDD for data warehouse modeling, such as the proposed framework of [25], which was summarized in Section 3.5.1. Furthermore, BPMN is a formalism that relies on business requirements to model the ETL at a conceptual level. Finally, enterprise processes based on BPMN are designed uniformly, making communication between them easy.
- The use of patterns is also interesting. Indeed, [28] mentioned, in their work, that the use of ETL patterns in workflow systems contexts provides a way to specify and share communication protocols, increases the data interchange across systems, and allows for the integration of new ETL patterns. Hence, they can be used and reused according to the needs of a practical application scenario [28], consequently reducing potential design errors and both simplifying and alleviating the task of implementing ETL systems.
- Big Data characteristics—In our case, we were dealing with data from Twitter and other websites, allowing for tracking of the evolution of the COVID-19 pandemic and vaccination campaigns; therefore, we were dealing with massive volumes of data from different sources (massive volume, variability).
- The type of data gathered—We collected CSV files and, hence, the data type was structured.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Inmon, W.H. Building the Data Warehouse, 1st ed.; John Wiley & Sons. Inc.: Hoboken, NJ, USA, 1996. [Google Scholar]
- Vassiliadis, P. Data Warehouse Modeling And Quality Issues; National Technical University of Athens Zographou: Athens, Greece, 2000. [Google Scholar]
- Inmon, W.H. Building the Data Warehouse, 3rd ed.; Wiley: New York, NY, USA, 2002. [Google Scholar]
- Kakish, K.; Kraft, T.A. ETL evolution for real-time data warehousing. In Proceedings of the Conference on Information Systems Applied Research, New Orleans, LA, USA, 1–4 November 2012; Volume 2167, p. 1508. [Google Scholar]
- Kimball, R.; Reeves, L.; Ross, M.; Thornthwaite, W. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses; Wiley: New York, NY, USA, 1998. [Google Scholar]
- Trujillo, J.; Luján-Mora, S. A UML based approach for modeling ETL processes in data warehouses. In Proceedings of the International Conference on Conceptual Modeling, Chicago, IL, USA, 13–16 October 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 307–320. [Google Scholar]
- Singh, J. ETL methodologies, limitations and framework for the selection and development of an ETL tool. Int. J. Res. Eng. Appl. Sci. 2016, 6, 108–112. [Google Scholar]
- Muñoz, L.; Mazón, J.N.; Trujillo, J. Systematic review and comparison of modeling ETL processes in data warehouse. In Proceedings of the 5th Iberian Conference on Information Systems and Technologies, Santiago de Compostela, Spain, 16–19 June 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–6. [Google Scholar]
- Laney, D. 3D Data Management: Controlling Data Volume, Velocity and Variety; Lakshen, G.A., Ed.; Meta Group: Menlo Park, CA, USA, 2001; pp. 1–4. [Google Scholar]
- Jo, J.; Lee, K.W. MapReduce-based D_ELT framework to address the challenges of geospatial Big Data. ISPRS Int. J. Geo-Inf. 2019, 8, 475. [Google Scholar] [CrossRef]
- Cottur, K.; Gadad, V. Design and Development of Data Pipelines. Int. Res. J. Eng. Technol. (IRJET) 2020, 7, 2715–2718. [Google Scholar]
- Fang, H. Managing data lakes in Big Data era: What’s a data lake and why has it became popular in data management ecosystem. In Proceedings of the 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Shenyang, China, 8–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 820–824. [Google Scholar]
- Demarest, M. The Politics of Data Warehousing. June 1997, Volume 6, p. 1998. Available online: http://www.hevanet.com/demarest/marc/dwpol.html (accessed on 29 April 2022).
- March, S.T.; Hevner, A.R. Integrated decision support systems: A data warehousing perspective. Decis. Support Syst. 2007, 43, 1031–1043. [Google Scholar] [CrossRef]
- Solomon, M.D. Ensuring A Successful Data Warehouse Initiative. Inf. Syst. Manag. 2005, 22, 26–36. [Google Scholar] [CrossRef]
- Muñoz, L.; Mazon, J.N.; Trujillo, J. ETL process Modeling Conceptual for Data Warehouses: A Systematic Mapping Study. IEEE Lat. Am. Trans. 2011, 9, 358–363. [Google Scholar] [CrossRef]
- Oliveira, B.; Belo, O. Approaching ETL processes Specification Using a Pattern-Based ontology. In Data Management Technologies and Applications; Francalanci, C., Helfert, M., Eds.; Series Title: Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2017; Volume 737, pp. 65–78. [Google Scholar] [CrossRef]
- Ali, S.M.F.; Wrembel, R. From conceptual design to performance optimization of ETL workflows: Current state of research and open problems. VLDB J. 2017, 26, 777–801. [Google Scholar] [CrossRef]
- Jindal, R.; Taneja, S. Comparative study of data warehouse design approaches: A survey. Int. J. Database Manag. Syst. 2012, 4, 33. [Google Scholar] [CrossRef]
- Nabli, A.; Bouaziz, S.; Yangui, R.; Gargouri, F. Two-ETL Phases for Data Warehouse Creation: Design and Implementation. In Advances in Databases and Information Systems; Tadeusz, M., Valduriez, P., Bellatreche, L., Eds.; Series Title: Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9282, pp. 138–150. [Google Scholar] [CrossRef]
- Chandra, P.; Gupta, M. Comprehensive survey on data warehousing research. Int. J. Inf. Technol. 2017, 10, 217–224. [Google Scholar] [CrossRef]
- Luján-Mora, S.; Vassiliadis, P.; Trujillo, J. Data mapping diagrams for data warehouse design with UML. In Proceedings of the International Conference on Conceptual Modeling, Shangai, China, 8–12 November 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 191–204. [Google Scholar]
- Bellatreche, L.; Khouri, S.; Berkani, N. Semantic Data Warehouse Design: From ETL to Deployment à la Carte. In Database Systems for Advanced Applications; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Series Title: Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7826, pp. 64–83. [Google Scholar] [CrossRef]
- Mazón, J.N.; Trujillo, J. An MDA approach for the development of data warehouses. Decis. Support Syst. 2008, 45, 41–58. [Google Scholar] [CrossRef]
- El Akkaoui, Z.; Zimányi, E.; Mazón, J.N.; Trujillo, J. A BPMN-Based Design and Maintenance Framework for ETL processes. Int. J. Data Warehous. Min. 2013, 9, 46–72. [Google Scholar] [CrossRef]
- Oliveira, B.; Belo, O. From ETL Conceptual Design to ETL Physical Sketching using Patterns. In Proceedings of the 20th International Conference on Enterprise Information Systems, Madeira, Portugal, 21–24 March 2018; pp. 262–269. [Google Scholar] [CrossRef]
- Silva, D.; Fernandes, J.M.; Belo, O. Assisting data warehousing populating processes design through modelling using coloured petri nets. In Proceedings of the 3rd Industrial Conference on Simulation and Modeling Methodologies, Technologies and Applications, Reykjavik, Iceland, 29–31 July 2013. [Google Scholar]
- Belo, O.; Cuzzocrea, A.; Oliveira, B. Modeling and supporting ETL processes via a pattern-oriented, task-reusable framework. In Proceedings of the 2014 IEEE 26th International Conference on Tools with Artificial Intelligence, Limassol, Cyprus, 10–12 November 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 960–966. [Google Scholar]
- Dupor, S.; Jovanovic, V. An approach to conceptual modelling of ETL processes. In Proceedings of the 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 26–30 May 2014; IEEE: Opatija, Croatia, 2014; pp. 1485–1490. [Google Scholar] [CrossRef]
- Bala, M.; Boussaid, O.; Alimazighi, Z. A Fine-Grained Distribution Approach for ETL processes in Big Data Environments. Data Knowl. Eng. 2017, 111, 114–136. [Google Scholar] [CrossRef]
- Li, Z.; Sun, J.; Yu, H.; Zhang, J. CommonCube-based Conceptual Modeling of ETL processes. In Proceedings of the 2005 International Conference on Control and Automation, Budapest, Hungary, 26–29 June 2005; IEEE: Budapest, Hungary, 2005; Volume 1, pp. 131–136. [Google Scholar] [CrossRef]
- El-Sappagh, S.H.A.; Hendawi, A.M.A.; El Bastawissy, A.H. A proposed model for data warehouse ETL processes. J. King Saud Univ. Comput. Inf. Sci. 2011, 23, 91–104. [Google Scholar] [CrossRef]
- Muñoz, L.; Mazón, J.N.; Pardillo, J.; Trujillo, J. Modelling ETL processes of data warehouses with UML activity diagrams. In Proceedings of the OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”, Monterrey, Mexico, 9–14 November 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 44–53. [Google Scholar]
- Mallek, H.; Walha, A.; Ghozzi, F.; Gargouri, F. ETL-web process modeling. In Proceedings of the ASD Advances on Decisional Systems Conference, Hammamet, Tunisia, 29–31 May 2014. [Google Scholar]
- Biswas, N.; Chattopadhyay, S.; Mahapatra, G.; Chatterjee, S.; Mondal, K.C. SysML Based Conceptual ETL process Modeling. In Computational Intelligence, Communications, and Business Analytics; Mandal, J.K., Dutta, P., Mukhopadhyay, S., Eds.; Series Title: Communications in Computer and Information Science; Springer: Singapore, 2017; Volume 776, pp. 242–255. [Google Scholar] [CrossRef]
- Ambler, S. A UML Profile for Data Modeling. 2002. Available online: http://www.agiledata.org/essays/umlDataModelingProfile.html (accessed on 29 April 2022).
- Naiburg, E.; Naiburg, E.J.; Maksimchuck, R.A. UML for Database Design; Addison-Wesley Professional: Boston, MA, USA, 2001. [Google Scholar]
- Rational Rose 2000e: Rose Extensibility User’s Guide; Rational Software Corporation: San Jose, CA, USA, 2000.
- Muñoz, L.; Mazón, J.N.; Trujillo, J. Automatic generation of ETL processes from conceptual models. In Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP—DOLAP ’09, Hong Kong, China, 6 November 2009; p. 33. [Google Scholar] [CrossRef]
- Biswas, N.; Chattapadhyay, S.; Mahapatra, G.; Chatterjee, S.; Mondal, K.C. A New Approach for Conceptual extraction-transformation-loading process Modeling. Int. J. Ambient Comput. Intell. 2019, 10, 30–45. [Google Scholar] [CrossRef]
- Guarino, N. Formal ontology in Information Systems. In Proceedings of the First International Conference (FOIS’98), Trento, Italy, 6–8 June 1998; IOS Press: Amsterdam, The Netherlands, 1998. [Google Scholar]
- Skoutas, D.; Simitsis, A.; Sellis, T. ontology-Driven Conceptual Design of ETL processes Using Graph transformations. In Journal on Data Semantics XIII; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Series Title: Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5530, pp. 120–146. [Google Scholar] [CrossRef]
- Jovanovic, P.; Romero, O.; Simitsis, A.; Abelló, A. Requirement-Driven Creation and Deployment of Multidimensional and ETL Designs. In Advances in Conceptual Modeling; Hutchison, D., Kanade, T., Kittler, J., Kleinberg, J.M., Mattern, F., Mitchell, J.C., Naor, M., Nierstrasz, O., Pandu Rangan, C., Steffen, B., et al., Eds.; Series Title: Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7518, pp. 391–395. [Google Scholar] [CrossRef]
- Skoutas, D.; Simitsis, A. Designing ETL processes using semantic web technologies. In Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP—DOLAP ’06, Atlanta, GA, USA, 17–21 October 2022; ACM Press: Arlington, VR, USA, 2006; p. 67. [Google Scholar] [CrossRef]
- Deb Nath, R.P.; Hose, K.; Pedersen, T.B. Towards a programmable semantic extract-transform-load framework for semantic data warehouses. In Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, Atlanta, GA, USA, 17–21 October 2022; pp. 15–24. [Google Scholar]
- Skoutas, D.; Simitsis, A. ontology-Based Conceptual Design of ETL processes for Both Structured and Semi-Structured Data. Int. J. Semant. Web Inf. Syst. 2007, 3, 1–24. [Google Scholar] [CrossRef]
- Hoang, A.D.T.; Nguyen, B.T. An Integrated Use of CWM and Ontological Modeling Approaches towards ETL processes. In Proceedings of the 2008 IEEE International Conference on e-Business Engineering, Xi’an, China, 22–24 October 2008; IEEE: Xi’an, China, 2008; pp. 715–720. [Google Scholar] [CrossRef]
- Oliveira, B.; Belo, O. An ontology for Describing ETL Patterns Behavior. In Proceedings of the 5th International Conference on Data Management Technologies and Applications, Lisbon, Portugal, 24–26 July 2016; pp. 102–109. [Google Scholar] [CrossRef]
- Thi, A.D.H.; Nguyen, B.T. A Semantic approach towards CWM-based ETL processes. Proc. I-SEMANTICS 2008, 8, 58–66. [Google Scholar]
- TPC-H Homepage. Available online: http://www.tpc.org/tpch/ (accessed on 10 April 2022).
- Chang, D.D.T. Common Warehouse Metamodel (CWM), UML and XML. In Proceedings of the Meta Data Conference, 19–23 March 2000; p. 56. Available online: https://cwmforum.org/cwm.pdf (accessed on 27 July 2022).
- Ontology Definition Metamodel; OMG Object Management Group: Needham, MA, USA, 2014; p. 362.
- Romero, O.; Abelló, A. A framework for multidimensional design of data warehouses from ontologies. Data Knowl. Eng. 2010, 69, 1138–1157. [Google Scholar] [CrossRef]
- Romero, O.; Simitsis, A.; Abelló, A. GEM: Requirement-driven generation of ETL and multidimensional conceptual designs. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Toulouse, France, 29 August–2 September 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 80–95. [Google Scholar]
- TPC-DS Homepage. Available online: https://www.tpc.org/tpcds/ (accessed on 10 April 2022).
- Khouri, S.; El Saraj, L.; Bellatreche, L.; Espinasse, B.; Berkani, N.; Rodier, S.; Libourel, T. CiDHouse: Contextual SemantIc Data WareHouses. In Database and Expert Systems Applications; Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; pp. 458–465. [Google Scholar] [CrossRef]
- Lehigh University Benchmark (LUBM). Available online: http://swat.cse.lehigh.edu/projects/lubm/ (accessed on 10 April 2022).
- Deb Nath, R.P.; Hose, K.; Pedersen, T.B.; Romero, O. SETL: A programmable semantic extract-transform-load framework for semantic data warehouses. Inf. Syst. 2017, 68, 17–43. [Google Scholar] [CrossRef]
- Mena, E.; Kashyap, V.; Illarramendi, A.; Sheth, A. Domain specific ontologies for semantic information brokering on the global information infrastructure. In Formal Ontology in Information Systems; IOS Press: Amsterdam, The Netherlands, 1998; Volume 46, pp. 269–283. [Google Scholar]
- Wache, H.; Voegele, T.; Visser, U.; Stuckenschmidt, H.; Schuster, G.; Neumann, H.; Hübner, S. Ontology-based integration of information-a survey of existing approaches. In Proceedings of the IJCAI-01 Workshop: Ontologies and Information Sharing, Seattle, WA, USA, 4–6 August 2001. [Google Scholar]
- Miller, J.; Mukerji, J. MDA Guide Version 1.0.1; OMG: Needham, MA, USA, 2003; p. 62. [Google Scholar]
- MDA Specifications|Object Management Group. 2014. Available online: https://www.omg.org/mda/specs.htm (accessed on 10 April 2022).
- Gardner, T.; Griffin, C.; Koehler, J.; Hauser, R. A review of OMG MOF 2.0 Query/Views/transformations Submissions and Recommendations towards the final Standard. In Proceedings of the MetaModelling for MDA Workshop, York, UK, 24–25 November 2003; Citeseer: Princeton, NJ, USA, 2003; Volume 13, p. 41. [Google Scholar]
- Mazon, J.N.; Trujillo, J.; Serrano, M.; Piattini, M. Applying MDA to the development of data warehouses. In Proceedings of the 8th ACM international workshop on Data warehousing and OLAP—DOLAP, Bremen, Germany, 31 October 31–5 November 2005; p. 57. [Google Scholar] [CrossRef]
- Maté, A.; Trujillo, J. A trace metamodel proposal based on the model driven architecture framework for the traceability of user requirements in data warehouses. Inf. Syst. 2012, 37, 753–766. [Google Scholar] [CrossRef]
- Maté, A.; Trujillo, J. Tracing conceptual models’ evolution in data warehouses by using the model driven architecture. Comput. Stand. Interfaces 2014, 36, 831–843. [Google Scholar] [CrossRef]
- Didonet, M.; Fabro, D.; Bézivin, J.; Valduriez, P. Weaving Models with the Eclipse AMW plugin. In Proceedings of the Eclipse Modeling Symposium, Eclipse Summit Europe, Esslingen, Germany, 11–12 October 2006. [Google Scholar]
- Mazón, J.N.; Trujillo, J.; Serrano, M.; Piattini, M. Designing data warehouses: From business requirement analysis to multidimensional modeling. In Proceedings of the International Workshop on Requirements Engineering for Business. Need and IT Alignment (REBNITA 2005), Paris, France, 29–30 August 2005; University of New South Wales Press: Kensington, Australia, 2005; Volume 5, pp. 44–53. [Google Scholar]
- Jouault, F.; Kurtev, I. Transforming models with ATL. In Proceedings of the Satellite Events at the MoDELS 2005 Conference, Montego Bay, Jamaica, 2–7 October 2005; Springer: Berlin/Heidelberg, Germany, 2006; Volume 43, p. 45. [Google Scholar]
- El Akkaoui, Z.; Zimanyi, E. Defining ETL worfklows using BPMN and BPEL. In Proceedings of the ACM twelfth international workshop on Data warehousing and OLAP—DOLAP ’09, Hong Kong, China, 6 November 2009; p. 41. [Google Scholar] [CrossRef]
- Akkaoui, Z.E.; Mazón, J.N.; Vaisman, A.; Zimányi, E. BPMN-based conceptual modeling of ETL processes. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Vienna, Austria, 3–6 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–14. [Google Scholar]
- El Akkaoui, Z.; Vaisman, A.; Zimányi, E. A Quality-based ETL Design Evaluation Framework. In Proceedings of the 21st International Conference on Enterprise Information Systems, Heraklion, Crete, Greece, 3–5 May 2019; pp. 249–257. [Google Scholar] [CrossRef]
- Wilkinson, K.; Simitsis, A.; Castellanos, M.; Dayal, U. Leveraging business process models for ETL design. In Proceedings of the International Conference on Conceptual Modeling, Vancouver, BC, Canada, 1–4 November 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 15–30. [Google Scholar]
- Jensen, K.; Kristensen, L.M. Coloured Petri Nets: Modelling and Validation of Concurrent Systems; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Pan, B.; Zhang, G.; Qin, X. Design and realization of an ETL method in business intelligence project. In Proceedings of the 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 20–22 April 2018; pp. 275–279. [Google Scholar] [CrossRef]
- Vassiliadis, P.; Simitsis, A.; Skiadopoulos, S. Conceptual modeling for ETL processes. In Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP—DOLAP ’02, McLean, VR, USA, 8 November 2002; pp. 14–21. [Google Scholar] [CrossRef]
- Vassiliadis, P.; Simitsis, A.; Skiadopoulos, S. Modeling ETL activities as graphs. In Proceedings of the Design and Management of Data Warehouses, Toronto, ON, Canada, 27 May 2002; Volume 58, pp. 52–61. [Google Scholar]
- Vassiliadis, P.; Simitsis, A.; Georgantas, P.; Terrovitis, M. A Framework for the Design of ETL Scenarios. In Proceedings of the International Conference on Advanced Information Systems Engineering, Klagenfurt/Velden, Austria, 16–20 June 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 520–535. [Google Scholar]
- Vassiliadis, P.; Vagena, Z.; Skiadopoulos, S.; Karayannidis, N.; Sellis, T. Arktos: Towards the modeling, design, control and execution of ETL processes. Inf. Syst. 2001, 26, 537–561. [Google Scholar] [CrossRef]
- Simitsis, A.; Vassiliadis, P. A Methodology for the Conceptual Modeling of ETL processes. In Proceedings of the Conference on Advanced Information Systems Engineering (CAiSE), Klagenfurt/Velden, Austria, 16–20 June 2003; p. 12. [Google Scholar]
- Bala, M.; Alimazighi, Z. ETL-XDesign: Outil d’aide à la modélisation de processus ETL. In Proceedings of the 6éme édition des Avancées sur les Systèmes Décisionnels, Blida, Algeria, 1–3 April 2012; pp. 155–166. [Google Scholar] [CrossRef]
- Bala, M.; Boussaid, O.; Alimazighi, Z. P-ETL : Parallel-ETL based on the MapReduce paradigm. In Proceedings of the IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA), Doha, Qatar, 10–14 November 2014; pp. 42–49. [Google Scholar] [CrossRef]
- Bala, M.; Boussaid, O.; Alimazighi, Z. extracting-transforming-loading Modeling Approach for Big Data Analytics. Int. J. Decis. Support Syst. Technol. 2016, 8, 50–69. [Google Scholar] [CrossRef]
- Bala, M.; Boussaid, O.; Alimazighi, Z. Big-ETL: Extracting transforming loading approach for Big Data. In Proceedings of the International Conference on Parallel and Distributed processing Techniques and Applications (PDPTA), Las Vegas, NV, USA, 27–30 July 2015; p. 462. [Google Scholar] [CrossRef]
- Kabiri, A.; Chiadmi, D. KANTARA: A Framework to Reduce ETL Cost and Complexity. Int. J. Eng. Technol. (IJET) 2016, 8, 1280–1284. [Google Scholar]
- Kabiri, A.; Wadjinny, F.; Chiadmi, D. Towards a Framework for Conceptual Modeling of ETL processes. In Innovative Computing Technology; Pichappan, P., Ahmadi, H., Ariwa, E., Eds.; Series Title: Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 241, pp. 146–160. [Google Scholar] [CrossRef]
- Kabiri, A.; Chiadmi, D. A method for modelling and organazing ETL processes. In Proceedings of the Second International Conference on the Innovative Computing Technology (INTECH 2012), Casablanca, Morocco, 18–20 September 2012; pp. 138–143. [Google Scholar] [CrossRef]
- Boshra, A.H.E.B.M.; Hendawi, R.A.M. Entity mapping diagram for modeling ETL processes. In Proceedings of the Third International Conference on Informatics and Systems (INFOS), Giza, Egypt, 19–22 March 2005. [Google Scholar]
- Hendawi, A.M.; Sappagh, S.H.A.E. EMD: Entity mapping diagram for automated extraction, transformation, and loading processes in data warehousing. Int. J. Intell. Inf. Database Syst. 2012, 6, 255. [Google Scholar] [CrossRef]
- Jamra, H.A.; Gillet, A.; Savonnet, M.; Leclercq, E. Analyse des discours sur Twitter dans une situation de crise. In Proceedings of the INFormatique des ORganisations et des Systèmes d’Information et de Décision (INFORSID), Dijon, France, 2–4 June 2020; p. 16. [Google Scholar]
- Basaille, I.; Kirgizov, S.; Leclercq, E.; Savonnet, M.; Cullot, N.; Grison, T.; Gavignet, E. Un observatoire pour la modélisation et l’analyse des réseaux multi-relationnels. Doc. Numérique 2017, 20, 101–135. [Google Scholar]
- Moalla, I.; Nabli, A.; Hammami, M. Towards Opinions analysis method from social media for multidimensional analysis. In Proceedings of the 16th International Conference on Advances in Mobile Computing and Multimedia, Yogyakarta, Indonesia, 19–21 November 2018; pp. 8–14. [Google Scholar] [CrossRef]
- Walha, A.; Ghozzi, F.; Gargouri, F. Design and Execution of ETL process to Build Topic Dimension from User-Generated Content. In Proceedings of the International Conference on Research Challenges in Information Science, Online, 11–14 May 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 374–389. [Google Scholar]
- Walha, A.; Ghozzi, F.; Gargouri, F. From user generated content to social data warehouse: Processes, operations and data modelling. Int. J. Web Eng. Technol. 2019, 14, 203. [Google Scholar] [CrossRef]
- Bruchez, R. Les Bases de Données NoSQL et le BigData: Comprendre et Mettre en Oeuvre; Editions Eyrolles: Paris, France, 2015. [Google Scholar]
- Gallinucci, E.; Golfarelli, M.; Rizzi, S. Approximate OLAP of document-oriented databases: A variety-aware approach. Inf. Syst. 2019, 85, 114–130. [Google Scholar] [CrossRef]
- Mallek, H.; Ghozzi, F.; Teste, O.; Gargouri, F. BigDimETL with NoSQL Database. Procedia Comput. Sci. 2018, 126, 798–807. [Google Scholar] [CrossRef]
- Yangui, R.; Nabli, A.; Gargouri, F. ETL based framework for NoSQL warehousing. In Proceedings of the European, Mediterranean, and Middle Eastern Conference on Information Systems, Coimbra, Portugal, 7–8 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 40–53. [Google Scholar]
- Souibgui, M.; Atigui, F.; Yahia, S.B.; Si-Said Cherfi, S. Business intelligence and analytics: On-demand ETL over document stores. In Proceedings of the International Conference on Research Challenges in Information Science, Limassol, Cyprus, 23–25 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 556–561. [Google Scholar]
- Salinas, S.O.; Nieto Lemus, A.C. Data Warehouse and Big Data Integration. Int. J. Comput. Sci. Inf. Technol. 2017, 9, 1–17. [Google Scholar] [CrossRef]
- Munshi, A.A.; Mohamed, Y.A.R.I. Data lake lambda architecture for smart grids Big Data analytics. IEEE Access 2018, 6, 40463–40471. [Google Scholar] [CrossRef]
- Pal, G.; Li, G.; Atkinson, K. Multi-Agent Big-Data Lambda Architecture Model for E-Commerce Analytics. Data 2018, 3, 58. [Google Scholar] [CrossRef]
- Antoniu, G.; Costan, A.; Pérez, M.; Stojanovic, N. The Sigma Data processing Architecture. In Proceedings of the Leveraging Future Data for Extreme-Scale Data Analytics to Enable High-Precision Decisions, Big Data and Extreme Scale Computing 2nd Series, (BDEC2), Bloomington, IN, USA, 28–30 November 2018. [Google Scholar]
- Gillet, A.; Leclercq, E.; Cullot, N. Evolution et formalisation de la Lambda Architecture pour des analyses a hautes performances-Application aux donnees de Twitter. Rev. Ouvert. De L’Ingenierie Des Syst. D’Information (ROISI) 2021, 2, 26. [Google Scholar] [CrossRef]
- Warren, J.; Marz, N. Big Data: Principles and Best Practices of Scalable Realtime Data Systems; Simon and Schuster: New York, NY, USA, 2015. [Google Scholar]
- Pardillo, J.; Mazon, J.N. Using Ontologies for the Design of Data Warehouses. Int. J. Database Manag. Syst. 2011, 3, 73–87. [Google Scholar] [CrossRef]
- Ta’a, A.; Abdullah, M.S. ontology development for ETL process design. In Ontology-Based Applications for Enterprise Systems and Knowledge Management; IGI Global: Pennsylvania, PA, USA, 2013; pp. 261–275. [Google Scholar]
- Hofferer, P. Achieving business process model interoperability using metamodels and ontologies. In Proceedings of the ECIS 2007, St. Gallen, Switzerland, 7–9 June 2007. [Google Scholar]
- Simitsis, A. Modeling and Optimization of Extraction-Transformation-Loading (ETL) Processes in Data Warehouse Environments. Ph.D. Thesis, National Technical University of Athens, Athens, Greece, 2004. [Google Scholar]
- Samoylov, A.; Tselykh, A.; Sergeev, N.; Kucherova, M. Review and analysis of means and methods for automatic data extraction from heterogeneous sources. In Proceedings of the IV International Research Conference “Information Technologies in Science, Management, Social Sphere and Medicine” (ITSMSSM), Tomsk, Russia, 5–8 December 2017. [Google Scholar] [CrossRef]
- Dhaouadi, A.; Bousselmi, K.; Monnet, S.; Gammoudi, M.M.; Hammoudi, S. A Multi-layer Modeling for the Generation of New Architectures for Big Data Warehousing. In Proceedings of the International Conference on Advanced Information Networking and Applications, Sydney, Australia, 13–15 April 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 204–218. [Google Scholar]
Criteria | Value | Definition | Relevance |
---|---|---|---|
Standard formalism | A formalism can be a tool or a framework already tested and validated by the domain community. | To allow avoiding the multiplicity of formalisms and proprietary notations, reducing misunderstanding, and facilitating interoperability. | |
Graphical notations/symbols | Graphical shapes and notations often grouped in palettes and used to model ETL activities, objects, and data sources. | Relevant for communication, as they are more readable and understandable by human beings. | |
Modeling level | Conceptual, logical, physical | Conceptual, logical, and physical data models are the three levels of data modeling. | Taking these three models jointly into account can ensure a consistent DW process. |
Modeled phase | Extract, transform, load | Extract, transform, load are the three main phases of the DW process. | Each phase represents a central stage in the process of designing a DW. |
transformation level | Attribute, entity | The level at which the ETL transformation activities are effected. They can be at the entity level or at a lower level (attribute). | Shows how much the approach focuses on the detail of the modeling. |
Data source storage schema | Illustrates the details of the data source structures involved in the data warehousing process. | The data source schemas must be well-defined, in order to ensure their integration into the DW. | |
DW data storage schema | Defines the physical storage of the DW, depending on the target platform (e.g., relational, MD, OO). | The schema of the data target must be well-defined, in order to facilitate the mapping task with the source schema. | |
Mapping (schema/diagram) | A schema translation is used to map the source schema to the DW schema. It can be of diagram or schema form. | This inter-schema mapping allows us to understand the transition steps between the source and the target, and visually summarizes the mapping. | |
ETL meta-model | The process meta-model defines generic entities involved in all DW processes. | Enables managing extensibility at the meta-layer and adaptability at the model layer for a specific DW. | |
Prototype/modeling tool | A framework or tool is provided to implement the proposed model. | Shows the feasibility of the proposed model. | |
Integrated approach | The proposed ETL model is integrated into a global approach for the design of a DW. | Provides a consolidated view of the integration of the model into an end-to-end data warehousing process. | |
Rules/techniques/ algorithms of transformations | The means used to ensure the transition between the different levels of modeling: from conceptual to logical and from logical to physical. | Enriches the proposed model by means of inter-level transformation. Provides detailed insight into the technique deployed for transitions. | |
ETL activities described | The ETL activities described by the model. | Provides insight into the activities supported by the model and the application of the proposed approach. | |
Data type | Structured, semi-structured, unstructured | The type of data supported by the model. | Provides an idea regarding the complexity of data processing that the model will have to support. |
Mapping/ transformation technique | Manual, semiautomatic, automatic | A mapping technique is a process for creating a link between two distinct data models (source and target). It can be manual, semiautomatic, or automatic. | It is important in the logical process modeling phase in order to ensure a consistent DW process. |
Entity relationship | The relationships between the different entities presented in the data source storage schema and DW storage schema. | Highlights the different relationships, thus reinforcing understanding. | |
Approach validation | Propose validation of the proposed approach through a detailed case study and the proposal of a prototype or a framework. | Ensures the feasibility and implementation of the model in a concrete scenario. | |
Approach evaluation (Benchmark) | Conduct an experimental evaluation after the validation of the approach by use of a concrete use-case; for example, in order to check some performance parameters. In our context, the evaluation can be carried out through a benchmark. | Allows for recognition of the effort made by the researchers to verify the deployment of the proposed model and its features. | |
Interoperability | The interaction of the process model with the physical layer. | Provides insight into the deployment of the model. | |
Extensibility | Projects an idea about the capabilities of the model to support new features, such as adding new ETL tasks to be performed or changing data types. | Allows us to determine the scalability of the proposed model. | |
Explicit definition of transformation | Explicitly detail the tasks performed in the “transform” phase of an ETL process. | Facilitates understanding and implementation of the model. | |
Layered architecture/workflow | The model is composed of several layers, from which we can instantiate a multi-layer architecture. Each layer presents a level of modeling or a step of the process for the case of a workflow. | Allows us to identify the different layers and steps in terms of the modeling levels: conceptual, logical, and physical. Workflows allow for the orchestration of tasks and modularization of the data warehousing process model. | |
Workflow management | Describes the workflow management. | Facilitates understanding of the workflow, as well as its inputs and outputs. | |
GUI support | The proposed model supports a graphical user interface. | Displays that the model is exploitable. | |
ETL process requirement | Describes the specifications required by the ETL process for model design and implementation. | Provides an idea about the required environment and the functional requirements of the process. | |
Comprehensive tracking and documentation | A detailed description of all the tasks and steps supported by the ETL process, in addition to rich documentation. | Facilitates its familiarity, comprehension, and the deployment step. |
Approach | [6] | [22] | [33] | [39] | [34] | [35] | [40] | |
---|---|---|---|---|---|---|---|---|
Criteria | ||||||||
Standard formalism | X | X | X | X | X | X | ||
Graphical notations | X | X | ||||||
Modeling level | Conceptual | X | X | X | X | X | X | X |
Logical | X | X | X | X | ||||
Physical | X | X | ||||||
Modeled phase | Extract | X | X | X | ||||
Transform | X | X | X | X | X | |||
Load | X | X | X | |||||
Transformation level | Attribute | X | ||||||
Entity | X | X | X | X | ||||
Data source storage schema | X | X | ||||||
DW data storage schema | X | X | X | |||||
Mapping (schema/diagram) | X | X | X | |||||
Mapping technique | Manual | X | X | |||||
Semiautomatic | ||||||||
Automatic | ||||||||
ETL meta-model | X | |||||||
Prototype/modeling tool | X | X | ||||||
Integrated approach | X | X | ||||||
Rules/techniques/algorithms of transformations | X | X | ||||||
Automatic transformation | X | X | ||||||
ETL activities described | 10 | 3 | 10 | 9 | 3 | 2 | 2 | |
Data type | Structured | X | X | X | X | X | X | |
Semi-structured | ||||||||
Unstructured | X | |||||||
Entity relationship | X | X | ||||||
Approach validation | X | X | X | |||||
Approach evaluation (benchmark) | ||||||||
Interoperability | X | X | X | |||||
Extensibility | X | X | X | |||||
Explicit definition of transformation | X | X | ||||||
Layered architecture | X | X | ||||||
Workflow management | X | X | ||||||
GUI support | X | |||||||
ETL process requirement | X | X | ||||||
Dynamic aspect | X | X | X | |||||
Class diagram | X | X | ||||||
Activity diagram | X | X | X | X | X | |||
Requirement diagram | X | X | ||||||
Object diagram | X |
Approach | [42,44,46] | [49] | [47] | [43] | [23] | [45] | [17,48] | |
---|---|---|---|---|---|---|---|---|
Criteria | ||||||||
Standard formalism | X | X | ||||||
Graphical notations/symbols | X | |||||||
Modeling level | Conceptual | X | X | X | X | |||
Logical | X | X | X | X | X | X | ||
Physical | X | X | X | X | X | X | X | |
Modeled Pphase | Extract | X | X | X | X | |||
Transform | X | X | X | X | X | X | ||
Load | X | X | X | X | ||||
Transformation level | Attribute | X | X | X | ||||
Entity | X | X | X | X | X | X | ||
Data source storage schema | X | X | X | X | ||||
DW data storage schema | X | X | X | X | ||||
Mapping (schema/diagram) | X | X | X | |||||
Mapping technique | Manual | X | X | |||||
Semiautomatic | X | X | ||||||
Automatic | X | X | X | |||||
ETL meta-model | X | X | X | |||||
Prototype/modeling tool | X | X | X | X | X | X | ||
Integrated approach | X | X | X | X | ||||
Rules/techniques/algorithm of transformations | X | X | X | X | ||||
Automatic transformation | X | X | X | |||||
ETL activities described | 13 | 1 | 1 | NA | 10 | NA | 10 | |
Automatic transformation | X | X | X | |||||
Data type | Structured | X | X | X | X | X | X | |
Semi-structured | X | X | X | |||||
Unstructured | X | |||||||
Entity relationship | X | X | X | X | X | X | ||
Approach validation | X | X | X | X | X | |||
Approach evaluation (benchmark) | X | X | X | X | ||||
Interoperability | X | X | X | X | X | |||
Extensibility | X | X | X | X | X | X | ||
Explicit definition of transformation | X | X | X | |||||
Layered architecture/workflow | X | X | X | X | X | |||
Workflow management | X | X | ||||||
GUI support | X | X | X | |||||
ETL process requirement | X | |||||||
Comprehensive tracking and documentation | X | X | X | X | X | |||
Reusability | X | X | X | X | X | |||
Formally specification | X | X | X | |||||
Business requirement | X | X | X | |||||
Ontology approach | Single | X | X | X | ||||
Multiple | ||||||||
Hybrid | X | X | X | X | X | |||
Ontology | Application | X | X | |||||
Domain | X | X | X | X | ||||
Heterogeneity | Semantic | X | X | X | X | X | X | X |
Structural | X | X | X | X |
Approach | [24,64,68] | [39] | [65,66] | |
---|---|---|---|---|
Criteria | ||||
Standard formalism | X | X | X | |
Graphical notations/symbols | X | |||
Modeling level | Conceptual | X | X | X |
Logical | X | X | ||
Physical | X | X | X | |
Modeled phase | extract | X | ||
Transform | X | X | X | |
Load | ||||
Transformation level | Attribute | |||
Entity | X | X | X | |
Data source storage schema | X | |||
DW data storage schema | X | X | ||
Mapping (schema/diagram) | X | X | X | |
Mapping technique | Manual | |||
Semiautomatic | X | |||
Automatic | X | X | ||
ETL meta-model | X | X | ||
Prototype/modeling tool | X | X | ||
Integrated approach | ||||
Automatic transformation | X | X | X | |
ETL activities described | NA | 9 | 2 | |
Data type | Structured | X | X | X |
Semi-structured | ||||
Unstructured | ||||
Entity relationship | ||||
Approach validation | X | X | X | |
Approach evaluation (benchmark) | ||||
Interoperability | X | X | X | |
Extensibility | X | X | X | |
Explicit definition of transformation | X | X | X | |
Layered architecture/workflow | X | X | ||
Workflow management | ||||
GUI support | X | X | ||
ETL process requirement | ||||
Comprehensive tracking and documentation | X | X | X | |
Reusability | X | X | X | |
Formally specification | X | X | X | |
Business requirement | X | X | ||
ETL constraints | X | X | ||
MDA layers | CIM | X | X | |
PIM | X | X | X | |
QVT | X | X | X | |
PSM | X | X | ||
Code | X |
Approach | BPMN | CPN | YAWL | D. Flow Visualization | |||||
---|---|---|---|---|---|---|---|---|---|
Criteria | [25,70,71] | [73] | [20] | [26] | [27] | [28] | [29] | [75] | |
Standard formalism | X | X | X | X | X | ||||
Graphical notations/symbols | X | X | X | ||||||
Modeling level | Conceptual | X | X | X | X | X | X | ||
Logical | X | X | X | X | |||||
Physical | X | X | X | ||||||
Modeled phase | Extract | X | X | X | X | X | X | X | X |
transform | X | X | X | X | X | X | |||
load | X | X | X | X | X | ||||
Transformation level | Attribute | X | |||||||
Entity | X | X | X | X | X | X | |||
Data source storage schema | X | X | X | ||||||
DW data storage schema | X | X | X | ||||||
Mapping (schema/diagram) | |||||||||
Mapping technique | Manual | X | X | X | X | X | |||
Semiautomatic | |||||||||
Automatic | X | ||||||||
ETL meta-model | X | X | X | ||||||
Prototype/modeling tool | X | X | |||||||
Integrated approach | |||||||||
Rules/techniques/algorithm of transformations | |||||||||
Automatic transformation | |||||||||
ETL activities described | 16 | 8 | 10 | NA | NA | 3 | 7 | NA | |
Data type | Structured | X | X | X | X | X | X | X | X |
Semi-structured | |||||||||
Unstructured | |||||||||
Entity relationship | X | ||||||||
Approach validation | X | X | X | X | X | ||||
Approach evaluation (benchmark) | |||||||||
Interoperability | X | X | X | X | X | ||||
Extensibility | X | X | X | X | X | X | X | ||
Explicit definition of transformation | X | X | |||||||
Layered architecture/workflow | X | X | X | X | |||||
Workflow management | X | X | X | X | X | ||||
GUI support | X | X | |||||||
ETL process requirement | X | ||||||||
Comprehensive tracking and documentation | X | ||||||||
Reusability | X | X | X | X | |||||
Formal specification | X | ||||||||
Business requirement | X | X | X | ||||||
ETL constraints | X |
Approach | Conceptual Constructs | CommonCube | EMD | |||||
---|---|---|---|---|---|---|---|---|
Criteria | [77,78] | [85,86,87] | [81] | [31] | [88] | [32] | [89] | |
Standard formalism | ||||||||
Graphical notations/symbols | X | X | X | X | X | X | ||
Modeling level | Conceptual | X | X | X | X | X | X | X |
Logical | X | X | ||||||
Physical | X | X | X | X | ||||
Modeled phase | Extract | X | X | X | X | X | X | |
Transform | X | X | X | X | X | X | X | |
Load | X | X | X | X | X | X | ||
Transformation level | Attribute | X | X | X | X | X | X | |
Entity | X | X | X | X | ||||
Data source storage schema | X | X | X | X | ||||
DW data storage schema | X | X | X | X | X | X | ||
Mapping (schema/diagram) | X | X | ||||||
Mapping technique | Manual | X | X | X | X | |||
Semiautomatic | X | X | X | |||||
Automatic | ||||||||
ETL meta-model | X | X | X | X | X | |||
Prototype/modeling tool | X | X | X | X | ||||
Integrated approach | X | X | ||||||
Rules/techniques/algorithm of transformations | ||||||||
Automatic transformation | ||||||||
ETL activities described | 12 | 8 | 12 | 7 | 15 | 15 | 15 | |
Data type | Structured | X | X | X | X | X | X | X |
Semi-structured | X | X | ||||||
Unstructured | ||||||||
Entity relationship | X | X | X | X | X | X | ||
Approach validation | X | X | X | X | ||||
Approach evaluation (benchmark) | X | |||||||
Interoperability | X | X | X | X | X | |||
Extensibility | X | X | X | |||||
Explicit definition of transformation | X | X | X | |||||
Layered architecture/workflow | X | X | X | X | X | |||
Workflow management | X | X | X | |||||
GUI support | X | X | X | X | X | |||
ETL process requirement | X | X | ||||||
Comprehensive tracking and documentation | X | X | ||||||
Reusability | X | X | X | |||||
Formally specification | X | X | ||||||
Business requirement | ||||||||
ETL constraints | X | X |
Approach | [83] | [93] | [92] | [99] | [100] | [104] | |
---|---|---|---|---|---|---|---|
Criteria | |||||||
Data type | Structured | X | X | X | |||
Semi-structured | X | X | X | X | X | ||
Unstructured | X | ||||||
Mapping (schema/diagram) | X | ||||||
Mapping technique | Manual | ||||||
Semiautomatic | X | X | |||||
Automatic | X | ||||||
Entity relationship | X | ||||||
Approach validation | X | X | X | X | |||
Approach evaluation (benchmark) | |||||||
Interoperability | X | X | X | X | |||
Extensibility | X | X | X | ||||
Explicit definition of transformation | X | X | |||||
Layered architecture/workflow | X | X | X | X | X | X | |
Workflow management | X | X | X | ||||
GUI support | X | X | X | X | |||
ETL process requirement | X | ||||||
Comprehensive tracking and documentation | X | X | |||||
Reusability | X | X | X | ||||
Formally specification | X | ||||||
Business requirement | X | X | |||||
ETL constraints | |||||||
Big Data | Massive volume | X | X | X | X | ||
Velocity | X | ||||||
Variability | X | ||||||
Veracity | X | X | X | ||||
DB | Relational | X | X | ||||
NoSQL | X | X | X | ||||
Processing | Batch | X | X | X | X | ||
Stream | X |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Dhaouadi, A.; Bousselmi, K.; Gammoudi, M.M.; Monnet, S.; Hammoudi, S. Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons. Data 2022, 7, 113. https://doi.org/10.3390/data7080113
Dhaouadi A, Bousselmi K, Gammoudi MM, Monnet S, Hammoudi S. Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons. Data. 2022; 7(8):113. https://doi.org/10.3390/data7080113
Chicago/Turabian StyleDhaouadi, Asma, Khadija Bousselmi, Mohamed Mohsen Gammoudi, Sébastien Monnet, and Slimane Hammoudi. 2022. "Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons" Data 7, no. 8: 113. https://doi.org/10.3390/data7080113
APA StyleDhaouadi, A., Bousselmi, K., Gammoudi, M. M., Monnet, S., & Hammoudi, S. (2022). Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons. Data, 7(8), 113. https://doi.org/10.3390/data7080113