Design Methodology
Dimensional Modeling - There are two main approaches to designing a dimensional data warehouse.
- Kimball - This methodology takes a bottom-up approach, emphasizing delivering the value of the data warehouse to users as quickly as possible. It uses a combination of facts and dimensions to define the data marts.
- Inmon - Bill Inmon saw a need to integrate data from different OLTP systems into a centralized repository (the data warehouse) using a so-called top-down approach. He envisions the data warehouse at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI), business analytics, and business management capabilities. This methodology builds a complete normalized data model first, then develops a set of data marts to complete the CIF.
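To make the Kimball idea of facts and dimensions concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are all hypothetical and chosen only for illustration; a real data mart would be shaped by its own business process.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT,
    year          INTEGER
);
-- The fact table holds measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    amount      REAL
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01-01", 2024), (20240102, "2024-01-02", 2024)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20240101, 2, 20.0), (2, 20240101, 1, 15.0),
     (3, 20240102, 4, 40.0), (1, 20240102, 1, 10.0)],
)


def sales_by_category(connection):
    """Aggregate the fact measure, grouped by a dimension attribute."""
    rows = connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


totals = sales_by_category(conn)
print(totals)
```

The typical query pattern is visible in sales_by_category: join the fact table to one or more dimensions, then group the measures by dimension attributes.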
Data Vault - Data vault modeling makes no distinction between good and bad data ("bad" meaning not conforming to business rules). The modeling method is designed to be resilient to change in the business environment that the stored data comes from, by explicitly separating structural information from descriptive attributes. Data vault is also designed to enable parallel loading as much as possible, so that very large implementations can scale out without major redesign.
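The separation of structural information from descriptive attributes can be sketched as follows. In common Data Vault terminology (not named in the text above), business keys live in "hub" tables and descriptive attributes live in "satellite" tables; link tables, which relate hubs to each other, are omitted here for brevity. The table names, columns, MD5 hash key, and sample records are assumptions made for this example only.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: structural information only (the business key plus load metadata).
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,
    customer_id   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Satellite: descriptive attributes, versioned by load date.
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    email         TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
""")


def hash_key(business_key):
    """Derive a deterministic key from the business key (MD5 is an assumption)."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def load_customer(connection, customer_id, name, email, load_date, source):
    """Insert-only load: no validation, so 'bad' data is stored as received."""
    hk = hash_key(customer_id)
    # The hub row is created once per business key; repeats are ignored.
    connection.execute(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
        (hk, customer_id, load_date, source),
    )
    # Every load adds a new satellite row, preserving full history.
    connection.execute(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
        (hk, load_date, source, name, email),
    )


# The first record violates a business rule (invalid email) but is still loaded.
load_customer(conn, "C-100", "Ada", "not-an-email", "2024-01-01", "crm")
load_customer(conn, "C-100", "Ada", "ada@example.com", "2024-01-02", "crm")

hub_count = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
history = conn.execute(
    "SELECT load_date, email FROM sat_customer_details "
    "WHERE customer_hk = ? ORDER BY load_date",
    (hash_key("C-100"),),
).fetchall()
print(hub_count, history)
```

Because the hub never changes when attributes change, new descriptive attributes can be added as additional satellites without touching existing tables, and hubs and satellites for different sources can be loaded independently.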
EAV - Entity-Attribute-Value is used when an entity has many attributes and these attributes are dynamic (added or removed over time). It also suits cases where many of these attributes would be empty or null most of the time. In such a situation the EAV structure has several advantages, mainly more efficient storage (for example in MySQL), because only attributes that actually have values are stored. The main disadvantage of EAV is speed, since it requires multiple SQL queries or joins across tables to perform most operations.
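A minimal sketch of the EAV layout, again using Python's sqlite3 module rather than MySQL so that it runs without a server. The table name product_eav, its columns, and the sample attributes are hypothetical. Each row stores one attribute of one entity, so absent attributes take no space, but reassembling an entity requires collecting its rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (entity, attribute) pair; values are stored as text.
conn.execute("""
CREATE TABLE product_eav (
    entity_id INTEGER NOT NULL,
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (entity_id, attribute)
)
""")
conn.executemany(
    "INSERT INTO product_eav VALUES (?, ?, ?)",
    [
        (1, "color", "red"),
        (1, "weight_kg", "1.5"),
        (2, "color", "blue"),
        (2, "voltage", "230"),  # an attribute that entity 1 does not have
    ],
)


def load_entity(connection, entity_id):
    """Reassemble one entity by collecting its attribute rows into a dict."""
    cursor = connection.execute(
        "SELECT attribute, value FROM product_eav WHERE entity_id = ?",
        (entity_id,),
    )
    return dict(cursor.fetchall())


entity_1 = load_entity(conn, 1)
entity_2 = load_entity(conn, 2)
print(entity_1)
print(entity_2)
```

Adding a new attribute needs no schema change, only a new row. The cost shows up when filtering on several attributes at once, which needs one self-join (or subquery) per attribute, and in the loss of per-attribute data types, since every value is stored as text here.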
Data Lakes - A data lake is a central location in which to store all your data, regardless of its source or format. It is typically, although not always, built using Hadoop. The data can be structured or unstructured. You can then use a variety of storage and processing tools, typically from the extended Hadoop family, to extract value quickly and inform key organizational decisions.