Hello All,
In my previous blog, we went through different concepts on a macro level. From this blog onward we will delve deeper and will understand each concept in detail.
First concept that we will talk about in this blog is about Dimensional Modelling.
What is Dimensional Modelling?
From a high level, Dimensional Modelling is the technique, concepts in designing a Data Warehouse. We build data warehouse with certain design methodology, technique in order to analyse billions of rows in few seconds which is called as Dimensional Modelling.
Some of the key elements of Dimensional Modelling are as follows:
Fact Table: A fact table is a simple table that contains facts and dimensions. Facts means numerical values that helps in making decision when aggregated or analysed across different dimensions for example profit in a transaction, cost paid by customer in a transaction. The facts are stored in the fact table corresponding to a particular set of dimensions. For example we store profit by product by store by time by customer type. Therefore profit here is the fact and product, store, time, customer type are dimensions.
Facts are of three different types:
Additive Facts: These are the facts when aggregated across dimensions have a meaningful outcome For example : Profit
Semi-Additive Facts: These are the facts that can be aggregated across some dimensions but does not have a meaningful outcome when aggregated across other dimensions. For example: To aggregate the stock level of a product on a particular day, but aggregating it over a period wont tell us about the correct stock level value about the product.
Non- Additive Facts: These are the Facts that can not be aggregated across any dimension. For example: ratio, percentage etc.
Dimension Table: This table contains values which adds meaning to fact values.
The table that contains dimensions across which facts are sliced, diced , drilled down and drilled through. Dimensions helps in bringing out the values that are useful for business people. For Example Fact such as profit is useless until we do not know that profit value is by product by time by location and many more which will help in inferring useful information.
Grain: The highest level of detail that your data contains is called grain. Grain helps in defining the level of questions that your dimensional model can answer, for Example lets consider our transcript. A transcript contains data at three levels i.e.
1) GPA at Subject Level
2) GPA at Semester Level
3) GPA at Year Level
These are three different levels of grain, hence if we store data regarding subject we can calculate our GPA at Semester and Year level, and if we store our GPA regarding semester level we can roll it up to Year level but can not drill down to Subject level, Similar is the case for Year level.
Thus determining the grain is the most important design aspect of Dimensional Modelling.With this i conclude my blog on Unraveling the Unravelable Dimensional Modelling
Stay tuned to know more about dimensional Modelling. In my next blog i will discuss about Star Schema, Snow-Flake Schema, Business process and a glimpse into Market Basket Analysis.
References:
The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross
http://en.wikipedia.org/wiki/Dimensional_Modelling
In my previous blog, we went through different concepts on a macro level. From this blog onward we will delve deeper and will understand each concept in detail.
First concept that we will talk about in this blog is about Dimensional Modelling.
What is Dimensional Modelling?
From a high level, Dimensional Modelling is the technique, concepts in designing a Data Warehouse. We build data warehouse with certain design methodology, technique in order to analyse billions of rows in few seconds which is called as Dimensional Modelling.
Some of the key elements of Dimensional Modelling are as follows:
- Fact Table
- Dimension Table
- Grain
Fact Table: A fact table is a simple table that contains facts and dimensions. Facts means numerical values that helps in making decision when aggregated or analysed across different dimensions for example profit in a transaction, cost paid by customer in a transaction. The facts are stored in the fact table corresponding to a particular set of dimensions. For example we store profit by product by store by time by customer type. Therefore profit here is the fact and product, store, time, customer type are dimensions.
Facts are of three different types:
Additive Facts: These are the facts when aggregated across dimensions have a meaningful outcome For example : Profit
Semi-Additive Facts: These are the facts that can be aggregated across some dimensions but does not have a meaningful outcome when aggregated across other dimensions. For example: To aggregate the stock level of a product on a particular day, but aggregating it over a period wont tell us about the correct stock level value about the product.
Non- Additive Facts: These are the Facts that can not be aggregated across any dimension. For example: ratio, percentage etc.
Dimension Table: This table contains values which adds meaning to fact values.
The table that contains dimensions across which facts are sliced, diced , drilled down and drilled through. Dimensions helps in bringing out the values that are useful for business people. For Example Fact such as profit is useless until we do not know that profit value is by product by time by location and many more which will help in inferring useful information.
Grain: The highest level of detail that your data contains is called grain. Grain helps in defining the level of questions that your dimensional model can answer, for Example lets consider our transcript. A transcript contains data at three levels i.e.
1) GPA at Subject Level
2) GPA at Semester Level
3) GPA at Year Level
These are three different levels of grain, hence if we store data regarding subject we can calculate our GPA at Semester and Year level, and if we store our GPA regarding semester level we can roll it up to Year level but can not drill down to Subject level, Similar is the case for Year level.
Thus determining the grain is the most important design aspect of Dimensional Modelling.With this i conclude my blog on Unraveling the Unravelable Dimensional Modelling
Stay tuned to know more about dimensional Modelling. In my next blog i will discuss about Star Schema, Snow-Flake Schema, Business process and a glimpse into Market Basket Analysis.
References:
The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross
http://en.wikipedia.org/wiki/Dimensional_Modelling
No comments:
Post a Comment