Wednesday, March 12, 2014

Growing Business Changing Data Changing Dimensions!!!

Hi All,

In the previous blog we discussed about the different type of keys such as Domain, Natural, Surrogate keys which will help us to retrieve data easily and will maintain data in a coherent manner.

The only thing constant in the world of Business is change, thus as Business grows the need to change becomes inevitable. In this Blog we will be discussing about different ways to accommodate change in Data in a DataWarehouse.

This concept is called as Slowly Changing Dimensions, therefore these are as follows:

a) Type 0 - Retain Original
  • Attributes that do not change, Example: Date dimension and Durable Supernatural Key

b) Type 1 - Overwrite
  • Update the dimension table row by overwriting
  • History is lost
  • Need to rebuild OLAP cubes


c) Type 2 – Add new row
  • Most Commonly used technique
  • New Row added to the table
  • Distinguishes using effective date columns
  • Requires use of Surrogate Keys
  • Fact table remains unaffected



d) Type 3 - Add New Attribute
  • Not used Commonly
  • Add new attribute to capture the previous value
  • Not useful for attributes that changes unpredictably
  • Useful when the change affects many rows and constitutes a big events e.g.: major re-organization 




e)  Type 4 - Add Mini-Dimension
  • Very large table with some frequently changing attributes
  • Break off the Frequently Changing dimensions into a separate dimension
  • Type 2 technique will lead to explosion of data in frequently changing dimensions




These are some of the techniques used in order to encounter changing dimensions and data problem.

With this i conclude my Fifth Blog on "Growing Business Changing Data Changing Dimensions!!!". Stay tuned to learn more about basics of Dimensional Modelling and Business Intelligence.

References:
The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross
MIS587_K&RChapter_05





Wednesday, February 26, 2014

Keys Locking Up The Sane Data

Hi All,

In the previous blog we discussed about different Schema in which a Data Warehouse can be designed. Now lets discuss some of the other important concepts which are essential in order to provide flexibility and scalability to a Data Warehouse.

Some of the important concepts regarding the keys in order to maintain data integrity are as follows:

Surrogate Key:
The advent of Surrogate key comes from the basics of maintaining temporal or historical data in the database. Distinction between a Surrogate Key and Primary Key depends on whether the database is Current Database or Temporal Database. Lets discuss Surrogate Key usability in two different types of database
       
Current Database:
Since current database stores only current data , hence there is a one to one relationship between Surrogate Key and Primary Key. In this case the Surrogate Key may be used as a primary key.
     
Temporal Database:
As temporal database stores historical and temporary database therefore there is a one to many relationship between Surrogate Key and Business key. Distinction between these two keys is highly important as Surrogate Key identifies one row in the table uniquely, whereas business key identifies a business entity in the business model uniquely. The table holds the life span of the entity in the modeled world. An example of a temporal table containing Surrogate Keys.


Advantages of Surrogate Key:
  • Application can not lose their reference to a row in the database.
  • Surrogate key tend to be a compact data type, hence they perform faster when queried
  • Surrogate helps in maintaining the uniformity of detail or an entity throughout the data

Natural Key:
A natural key is a key that is formed on the entities or attributes that already exists in the world. For example a table consisting of Genetic DNA makeup of a person can be considered as a Natural Key.

A Natural key is a candidate key that has a logical relationship to the attributes within that row.A Natural Key is sometimes called as a Domain Key. It ensures that there can only be one row for a Key.

Disadvantages of Natural Key:
We need to choose natural key cautiously as choosing a column that may change dynamically can lead to problems. For example if we choose name of a person as natural key and if the name of the person changes due to religion transformation, changing the name of a person in the database and reflecting the changes across all the table containing a foreign key will be an overhead for the database.


With this i would like to conclude my blog on "Keys Locking Up The Sane Data". Stay tuned to learn about Slowly Changing Dimensions, the next important step in creating a Coherent Dimensional Model.

References:
The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross
http://en.wikipedia.org/wiki/Dimensional_Modelling
http://en.wikipedia.org/wiki/Surrogate_key
http://en.wikipedia.org/wiki/Natural_key







Sunday, February 9, 2014

Snow Covered Wagon hitched to a Star (Dimensional Modelling)

Hello One and All,

In my last Blog we discussed about Dimensional Modelling and some of its components. Today we will go through different Schema that can be used during Dimensional Modelling to create a Data Warehouse.

Before we start with today's topic , For my viewers those who are new to this field i would like to revisit some of the key points of my previous blogs:

1)  Business Intelligence is mainly divided into three parts as per my understanding
                 a) Data Warehouse design and Implementation (ETL process)
                 b) Data Analysis (Using OLAP cubes)
                 c) Reporting and Dashboard Creation
For further details revisit my First blog 

2) Important Components involved in Dimensional Modelling or Data Warehouse Designing
                 a)  Fact Tables (Additive Facts, Semi-Additive Facts, Non- Additive Facts)
                 b)  Dimension Table
                 c)  Grain
For further details revisit my Second blog

After a thorough revision of previous concepts lets start our today's discussion about different Schema involved in Dimensional Modelling or Data Warehouse Designing.

First of all i would like to explain the meaning of the topic i.e
Snow Covered Wagon Hitched to a Star = SnowFlake Schema and Star Schema are two types of Schema that are used while designing a Data Warehouse, Hence they can be explained as follows:

Star Schema:  A Star Schema is one of the simplest and easiest schema to understand. A schema which consists of Dimension tables only attached to Fact tables. A Star Schema get its name from its physical representation where we have Fact table at the center with Dimension table surrounding it representing the star points. Star Schema can also be called as a denormalized schema as it contains Dimensional table in the lowest normal form.

Star Schema
Snow Flake Schema:   The Snowflake Schema can be formed by Snowflaking(Normalizing) the Star Schema . The Dimension table in a Snowflake Schema may or may not contain a parent table depending on its normal form. All the dimension tables in Snowflake schema are normalized to third normal form (thus dimension table does not contain any transitive dependencies).


SnowFlake Schema
Pros and Cons of both the Schemas are as follows:

Star Schema:
Pros: A Star Schema is very quick in response to ad-hoc queries as this kind of schema can answer maximum queries with minimum number of joins.
Cons:  A Star Schema suffers in space and data management as the data is redundant in dimension table as the tables are in their lowest normal forms.

Snowflake Schema: 
Pros:    A SnowFlake Schema is preferred for Space and Data Management as the Schema consists of table in third normal form.
Cons:   A SnowFlake Schema takes large amount of time in querying database as this type of design involves joins to answer a query and as joins are highly expensive.

Hence a choice between Schema is made as per business requirement, the system that requires faster query performance uses Star Schema whereas the one that requires Space Management uses SnowFlake Schema.

With this i conclude my blog about Schema. Stay tuned to learn more about different challenges and tasks involved while designing a Data Warehouse. In my coming blog we will talk about Surrogate Keys, Natural Keys, Durable Keys, Slowly Changing Dimensions Type and much more.

References:
The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross
http://en.wikipedia.org/wiki/Dimensional_Modelling

















Wednesday, February 5, 2014

Unraveling the Unravelable Dimensional Modelling

Hello All,

In my previous blog, we went through different concepts on a macro level. From this blog onward we will delve deeper and will understand each concept in detail.

First concept that we will talk about in this blog is about Dimensional Modelling.

What is Dimensional Modelling?
From a high level, Dimensional Modelling is the technique, concepts in designing a Data Warehouse. We build data warehouse with certain design methodology, technique  in order to analyse billions of rows in few seconds which is called as Dimensional Modelling.

Some of the key elements of Dimensional Modelling are as follows:

  1. Fact Table
  2. Dimension Table
  3. Grain


Fact Table:   A fact table is a simple  table that contains facts and dimensions. Facts means numerical values  that helps in making decision when aggregated or analysed across different dimensions for example profit in a transaction, cost paid by customer in a transaction. The facts are stored in the fact table corresponding to a particular set of dimensions. For example we store profit by product by store by time by customer type. Therefore profit here is the fact and product, store, time, customer type are dimensions.

Facts are of three different types:

Additive Facts: These are the facts when aggregated across dimensions have a meaningful outcome For example :  Profit

Semi-Additive Facts: These are the facts that can be aggregated across some dimensions but does not     have a meaningful outcome when aggregated across other dimensions. For example: To aggregate the stock level  of a product on a particular day, but aggregating it over a period wont tell us about the correct stock level value about the product.

Non- Additive Facts: These are the Facts that can not be aggregated across any dimension. For example: ratio, percentage etc.

Dimension Table: This table contains values which adds meaning to fact values.
The table that contains dimensions across which facts are sliced, diced , drilled down and drilled through. Dimensions helps in bringing out the values that are useful for business people. For Example Fact such as profit is useless until we do not know that profit value is by product by time by location and many more which will help in inferring useful information.

Grain:  The highest level of detail that your data contains is called grain. Grain helps in defining the level of questions that your dimensional model can answer, for Example lets consider our transcript. A transcript contains data at three levels i.e.
     1) GPA at Subject Level
     2) GPA at Semester Level
     3) GPA at Year Level

These are three  different levels of grain,  hence if we store data regarding subject we can calculate our GPA at Semester and Year level, and if we store our GPA regarding semester level we can roll it up to Year level but can not drill down to Subject level, Similar is the case for Year level.

Thus determining the grain is the most important design aspect of Dimensional Modelling.With this i conclude my blog on Unraveling the Unravelable Dimensional Modelling

Stay tuned to know more about dimensional Modelling. In my next blog i will discuss about Star Schema, Snow-Flake Schema, Business process and a glimpse into Market Basket Analysis.

References:
The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross
http://en.wikipedia.org/wiki/Dimensional_Modelling












Sunday, January 26, 2014

Business Intelligence a World in itself !!!


Hello Everyone,

I would first like to thank people those who showed interest in reading this blog.

Have you ever wondered why does Super Market people keeps shifting stuff around ?  I never realized that there might be such a powerful strategy behind placing of all this stuff. The Concept is called as Market Basket Analysis. Lets have a look at  the methodology, concept, technology and tools behind all of this.


Before i get started with the concepts of Business Intelligence and dive into the mechanics of it, i want to make sure that any person who is reading this blog must be passionate about making a positive impact to the lives of people.

Without further adieu lets start our journey,

What is Business Intelligence?

Business Intelligence is a technology that helps you convert raw form of data into Information and actionable insights that could easily be utilized by business people in order to make important decisions. The three phases as i understood out of my First Class taught by Prof. Henry Childers can be broadly classified as following:

  • Data Warehouse
  •  Data Analysis.
  •  Reporting and Dashboard Creation.

Now, lets have a look at each of these concepts in detail:

Data Warehouse:  Data Warehouse is a database used for reporting and data analysis. Integrating data from one or more sources such as flat files , Oracle database, SQL server database and many more  creating a central repository of data. Implementation of Data Warehouse involves several other processes such as extraction , transformation, staging and loading of data into Data Warehouse (whole process is commonly known as ETL).

Data Analysis : Data Analysis is one of the most interesting part in the whole domain of Business Intelligence. As the name clearly indicates this process is used for analyzing data. This process is used in analysis of billions and trillions of rows in a single click. The most important component that helps in analyzing data is called cube (a three dimensional model in order to analyze data).

Reporting and Dashboard Creation:  I term this part to be one of the most challenging part of the whole process as with changing business needs and changing time , the information that needs to be represented via reports or dashboards also gets changed. This is the part of the whole process that makes you realize that the way you see and perceive raw data is highly different from what you can do by utilizing the same data.

The more you dig deeper the more interesting it becomes, but not at the cost of concepts, but by realizing and seeing the impact that these concept make on businesses and lives of people. With this i conclude that Business Intelligence has become an essential part of the Corporate World.

As we move forward we will dig deeper and deeper and i would like to hold the discussion about the impact that Business Intelligence makes on Businesses for next time.

References:
The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross
http://en.wikipedia.org/wiki/Data_warehouse
http://en.wikipedia.org/wiki/Business_Intelligence