SlideShare a Scribd company logo
1
Data Warehousing
Dimensional Modeling (DM)
Ch Anwar ul Hassan (Lecturer)
Department of Computer Science and Software
Engineering
Capital University of Sciences & Technology,
Islamabad Pakistan
anwarchaudary@gmail.com
2
The need for ER modeling?
 Problems with early COBOLian data
processing systems.
 Data redundancies
 From flat file to Table, each entity ultimately
becomes a Table in the physical schema.
 Simple O(n2) join to work with tables
3
Why ER Modeling has been so successful?
 Coupled with normalization drives out all
the redundancy out of the database.
 Change (or add or delete) the data at just
one point.
 Can be used with indexing for very fast
access.
 Resulted in success of OLTP systems.
4
Need for DM: Un-answered Qs
 Lets have a look at a typical ER data model first.
 Some Observations:
 All tables look-alike, as a consequence it is difficult to
identify:
 Which table is more important ?
 Which is the largest?
 Which tables contain numerical measurements of the
business?
 Which table contain nearly static descriptive attributes?
Is DM really needed? In order to better understand the need for DM lets have a
look at the diagram showing the retail data in simplified 3N
5
6
Need for DM: Complexity of Representation
 Many topologies for the same ER
diagram, all appearing different.
Very hard to visualize and remember.
A large number of possible connections to
any two (or more) tables
1
10
3
12
2
6
5
11 4
7
8
9
1
10
3
12
2
6
5
11
4
7
8
9
7
Need for DM: The Paradox
 The Paradox: Trying to make information accessible using
tables resulted in an inability to query them!
 ER and Normalization result in large number of tables which
are:
 Hard to understand by the users (DB programmers)
 Hard to navigate optimally by DBMS software
 Real value of ER is in using tables individually or in pairs
 Too complex for queries that span multiple tables with a
large number of records
8
ER vs. DM
ER DM
Constituted to optimize OLTP
performance.
Constituted to optimize DSS
query performance.
Models the micro relationships
among data elements.
Models the macro
relationships among data
elements with an overall
deterministic strategy.
A wild variability of the
structure of ER models.
All dimensions serve as
equal entry points to the
fact table.
Very vulnerable to changes in
the user's querying habits,
because such schemas are
asymmetrical.
Changes in users' querying
habits can be
accommodated by
automatic SQL generators.
9
How to simplify a ER data model?
Two general methods:
 De-Normalization
 Dimensional Modeling (DM)
10
What is DM?…
 A simpler logical model optimized for decision
support.
 Inherently dimensional in nature, with a single
central fact table and a set of smaller
dimensional tables.
 Multi-part key for the fact table
 Dimensional tables with a single-part PK.
 Keys are usually system generated
11
What is DM?...
 Results in a star like structure, called star
schema or star join.
 All relationships mandatory M-1.
 Single path between any two levels.
 Supports ROLAP operations.
12
Dimensions have Hierarchies
Items
Books Cloths
Fiction Text Men Women
MedicalEngg
Analysts tend to look at the data through dimension at a
particular “level” in the hierarchy
13
The two Schemas
Star
Snow-flake
14
“Simplified” 3NF (Retail)
CITY DISTRICT
1
ZONE CITY
DISTRICT DIVISION
MONTH QTR
STORE # STREET ZONE ...
WEEK MONTH
DATE WEEK
RECEIPT # STORE # DATE ...
ITEM #RECEIPT # ... $
ITEM # CATEGORY
ITEM #
DEPTCATEGORY
year
month
week
sale_header
store
sale_detail
item_x_cat
item_x_splir
cat_x_dept
M
1
M
1M
1
M
1
1
M M
1
M
M M1
1
M
1
1
M
YEAR QTR
1
M
quarter
SUPPLIER
DIVISION PROVINCEM
1
BACK
division
district
zone
15
Vastly Simplified Star Schema
RECEIPT#
STORE#
DATE
ITEM# M
Fact Table
ITEM#
CATEGORY
DEPT
SUPPLIER
Product Dim
M
Sale Rs.
M
STORE#
ZONE
CITY
PROVINCE
Geography Dim
DISTRICT
DATE
WEEK
QUARTER
YEAR
Time Dim
MONTH
.
.
.
1
1
1
facts
DIVISION
16
The Benefit of Simplicity
Beauty lies in close correspondence
with the business, evident even to
business users.
17
Features of Star Schema
Dimensional hierarchies are collapsed into a single
table for each dimension. Loss of Information?
A single fact table created with a single header from the
detail records, resulting in:
 A vastly simplified physical data model!
 Fewer tables (thousands of tables in some ERP systems).
 Fewer joins resulting in high performance.
 Quantifying space requirement
18
The Process of Dimensional Modeling
Four Step Method from ER to DM
1. Choose the Business Process
2. Choose the Grain
3. Choose the Facts
4. Choose the Dimensions
19
Step-1: Choose the Business Process
 A business process is a major operational
process in an organization.
 Typically supported by a legacy system
(database) or an OLTP.
 Examples: Orders, Invoices, Inventory etc.
 Business Processes are often termed as
Data Marts and that is why many people
criticize DM as being data mart oriented.
20
Star-1
Star-2
Snow-flake
Step-1: Separating the Process
21
Step-2: Choosing the Grain
 Grain is the fundamental, atomic level of data to be
represented.
 Grain is also termed as the unit of analyses.
 Example grain statements
 Typical grains
 Individual Transactions
 Daily aggregates (snapshots)
 Monthly aggregates
 Relationship between grain and expressiveness.
 Grain vs. hardware trade-off.
22
Step-2: Relationship b/w Grain
Daily aggregates
6 x 4 = 24 values
Four aggregates per week
4 x 4 = 16 values
Two aggregates per week
2 x 4 = 8 values
LOW Granularity HIGH Granularity
23
The case FOR data aggregation
 Works well for repetitive queries.
 Follows the known thought process.
 Justifiable if used for max number of queries.
 Provides a “big picture” or macroscopic view.
24
The case AGAINST data aggregation
Aggregation is irreversible.
 Can create monthly sales data from weekly sales
data, but the reverse is not possible.
Aggregation limits the questions that can be
answered.
 What, when, why, where, what-else, what-next
25
The case AGAINST data aggregation
Aggregation can hide crucial facts.
The average of 100 & 100 is same as 150 & 50
26
Aggregation hides crucial facts Example
Week-1 Week-2 Week-3 Week-4 Average
Zone-1 100 100 100 100 100
Zone-2 50 100 150 100 100
Zone-3 50 100 100 150 100
Zone-4 200 100 50 50 100
Average 100 100 100 100
Just looking at the averages i.e. aggregate
27
Aggregation hides crucial facts chart
0
50
100
150
200
250
Week-1 Week-2 Week-3 Week-4
Z1 Z2 Z3 Z4
Z1: Sale is constant (need to work on it)
Z2: Sale went up, then fell (need of concern)
Z3: Sale is on the rise, why?
Z4: Sale dropped sharply, need to look deeply.
W2: Static sale
28
“We need monthly sales
volume and Rs. by
week, product and Zone”
Facts
Dimensions
Step 3: Choose Facts statement
29
 Choose the facts that will populate
each fact table record.
 Remember that best Facts are Numeric,
Continuously Valued and Additive.
 Example: Quantity Sold, Amount etc.
Step 3: Choose Facts
30
 Choose the dimensions that apply to
each fact in the fact table.
 Typical dimensions: time, product,
geography etc.
 Identify the descriptive attributes that
explain each dimension.
 Determine hierarchies within each
dimension.
Step 4: Choose Dimensions
31
Step-4: How to Identify a Dimension?
 The single valued attributes during recording of a
transaction are dimensions.
Calendar_Date
Time_of_Day
Account _No
ATM_Location
Transaction_Type
Transaction_Rs
Fact Table
Dim
Time_of_day: Morning, Mid Morning, Lunch Break etc.
Transaction_Type: Withdrawal, Deposit, Check balance etc.
32
Step-4: Can Dimensions be Multi-valued?
 Are dimensions ALWYS single?
 Not really
 What are the problems? And how to handle them
 Calendar_Date (of inspection)
 Reg_No
 Technician
 Workshop
 Maintenance_Operation
 How many maintenance operations are possible?
 Few
 Maybe more for old cars.
33
Step-4: Dimensions & Grain
 Several grains are possible as per business
requirement.
 For some aggregations certain descriptions do not
remain atomic.
 Example: Time_of_Day may change several times
during daily aggregate, but not during a transaction
 Choose the dimensions that are applicable
within the selected grain.
Ad

More Related Content

What's hot (20)

MIS Overview
MIS OverviewMIS Overview
MIS Overview
SuryaSourabh
 
Introduction to Teradata And How Teradata Works
Introduction to Teradata And How Teradata WorksIntroduction to Teradata And How Teradata Works
Introduction to Teradata And How Teradata Works
BigClasses Com
 
Dimensional Modelling - Basic Concept
Dimensional Modelling - Basic ConceptDimensional Modelling - Basic Concept
Dimensional Modelling - Basic Concept
Folio3 Software
 
Chap 8
Chap 8Chap 8
Chap 8
Neelam Soni
 
Normalization
NormalizationNormalization
Normalization
Masud Parves
 
Statistics with R
Statistics with R Statistics with R
Statistics with R
Ruru Chowdhury
 
Datapreprocessing
DatapreprocessingDatapreprocessing
Datapreprocessing
Chandrika Sweety
 
Statistical software packages
Statistical software packagesStatistical software packages
Statistical software packages
Km Ashif
 
BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2
Kelvin Chan
 
Decision tree presentation
Decision tree presentationDecision tree presentation
Decision tree presentation
Vijay Yadav
 
BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1
Kelvin Chan
 
Data analysis
Data analysisData analysis
Data analysis
Nursing Path
 
Introduction to spss 1
Introduction to spss 1Introduction to spss 1
Introduction to spss 1
Michael Taiwo
 
Stat11t alq chapter03
Stat11t alq chapter03Stat11t alq chapter03
Stat11t alq chapter03
raylenepotter
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
Uni Azza Aunillah
 
Processing of data in research
Processing of data in researchProcessing of data in research
Processing of data in research
HazirAli
 
Data entry in Excel and SPSS
Data entry in Excel and SPSS Data entry in Excel and SPSS
Data entry in Excel and SPSS
Dhritiman Chakrabarti
 
Akhiesh maurya
Akhiesh mauryaAkhiesh maurya
Akhiesh maurya
Akhilesh Maurya
 
Research methods module 5 msf
Research methods module 5 msfResearch methods module 5 msf
Research methods module 5 msf
Independent
 
Discrete And Continuous Simulation
Discrete And Continuous SimulationDiscrete And Continuous Simulation
Discrete And Continuous Simulation
Nguyen Chien
 
Introduction to Teradata And How Teradata Works
Introduction to Teradata And How Teradata WorksIntroduction to Teradata And How Teradata Works
Introduction to Teradata And How Teradata Works
BigClasses Com
 
Dimensional Modelling - Basic Concept
Dimensional Modelling - Basic ConceptDimensional Modelling - Basic Concept
Dimensional Modelling - Basic Concept
Folio3 Software
 
Statistical software packages
Statistical software packagesStatistical software packages
Statistical software packages
Km Ashif
 
BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2
Kelvin Chan
 
Decision tree presentation
Decision tree presentationDecision tree presentation
Decision tree presentation
Vijay Yadav
 
BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1
Kelvin Chan
 
Introduction to spss 1
Introduction to spss 1Introduction to spss 1
Introduction to spss 1
Michael Taiwo
 
Stat11t alq chapter03
Stat11t alq chapter03Stat11t alq chapter03
Stat11t alq chapter03
raylenepotter
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
Uni Azza Aunillah
 
Processing of data in research
Processing of data in researchProcessing of data in research
Processing of data in research
HazirAli
 
Research methods module 5 msf
Research methods module 5 msfResearch methods module 5 msf
Research methods module 5 msf
Independent
 
Discrete And Continuous Simulation
Discrete And Continuous SimulationDiscrete And Continuous Simulation
Discrete And Continuous Simulation
Nguyen Chien
 

Similar to Intro to Data warehousing lecture 08 (20)

mdmodel multidimensional (MD) modeling approach to represent more complex da...
mdmodel  multidimensional (MD) modeling approach to represent more complex da...mdmodel  multidimensional (MD) modeling approach to represent more complex da...
mdmodel multidimensional (MD) modeling approach to represent more complex da...
anitha803197
 
Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8
Shani729
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
Sunita Sahu
 
When & Why\'s of Denormalization
When & Why\'s of DenormalizationWhen & Why\'s of Denormalization
When & Why\'s of Denormalization
Aliya Saldanha
 
Dwh lecture 12-dm
Dwh lecture 12-dmDwh lecture 12-dm
Dwh lecture 12-dm
Sulman Ahmed
 
Lecture 13
Lecture 13Lecture 13
Lecture 13
Shani729
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
AnushaSharma81
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
vivekjv
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
QUONTRASOLUTIONS
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
Dhiren Gala
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
Animesh Srivastava
 
On multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and queryingOn multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and querying
Jaspreet Issaj
 
Preprocessing_new.ppt
Preprocessing_new.pptPreprocessing_new.ppt
Preprocessing_new.ppt
YashikaSengar2
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional Modelling
Ashish Chandwani
 
Lecture 3:Introduction to Dimensional Modelling.pptx
Lecture 3:Introduction to Dimensional Modelling.pptxLecture 3:Introduction to Dimensional Modelling.pptx
Lecture 3:Introduction to Dimensional Modelling.pptx
RehmahAtugonza
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
ABDEL RAHMAN KARIM
 
Dwh lecture 13-process dm
Dwh  lecture 13-process dmDwh  lecture 13-process dm
Dwh lecture 13-process dm
Sulman Ahmed
 
Cs437 lecture 1-6
Cs437 lecture 1-6Cs437 lecture 1-6
Cs437 lecture 1-6
Aneeb_Khawar
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
BsMath3rdsem
 
Intro to Data warehousing Lecture 04
Intro to Data warehousing   Lecture 04Intro to Data warehousing   Lecture 04
Intro to Data warehousing Lecture 04
AnwarrChaudary
 
mdmodel multidimensional (MD) modeling approach to represent more complex da...
mdmodel  multidimensional (MD) modeling approach to represent more complex da...mdmodel  multidimensional (MD) modeling approach to represent more complex da...
mdmodel multidimensional (MD) modeling approach to represent more complex da...
anitha803197
 
Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8Dwh lecture slidesweek7&8
Dwh lecture slidesweek7&8
Shani729
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
Sunita Sahu
 
When & Why\'s of Denormalization
When & Why\'s of DenormalizationWhen & Why\'s of Denormalization
When & Why\'s of Denormalization
Aliya Saldanha
 
Lecture 13
Lecture 13Lecture 13
Lecture 13
Shani729
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
vivekjv
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
QUONTRASOLUTIONS
 
Become BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAPBecome BI Architect with 1KEY Agile BI Suite - OLAP
Become BI Architect with 1KEY Agile BI Suite - OLAP
Dhiren Gala
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
Animesh Srivastava
 
On multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and queryingOn multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and querying
Jaspreet Issaj
 
Introduction to Dimesional Modelling
Introduction to Dimesional ModellingIntroduction to Dimesional Modelling
Introduction to Dimesional Modelling
Ashish Chandwani
 
Lecture 3:Introduction to Dimensional Modelling.pptx
Lecture 3:Introduction to Dimensional Modelling.pptxLecture 3:Introduction to Dimensional Modelling.pptx
Lecture 3:Introduction to Dimensional Modelling.pptx
RehmahAtugonza
 
Dwh lecture 13-process dm
Dwh  lecture 13-process dmDwh  lecture 13-process dm
Dwh lecture 13-process dm
Sulman Ahmed
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
BsMath3rdsem
 
Intro to Data warehousing Lecture 04
Intro to Data warehousing   Lecture 04Intro to Data warehousing   Lecture 04
Intro to Data warehousing Lecture 04
AnwarrChaudary
 
Ad

More from AnwarrChaudary (20)

Intro to Data warehousing lecture 20
Intro to Data warehousing   lecture 20Intro to Data warehousing   lecture 20
Intro to Data warehousing lecture 20
AnwarrChaudary
 
Intro to Data warehousing lecture 19
Intro to Data warehousing   lecture 19Intro to Data warehousing   lecture 19
Intro to Data warehousing lecture 19
AnwarrChaudary
 
Intro to Data warehousing lecture 18
Intro to Data warehousing   lecture 18Intro to Data warehousing   lecture 18
Intro to Data warehousing lecture 18
AnwarrChaudary
 
Intro to Data warehousing lecture 17
Intro to Data warehousing   lecture 17Intro to Data warehousing   lecture 17
Intro to Data warehousing lecture 17
AnwarrChaudary
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16
AnwarrChaudary
 
Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
AnwarrChaudary
 
Intro to Data warehousing lecture 14
Intro to Data warehousing   lecture 14Intro to Data warehousing   lecture 14
Intro to Data warehousing lecture 14
AnwarrChaudary
 
Intro to Data warehousing lecture 13
Intro to Data warehousing   lecture 13Intro to Data warehousing   lecture 13
Intro to Data warehousing lecture 13
AnwarrChaudary
 
Intro to Data warehousing lecture 12
Intro to Data warehousing   lecture 12Intro to Data warehousing   lecture 12
Intro to Data warehousing lecture 12
AnwarrChaudary
 
Intro to Data warehousing lecture 11
Intro to Data warehousing   lecture 11Intro to Data warehousing   lecture 11
Intro to Data warehousing lecture 11
AnwarrChaudary
 
Intro to Data warehousing lecture 10
Intro to Data warehousing   lecture 10Intro to Data warehousing   lecture 10
Intro to Data warehousing lecture 10
AnwarrChaudary
 
Intro to Data warehousing lecture 09
Intro to Data warehousing   lecture 09Intro to Data warehousing   lecture 09
Intro to Data warehousing lecture 09
AnwarrChaudary
 
Intro to Data warehousing lecture 07
Intro to Data warehousing   lecture 07Intro to Data warehousing   lecture 07
Intro to Data warehousing lecture 07
AnwarrChaudary
 
Intro to Data warehousing Lecture 06
Intro to Data warehousing   Lecture 06Intro to Data warehousing   Lecture 06
Intro to Data warehousing Lecture 06
AnwarrChaudary
 
Intro to Data warehousing lecture 05
Intro to Data warehousing   lecture 05Intro to Data warehousing   lecture 05
Intro to Data warehousing lecture 05
AnwarrChaudary
 
Intro to Data warehousing lecture 03
Intro to Data warehousing   lecture 03Intro to Data warehousing   lecture 03
Intro to Data warehousing lecture 03
AnwarrChaudary
 
Intro to Data warehousing lecture 02
Intro to Data warehousing   lecture 02Intro to Data warehousing   lecture 02
Intro to Data warehousing lecture 02
AnwarrChaudary
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
AnwarrChaudary
 
Introduction to Software Engineering
Introduction to Software EngineeringIntroduction to Software Engineering
Introduction to Software Engineering
AnwarrChaudary
 
PDCA Plan Do Check Act
PDCA Plan Do Check ActPDCA Plan Do Check Act
PDCA Plan Do Check Act
AnwarrChaudary
 
Intro to Data warehousing lecture 20
Intro to Data warehousing   lecture 20Intro to Data warehousing   lecture 20
Intro to Data warehousing lecture 20
AnwarrChaudary
 
Intro to Data warehousing lecture 19
Intro to Data warehousing   lecture 19Intro to Data warehousing   lecture 19
Intro to Data warehousing lecture 19
AnwarrChaudary
 
Intro to Data warehousing lecture 18
Intro to Data warehousing   lecture 18Intro to Data warehousing   lecture 18
Intro to Data warehousing lecture 18
AnwarrChaudary
 
Intro to Data warehousing lecture 17
Intro to Data warehousing   lecture 17Intro to Data warehousing   lecture 17
Intro to Data warehousing lecture 17
AnwarrChaudary
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16
AnwarrChaudary
 
Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
AnwarrChaudary
 
Intro to Data warehousing lecture 14
Intro to Data warehousing   lecture 14Intro to Data warehousing   lecture 14
Intro to Data warehousing lecture 14
AnwarrChaudary
 
Intro to Data warehousing lecture 13
Intro to Data warehousing   lecture 13Intro to Data warehousing   lecture 13
Intro to Data warehousing lecture 13
AnwarrChaudary
 
Intro to Data warehousing lecture 12
Intro to Data warehousing   lecture 12Intro to Data warehousing   lecture 12
Intro to Data warehousing lecture 12
AnwarrChaudary
 
Intro to Data warehousing lecture 11
Intro to Data warehousing   lecture 11Intro to Data warehousing   lecture 11
Intro to Data warehousing lecture 11
AnwarrChaudary
 
Intro to Data warehousing lecture 10
Intro to Data warehousing   lecture 10Intro to Data warehousing   lecture 10
Intro to Data warehousing lecture 10
AnwarrChaudary
 
Intro to Data warehousing lecture 09
Intro to Data warehousing   lecture 09Intro to Data warehousing   lecture 09
Intro to Data warehousing lecture 09
AnwarrChaudary
 
Intro to Data warehousing lecture 07
Intro to Data warehousing   lecture 07Intro to Data warehousing   lecture 07
Intro to Data warehousing lecture 07
AnwarrChaudary
 
Intro to Data warehousing Lecture 06
Intro to Data warehousing   Lecture 06Intro to Data warehousing   Lecture 06
Intro to Data warehousing Lecture 06
AnwarrChaudary
 
Intro to Data warehousing lecture 05
Intro to Data warehousing   lecture 05Intro to Data warehousing   lecture 05
Intro to Data warehousing lecture 05
AnwarrChaudary
 
Intro to Data warehousing lecture 03
Intro to Data warehousing   lecture 03Intro to Data warehousing   lecture 03
Intro to Data warehousing lecture 03
AnwarrChaudary
 
Intro to Data warehousing lecture 02
Intro to Data warehousing   lecture 02Intro to Data warehousing   lecture 02
Intro to Data warehousing lecture 02
AnwarrChaudary
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
AnwarrChaudary
 
Introduction to Software Engineering
Introduction to Software EngineeringIntroduction to Software Engineering
Introduction to Software Engineering
AnwarrChaudary
 
PDCA Plan Do Check Act
PDCA Plan Do Check ActPDCA Plan Do Check Act
PDCA Plan Do Check Act
AnwarrChaudary
 
Ad

Recently uploaded (20)

Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptxUnit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Mayuri Chavan
 
Pope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptxPope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptx
Martin M Flynn
 
Module 1: Foundations of Research
Module 1: Foundations of ResearchModule 1: Foundations of Research
Module 1: Foundations of Research
drroxannekemp
 
Cyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top QuestionsCyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top Questions
SONU HEETSON
 
Botany Assignment Help Guide - Academic Excellence
Botany Assignment Help Guide - Academic ExcellenceBotany Assignment Help Guide - Academic Excellence
Botany Assignment Help Guide - Academic Excellence
online college homework help
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
Origin of Brahmi script: A breaking down of various theories
Origin of Brahmi script: A breaking down of various theoriesOrigin of Brahmi script: A breaking down of various theories
Origin of Brahmi script: A breaking down of various theories
PrachiSontakke5
 
PUBH1000 Slides - Module 11: Governance for Health
PUBH1000 Slides - Module 11: Governance for HealthPUBH1000 Slides - Module 11: Governance for Health
PUBH1000 Slides - Module 11: Governance for Health
JonathanHallett4
 
How to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 WebsiteHow to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 Website
Celine George
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-14-2025 .pptx
YSPH VMOC Special Report - Measles Outbreak  Southwest US 5-14-2025  .pptxYSPH VMOC Special Report - Measles Outbreak  Southwest US 5-14-2025  .pptx
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-14-2025 .pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2
LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2
LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2
LDM & Mia eStudios
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales moduleHow To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
Celine George
 
COPA Apprentice exam Questions and answers PDF
COPA Apprentice exam Questions and answers PDFCOPA Apprentice exam Questions and answers PDF
COPA Apprentice exam Questions and answers PDF
SONU HEETSON
 
Look Up, Look Down: Spotting Local History Everywhere
Look Up, Look Down: Spotting Local History EverywhereLook Up, Look Down: Spotting Local History Everywhere
Look Up, Look Down: Spotting Local History Everywhere
History of Stoke Newington
 
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docxPeer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
19lburrell
 
GENERAL QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 4 MARCH 2025 .pdf
GENERAL QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 4 MARCH 2025 .pdfGENERAL QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 4 MARCH 2025 .pdf
GENERAL QUIZ PRELIMS | QUIZ CLUB OF PSGCAS | 4 MARCH 2025 .pdf
Quiz Club of PSG College of Arts & Science
 
The History of Kashmir Lohar Dynasty NEP.ppt
The History of Kashmir Lohar Dynasty NEP.pptThe History of Kashmir Lohar Dynasty NEP.ppt
The History of Kashmir Lohar Dynasty NEP.ppt
Arya Mahila P. G. College, Banaras Hindu University, Varanasi, India.
 
CNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscessCNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscess
Mohamed Rizk Khodair
 
How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18
Celine George
 
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptxUnit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Mayuri Chavan
 
Pope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptxPope Leo XIV, the first Pope from North America.pptx
Pope Leo XIV, the first Pope from North America.pptx
Martin M Flynn
 
Module 1: Foundations of Research
Module 1: Foundations of ResearchModule 1: Foundations of Research
Module 1: Foundations of Research
drroxannekemp
 
Cyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top QuestionsCyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top Questions
SONU HEETSON
 
Botany Assignment Help Guide - Academic Excellence
Botany Assignment Help Guide - Academic ExcellenceBotany Assignment Help Guide - Academic Excellence
Botany Assignment Help Guide - Academic Excellence
online college homework help
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
Origin of Brahmi script: A breaking down of various theories
Origin of Brahmi script: A breaking down of various theoriesOrigin of Brahmi script: A breaking down of various theories
Origin of Brahmi script: A breaking down of various theories
PrachiSontakke5
 
PUBH1000 Slides - Module 11: Governance for Health
PUBH1000 Slides - Module 11: Governance for HealthPUBH1000 Slides - Module 11: Governance for Health
PUBH1000 Slides - Module 11: Governance for Health
JonathanHallett4
 
How to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 WebsiteHow to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 Website
Celine George
 
LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2
LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2
LDMMIA Reiki Yoga S6 Free Workshop Money Pt 2
LDM & Mia eStudios
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales moduleHow To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
Celine George
 
COPA Apprentice exam Questions and answers PDF
COPA Apprentice exam Questions and answers PDFCOPA Apprentice exam Questions and answers PDF
COPA Apprentice exam Questions and answers PDF
SONU HEETSON
 
Look Up, Look Down: Spotting Local History Everywhere
Look Up, Look Down: Spotting Local History EverywhereLook Up, Look Down: Spotting Local History Everywhere
Look Up, Look Down: Spotting Local History Everywhere
History of Stoke Newington
 
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docxPeer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
19lburrell
 
CNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscessCNS infections (encephalitis, meningitis & Brain abscess
CNS infections (encephalitis, meningitis & Brain abscess
Mohamed Rizk Khodair
 
How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18How to Share Accounts Between Companies in Odoo 18
How to Share Accounts Between Companies in Odoo 18
Celine George
 

Intro to Data warehousing lecture 08

  • 1. 1 Data Warehousing Dimensional Modeling (DM) Ch Anwar ul Hassan (Lecturer) Department of Computer Science and Software Engineering Capital University of Sciences & Technology, Islamabad Pakistan anwarchaudary@gmail.com
  • 2. 2 The need for ER modeling?  Problems with early COBOLian data processing systems.  Data redundancies  From flat file to Table, each entity ultimately becomes a Table in the physical schema.  Simple O(n2) join to work with tables
  • 3. 3 Why ER Modeling has been so successful?  Coupled with normalization drives out all the redundancy out of the database.  Change (or add or delete) the data at just one point.  Can be used with indexing for very fast access.  Resulted in success of OLTP systems.
  • 4. 4 Need for DM: Un-answered Qs  Lets have a look at a typical ER data model first.  Some Observations:  All tables look-alike, as a consequence it is difficult to identify:  Which table is more important ?  Which is the largest?  Which tables contain numerical measurements of the business?  Which table contain nearly static descriptive attributes?
  • 5. Is DM really needed? In order to better understand the need for DM lets have a look at the diagram showing the retail data in simplified 3N 5
  • 6. 6 Need for DM: Complexity of Representation  Many topologies for the same ER diagram, all appearing different. Very hard to visualize and remember. A large number of possible connections to any two (or more) tables 1 10 3 12 2 6 5 11 4 7 8 9 1 10 3 12 2 6 5 11 4 7 8 9
  • 7. 7 Need for DM: The Paradox  The Paradox: Trying to make information accessible using tables resulted in an inability to query them!  ER and Normalization result in large number of tables which are:  Hard to understand by the users (DB programmers)  Hard to navigate optimally by DBMS software  Real value of ER is in using tables individually or in pairs  Too complex for queries that span multiple tables with a large number of records
  • 8. 8 ER vs. DM ER DM Constituted to optimize OLTP performance. Constituted to optimize DSS query performance. Models the micro relationships among data elements. Models the macro relationships among data elements with an overall deterministic strategy. A wild variability of the structure of ER models. All dimensions serve as equal entry points to the fact table. Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical. Changes in users' querying habits can be accommodated by automatic SQL generators.
  • 9. 9 How to simplify a ER data model? Two general methods:  De-Normalization  Dimensional Modeling (DM)
  • 10. 10 What is DM?…  A simpler logical model optimized for decision support.  Inherently dimensional in nature, with a single central fact table and a set of smaller dimensional tables.  Multi-part key for the fact table  Dimensional tables with a single-part PK.  Keys are usually system generated
  • 11. 11 What is DM?...  Results in a star like structure, called star schema or star join.  All relationships mandatory M-1.  Single path between any two levels.  Supports ROLAP operations.
  • 12. 12 Dimensions have Hierarchies Items Books Cloths Fiction Text Men Women MedicalEngg Analysts tend to look at the data through dimension at a particular “level” in the hierarchy
  • 14. 14 “Simplified” 3NF (Retail) CITY DISTRICT 1 ZONE CITY DISTRICT DIVISION MONTH QTR STORE # STREET ZONE ... WEEK MONTH DATE WEEK RECEIPT # STORE # DATE ... ITEM #RECEIPT # ... $ ITEM # CATEGORY ITEM # DEPTCATEGORY year month week sale_header store sale_detail item_x_cat item_x_splir cat_x_dept M 1 M 1M 1 M 1 1 M M 1 M M M1 1 M 1 1 M YEAR QTR 1 M quarter SUPPLIER DIVISION PROVINCEM 1 BACK division district zone
  • 15. 15 Vastly Simplified Star Schema RECEIPT# STORE# DATE ITEM# M Fact Table ITEM# CATEGORY DEPT SUPPLIER Product Dim M Sale Rs. M STORE# ZONE CITY PROVINCE Geography Dim DISTRICT DATE WEEK QUARTER YEAR Time Dim MONTH . . . 1 1 1 facts DIVISION
  • 16. 16 The Benefit of Simplicity Beauty lies in close correspondence with the business, evident even to business users.
  • 17. 17 Features of Star Schema Dimensional hierarchies are collapsed into a single table for each dimension. Loss of Information? A single fact table created with a single header from the detail records, resulting in:  A vastly simplified physical data model!  Fewer tables (thousands of tables in some ERP systems).  Fewer joins resulting in high performance.  Quantifying space requirement
  • 18. 18 The Process of Dimensional Modeling Four Step Method from ER to DM 1. Choose the Business Process 2. Choose the Grain 3. Choose the Facts 4. Choose the Dimensions
  • 19. 19 Step-1: Choose the Business Process  A business process is a major operational process in an organization.  Typically supported by a legacy system (database) or an OLTP.  Examples: Orders, Invoices, Inventory etc.  Business Processes are often termed as Data Marts and that is why many people criticize DM as being data mart oriented.
  • 21. 21 Step-2: Choosing the Grain  Grain is the fundamental, atomic level of data to be represented.  Grain is also termed as the unit of analyses.  Example grain statements  Typical grains  Individual Transactions  Daily aggregates (snapshots)  Monthly aggregates  Relationship between grain and expressiveness.  Grain vs. hardware trade-off.
  • 22. 22 Step-2: Relationship b/w Grain Daily aggregates 6 x 4 = 24 values Four aggregates per week 4 x 4 = 16 values Two aggregates per week 2 x 4 = 8 values LOW Granularity HIGH Granularity
  • 23. 23 The case FOR data aggregation  Works well for repetitive queries.  Follows the known thought process.  Justifiable if used for max number of queries.  Provides a “big picture” or macroscopic view.
  • 24. 24 The case AGAINST data aggregation Aggregation is irreversible.  Can create monthly sales data from weekly sales data, but the reverse is not possible. Aggregation limits the questions that can be answered.  What, when, why, where, what-else, what-next
  • 25. 25 The case AGAINST data aggregation Aggregation can hide crucial facts. The average of 100 & 100 is same as 150 & 50
  • 26. 26 Aggregation hides crucial facts Example Week-1 Week-2 Week-3 Week-4 Average Zone-1 100 100 100 100 100 Zone-2 50 100 150 100 100 Zone-3 50 100 100 150 100 Zone-4 200 100 50 50 100 Average 100 100 100 100 Just looking at the averages i.e. aggregate
  • 27. 27 Aggregation hides crucial facts chart 0 50 100 150 200 250 Week-1 Week-2 Week-3 Week-4 Z1 Z2 Z3 Z4 Z1: Sale is constant (need to work on it) Z2: Sale went up, then fell (need of concern) Z3: Sale is on the rise, why? Z4: Sale dropped sharply, need to look deeply. W2: Static sale
  • 28. 28 “We need monthly sales volume and Rs. by week, product and Zone” Facts Dimensions Step 3: Choose Facts statement
  • 29. 29  Choose the facts that will populate each fact table record.  Remember that best Facts are Numeric, Continuously Valued and Additive.  Example: Quantity Sold, Amount etc. Step 3: Choose Facts
  • 30. 30  Choose the dimensions that apply to each fact in the fact table.  Typical dimensions: time, product, geography etc.  Identify the descriptive attributes that explain each dimension.  Determine hierarchies within each dimension. Step 4: Choose Dimensions
  • 31. 31 Step-4: How to Identify a Dimension?  The single valued attributes during recording of a transaction are dimensions. Calendar_Date Time_of_Day Account _No ATM_Location Transaction_Type Transaction_Rs Fact Table Dim Time_of_day: Morning, Mid Morning, Lunch Break etc. Transaction_Type: Withdrawal, Deposit, Check balance etc.
  • 32. 32 Step-4: Can Dimensions be Multi-valued?  Are dimensions ALWYS single?  Not really  What are the problems? And how to handle them  Calendar_Date (of inspection)  Reg_No  Technician  Workshop  Maintenance_Operation  How many maintenance operations are possible?  Few  Maybe more for old cars.
  • 33. 33 Step-4: Dimensions & Grain  Several grains are possible as per business requirement.  For some aggregations certain descriptions do not remain atomic.  Example: Time_of_Day may change several times during daily aggregate, but not during a transaction  Choose the dimensions that are applicable within the selected grain.
  翻译: