
Exercise #4 ETL

Group 1:
- Nguyen Anh Vu
- Pham Quoc Tuan
- Tran Duy Trung

LOGICAL MODEL
Internal order fact (Order from supermarket to company)

SUPPLIER_DIM: SUPPLIER_ID (PK), SUPPLIER_NAME
AREA_DIM: SUP_ID (PK), SUP_NAME, ADDR, DISTRICT, CITY
PROD_DIM: PROD_ID (PK), PROD_NAME, PROD_GROUP (FK), SUPPLIER_ID (FK)
PROD_GROUP: PROD_GROUP (PK), GROUP_NAME
INTERNAL_ORDER: SUP_ID (FK), PROD_ID (FK), DATE_ID (FK) (together the PK), QUANTITY
D_IN_ORDER_DIM: DATE_ID, DATE, MONTH, YEAR

1. Mapping rules to DW conceptual schema:
In this step, we identify which attributes of DB1 and DB2 have the same meaning (the DB1 and DB2 columns below). Then, based on the DW logical design, we choose which attributes will be extracted and used.

Table    DB1            DB2            Stage DS       DDS
SALE     BILLNO
         TOTAL_VALUE    S_AMOUNT
         B_DATE         S_DATE
         NAME_CASHIER
         SM_NAME_ADDR                  SM_ADDRESS     AREA_DIM/STORE
         PRODNO         PROD_CODE      PROD_ID        PROD_DIM/PROD_ID
         QTY            S_QTY
         UNIT_PRICE     U_PRICE
PRODUCT  PRODNO         PROD_CODE      PROD_ID        PROD_DIM/PROD_ID
         P_NAME         PROD_NAME      PROD_NAME      PROD_DIM/PROD_NAME
         UNIT_PRICE
         PROD_CAT       PROD_CAT       GROUP_NAME     PROD_GROUP/PROD_GROUP
SM_WH    PRODNO         PROD_CODE      PROD_ID        PROD_DIM/PROD_ID
         QTY_IN_STOCK   QTY_IN_STOCK
         DAMAGED_QTY    DAMAGED_QTY
         REQUESTED_QTY  ORDERED_QTY    REQUESTED_QTY  INTERNAL_ORDER/QUANTITY
         DATE                          ORDER_DATE     D_IN_ORDER_DIM
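The mapping rules can be applied mechanically as one rename dictionary per source. The sketch below is a minimal illustration, assuming each source row has already been read (and, for DB1, joined across its tables) into a Python dict keyed by source column name; the dictionaries and the function `to_stage` are hypothetical names. Stage DS columns that have no source column stay `None` until a later step fills them.

```python
# Hypothetical rename tables: source column -> Stage DS column, taken from the
# mapping rules. Source columns not listed here are not extracted.
DB1_TO_STAGE = {
    "PRODNO": "PROD_ID",
    "P_NAME": "PROD_NAME",
    "PROD_CAT": "GROUP_NAME",
    "REQUESTED_QTY": "REQUESTED_QTY",
    "DATE": "ORDER_DATE",
    "SM_NAME_ADDR": "SM_ADDRESS",
}
DB2_TO_STAGE = {
    "PROD_CODE": "PROD_ID",
    "PROD_NAME": "PROD_NAME",
    "PROD_CAT": "GROUP_NAME",
    "ORDERED_QTY": "REQUESTED_QTY",
}
STAGE_COLUMNS = ("PROD_ID", "PROD_NAME", "GROUP_NAME",
                 "REQUESTED_QTY", "ORDER_DATE", "SM_ADDRESS")


def to_stage(row: dict, mapping: dict) -> dict:
    """Rename one source row to Stage DS column names (hypothetical helper).

    Every Stage DS column is present in the result; columns without a
    source value are None and are filled in by later transformations.
    """
    staged = {col: None for col in STAGE_COLUMNS}
    for src_col, stage_col in mapping.items():
        if src_col in row:
            staged[stage_col] = row[src_col]
    return staged
```

Keeping the rules as data rather than code means a new source only needs a new dictionary.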

Mapping from Stage DS to NDS and to DDS:

Stage DS        NDS                       DDS
PROD_ID         PROD_ID                   PROD_DIM/PROD_ID
PROD_NAME       PROD_NAME                 PROD_DIM/PROD_NAME
GROUP_NAME      PROD_GROUP                PROD_GROUP/PROD_GROUP
REQUESTED_QTY   QUANTITY                  INTERNAL_ORDER/QUANTITY
ORDER_DATE      ORDER_DATE/DATE           D_IN_ORDER_DIM/DATE
                ORDER_DATE/MONTH          D_IN_ORDER_DIM/MONTH
                ORDER_DATE/YEAR           D_IN_ORDER_DIM/YEAR
SM_ADDRESS      SUPERMARKET/SM_NAME       AREA_DIM/SUPERMARKET
                SUPERMARKET/SM_STREET     AREA_DIM/STREET
                SUPERMARKET/SM_DISTRICT   AREA_DIM/DISTRICT
                SUPERMARKET/SM_CITY       AREA_DIM/CITY

We store the data of both Stage DS and NDS in a relational database.

Schema for Stage DS

SUPERMARKET_TABLE: {
PROD_ID: Integer
ORDER_DATE: String
PROD_NAME: String
GROUP_NAME: String
REQUESTED_QTY: Integer
SM_ADDRESS: String}
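A minimal sketch of this Stage DS table, assuming SQLite as the relational database (the document does not name one). ORDER_DATE and SM_ADDRESS are kept as raw text because their source formats are inconsistent; they are parsed only in the Stage DS to NDS transformation. The inserted row is the fourth example row.

```python
import sqlite3

# ORDER_DATE and SM_ADDRESS stay raw text; their inconsistent formats
# ("25 Nov 2016", "06/23/20016", ...) are parsed later, in Stage DS -> NDS.
STAGE_DDL = """
CREATE TABLE IF NOT EXISTS SUPERMARKET_TABLE (
    PROD_ID       INTEGER,
    ORDER_DATE    TEXT,
    PROD_NAME     TEXT,
    GROUP_NAME    TEXT,
    REQUESTED_QTY INTEGER,
    SM_ADDRESS    TEXT
)
"""

conn = sqlite3.connect(":memory:")  # assumption: SQLite, in memory for the sketch
conn.execute(STAGE_DDL)
conn.execute(
    "INSERT INTO SUPERMARKET_TABLE "
    "(PROD_ID, PROD_NAME, GROUP_NAME, REQUESTED_QTY, ORDER_DATE, SM_ADDRESS) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (45, "Neptune", "food", 10, "22-9-2016", "No Information"),
)
```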
Example:
SUPERMARKET_TABLE:
PROD_ID*  PROD_NAME*      GROUP_NAME   REQUESTED_QTY  ORDER_DATE    SM_ADDRESS
5         Dieu Hong Fish  Fish         10             25 Nov 2016   Big C Thao Diem, số 12 đường Quốc Hương, phường Thảo Điền, quận 2, HCM
134       Ba Co Gai       Canned food  100            06/23/20016   Big C Go Vap, 792 Nguyễn Kiệm, P.3, Q.Gò Vấp, Ho Chi Minh
34        OMO             detergent    100            23/12/16      BIG C Hoang Van Thu, 202B Hoàng Văn Thụ, P.9, Q. Phú Nhuận, TP.HCM.
45        Neptune         food         10             22-9-2016     No Information

Schema for NDS:


The NDS database has three tables:

SUPERMARKET: {
SM_ID: Integer
SM_NAME: String
SM_STREET: String
SM_DISTRICT: String
SM_CITY: String }

ORDER_DATE: {
DATE_ID: Integer
DATE: Integer (from 1 to 31)
MONTH: Integer (from 1 to 12)
YEAR: Integer }

SUPERMARKET_ORDER: {
PROD_ID: Integer
DATE_ID: Integer
PROD_NAME: String
PROD_GROUP: String
QUANTITY: Integer
SM_ID: Integer }
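The NDS schema could be declared in SQLite roughly as follows. This is a sketch under assumptions: SQLite itself, the primary and foreign keys, and the CHECK constraints are our reading of the schema above rather than part of it. SM_ID 0 is reserved for the "No information" supermarket, as in the example that follows.

```python
import sqlite3

NDS_DDL = """
CREATE TABLE SUPERMARKET (
    SM_ID       INTEGER PRIMARY KEY,
    SM_NAME     TEXT,
    SM_STREET   TEXT,
    SM_DISTRICT TEXT,
    SM_CITY     TEXT
);
CREATE TABLE ORDER_DATE (
    DATE_ID INTEGER PRIMARY KEY,
    "DATE"  INTEGER CHECK ("DATE" BETWEEN 1 AND 31),
    MONTH   INTEGER CHECK (MONTH BETWEEN 1 AND 12),
    YEAR    INTEGER
);
CREATE TABLE SUPERMARKET_ORDER (
    PROD_ID    INTEGER,
    DATE_ID    INTEGER REFERENCES ORDER_DATE (DATE_ID),
    PROD_NAME  TEXT,
    PROD_GROUP TEXT,
    QUANTITY   INTEGER,
    SM_ID      INTEGER REFERENCES SUPERMARKET (SM_ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(NDS_DDL)
# SM_ID 0 stands for orders whose source has no supermarket information.
conn.execute("INSERT INTO SUPERMARKET VALUES "
             "(0, 'No information', 'No Information', 'No Information', 'No Information')")
```

With these constraints, an unparsed day such as 0 or an order pointing at an unknown supermarket is rejected at load time instead of reaching the DDS.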

Example:
SUPERMARKET:
SM_ID  SM_NAME              SM_STREET           SM_DISTRICT     SM_CITY
0      No information       No Information      No Information  No Information
12     Big C Go Vap         792 Nguyen Kiem     Go Vap          Ho Chi Minh
5      Big C Hoang Van Thu  202B Hoang Van Thu  Phu Nhuan       Ho Chi Minh
1      Big C Thao Dien      12 Quoc Huong       2               Ho Chi Minh

ORDER_DATE
DATE_ID DATE MONTH YEAR
1 25 11 2016
2 23 6 2016
3 23 12 2016
4 22 9 2016

SUPERMARKET_ORDER
PROD_ID  DATE_ID  PROD_NAME       PROD_GROUP   QUANTITY  SM_ID
5        1        Dieu Hong Fish  Fish         10        1
134      2        Ba Co Gai       canned food  100       12
34       3        OMO             detergent    100       5
45       4        Neptune         Food         10        0

2. Transformation:
2.1 Transformation in Extraction step:
The transformation tasks in this step are converting the different data types of the sources into the common data types of the Stage DS database and filling in missing values.
Changing data types from the Excel file to the Stage DS database:
- Data types in the Excel file are not standardized: they depend on how Microsoft Excel interprets a value or on how a person entered it manually, so they are inconsistent. We therefore convert the data read from the Excel file to the common data types of Stage DS.
For example:

Column   Value in Excel      Value in Stage DS
PRODNO   0005 (string)       5 (integer)
DATE     22-9-2016 (date)    22-9-2016 (string)
Filling missing values:
- DB2 has no supermarket information (the equivalent of SM_NAME_ADDR in DB1), so we fill the supermarket field of DB2 rows with the value “No Information”.
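Both extraction-time transformations fit in one small function. This sketch assumes each row has already been renamed to Stage DS column names, and that a cell Excel typed as a date arrives as a Python `date` or `datetime` (openpyxl, for instance, returns date-formatted cells as `datetime` objects); the function name `extract_transform` is hypothetical.

```python
from datetime import date

NO_INFO = "No Information"  # placeholder value, as in the Stage DS example


def extract_transform(row: dict) -> dict:
    """Apply the extraction-time transformations to one row (hypothetical helper)."""
    staged = dict(row)
    # "0005" read as text from Excel -> 5 stored as an integer in Stage DS.
    staged["PROD_ID"] = int(str(row["PROD_ID"]).strip())
    # A date-typed cell arrives as a date/datetime object (datetime is a subclass
    # of date); Stage DS keeps ORDER_DATE as text, day-month-year, e.g. 22-9-2016.
    order_date = row.get("ORDER_DATE")
    if isinstance(order_date, date):
        staged["ORDER_DATE"] = f"{order_date.day}-{order_date.month}-{order_date.year}"
    # DB2 has no supermarket information, so the gap is filled with a fixed value.
    if not row.get("SM_ADDRESS"):
        staged["SM_ADDRESS"] = NO_INFO
    return staged
```

Text dates are passed through unchanged here; they are parsed only in the Stage DS to NDS step.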

2.2 Transformation from Stage DS to NDS:


Transformation from Stage DS to NDS includes normalization, standardization and correction.
ORDER_DATE:

Raw value     Normalization                       Standardization                    Correction
25 Nov 2016   Date: 25, Month: Nov, Year: 2016    Date: 25, Month: 11, Year: 2016
06/23/20016   Date: 23, Month: 06, Year: 20016    Date: 23, Month: 6, Year: 20016    Date: 23, Month: 6, Year: 2016
23/12/16      Date: 23, Month: 12, Year: 16       Date: 23, Month: 12, Year: 2016
22-9-2016     Date: 22, Month: 9, Year: 2016      Date: 22, Month: 9, Year: 2016

Normalization rule: split the original date string on “/”, “ ” (space) or “-”. The third part is the year. For the first and second parts: a part greater than 12 is the day; otherwise it is the month.
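The three ORDER_DATE steps can be sketched as three small functions. Normalization follows the rule above; two details are our assumptions and are marked in the comments: a month written as a name (e.g. "Nov") is recognized by being non-numeric, and when both numeric parts are 12 or less the day is assumed to come first. The correction of "20016" to "2016" uses a hypothetical heuristic (collapse a doubled zero), since the document does not say how the year was repaired.

```python
import re

MONTH_NAMES = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
               "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def normalize_date(raw: str) -> dict:
    """Normalization: split on "/", " " or "-" and label the three parts."""
    first, second, year = re.split(r"[/\- ]+", raw.strip())
    if not first.isdigit():              # month name first (assumption)
        day, month = second, first
    elif not second.isdigit():           # month name second, e.g. "25 Nov 2016"
        day, month = first, second
    elif int(first) > 12:                # "23/12/16": 23 can only be the day
        day, month = first, second
    elif int(second) > 12:               # "06/23/20016": 23 can only be the day
        day, month = second, first
    else:                                # both <= 12: day first (assumption)
        day, month = first, second
    return {"date": day, "month": month, "year": year}


def standardize_date(parts: dict) -> dict:
    """Standardization: integers everywhere; month names and 2-digit years expanded."""
    month = parts["month"]
    month = int(month) if month.isdigit() else MONTH_NAMES[month[:3].lower()]
    year = int(parts["year"])
    if year < 100:                       # assumption: "16" means 2016
        year += 2000
    return {"date": int(parts["date"]), "month": month, "year": year}


def correct_date(parts: dict) -> dict:
    """Correction: repair values that are still invalid after standardization."""
    year = parts["year"]
    if year > 9999:
        # Hypothetical repair for a doubled zero: 20016 -> 2016.
        year = int(str(year).replace("00", "0", 1))
    if not (1 <= parts["date"] <= 31 and 1 <= parts["month"] <= 12 and year <= 9999):
        raise ValueError(f"cannot correct date {parts}")
    return {**parts, "year": year}


def transform_date(raw: str) -> dict:
    return correct_date(standardize_date(normalize_date(raw)))
```

For example, `transform_date("06/23/20016")` goes through the same three stages as the second row of the table and ends at day 23, month 6, year 2016.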

SM_ADDRESS:
Big C Thao Diem, số 12 đường Quốc Hương, phường Thảo Điền, quận 2, HCM
  Normalization:   Name: Big C Thao Diem; Street: số 12 đường Quốc Hương; District: quận 2; City: HCM
  Standardization: Name: Big C Thao Diem; Street: 12 Quoc Huong; District: 2; City: Ho Chi Minh
  Correction:      Name: Big C Thao Dien; Street: 12 Quoc Huong; District: 2; City: Ho Chi Minh

Big C Go Vap, 792 Nguyễn Kiệm, P.3, Q.Gò Vấp, Ho Chi Minh
  Normalization:   Name: Big C Go Vap; Street: 792 Nguyễn Kiệm; District: Q.Gò Vấp; City: Ho Chi Minh
  Standardization: Name: Big C Go Vap; Street: 792 Nguyen Kiem; District: Go Vap; City: Ho Chi Minh

BIG C Hoang Van Thu, 202B Hoàng Văn Thụ, P.9, Q. Phú Nhuận, TP.HCM.
  Normalization:   Name: BIG C Hoang Van Thu; Street: 202B Hoàng Văn Thụ; District: Q. Phú Nhuận; City: TP.HCM.
  Standardization: Name: BIG C Hoang Van Thu; Street: 202B Hoang Van Thu; District: Phu Nhuan; City: Ho Chi Minh

Normalization rule: split the original address on “,”, “. ” or “-”. The first part is the supermarket name, the second part is the street (number and name), the fourth part is the district and the fifth part is the city; the third part (the ward, e.g. “P.3”) is not used.
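A sketch of the SM_ADDRESS steps, with hypothetical function names and lookup tables. It splits on commas only: splitting on “. ” as well would cut “Q. Phú Nhuận” into two parts and shift the positions the rule relies on. Standardization drops the Vietnamese prefixes seen in the examples (số, đường, quận, Q.), removes diacritics and maps city aliases; correction uses a small, hypothetical table of known misspellings.

```python
import re
import unicodedata

# Hypothetical lookup tables.
CITY_ALIASES = {"hcm": "Ho Chi Minh", "tp.hcm": "Ho Chi Minh", "ho chi minh": "Ho Chi Minh"}
NAME_CORRECTIONS = {"Big C Thao Diem": "Big C Thao Dien"}


def strip_accents(text: str) -> str:
    """Remove Vietnamese diacritics, e.g. "Nguyễn Kiệm" -> "Nguyen Kiem"."""
    text = text.replace("đ", "d").replace("Đ", "D")  # đ/Đ do not decompose under NFD
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_address(raw: str):
    """Normalization: name, street, ward, district, city; the ward is dropped."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 5:
        return None                                 # e.g. "No Information"
    name, street, _ward, district, city = parts
    return {"name": name, "street": street, "district": district, "city": city}


def standardize_address(parts: dict) -> dict:
    """Standardization: drop Vietnamese prefixes, remove accents, unify city names."""
    street = re.sub(r"\b(số|đường)\s+", "", parts["street"], flags=re.IGNORECASE)
    district = re.sub(r"^(quận|q\.)\s*", "", parts["district"], flags=re.IGNORECASE)
    city = parts["city"].rstrip(".")
    city = CITY_ALIASES.get(city.lower(), city)
    return {"name": strip_accents(parts["name"]), "street": strip_accents(street),
            "district": strip_accents(district), "city": strip_accents(city)}


def correct_address(parts: dict) -> dict:
    """Correction: replace known misspelled supermarket names."""
    return {**parts, "name": NAME_CORRECTIONS.get(parts["name"], parts["name"])}
```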

3. Methods for the extraction and loading steps:


3.1 The extraction step:
Because the data warehouse serves business optimization goals, its data is queried and analyzed periodically. We therefore use incremental extraction, querying the sources periodically; for example, the data is updated at the end of each month.
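Incremental extraction is commonly implemented with a watermark: each run remembers the latest timestamp it extracted and the next run asks only for newer rows. The sketch assumes a hypothetical source table ORDERS whose ORDER_DAY column holds ISO dates (YYYY-MM-DD), so that text comparison matches date order; the caller is expected to persist the returned watermark between runs.

```python
import sqlite3


def extract_increment(source: sqlite3.Connection, watermark: str):
    """Fetch only rows newer than `watermark`; return them with the new watermark.

    ORDERS, PRODNO, REQUESTED_QTY and ORDER_DAY are hypothetical names.
    """
    rows = source.execute(
        "SELECT PRODNO, REQUESTED_QTY, ORDER_DAY FROM ORDERS "
        "WHERE ORDER_DAY > ? ORDER BY ORDER_DAY",
        (watermark,),
    ).fetchall()
    # The newest ORDER_DAY seen becomes the starting point of the next run.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark
```

Using `>` assumes a period's rows are complete before that period is extracted; rows that arrive late with an already-extracted date would be missed.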
3.1.1 Extraction from Source Database to Stage DS
3.1.2 Extraction from Source Excel file to Stage DS
3.2 Loading step:
3.2.1 The data set used to load the dimension tables and to derive the fact measure is shown in Table 1.
Since time data is naturally appended, we use SCD type 2 (slowly changing dimension) for D_IN_ORDER_DIM.
Store addresses rarely change, so we use SCD type 1 for AREA_DIM.
For PROD_DIM, we use SCD type 3.
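The three slowly changing dimension types differ only in how an update is applied. The sketch below shows each on plain Python dicts; the column names `sk`, `valid_from`, `valid_to` and the `prev_` prefix are hypothetical conventions, not part of the design above.

```python
from datetime import date


def scd1_update(row: dict, changes: dict) -> None:
    """SCD type 1: overwrite the attributes in place; the old values are lost."""
    row.update(changes)


def scd2_update(rows: list, key: str, key_value, changes: dict, on: date) -> None:
    """SCD type 2: close the current version and append a new one.

    Each version has its own surrogate key `sk` and a validity interval;
    the current version is the one whose `valid_to` is None.
    """
    current = next(r for r in rows if r[key] == key_value and r["valid_to"] is None)
    new_version = {**current, **changes,
                   "sk": max(r["sk"] for r in rows) + 1,
                   "valid_from": on, "valid_to": None}
    current["valid_to"] = on
    rows.append(new_version)


def scd3_update(row: dict, column: str, new_value) -> None:
    """SCD type 3: keep only the previous value, in an extra `prev_` column."""
    row["prev_" + column] = row[column]
    row[column] = new_value
```

Type 2 keeps the full history as rows, type 3 keeps one step of history as columns, and type 1 keeps none, which matches the choice of type 1 for rarely changing store addresses.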

3.2.2 Materialized view loading:


In our DW conceptual model, INTERNAL_ORDER is a base fact, so we only have to load the base fact.
If the DW contains more facts, the update order of the views must be determined, so that each view is refreshed after the view it is derived from.
For example:
(AREA_Store, Day, Prod) --> (AREA_District, Month, Prod) -->
(AREA_District, Month, Prod_Group)
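The update order can be illustrated by computing each view from the previous one instead of from the base fact. In this sketch the lookup tables stand in for AREA_DIM and PROD_DIM and take their values from the NDS example; the base rows are the example orders, and the function name `roll_up` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Stand-ins for AREA_DIM (store -> district) and PROD_DIM (product -> group),
# using values from the NDS example.
STORE_DISTRICT = {1: "2", 12: "Go Vap", 5: "Phu Nhuan"}
PROD_TO_GROUP = {5: "Fish", 134: "canned food", 34: "detergent"}


def roll_up(view: dict, key_fn) -> dict:
    """Re-aggregate the QUANTITY of a view under the coarser key given by key_fn."""
    totals = defaultdict(int)
    for key, quantity in view.items():
        totals[key_fn(key)] += quantity
    return dict(totals)


# Base fact INTERNAL_ORDER: (store, day, product) -> QUANTITY
base = {
    (1, date(2016, 11, 25), 5): 10,
    (12, date(2016, 6, 23), 134): 100,
    (5, date(2016, 12, 23), 34): 100,
}

# Step 1: (store, day, prod) -> (district, month, prod)
district_month_prod = roll_up(
    base, lambda k: (STORE_DISTRICT[k[0]], (k[1].year, k[1].month), k[2]))

# Step 2 is computed from step 1, not from the base fact:
# (district, month, prod) -> (district, month, prod_group)
district_month_group = roll_up(
    district_month_prod, lambda k: (k[0], k[1], PROD_TO_GROUP[k[2]]))
```

Because step 2 reads step 1's output, step 1 must be refreshed first; reversing the order would aggregate stale data.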
