
International Journal of Information Technology & Management Information System (IJITMIS)
ISSN 0976-6405 (Print), ISSN 0976-6413 (Online)
Volume 5, Issue 2, May - August (2014), pp. 40-50
IAEME: http://www.iaeme.com/IJITMIS.asp
Journal Impact Factor (2014): 6.2217 (Calculated by GISI)
www.jifactor.com
KNOWLEDGE DISCOVERY FROM VEHICLE E-GOVERNANCE DATA USING DATA WAREHOUSING AND DATA MINING

Pushpal Desai1
1(M.Sc. (I.T.) Programme, VNSGU, Surat, India)

ABSTRACT
In this paper, multidimensional schema design, data cubes and OLAP operations on
Vehicle e-governance data are discussed. The proposed data mining model and its
implementation on Vehicle e-governance data are also discussed. In the first phase, a Clustering
data mining algorithm is implemented to identify important clusters in the Vehicle
e-governance data. In the second phase, an Association Rules Mining algorithm is applied to
explore novel relationships within the important data clusters observed in the first phase. The
results indicate that novel relationships can be found using the proposed model.
Keywords: Clustering, Association Rules Analysis, Microsoft SQL Server Analysis
Services.
I. INTRODUCTION

Inmon, who is known as the father of data warehousing, defines a data warehouse as a
subject oriented, integrated, nonvolatile, and time variant collection of data in support of
management decisions [7] [8]. Han and Kamber defined data mining as "extracting or
mining knowledge from large amounts of data" [2]. Data warehouses and data mining
algorithms are applied in various domains for knowledge discovery. They have been
successfully used in Banking, Insurance, Finance, Marketing,
Education, Telecommunication, Medical Science, Power Industry, Weather Forecasting,
Product Design, Customer Relationship Management (CRM), etc. In our earlier research
work, we tried to find association rules from Birth Registration, Deceased Registration,
Property and Vehicle e-governance data [4] [5] [6]. In this article, an Association Rules
algorithm is applied to find interesting patterns and relationships in the different clusters of
Vehicle e-governance data.

II. METHODOLOGY

To provide a better understanding of the proposed knowledge discovery model, a flowchart of the
model is shown in Figure 1. The proposed model for knowledge
discovery involves three major phases.
In the first phase, various data preprocessing tasks are performed on the source data to
convert it into clean and consistent data.

Fig 1: Proposed Model for Knowledge discovery for e-governance data
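The paper does not list the individual preprocessing tasks of the first phase, so the following Python sketch only illustrates the kind of cleaning such a phase might involve. The field names, the sample records and the rule of dropping rows with empty fields or repeated registration numbers are all assumptions made for illustration, not the author's procedure.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"reg_no": "GJ05-0001", "owner_surname": " patel ", "vehicle_type": "Motorcycle"},
    {"reg_no": "GJ05-0001", "owner_surname": "PATEL", "vehicle_type": "MOTORCYCLE"},
    {"reg_no": "GJ05-0002", "owner_surname": "Shah", "vehicle_type": "car"},
    {"reg_no": "GJ05-0003", "owner_surname": "", "vehicle_type": "CAR"},
]


def clean_record(record):
    """Collapse whitespace and upper-case every field; return None if any field is empty."""
    cleaned = {key: " ".join(value.split()).upper() for key, value in record.items()}
    if not all(cleaned.values()):
        return None
    return cleaned


def preprocess(records):
    """Clean all records, dropping incomplete rows and repeated registration numbers."""
    seen_reg_nos = set()
    result = []
    for record in records:
        cleaned = clean_record(record)
        if cleaned is None or cleaned["reg_no"] in seen_reg_nos:
            continue
        seen_reg_nos.add(cleaned["reg_no"])
        result.append(cleaned)
    return result


clean_records = preprocess(raw_records)
```

Normalising case matters for the later phases because values such as "Patel" and "PATEL" would otherwise be treated as different states of the same variable.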


In the second phase, the data warehouse is designed from the preprocessed data, considering
the various analytical needs of the organization. In the first task, the dimensions, facts and
measures are identified, keeping in mind the organization's analytical purpose. In the next task,
the multidimensional schema is designed from the dimension tables and
fact tables. In the last task, data cubes are created and various OLAP operations such as
drill-down, slice and dice are performed on them.
In the third phase, clustering and association rules mining algorithms are used to
discover knowledge from the data warehouse. In the first task, a clustering algorithm is applied
to the data cube to identify its major clusters or groups. In the second task,
an association rules mining algorithm is applied to find novel and interesting relationships in
the data clusters observed in the first task.

III. MULTIDIMENSIONAL SCHEMA DESIGN

The multidimensional schema is designed for Vehicle e-governance data, and OLAP
operations are performed on data cubes for data analysis. Typically, automobile companies
keep adding new models, and hence frequent updates are required in the data warehouse.
The Snowflake schema design is proposed because it stores data in normalized form, so
vehicle models can be updated in one place in the data warehouse. In contrast, the Star
Schema design stores data in de-normalized form, which makes it more difficult to update data in
the data warehouse. In the proposed Snowflake Schema design, the VehicleRegistrationbase
table was used as the Fact Table and Modelmasterbase, Companynamemaster,
Vehicletypebase and Sitemaster were used as Dimension Tables. The measures include
Vehicle Registration Count, Vehicle Amount and Tax Amount. Figure 2 shows the
proposed Snowflake Schema design of the Vehicle data.

Fig 2: Proposed Snowflake Schema Design for Vehicle Data
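Since the figure is not reproduced here, the following sketch shows one way the snowflake layout could look, using Python's built-in sqlite3 module rather than the SQL Server tooling used in the paper. The table names follow the text (with the model table spelled Modelmasterbase); every column name, the link from the model table to the company table, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companynamemaster (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE Vehicletypebase   (type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE Sitemaster        (site_id INTEGER PRIMARY KEY, site_name TEXT);
-- Snowflaking (assumed): the model dimension refers to the company dimension
CREATE TABLE Modelmasterbase (
    model_id INTEGER PRIMARY KEY,
    model_name TEXT,
    company_id INTEGER REFERENCES Companynamemaster(company_id));
-- Fact table: one row per registration, with the measures as columns
CREATE TABLE VehicleRegistrationbase (
    registration_id INTEGER PRIMARY KEY,
    model_id INTEGER REFERENCES Modelmasterbase(model_id),
    type_id INTEGER REFERENCES Vehicletypebase(type_id),
    site_id INTEGER REFERENCES Sitemaster(site_id),
    registration_year INTEGER,
    vehicle_amount REAL,
    tax_amount REAL);
""")

conn.executemany("INSERT INTO Companynamemaster VALUES (?, ?)",
                 [(1, "Hero Honda"), (2, "TVS")])
conn.execute("INSERT INTO Vehicletypebase VALUES (1, 'MOTORCYCLE')")
conn.execute("INSERT INTO Sitemaster VALUES (1, 'Hypothetical Zone')")
conn.execute("INSERT INTO Modelmasterbase VALUES (1, 'Splender', 1)")
# A newly launched model needs a single insert into the model dimension.
conn.execute("INSERT INTO Modelmasterbase VALUES (2, 'Hypothetical Model X', 2)")
conn.executemany(
    "INSERT INTO VehicleRegistrationbase VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 2003, 40000.0, 2400.0),
     (2, 1, 1, 1, 2004, 41000.0, 2460.0),
     (3, 2, 1, 1, 2004, 38000.0, 2280.0)])

# Registration count and total tax per company, reached through the model table.
rows = conn.execute("""
    SELECT c.company_name, COUNT(*), SUM(f.tax_amount)
    FROM VehicleRegistrationbase AS f
    JOIN Modelmasterbase AS m ON m.model_id = f.model_id
    JOIN Companynamemaster AS c ON c.company_id = m.company_id
    GROUP BY c.company_name
    ORDER BY c.company_name
""").fetchall()
```

The company name is stored once in Companynamemaster and reached through the model table, which is the normalization the text refers to; the price is an extra join whenever a query groups by company.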


After implementing the Snowflake schema, Data Cubes are created and various OLAP
operations such as Slice, Dice, Drill-down and Roll-up are performed on the Vehicle Data Cube
using Microsoft SQL Server Analysis Services [1] [3].
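The four OLAP operations named above can be illustrated on a tiny in-memory set of fact rows. This is a plain-Python sketch, not the Analysis Services cube itself; the rows, the dimension names and the helper functions `restrict` and `aggregate` are hypothetical.

```python
from collections import Counter

# Hypothetical fact rows: one tuple per registration.
DIMENSIONS = ("year", "company", "model", "vehicle_type")
facts = [
    (2003, "Hero Honda", "Splender", "MOTORCYCLE"),
    (2004, "Hero Honda", "Splender", "MOTORCYCLE"),
    (2004, "Hero Honda", "Pleasure", "MOPED_SCOOTER"),
    (2005, "TVS", "Pep", "MOPED_SCOOTER"),
    (2006, "Hero Honda", "Splender", "MOTORCYCLE"),
]


def restrict(rows, **allowed):
    """Keep rows whose value on each named dimension lies in the allowed set."""
    return [
        row for row in rows
        if all(row[DIMENSIONS.index(dim)] in values for dim, values in allowed.items())
    ]


def aggregate(rows, group_by):
    """Registration count grouped by the listed dimensions."""
    positions = [DIMENSIONS.index(dim) for dim in group_by]
    return Counter(tuple(row[p] for p in positions) for row in rows)


# Slice: fix a single dimension to one value.
slice_2004 = restrict(facts, year={2004})

# Dice: restrict two dimensions at once (a year range and a company).
dice = restrict(facts, year={2003, 2004, 2005}, company={"Hero Honda"})

# Drill-down: add the model dimension to see finer detail.
by_company = aggregate(dice, ["company"])
by_company_model = aggregate(dice, ["company", "model"])

# Roll-up: drop the model dimension again; totals must agree.
rolled_up = sum(by_company_model.values())
```

Roll-up is simply the inverse of drill-down here: summing the per-model counts gives back the per-company count.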


IV. CLUSTERING

The Owner Surname, Vehicle Model, Vehicle Company, Vehicle Type, Vehicle
Price, Vehicle Tax and Registration Year attributes were used as input parameters, and
Registration Date was used as the key column, to generate the Clustering model for the Vehicle
Data Cube. The Cluster model is used to identify important groups of data in the source data.
Clustering was performed with the K-means algorithm using Microsoft SQL Server Analysis
Services [1] [3].

Fig 3: Proposed Clustering Data Mining Model for Vehicle Data Cube.
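To make the K-means step concrete, here is a minimal pure-Python sketch of the standard Lloyd iteration. It is not the Analysis Services implementation: it uses only two numeric inputs (hypothetical price and tax values, in thousands), whereas the model in the paper also includes categorical attributes, and it seeds the centroids with the first k points purely to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's k-means; the first k points are used as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical (vehicle price, vehicle tax) pairs: three motorcycles, three cars.
points = [
    (40.0, 2.4), (42.0, 2.5), (45.0, 2.7),
    (300.0, 18.0), (320.0, 19.0), (350.0, 21.0),
]
assignment, centroids = kmeans(points, k=2)
```

The features are left unscaled, so price dominates the distance; in practice the inputs would normally be standardized, and categorical inputs such as Vehicle Type would need an encoding before a distance can be computed.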
V. ASSOCIATION RULES MINING

The Association Rules algorithm was applied to the Vehicle Cluster data. For example,
to find interesting relationships in the vehicle data, the Car, Motorcycle, Autorickshaw and
Moped and Scooter cluster data were used. Owner Surname and Vehicle Type were used as
input fields, and Vehicle Company and Vehicle Model Name were used as predict-only
attributes. The Apriori algorithm was used to find Association Rules in the important data
clusters [1] [3].

Fig 4: Proposed Association Rules Data Mining Model for Vehicle Data Clusters.
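As an illustration of how Apriori derives rules, the sketch below runs a textbook version of the algorithm on five hypothetical registrations. It is a simplified stand-in for the Analysis Services implementation: the transactions, the minimum support count and the minimum confidence are made-up values, and only rules that predict a Model item are emitted.

```python
from itertools import combinations

# Hypothetical registrations; each transaction is a set of "attribute=value" items.
transactions = [
    {"Company=Hero Honda", "Type=MOTORCYCLE", "Model=Splender"},
    {"Company=Hero Honda", "Type=MOTORCYCLE", "Model=Splender"},
    {"Company=Hero Honda", "Type=MOPED_SCOOTER", "Model=Pleasure"},
    {"Company=TVS", "Type=MOPED_SCOOTER", "Model=Pep"},
    {"Company=TVS", "Type=MOPED_SCOOTER", "Model=Pep"},
]


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(min_count):
    """Return {itemset: count} for all itemsets found in at least min_count transactions."""
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        level = {}
        for candidate in current:
            count = support_count(candidate)
            if count >= min_count:
                level[candidate] = count
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


def model_rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules that predict a Model item."""
    for itemset, count in frequent.items():
        for item in itemset:
            antecedent = itemset - {item}
            if item.startswith("Model=") and antecedent:
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, item, confidence


found = {(a, c): conf for a, c, conf in model_rules(apriori(min_count=2), 0.6)}
```

Restricting the consequent to Model items mirrors the idea of a predict-only attribute: the other attributes appear only on the left-hand side of a rule.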
VI. RESULTS

The data cube was created considering Vehicle Registration Count and VAT
measures. The Model Masterbase, Vehicle Typebase, Year master, Site master and
Company master tables were selected as dimension tables.

Fig 5: Vehicle Data cubes Dimensions and Measures



In the drill-down and dice operation on the Vehicle Data Cube, two dimensions, Year
and Company Code, were selected. The Registration Year values 2003 to 2005 and the
Company Code value 1 (Hero Honda) were selected. The result shown in Figure 6
indicates that 48,189 Hero Honda vehicles were registered from the year
2003 to the year 2005.

Fig 6: Drill-down and Dice operation on Vehicle Data Cube with Two Dimensions
In a further drill-down operation on the Vehicle Data Cube, the Model Id dimension with
value 1 (Splender) was added. Figure 7 shows that 18,015 vehicles were registered for this
particular vehicle model. The roll-up operation can be performed on all of the above data cubes by
removing the dimensions used in the drill-down operations.

Fig 7: Drill-down and Dice operations on Vehicle Data Cube with Three Dimensions
The Clustering data mining algorithm created 10 clusters from the Vehicle data. The
corresponding Cluster Diagram is shown in Figure 8.

Fig 8: Cluster Diagram for Vehicle data



The Cluster profile of the same model indicates the presence of variables such as Company
Name, Owner Surname and Vehicle Type Name. The variable Company Name
has many states such as "Maruti Suzuki", "Hero Honda", "Bajaj", "Honda", "Huyndai",
"Tata Motors", "TVS" and Others. The variable Owner Surname has different states such as
"Patel", "Wala", "Shah", "Desai", "SINGH", "SHAIKH", "KHAN", "PATIL" and Others.
The Vehicle Type Name variable has the states "CAR", "MOTORCYCLE", "AUTORICKSHAW",
"MOPED_SCOOTER" and "COMMERCIAL".

Fig 9: Cluster Profile for Vehicle data Clustering Model


To properly understand the data in each cluster, questions such as the following need to be
answered:
Which clusters contain data of AUTORICKSHAW? What are the names of the
companies that manufactured the AUTORICKSHAW? What are the surnames of citizens
who purchased an AUTORICKSHAW?
Which clusters contain data of MOTORCYCLE? What are the names of the
companies that manufactured the MOTORCYCLE? What are the surnames of citizens
who purchased a MOTORCYCLE?
To answer such questions, the shading variable of the cluster diagram is used to understand
the impact of different variables and their states. For example, the result for the Vehicle Type
Name variable with the AUTORICKSHAW state is shown in Figure 10. The result indicates that
100% of the population of Cluster 7 is in the AUTORICKSHAW state.
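The "population" percentage read from the shaded cluster diagram can be thought of as the share of a cluster's records that are in a given state. The following sketch computes that share directly; the cluster assignments are hypothetical and the function name `state_share` is introduced only for this example.

```python
from collections import Counter

# Hypothetical (cluster, vehicle type) pairs, one per registration.
assignments = [
    ("Cluster 7", "AUTORICKSHAW"), ("Cluster 7", "AUTORICKSHAW"),
    ("Cluster 3", "MOTORCYCLE"), ("Cluster 3", "MOTORCYCLE"),
    ("Cluster 4", "MOTORCYCLE"), ("Cluster 4", "CAR"),
]


def state_share(rows, state):
    """Fraction of each cluster's records whose value equals the given state."""
    totals = Counter(cluster for cluster, _ in rows)
    hits = Counter(cluster for cluster, value in rows if value == state)
    return {cluster: hits[cluster] / totals[cluster] for cluster in totals}


autorickshaw_share = state_share(assignments, "AUTORICKSHAW")
motorcycle_share = state_share(assignments, "MOTORCYCLE")
```

A share of 1.0 corresponds to a cluster that is entirely in the selected state, which is how a cluster shaded at 100% would be read.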


Figure 10: Cluster Diagram for Vehicle data (Vehicle Type Name = AUTORICKSHAW)
Further analysis can be performed by viewing the characteristics of Cluster 7, which
are provided in Figure 11.

Figure 11: Cluster 7 Characteristic for Vehicle data


The cluster diagram for the Vehicle Type Name variable with the value Motorcycle
is shown in Figure 12. The result indicates that Cluster 3, Cluster 4 and Cluster 9
have population in this state. In Cluster 3, 100% of the population has the value
Motorcycle.


Figure 12: Cluster Diagram for Vehicle data (Vehicle Type Name = Motorcycle)
The characteristics of Cluster 3 shown in Figure 13 indicate that the Company
Name variable is present with two values, Hero Honda and TVS. For the Hero Honda
value the probability is 97.09%, whereas for the TVS value the probability is 1.23%.
For the Owner Surname field, the value Patel is present with 92.72% probability.

Figure 13: Cluster 3 Characteristic for Vehicle data


In the Vehicle Data Cluster, the Company Name, Vehicle Type and Model Name
variables were used to find novel relationships. Company Name and Vehicle Type
were used as input fields and Model Name was used as a predict-only attribute. Many
interesting relationships were found by the association rules mining model.
For example, the results provided in Table 1 indicate that the Ford company's
car model Ford Ikon is likely to be sold in the city of Surat. Similarly, the Toyota
company's car model Qualis, the Honda company's car model Honda City, the Mahindra
company's car model Scorpio, the Fiat company's car model Palio, the Tata Motors
company's car model Indica and the Huyndai company's car model Santro are the most likely to
be sold in the city of Surat.
Table 1: Association Rules for Company Name, Vehicle Type=Car and Model Name attributes

Rule Confidence   Rule Importance   Association Rule
0.866             4.476980868       Company Name = Ford, Vehicle Type Name = CAR -> Model Name = FORD IKON
0.7               4.381475564       Company Name = Toyota, Vehicle Type Name = CAR -> Model Name = QUALIS
0.941             4.209951446       Company Name = Honda, Vehicle Type Name = CAR -> Model Name = HONDA CITY
0.694             4.08037921        Company Name = Mahindra, Vehicle Type Name = CAR -> Model Name = SCORPIO TURBO
0.68              4.072957414       Company Name = Fiat, Vehicle Type Name = CAR -> Model Name = PALIO
0.719             3.680400153       Company Name = Tata Motors, Vehicle Type Name = CAR -> Model Name = INDICA
0.749             3.613327965       Company Name = Huyndai, Vehicle Type Name = CAR -> Model Name = SANTRO
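The paper does not define the two rule measures it reports. As a rough guide, confidence is the conditional probability of the right-hand side given the left-hand side, and Microsoft's documentation describes rule importance as a log-likelihood score, commonly written as log10 of P(B | A) / P(B | not A). The sketch below computes both from raw counts under that reading; the counts are hypothetical, the formula is an assumption about the tool rather than something stated in the paper, and the code does not attempt to reproduce the reported values, which may involve additional smoothing.

```python
import math


def confidence(n_ab, n_a):
    """P(B | A): share of rows matching antecedent A that also match consequent B."""
    return n_ab / n_a


def importance(n_ab, n_a, n_b, n_total):
    """log10 of P(B | A) / P(B | not A); assumes both probabilities are non-zero."""
    p_b_given_a = n_ab / n_a
    p_b_given_not_a = (n_b - n_ab) / (n_total - n_a)
    return math.log10(p_b_given_a / p_b_given_not_a)


# Hypothetical counts: 1000 registrations, 100 match A, 90 match B, 80 match both.
rule_confidence = confidence(n_ab=80, n_a=100)
rule_importance = importance(n_ab=80, n_a=100, n_b=90, n_total=1000)
```

Under this reading, a positive importance means the consequent is more likely when the antecedent holds than when it does not, so two rules with similar confidence can still differ considerably in importance.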

Similarly, interesting relationships between moped / scooter manufacturers and their
models were found. The results provided in Table 2 suggest that the Suzuki company's
model Access 125, the Honda company's model Honda Activa, the TVS company's model
Pep and the Hero Honda company's model Pleasure are the most likely to be sold.
Table 2: Association Rules for Company Name, Vehicle Type=Moped_Scooter and Model Name attributes

Rule Confidence   Rule Importance   Association Rule
0.828             4.458154689       Company Name = SUZUKI, Vehicle Type Name = MOPED_SCOOTER -> Model Name = ACCESS 125
0.83              3.362508672       Company Name = Honda, Vehicle Type Name = MOPED_SCOOTER -> Model Name = HONDA ACTIVA
0.434             3.060536766       Vehicle Type Name = MOPED_SCOOTER -> Model Name = HONDA ACTIVA
0.49              2.973270492       Company Name = TVS, Vehicle Type Name = MOPED_SCOOTER -> Model Name = PEP
0.711             2.7995445         Vehicle Type Name = MOPED_SCOOTER, Company Name = Hero Honda -> Model Name = PLEASURE


VII. CONCLUSION

This practical research demonstrates that data cube operations such as drill-down, roll-up,
slice and dice could be extremely useful to administrators working at the municipal
corporation, as they are able to query data across several dimensions. The cube
operations also give administrators a lot of freedom, as the queries are not fixed in nature
as they normally are in OLTP systems. Data cube operations allow administrators to
execute ad hoc queries, which is not possible in OLTP systems. These results can be utilized
by automobile companies to increase sales of their products by focusing on specific
communities residing in the city of Surat. The results are unique in the sense that e-governance
data can be utilized by private companies to increase their sales, improve the marketing of their
products and analyze the vehicle purchase trends of citizens. Furthermore, the results of
Clustering and Association Rules data mining give a better understanding of the data and reveal
hidden trends and new relationships in e-governance data.
VIII. LIMITATIONS
All results are based on data provided by the municipal corporation for research
purposes only. Hence, results may change if data warehousing and data mining are applied to
the actual data sets.
IX. REFERENCES

[1] Brian Larson 2008. Delivering Business Intelligence with Microsoft SQL Server 2008,
McGraw-Hill.
[2] Han and Kamber 2011. Data Mining: Concepts and Techniques, Morgan Kaufmann
Publishers.
[3] Jamie MacLennan et al. 2008. Data Mining with SQL Server 2008, Wiley.
[4] Pushpal Desai and Dr. Apurva Desai 2011, The Study on Data Warehouse and
Data Mining for Birth Registration System of the Surat City, International Journal
of Computer Applications, Number 4 - Article 2, 2011, pp. 1-5, ISBN: 978-93-8074663-0.
[5] Pushpal Desai and Dr. Apurva Desai 2012, An empirical analysis using data mining on
property tax - e-governance data, In the proceedings of National Seminar on Natural
language Processing and Data Mining, Department of Computer Science, Surat, India.
[6] Pushpal Desai and Dr. Apurva Desai 2012, An empirical analysis based on association
rules mining on E-Governance system, In the proceedings of International Conference
& Workshop on Recent Trends in Technology 2012, TCET, Mumbai, India.
[7] W. H. Inmon 2005. Building the Data Warehouse, Wiley.
[8] W. H. Inmon et al. 2001 Corporate Information Factory, Wiley.
[9] Kuldeep Deshpande and Dr. Bhimappa Desai, A Critical Study of Requirement
Gathering and Testing Techniques for Datawarehousing, International Journal of
Information Technology and Management Information Systems (IJITMIS), Volume 5,
Issue 1, 2014, pp. 60 - 71, ISSN Print: 0976 6405, ISSN Online: 0976 6413.
[10] Pushpal Desai, Building Aggregates in the Data Warehouse: A Case Study of Birth,
Deceased and Property Registration E-Governance Data, International Journal of
Advanced Research in Engineering & Technology (IJARET), Volume 5, Issue 6, 2014,
pp. 8 - 14, ISSN Print: 0976-6480, ISSN Online: 0976-6499.