International Journal of Information Technology & Management Information System (IJITMIS), ISSN 0976-6405 (Print), ISSN 0976-6413 (Online), Volume 5, Issue 2, May - August (2014), pp. 40-50, IAEME
ABSTRACT
In this paper, the multidimensional schema design, data cube and OLAP operations on
Vehicle e-governance data are discussed. The proposed data mining model and its
implementation on Vehicle e-governance data are also discussed. In the first phase, a clustering
algorithm is applied to identify important clusters in the Vehicle e-governance data. In the
second phase, an Association Rules Mining algorithm is applied to explore novel relationships
in the important data clusters observed in the first phase. The results indicate that novel
relationships can be found using the proposed model.
Keywords: Clustering, Association Rules Analysis, Microsoft SQL Server Analysis
Services.
I. INTRODUCTION
Inmon, who is known as the father of data warehousing, defines a data warehouse as a
subject-oriented, integrated, nonvolatile and time-variant collection of data in support of
management decisions [7] [8]. Han and Kamber define data mining as extracting or mining
knowledge from large amounts of data [2]. Data warehouses and data mining algorithms have
been applied successfully for knowledge discovery in many domains, including Banking,
Insurance, Finance, Marketing, Education, Telecommunication, Medical Science, Power
Industry, Weather Forecasting, Product Design and Customer Relationship Management (CRM).
In our earlier research work, we tried to find association rules in Birth Registration, Deceased
Registration, Property and Vehicle e-governance data [4] [5] [6]. In this article, an Association
Rules algorithm is applied to find interesting patterns and relationships in the different clusters
of Vehicle e-governance data.
II. METHODOLOGY
To provide a better understanding of the proposed knowledge discovery model, a flowchart of
the model is shown in Figure 1. The proposed model for knowledge discovery involves three
major phases.
In the first phase, various data preprocessing tasks are performed on the source data to
convert it into clean and consistent data.
III.
The multidimensional schema is designed for the Vehicle e-governance data, and OLAP
operations are performed on data cubes for data analysis. Automobile companies typically
keep adding new models, so frequent updates are required in the data warehouse. A Snowflake
Schema design is proposed because vehicle models can then be updated easily in the data
warehouse: the Snowflake Schema stores data in normalized form, which makes updates
straightforward. In contrast, the Star Schema stores data in de-normalized form, which makes
updates more difficult. In the proposed Snowflake Schema design, the VehicleRegistrationbase
table was used as the fact table, and the Modaelmasterbase, Companynamemaster,
Vehicletypebase and Sitemaster tables were used as dimension tables. The measures include
Vehicle Registration Count, Vehicle Amount and Tax Amount. Figure 2 shows the proposed
Snowflake Schema design of the Vehicle data.
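To make the normalization argument concrete, the following minimal sketch builds a toy version of the Snowflake Schema in SQLite using Python's standard sqlite3 module (not SQL Server). Only the five table names come from the text above; every column name, the sample rows and the helper functions (build_snowflake, load_sample, add_model, registrations_by_company) are hypothetical, and the actual design in Figure 2 may differ. The point it illustrates is that, because company data lives in its own table, adding a new vehicle model is a single insert into the model table.

```python
import sqlite3


def build_snowflake(conn: sqlite3.Connection) -> None:
    """Create a toy snowflake schema; all column names are hypothetical."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
    conn.executescript("""
        CREATE TABLE Companynamemaster (
            company_id   INTEGER PRIMARY KEY,
            company_name TEXT NOT NULL
        );
        -- Snowflaked dimension: each model points to its company instead of
        -- repeating the company name on every model row.
        CREATE TABLE Modaelmasterbase (
            model_id   INTEGER PRIMARY KEY,
            model_name TEXT NOT NULL,
            company_id INTEGER NOT NULL REFERENCES Companynamemaster(company_id)
        );
        CREATE TABLE Vehicletypebase (
            type_id   INTEGER PRIMARY KEY,
            type_name TEXT NOT NULL
        );
        CREATE TABLE Sitemaster (
            site_id   INTEGER PRIMARY KEY,
            site_name TEXT NOT NULL
        );
        -- Fact table: foreign keys to the dimensions plus numeric measures.
        CREATE TABLE VehicleRegistrationbase (
            registration_id   INTEGER PRIMARY KEY,
            model_id          INTEGER NOT NULL REFERENCES Modaelmasterbase(model_id),
            type_id           INTEGER NOT NULL REFERENCES Vehicletypebase(type_id),
            site_id           INTEGER NOT NULL REFERENCES Sitemaster(site_id),
            registration_year INTEGER NOT NULL,
            vehicle_amount    REAL NOT NULL,
            tax_amount        REAL NOT NULL
        );
    """)


def load_sample(conn: sqlite3.Connection) -> None:
    """Insert a few invented rows (not the municipal corporation's data)."""
    conn.executemany("INSERT INTO Companynamemaster VALUES (?, ?)",
                     [(1, "Hero Honda"), (2, "Maruti Suzuki")])
    conn.executemany("INSERT INTO Modaelmasterbase VALUES (?, ?, ?)",
                     [(1, "SPLENDOR", 1), (2, "ALTO", 2)])
    conn.executemany("INSERT INTO Vehicletypebase VALUES (?, ?)",
                     [(1, "MOTORCYCLE"), (2, "CAR")])
    conn.execute("INSERT INTO Sitemaster VALUES (1, 'Surat')")
    conn.executemany("INSERT INTO VehicleRegistrationbase VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 2004, 45000.0, 2500.0),
                      (2, 1, 1, 1, 2005, 46000.0, 2600.0),
                      (3, 2, 2, 1, 2005, 250000.0, 15000.0)])
    conn.commit()


def add_model(conn: sqlite3.Connection, model_id: int, model_name: str, company_id: int) -> None:
    """A new model is one row in the normalized model table; no other table changes."""
    conn.execute("INSERT INTO Modaelmasterbase VALUES (?, ?, ?)",
                 (model_id, model_name, company_id))
    conn.commit()


def registrations_by_company(conn: sqlite3.Connection) -> dict:
    """Follow the snowflake join path fact -> model -> company and count registrations."""
    rows = conn.execute("""
        SELECT c.company_name, COUNT(*)
        FROM VehicleRegistrationbase AS f
        JOIN Modaelmasterbase AS m ON m.model_id = f.model_id
        JOIN Companynamemaster AS c ON c.company_id = m.company_id
        GROUP BY c.company_name
        ORDER BY c.company_name
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_snowflake(conn)
    load_sample(conn)
    add_model(conn, 3, "PASSION", 1)
    print(registrations_by_company(conn))
```

Queries now reach the company through two joins (fact to model to company); that extra join is the usual price a snowflake design pays for easier maintenance compared with a star schema.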
IV. CLUSTERING
Owner Surname, Vehicle Model, Vehicle Company, Vehicle Type, Vehicle Price, Vehicle Tax
and Registration Year are used as input parameters, and Registration Date is used as the key
column, to generate a clustering model for the Vehicle Data Cube. The clustering model is
used to identify important groups of data in the source data. Clustering is performed with the
K-means algorithm in Microsoft SQL Server Analysis Services [1] [3].
Fig 3: Proposed Clustering Data Mining Model for Vehicle Data Cube.
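The K-means idea can be illustrated with a minimal Python sketch of Lloyd's algorithm. This is not the Microsoft Clustering implementation used in the paper: it handles only numeric attributes (here, hypothetical Vehicle Price and Vehicle Tax pairs), uses plain Euclidean distance and takes the first k points as initial centroids for reproducibility, whereas the model described above also uses discrete inputs such as Owner Surname and Vehicle Company. The function name kmeans and the POINTS data are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical (vehicle price, vehicle tax) pairs in thousands, invented for illustration.
POINTS: List[Point] = [
    (45.0, 2.0), (450.0, 25.0), (50.0, 2.5),
    (500.0, 30.0), (48.0, 2.2), (480.0, 27.0),
]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-means; the first k points serve as initial centroids (deterministic)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move any more
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    centroids, labels = kmeans(POINTS, k=2)
    print(labels)  # [0, 1, 0, 1, 0, 1]
```

In practice numeric attributes on different scales would usually be normalized before clustering, since Euclidean distance is otherwise dominated by the largest-valued attribute.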
V.
The Association Rules algorithm was applied to the Vehicle cluster data. For example, to find
interesting relationships in the vehicle data, the Car, Motorcycle, Autorickshaw and
Moped/Scooter cluster data were used. Owner Surname and Vehicle Type were used as input
fields, and Vehicle Company and Vehicle Model Name were used as predict-only attributes.
The Apriori algorithm was used to find association rules in the important data clusters [1] [3].
Fig 4: Proposed Association Rules Data Mining Model for Vehicle Data Clusters.
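The Apriori step can be sketched in plain Python: find all itemsets whose support reaches a threshold level by level, then keep only rules whose right-hand side is a single predict-only item. This is a minimal sketch, not the Microsoft Association Rules implementation used in the paper; it computes support and confidence only and does not reproduce the Rule Importance score reported by Analysis Services. Items are written as hypothetical "Attribute = value" strings, and the function names, thresholds and toy transactions are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]

# Hypothetical toy transactions (one per registration), not the paper's data.
TRANSACTIONS: List[Set[str]] = [
    {"Company Name = Honda", "Vehicle Type Name = CAR", "Model Name = HONDA CITY"},
    {"Company Name = Honda", "Vehicle Type Name = CAR", "Model Name = HONDA CITY"},
    {"Company Name = Honda", "Vehicle Type Name = MOPED_SCOOTER", "Model Name = HONDA ACTIVA"},
    {"Company Name = Honda", "Vehicle Type Name = MOPED_SCOOTER", "Model Name = HONDA ACTIVA"},
    {"Company Name = TVS", "Vehicle Type Name = MOPED_SCOOTER", "Model Name = PEP"},
]


def frequent_itemsets(transactions: Sequence[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Apriori: level-wise search for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in current)
        k += 1
    return frequent


def rules_with_target(frequent: Dict[Itemset, float], target_prefix: str,
                      min_confidence: float) -> List[Tuple[Itemset, str, float]]:
    """Rules X -> y where y is the single predict-only item (e.g. a Model Name)."""
    rules = []
    for itemset, supp in frequent.items():
        targets = [i for i in itemset if i.startswith(target_prefix)]
        if len(itemset) < 2 or len(targets) != 1:
            continue
        antecedent = itemset - {targets[0]}
        # Confidence = support(X and y) / support(X); X is frequent because it is a subset.
        confidence = supp / frequent[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, targets[0], confidence))
    return sorted(rules, key=lambda r: (-r[2], sorted(r[0]), r[1]))


if __name__ == "__main__":
    freq = frequent_itemsets(TRANSACTIONS, min_support=0.3)
    for lhs, rhs, conf in rules_with_target(freq, "Model Name =", min_confidence=0.6):
        print(", ".join(sorted(lhs)), "->", rhs, f"(confidence {conf:.3f})")
```

Restricting the consequent to one target attribute mirrors the predict-only setting described above: Model Name may appear only on the right-hand side of a rule.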
VI. RESULTS
The data cube was created with the Vehicle Registration Count and VAT measures. The
Model Masterbase, Vehicle Typebase, Year Master, Site Master and Company Master tables
were selected as dimension tables.
In the drill-down and dice operation on the Vehicle Data Cube, two dimensions, Year and
Company Code, were selected, with Registration Year values from 2003 to 2005 and Company
Code value 1 (Hero Honda). The result shown in Figure 6 indicates that 48,189 Hero Honda
vehicles were registered from 2003 to 2005.
Fig 6: Drill-down and Dice operation on Vehicle Data Cube with Two Dimensions
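The dice operation can be sketched as a filter over fact rows on each selected dimension followed by a sum of the registration count measure. The rows and counts below are invented toy values, not the registration data behind Figure 6, and the function name dice is hypothetical; in the paper the operation is performed through the SQL Server Analysis Services cube rather than in application code.

```python
from typing import Collection, List, Optional, Tuple

# Hypothetical fact rows (toy numbers, not the paper's data):
# (registration_year, company_code, model_id, registration_count)
FACT_ROWS: List[Tuple[int, int, int, int]] = [
    (2003, 1, 1, 100),
    (2004, 1, 1, 120),
    (2005, 1, 2, 80),
    (2006, 1, 1, 90),
    (2004, 2, 5, 60),
]


def dice(facts: List[Tuple[int, int, int, int]], years: Collection[int],
         company_codes: Collection[int],
         model_ids: Optional[Collection[int]] = None) -> int:
    """Sum registration_count over the sub-cube selected on each dimension.

    model_ids=None leaves the Model Id dimension unrestricted."""
    total = 0
    for year, company, model, count in facts:
        if year in years and company in company_codes and (model_ids is None or model in model_ids):
            total += count
    return total


if __name__ == "__main__":
    print(dice(FACT_ROWS, range(2003, 2006), {1}))       # Year and Company Code: 300
    print(dice(FACT_ROWS, range(2003, 2006), {1}, {1}))  # plus Model Id: 220
```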
In a further drill-down operation on the Vehicle Data Cube, the Model Id dimension with
value 1 (Splendor) was added. Figure 7 shows that 18,015 vehicles of this particular model
were registered. A roll-up operation can be performed on each of the above data cubes by
removing the dimensions used in the drill-down operations.
Fig 7: Drill-down and Dice operations on Vehicle Data Cube with Three Dimensions
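Drill-down and roll-up can be viewed as the same group-by aggregation evaluated at different levels: adding a dimension to the grouping drills down, and removing one rolls up. The following minimal sketch assumes hypothetical fact rows and a hypothetical function name aggregate; it is not how Analysis Services stores or pre-aggregates a cube, only an illustration of the operation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical fact rows (toy numbers, not the paper's data).
FACTS: List[Dict[str, int]] = [
    {"year": 2003, "company": 1, "model": 1, "count": 100},
    {"year": 2003, "company": 1, "model": 2, "count": 40},
    {"year": 2004, "company": 1, "model": 1, "count": 120},
    {"year": 2004, "company": 2, "model": 5, "count": 60},
]


def aggregate(facts: Sequence[Dict[str, int]], dims: Sequence[str]) -> Dict[Tuple[int, ...], int]:
    """Sum the registration count for every combination of the chosen dimensions.

    Adding a dimension to `dims` drills down; removing one rolls up."""
    totals: Dict[Tuple[int, ...], int] = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row["count"]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(FACTS, ["year", "company", "model"]))  # finest level
    print(aggregate(FACTS, ["year", "company"]))           # roll up over model
    print(aggregate(FACTS, []))                            # grand total: {(): 320}
```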
The clustering algorithm created 10 clusters from the Vehicle data. The corresponding cluster
diagram is shown in Figure 8.
The cluster profile of the same model indicates the presence of variables such as Company
Name, Owner Surname and Vehicle Type Name. The Company Name variable has many
states, such as "Maruti Suzuki", "Hero Honda", "Bajaj", "Honda", "Huyndai", "Tata Motors",
"TVS" and Others. The Owner Surname variable has different states, such as "Patel", "Wala",
"Shah", "Desai", "SINGH", "SHAIKH", "KHAN", "PATIL" and Others. The Vehicle Type
Name variable has the states "CAR", "MOTORCYCLE", "AUTORICKSHAW",
"MOPED_SCOOTER" and "COMMERCIAL".
Figure 10: Cluster Diagram for Vehicle data (Vehicle Type Name = AUTORICKSHAW)
Further analysis can be performed by viewing the characteristics of Cluster 7, which are
shown in Figure 11.
Figure 12: Cluster Diagram for Vehicle data (Vehicle Type Name = Motorcycle)
The characteristics of Cluster 3, shown in Figure 13, indicate that the Company Name
variable is present with two values, Hero Honda and TVS. The probability is 97.09% for
Hero Honda, whereas it is 1.23% for TVS. For the Owner Surname field, the value Patel is
present with a probability of 92.72%.
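The percentages above are per-cluster probabilities of attribute states. A simplified way to compute such a profile, assuming hard cluster assignments (each record belongs to exactly one cluster), is to take the relative frequency of each state among a cluster's members. The Microsoft Clustering viewer may derive its probabilities differently, and the records, labels and the function name cluster_characteristics below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence

# Hypothetical member records and hard cluster assignments (toy data, not the paper's).
ROWS: List[Dict[str, str]] = [
    {"Company Name": "Hero Honda", "Owner Surname": "Patel"},
    {"Company Name": "Hero Honda", "Owner Surname": "Patel"},
    {"Company Name": "Hero Honda", "Owner Surname": "Shah"},
    {"Company Name": "TVS", "Owner Surname": "Patel"},
    {"Company Name": "Maruti Suzuki", "Owner Surname": "Desai"},
]
LABELS: List[int] = [3, 3, 3, 3, 7]


def cluster_characteristics(rows: Sequence[Mapping[str, str]], labels: Sequence[int],
                            cluster: int, attribute: str) -> Dict[str, float]:
    """Relative frequency of each state of `attribute` among the members of `cluster`."""
    states = [row[attribute] for row, label in zip(rows, labels) if label == cluster]
    if not states:
        return {}
    counts = Counter(states)
    # most_common() orders the states from most to least frequent, like a characteristics list.
    return {state: n / len(states) for state, n in counts.most_common()}


if __name__ == "__main__":
    print(cluster_characteristics(ROWS, LABELS, 3, "Company Name"))
    # {'Hero Honda': 0.75, 'TVS': 0.25}
```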
In the Vehicle data clusters, the Company Name, Vehicle Type and Model Name variables
were used to find novel relationships. Company Name and Vehicle Type were used as input
fields, and Model Name was used as a predict-only attribute. Many interesting relationships
were found by the association rules mining model.
For example, the results in Table 1 indicate that, among Ford cars, the Ford Ikon model is
the most likely to be sold in the city of Surat. Similarly, Toyota's Qualis, Honda's Honda City,
Mahindra's Scorpio, Fiat's Palio, Tata Motors' Indica and Hyundai's Santro are the car
models of those companies most likely to be sold in the city of Surat.
Table 1: Association Rules for Company Name, Vehicle Type = Car and Model Name attributes

Rule Confidence | Rule Importance | Association Rule
0.866 | 4.476980868 | Company Name = Ford, Vehicle Type Name = CAR -> Model Name = FORD IKON
0.7   | 4.381475564 | Company Name = Toyota, Vehicle Type Name = CAR -> Model Name = QUALIS
0.941 | 4.209951446 | Company Name = Honda, Vehicle Type Name = CAR -> Model Name = HONDA CITY
0.694 | 4.08037921  | Company Name = Mahindra, Vehicle Type Name = CAR -> Model Name = SCORPIO TURBO
0.68  | 4.072957414 | Company Name = Fiat, Vehicle Type Name = CAR -> Model Name = PALIO
0.719 | 3.680400153 | Company Name = Tata Motors, Vehicle Type Name = CAR -> Model Name = INDICA
0.749 | 3.613327965 | Company Name = Huyndai, Vehicle Type Name = CAR -> Model Name = SANTRO
Rule Confidence | Rule Importance | Association Rule
      | 4.458154689 | Company Name = SUZUKI, Vehicle Type Name = MOPED_SCOOTER -> Model Name = ACCESS 125
0.83  | 3.362508672 | Company Name = Honda, Vehicle Type Name = MOPED_SCOOTER -> Model Name = HONDA ACTIVA
0.434 | 3.060536766 | Vehicle Type Name = MOPED_SCOOTER -> Model Name = HONDA ACTIVA
0.49  | 2.973270492 | Company Name = TVS, Vehicle Type Name = MOPED_SCOOTER -> Model Name = PEP
0.711 | 2.7995445   | Vehicle Type Name = MOPED_SCOOTER, Company Name = Hero Honda -> Model Name = PLEASURE
VII. CONCLUSION
This practical research demonstrates that data cube operations such as drill-down, roll-up,
slice and dice could be extremely useful to administrators working at the municipal
corporation, as they allow data to be queried along several dimensions. Cube operations also
give administrators considerable freedom, because queries are not fixed in advance as they
normally are in OLTP systems; data cube operations let administrators execute ad hoc queries
that are not possible in the OLTP systems. These results can be used by automobile companies
to increase sales of their products by focusing on specific communities residing in the city of
Surat. The results are unique in the sense that e-governance data can be used by private
companies to increase their sales, improve the marketing of their products and analyze the
vehicle purchase trends of citizens. Furthermore, the results of clustering and association rules
data mining give a better understanding of the data and reveal hidden trends and new
relationships in e-governance data.
VIII. LIMITATIONS
All results are based on data provided by the municipal corporation for research purposes
only. Hence, the results may change if the data warehouse and data mining techniques are
applied to actual data sets.
IX. REFERENCES
[1] Brian Larson 2008. Delivering Business Intelligence with Microsoft SQL Server 2008,
McGraw-Hill.
[2] Han and Kamber 2011. Data Mining: Concepts and Techniques, Morgan Kaufmann
Publishers.
[3] Jamie MacLennan et al. 2008. Data Mining with SQL Server 2008, Wiley.
[4] Pushpal Desai and Dr. Apurva Desai 2011. The Study on Data Warehouse and Data Mining
for Birth Registration System of the Surat City, International Journal of Computer
Applications, Number 4, Article 2, 2011, pp. 1-5, ISBN: 978-93-8074663-0.
[5] Pushpal Desai and Dr. Apurva Desai 2012. An empirical analysis using data mining on
property tax - e-governance data, In the proceedings of National Seminar on Natural
Language Processing and Data Mining, Department of Computer Science, Surat, India.
[6] Pushpal Desai and Dr. Apurva Desai 2012. An empirical analysis based on association
rules mining on E-Governance system, In the proceedings of International Conference
& Workshop on Recent Trends in Technology 2012, TCET, Mumbai, India.
[7] W. H. Inmon 2005. Building the Data Warehouse, Wiley.
[8] W. H. Inmon et al. 2001. Corporate Information Factory, Wiley.
[9] Kuldeep Deshpande and Dr. Bhimappa Desai 2014. A Critical Study of Requirement
Gathering and Testing Techniques for Datawarehousing, International Journal of
Information Technology and Management Information Systems (IJITMIS), Volume 5,
Issue 1, 2014, pp. 60-71, ISSN Print: 0976-6405, ISSN Online: 0976-6413.
[10] Pushpal Desai 2014. Building Aggregates in the Data Warehouse: A Case Study of Birth,
Deceased and Property Registration E-Governance Data, International Journal of
Advanced Research in Engineering & Technology (IJARET), Volume 5, Issue 6, 2014,
pp. 8-14, ISSN Print: 0976-6480, ISSN Online: 0976-6499.