Mining Best Utility Pattern from RFID Data Warehouse through Genetic Algorithm
Dr. Barjesh Kochar & Prof. Pankaj Lathar
ABSTRACT
Identifying sequential patterns in a huge sequence database is a central problem in knowledge discovery and data mining, and stored information becomes useful only if an efficient mining technique is applied to it. In earlier work, an innovative data mining technique based on sequential pattern mining and fuzzy logic was used to mine RFID data efficiently. In a large database, however, presenting the entire set of sequential patterns makes the result difficult for the user to understand and use. Moreover, even with the efficient algorithms that have been proposed, mining a large number of sequential patterns from huge databases remains computationally costly. To overcome this issue, an efficient data mining system that generates the most favorable sequential pattern is proposed. The main aim of this work is to develop a utility-based RFID data mining technique. The first stage of the proposed technique generates a dataset from the warehoused RFID data. Sequential patterns are then mined for various pattern-length combinations, and fuzzy rules are generated from these sequential patterns. Each pattern has its own utility. From the mined sequential patterns, the most favorable one is selected using a Genetic Algorithm (GA), whose fitness function identifies the sequential pattern with maximum profit. Implementation results show that the proposed mining system accurately extracts the important RFID tags and their combinations, the nature of the tags' movement, and the optimum sequential patterns. Focusing only on the significant sequential patterns that users find interesting leads to productive trade in RFID-enabled applications.
Keywords: Data Mining System, RFID, Genetic Algorithm (GA), Fuzzy rules.
INTRODUCTION
As a result of recent developments in information technology and the availability of low-cost storage, huge data collections have become possible over the past decades. The eventual purpose of such data collection is to gain competitive advantage by analyzing the data, i.e., by determining previously unknown patterns that can guide decision making [1] [8]. Conventional data analysis methods are generally based on a person handling the data directly, and they do not scale to large data sets. Database technology provides the primary tools for efficient storage and retrieval of large data sets, but helping humans analyze and understand large masses of data remains a challenging, unsolved issue. The emerging field of data mining promises novel techniques and intelligent tools to meet these challenges. Data mining, also called Knowledge Discovery in Databases (KDD), is defined as "the non-trivial extraction of implicit, previously unknown, and potentially useful information from data" [2] [7].
Data mining [3], a multidisciplinary effort drawing on databases, machine learning, and statistics, has been successful in turning masses of data into small, valuable pieces of knowledge. In a real-world application, the ultimate goal of a data mining task might be, e.g., to allow a company either to improve its marketing, sales, and customer support operations or to recognize fraudulent customers, through a better understanding of its customers. Data mining methods have been applied successfully in a variety of fields
VIPS
VIVEKANANDA JOURNAL OF RESEARCH (1)
including marketing [10], manufacturing, process control, fraud detection [9], bioinformatics, information retrieval, adaptive hypermedia, electronic commerce, and network management [4].
Data mining tasks are of two types: descriptive mining and predictive mining [5]. Descriptive mining characterizes the fundamental characteristics or common properties of the data in the database. Predictive mining finds patterns in the data that enable predictions to be made; it includes tasks such as classification, regression, and deviation detection.
Many recent and emerging applications rely on mining information from huge databases. One field that calls for sequential pattern mining over its databases is Radio Frequency Identification (RFID). RFID is a high-speed, real-time, precise information gathering and processing technology that identifies objects uniquely by means of radio-frequency signals [6]. RFID technology helps a wide variety of organizations and individuals, for instance hospitals and patients, retailers and customers, and manufacturers and distributors throughout the supply chain, to achieve substantial productivity gains and efficiencies [11]. Motivated by long sequences in text data, biological data, software engineering, and sensor networks, mining repetitive gapped subsequences has been studied as a way to capture sequential patterns that repeat within each sequence of a large database and then use them as features for classification or prediction. RFID tags differ greatly from printed barcodes in their data capacity, their read range, and the absence of line-of-sight constraints [12].
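Before sequential patterns can be mined from RFID data, the raw readings have to be turned into one ordered path per tag. The following is a minimal sketch, assuming each reading is a hypothetical (tag ID, reader location, timestamp) triple and that repeated consecutive reads of a tag at the same location are redundant. The data and the function name are illustrative; this is not the paper's own cleaning and transformation technique.

```python
from collections import defaultdict

# Hypothetical raw reads: (tag_id, reader_location, timestamp_in_seconds).
READS = [
    ("EPC-1", "dock", 10), ("EPC-1", "dock", 12), ("EPC-2", "dock", 11),
    ("EPC-1", "shelf", 40), ("EPC-2", "pack", 55), ("EPC-1", "ship", 90),
    ("EPC-2", "ship", 70), ("EPC-2", "pack", 60),
]


def tag_paths(reads):
    """Group reads by tag, order them by time, and collapse consecutive
    duplicate reads at the same location (readers report a tag many times)."""
    by_tag = defaultdict(list)
    for tag, location, ts in reads:
        by_tag[tag].append((ts, location))
    paths = {}
    for tag, events in by_tag.items():
        path = []
        for _, location in sorted(events):  # chronological order
            if not path or path[-1] != location:
                path.append(location)
        paths[tag] = path
    return paths


if __name__ == "__main__":
    print(tag_paths(READS))
```

Each resulting path is one sequence of the sequence database on which sequential pattern mining operates.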
The goal of sequential pattern mining is to find all frequent sequential patterns that meet a user-specified minimum support. Sequential pattern mining approaches usually follow one of three strategies: generate-and-test (also known as Apriori-based), pattern growth (also known as divide-and-conquer), or the vertical format method [13]. Most of the many approaches proposed for sequential pattern mining [15] focus on two issues: (1) improving the efficiency of the mining process and (2) extending the mining of sequential patterns to other kinds of time-related patterns [16].
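The generate-and-test strategy can be illustrated with a minimal Apriori-style sketch (in the spirit of GSP). It assumes that each sequence element is a single item, such as the reader location an RFID tag passed, and that a pattern is contained in a sequence if its items appear there in the same order, with gaps allowed. The function names and the sample database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's items occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` consumes the iterator


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_sequential_patterns(database, min_support):
    """Level-wise generate-and-test; returns {pattern tuple: support}."""
    items = sorted({x for seq in database for x in seq})
    frequent = {}
    level = [(x,) for x in items]
    while level:
        # Test step: keep candidates that reach the minimum support.
        survivors = []
        for cand in level:
            s = support(cand, database)
            if s >= min_support:
                frequent[cand] = s
                survivors.append(cand)
        # Generate step: extend each survivor by one frequent item, and prune any
        # candidate that has an infrequent sub-pattern (support is anti-monotone).
        freq_items = [p[0] for p in frequent if len(p) == 1]
        level = [
            cand
            for p in survivors
            for x in freq_items
            for cand in [p + (x,)]
            if all(cand[:i] + cand[i + 1:] in frequent for i in range(len(cand)))
        ]
    return frequent


DB = [
    ["dock", "shelf", "pack", "ship"],
    ["dock", "pack", "ship"],
    ["dock", "shelf", "ship"],
    ["shelf", "pack"],
]

if __name__ == "__main__":
    for pattern, sup in mine_sequential_patterns(DB, min_support=2).items():
        print(pattern, sup)
```

With `min_support=2`, the candidate ("dock", "shelf", "pack") survives pruning because all of its sub-patterns are frequent, but it is discarded in the test step because only one sequence contains it.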
The discovery of sequential patterns was originally motivated by problems in the retail industry. However, the results are applicable to numerous scientific and business domains, such as stock and market basket analysis, natural disasters (e.g., earthquakes), DNA sequence analysis, gene structure analysis, and web log click-stream analysis [18]. Time is the most important factor for this task, particularly when results are needed within a limited period [17].
Although the efficiency of mining the complete set of sequential patterns has improved considerably, sequential pattern mining still faces hard challenges in both effectiveness and efficiency in many cases. A large database may contain a huge number of sequential patterns, of which often only a small subset interests a user, and presenting the complete set of sequential patterns makes the mining result difficult to understand and hard to use [22]. In this work, a Genetic Algorithm (GA) is employed to optimize the cost of the interesting sequential patterns. GA optimizers are robust and work well with discontinuous and non-differentiable functions, where conventional local optimizers fail. These optimization techniques use processes such as genetic recombination, mutation, and natural selection in a design based on the concepts of evolution.
Even with the efficient algorithms that have been proposed, mining a large number of sequential patterns from huge databases remains a computationally expensive task. In this work, an effective data mining system that generates the optimum sequential pattern is proposed. The main aim of this work is to develop a utility-based RFID data mining technique that discovers an optimum sequential pattern on the basis of pattern utility. The rest of the paper is organized as follows: Section 2 describes some of
the recent related works. Section 3 gives a brief overview of GA, and Section 4 details the proposed method, the optimization of sequential patterns using GA. Experimental results and analysis of the proposed methodology are discussed in Section 5. Finally, concluding remarks are provided in Section 6.
RELATED WORKS
Numerous approaches to effective data mining have been proposed by researchers. This section briefly reviews some important contributions from the existing literature.
J. Hu and A. Mojsilovic [18] presented an algorithm for frequent itemset mining that identifies high-utility item combinations. In contrast to traditional association rule and frequent itemset mining methods, the goal of the algorithm is to find segments of data, defined through combinations of a few items (rules), that satisfy certain conditions as a group and maximize a predefined objective function. The authors formulate the task as an optimization problem, present an efficient approximation that solves it through specialized partition trees called High-Yield Partition Trees, and investigate the performance of different splitting strategies. The algorithm was tested on real-world data sets and achieved very good results.
Jian Pei et al. [19] observed that constraints are essential for many sequential pattern mining applications, yet no systematic study of constraint-based sequential pattern mining was available. They investigated this issue and pointed out that the framework developed for constrained frequent-pattern mining does not fit sequential pattern mining well. They therefore developed an extended framework based on the sequential pattern-growth methodology. Their study shows that under this framework constraints can be pushed deep into sequential pattern mining effectively and efficiently, and that the framework can also be extended to constraint-based structured pattern mining.
Themis P. Exarchos et al. [21] presented a two-stage methodology for sequence classification that combines sequential pattern mining and optimization. In the first stage, a sequence classification model based on a set of sequential patterns is defined, together with two sets of weights, one for the patterns and one for the classes. In the second stage, an optimization technique is employed to estimate the weight values that achieve the best classification accuracy. The methodology was extensively evaluated by varying the number of sequences, the number of patterns, and the number of classes, and it was compared with similar sequence classification approaches.
S. Shankar et al. [22] noted that it is a well-accepted fact that the data mining process produces numerous patterns from the given data. Discovering frequent itemsets and association rules are among the most important tasks in data mining, and several efficient algorithms for them are available in the literature. In recent years, incorporating utility considerations into data mining tasks has been gaining popularity, and the data mining community has long recognized that certain association rules of interest improve business value. Since the discovery of frequent itemsets and association rules from transaction databases benefits numerous business applications, their paper presents a comprehensive survey and comparative study of existing techniques for frequent itemset mining and association rule mining with utility considerations.
Mourad Ykhlef and Hebah ElGibreen [23] described the mining of sequential patterns in large databases as an important data mining task with broad applications: it describes potential sequenced relationships among items in a database, and many different algorithms have been introduced for it. Conventional algorithms find the precise optimal sequential pattern rules, but they take a long time, particularly when applied to large databases.
Recently, evolutionary algorithms such as Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) have been proposed and applied to this problem. To speed up the convergence of evolutionary algorithms, their paper introduced a new hybrid evolutionary algorithm, called SP-GAPSO, that combines GA with PSO to mine sequential patterns.
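Several of the works reviewed above, such as [18] and [22], score itemsets by utility rather than by frequency alone. The following minimal sketch uses the utility measure common in high-utility itemset mining: the utility of an item in a transaction is its purchased quantity multiplied by its unit profit, and an itemset's utility is summed over the transactions that contain the whole itemset. This common formulation is an assumption here, not necessarily the exact definition used in [18] or [22]; the data and names are hypothetical, and the exhaustive enumeration is only suitable for toy inputs.

```python
from itertools import combinations

# Hypothetical transactions: item -> purchased quantity (internal utility).
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
]
# Hypothetical unit profit per item (external utility).
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1, "d": 8}


def itemset_utility(itemset, transactions=TRANSACTIONS, profit=UNIT_PROFIT):
    """Sum of quantity x unit profit of the itemset's items, over every
    transaction that contains the whole itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(min_utility, transactions=TRANSACTIONS, profit=UNIT_PROFIT):
    """Brute-force enumeration of all itemsets; practical miners instead prune
    with upper bounds such as transaction-weighted utilization (TWU)."""
    items = sorted(profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, profit)
            if u >= min_utility:
                result[itemset] = u
    return result


if __name__ == "__main__":
    print(high_utility_itemsets(12))
```

With the sample data, {a} has utility 15 while its superset {a, c} has utility 19. Utility is therefore not anti-monotone, which is why Apriori-style support pruning cannot be applied to it directly.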
GENETIC ALGORITHM (GA)
A genetic algorithm (GA) is a search and optimization technique inspired by nature's evolutionary processes. A population of candidate solutions iterates through multiple generations of selection, crossover, and mutation until an optimized solution survives, much in the manner of survival of the fittest. GAs are computer-based optimization techniques that use Darwinian evolution in nature as a model [24]; they gained wide popularity through the work of Holland (1975). They are usually employed for problems with a vast, complex search space containing many local optima [27]. The strength of GAs lies in the fact that the search space is traversed in parallel by randomly generating solutions, which are continually evaluated with a fitness function [25]. In general, a GA has three search phases: (1) creating an initial population; (2) evaluating the population with a fitness function; and (3) producing a new population [21]. In a GA, solutions are termed individuals or chromosomes [27]. The genetic search starts with a randomly generated population in which a fitness function evaluates every individual.
The individuals of the current and following generations are duplicated or eliminated on the basis of their fitness values. Further generations are produced by applying the GA operators [21], namely reproduction, crossover, and mutation, which are applied sequentially to each individual with certain probabilities [23], [22]. The first operator, reproduction (elitism), produces one or more copies of any individual that possesses a high fitness value; otherwise, the individual is removed from the solution pool [29]. The crossover operator takes two randomly chosen parent individuals as input and combines them to generate two children. This combination is performed by choosing two crossover points in the parents' strings and exchanging the genes between these two points [26]. The next step in each generation is the mutation of individuals through the alteration of parts of their genes [30]. Mutation introduces variation into the population of the next generation by altering a gene of a chromosome. Its main goal is to ensure that the search algorithm is not trapped in a local optimum [22]. It also ensures that all possible alleles can enter the population, thereby preserving population diversity [21]. Mutation is a very important component of GAs: it is the variation operator that produces diversity [28].
AN EFFICIENT DATA MINING SYSTEM BASED ON GA
In previous work, the RFID data was effectively warehoused by means of a novel data cleaning, transformation and loading technique proposed specifically for RFID data, and the proposed RFID data mining system was shown to mine the required knowledge from the warehoused RFID data efficiently. The present work aims to discover an optimum sequential pattern on the basis of its cost, termed its assigned utility. A GA-based technique is employed to identify the optimal sequential pattern: after fuzzy rules are created from the sequential patterns, the GA-based method recognizes the optimal sequential patterns according to their assigned utility, and the fitness function of the GA discovers the sequential pattern with maximum profit. For ease of understanding, the optimal sequential pattern of RFID data is briefly described in the following subsection, before the proposed mining system is detailed.
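The step in which rules are derived from mined sequential patterns can be sketched as follows. This is a minimal, crisp stand-in: each pattern is split into a prefix (the part of a tag's path already observed) and the remaining path, and the rule is scored by confidence = support(pattern) / support(prefix). The pattern supports, the function names, and the use of plain confidence in place of the paper's fuzzy score are all assumptions; the fuzzy membership functions are not specified in this section.

```python
# Hypothetical supports of mined sequential patterns (tag paths through readers).
PATTERN_SUPPORT = {
    ("dock",): 3, ("pack",): 3, ("shelf",): 3, ("ship",): 3,
    ("dock", "ship"): 3, ("dock", "pack"): 2, ("dock", "shelf"): 2,
    ("shelf", "ship"): 2, ("shelf", "pack"): 2, ("pack", "ship"): 2,
    ("dock", "pack", "ship"): 2, ("dock", "shelf", "ship"): 2,
}


def generate_rules(supports, min_confidence=0.5):
    """Split every pattern of length >= 2 into (prefix -> remaining path) rules.

    Every prefix of a frequent pattern is itself frequent, so it is assumed
    to be present in `supports` (true when the full mining result is passed).
    """
    rules = []
    for pattern, sup in supports.items():
        for k in range(1, len(pattern)):
            antecedent, consequent = pattern[:k], pattern[k:]
            confidence = sup / supports[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


def complete_path(observed, rules):
    """Return (remaining path, confidence) pairs for an observed partial path,
    highest confidence first."""
    matches = [(cons, conf) for ante, cons, conf in rules if ante == tuple(observed)]
    return sorted(matches, key=lambda m: m[1], reverse=True)


if __name__ == "__main__":
    rules = generate_rules(PATTERN_SUPPORT)
    print(complete_path(["dock", "shelf"], rules))
```

A fuzzy variant would map each confidence (or a utility-weighted score) onto linguistic labels through membership functions; that mapping is left out here because its definition is not given.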
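A minimal sketch of how a GA could select the highest-utility sequential pattern, using the operators described in the GA section: elitism, two-point crossover, and per-gene mutation. Several elements are assumptions made for illustration and are not taken from the paper: the pattern supports, the tag profits, the fitness function (support multiplied by summed tag profit, and 0 for sequences that were never mined), tournament selection, and all parameter values. The paper's own chromosome encoding and fitness function are not given in this section.

```python
import random

# Hypothetical mined patterns (tag IDs in visiting order) and their supports.
PATTERNS = {
    ("A", "B", "C"): 6,
    ("A", "C", "D"): 3,
    ("B", "C", "D"): 4,
    ("A", "B", "D"): 2,
}
# Hypothetical profit per tag.
PROFIT = {"A": 2.0, "B": 5.0, "C": 1.0, "D": 4.0}
ALPHABET = sorted(PROFIT)
LENGTH = 3


def fitness(chrom):
    """Assumed utility: support x total tag profit; 0 if chrom was never mined."""
    support = PATTERNS.get(tuple(chrom), 0)
    return support * sum(PROFIT[tag] for tag in chrom)


def tournament(pop, rng, k=3):
    """Selection: the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=fitness)


def two_point_crossover(p1, p2, rng):
    """Exchange the genes between two random cut points, yielding two children."""
    i, j = sorted(rng.sample(range(LENGTH + 1), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]


def mutate(chrom, rng, rate=0.1):
    """Replace each gene with a random tag with probability `rate`."""
    return [rng.choice(ALPHABET) if rng.random() < rate else g for g in chrom]


def run_ga(pop_size=20, generations=40, elite=2, seed=0):
    rng = random.Random(seed)
    # Initial population: the mined patterns, padded with random chromosomes.
    pop = [list(p) for p in PATTERNS]
    while len(pop) < pop_size:
        pop.append([rng.choice(ALPHABET) for _ in range(LENGTH)])
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [c[:] for c in pop[:elite]]  # elitism: copy the best unchanged
        while len(nxt) < pop_size:
            c1, c2 = two_point_crossover(tournament(pop, rng), tournament(pop, rng), rng)
            nxt.extend([mutate(c1, rng), mutate(c2, rng)])
        pop = nxt[:pop_size]
    best = max(pop, key=fitness)
    return tuple(best), fitness(best)


if __name__ == "__main__":
    print(run_ga())
```

Because elitism never discards the current best chromosome and the mined patterns seed the initial population, the returned pattern is at least as profitable as the best mined pattern under this assumed fitness.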
CONCLUSION
In this paper, we have presented a data mining system for mining information about the movement of the tags attached to warehouse goods. The proposed mining system mines knowledge from the warehoused data by generating the I-dataset, mining sequential patterns, and then generating fuzzy rules from the sequential patterns. The sequential patterns are then optimized by GA on the basis of their assigned utility. The output of the system, optimized fuzzy rules with their corresponding profit, describes the type of tag movement with a fuzzy score. Given part of the movement of a tag (and, indirectly, of a product), the fuzzy rules capture the remaining path of the tag (product). In this manner, various length combinations of the tags have been considered and their movement has been understood. Movements are considered only for some important tags and combinations, not for all tags and their combinations. From the implementation results and comparative analysis, we observed that the proposed system efficiently identifies the optimum sequential pattern. Hence, with the help of the presented optimized data mining system, goods in large warehouses can be tracked efficiently. Because we concentrate only on the optimized sequential patterns, the cost of mining the sequential patterns is minimized. The extracted information would be helpful for warehouse management.
REFERENCES
1. Bin Li and Dennis Shasha, "Free Parallel Data Mining", ACM SIGMOD Record, Vol. 27, No. 2, pp. 541-543, June 1998.
2. Anand, Bell and Hughes, "EDM: A general framework for data mining based on evidence theory", Data and Knowledge Engineering, Vol. 18, No. 3, pp. 189-223, 1996.
3. Agrawal, Imielinski and Swami, "Database Mining: A Performance Perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993.
4. Chen and Liu, "Data mining from 1994 to 2004: an application-oriented review", International Journal of Business Intelligence and Data Mining, Vol. 1, No. 1, pp. 4-11, 2005.
5. Yashpal Singh and Alok Singh Chauhan, "Neural Networks In Data Mining", Journal of Theoretical and Applied Information Technology, Vol. 5, No. 6, pp. 36-42, 2009.
6. C.M. Roberts, "Radio frequency identification (RFID)", Computers & Security, Vol. 25, pp. 18-26, 2006.
7. Hatim A. Aboalsamh, "A novel Boolean algebraic framework for association and pattern mining", WSEAS Transactions on Computers, Vol. 7, No. 8, pp. 1352-1361, August 2008.
8. Sathiyamoorthi and Murali Bhaskaran, "Data Mining for Intelligent Enterprise Resource Planning System", International Journal of Recent Trends in Engineering, Vol. 2, No. 3, pp. 1-5, November 2009.
9. Jayanthi Ranjan and Vishal Bhatnagar, "A Review of Data Mining Tools In Customer Relationship Management", Journal of Knowledge Management Practice, Vol. 9, No. 1, March 2008.
10. Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan and Michael E. Welge, "Knowledge management and data mining for marketing", Decision Support Systems, Vol. 31, No. 1, pp. 127-137, May 2001.
11. Asghar Sabbaghi and Ganesh Vaidyanathan, "Effectiveness and Efficiency of RFID technology in Supply Chain Management: Strategic values and Challenges", Journal of Theoretical and Applied Electronic Commerce Research, Vol. 3, No. 2, pp. 71-81, 2008, ISSN 0718-1876.
12. Asif, Z., Mandviwalla, M., "Integrating the supply chain with RFID: a technical and business analysis", Communications of the Association for Information Systems, Vol. 15, No. 24, pp. 393-427, 2005.
13. Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal and Mei-Chun Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 10, pp. 1-17, October 2004.
14. M.S. Chen, J. Han, P.S. Yu, "Data mining: an overview from a database perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
15. Yen-Liang Chen and Ya-Han Hu, "Constraint-based sequential pattern mining: The consideration of recency and compactness", Decision Support Systems, Vol. 42, pp. 1203-1215, 2006.
16. Kuen-Fang Jea, Ke-Chung Lin and I-En Liao, "Mining hybrid sequential patterns by hierarchical mining technique", International Journal of Innovative Computing, Information and Control, Vol. 5, No. 8, August 2009.
17. Dhany Saputra, Dayang R.A. Rambli and Oi Mean Foong, "Mining Sequential Patterns Using I-PrefixSpan", International Journal of Computer Science and Engineering, Vol. 2, No. 2, pp. 49-554, 2008.
18. J. Hu and A. Mojsilovic, "High-utility pattern mining: A method for discovery of high-utility item sets", Pattern Recognition, Vol. 40, pp. 3317-3324, 2007.
19. Jian Pei, Jiawei Han and Wei Wang, "Constraint-based sequential pattern mining: The pattern-growth methods", Journal of Intelligent Information Systems, Vol. 28, No. 2, pp. 133-160, April 2007.
20. Shigeaki Sakurai, Youichi Kitahara and Ryohei Orihara, "A Sequential Pattern Mining Method based on Sequential Interestingness", International Journal of Computational Intelligence, Vol. 4, No. 4, pp. 252-260, 2008.
21. Themis P. Exarchos, Markos G. Tsipouras, Costas Papaloukas and Dimitrios I. Fotiadis, "A two-stage methodology for sequence classification based on sequential pattern mining and optimization", Data & Knowledge Engineering, Vol. 66, pp. 467-487, 2008.
22. Shankar and Purusothaman, "Utility Sentient Frequent Itemset Mining and Association Rule Mining: A Literature Survey and Comparative Study", International Journal of Soft Computing Applications, Vol. 10, No. 4, pp. 81-95, 2009.
23. Mourad Ykhlef and Hebah ElGibreen, "Mining Sequential Patterns Using Hybrid Evolutionary Algorithm", World Academy of Science, Engineering and Technology, Vol. 60, pp. 863-870, 2009.
24. Jyothi Pillai and O.P. Vyas, "Overview of Itemset Utility Mining and its Applications", International Journal of Computer Applications (0975-8887), Vol. 5, No. 11, pp. 9-13, August 2010.
25. M. Sedighizadeh and A. Rezazadeh, "Using Genetic Algorithm for Distributed Generation Allocation to Reduce Losses and Improve Voltage Profile", World Academy of Science, Engineering and Technology, Vol. 37, 2008.
26. P. Radhakrishnan, V.M. Prasad and M.R. Gopalan, "Optimizing Inventory Using Genetic Algorithm for Efficient Supply Chain Management", Journal of Computer Science, Vol. 5, No. 3, pp. 233-241, 2009.
27. Basheer M. Al-Maqaleh and Kamal K. Bharadwaj, "Genetic Programming Approach to Hierarchical Production Rule Discovery", World Academy of Science, Engineering and Technology, Vol. 11, pp. 43-46, 2005.
28. Timo Mantere, "A Min-Max Genetic Algorithm with Alternating Multiple Sorting for Solving Constrained Problems", in Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, 2006.
29. Pedro A. Diaz-Gomez and Dean F. Hougen, "Improved Off-Line Intrusion Detection Using A Genetic Algorithm", Proceedings of the Seventh International Conference on Enterprise Information Systems, Miami, USA, May 25-28, 2005, pp. 66-73.
30. S. Ramanarayana Reddy, "Selection of RTOS for an Efficient Design of Embedded Systems", International Journal of Computer Science and Network Security, Vol. 6, No. 6, pp. 29-37, June 2006.
31. Stefan Schenk and Klaus Hanke, "Combining Genetic Algorithms With Imperfect And Subdivided Features For The Automatic Registration Of Point Clouds (GAREG-ISF)", Proceedings of the 3rd ISPRS International Workshop, Vol. 38.
32. Imtiaz Korejo, Shengxiang Yang and Changhe Li, "A Comparative Study of Adaptive Mutation Operators for Genetic Algorithms", in Proceedings of the 8th Metaheuristic International Conference, July 13-16, 2009.
33. Mike Sewell, Jagath Samarabandu, Ranga Rodrigo and Kenneth McIsaac, "The Rank-scaled Mutation Rate for Genetic Algorithms", International Journal of Information Technology, Vol. 3, No. 1, 2006.
34. Zorana Bankovic, José M. Moya, Alfaro Araujo, Slobodan Bojanic and Octavio Nieto-Taladriz, "A Genetic Algorithm-based Solution for Intrusion Detection", Journal of Information Assurance and Security, Vol. 4, pp. 192-199, 2009.