Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 3, May June 2013 ISSN 2278-6856
aggregation functions and predicates. For example, if the count of a group is below TH, every finer group obtained by adding further attributes also has a count below TH. Today, IB queries are processed by a number of techniques that do not scale well to large data sets, so efficient techniques are needed to process them. One simple technique answers an IB query by keeping an array of counters in memory, one per unique target attribute value, and updating the counters in a single pass over the data. However, this is hard because a relational database table is often many times larger than main memory. In another method, the records of the table are sorted on disk and the sorted records are then passed into main memory to form the aggregates, from which the aggregate values greater than the specified TH are selected. If the available memory is smaller than the table, the data must be read from disk several times, so query evaluation (QE) incurs long execution times and very large disk requirements. To evaluate an IB query quickly, bitmap vectors are built for all attributes in the selection. A bitmap index for an attribute of a database table can be viewed as a matrix with R rows, one per table row, and C columns, one per distinct value of the attribute. The element in row k and column j is 1 if the kth row holds the jth value, and 0 otherwise. The original bitmap vectors are then compressed with the word-aligned hybrid (WAH) technique so that they fit in the available memory. A pair of bitmap vectors with matching 1-bit positions is selected and combined with a bitwise-AND operation. If the resulting bitmap vector has at least TH 1 bits, the pair and its count of 1 bits are added to the IB result set.
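The matrix view of a bitmap index and the WAH compression step described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. It makes several assumptions. Words are 32 bits wide, each holding a 31-bit group. The first bit of a group is stored in the most significant payload position. The last partial group is padded with zeros. The helper names `build_bitmaps`, `wah_encode` and `wah_decode` are hypothetical.

```python
from typing import Dict, List, Sequence

GROUP = 31                      # payload bits per 32-bit WAH word
ALL_ONES = (1 << GROUP) - 1     # a 31-bit group whose bits are all 1
COUNTER_MASK = (1 << 30) - 1    # low 30 bits of a fill word hold the run length


def build_bitmaps(column: Sequence[str]) -> Dict[str, List[int]]:
    """Matrix view: one column (bit list) per distinct value;
    entry k is 1 iff row k holds that value."""
    bitmaps: Dict[str, List[int]] = {}
    for k, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[k] = 1
    return bitmaps


def wah_encode(bits: List[int]) -> List[int]:
    """Compress a bit list into 32-bit WAH words.
    Literal word: MSB 0, low 31 bits copied from one group.
    Fill word: MSB 1, next bit = fill value, low 30 bits = number of groups."""
    words: List[int] = []
    for start in range(0, len(bits), GROUP):
        chunk = bits[start:start + GROUP]
        chunk = chunk + [0] * (GROUP - len(chunk))   # pad the final group
        group = 0
        for b in chunk:                              # first bit ends up most significant
            group = (group << 1) | b
        if group in (0, ALL_ONES):
            fill_bit = 1 if group == ALL_ONES else 0
            prev = words[-1] if words else 0
            same_fill = (prev >> 31) == 1 and ((prev >> 30) & 1) == fill_bit
            if same_fill and (prev & COUNTER_MASK) < COUNTER_MASK:
                words[-1] = prev + 1                 # extend the current run
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(group)                      # literal word
    return words


def wah_decode(words: List[int], n_bits: int) -> List[int]:
    """Expand WAH words back into a bit list of length n_bits."""
    bits: List[int] = []
    for w in words:
        if w >> 31:                                  # fill word
            bits.extend([(w >> 30) & 1] * (GROUP * (w & COUNTER_MASK)))
        else:                                        # literal word
            bits.extend((w >> (GROUP - 1 - i)) & 1 for i in range(GROUP))
    return bits[:n_bits]                             # strip the padding
```

Long runs of identical all-0 or all-1 groups collapse into a single fill word. Mixed groups are stored verbatim as literal words. This is why sparse bitmap columns shrink enough to be held in memory.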
Each vector of the pair is then combined with the resulting vector by a bitwise-XOR, which leaves only its remaining 1-bit positions. If a vector still has at least TH 1 bits after this step, it is kept for further processing. IB queries can be computed efficiently on a compressed BI by deferring the bitwise-XOR operations. In this work, the deferred strategy excludes disqualified bitwise-XOR
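One possible reading of the deferred bitwise-XOR strategy is sketched below. It uses Python integers as uncompressed bit vectors, with bit k standing for row k; this is an assumption, since the paper works on WAH-compressed vectors. The idea rests on a simple fact. The AND result is a subset of each operand, so the count left after the XOR equals the old count minus the count of the AND result. The XOR can therefore be skipped for any operand that would fall below TH. The function name `and_then_deferred_xor` is hypothetical.

```python
from typing import Optional, Tuple


def popcount(vec: int) -> int:
    """Number of 1 bits in a non-negative integer bit vector."""
    return bin(vec).count("1")


def and_then_deferred_xor(
    x: int, x_count: int, y: int, y_count: int, th: int
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """AND two vectors and decide, before any XOR runs, which operands survive.

    Returns (pair count or None, remaining x or None, remaining y or None).
    Because common = x & y is a subset of x, popcount(x ^ common) equals
    x_count - popcount(common); the same holds for y.
    """
    common = x & y
    c = popcount(common)
    pair = c if c >= th else None
    # The XOR is executed only for an operand that can still reach th.
    rest_x = x ^ common if x_count - c >= th else None
    rest_y = y ^ common if y_count - c >= th else None
    return pair, rest_x, rest_y
```

Example: `and_then_deferred_xor(0b1011, 3, 0b0011, 2, 2)` reports the pair with count 2. It performs no XOR at all, because neither operand keeps 2 or more unmatched 1 bits.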
Keywords: Database, IB Query, Bitmap index (BI), priority queue (PQ), Threshold (TH)
1. Introduction
The size of data warehouses (DW) grows rapidly every day with client requirements. Aggregated values such as revenue, sales and income carry key business information. Business analysts (BA) are often responsible for evaluating and using these values to stay competitive. Many data mining (DM) queries are IB queries. In particular, an IB query is a special class of aggregation query that computes aggregate values above a user-specified threshold (TH) [1, 2]. IB queries were first studied in the DM field. The syntax of an IB query on a relation R(A1, A2, ..., An) is:
SELECT Ai, Aj, ..., Am, AGG(*) FROM R GROUP BY Ai, Aj, ..., Am HAVING AGG(*) >= TH
Here Ai, Aj, ..., Am is a subset of the attributes of R, referred to as the aggregate attributes. AGG is an aggregation function, and the greater-than-or-equal-to symbol (>=) is the comparison predicate. This work focuses on IB queries whose aggregation function is COUNT, which has the anti-monotone property. IB queries have an interesting anti-monotone property for many of the
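For reference, the semantics of the IB query above with COUNT, and the anti-monotone pruning it permits, can be sketched in Python. The in-memory `Counter` plays the role of the array of counters mentioned in the paper. The function names are hypothetical. The sketch assumes the table fits in memory, which is exactly the assumption that fails for large tables.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Row = Tuple[str, str]  # (value of X, value of Y)


def iceberg_count(rows: Sequence[Row], th: int) -> Dict[Row, int]:
    """SELECT X, Y, COUNT(*) FROM R GROUP BY X, Y HAVING COUNT(*) >= th,
    evaluated with one in-memory counter per distinct (X, Y) group."""
    counts = Counter(rows)
    return {group: c for group, c in counts.items() if c >= th}


def iceberg_count_pruned(rows: Sequence[Row], th: int) -> Dict[Row, int]:
    """Same answer, using anti-monotonicity: COUNT(x, y) <= COUNT(x), so rows
    whose X value occurs fewer than th times can be dropped before grouping."""
    x_counts = Counter(x for x, _ in rows)
    survivors = [row for row in rows if x_counts[row[0]] >= th]
    return iceberg_count(survivors, th)
```

Both functions return the same groups for every threshold. The pruned version simply never builds counters for groups that anti-monotonicity already rules out.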
3. PROPOSED MODEL
This section proposes the research model for the topic under study in the following two subsections:
(i) an algorithm that prunes the vectors dynamically by computing their latest counts before reinsertion;
(ii) validation of the proposal using a sample database.
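The dynamic pruning idea can be sketched as follows. The following assumptions are not taken from the paper:

- Vectors are uncompressed Python integers, with bit k standing for row k.
- Each attribute's vectors sit in a binary heap keyed by their first 1-bit position, similar in spirit to the priority-queue approach of [1].
- `iceberg_pairs`, `push_if_qualifies` and `first_one` are hypothetical names.

A vector is reinserted only if its latest 1-bit count can still reach TH.

```python
import heapq
from typing import Dict, List, Tuple

Entry = Tuple[int, str, int]  # (first 1-bit position, attribute value, bit vector)


def first_one(vec: int) -> int:
    """Index of the lowest set bit; bit k stands for row k."""
    return (vec & -vec).bit_length() - 1


def push_if_qualifies(queue: List[Entry], label: str, vec: int, th: int) -> None:
    """(Re)insert a vector only if its latest 1-bit count can still reach th."""
    if vec and bin(vec).count("1") >= th:
        heapq.heappush(queue, (first_one(vec), label, vec))


def iceberg_pairs(xs: Dict[str, int], ys: Dict[str, int],
                  th: int) -> Dict[Tuple[str, str], int]:
    """Return {(x, y): count} for every (x, y) group with COUNT >= th."""
    qa: List[Entry] = []
    qb: List[Entry] = []
    for label, vec in xs.items():
        push_if_qualifies(qa, label, vec, th)
    for label, vec in ys.items():
        push_if_qualifies(qb, label, vec, th)

    result: Dict[Tuple[str, str], int] = {}
    while qa and qb:
        pa, xa, va = qa[0]
        pb, yb, vb = qb[0]
        if pa == pb:
            # Aligned: row pa holds both xa and yb, so the AND is non-empty.
            heapq.heappop(qa)
            heapq.heappop(qb)
            common = va & vb
            count = bin(common).count("1")
            if count >= th:
                result[(xa, yb)] = count
            # Remove the rows just counted, then reinsert on the latest counts.
            push_if_qualifies(qa, xa, va ^ common, th)
            push_if_qualifies(qb, yb, vb ^ common, th)
        elif pa < pb:
            # No vector left in qb has bit pa set, so that row cannot form a
            # qualifying group any more: clear it and recount.
            heapq.heappop(qa)
            push_if_qualifies(qa, xa, va & (va - 1), th)
        else:
            heapq.heappop(qb)
            push_if_qualifies(qb, yb, vb & (vb - 1), th)
    return result
```

Every row holds exactly one X value and one Y value, so the vectors within one queue never share a 1 bit. When the two queue tops start at the same position, their AND is guaranteed to be non-empty. When they do not, the lower 1 bit has no surviving partner and can be discarded. Each loop iteration removes at least one 1 bit, so the loop always terminates.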
Database table (R): sample relation with aggregate attributes X and Y (values such as X2, Y1, Y2, Y3) and a numeric column.
5. RESULTS
This section presents the results of the experiment described in the previous section; they are tabulated in Table 5.1. For each threshold (TH), the table lists the execution time, in seconds, needed to compute the iceberg result set with the icebergPQ, icebergDB and icebergWD functions.
Table 5.1: Execution times with different thresholds (THs)
Threshold (TH) icebergPQ 1000 1.993 2000 0.585 3000 0.879 4000 1.346 5000 1.743 6000 1.354 7000 1.182 8000 1.271 9000 1.792
icebergDB icebergWD
1.521
0.678
1.234
1.678
0.218
0.359
0.359
0.839
0.229
1.340
0.345
1.567
1.563
1.130
0.489
0.769
0.231
0.739
6. CONCLUSIONS
This paper presents an efficient DPQ strategy for processing IB queries using compacted BIs. The approach speeds up the query evaluation (QE) process by emptying the compactness queue, and hence optimizes QE. Several research directions remain for evaluating IB queries efficiently. These include removing redundant bits from bitmap vectors, excluding ineffective bitwise-AND operations, and ordering the PQ by low compactness, initial high compactness or initial low compactness.
4. TESTING
This section describes the experiments carried out on the implementation described in the previous section, with IB threshold values increasing from 1000 to 9000. The IB query groups the records of table R on the aggregate attributes X and Y. It selects the groups whose count reaches a TH value between 1000 and 9000. The experiment fires this IB query, with COUNT as the aggregation function, on a database table of millions of tuples with the two attributes X and Y.

The first function, GenerateBitmaps, accepts all these rows as input. It produces one bitmap vector for each distinct value of an aggregate attribute and aligns the words in a dense (compressed) form. The compressed words of each bitmap vector are then passed to the next function, an insertion sort, which arranges the vectors in descending order of their 1-bit counts (density). The FirstoneBitpostion function determines the first 1-bit position in each vector. It inserts the vectors into the respective density queues DQA and DQB, which are created by a special function called DQ.

These two density queues are given as input to the EfficientBitmapPruning function. It repeatedly calls the NextAlignedVector and First1bitposition functions internally until either density queue becomes empty. Each call to NextAlignedVector returns the two topmost aligned bitmap vectors, one from each density queue, to the main program, efficienticebergqueryevalautionwithDQs. All groups whose COUNT value is at least the threshold (1000 to 9000) are generated as output. The experiment is repeated for different IB TH values while keeping the same number of rows in a
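The bitmap generation and density-queue construction steps can be sketched as below. The snake_case names `generate_bitmaps`, `first_one_bit_position` and `density_queue` are hypothetical stand-ins for GenerateBitmaps, FirstoneBitpostion and DQ. Vectors are kept uncompressed as Python integers. Dropping vectors with fewer than TH 1 bits at build time is an assumption based on the anti-monotone property of COUNT, not a step stated in the paper.

```python
from typing import Dict, List, Sequence, Tuple

# (1-bit count, first 1-bit position, attribute value, bit vector)
DQEntry = Tuple[int, int, str, int]


def generate_bitmaps(column: Sequence[str]) -> Dict[str, int]:
    """One bit vector per distinct value; bit k is set iff row k holds it."""
    bitmaps: Dict[str, int] = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def first_one_bit_position(vec: int) -> int:
    """Index of the lowest set bit, i.e. the first row holding the value."""
    return (vec & -vec).bit_length() - 1


def density_queue(bitmaps: Dict[str, int], th: int) -> List[DQEntry]:
    """Insertion-sort the vectors into a list ordered by descending 1-bit count."""
    queue: List[DQEntry] = []
    for value, vec in bitmaps.items():
        count = bin(vec).count("1")
        if count < th:
            continue                        # can never reach th (anti-monotone)
        entry = (count, first_one_bit_position(vec), value, vec)
        queue.append(entry)
        i = len(queue) - 1
        while i > 0 and queue[i - 1][0] < count:   # shift sparser entries right
            queue[i] = queue[i - 1]
            i -= 1
        queue[i] = entry
    return queue
```

Usage: build one queue per aggregate attribute, e.g. `density_queue(generate_bitmaps(x_column), th)` for DQA and the same call on the Y column for DQB. Vectors with equal counts keep their insertion order, because entries are shifted only past strictly sparser ones.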
REFERENCES
[1] Bin He, Hui-I Hsiao, Ziyang Liu, Yu Huang and Yi Chen, Efficient Iceberg Query Evaluation Using Compressed Bitmap Index, IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 9, Sept. 2011, pp. 1570-1589.
[2] D.E. Knuth, The Art of Computer Programming: A Foundation for computer mathematics, Addison-Wesley Professional, second edition, ISBN 0-201-89684-2, January 10, 1973.
[3] G. Antoshenkov, Byte-aligned Bitmap Compression, Proceedings of the Conference on Data Compression, IEEE Computer Society, Washington, DC, USA, Mar. 28-30, 1995, p. 476.
[4] H. Hsiao, Z. Liu, Y. Huang and Y. Chen, Efficient Iceberg Query Evaluation using Compressed Bitmap Index, Knowledge and Data Engineering, IEEE, issue 99, 2011, p. 1.
[5] Jinuk Bae and Sukho Lee, Partitioning Algorithms for the Computation of Average Iceberg Queries, Springer-Verlag, ISBN 3-540-67980-4, 2000, pp. 276-286.
[6] J. Bae and S. Lee, Partitioning Algorithms for the Computation of Average Iceberg Queries, in DaWaK, 2000.
Mr. K.Sunil Kumar is a Research Scholar. He is currently working as an Associate Professor at Vaageswari College of Engineering, Karimnagar. He has 11 years of teaching experience and 6 international journal publications to his credit. His areas of interest are DWDM and Computer Networks.
Author Biographies
Mr. M.Laxmaiah is a Research Scholar at JNTUH, Kukatpally, Hyderabad. He is currently working as Professor & Head of the CSE Dept. at Tirumala Engineering College, Bogarm (v), Keesara (M), Hyderabad, AP, India. He has 15 years of experience in education and 4 years of experience in research, with 4 research publications in international journals. His areas of interest include Databases, Data Warehousing & Mining.

Dr. A.Govardhan received his BE in Computer Science and Engineering from Osmania University College of Engineering, Hyderabad, in 1992. He received his M.Tech from Jawaharlal Nehru University, Delhi, in 1994 and his PhD from Jawaharlal Nehru Technological University, Hyderabad, in 2003. He is currently working as Professor in CSE and Director of Evaluation, JNTUH, Kukatpally, Hyderabad. He has guided more than 120 M.Tech projects and a number of MCA and B.Tech projects. He has 160 research publications in international/national journals and conferences. His areas of interest include Databases, Data Warehousing & Mining, Information Retrieval, Computer Networks and Image Processing.
Dr. C.Sunil Kumar received his B.E. in Computer Science and Engineering from the University of Madras, Vellore, India, in 1998. He received his M.Tech in Computer Science and Engineering from SRM University, Chennai, India, in 2005. He received his doctorate in Computer Science and Engineering from JNT University, Hyderabad, India, in 2012. Currently, he is Professor & Head of CSE at Vaageswari College of Engineering, Karimnagar, JNT University, Hyderabad, India. He has guided more than 20 M.Tech projects and 40 B.Tech projects. He has 30 research publications in international/national journals and conferences. His research interests are Distributed Databases, Data Warehousing and Data Mining.