
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org  Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 3, May-June 2013  ISSN 2278-6856

AN APPROACH TO EVALUATE AGGREGATE QUERIES EFFICIENTLY USING PRIORITY QUEUE TECHNIQUE


M. Laxmaiah1, K. Sunil Kumar2, Dr. A. Govardhan3 and Dr. C. Sunil Kumar4
1JNT University, Tirumala Engineering College, Bogaram, Hyderabad, A.P, India
2Vaageswari College of Engineering, Karimnagar, A.P, India
3JNT University, Professor in CSE, Director of Evaluation, Hyderabad, A.P, India
4JNT University, Vaageswari College of Engineering, Karimnagar, A.P, India

Abstract: An iceberg (IB) query is a special kind of aggregation query that computes an aggregate value and retains only the groups whose aggregate crosses a user-preferred threshold (TH). The bitmap index (BI) is a common data structure (DS) for fast retrieval of the matching rows of a relational database table, and the retrieved rows are useful for computing aggregate functions. In this work, a priority queue (PQ) approach is proposed to evaluate IB queries efficiently using compacted BIs. The approach orders the vectors in the PQ by their density of 1s in order to achieve the finest pruning effect. In-depth experimentation demonstrates that the proposed model is considerably more efficient than the existing strategy.

Keywords: Database, IB Query, Bitmap Index (BI), Priority Queue (PQ), Threshold (TH)

1. Introduction
The size of the data warehouse (DW) is growing rapidly with client requirements every day. Aggregated values capture key business information such as revenue, sales and income, and business analysts (BA) are often responsible for evaluating these aggregates to compete in the present competitive world. Many data mining (DM) queries are IB queries. In particular, an IB query is a special class of aggregation query that computes an aggregate value above a user-specified threshold (TH) [1, 2]. IB queries were first considered in the DM field. The syntax of an IB query on a relation R(A1, A2, ..., An) is stated below:

SELECT Ai, Aj, ..., Am, AGG(*) FROM R GROUP BY Ai, Aj, ..., Am HAVING AGG(*) >= TH

where Ai, Aj, ..., Am is a subset of the attributes of R, referred to as the aggregate attributes, AGG is an aggregation function, and greater-than-or-equal (>=) is the evaluation predicate. This work focuses on IB queries with the aggregation function COUNT, which has the anti-monotone property. IB queries exhibit this interesting anti-monotone property for many aggregation functions and predicates: for example, if the count of a group is below TH, the count of any finer group obtained by adding grouping attributes is also below TH. IB queries are today processed by a number of techniques that do not scale well to large data sets, so it is necessary to develop well-organized techniques to process them efficiently. One simple technique for answering an IB query is to organize an array of counters in memory and, in a single pass over the data, count the occurrences of each unique target attribute value. However, this is hard because a relational database table is often several times larger than main memory. In another method, the records of the table are sorted on disk and then streamed into main memory to form the aggregation, after which the aggregate values greater than the specified TH are selected. If the available memory is smaller than the table, the data must be passed over multiple times from disk; query evaluation (QE) therefore consumes long execution time and extremely large disk space.

To evaluate the IB query quickly, bitmap vectors are built for all attributes in the selection. A bitmap for an attribute of a database table can be viewed as a matrix with R rows, one per record, and C columns, one per distinct value of the attribute; the element in row k of a column is 1 if record k holds that value, and 0 otherwise. The original bitmap vectors are then compacted to the available free memory using the word-aligned hybrid (WAH) compression technique. A couple of bitmap vectors with similar 1-bit positions is selected for a bitwise-AND operation; if the resulting bitmap vector carries at least as many 1 bits as the TH specified in the IB query, the couple, together with its 1s count, is added to the IB result set. The couple is next examined for the subsequent 1-bit positions by a bitwise-XOR of each original vector with the resulting vector; if the number of 1s in this remainder exceeds TH, the vector is preserved for further processing. IB queries are computed efficiently over compacted BIs by deferring bitwise-XOR operations: the deferred strategy excludes disqualified bitwise-XOR operations. In its evaluation process, the 1s count of an original vector is updated by subtracting the 1s count of the resulting vector; if the updated count is below TH, the vector is disqualified from the XOR operation and is pruned. The IB query is therefore evaluated quickly by pruning a large number of bitmap vectors in a large database table, reducing computation and speeding up evaluation time. It was observed, however, that considerable execution time was still spent on bitwise-XOR operations.

In this paper, we propose a density PQ strategy to answer an IB query efficiently using compacted BIs [8, 9, 10]. The strategy admits vectors with high 1s counts into the PQs using an insertion sort (IS); a vector with a high count is more likely to share a large number of aligned positions, which leads to fewer bitwise-AND operations. A dynamic pruning step is then added to this strategy in order to achieve an optimal pruning effect. Hence, the PQ strategy improves on existing methodologies. Experimental results on large synthetic data sets show a considerable improvement, demonstrating efficient IB query computation.
2. ANALYSIS OF RELATED WORK
This section analyzes the related work referred to in the introduction in two subsections. The first subsection evaluates the processing of IB queries using tuple-scan based approaches; the second discusses the related research on BIs, which is the focus of this work for optimizing IB queries. In modern times, the evaluation of IB queries has attracted researchers considerably due to the demands of scalability and efficiency.

2.1 Evaluating Iceberg (IB) Queries
The processing of IB queries was first studied by extending probabilistic techniques into hybrid and multi-bucket algorithms. Sampling and multiple hash functions were used as the basic building blocks of probabilistic procedures such as the scaled-sampling and coarse-count algorithms, which estimate the sizes of query results in order to predict the valid IB results. This improves aggregate query performance and greatly reduces memory requirements; however, these techniques wrongly produce false positives and false negatives. To overcome these defects, efficient strategies were designed by hybridizing the sampling and coarse-count techniques. The linear counting algorithm (LCA) is a hashing method that allocates a bitmap of size M in memory, with all entries initialized to 0. LCA scans the relation and applies a hash function to each data value in the column of interest; the hash function generates a bitmap address, and the algorithm sets the addressed bit to 1. It then counts the number of empty bitmap entries and estimates the column cardinality by dividing this count by the bitmap size M and plugging the result into the estimator. Partitioning algorithms for the computation of average iceberg (AIB) queries select candidates by means of partitioning, based on a theorem [5, 6]; the uniqueness of the POP algorithm is that it partitions a relation logically and, in order to use main memory efficiently, postpones partitioning until all buckets are occupied with candidates. All of these techniques are tuple-scan (TS) based approaches, as they require at least one scan of each row of the relation, and none of them leverages the BI for query optimization. A comparison was presented for Collective IB Query Evaluation (CIQE) against benchmark algorithms such as Sort-Merge-Aggregate (SMA) and Hybrid-Hash-Aggregate (HHA) [7]. CIQE performed better than SMA on data sets with a low to moderate number of targets and moderate to high skew, but not for low skew and a high number of targets. HHA performance was not robust and was quite bad when the number of targets was high; in addition, it has implementation problems. There was a considerable performance gap between the online algorithms and Oracle, indicating scope for designing better IB query processing algorithms.

2.2 Bitmap Indices
The BI is known to be efficient for speeding up IB queries, especially in DW applications and in column stores. Model 204 was the first commercial product to make extensive use of the BI; its implementation was a hybrid between the basic BI and lists of row identifiers (RIDs). In DW applications, the BI has been shown to outperform tree-based index (TBI) schemes such as the variants of the B-tree or R-tree. Compacted BIs are widely used in column-oriented databases such as C-Store [14] to improve performance over row-oriented databases. WAH and BBC are two important compression methods widely used in query processing [3]; more importantly, bitmaps compressed with BBC and WAH can participate directly in bitwise operations without decompression. BBC is effective in both shrinking index sizes and query performance; it encodes the bitmaps in bytes, while WAH encodes them in words. The word-aligned scheme uses only about 50% more space but performs logical operations on compressed data up to 12 times faster than BBC. The development of bitmap compression methods and encoding strategies has further broadened the applicability of the BI, which can nowadays be applied to all types of attributes, including high-cardinality, numerical and text attributes. An approach for executing IB queries efficiently using the compressed BI was proposed together with an effective bitmap pruning strategy for processing IB queries; this index-pruning based approach eliminates the need to scan and process the entire table, which speeds up IB query processing [4]. A bitwise-AND operation is conducted on bitmap vectors with similar 1-bit positions; the 1s count of the resulting vector r is compared with the IB TH, and if it is greater than or equal to TH the pair is added to the IB result set, otherwise it is rejected. A bitwise-XOR is then performed between each original bitmap vector and the resulting vector r, and the number of 1s in the remainder determines vector pruning: if it is greater than the threshold, the remainder is queued for further processing, otherwise the original bitmap vector is pruned.
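The run-length idea behind WAH can be illustrated with a simplified sketch: the bitmap is split into 31-bit groups, and consecutive all-0 or all-1 groups are collapsed into a single fill. This is only an illustration of the principle; real WAH packs fills and literals into aligned 32-bit machine words.

```python
def wah_like_compress(bits, word=31):
    """Simplified illustration of WAH: split the bitmap into 31-bit
    groups and run-length-encode groups that are uniformly 0 or 1.
    (Real WAH packs the result into aligned 32-bit words.)"""
    groups = [tuple(bits[i:i + word]) for i in range(0, len(bits), word)]
    out = []
    for g in groups:
        uniform = len(set(g)) == 1 and len(g) == word
        if uniform and out and out[-1][0] == "fill" and out[-1][1] == g[0]:
            out[-1] = ("fill", g[0], out[-1][2] + 1)   # extend the run
        elif uniform:
            out.append(("fill", g[0], 1))              # start a new run
        else:
            out.append(("literal", g))                 # mixed group kept verbatim
    return out

# 62 zero bits collapse into one fill covering two 31-bit groups:
print(wah_like_compress([0] * 62))  # [('fill', 0, 2)]
```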


In the delayed strategy, the algorithm processes two aligned vectors by conducting the AND operation and tests whether the result of this AND operation leads to an IB result. If so, the pair is added to the iceberg result set along with the 1s count of the resultant vector r, and processing proceeds based on the remaining 1s counts; otherwise the vectors are subject to dynamic pruning. Reinsertion of a vector whose future reference is useful is decided by computing its latest count: the count of each original vector is updated by subtracting r's count, and the updated count is compared with TH. If it is above TH, the aligned 1s are changed to 0 by a bitwise-XOR between the original vector and the resultant vector, and the vector is reinserted into its respective DQ under the latest count; otherwise the vector is removed by the dynamic pruning (DP) method, and in that case the XOR operation is saved. After every bitwise-AND operation, the DQs are restructured with the latest counts so that the bitwise-AND is always conducted between the highest-count vector pairs. The same process is repeated until either of the DQs becomes empty.

3. PROPOSED MODEL
This section presents the proposed research model in two subsections: an algorithm that prunes vectors dynamically by computing their latest counts for reinsertion, and a validation of the proposal on a sample database.

Insertion sort (IS)

for K <- 1 to length(B) - 1
    key <- B[K]
    i <- K - 1
    while i >= 0 and count(B[i]) < count(key)
        B[i+1] <- B[i]
        i <- i - 1
    B[i+1] <- key

The above algorithm orders the bitmap vectors in the PQ by highest 1s count (comparisons are on the densities of the vectors, so the densest vector comes first), and this queue of vectors is given as input to the function nextAlignedVectors.
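The pseudocode above can be written as a short runnable sketch that orders bitmap vectors by descending 1s count (names are illustrative):

```python
def insertion_sort_by_count(vectors):
    """Order bitmap vectors by descending 1s count (densest first),
    mirroring the insertion-sort pseudocode above."""
    b = list(vectors)
    for k in range(1, len(b)):
        key = b[k]
        i = k - 1
        # Shift less-dense vectors right until key's position is found.
        while i >= 0 and b[i].count(1) < key.count(1):
            b[i + 1] = b[i]
            i -= 1
        b[i + 1] = key
    return b

vecs = [[0, 1, 0], [1, 1, 1], [1, 0, 1]]
print(insertion_sort_by_count(vecs))  # [[1, 1, 1], [1, 0, 1], [0, 1, 0]]
```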

3.1 Proposed Algorithm


An algorithm is proposed in this section to evaluate an IB query by reducing the bitmap vectors dynamically using high counts:

Pseudocode of proposed algorithm


IBDQ(attribute X, attribute Y, threshold TH)
Input: {attribute X, attribute Y, threshold TH}
Output: IB results R
1:  PQA.clear, PQB.clear
2:  for each vector a of attribute X do
3:      a.count = BIT1_COUNT(a)
4:      if a.count >= TH then
5:          insertPQ(a, PQA, a.count)
6:  for each vector b of attribute Y do
7:      b.count = BIT1_COUNT(b)
8:      if b.count >= TH then
9:          insertPQ(b, PQB, b.count)
10: R = null
11: a, b = nextAlignedVectors(PQA, PQB, TH)
12: while a <> null and b <> null do
13:     PQA.pop
14:     PQB.pop
15:     r = BITWISE_AND(a, b)
16:     if r.count >= TH then
17:         add IB result (a.value, b.value, r.count) to R
18:         a.count = a.count - r.count
19:         b.count = b.count - r.count
20:     if a.count >= TH then
21:         insertPQ(a, PQA, a.count)
22:     if b.count >= TH then
23:         insertPQ(b, PQB, b.count)
24:     a, b = nextAlignedVectors(PQA, PQB, TH)
25: return R

The above IBDQ algorithm implements the DPQ strategy and operates in two phases. In the first phase, bitmap vectors enter the PQs ordered by the density of their 1s counts: the function BIT1_COUNT returns the count of 1s in a bitmap vector, which is defined as the density of that vector, and IS is exercised on the bitmap vectors to arrange them in the PQ with the highest 1s counts first. The nextAlignedVectors function tests for aligned 1 positions in two bitmap vectors. In the second phase, the algorithm processes the IB query by conducting bitwise-AND operations between the highest-count vectors taken from the top of the respective DQs; the 1s count of the resultant vector is then compared with TH, and the vectors are updated, reinserted or pruned as described earlier.
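The IBDQ flow above can be sketched in Python using heapq as the priority queue (a max-heap via negated counts). This is a sketch under stated simplifications, not the authors' implementation: the nextAlignedVectors search is replaced by a scan of the Y-side candidates, and vectors are plain 0/1 lists rather than compressed bitmaps.

```python
import heapq

def bit1_count(v):
    """BIT1_COUNT: the number of 1s in a bitmap vector (its density)."""
    return sum(v)

def iceberg_dq(vectors_x, vectors_y, th):
    """Sketch of IBDQ: vectors whose density reaches TH enter a
    max-priority queue; the densest X-vector is ANDed against the
    qualifying Y-vectors, qualifying pairs join the result set,
    matched 1s are cleared with XOR, and a vector is dropped
    (dynamic pruning) once its remaining count falls below TH."""
    pqa = [(-bit1_count(v), name, v) for name, v in vectors_x.items()
           if bit1_count(v) >= th]
    heapq.heapify(pqa)
    ys = {name: v for name, v in vectors_y.items() if bit1_count(v) >= th}
    results = []
    while pqa and ys:
        neg, a_name, a = heapq.heappop(pqa)
        ca = -neg
        for b_name in list(ys):
            b = ys[b_name]
            r = [x & y for x, y in zip(a, b)]          # bitwise-AND
            rc = bit1_count(r)
            if rc >= th:
                results.append((a_name, b_name, rc))
                a = [x ^ y for x, y in zip(a, r)]      # clear matched 1s
                ca -= rc
                nb = [x ^ y for x, y in zip(b, r)]
                if bit1_count(nb) >= th:
                    ys[b_name] = nb                    # reinsert with latest count
                else:
                    del ys[b_name]                     # dynamic pruning (Y side)
                if ca < th:
                    break                              # dynamic pruning (X side)
    return results

# Vectors from the worked example in Section 3.2, TH = 3:
print(iceberg_dq(
    {"X1": [0,1,0,0,1,0,0,0,1,0,0,0], "X2": [1,0,1,1,0,1,1,1,0,1,0,0]},
    {"Y1": [0,0,1,0,0,1,0,1,0,0,1,1], "Y2": [1,0,0,1,0,0,1,0,0,1,0,0]}, 3))
# [('X2', 'Y1', 3), ('X2', 'Y2', 4)]
```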

3.2 Validation of DPQ Strategy on RDBMS

This section demonstrates the validity of the proposed DPQ strategy by evaluating an IB query with two aggregate attributes and the COUNT function (the walkthrough below treats the threshold as COUNT(*) >= 3):

SELECT X, Y, COUNT(*) FROM R GROUP BY X, Y HAVING COUNT(*) >= 3;

Table 3.1: Sample database table (R) and its bitmap indices

Row | X  | Y  |  BI-X: X1 X2 X3 |  BI-Y: Y1 Y2 Y3
 1  | X2 | Y2 |        0  1  0  |        0  1  0
 2  | X1 | Y3 |        1  0  0  |        0  0  1
 3  | X2 | Y1 |        0  1  0  |        1  0  0
 4  | X2 | Y2 |        0  1  0  |        0  1  0
 5  | X1 | Y3 |        1  0  0  |        0  0  1
 6  | X2 | Y1 |        0  1  0  |        1  0  0
 7  | X2 | Y2 |        0  1  0  |        0  1  0
 8  | X2 | Y1 |        0  1  0  |        1  0  0
 9  | X1 | Y3 |        1  0  0  |        0  0  1
10  | X2 | Y2 |        0  1  0  |        0  1  0
11  | X3 | Y1 |        0  0  1  |        1  0  0
12  | X3 | Y1 |        0  0  1  |        1  0  0
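For reference, the iceberg query above can also be evaluated naively with a single scan and one in-memory counter per (X, Y) group; this is the tuple-scan baseline the bitmap approach is compared against, not the proposed algorithm. The rows below mirror the sample table R, and the threshold is applied as COUNT(*) >= 3, matching the walkthrough.

```python
from collections import Counter

def naive_iceberg(rows, th):
    """SELECT X, Y, COUNT(*) FROM R GROUP BY X, Y HAVING COUNT(*) >= th,
    evaluated in one pass with a counter per (X, Y) group."""
    counts = Counter((x, y) for x, y in rows)
    return {group: c for group, c in counts.items() if c >= th}

rows = [("X2", "Y2"), ("X1", "Y3"), ("X2", "Y1"), ("X2", "Y2"),
        ("X1", "Y3"), ("X2", "Y1"), ("X2", "Y2"), ("X2", "Y1"),
        ("X1", "Y3"), ("X2", "Y2"), ("X3", "Y1"), ("X3", "Y1")]
print(naive_iceberg(rows, 3))
# {('X2', 'Y2'): 4, ('X1', 'Y3'): 3, ('X2', 'Y1'): 3}
```

Note that the group (X3, Y1) appears only twice and therefore falls below the threshold, just as X3 is pruned directly in the DPQ walkthrough below.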


The above query fetches the counts of attribute X and Y value pairs from the database table R under the specified threshold TH, using the bitmap indices BI-X and BI-Y. The bitmap vectors are entered into the respective priority queues, DQA and DQB, in the order of their density of 1 bits: X2, X1 and Y1, Y2, Y3. The vector X3 is not entered because its density is less than TH, so it is pruned directly. The top-positioned bitmap vectors X2: 101101110100 and Y1: 001001010011 are chosen from DQA and DQB respectively and checked for alignment. Since they are aligned, a bitwise-AND is conducted, giving the resulting bitmap vector R = 001001010000. The density of R is 3, which meets TH, so the pair (X2, Y1) is identified as an iceberg result and added to the iceberg result set along with its count. Both vectors are then sent to the computation block to compute their latest densities by subtracting the density of R from each: X2.count = X2.count - R.count = 4, and Y1.count = Y1.count - R.count = 2. The updated counts are compared with TH: X2 passes but Y1 does not, so Y1 is pruned. X2 is reinserted into DQA under its latest count after changing the aligned 1s to 0 by a bitwise-XOR between X2 and the resultant vector R. DQA is then restructured on the latest counts, so the queues become X2, X1 in DQA and Y2, Y3 in DQB. The same process continues until either priority queue becomes empty.
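The arithmetic of the walkthrough above can be checked directly with integer bit operations (the vectors are copied from the text):

```python
X2 = 0b101101110100   # density 7
Y1 = 0b001001010011   # density 5

r = X2 & Y1           # bitwise-AND of the aligned pair
print(f"{r:012b}")    # 001001010000 -> density 3, meets TH = 3

# Latest counts, as computed in the text:
x2_latest = bin(X2).count("1") - bin(r).count("1")   # 7 - 3 = 4 (kept)
y1_latest = bin(Y1).count("1") - bin(r).count("1")   # 5 - 3 = 2 (pruned)

# The surviving vector is reinserted after clearing the aligned 1s via XOR:
x2_rest = X2 ^ r
print(f"{x2_rest:012b}")   # 100100100100
```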

5. RESULTS
This section reports the results of the experiment described in the previous section, tabulated in Table 5.1. The table lists the execution times in seconds of the icebergPQ, icebergDB and icebergWD functions for iceberg thresholds ranging from 1000 to 9000: the first column contains the thresholds, and the remaining columns the corresponding execution times of the three functions.

Table 5.1: Execution times with different thresholds (THs)
Threshold (TH) | icebergPQ | icebergDB | icebergWD
1000           | 1.993     | 1.521     | 0.678
2000           | 0.585     | 1.234     | 1.678
3000           | 0.879     | 0.218     | 0.359
4000           | 1.346     | 0.359     | 0.839
5000           | 1.743     | 0.229     | 1.340
6000           | 1.354     | 0.345     | 1.567
7000           | 1.182     | 1.563     | 1.130
8000           | 1.271     | 0.489     | 0.769
9000           | 1.792     | 0.231     | 0.739

6. CONCLUSIONS
This paper presents an efficient DPQ strategy for processing an IB query using compacted BIs. The contribution of the approach is a speeded-up query evaluation (QE) process obtained by emptying the density queues; the QE process is thereby optimized. Several research directions remain for evaluating IB queries efficiently, such as removing redundant bits from the bitmap vectors, excluding ineffective bitwise-AND operations, and ordering the PQ by low density, initial high density or initial low density.

4. TESTING
This section describes the experimentation carried out on the implementation described in the previous section, with IB threshold values increasing from 1000 to 9000. First, the IB query selects the matching records on the X and Y aggregate attributes from the table R for a TH value in this range. The experiment then fires an IB query, with COUNT as the aggregation function, on a database table consisting of millions of tuples with the two attributes X and Y.

The first function, GenerateBitmaps, accepts all the rows as input; it produces one bitmap vector for each distinct value of an aggregate attribute and aligns the words in a compacted form. The compressed words of each bitmap vector are then given to the insertion sort, which arranges the vectors in order of high 1s counts in DQA and DQB, the density queues created through a special function DQ. The function FirstOneBitPosition determines the first 1-bit position in each vector as it is inserted into its respective density queue. These two density queues are given as input to the EfficientBitmapPruning function, which repeatedly calls the NextAlignedVector and FirstOneBitPosition functions from the main program (efficientIcebergQueryEvaluationWithDQs) until either density queue becomes empty. Each time, the NextAlignedVector function returns the two topmost aligned bitmap vectors from the density queues to the main program, and all records whose COUNT value exceeds the current threshold (from 1000 to 9000) are generated as output. The experiment was repeated for the different IB thresholds, keeping the same number of rows in the relational database table, and the results were noted with the density DQs.

REFERENCES
[1] Bin He, Hui-I Hsiao, Ziyang Liu, Yu Huang and Yi Chen, "Efficient Iceberg Query Evaluation Using Compressed Bitmap Index," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 9, Sept. 2011, pp. 1570-1589.
[2] D. E. Knuth, The Art of Computer Programming, Addison-Wesley Professional, second edition, ISBN 0-201-89684-2, 1973.
[3] G. Antoshenkov, "Byte-Aligned Bitmap Compression," Proceedings of the Conference on Data Compression, IEEE Computer Society, Washington, DC, USA, March 28-30, 1995, p. 476.
[4] H. Hsiao, Z. Liu, Y. Huang and Y. Chen, "Efficient Iceberg Query Evaluation Using Compressed Bitmap Index," in Knowledge and Data Engineering, IEEE, issue 99, 2011, p. 1.
[5] Jinuk Bae and Sukho Lee, "Partitioning Algorithms for the Computation of Average Iceberg Queries," Springer-Verlag, ISBN 3-540-67980-4, 2000, pp. 276-286.
[6] J. Bae and S. Lee, "Partitioning Algorithms for the Computation of Average Iceberg Queries," in DaWaK, 2000.

[7] K. P. Leela, P. M. Tolani and J. R. Haritsa, "On Incorporating Iceberg Queries in Query Processors," in DASFAA, 2004, pp. 431-442.
[8] K. Stockinger, J. Cieslewicz, K. Wu, D. Rotem and A. Shoshani, "Using Bitmap Index for Joint Queries on Structured and Text Data," Annals of Information Systems, 2009, pp. 123.
[9] K. Wu, E. J. Otoo and A. Shoshani, "Optimizing Bitmap Indices with Efficient Compression," ACM Transactions on Database Systems, 31(1):1-38, 2006.
[10] K. Wu et al., "On the Performance of Bitmap Indices for High Cardinality Attributes," VLDB, 2004, pp. 24-35.

Author Biographies

Mr. M. Laxmaiah is a Research Scholar at JNTUH, Kukatpally, Hyderabad. He is currently working as Professor & Head of the CSE Department at Tirumala Engineering College, Bogaram (V), Keesara (M), Hyderabad, A.P, India. He has 15 years of experience in education and 4 years of experience in research. He has 4 research publications in international journals. His areas of interest include Databases and Data Warehousing & Mining.

Mr. K. Sunil Kumar is a Research Scholar. He is currently working as an Associate Professor at Vaageswari College of Engineering, Karimnagar. He has 11 years of teaching experience and 6 international journal publications to his credit. His areas of interest are DWDM and Computer Networks.

Dr. A. Govardhan received his B.E. in Computer Science and Engineering from Osmania University College of Engineering, Hyderabad in 1992, his M.Tech from Jawaharlal Nehru University, Delhi in 1994, and his Ph.D. from Jawaharlal Nehru Technological University, Hyderabad in 2003. He is currently working as Professor in CSE and Director of Evaluation, JNTUH, Kukatpally, Hyderabad. He has guided more than 120 M.Tech projects and a number of MCA and B.Tech projects, and has 160 research publications in international and national journals and conferences. His areas of interest include Databases, Data Warehousing & Mining, Information Retrieval, Computer Networks and Image Processing.

Dr. C. Sunil Kumar received his B.E. in Computer Science and Engineering from the University of Madras, Vellore, India, in 1998, and his M.Tech in Computer Science and Engineering from SRM University, Chennai, India, in 2005. He received his doctorate in Computer Science and Engineering from JNT University, Hyderabad, India, in 2012. He is currently Professor & Head of CSE at Vaageswari College of Engineering, Karimnagar, JNT University, Hyderabad, India. He has guided more than 20 M.Tech projects and 40 B.Tech projects, and has 30 research publications in international and national journals and conferences. His research interests are Distributed Databases, Data Warehousing and Data Mining.

