
Choice of an Optimized Data Transmogrification Privacy Preserving Data Mining Technique

J. Indumathi, Department of Computer Science and Engineering, Anna University, Chennai 600 025, Tamilnadu, India. indu@cs.annauniv.edu
Dr. G. V. Uma, Department of Computer Science and Engineering, Anna University, Chennai 600 025, Tamilnadu, India. gvuma@annauniv.edu
Abstract

Investigating the various dazzling research problems that remain to be embarked upon for effectively managing and controlling the speedy accessibility of information using the latest technologies, we are fanned by the smoldering necessity, into a devouring flame, to protect privileged information while enabling its use for research and other purposes. The technological tempo in Privacy Preserving Data Mining (PPDM) has also enabled us to reach the take-off stage from which we can march into the 21st century with confidence and poise. The recent proliferation of PPDM techniques has hurried to put forward new techniques, and we endeavour to discover the best among the existing perturbation solutions. In this manuscript we propose a generic framework for generating Privacy Preserving Data Mining (PPDM) data. The scenario addressed is that of parties owning confidential databases who wish to run a data mining algorithm on the union of their databases without revealing any unnecessary information. We analyze the privacy breaches of the orthogonal projection-based data perturbation technique. Our work explores the possibility of using a mathematical random projection-based data perturbation technique as an extension to improve the privacy level. This technique is a value distortion approach, which is disparate from the probability distribution approach. This type of projection-based technique tries to improve the level of privacy protection while still preserving certain statistical characteristics of the data. With the motivation of increasing the data utility, and without compromising the simplicity of the perturbation process, we have proposed a new variant of the random projection-based perturbation technique, known as the sparse projection-based perturbation technique. This work also proposes the data mining techniques used to mine results from the projected matrix. Finally, we conclude by discussing the data utility, privacy, and performance issues of these mathematical projections. Among the three techniques, the user can choose the one which suits his need based on the trade-off between the data utility and the level of uncertainty.

Keywords: Data mining, Data Matrix, Dimensionality Reduction, Dissimilarity Matrix, Privacy-Preserving Data Mining, Random Projection, Orthogonal Projection, Rotations, Sparse Projection, Transmogrification.

1. Introduction
Knowledge discovery research has jumped onto the bandwagon speeding towards the 21st century by initiating a valuable venture for the information revolution. We are ushering in a high-technology era of the internet, which from an experimental setup has grown beyond measure, and we are flooded with opportunities to mine knowledge from large, voluminous data obtained from numerous diverse sites. The importance of privacy in data mining is nowhere better dramatized than in the many developing areas of knowledge discovery, and it is becoming particularly important in counter-terrorism and homeland-defense-related applications. Although health organizations are allowed to release data as long as the identifiers are removed, this is not considered safe enough, since re-identification attacks have emerged which can link different public data sets to re-locate the original subjects [1]. Unless we learn to harness well-designed techniques that pay careful attention to hiding privacy-sensitive information while preserving the inherent statistical dependencies, inefficiency and waste in applying technical discoveries will continue, which seriously hampers data mining applications. There are various techniques available for privacy preserving data mining; one classification of privacy preserving techniques is given below (Figure 1). The projection techniques fall under the value distortion approach within the perturbation techniques.



Figure 1. Classification of PPDM showing the transmogrification techniques

This paper considers an improved randomized multiplicative data perturbation technique for centralised and distributed privacy preserving data mining. It was inspired by the work presented elsewhere [2], which pointed out some of the problems of additive random perturbation, and by the work presented in [3], which put forward the multiplicative data perturbation technique. Specifically, this paper explores the possibility of using improved multiplicative random projection matrices for constructing a new representation of the data. The transformed data in the new representation is released to the data miner. By projecting the data onto a random subspace, we dramatically change its original form while preserving much of its underlying statistical characteristics. This paper enhances the random projection-based multiplicative perturbation technique in the context of computing the inner product matrix from distributed data, which is computationally equivalent to many problems such as computing Euclidean distances, correlations, angles, or even covariances between a set of vectors. These statistical aggregates play a critical role in many data mining techniques such as clustering, principal component analysis, and classification. This paper introduces a very sparse random projection-based data perturbation approach, which preserves the lengths of and distances between the original data vectors, as an extension to improve the data utility level. The random projection-based technique may be even more powerful when used with other geometric transformation techniques such as scaling, translation, and rotation. Combining these techniques with SMC-based techniques is a divergence in an orthogonal, interesting direction. In our belief, the field of privacy preserving data mining is still in its infancy; the techniques for proving correctness and quantifying privacy preserving capabilities are still under development. Although our analytical and experimental results look promising, we make cautious claims.

2. Related Work
Preserving data privacy is gaining momentum as an increasingly important issue in many data mining applications. Data privacy is often dictated by the characteristics of the domain, and there exists a mounting body of literature on this topic. In the remainder of this section, we provide a brief description of the various techniques and methodologies. Data perturbation approaches can be grouped into two main categories, viz. the probability distribution approach and the value distortion approach. The probability distribution approach replaces the data with another sample from the same (or an estimated) distribution [5] or with the distribution itself [6]. The value distortion approach perturbs the values of data elements or attributes directly, by additive noise, multiplicative noise, or some other randomization procedure. In this work, we mainly focus on the value distortion techniques. Agrawal et al. [7] proposed a value distortion technique that protects privacy by adding random noise from a Gaussian distribution to the actual data. They showed that this technique appears to mask the data while allowing extraction of certain patterns, such as the original data distribution and decision tree models, with good accuracy. This approach has been further extended in [8], where an Expectation-Maximization-based (EM) algorithm is applied for a better reconstruction of the distribution; an information-theoretic measure for quantifying privacy is also discussed there. Evfimievski et al. [9], [10] and Rizvi [11] considered the same approach in the context of association rule mining and suggested techniques for limiting privacy breaches. More recently, Kargupta et al. [2] questioned the use of random additive noise and pointed out that additive noise can be easily filtered out in many cases, which may lead to compromising the privacy. Given the large body of existing signal-processing literature on filtering random additive noise, the utility of random additive noise for privacy-preserving data mining is not quite clear. The possible drawbacks of additive noise make one wonder about the possibility of using multiplicative noise [12], [13] for protecting the privacy of the data while maintaining some of its original analytic properties. Two basic forms of multiplicative noise have been well studied in the statistics community. The first method is based on generating random numbers that follow a truncated Gaussian distribution with mean one and small variance, and multiplying each element of the original data by this noise. The second method is to take a logarithmic transformation of the data first (for positive data only), compute the covariance, generate random noise following a multivariate Gaussian distribution with mean zero and variance equal to a constant times the computed covariance, add this noise to each element of the transformed data, and finally take the antilog of the noise-added data.
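For concreteness, the following is a minimal NumPy sketch of these two classical multiplicative-noise schemes; the noise parameters sigma and c are illustrative choices, not values prescribed by the cited literature.

    import numpy as np

    rng = np.random.default_rng(0)

    def mult_noise(X, sigma=0.05):
        # Method 1: multiply each element by noise drawn from a Gaussian
        # with mean one and small variance, truncated to stay positive.
        noise = np.clip(rng.normal(1.0, sigma, size=X.shape), 1e-6, None)
        return X * noise

    def log_mult_noise(X, c=0.1):
        # Method 2 (positive data only): log-transform, add Gaussian noise
        # whose covariance is c times the covariance of the log data,
        # then take the antilog.
        logX = np.log(X)                          # rows are observations
        cov = np.cov(logX, rowvar=False)
        noise = rng.multivariate_normal(np.zeros(X.shape[1]), c * cov,
                                        size=X.shape[0])
        return np.exp(logX + noise)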


Multiplicative perturbation overcomes the scale problem, and it has been proved that the mean and variance/covariance of the original data elements can be estimated from the perturbed version. In practice, the first method is good if the data disseminator only wants to make minor changes to the original data; the second method assures higher security than the first and still maintains the data utility very well on the log scale. One of the main problems of traditional additive and multiplicative perturbation is that they perturb each data element independently; therefore, the similarity between attributes or observations, which are considered as vectors in the original data space, is not well preserved, and many distance/similarity-based data mining applications are thus impaired. In this paper, we propose an alternative approach to perturbing data using multiplicative noise. Instead of applying noise to each element of the data, we make use of random projection matrices for constructing a perturbed representation of the data. This technique not only protects the confidentiality but also preserves certain statistical properties of the data, e.g., the inner product, the angles, the correlation, and the Euclidean distance. The very sparse random projection-based perturbation technique is presented next as an extension to improve the data utility level. The random projection-based technique may be even more powerful when used with other geometric transformation techniques such as scaling, translation, and rotation, but that is still under study. Moreover, this kind of perturbation also allows dimensionality reduction, which is well suited for data mining problems where multiple parties want to collaboratively conduct computation on the union of their private data with as little communication as possible.

3. Problem Description

3.1. Problem Statement

The goal is to further the exercise of selecting the flower from the flock (i.e., the best perturbation technique among the PPDM techniques) inside a controlled, accord-based procedure, utilizing to the maximum extent possible the information, know-how, and proficiency of privacy engineering.
Figure 2. Modus Operandi for Selecting the Best PPDM Technique for an Application (High-Level Diagram)

3.2. Problem Description


In this paper we consider a randomized multiplicative data perturbation technique for centralised and distributed privacy preserving data mining. Specifically, we explore the possibility of using improved multiplicative random projection matrices for constructing a new representation of the data; the transformed data is this new representation, which is released to the data miner. By projecting the data onto a random subspace, we dramatically change its original form while preserving much of its underlying statistical characteristics. We study the random projection-based multiplicative perturbation technique in the context of computing the inner product matrix from distributed data, which is computationally equivalent to many problems such as computing Euclidean distances, correlations, angles, or even covariances between a set of vectors. These statistical aggregates play a critical role in many data mining techniques such as clustering, principal component analysis, and classification. We also introduce a very sparse random transformation-based data perturbation approach as an extension to improve the data utility level.
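As a concrete illustration, the following is a minimal NumPy sketch of the projection step in the form U = (1 / (sqrt(k) * sigma)) * R * X used later in Section 6; the matrix sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(42)
    m, n, k = 100, 8, 50                  # original dimension, records, reduced dimension
    X = rng.random((m, n))                # data matrix; columns are records
    sigma = 1.0                           # std. deviation of the entries of R
    R = rng.normal(0.0, sigma, size=(k, m))

    U = R @ X / (np.sqrt(k) * sigma)      # perturbed, dimension-reduced data

    # Inner products between records (and hence Euclidean distances and
    # angles) are preserved in expectation: U'U approximates X'X.
    print(np.linalg.norm(U.T @ U - X.T @ X) / np.linalg.norm(X.T @ X))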

4. Architecture Of The Proposed Work


We present a diagrammatic schematic representation of the blocks involved in the proposed architecture, as shown in Figures 2 and 3.

4.1. Block Diagram


4.1.1. Calculating the Data Matrix
In this block, data sets are given as input and are converted in the course of the process into data matrices, which are delivered at the output. The data sets must be in ARFF (Attribute-Relation File Format), and all attributes are real-valued. From these details we generate the data matrix at each site.
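As a sketch of this block, the ARFF file could be read as follows, assuming SciPy is available and, as required above, all attributes are numeric; the file name is hypothetical.

    import numpy as np
    from scipy.io import arff

    def load_data_matrix(path):
        # Read an ARFF file and return its instances as an M x N
        # real-valued data matrix (one row per instance).
        data, meta = arff.loadarff(path)
        return np.array(data.tolist(), dtype=float)

    # DM = load_data_matrix("site1.arff")   # hypothetical input file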

Partition Support: Two types of partition are dealt with: horizontal and vertical. Partitioning is supported by our project. The reason for this support is that, in the case of third-party support for a data mining problem, the users do not know about the partitions; they only know about their own data. The partition support helps them partition their data so that they can identify the perturbed matrix. The data structure used in this module is an M x N two-dimensional array for the data matrix.

4.1.2. Calculating the Random Matrix (R)
Here, based on the partitioning of the data sets, a row-wise or column-wise projection matrix is generated. Random entries are used to create the random matrix. The dimensions of the random matrix are decided based on the data matrix and the partition type (viz., horizontal or vertical). We have to select the optimal dimensions for the random matrix (based on the data matrix), or else, instead of improving the data utility, it will decrease the privacy level and the performance. We should also give the dimensions of the data matrix. The random matrix R can be any one among, viz., an orthogonal matrix, a random projection matrix with entries from a normal (Gaussian) distribution, or a sparse random projection matrix; the desired random matrix is selected based on the wishes of the user. Since the third party is not aware of the type of projection, identification of the individual elements of the data matrix is not possible, thus augmenting the privacy level. The data structure used in this module is a K x M or N x K two-dimensional array for the random matrix R, where K < M or K < N, respectively.
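A small sketch of how the shape of R could follow from the partition type; which partition maps to which orientation is our assumption, since the text above leaves it open.

    import numpy as np

    def make_random_matrix(m, n, k, partition, rng=np.random.default_rng(0)):
        # Row-wise projection uses a K x M matrix (K < M); column-wise
        # projection uses an N x K matrix (K < N), matching the data
        # structures described above. The mapping from partition type
        # to orientation is an assumption for illustration.
        if partition == "horizontal":
            return rng.normal(0.0, 1.0, size=(k, m))   # K x M, K < M
        return rng.normal(0.0, 1.0, size=(n, k))       # N x K, K < N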

Figure 3. Block Diagram of the proposed architecture

4.1.3. Calculating the Perturbed Data Matrix (PDM)

4.1.3.1. Orthogonal Transformation
This section presents a deterministic multiplicative perturbation method using random orthogonal matrices in the context of computing the inner product matrix. An orthogonal transformation is a linear transformation R : IR^n -> IR^n which preserves the lengths of vectors as well as the angles between them. Usually, orthogonal transformations correspond to, and may be represented using, orthogonal matrices. Since only the transformed data is released, there are actually an infinite number of inputs and transformation procedures that can simulate the output, while the observer has no idea of the real form of the original data. Therefore, random orthogonal transformation seems to be a good way to protect the data's privacy while preserving its utility. However, from the geometric point of view, an orthogonal transformation is either a pure rotation (when the determinant of the orthogonal matrix is 1) or a rotoinversion, i.e., a rotation followed by a flip (when the determinant is -1). Therefore, it is possible to re-identify the original data through a proper rotation, and it is possible to estimate their original forms quite accurately using Independent Component Analysis (ICA). The matrices U, V of the above discussion are the perturbed matrices in orthogonal projection.

4.1.3.2. Random Projection
The basic idea of random projection is based on the Johnson-Lindenstrauss lemma [14]. This lemma shows that any set of s points in m-dimensional Euclidean space can be embedded into an O(log s / eps^2)-dimensional space such that the pairwise distance between any two points is maintained within an arbitrarily small factor (1 +/- eps). This beautiful property implies that it is possible to change the data's original form by reducing its dimensionality while still maintaining its statistical characteristics; moreover, the inner product is directly related to many other distance-related metrics. U, V are the random perturbed data matrices.

4.1.3.3. Sparse Projection
For sparse projection, Achlioptas [15] proposed using a projection matrix R with i.i.d. (independent and identically distributed) entries

    r_ij = sqrt(s) * { +1 with probability 1/(2s); 0 with probability 1 - 1/s; -1 with probability 1/(2s) },

where Achlioptas used s = 1 or s = 3. With s = 3, one can achieve a threefold speedup, because only 1/3 of the data need to be processed (hence the name sparse random projections). Since the multiplications with sqrt(s) can be delayed, no floating-point arithmetic is needed and all computation amounts to highly optimized database aggregation operations. From this distribution the elements r_ij of the sparse random matrix are generated, the data matrix is multiplied by the sparse random matrix, and the perturbed data matrix is calculated.
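A sketch of drawing R from this distribution and forming the perturbed data matrix; the dimensions and the 1/sqrt(k) scaling are illustrative, following the projection formula used later.

    import numpy as np

    def sparse_random_matrix(k, m, s=3.0, rng=np.random.default_rng(0)):
        # Entries are +sqrt(s) with probability 1/(2s), 0 with probability
        # 1 - 1/s, and -sqrt(s) with probability 1/(2s) (Achlioptas).
        signs = rng.choice([1.0, 0.0, -1.0], size=(k, m),
                           p=[1/(2*s), 1 - 1/s, 1/(2*s)])
        return np.sqrt(s) * signs

    # Example: perturb an M x N data matrix DM with K = 50.
    DM = np.random.default_rng(1).random((100, 8))
    R = sparse_random_matrix(50, 100)
    PDM = R @ DM / np.sqrt(50)            # perturbed data matrix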


4.2. Finding the Global Matrix


Based on the horizontal or vertical partitioning, the Global Matrix (GM) is found. The third party gets the perturbed matrix from each site; these matrices are the input for this module. The third party builds the global matrix based on the partitions, which is the main output of our project. From the global matrix, a global ARFF file is produced for the purpose of data mining. From this global perturbed matrix (GPM) the third party cannot find the original data matrix, and thus we have successfully handled the problem of privacy in data mining.

4.3. Data Mining


This module has two sub-modules, namely, finding the inner product / Euclidean distance from heterogeneously distributed data, and implementing K-Means clustering.

5. Steps And Algorithms


The data sets are in the form of ARFF files. The mean and variance are inputs for random projection.

STEPS:
1. Calculating the Data Matrix
2. Calculating the Random Matrix (R)
3. Calculating the Perturbed Data Matrix (PDM)
   (a) Orthogonal Transformation
   (b) Random Projection
       (i) Row-Wise Projection
       (ii) Column-Wise Projection
   (c) Sparse Projection
4. Finding the Global Matrix and Data Mining

1. Algorithm for Calculating the Data Matrix

Calculate Data Matrix ()
Input: The data set in ARFF format.
Output: The data matrix DM.
{
  While (not end of file)
  {
    DM[I][J] = Read(Value)
    If (end of line)
    {
      I = I + 1
      J = 0
    }
    else
      J = J + 1
  }
}

After calculating the data matrix, it is converted into a column-major-order representation for simplicity of access.

2. Algorithm for Calculating the Random Matrix

Calculate Orthogonal Matrix ()
Input: A seed to generate random numbers.
Output: The orthogonal matrix, in column-major order.
{
  A = Identity Matrix (N)
  For I = 1 to N-1
  {
    Generate the elements of the I-th column,
    Compute the Householder matrix H(I) that annihilates the sub-diagonal elements,
    Set A = A * H(I)' = A * H(I)
  }
}

In the above algorithm, an N x N identity matrix is first computed and assigned to A. Then the N elements of the first column are generated, and from them the Householder matrix H(1) is computed; setting A = A * H(1)' = A * H(1) then takes place. In the next step, the lower N-1 elements of the second column are generated, the Householder matrix H(2) that annihilates them is computed, and A = A * H(2)' = A * H(2) = H(1) * H(2) is set. Similarly, on step N-1, the lower 2 elements of column N-1 are generated and the Householder matrix H(N-1) that annihilates them is computed. Finally, A is set as

  A = A * H(N-1)' = A * H(N-1) = H(1) * H(2) * ... * H(N-1).

Here, H(v) = I - 2 * v * v' / (v' * v), where v is a vector of unit L2 norm which defines an orthogonal Householder premultiplier matrix H with the property that the K-th column of H*A is zero below the diagonal. For random projection and sparse projection, we simply generate random elements from an independent and identically distributed (i.i.d.) distribution and populate the N x N random matrix.
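A NumPy sketch of the construction in Algorithm 2. For brevity, each reflection vector v is drawn at random over the trailing coordinates rather than computed to annihilate a specific column, so this is a simplified variant of the Householder accumulation described above.

    import numpy as np

    def random_orthogonal(n, rng=np.random.default_rng(0)):
        # Accumulate A = H(1) * H(2) * ... * H(n-1), where each
        # H(v) = I - 2*v*v'/(v'*v) is an orthogonal Householder reflection.
        A = np.eye(n)
        for i in range(n - 1):
            v = np.zeros(n)
            v[i:] = rng.normal(size=n - i)     # nonzero only from row i down
            H = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)
            A = A @ H
        return A

    Q = random_orthogonal(5)
    print(np.allclose(Q.T @ Q, np.eye(5)))     # True: Q is orthogonal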


3. Algorithm for Calculating the Perturbed Matrix

Calculate Perturbed Matrix ()
Input: The data matrix (N2 x N3) and the random matrix (N1 x N2).
Output: The perturbed matrix.
{
  for I = 0 to N1-1
  {
    for J = 0 to N3-1
    {
      C[I + J*N1] = 0.0
      for K = 0 to N2-1
      {
        C[I + J*N1] = C[I + J*N1] + A[I + K*N1] * B[K + J*N2]
      }
    }
  }
}

The above is the matrix multiplication algorithm. The data matrix and the random matrix are the inputs to this module, both in column-major order; the output perturbed matrix is also in column-major order.

4. Algorithm for Finding the Global Matrix and Data Mining

Find Global Matrix ()
Input: The perturbed matrices U, V and the type of partition used (horizontal or vertical).
Output: The global matrix.
{
  If (type of partition = horizontal) then
  {
    Calculate (U ; V) (i.e., stack U over V)
  }
  Else
  {
    Calculate (U V) (i.e., place U and V side by side)
  }
  Assign the result to GM
  Do K-Means clustering on GM
}


In the above algorithm, based on the data partition we find the GM (global matrix); the K-Means clustering algorithm is then run on the GM.
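A sketch of this step using scikit-learn for the clustering; the stacking convention for the horizontal case is our reading of the pseudocode above.

    import numpy as np
    from sklearn.cluster import KMeans

    def find_global_matrix(U, V, partition):
        # Horizontal partition: the perturbed blocks are stacked;
        # vertical partition: they are placed side by side.
        return np.vstack([U, V]) if partition == "horizontal" else np.hstack([U, V])

    rng = np.random.default_rng(0)
    GM = find_global_matrix(rng.random((5, 30)), rng.random((5, 20)), "vertical")
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(GM.T)  # records are columns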


6. Implementation

The problem has been implemented using the following integrated algorithms.

INTEGRATED ALGORITHM 1 - Finding the Inner Product / Euclidean Distance from Heterogeneously Distributed Data.
Problem: Let X be an m x n1 data matrix owned by A and Y be an m x n2 matrix owned by B. Compute the column-wise inner product and Euclidean distance matrices of the data (X : Y) without directly accessing it.
Inputs: The data matrix from each data owner; a seed for the random number generation; the mean and variance of the distribution (if random projection is used).
Algorithm:
1. A and B cooperatively generate a secret random seed and use this seed to generate a k x m random matrix R.
2. A and B project their data onto IR^k using R and release the perturbed versions
   U = (1 / (sqrt(k) * sigma)) * R * X
   V = (1 / (sqrt(k) * sigma)) * R * Y
3. The third party computes the inner product matrix from the perturbed data U and V, obtaining U'V with E(U'V) = X'Y.

INTEGRATED ALGORITHM 2 - Projection-Based Privacy-Preserving K-Means Clustering.
Problem: Let X be an m x n1 data matrix owned by A and Y be an m x n2 matrix owned by B. Compute the column-wise inner product and Euclidean distance matrices of the data without directly accessing the raw data.
Inputs: The data matrix from each data owner; a seed for the random number generation; the mean and variance of the distribution (if random projection is used).
Algorithm:
1. A and B cooperatively generate a secret random seed and use this seed to generate a k x m random matrix R.
2. A and B project their data onto IR^k using R and release the perturbed versions
   U = (1 / (sqrt(k) * sigma)) * R * X
   V = (1 / (sqrt(k) * sigma)) * R * Y
3. The third party performs K-Means clustering over the perturbed data set.
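The following NumPy sketch walks through Integrated Algorithm 1 end to end: the shared secret seed lets A and B regenerate the identical R independently, and the third party estimates X'Y from U and V alone. All sizes and the seed are illustrative.

    import numpy as np

    m, n1, n2, k, sigma, shared_seed = 60, 4, 5, 40, 1.0, 1234

    X = np.random.default_rng(1).random((m, n1))    # A's private data
    Y = np.random.default_rng(2).random((m, n2))    # B's private data

    def perturb(data):
        # Each party rebuilds the same k x m matrix R from the shared
        # seed and releases only the projected data.
        R = np.random.default_rng(shared_seed).normal(0.0, sigma, size=(k, m))
        return R @ data / (np.sqrt(k) * sigma)

    U, V = perturb(X), perturb(Y)
    print(U.T @ V)       # the third party's estimate of X'Y
    print(X.T @ Y)       # ground truth (never revealed to the third party)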


7. Results and Analysis


The privacy factor is the amount of effort spent (e.g., in PCA) to reconstruct the original data from the perturbed data, or the number of reconstructions needed to reconstruct the original data. Data utility is the percentage of similarity between the results mined from the original data and from the perturbed data. Based on these factors we analyze the mathematical projections that we have implemented.

Orthogonal Projection: In orthogonal projection, data utility depends entirely on the orthogonal matrix. For example, if the data is an M x N matrix and we select an N x N orthogonal matrix, the orthogonally projected matrix will have the same dimensions, M x N. The inner product values are therefore not affected much, because there is very little projection, providing high data utility. Since the number of reconstructions required using PCA (Principal Component Analysis) is very small, the privacy is lower compared to random projection.

Random Projection: In random projection, data utility depends entirely on the random matrix and its dimensions. For example, if the data is an M x N matrix and we select a K x M (K < M) random matrix, the randomly projected matrix will have dimensions K x N. If the K value is very small, the data matrix is projected heavily, which decreases the data utility; at the same time, decreasing K increases the privacy factor, since the number of reconstructions needed grows. So there must be some balance in selecting the K value. Compared with orthogonal projection, random projection has lower data utility and a higher privacy factor.

Sparse Projection: This is also a type of random projection. Here, data utility depends entirely on the dimensions of the sparse matrix, because the elements of the sparse matrix are 0, +1, or -1 only. So an element of the data matrix is either present (+1), not present (0), or rotated (-1) in the projected matrix. Sparse projection therefore has higher data utility than random projection but a lower privacy factor.

Running Time Analysis: the importance of the K value in the privacy preserving clustering algorithm. Let X and Y be two data sets owned by A and B, respectively, where X is an m x n1 matrix and Y is an m x n2 matrix. Let R be a k x m (k < m) random matrix such that each entry r_ij of R is independently and identically chosen from some unknown distribution with mean zero and variance sigma^2. Further, let

  U = (1 / (sqrt(k) * sigma)) * R * X,
  V = (1 / (sqrt(k) * sigma)) * R * Y;  then  E(U'V) = X'Y.

The above definition shows the row-wise projection. The resulting perturbed matrix U has dimensions k x n1. Here k < m, so there is dimensionality reduction; that is why this technique is called a projection (i.e., from the m-dimensional space to a k-dimensional space). We cannot simply pick any K value: if K is very small, the data utility decreases, but because of the dimensionality reduction the running time of the data mining algorithm becomes very small. Roughly,

  Data utility ∝ K,   (1)
  Time consumed to run the k-means clustering algorithm ∝ K.   (2)
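These two relations can be checked empirically with a short sketch: as the reduced dimension K grows, the relative error of the estimated inner products (a proxy for data utility) shrinks, while the amount of projected data handled by k-means grows.

    import numpy as np

    rng = np.random.default_rng(7)
    m, n = 500, 10
    X = rng.random((m, n))

    for k in (10, 50, 100, 250, 500):
        R = rng.normal(0.0, 1.0, size=(k, m))
        U = R @ X / np.sqrt(k)
        err = np.linalg.norm(U.T @ U - X.T @ X) / np.linalg.norm(X.T @ X)
        print(k, round(err, 3))   # error falls (utility rises) as K increases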

Graph 6.1

By keeping the K value constant at 90 and increasing the M value (100, 200, 300, etc.), we obtain Graph 6.1, which supports equation (2). As for data utility, it decreases with constant K and increasing M, because heavier projection yields lower data utility; this is shown in Graph 6.2.

Graph 6.2

When, along with the increasing M value (100, 200, 300, etc.), we also increase the K value (90, 190, 290, etc.), we obtain high data utility, as shown in Graph 6.3.


Graph 6.3

But because of the very slight projection, the running time remains the same as for other data mining algorithms (i.e., without projection). This is shown in Graph 6.4, and again supports equation (1).

Graph 6.4

Summarizing the main points: if data utility is the primary concern in a domain, then our rating is, in order, orthogonal projection, followed by sparse projection, and lastly random projection. If privacy is our Spartan concern, we select random projection as the first choice, followed by orthogonal projection, and finally sparse projection.

8. Conclusion
This paper introduces an orthogonal transformation-based data perturbation approach which preserves the lengths of and distances between the original data vectors, thereby keeping data utility at a maximum. The random projection-based perturbation technique is presented next as an extension to improve the privacy level. With the motivation of increasing the data utility, and without compromising the simplicity of the perturbation process, we have proposed a new variant of the random projection-based perturbation technique, known as the sparse projection-based perturbation technique. The three techniques we have proposed are thus the orthogonal transformation-based data perturbation approach, the random projection-based perturbation technique, and the sparse projection-based perturbation technique. Among the three, the user can choose the one which suits his need based on the trade-off between data utility and privacy.

9. Future Work
Looking ahead, anticipating eventualities, preparing for contingencies, and providing an orderly sequence for achieving its objectives, the random projection-based technique may be even more dominant when used with other geometric transformation techniques such as scaling, translation, and rotation. Amalgamating it with SMC-based techniques to develop a new hybrid technique is growth in another interesting direction. Extending the above algorithms to alphanumeric and categorical attributes is another proposed piece of future work. A primary direction for future work is motivated by the fact that there is still no good benchmark for quantifying privacy and data utility.

10. References
[1] L. Sweeney, "k-anonymity: a model for protecting privacy," International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, vol. 10, no. 5, pp. 557-570, 2002. [Online]. Available: http://privacy.cs.cmu.edu/people/sweeney/kanonymity.html
[2] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, "On the privacy preserving properties of random data perturbation techniques," in Proceedings of the IEEE International Conference on Data Mining, Melbourne, FL, November 2003.
[3] K. Liu, H. Kargupta, and J. Ryan, "Random projection-based multiplicative data perturbation for privacy preserving distributed data mining," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 92-106, January 2006.
[4] C. K. Liew, U. J. Choi, and C. J. Liew, "A data distortion by probability distribution," ACM Transactions on Database Systems (TODS), vol. 10, no. 3, pp. 395-411, 1985. [Online]. Available: http://portal.acm.org/citation.cfm?id=4017
[5] E. Lefons, A. Silvestri, and F. Tangorra, "An analytic approach to statistical databases," in Proceedings of the 9th International Conference on Very Large Data Bases, Florence, Italy: Morgan Kaufmann Publishers Inc., November 1983, pp. 260-274. [Online]. Available: http://portal.acm.org/citation.cfm?id=673617
[6] R. Agrawal and R. Srikant, "Privacy preserving data mining," in Proceedings of the ACM SIGMOD Conference on Management of Data, Dallas, TX, May 2000, pp. 439-450.
[7] D. Agrawal and C. C. Aggarwal, "On the design and quantification of privacy preserving data mining algorithms," in Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, 2001, pp. 247-255. [Online]. Available: http://portal.acm.org/citation.cfm?id=375602
[8] A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke, "Privacy preserving mining of association rules," in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02), July 2002.
[9] A. Evfimievski, J. Gehrke, and R. Srikant, "Limiting privacy breaches in privacy preserving data mining," in Proceedings of the ACM SIGMOD/PODS Conference, San Diego, CA, June 2003.
[10] S. J. Rizvi and J. R. Haritsa, "Maintaining data privacy in association rule mining," in Proceedings of the 28th VLDB Conference, Hong Kong, China, August 2002.
[11] J. J. Kim and W. E. Winkler, "Multiplicative noise for masking continuous data," Statistical Research Division, U.S. Bureau of the Census, Washington, D.C., Tech. Rep. Statistics #2003-01, April 2003.
[12] K. Muralidhar, D. Batrah, and P. J. Kirs, "Accessibility, security, and accuracy in statistical databases: The case for the multiplicative fixed data perturbation approach," Management Science, vol. 41, no. 9, pp. 1549-1584, 1995.
[13] W. B. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," Contemporary Mathematics, vol. 26, pp. 189-206, 1984.
[14] P. Li, T. J. Hastie, and K. W. Church, "Very sparse random projections," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06), August 2006.
[15] D. Achlioptas, "Database-friendly random projections," in Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'01), Santa Barbara, CA, 2001, pp. 274-281.

Authors

Indumathi J. received her M.E. from Anna University, Chennai, India in 1992 and her M.B.A. from Madurai Kamaraj University, Madurai, India in 1994. She is working at Anna University as a Senior Lecturer and is currently pursuing her Ph.D. at Anna University, Chennai. Her fields of interest span, but are not limited to, Computer Science and Engineering and Financial Management. Her research interests include security for data mining, databases, networks and computers, software engineering, software testing, project management, biomedical engineering, genetic privacy, and ontology.

G. V. Uma, a polymath and winner of the Young Scientist Award, received her M.E. from Bharathidasan University, India in 1995 and her Ph.D. from Anna University, Chennai, India in 2002. She is working at Anna University as an Assistant Professor. Her research interests include software engineering, genetic privacy, ontology, knowledge engineering and management, and natural language processing.

