Presented by
P. Sai Kiran, J. Veeraiah Chowdary, P. Sudheer Kumar, T. Bhargav
Guided By Mr.M.M.M.Durga
ABSTRACT
Email mining is the process of discovering useful patterns in emails. Clustering techniques can be applied to email data to create groups of similar emails, which requires measuring the similarity between pairs of email objects. Standard clustering distance measures may not be a good choice for measuring the distance between two email objects accurately. A data mining model based on weighted email attribute similarity is proposed for email clustering to discover email groups. User-defined weights are assigned to the similarity measured between each pair of email attributes, and the weighted attribute similarities are combined to calculate the similarity between pairs of emails.
INTRODUCTION
Email has emerged as one of the most effective and popular means of communication today. It is becoming the dominant form of inter- and intra-organizational written communication for many companies and government departments. Emails are now an essential part of life, much like mobile phones.
CLUSTERING ALGORITHMS
The most widely used clustering algorithm for textual data is the K-Means algorithm. To group a set of points into K clusters, K-Means works in 4 basic steps:
1. Randomly choose K instances from the dataset and use them as the initial cluster centers.
2. Assign the remaining instances to their closest cluster center.
3. Compute a new center for each cluster.
4. If the new cluster centers are identical to the previous ones, the algorithm stops. Otherwise, repeat steps 2-4.
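The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the function names `k_means` and `squared_distance`, the use of squared Euclidean distance, the `seed` and `max_iter` parameters, and the rule of keeping the old center for an empty cluster are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick K instances at random as the initial cluster centers.
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every instance to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[idx].append(p)
        # Step 3: compute a new center (the mean) for each cluster.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ))
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[i])
        # Step 4: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, clusters = k_means(data, k=2)
    print(centers)
    print(clusters)
```

For emails, each point would be a numeric feature vector derived from the message, for example term counts. The sketch leaves that representation open.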
EXISTING APPROACHES
Existing solutions include the following: Automatic foldering is a more sophisticated approach based on filters that match each message against existing mail folders. Conversation view is an improved variation on the threaded view approach, introduced in Google's Gmail service.
SIMILARITY MEASURES
1. Dice Similarity
2. Cosine Similarity
3. TF-IDF Similarity
4. Jaccard Similarity
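Jaccard Sim(X, Y) = (X · Y) / (|X|^2 + |Y|^2 - X · Y), where X and Y are the term vectors of the two emails and X · Y is their dot product. For binary term vectors this reduces to the size of the intersection divided by the size of the union of the two term sets.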
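As a rough illustration of three of the listed measures, the sketch below computes Dice, Jaccard, and cosine similarity between two short texts. The whitespace tokenizer, the function names, and the convention of returning 0.0 for empty input are assumptions made for the example. TF-IDF weighting is left out to keep it short.

```python
import math
from collections import Counter

def tokens(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def dice_similarity(a, b):
    """Dice coefficient on term sets: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = set(tokens(a)), set(tokens(b))
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard_similarity(a, b):
    """Jaccard coefficient on term sets: |X ∩ Y| / |X ∪ Y|."""
    x, y = set(tokens(a)), set(tokens(b))
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors."""
    x, y = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(x[t] * y[t] for t in x)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

if __name__ == "__main__":
    s1 = "project meeting today"
    s2 = "project meeting tomorrow"
    print(dice_similarity(s1, s2))
    print(jaccard_similarity(s1, s2))
    print(cosine_similarity(s1, s2))
```

All three functions return values between 0 and 1, so they can be compared or combined directly.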
PROPOSED MODEL
The overall similarity between a pair of emails is represented by SimEmail, which is the weighted sum of the similarities between their individual attributes.
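A weighted sum of this kind might look like the sketch below. The attribute names (`subject`, `body`, `sender`), the weight values, and the function name `sim_email` are hypothetical, since the slides do not list them. Choosing weights that sum to 1 is an assumption that keeps the result in the range 0 to 1 when each attribute similarity is in that range.

```python
def sim_email(attribute_sims, weights):
    """Weighted sum of per-attribute similarities for one pair of emails.

    attribute_sims: dict mapping attribute name -> similarity in [0, 1]
    weights: dict mapping attribute name -> user-defined weight
    """
    missing = set(weights) - set(attribute_sims)
    if missing:
        raise ValueError(f"no similarity given for: {sorted(missing)}")
    return sum(weights[name] * attribute_sims[name] for name in weights)

if __name__ == "__main__":
    # Hypothetical per-attribute similarities for one pair of emails.
    sims = {"subject": 0.8, "body": 0.5, "sender": 1.0}
    # Hypothetical user-defined weights, chosen here to sum to 1.
    weights = {"subject": 0.3, "body": 0.5, "sender": 0.2}
    print(sim_email(sims, weights))
```

Each per-attribute similarity could come from any suitable measure, such as Dice, cosine, or Jaccard on the corresponding text fields.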
EXPERIMENTAL ANALYSIS
CONCLUSION
This technique takes into account the distance between all of the attributes of an email. Future work could extend the approach to other email mining operations, such as thread summarization, automatic answering of emails, and email classification, using all of the email attributes to achieve more accurate results.