Willkommen bei Scribd!

Abstract

Hochgeladen von

0% fanden dieses Dokument nützlich (0 Abstimmungen)

22 Ansichten2 Seiten

In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters, based on similarity test. When all the words have been fed in, a desired number of clusters are formed automatically. The extracted feature is a weighted combination of the words contained in the cluster.

Originalbeschreibung:

Copyright

Verfügbare Formate

DOC, PDF, TXT oder online auf Scribd lesen

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Dieses Dokument melden

Copyright:

Attribution Non-Commercial (BY-NC)

Verfügbare Formate

Als DOC, PDF, TXT herunterladen oder online auf Scribd lesen

Markieren Sie unangemessene Inhalte

0% fanden dieses Dokument nützlich (0 Abstimmungen)

22 Ansichten2 Seiten

Abstract

Hochgeladen von

Shirisha Reddy

Copyright:

Attribution Non-Commercial (BY-NC)

Verfügbare Formate

Als DOC, PDF, TXT herunterladen oder online auf Scribd lesen

Markieren Sie unangemessene Inhalte

Zu Seite

Sie sind auf Seite 1von 2

Im Dokument suchen

A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification

ABSTRACT AbstractFeature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper,we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters, based on similarity test. Words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with statistical mean and deviation. When all the words have been fed in, a desired number of clusters are formed automatically. We then have one extracted feature for each cluster. The extracted feature, corresponding to a cluster, is a weighted combination of the words contained in the cluster. By this algorithm, the derived membership functions match closely with and describe properly the real distribution of the training data. Besides, the user need not specify the number of extracted features in advance, and trial-and-error for determining the appropriate number of extracted features can then be avoided. Experimental results show that our method can run faster and obtain better extracted features than other methods. Index TermsFuzzy similarity, feature clustering, feature extraction, feature reduction, text classification.

Existing system
IN text classification, the dimensionality of the feature vector is usually huge. For example, 20 Newsgroups and Reuters21578 top-10, which are two real-world data sets, both have more than 15,000 features. Such high dimensionality can be a severe obstacle for classification algorithms . To alleviate this difficulty, feature reduction approaches are applied before document classification tasks are performed .Two major approaches, feature selection and feature extraction have been proposed for feature reduction. In general, feature extraction approaches are more effective than feature selection techniques, but are more computationally expensive .Therefore, developing scalable and efficient feature extraction algorithms is highly demanded for dealing with high-dimensional document data sets.

Proposed system: Our clustering algorithm is an incremental, self-constructing learning approach. Word patterns are considered one by one. The user does not need to have any idea about

the number of clusters in advance. No clusters exist at the beginning, and clusters can be created if necessary. For each word pattern, the similarity of this word pattern to each existing cluster is calculated to decide whether it is combined into an existing cluster or a new cluster is created. Once a new cluster is created, the corresponding membership function should be initialized. On the contrary, when the word pattern is combined into an existing cluster, the membership function of that cluster should be updated accordingly.

SOFTWARE REQUIREMENTS: Operating System : WINDOWS XP Front End Back End : JDK1.6 :ORACLE 10G

HARDWARE REQUIREMENTS: PROCESSOR RAM HARD DISK

: Pentium 4 : 512MB : 40GB

Das könnte Ihnen auch gefallen

HTML Forms Built On User Trait Detection
Dokument16 Seiten
HTML Forms Built On User Trait Detection
saikiran
Noch keine Bewertungen
Feature Subset Selection With Fast Algorithm Implementation
Dokument5 Seiten
Feature Subset Selection With Fast Algorithm Implementation
seventhsensegroup
Noch keine Bewertungen
Influential Vocabulary Detection
Dokument15 Seiten
Influential Vocabulary Detection
api-263491997
Noch keine Bewertungen
Methods For Selecting and Improving Software Clustering Algorithms
Dokument5 Seiten
Methods For Selecting and Improving Software Clustering Algorithms
princegirish
Noch keine Bewertungen
Clustering in Machine Learning - Javatpoint
Dokument10 Seiten
Clustering in Machine Learning - Javatpoint
mangotwin22
Noch keine Bewertungen
Automatic Text Classification and Focused Crawling
Dokument5 Seiten
Automatic Text Classification and Focused Crawling
International Journal of Application or Innovation in Engineering & Management
Noch keine Bewertungen
Dimensionality Reduction
Dokument4 Seiten
Dimensionality Reduction
PAWAN TIWARI
Noch keine Bewertungen
Online Handwritten Cursive Word Recognition
Dokument40 Seiten
Online Handwritten Cursive Word Recognition
kousalya
Noch keine Bewertungen
An Improved Fast Clustering Method For Feature Subset Selection On High-Dimensional Data Clustering
Dokument5 Seiten
An Improved Fast Clustering Method For Feature Subset Selection On High-Dimensional Data Clustering
International Journal of Application or Innovation in Engineering & Management
Noch keine Bewertungen
Decision Trees. These Models Use Observations About Certain
Dokument6 Seiten
Decision Trees. These Models Use Observations About Certain
ashking
Noch keine Bewertungen
Data Structures Interview Questions Set - 1
Dokument3 Seiten
Data Structures Interview Questions Set - 1
Swastik Acharya
Noch keine Bewertungen
Chapter Five
Dokument16 Seiten
Chapter Five
Samuel Getachew
Noch keine Bewertungen
Introduction To KEA-Means Algorithm For Web Document Clustering
Dokument5 Seiten
Introduction To KEA-Means Algorithm For Web Document Clustering
surendiran123
Noch keine Bewertungen
Anomalous Topic Discovery in High Dimensional Discrete Data
Dokument4 Seiten
Anomalous Topic Discovery in High Dimensional Discrete Data
Brightworld Projects
Noch keine Bewertungen
UNIT 3 DWDM Notes
Dokument32 Seiten
UNIT 3 DWDM Notes
Divyansh
Noch keine Bewertungen
Feature Selection in Machine Learning
Dokument4 Seiten
Feature Selection in Machine Learning
STYX
Noch keine Bewertungen
LSJ1507 - Context-Based Diversification
Dokument5 Seiten
LSJ1507 - Context-Based Diversification
Swetha Pattipaka
Noch keine Bewertungen
Unit-3 DWDM 7TH Sem Cse
Dokument54 Seiten
Unit-3 DWDM 7TH Sem Cse
Navdeep Khubber
Noch keine Bewertungen
Authors: Rupali & Japuneet Kaur Bajwa: M.Tech. (CSE)
Dokument57 Seiten
Authors: Rupali & Japuneet Kaur Bajwa: M.Tech. (CSE)
Rupali
Noch keine Bewertungen
What Are The Types of Machine Learning?
Dokument24 Seiten
What Are The Types of Machine Learning?
sahil kumar
100% (1)
Bio-Inspired Heuristics For Clustering in Big Data Applications
Dokument7 Seiten
Bio-Inspired Heuristics For Clustering in Big Data Applications
Ameya Deshpande.
Noch keine Bewertungen
GlobalLogic - Optimization Algorithms For Machine Learning
Dokument4 Seiten
GlobalLogic - Optimization Algorithms For Machine Learning
Kumar manickam
Noch keine Bewertungen
Objective Oriented SDLC Models
Dokument11 Seiten
Objective Oriented SDLC Models
Jasmin K Sam
Noch keine Bewertungen
An Efficient Method For High
Dokument6 Seiten
An Efficient Method For High
phaniteja
Noch keine Bewertungen
PAPER PUBLISH - Edited
Dokument9 Seiten
PAPER PUBLISH - Edited
Kranti Sri
Noch keine Bewertungen
Applications of TOP 10 Algorithms
Dokument16 Seiten
Applications of TOP 10 Algorithms
sharath
Noch keine Bewertungen
Lecture 25 K Means Clustering
Dokument28 Seiten
Lecture 25 K Means Clustering
sunita chalageri
Noch keine Bewertungen
Breast Cancer Classification
Dokument16 Seiten
Breast Cancer Classification
Tester
100% (1)
Recommender: An Analysis of Collaborative Filtering Techniques
Dokument5 Seiten
Recommender: An Analysis of Collaborative Filtering Techniques
Paras Mirra
Noch keine Bewertungen
Feature Extraction: 4.1. Principal Component Analysis (PCA)
Dokument10 Seiten
Feature Extraction: 4.1. Principal Component Analysis (PCA)
Amandeep Sharma
Noch keine Bewertungen
Graph Regularized Feature Selection With Data Reconstruction
Dokument10 Seiten
Graph Regularized Feature Selection With Data Reconstruction
Technos_Inc
Noch keine Bewertungen
Models For Machine Learning: M. Tim Jones
Dokument10 Seiten
Models For Machine Learning: M. Tim Jones
Shanti Guru
Noch keine Bewertungen
Unit-Iv DWDM
Dokument28 Seiten
Unit-Iv DWDM
varsha.j2177
Noch keine Bewertungen
Testing and Analysis of Predictive ML Algorithms-Pages-421-443
Dokument23 Seiten
Testing and Analysis of Predictive ML Algorithms-Pages-421-443
indaniarmaan2scribd
Noch keine Bewertungen
Effective Search Engine_final with modules
Dokument12 Seiten
Effective Search Engine_final with modules
1993helanangel
Noch keine Bewertungen
BSIT F21 DS & Algo Lecture 1
Dokument21 Seiten
BSIT F21 DS & Algo Lecture 1
aine h
Noch keine Bewertungen
Introduction To Clustering Thesaurus Generation Item Clustering
Dokument15 Seiten
Introduction To Clustering Thesaurus Generation Item Clustering
swathy reghu
Noch keine Bewertungen
Weka Clustering
Dokument15 Seiten
Weka Clustering
VyankteshKshirsagar
Noch keine Bewertungen
Scalable Content-Aware Collaborative Filtering
Dokument11 Seiten
Scalable Content-Aware Collaborative Filtering
Akn Nanthan
Noch keine Bewertungen
Comparison of Graph Clustering Algorithms
Dokument6 Seiten
Comparison of Graph Clustering Algorithms
seventhsensegroup
Noch keine Bewertungen
Data Type Integer String: Write All Text
Dokument3 Seiten
Data Type Integer String: Write All Text
Mickelle John Guadaña Leaño
Noch keine Bewertungen
Data Structures and C++ Interview Questions
Dokument99 Seiten
Data Structures and C++ Interview Questions
Karthik Reddy
Noch keine Bewertungen
National Institute of Technology Karnataka, Surathkal: A Project Report On
Dokument9 Seiten
National Institute of Technology Karnataka, Surathkal: A Project Report On
Buddharaj Ambhore
Noch keine Bewertungen
Which Machine Learning Algorithm Should I Use - The SAS Data Science Blog
Dokument15 Seiten
Which Machine Learning Algorithm Should I Use - The SAS Data Science Blog
tanvir anwar
Noch keine Bewertungen
Slicing approach to privacy preserving data publishing
Dokument19 Seiten
Slicing approach to privacy preserving data publishing
keerthana1234
Noch keine Bewertungen
Clustering Based Attribute Subset Selection Using Fast Algorithm
Dokument8 Seiten
Clustering Based Attribute Subset Selection Using Fast Algorithm
James Moreno
Noch keine Bewertungen
Unit 3 Clustering
Dokument28 Seiten
Unit 3 Clustering
Aman Prasad
Noch keine Bewertungen
Article 72
Dokument4 Seiten
Article 72
Tuấn Anh Nguyễn
Noch keine Bewertungen
Thesis On Query Optimization in Distributed Database
Dokument6 Seiten
Thesis On Query Optimization in Distributed Database
sheilabrooksvirginiabeach
100% (1)
Machine learning types and algorithms explained
Dokument3 Seiten
Machine learning types and algorithms explained
Syeda Khaliqa Hamid
Noch keine Bewertungen
Semantic Integration in Heterogeneous Databases Using Neural Networks
Dokument30 Seiten
Semantic Integration in Heterogeneous Databases Using Neural Networks
Sai Kiran Vemula
Noch keine Bewertungen
TOPIC ANALYSIS PRESENTATION
Dokument23 Seiten
TOPIC ANALYSIS PRESENTATION
Nader AlFakeeh
Noch keine Bewertungen
1-2 The Problem 3-4 Proposed Solution 5-7 The Experiment 8-9 Experimental Results 10-11 Conclusion 12 References 13
Dokument14 Seiten
1-2 The Problem 3-4 Proposed Solution 5-7 The Experiment 8-9 Experimental Results 10-11 Conclusion 12 References 13
NikhilLinSaldanha
Noch keine Bewertungen
Improving The Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm
Dokument8 Seiten
Improving The Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm
prakash
Noch keine Bewertungen
Principal Component Analysis
Dokument3 Seiten
Principal Component Analysis
Kang Chul
Noch keine Bewertungen
Module 3.5 Ensemble Learning XGBoost
Dokument26 Seiten
Module 3.5 Ensemble Learning XGBoost
Duane Eugenio Ani
Noch keine Bewertungen
Intelligent Information Retrieval From The Web
Dokument4 Seiten
Intelligent Information Retrieval From The Web
Rajeev Prithyani
Noch keine Bewertungen
Research Papers On K Means Clustering
Dokument5 Seiten
Research Papers On K Means Clustering
tutozew1h1g2
100% (3)
Recent Advances in Clustering A Brief Survey
Dokument9 Seiten
Recent Advances in Clustering A Brief Survey
Anwar Shah
Noch keine Bewertungen
Feature Selection in Machine Learning with Python
Von Everand
Feature Selection in Machine Learning with Python
Soledad Galli
Noch keine Bewertungen
ARV4518PW User Manual (2007-12-22)
Dokument144 Seiten
ARV4518PW User Manual (2007-12-22)
elpitagorico
Noch keine Bewertungen
Wa470 5 PDF
Dokument12 Seiten
Wa470 5 PDF
Dino Oporto Prudencio
Noch keine Bewertungen
Timing Simulation
Dokument3 Seiten
Timing Simulation
Karthik Sharma
100% (1)
Sertifikat InWOCNA DEF
Dokument1.208 Seiten
Sertifikat InWOCNA DEF
c4What ID
Noch keine Bewertungen
COM+ Event Tutorial
Dokument11 Seiten
COM+ Event Tutorial
Roger Martinez Dx
Noch keine Bewertungen
Aa 7000
Dokument28 Seiten
Aa 7000
Evelton
100% (1)
Nama: Randi Padilah Nim: 20200810062 Class: TINFC-2020-01
Dokument5 Seiten
Nama: Randi Padilah Nim: 20200810062 Class: TINFC-2020-01
Randy Fadilah
Noch keine Bewertungen
Unimotor Technical Data
Dokument108 Seiten
Unimotor Technical Data
Eugen Blaga
Noch keine Bewertungen
Micom P12X/Y: Three Phase and Earth Fault Overcurrent Relays
Dokument8 Seiten
Micom P12X/Y: Three Phase and Earth Fault Overcurrent Relays
Andri Wahyudi
Noch keine Bewertungen
Computer Vocabulary Can You Match The Definitions and The Words?
Dokument1 Seite
Computer Vocabulary Can You Match The Definitions and The Words?
ANGELA ANDREA
Noch keine Bewertungen
Working With Statistics Using Excel: K.V.S. Sarma Professor of Statistics Sri Venkateswara University Tirupati - 517 502
Dokument50 Seiten
Working With Statistics Using Excel: K.V.S. Sarma Professor of Statistics Sri Venkateswara University Tirupati - 517 502
Jitendra Kumar
Noch keine Bewertungen
Applications & Tools: Example Blocks For Wincc and Step 7
Dokument48 Seiten
Applications & Tools: Example Blocks For Wincc and Step 7
krcedinac04
Noch keine Bewertungen
Timers of ATmega16 Microcontroller
Dokument21 Seiten
Timers of ATmega16 Microcontroller
alfibaria
Noch keine Bewertungen
Sterrad 100S - Maintenance Guide
Dokument44 Seiten
Sterrad 100S - Maintenance Guide
Raicevic Milan Trico
67% (3)
Customizing RMS Receipts
Dokument3 Seiten
Customizing RMS Receipts
Nico
Noch keine Bewertungen
Computer Specifications: Hardware Requirements
Dokument2 Seiten
Computer Specifications: Hardware Requirements
Odellien Saja
Noch keine Bewertungen
Remote Graphics Software RGS DL&Install
Dokument1 Seite
Remote Graphics Software RGS DL&Install
HELIX2009
Noch keine Bewertungen
Dac Linux Installation Notes White Paper
Dokument7 Seiten
Dac Linux Installation Notes White Paper
jbeatofl
Noch keine Bewertungen
Nagios Fan For Hyper-V
Dokument28 Seiten
Nagios Fan For Hyper-V
nia8aprihansah
Noch keine Bewertungen
Mouse As Joy Text
Dokument5 Seiten
Mouse As Joy Text
Williams Nilo
Noch keine Bewertungen
7 Innovation Secrets of Steve Jobs
Dokument66 Seiten
7 Innovation Secrets of Steve Jobs
KarthickPackiam
Noch keine Bewertungen
RSA306B USB Real Time Spectrum Analyzer Datasheet 37W603759
Dokument26 Seiten
RSA306B USB Real Time Spectrum Analyzer Datasheet 37W603759
HafiziAhmad
Noch keine Bewertungen
Big Bazaar
Dokument3 Seiten
Big Bazaar
Shreya Batra
Noch keine Bewertungen
Tle Css - Grade 10: Let Us Discover
Dokument7 Seiten
Tle Css - Grade 10: Let Us Discover
KentJosephEspinosaPalua
Noch keine Bewertungen
Wily Intro Scope Faq
Dokument3 Seiten
Wily Intro Scope Faq
ramprasad_ks
75% (4)
BIOS Password and Locked Hard Disk Recovery
Dokument4 Seiten
BIOS Password and Locked Hard Disk Recovery
jelenjek83
Noch keine Bewertungen
Configure Cacti to use Spine poller
Dokument5 Seiten
Configure Cacti to use Spine poller
Ngụy Điệp
Noch keine Bewertungen
160kva Spec Sheet
Dokument3 Seiten
160kva Spec Sheet
Godfrey James Machota
Noch keine Bewertungen
Running Java Bytecode On The Android - Sun JVM On Top of DalvikVM - Stack Overflow
Dokument2 Seiten
Running Java Bytecode On The Android - Sun JVM On Top of DalvikVM - Stack Overflow
fcassia625
Noch keine Bewertungen
Pic18 Spi Source Code
Dokument9 Seiten
Pic18 Spi Source Code
evilvortex
Noch keine Bewertungen