Data: Data are raw facts, usually collected as a result of experience, observation, experiment, or processes within a computer system, or given as a set of premises. Data are often viewed as the lowest level of abstraction, from which information and knowledge are derived.
Database: A database is a structured collection of records or data stored on a computer system. The structure is achieved by organizing the data according to a database model.
Database Management System: A Database Management System (DBMS) is a computer program that enables users to store, modify, and retrieve information in a database.
Information: Information is data that has been processed.
Repository: A repository is a central place in which an aggregation of data is kept and maintained in an organized way, usually in computer storage.
Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of the management decision-making process.
Data mining: Data mining is the extraction of interesting information or patterns from data in large databases.
Data Preprocessing:
Data preprocessing prepares raw data for mining by dealing with dirty, incomplete, noisy, and inconsistent data. Major tasks in data preprocessing:
1. Data Cleaning
2. Data Integration
3. Data Transformation
4. Data Reduction
5. Data Discretization
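Three of the tasks listed above (cleaning, transformation, and discretization) can be illustrated with a minimal Python sketch. The function names, the mean-imputation strategy, and the sample `ages` list are assumptions made for illustration; they are not taken from WEKA, which provides its own filters for these tasks.

```python
from statistics import mean


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    avg = mean(known)
    return [avg if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Data discretization: map each value to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute column with one missing value.
ages = [20, None, 40, 60]
cleaned = fill_missing(ages)
scaled = min_max_normalize(cleaned)
binned = equal_width_bins(cleaned, 2)
print(cleaned, scaled, binned)
```

Mean imputation and equal-width binning are only two of several common choices; other strategies (for example, median imputation or equal-frequency binning) fit the same task names.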
WEKA versions:
There are several versions of WEKA:
WEKA 3.0: "book version", compatible with the description in the data mining book.
WEKA 3.2: "GUI version", which adds graphical user interfaces (the book version is command-line only).
WEKA 3.3: "development version", with lots of improvements.
WEKA reads the ARFF, CSV, C4.5, and binary file formats. All of these are flat-file formats.
Exploring WEKA:
When you open WEKA, the GUI Chooser window appears. It offers four applications: Simple CLI, Explorer, Experimenter, and KnowledgeFlow.
2. Save the file as filename.XLS.
3. Open the same file and save it again as filename.CSV, choosing "CSV (Comma delimited)" as the file type.
6. Now open filename.CSV with MS-Word and add the ARFF header above the data rows: an @relation line, one @attribute line per column, and an @data line.
7. Save the file as filename.ARFF, choosing plain text as the file type.
8. An ARFF file is created.
9. Double-clicking this ARFF file opens it directly in the WEKA GUI environment.
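The manual CSV-to-ARFF conversion described in these steps can also be scripted. The sketch below is a minimal example under stated assumptions: every column is treated as a nominal attribute, values contain no spaces or special characters that would need quoting, and there are no missing values. The function name `csv_to_arff` and the sample weather data are hypothetical.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row is the header) into ARFF text.

    Assumption: all attributes are nominal, so each @attribute line
    lists the distinct values found in that column.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        values = sorted({row[i] for row in data})
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample data.
sample_csv = "outlook,play\nsunny,no\nrainy,yes\nsunny,yes\n"
arff = csv_to_arff(sample_csv, "weather")
print(arff)
```

For numeric columns the attribute declaration would use `numeric` instead of a value list; that case is left out here to keep the sketch short.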
Apriori uses an iterative approach known as level-wise search, where frequent k-itemsets are used to explore (k+1)-itemsets.

Algorithm: Apriori
Input: database D of transactions; minimum support threshold min_sup.
Output: L, the frequent itemsets in D.
Method:
    L1 = find_frequent_1-itemsets(D);
    for (k = 2; Lk-1 ≠ ∅; k++) {
        Ck = apriori_gen(Lk-1, min_sup);
        for each transaction t ∈ D {
            Ct = subset(Ck, t);        // candidates contained in t
            for each candidate c ∈ Ct
                c.count++;
        }
        Lk = {c ∈ Ck | c.count ≥ min_sup}
    }
    return L = ∪k Lk;

Procedure apriori_gen(Lk-1: frequent (k-1)-itemsets; min_sup)
    for each itemset l1 ∈ Lk-1
        for each itemset l2 ∈ Lk-1
            if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
                c = l1 ⋈ l2;           // join step: generate a candidate
                if has_infrequent_subset(c, Lk-1) then
                    delete c;          // prune step: remove an unfruitful candidate
                else add c to Ck;
            }
    return Ck;

Procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets)
    for each (k-1)-subset s of c
        if s ∉ Lk-1 then
            return true;
    return false;
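The level-wise search can be sketched in Python as follows. This is a minimal illustration, not WEKA's implementation. One deliberate simplification: the join step takes the union of any two frequent (k-1)-itemsets and keeps unions of size k, rather than using the sorted-prefix test from the pseudocode; after pruning, both approaches yield the same candidate set. The sample transactions and the absolute support count `min_sup=2` are hypothetical.

```python
from itertools import combinations


def has_infrequent_subset(candidate, prev_frequent, k):
    """Return True if any (k-1)-subset of the candidate is not frequent."""
    return any(frozenset(s) not in prev_frequent
               for s in combinations(candidate, k - 1))


def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and not has_infrequent_subset(union, prev_frequent, k):
                candidates.add(union)
    return candidates


def apriori(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset.

    min_sup is an absolute count of transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Find the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_sup}

    frequent = dict(current)
    k = 2
    while current:
        candidates = apriori_gen(set(current), k)
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is contained in the transaction
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_sup=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this data the candidate {bread, milk, butter} is never counted: its subset {milk, butter} occurs only once, so the prune step discards it before the database scan.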
APRIORI in WEKA Explorer:
1. Open the WEKA GUI.
2. Click the Explorer button. The Explorer window opens.
3. Click the Open file button and select the ARFF file to load into WEKA.
4. After loading, all your data is shown in the Explorer window.
5. Go to the Associate tab and click the Choose button. A list of association rule mining algorithms appears. Select Apriori from that list.
6. You can also set the properties of the algorithm, that is, adjust the support and confidence thresholds, by right-clicking on the chosen algorithm.
7. When you click the Start button, the algorithm runs and the output (the strong association rules) appears in the output window.
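The support and confidence thresholds adjusted in step 6 decide which rules count as strong. The sketch below shows how both measures are computed for a single rule A => B. The function names, the thresholds `min_sup` and `min_conf`, and the sample transactions are illustrative assumptions; they do not correspond to WEKA option names.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent: support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """A rule is strong if it meets both the support and the confidence threshold."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
print(is_strong({"bread"}, {"milk"}, baskets, min_sup=0.5, min_conf=0.6))
```

Raising `min_conf` to 0.9 would reject the rule bread => milk here, since bread appears in three transactions but bread and milk appear together in only two.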