Roll No : 04 09 8039

1. Gain insight for running pre-defined decision trees and explore results
using MS OLAP Analytics.

Solution:

The purpose of this experiment is to generate a decision tree for a given data set. We can either write our own data set or use a predefined one; in this case we use the provided bank example.

We start by launching Weka.

The above screen appears; click on the Explorer button to begin.

Once the Explorer is open, click on Open file and select the appropriate data set. Remember that a data set you write yourself must be saved as an .arff file. For now we import an existing file using the Open file option, as shown below,
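For readers who want to write their own data set, the following Python sketch builds the text of a minimal .arff file. The attribute names are borrowed from the bank run in this record; the nominal value lists, the relation name bank-sample and the two data rows are illustrative assumptions, not the contents of the real bank file.

```python
# Minimal sketch: build the text of a small ARFF file.
# The nominal value sets and the data rows below are made up for illustration.

def build_arff(relation, attributes, rows):
    """Return ARFF text for one relation.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" or a list of allowed nominal values.
    rows: list of value lists, one value per attribute.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append(f"@attribute {name} {kind}")
        else:
            # Nominal attributes list their allowed values in braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


ATTRIBUTES = [
    ("age", "numeric"),
    ("income", "numeric"),
    ("married", ["YES", "NO"]),
    ("pep", ["YES", "NO"]),
]
ROWS = [
    [48, 17546.0, "NO", "YES"],  # hypothetical record
    [40, 30085.1, "YES", "NO"],  # hypothetical record
]

arff_text = build_arff("bank-sample", ATTRIBUTES, ROWS)
print(arff_text)
```

Writing arff_text to a file whose name ends in .arff should give a file that the Open file dialog accepts.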

Once we import the data set, Weka generates the following view,

Weka lists all the attributes of the imported data set and shows summary statistics and a graph for the selected attribute.

Click on the classify tab and choose the algorithm J48 as shown below,

After selecting J48, we can see that the defaults are already selected: the test option is Cross-validation, and the nominal attribute pep is selected as the class in the drop-down box. Click on the Start button and an output screen as below will be seen,

In the Result list, right-click the trees.J48 entry and choose the Visualize tree option, as seen below,

The output as below will be generated,

We also save the result buffer as follows,

The result buffer gives information about the TP rate, FP rate, precision, recall, F-measure and the confusion matrix.

Buffer result.arff (includes the time taken, decision tree details, confusion matrix and all other details). Upon opening the buffer result.arff file we get,

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     bank
Instances:    300
Attributes:   9
              age
              sex
              region
              income
              married
              children
              car
              mortgage
              pep
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

children = YES
|   income <= 30099.3
|   |   car = YES: NO (50.0/15.0)
|   |   car = NO
|   |   |   married = YES
|   |   |   |   income <= 13106.6: NO (9.0/2.0)
|   |   |   |   income > 13106.6
|   |   |   |   |   mortgage = YES: YES (12.0/3.0)
|   |   |   |   |   mortgage = NO
|   |   |   |   |   |   income <= 18923: YES (9.0/3.0)
|   |   |   |   |   |   income > 18923: NO (10.0/3.0)
|   |   |   married = NO: NO (22.0/6.0)
|   income > 30099.3: YES (59.0/7.0)
children = NO
|   married = YES
|   |   mortgage = YES
|   |   |   region = INNER_CITY
|   |   |   |   income <= 39547.8: YES (12.0/3.0)
|   |   |   |   income > 39547.8: NO (4.0)
|   |   |   region = RURAL: NO (3.0/1.0)
|   |   |   region = TOWN: NO (9.0/2.0)
|   |   |   region = SUBURBAN: NO (4.0/1.0)
|   |   mortgage = NO: NO (57.0/9.0)
|   married = NO
|   |   mortgage = YES
|   |   |   age <= 39
|   |   |   |   age <= 28: NO (4.0)
|   |   |   |   age > 28: YES (5.0/1.0)
|   |   |   age > 39: NO (11.0)
|   |   mortgage = NO: YES (20.0/1.0)

Number of Leaves  :     17

Size of the tree :      31

Time taken to build model: 0.09 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         206               68.6667 %
Incorrectly Classified Instances        94               31.3333 %
Kappa statistic                          0.3576
Mean absolute error                      0.379
Root mean squared error                  0.4816
Relative absolute error                 76.2791 %
Root relative squared error             96.6145 %
Total Number of Instances              300

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.536     0.185     0.712      0.536     0.612       0.683     YES
                 0.815     0.464     0.673      0.815     0.737       0.683     NO
Weighted Avg.    0.687     0.336     0.691      0.687     0.68        0.683

=== Confusion Matrix ===

   a   b   <-- classified as
  74  64 |   a = YES
  30 132 |   b = NO
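The per-class figures in the result buffer follow directly from the confusion matrix. As a check, the Python sketch below recomputes them from the four counts reported above; the function and variable names are my own and are not part of Weka.

```python
# Recompute the per-class figures from the confusion matrix above.
# Rows are actual classes, columns are predicted classes.
CLASSES = ["YES", "NO"]
MATRIX = [
    [74, 64],    # actual YES: 74 predicted YES, 64 predicted NO
    [30, 132],   # actual NO: 30 predicted YES, 132 predicted NO
]


def class_metrics(matrix, i):
    """Return (tp_rate, fp_rate, precision, f_measure) for class index i."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[i][i]
    actual = sum(matrix[i])                    # instances truly in class i
    predicted = sum(row[i] for row in matrix)  # instances predicted as class i
    fp = predicted - tp
    tp_rate = tp / actual                      # identical to recall
    fp_rate = fp / (total - actual)
    precision = tp / predicted
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return tp_rate, fp_rate, precision, f_measure


def accuracy(matrix):
    """Fraction of instances on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


for i, name in enumerate(CLASSES):
    tp_rate, fp_rate, precision, f_measure = class_metrics(MATRIX, i)
    print(f"{name}: TP rate {tp_rate:.3f}, FP rate {fp_rate:.3f}, "
          f"precision {precision:.3f}, F-measure {f_measure:.3f}")
print(f"accuracy {accuracy(MATRIX):.4%}")
```

For class YES, for example, the TP rate is 74 / (74 + 64) and the precision is 74 / (74 + 30), which round to the 0.536 and 0.712 reported by Weka.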

This completes the first experiment.

2. Design a data mart from scratch to store the credit history of customers of
a bank. Use this credit profiling to process future loan applications.

Solution:

The purpose of this experiment is to predict the unknown class values of future records using the known, class-labelled values of existing records.

Start Weka, as shown below,

Click on the Explorer button to open the Weka Explorer,

Use the open file button to import the bank data set,

Select the Classify tab, choose the J48 algorithm and, as before, click the Start button.

After this, in the Test options section select the Supplied test set option, click the Set button and provide the test set file bank-new.arff.

Once we have provided the test set bank-new.arff, we click Start again so that the model is evaluated on it, then right-click the new entry in the Result list and choose the Visualize classifier errors option, as shown below,

The output screen as below will be seen,

On this page click on the save option and save the file as shown below,

The saved file contains the predicted values. We show this by comparing the three data sets used here: Bank.arff (main data set), Bank-new.arff (supplied test set) and Bank-predicted.arff (generated output).

Bank.arff:

Bank-new.arff:

Bank-predicted.arff:

Thus the values have been predicted. As we can see, the question marks in the class column have been replaced with a YES or NO value.
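To make the prediction step concrete, the sketch below hand-translates the pruned J48 tree printed in experiment 1 into a Python function and applies it to one unlabelled record. It assumes the model used here is the same tree as in experiment 1; the function name and the sample record are hypothetical, and Weka itself does not expose the tree in this form.

```python
# Hand translation of the pruned J48 tree from experiment 1.
# The record passed in is a plain dict keyed by attribute name.

def predict_pep(r):
    """Return "YES" or "NO" by following the tree for one record."""
    if r["children"] == "YES":
        if r["income"] > 30099.3:
            return "YES"
        if r["car"] == "YES":
            return "NO"
        if r["married"] == "NO":
            return "NO"
        # Here: car = NO and married = YES.
        if r["income"] <= 13106.6:
            return "NO"
        if r["mortgage"] == "YES":
            return "YES"
        return "YES" if r["income"] <= 18923 else "NO"
    # Here: children = NO.
    if r["married"] == "YES":
        if r["mortgage"] == "NO":
            return "NO"
        if r["region"] == "INNER_CITY":
            return "YES" if r["income"] <= 39547.8 else "NO"
        return "NO"  # RURAL, TOWN and SUBURBAN all predict NO
    # Here: children = NO and married = NO.
    if r["mortgage"] == "NO":
        return "YES"
    if r["age"] <= 28 or r["age"] > 39:
        return "NO"
    return "YES"


# Hypothetical unlabelled record; "?" marks the unknown class value.
new_record = {
    "age": 35, "sex": "FEMALE", "region": "TOWN", "income": 35000.0,
    "married": "YES", "children": "YES", "car": "NO", "mortgage": "NO",
    "pep": "?",
}
new_record["pep"] = predict_pep(new_record)
print(new_record["pep"])
```

Replacing the "?" with the returned label mirrors what the saved output file shows for every row of the test set.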

3. For a given dataset generate the Association rules using weka and based
on these association rules describe which rules are Strong and which rules are Weak.

Solution:

The purpose of this experiment is to generate the association rules for a given data set; in this case we use the contact lenses data set. A point to remember here is that association rules can only be generated for nominal attributes (that is, attributes whose values come from a fixed set of choices such as [yes, no] or [male, female]). Based on the generated rules we have to calculate the support and confidence values and then describe which rules are strong and which are weak.

We start by launching Weka.

The above screen appears; click on the Explorer button to begin.

Now provide the contact-lenses data set using the open file option,

Once we have done this, click on the Associate tab and choose the Apriori algorithm, which should already be selected by default,

Then click the Start button; the association rules will be generated as below,

=== Run information ===

Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation:     contact-lenses
Instances:    24
Attributes:   5
              age
              spectacle-prescrip
              astigmatism
              tear-prod-rate
              contact-lenses

=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.2 (5 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16

Generated sets of large itemsets:

Size of set of large itemsets L(1): 11
Size of set of large itemsets L(2): 21
Size of set of large itemsets L(3): 6

Best rules found:

 1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12    conf:(1)
 2. spectacle-prescrip=myope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
 3. spectacle-prescrip=hypermetrope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
 4. astigmatism=no tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
 5. astigmatism=yes tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
 6. contact-lenses=soft 5 ==> astigmatism=no 5    conf:(1)
 7. contact-lenses=soft 5 ==> tear-prod-rate=normal 5    conf:(1)
 8. tear-prod-rate=normal contact-lenses=soft 5 ==> astigmatism=no 5    conf:(1)
 9. astigmatism=no contact-lenses=soft 5 ==> tear-prod-rate=normal 5    conf:(1)
10. contact-lenses=soft 5 ==> astigmatism=no tear-prod-rate=normal 5    conf:(1)

Confidence values are already given. The support of a rule must be calculated by dividing the instance count shown in the rule by the total number of instances. For example, the 1st rule says that there are 12 instances in which the tear-prod-rate attribute has the value reduced, and in all of them the contact-lenses attribute has the value none, so the support is 12 divided by the total number of instances, which is 24. Hence 12/24 = 0.5. Thus the support for rule 1 is 0.5 and its confidence is 1. Likewise, support and confidence values should be calculated for all the rules. Whether a rule is strong or weak is decided from the thresholds given in the question. (For instance, the question may state that rules with support 0.5 or above and confidence 1 are strong rules; based on such a question we calculate the values and show which rules are strong and which are weak.)
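The calculation described above can be sketched in Python as follows. The counts are read from the ten rules in the Apriori output; the thresholds (support 0.5 or above, confidence 1) are only the example thresholds mentioned in the text, and the function name is my own.

```python
# Support and confidence for the ten rules found by Apriori.
TOTAL_INSTANCES = 24

# (rule number, instances matching the left side, instances matching the whole rule)
RULES = [
    (1, 12, 12),
    (2, 6, 6), (3, 6, 6), (4, 6, 6), (5, 6, 6),
    (6, 5, 5), (7, 5, 5), (8, 5, 5), (9, 5, 5), (10, 5, 5),
]


def classify_rule(antecedent_count, rule_count,
                  min_support=0.5, min_confidence=1.0):
    """Return (support, confidence, label) for one rule."""
    support = rule_count / TOTAL_INSTANCES
    confidence = rule_count / antecedent_count
    strong = support >= min_support and confidence >= min_confidence
    return support, confidence, "strong" if strong else "weak"


for number, antecedent_count, rule_count in RULES:
    support, confidence, label = classify_rule(antecedent_count, rule_count)
    print(f"rule {number:2d}: support {support:.3f}, "
          f"confidence {confidence:.1f} -> {label}")
```

With these example thresholds only rule 1 qualifies as strong; rules 2 to 5 have support 6/24 = 0.25 and rules 6 to 10 have support 5/24, so they are weak despite having confidence 1.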

4. To understand ETL (Extract Transform Load) processes.

Solution:

The purpose of this experiment is to create two tables and demonstrate an inner join on them with a query in MySQL.

We begin by starting the MySQL command prompt, which asks for a password, as shown below,

Enter password: **** (i.e. lab4)
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.42-community MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| test               |
| useless            |
+--------------------+
4 rows in set (0.03 sec)

mysql> use information_schema;
Database changed
mysql> show tables;
+---------------------------------------+
| Tables_in_information_schema          |
+---------------------------------------+
| CHARACTER_SETS                        |
| COLLATIONS                            |
| COLLATION_CHARACTER_SET_APPLICABILITY |
| COLUMNS                               |
| COLUMN_PRIVILEGES                     |
| ENGINES                               |
| EVENTS                                |
| FILES                                 |
| GLOBAL_STATUS                         |
| GLOBAL_VARIABLES                      |
| KEY_COLUMN_USAGE                      |
| PARTITIONS                            |
| PLUGINS                               |
| PROCESSLIST                           |
| PROFILING                             |
| REFERENTIAL_CONSTRAINTS               |
| ROUTINES                              |
| SCHEMATA                              |
| SCHEMA_PRIVILEGES                     |
| SESSION_STATUS                        |
| SESSION_VARIABLES                     |
| STATISTICS                            |
| TABLES                                |
| TABLE_CONSTRAINTS                     |
| TABLE_PRIVILEGES                      |
| TRIGGERS                              |
| USER_PRIVILEGES                       |
| VIEWS                                 |
+---------------------------------------+
28 rows in set (0.00 sec)

mysql> use mysql;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_mysql           |
+---------------------------+
| columns_priv              |
| db                        |
| event                     |
| func                      |
| general_log               |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| host                      |
| ndb_binlog_index          |
| plugin                    |
| proc                      |
| procs_priv                |
| servers                   |
| slow_log                  |
| tables_priv               |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
| user                      |
+---------------------------+
23 rows in set (0.19 sec)

mysql> create database jkl;
Query OK, 1 row affected (0.00 sec)

mysql> use jkl;
Database changed

mysql> create table orders(orderid varchar(5),productid varchar(5),quantity int(5),unitsaleprice int(5),discountprice int(5),numoffreeservice int(3));
Query OK, 0 rows affected (0.05 sec)

mysql> insert into orders values('O101','P121',10,5000,10,2);
Query OK, 1 row affected (0.03 sec)

mysql> insert into orders values('O101','P01',1,234,5,3);
Query OK, 1 row affected (0.02 sec)

mysql> insert into orders values('O101','P180',2,4000,12,3);
Query OK, 1 row affected (0.02 sec)

mysql> insert into orders values('O102','P02',5,2500,3,2);
Query OK, 1 row affected (0.02 sec)

mysql> insert into orders values('O102','P122',2,2800,3,2);
Query OK, 1 row affected (0.03 sec)

mysql> SELECT * FROM ORDERS;
+---------+-----------+----------+---------------+---------------+------------------+
| orderid | productid | quantity | unitsaleprice | discountprice | numoffreeservice |
+---------+-----------+----------+---------------+---------------+------------------+
| O101    | P121      |       10 |          5000 |            10 |                2 |
| O101    | P01       |        1 |           234 |             5 |                3 |
| O101    | P180      |        2 |          4000 |            12 |                3 |
| O102    | P02       |        5 |          2500 |             3 |                2 |
| O102    | P122      |        2 |          2800 |             3 |                2 |
+---------+-----------+----------+---------------+---------------+------------------+
5 rows in set (0.00 sec)

mysql> create table products(productid varchar(5) primary key,companyid varchar(5),productname varchar(10),producttype varchar(10),productprice int(5),productdom date,productinstock int(5));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into products values('P01','30','CABLE',121,234,'1999-01-09',25);
Query OK, 1 row affected (0.02 sec)

mysql> insert into products values('P02','28','OPCABLE',122,500,'1998-08-04',35);
Query OK, 1 row affected (0.03 sec)

mysql> insert into products values('P121','12','MONITOR',147,5000,'2001-09-25',19);
Query OK, 1 row affected (0.03 sec)

mysql> insert into products values('P122','11','BATTERY',124,1400,'2003-08-15',15);
Query OK, 1 row affected (0.03 sec)

mysql> insert into products values('P180','2','FAX',168,2100,'2003-03-12',12);
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM PRODUCTS;
+-----------+-----------+-------------+-------------+--------------+------------+----------------+
| productid | companyid | productname | producttype | productprice | productdom | productinstock |
+-----------+-----------+-------------+-------------+--------------+------------+----------------+
| P01       | 30        | CABLE       | 121         |          234 | 1999-01-09 |             25 |
| P02       | 28        | OPCABLE     | 122         |          500 | 1998-08-04 |             35 |
| P121      | 12        | MONITOR     | 147         |         5000 | 2001-09-25 |             19 |
| P122      | 11        | BATTERY     | 124         |         1400 | 2003-08-15 |             15 |
| P180      | 2         | FAX         | 168         |         2100 | 2003-03-12 |             12 |
+-----------+-----------+-------------+-------------+--------------+------------+----------------+
5 rows in set (0.00 sec)

mysql> select p1.productid,p1.productname,p2.quantity,p2.orderid from products as p1 INNER JOIN orders as p2 on p1.productid=p2.productid;
+-----------+-------------+----------+---------+
| productid | productname | quantity | orderid |
+-----------+-------------+----------+---------+
| P01       | CABLE       |        1 | O101    |
| P02       | OPCABLE     |        5 | O102    |
| P121      | MONITOR     |       10 | O101    |
| P122      | BATTERY     |        2 | O102    |
| P180      | FAX         |        2 | O101    |
+-----------+-------------+----------+---------+
5 rows in set (0.00 sec)

Commands such as show databases;, show tables; and use database-name; can be used as well to look around the server.
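The same inner join can be reproduced without a MySQL server. The sketch below uses Python's built-in sqlite3 module with an in-memory database, so it runs on SQLite rather than MySQL; only the columns needed for the join are created, and the ORDER BY clause is my addition to make the row order deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database jkl.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid TEXT, productid TEXT, quantity INTEGER);
CREATE TABLE products (productid TEXT PRIMARY KEY, productname TEXT);
""")

# Same order lines and products as in the MySQL session above.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("O101", "P121", 10), ("O101", "P01", 1), ("O101", "P180", 2),
     ("O102", "P02", 5), ("O102", "P122", 2)],
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("P01", "CABLE"), ("P02", "OPCABLE"), ("P121", "MONITOR"),
     ("P122", "BATTERY"), ("P180", "FAX")],
)

# Inner join: keep only rows whose productid appears in both tables.
rows = conn.execute("""
    SELECT p1.productid, p1.productname, p2.quantity, p2.orderid
    FROM products AS p1
    INNER JOIN orders AS p2 ON p1.productid = p2.productid
    ORDER BY p1.productid
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Because every order line refers to an existing product, the join returns one row per order line, five in total.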

5. Generate a report using the Report Studio of Cognos 8 using attributes
from provided tables.

Solution:

The purpose of this experiment is to generate a report from the provided tables using the Report Studio of Cognos.

We start by opening Internet Explorer and typing in the address,

192.100.100.150/cognos 8

A page as below will appear,

Click on Quick Tour, select the MJCET public folder among the options, and click on the Report Studio option at the top right of the screen, as shown below,

Once you click on Report Studio, it opens as below,

Choose the create new report option and begin,

You can choose any of the listed options; this example uses a list, so click on the List option and press OK.

Using the insertable objects listed on the left of Report Studio, we can use any of the tables in the mypck folder. Here we use the customer table and bring some of its attributes onto the right side, as shown above. We simply drag and drop each attribute onto the right side to generate the view.

We then click the play/run button on the horizontal toolbar to generate the report.

The run button is the 12th button from the left, as shown above.

The generated report is as shown below,

6. Generate Query based reports using Query Studio which performs the
following operations.
i. Pivot
ii. Group
iii. Ungroup
iv. Filter

Solution:

The purpose of this experiment is to generate query-based reports from the provided tables using the Query Studio of Cognos.

We start by opening Internet Explorer and typing in the address,

192.100.100.150/cognos 8

A page as below will appear,

Click on Quick Tour, select the MJCET public folder among the options, and click on the Query Studio option at the top right of the screen, as shown below,

Query Studio opens as below,

Select the attributes by dragging and dropping them from the left side to the right side, or use the Insert button provided on the left side of Query Studio, as shown in the above image. Use the toolbar buttons to generate the reports for the corresponding operations.

In Query Studio, as in Report Studio, we have a toolbar as shown below,

The toolbar includes the run, group, ungroup, pivot and filter buttons, as shown above.

i. Pivot:

Here in Query Studio we choose attributes of the order and product tables to show the operations. With pivot we show the relation between attributes, as shown below,

As we can see, we have selected orderId, productId and Quantity from the two tables mentioned above, and we have selected orderId as the attribute on which we apply the pivot operation; this is highlighted in yellow. We then use the pivot button provided in the toolbar and generate the report below,
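As a rough illustration of what a pivot does to the data, the Python sketch below turns the flat list of order lines from experiment 4 into a crosstab. It assumes that pivoting on orderId places the orderId values across the columns, with productId down the rows and Quantity in the cells; the exact layout Query Studio produces may differ, and the function name is my own.

```python
# Order lines (orderId, productId, Quantity) taken from the orders table.
ORDER_LINES = [
    ("O101", "P121", 10),
    ("O101", "P01", 1),
    ("O101", "P180", 2),
    ("O102", "P02", 5),
    ("O102", "P122", 2),
]


def pivot_on_order(lines):
    """Return {productId: {orderId: total quantity}} as a simple crosstab."""
    crosstab = {}
    for order_id, product_id, quantity in lines:
        row = crosstab.setdefault(product_id, {})
        row[order_id] = row.get(order_id, 0) + quantity
    return crosstab


crosstab = pivot_on_order(ORDER_LINES)
order_ids = sorted({order_id for order_id, _, _ in ORDER_LINES})

# Print the crosstab with one column per orderId; empty cells show "-".
print("productId " + " ".join(f"{o:>6}" for o in order_ids))
for product_id in sorted(crosstab):
    cells = " ".join(f"{crosstab[product_id].get(o, '-'):>6}" for o in order_ids)
    print(f"{product_id:<9} {cells}")
```

Each product appears in only one order in this data, so every row of the crosstab has exactly one filled cell.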

ii. Group:

In the group operation we form groups of rows that share a particular value of a chosen attribute. Here we use the orderId, productId and productName attributes and group on the orderId attribute, as we can see below,

The selection of orderId as the attribute on which the group operation is applied is signified by the yellow colour.

iii. Ungroup:

In the ungroup operation we do the opposite of the grouping operation: we undo the grouping (or crosstab) formed earlier and return to a flat list, as we can see below,

The selection of orderId as the attribute on which the operation is applied is signified by the yellow colour.
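Group and ungroup are inverse operations, which the following Python sketch illustrates on the joined order and product rows from experiment 4. It only mimics the effect on the data, not Query Studio's presentation; the function names are hypothetical.

```python
from itertools import groupby

# (orderId, productId, productName) rows, as obtained by joining the two tables.
ROWS = [
    ("O101", "P121", "MONITOR"),
    ("O101", "P01", "CABLE"),
    ("O101", "P180", "FAX"),
    ("O102", "P02", "OPCABLE"),
    ("O102", "P122", "BATTERY"),
]


def group_by_order(rows):
    """Group rows on orderId: {orderId: [(productId, productName), ...]}."""
    ordered = sorted(rows, key=lambda r: r[0])  # groupby needs sorted input
    return {
        order_id: [(product_id, name) for _, product_id, name in members]
        for order_id, members in groupby(ordered, key=lambda r: r[0])
    }


def ungroup(grouped):
    """Flatten the groups back into a plain list of rows."""
    return [
        (order_id, product_id, name)
        for order_id, members in grouped.items()
        for product_id, name in members
    ]


grouped = group_by_order(ROWS)
for order_id, members in grouped.items():
    print(order_id, members)
print(ungroup(grouped) == ROWS)
```

Grouping shows each orderId once with its products listed beneath it; ungrouping repeats the orderId on every row again.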

iv. Filter:

In the filter operation we choose which values of a selected attribute we want to view, as we can see below,

We have selected orderId, productId and Quantity as the attributes, and we select orderId as the attribute on which the filter operation is applied, signified by the yellow colour.

Here we select the orderId values for which the report should be generated, as we can see in the previous figure. We then press the OK button and generate the report below,
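In data terms, the filter keeps only the rows whose orderId is among the selected values. A minimal Python sketch, using the order lines from experiment 4 and assuming that O101 is the value ticked in the filter dialog (the actual selection in the figure is not recoverable here):

```python
# Order lines (orderId, productId, Quantity) taken from the orders table.
ORDER_LINES = [
    ("O101", "P121", 10),
    ("O101", "P01", 1),
    ("O101", "P180", 2),
    ("O102", "P02", 5),
    ("O102", "P122", 2),
]


def filter_by_order(lines, selected_order_ids):
    """Keep only the lines whose orderId is one of the selected values."""
    wanted = set(selected_order_ids)
    return [line for line in lines if line[0] in wanted]


# Hypothetical selection: only order O101 is ticked.
report_rows = filter_by_order(ORDER_LINES, ["O101"])
for row in report_rows:
    print(row)
```

Selecting both order ids would return all five lines unchanged, which is the same as applying no filter.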