
Data Mining Assignment-2 Kunal Sonalkar (UFID:76411926)

1. Dataset Titanic:
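The rules below were mined with the apriori() function from the R package arules. A minimal sketch of that step follows; the titanic.raw preparation and the restriction of the right-hand side to the Survived item are assumptions about the setup, not necessarily the exact code used for this report.

  library(arules)

  # Titanic ships with R as a 4-way contingency table; expand it to one row per passenger
  data(Titanic)
  df <- as.data.frame(Titanic)
  titanic.raw <- df[rep(seq_len(nrow(df)), df$Freq), 1:4]

  # mine association rules with the thresholds used in this assignment,
  # keeping only the Survived item on the right-hand side
  rules <- apriori(titanic.raw,
                   parameter  = list(supp = 0.01, conf = 0.9),
                   appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                     default = "lhs"))
  inspect(rules)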
Rules which met criteria “Support = 0.01” and “Confidence = 0.9”

lhs rhs support confidence lift


1 {Class=2nd,Age=Child} => {Survived=Yes} 0.01090413 1.0000000 3.095640
2 {Class=1st,Sex=Female} => {Survived=Yes} 0.06406179 0.9724138 3.010243
3 {Class=2nd,Sex=Male,Age=Adult} => {Survived=No} 0.06996820 0.9166667 1.354083
4 {Class=1st,Sex=Female,Age=Adult} => {Survived=Yes} 0.06360745 0.9722222 3.009650

Rules which are left after removing redundant rules and sorted according to the lift value:

lhs rhs support confidence lift


1 {Class=2nd,Age=Child} => {Survived=Yes} 0.01090413 1.0000000 3.095640
2 {Class=1st,Sex=Female} => {Survived=Yes} 0.06406179 0.9724138 3.010243
3 {Class=2nd,Sex=Male,Age=Adult} => {Survived=No} 0.06996820 0.9166667 1.354083

Here the 4th rule is removed from the table above because its LHS is a more specific version of Rule 2's LHS (it only adds Age=Adult) with the same RHS and no higher confidence, so it carries no extra information. When we remove the redundant rules it therefore gets eliminated.
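One common way to do this pruning with arules is sketched below; the report may have pruned differently, but is.redundant() implements exactly the criterion described above (a rule is redundant if a more general rule with the same RHS has at least the same confidence).

  # sort by lift, then drop rules that add LHS items without improving confidence
  rules.sorted <- sort(rules, by = "lift")
  rules.pruned <- rules.sorted[!is.redundant(rules.sorted)]
  inspect(rules.pruned)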
If we consider the 2nd rule: its support means that in 6.4% of all cases Class="1st", Sex="Female" and "Survived=Yes" occur together.
Confidence is the conditional probability of the RHS given the LHS: of all first-class female passengers, 97.2% survived.
Lift compares the rule's confidence with the overall frequency of the RHS; a lift well above 1 (here about 3) means the LHS makes the RHS much more likely than it is in general.
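To make these three measures concrete, here is a small worked check of rule 2. The counts 145 and 141 are implied by the reported support and confidence (they are not printed by the tool); 2201 passengers and 711 survivors are the totals in the built-in Titanic data.

  n          <- 2201                    # all passengers
  n_lhs      <- 145                     # passengers with Class=1st and Sex=Female
  n_both     <- 141                     # of those, passengers who also survived
  support    <- n_both / n              # ~0.0641: share of all passengers matching LHS and RHS
  confidence <- n_both / n_lhs          # ~0.9724: P(Survived=Yes | Class=1st, Sex=Female)
  lift       <- confidence / (711 / n)  # ~3.01: confidence relative to the overall survival rate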
Scatter Plot (of the generated rules; figure not reproduced here)
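The scatter plot itself can be regenerated with the arulesViz package, for example:

  library(arulesViz)
  # each point is one rule, positioned by support and confidence and shaded by lift
  plot(rules, measure = c("support", "confidence"), shading = "lift")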

Conclusion:
There are 3 major rules which we get from the Apriori algorithm. The first rule has confidence = 1, which implies that whenever {Class=2nd, Age=Child} holds we will also have {Survived=Yes}.
2. Dataset Game of Thrones:

Rules which met criteria “Support = 0.01” and “Confidence = 0.9”

After removing the redundant rules we get the following rules:


I have included a screenshot of the first few rules generated after removing the redundant rules; they are sorted according to their “lift” values. In total there are 51 rules.
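For reference, a sketch of how this run can be set up is given below. The CSV file name, the “Death Year” column and the way “Survives” is derived are illustrative assumptions, not necessarily the exact preprocessing used in this report.

  library(arules)

  got <- read.csv("character-deaths.csv", stringsAsFactors = FALSE)
  got$Survives <- ifelse(is.na(got$Death.Year), 1, 0)   # 1 = still alive in the data
  got$Name <- NULL                                      # drop identifier columns before mining
  got[] <- lapply(got, as.factor)                       # apriori() needs categorical columns
  # in practice, keep only the categorical columns relevant to the analysis

  rules2 <- apriori(got, parameter = list(supp = 0.01, conf = 0.9))
  rules2.sorted <- sort(rules2, by = "lift")
  rules2.sorted <- rules2.sorted[!is.redundant(rules2.sorted)]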
If we run the code as:
inspect(subset(rules2.sorted, subset = rhs %pin% "Survives=0"))
it should give all the rules, in sorted order of the lift value, whose RHS is “Survives=0”.
But when we execute this line of code, we get 0 results because there is no rule with “Survives=0” on the RHS.
Hence there is no rule which says otherwise about the survival of Jon Snow.

Considering Nobility:
The following rules have “Nobility=1” on the LHS and “Survives=1” on the RHS, so “Nobility” does play a role in survival.

There are 15 rules out of 51 where “Nobility=1” implies “Survives=1”.
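These rules can be pulled out with a subset query like the one below (%pin% does partial matching on item labels; the labels “Nobility=1” and “Survives=1” are assumed to match how the attributes are encoded here).

  # rules with Nobility=1 on the LHS and Survives=1 on the RHS
  nobility.rules <- subset(rules2.sorted,
                           subset = lhs %pin% "Nobility=1" & rhs %pin% "Survives=1")
  inspect(nobility.rules)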


Hence “Nobility=1” does not appear with especially high confidence, but it does play a role in the “Survives” attribute.
As we do not have any rule with {Survives=0}, whichever favourite character we pick, the rules predict that he/she will survive.
Considering Gender:
For Male: (screenshot of the rules with “Gender=Male” on the LHS)
For Female: (screenshot of the rules with “Gender=Female” on the LHS)
There are 5 rules with “Gender=Male” and 9 rules with “Gender=Female” on the LHS out of the 51 rules.
Clearly the “Gender” attribute does not play a major role in determining “Survives”.

3. Dataset Retail:
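The retail rules below were produced in the same way as in the earlier parts; a sketch of the setup is given here. The file name and the thresholds are assumptions, and each column is taken to be a 0/1 flag per product category, which is why both “=0” and “=1” items appear in the rules.

  library(arules)

  retail <- read.csv("retail.csv")            # file name assumed
  retail[] <- lapply(retail, as.factor)       # treat each 0/1 column as categorical

  rules3 <- apriori(retail, parameter = list(supp = 0.01, conf = 0.9))
  rules3.sorted <- sort(rules3, by = "lift")
  inspect(rules3.sorted)                      # 1943 rules in this run; rules 1931-1943 are shown below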

In total there are 1943 rules; a few of them (rules 1931 to 1943) are shown below.
LHS RHS Supp. Conf. Lift
1931 {Bread=0,
Cleaner=0,
Dairy=1,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0203 0.9022222 1.363491
1932 {Bread=0,
Cleaner=0,
CannedGoods=0,
Dairy=1,
Drink=0,
Dry_BakingGoods=1,
FrozenFood=0,
Fruit=1,
Tobacco=0} => {Beverage=1} 0.0129 0.9020979 1.363303
1933 {Bread=0,
Cleaner=0,
Dairy=1,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0} => {Beverage=1} 0.0221 0.9020408 1.363217
1934 {Cleaner=1,
Drink=0,
FrozenFood=0,
Fruit=0,
Tobacco=0} => {Beverage=1} 0.0101 0.9017857 1.362832
1935 {Bread=0,
Cleaner=0,
CannedGoods=0,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0210 0.9012876 1.362079
1936 {Bread=0,
Cleaner=0,
CannedGoods=0,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0} => {Beverage=1} 0.0228 0.9011858 1.361925
1937 {Cleaner=0,
CannedGoods=0,
Dairy=0,
Drink=0,
Tobacco=1} => {Beverage=1} 0.0200 0.9009009 1.361494
1938 {Cleaner=0,
Drink=0,
FrozenFood=0,
Fruit=1,
PaperGoods=1,
Tobacco=1,
Vegetable=1} => {Beverage=1} 0.0109 0.9008264 1.361382
1939 {Cleaner=0,
CannedGoods=1,
Dairy=0,
Drink=0,
Dry_BakingGoods=1} => {Beverage=1} 0.0118 0.9007634 1.361287
1940 {Bread=0,
Cleaner=0,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0299 0.9006024 1.361043
1941 {Bread=0,
Cleaner=0,
CannedGoods=1,
Drink=0,
Vegetable=1} => {Beverage=1} 0.0261 0.9000000 1.360133
1942 {Bread=0,
Cleaner=0,
Drink=0,
Dry_BakingGoods=1,
FrozenFood=0,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0162 0.9000000 1.360133
1943 {Drink=0,
Dry_BakingGoods=0,
FrozenFood=0,
Fruit=1,
PaperGoods=1,
Tobacco=1,
Vegetable=1} => {Beverage=1} 0.0144 0.9000000 1.360133

Conclusion:
According to the rules, baskets with “Vegetable=1” lead to “Beverage=1” with high confidence.
Hence the sale of items like “Vegetables” will also drive sales of “Beverage”, and dropping “Vegetables” would be expected to decrease them.
There are many other rules which can be used to establish useful relations between the sales of different products.

Major Challenges Faced:

1. While using the function as.factor to convert the attributes to categorical factors, R ran out of contiguous memory and reported a memory-allocation error. In this case we need to free memory (for example by unloading packages that are not needed) and then run the script again; a possible workaround is sketched after this list.
2. In the Game of Thrones dataset, with the given support and confidence thresholds we do not get any rule with {Survives=0}, so it becomes very difficult to accurately predict the fate of a character in upcoming records.
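For challenge 1, one possible workaround is to free memory before the conversion rather than only unloading packages; a sketch, where the object and package names are just examples:

  detach("package:arulesViz", unload = TRUE)  # unload packages that are not needed right now
  rm(list = setdiff(ls(), "retail"))          # remove large objects that are no longer needed
  gc()                                        # trigger garbage collection to release memory
  retail[] <- lapply(retail, as.factor)       # then convert each column to a factor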
