1. Dataset Titanic:
Rules that meet the thresholds “Support = 0.01” and “Confidence = 0.9” (both used as minimum values):
Rules which are left after removing redundant rules and sorted according to the lift value:
Here the 4th rule from the table above is removed because it is a subset of Rule 2 in that table, so it
is eliminated when the redundant rules are removed.
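The pruning and sorting step can be sketched as follows. This is a minimal illustration in Python, not the script used for this report. It assumes one common redundancy criterion: a rule is redundant when a more general rule (same RHS, LHS a proper subset) has at least the same confidence. The `Rule` type, the helper names and all numbers in `toy_rules` are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: str
    confidence: float
    lift: float


def is_redundant(rule: Rule, rules: List[Rule]) -> bool:
    """True if a more general rule (same RHS, LHS a proper subset)
    reaches at least the same confidence."""
    return any(
        other.rhs == rule.rhs
        and other.lhs < rule.lhs  # proper-subset test on frozensets
        and other.confidence >= rule.confidence
        for other in rules
    )


def prune_and_sort(rules: List[Rule]) -> List[Rule]:
    """Drop redundant rules, then sort the rest by lift, highest first."""
    kept = [r for r in rules if not is_redundant(r, rules)]
    return sorted(kept, key=lambda r: r.lift, reverse=True)


# Hypothetical rules with made-up numbers, for illustration only.
toy_rules = [
    Rule(frozenset({"Class=2nd", "Age=Child"}), "Survived=Yes", 1.00, 3.10),
    Rule(frozenset({"Class=2nd", "Age=Child", "Sex=Female"}), "Survived=Yes", 1.00, 3.10),
    Rule(frozenset({"Class=1st", "Sex=Female"}), "Survived=Yes", 0.97, 3.01),
]

pruned = prune_and_sort(toy_rules)
for r in pruned:
    print(sorted(r.lhs), "=>", r.rhs, r.confidence, r.lift)
```

In this toy input the second rule adds “Sex=Female” to the first without raising the confidence, so it is the one that gets dropped.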
Consider the 2nd rule (highlighted): its support means that in 6.4% of the cases “Sex=Female” and
“Survived=Yes” occur together.
Confidence is the fraction of the cases matching the LHS that also match the RHS, i.e. an estimate of P(RHS | LHS).
Lift is the ratio of the rule's confidence to the overall frequency of the RHS; a lift above 1 means the LHS and RHS occur together more often than they would if they were independent.
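To make the three measures concrete, here is a small Python sketch that computes support, confidence and lift for a single rule. The records are invented for illustration and are not the Titanic data; the function names `support` and `rule_metrics` are likewise only assumptions of this sketch.

```python
# Toy records, invented for illustration; NOT the Titanic data.
records = [
    {"Sex=Female", "Class=1st", "Survived=Yes"},
    {"Sex=Female", "Class=3rd", "Survived=No"},
    {"Sex=Male", "Class=1st", "Survived=No"},
    {"Sex=Male", "Class=3rd", "Survived=No"},
    {"Sex=Female", "Class=2nd", "Survived=Yes"},
]


def support(itemset, records):
    """Fraction of records that contain every item of `itemset`."""
    return sum(itemset <= record for record in records) / len(records)


def rule_metrics(lhs, rhs, records):
    """Return (support, confidence, lift) for the rule lhs => rhs."""
    supp = support(lhs | rhs, records)   # LHS and RHS together
    conf = supp / support(lhs, records)  # estimate of P(RHS | LHS)
    lift = conf / support(rhs, records)  # confidence relative to the RHS base rate
    return supp, conf, lift


supp, conf, lift = rule_metrics({"Sex=Female"}, {"Survived=Yes"}, records)
print(f"support={supp:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

With these five toy records the rule covers 2 of 5 cases, holds for 2 of the 3 female cases, and has a lift above 1 because “Survived=Yes” is more frequent among the female cases than overall.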
Scatter Plot
Conclusion:
The Apriori algorithm yields 3 major rules. The first rule has confidence = 1, which
means that in this dataset every case with {Class=2nd, Age=Child} also has {Survived=Yes}.
2. Dataset Game of Thrones:
Considering Nobility:
The following rules have “Nobility=1” in LHS and “Survives=1” in RHS. So “Nobility” does play a role in
survival.
Of the 51 rules, 5 involve “Gender=Male” and 9 involve “Gender=Female”.
The “Gender” attribute therefore appears to play little role in determining “Survival”.
For Female:
3. Dataset Retail:
There are 1943 rules in total. A few rules from the result are listed below.
LHS RHS Supp. Conf. Lift
1931 {Bread=0,
Cleaner=0,
Dairy=1,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0203 0.9022222 1.363491
1932 {Bread=0,
Cleaner=0,
CannedGoods=0,
Dairy=1,
Drink=0,
Dry_BakingGoods=1,
FrozenFood=0,
Fruit=1,
Tobacco=0} => {Beverage=1} 0.0129 0.9020979 1.363303
1933 {Bread=0,
Cleaner=0,
Dairy=1,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0} => {Beverage=1} 0.0221 0.9020408 1.363217
1934 {Cleaner=1,
Drink=0,
FrozenFood=0,
Fruit=0,
Tobacco=0} => {Beverage=1} 0.0101 0.9017857 1.362832
1935 {Bread=0,
Cleaner=0,
CannedGoods=0,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0210 0.9012876 1.362079
1936 {Bread=0,
Cleaner=0,
CannedGoods=0,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0} => {Beverage=1} 0.0228 0.9011858 1.361925
1937 {Cleaner=0,
CannedGoods=0,
Dairy=0,
Drink=0,
Tobacco=1} => {Beverage=1} 0.0200 0.9009009 1.361494
1938 {Cleaner=0,
Drink=0,
FrozenFood=0,
Fruit=1,
PaperGoods=1,
Tobacco=1,
Vegetable=1} => {Beverage=1} 0.0109 0.9008264 1.361382
1939 {Cleaner=0,
CannedGoods=1,
Dairy=0,
Drink=0,
Dry_BakingGoods=1} => {Beverage=1} 0.0118 0.9007634 1.361287
1940 {Bread=0,
Cleaner=0,
Drink=0,
Dry_BakingGoods=1,
Fruit=1,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0299 0.9006024 1.361043
1941 {Bread=0,
Cleaner=0,
CannedGoods=1,
Drink=0,
Vegetable=1} => {Beverage=1} 0.0261 0.9000000 1.360133
1942 {Bread=0,
Cleaner=0,
Drink=0,
Dry_BakingGoods=1,
FrozenFood=0,
PaperGoods=0,
Vegetable=1} => {Beverage=1} 0.0162 0.9000000 1.360133
1943 {Drink=0,
Dry_BakingGoods=0,
FrozenFood=0,
Fruit=1,
PaperGoods=1,
Tobacco=1,
Vegetable=1} => {Beverage=1} 0.0144 0.9000000 1.360133
Conclusion:
According to the rules, when “Vegetable=1” appears in the LHS, “Beverage=1” follows with high confidence.
Hence sales of items like “Vegetable” go along with more sales of “Beverage”, and forgoing “Vegetable”
would be expected to decrease them.
There are many other rules which can be used to establish useful relations between the sales of
different products.
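As a quick consistency check on the table, the relation lift = confidence / support(RHS) can be inverted to recover the overall frequency of “Beverage=1” from any listed rule. The Python sketch below does this for three rows copied from the table above; the helper name `implied_rhs_support` is an assumption of this sketch.

```python
# (confidence, lift) pairs copied from rules 1931, 1937 and 1943 in the table above.
rules = {
    1931: (0.9022222, 1.363491),
    1937: (0.9009009, 1.361494),
    1943: (0.9000000, 1.360133),
}


def implied_rhs_support(confidence, lift):
    """Invert lift = confidence / support(RHS) to recover support(RHS)."""
    return confidence / lift


implied = {
    rule_id: implied_rhs_support(conf, lift)
    for rule_id, (conf, lift) in rules.items()
}
for rule_id, rhs_support in implied.items():
    print(rule_id, round(rhs_support, 4))
```

All three rows imply the same value, roughly 0.66, for the share of transactions with “Beverage=1”. This is why a confidence of about 0.90 corresponds to a lift of only about 1.36 in these rules.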
1. While using the function as.factor to convert the nominal attributes to categorical ones, we run out of
contiguous memory and a memory allocation error is shown. In this case we need to
free memory, for example by unloading some already loaded packages, and then run the script again.
2. In the Game of Thrones dataset, with our given support and confidence thresholds, we don’t
get any rule with {Survives=0} in the RHS. Hence it becomes very difficult to accurately predict the fate
of a character in upcoming records.