
Aster Analytic Learning Series
Using Collaborative Filter in Aster

Aster Analytic Learning Series – Collaborative Filter

Genre: Association Analysis

Background: Collaborative filtering is used by analysts to find items or
events that are frequently paired with other items or events. “People who
shopped for this item also shopped for...”
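
As a minimal illustration of the "also shopped for" idea, the following Python sketch counts how often other items share a basket with a given item. The basket contents and the helper name also_bought are made up for illustration; this is plain Python, not Aster code.

```python
from collections import Counter

# Hypothetical baskets: each set holds the items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def also_bought(item, baskets):
    """Count how often each other item shares a basket with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            # Every other item in this basket co-occurs once with `item`.
            counts.update(basket - {item})
    return counts

print(also_bought("bread", baskets).most_common())
```

Items with the highest counts are the natural candidates for a "people who shopped for this item also shopped for" list.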

Use Cases:
- Market Basket Analytics
- Recommendation Engines
- Variety of Association Analytics

Demonstration
Fun Fact: Multi-Node/Item Affinity: Aster’s implementation is a single
line of SQL-MR that provides the following outputs: Support,
Confidence, Lift, Z-Score, and Raw Score Probability.

How Aster Does Collaborative Filtering – Deeper Dive
Aster’s Collaborative Filter output can be rendered in the
form of a graph diagram.

The diagram to the left is also known as a Sigma, Network, or Graph Diagram.

• Nodes: Objects that are connected, also called vertices.
• Edges: Connections between the nodes, lines connecting the nodes.

SIGMA DIAGRAM: Aster Visualization

In our example to the left we see items in a shopping cart
as the NODES and the strength of them being purchased
together as the EDGES.

Aster visualizations support directed graphs. Note that the strength of
the relationship between the nodes is reflected in the thickness and
the color of the edge.
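
The node and edge structure described on this slide can be sketched as a weighted edge list. This is a hedged Python illustration with made-up baskets and a hypothetical helper build_graph. It builds an undirected graph for simplicity and uses the co-purchase count as the edge weight, which a drawing tool could map to edge thickness.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def build_graph(baskets):
    """Return (nodes, edges): nodes are items, edges map an item pair to its co-purchase count."""
    nodes = set()
    edges = Counter()
    for basket in baskets:
        nodes.update(basket)
        # Sorting gives each unordered pair one canonical key, e.g. ("bread", "milk").
        for a, b in combinations(sorted(basket), 2):
            edges[(a, b)] += 1
    return nodes, edges

nodes, edges = build_graph(baskets)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}  weight={weight}")
```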

Data Input – Collaborative Filter
date_id – Basket date surrogate key

customer_id – customer identifier

store_id – store identifier

basket_id – basket identifier

product_id – identifier of the product in the basket

sales_quantity – quantity sold

discount_quantity – discount amount
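
To make the input layout concrete, here is a small Python sketch of rows with the columns listed above, grouped into baskets. The field types and sample values are assumptions (the slide gives no data types), and SalesFactRow and baskets_from_rows are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class SalesFactRow(NamedTuple):
    # Column names follow the slide; the types are assumed.
    date_id: int
    customer_id: int
    store_id: int
    basket_id: int
    product_id: int
    sales_quantity: int
    discount_quantity: float

# Made-up sample rows: basket 500 holds two products, basket 501 holds one.
rows = [
    SalesFactRow(20240101, 10, 1, 500, 7001, 2, 0.0),
    SalesFactRow(20240101, 10, 1, 500, 7002, 1, 0.5),
    SalesFactRow(20240102, 11, 1, 501, 7001, 1, 0.0),
]

def baskets_from_rows(rows):
    """Group product_id values by basket_id, one set of products per basket."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row.basket_id].add(row.product_id)
    return dict(baskets)

print(baskets_from_rows(rows))
```

Only basket_id and product_id matter for the pairing itself; the remaining columns are carried along in the fact table.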

SQL-MR Statement – Collaborative Filter
----------------------------------------------------
-- SQL-MR SYNTAX - COLLABORATIVE FILTER
----------------------------------------------------

SELECT *
FROM cfilter (
    ON (SELECT 1)
    PARTITION BY 1
    INPUTTABLE('sales_fact')
    OUTPUTTABLE('cf_sales_fact')
    INPUTCOLUMNS('product_id')
    JOINCOLUMNS('basket_id')
    DROPTABLE('true')
);

INPUTTABLE('sales_fact') – The table name of the input source.

OUTPUTTABLE('cf_sales_fact') – true converts all case to lower case.

INPUTCOLUMNS('product_id') – items that are associated that are in the
join column.

JOINCOLUMNS('basket_id') – the basket identifier or the unique identifier
of the inputcolumn items.

DROPTABLE('true') – Drop the OUTPUTTABLE on additional runs.
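
Conceptually, the statement pairs up INPUTCOLUMNS values that share a JOINCOLUMNS value. The Python sketch below illustrates that pairing on made-up rows. It is not Aster's implementation: the helper pair_counts is hypothetical, and counting distinct baskets and emitting each pair in both orders are assumptions made for this illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical rows shaped like sales_fact, reduced to the two columns the call names.
rows = [
    {"basket_id": 1, "product_id": "A"},
    {"basket_id": 1, "product_id": "B"},
    {"basket_id": 2, "product_id": "A"},
    {"basket_id": 2, "product_id": "B"},
    {"basket_id": 2, "product_id": "C"},
    {"basket_id": 3, "product_id": "A"},
]

def pair_counts(rows, input_column, join_column):
    """Count single items and item pairs across groups that share a join-column value."""
    # One set of distinct items per join-column value (per basket).
    groups = defaultdict(set)
    for row in rows:
        groups[row[join_column]].add(row[input_column])

    item_count = Counter()  # number of baskets containing each item
    both_count = Counter()  # number of baskets containing both items of an ordered pair
    for items in groups.values():
        item_count.update(items)
        both_count.update(permutations(sorted(items), 2))

    return [
        {"col1_item1": a, "col2_item2": b,
         "cntb": n, "cnt1": item_count[a], "cnt2": item_count[b]}
        for (a, b), n in sorted(both_count.items())
    ]

result = pair_counts(rows, input_column="product_id", join_column="basket_id")
for pair in result:
    print(pair)
```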

Output Review – Collaborative Filter
col1_item1 – Item 1 identifier

col2_item2 – Item 2 identifier

cntb – Count of co-occurrences of both items together

cnt1 – Count of item 1 within the partition

cnt2 – Count of Item 2 within the partition

score – Raw score (cntb * cntb)/(cnt1 * cnt2)

support – Support of a product or product bundle indicates the
popularity of the product or product bundle in the transaction
set. The higher the support, the more popular the product or
product bundle. This measure can help in identifying drivers of
traffic to the store.

confidence – Confidence can be used for product placement
strategy and increasing profitability. Place high-margin items
with associated high-selling (driver) items.

lift – Lift values greater than 1.0 indicate that transactions
containing Item B tend to contain Item A more often than
transactions that do not contain Item B.

z_score – Assuming cntb follows a normal distribution, the
z_score is (cntb - mean(cntb))/sd(cntb). It is a way to
measure how significant the co-occurrence is.
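
The derived columns above can be worked through on a small example. In the Python sketch below, score and z_score follow the formulas given on this slide. Support, confidence, and lift use the common association-rule definitions, which is an assumption about how Aster computes them. The total_baskets argument, the choice of population standard deviation, and the helper name add_metrics are also assumptions for illustration.

```python
from statistics import mean, pstdev

def add_metrics(pairs, total_baskets):
    """Return copies of the pair rows with score, support, confidence, lift, z_score added."""
    cntb_values = [p["cntb"] for p in pairs]
    mu = mean(cntb_values)
    sd = pstdev(cntb_values)  # assumption: population standard deviation
    out = []
    for p in pairs:
        cntb, cnt1, cnt2 = p["cntb"], p["cnt1"], p["cnt2"]
        support = cntb / total_baskets  # share of baskets holding both items
        expected = (cnt1 / total_baskets) * (cnt2 / total_baskets)  # if items were independent
        out.append({
            **p,
            "score": (cntb * cntb) / (cnt1 * cnt2),       # formula from the slide
            "support": support,
            "confidence": cntb / cnt1,                    # share of item-1 baskets that also hold item 2
            "lift": support / expected,                   # above 1.0: more co-occurrence than independence predicts
            "z_score": (cntb - mu) / sd if sd else None,  # undefined when every cntb is equal
        })
    return out

# Made-up counts for two item pairs out of 8 baskets.
pairs = [
    {"col1_item1": "A", "col2_item2": "B", "cntb": 2, "cnt1": 4, "cnt2": 2},
    {"col1_item1": "A", "col2_item2": "C", "cntb": 4, "cnt1": 4, "cnt2": 4},
]
metrics = add_metrics(pairs, total_baskets=8)
for row in metrics:
    print(row)
```

For the first pair this gives score 0.5, support 0.25, confidence 0.5, lift 2.0, and z_score -1.0. Because cntb can never exceed cnt1 or cnt2, the raw score always falls between 0 and 1.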

Review the Output – Collaborative Filter