
Apache Mahout

What is it?
How does it work?
Machine Learning
Algorithms
Install

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz

Mahout - What is it?

Machine learning
For large data
Based on Hadoop
But can work on a non-Hadoop cluster
Scalable
Licensed by Apache


Mahout - How does it work?

Uses Hadoop MapReduce
Has many supplied algorithms
Supports four use cases

  Recommendation mining
  Clustering
  Classification
  Frequent itemset mining
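
To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Python. It is not Mahout or Hadoop code. It counts how often each item appears across a set of baskets, which is a typical first step in frequent itemset mining. The function names (map_phase, shuffle, reduce_phase, count_items) and the sample baskets are assumptions made for illustration only.

```python
from collections import defaultdict

def map_phase(basket):
    """Emit an (item, 1) pair for every distinct item in one basket."""
    return [(item, 1) for item in sorted(set(basket))]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def count_items(baskets):
    """Run map, shuffle and reduce in sequence on an in-memory list of baskets."""
    pairs = [pair for basket in baskets for pair in map_phase(basket)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical sample data, not taken from the slides
BASKETS = [["milk", "bread"], ["milk", "eggs"], ["bread", "milk", "eggs"]]

print(count_items(BASKETS))
```

On a real cluster the map and reduce calls would run in parallel on different nodes; here they run one after another so that the data flow is easy to follow.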


Mahout - Machine Learning


Machine learning - what does it mean?

A branch of artificial intelligence
Systems that learn from data
Classify data after learning
Learn on training data sets
Generalisation - the ability to classify unseen data sets after learning
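
As a small illustration of learning and generalisation, the sketch below learns from a few labelled points and then classifies points it has not seen before. It uses a simple nearest-centroid classifier, chosen here for brevity and not because Mahout uses it. All names (train, classify, accuracy) and the sample data are assumptions.

```python
def train(samples):
    """Learn one centroid (mean point) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: dist(model[label]))

def accuracy(model, samples):
    """Fraction of labelled samples that the model classifies correctly."""
    hits = sum(1 for features, label in samples if classify(model, features) == label)
    return hits / len(samples)

# Hypothetical data: the model learns from TRAINING and is checked on UNSEEN
TRAINING = [([1.0, 1.0], "small"), ([1.5, 2.0], "small"),
            ([8.0, 9.0], "large"), ([9.0, 8.5], "large")]
UNSEEN = [([2.0, 1.0], "small"), ([8.5, 9.5], "large")]

model = train(TRAINING)
print(accuracy(model, UNSEEN))
```

Keeping the unseen points out of the learning step is what makes the accuracy figure a measure of generalisation.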


Mahout - Algorithms
Some of the available algorithms (among many others)

Collaborative filtering
  Narrow sense: make predictions about user interests by collecting preferences
  General: multi-agent collaboration for information filtering

Mean shift clustering
  Mode seeking, used for visual tracking

Parallel frequent pattern mining
  Find unique features
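
Collaborative filtering in the narrow sense can be sketched in a few lines of plain Python. The example below predicts how a user would rate an item by taking a similarity-weighted average of other users' ratings. It is a toy user-based approach with cosine similarity, not Mahout's recommender API. The names (cosine, predict) and the ratings table are assumptions.

```python
from math import sqrt

# Hypothetical ratings table: user -> {item: rating}
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        weight = cosine(ratings[user], prefs)
        num += weight * prefs[item]
        den += weight
    return num / den if den else None

print(predict(RATINGS, "alice", "D"))
```

Because bob's known ratings match alice's more closely than carol's do, bob's rating of item D carries more weight in the prediction.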


Mahout - Install
So how do we install Mahout and test it?

Install Maven

  sudo apt-get install maven3

Install Apache Mahout

  You will need subversion installed
  svn co http://svn.apache.org/repos/asf/mahout/trunk
  Go to the directory containing the pom.xml file
  mvn install    ## in ./trunk

Full details available in the Mahout install guide on our web site shop


Mahout - Test Install

So let us run a test

  cd $MAHOUT_HOME/examples/bin
  ./build-reuters.sh
  Choose option 1, kmeans clustering
  The run should finish with the output shown on the next slide
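
For readers who want to see what k-means clustering does in principle, here is a minimal sketch in plain Python. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. It runs on a handful of 2-D points, whereas the Reuters example clusters text documents, and it is not Mahout's implementation. The function name kmeans, the fixed iteration count and the sample points are assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means: repeat the assignment step and the centroid update step."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical sample data: two obvious groups of points
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
START = [[1, 1], [8, 8]]

centroids, clusters = kmeans(POINTS, START)
print(centroids)
```

The starting centroids are supplied by hand here; choosing them well is a separate problem that this sketch does not address.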


Mahout - Test Install

cd $MAHOUT_HOME/examples/bin ; ./build-reuters.sh
Please call cluster-reuters.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
.................................
Inter-Cluster Density: NaN
Intra-Cluster Density: 0.0
CDbw Inter-Cluster Density: NaN
CDbw Intra-Cluster Density: NaN
CDbw Separation: NaN

Contact Us

Feel free to contact us at

www.semtech-solutions.co.nz
info@semtech-solutions.co.nz

We offer IT project consultancy
We are happy to hear about your problems
You can just pay for those hours that you need to solve your problems