
What is H2O?

H2O is a fast, scalable, open-source machine learning and deep learning platform for smarter
applications. Advanced algorithms, like Deep Learning, Boosting, and Bagging Ensembles, are
readily available for application designers to build smarter applications through elegant APIs.
Using in-memory compression techniques, H2O can handle billions of data rows in memory, even
with a fairly small cluster. The platform includes interfaces for R, Python, Scala, Java, JSON and
CoffeeScript/JavaScript, along with a built-in web interface, Flow, that makes it easier for
non-engineers to stitch together complete analytic workflows.
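
To give a feel for why compressed columns let more rows fit in memory, here is a minimal sketch of dictionary encoding, one generic columnar compression technique. It is an illustration only and not H2O's actual storage scheme; the function names and the sample column are made up for the example.

```python
from array import array


def dictionary_encode(values):
    """Encode a column of strings as one-byte codes plus a lookup table."""
    levels = sorted(set(values))  # distinct categories in a stable order
    if len(levels) > 256:
        raise ValueError("this sketch only handles up to 256 distinct levels")
    code_of = {level: i for i, level in enumerate(levels)}
    # One unsigned byte per row replaces a full Python string per row.
    codes = array("B", (code_of[v] for v in values))
    return levels, codes


def dictionary_decode(levels, codes):
    """Rebuild the original column from the lookup table and the codes."""
    return [levels[c] for c in codes]


# Hypothetical categorical column with two levels and 5000 rows.
column = ["signal", "background", "signal", "signal", "background"] * 1000
levels, codes = dictionary_encode(column)
print(len(levels), "levels,", len(codes) * codes.itemsize, "bytes of codes")
```

Each row costs a single byte here regardless of how long the category names are, which is the basic idea behind storing data column by column in compressed form.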

The platform was built alongside (and on top of) both Hadoop and Spark clusters and is
typically deployed within minutes. H2O implements almost all common machine learning
algorithms, such as generalized linear modeling (linear regression, logistic regression, etc.),
Naive Bayes, principal components analysis, time series, k-means clustering, and others. H2O
also implements best-in-class algorithms such as Random Forest, Gradient Boosting, and Deep
Learning at scale. Customers can build thousands of models and compare them to get the best
prediction results. H2O is nurturing a grassroots movement of physicists, mathematicians, and
computer and data scientists to herald the new wave of discovery with data science.

Academic researchers and industrial data scientists collaborate closely with our team to make
this possible. Stanford University professors Stephen Boyd, Trevor Hastie, and Rob Tibshirani
advise the H2O team on building scalable machine learning algorithms. With hundreds of meetups
over the past two years, H2O has become a word-of-mouth phenomenon, growing 100-fold within
the data community. It is now used by 12,000+ users and deployed in 2,000+ corporations using
R, Python, Hadoop and Spark.

Deep Learning has been dominating recent machine learning competitions with better
predictions. Unlike the neural networks of the past, modern Deep Learning has cracked the code
for training stability and generalization, and it scales on big data. It is the algorithm of choice
for the highest predictive accuracy.

H2O's Deep Learning Architecture


As described above, H2O follows the model of multi-layer, feedforward neural networks for
predictive modeling. This section provides a more detailed description of H2O's Deep Learning
features, parameter configurations, and computational implementation.
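
As a rough picture of what a multi-layer, feedforward network computes, the following plain-Python sketch pushes one input row through two hidden layers and a softmax output. It is a toy illustration, not H2O's Java implementation; the layer sizes, the tanh activation, the random weights, and all function names are assumptions made for the example.

```python
import math
import random


def affine(inputs, weights, biases):
    """Weighted sum plus bias for every neuron in a layer: W x + b."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw output scores into class probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, layers):
    """Hidden layers apply tanh; the last layer is linear followed by softmax."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = [math.tanh(z) for z in affine(activations, weights, biases)]
    out_weights, out_biases = layers[-1]
    return softmax(affine(activations, out_weights, out_biases))


rng = random.Random(0)
sizes = [4, 5, 5, 2]  # 4 inputs, two hidden layers of 5 neurons, 2 classes
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
probabilities = forward([0.2, -1.0, 0.5, 3.0], layers)
print(probabilities)
```

Information flows in one direction only, from the inputs through each hidden layer to the outputs, which is what "feedforward" means. Training adjusts the weights and biases so that the output probabilities match the labels.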

H2O's Deep Learning functionalities include:


purely supervised training protocol for regression and classification tasks
fast and memory-efficient Java implementations based on columnar compression and fine-grain Map/Reduce
multi-threaded and distributed parallel computation to be run on either a single node or a multi-node cluster
fully automatic per-neuron adaptive learning rate for fast convergence
optional specification of learning rate, annealing and momentum options
regularization options include L1, L2, dropout, Hogwild! and model averaging to prevent model overfitting
elegant web interface or fully scriptable R API from H2O CRAN package
grid search for hyper-parameter optimization and model selection
model checkpointing for reduced run times and model tuning
automatic data pre and post-processing for categorical and numerical data
automatic imputation of missing values
automatic tuning of communication vs computation for best performance
model export in plain Java code for deployment in production environments
additional expert parameters for model tuning
deep autoencoders for unsupervised feature learning and anomaly detection capabilities
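
The list above mentions a fully automatic per-neuron adaptive learning rate. H2O's documentation names ADADELTA as the method behind it. The sketch below shows the ADADELTA update rule in plain Python on a toy two-parameter objective; it is not H2O's implementation, and the function name, the objective, and the `rho` and `epsilon` values are assumptions picked for the illustration.

```python
import math


def adadelta_minimize(grad, x0, steps=500, rho=0.95, epsilon=1e-6):
    """Minimize a function from its gradient using per-parameter ADADELTA steps."""
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)  # running average of squared gradients
    avg_sq_step = [0.0] * len(x)  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            avg_sq_grad[i] = rho * avg_sq_grad[i] + (1 - rho) * g[i] ** 2
            # The step size is the ratio of the two running RMS values,
            # so every parameter gets its own effective learning rate.
            step = -(
                math.sqrt(avg_sq_step[i] + epsilon)
                / math.sqrt(avg_sq_grad[i] + epsilon)
            ) * g[i]
            avg_sq_step[i] = rho * avg_sq_step[i] + (1 - rho) * step ** 2
            x[i] += step
    return x


def loss(x):
    """Toy objective with very different curvature in each direction."""
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2


def loss_grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


start = [0.0, 0.0]
result = adadelta_minimize(loss_grad, start)
print("loss before:", loss(start), "loss after:", loss(result))
```

No global learning rate is set anywhere. Each parameter's step is scaled by its own gradient history, which is why the steep and the shallow direction both make progress without manual tuning.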

H2O is a Java ML framework built on top of Hadoop and Spark. Its aim is therefore large-scale
distributed (Big Data) ML/analytics in an enterprise setting.

So the reasons to choose H2O over scikit-learn are:

You develop in Java
You have a HUGE amount of data (100s of GB at least)
You work for a big enterprise
You have a Hadoop/Spark cluster at your disposal

H2O is an open-source predictive analytics platform that is fast, powerful, and easy to use. Using
a combination of extraordinary math and high-performance parallel processing, H2O allows you
to quickly create models for big data. The steps below show you how to download H2O and start
analyzing data at high speed.

What You'll Learn

How to download H2O (and Java too, if you just updated to OS X El Capitan)

How to use H2O with IPython Notebook & where to get demo scripts


A Delicious Drink of Water: Downloading H2O

(If you don't feel like reading the long version below, just go here.)

I recommend downloading the latest release of H2O (which is Bleeding Edge as of this
moment) because it has the most Python features, but you can also see the other releases, as well
as the software requirements.
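
Once H2O and Java are installed, a first session from Python or an IPython Notebook might look roughly like the sketch below. It assumes a recent `h2o` Python package (for example installed with `pip install h2o`), whose API differs from the 2015 release discussed here. The parameter values, the function name `train_deep_learning`, and the file and column names in the usage note are placeholders, not recommendations.

```python
# Illustrative settings only; tune them for real data.
DL_PARAMS = {
    "hidden": [50, 50],          # two hidden layers of 50 neurons each
    "epochs": 10,                # passes over the training data
    "adaptive_rate": True,       # per-neuron adaptive learning rate
    "l1": 1e-5,                  # L1 regularization
    "l2": 1e-5,                  # L2 regularization
    "input_dropout_ratio": 0.1,  # dropout on the input layer
    "seed": 42,
}


def train_deep_learning(csv_path, response, params=DL_PARAMS):
    """Start H2O, load a CSV file, and train a deep learning classifier."""
    # Imported here so this file can be loaded even where H2O is not installed.
    import h2o
    from h2o.estimators import H2ODeepLearningEstimator

    h2o.init()  # start or connect to a local H2O instance (requires Java)
    frame = h2o.import_file(csv_path)
    frame[response] = frame[response].asfactor()  # treat the response as categorical
    train, test = frame.split_frame(ratios=[0.8], seed=params["seed"])

    predictors = [name for name in frame.columns if name != response]
    model = H2ODeepLearningEstimator(**params)
    model.train(x=predictors, y=response, training_frame=train)
    return model, model.model_performance(test)
```

With a hypothetical file `events.csv` containing a `label` column, the call would be `model, performance = train_deep_learning("events.csv", "label")`.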

DISCLAIMER: I am not the author of this document, and the entire document is only
meant to be used as a wiki reference for local XENON1T RPI group members.
Furthermore, this document shall not be used as a reference in research publications
or proceedings or any copyrighted material.

References
1. Deep Learning with H2O by Arno Candel & Viraj Parmar
2. WPENGINE. https://blog.h2o.ai/2015/11/newbies-guide-python/
3. Data analysis in R, IML Machine Learning Workshop, CERN, by Andrew John Lowe,
Wigner Research Centre for Physics.
https://indico.cern.ch/event/595059/contributions/2522194/attachments/1432097/2200299/R-tutorial.pdf

4. https://www.stat.berkeley.edu/~ledell/docs/h2o_hpccon_oct2015.pdf
