

Cloudera Introduction to Data Science: Building Recommender Systems

Cloudera, the leader in Apache Hadoop-based software and services, offers intensive Hadoop training that arms you with the knowledge and skills to take full advantage of this powerful open-source technology.

Take your knowledge to the next level with Cloudera's Data Science Training and Certification

Data scientists build information platforms to ask and answer previously unimaginable questions. Learn how data science helps companies reduce costs, increase profits, improve products, retain customers and identify new opportunities. Cloudera University's three-day course helps participants understand what data scientists do and the problems they solve. Through in-class simulations, participants apply data science methods to real-world problems in different industries and, ultimately, prepare for data scientist roles in the field. Through lecture and interactive, hands-on exercises, attendees will cover topics such as:

> The growing need for and enablers of data science, the role of data scientists and vertical use cases and business applications
> Where and how to acquire data, methods for evaluating source data and data transformation and preparation
> Types of statistics and analytical methods and their relationship
> Machine learning fundamentals and breakthroughs, the importance of algorithms and data as a platform
> How to implement and manage recommenders using Apache Mahout and how to set up and evaluate data experiments
> Steps for deploying to production and tips for working at scale

Upon completion of the course, attendees receive a voucher for a Cloudera Certified Professional: Data Science exam. Certification is a great differentiator; it helps establish individuals as leaders in their field, providing customers with tangible evidence of skills and expertise.

This course is suitable for software engineers, data analysts and statisticians with basic knowledge of Apache Hadoop (HDFS, MapReduce, Hadoop Streaming and Apache Hive). Students should be proficient in a scripting language: Python is strongly preferred, but familiarity with Perl or Ruby is sufficient.


Course Outline: Cloudera Introduction to Data Science

Introduction

Data Science Overview
> What Is Data Science?
> The Growing Need for Data Science
> The Role of a Data Scientist

Use Cases
> Finance
> Retail
> Advertising
> Defense and Intelligence
> Telecommunications and Utilities
> Healthcare and Pharmaceuticals

Project Lifecycle
> Steps in the Project Lifecycle
> Lab Scenario Explanation

Data Acquisition
> Where to Source Data
> Acquisition Techniques

Evaluating Input Data
> Data Formats
> Data Quantity
> Data Quality

Data Transformation
> Anonymization
> File Format Conversion
> Joining Datasets

Data Analysis and Statistical Methods
> Relationship Between Statistics and Probability
> Descriptive Statistics
> Inferential Statistics

Fundamentals of Machine Learning
> Overview
> The Three Cs of Machine Learning
> Spotlight: Naïve Bayes Classifiers
> Importance of Data and Algorithms

Recommender Overview
> What Is a Recommender System?
> Types of Collaborative Filtering
> Limitations of Recommender Systems
> Fundamental Concepts

Introduction to Apache Mahout
> What Apache Mahout Is (and Is Not)
> A Brief History of Mahout
> Availability and Installation
> Demonstration: Using Mahout's Item-Based Recommender

Implementing Recommenders with Apache Mahout
> Overview
> Similarity Metrics for Binary Preferences
> Similarity Metrics for Numeric Preferences
> Scoring

Experimentation and Evaluation
> Measuring Recommender Effectiveness
> Designing Effective Experiments
> Conducting an Effective Experiment
> User Interfaces for Recommenders

Production Deployment and Beyond
> Deploying to Production
> Tips and Techniques for Working at Scale
> Summarizing and Visualizing Results
> Considerations for Improvement
> Next Steps for Recommenders

Conclusion

Appendix A: Hadoop Overview
Appendix B: Mathematical Formulas
Appendix C: Language and Tool Reference

Cloudera Certified Professional (CCP): Data Science

Establish yourself as an expert by completing the certification exam for data scientists. Cloudera Certified Professional: Data Science is the highest level of technical certification Cloudera offers. CCP: Data Science certifies your knowledge and skills as a data scientist using Apache Hadoop on large datasets. The credential requires both a written exam and a hands-on, performance-based exam including completion of a real-world data science challenge on a live system.

Cloudera, Inc. 220 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 |
© 2012 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.