
Course Title: Data Science

Course Code: CoSc 613


Credit Hours: 3 (lab 2 hrs.)
Instructor: Michael Melese
Course Description
Data Science is a dynamic and fast-growing field at the interface of Statistics and Computer
Science. The emergence of massive datasets containing millions or even billions of observations
provides the primary impetus for the field. Such datasets arise, for instance, in large-scale retailing,
telecommunications, astronomy, and internet social media. This course will emphasize practical
techniques for working with large-scale data. Specific topics covered will include statistical
modeling and machine learning, data pipelines, programming languages, "big data" tools, and
real-world topics and case studies.

Course Objectives:
At the end of this course, the student will be able to:
• Describe what Data Science is and the skill sets needed to be a data scientist.
• Explain, in basic terms, what Statistical Inference means.
• Identify probability distributions commonly used as foundations for statistical modeling.
• Understand different machine learning algorithms.
• Describe the Data Science Process and how its components interact.
• Use APIs and other tools to scrape, collect and analyze data.
• Create effective visualizations of given data.
Course Content

Chapter One: Introduction: What is Data Science?


• Big Data and Data Science hype
• Why now? – Datafication
• Current landscape of perspectives
• Skill sets needed
Chapter Two: Statistical Inference

• Populations and samples
• Statistical modeling, probability distributions, fitting a model
• Introduction to R
• Exploratory Data Analysis and the Data Science Process
Chapter Three: Basic Machine Learning Algorithms (Article on Data Science & ML algorithms)
• Linear Regression
• k-Nearest Neighbors (k-NN)
• k-means
• Support Vector Machine (SVM)
• Naive Bayes
• Principal Component Analysis
• Machine Learning Algorithms and Usage in Applications
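
As a flavor of the algorithms listed in Chapter Three, the following is a minimal sketch of k-Nearest Neighbors classification. It is written in Python for illustration only (the practical sessions may use R), and the function name `knn_predict` and the toy data are assumptions, not course material.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated clusters
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

The choice of k and of the distance measure are modeling decisions; this sketch uses k = 3 and Euclidean distance only because they are common defaults.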
Chapter Four: Recommendation Systems: Building a User-Facing Data Product
• Algorithmic ingredients of a Recommendation Engine
• Dimensionality Reduction
• Singular Value Decomposition
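
To illustrate how Singular Value Decomposition relates to dimensionality reduction, the sketch below approximates the largest singular value and its singular vectors by power iteration, which yields a rank-1 summary of a small user-by-item ratings matrix. The function name `top_singular_triplet` and the ratings are hypothetical; in practice a library routine would normally be used, and this simple version assumes the starting vector is not orthogonal to the leading singular vector.

```python
import math


def top_singular_triplet(A, iterations=100):
    """Approximate the largest singular value of A and its singular vectors."""
    rows, cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-length starting guess
    for _ in range(iterations):
        # u = A v
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u, i.e. one power-iteration step on A^T A
        w = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Recover the left singular vector and the singular value from A v
    u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical ratings: rows are users, columns are items
ratings = [
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
]

sigma, u, v = top_singular_triplet(ratings)
# Rank-1 approximation: entry (i, j) is sigma * u[i] * v[j]
approx = [[sigma * ui * vj for vj in v] for ui in u]
for row in approx:
    print([round(x, 2) for x in row])
```

Keeping only the leading singular direction compresses each user and each item to a single number; retaining more singular values would give a closer approximation at the cost of more dimensions.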
Chapter Five: Data Visualization
• Basic principles, ideas and tools for data visualization
• Discussion on industry projects
• Visualization of a complex dataset
Teaching Methods:
• Lectures and discussions
• Review of research papers and literature
• Practical sessions

Assessment Methods:
Exercises, Assignments and Projects 20%
Literature review 20%
Presentations 20%
Written Examination 40%

Text Books:

• Cathy O’Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline.
O’Reilly. 2014.
Reference Books:

• Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1,
Cambridge University Press. 2014.
• Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020.
2013.
• Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know
about Data Mining and Data-analytic Thinking. ISBN 1449361323. 2013.
• Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning,
Second Edition. ISBN 0387952845. 2009.
• Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science.
• Mohammed J. Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental
Concepts and Algorithms. Cambridge University Press. 2014.
• Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, Third
Edition. ISBN 0123814790. 2011.