
Machine Learning with Microsoft Azure

Course Overview

Machine Learning in Context


In this course, we'll provide you with an introduction to the fantastic world of machine
learning (ML). We will first help you understand the overall place of machine
learning in the broader context of computer science. We'll take you through its
history, perspectives, approaches, challenges, essential tools, and fundamental
processes.

Training a Model
After that, we will focus on the core machine learning process: training a model. We'll
cover the entire chain of tasks, from data import, transformation, and management to
training, validating, and evaluating the model.

Foundational Concepts
Since this is an introductory course, our goal will be to give you an understanding of
foundational concepts. We'll talk about the fundamentals of supervised (classification
and regression) and unsupervised (clustering) approaches.

More Advanced Techniques


Then, building on these basics, we'll have a look at more advanced techniques
like ensemble learning and deep learning.

Classic Applications of ML
We'll also cover some of the best known specific applications of machine learning,
like recommendations, text classification, anomaly detection, forecasting, and feature
learning.
Managed Services, Cloud Computing, and Microsoft Azure
Many machine learning problems involve substantial requirements—things like model
management, computational resource allocation, and operationalization. Meeting all
these requirements on your own can be difficult and inefficient—which is why it's often
very beneficial to use Software as a Service (SaaS), managed services, and cloud
computing to outsource some of the work. And that's exactly what we'll be doing in this
course—specifically, we'll show you how to leverage Microsoft Azure to empower your
machine learning solutions.

Responsible AI
At the very end of the course, we'll talk about the broader impact of machine learning.
We'll discuss some of the challenges and risks that are involved with machine learning,
and then see how we can use principles of responsible artificial intelligence, such
as transparency and explainability, to help ensure our machine learning applications
generate positive impact, and avoid harming others.

Lesson 2:

Lesson Overview
In this lesson, our goal is to give you a high-level introduction to the field of machine
learning, including the broader context in which this branch of computer science exists.

Here are the main topics we'll cover:

 What machine learning is and why it's so important in today's world


 The historical context of machine learning
 The data science process
 The types of data that machine learning deals with
 The two main perspectives in ML: the statistical perspective and the computer
science perspective
 The essential tools needed for designing and training machine learning models
 The basics of Azure ML
 The distinction between models and algorithms
 The basics of a linear regression model
 The distinction between parametric vs. non-parametric functions
 The distinction between classical machine learning vs. deep learning
 The main approaches to machine learning
 The trade-offs that come up when making decisions about how to design and train machine learning models
In the process, you will also train your first machine learning model using Azure
Machine Learning Studio.

What is Machine Learning?


One of our goals in this lesson is to help you get a clearer, more specific understanding
of what machine learning is and how it differs from other approaches.

Let's start with a classic definition. If you look up the term in a search engine, you might
find something like this:

Machine learning  is a data science technique used to extract patterns from data, allowing
computers to identify related data, and forecast future outcomes, behaviors, and trends.
Let's break that down a little. One important component of machine learning is that we
are taking some data and using it to make predictions or identify important relationships.
But looking for patterns in data is done in traditional data science as well. So how does
machine learning differ? In this next video, we'll go over a few examples to illustrate the
difference between machine learning and traditional programming.
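To make the contrast concrete, here's a minimal sketch in Python (assuming scikit-learn is installed; the numbers and function names are illustrative):

```python
# Traditional programming: we hard-code the rules ourselves.
def multiply(a, b):
    return a * b  # the rule (multiplication) is written explicitly

# Machine learning: we provide data and historical answers,
# and the algorithm learns the rule from them.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]  # input data
y = [3, 6, 9, 12]         # historical answers (here, y = 3x)

model = LinearRegression().fit(X, y)  # the learned rule lives inside the model
print(model.predict([[5]]))           # ~[15.]
```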
QUESTION 1 OF 5

What type of approach is shown in this image?


Traditional Programming

Machine Learning

QUESTION 2 OF 5

What type of approach is shown in this image?



Traditional Programming

Machine Learning

QUESTION 3 OF 5

Imagine you want to create a function that multiplies two numbers together (e.g., given the inputs 2 and 3, the function will generate the output 6).

What approach is best suited to this problem?

Traditional Programming

Machine Learning

QUESTION 4 OF 5

Now imagine that you have some images that contain handwritten
numbers. You want to create a program that will recognize which number
is in each picture, but you're not sure exactly what characteristics can be
used to best tell the numbers apart.

Which is the best approach for creating this program?



Traditional programming

Machine learning

QUESTION 5 OF 5

In traditional programming, the inputs of hard-coded rules and data are used to arrive at the output of answers, but in machine learning the approach is quite different.

Mark all of the options below that are true statements about machine learning.

Data is input to train an algorithm

Historical answers are input to train an algorithm

Rules are explicitly programmed

Rules are the output learned by the algorithm


Applications of Machine Learning
The applications of machine learning are extremely broad! And the opportunities cut
across industry verticals. Whether the industry is healthcare, finance, manufacturing,
retail, government, or education, there is enormous potential to apply machine
learning to solve problems in more efficient and impactful ways.
We'll take a tour through some of the major areas where machine learning is applied,
mainly just to give you an idea of the scope and type of problems that machine
learning is commonly used to address.

Examples of Applied Machine Learning


Machine learning is used to solve an extremely diverse range of problems. For your
reference, here are all the examples we discussed in the video, along with links to
further reading in case you are curious and want to learn more about any of them:

Automate the recognition of disease


Trained physicians can only review and evaluate a limited volume of patients or patient images (X-rays, sonograms, etc.). Machine learning can be used to spot disease automatically, helping to reduce physician burnout. For example, Google has trained a deep learning model to detect breast cancer, and Stanford researchers have used deep learning models to diagnose skin cancer.

Recommend next best actions for individual care plans


With the mass digitization of patient data via systems that use EMRs (Electronic Medical
Records) and EHRs (Electronic Health Records), machine learning can be used to help
build effective individual care plans. For example, IBM Watson Oncology can help
clinicians explore potential treatment options. More examples of how machine learning
impacts healthcare can be found here.

Enable personalized, real-time banking experiences with chatbots


You've likely encountered this when you call a customer service number. Machine
learning can be used to intercept and handle common, straightforward issues through
chat and messaging services, so customers can quickly and independently resolve
simple issues that would otherwise have required human intervention. With the
chatbot, a customer can simply type in a question and the bot engages to surface the
answer. Refer to this article to find more information about machine learning-powered chatbots.

Identify the next best action for the customer


Real-time insights that incorporate machine learning tools—such as sentiment analysis
—can help organizations assess the likelihood of a deal closing or the level of a
customer’s loyalty. Personally-tailored recommendations powered by machine learning
can engage and delight customers with information and offers that are relevant to
them.

Capture, prioritize, and route service requests to the correct employee, and
improve response times
A busy government organization receives countless service requests every year. Machine learning tools can help capture incoming service requests, route them to the correct employee in real time, refine prioritization, and improve response times. You can check out this article if you're curious to learn more about ticket routing.

Brief History of Machine Learning


QUIZ QUESTION

There is often confusion between the terms machine learning, deep learning, and artificial intelligence. See if you can match each term with its description:
Artificial intelligence
Machine learning
Deep learning
DESCRIPTION
TERM
A broad term that refers to computers thinking more like humans.
A subcategory of artificial intelligence that involves learning from data without being
explicitly programmed.
A subcategory of machine learning that uses a layered neural-network architecture
originally inspired by the human brain.

Further Reading
 What’s the Difference Between Artificial Intelligence, Machine Learning and Deep
Learning? by Michael Copeland at NVIDIA

The Data Science Process


Big data has become part of the lexicon of organizations worldwide, as more and more
organizations look to leverage data to drive informed business decisions. With this
evolution in business decision-making, the amount of raw data collected, along with
the number and diversity of data sources, is growing at an astounding rate. This data
presents enormous potential.
Raw data, however, is often noisy and unreliable and may contain missing values and
outliers. Using such data for modeling can produce misleading results. For the data
scientist, the ability to combine large, disparate data sets into a format more
appropriate for analysis is an increasingly crucial skill.

The data science process typically starts with collecting and preparing the data before
moving on to training, evaluating, and deploying a model. Let's have a look.
QUESTION 1 OF 3

Here are the typical steps of the data science process that we just
discussed. Can you remember the correct order?
Deploy the model and then retrain as necessary
Train the model and evaluate its performance
Collect and prepare the data
STEP
DESCRIPTION
Steps 1 & 2
Steps 3 & 4
Steps 5 & 6

QUESTION 2 OF 3

Here are some of the steps once again, along with some of the actions
that you would carry out during those steps. Can you match the step with
the appropriate action?

(Again, first try to do it from memory—but have a look at the text or video
above if you get stuck.)
Prepare the data
Evaluate the model
Train the model
Deploy the model
ACTION
WHICH STEP OF THE PROCESS?
Package the model and dependencies
Run the model through a final exam using data from your validation data set
Create features needed for the model
Select the algorithm, and prepare training, testing, and validation data sets

QUESTION 3 OF 3

In machine learning, often you have to tune parameters for the chosen
learning algorithm to improve the performance on relevant metrics, such
as prediction accuracy. At what stage of the data science lifecycle do you
optimize the parameters?

Training the model

Evaluating the model

Deploying the model


Common Types of Data
It's All Numerical in the End
Note that although we've described numerical data as a distinct category, it is actually
involved in some way with all of the data types we've described. With the example of
stock performance (above), the stock prices are numerical data points. So why do we
give this as an example of "time-series data" rather than "numerical data"? It is the
ordering of the numerical data points across points in time that leads us to call the
data time-series data.
What is more, all data in machine learning eventually ends up being numerical,
regardless of whether it is numerical in its original form, so it can be processed by
machine learning algorithms.

For example, we may want to use gender information in the dataset to predict if an individual has heart disease. Before we can use this information with a machine learning algorithm, we need to transform the male vs. female categories into numbers, for instance, 1 means a person is male and 2 means a person is female, so the data can be processed. Note here that the values 1 and 2 do not carry any meaning.
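As a minimal sketch of this idea (assuming pandas is available; the data is made up for illustration):

```python
import pandas as pd

# Hypothetical patient data (illustrative only)
df = pd.DataFrame({"gender": ["male", "female", "female", "male"]})

# Map the categories to the arbitrary codes described above
df["gender_code"] = df["gender"].map({"male": 1, "female": 2})
print(df)
```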

Another example would be using pictures uploaded by customers to identify if they are
satisfied with the service. Pictures are not initially in numerical form but they will need
to be transformed into RGB values, a set of numerical values ranging from 0 to 255, to
be processed.
QUESTION 1 OF 2

Have a look at this graph:


Population of Greece since 1961 (Wikimedia Commons)
What type of data is this?

Numerical

Time-Series

Categorical

Text

QUESTION 2 OF 2

Have a look at this chart showing the number of people who like each
flavor of ice cream:
What type of data is this?

Numerical

Time-Series

Categorical

Text
Tabular Data

In machine learning, the most common type of data you'll encounter is tabular data—
that is, data that is arranged in a data table. This is essentially the same format as you
work with when you look at data in a spreadsheet.
Here's an example of tabular data showing some different clothing products and their
properties:

SKU     Make    Color  Quantity  Price
908721  Guess   Blue   789       45.33
456552  Tillys  Red    244       22.91
789921  A&F     Green  387       25.92
872266  Guess   Blue   154       17.56

Notice how tabular data is arranged in rows and columns.
QUESTION 1 OF 2

Looking at the table above, can you figure out what the rows vs. columns are for?

Each row describes a single product (e.g., a shirt), while each column describes a
property the products can have (e.g., the color of the product)

Each column describes a single product (e.g., a shirt), while each row describes a
property the products can have (e.g., the color of the product)

QUESTION 2 OF 2

Below are the components of a table. What does each of these components represent?
Cell
Row
Column
WHAT IT REPRESENTS
COMPONENT
An item or entity.
A property that the items or entities in the table can have.
A single value.

Vectors
It is important to know that in machine learning we ultimately always work with numbers, or more specifically vectors.
A vector is simply an array of numbers, such as (1, 2, 3), or a nested array that contains other arrays of numbers, such as (1, 2, (1, 2, 3)).
Vectors are used heavily in machine learning. If you have taken a basic course in linear
algebra, then you are probably in good shape to begin learning about how they are
used in machine learning. But if linear algebra and vectors are totally new to you, there
are some great free resources available to help you learn. You may want to have a look
at Khan Academy's excellent introduction to the topic here or check out Udacity's
free Linear Algebra Refresher Course.
For now, the main points you need to be aware of are that:

 All non-numerical data types (such as images, text, and categories) must
eventually be represented as numbers
 In machine learning, the numerical representation will be in the form of an array
of numbers—that is, a vector
As we go through this course, we'll look at some different ways to take non-numerical
data and vectorize it (that is, transform it into vector form).

2.8 Scaling Data

Scaling data means transforming it so that the values fit within some range or scale,
such as 0–100 or 0–1. There are a number of reasons why it is a good idea to scale your
data before feeding it into a machine learning algorithm.
Let's consider an example. Imagine you have an image represented as a set of RGB
values ranging from 0 to 255. We can scale the range of the values from 0–255 down to
a range of 0–1. This scaling process will not affect the algorithm output since every
value is scaled in the same way. But it can speed up the training process, because now
the algorithm only needs to handle numbers less than or equal to 1.

Two common approaches to scaling data include standardization and normalization.

Standardization
Standardization  rescales data so that it has a mean of 0 and a standard deviation of 1.
The formula for this is:

(x − μ)/σ
We subtract the mean (μ) from each value (x) and then divide by the standard deviation (σ). To understand why this works, it helps to look at an example. Suppose that we have
a sample that contains three data points with the following values:

50
100
150
The mean of our data would be 100, while the sample standard deviation would be 50.
Let's try standardizing each of these data points. The calculations are:

(50 − 100)/50 = -50/50 = -1


(100 − 100)/50 = 0/50 = 0
(150 − 100)/50 = 50/50 = 1
Thus, our transformed data points are:

-1
0
1
Again, the result of the standardization is that our data distribution now has a mean of
0 and a standard deviation of 1.
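Here's a quick sketch of the same calculation in Python (assuming NumPy is available):

```python
import numpy as np

x = np.array([50, 100, 150], dtype=float)

# Standardize: subtract the mean, then divide by the standard deviation.
# ddof=1 gives the sample standard deviation (50 here), matching the text.
z = (x - x.mean()) / x.std(ddof=1)
print(z)  # [-1.  0.  1.]
```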

Normalization
Normalization  rescales the data into the range [0, 1].
The formula for this is:

(x − x_min)/(x_max − x_min)
For each individual value, you subtract the minimum value (x_min) for that input in the training dataset, and then divide by the range of the values in the training dataset. The range of the values is the difference between the maximum value (x_max) and the minimum value (x_min).
Let's try working through an example with those same three data points:

50
100
150
The minimum value (x_min) is 50, while the maximum value (x_max) is 150. The range of the values is x_max − x_min = 150 − 50 = 100.
Plugging everything into the formula, we get:

(50 − 50)/100 = 0/100 = 0


(100 − 50)/100 = 50/100 = 0.5
(150 − 50)/100 = 100/100 = 1
Thus, our transformed data points are:

0
0.5
1
Again, the goal was to rescale our data into values ranging from 0 to 1—and as you can
see, that's exactly what the formula did.
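And here's the same normalization worked out in Python (again assuming NumPy):

```python
import numpy as np

x = np.array([50, 100, 150], dtype=float)

# Min-max normalization: rescale the values into the range [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())
print(x_norm)  # [0.  0.5 1. ]
```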
QUESTION 1 OF 3

Which of the below refers to standardization and which refers to normalization?
Normalization
Normalization
Standardization
Standardization
DESCRIPTION
STANDARDIZATION OR NORMALIZATION?
Rescales the data to have mean = 0 and standard deviation = 1
Rescales the data into the range [0, 1]
(x − x_min)/(x_max − x_min)
(x − μ)/σ

QUESTION 2 OF 3

Standardize the values -5, 10, 15, given that the mean is 7 and the standard deviation is 10.

-1.2, 0.3, 0.8

-1, 0.5, 0.8

-1.2, 0.3, 0.5

-1.2, 0.5, 0.8



QUESTION 3 OF 3

Normalize the values -5, 10, 15, given that the mean is 7 and the standard deviation is 10.

-0.5, 0.65, 1.0

0.0, 0.75, 1.0

0.5, 0.75, 0.8

0.0, 0.65, 0.8



2.9 Encoding Categorical Data
As we've mentioned a few times now, machine learning algorithms need to have data
in numerical form. Thus, when we have categorical data, we need to encode it in some
way so that it is represented numerically.
There are two common approaches for encoding categorical data: ordinal
encoding and one hot encoding.

Ordinal Encoding
In ordinal encoding, we simply convert the categorical data into integer codes ranging
from 0 to (number of categories – 1). Let's look again at our example table of
clothing products:
SKU     Make    Color  Quantity  Price
908721  Guess   Blue   789       45.33
456552  Tillys  Red    244       22.91
789921  A&F     Green  387       25.92
872266  Guess   Blue   154       17.56

If we apply ordinal encoding to the Make property, we get the following:


Make    Encoding
A&F     0
Guess   1
Tillys  2

And if we apply it to the Color property, we get:

Color   Encoding
Red     0
Green   1
Blue    2

Using the above encoding, the transformed table is shown below:


SKU     Make  Color  Quantity  Price
908721  1     2      789       45.33
456552  2     0      244       22.91
789921  0     1      387       25.92
872266  1     2      154       17.56

One of the potential drawbacks to this approach is that it implicitly assumes an order across the categories. In the above example, Blue (which is encoded with a value of 2) appears to be "greater than" Red (which is encoded with a value of 1), even though this is in fact not a meaningful way of comparing those values. This is not necessarily a problem, but it is a reason to be cautious about how the encoded data is used.
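For reference, here is one way to reproduce this encoding in Python (a sketch assuming pandas and scikit-learn; the explicit category orders are chosen to match the tables above):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "Make":  ["Guess", "Tillys", "A&F", "Guess"],
    "Color": ["Blue", "Red", "Green", "Blue"],
})

# Passing explicit category orders reproduces the codes used in the text.
encoder = OrdinalEncoder(categories=[["A&F", "Guess", "Tillys"],
                                     ["Red", "Green", "Blue"]])
df[["Make", "Color"]] = encoder.fit_transform(df[["Make", "Color"]])
print(df)
```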

One-Hot Encoding
One-hot encoding is a very different approach. In one-hot encoding, we transform
each categorical value into a column. If there are n categorical values, n new columns
are added. For example, the Color property has three categorical values: Red, Green,
and Blue, so three new columns Red, Green, and Blue are added.
If an item belongs to a category, the column representing that category gets the
value 1, and all other columns get the value 0. For example, item 908721 (first row in
the table) has the color blue, so we put 1 into that Blue column for 908721 and 0 into
the Red and Green columns. Item 456552 (second row in the table) has color red, so
we put 1 into that Red column for 456552 and 0 into the Green and Blue columns.
If we do the same thing for the Make property, our table can be transformed as
follows:
SKU     A&F  Guess  Tillys  Red  Green  Blue  Quantity  Price
908721  0    1      0       0    0      1     789       45.33
456552  0    0      1       1    0      0     244       22.91
789921  1    0      0       0    1      0     387       25.92
872266  0    1      0       0    0      1     154       17.56

One drawback of one-hot encoding is that it can potentially generate a very large
number of columns.
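A quick sketch of one-hot encoding in Python (assuming pandas; dtype=int requests 0/1 integer columns):

```python
import pandas as pd

df = pd.DataFrame({
    "Make":  ["Guess", "Tillys", "A&F", "Guess"],
    "Color": ["Blue", "Red", "Green", "Blue"],
})

# get_dummies adds one indicator column per category value.
encoded = pd.get_dummies(df, columns=["Make", "Color"], dtype=int)
print(encoded)
```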
QUESTION 1 OF 4

Have a look at this tabular data:

ID   Mammal  Reptile  Fish
012  1       0        0
204  0       0        1
009  0       1        0
105  1       0        0

What type of encoding has been performed on this?



Ordinal encoding

One-hot encoding
QUESTION 2 OF 4

Looking again at the table in the previous question, what category is animal 204?

Mammal

Reptile

Fish

QUESTION 3 OF 4

Again looking at the above animals table, suppose we do the following:

1. Add two new categories, Amphibian and Bird
2. Add one bird with ID 303 in the table
Which one of the following statements is correct about the new table?

There are 5 columns in the new table including the ID column

Animal 303 has 1 in the Mammal column

The Amphibian column has 0 for all animals

Animal 303 has 0 in the Bird column

QUESTION 4 OF 4

John is looking to train his first machine learning model. One of his inputs
includes the size of the T-Shirts, with possible values of XS, S, M, L, and XL.
What is the best approach John can employ to preprocess the T-Shirt size
input feature?

2.10 Image Data
Images are another example of a data type that is commonly used as input in machine
learning problems—but that isn't initially in numerical format. So, how do we represent
an image as numbers? Let's have a look.

Taking a Closer Look at Image Data


Let's look a little closer at how an image can be encoded numerically. If you zoom in on
an image far enough, you can see that it consists of small tiles, called pixels:
The color of each pixel is represented with a set of values:

 In grayscale images, each pixel can be represented by a single number, which typically ranges from 0 to 255. This value determines how dark the pixel appears (e.g., 0 is black, while 255 is bright white).
 In colored images, each pixel can be represented by a vector of three numbers (each ranging from 0 to 255) for the three primary color channels: red, green, and blue. These three red, green, and blue (RGB) values are used together to decide the color of that pixel. For example, purple might be represented as 128, 0, 128 (a mix of moderately intense red and blue, with no green).
The number of channels required to represent the color is known as the color
depth or simply depth. With an RGB image, depth = 3, because there are three
channels (Red, Green, and Blue). In contrast, a grayscale image has depth = 1,
because there is only one channel.

Encoding an Image
Let's now talk about how we can use this data to encode an image. We need to know
the following three things about an image to reproduce it:

 Horizontal position of each pixel
 Vertical position of each pixel
 Color of each pixel
Thus, we can fully encode an image numerically by using a vector with three
dimensions. The size of the vector required for any given image would be the height
* width * depth of that image.
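As a minimal sketch of this representation (assuming NumPy; the pixel values are random stand-ins for a real image):

```python
import numpy as np

height, width, depth = 4, 4, 3  # a tiny RGB image

# Random stand-in for real pixel data; each value ranges from 0 to 255.
image = np.random.randint(0, 256, size=(height, width, depth), dtype=np.uint8)

print(image.shape)  # (4, 4, 3) -> height * width * depth
print(image[0, 0])  # the [R, G, B] values of the top-left pixel
```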
QUESTION 1 OF 2

Assume this figure is the numerical representation of an RGB image in the red channel:
Each of the squares represents one pixel and the value in the square is
the pixel value.

Which of these statements is incorrect?



This image can be encoded by a vector with the dimension of 4*4*3

The total number of pixels in this image is 48

The numerical representation of the image in the green channel has the dimension of 4*4

The image has uniform aspect ratio but may need to be normalized.
QUESTION 2 OF 2

There is a square shaped RGB image that consists of 900 pixels. Which of the following statements are correct?

Without any preprocessing, the image can be encoded by a 3-dimension vector with
the dimension 45*20*3

This image has a dimension of 30*30

If the image is cropped to half of the original size, it can be encoded by a vector with
the dimension 15*15*2

If the image is converted to grayscale, it can be encoded by a vector with the dimension 30*30*1

Other Preprocessing Steps


In addition to encoding an image numerically, we may also need to do some other preprocessing steps. Generally, we would want to ensure that the input images have a uniform aspect ratio (e.g., by making sure all of the input images are square in shape) and are normalized (e.g., by subtracting the mean pixel value in a channel from each pixel value in that channel). Some other preprocessing operations we might want to do to clean the input images include rotation, cropping, resizing, denoising, and centering the image.

2.11 Text Data
Text is another example of a data type that is initially non-numerical and that must be
processed before it can be fed into a machine learning algorithm. Let's have a look at
some of the common tasks we might do as part of this processing.

Normalization
One of the challenges that can come up in text analysis is that there are often multiple
forms that mean the same thing. For example, the verb to be may show up
as is, am, are, and so on. Or a document may contain alternative spellings of a word,
such as behavior vs. behaviour. So one step that you will sometimes conduct in
processing text is normalization.
Text  normalization  is the process of transforming a piece of text into a canonical (official)
form.
Lemmatization is an example of normalization. A lemma is the dictionary form of a
word and lemmatization is the process of reducing multiple inflections to that single
dictionary form. For example, we can apply this to the is, am, are example we
mentioned above:
Original word   Lemmatized word
is              be
are             be
am              be

In many cases, you may also want to remove stop words. Stop words are high-
frequency words that are unnecessary (or unwanted) during the analysis. For example,
when you enter a query like which cookbook has the best pancake recipe into
a search engine, the words which and the are far less relevant
than cookbook, pancake, and recipe. In this context, we might want to
consider which and the to be stop words and remove them prior to analysis.
Here's another example:

Original text    Normalized text
The quick fox.   [quick, fox]
The lazzy dog.   [lazy, dog]
The rabid hare.  [rabid, hare]

Here we have tokenized the text (i.e., split each string of text into a list of smaller parts
or tokens), removed stop words (the), and standardized spelling
(changing lazzy to lazy).
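Here's a rough sketch of these steps in Python using NLTK (an assumption on our part: NLTK is installed and its wordnet and stopwords corpora have been downloaded via nltk.download):

```python
import string

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

text = "The quick fox."

# Tokenize: split the string into lowercase tokens, stripping punctuation.
tokens = [t.strip(string.punctuation) for t in text.lower().split()]

# Remove stop words (e.g., "the").
tokens = [t for t in tokens if t and t not in stop_words]

# Lemmatize: reduce each word to its dictionary form ("is"/"are"/"am" -> "be").
tokens = [lemmatizer.lemmatize(t, pos="v") for t in tokens]

print(tokens)                                # ['quick', 'fox']
print(lemmatizer.lemmatize("are", pos="v"))  # 'be'
```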
QUESTION 1 OF 5

Here's another example:

Original text                    Normalized text
Mary had a little lamb.          [Mary, have, a, little, lamb]
Jack and Jill went up the hill.  [Jack, and, Jill, go, up, the, hill]
London bridge is falling down.   [London, bridge, be, fall, down]

Looking at the normalized text, which of the following have been done?

Tokenization

Removal of stop words

Lemmatization

Vectorization
After we have normalized the text, we can take the next step of actually encoding it in a
numerical form. The goal here is to identify the particular features of the text that will
be relevant to us for the particular task we want to perform—and then get those
features extracted in a numerical form that is accessible to the machine learning
algorithm. Typically this is done by text vectorization—that is, by turning a piece of
text into a vector. Remember, a vector is simply an array of numbers—so there are
many different ways that we can vectorize a word or a sentence, depending on how we
want to use it. Common approaches include:
 Term Frequency-Inverse Document Frequency (TF-IDF) vectorization
 Word embedding, as done with Word2vec or Global Vectors (GloVe)
The details of these approaches are a bit outside the scope of this class, but let's take a
closer look at TF-IDF as an example. The approach of TF-IDF is to give less importance
to words that contain less information and are common in documents, such as "the"
and "this"—and to give higher importance to words that contain relevant information
and appear less frequently. Thus TF-IDF assigns weights to words that signify their
relevance in the documents.

Here's what the word importance might look like if we apply it to our example:

quick  fox   lazy  dog   rabid  hare  the
0.32   0.23  0.12  0.23  0.56   0.12  0.0

Here's what that might look like if we apply it to the normalized text:

               quick  fox   lazy  dog   rabid  hare
[quick, fox]   0.32   0.23  0.0   0.0   0.0    0.0
[lazy, dog]    0.0    0.0   0.12  0.23  0.0    0.0
[rabid, hare]  0.0    0.0   0.0   0.0   0.56   0.12

Notice that "the" has been removed, since it has 0 importance here.


Each chunk of text gets a vector (represented here as a row in the table) that is the
length of the total number of words that we are interested in (in this case, six words). If
the normalized text does not have the word in question, then the value in that position
is 0, whereas if it does have the word in question, it gets assigned to the importance of
the word.
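If you want to experiment with TF-IDF yourself, here's a minimal sketch using scikit-learn (the exact weights it produces will differ from the illustrative numbers above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["The quick fox.", "The lazy dog.", "The rabid hare."]

# stop_words="english" drops high-frequency words like "the".
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # the vocabulary (column order)
print(tfidf.toarray())                     # one weight vector per document
```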
QUESTION 2 OF 5

Let's pause to make sure this idea is clear. In the table above, what does
the value 0.56 mean?

It means that the word fox has some importance in [quick, fox].

It means that the word rabid has some importance in [quick, fox].

It means that the word fox has some importance in [rabid, hare].

It means that the word rabid has some importance in [rabid, hare].



QUESTION 3 OF 5

What vector will be used to represent [quick, lazy, hare]?



(0.32, 0.23, 0.12, 0.0, 0.0, 0.0)

(0.32, 0.0, 0.12, 0.0, 0.0, 0.12)


(0.0, 0.0, 0.12, 0.0, 0.56, 0.12)

(0.0, 0.0, 0.12, 0.23, 0.0, 0.12)



Feature Extraction
As we talked about earlier, the text in the example can be represented by vectors with
length 6 since there are 6 words total.
[quick, fox] as (0.32, 0.23, 0.0, 0.0, 0.0, 0.0)
[lazy, dog] as (0.0, 0.0, 0.12, 0.23, 0.0, 0.0)
[rabid, hare] as (0.0, 0.0, 0.0 , 0.0, 0.56, 0.12)
We understand the text because each word has a meaning. But how do algorithms
understand the text using the vectors, in other words, how do algorithms extract
features from the vectors?

Vectors of length n can be visualized as a line in an n-dimensional space. For example, a vector (1, 1) can be viewed as a line starting from (0, 0) and ending at (1, 1).
Any vector with the same length can be visualized in the same space. How close one
vector is to another can be calculated as vector distance. If two vectors are close to
each other, we can say the text represented by the two vectors have a similar meaning
or have some connections. For example, if we add [lazy, fox] to our example:
               quick  fox   lazy  dog   rabid  hare
[quick, fox]   0.32   0.23  0.0   0.0   0.0    0.0
[lazy, dog]    0.0    0.0   0.12  0.23  0.0    0.0
[rabid, hare]  0.0    0.0   0.0   0.0   0.56   0.12
[lazy, fox]    0.0    0.23  0.12  0.0   0.0    0.0

Intuitively, [lazy, fox] is more similar to [lazy, dog] than to [rabid, hare], so the vector distance between [lazy, fox] and [lazy, dog] is smaller than the distance between [lazy, fox] and [rabid, hare].
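We can check this with a quick sketch (assuming NumPy; the vectors are taken from the table above):

```python
import numpy as np

# Columns: quick, fox, lazy, dog, rabid, hare
lazy_fox   = np.array([0.0, 0.23, 0.12, 0.0, 0.0, 0.0])
lazy_dog   = np.array([0.0, 0.0, 0.12, 0.23, 0.0, 0.0])
rabid_hare = np.array([0.0, 0.0, 0.0, 0.0, 0.56, 0.12])

# Euclidean distance: a smaller value means the texts are more similar.
print(np.linalg.norm(lazy_fox - lazy_dog))    # ~0.33
print(np.linalg.norm(lazy_fox - rabid_hare))  # ~0.63
```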
QUESTION 4 OF 5

Imagine the words "monkey", "rabbit", "bird" and "raven" are represented
by vectors with the same length. Based on the meanings of the words,
which two words would we expect to have the smallest vector distance?

"monkey" and "rabbit"

"monkey" and "raven"

"rabbit" and "bird"

"raven" and "bird"


2.12 Two Perspectives on ML

Computer science vs. Statistical perspective


As you can see, data plays a central role in how problems are modeled in machine
learning. In very broad terms, we can think of machine learning as a matter of using
some data (perhaps historical data that we already have on hand) to train a model.
Then, once the model is trained, we can feed it new input data and have it tell us
something useful.

So the general idea is that we create models and then feed data into these models to
generate outputs. These outputs might be, for example, predictions for future trends
or patterns in the data.

This idea draws on work not only from computer science, but also statistics—and as a
result, you will often see the same underlying machine learning concepts described
using different terms. For example, a computer scientist might say something like:
We are using  input features  to create a  program  that can generate the desired  output.
In contrast, someone with a background in statistics might be inclined to say something
more like:
We are trying to find a  mathematical function  that, given the values of the  independent
variables  can predict the values of the  dependent variables.
While the terminology is different, the challenge is the same: how to get the best possible outcome.
QUIZ QUESTION

Can you match the terms below, from the computer science perspective,
with their counterparts from the statistical perspective?
independent variable
dependent variable
function
COMPUTER SCIENCE
STATISTICAL
program
input
output

In the end, having an understanding of the underlying concepts is more important than
memorizing the terms used to describe those concepts. However, it's still essential to
be familiar with the terminology so that you don't get confused when talking with
people from different backgrounds.

Over the next couple of pages, we'll take a look at these two different perspectives and
get familiar with some of the related terminology.

2.13 The Computer Science Perspective

Computer science terminology


As we discussed earlier, one of the simplest ways we can organize data for machine learning is in
a table, like the table of clothing products we looked at earlier in this lesson:

SKU     Make    Color  Quantity  Price
908721  Guess   Blue   789       45.33
456552  Tillys  Red    244       22.91
789921  A&F     Green  387       25.92
872266  Guess   Blue   154       17.56

What are some of the terms we can use to describe this data?

For the rows in the table, we might call each row an entity or an observation about an entity. In
our example above, each entity is simply a product, and when we speak of an observation, we are
simply referring to the data collected about a given product. You'll also sometimes see a row of
data referred to as an instance, in the sense that a row may be considered a single example (or
instance) of data.
For the columns in the table, we might refer to each column as a feature or attribute, which describes a property of an entity. In the above example, color and quantity are features (or attributes) of the products.

Input and output


Remember that in a typical case of machine learning, you have some kind of input which you
feed into the machine learning algorithm, and the algorithm produces some output. In most
cases, there are multiple pieces of data being used as input. For example, we can think of a single
row from the above table as a vector of data points:
(908721, Guess, Blue, 789, 45.33)
Again, in computer science terminology, each element of the input vector (such as Guess or Blue) is referred to as an attribute or feature. Thus, we might feed these input
features into our machine learning program and the program would then generate some kind of
desired output (such as a prediction about how well the product will sell). This can be
represented as:
Output = Program(Input Features)
An important step in preparing your data for machine learning is extracting the relevant features
from the raw data. (The topic of feature extraction is an important one that we'll dive into in
greater detail in a later lesson.)
QUESTION 1 OF 2

Have a look at this data:


ID  Name    Species  Age
1   Jake    Cat      3
2   Bailey  Dog      7
3   Jenna   Dog      4
4   Marco   Cat      12

Which of the following terms might we use to refer to the part of the table that is
highlighted?
(Select all that apply.)

A row

An attribute

An entity

An instance

An input vector

A feature
QUESTION 2 OF 2

And how about now?

ID  Name    Species  Age
1   Jake    Cat      3
2   Bailey  Dog      7
3   Jenna   Dog      4
4   Marco   Cat      12

Which of the following terms might we use to refer to the part of the table that is
highlighted?
(Select all that apply.)

A column


An attribute

An entity

An instance

A feature


The Statistical Perspective

Statistical terminology
In statistics, you'll also see the data described in terms of independent
variables and dependent variables. These names come from the idea that the value
of one variable may depend on the value of some other variables. For example, the
selling price of a house is the dependent variable that depends on some independent
variables—like the house's location and size.
In the example of clothing products we looked at earlier in this lesson:

SKU     Make    Color  Quantity  Price
908721  Guess   Blue   789       45.33
456552  Tillys  Red    244       22.91
789921  A&F     Green  387       25.92
872266  Guess   Blue   154       17.56

We might use data in each row (e.g. (908721, Guess, Blue, 789, 45.33)) to
predict the sale of the corresponding item. Thus, the sale of each item is dependent on
the data in each row. We can call the data in each row the independent variables and
call the sale the dependent variable.

Input and output


From a statistical perspective, the machine learning algorithm is trying to learn a
hypothetical function (f) such that:
Output Variable = f(Input Variables)
Typically, the independent variables are the input, and the dependent variables are the
output. Thus, the above formula can also be expressed as:
Dependent Variable = f(Independent Variables)
In other words, we are feeding the independent variables into the function, and the
function is giving us the resulting values of the dependent variables. With the housing
example, we might want to have a function that can take the independent variables
of size and location as input and use these to predict the likely selling price of the
house as output.
Yet another way to represent this concept is to use shorthand notation. Often, the
input variables are denoted as X and the output variable is denoted as Y:
Y = f(X)
In the case of multiple input variables, X would be an input vector, meaning that it would be composed of multiple individual inputs (e.g. (908721, Guess, Blue, 789, 45.33)). When this is the case, you'll see the individual inputs denoted with a subscript, as in X1, X2, X3, and so on.
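To make this concrete, here is a minimal sketch of learning an approximation of f with linear regression (the library choice and the housing numbers are illustrative assumptions, not part of the course):

```python
from sklearn.linear_model import LinearRegression

# Hypothetical housing data: X = (size in sq ft, location score), Y = price.
X = [[1500, 7], [2000, 8], [1200, 5], [1800, 6]]
Y = [300_000, 420_000, 215_000, 335_000]

model = LinearRegression().fit(X, Y)  # learn an approximation of f
print(model.predict([[1600, 7]]))     # predicted selling price for a new house
```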

The Tools for Machine Learning


Many tools have been developed to make machine learning more powerful and easier
to implement. On this page, we'll take a look at the typical components you might
employ in a machine learning ecosystem. You don't need to understand the details of
these tools at this stage, and we don't assume you've had previous experience with
them. Our goal at this point is simply to give you some idea of what some of the
popular tools are and how they relate to one another.

The Machine Learning Ecosystem


A typical machine learning ecosystem is made up of three main components:

1. Libraries. When you're working on a machine learning project, you likely will not
want to write all of the necessary code yourself—instead, you'll want to make use of
code that has already been created and refined. That's where libraries come in.
A library is a collection of pre-written (and compiled) code that you can make use of in
your own project. NumPy is an example of a library popularly used in data science,
while TensorFlow is a library specifically designed for machine learning. Read this article for some other useful libraries.
2. Development environments. A development environment is a software application
(or sometimes a group of applications) that provides a whole suite of tools designed to
help you (as the developer or machine learning engineer) build out your
projects. Jupyter Notebooks and Visual Studio are examples of development
environments that are popular for coding many different types of projects, including
machine learning projects.
3. Cloud services. A cloud service is a service that offers data storage or computing
power over the Internet. In the context of machine learning, you can use a cloud
service to access a server that is likely far more powerful than your own machine, or
that comes equipped with machine learning models that are ready for you to use. You can read more about different cloud services in this article.
For each of these components, there are multiple options you can choose from. Let's
have a look at some examples.

Notebooks
Notebooks were originally created as a documentation tool that others can use to reproduce experiments. Notebooks typically contain a combination of runnable code, output, formatted text, and visualizations. One of the most popular open-source notebooks used today by data scientists and data science engineers is the Jupyter Notebook, which can combine code, formatted text (markdown), and visualizations. Notebooks contain several independent cells that allow for the execution of code snippets within those cells. The output of each cell can be saved in the notebook and viewed by others.

End-to-end with Azure


You can analyze and train a small amount of data on your local machine using Jupyter Notebook, Visual Studio, or other tools. But with very large amounts of data, or when you need a faster processor, it's a better idea to train and test the model remotely using cloud services such as Microsoft Azure. You can use Azure Data Science Virtual Machine, Azure Databricks, Azure Machine Learning Compute, or SQL Server ML Services to train and test models, and use Azure Kubernetes Service to deploy models.
QUIZ QUESTION

Below are the development environments we just discussed. Can you match each one with its description?
Jupyter Notebooks
Azure Databricks
Visual Studio Code
Visual Studio
DESCRIPTION
DEVELOPMENT ENVIRONMENT
Microsoft's core development environment
Open-source tool that can combine code, markdown, and visualizations together in a
single document.
A light-weight code editor from Microsoft
Data analytics platform, optimized for use with Microsoft cloud services
Libraries for Machine Learning
For your reference, here are all the libraries we went over in the video. This is a lot of
info; you should not feel like you need to be deeply knowledgeable about every detail of
these libraries. Rather, we suggest that you become familiar with what each library
is for, in general terms. For example, if you hear someone talking about matplotlib, it
would be good for you to recognize that this is a popular library for data visualization.
Or if you see a reference to TensorFlow, it would be good to recognize this as a popular
machine learning library.
Core Framework and Tools
 Python is a very popular high-level programming language that is great for data
science. Its ease of use and wide support within popular machine learning platforms,
coupled with a large catalog of ML libraries, has made it a leader in this space.
 Pandas is an open-source Python library designed for analyzing and
manipulating data. It is particularly good for working with tabular data and time-series
data.
 NumPy, like Pandas, is a Python library. NumPy provides support for large,
multi-dimensional arrays of data, and has many high-level mathematical functions that
can be used to perform operations on these arrays.
Machine Learning and Deep Learning
 Scikit-Learn is a Python library designed specifically for machine learning. It is
designed to be integrated with other scientific and data-analysis libraries, such
as NumPy, SciPy, and matplotlib (described below).
 Apache Spark is an open-source analytics engine that is designed for cluster-
computing and that is often used for large-scale data processing and big data.
 TensorFlow is a free, open-source software library for machine learning built
by Google Brain.
 Keras is a Python deep-learning library. It provides an Application Programming
Interface (API) that can be used to interface with other libraries, such as TensorFlow, in
order to program neural networks. Keras is designed for rapid development and
experimentation.
 PyTorch is an open source library for machine learning, developed in large part
by Facebook's AI Research lab. It is known for being comparatively easy to use,
especially for developers already familiar with Python and a Pythonic code style.
Data Visualization
 Plotly is not itself a library, but rather a company that provides a number of
different front-end tools for machine learning and data science—including an open
source graphing library for Python.
 Matplotlib is a Python library designed for plotting 2D visualizations. It can be
used to produce graphs and other figures that are high quality and usable in
professional publications. You'll see that the Matplotlib library is used by a number of
other libraries and tools, such as SciKit Learn (above) and Seaborn (below). You can
easily import Matplotlib for use in a Python script or to create visualizations within a
Jupyter Notebook.
 Seaborn is a Python library designed specifically for data visualization. It is based
on matplotlib, but provides a more high-level interface and has additional features for
making visualizations more attractive and informative.
 Bokeh is an interactive data visualization library. In contrast to a library like
matplotlib that generates a static image as its output, Bokeh generates visualizations in
HTML and JavaScript. This allows for web-based visualizations that can have interactive
features.
QUIZ QUESTION

Below are some of the libraries we just went over. See if you can match
each library with its main focus.
Machine learning
Data visualization
Data visualization
Analyzing/manipulating data
Machine learning
LIBRARY
WHAT IS IT FOR?
TensorFlow
Matplotlib
Pandas
PyTorch
Bokeh
