
In the next few lessons I'd like to talk about machine learning system design. These lessons will touch on

the main issues that you may face when designing a complex machine learning system. And even though

the next set of videos may seem somewhat less mathematical, I think that this material may turn out to

be very useful, and potentially huge time savers when you're building big machine learning systems.

Concretely, I'd like to begin with the issue of prioritizing how to spend your time on what to work on, and

I'll begin with an example on spam classification. Let's say you want to build a spam classifier. Here are a

couple of examples of obvious spam and non-spam emails

Notice how spammers will deliberately misspell words, like 'Vincent' written with a 1, or 'mortgages' written with a 0 instead of an o. Let's say we have a labeled training set of some number of spam emails (labeled y = 1) and some number of non-spam emails (labeled y = 0). How do we build a classifier using supervised learning to distinguish between spam and non-spam?

In order to apply supervised learning, the first decision we must make is how to represent x, that is, the features of the email. Given the features x and the labels y in our training set, we can then train a classifier, for example using logistic regression. Here's one way to choose a set of features for our emails: we could come up with a list of maybe a hundred words that we think are indicative of whether an email is spam or non-spam, for example the words

So, if a piece of email contains the word 'deal' or 'buy' or 'discount', maybe it's more likely to be spam, whereas if a piece of email contains your name, it is more likely to be a non-spam email.

Given the word features, we can then take a piece of email and encode it into a feature vector as follows

Given a piece of email like the one shown, I'm going to check whether or not each of the feature words appears in the email, and then define a feature vector x as shown. In general, I'm not going to count how many times each word occurs; the feature is just 1 if the word appears and 0 otherwise.

So, that gives me a feature representation of a piece of email. By the way, even though I've described this process as manually picking a hundred words, in practice what's most commonly done is to look through a training set and pick the n most frequently occurring words, where n is usually between 10,000 and 50,000, and use those as your features.
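The vocabulary-building and encoding steps above can be sketched as follows. This is a minimal illustration, assuming simple whitespace tokenization; real systems also normalize punctuation and handle much larger corpora.

```python
from collections import Counter

def build_vocabulary(training_emails, n=10000):
    """Pick the n most frequently occurring words across the training set."""
    counts = Counter()
    for email in training_emails:
        counts.update(email.lower().split())
    return [word for word, _ in counts.most_common(n)]

def encode(email, vocabulary):
    """Binary feature vector: x_j = 1 if word j appears in the email, 0 otherwise.
    We record presence, not counts."""
    words = set(email.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

vocab = build_vocabulary(["buy a discount deal now", "hi mom see you soon"], n=5)
x = encode("great discount deal", vocab)
```

The feature vector's length is fixed by the vocabulary, so every email maps to a vector of the same dimension regardless of its own length.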

Now, if you're building a spam classifier one question that you may face is, what's the best use of your

time in order to make your spam classifier have higher accuracy, or lower error? One natural inclination is to go out and collect lots of data, because there's a tendency to think that more data gives a better algorithm. In fact, in the email spam domain, there are fairly serious projects, called honeypot projects, which create fake email addresses to collect tons of spam email and use it to train learning algorithms.

The problem is that collecting more data is not always helpful. So, for most machine learning problems,

there are a lot of other things you could usually imagine doing to improve performance. For spam, one

thing you might think of is to develop more sophisticated features on the email, maybe based on the

email routing information. And this would be information contained in the email header. So, when

spammers send email, very often they will try to obscure the origins of the email, perhaps using fake email headers or sending the email through unusual sets of computer servers and unusual routes, in order to get the spam to you. Some of this information will be reflected in the email header. Hence, by looking at the email headers it's possible to develop more sophisticated features to identify whether something is spam. Something else you might consider is looking at the email message body, that is the email text, and trying to develop more sophisticated features there. For example, should the word 'discount' and the word 'discounts' be treated as the same word? Should we treat the words 'deal' and 'dealer' as the same word? We could also develop more complex features based on punctuation, because maybe spam uses exclamation marks and misspelled words a lot more. Summarizing the spam classification options

The point is that in machine learning problems there are usually many options available, and what often happens is that a research group or product team will somewhat randomly fixate on one of them. Sometimes that turns out not to be the most fruitful way to spend your time. Hence, what I'd like to tell you about in the next lesson is the concept of error analysis: a more systematic way to choose among the many different things you might work on, and therefore be more likely to select what is actually a good use of your time in machine learning system design.

In the last lesson, I mentioned that when faced with a machine learning problem, there are often lots of different ideas for how to improve the algorithm. In this lesson let's talk about the concept of error analysis, which will help you be more systematic when making some of these decisions. If you're starting work on a machine learning product or building a machine learning application, it is often considered very good practice to start, not by building a very complicated system with lots of complex features, but by building a very simple algorithm that you can implement quickly.

And when I start on a learning problem, what I usually do is spend at most one day, literally at most 24 hours, trying to get something really quick and dirty, not at all a sophisticated system, and then implement it and test it on my cross-validation data. Once you've done that, you can plot learning curves to try to figure out whether your learning algorithm may be suffering from high bias, high variance, or something else, and use that to decide whether more data, more features, and so on are likely to help.

And the reason that this is a good approach is often when you're just starting out on a learning problem,

there's really no way to tell in advance whether you need more complex features or whether you need

more data or something else. And it's often by implementing even a very quick and dirty implementation and plotting learning curves that you get the insight to make these decisions. So, you can think of this as a way of avoiding what's sometimes called premature optimization in computer programming.

In addition to plotting learning curves, one other thing that's often very useful to do is what's called error

analysis. And what I mean by that is that when building, say a spam classifier, I will often look at my cross

validation set and manually look at the emails that my algorithm is making errors on. So, look at the spam emails and non-spam emails that the algorithm is misclassifying, and see if you can spot any systematic patterns in the types of examples it gets wrong. Often this process will inspire you to design new features.

Note: error analysis is usually done on the cross-validation set rather than on the test set.

Summarizing

Concretely, here's a specific example. Let's say you've built a spam classifier and you have 500 examples

in your cross-validation set

(i) The idea is to categorize the misclassified emails. For example, for 100 misclassified emails we might get something like this

And by counting up the number of emails in these different categories you might discover, for example,

that the algorithm is doing really particularly poorly on emails trying to steal passwords, and that may

suggest that it might be worth your effort to look more carefully at that type of email, and see if you can

come up with better features to categorize them correctly.
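The counting step can be sketched with a tally over hand-assigned categories. The category names and counts below are hypothetical, made up for illustration, but they mimic the kind of breakdown described in the text, with password-stealing emails as the dominant error type.

```python
from collections import Counter

# Hypothetical hand-assigned categories for 100 misclassified
# cross-validation emails (labels and counts invented for illustration).
categories = (["pharma"] * 12 + ["replica/fake"] * 4
              + ["steal passwords"] * 53 + ["other"] * 31)

tally = Counter(categories)
for category, count in tally.most_common():
    print(f"{category}: {count}")
```

A breakdown like this immediately points to the category worth attacking first.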

(ii) Also, I might look at what features could have helped the algorithm classify the emails. So let's say we have these features with some frequencies

And if this is what you get on your cross-validation set, then it really tells you that maybe deliberate misspelling is a sufficiently rare phenomenon that it is not worth your time trying to write algorithms to detect it. But if you find that a lot of spammers are using unusual punctuation, then maybe that's a strong sign that it would be worth your while to develop more sophisticated features based on that.

So, this sort of error analysis which is really the process of manually examining the mistakes that the

algorithm makes, can often help guide you to the most fruitful avenues to pursue. And this also explains

why I often recommend implementing a quick and dirty implementation of an algorithm. What we really

want to do is figure out what are the most difficult examples for an algorithm to classify. And very often

for different learning algorithms, they'll often find similar categories of examples difficult. And by having

a quick and dirty implementation, that's often a quick way to let you identify some errors and identify

what are the hard examples so that you can focus your efforts on those.

Lastly, when developing learning algorithms, one other useful tip is to make sure that you have a way of numerically evaluating your learning algorithm. What I mean is that if you're developing a learning algorithm, it is often incredibly helpful to have a way of evaluating it that gives you back a single real number, maybe accuracy, maybe error, but a single real number that tells you how well your learning algorithm is doing. I'll talk more about this specific concept in later lessons, but here's a specific example.

Let's say we are trying to decide whether or not we should treat words like discount, discounts, discounter, and discounting as the same word. One way to do that is to look at just the first few characters of a word. In natural language processing, this is done using a type of software called a stemmer (for example, the Porter stemmer). But stemming software that basically looks only at the first few characters can hurt, because some information is lost. So if you're trying to decide whether or not to use stemming software for a spam classifier, error analysis alone may not actually be helpful. Instead, the best way to figure out whether stemming helps your classifier is to have a way of numerically evaluating your algorithm. Concretely, maybe the most natural thing to do is to look at the cross-validation error of the algorithm's performance with and without stemming. So, if you run your algorithm with and without stemming, you might end up with

In this case it looks like using stemming is a good idea. For this particular problem, there's a very natural single real number evaluation metric, namely the cross-validation error. We'll see examples later where coming up with a single-number evaluation metric needs a little more work, but in some cases it is straightforward. For example, we can use this same sort of evaluation to decide whether we should treat lowercase and uppercase words differently
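In practice one would use a real stemmer such as the Porter stemmer (available, for example, in NLTK). The crude suffix-stripper below is only a stand-in to illustrate the idea of mapping word variants to a common stem:

```python
def crude_stem(word):
    """Very rough stand-in for a real stemmer (e.g. Porter's):
    strip a few common English suffixes. This loses information,
    which is exactly why the decision to stem or not should be
    made by comparing cross-validation error with and without it."""
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word
```

With this mapping, 'discounts', 'discounter', and 'discounting' all collapse to the single feature 'discount', shrinking the feature space at the cost of conflating some genuinely different words.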

So when you're developing a learning algorithm, very often you'll be trying out lots of new ideas and lots of new versions. And by having a single real number evaluation metric, you can just look and see whether a new idea has improved or worsened the performance of the learning algorithm, and this will often let you make much faster progress.

In the previous lesson, I talked about error analysis and the importance of having error metrics, which is

the use of a single real number evaluation metric for your learning algorithm to tell how well it's doing. In

the context of evaluation and of error metrics, there is one important case, where it's particularly tricky

to come up with an evaluation metric for your learning algorithm. That case is the case of what's called

skewed classes.

Consider the problem of cancer classification, where we have features of medical patients and we want

to decide whether or not they have cancer. So this is like the malignant versus benign tumor classification

example that we had earlier. So, we have this setting

So, we're making 99% correct diagnoses. That seems like a really impressive result, because we're correct 99% of the time. But now, let's say we find out that only 0.5% of the patients in our training/test sets actually have cancer. So only half a percent of the patients that come through our screening process have cancer. In this case, the 1% error no longer looks so impressive. In particular, here's a piece of non-learning code that takes the input features x and ignores them

It always predicts that nobody has cancer, and this "algorithm" would actually get 0.5% error. So this is even better than the 1% error we were getting just now, and it is a non-learning algorithm that just predicts y = 0 all the time.
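The point can be made concrete with a tiny sketch. The 0.5% positive rate matches the text; the labels themselves are synthetic:

```python
def predict_cancer(x):
    """Non-learning baseline: ignore the features and always predict y = 0."""
    return 0

# Synthetic, heavily skewed labels: 5 positives out of 1000 (0.5%).
labels = [1] * 5 + [0] * 995

errors = sum(predict_cancer(None) != y for y in labels)
error_rate = errors / len(labels)
print(error_rate)  # 0.005, i.e. 0.5% error without learning anything
```

Despite doing no learning at all, this baseline beats the 1% error of the trained classifier, which is exactly why raw accuracy is misleading here.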

This setting, where the ratio of positive to negative examples is very close to one of the two extremes, in this case where the number of positive examples is much smaller than the number of negative examples (y = 1 is very rare), is what we call the case of skewed classes. We just have many more examples from one class than from the other, and by always predicting the most frequent class, an algorithm can do pretty well on classification error.

So, with skewed classes a single error-based evaluation metric cannot tell you whether the algorithm is doing well or is just defaulting to the most frequent class. One evaluation metric that handles this problem is called precision/recall.

Let's say we are evaluating a binary classifier on the test set, and let's assume that we have skewed classes where y = 1 denotes the rarer class (a widely used convention in precision/recall evaluation).

False positive: the algorithm classified a negative example as a positive one. It saw a common example

as rare. For example it classified a patient without cancer as having cancer.

False negative: the algorithm classified a positive example as a negative one. It saw a rare example as

common. For example it classified a patient with cancer as healthy.

Given these definitions, here's a different way of evaluating the performance of our algorithm. We're going to compute two numbers. The first is called precision: of all the examples for which we predicted y = 1, what fraction actually have y = 1? That is, precision = true positives / (true positives + false positives). The second number is called recall: of all the examples that actually have y = 1, what fraction did we correctly predict as y = 1? That is, recall = true positives / (true positives + false negatives).

So by computing precision and recall we will usually get a better sense of how well our classifier is doing. In particular, if we have a learning algorithm that predicts y = 0 (the most frequent class) all the time, then this classifier will have precision and recall equal to zero (taking 0/0 as zero for precision), because there won't be any true positives, and so that's a quick way for us to recognize that it isn't a very good classifier (taking y = 1 as the rare class).

No false positives (the algorithm makes no positive predictions); a few false negatives (the few examples that were actually positive but that we predicted as negative); and all the rest true negatives.
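These two numbers can be sketched as a small function over predictions and labels, with 0/0 reported as zero so the degenerate predict-always-0 classifier scores (0, 0):

```python
def precision_recall(predictions, labels):
    """precision = TP / (TP + FP); recall = TP / (TP + FN).
    Divisions by zero are reported as 0.0."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# The "cheating" classifier that always predicts y = 0 scores (0, 0):
print(precision_recall([0, 0, 0, 0, 0], [1, 0, 0, 1, 0]))
```

A classifier that makes one correct positive prediction out of two actual positives would instead score precision 1.0 and recall 0.5, which matches the definitions above term by term.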

And by using precision and recall, we find that it's not possible for an algorithm to "cheat" by predicting y = 0 or y = 1 all the time and still get both high precision and high recall. In particular, if a classifier is getting high precision and high recall, then we can actually be confident that the algorithm is doing well, even if we have very skewed classes. So for problems with skewed classes, precision and recall give us more direct insight into how the learning algorithm is doing, and this is often a much better way to evaluate our learning algorithms than looking just at classification error or classification accuracy.

In the last lesson, we talked about precision and recall as an evaluation metric for classification problems with skewed classes. For many applications, we'll want to somehow control the trade-off between precision and recall. Let me tell you how to do that, and also show you some even more effective ways to use precision and recall as an evaluation metric for learning algorithms.

As a reminder, here are the definitions of precision and recall from the previous lesson

And let's continue our cancer classification example, where y = 1 if the patient has cancer and y = 0 otherwise. Let's say we've trained a logistic regression classifier, which outputs probabilities between zero and one. So, as usual, we're going to predict one if h(x) ≥ 0.5 and predict zero if h(x) < 0.5, and this classifier will give us some value for precision and some value for recall.

But now, suppose we want to predict that a patient has cancer only if we're very confident that they

really do. Because you know if you go to a patient and you tell them that they have cancer, it's going to

give them a huge shock because this is seriously bad news and they may end up going through a pretty

painful treatment process. And so maybe we want to tell someone that we think they have cancer only if

we're very confident. One way to do this would be to modify the algorithm, so that instead of setting the

threshold at 0.5, we might instead say that we'll predict y = 1 only if h(x) ≥ 0.7

If you do this, then you're predicting cancer only when you're more confident (raising the bar on the rare class to avoid false positives), and so you end up with a classifier that has higher precision: the patients you predict as having cancer are now quite likely to actually have it. A higher fraction of the patients you flag will turn out to have cancer, because you restricted the number of positive predictions by raising the threshold for predicting y = 1. But in contrast, this classifier will have lower recall, because we are going to predict y = 1 on a smaller number of patients, and the number of false negatives increases.

We could even take this further. Instead of setting the threshold at 0.7, we can set it at 0.9, and we'll predict y = 1 only if we are more than 90% certain that the patient has cancer. Then a large fraction of the patients classified as y = 1 will turn out to have cancer, so this is a high-precision classifier. However, it will have low recall, because by raising the threshold for predicting y = 1 we are more likely to miss patients who actually do have cancer (for example, a patient who has cancer but whom the algorithm scored at 65% would wrongly be classified as not having it).
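The effect of raising the threshold can be seen on a toy set of predicted probabilities. The scores and labels below are made up for illustration:

```python
# Hypothetical classifier outputs h(x) and true labels for six patients.
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.10]
labels = [1, 1, 1, 0, 0, 0]

def precision_recall_at(threshold):
    """Precision and recall of 'predict y = 1 iff h(x) >= threshold'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    return tp / (tp + fp), tp / (tp + fn)

p_low, r_low = precision_recall_at(0.5)    # P = 0.75, R = 1.0
p_high, r_high = precision_recall_at(0.7)  # P = 1.0,  R = 2/3
```

Raising the threshold from 0.5 to 0.7 lifts precision from 0.75 to 1.0 but drops recall from 1.0 to 2/3: the patient scored 0.65 is now missed, exactly the trade-off described above.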

Now consider a different example. Suppose we want to avoid missing too many actual cases of cancer, that is, we want to avoid false negatives. In particular, if a patient actually has cancer but we fail to tell them, that can be really bad, because we deny them necessary treatment. In this case, rather than setting a higher probability threshold, we might instead set it to a lower value, like

By doing so, we are being more conservative and telling patients that they may have cancer, so they can seek treatment if necessary. In this case, what we have is a higher-recall classifier, because we're correctly flagging a higher fraction of all the patients that actually do have cancer; but we end up with lower precision, because a higher fraction of the patients that we said have cancer will turn out not to have cancer after all.

And so, in general, for most classifiers, there is going to be a trade off between precision and recall. And

as you vary the value of this threshold, you can actually plot a curve that trades off precision and recall

And as you vary the threshold, you can trace out a curve for your classifier showing the range of different values you can get for precision and recall. By the way, the precision-recall curve can take many different shapes (as illustrated above) depending on the details of the classifier.

So is there a way to choose this threshold automatically? Or, more generally, if we have a few different

algorithms or a few different ideas for algorithms, how do we compare different precision recall numbers?

For example, suppose we have three different learning algorithms, (or actually the same algorithm with

different values for the threshold), how do we decide which of these algorithms is best?

One of the things we talked about earlier is the importance of a single real number evaluation metric: the idea of having one number that tells you how well your classifier is doing. By switching to the precision/recall metric, we've actually lost that. Whereas, if we had a single real number evaluation metric, it would help us much more quickly decide which algorithm to go with, and also to much more quickly evaluate different changes that we may be contemplating for an algorithm. So, how can we get a single real number evaluation metric?

One natural thing you might try is to look at the average of precision and recall and pick the classifier with the highest average value. But this turns out not to be such a good solution, because a classifier that predicts y = 1 all the time (algorithm 3) would get the highest average value, and that's wrong.

In contrast, there is a different way of combining precision and recall. It is called the F-score and uses the formula F = 2 * (P * R) / (P + R), where P is precision and R is recall.

So, for this example we have

In this case algorithm 1 has the highest F-score, algorithm 2 the second highest, and algorithm 3 the lowest. The F-score, which is also called the F1 score, combines precision and recall, but for the F-score to be large, both precision and recall have to be large; neither can be small.

I should say that there are many possible formulas for combining precision and recall. This F-score formula is really just one out of a much larger number of possibilities, but historically or traditionally it is what people in machine learning use. And it usually gives you the effect you want: if either precision or recall is zero, it gives a very low value, whereas a perfect F-score, precision equal to one and recall equal to one, gives an F-score of one. This usually yields a reasonable rank ordering of different classifiers.
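A sketch of why the F-score is preferred over the plain average. The precision/recall pairs below are hypothetical, chosen to mimic a degenerate predict-always-1 classifier on a skewed set:

```python
def f_score(precision, recall):
    """F1 = 2PR / (P + R); defined as 0 when both components are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A degenerate classifier that predicts y = 1 all the time on a skewed set
# might get precision 0.02 and recall 1.0 (hypothetical numbers):
degenerate_avg = (0.02 + 1.0) / 2   # 0.51 -- looks deceptively good
degenerate_f = f_score(0.02, 1.0)   # about 0.039 -- heavily penalized
balanced_f = f_score(0.5, 0.4)      # about 0.444
```

The average ranks the degenerate classifier above the balanced one, while the F-score, being a harmonic-style mean, collapses toward the smaller of the two components and ranks them the right way around.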

So, in this lesson we talked about the notion of trading off between precision and recall, and how we can vary the threshold that we use to decide whether to predict y = 1 or y = 0. And if your goal is to set that threshold automatically, one pretty reasonable way to do so is to try a range of different threshold values, evaluate them on your cross-validation set, and then pick whichever threshold gives you the highest F-score on your cross-validation set.
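That threshold search can be sketched as follows; the scores and labels are synthetic stand-ins for a cross-validation set:

```python
def f1_at(threshold, scores, labels):
    """F-score of the classifier 'predict y = 1 iff h(x) >= threshold'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def choose_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the highest F-score on the CV set."""
    return max(candidates, key=lambda t: f1_at(t, scores, labels))

cv_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
cv_labels = [1, 1, 0, 1, 0, 0]
best = choose_threshold(cv_scores, cv_labels, [0.1, 0.5, 0.75, 0.95])
```

Because the choice is made on the cross-validation set, the test set still gives an unbiased estimate of the final classifier's performance.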

In this lesson, I'd like to switch tracks a bit and touch on another important aspect of machine learning

system design, that is the issue of how much data to train on.

Now, in some earlier lessons, I cautioned against blindly going out and spending lots of time collecting lots of data, because it only sometimes helps. But it turns out that under certain conditions, which I will describe in this lesson, getting a lot of data and training a certain type of learning algorithm on it can be a very effective way to achieve very good performance.

For example, consider the problem of classifying between confusable words, so for example, in the

sentence

should it be to, two, or too? Well, for this example, it would be two. So, this is one example of a set of confusable words. Researchers worked on this problem, taking a few different learning algorithms which were considered state of the art back in the day, like

The exact details of these algorithms aren't important; think of this as just picking four different classification algorithms. What they did was vary the training set size, try these learning algorithms on a range of training set sizes, and this is the result they got

The trend was that these algorithms had remarkably similar performance. Note that the performance of all the algorithms increases pretty much monotonically with training set size (the horizontal axis is the training set size, in millions). Notably, an "inferior" algorithm, if given enough data, can often beat the accuracy of a "superior" algorithm trained on less data. So more data can help (as we can see above).

Let's try to lay out a set of assumptions under which we think having a massive training set will help. Let's assume that the features x contain sufficient information to predict y accurately.

For example, in the confusable-words problem, we can capture the words surrounding the blank space we're trying to fill in. Usually that is enough information to tell us the correct word for the blank. So, just by looking at the surrounding words we get enough information to pretty unambiguously decide the label in a set of confusable words. That's an example where the features x have sufficient information to specify y.

For a counterexample, consider the problem of predicting the price of a house from only its size and no other features. There are so many other factors affecting the price of a house besides its size that it's actually very difficult to predict the price accurately. So that would be a counterexample to the assumption that the features have sufficient information to predict the price to the desired level of accuracy.

One useful test of whether the features are sufficient is: given the input features x, given the information available, could a human expert in this domain confidently predict the value of y? For the first example, if we go to an expert English speaker, this person could easily resolve a particular confusable-words problem (like the too/two/to problem). But in contrast, if we go to an expert in the housing market and tell them only the size of a house, they probably could not give an accurate price prediction.

So, suppose the features have enough information to predict the value of y from x, and suppose we use a learning algorithm with a large number of parameters (such as a neural network with many hidden units). This would be a powerful learning algorithm with many parameters that can fit very complex functions, so I'm going to call it a low-bias algorithm. Chances are, if we run an algorithm like this on our data set, we will fit the training set well, giving a small training error.

Now let's say we use a massive training set, where the number of training examples is much larger than the number of parameters. Then the hypothesis will be unlikely to overfit, and therefore the test error will be about as small as the training error, which would be very good performance. In this case, a lot of data helps a highly complex hypothesis avoid overfitting.

Another way to think about this is that in order to have a high-performance learning algorithm we want it to have neither high bias nor high variance. The bias problem we address by making sure we have a learning algorithm with many parameters (enough that, under the assumption above, a human expert could solve the problem from the available features); and by using a very large training set, we ensure that we don't have a variance problem.
