
 Cases, Variables, Types of Variables


 Matrix and Frequency Table
 Graphs and shapes of Distributions
 Mode, Median and Mean
 Range, Interquartile Range and Box Plot
 Variance and Standard Deviation
 Z-scores
 Contingency Table, Scatterplot, Pearson’s r
 Basics of Regression
 Elementary Probability
 Random Variables and Probability Distributions
 Normal Distribution, Binomial Distribution & Poisson Distribution

EXPLORE YOUR DATA: CASES, VARIABLES, TYPES OF VARIABLES

A data set contains information about a sample. A data set consists of cases, which are the objects in the collection. Each case has one or more attributes or qualities, called variables, which are characteristics of the cases.

Example:

Suppose you are collecting information about breast cancer patients. For each patient, you want to record the following information:

1. Sample code number: id number


2. Clump Thickness: 1 – 10
3. Uniformity of Cell Size: 1 – 10
4. Uniformity of Cell Shape: 1 – 10
5. Marginal Adhesion: 1 – 10
6. Single Epithelial Cell Size: 1 – 10
7. Bare Nuclei: 1 – 10
8. Bland Chromatin: 1 – 10
9. Normal Nucleoli: 1 – 10
10. Mitoses: 1 – 10
11. Class: (2 for benign, 4 for malignant)

These attributes are taken from the UCI Breast Cancer Wisconsin dataset, which you can find here:
https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)
For this example, the breast cancer patients themselves are the cases, and all of these characteristics of the patients are the variables.
In a study, cases can be many different things. They can be individual patients or groups of patients, but they can also be, for instance, companies, schools or countries.
We can have many different kinds of variables, representing different characteristics. For this reason there are various levels of measurement, or different types of variables.

Categorical Variables:
Both nominal and ordinal variables can be called categorical variables.

1. Nominal Variable:

A nominal variable is made up of various categories that have no intrinsic order.

Example:

The gender of a patient (male or female), or the state where they live. Each category differs from the others, but there is no ranking or order between them.

2. Ordinal Variable:

The second level of measurement is the ordinal level. There is not only a difference between the categories of a variable; there is also an order. An example is classifying employees as highest paid, average paid and lowest paid.
Quantitative/ Numerical Variables:
1. Continuous Variable:

A variable is continuous if the possible values of the variable form an interval. An example is, again, the height of a patient: someone can be 172 centimeters tall or 174 centimeters tall, but also, for instance, 170.2461 centimeters. We do not have a set of separate numbers, but an infinite range of values.

2. Discrete Variable:

A variable is discrete if its possible categories form a set of separate numbers.

For the breast cancer data above, Uniformity of Cell Size (1 – 10) is an example of a discrete variable.
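
To make this concrete, here is a minimal sketch (not from the original text) of loading this dataset with pandas and looking at its cases and variables. It assumes the raw data file has been downloaded locally as breast-cancer-wisconsin.data (a hypothetical local path); the raw file is assumed to have no header row, so the column names from the attribute list above are supplied explicitly, and missing values are assumed to be marked with "?".

    import pandas as pd

    columns = [
        "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
        "uniformity_of_cell_shape", "marginal_adhesion",
        "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
        "normal_nucleoli", "mitoses", "class",
    ]

    # Each row is a case (one patient); each column is a variable.
    df = pd.read_csv("breast-cancer-wisconsin.data", names=columns, na_values="?")

    print(df.shape)              # (number of cases, number of variables)
    print(df.dtypes)             # the measurement columns come out as numeric types
    print(df["class"].unique())  # nominal variable: 2 = benign, 4 = malignant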
Z-Score Definition

What Is a Z-Score?
A Z-score is a numerical measurement of a value's relationship to the mean (average) of a group of values, measured in terms of standard deviations from the mean. If a Z-score is 0, the data point is identical to the mean. A Z-score of 1.0 indicates a value that is one standard deviation from the mean. Z-scores may be positive or negative: a positive value indicates the score is above the mean and a negative value indicates it is below the mean.

Z-scores measure an observation's deviation from the mean and can be used by traders to gauge market volatility. In corporate finance, the term Z-score usually refers to the Altman Z-score.


The Altman Z-Score Formula Is


The Altman Z-score is the output of a credit-strength test that helps gauge the likelihood of
bankruptcy for a publicly traded manufacturing company. The Z-score is based on five
key financial ratios that can be found and calculated from a company's annual 10-K report. The
calculation used to determine the Altman Z-score is as follows:

ζ = 1.2A + 1.4B + 3.3C + 0.6D + 1.0E

where:
ζ (zeta) = the Altman Z-score
A = working capital / total assets
B = retained earnings / total assets
C = earnings before interest and taxes (EBIT) / total assets
D = market value of equity / book value of total liabilities
E = sales / total assets
Typically, a score below 1.8 indicates that a company is likely heading for or is under the weight
of bankruptcy. Conversely, companies that score above 3 are less likely to experience
bankruptcy.
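
As a rough illustration only (not part of the original article), the formula can be written as a small Python function; the ratio values passed in below are made-up numbers, not real company data.

    def altman_z_score(a, b, c, d, e):
        # A = working capital / total assets, B = retained earnings / total assets,
        # C = EBIT / total assets, D = market value of equity / book value of
        # total liabilities, E = sales / total assets
        return 1.2 * a + 1.4 * b + 3.3 * c + 0.6 * d + 1.0 * e

    # Hypothetical ratios, for illustration only
    score = altman_z_score(a=0.15, b=0.20, c=0.10, d=0.80, e=1.10)
    print(round(score, 2))  # below 1.8 suggests distress, above 3 suggests low risk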

What Do Z-Scores Tell You?


Z-scores reveal to statisticians and traders whether a score is typical for a specified data set or if
it is atypical. In addition to this, Z-scores also make it possible for analysts to adapt scores from
various data sets to make scores that can be compared to one another accurately. Usability testing
is one example of a real-life application of Z-scores.

Edward Altman, a professor at New York University, developed and introduced the Z-score
formula in the late 1960s as a solution to the time-consuming and somewhat confusing process
investors had to undergo to determine how close to bankruptcy a company was. In reality, the Z-
score formula Altman developed ended up providing investors with an idea of the overall
financial health of a company.

The Difference Between Z-Scores and Standard Deviation


Standard deviation is essentially a reflection of the amount of variability within a given data set.
To calculate standard deviation, first calculate the difference between each data point and
the mean. The differences are then squared, summed and averaged to produce the variance. The
standard deviation is simply the square root of the variance, which brings it back to the original
unit of measure.

The Z-score, by contrast, is the number of standard deviations a given data point lies from the
mean. To calculate Z-score, simply subtract the mean from each data point and divide the result
by the standard deviation.

For data points that are below the mean, the Z-score is negative. For data that is roughly bell-shaped, about 99.7% of values have a Z-score between -3 and 3, meaning they lie within three standard deviations above and below the mean.
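
A minimal Python sketch of these two calculations, using made-up numbers:

    import statistics

    data = [4, 8, 6, 5, 3, 7, 9, 5]            # hypothetical observations
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)               # population standard deviation

    z_scores = [(x - mean) / sd for x in data]
    print([round(z, 2) for z in z_scores])     # negative below the mean, positive above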

Altman Z-Score Plus


Altman developed and released the Altman Z-Score Plus in 2012. This formula is used to
evaluate both public and private companies and can be used for non-manufacturing companies as
well as manufacturing companies. The Z-Score Plus is suitable for companies in the United
States as well as non-US companies, including those in emerging economies, such as China.

 Z-scores are used in statistics to measure an observation's deviation from the group's mean
value.
 Z-scores reveal to statisticians and traders whether a score is typical for a specified data set or if
it is atypical.
 The Altman Z-Score is frequently used in testing credit strength.

Limitations of Z-Scores
Alas, the Z-score is not perfect and needs to be calculated and interpreted with care. For starters,
the Z-score is not immune to false accounting practices. Since companies in trouble may be
tempted to misrepresent financials, the Z-score is only as accurate as the data that goes into it.

The Z-score also isn't much use for new companies with little to no earnings. These companies,
regardless of their financial health, will score low. Moreover, the Z-score doesn't address the
issue of cash flows directly, only hinting at it through the use of the net working capital-to-asset
ratio. After all, it takes cash to pay the bills.

Finally, Z-scores can swing from quarter to quarter when a company records one-time write-offs.
These can change the final score, suggesting that a company that's really not at risk is on the
brink of bankruptcy.



VARIANCE AND STANDARD DEVIATION

In the “Range, Interquartile Range and Box Plot” section, it was explained that the range, interquartile range (IQR) and box plot are very useful for measuring the variability of data.
There are two other measures of variability that statisticians use very often:

1. Variance
2. Standard Deviation

Why are variance and standard deviation good measures of variability?
Because variance and standard deviation take all the values of a variable into account when measuring the variability of your data.
There are two versions of the variance and standard deviation, one for a sample and one for a population. Their formulas are given first; the difference between a sample and a population is discussed afterwards.
Variance:
Here are the formulas for the sample and population variance and standard deviation. There is a slight difference between them, so observe them carefully.
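
The original figure with the formulas is not reproduced here; the standard formulas are:

Population variance: σ² = Σ(X − x̄)² / N, and population standard deviation: σ = √σ²
Sample variance: s² = Σ(X − x̄)² / (n − 1), and sample standard deviation: S = √s²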

Where

 X is an individual value
 N is the size of the population
 x̄ is the mean of the population

How to calculate variance step by step:

1. Calculate the mean x̄.
2. Subtract the mean from each observation: X − x̄
3. Square each of the resulting differences: (X − x̄)²
4. Add these squared results together.
5. Divide this total by the number of observations N (for a population) to get the population variance σ². If you are calculating the sample variance, divide by n − 1 instead.
6. Take the positive square root to get the standard deviation.
Here, for a sample of 11 observations with mean 15 (the sum of the squared differences comes to 639.74):
N = 11
n − 1 = 10
Mean (x̄) = 15
Sample variance (s²) = 639.74 / 10 = 63.97
Population variance (σ²) = 639.74 / 11 = 58.16
S = 8.00
σ = 7.6
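
A minimal Python sketch of these six steps, using a small made-up data set (the original 11-value table is not reproduced in this text):

    data = [12, 15, 9, 20, 14, 18, 11]                     # made-up observations

    n = len(data)
    mean = sum(data) / n                                   # step 1
    squared_diffs = [(x - mean) ** 2 for x in data]        # steps 2 and 3
    total = sum(squared_diffs)                             # step 4

    population_variance = total / n                        # step 5, divide by N
    sample_variance = total / (n - 1)                      # step 5, divide by n - 1

    population_sd = population_variance ** 0.5             # step 6
    sample_sd = sample_variance ** 0.5

    print(round(population_variance, 2), round(sample_variance, 2))
    print(round(population_sd, 2), round(sample_sd, 2))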

Intuition:
1. If the variance is high, you have larger variability in your dataset; in other words, more values are spread out around the mean.
2. The standard deviation represents, roughly, the average distance of an observation from the mean.
3. The larger the standard deviation, the larger the variability of the data.

Standard Deviation:
The standard deviation is a measure of how spread out the numbers are. Its symbol is σ (the Greek letter sigma) for the population standard deviation and S for the sample standard deviation. It is the square root of the variance.

Population vs. Sample Variance and Standard Deviation
The primary task of inferential statistics (estimating or forecasting) is to form a conclusion about a population using only an incomplete sample of data.
In statistics, it is very important to distinguish between population and sample. A population is defined as all members (e.g. occurrences, prices, annual returns) of a specified group; the population is the whole group.
A sample is a part of a population that is used to describe the characteristics (e.g. mean or standard deviation) of the whole population. The size of a sample can be less than 1%, 10%, or 60% of the population, but it is never the whole population. Because a sample and a population are not the same thing, there is a slight difference in their formulas.
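
As an aside (not from the original text), NumPy makes this difference explicit through the ddof argument: ddof=0 divides by N (population formula) and ddof=1 divides by n − 1 (sample formula). The numbers below are made up for illustration.

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up data

    print(np.var(x, ddof=0), np.std(x, ddof=0))   # population variance and std (divide by N)
    print(np.var(x, ddof=1), np.std(x, ddof=1))   # sample variance and std (divide by n - 1)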

Why do we square the differences when calculating the variance?

To get rid of the negative signs, so that negative and positive deviations don't cancel each other out when added together:
+5 − 5 = 0
Z-SCORE OR STANDARDIZED SCORE

What is a Z-score and how is it calculated?

 A standardized score (Z-score) tells you how many standard deviations an element falls from the mean. A z-score can be calculated from the following formula:

z = (X − μ) / σ

where z is the z-score, X is the value of the element, μ is the population mean, and σ is the standard deviation.

 A Z-score helps you understand whether a specific observation is common or exceptional in your study.
 Since the mean is the centre point, negative z-scores represent values below the mean, while positive z-scores represent values above the mean.
 If you add up all the Z-scores of a dataset you get 0, because the positive and negative z-scores cancel each other out (see the sketch after this list).
 If your data is extremely right-skewed, you will probably see some very large positive Z-scores; if the distribution is left-skewed, you will probably see very large negative Z-scores.
 The Z-score of the mean itself is 0, as it is the centre value.
 If |Z| is greater than 2, the observation can be considered unusual or exceptional.
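
A small sketch (with made-up data) of the points above: computing a Z-score for every observation and checking that the Z-scores sum to (roughly) zero.

    import statistics

    data = [10, 12, 23, 23, 16, 23, 21, 16]    # made-up observations
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)

    z = [(x - mean) / sd for x in data]
    print([round(v, 2) for v in z])
    print(round(sum(z), 10))    # approximately 0: positive and negative Z-scores cancel out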

Why is the Z-score needed?

 Sometimes in your statistical analysis you want to figure out whether a specific observation is a common or an exceptional case. The Z-score tells you how many standard deviations it falls below or above the mean.

Bell-Shaped Distribution and Empirical Rule

If the distribution is bell-shaped, then about 68% of the elements have a z-score between -1 and 1; about 95% have a z-score between -2 and 2; and about 99.7% have a z-score between -3 and 3.
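
As a quick check of these percentages (not part of the original text), the coverage of a normal distribution within 1, 2 and 3 standard deviations can be computed with scipy.stats, assuming SciPy is installed:

    from scipy.stats import norm

    for k in (1, 2, 3):
        coverage = norm.cdf(k) - norm.cdf(-k)
        print(f"within {k} standard deviation(s): {coverage:.1%}")
    # prints roughly 68.3%, 95.4% and 99.7%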

Let's take an example of why Z-scores are useful.

A father has two sons and wants to know which of them scored better on his standardized test relative to the other test takers: Ram, who earned an 1800 on his SAT, or Sham, who scored a 24 on his ACT exam? Suppose SAT scores have a mean of 1500 with a standard deviation of 300, and ACT scores have a mean of 21 with a standard deviation of 5.

We cannot simply compare the raw scores, because they are measured on different scales. Instead, the father looks at how many standard deviations above the mean of its own distribution each score lies:
Ram: (1800 − 1500) / 300 = 1 standard deviation above the mean
Sham: (24 − 21) / 5 = 0.6 standard deviations above the mean
The father can now conclude that Ram indeed did better than Sham, relative to the other test takers.
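
The same comparison, written as a tiny Python sketch using the means and standard deviations stated in the example:

    def z_score(value, mean, sd):
        return (value - mean) / sd

    ram = z_score(1800, mean=1500, sd=300)   # SAT
    sham = z_score(24, mean=21, sd=5)        # ACT
    print(ram, sham)                         # 1.0 vs 0.6, so Ram did relatively better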
If it is still not clear and you want to explore more, have a look at the sites below.
https://en.wikipedia.org/wiki/Standard_score
https://statistics.laerd.com/statistical-guides/standard-score-2.php
http://stattrek.com/statistics/dictionary.aspx?definition=z%20score
http://www.nku.edu/~statistics/212_Using_the_Empirical_Rule.htm
You can find some other related questions about Z-scores on Quora:
https://www.quora.com/Does-the-empirical-Rule-include-4-standard-deviations
https://www.quora.com/While-calculating-a-z-score-what-do-you-do-when-standard-deviation-is-zero
https://www.quora.com/What-is-the-Difference-between-the-Z-test-and-Z-score
