
12
STATISTICS IN PSYCHOLOGY

CONTENTS
Introduction
What is Statistics?
    Use of Statistics (Box 12.1)
Types of Statistics: Descriptive and Inferential
    Parametric and Non-parametric Statistics (Box 12.2)
Levels of Measurement
Graphical Representation of Data
    Bar Diagram, Frequency Polygon, and Histogram
Measures of Central Tendency
    Mean, Median, and Mode
Measures of Variability
    Range and Standard Deviation
Correlation: Understanding the Relationships
    Product Moment Correlation (Box 12.3)
    Rank Order Correlation (Box 12.4)
Normal Distribution Curve
Key terms
Summary
Review Questions
Answers to Learning Checks

THIS CHAPTER COVERS
• Nature and types of statistics
• Levels of quantification of psychological variables
• Preparation of various types of graphs
• Computation of central tendencies
• Concepts of variability and correlation
• Nature of normal distribution curve

BY THE END OF THIS CHAPTER YOU WOULD BE ABLE TO
• understand the concept of statistics and its uses,
• differentiate between different levels of measurement,
• draw bar diagram, frequency polygon, and histogram,
• differentiate and compute mean, median, and mode,
• appreciate the nature of the normal probability curve, and
• understand the concepts of variability and correlation.


Introduction to Psychology

INTRODUCTION
The knowledge you have acquired by reading the various chapters of this book, and the large body of psychological knowledge that exists today, is the result of scientific investigations undertaken by many researchers. In the process of research, investigators gather, organise, and interpret numerical (and other kinds of) data. In this context, statistics plays a very important role. An example from research will help in appreciating the role of statistics in handling the data that we generally obtain in the behavioural sciences.

Suppose a community requests a researcher to answer the following question: Does viewing cooperation on television (TV) promote cooperative behaviour in children? To answer this question, the investigator employs an experimental approach. She designs an experiment in which she manipulates the viewing of cooperation on TV. She randomly selects 100 twelve-year-old children and randomly assigns them to two groups of 50 children each. Further, she randomly designates one group as the experimental group and the other as the control group. The experimental group is asked to view cooperative scenes on TV every day for one hour, for a total of 30 days. The control group views television for the same amount of time, but no cooperation is displayed in its set of TV programmes. The experimenter decides on an appropriate measure of cooperation, and after every six days of viewing the participants in both groups are observed for the expression of cooperation. Each participant is thus observed on five occasions, and the investigator has a total of 500 observations or scores at her disposal (100 participants with 5 observations each). The investigator has to reduce this mass of data to some manageable form. This is usually accomplished by averaging the scores (e.g., computing the mean).

In order to compare the two groups on the display of cooperation, the means of the two groups need to be compared. Suppose it is found that the mean cooperation score of the experimental group is higher than that of the control group. Can she conclude that the cooperation displayed by the experimental group is greater than that of the control group? A difference of a few points in favour of the experimental group could be due to factors other than the manipulation of viewing cooperation on TV (the independent variable). Can one be sure that the difference between the groups is large enough not to be dismissed as an accidental or chance event? To make such a decision we need to demonstrate that the difference in the means of the two groups is actually due to the viewing of cooperation on TV and is not a fluke.

In this chapter, you will learn how statistics helps researchers in gathering, organising, and interpreting various types of data. In particular, you will study the meaning of statistics and its types. This will be followed by levels of measurement and methods of graphical representation. You will also learn the concepts and methods of computation of measures of central tendency, the range, and the standard deviation. Finally, you will learn about the concept of correlation and the normal distribution curve.

Statistics in Psychology

WHAT IS STATISTICS?

You have just observed that the researcher has to deal with a voluminous amount of data and has to draw dependable conclusions on the basis of the data collected. Statistics helps the researcher in making sense of this enormous amount of data. Let us first understand the term statistics. Technically, statistics is that branch of mathematics which deals with numerical data. To begin with, some of the frequently used terms in statistics, such as variable, sample, and population, are explained below.

Variable: A variable is a measure that changes from one observation to another. Thus, scores obtained on an intelligence test represent a variable because they change from one person to another. Variables may be continuous or discrete. A continuous variable can take on any score value. For example, reaction time (RT) can assume any value. A discrete variable can take on only a limited number of score values. For example, family size is a discrete variable because it can only take whole numbers, e.g., 5 members.

N/n: Capital N is used when referring to the total number of observations or the total number of participants in a sample. Small n is used to refer to the number of observations or participants in the subgroups or subsets.

Sample and Sampling: A sample is a subset of observations or participants from a population, drawn by using some sampling technique (e.g., random sampling). Sampling is the technique of drawing a sample from the population. Random sampling (or probability sampling), in which every element in the population has an equal chance or probability of being included in the sample, is the best method of drawing a representative sample from the population. For inferential purposes, the sampling procedure is required to be random. There are other sampling techniques also (e.g., incidental, purposive, etc.).

Population: A population is the complete set of scores or observations about which an investigator wishes to draw conclusions. A sample is a part of that population. Note that a population is defined in terms of scores or observations rather than people; that is, we deal with a statistical population.
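The idea of random sampling, in which every element has an equal chance of being included, can be illustrated with a short program. The following is only a minimal sketch using Python's standard `random` module. The population of 1,000 numbered children and the helper names `draw_random_sample` and `assign_to_groups` are assumptions made for illustration; they do not come from the chapter.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def assign_to_groups(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical population: 1,000 children identified by serial numbers.
population = list(range(1, 1001))

# N = 100 participants in the total sample.
sample = draw_random_sample(population, 100, seed=12)

# n = 50 participants in each subgroup.
experimental, control = assign_to_groups(sample, seed=12)

print(len(sample), len(experimental), len(control))
```

Fixing the `seed` only makes the illustration repeatable; an actual study would normally draw the sample without a fixed seed.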

BOX 12.1  USE OF STATISTICS

Statistical techniques are useful tools that help researchers in gathering, organising, and interpreting data. There are three main uses of statistics: describing data, testing hypotheses, and inferring population values from sample values.

Sometimes the objective of an investigator may be simply to describe, organise, and summarise data so that the observations become easier to comprehend. In such situations, the data may be presented in graphical form, or represented by some measure of central tendency (e.g., mean, median, or mode). Some measure of variability (e.g., range, standard deviation (SD), variance) is also provided, because a measure of central tendency is hard to interpret on its own. For example, to understand the value of a mean it is essential to understand the nature of the dispersion (e.g., SD) around it. On other occasions, the investigator may be interested in understanding the relationship between variables and may work out some index of correlation.

The second use of statistics relates to the testing of hypotheses. An investigator may have formulated a hypothesis that he or she wants to test empirically. Suppose an investigator has formulated the hypothesis that intelligence among 12-year-old children is normally distributed. S/he would randomly select a large sample of children of this age group from a specific population and administer a standardised test. In order to test the hypothesis, the investigator examines the obtained distribution of intelligence test scores with the help of the chi-square test of goodness-of-fit, that is, a test of how well the data and the hypothesis fit. Another example: the investigator may hypothesise that Group A (experimental group) is better on psychomotor performance than Group B (control group). S/he finds that the mean of Group A is higher than that of Group B. The investigator will apply an appropriate statistical test to rule out the operation of chance.

The third use of statistics is to draw inferences about conditions that exist in a population from the study of a sample drawn from that population. For example, an investigator may be interested in knowing the attitude of the public towards capital punishment. S/he will draw a large random sample from the population and then use the responses of the sample to estimate the attitude of the whole population with the help of inferential statistics. It is possible to estimate population values if the sample is large enough and is drawn randomly from the population of interest.

Henry Clay remarked, "Statistics is no substitute for judgement." Statistical techniques are sophisticated tools that are only as good as the individual who uses them. If the user is careless or naive, the use of statistics could have disastrous outcomes.

TYPES OF STATISTICS: DESCRIPTIVE AND INFERENTIAL
There are two main types of statistical analysis that we employ while dealing with data. Let us explain these two types of statistics.

Descriptive Statistics: Descriptive statistics are useful in organising and summarising data. They help to represent and communicate the data clearly, in a way that can be readily understood by readers. For example, graphical representations (e.g., bar charts, histograms, pie charts, frequency polygons, scattergrams), averages (such as the mean), measures of dispersion (such as the standard deviation), and correlations (such as the product moment r) are useful in organising and summarising a mass of data. All of these are techniques of descriptive statistics.

Inferential Statistics: This type of statistics is useful in drawing conclusions or making inferences from the data. In other words, inferential procedures are used to make educated guesses (inferences) about a population on the basis of samples. These guesses are the best way to learn about a population. For example, scientifically designed opinion polls allow us to predict election outcomes with reasonable accuracy. Much of the knowledge we have today in the behavioural sciences has been derived using inferential procedures.

BOX 12.2  PARAMETRIC AND NON-PARAMETRIC STATISTICS

Statistics has two branches: parametric and non-parametric. Parametric statistics can be used when certain assumptions are met: the distribution is normal, the scale of measurement is either interval or ratio, and the sampling method adopted is random or stratified. On the other hand, non-parametric statistical methods (e.g., the chi-square test) are applied when the distribution is non-normal, the measurement scale used in collecting the data is either nominal or ordinal, and the sampling technique may or may not be random. That is why non-parametric methods are called distribution-free tests. Another important difference between the two branches is that in non-parametric statistical tests we only test hypotheses, whereas in parametric tests we test hypotheses and also infer the population (parameter) values from the sample.

LEVELS OF MEASUREMENT
Measurement is the use of rules to assign a number to a specific observation. For example, suppose you want to assess the level of anxiety your friend experiences during the examination days. Anxiety is the variable you want to assess or measure. One way to measure anxiety is to assign a score equal to the number of items your friend checks on an Anxiety Scale. So, by assigning a number to each response on the items of the scale, you can measure the level of anxiety. Psychologists use four types of measurement scales: nominal, ordinal, interval, and ratio. They possess different properties of measurement. Let us briefly discuss these scales and their properties.


1. Nominal Scale: This represents the lowest level of measurement. A nominal scale measures the mere presence of some variable. It sorts objects or attributes into different categories. The categories may be identified by numbers or labels. The numbers assigned, however, cannot be ordered or added. For example, members of a football team are assigned the numbers 1, 2, 3, and so on. We may categorise people as men, women, Indian, young, old, and so on. As long as we classify objects, people, or events in terms of numerals or names, the purpose of the scale is served. Nominal measurements are useful to social scientists. Since things labelled with numbers can be counted and compared, this procedure is easy to use. For example, if we count that there are 15 boys and 23 girls in the class, it serves our purpose of classifying, counting, and comparing. The requirements of nominal measurement are simple: all members of a set are assigned the same numeral, and no two sets are assigned the same numeral. For example, the number 1 may be assigned to all cars and 2 to all two-wheelers parked in a particular parking lot.

2. Ordinal Scale: Ordinal measurement requires that the objects of a set are rank-ordered on the basis of some characteristic or property. For example, you rank order all your friends of the opposite sex according to their attractiveness. The first person you rank is the most attractive, the second on your scale is the next most attractive, and so on. Ordinal numbers indicate rank order. Such numbers do not indicate absolute quantities, nor do they indicate that the intervals between the numbers are equal. Let us explain this point with the help of the data given in Table 12.1.

It is observed from Table 12.1 that student number 3, who is ranked 1st, has secured 95 marks and the student ranked 2nd ...


convert the tallies into frequencies (f) as shown in the last column of Table 12.2. Confirm that the total of f is equal to n if the distribution is considered a subsample, or to N if it is the total sample or total observations.

Frequency Polygon is a line figure used to represent data from a frequency distribution. The frequency polygon (from a Greek word meaning many angles) is a series of connected points above the midpoint of each class interval. Each point is at a height equal to the frequency (f) of scores in that interval. The steps involved in constructing a frequency polygon are:
(a) Prepare a frequency distribution in tabular form.
(b) Decide on a suitable scale for the X-axis and the Y-axis (as explained earlier).
(c) Label the midpoints of the class intervals along the X-axis.
(d) Place a point above the midpoint of each class interval at a height equal to the frequency of the scores in that interval.
(e) Connect the points with straight lines.
(f) After joining the points, bring the polygon down to the horizontal axis (X-axis) at both ends: at the midpoint of the interval just below the first class interval and at the midpoint of the interval just above the last one.
The data together with the frequency distribution are presented in Table 12.2 and the frequency polygon is shown in Fig. 12.4.
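The steps for constructing a frequency polygon can be sketched in code. The example below is an illustrative sketch, not part of the original chapter: it takes the class intervals and frequencies of Table 12.2, computes the midpoints, and adds one zero-frequency point at each end so that the polygon comes down to the X-axis. The function name `polygon_points` is an assumption.

```python
# Class intervals and frequencies read from Table 12.2, lowest interval first.
class_intervals = [
    (140, 144), (145, 149), (150, 154), (155, 159), (160, 164), (165, 169),
    (170, 174), (175, 179), (180, 184), (185, 189), (190, 194), (195, 199),
]
frequencies = [1, 2, 4, 4, 5, 6, 10, 5, 4, 4, 3, 2]


def polygon_points(intervals, freqs):
    """Return (midpoint, frequency) pairs, closed on the X-axis at both ends."""
    # Size of a class interval, taken from two successive lower limits.
    width = intervals[1][0] - intervals[0][0]
    midpoints = [(low + high) / 2 for low, high in intervals]
    points = list(zip(midpoints, freqs))
    # Step (f): one extra midpoint with zero frequency before and after.
    start = (midpoints[0] - width, 0)
    end = (midpoints[-1] + width, 0)
    return [start] + points + [end]


points = polygon_points(class_intervals, frequencies)

# The total of f should equal N (here 50) before the graph is drawn.
print(sum(frequencies))
for midpoint, f in points:
    print(midpoint, f)
```

Joining successive points with straight lines on graph paper (or in any plotting program) gives the polygon of Fig. 12.4.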

[Figure 12.4. X-axis: Scores (midpoints), 142 to 197; Y-axis: Frequencies, 2 to 14]

Fig. 12.4. Frequency polygon of the scores of 50 participants on an intelligence test (scores given in Table 12.2).

Table 12.2 Frequency Distribution of Scores of Students on an Intelligence Test (N = 50)

| Class Intervals | Mid Points (X) | Tallies | f |
|---|---|---|---|
| 195-199 | 197 | // | 2 |
| 190-194 | 192 | /// | 3 |
| 185-189 | 187 | //// | 4 |
| 180-184 | 182 | //// | 4 |
| 175-179 | 177 | ///// | 5 |
| 170-174 | 172 | ///// ///// | 10 |
| 165-169 | 167 | ///// / | 6 |
| 160-164 | 162 | ///// | 5 |
| 155-159 | 157 | //// | 4 |
| 150-154 | 152 | //// | 4 |
| 145-149 | 147 | // | 2 |
| 140-144 | 142 | / | 1 |
| | | | N = 50 |

HISTOGRAM
A histogram is a bar graph that presents data from a frequency distribution. Both the polygon and the histogram are prepared when the data are on either an interval or a ratio scale. Both depict the same distribution, so you can superimpose one upon the other for the same set of data (see Fig. 12.5), and both tell the same story. However, a polygon is preferred for a grouped frequency distribution, and a histogram for an ungrouped frequency distribution of a discrete variable or of data treated as a discrete variable. In the frequency polygon all the scores within a given interval are represented by the midpoint of that interval, whereas in a histogram the scores are assumed to be spread uniformly over the entire interval. Within each interval of a histogram the frequency is shown by a rectangle, the base being the length of the class interval and the height being the frequency within that interval.

A histogram differs from a bar diagram on two counts. One, a histogram is prepared from a data set that forms a continuous series. Two, the data are obtained on either an interval or a ratio scale.

In Fig. 12.5 a histogram is prepared from the frequency distribution of scores given in Table 12.2, and a polygon is superimposed to demonstrate the similarities and differences between the two. The first interval in the histogram actually begins at 139.5, the exact lower limit of the interval, and ends at 144.5, the exact upper limit of the interval. For convenience, however, we start the first interval at 140, the second at 145, the third at 150, and so on. The frequency of 1 on 140-144 is represented by a rectangle whose base is the length of the interval (140 to 145) and whose height is one unit up on the Y-axis. Similarly, the frequency of 2 on the next interval is represented by a rectangle one interval long (145 to 150) and 2 Y-units high. The heights of the other rectangles vary with the frequencies of the intervals. Each interval in a histogram is represented by a separate rectangle. The rectangles rise and fall with the number of scores in the various intervals. Note that the bars or rectangles are joined together, whereas in the bar diagram they are not. As in a frequency polygon, the total frequency (N) is represented by the area of the histogram. The frequency polygon can be constructed on the same graph by joining the midpoints of the tops of the rectangles, as shown in Fig. 12.5. It may be noted that the frequency polygon is less precise than the histogram. However, if we have to compare two or more distributions, frequency polygons on the same axes are more revealing than histograms.

Recapitulation
After collecting data, the next step is to organise them to get a quick overview of the entire data set. Graphical representation helps in achieving this objective. To this end three kinds of graphs are frequently used: bar diagram, frequency polygon, and histogram. A bar diagram is very similar to a histogram in shape. However, the bar diagram is used when there is discontinuity between the various categories, and space is kept between the rectangles because the variable represented on the X-axis is discrete. A histogram, on the other hand, is constructed from data that are on an interval or ratio scale, and only when the data form a continuous series. A frequency polygon can be constructed on the histogram by joining the midpoints of the tops of its rectangles.
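The exact limits used in a histogram, and the statement that the area of the histogram represents the total frequency N, can be checked with a few lines of code. This is a rough sketch under the assumption that scores are whole numbers, so each exact limit lies half a unit beyond the stated limit; the names `exact_limits` and `bars` are illustrative only.

```python
# Class intervals and frequencies read from Table 12.2, lowest interval first.
class_intervals = [
    (140, 144), (145, 149), (150, 154), (155, 159), (160, 164), (165, 169),
    (170, 174), (175, 179), (180, 184), (185, 189), (190, 194), (195, 199),
]
frequencies = [1, 2, 4, 4, 5, 6, 10, 5, 4, 4, 3, 2]


def exact_limits(low, high, unit=1):
    """Exact limits extend half a measurement unit beyond the stated limits."""
    return low - unit / 2, high + unit / 2


# Each bar of the histogram: (exact lower limit, exact upper limit, height).
bars = [exact_limits(low, high) + (f,) for (low, high), f in zip(class_intervals, frequencies)]

# Adjacent bars share a boundary, which is why the rectangles are joined.
joined = all(bars[k][1] == bars[k + 1][0] for k in range(len(bars) - 1))

# Area of the histogram = sum of base x height; dividing by the width gives N.
width = bars[0][1] - bars[0][0]
area = sum((upper - lower) * height for lower, upper, height in bars)

print(bars[0])       # first bar runs from 139.5 to 144.5 with height 1
print(joined)
print(area / width)  # total frequency N
```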
[Figure 12.5. X-axis: Scores (midpoints), 140 to 200; Y-axis: Frequencies, 2 to 14]

Fig. 12.5. Histogram and conversion of histogram into frequency polygon (data given in Table 12.2).

LEARNING CHECKS II
1. The space for graphical representation of data is divided into ______ quadrants.
2. Bar diagram is prepared when the data are ______ and the measurement is on ______ or ______ scale.
3. Frequency polygon is a ______ figure prepared from a frequency distribution.
4. ______ variable is represented on the x-axis and ______ variable on the y-axis.
5. Histogram is prepared from data on a ______ series and the scale used is either ______ or ______.


MEASURES OF CENTRAL TENDENCY

Suppose that the Principal of your school is interested in knowing how the students of psychology in her school compare with the students of a nationally renowned school. She would like to compare the psychology results of the two schools. The average scores of the two schools can be compared for this purpose. Measures of this kind are called measures of central tendency. Their purpose is to provide a single summary figure that best describes the central location of the observations or data. The central tendency of a distribution is the score value near the centre of the distribution. It represents the basic or central trend in the data.

A measure of central tendency helps simplify the comparison of two or more groups. For example, suppose we have two groups created randomly from a specific population; one group is randomly assigned to the treatment condition (experimental group) and the second is not given any treatment (control group). Both groups are observed on the dependent variable after the treatment. In order to study the effect of the treatment, the average performance of the two groups needs to be compared. Later in this chapter you will discover that, besides comparing groups on some group average, we also need to know about the dispersion of scores within each group. There are three commonly used measures of central tendency: arithmetic mean, median, and mode. Let us learn about each of these indices and their computation.

The Arithmetic Mean: The arithmetic mean, or mean for brevity, is the sum of all the scores in a distribution divided by the total number of scores. It is also sometimes called the average. We generally do not use the term average because it is also used for other measures of central tendency. (We call it the arithmetic mean because in statistics we also use geometric and harmonic means.)

Let us get acquainted with some symbols that we use in calculating central tendencies.
$\sum$     Add together
N     The total number of observations in the study ($N = n_1 + n_2 + \dots$)
n     The number of observations in each of the subgroups
X     Raw scores
$\bar{X}$     Mean of the sample
$\mu$     Mean of the population

Calculation of Mean from Ungrouped Data
Let us take up an example to demonstrate the calculation of the mean from ungrouped data obtained from 10 participants, as given below.
X: 8, 7, 3, 9, 4, 4, 5, 6, 8, 8
$$\sum X = 8 + 7 + 3 + 9 + 4 + 4 + 5 + 6 + 8 + 8 = 62$$
$$\bar{X} = \frac{\sum X}{N} = \frac{62}{10} = 6.2$$

| Participant Number | Per month income in rupees (X) | |
|---|---|---|
| 1 | 200 | Mode (most frequent) |
| 2 | 200 | Mode (most frequent) |
| 3 | 250 | Median (middle) |
| 4 | 350 | |
| 5 | 2,000 | |
| | $\sum X$ = 3,000 | Mean (arithmetic mean) = 3,000 / 5 = 600 |

Fig. 12.6. The three measures of central tendency. Generally, the mean is the best index of central tendency, but in this instance the median is more informative.

Calculation of Mean from Grouped Data
When the data are large, we convert them into a frequency distribution by arranging the scores into class intervals, as shown in Table 12.2. Let us work out the mean from data grouped into a frequency distribution. The calculation of the mean is given in Table 12.3. For grouped data the formula for calculating the mean is:
$$\bar{X} = \frac{\sum fX}{N}$$
where:
f = frequency
X = the midpoint of the class interval
N = the total number of observations
$\sum fX$ = the sum of the midpoints weighted by their frequencies.
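The calculations of the mean can be verified with a short program. The sketch below uses Python's standard `statistics` module with the ten ungrouped scores from the text, the five monthly incomes of Fig. 12.6, and the midpoints and frequencies of Table 12.2. The helper name `grouped_mean` is an assumption made for illustration.

```python
from statistics import mean, median, mode

# Ungrouped data: the scores of 10 participants.
scores = [8, 7, 3, 9, 4, 4, 5, 6, 8, 8]
ungrouped_mean = sum(scores) / len(scores)  # sum of X divided by N

# Monthly incomes from Fig. 12.6: one extreme value pulls the mean upward.
incomes = [200, 200, 250, 350, 2000]
income_mean = mean(incomes)
income_median = median(incomes)
income_mode = mode(incomes)

# Grouped data: midpoints (X) and frequencies (f) from Table 12.2.
midpoints = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
frequencies = [2, 3, 4, 4, 5, 10, 6, 5, 4, 4, 2, 1]


def grouped_mean(mids, freqs):
    """Mean of grouped data: sum of f*X divided by N."""
    total_fx = sum(f * x for f, x in zip(freqs, mids))
    return total_fx / sum(freqs)


print(ungrouped_mean)                            # 6.2
print(income_mean, income_median, income_mode)   # 600 250 200
print(grouped_mean(midpoints, frequencies))      # 170.7
```

In the income example the mean (600) is larger than four of the five incomes, which is why the median (250) describes this small group better.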


In Table 12.3 the midpoints (X) are given against each class interval. The X values are multiplied by the respective f to obtain fX, as presented in the last column of the table. All the fX values are added to get $\sum fX$. Finally, $\sum fX$ is divided by N, which is 50. The mean comes to 170.7. This mean has been calculated by the direct method.

Table 12.3 Calculation of Mean from the Grouped Data (N = 50)

| Class Intervals | Mid Points (X) | f | fX |
|---|---|---|---|
| 195-199 | 197 | 2 | 394 |
| 190-194 | 192 | 3 | 576 |
| 185-189 | 187 | 4 | 748 |
| 180-184 | 182 | 4 | 728 |
| 175-179 | 177 | 5 | 885 |
| 170-174 | 172 | 10 | 1720 |
| 165-169 | 167 | 6 | 1002 |
| 160-164 | 162 | 5 | 810 |
| 155-159 | 157 | 4 | 628 |
| 150-154 | 152 | 4 | 608 |
| 145-149 | 147 | 2 | 294 |
| 140-144 | 142 | 1 | 142 |
| | | N = 50 | $\sum fX$ = 8535 |

$$\bar{X} = \frac{\sum fX}{N} = \frac{8535}{50} = 170.7$$

The Median: The median is the score value that divides the distribution into halves. It is the value such that half of the scores in the distribution fall below it and half of them fall above it.

Calculation of Median from Ungrouped Data: When the scores are not grouped into class intervals in tabular form, we arrange the scores in ascending order, as given below:
1, 3, 5, 6, 8, 10, 11
When n is an odd number, the middle score is the median. In the above problem 6 is the median. The score 6 has an equal number of scores below and above it: there are 3 scores above it and 3 below it. When n is an even number, there is no middle score, so the median is taken as the point halfway between the two middle scores. Let us consider an example. Suppose there are 8 students in a class and they get the following scores on a test:
0, 3, 5, 6, 7, 10, 11, 12
The median in the above example is the average of the two middle scores, 6 and 7, that is, (6 + 7)/2 = 6.5.

Calculation of Median from Grouped Data: The formula for calculating the median (Mdn) when the data are grouped in class intervals is:
$$Mdn = l + \left(\frac{n/2 - F}{f_m}\right) i$$
where:
$l$ = exact lower limit of the class interval within which the Mdn lies
$n/2$ = one half of the total number of scores
$F$ = sum of the frequencies (f) of all class intervals below $l$
$f_m$ = frequency (number of scores) within the interval in which the Mdn falls
$i$ = size of the class interval

The median is the point which divides the scores into two equal halves. In the example of Table 12.4 there should be 25 scores above the median and 25 below it. If we start adding the frequencies (f) from below, we discover that the 25th score lies in the class interval 170-174; mark its f as indicated in Table 12.4. Below this f of 10, the total of the frequencies is 22. The exact lower limit of the class interval in which the Mdn lies is 169.5.


Table 12.4 Calculation of Mdn from Grouped Data

| Class Intervals | Mid Points (X) | f | Cumulative Frequency (C.F.) |
|---|---|---|---|
| 195-199 | 197 | 2 | 50 |
| 190-194 | 192 | 3 | 48 |
| 185-189 | 187 | 4 | 45 |
| 180-184 | 182 | 4 | 41 |
| 175-179 | 177 | 5 | 37 |
| 170-174 | 172 | 10 | 32 |
| 165-169 | 167 | 6 | 22 |
| 160-164 | 162 | 5 | 16 |
| 155-159 | 157 | 4 | 11 |
| 150-154 | 152 | 4 | 7 |
| 145-149 | 147 | 2 | 3 |
| 140-144 | 142 | 1 | 1 |
| | | N = 50 | |

Let us apply the formula to derive the median. Here $l = 169.5$, $n/2 = 50/2 = 25$, $F = 22$, $f_m = 10$, and $i = 5$.
$$Mdn = 169.5 + \left(\frac{25 - 22}{10}\right) \times 5 = 169.5 + 1.5 = 171.00$$

We can also calculate the Mdn by proceeding downwards from the top. The Mdn lies in the class interval 170-174, which has an f of 10. Starting from the top, add the frequencies until the value 25 is reached. The upper 5 frequencies add up to 18, so we require 7 more scores to make 25. To be more precise, we need 7 of the 10 scores in the median interval. Therefore, $(7/10) \times 5 = 3.5$ should be subtracted from the exact upper limit (174.5) of the class interval in which the median lies: $174.5 - 3.5 = 171.00$. Note the difference in the calculation when proceeding from the two ends of the distribution; the result is the same.

The Mode: The mode (or Mo for brevity) is the score value (or class interval) with the highest frequency. In ungrouped data the mode is the single score that occurs most frequently in a distribution of scores.

Calculating Mode from Ungrouped Data: Consider the following scores of a group of 11 students on a class test of mathematics (arranged in ascending order):
3, 5, 5, 6, 7, 7, 8, 8, 8, 9, 10
The mode in the above data is 8 because it occurs most frequently (3 times). The great advantage of the mode, compared to the mean and the median, is that it can be computed for any type of data, whether obtained through nominal, ordinal, interval, or ratio scales. On the other hand, its greatest disadvantage is that it ignores much of the information available in the data.

ACTIVITY 12.3
Select a sample of 10 students randomly from your class (write the name of each student on a separate chit and fold it; place all the chits in a bag and mix them thoroughly; then blindly draw 10 chits, one by one, and record the names). Prepare a proper data sheet as given below. Write the height of each student in centimetres against each serial number.

| Serial No. | Height in cm |
|---|---|
| 1 | _________ |
| 2 | _________ |
| 3 | _________ |
| 4 | _________ |
| 5 | _________ |
| 6 | _________ |
| 7 | _________ |
| 8 | _________ |
| 9 | _________ |
| 10 | _________ |

Calculate the Mean, Median, and Mode from the data and interpret.

Calculating Mode from Grouped Data: A common meaning of mode is "fashionable", and it has the same implication in statistics. In the frequency distribution given in Table 12.4 the class interval 170-174 contains the largest frequency (f = 10), and 172, being its midpoint, is the mode.
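The grouped-data calculations of the median and the mode can be reproduced in code. The following is a minimal sketch, assuming whole-number scores (so exact limits lie 0.5 beyond the stated limits) and using the frequencies of Table 12.4. The function names are illustrative and are not taken from the chapter.

```python
from statistics import mode

# Class intervals and frequencies from Table 12.4, lowest interval first.
class_intervals = [
    (140, 144), (145, 149), (150, 154), (155, 159), (160, 164), (165, 169),
    (170, 174), (175, 179), (180, 184), (185, 189), (190, 194), (195, 199),
]
frequencies = [1, 2, 4, 4, 5, 6, 10, 5, 4, 4, 3, 2]


def grouped_median(intervals, freqs):
    """Mdn = l + ((n/2 - F) / fm) * i, counting frequencies from below."""
    half = sum(freqs) / 2
    width = intervals[1][0] - intervals[0][0]
    below = 0  # F: frequencies of all intervals below the current one
    for (low, _high), f in zip(intervals, freqs):
        if below + f >= half:
            return (low - 0.5) + (half - below) * width / f
        below += f
    raise ValueError("frequency distribution is empty")


def grouped_median_from_top(intervals, freqs):
    """Same median, counting frequencies downwards from the top."""
    half = sum(freqs) / 2
    width = intervals[1][0] - intervals[0][0]
    above = 0
    for (_low, high), f in zip(reversed(intervals), reversed(freqs)):
        if above + f >= half:
            return (high + 0.5) - (half - above) * width / f
        above += f
    raise ValueError("frequency distribution is empty")


def grouped_mode(intervals, freqs):
    """Midpoint of the class interval with the largest frequency."""
    low, high = max(zip(intervals, freqs), key=lambda pair: pair[1])[0]
    return (low + high) / 2


print(grouped_median(class_intervals, frequencies))           # 171.0
print(grouped_median_from_top(class_intervals, frequencies))  # 171.0
print(grouped_mode(class_intervals, frequencies))             # 172.0
print(mode([3, 5, 5, 6, 7, 7, 8, 8, 8, 9, 10]))               # 8
```

Both directions of counting give the same median, as in the worked example.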
When to Use the Mean, Median, and Mode

The Mean is used when:
• the measure of central value having maximum stability is required.
• the scores fall symmetrically around a central point, i.e., the distribution of scores conforms to a normal distribution.
• measures of dispersion, such as the standard deviation, are to be calculated.

The Median is used when:
• the exact 50 percent point or the midpoint of the distribution is required.
• extreme scores are likely to affect the mean. The median is not affected by extreme scores.
• the position of an individual score is to be found in terms of its percentage distance from the midpoint of the distribution.

The Mode is used when:
• a quick and approximate measure of the central point of the distribution is required.
• the measure of the central value is required to denote the most typical characteristic of the group.

Recapitulation
The three measures of central tendency are the mean, the median, and the mode. The mean is also called the arithmetic mean and sometimes the average. From raw data the mean can be obtained by adding all the scores and dividing the sum by the total number of observations. From grouped data it can be obtained by the formula $\sum fX / N$. The median is the value that falls in the centre of the distribution, or the 50th percentile. The median in ungrouped data is the middle score. When the data are grouped, the median can be worked out by using the formula. The mode, in ungrouped data, is the score value with the highest frequency. When the data are grouped, the mode is the midpoint of the class interval with the largest frequency.

LEARNING CHECKS III
Fill in the blanks.
1. The most stable measure of central tendency is ______, sometimes called ______ or ______.
2. The Median of a distribution is the value that falls in the ______ of the distribution.
3. The Mode is the score value or class interval with the ______ frequency.
4. A distribution with two highest frequencies is a ______ distribution.
5. Mean and Median can be calculated only when the data is either on ______ or ______ scale.

MEASURES OF VARIABILITY

Earlier, it was stated that whenever we want to compare two or more groups we compare their means (or medians). However, the mean alone is not sufficient for the comparison of two or more groups; some measure of variability or dispersion is also essential for the purpose. Let us explain this concept with the help of an example. In Table 12.5 there are two groups, and each group has 8 scores. The means of the two groups are exactly the same, but the scores in the two groups have different degrees of variability or spread. The dispersion in terms of SD is 1.25 for group A and 4.39 for group B. The dispersion of group A is less than that of group B. Thus, the statistical meaning of the same mean is different in the two groups. A mean score in itself does not tell much unless we know the degree of variability in the series for which the mean has been computed. Measures of variability express quantitatively the extent to which the scores in a distribution are scattered. In other words, they describe the spread of an entire set of scores in terms of a single index.


Such measures are useful for precise description and for statistical inference. Let us try to understand two important measures of variability, namely the range and the standard deviation.

Range: The range is the simplest measure of variability. It is the difference between the highest and the lowest score in the distribution, that is, the distance between the two extreme scores in the group of data. Let us consider the sets of scores (arranged in ascending order) of the two groups in Table 12.6. The range is computed when we wish to make a rough estimate of variability. As you might have noticed, the range takes account only of the extremes of a series of scores. It therefore does not represent variability well when n is small or when there are large gaps in the data. You must have observed that the range of group B is very large in comparison to that of group A, there being large gaps in the scores of group B. The last score alone, a jump from 25 to 50, has increased the range substantially.

Standard Deviation: The standard deviation is the most widely used index of variability. It reflects how the scores in a given set of data are spread out about the mean. To be more precise, the standard deviation (s) indicates the average distance of the scores from the mean. Like the mean, the SD is more important in statistical work than other measures of variation. The square of the standard deviation ($s^2$) is called the variance. The larger the SD of a set of scores, the more spread out the scores are relative to the group mean. In Table 12.5, two sets of scores (groups A and B) with the same mean and different standard deviations are presented. A mean with a smaller SD is more reliable than one with a larger SD. A small SD reflects homogeneity of the data or scores. The SD is the statistic most frequently used in behavioural research together with the mean as the measure of central tendency; that is, whenever the mean is calculated, the SD is essential. In fact, a mean score without its SD is not interpretable.

It is to be remembered that we draw samples (generally random), and on the basis of the sample values we infer the parametric values. To distinguish the two values (sample and parameter), we label the standard deviation calculated from the sample as $s$ and the value inferred for the population, or parameter, as $\sigma$ (Greek letter sigma). The corresponding sample and parameter values of the mean are $\bar{X}$ and $\mu$ (Greek letter mu), respectively.

The formula for calculating the standard deviation from ungrouped data is:
$$s = \sqrt{\frac{\sum x^2}{N}}$$
where:
$x$ is the deviation of a score from the mean ($x = X - \bar{X}$)
$N$ is the number of scores or observations

Table 12.5 Degree of Variability in two Groups

Groups     Scores (X)                   Mean (X̄)    Standard Deviation (s)
Group A:   5, 5, 7, 7, 7, 8, 8, 8       6.875       1.25
Group B:   1, 3, 4, 5, 7, 10, 12, 13    6.875       4.39
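As a check on Table 12.5, the following minimal Python sketch recomputes the mean and standard deviation of the two groups. It uses the statistics module of the standard library; the function and variable names are illustrative and not part of the chapter. The SD is reported with both N and N - 1 in the denominator, because the tabled values (1.25 and 4.39) correspond to the N - 1 form, whereas dividing by N gives slightly smaller values.

```python
from statistics import mean, pstdev, stdev

# Scores from Table 12.5
group_a = [5, 5, 7, 7, 7, 8, 8, 8]
group_b = [1, 3, 4, 5, 7, 10, 12, 13]

def describe(scores):
    """Return the mean, the SD with N in the denominator, and the SD with N - 1."""
    return mean(scores), pstdev(scores), stdev(scores)

for label, scores in (("Group A", group_a), ("Group B", group_b)):
    m, sd_n, sd_n1 = describe(scores)
    print(f"{label}: mean = {m:.3f}, SD (N) = {sd_n:.2f}, SD (N - 1) = {sd_n1:.2f}")
```

Both groups have the same mean, yet the second SD is more than three times the first, which is the point the table is making.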

Table 12.6 Calculation of Range

Group      Scores (X)                          Highest Score   Lowest Score   Range
Group A:   2, 5, 6, 7, 7, 8, 10, 14, 16, 18    18              2              18 - 2 = 16
Group B:   2, 5, 7, 10, 15, 18, 20, 25, 50     50              2              50 - 2 = 48
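The ranges in Table 12.6 can be reproduced with a short Python sketch. The helper name score_range is an assumption made for illustration. The last line also drops the single extreme score of group B to show how strongly one score can affect the range.

```python
# Scores from Table 12.6
group_a = [2, 5, 6, 7, 7, 8, 10, 14, 16, 18]
group_b = [2, 5, 7, 10, 15, 18, 20, 25, 50]

def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

range_a = score_range(group_a)
range_b = score_range(group_b)
# Group B without its last score (50): the highest score becomes 25
range_b_without_last = score_range(group_b[:-1])

print(range_a, range_b, range_b_without_last)
```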


The formula for calculating the standard deviation from grouped data is:

$$ s = \sqrt{\frac{\sum f x^2}{N}} $$

where: x is the deviation from the mean (x = X - X̄), f is the frequency associated with the class interval, and N is the number of scores or observations.

Recapitulation
The mean, or any other measure of central tendency, is not sufficient to describe the data unless a measure of variability is also given. Variability is a general term, meaning variation or dispersion among the scores in a sample. Measures of variability include the range, semi-interquartile range, average deviation, variance, and standard deviation. The calculation of the range and the standard deviation has been explained. The range is the simplest measure of variability, calculated by subtracting the lowest score from the highest. The range is not a stable measure of variability when there are large gaps in the scores. The standard deviation is the most stable measure of variability, generally used in conjunction with the mean. Variance is the square of the standard deviation (s²). The standard deviation reflects how much the scores tend to vary or depart from the mean score, that is, the average distance of all the scores from the mean.

LEARNING CHECKS IV
1. Different measures of variability are: ______, ______, ______, and ______.
2. Range is calculated by subtracting the ______ score from the ______.
3. The Standard Deviation indicates the ______ of distances of all scores around the ______.
4. Variance is the ______ of Standard Deviation.
5. Like Mean, Standard Deviation is the most ______ measure of dispersion.

CORRELATION: UNDERSTANDING THE RELATIONSHIPS
We know that certain characteristics or properties go together. In research also, we are interested in knowing how some variables are related to or associated with other variables. For example, we can say that hard work and achievement scores (i.e., examination marks) are associated in the sense that a student who works hard is likely to get higher marks in the examination and a student who does not work hard gets poor marks (positive correlation). In the same manner, we observe that many variables are interrelated, some positively and some negatively. The correlation index indicates how two entities change (i.e., increase or decrease) in relation to each other. How can we best describe the extent and direction of the relationship between two variables, such as intelligence and examination results? This question is about correlation and is closely related to the problem of prediction. That is, if we consistently obtain a high positive correlation between intelligence and achievement scores, then we can predict achievement on the basis of intelligence level. Thus, the greater the association between two variables, the more accurately we can predict one variable from the other. Problems like these, and many others which involve relationships among variables, can be studied with the technique of correlation. When we study the relationship between two variables, it is called bivariate correlation. However, in certain cases we are interested in the relationship among many variables; an index of such a relationship is called multivariate correlation. In the present context we will discuss only bivariate correlation. When the relationship between two sets of measures is linear (one that can be described by a straight line, as demonstrated in Fig. 12.7), the correlation between the scores may be expressed by the product-moment coefficient of correlation, designated by the letter r. The method of calculating the correlation coefficient (r) was invented by Pearson in 1896. Galton was the first to use the symbol r to denote a simple correlation coefficient. It is often written with subscripts as r_xy. The linear relationship can be understood by inspecting Figure 12.7.
Fig. 12.7 Linear relationship between intelligence and achievement scores (X-axis: IQ scores; Y-axis: mean achievement scores)

Looking at Figure 12.7, it can be observed that as the intelligence level increases, the achievement scores also increase. The trend of increment of scores is linear. The coefficient of correlation varies from +1 to -1. Both +1 and -1 correlations are called perfect correlations. In order to understand positive, negative, and zero correlations, let us examine some concrete examples.

Perfect Positive Correlation (r = +1): In a test of English and Mathematics, a student gets the highest marks in both. Another student who gets the second highest rank in English also gets the second highest in Mathematics, and a student who gets the third highest in English also gets the third highest in Mathematics. If this trend continues, the correlation will be perfect positive, as demonstrated in Table 12.7a.

Perfect Negative Correlation (r = -1): In a test of English and Mathematics, a student obtains the highest marks in English but the lowest in Mathematics. Another student who gets the second highest rank in English gets the second lowest in Mathematics. If this trend is maintained, the correlation between the scores on the two tests will be perfect negative, as demonstrated in Table 12.7b.

Near Zero Correlation : A student who gets the top rank in English gets the third rank in Mathematics. Another student who gets the second rank in English gets the first in Mathematics. When the distributions of ranks in the two papers are random and no systematic trend is seen, the correlation is near zero, that is, no relationship exists between the two variables. Such a relationship is demonstrated in Table 12.7c.

A linear relationship between two variables, as demonstrated in Fig. 12.7, is the result of fitting a straight line (linear curve) to the data. The coordinate points are generally scattered but have an underlying linear trend; after determining the trend, a best-fitting line is drawn, as shown in Fig. 12.7. If we select a large sample, take observations from each person on two variables, and plot the data on a graph, the resulting plot is called a scatter diagram. The scattergram could look like the three figures presented in Fig. 12.8. It can be observed that:
(a) when the coordinate points on the graph (scatter points) can be enclosed in an ellipse inclined about 45 degrees towards the X-axis, the correlation will be positive and high, as demonstrated in Fig. 12.8a;
(b) when the points on the scattergram can be enclosed in an ellipse as in (a) but the incline of the ellipse is reversed, the correlation will be negative and high, as demonstrated in Fig. 12.8b;
(c) when the scatter points can be enclosed in a complete circle, as demonstrated in Fig. 12.8c, the correlation will be close to zero, meaning absence of any relationship between the two variables.
As the ellipse enclosing the scatter points gets wider and approaches the shape of a circle, the correlation value approaches zero.
Conversely, when the ellipse enclosing the scatter points gets thinner and approaches the shape of a line, the correlation gets closer to +1 or -1, depending upon the direction of the incline (see Fig. 12.9). The value of correlation obtained (e.g., r = .76) could be high, low, or near zero. It indicates the nature and strength of the relationship between the two variables. You must remember that the correlation index is a measure of the direction (positive or negative) and extent (high or low) of the relationship between two sets of scores; it is a part of descriptive statistics. It only describes the nature of the relationship between any two variables. Determining the significance level of a correlation, however, is part of inferential statistics, which is beyond the scope of the present discussion.

Table 12.7 Perfect Positive, Perfect Negative, and Near Zero Correlations

(a) POSITIVE          (b) NEGATIVE          (c) NEAR ZERO
English   Maths       English   Maths       English   Maths
65        100         65        76          65        100
62        93          62        80          62        93
60        89          60        89          60        89
58        80          58        93          58        80
50        76          50        100         50        76
r = +1                r = -1                r = .00

Fig. 12.8 Scatter diagrams illustrating high Positive correlation (a), high Negative correlation (b), and near Zero correlation (c). (Panels plot Variable 1 against Variable 2: a. High Positive, ellipse inclined +45 degrees; b. High Negative, ellipse inclined -45 degrees; c. Near Zero, circle.)

Fig. 12.9 The range of possible correlations (a scale running from -1.0 through 0 to +1.0; the strength of the relationship increases from zero towards either end, negative on one side and positive on the other)

A Word of Caution : A coefficient of correlation only indicates the degree to which two variables are related to each other. It does not necessarily show a cause and effect relationship. Suppose that in a study a high correlation has been found between scores on an intelligence test and examination marks. It only indicates that intelligence and achievement scores are related. Can we say on this basis that intelligence causes the achievement scores? No. Instead, it is far more likely that achievement scores are caused by other things, like study habits, home environment, motivation, and so on, which are in turn related to intelligence. A causal relationship can only be established in studies using the experimental method, where the independent variables are manipulated directly by the experimenter.
ACTIVITY 12.4 Calculating Mean, and Drawing Scattergram
Select 10 of your classmates randomly, and measure and record their heights in centimetres. Now measure the weight of each of these students in kilograms. Record your observations in the given format:

Serial No.   Height in cm (X)   Weight in kg (Y)
1
2
3
4
5
6
7
8
9
10
             ΣX =               ΣY =

Work out the following:
i. Mean of X and mean of Y.
ii. Range of X and range of Y.
iii. Draw a scattergram on graph paper by taking height in cm on the X-axis and weight in kg on the Y-axis.
iv. Interpret the means of X and Y with respect to the range (for example, the mean with a smaller range is more reliable than the mean with a larger range).
v. Through inspection, try to enclose the scatter points in an ellipse or a circle and interpret the nature and direction of the correlation.

Recapitulation
Correlation is a measure of the degree and direction of the relationship between two variables. The question of correlation is closely related to the problem of prediction. When the relationship between two variables is linear, the correlation can be expressed by the product moment coefficient of correlation, symbolised as r. The value of r varies from +1 to -1; both the end values indicate perfect correlations. A zero correlation reflects absence of any relationship between two variables. Correlation is descriptive in nature and describes the degree and direction of association between the variables.

LEARNING CHECKS V
1. Correlation is a measure of ______ and ______ of relationship between two variables.
2. The coefficient of correlation varies from ______ to ______.
3. ______ correlation indicates absence of relationship.
4. Correlation is closely related to ______.
5. Product moment r is an index of relationship between two sets of measures when the trend is ______.
6. When the scatter points can be enclosed in a circle, the correlation will be close to ______.


BOX 12.3

PRODUCT MOMENT CORRELATION (r)

The marks obtained by 10 students of class X in Mathematics (Test I) and English (Test II) are given below. We will work out the product moment correlation coefficient (r) using the raw scores method (we could obtain r by the deviation scores method also).

Table 12.8 Product Moment Correlation (r)

Student   Test I (X)   Test II (Y)   X²       Y²       XY
1         70           50            4900     2500     3500
2         65           48            4225     2304     3120
3         52           51            2704     2601     2652
4         61           45            3721     2025     2745
5         45           40            2025     1600     1800
6         72           55            5184     3025     3960
7         80           60            6400     3600     4800
8         51           38            2601     1444     1938
9         75           58            5625     3364     4350
10        55           47            3025     2209     2585
Total     ΣX = 626     ΣY = 492      ΣX² = 40410   ΣY² = 24672   ΣXY = 31450

The formula used for calculating r from raw scores is:

$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}} $$

(coefficient of correlation calculated from raw scores)

Substituting the values in the formula:

$$ r = \frac{10 \times 31450 - 626 \times 492}{\sqrt{[10 \times 40410 - (626)^2]\,[10 \times 24672 - (492)^2]}} = .86 $$


The coefficient of correlation (r) has been found to be .86. It is positive and high. This indicates that the marks obtained in the Mathematics and English tests are positively and strongly correlated.
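The raw scores computation in Box 12.3 can be traced step by step with a small Python sketch. The function name pearson_r and the variable names are assumptions made for illustration; the arithmetic follows the raw scores formula given in the box.

```python
from math import sqrt

# Marks from Table 12.8
x = [70, 65, 52, 61, 45, 72, 80, 51, 75, 55]  # Mathematics (Test I)
y = [50, 48, 51, 45, 40, 55, 60, 38, 58, 47]  # English (Test II)

def pearson_r(x, y):
    """Product moment correlation computed from raw scores."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

The intermediate sums (626, 492, 40410, 24672, and 31450) agree with the totals in Table 12.8, and the result rounds to .86.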

NORMAL DISTRIBUTION CURVE
In the latter part of the 19th century, the British scientist Sir Francis Galton made the first serious study of individual differences and found that many mental and physical characteristics are normally distributed. For example, if we take a large number of people and measure their heights, weights, and other bodily features, the distribution curve resulting from each of these characteristics will conform to the pattern of normal distribution. Let us first study what a normal curve is. The Normal Distribution Curve is sometimes also referred to as the Gaussian curve. The normal curve is a mathematical abstraction having a particular defining equation. Remember that the curve is a mathematical derivation and not a law of nature. The curve has important characteristics, useful in applications for inferential statistics. The normal curve is presented in Fig. 12.10.
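The defining equation is not reproduced in this chapter. As an illustration only, the Python sketch below assumes the standard form of the normal density with mean mu and standard deviation sigma, and uses the error function from the math module to find the proportion of the total area lying within 1, 2, and 3 standard deviations of the mean. The function names are hypothetical. The computed percentages may differ in the last decimal place from values read off printed tables because of rounding.

```python
from math import erf, exp, pi, sqrt

def normal_density(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x (standard textbook form, assumed here)."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def area_within(k):
    """Proportion of the total area lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {100 * area_within(k):.2f} percent")

# Symmetry: the curve has the same height at equal distances on either side of the mean
print(normal_density(-1.5) == normal_density(1.5))
```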


BOX 12.4

RANK ORDER CORRELATION (ρ)

Let us use the same data to calculate the rank order correlation. First of all, we have to rank order the marks in Mathematics and English separately. These are:

Table 12.9 Rank Order Correlation (rho)

Student   X    Y    Rank of X (Rx)   Rank of Y (Ry)   D = Rx - Ry   D²
1         70   50   4                5                -1            1
2         65   48   5                6                -1            1
3         52   51   8                4                4             16
4         61   45   6                8                -2            4
5         45   40   10               9                1             1
6         72   55   3                3                0             0
7         80   60   1                1                0             0
8         51   38   9                10               -1            1
9         75   58   2                2                0             0
10        55   47   7                7                0             0
                                                      ΣD² = 24

The formula for calculating ρ (rho) is:

$$ \rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$

Substituting the values in the formula:

$$ \rho = 1 - \frac{6 \times 24}{10\,(10^2 - 1)} = 1 - \frac{144}{990} = .85 $$

It may be noted that the r and ρ values are very close (.86 and .85). In fact, when there are no ties in ranks, applying the product moment formula to the ranks themselves yields exactly the value of ρ.
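The ranking and the rank order formula can be sketched in Python as follows. The helper names are assumptions made for illustration, and the ranking function assumes there are no tied scores, as is the case for these data. With ΣD² = 24 and n = 10, the expression evaluates to 1 - 144/990, or about .85.

```python
# Marks from Table 12.9 (the same data as Table 12.8)
x = [70, 65, 52, 61, 45, 72, 80, 51, 75, 55]
y = [50, 48, 51, 45, 40, 55, 60, 38, 58, 47]

def ranks_descending(scores):
    """Rank 1 goes to the highest score; assumes there are no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def rank_order_rho(x, y):
    """Rank order correlation from the sum of squared rank differences."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

rho = rank_order_rho(x, y)
print(f"rho = {rho:.4f}")
```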

Let us observe the important characteristics of the normal curve.
(a) The normal distribution curve is a bell-shaped curve. It involves a symmetrical distribution; that is, the left half of the curve is a mirror image of the right half.
(b) It is a unimodal distribution.
(c) The values of mean, median, and mode all coincide. In other words, all of them have the same value.
(d) Starting at the centre of the curve and going outwards, the height of the curve descends gradually at first, then faster, and finally slower. However, the curve never touches the X-axis. It is asymptotic.
(e) The normal curve involves a continuous distribution.
(f) A large number of scores fall relatively close to the mean on either side. As the distance from the mean increases, the scores become fewer.
(g) The total area under the curve is distributed as follows: 68.26 percent of the total area under the curve lies within the range ±1σ, 95.44 percent within ±2σ, and 99.73 percent within ±3σ. For practical purposes, the curve may be taken to end at points -3σ and +3σ distance from the mean (μ). Different values can be read from the table (usually Table A) given in the Appendix of statistics books.

Fig. 12.10 The normal distribution curve (X-axis in standard deviation units; areas marked 68%, 95%, and 99.70% for ±1, ±2, and ±3 standard deviations)

Divergence from Normality : If we draw a frequency polygon from any set of data, the first thing which strikes the eye is the symmetry or lack of it. As stated earlier, in a normal curve the mean, median, and mode all coincide, and there is perfect balance between the left and right halves. An obtained distribution may deviate from this and become skewed. Let us try to understand the concept of skewness, an important feature found in many kinds of empirical data.

Skewness : When the mean, median, and mode have different values, the distribution is called skewed. This can happen when there is a concentration of scores on either side of the distribution. For example, if a test is too easy, a large number of individuals will be able to secure high scores (concentration on the right end of the curve) and few will secure low scores. The curve in such a situation will be negatively skewed, with the mode on the right, the mean pulled towards the tail (the direction of skewness), and the median in between (see Fig. 12.11a). On the other hand, if the test is too difficult, most of the individuals tested will obtain low scores (concentration on the left end of the curve), and the curve will be positively skewed (see Fig. 12.11b). The mode will be on the left, the mean towards the skewness, and the median in between. Earlier, we presented the three measures of central tendency: Mean, Median, and Mode. We observed that for the data presented in Table 12.2 the values of the respective measures are: Mean: 170.7, Median: 171.00, Mode: 172.00. These three values are quite similar, with slight variation (which could be due to chance). This indicates that the curve (frequency distribution) is close to the normal distribution. However, when the distribution is skewed, the values of Mean, Median, and Mode differ. It can be observed from Figure 12.11 that in both the negatively skewed and the positively skewed curves the mean shifts towards the skewness, the median is in between, and the mode is at the extreme position. It may be noted that the mean is more sensitive to extreme scores than the median and the mode. For further statistical computations, and for stability under random sampling fluctuations, the mean is the choice in most situations. However, in some situations, where the distribution is strongly skewed or has a few very deviant scores, the median is found to have advantages over the mean.
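The ordering of the three measures in skewed distributions can be illustrated with a short Python sketch. The two score lists are hypothetical and invented only for this illustration; they are not data from the chapter.

```python
from statistics import mean, median, mode

# Hypothetical scores, invented for illustration only (not from the chapter)
easy_test = [2, 5, 7, 8, 8, 9, 9, 9, 10, 10]  # scores pile up at the high end
hard_test = [0, 0, 1, 1, 1, 2, 2, 3, 5, 8]    # scores pile up at the low end

for label, scores in (("easy test", easy_test), ("hard test", hard_test)):
    print(label, "mean =", mean(scores), "median =", median(scores), "mode =", mode(scores))
```

In the first list the mean falls below the median, which falls below the mode (negative skewness); in the second the order is reversed (positive skewness). In both cases the mean is the measure pulled furthest towards the few extreme scores.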
ACTIVITY 12.5 Understanding Skewness A test was administered to 100 students. The test was quite easy and a large number of scores concentrated on the right side of the distribution curve. Draw a curve representing the outcome and discuss the nature of divergence from normality. Draw the positions of Mean, Median and Mode.

Fig. 12.11 Position of the mean, median, and mode in negatively and positively skewed curves (X-axis: scores, low to high; Y-axis: frequency; panel a: negatively skewed; panel b: positively skewed)

Recapitulation
A normal distribution curve is a bell-shaped curve having a symmetrical distribution. It is unimodal. In this kind of distribution the values of mean, median, and mode all coincide. The curve is asymptotic. It is a mathematically derived curve, also called the Gaussian curve. The total area under the curve is distributed to include 68.26 percent of the cases within ±1σ, 95.44 percent within ±2σ, and 99.73 percent within ±3σ. The curve has many important characteristics useful for inferential purposes. Divergence from normality reflects skewness, positive as well as negative. If the concentration of scores is on the right side of the distribution, the skewness is negative, and concentration on the left side results in positive skewness. The position of Mean, Median, and Mode changes with the nature of skewness.

LEARNING CHECKS VI
Fill in the blanks.
1. The normal distribution curve is derived from ______ equation. It is also called ______ curve.
2. In a normal distribution the ______, ______, and ______ all coincide.
3. In a normal distribution, ±1σ includes ______ cases.
4. In a normal distribution, ±3σ includes ______ cases.
5. If the test administered is too simple, the distribution obtained will be ______ skewed.

Key Terms
Descriptive Statistics, Inferential Statistics, Parametric, Non-Parametric, Mean, Median, Mode, Variability, Dispersion, Co-relation coefficient, random sampling, normal curve, Range, Average deviation, Variance, Standard deviation, Correlation, Prediction, Normal distribution, Asymptotic, Skewness, Sample.

SUMMARY

• Statistics is that branch of Mathematics which deals with numerical data. It is of two types: Descriptive and Inferential.
• Descriptive statistics helps to summarise data in the form of graphical representation, central tendency, dispersion, and correlation. Inferential statistics involves sampling, significance, and errors of observation.
• Measurement is the use of rules to assign a number to a specific observation of a variable. Psychologists use four levels of measurement: Nominal, Ordinal, Interval, and Ratio.
• Graphical representation helps to get a quick overview of the numerical data. Bar diagram, Frequency polygon, and Histogram are three important modes of graphical representation of data.
• The bar diagram is used when there is discontinuity between various categories, and the histogram is used for continuous series. A frequency polygon can be constructed on the histogram. In preparing a graph, the independent variable is represented on the X-axis and the dependent variable on the Y-axis.
• The three measures of central tendency are: Arithmetic Mean or simply Mean, Median, and Mode. The mean is also sometimes called the average.
• The mean in ungrouped data is calculated by adding all the scores and dividing the sum by the number of scores. The mean is the most stable measure of central tendency and the most frequently used statistic.
• The median is the value that falls exactly at the centre of the distribution. The median in ungrouped data is the middle score when N is odd and the midpoint of the two middle scores when N is even. The mode is the score value with the highest frequency.
• Variability is a general term, meaning variation or dispersion among the scores in a distribution. It includes Range, Standard Deviation, and other measures.
• Range is the simplest measure of variability, obtained by subtracting the lowest score from the highest.
• The Standard Deviation is the most stable measure of variability, generally used with the Mean. It reflects how much the scores tend to vary or depart from the Mean.
• Bivariate Correlation is a measure of the degree and direction of the relationship between two variables. It varies from +1 to -1, which represent perfect positive and perfect negative correlation, respectively. In between +1 and -1 is zero, reflecting absence of any relationship between two variables.
• A linear relationship between two variables is expressed by the Product Moment coefficient of correlation, symbolised as r.
• The normal distribution curve, or Gaussian curve, is a bell-shaped, symmetrical, unimodal, and asymptotic curve defined by a mathematical equation. ±1 SD unit includes 68.26 percent of the cases and ±3 SD units, 99.73 percent.
• Divergence from normality could be negative or positive skewness. Concentration of scores on the right side of the curve results in negative skewness and on the left side in positive skewness.

Review Questions
1. What do you mean by Statistics? 2. What do you understand by the term measurement? What are the different levels of measurement used in Psychology? 3. What is the purpose of graphical representation of data? What are the different graphical methods? 4. How can you differentiate between Bar-diagram and Histogram? 5. What are the measures of central tendency? Differentiate these measures. 6. What is variability? How does it help in interpreting the central tendency? 7. What is standard Deviation and how is it different from range? 8. What is correlation? 9. What are the properties of Normal Distribution Curve? What are its uses in Psychology?

ANSWERS TO LEARNING CHECKS

I : 1. numerical data, 2. descriptive, inferential, 3. descriptive, 4. inferential, 5. interval, 6. ratio.
II : 1. four, 2. categorical, nominal, ordinal, 3. line, 4. independent, dependent, 5. continuous, interval, ratio.
III : 1. Mean, average, Arithmetic Mean, 2. center, 3. highest, 4. bimodal, 5. interval, ratio.
IV : 1. Range, Semi-interquartile Range, Average Deviation, and Standard Deviation, 2. lowest, highest, 3. average, mean, 4. square, 5. stable.
V : 1. degree, direction, 2. +1, -1, 3. zero, 4. prediction, 5. linear, 6. zero.
VI : 1. mathematical, Gaussian, 2. mean, median, mode, 3. 68.26 p.c., 4. 99.73 p.c., 5. negatively.
