
Chi-square distribution

A distribution obtained by multiplying the ratio of sample variance to population variance by the degrees of
freedom when random samples are selected from a normally distributed population.
Contingency Table
Data arranged in table form for the chi-square independence test.
Expected Frequency
The frequencies obtained by calculation from the claimed distribution (or, in the independence test, from the table totals).
Goodness-of-fit Test
A test to see if a sample comes from a population with the given distribution.
Independence Test
A test to see if the row and column variables are independent.
Observed Frequency
The frequencies obtained by observation. These are the sample frequencies.

The chi-square ($\chi^2$) distribution is obtained from the values of the ratio of the sample variance to the population
variance, multiplied by the degrees of freedom. This occurs when the population is normally distributed with
population variance $\sigma^2$.
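A minimal Python sketch of this definition: the statistic is $\chi^2 = (n-1)s^2/\sigma^2$ with $n - 1$ degrees of freedom, where $s^2$ is the sample variance. The function name `variance_chi_square` and the sample values are hypothetical. The code uses the standard-library `statistics.variance`, which divides by $n - 1$, and it assumes the sample was drawn at random from a normal population.

```python
import statistics


def variance_chi_square(sample, sigma_squared):
    """Return (chi-square statistic, degrees of freedom) for a claimed population variance.

    Assumes a random sample from a normally distributed population.
    """
    n = len(sample)
    s_squared = statistics.variance(sample)  # sample variance, divides by n - 1
    df = n - 1                               # single population variance: df = n - 1
    return df * s_squared / sigma_squared, df


# Hypothetical sample with s^2 = 2.5, tested against a claimed sigma^2 = 2.
statistic, df = variance_chi_square([10, 12, 9, 11, 13], 2.0)
print(statistic, df)  # 5.0 4
```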

Properties of the Chi-Square

Chi-square is non-negative. It is a ratio of two non-negative values multiplied by a positive number (the degrees of freedom), so it must be non-negative itself.

Chi-square is not symmetric; it is skewed to the right.

There are many different chi-square distributions, one for each number of degrees of freedom.

The degrees of freedom when working with a single population variance is n-1.
Chi-Square Probabilities
Since the chi-square distribution isn't symmetric, the method for looking up left-tail values is different from the
method for looking up right-tail values.

Area to the right - just use the area given.

Area to the left - the table requires the area to the right, so subtract the given area from one and
look this area up in the table.

Area in both tails - divide the area by two. Look up this area for the right critical value and one
minus this area for the left critical value.
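The three lookup rules above can be sketched in Python. Because the standard library has no chi-square table, this sketch assumes a hypothetical `table_lookup(right_area, df)` helper that plays the role of the printed table by inverting the chi-square CDF numerically (a series for the regularized lower incomplete gamma function plus bisection). It is meant for the moderate df and areas found in textbook tables, not as a replacement for a statistics library; all function names are made up for illustration.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Series for the regularized lower incomplete gamma function P(a, z),
    with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)  # each term is the previous one times z / (a + n)
        total += term
        n += 1
    if not math.isfinite(total):  # only happens far out in the right tail
        return 1.0
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def table_lookup(right_area, df):
    """Mimic a printed table: return the value with `right_area` to its right."""
    target = 1.0 - right_area  # cumulative area to the left of the value
    lo, hi = 0.0, df + 1.0
    while chi2_cdf(hi, df) < target:  # widen until the bracket contains the value
        hi *= 2.0
    for _ in range(100):  # bisection
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def right_critical(alpha, df):
    return table_lookup(alpha, df)  # area to the right: use the area given


def left_critical(alpha, df):
    return table_lookup(1.0 - alpha, df)  # area to the left: look up 1 - alpha


def two_tail_critical(alpha, df):
    half = alpha / 2.0  # split the area between the two tails
    return table_lookup(1.0 - half, df), table_lookup(half, df)  # (left, right)


print(right_critical(0.05, 10))  # about 18.307
```

The results should agree with a printed table to about three decimals; for example, a two-tail test at α = 0.05 with df = 10 gives roughly 3.247 and 20.483.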
DF which aren't in the table
When the degrees of freedom aren't listed in the table, you have a couple of choices.

You can interpolate. This is probably the more accurate way. Interpolation involves estimating the
critical value by figuring out how far the given degrees of freedom are between the two df in the table and going that
far between the corresponding critical values in the table. Most people born in the 1970s didn't have to learn interpolation in high
school because they had calculators that could do logarithms (we had to use tables in the "good old" days).

You can go with the critical value that is less likely to cause you to reject in error (Type I error).
For a right-tail test, this is the critical value further to the right (larger). For a left-tail test, it is the value further to
the left (smaller). For a two-tail test, it is the left critical value further to the left and the right critical value further to the right. Note that it is
not the column with the degrees of freedom further to the right; it is the critical value that is further to the
right. The Bluman text has this wrong on page 422: the guideline is right, but the instructions are wrong.
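Both options can be sketched in a few lines of Python. The function names are hypothetical. The example numbers are the α = 0.05 right-tail table entries for df = 40 (55.758) and df = 50 (67.505), used to estimate a critical value for df = 45, which is not in the table.

```python
def interpolate_critical(df, df_low, cv_low, df_high, cv_high):
    """Linear interpolation between two table rows that bracket df."""
    fraction = (df - df_low) / (df_high - df_low)  # how far df lies between the two rows
    return cv_low + fraction * (cv_high - cv_low)  # go that far between the critical values


def conservative_critical(cv_a, cv_b, tail):
    """Choose the table value less likely to cause a Type I error.

    Compares the critical values themselves, not the df rows or columns.
    For a two-tail test, call this once with tail="left" for the left
    critical values and once with tail="right" for the right ones.
    """
    if tail == "right":
        return max(cv_a, cv_b)  # further to the right
    if tail == "left":
        return min(cv_a, cv_b)  # further to the left
    raise ValueError("tail must be 'right' or 'left'")


print(interpolate_critical(45, 40, 55.758, 50, 67.505))  # about 61.63
print(conservative_critical(55.758, 67.505, "right"))    # 67.505
```

For a left-tail test the same helper picks the smaller value, e.g. between the df = 40 and df = 50 entries for a right-tail area of 0.975 (24.433 and 32.357) it returns 24.433.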
Stats: Goodness-of-fit Test
The idea behind the chi-square goodness-of-fit test is to see if the sample comes from the population with the
claimed distribution. Another way of looking at that is to ask if the frequency distribution fits a specific pattern.
Two values are involved: an observed value, which is the frequency of a category from a sample, and the
expected frequency, which is calculated based upon the claimed distribution. The derivation of the formula is very
similar to that of the variance which was done earlier (chapter 2 or 3).
The idea is that if the observed frequency is really close to the claimed (expected) frequency, then the square of
the deviations will be small. The square of the deviation is divided by the expected frequency to weight
frequencies. A difference of 10 may be very significant if 12 was the expected frequency, but a difference of 10
isn't very significant at all if the expected frequency was 1200.
If the sum of these weighted squared deviations is small, the observed frequencies are close to the expected
frequencies and there would be no reason to reject the claim that the sample came from that distribution. Only when the
sum is large is there a reason to question the distribution. Therefore, the chi-square goodness-of-fit test is always
a right-tail test.
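Written out, the statistic sums the weighted squared deviations over the k categories, with $O_i$ the observed and $E_i$ the expected frequency of category i (symbol names chosen here for illustration). The second line works the difference-of-10 comparison from earlier in this section:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

$$\frac{(22 - 12)^2}{12} = \frac{100}{12} \approx 8.33 \qquad \text{versus} \qquad \frac{(1210 - 1200)^2}{1200} = \frac{100}{1200} \approx 0.083$$

The same squared difference of 100 contributes exactly 100 times as much when the expected frequency is 12 as when it is 1200.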
The test statistic has a chi-square distribution when the following assumptions are met:

The data are obtained from a random sample.

The expected frequency of each category must be at least 5. This goes back to the requirement that
the data be normally distributed. The goodness-of-fit test approximates a multinomial experiment (a discrete distribution)
with the chi-square distribution (a continuous one), and if each expected frequency is at least five, you
can use the normal distribution to approximate it (much like the binomial). If the expected …
The following are properties of the goodness-of-fit test:

The data are the observed frequencies. This means that there is only one data value for each
category. Therefore, ...

The degrees of freedom is one less than the number of categories, not one less than the sample
size.

It is always a right tail test.

It has a chi-square distribution.

The value of the test statistic doesn't change if the order of the categories is switched.
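Putting the assumptions and properties together, a goodness-of-fit test can be sketched in Python. This is a sketch under stated assumptions: the function name is hypothetical, expected frequencies are computed as sample size times each claimed proportion, and the caller supplies the right-tail critical value from a table (11.070 is the α = 0.05 entry for df = 5). The die-roll counts are made-up illustration data.

```python
def goodness_of_fit(observed, proportions, critical_value):
    """Chi-square goodness-of-fit test.

    observed: observed frequency of each category (one value per category)
    proportions: claimed proportion of each category
    critical_value: right-tail table value for df = k - 1 at the chosen alpha
    Returns (statistic, df, reject_null).
    """
    n = sum(observed)
    expected = [n * p for p in proportions]
    if any(e < 5 for e in expected):
        raise ValueError("every expected frequency must be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # categories minus one, not sample size minus one
    return statistic, df, statistic > critical_value  # always a right-tail test


# A die rolled 60 times, tested against the claim that all faces are equally likely.
rolls = [8, 12, 9, 11, 6, 14]
print(goodness_of_fit(rolls, [1 / 6] * 6, 11.070))
# Reordering the categories does not change the statistic.
print(goodness_of_fit(rolls[::-1], [1 / 6] * 6, 11.070))
```

Each expected frequency is 10, so the statistic works out to (4 + 4 + 1 + 1 + 16 + 16) / 10 = 4.2 with df = 5. That is well below 11.070, so the claim of a fair die is not rejected.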
Stats: Test for Independence

In the test for independence, the claim is that the row and column variables are independent of each other. This is
the null hypothesis.
The multiplication rule said that if two events were independent, then the probability of both occurring was the
product of the probabilities of each occurring. This is key to working the test for independence. If you end up
rejecting the null hypothesis, then the assumption must have been wrong and the row and column variables are
dependent. Remember, all hypothesis testing is done under the assumption that the null hypothesis is true.
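Under the null hypothesis, the multiplication rule gives the expected frequency of each cell. Writing $R_i$ for the total of row i, $C_j$ for the total of column j and $n$ for the grand total (notation assumed here), and estimating each probability from the table totals:

$$E_{ij} = n \cdot P(\text{row } i) \cdot P(\text{column } j) \approx n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n}$$

This is where the rule "row total times column total, divided by the grand total" for expected values comes from.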
The test statistic used is the same as the chi-square goodness-of-fit test. The principle behind the test for
independence is the same as the principle behind the goodness-of-fit test. The test for independence is always a
right tail test.
In fact, you can think of the test for independence as a goodness-of-fit test where the data is arranged into table
form. This table is called a contingency table.
The test statistic has a chi-square distribution when the following assumptions are met:

The data are obtained from a random sample.

The expected frequency of each category must be at least 5.


The following are properties of the test for independence:

The data are the observed frequencies.

The data is arranged into a contingency table.

The degrees of freedom are the degrees of freedom for the row variable times the degrees of
freedom for the column variable, that is, (number of rows - 1)(number of columns - 1). It is not one less than the sample size; it is the product of the two degrees of
freedom.

It is always a right tail test.

It has a chi-square distribution.

The expected value is computed by taking the row total times the column total and dividing by the
grand total.

The value of the test statistic doesn't change if the order of the rows or columns is switched.

The value of the test statistic doesn't change if the rows and columns are interchanged (transpose
of the matrix).
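The properties above translate into a short Python sketch of the test for independence. The function name and the 2 × 3 table of counts are hypothetical; the table is a list of rows, and the caller supplies the right-tail critical value from a table (5.991 is the α = 0.05 entry for df = 2).

```python
def independence_test(table, critical_value):
    """Chi-square test for independence on a contingency table given as a list of rows.

    Returns (statistic, df, reject_null).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected value: row total times column total divided by the grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected < 5:
                raise ValueError("every expected frequency must be at least 5")
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) times (columns - 1)
    return statistic, df, statistic > critical_value  # always a right-tail test


counts = [[20, 30, 50],
          [30, 20, 50]]
print(independence_test(counts, 5.991))  # (4.0, 2, False)
# Transposing the table (swapping rows and columns) gives the same statistic and df.
transposed = [list(col) for col in zip(*counts)]
print(independence_test(transposed, 5.991))  # (4.0, 2, False)
```

Here the expected counts are 25, 25 and 50 in each row, the statistic is 4.0 with df = 2, and since 4.0 is below 5.991 the claim of independence is not rejected.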

Various definitions of research by various authors


Clarke and Clarke: Research is a careful, systematic and objective investigation conducted to obtain valid facts,
draw conclusions and establish principles regarding an identifiable problem in some field of knowledge.
John W. Best: Research is a systematic and objective analysis and recording of controlled observations that may
lead to the development of generalizations, principles, theories and concepts, resulting in prediction and possibly
ultimate control of events.

Clifford Woody: Research is a careful enquiry or examination in seeking facts or principles; a diligent
investigation to ascertain something.
Mouley: It is the process of arriving at dependable solutions to problems through the planned and systematic
collection, analysis and interpretation of data.

Research is a logical and systematic search for new and useful information on a particular topic. It is an
investigation of finding solutions to scientific and social problems through objective and systematic analysis. It is
a search for knowledge, that is, a discovery of hidden truths. Here knowledge means information about matters.
The information might be collected from different sources like experience, human beings, books, journals, nature,
etc. Research can lead to new contributions to existing knowledge. Only through research is it possible to
make progress in a field. Research indeed drives civilization and determines the economic, social and political
development of a nation. The results of scientific research very often force a change in the philosophical view of
problems which extend far beyond the restricted domain of science itself.
Research is not confined to science and technology only. There are vast areas of research in other disciplines
such as languages, literature, history and sociology. Whatever might be the subject, research has to be an active,
diligent and systematic process of inquiry in order to discover, interpret or revise facts, events, behaviours and
theories. Applying the outcome of research for the refinement of knowledge in other subjects, or in enhancing the
quality of human life also becomes a kind of research and development.
Research is done with the help of study, experiment, observation, analysis, comparison and reasoning.
Research is in fact ubiquitous. For example, we know that cigarette smoking is injurious to health; heroin is
addictive; cow dung is a useful source of biogas; malaria is caused by the protozoan parasite Plasmodium; AIDS
(Acquired Immunodeficiency Syndrome) is caused by the virus HIV (Human Immunodeficiency Virus). How did
we know all this? We became aware of all this information only through research. More precisely, research seeks
predictions of events, explanations, relationships and theories for them.

Twitter
Pivoted from: Odeo
In 2005, Evan Williams and Biz Stone designed a platform to create, browse, and share podcasts. They were
making a bet that podcasting would become a mainstream medium for sharing news and broadcasting opinion.
They would eventually be proven correct, but not before Apple launched podcast support for iTunes in June.
Williams and Stone took a step back and researched the new market, exploring user adoption rates, technology,
and customer acquisition costs. They concluded that they had no real chance of competing against Apple.
Crucially, however, they didn't simply give up. They realized that the platform they had built had tremendous
scalability and potential. What if they doubled down on simplicity and just made a portal where people could
share what they were up to? They looked at existing social networks like Facebook and researched customer
dissatisfaction. Users loved Facebook for photo-sharing and friend-snooping, but often found the News Feed to
be overwhelming and cluttered. Their new venture, Twitter, would provide a back-to-the-basics feed of
information, with a focus on news and celebrity. It seemed crazy, but they pulled it off, accomplishing one of the
most successful pivots of the 21st century.
Advantages of primary data: Advantages of primary data are as follows:

The primary data are original and relevant to the topic of the research study, so the degree of accuracy is very
high.
Primary data can be collected in a number of ways, such as interviews, telephone surveys and focus groups.
It can also be collected across national borders through email and post. It can include a large population
and wide geographical coverage.
Moreover, primary data is current and can give the researcher a more realistic view of the topic under
consideration.
Reliability of primary data is very high because the data are collected by the concerned, reliable party.

Addresses Specific Research Issues


Carrying out their own research allows the marketing organization to address issues specific to their own
situation. Primary research is designed to collect the information the marketer wants to know (Step 2) and
report it in ways that benefit the marketer. For example, while information reported with secondary research
may not fit the marketer's needs (e.g., different age groupings), no such problem exists with primary research
since the marketer controls the research design.

Greater Control
Not only does primary research enable the marketer to focus on specific issues, it also enables the marketer to
have a higher level of control over how the information is collected. In this way the marketer can decide on
such issues as size of project (e.g., how many responses), location of research (e.g., geographic area) and time
frame for completing the project.

Efficient Spending for Information


Unlike secondary research, where the marketer may pay for information that is not needed, primary data
collection focuses on issues specific to the researcher, which improves the chances that research funds will be spent
efficiently.

Proprietary Information
Information collected by the marketer using primary research is their own and is generally not shared with
others. Thus, information can be kept hidden from competitors and potentially offer an information
advantage to the company that undertook the primary research.

Disadvantages of primary data: Following are the disadvantages of primary data:


For collection of primary data where interviews are to be conducted, the coverage is limited, and wider coverage
requires more researchers.
A lot of time and effort is required for data collection. By the time the data are collected and analysed and the report is
ready, the research problem may have become very serious or outdated, so the purpose of the research may be
defeated.
It has design problems, such as how to design the surveys. The questions must be simple to understand and respond to.
Some respondents do not give timely responses. Sometimes, respondents may give fake, socially acceptable
and sweet answers and try to cover up the realities.

Cost

Compared to secondary research, primary data may be very expensive since there is a great deal of marketer
involvement and the expense in preparing and carrying out research can be high.

Time Consuming

To be done correctly, primary data collection requires the development and execution of a research plan. Going
from the start-point of deciding to undertake a research project to the end-point of having results often takes much
longer than acquiring secondary data.

Not Always Feasible

Some research projects, while potentially offering information that could prove quite valuable, are not within
the reach of a marketer. Many are just too large to be carried out by all but the largest companies and some are
not feasible at all. For instance, it would not be practical for McDonald's to attempt to interview every customer
who visits its stores on a certain day, since doing so would require hiring a huge number of researchers, an
unrealistic expense. Fortunately, as we will see in a later tutorial, there are ways for McDonald's to use other
methods (e.g., sampling) to meet its needs without talking with all customers.

Merits
Degree of accuracy is quite high.

It does not require extra caution.

It depicts the data in great detail.

Primary source of data collection frequently includes definitions of various terms and units
used.

For some investigations, secondary data are not available.


Demerits
Collection of data requires a lot of time.

It requires a lot of finance.

In some enquiries it is not possible to collect primary data.

It requires a lot of labor.

It requires a lot of skill.
