
PRIMARY DATA VS. SECONDARY DATA

MEANING:
Primary data: Data collected first hand is called primary data.
Secondary data: Secondary data are collected from readily available sources, such as publications.

METHOD/SOURCES:
Primary data: Primary data are collected through surveys/interviews, experimentation and observation.
Secondary data: Secondary data are collected from internal sources, such as company records, and from external sources, such as newspapers, magazines, government records, industry reports, etc.

ACCURACY:
Primary data: Primary data are more accurate, as such data are collected first hand by the researcher or field interviewers.
Secondary data: Secondary data may not be as accurate, as such data are collected from readily available published sources.

RELIABILITY:
Primary data: Primary data are more reliable, as they are collected first hand by the researcher.
Secondary data: Secondary data may not be as reliable, as such data are collected from readily available sources, which may or may not be accurate.

TIME FACTOR:
Primary data: To collect primary data, the researcher has to spend a lot of time and effort.
Secondary data: Secondary data can be collected from internal and external sources at a comparatively quicker pace.

COST FACTOR:
Primary data: To collect primary data, the researcher has to incur more expenses, such as preparing and printing questionnaires, training the interviewers, etc.
Secondary data: The cost of collecting secondary data is comparatively lower, as there is no need to prepare and print questionnaires and no need for field staff.

PAPER WORK:
Primary data: Primary data involve a lot of paper work in respect of questionnaires, tabulation and analysis of data.
Secondary data: Secondary data involve relatively less paper work, as the data are readily available from published sources.

In this final chapter we provide a summary of conclusions, lessons learned and a vision for future research, given the overall theme of data mining in context. Below we outline and relate some of the main findings of the chapters; see the chapters themselves for more background and references.

The main purpose of the cases chapter (chapter 2) is to demonstrate that successful data mining applications involve a lot more than just applying or improving a core modeling algorithm. The first two cases were originally written for audiences with no data mining or computer science background, namely marketers and medical professionals, respectively. In both fields there is a clear push towards a more data-driven, quantitative or even scientific approach, as illustrated by trends such as evidence-based medicine, personalized treatments, database marketing, one-to-one marketing and real-time decisioning. These cases provide an inside-out, end-to-end view of data mining and the data mining process, taking the application context rather than the technology as a starting point. One of the findings of both studies was that there were often no major differences between prediction algorithms on a given problem, nor were there clear winners across the overall range of problems.
The third case is an even stronger example of this. It is a more research-oriented project dealing with the recognition of pathogenic yeast cells in images. For this case it is still an open question whether the underlying problem (classifying yeasts) is truly easy to solve, given that the data mining problem (classifying pictures resulting from the chosen experimental set-up) is trivial. This is a good practical example of how the particular translation of the research or business problem into a data mining problem has a major impact on the results. That translation should be covered by the first step in the data mining process: defining the objectives and experimental approach.
The medical case on predicting head and neck cancer survival rates points out another specific issue. Building a model on a single data set is relatively straightforward and may lead to models with comparable accuracy, but a variety of other data sets from other sources is also available. These data sets may differ in terms of attribute coverage, population definition and experimental set-up. How can these data sets be combined into a single source of data to mine? This topic is addressed in chapter 3 and concerns the second step in the data mining process, the data step.
The fact that different classifiers produce similar results on the same data set also makes the case for going beyond mere performance evaluation. Evaluation methods should be used and developed that take more of a diagnosis and characterization approach to evaluation rather than just measuring quality, a topic that is addressed in chapters 4 and 5. This fits the evaluation step in the data mining process, traditionally the fourth step, straight after the modeling step. As argued above, however, the scope of this evaluation should not be constrained to the modeling step only, a topic that chapter 4 deals with specifically. In the second case we did carry out a basic bias-variance evaluation, but this was limited to comparing different modeling algorithms only. Just as in chapter 4, variance was a more important component than bias in explaining differences across classifiers. The experiments also provided us with data for a somewhat speculative attempt at estimating the intrinsic error, i.e. the error rate of the (theoretical) optimal classifier.
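The text does not state which bias-variance decomposition was used in the second case. The following is only a minimal sketch of one common decomposition for 0/1 loss, assuming zero noise. In this setup the same algorithm is trained on several training replicates. The "main prediction" for a test example is the most frequent label across those models. Bias is whether the main prediction is wrong, and variance is how often individual models deviate from the main prediction. The function name `bias_variance_01`, its input layout and the toy data are hypothetical. Because noise is assumed to be zero, the sketch does not estimate the intrinsic error mentioned above.

```python
from collections import Counter


def bias_variance_01(predictions, y_true):
    """Zero-noise bias-variance decomposition for 0/1 loss (illustrative sketch).

    predictions[r][i] is the label that the model trained on (hypothetical)
    training replicate r predicts for test example i; y_true[i] is the true label.
    Returns (mean_loss, bias, variance, net_variance), averaged over test examples.
    """
    n_reps = len(predictions)
    n_test = len(y_true)
    loss = bias = var = net_var = 0.0
    for i in range(n_test):
        preds_i = [predictions[r][i] for r in range(n_reps)]
        # Main prediction: the most frequent label across training replicates.
        main = Counter(preds_i).most_common(1)[0][0]
        loss_i = sum(p != y_true[i] for p in preds_i) / n_reps
        bias_i = 1.0 if main != y_true[i] else 0.0
        # Variance: how often individual models disagree with the main prediction.
        var_i = sum(p != main for p in preds_i) / n_reps
        loss += loss_i
        bias += bias_i
        var += var_i
        # Variance adds to the loss on unbiased examples and subtracts on biased
        # ones; for two classes this makes loss_i == bias_i + signed variance.
        net_var += var_i if bias_i == 0.0 else -var_i
    return loss / n_test, bias / n_test, var / n_test, net_var / n_test


# Toy example (made-up data): 3 training replicates, 3 test examples, binary labels.
preds = [[0, 1, 1],
         [0, 1, 0],
         [1, 1, 0]]
y = [0, 1, 1]
print(bias_variance_01(preds, y))
```

For this toy data the sketch gives a mean loss of 1/3, a bias of 1/3, a variance of 2/9 and a net variance of 0. The variance on the unbiased first example is cancelled by the variance on the biased third example. The exact additive identity holds only for two-class problems; multi-class decompositions need a more careful treatment of the biased case.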
The final case introduces a real-time automatic scene classifier for content-based video retrieval. In our envisioned approach, end users such as television archive documentalists, rather than image processing experts, build classifiers interactively by simply indicating positive examples of a scene. To produce classifiers that are sufficiently reliable, we have developed a procedure for generating problem-specific data preprocessors. This approach has been successfully applied to various domains of video content analysis, such as content-based video retrieval in television archives, automated sewer inspection and porn filtering. In our opinion, in most circumstances the ultimate goal for data mining is to let end users create classifiers, primarily because this is more scalable: many more classifiers can be created in much shorter time by many more users. In addition, the resulting models can be of higher quality than those of purely data-driven approaches, no matter how advanced the algorithms, because experts can inject domain knowledge by identifying relevant preprocessors. In terms of the data mining process, this case is mainly concerned with changing the agent executing the end-to-end process from data miner to domain expert.
In summary, given the apparently limited impact of modeling methods once a data set has been prepared, these cases demonstrate the importance of developing generally applicable methodology and tools to improve all steps in the process beyond modeling. This ranges from objective formulation, data collection and preparation through to evaluation and deployment, and it also opens up opportunities for end-to-end process automation.
