
Summary: Research Methods


30.03.2012

Part I Conceptual and Ethical Foundations


Chapter 1
The Spirit of Behavioral Research
Science and the Search for Knowledge:
Scientific method is a misleading term because not every finding can be accounted for by the same formulaic strategy; it is one of Peirce's four ways of knowing. Empirical reasoning is the process that underlies the scientific method. The verifiability principle holds that truth can be assessed by amassing factual observations. Falsifiable hypotheses are claims that can in principle be disconfirmed; knowledge evolves through their repeated testing.

What do Behavioral Researchers Really Know?:


Why-questions and how-questions: why-questions address the underlying process of something, while how-questions address the observable nature of things. Three descriptive orientations toward scientific knowledge are social constructionism, contextualism/perspectivism, and evolutionary epistemology.

Social Constructionism:
Panpsychistic. The (social) world exists only in the cognition of the individual, which is itself a linguistic construction; the world is explained through interpretations and narrative analyses of everyday life. The deductivist approach suffices, which leads to probabilistic assertions (A is more likely than B). If science were defined through experiments that can be exactly duplicated (Gergen, 1995), geology would not be a scientific discipline.

Contextualism/Perspectism:
Both started as a reaction against a mechanistic model of behavior. Pure knowledge in itself does not exist: circumstances acquire meaning only through context, such as societal systems and culture, and are true only within those borders (contextualism).


Therefore, all hypotheses and theories are true (and false) depending on the perspective. Thus, change (which is constrained by nature) must be intrinsic, and no theory can account for everything. This doctrine is called methodological pluralism and theoretical ecumenism (no universal laws without borders).

Evolutionary Epistemology:
The core idea is that successful theories and knowledge in science evolve in a competition for survival. Theories survive because of their usefulness, independently of context (contrast with contextualism). Ultimately there is an a priori reality waiting to be discovered.

Peirce's Four Ways of Knowing:


Method of tenacity: you are to believe something because it is common or old knowledge (religion, majority opinion). Method of authority: consult experts if you would like to know what to believe or what is true. A priori method: make use of reason and logic, thinking without being influenced by authorities. Scientific method: makes use of the a priori strategy and additionally encourages us to investigate the world (open-mindedness and skepticism).

The Rhetoric of Justification:


Rhetoric is defined as persuasive language used to propose, warrant, defend, or excuse certain beliefs. Whorfian hypothesis: the language that we use is our window onto the world.

Visualizations and Perceptibility:


The use of images in everyday reasoning and thinking in science and in the social world (the use of analogies etc.).

Aesthetics:
The perceptible images are evaluated on their beauty; probably a basic psychological component (theories in science can also be beautiful or elegant).

Limitations of the Four Supports of Conviction:


The factors of perceptibility, rhetoric, aesthetics, and empirical content are imperfect support. Make use of empirical methods to create comparable findings.

Behavioral Research Defined:



Neuroscience: most micro; biological and biochemical factors. Cognition: more micro; thinking and reasoning. Social psychology: more macro; interpersonal and group factors. Sociology: most macro; societal systems.

Three Broad Research Orientations:


Descriptive research orientation: What is happening behaviorally? (How.) This approach is frequently considered a necessary first step. Relational research orientation: focuses on at least two variables to infer association/causality, making use of ad hoc hypotheses (developed for one particular result) and working hypotheses (suppositions we are working with). Experimental research orientation: focuses on the identification of causes (Why). The researcher uses experimental and control groups to infer causality. The research should be programmatic, meaning it follows a plan that involves more than a single study (or a single set of observations).

The Descriptive Research Orientation:


Observational study. The correlation between predictor (independent) and criterion (dependent) variable.

The Relational Research Orientation:


The value of replication: varied replications differ slightly from one another. A construct is an abstract (psychological) idea used as an explanatory concept. Social desirability describes the assumption that people differ in their need for approval and affection. To operationalize something is to define it in terms of observable/empirical criteria. A pseudosubject is a confederate of the experimenter.

The Experimental Research Orientation:


A number of female monkeys became mothers themselves.

Empirical Principles as Probabilistic Assertions:


Empirical principles are based on controlled observations and generally accepted scientific truths about the likelihood of behavior in a situation. Therefore, we think of these principles as probabilistic assertions. Deductive-statistical explanation: reasoning from the general to the specific in a probabilistic way.

Inductive-statistical explanation: Reasoning from specific to general in a probabilistic way.

Orienting Habits of Good Scientific Practice:


Enthusiasm. Open-mindedness. Common sense. Role-taking ability. Inventiveness. Confidence in one's own judgment. Consistency and care about detail. Ability to communicate. Honesty.


Chapter 2
Contexts of Discovery and Justification
Inspiration and Exploration:
Discovery refers to the origin of ideas or the genesis of theories and hypotheses. Justification refers to the processes by which hypotheses are empirically adjudicated. Null hypothesis significance testing is a dichotomous decision-making paradigm (reject or do not reject).

Theories and Hypotheses:


Hypotheses are theoretical statements or theoretical propositions. Theories are hypothetical formulations; conjectural. Thinking inductively is thinking theoretically. Testability means that theories and hypotheses are stated in a way that should allow disconfirmation (falsification).

Using a Case Study for Inspiration:


Using an in-depth analysis of an individual or a group of people with shared characteristics.

Serendipity in Behavioral Research:


Serendipity is the term for lucky findings.

Novelty, Utility, and Consistency:


Novelty: an idea shouldn't be merely a minor variation on an older idea passed off as a big contribution to the scientific world. Utility and consistency: Is the idea useful and consistent with what is generally known in the field? To ensure that you are up to date, consult the newest literature on your topic.

Testability and Refutability:


Popper's idea was that it is not verifiability but falsifiability that marks the essential difference between science and non-science/pseudoscience.

Clarity and Conciseness:


Operational definition: The technical name for an empirically based definition.


Theoretical definitions: do not attempt to force our thinking into a rigidly empirical mold. A typology is a systematic classification of types used to condense the definition of a psychological concept. Facet analysis: formulating a classification system based on assumed structural patterns or dimensions (cf. factor analysis). Coherence means that the precise statement of the hypothesis fits logically with what is known. Parsimony describes the simplicity of a statement (Occam's razor).

Positivism:
Embracing positive (observational) knowledge: statements authenticated by sensory experience are more likely to be true (see the Vienna Circle). Hume: all knowledge resolves itself into probability, and thus it is impossible to prove beyond doubt that a generalization is incontrovertibly true.

Falsificationism:
An antipositivist view: empirical conclusions are never inescapable. It is argued that theories may have boundaries that will never be explored.

Conventionalism:
The Duhem-Quine thesis stresses the role of language: theories evolve on the basis of certain linguistic conventions (such as simplicity).

An Amalgamation of Ideas:
The views about the requirements of scientific theories and hypotheses now seem a mixture of falsificationism, conventionalism, and practicality:
i. Finite testability.
ii. Falsifiability.
iii. Theories can add to or replace outmoded models.
iv. If a theory is not supported, it may not be right.
v. Though if there is support, the theory still may not be true.
vi. There are always alternative explanations.

Type I and Type II Decision Errors:


Null hypothesis: H0. Alternative hypothesis: H1.

A Type I error is the mistake of rejecting H0 when it is true. A Type II error is the mistake of not rejecting H0 when it is false. The significance level, denoted alpha, indicates the probability of a Type I error (the p-value is the attained significance level). The probability of a Type II error is symbolized as beta. Confidence, 1 - alpha, is the probability of not making a Type I error. Power, 1 - beta, is the probability of not making a Type II error; sensitivity.
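The relationships among alpha, beta, and power can be sketched numerically. This is a minimal illustration, not from the text, assuming a one-sided one-sample z-test with known sigma; the effect size and sample size are made-up values.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a one-sided one-sample z-test, sigma known."""
    z_crit = NormalDist().inv_cdf(1 - alpha)       # rejection cutoff under H0
    # Under H1 the test statistic is shifted by effect_size * sqrt(n)
    return 1 - NormalDist().cdf(z_crit - effect_size * n ** 0.5)

power = z_test_power(effect_size=0.5, n=30, alpha=0.05)   # about .86
```

Note how the trade-off works: a stricter alpha lowers power, while a larger n raises it.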

Statistical Significance and the Effect Size:


Significance test = Size of effect × Size of study. Any particular test of significance can be obtained by one or more definitions of the effect size multiplied by one or more definitions of the study size.
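One concrete instance of this decomposition: for a 2x2 table with 1 df, chi-square = phi² × N. The cell counts below are hypothetical.

```python
import math

# Hypothetical 2x2 table of counts
a, b, c, d = 30, 10, 10, 30
N = a + b + c + d

# Effect size: the phi coefficient
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Significance test: Pearson chi-square from observed vs expected counts
row1, row2, col1, col2 = a + b, c + d, a + c, b + d
expected = [row1 * col1 / N, row1 * col2 / N, row2 * col1 / N, row2 * col2 / N]
observed = [a, b, c, d]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# chi2 equals phi**2 * N: same effect with a larger N is "more significant"
```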

Two Families of Effect Sizes:


Correlation family (phi, r_pb, R, partial eta squared). Difference family (Cohen's d, Hedges's g). (Also a ratio family.)
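A sketch of the two families on invented data: Cohen's d from the difference family, and its conversion to the correlation metric (the d-to-r formula below assumes equal group sizes).

```python
import math
import statistics

def cohens_d(x, y):
    """Difference-family effect size: mean difference / pooled SD."""
    nx, ny = len(x), len(y)
    sp = math.sqrt(((nx - 1) * statistics.variance(x) +
                    (ny - 1) * statistics.variance(y)) / (nx + ny - 2))
    return (statistics.mean(x) - statistics.mean(y)) / sp

def d_to_r(d):
    """Convert d to the correlation-family metric (equal n per group)."""
    return d / math.sqrt(d ** 2 + 4)

d = cohens_d([6, 7, 8, 9], [4, 5, 6, 7])   # about 1.55
```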

Interval Estimates around Effect Sizes:


Null-counternull interval is an estimate based on the actual p rather than the chosen alpha level.


Chapter 3
Ethical Considerations, Dilemmas, and Guidelines
Puzzles and Problems:
A controversy between Wittgenstein and Popper concerned their views of philosophy: for Wittgenstein there are no problems in philosophy, simply linguistic puzzles revealing misuse of language. Ethics has to do with the values by which the conduct of individuals is morally evaluated.

A Delicate Balancing Act:


Even if topics (and the researcher) seem neutral, one must realize that to others these topics are not necessarily value-free. Institutional Review Board (IRB): a group of independent people that oversees the work of scientists. Passive deception: deception by leaving out or avoiding something. Active deception: deception through intentionally misleading information on behalf of a certain party.

Historical Context of the APA Code:


The APA created a task force (the Cook Commission), which wrote a code of ethics (1966; adopted in 1972). The APA proposed a list of 10 ethical guidelines.

The Belmont Report, Federal Regulations, and the IRB:


Belmont Report: emphasizes respect, maximization of plausible benefits (and minimization of harm), and fairness. If more than minimal risk is involved, specific safety requirements are necessary. Minimal-risk research: studies in which the probability that the participant will be harmed is no higher than in everyday life. The emphasis is on 5 broad principles.

Principle I Respect for Persons and their Autonomy:


Informed consent refers to the procedure by which prospective subjects voluntarily agree to participate. It may also impair the validity of the research (subject expectations) or create doubts of its own (paranoid ideation).

Principle II Beneficence and Nonmaleficence:


Doing good (beneficence) and doing no harm (nonmaleficence). Debriefing is required if deception is used; also referred to as dehoaxing. It can also be useful for disclosing information that was not revealed before.

Principle III Justice:


Placebo (represents an issue of fairness); it implies a masquerading of the real thing. Wait-list control group: the alternative therapy is given to the control group after results have been documented in the experimental group. How people view ethical questions depends on their orientation: the consequentialist view argues that right or wrong depends on the ultimate consequence, whereas the deontologist view argues that the procedure matters, not the ensuing consequence. For participants who have been in a placebo or control group, a moral cost may be involved in simply publishing the results of the study.

Principle IV Trust:
Confidentiality is intended to ensure the subjects' privacy by procedures for protecting the data (see Certificate of Confidentiality).

Principle V Fidelity and Scientific Integrity:


Causism is the problem that occurs when someone implies a causal relationship where none is supported by the underlying data. Omission of data, questionable handling of outliers, and outright fabrication of data undermine any meaningful association between variables. There is a discussion about the moral and technical appropriateness of analysis and re-analysis of data. Plagiarism refers to stealing another's ideas or work.

Costs, Utilities, and IRBs:


It is important to consider not only the expenses of research but also the probable costs of doing no research in a specific field (prospective cost/risk analysis).

Scientific and Societal Responsibilities:


Research on animals has been the foundation for numerous significant advances.


Part II Operationalization and Measurement of Dependent Variables


Chapter 4
Reliability and Validity of Measurements
Random and Systematic Error:
Reliability is the degree to which measures give consistent, dependable, and stable results. Validity refers to the degree to which measures are appropriate or meaningful. Error refers to the fact that all measurements are subject to fluctuations. Noise is another name for chance error. Random error is presumed to be uncorrelated with the actual value; it pushes measurements up or down unpredictably, so over many trials the average should be very close to the true value. Systematic error pushes measurements in the same direction, so over many trials the population value won't be reached. Systematic errors in experimental research are the main concern of internal validity.
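The distinction can be shown with a toy simulation (all values invented): random error averages out over many trials, while systematic error (bias) shifts every measurement the same way and never averages out.

```python
import random
from statistics import mean

rng = random.Random(42)          # fixed seed for reproducibility
true_value = 100.0

# Random error only: zero-mean noise around the true value
random_only = [true_value + rng.gauss(0, 5) for _ in range(10_000)]

# Random error plus a constant systematic bias of +3
biased = [true_value + 3 + rng.gauss(0, 5) for _ in range(10_000)]

# mean(random_only) converges toward 100; mean(biased) stays near 103
```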

Assessing Stability and Equivalence:


Three traditional types of reliability exist, and each is quantified by a reliability coefficient.
i. Test-retest reliability: consistency from one measurement to another. Stability coefficients are correlations between scores on the same form administered to the same people at different times.
ii. Alternate-form reliability: equivalence of different versions of tests. The correlation between two tests is called equivalent-forms reliability; the tests are also expected to have the same variance. Coefficients of equivalence are correlations between scores on different forms administered at the same time. Cross-lagged correlations indicate both stability and equivalence.
iii. Internal-consistency reliability: the degree of relatedness of items that measure the same thing in a test; the reliability of components.

Internal-Consistency Reliability and Spearman Brown:


If we judge the internal-consistency reliability of a test to be too low, we can increase the value of R by increasing the number of items, as long as the items remain reasonably homogeneous. The Spearman-Brown equation is particularly useful to estimate the total reliability of a test when increasing the number of test items.
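The Spearman-Brown prediction can be written in a few lines; the numbers below are illustrative.

```python
def spearman_brown(r, n):
    """Predicted reliability of a test lengthened by factor n,
    given the reliability r of the current test."""
    return n * r / (1 + (n - 1) * r)

doubled = spearman_brown(0.5, 2)   # doubling a test with r = .50 gives ~.67
```

The same formula gives the effective reliability R_SB of a group of n judges when r is the mean judge-to-judge correlation.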

KR20 and Cronbachs Alpha:


Split-half reliability is obtained by correlating one half of a test with the other half. The problem is that we get different values depending on where we make the split. KR20 is based on all possible splits and is used when test items are scored dichotomously. KR20 can also be understood as a special case of Cronbach's alpha.
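A sketch of Cronbach's alpha on hypothetical item scores; with 0/1 (dichotomous) scoring, as here, the result coincides with KR20.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """rows: one list of item scores per person."""
    k = len(rows[0])                               # number of items
    items = list(zip(*rows))                       # each column = one item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])  # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical data: 4 persons x 4 dichotomous items
alpha = cronbach_alpha([[1, 1, 1, 0],
                        [1, 1, 0, 0],
                        [1, 0, 0, 0],
                        [0, 0, 0, 0]])   # about .67
```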

Effective Reliability of Judges:


Effective reliability is the estimated reliability of a group of judges (R_SB, via the Spearman-Brown equation). To get an estimate of the judge-to-judge reliability, the intraclass correlation is used.

Effective Cost of Judges:


Effective cost (EC) is a procedure that tries to maximize reliability for fixed cost. Once the EC is assessed for each judge, they are ranked from best to worst. When two or more different kinds of judges are used, the Spearman-Brown formula cannot be used.

Interrater Agreement and Reliability:


Interrater agreement can be very misleading and ambiguous. You need the number of agreements and disagreements to compute it. A better procedure is the product-moment correlation.

Cohens Kappa:
Cohen's kappa improves on percentage agreement by adjusting for agreement expected by chance (including agreement based on simple lack of variability). In counts: observed agreement minus expected agreement, divided by the total number of cases minus the expected agreement. Omnibus statistical procedures have the problem that it is difficult to tell which statements are reliable and which are not. Focused statistical procedures test a specific statement.
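The count formulation above translates directly to code; the agreement table below is hypothetical (rows = rater 1's categories, columns = rater 2's).

```python
def cohens_kappa(table):
    """Kappa = (observed - expected agreement) / (N - expected), in counts."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table)))   # diagonal
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    expected = sum(r * c / n for r, c in zip(row_sums, col_sums))
    return (observed - expected) / (n - expected)

kappa = cohens_kappa([[20, 5],
                      [10, 15]])   # -> 0.4
```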

Replication in Research:
Replication (repeatability) can only be relative: no exact replication will ever be possible, and several factors affect the utility of a replication.


i. When the replication is conducted: early replications are typically more useful.
ii. How the replication is conducted: a precise replication is intended to be as close as possible to the original, whereas a varied replication intentionally changes an aspect of it.
iii. Who conducted the replication, because of the problem of correlated replicators (the same person replicating a finding over and over again). Unfortunately and fortunately, some researchers are precorrelated by virtue of their common interests.

Validity Criteria in Assessment:


Determining validity typically depends on the accumulation of evidence in three specific areas.
i. Content validity requires that the items represent the material/content they are supposed to represent.
ii. Criterion validity concerns the extent to which the test correlates with criteria it should correlate with. When a criterion is in the immediate present, we speak of concurrent validity; another type of criterion-related indication is predictive validity.
iii. Construct validity concerns how well the measure captures the psychological characteristic (construct) it was designed to assess. There are two types: convergence (convergent validity) across different methods or measures of the same trait, and divergence (discriminant validity) between measures of related but conceptually distinct behaviors or traits. The multitrait-multimethod matrix of intercorrelations is used to triangulate the convergent and discriminant validity of a construct.

Test Validity, Practical Utility, and the Taylor-Russell Tables:


The selection ratio is the proportion of applicants to be selected by a test. Selection accuracy increases as validity coefficients increase and selection ratios decrease. The benefits of increasing validity coefficients are usually greater as selection ratios decrease.

Relationship of Validity to Reliability:


There is no minimum level of internal-consistency reliability needed for validity, but in practice low reliabilities associated with high validity are uncommon. In practice the validity of a composite instrument depends on the average validity, the number, and the intercorrelation of the individual items, subtests, or judges. The larger, the better.


Chapter 5
Observations, Judgments, and Composite Variables
Observing, Classifying, and Evaluating:
Qualitative data refer to data in written form, records, etc.; quantitative data consist of numerical data.

Observing While Participating:


The preferred strategy of investigation of ethnographers is participant observation, which means the interaction while participating as observers in a culture.

Maximizing Credibility and Serendipity:


Time sampling involves sampling specified periods and recording everything of interest during that time. Behavioral sampling is used when the behavior itself is periodically sampled. Events are relatively brief occurrences at a specific moment; states are occurrences of longer duration. A condensed account consists of brief notes made in the field; an expanded account adds details from recall of things that were not recorded earlier.

Organizing and Sense-Making in Ethnographic Research:


Organize it chronologically and zoom from broad to narrow. Analytic serendipity starts with knowledge of the current literature and proceeds by asking questions about particular phenomena.

Interpreter and Observer Biases:


These are normally categorized as noninteractional artifacts: systematic errors that operate in the hands of the scientist but are not due to uncontrolled variables that might interact with the participants' behavior. Interpreter biases are systematic errors that occur while interpreting the data (e.g., clinging to a specific theory). Observer biases refer to systematic errors in the recording phase of research (perception does not equal reality), generally in favor of the researcher's hypothesis.

Unobtrusive Observations and Nonreactive Measurements:



Reactive measures are not controlled for and have a direct impact on the reactions of research participants: they affect the behavior that is being measured, whereas nonreactive measures do not.

Archives:
Two subcategories exist in archival research: running records, such as actuarial data (birth, marriage, a Facebook timeline), and personal documents.

Archival Data and Content Analysis:


Content analysis is the name given to a set of procedures used to categorize and evaluate pictorial, verbal, or textual material. Use something (a hypothesis, common sense) as a basis for judging the material.

Physical Traces:
Simple unobtrusive observations are observations that are not apparent to the person being observed.

Unobtrusive Observation:
Contrived unobtrusive observations are unobtrusive measures in manipulated situations.

Selecting the Most Appropriate Judges:


In judgment studies, judges are used to evaluate or categorize the variables of interest (random judges or expert judges, depending on the corresponding hypothesis). There is generally a positive correlation between bias toward a category and accuracy in that category.

Effects of Guessing and Omissions on Accuracy:


Scoring omitted items as zero gives too little credit. It is reasonable to credit these items with the score that would be attained by random guessing.
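A small sketch of this crediting rule, assuming k-option multiple-choice items (the function name and numbers are illustrative): an omitted item is worth the expected score of a random guess, 1/k.

```python
def corrected_score(n_right, n_omitted, k):
    """Right answers plus the expected score of guessing on omitted items."""
    return n_right + n_omitted / k

# 30 right, 8 omitted, 4 answer options per item: 30 + 8/4 = 32
score = corrected_score(n_right=30, n_omitted=8, k=4)
```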

Forced-Choice Judgments:
Forced choice is a procedure used to overcome the halo error. Halo error refers to a type of response set in which the person being evaluated is judged in terms of a general impression.

Categorical Scales and Rating Scales:


Rating-scale judgments are responses on some form of continuous rating scale (numerical, magnitude, or graphic scale).

Numerical Formats:


The numerical format has numbers as anchors, followed by descriptions that give those numbers meaning and a specific context for evaluation.

Graphic Formats:
Graphic formats are simply straight lines (mostly invisibly consisting of different areas) on which the judge or participant indicates his/her position (attitude) with regard to the construct in question (anchors).

Scale Points and Labels:


The greatest benefits to the reliability of the measurement instrument accrue as you go from 2 to 7 points on a scale. Unipolar scales contain gradual distributions of one aspect; bipolar scales contain gradual distributions of one aspect and its opposite, anchored with neutrality in the middle.

Magnitude Scaling:
Magnitude scaling is a concept in which the upper range of the score is not defined but left to the interpretation of the judge (open-ended).

Rating Biases and their Control:


Central tendency bias occurs when raters avoid giving extreme ratings. Logical error in rating refers to the problem that judges rate variables or dimensions in a similar way, solely on the basis of perceived logical relatedness; the difference between halo and logical error is that the latter results from the judge's conscious evaluation of relatedness and not from a subjective feeling. Leniency bias: raters tend to rate someone who is familiar more positively; if you are reminded about leniency, you tend to evaluate more negatively.

Bipolar Versus Unipolar Scales:


It may often be worth the effort to use more unipolar scales in hopes of turning up some surprises; but bipolar are more common.

Forming Composite Variables:


To form composite variables, you start by standardizing the scores of all available variables and then replace these separate scores with their mean.
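A minimal sketch of this procedure on invented data: standardize each variable to z-scores, then average the z-scores person by person.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize one variable to mean 0, SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def composite(*variables):
    """variables: parallel lists, one per measure; one composite score per person."""
    z_cols = [zscores(v) for v in variables]
    return [mean(person) for person in zip(*z_cols)]

# Two hypothetical measures on three persons
comp = composite([10, 20, 30], [1, 2, 3])
```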

Benefits of Forming Composite Variables:


If variables are highly correlated with each other, it is hard to treat them as being different; thus, forming composite variables is conceptually beneficial. If you combine variables, you obtain more accurate estimates of the relationships with other composite variables, and you reduce the number of predictors.

Forming Composites and Increasing Effect Sizes:


Only when the individual variables are perfectly correlated with each other is there no benefit in forming composite variables. The lower the mean intercorrelation among the individual variables, the greater the increase in r.

The Intra/Intermatrix:
The intra average is the average correlation between variables within a single composite. The inter average characterizes the level of relationship between one composite variable and another.

The r Method:
In this method the point-biserial correlation is computed: a Pearson r in which one of the variables is continuous and the other dichotomous. The r_pb is computed between the mean correlations of the intra/intermatrix (continuous) and their dichotomously coded position on versus off the principal diagonal. The more positive the correlation, the higher the intra relative to the inter.

The g Method:
g is the difference between the mean of the mean rs on the diagonal and the mean of the mean rs off the diagonal, divided by the weighted S combined from the on-diagonal (intra) and off-diagonal (inter) values of r.

The Range-to-Midrange Ratio:


Is used if you cannot use the r or the g method.


Part III The Logic of Research Designs


Chapter 7
Randomized Controlled Experiments and Causal Inference
Experimentation in Science:
The most important principle is randomization. An experiment is a systematic study designed to examine the consequences of deliberately varying a potential causal agent. Common features are variation, posttreatment measures, and inferential techniques.

Randomized Experimental Designs:


Between-subjects designs are regarded as the gold standard: subjects are randomly assigned to a specific treatment group or a placebo group. As there is much ethical debate about the use of placebos, they should only be used when no other therapies are available that are suitable for comparison. Wait-list control groups are another possibility; combined with repeated measurements, they allow us to gain further information about the temporal effects of the drug. A within-subjects design, or crossed design, is used when each participant receives more than one treatment or is in more than one condition (e.g., being experimenter and subject at the same time). To address the problem of systematic differences between successive conditions, the experimenter can use counterbalancing, that is, rotating the sequences of the conditions in a so-called Latin square. A factorial design includes more than one variable (or factor), each represented by more than one level (e.g., 2x2); also known as a full factorial. Fractional factorial designs use only some combinations of factor levels. Mixed factorial designs consist of both between- and within-subjects factors.
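A Latin square for counterbalancing can be generated mechanically. The sketch below builds a simple cyclic square, in which each condition appears exactly once in every row (sequence group) and every column (ordinal position); note that balanced (Williams) squares, which additionally control first-order carryover, need a different construction.

```python
def latin_square(conditions):
    """Cyclic k x k Latin square: row = one group's condition sequence."""
    k = len(conditions)
    return [[conditions[(row + col) % k] for col in range(k)]
            for row in range(k)]

square = latin_square(["A", "B", "C", "D"])
# square[0] is A B C D, square[1] is B C D A, and so on
```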

Characteristics of Randomization:
The rare instances in which very large differences between conditions existed even before the treatments were administered are sometimes referred to as failures of randomization. Another variation on the randomized designs noted before is to use pretest measurements to establish baseline scores for all subjects.

Statistically, randomness means that each participant has the same probability of being chosen for a particular group.
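A minimal sketch of random assignment satisfying this property (participant IDs and group count are made up): shuffle the pool, then deal participants into equal-sized groups.

```python
import random

def randomize(participants, n_groups, seed=None):
    """Randomly assign participants to n_groups equal-sized groups."""
    pool = list(participants)
    random.Random(seed).shuffle(pool)       # every ordering equally likely
    return [pool[i::n_groups] for i in range(n_groups)]

groups = randomize(range(12), 3, seed=1)    # three groups of four
```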

The Philosophical Puzzle of Causality:


Puzzling topic, this causality! Causal relations imply an association between cause and effect. There are four types of causation:
i. The material cause refers to the elementary composition of things.
ii. The formal cause is the outline, conception, or vision of the perfected thing.
iii. The efficient cause (!) is the agent or moving force that brings about the change.
iv. The final cause, or teleological explanation, refers to the purpose, goal, or ultimate function of the completed thing.

Contiguity, Priority, and Constant Conjunction:


Hume: the sensation of causation is fictional. He came up with eight rules for judging causes and effects, boiled down to three essentials:
i. Contiguity in space and time.
ii. Priority, which means that the cause must come before the effect.
iii. Constant conjunction, or union, between cause and effect.

As an example, the barometer falls before it rains, but a falling barometer doesn't cause the rain. Likewise, Monday precedes Tuesday, but it is absurd to say that Monday causes Tuesday. Therefore, there is a missing ingredient for causality.

Four Types of Experimental Control:


Control conditions, and control generally, imply a check on the treatment condition. Behavior control refers to the shaping of learned behavior based on a particular schedule of reinforcement designed to elicit the behavior in question.

Mills Method of Agreement and Difference:


The method of agreement states: If X, then Y. X is in this case a sufficient (adequate) condition to bring about the effect Y. The method of difference states: If not-X, then not-Y; thus X is not just a sufficient condition of Y but a necessary one. In other words, X is required for Y to occur.

Between-groups Designs and Mills Joint Method:


The group given the treatment (experimental condition) resembles the method of agreement, whereas the group not given the drug (control condition) resembles the method of difference. The method described above is referred to as joint method of agreement and difference.

Independent, Dependent, and Moderator Variables:


The independent variable is the antecedent that evokes a presumed change in an outcome variable (predictor). The dependent variable is the status of a measurable consequence (criterion). Moderators may alter the relationship between cause-and-effect variables. Mediator variables are factors that intervene between the independent variable and the outcome variable in a causal chain. To avoid the implied meaning of a causal chain, researchers may focus on functional relations/correlations instead.

Solomons Extended Control Group Design:


Investigates the possible sensitizing effects of pretests in pre-post designs: it is assumed that pretests might change the subjects' attitudinal set. Therefore, it is argued that a three-group, or preferably a four-group, design is the (Solomon) way to go (Control II = pretest control). The additional group in the four-group design controls for history, that is, the effects of uncontrolled events that may be associated with the passage of time (Control III = history control).

Threats to Internal Validity:


Internal validity and causal inference depend on establishing a reliable relationship between an outcome and its presumed cause (covariation), evidence of temporal precedence, and the ruling out of plausible rival explanations.
i. Regression toward the mean concerns not actual scores but predicted ones. It occurs when pre and post variables consist of the same measure taken at two points in time and the correlation between them is smaller than 1.
The following four threats to internal validity are all diminished by using a Solomon design.
ii. History implies a source of error attributable to an uncontrolled event that occurs between the pre and post measurements and can bias the post measurement.
iii. Maturation describes intrinsic changes in the subjects that are confounded with the treatment effect.
iv. Instrumentation refers to intrinsic changes in the measurement instruments.
v. Selection is a potential threat when there are unsuspected differences between the participants in each condition; random allocation is not a guarantee of comparability between groups.
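The regression-toward-the-mean threat can be demonstrated with a small simulation (numbers purely illustrative): when the pre-post correlation is below 1, a group selected for extreme pretest scores scores less extremely at posttest even though no intervention occurred at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_score = rng.normal(size=n)
pre = true_score + rng.normal(size=n)   # same measure at time 1 (with error)
post = true_score + rng.normal(size=n)  # same measure at time 2 (with error)

r = np.corrcoef(pre, post)[0, 1]        # correlation smaller than 1
top = pre > np.quantile(pre, 0.9)       # extreme pretest scorers
mean_pre = pre[top].mean()
mean_post = post[top].mean()
print(r, mean_pre, mean_post)  # the posttest mean is closer to 0 than the pretest mean
```

A naive pre-post comparison in this selected group would misread the shrinkage as a treatment effect.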

Threats to External Validity:


External validity is the validity of inferences about whether findings can be generalized. Three issues are conflated in this broader use of the term.
i. Statistical generalizability refers to the representativeness of the results for a wider population.
ii. Conceptual replicability, or robustness.
iii. Realism, which can be divided into mundane realism (the extent of analogous meaning from the laboratory to the natural setting) and experimental realism (the psychological impact of the experimental manipulation on the participants).

Representative research design describes an idealized model in which the experimental arrangements are representative of the conditions to which the results will be generalized. Experiments that satisfy this criterion of representativeness are called ecologically valid. A single-stimulus design (e.g., use of experimenters of only one sex) has two major limitations.
i. If results differ after a second stimulus is introduced (e.g., a female experimenter), we cannot tell whether the original results were valid or confounded, which amounts to a threat to internal validity.
ii. If we fail to find differences, this could be due to an uncontrolled stimulus variable operating either to counteract an effect or to inflate it artificially to a ceiling value.

The use of convenience samples is a topic of major debate (see the convenience-sample debate; Hull/Tolman).

Statistical Conclusion and Construct Validity:


Statistical conclusion validity is concerned with inferences about the covariation between treatment and outcome (Hume's contiguity of events); see statistical power, fishing for statistically significant effects, etc. Construct validity refers to how well the sampled particulars of a study represent the higher-order constructs of interest, thus whether a test measures the characteristics that it is supposed to measure.

Subject and Experimenter Artifacts:


Artifacts are uncontrolled factors that are unevenly distributed across conditions and result in inaccurate findings. Subject and experimenter artifacts are systematic errors due to uncontrolled subject- or experimenter-related variables.

The Hawthorne effect describes how human subjects behave in a special way because they know they are subjects and under investigation. Saul Rosenzweig (one badass name!) describes three artifacts.
i. Observational attitude of the experimenter.
ii. Motivational attitude.
iii. Errors of personality influence (e.g., warmth or coolness of the experimenter).

The dust-bowl empiricist view emphasizes only observable responses as acceptable data in science, leaving out all cognitive accounts.

Demand Characteristics and Their Control:


Demand characteristics are subtle, uncontrolled task-orienting cues in an experimental situation. The good-subject effect is cooperative behavior used to support the view of the authority conducting the research. The quasi-control strategy has some of the participants step out of the good-subject role and act as co-investigators in the search for truth; participants thus serve as their own control and may later disclose the factors that determined their behavior. Preinquiry is another use of quasi-control, meaning that some of the prospective participants are sampled and afterwards separated from the pool. Evaluation apprehension produces spurious effects resulting from the participants' anxieties about how they will be evaluated. In some experimental situations, subjects may feel a conflict between evaluation apprehension and the good-subject effect (looking good versus doing good).

Interactional Experimenter Effects:


Interactional experimenter effects are at least to some extent attributable to the interaction between experimenters and their subjects. There are five classes of effects.
i. Biosocial attributes, including biological and social characteristics of experimenters such as gender, age, and race.
ii. Psychosocial attributes, including factors such as personality and temperament.
iii. Situational effects, referring to the overall setting, including the experience of the researcher.
iv. Modeling effects. It sometimes happens that before experimenters conduct their studies, they try out the tasks themselves; if a modeling effect occurs, the participants' responses are likely to be patterned on the researcher's own.
v. Expectancy effects, in which the experimenter's hypothesis acts as a self-fulfilling prophecy.

The use of replications is one amongst the most powerful tools available to control for these kinds of artifacts. Experimenter expectancy is a virtual constant in science and may lead to a self-fulfilling prophecy.

Experimenter Expectancy Effects and Their Control:


Several strategies are available to control for the effects of experimenters' expectancies.
i. Increasing the number of experimenters.
ii. Monitoring the behavior of experimenters.
iii. Analyzing experiments for order effects.
iv. Maintaining (double-)blind contact.
v. Minimizing experimenter-subject contact.
vi. Employing expectancy control groups.


Chapter 8
Nonrandomized Research and Functional Relationships
Nonrandomized and Quasi-Experimental Studies:
Quasi-experimental refers to experiments lacking the full control over the scheduling of experimental stimuli that makes randomized experiments possible. Association implies covariation to some degree. Methodological pluralism is necessary because all research designs are limited in some way (it depends). There are four types of nonrandomized strategies.
i. Nonequivalent-groups designs, in which the researchers do not have any control over the assignment to groups (historical control trials).
ii. Interrupted time-series designs, which use large numbers of consecutive outcome measures interrupted by a critical intervention; the objective is to assess the causal impact of the intervention by comparing measurements before and after it.
iii. Single-case studies, primarily used as detection experiments and frequently in neuroscience.
iv. Correlational designs, characterized by the simultaneous observation of interventions and their possible outcomes (retrospective covariation of X and Y).

Diachronic research is the tracking of the variable of interest over successive periods of time. Synchronic research is the name for studies that take a slice of time and examine behavior only at one point.

Nonequivalent Groups and Historical Controls:


Nonequivalent-groups designs resemble nonrandomized between-groups experiments, with the addition that there is usually a pre and post observation or measurement. It is not always ethically possible to withhold certain treatments from certain people. Furthermore, self-selection or assignment biases can introduce problems. One way to overcome these obstacles is to introduce randomization after assignment or through a wait-list control design. Historical controls or literature controls are often uninterpretable and dangerously misleading (see clinical data). Net effects may often mask true individual effects; pooling can lead to spurious conclusions because of a statistical irony named Simpson's paradox. The raw data should not be pooled before the individual results are first examined.
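Simpson's paradox is easy to reproduce with the classic kidney-stone figures often used to illustrate it (a sketch; these numbers are from the standard textbook illustration, not from this summary):

```python
# (successes, totals) per subgroup, classic kidney-stone illustration
a = {"small": (81, 87), "large": (192, 263)}
b = {"small": (234, 270), "large": (55, 80)}

def rate(s, n):
    return s / n

# Treatment A wins within every subgroup...
assert rate(*a["small"]) > rate(*b["small"])
assert rate(*a["large"]) > rate(*b["large"])

# ...yet loses when the raw data are pooled (Simpson's paradox),
# because A was given disproportionately to the harder (large-stone) cases.
a_pool = rate(81 + 192, 87 + 263)
b_pool = rate(234 + 55, 270 + 80)
print(a_pool, b_pool)  # ~0.78 vs ~0.83
```

This is exactly why the summary warns against pooling raw data before examining the individual results.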

Interrupted Time Series and the Autoregressive Integrated Moving Average:


Interrupted time-series designs make use of sampled measurements obtained at different time intervals before and after the intervention: a time series because there is a single data point for each time, and interrupted because there is a clear cut at the intervention (choose a sampling interval that will capture the effects of interest). The first step is to define the period of observation, the next to obtain the data; the last step is identifying the underlying serial effects (e.g., with an ARIMA model) and checking the fitted model. A pulse function is an abrupt change that lasts only a short time. A series of observations must be stationary, meaning that the values are assumed to fluctuate normally about the mean as opposed to systematically drifting upward or downward. Secular trends are systematic increases or decreases in the level of the series; a secular trend can be made stationary by differencing, a procedure that consists of subtracting the first observation from the second, and so forth. Autocorrelation refers to the extent to which data points or observations are dependent on one another or can be assumed to be independent.
i. Regular autocorrelation describes the dependency of adjacent observations or data points on one another.
ii. Seasonal autocorrelation describes the dependency of observations separated by one period or cycle.
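A minimal sketch of differencing and lag-1 autocorrelation (simulated data with an invented secular trend, not a full ARIMA fit):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(200)
series = 0.5 * t + rng.normal(scale=3.0, size=200)  # secular (upward) trend

def lag1_autocorr(x):
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

# The raw series is nonstationary: the trend makes adjacent points
# look highly dependent.
raw_r = lag1_autocorr(series)

# First differencing (x[t] - x[t-1]) removes the secular trend.
diff = np.diff(series)
diff_r = lag1_autocorr(diff)
print(raw_r, diff_r)  # near 1 for the raw series, far lower after differencing
```

Identification in ARIMA modeling works from such autocorrelation patterns of the (differenced) series.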

Single-Case Experimental Designs:


Single-case experimental studies involve repeated observations on a single unit (case), or on a few. Individual behavior is first assessed under specific baseline conditions against which any subsequent changes in behavior can be evaluated after an environmental treatment is manipulated. In the A-B-A, or reversal, design, A is the no-treatment (baseline) phase and B the treatment phase; the sequence can be extended or reordered as is most suitable. If the treatment is counterproductive or ineffective, the researcher can terminate the environmental manipulation or alter the scheduling of events. These studies are mostly cost-effective but still time-consuming, and their results may not be generalizable (hence the need for replication). Direct replication means the repetition of the same study; systematic replication refers to varying an aspect of the previous study.

Cross-lagged Correlational Designs:


Correlational designs and cross-lagged panel designs are frequently used in the behavioral sciences. Cross-lagged implies that some data points are treated as temporally lagged values of the outcome measures. Panel design is another name for longitudinal research (increased precision of treatment and an added time component, making it possible to detect temporary changes). Observed bivariate correlations can be too high, too low, spurious, or accurate (causation [?]) depending on the pattern of relationships among the variables in the structure that actually generated the data. The absence of correlation in cross-lagged designs is not proof of the absence of causation. Three sets of paired correlations are represented.
i. Test-retest correlations (rA1A2, rB1B2)
ii. Synchronous correlations (rA1B1, rA2B2)
iii. Cross-lagged correlations (rA1B2, rB1A2)
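The three sets of paired correlations can be computed directly from simulated two-wave panel data (a sketch with invented path coefficients, in which A at time 1 genuinely drives B at time 2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
a1 = rng.normal(size=n)
b1 = 0.3 * a1 + rng.normal(size=n)             # synchronous link at time 1
a2 = 0.6 * a1 + rng.normal(size=n)             # test-retest stability of A
b2 = 0.5 * a1 + 0.3 * b1 + rng.normal(size=n)  # A1 influences later B

def r(x, y):
    return np.corrcoef(x, y)[0, 1]

test_retest = (r(a1, a2), r(b1, b2))
synchronous = (r(a1, b1), r(a2, b2))
cross_lagged = (r(a1, b2), r(b1, a2))
# The asymmetry r(A1,B2) > r(B1,A2) is the pattern traditionally taken
# as (weak) evidence that A drives B rather than the reverse.
print(cross_lagged)
```

As the text notes, this asymmetry is suggestive rather than probative; an unmeasured third variable can produce the same pattern.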

In fact, relationships are seldom stationary; they are usually weaker over longer lapses of time, which is described as temporal erosion. The reduction is also called attenuation, and the leftover is referred to as the residual.

Invisible Variables and the Mediation Problem:


Path analysis is a necessary (though questionable) tool because one cannot assess causality by simply observing variables; by removing the alternate pathways, the goal is to settle on the most probable one, which is then used to infer causality. The third-variable problem might also be characterized as the invisible-variables problem because there could be more than one hidden confounding variable: any variable that is correlated with both A and B may be the cause of both. Mediation refers to the possibility that a causal effect of some variable X on outcome variable Y is explained by a mediator variable presumed to intervene between X and Y. For the estimation of parameters, procedures such as the bootstrap or the jackknife are recommended. It is not possible to prove that a causal hypothesis is true, but it might be possible to reject untenable hypotheses and, in this way, narrow down the number of plausible hypotheses.
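A hedged sketch of bootstrapping a mediated (indirect) effect, using invented data in which X affects Y only through the mediator M (the a*b product rule is one common way to quantify the indirect path; names and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)   # hypothetical mediator
y = 0.5 * m + rng.normal(size=n)   # X affects Y only through M

def indirect_effect(x, m, y):
    # a: slope of M on X; b: slope of Y on M controlling for X
    a = np.polyfit(x, m, 1)[0]
    X = np.column_stack([np.ones(len(x)), m, x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a * coef[1]             # a * b, the mediated path

boots = []
idx = np.arange(n)
for _ in range(1000):
    s = rng.choice(idx, size=n, replace=True)  # resample cases with replacement
    boots.append(indirect_effect(x[s], m[s], y[s]))

lo, hi = np.percentile(boots, [2.5, 97.5])     # percentile bootstrap interval
print(lo, hi)  # an interval excluding 0 is taken as evidence for mediation
```

The jackknife mentioned in the text works analogously, but by leaving out one case at a time instead of resampling.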

The Cohort in Longitudinal Research:


Longitudinal research has the purpose of examining people's responses over an extended time span. A cohort or generation is a group of people born at about the same time who have had similar life experiences.


Longitudinal data are usually collected prospectively, but they can also be obtained retrospectively from historical records.

Different forms of Cohort Studies:


A cohort table provides basic data and enables the researcher to detect important differences between cross-sectional and cohort designs. The concepts of age, cohort, and period may not be operationally defined the same way in different fields. An age effect implies changes in average responses due to the natural aging process. A time-of-measurement effect implies some kind of impact of events in chronological time. A cohort effect implies past history specific to a particular generation. These three effects cannot be estimated simultaneously in any of the following designs.
i. Simple cross-sectional design: subjects at different ages are observed at the same time.
ii. Simple longitudinal design: subjects of the same cohort are observed over several periods.
iii. Cohort-sequential design: several cohorts are studied, the initial measurements being taken in successive years.
iv. Time-sequential design: subjects at different ages are observed at different times.
v. Cross-sequential design: several different cohorts, initially measured in the same period, are observed over several periods.

Subclassification on Propensity Scores:


Subclassification: it is claimed that increasing the number of subclasses increases the precision of the analysis; five or six subclasses ordinarily reduce the bias in the raw comparison by some 90%.

Multiple Confounding Covariates:


The idea is to combine all confounding covariates into a single variable. In work with multiple confounding covariates, the propensity score is computed from the prediction of group membership scored one or zero. Unlike the regression approach, it does not require a particular kind of relationship between the covariate and the outcome within each condition. The major limitation of the propensity-score method is that it can adjust only for observed confounding covariates.
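A minimal sketch of subclassification (one simulated confounding covariate, five subclasses; all numbers invented). With a single covariate the covariate itself can stand in for the propensity score, since the score is a monotone function of it here:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4_000
covariate = rng.normal(size=n)                  # a confounding covariate
# Treatment assignment depends on the covariate (selection bias)...
treated = (rng.random(n) < 1 / (1 + np.exp(-covariate))).astype(int)
# ...and so does the outcome; the true treatment effect is 1.0.
outcome = 1.0 * treated + 2.0 * covariate + rng.normal(size=n)

raw_diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Subclassify into five strata at the covariate's quintiles and average
# the within-stratum treatment-control differences.
strata = np.digitize(covariate, np.quantile(covariate, [0.2, 0.4, 0.6, 0.8]))
effects = []
for s in range(5):
    in_s = strata == s
    t = outcome[in_s & (treated == 1)]
    c = outcome[in_s & (treated == 0)]
    effects.append(t.mean() - c.mean())
adjusted = np.mean(effects)
print(raw_diff, adjusted)  # raw estimate is badly biased; adjusted is much closer to 1.0
```

With several covariates, the stratification variable would instead be a fitted probability of group membership (e.g., from a logistic regression), as the text describes.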

Chapter 9
Randomly and Nonrandomly Selected Sampling Units
Sampling a Small Part of the Whole:
In probability sampling:
i. Every sampling unit has a known nonzero probability of being selected.
ii. The units are randomly drawn.
iii. The probabilities are taken into account in making estimates from the sample.

Convenience samples raise concerns about generalizability (e.g., student samples). The paradox of sampling implies that a sample is of no use if it is not representative of the population; but to know that it is representative, you would need to know the characteristics of the whole population, in which case you would not need the sample in the first place. A sampling plan specifies how the respondents will be selected; careful consideration of the procedure by which the sample is obtained is one way to overcome the paradox of sampling.

Bias and Instability in Surveys:


Point estimates are central values of frequency distributions; interval estimates describe the corresponding variability of the underlying distribution (margin of error = SE x critical value). The true population value is the point value we would obtain from analyzing all the scores in the population. Bias is the difference between the true population value and our estimate of it from a sampling distribution; it leads to estimates that are systematically above or below the true value. Stability (precision) refers to the actual variation in the data: instability results when the observations within a sample are highly variable and the number of observations is small. The more homogeneous the members of the population, the fewer of them need to be sampled.
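A small illustration of point and interval estimates (simulated population; the 1.96 critical value assumes an approximately normal sampling distribution):

```python
import numpy as np

rng = np.random.default_rng(6)
population = rng.normal(100, 15, size=100_000)
true_value = population.mean()               # the "true population value"

sample = rng.choice(population, size=400, replace=False)
point = sample.mean()                        # point estimate
se = sample.std(ddof=1) / np.sqrt(400)       # standard error
margin = 1.96 * se                           # margin of error = SE * critical value
print(point - margin, point + margin)        # interval estimate around the point
```

The margin shrinks with larger n and with a more homogeneous population, matching the closing remark above.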

Simple Random-Sampling Plans:


Simple random sampling means that the sample is selected from an undivided population (simple) and chosen by a process that gives every unit in the population the same chance of being selected (random).

Random sampling without replacement describes the procedure in which a selected unit cannot be reselected and must be disregarded on any later draw. Random sampling with replacement refers to the possibility to reselect a unit.
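The two procedures map directly onto standard library routines (a purely illustrative ten-unit population):

```python
import random

random.seed(7)
population = list(range(1, 11))  # ten sampling units

# Without replacement: a selected unit is set aside and cannot recur.
without = random.sample(population, k=5)
assert len(set(without)) == 5    # all five units distinct by construction

# With replacement: every draw is made from the full population,
# so the same unit may be selected more than once.
with_repl = [random.choice(population) for _ in range(5)]
print(without, with_repl)
```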

Improving Accuracy in Random Sampling:


In doing stratified random sampling we divide the population into a number of parts and randomly sample in each part independently (unbiased procedure). It can pay off in an improved accuracy by enabling researchers to randomly sample strata that are less variable than the original population.
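A quick simulation (two invented strata) showing why sampling within strata that are internally homogeneous improves accuracy: the stratified estimator of the mean varies far less across repeated samples than the simple-random-sampling estimator.

```python
import numpy as np

rng = np.random.default_rng(8)
# Two strata that are internally homogeneous but differ from each other.
stratum_a = rng.normal(10, 1, size=5_000)
stratum_b = rng.normal(20, 1, size=5_000)
population = np.concatenate([stratum_a, stratum_b])

srs_means, strat_means = [], []
for _ in range(2_000):
    srs_means.append(rng.choice(population, size=100).mean())
    # Proportionate stratified sample: 50 units from each stratum.
    strat = np.concatenate([rng.choice(stratum_a, size=50),
                            rng.choice(stratum_b, size=50)])
    strat_means.append(strat.mean())

print(np.std(srs_means), np.std(strat_means))
# The stratified estimator is far less variable (greater accuracy).
```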

Speaking of Confidence Intervals:


Bayesian: We can be X% confident (certain) that the population value we are trying to estimate falls between the lower and upper limits. Frequentist: With repeated sampling, if we compute an X% confidence interval for each sample, then in X% of the samples we will be correct in saying that the quantity estimated falls within the interval.
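The frequentist reading can be checked by simulation (illustrative normal data; 1.96 is the 95% critical value):

```python
import numpy as np

rng = np.random.default_rng(9)
true_mean = 50.0
covered = 0
trials = 2_000
for _ in range(trials):
    sample = rng.normal(true_mean, 10, size=100)
    se = sample.std(ddof=1) / 10          # sqrt(100) == 10
    lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    covered += lo <= true_mean <= hi

coverage = covered / trials
print(coverage)  # close to 0.95: the frequentist reading of "95% confident"
```

Note that any single computed interval either contains the true value or it does not; the 95% describes the long-run procedure, not one interval.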

Other Selection Procedures:


In area probability sampling, the population is divided into clusters, and units are selected in stages so that each unit in the population has the same probability of being chosen (multistage cluster sampling). Systematic sampling involves the methodical selection of sampling units in sequence, separated on lists by the interval of selection; it starts at a random point and then selects units from the population at fixed intervals. Haphazard or fortuitous samples are among the most common designs of nonrandom selection (e.g., informal polls). Quota sampling involves obtaining specified numbers of respondents to create a sample that is roughly proportional to the population.
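A sketch of systematic selection with a random start (population size and interval k chosen for illustration; k is the population size divided by the desired sample size):

```python
import random

random.seed(10)
population = list(range(1_000))      # an ordered list of sampling units
k = 20                               # interval of selection (N / n)

start = random.randrange(k)          # random start within the first interval
sample = population[start::k]        # then every k-th unit on the list
print(len(sample), sample[:3])
```

One caveat worth remembering: if the list has a periodicity that matches k, systematic sampling can be badly biased.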

Nonresponse Bias and its Control:


Increasing the effort to recruit the nonrespondents decreases the bias of point estimates in the sample, independent of the design being used. We are often in a position to suspect bias but unable to give an estimate of its magnitude. Prepaid monetary incentives produce higher response rates than promised incentives or gifts offered at the initial encounter. A proposed theoretical model was based on Wald's model of nonresponse.

Studying the Volunteer Subject:


The volunteer-subject problem can be understood as a variant of the problem of nonresponse bias. Approaches to comparing volunteers and nonvolunteers range from looking through archives and recruiting volunteers to second-stage volunteer comparisons. Volunteering might have both general and specific predictors.

Characteristics of the Volunteer Subject:


There are conclusions warranting maximum, considerable, some, or minimum confidence about the characteristics of volunteers. The maximum-confidence conclusions are the following.
i. Better educated and of higher social-class status.
ii. More intelligent.
iii. Higher in the need for social approval and more sociable.

Implications for the Interpretation of Research Findings:


Merely increasing the size of the sample of volunteers will not reduce the bias, but an effort to recruit more nonvolunteers or to use probability sampling as well as attempting to reduce the nonresponse rate would target this problem.

Situational Correlates and the Reduction of Volunteer Bias:


There are conclusions warranting maximum, considerable, some, or minimum confidence about the situational correlates of volunteering. The maximum- and considerable-confidence ones are:
i. More interested in the topic.
ii. Higher expectation of being favorably evaluated by the investigator.
iii. Perceiving the investigation as important.
iv. Feeling states influence participation (e.g., guilt increases the likelihood).
v. Incentives increase the likelihood, and stable personality moderates this effect.

Explaining the significance of the research to participants results in giving up trivial research and increases the likelihood of authentic participation.

The Problem of Missing Data:


The primary problem of missing data is the introduction of bias into the estimates; an additional problem is decreased statistical power.
MCAR: missingness is unrelated to any variables of interest (missing completely at random).
MAR: missingness is related to the variables of interest but can be accounted for by other observed variables (missing at random).
MNAR: missingness is related to variables of substantive interest and cannot be fully accounted for by other variables (missing not at random).

Procedures for Dealing with Missing Data:


There are two approaches, nonimputational and imputational. Nonimputational procedures include listwise deletion (drop all cases with any missing value) and pairwise deletion (compute each statistic from the available cases); the latter requires the data to be MCAR to yield unbiased estimates. Maximum likelihood and Bayesian estimation are two procedures that can yield unbiased results when the data are MAR. Imputation can be single or multiple. Single-imputation procedures can be further subdivided into four alternatives.
i. Mean substitution: all missing values for a given variable are replaced by the mean of that variable (only if MCAR).
ii. Regression substitution: all missing values are replaced by the predicted value of that variable from a regression analysis using only the cases with no missing data (only if MCAR).
iii. Stochastic regression imputation adds a random residual term to the estimates from regression substitution and often yields more accurate analyses than regression substitution.
iv. Hot-deck imputation matches cases without missing data to the cases with missing data.
In multiple imputation, each missing observation is replaced not by a single estimate but by a set of m reasonable estimates, yielding m pseudo-complete data sets; the results are later combined to obtain a more accurate estimate of variability than is possible with single-imputation techniques. These procedures tend to be much simpler computationally than Bayesian or maximum likelihood estimation, and are the most useful.
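A hedged sketch contrasting listwise deletion and mean substitution under MCAR (simulated data with invented parameters); note how mean substitution preserves the observed mean but artificially shrinks the variability, one reason stochastic and multiple imputation are preferred.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2_000
complete = rng.normal(50, 10, size=n)
true_mean, true_sd = complete.mean(), complete.std(ddof=1)

# Delete 30% of values completely at random (MCAR).
data = complete.copy()
data[rng.random(n) < 0.3] = np.nan

# Listwise deletion: analyze only the observed cases.
observed = data[~np.isnan(data)]
listwise_mean = observed.mean()

# Mean substitution: replace every missing value with the observed mean.
imputed = np.where(np.isnan(data), observed.mean(), data)
mean_sub_mean = imputed.mean()
mean_sub_sd = np.std(imputed, ddof=1)

print(listwise_mean, mean_sub_mean)  # both near the true mean under MCAR
print(true_sd, mean_sub_sd)          # but mean substitution shrinks the SD
```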

