

The Development of a Test on Critical Thinking: A Test Development Project*

Jaime Jose G. Nicdao, MA


Ateneo de Manila University

Fe Josefa G. Nava, Ph.D.


Cesar Baltazar
Sara Ma
University of the Philippines - Diliman

The measurement of psychological traits of students such as critical thinking is important to educational practice.

Critical thinking is described as purposeful, reasoned, and goal-directed thinking – the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions (Halpern, 1989).

Experts agree that “critical thinking is a purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanations of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which judgment is based,” and that it is essential as a tool of inquiry (American Philosophical Association, 1990).

It is a liberating force in education and a powerful resource in one’s personal and civic life (Facione, 1998). In this sense, critical thinking cuts across specific subjects or disciplines; it has applications in all areas of life and learning (Halpern, 1989).

This paper describes the procedures followed in the development of a Critical Thinking Test.

Through a review of the literature, important components of critical thinking were identified.
*Paper presented at the 1st National Conference on Educational Measurement and Evaluation (NCEME) – Developing a Culture of Assessment in Learning Organizations, August 6-7, 2008, College of St. Benilde International Conference Center.
The Delphi Report, Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction (Facione, 1990), identified the skills that constitute the core of critical thinking. It explained that experts were virtually unanimous (N > 95%) in including analysis, evaluation, and inference as central to critical thinking, and that a strong consensus (N > 87%) exists that interpretation, explanation, and self-regulation are also central. On the other hand, the California Critical Thinking Skills Test (Facione et al., 1990, 1998) provided scores on subscales named analysis, inference, evaluation, deductive reasoning, and inductive reasoning.

The current study adopted the California Critical Thinking Skills Test (CCTST) categories because they cover analysis, inference, and evaluation, which provide alternative ways of dealing with the challenging situations of today, while also including the time-tested ways of reasoning: deduction and induction.

In this context, the CCTST Form 2000 defines the skill categories in this way: Inductive reasoning happens when a person decides that the evidence at hand means that a given conclusion is probably true. Deductive reasoning happens when a person decides that, given the truth of the evidence at hand, it is impossible that the conclusion he is considering is false. Analysis is pulling apart arguments and points of view to show why a person thinks what he or she thinks; it is separating the premises and assumptions a person is using from the claim or conclusion the person is reaching. Inference happens when a person draws conclusions based on reasons and evidence; the person might be using his deductive reasoning skills or his inductive reasoning skills, and inference can apply to all sorts of things, including beliefs, opinions, facts, conjectures, principles, and assumptions. Evaluation happens when a person decides how strong or how weak another person’s arguments are, or when he determines the believability of a given statement.

The Table of Specifications contained five subparts, namely Analysis, Inference, Evaluation, Deduction, and Induction.

Analysis was subdivided into a) able to distinguish relevant information from irrelevant information to solve a problem, b) able to identify supporting details for the main idea, and c) able to identify cause-and-effect relationships.

Inference was subdivided into a) able to see the implications of a position someone is advocating, or to draw out meaning from the elements in a situation, b) able to determine a person’s trait or emotion based on the details given, and c) able to make an appropriate prediction based on the information given.

Evaluation was subdivided into a) able to judge whether a given idea or opinion is relevant or applicable, b) able to make fair judgments as an impartial observer, and c) able to assess alternatives or courses of action.

Deduction was subdivided into a) able to apply general rules to specific problems to come up with logical answers, b) able to reach a valid conclusion by examining orderly relationships among terms (linear ordering), and c) able to reach a valid conclusion by examining contingency relationships in “if-then” statements.

Induction was subdivided into a) able to combine separate pieces of information, or specific answers to problems, to form general rules or conclusions, b) able to come up with a logical explanation for why seemingly unrelated events occur together, and c) able to determine whether a conclusion is incorrect by examining the given premises.

A test on critical thinking was developed using a multiple-choice format. From a review of the literature on the domains of critical thinking, an item pool of more than 20 questions was developed. The number of items for each main category was set at 4, for a total of 20 items. Four scenarios were created, out of which the 20 questions were generated. The scenarios depicted situations that grade-school children would typically encounter in school, at home, and in the community.

The following is an example of an Analytic Question based on a given scenario:

After school, Roger played with his neighbors. On his way home, he
discovered that the expensive ball pen given to him as a gift by his mother
was missing.

Which of the following pieces of information would best help him find his pen?

A. The names of his neighbors
B. The time he played with his neighbors
C. The time he last used the pen
D. The places at school where he went

Here is an example of an Evaluative Question based on a given scenario:

During a test in class, Brian asked his seatmate Eric to pass a piece of
paper to Greg. Seeing Eric do it, the teacher got hold of the paper and tore
it to pieces.

If Brian is a fair student, which of the following statements would best describe the action he will take?

A. He will keep quiet about what happened.
B. He will apologize to the teacher.
C. He will apologize to Greg.
D. He will pick up the pieces of paper.

The first pilot testing of the instrument was conducted on seventy-six (n = 76) Grade 6 students in an exclusive private school for boys in Metro Manila. The reliability of the test was low, with a Cronbach’s alpha value of 0.123. An additional sample from a more heterogeneous group was therefore selected: seventy-six (n = 76) Grade 6 pupils from a public elementary school responded to the test.

One of the most important conditions of good psychological measures is reliability. Reliability is the characteristic of a test that indicates the extent to which it measures the trait with consistency. The reliability of a test is a necessary precondition to validity: if a test is not reliable, it may not be able to measure the trait as it should.

Reliability can be estimated using different techniques. Two straightforward methods involve giving the same test twice (test-retest or stability) or giving parallel versions of the same test (equivalence). However, these approaches require two test administrations. In contrast, internal consistency procedures require only one test administration.
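
Cronbach’s alpha, the internal consistency estimate reported in this study, can be computed directly from an item-score matrix. The sketch below is illustrative only; the responses are hypothetical, not the study’s data, and assume dichotomously scored (0/1) multiple-choice items:

```python
def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents,
    each a list of item scores (e.g., 0/1 for multiple-choice items)."""
    k = len(scores[0])                                    # number of items
    item_vars = sum(variance(list(col)) for col in zip(*scores))
    total_var = variance([sum(row) for row in scores])    # total-score variance
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses: 4 pupils x 3 items
responses = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 1]]
print(round(cronbach_alpha(responses), 3))  # → 0.632
```

The formula rewards items that vary together: when item variances are small relative to the variance of the total scores, alpha approaches 1.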

The variability of the public school sample is reflected in its large total score range, standard deviation, and variance, indicating the heterogeneity of that sample. After the test was administered to the public school pupils, the resulting reliability index increased to 0.674.

Variability is an important factor affecting measurement reliability (Anastasi, 1982). It refers to the nature of the sample, or the range of individual differences, on which reliability is measured. Reliability is influenced by the degree to which respondents or examinees vary, so it is often recommended that reliability estimates be computed on a heterogeneous sample. Fan and Yin (2001) drew the following conclusions in their study: 1) sample variability with regard to the trait being measured has an obvious effect on measurement reliability, with reliability being reduced by restriction of group variability; and 2) group performance level also appears to affect measurement reliability, with measurement error tending to be smaller for high-performance samples than for low-performance samples.
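
The range-restriction effect described by Fan and Yin can be demonstrated with constructed data. In this hypothetical sketch, a full-range sample with a Guttman-patterned response matrix yields a much higher alpha than a subsample restricted to the middle ability levels; the data are illustrative assumptions, not the study’s:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    k = len(scores[0])
    item_vars = sum(variance(list(col)) for col in zip(*scores))
    return (k / (k - 1)) * (1 - item_vars / variance([sum(r) for r in scores]))

# Full-range sample: abilities 0..5 on a 5-item test (Guttman pattern:
# a pupil of ability a answers the a easiest items correctly)
full = [[1 if item < ability else 0 for item in range(5)] for ability in range(6)]

# Range-restricted sample: only middle-ability pupils (total scores 2 and 3)
restricted = [row for row in full if sum(row) in (2, 3)] * 2

a_full, a_restricted = cronbach_alpha(full), cronbach_alpha(restricted)
assert a_restricted < a_full   # restricting variability lowers reliability
```

This mirrors the study’s own experience: the homogeneous private-school sample produced an alpha of 0.123, while the more heterogeneous public-school sample raised it to 0.674.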

Validity is another vital aspect of an instrument. Validity is the characteristic of a test that indicates the extent to which it measures what it is supposed to measure (Anastasi, 1982).

One type of validity evidence is content validity, which is essentially an expert’s qualitative evaluation of the items vis-à-vis the table of specifications. A professor of psychology reviewed the items and gave comments and suggestions to improve the instrument. Another way of establishing validity is through factor analysis, a method developed to identify psychological traits and thereby provide evidence of construct validity (Anastasi, 1982); it is a refined statistical technique for analyzing the interrelationships of behavioral data. Coefficient alpha reliability is used to provide evidence of unidimensionality: if a test does not measure a single construct, it cannot measure the particular construct it is intended to measure (Cunningham, 1998).

Following the subparts in the Table of Specifications, a 5-factor solution did not clearly distinguish among the factors; hence, a 4-factor solution was applied. The four factors were: Factor 1 – Inferential Thinking (9 items), Factor 2 – Evaluative Thinking (4 items), Factor 3 – Logical Thinking (3 items), and Factor 4 – Analytical Thinking (4 items).

Two items (Items 10 and 11) were deleted. These items were chosen on the basis of their effect on reliability: they yielded the greatest increase in the reliability coefficient when deleted. The final reliability index for the 18 items was thus 0.755. The final form measures four dimensions (Table 1).
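
The deletion criterion used here corresponds to the standard “alpha if item deleted” statistic: recompute alpha with each item removed in turn and flag the items whose removal raises it most. A hypothetical sketch (the data and the offending item are illustrative, not the study’s Items 10 and 11):

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    k = len(scores[0])
    item_vars = sum(variance(list(col)) for col in zip(*scores))
    return (k / (k - 1)) * (1 - item_vars / variance([sum(r) for r in scores]))

def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item removed: list of (item_index, alpha)."""
    k = len(scores[0])
    return [(j, cronbach_alpha([[v for i, v in enumerate(row) if i != j]
                                for row in scores])) for j in range(k)]

# Hypothetical data: item 3 runs against the other three items,
# dragging alpha down; deleting it should raise alpha the most.
data = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
worst_item, new_alpha = max(alpha_if_item_deleted(data), key=lambda t: t[1])
print(worst_item, round(new_alpha, 3))  # → 3 1.0
```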

Table 1. Subparts of the Final Critical Thinking Test

Subparts (Factors)        Objective   Item #          # of Items
1. Analytical Thinking    1a          15, 1, and 4         3
                          1b          20                   1
2. Inferential Thinking   2a          18, 13, and 3        3
                          2b          7 and 16             2
                          2c          12, 14, and 2        3
3. Evaluative Thinking    3b          9                    1
                          3c          8 and 19             2
4. Logical Thinking       4a          6                    1
                          4c          17                   1
                          5a          5                    1
TOTAL (Alpha = 0.755)                                     18

References

American Philosophical Association. (1990). Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction ("The Delphi Report"). Committee on Pre-College Philosophy. (ERIC Document ED 315 423).

Anastasi, A. (1982). Psychological Testing (5th ed.). New York: Macmillan Publishing Co.

Cunningham, G. K. (1998). Constructing and Interpreting Tests. London: Falmer Press.

Facione, P. A., et al. (1990, 1998). The California Critical Thinking Skills Test, Forms A and B, and the CCTST Test Manual. Millbrae, CA: California Academic Press.

Facione, P. A., & Facione, N. C. (2000). The Disposition toward Thinking: Its Character, Measurement, and Relationship to Critical Thinking Skills. Santa Clara, CA: California Academic Press.

Fan, X., & Yin, P. (2001). Sample Characteristics and Measurement Reliability: An Empirical Exploration. Paper presented at the Annual Meeting of the American Educational Research Association, Seattle, WA, April 10-14, 2001.

Halpern, D. F. (1989). Thought and Knowledge: An Introduction to Critical Thinking. Hillsdale, NJ: Lawrence Erlbaum Associates.

McMillan, J. H. (2001). Classroom Assessment: Principles and Practice for Effective Instruction. MA: Allyn and Bacon.
