CHAPTER I
INTRODUCTION
A. Background
The five principles of practicality, reliability, validity,
authenticity, and washback go a long way toward providing useful guidelines for
both evaluating an existing assessment procedure and designing one on your own.
Quizzes, tests, final exams, and standardized proficiency tests can all be
scrutinized through these five lenses.
Are there other
principles that should be invoked in evaluating and designing assessments? The
answer, of course, is yes. Language assessment is an extraordinarily broad
discipline with many branches, interest areas, and issues. The process of
designing effective assessment instruments is far too complex to be reduced to
five principles. Good test construction, for example, is governed by
research-based rules of test preparation, sampling of tasks, item design and
construction, scoring responses, ethical standards, and so on. But the five
principles cited here serve as an excellent foundation on which to evaluate
existing instruments and to build your own.
We will look at how to
design tests in Chapter 3 and at standardized tests in Chapter 4. The questions
that follow here, indexed by the five principles, will help you evaluate
existing tests for your own classroom. It is important for you to remember,
however, that the sequence of these questions does not imply a priority order.
Validity, for example, is certainly the most significant cardinal principle of
assessment evaluation. Practicality may be a secondary issue in classroom
testing. Or, for a particular test, you may need to place authenticity as your
primary consideration. When all is said and done, however, if validity is not
substantiated, all other considerations may be rendered useless.
B. Problem Formulation
1. Are The Test Procedures Practical?
2. Is The Test Reliable?
3. Does The Procedure Demonstrate Content Validity?
4. Is The Procedure Face Valid And “Biased For Best”?
5. Are The Test Tasks As Authentic As Possible?
6. Does The Test Offer Beneficial Washback To The Learner?
CHAPTER II
DISCUSSION
APPLYING PRINCIPLES TO THE EVALUATION OF CLASSROOM TESTS
A. Are The Test Procedures Practical?
Practicality is
determined by the teacher’s (and the student’s) time constraints, costs, and
administrative details, and to some extent by what occurs before and after the
test. To determine whether a test is practical for your needs, you may want to
use the checklist below.
Practicality checklist:
1. Are administrative details clearly established before the test?
2. Can students complete the test reasonably within the set time frame?
3. Can the test be administered smoothly, without procedural “glitches”?
4. Are all materials and equipment ready?
5. Is the cost of the test within budgeted limits?
6. Is the scoring/evaluation system feasible in the teacher’s time frame?
7. Are methods for reporting results determined in advance?
As the checklist
suggests, after you account for the administrative details of giving a test,
you need to think about the practicality of your plans for scoring the test. In teachers’ busy lives, time often emerges as the most important factor, one that overrides other considerations in evaluating an assessment. If you need to tailor a test to fit your own time frame, as teachers frequently do, you need to accomplish this without damaging the test’s validity and washback. Teachers should, for example, avoid the temptation to offer only quickly scored multiple-choice selection items that may be neither appropriate nor well designed. Everyone knows teachers secretly hate to grade tests (almost as much as students hate to take them!) and will do almost anything to get through that task as quickly and effortlessly as possible. Yet good teaching almost always implies an investment of the teacher’s time in giving feedback (comments and suggestions) to students on their tests.
B. Is The Test Reliable?
Reliability applies to both the test and the teacher, and at least four sources of unreliability must be guarded against, as noted in the second section of this chapter. Test and test administration reliability can be achieved by making sure that all students receive the same quality of input, whether written or auditory. Part of achieving test reliability depends on the physical context: making sure, for example, that
1. every student has a cleanly photocopied test sheet,
2. sound amplification is clearly audible to everyone in the room,
3. video input is equally visible to all,
4. lighting, temperature, extraneous noise, and other classroom conditions are equal (and optimal) for all students, and
5. objective scoring procedures leave little debate about the correctness of an answer.
Rater reliability,
another common issue in assessments, may be more difficult, perhaps because we
too often overlook this as an issue. Since classroom tests rarely involve two
scorers, inter-rater reliability is seldom an issue. Instead, intra-rater
reliability is of constant concern to teachers: What happens to our fallible
concentration and stamina over the period of time during which we are
evaluating a test? Teachers need to find ways to maintain their concentration
and stamina over the time it takes to score assessments. In open-ended response
tests, this issue is of paramount importance. It is easy to let mentally
established standards erode over the hours you require to evaluate the test.
Intra-rater reliability for open-ended responses may be enhanced by the following guidelines:
1. Use consistent sets of criteria for a correct response.
2. Give uniform attention to those sets throughout the evaluation time.
3. Read through tests at least twice to check for your consistency.
4. If you have made “mid-stream” modifications of what you consider a correct response, go back and apply the same standards to all.
5. Avoid fatigue by reading the tests in several sittings, especially if the time requirement is a matter of several hours.
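One way to check guideline 3 in practice, for a teacher comfortable with a little scripting, is to score a sample of responses twice and correlate the two passes; a coefficient well below 1.0 signals drifting standards. The short Python sketch below is purely illustrative (it is not part of the original discussion), and the 1-to-5 rubric scores in it are hypothetical.

# Illustrative sketch: estimate intra-rater consistency by scoring the same
# open-ended responses twice and correlating the first and second passes.

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical 1-5 rubric scores for ten essays, read on two occasions.
first_pass = [4, 3, 5, 2, 4, 3, 5, 1, 4, 2]
second_pass = [4, 3, 4, 2, 5, 3, 5, 2, 4, 2]
print(f"intra-rater consistency: r = {pearson_r(first_pass, second_pass):.2f}")

A value near 1.0 suggests the criteria were applied uniformly across sittings; a noticeably lower value is a cue to re-read the affected tests with the same standards, as guideline 4 recommends.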
C. Does The Procedure Demonstrate Content Validity?
The major source of
validity in a classroom test is content validity: the extent to which the
assessment requires students to perform tasks that were included in the
previous classroom lessons and that directly represent the objective of the
unit on which the assessment is based. If you have been teaching an English
language class to fifth graders who have been reading, summarizing, and
responding to short passages, and if your assessment is based on this work,
then, to be content valid, the test needs to include performance in those
skills. There
are two steps to evaluating the content validity of a classroom test.
1. Are classroom objectives identified and appropriately
framed?
Underlying every good classroom test are the objectives of the lesson, module,
or unit of the course in question. So the first measure of an effective
classroom test is the identification of objectives. Sometimes this is easier
said than done. Too often teachers work through lessons day after day with
little or no cognizance of the objectives they seek to fulfill. Or perhaps
those objectives are so poorly framed that determining whether or not they were
accomplished is impossible. Consider the following objectives for lessons, all
of which appeared on lesson plans designed by students in teacher preparation
programs:
a. Students should be able to demonstrate some reading comprehension.
b. To practice vocabulary in context.
c. Students will have fun through a relaxed activity and thus enjoy their learning.
d. To give students a drill on the /i/ - /I/ contrast.
e. Students will produce yes/no questions with final rising intonation.
Only the last objective
is framed in a form that lends itself to assessment. In (a), the modal “should”
is ambiguous and the expected performance is not stated. In (b), everyone can
fulfill the act of “practicing”; no standards are stated or implied. For
obvious reasons, (c) cannot be assessed. And (d) is really just a teacher’s
note on the type of activity to be used.
Objective (e), on the other hand, includes a
performance verb and a specific linguistic target. By specifying acceptable and
unacceptable levels of performance, the goal can be tested. An appropriate test
would elicit an adequate number of samples of student performance, have a
clearly framed set of standards for evaluating the performance (say, on a scale of
1 to 5), and provide some sort of feedback to the student.
2. Are lesson objectives represented in the form of test
specifications?
The next content-validity
issue that can be applied to the classroom test centers on the concept of test
specifications. Don’t let this term scare you. It simply means that a test should have a structure that follows logically from the lesson or unit you are testing. Many tests have a design that
1. divides them into a number of sections (corresponding, perhaps, to the objectives that are being assessed),
2. offers students a variety of item types, and
3. gives an appropriate relative weight to each section.
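To make the notion of relative weighting concrete, the following small Python sketch (an illustration added here, not something prescribed by the text) represents test specs as a simple data structure; the section names, item types, and weights are all hypothetical.

# Illustrative sketch: test specifications with per-section weights.
# All section names, item types, counts, and weights are hypothetical.

specs = [
    # (section, item type, number of items, weight as % of the total grade)
    ("Reading comprehension", "multiple-choice", 10, 40),
    ("Vocabulary in context", "gap-fill", 10, 30),
    ("Short written summary", "open-ended", 1, 30),
]

assert sum(weight for *_, weight in specs) == 100, "weights must total 100%"

def weighted_total(section_percentages):
    """Combine per-section percentage scores into one overall grade."""
    return sum(section_percentages[name] * weight / 100
               for name, _, _, weight in specs)

# A hypothetical student: 80% on reading, 90% on vocabulary, 70% on the summary.
scores = {"Reading comprehension": 80,
          "Vocabulary in context": 90,
          "Short written summary": 70}
print(f"weighted total: {weighted_total(scores):.1f}%")  # 0.4*80 + 0.3*90 + 0.3*70 = 80.0

Writing the weights down in this explicit form makes it easy to check that they match the emphasis each objective received in class.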
Some tests, of course,
do not lend themselves to this kind of structure. A test in a course in
academic writing at the university level might justifiably consist of an in-class written essay on a given topic: only one “item” and one response, in a manner of speaking. But in this case the specs (specifications) would be
embedded in the prompt itself and in the scoring or evaluation rubric used to
grade it and give feedback. We will return to the concept of test specs in the
next chapter.
The content validity of
an existing classroom test should be apparent in how the objectives of the unit
being tested are represented in the form of the content of items, clusters of
items, and item types. Do you clearly perceive the performance of the test-taker as
reflective of the classroom objectives? If so, and you can argue this, content
validity has probably been achieved.
D. Is The Procedure Face Valid And “Biased For Best”?
This question
integrates the concept of face validity with the importance of structuring an
assessment procedure to elicit the optimal performance of the student. Students will generally judge a test to be face valid if:
1. directions are clear,
2. the structure of the test is organized logically,
3. its difficulty level is appropriately pitched,
4. the test has no “surprises”, and
5. timing is appropriate.
A phrase that has come to be associated with face validity is “biased for best”, a term that goes a little beyond how the student views the test to a degree of strategic involvement on the part of the student and teacher in preparing for, setting up, and following up on the test itself. According to Swain (1984), to give an assessment procedure that is “biased for best”, a teacher
1. offers students appropriate review and preparation for the test,
2. suggests strategies that will be beneficial, and
3. structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.
It’s easy for teachers
to forget how challenging some tests can be, and so a well-planned testing
experience will include some strategic suggestions on how students might
optimize their performance. In evaluating a classroom test, consider the extent
to which before-, during-, and after-test options are fulfilled.
Test-taking strategies
Before the Test:
1) Give students all the information
you can about the test: Exactly what will the test cover? Which topics will be
most important? What kind of items will be on it? How long will it be?
2) Encourage students to
do a systematic review of material. For example, they should skim the textbook
and other material, outline major points, and write down examples.
3) Give them practice
tests or exercises, if available.
4) Facilitate formation of
a study group, if possible.
5) Caution students to get
a good night’s rest before the test.
6) Remind students to get
to the classroom early.
During the Test:
1) After the test is
distributed, tell students to look over the whole test quickly in order to get
a good grasp of its different parts.
2) Remind them to mentally
figure out how much time they will need for each part.
3) Advise them to
concentrate as carefully as possible.
4)
Warn students a few minutes before the end of the class
period so that they can finish on time, proofread their answers, and catch
careless errors.
After the Test:
1)
When you return the test, include feedback on specific things
the student did well, what he or she did not do well, and, if possible, the
reasons for your comments.
2)
Advise students to pay careful attention in class to whatever
you say about the test results.
3)
Encourage questions from students.
4)
Advise students to pay special attention in the future to
points on which they are weak.
Keep in mind that what
comes before and after the test also contributes to its face validity. Good
class preparation will give students a comfort level with the test, and good
feedback (washback) will allow them to learn from it.
E. Are The Test Tasks As Authentic As Possible?
Evaluate the extent to which a test is authentic by asking whether the language in the test is as natural as possible, whether items are contextualized rather than isolated, whether topics are meaningful for the learner, and whether tasks represent, or closely approximate, real-world tasks.
Consider
the following two excerpts from tests, and the concept of authenticity may
become a little clearer.
The sequence of items in the contextualized tasks
achieves a modicum of authenticity by contextualizing all the items in a story
line. The conversation is one that might occur in the real world, even if with
a little less formality. The sequence of items in the decontextualized tasks takes the test-taker into five different topic areas with no context for any. Each sentence is likely to be written or spoken in the real world, but not in
that sequence. Given the constraints of a multiple-choice format, on a measure
of authenticity I would say the first excerpt is “good” and the second excerpt
is only “fair”.
F. Does The Test Offer Beneficial Washback To The Learner?
The design of an
effective test should point the way to beneficial washback. A test that achieves content validity demonstrates relevance to the curriculum in question and thereby sets the stage for washback. When test items represent the various
objectives of a unit, and/or when sections of a test clearly focus on major
topics of the unit, classroom tests can serve in a diagnostic capacity even if
they aren’t specifically labelled as such.
Other evidence
of washback may be less visible from an examination of the test itself. Here again, what happens before and after the test is critical. Preparation time before the test can contribute to washback, since the learner is reviewing and focusing in a potentially broader way on the objectives in question. By spending classroom time after the test reviewing the content, students discover their areas of strength and weakness. Teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort. The key is to play down the “Whew, I’m glad that’s over” feeling that students are likely to have, and play up the learning that can now take place from their knowledge of the results.
Some of the
“alternatives” in assessment referred to in Chapter 1 may also enhance washback from tests. (See also Chapter 10.) Self-assessment may sometimes be an
appropriate way to challenge students to discover their own mistakes. This can
be particularly effective for writing performance: once the pressure of
assessment has come and gone, students may be able to look back on their
written work with a fresh eye. Peer discussion of the test results may also be
an alternative to simply listening to the teacher tell everyone what they got
right and wrong and why. Journal writing may offer students a specific place to
record their feelings, what they learned, and their resolutions for future
effort.
The five basic
principles of language assessment were expanded here into six essential
questions you might ask yourself about an assessment. As you use the principles
and the guidelines to evaluate various forms of tests and procedures, be sure
to allow each one of the five to take on greater or lesser importance,
depending on the context. In large-scale standardized testing, for example,
practicality is usually more important than washback, but the reverse may be
true of a number of classroom tests. Validity is of course always the final
arbiter. And remember, too, that these principles, important as they are, are
not the only considerations in evaluating or making an effective test. Leave
some space for other factors to enter in.
In the next
chapter, the focus is on how to design a test. These same five principles
underlie test construction as well as test evaluation, along with some new
facets that will expand your ability to apply principles to the practicalities
of language assessment in your own classroom.
CHAPTER III
CONCLUSION
Assessment requires students to perform tasks that were included in previous lessons and represent the objectives of the unit on which the assessment is based.
• Students will judge a test to be face valid if:
– Directions are clear
– The structure of the test is organized logically
– Its difficulty level is appropriately pitched
– The test has no “surprises”
– Timing is appropriate
• “Biased for best”: a term that goes a little beyond how the student views the test to a degree of strategic involvement on the part of the student and teacher in preparing for, setting up, and following up on the test.
• The design of an effective test should point the way to beneficial washback. A test that achieves content validity demonstrates relevance to the curriculum in question and sets the stage for washback.