Editor’s Note: FOCUS magazine, the official magazine
of the Society of Pharmaceutical and Biotech Trainers (SPBT),
contains the article “High Stakes Testing – Doing it Right”
by Steven Just, President of Pedagogue Solutions. The article
is reprinted below.
High Stakes Testing – Doing it Right
Approximately 15 years ago we first started
discussing with pharmaceutical companies the idea of using
testing to validate training programs and measure sales representatives’
knowledge acquisition. At that time we essentially had three
types of reactions: (1) a small number of companies “got
it”; (2) a larger number of companies felt that testing
was a “nice to have, not a need to have,” or “something
that we might do some day but not now”; and (3) a surprising
number of companies told us that testing sales representatives
was not “part of their corporate culture.”
Of course, this situation has changed dramatically
in these 15 years. Today, rare is the pharmaceutical company
that does not test its sales force on a regular basis. In
fact many pharmaceutical companies, for a variety of reasons
(perceived competitive advantage, compliance issues, pressure
from regulatory organizations), have moved in the opposite
direction and are now using testing in its high stakes form:
as an important element in career decisions (promotion and
dismissal). If you are using high stakes testing or are considering
its use, you must be careful, systematic and knowledgeable
about basic testing theory; otherwise you are opening
your company up to potential legal jeopardy.
The key questions you must consider in high
stakes testing are:
- Are your tests fair, valid, and reliable?
- Are your test questions written to well-formed learning objectives?
- Have you written the appropriate number of test questions to cover these learning objectives?
- Are the test questions written at the proper level of Bloom’s Taxonomy?
- Are the test questions properly constructed?
- Have you used a defensible methodology for setting a passing score?
- Have you done a post-exam item analysis?
- What sort of policy of remediation and consequences have you put in place?
- Has this policy been communicated to the test takers?
- Have you consulted with your in-house legal team?
- Are your test results auditable?
Let’s look at each of these, briefly,
in turn:
Validity, Reliability and Fairness
Fairness is not generally a contentious issue in corporate
knowledge-based testing as long as all employees are exposed
to the same training programs, have the same learning resources
available to them and are expected to perform at the same
level of competency. It is worth noting that fairness does
become an important issue in skills evaluations, because human
raters (as opposed to computers) are doing the scoring.
There are many types of validity; the one most
relevant to this discussion is content validity. Content validity
is assured by writing well-formed questions to properly constructed
learning objectives. It is important to mention that validity
is not a quantitative measure; it does not return a numeric
result.
Reliability refers to consistency of results
over time, over multiple test forms, across items and among
evaluators (for performance-based tests). For statistical
reasons that are beyond the scope of this article, the reliability
of mastery (criterion-referenced) tests tends to be low relative
to norm-referenced tests because the scores tend to bunch
up at one end of the curve. From a purely statistical perspective,
test reliability is maximized when the average test score
is 62.5%, which is generally not an acceptable average to most of
our clients.
Learning Objectives
All training materials must have well-formed learning objectives,
and test questions must be written to these objectives. We
are often asked how many questions should be written to each
objective. The theoretical answer is: as many as are needed
to thoroughly test the objective. In practice, for most learning
objectives, this means three to five questions.
Bloom’s Taxonomy
Most testing in the pharmaceutical industry is done at the
Knowledge and Comprehension levels, with some testing done
at the Application level. Not much is done at the three highest
levels of Bloom’s Taxonomy. This is not inherently bad if you are truly
testing just knowledge acquisition. If, however, you want
to see if your sales representatives can apply their knowledge
then write more questions at the Application level. Why aren’t
more questions written at the Application level? It’s
hard to write good Application questions!
Question Construction
The rules for writing questions
are not difficult, but it is surprising how many question
writers have never been exposed to them. Space does not permit
listing all of them here. Contact the author if you would
like a list of the rules.
Passing Score
In our work with many pharmaceutical companies, this is the
area of test validity most commonly violated. In our experience,
most companies set passing scores arbitrarily by one
of three methods:
- The Higher Authority Method: “Our Vice President said it should be 90.”
- The Committee Method: “What do you think it should be? I don’t know, 90 seems about right, is that OK with everyone?”
- The Received Wisdom Method: “I don’t know how or when it got set but it’s always been 90.”
There are legally defensible ways to set cut
scores but none of the above passes muster. Legally defensible
passing score setting methods fall into two categories: Data-Driven
and Conjectural. The most commonly used method is the Conjectural
method known as the Angoff method. In the Angoff method a
team of three to five subject matter experts independently
assesses each test item and estimates the percentage of minimally
competent test takers that one would expect to answer the
item correctly. The estimates are then averaged across experts
and items to obtain the passing score.
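The Angoff calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete standard-setting procedure: the function name, the layout of the ratings (one list of item estimates per expert), and the sample numbers are all assumptions made for the example.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return an Angoff-style passing score as a percentage.

    ratings[j][i] is expert j's estimate (0-100) of the percentage of
    minimally competent test takers expected to answer item i correctly.
    (Hypothetical data layout, chosen for this sketch.)
    """
    if not ratings:
        raise ValueError("at least one expert's ratings are required")
    n_items = len(ratings[0])
    if n_items == 0 or any(len(r) != n_items for r in ratings):
        raise ValueError("every expert must rate the same non-empty set of items")

    # Average the experts' estimates for each item...
    item_means = [mean(r[i] for r in ratings) for i in range(n_items)]
    # ...then average across items to get the cut score.
    return mean(item_means)


# Three experts rating a four-item test (made-up numbers).
example_ratings = [
    [90, 70, 80, 60],
    [80, 60, 90, 70],
    [85, 65, 85, 65],
]
print(angoff_cut_score(example_ratings))  # passing score of 75 percent
```

In practice the experts' individual estimates are usually kept on file as well, since they document how the passing score was derived.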
Item Analysis
Since most tests these days are given with online testing
systems, it is relatively easy to do a post-exam item analysis.
At a minimum, for each item, you should do a point-biserial
correlation and a choice distribution. You can weed out poorly
written items (in spite of your best efforts, they will sneak
in there) and detect needed areas of student remediation.
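As a rough illustration of the two statistics named above, the following Python sketch computes a point-biserial correlation and a choice distribution for a single item. The function names and sample data are hypothetical, and the sketch assumes the total score includes the item being analyzed (an uncorrected point-biserial); online testing systems may compute these figures differently.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Correlation between getting one item right (1/0) and the total score.

    Values near zero or negative suggest the item does not separate
    stronger test takers from weaker ones and deserves a second look.
    Returns None when the statistic is undefined (everyone right,
    everyone wrong, or no spread in total scores).
    """
    n = len(item_correct)
    right = [t for c, t in zip(item_correct, total_scores) if c]
    wrong = [t for c, t in zip(item_correct, total_scores) if not c]
    spread = pstdev(total_scores)  # population standard deviation
    if not right or not wrong or spread == 0:
        return None
    p = len(right) / n  # proportion answering correctly
    return (mean(right) - mean(wrong)) / spread * sqrt(p * (1 - p))


def choice_distribution(responses):
    """Proportion of test takers selecting each answer choice."""
    counts = Counter(responses)
    n = len(responses)
    return {choice: count / n for choice, count in sorted(counts.items())}


# Made-up data: four test takers, one item.
print(point_biserial([1, 1, 0, 0], [4, 3, 2, 1]))
print(choice_distribution(["A", "B", "A", "C"]))
```

A distractor that nobody selects, or one chosen more often than the keyed answer, shows up immediately in the choice distribution.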
Remediation and Consequences
If you are going to use test results as an element of promotion
and dismissal decisions you must have a clearly thought out
policy of remediation and escalating consequences for failure.
At each failure you need to demonstrate that you have given
the test taker a fair chance at remediation prior to his/her
taking another test. This policy must be communicated to the
test takers prior to the initiation of the testing program.
Legal Advice
As you can imagine, in our litigious society, high stakes
testing can lead to legal jeopardy. Prior to instituting
a high stakes testing program consult with your company’s
HR personnel and lawyers for company policy and guidance.
(Important disclaimer: The author of this article is NOT a
lawyer.)
Auditable Results
If your company finds itself in a legal dispute, the authenticity
of your records may be challenged. Be certain that your testing
environment has legally defensible electronic records and
signatures (that is, it adheres to 21 CFR Part 11 of the U.S. Code
of Federal Regulations).
The focus of this article has been on objective,
cognitive tests. In sales training, skills testing is also
a critical element of employee evaluation. Although outside
the scope of this article there are also methodologies for
ensuring the validity of these types of measures. Be certain
that you use a scoring rubric based on Behaviorally Anchored
Rating Scales (BARS), and ensure rater consistency as a key
element of fairness through training, practice and statistical
methods.
It is impossible in an article of this
length to do more than scratch the surface of testing theory.
For questions or more detailed information, you can contact
Steven Just at sjust@pedagogue.com.