Pedagogue Solutions  
Pedagogue Update

Subjectivity and the Angoff Process
by Steven B. Just

Readers of these articles and attendees at our workshop on “Best Practices in Test Development” know that the Angoff process is the most commonly recommended method for setting legally defensible passing scores on criterion-referenced tests. The Angoff method is known as a “conjectural” method because it involves recording the judgments of subject matter experts as to the difficulty of test items. For those not familiar with the Angoff process, it works like this (there are a number of variations):

  1. Gather together subject matter experts (a minimum of three).
  2. For each item on the test, have each expert (judge) estimate the probability that a minimally competent test taker would answer the item correctly.
  3. Sum each judge’s ratings and convert the total to a percentage of the number of items.
  4. Average the judges’ percentages.
  5. This average is the cut score.

So for example:

Angoff Method: Example

Item      Judge 1   Judge 2   Judge 3
1         .75       .80       .85
2         .80       .90       1.00
3         .75       .75       .90
4         .90       .90       .80
5         .95       .75       .85
TOTAL     4.15      4.10      4.40
PERCENT   83%       82%       88%

Averaging the three percentages gives a cut score of about 84%.
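
The arithmetic behind the example can be sketched in a few lines of Python. This is only an illustration of the basic procedure described in the numbered steps, not a full implementation of any particular Angoff variation; the function names are hypothetical, and the ratings are copied from the example table.

```python
# Ratings from the example table: one row per item, one column per judge.
ratings = [
    [0.75, 0.80, 0.85],
    [0.80, 0.90, 1.00],
    [0.75, 0.75, 0.90],
    [0.90, 0.90, 0.80],
    [0.95, 0.75, 0.85],
]


def judge_percents(ratings):
    """Sum each judge's ratings and express the total as a percentage
    of the number of items (hypothetical helper name)."""
    n_items = len(ratings)
    per_judge = zip(*ratings)  # transpose rows (items) into columns (judges)
    return [100 * sum(column) / n_items for column in per_judge]


def angoff_cut_score(ratings):
    """Average the judges' percentages to get the cut score."""
    percents = judge_percents(ratings)
    return sum(percents) / len(percents)


print([round(p) for p in judge_percents(ratings)])  # [83, 82, 88]
print(round(angoff_cut_score(ratings), 1))          # 84.3
```

The per-judge percentages match the PERCENT row of the table, and their average is the cut score.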

While it is possible to strive for “scientific accuracy” through rigorous training, practice and discussion, ultimately we are asking humans to make judgments -- a process fraught with subjectivity.

I have run Angoff processes in two ways, each with its own pros and cons:

Method 1: Have the group as a whole discuss each item before the individuals make their ratings, or hold the discussion after each individual has rated the item and give the individuals a chance to change their ratings based on the discussion.

Pros: The process tends to generate fewer outliers. The individuals tend to reach consensus.

Cons: A single dominant individual (either because he/she knows more, has more seniority or simply has a stronger personality) can easily sway the group.


Method 2: Have each individual rate the items without discussion.

Pros: You get an honest rating from each individual unbiased by group dynamics.

Cons: You tend to get wildly disparate estimates. If you have only three judges, for example, merely averaging three very different estimates does not make the resulting mean accurate. (If it’s 30 degrees today and 90 degrees tomorrow, you are not living in a pleasant 60-degree climate.)
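
One way to see how much the judges disagree is to look at the spread of ratings on each item rather than only the mean. The sketch below uses the ratings from the example table earlier in the article as stand-in data; the function names and the 0.15 threshold are assumptions chosen for illustration, not a recommended standard.

```python
# Ratings from the article's example table, keyed by item number.
ratings = {
    1: [0.75, 0.80, 0.85],
    2: [0.80, 0.90, 1.00],
    3: [0.75, 0.75, 0.90],
    4: [0.90, 0.90, 0.80],
    5: [0.95, 0.75, 0.85],
}


def item_spread(item_ratings):
    """Difference between the highest and lowest rating for one item,
    rounded to two decimals to avoid floating-point noise."""
    return round(max(item_ratings) - min(item_ratings), 2)


def flag_disputed_items(ratings, max_spread=0.15):
    """Return the items whose spread exceeds max_spread.
    The default threshold is an arbitrary, illustrative value."""
    return [item for item, r in ratings.items() if item_spread(r) > max_spread]


print(flag_disputed_items(ratings))  # [2, 5]
```

Items flagged this way are candidates for further discussion or review before their ratings are averaged into the cut score.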

Here are some ways to address these problems:

  1. Train the judges, practice, and discuss the results, so everyone understands what you are estimating. Admittedly, “estimating how many minimally competent test takers out of 100 will get the item right” is a difficult concept to get one’s mind around.
  2. Have five or more judges, especially if you are using Method 2. Then, if you want, you can throw out the highest and lowest ratings for each item.
  3. Pilot the test! I repeatedly implore my clients to do this, but under time pressure to get the test out, they often don’t. Often the results of the pilot are fascinating, and disturbing. I recently went through a multi-test Angoff process with a client. In theory, the students who took the different tests were from a homogeneous group, and all of the passing scores we derived from the Angoff process were within five points of one another. We found:
    1. The average scores on the pilot tests were very different from one another (much more than the five point Angoff spread).
    2. A much higher percentage of students failed one of the tests than the others (i.e. the actual scores on this one test were much lower than the judges predicted they would be).
    3. Despite our best efforts at item review, an incorrect answer slipped through.
    Because we had the pilot results, we were able to modify the tests and the cut scores to deliver consistent outcomes across all of the tests.
  4. Ask the Angoff judges one additional question: “What percentage of students who take this test would you expect to pass on the first try?” Use this number as a reasonableness check against the pilot results and the original Angoff scores to arrive at a final cut score (this is known as the Beuk adjustment).
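
The suggestion above to throw out the high and low ratings for each item when you have five or more judges can be sketched as follows. The ratings here are made up purely for illustration (they are not from any real panel), and the function names are hypothetical. Because ratings are dropped item by item, this sketch averages within each item first and then across items.

```python
def trimmed_item_mean(item_ratings):
    """Mean of one item's ratings after dropping the single highest
    and single lowest rating. Requires at least five judges."""
    if len(item_ratings) < 5:
        raise ValueError("need at least five judges to drop high and low")
    kept = sorted(item_ratings)[1:-1]  # discard lowest and highest
    return sum(kept) / len(kept)


def trimmed_cut_score(ratings):
    """Cut score (as a percentage) from the trimmed mean of each item."""
    item_means = [trimmed_item_mean(r) for r in ratings]
    return 100 * sum(item_means) / len(item_means)


# Hypothetical ratings: one row per item, five judges per row.
hypothetical_ratings = [
    [0.75, 0.80, 0.85, 0.40, 0.80],
    [0.80, 0.90, 1.00, 0.85, 0.90],
    [0.75, 0.75, 0.90, 0.80, 0.30],
]

print(round(trimmed_cut_score(hypothetical_ratings), 1))
```

With these made-up numbers, the two unusually low ratings (0.40 and 0.30) no longer pull the cut score down, which is the effect the trimming is meant to achieve.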

The Angoff process is inherently subjective. There are steps you can take to minimize this, but it’s a fact you must accommodate in your testing process.
