Aug 23, 2012

When something looks good on the surface but is completely without merit, it is called a joke.

You might not have thought of this before, but many hiring tests fit that bill. I’m talking about tests that deliver numbers and data that look good on the surface, but do nothing to predict candidate job success — in other words, scores do a better job predicting vendor sales than employee performance.

Let me explain why, beginning with how professionals develop a hiring test.

What Works: Professional standards

Professionals always start with a job theory that sounds something like this: “I believe factor-X affects job performance.”

Next, they draft some factor-X items and give their test to hundreds of people, tweaking and tuning the items along the way. Then they use one or more methods to test whether scores are directly associated with job performance; for example, they might give their test to everyone upon hiring, ignore the scores, and later compare test scores to job performance.

This is called predictive validity. They could also give their test to people already on the job and compare test scores to job performance. This is called concurrent validity. Both methods have their pros and cons.
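To make the idea concrete, here is a minimal sketch (in Python, using made-up numbers) of the arithmetic behind a predictive validity study: scores collected at hire are later correlated with performance ratings for the same people. Real studies involve far larger samples, reliability checks, and corrections this example ignores.

```python
# Minimal sketch of a predictive-validity check using hypothetical data:
# scores collected at hire (and ignored for selection), matched months
# later to supervisor performance ratings for the same people.
import numpy as np
from scipy.stats import pearsonr

hire_scores   = np.array([62, 55, 71, 48, 66, 59, 74, 51, 68, 57])            # hypothetical
later_ratings = np.array([3.4, 2.9, 4.1, 2.5, 3.8, 3.0, 4.3, 2.7, 3.9, 3.1])  # hypothetical

r, p = pearsonr(hire_scores, later_ratings)
print(f"predictive validity r = {r:.2f} (p = {p:.3f})")

# A concurrent-validity study uses the same arithmetic, but both the test
# scores and the performance ratings come from current employees.
```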

Drafting a stable, solid, and trustworthy hiring test takes months of writing, editing, running studies, and systematically examining the guts of the test at both the item and factor level. This is the only way to know that test scores consistently and accurately predict job performance.

Bad joke examples

A while ago, I reviewed a test supposedly developed for retail hires.

The vendor’s own test manual showed scores predicted nothing. Not shrinkage. Not theft. Not turnover. Not performance. Zilch…nada…nothing!

Still, the vendor, with a straight face, claimed it “could be helpful” for hiring. You know, like claiming it predicts job performance even though it doesn’t?

Another time I was asked by a proud author to look at their web test. I intentionally answered every multiple-choice question with the same letter (a technique to see whether it would produce junk scores).

After the vendor told me the test results described me exactly, I explained what I did. Then, I went on to explain the kind of work necessary before it could be considered professional. They replied their investors would never stand for that.

Wouldn’t it be nice to have, you know, accuracy?

In a final example, a user claimed a certain well-known test would predict management success based on ego-drive. He maintained this trait was desirable for managers.

I said that was a nice thought, but if I were rejected for having a low ego-drive score, I would want to see proof that ego-drive was necessary for job performance and then demand to see a study that showed my score predicted job performance.

We did not talk much after that. I guess I was being downright unreasonable by expecting a test user to show scores predicted job performance.

Developing a Joke Test: Begin with ignorance

Ignorance is not a permanent condition. It can be fixed. So why do people think, without taking a single class in identifying job skills, measuring job performance, or psychometrics, they know how to develop a hiring test that meets professional standards?

It takes cooperative organizations, patient candidates, honesty, accuracy, and a boatload of statistical work. In fact, here is a link to a book on how professionals do it: http://www.apa.org/science/programs/testing/standards.aspx. If you think you want to develop a test, or fix the one you market now, read this book thoroughly.

If you only want to buy a good test, ask your vendor for proof that he/she followed the standards. If the vendor has never heard of them, or claims they are too complicated for the average person, then the test is probably bogus!

Developing a Joke Test: Assume personality scores = skill

I once attended a course on the DISC during which the instructor mentioned it was often used to hire salespeople. What? DISC factors predict job performance? DISC scores are just differences between how people answer questions, NOT differences in performance!

Not only is DISC scoring weird, its “either/or” scoring method requires rejecting one factor every time another is chosen, so two people can provide completely different answers but get the same score! Furthermore, its theory was originally based on soldier behavior under combat conditions. And just because the vendor thinks all salespeople should be pushy, does that mean all customers enjoy dealing with salespeople who are high D’s?
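For readers who want to see why “different answers, same score” happens with forced-choice formats, here is a minimal sketch using a hypothetical either/or scheme (not the actual DISC instrument): each item forces one style to be chosen over the others, so the tallies are constrained and very different answer patterns can land on identical profiles.

```python
# Minimal sketch of forced-choice ("either/or") scoring with hypothetical items:
# each item awards +1 to the chosen style and nothing to the rejected ones.
from collections import Counter

person_a = ["D", "I", "D", "S", "C", "I"]   # hypothetical answer pattern
person_b = ["I", "D", "S", "D", "I", "C"]   # different answers on every item

print(sorted(Counter(person_a).items()))    # [('C', 1), ('D', 2), ('I', 2), ('S', 1)]
print(sorted(Counter(person_b).items()))    # [('C', 1), ('D', 2), ('I', 2), ('S', 1)]

# Same profile from different answers, and nothing in either profile has
# been tied to job performance.
```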

Personality score differences are not skill differences.

Developing a Joke Test: Average everything

Averages are particularly insidious because they look job-credible.

For example, a vendor gives a generic (usually homegrown) test to 100 truck drivers, or 200 salespeople, or some other job title, averages the scores, and exclaims his/her test scores predict success in driving a truck, selling, or in some other occupation!

Are all the people in the sample equally competent? Did they all earn high marks for job performance or low turnover? Are all the truck drivers in the group doing identical work? How might you explain why some individual truck drivers score exactly the same as individuals in other jobs?

Remember that, on average, a person with one foot in a fire and the other in a bowl of ice is perfectly comfortable. Of course, a disreputable test vendor is perfectly comfortable selling junk because he/she really does not know, think, or care about the problem with selling averages.
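To see the foot-in-fire problem in numbers, here is a minimal sketch with made-up truck-driver scores: two groups with identical averages but completely different spreads, and nothing in either average that ties scores to job performance.

```python
# Minimal sketch of why occupational averages mislead, using hypothetical scores.
import numpy as np

steady_group  = np.array([58, 59, 60, 61, 62])     # hypothetical scores
extreme_group = np.array([20, 40, 60, 80, 100])    # hypothetical scores

print(steady_group.mean(), extreme_group.mean())   # both averages are 60.0
print(steady_group.std(),  extreme_group.std())    # the spreads are wildly different

# Crucially, neither average says anything about whether high scorers
# actually perform the job better than low scorers.
```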

Developing a Joke Test: Toss and stick

Imagine giving a test to a high-performing group of employees, averaging their scores, and using the mean as the job target.

Whoa! The state of job prediction science just regressed to casting lots.

This technique is plagued with problems: the vendor assumes each factor affects job performance; average scores hide individual differences; people in the low group are often ignored; and, the biggest joke of all, the differences probably happened by chance. I had one vendor tell me that “Toss-and-Stick” was just another way to confirm a test works. I must have missed that class in grad school.
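As a minimal illustration of the chance problem (using hypothetical scores, and showing only one of several checks a professional would run), here is what a simple comparison of high and low performers on a single factor looks like; toss-and-stick skips even this step.

```python
# Minimal sketch, with hypothetical factor scores, of the check toss-and-stick skips:
# do high and low performers actually differ on the factor beyond chance?
import numpy as np
from scipy.stats import ttest_ind

high_performers = np.array([71, 66, 74, 69, 72, 68])   # hypothetical factor scores
low_performers  = np.array([67, 73, 65, 70, 69, 71])   # hypothetical factor scores

t, p = ttest_ind(high_performers, low_performers, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")

# If p is large, the high group's mean makes a poor "job target": the difference
# is plausibly just chance, which is the biggest joke noted above.
```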

Developing a Joke Test: Circus acts

Let me introduce you to Prof. Bertram Forer. Forer gave his college students a personality test, but instead of giving back their actual scores, he gave each student an identical report gathered from several horoscopes.

Using a 0 to 5 agreement scale, students averaged 4.26. In other words, although entirely different people received the same personality description, virtually all individuals agreed it described them to a “T.”

This result later became known as the Barnum Effect, after P.T. Barnum, who always made sure he had something for everyone. Junk test vendors take advantage of the Forer Effect when people get so excited about their test scores in a training workshop that they want to take the test into the hiring/promotion arena.

Another user Circus Act is the “one-off” effect. That is, some users tend to think their recollection of one or two exceptions makes the rule.

This often sounds like, “That can’t be right. I knew someone who…..” That’s bad human judgment at work, and a great reason why people need to base hiring/promotion decisions on hard test facts. And, let’s not forget, interviews are tests — verbal ones. They have something to measure, use questions, and have right/wrong answers.

Developing a Joke Test: Summary

The marketplace is filled with junk and deception: wrong-headed vendors seek more sales; trainers and managers mistakenly think training tests predict job performance; professional test practices are treated with ignorance and disrespect; occupational averages are wrongly used to predict performance; meaningless organizational groupings and averages predict nothing; and so forth.

Think about it: when someone uses or sells an unprofessional test, they are really saying, “I don’t care how many careers are ruined by my bogus test scores, or how much money is lost by making a bad hire, as long as I can keep claiming these inaccurate tests help make better hiring decisions.”

Are you laughing yet?
