What are the types of criterion validity?

Criterion validity refers to how well the measurement of one variable predicts the value of another variable.

One variable is referred to as the explanatory variable while the other variable is referred to as the criterion variable.

For example, we might want to know how well some college entrance exam is able to predict the first semester grade point average of students.

The entrance exam would be the explanatory variable and the criterion variable would be the first semester GPA.

We want to know if it’s valid to use this particular explanatory variable as a way to predict the criterion variable.

How to Measure Criterion Validity

We typically measure criterion validity using a metric like the Pearson correlation coefficient, which takes on a value between -1 and 1, where:

  • -1 indicates a perfect negative linear correlation between two variables
  • 0 indicates no linear correlation between two variables
  • 1 indicates a perfect positive linear correlation between two variables

The further away the correlation coefficient is from zero, the stronger the association between the two variables.

For example, if we collected data on entrance exam scores and first semester GPA for 1,000 students and found that the correlation between the two variables was 0.843, then this would mean the two variables are highly correlated.

In other words, students who score high on the entrance exam also tend to earn high GPAs during their first semester. Conversely, students who score low on the entrance exam tend to earn low GPAs during their first semester.
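As a quick illustration of the idea above, the sketch below computes the Pearson correlation coefficient for a small set of made-up entrance exam scores and GPAs (the numbers are hypothetical, not the article's 1,000-student dataset):

```python
import numpy as np

# Hypothetical data: entrance exam scores and first-semester GPAs
# for ten students (illustrative values only).
exam = np.array([1200, 1350, 1100, 1450, 1300, 1000, 1250, 1400, 1150, 1500])
gpa = np.array([3.1, 3.5, 2.8, 3.8, 3.4, 2.5, 3.2, 3.6, 2.9, 3.9])

# Pearson correlation coefficient: r = cov(x, y) / (std(x) * std(y)).
# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two variables.
r = np.corrcoef(exam, gpa)[0, 1]
print(round(r, 3))  # close to 1 for these made-up values
```

A value this far from zero would indicate strong criterion validity: the exam score is a good predictor of first semester GPA.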

Types of Criterion Validity

There are two main types of criterion validity:

1. Predictive Validity

The first type of criterion validity is known as predictive validity, which determines whether the measurement of one variable is able to accurately predict the measurement of some other variable in the future.

The previous example of measuring a student’s college entrance exam score and their first semester GPA is an example of measuring predictive validity because we measure the two variables at different points in time.

In other words, we’re trying to determine if the entrance exam score can predict first semester GPA well.

2. Concurrent Validity

The second type of criterion validity is known as concurrent validity, which measures two variables concurrently (i.e., at the same time) to see if one variable is significantly associated with the other.

An example of this would be if a company administers some type of test to see if the scores on the test are correlated with employee productivity.

The benefit of this approach is that we don’t have to wait until some point in the future to take a measurement on the criterion variable we’re interested in.

To measure the criterion validity of a test, researchers must calibrate it against a known standard or against outcomes measured at a later time.

Comparing the test with an established measure is known as concurrent validity; testing it over a period of time is known as predictive validity.

It is not necessary to use both of these methods, and one is regarded as sufficient if the experimental design is strong.

One of the simplest ways to assess criterion-related validity is to compare the test to a known standard.

A new intelligence test, for example, could be statistically analyzed against a standard IQ test; if there is a high correlation between the two data sets, then the criterion validity is high. This is a good example of concurrent validity, but this type of analysis can be much more subtle.

An Example of Criterion Validity in Action

A polling company devises a test that they believe locates people on the political scale, based upon a set of questions that establishes whether people are left wing or right wing.

With this test, they hope to predict how people are likely to vote. To assess the criterion validity of the test, they do a pilot study, selecting only members of left wing and right wing political parties.

If the test has high concurrent validity, the members of the leftist party should receive scores that reflect their left leaning ideology. Likewise, members of the right wing party should receive scores indicating that they lie to the right.

If this does not happen, then the test is flawed and needs a redesign. If it does work, then the researchers can assume that their test has a firm basis and that its criterion-related validity is high.
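The pilot study described above can be sketched numerically. Assuming made-up ideology scores and known party memberships, a point-biserial correlation (a Pearson correlation in which one variable is coded 0/1) quantifies how cleanly the test separates the two known groups:

```python
import numpy as np

# Hypothetical pilot-study data: ideology scores from the polling
# company's test (lower = more left wing), with known party
# membership coded 0 = left wing party, 1 = right wing party.
party = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
score = np.array([12, 18, 15, 20, 14, 78, 85, 90, 72, 88])

# Point-biserial correlation between group membership and test score:
# numerically identical to the Pearson correlation with a 0/1 variable.
r_pb = np.corrcoef(party, score)[0, 1]
print(round(r_pb, 3))  # close to 1 for these made-up values
```

A correlation near 1 here would mean the test scores line up almost perfectly with known party membership, which is exactly the evidence of concurrent validity the pilot study is looking for.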

Most pollsters would not leave it there: a few months later, when the votes from the election had been counted, they would ask the subjects how they actually voted.

This predictive validity check allows them to double-check their test, with a high correlation again indicating that they have developed a solid test of political ideology.

Criterion Validity in Real Life - The Million Dollar Question

This political test is a fairly simple linear relationship, and the criterion validity is easy to judge. For complex constructs, with many inter-related elements, evaluating the criterion-related validity can be a much more difficult process.

Insurance companies have to measure a construct called 'overall health,' made up of lifestyle factors, socio-economic background, age, genetic predispositions and a whole range of other factors.

Maintaining high criterion-related validity is difficult with all of these factors, but getting it wrong can bankrupt the business.

Coca-Cola - The Cost of Neglecting Criterion Validity

For market researchers, criterion validity is crucial, and can make or break a product. One famous example is when Coca-Cola decided to change the flavor of their trademark drink.

Diligently, they researched whether people liked the new flavor, performing taste tests and giving out questionnaires. People loved the new flavor, so Coca-Cola rushed New Coke into production, where it was a titanic flop.

The mistake that Coke made was that they forgot about criterion validity, and omitted one important question from the survey.

People were not asked if they preferred the new flavor to the old, a failure to establish concurrent validity.

The old Coke, known to be popular, was the perfect benchmark, but it was never used. A simple blind taste test, asking people which of the two flavors they preferred, would have saved Coca-Cola millions of dollars.

Ultimately, the predictive validity was also poor: the favorable test results did not predict the poor sales. By then, it was too late!

Criterion validity is a method of test validation that examines the extent to which scores on an inventory or scale correlate with external, non-test criteria (Cohen & Swerdlik, 2005).

The ultimate aim of criterion validity is to demonstrate that test scores are predictive of real-life outcomes. The basic paradigm for this approach is to give the instrument to a group of individuals and to collect measures of some criterion of interest (e.g., health status, responsiveness to psychotherapy, work performance). There are two variants of this paradigm. The first is called concurrent validity, where both the test scores and the criterion measure are collected at the same time. The second is called predictive validity, where criterion ratings are obtained at some point after the test scores were obtained. Concurrent paradigms tend to generate higher validity coefficients than predictive paradigms because the passage of time tends to attenuate correlations between the test scores and the criterion.

  • Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurement (6th ed.). New York: McGraw-Hill.
