Wading Through the Data Swamp:
Program Evaluation 201

Supplements

Threats to Internal Validity

Selection

Selection refers to systematic differences between the groups being compared that may account for differences in outcomes. For example, a comparison group may start out with lower drug use than the program group. When this happens, both the differences observed between the groups on the posttest and any differences in change over time could simply reflect the fact that the groups didn't start out at the same level.

Even if they have the same average scores on the pretest, if one group is different in other ways (e.g., higher socioeconomic status), this difference is a threat to internal validity. The differences you see at posttest might be caused by factors other than the program.
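To make the selection threat concrete, here is a small simulation sketch. Everything in it is an assumption made up for illustration: the function name, the 0-to-100 style drug use score, the 5-point starting gap, and the group sizes. The program is given no effect at all, yet the groups still differ at posttest because they differed at pretest.

```python
import random
import statistics


def simulate_selection(n_per_group=2_000, baseline_gap=5.0, seed=2):
    """Return (pretest gap, posttest gap) between program and comparison groups."""
    rng = random.Random(seed)
    # Assumption: the comparison group starts out with lower drug use scores.
    program_pre = [rng.gauss(50, 10) for _ in range(n_per_group)]
    comparison_pre = [rng.gauss(50 - baseline_gap, 10) for _ in range(n_per_group)]
    # No program effect: each posttest score is the pretest score plus random noise.
    program_post = [p + rng.gauss(0, 3) for p in program_pre]
    comparison_post = [p + rng.gauss(0, 3) for p in comparison_pre]
    pre_gap = statistics.mean(program_pre) - statistics.mean(comparison_pre)
    post_gap = statistics.mean(program_post) - statistics.mean(comparison_post)
    return pre_gap, post_gap


pre_gap, post_gap = simulate_selection()
print(f"Gap at pretest:  {pre_gap:.1f}")
print(f"Gap at posttest: {post_gap:.1f}")
```

Looking only at the posttest, the program group appears to have higher drug use, which could be misread as the program doing harm. Comparing the two gaps shows that the posttest difference is just the starting difference carried forward.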

History

History refers to specific events that occur between the first (pretest) and second (posttest) measurements in addition to the experimental variable (here, the program). For example, participants may become involved in additional activities at their school that promote a drug-free environment but aren't part of the program. Any such event could invalidate the inference that the change was due to the program.

Maturation

Maturation refers to processes within the respondents related to the passage of time and not specific to particular events. These include growing older, growing more used to messages about drugs, and becoming more sensitive to the influence of their peers. It is a fact that kids grow up and change over time, despite the wishes of their parents.

Testing

Taking a pretest can itself affect scores on the posttest. Simply taking the test can change the subject's behavior or attitudes. For example, answering questions about attitudes toward drugs could make kids more aware of these attitudes and lead them to think more about them and talk about them with their friends. These discussions could change the kids' attitudes more than the program does.

Instrumentation

Instrumentation refers to changes in the calibration of measuring instruments, or changes in the observers or scorers used, that may produce changes in the measurements. Jack used the GPRA, but suppose he had instead used an instrument that he created himself. If it was still being developed, he might decide that something needed to be fixed between the pretest and the posttest. These corrections could affect the results.

Regression

Regression refers to a change over time that looks like a program effect but occurred because the group of participants had extreme scores at the pretest. Evaluators have learned that when a group starts out either very high or very low at the pretest, its members tend to "move toward the center" of the measure on the posttest even when the program isn't causing them to change.

For example, let's say you tested a program on the kids who were in the top 10 percent of their class on having a favorable attitude toward drug use. At posttest, they had less favorable attitudes toward drug use. It's possible that some of that difference occurred because some of the group you selected happened to give an extreme response on the day of the pretest.
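The example above can be sketched as a small simulation. This is only an illustration under made-up assumptions: the function name, the score scale, and the amount of day-to-day noise are all hypothetical. Each kid has a stable underlying attitude, each test adds independent random noise, and there is no program effect. The kids selected for being in the top 10 percent at pretest still score closer to the center at posttest.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, top_fraction=0.10, seed=0):
    """Return (pretest mean, posttest mean) for the top scorers at pretest."""
    rng = random.Random(seed)
    # Assumption: each kid has a stable underlying attitude score centered on 50.
    stable = [rng.gauss(50, 10) for _ in range(n)]
    # Each test day adds independent noise; nothing else changes between tests.
    pretest = [s + rng.gauss(0, 10) for s in stable]
    posttest = [s + rng.gauss(0, 10) for s in stable]
    # Select the kids with the most favorable attitudes at pretest.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: pretest[i], reverse=True)[:k]
    pre_mean = statistics.mean([pretest[i] for i in top])
    post_mean = statistics.mean([posttest[i] for i in top])
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"Top 10 percent at pretest:  {pre_mean:.1f}")
print(f"Same kids at posttest:      {post_mean:.1f}")
```

The selected group's posttest average falls back toward the overall average of 50 without reaching it, which is the "move toward the center" pattern. Part of what put these kids in the top 10 percent was an unusually high reading on the day of the pretest, and that part does not repeat.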

Mortality

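Also known as attrition, mortality occurs when some participants drop out of the program before the posttest. Changes can only be observed for those who didn't drop out. If the participants who drop out differ from those who stay (e.g., heavier drug users are more likely to leave), the posttest results describe a different group than the one that started, and an apparent change may reflect who remained rather than what the program did.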
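A rough simulation sketch of this threat follows. The function names, the score scale, and the rule linking dropout to pretest scores are all hypothetical assumptions chosen for illustration. The program has no effect, but participants with higher drug use scores are more likely to drop out, so comparing everyone's pretest with the completers' posttest shows a drop that isn't real.

```python
import random
import statistics


def dropout_probability(pre_score):
    """Assumed rule: higher pretest drug use means a higher chance of dropping out."""
    return min(max((pre_score - 30) / 50, 0.0), 0.9)


def simulate_attrition(n=5_000, seed=1):
    """Return mean scores for (all at pretest, completers at pretest, completers at posttest)."""
    rng = random.Random(seed)
    pre = [rng.gauss(50, 10) for _ in range(n)]
    # No program effect: posttest equals pretest plus random noise.
    post = [p + rng.gauss(0, 3) for p in pre]
    stayed = [rng.random() > dropout_probability(p) for p in pre]
    pre_all = statistics.mean(pre)
    pre_completers = statistics.mean([p for p, s in zip(pre, stayed) if s])
    post_completers = statistics.mean([q for q, s in zip(post, stayed) if s])
    return pre_all, pre_completers, post_completers


pre_all, pre_completers, post_completers = simulate_attrition()
print(f"Pretest, everyone enrolled: {pre_all:.1f}")
print(f"Pretest, completers only:   {pre_completers:.1f}")
print(f"Posttest, completers only:  {post_completers:.1f}")
```

Comparing the first and third numbers suggests drug use went down. Comparing the second and third, which cover the same people, shows essentially no change. The apparent improvement comes entirely from who dropped out.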