GUIDE 3: RELIABILITY, VALIDITY, CAUSALITY, AND EXPERIMENTS
5481 METHODS OF EDUCATIONAL RESEARCH
At this point, you are fairly itching to begin your design. But we still have important conceptual material to cover. After all, you want your measures to be reliable and valid, your statements about causality to be appropriate, and your findings to be generalizable.
In order to make any kind of causal assessments in your research situation, you must first have reliable measures, i.e., stable and/or repeatable measures. If the random error variation in your measurements is so large that there is almost no stability in your measures, you can't explain anything! Picture an intelligence test where an individual's scores ranged from moronic to genius level over a short period of time. No one would place any faith in the results of such a "test" because the person's scores were so unstable or unreliable.
Reliability is required to make statements about validity. However, reliable measures could be biased (and hence "untrue" measures of a phenomenon) or confounded with other factors such as acquiescence response set. Picture a scale that always weighs five pounds too light. The results are reliable, but inaccurate or biased. Or, picture an intelligence test on which women or people of color always score lower (even if this doesn't occur on other tests). Again, the measure may be reliable but biased.
Note that some estimates of reliability are based on the number of items in the test or scale (Cronbach's Alpha is one example). Thus, we might have a long measure, with a lot of items, that will appear "reliable," yet when we examine the measure closely, we discover that the correlations among items are low. This means that items in that measure just don't seem to "hang together" or relate well to each other and your measure may be multidimensional. While this is a "judgement call," be advised that it is desirable for "reliable measures" to also be unidimensional measures, i.e., to measure one and only one construct. It is much easier to interpret unidimensional measures.
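To see how the number of items alone can inflate a reliability estimate, here is a minimal sketch. It assumes the standardized form of Cronbach's Alpha, which depends only on the number of items and the average inter-item correlation; the function name and the 0.10 correlation are illustrative choices, not values from any real scale.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's Alpha from the item count and the
    average correlation among items (assumes standardized items)."""
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)

# Hold the (weak) average inter-item correlation fixed at 0.10
# and simply make the scale longer.
for k in (5, 10, 20, 40):
    print(k, round(standardized_alpha(k, 0.10), 2))
# 5 0.36
# 10 0.53
# 20 0.69
# 40 0.82
```

With 40 items, Alpha looks respectable even though the items barely correlate with one another, which is why it pays to inspect the inter-item correlations as well as the summary coefficient.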
Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you have not only reliable measures of your independent and dependent variables but also a strong justification that causally links your independent variables to your dependent variables. At the same time, you are able to rule out extraneous variables, or alternative, often unanticipated, causes for your dependent variables. Thus strong internal validity refers to the unambiguous assignment of causes to effects. Internal validity is about causal control.
Laboratory "true experiments" have the potential to make very strong causal control statements. Random assignment of subjects to treatment groups (see below) rules out many threats to internal validity. Further, the lab is a controlled setting, very often the experimenter's "stage." If the researcher is careful, nothing will be in the laboratory setting that the researcher did not place there. When we leave the lab to do studies in natural settings, we can still do random assignment of subjects to treatments, but we lose control over potential causal variables in the study setting (dogs bark, telephones ring, the experimental confederate just got run over walking against the "don't walk" sign on West Tennessee.)
External validity addresses the ability to generalize your study to other people and other situations. To have strong external validity, ideally you need a probability sample of participants or respondents drawn using "chance methods" from a clearly defined population (all registered students at Florida State University in the Fall 2008 semester, for example). Ideally, you will also have a good sample of groups (e.g., classes at all ability levels) and a sample of measurements and situations (you study who follows a confederate who violates the "don't walk" signs at different times of day, different days, and different locations on campus). When you have strong external validity, you can generalize to other people and situations with confidence. Public opinion surveys typically place considerable emphasis on defining the population of interest and drawing good samples from that population. On the other hand, laboratory experiments often employ "convenience samples," such as intact college classes taught by a friend. As a result, we may not know whom the subjects represent.
Construct validity is about the correspondence between your concepts (constructs) and the actual measurements that you use. A measure with high construct validity accurately reflects the abstract concept that you are trying to study. Since we can only know about our concepts through the concrete measures that we use, you can see that construct validity is extremely important. It also becomes clear why it is so important to have very clear conceptual definitions of our variables. Only then can we begin to assess whether our measures, in fact, correspond to these concepts. This is a critical reason why you first worked with concepts, and only then began to work on operationalizing them.
If we only use one measure of a concept, about the best we can do is "face validity," i.e., whether the measure appears "on the face of it" to reflect the concept. Therefore, it is wise to use multiple measures of a concept whenever possible. Further, ideally these will be different kinds of measures and designs.
EXAMPLE: You might measure mathematical skill through a paper and pencil test, through having the student work with more geometric problems, such as a wood puzzle, and having the student make change at a cash register. Our faith that we have accurately measured her high math ability is stronger if she performs well on all three sets of tasks.
Construct validity is often established through the use of a multi-trait, multi-method matrix. At least two constructs are measured. Each construct is measured at least two different ways, and the type of measures is repeated across constructs. For example, each construct first might be measured using a questionnaire, then each construct would be measured using a similar set of behavioral observation categories.
Typically, under conditions of high construct validity, correlations are high for the same construct (or "trait") across a host of different measures. Correlations are low across constructs that are different but measured using the same general technique. Sometimes, this is called "triangulating" measures.
Under low construct validity, the reverse holds. Correlations are high across traits using the same "method" (or type of technique or measurement) but low for the same trait measured in different ways. For example, if our estimate of a student's math ability was wildly divergent depending on whether we examined scores on the questionnaire, making change, or the wood puzzle, we would have low construct validity and a corresponding lack of faith in the results.
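The multi-trait, multi-method logic can be sketched in a few lines. The correlations below are invented for illustration (two traits, each measured by a questionnaire and by behavioral observation), and the names are hypothetical; the point is only the comparison between the two kinds of correlations.

```python
from itertools import combinations
from statistics import mean

# Each hypothetical measure is tagged with (trait, method).
measures = {
    "math_questionnaire": ("math", "questionnaire"),
    "math_observation": ("math", "observation"),
    "verbal_questionnaire": ("verbal", "questionnaire"),
    "verbal_observation": ("verbal", "observation"),
}

# Invented correlations between every pair of measures.
correlations = {
    frozenset(["math_questionnaire", "math_observation"]): 0.70,
    frozenset(["verbal_questionnaire", "verbal_observation"]): 0.65,
    frozenset(["math_questionnaire", "verbal_questionnaire"]): 0.20,
    frozenset(["math_observation", "verbal_observation"]): 0.25,
    frozenset(["math_questionnaire", "verbal_observation"]): 0.10,
    frozenset(["math_observation", "verbal_questionnaire"]): 0.15,
}

def summarize_mtmm(measures, correlations):
    """Average the same-trait/different-method correlations and the
    different-trait/same-method correlations."""
    same_trait, same_method = [], []
    for a, b in combinations(measures, 2):
        r = correlations[frozenset([a, b])]
        (trait_a, method_a), (trait_b, method_b) = measures[a], measures[b]
        if trait_a == trait_b and method_a != method_b:
            same_trait.append(r)
        elif trait_a != trait_b and method_a == method_b:
            same_method.append(r)
    return mean(same_trait), mean(same_method)

same_trait_r, same_method_r = summarize_mtmm(measures, correlations)
print("same trait, different methods:", round(same_trait_r, 3))
print("different traits, same method:", round(same_method_r, 3))
```

In this made-up matrix the same-trait correlations are clearly larger than the same-method correlations, the pattern expected under high construct validity. If the two averages were reversed, the "method" would be driving the scores more than the trait.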
One implication of all this material is that, of course, we NEVER, NEVER say "intelligence is what this intelligence test measures." Or any other single kind of "test" or assessment, of course.
There are many ways of knowing, and different cultures and subcultures use different expectations and norms about proof and causality. Causality is critical: it tells us what is possible, what can be changed and what is difficult, if not impossible, to change. For example, if you are convinced that biological factors cannot be overcome, you probably will not work with visually impaired children because you would believe that they could not compensate for their disabilities. Causality tells us what are the “prime movers” of the phenomena that we observe.
Consider some different perspectives on causality:
Here are some different ways and means of "proof":
Much of the research process centers around what are the true causal or “independent variables.” What we initially may consider to be “true causal” variables may, instead, turn out to be artifacts of the research process (e.g., questionnaire format response set or experimental reactivity or confounded treatment effects) or the particular group that we studied. Much of science consists of ruling out alternative causes or explanations. While science is one form of knowing and one generic way of gathering evidence that either disconfirms or is suggestive of causality, it is not the only way of doing so. The results of science may or may not be accurate, but without following "the rules" of science, most scientists do not believe one is "doing science." Considerable disagreement occurs between scientists and members of the general public because scientists don't make it clear how our methods of "proof" differ from those commonly used among the general public (e.g., legal arguments).
According to science rules, definitive proof via empirical testing does not exist. Science uses the term "proof" (or, rather, "disproof") differently from the way attorneys or journalists do. Our measurements could be later shown to be contaminated by confounding factors. A correlation could have many causes, only some of which have been identified. Later work can show earlier causes to be spurious, that is, both cause and effect depend on some prior causal (often extraneous) variable. Statistics are NEVER EVER considered to "prove" anything although statistical results CAN disconfirm.
Further, science is a self-correcting process. Another researcher can try to duplicate your results. If your results are interesting, in fact, dozens of researchers may try to duplicate your results. If something was awry with your study, the subsequent research projects should discover and correct this.
We use the rules of science in this
Cancerous Human Lung
This dissection of human lung tissue shows light-colored cancerous tissue in the center of the photograph. While normal lung tissue is light pink in color, the tissue surrounding the cancer is black and airless, the result of a tarlike residue left by cigarette smoke. Lung cancer accounts for the largest percentage of cancer deaths in the United States, and cigarette smoking is directly responsible for the majority of these cases.
"Cancerous Human Lung," Microsoft(R) Encarta(R) 96 Encyclopedia. (c) 1993-1995 Microsoft Corporation. All rights reserved.
Most people--and most scientists--accept that smoking cigarettes causes lung cancer although the evidence (for humans) is strictly correlational rather than experimental. There are many topics where it is neither possible--nor desirable--to use the experimental method. To accept more correlational evidence it will help to examine the rules below. (SCL)
Many scientists believe that the ONLY way to establish causality is through randomized experiments. That is one reason why so many methods textbooks designate experiments--and only experiments--as “quantitative research.”
I have never quite understood, by the way, how the numeric level of one's measures can have much to do with cause. After all, variables such as gender, nationality, and ethnicity can have profound causal effects and they are categorical variables. Authors who make this mistake may also misunderstand causality.
Indeed, a moment’s reflection will convince you that experiments are far from the only way to establish causality. Most people now accept that smoking cigarettes causes lung cancer (see the Encarta selection above)--yet no society has ever randomly assigned half its population to smoke cigarettes and the other half not (although there are some experiments with rats). This causal conclusion about smoking and lung cancer is based on correlational or observational evidence, i.e., observing the systematic covariation of two (or more) variables. Cigarette smoking and lung cancer are both "naturalistic" variables, i.e., we must accept the data as nature gave them to us (some authors call these "organismic" variables for "organic").
There is no doubt that the results from careful, well-controlled experiments are typically easier to interpret in causal terms than results from other methods. However, as you can see, causal inferences are often drawn from correlational studies as well. Non-experimental methods must use a variety of ways to establish causality and ultimately must use statistical control, rather than experimental control. The results of the Hormone Replacement Therapy experiments, released in the summer of 2002, remind us of the great care that must be taken when designing nonexperimental research.
If one variable causes a second variable, they should correlate; thus causation implies correlation. However, two variables can be associated without having a causal relationship, for example, because a third variable is the true cause of both the "original" independent and dependent variables. For example, there is a statistical correlation over months of the year between ice cream consumption and the number of assaults. Does this mean ice cream manufacturers are responsible for crime? No! The correlation occurs statistically because the hot temperatures of summer cause both ice cream consumption and assaults to increase. Thus, correlation does NOT imply causation. Other factors besides cause and effect can create an observed correlation.
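The ice cream example can be illustrated with a small simulation. Everything here is made up for the sketch: temperature is drawn at random, and both ice cream consumption and assaults are generated from temperature plus independent noise, with no direct link between them. The partial correlation then shows what happens once temperature is statistically controlled.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated (not real) data: temperature drives both outcomes.
temperature = [rng.gauss(0, 1) for _ in range(5000)]
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
assaults = [t + rng.gauss(0, 0.5) for t in temperature]

raw_r = pearson_r(ice_cream, assaults)
controlled_r = partial_r(ice_cream, assaults, temperature)
print("ice cream vs. assaults:", round(raw_r, 2))
print("same, controlling for temperature:", round(controlled_r, 2))
```

The raw correlation is strong even though neither variable influences the other, and it shrinks to roughly zero once the third variable is held constant. This is the kind of statistical control that nonexperimental research relies on.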
If one variable causes a second, the cause is the independent variable (also called the explanatory variable or predictor). The effect is the dependent variable (also called the outcome or response variable).
If you can designate a distinct cause and effect, the relationship is called asymmetric.
For example, most people would agree that it is nonsense to assume that contracting lung cancer would lead most individuals to smoke cigarettes. For one thing, it takes several years of smoking before lung cancer develops. On the other hand, there is good reason to believe that the carcinogens in tobacco smoke could lead someone to develop lung cancer. Therefore, we can designate a causal variable (smoking) and the relationship is asymmetric.
Two variables may be associated but we may be unable to designate cause and effect. These are symmetric relationships.
For example, men over 30 with greater mental health scores are more likely to be married in the U.S. Aha! Marriage is a "buffer" protecting from the stresses of life, and therefore it promotes greater mental health. Wait! Perhaps the causal direction is the reverse. Men who are in better mental shape to begin with get married. Maybe both are true... When we cannot clearly designate which variable is causal, we have a symmetric relationship.
RULES AND GUIDANCE
Since we know that we cannot use experimental treatments with naturalistic variables to determine cause and effect, yet we know that scientists can and do draw causal conclusions in nonexperimental studies, here is a set of helpful rules for tentatively establishing causality in correlational data.
For a more detailed discussion, I recommend the following book:
Barbara Schneider, Martin Carnoy, Jeremy Kilpatrick, William H. Schmidt, Richard J. Shavelson (2007): Estimating Causal Effects: Using Experimental and Observational Designs. A think tank white paper prepared under the auspices of the AERA Grants Program.
This book is available as a free download from the American Educational Research Association.
By the way, there are always alternative causal explanations in experiments too. The study control group may be flawed. Participants' awareness of being studied may create conditions (e.g., anxiety) that mean we do not measure "true" behavior or performance. So even though it may be easier to establish cause in experiments, keep in mind that nothing is fool-proof.
(1) TIME ORDER. The independent variable came first in time, prior to the second variable.
EXAMPLE: Gender or race are fixed at birth. Gender or race can be important causal variables because individuals behave differently toward males or females, and often behave differently toward individuals of different religions or ethnicities.
(2) EASE OF CHANGE. The independent variable is harder to change. The dependent variable is easier to change.
EXAMPLE: One's gender is harder to change than scores on an assessment test or years of school.
(3) "MAJORITY RULE." The independent variable is the cause for most people.
EXAMPLE: Although some people become so fed up with their jobs that they return to school to train for a better job, most people complete their education prior to obtaining a regular year-round, full-time job.
(4) NECESSARY OR SUFFICIENT. If one variable is a necessary or sufficient condition for the other variable to occur, or a prerequisite for the second variable, then the first variable may be the cause or independent variable.
EXAMPLES: A certain type of college degree is often required for certain jobs. At most universities, publications are a prerequisite for being awarded tenure.
(5) GENERAL TO SPECIFIC. If two variables are on the same overall topic and one variable is quite general and the other is more specific, the general variable is usually the cause.
EXAMPLE: Overall ethnic intolerance influences attitudes toward Hispanics.
(6) THE "GIGGLE" OR "SANITY" FACTOR. If reversing the causal order of the two variables seems illogical and makes you laugh, reverse the causal order back.
EXAMPLES: We don't believe choosing a specific college major or engaging in a particular sport determines one's gender.
MEMORIZE THESE SIX RULES.
You will apply them during exams and assignments all semester!
Dedicated to health and fitness, you devised a new exercise plan that you believe will really help people. So you obtain a sample of Educational Psychology undergraduate students. With the flip of a coin, half the students receive a physical and mental health screening and those who are fit begin this new exercise program. The other half also receive a health screening but no exercise regimen. Six weeks later, you re-examine everyone who was physically fit in the screening and compare the two groups. The group receiving the exercise plan now scores happier and healthier than the group that did not.
Jubilant over the results, you assert that your new exercise plan contributes to physical and mental fitness!
Or does it? Are your results internally valid?
This study was a "true experiment." In a true experiment--whether laboratory, field, or simulation--participants are randomly assigned to treatment groups using a coin flip or some other probability method that does not rely on human judgment. It is randomization that makes true experiments so strong in internal validity and typically allows us to make relatively strong inferences about causality. It is also random assignment to treatments that distinguishes a true experiment from other kinds of data collection.
Random assignment means that on the average at the beginning of a study, all your treatment groups are about the same. In your physical fitness study, it meant about the same percent of each group "flunked" the screening test and about the same percent exercised on a regular basis, even before your intervention.
Random assignment or "randomization" controls at the beginning for all the variables you can think of, and, more important, all the variables you didn't think of.
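A short simulation shows why random assignment balances groups on the average. The baseline fitness scores below are invented, and the sketch shuffles and splits the sample in half (a close cousin of the coin flip that guarantees equal group sizes); the function and variable names are illustrative only.

```python
import random
from statistics import mean

rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# Invented pre-existing characteristic: a baseline fitness score
# for each of 1000 hypothetical participants.
baseline_fitness = [rng.gauss(50, 10) for _ in range(1000)]

def randomly_assign(participants, rng):
    """Shuffle participants by chance alone, then split them in half."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

exercise_group, control_group = randomly_assign(baseline_fitness, rng)
baseline_gap = mean(exercise_group) - mean(control_group)

print("exercise group baseline mean:", round(mean(exercise_group), 1))
print("control group baseline mean:", round(mean(control_group), 1))
print("difference before any treatment:", round(baseline_gap, 1))
```

The two groups start out with nearly the same average fitness even though nobody matched them on purpose. The same balancing applies to characteristics you never measured, which is the real strength of randomization. With small samples the groups can still differ by chance, which is one reason a pretest is useful.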
This study had another important research design aspect: a control group which did not receive the special exercise program. Control or comparison groups are critical in all kinds of research. If we did not have a control or comparison group, the study would be open to the criticism--and alternative causal explanation--that improvement in health would have occurred in any event among young adults, even had the exercise program never been instituted. Not only did you have a control group, but, in an experiment, participants are randomly assigned to it.
Studies that lack a control group are sometimes called "one shot" studies or sometimes case studies. While the results may be interesting, we are limited in the causal implications we can make from the results of "one shot" research.
We will later examine facets of the "good" control group.
You are pretty sure that you know what improved the health of your experimental subjects: the new exercise program you initiated. And there is a good chance that you are right, because by using random assignment you controlled for several pre-existing conditions or threats to internal validity: participants' general physical health, previous exercise patterns, incidence of depression or their general personal histories which, on the average, would be the same for each group. By using random assignment, you also controlled for any incidental historical conditions (such as an influenza outbreak that year which could influence health in both groups).
Your study has two other important features: a pretest and a posttest. In the pretest, you measured existing conditions on your dependent variables, i.e., mental and physical health among all your participants, whether in the experimental or control group, prior to any intervention at all. This enables you to double-check that your participants were pretty much alike across groups at the beginning of the study. Then, after your intervention, you reassessed scores on your dependent variables in a posttest. Because you have both pretest and posttest information, you can also assess the level of change. A posttest-only design allows neither of these checks.
This is often called a "pretest-posttest" experimental design.
You should be advised, however, that the standard pretest-posttest design may pose some threats to internal validity, or the unambiguous assignment of cause and effect. Why? Because simply being measured or observed during the pretest may sensitize some participants and they will behave differently as a result. (For example, being weighed might have sent all subjects to the exercise room for six weeks!) Further, a pretest may interact with an experimental treatment to heighten the effect of the experimental intervention more than it would have ordinarily.
How can you cope with this dilemma? One way is the famous Solomon Four Group Design, considered one of the strongest experimental designs with respect to internal validity. In the Solomon Four Group Design, there are four randomized groups of participants. One group receives a pretest, the experimental treatment and a posttest. The second group is identical, except it does not receive a pretest. The third group receives a pretest and posttest but a different treatment (this could be a group that receives no treatment at all, for example). The final group receives only a posttest and the second treatment (such as no treatment). Below is a diagram of the Solomon Four Group Design:
|GROUP ONE||Pretest||Treatment 1||Posttest|
|GROUP TWO||No pretest||Treatment 1||Posttest|
|GROUP THREE||Pretest||Treatment 2||Posttest|
|GROUP FOUR||No pretest||Treatment 2||Posttest|
Solomon Four Group Designs are more expensive because they require more participants and conditions than other types of experimental designs. But many researchers believe the advantages are worth the expense.
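One way to see what the four groups buy you is to treat the posttest means as a two-by-two layout (pretested or not, by Treatment 1 or Treatment 2). The sketch below uses invented posttest means and a hypothetical helper function; a real analysis would also test these differences statistically.

```python
def solomon_effects(pre_t1, nopre_t1, pre_t2, nopre_t2):
    """Simple contrasts on the four posttest means of a Solomon design.

    pre_t1:   Group One   (pretest, Treatment 1)
    nopre_t1: Group Two   (no pretest, Treatment 1)
    pre_t2:   Group Three (pretest, Treatment 2)
    nopre_t2: Group Four  (no pretest, Treatment 2)
    """
    # Average difference between treatments, ignoring pretesting.
    treatment = ((pre_t1 + nopre_t1) - (pre_t2 + nopre_t2)) / 2
    # Average difference due to having been pretested at all.
    pretest = ((pre_t1 + pre_t2) - (nopre_t1 + nopre_t2)) / 2
    # Does the treatment gap differ between pretested and unpretested groups?
    interaction = (pre_t1 - pre_t2) - (nopre_t1 - nopre_t2)
    return {"treatment": treatment, "pretest": pretest, "interaction": interaction}

# Invented posttest means for Groups One through Four.
effects = solomon_effects(80, 74, 70, 68)
print(effects)
# {'treatment': 8.0, 'pretest': 4.0, 'interaction': 4}
```

In these made-up numbers the treatment raises scores by 8 points on average, but pretested groups also score 4 points higher, and the treatment gap is 4 points larger among pretested participants. A standard pretest-posttest design could not separate these three things.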
We will revisit experiments, and compare them with "quasi experiments", in Guide 4.
Some textbooks imply that "intact groups" cannot be part of a "true experiment." This is not necessarily true, so assess each situation carefully to see if a true experiment is possible.
Suppose you are studying fourth grade classes. The major way the school divides its fourth grade students into classes is through a systematic alphabetical list. If there are five fourth grade classes, every fifth student goes to Class 1, Class 2, and so on. In other words, there is no reason at this particular school to believe any of the fourth grade classes is distinctive at the very beginning of the school year. If you randomly assign classes to different experimental treatments in this example, you will indeed have a "true experiment." The key is that the intact groups were pretty much assembled using random means in the first place.
Also, if it is the very beginning of the academic year, students in the different classes have not been exposed to different teachers or teaching methods. This will not be true later in the year. If you come in and do your experiment at the very beginning and before the different teachers have made assignments, begun in-depth lessons, etc., you probably do have a "true experiment."
On the other hand, suppose there was a systematic difference among groups before you applied any kind of intervention, such as Honors classes versus regular classes in school. In such a case, even random assignment of intact groups could not produce a true experimental design. The problem is particularly great if a difference between groups relates to a variable you want to study. For example, Honors math students may react differently to a new way of teaching algebra than students in regular classes.
So, study the situation carefully. "True experiments" with intact groups are possible, but only under a very restricted set of conditions. If you don't meet those conditions, it is more likely that you have a "quasi-experiment," which we will examine in Guide 4.
Measure carefully. Measure more than once. Use more than one measure of a construct.
Avoid bias, such as the bathroom scale that always measures 5 pounds too light.
Susan Carol Losh
September 9 2002
Revised February 11 2009