SUSAN CAROL LOSH
METHODS READINGS AND ASSIGNMENTS
GUIDE 2: VARIABLES AND HYPOTHESES
Where are the data collection methods?
Before you design an experiment or a survey or an ethnography, you must consider basic issues in hypotheses, whether your variables can approximate numbers or are clearly just categories, and whether your variables are unidimensional or multidimensional. That's what we will do in this guide.
All too often, the student says "I want to do an experiment that will..." or "I want to do a survey" or "I want to do an ethnography of..."
Research begins with WHAT you want to find out, not how you want to discover it.
CONCEPTUAL VARIABLES are what you think the entity really is or what it means. Conceptual variables are about abstract constructs. YOU DO NOT DISCUSS MEASUREMENT AT THIS STAGE! Examples include "achievement motivation" or "endurance" or "second language". You are describing a concept.
On the other hand, OPERATIONAL VARIABLES (sometimes called "operational definitions") are how you actually measure this entity, or the concrete operations, measures, or procedures that you use to measure the concept in practice. If you use a Stanford-Binet to measure intelligence or a bar code scan to assess the popularity of musical artists, those are operational variables.
Why should we care about the difference? A conceptual definition is broader. A particular concept or construct can be operationalized in several different ways. For example, disengagement among students or team members can be measured through absence records, rates of volunteerism, expressions of enthusiasm, and so on.
To complicate matters further, an operational construct may measure many things besides the original concept you are interested in. A Stanford-Binet IQ test may measure "native ability," but also disabilities, language facility, format response set, and other factors extraneous to "native ability." This makes it doubly important to carefully define your conceptual variable.
***For Assignment One (not yet available), you will address CONCEPTUAL VARIABLES, and the RELATIONSHIPS AMONG CONCEPTUAL VARIABLES.
|CONCEPTUAL VARIABLE||OPERATIONAL VARIABLE|
|Letter recognition||Scores on a particular test|
|Culture||Use of "Standard American English"|
|Collective Efficacy||Formation of online study groups|
A confounded variable is a multidimensional variable: one in which several variables are simultaneously embedded. Because this variable is multidimensional, we do not know precisely what it means or measures. This causes tremendous problems. If a confounded variable is a cause, we cannot isolate which specific component caused the phenomenon.
Whenever possible, avoid confounded variables because they muddle any kind of causal assertion.
Educational level is one of the worst confounded variables because it simultaneously taps:
Experimental treatments that either deliberately or inadvertently include too many variables in a single treatment.
To solve the confounded variable problem, you must carefully see that each operational variable measures one and only one construct. This may mean more experimental groups (at least four groups in my example above, including a control group). It may mean that you must use a variety of question formats in your "standardized test" to control for question format effects.
If one variable causes a second variable, they should correlate (have a real relationship). Causation implies correlation.
However, two variables could be associated without having a causal relationship. For example, such a spurious relationship (apparently, but not truly causal) could occur because both the supposed independent variable and the supposed dependent variable are caused by a third variable.
DID YOU KNOW? There is an apparent correlation between ice cream consumption and the number of bodily assaults.
However, this apparent correlation probably doesn't happen because some mystery ingredient in ice cream provokes violence. Rather, the correlation occurs statistically because the hot temperatures of summer cause both ice cream consumption and assaults to increase. Thus, correlation does NOT imply causation.
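The ice cream example can be simulated. The sketch below is illustrative only: the numbers are invented, and it assumes temperature is the sole common cause. It generates two outcomes that both depend on temperature but not on each other, then compares their raw correlation with the partial correlation that holds temperature constant.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys with zs held constant statistically."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


random.seed(42)  # fixed seed so the simulation is repeatable

# Invented data: temperature drives BOTH outcomes; neither outcome affects the other.
temperature = [random.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + random.gauss(0, 10) for t in temperature]
assaults = [1.0 * t + random.gauss(0, 10) for t in temperature]

raw_r = pearson(ice_cream, assaults)
controlled_r = partial_correlation(ice_cream, assaults, temperature)

print(f"ice cream vs. assaults: r = {raw_r:.2f}")
print(f"same, controlling for temperature: r = {controlled_r:.2f}")
```

In this simulation the raw correlation should be sizable, while the partial correlation should sit close to zero. That is the signature of a spurious relationship: the association disappears once the third variable is held constant.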
Recall that causes are called INDEPENDENT VARIABLES. If one variable truly causes a second, the cause is the independent variable.
Independent variables are often also called explanatory variables or predictors.
Effects are called DEPENDENT VARIABLES. We explain what has caused dependent variables.
Dependent variables are also sometimes called outcome, response or criterion variables.
Two variables may be associated but we cannot designate cause and effect. These are symmetric relationships.
In asymmetric relationships, we CAN designate cause and effect.
EXAMPLE: Married or cohabiting people average better mental health than unmarried people. However, we have evidence that marriage promotes mental health AND ALSO that mentally healthy people are more likely to marry. Thus, we can't clearly and unambiguously designate cause and effect without further information. This is a symmetric relationship.*
*Recent research indicates single people go to bars and drink more often, which may influence mental health.
EXAMPLE: Someone's gender is linked to their level of basic science knowledge. While it is possible that being male or female might lead to differential interests, hence to sex-linked science scores, it is IMPOSSIBLE (in nearly all cases) for your basic science score to make you male or female, or to change your biological sex. Because cause and effect can unambiguously be designated, this is an asymmetric relationship.
I define a mediating variable as one that links the independent and the dependent variable. Thus, a mediating variable is part of a causal chain:
INDEPENDENT VARIABLE -------> MEDIATOR VARIABLE ------> DEPENDENT VARIABLE
EXAMPLE: educational level is a cause of science attitudes because educational level influences the type of occupation someone has (mediator variable), and it is the occupational type that affects science attitudes.
Mediator variables inform us about causal sequences or chains, thus explaining the causal process of a phenomenon.
EXAMPLE: educational level -----> occupational type -----> income level
While I would love to say that employers will pay you just because you have a college degree, in fact, it is the job you obtain (often thanks to the degree) that pays the salary. The job is the mediating variable between educational level and income level.
Mediating variables certainly CAN be measured. They are critical to use in nonexperimental research designs. Often they can specify what it is about the independent variable that is important.
Hypotheses link variables, typically independent, mediating, and dependent variables, in causal assertions. A hypothesis may state whether a relationship is predicted or not, the causal direction of the relationship, the mechanics (how) of the relationship, and may even specify the form of the relationship.
Hypotheses must be falsifiable through logic or, ultimately (for operational and null hypotheses), through empirical test.
This property is absolutely critical in scientific research.
If an article you read does not address falsifiable hypotheses in some way, its assertions aren't science.
A CONCEPTUAL HYPOTHESIS links at least two conceptual variables. Typically, this is stated in some type of cause and effect manner.
|Aerobic exercise||will reduce||levels of "state anxiety."|
|Independent variable||direction of effect||Dependent variable|
|Young chronological age||will increase||ease of second language learning|
|Independent variable||direction of effect||Dependent variable|
EXAMPLE: An external threat raises team cohesiveness.
Notice that I have never stated how we will measure aerobic exercise, state anxiety, second language learning, external threat, or cohesion. At this stage I need to develop and define what these terms actually mean and how or why I expect them to be linked together.
For example, I could discuss how an external threat makes social identity salient and thus team members work together better. Or I might show how the endorphins generated through aerobic exercise allay anxiety. (In these examples, "salience of social identity" or "endorphins" are mediator variables.)
AN OPERATIONAL HYPOTHESIS links at least two operational variables. Again, some type of cause and effect is usually present in the hypothesis.
EXAMPLE: Children with an encyclopedia in their home will achieve higher scores on the Stanford-Binet Intelligence Test.
EXAMPLE: Fast walking will lower Galvanic Skin Response scores.
NULL HYPOTHESES (H0)
In classical statistical inference testing, it is mathematically easiest to disprove a null hypothesis, which is sometimes written as H0.
A null hypothesis is also precisely stated.
A null hypothesis will assert that:
Having an encyclopedia in the home has no effect on children's scores on the Stanford-Binet Intelligence Test.
Fast walking has no effect on Galvanic Skin Response scores.
There is no relationship between an external threat and team cohesiveness.
As you can see, null hypotheses are basically "directionless."
If the null hypothesis is rejected, typically an alternative hypothesis (usually styled HA:) is accepted. Usually the alternative hypothesis will assert that a relationship among two or more variables exists or that two or more subpopulations differ in some respect. A direction to the relationship (e.g., external threat raises team cohesion) may be specified. Directional alternative hypotheses are specified in advance of data collection procedures.
You may not believe your null hypothesis at the time you state it, because, in fact, you believe there is a relationship or that two groups differ. However, a null hypothesis is consistent with more tests of "statistical significance" which may make it a little easier to work with.
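To make the logic of testing a null hypothesis concrete, here is a minimal Python sketch of a permutation test. It is one of several possible significance tests, chosen here because it needs no formulas. The scores are made up for illustration, and the .05 cutoff is a common convention assumed here. If the null hypothesis is true, the group labels are interchangeable, so the test shuffles the labels many times and counts how often chance alone produces a difference at least as large as the observed one.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=10000, seed=0):
    """Two-sided permutation test of H0: group membership has no effect on scores."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Made-up test scores, for illustration only.
encyclopedia_at_home = [105, 110, 98, 112, 107, 115, 103, 109]
no_encyclopedia = [95, 101, 92, 99, 104, 97, 90, 100]

p = permutation_p_value(encyclopedia_at_home, no_encyclopedia)
print(f"p = {p:.4f}")
if p < 0.05:
    print("Reject H0 at the .05 level.")
else:
    print("Fail to reject H0 at the .05 level.")
```

Note that the test never proves the alternative hypothesis directly. It only reports how surprising the observed difference would be if the null hypothesis were true.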
It used to be that students had to assert null hypotheses in a thesis, dissertation, conference presentation or article. Now, we are more comfortable with students creating directional hypotheses. Many articles do so now.
A SHORT STATISTICAL PRESENTATION TO HELP WITH READING AND TO USE IN CRITIQUES
The types of statistics used should be consistent with the level of measurement.
A variable is a characteristic or factor that has values that vary. Thus, a variable has at least two different categories or values.
Variables consist of sets or systems of categories with several properties. Examples of category systems include:
GENDER: Categories = Male and Female
PRIMARY/SECONDARY GRADES: Categories = Kindergarten, first, second, third...and so forth to grade twelve
AGE IN YEARS: Categories = 1, 2, 3, 4, 5, and so forth up to 90 years of age--or even higher.
At a minimum, category systems should be exhaustive (cover all cases). Each case must be able to fit into a category. Sometimes that means we must construct an all-inclusive "other" category.
Categories of a variable should also be mutually exclusive (each case fits into one and only ONE category).
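Both requirements can be checked mechanically. The following Python sketch is a hypothetical helper, not part of any statistics package. It describes each category by a membership rule and reports the cases that fit no category (the system is not exhaustive) or more than one category (the system is not mutually exclusive). The age groupings are invented for the example.

```python
def check_category_system(categories, cases):
    """Return the cases that break exhaustiveness or mutual exclusivity.

    categories: dict mapping a category label to a function that returns
                True when a case belongs in that category.
    """
    uncovered = []    # cases that fit no category (not exhaustive)
    overlapping = []  # cases that fit two or more categories (not mutually exclusive)
    for case in cases:
        matches = [label for label, belongs in categories.items() if belongs(case)]
        if not matches:
            uncovered.append(case)
        elif len(matches) > 1:
            overlapping.append((case, matches))
    return uncovered, overlapping


# A flawed age grouping: age 30 fits two categories and age 70 fits none.
flawed = {
    "18-30": lambda age: 18 <= age <= 30,
    "30-64": lambda age: 30 <= age <= 64,
}

# A repaired grouping: non-overlapping bounds plus a catch-all top category
# (open-ended, so it trades some precision for exhaustiveness).
repaired = {
    "18-29": lambda age: 18 <= age <= 29,
    "30-64": lambda age: 30 <= age <= 64,
    "65 or older": lambda age: age >= 65,
}

ages = [18, 25, 30, 47, 64, 70]
print(check_category_system(flawed, ages))
print(check_category_system(repaired, ages))
```

A sound category system returns two empty lists: every case lands in exactly one category.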
Other nice category properties--WHEN IT IS POSSIBLE-- include:
a good spread of cases over categories (no category with too large or too small a percentage of cases). Possibilities IF the data allow include a normal ("bell-shaped" or Gaussian) distribution or an equiprobable distribution in which each category has the same number of cases.
a limited number of categories and
equal intervals between categories (applies only IF the category values are numeric).
TIP: Try to gather data as completely as possible (for example, get education in number of years rather than degree level) because you can collapse or move around categories later. If you really meant degree level, then ask about degree level explicitly rather than years of education or "how much" education.
Avoid "open-ended" categories that do not have fixed end points when possible (e.g., "graduate degree or more"--or "$75000 or more"). Keep in mind that it may not be possible to use a final closed category with income.
Make questions and responses explicit enough that respondents or interviewers do not need to guess about the answer. "Guessing" can quickly turn a numeric variable into a non-numeric variable.
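The tip about gathering data in as much detail as possible rests on the fact that collapsing only works in one direction. The Python sketch below illustrates this with invented cut points. They are assumptions for the example, not a standard coding scheme, and years of schooling are only a rough proxy for the degree someone actually holds.

```python
def degree_level(years_of_education):
    """Collapse years of schooling into broad categories.

    The cut points are illustrative assumptions, not a standard coding scheme.
    """
    if years_of_education < 12:
        return "less than high school"
    if years_of_education < 16:
        return "high school"
    if years_of_education < 18:
        return "bachelor's level"
    return "graduate level"


years = [10, 12, 12, 14, 16, 16, 17, 20]
levels = [degree_level(y) for y in years]
print(levels)

# Collapsing loses information: several different inputs map to one category,
# so the original years cannot be recovered from the categories alone.
print(len(set(years)), "distinct values of years ->", len(set(levels)), "categories")
```

Going from years to categories is a simple function. Going from categories back to years is impossible, which is why the more detailed measure is the safer one to collect.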
Nominal, ordinal and interval-ratio variables are different types of category systems. These form a cumulative and hierarchical set of data properties, so that nominal properties are true for ordinal and interval data. And ordinal properties are also true for interval data. The reverse does NOT hold.
With nominal variables, you can tell whether two cases or instances fall into the same category or into different categories. Thus, you can sort all cases into mutually exclusive, exhaustive categories. That's it!
Examples of nominal variables include:
Birth country and
Religious affiliation (or denomination)
Nominal variables are also sometimes called categorical variables or qualitative variables. The categories are not numbers, and they do not have any inherent order.
Try these examples:
Who is more? Koreans or Turks? More WHAT? Country of origin is NOT a number.
Who is "better"? Women or Men? Better at WHAT? If you suspect that ranking the categories (NOTE: NOT the cases within the categories) would start a war, you probably have nominal variables.
STATS & PRESENTATION ADVICE: You can only do very basic statistics or presentations with nominal data, such as: percents, ratios, rates, frequency distributions (thus charts and graphs), and modes. Of course, many nominal variables are very important, especially as explanatory variables.
With ordinal variables, the categories themselves can be rank-ordered from highest to lowest.
This means the scores must be rank-ordered from highest to lowest (or vice versa) first, before you can use any ordinal measures. Like runners in a race, we can rank scores--and the categories themselves--from first to last, most to least, or highest to lowest.
In rank-ordered cases, we can literally rank order the finishers in a race or the students by their grade point average (first in class, second in class, and so on down to last in class). Notice that the intervals between cases probably are not the same (or equal). The class valedictorian may have a straight-A or 4.0 average, the salutatorian a 3.6, the third student a 3.5, and so on. The fastest runner might run a mile in 5 minutes, the second fastest in 5 minutes 10 seconds (10 seconds slower), the third runner in 6 minutes (50 seconds slower still). So the "distances" between the SCORE TIMES (not the ranks) are unequal.
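The runner example can be worked through in a few lines of Python. This is a small sketch using the mile times given above, converted to seconds; the runner labels are placeholders.

```python
# Mile times in seconds for the three runners described above.
times = {"first runner": 300, "second runner": 310, "third runner": 360}

# Rank order: the fastest time gets rank 1.
ordered = sorted(times, key=times.get)
ranks = {runner: position for position, runner in enumerate(ordered, start=1)}

# Gaps between adjacent finishers, in ranks and in seconds.
rank_gaps = [ranks[b] - ranks[a] for a, b in zip(ordered, ordered[1:])]
time_gaps = [times[b] - times[a] for a, b in zip(ordered, ordered[1:])]

print("rank gaps:", rank_gaps)  # every adjacent pair is exactly one rank apart
print("time gaps:", time_gaps)  # the underlying distances are unequal
```

The ranks look evenly spaced only because ranking discards the size of the gaps. That is why arithmetic on ranks can mislead.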
We can also rank-order the categories of a variable in ordinal data. One example is a Likert, or rank-ordered scale. Respondents are given a statement, such as "I like President Obama" then asked if they:
Strongly Agree, Agree, Disagree, or Strongly Disagree with that statement.
We can surmise that someone who "strongly agrees" supports that statement more intensely than someone who "agrees"--but we don't know how much more intensely.
Most Agree-Disagree (Likert) attitude scales are ordinal data.
This is fairly obvious when there are 5-7 categories but it is also true when there are only two categories: someone who favors raising teacher salaries obviously is more in favor than someone who opposes the raise.
Someone who smokes cigarettes "at all" and answers "yes" smokes more than someone who smokes zero cigarettes (and answers "no").
Other types of ordinal data include:
the order of finish (e.g., class rank or a horse race)
"yes-no" experiences (someone who answers "yes" to "Do you play the lottery?" clearly plays more than someone who answers "no"), or
collapses of numeric data into categories with unequal widths or intervals (e.g., collapsing years of education into degree level).
STATS & PRESENTATION ADVICE: Everything that you can do with nominal data (graphs, modes, etc.) you can do with ordinal data too. In addition, with ordinal data, you can do percentiles, quartiles, and medians (the category that includes the 50th percentile).
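As an illustration of the median for ordinal data, the Python sketch below finds the category that contains the 50th percentile of a set of agree-disagree responses, along with the mode. The responses are made up, and the function name is hypothetical.

```python
from collections import Counter

# Response categories listed in their rank order, lowest to highest.
SCALE = ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"]


def ordinal_summary(responses, scale=SCALE):
    """Return (mode, median category) for a list of ordinal responses."""
    if not responses:
        raise ValueError("no responses given")
    counts = Counter(responses)
    # Mode: the most frequent category (the first one listed wins a tie).
    mode = max(scale, key=lambda category: counts[category])
    # Median: the first category whose cumulative count reaches half the cases.
    cumulative = 0
    for category in scale:
        cumulative += counts[category]
        if cumulative * 2 >= len(responses):
            return mode, category
    raise ValueError("responses do not match the scale")


# Made-up responses, for illustration only.
responses = (
    ["Strongly Disagree"] * 1
    + ["Disagree"] * 3
    + ["Agree"] * 2
    + ["Strongly Agree"] * 5
)

mode, median = ordinal_summary(responses)
print("mode:", mode)
print("median category:", median)
```

No category is ever added to or averaged with another. The median depends only on the order of the categories, which is all that ordinal data guarantee.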
Most statistical processing computer programs, such as SPSS, assign numbers to all categories as a default, even to non-numeric nominal and ordinal variables. This is for data processing ease and does not give you any clues as to the type of data you have. THE DATA ANALYST MUST MAKE THAT DECISION!
You can count the number of books and you can't have less than zero.
In addition to the properties of nominal and ordinal category systems, interval or ratio variables possess a common and equal unit that separates adjacent or adjoining categories.
EXAMPLES: one year of age or one year of education or one dollar of income. Each of these examples is one equal unit.
These intervals are equal no matter how high up the scale you go.
Most "count variables" (years of age or years of formal education, children, dollars) are ratio variables.
STATS & PRESENTATION ADVICE: With numeric data (interval or ratio variables), in addition to all the options that you have with nominal and ordinal variables, the analyst can perform arithmetic operations on the scores: add, subtract, divide and multiply them. Thus you can calculate arithmetic means on numeric data.
It is nonsense to perform arithmetic operations on clearly nominal data.
For example, suppose you have a group of three men and three women. Can you calculate a mean or arithmetic average score? What could it possibly be? It can't be a number because the gender category value is a name or tag ("male" or "female") that cannot be added or multiplied.
|TYPE OF VARIABLE||CATEGORIES EXHAUSTIVE||CATEGORIES MUTUALLY EXCLUSIVE||CASES CAN BE SEPARATED BY CATEGORY||CATEGORIES CAN BE RANK-ORDERED||CATEGORIES SEPARATED BY EQUAL INTERVAL||FIXED OR NON ARBITRARY ZERO|
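Because the levels of measurement are cumulative, the rows for these column headings can be rebuilt from the text of this guide. The Python sketch below is a reconstruction under that assumption, not a copy of an original table. It also assumes the standard convention that only ratio variables have a fixed, non-arbitrary zero, and it pairs each level with the statistics named in the advice sections above.

```python
# Column properties, ordered from most basic to most demanding.
PROPERTIES = [
    "categories exhaustive",
    "categories mutually exclusive",
    "cases can be separated by category",
    "categories can be rank-ordered",
    "categories separated by equal interval",
    "fixed or non-arbitrary zero",
]

# How many of the properties above each level has (counted from the top).
# The levels are cumulative, so one number per level is enough.
LEVELS = {"nominal": 3, "ordinal": 4, "interval": 5, "ratio": 6}

# Statistics that FIRST become available at each level.
NEW_STATISTICS = {
    "nominal": ["percents", "ratios", "rates", "frequency distributions", "mode"],
    "ordinal": ["percentiles", "quartiles", "median"],
    "interval": ["arithmetic mean"],
    "ratio": [],
}


def has_property(level, prop):
    """True if variables at this level of measurement have the property."""
    return PROPERTIES.index(prop) < LEVELS[level]


def allowed_statistics(level):
    """All statistics usable at this level, including those inherited from lower levels."""
    allowed = []
    for name in LEVELS:  # dicts keep insertion order: nominal -> ratio
        allowed.extend(NEW_STATISTICS[name])
        if name == level:
            return allowed
    raise ValueError(f"unknown level: {level}")


# Print one row per type of variable, one yes/no per property.
for level in LEVELS:
    row = ["yes" if has_property(level, p) else "no" for p in PROPERTIES]
    print(f"{level:<9}", " ".join(f"{cell:<4}" for cell in row))

print(allowed_statistics("ordinal"))
```

The cumulative structure means a statistic that is legitimate at one level stays legitimate at every higher level, but never the reverse.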
Susan Carol Losh
Revised September 1 2013