Please be patient with non-working links as these websites are gradually
restored and uploaded to the FSU mailer server.
NOW POSTED: THE GUIDE TO EXAM 3:
EDF 5481 METHODS OF EDUCATIONAL RESEARCH
INSTRUCTOR: DR. SUSAN CAROL LOSH
FALL 2002
|
WHY EXAMINE WEB-BASED DATABASES?
|
As you have already learned, it is expensive
and time-consuming to collect data, especially datasets that are sizable
or comprehensive. In the early 1970s, the United States Federal government
initiated a series of what have come to be called "Social Indicators."
The idea was to collect data from different domains (education, health,
the status of women and ethnic minorities, public opinion, etc.) and to
continue these series over time, thereby tracking change and continuity
among Americans. At the same time, other countries, particularly Canada,
Western Europe, and Japan, also began indicator series, thus making possible
international comparisons. One example is the Third International Mathematics
and Science Study (TIMSS). Data were collected in 42 countries in 1995
and in 38 countries in 1999. More recent additions address experience with
computers and the World Wide Web.
Considerable effort has been devoted
to making many of these indicator series compatible over time:
-
Questions are asked in the same way
-
Changes to questions are established via "split-ballot"
testing, i.e., experiments to see whether the revised questions work the
same way as the original questions. A good indicator series NEVER
shifts question format (or open question codes) arbitrarily.
-
Variables are defined in the same way
-
Coding categories remain constant
-
If coding changes are made, care is taken
to make new coding systems compatible with the old, as with the detailed
United States Census three-digit occupational codes.
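To make the idea of compatible coding systems concrete, here is a minimal sketch of a "crosswalk" that translates codes from an older scheme into a newer one. The three-digit codes below are invented for illustration; they are NOT actual Census occupational codes, and the function name is hypothetical. A real crosswalk would come from the documentation supplied with the indicator series.

```python
# Hypothetical crosswalk from an older coding scheme to a newer one.
# These codes are made up for illustration; they are not real Census codes.
OLD_TO_NEW = {"142": "230", "143": "230", "144": "231"}


def harmonize(old_codes, crosswalk):
    """Translate old codes to new ones.

    Returns the converted list (None where no mapping exists) and a
    list of the unmapped codes so they can be reviewed by hand.
    """
    converted, unmapped = [], []
    for code in old_codes:
        if code in crosswalk:
            converted.append(crosswalk[code])
        else:
            converted.append(None)
            unmapped.append(code)
    return converted, unmapped


converted, unmapped = harmonize(["142", "144", "999"], OLD_TO_NEW)
```

Collecting the unmapped codes, rather than silently dropping them, makes it easier to see where the old and new systems do not line up.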
A series may have an "oversight board."
These boards monitor the content and form of the indicator series. Thus,
principal investigators cannot capriciously change either content or form
without input from a panel of expert professionals.
The number of data archives is already
HUGE and is growing by the minute. Some of the large archives, such as
ICPSR, The Roper Center or the Howard W. Odum Institute for Research in
Social Science at the University of North Carolina, are simply staggering
in the amount of data that they hold.
As you look through some of the pages,
you will see that several times I have given the warning: "set aside a
day to explore this archive." Do take this warning seriously!
One of these archives may hold the answer to your proposed dissertation
or provide the basis for a nice conference paper or article. They are definitely
worth exploring.
With resources such as these, the novice--and
even the experienced--researcher should seriously reconsider whether they
really want to gather all of their own data from scratch.
|
WHY THESE ARCHIVES ARE IMPORTANT TO YOU
|
-
There is no point in "reinventing the wheel."
Why do a small local study when data already exist on regional, national or
even international levels? An example is using the "CIRP" to look at college
student beliefs, attitudes, and accomplishments instead of convenience
samples of your buddy's classes.
-
"There is plenty of gold in them thar hills."
Most of these databases are so huge that no one investigator could ever
analyze everything in them. With each successive year, the possibilities
for analysis grow. Further, other researchers may have ideas for analysis
that did not occur to the original Principal Investigator. In other words,
there is plenty of data for you to do an original analysis--without all
the backbreaking work of collecting the data too.
I practice what I preach! Since 2001
I have worked with the National Science Foundation Surveys of Public Understanding
of Science and Technology. These surveys now span 1979 to 2008, an unprecedented
look at public knowledge, reasoning and attitudes about science and technology.
I have built longitudinal files from these data now available at ICPSR
and The Roper Center.
For two examples
of my examination of generational versus aging effects, one on science beliefs
and attitudes and one on information technology, see the Internet links.
I have also extensively examined change
over time.
|
-
Many of these archives offer an unprecedented
opportunity to track trends over time. How did computer use change
from the early 1980s to the early 2000s? What kind of educational preparation
do students who later rise to eminence receive? What are the average
student characteristics in research universities as opposed to liberal
arts colleges, and how did these characteristics change over time? What
are gender differences in Internet use over time?
-
YOUR time, resources, and energy. Many
researchers, especially junior faculty, have limited resources. With one
eye on the tenure clock, junior faculty have limited time too. It takes
time, often A LOT of time, to gather your own data. If existing archives
have variables that are directly pertinent to your research interests,
it is often in your best professional interests to use these archives.
Obviously, using pre-existing archives
is not for everyone. Many students in disciplines that lend themselves
to "quick and dirty" experiments can quickly collect data with relatively
little financial investment. However, even these researchers may be interested
in "triangulation" with survey data or historical records.
|
CLICK HERE
TO ENTER THE ONLINE DATABASE MENU
|
|
QUESTIONS YOU SHOULD CONSIDER ABOUT ONLINE DATABASES
|
-
What is the unit of analysis? Is it
an individual? An organization, such as a college or university? A time
point for a country or state series? Archives vary and the unit is not
always an individual.
-
What kinds of variables does the archive
cover? Degree attainment? Health practices? Drug or alcohol usage?
Attitudes?
-
What is the time frame covered by the archive?
Examples: the average school FCAT scores for 1998-2001 or the General Social
Survey from 1972-2000.
-
What is the geographic frame covered by
the archive (state? local? United States? international?)
-
Who were the sponsor(s) of the archive
(e.g., NSF? NCES? United Faculty of Florida?)
-
How did the archive come to be?
-
Were the data collected especially for
the archive (such as IPEDS or TIMSS)? Or were the data compiled from other
sources (such as Web CASPAR)?
-
Does the archive contain any tutorials
that instruct how to use it (online or otherwise)?
-
Are there codebooks that describe the data,
the variables and the file structure?
-
How are the data available? Are they
ready for online analysis? Are the data available to download into your
computer? Are the data contained in .pdf format tables? Are there
alternative ways to obtain the data (such as CD-ROM)? If so, how can the
data be obtained?
-
Can you simply download the data or must
you obtain a CD-Rom or other device from the archive agency?
-
How "clean" are the data? One good
example is the U.S. government's famous "Falling Through the Net" data
about the "Digital Divide" in Computer and Internet Usage. This is one
of the most cited datasets about the Digital Divide and it is appallingly
"dirty." Any household resident 14 years of age or older was asked
to provide information about all other residents in the household. Considerable
data are missing on racial identification. The information I could locate
did not say how the data were gathered (in-person? Random Digit Dial of
landlines?). Apparently, the government was in such a rush to put up the
dataset that the data contain a LOT of careless errors.
As a result, I consider estimates from the early years of these data to
be unreliable despite a usually trustworthy source.
-
Is there a charge for the data? If
so, what is the cost? Most archival costs are surprisingly reasonable,
when you consider the effort involved in the first place. For example,
the cost of the ENTIRE General Social Survey archive, from 1972 to 2006,
in SPSS ready format is less than $500. Compare this with the millions
of dollars it cost to gather the data. Don't forget: you will incur time
and financial costs to gather and process your own data. It may, indeed,
turn out to be cheaper to use the archive. And University dissertation
grants may even cover the acquisition cost.
-
What kinds of analyses can be done online?
Frequency distributions? Cross-tabulations? Multiple regression or other
multivariate analyses? See if the archive uses the California-Berkeley Survey
Documentation and Analysis (SDA/DAS) program, which is simple to use, covers
most basic statistics, and is unbelievably fast (including on a dial-up system).
Many online datasets are now directly linked to the SDA/DAS system.
-
Is a questionnaire available or some other
original document describing each variable in detail? Maybe it is available
as a separate link or as a .pdf document (did you remember to download
the Adobe Acrobat Reader?)
-
What is mentioned about coverage or response
rate? For example, data are missing from several states in early data
series about abortion. Some surveys have completed interviews with fewer
than half of the originally contacted respondents. In other cases, such
as the CIRP, response rates can vary considerably from college to college.
-
Do you need any kind of license from the
data agency? Many data sets at
the National Science Foundation, the National Center for Education Statistics,
and other agencies require you to have a license if you work with what
are called the "unit record" data. Unit record data are the "raw data" where
each record is an individual or an institution. This means the person or
institution could conceivably be identified (although this is unlikely).
Obtaining a license is typically not a problem for legitimate researchers
but it does necessitate some paperwork so be prepared to check about this
and budget some time accordingly.
-
What was the mode of data collection? In-person
surveys may give different results than telephone surveys. The top administrator
of a university may access different data than a rank-and-file faculty
member.
-
How recently has the database been monitored
or updated? See if you can find a date on the page, typically at the
very top or the very bottom of the page. "Old pages" may have missing links,
unfixed errors, omit the most recent updates to files, or simply may not
work.
-
Were the data gathered over time by different
agencies or different principal investigators? If so, changes in variables,
definitions, or coding may have occurred. You
may find differences attributable to these changes, rather than to changes
in the concepts you are studying--a threat to internal validity.
-
How far back does the data series extend?
The longer the series, the more likely you are to encounter strange alphabetic
and non-alphanumeric codes, or inconsistencies in definitions or measures.
And the more likely the original data are to be flat out MISSING.
-
Were data compiled from different agencies
into a single archive? Again, check for consistency in definitions
(even of the same variable!) across agencies.
-
See if the description of the archive notes
any problems or missing information.
-
What are your computer skills? Some
databases are in ASCII format which you can probably download into a spreadsheet
such as Quattro Pro or Excel. But the field delimiters vary widely: some
use spaces, others use commas, still others rely on a format statement
so that the data can be read. Do you know how to analyze data using a spreadsheet
program? If not, do you know how to transfer spreadsheet data into a statistical
program such as SPSS or SAS? Do you have file management skills so that
you can insert value labels, variable labels and missing data codes? In
other cases, you may have to save or print tabular displays and hand enter
the data into a spreadsheet (very carefully). As you can see, it is VERY
helpful to have good computer skills--or to have some good friends who
do!
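As a rough illustration of the delimiter and missing-data issues just described, here is a minimal Python sketch that reads the same tiny dataset in three layouts: comma-delimited, space-delimited, and fixed-width (where a codebook's format statement gives the column positions). Everything in it is an assumption for illustration: the variable names, the column positions, and the use of -9 as a missing-data code are invented, and a real archive's codebook would specify its own.

```python
import csv
import io

# Hypothetical missing-data codes; a real codebook defines its own.
MISSING_CODES = {"-9", ""}


def read_delimited(text, delimiter):
    """Parse delimited text whose first row holds the variable names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def read_fixed_width(text, layout):
    """Parse fixed-width text.

    `layout` maps each variable name to (start, end) column positions,
    playing the role of the codebook's format statement.
    """
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append({name: line[start:end].strip()
                         for name, (start, end) in layout.items()})
    return rows


def recode_missing(rows, missing_codes=MISSING_CODES):
    """Replace missing-data codes with None so they are not analyzed as values."""
    return [{name: (None if value in missing_codes else value)
             for name, value in row.items()}
            for row in rows]


# Invented sample data: respondent 2 has a missing age, coded -9.
comma_text = "id,age,educ\n1,34,16\n2,-9,12\n"
space_text = "id age educ\n1 34 16\n2 -9 12\n"
fixed_text = "0013416\n002-912\n"
layout = {"id": (0, 3), "age": (3, 5), "educ": (5, 7)}  # hypothetical columns

from_commas = recode_missing(read_delimited(comma_text, ","))
from_spaces = recode_missing(read_delimited(space_text, " "))
from_fixed = recode_missing(read_fixed_width(fixed_text, layout))
```

Recoding the missing-data codes before any analysis matters: if -9 were left in place, it would be averaged in as though someone were minus nine years old.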
Any original problems when the data
were first gathered will STILL be there when the data are archived. See
what you can find out about issues with question format, sampling, coding
categories, and other sources of bias and random error. Sometimes (for
example, the General Social Survey) there will be considerable information
about matters such as response rate; sometimes there will not.
Always remember this classic cliché:
do the best you can with what you got. Despite any problems, online
databases and archives are a terrific resource for us all.
|
|
WHERE TO START HUNTING FOR ONLINE ARCHIVES
|
-
Professional associations in your field
(check out those resources and links to professional sites in Blackboard)
-
The FSU on-line library system
-
Search engines using your topic of interest
-
Major US government or state Web sites
(if you are an International Student, check out sites from your home country).
The National Center for Education Statistics, the National Science Foundation,
the Centers for Disease Control, and even the State of Florida website
all contain links to many, many databases. You will find several of them
in our course database menu.
-
Major archives such as the Inter-university
Consortium for Political and Social Research (ICPSR) at the University of
Michigan, the Pew Research Center for the People and the Press, or the Roper
Center in Connecticut.
-
One link leads to another. I found
the International Social Survey Program link from the General Social Survey
Web site.
-
Check with faculty and graduate students
in The College of Information
-
Many recent textbooks have online supplements
or Web sites that list archives
-
Check McMillan, chapters 3 and 4 for information
on Subject Directories and Search Engines (pp. 86-87; 90; 93; 96-97).
|
|
|
November 19 2002
Revised January 10 2009
Susan Carol Losh
Always
under construction as new databases are entered.