Example Data Sets for Data Analysis Courses
Right click and “save target as” or “save file as” to download these files.
Data Analysis II (Correlation, Regression, & Logistic Regression)
HW 1
Urban Mobility Report 2009 data on traffic delays and population for 90 urban areas.
HW 2
Urban Mobility Report 2009 (new version). Population and traffic delay data with two regional indicators: EASTWEST (0=East, 1=West) and REGION (1=Northeast, 2=South, 3=Midwest/West, 4=Southwest).
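EASTWEST is already a 0/1 indicator, but REGION is a nominal code, so a regression would usually enter it as a set of indicator (dummy) variables rather than as a single number. The sketch below assumes Northeast (1) as the reference category; the helper `region_dummies` and the dummy names such as `REG_South` are hypothetical, not variables in the data file.

```python
# REGION codes as documented for the Urban Mobility Report 2009 file.
REGION_LABELS = {1: "Northeast", 2: "South", 3: "Midwest/West", 4: "Southwest"}


def region_dummies(region: int, reference: int = 1) -> dict[str, int]:
    """Return 0/1 indicators for every REGION code except the reference.

    Hypothetical helper: the "REG_..." names are illustrative only.
    """
    if region not in REGION_LABELS:
        raise ValueError(f"unknown REGION code: {region}")
    return {
        f"REG_{label}": int(region == code)
        for code, label in REGION_LABELS.items()
        if code != reference
    }


# A city coded REGION=2 (South): only the South indicator is 1.
print(region_dummies(2))
```

With four categories this yields three indicators; a city in the reference region (Northeast) has all three set to 0.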
HW 3
Urban Mobility Report 2009 (same as HW 2). Population, density, and traffic delay data with two regional indicators: EASTWEST (0=East, 1=West) and REGION (1=Northeast, 2=South, 3=Midwest/West, 4=Southwest).
Chilean plebiscite data. VOTE is the survey respondent’s preference six months before the election (0=Pinochet, 1=new government). SEX is the sex of the respondent (0=female, 1=male), AGE is the age of the respondent, EDUC is a variable for three levels of education (primary, secondary, post-secondary), INCOME is income in Chilean pesos, and STATQUO is a score on a measure of political support for the status quo (standardized scores).
Data Analysis I (Significance Testing, t-tests, Chi-square, Correlation, Reliability, ANOVA)
HW 1
No downloads.
HW 2
Portland police racial profiling data for 2004. Contains data on the minority status of each driver stopped and whether the driver was arrested.
NPC School Survey Data for Reliability Analysis. (Right click and “save target as” or “save file as” to download this file.)
Responses to the following items were on a 4-point scale (0=NO!, 1=no, 2=yes, 4=YES!).
CRIME: How much does "crime" and/or "drug selling" describe your neighborhood?
FIGHTS: How much does "fights" describe your neighborhood?
BLDINGS: How much does "lots of empty or abandoned buildings" describe your neighborhood?
GRAFFITI: How much does "lots of graffiti" describe your neighborhood?
GETOUT: I'd like to get out of my neighborhood.
MOVE: People move in and out of my neighborhood.
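Internal-consistency reliability for a multi-item scale such as this is commonly summarized with Cronbach's alpha, alpha = k/(k-1) × (1 - sum of item variances / variance of the total score). The sketch below assumes complete data (no missing responses) and uses a small set of made-up responses for CRIME, FIGHTS, BLDINGS, and GRAFFITI; these toy values are not from the NPC file, and which items belong in the scale is left to the assignment.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix with no missing values."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*rows))                      # one column of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of each respondent's total
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses (NOT from the NPC file); columns = CRIME, FIGHTS, BLDINGS, GRAFFITI.
toy = [
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [4, 2, 2, 4],
    [1, 0, 0, 1],
]
print(round(cronbach_alpha(toy), 3))
```

Sample variances are used throughout; because the same divisor appears in the numerator and denominator, using population variances would give the same alpha.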
HW 3
Urban Socioeconomic Income and Growth data for the 50 largest cities, with 4 regions (Problems 1-2)
U.S. and Canada artificial car and truck mpg data (Problems 4-5)
Urban Socioeconomic Income and Growth data for the 50 largest cities, 1960-1970 (Problem 6)
Multilevel Regression
HW 1
School alcohol and drug survey (SPSS). (Right click and “save target as” or “save file as” to download these files.) The data set used in this homework is from a real survey about drug use and violence collected from 11th-grade students in
Alcohol use over the last year:
0 “none”
1 “1-2 times”
2 “3-5 times”
3 “6-9 times”
4 “10-19 times”
5 “20-39 times”
6 “40+ times”
Neighborhood support:
QC7 My neighbors notice when I am doing a good job and let me know.
QC9 There are people in my neighborhood who encourage me to do my best.
QC10 There are people in my neighborhood who are proud of me when I do something well.
QC12 There are lots of adults in my neighborhood I could talk to about something important.
Neighborhood erosion:
QC1A How much does "crime and/or drug selling" describe your neighborhood?
QC1B How much does "fights" describe your neighborhood?
QC1C How much does "lots of empty or abandoned buildings" describe your neighborhood?
QC1D How much does "lots of graffiti" describe your neighborhood?
HW 2
(Right click and “save target as” or “save file as” to download these files.)
Descriptive statistics for the HLM data file:
LEVEL-1 DESCRIPTIVE STATISTICS
VARIABLE NAME     N    MEAN     SD  MINIMUM  MAXIMUM
NHRATING        396    3.46   1.05     1.00     5.00
PROBLEMS        396    2.16   0.77     1.00     4.00
SERVICES        396    3.44   0.86     1.00     5.00
LEVEL-2 DESCRIPTIVE STATISTICS
VARIABLE NAME     N    MEAN     SD  MINIMUM  MAXIMUM
MEANSERV         42    3.45   0.43     2.55     4.38
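MEANSERV is on the same scale as SERVICES and has nearly the same mean, which suggests (an assumption; the page does not say so) that it is the level-2 average of SERVICES within each of the 42 units. A minimal sketch of building such a level-2 aggregate from level-1 records; the helper `group_means`, the unit IDs, and the ratings are hypothetical toy values.

```python
from collections import defaultdict
from statistics import mean


def group_means(records: list[tuple[str, float]]) -> dict[str, float]:
    """Average a level-1 variable within each level-2 unit (hypothetical helper)."""
    by_group: dict[str, list[float]] = defaultdict(list)
    for group_id, value in records:
        by_group[group_id].append(value)
    return {g: mean(vals) for g, vals in by_group.items()}


# Toy (unit ID, SERVICES) pairs -- NOT from the HLM file.
toy = [("A", 3.0), ("A", 4.0), ("B", 2.0), ("B", 3.0), ("B", 4.0)]
print(group_means(toy))
```

Each unit's mean would then be stored once per unit in the level-2 file, which is how a variable like MEANSERV can have N = 42 while SERVICES has N = 396.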
HW 3
(Right click and “save target as” or “save file as” to download these files.)
Problems 1-4: Social Relationships data files: SPSS, HLM
5-wave longitudinal data collected at 6-month intervals. The following are descriptive statistics for the HLM data file:
LEVEL-1 DESCRIPTIVE STATISTICS
VARIABLE NAME     N    MEAN     SD  MINIMUM  MAXIMUM
TIME           1500    2.00   1.41     0.00     4.00
TIMESQ         1500    6.00   5.90     0.00    16.00
SUPPORT        1500    2.39   0.78     0.01     4.00
HEALTH         1500    2.14   1.06     0.00     4.00
LEVEL-2 DESCRIPTIVE STATISTICS
VARIABLE NAME     N    MEAN     SD  MINIMUM  MAXIMUM
EDUC            300    4.74   1.95     1.00     9.00
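The TIME and TIMESQ rows are consistent with a simple coding that the page does not state explicitly: TIME runs 0 through 4 for the five waves, TIMESQ is TIME squared, and each of the 300 level-2 units is observed at every wave (300 × 5 = 1500 rows). A short check under those assumptions:

```python
from statistics import mean, pstdev

WAVES = 5       # five waves, 6 months apart
PERSONS = 300   # level-2 N from the table

# Assumed coding: TIME = 0..4 for waves 1..5, TIMESQ = TIME squared,
# and a balanced design (every person observed at every wave).
time = [t for _ in range(PERSONS) for t in range(WAVES)]
timesq = [t * t for t in time]

for name, values in (("TIME", time), ("TIMESQ", timesq)):
    print(f"{name:7s} N={len(values)} mean={mean(values):.2f} "
          f"SD={pstdev(values):.2f} min={min(values)} max={max(values)}")
```

Under this coding the computed N, mean, SD, minimum, and maximum match the TIME and TIMESQ rows; using the sample SD (`statistics.stdev`) instead of `pstdev` rounds to the same two decimals.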
Problems 5-6: British Election Study 2005 data files: SPSS, HLM
Study of election attitudes in England, Scotland, and Wales. The following are descriptive statistics for the HLM data file:
LEVEL-1 DESCRIPTIVE STATISTICS
VARIABLE NAME     N    MEAN     SD  MINIMUM  MAXIMUM
VOTED          2278    0.76   0.43     0.00     1.00
AGE            2278   46.75  16.01    18.00    99.00
COLLEGE        2278    0.23   0.42     0.00     1.00
TRUSTPOL       2278    3.88   2.17     0.00    10.00
MEANCON        2278    0.12   0.11     0.00     0.57
LEVEL-2 DESCRIPTIVE STATISTICS
VARIABLE NAME     N    MEAN     SD  MINIMUM  MAXIMUM
MEANCON         255    0.11   0.12     0.00     0.57
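MEANCON appears in both tables with the same range but a slightly different mean and SD. One plausible reading (an assumption, not stated on the page) is that it is a level-2 value copied onto every respondent's record, so the level-1 statistics weight each level-2 unit by its number of respondents while the level-2 statistics count each unit once. A toy illustration; the values and group sizes are hypothetical.

```python
from statistics import mean

# Hypothetical level-2 units: (MEANCON value, number of respondents).
units = [(0.05, 2), (0.10, 5), (0.30, 12)]

# Level-2 file: one value per unit.
level2 = [value for value, _ in units]
# Level-1 file: the unit's value repeated on each respondent's record.
level1 = [value for value, n in units for _ in range(n)]

print(f"level-2 mean = {mean(level2):.3f}  (each unit counted once)")
print(f"level-1 mean = {mean(level1):.3f}  (each unit weighted by its respondents)")
```

Larger units pull the level-1 mean toward their values, so the two summaries of the same variable need not agree.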
Structural Equation Modeling
HW 1
Census undercount data, from E.P. Ericksen, J.B. Kadane, & J.W. Tukey (1989), "Adjusting the 1980 Census of Population and Housing," Journal of the American Statistical Association, 84:927-944. Variables are the area (AREA), the estimated undercount in the area (UNDER), the poverty rate (POVERTY), and crimes in the area per 1000 (CRIME).
HW 2
School survey data from NPC Research, Inc. on neighborhood quality. ASCII data file and Mplus input statements.
HW 3
School survey data with gender. SPSS file, ASCII data file, and Mplus input statements for reading the data: Problem 1b, Problem 1c, Problems 2a & 2b.
Social relationships data. ASCII data file and Mplus input statements for reading the data: Problem 3a, Problem 3b.