Example Data Sets for Newsom's Data Analysis Courses
Right click and "save target as" or "save file as"
to download these files.
Structural Equation Modeling
HW1 Census undercount data (ASCII data file). Mplus input file. (Store the data and input file in the same folder.) From E.P. Ericksen, J.B. Kadane, & J.W. Tukey (1989), "Adjusting the 1980 Census of Population and Housing," Journal of the American Statistical Association, 84:927-944. Variables are area (AREA), the estimated undercount in the area (UNDER), poverty rate (POVERTY), and crimes in the area per 1000 (CRIME).
School Survey data from NPC Research, Inc. on neighborhood quality. ASCII data file and Mplus input statements.
Data Analysis II (Correlation, Regression, & Logistic Regression)
HW 1 From the Socio-Economic Indicators for Functional Urban Regions in the United States, 1820-1970 study (available through the Inter-University Consortium for Political and Social Research; ICPSR). I have selected a few variables for the 100 largest cities for use in these problems: per capita income in 1970 (INCOME); population growth between 1960 and 1970 (given in thousands, so that -81 indicates a decline of 81,000 and 501 indicates an increase of 501,000); and the percentage of the population who were adults in 1970 (ADULTS).
HW 2 New version of the Socio-Economic Indicators for Functional Urban Regions in the United States, 1820-1970 data set with two region variables: a two-category variable, in which cities in the eastern U.S. are coded 0 and cities in the western U.S. are coded 1, and a four-category region variable (REGION), with 1 = Northeast, 2 = Southeast, 3 = Southwest, and 4 = Midwest and West.
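A four-category variable such as REGION is usually entered into a regression as a set of 0/1 indicator (dummy) variables rather than as a single numeric predictor. The sketch below shows one way to do that recoding. The indicator names (REGION_2 and so on) and the choice of Northeast as the reference category are assumptions made for illustration; they are not part of the data set.

```python
# Region codes as documented for the REGION variable.
REGION_LABELS = {1: "Northeast", 2: "Southeast", 3: "Southwest", 4: "Midwest and West"}


def dummy_code(region: int, reference: int = 1) -> dict:
    """Return 0/1 indicators for every region except the reference category.

    The reference category (default 1 = Northeast) is an illustrative choice.
    """
    if region not in REGION_LABELS:
        raise ValueError(f"unknown REGION code: {region}")
    return {
        f"REGION_{code}": int(region == code)  # hypothetical indicator names
        for code in REGION_LABELS
        if code != reference
    }


# A city coded 3 (Southwest) gets a 1 only on the Southwest indicator.
print(dummy_code(3))  # {'REGION_2': 0, 'REGION_3': 1, 'REGION_4': 0}
```

A city in the reference category has zeros on all three indicators, so each regression coefficient is read as a difference from that category.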
HW 3 Socio-Economic
Indicators for Functional Urban Regions in the United States, 1820-1970 (same
version as HW 2).
Chilean plebiscite data. VOTE is the survey respondent's preference six months before the election (0 = Pinochet, 1 = new government). SEX is the sex of the respondent (0 = female, 1 = male), AGE is the age of the respondent, EDUC is a variable for three levels of education (primary, secondary, post-secondary), INCOME is income in Chilean pesos, and STATQUO is a score on a measure of political support for the status quo (standardized scores).
Data Analysis I (Significance Testing, t-tests, Chi-square, Correlation, Reliability, ANOVA)
HW1 No downloads.
HW2 Portland police racial profiling data. Contains 2004 data on the minority status of the driver stopped and whether the driver was arrested.
HW3 NPC School Survey Data for Reliability Analysis. Responses to the following items were on a 4-point scale (0=NO! 1=no 2=yes 4=YES!).
CRIME, How much does "crime" and/or "drug selling" describe your neighborhood?
FIGHTS, How much does "fights" describe your neighborhood?
BLDINGS, How much does "lots of empty or abandoned buildings" describe your neighborhood?
GRAFFITI, How much does "lots of graffiti" describe your neighborhood?
GETOUT, I'd like to get out of my neighborhood.
MOVE, People move in and out of my neighborhood.
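For a reliability analysis of items like these, a common summary is Cronbach's alpha, computed as k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. The sketch below is a minimal illustration using only the Python standard library. The responses are made up for the example and are not taken from the NPC data file; the function name is also hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for the items and the total.
    """
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses in the item order CRIME, FIGHTS, BLDINGS,
# GRAFFITI, GETOUT, MOVE (made-up values, not the actual data).
responses = [
    [0, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 2, 1],
    [2, 2, 1, 1, 2, 2],
    [0, 0, 0, 0, 0, 1],
    [2, 1, 2, 1, 1, 2],
]

print(round(cronbach_alpha(responses), 3))
```

Statistical packages report the same quantity in their reliability procedures, so a hand computation like this is mainly useful for checking one's understanding of the formula.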
Socioeconomic data set with 4 regions. From the Socio-Economic Indicators for Functional Urban Regions in the United States, 1820-1970 study (available from ICPSR).
Artificial data set on gas mileage in US and Canada.
Socioeconomic data with 1960 and 1970 income and positive/negative growth indicator.
Multilevel Regression
HW 1 ABA neighborhood data: A community survey of 378 respondents from 42 neighborhoods conducted by the American Bar Association (ABA). SPSS Data File, HLM Data File
Descriptive Statistics

LEVEL-1
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
NHRATING | 378 | 3.45 | 1.05 | 1.00 | 5.00
DRUGS | 378 | 2.04 | 0.74 | 1.00 | 3.00
FEAR | 378 | 2.28 | 1.12 | 1.00 | 4.00

LEVEL-2
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
MEANPROB | 42 | 2.14 | 0.55 | 1.16 | 3.31

HW 2 Head Start mental health consultation survey. 559 teachers in 66 programs. SPSS Data File, HLM Data File

LEVEL-1
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
HELPED | 559 | 2.96 | 0.71 | 1.00 | 4.00
SHARE | 559 | 3.54 | 0.66 | 1.00 | 4.00
CULTCOMP | 559 | 3.34 | 0.57 | 1.00 | 4.00

LEVEL-2
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
MONEY | 66 | 26125.98 | 42147.09 | 0.00 | 235000.00

HW 3 Social relationships study. SPSS Data File, HLM Data File
Descriptive Statistics

LEVEL-1
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
TIME | 1500 | 2.00 | 1.41 | 0.00 | 4.00
TIMESQ | 1500 | 6.00 | 5.90 | 0.00 | 16.00
SUPPORT | 1500 | 2.39 | 0.78 | 0.01 | 4.00
HEALTH | 1500 | 2.14 | 1.06 | 0.00 | 4.00

LEVEL-2
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
EDUC | 300 | 4.74 | 1.95 | 1.00 | 9.00

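The TIME and TIMESQ rows are consistent with a balanced design in which each of the 300 level-2 units is measured at five occasions coded 0 through 4, with TIMESQ equal to TIME squared. That design is an inference from the table (1500 = 300 x 5, minimum 0, maximum 4), not something the page states. The short check below rebuilds the two variables under that assumption and reproduces the reported means and standard deviations.

```python
from statistics import mean, stdev

N_PERSONS = 300          # level-2 N reported in the table
WAVES = [0, 1, 2, 3, 4]  # assumed coding of TIME (table shows min 0, max 4)

# One TIME value per person per wave, and its square.
time = [t for _ in range(N_PERSONS) for t in WAVES]
timesq = [t * t for t in time]

print(len(time))                                        # 1500
print(f"{mean(time):.2f}", f"{stdev(time):.2f}")        # 2.00 1.41
print(f"{mean(timesq):.2f}", f"{stdev(timesq):.2f}")    # 6.00 5.90
```

If the actual file had missing occasions or unequal spacing, these values would not match the table exactly, so the agreement is a useful sanity check after importing the data.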
British election study. HLM Data File
Descriptive Statistics

LEVEL-1
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
VOTED | 2278 | 0.76 | 0.43 | 0.00 | 1.00
AGE | 2278 | 46.75 | 16.01 | 18.00 | 99.00
COLLEGE | 2278 | 0.23 | 0.42 | 0.00 | 1.00
TRUSTPOL | 2278 | 3.88 | 2.17 | 0.00 | 10.00
MEANCON | 2278 | 0.12 | 0.11 | 0.00 | 0.57

LEVEL-2
VARIABLE NAME | N | MEAN | SD | MINIMUM | MAXIMUM
MEANCON | 255 | 0.11 | 0.12 | 0.00 | 0.57

Data Analysis II (Correlation, Regression, & Logistic Regression)
HW 1 Urban Mobility Report 2009 on traffic delays and population for 90 urban areas.
HW 2 Fertility rate data. Contraceptive use and fertility in developing countries collected by Robey, Shea, Rutstein, and Morris (1992). Variables are COUNTRY, REGION (1=Central and Southern Africa, 2=Asia and Pacific Islands, 3=Latin America and Caribbean, 4=Near East and North Africa), FERTRATE, CNTRCPT.
HW 3 School survey. The data set used in this homework is from a real survey about drug use and violence collected from 11th grade students in Oregon by NPC Research, Inc. Twenty-eight schools were selected for these analyses, with 2,433 individuals in total. Students responded to a wide variety of questions about their drug use, alcohol use, violence, community, and family. The data set contains three dichotomous variables, GUN (0=no, 1=yes), RACE (0=white, 1=nonwhite), and GANG (0=no, 1=yes), as well as the following three measures of alcohol use (ALCOHOL), neighborhood support (SUPPORT), and the condition of the neighborhood (EROSION):
ALCOHOL: Alcohol use over the last year:
0 "none"
1 "1-2 times"
2 "3-5 times"
3 "6-9 times"
4 "10-19 times"
5 "20-39 times"
6 "40+ times"
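When tabulating ALCOHOL, it can help to attach these value labels to the numeric codes so that frequency tables are readable. The sketch below shows one way to do this in Python. The sample responses and the function name are hypothetical and serve only to show the mapping; they are not drawn from the survey file.

```python
from collections import Counter

# Value labels for ALCOHOL as documented above.
ALCOHOL_LABELS = {
    0: "none",
    1: "1-2 times",
    2: "3-5 times",
    3: "6-9 times",
    4: "10-19 times",
    5: "20-39 times",
    6: "40+ times",
}


def label_counts(codes):
    """Count responses per category, keeping the documented code order."""
    counts = Counter(codes)
    return {ALCOHOL_LABELS[c]: counts.get(c, 0) for c in ALCOHOL_LABELS}


sample = [0, 0, 1, 3, 6, 1]  # hypothetical responses, not real data
for label, n in label_counts(sample).items():
    print(f"{label:12s} {n}")
```

Categories with no responses are kept with a count of zero, which makes sparse upper categories easy to spot.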
SUPPORT: Neighborhood support:
QC7 My neighbors notice when I am doing a good job and let me know.
QC9 There are people in my neighborhood who encourage me to do my best.
QC10 There are people in my neighborhood who are proud of me when I do something well.
QC12 There are lots of adults in my neighborhood I could talk to about something important.
EROSION: Neighborhood erosion:
QC1A How much does "crime and/or drug selling" describe your neighborhood?
QC1B How much does "fights" describe your neighborhood?
QC1C How much does "lots of empty or abandoned buildings" describe your neighborhood?
QC1D How much does "lots of graffiti" describe your neighborhood?