
EC303 – Assignment 3

Do SW empirical exercises: E9.1, 10.1, 10.2, 11.1, 11.2, 12.1, 12.2

Also, complete the following Stata data cleaning assignment:

This assignment is designed to familiarize you with the basic steps used to transform raw data
into the form needed for analysis. This is commonly referred to as cleaning the data.

Step 1: Go to the Bureau of Labor Statistics website for the National Longitudinal Survey of
Youth 1979 (www.bls.gov/nls/nlsy79.htm). You want to find the web investigator. This will
allow you to download data (after you register your email address).

Step 2: In the web investigator, tag and extract the 12 variables listed below. The reference
number filter may help you find them: open the reference number scroll menu, select the item
that matches the first four digits of a variable in the list, click submit filter choices, and
check the boxes next to the listed variables in the results. Repeat this process until you have
checked all 12 variables:

R0000100 ID
R0017300 Highest grade completed 1979
R0153000 Rotter scale – pair one, statement A 79
R0153100 Rotter scale – pair one, statement B 79
R0153200 Rotter scale – pair two, statement A 79
R0153300 Rotter scale – pair two, statement B 79
R0153400 Rotter scale – pair three, statement A 79
R0153500 Rotter scale – pair three, statement B 79
R0153600 Rotter scale – pair four, statement A 79
R0153700 Rotter scale – pair four, statement B 79
R0214700 Race/ethnicity 79
R0214800 Respondent’s sex 79

Once you have selected these variables, select extract tagged variables, click Stata dictionary
in the extraction options, and click submit extract. Then click the link to the .zip file to
download it to your computer.

Step 3: This will give you a zipped file which includes a .DCT file. Save the .DCT file to your
hard drive.

Step 4: Open the data in STATA using the dictionary:

infile using "C:\filename.dct"
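
If Stata already has data in memory, infile will refuse to load the file. A minimal sketch of a safer load, assuming the dictionary is saved under the placeholder path used above and that the raw data file from the .zip is located where the dictionary expects it:

```stata
* Load the extract through its dictionary.
* The clear option discards any data currently in memory.
infile using "C:\filename.dct", clear

* Quick checks: list the variables and count the observations
describe
count
```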

Step 5 (optional): You may wish to rename the variables to something easier to remember than
R0017300. You may also want to run the .do file in the .zip file that will “label” your variables.
This doesn’t change the names of the variables but it does tell you what the number 4 means in,
say, the race variable.
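
Renaming could look like the sketch below. The new names (id, hgc79, race, sex) are hypothetical choices made for illustration, not names supplied by the NLSY. If you rename, substitute your own names wherever a later command uses a reference number.

```stata
* Rename selected variables (the new names are illustrative only)
rename R0000100 id
rename R0017300 hgc79
rename R0214700 race
rename R0214800 sex

* Optionally attach a descriptive label to a variable
label variable hgc79 "Highest grade completed, 1979"
```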

Step 6: Tabulate each of the variables (except for the ID#)

tab R0017300

Tabulate tells you what values the variable takes and how many observations take each value.

If any values seem odd (e.g., negative numbers or very high numbers), they are likely codes
for missing observations, refusals, and the like. You need to recode these values to “.”
(how Stata codes missing values).
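
One way to do the tabulating and recoding is sketched below. It assumes the odd values are the negative codes -5 through -1, which the NLSY79 commonly uses for non-interviews, skips, refusals, and "don't know" answers. Confirm this against your own tabulations before recoding, and handle any unusually high codes separately.

```stata
* Tabulate every variable except the ID, showing missing values too
ds R0000100, not
foreach v of varlist `r(varlist)' {
    tabulate `v', missing
}

* Recode one variable by hand ...
replace R0017300 = . if R0017300 < 0

* ... or recode the assumed codes -5 through -1 in all variables at once
mvdecode _all, mv(-5/-1)
```

Running the tabulations again afterwards is a simple check that no negative values remain.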

Step 7: Generate binary (or dummy) variables for each of the race/ethnic categories
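
A minimal sketch of two common ways to build the dummies. The prefix race_ and the name race_is1 are hypothetical, and the value 1 in the second command is only an example code; check your tabulation or the value labels to see which category each code represents.

```stata
* Create one 0/1 indicator per category of the race/ethnicity variable.
* Stata names them race_1, race_2, ... in order of the sorted values.
tabulate R0214700, generate(race_)

* The same thing by hand for a single category (code 1 is an example value).
* The if qualifier keeps missing observations missing instead of 0.
generate byte race_is1 = (R0214700 == 1) if !missing(R0214700)
```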

Step 8: Generate a variable which divides the observations into three categories using the
highest grade they’d completed as of 1979 – students with less than high school, students in high
school, and students out of high school.
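
The sketch below shows one possible coding. The cutoffs (below grade 9, grades 9 to 11, grade 12 or higher) are an assumption about what the three categories mean, and the names school_cat and school_lbl are hypothetical. It also assumes that the missing-value codes from Step 6 have already been set to “.”.

```stata
* Assumed cutoffs: below grade 9, grades 9-11, grade 12 or higher
generate byte school_cat = .
replace school_cat = 1 if R0017300 < 9
replace school_cat = 2 if inrange(R0017300, 9, 11)
* Stata treats missing as larger than any number, so exclude it explicitly
replace school_cat = 3 if R0017300 >= 12 & !missing(R0017300)

* Label the categories and check the result
label define school_lbl 1 "Less than high school" 2 "In high school" 3 "Out of high school"
label values school_cat school_lbl
tabulate school_cat, missing
```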

Step 9: Generate a Rotter Scale score

Return to the NLSY79 website and find the user’s guide. Find the description of the Rotter scale
in the user’s guide and follow its instructions to generate a total Rotter scale score.

Step 10: Drop the raw Rotter scale components.

Step 11: Generate a variable which is equal to the natural log of the total Rotter Scale Score
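
Steps 10 and 11 might look like the following sketch. It assumes the total score from Step 9 is stored in a variable called rotter (a hypothetical name) and that the eight raw items are the only variables whose names begin with R0153.

```stata
* Drop the eight raw Rotter items; the wildcard matches R0153000 through R0153700
drop R0153*

* Natural log of the total score.
* ln() returns missing when its argument is missing or not positive.
generate ln_rotter = ln(rotter)
```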

Step 12: Generate a new variable which is equal to the mean Rotter Score of the individual’s
schooling category (this variable should take on three values; one for each of the three schooling
categories).
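
A minimal sketch using egen, assuming the total score is stored in rotter and the schooling category in school_cat (both hypothetical names):

```stata
* Mean Rotter score within each schooling category.
* The if qualifier leaves the new variable missing where the category is missing.
egen mean_rotter_school = mean(rotter) if !missing(school_cat), by(school_cat)

* Check: the new variable should take exactly three distinct values
tabulate mean_rotter_school
```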

Step 13: Generate a new variable which is equal to the 60th percentile Rotter Score of the
individual’s gender
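
This can be sketched with the pctile() function of egen, again assuming the total score is stored in a variable called rotter (a hypothetical name):

```stata
* 60th percentile of the Rotter score within each sex
egen p60_rotter_sex = pctile(rotter), p(60) by(R0214800)

* Check: one value per sex
tabulate R0214800, summarize(p60_rotter_sex)
```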

Step 14: Generate a new variable which is equal to the median Rotter Score of the individual’s
combined race-gender category (e.g., Hispanic-male).
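
A sketch of one approach, assuming the total score is stored in rotter (a hypothetical name); race_sex and median_rotter_racesex are likewise illustrative names:

```stata
* Median Rotter score within each race-by-sex cell
egen median_rotter_racesex = median(rotter), by(R0214700 R0214800)

* Optional: a single identifier for the combined category, useful for checking
egen race_sex = group(R0214700 R0214800)
tabulate race_sex, summarize(median_rotter_racesex)
```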
