Introduction to STATA
Sun Li Centre for Academic Computing lsun@smu.edu.sg
Outline
Computing Resources
Datasets in STATA
Data Management with STATA
Exercise 1
Data Descriptions & Simple Graphs
Exercise 2
Computing Resources
STATA is a statistical package for managing, analyzing, and graphing data. It:
  has both a command-driven and a menu-driven interface
  is cross-platform: Windows, Unix, and Mac
  comes in three flavors:
    the standard Intercooled STATA (2047 variables)
    the more limited Small STATA (99 variables)
    the extended STATA/SE (32766 variables)
CAC Computing Resources for STATA users
Windows:
  STATA/SE version 10.0
  10-user network perpetual license
  Installation guide: (http://research2.smu.edu.sg/CAC/StatisticalComputing/Wiki/STATASoftware Questions.aspx)
Linux CAC Beowulf Cluster:
  STATA/SE version 10.0
  Unlimited users
  About CAC Beowulf Cluster: (http://research2.smu.edu.sg/CAC/HPC/Wiki/MAIN.aspx)
Getting Started
[Screenshot of the STATA main window: Review box, Variable window, Results window, Command line]
Getting help in STATA
Help menu:
  contents: a list of command categories & language syntax
  help: help for a STATA command, with examples
  search: search help by keywords
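The same help is also available from the command line. A minimal sketch, written as do-file lines; the topics (summarize, logistic) are illustrative choices, not taken from the slides:

```stata
help contents        // browse the list of command categories
help summarize       // help file for one command, with examples
search logistic      // keyword search
```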
Website resources:
  The STATA website: http://www.stata.com
  The STATA Journal (reviewed papers, regular columns, user-written software): http://www.stata-journal.com/
  STATA FAQ: http://www.stata.com/support/faqs
  STATA User Support: http://www.stata.com/support
  Books: http://www.stata.com/bookstore/statabooks.html
Contact:
  For statistical consultation: Sun Li: lsun@smu.edu.sg
  For software installation: TAN SuhWen: swtan@smu.edu.sg
Running STATA
Files in STATA
Commands and Output
STATA Variable Definitions
Files in STATA
.dta: STATA dataset
  STATA can also read and write ASCII text files, such as tab-delimited files saved from Excel.
.do: STATA do-file (command file)
  Do-files can be viewed and edited in any text editor, such as Notepad.
.log, .smcl: STATA log file (output file)
  Log files record the commands and results displayed in the Results window, including error messages.
  Plain-text log files (.log) can be viewed and edited in any text editor.
Commands and Output
STATA is command-driven and runs in two modes:
  Batch mode: do-file
  Interactive mode: command line
  E.g.: verinst -- verifies the version and installation of STATA
To save results: log-files
  From the menu: File -> Log -> Begin..., View..., or Close.
  Log files have a .smcl or .log extension.
  They record everything in the Results window, including commands, results, error messages, etc.
  If the file already exists, another dialog opens so you can choose whether to overwrite the file with new output or append new output to it.
From the command line:
  cd              // display the current working directory
  cd D:\lsun      // change the working directory to D:\lsun
  dir             // list files in the current working directory
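A log can also be opened and closed from a do-file rather than the menu. The sketch below assumes a hypothetical file name, session1.log, and uses the auto dataset that ships with STATA; pwd is used to show the working directory because it behaves the same on every platform:

```stata
pwd                                    // display the current working directory
log using session1.log, text replace   // session1.log: hypothetical file name
sysuse auto, clear
summarize price mpg                    // this output is recorded in the log
log close
```

The replace option overwrites an existing log; append would add to it instead, mirroring the choice offered by the dialog.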
STATA Variable Definitions
Variable names
  1-32 characters; 8 or fewer are recommended
  Valid characters: letters (a-z, A-Z), digits 0-9, and underscore _
  Names must start with a letter (or an underscore, but this is discouraged because STATA-generated variables start with an underscore)
  Case-sensitive: myvar, Myvar, and MYVAR are three different variables
Variable types
  String (storage types str1 to str80; up to str244 in STATA/SE)
  Numeric (categorical, continuous)
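As a small illustration of these naming and type rules, the sketch below builds a three-observation dataset from scratch. The variable names (code, income, Income) and values are made up for the example:

```stata
clear
set obs 3
generate str5 code = "ab"          // string variable holding up to 5 characters
generate double income = 1000.5    // numeric variable
generate Income = 1                // a different variable: names are case-sensitive
describe                           // shows the storage type of each variable
```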
Format of numeric variables
Numeric formats: %w.dg, %w.df, %w.de
  w: the total width, including the decimal point and decimals
  d: number of decimals

Format       Formula  Example  sqrt(2)    1,000      10,000,000
General      %w.dg             1          1000       1e+07
Fixed        %w.df             1          1000       10000000
                               1.41       1000.00    1.00e+07
Exponential  %w.de    %10.3e   1.414e+00  1.000e+03  1.000e+07
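The three format families can be tried directly with display. This is a sketch only; the widths and decimals chosen here (%9.2f, %10.3e, %9.0g) are illustrative and are not necessarily the ones used on the slide:

```stata
display %9.2f sqrt(2)        // fixed format, 2 decimals
display %10.3e 1000          // exponential format, 3 decimals
display %9.0g 10000000       // general format: STATA picks the layout that fits the width
```

To change how a variable is displayed, the same formats can be attached with the format command, e.g. format price %9.2f.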
Missing Values in STATA
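Missing values are created in input or import when a numeric field is empty.
Missing values sort above every nonmissing number: all numbers < . < .a < .b < etc.
This can lead to mistakes in logical expressions: for example, x > 100 is also true when x is missing.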
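The pitfall is easy to reproduce with the auto dataset that ships with STATA, in which rep78 has some missing values. A minimal sketch; the cutoff 4 is an arbitrary choice:

```stata
sysuse auto, clear
count if rep78 > 4                     // counts rep78 == 5 AND missing rep78
local n_naive = r(N)
count if rep78 > 4 & !missing(rep78)   // counts nonmissing values only
local n_safe = r(N)
count if missing(rep78)
display `n_naive' - `n_safe'           // equals the number of missing values
```

Adding & !missing(varname) to a condition is one common way to keep missing values out of a "greater than" comparison.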
Expressions and Functions
Operators
  Arithmetic            Relational                    Logical
  ^  power              >   greater than              !  not
  *  multiplication     <   less than                 ~  not
  /  division           >=  greater than or equal     |  or
  +  addition           <=  less than or equal        &  and
  -  subtraction        ==  equal
                        !=  not equal
                        ~=  not equal
Help command: help functions
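A quick way to get a feel for the operators is to evaluate a few expressions with display. The numbers below are arbitrary; true expressions evaluate to 1 and false ones to 0:

```stata
display 2^3 + 10/4                 // arithmetic: power, addition, division
display (5 >= 3) & !(2 == 3)       // relational results combined with "and" and "not"
display (1 != 1) | (1 ~= 2)        // != and ~= both mean "not equal"
```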
Memory Consideration
When your dataset is very large, you may need to:
  Set the size of memory: set memory
  Set the maximum number of variables: set maxvar
  Set the maximum dimension of matrices: set matsize

           Default   Minimum   Maximum
maxvar     5,000     2,047     32,766
matsize    400       10        11,000
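A sketch of how these settings might be changed in STATA/SE 10, the version described in these slides. The sizes are illustrative assumptions, not recommendations, and later STATA releases manage memory automatically, so set memory may no longer be needed there:

```stata
clear                 // memory and maxvar can only be changed with no data loaded
set memory 200m       // illustrative size
set maxvar 10000      // STATA/SE only
set matsize 800
query memory          // show the current settings
```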
Q2: Why do I get the error message "no room to add more observations" even after I reset STATA memory to load my dataset?
Hint: Two important considerations:
1) Make sure that you allocate more memory than the size of the file you are using. STATA needs the extra room to perform commands and calculations.
2) Make sure that you do not allocate too much memory; otherwise your computer will not have enough memory (RAM) left to perform other tasks.
Datasets in STATA
Starting Point: A Rectangular Matrix
Data Input and Output
Edit Data Properties
Variable Management
Data Reorganization
Date and Time Values in STATA
Starting Point: A Rectangular Matrix
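$$
\begin{pmatrix}
X_{11} & X_{12} & X_{13} & \cdots & X_{1K} \\
\vdots & \vdots & \vdots & & \vdots \\
X_{N1} & X_{N2} & X_{N3} & \cdots & X_{NK}
\end{pmatrix}
$$
N observations (rows) by K variables (columns)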
Data Input and Output
Load STATA-format dataset:
  use filename, clear
Note:
  STATA is case-sensitive.
  All STATA commands are lowercase.
  STATA holds only one dataset in memory at a time.
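A minimal round trip through a STATA-format file might look like the sketch below. It assumes the working directory is writable; myauto is a hypothetical file name:

```stata
sysuse auto, clear        // load the example dataset shipped with STATA
save myauto, replace      // myauto.dta: hypothetical file name
use myauto, clear         // reload it; clear drops whatever is in memory first
describe, short           // number of observations and variables
```

Because only one dataset can be in memory, use stops with an error if unsaved data are loaded and clear is omitted.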
varlist: a list of variables separated by blanks.
  var1              just one variable
  var1 var2 var3    three variables
  var*              variables starting with var
  *var              variables ending with var
  var1-var3         var1, var2 and var3
if: conditional expression
  if mpg>40
  if mpg>40 & income==70
  if mpg>40 | mpg<10
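The varlist shorthands and if conditions can be tried on the auto dataset that ships with STATA. This is a sketch; the thresholds are arbitrary, and since auto has no income variable, only mpg is used in the conditions:

```stata
sysuse auto, clear
describe m*                        // variables whose names start with m
summarize price-headroom           // price through headroom, in dataset order
list make mpg if mpg > 40          // only observations meeting the condition
count if mpg > 30 | mpg < 15       // either condition may hold
```

Note that a range such as price-headroom follows the order of variables in the dataset, not alphabetical order.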
Import dataset of other formats
STATA can import tab-delimited ASCII text files directly.
Excel can write tab-delimited ASCII text files: choose File -> Save As -> Save as type: Text (tab delimited)
Import text file into STATA
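The import command itself did not survive on this slide. In STATA 10 a tab-delimited text file can be read with insheet; the sketch below first writes such a file with outsheet so that it is self-contained. The file name mydata.txt is hypothetical, and newer STATA releases offer import delimited as a replacement:

```stata
sysuse auto, clear
outsheet using mydata.txt, replace     // write a tab-delimited text file
insheet using mydata.txt, tab clear    // read it back; first row supplies variable names
describe
```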
Example
sysuse auto, clear    //open system dataset auto.dta and clear any dataset in memory
save auto, replace    //save the data in memory to the working directory, replacing any existing file
describe              //describe the dataset
browse                //open data browser
edit                  //open data editor
Edit Data Properties
generate x=price/mpg                        //create new variable from an expression
rename x priceunit                          //rename variable
label variable priceunit "price per mpg"    //label variable
list priceunit in 1/10                      //list first 10 obs for the variable priceunit
d date year1 month1 year2 month2
list date year1 year2 month1 month2 in 1
Exercise 1
[Figures: anatomy of a STATA graph (plot area, Y-axis title, legend, outer region or background); histograms with Frequency on the Y-axis; charts for Domestic and Foreign cars, for mpg and trunk, and for price groups (-4000, 4000-4999, 5000-5999, 6000-9999, 10000+), including pie charts of percentage shares]
Graphics: Q & A
http://www.stata.com/support/faqs/graphics/
Exercise 2
Next Session
Statistical Analysis
17 Oct Friday, 9.30am-12pm
Training Room @ Library Level 5
Data Description And Simple Inference
Group Comparison And Correlation
General Linear Regression
Logistic Model
  Binary Logistic Model
  Ordinal Logistic Model