
Center for Effective Global Action

University of California, Berkeley

Appendix 1.2-B: Introduction to MATA


Pre-requisites

• Module 1.2: Intro to STATA.

Contents
1. Introduction
2. Matrix, vector and scalar
3. Matrix manipulation
4. Accessing data from STATA
5. Operations with matrices
6. Final Remarks
7. Further Readings

1. INTRODUCTION

The goal of this appendix, along with Appendix 1.3, is to offer a more advanced treatment of STATA's
tools for performing complex tasks. The module covered the basics, so we are now in a position to
introduce more advanced material. This material is optional, but we believe students will appreciate
learning more about STATA's capabilities for handling complex operations.
One of the most important tools in this regard is MATA. MATA is a matrix programming language that
resembles C and whose syntax is close to that of MATLAB or GAUSS. It can be used interactively to
perform matrix calculations or to build matrix functions, and it can be called from .do files and .ado
files. MATA was added to STATA in version 9 to overcome the limitations of the traditional STATA
commands that operate on matrices.
MATA has libraries of advanced mathematical and matrix functions. Moreover, MATA functions can
access STATA variables and work with a subset of them through virtual matrices (views), which reduces
the memory used in STATA. MATA code also generally runs faster than equivalent code written with
traditional STATA commands. As a result, MATA has become the preferred programming language to
implement new features in STATA (Gould 2011).

Compared with the traditional STATA environment, MATA is recommended for the following tasks:

STATA is better for . . .            MATA is better for . . .
Parsing standard syntax              Parsing non-standard syntax (including files)
Data management                      Performing matrix operations
Scripting existing STATA commands    Non-scripting applications
Outputting (usually)                 Outputting (when complicated)
Posting saved results
Source: Gould (2011).

As we will see in Appendix 1.3, MATA is extremely useful for programming tasks within an .ado file or
.do file. However, that use requires programming concepts that will be covered later. Therefore, in
this appendix we focus only on the interactive use of MATA.


2. MATRIX, VECTOR AND SCALAR

MATA is accessible directly from the STATA command window. To enter the MATA environment, type
the mata command. All subsequent commands are treated as MATA commands. To return to the
traditional STATA session, type end in the command window. All objects created in the MATA
workspace remain available when you type the mata command again.
You can create a matrix in two ways. To illustrate, suppose that you want to build a
matrix equivalent to this one:

1 2
AB 
3 4
The first option is to declare the matrix element by element, separating the columns by commas (,) and
the rows by backslashes (\). To obtain a matrix equal to the one above, you must write the following
lines. First, type mata in the command window:
mata

Now, add the matrix using the following routine:


A = (1,2 \ 3,4)

We have chosen A as the name of the matrix and the numbers 1 to 4 as its elements. The elements may
also be strings, as in the following example:
B = ("Kurt", "Samuel" \ "Teresa", "Jamil")

To see the output, just type the names of the matrices in the command line. For instance:
A
B


The output is the following:


: A
1 2
+---------+
1 | 1 2 |
2 | 3 4 |
+---------+

: B
1 2
+-------------------+
1 | Kurt Samuel |
2 | Teresa Jamil |
+-------------------+

The second option is to generate a matrix of missing values with the function J(f,c,v). This function
returns a matrix with f rows and c columns whose elements are all equal to the constant v. In this case,
the constant is the missing value (.). We then fill the matrix element by element, referring to each
element by the row (f) and column (c) it occupies in the matrix: B[f,c].
B = J(2,2,.)
B[1,1] = 1
B[1,2] = 2
B[2,1] = 3
B[2,2] = 4

We display the matrix by typing its name, B, in the command line. The result is the following:
: B
1 2
+---------+
1 | 1 2 |
2 | 3 4 |
+---------+

MATA has a number of functions that return different types of matrices. One important case is the
identity matrix. The function I(n) returns an n × n identity matrix, whose main diagonal elements are
equal to 1 and whose other elements are 0. For example:
I(3)


The result is the following:


: I(3)
[symmetric]
1 2 3
+-------------+
1 | 1 |
2 | 0 1 |
3 | 0 0 1 |
+-------------+

To avoid repetition, from here on we mostly show only the STATA output, since it already contains the
relevant code (the lines that start with a colon).
Another special case is the function uniform(f,c), which returns an f × c matrix of random elements
uniformly distributed on the interval (0, 1). The function uniformseed() sets the seed; it plays the same
role as the set seed command in STATA. In this example, we create a matrix with 2 rows and 3 columns
of random numbers in the interval (0,1):
: uniform(2,3)
1 2 3
+-------------------------------------------+
1 | .1369840784 .643220668 .5578016951 |
2 | .6047949435 .684175977 .1086679425 |
+-------------------------------------------+

A similar exercise can be performed for the normal distribution. Here the inverse cumulative normal
distribution function is applied to the uniform draws in the following way:
: invnormal(uniform(2,3))
1 2 3
+----------------------------------------------+
1 | .3014337824 -1.545904789 .1389086436 |
2 | 1.133267712 -.6583710099 -1.700496348 |
+----------------------------------------------+
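
A seeded version of the two draws above might look like the following minimal sketch, typed at the MATA prompt. It assumes a release of STATA in which runiform() and rseed() are available (these are the current names for uniform() and uniformseed()); the seed value 12345 and the names U and Z are arbitrary choices for illustration.

```mata
rseed(12345)                 // arbitrary seed, chosen only for illustration
U = runiform(2, 3)           // 2 x 3 matrix of uniform draws on (0,1)
Z = invnormal(U)             // inverse normal CDF maps them to standard normal draws
Z
```

Setting the same seed again before calling runiform(2, 3) reproduces the same matrix U, which is what makes a simulation repeatable.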

A vector is a 1 × n or m × 1 matrix. Thus, a row vector can be generated by listing the elements one by
one and separating them with the column operator ",". Another way is to generate a numeric sequence,
for which MATA provides a special operator: the symbol "..". Here are two examples:
F = (1,2,3)
F = (1..3)

Following the same logic, a column vector can be generated by listing the elements one by one and
separating them with the row operator "\". Other ways are to generate a series of numbers with the
symbol "::" or to transpose a row vector by adding an apostrophe ('). For instance:
F = (1\2\3)
F = (1,2,3)'
F = (1::3)


There are also functions to generate special vectors. For example, the function e(i,n) returns a row
vector of length n whose elements are all equal to zero except for element i, which is equal to one. See
the example below:
: e(2,5)
1 2 3 4 5
+---------------------+
1 | 0 1 0 0 0 |
+---------------------+
Unlike in STATA's traditional matrix setting, a scalar in MATA is simply a 1x1 matrix. Therefore, it can
be written in the following way:
A=2

3. MATRIX MANIPULATION

New matrices can be generated from existing ones. One way is to use functions that combine existing
matrices to form a new matrix, using concatenation. For instance, you can build a matrix taking those
elements already existing as a sub-matrix. In this case, you can use “,” as column operator and “\” as
row operator. For instance, in the example below, we create a matrix E based on the sub-matrices A, B,
C and D:
: A=(1, 2 \ 3, 4)
: B=(5, 6, 7 \ 8, 9, 10)
: C=(3, 4 \ 5, 6)
: D=(1, 2, 3 \ 4, 5, 6)

: E = (A, B \ C, D)
: E
1 2 3 4 5
+--------------------------+
1 | 1 2 5 6 7 |
2 | 3 4 8 9 10 |
3 | 3 4 1 2 3 |
4 | 5 6 4 5 6 |
+--------------------------+


You can also build new matrices based on existing ones using matrix operations. To illustrate the point,
let us start from a matrix equivalent to the following:

5 6 7 
 
B   8 9 10 
1 2 3 
 

For example, the transpose of this matrix can be generated by adding an apostrophe (') at the end. The
output is the following:

: B'
1 2 3
+----------------+
1 | 5 8 1 |
2 | 6 9 2 |
3 | 7 10 3 |
+----------------+

Another example is the function sort(X, idx), which returns the matrix X with its rows sorted in
ascending order according to the columns listed in idx. In the example below, sort(B,(1,2)) returns
the matrix B sorted by the first and then the second column:
: sort(B, (1,2))
1 2 3
+----------------+
1 | 1 2 3 |
2 | 5 6 7 |
3 | 8 9 10 |
+----------------+
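
As a small extension of the example above, the sketch below sorts in descending order and shows the related function order(), which returns the row permutation instead of the sorted matrix. It redefines B so that it can be run on its own; the name p is an arbitrary choice.

```mata
B = (5, 6, 7 \ 8, 9, 10 \ 1, 2, 3)
sort(B, -1)                  // a negative index sorts column 1 in descending order
p = order(B, 1)              // permutation vector that sorts column 1 ascending
B[p, .]                      // applying p reproduces sort(B, 1)
```

Keeping the permutation vector is useful when the same ordering has to be applied to a second matrix or vector with matching rows.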

A sub-matrix can be extracted from an existing matrix using range subscripts, which give the top-left
and bottom-right elements of the block to extract. In the example, a matrix is built from rows 1 and
2 and columns 2 and 3 of the matrix B. Here are the command and the result:
: B[|1, 2\ 2, 3|]
1 2
+-----------+
1 | 6 7 |
2 | 9 10 |
+-----------+
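
To make the difference between the two subscript styles concrete, the following sketch extracts the same block with range subscripts and with list subscripts. It redefines B so that it can be run on its own.

```mata
B = (5, 6, 7 \ 8, 9, 10 \ 1, 2, 3)
B[|1,2 \ 2,3|]               // range subscripts: top-left (1,2), bottom-right (2,3)
B[(1,2), (2,3)]              // list subscripts: rows 1 and 2, columns 2 and 3
B[|2,1 \ .,.|]               // a missing value means "through the last row/column"
```

The first two expressions return the same 2 × 2 block; the third returns rows 2 and 3 with all their columns.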


A sub-matrix can also be built from non-contiguous rows and columns. In the example, rows 1, 3 and 2
(in that order) and columns 2 and 3 of matrix B make up the new sub-matrix D. The code is the following:
D=B[(1\3\2), (2\3)]

The output is below:


: D
1 2
+-----------+
1 | 6 7 |
2 | 2 3 |
3 | 9 10 |
+-----------+

We can also extract just the lower triangle of a matrix using the function lowertriangle or the upper
triangle with the function uppertriangle. The example is below:
: lowertriangle(B)
1 2 3
+-------------+
1 | 5 0 0 |
2 | 8 9 0 |
3 | 1 2 3 |
+-------------+
: uppertriangle(B)
1 2 3
+----------------+
1 | 5 6 7 |
2 | 0 9 10 |
3 | 0 0 3 |
+----------------+

It is also possible to create vectors from an existing matrix. Using the function diagonal we can extract
a column vector with the elements of the main diagonal. In the example below, we get the following
results:
: diagonal(B)
1
+-----+
1 | 5 |
2 | 9 |
3 | 3 |
+-----+


Instead of a column vector, you may want the diagonal as a square matrix. The function diag builds a
square matrix that keeps the elements of the main diagonal and sets all other elements to zero. See the
example below:
: diag(B)
[symmetric]
1 2 3
+-------------+
1 | 5 |
2 | 0 9 |
3 | 0 0 3 |
+-------------+

A row or column vector can be built up from the matrix B in the following way:
: B[2,.]
1 2 3
+----------------+
1 | 8 9 10 |
+----------------+
: B[.,3]
1
+------+
1 | 7 |
2 | 10 |
3 | 3 |
+------+

The function vec transforms a matrix into a column vector. For example, consider the following 3x2
matrix:

6 7 
 
D  2 3 
 9 10 
 


Using this function, we would obtain the following results:


: d=vec(D)
: d
1
+------+
1 | 6 |
2 | 2 |
3 | 9 |
4 | 7 |
5 | 3 |
6 | 10 |
+------+

The previous step can be reversed using the functions rowshape or colshape. In this
example, we recover the original matrix D from the vector d:
: rowshape(d,2)'
1 2
+-----------+
1 | 6 7 |
2 | 2 3 |
3 | 9 10 |
+-----------+

: colshape(d,3)'
1 2
+-----------+
1 | 6 7 |
2 | 2 3 |
3 | 9 10 |
+-----------+
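
The two calls above hard-code the dimensions of D. A slightly more general sketch, which takes the dimensions from the matrix itself with rows() and cols(), could look like this; it redefines D and d so that it can be run on its own.

```mata
D = (6, 7 \ 2, 3 \ 9, 10)
d = vec(D)                        // stack the columns of D into a 6 x 1 vector
colshape(d, rows(D))' == D        // displays 1: the original matrix is recovered
rowshape(d, cols(D))' == D        // displays 1: same result via rowshape
```

The transpose is needed because vec stacks columns, whereas rowshape and colshape fill the new matrix row by row.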

4. ACCESSING DATA FROM STATA

There are two key MATA functions for accessing the dataset stored in STATA's memory: st_data and
st_view. Both present the loaded STATA dataset in matrix form in MATA. The key difference is that
st_data makes a copy of the data, whereas st_view creates a view onto it: any modification made to
a view in the MATA environment is transferred to the underlying dataset in STATA, something that
does not happen with st_data.


Consider, for example, the dataset DataFinal_A1_2.dta, which can be downloaded from the course
website and loaded into STATA. We can create a copy of the data in the MATA environment using
st_data(colvector,rowvector), where the column vector lists the observations and the row vector
lists the variables. In the following example, we create a matrix Xvar with a sample of observations
of the age and sex of the household head. The sample contains observations 1 to 3 and 5 to 7. The
results are shown below:
: Xvar = st_data((1::3\5::7),("agehead", "sexhead"))
: Xvar
1 2
+-----------+
1 | 62 0 |
2 | 31 1 |
3 | 25 0 |
4 | 30 1 |
5 | 38 1 |
6 | 29 1 |
+-----------+

One can get the same outcome by indicating the location of the variables in your database. In this case,
the variables have positions 5 and 6. The STATA routine would be the following:
Xvar = st_data((1::3\5::7),(5, 6))

We can also select observations conditionally by adding a third argument to the previous function. If
the third argument is the number 0, only observations with no missing values in the listed variables
are copied. If the third argument is a dummy variable, only observations in which that variable is
nonzero (that is, equal to one) are copied. In the example below, the first line copies the complete
cases among the selected observations, and the second builds a matrix Xvar with the observations
where pov_HH is equal to 1, i.e. households living in poverty. The code is the following:
Xvar = st_data((1::3\5::7),("agehead", "sexhead"), 0)
Xvar = st_data(.,("agehead", "sexhead"), "pov_HH")
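
The effect of the third argument can be checked by counting rows. The sketch below is only an illustration: it assumes the auto dataset that ships with STATA as a stand-in for the course data, so the variables mpg, rep78, weight and foreign are not the ones used in this appendix.

```mata
stata("sysuse auto, clear")                       // stand-in dataset, not the course data
rows(st_data(., ("mpg", "rep78")))                // all observations
rows(st_data(., ("mpg", "rep78"), 0))             // observations with a missing value dropped
rows(st_data(., ("mpg", "weight"), "foreign"))    // only observations where foreign != 0
```

The second count is smaller than the first because rep78 has missing values, and the third count equals the number of cars flagged by the dummy foreign.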

To generate a view of the data we can run the following routine in the MATA environment. The matrix X
contains all the observations of the variables previously listed.
X=.
st_view(X, ., ("agehead", "sexhead"))


As with st_data, the function st_view also lets the user restrict the observations
according to a condition:
st_view(X,(1::3\5::7), ("agehead", "sexhead"), 0)
st_view(X,(1::3\5::7), ("agehead", "sexhead"), "pov_HH")
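
The practical difference between a copy and a view can be seen by changing one element of each. The sketch below is illustrative only: it assumes the auto dataset that ships with STATA as a stand-in for the course data, and the names C, V and old and the value 99 are arbitrary.

```mata
stata("sysuse auto, clear")      // stand-in dataset, not the course data
C = st_data((1::3), "mpg")       // C is a copy of observations 1 to 3
V = .
st_view(V, (1::3), "mpg")        // V is a view onto the same observations
old = st_data(1, "mpg")          // value currently stored in the dataset
C[1,1] = 99                      // changes the copy only
st_data(1, "mpg") == old         // displays 1: the dataset is untouched
V[1,1] = 99                      // writes through the view
st_data(1, "mpg")                // the STATA dataset now holds 99
```

Because a view writes straight into the dataset, it is safer to work on a copy whenever the original data must stay unchanged.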

We can list the matrices held in MATA, or delete them, without changing the STATA dataset. To list
them, type mata describe in the command window. The result is the following:
: mata describe
# bytes type name and extent
-----------------------------------------------------------------------------
32 real matrix X[6,2]
53,040 real matrix Xvar[3315,2]
-----------------------------------------------------------------------------

To drop a single matrix, or to clear the whole MATA environment, type the following in the command window:
mata drop X
mata clear

5. OPERATIONS WITH MATRICES

There are a number of basic operations that can be applied to matrices. The main logical and
mathematical operators are the following:
Matrix operators
+   (addition)                        ==       Equal to
-   (subtraction or negation)         !=       Not equal to
*   (multiplication)                  <, >     Less, greater than
/   (matrix division by a scalar)     <=, >=   Less, greater than or equal to
^   Power (of scalar)                 &, &&    Logical and
                                      |, ||    Logical or
For example, we perform a simple matrix multiplication operation as follows:
x = (1, 2, 3)
y = (3\ 4\ 5)
x*y


The result is 26. To apply an operation element by element to matrices of the same dimensions,
prefix the operator with a colon (:) as follows:
x = (1, 2, 3)
y = (3, 4, 5)
x:*y

The output is the following:


1 2 3
+----------------+
1 | 3 8 15 |
+----------------+
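
The link between the elementwise product and the matrix product shown earlier can be sketched as follows, using the same two row vectors. Transposing y with an apostrophe turns it into the column vector used in the first multiplication example.

```mata
x = (1, 2, 3)
y = (3, 4, 5)
x :* y               // elementwise product
sum(x :* y)          // adding up its elements gives the inner product
x * y'               // the same number via matrix multiplication
x :^ 2               // colon operators also accept a scalar operand
```

Both sum(x :* y) and x * y' evaluate 1·3 + 2·4 + 3·5.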

The function cross(X,Z) is an alternative to the matrix product X'*Z. It has the advantage of omitting
rows that contain missing values (equivalent to omitting observations with missing values), and it uses
less memory (especially with views). The function is also flexible: the version cross(X,xc,Z,zc) adds
a column of ones to the right of X when xc is nonzero, and to the right of Z when zc is nonzero. This
feature is especially useful in econometric estimation, where the column of ones corresponds to the
constant term.
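
A small numerical check of these statements might look like the sketch below. The matrices X and y are made-up illustrative values, and X1 is a hypothetical name for X with a column of ones appended by hand.

```mata
X = (1, 2 \ 3, 4 \ 5, 6)
y = (1 \ 2 \ 4)
X1 = X, J(rows(X), 1, 1)          // X with a column of ones appended by hand
cross(X, y) == X'*y               // plain cross product
cross(X, 1, y, 0) == X1'*y        // xc = 1 appends the ones to X only
cross(X, 1, X, 1) == X1'*X1       // xc = zc = 1 appends them on both sides
```

Each comparison should display 1. With cross() the augmented matrix X1 never has to be built, which is where the memory saving comes from.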

When the goal is to solve a system of linear equations of the form AX = B, MATA offers a range of
functions. One approach is to compute the inverse of A, for which we have the following functions:

Function: Inverse of a Matrix

luinv(A): inverse of full rank, square matrix A.

cholinv(A): inverse of positive definite, symmetric matrix A.

invsym(A): generalized inverse of positive-definite, symmetric matrix A.

Alternatively, MATA offers functions that solve the system of equations directly, which is numerically
more suitable than computing an inverse. For instance, consider the following functions:

Function: Solve a linear equation

lusolve(A,B): A is full rank, square matrix.

cholsolve(A,B): A is positive definite, symmetric matrix.
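
To compare the two approaches on a system AX = B, consider the following minimal sketch. The matrices A and B are made-up illustrative values (A is symmetric and positive definite, so both solvers apply), and X1, X2 and X3 are arbitrary names.

```mata
A = (2, 1 \ 1, 3)            // symmetric and positive definite
B = (3 \ 5)
X1 = lusolve(A, B)           // direct solution based on the LU decomposition
X2 = cholsolve(A, B)         // direct solution based on the Cholesky decomposition
X3 = luinv(A) * B            // solution through the explicit inverse
mreldif(A*X1, B), mreldif(X1, X2), mreldif(X1, X3)
```

The function mreldif() returns the largest relative difference between two matrices, so all three numbers should be zero up to rounding error. The solution here is X = (0.8, 1.4)'.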


We can use MATA to compute the coefficients of a linear regression. Recall from your basic econometrics
or statistics textbook that the regression coefficients have the following matrix form:

\[ b = (X'X)^{-1} X'y, \]
where X is the covariate matrix and y is the vector of outcomes. Load the dataset DataFinal_A1_2.dta.
In the MATA environment, you can access the STATA dataset and organize the variables as vectors or
matrices. Then, you can implement the regression estimator. The STATA code is the following:
use "$path/DataFinal_A1_2.dta", clear
keep if !missing(IncomeLabHH1)&!missing(agehead)&!missing(famsize)

mata
y = st_data(.,"IncomeLabHH1")
X = st_data(.,("agehead", "famsize"))
X = X, J(rows(X),1,1)
b = invsym(X'*X)*X'*y
b
end

The result is the following:


. use "$path/DataFinal_A1_2.dta", clear
. keep if !missing(IncomeLabHH1)&!missing(agehead)&!missing(famsize)
(2391 observations deleted)

. mata
------------------------------------------------ MATA (type end to exit) ----
: y = st_data(.,"IncomeLabHH1")
: X = st_data(.,("agehead", "famsize"))
: X = X, J(rows(X),1,1)
: b = invsym(X'*X)*X'*y
: b
1
+----------------+
1 | -14.35157464 |
2 | -3.493746343 |
3 | 3408.640825 |
+----------------+
: end
-----------------------------------------------------------------------------
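
One way to check a hand-coded estimator is to compare it with STATA's own regress command. The sketch below is illustrative only: it assumes the auto dataset that ships with STATA as a stand-in for the course data (price regressed on mpg and weight), and it reads the stored coefficient vector e(b) with st_matrix().

```mata
stata("sysuse auto, clear")                       // stand-in dataset, not the course data
y = st_data(., "price")
X = st_data(., ("mpg", "weight")), J(st_nobs(), 1, 1)
b = invsym(X'*X)*X'*y                             // same formula as in the text
stata("quietly regress price mpg weight")         // STATA's own estimate
mreldif(b, st_matrix("e(b)")')                    // close to zero if the two agree
```

The vector e(b) is stored as a row with the constant last, which matches the ordering of X once it is transposed.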

A more sophisticated version makes the estimation more efficient by using views. We first apply
st_view to the variables and then use the function cross(X,xc,Z,zc) on them. Finally, we solve the
system of equations directly.


The following code describes the steps:


use "$path/DataFinal_A1_2.dta", clear
mata
M=X=y=.
st_view(M, ., ("IncomeLabHH1", "agehead", "famsize"), 0)
st_subview(y, M, ., 1)
st_subview(X, M, ., (2\.))
XX = cross(X,1,X,1)
Xy = cross(X,1,y,0)
b = cholsolve(XX,Xy)
b
end

The output is the following:


. use "$path/DataFinal_A1_2.dta", clear

. mata
------------------------------------------------ MATA (type end to exit) ----
: M=X=y=.
: st_view(M, ., ("IncomeLabHH1", "agehead", "famsize"), 0)
: st_subview(y, M, ., 1)
: st_subview(X, M, ., (2\.))
: XX = cross(X,1,X,1)
: Xy = cross(X,1,y,0)
: b = cholsolve(XX,Xy)
: b
1
+----------------+
1 | -14.35157464 |
2 | -3.493746343 |
3 | 3408.640825 |
+----------------+
: end
-----------------------------------------------------------------------------

6. FINAL REMARKS

We have covered in this appendix the basics of MATA. We discussed some basic operations and also
showed how it can be used to perform more complex tasks like implementing a linear regression. In
Appendix 1.3 we will show how MATA can be useful in the context of programming.


7. BIBLIOGRAPHY/FURTHER READINGS

1. Baum, Christopher (2006). An Introduction to Modern Econometrics Using STATA. STATA Press.
2. Baum, Christopher (2010). "Programming in STATA and Mata" [PowerPoint slides]. Available at:
http://economics.adelaide.edu.au/research/seminars/STATA_Lecture4.pdf
3. Gould, William (2011). "Mata, the missing manual" [PowerPoint slides]. Available at:
http://www.STATA.com/meeting/uk10/UKSUG10.Gould.pdf
4. Schmidheiny, Kurt (2008). Coding with Mata in STATA. Universitat Pompeu Fabra. Available at:
http://www.schmidheiny.name/teaching/STATAmata.pdf
