Choonjoolee (DEA) 10boston Conf

An Efficient Data Envelopment Analysis with a large data set in Stata
15-16 July, 2010
Boston10 Stata Conference
Choonjoo Lee, Kyoung-Rok Lee
sarang90@kndu.ac.kr, bloom.rampike@gmail.com
Korea National Defense University
Contents
Part I. A Large Data Set in Stata/DEA
- Large Data Set in DEA?
- Computational Aspects of Large Data Set
- The Scope of this Study
- Efficiency Matters in Stata/DEA/Linear Programming
- Tasks to be covered
Part II. Malmquist Index Analysis with the Panel Data
- Basic Concept of Malmquist Index
- The User Written Command malmq
Part I. A Large Data Set in Stata/DEA

Large Data Set in DEA?

[Figure: graphical illustration of the DEA concept]
Large Data Set in DEA?

Variables and Observation Constraints by the Features of DEA Domain Programs (Language)
- Statistical package based DEA programs
- Spreadsheet based DEA programs
- Language based DEA codes
Performance of Linear Program (LP): Efficiency and Accuracy
- LP is the critical component of a DEA program
- Approaches to solving LP: Simplex, Interior Point Methods (IPMs)
- Numerous variants of the basic LP approach
DEA Report Format (User Interface Design)
- Results (input, output)
- Graphical display
- Log
Computational Aspects of Large Data Set

Matrix Size for the Data Set in Matrix Format
- Number of rows and columns (variables and observations) allowed by the program
- The storage limit of computer memory, upgrades in computer technology, and the way the data in memory are accessed
Matrix Density
- Number of nonzeros in the matrix: how many zero elements does the matrix contain?
A Computationally Demanding Procedure of DEA due to the LP
- In the worst case, the number of simplex iterations needed to solve a problem grows exponentially with the number of variables and observations
Numerical Difficulties
- Inaccuracy and inefficiency due to floating point arithmetic with finite precision
- Limited numerical precision due to the binary representation of numbers
The Scope of this Study

Performance of DEA code
- Linear Program/Simplex Method
- Computational Technique
- Illustration
Panel Data in DEA
- Malmquist Index Analysis
Efficiency Matters in Stata/DEA/LP

DEA program demands heavy computation
- Computation time heavily depends on the number of observations (DMUs), the number of variables (inputs, outputs), the LP process, etc.
Stata uses RAM (memory) to store data
- The memory size matters for a large data set

The performance of Input Oriented DEA models

Model                 Computation (sec)   Memory
5-2-2-V1              ~20                 1G
5-2-2-V2 (released)   <2                  <300M
5-5-5-V3              <1                  <300M
365-1-5-V1            ?                   6G
365-1-5-V2*           ~14600              6G
365-1-5-V3*           20                  <300M   (under development)
* Stata SE
Major areas revised: two-stage LP; Mata, tolerance; basic feasible solution; revised simplex method

Understanding the difference of computation

Method            Operation                  Pivoting        Pricing   Total
Tableau simplex   Multiplication, division   (m+1)(n-m+1)    -         m(n-m)+n+1
Tableau simplex   Addition, subtraction      m(n-m+1)        -         m(n-m+1)
Revised simplex   Multiplication, division   (m+1)^2         m(n-m)    m(n-m)+(m+1)^2
Revised simplex   Addition, subtraction      m(m+1)          m(n-m)    m(n+1)

What if the number of observations (n) becomes significantly larger than the number of variables (m)?
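The per-iteration counts in the table can be evaluated directly. The sketch below is in Python rather than Stata, purely for illustration, and the problem sizes are hypothetical. Subtracting the two multiplication totals gives n+1-(m+1)^2, so under these formulas the revised method needs fewer multiplications and divisions per iteration once n+1 exceeds (m+1)^2.

```python
def tableau_ops(m, n):
    """Per-iteration operation counts for the tableau simplex method."""
    return {"mul_div": (m + 1) * (n - m + 1), "add_sub": m * (n - m + 1)}


def revised_ops(m, n):
    """Per-iteration operation counts for the revised simplex method."""
    return {"mul_div": m * (n - m) + (m + 1) ** 2, "add_sub": m * (n + 1)}


# m = number of LP rows, n = number of LP columns (hypothetical sizes).
for m, n in [(4, 14), (6, 371), (6, 5000)]:
    t, r = tableau_ops(m, n), revised_ops(m, n)
    print(f"m={m} n={n}: tableau mul/div={t['mul_div']}, revised mul/div={r['mul_div']}")
```

The counts cover a single dense iteration only; they say nothing about the number of iterations or about savings from sparsity.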

Tableau and Revised Simplex in DEA/LP
Data

Store   Employee (input)   Area (input)   Sales (output)   Profit (output)
A       10                 20             70               6
B       15                 15             100              3
C       20                 30             80               5
D       25                 15             100              2
E       12                 9              90               8

Source: Cooper et al. (2006), Table 3-7

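Tableau and Revised Simplex in DEA/LP
For DMU A (Store A): input data Employee = 10, Area = 20; output data Sales = 70, Profit = 6

The Basic DEA Models

Input Oriented, Constant Returns to Scale
\[ \min \theta \quad \text{s.t.} \quad \theta x_A - X\lambda \ge 0, \quad Y\lambda - y_A \ge 0, \quad \lambda \ge 0 \]

Input Oriented, Variable Returns to Scale
\[ \min \theta \quad \text{s.t.} \quad \theta x_A - X\lambda \ge 0, \quad Y\lambda - y_A \ge 0, \quad e\lambda = 1, \quad \lambda \ge 0 \]

Output Oriented, Constant Returns to Scale
\[ \max \eta \quad \text{s.t.} \quad x_A - X\lambda \ge 0, \quad \eta y_A - Y\lambda \le 0, \quad \lambda \ge 0 \]

Output Oriented, Variable Returns to Scale
\[ \max \eta \quad \text{s.t.} \quad x_A - X\lambda \ge 0, \quad \eta y_A - Y\lambda \le 0, \quad e\lambda = 1, \quad \lambda \ge 0 \]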
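As a cross-check of the input oriented CRS model on the five-store data, the following Python sketch finds the optimum by enumerating every basic solution of the standard-form LP with exact fractions. This brute-force enumeration is a stand-in chosen for transparency on a tiny problem; it is not the simplex code used in the Stata program, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Store data from the slide: (employee, area) inputs and (sales, profit) outputs.
INPUTS = {"A": (10, 20), "B": (15, 15), "C": (20, 30), "D": (25, 15), "E": (12, 9)}
OUTPUTS = {"A": (70, 6), "B": (100, 3), "C": (80, 5), "D": (100, 2), "E": (90, 8)}


def solve_square(columns, rhs):
    """Solve B z = rhs exactly by Gauss-Jordan elimination; None if B is singular."""
    n = len(rhs)
    aug = [[Fraction(col[i]) for col in columns] + [Fraction(rhs[i])] for i in range(n)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            return None
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n] for row in aug]


def crs_input_efficiency(dmu):
    """min theta s.t. theta*x_o - X lam - s_in = 0, Y lam - s_out = y_o, all >= 0."""
    x_o, y_o = INPUTS[dmu], OUTPUTS[dmu]
    columns = [[x_o[0], x_o[1], 0, 0]]                                # theta
    columns += [[-INPUTS[j][0], -INPUTS[j][1], OUTPUTS[j][0], OUTPUTS[j][1]]
                for j in INPUTS]                                       # lambda_j
    columns += [[-int(i == k) for i in range(4)] for k in range(4)]    # slacks
    rhs = [0, 0, y_o[0], y_o[1]]
    best = None
    for basis in combinations(range(len(columns)), 4):
        z = solve_square([columns[j] for j in basis], rhs)
        if z is None or any(v < 0 for v in z):
            continue                                   # singular or infeasible basis
        theta = z[basis.index(0)] if 0 in basis else Fraction(0)
        best = theta if best is None else min(best, theta)
    return best


scores = {dmu: crs_input_efficiency(dmu) for dmu in INPUTS}
print(scores)
```

Enumeration examines all 210 candidate bases per DMU, which is only practical at this size; it is meant to confirm the value reported for DMU A, not to replace the LP solver.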

Program Structure
[Figure: program structure diagram]

Program Syntax
dea ivars = ovars [if] [in] [, rts(crs | vrs | drs | irs) ort(in | out) stage(1 | 2) trace saving(filename)]

- rts(crs | vrs | drs | irs) specifies the returns to scale. The default, rts(crs), specifies constant returns to scale.
- ort(in | out) specifies the orientation. The default is ort(in), meaning input-oriented DEA.
- stage(1 | 2) specifies the way to identify all efficiency slacks. The default is stage(2), meaning two-stage DEA.
- trace saves all the sequences displayed in the Results window in the dea.log file. The default is to save only the final results in the dea.log file.
- saving(filename) specifies that the results be saved in filename.dta.

Develop the Basic Data Bank(input oriented CRS)
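Canonical form
\[
\begin{aligned}
\min\ & \theta \\
\text{s.t.}\ & 10\theta - 10\lambda_A - 15\lambda_B - 20\lambda_C - 25\lambda_D - 12\lambda_E \ge 0 \\
& 20\theta - 20\lambda_A - 15\lambda_B - 30\lambda_C - 15\lambda_D - 9\lambda_E \ge 0 \\
& 70\lambda_A + 100\lambda_B + 80\lambda_C + 100\lambda_D + 90\lambda_E \ge 70 \\
& 6\lambda_A + 3\lambda_B + 5\lambda_C + 2\lambda_D + 8\lambda_E \ge 6
\end{aligned}
\]
Standard form
\[
\begin{aligned}
\min\ & \theta \\
\text{s.t.}\ & 10\theta - 10\lambda_A - 15\lambda_B - 20\lambda_C - 25\lambda_D - 12\lambda_E - S_1^- + x_1 = 0 \\
& 20\theta - 20\lambda_A - 15\lambda_B - 30\lambda_C - 15\lambda_D - 9\lambda_E - S_2^- + x_2 = 0 \\
& 70\lambda_A + 100\lambda_B + 80\lambda_C + 100\lambda_D + 90\lambda_E - S_1^+ + x_3 = 70 \\
& 6\lambda_A + 3\lambda_B + 5\lambda_C + 2\lambda_D + 8\lambda_E - S_2^+ + x_4 = 6
\end{aligned}
\]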

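Initial tableau (columns A to E hold the weights of stores A to E)

Basis  Z  θ   A       B       C       D       E    S1-  S2-  S1+  S2+   x1  x2  x3  x4     RHS
Z      1  0   0       0       0       0       0    0    0    0    0     -1  -1  -1  -1     0
x1     0  10  -10     -15     -20     -25     -12  -1   0    0    0     1   0   0   0      0
x2     0  20  -20     -15     -30     -15     -9   0    -1   0    0     0   1   0   0      0
x3     0  0   70      100     80      100     90   0    0    -1   0     0   0   1   0      70
x4     0  0   6       3       5       2       8    0    0    0    -1    0   0   0   1      6

After pricing out the basic columns x1 to x4 (E has the largest entry, 77; minimum ratio 6/8 in row x4)

Basis  Z  θ   A       B       C       D       E    S1-  S2-  S1+  S2+   x1  x2  x3  x4     RHS   MRT
Z      1  30  46      73      35      62      77   -1   -1   -1   -1    0   0   0   0      76
x1     0  10  -10     -15     -20     -25     -12  -1   0    0    0     1   0   0   0      0
x2     0  20  -20     -15     -30     -15     -9   0    -1   0    0     0   1   0   0      0
x3     0  0   70      100     80      100     90   0    0    -1   0     0   0   1   0      70    70/90
x4     0  0   6       3       5       2       8    0    0    0    -1    0   0   0   1      6     6/8

After E enters and x4 leaves (B has the largest entry, 353/8; minimum ratio 10/265 in row x3)

Basis  Z  θ   A       B       C       D       E    S1-  S2-  S1+  S2+   x1  x2  x3  x4     RHS   MRT
Z      1  30  -47/4   353/8   -105/8  171/4   0    -1   -1   -1   69/8  0   0   0   -77/8  73/4
x1     0  10  -1      -21/2   -25/2   -22     0    -1   0    0    -3/2  1   0   0   3/2    9
x2     0  20  -53/4   -93/8   -195/8  -51/4   0    0    -1   0    -9/8  0   1   0   9/8    27/4
x3     0  0   5/2     265/4   95/4    155/2   0    0    0    -1   45/4  0   0   1   -45/4  5/2   10/265
E      0  0   6/8     3/8     5/8     2/8     1    0    0    0    -1/8  0   0   0   1/8    6/8   2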

Model V1: Tableau DEA
[Final tableau iterations: the table on the slide is not recoverable from the extracted text]
Efficiency score (θ) of DMU A is 14/15

Model V3: Revised DEA
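\[
\begin{bmatrix} c & 0 \\ A & b \end{bmatrix}
=
\begin{bmatrix} c_B & c_N & 0 \\ B & N & b \end{bmatrix}
\;\longrightarrow\;
\begin{bmatrix} 0 & c_N - c_B B^{-1} N & c_B B^{-1} b \\ I & B^{-1} N & B^{-1} b \end{bmatrix}
\]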

Initial tableau: nonbasic columns (cN, N) on the left, basic columns (cB, B) on the right

        θ   A    B    C    D    E    S1-  S2-  S1+  S2+  |  x1  x2  x3  x4  |  RHS
cost    0   0    0    0    0    0    0    0    0    0    |  -1  -1  -1  -1  |  0
x1      10  -10  -15  -20  -25  -12  -1   0    0    0    |  1   0   0   0   |  0
x2      20  -20  -15  -30  -15  -9   0    -1   0    0    |  0   1   0   0   |  0
x3      0   70   100  80   100  90   0    0    -1   0    |  0   0   1   0   |  70
x4      0   6    3    5    2    8    0    0    0    -1   |  0   0   0   1   |  6

Step 1: Set up the initial tableau factors.
Step 2: Find the entering variable.
Step 3: Find the leaving variable.
Step 4: Update the tableau. (Update the basis.)

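- 1st step: The initial tableau factors
B = I (4 x 4 identity; columns x1, x2, x3, x4), xB = B^-1 b = (0, 0, 70, 6)', cB = (-1, -1, -1, -1), cB B^-1 = (-1, -1, -1, -1)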
- 2nd step: Finding entering variable

cN - cB B^-1 N: the variable with the maximum value is selected as the entering variable

θ    A    B    C    D    E          S1-  S2-  S1+  S2+
30   46   73   35   62   77 (max)   -1   -1   -1   -1
- 3rd step: Finding leaving variable

Entering column: B^-1 N_E = (-12, -9, 90, 8)'
Min{xB / (B^-1 N_E)} over the positive entries = Min{-, -, 70/90, 6/8} = 6/8, so x4 leaves

- 4th step: Update the tableau
Before the update the basis is (x1, x2, x3, x4). E enters and x4 leaves, so the E column moves from N to B and the x4 column moves from B to N.

After the update: nonbasic columns (cN, N) on the left, basic columns (cB, B) on the right

        θ   A    B    C    D    x4  S1-  S2-  S1+  S2+  |  x1  x2  x3  E    |  RHS (b)
cost    0   0    0    0    0    -1  0    0    0    0    |  -1  -1  -1  0    |  0
x1      10  -10  -15  -20  -25  0   -1   0    0    0    |  1   0   0   -12  |  0
x2      20  -20  -15  -30  -15  0   0    -1   0    0    |  0   1   0   -9   |  0
x3      0   70   100  80   100  0   0    0    -1   0    |  0   0   1   90   |  70
E       0   6    3    5    2    1   0    0    0    -1   |  0   0   0   8    |  6
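The four steps can be traced with a short Python sketch using exact fractions. It assumes, as the cost row of the slide suggests, that x1 to x4 carry cost -1 and that the objective is maximized with the largest reduced cost entering. It performs single iterations only, with no optimality or unboundedness tests, and the names are illustrative rather than taken from the Stata code.

```python
from fractions import Fraction

ROWS = 4
COLUMNS = {  # constraint columns of the standard form for DMU A
    "theta": [10, 20, 0, 0],
    "A": [-10, -20, 70, 6], "B": [-15, -15, 100, 3], "C": [-20, -30, 80, 5],
    "D": [-25, -15, 100, 2], "E": [-12, -9, 90, 8],
    "S1-": [-1, 0, 0, 0], "S2-": [0, -1, 0, 0],
    "S1+": [0, 0, -1, 0], "S2+": [0, 0, 0, -1],
    "x1": [1, 0, 0, 0], "x2": [0, 1, 0, 0], "x3": [0, 0, 1, 0], "x4": [0, 0, 0, 1],
}
COST = {name: (-1 if name.startswith("x") else 0) for name in COLUMNS}
RHS = [0, 0, 70, 6]


def inverse(matrix):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def iterate(basis):
    """One revised simplex iteration: (entering, leaving, reduced costs, new basis)."""
    B = [[COLUMNS[v][i] for v in basis] for i in range(ROWS)]          # step 1
    B_inv = inverse(B)
    c_B = [COST[v] for v in basis]
    w = [sum(c_B[i] * B_inv[i][k] for i in range(ROWS)) for k in range(ROWS)]    # cB B^-1
    x_B = [sum(B_inv[i][k] * RHS[k] for k in range(ROWS)) for i in range(ROWS)]  # B^-1 b
    reduced = {v: COST[v] - sum(w[k] * COLUMNS[v][k] for k in range(ROWS))
               for v in COLUMNS if v not in basis}
    entering = max(reduced, key=reduced.get)                           # step 2
    column = [sum(B_inv[i][k] * COLUMNS[entering][k] for k in range(ROWS))
              for i in range(ROWS)]
    _, row = min((x_B[i] / column[i], i) for i in range(ROWS) if column[i] > 0)  # step 3
    leaving = basis[row]
    new_basis = basis[:row] + [entering] + basis[row + 1:]             # step 4
    return entering, leaving, reduced, new_basis


entering, leaving, reduced, basis = iterate(["x1", "x2", "x3", "x4"])
print(entering, leaving, reduced[entering])
entering2, leaving2, reduced2, basis2 = iterate(basis)
print(entering2, leaving2, reduced2[entering2])
```

Only the 4 x 4 basis inverse is recomputed in each call; a production implementation would update B^-1 incrementally instead of inverting from scratch.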
Tasks to be covered
Computational Accuracy
Example: Obtaining Inverse Matrix
Matrix D

1   1.341099143    -61.13394928   0.4455321    1.883781314   2.587946653
0   0              0              0.0588235    0             0
0   0.116421975    -6.672515869   -0.110761    0.495342732   0.097138606
0   -0.172319263   -19.71403694   -0.262333    -1.54739666   0.074690066
0   -0.046367686   -4.060891628   -0.082268    -0.25169459   0.009800959
0   0.105886854    4.651313305    0.1136269    -0.03722914   0.0158843143
Inverse matrix D by Stata/Mata luinv(D)

. mata
  mata (type end to exit)
: st_view(X=.,.,("a1","a2","a3","a4","a5","a6"))
: b=luinv(X)
: b

     1   2              3              4              5              6
1    1   162470623.2    -4.022811871   -81235305.55   487411816.6    81235289.98
2    0   -147760451.4   -.0871622935   73880208.39    -443281245.5   -73880196.74
3    0   3410527.559    .0078730725    -1705263.586   10231581.38    1705263.517
4    0   16.99999999    -2.95716e-17   -2.76977e-08   1.66186e-07    2.76977e-08
5    0   86785601.44    2.18378179     -43392791.54   260356746.7    43392788.04
6    0   31184842.39    .1960047586    -15592418.13   93554511.28    15592419.02
D*D^-1 in Stata/Mata (default tolerance)
[6 x 6 product matrix: the diagonal entries are close to 1 (for example 1.000000003, 0.999999989, 1.000000001) and the off-diagonal entries are small but nonzero, with magnitudes up to 7.45E-08]
Should it be the identity matrix?
D*D^-1 in Excel
[6 x 6 product matrix: the diagonal entries are close to 1 (for example 0.999999999, 0.999999996) and the off-diagonal entries are small but nonzero, with magnitudes up to 5.96046E-08]
Where does the computational inaccuracy come from?
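The effect can be reproduced outside Stata and Excel. The Python sketch below inverts an ill-conditioned matrix once in double precision and once in exact rational arithmetic, then measures how far D*D^-1 is from the identity. A 6 x 6 Hilbert matrix is used as a stand-in for D, since it is a standard ill-conditioned example; the inversion routine is a plain Gauss-Jordan elimination, not Mata's luinv().

```python
from fractions import Fraction


def inverse(matrix):
    """Gauss-Jordan inverse with partial pivoting; works for float or Fraction entries."""
    n = len(matrix)
    one = type(matrix[0][0])(1)
    aug = [list(row) + [one * int(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def max_deviation_from_identity(matrix, inv):
    """Largest absolute entry of (matrix * inv - I)."""
    n = len(matrix)
    return max(abs(sum(matrix[i][k] * inv[k][j] for k in range(n)) - int(i == j))
               for i in range(n) for j in range(n))


n = 6
exact = [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]  # Hilbert matrix
approx = [[float(v) for v in row] for row in exact]

dev_exact = max_deviation_from_identity(exact, inverse(exact))
dev_float = max_deviation_from_identity(approx, inverse(approx))
print("exact arithmetic:", dev_exact)
print("double precision:", dev_float)
```

In exact arithmetic the product is the identity, so any deviation seen in the floating point run comes from rounding rather than from the algorithm.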
One of the possible reasons: Decimal and Binary numbers
17 (decimal number) = 10001 (binary number)
Divide by 2 repeatedly and record the remainders: 17/2 leaves 1, 8/2 leaves 0, 4/2 leaves 0, 2/2 leaves 0, 1/2 leaves 1; reading the remainders from last to first gives 10001
0.75 (decimal) = 0.11 (binary)
0.7 (decimal) = 0.101100110011... (binary)
0.6 (decimal) = 0.100110011001... (binary)
0.10 (decimal) = 0.000110011001... (binary)
0.05 (decimal) = 0.000011001100... (binary)
How does a computer store a = 0.75, b = 0.7 + 0.05, c = 0.6 + 0.1 + 0.05?
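A small Python sketch makes the question concrete. It assumes a toy format that keeps only 12 binary digits after the point, matching the expansions shown on the slide; real double precision keeps 53 significant bits, so the same truncation happens at a much smaller scale. The helper names are illustrative.

```python
from fractions import Fraction

BITS = 12  # the slide shows 12 binary digits; doubles keep 53 significant bits


def to_binary(decimal_text, bits=BITS):
    """First `bits` binary digits of a decimal fraction 0 <= x < 1 given as text."""
    x = Fraction(decimal_text)
    digits = []
    for _ in range(bits):
        x *= 2
        digit = int(x >= 1)
        digits.append(str(digit))
        x -= digit
    return "0." + "".join(digits)


def truncate(decimal_text, bits=BITS):
    """Value actually kept when only `bits` binary digits are stored."""
    scale = 2 ** bits
    return Fraction(Fraction(decimal_text) * scale // 1, scale)


a = truncate("0.75")
b = truncate("0.7") + truncate("0.05")
c = truncate("0.6") + truncate("0.1") + truncate("0.05")

print(format(17, "b"))                      # integer conversion
for text in ("0.75", "0.7", "0.6", "0.10", "0.05"):
    print(text, "=", to_binary(text))
print("a =", a, " b =", b, " c =", c)       # three different stored values
print(0.1 + 0.2 == 0.3)                     # same effect in double precision
```

With 12 stored digits, a, b and c differ even though all three equal 0.75 on paper, because 0.7, 0.6, 0.1 and 0.05 have no finite binary expansion.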
Accuracy
Tolerance
- to set an upper or lower limit on the number of iterations
- to stop an unattended run if the algorithm falls into a cycle
Preprocessing: Scaling
- to narrow the numerical gap and get a safe solution
Ex) Rank(D)
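How a tolerance changes a result such as Rank(D) can be sketched in Python. The routine below counts pivots larger than a given tolerance during Gaussian elimination; it is an illustrative implementation with a hypothetical 2 x 2 test matrix, not the rank routine of Stata/Mata.

```python
def numeric_rank(matrix, tol):
    """Number of pivots with absolute value above `tol` (partial pivoting)."""
    a = [[float(v) for v in row] for row in matrix]
    rows, cols = len(a), len(a[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) <= tol:
            continue                      # treat this column as numerically zero
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, rows):
            factor = a[r][c] / a[rank][c]
            a[r] = [x - factor * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


# Second row is almost exactly twice the first (hypothetical data).
nearly_singular = [[1.0, 2.0], [2.0, 4.0000000001]]
print(numeric_rank(nearly_singular, tol=1e-12))   # tight tolerance
print(numeric_rank(nearly_singular, tol=1e-6))    # loose tolerance
```

The same matrix is reported as full rank or rank deficient depending on the tolerance, which is why the tolerance setting matters for the LP steps that rely on nonzero pivots.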
Part II. Malmquist Index Analysis with the Panel Data

Basic Concept of Malmquist Index
The Malmquist Productivity Index (MPI) measures productivity change over time and can be decomposed into changes in efficiency and changes in technology.

The input oriented MPI can be expressed in terms of input oriented CRS efficiency, as in Equations 1 and 2, using the observations at time t and t+1.

The input oriented geometric mean of MPI can be decomposed into input oriented technical change and input oriented efficiency change, as given in Equation 4.
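The equations themselves are not reproduced in this text, so the Python sketch below uses the commonly cited geometric-mean form written in terms of four CRS efficiency scores. This is an assumption about what the referenced equations contain; sign and inversion conventions vary between authors, and the input values are hypothetical.

```python
from math import isclose, sqrt


def malmquist(e_t_t, e_t_t1, e_t1_t, e_t1_t1):
    """Malmquist index and its decomposition from four efficiency scores.

    e_a_b is the efficiency of the period-b observation measured against
    the period-a frontier (a, b in {t, t1}, where t1 stands for t+1).
    """
    efficiency_change = e_t1_t1 / e_t_t
    technical_change = sqrt((e_t_t1 / e_t1_t1) * (e_t_t / e_t1_t))
    mpi = sqrt((e_t_t1 / e_t_t) * (e_t1_t1 / e_t1_t))
    return mpi, efficiency_change, technical_change


# Hypothetical scores for one DMU.
mpi, ec, tc = malmquist(e_t_t=0.8, e_t_t1=1.0, e_t1_t=0.64, e_t1_t1=0.8)
print(mpi, ec, tc)
```

Whatever the convention, the index equals the product of the efficiency change and technical change terms, which gives a quick consistency check on computed results.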
The User written command malmq

Program Syntax
malmq ivars = ovars [if] [in] [, ort(in | out) period(varname) trace saving(filename)]

- ort(in | out) specifies the orientation. The default is ort(in), meaning input-oriented DEA.
- period(varname) identifies the time variable.
- trace saves all the sequences displayed in the Results window in the malmq.log file. The default is to save only the final results in the malmq.log file.
- saving(filename) specifies that the results be saved in filename.dta.

Example
[Slides showing the example data and the malmq results; the tables are not preserved in the extracted text]
Notes
The data and code related to the presentation will be available from the Conference website.
References
Cooper, W. W., Seiford, L. M., & Tone, K. (2006). Introduction to Data Envelopment Analysis and Its Uses. Springer Science+Business Media.
Ji, Y., & Lee, C. (2010). Data Envelopment Analysis. The Stata Journal, 10(2), 267-280.
Lee, C., & Ji, Y. (2009). Data Envelopment Analysis in Stata. DC09 Stata Conference.
Maros, I. (2003). Computational Techniques of the Simplex Method. Kluwer Academic Publishers.
