
PyPy benchmarks in Principal Component Analysis (PCA)

(DRAFT)

Progress is always about changes

Valery A. Khamenya
May 19, 2012

This report is a quick attempt to apply Principal Component Analysis (PCA) to
benchmarking measurements of PyPy, the fastest (as of 2012) implementation of the
quite nice language Python. The target audience is anyone who loves PyPy and
perhaps hates statistics ;)
The factors that influence the variance in the benchmarking data are (the more powerful
ones tend to go first):
1. progress over time (huge!)
2. 32-bit vs 64-bit
3. the battle between sympy_expand/sympy_str and twisted_iteration
4. the opposition of ai vs the spitfire-related tests
5. crypto_pyaes vs the rest
6. spectral-norm vs the rest
7. meteor-contest vs the rest
8. fannkuch vs the rest
9. nbody_modified vs the rest

1 What is it good for?


In short, PCA helps to reveal the major factors causing variation in the data. This way one could
do the following:
1. guess what has been going on in PyPy over the last year from a benchmarking perspective;
2. figure out the biggest hidden games behind the PyPy efforts: what is primarily addressed, what the priorities are;
3. find benchmarking tests that are too similar to each other and probably just add
redundancy;
4. estimate how much a test increases the representativeness/coverage of the benchmarking suite
relative to the others;
5. decrease redundancy in the set of benchmarking tests that are used to state the final
speed-up of PyPy over CPython, and therefore let people outside the PyPy world rely
more on the PyPy speed-up factor at http://speed.pypy.org;
6. see what influence a particular source control revision had on the factors;
7. guess what kind of operations a new test mostly stresses (a group of simplistic
benchmarking unit tests would be needed for this, though: float arithmetic, integer
arithmetic, strings, dictionaries, loops, recursion, flow control, exception handling, OOP,
garbage collection, multiprocessing, multithreading, I/O, arrays, etc.)
One should clearly understand, however, that all conclusions about a single test are made based
on the analysis of the variation of its benchmarking measurements, i.e. on changes across multiple measurements. In that sense PCA is like a snake that could miss cold, motionless prey. And of course
this snake is only effective if one has a vector of measurements, e.g. multiple measurements for
a single benchmarking test across different git revisions or, alternatively, multiple measurements
from different benchmarking tests for one given git revision.

2 Details to skip during 1st reading


The measurements we analyze represent PyPy progress from Jan 2010 (the svn epoch) to May 2012
(the git epoch). First of all, let's read the data into a matrix. To keep things simple, let's respect
and consider only those benchmarking tests that have at least 300 measurements. Then let's
heartlessly kick out those benchmarking rounds where even one of these respected benchmark tests
failed to produce a time measurement.
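The actual loading code is not reproduced in this extract. A minimal sketch, assuming (hypothetically) a long-format CSV with columns revision, benchmark and time; the file name and column names are assumptions, not the report's actual input:
> raw <- read.csv("pypy-benchmarks.csv", stringsAsFactors = FALSE)  # assumed file and layout
> keep <- names(which(table(raw$benchmark) >= 300))                 # tests with at least 300 measurements
> raw <- raw[raw$benchmark %in% keep, ]
> m <- tapply(raw$time, list(raw$revision, raw$benchmark), mean)    # rounds x tests matrix
> m <- m[complete.cases(m), ]                                       # drop rounds where any respected test is missing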
Oops, the svn epoch seems to have been kicked out entirely, but we won't worry about that for the moment. Each column of the
benchmarking data matrix represents measurements for one benchmarking test. Here is, e.g., the
top-left part of the matrix:
> m[1:5, 1:4]
                             ai bm_chameleon   bm_mako     chaos
48277:39882f1dfd15-64 0.7149890    0.1392867 0.2094372 0.4828594
48277:39882f1dfd15    0.7056351    0.1474193 0.2078679 0.4821176
48354:10f7167b3e98-64 0.7221774    0.1460866 0.2096604 0.4682653
48354:10f7167b3e98    0.7279082    0.1362366 0.1952811 0.4751145
48400:adab424acda7-64 0.7506032    0.1451451 0.2010512 0.4759733

The fun thing about PCA is that we could analyse the data matrix and then ... transpose it and get
an alternative point of view. We'll discuss that later.
Currently we have 31 benchmarking tests and 332 benchmarking rounds. In terms of PCA this means
we could dream of up to 31 factors that we might manage to discover.
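These counts are simply the dimensions of the matrix:
> dim(m)    # 332 benchmarking rounds x 31 benchmarking tests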

3 Just to feed your interest before we go...


Statistical analysis is often underestimated. Just to feed the interest of those who are too far
away from all the PCA, ICA and whatever-else approaches, let's apply PCA to the data matrix with
benchmarking measurements as-is.
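The exact invocation is not shown in this extract; a minimal sketch of such an as-is PCA and of the PC1 x PC2 scatter shown in the next subsection (the 32/64 labels are assumed to come from the "-64" suffix on the row names):
> p <- prcomp(m)
> plot(p$x[, 1], p$x[, 2], type = "n", xlab = "PC1", ylab = "PC2", main = "PC1 x PC2")
> text(p$x[, 1], p$x[, 2], labels = ifelse(grepl("-64$", rownames(m)), "64", "32"))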

3.1 32 vs 64: an easy Factor Nr. 2


And here are the two strongest factors that influence the variation in the benchmarking measurements.

[Figure: PC1 x PC2. Each point is a benchmarking round, labeled "32" or "64" according to its target platform.]

Each point represents a round of benchmarking measurements for the corresponding PyPy
source control revision and CPU platform. By the way, factors in PCA are often called Principal
Components (PC), hence PC1, PC2, etc.
To give an idea about the second factor (Y-axis), the benchmarking rounds are marked with
64 for the 64-bit target platform and, similarly, with 32 for the 32-bit one. Of course, PCA is not
able to interpret factors completely or even magically name them right ;) However, it does help
us to sort them by the influence they have on the variance of the benchmarking measurements.

The 32-vs-64 split was just an easy puzzle about the second-strongest factor in the measurement variance.
How strong is it? The following graph shows the contribution of each factor to the squared variance of the
data.
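The plotting call itself is not reproduced in this extract; the standard scree plot of a prcomp object would look something like this, and the zoomed variant below simply limits the number of components (e.g. npcs = 7):
> plot(p, main = "Ordered factors, major factors are well-separated")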

[Figure: Variances. Ordered factors; the major factors are well separated.]

An important thing to know about this graph: the larger the difference between the factors,
the better the algorithmic separation of the factors (and the more reliable the interpretation).
Zoom in on the first 7:

[Figure: Variances (zoom). Ordered factors; the major factors are well separated.]

3.2 No 64-bit anymore, and Factor Nr. 1


OK, let's kick out the 64-bit data for the simplicity of the next steps. The graphs will be sparser and it
will be easier to find interesting things during our first approach to the data.
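The vector bitSuffixes used in the next line is not defined in this extract; one plausible sketch, assuming the "-64" suffix on the revision row names marks the 64-bit rounds:
> bitSuffixes <- ifelse(grepl("-64$", rownames(m)), "64", "32")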
> m <- m[bitSuffixes == "32", ]
Thus, don't get confused: PC2 will no longer refer to the 32-vs-64 story!

[Figure: PC1 x PC2 (no 32/64 anymore). Each point is a benchmarking round labeled with its source control revision.]

What is special about this stand-alone cluster of git revisions, in which, e.g., 48761:98bf21b80fc5
or, say, 48354:10f7167b3e98 are the most extreme points? The answer explains benchmarking measurement variance Factor Nr. 1 in terms of git revisions.
It simply looks like there was a considerable qualitative speed-up after the revisions in the range
around 48277-49500, i.e. after Nov/Dec 2011. Well, the first two factors were not that
interesting, but for those who never saw PCA before it was probably fun.

4 No old data
Let's kick out the old data and focus on the recent changes after git revision 49600.
> isRecent <- sapply(strsplit(rownames(m), ":"), function(x) as.numeric(x[1])) > 49600
> m <- m[isRecent, ]
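Presumably the PCA is recomputed on the reduced matrix after each such filtering step, e.g.:
> p <- prcomp(m)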
Do we have a major factor well-separated from the secondary one?

[Figure: Variances. No 64-bit data, no old data.]

[Figure: PC1 x PC2 (no 32/64 anymore, no old data). Each point is a benchmarking round labeled with its source control revision.]

PC1 (the X-axis) is rather about progress over time, but what about the Y-axis? What are its poles,
52382:00b830d7bd6a and 54509:b58494d41466?

5 Flip-flop!
As mentioned, we could transpose the data matrix to see things from a different point of view.
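The transposition step itself is not shown in this extract; a minimal sketch (the object name tm is taken from the code used later in this section):
> tm <- t(m)
> p <- prcomp(tm)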

[Figure: PC1 x PC2, point is a benchmarking test.]

What is special about json_bench or html5lib? Nothing much interesting: they always show a
higher avg_changed than the others. Let's normalize the avg_changed range for each test.
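The normalization code is not reproduced here; one plausible reading (an assumption, not necessarily the report's actual approach) is min-max scaling of each test's measurements:
> tm <- t(apply(tm, 1, function(x) (x - min(x)) / (max(x) - min(x))))  # scale each test (row) to [0, 1]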

[Figure: Variances. The first factors are well separated, hurray! But the others... :( ]

[Figure: PC1 x PC2, point is a benchmarking test (normalized).]

So the main battle of the last PyPy year seems to be between sympy_expand/sympy_str
and twisted_iteration.
The second big opposition is ai vs the spitfire-related tests.
Well, the problem is that these first two battles are the only well-recognizable factors.
OK, let's kick out these measurements to hear the rest of the chorus:
> bigSolo <- c("ai", "sympy_expand", "sympy_str", "twisted_iteration",
+     "slowspitfire", "spitfire_cstringio", "spitfire")
> toTake <- sapply(rownames(tm), function(r) !(r %in% bigSolo))
> tm <- tm[toTake, ]
Are the top factors separable now?

[Figure: Variances. No big solo; the poles changed a bit, but not much.]

[Figure: PC1 x PC2, no big solo, point is a benchmarking test (normalized).]

Let's kick out 3 more tests:


> backSolo <- c("spectral-norm", "crypto_pyaes", "meteor-contest")
> toTake <- sapply(rownames(tm), function(r) !(r %in% backSolo))
> tm <- tm[toTake, ]
Are the top factors separable now?

[Figure: Variances. No back solo either; the poles changed a bit, but not much.]

[Figure: PC1 x PC2, no back solo either, point is a benchmarking test (normalized).]

Let's kick out 2 more tests:


> backSolo2 <- c("fannkuch", "nbody_modified")
> toTake <- sapply(rownames(tm), function(r) !(r %in% backSolo2))
> tm <- tm[toTake, ]
Are the top factors separable now?

[Figure: Variances. No back solo 2 either; the poles changed a bit.]

[Figure: PC1 x PC2, no back solo 2 either, point is a benchmarking test (normalized).]

6 Appendix
An example of non-recognizable PCA factors, obtained from a 100 x 31 matrix of normally distributed
random data.
> p <- prcomp(matrix(rnorm(31 * 100), ncol = 31))
> plot(p, npcs = 31, main = "really a bad case, factors can't be separated")

[Figure: Variances. Really a bad case; the factors can't be separated.]

This report is generated using LaTeX, Sweave and R.

