
Comparison of the Maps of Science¹

Tomas Cahlik²

The aim of this article is to describe some methods of comparison of maps of science and to show
possibilities that these methods give for further research in this interesting area.

Keywords: co-word analysis, citation analysis, maps of science, science dynamics

Introduction

Two methods can be used to create maps of science: coword analysis [1,2,3,4,5,9] and citation
analysis [6,8]. This article begins with a short summary of both. In the main body of the article we
address two problem areas. The first is the comparison of maps of one scientific field obtained from
publications issued in different time periods. This comparison enables an analysis of the dynamics of evolution
of a scientific field. We will compare maps obtained by coword analysis. The main methodological problem of
this comparison is the identification of a specific research theme on various maps: not only can a theme
change its position on the map, but the keywords that form a theme usually change as well. The solution
is to define as identical those themes on various maps whose number of common keywords exceeds a
(subjectively) given threshold³. The second problem area is the comparison of maps of one
scientific field in a certain time period, when the maps have been obtained by different methods (coword analysis
and citation analysis). This comparison brings new insights into the interpretation of maps of science⁴. It poses
two main methodological problems. The first is that coword analysis and citation analysis use different types of
map presentation. We solve this problem by converting the map obtained with citation analysis into the
presentation used by coword analysis⁵. The second is again the identification of a specific research theme on
various maps. In coword analysis themes are formed by keywords; in citation analysis they are formed by
citations. Therefore we cannot use the same trick as in the first problem area. The solution consists of the
following steps. For each theme we find the relevant articles. In coword analysis, relevant articles are those
whose number of keywords in common with the theme exceeds a (subjectively) given threshold. In citation
analysis, relevant articles are those whose number of citations in common with the theme exceeds a
(subjectively) given threshold. We then define as the same theme those themes on various maps whose number
of common articles exceeds a (subjectively) given threshold.
The conclusion of this article summarizes some results implied by the comparison of maps of science.
These results concern the dynamics of scientific fields and new possibilities for the interpretation of maps of science.

1
Results of grant 402/00/0999 Research and Development in Economic Growth Models, Grant Agency of the
Czech Republic are used in this article.
2
Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies, Opletalova 26, Praha.

E-mail: Cahlik@mbox.fsv.cuni.cz

3
Let us imagine an analogous problem: how could we recognize villages that have been moved by a giant to
a different place on maps from various eras of Lilliput's Empire? We would have to look at the houses that
form the villages, and if we found more than, for example, 50 % of the same houses, it would be the same
village.
4
Let us use another analogy from cartography here. It can be quite difficult to tell which parts of the
world are depicted on an old geographical map. The reason is mainly the shortage of geographical facts at the
time of its creation. The interpretation of scientific maps is analogous to the problem of recognizing the parts of
the world on an old geographical map. (A question then is whether a map is so imprecise that it is useless.)
5
The shape of Australia looks different on a globe than on a plane. The reason is that two different
projections are used. The conversion of a map obtained with citation analysis into the coword analysis standard
is analogous to a change of projection in geographical maps.
Methods of Mapping Science: Coword Analysis and Citation Analysis
Coword Analysis

Articles from journals that are mostly used by scientists in a specific research field are excerpted for a
certain time period. Let us suppose that the resulting database describes the main development of a research field
well. Keywords used to describe the contents of an article are the basic building blocks of a research field
structure. A cluster of keywords can be understood as a short description of a research theme. A research field is
then described as a structure of mutually connected research themes.
A research theme can be identified by using the information about common occurrences of keywords in
some articles. Let us calculate an "association index" as:
$$e_{ij} = \frac{f_{ij}^{2}}{f_i \, f_j},$$
where $f_{ij}$ is the number of articles in which keywords $i$ and $j$ occur together, and $f_i$ and $f_j$ are
the total numbers of occurrences of keywords $i$ and $j$ in all the articles. We can understand the association
index as a measure of the strength of ties between keywords in a research field. This measure is then used for
clustering the keywords into research themes.
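The association index can be illustrated with a small sketch. It assumes that each article is given as a list of keywords and that a keyword is counted once per article; the function name `association_indices` and the toy keyword lists are hypothetical and are not taken from LEXIDYN.

```python
from collections import Counter
from itertools import combinations


def association_indices(articles):
    """Return {(i, j): e_ij} for every keyword pair that co-occurs at least once.

    `articles` is a list of keyword collections, one per article.
    """
    occurrences = Counter()      # f_i: number of articles containing keyword i
    co_occurrences = Counter()   # f_ij: number of articles containing both i and j
    for keywords in articles:
        unique = sorted(set(keywords))          # count each keyword once per article
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))
    return {
        (i, j): f_ij ** 2 / (occurrences[i] * occurrences[j])
        for (i, j), f_ij in co_occurrences.items()
    }


# Toy data, invented for illustration only.
toy_articles = [
    ["runoff", "simulation", "uncertainty"],
    ["runoff", "simulation"],
    ["uncertainty", "design"],
]
indices = association_indices(toy_articles)
```

Two keywords that always appear together, such as "runoff" and "simulation" in the toy data, reach the maximum index of 1; pairs that never co-occur are simply absent from the result.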
Each research theme obtained in this process has two parameters. The first one, called "density",
measures the strength of internal ties among all keywords describing the research theme. We can understand this
parameter as a measure of the theme development. The second one, called "centrality", measures the strength of
external ties to other themes. We can understand this parameter as a measure of importance of a theme in the
development of the entire analyzed field.
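The text does not give formulas for the two parameters, so the following sketch rests on an assumption: density is taken as the mean association index over all keyword pairs inside a theme, and centrality as the sum of association indices between the theme's keywords and all keywords outside it. Other definitions are possible, and the function names and toy values are hypothetical.

```python
from itertools import combinations


def tie(indices, a, b):
    """Look up the association index of an unordered keyword pair (0.0 if absent)."""
    return indices.get((a, b), indices.get((b, a), 0.0))


def density(theme, indices):
    """Assumed definition: mean association index over all pairs inside the theme."""
    pairs = list(combinations(sorted(theme), 2))
    if not pairs:
        return 0.0
    return sum(tie(indices, a, b) for a, b in pairs) / len(pairs)


def centrality(theme, all_keywords, indices):
    """Assumed definition: sum of ties between theme keywords and outside keywords."""
    outside = set(all_keywords) - set(theme)
    return sum(tie(indices, a, b) for a in theme for b in outside)


# Toy association indices, invented for illustration only.
toy_indices = {
    ("runoff", "simulation"): 1.0,
    ("runoff", "uncertainty"): 0.25,
    ("simulation", "uncertainty"): 0.25,
    ("design", "uncertainty"): 0.5,
}
keywords = {"runoff", "simulation", "uncertainty", "design"}
theme = {"runoff", "simulation"}
theme_density = density(theme, toy_indices)
theme_centrality = centrality(theme, keywords, toy_indices)
```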
Both median and mean values of density and centrality can be used to classify themes into four
groups. A research field can then be understood as a set of research themes mapped in a "strategic
diagram": a graph made by plotting themes along two axes, centrality on the x-axis and density on the y-axis,
according to their rank values (if we use the median for classifying clusters) or their values (if we use the
mean). Strategic diagrams with rank values are used more commonly than those with values because they are
more legible. Themes in the first quadrant are both well developed and important for the structuring of a research field.
Themes in the fourth quadrant have well developed internal ties but unimportant external ties and so are of only
marginal importance for the field. Themes in the third quadrant are both weakly developed and marginal.
Themes in the second quadrant are important for a research field but are not developed. We use the clockwise
convention of quadrant numbering and hope it will not cause any misunderstanding later in the text⁶.
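The clockwise quadrant convention can be written down as a short sketch. It assumes that each theme is given as a (centrality, density) pair and that a value exactly on the cut-off counts as low; the function name and the toy values are hypothetical.

```python
from statistics import mean, median


def classify_themes(themes, use_median=True):
    """Map theme id -> quadrant (1..4), numbered clockwise as in the text.

    `themes` maps a theme id to a (centrality, density) pair.
    """
    centre = median if use_median else mean
    c_cut = centre([c for c, _ in themes.values()])
    d_cut = centre([d for _, d in themes.values()])
    quadrants = {}
    for theme_id, (c, d) in themes.items():
        central, dense = c > c_cut, d > d_cut
        if central and dense:
            quadrants[theme_id] = 1   # central and well developed
        elif central:
            quadrants[theme_id] = 2   # central but not developed
        elif dense:
            quadrants[theme_id] = 4   # developed but marginal
        else:
            quadrants[theme_id] = 3   # weakly developed and marginal
    return quadrants


# Toy values, invented for illustration only.
toy_themes = {"A": (0.9, 0.8), "B": (0.7, 0.1), "C": (0.2, 0.2), "D": (0.1, 0.9)}
quadrants = classify_themes(toy_themes)
```

Switching `use_median` to `False` moves the cut-offs to the means, which can shift themes that lie near the axes into a neighbouring quadrant.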
To be more specific, we show the application of the coword method to the scientific field Water Resources.
In 1997, the Institute for Scientific Information in Philadelphia, USA, covered 42 journals in this field. We
excerpted articles from the 20 of these journals with the highest impact factor for five years, 1994-1998.
We thus obtained 1481 articles for 1994, 1626 for 1995, 2096 for 1996, 1664 for 1997
and 1808 for 1998. We then analysed these articles with the help of the computer program
LEXIDYN⁷.

The strategic diagram for the first period is shown in Fig. 1.

6
A detailed description can be found for example in [4].
7
You can find this application on http://tucnak.fsv.cuni.cz/~cahlik/ under Economic Growth.

[Figure not reproduced; the recoverable theme labels are 8, 36, 20, 35 and 15.]

Fig. 1: Strategic Diagram for the 1st period.

All the themes in Fig. 1 are clusters of keywords. For example, theme 8 in the first quadrant is the
cluster of the following keywords: design, immiscible fluids, unsaturated zone, filamentous organism,
sensitivity theory, floc forming organis, surface, population dynamics, uncertainty, simulation.

Citation Analysis
Maps of science based on citation analysis are products of the Institute for Scientific Information (ISI)
in Philadelphia, USA. In this case clusters are formed not by keywords but by reference
citations, that is, by cited articles. ISI has been creating maps of the whole of science since 1983, combining
the SCI and SSCI databases. This makes it possible to find on the map even connections between the sciences
and the social sciences⁸.
There are many problems connected with mapping the whole of science. The first is that the average
number of citations differs between scientific fields. The solution lies in so-called fractional citation
counting. For the presentation of the map, ISI uses the multidimensional scaling technique. The distances
between nodes show the strength of ties between themes: the shorter the distance, the stronger the tie. The
map also distinguishes the size of clusters, given by the number of source citations. A small part of a fictitious
map of science can then look as in Fig. 2⁹.
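The text does not spell out how fractional citation counting is computed. One common formulation, assumed here, gives every citing article a total weight of one and shares it equally among its references, so that articles from fields with long reference lists do not dominate. The function name and the toy reference lists are hypothetical.

```python
from collections import defaultdict


def fractional_citation_counts(reference_lists):
    """Give each citing article one vote, shared equally among its references."""
    counts = defaultdict(float)
    for references in reference_lists.values():
        unique = set(references)
        for cited in unique:
            counts[cited] += 1.0 / len(unique)
    return dict(counts)


# Toy data, invented for illustration only: two citing articles and their references.
toy_references = {
    "article_1": ["ref_a", "ref_b"],
    "article_2": ["ref_a", "ref_b", "ref_c", "ref_d"],
}
weights = fractional_citation_counts(toy_references)
```

With whole counting "ref_a" would receive 2 citations; with the fractional scheme it receives 0.5 + 0.25 = 0.75.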

8
With coword analysis, usually maps of specific scientific fields are created.
9
A detailed description can be found for example in [6].

[Figure not reproduced; the recoverable cluster labels are 1, 4, 5 and 3.]

Fig. 2: A small part of a fictive map of science.


The strongest tie is between clusters 2 and 4, the weakest between clusters 5 and 3. Cluster 5 has the most
source citations and cluster 3 the fewest.

Comparison of both methods


Coword analysis was developed by French scientometricians who knew the work of ISI quite
well. In principle they use the same algorithm for the creation of clusters as ISI. For the French, the main
advantage of using keywords was that keywords can also be found in databases other than SCI and SSCI.
The French elaboration is the strategic diagram with density and centrality; ISI does not compute these
two parameters for clusters and presents maps using multidimensional scaling. The question is how
contradictions in the lengths of ties are resolved. In Fig. 3a, the distance between clusters 7 and 8 is
implied by their relative position, which is given by the other distances among all the clusters. If the strength
of the tie between clusters 7 and 8 obtained directly from citations differs from this distance, all the distances
must be adjusted in some way or we must move into a higher dimensional space (Fig. 3b). ISI reports that in
principle a two dimensional projection is possible. This would be a very interesting empirical finding, and it
would be interesting to find its explanation.
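Whether four clusters with given mutual distances fit into a plane can be checked directly. The sketch below is not ISI's procedure; it uses the Cayley-Menger determinant, a standard result from distance geometry, which for four points equals 288 times the squared volume of the tetrahedron they span. A value of zero means the points can lie in a plane, a positive value means a third dimension is needed, and a negative value means the distances cannot be realised in Euclidean space at all. The function names and the two toy distance sets are hypothetical.

```python
def determinant(matrix):
    """Determinant by Laplace expansion along the first row (fine for a 5x5 matrix)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, value in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * value * determinant(minor)
    return total


def cayley_menger(squared):
    """Cayley-Menger determinant of four points from their squared distances.

    `squared[(a, b)]` holds the squared distance for a < b, with a, b in 0..3.
    """
    def d2(a, b):
        return 0 if a == b else squared[(min(a, b), max(a, b))]

    matrix = [[0, 1, 1, 1, 1]]
    for a in range(4):
        matrix.append([1] + [d2(a, b) for b in range(4)])
    return determinant(matrix)


# Squared distances between the corners of a unit square: the points fit in a plane.
square = {(0, 1): 1, (0, 2): 2, (0, 3): 1, (1, 2): 1, (1, 3): 2, (2, 3): 1}
# All six distances equal: only a regular tetrahedron in 3D can realise them.
equal = {pair: 1 for pair in square}
flat_value = cayley_menger(square)
solid_value = cayley_menger(equal)
```

For measured tie strengths the determinant will rarely be exactly zero, so in practice a tolerance would be needed; multidimensional scaling instead adjusts all distances to find the best planar compromise.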

[Figure not reproduced. Panel a) plane chart with clusters 5, 6, 7 and 8; panel b) 3D chart with axes x, y and z,
in which clusters 5, 7 and 8 are in the plane xy and cluster 6 is above that plane.]

Fig. 3: Problem of space projection.


Coword analysis identifies the mutual strengths of scientific themes. From the point of view of time, all the
themes in the strategic diagram are equally current. Citation analysis stresses the mechanisms by which
scientific themes emerge. According to the age of citations, clusters showing more current problem areas can be
distinguished.

Comparison of maps of one scientific field obtained by coword analysis in various periods

Scientific fields evolve in time. Let us again use the application of coword analysis to the scientific field
Water Resources. The strategic diagram for the fifth period (year 1998, Fig. 4) looks quite different from the
diagram for year 1994 (Fig. 1).

[Figure not reproduced; the recoverable theme labels are 92, 9, 20, 1, 211, 99, 6, 10, 11, 46, 15, 39, 7 and 2.]

Fig. 4: Strategic diagram for the fifth period.


Some themes live over several periods. Fig. 5 shows chains of themes; each chain describes the
dynamics of evolution of one scientific theme¹⁰. Threshold 3 means that themes in consecutive periods have at
least three keywords in common. So in Fig. 5 we can read, for example, that theme 4 in the first period
transformed into theme 3 in the second, third and fourth periods and into theme 9 in the fifth period¹¹.

10
Recall that we define as the same theme those themes on various maps whose number of
common keywords exceeds a subjectively given threshold (three in this case).
11
The numbers are assigned to the themes by the program Lexidyn itself, according to the most important keyword.
Only an expert in the specific scientific field can make the interpretation.

CHAINS OF THEMES
THRESHOLD : 3

*********************************************
4 -> 3 -> 3 -> 3 -> 9
*********************************************
11 -> 11
*********************************************
18 -> 18
*********************************************
2 -> 2
2 -> 2
*********************************************
15 -> 25
*********************************************
6 -> 19 -> 6
*********************************************
35 -> 46
*********************************************
88 -> 39 -> 64 -> 92
88 -> 39 -> 64 -> 39
*********************************************
36 -> 1 -> 1
1 -> 1

Fig. 5: Evolution of themes in time.
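How such chains could be built is sketched below. This is not the Lexidyn implementation; it assumes only what the text states, namely that themes in consecutive periods are linked when they share at least `threshold` keywords. Each period is given as a dictionary from theme number to keyword set, and the function names and toy keywords are hypothetical.

```python
def link_themes(earlier, later, threshold=3):
    """Return (earlier_id, later_id) pairs sharing at least `threshold` keywords."""
    return [
        (a, b)
        for a, words_a in earlier.items()
        for b, words_b in later.items()
        if len(set(words_a) & set(words_b)) >= threshold
    ]


def chains(periods, threshold=3):
    """Follow links through consecutive periods and return chains of theme ids."""
    open_chains = []   # chains ending in the most recently processed period
    finished = []      # chains that found no successor
    for earlier, later in zip(periods, periods[1:]):
        links = link_themes(earlier, later, threshold)
        tails = {chain[-1] for chain in open_chains}
        next_open = []
        for chain in open_chains:
            successors = [b for a, b in links if a == chain[-1]]
            if successors:
                # A theme with several successors splits into several chains.
                next_open.extend(chain + [b] for b in successors)
            else:
                finished.append(chain)
        # Links whose starting theme is not the tail of an open chain start a new chain.
        next_open.extend([a, b] for a, b in links if a not in tails)
        open_chains = next_open
    return finished + open_chains


# Toy data, invented for illustration only: three periods with single-letter keywords.
period_1 = {4: {"a", "b", "c", "d"}, 7: {"x", "y", "z"}}
period_2 = {3: {"a", "b", "c", "e"}, 8: {"x", "q", "r"}}
period_3 = {9: {"a", "b", "c", "e"}, 5: {"b", "c", "e", "f"}}
result = chains([period_1, period_2, period_3])
```

In the toy data theme 3 of the second period has two successors, so the chain branches in the same way as the chain starting with theme 88 in Fig. 5.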


Some additional information can be attached to these chains to obtain a quite detailed view of the
dynamics of evolution of scientific themes¹². A graphical presentation as in Fig. 6 can be used. Here we can read,
for example, that theme 2 was in the second quadrant in the first period and had ten keywords; six keywords from
this theme were no longer even in the dictionary in the next period (+++), and three were in theme 2 in the
second period. Four other keywords entered theme 2 that had been in the dictionary in the first period but had
not been included in any theme (xxx). Four keywords of theme 2 from the second period were in the dictionary in
the third period but not in any theme, and three keywords were not even in the dictionary. (Themes 2 in the
second, third and fourth periods cannot be considered the same theme at threshold 3.)¹³
[Diagram not reproduced; its rows correspond to the years 1994 to 1998, "+++" marks keywords that are no longer
in the dictionary in the next period, and "xxx" marks keywords that are in the dictionary but in no theme.]

Fig. 6: Detailed information about evolution of some themes.

12
Program Lexidyn, function Field Analysis - Parents and Children of Themes.
13
A careful reader may be surprised that in Fig. 4 the themes 2 and 39 are in the second quadrant. The reason is
that medians are used for projection in strategic diagrams, whereas in Fig. 6 means are used for computing the
position of themes. This can cause small differences near the axes.

Comparison of maps of one scientific field obtained by both coword and citation analysis

Let us again use the scientific field Water Resources as an example. Fig. 7 shows the strategic diagram of
this scientific field in 1994, based on keywords, with the two strongest ties of theme 12¹⁴. Fig. 8 shows the
strategic diagram of the same scientific field at the same time, but based on reference citations¹⁵. In this
diagram, the most important ties of theme 896 can be found.

[Figure not reproduced; the recoverable theme labels are 12, 41, 20, 4, 7 and 2.]

Fig. 7: Strategic diagram based on keywords.

14
A careful reader must now ask why Fig. 7 differs from Fig. 1. For the analysis in Fig. 7, a lower number of
articles was used than for the analysis in Fig. 1 (there it was 1481 articles). As far as I know, a sensitivity
analysis with respect to the number of articles has not been made yet, though it would be very interesting for
the interpretation of maps.
15
Theme 896 is formed by the following citations: vomvoris eg 1990 wat, kitanidis pk 1988 j, rubin y 1991
water r, chin da 1992 water r, garabedian sp 1991 w, dagan g 1982 water r, gelhar lw 1981 water, dagan g 1990
water r, graham w 1989 water , vomvoris eg 1986 the.
These are the citation codes used by ISI; in the SCI database you can use these codes to find the cited
articles.

[Figure not reproduced; the recoverable theme labels are 1009, 1516, 770, 772, 647, 1198, 896, 860, 1555, 1269,
642, 1033, 650, 1421, 696, 1286 and 1430.]

Fig. 8: Strategic diagram based on citations.

In coword analysis the themes are formed by keywords, in citation analysis by citations. So we
cannot define as the same theme those themes on various maps whose number of common keywords exceeds
a given threshold. Therefore we take the following steps. For each theme in Fig. 7 we find the relevant articles,
that is, those articles that have more than two keywords in common with the theme. For each theme in Fig. 8 we
also find the relevant articles, but here relevant articles are those that have more than two citations in common
with the theme¹⁶. In the following table we can identify the number of articles that are common to themes in
Fig. 7 and Fig. 8.
                  Themes in Fig. 7
Themes in Fig. 8    1(6)  12(30)  2(8)  20(9)  4(20)  41(10)  7(10)  9(6)
1009(11)               0       1     0      0      2       0      0     0
1033(11)               0       1     4      0      0       0      0     0
1198(6)                0       0     0      0      0       0      0     0
1269(8)                0       0     0      0      0       0      0     0
1286(8)                0       0     0      0      1       0      0     0
1421(12)               0       0     0      0      0       0      0     0
1430(11)               0       0     0      0      1       0      0     0
1516(5)                0       0     0      0      0       0      0     0
1555(10)               0       2     0      0      2       0      0     0
642(12)                0       9     0      0      1       0      0     0
647(6)                 0       0     0      0      0       0      0     3
650(6)                 0       0     0      0      0       0      0     0
696(18)                0      12     0      0      2       0      0     0
770(6)                 0       0     0      0      2       0      0     0
772(7)                 0       0     0      0      0       0      0     0
860(6)                 0       0     0      0      0       0      0     0
896(10)                0       8     0      0      1       0      0     0
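The steps that lead to such a table can be sketched as follows. The sketch assumes that every article is available both as a keyword list and as a list of cited references, and it uses the threshold "more than two" from the text; the function names and the toy data are hypothetical and do not reproduce the Water Resources database.

```python
def relevant_articles(theme_items, article_items, threshold=2):
    """Ids of articles sharing more than `threshold` items with the theme."""
    theme_items = set(theme_items)
    return {
        article_id
        for article_id, items in article_items.items()
        if len(theme_items & set(items)) > threshold
    }


def overlap_table(keyword_themes, citation_themes, article_keywords, article_citations):
    """Count common relevant articles for each (citation theme, keyword theme) pair."""
    by_keywords = {
        theme: relevant_articles(items, article_keywords)
        for theme, items in keyword_themes.items()
    }
    by_citations = {
        theme: relevant_articles(items, article_citations)
        for theme, items in citation_themes.items()
    }
    return {
        (c_theme, k_theme): len(c_articles & k_articles)
        for c_theme, c_articles in by_citations.items()
        for k_theme, k_articles in by_keywords.items()
    }


# Toy data, invented for illustration only.
keyword_themes = {"K1": {"k1", "k2", "k3", "k4"}}
citation_themes = {"C1": {"r1", "r2", "r3"}, "C2": {"r7", "r8", "r9"}}
article_keywords = {
    "a1": ["k1", "k2", "k3"],
    "a2": ["k1", "k2", "k3", "k4"],
    "a3": ["k1", "k9"],
}
article_citations = {
    "a1": ["r1", "r2", "r3"],
    "a2": ["r7", "r8", "r9"],
    "a3": ["r1", "r2", "r3"],
}
table = overlap_table(keyword_themes, citation_themes, article_keywords, article_citations)
```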

16
Threshold two is given subjectively.

Let us try to find identical themes in Fig. 7 and Fig. 8 based on this table.
1. Theme 12 in Fig. 7 is a cluster of themes 896, 696, 642, 1555 and 1009 in Fig. 8. This statement is
further supported by the ties of theme 896 shown in Fig. 8. (We do not put theme 1033
into the cluster because it has no relevant ties in Fig. 8 and a single common article is not
sufficient in relation to the total of 11 articles in theme 1033.)
2. Theme 9 in Fig. 7 is theme 647 in Fig. 8. This statement is further supported by the similar
position of both themes in the strategic diagrams.
3. Theme 4 is a cluster of themes 1009, 1286, 1555, 642, 696, 770 and 896.
4. Theme 2 is theme 1033.
The clusters in Fig. 8 that could be identified as themes 12 and 4 in Fig. 7 are very similar. Therefore it is
possible that the expert who must make the final interpretation will identify both clusters as one scientific theme.
This statement is further supported by the relevant tie in Fig. 7.
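The argument that one common article out of 11 is not sufficient can be made explicit by relating each overlap to the size of the citation theme. The sketch below takes the non-zero entries of column 12(30) of the table above and reads the number in parentheses as the number of articles in the theme, as the text does for theme 1033. The cut-off of 0.5 is an assumption made only for this illustration.

```python
# Column 12(30) of the table: Fig. 8 theme -> (articles in the theme, articles in common).
theme_12_column = {
    1009: (11, 1),
    1033: (11, 1),
    1555: (10, 2),
    642: (12, 9),
    696: (18, 12),
    896: (10, 8),
}


def overlap_shares(column):
    """Share of each citation theme's articles that are common with the keyword theme."""
    return {theme: common / size for theme, (size, common) in column.items()}


shares = overlap_shares(theme_12_column)
# Assumed cut-off: at least half of the citation theme's articles are shared.
strong = sorted(theme for theme, share in shares.items() if share >= 0.5)
```

Themes 642, 696 and 896 pass this cut-off on counts alone. Themes 1555 and 1009 do not, which is consistent with the text bringing them into the cluster on the basis of the ties shown in Fig. 8 rather than on the number of common articles.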

Conclusions: results so far obtained by comparison of maps

Comparison of maps of one scientific field obtained by coword analysis in various periods makes it
possible to obtain empirical knowledge about the dynamics of scientific fields and scientific themes.
Concerning the evolution of a scientific field, there are some arguments [7,9] that the evolution of the
intensity of research activity (number of publications) during the life-span of a field (Fig. 9) is correlated with
certain patterns of concentration of research themes in a strategic diagram. In the first and last stages, the
themes are commonly concentrated in the second and fourth quadrants. In the second and fourth stages, the
themes are dispersed over all four quadrants. In the third stage, the concentration is in the first and third quadrants.

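[Figure not reproduced; a curve of intensity against time with the stages 1 - start, 2 - expansion, 3 - maturity,
4 - depression and a fifth stage whose label is truncated in the source ("5 - obsole").]

Fig. 9. Intensity of research activity during the life of a scientific field.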

Concerning the evolution of themes, the following statements can be made [1,2,4]:

1. Themes that have lived for several periods often survive into further periods.
2. Themes that have had an interesting evolution survive more often than themes with simple dynamics.
3. Themes from the first and second quadrants survive more often than themes from the third or fourth
quadrants.
4. One can see a tendency of the themes from the second quadrant to move to the first quadrant. This
development is not at all surprising: themes that are central are interesting for the field and thus tend
to be elaborated.

5. Themes from the fourth quadrant mostly come into the field already elaborated in another research
field. If this spring-in succeeds, i.e. if the connections of such a theme to its new field are so interesting that
they are elaborated, then the theme becomes central in further periods. But most of the themes from the
fourth quadrant leave the field in the next period. Those are the springs-in that researchers do not consider
interesting.
6. Themes from the first quadrant that do not survive can enrich another research field, or their
development can continue (be hidden) in applications.

Let us apply these statements to the scientific field Water Resources. Theme 64 from the fourth period has
the most interesting dynamics (Fig. 6). This theme had already lived for a long time, and in the fifth period it
split into two subthemes. Of these, theme 92 is definitely interesting because it lies in the first quadrant, but
we cannot exclude an intensive restructuring of theme 39 in the third quadrant. Another basic theme for this
scientific field is theme 9, which has the longest life.
This information is of enormous importance for the effectiveness of research and development. It makes it
possible to direct the flow of financial resources into those scientific fields and themes that have the potential
for an interesting evolution and for interesting results that may augment the prestige of all
participants. Furthermore, it enables scientists to filter the enormous number of articles connected with their
scientific field. Scientists can concentrate only on those articles that are connected with interesting and basic
themes¹⁷.
Comparison of maps of one scientific field obtained by both coword and citation analysis enables a
better interpretation of maps.
The main interpretation problem with coword analysis is the formulation of scientific themes. The number
of a theme in the strategic diagram is only a technical mark; it is the number of the most important keyword¹⁸.
Even all the keywords that form a cluster in the strategic diagram need not describe the scientific theme
well. The final interpretation can be made only by an expert in the specific scientific field on the basis of his or
her expert knowledge.
In the maps created by ISI, the ties between nodes are interpreted, and the technique used is called citation
context analysis. In principle, the articles that form the relevant cocitation ties are analyzed (Fig. 10).

[Figure not reproduced; it shows nodes 1 to 5 with the recoverable labels dynamics, differential calculus,
economic growth, research and development, and Solow Model.]

Fig. 10: Part of a fictive map of science with citation context analysis.

17
Here we ought to stress that we do not yet know very much about the dynamics. This diminishes the usability
of the results obtained.
18
Numbering is done by the program LEXIDYN itself; the only reason for numbering is the legibility of
diagrams. Just imagine what the diagrams would look like if they contained the whole keywords.

Usually the minimal tree in the graph is then found, and its interpretation forms the description of the basic
connections in science at the relevant degree of aggregation. It is interesting that the connections are sometimes
based on methods and sometimes on empirical data. Some clusters provide theory; others apply this theory to solve
practical problems. It is often difficult to decide about causality, since the ties between clusters are usually
reciprocal. As in coword analysis, here too the basis for the final interpretation is the knowledge of an expert
in this branch of science.
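If the "minimal tree" is read as a minimum spanning tree of the map, which is an assumption since the text does not name an algorithm, it can be found with Kruskal's algorithm. The sketch treats ties as distances, so that shorter ties (stronger connections) are preferred; the function name and the toy distances are hypothetical.

```python
def minimum_spanning_tree(nodes, distances):
    """Kruskal's algorithm; `distances` maps (node_a, node_b) to a tie length."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    tree = []
    for (a, b), _length in sorted(distances.items(), key=lambda item: item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # the tie joins two separate components
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Toy distances between four clusters, invented for illustration only.
toy_distances = {(1, 2): 1.0, (2, 3): 2.0, (1, 3): 2.5, (3, 4): 1.5, (2, 4): 3.0}
tree = minimum_spanning_tree([1, 2, 3, 4], toy_distances)
```

The resulting tree keeps only the ties that an expert would then have to interpret; ties such as (1, 3), which would close a cycle, are dropped.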
We can ask whether the maps obtained by both methods are so imprecise that we cannot use them in
practice. So far only experts in the specific scientific field can judge this¹⁹. But if we could show a
correspondence between the maps obtained by both methods, it would be an objective proof of their relative correctness.
The interpretation of maps of science can be understood as a comparison with a subjective map existing
in the head of an expert. The aim of the interpretation is in principle an explicit presentation of such subjective
maps. If we could show that the maps obtained by coword analysis and by citation analysis are relatively correct,
then the expert could take them as a solid basis for interpretation, and the information from these maps could be
a real basis for the identification of the structure of science.

References
[1] Cahlik, T., Jirina, M.: Scientometric Analysis of Artificial Neural Networks Scientific Field. Neural Network
World, 1997, No. 2.
[2] Cahlik, T., Jirina, M.: Knowledge Restructuring during Scientific Field Development. In: Proceedings of the
Workshop on Artificial Intelligence Techniques, Brno, 1996.
[3] Callon, M., Law, J., Rip, A.: Mapping the Dynamics of Science and Technology. MacMillan, London, 1986.
[4] Courtial, J.P.: Introduction à la Scientométrie. Anthropos, Paris, 1991.
[5] Courtial, J.P., Cahlik, T., Callon, M.: A Model for Social Interaction Between Cognition and Action through a
Key-word Simulation of Knowledge Growth. Scientometrics, Vol. 31, No. 2, 1994.
[6] Garfield, E.: Citation Indexing: its Theory and Application in Science, Technology and the Humanities. John
Wiley and Sons, New York, 1979.
[7] Kuhn, T.S.: The Structure of Scientific Revolutions. Chicago University Press, 1970.
[8] Small, H., Garfield, E.: The Geography of Science: Disciplinary and National Mappings. Journal of
Information Science 11 (1985), 147-159.
[9] Turner, W.A., Rojouan, F.: Evaluating Input/Output Relationship in a Regional Research Network Using
Coword Analysis. Scientometrics 22 (1991).

19
As an analogy, the correctness of old geographical maps could be judged only by the sailors who used them.
