Scientometrics (2016) 108:1013–1041

DOI 10.1007/s11192-016-2024-0

A methodology for technology trend monitoring:


the case of semantic technologies

Oleg Ena¹ · Nadezhda Mikova¹ · Ozcan Saritas¹ · Anna Sokolova¹

Received: 24 February 2015 / Published online: 25 June 2016


© Akadémiai Kiadó, Budapest, Hungary 2016

Abstract This paper introduces a systematic technology trend monitoring (TTM)


methodology based on an analysis of bibliometric data. Among the key premises for developing the methodology are: (1) the increasing number of data sources addressing different phases of STI development, thus requiring a more holistic and integrated analysis; (2) the need for more customized clustering approaches, particularly for the purpose of identifying trends; and (3) augmenting the policy impact of trends through gathering future-oriented intelligence on emerging developments and potential disruptive changes.
Thus, the TTM methodology developed combines and jointly analyzes different datasets to gain intelligence covering different phases of technological evolution, starting from the 'emergence' of a technology towards 'supporting' and 'solution' applications and more 'practical' business and market-oriented uses. Furthermore, the study presents a new algorithm for data clustering in order to overcome the weaknesses of readily available clusterization tools for the purpose of identifying technology trends. The present study places the TTM activities into a wider policy context to make use of the outcomes for the purposes of Science, Technology and Innovation policy formulation and R&D strategy making processes. The methodology developed is demonstrated in the domain of 'semantic technologies'.

Oleg Ena, Nadezhda Mikova, Ozcan Saritas and Anna Sokolova have contributed equally to this paper.

Corresponding author: Anna Sokolova (avsokolova@hse.ru)

Oleg Ena (ovena@hse.ru)
Nadezhda Mikova (nmikova@hse.ru)
Ozcan Saritas (osaritas@hse.ru)

¹ National Research University Higher School of Economics, Moscow, Russian Federation


Keywords Trend monitoring · Bibliometrics · Technology mining · Foresight · Semantic technologies · Russia

Introduction

Recent years have witnessed major advancements in science, technology and innovation (STI). The new global context involves increased financial, trade and investment flows leading to a more interconnected and interdependent world, a process accelerated by rapid technological progress in areas such as ICTs, biotechnologies, fuel cells and nanotechnologies. Meanwhile, severe social and economic instabilities have been witnessed due to the economic recession, shortages of fresh water, food and energy supply, climate change, regional conflicts, and the resulting population movements. In such a rapidly changing, complex environment full of opportunities and threats, it becomes crucial to identify emerging trends as weak signals of potential changes and as indicators of future shocks and surprises in the form of wild cards. A number of studies have been undertaken by a wide variety of institutions for the purpose of identifying and monitoring trends. These involve:
Large international organizations including the European Commission (Mrakotsky-
Kolm and Soderlind 2009), Organisation for Economic Co-operation and Development
(OECD 2007), International Telecommunications Union (ITU 2014), and International
Energy Agency (IEA 2013).
National research centers including RAND (Antón et al. 2001; Silberglitt et al. 2006),
US National Research Council (NRC 2005), US Office of Naval Research (ONR 2014)
and UK Government Office for Science (2010).
Universities and research institutions including Manchester Institute of Innovation
Research (iKNOW 2014), National Institute of Science and Technology Policy (NISTEP
2010), Fraunhofer Institute for Systems and Innovation Research (Fraunhofer ISI 2014).
Corporations including Shell (2007), IBM (2013) and Microsoft-Fujitsu (2011).
Consulting companies including Battelle (2014), Gartner (2013), Z-Punkt (2014), Lux
Research (2014), Deloitte (2012), TechCast (2014) and TrendHunter (2014).
TTM has been frequently used by inter-governmental organisations to monitor global technology trends and to set international standards; by the private sector to develop corporate Research and Development (R&D) strategies; by consultancy companies to provide technology intelligence to their clients; and by research and academic institutions to keep track of S&T advancements and to identify new research topics and collaboration networks. For instance, ITU has launched a Technology Watch programme with the major objective of monitoring rapidly changing ICTs, providing international standards in the domain, and tackling issues related to global ICT development (ITU 2014). The US Navy's Global Technology Watch aims at developing a wide variety of approaches capable of identifying the key technological trends in the defence sector (Kostoff 1999, 2003). The majority of these activities aim to utilize technology trend monitoring (TTM) for collecting and analysing data to explore trends and provide early indications of potential changes and developments for more anticipatory policy and strategy making. The anticipatory intelligence gathered through the TTM work provides public STI policy makers and private R&D strategy developers with tools for prioritising potential opportunities and threats and


allocating resources to increase the ability to capitalise on, protect against, or mitigate the
impact of potential disruption.
The process typically involves collection and analysis of data from structured and
unstructured sources to extract ‘hidden patterns’, which may be indications of emerging
trends and developments. A wide variety of approaches has been used for this purpose.
These range from simple keyword searches to more sophisticated applications of scien-
tometrics and technology mining using qualitative and quantitative methods. Among the
qualitative methods used for trend monitoring are literature review, scenarios, expert
panels, interviews and others. For example, key methods used in the RAND studies
‘Global technology revolution’ were literature review (foresights, articles, outlooks and
S&T journals), assessment of real progress in research and development (R&D) as well as
investment attractiveness, interviews with RAND experts and gathering collective opinions
on S&T development from a broad spectrum of individuals (Antón et al. 2001). More recently, the advancement of computing technologies allows the use of quantitative methods through the analysis of large amounts of data. Amongst the most intensively discussed and developed quantitative methods are (1) 'bibliometric/scientometric analysis' (e.g. Chao et al. 2007; Chen 2006; Cobo et al. 2011; Guo et al. 2011; Kajikawa et al. 2008; Kostoff et al. 2008; Morris et al. 2002; Porter and Cunningham 2005; Shibata et al. 2008; Smalheiser 2001; Upham and Small 2010); and (2) 'patent analysis' (e.g. Abraham and Moitra 2001; Campbell 1983; Corrocher et al. 2003; Dereli and Durmusoglu 2009; Kim et al. 2008, 2009; Lee et al. 2009, 2011; Porter and Cunningham 2005; Tseng et al. 2007; Wang et al. 2010; Yoon and Park 2004). As the internet has become an important source of information, methods for web-based information retrieval have also been developed by a number of researchers (Palomino et al. 2013).
Although technological trend monitoring (TTM) has been acclaimed by a number of leading policy and strategy makers, and methodologies have been developed particularly in the last 15 years with an increasing use of information technologies, there is still room for improvement, particularly in the use of data sources, the methodology for data analysis, and the positioning of TTM in the policy and strategy making process.
First of all, the analysis of the literature reveals that most of the work in the TTM field draws upon the analysis of publication and patent databases, and in most cases on either of them alone. However, the analysis of publications and patents alone is not enough to understand the full cycle of technology development pathways. At present, data sources are rich and diverse. Besides publications and patents, for example, Foresight reports, conference and business presentations, and social networks may provide equally important information for understanding rapid technological changes. All these different sources may provide valuable input for addressing different phases of STI development. Some of these sources, such as academic and scientific publications, may provide evidence on the 'supply' side of technologies, from which information regarding R&D and the early stages of technological development can be derived. Similarly, the 'demand' side dynamics may be captured in other information sources, such as academic and sectoral conferences and presentations. The integrated use of these data sources will certainly give a more complete picture of the emergence and evolution of STI developments.
The use of diverse data sources in a complementary way requires more flexible and adaptive data analysis and clustering approaches. Most of the tools currently used for clustering generate clusters that are similar to domain hierarchy levels, which does not allow for the identification of trends, but only generates broad labels such as 'energy sector', 'shipbuilding', and 'aerospace'. Therefore, the present study posits that a more customized clustering approach is needed, particularly for the purpose of identifying trends.


Finally, it is important to ensure that the TTM work generates impactful results for policy and strategy making. Current work in this field, and the methodologies developed, consider TTM as a stand-alone activity and generate reports on outputs without a systematic translation of the technology intelligence into policy and strategy. The present study therefore strives to design a process that explicitly links TTM to policies and strategies within a systematic framework, aiming at augmenting the policy impact of TTM. A range of information sources are analysed to extract a number of established and newly emerging technologies and technology application areas, along with their individual, institutional, geographical and temporal attributions. Then, the relationships between the technologies and application areas are revealed. It is through these systemic relationships that technology clusters are generated, which then lead to the identification of technology trends by studying the dynamics of technologies over time. The trends are described in detail by using a set of parameters, which indicate future opportunities and threats for policy makers, corporations, research bodies and other potential users. For the purpose of demonstration, the present study focuses on one of the Russian STI priorities, Information and Communication Technologies, and more specifically on 'semantic technologies'.
Thus, the 'Theoretical background' section of the paper begins with a review of TTM and presents examples of the studies conducted in the field. This will allow benchmarking the proposed approach against other similar efforts. 'Methodology' describes the methodology across its five stages and introduces the use of existing and newly developed tools. The results of the study conducted in the area of semantic technologies, and how the overall TTM work is integrated into STI policy making for Russia, will be presented in the "'Semantic technologies' case study" section. The paper concludes with lessons learned and questions for further research in the last section.

Theoretical background

A technology trend can be considered as a continuously growing technology area with a certain pattern. In order to identify the pattern as a trend, it should have existed for a certain period of time, usually about 5 years, with a good prospect of continuing its development in the future to cover the next 5–10 years or beyond. During the TTM work, scientists have to deal with both structured and unstructured data. Various methods and approaches have been developed for this purpose. Besides the sole bibliometric/scientometric analysis of patents and publications, approaches have been developed, for instance, to investigate emerging research fronts on the basis of citation networks of scientific publications and patents, in order to discover uncommercialized gaps by comparing them (Shibata et al. 2010, 2011).
Daim et al. (2006) propose a methodology for forecasting emerging technologies by identifying hidden patterns and trends. This study complements patent and bibliometric analysis with forecasting methods such as scenario development, analysis of growth curves, analogies, etc. Further, the analysis of textual data has become possible with the introduction of text mining methods, which have become popular for handling large amounts of documents (Lee et al. 2009). Kostoff et al. (1997) proposed database tomography for information retrieval as an analytical system to work with large databases. Kostoff et al. (2004, 2008) developed a systematic approach with the aim of identifying disruptive technology roadmaps by using a literature-based discovery process.
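As a toy operationalization of the trend definition given at the start of this section (sustained growth over roughly five years), one might flag a keyword as a trend candidate when its yearly record counts have grown over at least five recent years. The threshold and the monotonicity check below are our illustration, not a criterion formalized in the paper:

```python
# Toy check for the trend definition: a technology area qualifies as a
# trend candidate if its yearly record counts (publications, patents, etc.)
# show sustained growth over at least five consecutive recent years.
def is_trend_candidate(yearly_counts, min_years=5):
    """yearly_counts: record counts in chronological order."""
    if len(yearly_counts) < min_years:
        return False
    recent = yearly_counts[-min_years:]
    # require non-decreasing counts with overall net growth
    non_decreasing = all(b >= a for a, b in zip(recent, recent[1:]))
    return non_decreasing and recent[-1] > recent[0]

print(is_trend_candidate([3, 4, 4, 7, 9, 14]))   # True: sustained growth
print(is_trend_candidate([12, 9, 7, 6, 5, 4]))   # False: declining area
```

A real implementation would also smooth out year-to-year noise and test the prospect of continuation, which this sketch deliberately omits.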

Table 1 A comparative analysis of TTM studies

Porter (2005)
Aim and trend type: To develop an approach to quick data processing for monitoring emerging technologies. Trend type: emerging technologies.
Coverage (subject area): 'Solid oxide fuel cells'.
Methodology stages: Spell out the focal questions and decide how to answer them; get suitable data; search (iterate); import into text mining software (VantagePoint); clean the data; analyze and interpret; represent the information; standardize and semi-automate where possible.
Methods: Bibliometric analysis; patent analysis; expert procedures.
Information sources: Patent databases (Derwent World Patent Index); bibliometric databases (ISI WOK); web (research institution websites).
Type of result: 2 main clusters: Nano-surfaces; Rare-earth materials.
Integration into STI policy (stakeholders): For all players: information providers; information professionals; technology analysts; researchers, technologists, and managers; decision-makers.

Kostoff et al. (2008)
Aim and trend type: To propose a generic methodology for identifying potential discovery candidates. Trend type: potential discoveries.
Coverage (subject area): 'Raynaud's phenomenon (RP)', 'Cataracts', 'Parkinson's disease (PD)', 'Multiple sclerosis (MS)', 'Water purification (WP)'.
Methodology stages: Retrieve core literature of target problem; characterize core literature; expand core literature; generate potential discovery.
Methods: Literature review; bibliometric analysis; expert procedures.
Information sources: Bibliometric databases (SCI, MEDLINE).
Type of result: 130 potential treatment discoveries for 'RP'; hundreds of potential discoveries for 'Cataracts'; 16 clusters for 'PD'; 7 potential discoveries for 'MS'; 6 potential discoveries for 'WP'.
Integration into STI policy (stakeholders): To allow the technologically advanced nations to remain competitive with the developing nations, which have large well-trained low-cost labor pools.

Lee et al. (2009)
Aim and trend type: To propose an approach for creating and utilizing keyword-based patent maps for use in new technology creation activity. Trend type: white spots.
Coverage (subject area): 'Personal digital assistant (PDA) technologies'.
Methodology stages: Development of patent map; identification of patent vacancy; testing vacancy validity.
Methods: Patent analysis; expert procedures.
Information sources: Patent databases (USPTO).
Type of result: 6 patent vacancies (no names): High value vacancy (1); Medium value vacancy (3); Low value vacancies (2).
Integration into STI policy (stakeholders): Innovative ideas for new product development (NPD) and new technology creation (NTC) processes.

Shibata et al. (2008)
Aim and trend type: To perform a comparative study in two research domains in order to develop a method of detecting emerging knowledge domains. Trend type: emerging knowledge domains.
Coverage (subject area): 'Gallium nitride (GaN)' and 'Complex networks'.
Methodology stages: Data collection; statistical methods; clustering; extracting the role of each paper; topic detection by natural language processing (NLP).
Methods: Bibliometric analysis; expert procedures.
Information sources: Bibliometric databases (ISI WOS).
Type of result: 3 clusters for 'GaN' (no names); 9 clusters for 'complex networks': Support or Disease (Social); Network analysis (Social); Support (Social); Small-World (Physics); HIV (Social); Child development (Social); General (Social); City (Social); Water (Physics).
Integration into STI policy (stakeholders): For R&D managers and policy makers overviewing scientific activities and detecting emerging research domains (a tool for future 'Research on Research (R on R)': incremental and branching innovation).

Tseng et al. (2007)
Aim and trend type: To automate the whole process of creating final patent maps for topic analyses, and to improve other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches. Trend type: emerging topics.
Coverage (subject area): Multidisciplinary (collection of NSC patents from all areas).
Methodology stages: Text segmentation; text summarization; stop words and stemming; keyword and phrase extraction; term association; topic clustering; topic mapping.
Methods: Patent analysis; expert procedures.
Information sources: Patent databases (USPTO).
Type of result: 6 topic clusters: Chemistry; Electronics and semiconductors; Generality; Communication and computers; Material; Biomedicine.
Integration into STI policy (stakeholders): Automatic tools to assist patent engineers or decision makers in patent analysis: to make decisions about future technology development; to inspire novel solutions; to predict business trends.

Yoon and Park (2004)
Aim and trend type: To describe the overall process of developing a patent network for analyzing up-to-date trends of high technologies and identifying promising avenues for new product development. Trend type: promising technologies.
Coverage (subject area): 'Wavelength division multiplexing (WDM)'.
Methodology stages: Collect raw patent documents; transform into structured data; analyze relations among patents; develop patent network; executive patent analysis.
Methods: Patent analysis; expert procedures.
Information sources: Patent databases (USPTO).
Type of result: 7 technology keyword clusters for 'WDM'.
Integration into STI policy (stakeholders): For R&D managers, academicians and policy makers.

Some more systematic methodological frameworks have also been developed for monitoring technology trends. For instance, Porter and Newman (2011) suggest a systematic five-stage 'technology mining' process. The proposed stages include: (1) literature review, (2) research profiling, (3) technology mining, (4) structured knowledge discovery, and (5) literature-based discovery.
The 'Quick Technology Intelligence Process' (QTIP) is also used for the empirical analysis of S&T publications and patent data in rapid technology analysis (Porter 2005). This methodology employs the 'Vantage Point' software, which can fulfill the functions of data cleaning, statistical analysis, trend analysis, and information visualization. Such electronic applications and automated tools are in great demand. There are several other techniques to help engineers conduct patent analysis using text mining techniques, such as data extraction, cluster generation, topic identification, and information mapping and visualization (Tseng et al. 2007).
Visualisation is one crucial part of the process. There have been attempts to provide visualisation techniques in the systematic analysis of technology trends (Chen 2006; Cobo et al. 2011; Kim et al. 2008; Lee et al. 2009; Morris et al. 2002; Shibata et al. 2008; Tseng et al. 2007; Wang et al. 2010; Zhu and Porter 2002; Yoon and Park 2004). The important task of these studies is the quick generation of helpful knowledge from text in the format of two-dimensional maps. Studies devoted to the evaluation of different efficient software tools (like Thomson Data Analyzer, Aureka and others) for analyzing patent documents in structured and unstructured formats are designed to present observations on the advantages and weaknesses of their application (e.g. Ruotsalainen 2008).
Table 1 presents a comparative summary of the aforementioned studies undertaken on TTM, and thus provides a comparative analysis of the different approaches used.
Considering the aims of the studies and the methods for attaining those aims, Table 1 illustrates that the studies usually focus on well-defined narrow technology areas and monitor trends in those areas through bibliometric and patent analysis. These methods primarily consider the frequency of occurrence and co-occurrence of keywords/terms, which are usually obtained from statistical data (i.e. the number of times a particular keyword/term was used and co-occurred with other keywords/terms). Because the analyses look at frequency data, clusters are usually generated under the most frequently referred keywords/terms, which are in most cases generic terms. This process typically produces 'broad tendencies' rather than more specific 'technology trends'. Thus, there is a need for more customized approaches, which may go deeper into the text itself, consider the context of the keywords/terms and extract more refined 'patterns' of technologies using fuzzy clustering.
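The co-occurrence counting that these frequency-based methods rely on can be illustrated with a toy example; the records and keywords below are invented for illustration:

```python
# Toy keyword co-occurrence count of the kind frequency-based TTM methods
# build on: count how often each keyword pair appears in the same record.
from itertools import combinations
from collections import Counter

records = [
    {"semantic web", "ontology", "RDF"},
    {"ontology", "RDF", "reasoning"},
    {"semantic web", "ontology"},
]

co_occurrence = Counter()
for keywords in records:
    for pair in combinations(sorted(keywords), 2):
        co_occurrence[pair] += 1

# The most frequent pairs would seed a cluster label; note how the generic,
# high-frequency term ('ontology') dominates every top pair, which is
# exactly the 'broad tendencies' problem described above.
print(co_occurrence.most_common(2))
```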
The table also shows that the data sources used for TTM include scientific publications, patents and, more recently, the web. In most cases the studies focus on only one of the data sources, without a joint analysis of different sources. However, these sources are limited to the 'R&D' and 'emergence' stages of the technology life cycle. If a technology trend is considered in a broad sense, not only as an emerging technology but also covering the earlier 'blue sky research' stage and the later 'application' and 'social impact' stages, then there is a need to expand the information sources to capture the full cycle. In this case, using different databases is considered to make the analysis of technology development more complete and multifaceted. Martino (2003) suggests that sources such as newspaper articles and the business and popular press can be used to capture the 'application' and 'social impact' stages of technology development, respectively. Furthermore, the results of foresight exercises, which are frequently based on the synthesis of research with a long-term time horizon and expert opinions, can be considered a significant source of information for 'blue skies research', giving the first weak signals of future technologies. Thus, the current paper posits that for policy and strategy-oriented studies it is certainly not enough to look only at publications or patents; a wider scope is needed, including broader sources of data, which may give complementary information to capture the entire spectrum of supply and demand dynamics. The TTM activities will then be able to provide more valuable intelligence for decision making processes, which brings in the third key premise of the approach presented in this paper: the action orientation of trend monitoring activities.
Analysis of the studies represented in the table indicates that they are typically undertaken as stand-alone work. They are usually published and shared individually, without a clear reference to their implications for policy and strategy. There is no clear discussion of how to achieve impact for policy and strategy making. Hence, the TTM work requires more systematic approaches, beginning with a broader scanning phase across a wide variety of information sources and databases, processing data by using multiple tools and techniques in an integrated way, and generating results with a further discussion of their implications for public and corporate policy and strategy. The TTM approach presented in the following sections is developed to address these points.

Methodology

The TTM methodology proposed in the current paper aims at addressing the key expec-
tations from the trend monitoring activities discussed above, including:
1. Linking the TTM with policy and strategy making processes, and increasing the
impact of the activity on action.
2. Broadening the possible sources of data to address various stages of technological
development.
3. Combining clusterization methods in a complementary way to create a flexible and
adaptive approach with more refined output for the purpose of identifying technology
trends.
In order to meet the first expectation of linking the TTM with policy and strategy, a five-
phase systematic process was designed as illustrated in Fig. 1.
The phases presented in the figure aim at providing a ‘systematic’ framework for
undertaking the TTM activity and ‘explicitly’ linking the activity with the implemen-
tation of its results (Saritas 2013). Thus, the first phase ‘intelligence’ is mainly concerned
with scanning and surveying activities. First, the scope of the study is defined in line
with the overall objectives of the TTM activity. The area under investigation is described
in detail and key terms are generated with the help of the domain experts. Next, a set of
relevant data sources are identified. The sources are scanned through the use of quan-
titative and qualitative methods such as Bibliometric Analysis, Patent Analysis, and
Literature Review.
In the ‘immersion’ phase, input data obtained in the first phase is ingested, transformed
and normalised through sorting, mapping and further analysis. The aim is to capture overall
technology development patterns, which are revealed through the analysis. Here data
clustering techniques are used to identify trends. Software like Vantage Point, VOSviewer, and Carrot can be used at this phase. The present study developed an additional
clustering algorithm due to the weaknesses of the existing approaches in generating
clusters and identifying trends. This will be detailed below.


Following the identification of trends, the 'integration' phase considers the networks of trends, key actors, institutions and countries, and examines the dynamic relationships between them. With a timeline analysis, this phase reveals the technology development pathways, and thus the nature of technology trends.
The data on trends are then analysed and described in the ‘interpretation’ phase. This
phase benefits from expert opinions to capture the diversity of multiple interpretations and
different viewpoints. The narratives of emergence pathways, alternative future trajectories
in each cluster and the impacts of weak signals and wild cards on those clusters can be
explained in the course of inclusive discussions.
Finally, the 'intervention' phase is concerned with the translation of the key messages arising from the TTM process into policies and actions. It therefore involves the identification of priorities, actions, capacity requirements and organisational structures. These are used for the formulation of Science, Technology and Innovation (STI) policies and Research and Development (R&D) strategies. An ideal follow-up step of the intervention phase is the evaluation of the findings and re-iteration of the TTM process.
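The five phases above can be sketched as a linear pipeline in which each phase consumes the previous phase's output. The phase names come from the paper; the stub functions and data shapes below are purely illustrative assumptions:

```python
# Linear sketch of the five TTM phases. The stub bodies only tag the data
# to show the flow; real implementations would do the analysis described
# in the text.
def intelligence(scope):
    """Scan data sources defined by the study scope (keywords, databases)."""
    return {"scope": scope, "records": ["raw records from databases"]}

def immersion(data):
    """Normalise records and cluster them to surface candidate trends."""
    return {**data, "clusters": ["candidate trend clusters"]}

def integration(data):
    """Relate clusters to actors, institutions and countries over time."""
    return {**data, "networks": ["trend/actor networks"]}

def interpretation(data):
    """Describe trends with expert input: pathways, weak signals, wild cards."""
    return {**data, "narratives": ["trend narratives"]}

def intervention(data):
    """Translate findings into STI policy and R&D strategy priorities."""
    return {**data, "actions": ["priorities and actions"]}

result = intelligence("semantic technologies")
for phase in (immersion, integration, interpretation, intervention):
    result = phase(result)

print(sorted(result.keys()))
```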
The second feature of the proposed TTM methodology is to make use of broader
sources of data in a complementary way. As discussed earlier, different types of infor-
mation sources provide diverse knowledge about the development trajectories of emer-
gence, pacing, key and base technologies. This information can be extracted from
structured data (such as publications and patent databases), semi-structured data (such as
data coming from social networks and web forums), or unstructured data (such as free text
and presentations).

Fig. 1 The TTM methodology

This multi-source approach provides a more complete picture of the technology trend, its drivers, supply and demand dynamics and further impacts. In order to
gain this valuable intelligence, flexible and adaptive tools will be needed with the capa-
bility to analyse wide variety of information sources, which brings the third key feature of
the methodology—the development and use of flexible and adaptive tools.
The TTM methodology developed in this paper involves the use of complementary clusterization methods. This combined approach was designed to overcome the shortcomings of currently available clusterization tools, such as the inability to refine clusters beyond pre-defined attributes, the exclusion of the time dimension from clustering, the impossibility of removing unnecessary clusters from the clustering results, and the lack of core clustering metrics such as characteristic vectors and descriptors of both documents and clusters. The clusterization methods used in the present study are: (1) an 'on-the-fly' clusterization of documents based on full-text processing with the Carrot software; and (2) an algorithm developed by HSE that combines fuzzy clustering by utilizing topic modelling and other methods; the intelligent application of stop-lists based on keywords that are frequent across the document samples, such as 'patent', 'invention', and 'apparatus', which may appear throughout the same genre; and the selection of topics that are relevant to all genres of documents in the same research area. The application of traditional clustering tools leads to the generation of clusters that are similar to domain hierarchy levels, which does not allow for the identification of trends. Moreover, the majority of clusters generated are too broad, such as 'energy sector', 'shipbuilding', and 'aerospace'. The application of two-step clusterization combines the clusters identified by Carrot with the fuzzy clusters generated by the HSE algorithm to create more refined trends, while cleaning general and 'noisy' concepts from the clusters. The details of how the HSE algorithm was developed and implemented will be demonstrated through a case study on 'semantic technologies'.
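The core 'fuzzy' idea (soft cluster memberships combined with genre-specific stop-lists) can be sketched in a few lines. Everything below (the topic seeds, the overlap measure, the stop-list contents) is an illustrative assumption, not the published HSE algorithm:

```python
# Minimal illustration of fuzzy clustering with a genre stop-list: instead
# of assigning each document to exactly one cluster, compute a soft
# membership over topic seeds after stripping genre-frequent words.

# Words frequent across a whole genre ('patent', 'invention', 'apparatus')
# carry no topical signal and are removed first, as described in the text.
GENRE_STOP_LIST = {"patent", "invention", "apparatus"}

def tokens(text):
    return {w for w in text.lower().split() if w not in GENRE_STOP_LIST}

# Two illustrative topic seeds (term sets) standing in for learned topics.
TOPICS = {
    "semantic_web": {"ontology", "rdf", "owl", "reasoning"},
    "text_mining": {"mining", "clustering", "corpus", "keywords"},
}

def fuzzy_membership(text):
    """Return the normalized overlap of a document with each topic seed."""
    doc = tokens(text)
    overlap = {t: len(doc & terms) for t, terms in TOPICS.items()}
    total = sum(overlap.values()) or 1
    return {t: n / total for t, n in overlap.items()}

doc = "patent apparatus for rdf ontology reasoning and keywords mining"
print(fuzzy_membership(doc))  # soft membership across both topics
```

In a full pipeline these soft memberships would come from a topic model rather than hand-written seeds, but the shape of the output (a document belonging partially to several clusters) is the same.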

‘Semantic technologies’ case study

The key notion of semantic technologies is to automatically extract meanings and knowledge from unstructured text and store them in a machine-readable form that computers can access and interpret. 'Semantic technologies' have been developing for over 15 years and are characterized by a broad set of research and development activities, technologies, products and services. They are used in a number of other fields, from the natural to the social sciences. Despite this long-term evolution and the wide areas of potential application, there are not yet many well-established technology and product categories, industry standards or benchmark companies associated with semantic technologies. All these factors make the area attractive as a subject for TTM. The process of TTM in semantic technologies is presented in the next sections.

Stage 1: Formation of a list of terms

The first two stages of the methodology are concerned with collecting input and gathering intelligence for TTM. First of all, a set of terms and keywords is formed to be used for searches. Terms or keywords are commonly used to define a technology area or development. Typically, about 30–50 keywords are used in the TTM work to describe each technology sub-area in a complete manner. On the one hand, the list of keywords should not be too large, for an optimum search effort; on the other hand, the list should be large enough to provide a unique description of the area under investigation. At this stage, it is important


Table 2 Information sources and databases used for TTM

No. | Information source | Database
1 | Scientific articles | ISI Web of Science, Scopus
2 | Patents | EPO (a), USPTO (b), JPO (c)
3 | Media | Factiva
4 | Foresight exercises | European Foresight Monitoring Network, European Foresight Platform
5 | Conferences | Conference websites
6 | EC projects | CORDIS Europa
7 | The internet | Websites
8 | Dissertations | ProQuest
9 | Academic/non-academic presentations | SlideShare database

(a) European Patent Office. (b) United States Patent and Trademark Office. (c) Japan Patent Office

that domain experts are involved in the process for the identification and refinement of the keywords that best represent the area under investigation. A list of keywords was created for the pilot area 'semantic technologies' in consultation with the domain experts, including: 'semantic intelligence,' 'semantic business,' 'semantic BI,' 'semantic infrastructure,' 'Resource Description Framework (RDF),' 'Ontology Web Language (OWL),' and 'Protocol and RDF Query Language (SPARQL),' etc.
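Once agreed with the experts, such a keyword list can be assembled into a boolean search query mechanically. The sketch below is a minimal Python illustration, assuming a Web of Science-style 'TS=' topic field tag and a shortened keyword list; it is not the exact query used in the study:

```python
# Build a boolean topic-search query from a keyword list.
# The keyword list is a small excerpt; the TS= field tag follows
# Web of Science query syntax and is used here only for illustration.

keywords = [
    "semantic intelligence",
    "semantic business",
    "semantic infrastructure",
    "Resource Description Framework",
    "Ontology Web Language",
    "SPARQL",
]

def build_query(terms, field="TS"):
    """Join quoted terms with OR into a single field-tagged query."""
    quoted = ['"{}"'.format(t) for t in terms]
    return "{}=({})".format(field, " OR ".join(quoted))

query = build_query(keywords)
print(query)
# e.g. TS=("semantic intelligence" OR "semantic business" OR ...)
```

A query of this shape can then be reused, unchanged except for the field tag, across the other keyword-searchable sources (Scopus, Factiva, CORDIS).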

Stage 2: Data scanning in databases and collections

At this stage of the TTM methodology, a set of databases and collections were identified
for the analysis. The present study benefits from a wide range of information sources in
order to capture the full cycle of technological emergence and development process. The
use and relevancy of information sources can be determined according to the scope and
objectives of the TTM study. Below, some of the potentially useful information sources are
listed and described (Table 2).
Each of the information sources and databases, and how they were used in the TTM process in the semantic technologies domain, is described briefly in the following sections. For the purpose of the current study, collections from the above-mentioned databases were generated covering a 10-year period (2002–2012).

Scientific articles

This is undoubtedly one of the most frequently used sources of information in most TTM work, as the latest scientific advancements are commonly discussed and shared in the scientific literature. The Web of Science (WoS) and Scopus databases are frequently used for the generation of a collection of articles. The articles are identified through bibliometric analysis by using the list of keywords generated in the first stage of the TTM process. Data on article titles, abstracts, keywords, author names, affiliations, locations, and funders/sponsors are collected for analysis. The collection of articles is useful for studying the


dynamics of S&T development and the growth of interest in certain areas, as well as for the detection of highly cited fields. Bibliometric analysis for the pilot area 'semantic technologies' was performed using the WoS platform, which provides information not only about the rankings of scientists, countries, and research directions, but also offers additional features such as 'highly cited papers,' 'hot papers,' and 'research fronts.' A search on 'semantic technologies' in WoS yielded 4994 publications.

Patents

Patents play a pivotal role in TTM, as patented inventions represent important evidence of scientific and engineering advancements in certain areas of S&T. Patents reflect the ability of an individual or organization to transform scientific results into technological applications. They are also a necessary condition for the economic use of research results and, therefore, play a central role in the analysis of economic potential and in the determination of the most promising sectors and actors, such as individuals, organizations and countries. Patents not only legally protect inventions; they are also first indications of the introduction of new artefacts and services, which allows the possibility of detecting innovation breakthroughs. Major patenting organizations, including the European Patent Office, the United States Patent and Trademark Office, and the Japan Patent Office, provide patent data. Analysis of a variety of databases through patent analysis ensures larger global coverage and benchmarking between countries. The Derwent Innovations Index (DII) database, developed on the Web of Knowledge platform, was used to create a collection of patents for the semantic technologies area. The DII database is designed for quick and accurate searches for patents granted in different countries, and also provides additional descriptive information about the importance of a patent, as well as an analysis of its relations with other documents. In the case of semantic technologies, the patent collection includes 623 documents.

Media

The media is a source for understanding S&T supply and demand dynamics in a wide variety of socio-economic areas. Analyzing media provides the opportunity to monitor leading science and technology news from business sites and transcripts of essential news channels. As a large database, Factiva can be used to extract data. Factiva provides access to more than 2000 newspapers (including the 'New York Times,' 'Wall Street Journal,' 'Financial Times,' and the Russian newspaper 'Vedomosti'), over 3000 magazines (including 'The Economist,' 'Time,' and 'Forbes'), and more than 500 news feeds (including 'Dow Jones,' 'Reuters,' and 'The Associated Press'). For the semantic technologies area, a collection of 13,885 news items was created by using the Factiva database. As in the case of articles and patents, the keyword search method was used to analyze information from this database.

Foresight projects

As forward-looking activities, Foresight projects, with time horizons of 5–100 years or longer, are valuable sources of information for detecting technology trends and priorities. In the scope of the TTM methodology, the European Foresight Platform (EFP), formerly the European Foresight Monitoring Network (EFMN), database was used. Since the initiation of the EFMN under a European Commission Framework Programme, the project has mapped over 2000 Foresight exercises. The issue analysis function of the EFMN mapping methodology helped with the identification and analysis of key emerging issues relevant for the future of European S&T development. The EFP was used as the main source of information for creating the Foresight collection. The official website of the platform provides short presentations (briefs) of the major Foresight studies conducted in different countries of the world. As a result, 25 Foresight projects were included in the collection for semantic technologies.

Conferences

International conferences, seminars and forums are potentially useful sources of information and can be beneficial for TTM. These events may be good outlets for introducing novel technologies and major technological areas, and for assessing the dynamics and prospects of their implementation. In this respect, business conferences are of particular interest, as they pay attention to current issues and reflect the opinions of key experts and representatives from specific areas of knowledge. These experts may not only be directly related to the development of technologies, but may also be interested in their implementation. One of the goals of this sort of business activity is knowledge sharing via the presentation of significant R&D results. Best practices of leading industry companies (key players) are also commonly shared. This may provide valuable information to fill the summary tables in the framework of the methodology. For the semantic technologies area, a list of conferences provided by an expert group was used to create a collection of 2434 records, including conference programs with brief descriptions of manuscripts and presentations.

EC projects

The European Commission (EC) Framework Programme (FP) is one of the largest research funding schemes in the world. Currently in its seventh iteration, with an eighth about to start, the FP has funded a large number of research projects. The 'CORDIS Europa' database can be used for collecting data about the projects. A keyword search can be conducted in the 'projects' section, where results can be classified by thematic area and restricted to certain time periods. Useful information can be obtained on long-standing and emerging technology trends, on the demand side and markets, and on the S&T performance of different countries. In a similar way, the database of the National Science Foundation in the US can provide extremely useful information on technology trends. In the case of semantic technologies, the collection of EC projects created using the 'CORDIS Europa' database included 76 projects.

World wide web

The Internet is the largest store of useful information about S&T development. The main advantage of the Internet as a source of information is its immediacy and the wide scope of available data on S&T developments, which may be found on public and private websites, news portals, articles, blogs and forums. However, it should be noted that, besides extremely useful information, there is a large amount of 'hype' on the Internet. Web scraping can be used as a method to obtain data about technology trends in various fields from Internet resources. These Internet resources are initially discussed with experts in order to narrow World Wide Web information down to data of specific interest. The full-text search library 'Lucene', on which the web spider 'Nutch' is built, can be proposed as a basis for Internet searches using the web-scraping method. This open source search engine is well known and fully documented. It already implements the basic infrastructure (an inverted index, a search robot, parsers for various document formats such as HTML and PDF) and has a user-friendly interface. For this reason, an important part of the process is to configure 'Nutch' appropriately: to what depth the spider should crawl in one iteration, how many documents it should download at each level, which links it should follow, and how many pages it may download simultaneously from one website. The web spider 'Nutch' was thus used to create a collection of 994 web documents in the area of semantic technologies.
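The crawl-control decisions listed above (crawl depth, documents per level, which links to follow) can be sketched as a simple frontier filter. The Python fragment below is an illustration of the idea only; the domain whitelist and the limits are assumptions, and this is not Nutch's actual configuration mechanism:

```python
# Sketch of the crawl-control decisions described above: which links a
# spider should follow and how many documents to take per level.
# The domain whitelist and limits are illustrative assumptions.
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example.org"}   # hypothetical seed sites agreed with experts
MAX_DEPTH = 3                        # how deep the spider goes in one iteration
MAX_DOCS_PER_LEVEL = 100             # documents downloaded at each level

def should_follow(url, depth):
    """Decide whether a link belongs in the crawl frontier."""
    if depth > MAX_DEPTH:
        return False
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return parsed.netloc in ALLOWED_DOMAINS

def next_level(links, depth):
    """Keep at most MAX_DOCS_PER_LEVEL admissible links for the next level."""
    kept = [u for u in links if should_follow(u, depth)]
    return kept[:MAX_DOCS_PER_LEVEL]

links = [
    "https://example.org/semantic-web",
    "ftp://example.org/data",        # wrong scheme: rejected
    "https://other.net/page",        # outside the whitelist: rejected
]
print(next_level(links, depth=1))    # ['https://example.org/semantic-web']
```

In a real deployment these decisions would be expressed through the crawler's own configuration (URL filters, depth and fetch limits) rather than in application code.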

Dissertations

International dissertation databases are also considered a useful source of information about global technology trends for identifying emerging and disruptive S&T areas. In this case, electronic libraries of theses and abstracts can be used to search for data provided by scientists from all over the world. Some dissertations are presented in these databases in full-text form, whereas others can be accessed only in the form of abstracts. The proposed method suggests using the ProQuest database, which includes the most recent publications as well as archives covering 1971 to the present. ProQuest is composed of multiple databases on various subjects and includes the latest editions of publications such as the 'American Sociological Review', 'Econometrica', and the 'Journal of Political Economy'. The advanced search option, which is used for the selection of scientific literature, provides the capability to filter, display and store the results received. For semantic technologies, a collection of 6000 dissertations was created by using the ProQuest database.

SlideShare presentations

The SlideShare presentation database is potentially an extremely useful data source for accessing presentations related to various topics in a wide variety of S&T areas. SlideShare features a vibrant professional and educational community that comments on, 'favorites' and downloads content on a regular basis. SlideShare content spreads virally through blogs and social networks such as LinkedIn, Facebook, and Twitter. Individuals and research organizations upload documents to SlideShare for sharing ideas, conducting research, connecting with each other and making decisions for their businesses. In most cases, presentations are prepared for conferences, business events and presale activities. In addition, since presentation time is typically limited, they contain the most concentrated information about concrete technologies, their applications, and innovation ideas. In this sense, they can act as a separate business- and technology-oriented data source. In order to form a collection in the area of semantic technologies, a list of keywords discussed with experts was used to scan the SlideShare website (www.slideshare.net) with the purpose of finding all relevant documents. The searches resulted in 110 presentations in PDF format, which were then converted into text format for analysis with text-mining tools.
In addition to the above-mentioned information sources, the analysis of academic programmes and curricula of the leading academic institutions in the world, international and national S&T policy documents, personal contacts with experts and stakeholder communities specialised in certain areas of S&T, and crowdsourcing approaches would be immensely useful for TTM work.
Following the extraction of data from a wide variety of information sources, the third stage of the proposed TTM methodology is concerned with the clustering of the data obtained.

Stage 3: Data clusterization

This stage is concerned with sorting and mapping as part of the immersion process. Here, the input extracted from the different data sources was used for clusterization in order to identify areas from which to formulate technology trends. As mentioned earlier, the purpose of using a variety of data sources was not to limit the research to a certain phase of technological development, but to cover the entire spectrum of technological evolution as much as possible. Thus, processing nine collections of different genres instead of a single document collection helps extract information relating to technological trends covering research and scientific aspects (e.g. through scientific articles and dissertations), technology and production aspects (e.g. through patents and projects), as well as business and marketing aspects (e.g. through conferences and presentations). Certainly, some data sources, such as Foresight exercises, media and the Internet, provide information in relation to two or three of these aspects. With regard to this, the parallel processing of the nine collections in the current TTM study was intended to:
- Provide a diverse scope of the subject area 'semantic technology'.
- Identify specific technological trends from the specific collections.
- Avoid the suppression of business trends by the dominant themes of the semantic technologies domain, and capture trends from the different genres.
Each collection was processed as follows:
1. Empirical study with the involvement of experts for the selection of the control parameters for clustering.
2. Clustering of the collections with software.
3. Creation of trends for each collection.
4. Harmonization of trends from different collections.
5. Formation of the final technology trends.
The empirical study for the selection of the control parameters for the clustering algorithms was carried out by varying, and subsequently having experts validate, the results in relation to the following control parameters:
- Maximum top-level clustering passes.
- Cluster size.
- Merge threshold.
- TF label scorer weight.
The clustering results obtained for each set of control parameters were validated by 'Semantic Technologies' domain experts.
Keywords and phrases defined by the experts as interim trends were marked with special markers. Clustering was then repeated under bi-directional variation of the control parameters so as to maximize the number of interim trends. The list of trends for a specific collection was finalized once the maximum number of interim trends was reached.
Then, eight interim lists were harmonized by experts across the entire set of information sources. The final list of trends was formed from the most frequent names of the harmonized technological trends.
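The parameter-selection loop described above can be sketched as a search over combinations of control parameters, scored by how many expert-marked interim trends appear among the resulting cluster labels. Everything below, including the clustering stub and the parameter values, is a hypothetical illustration of the procedure rather than the software actually used in the study:

```python
# Illustrative sketch of selecting clustering control parameters by
# maximizing the number of expert-marked interim trends among cluster
# labels. `run_clustering` is a stand-in stub for the real tool.
from itertools import product

INTERIM_TRENDS = {"linked data", "social semantic web", "mobile semantic"}

def run_clustering(cluster_size, merge_threshold):
    """Stub: pretend-cluster and return labels; a real run would call the tool."""
    labels = {"semantic web", "ontology"}
    if cluster_size <= 20:
        labels.add("linked data")
    if merge_threshold >= 0.5:
        labels.add("mobile semantic")
    return labels

def score(labels):
    """Number of expert-marked interim trends found among the labels."""
    return len(labels & INTERIM_TRENDS)

grid = product([10, 20, 40], [0.3, 0.5, 0.7])  # cluster size x merge threshold
best = max(grid, key=lambda p: score(run_clustering(*p)))
print(best)  # (10, 0.5)
```

The real procedure additionally varies the number of top-level passes and the TF label scorer weight, and replaces the stub with actual clustering runs validated by the domain experts.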
During this process, data clusterization appeared to be a problematic area. This was mainly due to the challenges of refining the data for the purpose of extracting useful information, such as on people, institutions and locations. Various publicly available and commercial clusterization methods have been developed. However, most of them are currently far from achieving precise clusterization of raw data. They have their pros and cons, which may be assessed in line with the nature of the work undertaken. For the purposes of the TTM study presented in the current paper, two methods of clusterization were selected. The first is the readily available 'Carrot' data clusterization tool.1 In addition, a second clusterization approach was developed and applied by the programmers of the Higher School of Economics (HSE). The process of clusterization began with data preparation and filtering, as described in the next section.

Data preparation and pre-filtering

Any clusterization process involves some preparatory work in order to make the input data ready for analysis. This process tends to take longer as the diversity of the databases to be analyzed increases. As presented above, the current study makes use of a large set of information sources, which involve structured data, such as publication and patent databases, and semi-structured and unstructured data, such as SlideShare, conference programs and dissertations.
First, all data were converted into text format to ensure compatibility. However, each collection was processed separately in order to take into account different stages of technological development, i.e. to capture blue-sky (emerging) trends, or different levels of technological development such as the research and development, technology, product, and market stages. Upon completion of the clusterization process, technology trends were grouped under the different stages of development. Below, this process is explained step by step.

Clustering with Carrot software

Using the Carrot method, a number of clusters can be generated with the use of various cluster-tuning attributes. By refining the clustering procedure through the management of stop lists, maximum cluster size and count, the weights of capitalized words and of words in document titles, the TF/IDF ratio, and other tuning parameters, the clusters in Fig. 2 were obtained.
To refine the composition of the clusters and their interconnections, the clusterization results were rendered with a variety of visualization tools that are integrated into the Carrot2 Document Clustering Workbench software package. For example, in Fig. 3 the selected cluster 'Social Semantic Web' is shown with its connections to other clusters, which are indicated with small circles.
The results of the trend extraction using the Carrot software for the separate collections, and the results of the trend harmonization, are shown in Table 3. The table presents four selected collections with the top three clusters in each of the six trends identified.
Table 3 demonstrates the variety of results generated from the different collections. The results of the Carrot clusterization process confirmed one of the key premises of the paper, namely that 'utilizing different information sources would provide intelligence on the

1 http://carrot2.org.


Fig. 2 Carrot clustering results

different levels of technological evolution'. Some evidence to support this proposition can be found in Table 3. For instance, regarding the first trend, linked open data, the analysis of scientific articles revealed the 'Data linkage' cluster. This is a 'supporting' technology for semantic technologies on the supply side, and it sits at a lower level in the technological architecture. Following the first row towards the right, it can be seen that the patent database generated the 'ontology derivation' cluster. This is a 'solution' approach, which ensures, for instance, the success of the semantic web and the structures that are implemented (Wouters et al. 2002). Moving on to conferences and presentations, the 'Web of Data' concept emerges, which is a 'practical' application that makes the semantic knowledge of data accessible and semantic services available.
The clusterization process using the Carrot algorithm generated six clusters; however, it left a large number of documents in the 'others' (i.e. trash) category, and these remained unanalyzed. Furthermore, the clusterization algorithm used in Carrot is hierarchical and tends to generate broader and more stable clusters, such as 'semantic technologies', with similar sub-clusters under them, such as 'semantic web'. It is therefore difficult to discover emerging new trends, which are usually merged into the general clusters or left out of the analysis. The following section details the main motivations for developing a new algorithm at the HSE and how it compares with and complements the other clusterization tools.

Fig. 3 Carrot clustering results in the form of an Aduna cluster map

Clustering with the HSE clustering algorithms

The clusterization algorithm developed at the HSE, based on earlier work by Kuznetsov (2001), provides a number of important features which are missing in the other commonly used tools. Firstly, the limitations of the current tools are related to their restricted features for multilevel clusterization. This is the main reason why analyses frequently generate stable clusters containing broad thematic areas, like 'semantic web' or 'ontology modeling,' which do not necessarily lead to any specific technological trends. A second and perhaps more significant shortcoming arises from the complex conceptual nature of technology trends. The existing clusterization tools assign each record (e.g. document) to a single cluster and do not allow the same record to be used in the analysis of cross-cutting clusters. This is because those tools have been developed for the purpose of thematic clustering; they therefore suppress the characteristic uni- and bigrams, ensuring a division of the document set into thematic subdomains, e.g. 'semantic technologies' -> ('semantic web', 'ontology modelling', 'knowledge formalization', etc.), even with hierarchical clustering. However, document systematization for the purpose of identifying technology trends is different from thematic clustering.

123
1032 Scientometrics (2016) 108:1013–1041

Table 3 Obtained results of the trend extraction and results of the trends harmonization

Harmonized trend: Linked open data (LOD)
  Articles: Linked data; Data linkage; Open data
  Patents: Linked open data; Semantic interpretation; Ontology derivation
  Conferences: Linked data; Web of data; Metadata
  Presentations: Linked data; Graph database; Web of data

Harmonized trend: Social semantic web
  Articles: Semantic web; Folksonomies; Social bookmarking
  Patents: Social data; Semantic graph; Contextual workspaces
  Conferences: Social web; Social media; Social software
  Presentations: Social media; Social semantics; Semantic wiki

Harmonized trend: Semantic interoperability
  Articles: Ontology mapping; Interoperability; Knowledge representation
  Patents: Record linkage; Interoperability; Content mapping
  Conferences: Semantic interoperability; Metadata integration; Semantic repositories
  Presentations: Semantic interoperability; Archetypes metadata; Information management

Harmonized trend: Semantic bioinformatics
  Articles: Bioinformatics ontology; Bioinformatics platform; Bioinformatic resources; Bioinformatics technology
  Patents: –
  Conferences: Bioinformatics; Bioinformatics e-resources
  Presentations: Semantic bioinformatics; Knowledge system; Biological systems

Harmonized trend: Mobile semantic
  Articles: Mobile semantic; Multimodal mobile interfaces; Mobile environments
  Patents: –
  Conferences: Mobile web; Smart devices; Ubiquitous computing
  Presentations: Mobile semantic; Semantic agents; Mobile devices

Harmonized trend: Semantic digital libraries
  Articles: Semantic digital libraries; Affiliation disambiguation; Automated ontology; Digital resources; Data repositories
  Patents: Digital documents; Digital library; Library cloud
  Conferences: Semantic digital libraries; Library collections
  Presentations: Semantic digital libraries; Library system; Library services

The algorithm proposed by the HSE is aimed at enabling more granular processing of the document collections and at identifying novel trends. Among the key features of the HSE clustering algorithm are:
- 'On the fly' selection of the clustering elements: bigrams, trigrams, n-grams.
- Use of different clustering metrics: TF/IDF or TF only.
- Compatibility with a variety of source document formats (pdf, doc, ppt, rtf).
- Availability of tools for weighing meaningful linguistic markers for the formation of new cluster centroids.
With these capabilities, the HSE algorithm aims to address some of the limitations of the Carrot software which were identified through the study and which were considered to limit the possibility of identifying unique trends.
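Two of the listed capabilities, selecting the n-gram length 'on the fly' and switching between TF-only and TF/IDF weighting, can be illustrated with a small self-contained sketch (not the HSE implementation itself):

```python
# Minimal n-gram TF/IDF sketch illustrating two of the features above:
# a selectable n-gram length and a TF-only vs TF/IDF switch.
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined back into strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def vectorize(docs, n=2, use_idf=True):
    """Return one {ngram: weight} vector per document."""
    grams = [Counter(ngrams(d.lower().split(), n)) for d in docs]
    df = Counter(g for c in grams for g in c)  # document frequency per n-gram
    N = len(docs)
    vectors = []
    for c in grams:
        vec = {}
        for g, tf in c.items():
            idf = math.log(N / df[g]) if use_idf else 1.0
            vec[g] = tf * idf
        vectors.append(vec)
    return vectors

docs = ["semantic web ontology", "semantic web services", "linked open data"]
tfidf = vectorize(docs, n=2, use_idf=True)
# "semantic web" occurs in two of the three documents, so its weight is
# lower than that of bigrams unique to a single document.
```

Switching `use_idf=False` reproduces the TF-only metric; changing `n` moves between bigrams, trigrams and longer n-grams.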
In order to address the problems encountered with the existing applications, the HSE algorithm was equipped with a multi-level clustering scheme with intelligent expansion of the clusterization stop-list. This new approach for automatic stop-list extension was developed as a module in the new software. The method was included in the HSE clustering algorithm so that keywords and phrases that do not relate to technology trends automatically extend the basic clusterization stop-list. During the analysis of a particular collection of documents (such as patents or theses), high-frequency terms such as 'patent' and 'invention' otherwise come to dominate the cluster characteristic vectors. To reduce the impact of these terms on the formation of clusters, each of the nine document collections was supplemented by a specific parity document collection of the same genre.
In order to reduce the 'noise' of frequent terms, N-gram frequency ratings were used. These were compared online at different levels (i.e. the whole document collection, and the topics and sub-topics of clusters). A term is automatically identified as noise and included in the stop-list if the null hypothesis cannot be rejected, namely that the term's high frequency is distributed identically over the document collection and the clusters' topics or sub-topics, rather than being concentrated in a particular cluster. Finally, all the stop-lists are combined to form a unified 'master stop-list'. The terms identified were included in the vocabulary following the 'bag-of-words' model. Then, frequency characteristics were generated from the adjusted document collection, based on TF/IDF. The term vectors of the documents were formed by taking the terms as document attributes and the frequency characteristics as the values of those attributes. The HSE algorithm is capable of customizing parity collections for different areas. With this feature, the HSE clustering algorithm becomes one of the most flexible and adaptable approaches to clusterization for trend identification. All clustering parameters and tuning options in the algorithm are customizable, which provides different tuning options, such as changing the TF/IDF ratio, and other granular customization of results with a wide set of clustering metrics.
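The frequency-comparison idea behind the automatic stop-list extension can be sketched as follows: a term whose relative frequency inside a cluster does not clearly exceed its frequency across the whole collection is flagged as noise. The ratio threshold and the toy data below are illustrative assumptions, standing in for the statistical test used in the actual algorithm:

```python
# Sketch of automatic stop-list extension: a term is flagged as noise
# when its share inside a cluster does not exceed its share in the whole
# collection by a chosen ratio (an illustrative threshold, not the
# hypothesis test used in the real algorithm).
from collections import Counter

def extend_stoplist(cluster_tokens, collection_tokens, ratio=1.5):
    """Return the cluster terms that look evenly spread, i.e. noise."""
    cluster = Counter(cluster_tokens)
    collection = Counter(collection_tokens)
    n_cluster = sum(cluster.values())
    n_collection = sum(collection.values())
    noise = set()
    for term, count in cluster.items():
        p_cluster = count / n_cluster
        p_collection = collection[term] / n_collection
        if p_cluster <= ratio * p_collection:  # not concentrated here
            noise.add(term)
    return noise

# Toy genre collection where 'patent' is frequent everywhere, while
# 'ontology' and 'rdf' are concentrated in the cluster at hand.
collection = ["patent"] * 50 + ["invention"] * 40 + ["ontology"] * 5 + ["rdf"] * 5
cluster = ["patent"] * 5 + ["ontology"] * 4 + ["rdf"] * 3
print(extend_stoplist(cluster, collection))  # {'patent'}
```

Merging such per-collection noise sets yields the unified 'master stop-list' described above.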
The second improvement is related to the development of a fuzzy (soft) clusterization algorithm. This method implements topic modelling (Blei and Lafferty 2007; Griffiths and Steyvers 2004; Steyvers and Griffiths 2007) and binds a source document simultaneously to a large number of clusters, which, when combined with the intelligent stop-list expansion, increases the weight of topics corresponding to implicit technological trends.
For the purpose of clustering, the HSE algorithm benefits from various methods, including k-means with the Euclidean distance metric, the Manhattan distance metric and the Pearson correlation metric, along with latent Dirichlet allocation (LDA) modeling with an extension for N-gram support, and formal concept analysis (FCA) based on lattice theory. Using the algorithm, it is possible to manage the representation, structure and different analytical dimensions of the clustering results. Within such customization, it is possible to change and refine the core clustering functions, producing results based only on TF (without IDF) calculation, obtaining unigram, bigram and trigram lists, and other modifications of the core and supplementary algorithms. To weight the co-occurrence of categorical attributes (keywords) and individual patterns (documents), the FCCM fuzzy clustering algorithm was used, with modifications for large text corpora (Kummamuru et al. 2003).
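The fuzzy (soft) assignment of a document to several clusters at once can be illustrated with a minimal fuzzy c-means iteration on toy two-dimensional vectors. This is a generic FCM sketch, not the FCCM modification cited above:

```python
# Minimal fuzzy c-means sketch: each point receives a membership degree
# in every cluster rather than a single hard label. Toy 2-D vectors and
# fixed initial centroids keep the example deterministic.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(points, centroids, m=2.0):
    """Membership of each point in each cluster (each row sums to 1)."""
    result = []
    for p in points:
        d = [max(dist(p, c), 1e-12) for c in centroids]
        row = []
        for i in range(len(centroids)):
            denom = sum((d[i] / dj) ** (2 / (m - 1)) for dj in d)
            row.append(1.0 / denom)
        result.append(row)
    return result

def update_centroids(points, u, m=2.0):
    """Recompute centroids as membership-weighted means."""
    k, dims = len(u[0]), len(points[0])
    cents = []
    for j in range(k):
        w = [row[j] ** m for row in u]
        total = sum(w)
        cents.append(tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total
                           for d in range(dims)))
    return cents

points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (2.5, 2.5)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
for _ in range(10):                  # a few alternating updates
    u = memberships(points, centroids)
    centroids = update_centroids(points, u)
# the middle point (2.5, 2.5) belongs to both clusters almost equally
```

In the trend-monitoring setting, the role of the 2-D points is played by the term vectors described above, and a document with comparable memberships in several clusters is exactly the cross-cutting case that hard clustering suppresses.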
Figure 4 illustrates the results of the clusterization through the HSE algorithm described
above.
As highlighted in the figure, this algorithm helped to reveal additional and more novel trends which were not identified by the other tools, including 'Semantic Business Intelligence' and 'Semantic e-Health', along with other e-applications for commerce, government, and learning. The results also confirmed some of the other trends found previously, such as 'Linked Data'.
Consultations on, and validation of, the results generated through the Carrot and HSE algorithms revealed eight trends in total, which are described in the next section of the paper. In summary, the analysis has shown that standard clusterization tools may not reveal


all the trends. Combined and customized use of the clustering tools provides more diverse outputs, particularly if the purpose is not only to generate clusters, but to discover hidden patterns in the data.

Stage 4: Identification and description of trends

This stage begins with the integration of the output generated in order to identify technology trends. For the pilot area, 'semantic technologies,' a wide variety of collections was generated, including 6000 dissertation abstracts; 4994 abstracts of scientific articles; 2434 materials from conferences (programs, call texts and manuscripts); 623 patents; 994 web articles; 110 SlideShare presentations; 76 abstracts from FP7 projects; and 25 Foresight projects. In the framework of this study, both clustering methods described earlier were used to gradually process each document collection coming from the different information sources, with continuous expert consultation. As a result of this process, the following list of eight trends was identified:
1. Linked open data (LOD).
2. Social semantic web.
3. Semantic business intelligence.
4. Semantic interoperability.
5. Semantic bioinformatics.
6. Mobile semantic.
7. Semantic digital libraries.
8. Semantic-based e-Apps (semantic e-commerce, e-government, e-learning, e-health).

Fig. 4 HSE clustering results


These results were presented at a final workshop with the participation of broader stakeholders to discuss and prioritize the list of trends. The workshop resulted in a reduced number of trends, those considered to be the most relevant and promising. The shorter list of trends included:
1. Linked open data (LOD).
2. Social semantic web.
3. Mobile semantic.
4. Semantic digital libraries.
5. Semantic-based e-Apps (semantic e-commerce, e-government, e-learning, e-health).
The final list of trends was taken to the next step for further elaboration and database
creation.

Stage 5: Creation of a trends database

After the final list of technology trends was generated, the interpretation stage elaborated
the trends further to provide a thorough description for their further use. A template was
designed to describe the most determinative features of trends in order to assess their
significance in terms of their impact on global socio-economic development in the long
term, as well as their potential impacts on Russia. For understanding the state-of-the-art in
the subject area, a brief description of scientific and technological context is included.
Promising products/services associated to the trend indicates the potentials for exploitation.
Further socio-economic benefits are indicated with a brief description of new consumer
properties (e.g. portability, multi-functionality, energy efficiency, etc.), disruptive capacity,
expected effects of its development (social, economic, ecological, political, etc.), as well as
key areas of application. One of the core features of trend descriptions is the life cycle
stage at which major work is carried out (basic research, applied research, prototypes, mass
production), which in turn may indicate approximate year of technology implementation
and expected market volume. Mechanisms for the efficient introduction of the technologies
were indicated with the development of other interconnected technologies as well as
alternative directions of technological development that have their comparative advantages
and disadvantages. Information on key players and leading countries in the subject area
were considered to be helpful to determine Russia’s position in global development of the
technology. In addition, assessing the technology development should take into account drivers that stimulate it (for example, the need to reduce emissions, the need to process large amounts of information, legal requirements, etc.), as well as barriers, risks, and uncertainties that may adversely affect the introduction of the technology. Additional literature, web-links, and other background information were also considered useful for further elaboration and updating of information on technology trends. Consequently, the final template included the following features for the description of identified trends:
1. Name (full title of the technology trend).
2. Short description.
3. Promising products/services associated to the trend.
4. New consumer properties.
5. Expected effects (the most important results of technology application—in society,
economy, ecology, etc.)
6. Life cycle stage (the stage of technology development—basic research, applied research, prototype, mass production).


7. Year of technology implementation.
8. Areas of application.
9. Market volume.
10. Competitive technologies (those that can substitute the studied technology, with their advantages and disadvantages).
11. Associated technologies related to the studied trend.
12. Leading countries in the area (the most successful countries in technology
development).
13. Key players (companies, universities, research organizations, or other institutions mainly engaged in developing the technology).
14. Drivers (factors that can accelerate innovation process).
15. Disruptive capacity (potentials to be a game changer).
16. Barriers, risks and uncertainties against the trend.
17. Data sources and additional information (literature, web-links, and other
background information).
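The seventeen template features above map naturally onto a record structure. The sketch below is illustrative only: the field names and placeholder values are our own shorthand, not the actual schema of the trends database:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TrendRecord:
    """One trends-database entry; fields follow the 17 template features (names assumed)."""
    name: str                                                        # 1. full title of the trend
    short_description: str = ""                                      # 2.
    promising_products: List[str] = field(default_factory=list)      # 3.
    consumer_properties: List[str] = field(default_factory=list)     # 4.
    expected_effects: Dict[str, str] = field(default_factory=dict)   # 5. social/economic/...
    life_cycle_stage: str = ""   # 6. basic research | applied research | prototype | mass production
    implementation_year: int = 0                                     # 7.
    application_areas: List[str] = field(default_factory=list)       # 8.
    market_volume: str = ""                                          # 9.
    competitive_technologies: List[str] = field(default_factory=list)  # 10.
    associated_technologies: List[str] = field(default_factory=list)   # 11.
    leading_countries: List[str] = field(default_factory=list)       # 12.
    key_players: List[str] = field(default_factory=list)             # 13.
    drivers: List[str] = field(default_factory=list)                 # 14.
    disruptive_capacity: str = ""                                    # 15.
    barriers_risks: List[str] = field(default_factory=list)          # 16.
    sources: List[str] = field(default_factory=list)                 # 17.

# Placeholder entry for one shortlisted trend (values are illustrative, not from the study)
lod = TrendRecord(name="Linked open data (LOD)",
                  short_description="Publishing interlinked, machine-readable open datasets")
```

A record of this shape is what the search and reporting functions described later would operate over.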
Once the descriptions were completed, a global trends database was generated with all the
trends mapped. The TTM methodology described above will be extended to cover other
sectors and topics at three levels: upper, intermediate and bottom. At the upper level, the
aforementioned six priority areas for the Russian Government will be considered. Each of
the priority areas will then be divided into five to six technological sub areas at the
intermediate level. Then five to eight technology trends are identified at the bottom level.
The TTM case study described above focused on the ‘ICTs’ sector at the upper level and ‘semantic technologies’ at the intermediate level, and identified five technology trends at the bottom level. The database generated will be used to map the entire set of trends in all
priority areas.
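The three-level structure (priority area → sub-area → trends) can be sketched as a nested mapping. The branch shown is the one from this case study; the remaining branches are left as placeholders:

```python
# Three-level trend map: upper (priority area) -> intermediate (sub-area) -> bottom (trends)
trend_map = {
    "ICTs": {
        "semantic technologies": [
            "Linked open data (LOD)",
            "Social semantic web",
            "Mobile semantic",
            "Semantic digital libraries",
            "Semantic-based e-Apps",
        ],
        # five to six sub-areas per priority area in the full exercise
    },
    # five more priority areas at the upper level
}

def trends_in(area, sub_area):
    """Return the bottom-level trends for one intermediate sub-area (empty if unmapped)."""
    return trend_map.get(area, {}).get(sub_area, [])
```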

Uses of the TTM results

Besides raising awareness on emerging trends, the TTM study and the trends database are considered to be a useful source of information for a number of further efforts, such as national, regional or sectoral level Foresight projects, STI policy formulation, and corporate R&D strategy making. This is the final stage of the TTM process, which is entitled ‘intervention’. Impacts with the TTM work may be achieved in various ways:
• The analysis of publication, patent, media and other databases reveals trends in scientific, technological and policy domains.
• The trend monitoring study helps to identify the ‘weak signals’ of possible future developments with potential opportunities and threats. Once identified, it will be possible to prioritise trends and weak signals in the most promising areas of STI.
• Through the bibliometric analyses, the results of the TTM study indicate leading countries, institutions and individuals in certain STI domains. This makes it possible to build domestic and international collaboration networks with the use of the TTM results.
• A ‘gap analysis’ can be conducted by comparing and contrasting the results in the world and in Russia. Strengths and weaknesses can be identified at the global and national levels, and future collaborators can be identified by using the networks of leading countries, institutions and individuals.
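In its simplest form, the ‘gap analysis’ mentioned above reduces to comparing national against world activity per trend. The sketch below uses invented publication counts (not data from the study) purely to illustrate the ranking logic:

```python
def gap_analysis(world_counts, national_counts):
    """Rank trends by national share of world publication output,
    smallest share (largest gap) first.  Illustrative logic only."""
    gaps = {}
    for trend, world in world_counts.items():
        national = national_counts.get(trend, 0)
        gaps[trend] = national / world if world else 0.0
    # sort trend names by ascending national share
    return sorted(gaps, key=gaps.get)

# Invented example counts (NOT from the TTM study)
world = {"Linked open data (LOD)": 800, "Social semantic web": 500, "Mobile semantic": 200}
russia = {"Linked open data (LOD)": 40, "Social semantic web": 50, "Mobile semantic": 2}
print(gap_analysis(world, russia))  # largest-gap trend listed first
```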


In order to serve these purposes, a TTM database has been generated. This relational database involves all the attributes used in the trend descriptions and allows searching for trends by keywords, sectors, institutions and countries, as well as by associated grand challenges and the social, technological, economic, environmental and political domains they may be related to. Advanced reporting functions have been added to the database. The
reporting function helps to generate reports based on the requirements of the database
users. For instance, TRENDletters are currently issued periodically on the national priority areas identified for Russia.2 Examples of TRENDletters released so far can be seen on the dedicated web page (in Russian).3 The TRENDletters are distributed to different user groups engaged in STI policy and strategy processes, including individuals, government institutions, business firms and research institutions, among others.
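A minimal sketch of the kind of keyword/sector/country search such a relational database supports, here using SQLite with an invented schema and invented rows (the actual database structure is not published):

```python
import sqlite3

# In-memory stand-in for the TTM trends database; table and column names are assumptions
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trends (name TEXT, sector TEXT, country TEXT, domain TEXT)")
conn.executemany(
    "INSERT INTO trends VALUES (?, ?, ?, ?)",
    [("Linked open data (LOD)", "ICT", "USA", "technological"),      # invented rows,
     ("Mobile semantic", "ICT", "Germany", "technological"),         # not study data
     ("Semantic bioinformatics", "Life sciences", "UK", "social")])

def search(sector=None, country=None):
    """Filter trends by sector and/or country, as the reporting function might."""
    query, params = "SELECT name FROM trends WHERE 1=1", []
    if sector:
        query += " AND sector = ?"; params.append(sector)
    if country:
        query += " AND country = ?"; params.append(country)
    return [row[0] for row in conn.execute(query, params)]

print(search(sector="ICT"))  # both ICT trends
```

Parameterized queries of this shape are also what a report generator would compose from the user's chosen filters.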
Furthermore, the results of the TTM activity have been used in the Russian STI Foresight 2030 exercise, which was addressed by President Vladimir Putin at the Federal Assembly of the Russian Federation.4 The TTM studies are currently being used for the on-going Foresight 2040 exercise.

Conclusions and discussion

This paper has presented a TTM methodology, which aims to detect and describe a set of
technology trends in selected domains of interest. Conceptually, the process begins with an
Intelligence gathering process, which involves undertaking comprehensive understanding
and scoping activities to best describe the area under investigation. After scoping the field
with selected keywords and terms, a wide variety of information sources are analysed
through the use of a combination of quantitative and qualitative techniques. This process helps to extract a number of established and emerging technologies and technology application areas, along with their individual, institutional and geographical attributions.
The TTM process has validated one of the premises of the paper that using multiple
sources of data gives a more complete picture of the technological evolution process. As
the focus of the TTM process moved from publications towards patents and then to
conferences and presentations, it was observed that there is a move from ‘supporting’
technologies to ‘solution’ approaches and then to more ‘practical’ business-oriented
applications. Each data source is considered to indicate a different stage of the technological development life-cycle, that is, the emergence, growth, maturity or saturation phase, with implications for the supply of and demand for new technologies.
Then, relationships between the technologies and application areas were investigated
through the clusterization processes to identify the emerging trends in the domain. From the analyses with the various readily available clusterization tools, it was concluded that revealing technology trends requires a more customised approach than merely identifying clusters. A proprietary clustering algorithm was developed by the HSE to overcome
the shortcomings of the existing tools. With its improved properties, the new algorithm

2 The priority areas include Information and Communication Technologies, Living Systems and Biotechnologies, Nanotechnologies, Transportation and Aerospace Technologies, Technologies for the Rational Use of Natural Resources, and Energy Efficiency and Energy Saving Technologies.
3 http://issek.hse.ru/trendletter/. Last visited on October 27, 2015.
4 http://www.hse.ru/data/2014/03/03/1330240475/Foresight%202030.pdf. Last visited on October 27, 2015.


helped to identify additional technology trends, which represented two of the five final clusters shortlisted through consultations.
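The HSE algorithm itself is proprietary and not published in detail, but the general idea of grouping documents by shared keywords can be illustrated with a deliberately simple greedy single-link sketch (a toy stand-in, not the actual algorithm):

```python
def cluster_by_keywords(docs, min_shared=1):
    """Greedy single-link clustering: a document joins the first cluster with
    which it shares at least `min_shared` keywords.  Toy stand-in only --
    the actual HSE algorithm is proprietary and far more elaborate."""
    clusters = []  # each cluster: [list_of_doc_ids, set_of_keywords]
    for doc_id, keywords in docs.items():
        kw = set(keywords)
        for ids, cluster_kw in clusters:
            if len(cluster_kw & kw) >= min_shared:
                ids.append(doc_id)
                cluster_kw.update(kw)
                break
        else:  # no overlap with any existing cluster: start a new one
            clusters.append([[doc_id], kw])
    return [sorted(ids) for ids, _ in clusters]

# Toy document collection: document ids mapped to extracted keywords (invented)
docs = {
    "p1": ["linked data", "RDF"],
    "p2": ["RDF", "ontology"],
    "p3": ["mobile", "context-aware"],
}
print(cluster_by_keywords(docs))  # groups p1 with p2; p3 stays alone
```

Real trend-detection clustering would additionally weight keywords, track their dynamics over time, and involve expert review of the resulting groups, as described in the paper.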
The trends identified were described in detail by using a wide variety of parameters to
cover future opportunities and threats for policy makers, corporations, research bodies and
other potential users. A database is currently being prepared with advanced search and
reporting options to enable the use of TTM results for multiple purposes, for example as an
input for future national, regional and sectoral Foresight studies.
The review of the technology monitoring, technology watch and technology mining literatures reveals that the methodology developed is comparable to other similar efforts
with potentially useful new features. It fulfils the expectations of diversity in the use of information sources, a systematic process, transparency, inclusivity and scalability, and is therefore considered to be useful for other TTM efforts. Moreover, the proposed TTM
methodology has several features which distinguish it from similar efforts such as the ones
developed by Daim et al. (2006), Porter and Newman (2011), and Shibata et al.
(2008, 2011). First of all, a systemic approach is proposed for the TTM process. This is a more holistic process than the other methods, achieved by integrating the comprehensive technology intelligence process with implementation and policy making processes. The results are not provided as raw materials, but instead inform precise strategies and policies through the identification of threats, weak signals of change, collaboration networks and gap analysis. Therefore, the operationalization of results and action-orientation are considered to be among the most prominent features of the approach described in this paper.
Secondly, the proposed TTM approach does not rely only on quantitative methods (i.e. bibliometric analysis, scientometrics, or technology mining). The process benefits from a combination of quantitative and qualitative methods, which go beyond mere data collection and analysis. Hence, the process is more inclusive, creative and strategy focused, with the use of expert consultations, the possibility of integrating scenario methods formulated around the weak signals of future changes, and the use of gap analysis and strategic roadmaps for long-, medium- and short-term strategies.
Thirdly, the proposed TTM approach benefits from a wide variety of data and infor-
mation sources. Most of the existing studies rely merely on the analysis of patent or
publication data. However, the approach described in the present paper draws upon a wide
variety of sources to gather intelligence: for instance, intelligence on research and development through the analysis of publications; technology and product intelligence through the analysis of patents; and market intelligence through media analysis. All these provide lenses for the analysis of different levels of development through the STI life cycle.
Furthermore, the TTM study described above made use of a new clustering algorithm,
which has been developed and tested in the framework of several research projects
undertaken by the Higher School of Economics. The integrated use of currently available
algorithms with the more customized one for trend analysis helped to explore more future-
oriented and novel content for discussion. The joint use of clustering tools also increased
the reliability of the outputs. The process of consultations and workshops indicated that the
experts found the trends identified complete and well-spotted without the need for adding
any further trends into the list. The final list of technology trends was considered to be valid and representative of the dynamics of the technology area under investigation.
As a whole, the proposed methodology contributes to TTM research with a novel approach, which allowed developing tools and processes for exploring all relevant information sources with an original and flexible clusterization mechanism, and providing a systematic framework to translate results into practice. Future steps of the study will involve


the extended use of the TTM methodology in different domains, while improving the capabilities for extracting hidden patterns in data through quantitative and qualitative improvements, and further integration with policy and strategy making processes.

Acknowledgments The article was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported within the framework of a subsidy by the Russian Academic Excellence Project ‘5-100’. The authors are grateful for the immense help of Sergey Kuznetsov’s team (Higher School of Economics) in the development of clustering algorithms and of Mr. Evgeny Klochickin (Ph.D. candidate at the Manchester Institute of Innovation Research, Manchester Business School) in the process of extracting and analysing the data.

References
Abraham, B. P., & Moitra, S. D. (2001). Innovation assessment through patent analysis. Technovation,
21(4), 245–252.
Antón, P. S., Silberglitt, R., & Schneider, J. (2001). The global technology revolution: Bio/nano/materials
trends and their synergies with information technology by 2015. Santa Monica, CA: RAND.
Battelle. (2014). Battelle database. http://www.battelle.org. Last visited on February 3, 2014.
Blei, D., & Lafferty, J. (2007). A correlated topic model of science. Annals of Applied Statistics, 1(1),
17–35.
Campbell, R. S. (1983). Patent trends as a technological forecasting tool. World Patent Information, 5(3),
137–143.
Chao, C.-C., Yang, J.-M., & Jen, W.-Y. (2007). Determining technology trends and forecasts of RFID by a
historical review and bibliometric analysis from 1991 to 2005. Technovation, 27(5), 268–279.
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific
literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.
Cobo, M. J., Lopez-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting,
quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets
theory field. Journal of Informetrics, 5(1), 146–166.
Corrocher, N., Malerba, F., & Montobbio, F. (2003). The emergence of new technologies in the ICT field:
Main actors, geographical distribution and knowledge sources. TENIA project. http://eco.uninsubria.
it/dipeco/quaderni/files/QF2003_37.pdf. Last visited on February 3, 2014.
Daim, T. U., Rueda, G., Martin, H., & Gerdsri, P. (2006). Forecasting emerging technologies: Use of
bibliometrics and patent analysis. Technological Forecasting and Social Change, 73(8), 981–1012.
Deloitte. (2012). Tech trends 2012: Elevate IT for digital business. http://www.deloitte.com/assets/Dcom-
UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf. Last visited on
February 3, 2014.
Dereli, T., & Durmusoglu, A. (2009). A trend-based patent alert system for technology watch. Journal of
Scientific and Industrial Research, 68(8), 674–679.
Fraunhofer ISI. (2014). Emerging technologies. http://www.isi.fraunhofer.de/isi-en/t/index.php. Last visited
on February 3, 2014.
Gartner. (2013). Top 10 strategic technology trends for 2013. http://www.gartner.com/technology/research/
top-10-technology-trends. Last visited on February 3, 2014.
Griffiths, T., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of
Sciences, 101(1), 5228–5235.
Guo, H., Weingart, S., & Borner, K. (2011). Mixed-indicators model for identifying emerging research
areas. Scientometrics, 89(1), 421–435.
IBM. (2013). IBM five in five. http://www.ibm.com/smarterplanet/us/en/ibm_predictions_for_future/ideas/
index.html. Last visited on February 3, 2014.
IEA. (2013). Tracking clean energy progress 2013. http://www.iea.org/publications/TCEP_web.pdf. Last
visited on February 3, 2014.
iKNOW. (2014). iKNOW database. http://community.iknowfutures.eu. Last visited on February 3, 2014.
ITU. (2014). Technology watch. http://www.itu.int/en/ITU-T/techwatch/Pages/default.aspx. Last visited on
February 3, 2014.


Kajikawa, Y., Yoshikawa, J., Takeda, Y., & Matsushima, K. (2008). Tracking emerging technologies in
energy research: Toward a roadmap for sustainable energy. Technological Forecasting and Social
Change, 75(6), 771–782.
Kim, Y. G., Suh, J. H., & Park, S. C. (2008). Visualization of patent analysis for emerging technology.
Expert Systems with Applications, 34(3), 1804–1812.
Kim, Y., Tian, Y., Jeong, Y., Jihee, R., & Myaeng, S.-H. (2009). Automatic discovery of technology trends
from patent text. In Proceedings of the 2009 ACM symposium on applied computing (pp. 1480–1487).
http://ir.kaist.ac.kr/papers/2008/20081025_2009_SAC_Camera-ready.pdf. Last visited on February 3,
2014.
Kostoff, R. N. (1999). Science and technology innovation. Technovation, 19(10), 593–604.
Kostoff, R. N. (2003). Science and technology text mining: Global technology watch, office of naval
research, technical report. http://www.dtic.mil/get-tr-doc/pdf?AD=ADA415863. Last visited on
February 3, 2014.
Kostoff, R. N., Boylan, R., & Simon, G. R. (2004). Disruptive technology roadmaps. Technological
Forecasting and Social Change, 71(1–2), 141–159.
Kostoff, R. N., Briggs, M. B., Solka, J. L., & Rushenberg, R. L. (2008). Literature-related discovery (LRD):
Methodology. Technological Forecasting and Social Change, 75(2), 186–202.
Kostoff, R. N., Eberhart, H. J., & Toothman, D. R. (1997). Database tomography for information retrieval.
Journal of Information Science, 23(4), 301–311.
Kummamuru, K., Dhawale, A., & Krishnapuram, R. (2003). Fuzzy co-clustering of documents and key-
words. In Proceedings of the 12th IEEE international conference on the fuzzy systems (Vol. 2,
pp. 772–777).
Kuznetsov, S. (2001). Machine learning on the basis of formal concept analysis. Automation and Remote
Control, 62(10), 1543–1564.
Lee, H.-J., Lee, S., & Yoon, B. (2011). Technology clustering based on evolutionary patterns: The case of
information and communications technologies. Technological Forecasting and Social Change, 78(6),
953–967.
Lee, S., Yoon, B., & Park, Y. (2009). An approach to discovering new technology opportunities: Keyword-
based patent map approach. Technovation, 29(6–7), 481–497.
Lux Research. (2014). Lux research database. http://www.luxresearchinc.com. Last visited on February 3,
2014.
Martino, J. P. (2003). A review of selected recent advances in technological forecasting. Technological
Forecasting and Social Change, 70(8), 719–733.
Microsoft-Fujitsu. (2011). Key ICT trends and priorities (Vol. 1). http://download.microsoft.com/
documents/Australia/InsightsQuarterly/IQ_IG%20Full%20Report. Last visited on February 3, 2014.
Morris, S., DeYong, C., Wu, Z., Salman, S., & Yemenu, D. (2002). DIVA: A visualization system for
exploring document databases for technology forecasting. Computers and Industrial Engineering,
43(4), 841–862.
Mrakotsky-Kolm, E., & Soderlind, G. (2009). Final recommendations towards a methodology for tech-
nology watch at EU level. STACCATO Deliverable 2.2.1. http://publications.jrc.ec.europa.eu/
repository/bitstream/111111111/12930/1/reqno_jrc50348_staccato%20tech%20watch.pdf. Last vis-
ited on February 3, 2014.
NISTEP. (2010). The 9th science and technology foresight. National Institute of Science and Technology
Policy, NISTEP report no 140 ‘The 9th Delphi Survey’. March 2010. http://www.nistep.go.jp/achiev/
sum/eng/rep140e/pdf/rep140se.pdf. Last visited on February 3, 2014.
NRC. (2005). Avoiding surprise in an era of global technology advances. Committee on Defense Intelli-
gence Agency Technology Forecasts and Reviews, National Research Council. http://www.nap.edu/
catalog.php?record_id=11286. Last visited on February 3, 2014.
OECD. (2007). Infrastructure to 2030 (volume 2): Mapping policy for electricity, water and transport.
http://www.oecd.org/futures/infrastructureto2030/40953164.pdf. Last visited on February 3, 2014.
ONR. (2014). Office of Naval Research website. http://www.onr.navy.mil. Last visited on February 3, 2014.
Palomino, M. A., Vincenti, A., & Owen, R. (2013). Optimising web-based information retrieval methods for
horizon scanning. Foresight, 15(3), 159–176.
Porter, A. L. (2005). QTIP: Quick technology intelligence processes. Technological Forecasting and Social
Change, 72(9), 1070–1081.
Porter, A. L., & Cunningham, S. W. (2005). Tech mining: Exploiting new technologies for competitive
advantage. New York, NY: Wiley.
Porter, A. L., & Newman, N. C. (2011). Mining external R&D. Technovation, 31(4), 171–176.
Ruotsalainen, L. (2008). Data mining tools for technology and competitive intelligence. Espoo 2008. VTT
Tiedotteita—Research Notes 2451.


Saritas, O. (2013). Systemic foresight methodology. In D. Meissner, L. Gokhberg, & A. Sokolov (Eds.),
Foresight and science, technology and innovation policies: Best practices (pp. 83–117). Berlin:
Springer.
Shell. (2007). The Shell Global Scenarios to 2025. The future business environment: trends, trade-offs and
choices. http://www-static.shell.com/content/dam/shell/static/aboutshell/downloads/our-strategy/shell-
global-scenarios/exsum-23052005.pdf. Last visited on February 3, 2014.
Shibata, N., Kajikawa, Y., & Sakata, I. (2010). Extracting the commercialization gap between science and
technology—Case study of a solar cell. Technological Forecasting and Social Change, 77(7),
1147–1155.
Shibata, N., Kajikawa, Y., & Sakata, I. (2011). Detecting potential technological fronts by comparing
scientific papers and patents. Foresight, 13(5), 51–60.
Shibata, N., Kajikawa, Y., Takeda, Y., & Matsushima, K. (2008). Detecting emerging research fronts based
on topological measures in citation networks of scientific publications. Technovation, 28(11), 758–775.
Silberglitt, R., Antón, P. S., Howell, D. R., & Wong, A. (2006). The global technology revolution 2020, In-
depth analysis: Bio/nano/materials/information trends, drivers, barriers and social applications. http://
www.rand.org/content/dam/rand/pubs/technical_reports/2006/RAND_TR303.pdf. Last visited on
February 3, 2014.
Smalheiser, N. R. (2001). Predicting emerging technologies with the aid of text-based data mining: The
micro approach. Technovation, 21(10), 689–693.
Steyvers, M., Griffiths, T. (2007). Probabilistic topic models. In D. McNamara, S. Dennis, W. Kintsch
(Eds.), Handbook of latent semantic analysis. Psychology Press, Hove. ISBN 978-0-8058-5418-3.
TechCast. (2014). TechCast database. http://www.techcast.org. Last visited on February 3, 2014.
TrendHunter. (2014). TrendHunter database. http://www.trendhunter.com. Last visited on February 3, 2014.
Tseng, Y. H., Lin, Ch. J., & Lin, Y. I. (2007). Text mining techniques for patent analysis. Information
Processing and Management, 43(5), 1216–1247.
UK Government Office for Science. (2010). Technology and innovation futures: UK growth opportunities
for the 2020s. http://www.northamptonshireobservatory.org.uk/docs/doc10-1252-technology-and-
innovation-futures[1]101105145732.pdf. Last visited on February 3, 2014.
Upham, S. P., & Small, H. (2010). Emerging research fronts in science and technology: Patterns of new
knowledge development. Scientometrics, 83(1), 15–38.
Wang, M. Y., Chang, D. S., & Kao, Ch.-H. (2010). Identifying technology trends for R&D planning using
TRIZ and text mining. R&D Management, 40(5), 491–509.
Wouters, C., Dillon, T., Rahayu, W., & Chang, E. (2002). A practical walkthrough of the ontology
derivation rules. In R. Cicchetti et al. (Eds.), DEXA 2002, LNCS 2453 (pp. 259–268).
Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend.
Journal of High Technology Management Research, 15(1), 37–50.
Zhu, D., & Porter, A. L. (2002). Automated extraction and visualization of information for technological
intelligence and forecasting. Technological Forecasting and Social Change, 69(5), 495–506.
Z-Punkt. (2014). Trend Radar 2020. http://www.z-punkt.de/trend-radar2020.html. Last visited on February
3, 2014.
