
An Operational Test of KnowledgeNet

Paul Thompson
Thayer School of Engineering and
Department of Computer Science, Dartmouth College
Hanover, New Hampshire 03755, U.S.A.

ABSTRACT

This paper describes an operational test of the KnowledgeNet system for knowledge management. KnowledgeNet is based on probabilistic design principles first developed for document retrieval, but which have been applied to the retrieval of people within an organization as the sources of information. This operational test of KnowledgeNet was conducted at the Caltex operation at the Minas oil field in Indonesia. An initial analysis of the results of this test shows that employees at Tripatra were able to probabilistically index themselves as sources of information accurately enough to provide useful retrieval of their expertise by their supervisors. This technology shows promise for knowledge management not only within an organization, but also at the national level. The expertise of workers no longer within an organization could be maintained within a national labor database, facilitating the workers’ rehire by another organization.

1. INTRODUCTION

Putting the knowledge existing in the minds of its members to effective use is a challenge facing any organization. The problem tends to grow with the size of the organization. One of the earliest applications of the computer, dating back to the 1950s, was to solve what was called the library problem, using computers to find bibliographic records that could then be used to find documents. In those days computerized retrieval systems were usually used to find documents for scientific researchers. It was commonplace then to observe that people were living in an information age and that there was an information overload problem. These early computerized retrieval systems were based on an exact match retrieval methodology referred to as Boolean logic. A searcher would construct a query expressing his or her information need using terms from the vocabulary used to represent the bibliographic records held in the system, connecting the terms with the Boolean operators “AND”, “OR”, and “NOT”, as well as proximity operators permitting the match of records where, for example, term 1 was within 2 words of term 2. Using such queries a user would tend to find either far too many records (information overload) or few, if any, records.

Much has changed since those days. Computerized retrieval systems now provide access to the full text of documents. Database retrieval systems have been developed which store an organization’s data in structured databases to be retrieved by an exact match query language, SQL, which is not unlike the Boolean retrieval languages of the 1950s. Still, such database records are a very small percentage of the computerized data held by most organizations. Most data is textual. Textual retrieval methods have changed as well. Until about 10 years ago Boolean retrieval systems were virtually the only ones available commercially. Since then major online retrieval services have provided optional ranked retrieval methods and, more significantly, the World Wide Web has brought about a new type of retrieval with its web search engines. Ranked retrieval methods use various metrics to measure the similarity of a user’s query to the text of documents in order to rank documents, so that even if, say, a thousand documents are retrieved by a query, the user can be shown the top ten or twenty documents, which are more likely to be relevant than lower ranked documents. Although these ranked retrieval techniques worked well with well-organized collections such as those maintained by the major online service providers, they did not work well on the Web, especially as it grew larger. A few years ago it was common to do a search on one of the major web search engines and retrieve more than five million poorly ranked documents. More recently web search engines have developed new algorithms that are more suited to the nature of the Web, e.g., its hyperlink structure, and better retrieval is being provided. Nevertheless, none of these improvements in database systems and text retrieval systems solves the problem mentioned above: putting to effective use the knowledge existing in the minds of the members of an organization. This problem has come to be called the knowledge management problem.

To be sure, some of this knowledge is represented in databases. Some of this knowledge is represented in documents written by members of the organization or by others outside the organization. Some of this knowledge might be gleaned from e-mail sent within an organization. However, even if database management systems were extended to much greater coverage than they now have,
and even if document retrieval systems could be developed which for every information need provided all and only the relevant documents contained either within an organization or on the Web, and even if a machine learning, natural language understanding algorithm could be developed and applied to the analysis of all e-mail messages passing through an organization’s e-mail system, the knowledge management problem still would not be solved. This is because much knowledge, including some of the most useful knowledge an organization has, exists only in the minds of its members.

2. KNOWLEDGENET

KnowledgeNet is a knowledge management system, based on principles first developed for ranked document retrieval [1], which directly addresses the problem of accessing knowledge held only in the minds of an organization’s members. Document retrieval researchers realized many years ago that the exact match logic used in early document retrieval systems and in database systems was inadequate to retrieve the unstructured information contained in documents. There are a variety of reasons why exact match technology is inadequate, but chief among them are linguistic and conceptual ambiguity. The same word can have multiple meanings. The same, or similar, concepts can be expressed in many different ways. Over the years retrieval systems were developed that either calculated the probability that a particular document would be relevant for a particular user’s information need, or used heuristics to measure the similarity of the document to the user’s information need, as represented by the query. It was recognized that the same type of probabilistic retrieval algorithms could be used to retrieve information, or knowledge, directly from people, the ultimate sources of the knowledge that might be represented in databases or documents [2].

A person could probabilistically index him- or herself as a source of information. To be more concrete, a person could be asked to estimate how many out of ten people coming to him or her with a question on a particular topic he or she could help. Similarly, a person seeking information using such a system could be asked how many of the n people represented in the organization as having some knowledge about a topic the seeker thinks could provide helpful information. These estimates of knowledge sources and seekers can be combined to provide the probability that a particular source could help a particular seeker with his or her information need. Such a probabilistic retrieval system for people as sources of information and knowledge, which we call KnowledgeNet, has advantages that go beyond retrieving information, or knowledge, that is contained only in people’s minds. Even if knowledge has been recorded in a database record or document, it is often not as helpful as knowledge coming directly from a knowledgeable person. A seeker of information cannot interact with a database record or document in the same way as with a person.

As described so far, KnowledgeNet promises advantages as a tool for knowledge management over systems based on database or document retrieval. Several issues need to be addressed, however, before a practical system can be built. Two of the most important are, first, which knowledge topics should be included in the system, and, second, whether people can accurately estimate either how helpful they could be to a seeker of information on a given topic, or how helpful a potential source of information would be to them.

The question of which knowledge topics should be included in the system is resolved by referring to the workflow management of the organization itself. The topics are those implied by the accounting structure of the organization. The second question is potentially more serious. It has been known for many years that people are poor estimators of probabilities [3]. Tools have been developed in work on decision support systems that can help with this problem to some extent, but it remains true that initial estimates provided by sources and seekers will likely not be as accurate as desired [4]. The probabilistic algorithms underlying KnowledgeNet are able to adapt to the experience of people using the system to improve the accuracy of initial estimates through a process known as relevance feedback. With relevance feedback a seeker retrieving a document, or in the case of KnowledgeNet a source of information, provides an assessment to the system as to whether or not the document, or source, was helpful. For many years research studies have shown that relevance feedback can lead to much more accurate retrieval systems. Users of KnowledgeNet will be motivated to provide relevance judgments as they realize that providing these judgments leads to their obtaining better results.

This paper describes an operational test of KnowledgeNet in a large oil company in Indonesia. It shows the promise of such a system for putting seekers of information in contact with workers who can answer the seekers’ questions. KnowledgeNet’s knowledge base, i.e., the information stored on the expertise of each member of the organization represented in the system, can also have another important use. As large organizations lay off workers from time to time, the expertise of laid-off workers could be represented in a national labor database, which would facilitate the eventual rehiring of these workers.
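The interplay between a source's initial self-estimate and later relevance feedback can be illustrated with a minimal sketch. It is not the algorithm used in KnowledgeNet, nor the combination of expert opinion model of [12, 13]; it substitutes a simple Beta-Bernoulli update chosen only for illustration. The class `Source`, the function `rank_sources`, and the decision to treat an estimate of "k out of 10" as ten prior pseudo-observations are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One person indexed on one topic (hypothetical structure, not from the paper)."""
    name: str
    self_estimate: int   # "I could help k out of 10 people asking", 0 <= k <= 10
    helpful: int = 0     # relevance feedback: seekers who reported being helped
    unhelpful: int = 0   # relevance feedback: seekers who reported not being helped

    def probability_helpful(self) -> float:
        # Assumption: the self-estimate acts as a Beta prior with pseudo-counts
        # (k, 10 - k); each feedback judgment adds one real observation.
        alpha = self.self_estimate + self.helpful
        beta = (10 - self.self_estimate) + self.unhelpful
        return alpha / (alpha + beta)


def rank_sources(sources):
    """Order sources from most to least likely to help a seeker."""
    return sorted(sources, key=lambda s: s.probability_helpful(), reverse=True)


# Source A is optimistic but has received negative feedback;
# source B is more modest but has been reported helpful four times.
sources = [
    Source("A", self_estimate=9, helpful=0, unhelpful=5),
    Source("B", self_estimate=6, helpful=4, unhelpful=0),
]
ranked = rank_sources(sources)
for s in ranked:
    print(s.name, round(s.probability_helpful(), 3))
```

On self-estimates alone, A (0.9) would rank ahead of B (0.6). After feedback the order reverses, with B at about 0.71 and A at 0.60, which is the kind of correction of inaccurate initial estimates that relevance feedback is meant to provide.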
3. MINAS: AN OPERATIONAL TEST OF KNOWLEDGENET

Minas is an ageing oil field in central Sumatra. On May 4, 1969, seventeen years after its first production, the Minas field reached a cumulative production of one billion barrels of crude and became the first giant oil field in Asia east of Iran, and the twenty-second in the world. By the end of 1990, the cumulative production of the Minas field had exceeded three billion barrels. “Minas crude” (now known as “Sumatran light crude”, or SLC) is favored by industrial countries for its very low sulphur content. From 1995 onward, at the time Caltex (PT Caltex Pacific Indonesia) was organizing Minas as a Strategic Business Unit (SBU), part of an enterprise approach to project management, it was becoming evident that oil production at the Minas field was falling off. Also, about this time, it was seen that two advanced management technologies might be applied to significant effect, with far-ranging results. This paper will deal with one of them, KnowledgeNet One World, or KNOW. The second technology is an enterprise management package that implements the organizational theory of Dr. Elliott Jaques [5].

KNOW represents the first practical use of the innovative KnowledgeNet search technology, which was first introduced as Helpnet, an academic research model incorporating the probabilistic search theory of Dr. M.E. Maron and his colleagues [2]. Although the model was technically sound, it was perceived that the use of subjective input probabilities required of the participants was inherently unreliable [3]. Helpnet’s seminal contribution to the field of expert finding systems is only now being recognized [6, 7].

The Helpnet paper resurfaced at Minas in 1995. The Helpnet model, which had not been used for a practical application, was by 1995 technically outdated. All that was available for management to consider was the Helpnet paper, which was theoretical in nature and hard for anyone but a specialist to understand. Aside from the problem of developing user-friendly software for the application itself, it was necessary to address the “subjective input” objection. Estimating is not foreign to construction. Although it might be preferable to arrive at a task time assessment using a productivity factor and drawing quantity, the skill and experience of a worker, given some kind of model, have also proven to be quite reliable in making these assessments.

Caltex has outsourced management of the field labor force to various contractors over the years, the last being Tripatra, a national contractor. The one constant has been the labor force itself, which is generational. This continuity of service has provided depth of experience, skill and knowledge, while providing training opportunities for the workers themselves. Much of the work is repetitious, though taking place over often-difficult terrain.

Given context, the worker can make a subjective assessment, such as is required in the use of KNOW. KNOW requires only two such assessments: one from the worker, or source, at file building time; the other from the user of the system at query time. KNOW requires a subjective assessment pertaining to specified work categories. Regarding each of these categories, 52 in this case, the worker is asked how many out of 10 questions put to him by his immediate supervisor he feels he could help answer. These are not meant to be test questions, but questions that the supervisor himself feels he needs help in answering.

These work categories are called Areas of Interest in KNOW and are critical to the use of the system. An enterprise approach to project management requires a body of hierarchically interrelated cost account codes, defining a generic project as being composed of a number of chargeable sub-projects and non-chargeable covering terms. Use of financial enterprise software, such as PeopleSoft [8], SAP [9], and J.D. Edwards [10], requires this. These account code definitions, used by Calfais, the Caltex financial software, and recently “mapped” over to J.D. Edwards, were used to define the 52 work categories given to the workers for their input estimates. The KNOW Areas of Knowledge requirement was met by simply using employee Time and Attendance data. (Time and Attendance Systems are part of any enterprise system.) Employees are linked to corporate functions, and corporate functions are Areas of Knowledge. As an employee moves through his corporate career he may occupy more than one functional area. These are all his Areas of Knowledge, and may be kept track of automatically.

An operational test had to wait until the development of the KNOW program, which took place after November 2000. Careful consideration was given as to how to approach program development. It was seen that the application lent itself ideally to the use of an off-the-shelf relational database as a starting point. All but essential programming would be avoided. A user-friendly, workable program was ready for corporate use sometime in 2003. Because the employee work situation had altered dramatically in the intervening years (the resident contractor at Minas had downsized from about 3,000 to 1,400 employees), Minas management kindly provided the means to solicit employee information from as many employees as possible.

It was not until this year, 2004, that the operational test was able to go forward. From the site visit in 2003 to the site visit in 2004, continued downsizing has further
reduced project employee levels to about 500, suitable for the mostly maintenance mission now required of the contractor. Also, Primatrain, a local software house, provided IT support, which was unavailable on site. Of the original 1,400 employees, only 500 were able to provide Area of Interest information. Of those 500, only 110 were still with the company at the last visit. In the allotted time there was no possibility of getting complete file information for the remaining Minas complement of 500.

Despite the difficult labor conditions described above and the expense, a test of KNOW was completed. Eighteen supervisors asked questions from as many of the 52 Areas of Interest as they felt were applicable to their employees. Because of the downsizing only a few of the employees who were originally entered into the KNOW system were still on site. In most cases an employee was asked only one question for each Area of Interest, making it impossible to get a clear sense of the accuracy of the employee’s estimates of how many of 10 questions he could answer. For three types of employees, however, due to the uncertainties as to who would be their supervisor as reorganization proceeded, more than one supervisor asked questions of the same pool of employees. In particular, employees working in the “civil” area were asked questions by three supervisors.

Although representing only an initial quick look at the large amount of data collected, it is instructive to consider the responses of the civil employees. Each supervisor evaluated whether the answer given to his question by each of the employees was satisfactory. Since each of the three supervisors asked each of the employees one question, an employee answered either 0, 1, 2, or all 3 questions correctly, or, as proportions, 0, .33, .67, or 1.00. These actual results can be compared to the proportion of questions that the employee estimated that he would be able to answer. Three questions are not enough to give a valid measure of an individual’s ability to make accurate estimates. On the other hand, it is possible to consider a more aggregate measure. For the 52 Areas of Interest, a count was made of the number of times that the source who made the highest estimate of the number of questions he could answer was in fact the source who was able to answer the most questions. If two or more sources were each able to answer the same, highest number of questions, including the source with the highest estimate, this was counted in favor of the source with the highest estimate. For a few areas, there was only one source, so these areas were not included. For 28 of the remaining 47 areas of interest the source with the highest estimate was not the source who was able to answer the most questions. For 19 areas of interest, the source with the highest estimate was able to answer the most questions. Although at first this might not seem too impressive, consider the hypothetical case where sources’ estimates had no predictive value. Since some areas of interest had fewer sources consulted than others, there were, on average, 4.596 sources consulted for each area of interest for which more than one source was consulted. The source with the highest estimate answered the most questions for 19 out of 47 areas of interest, a proportion of about .404, or 40.4 percent of the time. If the estimates had no predictive value, it would be expected that by chance this source would answer the most questions one out of 4.596 times, a proportion of about .218, or 21.8 percent of the time. The actual performance of 40.4 percent, nearly twice the chance rate, shows that the estimates have predictive value.

4. DISCUSSION AND FUTURE PLANS

Although the preliminary results described above show that humans can make estimates for KnowledgeNet that have predictive value, it is not necessary to regard these estimates as static. As seekers of information use KnowledgeNet to find people to answer their questions, the seekers can record which people were helpful. This relevance feedback can be used to adjust the original estimates provided by the sources of information. Relevance feedback has been shown to be effective in document retrieval [11]. The probabilistic model on which KnowledgeNet is based has been extended to support relevance feedback [12, 13]. Recently this model was implemented in the NewsVerifier system [14].

KNOW recognizes the fact that all employees have an inherent intellectual property right which cannot be contracted away by an employing organization. What one knows is a market commodity, imperfectly expressed in an employee resume or curriculum vitae. A qualitative measure of the knowledge one accrues over time in corporate employment does not exist. Furthermore, law forbids employers from expressing an opinion. Just as market forces determine the value of consumer goods and services, the marketplace of ideas has its place within a large corporate setting. The uninhibited exchange of work-related ideas, together with relevance feedback, can be used to value an employee’s corporate contributions over time. These valuations may be carried over into an Internet world market, and with the power of the KNOW search technology give the employee his unique voice. A national labor database based on KNOW technology is not only feasible, but can be achieved. The worker will no longer be lost in the collective, as impossible to find as a needle in a haystack.

5. CONCLUSIONS

This initial operational test of KnowledgeNet shows the promise of this technology to support knowledge management within an organization, benefiting both management and employees at all levels. It also suggests how the knowledge of employees within an organization can be preserved not only within corporate memory, but
within national memory in the form of a national labor database.

6. ACKNOWLEDGEMENTS

The author would like to acknowledge the assistance of the management of Tripatra and all of its employees, present and past, who took part in this study. Without their participation this study would not have been possible.

7. REFERENCES

1. S.E. Robertson, M.E. Maron, and W.S. Cooper, “Probability of relevance: A unification of two competing models for document retrieval”, Information Technology: Research and Development, Vol. 1, No. 1, 1982, pp. 1-2.

2. M.E. Maron, S. Curry, and P. Thompson, “An Inductive Search System: Theory, Design and Implementation”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-16, No. 1, 1986, pp. 21-28.

3. D. Kahneman, P. Slovic, and A. Tversky, editors, Judgment under uncertainty: Heuristics and biases, Cambridge, England: Cambridge University Press, 1982.

4. P. Thompson, “Subjective Probability and Information Retrieval: A Review of the Psychological Literature”, Journal of Documentation, Vol. 44, No. 2, 1988, pp. 119-143.

5. E. Jaques, Requisite Organization: A Total System for Effective Managerial Organization and Managerial Leadership for the 21st Century, 2nd edition, Gloucester, Massachusetts: Cason Hall, 1996.

6. J. Wang, C. Zheng, T. Li, W-Y. Ma, and W. Liu, “Ranking User's Relevance to a Topic through Link Analysis on Web Logs”, Fourth ACM CIKM International Workshop on Web Information and Data Management (WIDM'02), McLean, Virginia, 2002.

7. D. Yimam-Seid and A. Kobsa, “Expert Finding Systems for Organizations: Problem and Domain Analysis and the DEMOIR Approach”, Journal of Organizational Computing and Electronic Commerce, Vol. 13, No. 1, 2003, pp. 1-24.

8. PeopleSoft. 2004. http://www.peoplesoft.com/corp/en/public_index.jsp

9. SAP. 2004. http://www.sap.com/

10. J.D. Edwards. 2004. http://www.jdedwards.com

11. T. Sakai, S.E. Robertson, and S. Walker, “Flexible relevance feedback for NTCIR-2”, in Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese & Japanese Text Retrieval and Text Summarization, Tokyo, Japan, March 7-9, Tokyo: National Institute of Informatics, 2001.

12. P. Thompson, “A combination of expert opinion approach to probabilistic information retrieval, Part 1: The conceptual model”, Information Processing & Management, Vol. 25, No. 6, 1990, pp. 371-382.

13. P. Thompson, “A combination of expert opinion approach to probabilistic information retrieval, Part 2: mathematical treatment of CEO Model 3”, Information Processing & Management, Vol. 25, No. 6, 1990, pp. 383-394.

14. P. Thompson, “Cognitive Hacking and Intelligence and Security Informatics”, Proceedings of the Conference on Enabling Technologies for Simulation Science VIII, Defense and Security Symposium 2004, Orlando, Florida, 12-16 April, 2004.
