Using OLAP and Data Mining for Content Planning in Natural Language Generation

Eloi L. Favero1 and Jacques Robin2


1 Departamento de Informática, Universidade Federal do Pará (DI-UFPA)
Pará, Brazil
ellf@di.ufpe.br
2 Centro de Informática, Universidade Federal de Pernambuco (CIn-UFPE)
Recife, Brazil
jr@di.ufpe.br

Abstract. We present a new approach to content determination and content organization in the context of natural language generation for quantitative database summaries. Three key properties make our work innovative and interesting: (1) we developed a new text planning approach that deals with the organization of a data set, for example a data mining discovery, into a summary report; (2) the approach is domain independent; (3) it covers a significant class of database summary applications.

1 Research Context: Executive Summary Generation

In this paper, we present a new approach for content determination and organiza-
tion in quantitative data summarization. This approach has been developed for
HYSSOP (HYpertext Summary System of On-line analytical Processing) which
generates hypertext reports for OLAP summaries and Data Mining (DM) discover-
ies. HYSSOP is itself part of the Intelligent Decision-Support System called MA-
TRIKS (Multidimensional Analysis and Textual Reporting for Insight Knowledge
Search). MATRIKS aims to provide a comprehensive knowledge discovery envi-
ronment through seamless integration of data warehousing, OLAP, DM, expert sys-
tem and natural language generation technologies.
The architecture of MATRIKS is given in Fig. 1. It extends previous cutting-edge
environments for Knowledge Discovery in Databases (KDD) such as DBMiner [4]
by the integration of:
a data warehouse hypercube exploration expert system, allowing the automation and reuse of the dimensional data warehouse exploration expertise developed by human data analysts using OLAP queries and data mining tools;
2 Favero and Robin.

a hypertext executive summary generator, reporting data hypercube exploration insights in the most concise and familiar way: a few Web pages of natural language.

[Fig. 1 shows the MATRIKS architecture: a decision maker and a data analyst interact with HYSSOP (the NL hypertext summary generator) and with the data warehouse hypercube exploration expert system, which sit on top of an OLAP server and a data mining tool-kit; these in turn access a multi-dimensional data mart, fed by data update and remodelling from an OLTP database.]
Fig. 1. The architecture of MATRIKS

These two extensions allow an IDSS to be used directly by decision makers without
constant mediation of a human expert in data analysis.
Research Focus. To our knowledge, the development of HYSSOP is pioneering
work in coupling OLAP and data mining with Natural Language Generation (NLG).
We view such coupling as a synergetic fit with tremendous potential for a wide
range of practical applications. In a nutshell, while NLG is the only technology able
to completely fulfill the reporting needs of OLAP and data mining, these two tech-
nologies are reciprocally the only ones able to completely fulfill the content deter-
mination needs of a key NLG application sub-class: textual summarization of quan-
titative data.
Previous quantitative data NL summarization systems (ANA [10], FOG [1], LFS [8], GOSSIP [2], PLANDoc [12], FLOWDoc [13], ZEDDoc [14], MeteoCogent [9], among others) generally perform content determination by relying on a fixed set of domain-dependent heuristic rules. Such an approach suffers from two severe limitations that prevent it from reporting the most interesting content from an underlying database:
Lecture Notes in Computer Science 3

it does not scale up to analytical contexts of high dimensionality that take into account the historical evolution of data through time; such complex contexts would require a combinatorially explosive number of summary content determination heuristic rules;
it can only select facts whose classes have been foreseen by the rule base author, while in most cases it is a fact's very unexpectedness that makes it interesting to report.
OLAP and data mining are the two technologies that emerged to tackle precisely these two problems: OLAP provides efficient search in high-dimensionality, historical data search spaces, and data mining provides automatic discovery, in such spaces, of hitherto unsuspected regularities or singularities.
On the other hand, current text planning approaches (schema-based, or topic tree, [11] and rhetorical relation-based (RST) [7]) suffer from a common weakness: they lack the expressive power to order a set of homogeneous, repeated discourse tree nodes. This motivated our new discourse organization approach.
Though HYSSOP is a hypertext quantitative data generator, in this work we focus only on the content determination and discourse organization approaches for the production of linear text.
The paper is organized as follows: section 2 presents HYSSOP's architecture together with a list of summarization devices; sections 3, 4 and 5 present, respectively, HYSSOP's content determination, discourse planning and sentence planning; section 6 discusses related work, and section 7 concludes.

2 HYSSOP's Architecture

Previous Work: Summarization Devices. Surveying quantitative data NLG systems (ANA [10], FOG [1], LFS [8], GOSSIP [2], PLANDoc [12], FLOWDoc [13], ZEDDoc [14], MeteoCogent [9]), we identify four classes of summarization and aggregation devices, ranging from syntactic clause combination to statistical trends:

1. Content Aggregation:
"Martin logged on to the system" + "Jessie logged on to the system" => "Martin and Jessie logged on to the system" (GOSSIP)
"It has 24 activities, including 20 tasks" + "It has 24 activities, including 4 decisions" => "It has 24 activities, including 20 tasks and 4 decisions" (FLOWDoc)
2. Ontological Content Abstraction (qualitative to qualitative):
"Document" is more general than "Draft document in SGML format": an entity generalization (FLOWDoc)
"A frequent task consists in creating documents" generalizes several ontological nodes about creating skeleton documents, creating document sections, etc.: a generalization of actions (FLOWDoc)
3. Consolidation Functions (quantitative to quantitative): a set of quantitative messages (e.g., OLAP cells) forming a data aggregate is summarized by OLAP consolidation functions such as sum(), count(), avg(), percent(), min(), max().

"European Internet domains at 28 percent and U.S. network domains at 23 percent were the most frequent Internet user domains": percent(), percent(), max() (ZEDDoc)
4. Statistical Content Abstraction (quantitative to qualitative): several quantitative messages are abstracted into a statistical trend phrase.
"Wall Street's security markets meandered upward through most of the morning before being pushed downhill late in the day." (ANA)
"Southeast wind to 10 mph decreasing to 5 mph" (MeteoCogent)
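The consolidation functions of class 3 above can be sketched in a few lines of Python; the cell data below are invented for illustration and are not taken from any of the surveyed systems:

```python
# Sketch of OLAP-style consolidation functions applied to a pool of
# quantitative messages (hypothetical Internet-domain traffic cells).
cells = [
    {"domain": "Europe", "hits": 280},
    {"domain": "U.S.",   "hits": 230},
    {"domain": "Asia",   "hits": 190},
    {"domain": "other",  "hits": 300},
]

total = sum(c["hits"] for c in cells)                   # sum()
count = len(cells)                                      # count()
avg   = total / count                                   # avg()
top   = max(cells, key=lambda c: c["hits"])             # max()
pct   = {c["domain"]: round(100 * c["hits"] / total)    # percent()
         for c in cells}

print(pct["Europe"], top["domain"])  # → 28 other
```

A phrase such as the ZEDDoc example above is then realized from the percent() and max() values rather than from the raw cells.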

From a syntactic viewpoint, clause aggregation combines clauses into more complex syntactic structures such as sentences, giving the text more fluency and coherence. In our view, clause aggregation is content aggregation: speaking of clauses merely means that the content aggregation process occurs at the clause representation layer. A summarization process can create new information, such as arithmetic or statistical facts, whereas content aggregation cannot: it only removes recurrent information.
HYSSOP's Architecture. A complex NLG process, following [15], can be conceptually decomposed as a pipeline of four tasks: content determination, discourse planning, sentence planning and content realization. HYSSOP also adopts this standard pipeline model. However, with respect to previous quantitative data summarization systems, HYSSOP's architecture presents two key differences (Fig. 2):
Content determination relies on a data mining component, which handles consolidation functions and statistical content abstraction and uses OLAP hierarchies for data classification.
Content organization (discourse and sentence planning) relies on a totally new approach, specially developed to deal with a set of homogeneous data, e.g., a set of data cube cells or database records. It decides on the spatial organization of the selected messages in a textual report: how should the messages be grouped and ordered into paragraphs and sentences?

Data cube
  -> content determination (OLAP hierarchy + Data Mining)   [What to say?]
Content matrix
  -> discourse planning (content organization)              [When to say what?]
Discourse tree
  -> sentence planning (content organization)
  -> content realization: lexicalization (dictionary),
     then syntactic realization (grammar)                   [How to say it?]
Textual summary
Fig. 2. HYSSOP's architecture

Content realization has two sub-tasks: lexicalization and syntactic realization. Once the messages are packaged into sentence structures, the lexicalization sub-task, using a specialized dictionary, maps the message concepts into lexical forms, which are then submitted to the syntactic realization grammar (see [5]).
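The four-task pipeline can be sketched as a chain of functions. The following Python fragment is a drastically simplified illustration of the data flow only, not HYSSOP's implementation; all function bodies, field names, and the toy realization template are our own placeholders:

```python
# Minimal sketch of the four-stage NLG pipeline of [15].
def content_determination(data_cube):
    # Select interesting cells (in HYSSOP: OLAP + data mining).
    return [c for c in data_cube if c["exception"] != "none"]

def discourse_planning(cells):
    # Order/group the cells into a content matrix (here: by exception).
    rank = {"high": 0, "medium": 1, "low": 2}
    return sorted(cells, key=lambda c: rank[c["exception"]])

def sentence_planning(matrix):
    # Package rows into sentence-sized units (trivially one per row).
    return [[row] for row in matrix]

def content_realization(sentences):
    # Lexicalization + syntactic realization, reduced to a template.
    return " ".join(f"{row['product']}: {row['sales_diff']:+d}%."
                    for sent in sentences for row in sent)

cube = [
    {"product": "Cola",       "exception": "low",  "sales_diff": -11},
    {"product": "Birch Beer", "exception": "high", "sales_diff": +42},
]
text = content_realization(
    sentence_planning(discourse_planning(content_determination(cube))))
print(text)  # → Birch Beer: +42%. Cola: -11%.
```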

3 Content Determination: Using OLAP and DM

HYSSOP's content determination is entirely performed by a data mining (DM) component, which produces a DM discovery from an exploration session driven by a mining goal such as "look for anomalous monthly sales variations (measured as a percentage difference from the previous month)". A successful mining exploration produces a pool of cells that match the goal. Such a discovery forms a pool, and not an OLAP sub-cube, because a typical cube exploration selects cells sparsely distributed in different aggregation layers. However, the cells of the pool have in common a homogeneous representation in terms of the dimension set defining the overall data space.

Dimensions Measures DM
cell product place time sales-diff exception drill-down
1c Birch Beer nation Nov -10 low nation
2c Jolt Cola nation Aug +6 low nation
3c Birch Beer nation Jun -12 low nation
4c Birch Beer nation Sep +42 high nation
5c Cola central Aug -30 low region
6c Diet Soda east Aug +10 low region
7c Diet Soda east Sep -33 medium region
8c Diet Soda east Jul -40 high region
9c Diet Soda south Jul +19 low region
10c Diet Soda west Aug -17 low region
11c Cola Colorado Sep -32 medium state
12c Cola Colorado Jul -40 medium state
13c Cola Wisconsin Jul -11 low state

Fig. 3. A DM discovery derived from a retailing database (the cell and drill-down columns are not conveyed in the summary report); taken from Sarawagi et al. [18]

Fig. 3 presents a DM discovery input for HYSSOP, comprising 13 cells. Each cell is described by three dimension values (product, place, time) and two measures (sales-diff, exception). In addition, the rightmost column indicates in which drill-down place layer the cell was discovered (nation -> region -> state). This DM discovery was taken from a retail application [18].
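For illustration, such a pool can be encoded as a list of records sharing the same fields; this encoding is a hypothetical sketch, not HYSSOP's internal format:

```python
# Two of the 13 cells of Fig. 3, encoded as homogeneous records:
# every cell carries the same dimension and measure fields.
pool = [
    {"cell": "4c", "product": "Birch Beer", "place": "nation",
     "time": "Sep", "sales_diff": +42, "exception": "high",
     "drill_down": "nation"},
    {"cell": "8c", "product": "Diet Soda", "place": "east",
     "time": "Jul", "sales_diff": -40, "exception": "high",
     "drill_down": "region"},
    # ... remaining 11 cells of Fig. 3
]

# Homogeneity: every cell answers the same dimension/measure queries,
# even though the cells come from different aggregation layers.
assert all(set(c) == set(pool[0]) for c in pool)
```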

4 Discourse Planning: Content Matrix Approach

The content organization task deals with the question "When to say what?". Previous quantitative data NLG approaches rely on a topic tree discourse structure (or some equivalent device), which fails to enforce a linear precedence order among similar nodes, the main issue in our application domain. To deal with this grouping and ordering problem, we introduce a new content organization approach that relies on an input data representation called a content matrix. A content matrix has the following characteristics:
it is initially populated from a relational database table;
unlike a relational table, its rows and columns are ordered;
it is an internal data structure for reasoning about content organization;
during content organization, its rows and columns are reordered.
Let us introduce the content organization approach on the basis of a practical example. Fig. 4 shows a summary report (version A) and the related content matrix, already organized, for the input of Fig. 3.

Last year, the most atypical sales variations from one month to the next occurred for:
Birch Beer with a 42% national increase from September to October;
Diet Soda with a 40% decrease in the Eastern region from July to August.
At the next level of idiosyncrasy came:
Cola's Colorado sales, falling 40% from July to August and then a further 32% from September to October;
again Diet Soda Eastern sales, falling 33% from September to October.
Less aberrant but still notably atypical were:
again nationwide Birch Beer sales' -12% from June to July and -10% from November to December;
Cola's 11% fall from July to August in Wisconsin and 30% dive in the Central region from August to September;
Diet Soda sales' 19% increase in the Southern region from July to August, followed by its two opposite regional variations from August to September, +10% in the East but -17% in the West;
national Jolt Cola sales' +6% from August to September.
To know what makes one of these variations unusual in the context of this year's sales, click on it.

Cell exception product place time sales-diff


4c high Birch Beer nation Sep +42
8c high Diet Soda east Jul -40
12c medium Cola Colorado Jul -40
11c medium Cola Colorado Sep -32
7c medium Diet Soda east Sep -33
3c low Birch Beer nation Jun -12
1c low Birch Beer nation Nov -10
13c low Cola Wisconsin Jul -11
5c low Cola central Aug -30
9c low Diet Soda south Jul +19
6c low Diet Soda east Aug +10
10c low Diet Soda west Aug -17
2c low Jolt Cola nation Aug +6

Fig. 4. Version A summary report and its related content matrix, for the input of Fig. 3

The page text follows a discourse organization strategy, by exceptionality then by product: accordingly, the rows are grouped by exception, and the exception groups are sorted among themselves by decreasing exception value: high in the first paragraph ("the most atypical..."), medium in the second ("At the next level of idiosyncrasy...") and low in the third ("Less aberrant but still atypical..."). Inside each exception-group paragraph come itemized product groups. The product groups are sorted among themselves in alphabetical order ("Birch Beer"... "Diet Soda"...). Finally, inside a product group, the rows are sorted by sales-diff.
Such a natural language statement of the strategy is a problem: what does it mean in a more formal, computer-oriented language? Below we reword it in a pseudo-language:
group-by measure exception, sorted-by measure exception decrease
then group-by dim product, sorted-by dim product increase
then sort-by measure sales-diff decrease
These three statements are linked by a THEN particle denoting a sequential order among them: a subsequent statement runs inside the groups formed by the previous one. As a consequence, the result is a nested structure.
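A hypothetical interpretation of these chained statements can be written directly in Python: each stage groups and sorts inside the groups formed by the previous one, yielding the nested structure. The exception rank table and the reading of "decrease" as decreasing absolute variation are our assumptions, not HYSSOP's definitions:

```python
from itertools import groupby

# Nested grouping for the version A strategy: by exception, then by
# product, then by decreasing |sales-diff| inside each product group.
EXC_RANK = {"high": 0, "medium": 1, "low": 2}

def organize(rows):
    rows = sorted(rows, key=lambda r: (EXC_RANK[r["exception"]],
                                       r["product"],
                                       -abs(r["sales_diff"])))
    nested = []
    for exc, by_exc in groupby(rows, key=lambda r: r["exception"]):
        prods = [(prod, list(grp))
                 for prod, grp in groupby(list(by_exc),
                                          key=lambda r: r["product"])]
        nested.append((exc, prods))
    return nested

rows = [
    {"exception": "low",  "product": "Cola",       "sales_diff": -11},
    {"exception": "high", "product": "Birch Beer", "sales_diff": +42},
    {"exception": "low",  "product": "Birch Beer", "sales_diff": -12},
    {"exception": "low",  "product": "Birch Beer", "sales_diff": -10},
]
for exc, prods in organize(rows):
    print(exc, [(p, [r["sales_diff"] for r in rs]) for p, rs in prods])
# prints:
# high [('Birch Beer', [42])]
# low [('Birch Beer', [-12, -10]), ('Cola', [-11])]
```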

Last year, there were 13 exceptions in the beverage product line.
The most striking was Birch Beer's 42% national rise from Sep to Oct.
The remaining exceptions, clustered around four products, were:
again Birch Beer's sales, accounting for two other national exceptions, both mild decreases: (1) a 12% from Jun to Jul; and (2) a 10% from Nov to Dec;
Cola's sales, accounting for four exceptions: (1) two medium in Colorado, a 40% from Jul to Aug and a 32% from Aug to Sep; and (2) two mild, an 11% in Wisconsin from Jul to Aug and a 30% in the Central region from Aug to Sep;
Diet Soda, accounting for five exceptions: (1) one strong, a 40% slump in the Eastern region from Jul to Aug; (2) one medium, a 33% slump in the Eastern region from Sep to Oct; (3) three mild: two increasing, a 19% in the Southern region from Jul to Aug and a 10% in the Eastern region from Aug to Sep, and one falling, a 17% in the Western region from Aug to Sep;
Finally, Jolt Cola's sales, accounting for one mild exception, a 6% national rise from Aug to Sep.

Cell product exception dir sales-diff place time


4c Birch Beer high + 42 nation Sep
3c Birch Beer low - 12 nation Jun
1c Birch Beer low - 10 nation Nov
12c Cola medium - 40 Colorado Jul
11c Cola medium - 32 Colorado Sep
13c Cola low - 11 Wisconsin Jul
5c Cola low - 30 central Aug
8c Diet Soda high - 40 east Jul
7c Diet Soda medium - 33 east Sep
9c Diet Soda low + 19 south Jul
6c Diet Soda low + 10 east Aug
10c Diet Soda low - 17 west Aug
2c Jolt Cola low + 6 nation Aug

Fig. 5. Version B summary report (conveying group counts) and its related content matrix

Using a discourse language to define strategies, one can create several alternative discourse organizations for the same input, each producing a different output. Fig. 5 presents an alternative text output, version B, generated by the following statement:
with count on all groups
group-by dim product, sorted-by dim product increase
then group-by measure exception, sorted-by measure exception decrease
then group-by dim dir, sorted-by dim dir decrease
then sort-by measure sales-diff decrease
In version B, a count summarization strategy was adopted (for example, "Cola's sales accounting for four exceptions: two medium... two mild..."). By conveying count attributes, a new style of summary report is produced, in which text coherence is reinforced: first a group is enunciated by its count and common attributes, and then the group is detailed.
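The count attributes of version B can be computed with simple counters; the following minimal sketch (cell data abbreviated from Fig. 3) is our own illustration of the "with count on all groups" option:

```python
from collections import Counter

# Abbreviated Cola/Jolt Cola cells of Fig. 3: product and exception only.
cells = [
    {"product": "Cola",      "exception": "medium"},   # 11c
    {"product": "Cola",      "exception": "medium"},   # 12c
    {"product": "Cola",      "exception": "low"},      # 13c
    {"product": "Cola",      "exception": "low"},      # 5c
    {"product": "Jolt Cola", "exception": "low"},      # 2c
]

# Counts for the outer product groups and the nested exception groups.
by_product = Counter(c["product"] for c in cells)
by_both    = Counter((c["product"], c["exception"]) for c in cells)

# "Cola's sales accounting for four exceptions: two medium... two mild..."
print(by_product["Cola"], by_both[("Cola", "medium")])  # → 4 2
```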
Though we have presented two outputs for the same input, there are other possible grouping and sorting combinations. Two other acceptable versions are: (i) a data analyst can ask for a report organized along the market drill-down hierarchy, in three main paragraphs (nation -> region -> state); (ii) a decision maker can also look for anomalies of specific products in specific time intervals (a paragraph for each month).
In our application domain, the homogeneity of multi-dimensional data is an important characteristic, because it gives the expert user who defines the organization strategy a wide range of choices for placing the cells in a summary report.

cell exception product place time sales-diff


4c high Birch Beer nation Sep +42
8c *2 Diet Soda east Jul -40
11c medium Cola Colorado Sep -32
12c *3 *2 Colorado Jul -40
7c Diet Soda east Sep -33
1c low Birch Beer nation Nov -10
3c *8 *2 nation Jun -12
5c Cola central Aug -30
13c *2 Wisconsin Jul -11
6c Diet Soda east Aug +10
9c *3 south Jul +19
10c west Aug -17
2c Jolt Cola nation Aug +6

group-by measure exception, sorted-by measure exception decrease
then group-by dim product, sorted-by dim product increase

Fig. 6. Factorization of the discourse strategy groups, for the version A summary report

5 Sentence Planning: Factorization Matrix Approach

A specialized sentence planner was designed to be compatible with our discourse


organization approach. It implements a content aggregation approach for the content
matrix. It basically removes the content matrix recurrent information using a fac-
torization process, in two steps:
Factorization of discourse strategy groups;
Fine gain factorization inside the discourse strategy groups;
The factorization, initially, deals with the groups defined by the discourse strategy.
A column, when moved to the most-left position by a group-by statements, is
factored out, as exemplified in Fig. 6. This process is repeated for each one of the
group-by statements in version A, for exception and product.
cell exception product place x time (or time x place) sales-diff
4c high Birch Beer nation Sep +42
8c *2 Diet Soda east Jul -40
11c medium Cola Colorado Sep -32
12c *3 *2 *2 Jul -40
7c Diet Soda east Sep -33
1c low Birch Beer nation Nov -10
3c *8 *2 *2 Jun -12
5c Cola central Aug -30
13c *2 Wisconsin Jul -11
9c Diet Soda Jul south +19
6c Aug east +10
10c *3 *2 west -17
2c Jolt Cola nation Aug +6

Fig. 7. Fine-grain factorization inside the product groups: the fully factored content matrix for the version A summary report

cat=aggr, level=1, ngroup=2, nmsg=3
  common = [exception=medium]                 %% "At the next level of idiosyncrasy came:"
  distinct = cat=seq
    cat=aggr, level=2, ngroup=2, nmsg=2
      common = [product=Cola, place=Colorado] %% "Cola's Colorado sales"
      distinct = cat=seq
        cat=msg, time=7, var=-40              %% "falling 40% from Jul to Aug"
        cat=msg, time=9, var=-32              %% "and then a further 32% from Sep to Oct"
    cat=msg, product=Diet Soda, time=9, place=east, var=-33,
      anaph=[occurr=2nd, repeated=[product, place]]
                                              %% "again Diet Soda Eastern sales, falling 33% from Sep to Oct"

Fig. 8. A fragment (the first two main groups) of the factored content matrix of Fig. 7, represented as a discourse tree

In a second step, the factorization process continues inside the previously factored groups. Unlike the first step, fine-grain factorization produces a nested content matrix that cannot be represented as a relational table: here, different row groups can order their columns differently. For example, in the low x Diet Soda group, the time column was moved to the left (Fig. 7) to factor out one Aug value, which is reflected in the generated text: "...its two opposite regional variations from August to September, +10% in the East but -17% in the West". The left move is necessary because factorization operates on the leftmost columns. In fact, a factored content matrix represents a nested discourse tree (Fig. 8). The factorization process and its algorithm are detailed in [16].
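The first factorization step can be illustrated with a small sketch (our own simplification of the algorithm detailed in [16]): within a factored column, the first value of a run of repeats is kept, the second row of the run records the run length as a "*n" marker, and the remaining repeats are blanked, as in Fig. 6:

```python
# Factor out runs of repeated values in one column of a content matrix.
def factor_column(rows, col):
    out, i = [dict(r) for r in rows], 0
    while i < len(out):
        j = i + 1
        while j < len(out) and out[j][col] == out[i][col]:
            j += 1                       # extend the run of equal values
        if j - i > 1:
            out[i + 1][col] = f"*{j - i}"    # run length marker
            for k in range(i + 2, j):
                out[k][col] = ""             # value factored out
        i = j
    return out

rows = [
    {"exception": "high",   "product": "Birch Beer"},
    {"exception": "high",   "product": "Diet Soda"},
    {"exception": "medium", "product": "Cola"},
    {"exception": "medium", "product": "Cola"},
    {"exception": "medium", "product": "Diet Soda"},
]
factored = factor_column(rows, "exception")
print([r["exception"] for r in factored])  # → ['high', '*2', 'medium', '*3', '']
```

Repeating the call on the next grouped column, inside each run, yields the progressively factored matrices of Figs. 6 and 7.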
Linguistically, while content organization enforces an order on the input messages (table rows), the factorization process produces complex paratactic constituents by grouping minimal sentence constituents with intersecting semantics (the common-value columns). The output discourse tree represents the whole paratactic structure of the summary report. The content realization component translates the output discourse tree nodes into text fragments, as annotated in Fig. 8. It follows current approaches [5], using a dictionary and a syntactic realization grammar.

6 Related Work

HYSSOP presents three key differences with respect to previous work on quantitative data NLG (ANA [10], FOG [1], LFS [8], GOSSIP [2], PLANDoc [12], FLOWDoc [13], ZEDDoc [14], MeteoCogent [9]):
the OLAP/DM component for content selection;
the new content matrix discourse planning approach;
the new factorization matrix content aggregation approach.
Though HYSSOP adopts the standard pipeline architecture model, it relies on an OLAP/DM component that implements the summarization approaches in a robust, domain-independent way, contrasting with previous systems' ad-hoc implementations of consolidation functions and data classification approaches.
Currently, the NLG literature offers two main content organization approaches: (i) the aforementioned topic tree, which also comprises textual schemata [11], and (ii) rhetorical relations (RST) [7]. Both approaches share a common weakness:
the topic tree cannot order several instances of the same message under a node;
RST has the same problem under non-ordered nodes, e.g., JOINT.
HYSSOP introduces the content matrix, a new text planning approach that fills this gap. The content matrix, though it works only for homogeneous sets of data, covers a large class of database applications. In particular, using discourse strategy statements one can produce different discourse organizations for the same input data; this flexibility is not found in the alternative approaches mentioned (topic tree/RST).
To work together with our discourse organization approach, we developed a new content aggregation approach based on a factorization process. Previous aggregation approaches (such as [3], [17]) rely on some linguistic syntactic/rhetorical internal representation of clauses. In contrast, our factorization approach works entirely on a semantic layer of representation, preserving the portability of the content matrix. A limitation of the factorization approach is that it deals only with paratactic content combination.
In conclusion, our new content organization approach as a whole can be used with different inputs, taken directly from a database. The interface between the content organization process and the database application is the relational table schema.

7 Conclusion

Several NLG systems have been developed to produce textual summary reports from quantitative database data (the aforementioned ANA, FOG, LFS, GOSSIP, FLOWDoc, PLANDoc, ZEDDoc, MeteoCogent). All these systems rely on some kind of semantic network knowledge representation for reasoning about the input data. As a consequence:
for content determination, most of them implement in an ad-hoc way the consolidation functions that are built into OLAP/DM systems;
many of them implement ad-hoc versions of database functionalities, for example SQL's ORDER BY, for content organization.
In HYSSOP, modern database functionalities bring a significant technological advance: the ad-hoc, application-specific implementations are systematized and generalized into domain-independent, robust implementations.
From a database perspective, HYSSOP's content organization approach enables the actual integration of these modern database technologies for producing natural language summaries. Our research goal is the systematic use of database technologies, especially OLAP and DM, in the content determination and content organization tasks:
OLAP provides tools for quantitative summarization, based on consolidation functions, e.g., count() and avg();
OLAP data representation allows the extraction of dense data aggregates, ordered in different ways, ready to be conveyed;
OLAP dimension hierarchies play a role similar to NLG ontologies for data classification;
current databases provide grouping/ordering operations for the content organization of non-dense data sets, e.g., DM discoveries;
DM approaches extend and improve current NLG content determination approaches by systematically applying several discovery tools, such as clustering analysis, trends and time series, learning association rules, and finding dependency networks [4]. Thus, DM provides intelligent, domain-independent tools for content selection.
The database technology for content delimitation and organization relies on general, domain-independent database models and systems. Thus, the same successful implementation can be ported and reused in several similar applications. In addition, OLAP/DM technology can scale up from small database applications to very large databases, adding a significant advance to current NLG content determination technology, which typically deals with small databases.

References

1. Bourbeau, L., Carcagno, D., Goldberg, E., Kittredge, R., Polguère, A.: Bilingual generation of weather forecasts in an operational environment. COLING'90, Helsinki, (1990).
2. Carcagno, D., Iordanskaja, L.: Content determination and text structuring: two interrelated processes. In H. Horacek (ed.) New Concepts in NLG: Planning, Realization and Systems. London: Pinter Publishers, pp 10-26, (1993).
3. Dalianis, H.: Aggregation as a subtask of text and sentence planning. In Proc. of Florida
AI Research symposium, FLAIRS-96, Florida, pp 1-5, (1996).
4. DBMiner: http://db.cs.sfu.edu/DBMiner/index.html, (2000).
5. Elhadad, M., McKeown, K., Robin, J.: Floating constraints in lexical choice. Computational Linguistics, 23(2), (1997).
6. Favero, E.L.: Generating hypertext summaries of data mining discoveries in multidimen-
sional databases. PhD Thesis. CIn, Universidade Federal de Pernambuco, Recife, Brazil.
7. Hovy, E.: Automated discourse generation using discourse structure relations. Artificial
Intelligence, 63: 341-385, (1993).
8. Iordanskaja, L., Kim, M., Kittredge, R., Lavoie, B., Polguère, A.: Generating extended bilingual statistical reports. In Proc. of COLING'94, pp 1019-1023, (1994).
9. Kittredge, R., Lavoie, B.: MeteoCogent: A knowledge-based tool for generating weather forecast texts. In Proc. of the American Meteorological Society AI Conference (AMS-98), Phoenix, Arizona, (1998).
10. Kukich, K.: Knowledge-based Report Generation: A knowledge-engineering approach to
NL Report Generation; Department of Information Science, University of Pittsburgh, Ph.
D. thesis, (1983).
11. McKeown, K.: Text Generation. Cambridge University Press, Cambridge, England, (1985).
12. McKeown, K., Kukich, K., Shaw, J.: Practical issues in automatic document generation.
In Proc. of ANLP94, pages 7-14, Stuttgart, October (1994).
13. Passonneau, B., Kukich, K., Robin, J., Hatzivassiloglou, V., Lefkowitz, L. Jin, H.: Gen-
erating Summaries of Work Flow Diagrams. In Proc. of the Intern. Conference on NLP
and Industrial Applications. Moncton, New Brunswick, Canada (NLP+IA'96). 7p. (1996)
14. Passonneau, B., Radev, D., Kukich, K., McKeown, K., Jin, H.: Summarizing Web Traf-
fic: A portability exercise, (1997).
15. Reiter, E., Dale, R.: Building applied natural language generation systems. ANLP Conference, Washington DC, (1997).

16. Robin, J., Favero, E.L.: Content aggregation in natural language hypertext summarization of OLAP and Data Mining discoveries. In Proc. of the INLG'2000 Conference (International Natural Language Generation), Mitzpe Ramon, Israel, June (2000).
17. Shaw, J.: Segregatory coordination and ellipsis in text generation. In Proc. of the 17th COLING (COLING'98), (1998).
18. Sarawagi, S., Agrawal, R., Megiddo, N.: Discovery-driven exploration of MDDB data cubes. In Proc. of the Int. Conf. on Extending Database Technology (EDBT'98), March, (1998).