
December, 2009

Attivio's JOIN Capability Technical Brief

The traditional methods for accessing unstructured content and structured data are divided: enterprise search engines are used for finding content, and database technologies are needed for retrieving data. Most database methods don't search content, and those that do manage only content that has already been stored and organized in database tables. Legacy enterprise search engines, on the other hand, don't look at databases, leaving their contents out of search results. More recently, some search engines have added the ability to pull data from databases, but they typically lose the relational quality of the data, which is a main value of databases. Attivio has solved this problem of assembling data from both content sources and databases without losing the data relationships. Attivio's JOIN capability is the key to this advance in information access functionality.

Fundamentals of SQL JOIN

JOIN originated as a Structured Query Language (SQL) clause for use in databases and is not available in search engines. The SQL JOIN operation combines records from two or more tables in a relational database, linking specific fields to create a single result set. The SQL JOIN improves update efficiency by enabling smaller table structures. For example, you'd use SQL JOIN to relate the CustomerID field in a Customers table and the CustID field in an Accounts Receivable table so a single query can retrieve all related information about a customer from both tables without duplicating information.

Customers: CustomerID, CustLastName, CustFirstName, CustStreetAddress, CustState, CustZip
Accounts Receivable: CustID, BalanceAveAnnual, AmountDue, BillDate, DueDate
JOIN: Customers.CustomerID = Accounts Receivable.CustID

Figure 1: JOINed Database tables

Attivio 246 Walnut Street Newton, MA 02460 o +1.857.226.5040 | f +1.857.226.5072 | e

With a SQL JOIN in place, a single query can include CustState from the Customers table and AmountDue from the Accounts Receivable table. For example, a query could return the customer IDs and names of every customer in Maine with a balance over $1000. Though very powerful, SQL JOINs are complex statements created by database developers familiar with the underlying data model. Moreover, the static schema that is required means that uses for JOIN must be planned and implemented well in advance of their use.
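The Maine query described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not taken from any Attivio product: the table name AccountsReceivable (written without a space), the sample rows, and the choice of AmountDue as the "balance" column are assumptions made for the example.

```python
import sqlite3

# In-memory database with a reduced version of the two tables in Figure 1.
# Sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID    INTEGER PRIMARY KEY,
        CustLastName  TEXT,
        CustFirstName TEXT,
        CustState     TEXT
    );
    CREATE TABLE AccountsReceivable (
        CustID    INTEGER REFERENCES Customers(CustomerID),
        AmountDue REAL
    );
    INSERT INTO Customers VALUES
        (1, 'Alden',  'Ruth', 'ME'),
        (2, 'Brooks', 'Sam',  'ME'),
        (3, 'Chen',   'Lee',  'MA');
    INSERT INTO AccountsReceivable VALUES
        (1, 1500.0),
        (2, 250.0),
        (3, 4000.0);
""")


def maine_customers_over(conn, threshold):
    """Return (ID, last name, first name) for Maine customers owing more than threshold."""
    sql = """
        SELECT c.CustomerID, c.CustLastName, c.CustFirstName
        FROM Customers AS c
        JOIN AccountsReceivable AS ar ON ar.CustID = c.CustomerID
        WHERE c.CustState = 'ME' AND ar.AmountDue > ?
        ORDER BY c.CustomerID
    """
    return conn.execute(sql, (threshold,)).fetchall()


rows = maine_customers_over(conn, 1000)
print(rows)
```

Note that both tables and the linking columns have to exist before the query can be written, which is the up-front planning the paragraph above refers to.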

Attivio's AIE JOIN Capability

The JOIN operation in AIE, which is part of Attivio's Advanced Query Language, allows all matching information to be retrieved from disparate sources without any of the prior set-up and data modeling that are required in databases. In other words, with AIE you don't have to JOIN tables in advance. The AIE JOIN is executed when the query is issued. The information that is JOINed can originate in content or databases, including different databases whose tables aren't linked with a SQL JOIN. This means that with one query, AIE JOIN allows you to gather and analyze all types of information, regardless of source or format. Queries that use JOIN ask for multiple pieces of information: they contain a string of at least two sub-queries, with each sub-query requesting one type of information. For example, a query with sub-queries could ask for July sales (one sub-query) for MKI products (one sub-query) mentioned in a Time magazine article (one sub-query). AIE sub-queries can be a table or a complex query (including nested JOINs). The sub-queries are dynamically combined on any fields. Unlike in databases, keys are not required and relationships among fields and tables do not need to be pre-defined. Instead, a record is created on the fly to contain the data and content that match the query.
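The idea of combining sub-query results on a field chosen at query time, with no predefined schema or keys, can be sketched in plain Python. This is only a conceptual illustration and does not represent how AIE is implemented: the function join_at_query_time, the record fields, and the sample sales and article records are all hypothetical.

```python
from collections import defaultdict


def join_at_query_time(left, right, left_field, right_field):
    """Combine two sub-query result lists on fields named when the query runs.

    Records are plain dicts with no shared schema. Each match produces a
    new merged record built on the fly.
    """
    # Index the right-hand results by the join field.
    index = defaultdict(list)
    for rec in right:
        if right_field in rec:
            index[rec[right_field]].append(rec)

    joined = []
    for rec in left:
        if left_field not in rec:
            continue  # records lacking the field simply do not match
        for match in index.get(rec[left_field], []):
            joined.append({**match, **rec})
    return joined


# Hypothetical structured rows (e.g., from a sales database).
sales = [
    {"product": "MKI-100", "month": "July", "units": 120},
    {"product": "MKI-200", "month": "July", "units": 75},
    {"product": "MKI-100", "month": "June", "units": 90},
]
# Hypothetical content records with a product mention extracted from the text.
articles = [
    {"publication": "Time", "title": "Gadgets of the summer", "mentions": "MKI-100"},
    {"publication": "Wired", "title": "Budget picks", "mentions": "MKI-200"},
]

# Each filter stands in for one sub-query; the join fields are picked here,
# at query time, not declared in advance.
july_sales = [r for r in sales if r["month"] == "July"]
time_articles = [a for a in articles if a["publication"] == "Time"]
result = join_at_query_time(july_sales, time_articles, "product", "mentions")
print(result)
```

The merged record carries fields from both the data row and the content record, which mirrors the "record created on the fly" described above.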

Unique Qualities of AIE JOIN

The JOIN operator developed by Attivio is not found in any other information access or enterprise search product and offers functionality different from the JOIN clause in relational database (RDBMS) products. Specifically, Attivio's patent application covers: "Querying joined data within a hybrid index." The innovations in AIE are:
1. Making JOINs available for all content sources as well as all data sources
2. Executing the JOIN operation at query time (whereas database JOINs must be set up prior to queries)
3. Not requiring predefined schemas (table definitions), relationships (any sub-queries can be joined), or models for how data is to be extracted from sources


Example of a JOIN Query

In the example below, the user of the dashboard in this baseball application requests information on Jason Giambi. The dashboard is populated from a large set of data and content (articles about baseball) that are indexed together in AIE. A query for any entity (noun phrase) in the index presents a JOINed set of information (statistics data and news content) about the entity the user requests, which in this example is the player Jason Giambi. When the user issues the simple query "Jason Giambi" (or selects this name from the facet recommendations on the left), the JOIN query is executed, and the user interface is populated with data about this player and all content in which he is mentioned. In this example, AIE returns facets such as "people" that are entities extracted from the news stories (the entity extraction executes automatically during ingestion and indexing). Other facets, such as "height", are derived from statistics data. These facets are used in a menu that facilitates navigation. The dashboard also shows that AIE automatically calculates and returns facet statistics for every dimension used in the dashboard, such as runs, hits, average, standard deviation and count values. This data is displayed in tables and charts. The behind-the-scenes AIE JOIN query for populating this dashboard is:
JOIN (OR(AND(table:master,playerID:giambija01),content:"JasonGiambi"),OUTER(table:batting), on = newsPlayerID)

Figure 2: Results of an AIE JOIN


In contrast to AIE, the standard SQL required to accomplish the same thing:
- Depends on significant up-front work to define the data model, design the schema and structure the content tables
- Can't access content that isn't previously stored and organized as database records
- Requires two multi-line queries (one for data, one for content)
- Doesn't provide dynamic facet recommendation to facilitate navigation and exploration

Specifically, AIE's syntax for building this dashboard is much simpler than SQL:
a) The SQL UNION that unites the two queries to access the data and content requires the same number of columns (with matching types) from both the data and content queries. AIE does not have this reliance on columns or on pre-defined schemas, so queries can be more flexible.
b) The concept of "all data", including all text-based fields, doesn't exist in the relational database model. SQL would have to include queries for every table in the database, creating a major performance issue.
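The UNION constraint in point (a) can be demonstrated with Python's built-in sqlite3 module. This is a hedged sketch: the news table, its headline column, and all sample values are hypothetical, and only the batting table name and the player ID come from the example query above. SQLite enforces the column-count rule but is lenient about column types, so stricter RDBMS products may reject more than this example shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batting (playerID TEXT, hits INTEGER, runs INTEGER);
    CREATE TABLE news    (playerID TEXT, headline TEXT);
    INSERT INTO batting VALUES ('giambija01', 120, 80);
    INSERT INTO news    VALUES ('giambija01', 'Giambi homers twice');
""")


def try_union(sql):
    """Run a query and return its rows, or an error string if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# Three columns on one side, two on the other: the UNION is rejected.
mismatched = try_union(
    "SELECT playerID, hits, runs FROM batting "
    "UNION SELECT playerID, headline FROM news"
)

# The usual workaround: pad each side with NULLs so the column lists line up.
padded = try_union(
    "SELECT playerID, hits, runs, NULL AS headline FROM batting "
    "UNION ALL SELECT playerID, NULL, NULL, headline FROM news"
)

print(mismatched)
print(padded)
```

The padded version works, but every new data or content field means revisiting both halves of the query, which is the rigidity point (a) describes.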

Summary: Advantages of AIE's JOIN Capability

AIE JOIN ensures that all relevant information is included in response to a single query regardless of whether the sources are databases or content stores. Information that is returned can be used to populate charts, tables or other interfaces and can include both data and content (e.g., output can include a chart and a list of relevant content). Data extracted from content can also be included. AIE JOIN is simpler than SQL JOIN for dashboard programmers to implement. Even more important, AIE JOIN also enables significantly more flexibility for users. The advantages of AIE JOIN include:
- No need to predefine tables and relationships
- No need to write long and complex SQL queries
- When saved queries rely on a JOIN, the JOIN is automatically executed each time the saved query is submitted

Prepared by:
Attivio, Inc., 246 Walnut Street, Newton, MA 02460
© 2009 Attivio, Inc. All rights reserved. Attivio, Active Intelligence Engine, and all other related logos and product names are either registered trademarks or trademarks of Attivio in the United States and/or other countries. All other company, product, and service names are the property of their respective holders and may be registered trademarks or trademarks in the United States and/or other countries.
