
Spatial On-Line Analytical Processing (SOLAP): A Tool to Analyze the Emission of Pollutants in Industrial Installations

Rosa Matias, Escola Superior de Tecnologia e Gestão de Leiria, Instituto Politécnico de Leiria, Morro do Alto do Vieiro, 2411-901 Leiria, Portugal. Email: rmatias@estg.ipleiria.pt
Abstract: This paper presents a category of On-Line Analytical Processing (OLAP) tool intended to explore data related to space, i.e., spatial data. The tool, designated SOLAP, integrates concepts from two different worlds, namely Geographic Information Systems (GIS) and OLAP. OLAP systems are developed for interactive and rapid analysis of large amounts of data. Integrating spatial data into OLAP systems raises several questions because spatial data has a specific and complex nature: it can take the form of text, numbers, vectors or images. For example, it is necessary to introduce spatial data types and spatial operations in the data layer; to execute slices, drill-ups and drill-downs on spatial data; to display spatial data in a human-readable manner; and to plan for acceptable response times when OLAP operations involve spatial data. We develop a SOLAP system that uses the spatial packages of Object-Relational Database Management Systems (ORDBMS) to enable navigation through spatial data. Within the scope of this work we implement a prototype and apply it to a case study that consists of analyzing the emission of pollutants in Portuguese industrial installations.

Index Terms: Geographic Information Systems, Spatial Databases, Spatial Data Warehousing, Spatial On-Line Analytical Processing

João Moura-Pires, CENTRIA, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2829-516 Caparica, Portugal. Email: jmp@di.fct.unl.pt
I. INTRODUCTION

In data warehouse systems, time is considered a fundamental variable. But isn't location a fundamental issue too? We cannot forget that everything that happens, happens somewhere. The mobility of people, goods and services has contributed to the accumulation of data related to space. It is estimated that the operational databases of organizations hold large amounts of spatial data. For example, the addresses and telephone numbers of employees, clients, suppliers or customers are spatial data [7], since they can be translated to a place on the surface of the Earth. The evolution of hardware devices and software applications has also contributed to the accumulation of spatial data. For example, satellites collect terabytes of images of the Earth's surface every day, and mobile devices track large volumes of data about the location of people and goods. All that data is waiting to be analyzed.

This paper presents a category of OLAP tool intended to explore data related to space, i.e., spatial data. The tool, designated SOLAP, integrates concepts from two different worlds, namely GIS and OLAP. First we explain the need for and the meaning of Spatial On-Line Analytical Processing; the multidimensional model is revisited considering dimensions with spatial attributes and fact tables with spatial measures. The impact of adding spatial information to the multidimensional model is analyzed in terms of the typical OLAP operations: slice, drill-up and drill-down. Then a prototype of a SOLAP tool is presented, showing the benefits of this kind of system with some examples.

II. SPATIAL ON-LINE ANALYTICAL PROCESSING

SOLAP is a visual platform built especially to support rapid and easy spatio-temporal analysis and exploration of data, following a multidimensional approach comprised of aggregation levels available in cartographic displays as well as in tabular and diagram displays [1]. A SOLAP tool tries to combine the advantages of GIS and OLAP applications. A GIS is a computer system capable of assembling, storing, manipulating, and displaying geographically referenced information, i.e., data identified according to their locations [4]. GIS are known to be poorly adapted to decision-making because they have complex query interfaces and spatial operators that are not very intuitive for non-specialists. They do not support on-the-fly aggregation of data well, and processing times may be very long for the complex queries that are typical of strategic decision-making [2]. However, they are very useful for the visualization of spatial data [2]. OLAP systems are intended to explore large amounts of data in a rapid and intuitive way, but they are not well suited to handling spatial data [6] because they are concerned only with alphanumeric data types.

A. Multidimensional Model

The entity-relationship data model is normally used in the development of On-Line Transaction Processing (OLTP) systems. The entity-relationship model leads to data being spread across the database, which makes it impracticable to extract information that gives an overview of the status of an organization. It normalizes data and so degrades the performance of queries that involve many tables. This limitation led to the emergence of the multidimensional model, whose approach is more in agreement with the functional and performance requirements of OLAP systems. The multidimensional model contains a central table (fact table) containing the bulk of the data, with no redundancy, and a set of smaller attendant tables (dimension tables), normally one for each dimension [5]. A dimension has attributes, which are the columns of the table that represents the dimension. Dimension tables and fact tables have a master/detail relationship. Concept hierarchies are a method of ordering the attributes of dimensions from general to specific [5] and are used to navigate through information. In the case of the emission of industrial pollutants we have dimensions such as time, industrial installations, pollutants and activities. The central table holds facts such as the emission amount.
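As an illustration of the multidimensional model described above, the following is a minimal sketch of a star schema built with Python's standard sqlite3 module. The table names emissions, installations and pollutants and the attributes facility, county name and emittedTo follow the text; the column names, the time_dim table, the helper total_by and all sample rows are assumptions made for this example and are not taken from the prototype.

```python
import sqlite3

# In-memory database holding a tiny star schema (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE installations (installation_id INTEGER PRIMARY KEY,
                            facility TEXT, county_name TEXT);
CREATE TABLE pollutants    (pollutant_id INTEGER PRIMARY KEY,
                            name TEXT, emittedTo TEXT);
CREATE TABLE time_dim      (time_id INTEGER PRIMARY KEY, year INTEGER);
-- Fact table: one foreign key per dimension plus the numeric measure.
CREATE TABLE emissions (
    installation_id INTEGER REFERENCES installations(installation_id),
    pollutant_id    INTEGER REFERENCES pollutants(pollutant_id),
    time_id         INTEGER REFERENCES time_dim(time_id),
    amount          REAL
);
""")
conn.executemany("INSERT INTO installations VALUES (?, ?, ?)", [
    (1, "Plant A", "Leiria"), (2, "Plant B", "Leiria"), (3, "Plant C", "Almada"),
])
conn.executemany("INSERT INTO pollutants VALUES (?, ?, ?)", [
    (1, "CO2", "air"), (2, "sulfate", "water"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, 2003), (2, 2004)])
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 10.0), (2, 1, 1, 5.0), (3, 1, 1, 7.0),
    (1, 2, 1, 2.0), (3, 2, 2, 4.0),
])

# Levels of the hypothetical concept hierarchy facility -> county_name.
HIERARCHY_LEVELS = ("facility", "county_name")

def total_by(level):
    """Sum the emission amount at one level of the installations hierarchy."""
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    sql = (
        f"SELECT i.{level}, SUM(e.amount) "
        "FROM emissions e "
        "JOIN installations i ON e.installation_id = i.installation_id "
        f"GROUP BY i.{level} ORDER BY i.{level}"
    )
    return conn.execute(sql).fetchall()

by_facility = total_by("facility")     # finer level of the hierarchy
by_county = total_by("county_name")    # coarser level of the hierarchy
print(by_facility)
print(by_county)
```

Grouping by facility and then by county name shows how a concept hierarchy lets the same fact table be read at two levels of detail.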

Fig. 1. Multidimensional model for the case of the emission of pollutants in industrial installations.

B. Spatial Dimensions, Hierarchies and Measures

A spatial dimension is a dimension which has one or more spatial attributes, whose data type can be alphanumeric or geometric. In a dimension such as industrial installations, examples of spatial alphanumeric attributes are: (i) the address of the industrial installation; and (ii) the name of the county associated with the industrial installation. On the other hand, examples of spatial geometric attributes are: (i) the location point of the industrial installation; (ii) the polygon of the county where the industrial installation is located; or (iii) the river lines that represent the rivers nearest to the industrial installation. A spatial hierarchy can be: (i) full semantic, if all attributes are alphanumeric, for instance, industrial installation addresses that roll up to county names; (ii) hybrid, if some attributes are geometric and others alphanumeric, for instance, location points of industrial installations that roll up to county names; or (iii) full geometric, if all attributes are geometric, for instance, location points representing industrial installations that roll up to county polygons. Spatial measures are measures of geometric type stored in a fact table, for instance, geographic coordinates. In analyzing the emission of pollutants in industrial installations, examples of spatial measures are: (i) the geographic coordinates of an emission point; or (ii) the polygon of the diffusion area of a pollutant.

C. OLAP Operations

Common OLAP operations are slice, drill-up and drill-down. In a slice operation, the values of dimension attributes are restricted. For example, in Fig. 2, the attribute emittedTo, which belongs to the pollutants dimension, is used to obtain a dataset with only the emissions that are made to water.

Fig. 2. SQL command for a slice operation. There is a join operation between the fact table emissions and the dimension pollutants. There is a restriction on the attribute emittedTo of the dimension pollutants.

In a drill-up operation the level of detail decreases, that is, the level of abstraction increases. For example, in Fig. 2, if we remove the attribute facility we obtain a dataset with coarser granularity. In a drill-down operation the level of detail increases. For example, in Fig. 2 we could add the attribute county name of the dimension installations and obtain a dataset with finer granularity. Drill-up and drill-down operations can be done in combination with concept hierarchies.

D. Extended OLAP Operations

Attributes and measures can be represented by geometric data types such as points, lines or polygons, so OLAP operations have to be extended in order to manipulate these data types. The geometric component introduces new capabilities in the slice operation. With alphanumeric attributes we use operators such as equal to, greater than, less than or between. With geometric attributes we use operators that establish relationships between geometries, so the questions that can be asked have a different nature: answering them involves establishing and verifying relations between objects in space, which was not possible before. The most common geometric relationship operations are based on topology and distance. Geometric attributes of spatial dimensions can be restricted by using: (i) a geometry, for example, a geometry that represents a polluted area; (ii) layers, for example, rivers, lakes, protected places or tourist places; or (iii) a rectangle or buffer defined by the user. Some spatial slices are: (i) restrict the industrial installations to those located in protected areas; (ii) restrict the industrial installations to those near contaminated wells; or (iii) restrict the counties to those that are crossed by rivers (county polygons overlapped by river lines).

Fig. 3. SQL command with a topological relationship. Only data related to installations inside a rectangle that represents a polluted area is obtained. The function SDO_INSIDE is a topological function that belongs to the spatial package of a spatial database.
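To contrast the two kinds of slice, the sketch below runs an alphanumeric slice and a spatial slice with Python's standard sqlite3 module. SQLite has no spatial package, so a user-defined function inside_rect, registered from Python, stands in for a topological operator such as SDO_INSIDE; that function, the x and y columns, the rectangle coordinates and the sample rows are all assumptions made for illustration, not the SQL of Fig. 2 or Fig. 3.

```python
import sqlite3

def inside_rect(x, y, xmin, ymin, xmax, ymax):
    """Return 1 if the point lies strictly inside the rectangle, else 0."""
    return int(xmin < x < xmax and ymin < y < ymax)

conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL (6 arguments).
conn.create_function("inside_rect", 6, inside_rect)
conn.executescript("""
CREATE TABLE installations (installation_id INTEGER PRIMARY KEY,
                            facility TEXT, x REAL, y REAL);
CREATE TABLE pollutants    (pollutant_id INTEGER PRIMARY KEY,
                            name TEXT, emittedTo TEXT);
CREATE TABLE emissions     (installation_id INTEGER, pollutant_id INTEGER,
                            amount REAL);
""")
conn.executemany("INSERT INTO installations VALUES (?, ?, ?, ?)", [
    (1, "Plant A", 2.0, 2.0), (2, "Plant B", 8.0, 3.0), (3, "Plant C", 4.0, 4.5),
])
conn.executemany("INSERT INTO pollutants VALUES (?, ?, ?)", [
    (1, "CO2", "air"), (2, "sulfate", "water"),
])
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?)", [
    (1, 1, 10.0), (2, 2, 3.0), (3, 2, 4.0), (3, 1, 6.0),
])

# Alphanumeric slice: restrict an attribute with an equality operator.
ALPHANUMERIC_SLICE = """
SELECT i.facility, e.amount
FROM emissions e
JOIN pollutants p    ON e.pollutant_id = p.pollutant_id
JOIN installations i ON e.installation_id = i.installation_id
WHERE p.emittedTo = ?
ORDER BY i.facility, e.amount
"""

# Spatial slice: restrict a geometric attribute with a topological test.
SPATIAL_SLICE = """
SELECT i.facility, e.amount
FROM emissions e
JOIN installations i ON e.installation_id = i.installation_id
WHERE inside_rect(i.x, i.y, ?, ?, ?, ?) = 1
ORDER BY i.facility, e.amount
"""

polluted_area = (1.0, 1.0, 5.0, 5.0)  # hypothetical rectangle: xmin, ymin, xmax, ymax
water_slice = conn.execute(ALPHANUMERIC_SLICE, ("water",)).fetchall()
spatial_slice = conn.execute(SPATIAL_SLICE, polluted_area).fetchall()
print(water_slice)
print(spatial_slice)
```

In a spatial database the test would be evaluated by the spatial package, typically with the help of a spatial index, rather than by a row-by-row function call as in this sketch.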

A spatial drill-up or drill-down with an alphanumeric attribute and a geometric measure involves a geometric aggregation function such as a union aggregation. The process may involve obtaining collections of geometries grouped according to the same GROUP BY clause.
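The idea of aggregating a geometric measure under a group-by can be sketched in plain Python for the simplest case, where each geometric measure is a set of emission points and the union of two geometries is the union of their point sets. The row layout, the function union_by and the sample coordinates are assumptions for this example; unioning polygons, as a spatial package does, needs a real geometry engine and is not attempted here.

```python
# Each row: (county_name, facility, emission points of that facility).
# The coordinates are invented for illustration.
ROWS = [
    ("Leiria", "Plant A", [(2.0, 2.0), (2.5, 2.0)]),
    ("Leiria", "Plant B", [(2.5, 2.0), (3.0, 1.0)]),
    ("Almada", "Plant C", [(8.0, 3.0)]),
]

def union_by(rows, key_index):
    """Group rows by one alphanumeric attribute and union their point sets."""
    groups = {}
    for row in rows:
        key = row[key_index]
        # set.update keeps each distinct point once, like a geometric union
        groups.setdefault(key, set()).update(row[2])
    # Sort the points so the result is deterministic.
    return {key: sorted(points) for key, points in groups.items()}

by_facility = union_by(ROWS, 1)  # drill-down level: one geometry per facility
by_county = union_by(ROWS, 0)    # drill-up level: geometries merged per county
print(by_county)
```

Drilling up from facility to county merges the geometries of all facilities in the same county into one collection, which is the role the union aggregation plays for the geometric measure.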

Fig. 4. SQL command with the aggregation function SDO_AGGR_UNION.

Pre-aggregating spatial cubes can benefit performance. Geometries can have hundreds of points and so tend to occupy more space than alphanumeric data. Computing operations that involve geometries is expensive, especially if the geometries have irregular boundaries. The Minimum Bounding Rectangle (MBR) of a geometry is the smallest rectangle that encloses it. Using the MBR of a geometry instead of the exact geometry makes the computation lighter and saves disk space. Geometric measures can be calculated [3]: (i) without pre-aggregation; (ii) with pre-aggregation using the MBR; or (iii) with selective pre-aggregation. Without pre-aggregation, all client requests have to be computed on the fly. Pre-aggregation with the MBR of geometries avoids the storage cost of pre-aggregating the exact geometries, and users can still request the exact geometries when needed. Selective pre-aggregation involves identifying the important aggregations; algorithms can be used for this purpose, and the work in [3] addresses this question.

III. A PROTOTYPE FOR A SOLAP TOOL

Our SOLAP system has three layers, namely the data layer, the OLAP server layer and the client layer.

A. Data Layer

The data layer is an ORDBMS with a spatial package that implements the multidimensional model with spatial attributes, spatial measures and spatial hierarchies. We identified the relevant spatial aggregates and materialized them.

B. OLAP Server Layer

The OLAP server layer is a ROLAP server and is responsible for the behavior of the system. It is a bridge between the data layer and the client layer. A metadata repository describes the characteristics of the data layer and integrates concepts from the relational model, the multidimensional model and the GIS world. Tables, columns, primary keys and foreign keys are mapped to dimensions, attributes, fact tables and measures. Geometric measures are combined with numeric measures to define thematic layouts. The metadata is loaded into the OLAP server and translated into objects that implement the functionality of the system. Because users can combine operations in many ways, it was necessary to build an SQL engine. Operations made by users are translated into SQL statements that are submitted to the data layer.

C. Client Layer

The client layer has a GIS component capable of retrieving geometries from the spatial database. With a thematic display, users can see graphically the influence of measures on areas of space. The thematic visualization is accompanied by a legend that allows the visual symbols on the map to be interpreted. The operations that users can execute are: (i) filters with numeric measures; (ii) filters with geometric measures; (iii) slices with alphanumeric attributes of dimensions; (iv) slices with geometric attributes of spatial dimensions; (v) rolling up through spatial hierarchies; (vi) combining geometric measures with reference spatial data; and (vii) aggregating geometric objects ad hoc.

Fig. 5. Architecture of the SOLAP tool. The layers are implemented separately. The OLAP server requests alphanumeric and geometric data from the data layer. Geometric data is presented in a GIS component.
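How an OLAP server might turn a user's selection into an SQL statement for the data layer can be sketched as follows. This is not the prototype's SQL engine: the METADATA dictionary, the function build_query and the key column names are hypothetical, and the sketch covers only equality slices on alphanumeric attributes and a single aggregated measure.

```python
# Hypothetical metadata: maps multidimensional concepts to relational ones.
METADATA = {
    "fact_table": "emissions",
    "measures": {"amount": "SUM(emissions.amount)"},
    "dimensions": {
        "installations": {"key": "installation_id",
                          "attributes": {"facility", "county_name"}},
        "pollutants": {"key": "pollutant_id",
                       "attributes": {"name", "emittedTo"}},
    },
}

def build_query(group_by, measure, slices=()):
    """Translate a selection into (sql, params).

    group_by: list of (dimension, attribute) pairs shown to the user
    slices:   list of (dimension, attribute, value) equality restrictions
    """
    fact = METADATA["fact_table"]
    dims = METADATA["dimensions"]
    used = []  # dimensions that must be joined, in first-use order
    for dim, attr, *_ in list(group_by) + list(slices):
        if attr not in dims[dim]["attributes"]:
            raise ValueError(f"unknown attribute {dim}.{attr}")
        if dim not in used:
            used.append(dim)

    columns = [f"{dim}.{attr}" for dim, attr in group_by]
    aggregate = METADATA["measures"][measure]
    sql = "SELECT " + ", ".join(columns + [f"{aggregate} AS {measure}"])
    sql += f" FROM {fact}"
    for dim in used:
        key = dims[dim]["key"]
        sql += f" JOIN {dim} ON {fact}.{key} = {dim}.{key}"
    if slices:
        sql += " WHERE " + " AND ".join(f"{d}.{a} = ?" for d, a, _ in slices)
    if columns:
        sql += " GROUP BY " + ", ".join(columns)
    # Slice values travel as bound parameters, never inside the SQL text.
    return sql, [value for _, _, value in slices]

sql, params = build_query(
    group_by=[("installations", "county_name")],
    measure="amount",
    slices=[("pollutants", "emittedTo", "water")],
)
print(sql)
print(params)
```

Changing the group_by list from county_name to facility corresponds to a drill-down, and removing an entry corresponds to a drill-up, so one generator can serve every combination of operations the user composes.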

Some questions that the prototype can answer are: (i) for a given year and pollutant, show the thematic map with the spatial distribution of emission points; (ii) for a given year, show the diffusion areas of sulfate emitted by industrial installations located in protected areas; (iii) for a common year and pollutant, overlay the thematic map of emissions per region with the thematic map of emission points; (iv) for a given date, show the diffusion polygons of the pollutant CO2 with an emission amount greater than a given value; (v) for a period of time, show emission points near a river with pollution problems; or (vi) for a given year and pollutant, show a thematic map with the diffusion polygons associated with the 10 largest emission amounts.
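Questions (iv) and (vi) combine a slice on year and pollutant with a filter on a numeric measure. A minimal sketch with Python's standard sqlite3 module is given below; it assumes a single flattened emissions table for brevity, and the function top_emissions and the sample rows are invented for illustration. Retrieving the associated diffusion polygons is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flattened stand-in for the star schema, to keep the example short.
conn.execute(
    "CREATE TABLE emissions (facility TEXT, pollutant TEXT, year INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?, ?)", [
    ("Plant A", "CO2", 2004, 10.0),
    ("Plant B", "CO2", 2004, 5.0),
    ("Plant C", "CO2", 2004, 7.0),
    ("Plant A", "CO2", 2003, 20.0),
    ("Plant C", "sulfate", 2004, 4.0),
])

def top_emissions(pollutant, year, n, min_amount=0.0):
    """Return the n largest emissions of a pollutant in a year above a threshold."""
    return conn.execute(
        "SELECT facility, amount FROM emissions "
        "WHERE pollutant = ? AND year = ? AND amount > ? "
        "ORDER BY amount DESC LIMIT ?",
        (pollutant, year, min_amount, n),
    ).fetchall()

top_two = top_emissions("CO2", 2004, 2)                      # in the spirit of question (vi)
above_eight = top_emissions("CO2", 2004, 10, min_amount=8.0)  # in the spirit of question (iv)
print(top_two)
print(above_eight)
```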

Fig. 6. An overlay of three thematic layers with common data, namely data about the same pollutant and the same period of time. One thematic layer represents the amount emitted within diffusion polygons. Another thematic layer represents the amount emitted within county polygons. The last thematic layer represents an external reference: a continuous map of the concentration of the pollutant in the ground. With the prototype, the user can manipulate the data in order to obtain a simpler figure.

IV. CONCLUSION AND FUTURE WORK

This kind of system is a hybrid system developed with a combination of technologies that belong to several areas. By using the spatial package of an ORDBMS to provide the spatial functionality of the system, we only need to manipulate spatial data types and geometric operations. On the other hand, we have all the benefits of using an ORDBMS. Users can make many combinations of variables using the available objects of a multidimensional engine. These actions are translated by an SQL engine that generates SQL commands to submit to the spatial database. The GIS component is important for human understanding of geometries, but visualization only makes sense if it has a thematic component that projects measures onto geometries. Spatial packages in ORDBMS are now prepared to execute spatial data mining; the system only has to allow the parameterization of tasks such as spatial classification or spatial clustering.

REFERENCES

[1] Bédard, Y. (1997). Spatial OLAP. Vidéoconférence, 2ème Forum annuel sur la R-D, Géomatique VI: Un monde accessible, Montréal, Canada.
[2] Bédard, Y. (2003). Integrating GIS components with knowledge discovery technology for environmental health decision support. International Journal of Medical Informatics, 70, 79-94.
[3] Han, J., Stefanovic, N., & Koperski, K. (1998). Selective Materialization: An Efficient Method for Spatial Data Cube Construction. In Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining (PAKDD '98) (pp. 144-158). London: Springer-Verlag.
[4] GIS definition. Available from http://www.unity.edu/sarihou/cs2883/gisdefinition
[5] Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann.
[6] Rivest, S., Bédard, Y., & Marchand, P. (2001). Toward Better Support for Spatial Decision Making: Defining the Characteristics of Spatial On-Line Analytical Processing (SOLAP). Geomatica, 55(4), 539-555.
[7] MicroStrategy and MapInfo: Business Intelligence + Location Intelligence = The Ultimate Decision Tool for the Enterprise [Web cast from MicroStrategy]. Available from http://www.microstrategy.com/Profile/Webcasts/
