

31st SBBD – Demonstration Track, October 2016 – Salvador, BA, Brazil

demos:03

Experimental Evaluation of Spatial Indices with FESTIval


Anderson Chaves Carniel¹, Ricardo Rodrigues Ciferri², Cristina Dutra de Aguiar Ciferri¹
¹Department of Computer Science – University of São Paulo
São Carlos, SP 15560-970, Brazil
²Department of Computer Science – Federal University of São Carlos
São Carlos, SP 15565-905, Brazil
accarniel@gmail.com, ricardo@dc.ufscar.br, cdac@icmc.usp.br

Abstract. Spatial indices like the R-tree and the R*-tree are widely employed in
spatial databases to improve spatial query processing, such as point queries and
spatial range queries. Spatial indices can be configured with different parameters,
which directly impact their performance. Although there are many evaluations of
spatial indices in the literature, reproducing these evaluations requires considerable
implementation effort. In this paper, we propose FESTIval, a PostgreSQL extension
that provides a single environment to evaluate different spatial indices with
different parameters. FESTIval automatically collects statistical data on performed
operations and enables the performance comparison of spatial indices using
different metrics.

1. Introduction
Several advanced applications use spatial database systems to manage spatial
information represented by spatial data types like points, lines, and regions
[Schneider and Behr 2006]. For instance, cities can be represented by regions. Spatial
queries [Gaede and Günther 1998] commonly employ topological predicates (e.g.,
overlap, inside) [Schneider and Behr 2006] to return the set of objects that satisfy a
given predicate. For instance, a spatial range query (window query) returns all objects
that overlap a rectangular-shaped object. A large number of spatial indices
[Gaede and Günther 1998] has been proposed in the literature to improve spatial query
processing. Hierarchical spatial indices are the most popular, such as the R-tree
[Guttman 1984] and its variant, the R*-tree [Beckmann et al. 1990,
Beckmann and Seeger 2009]. The parameterization of spatial indices plays an important
role in the performance of spatial query processing.
Experimental evaluations are needed to verify the performance of a spatial index.
This task is complicated because working implementations are hard to find, even for
the most popular spatial indices. Another problem is that the available implementations
are based on different programming languages or system environments, which can lead
to unfair comparisons and to problems in the collection of statistical data.
This paper has two objectives. The first objective is to propose a single framework
for spatial indexing evaluation that contains implementations of different spatial indices,
providing fairer comparisons between them. The second objective is to capture, collect,
and store statistical data about performed operations (e.g., insertions, spatial queries)
so that different spatial indices with different parameters can be compared. We achieve
these goals by proposing FESTIval (which stands for Framework to Evaluate SpaTial
Indices in non-volatile memories for PostgreSQL). FESTIval is a framework implemented
as a PostgreSQL extension that supports the execution of experimental evaluations of
different spatial indices for non-volatile memories, such as magnetic disks.
It provides the following main functionalities:

• Creation of different spatial indices with different parameters;
• Processing of spatial queries, such as spatial range queries, on previously
  constructed spatial indices;
• Collection of statistical data on performed operations, such as total processing
  time, number of reads and writes, total time of split operations, and so on;
• Collection of detailed information about a constructed spatial index, which allows
  users to analyze its structure, including its height, number of leaf and internal
  nodes, occupancy of each node, and so on.

The remainder of this paper is organized as follows. Section 2 surveys related
work. Section 3 summarizes the required concepts of spatial indexing. Section 4 details
FESTIval. Finally, Section 5 concludes the paper and presents future work.

2. Related Work
Spatial indexing has been an important topic in spatial databases, and several spatial
indices with different characteristics have been proposed [Gaede and Günther 1998].
Extensive experimental evaluations are commonly conducted to verify the performance
of a spatial index [Guttman 1984, Beckmann et al. 1990, Gaede and Günther 1998,
Beckmann and Seeger 2009, Sowell et al. 2013]. Here, we consider two important
features: (i) the use of a single framework or environment that comprises the
implementations of the compared spatial indices and (ii) the collection of an
expressive set of comparison metrics.
With regard to the first feature, most approaches [Guttman 1984,
Beckmann et al. 1990, Gaede and Günther 1998, Beckmann and Seeger 2009] do not
use a single framework or system environment to evaluate the performance of spatial
indices. As a consequence, the experiments are conducted using a specific implementation
for each spatial index. Hence, to compare different spatial indices, we need either
to reimplement them based on their original papers or to reuse their existing
implementations. Unfortunately, reimplementation is the most common situation, since
the source code of spatial indices is often not available for several reasons, such
as license restrictions. On the other hand, FESTIval and [Sowell et al. 2013] each provide
a single, extensible, free, and open-source framework to evaluate different spatial indices
and thus do not require extra implementation effort. While FESTIval offers several
different parameterizations of a spatial index, this is not the case for [Sowell et al. 2013].
Hence, FESTIval enables the reproducibility of experimental evaluations, as well as their
extension, by changing several parameters of the spatial indices (see Section 4).
With regard to the second feature, the approaches differ in the way they collect
the statistical data of an experimental evaluation. Most approaches
[Guttman 1984, Beckmann et al. 1990, Gaede and Günther 1998,
Beckmann and Seeger 2009] collect specific statistical data from their own
implementations. Thus, in addition to the cost of implementing each spatial index of the
experiment, these approaches must also collect their statistical data of interest. On the
other hand, FESTIval automatically collects and stores an expressive set of statistical
data from spatial index operations. The collected statistical data are based on several
metrics used in existing evaluations and are detailed in Section 4.2.

3. Spatial Query Processing and Spatial Indexing


Several types of spatial queries have been defined in the literature, such as spatial
selections, spatial range queries, and point queries [Gaede and Günther 1998]. In general,
these queries return, from a set of spatial objects, the objects that satisfy some
topological predicate (e.g., overlap, inside) with respect to a given spatial object; for
example, a spatial query that finds all rivers intersecting a city.
The evaluation of topological predicates is commonly performed using the
9-Intersection Model [Schneider and Behr 2006]. Since this evaluation requires
computationally expensive geometric algorithms, spatial approximations like the
Minimum Bounding Rectangle (MBR) are employed [Guttman 1984, Gaede and Günther 1998].
The goal of a spatial approximation is to use simpler geometric structures (e.g.,
rectangles) in topological predicate processing. However, these approximations introduce
dead space, that is, area that does not belong to the original spatial object. Hence,
spatial query processing is composed of two steps: filtering and refinement
[Gaede and Günther 1998].
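To make the notion of dead space concrete, the following Python sketch computes the MBR of a simple polygon and the dead space that this approximation introduces. The dead space is the MBR area minus the polygon area, and the polygon area comes from the shoelace formula. The sketch is an illustrative assumption, not part of FESTIval (which is implemented in C on top of PostGIS and GEOS). Geometries are represented as plain lists of (x, y) vertices for brevity.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(coords: List[Point]) -> Rect:
    """Minimum Bounding Rectangle of a sequence of vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def rect_area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def polygon_area(ring: List[Point]) -> float:
    """Area of a simple polygon given by its vertices (shoelace formula)."""
    s = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def dead_space(ring: List[Point]) -> float:
    """Area covered by the MBR but not by the polygon itself."""
    return rect_area(mbr(ring)) - polygon_area(ring)


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    print(mbr(triangle))         # (0.0, 0.0, 4.0, 3.0)
    print(dead_space(triangle))  # 6.0
```

For this right triangle, half of the MBR is dead space. Any candidate that falls in that half during filtering would have to be discarded by the refinement step.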
The filtering step uses spatial approximations to speed up the evaluation of spatial
queries with topological predicates and returns a set of candidate objects that may include
false hits. This step also employs a spatial index structure, such as the R-tree and the
R*-tree, which index spatial objects by their MBRs in hierarchical structures. Several
parameters tune the performance of a spatial index for a specific dataset. The page size
as well as the minimum and maximum number of entries per node are examples of typical
parameters. Using different parameters for a spatial index on different datasets has a
significant impact on its performance [Gaede and Günther 1998]. In addition, each index
may include specific parameters according to its design. For instance, the R*-tree uses a
policy of redistributing entries by reinsertion.
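The hierarchical filtering described above can be sketched as follows. Starting at the root, the search descends only into subtrees whose MBR intersects the search window and returns the ids stored in the reached leaf entries as candidates. This is a simplified, hypothetical illustration, not FESTIval's C implementation. The `Node` layout and the function names are assumptions, and the tree is built by hand rather than by an insertion and split algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two MBRs share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    leaf: bool
    # Leaf entries: (mbr, object_id); internal entries: (mbr, child Node).
    entries: List[Tuple[Rect, Union[int, "Node"]]] = field(default_factory=list)


def filter_step(node: Node, window: Rect) -> List[int]:
    """R-tree-style search: visit only subtrees whose MBR intersects the
    window; return candidate object ids (these may include false hits)."""
    candidates: List[int] = []
    for rect, ref in node.entries:
        if not intersects(rect, window):
            continue  # prune this entry and, if internal, its whole subtree
        if node.leaf:
            candidates.append(ref)
        else:
            candidates.extend(filter_step(ref, window))
    return candidates


if __name__ == "__main__":
    leaf1 = Node(True, [((0, 0, 1, 1), 1), ((2, 2, 3, 3), 2)])
    leaf2 = Node(True, [((10, 10, 11, 11), 3)])
    root = Node(False, [((0, 0, 3, 3), leaf1), ((10, 10, 11, 11), leaf2)])
    print(filter_step(root, (0.5, 0.5, 2.5, 2.5)))  # [1, 2]
```

The subtree of `leaf2` is never visited, because its MBR does not intersect the window. This pruning is where the hierarchical index saves work.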
The refinement step verifies whether each object returned by the filtering step satisfies
the required topological predicate by accessing its exact geometry. It is the most expensive
operation in spatial query processing, and thus, the fewer objects the filtering step returns,
the lower the processing time of the refinement step. The refinement step returns the final
set of objects of a spatial query. In some cases, the refinement step is not needed, since
evaluating the topological predicate on the MBRs is sufficient to return the final set
[Gaede and Günther 1998]. An example is the predicate disjoint: if two MBRs are disjoint,
the original objects are also disjoint.
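The MBR-only decision described above can be expressed as a three-valued test. For a given predicate, comparing two MBRs either settles the answer or marks the pair for refinement. The Python sketch below covers only disjoint and intersects, and the names `Outcome` and `mbr_test` are hypothetical. It is not FESTIval's predicate evaluation, which relies on GEOS for the exact geometries.

```python
from enum import Enum
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Outcome(Enum):
    TRUE = "true"      # predicate holds; decided from the MBRs alone
    FALSE = "false"    # predicate fails; decided from the MBRs alone
    REFINE = "refine"  # exact geometries must be inspected


def mbrs_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr_test(predicate: str, a: Rect, b: Rect) -> Outcome:
    """Evaluate a predicate on MBRs only, as done in the filtering step."""
    if predicate == "disjoint":
        # Disjoint MBRs imply disjoint objects: no refinement needed.
        return Outcome.TRUE if not mbrs_intersect(a, b) else Outcome.REFINE
    if predicate == "intersects":
        # Disjoint MBRs rule the pair out; intersecting MBRs only make it a candidate.
        return Outcome.REFINE if mbrs_intersect(a, b) else Outcome.FALSE
    raise ValueError(f"predicate not covered by this sketch: {predicate}")
```

For example, `mbr_test("disjoint", (0, 0, 1, 1), (2, 2, 3, 3))` yields `Outcome.TRUE`. With overlapping MBRs, the objects may still be disjoint (dead space), so the answer is `Outcome.REFINE`.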

4. The FESTIval
We propose FESTIval (Framework to Evaluate SpaTial Indices in non-volatile memories
for PostgreSQL), which offers a single environment to perform experimental evaluations
of different spatial indices with different parameters. FESTIval is a
free, open-source framework implemented in C using the extensibility provided by
the PostgreSQL internal library. FESTIval accesses spatial objects through the spatial
extension PostGIS and processes geometric operations (e.g., topological predicates)
using the GEOS library. The complete documentation of FESTIval is available at
http://gbd.dc.ufscar.br/festival/.
FESTIval supports the following spatial indices: the R-tree and the R*-tree. To store
statistical data of performed spatial operations, FESTIval maintains a data schema called
sdf. Section 4.1 presents the sdf schema. Section 4.2 details the spatial operations that
FESTIval supports.

4.1. FESTIval Data Schema


FESTIval maintains a PostgreSQL schema called sdf to store the parameters of different
spatial indices as well as the statistical data collected from spatial index operations. The
complete documentation of sdf can be found in the FESTIval documentation. This schema is
automatically created when FESTIval is installed in a PostgreSQL database. Due to space
limitations, we present here only the most important relational table schemas of sdf:
Source(src_id:int, table:string, column:string, pk:string)
BasicConfiguration(bc_id:int, page_size:int, max_ent:int, min_ent:int)
SpecializedConfiguration(sc_id:int, sc_desc:string)
RStarTreeConfiguration(sc_id:int, rein_perc:int, rein_type:string)
Execution(pe_id:int, total_time:double, read_time:double,
    write_time:double, split_time:double, filter_time:double,
    refin_time:double, cand_num:int, result_num:int, query_pred:int, ...)
IndexSnapshot(pe_id:int, height:int, num_nodes, num_entries, ...)

There are two categories of information managed by FESTIval and stored in sdf.
The first category refers to the configuration of a spatial index. First, FESTIval stores
the information needed to access the indexed spatial objects, namely their table, column,
and primary key column. This information is stored in the table Source. FESTIval also
stores two types of parameters of a spatial index. The first type refers to generic
parameters that are used by all spatial indices implemented in FESTIval, such as the page
size as well as the minimum and maximum number of entries allowed in a node. These
parameters are stored in the table BasicConfiguration. The second type refers to specific
parameters that are used by a particular index. That is, each spatial index has its own set
of specific parameters, and the table SpecializedConfiguration generalizes the specific
parameters of all spatial indices. For instance, the R*-tree has specific parameters, such as
the reinsertion percentage of leaf and internal nodes (rein_perc) and the reinsertion type
(rein_type). They are stored in the specialized table RStarTreeConfiguration, and for each
row in this table, there is a corresponding row in SpecializedConfiguration. The R-tree is
handled similarly. Several possible values of both generic and specific parameters are
included by default. Users can also insert new parameter values, which are checked for validity.
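The generalization and specialization of configuration tables described above can be tried out with a small in-memory mock. The following Python sketch uses the standard `sqlite3` module instead of PostgreSQL and models only BasicConfiguration, SpecializedConfiguration, and RStarTreeConfiguration. The column types, the foreign key, and every inserted value (page size, entry limits, descriptions, the reinsertion percentage, and the type 'close') are hypothetical. They are not FESTIval's actual DDL or default parameters. Source and the R-tree specialization are omitted.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical, simplified mock of part of the sdf schema (not FESTIval's DDL).
DDL = """
CREATE TABLE BasicConfiguration (
    bc_id INTEGER PRIMARY KEY, page_size INTEGER, max_ent INTEGER, min_ent INTEGER);
CREATE TABLE SpecializedConfiguration (
    sc_id INTEGER PRIMARY KEY, sc_desc TEXT);
CREATE TABLE RStarTreeConfiguration (
    sc_id INTEGER PRIMARY KEY REFERENCES SpecializedConfiguration(sc_id),
    rein_perc INTEGER, rein_type TEXT);
"""


def setup() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    # Illustrative values only.
    con.execute("INSERT INTO BasicConfiguration VALUES (1, 4096, 50, 20)")
    con.execute("INSERT INTO SpecializedConfiguration VALUES (1, 'R-tree default')")
    con.execute("INSERT INTO SpecializedConfiguration VALUES (2, 'R*-tree 30% reinsertion')")
    # The specialized row shares its sc_id with a generic row.
    con.execute("INSERT INTO RStarTreeConfiguration VALUES (2, 30, 'close')")
    return con


def rstar_params(con: sqlite3.Connection, sc_id: int) -> Optional[Tuple[str, int, str]]:
    """Fetch R*-tree-specific parameters; None if sc_id is not an R*-tree configuration."""
    return con.execute(
        "SELECT s.sc_desc, r.rein_perc, r.rein_type "
        "FROM SpecializedConfiguration s "
        "JOIN RStarTreeConfiguration r ON s.sc_id = r.sc_id "
        "WHERE s.sc_id = ?",
        (sc_id,),
    ).fetchone()
```

In this mock, `rstar_params(setup(), 2)` returns the R*-tree parameters, while `sc_id = 1` yields `None` because that configuration has no R*-tree specialization.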
The second category refers to the storage of statistical data collected after a performed
operation. FESTIval collects two types of statistical information. The first type refers to
statistical data of a performed operation. This information is stored in the table Execution,
a denormalized table that stores statistical data of any type of execution. Thus, each
spatial operation fills only some specific columns of this table (see
Section 4.2). The second type refers to statistical data about the structure of the spatial
index. It includes the height of the index, the number of leaf and internal nodes, the number
of entries in leaf and internal nodes, and the total and dead space area of each node. This
information is stored in the table IndexSnapshot.
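Structural statistics of the kind stored in IndexSnapshot can be gathered with a level-by-level traversal of the tree. The Python sketch below is a hypothetical illustration with an assumed `Node` layout. It counts height, leaf and internal nodes, and their entries, and derives an average occupancy relative to the maximum number of entries (`max_ent`). Height is counted as the number of levels, so a root-only tree has height 1; FESTIval's own definitions may differ. The per-node total and dead space areas are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    leaf: bool
    # Leaf entries: (mbr, object_id); internal entries: (mbr, child Node).
    entries: List[Tuple[Rect, Union[int, "Node"]]] = field(default_factory=list)


def snapshot(root: Node, max_ent: int) -> Dict[str, float]:
    """Level-by-level traversal collecting IndexSnapshot-like statistics."""
    height = 0
    leaf_nodes = internal_nodes = 0
    leaf_entries = internal_entries = 0
    level = [root]
    while level:
        height += 1  # one more level visited (root-only tree: height 1)
        next_level: List[Node] = []
        for node in level:
            if node.leaf:
                leaf_nodes += 1
                leaf_entries += len(node.entries)
            else:
                internal_nodes += 1
                internal_entries += len(node.entries)
                next_level.extend(child for _, child in node.entries)
        level = next_level
    total_nodes = leaf_nodes + internal_nodes
    return {
        "height": height,
        "leaf_nodes": leaf_nodes,
        "internal_nodes": internal_nodes,
        "leaf_entries": leaf_entries,
        "internal_entries": internal_entries,
        # Average fraction of the max_ent slots per node that are in use.
        "avg_occupancy": (leaf_entries + internal_entries) / (total_nodes * max_ent),
    }


if __name__ == "__main__":
    leaf1 = Node(True, [((0, 0, 1, 1), 1), ((2, 2, 3, 3), 2)])
    leaf2 = Node(True, [((10, 10, 11, 11), 3)])
    root = Node(False, [((0, 0, 3, 3), leaf1), ((10, 10, 11, 11), leaf2)])
    print(snapshot(root, max_ent=4)["height"])  # 2
```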

4.2. General Functionalities


FESTIval provides (i) the creation of a spatial index based on a set of parameters
(Section 4.2.1) and (ii) spatial query processing (Section 4.2.2).

4.2.1. Creation of Spatial Indices

FESTIval provides the following SQL function that constructs a spatial index:
FT_CreateSpatialIndex(integer index, text name, text path, integer src_id, integer bc_id,
integer sc_id). It returns true if the spatial index was successfully constructed, and false
otherwise. A spatial index is constructed by inserting spatial objects one by one according
to the insertion algorithm of the index. The parameter index is an identifier that specifies
the spatial index to be constructed; the possible values are 1 for the R-tree and 2 for the
R*-tree. The parameter name is the name of the spatial index, while the parameter path is
the full path of the index directory. The parameter src_id is a primary key value of the table
Source that indicates the spatial objects to be indexed. The parameters bc_id and sc_id
specify the generic and specific parameters of the spatial index by using the primary key
values of the tables BasicConfiguration and SpecializedConfiguration, respectively. Since
the specific parameters refer to only one type of index, FESTIval checks whether the index
to be constructed (i.e., the parameter index) is compatible with the value of sc_id.
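A client script could mirror the compatibility check before calling FT_CreateSpatialIndex. The Python sketch below is hypothetical and builds a parameterized call in the argument order given above. The helper `create_index_call` and its extra argument `sc_index_type` are assumptions: the caller is supposed to have already looked up which index type the specialized configuration `sc_id` belongs to. The index name and path in the usage example are also made up. FESTIval performs its own check on the server side regardless.

```python
from typing import Tuple

# Index identifiers as described for the parameter index.
INDEX_TYPES = {1: "R-tree", 2: "R*-tree"}


def create_index_call(index: int, name: str, path: str, src_id: int,
                      bc_id: int, sc_id: int,
                      sc_index_type: str) -> Tuple[str, tuple]:
    """Build a parameterized FT_CreateSpatialIndex call after a client-side
    check that sc_id configures the requested index type (hypothetical)."""
    if index not in INDEX_TYPES:
        raise ValueError(f"unknown index identifier: {index}")
    if INDEX_TYPES[index] != sc_index_type:
        raise ValueError(
            f"sc_id {sc_id} configures a {sc_index_type}, not a {INDEX_TYPES[index]}")
    # %s placeholders follow the DB-API "format" paramstyle used by psycopg2.
    sql = "SELECT FT_CreateSpatialIndex(%s, %s, %s, %s, %s, %s)"
    return sql, (index, name, path, src_id, bc_id, sc_id)


if __name__ == "__main__":
    sql, params = create_index_call(2, "rstar_idx", "/tmp/festival/", 1, 1, 2, "R*-tree")
    print(sql, params)
    # With psycopg2, for instance: cur.execute(sql, params); ok = cur.fetchone()[0]
```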
This function also automatically collects statistical data on the construction of a spatial
index. These data are stored in the table Execution (Section 4.1) and include the total time
of the construction (total_time), the processing time of read and write operations
(read_time, write_time), and the processing time of split operations (split_time). FESTIval
also collects other statistical data that are not shown in Section 4.1. Further, FESTIval
allows users to visualize the constructed index by accessing another relational table. These
additional data are not detailed here due to space limitations but are described in the
FESTIval documentation.
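One way to collect times like read_time, write_time, and split_time is to wrap each low-level operation in a timer that also counts invocations. The Python sketch below assumes this design and is not FESTIval's C instrumentation. `ExecutionStats` and `simulate_build` are hypothetical names. The "construction" appends keys to in-memory pages and treats opening a new page as a "split", so it is a toy stand-in for a real insertion algorithm.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass

# Maps an operation kind to the counter it increments (hypothetical fields).
_COUNTERS = {"read": "reads", "write": "writes", "split": "splits"}


@dataclass
class ExecutionStats:
    """Hypothetical accumulator mirroring a few Execution columns."""
    total_time: float = 0.0
    read_time: float = 0.0
    write_time: float = 0.0
    split_time: float = 0.0
    reads: int = 0
    writes: int = 0
    splits: int = 0

    @contextmanager
    def timed(self, kind: str):
        """Time the enclosed block and add it to <kind>_time and its counter."""
        counter = _COUNTERS[kind]  # KeyError for unsupported kinds
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            setattr(self, f"{kind}_time", getattr(self, f"{kind}_time") + elapsed)
            setattr(self, counter, getattr(self, counter) + 1)


def simulate_build(keys, page_capacity=2):
    """Toy construction: append keys to pages, opening a new page
    (counted as a 'split') when the current page is full."""
    stats = ExecutionStats()
    pages = [[]]
    start = time.perf_counter()
    for key in keys:
        with stats.timed("read"):
            page = list(pages[-1])  # fetch a copy of the last page
        if len(page) >= page_capacity:
            with stats.timed("split"):
                pages.append([])
                page = []
        page.append(key)
        with stats.timed("write"):
            pages[-1] = page
    stats.total_time = time.perf_counter() - start
    return stats, pages
```

Running `simulate_build(range(5))` performs five reads and five writes and opens two extra pages. The resulting pages are `[[0, 1], [2, 3], [4]]`.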

4.2.2. Query Processing with Spatial Indices

FESTIval provides the following SQL function that processes a spatial query:
FT_QuerySpatialIndex(text name, text path, integer query, geometry obj, integer p). It
returns a set of records in the format (id, geo) that corresponds to the final result of the
query, where id is the primary key value of the spatial object geo. The parameters name
and path specify, respectively, the name and the location of a previously created index
(Section 4.2.1). The parameter query identifies the type of spatial query to be processed:
1 for spatial selection, 2 for spatial range query, and 3 for point query. The parameter obj
gives the search object to be used in the spatial query.
Some restrictions on the geometric format of obj apply. First, if query is 2, only the MBR
of obj is considered. Second, if query is 3, obj must be a point. The parameter p specifies
the topological predicate to be used in the spatial query, which can be intersects, overlap,
disjoint, meet, inside, coveredBy, contains, covers, or equals.
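The restrictions on obj can be summarized as a small normalization step applied before a query is processed. In the Python sketch below, the search object is represented as a list of (x, y) vertices, which is an assumption. The text does not give FESTIval's integer codes for p, so the sketch identifies predicates by name. The function `prepare_search_object` is hypothetical.

```python
from typing import List, Tuple, Union

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

# Query identifiers as described for the parameter query.
QUERY_TYPES = {1: "spatial selection", 2: "spatial range query", 3: "point query"}

# Predicate names as listed in the text (integer codes not modeled here).
PREDICATES = {"intersects", "overlap", "disjoint", "meet", "inside",
              "coveredBy", "contains", "covers", "equals"}


def prepare_search_object(query: int, obj: List[Point],
                          predicate: str) -> Union[List[Point], Rect, Point]:
    """Validate the arguments and reduce obj according to the query type."""
    if query not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query}")
    if predicate not in PREDICATES:
        raise ValueError(f"unknown topological predicate: {predicate}")
    if query == 2:
        # Spatial range query: only the MBR of obj is considered.
        xs = [x for x, _ in obj]
        ys = [y for _, y in obj]
        return (min(xs), min(ys), max(xs), max(ys))
    if query == 3:
        # Point query: obj must be a single point.
        if len(obj) != 1:
            raise ValueError("a point query requires a point as search object")
        return obj[0]
    return obj  # spatial selection: the exact geometry is used
```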
This function also automatically collects statistical data on spatial query processing.
These data are stored in the table Execution (Section 4.1) and include the total time of the
spatial query (total_time), the processing time of the filtering and refinement steps
(filter_time, refin_time), the processing time of read operations (read_time), the employed
predicate (query_pred), and the number of candidates and results of the spatial query
(cand_num, result_num). Due to space limitations, we refer the reader to the FESTIval
documentation for the other collected statistical data.
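How filter_time, refin_time, cand_num, and result_num relate to the two processing steps can be illustrated with a point query. The Python sketch below is a simplified assumption, not FESTIval's implementation. The filtering step is a sequential scan over MBRs rather than an index traversal. The refinement step uses a ray-casting point-in-polygon test, which does not treat points on the boundary specially. The returned dictionary only imitates a few Execution columns.

```python
import time
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def point_in_mbr(p: Point, ring: Ring) -> bool:
    """Filter test: is p inside the MBR of the polygon?"""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)


def point_in_polygon(p: Point, ring: Ring) -> bool:
    """Refinement test by ray casting (boundary points not handled specially)."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_query(p: Point, objects: Dict[int, Ring]) -> Tuple[List[int], dict]:
    """Filter, then refine, recording Execution-like statistics."""
    t0 = time.perf_counter()
    candidates = [oid for oid, ring in objects.items() if point_in_mbr(p, ring)]
    t1 = time.perf_counter()
    result = [oid for oid in candidates if point_in_polygon(p, objects[oid])]
    t2 = time.perf_counter()
    stats = {
        "total_time": t2 - t0,
        "filter_time": t1 - t0,
        "refin_time": t2 - t1,
        "cand_num": len(candidates),
        "result_num": len(result),
    }
    return result, stats


if __name__ == "__main__":
    objects = {
        1: [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)],                      # triangle
        2: [(10.0, 10.0), (12.0, 10.0), (12.0, 12.0), (10.0, 12.0)],  # square
    }
    result, stats = point_query((3.0, 2.5), objects)
    print(result, stats["cand_num"], stats["result_num"])  # [] 1 0
```

The point (3.0, 2.5) lies in the triangle's MBR but not in the triangle itself. It is therefore a false hit: it is counted in cand_num and removed by the refinement step.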

5. Conclusions and Future Work


In this paper, we propose FESTIval, a framework implemented as a PostgreSQL extension
that enables experimental evaluations of different spatial indices with different parameters
in the same environment. As a consequence, it makes experiments easy to reproduce.
Further, FESTIval automatically collects a variety of statistical data on performed
operations, which are stored in relational tables in PostgreSQL and can be accessed using
SQL queries. Future work will extend FESTIval to support other spatial indices, such as
the Hilbert R-tree [Gaede and Günther 1998]. Further, we will implement a buffer pool and
the collection of related statistical data.

Acknowledgments. This work has been supported by the Brazilian federal research
agencies CAPES and CNPq as well as by the São Paulo Research Foundation (FAPESP).
A. C. Carniel has been supported by the grant #2015/26687-8, FAPESP. R. R. Ciferri has
been supported by the grant #311868/2015-0, CNPq. C. D. A. Ciferri has been supported
by the grant #2016/04990-3, FAPESP.

References
Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. (1990). The R*-tree: An efficient and robust access method for points and rectangles. SIGMOD Record, 19(2):322–331.
Beckmann, N. and Seeger, B. (2009). A revised R*-tree in comparison with related index structures. In ACM SIGMOD Int. Conf. on Management of Data, pages 799–812.
Gaede, V. and Günther, O. (1998). Multidimensional access methods. ACM Computing Surveys, 30(2):170–231.
Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. SIGMOD Record, 14(2):47–57.
Schneider, M. and Behr, T. (2006). Topological relationships between complex spatial objects. ACM Trans. on Database Systems, 31(1):39–81.
Sowell, B., Salles, M. V., Cao, T., Demers, A., and Gehrke, J. (2013). An experimental analysis of iterated spatial joins in main memory. Proc. VLDB Endow., 6(14):1882–1893.
