Credits

Alexey Kovyazin, Chief Editor
Helen Borrie, Editor
Dmitri Kouzmenko, Editor
Noel Cosgrave, Sub-editor
Lev Tashchilin, Designer
Natalya Polyanskaya, Blog editor

Contents

Editor's note
Readers feedback: Comments to "Temporary tables" article, by Volker Rehn
Oldest Active: On Silly Questions and Idiotic Outcomes, by Helen E. M. Borrie
Cover story: Locking, Firebird, and the Lock Table, by Ann W. Harrison
Server internals: Inside BLOBs, by Dmitri Kouzmenko and Alexey Kovyazin
TestBed: Testing NO SAVEPOINT in InterBase 7.5.1, by Vlad Horsun and Alexey Kovyazin
Development area: Object-Oriented Development in RDBMS, Part 1, by Vladimir Kotlyarevsky
Replicating and synchronizing InterBase/FireBird databases using CopyCat, by Jonathan Neve
Using IBAnalyst, by Dmitri Kouzmenko
Rock around the blog, by Alexey Kovyazin
Firebird conference, by Helen E. M. Borrie
Miscellaneous

Subscribe now! To receive notifications of future issues, send email to subscribe@ibdeveloper.com
Firebird either denies the new request, or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.

Lock modes

For concurrency and read committed transactions, Firebird locks tables for shared read or shared write. Either mode says, "I'm using this table, but you are free to use it too." Consistency mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say, "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.

The important concept about lock modes is that locks are more subtle than mutexes – locks allow resource sharing, as well as protecting resources from incompatible use.

Two-phase locking vs. transient locking

The table locks that we have been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.
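The table-lock modes and compatibility rules described above can be sketched as a small compatibility matrix. This is our own illustration of the rules stated in the text, not engine code; the mode names follow the article, everything else is invented for the sketch:

```python
# Firebird table-lock modes, as described in the article (a sketch, not engine code).
SR, SW, PR, PW = "shared_read", "shared_write", "protected_read", "protected_write"

# Symmetric compatibility pairs taken from the text: the shared modes let
# everyone else use the table too, protected read excludes writers, and
# protected write is only compatible with shared read.
COMPATIBLE = {
    frozenset([SR, SR]), frozenset([SR, SW]), frozenset([SR, PR]), frozenset([SR, PW]),
    frozenset([SW, SW]),
    frozenset([PR, PR]),
}

def compatible(held: str, requested: str) -> bool:
    """True if a new table lock in `requested` mode can coexist with `held`."""
    return frozenset([held, requested]) in COMPATIBLE
```

For example, `compatible(SW, PW)` is False, which is exactly why a consistency-mode transaction must wait until a concurrency transaction's shared write lock is released.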
Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.

In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page, it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change its contents.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser, and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS and one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Firebird Conference
Registering for the Conference
Call for papers
Sponsoring the Firebird Conference
http://firebird-conference.com/
Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks like the locks on database pages are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks like table locks are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that is normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function called the hash function into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
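The hash-block arrangement described above, with separate collision and duplicate chains, can be sketched like this. The class and method names are ours, invented for illustration; they are not the engine's actual C structures, and Python's built-in `hash` stands in for Firebird's hash function:

```python
class HashBlock:
    """Sketch of a hash block: the original name, a collision chain,
    a duplicate chain, and a reference to the corresponding lock block."""
    def __init__(self, name, lock_block):
        self.name = name
        self.lock_block = lock_block
        self.collision = None   # next block whose name hashed to the same slot
        self.duplicate = None   # next block with exactly the same name

class LockHashTable:
    def __init__(self, slots=101):               # works best if prime
        self.slots = [None] * slots

    def _slot(self, name):
        return hash(name) % len(self.slots)      # stand-in for the real hash function

    def insert(self, name, lock_block):
        block = HashBlock(name, lock_block)
        cur = self.slots[self._slot(name)]
        if cur is None:
            self.slots[self._slot(name)] = block
            return
        while True:
            if cur.name == name:                 # same name: a duplicate, always collides
                while cur.duplicate is not None:
                    cur = cur.duplicate
                cur.duplicate = block
                return
            if cur.collision is None:            # same slot, different name: a collision
                cur.collision = block
                return
            cur = cur.collision

    def find(self, name):
        cur = self.slots[self._slot(name)]
        while cur is not None and cur.name != name:
            cur = cur.collision                  # each collision adds a pointer to follow
        return cur
```

With no collisions, `find` is one hash, one index, one pointer read, which is exactly why the ratio of array width to lock count matters.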
News & Events
PHP Server
One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym "LAMP". One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse – a capable, compact, easy to install and easy to manage solution. PHPServer is a free download.
Read more at:

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function. The symptom of an overloaded hash table is sluggish performance under load, so it can be necessary to increase the number of hash slots if the load is high.

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect: the change will not take effect until all connections to all databases on the server machine shut down. The Classic architecture uses a mutex so that only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

The tool for checking the lock table is fb_lock_print, which is a command line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version:114, Active owner: 0, Length: 262144, Used: 85740
Semmask:0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
…

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, this value indicates a problem:

Mutex wait: 10.3%

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime. Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading # character and set a larger prime value.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print

Version:114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size, likewise, will not take effect until all connections to all databases on the server machine are closed.
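Since the hash function works best when the number of slots is prime, a small helper like the following can pick a candidate value for LockHashSlots. This is our own utility sketch, not part of Firebird; the function names are invented:

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for config-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime(n: int) -> int:
    """Smallest prime >= n, e.g. a candidate value for LockHashSlots."""
    while not is_prime(n):
        n += 1
    return n
```

For example, if you decide you want roughly 500 slots, `next_prime(500)` suggests 503.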
Inside BLOBs
by Dmitri Kouzmenko (kdv@ib-aid.com) and Alexey Kovyazin
This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.
How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be of one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of "BLOB" type.
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to the pages containing the actual BLOB data are stored in the quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – the quasi-record stores references to BLOB pointer pages, which in turn contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB page

The BLOB page consists of the following parts. The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on the page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB. However, this is a theoretical limit (if you want, you can calculate it), and in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. It is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value; it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
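The theoretical calculation invited above can be sketched as follows. This is our own back-of-envelope model, assuming 4-byte page numbers and ignoring page headers and the quasi-record's own overhead, so it slightly overstates the true limit:

```python
def theoretical_blob_limit(page_size: int, pointer_bytes: int = 4) -> int:
    """Rough three-level ceiling: a quasi-record full of pointers to BLOB
    pointer pages, each full of pointers to BLOB data pages.  Headers and
    overheads are ignored, so this overstates the real limit."""
    pointers_per_page = page_size // pointer_bytes
    data_pages = pointers_per_page * pointers_per_page
    return data_pages * page_size

# For a 4096-byte page: 1024 * 1024 data pages of 4096 bytes = 2**32 bytes.
paper_limit = theoretical_blob_limit(4096)
ulong_limit = 2 ** 32      # BLOB length is a ULONG: 4 GB
udf_limit = 2 ** 31        # the internal UDF assumption: 2 GB
```

Notably, for a 4 KB page the paper ceiling already coincides with the 4 GB ULONG bound, and the practical bound with UDFs is lower still.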
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators. For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script. Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test", below. If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure and then copy all of them to TEST2DROP three or four times. After that, perform a backup of this database and you will be in the same position as if you had downloaded the prepared backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2GHz, 512 MB RAM and an 80GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:
quit;
The second script tests performance for the same UPDATE with the NO SAVEPOINT option:

connect "C:\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; // enable NO SAVEPOINT
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "E:\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

You can download all the scripts and the raw results of their execution from this location:
http://www.ibdeveloper.com/issue2/testresults.zip

News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications and demo CDs of applications without license trouble. Read more at: www.janus-software.com

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and the way an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. Read more at: www.ibsurgeon.com/news.html
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

This is an excerpt from the statistics of the one-pass script with NO SAVEPOINT enabled.
In the second UPDATE we start to see the difference. With default transaction settings this
UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVE-
POINT enabled. With default transaction settings we can see that memory usage is signifi-
cant, wherease with NO SAVEPOINT no additional memory is used.
The third and all following UPDATE statements with default settings show equal time and
memory usage values and the growth of writes parameters.
Table 1
Test results for 5 sequental UPDATEs
Figure 2
Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage we observe that time/memory values and writes growth are
all small and virtually equal for each pass. The corresponding graphs are below:
The first UPDATE statement has almost the same execution time with and without the NO A few words about versions
SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO You probably know already that InterBase is a multi-record-version database engine,
SAVEPOINT. meaning that each time a record is changed, a new version of that record is produced. The
old version does not disappear The NO SAVEPOINT “Release Notes” for InterBase 7.5. SP1, “New in InterBase 7.5.1”, page 2-2).
immediately but is retained as a
backversion.
option Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction
inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the
In fact, the first time a backversion is The NO SAVEPOINT option in
OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.
written to disk, it is as a delta ver- InterBase 7.5.1 is a workaround for
sion, which saves disk space and the problem of performance loss Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1. Release Notes.
memory usage by writing out only during bulk updates that do multiple
the differences between the old and passes of a table. The theory is: if
Initial situation
the new versions. The engine can using the implicit savepoint man-
agement causes problems then let's Consider the implementation details of the undo-log. Figure 3 shows the initial situation:
rebuild the full old version from the
new version and chains of delta ver- kill the savepoints. No savepoints –
Recall that we perform this test on freshly-restored database, so it is guaranteed that only one version exists
sions. It is only if the same record is no problem :-)
for any record.
updated more than once within the Besides ISQL, it has been surfaced
same transaction that the full back- as a transaction parameter in both
version is written to disk. DSQL and ESQL. At the API level, a
new transaction parameter block
The UNDO log concept (TPB) option isc_tpb_no_savepoint
You may recall from Dmitri’s article can be passed in the isc_start_trans-
how each transaction is implicitly action() function call to disable
enclosed in a "frame" of savepoints, savepoints management. Syntax
each having its own undo log. This details for the latter flavors and for
log stores a record of each change the new tpb option can be found in
in sequence, ready for the possibili- the 7.5.1 release notes.
ty that a rollback will be requested. The effect of specifying the NO
A backversion materializes when- SAVEPOINT transaction parameter
Figure 3
ever an UPDATE or DELETE state- is that no undo log will be created.
However, along with the perform- Initial situation before any UPDATE - only the one record version exists, Undo log is empty
ment is performed. The engine has
to maintain all these backversions in ance gain for sequential bulk
updates, it brings some costs for The first UPDATE
the undo log for the relevant save-
point. transaction management. The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the
differences between the old and new versions, they are quite small. This operation is fast and it is easy
So, the Undo Log is a mechanism to First and most obvious is that, with work for the memory manager.
manage backversions for save- NO SAVEPOINT enabled, any
points in order to enable the associ- error handling that relies on save- It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the trans-
ated changes to be rolled back. The points is unavailable. Any error action – the engine just records the numbers of all affected records into the bitmap structure. If it needs to
process of Undo logging is quite during a NO SAVEPOINT transac- roll back the changes associated with this savepoint, it can read the stored numbers of the affected records,
complex and maintaining it can tion precludes all subsequent exe- then walk down to the version of each and restore it from the updated version and the backversion stored
consume a lot of resources. cution and leads to rollback (see on disk.
This approach is very fast and economical on memory usage. The engine The engine could write all intermediate versions to disk but there is no reason to do so.
does not waste too many resources to handle this undo log – in fact it reuses These versions are visible only to the modifying transaction and would not be used unless
the existing multi-versioning mechanism. Resource consumption is merely a rollback was required.
the memory used to store the bitmap structure with the backversion num-
bers. We don't see any significant difference here between a transaction
with the default settings and one with the NO SAVEPOINT option enabled.
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the Undo log.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN… END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle?) the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, UPDATE2), store it somewhere on disk, fetch the current full version (by transaction 2, UPDATE1), put it into the in-memory Undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded delta. As you can see, there is much more work to do, both on disk and in memory.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behaviour, and we see what we see now. But that is a theme for another article. ;)

When NO SAVEPOINT is enabled we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.

Why is the delta of memory usage zero? The reason is that, beyond the second UPDATE, no new record version is created. From here on, the update just replaces record data on disk with the newest version and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN… END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

In this case the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
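For comparison with the explicit-savepoint case just described, this is what user savepoints look like in DSQL (available in Firebird 1.5 and InterBase 7.x; the table name is invented for illustration):

```sql
UPDATE TEST_TABLE SET VAL = VAL + 1;  /* UPDATE1: only the implicit
                                         transaction-level savepoint exists */

SAVEPOINT SAVEPOINT1;

UPDATE TEST_TABLE SET VAL = VAL + 1;  /* UPDATE2: backversions are now also
                                         kept in the Undo log of SAVEPOINT1 */

/* either discard everything done after the savepoint ... */
ROLLBACK TO SAVEPOINT SAVEPOINT1;

/* ... or keep it and free the savepoint's Undo log:
   RELEASE SAVEPOINT SAVEPOINT1; */

COMMIT;
```

As figure 7 suggests, every record version created under SAVEPOINT1 carries a backversion in that savepoint's Undo log, which is why memory consumption grows with each UPDATE in its scope.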
Object-Oriented Development in RDBMS, Part 1

by Vladimir Kotlyarevsky
vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I on my own had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography, and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, too, no matter what OOP apologists may say! :) I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last, but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

What is the problem? As you see, almost all these characteristics sound good, except, probably, the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and, as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow; there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time we want to keep all the advantages provided by relational database design and access: speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers), or "hyperkeys". Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it? :)

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class), but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed database and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or the whole database. And what is more, nobody requires run-time pointers to contain any additional information about an object except a memory address.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones: those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number, and even sex :). Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique, anyway.) Remember the maxim "everything that can go bad will go bad" ("consequently, everything that cannot go bad…" hum! But let's not talk about such gloomy things :) ). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys, in some sense, are much closer to that theory than natural ones. Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2], and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type, and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type it is.

Objects are kept in the OBJECTS table, which includes, among other fields:

  …
  Description varchar(128),
  Deleted smallint,
  …

…links, is often a very complicated task, to put it mildly :).

…if you know the ClassId for this element. If you only know the type name, the query becomes a bit more complex:

  select OID, Name, Description
  from OBJECTS
  where ClassId = (select OID from OBJECTS where ClassId = -100 …)

Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as "order number", "comments", "customer", and "order amount", are stored in the table Orders:

  create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2))

which relates one-to-one to the OBJECTS table by the OID field. The "order number" and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to the "Partners" dictionary. You can retrieve all attributes of the "Order" type with the following query:

  select o.OID,
    o.Name as Number,
    o.Description as Comment,
    ord.customer,
    ord.sum_total
  from Objects o, Orders ord
  where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view", and make everything look as it always did. :)

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it with the "Orders" table by a 1:M relation:

  create table order_lines (
    id integer not null primary key,
    object_id TOID, /* reference to order object – 1:M relation */
    item_id TOID, /* reference to ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost))

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present. Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2. (See [5].) All object attributes of any type are stored in a single "attributes" table, connected 1:M with OBJECTS by OID:

  create table attributes (
    OID TOID,
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id));

There is also a table

  create table class_attributes (
    OID TOID, /* here is a link to a description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id))

which describes type metadata – an attribute set (their names and types) for each object type.

All attributes of a particular object whose OID is known are retrieved by the query:

  select attribute_id, value
  from attributes
  where OID = :oid

or, with the names of the attributes:

  select a.attribute_id, ca.attribute_name, a.value
  from attributes a, class_attributes ca, objects o
  where a.OID = :oid and
    a.OID = o.OID and
    o.ClassId = ca.OID and
    a.attribute_id = ca.attribute_id

In the context of this method, you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record) you can get all attributes in a single record by joining or by using subqueries:

  select o.OID,
    o.Name as Number,
    o.Description as Comment,
    a1.value as customer,
    a2.value as sum_total
  from OBJECTS o
    left join attributes a1 on a1.OID = o.OID and …
    left join attributes a2 on a2.OID = o.OID and …

With this approach the number of tables does not increase, no matter how many different types are stored in the database – a significant benefit. Its other advantages are the ease of extending and changing a type, ease in implementing object inheritance, and a very simple structure for the database.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc). The disadvantage is also obvious: there is nothing you can do with such data by means of native database tools like SQL.

Which method to choose? If data processing is complex – searching or grouping on a certain object attribute, for instance, instead of just the one field described in Method 1 – or such processing is likely to be added in the future, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice due to the speed and ease of implementation.

To be continued…
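The "integer field plus a unique-value function" OID implementation mentioned above maps naturally onto an InterBase/Firebird generator. The sketch below is illustrative only: the TOID domain definition and the trigger are our assumptions, not DDL from the article:

```sql
/* int64-capable domain for object identifiers (dialect 3) */
CREATE DOMAIN TOID AS NUMERIC(18,0);

/* one database-wide generator, so OIDs are unique across all classes */
CREATE GENERATOR G_OID;

SET TERM !! ;
/* assign an OID automatically when an object row is inserted */
CREATE TRIGGER OBJECTS_BI FOR OBJECTS
BEFORE INSERT AS
BEGIN
  IF (NEW.OID IS NULL) THEN
    NEW.OID = GEN_ID(G_OID, 1);
END !!
SET TERM ; !!
```

Because every table that stores objects draws its keys from the same generator, uniqueness holds across the whole database – exactly the property argued for in the OID section.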
Replicating and synchronizing Interbase/FireBird databases using CopyCat

by Jonathan Neve

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line, local data editing, with punctual synchronization upon reconnection to the main database; it can also be used over a slow connection, as an alternative to a direct connection to the central database; another use would be to make an automatic, off-site, incremental backup, by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between Interbase and FireBird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database; therefore no software is needed on nodes having no parent – which allows these servers to run Linux, or any other OS supported by Interbase/FireBird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are, in many cases, not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put into this field the name of the node which has the correct version of the record; automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may not have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table, containing one line per product, and a field holding the current stock value. Suppose that for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an …
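To make the data-logging idea of Part 1 concrete, here is a hypothetical sketch of the kind of log table and trigger a replicator could generate. All names except RPL$USERS (the only CopyCat table named in this article) are invented, and this is not CopyCat's actual generated code:

```sql
/* hypothetical change-log table: one line per change and per sub-node */
CREATE TABLE RPL$LOG (
  ID         INTEGER NOT NULL PRIMARY KEY,
  TABLE_NAME VARCHAR(31),
  PK_VALUE   VARCHAR(64),
  NODE_NAME  VARCHAR(31)  /* the sub-node that must fetch this change */
);

CREATE GENERATOR RPL$LOG_GEN;

SET TERM !! ;
/* log each change of CUSTOMER for every sub-node except the current
   user: this is what stops changes bouncing back to their originator */
CREATE TRIGGER CUSTOMER_RPL FOR CUSTOMER
AFTER UPDATE AS
BEGIN
  INSERT INTO RPL$LOG (ID, TABLE_NAME, PK_VALUE, NODE_NAME)
  SELECT GEN_ID(RPL$LOG_GEN, 1), 'CUSTOMER', NEW.ID, U.NODE_NAME
  FROM RPL$USERS U
  WHERE U.NODE_NAME <> USER;
END !!
SET TERM ; !!
```

The WHERE clause reflects the rule described under "Two-way replication": because the replicator logs in under its parent's node name, its own writes are never logged back to the node they came from.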
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. CopyCat : the Delphi / C++Builder component suite

Below is a concise guide for getting started with the CopyCat component suite. Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table!

…

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y', so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.
2. Drop a provider component on the form, and hook it up to the TCcReplicator's DBProvider property.
3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.
4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).
5. Compile and run the example.
6. Press the "Replicate now" button.

2. CopyTiger : the CopyCat Win32 standalone replication tool

Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among industry professionals. With the advent of Microtec CopyCat, the Interbase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications;

2. By providing a standalone tool for the replication of Interbase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux / Kylix version of the components and the standalone tool.
your database. The warnings or stored for unknown time by the We will examine the next 8 rows as if there any other active transactions between oldest active and
comments shown are based on operating system in its file cache. a group, as they all display aspects next transaction, but there can be such transactions. Usually, if
carefully gathered knowledge InterBase 6 creates databases with of the transaction state of the data- the oldest active gets stuck, there are two possible causes: a) that
obtained from a large number of Forced Writes OFF. base: some transaction is active for a long time or b) the application
real-world production databases. design allows transactions to run for a long time. Both causes pre-
Why is this marked in red on the • The Oldest transaction is the vent garbage collection and consume server resources.
Note: All figures in this article con- IBAnalyst report? The answer is sim- oldest non-committed transaction.
tain gstat statistics which were taken ple – using asynchronous writes can Any lower transaction numbers are • Transactions per day – this is calculated from Next transaction,
from a real-world production data- cause database corruption in cases for committed transactions, and no divided by the number of days passed since the creation of the
base (with the permission of its own- of power, OS or server failure. record versions are available for database to the point where the statistics are retrieved. This can be
ers). such transactions. Transaction num- correct only for production databases, or for databases that are
…numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e., not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next transaction started.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics alone cannot tell whether the database is periodically restored from backup, causing transaction numbering to be reset.

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed. As you have already learned, if there are any warnings, they are shown as colored lines, with clear, descriptive hints on how to fix or prevent the problem.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, no warning is displayed: this page size is okay.

Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, changed data is written to disk immediately, while OFF means that writes are cached by the operating system and flushed to disk later.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off¹.

Next on the report is the mysterious "sweep interval". If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a "sudden performance loss" effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here, the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless. Do not gather statistics if you:

• just restored your database
• performed a backup (gbak –b db.gdb) without the –g switch
• recently performed a manual sweep (gfix –sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it is a quiet time in the business day).

1 – InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 – The Oldest transaction is the same as the Oldest interesting transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".
3 – Really it is the oldest transaction that was active when the oldest transaction currently active started, because only the start of a new transaction moves the "Oldest active" forward. In production systems with regular transactions it can be considered the currently oldest active transaction.
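To make the sweep-gap rule concrete, here is a small sketch of the arithmetic involved. This is an illustration only, not IBAnalyst's actual code; the function names and the colour strings are assumptions for the example.

```python
# Sketch of the sweep-gap check described above. Illustrative only:
# the function names and colour strings are assumptions, not IBAnalyst
# internals.

def sweep_gap(oldest: int, oldest_snapshot: int) -> int:
    """Gap between the oldest (interesting) and oldest snapshot transactions."""
    return oldest_snapshot - oldest

def sweep_colour(gap: int, sweep_interval: int) -> str:
    """Reproduce the colour coding: yellow for a negative gap, red when
    the gap exceeds a non-zero sweep interval."""
    if gap < 0:
        return "yellow"   # negative gap (InterBase 6.0 / Firebird / Yaffil)
    if sweep_interval != 0 and gap > sweep_interval:
        return "red"      # automatic sweep is overdue
    return "ok"

print(sweep_colour(sweep_gap(15000, 14900), 20000))  # prints "yellow"
print(sweep_colour(sweep_gap(1000, 26000), 20000))   # prints "red"
```

The counter values here are invented; with real statistics the inputs would come from the gstat header page.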
How to seize the moment when there is something wrong with the database?

Your applications can be designed so well that they will always work with transactions and data correctly: not creating sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues). The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst. The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented, whether the fragmentation was caused by updates/deletes or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications tell us that these really are "bulk deletes", and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications update these tables so frequently is by design, or the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index
name itself tells that it was created manually. If statistics
were taken by the Services API with metadata info, you
can see what columns are indexed (in IBAnalyst 1.83
and greater). For the index under examination you can
see that it has 34999 keys, TotalDup is 34995 and
MaxDup is 25056. Both duplicate columns are marked
in red. This is because there are only 4 unique key val-
ues amongst all the keys in this index, as can be seen
from the Uniques column. Furthermore, the greatest
duplicate chain (key pointing to records with the same
column value) is 25056 – i.e. almost all keys store one
of four unique values. As a result, this index could:
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the "bad" indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions." Here is the list of tables with a version/record ratio greater than 3:
Table        Records   Versions   Rec/Vers size
CLIENTS_PR      3388      10944        92%
DICT_PRICE        30       1992        45%
DOCS               9       2225        64%
N_PART         13835      72594        83%
REGISTR_NC       241       4085        56%
SKL_NC          1640       7736       170%
STAT_QUICK     17649      85062       110%
UO_LOCK          283       8490       144%
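The ratio behind this list can be reproduced directly from the Records and Versions columns above; the threshold of 3 is the one the report text states. A minimal sketch:

```python
# Recompute the version/record ratio for the tables listed above.
# The data is copied from the report; the threshold of 3 comes from
# the report text.
TABLES = {
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "DOCS": (9, 2225),
    "N_PART": (13835, 72594),
    "REGISTR_NC": (241, 4085),
    "SKL_NC": (1640, 7736),
    "STAT_QUICK": (17649, 85062),
    "UO_LOCK": (283, 8490),
}

for name, (records, versions) in TABLES.items():
    ratio = versions / records
    assert ratio > 3, name   # every listed table exceeds the threshold
    print(f"{name}: {ratio:.1f} versions per record")
```

Running this confirms that every table in the list carries more than three versions per record, which is exactly why the report singles them out.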
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback

We received a lot of feedback emails about the article “Working with temporary tables in InterBase 7.5”, which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

'Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird.'

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids, who cannot write any complex SQL statement. They are *the* means of dealing with data that is *structurally* dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then, there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, 'serious' RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their 'RDBMS' does/did not have. Transactions / Foreign Keys / Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while ;-) .

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means 'unusable'). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volker.rehn@bigpond.com