Architecture
Over many years general-purpose database systems have dominated the database
industry. These offer a wide range of functions, applicable to many, if not most
circumstances in modern data processing. These have been enhanced with
extensible datatypes (pioneered in the PostgreSQL project) to allow development of
a very wide range of applications.
There are also other types of databases which cannot be classified as relational
databases. Most notable is the object database management system, which stores
language objects natively without using a separate data definition language and
without translating into a separate storage schema. Unlike relational systems, these
object databases store the relationship between complex data types as part of their
storage model in a way that does not require runtime calculation of related data
using relational algebra execution algorithms.
Database management systems
Components of DBMS
RDBMS components
• Transaction engine - executes transactions, i.e. sequences of read and write
operations on database elements that are grouped together as a single unit of work.
• Relational engine - Relational objects such as Table, Index, and Referential
integrity constraints are implemented in this component.
• Storage engine - This component stores and retrieves data records. It also
provides a mechanism to store metadata and control information such as
undo logs, redo logs, lock tables, etc.
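These components can be seen working together in any relational system. Below is a minimal sketch using Python's built-in sqlite3 module; the table names are illustrative, and the PRAGMA shown is SQLite-specific:

```python
import sqlite3

# In-memory database; the storage engine normally persists pages to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks

# Relational engine: tables, keys, and a referential integrity constraint.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id))""")

# Transaction engine: a group of operations committed atomically.
with conn:  # BEGIN ... COMMIT (or ROLLBACK on error)
    conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
    conn.execute("INSERT INTO orders VALUES (10, 1)")

# A row violating the foreign-key constraint is rejected by the engine.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

SQLite collapses all three components into one library, but the division of labor (transactions, relational objects, storage) is the same one larger servers implement as separate subsystems.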
ODBMS components
Types
Operational database
These databases store detailed data needed to support the operations of an entire
organization. They are also called subject-area databases (SADB), transaction
databases, and production databases. For example:
• customer databases
• personal databases
• inventory databases
• accounting databases
Analytical database
These databases store data and information extracted from selected operational and
external databases. They consist of summarized data and information most needed
by an organization's management and other end-users. Analytical databases are also
referred to as multidimensional databases, management databases, or information
databases.
Data warehouse
A data warehouse stores data from current and previous years — data extracted
from the various operational databases of an organization. It becomes the central
source of data that has been screened, edited, standardized and integrated so that it
can be used by managers and other end-user professionals throughout an
organization. Data warehouses are characterized by being slow to insert into but fast
to retrieve from. Recent developments in data warehousing have led to the use of a
shared-nothing architecture to facilitate extreme scaling.
Distributed database
End-user database
External database
Navigational database
In-memory databases
In-memory databases primarily rely on main memory for computer data storage.
This contrasts with database management systems that employ a disk-based
storage mechanism. Main memory databases are typically faster than disk-optimized
databases because their internal optimization algorithms are simpler and execute
fewer CPU instructions; accessing data in memory also provides faster and more
predictable performance than accessing disk. Main memory databases are often used
in applications where response time is critical, such as telecommunications network
equipment that operates emergency systems.
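The idea can be sketched with Python's sqlite3 module, whose ":memory:" target keeps the entire database in RAM; the table and payload names below are illustrative:

```python
import sqlite3

# ":memory:" creates a database that lives entirely in RAM:
# no file is written, and the data vanishes when the connection closes.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
mem.executemany("INSERT INTO events (payload) VALUES (?)",
                [("call-setup",), ("call-teardown",)])
print(mem.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```

Dedicated in-memory systems go much further (lock-free indexes, no buffer manager), but the defining property is the same: primary storage is main memory, with durability handled separately if at all.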
Document-oriented databases
Real-time databases
Relational Database
The standard of business computing as of 2009, relational databases are the most
commonly used databases today. They use tables to structure information so that it
can be readily and easily searched.
Models
Products offering a more general data model than the relational model are
sometimes classified as post-relational. The data model in such products
incorporates relations but is not constrained by the Information Principle, which
requires that all information be represented by data values in relations.
Some of these extensions to the relational model actually integrate concepts from
technologies that pre-date the relational model. For example, they allow
representation of a directed graph with trees on the nodes.
In recent years, the object-oriented paradigm has been applied to database
technology, creating various kinds of new programming models known as object
databases. These databases attempt to bring the database world and the
application-programming world closer together, in particular by ensuring that the
database uses the same type system as the application program. This aims to avoid
the overhead (sometimes referred to as the impedance mismatch) of converting
information between its representation in the database (for example as rows in
tables) and its representation in the application program (typically as objects). At the
same time, object databases attempt to introduce key ideas of object programming,
such as encapsulation and polymorphism, into the world of databases.
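The impedance mismatch described above can be made concrete: with a relational store, an application object must be flattened into a row on the way in and rebuilt on the way out. A minimal sketch follows; the Customer class and the save/load mapping functions are hypothetical, written here only to show the two conversions an object database aims to eliminate:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:          # application-side representation: an object
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

def save(c: Customer) -> None:
    # object -> row: the conversion an object database avoids
    conn.execute("INSERT INTO customer VALUES (?, ?)", (c.id, c.name))

def load(cid: int) -> Customer:
    # row -> object: the reverse conversion
    row = conn.execute(
        "SELECT id, name FROM customer WHERE id = ?", (cid,)).fetchone()
    return Customer(*row)

save(Customer(1, "Ada"))
print(load(1))  # Customer(id=1, name='Ada')
```

Object-relational mappers automate exactly this boilerplate; object databases instead remove it by making the database's type system the program's type system.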
A variety of ways have been tried for storing objects in a database.
Some products have approached the problem from the application-programming
side, by making the objects manipulated by the program persistent. This also
typically requires the addition of some kind of query language, since conventional
programming languages do not provide language-level functionality for finding
objects based on their information content. Other products have attacked the problem
from the database end, by defining an object-oriented data model for the database,
and defining a database programming language that allows full programming
capabilities as well as traditional query facilities.
Storage structures
Relational tables and their indexes may be stored on disk in a number of forms, including:
• ordered/unordered flat files
• ISAM
• heaps
• hash buckets
• B+ trees
These structures have various advantages and disadvantages, discussed further in the
articles on each topic. The most commonly used are B+ trees and ISAM.
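To give a flavour of one of these structures, a hash-bucket layout maps each key through a hash function to one of a fixed number of buckets, each holding the records whose keys hash there. The toy sketch below shows only the lookup principle, not a production on-disk layout:

```python
NUM_BUCKETS = 8

def bucket_of(key: str) -> int:
    # Hash the key into one of NUM_BUCKETS buckets.
    return hash(key) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key: str, record: dict) -> None:
    buckets[bucket_of(key)].append((key, record))

def lookup(key: str):
    # Only one bucket is scanned, regardless of total record count.
    for k, rec in buckets[bucket_of(key)]:
        if k == key:
            return rec
    return None

insert("alice", {"balance": 10})
insert("bob", {"balance": 20})
print(lookup("bob"))  # {'balance': 20}
```

The trade-off against a B+ tree is visible even here: lookups by exact key are cheap, but range queries ("all keys between A and C") would have to scan every bucket, because hashing destroys key order.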
Object databases use a range of storage mechanisms. Some use virtual memory-
mapped files to make the native language (C++, Java etc.) objects persistent. This
can be highly efficient but it can make multi-language access more difficult. Others
break the objects down into fixed- and varying-length components that are then
clustered tightly together in fixed sized blocks on disk and reassembled into the
appropriate format either for the client or in the client address space. Another
popular technique involves storing the objects in tuples (much like a relational
database) which the database server then reassembles for the client.
Other important design choices relate to the clustering of data by category (such as
grouping data by month, or location), creating pre-computed views known as
materialized views, and partitioning data by range or hash. Memory management and
storage topology can be important design choices for database designers as well.
Just as normalization is used to reduce storage requirements and improve the
extensibility of the database, conversely denormalization is often used to reduce join
complexity and reduce execution time for queries.[1]
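The normalization/denormalization trade-off can be illustrated directly: the same question can be answered by a join over normalized tables or by a single scan of a denormalized table that repeats the customer name in every order row. The schema and data below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: the customer name is stored once, reached via a join.
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Alice');
INSERT INTO orders VALUES (10, 1, 9.99), (11, 1, 5.00);

-- Denormalized: the name is repeated per order; no join, more storage,
-- and updates to the name must now touch every matching row.
CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, customer_name TEXT, total REAL);
INSERT INTO orders_denorm VALUES (10, 'Alice', 9.99), (11, 'Alice', 5.00);
""")

joined = conn.execute("""SELECT c.name, o.total
                         FROM orders o JOIN customer c ON c.id = o.customer_id""").fetchall()
flat = conn.execute("SELECT customer_name, total FROM orders_denorm").fetchall()
print(sorted(joined) == sorted(flat))  # True: same answer, different layout
```

The denormalized query avoids the join's execution cost at the price of redundant storage and a harder update path, which is exactly the trade the text describes.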
Indexing
All of these databases can take advantage of indexing to increase their speed. This
technology has advanced tremendously since its early uses in the 1960s and 1970s.
The most common kind of index uses a sorted list of the contents of some
particular table column, with pointers to the row associated with the value. An index
allows a set of table rows matching some criterion to be quickly located. Typically,
indexes are also stored in the various forms of data-structure mentioned above (such
as B-trees, hashes, and linked lists). Usually, a database designer selects specific
techniques to increase efficiency in the particular case of the type of index required.
Most relational DBMSs and some object DBMSs have the advantage that indexes can
be created or dropped without changing existing applications that make use of them.
The database chooses among many different strategies based on which one it
estimates will run the fastest. In other words, indexes are transparent to the
application or end-user querying the database; while they affect performance, any
SQL command will run with or without indexes to compute its result. The RDBMS
produces a query plan describing how to execute the query, often generated by
analyzing the estimated run times of the different algorithms and selecting the
quickest. Some of the key algorithms that deal with joins are the nested loop join,
sort-merge join, and hash join. Which of these an RDBMS selects may depend on
whether an index exists, what type it is, and its cardinality.
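This transparency can be observed directly: the query text stays the same, but the plan the engine reports changes once an index exists. A sketch using SQLite's EXPLAIN QUERY PLAN (other systems expose their plans through different commands, and the exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT)")
conn.executemany("INSERT INTO person (surname) VALUES (?)",
                 [(s,) for s in ("Codd", "Date", "Gray", "Stonebraker")])

query = "SELECT id FROM person WHERE surname = 'Gray'"

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN rows carry the plan description in their last column.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)   # typically reports a full scan of the table
conn.execute("CREATE INDEX idx_surname ON person (surname)")
after = plan(query)    # now reports a search using idx_surname

print(before)
print(after)
```

Nothing in the application changed between the two calls; only the optimizer's cost estimate did, which is why indexes can be added or dropped without touching application code.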
An index speeds up access to data, but it has disadvantages as well. First, every
index increases the amount of storage used on the hard drive beyond what is needed
for the database file itself; second, the index must be updated each time the data
are altered, and this costs time. (Thus an index saves time when reading data but
costs time when entering and altering data; whether an index is a net plus or minus
for efficiency therefore depends on how the data are used.)
Transactions and concurrency
• Atomicity: Either all the tasks in a transaction must happen, or none of them.
The transaction must be completed, or else it must be undone (rolled back).
• Consistency: Every transaction must preserve the integrity constraints — the
declared consistency rules — of the database. It cannot leave the data in a
contradictory state.
• Isolation: Two simultaneous transactions cannot interfere with one another.
Intermediate results within a transaction must remain invisible to other
transactions.
• Durability: Completed transactions cannot be aborted later or their results
discarded. They must persist through (for instance) restarts of the DBMS after
crashes.
In practice, many DBMSs allow the selective relaxation of most of these rules — for
better performance.
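Atomicity in particular is easy to demonstrate: if any statement in a transaction fails, the whole group is rolled back. A sketch with Python's sqlite3 module, whose connection context manager commits on success and rolls back on error (the account table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    with conn:  # one transaction: both statements succeed or neither does
        conn.execute("UPDATE account SET balance = balance - 30 "
                     "WHERE name = 'alice'")
        # Duplicate primary key -> IntegrityError -> automatic ROLLBACK
        conn.execute("INSERT INTO account VALUES ('alice', 0)")
except sqlite3.IntegrityError:
    pass

# The debit of alice was undone along with the failed insert.
print(conn.execute("SELECT balance FROM account "
                   "WHERE name = 'alice'").fetchone()[0])  # 100
```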
Concurrency control ensures that transactions execute in a safe manner and follow
the ACID rules. The DBMS must be able to ensure that only serializable, recoverable
schedules are allowed, and that no actions of committed transactions are lost while
undoing aborted transactions.
Replication
Parallel synchronous replication of databases enables the replication of transactions
on multiple servers simultaneously, which provides a method for backup and security
as well as data availability. This is commonly referred to as "database clustering".[2][3]
Security
Database security denotes the system, processes, and procedures that protect a
database from unintended activity. Enforcing security is one of the major tasks of the
DBA.
DBMSs usually enforce security through access control, auditing, and encryption:
• Access control restricts who can connect to the database and what they can
do once connected.
• Auditing logs what action or change has been performed, when and by whom.
• Encryption: many commercial databases include built-in encryption
mechanisms to encode data natively into tables and to decipher information
"on the fly" when a query comes in. DBAs can also secure and encrypt
connections where required, for example using SSL/TLS; older deployments
relied on mechanisms such as DSA or MD5 that are now considered legacy.
In the United Kingdom, legislation protecting the public from unauthorized disclosure
of personal information held in databases falls under the Office of the Information
Commissioner. Organizations based in the United Kingdom that hold personal data
in electronic format (databases, for example) must register with the Information
Commissioner.[4]
Locking
In basic filesystems, only one lock at a time can be set on a file or folder, restricting
its usage to one process. Databases, on the other hand, can set and hold multiple
locks at the same time on different levels of the physical data structure. The
database engine's locking scheme determines how locks are set and maintained
based on the SQL statements or transactions submitted by users. Generally speaking,
almost any activity on the database involves some degree of locking.
As of 2009 most DBMSs use shared and exclusive locks. An exclusive lock means
that no other transaction can acquire a lock on the data object as long as the
exclusive lock lasts. DBMSs usually set exclusive locks when the database needs to
change data, as during an UPDATE or DELETE operation.
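The effect of an exclusive lock can be observed with two connections to the same database file: while the first holds the write lock, the second's attempt to write fails. The sketch below uses SQLite, whose BEGIN EXCLUSIVE takes the lock up front and which reports the conflict as "database is locked"; the file path is a temporary file created just for the demonstration:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

# Writer takes an exclusive lock and holds it (transaction left open).
writer.execute("BEGIN EXCLUSIVE")
writer.execute("INSERT INTO t VALUES (1)")

other = sqlite3.connect(path, timeout=0)  # fail fast instead of waiting
try:
    other.execute("INSERT INTO t VALUES (2)")
    print("unexpected: write succeeded")
except sqlite3.OperationalError as e:
    print("blocked:", e)   # typically "database is locked"

writer.commit()            # releases the exclusive lock
other.execute("INSERT INTO t VALUES (2)")
other.commit()
```

In a real server the second connection would normally wait (as a nonzero timeout would here) rather than fail, which is why long-held exclusive locks translate directly into the performance problems the text describes.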
Shared locks, by contrast, can be held simultaneously by several transactions on the
same data structure.[5] Shared locks are usually used while the database is reading
data, as during a SELECT operation. The number and nature of the locks, and the
time a lock is held on a data block, can have a huge impact on database
performance. Bad locking can lead to disastrous response times, usually the result
of poor SQL requests or an inadequate physical database structure.
The isolation level of the data server enforces the default locking behavior; changing
the isolation level affects how shared or exclusive locks must be set on the data for
the entire database system. The default isolation level generally prevents data from
being read while it is being modified, forbidding the return of "ghost data" to end
users.
Databases can also be locked for other reasons, such as access restrictions for given
levels of user. Some DBAs also lock databases for routine maintenance, which
prevents changes being made during the maintenance (see, for example, IBM's
documentation on locking tables and databases). However, many modern databases,
such as PostgreSQL, do not lock the database during routine maintenance.
Applications