Instance Architecture
Lesson 6
Agenda
Instance Architecture
Creating a Database
Database
▪ Location
▪ Files
▪ Limits
Instance Architecture
Vector Architecture
[Figure: Vector architecture. Labels, top to bottom: Client application; INGRES: SQL parser (parsed tree), Optimizer (query plan), Cross compiler (X100 algebra); X100: Rewriter (annotated query tree), Builder (operator tree), Execution engine, Buffer manager (data request / data); I/O; File System.]
Vector-H Architecture
[Figure: Vector-H architecture. Labels: Client application; INGRES: SQL parser (parsed tree), Optimizer (query plan), X100 algebra; an X100 engine with a Builder on each node; inter-node communication over the Vector network using MPI; HDFS (namenode and datanodes).]
The Vector-H instance architecture. The only Hadoop interfaces required are
ZooKeeper, HDFS and YARN.
Regular client applications talk to the regular Vector instance on the master node.
Direct API access to the X100 engine on the master node is also possible.
The X100 server (engine) on the master node has a modified rewriter that is aware of
the slave nodes (workers). It decides which parts of the query will be executed on
which node and annotates the query plan with this information. The annotated query
plan is distributed to all worker nodes. The builders on the worker nodes make sure
the operator tree contains only the relevant parts of the plan and add local
parallelism. The builder on the master node might include some execution operators
and operators to collect and possibly aggregate the final results. The execution
engines on the workers execute their local plans, working with as much local data as
possible. They can communicate with other worker nodes if needed. The partial
results are sent back to the master for aggregation, and the final result is passed on
to the client.
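The flow described above (workers run a local plan, the master merges partial results) can be illustrated with a toy sketch. This is not X100 code: the data, the function names `local_plan`, `master_aggregate` and `run_query`, and the use of threads to stand in for worker nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each worker node holds a local slice of (region, amount) rows.
WORKER_DATA = {
    "worker1": [("EU", 10), ("US", 5)],
    "worker2": [("EU", 7), ("ASIA", 3)],
    "worker3": [("US", 8)],
}


def local_plan(rows):
    """Worker side: aggregate only the rows stored locally."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def master_aggregate(partials):
    """Master side: merge the partial results into the final result."""
    final = {}
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] = final.get(region, 0) + subtotal
    return final


def run_query(worker_data):
    # Threads stand in for worker nodes in this toy model.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_plan, worker_data.values()))
    return master_aggregate(partials)


if __name__ == "__main__":
    print(run_query(WORKER_DATA))
```

Each worker touches only its own rows, so the master receives one small dictionary per worker instead of the raw data.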
Remember: it is strongly recommended that Vector have its own 10 Gbit network
interface.
[Figure: instance processes. Labels: User Interface (SQL, ABF, OpenROAD, JAVA, etc.); Name Server (iigcn); Communications Server (iigcc); DBMS Server (iidbms); DAS Server (iigcd); x100 Server; Vector Memory; VW LOG; VW LOCK; authpass; Data Store; Databases.]
The X100 process starts when a client first connects to the database and stops when
ingstop is run.
The LOG file shown here was replaced by the ‘wal’ file in Vector 5.0; it is still a log
file.
II_SYSTEM
$II_SYSTEM/ingres
RDBMS/Vector executables
Library files
Utilities
$II_SYSTEM/ingres/version.rel
Product ID
Version
Patch number
Creating a Database
Partial syntax
createdb [-uusername] dbname [-p]
[-dlocation] [-clocation] [-jlocation]
[-blocation] [-wlocation] [-vlocation]
Defaults
– Public database
– Database defaults are the same as installation defaults
Example
– Create a public database called db1
createdb db1
-d Creates the database with location as its default data location (note that this
does not affect Vector data files).
-c Creates the database with location as its default checkpoint location (note that
this does not affect Vector checkpoint files).
-j Creates the database with location as its default journal location.
-b Creates the database with location as its default dump location.
-w Creates the database with location as its default work location (note that this is
not used by Vector work operations).
-v Specifies the location of the Vector database files.
It is recommended that you size your database area appropriately and use the default
data location.
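As a sketch of how the partial syntax above fits together, the helper below assembles a createdb argument list. The function name `build_createdb_args` and its keyword names are hypothetical. The slide lists -p without describing it; it is assumed here to mark the database private, since public is the stated default.

```python
# Flag letters taken from the partial syntax on the slide.
LOCATION_FLAGS = {
    "data": "-d",
    "checkpoint": "-c",
    "journal": "-j",
    "dump": "-b",
    "work": "-w",
    "vector": "-v",
}


def build_createdb_args(dbname, username=None, private=False, **locations):
    """Return an argv-style list for createdb (hypothetical helper)."""
    unknown = set(locations) - set(LOCATION_FLAGS)
    if unknown:
        raise ValueError("unknown location kinds: " + ", ".join(sorted(unknown)))
    args = ["createdb"]
    if username:
        args.append("-u" + username)  # flag and value are written together
    args.append(dbname)
    if private:
        args.append("-p")  # assumption: -p means a private database
    for kind, flag in LOCATION_FLAGS.items():
        if kind in locations:
            args.append(flag + locations[kind])
    return args


if __name__ == "__main__":
    # The slide's example: a public database called db1 with all defaults.
    print(" ".join(build_createdb_args("db1")))
```

Building a list rather than a single string keeps it suitable for `subprocess.run` without shell quoting.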
infodb database_name
Vector
Vector-H
----Extent directory------------------------------------------------------
Location Flags Physical_path
------------------------------------------------------------------
ii_database ROOT,DATA /opt/Actian/VectorVH/ingres/data/default/otp
ii_journal JOURNAL /opt/Actian/VectorVH/ingres/jnl/default/otp
ii_hdfsbackup CHECKPOINT,HDFS hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/ckp/default/otp
ii_dump DUMP /opt/Actian/VectorVH/ingres/dmp/default/otp
ii_work WORK /opt/Actian/VectorVH/ingres/work/default/otp
ii_hdfsdata DATA,VWROOT,HDFS hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/data/default/otp
ii_hdfswork WORK,HDFS hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/work/default/otp
================================================================================
----Vectorwise directory--------------------------------------------------------
Location Flags Physical_path
------------------------------------------------------
ii_database ROOT,DATA /opt/Actian/VectorVH/ingres/data/vectorwise/otp
ii_journal JOURNAL /opt/Actian/VectorVH/ingres/jnl/vectorwise/otp
ii_hdfsbackup CHECKPOINT,HDFS hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/ckp/vectorwise/otp
ii_work WORK /opt/Actian/VectorVH/ingres/work/vectorwise/otp
ii_hdfsdata DATA,VWROOT,HDFS hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/data/vectorwise/otp
ii_hdfswork WORK,HDFS hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/work/vectorwise/otp
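The extent listing above has three whitespace-separated fields per row, which makes it easy to post-process. The sketch below parses one row. The function name `parse_extent_line` is hypothetical, and the layout is assumed to match the rows shown here; real infodb output may differ in spacing or contain other sections.

```python
def parse_extent_line(line):
    """Split one extent row into location name, flag list and physical path."""
    parts = line.split(None, 2)  # split on whitespace, at most three fields
    if len(parts) != 3:
        raise ValueError("not an extent row: %r" % line)
    name, flags, path = parts
    path = path.strip()
    return {
        "location": name,
        "flags": flags.split(","),
        "path": path,
        "on_hdfs": path.startswith("hdfs://"),
    }


if __name__ == "__main__":
    sample = [
        "ii_database ROOT,DATA /opt/Actian/VectorVH/ingres/data/default/otp",
        "ii_hdfsdata DATA,VWROOT,HDFS hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/data/default/otp",
    ]
    for row in sample:
        info = parse_extent_line(row)
        print(info["location"], info["flags"], info["on_hdfs"])
```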
II_DATABASE/ingres/data/vectorwise/dbname/wal/
▪ (Vector) Write Ahead Log
II_DATABASE/ingres/data/vectorwise/dbname/CBM/lock
▪ The Vector database lock file
II_DATABASE/ingres/data/vectorwise/dbname/CBM/default/ (and other
locations that database has been extended to)
▪ The database column table files
• Remember that Vector is a column store, so there are per-column files, not just per-table files.
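The directory layout listed above can be written down as a small path builder. This is a minimal sketch: the function name `vector_db_paths` and the dictionary keys are hypothetical, and the `location` argument assumes that other locations appear as sibling directories of `default` under CBM.

```python
import posixpath


def vector_db_paths(ii_database, dbname, location="default"):
    """Build the Vector file paths for one database from the layout on the slide."""
    root = posixpath.join(ii_database, "ingres", "data", "vectorwise", dbname)
    return {
        "wal_dir": posixpath.join(root, "wal"),            # write-ahead log
        "lock_file": posixpath.join(root, "CBM", "lock"),  # database lock file
        "column_files_dir": posixpath.join(root, "CBM", location),
    }


if __name__ == "__main__":
    for key, path in vector_db_paths("/opt/Actian/VectorVH", "otp").items():
        print(key, path)
```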
Data area
Ingres data
Vector data
Vector Database
WAL file
Ingres Checkpoint
Vector Checkpoint
Ingres Journal
Vector Journal
Work/Sort area
Ingres work
Vector work
The checkpoint area contains our database column and table backup files
_salesdbaScust__cust_id_00000191
▪ _salesdba (table owner)
▪ S (delimiter)
▪ cust__ (table name)
▪ cust_id (column name)
▪ _0000… (unique table identifier)
“vectorwise/dbname” Directory
“authpass” File
x100 client-to-server connection authorization file
▪ One per database
▪ Created each time the x100 server starts
Prevents
▪ Non-installation owner using x100 client
▪ Remote connection to the x100 server
“CBM” Directory
Stores:
▪ Catalog
▪ DDL changes
▪ Min-Max indices
▪ Secondary Indexes
▪ In-memory updates (aka Positional Delta Tree / PDT)
“wal/main.wal” File
This file is read at x100 server startup
▪ To recreate the Vector system catalog, secondary indexes, etc. in memory
▪ On Vector-H, this is stored in HDFS and read by ALL Data Nodes
▪ As this file gets bigger, the x100 server takes longer to start
“../wal/main_wal_backups” Directory
Stores copies of the old main.wal files
▪ After LOG condensation
Log condensation updates the catalog information and removes DDL changes and in-memory updates
The main.wal file grows as DDL and DML queries make changes to the database.
When the file grows significantly, the system tries to shrink its size by creating a new,
smaller version of it.
For recovery, previous versions of the file are saved in the main_wal_backups
directory of the default data location. Each file in the main_wal_backups directory is
named using its creation timestamp.
When the total size of the files in the directory exceeds the value of the
configuration parameter max_old_log_size, the oldest files in the directory
are automatically deleted.
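The retention rule for main_wal_backups can be sketched as a simple oldest-first pruning loop. This only illustrates the policy described above and is not the server's implementation: the function name `prune_wal_backups`, the timestamp format and the sizes are all hypothetical.

```python
def prune_wal_backups(backups, max_old_log_size):
    """Drop the oldest backups until the total size fits under the limit.

    backups: list of (timestamp_name, size) pairs, where timestamp names are
    assumed to sort chronologically as strings.
    Returns (kept, deleted_names).
    """
    kept = sorted(backups)  # oldest first, because names are timestamps
    deleted = []
    total = sum(size for _, size in kept)
    while kept and total > max_old_log_size:
        name, size = kept.pop(0)  # remove the oldest remaining file
        deleted.append(name)
        total -= size
    return kept, deleted


if __name__ == "__main__":
    files = [("20240103", 40), ("20240101", 50), ("20240102", 30)]
    kept, deleted = prune_wal_backups(files, max_old_log_size=80)
    print("kept:", kept)
    print("deleted:", deleted)
```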
“CBM/lock” File
One per database
This file is the database lock file
▪ Prevents starting two instances on the same data
▪ Ensures only one x100 process per database per node
CBM/default/data Files
Vector
_actianSairline__airline_code_00000116
▪ actian (schema owner)
▪ airline (table name)
▪ airline_code (column name)
▪ 00000116 (unique column number, decimal)
_actianSairline@1c4__airline_code_000000000000001a
Partition #1 of 4
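The naming scheme of the column files can be decoded with a regular expression. The sketch below is inferred only from the two example names on this slide: the pattern, the function name `parse_column_file`, and the reading of `@1c4` as "partition 1 of 4" are assumptions, and the pattern fails for owners whose name contains an uppercase S.

```python
import re

# _<owner>S<table>[@<partition>c<partitions>]__<column>_<id>
COLUMN_FILE_RE = re.compile(
    r"^_(?P<owner>[^S]+)S"                           # owner, then the S delimiter
    r"(?P<table>.+?)"                                # table name
    r"(?:@(?P<partition>\d+)c(?P<partitions>\d+))?"  # optional partition suffix
    r"__(?P<column>.+)_(?P<id>[0-9a-fA-F]+)$"        # column name and trailing id
)


def parse_column_file(name):
    """Split a column file name into its parts (hypothetical helper)."""
    match = COLUMN_FILE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized column file name: %r" % name)
    info = match.groupdict()
    for key in ("partition", "partitions"):
        if info[key] is not None:
            info[key] = int(info[key])
    return info


if __name__ == "__main__":
    print(parse_column_file("_actianSairline__airline_code_00000116"))
    print(parse_column_file("_actianSairline@1c4__airline_code_000000000000001a"))
```

The id is kept as a string because the slide labels one id as decimal while the partitioned example contains the hexadecimal digit `a`.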
Database Limits
Maximum database file size
▪ Unlimited
Maximum tables in a database
▪ 1.1 billion
Maximum rows per table
▪ 140 trillion (2^47)
Maximum row width per table
▪ 256 KB
Maximum columns per table
▪ 2048
Maximum number of tables in a join
▪ 382
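The limits above can be captured in a lookup table for design-time checks. This is a minimal sketch: the names `LIMITS` and `check_table_design` are hypothetical, 1.1 billion is stored as the rounded figure from the slide, and 256 KB is assumed to mean 256 × 1024 bytes.

```python
LIMITS = {
    "tables_per_database": 1_100_000_000,  # "1.1 billion" (rounded on the slide)
    "rows_per_table": 2 ** 47,             # about 140 trillion
    "row_width_bytes": 256 * 1024,         # assumption: KB means 1024 bytes
    "columns_per_table": 2048,
    "tables_per_join": 382,
}


def check_table_design(columns, row_width_bytes):
    """Return a list of limit violations for a proposed table (empty if none)."""
    problems = []
    if columns > LIMITS["columns_per_table"]:
        problems.append("too many columns: %d" % columns)
    if row_width_bytes > LIMITS["row_width_bytes"]:
        problems.append("row too wide: %d bytes" % row_width_bytes)
    return problems


if __name__ == "__main__":
    print("{:,}".format(LIMITS["rows_per_table"]))  # the exact value of 2^47
    print(check_table_design(columns=3000, row_width_bytes=1000))
```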
Lesson Summary
Instance Architecture
Creating a Database
Database
▪ Location
▪ Files
▪ Limits