Instance Architecture: Lesson 6

1 Confidential © 2016 Actian Corporation
Vector DBM 6-1 © Actian Corporation

Agenda
Creating a Database
Database
▪ Location
▪ Files
▪ Limits


Vector Instance Architecture

Vector Architecture
(Diagram: the path of a query through the INGRES and X100 layers.)
Client application: sends the SQL query and receives the result
INGRES
▪ SQL parser -> parsed tree
▪ Optimizer -> query plan
▪ Cross compiler -> X100 algebra
X100
▪ Rewriter -> annotated query tree
▪ Builder -> operator tree
▪ Execution engine -> data request to, and data from, the Buffer manager
▪ Buffer manager -> I/O -> File System
X100 Algebra is written in “MIL”
The rewriter looks at parallelization of the query (X100 algebra)

Vector-H Architecture
(Diagram: the Vector architecture distributed across Hadoop nodes.)
Master node (datanode)
▪ Client application: sends the SQL query and receives the result
▪ INGRES: SQL parser -> parsed tree -> Optimizer -> query plan -> Cross compiler -> X100 algebra
▪ X100: Distributed rewriter -> annotated query tree -> Builder -> operator tree -> Execution engine -> Buffer manager -> I/O -> HDFS
Worker node [1..n] (datanodes)
▪ X100: Rewriter -> annotated query tree -> Builder -> partial operator tree -> Execution engine -> Buffer manager -> I/O -> HDFS
Between nodes
▪ The annotated tree goes from the master to the workers; partial result sets come back
▪ Inter-node communication: MPI over the Vector Network
▪ Storage: HDFS namenode and HDFS datanodes
The Vector-H instance architecture. The only Hadoop interfaces Vector-H requires are ZooKeeper, HDFS and YARN.
Regular client applications talk to the regular Vector instance on the master node. Direct (API) access to the X100 engine on the master node is also possible.
The X100 server (engine) on the master node has a modified rewriter that is aware of the worker (slave) nodes. It decides which parts of the query are executed on which node and annotates the query plan with this information. The annotated query plan is distributed to all worker nodes. The builder on each worker node makes sure that its operator tree contains only the relevant parts of the plan, and adds local parallelism. The builder on the master node might include some execution operators, plus operators to collect and possibly aggregate the final results. The execution engines on the workers execute their local plans, working with as much local data as possible, and can communicate with other worker nodes if needed. The partial results are sent back to the master for aggregation, and the final result is passed on to the client.
Inter-node communication uses the Intel MPI (Message Passing Interface) library.
Remember: it is strongly recommended that Vector has its own 10 Gbit network interface.
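The flow described above (workers compute on local data, the master merges the partial results) can be pictured with a small conceptual sketch. This is only an illustration of the scatter/gather idea, not Vector-H code; the function names and the sample rows are invented for the example.

```python
from collections import Counter

def worker_partial_count(rows):
    """Runs on one worker: count rows per key using only its local data."""
    return Counter(key for key, _ in rows)

def master_aggregate(partials):
    """Runs on the master: merge the partial result sets into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds the counts key by key
    return dict(total)

# Hypothetical local data held by three worker nodes: (airline_code, flight_id).
worker_data = [
    [("LH", 1), ("BA", 2)],
    [("LH", 3)],
    [("BA", 4), ("AF", 5), ("LH", 6)],
]

partials = [worker_partial_count(rows) for rows in worker_data]
final_result = master_aggregate(partials)
print(final_result)
```

Each worker only ever touches its own rows; the master sees nothing but the small partial counts, which is the reason for pushing work to where the data lives.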

Vector Instance Architecture
(Diagram: processes, memory and storage in a Vector instance.)
Clients
▪ User Interface (SQL, ABF, OpenROAD, JAVA, etc.)
Servers
▪ Name Server (iigcn)
▪ Communications Server (iigcc)
▪ DAS Server (iigcd)
▪ DBMS Server (iidbms)
▪ Recovery Server (iircp)
▪ Archiver (iiacp)
▪ x100 Server (VW)
Memory
▪ Locks, Log Buffers
▪ Vector Memory
Storage
▪ Databases
▪ Journals
▪ Ingres Transaction Log File
▪ Vector Data Store, with the LOG, authpass and VW LOCK files
The X100 process starts when a client connects to the database for the first time, and stops when ingstop is run.
The LOG file shown here was replaced by the ‘wal’ file in Vector 5.0; it is still a ‘log’ file.

LOG file – now main.wal

(Diagram: the x100 Server (VW) with Vector Memory and the Vector Data Store, where the WAL takes the place of the LOG file.)
The ‘log’ file is found under…..
There is one for each database.
There is also one for each slave in Vector-H.

II_SYSTEM
• II_SYSTEM defines the “root” location for a single Vector instance.

• An operating system environment variable:
II_SYSTEM for a “VW” instance

Vector environment variables

• Stored in $II_SYSTEM/ingres/files/symbol.tbl
• Set using ingsetenv
• Viewed using ingprenv:
II_CONFIG=/opt/Actian/VectorVH/ingres/files
II_INSTALLATION=VH
II_DATABASE=/opt/Actian/VectorVH
II_CHECKPOINT=/opt/Actian/VectorVH
II_JOURNAL=/opt/Actian/VectorVH
II_DUMP=/opt/Actian/VectorVH
II_WORK=/opt/Actian/VectorVH
II_NUM_OF_PROCESSORS=2
II_MAX_SEM_LOOPS=2000
II_TIMEZONE_NAME=UNITED-KINGDOM
II_CHARSETVH=UTF8
II_TEMPORARY=/tmp
ING_EDIT=/usr/bin/vim
II_GCNVH_PORT=58800
II_MTS_JAVA_HOME=/opt/Actian/VectorVH/ingres/jre
II_HDFSDATA=hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH
II_HDFSBACKUP=hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH
II_HDFSWORK=hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH
II_SHADOW_PWD=/opt/Actian/VectorVH/ingres/bin/ingvalidpw
II_INSTALL_CTL=/opt/Actian/VectorVH/ingres/install/control
II_PATCH_BACKUP=/opt/Actian/VectorVH/ingres/install/backup
II_CKTMPL_FILE=/opt/Actian/VectorVH/ingres/files/cktmpl_vh.def
II_LG_MEMSIZE=32030720
I_MPI_ROOT=/opt/Actian/VectorVH/ingres/mpi
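As a minimal sketch, the ingprenv listing above can be loaded into a dictionary for scripting. It assumes the output is one NAME=value pair per line, as shown in the listing; the function name parse_ingprenv is made up for this example, and the sample text is a subset of the values above.

```python
def parse_ingprenv(text):
    """Turn NAME=value lines into a dict, skipping blank or malformed lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)  # split once: values may contain '='
        env[name] = value
    return env

# Subset of the listing above, pasted in as sample input.
sample = """\
II_INSTALLATION=VH
II_DATABASE=/opt/Actian/VectorVH
II_GCNVH_PORT=58800
II_HDFSDATA=hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH
"""

env = parse_ingprenv(sample)
# Pick out the locations that live in HDFS rather than on the local file system.
hdfs_locations = {k: v for k, v in env.items() if v.startswith("hdfs://")}
print(env["II_INSTALLATION"], sorted(hdfs_locations))
```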

$II_SYSTEM/ingres
RDBMS/Vector executables
Log and configuration files
Library files
Transaction Log (RDBMS only)
Utilities

$II_SYSTEM/ingres/version.rel
VH 5.0.0 (a64.lnx/530)
▪ VH – Product ID
▪ 5.0.0 – Version
▪ a64 – Architecture (64 bit)
▪ lnx – Operating System
▪ 530 – Build Number
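A short sketch of how the version string above, VH 5.0.0 (a64.lnx/530), could be split into its parts. The field mapping (product ID, version, architecture, operating system, build number) follows the slide's labels as read here, and the regular expression is an assumption based on this single example rather than a documented format.

```python
import re

# Pattern derived from the one example on the slide; other releases may differ.
VERSION_RE = re.compile(
    r"^(?P<product>\S+)\s+(?P<version>[\d.]+)\s+"
    r"\((?P<arch>[^.]+)\.(?P<os>[^/]+)/(?P<build>\d+)\)"
)

def parse_version_rel(line):
    """Return the labelled parts of a version.rel line, or raise ValueError."""
    match = VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised version line: {line!r}")
    return match.groupdict()

info = parse_version_rel("VH 5.0.0 (a64.lnx/530)")
print(info)
```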

version.rel and patches
Patch number

Creating a Database

Creating a Database: createdb
Partial syntax
createdb [-uusername] dbname [-p]
[-dlocation] [-clocation] [-jlocation]
[-blocation] [-wlocation] [-vlocation]
Defaults
– Public database
– Database defaults are the same as installation defaults
Example
– Create a public database called db1
createdb db1

Partial Syntax flags:
-u Allows the security administrator to impersonate the would-be DBA.
-p Creates a private database.
-n Creates a Unicode database (NFD normalization).
-i Creates a Unicode database (NFC normalization).
-d Creates the database with location as its default data location (note that this does not affect Vector data files).
-c Creates the database with location as its default checkpoint location (note that this does not affect Vector checkpoint files).
-j Creates the database with location as its default journal location.
-b Creates the database with location as its default dump location.
-w Creates the database with location as its default work location (note that this is not used by Vector work operations).
-v Specifies the location of the Vector database files.
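The following sketch assembles a createdb command line from the flags listed above, following the partial syntax (the flag and its value are written together, for example -vlocation). It only builds the argument list and does not run anything. The helper name createdb_argv, the keyword names, and the user and location names dba1 and vloc1 are hypothetical.

```python
import shlex

# Location flags from the list above; the dictionary keys are illustrative names.
LOCATION_FLAGS = {
    "data": "-d", "checkpoint": "-c", "journal": "-j",
    "dump": "-b", "work": "-w", "vector": "-v",
}

def createdb_argv(dbname, user=None, private=False, **locations):
    """Build the argument list: createdb [-uusername] dbname [-p] [-Xlocation...]."""
    argv = ["createdb"]
    if user:
        argv.append(f"-u{user}")
    argv.append(dbname)
    if private:
        argv.append("-p")
    for kind, location in locations.items():
        argv.append(f"{LOCATION_FLAGS[kind]}{location}")
    return argv

# The public database from the example slide.
print(shlex.join(createdb_argv("db1")))
# A private database with a dedicated Vector location (names are made up).
print(shlex.join(createdb_argv("db2", user="dba1", private=True, vector="vloc1")))
```

Building a list rather than a single string keeps the result safe to hand to subprocess.run without shell quoting problems.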

Creating a Database: Actian Director
Actian Director discussed later



Extending a Database for Data

Databases must be extended to use additional locations
▪ accessdb
▪ extenddb
Once extended, tables need to be created in, or moved to, the additional locations
▪ create table ... with location=(location_name[, location_name...])
Vector Only:
• modify table ... with location=(location_name[, location_name...])
• modify table ... to relocate ...
• modify table ... to reorganize ...
Modify: only on a HEAP table
It is recommended that you size your database area appropriately and use the default data location. Extending a database to additional locations may be more suitable for non-production environments.

infodb database_name

infodb database_name (cont.)
Vector
VectorH will also include the HDFS locations.

infodb database_name (cont.)
Vector-H
----Extent directory------------------------------------------------------------
Location       Flags             Physical_path
--------------------------------------------------------------------------------
ii_database    ROOT,DATA         /opt/Actian/VectorVH/ingres/data/default/otp
ii_journal     JOURNAL           /opt/Actian/VectorVH/ingres/jnl/default/otp
ii_hdfsbackup  CHECKPOINT,HDFS   hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/ckp/default/otp
ii_dump        DUMP              /opt/Actian/VectorVH/ingres/dmp/default/otp
ii_work        WORK              /opt/Actian/VectorVH/ingres/work/default/otp
ii_hdfsdata    DATA,VWROOT,HDFS  hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/data/default/otp
ii_hdfswork    WORK,HDFS         hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/work/default/otp
================================================================================
----Vectorwise directory--------------------------------------------------------
Location       Flags             Physical_path
--------------------------------------------------------------------------------
ii_database    ROOT,DATA         /opt/Actian/VectorVH/ingres/data/vectorwise/otp
ii_journal     JOURNAL           /opt/Actian/VectorVH/ingres/jnl/vectorwise/otp
ii_hdfsbackup  CHECKPOINT,HDFS   hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/ckp/vectorwise/otp
ii_work        WORK              /opt/Actian/VectorVH/ingres/work/vectorwise/otp
ii_hdfsdata    DATA,VWROOT,HDFS  hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/data/vectorwise/otp
ii_hdfswork    WORK,HDFS         hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/work/vectorwise/otp
VectorH will also include the HDFS locations.
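To pick the HDFS locations out of an infodb extent listing like the one above, a rough sketch is to split each row into its three columns. It assumes that location names and flag lists contain no spaces, which holds for the rows shown here; the function name is invented, and the two sample rows are copied from the listing.

```python
def parse_extent_line(line):
    """Split a 'Location Flags Physical_path' row into its three columns."""
    location, flags, path = line.split(None, 2)  # any run of whitespace separates columns
    return {"location": location, "flags": flags.split(","), "path": path.strip()}

sample = [
    "ii_database ROOT,DATA /opt/Actian/VectorVH/ingres/data/default/otp",
    "ii_hdfsdata DATA,VWROOT,HDFS "
    "hdfs://cloudera-cluster-01-nameservice/Actian/VectorVH/ingres/data/default/otp",
]

extents = [parse_extent_line(row) for row in sample]
hdfs = [e["location"] for e in extents if "HDFS" in e["flags"]]
print(hdfs)
```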

Files per Database

II_DATABASE/ingres/data/vectorwise/dbname/authpass
▪ Encrypted client connection file
II_DATABASE/ingres/data/vectorwise/dbname/wal/
▪ (Vector) Write Ahead Log
II_DATABASE/ingres/data/vectorwise/dbname/CBM/lock
▪ The Vector database lock file
II_DATABASE/ingres/data/vectorwise/dbname/CBM/default/ (and other locations that the database has been extended to)
▪ The database column table files
• Remember we are a column store, so we have column files and not just table files.
Note: On Vector-H, II_HDFSDATA takes the place of II_DATABASE, so Vector-H files are located in the Vector-H (HDFS) locations.
CBM = Column Buffer Manager
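The per-database paths above can be derived from II_DATABASE and the database name. The sketch below does that for a local (non-HDFS) location only; the function name vector_db_files is hypothetical, and the sample values are taken from the ingprenv and infodb examples in this lesson.

```python
from pathlib import PurePosixPath

def vector_db_files(ii_database, dbname):
    """Return the Vector files and directories kept for one database."""
    root = PurePosixPath(ii_database) / "ingres" / "data" / "vectorwise" / dbname
    return {
        "authpass": root / "authpass",            # client connection file
        "wal_dir": root / "wal",                  # write-ahead log directory
        "lock": root / "CBM" / "lock",            # database lock file
        "default_data": root / "CBM" / "default", # column table files
    }

paths = vector_db_files("/opt/Actian/VectorVH", "otp")
for name, path in paths.items():
    print(f"{name:13} {path}")
```

PurePosixPath is used so that the example only manipulates strings and never touches the file system; it is not suitable for hdfs:// URLs.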

Directory Structure of Default Locations: Data
Data area
Ingres data
Vector data
Column table data files
Vector Database
WAL file

Directory Structure of Default Locations

Checkpoint area
Ingres Checkpoint
Vector Checkpoint
Dumps (Ingres only)
Ingres Journal
Vector Journal
Work/Sort area
Ingres work
Vector work
The checkpoint area contains our database column and table backup files.
The dmp (dump) area contains our undo files.
The jnl (journal) area contains our redo log files.
The work area should be a temporary area for spill-to-disk operations.

Data Files – RDBMS System Catalogs

Types of RDBMS data files include:
File type                    Location                    File name (e.g.)
System catalogs              Default DATA (II_DATABASE)  aaaaaanh.t00, aaaaaani.t00
User tables                  Any DATA location           aaaaabab.t00, aaaaabac.t00, aaaaabac.t01
Indexes                      Any DATA location           aaaaabae.t00, aaaaabaf.t00, aaaaabaf.t01
Database configuration file  Default DATA (II_DATABASE)  aaaaaaaa.cnf

Data Files - Vector

Types of Vector data files include:
File type                                    Location                                                         File name (e.g.)
Column data (Vector)                         Any DATA location (one file per column)                          _salesdbaScust__cust_id_00000191
Non-partitioned column data (Vector-H)       Any DATA location (one file per table)                           _salesdbaScust__cust_id_000000000000000e
Partitioned column data (Vector-H)           Any DATA location (one file per table per partition)             _salesdbaScust@1c4__cust_id_0000000000000013
Vector database definition and data changes  Database’s CBM directory under database’s default DATA location  WAL
Installation configuration file              ingres/data/vectorwise directory under II_DATABASE               vectorwise.conf
Database configuration file                  Database’s CBM directory under database’s default DATA location  vectorwise.conf
In the partitioned example, the file represents partition #1 of 4.
File name breakdown for _salesdbaScust__cust_id_00000191:
▪ _salesdba – table owner
▪ S – delimiter
▪ cust__ – table name
▪ cust_id – column name
▪ _0000… – unique table identifier

“vectorwise/dbname” Directory

“authpass” File
x100 client-to-server connection authorization file
▪ One per database
▪ Created each time the x100 server starts
Contains a “secret” key
▪ Assigned by iidbms
▪ Used to verify incoming requests
▪ Only accessible to instance owner
Prevents
▪ Non-installation owner using x100 client
▪ Remote connection to the x100 server

“CBM” Directory

“wal/main.wal” Write Ahead Log File

One per database
Stores:
▪ Catalog
▪ DDL changes
▪ Min-Max indices
▪ Secondary Indexes
▪ In-memory updates (aka Positional Delta Tree / PDT)
On Vector-H, the master node writes the write-ahead LOG

Note: All of the above topics are covered later in this course!

“wal/main.wal” File
This file is read at x100 server start up
▪ To recreate the Vector system catalog, secondary indexes, etc. in memory
▪ On Vector-H, this is stored in HDFS and read by ALL Data Nodes
▪ As this file gets bigger, the x100 server takes longer to start
If this file is lost, so are the database and its data

DO NOT
▪ DELETE
▪ EDIT
▪ Maltreat

“../wal/main_wal_backups” Directory
Stores copies of the old main.wal files
▪ After LOG condensation
Possibly useful in case of catastrophic failures

Size is limited by the “system” configuration parameter max_old_log_size
Log condensation -> updates catalog information, removes DDL changes and in-memory updates
The main.wal file grows as DDL and DML queries make changes to the database. When the file grows significantly, the system tries to shrink it by creating a new, smaller version of it.
For recovery, previous versions of the file are saved in the main_wal_backups directory of the default data location. Each file in the main_wal_backups directory is named using its creation timestamp.
When the total size of the files in the directory exceeds the configuration parameter max_old_log_size, the oldest files in the directory are automatically deleted.
The files in the main_wal_backups directory can be manually removed without adversely affecting the data in the database. Nevertheless, we recommend keeping at least the one or two most recent files as backups for database recovery.
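The retention rule described above (delete the oldest backups once the total size exceeds max_old_log_size) can be sketched as follows. This illustrates the stated policy only and is not the server's actual implementation; the timestamp format of the file names and the sizes are invented for the example.

```python
def prune_backups(files, max_old_log_size):
    """files: list of (timestamp_name, size_in_bytes). Returns (kept, deleted_names)."""
    kept = sorted(files)  # names are timestamps, so sorting puts the oldest first
    deleted = []
    total = sum(size for _, size in kept)
    while kept and total > max_old_log_size:
        name, size = kept.pop(0)  # drop the oldest remaining backup
        deleted.append(name)
        total -= size
    return kept, deleted

# Hypothetical backup files with made-up names and sizes.
backups = [
    ("20160103T120000", 400),
    ("20160101T120000", 300),
    ("20160102T120000", 500),
]
kept, deleted = prune_backups(backups, max_old_log_size=1000)
print("kept:", [name for name, _ in kept], "deleted:", deleted)
```

With a limit of 1000 and a total of 1200, only the oldest file has to go before the directory fits within the limit again.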

“CBM/lock” File
One per database
This file is the database lock file
▪ Prevents starting two instances on the same data
▪ Ensures only one x100 process per database per node
Stores two useful numbers:

$ cat CBM/lock
48896
5077
▪ 48896 – port number, useful for x100_client
▪ 5077 – process ID, useful for gdb or… kill
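A minimal sketch for reading the two numbers from the lock file, assuming the layout shown above: the port on the first line and the process ID on the second. The function name is made up, and the example parses a string rather than opening a real CBM/lock file.

```python
def parse_lock(text):
    """Return the x100 port and process ID stored in a CBM/lock file."""
    port, pid = text.split()[:2]  # first value is the port, second the process ID
    return {"port": int(port), "pid": int(pid)}

# Contents as shown by `cat CBM/lock` above.
info = parse_lock("48896\n5077\n")
print(f"x100 server listens on port {info['port']}, process ID {info['pid']}")
```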

CBM/default/data Files
Vector
_actianSairline__airline_code_00000116
▪ actian – Schema owner
▪ airline – Table name
▪ airline_code – Column name
▪ 00000116 – Unique column number (dec)
Vector-H (Partitioned and Non-Partitioned)
_actianSairline__airline_code_000000000000001a
▪ airline_code – 1st Column name
▪ 000000000000001a – Unique table number (hex)
_actianSairline@1c4__airline_code_000000000000001a
▪ @1c4 – Partition #1 of 4
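The naming scheme above can be taken apart with a regular expression. This is a sketch inferred from the three example names only: it assumes the first capital S marks the end of the owner name, so it would misread owners or tables that themselves contain a capital S or a double underscore. The pattern and function name are not part of the product.

```python
import re

# Pattern inferred from the example file names above; not an official format.
NAME_RE = re.compile(
    r"^_(?P<owner>.+?)S(?P<table>.+?)"
    r"(?:@(?P<partition>\d+)c(?P<partitions>\d+))?"  # optional @<n>c<total>
    r"__(?P<column>.+)_(?P<ident>[0-9a-f]+)$"
)

def parse_data_file(name):
    """Split a Vector data file name into owner, table, partition, column and id."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised data file name: {name}")
    return match.groupdict()

plain = parse_data_file("_actianSairline__airline_code_00000116")
part = parse_data_file("_actianSairline@1c4__airline_code_000000000000001a")
print(plain)
print(part)
```

The identifier is kept as a string because the slide describes it as decimal on Vector and hexadecimal on Vector-H.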

Database Limits
Maximum database file size
▪ Unlimited
Maximum tables in a database
▪ 1.1 billion
Maximum rows per table
▪ 140 trillion (2^47)
Maximum row width per table
▪ 256 KB
Maximum columns per table
▪ 2048
Maximum number of tables in a join
▪ 382

Lesson Summary
Creating a Database
Database
▪ Location
▪ Files
▪ Limits

