Sie sind auf Seite 1von 49

A+

Documentation

USER GUIDE
Table of Contents
MonetDB User guide 3

Installation 3

Tutorial 5

Windows tutorial 9
Changing database location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
About the VOC dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

MonetDB.R Tutorial 10
dplyr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
About the VOC dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

MonetDB Server 14

mclient man-page 18
NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
OPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
BACKSLASH COMMANDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
EXAMPLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

monetdb man-page 23
NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
OPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
COMMANDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
EXPRESSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
RETURN VALUE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

monetdbd man-page 28
NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
COMMANDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
CONFIGURATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
REMOTE DATABASES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
MULTIPLEX-FUNNELS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
SIGNALS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1
RETURN VALUE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

mserver5 man-page 34
NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
OPERATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
MSERVER5 OPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
GDK PARAMETERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
MSERVER5 PARAMETERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
SQL PARAMETERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
CONFIG FILE FORMAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

msqldump man-page 38
NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
OPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Readonly database 39

System resources 40

Autoloading scripts 41

Memory footprint 42

Multi-core usage 42

Data compression 43

Disk space 43
Running out of diskspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

OOM Killer 44

Server debugging 44

SQL dump and restore 45


Windows platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

File system level backup 46

2
MonetDB User guide

A MonetDB database instance manages a single database on a single system. It is perfectly possible
to have multiple database instances run concurrently on a single system, but its resource claim may
negatively affect the performance perceived.
This section takes you from the easy installation to running a tutorial.
• Installation

• Tutorial

• MonetDB Server

• System resources

• SQL dump and restore

Installation

The MonetDB code is available in easy to install packages for system administrators. The platforms
supported with their detailed requirements are shown on the download menu. A private install can be
obtained by downloading and compiling the source distribution. You don’t need super-user or system
administrator priviliges for this.
After installation of MonetDB software you will find a number of programs in the bin sub-directory of
your installation directory. Below is an overview of the server and client programs distributed with
MonetDB.

Server
program Description Documentation
monetdbd The MonetDB Database Server [See manpage]
daemon.Once started it runs in the
background and facilitates, amongst
others, management of local (and remote)
mserver5’s. The daemon is controlled by
the client application monetdb. Note:
monetdbd is currently only available on
UNIX-like platforms.
mserver5 The MonetDB server, version 5. It runs as [See manpage]
a console program. Client programs
(except monetdb) connect to this server
process.

3
Client
program Description Documentation
monetdb A command-line program to control the [See manpage]
monetdbd deamon server by executing a
single command. It allows you for
instance to start & stop an mserver5
process or create a new database.
mclient A command-line program to interactively [See manpage]
communicate to a running mserver5
process. The program is intended for
technical persons, e.g. DBA’s, SQL
developers and system adminstrators.
jdbcclient.jarA Java command-line program to [See manpage]
interactively communicate to a running
mserver5 process via the MonetDB JDBC
driver. The program is intended for
technical persons, e.g. SQL developers
and DBA’s. Note: This program is a
separately downloaded.
msqldump A command-line program to export the [See manpage]
structure and content of a database or a
specific table to a file. This allows you to
quickly export and re-create the database
or table on a different server.

Please note that all these programs are started from a command shell (such as bash or cmd.exe) or
invoked from a script file. They do not have a Graphical User Interface (GUI)
For a user you may prefer to use more user-friendly GUI-programs. Many good GUI-programs are avail-
able. Those generic GUI-programs typically allow you to connect to any DBMS via standard database
connectivity API such as ODBC or JDBC. MonetDB provides both an ODBC driver and a JDBC driver.
The JDBC driver is usable for programs written in Java. There are also MonetDB server connectivity API
libraries available for populair programming languages such as Python, PHP, Perl. [More].
If you need the MonetDB ODBC driver or JDBC driver, you will need to download, install and configure
them at the computer OS where the GUI-program is installed or started. For the ODBC driver you will
need system administrator privileges to install the MonetDB ODBC driver components. For the JDBC
driver no sys-admin privileges are needed, you only have to download the single jar (Java ARchive) library
file (such as monetdb-jdbc-2.24.jar) from the download area and configure in your GUI-program where
the jar file is stored (or add it to the CLASSPATH environment setting or java -cp startup parameter).
The MonetDB JDBC driver is a 100% pure Java (type 4) driver, which means the same JDBC driver can
be deployed on all platforms which support the Java Runtime Environment (JRE) version 1.7 or higher.

4
Tutorial

The MonetDB/SQL tutorial is based on the material published in the


book J.R. Bruijn, F.S. Gaastra and I. Schaar, Dutch-Asiatic Shipping in the 17th and 18th Centuries,
which gives an account of the trips made to the East and ships returned safely (or wrecked on the way).
A total of 8000 records are provided. They include information about ship name and type, captain, the
arrival/departure of harbors along the route, personnel accounts, and anecdotal information.
Get going quickly on Linux involves starting the MonetDB daemon monetdbd, code name meriovingian.
It runs in the background and facilitates, amongst others, management of local/remote servers. The
daemon is controlled by the application monetdb. See its documentation for all details. The monetdbd
is not available on Windows, but these systems a default database called demo is made ready by default
(Windows tutorial is available here).
Here we demonstrate a simple session involving creation of a database, loading data, and querying.
1 shell> monetdbd create /path/to/mydbfarm
2 shell> monetdbd start /path/to/mydbfarm
3 shell> monetdb create voc
4 shell> monetdb release voc
5 shell> mclient -u monetdb -d voc
6 password:<monetdb>
7 Welcome to mclient, the MonetDB/SQL interactive terminal (Oct2014-SP1)
8 Database: MonetDB v11.19.7 (Oct2014-SP1), 'mapi:monetdb://localhost:50000/voc'
9 Type \q to quit, \? for a list of available commands
10 auto commit mode: on
11 sql>SELECT 'hello world';
12 +---------------+
13 | single_value |
14 +===============+
15 | hello world |
16 +---------------+
17 1 tuple (0.530ms)
18 sql>\q

The command \q or end-of-file signal terminates the connection with the server. Exploring the wealth
of functionality offered by MonetDB/SQL is best started using a toy database. For this we use the
VOC database which provides a peephole view into the administrative system of an early multi-national

5
company, the Vereenigde geoctrooieerde Oostindische Compagnie (VOC for short - The (Dutch) East
Indian Company) established on March 20, 1602. Download the VOC dataset voc_dump.zip (542K) gz
(519K) bz2 (371K) which is a compressed file with SQL statements. After the file has been extracted,
load its contents into MonetDB using mclient. Before you load the VOC data set, it is advised to
introduce a user different from the omnipresent default monetdb. The new user is given his own schema
for the MonetDB database. Assuming you have started MonetDB with SQL module, proceed as follows:
1 shell> mclient -u monetdb -d voc
2 password:<monetdb>
3 sql>CREATE USER "voc" WITH PASSWORD 'voc' NAME 'VOC Explorer' SCHEMA "sys";
4 sql>CREATE SCHEMA "voc" AUTHORIZATION "voc";
5 sql>ALTER USER "voc" SET SCHEMA "voc";
6 sql>\q

To illustrate the use of the newly created account and schema, the following example connects and
creates a table, which is dropped afterwards by aborting the transaction.
1 shell> mclient -u voc -d voc
2 password:<voc>
3 sql>START TRANSACTION;
4 auto commit mode: off
5
6 sql>CREATE TABLE test (id int, data varchar(30));
7 operation successful (0.722ms)
8
9 sql>\d
10 TABLE voc.test
11
12 sql>\d test
13 CREATE TABLE "voc"."test" (
14 "id" INTEGER,
15 "data" VARCHAR(30)
16 );
17 sql>CREATE TABLE "voc"."test" ("id" int, "data" varchar(30));
18 sql>ROLLBACK;
19 auto commit mode: on
20
21 sql>\d

Importing the voc_dump.sql file into the database can be done using the textual client interface. Some
alternative ways are as follows:
1 1: shell> mclient -u voc -d voc voc_dump.sql
2 password:<voc>
3 2:shell> mclient -u voc -d voc < voc_dump.sql
4 password:<voc>
5 3:shell> mclient -u voc -d voc
6 password:<voc>
7 sql> \< voc_dump.sql

6
The VOC data set contains data for around 8000 voyages.
1 sql>\d
2 TABLE voc.craftsmen
3 TABLE voc.impotenten
4 TABLE voc.invoices
5 TABLE voc.passengers
6 TABLE voc.seafarers
7 TABLE voc.soldiers
8 TABLE voc.total
9 TABLE voc.voyages

Let’s check out the data we just loaded. First, count the number of rows the voyages table.
1 sql>SELECT count(*) FROM voyages;
2 +------+
3 | L1 |
4 +======+
5 | 8131 |
6 +------+

The set consists of 8 tables, which are all bound to each other using FOREIGN KEY relationships. The
voyages table is the main table, which all others refer to. Every table, except invoices has a PRIMARY
KEY defined over the columns number and number_sup. Since the invoices table holds zero or more
invoices per voyage (identified by number, number_sub) a PRIMARY KEY constraint is not possible. The
tables craftsmen, impotenten, passengers, seafarers, and soldiers all share the same columns. We can
define a VIEW that combines them all into one big table, to make them easier to access.
1 sql>CREATE VIEW onboard_people AS
2 SELECT * FROM (
3 SELECT 'craftsmen' AS type, craftsmen.* FROM craftsmen
4 UNION ALL
5 SELECT 'impotenten' AS type, impotenten.* FROM impotenten
6 UNION ALL
7 SELECT 'passengers' AS type, passengers.* FROM passengers
8 UNION ALL
9 SELECT 'seafarers' AS type, seafarers.* FROM seafarers
10 UNION ALL
11 SELECT 'soldiers' AS type, soldiers.* FROM soldiers
12 UNION ALL
13 SELECT 'total' AS type, total.* FROM total
14 ) AS onboard_people_table;

The new view will show up and we can just use it as a normal table, to for instance calculate the number
of records for each group of people:
1 sql>\d
2 TABLE voc.craftsmen
3 TABLE voc.impotenten
4 TABLE voc.invoices

7
5 VIEW voc.onboard_people
6 TABLE voc.passengers
7 TABLE voc.seafarers
8 TABLE voc.soldiers
9 TABLE voc.total
10 TABLE voc.voyages
11 sql>SELECT type, COUNT(*) AS total
12 FROM onboard_people GROUP BY type ORDER BY type;
13 +------------+-------+
14 | type | total |
15 +============+=======+
16 | craftsmen | 2349 |
17 | impotenten | 938 |
18 | passengers | 2813 |
19 | seafarers | 4468 |
20 | soldiers | 4177 |
21 | total | 2467 |
22 +------------+-------+

Let’s see how many impotenten we have in the dataset:


1 sql>SELECT count(*) FROM impotenten;
2 +------+
3 | L1 |
4 +======+
5 | 938 |
6 +------+

It is possible to play with the set in many ways, to find out several things that took place during the
voyages of the ships, or the money that was earned. A few examples are shown below:
1 sql>SELECT COUNT(*) FROM voyages
2 WHERE particulars LIKE '%_recked%';
3 +------+
4 | L1 |
5 +======+
6 | 358 |
7 +------+
8 sql>SELECT chamber, CAST(AVG(invoice) AS integer) AS average
9 FROM invoices
10 WHERE invoice IS NOT NULL
11 GROUP BY chamber
12 ORDER BY average DESC;
13 +---------+---------+
14 | chamber | average |
15 +=========+=========+
16 | A | 282996 |
17 | Z | 259300 |
18 | H | 150182 |

8
19 | R | 149628 |
20 | D | 149522 |
21 | E | 149518 |
22 | null | 83309 |
23 +---------+---------+
24 sql>CREATE VIEW extended_onboard AS
25 SELECT number, number_sup, trip, trip_sup,
26 onboard_at_departure, death_at_cape,
27 left_at_cape, onboard_at_cape,
28 death_during_voyage, onboard_at_arrival,
29 death_during_voyage - left_at_cape AS death_at_arrival
30 FROM onboard_people;
31 sql>\d
32 TABLE voc.craftsmen
33 VIEW voc.extended_onboard
34 TABLE voc.impotenten
35 TABLE voc.invoices
36 VIEW voc.onboard_people
37 TABLE voc.passengers
38 TABLE voc.seafarers
39 TABLE voc.soldiers
40 TABLE voc.total
41 TABLE voc.voyages

Windows tutorial

On Windows the first step is to initialize a MonetDB server by simply opening: ‘Start -> Programs ->
MonetDB -> Start server’.
Subsequently you can start the textual interface (mclient) by opening: ‘Start -> Programs -> MonetDB
-> SQL client’. The commands entered there are identical to those found on other platforms. To stop
the server, you can simply close the MonetDB SQL Server window.
Note the server by default only accepts connections originating from the local host. If you need other
machines to access your database, change the configuration file by setting mapi_open=yes.
Exploring the wealth of functionality offered by MonetDB/SQL is best started using a toy database. For
this we use the VOC database which provides a peephole view into the administrative system of an
early multi-national company, the VOC (or Dutch East India Company).
Download the VOC dataset voc_dump.zip (542K) gz (519K) bz2 (371K) which is a compressed file with
SQL statements (SQL database dump). After the file has been extracted, load its contents into MonetDB
using the mclient.exe file. From within mclient run:
1 sql> \< voc_dump.sql

Since the SQL interface is the same on all platforms, from here on you can follow the later part of the
regular SQL tutorial.

9
Changing database location

If you need (or want) to change the location of the database then editing of the monetdb script is
required. Change the mclient.bat and M5server.bat files in C:\Program Files\MonetDB\MonetDB5. In
the M5server.bat file you would need to change the setting of the MONETDBFARM variable, and in
mclient.bat you would need to add a -d option to the call of mclient.exe. Note: you upgrade such
changed files will be overwritten.

About the VOC dataset

The data for the tutorial is published in the book J.R. Bruijn, F.S. Gaastra and I. Schaar Dutch-Asiatic
Shipping in the 17th and 18th Centuries, which gives an account of the trips made to the East and ships
returned safely (or wrecked on the way) by the VOC (or Dutch East India Company). A total of 8000
records are provided. They include information about ship name and type, captain, the arrival/departure
of harbors along the route, personnel accounts, and anecdotal information. You can find more about
the VOC on Wikipedia.

MonetDB.R Tutorial

The MonetDB.R tutorial is designed to familiarise R users with the functionality of the MonetDB.R con-
nector and dplyr library for data manipulation in R.
Before stating with this tutorial, make sure you have a running instance on MonetDB to which you can
connect. As such, the MonetDB/SQL tutorial is a recommended reading, even more, given that this
page was adapted from the SQL tutorial.
To start, in case you have not done it yet, create a DB farm using the MonetDB daemon monetdbd
and a new database called ‘voc’. The daemon is controlled by the application monetdb. See its docu-
mentation for all details. Note: The monetdbd is not available on Windows. For that reason, on these
systems a default database called demo is available by default (Windows-specific MonetDB tutorial is
available here).
From that moment on, you can start your favourite R shell, setup a MonetDB.R connection and process
your data in R.
Now, install MonetDB.R from your R shell.
1 > install.packages("MonetDB.R")

At this time you will be prompted to select a mirror.


After MonetDB.R is installed you must load it.
1 > library(MonetDB.R)
2 Loading required package: DBI
3 Loading required package: digest

10
Now, let’s create a connection to the database.
Note that by default MonetDB runs on port 50000, but MonetDB.R already knows that, thus there is
no need to set the port. Also, in the tutorial we’ve use the demo database, which is the default on
Windows.
1 > conn <- dbConnect(MonetDB.R(), host="localhost", dbname="demo", user="monetdb",
password="monetdb")

Let’s execute a dummy ‘hello world’ SQL query and in our database directly from R.
1 > dbGetQuery(conn,"SELECT 'hello world'")
2 single_value
3 1 hello world

To illustrate the use of the newly created account and schema, the following example connects and
creates a table, which is dropped afterwards by aborting the transaction.
1 > dbTransaction(conn)
2 > dbSendUpdate(conn,"CREATE TABLE test (id int, data varchar(30))")
3 > dbListTables(conn)
4 [1] "test"
5 > dbExistsTable(conn,"test")
6 [1] TRUE
7 > dbRollback(conn)
8 [1] TRUE
9 > dbListTables(conn)
10 character(0)

For the purpose of the tutorial we have prepared a TSV (tab-separated values) version of the VOC
dataset, which contains data for around 8000 voyages. After unpacking the .zip you can import the
data into the directly from R. Using the monetdb.read.csv function we can load the specified file as a
new table ‘craftsmen’, specifying the upper bound on rows to import (2500), the delimiter (\t). The
last parameter indicates that at least 1474 rows need to be read, before before the function can safely
auto-deretmine the datatype of the columns.
1 > file <- "voc_craftsmen.tsv"
2 > monetdb.read.csv(conn, file, "craftsmen", 2350, delim="\t", nrow.check=1474)
3 voc_craftsmen.tsv /Users/monetdb/voc_craftsmen.tsv
4 L1
5 1 2348

Now, let’s repeat the same for the other files in the dataset.
1 > file <- "voc_impotenten.tsv"
2 monetdb.read.csv(conn, file, "impotenten", 1000, delim="\t", nrow.check=160)
3 file <- "voc_invoices.tsv"
4 monetdb.read.csv(conn, file, "invoices", 4000, delim="\t", nrow.check=1000)
5 file <- "voc_passengers.tsv"
6 monetdb.read.csv(conn, file, "passengers", 3000, delim="\t", nrow.check=1000)
7 file <- "voc_seafarers.tsv"

11
8 monetdb.read.csv(conn, file, "seafarers", 4500, delim="\t", nrow.check=4500)
9 file <- "voc_soldiers.tsv"
10 monetdb.read.csv(conn, file, "soldiers", 4500, delim="\t", nrow.check=4000)
11 file <- "voc_total.tsv"
12 monetdb.read.csv(conn, file, "total", 2500, delim="\t", nrow.check=2000)
13 file <- "voc_voyages.tsv"
14 monetdb.read.csv(conn, file, "voyages", 8200, delim="\t", nrow.check=5000)

dplyr

Next, let’s install and load dplyr, which is designed to facilitate efficient datasets manipulation in R.
1 > install.packages("dplyr")
2 > library(dplyr)
3 Attaching package: ‘’dplyr
4 The following objects are masked from ‘package:’stats:
5 filter, lag
6 The following objects are masked from ‘package:’base:
7 intersect, setdiff, setequal, union

Make a new connection to the database for dplyr, specifying that the source is a MonetDB instance.
1 > monetdb_conn <- src_monetdb("demo")

Now, lets create non-matirialized views for all tables from the database we are going to need later in
R. Note that none of these will actaully move data from MonetDB into R, rather we are just creating a
reference to the tables.
1 > craftsmen <- tbl(monetdb_conn, "craftsmen")
2 impotenten <- tbl(monetdb_conn, "impotenten")
3 invoices <- tbl(monetdb_conn, "invoices")
4 passengers <- tbl(monetdb_conn, "passengers")
5 seafarers <- tbl(monetdb_conn, "seafarers")
6 soldiers <- tbl(monetdb_conn, "soldiers")
7 total <- tbl(monetdb_conn, "total")
8 voyages <- tbl(monetdb_conn, "voyages")

Let’s check out the data we just loaded. First, let’s see the dimensions and the column names of the
voyages table.
1 > dim(voyages)
2 [1] 8131 22
3 > colnames(voyages)
4 [1] "number" "number_sup" "trip"
5 [4] "trip_sup" "boatname" "master"
6 [7] "tonnage" "type_of_boat" "built"
7 [10] "bought" "hired" "yard"
8 [13] "chamber" "departure_date" "departure_harbour"

12
9 [16] "cape_arrival" "cape_departure" "cape_call"
10 [19] "arrival_date" "arrival_harbour" "next_voyage"
11 [22] "particulars"

As you can see, the respose is actaully an R array.


Since we have more than 8000 rows, let’s have a look at the first few.
1 > head(voyages)

If you are interested in what queries are executed in MonetDB you can turn on the query debug
1 > options(monetdb.debug.query=T)
2 > head(voyages)
3 QQ: 'SELECT "number", "number_sup", "trip", "trip_sup", "boatname", "master", "tonnage",
"type_of_boat", "built", "bought", "hired", "yard", "chamber", "departure_date",
"departure_harbour", "cape_arrival", "cape_departure", "cape_call", "arrival_date",
"arrival_harbour", "next_voyage", "particulars"
4 FROM "voyages"
5 LIMIT 6'
6 QQ: Query result for query 29 with 6 rows and 22 cols, 6 rows.

As you can see, the connector uses the native LIMIT SQL function for the head query.
Disable the debug view for now.
1 > options(monetdb.debug.query=F)

Let’s see how many impotenten we have in the dataset:


1 > summarise(impotenten, count = n())
2
3 count
4 1 938
5 .. ...

It is possible to play with the set in many ways, to find out several things that took place during the
voyages of the ships, or the money that was earned. A few examples are shown below:
1 > invoices_grouped_per_chamber <- group_by(invoices, chamber)
2 > average_invoices_per_chamber <- summarise(invoices_grouped_per_chamber, average = mean(invoice))
3 > arrange(average_invoices_per_chamber, desc(average))
4 Arrange: desc(average)
5
6 chamber average
7 1 A 282996.6
8 2 Z 259300.5
9 3 H 150182.4
10 4 R 149628.3
11 5 D 149522.8
12 6 E 149518.0

13
13 7 NA 83309.0
14 .. ... ...

See also

If you want to lear more about MonetDB.R connector or the dplyr, follow the links below:
• MonetDB.R homepage
• Introduction to dplyr
The MonetDB also supports embedded R queries in the database. This feature was introduced in the
Oct2014 release. Check out this blog post for more information.

About the VOC dataset

The data for the tutorial is published in the book J.R. Bruijn, F.S. Gaastra
and I. Schaar Dutch-Asiatic Shipping in the 17th and 18th Centuries, which gives an account of the trips
made to the East and ships returned safely (or wrecked on the way) by the VOC (or Dutch East India
Company). A total of 8000 records are provided. They include information about ship name and type,
captain, the arrival/departure of harbors along the route, personnel accounts, and anecdotal information.
You can find more about the VOC on Wikipedia.

MonetDB Server

Starting a MonetDB database server is as simple as starting the deamon program monetdbd. At the
time of this writing, monetdbd is only available on UNIX-like platforms. The MonetDB daemon program
encapsulates an arbitrary number of database servers, and takes care of their configuration. The general
concept of the MonetDB daemon program is that it builds upon a given directory, often referred to as the
dbfarm. There can be at most one instance of monetdbd monitoring a dbfarm, but multiple monetdbd
processes can be running on the same system.
The next (optional) step is to create a .monetdb configuration file with information to ease using the
Monetdb client interface, called mclient.

14
1 shell> cat ~/.monetdb
2 user=<yourname>
3 password=<yourpassword>
4 language=<"sql" or "mal">

In the simple case, monetdbd is started by the system through some init script, and databases are
created by the local DBA, and assigned to users. A more flexible approach for users, however, is the
case where a user runs monetdbd itself. Because a dbfarm can be anywhere in the filesystem, and
needs nothing more than just being an ordinary directory, a user can use monetdbd to initialise his/her
own dbfarm in an accessible place. For example:
1 % monetdbd create ~/my-dbfarm
2 % monetdbd get all ~/my-dbfarm
3 property value
4 hostname vomac.ins.cwi.nl
5 dbfarm /Users/fabian/my-dbfarm
6 status no monetdbd is serving this dbfarm
7 mserver unknown (monetdbd not running)
8 logfile merovingian.log
9 pidfile merovingian.pid
10 sockdir /tmp
11 port 50000
12 exittimeout 60
13 forward proxy
14 discovery true
15 discoveryttl 600
16 control no
17 passphrase <unknown>
18 mapisock /tmp/.s.monetdb.50000
19 controlsock /tmp/.s.merovingian.50000

After creation of the personal dbfarm directory, the get command can be used to inspect the dbfarm-
wide settings of the newly created environment. Note that the above output differs per system, per user
and per dbfarm. Important setting is the port to use to open up services for clients. In particular for a
user installation, these often may conflict with another running monetdbd, hence it is wise to change
these to a private port in situations where such conflicts may arise:
1 % monetdbd set port=54321 ~/my-dbfarm
2 % monetdbd get all ~/my-dbfarm
3 property value
4 hostname vomac.ins.cwi.nl
5 dbfarm /Users/fabian/my-dbfarm
6 status no monetdbd is serving this dbfarm
7 mserver unknown (monetdbd not running)
8 logfile merovingian.log
9 pidfile merovingian.pid
10 sockdir /tmp
11 port 54321
12 exittimeout 60

15
13 forward proxy
14 discovery true
15 discoveryttl 600
16 control no
17 passphrase <unknown>
18 mapisock /tmp/.s.monetdb.54321
19 controlsock /tmp/.s.merovingian.54321

Note that control defaults to no. For remote control connections, control has to be set to yes, and a
passphrase has to be set as well. When set, it will show as a hash in get’s output.
When the settings look ok, the next step is to simply start monetdbd:
% monetdbd start ~/my-dbfarm

If the above command returned without any message, the monetdbd process was successfully started.
The logfile (default merovingian.log) should show that the daemon has started, bound some ports, and
discovered itself.
From this part on, monetdbd can be given commands through the monetdb command (without the
trailing ‘d’). The monetdb client utility knows commands much like monetdbd, but it is much more
focussed towards giving information about what is going on inside monetdbd. The monetdb client
utility is the main tool to use to manage the dbfarm.
The dbfarm we created is empty, as such there are no databases inside it. With monetdb we can create
a new database:
1 % monetdb create my-first-db

1 created database in maintenance mode: my-first-db

This simple command created the necessary bits inside the dbfarm for a database called my-first-db. It
is created in maintenance mode, since otherwise anyone could access it before it is properly setup. If
the command gives an access-denied error, most probably another monetdbd is running on the system.
In that case try adding -p54321(the port) as arguments to monetdb in order to tell it which monetdbd to
contact as in the status example. In the status view of monetdb the newly created database will show
up as a locked database:
1 % monetdb -p54321 status

1 name state uptime health


2 my-first-db locked

Albeit the database is locked, it can be started, such that the monetdb superuser can access the database:
1 % monetdb start my-first-db

1 starting database 'my-first-db'... done

1 % monetdb status

16
1 name state uptime health
2 my-first-db locked 18s 100%, 0s

The database remains locked, but the uptime field indicates the database is actuallty running. If setting
up the database is ready, the database can be made available for other users (if any, see MonetDB SQL
tutorial) to connect to it using the release command:
1 % monetdb release my-first-db

1 taken database out of maintenance mode: my-first-db

1 % monetdb status

1 name state uptime health


2 my-first-db running 2m 48s 100%, 0s

Next thing, making a connection to the database using mclient requires the -d argument to specify
which database to connect to. It will ask for username and password when none given or provided via
a .monetdb file. The default username/password is monetdb/monetdb:
1 % mclient -dmy-first-db
2 user(fabian):monetdb
3 password:<monetdb>

1 Welcome to mclient, the MonetDB/SQL interactive terminal (Dec2011-SP1)


2 Database: MonetDB v11.7.7 (Dec2011-SP1), 'mapi:monetdb://vomac.ins.cwi.nl:54321/my-first-db'
3 Type \q to quit, \? for a list of available commands
4 auto commit mode: on
5 sql>

Like for the monetdb command, if this fails with some database does not exist error, try giving the port
(not the controlport!) to mclient such that it finds the correct monetdbd, e.g.:
% mclient -p54321 -dmy-first-db

It may be necessary to stop individual databases, or even make them unavailable. The monetdb com-
mands stopand lockcan perform these tasks. To completely stop the monetdbd daemon process for a
dbfarm, stop it using the stop command of monetdbd:
1 % monetdbd stop ~/my-dbfarm

This should stop the monetdbd and all the databases that are started inside it. To verify that monetdbd
has indeed stopped, review the end of the merovingian.log file. It should report all databases being shut
down, and the deamon shutting down.
• mclient man-page

• monetdb man-page

• monetdbd man-page

17
• mserver5 man-page

• msqldump man-page

• Readonly database

mclient man-page

NAME

mclient � the MonetDB command-line tool

SYNOPSIS

mclient [ options ] [ file or database [ file … ] ] mclient ��help

DESCRIPTION

MonetDB is a database management system that is developed from a main-memory perspective with
use of a fully decomposed storage model, automatic index management, extensibility of data types and
search accelerators, and a SQL frontend.
Mclient is the command-line interface to the MonetDB server.
If the ��statement=query (�s query) option is given, the query is executed. If any files are listed after the
options, queries are read from the files and executed. The special filename � refers to standard input.
Note that if there is both a ��statement option and filename arguments, the query given with ��state-
ment is executed first. If no ��statement option is given and no files are specified on the command line,
mclient reads queries from standard input.
When reading from standard input, if standard input is a terminal or if the ��interactive (�i) option is
given, mclient interprets lines starting with **** (backslash) specially. See the section BACKSLASH
COMMANDS below.
Before mclient starts parsing command line options, it reads a .monetdb file. If the environment variable
DOTMONETDBFILE is set, it reads the file pointed to by that variable instead. When unset, mclient
searches for a .monetdb file in the current working directory, and if that doesn’t exist, in the current
user’s home directory. This file can contain defaults for the flags user, password, language, database,
save_history, format, and width. For example, an entry in a .monetdb file that sets the default language
for mclient to mal looks like this: language=mal. To disable reading the .monetdb file, set the variable
DOTMONETDBFILE to the empty string in the environment.

18
OPTIONS

General Options ��help (�?)


Print usage information and exit.
��version (�v)
Print version information and exit.
��encoding=_encoding_ (�E encoding)
Specify the character encoding of the input. The option applies to both the standard input of mclient
and to the argument of the ��statement (�s) option but not to the contents of files specified on the
command line (except for � which refers to standard input) or files specified using the \< command
(those must be encoded using UTF-8). The default encoding is taken from the locale.
��language=_language_ (�l language)
Specify the query language. The following languages are recognized: mal and sql. A unique prefix
suffices. When the ��language option is omitted, the default of sql is assumed.
��database=_database_ (�d database)
Specify the name or URI of the database to connect to. The �d can be omitted if an equally named file
does not exist in the current directory. As such, the first non-option argument will be interpreted as
database to connect to if the argument does not exist as file. Valid URIs are as returned by ‘monetdb
discover‘, see monetdb(1), and look like mapi:monetdb://_hostname_:_port_/_database_.
��host=_hostname_ (�h hostname)
Specify the name of the host on which the server runs (default: localhost). When the argument starts
with a forward slash (/), host is assumed to be the directory where the UNIX sockets are stored for
platforms where these are supported.
��port=_portnr_ (�p portnr)
Specify the portnumber of the server (default: 50000).
��interactive (�i)
When reading from standard input, interpret lines starting with **** (backslash) specially. See the
section BACKSLASH COMMANDS below.
��timer=_timermode_ (�t timermode)
The timer command controls the format of the time reported for queries. The default mode is none
which turns off timing reporting. The timer mode clock reports the client-side wall-clock time (“clk”)
in a human-friendly format. The timer mode performance reports client-side wall-clock time (“clk”)
as well as detailed server-side timings, all in milliseconds (ms): the time to parse the SQL query, opti-
mize the logical relational plan and create the initial physical (MAL) plan (“sql”); the time to optimize
the physical (MAL) plan (“opt”); the time to execute the physical (MAL) plan (“run”). All timings are
reported on stderr.**
Note** that the client-measured wall-clock time is reported per query only when options ��interactive

19
or ��echo are used, because only then does mclient send individual lines (statements) of the SQL script
to the server. Otherwise, when mclient sends the SQL script in large(r) batch(es), only the total wall-
clock time per batch is measured and reported. The server-measured detailed performance timings are
always measured and reported per query.
��user=_user_ (�u user)
Specify the user to connect as. If this flag is absent, the client will ask for a user name, unless a default
was found in .monetdb file.
��format=_format_ (�f format)
Specify the output format. The possible values are sql, expanded, x, csv, tab, raw, xml, and trash. csv
is comma-separated values, tab is tab-separated values, raw is no special formatting (data is dumped
the way the server sends it to the client), sql is a pretty format which is meant for human consumption
where columns are clearly shown, expanded and x are synonyms and are another pretty format meant
for human consumption where column values are printed in full and below each other, xml is a valid (in
the XML sense) document, and trash does not render any output, enabling performance measurements
free of any output rendering/serialization costs. In addition to plain csv, two other forms are possible.
csv=_c_ uses c as column separator; csv+_c_ uses c as column separator and produces a single header
line in addition to the data.
��echo (�e)
Echo the query. Note that using this option slows down processing.
��history (�H)
Load and save the command line history (default off).
��log=_logfile_ (�L logfile)
Save client/server interaction in the specified file.
��statement=_stmt_ (�s stmt)
Execute the specified query. The query is run before any queries from files specified on the command
line are run.
��timezone (�z)
Do not tell the client’s timezone to the server.
��Xdebug (�X)
Trace network interaction between mclient and the server.
��pager=_cmd_ (�| cmd)
Send query output through the specified cmd. One cmd is started for each query. Note that the | will
have to be quoted or else the shell will interpret it.
SQL Options
��null=_nullstr_ (�n nullstr)

20
Set the string to be used as NULL representation when using the sql, csv, or tab output formats. If not
used, NULL values are represented by the string “null” in the sql output format, and as the empty string
in the csv and tab output formats. Note that an argument is required, so in order to use the empty
string, use �n ”” (with the space) or ��null=.
��autocommit (�a)
Switch autocommit mode off. By default, autocommit mode is on.
��rows=_nr_ (�r nr)
If specified, query results will be paged by an internal pager at the specified number of lines.
��width=_nr_ (�w nr)
Specify the width of the screen. The default is the (initial) width of the terminal.
��dump (�D)
Create an SQL dump.
��inserts (�N)
Use INSERT INTO statements instead of COPY INTO + CSV values when dumping the data of a ta-
ble. This option can be used when trying to load data from MonetDB into another database, or when
e.g. JDBC applications are used to reload the dump.

BACKSLASH COMMANDS

General Commands
\?
Show a help message explaining the backslash commands.
\q
Exit mclient.
\< file
Read input from the named file.
\ file
Write output to the named file. If no file is specified, write to standard output.
\| command
Pipe output to the given command. Each query is piped to a new invocation of the command. If no
command is given, revert to writing output to standard output.
\h
Show the readline(3) history.

21
\L file
Log client/server interaction in the given file. If no file is specified, stop logging information.
\X
Trace what mclient is doing. This is mostly for debugging purposes.
\e
Echo the query in SQL formatting mode.
\f format
Use the specified format mode to format the output. Possible modes the same as for the ��format (�f)
option.
\w width
Set the maximum page width for rendering in the sql formatting mode. If width is �1, the page width is
unlimited, when width is 0, use the terminal width. If width is greater than 0, use the given width.
\r rows
Use an internal pager using rows per page. If rows is �1, stop using the internal pager.
SQL Commands
\D
Dump the complete database. This is equivalent to using the program msqldump(1).
\D table
Dump the given table.
\d
Alias for \dvt.
\d[Stvsfn]+
List database objects of the given type. Multiple type specifiers can be used at the same time. The spec-
ifiers S, t, v, s, f and n stand for System, table, view, sequence, function and schema respectively. Note
that S simply switches on viewing system catalog objects, which is orthogonal to the other specifiers.
\d[Stvsfn]+ object
Describe the given object in the database using SQL statements that reconstruct the object. The same
specifiers as above can be used, following the same rules. When no specifiers are given, vt is assumed.
The object can be given with or without a schema, separated by a dot. The object name can contain the
wildcard characters ***** and **** that represent zero or more, and exactly one character respectively.
An object name is converted to lowercase, unless the object name is quoted by double quotes (”). Ex-
amples of this, are e.g. *.mytable_, _tabletype*, or “myschema.FOO”_. Note that wildcard characters
do not work in quoted objects. Quoting follows SQL quoting rules. Arbitrary parts can be quoted, and
two quotes following each other in a quoted string represent the quote itself.

22
\A
Enable auto commit mode.
\a
Disable auto commit mode.

EXAMPLES

Efficiently import data from a CSV (comma-separated values) file into a table. The file must be readable
by the server. $file is the absolute path name of the file, $table is the name of the table, $db is the
name of the database.
1 mclient −d $db −s "COPY INTO $table FROM ’’$file USING DELIMITERS ’’’,,\\\’n’,\\"’"

Efficiently import data from a CSV file into a table when the file is to be read by mclient (e.g. the server
has no access to the file). $file is the (absolute or relative) path name of the file, $table is the name
of the table, $db is the name of the database.
1 mclient −d $db −s "COPY INTO $table FROM STDIN USING DELIMITERS ’’’,,\\\’n’,\\"’" − < $file

Note that in this latter case, if a count of records is supplied, it should be at least as large as the number
of records actually present in the CSV file. This, because otherwise the remainder of the file will be
interpreted as SQL queries.
See https://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto for more information
about the COPY INTO query.

SEE ALSO

msqldump(1), mserver5(1)

monetdb man-page

NAME

monetdb � control a MonetDB Database Server instance

SYNOPSIS

monetdb [ monetdb_options ] command [ command_options ] [ command_args ]

23
DESCRIPTION

Monetdb allows an administrator of the MonetDB Database Server to perform various operations on
the databases in the server. It relies on monetdbd(1) running in the background for all operations.

OPTIONS

Monetdb_options affect all commands and control the general behavior of monetdb.
�q
Suppresses all standard progress messages, only writing output to stderr if an error occurred.
�h hostname
Connect to hostname instead of attempting a connection over the local UNIX socket. This allows mon-
etdb to connect to a remote monetdbd(1). The use of this option requires �P (see below).
�p port
Connects to the given portnumber instead of the default (50000). Requires �h to be given as option too.
�P passphrase
Specifies the passphrase necessary to login to a remote monetdbd(1). This option requires �h to be
given as well. A bad passphrase causes monetdb to fail to login, and hence fail to perform any remote
action.
�v
Show version, equal to monetdb version.

COMMANDS

The commands for the monetdb utility are create, destroy, lock, release, status, start, stop, kill,
profilerstart, profilerstop, set, get, inherit, discover, help, and version. The commands facilitate
adding, removing, maintaining, starting and stopping a database inside the MonetDB Database Server.
For all commands, database arguments can be glob-like expressions. This allows to do wildcard matches.
For details on the syntax, see EXPRESSIONS.**
create** [�m pattern] [�p password] database [_database_ …]
Initializes a new database in the MonetDB Database Server. A database created with this command
makes it available under its database name, but not yet for use by clients, as the database is put into
maintenance mode. This allows the database administrator to perform initialization steps before releas-
ing it to users, unless the �p argument is supplied. See also monetdb lock. The name of the database
must match the expression [A�Za�z0�9�_]+.**
�m** pattern

24
With the �m flag, instead of creating a database, a multiplex-funnel is created. See section MULTIPLEX-
FUNNEL in monetdbd(1). The pattern argument is not fully the same as a pattern for connecting or
discovery. Each parallel target for the multiplex-funnel is given as username+_password_**@**pattern
sequence, separated by commas. Here the pattern is an ordinary pattern as would be used for connect-
ing to a database, and can hence also be just the name of a database.
�p password
The �p flag allows to create a database with the given password for the monetdb user. Since this protects
the database from being accessed via well-known credentials, the created database is not locked after
creation. This way, a new database can be created and used right away using the password supplied.
destroy [�f] database [database …]
Removes the given database, including all its data and logfiles. Once destroy has completed, all data is
lost. Be careful when using this command.
�f
By default, a confirmation question is asked, however the �f option, when provided, suppresses this
question and removal is executed right away. Note that without this option you cannot destroy a running
database, bring it down first using the stop command.
lock database [database …]
Puts the given database in maintenance mode. A database under maintenance can only be connected to
by an administrator account (by default the monetdb account). A database which is under maintenance
is not started automatically by monetdbd(1), the MonetDB Database Server, when clients request for
it. Use the release command to bring the database back for normal usage. To start a database which
is under maintenance for administrator access, the start command can be used.
release database [database …]
Brings back a database from maintenance mode. A released database is available again for normal use
by any client, and is started on demand. Use the lock command to take a database under maintenance.
status [�lc] [�s states] [database …]
Shows the state of the given database, or, when none given, all known databases. Three modes control
the level of detail in the displayed output. By default a condensed one-line output per database format
is used. This output resembles pretty much the output of various xxxstat programs, and is ideal for
quickly gaining an overview of the system state. The output is divided into four columns, name, state,
health, and remarks. The state column contains two characters that identify the state of the database,
based on Booting (starting up), Running, Stopped, Crashed and Locked (under maintenance). This is
followed by the uptime when running. The health column contains the percentage of successful starts
and stops, followed by the average uptime. The remarks column can contain arbitrary information
about the database state, but usually contains the URI the database can be connected to.
�c
The �c flag shows the most used properties of a database. This includes the state of the database
(running, crashed, stopped), whether it is under maintenance or not, the crash averages and uptime

25
statistics. The crash average is the number of times the database has crashed over the last 1, 15 or 30
starts. The lower the average, the healthier the database is.
�l
Triggered by the �l flag, a long listing is used. This listing spans many rows with on each row one
property and its value separated by a colon (:). The long listing includes all information that is available.
�s
The �s flag controls which databases are being shown, matching their state. The required argument to
this flag can be a combination of any of the following characters. Note that the order in which they are
put also controls the order in which the databases are printed. b, r, s, c, and l are used to print a starting
up (booting), started (running), stopped, crashed and locked database respectively. The default order
which is used when the �s flag is absent, is rbscl.
start [�a] database [database …]**
stop** [�a] database [database …]**
kill** [�a] database [database …]
Starts, stops or kills the given database, or, when �a is supplied, all known databases. The kill com-
mand immediately sends a SIGKILL and should only be used as last resort for a database that doesn’t
respond any more. Killing a database may result in (partial) data loss. It is more common to use the stop
command to stop a database. It will first attempt to stop the database, waiting for mero_exittimeout
seconds and if that fails, kill the database. When using the start command, monetdb(1) will output diag-
nostic messages if the requested action failed. When encountering an error, one should always consult
the logfile of monetdbd(1) for more details. For the kill command a diagnostic message indicating the
database has crashed is always emitted, due to the nature of that command. Note that in combination
with �a the return code of monetdb(1) indicates failure if one of the databases had a failure, even though
the operation on other databases was successful.
profilerstart database [database …]**
profilerstop** database [database …]
Starts or stops the collection of profiling logs for the given database. The property profilerlogpath must
be set for the given database, and it should point to a directory where the logs will be gathered. The
filenames of the logs have the format: proflog<database_YYYY-MM-DD_HH-MM-SS.json_ where the
last part is the date and time when the collection started. Please note that a file recording the pid of the
profiler is written in the log directory, therefore each database needs to have a different profilerlogpath
value.
get <all | property[,property[,..]] [database …]
Prints the requested properties, or all known properties, for the given database. For each property its
source and value are printed. Source indicates where the current value comes from, e.g. the configura-
tion file, or a local override.
set property=_value database_ [database …]
Sets property to value for the given database. For a list of properties, run monetdb get all. Most
properties require the database to be stopped when set.**
shared=<yes|no**|tag

26
Defines if and how the database is being announced to other monetdbds or not. If not set to yes or no
the database is simply announced or not. Using a string, called tag the database is shared using that
tag, allowing for more sophisticated usage. For information about the tag format and use, see section
REMOTE DATABASES in the monetdbd(1) manpage. Note that this property can be set for a running
database, and that a change takes immediate effect in the network.
nthreads=_number_
Defines how many worker threads the server should use to perform main processing. Normally, this
number equals the number of available CPU cores in the system. Reducing this number forces the
server to use less parallelism when executing queries, or none at all if set to 1.
optpipe=_string_
Each server operates with a given optimizer pipeline. While the default usually is the best setting,
for some experimental uses the pipeline can be changed. See the mserver5(1) manpage for available
pipelines. Changing this setting is discouraged at all times.
readonly=<yes|no
Defines if the database has to be started in readonly mode. Updates are rejected in this mode, and the
server employs some read-only optimizations that can lead to improved performance.
nclients=_number_
Sets the maximum amount of clients that can connect to this database at the same time. Setting this
to a high value is discouraged. A multiplex-funnel may be more performant, see MULTIPLEX-FUNNEL
below.
inherit property database [database …]
Like set, but unsets the database-local value, and reverts to inherit from the default again.
discover [expression]
Returns a list of remote monetdbds and database URIs that were discovered by monetdbd(1). All
databases listed can be connected to via the local MonetDB Database Server as if it were local databas-
es using their database name. The connection is redirected or proxied based on configuration set-
tings. If expression is given, only those discovered databases are returned for which their URI match-
es the expression. The expression syntax is described in the section EXPRESSIONS. Next to database
URIs the hostnames and ports for monetdbds that allow to be controlled remotely can be found in
the discover list masked with an asterisk. These entries can easily be filtered out using an expres-
sion (e.g. ”mapi:monetdb:*”) if desired. The control entries come in handy when one wants to get an
overview of available monetdbds in e.g. a local cluster. Note that for monetdbd to announce its control
port, the mero_controlport setting for that monetdbd must be enabled in the configuration file.
�h
help [command]
Shows general help, or short help for a given command.
�v

27
version
Shows the version of the monetdb utility.

EXPRESSIONS

For various options, typically database names, expressions can be used. These expressions are limited
shell-globbing like, where the * in any position is expanded to an arbitrary string. The * can occur
multiple times in the expression, allowing for more advanced matches. Note that the empty string also
matches the , hence “de*mo” can return “demo” as match. To match the literal ’*’ character, one has
to escape it using a backslash, e.g. ”\”.

RETURN VALUE

The monetdb utility returns exit code 0 if it successfully performed the requested command. An error
caused by user input or database state is indicated by exit code 1. If an internal error in the utility occurs,
exit code 2 is returned.

SEE ALSO

monetdbd(1), mserver5(1)

monetdbd man-page

NAME

monetdbd � the MonetDB Database Server daemon

SYNOPSIS

monetdbd command [ command_args ] dbfarm

DESCRIPTION

Monetdbd is the MonetDB Database Server daemon. The program is mainly meant to be used as
daemon, but it also allows to setup and change the configuration of a dbfarm. The use of monetdbd is
either as user-oriented way to configure, start and stop a database farm, or to be started from a startup
script, such as from /etc/init.d/ on Linux systems or smf(5) on Solaris systems, as part of a system
startup.

28
Monetdbd is the system formerly known as merovingian. It was renamed to monetdbd since the name
merovingian proved to be confusing to most regular end-users. Internally, monetdbd uses the name
merovingian at many places for historical reasons.
A monetdbd instance manages one local cluster based, which is a directory in the system, referred to
as the dbfarm. Nowadays, the dbfarm location always has to be given as argument to monetdbd.
Within its local cluster monetdbd takes care of starting up databases when necessary, and stopping them
either upon request via monetdb(1) or when being shut down. Client database connections are made
against monetdbd initially which redirects or proxies the client to the appropriate database process,
started on the fly when necessary.
When started, monetdbd runs by default in the background, sending log messages to merovingian.log,
until being sent a stop, terminate or interrupt signal, possibly using the stop command of monetdbd.
monetdbd uses a neighbor discovery scheme to detect other monetdbd processes running in the local
network. Databases from those remote instances are made available to a locally connecting client.
Remote databases never override local databases, and their availability is controlled by the remote
monetdbd process. See also the sharing capabilities of monetdb(1) and the REMOTE DATABASES section
below.

COMMANDS

The commands for monetdbd are create, start, stop, get, set, version, and help. The commands
facilitate initializing a dbfarm, starting and stopping the MonetDB Database Server, and retrieving or
setting options.**
create dbfarm**
Initializes a new database farm, such that a MonetDB Database Server can be started on that location.
All necessary directories are attempted to be created, and an initial properties file is created in the
directory itself. dbfarm must be a location addressable in the local filesystem hierarchy.
start [�n] dbfarm
Starts monetdbd, the MonetDB Database Server, on the given dbfarm. When the �n flag is given,
monetdbd will not fork into the background, but instead remain attached to the calling environment,
until given a stop signal.
stop dbfarm
Sends a stop signal to the monetdbd process responsible for the given dbfarm.
get <all | property[,property[,..]] dbfarm
Prints the requested properties, or all known properties, for the given dbfarm. For each property, its
value is printed. Some properties are virtual, and given for information purposes only, they cannot be
modified using the set command.
set property=_value dbfarm_

29
Sets property to value for the given database. For a list of properties, run monetdbd get all. Some
properties require a restart of the MonetDB Database Server in order to take effect. The set command,
will however always write the property, and tell the running monetdbd to reload the properties file (if
running). For an explanation of the properties, see the CONFIGURATION section below.

CONFIGURATION

Monetdbd reads its properties from the .merovingian_properties file inside the dbfarm. This file is
created by the create command. This file is not meant to be edited manually, instead it should be
updated using the set command. The following properties can be set:**
logfile**
This property points to the file where all log messages are written to. It is relative to the dbfarm directory,
but can be absolute to point to e.g. another medium. Changing this property takes effect immediately
at runtime.
pidfile
Monetdbd stores the process ID of the background server in the file pointed to by this property. The
same rules apply as for the logfile property.
sockdir
For faster access, monetdbd uses UNIX domain sockets for its control mechanism and regular database
connections. The sockets are placed as files in the filesystem hierarchy. The sockdir property controls
in which directory they are placed. In general this setting should not be changed.
port
This property specifies which TCP port monetdbd should listen to for connection requests. Defaults to
50000.
listenaddr
This property specifies an address that is allowed to connect to the server. The user can specify one
IP(v6) address, or use the 0.0.0.0 notation to allow connections from everywhere. Defaults to localhost
(127.0.0.1).
control
For remote management of monetdbd, the control property specifies whether or not to enable remote
management. Note that for remote management, a passphrase is required, see below. It defaults to
false for security reasons. Changing this property takes effect immediately at runtime.
passphrase
To control monetdbd from a remote machine, a passphrase is necessary, to be given to monetd-
b(1). The passphrase can be either given as hashed value prefixed by the hash type in curly braces
(e.g. {SHA512}xxx…) or as plain text value which will be hashed automatically. Note that the only hash
accepted is the one specified at configure time, which is SHA512. Changing this property takes effect
immediately at runtime.

30
discovery
Specifies whether neighbor discovery is to be enabled using UDP broadcasts or not. The broadcasts
are done on the same portnumber as the port setting.
discoveryttl
Monetdbd publishes locally available databases to others periodically. The interval used here, defined in
seconds, depends on the time-to-live of the databases before they need to get refreshed. The default is
600 seconds (10 minutes), which should keep traffic in your network fairly low. Additions and removals
are processed immediately regardless of this timeout. If you are in a network environment where
physical network links disappear often, you may want to decrease this value to more quickly remove
no longer reachable databases.
exittimeout
mservers that were started by the MonetDB Database Server are shut down when monetdbd is shut
down. Setting the exittimeout property to a positive non-zero value will shut down each running
mserver with the given time-out in seconds. If the time-out expires, the mserver process is killed using
the SIGKILL signal. A time-out value of 0 means no mservers will be shut down, and hence they will
continue to run after monetdbd has shut down. Note that this particular configuration is extremely
inconvenient. The default time-out is 60 seconds. If your databases are rather large and find your
databases consistently being killed by monetdbd upon shutdown, you may want to increase this time-
out. Changing this property takes effect immediately at runtime.
forward
monetdbd has two ways in which it can “attach” a connecting client to the target database. The first
method, redirect, uses a redirect sent to the client with the responsible mserver process. The second
method, proxy, proxies the client to the mserver over monetdbd. While redirect is more efficient,
it requires the connecting client to be able to connect to the mserver. In many settings this may be
undesirable or even impossible, since a wide range of open ports and routing are necessary for this. In
such case the proxy technique of monetdbd is a good solution, which also allows a monetdbd instance
on the border of a network to serve requests to nodes in the local (unreachable) network. Note that
for local databases, the proxy method uses a UNIX domain socket feature to pass file-descriptors to the
local mserver. This effectively is as efficient as the redirect approach, but still hides away the mservers
properly behind monetdbd. Hence, in practice it is only relevant for connections to remote databases
to use redirects instead of proxies. Changing this property takes effect immediately at runtime.

REMOTE DATABASES

The neighbor discovery capabilities of monetdbd allow a user to contact a remote database transparently,
as if it were a local database. By default, all local databases are announced in the network, such that
neighbors can pick them up to make them available for their local users. This feature can be disabled
globally, or on database level. For the latter, the monetdb(1) utility can be used to change the share
property of a database.
While neighbor discovery in itself is sufficient to locate a database in a cluster, it is limited in expres-
siveness. For instance, database names are assumed to be unique throughout the entire system. This

31
means local databases overshadow remote ones, and duplicate remote entries cannot be distinguished.
To compensate for this, monetdbd allows to adds a tag to each database that is being shared. This tag
is sent in addition to the database name, and only understood by other _monetdbd_s.
Tags are arbitrary ASCII-strings matching the pattern [A�Za�z0�9./]+. There are no assumed semantics in
the tag, which allows for multiple approaches when using the tag. The tag is always used in combination
with the database name. For this, the ‘/’ character is used as separator, which hence suggests the user
to use that character as separator for multilevel tags. Monetdbd allows common path globbing using
‘*’ on tags, which allows for many use-cases. Consider for instance the following three databases with
their tag:
dbX/master/tableQ dbY/slave/tableQ dbZ/slave/tableQ
A default match has implicit ‘/*’ added to the search, making more generic search strings match more
specific ones. Hence, a connect with database dbX is the same as dbX/_ and hence matches dbX/mas-
ter/tableQ. Similar, a database connect for /master_ matches the same database as before. Note that the
implicit ‘/’ is not added if that would cause no matches, such as for /master/tableQ which would return
all masters for tableQ, which in our hypothetical example is only dbX. In contrast, a database connect
for _*/slave/tableQ_ matches with either dbY or dbZ. Monetdbd returns the two options to the client
in a round-robin fashion, such that subsequent connects for the same pattern result in a load-balanced
connect to either of both databases.
With tags in use, one can possibly make distinction between databases, if setup like that. The previous
example could hence also be setup like this:
tableQ/master tableQ/slave tableQ/slave
Connecting to tableQ/slave would now return either of both databases even though they are not unique
(apart from the host they are located on, which is not shown in the example). While being confusing for
humans, for monetdbd it is the same situation as in the previous example. However, because globbing
allows to make things easier to understand, tags for both slaves could be changed to slaveX or slave/X
and use the necessary pattern to match them. It is up to the user to decide how to use the tags.

MULTIPLEX-FUNNELS

Monetdbd implements multiplex-funnel capabilities. As the name suggests two techniques are com-
bined, the multiplexer and the funnel.
The funnel capability limits the access to the database to one client at a time. That is, if multiple clients
connect to the funnel, their queries will be serialized such that they are executed one after the other.
An effect of this approach is that clients no longer have an exclusive channel to the database, which
means that individual queries from one client may have been interleaved with queries from others. This
most notably makes SQL transaction blocks unreliable with a funnel. The funnel, hence, is meant to
scale down a large amount of clients that perform short-running (read-only) queries, as typically seen
in web-based query loads.
When a funnel is defined to use multiple databases, the funnel adds a multiplexer to its query channel. A
multiplex-funnel sends each query to all of the defined databases. This behavior can be quite confusing
at first, but proves to be useful in typical sharding configurations, where in particular simple selection

32
queries have to be performed on each of the shards. The multiplexer combines the answers from
all defined databases in one single answer that it sends back to the client. However, this combining
is without any smart logic, that is, the multiplexer does not evaluate the query it is running, but just
combines all answers it receives from the databases. This results in e.g. as many return tuples for a
**SELECT COUNT(*)** query, as there are databases defined.
Due to the two above mentioned characteristics, a multiplex-funnel has some limitations. As mentioned
before, transactions over multiple queries are likely not to result in the desired behavior. This is due to
each query to the funnel is required to be self-contained. Further, since for each query, the results from
multiple servers have to be combined into one, that query must only return a single response, i.e. multi-
statement queries are most likely causing the funnel to respond with an error, or return garbled results.
Last, the size of each query is limited to currently about 80K. While this size should be sufficient for
most queries, it is likely not enough for e.g. COPY INTO statements. Apart from the data transfer
implications, such statements should not be used with the funnel, as the results will be undefined due
to the limited query buffer. Applications using the funnel should aim for short and single-statement
queries that require no transactions.
See the create command in the monetdb(1) man-page for details on how to setup a multiplex-funnel.

SIGNALS

Monetdbd acts upon a number of signals as is common for a daemon.** SIGINT, SIGTERM, SIGQUIT**
Any of these signals make monetdbd enter the shutdown sequence. This sequence involves cleanly
shutting down listener sockets, shutting down all started databases and finally terminating itself.
SIGHUP
When this signal is received by monetdbd it will reopen the logfile as pointed to by the logfile setting.
Before it reopens the logfile, it will re-read the properties file from the dbfarm, which might result in
opening a different file to continue logging.

RETURN VALUE

Monetdbd returns exit code 0 if it was able to successfully perform the requested action, e.g. start, stop,
etc. When an error occurs during the action, that prevents monetdbd from successfully performing the
action, the exit code 1 is returned.

SEE ALSO

monetdb(1), mserver5(1)

33
mserver5 man-page

NAME

mserver5 � the MonetDB server version 5

SYNOPSIS

mserver5 [ options ]

DESCRIPTION

Mserver5 is the current MonetDB server that performs all processing on request of clients for a certain
database.
Note that while mserver5 is the process that does the actual work, it is usually more common to start,
monitor and connect to the mserver5 process through monetdbd(1).
This man-page describes the options that mserver5 understands. It is intended for people who really
need to work with mserver5 itself. In regular cases, the programs monetdbd(1) and monetdb(1) control
the many options, and allow to adjust them to appropriate values where sensible. For normal usage, it
is preferred to apply any configuration through these programs.

OPERATION

When the build-time configuration did not disable this, a mserver5 process presents the user with a
console prompt. On this prompt, MAL commands can be executed. The architecture is setup to handle
multiple streams of requests. The first thread started represents the server, which is the console prompt,
reading from standard input and writing to standard output.
The server thread started remains in existence until all other threads die. The server is stopped by Ctrl-D
on its console, typing quit() or by sending it a termination signal (SIGINT, SIGTERM, SIGQUIT).

MSERVER5 OPTIONS

Mserver5 can be started with options and scripts as arguments. The MAL scripts will be executed directly
after startup on the console, which eases e.g. testing of MAL scripts directly, without starting a client.**
��dbpath=**path
Path where mserver5 should find a database. Shorthand for option gdk_dbpath. Default value: ${pre-
fix}/var/monetdb5/dbfarm/demo.
��dbextra=_path_
Path where mserver5 should store transient data. Default value is the value of the ��dbpath option.

34
��dbinit=_stmt_
MAL statement to execute as part of the startup of the server.
��config=_file_
Config file to read options from. This file can contain all options as can be set with the –set flag. See
CONFIG FILE FORMAT.
��daemon=yes
��daemon=no
Disable the console prompt, do not read commands from standard input. Default: no.
��set option=_value_
Set individual configuration option. For possible options, see PARAMETERS sections.
��help
Print list of options.
��version
Print version and compile configuration.
��trace
Print information about MAL instructions as they are executed.

GDK PARAMETERS

GDK (Goblin Database Kernel) is the current columnar storage kernel engine of the MonetDB 5 database.
It is the component that manages and performs operations on BATs (Binary Association Tables), single
columns. The parameters here affect the behaviour of GDK which may nagatively impact performance if
set wrongly. The kernel tries to choose the values for optimal performance. Changing these parameters
is discouraged.**
gdk_vmtrim**
Enable or disable the vmtrim thread which tries to unload memory that is not in use. Default: yes on
32 bit platforms, no on 64 bit platforms
gdk_debug
You can enable debug output for specific kernel operations. By default debug is switched off for obvious
reasons. The value of gdk_debug is an integer, which value can be (a combination of):
1 1 = THRDMASK = thread-specific debug output
2 2 = CHECKMASK = property enforcing on new BATs
3 4 = MEMMASK = memory allocation
4 8 = PROPMASK = property checking on all values:
5 tells about wrongly set properties
6 16 = IOMASK = major IO activity
7 32 = BATMASK = BAT handling

35
8 128 = PARMASK = Thread management
9 256 = HEADLESSMASK = Warn about BAT heads that are not "headless-ready"
10 512 = TMMASK = Transaction management
11 1024 = TEMMASK = Locks and Triggers
12 4096 = PERFMASK = BBP Performance (?)
13 8192 = DELTAMASK = Delta debugging (?)
14 16384 = LOADMASK = Module loading

2097152 = ALGOMASK = show join/select algorithm chosen 4194304 = ESTIMASK = show result size
estimations (for join, select) 16777216 = NOSYNCMASK = disable forcefully synchronizing files to disk.
If this flag is set, there is no guarantee that the database remains consistent after a crash. DO NOT USE
(unless you really don’t care about your data). 33554432 = DEADBEEFMASK = disable “cleaning”
of freed memory in GDKfree() (e.g., for performance measurements) 67108864 = ALLOCMASK =
exhaustive GDK malloc & free tracing for debugging (GDK developers, only) 134217728 = OPTMASK
= trace the actions, decisions and effects of MAL optimizers 268435456 = HEAPMASK = trace/debug
HEAPextend; used only for development & debugging 536870912 = FORCEMITOMASK = forcefully
activate mitosis even on small tables, i.e., split small tables in as many (tiny) pieces as there are cores
(threads) available; this allows us to test mitosis functionality without requiring large data sets (— at
the expense of a potentially significant interpretation overhead for unnecessarily large plans); used only
for development & testing; set automatically by Mtest.py
Note that mserver5 recognizes a series of command line options that sets one or more of these debug
flags as well:
��threads (THRDMASK | PARMASK) ��memory (MEMMASK | ALLOCMASK) ��properties (CHECKMASK
| PROPMASK | BATMASK) ��io (IOMASK | PERFMASK) ��heaps (HEAPMASK) ��transactions (TMMASK
| DELTAMASK | TEMMASK) ��modules (LOADMASK) ��algorithms (ALGOMASK | ESTIMASK) ��perfor-
mance (DEADBEEFMASK) ��optimizers (OPTMASK) ��forcemito (FORCEMITOMASK | NOSYNCMASK)
Default: 0.

MSERVER5 PARAMETERS

Mserver5 instructs the GDK kernel through the MAL (MonetDB Assembler Language) language. Monet-
DB 5 contains an extensive optimiser framework to transform MAL plans into more optimal or functional
(e.g. distributed) plans. These parameters control behaviour on the MAL level.**
monet_vault_key**
The authorisation tables inside mserver5 can be encrypted with a key, such that reading the BATs does
not directly disclose any credentials. The monet_vault_key setting points to a file that stores a secret
key to unlock the password vault. It can contain anything. The file is read up to the first null-byte (’ ’),
hence it can be padded to any length with trailing null-bytes to obfuscate the key length. Generating
a key can be done for example by using a tool such as pwgen and adding a few of the passwords
generated. Make sure not to choose a too small key. Note that on absence of a vault key file, some
default key is used to encrypt the authorisation tables. Changing this setting (effectively changing the
key) for an existing database makes that database unusable as noone is any longer able to login. If you
use monetdbd(1), a per-database vault key is set.

36
max_clients
Controls how many client slots are allocated for clients to connect. This settings limits the maximum
number of connected clients at the same time. Note that MonetDB is not designed to handle mas-
sive amounts of connected clients. The funnel capability from monetdbd(1) might be a more suitable
solution for such workloads.
Default 64.

SQL PARAMETERS

The SQL component of MonetDB 5 runs on top of the MAL environment. It has its own SQL-level
specific settings.**
sql_debug**
Enable debugging using a mask. This option should normally be disabled (0). Default: 0.
sql_optimizer
The default SQL optimizer pipeline can be set per server. See the optpipe setting in monetdb(1) when
using monetdbd. During SQL initialization, the optimizer pipeline is checked against the dependency
information maintained in the optimizer library to ensure there are no conflicts and at least the pre-
requisite optimizers are used. The setting of sql_optimizer can be either the list of optimizers to run,
or one or more variables containing the optimizer pipeline to run. The latter is provided for readability
purposes only. Default: default_pipe.
The following are possible pipes to use:**
minimal_pipe**
The minimal pipeline necessary by the server to operate correctly. minimal_pipe=inline,remap,deadcode,multiplex,gene
default_pipe
The default pipeline contains the mitosis-mergetable-reorder optimizers, aimed at large tables and im-
proved access locality. default_pipe=inline,remap,costModel,coercions,evaluate,emptybind,pushselect,aliases,mitosis
no_mitosis_pipe
The no_mitosis pipeline is identical to the default pipeline, except that optimizer mitosis is omitted. It is
used mainly to make some tests work deterministically, and to check/debug whether “unexpected” prob-
lems are related to mitosis (and/or mergetable). no_mitosis_pipe=inline,remap,costModel,coercions,evaluate,emptybin
sequential_pipe
The sequential pipeline is identical to the default pipeline, except that optimizers mitosis & dataflow are
omitted. It is use mainly to make some tests work deterministically, i.e., avoid ambigious output, by
avoiding parallelism. sequential_pipe=inline,remap,costModel,coercions,evaluate,emptybind,pushselect,aliases,merge

37
CONFIG FILE FORMAT

The configuration file readable by mserver5 consists of parameters of the form “name=value”. The file
is line-based, each newline-terminated line represents either a comment or a parameter. Only the first
equals sign in a parameter is significant. Whitespace before or after the first equals sign is not stripped.
Trailing whitespace in a parameter value is retained verbatim. Any line beginning with a hash (#) is
ignored, as are lines containing only whitespace. The values following the equals sign in parameters are
all a string where quotes are not needed, and if written be part of the string.

SEE ALSO

monetdbd(1), monetdb(1), mclient(1)

msqldump man-page

NAME

msqldump � dump a MonetDB/SQL database

SYNOPSIS

msqldump [ options ] [ dbname ]

DESCRIPTION

MonetDB is a database management system that is developed from a main-memory perspective with
use of a fully decomposed storage model, automatic index management, extensibility of data types and
search accelerators, SQL- and XML- frontends.
Msqldump is the program to dump an MonetDB/SQL database. The dump can be used to populate a
new MonetDB/SQL database.
Before msqldump starts parsing command line options, it reads a .monetdb file. If the environment
variable DOTMONETDBFILE is set, it reads the file pointed to by that variable instead. When unset,
msqldump searches for a .monetdb file in the current working directory, and if that doesn’t exist, in the
current user’s home directory. This file can contain defaults for the flags user and password. To disable
reading the .monetdb file, set the variable DOTMONETDBFILE to the empty string in the environment.

OPTIONS

��help (�?)

38
Print usage information and exit.
��database=_database_ (�d database)
Specify the name of the database to connect to. The �d can be omitted if it is the last option.
��host=_hostname_ (�h hostname)
Specify the name of the host on which the server runs (default: localhost).
��port=_portnr_ (�p portnr)
Specify the portnumber of the server (default: 50000).
��user=_user_ (�u user)
Specify the user to connect as. If this flag is absent, the client will ask for a user name.
��describe (�D)
Only dump the database schema.
��inserts (�N)
When dumping the table data, use INSERT INTO statements, rather than COPY INTO + CSV values.
INSERT INTO statements are more portable, and necessary when the load of the dump is processed by
e.g. a JDBC application.
��functions (�f)
Only dump functions definitions.
��table=_table_ (�t table)
Only dump the specified table.
��quiet (�q)
Don’t print the welcome message.
��Xdebug (�X)
Trace network interaction between mclient and the server.

SEE ALSO

mclient(1), mserver5(1)

Readonly database

When working in a regular mode, the query processing spans the basic column storage and the data
changed by transactions (inserts, updates, and deletes). For instance, the inserted data, kept in a sepa-
rate delta structure, are added to the column before other operations in the plan. In a ‘read-only mode’

39
modifying transactions are not allowed and the query processing spans only over the basic column
storage. One of the consequences is that query execution may become faster.
The readonly mode is administered with a boolean variable, which can be set by the system administra-
tor using the monetdb tool:
1 shell> monetdb set readonly=yes *<mydatabasename>*

It ensures that all tables are accessed in read only mode. Moreover, the database is protected against
any catalog change, the user can not even create a temporary table for keeping an intermediate result
around.
The alternative route is to deploy SQL schemas in combination with user authentication and access
control grants to selectively allow users access to the database. See the corresponding description in
the SQL manual.

System resources

MonetDB has been designed to use the available resources without many tuning knobs. Functional
enhancements are collected in autoloading scripts, which are executed when a database is created.
Other resources are mostly gathered upon need from the operating system environment. MonetDB
aims to use as much of the main memory available, as many cores as can be practically deployed in
parallel processing of queries, and trying to avoid going to a slow disk. As such, it is a rather aggressive
system, which find immediate competition with ongoing heavy demand applications, e.g. a browser
with a ton of open windows.
We recommend to use MonetDB in production settings on a dedicated server or a sizeable Cloud server.
• Autoloading scripts

• Memory footprint

• Multi-core usage

• Data compression

• Disk space

• OOM Killer

• Server debugging

40
Autoloading scripts

Autoloading is used to initialize a new SQL catalog with scripts containing tables, functions, and proce-
dures while creating a new database with monetdb. Such scripts are kept in the MonetDB library area
./lib/monetdb5/createdb/ . Look them up for the details. The prefix number is used to direct the order
of their execution. A summary of the current collection is illustrated below

Script Description
09_like.sql String regular pattern matching using PCRE
10_math.sql Polar arithmetic
11_times.sql OS timer and io
12_url.sql URL manipulations (see manual)
13_date.sql POSIX formatted date coercions ((see man strptime/strftime)
14_inet.sql PostgreSQL inet data type (see manual)
15_querylog.sql Query history management (see manual)
16_tracelog.sql Query trace table (see manual)
17_temporal.sql Additional date, time and timestamp functions
18_index.sql Experimental Ordered index functions
20_vacuum.sql Table vacuum commands (experimental!)
21_dependency_views.sql Schema object dependencies analysis
22_clients.sql Client administration procedures
23_skyserver.sql Application support library
25_debug.sql Debugging tools for optimziers
26_sysmon.sql Active SQL query queue management
27_rejects.sql COPY into reject management
39_analytics.sql Analytic functions
40_geom.sql Geometry extension library (see manual)
40_json.sql Json data type support functions
41_md5sum.sql md5 checksum function
45_uuid.sql UUID data type functionality
46_profiler.sql start/stop collecting MonetDBS profiler traces in a profiler_logs
pool with a specific heartbeat
51_sys_schema_extension.sql Additional tables / views in “sys” schema.
60_wlcr.sql Workload Capture and Replay functions.
75_storagemodel.sql Storage footprint analysis and prediction.
80_statistics.sql Analyze tables for optimizer statistics
80_udf.sql Example C UDF functions.
90_generator.sql Table producing functions to generate series of numbers or
timestamps
99_system.sql System administration (internal)

Along the same line, MonetDB library area ./lib/monetdb5/autoload/ contains MAL scripts to be executed
each time the server is started.

41
Memory footprint

MonetDB requires all data that needs to be active at any given point in time to fit into the address space
— and of course to fit on the storage device, (i.e., your disk system). On 32-bit systems, the address
space is at most 32-bit (4 GB); in practice, it is actually limited to 3 GB or even 2 GB on most systems.
On 64-bit systems, the address space can theoretically be 64-bit, but in practice is often “limited” to
48-bit or so — not that that makes any difference …
MonetDB excessively uses main memory for processing, but does not require that all data fit in the
available physical memory. To handle a dataset that exceeds the available physical memory, MonetDB
does not (only) rely on the available swap space, but (also) uses memory-mapped files to exploit disk
storage beyond the swap space as virtual memory.
For example, while bulk-loading data (preferably using a COPY INTO statement from a (possibly com-
pressed) CSV file), MonetDB needs to have all columns of the table that is currently being loaded
“active”, i.e., accessible in the address space. However, during loading, parts of the data are continu-
ously written to the persistent files on disk, i.e., the whole table does not have to fit into main memory.
E.g., loading a 100 GB table works fine on a system with 8 GB RAM and 16 GB swap – provided there
is sufficient free disk space.
During query processing, MonetDB requires for each single MAL operation during the query execution
that all its inputs, its outputs, and possible temporary data structures fit in the address space. This
won’t be a problem on 64-bit machines, but may be experienced if you are close to the limit on 32-
bit machines. MonetDB automatically resorts to virtual memory and memory-mapped files for large
intermediate results. Also (large) persistent tables are accessed using memory mapping.
While running, you might see your mserver5 process’ virtual size grow well beyond the available phys-
ical memory and possibly also well beyond your swap space. In principle, this is not a problem at all.
Most of this virtual size is due to the fact that large base tables (or intermediate results) that reside as
files on disk are memory-mapped into the address space. Those parts of the data that are currently not
accessed do not consume any physical memory (except possibly for caching purposes).
However, if individual columns of your table(s) and/or individual columns of intermediate results exceed
the size of the available physical memory, the performance of MonetDB might (will) decrease due to
increased I/O requirements.

Multi-core usage

Modern systems come with many CPUs on a single die and further augmented with hyperthreading to
increase parallel processing. MonetDB will use as much of the cores as possible for both inter- and
intra- parallelism.
A command line argument gdk_nr_threads=<number> can be set to experiment with a limited number
of cores assigned to each user session.

42
Data compression

A distinctive feature of column stores is to apply aggressive data compression. However, compression is
often a two-edged sword, where movement of large data files over relative slow networks, disk access or
memory interconnects is compensated for by applying CPU cycles. The effectiveness of which strongly
depends on the mapping from database schema into data structures, their high maintenance cost, the
relational algorithms, the system architecture, and the data distribution. The compression ratios cited
depends on the input size, which is commonly assumed to be in CSV format, the data distribution, and
the database storage footprint, with or without auxiliary data structures like indices.
MonetDB applies different compression techniques automatically at many levels.
The column store representation is highly optimized, where the basic storage structure is a dense array,
i.e. without holes to accommodate future insertions or overhead caused by the data structure itself
(e.g. B-trees). This dense representation allows for direct mapping of the database files into memory.
The storage width ranges from 1 (byte) to 8 bytes (doubles). NULL values are part of the domain space,
which avoids auxiliary bit masks at the expensive of ‘loosing’ a single value from the domain.
All strings are stored using dictionary encoding. This significantly reduces their storage space, but with
larger dictionaries the maintenance cost may become expensive. Therefore for really large dictionary
tables, MonetDB resort to non-compressed string representation. The references into the dictionary
table occupy anywhere from 1 to 8 bytes, depending on the number of elements.
During query evaluation, a dense range of results is represented by a column view. This is a small
footprint representation of the result set. It avoids both copying the result and storing it in its private
column structure.

Disk space

The disk space footprint is determined by the way columns are being stored. MonetDB uses dictionary
encoding for string columns, but aside from this there is no default compression applied to reduce the
disk footprint. The prime reason being the two-headed sword of compression. It saves (cheap) disk
space and IO bandwidth at the cost of expensive CPU (de)compression overhead (See compression).
Since all columns are memory mapped upon access, i.e. they need not be decompressed. If disk space
comes at a premium and memory residency can be guaranteed for a long time, then a compressed
file system, e.g. BTRFS, may become helpful. Its compression behavior is often as good as dedicated
algorithms executed within the critical path of a query execution.
The disk footprint can be assessed using the (Linux) command ‘du’ on the dbfarm directory or to run
the query ‘select * from storage();’, provided the sql extensions are pre-loaded into your database. (See
storage model)

Running out of diskspace

One of the features of MonetDB’s execution model is that all intermediates are materialized as memory
mapped files. A consequence of this approach is that when memory is too small to keep them around,

43
they will be swapped to disk by the operating system. This can be seen as a decaying free space and
ultimately a full disk. In turn this (should) lead to a single transaction abort and removing its disk claim.
Evidently, filling your disk depends on the number of users and the complexity of their queries. It aids
to the cost of running COPY INTO and queries concurrently.
If you have limited resources for concurrent access then the monetdb funnel may be an option to
serialize the user requests. A single server can have multiple funnels.

OOM Killer

The MonetDB server may become a victim of Linux kernel functionality called Out Of Memory Killer
(or OOM Killer) responsible for dealing with excessive memory requirements. If the system reaches a
point where it may run out of memory, OOM Killer looks for victim process and ends its life the hard
way.
In most cases, a simple restart of the server will suffice, but when the killer enters during a complex
update transaction then the database may end-up in an inconsistent state.

Server debugging

In most cases, the system produces informative error messages. However, there are situations where
MonetDB enters an area not covered by the test suite or previous use and a segmentation fault occurs.
These cases are hard to analyze outside the development lab. To isolate and resolve the issue we need
at least the following information.
Sent us the result of the command mserver5 --version --dbname=<databasename> or the equivalent
using monetdb --version <databasename>
Is the error reproducible with a small (5-10 line) script/query/program? Trim your experiment to the
minimal size that demonstrates the erroneous behavior. Such a script is the best we can hope for,
because it will end up in the nightly testing.
In addition, follow the steps, assuming that you are logged onto the same Linux (!) machine as where
the server will run:
1. Sent the tail of the merovingian log (100 lines) located in the dbfarm directory.
2. Start the database server either with monetdb or as a full command line mserver --dbinit="..."
.
3. Lookup the process id for this server using the command ps -al |grep mserver5
4. Attach the debugger with gdb mserver5 <processid>
5. Run your experiment again until it hits the segmentation fault, which you see emerging in window.
6. Then obtain a stack trace with the command where. It shows of the context of the crash using
the list command and the variables used in the context of the crash.
7. Collect the status of all threads using the command thr app all bt.

44
SQL dump and restore

An SQL dump of a database is a common method to safely store away a snapshot of the database
for archival purposes or to migrate data between database instances, e.g. between two major system
releases. The content of a SQL dump is a large collection of SQL commands in ASCII. Running the script
will recreate the database in the same state as it was when the dump was created. Since MonetDB does
not provide global locking schemes, a time-consuming dump operation may become invalidated by a
concurrent update query.
The primary tool to consider for making an ASCII dump is msqldump, which includes a wide variety of
options:.
Usage: msqldump options [ dbname ]
Options are:

Option or Option Comment


-h hostname –host=hostname host to connect to
-p portnr –port=portnr port to connect to
-u user –user=user user id
-d database –database=database database to connect to
-f –functions dump functions
-t table –table=table dump a database table
-D –describe describe database
-N –inserts use INSERT INTO statements
-q –quiet don’t print welcome message
-X –Xdebug trace mapi network interaction
-? –help show this usage message

Note 1
–functions and –table are mutually exclusive

It act as a normal database application, e.g. it runs concurrently with all other user sessions, and dumps
the generated SQL statements onto standard output. Safe this to a file for later a later restore session
by calling mclient with the saved session as argument. Details on the various arguments can be found
in the manual page for mclient.
For example, consider you have already installed the SQL tutorial database voc on a Linux platform and
wishes to transport it to another machine. Then the following command line option generates the dump
file.
1 shell> msqldump --database=voc >/tmp/voc.sql

You can inspect the file /tmp/voc.sql to confirm that indeed a readable database dump is available. If
storage space is a problem, you can readily pipe the output of this command to a compression tool,

45
e.g. gzip, or sent it directly to another machine in your environment using conventional Linux tools,
e.g. scpy. As a baseline for recovery, move the file over to the new machine. Then the monetdb tool
can be used to create the database on the recipient machine. Once done, it suffices to feed the dump
file to mclient to populate the database.
Creation of the dump respects your credentials, which means you only can dump the tables you have
access to.
Migration of a database from other database systems follow the same route, but be aware that SQL
dialects often differ. A manual or scripted patch of a foreign SQL dump is often needed. An overview
of the various SQL dialects can be found in the SQL dialects book and general background on database
compatibility on Wikipedia.
1 shell> monetdb release demo

Windows platforms

On Windows platforms we recommend to always dump the old database into ASCII before installing a
new MonetDB release. Subsequently remove the dbfarm. After installation, the dump can be restored.
To dump the SQL database, start the MonetDB SQL Client program and type the command
1 \>...\databasedump.sql
2 \D
3 \>

The path after > should be an absolute path name (i.e. start with a drive letter) and be in a save location.
By default the database is located in %APPDATA%\MonetDB5. After having made a database dump it can
be removed. This folder is located inside the dbfarm\demofolder.
Restoring the SQL database can be done using the MonetDB SQL Client program with the following
command
1 \<...\databasedump.sql

File system level backup

Aside from the ASCII dumps, one can also take the database server out of production using the following
commands. Be careful in using shutdown(), because there may be users running transactions. This can
be seen using the sys.sessions table or the system monitor.
1 shell> mclient -d demo
2 select * from sys.sessions;
3 call sys.shutdown(10);
4 <cntrl-d>
5 shell> monetdb stop demo
6 shell> monetdb lock demo

46
The “lock” step is necessary to prevent monetdbd from automatically restarting the database “demo”
upon receiving another client connection request.
After the database has been securely stopped, we can create a copy of the database directory in the
dbfarm and put it aside in a safe place. Alternatively, incremental file-system dumps can be used to
reduce the time and storage space for a recovery point. Finally, the database is released for production
again using:
1 shell>monetdb release <dbname>

47

Das könnte Ihnen auch gefallen