1. Executive Summary
2. Introducing The MySQL Cluster Database
3. Why Use MySQL Cluster CGE for Subscriber Databases?
   3.1. Benefits of MySQL Cluster to Subscriber Databases
4. Linear Scalability in a Shared-Nothing, Distributed Database
   4.1. Distribution Awareness in MySQL Cluster
5. Creating Tables Partitioned by Subscriber-ID
6. Improved Scalability with Distribution-Awareness
7. Conclusion
8. Additional Resources
9. Glossary
[1] In proprietary subscriber systems, data typically has to be offloaded to an external relational database in order to perform complex SQL queries, e.g., for accounting or other BSS purposes.
Featuring a “shared-nothing” distributed architecture with no single point of failure, MySQL Cluster is designed to
deliver the 99.999% availability demanded by telecommunications services.
MySQL Cluster's real-time design delivers predictable, millisecond response times with the ability to service tens of
thousands of transactions per second. Support for in-memory and disk-based data, automatic data partitioning with
load balancing, and the ability to add nodes to a running cluster with zero downtime allow linear database scalability
to handle the most unpredictable telecoms services and applications.
MySQL Cluster is already proven in the toughest telecommunications environments, delivering higher database
throughput and faster response times at 10x lower cost than proprietary clustered shared-disk databases[2], with the
added benefit of running on commodity hardware and operating systems. Customers include Alcatel-Lucent, Cisco,
Deutsche Telekom, Ericsson, Nokia Siemens Networks, Nortel, Telenor and UTStarcom.
Figure 1: The MySQL Cluster architecture eliminates any single point of failure
To learn more about the MySQL Cluster architecture, refer to the MySQL Cluster Architecture and New Features
whitepaper posted at:
http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php
[2] http://www.mysql.com/why-mysql/case-studies/mysql-cs-alcatel.php
Figure 2: MySQL Cluster CGE powers the subscriber database used by multiple telecommunications services
As shown in the figure above, MySQL Cluster CGE implements specific capabilities for building highly scalable and
reliable subscriber services:
The NDB API provides real-time access to data stored within the MySQL Cluster database. It is also easily
integrated with Subscriber-style APIs, such as LDAP, via a series of drivers developed by leading open source
directory vendors and communities, including OpenDS and OpenLDAP[3].
The NDB API also offers specialized features that provide significant performance optimizations not available through
the SQL API or traditional ODBC/JDBC connectors. These include:
– Batching of operations inside transactions to optimize network usage, and significantly improve transactional
throughput
– An adaptive-send buffer at clients to optimize use of the network medium
– Distribution-aware transactions, through partitioning by key, distribution keys and controlling parallelism in
scans (discussed in detail in sections 4 and 5 of this Guide).
Real-Time SQL API access through a MySQL Server is a simple and efficient means of enabling a range of applications
to access subscriber data stored in MySQL Cluster. Provisioning requests, AAA (Authentication, Authorization and
Accounting) protocols such as RADIUS and Diameter, and a range of BSS applications can all use the SQL API for
standard access to data stored in MySQL Cluster.
[3] http://www.mysql.com/products/database/cluster/features.html#data_ldap
As illustrated in the figure above, in a 4-node cluster with no distribution awareness, there is a 50% chance of a
transaction starting on a Data Node in the wrong Node Group (a node not containing the data), resulting in
unnecessary communication between node groups. In this example, the transaction is a batch[6] of two operations (a read
[4] See the MySQL Cluster Documentation: http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html
[5] http://blogs.sun.com/hasham/entry/mysql_cluster_7_performance_benchmark
[6] Increasing the number of operations in a transaction produces large performance gains, as it reduces the number of times NDB API clients send
transactions to Data Nodes, better utilizing network bandwidth.
Figure 5: Index Scans from MySQL Server have Parallel=0 (i.e., maximum parallelism)
Similarly, Figure 5 shows how an index scan sent to a MySQL Server is executed on all of the data nodes in the
cluster. The Transaction Coordinator (TC) sends the index scan to the Local Query Handlers (LQHs) of all nodes, which
execute the scan locally and return the results to the MySQL Server.
However, we can significantly reduce the number of data nodes involved in an index scan by partitioning the database
so that all relevant tuples belonging to a specific subscriber are located within a single node. The index scan then
executes only on that data node. An example of this can be found in Figure 6.
[7] See NDB API Documentation: http://dev.mysql.com/doc/ndbapi/en/overview-selecting-tc.html
[8] Partition by key is available in MySQL Cluster CGE.
[9] The parallel parameter for index scans is ignored and set to maximum if the scan results are sorted.
As illustrated above, when using the NDB API, a distribution key value is set for the transaction and the index scan's
level of parallelism is set to 1, so the index scan executes only on the primary partition holding the subscriber's data.
For existing relational subscriber databases, MySQL Cluster's approach offers a painless transition to a
subscriber-oriented database, compared with hierarchical approaches such as LDAP and DAP for X.500.
MySQL Cluster CGE provides the performance benefits of distribution-aware operations, while still maintaining partition
and access transparency, as well as offering the advantages of the relational database model.
Similar benefits from distribution awareness can be realized when accessing data through the SQL interface, where the
MySQL Server parses the query and internally makes the appropriate calls to the NDB API.
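As a sketch of what this means through the SQL interface, the statements below contrast two queries against the subscriber_profile table defined below (partitioned with PARTITION BY KEY (uid)). The value 1234 and the name 'Alice' are made-up illustrative values. EXPLAIN PARTITIONS is the MySQL 5.1 form of the statement (later versions report partitions in plain EXPLAIN); the exact plan output depends on the server version, so none is reproduced here.

```sql
-- Sketch against subscriber_profile (PARTITION BY KEY (uid)).
-- 1234 and 'Alice' are illustrative values only.

-- The WHERE clause fixes the distribution key, so the MySQL Server can
-- hash uid, prune to a single partition and address the data node
-- that holds it.
EXPLAIN PARTITIONS
SELECT name, addr
FROM subscriber_profile
WHERE uid = 1234;

-- No distribution key in the predicate: the server cannot tell which
-- partition holds the matching rows, so every partition (and therefore
-- every node group) has to be scanned.
EXPLAIN PARTITIONS
SELECT uid, addr
FROM subscriber_profile
WHERE name = 'Alice';
```

The partitions column of each plan is where the difference between a pruned, distribution-aware access and a cluster-wide scan becomes visible.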
We can now work through a simple example of a subscriber database with an associated IMS service. First, we define
a simple schema for subscriber profiles, partitioned by subscriber-ID (the ‘uid’ column in the schema definition), as
follows:
CREATE TABLE subscriber_profile (
  uid  INT NOT NULL,
  name VARCHAR(255) NOT NULL,
  addr VARCHAR(255) NOT NULL,
  PRIMARY KEY (uid)
) ENGINE=NDB PARTITION BY KEY (uid);
PARTITION BY KEY specifies the distribution key for the table as a list of zero or more column names. If no column
name is given, the table's primary key is used as the distribution key (for the table above this would have the same
effect, since uid is the primary key). If neither a partitioning key nor a primary key is specified, MySQL Cluster adds
a “hidden” primary key to the table and uses it as the partitioning key.
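To make the defaulting rule concrete, the two hypothetical tables below (subscriber_profile_default and subscriber_note are illustrative names, not part of this guide's schema) show an empty PARTITION BY KEY () list, which falls back to the primary key, and an NDB table with no primary key, which MySQL Cluster partitions on its hidden primary key.

```sql
-- Hypothetical table. Empty column list: the primary key (uid) becomes
-- the distribution key, giving the same layout as PARTITION BY KEY (uid).
CREATE TABLE subscriber_profile_default (
  uid  INT NOT NULL,
  name VARCHAR(255) NOT NULL,
  addr VARCHAR(255) NOT NULL,
  PRIMARY KEY (uid)
) ENGINE=NDB PARTITION BY KEY ();

-- Hypothetical table. No primary key and no PARTITION BY clause:
-- MySQL Cluster adds a hidden primary key and distributes rows on it,
-- so one subscriber's rows can end up in any partition.
CREATE TABLE subscriber_note (
  uid  INT NOT NULL,
  note VARCHAR(255) NOT NULL
) ENGINE=NDB;
```

The second layout is the one to avoid for subscriber data: because rows are hashed on the hidden key rather than on uid, a query for one subscriber cannot be pruned to a single partition.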
Columns specified as the distribution key do not have to be integer values, since the MD5 hashing function supplied by
MySQL guarantees an integer result regardless of the column data type. The distribution key can also be a composite
key, consisting of more than one column.
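A minimal sketch of both points, using hypothetical tables: subscriber_service has a composite primary key but is distributed on uid alone, so all of a subscriber's service rows hash to the same partition, while msisdn_map uses a VARCHAR column as its distribution key. MySQL requires the partitioning columns to be part of every unique key on the table, including the primary key, which both definitions satisfy.

```sql
-- Hypothetical table: one row per (subscriber, service). Only uid is the
-- distribution key, so every service row for a given subscriber lands in
-- the same partition as the others.
CREATE TABLE subscriber_service (
  uid          INT NOT NULL,
  service_name VARCHAR(64) NOT NULL,
  settings     VARCHAR(255),
  PRIMARY KEY (uid, service_name)
) ENGINE=NDB PARTITION BY KEY (uid);

-- Hypothetical table with a non-integer distribution key: the MSISDN
-- string is hashed directly, no integer column is required.
CREATE TABLE msisdn_map (
  msisdn VARCHAR(20) NOT NULL,
  uid    INT NOT NULL,
  PRIMARY KEY (msisdn)
) ENGINE=NDB PARTITION BY KEY (msisdn);
```

Listing both columns, PARTITION BY KEY (uid, service_name), would also be legal and would give a composite distribution key, but it would scatter one subscriber's services across partitions, which defeats the subscriber-ID partitioning this guide recommends.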
Next, we define an IMS service that uses the subscriber database: Push-to-Talk (PTT). PTT is a “walkie-talkie”-style
service, where users push a button on their mobile device to instantly set up a one-way voice communication channel
to one or more users.
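This section does not spell out a PTT schema; as a hypothetical sketch, talk-group membership could be stored per subscriber and distributed on the same uid as subscriber_profile, so that a subscriber's profile and PTT data share a partition. The table name, columns and the value 1234 below are illustrative assumptions.

```sql
-- Hypothetical PTT talk-group membership table, distributed on the same
-- subscriber-ID as subscriber_profile.
CREATE TABLE ptt_group_member (
  uid         INT NOT NULL,          -- subscriber-ID, as in subscriber_profile
  group_id    INT NOT NULL,          -- PTT talk group
  member_role VARCHAR(16) NOT NULL,  -- e.g. 'owner' or 'member' (illustrative)
  PRIMARY KEY (uid, group_id)
) ENGINE=NDB PARTITION BY KEY (uid);

-- All talk groups for one subscriber: the predicate fixes the
-- distribution key, so the lookup touches a single partition.
SELECT group_id, member_role
FROM ptt_group_member
WHERE uid = 1234;
```

The design trade-off is that a query on group_id alone (for example, to list every member of a talk group) carries no distribution key and has to scan all partitions.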
The NDB API is required to write distribution-aware applications that start transactions, containing primary key or
index scan operations, on the node that holds the subscriber data. Sample code is given in Figure 7 above.
However, when no distribution key value is supplied, the probability that a transaction starts on a Data Node in the
same Node Group as the Primary decreases as the cluster grows. For a transaction containing primary key operations
in an 8-node cluster, this can cause a performance drop of roughly 20%. For large clusters, therefore, supplying
distribution key values for transactions that operate on individual subscribers yields a worthwhile performance
improvement.
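The trend can be estimated with simple arithmetic. Assume N data nodes, NoOfReplicas = R (so the cluster has N/R node groups), and, as a simplifying assumption, that without a distribution key the Transaction Coordinator is chosen uniformly at random among the data nodes. The probability that the transaction starts in the node group holding the subscriber's data is then:

```latex
P_{\text{local}}(N) = \frac{R}{N}, \qquad P_{\text{remote}}(N) = 1 - \frac{R}{N}

\text{With } R = 2:\quad
P_{\text{local}}(2) = 1,\quad
P_{\text{local}}(4) = 0.5,\quad
P_{\text{local}}(8) = 0.25,\quad
P_{\text{local}}(16) = 0.125
```

With a distribution key value, the transaction is started in the correct node group by design, so P_local = 1 for every N. The 4-node value, P_remote(4) = 0.5, agrees with the 50% chance quoted earlier for a 4-node cluster.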
[Chart: Probability vs. Number of Data Nodes (2, 4, 8, 16); series: No Distribution Key, Distribution Key]
Figure 8: For Linear Scalability, Start Transactions with a Distribution Key Value.
Figure 9: For Linear Scalability with Index Scans, use a Distribution Key Value and Parallel=1 to start the
Transaction and Index Scan on the Data Node that contains the Subscriber Data.
7. Conclusion
MySQL Cluster Carrier Grade Edition is a perfect fit for Subscriber Databases due to its high performance, high
availability, and ease of integration with database-independent Subscriber APIs or existing relational or directory-based
subscriber models.
Many of its features are aimed specifically at meeting the requirements of Subscriber Databases deployed by CSPs,
including:
– Distribution-Awareness through the partitioning of tables by subscriber identifiers, and by using those identifiers
when accessing subscriber data to ensure that reads/writes are localized to the data node(s) containing the
subscriber data;
– Standards-Based, open source database that allows vendors and users of subscriber data management solutions to
easily integrate their applications with MySQL Cluster Carrier Grade Edition using their preferred database-independent
Subscriber API, e.g., LDAP, SQL, C++, Java, HTTP, etc.;
– LDAP drivers enabling MySQL Cluster CGE to be accessed via the LDAP protocol;
– High Performance with a Shared-Nothing, Distributed Database that provides real-time access to in-memory
subscriber data with read and write latencies of just a few milliseconds, and that can be scaled out by adding
resources or by storing data on disk;
– 99.999% Availability achieved by synchronously replicating data across active nodes in the cluster, with recovery
data being asynchronously written to disk;
– Self-Healing of data nodes with sub-second fail-over times, and an optimized node recovery protocol that
automatically re-synchronizes data across re-starting data nodes;
– Geographic-Replication for site-level redundancy;
8. Additional Resources
Alcatel-Lucent uses MySQL Cluster Carrier Grade Edition to Handle over 60 million Subscribers:
http://www.mysql.com/why-mysql/case-studies/mysql-alcatel-casestudy.php
BT Plusnet Achieves Continuous Availability of Subscriber AAA Services with MySQL Cluster and FreeRADIUS:
http://www.mysql.com/why-mysql/case-studies/mysql_cs_plusnet.php
Copyright © 2009, Sun Microsystems Inc. MySQL is a registered trademark of Sun Microsystems in the U.S. and in
other countries. Other products mentioned may be trademarks of their companies.