Webinar Replay and Q&A: Load balancing MySQL & MariaDB with ProxySQL & ClusterControl

Thanks to everyone who participated in our recent webinar on how to load balance MySQL and MariaDB with ClusterControl and ProxySQL!

This joint webinar with ProxySQL creator René Cannaò generated a lot of interest … and a lot of questions!

We covered topics such as ProxySQL concepts (with hostgroups, query rules, connection multiplexing and configuration management), went through a live demo of a ProxySQL setup in ClusterControl (try it free) and discussed upcoming ClusterControl features for ProxySQL.

These topics triggered a lot of related questions, to which you can find our answers below.

If you missed the webinar, would like to watch it again or browse through the slides, it is available for viewing online.

Watch the webinar replay

You can also join us for our follow-up webinar next week on Tuesday, April 4th 2017. We’re again joined by René and will be discussing High Availability in ProxySQL.

Sign up for the webinar on HA in ProxySQL

Webinar Questions & Answers

Q. Thank you for your presentation. I have a question about connection multiplexing: does ProxySQL ensure that all statements from start transaction to commit are sent through the same backend connection?

A. This is configurable.

A small preface first: at any time, each client’s session can have one or more backend connections associated with it. A backend connection is associated with a client when a query needs to be executed, and normally it returns to the connection pool immediately afterwards. “Normally” means that there are circumstances when this doesn’t happen. For example, when a transaction starts, the connection is not returned to the connection pool until the transaction completes (either commits or rolls back). This means that all the queries routed to the hostgroup where the transaction is running are guaranteed to run in the same connection.

Nonetheless, by default, a transaction doesn’t disable query routing. That means that while a transaction is running on one connection to a specific hostgroup, and this connection is associated with only that client, if the client sends a query destined for another hostgroup, that query could be sent to a different connection.

Whether queries can be sent to a different connection based on query rules while a transaction is running is configurable through the value of mysql_users.transaction_persistent:

  • 0 = queries for different hostgroups can be routed to different connections while a transaction is running;
  • 1 = query routing will be disabled while the transaction is running.

The behaviour is configurable because it depends on the application. Some applications require that all the queries are part of the same transaction, other applications don’t.
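
As an illustration (not from the webinar), this is roughly how you could enforce transaction persistence for a hypothetical user through ProxySQL’s admin interface:

-- 'app_user' is a placeholder; adjust to your own ProxySQL user
UPDATE mysql_users SET transaction_persistent=1 WHERE username='app_user';
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;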

Q. What is the best way to set up a ProxySQL cluster? The main concern here is configuration of the ProxySQL cascading throughout the cluster.

A. ProxySQL can be deployed in numerous ways.

One typical deployment pattern is to deploy a ProxySQL instance on every application host. The application then connects to the proxy using a very low latency connection via a Unix socket. If the number of application hosts increases, you can deploy a middle layer of 3-5 ProxySQL instances and configure all ProxySQL instances on the application servers to connect via this middle layer. Configuration management, typically, would be handled using Puppet/Chef/Ansible infrastructure orchestration tools. You can also easily use home-grown scripts, as ProxySQL’s admin interface is accessible via the MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.
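
As a hedged sketch (the hostname and hostgroup number are made-up example values), adding a new backend through the admin interface boils down to a few statements like:

-- 10.0.0.103 and hostgroup 20 are example values
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.0.103', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;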

Q. How would you recommend to make the ProxySQL layer itself highly available?

A. There are numerous methods to achieve this.

One common method is to deploy a ProxySQL instance on every application host. The application would then connect to the proxy using a very low latency connection via a Unix socket. In such a deployment there is no single point of failure, as every application host connects to the ProxySQL installed locally.

When you implement a middle-layer, you will also maintain HA as 3-5 ProxySQL nodes would be enough to make sure that at least some of them are available for local proxies from application hosts.

Another common method of deploying a highly available ProxySQL setup is to use tools like keepalived along with virtual IP. The application will connect to VIP and this IP will be moved from one ProxySQL instance to another if keepalived detects that something happened to the “main” ProxySQL.

Q. How can ProxySQL use the right hostgroup for each query?

A. ProxySQL routes queries to hostgroups based on query rules - it is up to the user to build a set of rules which make sense in their environment.
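
For example, a classic read/write split could be expressed with two rules similar to the ones below (hostgroup numbers 10 and 20 are assumptions, not a prescribed layout):

-- writer hostgroup 10 and reader hostgroup 20 are example values
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT .* FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (200, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;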

Q. Can you tell us more about query mirroring?

A. In general, the implementation of query mirroring in ProxySQL allows you to send traffic to two hostgroups.

Traffic sent to the “main” hostgroup is ensured to reach it (unless there are no hosts in that hostgroup); on the other hand, the mirror hostgroup will receive traffic on a “best effort” basis - it should, but it is not guaranteed that the query will indeed reach the mirrored hostgroup.

This limits the usefulness of mirroring as a method to replicate data. It is still an amazing way to do load testing of new hardware or a redesigned schema. Of course, mirroring reduces the maximum throughput of the proxy - queries have to be executed twice, so the load is also twice as high. The load is not split between the two, but duplicated.
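
To sketch what this looks like in practice (the hostgroup numbers are arbitrary examples), a query rule can set mirror_hostgroup so that matching traffic is duplicated to a second hostgroup:

-- send SELECTs to hostgroup 10 and mirror them to hostgroup 30 (example values)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, mirror_hostgroup, apply)
VALUES (300, 1, '^SELECT', 10, 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;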

Q. And what about query caching?

A. The query cache in ProxySQL is implemented as a simple key->value memory store with a Time To Live for every entry. What gets cached and for how long is decided at the query rules level. The user can define a query rule matching a particular query or a wider spectrum of them. To identify a query’s result set in the cache, ProxySQL uses the query hash along with information about the user and schema.

How do you set the TTL for a query? The simplest answer is: to the maximum replication lag which is acceptable for that query. If you are ok reading stale data from a slave which is lagging 10 seconds, you should be fine reading stale data from the cache with a TTL set to 10000 milliseconds.
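
As a rough example (the query pattern, hostgroup and TTL are illustrative only), caching is enabled per rule via cache_ttl, expressed in milliseconds:

-- cache matching SELECTs for 10 seconds; the digest pattern is just an example
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, cache_ttl, apply)
VALUES (400, 1, '^SELECT COUNT\(\*\) FROM sbtest', 20, 10000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;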

Q. Connection limit to backends?

A. ProxySQL indeed implements a connection limit to backend servers. The maximum number of connections to any backend instance is defined in mysql_servers table.

Because the same backend server can be present in multiple hostgroups, it is possible to define the maximum number of connections per server per hostgroup.

This is useful, for example, when you want to dedicate a small set of connections to specific long running queries: they will be queued without affecting the rest of the traffic destined for the same server.
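
A hedged example of such a limit (host and numbers are invented for illustration):

-- cap connections to this backend in hostgroup 20 at 100 (example values)
UPDATE mysql_servers SET max_connections=100 WHERE hostname='10.0.0.101' AND hostgroup_id=20;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;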

Q. Regarding the connection limit from the APP: are connections QUEUED?

A. If you reach the mysql-max_connections limit, further connections will be rejected with the error “Too many connections”.

It is important to remember that there is not a one-to-one mapping between application connections and backend connections.

That means that:

  • Access to the backends can be queued, but connections from the application are either accepted or rejected.
  • A large number of application connections can use a small number of backend connections.
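
If the frontend limit itself needs adjusting, mysql-max_connections is a regular ProxySQL variable; a hedged sketch (the value is only an example):

-- mysql-max_connections caps client connections to ProxySQL itself (4096 is an example value)
UPDATE global_variables SET variable_value='4096' WHERE variable_name='mysql-max_connections';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;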

Q. I haven’t heard of SHUN before: what does it mean?

A. SHUN means that the backend is temporarily marked as unavailable, but ProxySQL will attempt to connect to it again after mysql-shun_recovery_time_sec seconds.
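
You can inspect or tune that interval through the admin interface; a minimal sketch (the 5-second value is only an example):

-- check and adjust how long a backend stays shunned (example value: 5 seconds)
SELECT * FROM global_variables WHERE variable_name='mysql-shun_recovery_time_sec';
UPDATE global_variables SET variable_value='5' WHERE variable_name='mysql-shun_recovery_time_sec';
LOAD MYSQL VARIABLES TO RUNTIME;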

Q. Is query sharding available across slaves?

A. Depending on the meaning of sharding, ProxySQL can be used to perform sharding across slaves. For example, it is possible to send all traffic for a specific set of tables to a given set of slaves (a hostgroup). Splitting the slaves into multiple hostgroups and sharding queries accordingly can improve performance, as each slave won’t have to read from disk the data of tables for which it doesn’t process any queries.
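
A hedged sketch of such table-based routing (the table name pattern and hostgroup are invented for illustration):

-- route queries touching reporting_* tables to hostgroup 30 (example values)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (500, 1, 'FROM reporting_', 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;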

Q. How do you sync the configuration of ProxySQL when you have many instances for HA?

A. Configuration management, typically, would be handled using Puppet/Chef/Ansible infrastructure orchestration tools. You can also easily use home-grown scripts as ProxySQL’s admin interface is accessible via MySQL command line and ProxySQL reconfiguration can be done by issuing a couple of SQL statements.

Q. How flexible or feasible is it to change the ProxySQL config online? E.g., if one database slave is down, how is that handled in such a scenario?

A. ProxySQL configuration can be changed at any time; it’s been designed with this level of flexibility in mind.

A ‘database down’ event can be handled in different ways; it depends on how ProxySQL is configured. If you rely on replication hostgroups to define writer and reader hostgroups (this is how ClusterControl deploys ProxySQL), ProxySQL will monitor the state of the read_only variable on hosts in both reader and writer hostgroups and it will move hosts between them as needed.

If a master is promoted by external tools (like ClusterControl, for example), read_only values will change, ProxySQL will detect the topology change and it will act accordingly. For a standard “slave down” scenario, no action is required from the management system - without any change in the read_only value, ProxySQL will simply detect that the host is not available, stop sending queries to it, and re-execute on other members of the hostgroup those queries which didn’t complete on the dead slave.

If we are talking about a setup not using replication hostgroups, then it is up to the user and their scripts/tools to implement some sort of logic and reconfigure ProxySQL at runtime using the admin interface. A slave going down, though, most likely wouldn’t require any changes.

Q. Is it somehow possible to SELECT data from one host group into another host group?

A. No, at this point it is not possible to execute cross-hostgroup queries.

Q. What would be RAM/Disk requirements for logs , etc?

A. It depends on the number of log entries and how verbose the ProxySQL log is in your environment. Typically it’s negligible.

Q. Instead of installing ProxySQL on all application servers, could you put a ProxySQL cluster behind a standard load balancer?

A. We see no reason why not. You can put whatever you like in front of ProxySQL - an F5, another layer of software proxies - it is up to you. Please keep in mind, though, that every layer of proxies or load balancers adds latency to your network and, as a result, to your queries.

Q. Can you please comment on Reverse Proxy, whether it can be used in SQL or not?

A. ProxySQL is a Reverse Proxy. Contrary to a Forward Proxy (which acts as an intermediary that simply forwards requests), a Reverse Proxy processes clients’ requests and retrieves data from servers: clients send requests to ProxySQL, which understands each request, analyzes it, and decides what to do - rewrite, cache, block, re-execute on failure, etc.

Q. Does the user authentication layer work with non-local database accounts, e.g. with the pam modules available for proxying LDAP users to local users?

A. There is no direct support for LDAP integration but, as configuration management in ProxySQL is child’s play, it is really simple to put together a script which pulls the user details from LDAP and loads them into ProxySQL. You can use cron to sync it regularly. All ProxySQL needs is a username and a password hash in MySQL format - this is enough to add a user to ProxySQL.
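
For reference, adding one user pulled from LDAP would be a single insert; a sketch with made-up credentials (the hash below is a placeholder in MySQL native password format):

-- username, hash and default hostgroup are placeholders
INSERT INTO mysql_users (username, password, default_hostgroup)
VALUES ('ldap_synced_user', '*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19', 10);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;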

Q. It seems like the prescribed production deployment includes many proxies - are there any suggestions or upcoming work to address how to make configuration changes across all proxies in a consistent manner?

A. At this point it is recommended to leverage configuration management tools like Chef/Ansible/Puppet to manage ProxySQL’s configuration.

Watch the webinar replay

You can also join us for our follow-up webinar next week on Tuesday, April 4th 2017. We’re again joined by René and will be discussing High Availability in ProxySQL.

Sign up for the webinar on HA in ProxySQL


Top mistakes to avoid in MySQL replication

Setting up replication in MySQL is easy, but managing it in production has never been an easy task. Even with the newer GTID auto-positioning, it still can go wrong if you don’t know what you are doing. After setting up replication, all sorts of things can go wrong. Mistakes can easily be made and can have a disastrous ending for your data.

This post will highlight some of the most common mistakes made with MySQL replication, and how you can prevent them.

Setting up replication

When setting up MySQL replication, you need to prime the slave nodes with the dataset from the master. With solutions like Galera cluster, this is automatically handled for you with the method of your choice. For MySQL replication, you need to do this yourself, so naturally you take your standard backup tool.

For MySQL there is a huge variety of backup tools available, but the most commonly used one is mysqldump. Mysqldump outputs a logical backup of the dataset of your master. This means the copy of the data is not going to be a binary copy, but a big file containing queries to recreate your dataset. In most cases this should provide you with a (near) identical copy of your data, but there are cases where it will not - due to the dump being on a per object basis. This means that even before you start replicating data, your dataset is not the same as the one on the master.

There are a couple of tweaks you can apply to make mysqldump more reliable, like dumping in a single transaction; also don’t forget to include routines and triggers:

mysqldump -uuser -ppass --single-transaction --routines --triggers --all-databases > dumpfile.sql

A good practice to check whether your slave node is 100% the same is to use pt-table-checksum after setting up the replication:

pt-table-checksum --replicate=test.checksums --ignore-databases mysql h=localhost,u=user,p=pass

This tool will calculate a checksum for each table on the master, replicate the command to the slave and then the slave node will perform the same checksum operation. If any of the tables are not the same, this should be clearly visible in the checksum table.

Using the wrong replication method

The default replication method of MySQL was the so-called statement-based replication. This method is exactly what the name says: a replication stream of every statement run on the master that will be replayed on the slave node. Since MySQL itself is multi-threaded but its (traditional) replication isn’t, the order of statements in the replication stream may not be 100% the same. Also, replaying a statement may give different results when it is not executed at the exact same time.

This may result in different datasets between the master and slave, due to data drift. This wasn’t an issue for many years, as not many ran MySQL with many simultaneous threads, but with modern multi-CPU architectures this has actually become highly probable in a normal day-to-day workload.

The answer from MySQL was the so-called row-based replication. Row-based replication will replicate the data whenever possible, but in some exceptional cases it still uses statements. A good example is the DDL change of a table: replicating it row by row would mean copying every row of the table through replication, which is inefficient, so such a statement is replicated in the traditional way. When row-based replication detects data drift, it will stop the slave thread to prevent making things worse.

Then there is a method in between these two: mixed mode replication. This type of replication will always replicate statements, except when the query contains the UUID() function, triggers, stored procedures, UDFs or a few other exceptions. Mixed mode will not solve the issue of data drift and, together with statement-based replication, should be avoided.
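
Switching to row-based replication is a matter of setting binlog_format; a minimal sketch (changing it at runtime and also keeping it in the configuration file is the usual practice):

-- switch the binary log format to ROW; also set binlog_format=ROW in my.cnf to survive restarts
SET GLOBAL binlog_format = 'ROW';
SHOW GLOBAL VARIABLES LIKE 'binlog_format';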

Circular replication

Running MySQL replication with multi-master is often necessary if you have a multi-datacenter environment. Since the application can’t wait for the master in the other datacenter to acknowledge your write, a local master is preferred. Normally the auto increment offset is used to prevent data clashes between the masters. Having two masters perform writes to each other in this way is a broadly accepted solution.

MySQL Master-Master replication

However if you need to write in multiple datacenters into the same database, you end up with multiple masters that need to write their data to each other. Before MySQL 5.7.6 there was no method to do a mesh type of replication, so the alternative would be to use a circular ring replication instead.

MySQL ring replication topology

Ring replication in MySQL is problematic for the following reasons: latency, high availability and data drift. Writing some data to server A, it would take three hops to end up on server D (via servers B and C). Since (traditional) MySQL replication is single threaded, any long running query in the replication may stall the whole ring. Also, if any of the servers goes down, the ring is broken, and currently there is no failover software that can repair ring structures. Then data drift may occur when data is written to server A and is altered at the same time on server C or D.

Broken ring replication

In general circular replication is not a good fit with MySQL and it should be avoided at all costs. Galera would be a good alternative for multi-datacenter writes, as it has been designed with that in mind.

Stalling your replication with large updates

Often various housekeeping batch jobs will perform various tasks, ranging from cleaning up old data to calculating averages of ‘likes’ fetched from another source. This means that at set intervals, a job will create a lot of database activity and, most likely, write a lot of data back to the database. Naturally this means the activity within the replication stream will increase equally.

Statement-based replication will replicate the exact queries used in the batch jobs, so if the query took half an hour to process on the master, the slave thread will be stalled for at least the same amount of time. This means no other data can replicate and the slave nodes will start lagging behind the master. If this exceeds the threshold of your failover tool or proxy, it may drop these slave nodes from the available nodes in the cluster. If you are using statement-based replication, you can prevent this by crunching the data for your job in smaller batches.
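
A hedged sketch of such batching (the table and column names are invented): delete in limited chunks and repeat, so each statement replicates quickly instead of stalling the slave thread for half an hour:

-- example only: purge old rows in 10,000-row chunks instead of one huge DELETE
DELETE FROM activity_log WHERE created_at < NOW() - INTERVAL 90 DAY LIMIT 10000;
-- re-run (e.g. from a loop in your batch job) until it reports 0 rows affected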

Now you may think row-based replication isn’t affected by this, as it replicates the row information instead of the query. This is only partly true: for DDL changes, the replication reverts back to statement-based format. Also, large numbers of CRUD operations will affect the replication stream: in most cases this is still a single threaded operation, and thus every transaction will wait for the previous one to be replayed via replication. This means that if you have high concurrency on the master, the slave may stall under the volume of transactions during replication.

To get around this, both MariaDB and MySQL offer parallel replication. The implementation may differ per vendor and version. MySQL 5.6 offers parallel replication as long as the queries are separated by schema. MariaDB 10.0 and MySQL 5.7 can both handle parallel replication across schemas, but have other boundaries. Executing queries via parallel slave threads may speed up your replication stream if you are write heavy. However, if you aren’t, it would be best to stick to the traditional single threaded replication.
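
On MySQL 5.7, for example, enabling it could look roughly like this (the worker count is an arbitrary example; MariaDB uses slave_parallel_threads instead):

-- enable logical-clock based parallel applying with 4 workers (example value)
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;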

Schema changes

Performing schema changes on a running production setup is always a pain. This has to do with the fact that a DDL change will most of the time lock a table and only release this lock once the DDL change has been applied. It gets even worse once you start replicating these DDL changes through MySQL replication, where they will additionally stall the replication stream.

A frequently used workaround is to apply the schema change to the slave nodes first. For statement-based replication this works fine, but for row-based replication this only works up to a certain degree. Row-based replication allows extra columns to exist at the end of the table, so as long as it is able to write the first columns, it will be fine. First apply the change to all slaves, then failover to one of the slaves and then apply the change to the master and attach that as a slave. If your change involves inserting a column in the middle or removing a column, this will not work with row-based replication.

There are tools around that can perform online schema changes more reliably. The Percona Online Schema Change tool (also known as pt-osc) will create a shadow table with the new table structure, insert new data via triggers and backfill data in the background. Once it is done creating the new table, it will simply swap the old table for the new one inside a transaction. This doesn’t work in all cases, especially if your existing table already has triggers.

An alternative is the gh-ost tool by GitHub. This online schema change tool will first make a copy of your existing table layout, alter the table to the new layout and then hook the process up as a MySQL replica. It makes use of the replication stream to find new rows that have been inserted into the original table, and at the same time it backfills the table. Once it is done backfilling, the original and new tables are switched. Naturally all operations on the new table end up in the replication stream as well, so on each replica the migration happens at the same time.

Memory tables and replication

While we are on the subject of DDLs, a common issue is the creation of memory tables. Memory tables are non-persistent tables, their table structure remains but they lose their data after a restart of MySQL. When creating a new memory table on both a master and a slave, both will have an empty table and this will work perfectly fine. Once either one gets restarted, the table will be emptied and replication errors will occur.

Row-based replication will break once the data on the slave node returns different results, and statement-based replication will break once it attempts to insert data that already exists. For memory tables this is a frequent replication-breaker. The fix is easy: simply make a fresh copy of the data, change the engine to InnoDB and it should now be replication safe.
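
The conversion itself is a one-liner; a sketch with an invented table name:

-- 'sessions' is a placeholder table name; converts the MEMORY table to InnoDB
ALTER TABLE sessions ENGINE=InnoDB;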

Setting the read_only variable to true

As we described earlier, not having the same data on the slave nodes can break replication. Often this is caused by something (or someone) altering the data on the slave node, but not on the master node. Once the master node’s data gets altered, this change is replicated to the slave, where it can’t be applied, and this causes replication to break.

There is an easy way to prevent this: setting the read_only variable to true. This disallows anyone from making changes to the data, except for the replication and root users. Most failover managers set this flag automatically to prevent users from writing to the old master during failover. Some of them even retain this setting after the failover.

This still leaves the root user able to execute an errant CRUD query on the slave node. To prevent this from happening, there has been a super_read_only variable since MySQL 5.7.8 that locks out even the root user from updating data.
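
On a slave you would typically set both (setting super_read_only also enables read_only on recent versions, but setting both is harmless):

-- block writes from everyone, including root (super_read_only requires MySQL 5.7.8+)
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;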

Enabling GTID

In MySQL replication, it is essential to start the slave from the correct position in the binary logs. Obtaining this position can be done when making a backup (xtrabackup and mysqldump support this) or when you have stopped slaving on a node that you are making a copy of. Starting replication with the CHANGE MASTER TO command would look like this:

mysql> CHANGE MASTER TO MASTER_HOST='x.x.x.x',MASTER_USER='replication_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='master-bin.0001', MASTER_LOG_POS=  04;

Starting replication at the wrong spot can have disastrous consequences: data may be double written or not updated. This causes data drift between the master and the slave node.

Also, failing over a master to a slave involves finding the correct position and changing the master to the appropriate host. MySQL doesn’t retain the binary logs and positions of its master, but rather creates its own binary logs and positions. For re-aligning a slave node to the new master this can become a serious problem: the exact position of the old master at the moment of failover has to be found on the new master, and only then can all slaves be re-aligned.

To solve this issue, the Global Transaction Identifier (GTID) has been implemented by both Oracle and MariaDB. GTIDs allow automatic re-aligning of slaves, and in both MySQL and MariaDB the server figures out by itself what the correct position is. However, the two have implemented GTID in different ways and are therefore incompatible. If you need to set up replication from one to the other, the replication should be set up with traditional binary log positioning. Also, your failover software should be made aware not to make use of GTIDs.
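
To illustrate the benefit, with GTID the CHANGE MASTER TO statement no longer needs an explicit file and position; hedged sketches for both flavors (host, user and password are placeholders, as in the earlier example):

-- Oracle MySQL / Percona Server: let GTID auto-positioning find the right spot
CHANGE MASTER TO MASTER_HOST='x.x.x.x', MASTER_USER='replication_user',
  MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1;
-- MariaDB equivalent
CHANGE MASTER TO MASTER_HOST='x.x.x.x', MASTER_USER='replication_user',
  MASTER_PASSWORD='password', MASTER_USE_GTID=slave_pos;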

Conclusion

We hope to have given you enough tips to stay out of trouble. These are all common practices by the experts in MySQL. They had to learn it the hard way and with these tips we ensure you don’t have to.

We have some additional white papers that might be useful if you’d like to read more about MySQL replication.

MySQL High Availability tools - Comparing MHA, MRM and ClusterControl

We previously compared two high availability solutions for MySQL - MHA and MariaDB Replication Manager and looked into how they performed fail-over. In this blog post, we’ll see how ClusterControl stacks up against these solutions. Since MariaDB Replication Manager is under active development, we decided to take a look at the not yet released version 1.1.

Flapping

All solutions provide flapping detection. MHA, by default, executes a failover only once. Even after you restart masterha_manager, it will still check whether the last failover happened too recently. If it did (by default, within the last 8 hours), no new failover will happen. You need to explicitly change the timeout or set the --ignore_last_failover flag.

MariaDB Replication Manager has less strict defaults - it will allow up to three failovers as long as each of them happens more than 10 seconds after the previous one. In our opinion this is a bit too flexible - if the first failover didn’t solve the problem, it is unlikely that another attempt will give better results. Still, default settings are there to be changed, so you can configure MRM however you like.

ClusterControl uses a similar approach to MHA - only one failover is attempted. The next one can happen only after the master has been successfully detected as online (for example, ClusterControl recovery or manual intervention by the admin managed to promote one of the slaves to a master) or after a restart of the cmon process.

Lost transactions

MHA can work in two modes - GTID or non-GTID. Those modes differ in how missing transactions are handled. Traditional replication, actually, is handled in a better way - as long as the old master is reachable, MHA connects to it and attempts to recover missing transactions from its binary logs. In GTID mode this does not happen, which may lead to more significant data loss if your slaves didn’t manage to receive all relay logs - another very good reason to use semi-synchronous replication, which has you covered in this scenario.

MRM does not connect to the old master to get the logs. By default, it elects the most advanced slave and promotes it to master. Remaining slaves are slaved off this new master, making them as up to date as the new master. There is a potential for a data loss, on par with MHA’s GTID mode.

ClusterControl behaves similarly to MRM - it picks the most advanced slave as a master candidate and then, as long as it is safe (for example, there are no errant transactions), promotes it to become the new master. The remaining slaves get slaved off this new master. If ClusterControl detects errant transactions, it will stop the failover and alert the administrator that manual intervention is needed. It is also possible to configure ClusterControl to skip the errant transaction check and force the failover.
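
For the curious, an errant transaction check boils down to GTID set arithmetic; a sketch with made-up GTID sets (anything left over after subtracting the master’s gtid_executed from the candidate’s is errant):

-- example GTID sets; a non-empty result means the candidate has errant transactions
SELECT GTID_SUBTRACT('b9b4712a-df64-11e3-b391-60672090eb04:1-7',
                     'b9b4712a-df64-11e3-b391-60672090eb04:1-5') AS errant_transactions;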

Network partitioning

For MHA, this can be taken care of by adding a second MHA Manager node, preferably in another section of your network, and querying it using secondary_check_script. That script can connect to another MHA node and execute masterha_check_repl to see how the cluster looks from that node. This gives MHA a better view of the situation and topology, and it might not fail over if that is unnecessary.

MRM implements another approach. It can be configured to use slaves, an external MaxScale proxy or scripts executed through the HTTP protocol on a custom port (like the scripts which govern HAProxy behavior) to build a full view of the topology and then make an informed decision based on it.

ClusterControl, at this moment, does not perform any advanced checks regarding the availability of the master - it uses only its own view of the system, so it may take action if there are network issues between the master and the ClusterControl host. Having said that, we are aware this can be a serious limitation and there is work in progress to improve how ClusterControl detects a failed master - using slaves and proxies like MaxScale or ProxySQL to get a broader picture of the topology.

Roles

Within MHA you are able to apply roles to a specific host, so for instance ‘candidate_master’ and ‘no_master’ will help you determine which hosts are preferred to become master. A good example could be the data center topology: spread the candidate master nodes over multiple racks to ensure HA. Or perhaps you have a delayed slave that may never become the new master even if it is the last node remaining.

This last scenario is likely to happen with MariaDB Replication Manager as it can’t see the other nodes anymore and thus can’t determine that this node is actually, for instance, 24 hours behind. MariaDB does not support the Delayed Slave command but it is possible to use pt-slave-delay instead. There is a way to set the maximum slave delay allowed for MRM, however MRM reads the Seconds_Behind_Master from the slave status output. Since MRM is executed after the master is dead, this value will obviously be null.

At the beginning of the failover procedure, ClusterControl builds a list of slaves which can be promoted to master. Most of the time, it will contain all slaves in the topology but the user has some additional control over it. There are two variables you can set in the cmon configuration:

replication_failover_whitelist

and

replication_failover_blacklist

The whitelist contains a list of IPs or hostnames of slaves which should be used as potential master candidates. If this variable is set, only those hosts will be considered. The second variable may contain a list of hosts which will never be considered a master candidate. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware.

replication_failover_whitelist takes precedence, meaning that replication_failover_blacklist is ignored if replication_failover_whitelist is set.
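
A hypothetical sketch of how this could look in the cmon configuration file (the hostnames are examples; check the ClusterControl documentation for the exact file for your cluster):

# example: only these two slaves may be promoted to master
replication_failover_whitelist=10.0.0.11,10.0.0.12
# example: this backup/analytics slave should never be promoted
replication_failover_blacklist=10.0.0.13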

Integration

MHA is a standalone tool; it doesn’t integrate well with other external software. It does, however, provide hooks (pre/post failover scripts) which can be used for some integration - for instance, executing scripts to make changes in the configuration of an external tool. MHA also uses the read_only value to differentiate between master and slaves - this can also be used by external tools to drive topology changes. One example would be ProxySQL - MHA can work with this proxy using both pre/post failover scripts and read_only values, depending on the ProxySQL configuration. It’s worth mentioning that, in GTID mode, MHA doesn’t support MariaDB GTID - it only supports Oracle MySQL or Percona Server.

MRM integrates nicely with MaxScale - it can be used along with MaxScale in a couple of ways. It can be set up so that MaxScale does the work of monitoring the health of the nodes and executes MRM as needed to perform failovers. Another option is that MRM drives MaxScale - monitoring is done on MRM’s side and MaxScale’s configuration is updated as needed. MRM also sets read_only variables, which makes it compatible with other tools that understand those settings (like ProxySQL, for example). A direct integration with HAProxy is also available - MRM, if collocated, may modify the HAProxy configuration whenever the topology changes. On the cons side, MRM works only with MariaDB installations - it is not possible to use it with Oracle MySQL’s version of GTID.

ClusterControl uses read_only variables to differentiate between master and slave nodes. This is enough to integrate with every kind of proxy which can be deployed from ClusterControl: ProxySQL, MaxScale and HAProxy. A failover executed by ClusterControl will be detected and handled by any of those proxies. ClusterControl also integrates with external tools for management. It provides access to the management console for MaxScale and, to some extent, for HAProxy. Advanced support for ProxySQL will be added shortly. Metrics are provided for HAProxy and ProxySQL. ClusterControl supports both Oracle GTID and MariaDB GTID.

Conclusion

If you are interested in the details of how MHA, MRM or ClusterControl handle failover, we’d like to encourage you to take a look at our earlier blog posts on each of these tools.

Below is a summary of the differences between the different HA solutions:

Replication support
  MHA: non-GTID, Oracle GTID
  MRM: MariaDB GTID
  ClusterControl: Oracle GTID and MariaDB GTID

Flapping
  MHA: One failover allowed
  MRM: Defaults are less restrictive but can be modified
  ClusterControl: One failover allowed unless it brings the master online

Lost transactions
  MHA: Very good handling for non-GTID; no checking for transactions on the master for GTID setups
  MRM: No checking for transactions on the master
  ClusterControl: No checking for transactions on the master

Network partitioning
  MHA: No support built in; can be added through user-created scripts
  MRM: Very good false positive detection using slaves, proxy or external scripts
  ClusterControl: No support at this moment; work in progress to build false positive detection using proxy and slaves

Roles
  MHA: Support for whitelist and blacklist of hosts to promote to master
  MRM: No support
  ClusterControl: Support for whitelist and blacklist of hosts to promote to master

Integration
  MHA: Can be integrated with external tools using hooks. Uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern.
  MRM: Close integration with MaxScale; integration with HAProxy is also available. Uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern.
  ClusterControl: Can be integrated with external tools using hooks. Uses the read_only variable to identify master and slaves, which helps to integrate with other tools that understand this pattern.

If we are talking about handling master failure, each of these solutions does its job well, and feature-wise they are mostly on par. There are some differences in almost every aspect we compared but, in the end, each of them should handle most master failures pretty well. ClusterControl lacks more advanced network partitioning detection, but this will change soon. What is important to keep in mind is that these tools support different replication methods, and this alone can limit your options. If you use non-GTID replication, MHA is the only option for you. If you use GTID, MHA and MRM are restricted to, respectively, Oracle MySQL and MariaDB GTID setups. Only ClusterControl (you can test it for free) is flexible enough to handle both types of GTID under one tool - this can be very useful if you have a mixed environment while you would still like to use a single tool to ensure high availability of your replication setup.

Database TCO - Calculating the Total Cost of Ownership for MySQL Management

Cost analysis and cost effectiveness for databases are hardly ever performed, while it actually makes a lot of sense to perform such calculations. The easiest method of gaining insights into these is by performing a total cost of ownership (TCO) calculation. You might have theories on what your greatest cost factor is, but do you really know for sure?

Why would you perform such a TCO analysis? As with most research: to prove your theory of the highest cost factor wrong. The TCO is a great tool for a precise cost analysis and can give you some surprising insights!

Cost factors for databases

The cost factors for databases can be divided into two separate groups: capital expenses (CAPEX) and operational expenses (OPEX). Both cost factors are part of the infrastructure lifecycle.

TCO lifetime cycle

Capital expenses are the costs you pay upfront during the acquire phase: hardware purchases, (non-recurring) licensing cost and any other one time cost factors like replacement parts. These expenses are a constant factor in the TCO and are spread out over the lifetime of your database servers. Most of these costs will happen in the acquire phase, however the replacement parts will obviously take place in the maintenance phase.

Operational expenses are the costs of running the database servers. As these costs are recurring, you pay them at a regular (e.g., yearly) interval and mostly during the maintenance phase. These costs include datacenter/rack rental, power consumption, network usage and operational costs like (remote) hands and personnel. The last one includes sysops, DBAs and all costs made to facilitate them, like desks, office space and training. Since these expenses are recurring, they will continue to grow during the lifetime of your database servers. The longer you operate these servers, the higher the operational expenses (OPEX) will be.

This means that the longer you use your database servers, the share between CAPEX and OPEX will shift towards a higher share of OPEX. The one time purchase of hardware may be considered a high cost upfront, but given that you will probably use the hardware for more than three years, it justifies the upfront cost.

For cloud hosting, the calculation will be similar. However, since you don’t have hardware to purchase upfront, the CAPEX will be a lot lower. As cloud hosting has a recurring monthly cost, the OPEX will be higher. In some cases, your cloud provider may calculate some (setup) cost upfront and this should be treated as CAPEX.

Example calculation for hardware

In this example we will make a calculation for a small company (under 100 employees) that hosts on hardware in its own racks in a data center. This company has two dedicated sysops and one experienced DBA (1-4 years), where the DBA is managing around 20 databases and the sysops around 200 hosts. The average DBA salary for this is $65,000, so the annual cost per database would be $3,250. The sysops average around $50,000 for the same experience, and cost $250 per host per year. The sysops are also the people who manage the datacenter. We will not factor in the facilitation costs, as this would get overly complicated.

For our example cluster, we will make use of a three node MySQL replication setup: one master and two slave nodes. Hardware is based upon the Dell R730 with 64GB of memory and six 400GB SSDs, as this is a very popular model for this purpose. The price of a R730 with this configuration is currently $7655.

Rental cost of a full rack is nowadays around $350[1], so the colocation cost per U is roughly $8 per month. Since the R730 is a 2U unit, the total cost for our databases would be $48 per month.

Modern colocation costs factor out the power consumption, as the power is a variable factor. Prices for power with colocation can vary a lot, but it currently averages around $0.20 per kWh. The average database server consumes around 200 watts, which results in a 144kWh consumption per month per server. For our three database servers this would result in $86 per month.

This results in the following TCO:

Cost item                            CAPEX      OPEX (per year)   TCO (3 years)
Purchase: hardware                   $22,965    -                 -
Professional support (DBA / Sysop)   -          $10,500           -
Colocation cost                      -          $576              -
Power cost (200W)                    -          $1,032            -
Replacement parts                    $1,500     -                 -
Total                                $24,465    $12,108           $60,789
TCO for database servers (hardware)

There are a couple of conclusions we can draw from this calculation. Costs for colocation, power and replacement parts are negligible compared to the other cost factors. Also, during the lifetime of a database server, the support costs make up more than half of the total costs, and are far higher than the original purchase price of the servers.

Example calculation cloud hosting

In this example we will make a calculation for a company that hosts in the cloud. To compare fairly, we will again make use of a three node MySQL replication setup on EC2. Amazon provides a nice TCO calculator for these purposes, so we made use of this as input for the calculations below.

To make the database servers comparable, we chose the i3.2xlarge, which (currently) has 8 vCPUs, 61GB of RAM and 1900GB of SSD storage. This currently costs $0.624 per hour, which is slightly below $15 per day and $5466 per year.

In the cloud, the upfront investments (CAPEX) are not necessary. This is true in many cases, except if you make use of reserved instances like in AWS. With reserved instances, you pay Amazon upfront to reserve (performant) capacity for you, which you can then use at will. In our calculation, we will not make use of reserved instances. Next to the lower CAPEX, our OPEX should also be lower, since our sysops don’t have to go to the data center or install these servers.

This results in the following TCO:

Cost item                            CAPEX      OPEX (per year)   TCO (3 years)
Professional support (DBA only)      -          $9,750            -
AWS 3x i3.2xlarge                    -          $16,399           -
Total                                $0         $26,149           $78,447
TCO for databases (cloud hosted)

Even though we have eliminated our upfront costs and capital investments (CAPEX), the OPEX is really high due to the premium we have to pay for high performance instances in AWS. Over a three year period, your TCO will be higher than with your own hardware.

OPEX has a large influence

As you can see from these calculations, the influence of the operational costs (OPEX) during the lifetime of the servers is far greater than the initial large investment of the CAPEX. This is mostly due to running (and owning) these servers for multiple years.

In the case of owning your own hardware, we have shown that the operational costs even outweigh the initial costs of purchasing these servers. For the AWS example, the total cost of “owning” these servers is even higher than in the hardware example. This is the premium paid for flexibility, as in a cloud environment you are free to upgrade to a newer instance every year.

For both examples it is clear that the professional support needed to run these databases is a relatively high cost. The sysops look far more efficient as they are managing more than 200 hosts, but then they don’t have to bother with the additional tasks that the DBA is supposed to do. If only you could make the DBA more efficient.

Making the DBA more efficient

Luckily, there are a few methods to make your DBA more efficient and able to handle more database servers: either relieve the DBA of various tasks (others will perform them), or have the DBA perform fewer tasks through automation. The low hanging fruit would be to automate the most time consuming tasks, or the most error prone ones.

The most time consuming DBA tasks are provisioning, deployments, performance tuning, troubleshooting, backups and scaling clusters. Provisioning and installation of software is a repetitive task that can easily be automated, just like copying data and setting up replication when scaling out with read slaves. Similarly, backups can be automated and, with the help of a backup manager, the restore process as well.

Among the most error prone actions we can identify setting up replication, failover and schema management. In these tasks, a single typo can have consequences of disastrous proportions, where the only way to recover is to restore from an earlier made backup.

Automation of repetitive tasks and chores is a tedious but useful exercise. It takes time to automate each and every one of these tasks, and your DBA will most certainly be busier with automation than with the other (daily) tasks. This automation may actually interfere with the normal day to day jobs, especially if the DBA isn’t the developer type and struggles with the automation work. Wouldn’t it be more productive to offer the DBA a readily available toolset to work with instead? Or perhaps provide the sysops with a way that allows them to perform these tasks instead?

Do you even need a DBA?

Not all companies have full-time DBAs, at least not if they have a small number of databases. A DBA is a very specialized role, where a single person is dedicated to performing all database related tasks. This requires specialized knowledge of specific hardware, specific software, operating systems and in-depth knowledge of SQL. Placing such a specialized person on a small number of databases means the person will not have a lot to do and will only cost money.

A sysop (or system administrator) is more of a generalist, and has to do a bit of everything. They generally manage hardware, operating systems, network, security, applications, databases and storage. They may have specific knowledge on one or more of these systems, but they can’t be specialized in all of them. The knowledge gap becomes more apparent when it comes to distributed setups (replication or clusters) and need for high availability.

As shown in the example calculations, the difference in salary reflects this picture as well. A DBA will cost more than a sysop and is more difficult to find, as there are not many of them around. That means that you probably have to do without a DBA. The challenge for the sysop is then to have enough time to keep the databases performing well, troubleshoot any issues, monitor for anomalies, maintain high availability, ensure data integrity and make sure that data is backed up (and backups are verified to be ok).

ClusterControl saving costs

In ClusterControl, the full database lifecycle has already been automated for the most popular open source databases. This means the DBA doesn’t necessarily have to automate his/her job anymore, as most of the tasks have already been automated in ClusterControl. Reliability will also increase, as the automation done in ClusterControl has been tested through and through, while what the DBA produces would only be tested directly in production.

Implementing a complete lifecycle management tool like ClusterControl means the DBA can now spend more time on useful things and manage more database servers. It can also mean that people with less knowledge of databases can perform the same tasks.

Online Schema Upgrade in MySQL Galera Cluster using RSU Method

This post is a continuation of our previous post on Online Schema Upgrade in Galera using TOI method. We will now show you how to perform a schema upgrade using the Rolling Schema Upgrade (RSU) method.

RSU and TOI

As we discussed, when using TOI, a change happens at the same time on all of the nodes. This can become a serious limitation, as this way of executing schema changes implies that no other queries can be executed. For long ALTER statements, the cluster may not even be available for hours. Obviously, this is not something you can accept in production. The RSU method addresses this weakness - changes happen on one node at a time while the other nodes are not affected and can serve traffic. Once the ALTER completes on one node, that node rejoins the cluster and you can proceed with executing the schema change on the next node.

Such behavior comes with its own set of limitations. The main one is that the scheduled schema change has to be compatible. What does that mean? Let’s think about it for a while. First of all, we need to keep in mind that the cluster is up and running all the time - the altered node has to be able to accept all of the traffic which hits the remaining nodes. In short, a DML executed on the old schema has to work also on the new schema (and vice-versa, if you use some sort of round-robin-like connection distribution in your Galera Cluster). We will focus on MySQL compatibility, but you also have to remember that your application has to work with both altered and non-altered nodes - make sure that your ALTER won’t break the application logic. One good practice is to explicitly pass column names to queries - don’t rely on “SELECT *”, because you never know how many columns you’ll get in return.
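
A trivial illustration, using the sbtest table from the demo further down in this post: prefer the first form over the second, so an appended column doesn’t change what the application receives.

-- explicit column list keeps working if a new column is appended to the table
SELECT id, k, c, pad FROM sbtest1.sbtest1 WHERE id = 100;
-- SELECT * would start returning the new column as well, possibly breaking the application
-- SELECT * FROM sbtest1.sbtest1 WHERE id = 100;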

Galera and Row-based binary log format

Ok, so the DML has to work on both the old and the new schema. How are DMLs transferred between Galera nodes? Does it affect which changes are compatible and which are not? Yes, indeed - it does. Galera does not use regular MySQL replication, but it still relies on it to transfer events between the nodes. To be precise, Galera uses the ROW format for events. An event in row format (after decoding) may look like this:

### INSERT INTO `schema`.`table`
### SET
###   @1=1
###   @2=1
###   @3='88764053989'
###   @4='14700597838'

Or:

### UPDATE `schema`.`table`
### WHERE
###   @1=1
###   @2=1
###   @3='88764053989'
###   @4='14700597838'
### SET
###   @1=2
###   @2=2
###   @3='88764053989'
###   @4='81084251066'

As you can see, there is a visible pattern: a row is identified by its content. There are no column names, just their order. This alone should turn on some warning lights: “what would happen if I removed one of the columns?” Well, if it is the last column, this is acceptable. If you removed a column in the middle, it would mess up the column order and, as a result, replication would break. A similar thing will happen if you add a column in the middle instead of at the end. There are more constraints, though. Changing a column definition will work as long as the data type stays the same - you can alter an INT column to become BIGINT, but you cannot change an INT column into VARCHAR - this will break replication. You can find a detailed description of which changes are compatible and which are not in the MySQL documentation. No matter what you see in the documentation, to stay on the safe side it’s better to run some tests on a separate development/staging cluster. Make sure it will work not only according to the documentation, but that it also works fine in your particular setup.

All in all, as you can clearly see, performing RSU in a safe way is much more complex than just running a couple of commands. Still, as the commands are important, let’s take a look at an example of how you can perform an RSU and what can go wrong in the process.

RSU example

Initial setup

Let’s imagine a rather simple example of an application. We will use a benchmark tool, Sysbench, to generate content and traffic, but the flow will be the same for almost every application - Wordpress, Joomla, Drupal, you name it. We will use HAProxy collocated with our application to split reads and writes among the Galera nodes in a round-robin fashion. You can check below how HAProxy sees the Galera cluster.

Whole topology looks like below:

Traffic is generated using the following command:

while true ; do sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --max-requests=0 --time=3600 --mysql-host=10.0.0.100 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=3307 --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable run ; done

Schema looks like below:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

First, let’s see how we can add an index to this table. Adding an index is a compatible change which can be easily done using RSU.

mysql> SET SESSION wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER TABLE sbtest1.sbtest1 ADD INDEX idx_new (k, c);
Query OK, 0 rows affected (5 min 19.59 sec)

As you can see in the Node tab, the host on which we executed the change has automatically switched to Donor/Desynced state which ensures that this host will not impact the rest of the cluster if it gets slowed down by the ALTER.

Let’s check how our schema looks now:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `idx_new` (`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

As you can see, the index has been added. Please keep in mind, though, this happened only on that particular node. To accomplish a full schema change, you have to follow this process on the remaining nodes of the Galera Cluster. To finish with the first node, we can switch wsrep_OSU_method back to TOI:

SET SESSION wsrep_OSU_method=TOI;
Query OK, 0 rows affected (0.00 sec)

We are not going to show the remainder of the process, because it’s the same - enable RSU at the session level, run the ALTER, re-enable TOI. What’s more interesting is what would happen if the change were incompatible. Let’s again take a quick look at the schema:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `idx_new` (`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Let’s say we want to change the type of column ‘k’ from INT to VARCHAR(30) on one node.

mysql> SET SESSION wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER TABLE sbtest1.sbtest1 MODIFY COLUMN k VARCHAR(30) NOT NULL DEFAULT '';
Query OK, 10004785 rows affected (1 hour 14 min 51.89 sec)
Records: 10004785  Duplicates: 0  Warnings: 0

Now, let’s take a look at the schema:

mysql> SHOW CREATE TABLE sbtest1.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` varchar(30) NOT NULL DEFAULT '',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `idx_new` (`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=29986632 DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

Everything is as we expect - the ‘k’ column has been changed to VARCHAR. Now we can check whether this change is acceptable for the Galera Cluster or not. To test it, we will use one of the remaining, unaltered nodes to execute the following query:

mysql> INSERT INTO sbtest1.sbtest1 (k, c, pad) VALUES (123, 'test', 'test');
Query OK, 1 row affected (0.19 sec)

Let’s see what happened.

It definitely doesn’t look good - our node is down. Logs will give you more details:

2017-04-07T10:51:14.873524Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.873560Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.879120Z 5 [Warning] WSREP: Failed to apply app buffer: seqno: 982675, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2017-04-07T10:51:14.879272Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.879287Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.879399Z 5 [Warning] WSREP: Failed to apply app buffer: seqno: 982675, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2017-04-07T10:51:14.879618Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.879633Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.879730Z 5 [Warning] WSREP: Failed to apply app buffer: seqno: 982675, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2017-04-07T10:51:14.879911Z 5 [ERROR] Slave SQL: Column 1 of table 'sbtest1.sbtest1' cannot be converted from type 'int' to type 'varchar(30)', Error_code: 1677
2017-04-07T10:51:14.879924Z 5 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 982675
2017-04-07T10:51:14.885255Z 5 [ERROR] WSREP: Failed to apply trx: source: 938415a6-1aab-11e7-ac29-0a69a4a1dafe version: 3 local: 0 state: APPLYING flags: 1 conn_id: 125559 trx_id: 2856843 seqnos (l: 392283, g: 9
82675, s: 982674, d: 982563, ts: 146831275805149)
2017-04-07T10:51:14.885271Z 5 [ERROR] WSREP: Failed to apply trx 982675 4 times
2017-04-07T10:51:14.885281Z 5 [ERROR] WSREP: Node consistency compromized, aborting…

As can be seen, Galera complained that the column cannot be converted from INT to VARCHAR(30). It attempted to re-execute the writeset four times and, unsurprisingly, it failed every time. Galera therefore determined that node consistency was compromised, and the node was kicked out of the cluster. The remaining content of the logs shows this process:

2017-04-07T10:51:14.885560Z 5 [Note] WSREP: Closing send monitor...
2017-04-07T10:51:14.885630Z 5 [Note] WSREP: Closed send monitor.
2017-04-07T10:51:14.885644Z 5 [Note] WSREP: gcomm: terminating thread
2017-04-07T10:51:14.885828Z 5 [Note] WSREP: gcomm: joining thread
2017-04-07T10:51:14.885842Z 5 [Note] WSREP: gcomm: closing backend
2017-04-07T10:51:14.896654Z 5 [Note] WSREP: view(view_id(NON_PRIM,6fcd492a,37) memb {
        b13499a8,0
} joined {
} left {
} partitioned {
        6fcd492a,0
        938415a6,0
})
2017-04-07T10:51:14.896746Z 5 [Note] WSREP: view((empty))
2017-04-07T10:51:14.901477Z 5 [Note] WSREP: gcomm: closed
2017-04-07T10:51:14.901512Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-04-07T10:51:14.901531Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-04-07T10:51:14.901541Z 0 [Note] WSREP: Received NON-PRIMARY.
2017-04-07T10:51:14.901550Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 982675)
2017-04-07T10:51:14.901563Z 0 [Note] WSREP: Received self-leave message.
2017-04-07T10:51:14.901573Z 0 [Note] WSREP: Flow-control interval: [0, 0]
2017-04-07T10:51:14.901581Z 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2017-04-07T10:51:14.901589Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 982675)
2017-04-07T10:51:14.901602Z 0 [Note] WSREP: RECV thread exiting 0: Success
2017-04-07T10:51:14.902701Z 5 [Note] WSREP: recv_thread() joined.
2017-04-07T10:51:14.902720Z 5 [Note] WSREP: Closing replication queue.
2017-04-07T10:51:14.902730Z 5 [Note] WSREP: Closing slave action queue.
2017-04-07T10:51:14.902742Z 5 [Note] WSREP: /usr/sbin/mysqld: Terminated.

Of course, ClusterControl will attempt to recover such a node - recovery involves running SST, so incompatible schema changes will be removed - but then we are back at square one: our schema change has been reversed.

As you can see, while running RSU is a very simple process on the surface, it can be rather complex underneath. It requires testing and preparation to make sure that you don't lose a node just because the schema change was not compatible.

ClusterControl Product Video for Galera Clusters


In this video, Art van Scheppingen shows you how easy it is to add your Galera setups to ClusterControl. ClusterControl provides you with a single console to deploy, manage, monitor and scale your Galera setups and mixed environments. In addition to advanced monitoring, ClusterControl provides features like comprehensive security, automation, reporting, and point-and-click deployments.

Watch the video and learn more about the following topics:

  • Deploying a new Galera cluster in ClusterControl
  • Importing an existing Galera Cluster into ClusterControl
  • Navigating the Cluster Overview section in ClusterControl
  • Accessing your Galera monitoring dashboards in ClusterControl
  • Accessing your Galera node data tables in ClusterControl
  • Accessing server statistics in ClusterControl
  • Navigating the nodes section in ClusterControl
  • Enabling binary logging from your Galera nodes in ClusterControl
  • Using advisors in your Galera setup
  • Performing backups
  • Accessing log files
  • Adding new nodes to your Galera Clusters
  • Adding Asynchronous slaves
  • Cloning Clusters in ClusterControl
  • Load balancing for HAProxy, ProxySQL, MaxScale
  • Galera arbitrator in ClusterControl

Video: ClusterControl for Galera Cluster


This video walks you through the features that ClusterControl offers for Galera Cluster and how you can use them to deploy, manage, monitor and scale your open source database environments.

ClusterControl and Galera Cluster

ClusterControl provides advanced deployment, management, monitoring, and scaling functionality to get your Galera clusters up-and-running using proven methodologies that you can depend on to work.

At the core of ClusterControl is its automation functionality, which lets you automate many of the database tasks you have to perform regularly, like deploying new clusters, adding and scaling nodes, running backups and upgrades, and more.

ClusterControl is cluster-aware, topology-aware and able to provision all members of the Galera Cluster, including the replication chain connected to it.

To learn more check out the following resources…

How to Set Up Asynchronous Replication from Galera Cluster to Standalone MySQL server with GTID


Hybrid replication, i.e. combining Galera and asynchronous MySQL replication in the same setup, became much easier since GTID got introduced in MySQL 5.6. Although it was fairly straightforward to replicate from a standalone MySQL server to a Galera Cluster, doing it the other way round (Galera → standalone MySQL) was a bit more challenging. At least until the arrival of GTID.

There are a few good reasons to attach an asynchronous slave to a Galera Cluster. For one, long-running reporting/OLAP type queries on a Galera node might slow down an entire cluster, if the reporting load is so intensive that the node has to spend considerable effort coping with it. So reporting queries can be sent to a standalone server, effectively isolating Galera from the reporting load. In a belts and suspenders approach, an asynchronous slave can also serve as a remote live backup.

In this blog post, we will show you how to replicate a Galera Cluster to a MySQL server with GTID, and how to failover the replication in case the master node fails.

Hybrid Replication in MySQL 5.5

In MySQL 5.5, resuming broken replication requires you to determine the last binary log file and position, which differ across the Galera nodes if binary logging is enabled. We can illustrate this situation with the following figure:

Galera cluster asynchronous slave topology without GTID

If the MySQL master fails, replication breaks and the slave needs to switch over to another master. You will need to pick a new Galera node and manually determine the binary log file and position of the last transaction executed by the slave. Another option is to dump the data from the new master node, restore it on the slave and start replication with the new master node. These options are of course doable, but not very practical in production.

How GTID Solves the Problem

GTID (Global Transaction Identifier) provides better transaction mapping across nodes, and is supported as of MySQL 5.6. In Galera Cluster, all nodes generate different binlog files. The binlog events are the same and in the same order, but the binlog file names and offsets may vary. With GTID, slaves can identify a unique transaction coming in from several masters, and this can easily be mapped into the slave execution list if the slave needs to restart or resume replication.

Galera cluster asynchronous slave topology with GTID failover

All necessary information for synchronizing with the master is obtained directly from the replication stream. This means that when you are using GTIDs for replication, you do not need to include the MASTER_LOG_FILE or MASTER_LOG_POS options in the CHANGE MASTER TO statement; it is only necessary to enable the MASTER_AUTO_POSITION option. You can find more details about GTID in the MySQL documentation.

Setting Up Hybrid Replication by hand

Make sure the Galera nodes (masters) and slave(s) are running on MySQL 5.6 before proceeding with this setup. We have a database called sbtest in Galera, which we will replicate to the slave node.

1. Enable required replication options by specifying the following lines inside each DB node’s my.cnf (including the slave node):

For master (Galera) nodes:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=1         # 1 for master1, 2 for master2, 3 for master3
binlog_format=ROW

For slave node:

gtid_mode=ON
log_bin=binlog
log_slave_updates=1
enforce_gtid_consistency
expire_logs_days=7
server_id=101         # 101 for slave
binlog_format=ROW
replicate_do_db=sbtest
slave_net_timeout=60

2. Perform a cluster rolling restart of the Galera Cluster (from ClusterControl UI > Manage > Upgrade > Rolling Restart). This will reload each node with the new configurations, and enable GTID. Restart the slave as well.

3. Create a slave replication user by running the following statement on one of the Galera nodes:

mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'%' IDENTIFIED BY 'slavepassword';

4. Log into the slave and dump the sbtest database from one of the Galera nodes:

$ mysqldump -uroot -p -h192.168.0.201 --single-transaction --skip-add-locks --triggers --routines --events sbtest > sbtest.sql

5. Restore the dump file onto the slave server:

$ mysql -uroot -p < sbtest.sql

6. Start replication on the slave node:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.201', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

To verify that replication is running correctly, examine the output of slave status:

mysql> SHOW SLAVE STATUS\G
       ...
       Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
       ...
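
As an additional sanity check, you can compare the executed GTID sets on the master and the slave - as replication catches up, the slave's set should converge towards the master's. A minimal check, run on both sides:

mysql> SELECT @@GLOBAL.gtid_executed\G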

Setting up Hybrid Replication using ClusterControl

In the previous paragraphs, we described all the necessary steps to enable the binary logs, restart the cluster node by node, copy the data and then set up replication. The procedure is tedious, and it is easy to make an error in one of these steps. In ClusterControl, we have automated all the necessary steps.

1. For ClusterControl users: go to the node in the Nodes page and enable binary logging.

Enable binary logging on Galera cluster using ClusterControl

This will open a dialogue that allows you to set the binary log expiration, enable GTID and auto restart.

Enable binary logging with GTID enabled

This initiates a job that will write these changes to the configuration, create replication users with the proper grants, and restart the node safely.


Repeat this process for each Galera node in the cluster, until all nodes indicate they are master.

All Galera Cluster nodes are now master

2. Add the asynchronous replication slave to the cluster

Adding an asynchronous replication slave to Galera Cluster using ClusterControl

And this is all you have to do. The entire process described in the previous paragraph has been automated by ClusterControl.

Changing Master

If the designated master goes down, the slave will retry the connection after slave_net_timeout seconds (60 seconds in our setup - the default is 1 hour). You should see the following error in the slave status:

       Last_IO_Errno: 2003
       Last_IO_Error: error reconnecting to master 'slave@192.168.0.201:3306' - retry-time: 60  retries: 1

Since we are using Galera with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery are enabled. Whether the master fails due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable master node in the cluster.

If you wish to perform the failover manually, simply change the master node as follows:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

In some cases, you might encounter a “Duplicate entry .. for key” error after the master node changed:

       Last_Errno: 1062
       Last_Error: Could not execute Write_rows event on table sbtest.sbtest; Duplicate entry '1089775' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysqld-bin.000009, end_log_pos 85789000

In older versions of MySQL, you can just use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = n to skip statements, but it does not work with GTID. Miguel from Percona wrote a great blog post on how to repair this by injecting empty transactions.
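
In short, the trick is to make the slave believe it has already executed the offending transaction by committing an empty transaction under that GTID. A minimal sketch - the GTID value below is a placeholder, take the failing transaction's GTID from the slave's error context:

mysql> STOP SLAVE;
mysql> SET GTID_NEXT='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:12345';
mysql> BEGIN; COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';
mysql> START SLAVE;

Only do this if you are sure the slave's data is still consistent (or you have fixed the conflicting row by hand), otherwise you will silently diverge from the master.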

Another approach, for smaller databases, is to simply take a fresh dump from any of the available Galera nodes, restore it on the slave and use the RESET MASTER statement:

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> DROP SCHEMA sbtest; CREATE SCHEMA sbtest; USE sbtest;
mysql> SOURCE /root/sbtest_from_galera2.sql; -- repeat step #4 above to get this dump
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.202', MASTER_PORT = 3306, MASTER_USER = 'slave', MASTER_PASSWORD = 'slavepassword', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

You may also use pt-table-checksum to verify replication integrity; more information is available in this blog post.

Note: Since the slave applier in MySQL replication is still single-threaded by default, do not expect asynchronous replication performance to match Galera's parallel replication. For MySQL 5.6 and 5.7 there are options to execute asynchronous replication in parallel on the slave nodes, but in principle this replication still depends on the correct ordering of transactions within the same schema. If the replication load is intensive and continuous, the slave lag will just keep growing. We have seen cases where the slave could never catch up with the master.
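
For reference, enabling a parallel applier on a MySQL 5.7 slave looks roughly like the snippet below - a sketch only, as the right number of workers depends on your workload (on 5.6, parallelism is per-schema and only slave_parallel_workers applies):

# my.cnf on the asynchronous slave (MySQL 5.7)
slave_parallel_type    = LOGICAL_CLOCK
slave_parallel_workers = 4
# keeps commits on the slave in the same order as on the master;
# requires log_bin and log_slave_updates, which we already enabled above
slave_preserve_commit_order = 1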


How to Deploy Asynchronous Replication Slave to MariaDB Galera Cluster 10.x using ClusterControl


Combining Galera and asynchronous replication in the same MariaDB setup, aka Hybrid Replication, can be useful - e.g. as a live backup node in a remote datacenter or reporting/analytics server. We already blogged about this setup for Codership/Galera or Percona XtraDB Cluster users, but a master failover as described in that post does not work for MariaDB because of its different GTID approach. In this post, we will show you how to deploy an asynchronous replication slave to MariaDB Galera Cluster 10.x (with master failover!), using GTID with ClusterControl.

Preparing the Master

First and foremost, you must ensure that the master and slave nodes are running MariaDB Galera 10.0.2 or later. A MariaDB replication slave requires at least one master with GTID among the Galera nodes. However, we would recommend configuring all the MariaDB Galera nodes as masters. GTID, which is automatically enabled in MariaDB, will be used for master failover. The following must be true for the masters:

  • At least one master among the Galera nodes
  • All masters must be configured with the same domain ID
  • log_slave_updates must be enabled
  • All masters’ MariaDB port is accessible by ClusterControl and slaves
  • Must be running MariaDB version 10.0.2 or later

From ClusterControl, this is easily done by selecting Enable Binary Logging in the drop-down for each node.

Enabling binary logging through ClusterControl

And then enable GTID in the dialogue:

Once Proceed has been clicked, a job will automatically configure the Galera node according to the settings described earlier.

If you wish to perform this action by hand, you can configure a Galera node as a master by changing the MariaDB configuration file for that node as shown below:

gtid_domain_id=<must be same across all mariadb servers participating in replication>
server_id=<must be unique>
binlog_format=ROW
log_slave_updates=1
log_bin=binlog

After making these changes, restart the nodes one by one, or use a rolling restart (ClusterControl > Manage > Upgrades > Rolling Restart).


Preparing the Slave

For the slave, you need a separate host or VM, with or without MariaDB installed. If you do not have MariaDB installed, you need to perform the following tasks: configure the root password (based on monitored_mysql_root_password), create the slave user (based on repl_user, repl_password), configure MariaDB, start the server and finally start replication (the last step is sketched below).
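
If you go the manual route, the final step uses MariaDB's own GTID syntax - the slave is pointed at a master with MASTER_USE_GTID rather than MySQL's MASTER_AUTO_POSITION. A minimal sketch (host and credentials are placeholders - use one of your Galera nodes and the replication user you created):

mysql> CHANGE MASTER TO MASTER_HOST='192.168.0.201', MASTER_PORT=3306, MASTER_USER='slave_user', MASTER_PASSWORD='slave_password', MASTER_USE_GTID=slave_pos;
mysql> START SLAVE;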

When adding the slave using ClusterControl, all of these steps are automated in the Add Replication Slave job, as described below.

Add replication slave to MariaDB Cluster

After adding our slave node, our deployment will look like this:

MariaDB Galera asynchronous slave topology

Master Failover and Recovery

Since we are using MariaDB with GTID enabled, master failover is supported via ClusterControl when Cluster and Node Auto Recovery are enabled. Whether the master fails due to network connectivity or any other reason, ClusterControl will automatically fail over to the most suitable master node in the cluster.

Automatic slave failover to another master in Galera cluster

This way ClusterControl will add a robust asynchronous slave capability to your MariaDB Cluster!

MySQL on Docker: ClusterControl and Galera Cluster on Docker Swarm


Our journey in adopting MySQL and MariaDB in containerized environments continues, with ClusterControl coming into the picture to facilitate deployment and management. We already have our ClusterControl image hosted on Docker Hub, where it can deploy different replication/cluster topologies on multiple containers. With the introduction of Docker Swarm, a native orchestration tool embedded inside Docker Engine, scaling and provisioning containers has become much easier. It also has high availability covered by running services on multiple Docker hosts.

In this blog post, we'll be experimenting with automatic provisioning of Galera Cluster on Docker Swarm with ClusterControl. ClusterControl usually deploys database clusters on bare metal, virtual machines and cloud instances. It relies on SSH (through libssh) as its core communication module to connect to the managed hosts, so these do not require any agents. The same approach can be applied to containers, and that's what we are going to show in this blog post.

ClusterControl as Docker Swarm Service

We have built a Docker image with extended logic to handle deployment in container environments in a semi-automatic way. The image is now available on Docker Hub and the code is hosted in our Github repository. Please note that only this image is capable of deploying on containers; this logic is not available in the standard ClusterControl installation packages.

The extended logic lives in deploy-container.sh, a script that monitors a custom table inside the CMON database called “cmon.containers”. Each newly created database container reports and registers itself into this table, and the script looks for new entries and performs the necessary actions using the ClusterControl CLI. The deployment is automatic, and you can monitor the progress directly from the ClusterControl UI or using the “docker logs” command.

Before we go further, take note of some prerequisites for running ClusterControl and Galera Cluster on Docker Swarm:

  • Docker Engine version 1.12 and later.
  • Docker Swarm Mode is initialized.
  • ClusterControl must be connected to the same overlay network as the database containers.

To run ClusterControl as a service using “docker stack”, the following definition should be enough:

 clustercontrol:
    deploy:
      replicas: 1
    image: severalnines/clustercontrol
    ports:
      - 5000:80
    networks:
      - galera_cc

Or, you can use the “docker service” command as per below:

$ docker service create --name cc_clustercontrol -p 5000:80 --replicas 1 severalnines/clustercontrol

Or, you can combine the ClusterControl service together with the database container service and form a “stack” in a compose file as shown in the next section.

Base Containers as Docker Swarm Service

The base container image, called “centos-ssh”, is based on the CentOS 6 image. It comes with a couple of basic packages like the SSH server and clients, curl and the mysql client. The entrypoint script downloads ClusterControl’s public key for passwordless SSH during startup. It also registers the container in ClusterControl’s CMON database for automatic deployment.

Running this container requires a couple of environment variables to be set:

  • CC_HOST - Mandatory. By default it will try to connect to “cc_clustercontrol” service name. Otherwise, define its value in IP address, hostname or service name format. This container will download the SSH public key from ClusterControl node automatically for passwordless SSH.
  • CLUSTER_TYPE - Mandatory. Default to “galera”.
  • CLUSTER_NAME - Mandatory. This name distinguishes the cluster from others from the ClusterControl perspective. No spaces are allowed and it must be unique.
  • VENDOR - Default is “percona”. Other supported values are “mariadb”, “codership”.
  • DB_ROOT_PASSWORD - Mandatory. The database root password for the database server. In this case, it should be MySQL root password.
  • PROVIDER_VERSION - Default is 5.6. The database version by the chosen vendor.
  • INITIAL_CLUSTER_SIZE - Default is 3. This indicates how ClusterControl should treat newly registered containers, whether they are for new deployments or for scaling out. For example, if the value is 3, ClusterControl will wait for 3 containers to be running and registered into the CMON database before starting the cluster deployment job. Otherwise, it waits 30 seconds for the next cycle and retries. The next containers (4th, 5th and Nth) will fall under the “Add Node” job instead.

To run the container, simply use the following stack definition in a compose file:

  galera:
    deploy:
      replicas: 3
    image: severalnines/centos-ssh
    ports:
      - 3306:3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "PXC_Docker"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - galera_cc

By combining them both (ClusterControl and the database base containers), we can deploy them under a single stack as shown below:

version: '3'

services:

  galera:
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 10s
    image: severalnines/centos-ssh
    ports:
      - 3306:3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "Galera_Docker"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - galera_cc

  clustercontrol:
    deploy:
      replicas: 1
    image: severalnines/clustercontrol
    ports:
      - 5000:80
    networks:
      - galera_cc

networks:
  galera_cc:
    driver: overlay

Save the above lines into a file, for example docker-compose.yml in the current directory. Then, start the deployment:

$ docker stack deploy --compose-file=docker-compose.yml cc
Creating network cc_galera_cc
Creating service cc_clustercontrol
Creating service cc_galera

Docker Swarm will deploy one container for ClusterControl (replicas: 1) and another 3 containers for the database cluster (replicas: 3). Each database container will then register itself in the CMON database for deployment.

Wait for a Galera Cluster to be ready

The deployment will be automatically picked up by the ClusterControl CLI. So you basically don’t have to do anything but wait. The deployment usually takes around 10 to 20 minutes depending on the network connection.

Open the ClusterControl UI at http://{any_Docker_host}:5000/clustercontrol, fill in the default administrator user details and log in. Monitor the deployment progress under Activity -> Jobs, as shown in the following screenshot:

Or, you can look at the progress directly from the docker logs command of the ClusterControl container:

$ docker logs -f $(docker ps | grep clustercontrol | awk {'print $1'})
>> Found the following cluster(s) is yet to deploy:
Galera_Docker
>> Number of containers for Galera_Docker is lower than its initial size (3).
>> Nothing to do. Will check again on the next loop.
>> Found the following cluster(s) is yet to deploy:
Galera_Docker
>> Found a new set of containers awaiting for deployment. Sending deployment command to CMON.
>> Cluster name         : Galera_Docker
>> Cluster type         : galera
>> Vendor               : percona
>> Provider version     : 5.7
>> Nodes discovered     : 10.0.0.6 10.0.0.7 10.0.0.5
>> Initial cluster size : 3
>> Nodes to deploy      : 10.0.0.6;10.0.0.7;10.0.0.5
>> Deploying Galera_Docker.. It's gonna take some time..
>> You shall see a progress bar in a moment. You can also monitor
>> the progress under Activity (top menu) on ClusterControl UI.
Create Galera Cluster
- Job  1 RUNNING    [██▊       ]  26% Installing MySQL on 10.0.0.6

That’s it. Wait until the deployment completes and you will then be all set with a three-node Galera Cluster running on Docker Swarm, as shown in the following screenshot:

In ClusterControl, the cluster has the same look and feel as Galera running in a standard (non-container) host environment.


Management

Managing database containers is a bit different with Docker Swarm. This section provides an overview of how the database containers should be managed through ClusterControl.

Connecting to the Cluster

To verify the status of the replicas and service name, run the following command:

$ docker service ls
ID            NAME               MODE        REPLICAS  IMAGE
eb1izph5stt5  cc_clustercontrol  replicated  1/1       severalnines/clustercontrol:latest
ref1gbgne6my  cc_galera          replicated  3/3       severalnines/centos-ssh:latest

If the application/client is running on the same Swarm network space, you can connect to it directly via the service name endpoint. If not, use the routing mesh by connecting to the published port (3306) on any of the Docker Swarm nodes. The connection to these endpoints will be load balanced automatically by Docker Swarm in a round-robin fashion.
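
For example, a quick connectivity test through the routing mesh could look like the line below - a sketch only, where the IP address is a placeholder for any Docker Swarm node, the password comes from the stack definition above, and we assume the root account is allowed to connect from that address:

$ mysql -uroot -pmypassword123 -h 192.168.55.111 -P 3306 -e "SHOW STATUS LIKE 'wsrep_cluster_size'"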

Scale up/down

Typically, when adding a new database node, we need to prepare a new host with a base operating system and passwordless SSH. In Docker Swarm, you just need to scale the service out to the desired number of replicas using the following command:

$ docker service scale cc_galera=5
cc_galera scaled to 5

ClusterControl will then pick up the new containers registered inside cmon.containers table and trigger add node jobs, for one container at a time. You can look at the progress under Activity -> Jobs:

Scaling down is similar, using the same “service scale” command. However, ClusterControl doesn’t know whether containers removed by Docker Swarm disappeared because of rescheduling or because we deliberately scaled down and wanted them removed. Thus, to scale down from 5 nodes to 3 nodes, one would:

$ docker service scale cc_galera=3
cc_galera scaled to 3

Then, remove the stopped hosts from the ClusterControl UI by going to Nodes -> rollover the removed container -> click on the ‘X’ icon on the top right -> Confirm & Remove Node:

ClusterControl will then execute a remove node job and bring back the cluster to the expected size.

Failover

In case of container failure, Docker Swarm’s automatic rescheduling will kick in, and a new replacement container will appear with the same IP address as the old one (but a different container ID). ClusterControl will then provision this node from scratch - performing the installation, configuring it and getting it to rejoin the cluster. The old container is removed automatically from ClusterControl before the deployment starts.

Go ahead and try to kill one of the database containers:

$ docker kill [container ID]

You’ll see that the new containers Swarm creates are provisioned automatically by ClusterControl.

Creating a new cluster

To create a new cluster, just create another service or stack with a different CLUSTER_NAME and service name. The following example creates another Galera Cluster running on MariaDB 10.1 (some extra environment variables are required for MariaDB 10.1):

version: '3'
services:
  galera2:
    deploy:
      replicas: 3
    image: severalnines/centos-ssh
    ports:
      - 3306
    environment:
      CLUSTER_TYPE: "galera"
      CLUSTER_NAME: "MariaDB_Galera"
      VENDOR: "mariadb"
      PROVIDER_VERSION: "10.1"
      INITIAL_CLUSTER_SIZE: 3
      DB_ROOT_PASSWORD: "mypassword123"
    networks:
      - cc_galera_cc

networks:
  cc_galera_cc:
    external: true

Then, create the service:

$ docker stack deploy --compose-file=docker-compose.yml db2

Go back to ClusterControl UI -> Activity -> Jobs and you should see that a new deployment has started. After a couple of minutes, the new cluster will be listed in the ClusterControl dashboard:

Destroying everything

To remove everything (including the ClusterControl container), you just need to remove the stack created by Docker Swarm:

$ docker stack rm cc
Removing service cc_clustercontrol
Removing service cc_galera
Removing network cc_galera_cc

That’s it, the whole stack has been removed. Pretty neat huh? You can start all over again by running the “docker stack deploy” command and everything will be ready after a couple of minutes.

Summary

The flexibility you get by running a single command to deploy or destroy a whole environment can be useful in different types of use cases such as backup verification, DDL procedure testing, query performance tweaking, experimenting for proof-of-concepts and staging temporary data. These use cases are closer to a development environment. With this approach, you can now treat a stateful service “statelessly”.

Would you like to see ClusterControl manage the whole database container stack through the UI via point and click? Let us know your thoughts in the comments section below. In the next blog post, we are going to look at how to perform automatic backup verification on Galera Cluster using containers.

How Galera Cluster Enables High Availability for High Traffic Websites


In today’s competitive technology environment, high availability is a must. There is no way around it - if your website or service is not available, then you are most probably losing money. The impact can be direct - your customers cannot access your e-commerce service and cannot spend their money (and they are likely to turn to your competitors instead). Or it can be less direct - your sales reps cannot reach your web-based CRM system, which seriously limits their productivity. Either way, a website that cannot be reached is a serious problem for any organization. The question is - how do we ensure that your website stays available? Assuming you are using MySQL or MariaDB (not unlikely, if it is a website), one of the technologies that can be utilized is Galera Cluster. In this blog post, we’ll show you how to leverage a highly available database to improve the availability of your site.

High availability is hard with databases

Every website is different, but in general, we’ll see some frontend webservers, a database backend, load balancers, file system storage and additional components like caching systems. To make a website highly available, we would need each component to be highly available so we do not have any single point of failure (SPOF).

The webserver tier is usually relatively easy to scale, as it is possible to deploy multiple instances behind a load balancer. If you want to preserve session state across all webservers, you would probably store it in a shared datastore (Memcached, or even MySQL Cluster for a pretty robust alternative). Load balancers can be made highly available (e.g. HAProxy/Keepalived/VIP). There are a number of clustered file systems that can be used for the storage layer; we have previously covered solutions like csync2 with lsyncd, GlusterFS and OCFS2. The database service can be made highly available with e.g. DRBD, so all storage is replicated to a standby server. This means the service can be started on another host that has access to the database files.

This might not work very well for a high traffic website though, as all webservers will still be hitting the primary database instance. Failover time also can take a while, since you are failing over to a cold standby server and MySQL has to perform crash recovery when starting up.

MySQL master-slave replication is another option, and there are different ways to make it highly available. There are inconveniences though, as not all nodes are the same - you need to make sure you write to just one master and avoid diverging datasets across the nodes.

Galera Cluster for MySQL/MariaDB

Let’s start with discussing what Galera Cluster is and what it is not. It is a virtually synchronous, multi-master cluster. You can access any of the nodes and issue reads and writes - this is a significant improvement compared to replication setups: no need for failovers and master promotions; if one node is down, you can usually just connect to another node and execute queries. Galera provides a self-healing mechanism through state transfers - State Snapshot Transfer (SST) and Incremental State Transfer (IST). When a node joins a cluster, Galera will attempt to bring it back in sync. It may just copy the missing data from another node’s gcache (IST) or, if no node holds the required data in its gcache, it will copy the full data set from one of the nodes to the joining node (via SST). This also makes it very easy for a Galera Cluster to recover even from serious failure conditions. Galera Cluster is a way to scale reads - you can increase the size of the cluster and read from all of the nodes.

On the other hand, you have to keep in mind what Galera is not. Galera is not a solution which implements sharding (like NDB Cluster does) - it does not shard the data automatically, each and every Galera node contains the same full data set, just like a standalone MySQL node. Therefore, Galera is not a solution which can help you scale writes. It can help you squeeze more writes than standard, single-threaded replication as it can utilize multiple writers at once, but as of MySQL 5.7, you can use multithreaded replication for every workload so it’s not the same advantage it used to be. Galera is not a solution which can be left alone - even though it has some auto healing features, it still requires user supervision and it happens pretty often that Galera cannot recover on its own.

Having said that, Galera Cluster is still a great piece of software, which can be utilized to build highly available clusters. In the next section we’ll show you different deployment patterns for Galera. Before we get there, there is one more very important bit of information required to understand why we want to deploy Galera clusters the way we are about to describe - quorum calculation. Galera has a mechanism which prevents split brain scenarios from happening. It detects the number of available nodes and checks whether there is a quorum - Galera has to see more than 50% of the nodes to accept traffic. Otherwise it assumes that a split brain has happened and that it is part of a minority segment, which may not contain current data and cannot accept writes. Ok, now let’s talk about different ways in which you can deploy a Galera cluster.

Deploying Galera cluster

Basic deployment - three node cluster

The most common way to deploy a Galera cluster is to use an odd number of nodes. The most basic setup for Galera is a three-node cluster. This setup can survive the loss of one node - it can still operate just fine, even though its read capacity decreases and it cannot tolerate any further failures. Such a setup is a very common entry point - you can always add more nodes in the future, keeping in mind that you should use an odd number of nodes. We have seen Galera clusters as big as 11 to 13 nodes.

Minimalistic approach - two nodes + garbd

If you want to do some testing with Galera, you can reduce the cluster size to two nodes. A 2-node cluster does not provide any fault tolerance, but we can improve this by leveraging Galera Arbitrator (garbd) - a daemon which can be started on a third node. It receives all of the Galera traffic and, for the purpose of detecting failures and forming a quorum, acts as a Galera node. Given that garbd doesn’t need to apply writesets (it just accepts them; no further action is taken), it doesn’t require beefy hardware like the database nodes do. This reduces the cost of the hardware needed to build the cluster.
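
If you set this up by hand (outside ClusterControl), starting the arbitrator is a single command; a minimal sketch, where the addresses and the group name are placeholders that must match your wsrep_cluster_address and wsrep_cluster_name:

$ garbd --address "gcomm://192.168.0.101:4567,192.168.0.102:4567" --group my_galera_cluster --daemon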

One step further - add asynchronous slave

At any point you can extend your Galera cluster by adding one or more asynchronous slaves. Ideally, your Galera cluster has GTID enabled - it makes adding and reslaving slaves so much easier. Even if not, it is still possible to create a slave, although replication is much less likely to survive a crash of the “master” Galera node. Such an asynchronous slave can be used for a variety of purposes. It can serve as a backup host - run your backups on it to minimize the impact on the Galera cluster. It can also handle heavier OLAP queries - again, to remove load from the Galera cluster. Another reason to use an asynchronous slave is to build a Disaster Recovery environment - set it up in a separate datacenter and, should the location of your Galera Cluster go up in flames, you will still have a copy of your data in a safe place.

Multi-datacenter Galera clusters

If you really care about the availability of your data, you can improve it by spanning your Galera cluster across multiple datacenters. Galera can be used across the WAN - some reconfiguration may be needed to adapt it to the higher latency of WAN connections, but it is perfectly suitable for such an environment. The only blocker would be an application that frequently modifies a very small subset of rows - this could significantly reduce the number of queries per second the cluster can execute.

When talking about WAN-spanning Galera clusters, it is important to mention segments. Segments are used in Galera to group the nodes of the cluster which are collocated in the same datacenter. For example, you may want to configure all nodes in datacenter “A” to use segment 1 and all nodes in datacenter “B” to use segment 2. We won’t go into details here (we covered this extensively in one of our posts) but in short, using segments reduces inter-segment communication - something you definitely want when segments are connected over a WAN.
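
Segments are assigned per node through the Galera provider options. A minimal sketch of what this looks like in my.cnf - in practice you would append gmcast.segment to whatever other provider options you already set:

# on every node in datacenter "A"
wsrep_provider_options="gmcast.segment=1"
# on every node in datacenter "B"
wsrep_provider_options="gmcast.segment=2"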

Another important aspect of building a highly available Galera Cluster over multiple datacenters is to use an odd number of datacenters. Two is not enough to build a setup which would automatically tolerate the loss of a datacenter. Let’s analyze the following setup.

As we mentioned earlier, Galera requires a quorum to operate. Let’s see what would happen if one datacenter is not available:

As you can clearly see, we ended up with 3 nodes up - only 50%, which is not enough to form a quorum. Manual action is required to assess the situation and promote the remaining part of the cluster to a “Primary Component” - this takes time and prolongs downtime.

Let’s take a look at another option:

In this case we added one more node to the datacenter “B” to make sure it will take over the traffic when DC “A” is down. But what would happen if DC “B” goes down?

We have three nodes out of seven - less than 50%. The only way around this is to use a third datacenter. You can, of course, add one more segment of three nodes (or more - it’s important to have the same number of nodes in each datacenter), but you can minimize costs by utilizing Galera Arbitrator:

In this case, no matter which datacenter stops operating, as long as it is only one of them, garbd gives us a quorum:

In our example it is: seven nodes in total, three down, four (3 + garbd) up - enough to form a quorum.

Multiple Galera clusters connected using asynchronous replication

We mentioned that you can deploy an asynchronous slave to a Galera cluster and use it, for example, as a DR host. You also have to keep in mind that a Galera cluster, to some extent, can be treated as a single MySQL instance. This means there’s no reason why you couldn’t connect two separate Galera clusters using asynchronous replication. Such a setup is pretty common. Again, as with regular slaves, it’s better to have GTID because it allows you to quickly reslave your “standby” Galera cluster to another Galera node in the “active” cluster. Without GTID this is also possible, but much more time-consuming and error-prone. Of course, such a setup does not behave the way a multi-DC Galera cluster would - there’s no automated recovery of the replication link between datacenters, and no automated reslaving if a “master” Galera node goes down. You may need to build your own tools to automate this process.

Proxy layer

There is no high availability without a proxy layer (unless you have built-in HA in your application). Static connections to a single host do not scale. What you need is a middleman - something that sits between your application and your database tier and masks the complexity of your database setup. Ideally, your application will have just a single point of access to your databases - connect to a given host and port - that’s it.

You can choose between different proxies - HAProxy, MaxScale, ProxySQL - all of them can work with Galera Cluster. Some of them, though, may require additional configuration - you need to know what needs to be done and how. Additionally, it’s extremely important not to end up with the proxy itself as a single point of failure. Each of these proxies can be deployed in a highly available fashion, but you will need a few components working together.

How can ClusterControl help you build highly available Galera Clusters?

ClusterControl can help deploy Galera using all the vendors that are available: Codership, Percona XtraDB Cluster and MariaDB Cluster.

Once you have deployed a cluster, you can easily scale it up. If needed, you can define different segments for a WAN-spanning cluster.

You can also, if you want, deploy an asynchronous slave to the Galera Cluster.

You can use ClusterControl to deploy Galera Arbitrator, which can be very helpful (as shown previously) in multi-datacenter deployments.

Proxy layer

ClusterControl gives you the ability to deploy different proxies with your Galera Cluster. It supports deployment of HAProxy, MaxScale and ProxySQL. For HAProxy and ProxySQL, there are additional options to deploy redundant instances with Keepalived and Virtual IP.

For HAProxy, ClusterControl has a statistics page. You can also set a node to maintenance state:

For MaxScale, using ClusterControl, you have access to MaxScale’s CLI and you can perform any actions that are possible from it. Please note we deploy MaxScale version 1.4.3 - more recent versions introduced licensing limitations.

ClusterControl provides quite a bit of functionality for managing ProxySQL, including creating query rules and query caching. If you are interested in more details, we encourage you to watch the replay of one of our webinars in which we demoed this UI.

As mentioned previously, ClusterControl can also deploy HAProxy and ProxySQL in a highly available setup. We use virtual IP and Keepalived to track the state of services and perform a failover if needed.

As we have seen in this blog, Galera Cluster can be used in a number of ways to provide high availability of your database, which is a key part of the web infrastructure. You are welcome to download ClusterControl and try the different topologies we discussed above.

ClusterControl for Galera Cluster for MySQL


ClusterControl allows you to easily manage your database infrastructure on premise or in the cloud. With in-depth support for technologies like Galera Cluster for MySQL and MariaDB setups, you can truly automate mixed environments for next-level applications.

Since the launch of ClusterControl in 2012, we’ve experienced growth in new industries with customers who are benefiting from the advancements ClusterControl has to offer - in particular when it comes to Galera Cluster for MySQL.

In addition to reaching new highs in ClusterControl demand, this past year we’ve doubled the size of our team allowing us to continue to provide even more improvements to ClusterControl.

Take a look at this infographic for our top Galera Cluster for MySQL resources and information about how ClusterControl works with Galera Cluster.

Webinar Replay and Q&A: High Availability in ProxySQL for HA MySQL infrastructures


Thanks to everyone who participated in our recent webinar on High Availability in ProxySQL and on how to build a solid, scalable and manageable proxy layer using ProxySQL for highly available MySQL infrastructures.

This second joint webinar with ProxySQL creator René Cannaò saw lots of interest and some nice questions from our audience, which we’re sharing below with this blog post as well as the answers to them.

Building a highly available proxy layer creates additional challenges, such as how to manage multiple proxy instances, how to ensure that their configuration is in sync, Virtual IP and fail-over; and more, which we’ve covered in this webinar with René. And we demonstrated how you can make your ProxySQL highly available when deploying it from ClusterControl (download & try it free).

If you missed the webinar, would like to watch it again or browse through the slides, it is available for viewing online.

Watch the webinar replay

Webinar Questions & Answers

Q.: In a MySQL master/slave pair, I am inclined to deploy ProxySQL instances directly on both master and slave hosts. In an environment of 100s of master/slave pairs, with new hosts being built all the time, I can see this as a good way to combine host / MySQL / ProxySQL master/slave pair deploys via a single Ansible playbook. Do you guys have any thoughts on this?

A.: Our only concern here is that co-locating ProxySQL with database servers can make the debugging of database performance issues harder - the proxy will add overhead for CPU and memory and MySQL may have to compete for those resources.

Additionally, we’re not really sure what you’d like to achieve by deploying ProxySQL on all database servers - where would you like to connect? To one instance or to both? In the first case, you’d have to come up with a solution to handling potentially hundreds of failovers - when a master goes down, you’d have to re-route traffic to the ProxySQL instance on a slave. It adds more complexity than it’s really worth. The second case also creates complexity: instead of connecting to one proxy, the application would have to connect to both.

Co-locating ProxySQL on the application hosts is not that much more complex regarding configuration management than deploying it on database hosts. Yet it makes it so much easier for the application to route traffic - just connect to the local ProxySQL instance over the UNIX socket and that’s all.
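
For instance, if ProxySQL is configured to listen on a UNIX socket alongside its usual 6033 port (the exact path depends on your mysql-interfaces setting - the one below is only an assumed default), the application-side connection can be as simple as:

$ mysql -u appuser -p -S /tmp/proxysql.sock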

Q.: Do you recommend for multiple ProxySQL instances to talk to each other or is it preferable for config changes to rely on each ProxySQL instance detecting the same issue at the same time? For example, would you make ProxySQL01 apply config changes in proxysql_master_switchover.sh to both itself and ProxySQL02 to ensure they stay the same? (I hope this isn't a stupid question... I've not yet succeeded in making this work so I thought maybe I'm missing something!)

A.: This is a very good question indeed. As long as you have scripts which would ensure that the configuration is the same on all of the ProxySQL instances - it should result in more consistent configuration across the whole infrastructure.

Q.: Sometimes I get the following warning 2017-04-04T02:11:43.996225+02:00 Keepalived_vrrp: Process [113479] didn't respond to SIGTERM. and VIP was moved to another server ... I can send you the complete configuration keepalived ... I didn't find a solution as to why I am getting this error/warning.

A.: This may happen from time to time. A timeout results in a failed check, which triggers a VIP failover. As to why the monitored process didn’t respond to the signal in time, that’s really hard to tell. It is possible to increase the number of health-check failures required to trigger a VIP move, to minimize the impact of such timeouts.
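
In Keepalived terms, this means raising the fall (and optionally rise) counters of the tracking script. A hedged sketch of the relevant fragment - the script and intervals are placeholders to adapt to your setup:

vrrp_script chk_proxysql {
    script "killall -0 proxysql"   # succeeds as long as the proxysql process exists
    interval 2                     # run the check every 2 seconds
    fall 4                         # 4 consecutive failures before the node is marked faulty
    rise 2                         # 2 consecutive successes before it is marked healthy again
}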

Q.: What load balancer can we use in front of ProxySQL?

A.: You can use virtually any load balancer out there, including ProxySQL itself - this is actually a topology we’d suggest. It’s better to rely on a single piece of software than to use ProxySQL plus another tool which would be redundant - a steeper learning curve, more issues to debug.

Q.: When I started using ProxySQL I had this issue "access denied for MySQL user"; it was random, what is the cause of it?

A.: If it is random and not systematic, it may be worth investigating whether it is a bug. We strongly recommend opening an issue on GitHub.

Q.: I have tried ProxySQL and the issue we faced was that after using ProxySQL to split read/write, the connection switched to the master for all reads. How can we prevent this?

A.: This is most likely a configuration issue, and there are multiple reasons why this may happen. For example, if transaction_persistent was set to 1 and reads were all within a transaction. Or perhaps the query rules in mysql_query_rules weren’t configured correctly, and all traffic was being sent to the default hostgroup (the master).
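
For reference, a typical read/write split is expressed with a pair of rules like the ones below - a sketch, with hostgroup numbers as placeholders (10 for the writer, 20 for the readers); anything not matched falls back to the user's default_hostgroup:

mysql> INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
mysql> INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES (2, 1, '^SELECT', 20, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
mysql> SAVE MYSQL QUERY RULES TO DISK;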

Q.: How can Service Discovery help me?

A.: If your infrastructure is constantly changing, tools like etcd, Zookeeper or Consul can help you to track those changes, detect and push configuration changes to proxies. When your database clusters are going up and down, this can simplify configuration management.

Q.: In the discussion on structure, the load balancer scenario was quickly moved on from because of its single point of failure. How about when having a HA load balancer using CNAMES (not IP) for example AWS ElasticLoadBalancer on TCP ports. Would that be a structure that could work well in production?

A.: As long as the load balancer is highly available, this is not a problem, because it’s not a single point of failure. ELB itself is deployed in HA mode, so having a single ELB in front of anything (database servers, a pool of ProxySQL instances) will not introduce a single point of failure.

Q.: Don't any of the silo approaches have a single point of failure in the proxy that is fronting the silo?

A.: Indeed, although it is not a single point of failure - it’s more like multiple points of failure introduced in the infrastructure. If we are talking about a huge infrastructure of hundreds or thousands of proxies, the loss of a very small subset of application hosts should be acceptable. If we are talking about smaller setups, it should be manageable to have a ProxySQL instance per application host.

Watch the webinar replay

Announcing ClusterControl 1.4.1 - the ProxySQL Edition


Today we are pleased to announce the 1.4.1 release of ClusterControl - the all-inclusive database management system that lets you easily deploy, monitor, manage and scale highly available open source databases - and load balancers - in any environment: on-premise or in the cloud.

This release contains key new management features for MySQL and MariaDB load balancing with ProxySQL, along with performance improvements and bug fixes.

Release Highlights

For ProxySQL

  • Support for:
    • MySQL Galera in addition to Replication clusters
    • Active-standby HA setup with Keepalived
    • Use the Query Monitor to view query digests
  • Management features:
    • Manage Query Rules (Query Caching, Query Rewrite)
    • Manage Host Groups (Servers)
    • Manage ProxySQL Database Users
    • Manage ProxySQL System Variables

For Galera Cluster for MySQL & Replication

  • Manage MySQL Galera and Replication clusters with management/public IPs
    • For monitoring connections and data/private IPs for replication traffic
  • Add MySQL Galera nodes or Replication Read Slaves with management and data IPs

Download ClusterControl

View release details and resources

Load balancers are an essential component in MySQL and MariaDB database high availability, especially when making topology changes transparent to applications and implementing read-write split functionality. As we all know, high-traffic database applications draw an enormous number of queries daily, which is why DBAs and SysAdmins require reliable solutions that can automatically scale to handle those connections while remaining available for still more.

And this is where load balancing technologies such as HAProxy, MaxScale and now ProxySQL come in.

ClusterControl has always come with support for HAProxy, as a generic TCP load balancer. We then added support for MariaDB’s MaxScale, an SQL-aware load balancer.

And today we’re happy to announce management support for ProxySQL, a lightweight yet complex protocol-aware proxy that sits between the MySQL clients and servers, in addition to the deployment and monitoring features for ProxySQL we announced two months ago.

Unlike others, ProxySQL understands the MySQL protocol, which allows it to implement features that would otherwise be impossible. For example, ProxySQL is the only proxy supporting connection multiplexing and query caching.

With that said, the new management features in ClusterControl include the following:

MySQL Galera in addition to Replication clusters

Up until now, ClusterControl enabled users to deploy ProxySQL on MySQL Replication clusters and monitor its performance. The same is now true for Galera Cluster for MySQL, MariaDB Galera Cluster and Percona XtraDB Cluster. This also includes active-standby HA setups with Keepalived.

Use the Query Monitor to view query digests

ClusterControl offers unified and comprehensive real-time monitoring of your entire database and server infrastructure. You can easily visualize performance in custom dashboards to establish operational baselines and support capacity planning. And with comprehensive reports for ProxySQL, you have a clear view of data points like connections, queries, data transfer and utilization, and more.

For more information on how monitoring works in ProxySQL, see our blog post on MySQL Load Balancing with ProxySQL - An Overview.

Management features

With ClusterControl, you can now easily configure and manage your ProxySQL deployments with its comprehensive UI. You can create servers, reorientate your setup, create users, set rules, manage query routing, and enable variable configurations. The new management features in ClusterControl for ProxySQL include:

Manage Query Rules (Query Caching, Query Rewrite)

  • View running queries, create rules, or cache and rewrite queries on the fly (see the sketch below).
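
As a rough illustration, a query rewrite rule boils down to a row in the mysql_query_rules table; the table names and rule_id here are hypothetical:

    -- rewrite queries that still reference an old table name, on the fly;
    -- 'orders_old', 'orders_new' and rule_id 60 are example values
    INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
    VALUES (60, 1, 'FROM orders_old', 'FROM orders_new', 1);
    LOAD MYSQL QUERY RULES TO RUNTIME;
    SAVE MYSQL QUERY RULES TO DISK;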

Manage Host Groups (Servers) - ProxySQL uses the concept of hostgroups: groups of backends which serve the same purpose or handle similar types of traffic.

  • Add servers to, or remove them from, existing and new host groups (illustrated below).
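
Under the hood this corresponds to ProxySQL’s mysql_servers table; a minimal sketch, assuming a hypothetical reader hostgroup 20 and a backend at 10.0.0.12:

    -- add a backend server to hostgroup 20 (example hostgroup id and IP address)
    INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.0.12', 3306);
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;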

Manage ProxySQL Database Users

  • Create new DB users or add existing MySQL users to ProxySQL.
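
For reference, adding a user boils down to a row in the mysql_users table; the username, password and default hostgroup below are placeholders:

    -- create a user; the password can be plaintext or a MySQL password hash
    INSERT INTO mysql_users (username, password, default_hostgroup, transaction_persistent)
    VALUES ('app_user', 'app_password', 10, 1);
    LOAD MYSQL USERS TO RUNTIME;
    SAVE MYSQL USERS TO DISK;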

Manage ProxySQL System Variables

  • View and change global runtime variables for tweaking your ProxySQL instance.
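
These map to ProxySQL’s global_variables table; a sketch of changing one of them (mysql-max_connections is a real ProxySQL variable, the value 2000 is just an example):

    -- raise the frontend connection limit (example value)
    UPDATE global_variables SET variable_value='2000'
     WHERE variable_name='mysql-max_connections';
    LOAD MYSQL VARIABLES TO RUNTIME;
    SAVE MYSQL VARIABLES TO DISK;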

For more information on how ProxySQL helps with MySQL query cache, query rewrite, and on ProxySQL’s host groups, read our blog on How ProxySQL adds Failover and Query Control to your MySQL Replication Setup.

There are a number of other features and improvements that we have not mentioned here. You can find all details in the ChangeLog.

We encourage you to test this latest release and provide us with your feedback. If you’d like a demo, feel free to request one.

Thank you for your ongoing support, and load balancing!

PS.: For additional tips & tricks, follow our blog: https://severalnines.com/blog/

Video: Interview with ProxySQL Creator René Cannaò

We sat down at Percona Live 2017 with ProxySQL creator René Cannaò to talk about the new release of ClusterControl, what’s coming up for ProxySQL, how it is being received by the MySQL community, and how it differs from MaxScale.

ClusterControl and ProxySQL

Packaged in the latest ClusterControl release, ProxySQL enables MySQL, MariaDB and Percona XtraDB database systems to easily manage intense, high-traffic database applications without losing availability.

ProxySQL is a lightweight yet complex protocol-aware proxy that sits between the MySQL clients and servers. It acts as a gateway that separates clients from the databases, and is therefore the entry point used to access all the database servers.

ClusterControl provides the only GUI on the market for the easy deployment, configuration and management of ProxySQL.

To learn more check out the following resources…


ProxySQL: All the Severalnines Resources

Load balancers are an essential component in MySQL and MariaDB database high availability; especially when making topology changes transparent to applications and implementing read-write split functionality.

ProxySQL is a lightweight yet complex protocol-aware proxy that sits between the MySQL clients and servers. It acts as a gateway that separates clients from the databases, and is therefore the entry point used to access all the database servers.

ProxySQL has an advanced multi-core architecture to handle large numbers of connections, multiplexed to potentially hundreds of backend servers, squeezing every drop of performance out of your database cluster - all with zero downtime.

Here are our top resources for ProxySQL to get you started with this amazing new technology.

On-Demand Webinars

High Availability in ProxySQL

Joint webinar with ProxySQL’s creator, René Cannaò, where we discuss building a solid, scalable and manageable proxy layer using ProxySQL in order to create a highly available MySQL infrastructure.

Watch the replay!

MySQL & MariaDB Load Balancing with ProxySQL & ClusterControl

We’re delighted to be joined by ProxySQL’s creator, René Cannaò, to tell us more about this new MySQL & MariaDB load balancing proxy and its features. We also show you how you can deploy ProxySQL using ClusterControl.

Watch the replay!

Introducing the Severalnines MySQL© Replication Blueprint

The Severalnines Blueprint for MySQL Replication includes all aspects of a MySQL Replication topology with the ins and outs of deployment, setting up replication, monitoring, upgrades, performing backups and managing high availability using proxies such as ProxySQL, MaxScale and HAProxy. This webinar provides an in-depth walk-through of this blueprint and explains how to make best use of it.

Watch the replay!

Videos

Video: Interview #2 with ProxySQL Creator René Cannaò

We sat down at Percona Live 2017 with ProxySQL creator René Cannaò to talk about the new release of ClusterControl, what’s coming up for ProxySQL, how it is being received by the MySQL community, and how it differs from MaxScale.

Watch the Video!

Video: Interview with ProxySQL Creator René Cannaò

Severalnines sits down with ProxySQL founder and creator René Cannaò to discuss his product and the upcoming webinar MySQL & MariaDB Load Balancing with ProxySQL & ClusterControl.

Watch the Video!

Top Blogs

Announcing ClusterControl 1.4.1 - the ProxySQL Edition

Release of ClusterControl 1.4.1 with key new management features for MySQL and MariaDB load balancing with ProxySQL, along with performance improvements and bug fixes.

Read More!

Using ClusterControl to Deploy and Configure ProxySQL on top of MySQL Replication

This blog post shows you how to deploy and configure ProxySQL from ClusterControl 1.3.3, on top of a master-slave replication setup. ProxySQL is an SQL-aware load balancer. It allows you to automatically do read-write split of your traffic, by sending writes to your writeable master and sending reads to the read-only slaves.

Read More!

Tips and Tricks - How to Shard MySQL with ProxySQL in ClusterControl

Sharding can be a painful exercise. Databases need to be split on multiple servers, data migrated, and applications have to be tweaked to be aware of the shards. However, there are ways to make this less painful. This blog explains how to use ProxySQL to shard a table to another cluster in ClusterControl without changing your application.

Read More!

Sharding MySQL with MySQL Fabric and ProxySQL

This blog post shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Read More!

How to Setup Read-Write Split in Galera Cluster using ProxySQL

This blog shows you how to set up read-write split in Galera Cluster using ProxySQL. It also covers some limitations that we found at the time of writing.

Read More!

How ProxySQL adds Failover and Query Control to your MySQL Replication Setup

An overview of ProxySQL in a MySQL replication environment, and how ProxySQL adds features like failover and query control to the setup.

Read More!

MySQL Load Balancing with ProxySQL - An Overview

An overview of ProxySQL, an SQL-aware load balancer for MySQL and MariaDB. This blog post explains the concepts behind ProxySQL, and goes through installation and configuration.

Read More!

Whitepapers

Database Sharding with MySQL Fabric

Why do we shard? How does sharding work? What are the different ways I can shard my database? This white paper goes through some of the theory behind sharding. It also discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download the whitepaper

Migrating to MySQL 5.7 - The Database Upgrade Guide

Upgrading to a new major version involves risk, and it is important to plan the whole process carefully. In this whitepaper, we look at the important new changes in MySQL 5.7 and show you how to plan the test process. We then look at how to do a live system upgrade without downtime. For those who want to avoid connection failures during slave restarts and switchover, this document goes even further and shows you how to leverage ProxySQL to achieve a graceful upgrade process.

Download the whitepaper

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

ClusterControl for ProxySQL

Packaged in the latest ClusterControl release, ProxySQL enables MySQL, MariaDB and Percona XtraDB database systems to easily manage intense, high-traffic database applications without losing availability. ClusterControl offers advanced, point-and-click configuration management features for the load balancing technologies we support. We know the issues regularly faced and make it easy to customize and configure the load balancer for your unique application needs.

We know load balancing and support many different technologies. ClusterControl has many things preconfigured to get you started with a couple of clicks. If you run into challenges, we also provide resources and on-the-spot support to help ensure your configurations are running at peak performance.

ClusterControl delivers an array of features to help deploy and manage ProxySQL:

  • Advanced Graphical Interface - ClusterControl provides the only GUI on the market for the easy deployment, configuration and management of ProxySQL.
  • Point and Click deployment - With ClusterControl you’re able to apply point and click deployments to MySQL, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Galera Cluster, and Percona XtraDB technologies, as well as the top related load balancers: HAProxy, MaxScale and ProxySQL.
  • Suite of monitoring graphs - With comprehensive reports you have a clear view of data points like connections, queries, data transfer and utilization, and more.
  • Configuration Management - Easily configure and manage your ProxySQL deployments with a simple UI. With ClusterControl you can create servers, reorientate your setup, create users, set rules, manage query routing, and enable variable configurations.

How to deploy and manage HAProxy, MaxScale or ProxySQL with ClusterControl - Webinar May 9th

Join us for our new webinar next week, Tuesday May 9th, with Krzysztof Książek, Senior Support Engineer at Severalnines, who will discuss support for proxies for MySQL HA setups in ClusterControl: how they differ and what their pros and cons are. You’ll also be shown how you can easily deploy and manage HAProxy, MaxScale and ProxySQL from ClusterControl via a live demo.

Proxies are building blocks of high availability setups for MySQL. They can detect failed nodes and route queries to hosts which are still available. If your master fails and you have to promote one of your slaves, proxies will detect such topology changes and route your traffic accordingly. More advanced proxies can do much more, such as routing traffic based on precise query rules, caching queries or mirroring them. They can even be used to implement different types of sharding.
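
As an illustration of such query rules, here is a minimal read-write split sketch for ProxySQL, assuming a hypothetical writer hostgroup 10 and reader hostgroup 20 (both hostgroup numbers and rule_ids are arbitrary):

    -- keep SELECT ... FOR UPDATE on the writer hostgroup
    INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
    VALUES (100, 1, '^SELECT .* FOR UPDATE', 10, 1);
    -- send all other SELECTs to the reader hostgroup
    INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
    VALUES (200, 1, '^SELECT', 20, 1);
    LOAD MYSQL QUERY RULES TO RUNTIME;
    SAVE MYSQL QUERY RULES TO DISK;

Queries that match no rule fall back to the user’s default_hostgroup - typically the writer.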

Register below to hear all about it!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, May 9th at 09:00 BST (UK) / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, May 9th at 9:00 PST (US) / 12:00 EST (US)

Register Now

Agenda

  • Introduction
  • Why use a proxy layer?
  • Comparison of proxies - the pros & cons
    • HAProxy
    • MaxScale
    • ProxySQL
  • Live demo of proxy support in ClusterControl

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

We look forward to “seeing” you there and to insightful discussions!

If you have any questions or would like a personalised live demo, please do contact us.

Percona Live 2017 - Severalnines Recap

Severalnines was one of the sponsors of Percona Live 2017 last week in Santa Clara, California and in case you missed the event here are some of the highlights.

Our Booth at the Event

Keynote Highlights

Videos from the Event

Severalnines was live several times (pardon the pun) during the event.

Our Sessions

In case you missed them at the event, here are the slides from our presentations at Percona Live. You can view all the presentations here.

Become a MongoDB DBA - Monitoring Essentials

Presented by Art van Scheppingen (@dbart_sql)

MySQL Cluster (NDB) - Best Practices (aka DIE HARD VIII)

Presented by Johan Andersson

MySQL Load Balancing

Presented by Krzysztof Książek (@krzksiazek)

Next Event

If you are interested in attending the Percona Live Open Source Database Conference, the next one will be held in Dublin, Ireland, on September 25-27, 2017.

Video: ProxySQL ClusterControl Demonstration

In my latest ClusterControl product demonstration video I show the latest features we have incorporated for ProxySQL, a highly flexible and versatile new load balancing technology.

In this video we demonstrate…

  • Deployment of ProxySQL for MySQL Replication and Galera Cluster
  • Installation instructions for ProxySQL
  • Adding ProxySQL Admins and monitoring users
  • Selecting the instances you wish to balance
  • Monitoring ProxySQL Host Groups
  • Query Monitoring
  • Managing Query Rules
  • Extracting top queries

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

ClusterControl for ProxySQL

Packaged in the latest ClusterControl release, ProxySQL enables MySQL, MariaDB and Percona XtraDB database systems to easily manage intense, high-traffic database applications without losing availability. ClusterControl offers advanced, point-and-click configuration management features for the load balancing technologies we support. We know the issues regularly faced and make it easy to customize and configure the load balancer for your unique application needs.

We know load balancing and support many different technologies. ClusterControl has many things preconfigured to get you started with a couple of clicks. If you run into challenges, we also provide resources and on-the-spot support to help ensure your configurations are running at peak performance.

ClusterControl delivers an array of features to help deploy and manage ProxySQL:

  • Advanced Graphical Interface - ClusterControl provides the only GUI on the market for the easy deployment, configuration and management of ProxySQL.
  • Point and Click deployment - With ClusterControl you’re able to apply point and click deployments to MySQL, MySQL replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Galera Cluster, and Percona XtraDB technologies, as well as the top related load balancers: HAProxy, MaxScale and ProxySQL.
  • Suite of monitoring graphs - With comprehensive reports you have a clear view of data points like connections, queries, data transfer and utilization, and more.
  • Configuration Management - Easily configure and manage your ProxySQL deployments with a simple UI. With ClusterControl you can create servers, reorientate your setup, create users, set rules, manage query routing, and enable variable configurations.

Webinar Replay and Q&A: how to deploy and manage ProxySQL, HAProxy and MaxScale

Thanks to everyone who participated in yesterday’s webinar on how to deploy and manage ProxySQL, HAProxy and MaxScale with ClusterControl.

Krzysztof Książek, Senior Support Engineer at Severalnines, discussed support for proxies for MySQL HA setups in ClusterControl: how they differ and what their pros and cons are. And he demonstrated how you can easily deploy and manage HAProxy, MaxScale and ProxySQL from ClusterControl during a live demo.

If you missed the webinar, would like to watch it again or browse through the slides, it’s all available for viewing online now. You’ll also find below a transcript of the Q&A session, which took place at the end of the webinar.

Watch the webinar replay

Webinar Questions & Answers

Q.: I’d like to replace HAProxy with ProxySQL - can I deploy ProxySQL on the same VMs as my current HAProxy ones, or do I have to create new VMs? I deploy and manage it all with ClusterControl.

A.: Yes, as long as there is no conflict in ports used by those two proxies, there’s no reason why they couldn’t coexist. By default there is no such conflict, but a user may customize ports when deploying a proxy from ClusterControl, so if you are not sure how HAProxy is configured, it’s better to double-check it.
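
A quick way to see what ProxySQL itself listens on is to query its admin interface (by default ProxySQL serves client traffic on port 6033 and the admin interface on 6032); HAProxy’s ports are defined in its haproxy.cfg:

    -- on the ProxySQL admin interface: show the client-facing listener(s)
    SELECT variable_value FROM global_variables
     WHERE variable_name='mysql-interfaces';   -- e.g. 0.0.0.0:6033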

Q.: Do you know what happened to the admin interface in MaxScale 2.0 and why it was removed?

A.: We don’t have detailed knowledge - it was deprecated for security reasons, but we don’t know exactly what lies behind that statement.

Q.: Have you any plans to talk about or support other load balancers in future, such as F5 BigIP, A10 Networks, or Citrix Netscaler? Or do you have any immediate thoughts on them you can share just now?

A.: As of now we don’t have any plans related to these load balancers, but if we get more requests for it, we’ll look into them more.

Q.: How can we sync users across multiple ProxySQL instances? Or add existing users automatically to a newly added ProxySQL instance?

A.: As of now, it is not possible to do that using ClusterControl - you can still do it manually, accessing ProxySQL through the command line interface. Having said that, we have plans to implement configuration syncing in one of the next ClusterControl releases. For adding users in batches, right now, the CLI is the best way - ProxySQL accepts passwords in the form of a MySQL password hash, so it’s fairly easy to write a script which will do the import. This is one of the feature requests we received, so we will, most likely, implement it at some point. We can’t share any ETA though.
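
A rough sketch of such a manual sync via the admin interface; the username and password hash below are placeholders:

    -- on the source ProxySQL: dump the users you want to copy
    SELECT username, password, active, default_hostgroup, transaction_persistent
      FROM mysql_users;

    -- on the target ProxySQL: re-create each user with the values returned above
    INSERT INTO mysql_users (username, password, active, default_hostgroup, transaction_persistent)
    VALUES ('app_user', '*94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29', 1, 10, 1);
    LOAD MYSQL USERS TO RUNTIME;
    SAVE MYSQL USERS TO DISK;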

Q.: How does ClusterControl handle configuration changes in ProxySQL?

A.: ClusterControl does not take advantage of the multiple configuration levels in ProxySQL - any change introduced via the UI is immediately loaded into the runtime configuration.
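
For completeness, ProxySQL keeps three configuration layers (MEMORY, RUNTIME and DISK); a change made at the admin interface only lands in MEMORY until it is explicitly promoted, roughly like this (the variable and value here are just examples):

    -- 1. edit the in-memory (MEMORY) configuration
    UPDATE global_variables SET variable_value='10000'
     WHERE variable_name='mysql-monitor_ping_interval';
    -- 2. push it to RUNTIME so it takes effect immediately
    LOAD MYSQL VARIABLES TO RUNTIME;
    -- 3. persist it to DISK so it survives a restart
    SAVE MYSQL VARIABLES TO DISK;

In ClusterControl’s case, as noted above, a UI change effectively performs the first two steps in one go.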

Q.: Can you describe what the CPU usage is for ProxySQL or MaxScale on read/write split?

A.: Typically you’ll see ProxySQL using less CPU than MaxScale, but it all depends on your workload and the number of query rules you may have added to ProxySQL.

Watch the webinar replay
