MySQL JSON fields on the ground!

Having to add a series of custom fields to an otherwise quite relational application, I decided to try out the new JSON fields.

As of now you can:

– create json fields
– manipulate them with json_extract, json_unquote
– create generated fields from json entries

You cannot:

– index json fields directly: you have to create a generated field and index that instead (a sketch follows at the end of this post);
– retain the original json datatype (eg. string, int), as json_extract always returns strings.

Let’s start with a simple flask app:

# requirements.txt
mysql-connector-python
Flask-SQLAlchemy==2.0
SQLAlchemy>=1.1.3

Now create the app and connect it to the db.

import flask
import flask_sqlalchemy
from sqlalchemy.dialects.mysql import JSON

# A simple flask app connected to a db
app = flask.Flask('app')
app.config['SQLALCHEMY_DATABASE_URI']='mysql+mysqlconnector://root:secret@localhost:3306/test'
db = flask_sqlalchemy.SQLAlchemy(app)

Add a class to the playground and create it on the db. We need sqlalchemy>=1.1 to support the JSON type!

# The model
class MyJson(db.Model):
    name = db.Column(db.String(16), primary_key=True)
    json = db.Column(JSON, nullable=True)

    def __init__(self, name, json=None):
        self.name = name
        self.json = json

# Create table
db.create_all()

Thanks to flask-sqlalchemy we can just use db.session ;)

# Add an entry
entry = MyJson('jon', {'do': 'it', 'now': 1})
db.session.add(entry)
db.session.commit()

We can now verify with a raw select that the entry is serialized on the db:

# Get entry in Standard SQL
entries = db.engine.execute(db.select(columns=['*'], from_obj=MyJson)).fetchall()
(name, json_as_string), = entries  # unpack the result (it's just one row!)
assert isinstance(json_as_string, basestring)

Now a raw select to extract a json field:

entries = db.engine.execute(db.select(columns=['name', 'json_extract(json, "$.now")'], from_obj=MyJson)).fetchall()

(name, json_now), = entries  # unpack the result (it's just one row!)
assert isinstance(json_now, basestring)
assert json_now != entry.json['now']  # '1' != 1
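
A workaround for both limitations is the one hinted at above: expose the JSON path as a generated column, typed and indexed like a normal column. Here is a minimal sketch against the model above; it assumes Flask-SQLAlchemy's default my_json table name, and the now_i column and index names are mine:

# Workaround sketch: expose $.now as a real INT column and index it.
# Assumes the MyJson model above and flask-sqlalchemy's default `my_json` table name.
db.engine.execute(
    'ALTER TABLE my_json '
    'ADD COLUMN now_i INT GENERATED ALWAYS AS (json_extract(json, "$.now")) STORED, '
    'ADD INDEX idx_now_i (now_i)'
)

# The generated column is declared INT, so the original datatype is back
# and the WHERE clause can use the index.
(json_now_int,), = db.engine.execute(
    'SELECT now_i FROM my_json WHERE now_i = 1'
).fetchall()
assert json_now_int == entry.json['now']  # 1 == 1 this time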

MySQL 8.0 Innodb Cluster looks at MongoDB

MySQL turns 8.0, and the technical preview integrates a new “InnoDB Cluster”. The overall architecture is reminiscent of MongoDB:

– group replication with a single master, similar to replica sets;
– a mysqlsh shell, supporting js and python, able to create replication groups and local instances;
– a MySQL Router as a gateway for appservers, to be deployed on each client machine like mongos.

Once installed, you can create a replication group with a few commands:

su - rpolli
mysqlsh

\py  # enable python mode. Create 5 instances in ~/sandbox-dir/{3310,...,3350}

ports = (3310, 3320, 3330, 3340, 3350)
for port in ports:
    dba.deploy_local_instance(port, {"password": "secret"});

Now we have 5 mysql instances listening on various ports. Create a cluster and check the newly created mysql_innodb_cluster_metadata schema.

\connect root:secret@localhost:3310

cluster = dba.create_cluster('polli', 'pollikey');

\sql  # switch to sql mode

SHOW DATABASES;

+-------------------------------+
| Database                      |
+-------------------------------+
| information_schema            |
| mysql                         |
| mysql_innodb_cluster_metadata |
| performance_schema            |
| sys                           |
+-------------------------------+

Go back to the python mode and add the remaining instances to the cluster.

\py  # return to python mode again

# If needed, re-get the cluster.
cluster = dba.get_cluster('polli',{'masterKey':'pollikey'})  # masterKey is a shared secret between nodes.

# Add the other nodes
for port in ports[1:]:
    cluster.add_instance('root@localhost:' + str(port),'secret');

# Check status
cluster.status()  # BEWARE! The output is a str :( not a dict
{
    "clusterName": "polli",
    "defaultReplicaSet": {
        "status": "Cluster tolerant to up to 2 failures.",
        "topology": {
            "localhost:3310": {
                "address": "localhost:3310",
                "status": "ONLINE",
                "role": "HA",
                "mode": "R/W",
                "leaves": {
                    "localhost:3320": {
                        "address": "localhost:3320",
                        "status": "ONLINE",
                        "role": "HA",
                        "mode": "R/O",
                        "leaves": {}
                    },
                    "localhost:3330": {
                        "address": "localhost:3330",
                        "status": "ONLINE",
                        "role": "HA",
                        "mode": "R/O",
                        "leaves": {}
                    }
                    ....
                }
            }
        }
    }
}

Now check the failover feature.

dba.kill_local_instance(3310)  # Successfully killed

# Parse the output with...
import json
json.loads(cluster.status())["defaultReplicaSet"]["topology"].keys()  # localhost:3320 WOW!


Once the cluster is set up, newly created users span the whole group.

\sql
CREATE USER 'admin'@'%' IDENTIFIED BY 'secret';
GRANT ALL ON *.* TO 'admin'@'%'  WITH GRANT OPTION;

Now let’s connect to different cluster nodes.

mysql -uadmin -P3310 -psecret -e 'create database this_works_on_master;'  # OK
mysql -uadmin -P3320 -psecret -e 'create database wont_work_on_slave_even_if_admin;'  
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

The default setup allows writes only on the master, *even for admin/super users*; this can be overridden as usual.

mysql> SHOW VARIABLES LIKE '%only';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
...
| read_only                     | ON    |
| super_read_only               | ON    |
...
+-------------------------------+-------+
mysql> set global super_read_only = OFF;  -- just for root
mysql> set global super_read_only = ON;  

mysql> set global read_only = OFF;  -- for all allowed users

MongoDB’s python driver is topology-aware; MySQL connectors instead rely on MySQL Router to reach the current primary.
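
For example, assuming a MySQL Router instance bootstrapped against the cluster with its default ports (6446 for read/write, 6447 for read-only), the application simply connects to the router and gets routed to the right node. A minimal sketch, reusing the admin user created above:

# Sketch: connect through MySQL Router instead of a specific node.
# 6446/6447 are the router's default R/W and R/O ports -- adjust to your bootstrap.
import mysql.connector

rw = mysql.connector.connect(user='admin', password='secret',
                             host='127.0.0.1', port=6446)  # routed to the primary
ro = mysql.connector.connect(user='admin', password='secret',
                             host='127.0.0.1', port=6447)  # routed to a secondary

cur = rw.cursor()
cur.execute('SELECT @@hostname, @@port, @@super_read_only')
print(cur.fetchone())  # the primary: super_read_only is 0 here
cur.close()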

FullText Indexing IPv6 addresses with MySQL 5.7

MySQL 5.7 supports generated fields. This is particularly useful for full-text searching the string representation of numerically stored IP addresses:

CREATE TABLE catalog(
ip varbinary(16) not null,
hostname varchar(64) not null,
label varchar(64),
ip_ntoa varchar(64) generated always as (inet6_ntoa(ip)) STORED, -- generate and store fields with the address representation
fulltext key (hostname, ip_ntoa, label)
);

When inserting values

INSERT INTO catalog(ip,hostname,label) VALUES
(inet6_aton('127.0.0.1'), 'localhost', 'lo'),
(inet6_aton('192.168.0.1'), 'gimli', 'stage,ipv4'),
(inet6_aton('fdfe::5a55:caff:fefa:9089'), 'legolas', 'router,ipv6'),
(inet6_aton('fdfe::5a55:caff:fefa:9090'), 'boromir', 'router,ipv6')

you can search in OR mode with

SELECT hostname FROM catalog WHERE
  MATCH(ip_ntoa, hostname, label)
  AGAINST('9089 router');
-- returns every entry matching ANY needle
***1***
hostname: legolas
***2***
hostname: boromir

Or search in AND mode (all terms must match) with

SELECT hostname FROM catalog WHERE
  MATCH(ip_ntoa, hostname, label)
  AGAINST('+9089 +router' in boolean mode);
-- returns ONE entry matching ALL needles
***1***
hostname: legolas

Positioning MySQL replication with GTID

mysql-fabric is a set of tools to set up resilient and scalable mysql infrastructures.

You can create fabric groups (a set of replicated servers) with automatic failover and various policies.

In short:

#mysqlfabric group create mycluster
#mysqlfabric group add mycluster db-1:3306
#mysqlfabric group add mycluster db-2:3306
#mysqlfabric group add mycluster db-3:3306

And now pick a master: fabric will configure replication on all the nodes.

#mysqlfabric group promote mycluster --slave_id DB_1_ID

Now, unless db-1 is a blank slate, you’re likely to get an error :(

Fabric is trying to replicate to all slaves ALL the changes that happened on db-1 since its creation (including “CREATE USER root …”).

The solution is to:
1. get the last transaction id used during the initial configuration;
2. tell the slaves to skip everything up to that point.

It is done via

-- stop replication first, and reset what has been done until now (hopefully nothing ;)
STOP SLAVE; 
RESET MASTER;
-- tell the slave to skip the first 10 transactions from the server with id 9f36...
 SET @@GLOBAL.GTID_PURGED = '9f367fff-d91e-11e4-8ffe-0242ac110017:1-10'; 
-- now restart the slave and check that everything is fine
START SLAVE; 

SHOW SLAVE STATUS \G

Remote dump of mysqlbinlog

In MySQL 5.6 the mysqlbinlog command allows:
– connecting to a remote server with “-R”;
– getting all binlogs by passing “--to-last-log”.

To continuously dump the logs:
– stay connected with “--stop-never”, which implies “--to-last-log”.

You can practice using this MySQL Community Docker Image, which provides all binaries and an easy master-slave setup.

You can try using this

# mkdir /tmp/test; cd /tmp/test
# wget https://raw.githubusercontent.com/ioggstream/mysql-community/master/ga/docker-compose.yml
# docker-compose -p test up # run master and slave

# docker run --rm -ti --link=test_master_1:master --entrypoint=/bin/bash ioggstream/mysql-community -l

$ mysqlbinlog -R -h master -u root -proot --stop-never master-bin.000001

Now just send some data to the master (eg. import the sakila db).
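
If you don’t have a dump handy, a few test writes are enough to see events flowing in the mysqlbinlog output. A minimal sketch, assuming mysql-connector-python is available where you run it and reusing the root/root credentials of the docker setup above:

# Generate some traffic on the master so the streaming mysqlbinlog has events to show.
import mysql.connector

conn = mysql.connector.connect(host='master', user='root', password='root')
cur = conn.cursor()
cur.execute('CREATE DATABASE IF NOT EXISTS test')
cur.execute('CREATE TABLE IF NOT EXISTS test.t (id INT PRIMARY KEY, msg VARCHAR(32))')
for i in range(10):
    cur.execute('INSERT INTO test.t VALUES (%s, %s)', (i, 'hello binlog'))
conn.commit()
cur.close()
conn.close()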

For further info, check here http://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog-backup.html

Oracle’s mysql.connector for python

Oracle released a pure-python mysql connector with connection pooling support.

Creating a connection pool is really easy. You can try the following snippets in ipython:

import mysql.connector

auth = {
    "database": "test",
    "user": "user",
    "password": "secret"
}

# the first call instantiates the pool
mypool = mysql.connector.connect(
    pool_name="mypool",
    pool_size=3,
    **auth)

All the subsequent calls to connect(pool_name="mypool") will be managed by the pool:

# this won't create another connection,
# but borrow one from the pool
conn = mysql.connector.connect(
    pool_name="mypool",
    pool_size=3,
    **auth)

# now get a cursor and play
c = conn.cursor()
c.execute("show databases")
c.close()

Closing the connection will just release the connection to the pool: we’re not closing the socket!

conn.close()
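
A quick way to see the pool at work: it holds pool_size connections, so holding all of them and asking for one more fails, while close() makes a connection available again. A minimal sketch:

# Return the very first connection too, then borrow all 3 pooled connections.
mypool.close()
conns = [mysql.connector.connect(pool_name="mypool") for _ in range(3)]

# A 4th request fails because the pool is exhausted...
from mysql.connector import errors
try:
    mysql.connector.connect(pool_name="mypool")
except errors.PoolError as exc:
    print(exc)  # Failed getting connection; pool exhausted

# ...while "closing" one just returns it to the pool, ready to be borrowed again.
conns[0].close()
again = mysql.connector.connect(pool_name="mypool")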

MySQL Cluster: dimensioning redo logs

A rule of thumb for redo log sizing: enough to store all incoming create/update transactions between three LCPs. In MySQL Cluster jargon, the periodic snapshot of the database is named Local Checkpoint, or LCP.

That is:

2 * (DataMemory / DiskSpeed) * Incoming Throughput
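
Plugging in some illustrative numbers (hypothetical, not taken from the test below) gives an idea of the order of magnitude:

# Back-of-the-envelope redo log sizing -- hypothetical numbers.
data_memory = 2048          # MB of DataMemory
disk_speed = 50             # MB/s the data nodes can write an LCP at
incoming_throughput = 5     # MB/s of incoming create/update traffic

lcp_duration = float(data_memory) / disk_speed           # ~41 s to write one LCP
redo_log_size = 2 * lcp_duration * incoming_throughput
print(redo_log_size)        # ~410 MB of redo log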

During some tests importing the famous Employees Database on a two-node Amazon cluster, I ran into the following error:

ERROR 1297 (HY000) at line 38 in file: ‘xao’: Got temporary error 410 ‘REDO log files overloaded (decrease TimeBetweenLocalCheckpoints or increase NoOfFragmentLogFiles)’ from NDBCLUSTER

It was strange, because the REDO log size was 512M, while the DB was only 130M. No, I definitely didn’t want to increase NoOfFragmentLogFiles (aka REDO log size): 512M was enough. Even decreasing TimeBetweenLocalCheckpoints was not useful, since I was already saturating the AMZ machine I/O at ~5 MBps.

So I started monitoring the REDO log occupation with the following command:

while sleep 5; do
    mysql -e "select node_id, 100*used/total, used/1024/1024 from ndbinfo.logspaces where node_id=4;";
done

I found that the logs – which are split into 4 logical chunks of 128M named “fragments” – were not uniformly used.
Each table was always using the same fragment, and the error was raised while inserting a big table which exhausted “its” 128M chunk.

+---------+----------------+----------------+
| node_id | 100*used/total | used/1024/1024 |
+---------+----------------+----------------+
|       4 |        96.7500 |   123.00000000 |
|       4 |         1.2500 |     2.00000000 |
|       4 |         1.2500 |     2.00000000 |
|       4 |         1.2500 |     2.00000000 |
+---------+----------------+----------------+

I asked MySQL Guru Matthew Montgomery, who suggested the following explanation:
“All transactions within a given LCP for a given fragment are stored within the same fragment log.”

The hints he provided follow:

Hint 1: to distribute REDO log usage, manually partition the table. This requires removing a foreign key in order to add the partitioning using ALTER TABLE:

CREATE TABLE salaries (
emp_no INT NOT NULL,
salary INT NOT NULL,
from_date DATE NOT NULL,
to_date DATE NOT NULL,
KEY (emp_no),
FOREIGN KEY (emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE,
PRIMARY KEY (emp_no, from_date)
) ;
-- We have to split the PARTITION statement from the CREATE one
-- due to http://bugs.mysql.com/bug.php?id=67492
-- Thx to Matthew "MontgoDB" Montgomery for the workaround!
ALTER TABLE salaries PARTITION BY KEY() PARTITIONS 8;

Edit: Thanks to this post, Matthew filed this bug and suggested the aforementioned workaround!

Using a single statement you would have gotten this error from mysqld:

ERROR 1506 (HY000) at line 97: Foreign key clause is not yet supported in conjunction with partitioning

This is because the mysqld parser – which is InnoDB oriented – blocks the request, since InnoDB doesn’t yet support FK + partitioning.

Hint 2: use the max_rows directive and find a suitable value. In our case the table should contain ~3M rows, but we needed to overbook, allocating space for 40M:

CREATE TABLE salaries (
emp_no INT NOT NULL,
salary INT NOT NULL,
from_date DATE NOT NULL,
to_date DATE NOT NULL,
KEY (emp_no),
FOREIGN KEY (emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE,
PRIMARY KEY (emp_no, from_date)
) max_rows=40000000;

The 40M number was purely heuristic: a script modified the “CREATE TABLE” statement until the insert succeeded ;)
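
For the record, a minimal sketch of such a script (the dump file name and the doubling strategy are mine, purely illustrative):

# Heuristic sketch: retry the import, doubling max_rows until the REDO log
# stops overflowing. `load_salaries.dump` is a hypothetical dump of the table data.
import subprocess

CREATE = """CREATE TABLE salaries (
  emp_no INT NOT NULL,
  salary INT NOT NULL,
  from_date DATE NOT NULL,
  to_date DATE NOT NULL,
  KEY (emp_no),
  FOREIGN KEY (emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE,
  PRIMARY KEY (emp_no, from_date)
) max_rows={max_rows};"""

max_rows = 5 * 10 ** 6
while max_rows <= 80 * 10 ** 6:
    subprocess.call(['mysql', 'employees', '-e', 'DROP TABLE IF EXISTS salaries'])
    subprocess.call(['mysql', 'employees', '-e', CREATE.format(max_rows=max_rows)])
    if subprocess.call('mysql employees < load_salaries.dump', shell=True) == 0:
        break                # the insert succeeded: this max_rows is enough
    max_rows *= 2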

After the change, the table’s writes were spread over two fragments:

+---------+----------------+----------------+
| node_id | 100*used/total | used/1024/1024 |
+---------+----------------+----------------+
|       4 |        98.4000 |   126.00000000 |
|       4 |         1.2500 |     2.00000000 |
|       4 |         1.2500 |     2.00000000 |
|       4 |        99.2150 |   127.00000000 |
+---------+----------------+----------------+
+---------+----------------+----------------+
| node_id | 100*used/total | used/1024/1024 |
+---------+----------------+----------------+
|       4 |        57.0310 |    73.00000000 |
|       4 |         1.8750 |     3.00000000 |
|       4 |         1.8750 |     3.00000000 |
|       4 |        57.0310 |    73.00000000 |
+---------+----------------+----------------+

MySQL Backup with ZManda

The Zmanda suite has some scripts to simplify backups. The open release just uses mysqldump, but you can download a zrm-xtrabackup plugin here.

Once you have installed xtrabackup and Zmanda Recovery Manager for MySQL, just put the zrm-xtrabackup plugin in /usr/share/mysql-zrm/plugins/.

With xtrabackup you can:
1. create a raw backup into /var/lib/mysql-zrm/BackupSet1/20120716154427

# full=0;incremental=1; export full incremental
# mysql-zrm --action backup --backup-level $full --mysql-binlog-path /var/log/mysql/

2. create an incremental backup

# mysql-zrm --action backup --backup-level $incremental --mysql-binlog-path /var/log/mysql/

The incremental backup dumps the binlogs created since the last full backup.

3. check backup status with

# mysql-zrm-reporter

What’s missing is a way to effectively verify the backup files. While zrm provides a mysql-zrm-verify, it just checks the md5 of the files, not whether the backups are actually usable by mysql.

MySQL: Quick&Dirty Point in Time Recovery using binlog

MySQL can log all queries in binary format, creating the famous binlog files. Binlogs are posthumous files, created by mysqld after the queries have been sent to the engine for committing.

While it’s not compulsory to use binlogs, they are required if you want to:

  • replicate queries to another server;
  • do point-in-time recovery.

Point in Time Recovery

A Point in Time Recovery is a sort of Time Machine, returning a snapshot of the db at a given time. You can simply aggregate all your binlogs with:

# mysqlbinlog binlog.000001 binlog.000002 > my-pitr.sql

or:

# for bl in binlog*; do
mysqlbinlog $bl
done >> my-pitr.sql

Once you’ve aggregated *all* your binlogs, you can import my-pitr.sql:

# mysql -e "source /path/to/my-pitr.sql"

Note that:

  1. you cannot import each binlog file separately, because you will lose temporary tables.
  2. binlogs archive all your databases, not a single one.

Purging Binlogs

As binlogs tend to grow and occupy a lot of space, you can set up an expiration policy based on your backup policy, and learn some basic operations to be done on your binlogs:

  • SHOW MASTER STATUS; — see the last transaction position in log file
  • FLUSH LOGS; — close the latest binlog file and start a new one (so we won’t delete a file in use)
  • PURGE BINARY LOGS BEFORE 'yyyy-mm-dd hh:mm:ss'; — remove logs until date

And let’s look at the possible issues when deleting binlogs:

  1. deleting a file while mysql is using it;
  2. deleting a file containing transactions not stored on backups.

The first issue is solved in two steps:
a) know which file is currently used, with SHOW MASTER STATUS;

eg.
mysql> show master status \G
File: mysql-bin.000016
Position: 106

b) close this log file and open a new one (.000017) with FLUSH LOGS;

If we take a backup now, every transaction we could have missed is contained in binlogs from 000017 onwards.

T0 -
| flush binlogs
T1 -
| take a backup
T2 - (backup successful)
| purge old binlogs
T3 -
|
v
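
Putting the timeline together, a minimal sketch with mysql.connector (credentials are placeholders; the backup step is whatever tool you use):

# T0..T3: rotate the binlog, take the backup, then purge the older files.
import mysql.connector

conn = mysql.connector.connect(user='root', password='secret')
cur = conn.cursor()

cur.execute('FLUSH LOGS')              # T0: close the current binlog, open a new one
cur.execute('SHOW MASTER STATUS')
current_file = cur.fetchone()[0]       # e.g. 'mysql-bin.000017'

# T1-T2: take and verify your backup here...

cur.execute('PURGE BINARY LOGS TO %s', (current_file,))  # T3: drop everything older
cur.close()
conn.close()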

MySQL Backup: that’s hot

Just downloaded, built and tested a tool for Hot Backup of MySQL InnoDB databases: xtrabackup from Percona.

You can try the apt/rpm repository too.

You get two backup tools that are quite easy to use:

  • xtrabackup – a C program that merges redo logs and InnoDB files to create Hot Backups;
  • innobackupex –  a perl wrapper around xtrabackup.

To back up your databases, prepare a dedicated /opt/backup_mysql/ directory:

# useradd backup -m -d /opt/backup_mysql;

Then just run

# innobackupex --user=root --password=password /opt/backup_mysql/

Once the backup is finished (it took about 17 seconds for a 300MB DB entirely fitting in memory), you have a timestamped directory in /opt/backup_mysql/. You can use it as a new datadir to test your backup by simply starting a new mysql instance.

UPDATE: latest innobackupex releases require you to APPLY redo logs to the datafiles. This is done in two steps.
Apply the redo logs:

# innobackupex --apply-log /opt/backup_mysql/2012-06-12_19-22-54/

Check the innobackupex output:

120612 19:22:57  innobackupex: completed OK!

REMEMBER: ALWAYS CHECK THAT YOUR BACKUP IS CONSISTENT! Unless you are sure you can recover the data, your backup is not done ;)

To improve isolation you could even start this backup instance as another user, using eg. --user=backup. So, fix the backup permissions:

# chown -R backup.backup ~backup

Change user:

# su - backup

Start the server:

$ /usr/bin/mysqld --datadir=/opt/backup_mysql/2012-06-12_19-22-54/ \
                  --socket=/opt/backup_mysql/backup_mysql.sock \
                  --skip-networking

This new instance won’t listen on any port (--skip-networking) but only on your backup_mysql.sock, so that it won’t interfere with existing instances.

Now we can check that our backup is fine by running mysql and issuing some queries:

# mysql --socket=/opt/backup_mysql/backup_mysql.sock -uroot -p
> show databases \G
*************************** 1. row ***************************
Database: employees
> use employees;
> show table status \G
... a lot of table information ...
mysql> select count(salary) from salaries;
+---------------+
| count(salary) |
+---------------+
|       2844047 |
+---------------+
1 row in set (4.55 sec)
mysql> select sum(salary) as HR_Budget from salaries;
+--------------+
| HR_Budget    |
+--------------+
| 181480757419 |
+--------------+
1 row in set (0.83 sec)