Positioning MySQL replication with GTID

mysql-fabric is a set of tools to set up resilient and scalable MySQL infrastructures.

You can create Fabric groups (sets of replicated servers) with automatic failover and various policies.

In short:

# mysqlfabric group create mycluster
# mysqlfabric group add mycluster db-1:3306
# mysqlfabric group add mycluster db-2:3306
# mysqlfabric group add mycluster db-3:3306

Now pick a master, and Fabric will configure replication on all the other nodes:

# mysqlfabric group promote mycluster --slave_id DB_1_ID

Now, unless db-1 is a blank slate, you’re likely to get an error 🙁

That’s because Fabric tries to replicate ALL the changes made on db-1 since its creation (including “CREATE USER root …”) to all the slaves.

The solution is to:
1 – get the ID of the last transaction used for configuration;
2 – tell the slaves to skip everything up to that point.

On each slave, this is done via:

-- stop replication first, then reset what has been done so far (hopefully nothing ;)
STOP SLAVE;
RESET MASTER;
-- tell the slave to skip the first 10 transactions from the server with UUID 9f36...
SET @@GLOBAL.GTID_PURGED = '9f367fff-d91e-11e4-8ffe-0242ac110017:1-10';
-- now restart the slave and check that everything is fine
START SLAVE;

SHOW SLAVE STATUS \G
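The value to put in GTID_PURGED is typically what db-1 had already executed before the slaves were added, which you can read on the master with `SELECT @@GLOBAL.GTID_EXECUTED;`. A minimal sketch, assuming you copy that value by hand: the helper name `gtid_purge_sql` is hypothetical, and it only strips the line breaks MySQL may insert into long GTID sets and does a rough sanity check before wrapping the value in the SET statement shown above.

```shell
# Hypothetical helper: turn the master's GTID_EXECUTED value into the
# statement to run on each slave (after STOP SLAVE and RESET MASTER).
gtid_purge_sql() {
    # On the master the value could be read with something like:
    #   mysql -N -e 'SELECT @@GLOBAL.GTID_EXECUTED'
    # Long sets may be wrapped over several lines, so drop spaces/newlines.
    gtid_set=$(printf '%s' "$1" | tr -d ' \n')
    case "$gtid_set" in
        *:*) ;;  # rough check: looks like uuid:interval
        *) echo "not a GTID set: $1" >&2; return 1 ;;
    esac
    printf "SET @@GLOBAL.GTID_PURGED = '%s';\n" "$gtid_set"
}

gtid_purge_sql '9f367fff-d91e-11e4-8ffe-0242ac110017:1-10'
```

The generated statement is then pasted into the mysql client on each slave, between RESET MASTER and START SLAVE.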

Fixing the Oracle agent for RHCS

I just found an issue in the Oracle agent for RHCS. If you’re curious, check it out on GitHub.

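Essentially, running processes were searched for via ps | grep | awk instead of ps | grep.

While grep returns nonzero when nothing matches, awk returns zero even when its input is empty. Since a pipeline’s exit status is that of its last command, the check always reported the process as still running, so the agent always waited for the full timeout before stopping the resource.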
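The difference is easy to reproduce. A minimal sketch in POSIX shell, not the agent’s actual code: `fake_ps` is a hypothetical stand-in for `ps` so the example runs anywhere, and the function and process names are made up.

```shell
# Hypothetical stand-in for `ps` output: pid and process name.
fake_ps() {
    printf '%s\n' '  101 ora_pmon_ORCL' '  102 sshd'
}

# Buggy style: the pipeline's status is awk's, which is 0 even with no input.
running_awk() {
    fake_ps | grep "$1" | awk '{print $1}'
}

# Fixed style: the pipeline ends with grep, which is nonzero when nothing matches.
running_grep() {
    fake_ps | grep -q "$1"
}

for f in running_awk running_grep; do
    if "$f" ora_pmon_NOPE >/dev/null; then
        echo "$f: process reported as running"
    else
        echo "$f: process reported as stopped"
    fi
done
```

Only the grep variant notices that ora_pmon_NOPE is not running. In bash, `set -o pipefail` would also make the awk variant return grep’s failure, but ending the pipeline with grep works in any shell.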

After fixing the script in /usr/share/cluster/, reloading the configuration caused the following error:

# cman_tool -r version
/usr/share/cluster/cluster.rng:2027: element define: Relax-NG parser error : Some defines for ORACLEDB needs the combine attribute
Relax-NG schema /usr/share/cluster/cluster.rng failed to compile
cman_tool: Not reloading, configuration is not valid

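After reading https://access.redhat.com/site/solutions/549963, I noticed that the backup copy of the Oracle agent left in the same directory (oracledb.sh-2014-05-26) was confusing RHCS, hence the duplicate ORACLEDB define in the error.

Removing its execute bit with the following chmod seemed to do the trick:

# chmod -x /usr/share/cluster/oracledb.sh-2014-05-26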
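To spot other stray copies before they break a reload, you can look for executable files whose names continue after “.sh”. A minimal sketch, assuming backups follow the oracledb.sh-2014-05-26 naming pattern; the function name and the name pattern are hypothetical choices, not something RHCS defines.

```shell
# Hypothetical helper: list executable files in an agents directory whose
# names look like backup copies (something after ".sh", e.g. oracledb.sh-2014-05-26).
find_agent_backups() {
    dir=${1:-/usr/share/cluster}
    # -perm -u+x: owner execute bit set; '*.sh?*': ".sh" followed by at least one char
    find "$dir" -maxdepth 1 -type f -perm -u+x -name '*.sh?*'
}
```

For example, `find_agent_backups | xargs -r chmod -x` would clear the execute bit on every match; moving the copies out of /usr/share/cluster/ altogether is probably cleaner still.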

Clustered Volumes? That’s Logical

With LVM you can create clustered volume groups, allowing different nodes to mount different LVs from the same VG.

The standard workflow for creating the partition, the PV and the clustered VG is:

parted /dev/mapper/datadisk mklabel gpt mkpart 1p "1 -1";
partprobe /dev/mapper/datadisk
pvcreate /dev/mapper/datadiskp1
vgcreate vg_xml /dev/mapper/datadiskp1
vgchange -c y vg_xml

If you get this error when creating the LV:

# lvcreate -n lv_xml vg_xml -l +100%FREE 
  Error locking on node bar-coll-mta-02: Volume group for uuid not found: gFyARW80mUikvaZafzYz773pLBWqc8etOgVrimSit4OmC98c1cIT0qfZfY3tZRxQ
  Failed to activate new LV.

That’s because one of the cluster nodes (bar-coll-mta-02 in this case) can’t see the newly created VG.
Running the following on that node will probably solve it:

# partprobe /dev/mapper/datadisk
# vgscan
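If you are not sure which nodes are affected, the same rescan can be run across the whole cluster before retrying lvcreate. A minimal sketch, assuming passwordless root SSH between the nodes; the node names in `NODES` and the function name are hypothetical placeholders.

```shell
# Hypothetical node list: replace with your cluster members.
NODES="node-01 node-02"

# Re-read the partition table and rescan VGs on every node, so that all of
# them see the new PV/VG; stop at the first node where this fails.
rescan_all_nodes() {
    for node in $NODES; do
        echo "== $node"
        ssh "root@$node" 'partprobe /dev/mapper/datadisk && vgscan' || return 1
    done
}
```

Usage: `rescan_all_nodes && lvcreate -n lv_xml vg_xml -l +100%FREE`.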