Brief OpenShift troubleshooting

If you have issues after an automagic openshift-on-openstack deployment:

1. Remember: every buildconfig created *before* the registry deployment is not authorized to push its images

2. Remember: hawkular is a java application, so startup is slow. Just open the metrics page and wait for the startup

3. Ansible is your friend. To get container logs, just

ansible all -m shell -a 'ls /var/log/containers/CONTAINER_NAME*'

ansible all -m shell -a 'cat /var/log/containers/CONTAINER_NAME*' > CONTAINER_NAME.log

4. If a container doesn’t start up during the deployment, a broken image may have been downloaded:

Jun 1 23:30:36 dev-7-infra-0 atomic-openshift-node: I0601 23:30:36.234103 32913 server.go:608] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"router-1-deploy", UID:"033670a9-470e-11e7-878f-fa163eac2bf7", APIVersion:"v1", ResourceVersion:"936", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"/pod\\\\\\\\\\\\\\\": stat /pod: no such file or directory\\\\\\\"\\\\n\\\"\"}"

Clean up the docker repo and retry:

docker ps -aq | xargs docker rm
docker rmi 90e9207f44f0 --force

5. Run oadm diagnostics on the master 😉

6. Check `oc get hostsubnet` to verify the SDN subnets assigned to each node

OpenShift cockpit quickstart

Enabling openshift cockpit with the latest releases is quite simple, but requires using a local system account.

1- Install cockpit

yum install cockpit cockpit-kubernetes
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 9090 -j ACCEPT
systemctl enable cockpit.service --now

2- Create a custom user to be used for cockpit administration

useradd -m -k /home/cloud-user cockpit
passwd cockpit  # set a password for the cockpit user

3- Access cockpit via a tunnel from the management network, using the cockpit user credentials.

ssh -D 11111 cloud-user@bastion  # opens a SOCKS proxy on localhost:11111
firefox http://master-ip:9090  # configure firefox to use the SOCKS proxy first

Openshift 3.4: broken ansible dependencies

The new ansible openshift 3.4 installation playbook is very nice.

Just set the deploy variables in the inventory and everything will rise from the ground magically…

Well, not immediately though. Due to this bug you need to:

– downgrade ansible to an earlier release (the latest one triggers the bug)

Otherwise the playbook will try to serialize python objects which are actually strings.

Eg. if your configuration contains:

- name: "MyServer"

Ansible looks for a MyServer() class instead of using str(“MyServer”)

Trace http calls with python-requests

Today python-requests is the de-facto standard library for REST calls.

As everything goes over TLS, sniffing the wire is not an option: you can trace API calls from the client side with the following:

import logging
import httplib as http_client  # use http.client on python3

http_client.HTTPConnection.debuglevel = 1  # dump request/response headers
logging.basicConfig(level=logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
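
With the logger configured, any call will dump the wire traffic (the URL below is just a placeholder):

import requests

requests.get('')  # request and response headers are printed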

Set command output as facts with ansible

Having to check the NTP configuration on a distributed cluster, I had to parse the `timedatectl` output into a dict and apply various checks.

I did this via the (infamous) 😉 jinja templates|pipelines.

# This is the check_time.yml playbook.

- name: Register the timedatectl output even in check mode. This command doesn't modify server configuration.
  shell: "timedatectl | grep ': '"
  check_mode: no
  register: timedatectl_output

# Note that:
#  - to use timedatectl_output in with_items we need to QUOTE-AND-BRACE it
#  - we can default the previously undefined timedatectl_status dictionary via
#       variable | default(VALUE)
#  - combine() merges each newly parsed line into the accumulated dict
- name: Process timedatectl_output lines one at a time and update repeatedly the timedatectl_status variable using combine().
  set_fact:
    timedatectl_status: >
        {{ timedatectl_status | default({}) |
           combine( dict([ item.partition(': ')[::2] | map('trim') ]) ) }}
  with_items: "{{ timedatectl_output.stdout_lines }}"
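
For reference, here is a plain-python sketch of what the jinja pipeline above computes (the sample lines are made up):

# Parse "key: value" lines into a dict, like the jinja pipeline does.
lines = ["      NTP synchronized: yes", "       RTC in local TZ: no"]
timedatectl_status = {}
for item in lines:
    key, value = item.partition(': ')[::2]  # keep (key, value), drop the separator
    timedatectl_status[key.strip()] = value.strip()
print(timedatectl_status)  # {'NTP synchronized': 'yes', 'RTC in local TZ': 'no'}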

Now we can check 😉

- name: Clock synchronized
  fail: msg="Clock unsynchronized {{timedatectl_status}}"
  when: timedatectl_status['NTP synchronized'] == 'no'

- name: All hw clocks are utc
  fail: msg="hwclock not utc {{timedatectl_status}}"
  when: timedatectl_status['RTC in local TZ'] == 'yes'

MySQL JSON fields on the ground!

Having to add a series of custom fields to a fairly relational application, I decided to try the new JSON fields.

As of now you can:

– create json fields
– manipulate them with json_extract, json_unquote
– create generated fields from json entries

You can not:

– index json fields directly: you must create a generated field and index that instead
– retain the original json datatype (eg. string, int) when extracting, as json_extract always returns strings.

Let’s start with a simple flask app connected to a db.

# requirements.txt (a MySQL driver, eg. PyMySQL, is needed too)
Flask
Flask-SQLAlchemy
SQLAlchemy>=1.1

import flask
import flask_sqlalchemy
from sqlalchemy.dialects.mysql import JSON

# A simple flask app connected to a db
app = flask.Flask('app')
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql+pymysql://user:secret@localhost/test'  # adjust to your db
db = flask_sqlalchemy.SQLAlchemy(app)

Add a class to the playground and create it on the db. We need sqlalchemy>=1.1 to support the JSON type!

# The model
class MyJson(db.Model):
    name = db.Column(db.String(16), primary_key=True)
    json = db.Column(JSON, nullable=True)

    def __init__(self, name, json=None): = name
        self.json = json
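
Since the JSON type silently requires it, a quick startup check (string-based, so only indicative) can save some debugging:

import sqlalchemy
assert sqlalchemy.__version__ >= '1.1'  # rough check: the JSON type needs sqlalchemy 1.1+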

Thanks to flask-sqlalchemy we can do everything via db.create_all() and db.session 😉

# Create table
db.create_all()

# Add an entry and persist it
entry = MyJson('jon', {'do': 'it', 'now': 1})
db.session.add(entry)
db.session.commit()

We can now verify with a raw select that the entry is serialized on the db as a string.

# Get the entry in standard SQL
import sqlalchemy
entries = db.engine.execute(['*'], from_obj=MyJson.__table__)).fetchall()
(name, json_as_string), = entries  # unpack the only row
assert isinstance(json_as_string, basestring)

A raw select to extract json fields now:

entries = db.engine.execute(
    ['name', 'json_extract(json, "$.now")'], from_obj=MyJson.__table__)
).fetchall()

(name, json_now), = entries  # unpack the only row
assert isinstance(json_now, basestring)
assert json_now != entry.json['now']  # '1' != 1: the original int type is lost
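
As noted above, the json column itself can’t be indexed: here is a minimal sketch of the generated-column workaround (the column and index names are invented for the example):

# Index a json entry via a generated column (MySQL 5.7+ syntax).
db.engine.execute("""
    ALTER TABLE my_json
      ADD COLUMN now_field INT AS (json_extract(json, '$.now')) STORED,
      ADD INDEX idx_now_field (now_field)
""")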

Terraforming the clouds

Terraform is an infrastructure configuration manager by HashiCorp (the Vagrant authors), like CloudFormation or Heat, supporting
various infrastructure providers including Amazon, VirtualBox, …

Terraform reads *.tf files and creates an execution plan containing all the resources:

– instances
– volumes
– networks
– ..

You can check an example configuration on github.

Unfortunately, it uses a custom but readable format (HCL) instead of yaml.

# Create a 75GB volume on openstack
resource "openstack_blockstorage_volume_v1" "master-docker-vol" {
  name = "mastervol"
  size = 75

# Create a nova vm with the given volume attached
resource "openstack_compute_instance_v2" "machine" {
  name = "test"
  region = "${var.openstack_region}"
  image_id = "${var.master_image_id}"
  flavor_name = "${var.master_instance_size}"
  availability_zone = "${var.openstack_availability_zone}"
  key_pair = "${var.openstack_keypair}"
  security_groups = ["default"]
  metadata {
    ssh_user = "cloud-user"
  volume {
    volume_id = "${}"

Further resources (eg. openstack volumes|floating_ip, digitalocean droplets, docker containers, ..)
can be defined via plugins.

At the end of every deployment cycle, terraform updates the `terraform.tfstate` state file (which may
be stored on s3 or on shared storage) describing the actual infrastructure.

Upon configuration changes, terraform creates and shows a new execution plan (terraform plan),
which you can eventually apply (terraform apply).

As there’s no ansible provisioner, a script can be used to extract an inventory file from `terraform.tfstate`.
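
For example, here is a minimal sketch of such a script, assuming the pre-0.12 state layout and openstack instances exposing an access_ip_v4 attribute:

#!/usr/bin/env python
# Print a minimal ansible inventory from terraform.tfstate.
import json

with open('terraform.tfstate') as f:
    state = json.load(f)

for module in state.get('modules', []):
    for name, resource in module['resources'].items():
        if resource['type'] != 'openstack_compute_instance_v2':
            continue
        attributes = resource['primary']['attributes']
        print('{} ansible_host={}'.format(name, attributes['access_ip_v4']))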

MySQL 8.0 Innodb Cluster looks at MongoDB

MySQL turns 8.0 and the technical preview integrates a new “InnoDB Cluster”. The overall architecture is reminiscent of MongoDB:

– group replication with a single master, similar to replica-sets;
– a mysqlsh shell able to create replication groups and local instances, supporting js and python;
– a MySQL Router as a gateway for appservers, to be deployed on each client machine like the mongos.

Once installed, you can create a replication group with a few commands:

su - rpolli
mysqlsh

\py  # enable python mode, then create 5 sandbox instances in ~/sandbox-dir/{3310..3350}

ports = (3310, 3320, 3330, 3340, 3350)
for port in ports:
    dba.deploy_local_instance(port)  # preview AdminAPI; newer shells call this deploy_sandbox_instance

Now we have 5 mysql instances listening on various ports. Create a cluster and check the newly created mysql_innodb_cluster_metadata schema.

\connect root:root@localhost:3310

cluster = dba.create_cluster('polli', 'pollikey');

\sql  # switch to sql mode

show databases;
+-------------------------------+
| Database                      |
+-------------------------------+
| information_schema            |
| mysql                         |
| mysql_innodb_cluster_metadata |
| performance_schema            |
| sys                           |
+-------------------------------+

Go back to the python mode and add the remaining instances to the cluster.

\py  # return to python mode again

# Eventually re-get the cluster.
cluster = dba.get_cluster('polli',{'masterKey':'pollikey'})  # masterKey is a shared secret between nodes.

# Add the other nodes
for port in ports[1:]:
    cluster.add_instance('root@localhost:' + str(port),'secret');

# Check status
cluster.status()  # BEWARE! The output is a str :( not a dict
{
    "clusterName": "polli",
    "defaultReplicaSet": {
        "status": "Cluster tolerant to up to 2 failures.",
        "topology": {
            "localhost:3310": {
                "address": "localhost:3310",
                "status": "ONLINE",
                "role": "HA",
                "mode": "R/W",
                "leaves": {
                    "localhost:3320": {
                        "address": "localhost:3320",
                        "status": "ONLINE",
                        "role": "HA",
                        "mode": "R/O",
                        "leaves": {}
                    },
                    "localhost:3330": {
                        "address": "localhost:3330",
                        "status": "ONLINE",
                        "role": "HA",
                        "mode": "R/O",
                        "leaves": {}
                    }
                }
            }
        }
    }
}

Now check the failover feature.

dba.kill_local_instance(3310)  # Successfully killed

# Parse the output with...
import json
json.loads(cluster.status())["defaultReplicaSet"]["topology"].keys()  # localhost:3320 WOW!

Once the cluster is set up, newly created users span the whole group.

CREATE USER 'admin'@'%' IDENTIFIED BY 'secret';

Now let’s connect to different cluster nodes.

mysql -uadmin -P3310 -psecret -e 'create database this_works_on_master;'  # OK
mysql -uadmin -P3320 -psecret -e 'create database wont_work_on_slave_even_if_admin;'  
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

The default setup allows writes only on the master *even for admin|super users*; this can be overridden as usual.

mysql> show variables like '%only';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| read_only                     | ON    |
| super_read_only               | ON    |
+-------------------------------+-------+
mysql> set global super_read_only = OFF;  -- just for root
mysql> set global super_read_only = ON;

mysql> set global read_only = OFF;  -- for all allowed users

The Mongodb python driver is topology-aware. MySQL connectors instead rely on mysql-router for connecting to the right primary.
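
As a sketch, an appserver connects to the local router instead of a given node. The port below is the router’s usual read-write port, but treat it as an assumption and check your router configuration:

# Connect through mysql-router: it forwards writes to the current primary.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host='', port=6446,  # 6446: assumed router R/W port
                               user='admin', password='secret')
conn.cursor().execute('create database works_via_router')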

RHEV: recovery VM in Unknown state

If an operation that implies a state change on a VM fails, RHEV sometimes sets the VM status to ‘Unknown’.
This morning, after a failed ‘Power off’ operation on a VM in panic – due to a bug ([vdsm] AttributeError: GuestAgent instance has no attribute ‘_sock’) – the VM state was set to ‘Unknown’.
In this case you basically can’t do anything from the UI…
If you know the real state of your VM, you can manually change it in the engine database and restart the VM. So I set the state of my VM to 0 (stopped) and restarted it.

[root@rhevm ~]# psql -U engine
psql (8.4.20)
Type "help" for help.

engine=> select vm_guid from vm_static where vm_name='';
(1 row)

engine=> select status from vm_dynamic where vm_guid='2d1e72a1-16c4-4f38-a21e-78113669dd98';
(1 row)

engine=> update vm_dynamic set status=0 where vm_guid='2d1e72a1-16c4-4f38-a21e-78113669dd98';

[oVirt shell (connected)]# action vm start

job-id : 7f1ac179-047c-4d50-932f-3ae7970c96e2
status-state: complete
vm-id : 2d1e72a1-16c4-4f38-a21e-78113669dd98