Smoke testing OpenShift with ansible-galaxy

The ioggstream.ocp_health ansible-galaxy role can run a smoke test on OpenShift in minutes:

– etcd consistency
– RHN subscriptions
– master status
– registry, ipfailover and router instances

NOTE: it’s not a replacement for oadm diagnostics ;)


ansible-galaxy install ioggstream.ocp_health
# optionally tweak parameters
# vi /root/.ansible/roles/ioggstream.ocp_health/tests/ocp_health.yml
ansible-playbook --check /root/.ansible/roles/ioggstream.ocp_health/tests/ocp_health.yml

If you want to create a test project with two apps, one with a PVC and one with an ephemeral volume, set create_test_project.


ansible-playbook -v -e create_test_project=yes /root/.ansible/roles/ioggstream.ocp_health/tests/ocp_health.yml

Brief OpenShift troubleshooting

If you have issues after an automagic openshift-on-openstack deployment:

1. Remember: every buildconfig created *before* the registry exists is not authorized to push its images
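
A workaround sketch: once the registry is up, trigger the affected builds again (myapp and myproject are placeholder names):

oc start-build myapp -n myproject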

2. Remember: Hawkular is a Java application, so startup is slow. Just open the metrics page and wait for it
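
To watch it come up instead of refreshing the page (a sketch, assuming metrics run in the default openshift-infra namespace):

oc get pods -n openshift-infra -w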

3. Ansible is your friend. To grab container logs, just run:


ansible all -m shell -a 'ls /var/log/containers/CONTAINER_NAME*'

ansible all -m shell -a 'cat /var/log/containers/CONTAINER_NAME*' > CONTAINER_NAME.log

4. If a container doesn’t start up during the deployment, a broken image may have been downloaded:

Jun 1 23:30:36 dev-7-infra-0 atomic-openshift-node: I0601 23:30:36.234103 32913 server.go:608] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"router-1-deploy", UID:"033670a9-470e-11e7-878f-fa163eac2bf7", APIVersion:"v1", ResourceVersion:"936", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"/pod\\\\\\\\\\\\\\\": stat /pod: no such file or directory\\\\\\\"\\\\n\\\"\"}"

Clean up the docker repo:


# remove every container (docker rm skips the running ones)
docker ps -aq | xargs docker rm
# force-remove the broken image
docker rmi 90e9207f44f0 --force

5. Run oadm diagnostics on the master ;)

6. Check the output of oc get hostsubnet
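
For instance, run it on every master at once with an Ansible ad-hoc command (assuming a masters group in your inventory):

ansible masters -m shell -a 'oc get hostsubnet'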

Set command output as facts with Ansible

Having to check the NTP configuration on a distributed cluster, I had to parse the `timedatectl` output into a dict and apply various checks.

I did this via the (infamous ;)) Jinja templates and filter pipelines.

# This is the check_time.yml playbook.

- name: Register the timedatectl output even in check mode. This command doesn't modify server configuration.
  shell: "timedatectl | grep ': '"
  check_mode: no
  register: timedatectl_output

# Note that:
#  - to use timedatectl_output in with_items we need to QUOTE-AND-BRACE it
#  - we can default the previously undefined timedatectl_status dictionary via
#       variable | default(VALUE)
- name: Process timedatectl_output lines one at a time and update repeatedly the timedatectl_status variable using combine().
  set_fact:
    timedatectl_status: >
      {{
        timedatectl_status | default({}) |
        combine(
          dict([ item.partition(': ')[::2]|map('trim') ])
        )
      }}
  with_items: "{{timedatectl_output.stdout_lines}}"
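
At the end of the loop, timedatectl_status is a flat dict; a trimmed sample (actual values depend on the host):

{'NTP synchronized': 'yes', 'RTC in local TZ': 'no', ...}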

Now we can check ;)

- name: Clock synchronized
  fail: msg="Clock unsynchronized {{timedatectl_status}}"
  when: timedatectl_status['NTP synchronized'] == 'no'

- name: All hw clocks are utc
  fail: msg="hwclock not utc {{timedatectl_status}}"
  when: timedatectl_status['RTC in local TZ'] == 'yes'
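
A usage sketch, assuming the tasks above are wrapped in a play targeting your hosts (inventory is a placeholder): thanks to check_mode: no on the shell task, the checks also work with --check.

ansible-playbook --check -i inventory check_time.yml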


Terraforming the clouds

Terraform is an infrastructure configuration manager by HashiCorp (the makers of Vagrant), akin to CloudFormation or Heat, supporting
various infrastructure providers including Amazon AWS, VirtualBox, …

Terraform reads *.tf files and creates an execution plan containing all the resources:

– instances
– volumes
– networks
– ..

You can check an example configuration on github.

Unfortunately, it uses a custom though readable format (HCL) instead of YAML.

# Create a 75GB volume on openstack
resource "openstack_blockstorage_volume_v1" "master-docker-vol" {
  name = "mastervol"
  size = 75
}

# Create a nova vm with the given volume attached
resource "openstack_compute_instance_v2" "machine" {
  name = "test"
  region = "${var.openstack_region}"
  image_id = "${var.master_image_id}"
  flavor_name = "${var.master_instance_size}"
  availability_zone = "${var.openstack_availability_zone}"
  key_pair = "${var.openstack_keypair}"
  security_groups = ["default"]
  metadata {
    ssh_user = "cloud-user"
  }
  volume {
    volume_id = "${openstack_blockstorage_volume_v1.master-docker-vol.id}"
  }
}

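All the ${var.*} references above must be declared; a minimal variables.tf sketch (no defaults, values are supplied at plan time):

variable "openstack_region" {}
variable "master_image_id" {}
variable "master_instance_size" {}
variable "openstack_availability_zone" {}
variable "openstack_keypair" {}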

Further resources (e.g. openstack volumes and floating IPs, digitalocean droplets, docker containers, …)
can be defined via plugins.
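
For example, with the OpenStack plugin you can allocate a floating IP (a sketch; the pool name depends on your cloud):

resource "openstack_compute_floatingip_v2" "fip" {
  pool = "public"
}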

At the end of every deployment cycle, terraform updates the `terraform.tfstate` state file (which may
be stored on s3 or on shared storage) describing the actual infrastructure.

Upon configuration changes, terraform creates and shows a new execution plan,
which you can then apply.
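
The typical cycle is:

terraform plan    # preview the changes
terraform apply   # enact them and update the state file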

As there’s no Ansible provisioner, a terraform.py script can be used to extract an inventory file from a `terraform.tfstate`.
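
A usage sketch (site.yml is a placeholder playbook; terraform.py acts as a dynamic inventory script):

ansible-playbook -i terraform.py site.yml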