Software from North: huhtikuuta 2014

lauantai 26. huhtikuuta 2014

Building Fedora 20 image with Veewee

I was in need of Fedora 20 virtual machine, and at the same time wanted to learn something new. So instead of googling around for Vagrant image, I decided to use Veewee to build a new one. The resulting source files are available at Github

After installing Veewee, I started to create my image file. Veewee has a lot of predefined templates, one of which was Fedora-20-x86_64.

One of my goals in project structures is to have everything related to a project available with one checkout. So in this case, I want to have the Veewee definition file in the same directory that the rest of the files. The basic usage of bundler and Veewee requires that the 'bundle exec veewee' is executed in the Veewee -directory. So you have to define the working directory when running command

NOTE: THIS DOESN'T SEEM TO WORKING RIGHT NOW, https://github.com/jedi4ever/veewee/issues/936

bundle exec veewee vbox define 'basic-fedora-20-x86_64' 'Fedora-20-x86_64' -w ../project/veewee

This will create the definitions directory under "../project/veewee/", ie. "../project/veewee/definitions/basic-fedora-20-x86_64". I like to have the tool name as the directory name here, so there's some hint what these files are.

NOTE: WORKING COMMAND, EXECUTE IN YOUR project/veewee -directory

BUNDLE_GEMFILE=/home/jyrki/projects/veewee/Gemfile bundle exec veewee vbox define 'basic-fedora-20-x86_64' 'Fedora-20-x86_64'

After executing this command, you should have following project structure:

example-project
`-- veewee
    `-- definitions
        `-- basic-fedora-20-x86_64
            |-- base.sh
            |-- chef.sh
            |-- cleanup.sh
            |-- definition.rb
            |-- ks.cfg
            |-- puppet.sh
            |-- ruby.sh
            |-- vagrant.sh
            |-- virtualbox.sh
            |-- vmfusion.sh
            `-- zerodisk.sh

I would say that the most interesting file here is "ks.cfg", which is the kickstart file defining the installation. From there you can change
disk sizes etc.

Last command to execute for buidling image is

BUNDLE_GEMFILE=/home/jyrki/projects/veewee/Gemfile bundle exec veewee vbox build basic-fedora-20-x86_64

This will start VirtualBox
and starts to execute commands on it. Some of these include typing to the console, which is kind of funny to look at. The Virtualbox is left running, so you can ssh into it with the command

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -p 7222 -l vagrant 127.0.0.1

Before you can use the image with vagrant, you have to export it from veewee and then add it into the vagrant. First execute command

BUNDLE_GEMFILE=/home/jyrki/projects/veewee/Gemfile bundle exec veewee vbox export basic-fedora-20-x86_64

which will shutdown the machine if it is running and export it to "basic-fedora-20-x86_64.box" -file. Now this file can be imported to vagrant with

vagrant box add 'basic-fedora-20-x86_64' 'basic-fedora-20-x86_64.box'

After this, you can start using the box in your Vagrantfiles.

tiistai 22. huhtikuuta 2014

Lessons learned from running Jenkins Slaves as Docker boxes

I've been running Jenkins slaves from docker containers just for a week now. In general, they have been working wonderfully. Of course there has been some kins and glitches, mainly when stopping and destroying containers. Version of Ansible script used for during this post can be found from https://github.com/sysart/ansible-jenkins-docker/tree/limits and most recent from https://github.com/sysart/ansible-jenkins-docker

Systemd

I was trying to control the containers with systemd, but this seemed cause some problems. It seems that it is quite easy to get into a situation, where the container was restarted immediately after being stopped, causing some weird problems. These became visible when docker refused to remove containers, complaining that their mounts were still in use. So I decided to forget the usage of systemd and just use the docker -module from ansible to stop running containers.

- name: stop {{container_names}}
docker: image="{{image_name}}" name="{{item}}" state=stopped
with_items: container_names

After this, stopping and starting works perfectly

Volumes

Be careful when you access files. Lets say that you have something like

VOLUME [ "/workspace" ]
ADD file /workspace/file

And then you run this with

docker run image /bin/ls /workspace

You will the see the file. But if you mount a volume, ie.

docker run -v /hostdir:/workspace image /bin/ls /workspace

The directory will be empty. This bit me when I wanted to have the home directory for jenkins user to be on a volume, but have some files added from Dockerfile. I ended up with linking few directories in start script to achieve what I wanted.

Limits and memory

The default memory amount for containers was pretty low, but it was easy to adjust. And in Fedora, there's a default process limit set in "/etc/security/limits.d/90-nproc.conf" -file which forces process limit to 1024 for all users.

# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 1024
root soft nproc unlimited

This had to be changed for both host and containers. How this showed up was a random "Exception in thread "Thread-0" java.lang.OutOfMemoryError: unable to create new native thread" during test run.

sunnuntai 20. huhtikuuta 2014

Installing Veewee on Fedora 20 with rbenv

rbenv install 1.9.2-p320

Last 10 log lines:
ossl_pkey_ec.c:819:29: note: each undeclared identifier is reported only once for each function it appears in
ossl_pkey_ec.c: In function ‘ossl_ec_group_set_seed’:
ossl_pkey_ec.c:1114:89: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (EC_GROUP_set_seed(group, (unsigned char *)RSTRING_PTR(seed), RSTRING_LEN(seed)) != RSTRING_LEN(seed))
^
/usr/bin/gcc -I. -I../../.ext/include/x86_64-linux -I../.././include -I../.././ext/openssl -DRUBY_EXTCONF_H=\"extconf.h\" -I/home/jyrki/.rbenv/versions/1.9.2-p320/include -fPIC -O3 -ggdb -Wextra -Wno-unused-parameter -Wno-parentheses -Wpointer-arith -Wwrite-strings -Wno-missing-field-initializers -Wno-long-long -o ossl_cipher.o -c ossl_cipher.c
make[1]: *** [ossl_pkey_ec.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/tmp/ruby-build.20140418062211.18615/ruby-1.9.2-p320/ext/openssl'
make: *** [mkmain.sh] Error 1

This is a known issue, which can be fixed by patching rbenv. Note that the patch contains changes to test/openssl/test_pkey_ec.rb, which doesn't seem to be present on 1.9.2-p320. So the command found from issue description needs some modifications.

curl -fsSL https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/41808/diff?format=diff | filterdiff -x ChangeLog -x test/openssl/test_pkey_ec.rb | rbenv install --patch 1.9.2-p320

(edit: Reported as ruby-build #555)
Note that running this requires filterdiff command from patchutil -package.

After that, rest of the installation went smoothly.

maanantai 14. huhtikuuta 2014

Self registering Jenkins hosts with Docker and Ansible

At work (Sysart), we have had a lot of problems with Jenkins builds interfering each other when running on same slave. Problems ranged from port conflicts to trying to use same Firefox instance during Selenium tests. Easiest thing to solve this is to run only one build at a time per slave. But we have some decent hardware with Intel I7 processors (4 cores, HT enabled), so running one job at a time per slave is kinda wasteful.

Previously, we used Ovirt for creating virtual machines, and then added them manually to Jenkins as slaves. But as we wanted to 10+ slaves, this would've been tedious. Also running a VM has overhead, which starts to hurt pretty quickly.

So enter Docker, Ansible and Swarm -plugin.

Basic idea in this is to have Docker image which connects to Jenkins immediately at the start. The image contains everything needed for running our tests, including stuff required for Selenium tests like Firefox. Building of images and containers are handled with Ansibles docker and docker-image -modules, actual starting and stopping of running containers is done with systemd mainly because I wanted to learn how to use that too :). Systemd also has systemd-journal, which is pretty amazing.

The image is build on containing host for now, as it was just easier. I'm definitely checking Docker repository in near future.

Volumes are used for workspace, mainly to persist Maven repositories between restarts. I had some problems with write permissions on the first try, but resolved this with some bash scripting.

Started containers can have labels, which are just added in the playbook with docker -modules "command" -variable. There's some funny quoting to get parameters right, see "start.sh" for details.

Main files are added below and example playbook with module can be found from Github.

Of course, there were some problems doing this.

Ansible

The docker-image module reports changes every time. This effectively prevents usage of handlers to restart containers.
Couldn't get uri -module to accept multiple http codes as return code ("Can also be comma separated list of status codes."). Most likely just misunderstanding of documentation
service -module failed to parse output when starting/stopping container services complaining about inability to parse json. Docker start and stop output the id of container in to stdout, so this might be the reason?

Docker

docker -d starts all the containers as default. This can be prevented by adding -r as parameter. But this doesn't seem to affect when the service is restarted. If docker -d starts the containers, then systemd tries to start container which fails causing restart.
I couldn't get volumes to be chowned for jenkins user. We need to have a non-root user for our tests, as we do some filesystem permission tests.
docker -d starts all the containers as default. This can be prevented by adding -r as parameter. But this doesn't seem to affect when the service is restarted. If docker -d starts the containers, then systemd tries to start container which fails causing restart.

Jenkins

Slave removal is slow, which can easily cause problems as containers are stopped and restarted quickly. Luckily this can be checked via REST api.

There's still few things I'd like to add here:

Enable commiting and downloading the used container for a given test run. This would be helpful in situations where tests were successful on developers environment but not on Jenkins. But then, developers should use same image base as the test environment :)
Have a production image, which would be extended by test image.

And protip for image development. Have two different images, "jenkins-slave" and "jenkins-slave-test". The "jenkins-slave-test" is inherited from "jenkins-slave", but has ENTRYPOINT overridden to "/bin/bash" so you can explore the image.

So, the main parts of how this was done. I'm sure that there's a lot of better ways to do things, so please, tell me :).

The jenkins_slaves.yml -playbook is something like this:

- hosts: jenkins-slaves
vars:
- jenkins_master: "http://jenkins.example.com"
- container_names: [ builder1, builder2, builder3, builder4, builder5, builder6 ]
roles:
- { role: docker-host, image_name: jenkins_builder }

The template for docker file is following:

FROM fedora
MAINTAINER jyrki.puttonen@sysart.fi
RUN yum install -y java-1.7.0-openjdk-devel blackbox firefox tigervnc-server dejavu-sans-fonts dejavu-serif-fonts ImageMagick unzip ansible puppet git tigervnc
RUN useradd jenkins
ADD vncpasswd /home/jenkins/.vnc/passwd
RUN chown -R jenkins:jenkins /home/jenkins/.vnc
# Run as jenkins user. Biggest reason for this is that in our tests, we want
# # check some filesystem rights, and those tests will fail if the user is root.
#ADD http://maven.jenkins-ci.org/content/repositories/releases/org/jenkins-ci/plugins/swarm-client/1.15/swarm-client-1.15-jar-with-dependencies.jar /home/jenkins/
ADD swarm-client-1.15-jar-with-dependencies.jar /home/jenkins/
# Without this, maven has problems with umlauts in tests
ENV JAVA_TOOL_OPTIONS -Dfile.encoding=UTF8
#so vncserver etc use right directory
ENV HOME /home/jenkins
WORKDIR /home/jenkins/
ADD start.sh /home/jenkins/
RUN chmod 755 /home/jenkins/start.sh
ENTRYPOINT ["/home/jenkins/start.sh"]

Start.sh starts jenkins swarm plugin:

#!/bin/bash
OWNER=$(stat -c %U /workspace)
if [ OWNER != "jenkins" ]
then
chown -R jenkins:jenkins /workspace
fi
# Use swarm client to connect to jenkins. Broadcast didn't work due to container networking,
# so easiest thing to do was just to set right address.
{% set labelscsv = labels|join(",") -%}
{% set labelsflag = '-labels ' + labelscsv -%}
su -c "/usr/bin/java -jar swarm-client-1.15-jar-with-dependencies.jar -master {{jenkins_master}} -executors 1 -mode {{mode}} {{ labelsflag if labels else '' }} -fsroot /workspace $@" - jenkins

vars/main.yml has following variables defined

docker_directory: "docker"
image_name: "igor-builder"
docker_file: "Dockerfile.j2"
docker_data_directory: "/data/docker"
image_build_directory: "{{docker_data_directory}}/{{image_name}}"

And tasks/main.yml is like this. There's a lot of comments inside so I decided to include it as is to here.

# As I want to control individual containers with systemd, install new unit
# file that adds "-r" to options so docker -d doesn't start containers.
# Without this, containers started by systemd would fail to start, and would be
# started again
- name: install unit file for docker
copy: src=docker.service dest=/etc/systemd/system/docker.service
notify:
- reload systemd

# Install docker from updates-testing, as there 0.9.1 available and it handles deleting containers better
- name: install docker
yum: name=docker-io state=present enablerepo=updates-testing

- name: start docker service
service: name=docker enabled=yes state=started

- name: install virtualenv
yum: name=python-virtualenv state=absent

- name: install pip
yum: name=python-pip state=present

# docker module requires version that is > 0.3, which is not in Fedora repos, so install with pip
- name: install docker-py
pip: name=docker-py state=present

- name: create working directory {{image_build_directory}} for docker
file: path={{image_build_directory}} state=directory

- name: install unit file for systemd {{container_names}}
template: src=container-runner.service.j2 dest=/etc/systemd/system/{{item}}.service
with_items: container_names
notify:
- enable services for {{container_names}}
- reload systemd

# Setup files needed for building docker image for Jenkins usage
- name: Download swarm client
get_url: url="http://maven.jenkins-ci.org/content/repositories/releases/org/jenkins-ci/plugins/swarm-client/1.15/swarm-client-1.15-jar-with-dependencies.jar" dest={{image_build_directory}}

- name: copy vnc password file
copy: src=vncpasswd dest={{image_build_directory}}

- name: copy additional files
copy: src={{item}} dest={{image_build_directory}}
with_items: additional_files

- name: create start.sh
template: src=start.sh.j2 dest={{image_build_directory}}/start.sh validate="bash -n %s"

- name: copy {{docker_file}} to host
template: src="{{docker_file}}" dest="{{image_build_directory}}/Dockerfile"

# This is something I would like to dom but docker module can't set volumes as rw,
# volumes="/data/builders/{{item}}:/home/jenkins/work:rw"
# Also I couldn't get the user to "jenkins" for volumes
- name: create volume directories for containers
file: path="/data/builders/{{item}}" state=directory
with_items: container_names

#
# For some reason, this will always return changed
- name: build docker image {{ image_name }}
docker_image: path="{{image_build_directory}}" name="{{image_name}}" state=present
notify:
- stop {{container_names}}
- wait for containers to be removed on Jenkins side
- remove {{container_names}}
- create containers {{container_names}} with image {{image_name}}
- wait for containers to be started
- start {{container_names}}

and handlers/main.yml

- name: reload systemd
command: systemctl daemon-reload

# Can't use service here, Ansible fails to parse output
- name: enable services for {{container_names}}
command: /usr/bin/systemctl enable {{ item }}
with_items: container_names

# service cannot be used here, Ansible fails to parse output.
- name: stop {{container_names}}
command: /usr/bin/systemctl stop {{ item }}
# service: name={{ item }} state=stopped
with_items: container_names

# Jenkins takes a while to remove slaves. If containers are started immediately, they will have names
#.containing ip -address of the host in them. ugly :(
- name: wait for containers to be removed on Jenkins side
command: curl -s -w %{http_code} {{ jenkins_master }}/computer/{{ansible_hostname}}-{{item}}/api/json -o /dev/null
register: result
tags: check
until: result.stdout.find("404") != -1
retries: 10
delay: 5
with_items: container_names

- name: remove {{container_names}}
docker: name="{{item}}" state=absent image="{{image_name}}"
with_items: container_names

- name: create containers {{container_names}} with image {{image_name}}
docker: image="{{image_name}}" name="{{item}}" hostname="{{item}}" memory_limit=2048MB state=present command="\"-name {{ansible_hostname}}-{{item}}\"" volumes="/data/builders/{{item}}:/workspace"
with_items: container_names

- name: wait for containers to be started
pause: seconds=10

- name: start {{container_names}}
command: /usr/bin/systemctl start {{ item }}
with_items: container_names