Tuesday, April 22, 2014

Lessons learned from running Jenkins Slaves as Docker boxes

I've been running Jenkins slaves in Docker containers for a week now. In general, they have been working wonderfully. Of course there have been some kinks and glitches, mainly when stopping and destroying containers. The version of the Ansible scripts used during this post can be found at https://github.com/sysart/ansible-jenkins-docker/tree/limits and the most recent version at https://github.com/sysart/ansible-jenkins-docker


I was trying to control the containers with systemd, but this seemed to cause some problems. It was quite easy to get into a situation where a container was restarted immediately after being stopped, causing some weird symptoms. These became visible when Docker refused to remove containers, complaining that their mounts were still in use. So I decided to drop systemd and just use the docker module from Ansible to stop running containers.

- name: stop {{container_names}}
  docker: image="{{image_name}}" name="{{item}}" state=stopped
  with_items: container_names
After this, stopping and starting works perfectly.
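For completeness, the matching start task looks roughly like this (a sketch in the same style as the stop task above; `state=running` is the old docker module's way of ensuring a container is up, but check the module version you have):

- name: start {{container_names}}
  docker: image="{{image_name}}" name="{{item}}" state=running
  with_items: container_names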


Be careful when you access files. Let's say that you have something like this in your Dockerfile:
VOLUME [ "/workspace" ]
ADD file /workspace/file
And then you run it with
docker run image /bin/ls /workspace
You will then see the file. But if you mount a volume over it, i.e.
docker run -v /hostdir:/workspace image /bin/ls /workspace
the directory will be empty: the host directory shadows whatever was added at build time. This bit me when I wanted the home directory of the jenkins user to be on a volume, but still have some files added from the Dockerfile. I ended up symlinking a few directories in the start script to achieve what I wanted.
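The symlink workaround can be sketched as a small start script. The template path and file name below are hypothetical, not the actual ones from my image; in the real image the template directory would be populated by ADD in the Dockerfile and the target would be the VOLUME mount point, while here temp dirs stand in for them so the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch: the image ships its defaults in a template directory that
# lives OUTSIDE the volume, and the start script links them into the
# (possibly freshly mounted, hence empty) volume at startup.
TEMPLATE=${TEMPLATE:-$(mktemp -d)}   # stand-in for a dir baked into the image
HOME_DIR=${HOME_DIR:-$(mktemp -d)}   # stand-in for the VOLUME mount point

# Pretend the Dockerfile ADDed a default config file into the template.
touch "$TEMPLATE/config.xml"

for entry in "$TEMPLATE"/*; do
    name=$(basename "$entry")
    # Only link entries the mounted volume does not already provide,
    # so files persisted on the host win over the image defaults.
    if [ ! -e "$HOME_DIR/$name" ]; then
        ln -s "$entry" "$HOME_DIR/$name"
    fi
done

ls -l "$HOME_DIR"
```

In the real container the script would end with an exec of the actual slave command so the links are in place before Jenkins starts.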

Limits and memory

The default memory amount for the containers was pretty low, but it was easy to adjust. And in Fedora, there is a default process limit set in the file /etc/security/limits.d/90-nproc.conf, which caps the number of processes at 1024 for all users:
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     1024
root       soft    nproc     unlimited
This had to be changed on both the host and in the containers. The symptom was a random "Exception in thread "Thread-0" java.lang.OutOfMemoryError: unable to create new native thread" during test runs: on Linux each Java thread counts against the nproc limit, so a busy build can hit the 1024 cap.
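A quick way to check which limit is actually in effect, both on the host and inside a container (e.g. via docker exec or from the slave start script):

```shell
# Print the max user processes limit for the current shell.
# A value of 1024 would indicate the Fedora default from
# 90-nproc.conf is still active.
ulimit -u
```

To raise it, edit the nproc line in /etc/security/limits.d/90-nproc.conf (the value to use depends on your build load; 4096 is just an example, not the value from my setup) and make sure the containers get the same treatment.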