Thursday, 9 November 2017

Complex systems, root cause analysis and failure

I just read http://www.michaelnygard.com/blog/2017/11/root-cause-analysis-as-storytelling/ and it reminded me of the classic "How Complex Systems Fail" ( http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf ).

We are building complex systems all the time, and it's actually scary how many defenses against failure are built into them. These defenses can be as simple as checking the return value of a function, or more complex, with fallbacks and alternative implementations. They aren't scary because they are there; they are scary when you realize that if even one of those defenses is missing, things go bad pretty quickly.

Currently humans are still superior at defending these systems. They create workarounds and processes that avoid potential failures. It might be really interesting to apply machine learning to these situations, trying to find the sets of actions that lead to failures.

But meanwhile, we have to learn from our systems by ourselves, so try to avoid hunting that one root cause.

Wednesday, 20 September 2017

Amazon CloudFormation and tagging

AWS CloudFormation has several commands in the AWS CLI, like "create-stack", "update-stack" and "deploy". Each of these has its good and bad sides. For multiple reasons, we've decided to use "deploy". But then tagging becomes a problem. "create-stack" and "update-stack" both support giving tags, which are then propagated to all supported resources, but at the time of writing "deploy" does not. To make things worse, some CloudFormation resource types do not support tags as properties, but they seem to get tags from the CloudFormation stack if the stack has them.

So after deploy we now run "aws cloudformation update-stack --stack-name <some> --tags ...". This becomes quite easy with some scripting when you have jq!



As update-stack wants every parameter passed with "UsePreviousValue=true", we use some jq to generate the necessary parameters. Then we take the existing Parameters that we already use for tagging and generate the tags from them.
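The transformation described above can be sketched in Python instead of jq. This is only an illustration under assumptions: the function name update_stack_args and the parameter names "Environment" and "Owner" are made up for the example, and the input is assumed to be one entry of the "Stacks" list printed by "aws cloudformation describe-stacks".

```python
import json


def update_stack_args(stack, tag_keys):
    """Build the --parameters and --tags values for update-stack.

    `stack` is assumed to be one entry of the "Stacks" list from
    `aws cloudformation describe-stacks`; `tag_keys` names the
    parameters that should also become tags (hypothetical choice).
    """
    params = stack.get("Parameters", [])
    # Pass every existing parameter back unchanged.
    parameters = [
        {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
        for p in params
    ]
    # Reuse the selected parameters as stack tags.
    tags = [
        {"Key": p["ParameterKey"], "Value": p["ParameterValue"]}
        for p in params
        if p["ParameterKey"] in tag_keys
    ]
    return parameters, tags


if __name__ == "__main__":
    # Made-up sample data in the describe-stacks shape.
    sample = {
        "Parameters": [
            {"ParameterKey": "Environment", "ParameterValue": "test"},
            {"ParameterKey": "Owner", "ParameterValue": "team-a"},
            {"ParameterKey": "InstanceType", "ParameterValue": "t2.micro"},
        ]
    }
    parameters, tags = update_stack_args(sample, {"Environment", "Owner"})
    print(json.dumps(parameters))
    print(json.dumps(tags))
```

The two printed JSON lists are in the shape that the --parameters and --tags options of update-stack accept; the template itself can be reused with --use-previous-template.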

Well, actually "quite easy" is a lie, as I had some problems understanding the right jq syntax for replacing a key in a JSON array.

Tuesday, 19 September 2017

Docker, Alpine and dillon's cron: "setpgid: Operation not permitted"

For a while, I've been struggling to get dillon's cron working properly in a Docker container. The problem has been that when the ENTRYPOINT was in anything other than shell form, I got 'setpgid: Operation not permitted'.

So, this worked:
ENTRYPOINT /usr/sbin/crond -f
None of these seemed to work:
ENTRYPOINT ["/usr/sbin/crond", "-f"]
Or
ENTRYPOINT ["./entrypoint.sh"]
CMD ["/usr/sbin/crond", "-f"]
 As both would give
setpgid: Operation not permitted
But using the shell form has been enough, until now. As I finally needed an entrypoint script for doing some preparation work, something had to be done.

"su -c" to the rescue.

ENTRYPOINT ["./entrypoint.sh"]
CMD ["su", "-c", "/usr/sbin/crond -f"]
Seems to be working perfectly.

Friday, 30 September 2016

Reinstalling NOOBS and Raspbian on Raspberry Pi

I've got a small cluster of six Raspberry Pis, and I wanted to update them all. I'm just too lazy to update every SD card one by one, so I wondered if it is possible to do the update without removing the SD cards from the Raspberries. And it is!

Installation of NOOBS creates a partition on the SD card, /dev/mmcblk0p1. This partition contains the files needed for the install (details are available at https://github.com/raspberrypi/noobs/wiki/NOOBS-partitioning-explained). So the only thing you need to do is download a new NOOBS, mount /dev/mmcblk0p1 on the device and replace the files.

So you need to do something like the following to update NOOBS:
curl https://downloads.raspberrypi.org/NOOBS_latest -L -o noobs.zip
sudo mount -t vfat  /dev/mmcblk0p1 /mnt
sudo rm -rf /mnt/*
sudo unzip noobs.zip -d /mnt/
Then you can boot up the Raspberry and start the NOOBS recovery by pressing the Shift key during startup. But I'm too lazy to do even that. Luckily it is possible to make NOOBS automatically start the recovery and install a new OS.

The behaviour of NOOBS can be controlled with command line options. These options are defined in a file called "recovery.cmdline" in the root of /dev/mmcblk0p1. The default contents of the file are the following:

quiet ramdisk_size=32768 root=/dev/ram0 init=/init vt.cur_default=1 elevator=deadline

To make the installer start by default, you have to add the "runinstaller" option. This only starts the installer, which still needs user input to continue. Another option, "silentinstall", tells the installer to go ahead and install the OS. Just make sure that there is only one OS in the os/ directory, and if it has more than one flavour, edit its flavours.json file (details in https://github.com/raspberrypi/noobs#how-to-automatically-install-an-os).

So the recovery.cmdline should have the following contents:
runinstaller silentinstall quiet ramdisk_size=32768 root=/dev/ram0 init=/init vt.cur_default=1 elevator=deadline
After installation, the installer removes the "runinstaller" option from recovery.cmdline so it does not reinstall on every boot. The "silentinstall" option remains, though.
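Because "silentinstall" may already be present after an earlier install, the edit to recovery.cmdline should not blindly prepend both options. A minimal Python sketch of an idempotent edit follows. The function name enable_silent_install is made up for the example, and the regular expression follows the same idea as the lineinfile task in the playbook further down.

```python
import re

INSTALL_FLAGS = "runinstaller silentinstall"


def enable_silent_install(cmdline):
    """Return recovery.cmdline content with both installer flags in front.

    Flags that are already at the start of the line are stripped first,
    so running this repeatedly yields the same result.
    """
    rest = re.sub(
        r"^(runinstaller)?\s?(silentinstall)?\s?", "", cmdline.strip(), count=1
    )
    return f"{INSTALL_FLAGS} {rest}".strip()


if __name__ == "__main__":
    default = ("quiet ramdisk_size=32768 root=/dev/ram0 init=/init "
               "vt.cur_default=1 elevator=deadline")
    print(enable_silent_install(default))
```

The same call works on the default file, on a file that still has "silentinstall" from a previous run, and on a file that already has both options.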

So when everything is in place, at the next reboot there's a new version of NOOBS, and it will install the OS automatically. Just remember that everything on the Raspberry will be wiped!

Here's an Ansible playbook that does everything. It will take quite a while to complete, as the NOOBS image file is pretty big and takes a while to download and transfer to the hosts. The reason I'm downloading NOOBS to the local machine is that I'm running this playbook against six Raspberries, and it should be faster to download NOOBS once to the local machine and then transfer it to the Raspberries instead of downloading it on every Raspberry.
- hosts: all
  vars:
    - noobs_file: noobs.zip
    - recovery_directory: /mnt/recovery
  tasks:
    - name: download noobs if not present
      local_action: get_url url=https://downloads.raspberrypi.org/NOOBS_latest dest={{playbook_dir}}/{{noobs_file}}
      become: no 
    - name: mount device
      mount: name=/mnt/recovery src=/dev/mmcblk0p1 fstype=vfat state=mounted 
    - name: remove old noobs
      # the file module does not expand wildcards, so let the shell empty the partition
      shell: rm -rf {{recovery_directory}}/*
    - name: unzip noobs
      unarchive: src={{noobs_file}} dest={{recovery_directory}} owner=root group=root 
    - name: set reinstall
      lineinfile: dest={{recovery_directory}}/recovery.cmdline regexp='^(runinstaller)?\s?(silentinstall)?\s?(.*)$' line='runinstaller silentinstall \3' backrefs=yes 
    - name: unmount device
      mount: name=/mnt/recovery src=/dev/mmcblk0p1 fstype=vfat state=unmounted 
    - name: reboot
      command: shutdown -r now
      ignore_errors: True
  become: yes



 

Thursday, 8 September 2016

Getting full error message from "docker service ps"


I was trying out Docker swarm, networks and services, and for some reason my nginx containers failed to start. Unfortunately, "docker service ps my-web" truncated the error, showing something like this:
e5qw27qr4qbc9vrm68g3i9tl0   my-web.1  nginx  node3  Shutdown       Failed 2 seconds ago          "starting container failed: ca…"
There will be "--no-trunc" in version 1.13, which should resolve this. Meanwhile, using "docker inspect e5qw27qr4qbc9vrm68g3i9tl0" (id from docker service ps) gave the full error message.
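The docker inspect output is a fairly large JSON document, so it can help to pull out just the error. A minimal Python sketch follows. The helper names task_error and inspect_task are made up for the example, and it assumes that the failure text of a swarm task is stored under Status.Err in the inspect output; check the actual output if the field is missing.

```python
import json
import subprocess


def task_error(inspect_output):
    """Return the untruncated error text from `docker inspect <task id>` output.

    Assumption: the output is a JSON list with one task object whose
    failure message is stored under Status.Err.
    """
    tasks = json.loads(inspect_output)
    return tasks[0].get("Status", {}).get("Err", "")


def inspect_task(task_id):
    """Run `docker inspect` for the given task id and return its stdout."""
    result = subprocess.run(
        ["docker", "inspect", task_id],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Usage (needs a running swarm; the id comes from "docker service ps"):
# print(task_error(inspect_task("e5qw27qr4qbc9vrm68g3i9tl0")))
```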

In this case, the VM created with docker-machine did not have the necessary pieces to connect to the secured network.

Sunday, 7 August 2016

Ubuntu Xenial64 on VirtualBox and Vagrant


There were a lot of strange problems with ubuntu/xenial64, and in https://github.com/mitchellh/vagrant/issues/6616 there is a comment by Seth Vargo (a HashiCorp employee):
    
The ubuntu/xenial64 box is built wrong and horribly broken. Please note that "ubuntu" is the name of a user, not a representation of a canonical source for ubuntu images. Please try bento/ubuntu-16.04 instead. Thanks.
https://github.com/mitchellh/vagrant/issues/6616#issuecomment-227776489

These errors included the following:

rejecting i/o to offline device
This happened almost every time after heavier I/O operations, for example after loading Docker images.

stderr: Inappropriate ioctl for device
I think that this happened when Vagrant tried to setup network interfaces, mainly "enp0s8".

So just use bento/ubuntu-16.04

Tuesday, 5 July 2016

Jenkins Workflow: Executing build step for every change in commit

At work, we wanted to send an email for every change that was made in a project. By default, Jenkins likes to collate changes into as few builds as possible, and normally sends one email per build.

The solution seemed to be Jenkins Pipeline, which enables creation and execution of jobs "on the fly" as needed.

The first problem was getting access to the ChangeLogSet. There are some preset variables in a Jenkinsfile, but I could not find documentation for them. After some googling, Stack Overflow came to the rescue.

def changes = currentBuild.rawBuild.changeSets

But when this was executed, Jenkins complained
org.jenkinsci.plugins.scriptsecurity.sandbox.RejectedAccessException: Scripts not permitted to use method org.jenkinsci.plugins.workflow.support.steps.build.RunWrapper getRawBuild


There's an "In-process Script Approval" tool in Jenkins, where you can allow usage of these methods.

After that was solved, the next problem was serialization. As the actual job execution is transferred to a different node, every non-serializable object caused an exception. To prevent this, I had to null objects in the proper places. This in turn prevented running the jobs inside the loop, as the loop variables needed to be nulled before job execution. So I had to collect the jobs into a map and, after every job was defined, null everything and use the "parallel" step to execute the jobs.

So the whole thing is here:

//changes is http://javadoc.jenkins-ci.org/hudson/scm/ChangeLogSet.html
def changes = currentBuild.rawBuild.changeSets
//We need to create branches for later execution, as otherwise there would be serialization exceptions
branches = [:]
for (int j = 0; j < changes.size(); j++) {
    def change = changes.get(j)
    for (int i = 0; i < change.getItems().size(); i++) {
        def entry = change.getItems()[i]
        def commitTitleWithCaseNumber = entry.getMsg()
        def commitMessage = entry.getComment()
        //split from first non digit
        def caseNumber = (commitTitleWithCaseNumber =~ /^[0-9]*/)
        // check that the case number is at the start of the title
        if( !caseNumber[0].isEmpty() && commitTitleWithCaseNumber.startsWith(caseNumber[0])) {
          // Remove number from title, just for nicer subject line
          def commitTitle = commitTitleWithCaseNumber.substring(caseNumber[0].length()).trim()
          def number = caseNumber[0]
          branches["mail-${j}-${i}"] = {
              node {
                  emailext body: commitMessage, subject: "[Sysart ${number}] ${commitTitle}", to: 'redacted@example.com'
             }
          }
        }
        // Need to forcibly null all non serializable classes
        caseNumber = null
        entry = null
    }
    change = null
}
changes = null
stage 'Mail'
parallel branches
This was a little more difficult than I had expected, mainly because of the serialization complications. But in the end it works, so it cannot be completely stupid.

Monday, 27 June 2016

docker: Error response from daemon: invalid bit range [4, 4]

I was fooling around with Docker, trying to create an overlay network. I copied some settings from the net, and when starting a container, Docker reported an error.

root@infra-front:~# docker network create -d overlay --subnet=192.168.50.0/24 --gateway=192.168.50.1 --ip-range=192.168.50.4/32 test

32bdd738a6b7444d8c9a471e451793acd8793db1739f4f660b9981df018252c1

root@infra-front:~# docker run --rm -ti --net test alpine sh

docker: Error response from daemon: invalid bit range [4, 4].

It seems that my network settings were wrong. For now, I just removed the gateway and subnet, and things started to work.

Tuesday, 14 June 2016

Jaspersoft Studio 6.2.2 on Fedora 23: no swt-pi-gtk in java.library.path

When starting Jaspersoft Studio 6.2.2, the only thing I got was

Jaspersoft Studio:
GTK+ Version Check
Jaspersoft Studio:
An error has occurred. See the log file
/home/jyrki/projects/jasper/TIBCOJaspersoftStudio-6.2.2.final/configuration/1465956968426.log.

Log file had:

TIBCOJaspersoftStudio-6.2.2.final/configuration/org.eclipse.osgi/264/0/.cp/libswt-pi-gtk-4530.so: libgtk-x11-2.0.so.0: cannot open shared object file: No such file or directory
no swt-pi-gtk in java.library.path
/home/jyrki/.swt/lib/linux/x86/libswt-pi-gtk-4530.so: libgtk-x11-2.0.so.0: cannot open shared object file: No such file or directory
Can't load library: /home/jyrki/.swt/lib/linux/x86/libswt-pi-gtk.so
The problem was fixed by installing gtk2.i686 (the 32-bit version):

sudo dnf install gtk2.i686

Using ldd (print shared library dependencies) helped to find out what was actually missing, as the error message is somewhat misleading (Can't load library: /home/jyrki/.swt/lib/linux/x86/libswt-pi-gtk.so):

ldd /home/jyrki/projects/jasper/TIBCOJaspersoftStudio-6.2.2.final/configuration/org.eclipse.osgi/264/0/.cp/libswt-pi-gtk-4530.so
ldd: warning: you do not have execution permission for `/home/jyrki/projects/jasper/TIBCOJaspersoftStudio-6.2.2.final/configuration/org.eclipse.osgi/264/0/.cp/libswt-pi-gtk-4530.so'
linux-gate.so.1 (0xf7741000)
libgtk-x11-2.0.so.0 => not found
libgthread-2.0.so.0 => /lib/libgthread-2.0.so.0 (0xf76af000)
libXtst.so.6 => /lib/libXtst.so.6 (0xf76a8000)
libc.so.6 => /lib/libc.so.6 (0xf74da000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf74bd000)
libglib-2.0.so.0 => /lib/libglib-2.0.so.0 (0xf737b000)
libX11.so.6 => /lib/libX11.so.6 (0xf723a000)
libXext.so.6 => /lib/libXext.so.6 (0xf7226000)
libXi.so.6 => /lib/libXi.so.6 (0xf7214000)
/lib/ld-linux.so.2 (0x5660d000)
libxcb.so.1 => /lib/libxcb.so.1 (0xf71ed000)
libdl.so.2 => /lib/libdl.so.2 (0xf71e8000)
libXau.so.6 => /lib/libXau.so.6 (0xf71e4000)

Tuesday, 19 April 2016

Using Keycloak APIs: "RESTEASY004655: Unable to invoke request"


The following exception was thrown while executing multiple calls to the Keycloak API.


Caused by: javax.ws.rs.ProcessingException: RESTEASY004655: Unable to invoke request
at org.jboss.resteasy.client.jaxrs.engines.ApacheHttpClient4Engine.invoke(ApacheHttpClient4Engine.java:287)
at org.jboss.resteasy.client.jaxrs.internal.ClientInvocation.invoke(ClientInvocation.java:436)
at org.jboss.resteasy.client.jaxrs.internal.proxy.ClientInvoker.invoke(ClientInvoker.java:102)
at org.jboss.resteasy.client.jaxrs.internal.proxy.ClientProxy.invoke(ClientProxy.java:64)
at com.sun.proxy.$Proxy276.findAll(Unknown Source)
at org.keycloak.admin.client.resource.ClientsResource$findAll.call(Unknown Source)
Caused by: java.lang.IllegalStateException: Invalid use of BasicClientConnManager: connection still allocated.
Make sure to release the connection before allocating another one.
at org.apache.http.util.Asserts.check(Asserts.java:34)
at org.apache.http.impl.conn.BasicClientConnectionManager.getConnection(BasicClientConnectionManager.java:160)
at org.apache.http.impl.conn.BasicClientConnectionManager$1.getConnection(BasicClientConnectionManager.java:142)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:423)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.jboss.resteasy.client.jaxrs.engines.ApacheHttpClient4Engine.invoke(ApacheHttpClient4Engine.java:283)

I was calling

keycloak.realm(realm).clients().create(representation) 
and did not read anything from the response, so the connection was never released. The simple fix was

            def response = keycloak.realm(realm).clients().create(representation)
            response.close()


Saturday, 26 March 2016

Problem with Kubernetes SkyDNS healthz

I had some problems when trying to get DNS working on Kubernetes. I followed the instructions from https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns. Everything seemed to be working well, but the pod got restarted after 30 seconds. The log for the healthz container had the following entries:

2016/03/19 04:25:25 Client ip 10.0.96.1:50326 requesting /healthz probe servicing cmd sleep 10 && nslookup kubernetes.default.svc.kube.local localhost >/dev/null
2016/03/19 04:25:25 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.kube.local': Name does not resolve, at 2016-03-19 04:25:23.967737423 +0000 UTC, error exit status 1
After trying a lot of things, I found a bug report for Alpine Linux. Basically, nslookup does not respect the server parameter if /etc/resolv.conf has entries. A comment on that issue recommends using dig or drill for querying.

So I made a simple image and pushed it to Docker Hub, https://hub.docker.com/r/jyrki/arm-kubernetes-healthz-drill/. Nothing fancy, just added drill (https://github.com/jyrkiput/arm-kubernetes-healthz-drill/blob/master/Dockerfile). I used the existing image as the base, as I wanted to have exechealthz available.

Then I had to change the healthz command to
drill -q kubernetes.default.svc.kube.local @localhost

Friday, 18 March 2016

Kubernetes 1.2.0 beta-1 not starting on Raspbian 8.0

While trying to start Kubernetes v1.2.0 on Raspbian 8.0 I ran into problems. Only the k8s-master and k8s-master-proxy containers were started, so the system was not coming up properly. The logs for k8s-master said the following:

7215 kubelet.go:2365] skipping pod synchronization - [Failed to start ContainerManager system validation failed - Following Cgroup subsystem not mounted: [memory]]
The cgroup memory subsystem is not enabled by default. You can enable it by adding
cgroup_enable=memory
to /boot/cmdline.txt. A reboot is needed after this.

You can check whether the memory subsystem is enabled by listing /sys/fs/cgroup/, which should then have a directory called "memory" among others:
blkio  cpu  cpuacct  cpu,cpuacct  cpuset  devices  freezer  memory  net_cls  systemd
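When there are several Raspberries to check, the same test is easy to script. A minimal Python sketch follows. The function name memory_cgroup_mounted is made up for the example, and it assumes the cgroup v1 layout shown above, where every mounted subsystem has its own directory.

```python
import os


def memory_cgroup_mounted(cgroup_root="/sys/fs/cgroup"):
    """Return True if the memory subsystem has a directory under cgroup_root.

    Assumes the cgroup v1 layout, one directory per mounted subsystem.
    """
    return os.path.isdir(os.path.join(cgroup_root, "memory"))


if __name__ == "__main__":
    if memory_cgroup_mounted():
        print("memory cgroup is mounted")
    else:
        print("memory cgroup is missing, add cgroup_enable=memory and reboot")
```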

 

Sunday, 14 February 2016

Reveal.js with background image and company logo, all in css theme

I wanted to use reveal.js for my presentations. In our company, we have (as usual) a standard template for presentations. The problem was that the background had two parts: a small triangle in the top left corner, and the company logo in the bottom right corner.

CSS3 supports multiple backgrounds, so after a little tinkering I came up with the following CSS snippets.

body {
  background-image: url('theme_images/corner.svg'), url('theme_images/sysart.svg');
  background-repeat: no-repeat;
  background-position: top left, bottom right;
  background-size: auto 30%, 20% 20%;
  }

The front page was somewhat different.
html.title body {
  background:url("theme_images/front.svg"), url('theme_images/sysart.svg');
  background-repeat: no-repeat;
  background-position: top left, bottom right;
  background-size: 100% auto, 20% 20%;
}
The whole css can be seen in https://github.com/sysart/reveal.js/blob/sysart-theme/sysart.css#L10


Friday, 5 September 2014

Getting project version from Maven project in Jenkins

Add an "Execute system Groovy script" build step with the following script:
import hudson.FilePath
import hudson.remoting.VirtualChannel

def pomFile = build.getParent().getWorkspace().child('pom.xml').readToString();
def project = new XmlSlurper().parseText(pomFile);      
def param = new hudson.model.StringParameterValue("MAVEN_VERSION", project.version.toString());
def paramAction = new hudson.model.ParametersAction(param);
build.addAction(paramAction);
Now you can use the "MAVEN_VERSION" in build, for example pass it on with "Trigger parameterized build on other projects" post build action by adding predefined parameters:
project_version=${MAVEN_VERSION}
Or in some shell commands
build.sh ${MAVEN_VERSION}
I've found this to be useful when one project deploys artefacts into a repository, and another project wants to use those artefacts with the exact version number.

Bash script for resolving Docker ports

Docker containers can expose ports to the outside world when needed. This is done by giving the "-P" flag to the docker run command. This will publish all exposed ports to "a random high port from the range 49000 to 49900" (from the Docker user guide). Even though the user guide doesn't explicitly say that the Docker daemon tracks which ports are published, I would guess that it does.

Publishing to random ports is useful, as you can then have multiple containers running at the same time. We need this for our CI setup. But there's also a requirement for accessing the container from outside, so we need to resolve the port published by -P in our build scripts.

"docker inspect" is a command which can be used for this. It takes a "--format=template" parameter, which can be used to output information about the container.

So the following bash script resolves the public port for a given container name and exposed port.

#!/bin/bash
set -o nounset
set -o errexit
function resolvePort() {
  local container=$1
  local exposedPort=$2
  local port=$(docker inspect --format='{{range $p, $conf := .NetworkSettings.Ports}}{{if eq $p "'$exposedPort'/tcp"}}{{(index $conf 0).HostPort}}{{end}} {{end}}' $container)
  echo $port
}
 The magic happens in
{{range $p, $conf := .NetworkSettings.Ports}}{{if eq $p "'$exposedPort'/tcp"}}{{(index $conf 0).HostPort}}{{end}} {{end}}
In easier to read format:
{{range $p, $conf := .NetworkSettings.Ports}}
    {{if eq $p "'$exposedPort'/tcp"}}
        {{(index $conf 0).HostPort}}
    {{end}}
{{end}}
The "{{range $p, $conf := .NetworkSettings.Ports}}" iterates over the ports configuration. It is like a map, and one key-value pair looks something like this:
"80/tcp": [
  {
      "HostIp": "0.0.0.0",
      "HostPort": "49101"
  }
]

$p is the key and $conf is the value. $p is the exposed port and its value is something like "80/tcp".

Then there's {{if eq $p "'$exposedPort'/tcp"}}, which is a simple comparison.

The value of $conf is an array, and in this use case we just need the first element (there is also just one). So in
{{(index $conf 0).HostPort}}
(index $conf 0) gives just that, and then (index $conf 0).HostPort returns 49101.
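The walk through the template maps directly onto the JSON that "docker inspect" prints without --format. As an illustration only, here is a Python sketch of the same lookup. The function name resolve_port is made up for the example, and the sample data in the usage part is invented to match the shape shown above.

```python
import json


def resolve_port(inspect_output, exposed_port, protocol="tcp"):
    """Return the published host port for an exposed container port.

    `inspect_output` is the JSON printed by `docker inspect <container>`.
    Returns None when the port is not exposed or not published.
    """
    containers = json.loads(inspect_output)
    ports = containers[0]["NetworkSettings"]["Ports"] or {}
    # Same key the template compares against, e.g. "80/tcp".
    bindings = ports.get(f"{exposed_port}/{protocol}")
    if not bindings:
        return None
    # Same as (index $conf 0).HostPort in the template.
    return bindings[0]["HostPort"]


if __name__ == "__main__":
    sample = json.dumps([{
        "NetworkSettings": {
            "Ports": {"80/tcp": [{"HostIp": "0.0.0.0", "HostPort": "49101"}]}
        }
    }])
    print(resolve_port(sample, 80))
```

Unlike the template, this returns None instead of an empty string when nothing is published, which is easier to check for in a script.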

The bash function can then be used like this:
readonly publicPort=$(resolvePort containerName 80)
curl localhost:$publicPort
We might have been able to avoid this by using another container for tests and doing container linking, but this seems to work okay. And I wanted to learn about docker inspect --format :)

Monday, 2 June 2014

Using HTTP Basic Authentication with YUM

HTTP Basic Authentication is supported by yum. You just have to add username and password into repository configuration

username=username
password=password
But, at least on the older versions of CentOS, this does not work. The problem is that yum relies on a Python library called "urlgrabber" for connections, and the version of this library that is available in the repositories doesn't seem to work with it. You can see the packages in the repository with "yum info" and "yum search", but you cannot install them.

I resolved this problem by installing urlgrabber from sources:

git clone git://yum.baseurl.org/urlgrabber.git/
cd urlgrabber
python setup.py install
That got it working.

Trying out Ansible with Vagrant


There are two virtual machines in this setup, called "ansible" and "development". The first one ("ansible") is the host that runs Ansible, and the latter is the target. Both are running Fedora 20 images created as explained in the previous blog post.

There are three somewhat advanced configurations in the Vagrantfile. First, the Vagrantfile defines two different hosts. They have to be named, and they can have different configurations.

Second, to make things easier, there's a private network between these two hosts. This way the hosts can have predefined IPs, which can then be used for making connections between them.

The third thing is the provisioning setup. Provisioning simply means configuring the environment by installing packages and modifying configurations. Here, a simple bash script is used. This script installs Ansible from source, sets up the profile file to source env-setup on login and does some other minor things.

So here's the Vagrantfile:

VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.define "development" do |development|
    development.vm.box = "basic-fedora-20-x86_64"
    development.vm.network :private_network, ip: "192.168.111.100"
  end
  config.vm.define "ansible" do |ansible|
    ansible.vm.box = "basic-fedora-20-x86_64"
    ansible.vm.network :private_network, ip: "192.168.111.101"
    ansible.vm.provision "shell", path: "install_ansible.sh"
  end
end 
When multiple machines are defined, vagrant commands are applied to all of them by default. So you can start both machines with "vagrant up" in the directory where the Vagrantfile is. After a while, both machines have booted.

Then you can ssh into "ansible" with "vagrant ssh ansible". On login, you should see something like

Setting up Ansible to run out of checkout...
The directory where the Vagrantfile is can be found at /vagrant. In that directory, you can find an inventory file (development-hosts) and a simple playbook (base.yml). To make sure that everything is working, go to the /vagrant directory and execute

ansible -i development-hosts -u vagrant -k -m ping all
which will ask for a password (which is "vagrant") and should then print

ansible | success >> {
    "changed": false,
    "ping": "pong"
}
deployment | success >> {
    "changed": false,
    "ping": "pong"
}
The command "ansible -i development-hosts -u vagrant -k -m ping all" is different from the one in the tutorial. "-i development-hosts" tells Ansible to use the given file as the inventory, "-u vagrant" means that the connection is made as the user vagrant, and "-k" makes Ansible ask for an SSH password. The "-u vagrant" is not strictly necessary here, because without it Ansible would use the username of the currently logged-in user, which is already vagrant.

When ping is working, you can run the "base.yml" playbook with

ansible-playbook -i development-hosts -k base.yml
This playbook will output information about the default IPv4 interface using the debug module.

Now you have a pretty good playground for trying out Ansible. The first thing to do would be setting up public key authentication so you do not need to type the password all the time (hint: the authorized_key module in Ansible).


Tuesday, 6 May 2014

Initialize virtual machine with Vagrant

I like to have a well defined environment for my projects. By "well defined" I mean that the environment must be explicitly defined, i.e. there must be a way to initialize the whole environment over and over again while being sure that the environment is exactly the same.

In the Linux world, an environment can be defined as the distribution (e.g. Fedora, Ubuntu, Mint), installed packages and configurations. These are actually a major dependency of your whole project, and they can cause major headaches. Packages are continuously updated with security and bug fixes, their behavior can change, and package versions can differ between distributions and even between installations. So these must be controlled.

I've previously blogged about how Veewee can be used for creating VirtualBox images in a controlled manner. In those posts, I created a Fedora 20 image from scratch. One of the key files in that process is the kickstart definition, which is specific to Red Hat derived distros. In this kickstart file, you can define what packages to install into your system. A major caveat here is that you are then tightly coupled to Red Hat distros. Also, updating these packages in a controlled manner is impossible.

Configurations are a completely different beast. If you ssh into your environment and make a change, chances are that you will not remember to make the same change next time. So you're creating so-called "snowflake" environments.

So you need a tool for handling packages and configurations. The main options here are Puppet, Chef, CFEngine, Salt and Ansible. Others exist, but I would say that those are the biggest ones.

But how can you use these for initializing a virtual machine? The first step is to create running instances which are identical to each other.

First, you should have a way to control your virtual machine with definition files that can be kept in a version control system. Veewee is a tool which can define the basics of a VM, i.e. things like disk size, operating system and some boot stuff.

Veewee can then output Vagrant boxes, which are binary files. Vagrant is a tool for controlling VM instances, i.e. creating, starting, stopping and destroying them.

After you've created and added a Vagrant box, you can start using it.

vagrant init basic-fedora-20-x86_64

This command creates a Vagrantfile in your current directory. The Vagrantfile is a text file, which can then be version controlled. In its simplest form, it is just

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "basic-fedora-20-x86_64"
end


"vagrant init" creates a Vagrantfile, which has a lot of commented lines. I'd recommend that you read all of them. Seriously.

After this, you can just execute the following in the directory where the Vagrantfile is:
vagrant up

which will start the virtual machine defined in the Vagrantfile. This might take some time, but after this you can ssh into your box.

vagrant ssh

This will log you in as the "vagrant" user. As the kickstart file, which defines the basics of the Fedora installation, sets up passwordless sudo for the vagrant user, you have all the control you need.

Feel free to fool around, but remember that every change you make is persisted. If you want a clean environment, you have to log out from the running virtual machine and execute
vagrant destroy
vagrant up

Lastly, you can stop the instance with
vagrant halt

and bring it back up with
vagrant up

Saturday, 26 April 2014

Building Fedora 20 image with Veewee

I needed a Fedora 20 virtual machine, and at the same time wanted to learn something new. So instead of googling around for a Vagrant image, I decided to use Veewee to build a new one. The resulting source files are available on GitHub.

After installing Veewee, I started to create my image file. Veewee has a lot of predefined templates, one of which was Fedora-20-x86_64.

One of my goals in project structures is to have everything related to a project available with one checkout. So in this case, I want to have the Veewee definition file in the same directory as the rest of the files. The basic usage of bundler and Veewee requires that 'bundle exec veewee' is executed in the Veewee directory. So you have to define the working directory when running the command

NOTE: THIS DOESN'T SEEM TO BE WORKING RIGHT NOW, https://github.com/jedi4ever/veewee/issues/936

bundle exec veewee vbox define 'basic-fedora-20-x86_64' 'Fedora-20-x86_64' -w ../project/veewee

This will create the definitions directory under "../project/veewee/", i.e. "../project/veewee/definitions/basic-fedora-20-x86_64". I like to have the tool name as the directory name here, so there's some hint about what these files are.

NOTE: WORKING COMMAND, EXECUTE IN YOUR project/veewee DIRECTORY

BUNDLE_GEMFILE=/home/jyrki/projects/veewee/Gemfile bundle exec veewee vbox define 'basic-fedora-20-x86_64' 'Fedora-20-x86_64'

After executing this command, you should have the following project structure:

example-project
`-- veewee
    `-- definitions
        `-- basic-fedora-20-x86_64
            |-- base.sh
            |-- chef.sh
            |-- cleanup.sh
            |-- definition.rb
            |-- ks.cfg
            |-- puppet.sh
            |-- ruby.sh
            |-- vagrant.sh
            |-- virtualbox.sh
            |-- vmfusion.sh
            `-- zerodisk.sh

I would say that the most interesting file here is "ks.cfg", which is the kickstart file defining the installation. From there you can change disk sizes etc.

The last command to execute for building the image is

BUNDLE_GEMFILE=/home/jyrki/projects/veewee/Gemfile bundle exec veewee vbox build basic-fedora-20-x86_64

This will start VirtualBox and begin executing commands on it. Some of these include typing into the console, which is kind of funny to look at. The VirtualBox machine is left running, so you can ssh into it with the command

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -p 7222 -l vagrant 127.0.0.1

Before you can use the image with Vagrant, you have to export it from Veewee and then add it to Vagrant. First execute the command

BUNDLE_GEMFILE=/home/jyrki/projects/veewee/Gemfile bundle exec veewee vbox export basic-fedora-20-x86_64

which will shut down the machine if it is running and export it to the "basic-fedora-20-x86_64.box" file. Now this file can be imported to Vagrant with

vagrant box add 'basic-fedora-20-x86_64' 'basic-fedora-20-x86_64.box'

After this, you can start using the box in your Vagrantfiles.

Tuesday, 22 April 2014

Lessons learned from running Jenkins Slaves as Docker boxes


I've been running Jenkins slaves in Docker containers for just a week now. In general, they have been working wonderfully. Of course there have been some kinks and glitches, mainly when stopping and destroying containers. The version of the Ansible script used for this post can be found at https://github.com/sysart/ansible-jenkins-docker/tree/limits and the most recent one at https://github.com/sysart/ansible-jenkins-docker

Systemd

I was trying to control the containers with systemd, but this seemed to cause some problems. It seems that it is quite easy to get into a situation where the container is restarted immediately after being stopped, causing some weird problems. These became visible when Docker refused to remove containers, complaining that their mounts were still in use. So I decided to forget systemd and just use the docker module from Ansible to stop running containers.

- name: stop {{container_names}}
  docker: image="{{image_name}}" name="{{item}}" state=stopped
  with_items: container_names
After this, stopping and starting works perfectly.

Volumes

Be careful when you access files. Let's say that you have something like
VOLUME [ "/workspace" ]
ADD file /workspace/file
And then you run this with
docker run image /bin/ls /workspace
You will then see the file. But if you mount a volume, i.e.
docker run -v /hostdir:/workspace image /bin/ls /workspace
the directory will be empty. This bit me when I wanted the home directory of the jenkins user to be on a volume but have some files added from the Dockerfile. I ended up linking a few directories in the start script to achieve what I wanted.

Limits and memory

The default memory amount for containers was pretty low, but it was easy to adjust. And in Fedora, there's a default process limit set in the "/etc/security/limits.d/90-nproc.conf" file, which sets a soft process limit of 1024 for all users except root.
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*          soft    nproc     1024
root       soft    nproc     unlimited
This had to be changed for both the host and the containers. It showed up as a random "Exception in thread "Thread-0" java.lang.OutOfMemoryError: unable to create new native thread" during test runs.
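To see which limit is actually in effect, it helps to print it from inside the container and on the host. A minimal Python sketch using the standard resource module follows. The function names nproc_limit and describe are made up for the example, and RLIMIT_NPROC is only available on Unix-like systems.

```python
import resource


def nproc_limit():
    """Return the (soft, hard) process limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit):
    """Render a limit value the way limits.conf writes it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nproc_limit()
    print(f"nproc soft={describe(soft)} hard={describe(hard)}")
```

If the soft value is still 1024 in the environment where the tests run, the change in limits.d has not reached that process.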