Record'IT Blog


Recording the IT best technologies


How to create a web scale infrastructure based on Docker, CoreOS, Vulcand and Kubernetes. And why Object Storage becomes the de facto data repository.

Introduction

I recently wrote a similar post using Mesos & Marathon instead of Kubernetes, and said that Kubernetes was at too early a stage.

This week I attended a Kubernetes Meetup in London and I was really impressed by the Kubernetes demo from Kelsey Hightower (from the CoreOS team).

Then, I decided that I needed to learn more about Kubernetes NOW.

Mesos & Marathon provide HA (with Mesos backup masters and Marathon instances proxying the requests). Kubernetes doesn’t provide this capability yet.

However, Kubernetes stores all the critical information in etcd, so the temporary unavailability of the Kubernetes controller manager doesn’t have a major impact.

Why Docker?

The first question should even be: why Linux containers?

Linux containers have a very low compute and storage overhead compared to virtual machines.

Docker has really simplified the way anyone can leverage Linux containers, and also provides useful features like Dockerfiles, the Docker Hub and a layered file system (see https://docs.docker.com/terms/layer/).

In my setup, I use a private Docker Registry which stores the images on ViPR using the Amazon S3 API (as described in my previous post).

More information available at http://www.docker.io

Why CoreOS?

All the components will run in Docker containers, so you could say that the Operating System isn’t really important.

But CoreOS provides many advantages:

  • Automatically updates the entire OS as a single unit, instead of package by package (and can even reboot the system if you don’t have any SPOF in your infrastructure)
  • Includes etcd for service discovery, which is also used by Vulcand (and even by Kubernetes in this setup)
  • Includes systemd and fleet, a tool that presents your entire cluster as a single init system (I don’t use fleet in this setup, but I use it for other purposes, like starting an Elasticsearch cluster in a few seconds)

More information available at http://www.coreos.com

Why Vulcand?

I see many tutorials about how to deploy containers or virtual machines, but I’m always surprised to see that they rarely cover the load balancing part of the infrastructure.

In my opinion, load balancing is a key component of a web scale infrastructure. Why apply automation everywhere if your users then cannot reach your application?

Vulcand is a reverse proxy for HTTP API management and microservices.

Vulcand also watches etcd to automatically detect new rules it needs to implement, so you don’t need to reload any service. Simply add the right keys in etcd and your service/application becomes available from the outside world.

More information available at http://www.vulcanproxy.com

Why Kubernetes?

Kubernetes is becoming more and more popular for managing Linux containers at scale and provides some very advanced capabilities, like easy management of rolling upgrades.

Why Object Storage?

Using the software above, an application can be deployed, easily scaled and accessed from the outside world in a few seconds.

But what about the data?

Structured content would probably be stored in a distributed database, like MongoDB, for example.

Unstructured content is traditionally stored in either a local file system, a NAS share or in Object Storage.

A local file system doesn’t work, as a container can be deployed on any node in the cluster.

A NAS share could technically work, but would be very complex. For example, the share would have to be mounted on all the hosts, then you would have to specify the volume for each container, run the container in privileged mode, and so on. If the NAS share isn’t available when the container is launched, the application inside the container will have to handle that issue as well.

On the other hand, Object Storage can be used by any application from any container, is highly available thanks to the use of load balancers, doesn’t require any provisioning and accelerates the development cycle of applications. Why? Because a developer doesn’t have to think about how the data should be stored, manage a directory structure, and so on.

Also, I’ve developed a web application which shows how an application can manage uploads and downloads without being in the data path. I’ll run this application in this setup later.

Big picture

[Diagram: the 3-node CoreOS cluster and its components]

This diagram shows the different components and how I have set up the 3-node CoreOS cluster.

I use Keepalived to make sure the public IP 10.64.231.84 is always up on either the coreos1 or the coreos3 node.

Vulcand runs on each node to balance the load between the users and the web application, but also between the web application and the different ViPR nodes.

A private Docker Registry runs on the coreos1 node and stores the image layers on ViPR using the Amazon S3 API.

The kubernetes kube-controller-manager, kube-scheduler and kube-apiserver services are running on the coreos2 node.

The kubernetes kubelet and kube-proxy services are running on all the CoreOS nodes.

The different steps

Creating a Docker image for my web application

Here is the Dockerfile I use to create the Docker Image:

FROM golang
WORKDIR /
RUN git clone https://djannot:[email protected]/djannot/s3pics.git
WORKDIR /s3pics
RUN go build
EXPOSE 8080

I build the image using the --no-cache parameter to make sure the latest source code is cloned from GitHub.

[email protected] /media/share1/Dockerfiles/s3pics $ docker build --no-cache .
Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon
Step 0 : FROM golang
---> 1ea210e0e1f6
Step 1 : WORKDIR /
---> Running in f6987b175723
---> 022aa96f56d0
Removing intermediate container f6987b175723
Step 2 : RUN git clone https://djannot:[email protected]/djannot/s3pics.git
---> Running in 54d6a32e90ba
Cloning into 's3pics'...
---> 3369bca87577
Removing intermediate container 54d6a32e90ba
Step 3 : WORKDIR /s3pics
---> Running in d875bc08eac9
---> 73946142ea54
Removing intermediate container d875bc08eac9
Step 4 : RUN go build
---> Running in e0bd59c1f28b
---> baebdd1b633e
Removing intermediate container e0bd59c1f28b
Step 5 : EXPOSE 8080
---> Running in 16d3fa9be1c5
---> 815b7aed2c83
Removing intermediate container 16d3fa9be1c5
Successfully built 815b7aed2c83

Finally, I push the image to the Docker registry:

[email protected] /media/share1/Dockerfiles/s3pics $ docker push 10.64.231.45:5000/s3pics:2.0
The push refers to a repository [10.64.231.45:5000/s3pics] (len: 1)
Sending image list
Pushing repository 10.64.231.45:5000/s3pics (1 tags)
Image 511136ea3c5a already pushed, skipping
Image 16386e29a1f4 already pushed, skipping
Image 835c4d274060 already pushed, skipping
Image 22c23ce0a90c already pushed, skipping
Image 3f1e6432f26e already pushed, skipping
Image 7982826b1e59 already pushed, skipping
Image 1dafbd563f5a already pushed, skipping
Image 7a94d87545e8 already pushed, skipping
Image e2d60f7b3d07 already pushed, skipping
Image 4f23222e2f74 already pushed, skipping
Image 258b590ccdee already pushed, skipping
Image 986643313a7b already pushed, skipping
Image 1ea210e0e1f6 already pushed, skipping
022aa96f56d0: Image successfully pushed
3369bca87577: Image successfully pushed
73946142ea54: Image successfully pushed
baebdd1b633e: Image successfully pushed
815b7aed2c83: Image successfully pushed
Pushing tag for rev [815b7aed2c83] on {http://10.64.231.45:5000/v1/repositories/s3pics/tags/2.0}

I’ve specified a tag (2.0) to make sure each node of the cluster pulls the latest version from the private Docker Registry.

Deploying the Kubernetes service

First, we need to create a Kubernetes replication controller which will manage the pods corresponding to this service.

To do this, I’ve created a JSON file called s3picsStableController.json to describe this Kubernetes replication controller:

{
    "id": "s3picsStableController",
    "kind": "ReplicationController",
    "apiVersion": "v1beta1",
    "desiredState": {
        "replicas": 1,
        "replicaSelector": {
            "name": "s3pics",
            "environment": "production",
            "track": "stable"
        },
        "podTemplate": {
            "desiredState": {
                "manifest": {
                    "version": "v1beta1",
                    "id": "s3pics",
                    "containers": [{
                        "name": "s3pics",
                        "image": "10.64.231.45:5000/s3pics",
                        "workingDir": "/s3pics",
                        "command": ["./s3pics", "[email protected]", "-SecretKey=xxxx", "-EndPoint=http://denisnamespace.ns.viprds.ad.forest:9020", "-Namespace=denisnamespace"],
                        "ports": [{"containerPort": 8080}],
                        "livenessProbe": {
                            "enabled": true,
                            "type": "http",
                            "initialDelaySeconds": 30,
                            "httpGet": {
                                "path": "/",
                                "port": "8080"
                            }
                        }
                    }]
                }
            },
            "labels": {
                "name": "s3pics",
                "environment": "production",
                "track": "stable"
            }
        }
    },
    "labels": {
        "name": "s3pics",
        "environment": "production",
        "track": "stable"
    }
}

I’ve specified a value of 1 for the replicas parameter, so this Kubernetes replication controller will launch only one pod.

Let’s now launch it:

# kubecfg -c s3picsStableController.json create replicationControllers
I0208 11:33:35.510366   16811 restclient.go:146] Waiting for completion of operation 11
Name                     Image(s)                   Selector                                          Replicas
----------               ----------                 ----------                                        ----------
s3picsStableController   10.64.231.45:5000/s3pics   environment=production,name=s3pics,track=stable   1

After that, I’ve created a JSON file called s3picsService.json to describe the Kubernetes service:

{
    "id": "s3pics",
    "kind": "Service",
    "apiVersion": "v1beta1",
    "port": 80,
    "containerPort": 8080,
    "selector": {
        "name": "s3pics",
        "environment": "production"
    }
}

Let’s now launch it:

# kubecfg -c s3picsService.json create services
I0208 11:35:04.606757   17096 restclient.go:146] Waiting for completion of operation 15
Name                Labels              Selector                             IP                  Port
----------          ----------          ----------                           ----------          ----------
s3pics                                  environment=production,name=s3pics   192.168.200.139     80

I can also check that I have one Kubernetes pod running for my service:

# kubecfg list pods
Name                                   Image(s)                   Host                Labels                                            Status
----------                             ----------                 ----------          ----------                                        ----------
f3faf912-af7d-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.25/       environment=production,name=s3pics,track=stable   Running
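
Note that the service selector only lists name and environment, while the pod also carries a track label. A Kubernetes selector matches any pod whose labels contain every pair in the selector, so extra labels on the pod are fine. The following is a minimal Go sketch of that matching rule, not code from Kubernetes itself; the function name matches and the canary and staging label sets are hypothetical examples.

```go
package main

import "fmt"

// matches reports whether every key/value pair of the selector is present
// in the labels. Labels that the selector does not mention are ignored.
func matches(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Selector taken from s3picsService.json.
	selector := map[string]string{"name": "s3pics", "environment": "production"}

	// Labels taken from the replication controller's pod template.
	stable := map[string]string{"name": "s3pics", "environment": "production", "track": "stable"}
	// Hypothetical label sets, only here to illustrate the rule.
	canary := map[string]string{"name": "s3pics", "environment": "production", "track": "canary"}
	staging := map[string]string{"name": "s3pics", "environment": "staging", "track": "stable"}

	fmt.Println(matches(selector, stable))  // true
	fmt.Println(matches(selector, canary))  // true
	fmt.Println(matches(selector, staging)) // false
}
```

Leaving track out of the service selector means that a second replication controller with a different track value would be served by the same service.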

Updating the vulcand proxy

Let’s now see how we can access this web application from the outside world.

I’ve developed a small tool in Go which uses both the Kubernetes API and the etcd API to:

  • determine which Kubernetes services are running without a corresponding vulcand rule in etcd and create the missing rules
  • determine which vulcand rules exist in etcd for Kubernetes services which aren’t running anymore and delete them
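
The two steps above boil down to a set difference in each direction. Here is a minimal Go sketch of that comparison, assuming both sides have already been reduced to plain lists of names; it is not the source of the actual tool, and diffRules and the sample names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// diffRules compares the names of the running Kubernetes services with the
// names that already have a vulcand rule. It returns the rules to create and
// the rules to delete, both sorted so the output is stable.
func diffRules(services, rules []string) (toCreate, toDelete []string) {
	running := make(map[string]bool, len(services))
	for _, s := range services {
		running[s] = true
	}
	existing := make(map[string]bool, len(rules))
	for _, r := range rules {
		existing[r] = true
	}
	for s := range running {
		if !existing[s] {
			toCreate = append(toCreate, s)
		}
	}
	for r := range existing {
		if !running[r] {
			toDelete = append(toDelete, r)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	// Hypothetical inputs: a real tool would read these from the
	// Kubernetes API and from etcd respectively.
	services := []string{"s3pics", "api"}
	rules := []string{"api", "old-service"}

	create, remove := diffRules(services, rules)
	fmt.Println("create:", create) // create: [s3pics]
	fmt.Println("delete:", remove) // delete: [old-service]
}
```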

I now run this tool to create the vulcand rule in etcd for the service I’ve just deployed.

kubernetes-vulcan -KubernetesEndPoint=http://10.64.229.36:8081 -EtcdEndPoint=http://10.64.229.36:4001/v2 -EtcdRootKey=/kubernetes-vulcan
Kubernetes Service s3pics: Key /keys/vulcand/backends/b1/servers/192.168.200.139-80 created or updated in ETCD with value={"Id":"192.168.200.139-80","URL":"http://192.168.200.139:80"} for the frontend f1
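
To make the shape of that rule explicit, here is a small Go sketch that rebuilds the key and value shown in the output above for a given backend, IP and port. It is only an illustration: the function and type names are hypothetical, and I assume that the leading /keys in the output is the etcd v2 API prefix, so the key itself starts at /vulcand.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// server mirrors the JSON value shown in the tool output.
type server struct {
	Id  string
	URL string
}

// backendServerRule returns the etcd key and JSON value describing one
// backend server. The key layout is read off the tool output; treat it as
// an assumption rather than a reference for the vulcand schema.
func backendServerRule(backend, ip string, port int) (key, value string, err error) {
	id := fmt.Sprintf("%s-%d", ip, port)
	raw, err := json.Marshal(server{Id: id, URL: fmt.Sprintf("http://%s:%d", ip, port)})
	if err != nil {
		return "", "", err
	}
	key = fmt.Sprintf("/vulcand/backends/%s/servers/%s", backend, id)
	return key, string(raw), nil
}

func main() {
	key, value, err := backendServerRule("b1", "192.168.200.139", 80)
	if err != nil {
		panic(err)
	}
	fmt.Println(key)   // /vulcand/backends/b1/servers/192.168.200.139-80
	fmt.Println(value) // {"Id":"192.168.200.139-80","URL":"http://192.168.200.139:80"}
}
```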

Accessing the web application

The web application can now be accessed from the outside world (http://s3pics.ad.forest).

[Screenshot: the web application]

The name of the container where the application is running is displayed in the top-left corner.

Kubernetes automatically uses the Kubernetes pod id to define the hostname of the container, so all the pods use the same hostname.

The Amazon S3 endpoint used to upload and download pictures is displayed in the bottom-left corner and shows that ViPR is used to store the data.

We can now upload a picture.

[Screenshot: uploading a picture]

The code below is sent to the browser by the web application to allow the browser to upload the picture directly to the Object Storage platform.

var files = $("#file")[0].files;
var reader = new FileReader();
reader.onload = function(event){
  var content = event.target.result;
  try {
    $.ajax({
      url: 'http://bucket1.denisnamespace.ns.viprds.ad.forest:9020/pictures/HD-landscape-Photographs.png',
      data: content,
      cache: false,
      processData: false,
      type: 'PUT',
      beforeSend: function (request) {
        request.setRequestHeader('Content-Length','3933288');
        request.setRequestHeader('Content-Type','binary/octet-stream');
        request.setRequestHeader('x-amz-date','Sun, 08 Feb 2015 11:08:33 UTC');
        request.setRequestHeader('host','bucket1.denisnamespace.ns.viprds.ad.forest');
        request.setRequestHeader('Authorization','AWS [email protected]:8rA7sUCFyVspftEFpq/CvL9qtLc=');
      },
      success: function(data, textStatus, request){
        $('#alert-success').html("Picture uploaded").show().delay(5000).fadeOut();
      },
      error: function(data, textStatus, request){
        $('#alert-danger').html("Upload failed").show().delay(5000).fadeOut();
      }
    });
  }
  catch (e) {
    alert(e);
  }
}
reader.readAsArrayBuffer(files[0]);

The fact that the picture is uploaded directly to the Object Storage platform means that the web application is not in the data path. This allows the application to scale without deploying hundreds of instances.
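
The Authorization header in the snippet above has the form AWS <access key>:<signature>, which is the AWS Signature Version 2 layout: the server signs a description of the request with the secret key, so the browser never sees the secret. The following Go sketch shows how such a header could be computed on the server side. It is not the s3pics source; the function names, keys and the canonical resource path (bucket name followed by the object key) are assumptions for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"strings"
)

// stringToSign builds the Signature Version 2 string for a request that
// carries its timestamp in the x-amz-date header.
func stringToSign(verb, contentMD5, contentType, amzDate, resource string) string {
	return strings.Join([]string{
		verb,
		contentMD5,
		contentType,
		"", // the Date line stays empty because x-amz-date is used instead
		"x-amz-date:" + amzDate,
		resource,
	}, "\n")
}

// sign returns the Base64-encoded HMAC-SHA1 of toSign under secretKey.
func sign(secretKey, toSign string) string {
	mac := hmac.New(sha1.New, []byte(secretKey))
	mac.Write([]byte(toSign))
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

// authorizationHeader assembles the value of the Authorization header.
func authorizationHeader(accessKey, signature string) string {
	return "AWS " + accessKey + ":" + signature
}

func main() {
	// Placeholder credentials, not real ones.
	accessKey, secretKey := "example-access-key", "example-secret-key"

	toSign := stringToSign(
		"PUT",
		"", // no Content-MD5 header is sent by the browser snippet
		"binary/octet-stream",
		"Sun, 08 Feb 2015 11:08:33 UTC",
		"/bucket1/pictures/HD-landscape-Photographs.png", // assumed canonical resource
	)
	fmt.Println(authorizationHeader(accessKey, sign(secretKey, toSign)))
}
```

Because the signature covers the verb, the content type, the date and the object path, the browser can only perform the exact upload the application agreed to.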

This web application can also be used to display all the pictures stored in the corresponding Amazon S3 bucket.

[Screenshot: the pictures stored in the bucket]

The URL displayed below each picture shows that the picture is downloaded directly from the Object Storage platform, which again means that the web application is not in the data path.

This is another reason why Object Storage is the de facto standard for web scale applications.

Scaling the kubernetes service

One of the beauties of Kubernetes is how easily it can scale the number of running instances of an application.

# kubecfg resize s3picsStableController 20
I0208 12:11:00.031966   25705 restclient.go:146] Waiting for completion of operation 18
metadata:
  creationTimestamp: 2015-02-08T11:33:35+01:00
  labels:
    environment: production
    name: s3pics
    track: stable
  name: s3picsStableController
  namespace: default
  resourceVersion: "13881583"
  selfLink: /api/v1beta1/replicationControllers/s3picsStableController?namespace=default
  uid: efad202f-af7d-11e4-a49d-005056bf7661
spec:
  replicas: 20
  selector:
    environment: production
    name: s3pics
    track: stable
  template:
    metadata:
      creationTimestamp: null
      labels:
        environment: production
        name: s3pics
        track: stable
    spec:
      containers:
      - command:
        - ./s3pics
        - [email protected]
        - -SecretKey=sREvJOCLvj9ELxtwIvkuANkmE/C5uB18ePBpbflH
        - -EndPoint=http://denisnamespace.ns.viprds.ad.forest:9020
        - -Namespace=denisnamespace
        cpu: 0m
        image: 10.64.231.45:5000/s3pics
        imagePullPolicy: PullIfNotPresent
        livenessProbe:
          httpGet:
            path: /
            port: "8080"
          initialDelaySeconds: 30
        memory: "0"
        name: s3pics
        ports:
        - containerPort: 8080
          protocol: TCP
        workingDir: /s3pics
      dnsPolicy: ClusterFirst
      restartPolicy:
        always: {}
      volumes: null
status:
  replicas: 1

After a few seconds, 20 instances are running.

# kubecfg list pods
Name                                   Image(s)                   Host                Labels                                            Status
----------                             ----------                 ----------          ----------                                        ----------
2a6755f2-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.229.36/       environment=production,name=s3pics,track=stable   Running
2a5d9454-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.229.36/       environment=production,name=s3pics,track=stable   Running
2a646e15-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.229.36/       environment=production,name=s3pics,track=stable   Running
2a65124e-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.45/       environment=production,name=s3pics,track=stable   Running
2a65b059-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.229.36/       environment=production,name=s3pics,track=stable   Running
2a665d4d-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.229.36/       environment=production,name=s3pics,track=stable   Running
2a669e8e-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.45/       environment=production,name=s3pics,track=stable   Running
2a66fa48-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.25/       environment=production,name=s3pics,track=stable   Running
2a6590c5-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.229.36/       environment=production,name=s3pics,track=stable   Running
2a663b29-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.25/       environment=production,name=s3pics,track=stable   Running
2a66bec2-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.25/       environment=production,name=s3pics,track=stable   Running
2a66dbcc-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.45/       environment=production,name=s3pics,track=stable   Running
2a673714-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.45/       environment=production,name=s3pics,track=stable   Running
f3faf912-af7d-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.25/       environment=production,name=s3pics,track=stable   Running
2a5df8b2-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.45/       environment=production,name=s3pics,track=stable   Running
2a6534a8-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.25/       environment=production,name=s3pics,track=stable   Running
2a656318-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.45/       environment=production,name=s3pics,track=stable   Running
2a667a8a-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.45/       environment=production,name=s3pics,track=stable   Running
2a65d73c-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.231.25/       environment=production,name=s3pics,track=stable   Running
2a6719c0-af83-11e4-a49d-005056bf7661   10.64.231.45:5000/s3pics   10.64.229.36/       environment=production,name=s3pics,track=stable   Running

I don’t need to run my tool again to update the vulcand rules because the Kubernetes service natively manages the load balancing between the pods.
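
This post doesn’t go into how the Kubernetes service spreads requests across the pods. Purely as an illustration of the idea, here is a toy round-robin picker in Go; the strategy, the type name and the pod addresses are assumptions, not a description of the Kubernetes proxy.

```go
package main

import "fmt"

// roundRobin hands out endpoints one after the other and wraps around.
type roundRobin struct {
	endpoints []string
	next      int
}

// pick returns the next endpoint, or an empty string if there is none.
func (r *roundRobin) pick() string {
	if len(r.endpoints) == 0 {
		return ""
	}
	endpoint := r.endpoints[r.next]
	r.next = (r.next + 1) % len(r.endpoints)
	return endpoint
}

func main() {
	// Hypothetical pod addresses behind the single service IP.
	rr := &roundRobin{endpoints: []string{"pod-a:8080", "pod-b:8080", "pod-c:8080"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick())
	}
}
```

The point is that vulcand only needs to know the single service IP (192.168.200.139 here), whatever the number of pods behind it.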