The intro

What is Kaniko?

Kaniko is simply a Docker image builder. You may not be satisfied with this introduction, but this is what Kaniko is all about: it builds images and pushes them to a registry such as Docker Hub, ECR, ACR, or your own container image registry.

How does it work?

It’s super easy to use. Kaniko first extracts the base image’s file system. It then runs each instruction in the Dockerfile (COPY, ADD, RUN) against that file system. After each instruction it takes a snapshot of the current state of the file system, appends the new or changed files as a layer on top of the base image, and updates the image metadata. Quite useful, no? Not until you learn how to use it.

Use cases

Kaniko builds Docker images inside a container or a Kubernetes cluster. If you face any of these issues, this article will help you mitigate them:

If you build images on a shared machine or runner, Kaniko may help you reduce the risk of security breaches. There are several ways to build images: bash scripts, Ansible playbooks, or a script in your CI/CD provider that runs something like docker build -t blahblahblah:v0.0.1 -f SomeSortOfDockerfile. You can run this command from your shell or from a container (Docker in Docker). Neither is suitable for shared environments. Imagine another user, developer, or pipeline on the same machine while your build handles source code, reads critical data from a database, or fetches critical credentials (or keys) from another source. This could lead to a disaster: your code could be read by others, your database data could be leaked, or someone could steal your keys. This is the worst nightmare for a security and operations team.
Reduced build time is another feature you may be interested in. Long-running builds on servers (runners) are a recurring problem for DevOps teams. You may have a bad day if your codebase is huge and it pulls many dependencies from the internet. No one (I mean the DevOps or SRE teams) wants to get another ticket from the Dev team saying “our build pipelines are slow, we are in a rush, and hotfixes should be merged into production ASAP”.
Perhaps you have a limited budget for infrastructure, so you rely completely on your Kubernetes cluster for everything from the database to the environment for building Docker images. (Pro tip: do not deploy everything on Kubernetes. I will write about it in the future. Databases can run on Kubernetes, but don’t do that.)

We have Kaniko to fix these issues. Although it’s not a superhero, it can make us the company’s superhero.

The issue

What we had

I will use this opportunity to share how our CI/CD infrastructure at Alibaba Travels was deployed and what we did to make it more reliable. It was an old infrastructure that needed to be renewed or restructured. So far we have migrated several projects to the new structure in staging, dev, and the other non-production environments, but many more remain.

Returning to the main topic, we have projects on different stacks, everything from .NET Core to Node.js and Python. Previously, all projects used shared runners on GitLab CI. Several shell and Docker runners are installed on one beefy server (to reduce the risk of failures they are still available, but we are planning to migrate them). Projects that build with Docker need access to the Docker socket, so any pipeline on that runner can see the other containers and even get shell access to them.

The danger zone

This could expose data and maybe source code. The shell runners are no better: jobs get shell access to the runner host and can do whatever the gitlab-runner user is allowed to do. Several reasons led us to move our runners to Kubernetes: better control of each project’s resources, better control of the network, isolation between containers, and more.

A step further

After we created several Kubernetes runners, everything worked fine, but another issue appeared. Mounting the host’s Docker socket into the build container is not an option. It is too risky, because the container gains access to every container on the host. Some of you might say that’s OK, we can dedicate machines to builds (which is a costly solution). But can you safely mount a single Docker socket into multiple containers? Additionally, you have to loosen the security context, which creates more holes in the system.

This is the disaster.

The Solution

What we need

Kaniko helps us close these gaps. After we first adopted Kaniko on the Kubernetes runner, building a Docker image for a .NET application took 21 minutes; without the Kubernetes runner, the same build takes three and a half minutes. For a Node.js application, the build time rose to 45 minutes, compared with 11 minutes on our shared runners.

What we do

We wanted our builds to be more reliable, but what we had now took much longer.

Caching

Caching helped me fix it. Caching must be enabled first with the --cache=true flag. After that, there are two choices for where to cache:

cache into a local directory (--cache-dir flag)
cache into a specified repository (--cache-repo flag)

Since we have plenty of resources on our registry, we decided to cache layers in a repository.

Note 1: COPY instructions are cached in addition to RUN instructions when you enable the --cache-copy-layers=true flag. Note 2: Both caching methods only take effect when you set --cache=true.
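As a rough sketch of how these flags fit together, the snippet below assembles a Kaniko command line with repository caching enabled and prints it instead of running it. The function name build_kaniko_args, the host registry.example.com, and the image names are made up for illustration; only the flags themselves come from the text above.

```shell
#!/usr/bin/env bash
# Sketch only: assembles a cached Kaniko invocation and prints it (dry run).
# build_kaniko_args, registry.example.com and the image names are hypothetical.

build_kaniko_args() {
  local context_dir=$1 destination=$2 cache_repo=$3
  local args=(/kaniko/executor)

  # Where the source lives and where the final image is pushed.
  args+=(--context "$context_dir" --dockerfile "$context_dir/Dockerfile")
  args+=(--destination "$destination")

  # --cache=true switches caching on; the other cache flags depend on it.
  args+=(--cache=true)
  # Store cached layers in a repository instead of a local directory.
  args+=("--cache-repo=$cache_repo")
  # Also cache the layers produced by COPY instructions.
  args+=(--cache-copy-layers=true)

  # One argument per line, so the result is easy to inspect or log.
  printf '%s\n' "${args[@]}"
}

kaniko_command=$(build_kaniko_args /workspace/app \
  registry.example.com/team/app:1.0.0 \
  registry.example.com/team/app/cache)
echo "$kaniko_command"
```

Printing the arguments first makes it easy to review the flags in a pipeline log before wiring the real executor call into a job.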

Dockerfile

Check out this Dockerfile. Everything seems to be okay, and the developer is happy with it.

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build-env
RUN apt update && apt install libgnutls30 && apt install ca-certificates && update-ca-certificates
WORKDIR /app
ENV ASPNETCORE_ENVIRONMENT=Production
ENV ASPNETCORE_URLS=http://+:80
COPY . ./
RUN dotnet restore   src/app/app.csproj --configfile src/app/nuget.config
RUN dotnet publish src/app/app.csproj -c Release -o /app/out
FROM mcr.microsoft.com/dotnet/sdk:6.0
WORKDIR /app
COPY --from=build-env /app/out .
EXPOSE 80
ENTRYPOINT ["dotnet","app.dll"]

It’s pretty normal, but I think this is another disaster. Let me explain:

The single COPY instruction should be divided into smaller parts, so that a change in one module does not invalidate the cached layers of all the others. In this case I changed it to copy each module of the application separately.
The two RUN instructions that restore and publish the project can be merged.
The apt command does not delete its cached packages. This is not important in this situation, but we will do it anyway.

Reducing the number of lines in the Dockerfile does not make Kaniko faster. Don’t be afraid to write more lines. Here is what I changed it to:

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build-env

WORKDIR /app
ENV ASPNETCORE_ENVIRONMENT=Production
ENV ASPNETCORE_URLS=http://+:80

COPY src/Module1 ./src/Module1
COPY src/Module2 ./src/Module2
COPY src/app ./src/app

COPY tests/app.Test ./tests/app.Test
COPY tests/app.integrity.Test ./tests/app.integrity.Test

COPY app.sln ./
RUN dotnet publish --configfile src/app/nuget.config src/app/app.csproj -c Release -o /app/out --nologo -verbosity:quiet

FROM mcr.microsoft.com/dotnet/sdk:6.0

WORKDIR /app
COPY --from=build-env /app/out .

EXPOSE 80

ENTRYPOINT ["dotnet","app.dll"]

I used the same method for other projects as well.

Snapshot

At the beginning of this post, I mentioned that snapshots are how Kaniko stores image layers and their state. With --single-snapshot=true, Kaniko takes only one snapshot at the end of the build, which is super fast but not cache friendly. Alternatively, you can use --snapshotMode to choose how file changes are detected. I copied the descriptions of the --snapshotMode values from the official documentation to show how it works.

Mode	Result
full	The full file contents and metadata are considered when snapshotting. This is the least performant option, but also the most robust.
redo	The file mtime, size, mode, owner uid and gid will be considered when snapshotting. This may be up to 50% faster than “full”, particularly if your project has a large number of files.
time	Only file mtime will be considered when snapshotting.

For some reason, I chose redo.
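To make the difference between the modes more concrete, here is a small illustration of what each one looks at. It is not Kaniko's implementation, only a sketch that builds a fingerprint from the fields listed in the table. The function names are hypothetical, and the script assumes GNU coreutils (stat -c, sha256sum, touch -r).

```shell
#!/usr/bin/env bash
# Illustration only, NOT Kaniko's code. Assumes GNU coreutils.
# All function names below are hypothetical.

# "time": only the modification time.
fingerprint_time() { stat -c '%Y' "$1"; }

# "redo": mtime, size, mode, owner uid and gid.
fingerprint_redo() { stat -c '%Y %s %a %u %g' "$1"; }

# "full": a hash of the file contents plus the same metadata.
fingerprint_full() {
  printf '%s %s\n' "$(sha256sum < "$1" | cut -d' ' -f1)" "$(fingerprint_redo "$1")"
}

demo_file=$(mktemp)
reference=$(mktemp)
printf 'aaaa' > "$demo_file"
touch -r "$demo_file" "$reference"     # remember the original mtime

before_redo=$(fingerprint_redo "$demo_file")
before_full=$(fingerprint_full "$demo_file")

printf 'bbbb' > "$demo_file"           # same size, different contents
touch -r "$reference" "$demo_file"     # put the old mtime back

after_redo=$(fingerprint_redo "$demo_file")
after_full=$(fingerprint_full "$demo_file")
rm -f "$reference"

[ "$before_redo" = "$after_redo" ] && echo "metadata-only fingerprint: unchanged"
[ "$before_full" != "$after_full" ] && echo "content fingerprint: changed"
```

A change that keeps the size and restores the mtime is invisible to the metadata-only fingerprints, which is the trade-off behind “full” being the most robust and the slowest option. Real builds rarely rewrite files this way, which is why the faster modes are usually acceptable.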

Result

The first build took a little longer than the original docker build command, which is understandable because Kaniko has to create the caches. However, after making a few changes to the modules and files, the results were stunning. Here’s a look at the table (times in minutes:seconds):

Project type	docker build	Kaniko without caching	Kaniko with caching (first run)	Kaniko with caching (second run)
.NET Core	3:10	14:25	3:40	0:27
Node.js	11:45	45:20	13:10	2:10
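To put the table into perspective, the following sketch converts the minutes:seconds values to seconds and computes how much faster the second cached Kaniko run is than plain docker build. The helper names to_seconds and speedup are made up for this example; the input figures are taken from the table above.

```shell
#!/usr/bin/env bash
# Sketch: speedup of the cached Kaniko run over docker build.
# to_seconds and speedup are hypothetical helper names.

# Convert "m:ss" into seconds (10# avoids octal parsing of values like 08).
to_seconds() {
  local minutes=${1%%:*} seconds=${1##*:}
  echo $(( 10#$minutes * 60 + 10#$seconds ))
}

# Ratio of two durations, rounded to one decimal place.
speedup() {
  LC_ALL=C awk -v base="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", base / fast }'
}

dotnet_speedup=$(speedup 3:10 0:27)    # docker build vs. second cached run
node_speedup=$(speedup 11:45 2:10)

echo ".NET Core: ${dotnet_speedup}x faster"
echo "Node.js:   ${node_speedup}x faster"
```

With the numbers above this prints 7.0x for .NET Core (190 s against 27 s) and 5.4x for Node.js (705 s against 130 s).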

Using Kaniko

There are four things you need to use Kaniko:

A Dockerfile and your source code (together called the build context), along with a little tweak to your Dockerfile
A registry that you push to
Kaniko
The Kaniko documentation
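Pushing to a registry usually requires credentials. As far as I know, the Kaniko executor image reads them from a Docker-style config.json under /kaniko/.docker/, so a pipeline typically writes that file before calling the executor. The sketch below is one way to generate it. The function name write_registry_auth, the host registry.example.com, and the user and password values are all made up for illustration, and the demo writes to a temporary directory rather than /kaniko/.docker.

```shell
#!/usr/bin/env bash
# Sketch only: writes a Docker-style config.json with registry credentials.
# write_registry_auth and every example value below are hypothetical.

write_registry_auth() {
  local registry=$1 user=$2 password=$3 config_dir=$4
  local token
  # Basic-auth token: base64("user:password"), with line wraps removed.
  token=$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')
  mkdir -p "$config_dir"
  # Note: values containing double quotes or backslashes would need JSON escaping.
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$token" \
    > "$config_dir/config.json"
}

# In a real job the last argument would be /kaniko/.docker.
auth_dir=$(mktemp -d)
write_registry_auth registry.example.com ci-user s3cret "$auth_dir"
cat "$auth_dir/config.json"
```

In a real pipeline the user and password would come from protected CI variables instead of literals in the script.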

My Method

Here is the complete command I use:

        /kaniko/executor \
          --context "$projectDirectory" \
          --dockerfile "$projectDirectory/Dockerfile" \
          --destination "imageTag:version" \
          --cache-copy-layers=true \
          --snapshotMode=time \
          --use-new-run \
          --cache=true \
          --cache-repo="imageTag:cache"

Conclusion

Notes

Notes on Production

This was applied to staging, development, and the other non-production environments. It may take up to six months to fully adopt this method and its tools. Therefore, if you wish to implement this, make sure you have tested everything and that no complex issues, such as security issues, can appear and harm your production. Although Kaniko is a stable tool, use it at your own risk.

Other notes

The goal of this post is to help you improve the performance and security of your Docker build pipelines. Please feel free to leave any kind of comment. Please let me know if there is any conflict or issue :)

Create Docker images secure and fast with Kaniko

Tests and results of it on building docker images using Kaniko and its effect on pipeline security and speed