Containerization
Deep inside user space and linux process management
But how is Docker is even implemented? What means to spawn a container?
Go: minimal container
Create the following program in a new folder
// container/main.go
package main
import (
"fmt"
"os"
"os/exec"
"syscall"
)
const ROOT_DIR = "./fs"
func run() {
fmt.Printf("running %v as %d\n", os.Args[2:], os.Getpid())
cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{
Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
}
cmd.Run()
}
func child() {
fmt.Printf("running %v as %d\n", os.Args[2:], os.Getpid())
syscall.Sethostname([]byte("container"))
// Change root to fs directory
must(syscall.Chroot(ROOT_DIR))
must(syscall.Chdir("/"))
// Mount /proc to see PID namespace info
must(syscall.Mount("proc", "/proc", "proc", 0, ""))
cmd := exec.Command(os.Args[2], os.Args[3:]...)
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.Env = append(os.Environ(), "PATH=/bin")
must(cmd.Run())
// Unmount the proc directory after container shuts down
must(syscall.Unmount("/proc", 0))
}
func must(err error) {
if err != nil { panic(err) }
}
func main() {
switch os.Args[1] {
case "run":
run()
case "child":
child()
default:
panic("bad command")
}
}
How does this work
Consider the following flowchart, it is the explanation for the code snippet given directly above
flowchart TD
A[START] --> B{os.Args}
B -- run --> C[run]
B -- child --> D[child]
B -- other --> E[panic]
C --> F[exec /proc/self/exe with args]
F --> D
D --> G[sethostname]
G --> H[chroot and chdir]
H --> J[mount]
J --> K[exec target command inside container]
K --> M[wait for exit and unmount]
M --> N[exit child process]
E --> O[exit main process]
Minimal filesystem
We will spawn a container rooted inside the ./fs
directory, for this we need to setup fs
. Create the following script outside fs
#!/bin/bash
# container/make-rootfs.sh
# 1. Make sure the fs tree exists
mkdir -p fs/{bin,proc}
# 2. Download a real binary
if [ ! -f ./fs/bin/busybox ]; then
curl -L https://busybox.net/downloads/binaries/1.35.0-x86_64-linux-musl/busybox -o fs/bin/busybox
fi
# (Check BusyBox site for the latest x86_64 build if the above link 404s)
# 3. Make it executable and symlink sh
chmod +x fs/bin/busybox
for app in sh ls cat echo ps; do
if [ ! -f ./fs/bin/$app ]; then
ln -s busybox fs/bin/$app
fi
done
Spin up container
This will spawn a child process which execs the program given by the user (we will try to invoke ./fs/bin/sh
).
- run the script defined above:
sudo ./make-rootfs.sh
- build the container binary:
go build -o container main.go
- run the container:
sudo ./container run /bin/sh
$ sudo ./container run /bin/sh
running [/bin/sh] as 1079
running [/bin/sh] as 1
/ # ls -a
. .. bin proc
/ # ps
PID USER TIME COMMAND
1 0 0:00 /proc/self/exe child /bin/sh
7 0 0:00 /bin/sh
9 0 0:00 ps
/ # exit