Sometimes, issues can only be fixed in a layer below the abstractions you usually rely on. Understanding as much of the whole stack as possible is a great advantage when a bug pops up. For this reason, I decided to implement a simple Linux syscall to become familiar with the Linux kernel's code organization.

That said, I don't aspire to become a kernel hacker. The new syscall printk simply prints a given null-terminated message into the kernel log. All code changes referenced in this post can be found in this Pull Request.

Clone the kernel code

git clone --branch v5.8 --single-branch --no-tags https://github.com/torvalds/linux.git
cd linux
git checkout -b add-printk-syscall

Note that the --branch option also accepts tags such as v5.8, not only branch names.

Preparing the development environment

We will test our custom kernel in a VM to protect our host machine's stability. In my case, the host runs Ubuntu 20.04. The modified kernel will be tested in a VM running Ubuntu 20.04 ("focal"), matching the release=focal option used when fetching the image and creating the VM.

Install the packages needed to compile the kernel. More information about the packages required for compilation can be found on kernel.org.

sudo apt-get install git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache libncurses-dev bison flex gcc make vim

Install virtualization software for the VM:

sudo apt-get install -y qemu-kvm uvtool-libvirt

Fetch an Ubuntu cloud image, a compact, pre-installed disk image. With this, we don't have to run the OS installer:

uvt-simplestreams-libvirt sync --source https://cloud-images.ubuntu.com/daily release=focal arch=amd64

Create the VM:

uvt-kvm create kerneltest arch=amd64 release=focal --memory 4096 --cpu 2 --disk 15 --unsafe-caching

Install ccache to greatly reduce build times after the first build. Making changes iteratively and playing around with the code is a lot faster with ccache.

sudo apt install -y ccache

You can check the number of cache misses and hits with "ccache -s".

Implementing the syscall

This is the code of the syscall itself (kernel/sys_printk.c):

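#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>

SYSCALL_DEFINE1(printk, const char __user *, msg)
{
    char buf[256];
    /* Copy the NUL-terminated message from user space into kernel memory */
    long copied = strncpy_from_user(buf, msg, sizeof(buf));

    if (copied < 0)
        return copied;  /* -EFAULT: msg points to inaccessible memory */
    if (copied == sizeof(buf))
        return -EINVAL; /* message too long, buf is not NUL-terminated */

    printk(KERN_INFO "Msg: \"%s\"\n", buf);
    return 0;
}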

The syscall is defined with the SYSCALL_DEFINE1 macro because it takes one argument. The macro's first argument is the syscall name. The second and third macro arguments describe the type and name of the syscall's first argument.

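The kernel must never dereference a user-space pointer directly: the pointer may be invalid, point into kernel memory, or reference a page that is not currently mapped. Therefore, we copy the message passed to the syscall into a kernel buffer with "strncpy_from_user", which validates the address and returns -EFAULT instead of crashing if the access faults. On success it returns the length of the copied string; if no terminating NUL fits into the buffer, it returns the buffer size, which is why the syscall rejects that case.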

Next, we hook the new file into the build system by adding sys_printk.o to the obj-y list in kernel/Makefile:

# SPDX-License-Identifier: GPL-2.0
#
# Makefile for the linux kernel.
#

obj-y     = fork.o exec_domain.o panic.o \
	    cpu.o exit.o softirq.o resource.o \
	    sysctl.o sysctl_binary.o capability.o ptrace.o user.o \
	    signal.o sys.o umh.o workqueue.o pid.o task_work.o \
	    extable.o params.o \
	    kthread.o sys_ni.o nsproxy.o \
	    notifier.o ksysfs.o cred.o reboot.o \
	    async.o range.o smpboot.o ucount.o sys_printk.o
...

Extend the system call table

Add an entry for the printk syscall to arch/x86/entry/syscalls/syscall_64.tbl:

...
434     common  pidfd_open              sys_pidfd_open
435     common  clone3                  sys_clone3
437     common  openat2                 sys_openat2
438     common  pidfd_getfd             sys_pidfd_getfd
439     common  faccessat2              sys_faccessat2
440     64      printk                  sys_printk

#
# x32-specific system call numbers start at 512 to avoid cache impact
# for native 64-bit operation. The __x32_compat_sys stubs are created
# on-the-fly for compat_sys_*() compatibility system calls if X86_X32
# is defined.
#
512     x32     rt_sigaction            compat_sys_rt_sigaction
513     x32     rt_sigreturn            compat_sys_x32_rt_sigreturn
514     x32     ioctl                   compat_sys_ioctl
...

Extend the syscall header file

Add the following declaration to the syscall header file include/linux/syscalls.h:

/* kernel/sys_printk.c */
asmlinkage long sys_printk(const char __user *msg);

Build the kernel

Create the kernel configuration with the following command. It lets you select drivers and algorithms and customize the kernel with a huge number of options.

make menuconfig

Next, we build the kernel and package it as a Debian package, which we can easily install in the VM.

# Disable debug info to reduce compile time and required disk space
scripts/config --disable DEBUG_INFO
make -j $(nproc) bindeb-pkg CC="ccache gcc"

The -j (--jobs) option specifies how many Makefile recipes can be executed in parallel. A sensible value reduces the compile time a lot. I set the number of jobs to the output of nproc, the number of CPU cores available on my laptop (including hyper-threads). In my case, nproc prints 12.

The CC variable sets the C compiler. Prefixing gcc with ccache caches compilation results, which speeds up subsequent builds.

Installing the kernel in the VM

Copy the Debian packages into the VM:

scp ../linux-*.deb ubuntu@$(uvt-kvm ip kerneltest):~

Install the Debian packages in the VM:

uvt-kvm ssh kerneltest -- sudo dpkg -i /home/ubuntu/*.deb

Reboot the VM:

uvt-kvm ssh kerneltest -- sudo reboot

Check that the new kernel is running:

# Wait until the VM has finished booting after the reboot
uvt-kvm wait kerneltest

# Returns 5.8.0
uvt-kvm ssh kerneltest -- uname -r

Testing the syscall

For testing purposes, we will create a small test program that uses our new syscall. Since libc has no wrapper function for the new printk syscall, we call it through the generic "syscall" function. Its arguments are the syscall number from the syscall table, followed by the arguments of the syscall itself.

#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <errno.h>

/* Number of our new syscall, as assigned in syscall_64.tbl */
#define __NR_printk 440

int main(int argc, char *argv[]) {
  if (argc != 2) {
    fprintf(stderr, "Please provide a msg (with quotes if multiple words)\n");
    return 1;
  }
  const char *msg = argv[1];
  /* syscall() returns -1 and sets errno on failure */
  long result = syscall(__NR_printk, msg);
  if (result == -1) {
    fprintf(stderr, "syscall failed, errno = %d\n", errno);
    return 1;
  }
  fprintf(stdout, "syscall succeeded, run \"dmesg\" to see the message in the kernel logs\n");
  return 0;
}

Compile the test program "syscall-printk-test" with "make". First, we run the program on our development host to verify that it fails, since the host's kernel doesn't have the new syscall.

make
./syscall-printk-test "hello world"
syscall failed, errno = 38

Let's check the meaning of errno 38.

sudo apt-get install moreutils
errno 38
ENOSYS 38 Function not implemented

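As expected, the test program fails with ENOSYS because the host kernel doesn't have a syscall with number 440. Note that on kernels 5.10 and newer, number 440 is assigned to process_madvise, so this check only behaves this way on older host kernels.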

Next, we run the test program in our VM with the custom kernel.

# Move binary into VM
scp syscall-printk-test ubuntu@$(uvt-kvm ip kerneltest):~

# Run the test program
uvt-kvm ssh kerneltest -- "~/syscall-printk-test 'hello kernel'"
syscall succeeded, run "dmesg" to see the message in the kernel logs

# Check the kernel log in the VM
uvt-kvm ssh kerneltest -- dmesg

...
[   11.495005] audit: type=1400 audit(1599501627.676:39): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.hook.remove" pid=838 comm="apparmor_parser"
[   55.784973] Msg: "hello kernel"

The message was written to the kernel log successfully.

Conclusion

As usual, taking a look under the hood of a technology demystifies it: there is no magic involved. Building and installing the Linux kernel was a lot simpler than I expected. Modifying the kernel was slightly trickier because of all the historical cruft. Surely, having a basic understanding of the kernel's source code organization will come in handy in the future.