Sometimes, issues can only be fixed in a layer below the abstractions you usually rely on. Understanding as much of the stack as possible is a great advantage when a bug pops up. For this reason, I decided to implement a simple Linux syscall to become familiar with the Linux kernel's code organization.
That said, I don't aspire to become a kernel hacker. The new syscall, printk, simply prints a given null-terminated message to the kernel log. All code changes referenced in this post can be found in this Pull Request.
Clone the kernel code
git clone --branch v5.8 --single-branch --no-tags https://github.com/torvalds/linux.git
cd linux
git checkout -b add-printk-syscall
Note that the --branch option accepts tag names as well as branch names; here it checks out the v5.8 tag.
Preparing the development environment
We will test our custom kernel in a VM to protect our host machine's stability. In my case, the host runs Ubuntu 20.04. The modified kernel will be tested in a VM running Ubuntu 20.04 (focal), the release used in the commands that follow.
Install the packages needed to compile the kernel. More information about the packages required for compilation can be found on kernel.org.
sudo apt-get install git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache libncurses-dev bison flex gcc make vim
Install virtualization software for the VM:
sudo apt-get install -y qemu-kvm uvtool-libvirt
Fetch an Ubuntu cloud image, a compact, pre-installed disk image. With it, we don't have to run the OS installer:
uvt-simplestreams-libvirt sync --source https://cloud-images.ubuntu.com/daily release=focal arch=amd64
Create the VM:
uvt-kvm create kerneltest arch=amd64 release=focal --memory 4096 --cpu 2 --disk 15 --unsafe-caching
Install ccache to heavily reduce build times after the first build. With ccache, making changes iteratively and playing around with the code is a lot faster.
sudo apt-get install -y ccache
You can check the number of cache misses and hits with "ccache -s".
Implementing the syscall
This is the code of the syscall itself (kernel/sys_printk.c):
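#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>

SYSCALL_DEFINE1(printk, const char __user *, msg)
{
	char buf[256];
	/* Copy the message from user space; returns its length or a negative error */
	long copied = strncpy_from_user(buf, msg, sizeof(buf));

	/* Reject faults and messages that do not fit, including the terminator */
	if (copied < 0 || copied == sizeof(buf))
		return -EFAULT;

	printk(KERN_INFO "Msg: \"%s\"\n", buf);
	return 0;
}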
The syscall is defined with the SYSCALL_DEFINE1 macro because it takes one argument. The macro's first argument is the syscall name. The second and third macro arguments describe the type and name of the syscall's first argument.
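To make the argument pairing more concrete, here is a minimal userspace sketch of a macro that glues a name, a type and an argument name into a function definition. DEMO_SYSCALL_DEFINE1 and sys_demo_strlen are hypothetical names invented for this illustration; the real SYSCALL_DEFINE1 in include/linux/syscalls.h does considerably more than this.

```c
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the kernel's SYSCALL_DEFINE1.
 * It only glues the pieces together: long sys_<name>(<type> <arg>).
 * This is an illustration, not the kernel's actual expansion.
 */
#define DEMO_SYSCALL_DEFINE1(name, type1, arg1) \
    long sys_##name(type1 arg1)

/* Expands to: long sys_demo_strlen(const char * msg) */
DEMO_SYSCALL_DEFINE1(demo_strlen, const char *, msg)
{
    return (long)strlen(msg);
}
```

Calling sys_demo_strlen("hello") in this sketch returns the length of the string, which shows that the type and name passed to the macro ended up as an ordinary function parameter.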
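Dereferencing a user space pointer directly in the kernel is unsafe: the pointer may be invalid or may point at memory the caller is not allowed to access. Therefore, we copy the message passed as an argument to the syscall into a kernel buffer with "strncpy_from_user", which checks the access and returns an error instead of faulting.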
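The length check in the syscall relies on the return value of "strncpy_from_user": the string length without the terminator on success, or the full count when no terminator was found in time. The following is a userspace model of that behavior only, under the assumption that the source pointer is readable. model_strncpy and copy_message are hypothetical helpers, and the -EFAULT case for an unreadable pointer cannot be reproduced in plain userspace C.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userspace model of the return value of strncpy_from_user():
 * the string length without the NUL on success, or `count` when no
 * NUL was found within the first `count` bytes.
 */
long model_strncpy(char *dst, const char *src, long count)
{
    long i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;       /* terminator copied, length excludes it */
    }
    return count;           /* buffer full, dst is NOT NUL-terminated */
}

/* Mirrors the check in the syscall: reject errors and truncated input. */
long copy_message(char *buf, size_t size, const char *msg)
{
    long copied = model_strncpy(buf, msg, (long)size);

    if (copied < 0 || (size_t)copied == size)
        return -EFAULT;
    return copied;
}
```

With an 8-byte buffer, a 7-character message fits together with its terminator, while an 8-character message fills the buffer without one and is rejected.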
Next, we hook the new file into the build system in kernel/Makefile:
# SPDX-License-Identifier: GPL-2.0
#
# Makefile for the linux kernel.
#
obj-y = fork.o exec_domain.o panic.o \
cpu.o exit.o softirq.o resource.o \
sysctl.o sysctl_binary.o capability.o ptrace.o user.o \
signal.o sys.o umh.o workqueue.o pid.o task_work.o \
extable.o params.o \
kthread.o sys_ni.o nsproxy.o \
notifier.o ksysfs.o cred.o reboot.o \
async.o range.o smpboot.o ucount.o sys_printk.o
...
Extend the system call table
Add an entry for the printk syscall to arch/x86/entry/syscalls/syscall_64.tbl:
...
434 common pidfd_open sys_pidfd_open
435 common clone3 sys_clone3
437 common openat2 sys_openat2
438 common pidfd_getfd sys_pidfd_getfd
439 common faccessat2 sys_faccessat2
440 64 printk sys_printk
#
# x32-specific system call numbers start at 512 to avoid cache impact
# for native 64-bit operation. The __x32_compat_sys stubs are created
# on-the-fly for compat_sys_*() compatibility system calls if X86_X32
# is defined.
#
512 x32 rt_sigaction compat_sys_rt_sigaction
513 x32 rt_sigreturn compat_sys_x32_rt_sigreturn
514 x32 ioctl compat_sys_ioctl
...
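Conceptually, the table maps a syscall number to the function that implements it, and numbers without an entry end up reporting "not implemented". The sketch below models that idea in userspace with a sparse array of function pointers. All names (demo_table, demo_dispatch, the two handlers) are hypothetical, and the kernel's generated table and entry code differ in detail.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*demo_syscall_fn)(long arg);

/* Hypothetical handlers standing in for real syscall implementations. */
long demo_sys_double(long arg) { return arg * 2; }
long demo_sys_negate(long arg) { return -arg; }

/* Sparse table indexed by number; unassigned slots stay NULL. */
static const demo_syscall_fn demo_table[] = {
    [0] = demo_sys_double,
    [2] = demo_sys_negate,   /* slot 1 is intentionally left empty */
};

#define DEMO_TABLE_SIZE (sizeof(demo_table) / sizeof(demo_table[0]))

/* Look up the handler; unknown or unassigned numbers yield -ENOSYS. */
long demo_dispatch(long nr, long arg)
{
    if (nr < 0 || (size_t)nr >= DEMO_TABLE_SIZE || demo_table[nr] == NULL)
        return -ENOSYS;
    return demo_table[nr](arg);
}
```

In this model, demo_dispatch(440, 1) returns -ENOSYS because nothing is registered under that number, which is the situation on a kernel that lacks the new table entry.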
Extend the syscall header file
Add the following lines to the syscall header file include/linux/syscalls.h:
/* kernel/sys_printk.c */
asmlinkage long sys_printk(const char __user *msg);
Build the kernel
Create the kernel configuration with the following command. You can pick drivers, algorithms and customize the kernel with a huge number of options.
make menuconfig
Next, we build the kernel and create Debian packages containing it. The packages are easy to install in the VM.
# Disable debug info to reduce compile time and required disk space
scripts/config --disable DEBUG_INFO
make -j "$(nproc)" bindeb-pkg CC="ccache gcc"
The -j or --jobs option specifies how many makefile recipes can be executed in parallel. A sensible value reduces the compile time a lot. I set the number of jobs to the output of "nproc", the number of CPU cores in my laptop (including hyper-threads). In my case, "nproc" prints 12.
The CC variable stands for "C compiler". We use ccache because it caches compilation artifacts, which speeds up subsequent builds.
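For reference, a similar processor count can be queried from C. This is a minimal sketch that assumes the C library supports the _SC_NPROCESSORS_ONLN extension (glibc does); online_cpus is a hypothetical helper name. The result can differ from "nproc" when the process is restricted to a subset of CPUs.

```c
#include <unistd.h>

/* Number of processors currently online, or 1 if the query fails. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? n : 1;
}
```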
Installing the kernel in the VM
Move the Debian packages into the VM:
scp ../linux-*.deb ubuntu@$(uvt-kvm ip kerneltest):~
Install the Debian packages in the VM:
uvt-kvm ssh kerneltest -- sudo dpkg -i /home/ubuntu/*.deb
Reboot the VM:
uvt-kvm ssh kerneltest -- sudo reboot
Check that the new kernel is running:
# Wait until the VM is reachable again after the reboot
uvt-kvm wait kerneltest
# Returns 5.8.0
uvt-kvm ssh kerneltest -- uname -r
Testing the syscall
For testing purposes, we create a small test program that uses the new syscall. There is no libc wrapper function for printk, so we call it through the generic "syscall" function. Its arguments are the syscall number specified in the syscall table, followed by the arguments for the syscall itself.
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <errno.h>

/* Must match the number added to syscall_64.tbl */
#define __NR_printk 440

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Please provide a msg (with quotes if multiple words)\n");
        return 1;
    }
    char *msg = argv[1];
    /* syscall() returns -1 and sets errno on failure */
    long result = syscall(__NR_printk, msg);
    if (result == -1) {
        fprintf(stderr, "syscall failed, errno = %d\n", errno);
        return 1;
    }
    fprintf(stdout, "syscall succeeded, run \"dmesg\" to see the message in the kernel logs\n");
    return 0;
}
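The "syscall" function is not specific to new syscalls. As a small sketch, the same mechanism can invoke syscalls that already have libc wrappers, using the SYS_* constants from sys/syscall.h. raw_getpid and raw_write are hypothetical helper names, and the example assumes a Linux system with default compiler settings.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke getpid by number instead of through its libc wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Syscall arguments simply follow the syscall number. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

raw_getpid() should return the same value as getpid(), since both end up in the same kernel function.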
Compile the test program "syscall-printk-test" with "make". We first run the program on our development host to verify that it fails, because the host kernel doesn't have the new syscall.
make
./syscall-printk-test "hello world"
syscall failed, errno = 38
Let's check the meaning of errno 38.
sudo apt-get install errno
errno 38
ENOSYS 38 Function not implemented
As expected, the test program fails because our kernel doesn't have a syscall with number 440.
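The same lookup can be done from C without installing an extra tool. This is a small sketch built on the standard strerror function; format_errno is a hypothetical helper that only knows the names of a few errno values chosen for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format "NAME NUMBER DESCRIPTION" for a handful of errno values. */
int format_errno(int err, char *out, size_t size)
{
    const char *name = "UNKNOWN";

    switch (err) {
    case ENOSYS: name = "ENOSYS"; break;
    case EFAULT: name = "EFAULT"; break;
    case EINVAL: name = "EINVAL"; break;
    }
    /* strerror() supplies the C library's description of the error */
    return snprintf(out, size, "%s %d %s", name, err, strerror(err));
}
```

Passing ENOSYS produces a line in the same shape as the output of the "errno" tool, with the description text coming from the C library.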
Next, we run the test program in our VM with the custom kernel.
# Move binary into VM
scp syscall-printk-test ubuntu@$(uvt-kvm ip kerneltest):~
# Run the test program
uvt-kvm ssh kerneltest -- "~/syscall-printk-test 'hello kernel'"
syscall succeeded, run "dmesg" to see the message in the kernel logs
# Check the kernel log in the VM
uvt-kvm ssh kerneltest -- dmesg
...
[ 11.495005] audit: type=1400 audit(1599501627.676:39): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.hook.remove" pid=838 comm="apparmor_parser"
[ 55.784973] Msg: "hello kernel"
The message was written to the kernel log successfully.
Conclusion
As usual, taking a look under the hood of a technology demystifies it: there is no magic involved. Building and installing the Linux kernel was a lot simpler than I expected. Modifying the kernel was slightly trickier because of all the historical cruft. A basic understanding of the kernel's source code organization will surely come in handy in the future.