ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Linux Compatibility on BSD for the PPC Platform

by Emmanuel Dreyfus
05/10/2001

This document deals with the main problems encountered when implementing Linux binary compatibility for PowerPC-based NetBSD ports. It is intended to document various parts of the emulation subsystem, and to highlight some architecture-dependent issues that can arise in argument passing, signal handling, and with the way some system calls work. I hope it will help potential developers to do further work on the NetBSD binary compatibility framework.

Most, if not all, of this paper is intended for technically oriented readers. It is assumed that the reader has some understanding of the C programming language and has a good understanding of how processes are managed on a Unix system. Information about this, and much more, can be found in Design and Implementation of the 4.4BSD Operating System, or The Linux Kernel.

Setting up minimal emulation support

In this part, we will introduce Linux emulation and the way it is implemented. Then we will describe the different steps required in order to run statically linked Linux binaries on a NetBSD/PowerPC system.

What is Linux compatibility?

Some programs such as Netscape or Sun's JDK are not distributed with source code, so it is not possible to port them to NetBSD. We have to make do with a Linux binary, sometimes a FreeBSD binary, but never a NetBSD binary. Nevertheless, users want these kinds of applications to run on their NetBSD machines. To address this problem, Linux compatibility was developed on NetBSD. This Linux emulation is available through the COMPAT_LINUX kernel option on the NetBSD ports that support it (i386, alpha, and m68k). The compatibility subsystem emulates Linux system calls, not the program itself. From the Linux program's point of view, the NetBSD kernel just looks like the Linux kernel. The Linux binary is thus able to run on NetBSD at normal CPU speed. All its system calls are intercepted and mapped to native NetBSD system calls, so the overhead of Linux compatibility is very small.

How does it work: the global picture

A userland executable interacts in only two ways with the kernel. On one hand, we have calls from the executable to the kernel, which are system calls. On the other hand, we have the interaction from the kernel to the executable, which is signal delivery.

In order to emulate Linux binaries, the NetBSD kernel must mimic Linux kernel behaviour for system calls and signal delivery. Signal delivery is the trickiest part of the job, and not all executables actually need signals for normal operation, so we will keep signal handling for later. System calls, on the other hand, are mandatory. If you want your program to do simple operations, such as reading a file or writing some text to a terminal, you need to make system calls. You might build an executable that does not make any system calls, but I doubt that running it would be of any interest. So let us talk about system call emulation.

The main idea is to translate system calls. Each system call has a system call number and some arguments. If you run a Linux binary on NetBSD without writing any compatibility support in the NetBSD kernel, it will not work, because the executable will use a system call number that is incorrect for NetBSD. For instance, let us assume that our program uses the nice() system call, which is syscall #34 on Linux/PowerPC. If you run it as a Linux binary on NetBSD/PowerPC, it will actually call chflags(), which is syscall #34 on NetBSD/PowerPC. And even if the syscall is the same, the arguments will probably not fit. For instance, Linux may use a 32-bit long where NetBSD uses a 64-bit long long for the same argument, and this will cause the program to fail.

The NetBSD kernel must therefore first match the Linux executable. That is, it must recognise it as a Linux binary and not as a NetBSD binary. Then, when the program makes a system call, the NetBSD kernel will translate the Linux system call to a NetBSD system call. Of course, executables matched as NetBSD binaries have their system calls unchanged.
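
The idea of one syscall table per emulation can be sketched in a few lines of C. This is only a toy model, not kernel code: every name prefixed with toy_ or TOY_ is hypothetical, the handlers are stand-ins, and the shared number 3 is arbitrary. It only shows that the same number reaches a different function depending on how the binary was matched.

```c
#include <stddef.h>

/* Hypothetical handler type: one int in, one int out. */
typedef int (*toy_syscall_fn)(int arg);

/* Stand-ins for two unrelated kernel functions. */
static int toy_native_op(int arg)  { return arg + 1000; }
static int toy_linux_nice(int arg) { return arg * 2; }

#define TOY_NSYSCALLS      8
#define TOY_SHARED_NUMBER  3  /* same number, different meaning per table */

enum toy_emul { TOY_EMUL_NETBSD, TOY_EMUL_LINUX };

/* One table per emulation; unset slots stay NULL. */
static const toy_syscall_fn toy_netbsd_table[TOY_NSYSCALLS] = {
    [TOY_SHARED_NUMBER] = toy_native_op,
};
static const toy_syscall_fn toy_linux_table[TOY_NSYSCALLS] = {
    [TOY_SHARED_NUMBER] = toy_linux_nice,
};

/*
 * Pick the table from the emulation the binary was matched as, then
 * dispatch. Returns 0 and stores the handler's result on success,
 * -1 if the number is out of range or has no handler.
 */
int toy_dispatch(enum toy_emul emul, int number, int arg, int *result)
{
    const toy_syscall_fn *table =
        (emul == TOY_EMUL_LINUX) ? toy_linux_table : toy_netbsd_table;

    if (number < 0 || number >= TOY_NSYSCALLS || table[number] == NULL)
        return -1;
    *result = table[number](arg);
    return 0;
}
```

Calling toy_dispatch() with the same number and argument but a different emulation reaches a different handler, which is exactly why the binary has to be matched before its first system call.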

Translating syscalls

For this step, we will need the kernel sources of both NetBSD and Linux. NetBSD kernel sources can be found here, but if you plan to actually work on the kernel sources, you would do better using CVS (see the documentation to learn how to use CVS to track NetBSD-current). You can also browse the source files using CVSWeb.

Linux sources can be found on various FTP sites, for example, ftp://ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.0.tar.gz for the 2.4 kernel. Grab the latest kernel, which will certainly be something other than 2.4 when you read this paper. It is not mandatory to get the latest kernel, but it is better to do so.

First, let us have a look at NetBSD syscalls. They are defined in the machine-independent part of the kernel sources, in sys/kern/syscalls.master. This file is used to automatically create the files sys/kern/syscalls.c, sys/sys/syscall.h, and sys/sys/syscallargs.h. Each syscall in syscalls.master is basically the system call name with "sys_" prepended to it. Here are a few lines from the sys/kern/syscalls.master file:

0  INDIR { int sys_syscall(int number, ...); }
1  STD   { void sys_exit(int rval); }
2  STD   { int sys_fork(void); }
3  STD   { ssize_t sys_read(int fd, void *buf, size_t nbyte); }
4  STD   { ssize_t sys_write(int fd, const void *buf, \
          size_t nbyte); }

Now, the Linux syscalls: Here the job is a bit more complicated, since the system call definitions are architecture dependent on Linux. The different architectures supported by the Linux kernel are in linux/arch. Each architecture has its own directory. For instance, the PowerPC port of Linux has its machine-dependent source code in linux/arch/ppc/. The syscalls definition file lives in the kernel subdirectory of the architecture directory, but the name of the file is not the same on all Linux ports! If you are working on another COMPAT_LINUX port, you can find the file by grepping for system call names, such as mmap() or uname(). For the PowerPC, the file is linux/arch/ppc/kernel/misc.S. Here are a few lines from that file:

.long sys_ni_syscall /* 0  -  old "setup()" system call */
.long sys_exit 
.long sys_fork 
.long sys_read
.long sys_write

This Linux file lists all the syscalls in syscall number order. The arguments to the syscalls are not shown. To find out the arguments of a given system call, you will have to grep for its name in linux/arch/ppc/kernel and/or linux/kernel, find the function implementing the system call, and look at the function parameters.

And now, let us move to the compat directory in the NetBSD sources, which is where we will have to write a few files. For Linux compatibility on the PowerPC, it is sys/compat/linux/arch/powerpc. Here we must create a syscalls.master file and fill it with the Linux system call numbers and the functions that implement them in the NetBSD kernel. The easiest way is to grab the syscalls.master file from another port (I used the syscalls.master from i386 Linux compatibility, which can be found at sys/compat/linux/arch/i386/syscalls.master), and to modify it so that it reflects Linux syscalls on our target port, here PowerPC.

Most Linux system calls have a wrapper function in NetBSD. For example, the open() system call (syscall #5) is implemented by the linux_sys_open() function. Here is the open() system call definition in Linux compatibility, from sys/compat/linux/arch/i386/syscalls.master:

5  STD  { int linux_sys_open(const char *path, int flags, \
          int mode); }

This linux_sys_open() wrapper function lives in a file in the sys/compat/linux/common directory. Its job is to do appropriate argument translation, and then to transfer control to the sys_open() function of the NetBSD kernel.
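
To make "argument translation" concrete, here is a minimal sketch of what such a wrapper does with the open() flags. It is not the NetBSD source: the toy_ names are hypothetical, toy_native_open() is a stand-in that simply returns the flags it received, and the flag values are illustrative and should be checked against the real Linux and NetBSD headers before being relied upon.

```c
/* Illustrative Linux-side flag values (assumption: verify in the headers). */
#define TOY_LINUX_O_ACCMODE  0003
#define TOY_LINUX_O_CREAT    0100
#define TOY_LINUX_O_TRUNC    01000
#define TOY_LINUX_O_APPEND   02000

/* Illustrative NetBSD-side flag values (assumption: verify in the headers). */
#define TOY_BSD_O_APPEND     0x0008
#define TOY_BSD_O_CREAT      0x0200
#define TOY_BSD_O_TRUNC      0x0400

/* Translate each flag bit by bit; the access mode bits are kept as-is. */
int toy_linux_to_bsd_oflags(int lflags)
{
    int bflags = lflags & TOY_LINUX_O_ACCMODE;

    if (lflags & TOY_LINUX_O_CREAT)
        bflags |= TOY_BSD_O_CREAT;
    if (lflags & TOY_LINUX_O_TRUNC)
        bflags |= TOY_BSD_O_TRUNC;
    if (lflags & TOY_LINUX_O_APPEND)
        bflags |= TOY_BSD_O_APPEND;
    return bflags;
}

/* Stand-in for the native implementation: returns the flags it was given. */
static int toy_native_open(const char *path, int flags, int mode)
{
    (void)path;
    (void)mode;
    return flags;
}

/* The wrapper: translate the arguments, then hand over to the native call. */
int toy_linux_sys_open(const char *path, int lflags, int mode)
{
    return toy_native_open(path, toy_linux_to_bsd_oflags(lflags), mode);
}
```

The real wrapper has more to translate than this sketch covers, but the shape is the same: convert, then delegate.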

Other Linux system calls are implemented directly by the corresponding NetBSD system call. This is the case for exit() or fork() (syscalls #1 and #2), which are implemented by the sys_exit() and sys_fork() kernel functions. Here are the Linux exit() and fork() definitions, from sys/compat/linux/arch/i386/syscalls.master:

1  NOARGS      { int sys_exit(int rval); }
2  NOARGS      { int sys_fork(void); } 

Most of the job is quite straightforward: It is just about reordering system calls. But sometimes, you will find that a given syscall has no equivalent for the target port. This is true, for example, for the Linux/i386 vm86() system call, which is left unimplemented in sys/compat/linux/arch/powerpc/syscalls.master, using the UNIMPL option in the second column of the file.

Some other syscalls do not work the same way on different architectures, due to different argument sizes or different argument transmission mechanisms (in registers vs on stack). For some of them, there are already alternative implementations of the wrapper function. For instance, a call to mmap() is implemented by linux_sys_mmap() on the Alpha, and it is implemented by linux_old_mmap() on the i386.
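
The difference between the two argument transmission mechanisms can be illustrated with a small sketch. The names below are hypothetical and the argument block layout is an assumption made for the example, not the actual Linux structure. In a real kernel the block would also have to be copied in from user space before being read, which this sketch leaves out.

```c
#include <stddef.h>

/* Hypothetical argument block, as passed by the "old" calling convention. */
struct toy_mmap_args {
    unsigned long addr;
    unsigned long len;
    int prot;
    int flags;
    int fd;
    long offset;
};

/* Stand-in for the shared implementation; returns a value we can check. */
static long toy_mmap_common(unsigned long addr, unsigned long len,
                            int prot, int flags, int fd, long offset)
{
    (void)addr;
    (void)prot;
    (void)flags;
    return (long)len + offset + fd;
}

/* Convention A: every argument arrives individually (for example in registers). */
long toy_mmap_by_values(unsigned long addr, unsigned long len,
                        int prot, int flags, int fd, long offset)
{
    return toy_mmap_common(addr, len, prot, flags, fd, offset);
}

/* Convention B: a single pointer to a block holding all the arguments. */
long toy_mmap_by_block(const struct toy_mmap_args *a)
{
    if (a == NULL)
        return -1;
    return toy_mmap_common(a->addr, a->len, a->prot, a->flags,
                           a->fd, a->offset);
}
```

Both entry points end in the same implementation; only the unpacking differs, which is why alternative wrappers for the same syscall can coexist.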

Now, the idea is to get a good but not perfect syscalls.master, and to fix problems as they arise later. So once syscalls.master looks good, we build linux_syscallargs.h and linux_syscalls.h by typing "make" in sys/compat/linux/arch/powerpc, and we can start trying to build a kernel.

Building the first kernel

Now when we try to build a kernel, of course it will fail, because most of the required source code is still missing, but the idea is that a failed build will tell us which gaps to fill.

First, we want to tell the config(8) tool that we added files to the kernel. Here we will work on the NetBSD/macppc port, but everything remains true for other ports. In the sys/arch/macppc/conf directory, we have a file called files.macppc that lists the files used to build a kernel for macppc. In order to modularize the compatibility code in the kernel, we will just add two include statements. These statements will tell config(8) to include the file describing what is needed for the machine-independent part of Linux compatibility (sys/compat/linux/files.linux), and the file describing what is needed for the machine-dependent part (sys/compat/linux/arch/powerpc/files.linux_powerpc):

# Linux binary compatibility (COMPAT_LINUX)
include "compat/linux/files.linux"
include "compat/linux/arch/powerpc/files.linux_powerpc"

There is also the OSS audio compatibility framework, which is required in order to link a kernel with Linux compatibility. This is included by the following lines:

# OSS audio driver compatibility
include "compat/ossaudio/files.ossaudio"

We then have to create the latter files.linux_powerpc file, and fill it with all the source files created in sys/compat/linux/arch/powerpc so far. Again, the idea is just to grab the i386 version of that file from sys/compat/linux/arch/i386, and to comment out or remove every line referencing files that are not yet in the powerpc directory.

Then we can add the COMPAT_LINUX option to our favourite kernel config file, and start a kernel build. (If you need some documentation, please read the documentation here). Of course it will fail; we expected it. During the various failures, we can discover that the source code in sys/compat/linux/common needs a lot of macros prefixed with LINUX_ and a lot of typedefs and struct definitions prefixed by linux_. The idea is always the same: to grab the i386 version of the file containing the requested macro/typedef/struct definition, and to adapt it for the PowerPC.

During this work, the linux/include/asm-ppc and linux/include/linux directories from the Linux kernel sources will be useful. It is essential to avoid just copying the i386 version of the different files needed in the powerpc directory such as linux_termios.h or linux_types.h. There are very few differences between most of the i386 version and the PowerPC version of the Linux includes we need to define, but a careful check of every value will avoid lots of trouble finding out what went wrong later.

After adding a lot of header files, the sys/compat/linux/arch/powerpc directory starts looking like its i386 counterpart. There are only a few .c files missing. We then have to define a few functions that are defined in i386/linux_machdep.c and i386/linux_ptrace.c, or else the kernel will not build. Most of the linux_machdep.c file holds functions related to signal delivery, whereas the linux_ptrace.c file holds functions that allow the Linux version of gdb to be used on emulated binaries. Obviously, we don't need most of this now. So the idea is to write empty functions that just return zero without actually doing anything. The goal is to have a kernel that builds, and to add the missing code later.

Remember that each time a .c file is added to the sys/compat/linux/arch/powerpc directory, it has to be added to sys/compat/linux/arch/powerpc/files.linux_powerpc, and then, the config(8) utility must be rerun. This integrates the new file into the kernel build process. Otherwise, the new file will be ignored.

Matching the Linux binaries

Once we have a working kernel, we can try our first Linux binary on it. To do this, we go on a LinuxPPC machine and compile the following program, linked as a static binary. This is done using the -static flag with gcc.

/*
 * hello.c -- A hello world test
 * Build with gcc -static -o hello hello.c
 */
#include <stdio.h>
int main (int argc, char **argv) {
    printf ("Hello world!\n");
    return 0;
}

Then we try to run the compiled binary on the NetBSD system. Normally, it shouldn't work. Most likely, we get a strange message explaining that a syntax error occurred after a "(", and this sounds like the kernel decided this was a shell-script and gave it to the shell to execute. The dynamic version should just crash, but we will take care of it later.

Our problem is that the kernel was not able to recognise the executable as a Linux binary. This can be confirmed by running ktrace(1) and kdump(1) on the executable. If the kernel had matched the executable as a Linux binary, then the kernel trace would contain an EMUL "linux" record.

So we have to get a working Linux binary-matching mechanism. When starting a new binary (on execve() calls), the NetBSD kernel performs some probe tests to find out what to do. Practically, the kernel maintains a list of struct execsw (struct execsw is defined in sys/sys/exec.h) describing the available ways of executing a program: native ELF, native a.out, shell scripts, Linux emulation, and so on. This list is initialized from sys/kern/exec_conf.c, and is used in sys/kern/kern_exec.c. A member of the struct execsw is a pointer to a probe function, whose job is to return 0 if it matches the executable. For Linux ELF32 emulation, this function is linux_elf32_probe(), which is implemented in sys/compat/linux/common/linux_exec_elf32.c.
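
The probing loop can be pictured with the following sketch. It is a simplified model, not the contents of sys/kern/kern_exec.c: struct toy_execsw keeps only a name and a probe pointer, the two probe functions are hypothetical, and the real struct execsw carries many more members. The only convention taken from the text is that a probe returns 0 when it matches.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct execsw: a name and a probe function. */
struct toy_execsw {
    const char *es_name;
    int (*es_probe)(const unsigned char *hdr, size_t len); /* 0 on match */
};

/* Matches files that start with the ELF magic number. */
static int toy_probe_elf(const unsigned char *hdr, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    return (len >= 4 && memcmp(hdr, magic, 4) == 0) ? 0 : -1;
}

/* Matches files that start with "#!". */
static int toy_probe_script(const unsigned char *hdr, size_t len)
{
    return (len >= 2 && hdr[0] == '#' && hdr[1] == '!') ? 0 : -1;
}

static const struct toy_execsw toy_execsw_list[] = {
    { "elf",    toy_probe_elf },
    { "script", toy_probe_script },
};

/* Try each entry in order; return the first matching name, or NULL. */
const char *toy_match_exec(const unsigned char *hdr, size_t len)
{
    size_t i;
    size_t n = sizeof(toy_execsw_list) / sizeof(toy_execsw_list[0]);

    for (i = 0; i < n; i++) {
        if (toy_execsw_list[i].es_probe(hdr, len) == 0)
            return toy_execsw_list[i].es_name;
    }
    return NULL;
}
```

Adding Linux emulation amounts to adding one more entry whose probe function is strict enough not to claim native binaries.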

This function performs several tests. The first test is the linux_elf32_signature(), which looks for an interpreter name specific to Linux. The interpreter is a helper program used to run the executable. This is the ld.so program used to launch dynamically linked programs. The linux_elf32_signature() looks in the ELF headers for an interpreter like /lib/ld.so or /lib/ld-linux.so, which is really Linux-specific. For instance, a NetBSD ELF program uses /usr/libexec/ld.elf_so, and a System V Release 4 system should use /usr/lib/ld.so.
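
As a rough illustration of an interpreter-name test, the sketch below compares the interpreter path against the two Linux names mentioned above. The function name is hypothetical, and matching by prefix is an assumption made for the example; the real linux_elf32_signature() may compare the names differently.

```c
#include <string.h>

/*
 * Return 1 if the interpreter path looks Linux-specific, 0 otherwise.
 * A binary without an interpreter (interp == NULL) never matches,
 * which is the weakness of this test for statically linked programs.
 */
int toy_is_linux_interp(const char *interp)
{
    static const char *const prefixes[] = {
        "/lib/ld.so",
        "/lib/ld-linux.so",
    };
    size_t i;

    if (interp == NULL)
        return 0;
    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        if (strncmp(interp, prefixes[i], strlen(prefixes[i])) == 0)
            return 1;
    }
    return 0;
}
```

With these prefixes, /usr/libexec/ld.elf_so and /usr/lib/ld.so are both rejected, while /lib/ld.so.1 is accepted.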

This test is good for dynamically linked binaries, but it fails for statically linked binaries, for which there is no interpreter name in the ELF header. To fix this flaw, there is a second test, enabled by the LINUX_GCC_SIGNATURE macro, linux_elf32_gcc_signature(), which looks for a GCC signature in the .comment ELF section of the executable. This is not a very good test, since this GCC signature is specific to GCC but not to Linux. Anyway, for some unknown reasons, this test failed on the PowerPC.

We therefore have to find an alternative way of matching statically linked Linux binaries. The objdump(1) command is useful when looking for such a method: objdump -h program will dump the ELF section headers of the program, and objdump -j .name -s program will dump the content of the section named .name. Here is an example of objdump -h output for a statically linked Linux binary:

$ objdump -h hello                         

hello:        file format elf32-powerpc

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00030930  018000a0  018000a0  000000a0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .init         00000080  018309d0  018309d0  000309d0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .fini         00000028  01830a50  01830a50  00030a50  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .rodata       00003f8c  01830a78  01830a78  00030a78  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 __libc_atexit 00000004  01834a04  01834a04  00034a04  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .sdata2       00000000  01834a08  01834a08  00034a08  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .data         00000cb8  01874a08  01874a08  00034a08  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  7 .got2         00000010  018756c0  018756c0  000356c0  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  8 .ctors        00000010  018756d0  018756d0  000356d0  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  9 .dtors        00000008  018756e0  018756e0  000356e0  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 10 .got          00000010  018756e8  018756e8  000356e8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 11 .sdata        0000011c  018756f8  018756f8  000356f8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .sbss         00000024  01875814  01875814  00035814  2**2
                  ALLOC
 13 .bss          000008b8  01875838  01875838  00035814  2**2
                  ALLOC
 14 .stab         00000cfc  00000000  00000000  00035814  2**2
                  CONTENTS, READONLY, DEBUGGING
 15 .stabstr      00000fba  00000000  00000000  00036510  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .comment      00002060  00000fba  00000fba  000374ca  2**0
                  CONTENTS, READONLY

Dumping the ELF section headers, we can see that all statically linked Linux programs have a section named __libc_atexit. This is specific to Linux, and as far as we know, it does not occur on any other operating system. A good point is that this __libc_atexit section does not seem to be Linux/PowerPC specific: We can find it in Linux/i386 static binaries as well.

We therefore have to write a new test in sys/compat/linux/common/linux_exec_elf32.c, enabled by the LINUX_ATEXIT_SIGNATURE macro. This test just checks whether there is a __libc_atexit section among the ELF section headers. With this test, statically linked Linux binaries are matched. We can check this by enabling the DEBUG_LINUX macro and looking at what the kernel outputs when we try to run the binary. With this new test, it is very likely that the hello world program now runs in compatibility.
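
The core of such a test is a scan of the section names. The sketch below shows that scan on data already loaded in memory; it is not the code from linux_exec_elf32.c. struct toy_shdr keeps only the name offset from a real ELF section header, and the function name and signature are hypothetical. Reading the section headers and the section name string table from the file is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the field needed here: the offset of the name in the string table. */
struct toy_shdr {
    uint32_t sh_name;
};

/*
 * Return 1 if one of the nsh section headers is named exactly `wanted`.
 * Offsets pointing outside the string table are skipped, so a corrupt
 * header cannot make the scan read out of bounds.
 */
int toy_has_section(const struct toy_shdr *sh, size_t nsh,
                    const char *strtab, size_t strtab_len,
                    const char *wanted)
{
    size_t wlen = strlen(wanted) + 1; /* compare the trailing NUL too */
    size_t i;

    for (i = 0; i < nsh; i++) {
        size_t off = sh[i].sh_name;

        if (off >= strtab_len || strtab_len - off < wlen)
            continue;
        if (memcmp(strtab + off, wanted, wlen) == 0)
            return 1;
    }
    return 0;
}
```

Comparing the terminating NUL as well means that a section whose name merely begins with __libc_atexit would not be mistaken for it.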

In the event it does not work, the way to solve the problem is to run ktrace(1) on the program on the NetBSD box, and the Linux equivalent (which is strace(1)) on a Linux box, and to compare the two traces to see what is going wrong. A likely culprit is a badly translated syscall. For instance, if we incorrectly translated mmap() to dup2(), this shows up immediately on a kernel trace, because we see that dup2() is called instead of mmap(). We need to rebuild kdump(1) if we want it to display the system call names and arguments when running emulated binaries. Generally speaking, we need to recompile kdump(1) each time we modify any syscalls.master file.

Now that statically linked binaries work, we can try dynamically linked binaries. Note that you need to download a set of Linux libraries from a PowerPC Linux box in order to run dynamically linked programs. For the hello world program, you need at least ld.so.1 and libc.so.6.

On the PowerPC, dynamically linked programs were immediately matched by the linux_elf32_signature() test, but running them did not work, either because it crashed, or because we got a Linux ld.so message saying that we invoked ld.so without arguments. We will focus on these dynamic binaries-specific issues in part 2.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.



Copyright © 2009 O'Reilly Media, Inc.