ONLamp.com
Linux Compatibility on BSD for the PPC Platform: Part 3

X11 client failures

The investigation starts with an attempt to run an X client. Every program fails the same way, with the same error. This simple program reproduces the problem:



/*
 * simplex.c -- A simple X tester
 * build with gcc -I/usr/X11R6/include -L/usr/X11R6/lib
 * -o simplex simplex.c -lX11
 */
 
#include <stdio.h>
#include <stdlib.h>
#include <X11/Xlib.h>

int main (int argc, char **argv) {
        Display *display;

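        /* argv[1] is NULL when no argument is given; Xlib then uses $DISPLAY */
        if (!(display = XOpenDisplay (argv[1]))) {
                perror ("XOpenDisplay");
                exit (1);
        }

        XCloseDisplay (display);
        return 0;
}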

When executed, this code produces the following error:

XIO: fatal IO error -11 (Unknown error 4294967285) on
     X server "10.0.12.137:0.0" after 0 requests (0 
     known processed) with 0 events remaining.

This problem is a side effect of a nasty bug in the way errno is handled. The program expects either no error or an errno of 11 (EAGAIN); instead it gets errno = -11, which means nothing to a Linux binary. The confused program therefore reports that an unknown error occurred. In fact, it received the right error number, except that it was negative. The following test program highlights the bug:

/*
 * errno tester -- run as an unprivileged user so that setuid(0) fails
 */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main (int argc, char **argv) {
        int dontcare;

        dontcare = setuid(0);
        printf ("errno = %d\n", errno);

        return 0;
}

Run natively on Linux, this program prints errno = 1; run under emulation on NetBSD/PowerPC, it prints errno = -1, thus demonstrating the bug.

There is a reason for handling negative error numbers. Linux uses negative error codes inside the kernel. On most platforms, this negative code is returned to the user, and glibc converts it to a positive errno, which is what a userland Unix program expects.

This operation can be found in the glibc sources. On most platforms, errno is set through the use of the __set_errno macro in the INLINE_SYSCALL macro, which is used as a wrapper for all system calls. For Linux/i386, this is defined in sysdeps/unix/sysv/linux/i386/sysdep.h.

In i386, ARM, and m68k Linux, __set_errno is used with a minus sign, so that the negative error code returned by the kernel turns into a positive errno:

__set_errno (-_sys_result);

On the PowerPC, things are quite different. The Linux kernel returns a positive value. When the kernel returns an error, glibc system call handlers jump to the __syscall_error() function, which is defined in sysdeps/unix/sysv/linux/powerpc/sysdep.c. This function sets errno using the __set_errno macro, but here there is no minus sign:

int
__syscall_error (int err_no)
{
  __set_errno (err_no);
  return -1;
}

In its Linux emulation, NetBSD mimics the Linux way of using negative error codes inside the kernel, and returns negative error codes to userland. This is okay for i386, alpha, and m68k, but it causes a bug on the PowerPC platform, because Linux's libc expects the kernel to return a positive errno, and does not make it positive if it is negative.

So let's have a closer look at how error numbers are handled in NetBSD's Linux emulation. Most error numbers are defined in sys/compat/linux/common/linux_errno.h, and some architecture-dependent error numbers are defined in sys/compat/linux/arch/powerpc/linux_errno.h, for the PowerPC port.

These error codes are used in an array that translates native NetBSD error codes to Linux error codes. This is the native_to_linux_errno array, which is built in sys/compat/linux/common/linux_errno.c. Here are the first four lines of the array definition:

const int native_to_linux_errno[] = {
   0,
   -LINUX_EPERM,
   -LINUX_ENOENT,
   -LINUX_ESRCH,
(snip)

This array is used in sys/compat/linux/common/linux_exec.c as the e_errno field of the struct emulsw (which is defined in sys/sys/proc.h). The e_errno field is then used when leaving the kernel, in sys/arch/powerpc/powerpc/trap.c:trap().

if (p->p_emul->e_errno)
         error = p->p_emul->e_errno[error]; 
frame->fixreg[FIRSTARG] = error;

Everything in the way errno is handled is architecture-independent, except the final step in trap(). To make the errno positive on return to userland, we have two options. The first is to modify trap() so that if the current program is a Linux binary, the errno is made positive before returning to userland. This would make the above code look something like this:

#ifdef COMPAT_LINUX 
if (p->p_emul == &emul_linux)
        /*
         * Linux uses negative errno in kernel, but   
         * returns a positive errno to userland.  
         */ 
        frame->fixreg[FIRSTARG] = -error; 
else
        frame->fixreg[FIRSTARG] = error; 
#else 
frame->fixreg[FIRSTARG] = error; 
#endif

The other option is to make all errno values positive for the PowerPC in sys/compat/linux/common/linux_errno.c. The latter option may seem like a bad choice because it requires modifying an architecture-independent source file to fix an architecture-dependent problem. Modifying trap.c, by contrast, fixes an architecture-dependent problem in an architecture-dependent file, so it does not have this drawback.

Introducing positive numbers in linux_errno.c nevertheless turns out to be the better choice, because other Linux ports could have the same problem. It lets each port choose the errno sign in a machine-dependent header file, without adding tests to the machine-dependent code. This is achieved by introducing a LINUX_SCERR_SIGN macro in the architecture-dependent linux_errno.h, which is - for all ports that need a negative errno value to be returned to userland, and + for ports that need a positive errno value. So far, the + only applies to the PowerPC.

This is how the native_to_linux_errno array then gets defined in sys/compat/linux/common/linux_errno.c:

const int native_to_linux_errno[] = {
   0,
   LINUX_SCERR_SIGN LINUX_EPERM,
   LINUX_SCERR_SIGN LINUX_ENOENT,
   LINUX_SCERR_SIGN LINUX_ESRCH,
(snip)

With this fix, errno is correctly handled on the PowerPC, and X binaries (and other programs) are fixed! After this last bug fix, Linux compatibility on NetBSD/PowerPC reaches a state where it is possible to run interesting real-life Linux binaries such as Netscape Communicator.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.

