

Linux Compatibility on BSD for the PPC Platform: Part 3
Pages: 1, 2, 3

Tuning: Fixing system-call-specific issues

A simple bug fix: ioctl() issues
Now the time has come to try running real Linux binaries and see what happens. We discover many small problems here. For example, the Linux ioctl() TIOCGETA and TIOCGWINSZ calls fail for no apparent reason.

ioctl() is used to perform non-standard operations on devices. It is widely used to get and set terminal parameters. For example, ioctl() TIOCGETA gets the terminal's struct termios, and ioctl() TIOCGWINSZ gets the terminal window size. If you need more information about ioctl(), refer to the ioctl(2) man page.

After some investigation with ktrace(1), it became clear that the ioctl com argument was wrong: Linux tried to do an ioctl() TIOCGETA, and NetBSD understood it as a different ioctl() (and thus, it failed). This is caused by a struct linux_termios mismatch.

The ioctl com parameter is computed from the ioctl type (read, write, read/write, or nothing), its group (the letter in the ioctl definition), its number, and the size of the third argument to ioctl(). The problem here is that in our NetBSD definition, struct linux_termios is not the same size as the real Linux struct termios. This happens because struct linux_termios is defined in sys/compat/linux/common/linux_termios.h, which is considered architecture-independent, although the structure is not. Moving the definition to an architecture-dependent file fixes the problem.

One fake bug: lstat() issues
There are also fake problems. For example, lstat() fails with glibc-2. A program built on a glibc-1 LinuxPPC system worked fine on that Linux system with glibc-1, but it broke on NetBSD when using glibc-2. If I had had a glibc-2 LinuxPPC system on which to try my binary built on a glibc-1 LinuxPPC system, I would have understood quickly that the failure was normal: a program that uses lstat() and is dynamically linked against glibc-1 cannot work with glibc-2. Let's study why it failed.

glibc-2.1.3 sources are available here. Alternatively, you can browse the source using CVSWeb.

Here is a simple program that tests lstat(). It was built on a LinuxPPC system that uses glibc-1.

/*
 * lstat.c - A lstat() tester
 */
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv) {
 const char *file_name = "/etc";
 struct stat buf;
 int res;

 /* Use the first command-line argument as the path, if given */
 if (argc >= 2)
  file_name = argv[1];

 res = lstat (file_name, &buf);
 if (res < 0) {
  printf ("res=%d file_name=%s &buf=%p\n", res, file_name, (void *)&buf);
  perror ("lstat() failed");
  exit (-1);
 }
 return 0;
}

Now, if we try to run the same binary against glibc-2.1.3, it fails. According to the kernel trace, the lstat() system call succeeds, but the program gets a -1 return value (with errno set to EINVAL). The result is modified by the glibc glue code. Looking at the glibc-2.1.3 sources, we discover a mechanism for dealing with the multiple versions of struct stat that exist on a Linux system (Linux-2.4 defines a struct old_kernel_stat and a struct stat). glibc has to detect the version of the stat structure expected by the program, and if the kernel does not provide that structure, it has to convert it. Here's how it works:

  • lstat() is defined in glibc/io/lstat.c, and it calls __lxstat(), with _STAT_VER as the first argument. This function gets statically linked into the executable, and therefore the _STAT_VER parameter is hard-coded into the executable with a value specific to the struct stat that is expected. When linking with glibc-1.99, the value is 0.

  • __lxstat() is defined in glibc/sysdeps/unix/sysv/linux/lxstat.c. The call from lstat() to __lxstat() is dynamic. __lxstat() tests its first argument, which it names vers, against _STAT_VER_KERNEL, the value specific to the current kernel's struct stat. If they are equal, it returns; if not, it calls xstat_conv() with vers as the first argument (so xstat_conv() receives the executable's _STAT_VER). On glibc-2.1.3, _STAT_VER_KERNEL is 3.

  • xstat_conv() is defined in glibc/sysdeps/unix/sysv/linux/xstatconv.c. Its job is to convert the kernel's struct stat into what the executable expects. It distinguishes three cases based on the vers parameter:

    • If it is equal to _STAT_VER_KERNEL, it just returns.
    • If it is equal to _STAT_VER_LINUX, it converts the struct old_kernel_stat to a struct stat and returns.
    • Otherwise, it returns an error (EINVAL).

When a binary linked with glibc-1 runs on a glibc-2 system, we hit the "otherwise" case in xstat_conv(). The conclusion is that glibc-2 does not expect the user to call lstat() from a binary built for glibc-1. Building the binary on a glibc-2 Linux system fixes the problem, and the binary then works fine with NetBSD's Linux emulation. There was nothing to fix in the NetBSD emulation code, so if anything we could consider it a glibc-2 bug.

open() unable to create files

This is a really annoying bug: open() ignores the O_CREAT flag. Therefore, open() system calls that need to create a file fail because the file does not exist. The reason is silly: in Linux's fcntl.h, the O_CREAT flag is defined like this: #define O_CREAT 0100. Looking at it, if you do not use C octal notation every day, you may think that this is a hexadecimal value, and that the Linux code adds the leading "0x" where it needs to use this value. Therefore, you might write this in NetBSD's linux_fcntl.h file: #define LINUX_O_CREAT 0x0100

If you do not use octal notation often, just remember that in C, "0100" means 100 in octal, which is 40 in hexadecimal. You may think this is the silliest mistake described in this document. Well, I made it, so I hope this section will be useful for people who have forgotten how to define an octal value in C.
