Sunday, February 10, 2008

Waving the flag: NetBSD developers speak about version 4.0

By Federico Biancuzzi | Published: January 30, 2008 - 11:53PM CT

Introduction

The NetBSD community announced last month the official release of NetBSD 4.0, the latest version of the Unix-like open-source operating system. Version 4.0 includes significant new features like Bluetooth support, version 3 of the Xen virtual machine monitor, new device drivers, and improvements to the Veriexec file integrity subsystem. NetBSD, which is known for its high portability, is capable of running on 54 different system architectures and is suitable for use on a wide range of hardware, including desktops, servers, mobile devices, and even kitchen toasters.

Meet the developers

To commemorate the NetBSD 4.0 launch, enthusiast Federico Biancuzzi communicated with 21 developers to produce this expansive interview with loads of insightful information about the NetBSD 4.0 development process.

NetBSD Foundation secretary Christos Zoulas will discuss why sendmail has been removed and what goals the project has for the fundraising process. He wrote a lot of code, such as svr4 emulation, isapnp code, ptyfs, siginfo, ELF loader, mach-o loader, statvfs, pam integration, etc.

Liam J. Foy will explain the delay in the release engineering process.

Elad Efrat's area of interest in NetBSD is mostly enabling security technologies. From allowing flexible fine-grained security policies on the system, to actually writing them and making them easy to deploy, he is interested in minimizing the time it takes to construct a secure installation. His major contributions to NetBSD, in roughly chronological order, are Veriexec, fileassoc(9), kauth(9) (and the secmodel(9) derivatives), security model abstraction (bsd44, securelevel), PaX features (MPROTECT, Segvguard, ASLR), and pw_policy(3). These topics were summarized in a paper published on SecurityFocus in late 2006 and later presented at EuroBSDCon 2006. Subsets were also presented in smaller academic and military venues in Israel.

Matt Fleming will describe how veriexecgen works.

Nicolas Joly will present the status of Linux binary compatibility.

Matthias Scheler is currently mostly working on "pkgsrc" and occasionally fixing bugs in the base system. He was the release engineer responsible for the NetBSD 3.0 release. He will talk about the X Window System and the new digital transfer mode in cdplay.

Joerg Sonnenberger contributed a lot of infrastructure work for pkgsrc and is responsible for modular Xorg in pkgsrc. His contributions to the NetBSD base system are ACPI and x86 improvements.

Jason Thorpe will describe proplib(3).

Manuel Bouyer has been a NetBSD user since 0.8. His first contribution was porting the OpenBSD shared library support for NetBSD/pmax in 1995, then he started working on ATAPI device support. In late 1996/early 1997, he also wrote the ext2fs support, starting from the ffs code (no GPL code in it). He was invited to become a developer at this time to integrate his work on ATAPI and ext2fs. After that, he kept working in the ATA and SCSI areas (he added the primary DMA support for PCI ATA controllers, for example). On occasion, he also worked on other parts of the kernel (Ethernet controller drivers or dual-endian support for FFS, for example).

When Xen 2.0 came out, he started working on domain0 support so that he could build virtual servers not relying on Linux. He added support for Xen 3.0 when it came out, and became NetBSD/xen portmaster at about this time. He also became a releng member in early 2007 to give some manpower to the team. He will give an overview of NetBSD/Xen and the new 'no-emulation' eltorito boot method.

Phil Nelson installed and ran the first web server serving www.netbsd.org. (The service has since been taken over by project servers.) He still does some minor admin work for the web server.

He runs the WWU build cluster and improved the scripts for the build clusters so they could build releases faster. He wrote a program called xarcmp that compares the contents of archives without having to extract them. It is part of the standard tool now used by both build clusters.

He wrote the initial versions of menuc, msgc and sysinst. sysinst is the current install system and it uses menuc and msgc. He will talk about LFS (Log-structured File System).

Julio M. Merino Vidal has been a member of The NetBSD Foundation and a developer since November 2002. He started contributing to the pkgsrc project with the main goal of getting GNOME 2 to work under NetBSD. He then made other contributions to the core NetBSD operating system, the most relevant of which are the tmpfs file system and the automated testing framework (ATF). He will discuss the features of tmpfs.

Antti Kantee worked on pkgsrc, device drivers and hardware support for various platforms, file systems, and assorted kernel and userland bits. He maintains the file(1) utility in the NetBSD source tree and NetHack in pkgsrc. He will describe the relationship between puffs, FUSE, and ReFUSE.

Alistair Crooks is the president of The NetBSD Foundation, Core Team member, and founder of pkgsrc.

He contributed to pkgsrc, user management software - user(8), ReFUSE (BSD-licensed re-implementation of FUSE), numerous filesystems based on ReFUSE, and iSCSI target and initiator. He will make clear which parts of the iSCSI protocol are included in this release and how helpful "hackathons" are.

Reinoud Zandijk will add details about the implementation of the Universal Disk Format (UDF).

Iain Hibbert's major contribution has been the Bluetooth protocol stack and associated drivers and utilities. He talks about Bluetooth support in NetBSD.

Arnaud Lacombe has been a developer for about one year now. At the beginning, he joined the project to fix bugs found by the Coverity scan of NetBSD. He is interested in porting NetBSD to new embedded platforms. OpenMoko was a really interesting way to do it, but he never managed to get hardware, so this project is stalled for the moment.

Martin Fouts, Noud de Brouwer, and Arnaud discuss the current support for the hardware included in the iPhone and OpenMoko (Neo1973).

Alan Ritter will comment on the possibility of using the NDIS wrapper to port binary drivers among platforms.

Yamamoto Takashi will describe what agr(4) is.

Jan Schaumann used to work a lot on the web site. He also ported pkgsrc to IRIX and did bulk-builds there. He is a member of the communication-exec team and managed NetBSD's participation in the first two Google Summer of Code. He will discuss their experience with the Google Summer of Code project.

Release engineering, Sendmail, and kauth

What happened to the Release Engineering process for 4.0?

Liam J. Foy: Basically, the release engineering was started as planned. However, after the release engineering started, a lot of changes were made which would be too time-consuming to pull up (pull up from current to the NetBSD 4 branch). Thus, we started the process again with all the changes merged.

You may remember a few NetBSD hackathons which took place. Well, these are what caused the large number of changes.

Why did you remove sendmail?

Christos Zoulas: Sendmail has been, is, and will be a security accident waiting to happen (unless it is rewritten from the ground up with security consciousness). Performing character pointer gymnastics in 50-100 line loops does not create any warm and fuzzy feelings for me. To top this off, most sendmail security issues are marked as confidential, and we are prevented from fixing or mentioning the problem until the ban is lifted. The last time this happened, we said "enough" and removed it altogether before the ban for that particular security issue was lifted.

Would you like to present kauth, the kernel authorization framework, included in this release?

Elad Efrat: Kernel authorization is really something cool that I'm happy many people will get a chance to test in this release.

Basically, the story goes like this: the traditional Unix security model, the one where root is almighty and everyone else is not, is slowly beginning to show signs of age as demands for finer-grained security policies arise. The problem becomes more apparent when you look at the kernel code and see that, probably as time went by, people used various ways to check whether a certain operation is allowed to be performed or not (most are variants of checking whether the effective user-id is 0 or using the suser() function), effectively embedding the security model into the kernel, in a way that the question "Can this user open a raw socket?" really became "Is this user root?"

Kauth(9) was originally designed by Apple, and it provides a dispatching model that allows us to really ask if a certain operation can be performed by specifying it along with the relevant context. This abstracts the security model from the various kernel subsystems. The system is divided into "scopes" that collect actions of the same nature (right now, NetBSD has scopes like "process", "network", "machdep", etc.), and each scope has a set of "actions" that define the operation (the context provided depends on the operation). When the kernel wants to ask if a certain operation is allowed, it calls the authorization wrapper for the relevant scope, providing it the action and context, and receives back a binary response of "yes" or "no".

This upper layer, like I said, effectively abstracts the way these decisions are made, allowing one to "plug" (almost) any security model one can think of.

These security models are implemented by attaching "listeners" (kauth(9) terminology for callback functions) that receive the action and context and make the decision. In the future it is hoped that these listeners will be able to run on an entirely different machine, thus allowing a centralized security policy for an entire network to be controlled from a single host.
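As a rough illustration of the scope, action, and listener idea described above, here is a toy userland model in Python. It is not the kauth(9) kernel API: the class, the action names, and the rule for combining listener answers (any "deny" wins, otherwise at least one "allow" is needed) are assumptions made for this sketch.

```python
from typing import Callable, Iterable, List

# Toy userland model of the dispatch idea; NOT the kauth(9) kernel API.
# All names and the result-combining rule are assumptions of this sketch.
ALLOW, DENY, DEFER = "allow", "deny", "defer"

Listener = Callable[[dict, str, dict], str]  # (cred, action, context) -> result


class Scope:
    """Collects actions of the same nature and the listeners that judge them."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.listeners: List[Listener] = []

    def listen(self, callback: Listener) -> None:
        """Attach a listener (a callback implementing part of a security model)."""
        self.listeners.append(callback)

    def authorize(self, cred: dict, action: str, context: dict) -> bool:
        """Ask every listener and reduce their answers to a binary yes/no."""
        results = [callback(cred, action, context) for callback in self.listeners]
        if DENY in results:
            return False
        return ALLOW in results


def traditional_listener(cred: dict, action: str, context: dict) -> str:
    # "Is this user root?" becomes just one pluggable policy among many.
    return ALLOW if cred["euid"] == 0 else DEFER


def make_raw_socket_listener(allowed_uids: Iterable[int]) -> Listener:
    allowed = set(allowed_uids)

    def listener(cred: dict, action: str, context: dict) -> str:
        # Custom policy: a few chosen users may open raw sockets.
        if action == "socket.rawsock" and cred["euid"] in allowed:
            return ALLOW
        return DEFER

    return listener
```

With both listeners attached to a "network" scope, the caller only asks whether the action is allowed; which security model produced the answer stays hidden behind `authorize()`.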

Unfortunately, in NetBSD 4.0, although the security model abstraction is almost complete, there is still not full control over all operations. That is—in some places, the question is still "Is this user root?", so implementing different security models will not be complete. However, it will provide an indication of how well kauth(9) works and how users react to this new ability to develop custom security models (for example, classic uses like restricting raw networking, or binding to certain low ports, to a few users). I'm hoping to be surprised by our user base's creativity.

I highly recommend spending a few minutes reading kauth(9) and secmodel(9) in the current man pages. There's a little taste of what's in the plans for NetBSD 5.0—like credential inheritance control—with the ultimate goal being to provide system administrators with security policy control that requires either zero or very little effort.

PaX, fileassoc, and Veriexec

What security features have you added to mprotect(2) from PaX?

Elad Efrat: First, the PaX project, for those who aren't familiar with it, is responsible for all the modern exploit mitigation technologies. It's where stuff like W^X and ASLR (Address Space Layout Randomization) were born, along with many, many other cool features. One of them, which I initially ported to NetBSD, is PaX MPROTECT.

PaX MPROTECT can be thought of as "strict W^X". If in a normal system, a program starts with no memory pages that are both writable and executable, but those pages can still be created using mprotect(2), systems or programs that run with PaX MPROTECT are "immune" to attacks where protection on pages is modified, often by trashing arguments to mprotect(2).
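The difference between plain W^X and the "strict" variant can be sketched with a small Python model. This is a simplification written for illustration, not the NetBSD or PaX implementation: the `Mapping` class and the exact rule (a mapping that has been writable may never become executable, and the reverse) are assumptions of the sketch.

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4
PROT_ALL = PROT_READ | PROT_WRITE | PROT_EXEC


class Mapping:
    """Toy memory mapping; `strict` models a program run with PaX MPROTECT."""

    def __init__(self, prot: int, strict: bool) -> None:
        self.strict = strict
        self.max_prot = PROT_ALL  # ceiling that later mprotect() calls may reach
        self.prot = 0
        self.mprotect(prot)

    def mprotect(self, prot: int) -> None:
        """Change protection; in strict mode, refuse anything that defeats W^X."""
        if self.strict:
            if prot & PROT_WRITE and prot & PROT_EXEC:
                raise PermissionError("writable and executable at once")
            if prot & ~self.max_prot:
                raise PermissionError("protection exceeds what this mapping may reach")
            # Remember the restriction: once writable, never executable,
            # and once executable, never writable.
            if prot & PROT_WRITE:
                self.max_prot &= ~PROT_EXEC
            elif prot & PROT_EXEC:
                self.max_prot &= ~PROT_WRITE
        self.prot = prot
```

In the non-strict case a data mapping can later be turned into a writable and executable one, which is the situation an attacker who controls the arguments to mprotect(2) would exploit; in the strict case the same call fails.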

paxctl(8) is a tool that enables PaX MPROTECT on a per-program basis, if you don't want to enable it globally. I'm afraid that not much testing was done on various architectures using this feature, so users should first experiment with it...

How does the new fileassoc KPI (Kernel Programming Interface) work?

Elad Efrat: I'll begin by describing what fileassoc(9) is.

Some (newer) filesystems allow one to attach metadata to files by using extended attributes, in addition to the usual ones (permissions, timestamps, etc.). A common use for them may be, for example, to store ACLs (Access Control Lists) for the file that allow finer granularity on access control.

The problem is that these extended attributes are filesystem-dependent and either don't exist on most filesystems or are interfaced differently. This is where fileassoc(9) comes into play: it allows you to attach metadata to files in a filesystem-independent way while storing the information in fast-access kernel tables. The advantage is that it really is filesystem-independent, so there's the potential of adding features and such to filesystems that lack them (for example, again, ACLs). The disadvantages are that the metadata is not really "attached" to the file in any way, and must be loaded in some form (from a database file or such) into the kernel. So the dependency shifts to the OS itself, but that's cool, because it's aimed at NetBSD systems.

The KPI works by storing data in hash tables. Metadata is identified by a key (a C string) and is a stream of bytes. A kernel subsystem can then use the fileassoc(9) KPI to either attach, query, or remove metadata from files, by specifying the key.

The fileassoc(9) KPI is implemented in the VFS (Virtual File System) layer, identifying each file using its "file handle" (these are supposed to be unique). So pretty much any filesystem is supported. An example (and the only one, at the moment) consumer of fileassoc(9) is Veriexec, where a database file holds information that is parsed by a userland tool, then fed to fileassoc(9) using a special device.
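The attach, query, and remove operations described above can be modeled in userland with a pair of nested hash tables. The sketch below is only an analogy: the `FileAssocTable` class and its method names are hypothetical, and the (device, inode) pair stands in for the kernel's file handle.

```python
import os
from typing import Dict, Optional, Tuple

# (device, inode) is used here as a stand-in for a kernel file handle.
FileHandle = Tuple[int, int]


class FileAssocTable:
    """Toy model: per-file metadata kept in hash tables, outside the file system."""

    def __init__(self) -> None:
        self._table: Dict[FileHandle, Dict[str, bytes]] = {}

    @staticmethod
    def handle_of(path: str) -> FileHandle:
        status = os.stat(path)
        return (status.st_dev, status.st_ino)

    def attach(self, path: str, key: str, data: bytes) -> None:
        """Associate a stream of bytes with a file under a string key."""
        self._table.setdefault(self.handle_of(path), {})[key] = data

    def query(self, path: str, key: str) -> Optional[bytes]:
        """Return the metadata stored under `key`, or None if there is none."""
        return self._table.get(self.handle_of(path), {}).get(key)

    def remove(self, path: str, key: str) -> bool:
        """Drop the metadata stored under `key`; report whether it existed."""
        entries = self._table.get(self.handle_of(path), {})
        return entries.pop(key, None) is not None
```

Because the table is keyed by the file's identity rather than its name, metadata in this model survives a rename, yet nothing is written to the file system itself, so the table has to be repopulated every time it is created.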

Plans to write a generic interface for communicating with fileassoc(9) have come up in the past, but since we're not yet convinced of the necessity, these are just plans at the moment. Developers interested in feeding data to fileassoc(9) should also implement their own special device stub—at least for now.

Is there any news about Veriexec?

Elad Efrat: As of NetBSD 4.0, I feel Veriexec has gotten to a point where it should probably be used by most of our users.

Veriexec is NetBSD's integrity subsystem, which, in short, can guarantee the integrity of the programs, configuration files, etc., on the system.

In addition to the tons of improvements in performance and stability, and the many features added, NetBSD 4.0 introduces 'veriexecgen', which is a tool written by Matt Fleming (mjf@). Veriexecgen tremendously lowers the bar for fingerprint database generation to a point where it's possible to run a single command with no arguments after installation to have Veriexec set up appropriately. I strongly recommend reading the veriexecgen(8) man page and giving it a try.

What is veriexecgen and how does it work?

Matt Fleming: veriexecgen is a program that runs against directories and generates a set of fingerprints for use with veriexec. This fingerprint database is usually stored in /etc/signatures.
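The general idea of walking directories and recording one fingerprint per file can be sketched in a few lines of Python. This is not veriexecgen and does not reproduce its options or the format of the real signatures file; the function names and the choice of SHA-256 are assumptions of the sketch, and the real Veriexec checks happen inside the kernel rather than in a userland script.

```python
import hashlib
import os
from typing import Dict, List


def fingerprint(path: str, algorithm: str = "sha256") -> str:
    """Hash a file's contents in chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generate_fingerprints(*directories: str) -> Dict[str, str]:
    """Walk the given directories and map each regular file to its fingerprint."""
    database: Dict[str, str] = {}
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    database[path] = fingerprint(path)
    return database


def verify(database: Dict[str, str]) -> List[str]:
    """Return the paths whose contents no longer match the recorded fingerprint."""
    return [
        path
        for path, expected in sorted(database.items())
        if not os.path.isfile(path) or fingerprint(path) != expected
    ]
```

Calling `generate_fingerprints("/bin", "/sbin")` would produce the in-memory equivalent of a fingerprint database, and `verify()` on that database lists any file that has been modified or removed since.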

XFree86, pkgsrc, proplib, and Xen

Do you still include Linux binary compatibility?

Nicolas Joly: I'm currently working on improving Linux binary compatibility on AMD64 for both 32- and 64-bit applications.

The amd64 compat linux is the first one that includes the NPTL (Native POSIX Thread Library) emulation found on the Linux 2.6 kernel. Likewise, compat linux32 is pretty new and, even though it is mostly identical to the i386 linux emulation, it needs to be modified.

For NetBSD 4.0, kernel support for compat linux is not enabled by default as this is not as stable as other ports. For this release, do not expect to run complicated linux applications yet, but basic ones should work. In the meantime, -current has made some progress...

Which X Window System is included in NetBSD 4.0?

Matthias Scheler: XFree86 4.5.0.

The XFree86 Project has unfortunately lost a lot of momentum. NetBSD is currently in the process of switching to the X.org X11 distribution. We initially tried integrating the monolithic X.org distribution into our xsrc source tree. But it never reached a state where it worked on all platforms and supported cross builds via build.sh.

After the X.org project changed to a modular distribution, it was obvious that pkgsrc is the best way to integrate X11 in the future. There is ongoing work mostly by Joerg Sonnenberger to make modular X11 in pkgsrc cross-buildable. When that work has been finished and properly integrated with the system installation, the XFree86 distribution in xsrc will probably be retired.

What's new in pkgsrc?

Joerg Sonnenberger: I'm only aware of NetBSD 4 being a requirement to use the cross-compiling support. The updates to pkg_install will be in NetBSD 4.1, so that is ruled out.

The cross-compiling support allows a small subset of pkgsrc to be built for any architecture running NetBSD using the output of build.sh. Currently this subset includes modular Xorg and a few applications like Xpdf as proof of concept. This is intended to replace the aging XFree86 in xsrc, but needs some more polishing work on the pkgsrc side.

What is proplib(3) and what can we use it for?

Jason Thorpe: proplib is a library for manipulating property lists. Property lists are collections of properties, typically stored in a dictionary. A dictionary is an associative array, essentially key-value pairs. The keys are strings, and the values are strings, numbers, opaque data, booleans, arrays, dictionaries, ...

This is handy in a variety of applications. For example, communicating structured data between userspace and the kernel, describing properties of devices, etc.
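To make the data model concrete, the sketch below builds such a dictionary in Python and round-trips it through the standard plistlib module. plistlib is used purely as a stand-in for the idea of a property list; it is not proplib, and the device properties shown are made up for the example.

```python
import plistlib

# A hypothetical set of device properties: string keys, and values that are
# strings, numbers, booleans, opaque data, arrays, or nested dictionaries.
device_properties = {
    "device-name": "wd0",
    "removable": False,
    "sector-size": 512,
    "opaque-id": b"\x00\x01\x02\x03",
    "partitions": ["wd0a", "wd0b"],
    "geometry": {"cylinders": 1024, "heads": 16},
}

# Serialize the dictionary so it can cross a boundary (for example, between a
# userspace program and another component), then parse it back.
serialized = plistlib.dumps(device_properties)
restored = plistlib.loads(serialized)
```

The receiving side gets back the same nested structure without either side having to agree on a fixed binary layout, which is what makes property lists convenient for passing structured data around.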

What can we do with NetBSD/Xen?

Manuel Bouyer: Well, we can do a lot of things, from merging different physical boxes into a single one (if you need different OSes for different tasks, for example), to easily creating test systems (it's very convenient for kernel development: a guest boots much faster than a regular PC). With the virtualization features of recent x86 CPUs (which are supported by Xen 3 with NetBSD as dom0), you can also boot plain i386 OSes. This can be used to test install media, for example, or to load systems that can't be paravirtualized (e.g., Windows).

Filesystems

What is the status of LFS (Log-structured File System)?

Phil Nelson: LFS is in use on about 1/2 of the 23 nodes of the WWU build cluster. Granted, it is running an older 4.0-Beta kernel, but they have been running just fine for quite a while. Currently all the machines in the WWU build cluster have been up 78+ days and continuously building. They are all i386 machines.

Would you like to describe the features of tmpfs?

Julio M. Merino Vidal: tmpfs was born as a replacement for MFS (the memory file system included in all BSDs, as far as I know) and as such it shares much functionality with it. It is basically an efficient, memory-based file system, which means that it uses part of the system's virtual memory to store files in a way that is more space- and speed-efficient than MFS. It supports all of the features expected from a Unix-style filesystem, including hard links, symbolic links, devices, permissions, file flags and NFS exportability, but it currently does not support sparse files.

As opposed to MFS, tmpfs file systems can grow and shrink automatically depending on the available memory if they are configured as such; their memory consumption can also be upper-limited. And even though the end user cannot directly notice it, tmpfs's code is much simpler than that of MFS, which means that it is easier to audit, test and optimize.

As you just mentioned, tmpfs is a bit different than MFS, but why did MFS need a replacement in the first place?

Julio M. Merino Vidal: The main problem of MFS is that it performs poorly, uses too much memory, and the used memory cannot be reclaimed unless the file system is unmounted. To understand why this happens, we need to see how MFS works, and to see that, we first need to outline how FFS is designed.

FFS (or UFS), the traditional BSD file system, is conceptually split in two different layers. The upper layer handles all the logic of the file system, including the management of directories, the common file operations, the routines to map data to blocks, etc. The lower layer lays out the data on disk, allocating blocks and inodes as needed, etc. This seems like a very nice design, but in the end, it imposes restrictions on the way the lower layer operates (or at least that's the impression I gathered; I'm not too familiar with FFS's code).

MFS is just a replacement of FFS's lower layer to operate on virtual memory. The approach it takes is dead simple: it allocates a contiguous block of memory and treats that as if it were a disk, organizing the contents of such memory region as disk blocks. As you can expect, such an approach has a lot of overhead because the file system uses an incorrect abstraction (disk blocks and cylinder groups) to manage memory.

tmpfs, on the other hand, is a much simpler filesystem which takes advantage of the fact that it is always memory-backed. It uses traditional structures, linked lists, arrays, memory pools and other sorts of data abstractions to represent the contents of the file system. These abstractions are easier to deal with and hence require fewer resources to manage.
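The contrast between the two designs can be shown with a toy memory accounting model in Python. Neither class reflects the real MFS or tmpfs code: the class names, the 4096-byte block size, and the per-byte accounting of the second class are assumptions chosen to keep the sketch short.

```python
import errno
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4096  # assumed block size for the disk-like layout


class BlockBackedFS:
    """MFS-like: reserve one contiguous region up front and carve it into blocks."""

    def __init__(self, nblocks: int) -> None:
        self.region = bytearray(nblocks * BLOCK_SIZE)  # held until "unmount"
        self.free_blocks: List[int] = list(range(nblocks))
        self.files: Dict[str, Tuple[List[int], int]] = {}

    def write(self, name: str, data: bytes) -> None:
        self.remove(name)  # replacing a file first releases its old blocks
        needed = max(1, -(-len(data) // BLOCK_SIZE))  # round up to whole blocks
        if needed > len(self.free_blocks):
            raise OSError(errno.ENOSPC, "file system full")
        blocks = [self.free_blocks.pop() for _ in range(needed)]
        for index, block in enumerate(blocks):
            chunk = data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
            start = block * BLOCK_SIZE
            self.region[start:start + len(chunk)] = chunk
        self.files[name] = (blocks, len(data))

    def remove(self, name: str) -> None:
        blocks, _length = self.files.pop(name, ([], 0))
        self.free_blocks.extend(blocks)  # blocks are reusable, memory is not freed

    def memory_in_use(self) -> int:
        return len(self.region)


class NativeFS:
    """tmpfs-like: ordinary in-memory structures, with an optional upper limit."""

    def __init__(self, limit: Optional[int] = None) -> None:
        self.limit = limit
        self.files: Dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        projected = self.memory_in_use() - len(self.files.get(name, b"")) + len(data)
        if self.limit is not None and projected > self.limit:
            raise OSError(errno.ENOSPC, "memory limit reached")
        self.files[name] = bytes(data)

    def remove(self, name: str) -> None:
        self.files.pop(name, None)

    def memory_in_use(self) -> int:
        return sum(len(data) for data in self.files.values())
```

Writing a one-byte file costs the block-backed model its whole preallocated region, and removing the file gives nothing back, while the second model's footprint follows the stored data up and down and stops at the configured limit.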

It is also interesting to mention that tmpfs grew a custom regression testing suite that was also well received by developers. But the way it was created was suboptimal, which made me start the ATF (Automated Testing Framework) project this year. We'll see a better testing suite in NetBSD 5.0, or at least I hope so! Oh, and by the way, FreeBSD also imported tmpfs in its source tree.

What is the relationship between puffs, FUSE and ReFUSE?

Antti Kantee: puffs stands for Pass-to-Userspace Framework File System. It is an interface and a framework implementation for userspace filesystems on NetBSD. The userspace filesystem interface itself is heavily influenced by the kernel virtual filesystem interface. puffs can be thought to consist of two parts: the mechanism for transporting file system requests from the kernel to userspace and a userspace library, libpuffs, for interfacing with the kernel and writing the userspace filesystem implementation.

FUSE, or Filesystem in Userspace, is another API for implementing userspace filesystems. It is native to Linux, but has been widely ported to other operating systems such as FreeBSD and OpenSolaris. FUSE is considered the standard interface for implementing userspace filesystems with numerous filesystems readily available.

ReFUSE is the implementation of the FUSE API for NetBSD. It is implemented on top of puffs and completely done in userspace. Other non-Linux operating systems implement FUSE compatibility in the kernel, but we believe the ReFUSE approach is the right way to address the issue: export the kernel filesystem interface in the most natural way possible (puffs) and implement compatibility in userspace (ReFUSE).
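The layering can be pictured as an adapter that sits entirely in userspace. The Python sketch below is a loose analogy and uses neither the puffs nor the FUSE API: it assumes, for illustration, that the lower interface addresses files by node and the compatibility interface addresses them by path, and every class and method name in it is hypothetical.

```python
import posixpath
from typing import Dict, List


class HelloPathFS:
    """A file system written against a path-based, FUSE-style interface."""

    FILES = {"/hello.txt": b"hello from userspace\n"}

    def readdir(self, path: str) -> List[str]:
        return [name.lstrip("/") for name in self.FILES] if path == "/" else []

    def read(self, path: str, size: int, offset: int) -> bytes:
        return self.FILES[path][offset:offset + size]


class PathToNodeAdapter:
    """Compatibility shim: node-based requests come in, path-based calls go out."""

    ROOT = 1

    def __init__(self, filesystem: HelloPathFS) -> None:
        self.filesystem = filesystem
        self.paths: Dict[int, str] = {self.ROOT: "/"}  # node -> path
        self.next_node = self.ROOT + 1

    def lookup(self, parent: int, name: str) -> int:
        """Resolve a name under a parent node and return a node for it."""
        parent_path = self.paths[parent]
        path = posixpath.join(parent_path, name)
        if name not in self.filesystem.readdir(parent_path):
            raise FileNotFoundError(path)
        for node, known_path in self.paths.items():
            if known_path == path:
                return node  # reuse the node handed out earlier
        node = self.next_node
        self.next_node += 1
        self.paths[node] = path
        return node

    def read(self, node: int, size: int, offset: int) -> bytes:
        """Translate a node-based read into the path-based call."""
        return self.filesystem.read(self.paths[node], size, offset)
```

The file system author only ever sees the path-based interface, and the translation layer can change along with the lower interface without touching file systems written against the stable one.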

A very important difference to note is API stability. Since the FUSE API is well-established, it is stable, and filesystems written for it will most likely continue to function for a long time. The puffs API is a "low-level" API which follows the NetBSD kernel virtual filesystem API fairly closely. As the virtual filesystem API on NetBSD evolves, so will the puffs API (and vice versa, if things go as planned). A benefit from this evolutionary incompatibility is the ability to use all the features provided by the kernel API at the best possible performance.

What is their status in NetBSD 4.0?

Antti Kantee: puffs is still under heavy development. The version of puffs found in 4.0 is a snapshot of what was in the tree at the time 4.0 was branched. This means that ReFUSE and therefore FUSE support is unfortunately not present. The most useful application for users is likely to be ssshfs, simple sshfs, which can be found in source form in the tree under src/share/examples/puffs/ssshfs. It implements sshfs functionality. This implementation was superseded by mount_psshfs(8) for NetBSD 5.0.

Technically there is no reason why support for FUSE could not be added to the NetBSD 4 branch. However, it requires backporting ReFUSE and some puffs features from the development branch and therefore a person with time and motivation to do the work, and, above all, test and support it.
