
Debugging a Zig Test Failure

Sep 25, 2023

As a computer programmer what do you do when faced with an unfamiliar error? Do you head for Google in hopes that some kind soul before you has made a sacrifice at the Altar of Shannon? Do you begin on a hero’s journey of printf statements? Perhaps you’re an intellectual who debugs from first principles by talking to yourself in the shower. No, you’re too sophisticated for that, your coffees are pour overs and your debugging is done in a debugger — you step through each line as meticulously as you weighed your 10g of coffee per 180ml of water this morning. Or finally, maybe you just “don’t have time for this shit” so you do a drive by on some poor schmuck’s issue tracker with the most vague report possible never to return again, officially making this “someone else’s problem”. Look, I get it, we’ve all been there. Sometimes you are 11 bugs deep and you just need this one win, and you need it right now.

But what if I told you there is another way? What if the art of debugging isn’t one of those things but a combination of all of them? What if I told you the key to good debugging is a rabid curiosity combined with the ability to ask the system questions about itself? Let me give you a glimpse of this world in the context of a recent error I debugged while standing up zig-0.11.0 on illumos.

NameTooLong

$ zig test lib/std/std.zig --zig-lib-dir lib --main-pkg-path lib/std --test-filter 'test.max file name component lengths'
Test [43/61] test.max file name component lengths... FAIL (NameTooLong)
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/os.zig:2726:25: 0x356d1e in mkdiratZ (test)
        .NAMETOOLONG => return error.NameTooLong,
                        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/os.zig:2682:9: 0x31faec in mkdirat (test)
        return mkdiratZ(dir_fd, &sub_dir_path_c, mode);
        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs.zig:1445:9: 0x31f97f in makeDir (test)
        try os.mkdirat(self.fd, sub_path, default_new_dir_mode);
        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs.zig:1480:29: 0x31fcc4 in makePath (test)
                else => |e| return e,
                            ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs.zig:1491:9: 0x35a2c0 in makeOpenPath (test)
        try self.makePath(sub_path);
        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs/test.zig:737:25: 0x35bc02 in testFilenameLimits (test)
        var maxed_dir = try iterable_dir.dir.makeOpenPath(maxed_filename, .{});
                        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs/test.zig:774:9: 0x35d2a6 in test.max file name component lengths (test)
        try testFilenameLimits(tmp.iterable_dir, &maxed_ascii_filename);
        ^
60 passed; 0 skipped; 1 failed.

Here we have one of Zig’s standard library tests reporting a failure. From the output alone we can surmise that the failure is with the mkdiratZ() function and that the test has something to do with checking “max file name component lengths”. The error NameTooLong points to something being larger than expected.

From here there are many places we could choose to visit next. Where to go next is the art of debugging and depends largely on your existing knowledge of “the system”. By “the system” I mean many things all at once: Zig, this test, the standard library, the operating system, the hardware it’s running on, etc. Debugging starts with your current body of knowledge and ends by answering the question “what is happening”. Between those two points you may have to answer many other questions. Many times those questions will take you outside the bounds of your knowledge. The trick is to gently expand your horizon until it sheds light on the ultimate question you want answered. You do this with a rabid curiosity along with the ability to ask questions of the system.

When I first saw this test failure my existing body of knowledge clued me in to some basic facts about this error.

Oftentimes the first question I like to answer is “what sequence of steps led to the failure”? Just like we refer to software as “high level” and “low level”, we can answer a question like this at ever deepening levels of detail. For example, the NameTooLong failure happens when I run Zig’s stdlib test suite. Going another level down I know it happens when I run the max file name component lengths test. Another level down I know it happens when invoking Zig’s mkdiratZ() function. Repeating this process eventually brings you to some bedrock upon which your overall understanding can rest. First you focus your eyes to the trees, then you unfocus them to the forest.

In this case Zig was nice enough to provide a stacktrace of the events leading up to the failure. At the top of the stack is the mkdiratZ() call. After that we cross the system call boundary where things are handed off to the operating system; but I’m not worried about that just yet. I want to gather more data on what Zig is doing first. From the bottom of the stack we see the function testFilenameLimits() is called along with where that function is defined in the source. I’d like to go another level deeper and look at the test source to see what I can infer about this test.

lib/std/fs/test.zig
const maxed_ascii_filename = [_]u8{'1'} ** std.fs.MAX_NAME_BYTES;
try testFilenameLimits(tmp.iterable_dir, &maxed_ascii_filename);

Based on the Zig documentation I know that the first line is using the array multiplication operator ** to build an array of repeating 1 bytes of size std.fs.MAX_NAME_BYTES. This array is the input to our failing test function testFilenameLimits(). It’s not a terrible bet that maxed_ascii_filename is the same path that is passed to mkdirat. I’ll make that assumption for now while keeping in mind that I could be wrong. Next I want to find the code that defines MAX_NAME_BYTES.

lib/std/fs.zig
/// This represents the maximum size of a UTF-8 encoded file name component that
/// the platform's common file systems support. File name components returned by file system
/// operations are likely to fit into a UTF-8 encoded array of this length, but
/// (depending on the platform) this assumption may not hold for every configuration.
/// The byte count does not include a null sentinel byte.
pub const MAX_NAME_BYTES = switch (builtin.os.tag) {
    .linux, .macos, .ios, .freebsd, .openbsd, .netbsd, .dragonfly => os.NAME_MAX,
    // Haiku's NAME_MAX includes the null terminator, so subtract one.
    .haiku => os.NAME_MAX - 1,
    .solaris => os.system.MAXNAMLEN,
    // Each UTF-16LE character may be expanded to 3 UTF-8 bytes.
    // If it would require 4 UTF-8 bytes, then there would be a surrogate
    // pair in the UTF-16LE, and we (over)account 3 bytes for it that way.
    .windows => os.windows.NAME_MAX * 3,
    // For WASI, the MAX_NAME will depend on the host OS, so it needs to be
    // as large as the largest MAX_NAME_BYTES (Windows) in order to work on any host OS.
    // TODO determine if this is a reasonable approach
    .wasi => os.windows.NAME_MAX * 3,
    else => if (@hasDecl(root, "os") and @hasDecl(root.os, "NAME_MAX"))
        root.os.NAME_MAX
    else
        @compileError("NAME_MAX not implemented for " ++ @tagName(builtin.os.tag)),
};

The code for std.fs.MAX_NAME_BYTES provides more clues.

At this point, based on my hunch that Zig is calling into the mkdirat system call, I feel compelled to read its man page. Man pages document the contract made between you and some other part of the system. They are paramount to understanding your operating system. Here is an abbreviated version of the mkdirat(2) page.

mkdirat(2)
SYNOPSIS


       #include <sys/types.h>
       #include <sys/stat.h>

       int mkdir(const char *path, mode_t mode);

       int mkdirat(int fd, const char *path, mode_t mode);


DESCRIPTION


       The mkdir() and mkdirat() functions create a new directory named by the
       path name pointed to by path. The mode of the new directory is
       initialized from mode (see chmod(2) for values of mode). The protection
       part of the mode argument is modified by the process's file creation mask
       (see umask(2)).

       The mkdirat() function behaves similarly to mkdir(); however, if path is
       a relative path, then the directory represented by fd is used as the
       starting point for resolving path. To use the processes current working
       directory, fd may be set to the value AT_FDCWD.

RETURN VALUES


       Upon successful completion, 0 is returned. Otherwise, -1 is returned, no
       directory is created, and errno is set to indicate the error.

ERRORS

       ENAMETOOLONG
                       The length of the path argument exceeds PATH_MAX, or the
                       length of a path component exceeds NAME_MAX while
                       _POSIX_NO_TRUNC is in effect.

There’s the ENAMETOOLONG error we see in the Zig stacktrace. There’s also mention of the NAME_MAX constant used by other systems in the Zig code. My hunch about a mismatch in max length is looking stronger and now is a good time to verify it by asking the operating system what it sees when it runs the test.

$ truss -f -t mkdirat -v mkdirat zig test lib/std/std.zig --zig-lib-dir lib --main-pkg-path lib/std --test-filter 'test.max file name component lengths'
14615/1:        mkdir("/export/home/rpz/.cache/zig", 0755)      Err#17 EEXIST
14615/1:        mkdirat(4, "/tmp/build_rpz/zig-0.11.0/zig-0.11.0/zig-cache", 0755) Err#17 EEXIST
14615/1:        mkdirat(6, "h", 0755)                           Err#17 EEXIST
14615/1:        mkdirat(6, "o/4e3b5e9d8ad22697aae9cfad6e27775d", 0755) Err#17 EEXIST
14615/1:        mkdirat(6, "z", 0755)                           Err#17 EEXIST
14615/1:        mkdirat(5, "z", 0755)                           Err#17 EEXIST
Semantic Analysis [6867] 14615/1:       mkdirat(5, "h", 0755)                           Err#17 EEXIST
14615/1:        mkdirat(5, "o/0163f9916a3eb95af3c16afab95a1433", 0755) Err#17 EEXIST
14615/1:        mkdirat(5, "z", 0755)                           Err#17 EEXIST
14615/1:        mkdirat(5, "z", 0755)                           Err#17 EEXIST
LLD Link... 14615/1:           Received signal #18, SIGCLD, in waitid() [default]
14615/1:              siginfo: SIGCLD CLD_EXITED pid=14616 status=0x0000
Test [43/61] test.max file name component lengths... 14618:        mkdir("zig-cache", 0755)                        Err#17 EEXIST
14618:  mkdirat(3, "tmp", 0755)                         Err#17 EEXIST
14618:  mkdirat(4, "NRsiNI8yMfae_o0l", 0755)            = 0
14618:  mkdirat(5, "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111", 0755) Err#78 ENAMETOOLONG
Test [43/61] test.max file name component lengths... FAIL (NameTooLong)
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/os.zig:2726:25: 0x356d1e in mkdiratZ (test)
        .NAMETOOLONG => return error.NameTooLong,
                        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/os.zig:2682:9: 0x31faec in mkdirat (test)
        return mkdiratZ(dir_fd, &sub_dir_path_c, mode);
        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs.zig:1445:9: 0x31f97f in makeDir (test)
        try os.mkdirat(self.fd, sub_path, default_new_dir_mode);
        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs.zig:1480:29: 0x31fcc4 in makePath (test)
                else => |e| return e,
                            ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs.zig:1491:9: 0x35a2c0 in makeOpenPath (test)
        try self.makePath(sub_path);
        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs/test.zig:737:25: 0x35bc02 in testFilenameLimits (test)
        var maxed_dir = try iterable_dir.dir.makeOpenPath(maxed_filename, .{});
                        ^
/tmp/build_rpz/zig-0.11.0/zig-0.11.0/lib/std/fs/test.zig:774:9: 0x35d2a6 in test.max file name component lengths (test)
        try testFilenameLimits(tmp.iterable_dir, &maxed_ascii_filename);
        ^
60 passed; 0 skipped; 1 failed.
14615/1:            Received signal #18, SIGCLD, in waitid() [default]
14615/1:              siginfo: SIGCLD CLD_EXITED pid=14618 status=0x0001
error: the following test command failed with exit code 1:

Here I’ve used truss to trace all mkdirat calls made by the Zig test process or any of its spawned children. And sure enough we see a call with a path consisting of a long sequence of "1" bytes which ultimately fails with ENAMETOOLONG (value 78 in errno.h). Based on what the mkdirat(2) man page says we should get ENAMETOOLONG when the path component exceeds NAME_MAX, but the Zig code for illumos is using some value named os.system.MAXNAMLEN. It would seem that Zig’s value is greater than the operating system’s, leading me to two new questions.

MAXNAMLEN

Smokey this is not 'Nam this is bowling, there are rules.

Walter — The Big Lebowski

There are several ways to discover the value of os.system.MAXNAMLEN, the easiest of which is to search for the symbol in the source. This is also a good excuse to get more familiar with Zig by manually tracking through the code a bit.

lib/std/os.zig
/// Applications can override the `system` API layer in their root source file.
/// Otherwise, when linking libc, this is the C API.
/// When not linking libc, it is the OS-specific system interface.
pub const system = if (@hasDecl(root, "os") and root.os != @This())
    root.os.system
else if (builtin.link_libc or is_windows)
    std.c
else switch (builtin.os.tag) {
    .linux => linux,
    .plan9 => plan9,
    .wasi => wasi,
    .uefi => uefi,
    else => struct {},
};

Finding the definition of os.system is easy, but you might be confused about how exactly it resolves given the code above. Let’s take a step back.

Zig is a programming language. Programming languages ship with a standard library that allows them to perform useful routines, some of which require access to the underlying “system”. In most cases that system is the operating system, but it could just as well be a “freestanding” target. A freestanding target is how one targets an embedded environment or writes their own operating system. But the most typical situation is one where the system is the operating system you are running on.

Not all operating systems are created equal, and how a userland program interacts with the operating system varies. For Linux, the API/ABI is the system call table. Linux does not ship a libc or any other system library or utility for that matter. This is the meaning behind the meme “I think you mean GNU/Linux”. Other operating systems make different choices. For example, FreeBSD gives you everything: kernel, libc, system libraries and tools, and a POSIX environment; it stands on its own. Illumos sits in the middle: it provides the kernel, libc, system libraries, system utilities, and most of the POSIX environment, but illumos itself is not an installable operating system. Rather, like Linux, it has distros which fill in the blanks. I’m running a distro called OmniOS which is geared towards server installs. Unlike Linux, the illumos API/ABI is libc, NOT the system call table. Not only that, but there is no static libc in illumos; you must link to it dynamically. In Linux you can choose from multiple libc providers, or you can go direct to the system call table; it’s up to you.

Going back to the Zig comment above, it should be more clear what they mean by “system”. Basically, Zig gives you the opportunity to completely redefine the system, link to libc, or to utilize its built-in implementation for the system. By the way, if you look at builtin.zig you’ll see there is no link_libc field defined; that’s because it’s generated on-the-fly as part of compilation via the Compilation.generateBuiltinZigSource() function. It’s set to true if the underlying system requires linking libc (like illumos) or if the programmer specifically demands it in the project’s build.zig file. You can view these generated values by running zig build-exe --show-builtin.

On illumos the system symbol resolves to std.c.solaris. In there we find MAXNAMLEN defined with a value of 511. The odd number feels a bit weird as I would have expected 512. Is this an off-by-one error? That can’t be because 511 is less than 512; so we would expect to never reach the limit. Rather than guess I should look at the definition for this constant. Normally I would cscope a checkout of the illumos-gate source, but I can also just grep the header files installed on my system.

$ ggrep -Enr '^#define[[:space:]]+MAXNAMLEN' /usr/include/*
/usr/include/dirent.h:46:#define        MAXNAMLEN       512             /* maximum filename length */
/usr/include/sys/fs/udf_inode.h:82:#define      MAXNAMLEN       255
/usr/include/sys/fs/ufs_fsdir.h:73:#define      MAXNAMLEN       255

So Zig thinks the value is 511 and the system thinks it’s 512. This makes sense because in Zig you prefer to pass a slice ([]const u8) which includes the length and has no need for NUL termination. Zig wants to keep 1 byte in reserve to properly NUL-terminate the string before passing it to the system. What about the value of the NAME_MAX constant that the man page referenced?

$ grep -Enr '^#define[[:space:]]NAME_MAX' /usr/include/*
/usr/include/limits.h:270:#define       NAME_MAX        255

The value Zig is using is greater than the value enforced by the system. Now is the time to find the smoking gun, to track down the precise location where the system enforces this limit and generates the error. But how do we do that when mkdirat is a system call? Tools like truss cannot help us here as they only trace things from the userland perspective; we need a view into the kernel. We could cscope the kernel code, but in cases like this it can often be ambiguous if we are looking at the correct code location or not. What we’d really like is the ability to place a dynamic printf in each kernel function indicating when it enters and returns, along with its return value. We could create a custom kernel fit for this purpose but it would be a massive undertaking and not a good use of our time. Ideally there should exist some system tool to perform such a feat on-the-fly without any additional installation or compiling of code.

Magical Dynamic printf

Pretend with me for a moment that we have a magical scripting language that lets us trace functions on-the-fly. This magical language is going to look something like AWK, where we have a sequence of patterns with optional predicates, and a block of associated actions.

<probe-pattern>,... [/<predicate>,.../] {
    <action1>;
    <action2>;
    ...
}

Let’s start with a script that prints all kernel function entries and returns along with the return value.

fbt:*:entry {
	printf("%s:%s\n", probefunc, probename);
}

fbt:*:return {
	printf("%s:%s => %d\n", probefunc, probename, retval);
}

The probe patterns consist of a <provider>:<probefunc>:<probename> sequence. In this case the provider is fbt which stands for Function Boundary Tracing. It allows us to trace all function entry/return points in the kernel. The probefunc variable is the name of the function, and the probename variable indicates entry or return. So fbt:*:entry traces all kernel function entry probes; and from that you should be able to guess what fbt:*:return does. We grab the return value from the variable retval. These variables are “built-in”, meaning they are available implicitly and their value is determined by the context they are referenced in.

This is a good start, but we have a major problem: this traces all kernel functions all of the time. I want to add two additional features to limit probe firing only to when they are in service to a mkdirat system call.

syscall:mkdirat:entry { self->trace = 1; }
syscall:mkdirat:return { self->trace = 0; }

fbt:*:entry /self->trace/ {
	printf("%s:%s\n", probefunc, probename);
}

fbt:*:return /self->trace/ {
	printf("%s:%s => %d [0x%x]\n", probefunc, probename, retval, retval);
}

The syscall provider allows us to trace entry/return points for a given system call. We could use the fbt provider for the same purpose, but due to the 50 years of Unix history in illumos the kernel function names don’t always match the system call names as they are presented to users.

The self->trace declaration is a thread-local variable; it retains its value across probe firings as long as they happen in the context of the same thread. Said another way, the variable exists only in the thread that set it giving us a filtering mechanism to trace only the calls made in service to this thread. We set it to 1 to use as a truthy value in a probe’s predicate. We set it to 0 as a falsey value to disable the tracing.

With these changes we now print only function calls made in service to a mkdirat system call. This greatly reduces the output, but it could still be noisy as it includes all mkdirat calls made by all processes on the system. It would help if we could filter it further to only calls from a specific executable, but even then Zig might make quite a few mkdirat calls, especially when running a test suite. What we really want is the ability to trace the sequence of kernel function calls only when the mkdirat call results in a failure.

One way to do this is to use the script above and then post-process the output. That requires bringing in another tool and an additional step to get the information I want. Perhaps the magic scripting language could support this use case directly? What if the printf output could be written to a “speculative” buffer? And what if we could delay the decision to print the contents of that speculative buffer until the return of the mkdirat call and base that decision on the return value?

syscall:mkdirat:entry {
        self->speculation = new_speculation();
}

syscall:mkdirat:return /self->speculation/ {
	if (errno != 0) {
	        commit(self->speculation);
	} else {
	        discard(self->speculation);
	}

	self->speculation = 0;
}

fbt:*:entry /self->speculation/ {
	speculate(self->speculation);
	printf("%s:%s\n", probefunc, probename);
}

fbt:*:return /self->speculation/ {
	speculate(self->speculation);
	printf("%s:%s => %d [0x%x]\n", probefunc, probename, retval, retval);
}

Upon entry to a mkdirat we create a new speculation buffer local to the running thread. As fbt probes fire we check for an active speculation and perform a printf to its buffer. Finally, on return from mkdirat, we decide to commit or discard the buffer based on the return value.

At this point you might think I’m a bit crazy to propose such an advanced tool — that’s some great vaporware you got there Ryan. But it’s not a proposal, merely a watered down description of a language that has existed for 20 years.

DTrace

What I’ve just described is a small subset of what is possible in DTrace. DTrace is an AWK-like scripting language that allows you to ask questions about almost any aspect of the system in a dynamic and production-safe manner. It’s available on all illumos-based systems as well as Solaris, macOS, FreeBSD, Windows, and other systems. On illumos it doesn’t require any special kernel modules or compilation — it’s there waiting and ready for the moment you need it. The differences between my make-believe script and the real one are mostly in syntax.

mkdir-err.d
/*
 * Flow trace all kernel function calls that lead to a failed mkdir
 * call. This script enables all FBT probes and should only be used
 * for exploratory purposes on a development system.
 */

#pragma D option quiet
/* Present the output in an intuitive "flow" output style. */
#pragma D option flowindent
/* Increase the speculation buffer size to avoid drops. */
#pragma D option specsize=512k
/* Increase the maximum number of bytes copied by copyinstr(). */
#pragma D option strsize=1024

syscall::mkdir*:entry {
	self->spec = speculation();
	speculate(self->spec);
	/* Either mkdir(2) or mkdirat(2). */
	this->path = copyinstr(probefunc == "mkdir" ? arg0 : arg1);
	printf("%s (len: %d) %s\n", probefunc, strlen(this->path), this->path);
}

syscall::mkdir*:return /self->spec/ {
	speculate(self->spec);
	printf("returned (errno=%d)\n", errno);
}

syscall::mkdir*:return /self->spec/ {
	if (errno == 0) {
		discard(self->spec);
	} else {
		commit(self->spec);
	}

	self->spec = 0;
}

fbt:::entry /self->spec/ {
	speculate(self->spec);
	printf("\n");
}

fbt:::return /self->spec/ {
	speculate(self->spec);
	/* arg1 may contain nonsense for a void return function. */
	printf("=> %d [0x%X]\n", arg1, arg1);
}

The script produced the following output.

mkdir-err.d output
2  => mkdirat                               mkdirat (len: 511) 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
  2    -> mkdirat
  2      -> fgetstartvp
  2        -> copyin
  2        <- kcopy                           => 0 [0x0]
  2        -> getf
  2          -> getf_gen
  2            -> set_active_fd
  2            <- set_active_fd               => -1818612763880 [0xFFFFFE58923B7318]
  2            -> set_active_fd
  2            <- set_active_fd               => -1818612763880 [0xFFFFFE58923B7318]
  2          <- getf_gen                      => -1816832987752 [0xFFFFFE58FC50AD98]
  2        <- getf                            => -1816832987752 [0xFFFFFE58FC50AD98]
  2        -> releasef
  2          -> clear_active_fd
  2          <- clear_active_fd               => -1818612763880 [0xFFFFFE58923B7318]
  2          -> cv_broadcast
  2          <- cv_broadcast                  => -1818612763880 [0xFFFFFE58923B7318]
  2        <- releasef                        => -1818612763880 [0xFFFFFE58923B7318]
  2      <- fgetstartvp                       => 0 [0x0]
  2      -> audit_getstate
  2      <- audit_getstate                    => 0 [0x0]
  2      -> vn_createat
  2        -> audit_getstate
  2        <- audit_getstate                  => 0 [0x0]
  2        -> pn_get
  2          -> kmem_alloc
  2            -> kmem_cache_alloc
  2            <- kmem_cache_alloc            => -1807166613760 [0xFFFFFE5B3C79D700]
  2          <- kmem_alloc                    => -1807166613760 [0xFFFFFE5B3C79D700]
  2          -> pn_get_buf
  2            -> copyinstr
  2            <- copystr                     => 0 [0x0]
  2          <- pn_get_buf                    => 0 [0x0]
  2        <- pn_get                          => 0 [0x0]
  2        -> lookuppnat
  2          -> lookuppnatcred
  2            -> lookuppnvp
  2              -> audit_getstate
  2              <- audit_getstate            => 0 [0x0]
  2              -> pn_fixslash
  2              <- pn_fixslash               => 0 [0x0]
  2              -> pn_getcomponent
  2              <- pn_getcomponent           => 78 [0x4E]
  2              -> vn_rele
  2              <- vn_rele                   => 2 [0x2]
  2            <- lookuppnvp                  => 78 [0x4E]
  2          <- lookuppnatcred                => 78 [0x4E]
  2        <- lookuppnat                      => 78 [0x4E]
  2        -> pn_free
  2          -> kmem_free
  2            -> kmem_cache_free
  2            <- kmem_cache_free             => -1812913504448 [0xFFFFFE59E5EF3F40]
  2          <- kmem_free                     => -1812913504448 [0xFFFFFE59E5EF3F40]
  2        <- pn_free                         => -1812913504448 [0xFFFFFE59E5EF3F40]
  2      <- vn_createat                       => 78 [0x4E]
  2      -> vn_rele
  2      <- vn_rele                           => 1 [0x1]
  2      -> set_errno
  2      <- set_errno                         => 78 [0x4E]
  2    <- mkdirat                             => 78 [0x4E]
  2  <= mkdirat                               returned (errno=78)

The trick here is to start at the bottom and trace the origin of the 78 (ENAMETOOLONG) value. We end up at the pn_getcomponent() function, which comes from the "Path Name utilities" code found in pathname.c.

uts/common/fs/pathname.c
/*
 * Get next component from a path name and leave in
 * buffer "component" which should have room for
 * MAXNAMELEN bytes (including a null terminator character).
 */
int
pn_getcomponent(struct pathname *pnp, char *component)
{
	char c, *cp, *path, saved;
	size_t pathlen;

	path = pnp->pn_path;
	pathlen = pnp->pn_pathlen;
	if (pathlen >= MAXNAMELEN) {
		saved = path[MAXNAMELEN];
		path[MAXNAMELEN] = '/';	/* guarantees loop termination */
		for (cp = path; (c = *cp) != '/'; cp++)
			*component++ = c;
		path[MAXNAMELEN] = saved;
		if (cp - path == MAXNAMELEN)
			return (ENAMETOOLONG);
	} else {
		path[pathlen] = '/';	/* guarantees loop termination */
		for (cp = path; (c = *cp) != '/'; cp++)
			*component++ = c;
		path[pathlen] = '\0';
	}

	pnp->pn_path = cp;
	pnp->pn_pathlen = pathlen - (cp - path);
	*component = '\0';
	return (0);
}

And now we have a third constant: MAXNAMELEN. Notice that this one has a vowel where the one defined in Zig does not. It’s defined in sys/param.h which is where various system parameters are kept.

uts/common/sys/param.h
/*
 * MAXPATHLEN defines the longest permissible path length,
 * including the terminating null, after expanding symbolic links.
 * TYPICALMAXPATHLEN is used in a few places as an optimization
 * with a local buffer on the stack to avoid kmem_alloc().
 * MAXSYMLINKS defines the maximum number of symbolic links
 * that may be expanded in a path name. It should be set high
 * enough to allow all legitimate uses, but halt infinite loops
 * reasonably quickly.
 * MAXNAMELEN is the length (including the terminating null) of
 * the longest permissible file (component) name.
 */
#define	MAXPATHLEN	1024
#define	TYPICALMAXPATHLEN	64
#define	MAXSYMLINKS	20
#define	MAXNAMELEN	256

And with that I have my smoking gun. The illumos kernel sets a limit of 256 bytes, including NUL, for a component name. Zig, on the other hand, believes it’s 512 (its 511 bytes plus the NUL). So where did MAXNAMLEN (notice the missing vowel again) come from and does it ever come into play?

head/dirent.h
#if defined(__EXTENSIONS__) || !defined(__XOPEN_OR_POSIX)

#define	MAXNAMLEN	512		/* maximum filename length */

This #if preprocessor directive that comes before the definition is important. It states that this constant is exposed only when asking for extensions or when we are not compiling for an XOPEN/POSIX environment. This has to do with the enforcement of standards and you can read more about this in the standards(7) man page. The section on “Feature Test Macros” speaks a bit to this __EXTENSIONS__ check. But the main takeaway is that extensions define features outside of a conforming POSIX environment.

standards(7)
   Feature Test Macros


       Feature test macros are used by applications to indicate additional sets
       of features that are desired beyond those specified by the C standard. If
       an application uses only those interfaces and headers defined by a
       particular standard (such as POSIX or X/Open CAE),  then it need only
       define the appropriate feature test macro specified by that standard. If
       the application is using interfaces and headers not defined by that
       standard, then in addition to defining the appropriate standard feature
       test macro, it must also define __EXTENSIONS__. Defining __EXTENSIONS__
       provides the application with access to all interfaces and headers not in
       conflict with the specified standard. The application must define
       __EXTENSIONS__ either on the compile command line or within the
       application source files.

MAXNAMLEN, MAXNAMELEN, and NAME_MAX

So what’s the correct constant to use? Technically speaking, none of them. The maximum component length is up to the filesystem and should be queried via pathconf(2). This system call and its related constant NAME_MAX were introduced way back in POSIX-1988. However, as I learned from illumos-11781, there is a history of using MAXNAMLEN in the BSDs. I did some spelunking in the unix-history-repo and found that BSD 4.3 Reno defined NAME_MAX but didn’t implement pathconf. Furthermore, its dir(5) page referred to MAXNAMLEN and its VFS lookup function used that constant to enforce component length. BSD 4.4 implemented pathconf and modified the VFS layer to use NAME_MAX, but dir(5) still referred to MAXNAMLEN. In fact, to this day FreeBSD’s dir(5) still documents MAXNAMLEN. So how did we end up with the wrong value? Fifty years of Unix history and a lot of confusion, that’s how.

Passing Result

The fix here is simple: patch the Zig code to use NAME_MAX for illumos just like the other platforms. After that the test passes as expected.

Request for Questions

Did you find this interesting? Do you have a question about your system that you don’t know how to answer? If so, send me an email at ryan@zinascii.com with your question and the tag RFQ in the subject line. If I have some insight to offer I will. If I think the question is fun I might even write a blog post on it.