Crossed Signals, A 15 Year Old Bug in a Feature You’ve Never Heard Of
3 Nov 2014
My interpretation (as well as Stevens’) of POSIX was incorrect. Section B.1.3 states that the rationale guidance is for the application developer, not the systems developer. Setting the sa_mask, in the application, is the appropriate way to achieve cross platform priority delivery. See Stevens’ own errata entry (p. 102) and Geoff Clare’s reply (#211) to a bug filed against POSIX.1-2004. Sorry for not catching this sooner and for any confusion this mistake has caused.
Your production operating system does not implement realtime signals correctly. Don’t worry, no one’s does—not unless they are running HP-UX64 11.11i. And if you are one of those five people, congratulations, you can now justify the coin you dropped for that license.
We first run the program under Solaris 2.6, but the output is not what is expected. The nine signals are queued, but the three signals are generated starting with the highest signal number (we expect the lowest signal number to be generated first). Then for a given signal, the queued signals appear to be delivered in LIFO, not FIFO, order…We now run the program under Digital Unix 4.0B and see the expected results…The Solaris 2.6 implementation appears to have a bug. Ste99 §5.7
It appears to have a bug indeed! I admire the dry euphemism deployed by the great Richard Stevens, but I must be blunt: not only did Solaris 2.6 have a bug, it got delivery order completely wrong. Was it opposite day when this code was written? Did anyone bother to verify the POSIX semantics were met? It seems not.
And lest you think I’m picking on old man Solaris, you’ll be happy to know modern day kernels such as illumos, FreeBSD, and yes, even your coveted Linux, all get it wrong. If I were to give this bug a theme, in tradition with the recent Heartbleed and Shellshock bugs, I’d have to call it the EverybodyGotItWrongButNoOneNoticed bug. I’m still working on a logo.
Realtime signals were codified 21 years ago in POSIX.1b. Yet most operating systems have implemented them incorrectly this entire time. I hope that my post may effect change, and we can finally close that bug Stevens reported 15 years ago.
POSIX.1b and Realtime Signals
Realtime signals are an extension to the familiar base signals used every day, such as the SIGSEGV trap that my C programs like to produce. They were standardized by POSIX.1b, published in 1993. (The original name was POSIX.4, an eponym of the working group. But someone couldn’t leave good enough alone. There was a “Grand Renumbering.” All POSIX.1 extensions were renamed to POSIX.1x. Just know that POSIX.4, POSIX 1003.1b-1993, and POSIX.1b all refer to the same thing: the realtime extensions to POSIX.1 Gal95.) The same document specified the ubiquitous mmap(2) and fsync(3C), and gave us better portable semaphores, timers, and IPC. Realtime signals extend base signals in three significant ways.
- Queueing: When the same signal number is generated multiple times, it shall be delivered multiple times. E.g., if realtime signal S is being delivered while two more S signals are generated, then those additional signals will form a queue. It is undefined whether base signals queue; implementations are free to collapse them.
- FIFO ordering: Multiple pending signals of the same number must be delivered in the order they were generated.
- Priority: Lower signal numbers must be delivered first. They preempt higher numbered signals. There is no defined order among base signals, nor for the case when both base and realtime signals are pending; it is up to the implementation. E.g., Linux gives base signals priority over realtime signals.
Overloading the signal number and giving higher priority to lower numbers is not accidental. This allowed reuse of the existing delivery mechanism while remaining efficient and simple to implement IEEE13 §B.2.4.
The concern of this post is with the last two points.
A 15 Year Old Bug
Figure 1 shows the output of Stevens’ realtime signals test running on OmniOS r151012 (a descendant of Solaris 2.6). Figure 2 shows the expected output. In the generation-and-a-half since Stevens ran his test, the LIFO bug has been fixed but priority inversion persists.
(The test, rtsignals/test1.c, can be obtained at the UNPV22e website.)
The good news is FreeBSD and Ubuntu produce the same incorrect output. But three popular kernels being wrong in the same way doesn’t change the fact that they are wrong. Surely, in the time that has passed since Fight Club was #1 at the box office, someone else must have noticed POSIX.1b’s inflamed sense of rejection.
According to the POSIX standard, multiple real-time signals pending to a process should be delivered in a strict order. Specifically, the lowest-numbered signal should be delivered first and multiple occurrences of signals with the same number should be delivered in FIFO order.
Current Linux kernel delivers the highest-numbered signals pending to a process first, not the lowest-numbered ones. This contradicts to the requirement explained above. The problem can be demonstrated by the following test program… Sal07a
This LKML thread from 2007 describes the same bug discovered by Stevens (along with an attached test and a patch for Linux 2.6.22.1). The author received exactly one response.
I believe you should check that you mask or signal in your signal handler. If you don’t the high-prio handler will be prempted by low-prio, and they will be executed in the reverse order. CAS07
On one hand, this person is correct: masking off the higher signals will prevent preemption. On the other hand, this solution is annoying. The operating system should enforce POSIX semantics, not the user! It turns out I’m not the only one who feels this way. The next reply mentions that two different Linux kernels showed different behavior.
When I ran the old test program using a vendor-specific heavily patched kernel the signals order was as the POSIX standard specified. Another kernel, which was closer to the vanilla kernel, did not show the expected behavior. Instead, the signals were handled in the reversed order. Sal07b
But enough with the history. Let’s get to the fun part, the actual reason for this bug.
Kernel and libc Sitting in a Tree: P-R-E-E-M-P-T
As a process exits a syscall the kernel checks for pending signals. If a handler is registered for the signal, by a previous call to sigaction(2), the kernel will invoke libc to perform delivery. In illumos this leads to call_user_handler(). Just before executing the handler, this function calls lwp_sigmask() to block the current signal being delivered and any signals specified in sa_mask. After setting the mask, but before returning, lwp_sigmask() checks for additional pending signals. If any exist it sets the t_sig_check flag, alerting the kernel to read pending signals on the next syscall exit. This is all fine and good, except for one small thing: lwp_sigmask() is a syscall.
Upon exiting lwp_sigmask(), the kernel notices t_sig_check is set and starts the delivery process all over again, preempting the delivery that was already in motion. This process repeats until all unique signal numbers are seen and masked off by the thread’s t_hold field. At this point the stack can begin to unwind, but the damage has already been done. The kernel delivers the signals in the correct order only to have userland invert them on the stack. Below is the output of a DTrace script demonstrating this effect.
See the complete output.
Ironically, the fix is not only simple but is spelled out directly in POSIX.
Given the specified selection of the lowest numeric unblocked pending signal, preemptive priority signal delivery can be achieved using signal numbers and signal masks by ensuring that the sa_mask for each signal number blocks all signals with a higher numeric value. IEEE13 §B.2.4
If SIGRTMAX=73 and realtime signal 47 is being delivered then call_user_handler() should set sa_mask to block signals 48 through 73. That way only lower-numbered, higher-priority signals may preempt its delivery. But wait, what if multiple instances of 47 are pending? It would preempt itself and cause LIFO ordering. To honor both priority and FIFO semantics, all signals greater than or equal to the realtime signal being delivered must be blocked during delivery. That brings me to the topic of non-deferred signals.
All signals are deferred by default. The signal being delivered is blocked while being delivered, preventing preemption by the same signal number. When registering a handler, the option SA_NODEFER is used to disable this behavior. POSIX does not discuss mixing realtime and nodefer together, but it should have. It is nonsensical to mix the two. SA_NODEFER is in fundamental disagreement with FIFO ordering.
Either sigaction(2) should fail with EINVAL or the delivery implementation should ignore the option when delivering a realtime signal.
Too Late to be Portable?
The only operating system I know to produce correct results is HP-UX64 11.11i. The other seven—OmniOS r151012, FreeBSD 10.0, NetBSD 6.1.5, Ubuntu 14.04, IRIX64 6.5.29, AIX 5.3, and AIX 6—managed to produce four unique orderings. The output of the rtsignal test for each operating system can be found in results.txt. Even if all kernels are patched tomorrow, realtime signal ordering cannot be truly portable without introducing some type of compile-time check. Until then, if you want maximum portability, you should manually block signals to achieve the proper ordering.
If you’d like to test your operating system then I have three options for you: 1) Stevens’ original test program, 2) my rtsignal test, a hybrid of the Stevens and LKML tests, or 3) rt_sig_test.c, based off the patch I wrote for illumos.
Acknowledgements
Thank you to my friend Andrew Thompson for firing up his HP c8000 and SGI Octane to run tests on the proprietary Unices. Jared Morrow and Tom Santero for reviewing this post. Garret D’Amore for reviewing my illumos patch. And all those who provided input during my investigation: Bryan Cantrill, Bob Friesenhahn, Andrew Gabriel, Robert Mustacchi, and Rafel Vanoni.
References
CAS07 |
|
Gal95 |
|
IEEE13 |
|
Sal07a |
|
Sal07b |
|
Ste99 |