Mixed Signals (or, Why Hasn’t This Been Solved Yet?)

POSIX signals have a long history and at least a couple unpleasant limitations. For one thing, with some threading implementations (those with fewer processes than threads) you can’t reliably target a specific thread as a signal recipient. However, luckily for me, that is not my problem.

My problem is both organizational and technical. Signal handlers are shared by an entire process, and a new thread inherits the signal mask of the thread that created it. That means that masking or claiming a signal in your code may unintentionally impact code elsewhere in your application that expected signal delivery to work. This could in theory affect any codebase written by more than one person, but it really becomes an issue when your process uses code written by third parties.

The Background

We recently did some work to enable Android applications to use our MCAPI library. Most Android developers work with Java, with each application run in its own virtual machine. However, our MCAPI library is “native code” (i.e. C, not Java), and for that Android uses its own C library called “bionic” and its own threading implementation. The first problem is that bionic doesn’t implement one of the POSIX thread APIs: pthread_cancel().

As it so happens, we use pthreads in MCAPI for internal control messages. When the user de-initializes MCAPI, we need to shut those threads down, and so on Linux we ordinarily use pthread_cancel(). Since that’s unavailable in an Android environment, we implemented our own by sending a signal to wake our control thread. The thread is typically blocked in the kernel waiting for a hardware interrupt, so a signal causes it to be scheduled again, at which point it notices it should exit. Not a lot of code; tested on Linux and worked great; problem solved. When we ran it on Android though, it did nothing at all.
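The wake-by-signal idea described above can be sketched roughly as follows. This is not the MCAPI code: the names control_start() and control_stop() are hypothetical, and a pipe stands in for the hardware interrupt the real thread waits on. The handler is installed without SA_RESTART so that the blocking read() returns EINTR, and the stop routine resends the signal until the thread acknowledges, because a signal that lands just before the thread enters read() would otherwise be lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define WAKE_SIGNAL SIGUSR1      /* the signal the post chose first */

static atomic_int stop_requested;
static atomic_int thread_done;
static int wait_fd[2];           /* assumption: a pipe stands in for the hardware wait */

/* Empty handler: its only job is to make the blocking call fail with EINTR. */
static void wake_handler(int signo) { (void)signo; }

static void *control_thread(void *arg)
{
    char byte;
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        ssize_t n = read(wait_fd[0], &byte, 1);   /* blocks in the kernel */
        if (n < 0 && errno == EINTR)
            continue;                             /* woken by a signal: re-check the flag */
        if (n <= 0)
            break;                                /* EOF or a real error */
        /* a real implementation would handle the control message here */
    }
    atomic_store(&thread_done, 1);
    return NULL;
}

/* Hypothetical start routine: install the handler, then create the thread. */
int control_start(pthread_t *tid)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                              /* deliberately no SA_RESTART */
    if (sigaction(WAKE_SIGNAL, &sa, NULL) != 0 || pipe(wait_fd) != 0)
        return -1;
    atomic_store(&stop_requested, 0);
    atomic_store(&thread_done, 0);
    return pthread_create(tid, NULL, control_thread, NULL) == 0 ? 0 : -1;
}

/* Hypothetical stop routine: returns -1 if the thread never wakes up,
   which is what a blocked signal looks like from the outside. */
int control_stop(pthread_t tid)
{
    struct timespec pause = { 0, 1000000 };       /* 1 ms between attempts */
    int tries;
    atomic_store(&stop_requested, 1);
    for (tries = 0; tries < 1000 && !atomic_load(&thread_done); tries++) {
        pthread_kill(tid, WAKE_SIGNAL);
        nanosleep(&pause, NULL);
    }
    if (!atomic_load(&thread_done))
        return -1;
    pthread_join(tid, NULL);
    close(wait_fd[0]);
    close(wait_fd[1]);
    return 0;
}
```

Calling control_start() and then control_stop() should shut the thread down cleanly on a plain Linux process. The bounded retry loop is a design choice of this sketch; it turns "did nothing at all" into an error return instead of a hang.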

Remember how signals are process-wide? Well, as it turns out, Dalvik uses some signals for itself, including the signal we chose for MCAPI: SIGUSR1. When it came time to kill our thread, we sent the signal… but unbeknown to us, Dalvik code elsewhere in the application had masked SIGUSR1. Our thread never woke up and never exited.
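The post does not say how the mask reached our thread. One plausible route, and it is only an assumption here, is the POSIX rule that a new thread starts with a copy of its creator's signal mask. The sketch below uses hypothetical names (probe_thread, child_sees_blocked) to show that inheritance, and also a defensive option: the thread unblocks its wake signal itself before waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

struct probe {
    int signo;          /* signal to inspect */
    int unblock_first;  /* nonzero: thread unblocks signo before checking */
    int blocked;        /* result: 1 if blocked in the new thread, 0 if not */
};

static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    sigset_t set, cur;

    if (p->unblock_first) {
        sigemptyset(&set);
        sigaddset(&set, p->signo);
        pthread_sigmask(SIG_UNBLOCK, &set, NULL);  /* affects this thread only */
    }
    pthread_sigmask(SIG_SETMASK, NULL, &cur);      /* NULL set: query, change nothing */
    p->blocked = sigismember(&cur, p->signo);
    return NULL;
}

/* Block signo in the caller (standing in for the runtime that created us),
   start a thread, and report whether that thread sees signo as blocked.
   The caller's original mask is restored before returning. */
int child_sees_blocked(int signo, int unblock_first)
{
    struct probe p = { signo, unblock_first, -1 };
    sigset_t set, old;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, signo);
    pthread_sigmask(SIG_BLOCK, &set, &old);
    if (pthread_create(&tid, NULL, probe_thread, &p) == 0)
        pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return p.blocked;
}
```

Unblocking inside the thread only addresses the mask. It does nothing about the handler, which is still shared with whatever else in the process uses the same signal.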

The Solution (?)

The fix? Use SIGUSR2 instead. Works great; problem solved. ;) Longer term though, there’s no guarantee that Dalvik won’t start using that too, or an application will link with some other library that (like us) tries to use SIGUSR2. Since there is no standard API to request and reserve signal numbers, conflicts seem inevitable.

So what to do? The best general solution I can come up with is one that embedded software developers should be familiar with: punt the problem to the integrator. The developer who writes the application using our library should be able to configure MCAPI to use an arbitrary signal, which they ensure won’t conflict with the rest of the application and libraries through code inspection. (Sure hope their third-party libraries come with source code.)
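A configuration hook along those lines might look like the sketch below. The function names are hypothetical and not part of MCAPI. As a small sanity check it refuses a signal whose disposition is no longer the default, which catches code that installed a handler earlier but cannot catch code that claims or masks the signal later, so it is no substitute for the code inspection mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static int wake_signo = 0;   /* 0 means "not configured yet" */

/* Empty handler: delivery alone is enough to interrupt a blocking call. */
static void wake_handler(int signo) { (void)signo; }

/* Hypothetical hook: the integrator picks the signal the library will use.
   Returns 0 on success, -1 if the number is invalid, cannot be caught,
   or already has a non-default disposition. */
int wake_signal_configure(int signo)
{
    struct sigaction cur, sa;

    if (sigaction(signo, NULL, &cur) != 0)
        return -1;                       /* not a valid signal number */
    if ((cur.sa_flags & SA_SIGINFO) || cur.sa_handler != SIG_DFL)
        return -1;                       /* someone already handles or ignores it */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                     /* no SA_RESTART, so blocking calls return EINTR */
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;                       /* e.g. SIGKILL or SIGSTOP */

    wake_signo = signo;
    return 0;
}

/* Signal currently configured, or 0 if none. */
int wake_signal_current(void)
{
    return wake_signo;
}
```

An application would call something like wake_signal_configure(SIGUSR2) once, before initializing the library. There is still a window between the check and the install in which other code could claim the signal, so the check is advisory at best.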

That doesn’t feel very satisfying to me either.

Posted July 21st, 2010

Comments

Commented on 19 Aug 2010 at 13:54
By Randell Jesup

I’ve never understood the lack of good signal support in Linux; I think it’s a by-product of the minimalist filesystem-oriented IPC of older Unixes, and the influence of sockets and select().

The Amiga (and I believe Xinu, which the Amiga Exec was patterned after) had AllocSignal() and the like, which formed the base for most inter-thread/inter-process communication – for example, a MsgPort would allocate a signal which would get set on PutMsg(), waking up anyone waiting on the port (or waiting on the amalgam of signals gleaned from ports, direct signals (like ^C), filehandles (which devolves to the DOS/filesystem MsgPort), etc.). Yes, select does something roughly similar, but with a lot more overhead, and it also requires beating everything into a filesystem paradigm. This is part of why you see people implementing Unix “signal” behavior with “write(mypipe[1], "", 1)”. It works, but boy is it a silly way to implement an IPC signal.

Commented on 30 Aug 2010 at 05:45
By Thomas Ulber

What about using POSIX condition variables? They are in local scope for the thread, but have semantics similar to those of UNIX signals. As far as I can tell, condition variables are available on Android (android-ndk-r4b/docs/system/libc/OVERVIEW.TXT) and Linux.

Commented on 30 Aug 2010 at 09:30
By Hollis Blanchard

We do use condition variables, but we can’t do that here because our thread is blocked in the kernel (waiting for external input). That’s the thread we need to wake…
