Beware: this is a long post.

On 5/13/21 5:27 PM, André Almeida wrote:
Hi there,
I'm the author of futex2[0], a WIP new set of Linux's syscalls that allows userspace to write efficient sync mechanisms. I would like to hear from Boost's developers if the project would benefit from this new interface.
From Boost/sync's codebase, I can see that you are already familiar with futexes, but just in case:
[snip]
The detailed description of the API can be seen in the documentation patch[1]. Do you think that Boost would benefit from it?
Hi, and thank you for working on this, and especially for including 64-bit futex support in the latest patches.

I have already described some of the use cases in my earlier post on LKML[1], but I'll try to recap and expand on them here.

Boost contains many libraries, but only a few of them deal with thread synchronization directly:

- Boost.Atomic implements atomic operations and also basic wait/notify operations. It supports both inter-thread and inter-process synchronization.
- Boost.Interprocess implements inter-process communication primitives, including synchronization.
- Boost.Sync and Boost.Thread implement inter-thread communication primitives, including synchronization. (Note that Boost.Sync is not an officially accepted library yet; consider it a work in progress.)

A few other libraries are worth mentioning. Boost.Fiber and Boost.Log implement custom thread synchronization primitives that use the futex API directly. Some libraries may also be using low-level thread synchronization APIs, such as pthread and WinAPI, but not futex directly.

Of the libraries I mentioned, the prime user of futex2 would be Boost.Atomic. With the current implementation, which is based on the existing futex API, the important missing part is support for futex sizes other than 32 bits. This means that for atomics other than 32-bit, Boost.Atomic must use an internal lock pool to implement waiting and notifying operations, which increases thread contention. For inter-process atomics, this means that waiting must be done using a spin loop, which is terribly inefficient. So support for 8, 16 and 64-bit futexes would be very much needed here.

Another potential use case for futex2 is the mass locking algorithms[2] in Boost.Thread. Basically, the algorithm accepts a list of lockable objects (e.g. locks or mutexes) and attempts to lock them all before returning. Here, I imagine, the support for waiting on multiple futexes could come in handy.
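
To make the mass locking case more concrete, here is a minimal sketch of the generic try-and-back-off idea for two lockable objects. This is only an illustration of the general technique, not the actual Boost.Thread implementation; the name `lock_both` is hypothetical.

```cpp
#include <mutex>

// Hypothetical helper: lock two Lockable objects without risking deadlock.
// It only needs lock(), try_lock() and unlock(), so it cannot assume that
// either object exposes a futex.
template <typename Lockable1, typename Lockable2>
void lock_both(Lockable1& a, Lockable2& b) {
    for (;;) {
        {
            std::unique_lock<Lockable1> first(a);   // block on a
            if (b.try_lock()) {                     // try b without blocking
                first.release();                    // keep a locked on return
                return;
            }
        }                                           // b was busy: a is unlocked here
        {
            std::unique_lock<Lockable2> second(b);  // now block on b instead
            if (a.try_lock()) {
                second.release();                   // keep b locked on return
                return;
            }
        }                                           // a was busy: b is unlocked here
    }
}
```

With only this interface available, the algorithm has to block on one object at a time and retry. If all the objects exposed their futexes, the blocking step could presumably wait on all of them at once.
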
It should be noted that the algorithms are generic, so they must work on any type of lockable object, including those that do not use or expose a futex. The optimization is therefore neither trivial nor universally applicable. However, if the algorithm is applied to Boost.Thread primitives, and those expose a futex, this could work quite well.

Although Boost.Interprocess doesn't currently use futexes directly, I imagine it would benefit from them, not least because pthread does not provide robust condition variables, and robust mutexes alone are often not enough for organizing inter-process communication. Robust IPC is a recurring theme in Boost.Interprocess issues and PRs, so I think some solution is needed here, and futex could be a building block. In my LKML post I described one solution to this problem (implemented in a project outside Boost), and 64-bit futexes would be very useful there.

Alternatively, futex2 could offer a new API for implementing robust primitives in userspace. I know the current futex2 patch set does not implement robust futexes, and I'm not asking to implement them, but if there are plans to eventually add robust futexes, here is a thought. The new API should preferably support multiple users of this feature. That is, the kernel API should allow any piece of userspace code (not just libc) to mark individual futexes as robust, without having to maintain a common list of robust futexes in userspace. Currently, this list is maintained by libc internally, which prevents any futex user (other than libc itself) from using robust futexes. But this feature should probably be discussed with libc developers.

Other than the above, I can't readily think of potential use cases for futex2 in Boost. We do use futexes (the currently existing futex API) in Boost.Sync and other libraries and could use them elsewhere, but for primitives like mutexes, condition variables, semaphores and events the existing API is sufficient.
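
As an illustration of why the existing 32-bit API is enough for such primitives, here is a minimal sketch of a manual-reset event built on `FUTEX_WAIT` and `FUTEX_WAKE`. It is Linux-only and is not the Boost.Sync implementation; `futex_call` and `manual_event` are hypothetical names, and the sketch assumes that `std::atomic<std::uint32_t>` has the same layout as a plain `std::uint32_t`.

```cpp
#include <atomic>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Assumption: the atomic is a plain 32-bit word that can be used as a futex.
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "atomic word must be usable as a 32-bit futex");

// Hypothetical thin wrapper over the existing futex syscall.
inline long futex_call(std::atomic<std::uint32_t>& word, int op, std::uint32_t val) {
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(&word), op, val,
                   nullptr /* no timeout */, nullptr, 0);
}

// Manual-reset event: one 32-bit state word, 0 = not set, 1 = set.
class manual_event {
    std::atomic<std::uint32_t> state_{0};

public:
    void set() {
        state_.store(1, std::memory_order_release);
        futex_call(state_, FUTEX_WAKE_PRIVATE, INT_MAX);  // wake all waiters
    }

    void wait() {
        // FUTEX_WAIT blocks only while the word still equals 0, so a set()
        // that happens between the load and the syscall is not missed.
        while (state_.load(std::memory_order_acquire) == 0)
            futex_call(state_, FUTEX_WAIT_PRIVATE, 0);
    }

    bool is_set() const { return state_.load(std::memory_order_acquire) != 0; }
};
```

The whole state fits in one 32-bit word, which is why the word size limitation does not hurt here, unlike in the Boost.Atomic case.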

We currently don't implement NUMA-specific primitives, which might be a good future addition to Boost, but I can't tell whether the new futex2 API would be sufficient for that. Better NUMA support could be interesting for the thread pool implementation in Boost.Thread, but I'm not familiar with that code and don't know how useful futex2 would be there.

As for use cases outside Boost, the application that I described in the LKML post would benefit not only from 64-bit futexes but also from the ability to wait on multiple futexes. We are also using the futex bitset API in order to reduce the number of woken threads blocked on a futex. The bitset is used as a mask of events that each blocked thread subscribes to. When the notifying thread wakes the waiters, it sets the bitset to the mask of events that happened, so that only the threads that are waiting for those events are woken up. I think this could be emulated with multiple futexes in the futex2 design, although I'm not sure if that would be as efficient, as that would increase the number of futexes at least twofold in our case (since every thread most of the time subscribes to at least two events). I can provide more details on this use case, if you're interested.

[1]: https://lore.kernel.org/lkml/9557a62c-ab64-495b-36bd-6d8db426ddce@gmail.com/
[2]: https://www.boost.org/doc/libs/1_76_0/doc/html/thread/synchronization.html#t...
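
For reference, the event-mask scheme described above can be sketched with the existing `FUTEX_WAIT_BITSET` and `FUTEX_WAKE_BITSET` operations roughly as follows. This is a simplified, Linux-only illustration and not the code of the application in question; the event names and the `wait_for_events` and `notify_events` helpers are hypothetical.

```cpp
#include <atomic>
#include <cerrno>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Assumption: the atomic is a plain 32-bit word that can be used as a futex.
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "atomic word must be usable as a 32-bit futex");

// Hypothetical event identifiers; each event occupies one bit of the bitset.
enum : std::uint32_t { event_data = 1u << 0, event_shutdown = 1u << 1 };

// Blocks while the word still equals `expected`, subscribing the calling
// thread to the events in `mask`. Returns 0 when woken, or -1 with errno set
// (EAGAIN if the word no longer equals `expected`).
inline long wait_for_events(std::atomic<std::uint32_t>& word,
                            std::uint32_t expected, std::uint32_t mask) {
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(&word),
                   FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG, expected,
                   nullptr /* no timeout */, nullptr, mask);
}

// Wakes only the waiters whose subscription mask intersects `happened`.
// Returns the number of threads woken.
inline long notify_events(std::atomic<std::uint32_t>& word, std::uint32_t happened) {
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(&word),
                   FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG, INT_MAX,
                   nullptr, nullptr, happened);
}
```

A thread interested in both events would pass `event_data | event_shutdown` as its mask and stay on a single futex. Emulating the same thing with wait-on-multiple would need one futex per event.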