[Looking for Feedback] - Fairness lib
Dear Boost Community,
I am writing to seek feedback on a new C++ library that I have been
developing (https://github.com/Sernior/fairness).
This library, named "*Fairness*," is still a work in progress.
Nonetheless, I believe that this is an opportune time to begin gathering
feedback.
*Fairness* is designed to provide fair synchronization primitives. While
the standard C++ library offers various tools for managing concurrency,
it does not provide mechanisms for handling fairness policies.
To introduce the most fundamental component of my library, the
"priority_mutex," I would like to share a simple example:
#include <thread>
#include <chrono>
#include <algorithm>
#include
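Since the example above is cut short in this message, here is a minimal sketch of the intended usage. The header path boost/fairness.hpp, the boost::fairness::priority_mutex<N> template, and the lock(priority) signature below are assumptions based on the repository README (with 0 assumed to be the highest priority), not a verbatim copy of the original example:

#include <thread>
#include <chrono>
#include <boost/fairness.hpp> // assumed header path

// Assumed API: a priority_mutex with 4 priority levels, where 0 is
// the highest priority and lock(p) queues the caller at priority p.
static boost::fairness::priority_mutex<4> pm;

void threadFunction(int priority, int criticalTimeMs) {
    pm.lock(priority); // under contention, lower p values are served first
    std::this_thread::sleep_for(std::chrono::milliseconds(criticalTimeMs));
    pm.unlock();
}

int main() {
    // Four contending threads; priority, not arrival order,
    // decides who enters the critical section next.
    std::thread t0(threadFunction, 0, 20); // most urgent
    std::thread t1(threadFunction, 1, 15);
    std::thread t2(threadFunction, 2, 10);
    std::thread t3(threadFunction, 3, 5);  // least urgent
    t0.join(); t1.join(); t2.join(); t3.join();
}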
To add the benchmark results, since the image did not show in my mail:

---------------------------------------------------------------------------------------------
Benchmark                                                            Time          CPU    Iterations
---------------------------------------------------------------------------------------------
priority_mutex_benchmark::PM_LockUnlock/threads:8                    1.64 ns      13.0 ns    54189280
standard_mutex_benchmark::STD_LockUnlock/threads:8                   0.712 ns     5.54 ns    125137360
slim_priority_mutex_benchmark::SLM_PM_LockUnlock/threads:8           3.27 ns      26.0 ns    27112392
spinlock_priority_mutex_benchmark::SPNLC_PM_LockUnlock/threads:8     0.825 ns     6.49 ns    111065720
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_LockUnlock/threads:8    3.41 ns    27.2 ns    25413960
recursive_priority_mutex_benchmark::R_PM_LockUnlock/threads:8        1.65 ns      13.1 ns    53737320
standard_recursive_mutex_benchmark::R_STD_LockUnlock/threads:8       0.960 ns     7.67 ns    105165456
shared_priority_mutex_benchmark::PM_S_LockUnlock/threads:8           1.62 ns      12.9 ns    54708152
standard_shared_mutex_benchmark::STD_S_LockUnlock/threads:8          0.965 ns     7.68 ns    106124968
shared_priority_mutex_benchmark::PM_S_SLockSUnlock/threads:8         1.58 ns      12.6 ns    54070432
standard_shared_mutex_benchmark::STD_S_SLockSUnlock/threads:8        0.721 ns     5.76 ns    117148936
_____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_long/threads:8                9541052 ns    52513517 ns    16
standard_mutex_benchmark::STD_pipeline_benchmark_long/threads:8               9774822 ns    51878916 ns    16
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_long/threads:8       9380027 ns    55017352 ns    16
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_long/threads:8    9531843 ns    75511719 ns    8
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_pipeline_benchmark_long/threads:8    9532650 ns    75912929 ns    8
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_long/threads:8    9539370 ns    52512176 ns    16
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_long/threads:8   9775127 ns    51879528 ns    16
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_long/threads:8       9531850 ns    76251304 ns    16
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_long/threads:8      9775146 ns    51880461 ns    16
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_long/threads:8      6484466 ns    51870805 ns    16
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_long/threads:8     6484449 ns    51869927 ns    16
_____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_gaming/threads:8               949731 ns    5213265 ns    136
standard_mutex_benchmark::STD_pipeline_benchmark_gaming/threads:8             1001777 ns    5193649 ns    136
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_gaming/threads:8      966555 ns    5557157 ns    128
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_gaming/threads:8    952582 ns    7604306 ns    96
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_pipeline_benchmark_gaming/threads:8    921612 ns    7355304 ns    88
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_gaming/threads:8    969296 ns    5291337 ns    136
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_gaming/threads:8   1001943 ns    5193937 ns    136
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_gaming/threads:8       921843 ns    7348795 ns    88
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_gaming/threads:8      999545 ns    5191391 ns    136
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_gaming/threads:8      648466 ns    5187308 ns    136
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_gaming/threads:8     648462 ns    5187299 ns    136
_____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_audio/threads:8                 96730 ns    525237 ns    1336
standard_mutex_benchmark::STD_pipeline_benchmark_audio/threads:8               102634 ns    521351 ns    1344
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_audio/threads:8        97671 ns    566508 ns    1264
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_audio/threads:8    91991 ns    734638 ns    936
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_pipeline_benchmark_audio/threads:8    97838 ns    780163 ns    928
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_audio/threads:8     96704 ns    525095 ns    1336
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_audio/threads:8   102726 ns    521373 ns    1344
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_audio/threads:8        92130 ns    734847 ns    904
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_audio/threads:8      102640 ns    521309 ns    1344
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_audio/threads:8       64865 ns    518881 ns    1352
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_audio/threads:8      64864 ns    518872 ns    1352
_____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_fast/threads:8                   1181 ns    7046 ns     95936
standard_mutex_benchmark::STD_pipeline_benchmark_fast/threads:8                  1537 ns    7263 ns     95040
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_fast/threads:8          1218 ns    9528 ns     78904
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_fast/threads:8    1004 ns    8017 ns     86744
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_pipeline_benchmark_fast/threads:8    1028 ns    8192 ns    85040
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_fast/threads:8       1177 ns    7084 ns     94584
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_fast/threads:8      1573 ns    7318 ns     96456
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_fast/threads:8          1380 ns    10979 ns    59928
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_fast/threads:8         1534 ns    7301 ns     97200
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_fast/threads:8          665 ns    5318 ns    131400
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_fast/threads:8         663 ns    5303 ns    131816
On 10/21/23 17:11, Federico Abrignani via Boost wrote:
Dear Boost Community,
I am writing to seek feedback on a new C++ library that I have been developing (https://github.com/Sernior/fairness). This library, named "*Fairness*," is still a work in progress. Nonetheless, I believe that this is an opportune time to begin gathering feedback.
*Fairness* is designed to provide fair synchronization primitives. While the standard C++ library offers various tools for managing concurrency, it does not provide mechanisms for handling fairness policies.
[snip]
The primary motivation behind creating this library is to enhance throughput in multi-threaded pipelines. To illustrate the concept more simply, imagine a pipeline with N tasks, some of which access a protected resource. In cases of contention, there exists an optimal order of resource accesses that minimizes overall execution time; the images at https://sernior.github.io/fairness/ show two different ways of ordering critical sections, one of which is much better than the other.
This library aims to empower developers who have profiled their software to manipulate this order with minimal performance overhead and as simply as possible.
While I do think priority-based synchronization primitives would be a very useful addition to Boost libraries, I find this motivation puzzling. Priorities are introduced to avoid starvation of specific (high-priority) threads at the expense of the less important threads and likely overall performance (because fairness is not free). The chart you show on the front page is nice, but it seems to me like a hand-tailored case that is not necessarily related to reality.

Specifically, there is no guarantee that processing times for different threads are aligned as depicted on your chart. For example, one of the threads could have a very long processing time compared to other threads, which would negate any effect on the total processing time from reordering threads using priorities. What priorities actually provide is minimizing the processing time of the high-priority thread, pretty much regardless of the low-priority threads, barring data starvation, of course. So I think you should probably reconsider the motivation or at least present some evidence that the library helps to achieve the stated goals in practice (and how it does that).
Priorities are introduced to avoid starvation of specific (high-priority) threads at the expense of the less important threads and likely overall performance (because fairness is not free). The chart you show on the front page is nice, but it seems to me like a hand-tailored case that is not necessarily related to reality.
Yes, the charts I provide are a simple hand-tailored case that I am using to visually explain a concept that may occur in the real world.
Specifically, there is no guarantee that processing times for different threads are aligned as depicted on your chart. For example, one of the threads could have a very long processing time compared to other threads, which would negate any effect on the total processing time from reordering threads using priorities. What priorities actually provide is minimizing the processing time of the high-priority thread, pretty much regardless of the low-priority threads, barring data starvation, of course. So I think you should probably reconsider the motivation or at least present some evidence that the library helps to achieve the stated goals in practice (and how it does that).
Right, but that is also why I said "After profiling your software".
There is no guarantee until a developer sees it happening.
Allow me to quote my own documentation:
"
These tools, if misused, have the potential to cause harm rather than
benefit. Careful implementation and understanding are crucial to harness
their benefits effectively.
"
What I meant by this sentence is exactly what you are saying:
These tools should be used by people who have first profiled their
software and found themselves in the situation where they thought:
"If only I had a way to tell my threads that a specific task should be
performed first ".
But I share your concern, as it is mine as well. I certainly need to
perform more studies to find out exactly the boundaries of when using
priority mutexes is better than not using them.
If you are interested, you can have a look at my pipeline benchmarks and
tinker with them:
https://github.com/Sernior/fairness/blob/main/benchmarks/pm/priority_mutex_b...
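Roughly, each pipeline benchmark has the following shape (a simplified sketch, not the actual benchmark code; the helper names, timing values, and the priority_mutex API are assumptions, as in the earlier example):

#include <array>
#include <chrono>
#include <thread>
#include <vector>
#include <boost/fairness.hpp> // assumed header path

static boost::fairness::priority_mutex<4> pm; // assumed API

// Busy-wait so the "work" actually occupies the CPU.
void spinFor(std::chrono::microseconds us) {
    auto end = std::chrono::steady_clock::now() + us;
    while (std::chrono::steady_clock::now() < end) {}
}

// One pipeline stage: private work, then the contended critical
// section at the given priority, then more private work.
void pipelineStage(int priority, int pre, int critical, int post) {
    spinFor(std::chrono::microseconds(pre));
    pm.lock(priority);
    spinFor(std::chrono::microseconds(critical));
    pm.unlock();
    spinFor(std::chrono::microseconds(post));
}

int main() {
    // Illustrative per-thread timings; tinkering with these (and with
    // the priorities) changes the total pipeline time under contention.
    std::array<int, 4> pre  = {200, 100, 300, 200};
    std::array<int, 4> crit = {100, 200, 100, 300};
    std::array<int, 4> post = {200, 300, 100, 100};
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(pipelineStage, i, pre[i], crit[i], post[i]);
    for (auto& t : threads) t.join();
}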
On 10/21/23 21:03, Federico Abrignani via Boost wrote:
Specifically, there is no guarantee that processing times for different threads are aligned as depicted on your chart. For example, one of the threads could have a very long processing time compared to other threads, which would negate any effect on the total processing time from reordering threads using priorities. What priorities actually provide is minimizing the processing time of the high-priority thread, pretty much regardless of the low-priority threads, barring data starvation, of course. So I think you should probably reconsider the motivation or at least present some evidence that the library helps to achieve the stated goals in practice (and how it does that).
But returning to your example: if one of the threads, let's call it *L*, has a very *L*ong processing time compared to the other threads, wouldn't you agree that it would be better if *L* had priority over the other threads, if you wanted to finish the tasks of all threads as fast as possible?
No, not necessarily. It is usually not a matter of long or short run time, but rather of what the bottleneck is, or what is required to finish fast.

To give an example, consider two threads. One produces data and puts it in a queue; the other dequeues the data and, say, writes it to a file or sends it over the network. Producing data is expensive, so the first thread may take a lot of processing to produce a piece of the data. Also, the queue has limited capacity and may overflow, in which case the writer will overwrite previously written data. In order to avoid that, you would typically prefer the second thread to have a higher priority, so that the enqueued data is dispatched ASAP and the queue is mostly empty. Doing so may increase the runtime of the first thread, as it may block on the queue for longer periods of time, especially if there are multiple readers.

Surely, there is a wide variety of use cases, some of them are more along the lines of what you're saying. My point, though, is that priorities are not about minimizing overall run time, as your documentation suggests. They are primarily about minimizing starvation or minimizing latency of *specific* threads or tasks at the expense of other threads and general performance. Whether that effect is used to achieve correctness, reduce latency or improve throughput is application-specific.
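To make that scenario concrete, here is a minimal sketch (the queue handling, counts, and the priority_mutex API are illustrative assumptions; a real implementation would use a condition variable or a bounded queue):

#include <queue>
#include <thread>
#include <chrono>
#include <boost/fairness.hpp> // assumed header path

// The consumer gets the higher priority (0) so enqueued data is
// dispatched ASAP and the queue stays mostly empty, even though this
// may lengthen the producer's run time; priorities here serve the
// latency of a specific thread, not the total run time.
static boost::fairness::priority_mutex<2> qm; // assumed API
static std::queue<int> q;

void producer() {
    for (int i = 0; i < 100; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1)); // expensive production
        qm.lock(1); // low priority: may wait behind the consumer
        q.push(i);
        qm.unlock();
    }
}

void consumer() {
    int consumed = 0;
    while (consumed < 100) {
        qm.lock(0); // high priority: drain the queue ASAP
        bool hadData = !q.empty();
        if (hadData) { q.pop(); ++consumed; } // e.g. write to file or network
        qm.unlock();
        if (!hadData) std::this_thread::yield(); // don't spin on an empty queue
    }
}

int main() {
    std::thread p(producer), c(consumer);
    p.join(); c.join();
}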
Surely, there is a wide variety of use cases, some of them are more along the lines of what you're saying. My point, though, is that priorities are not about minimizing overall run time, as your documentation suggests. They are primarily about minimizing starvation or minimizing latency of *specific* threads or tasks at the expense of other threads and general performance. Whether that effect is used to achieve correctness, reduce latency or improve throughput is application-specific.
I see your point. Yes, the reason I started this lib was loop-like pipelines, and maybe I got too "settled" on that specific use case while writing that piece of documentation. The producer-consumer problem is surely not the use case I started this library for, and who knows how many more use cases are out there.

But my point is: imagine you have a loop-like pipeline:

while (true) {
    update();
    render();
}

This could be a video game or any other graphical application. update() is performed by X threads, each one having a specific task. In order to gain maximum FPS, you want those threads to be able to finish as fast as possible. This is the use case I had in mind when I wrote that.

But I see your point now. I will remove that statement and instead provide examples of different use cases.

Thanks for the feedback.

Federico Abrignani.
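For illustration, a sketch of that frame-loop use case (spawning threads per frame only to keep the sketch short; the priority assignments and the priority_mutex API are assumptions, as before):

#include <thread>
#include <vector>
#include <boost/fairness.hpp> // assumed header path

// Shared game state touched by every update task each frame.
static boost::fairness::priority_mutex<4> stateMutex; // assumed API

void updateTask(int priority) {
    // ... task-private work ...
    stateMutex.lock(priority); // contended access to shared state
    // ... read/write shared game state ...
    stateMutex.unlock();
    // ... more task-private work ...
}

void render() { /* draw the frame */ }

int main() {
    for (int frame = 0; frame < 3; ++frame) { // while (true) in a real game
        std::vector<std::thread> workers;
        // Priorities chosen after profiling: the task known to gate the
        // frame gets priority 0 so the frame can finish sooner.
        for (int prio = 0; prio < 4; ++prio)
            workers.emplace_back(updateTask, prio);
        for (auto& w : workers) w.join();
        render();
    }
}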