[asio] monitoring io_context load
Hello,

I would like to have a metric that measures time spent doing work (executing completion handlers) versus time spent waiting for work in the thread pool executing io_context::run(). It would help estimate service capacity (i.e. how many more requests per second the service can handle). It is similar to the system's CPU usage metric, but CPU usage does not include waiting for mutex locks or synchronous I/O performed from completion handlers. In our performance tests the io_context becomes overloaded (the queue grows faster than it is processed) when CPU usage is about 85%; I would like a metric that shows 100% in that case.

There is the BOOST_ASIO_CUSTOM_HANDLER_TRACKING macro, which allows intercepting handler invocations and measuring execution time. But with many lightweight handlers, timing each one can add significant overhead. I'm considering adding similar macros before and after the waits on the condition variable and epoll, to measure wait time and then calculate execution time from it.

I'm writing to ask whether there is a better alternative. If not, would you be interested in a patch that adds "Handler Tracking" macros around the wait calls?

Thank you,
Dmitry
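For reference, a custom tracking header along the lines described above might look roughly like the following. This is a minimal, untested sketch, not a proposed patch: it only times handler invocations (it does not instrument the condition-variable or epoll waits), it ignores nested invocations, and the exact set of macros a custom tracking header must define varies between Boost.Asio versions (newer releases add macros such as BOOST_ASIO_HANDLER_LOCATION), so check boost/asio/detail/handler_tracking.hpp for your version. All names in the handler_timing namespace are illustrative.

// handler_timing.hpp: a minimal, untested sketch (illustrative names).
// Build every translation unit with something like:
//   -DBOOST_ASIO_CUSTOM_HANDLER_TRACKING=\"handler_timing.hpp\"
#pragma once

#include <atomic>
#include <chrono>
#include <cstdint>

namespace handler_timing {

// Total nanoseconds spent running completion handlers, across all threads.
inline std::atomic<std::uint64_t> busy_ns{0};

inline std::chrono::steady_clock::time_point& current_start()
{
    thread_local std::chrono::steady_clock::time_point start{};
    return start;
}

// Called just before a completion handler runs; the forwarded arguments
// (error codes, byte counts, ...) are ignored here.
template <typename... Args>
void invocation_begin(Args&&...)
{
    current_start() = std::chrono::steady_clock::now();
}

// Called right after the handler returns. Nested invocations (a handler
// running another handler inline) are not handled by this sketch.
inline void invocation_end()
{
    busy_ns.fetch_add(
        static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(
                std::chrono::steady_clock::now() - current_start()).count()),
        std::memory_order_relaxed);
}

} // namespace handler_timing

// Only the invocation hooks do real work; everything else is a no-op.
// The exact macro set required here depends on the Boost.Asio version.
#define BOOST_ASIO_INHERIT_TRACKED_HANDLER
#define BOOST_ASIO_ALSO_INHERIT_TRACKED_HANDLER
#define BOOST_ASIO_HANDLER_TRACKING_INIT (void)0
#define BOOST_ASIO_HANDLER_CREATION(args) (void)0
#define BOOST_ASIO_HANDLER_COMPLETION(args) (void)0
#define BOOST_ASIO_HANDLER_INVOCATION_BEGIN(args) \
  handler_timing::invocation_begin args
#define BOOST_ASIO_HANDLER_INVOCATION_END \
  handler_timing::invocation_end()
#define BOOST_ASIO_HANDLER_OPERATION(args) (void)0
#define BOOST_ASIO_HANDLER_REACTOR_REGISTRATION(args) (void)0
#define BOOST_ASIO_HANDLER_REACTOR_DEREGISTRATION(args) (void)0
#define BOOST_ASIO_HANDLER_REACTOR_READ_EVENT 1
#define BOOST_ASIO_HANDLER_REACTOR_WRITE_EVENT 2
#define BOOST_ASIO_HANDLER_REACTOR_ERROR_EVENT 4
#define BOOST_ASIO_HANDLER_REACTOR_EVENTS(args) (void)0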
On Fri, Jun 30, 2023 at 6:58 AM Dmitry via Boost-users wrote:

I would like to have a metric that measures time spent for work (executing completion handlers) vs time spent waiting for work in thread pool executing io_context::run().
Maybe there are some ideas here? https://github.com/XRPLF/rippled/blob/f18c6dfea7870132490124e1942901a6a0cddc...

Thanks
Thanks! We already have a similar metric. The problem is that the measured latency grows only after the io_context becomes overloaded, i.e. when the queue grows because all threads are busy. I would like to measure io_context load before it becomes overloaded, to estimate capacity.

Thanks,
Dmitry

On Fri, Jun 30, 2023 at 17:00, Vinnie Falco via Boost-users <boost-users@lists.boost.org> wrote:
Maybe there are some ideas here?
https://github.com/XRPLF/rippled/blob/f18c6dfea7870132490124e1942901a6a0cddc...
Thanks
What you are asking for is more or less possible, but what do you plan on doing with this information?
Thanks for asking! After thinking more about the metric, it does not seem helpful anymore.

There were two scenarios that I wanted to improve:

1. Estimating service capacity. For example, if io_context load goes above 80%, we should add new nodes to avoid latency spikes. But if we measure io_context load every second, then 80% may mean a very busy 800 ms followed by an idle 200 ms. During the busy 800 ms there may be large latency spikes (up to 800 ms): the io_context is overloaded, but the metric does not show it.

2. In investigations of user-facing latency issues, knowing that the io_context was overloaded would be very helpful, but the metric may not show that.

Scenario #2 is partially solved by the metric you suggested before (except for cases with very short operations that start and end between metric measurements).

Scenario #1: for now I have no ideas for it.
Regards,
Dmitry
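For context, the load figure discussed in scenario 1 would be derived from an accumulated busy-time counter (like busy_ns in the earlier sketch) roughly as follows; the fixed sampling window is exactly what hides the 800 ms busy / 200 ms idle pattern described above. The names and the default one-second interval are illustrative.

// A sketch of turning an accumulated busy-time counter into a load figure.
// Run sample_load() on its own thread; busy_ns stands in for a counter
// incremented by the handler-timing hooks (illustrative, not library code).
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

std::atomic<std::uint64_t> busy_ns{0}; // incremented by the handler hooks

void sample_load(unsigned num_threads,
                 std::chrono::milliseconds interval = std::chrono::seconds(1))
{
    std::uint64_t last = busy_ns.load(std::memory_order_relaxed);
    for (;;)
    {
        std::this_thread::sleep_for(interval);
        const std::uint64_t now = busy_ns.load(std::memory_order_relaxed);
        const double busy = double(now - last);
        const double capacity = double(num_threads) *
            std::chrono::duration_cast<std::chrono::nanoseconds>(interval).count();
        // 1.0 means every io_context thread spent the whole interval in handlers.
        // An 800 ms burst followed by 200 ms of idle still averages to 0.8 here,
        // which is the limitation described in scenario 1.
        std::cout << "io_context load: " << busy / capacity << '\n';
        last = now;
    }
}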
On Mon, Jul 3, 2023 at 6:00 AM Dmitry wrote:
What you are asking for is more or less possible, but what do you plan on doing with this information? ... Thanks for asking! After thinking more about the metric, it does not seem helpful anymore.
I would be careful using the information gained from measurements to inform algorithms for dealing with load. Note that io_context threads are not designed to perform long-running tasks; it is an unwritten rule that completion handlers should not block. They need to do their job and return as quickly as possible. Long-running work should be scheduled to a separate thread pool.

Thanks
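As an illustration of that rule of thumb (a sketch, not code from this thread): the blocking call runs on a separate boost::asio::thread_pool, and only a short continuation is posted back to the io_context. blocking_call() is a hypothetical stand-in for sync I/O or lock waits.

// A sketch of keeping blocking work off io_context threads.
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
#include <thread>

namespace asio = boost::asio;

int blocking_call() // hypothetical stand-in for sync I/O, lock waits, etc.
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return 42;
}

int main()
{
    asio::io_context ioc;          // runs only short completion handlers
    asio::thread_pool workers(4);  // absorbs the blocking work

    auto work = asio::make_work_guard(ioc); // keep ioc.run() alive while we wait

    asio::post(workers, [&]
    {
        const int result = blocking_call();
        // Hop back to the io_context; this handler is cheap and non-blocking.
        asio::post(ioc, [&, result]
        {
            std::cout << "result: " << result << '\n';
            work.reset();          // let ioc.run() return
        });
    });

    ioc.run();
    workers.join();
}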
You can create a wrapper around the Asio executor and incorporate
counters (or timestamps) to track tasks before and after their
execution.
For example, the following monitors the number of tasks in the executor queue:
https://godbolt.org/z/Mf9c5nfK4
https://gist.github.com/ashtum/19eb64eae51b150b4fc5086f9790c1dc
Thanks
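The linked examples are not reproduced here; as a much-simplified sketch of the same counting idea, the submission can be wrapped at the asio::post call site instead of at the executor level (which is the approach the message above describes). counted_post and the queued counter are illustrative names, not code from the gist.

// A simplified sketch: count tasks when they are submitted and again when
// they start executing; the difference approximates the queue depth.
#include <boost/asio.hpp>
#include <atomic>
#include <iostream>
#include <utility>

namespace asio = boost::asio;

std::atomic<long> queued{0}; // tasks submitted but not yet executed

template <typename Executor, typename F>
void counted_post(const Executor& ex, F&& f)
{
    queued.fetch_add(1, std::memory_order_relaxed);
    asio::post(ex,
        [g = std::forward<F>(f)]() mutable
        {
            queued.fetch_sub(1, std::memory_order_relaxed);
            std::move(g)();
        });
}

int main()
{
    asio::io_context ioc;

    for (int i = 0; i != 5; ++i)
        counted_post(ioc.get_executor(),
            [] { std::cout << "remaining in queue: " << queued.load() << '\n'; });

    std::cout << "queued before run(): " << queued.load() << '\n'; // prints 5
    ioc.run();
}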
participants (3)

- Dmitry
- Mohammad Nejati [ashtum]
- Vinnie Falco