[asio] async udp socket latency
I'm having difficulty with latency when running an async asio UDP socket. I don't have the same problem with an otherwise identical application that uses synchronous UDP. I'm communicating with a physical device that permits a maximum latency of 5 ms for each UDP packet, and I'm currently getting a latency of 8 ms with a standard deviation of 0.065 ms, which is surprisingly precise. I'm on Mac OS X 10.10.4 with Xcode 6.3.2 (6D2105).

Here are some of the key details about this problem:

- Sync UDP test program: works, staying under 5 ms latency 100% of the time for over an hour
  - Code
    - main file KukaFRIClientDataTest.cpp https://github.com/ahundt/grl/blob/master/test/KukaFRIClientDataTest.cpp
    - UDP socket communication class KukaFriClientData.hpp https://github.com/ahundt/grl/blob/master/include/grl/KukaFriClientData.hpp
  - Key lines: calls socket.receive_from() then socket.send() at the lowest level.
    - receive_from call https://github.com/ahundt/grl/blob/master/include/grl/KukaFriClientData.hpp#...
    - send call https://github.com/ahundt/grl/blob/master/include/grl/KukaFriClientData.hpp#...
- Async UDP test program: buggy, 8 ms with 0.065 ms std dev, continuously high latency
  - Code
    - main file KukaFRITest.cpp https://github.com/ahundt/grl/blob/master/test/KukaFRITest.cpp
    - High-level wrapper class KukaFRIThreadSeparator.hpp https://github.com/ahundt/grl/blob/master/include/grl/KukaFRIThreadSeparator...
    - Low-level wrapper class KukaFRI.hpp https://github.com/ahundt/grl/blob/master/include/grl/KukaFRI.hpp
  - Key lines: calls socket_.async_receive_from(), whose handler calls socket_.async_send_to(), which calls the final callback with the results.
    - async_receive_from call https://github.com/ahundt/grl/blob/master/include/grl/KukaFRI.hpp#L374
    - async_send_to call https://github.com/ahundt/grl/blob/master/include/grl/KukaFRI.hpp#L432

Does anyone have insight into why this latency issue may be occurring in the async UDP version? Thanks.

Cheers!
Andrew Hundt
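The figures above (8 ms mean, 0.065 ms standard deviation, a 5 ms per-packet limit) are summary statistics over per-packet latency samples. A minimal sketch of such a summary, assuming the samples have already been collected in milliseconds (for example as the interval between consecutive packets, timed with std::chrono::steady_clock). The struct and function names are hypothetical, not taken from the grl code, and the sketch uses the population standard deviation, which may differ from how the numbers above were computed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of per-packet latencies; all values in milliseconds.
struct LatencyStats {
    double mean_ms = 0.0;
    double std_ms = 0.0;          // population standard deviation
    double min_ms = 0.0;
    double max_ms = 0.0;
    std::size_t over_budget = 0;  // samples strictly above the budget
    std::size_t count = 0;
};

// Computes mean/std/min/max and counts samples exceeding budget_ms
// (5 ms for the device described above).
LatencyStats summarize(const std::vector<double>& samples_ms, double budget_ms) {
    LatencyStats s;
    s.count = samples_ms.size();
    if (samples_ms.empty()) return s;

    double sum = 0.0;
    for (double x : samples_ms) sum += x;
    s.mean_ms = sum / static_cast<double>(s.count);

    double squared = 0.0;
    for (double x : samples_ms) squared += (x - s.mean_ms) * (x - s.mean_ms);
    s.std_ms = std::sqrt(squared / static_cast<double>(s.count));

    const auto mm = std::minmax_element(samples_ms.begin(), samples_ms.end());
    s.min_ms = *mm.first;
    s.max_ms = *mm.second;

    s.over_budget = static_cast<std::size_t>(std::count_if(
        samples_ms.begin(), samples_ms.end(),
        [budget_ms](double x) { return x > budget_ms; }));
    return s;
}
```

Passing a 5.0 ms budget reports both the summary figures and how many packets actually missed the device's deadline, which is the number that matters for the robot.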
On 2 Jul 2015 at 3:12, Andrew Hundt wrote:
> I'm having difficulty with latency when running an async asio udp socket. I don't have the same problems with an identical application that uses synchronous udp. I'm communicating with a physical device which permits a maximum 5ms latency for each UDP packet, and I'm currently getting a latency of 8ms with a std dev of 0.065ms which is surprisingly precise.
You should try your code on Linux and especially FreeBSD first. OS X can be ... weird.

Secondly, I'd ask this on both Stack Overflow and the ASIO users mailing list, as there are fewer ASIO experts here.

Thirdly, try running a busy loop on a core during the tests. I usually fire up Python in a command box and run "while 1: pass".

I would also add that it is well known that async i/o incurs roughly 15% more latency than sync i/o. If you want absolute max performance, you create a thread per socket and let the kernel schedule you more efficiently. However, creating thousands of kernel threads is unwise on 32-bit platforms, and it comes with substantial demands on perfect architecture, on the choice and implementation of algorithms in your code, and on control over the target system. But if you really absolutely need minimum socket latency and you don't want to invest in a proprietary userspace networking stack, it's about the only way to go. Those with the most demanding minimum-latency requirements (HPC, hedge funds) can afford a proprietary userspace networking stack.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
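The busy-loop suggestion can also be done from inside the test program. The sketch below is a hypothetical C++ stand-in for running `while 1: pass` in a separate Python process: it keeps one thread spinning for as long as the object lives. The message does not say why this helps; a common motivation (an assumption here) is to keep the CPU from dropping into low-power idle states between packets.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Keeps one core busy for as long as the object lives: a C++ stand-in for
// running `while 1: pass` in a separate Python process during a test.
class BusySpinner {
public:
    BusySpinner()
        : thread_([this] {
              while (!stop_.load(std::memory_order_relaxed)) {
                  // Relaxed atomic counter: work the compiler cannot remove.
                  iterations_.fetch_add(1, std::memory_order_relaxed);
              }
          }) {}

    ~BusySpinner() {
        stop_.store(true, std::memory_order_relaxed);
        thread_.join();
    }

    BusySpinner(const BusySpinner&) = delete;
    BusySpinner& operator=(const BusySpinner&) = delete;

    std::uint64_t iterations() const {
        return iterations_.load(std::memory_order_relaxed);
    }

private:
    // Declared before thread_, so both are initialized before the thread starts.
    std::atomic<bool> stop_{false};
    std::atomic<std::uint64_t> iterations_{0};
    std::thread thread_;
};
```

Constructing a BusySpinner at the top of the latency test and letting it go out of scope at the end mirrors running the Python loop for the duration of the test.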
Niall is very thorough and careful, but your answer this time skipped a step.
On Jul 2, 2015, at 6:28 AM, Niall Douglas wrote:
> ... I would also add that it is well known that async i/o attracts a ~15% latency over sync i/o. If you want absolute max performance, you create a thread per socket, and let the kernel schedule you more efficiently. However creating thousands of kernel threads is unwise on 32 bit platforms, and comes with substantial demands on perfect...
There are plenty of challenging and valuable applications that use sockets and yet do not need to deal with thousands of connections. The missing step is “If you want absolute max performance -- and your application actually has to scale to thousands of sockets -- …”

I’ve had to deal with some really bad design and implementation by kids who’ve been taught that async i/o is “better”, without anyone bothering to mention the circumstances under which it is and is not better. It’s a hidden assumption that there are no applications but internet server applications. The astute reader might surmise that I’ve suffered enough pain over this issue that it’s become a hot button for me.

Andrew, you mentioned that you're talking to a physical device; that suggests that you may well not need to scale up to thousands of sockets (for example, if the network connects components within a large machine or the devices are on a LAN). In that case your solution is to use thread-per-socket. Not only will it perform better, it’ll be easier to debug and maintain. The fact that your sync test program works but the async program is buggy supports this point.

OK, I’ll get off my soapbox and go back into my cave…

Steve Clark
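A minimal sketch of thread-per-socket for a handful of devices. It assumes POSIX sockets (Linux, FreeBSD, OS X) and uses the raw recvfrom()/sendto() calls rather than asio; asio's synchronous ip::udp::socket::receive_from() and send_to() would slot into the same structure. Each device gets its own socket and its own thread, which blocks in recvfrom() and replies immediately with sendto(). The echo reply, the loopback address, the 100 ms receive timeout (used only so threads notice shutdown) and all names are placeholders, not the grl implementation.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#include <atomic>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <thread>
#include <vector>

// Opens a UDP socket bound to 127.0.0.1 on an OS-chosen port. The 100 ms
// receive timeout only exists so a blocked thread can notice shutdown.
int open_bound_udp_socket() {
    int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) throw std::runtime_error("socket() failed");
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the OS pick a free port
    timeval tv{};
    tv.tv_usec = 100000;  // 100 ms
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0) {
        ::close(fd);
        throw std::runtime_error("bind()/setsockopt() failed");
    }
    return fd;
}

// Per-socket loop: block in recvfrom(), then reply at once with sendto(),
// the same receive-then-send order as the sync program. The reply just
// echoes the datagram (placeholder for the real device protocol).
void serve_socket(int fd, const std::atomic<bool>& stop) {
    char buf[1500];
    while (!stop.load()) {
        sockaddr_in peer{};
        socklen_t len = sizeof peer;
        ssize_t n = ::recvfrom(fd, buf, sizeof buf, 0,
                               reinterpret_cast<sockaddr*>(&peer), &len);
        if (n < 0) continue;  // timeout or EINTR: re-check the stop flag
        ::sendto(fd, buf, static_cast<std::size_t>(n), 0,
                 reinterpret_cast<sockaddr*>(&peer), len);
    }
}

// Owns one socket and one dedicated thread per device.
class ThreadPerSocket {
public:
    explicit ThreadPerSocket(std::size_t devices) {
        try {
            for (std::size_t i = 0; i < devices; ++i) fds_.push_back(open_bound_udp_socket());
        } catch (...) {
            for (int fd : fds_) ::close(fd);
            throw;
        }
        for (int fd : fds_) threads_.emplace_back(serve_socket, fd, std::cref(stop_));
    }
    ~ThreadPerSocket() {
        stop_.store(true);
        for (auto& t : threads_) t.join();
        for (int fd : fds_) ::close(fd);
    }
    ThreadPerSocket(const ThreadPerSocket&) = delete;
    ThreadPerSocket& operator=(const ThreadPerSocket&) = delete;
    const std::vector<int>& sockets() const { return fds_; }

private:
    std::atomic<bool> stop_{false};
    std::vector<int> fds_;
    std::vector<std::thread> threads_;
};
```

With only a few devices this costs a few mostly-idle threads, and each thread is woken by the kernel directly when its datagram arrives, with no intermediate handler queue.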
Right now I have to scale up to a grand total of three devices. However, it is for a robot, so latency is critical. Considering what everyone mentioned, I guess I'll just try out the synchronous version first and stick with that.
On Thursday, July 2, 2015, Steven Clark wrote:
> Andrew, you mentioned that you're talking to a physical device; that suggests that you may well not need to scale up to thousands of sockets. (For example if the network connects components within a large machine or the devices are on a LAN.) In that case your solution is to use thread-per-socket. Not only will it perform better, it’ll be easier to debug and maintain. The fact that your test program works but the async program is buggy supports this point.
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- Cheers! Andrew Hundt
On 07/02/2015 09:12 AM, Andrew Hundt wrote:
> Does anyone have insight into why this latency issue may be occurring in the Async UDP version? Thanks.
I would expect the additional latency introduced by async operations to be sub-millisecond, so I suspect that something in your handlers is causing the delay.
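One way to test the suspicion that the handlers are causing the delay is to time each handler. The sketch below (C++14, hypothetical names) wraps any callable and records how long each invocation takes, in milliseconds. It measures only the time spent inside the handler, not the time a completed operation waits before its handler runs, and, being a plain lambda, it may not preserve Asio handler-customization hooks attached to the original handler, so treat it as a diagnostic rather than production code.

```cpp
#include <chrono>
#include <utility>
#include <vector>

// Wraps a completion handler (any callable) so that each invocation's
// duration, in milliseconds, is appended to `durations_ms`.
// `durations_ms` must outlive the returned wrapper.
template <typename Handler>
auto timed(Handler handler, std::vector<double>& durations_ms) {
    return [handler = std::move(handler), &durations_ms](auto&&... args) mutable {
        const auto start = std::chrono::steady_clock::now();
        handler(std::forward<decltype(args)>(args)...);
        const auto stop = std::chrono::steady_clock::now();
        durations_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    };
}
```

For example, `socket_.async_receive_from(recv_buffer, endpoint, timed(handler, durations))`, where `recv_buffer`, `endpoint`, `handler` and `durations` stand in for whatever the application already uses. If the recorded durations stay far below 5 ms, the delay is more likely outside the handlers.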
On Thu, Jul 2, 2015 at 12:12 AM, Andrew Hundt wrote:
> Does anyone have insight into why this latency issue may be occurring in the Async UDP version? Thanks.
Without taking the time to look at your code or google OSX...

When a thread yields on this platform, how long before it runs again? It is 10 ms on a number of platforms, which may cause you grief if you're trying to do something (asynchronously) more frequently than that, i.e. every 5 ms.

Jonathan
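This question can be answered empirically on the target machine. A sketch using only the C++11 standard library: it requests a short sleep and records how long the thread was actually away, and separately records the longest gap across std::this_thread::yield() calls. The function names, the 1 ms request and the iteration counts are illustrative assumptions.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Requests a sleep of `requested` and records how long the thread was
// actually away, in milliseconds. Overshoot beyond the request is the
// scheduler's wake-up latency.
std::vector<double> measure_sleep_ms(std::chrono::microseconds requested, int iterations) {
    std::vector<double> actual;
    actual.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        std::this_thread::sleep_for(requested);
        const auto t1 = std::chrono::steady_clock::now();
        actual.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return actual;
}

// Same idea for yield(): the longest time, in milliseconds, before the
// thread ran again across `iterations` calls.
double max_yield_gap_ms(int iterations) {
    double worst = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        std::this_thread::yield();
        const auto t1 = std::chrono::steady_clock::now();
        worst = std::max(worst, std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return worst;
}
```

If asking for 1 ms regularly comes back as several milliseconds on OS X, that supports the scheduling explanation; if it stays close to 1 ms, the cause is probably elsewhere.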
On Thu, Jul 2, 2015 at 5:00 PM, Jonathan Franklin <franklin.jonathan@gmail.com> wrote:
> When a thread yields on this platform, how long before it runs again? It is 10ms on a number of platforms, which may cause you grief if you're trying to do something (asynchronously) more frequently than that, i.e. 5 ms.
Very interesting! Thread priority could explain it, since the async calls yield when they are done, while the sync call returns from the underlying system call at the right time. Thanks!

Cheers!
Andrew Hundt
On Thu, Jul 2, 2015 at 6:18 PM, Andrew Hundt wrote:
> Very interesting! Thread priority could explain it since the async calls yield when they are done and the sync would be returned by the underlying system call at the right time. Thanks!
Interestingly, enabling realtime mode with `thread_policy_set()` didn't fix the problem, and actually increased the variation. I'm sure I could pick parameters that would reduce the variation, but I don't think that would solve the 8 ms problem.

mean = 8.000 [ms], std = 0.176, max/min = 17.615/0.059; 6999 iter.

Cheers!
Andrew Hundt
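For reference, a sketch of what enabling realtime mode with `thread_policy_set()` typically looks like on OS X, using the Mach time-constraint policy. The parameters actually used above are not given; the values in the usage below (5 ms period, 1 ms computation, 5 ms constraint) are assumptions loosely derived from the 5 ms budget. Values are converted from milliseconds to Mach absolute-time units with `mach_timebase_info()`. On non-Apple platforms the function simply returns false.

```cpp
#include <cstdint>
#if defined(__APPLE__)
#include <mach/mach.h>
#include <mach/mach_time.h>
#include <mach/thread_policy.h>
#endif

// Requests the Mach time-constraint ("realtime") policy for the calling
// thread. Arguments are in milliseconds: the thread expects to run every
// `period_ms`, needs about `computation_ms` of CPU per period, and that work
// must complete within `constraint_ms` of the period start.
// Returns true only if the kernel accepted the policy.
bool set_time_constraint_policy(double period_ms, double computation_ms,
                                double constraint_ms) {
#if defined(__APPLE__)
    mach_timebase_info_data_t timebase;
    if (mach_timebase_info(&timebase) != KERN_SUCCESS) return false;
    // Mach absolute time: nanoseconds = ticks * numer / denom.
    auto to_ticks = [&timebase](double ms) {
        return static_cast<std::uint32_t>(ms * 1e6 * timebase.denom / timebase.numer);
    };
    thread_time_constraint_policy_data_t policy;
    policy.period = to_ticks(period_ms);
    policy.computation = to_ticks(computation_ms);
    policy.constraint = to_ticks(constraint_ms);
    policy.preemptible = 1;
    mach_port_t thread = mach_thread_self();
    kern_return_t kr = thread_policy_set(thread, THREAD_TIME_CONSTRAINT_POLICY,
                                         reinterpret_cast<thread_policy_t>(&policy),
                                         THREAD_TIME_CONSTRAINT_POLICY_COUNT);
    mach_port_deallocate(mach_task_self(), thread);  // release the port reference
    return kr == KERN_SUCCESS;
#else
    (void)period_ms;
    (void)computation_ms;
    (void)constraint_ms;
    return false;  // Mach thread policies only exist on Apple platforms
#endif
}
```

Called as `set_time_constraint_policy(5.0, 1.0, 5.0)` from the thread that services the socket. As the result above shows, a scheduling policy alone may not remove a fixed 8 ms delay whose cause lies elsewhere.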
participants (5)
- Andrew Hundt
- Bjorn Reese
- Jonathan Franklin
- Niall Douglas
- Steven Clark