[asio] Problem with async_read/async_read_some not reacting to incoming data
Hei,
We're rewriting a client with Boost.Asio but have run into some problems
when stress-testing it. The client fetches text and graphics from a
server over a single connection that is open at all times. When the
client is receiving large amounts of graphics data it will, after a
while, suddenly stop receiving; eventually it hits our timeouts, and
the last call made is an async_read/async_read_some. We are keeping the
server well-fed with requests, so graphics should be forthcoming
without pause. The problem has been seen on win32, linux and darwin
when testing on a gigabit network fetching raw 1080i graphics (4 MB per
field), and is most frequent on darwin. This is naturally an absolute
show-stopper for us.
So we are a bit at a loss as to what is going wrong and why
async_read/async_read_some stops reacting in the middle of the fetch
queue, despite Wireshark showing that the data is arriving. When using
compression on the data the problem is harder to reproduce, which might
suggest a race condition somewhere. But our code uses just a single
thread for the io_service, and all async communication is triggered
from this io thread, which holds a work object to keep the io_service
spinning. We are also making sure there is at most one async_read and
one async_write in flight at a time, roughly similar to the chat_client
sample.
Has anyone seen something similar, or have any input on how best to
figure out what goes wrong? Are there invariants that say you cannot
read and write at the same time?
Some symptoms are the same in each test. When we get the last image
from the socket the buffer size is zero afterwards, and the next
async_read request is for transfer_at_least(1). The async_read never
calls the handler for completion of this byte, so by then Nagle would
have kicked in. It has also proven fairly hard to strip this down to a
small example using a mock server.
I have included some stripped-down code below in case it might be
helpful in spotting something that we can't see.
Cheers,
Stig
Stack trace during timeout of the asio thread:
(gdb) bt
#0 0x91302f66 in kevent ()
#1 0x003e8010 in boost::asio::detail::kqueue_reactor<false>::run ()
#2 0x0045f70a in
boost::asio::detail::task_io_service ::do_one ()
#3 0x0045f8c3 in
boost::asio::detail::task_io_service ::operator() ()
#9 0x003fa62e in boost::detail::thread_data ::run ()
#10 0x003b97ce in thread_proxy ()
#11 0x913036f5 in _pthread_start ()
#12 0x913035b2 in thread_start ()

The connection header file:
#ifndef VCL_ASIO_CONNECTION_HPP
#define VCL_ASIO_CONNECTION_HPP
#include
> Has anyone seen something similar or have any input on how best to figure out what goes wrong?

We use asio for very intensive streaming/playback, and we have never encountered such an issue.

> Are there invariants that say you cannot read and write at the same time?

No, you can read and write at the same time.

> Some symptoms are the same in each test. When we get the last image from the socket the buffer size is zero afterwards, and the next async_read request is for transfer_at_least(1). The async_read never calls the handler for completion of this byte, so Nagle would have kicked in.

One of the main invariants in asio is that every async_read ends with a call to its handler. So you have to check two points: 1) ensure that async_read was really called, and 2) ensure that the data you're waiting for was *not* already received during the previous call to async_read.
Stig Sandø wrote:
> [snipped quote of the problem description above]
I would be suspicious of the 'incoming_request' queue: where is that data being popped from the queue? If it is not from the context of the io_service thread, then it is not thread-safe.
> [snipped quote]
[snip ...]

HTH
--
Bill Somerville
participants (3)
- Bill Somerville
- Igor R
- Stig Sandø