Stig Sandø wrote:
Hi,
We're rewriting a client with Boost.Asio but have run into some problems when stress-testing it. The client fetches text and graphics from a server over a single connection that is open at all times. When the client is receiving large amounts of graphics data it will, after a while, suddenly stop receiving data; eventually it hits our timeouts, and the last call issued is an async_read/async_read_some. We keep the server well fed with requests, so there should be graphics forthcoming without pause. The problem has been seen on win32, Linux and Darwin when testing on a gigabit network fetching raw 1080i graphics (4 MB for each field), and is most frequent on Darwin. This is naturally an absolute show-stopper for us.
So we are at a bit of a loss as to what is going wrong and why async_read/async_read_some stops reacting in the middle of the fetch queue, despite Wireshark showing that the data is still arriving. When we use compression on the data the problem is harder to reproduce, which might suggest a race condition somewhere. But our code uses just a single thread for the io_service, and all async communication is triggered from this io thread, which holds a work object to keep the io_service spinning. We also make sure there is at most one async_read and one async_write in flight at a time, roughly like the chat_client sample.
I would be suspicious of the 'incoming_request' queue: where is that data being popped from the queue? If it is not from the context of the io_service thread, then it is not thread safe.
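The safe pattern Bill is alluding to is to marshal all access to such state onto the io thread, e.g. via io_service::post. Below is a minimal boost-free model of that rule, assuming a hand-rolled `Loop` type (our invention) whose `post` is the only cross-thread entry point, so loop-owned state such as an incoming-request counter is only ever touched from the loop thread:

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Minimal model of the "marshal everything onto the io thread" rule.
// post() may be called from any thread (like io_service::post); run()
// is the single io thread, and only it executes the posted tasks.
struct Loop {
    std::mutex m;
    std::deque<std::function<void()>> tasks;
    std::atomic<bool> stopped{false};

    void post(std::function<void()> f) {      // safe from any thread
        std::lock_guard<std::mutex> lk(m);
        tasks.push_back(std::move(f));
    }

    void run() {                              // the single io thread
        while (!stopped) {
            std::function<void()> f;
            {
                std::lock_guard<std::mutex> lk(m);
                if (tasks.empty()) continue;  // busy-wait: fine for a sketch
                f = std::move(tasks.front());
                tasks.pop_front();
            }
            f();                              // runs on the loop thread only
        }
    }

    void stop() { stopped = true; }
};
```

If the real code pops 'incoming_request' from another thread without this kind of marshaling (or a lock), that is exactly the sort of race that shows up only under load.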
Has anyone seen something similar, or does anyone have input on how best to figure out what goes wrong? Are there invariants that say you cannot read and write at the same time? Some symptoms are the same in each test: when we get the last image from the socket, the buffer size is zero afterwards, and the next async_read request is for transfer_at_least(1). The async_read never calls the handler for completion of even this single byte, and by then Nagle would long since have kicked in. It is also fairly hard to strip this down to a small example using a mock server.
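For anyone following along: transfer_at_least(n) builds a completion condition that asio re-evaluates after every underlying read_some, and the handler should fire as soon as the running byte total reaches n. A sketch of that contract (a model, not asio's code; it ignores the error_code argument the real condition also receives, and `max_chunk` stands in for asio's internal maximum transfer size):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Model of asio's transfer_at_least completion condition: after each
// read_some, the condition is called with the running byte total; a
// return of 0 means "complete, invoke the handler", and any other
// value caps how many bytes the next read may attempt.
std::function<std::size_t(std::size_t)>
transfer_at_least_model(std::size_t minimum, std::size_t max_chunk = 65536) {
    return [=](std::size_t total_transferred) -> std::size_t {
        return total_transferred >= minimum ? 0 : max_chunk;
    };
}
```

So with transfer_at_least(1), a single arriving byte is enough to complete the operation, which is why a handler that never fires while Wireshark shows traffic points at the read never being re-armed, or at the completion being lost, rather than at the condition itself.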
I have included some stripped-down code below in case it might help spot something that we can't see.
Cheers, Stig
[snip ...] HTH -- Bill Somerville