Niall,
A correction: it turns out the logic for memory mapping in my program differed from
the fread/fwrite logic.
Once they were made equal, timing was exactly the same with both methods.
Thanks again for your help,
Aaron
On Sat, Apr 23, 2016 at 9:02 AM, Aaron Boxer wrote:
On Fri, Apr 22, 2016 at 2:18 PM, Niall Douglas wrote:
On 22 Apr 2016 at 10:31, Aaron Boxer wrote:
My impression is that memory mapping is best when reading a file more than once, because the first read gets cached in the virtual memory system, so subsequent reads don't have to go to disk. It also eliminates system calls, replacing them with simple buffer access.
Since memory mapping acts as a cache, it can create memory pressure on the virtual memory system, as pages need to be recycled for the next use. This can slow things down, particularly when reading files whose total size meets or exceeds current physical memory.
In my case, I am reading the file only once, so I think the normal file IO methods will be better. Don't know until I benchmark.
You appear to have a flawed understanding of unified page cache kernels (pretty much all OSs nowadays apart from QNX and OpenBSD).
Unless O_DIRECT is on, *all* reads and writes are memcpy()ied from/to the page cache. *Always*.
mmap() simply wires parts of the page cache into your process unmodified. Memory mapped i/o therefore saves on a memcpy(), and is therefore the most efficient cached i/o you can do.
If you are not on Linux, a read() or write() of >= 4Kb on a 4Kb aligned boundary may be optimised into a page steal by the kernel of that memory page into the page cache such that DMA can be directed immediately into userspace. But, technically speaking, this is still DMA into the kernel page cache as normal, it's just the page is wired into userspace already.
So basically you only slow down your code using read() or write(). Use mapped files unless the i/o is small enough that the cost of the memcpy() done by read() is lower than the cost of setting up a mmap(). The crossover is typically around 16Kb or so, but it depends on memory bandwidth pressure and processor architecture. That part you should benchmark.
Obviously all the above is with O_DIRECT off. Turning it on is a whole other kettle of fish, and I wouldn't recommend you do that unless you have many months of time to hand to write and optimise your own caching algorithm, and even then 99% of the time you won't beat the kernel's implementation which has had decades of tuning and optimisation.
Thanks a lot for the detailed explanation.
I tested this on Windows: fread/fwrite and memory mapping both gave the same performance in my use case. So it doesn't look like memory mapping will make much of a difference on Windows for my case. I need to test this on Linux.
Aaron
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost