Niall,
A correction: it turns out the logic for memory mapping in my program differed from
the fread/fwrite logic.
Once they were made equal, timing was exactly the same with both methods.
Thanks again for your help,
Aaron
On Sat, Apr 23, 2016 at 9:02 AM, Aaron Boxer wrote:
On Fri, Apr 22, 2016 at 2:18 PM, Niall Douglas wrote:
On 22 Apr 2016 at 10:31, Aaron Boxer wrote:
My impression is that memory mapping is best when reading a file more than once, because the first read gets cached in the virtual memory system, so subsequent reads don't have to go to disk. It also eliminates system calls, replacing them with simple buffer access.
Since memory mapping acts as a cache, it can create memory pressure on the virtual memory system, as pages need to be recycled for the next use. This can slow things down, particularly when reading files whose total size meets or exceeds current physical memory.
In my case, I am reading the file only once, so I think the normal file IO methods will be better. Don't know until I benchmark.
You appear to have a flawed understanding of unified page cache kernels (pretty much all OSs nowadays apart from QNX and OpenBSD).
Unless O_DIRECT is on, *all* reads and writes are memcpy()ied from/to the page cache. *Always*.
mmap() simply wires parts of the page cache into your process unmodified. Memory mapped i/o therefore saves on a memcpy(), and is therefore the most efficient cached i/o you can do.
If you are not on Linux, a read() or write() of >= 4Kb on a 4Kb aligned boundary may be optimised into a page steal by the kernel of that memory page into the page cache such that DMA can be directed immediately into userspace. But, technically speaking, this is still DMA into the kernel page cache as normal, it's just the page is wired into userspace already.
So basically you only slow down your code using read() or write(). Use mapped files unless the i/o is small enough that the cost of the memcpy() done by read() is lower than the cost of setting up a mmap(). The crossover is typically around 16Kb or so, but it depends on memory bandwidth pressure and processor architecture. That part you should benchmark.
Obviously all the above is with O_DIRECT off. Turning it on is a whole other kettle of fish, and I wouldn't recommend you do that unless you have many months of time to hand to write and optimise your own caching algorithm, and even then 99% of the time you won't beat the kernel's implementation which has had decades of tuning and optimisation.
Thanks a lot for the detailed explanation.
I tested this on Windows: fread/fwrite and memory mapping both gave the same performance in my use case. So it doesn't look like memory mapping will make much of a difference on Windows for my case. I need to test this on Linux.
Aaron
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost