Erik, your problem is interesting.
Erik wrote:
...
When swapping back and forth,
the OS doesn't really know what's going on, whereas I might have the
knowledge of the order in which the different parts of the graph are
processed etc.
Then again, maybe this argument is bogus and the OS actually is better
at deciding what parts should be kept in memory for the moment.
---
You hit on a key point - you do have some knowledge of your graph's
semantics and the algorithms you're running so you it's likely that you
_can_ make some assertions about what the memory access patterns are likely
to be. I'll let Dave step in elaborate further on exactly what he was
talking about but I think that generally speaking, if your runtime heap
requirements exceed physical memory, then ultimately you need to worry about
paging virtual memory to/from physical memory. This brings up an interesting
question: how best to minimize the paging? Several general truisms: when
you're executing your graph algorithm (regardless of if using BGL or your
own hand-coded graph abstraction) you want to work hard to get the memory
footprint of the algorithm aligned with a page boundary and hopefully small
enough to remain resident in the CPU's cache memory. Similarly, as your
algorithm processes data addressed in process virtual memory space, you
would like to minimize the frequency of cross paging boundaries. Now this
depends on your graph topology and algorithm - but I guess you have some
data about this. I think Dave was eluding to the fact that you might be able
to control the memory layout of the graph data more deterministically if you
hand-code your graph abstraction to explicitly address this memory layout
issue. I can't speak for other operating systems, but at least on Windows
it's possible for a user mode application to manually allocate contiguous
virtual memory regions with calls through the Windows API to the kernel. You
can then use this huge chunk of contiguous virtual memory to store your data
using the knowledge you have about the order that things will be accessed in
order to minimize the number of page crossings during execution of your
algorithm. Tools like Intel's VTune are invaluable for evaluating your
runtime performance as it will give you cache and paging metrics. VTune is
available for Windows and now Linux as well (as is Intel's C++ compiler
(what I'm using) and high-performance SIMD instruction support libraries).
And when you're satisfied you can do no better, then if it's still not fast
enough (and it never is) then you can decide if shelling out the $$$ for
things like striped fiber channel drive arrays controlled by 64-bit PCI
controllers is worth it. They certainly make a difference - we used this
monsters a lot back when I was working on video editing stuff. Hope this
helps and good luck! What are you crunching out of curiosity (if you're at
liberty to say?)
- Regards
cdr
"Erik Arner"
David Abrahams wrote:
Erik Arner
writes: Modelling Sequence or RandomAccessContainer correctly for data that is not always in-memory is notoriously difficult. Typically it requires iterators which store their value_type internally, so dereferencing returns an internal reference to the iterator itself.
OK, as my knowledge regarding this is fairly limited, do you have any pointers to books or articles (on paper or on the web) that discuss this technique?
The Boost Iterator Adaptors library can be a help with constructing conforming iterators (also notoriously difficult).
Point taken.
Of course, the BGL's use of descriptors instead of heavy data may help get you around this problem.
If you have the time, please elaborate!
You might consider whether the problem gets any easier if you don't try to use adjacency_list, but instead model the Graph concept yourself. I've done that several times; it's not as hard as it might seem.
This certainly seems like a better solution than my original idea. Thanks!
/Erik
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/