[BGL] Using BGL for huge graphs
Hi,

Has anybody tried using BGL for really huge graphs? I mean graphs that are so big that there's no way to store the whole graph in memory. Any ideas on how it should be implemented?

My first thought is that it should be fairly straightforward: simply construct classes that model Sequence or RandomAccessContainer for the VertexList parameter etc. The classes store part of the data in memory and read/write data to disk when necessary.

Any pitfalls, gotchas or other stuff I should be aware of before embarking on this?

Thanks,
/Erik
Erik Arner writes:
My first thought is that it should be fairly straightforward: simply construct classes that model Sequence or RandomAccessContainer for the VertexList parameter etc. The classes store part of the data in memory and read/write data to disk when necessary.
Any pitfalls, gotchas or other stuff I should be aware of before embarking on this?
Modelling Sequence or RandomAccessContainer correctly for data that is not always in-memory is notoriously difficult. Typically it requires iterators which store their value_type internally, so dereferencing returns a reference into the iterator itself. The Boost Iterator Adaptors library can be a help with constructing conforming iterators (also notoriously difficult). Of course, the BGL's use of descriptors instead of heavy data may help get you around this problem.

You might consider whether the problem gets any easier if you don't try to use adjacency_list, but instead model the Graph concept yourself. I've done that several times; it's not as hard as it might seem.

HTH,

--
David Abrahams
dave@boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution
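To make the iterator problem concrete, here is a minimal sketch of the shape being described: an iterator over records that live out-of-core, which caches the current value inside itself and hands out a reference to that cache. Record and load_record() are invented for illustration; a real implementation would read from disk or a paging layer.

#include <cstddef>
#include <iterator>

struct Record { std::size_t id; };

// Stand-in for a real disk read.
Record load_record(std::size_t index) { Record r; r.id = index; return r; }

class disk_record_iterator
{
public:
    // Only input_iterator_tag is honest here: a ForwardIterator must
    // guarantee that *i stays valid after the iterator is advanced or
    // destroyed, which a reference into the iterator cannot do.
    typedef std::input_iterator_tag iterator_category;
    typedef Record                  value_type;
    typedef const Record&           reference;
    typedef const Record*           pointer;
    typedef std::ptrdiff_t          difference_type;

    explicit disk_record_iterator(std::size_t pos = 0) : pos_(pos) {}

    reference operator*() const
    {
        cached_ = load_record(pos_);  // fetch (or page in) on demand
        return cached_;               // a reference into *this
    }

    disk_record_iterator& operator++() { ++pos_; return *this; }

    bool operator==(const disk_record_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const disk_record_iterator& o) const { return pos_ != o.pos_; }

private:
    std::size_t pos_;
    mutable Record cached_;  // the storage the returned reference aliases
};

Because the returned reference dies with the iterator, such an iterator can honestly claim only input-iterator strength, which is exactly why modeling Sequence or RandomAccessContainer over out-of-core data is so hard.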
David Abrahams wrote:
Modelling Sequence or RandomAccessContainer correctly for data that is not always in-memory is notoriously difficult. Typically it requires iterators which store their value_type internally, so dereferencing returns a reference into the iterator itself.
OK, as my knowledge regarding this is fairly limited, do you have any pointers to books or articles (on paper or on the web) that discuss this technique?
The Boost Iterator Adaptors library can be a help with constructing conforming iterators (also notoriously difficult).
Point taken.
Of course, the BGL's use of descriptors instead of heavy data may help get you around this problem.
If you have the time, please elaborate!
You might consider whether the problem gets any easier if you don't try to use adjacency_list, but instead model the Graph concept yourself. I've done that several times; it's not as hard as it might seem.
This certainly seems like a better solution than my original idea. Thanks! /Erik
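For reference, here is a minimal sketch of what modeling the Graph concept directly can look like. The graph below is implicit - vertex i has a single out-edge to (i+1) % n - and stands in for a graph whose adjacency would be computed from data fetched off disk on demand. All names (ring_graph etc.) are illustrative, not part of the BGL.

#include <boost/graph/graph_traits.hpp>
#include <boost/iterator/counting_iterator.hpp>
#include <boost/iterator/transform_iterator.hpp>
#include <cstddef>
#include <utility>

struct ring_graph { std::size_t n; };  // stand-in for an on-disk graph

typedef std::size_t ring_vertex;
typedef std::pair<ring_vertex, ring_vertex> ring_edge;

// Function object turning an out-edge index into an edge descriptor.
struct ring_make_edge {
    typedef ring_edge result_type;
    ring_vertex u; std::size_t n;
    explicit ring_make_edge(ring_vertex u = 0, std::size_t n = 1) : u(u), n(n) {}
    ring_edge operator()(std::size_t) const { return ring_edge(u, (u + 1) % n); }
};

namespace boost {
  template <> struct graph_traits<ring_graph> {
      typedef ring_vertex vertex_descriptor;
      typedef ring_edge edge_descriptor;
      typedef directed_tag directed_category;
      typedef disallow_parallel_edge_tag edge_parallel_category;
      struct traversal_category
          : virtual incidence_graph_tag, virtual vertex_list_graph_tag {};
      typedef counting_iterator<ring_vertex> vertex_iterator;
      typedef transform_iterator<ring_make_edge,
                                 counting_iterator<std::size_t> > out_edge_iterator;
      typedef std::size_t vertices_size_type;
      typedef std::size_t degree_size_type;
      static vertex_descriptor null_vertex() { return vertex_descriptor(-1); }
  };
}

typedef boost::graph_traits<ring_graph> Traits;

// Free functions required by the VertexListGraph and IncidenceGraph concepts.
std::pair<Traits::vertex_iterator, Traits::vertex_iterator>
vertices(const ring_graph& g)
{
    return std::make_pair(Traits::vertex_iterator(0),
                          Traits::vertex_iterator(g.n));
}

Traits::vertices_size_type num_vertices(const ring_graph& g) { return g.n; }

ring_vertex source(const ring_edge& e, const ring_graph&) { return e.first; }
ring_vertex target(const ring_edge& e, const ring_graph&) { return e.second; }

Traits::degree_size_type out_degree(ring_vertex, const ring_graph&) { return 1; }

std::pair<Traits::out_edge_iterator, Traits::out_edge_iterator>
out_edges(ring_vertex u, const ring_graph& g)
{
    ring_make_edge f(u, g.n);
    return std::make_pair(
        Traits::out_edge_iterator(boost::counting_iterator<std::size_t>(0), f),
        Traits::out_edge_iterator(boost::counting_iterator<std::size_t>(1), f));
}

With an exterior color/property map keyed by the vertex number, generic BGL algorithms can then traverse such a graph without the graph ever being materialized in memory.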
Erik, your problem is interesting.
Erik wrote:
...
When swapping back and forth, the OS doesn't really know what's going on, whereas I might have the knowledge of the order in which the different parts of the graph are processed etc. Then again, maybe this argument is bogus and the OS actually is better at deciding what parts should be kept in memory for the moment.
---
You hit on a key point - you do have some knowledge of your graph's semantics and the algorithms you're running, so it's likely that you _can_ make some assertions about what the memory access patterns will be. I'll let Dave step in and elaborate further on exactly what he was talking about, but generally speaking, if your runtime heap requirements exceed physical memory, then ultimately you need to worry about paging virtual memory to/from physical memory. This brings up an interesting question: how best to minimize the paging?

Several general truisms: when you're executing your graph algorithm (regardless of whether you're using the BGL or your own hand-coded graph abstraction), you want to work hard to get the memory footprint of the algorithm aligned with a page boundary and hopefully small enough to remain resident in the CPU's cache memory. Similarly, as your algorithm processes data addressed in the process's virtual memory space, you would like to minimize how often it crosses page boundaries. How well you can do this depends on your graph topology and algorithm - but I gather you have some data about both.

I think Dave was alluding to the fact that you might be able to control the memory layout of the graph data more deterministically if you hand-code your graph abstraction to explicitly address this memory layout issue. I can't speak for other operating systems, but at least on Windows it's possible for a user-mode application to manually allocate contiguous virtual memory regions with calls through the Windows API to the kernel. You can then use this huge chunk of contiguous virtual memory to store your data, using the knowledge you have about the order in which things will be accessed, in order to minimize the number of page crossings during execution of your algorithm.

Tools like Intel's VTune are invaluable for evaluating your runtime performance, as they give you cache and paging metrics. VTune is available for Windows and now Linux as well (as are Intel's C++ compiler (what I'm using) and high-performance SIMD instruction support libraries).

And when you're satisfied you can do no better, and if it's still not fast enough (and it never is), you can decide whether shelling out the $$$ for things like striped Fibre Channel drive arrays controlled by 64-bit PCI controllers is worth it. They certainly make a difference - we used these monsters a lot back when I was working on video editing stuff.

Hope this helps and good luck! What are you crunching, out of curiosity (if you're at liberty to say)?
- Regards
cdr
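As a rough illustration of the Windows technique described above - assuming nothing about the actual graph layout - the following sketch reserves a large contiguous range of virtual address space with VirtualAlloc and commits pages only as they are needed; the sizes are arbitrary.

#include <windows.h>
#include <cstdio>

int main()
{
    const SIZE_T region_size = SIZE_T(1) << 30;  // reserve 1 GB of address space
    const SIZE_T chunk_size  = SIZE_T(1) << 20;  // commit 1 MB at a time

    // Reserve address space only; no physical storage is consumed yet.
    char* base = static_cast<char*>(
        VirtualAlloc(NULL, region_size, MEM_RESERVE, PAGE_NOACCESS));
    if (!base) { std::fprintf(stderr, "reserve failed\n"); return 1; }

    // Commit the first chunk; the OS now backs it with pagefile/RAM.
    if (!VirtualAlloc(base, chunk_size, MEM_COMMIT, PAGE_READWRITE))
    { std::fprintf(stderr, "commit failed\n"); return 1; }

    base[0] = 42;  // committed pages behave like ordinary writable memory

    VirtualFree(base, 0, MEM_RELEASE);  // release the entire region
    return 0;
}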
"Erik Arner"
David Abrahams wrote:
Erik Arner
writes: Modelling Sequence or RandomAccessContainer correctly for data that is not always in-memory is notoriously difficult. Typically it requires iterators which store their value_type internally, so dereferencing returns an internal reference to the iterator itself.
OK, as my knowledge regarding this is fairly limited, do you have any pointers to books or articles (on paper or on the web) that discuss this technique?
The Boost Iterator Adaptors library can be a help with constructing conforming iterators (also notoriously difficult).
Point taken.
Of course, the BGL's use of descriptors instead of heavy data may help get you around this problem.
If you have the time, please elaborate!
You might consider whether the problem gets any easier if you don't try to use adjacency_list, but instead model the Graph concept yourself. I've done that several times; it's not as hard as it might seem.
This certainly seems like a better solution than my original idea. Thanks!
/Erik
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Chris Russell wrote:
[lots of valuable tips about caching, paging etc]
Thanks for all the input - I guess the problem is much, much harder than I expected (as always... :-) My initial hope was to develop the app in BGL (since I'm somewhat familiar with it) and try it out on smaller examples just to get everything up and running. Then I could optimize the graph data structure for huge datasets without touching the actual code that uses the graph. Now I'm starting to think that this is a stupid idea and that I'm better off coding the thing from scratch with the huge dataset in mind.
And when you're satisfied you can do no better, and if it's still not fast enough (and it never is), you can decide whether shelling out the $$$ for things like striped Fibre Channel drive arrays controlled by 64-bit PCI controllers is worth it. They certainly make a difference - we used these monsters a lot back when I was working on video editing stuff.
Unfortunately it's not up to me - I'd have to convince my boss which could be... ahem... somewhat tricky :-)
Hope this helps and good luck! What are you crunching, out of curiosity (if you're at liberty to say)?
It's a tool for shotgun sequence assembly, a technique for puzzling together large quantities of short DNA sequences into longer contiguous sequences. Google on "shotgun sequencing" for more info. Again, thanks for valuable comments! /Erik
It's a tool for shotgun sequence assembly, a technique for puzzling together large quantities of short DNA sequences into longer contiguous sequences. Google on "shotgun sequencing" for more info.
Man - that's a cool project. My googling on "shotgun sequencing" could easily consume a week or two of full-time surfing. I'm using the BGL to build a generic framework for building generative programming applications. There's a tie-in somewhere between GP and genetics, but it's elusive to say the least. Still can't help musing about it though. Anyway, good luck with your sequencing! - cdr
Erik, how huge is huge? Heap requirements in excess of 2^64 bytes huge? If so, that's one _massive_ graph and some supercomputer programmer will have to help. If not, then why not leverage the full virtual memory space allocated to your process and let the OS worry about paging the data in/out of physical memory? It seems like if you can fit into 2^32 (Xeon workstation) or 2^64 bytes of address space, then an Itanium or Alpha (4-processor Alpha servers running DEC UNIX are real cheap on eBay these days) with a Fibre Channel drive array (striped config for maximum throughput) might do the trick. If this is even close to a reasonable suggestion, you might check out the non-linear video editing guys for ideas on drive arrays (e.g. http://www.avid.com, http://www.media100.com).

- Regards
cdr
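For comparison, on a POSIX system the "let the OS worry about paging" approach amounts to memory-mapping the on-disk data and touching it like ordinary memory. A sketch (the file name graph.dat is hypothetical):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("graph.dat", O_RDONLY);        // hypothetical graph file
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { std::perror("fstat"); return 1; }

    // Map the whole file: pages are faulted in from disk on first touch
    // and evicted by the kernel under memory pressure.
    void* base = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Tell the kernel we expect sequential access so it can read ahead.
    posix_madvise(base, st.st_size, POSIX_MADV_SEQUENTIAL);

    const unsigned char* data = static_cast<const unsigned char*>(base);
    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        sum += data[i];
    std::printf("checksum %lu\n", sum);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}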
"Erik Arner"
Hi,
has anybody tried using BGL for really huge graphs? I mean graphs that are so big that there's no way to store the whole graph in memory. Any ideas on how it should be implemented?
My first thought is that it should be fairly straightforward: simply construct classes that model Sequence or RandomAccessContainer for the VertexList parameter etc. The classes store part of the data in memory and read/write data to disc when necessary.
Any pitfalls, gotchas or other stuff I should be aware of before embarking on this?
Thanks, /Erik
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Chris Russell wrote:
Erik, how huge is huge? Heap requirements in excess of 2^64 bytes huge? If so, that's one _massive_ graph and some supercomputer programmer will have to help.
Well, 2^64 bytes is several orders of magnitude larger than I had in mind. I guess my graph would be something like 1000 GB tops. It's still a pretty massive graph though...
If not, then why not leverage the full virtual memory space allocated to your process and let the OS worry about paging the data in/out of physical memory?
This is definitely an alternative, but my worry is that this solution might be much slower than it needs to be. When swapping back and forth, the OS doesn't really know what's going on, whereas I might have the knowledge of the order in which the different parts of the graph are processed etc. Then again, maybe this argument is bogus and the OS actually is better at deciding what parts should be kept in memory for the moment.
It seems like if you can fit into 2^32 (Xeon workstation) or 2^64 bytes of address space, then an Itanium or Alpha (4-processor Alpha servers running DEC UNIX are real cheap on eBay these days) with a Fibre Channel drive array (striped config for maximum throughput) might do the trick. If this is even close to a reasonable suggestion, you might check out the non-linear video editing guys for ideas on drive arrays (e.g. http://www.avid.com, http://www.media100.com).
Thanks for the tip, it'll be worth looking into. /Erik
Chris Russell wrote:
Erik, how huge is huge? Heap requirements in excess of 2^64 bytes huge? If so, that's one _massive_ graph and some supercomputer programmer will have to help.
Well, 2^64 bytes is several orders of magnitude larger than I had in mind. I guess my graph would be something like 1000 GB tops. It's still a pretty massive graph though...
I think the term you're looking for is "External-Memory Graph Algorithms" - googling turns up some stuff which may be interesting. Note that this is an active area of research, so it may not be easy to make these algorithms work with the BGL...
If not, then why not leverage the full virtual memory space allocated to your process and let the OS worry about paging the data in/out of physical memory?
This is definitely an alternative, but my worry is that this solution might be much slower than it needs to be. When swapping back and forth, the OS doesn't really know what's going on, whereas I might have the knowledge of the order in which the different parts of the graph are processed etc.
Letting the OS worry about paging is usually not a good idea if you have some advance knowledge of how you'll use your data. Running times can be decreased by several orders of magnitude by paying attention to how memory is used. Check out papers on cache-aware, cache-oblivious and external-memory algorithms for further details...

rgds
Jeppe
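A toy demonstration of this point, independent of graphs and with arbitrary sizes: the same summation performed in two traversal orders over a matrix larger than the cache can differ in running time by an order of magnitude, purely because of the memory access pattern.

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    const std::size_t n = 4096;                  // 4096 x 4096 ints, ~64 MB
    std::vector<int> a(n * n, 1);
    long long sum = 0;

    typedef std::chrono::steady_clock clock_type;

    clock_type::time_point t0 = clock_type::now();
    for (std::size_t i = 0; i < n; ++i)          // row-major: sequential, cache friendly
        for (std::size_t j = 0; j < n; ++j)
            sum += a[i * n + j];

    clock_type::time_point t1 = clock_type::now();
    for (std::size_t j = 0; j < n; ++j)          // column-major: a 16 KB stride per step
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i * n + j];

    clock_type::time_point t2 = clock_type::now();
    std::printf("sum=%lld row=%lldms col=%lldms\n", sum,
        (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count());
    return 0;
}

The same reasoning applied to a graph means laying out vertices and edges on disk in the order the algorithm will visit them - which is what the external-memory literature is largely about.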
Jeppe N. Madsen wrote:
I think the term you're looking for is "External-Memory Graph Algorithms" - googling turns up some stuff which may be interesting.
Thanks - lots of interesting stuff indeed!
Note that this is an active area of research, so it may not be easy to make these algorithms work with the BGL...
Which is a pity since the BGL provides so much functionality out of the box. It would really be a pain to have to implement everything from scratch...
Check out papers on cache-aware, cache-oblivious and external-memory algorithms for further details...
Will do! Thanks, /Erik
participants (4)
- Chris Russell
- David Abrahams
- Erik Arner
- Jeppe N. Madsen