[BGL] Using BGL for huge graphs
Hi,

Has anybody tried using BGL for really huge graphs? I mean graphs that are so big that there's no way to store the whole graph in memory. Any ideas on how it should be implemented?

My first thought is that it should be fairly straightforward: simply construct classes that model Sequence or RandomAccessContainer for the VertexList parameter etc. The classes store part of the data in memory and read/write data to disk when necessary.

Any pitfalls, gotchas or other stuff I should be aware of before embarking on this?

Thanks,
/Erik
Erik Arner writes:
My first thought is that it should be fairly straightforward: simply construct classes that model Sequence or RandomAccessContainer for the VertexList parameter etc. The classes store part of the data in memory and read/write data to disk when necessary.
Any pitfalls, gotchas or other stuff I should be aware of before embarking on this?
Modelling Sequence or RandomAccessContainer correctly for data that is not always in-memory is notoriously difficult. Typically it requires iterators which store their value_type internally, so dereferencing returns a reference into the iterator itself. The Boost Iterator Adaptors library can be a help with constructing conforming iterators (also notoriously difficult). Of course, the BGL's use of descriptors instead of heavy data may help get you around this problem.

You might consider whether the problem gets any easier if you don't try to use adjacency_list, but instead model the Graph concept yourself. I've done that several times; it's not as hard as it might seem.

HTH,

--
David Abrahams
dave@boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution
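To make the iterator problem concrete, here is a minimal sketch of the shape being described: an iterator over records that live out-of-core, which caches the current value inside itself and hands out a reference to that cache. Record and load_record() are invented for illustration; a real implementation would read from disk or a paging layer.

#include <cstddef>
#include <iterator>

struct Record { std::size_t id; };

// Stand-in for a real disk read.
Record load_record(std::size_t index) { Record r; r.id = index; return r; }

class disk_record_iterator
{
public:
    // Only input_iterator_tag is honest here: a ForwardIterator must
    // guarantee that *i stays valid after the iterator is advanced or
    // destroyed, which a reference into the iterator cannot do.
    typedef std::input_iterator_tag iterator_category;
    typedef Record                  value_type;
    typedef const Record&           reference;
    typedef const Record*           pointer;
    typedef std::ptrdiff_t          difference_type;

    explicit disk_record_iterator(std::size_t pos = 0) : pos_(pos) {}

    reference operator*() const
    {
        cached_ = load_record(pos_);  // fetch (or page in) on demand
        return cached_;               // a reference into *this
    }

    disk_record_iterator& operator++() { ++pos_; return *this; }

    bool operator==(const disk_record_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const disk_record_iterator& o) const { return pos_ != o.pos_; }

private:
    std::size_t pos_;
    mutable Record cached_;  // the storage the returned reference aliases
};

Because the returned reference dies with the iterator, such an iterator can honestly claim only input-iterator strength, which is exactly why modeling Sequence or RandomAccessContainer over out-of-core data is so hard.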
David Abrahams wrote:
Modelling Sequence or RandomAccessContainer correctly for data that is not always in-memory is notoriously difficult. Typically it requires iterators which store their value_type internally, so dereferencing returns a reference into the iterator itself.
OK, as my knowledge regarding this is fairly limited, do you have any pointers to books or articles (on paper or on the web) that discuss this technique?
The Boost Iterator Adaptors library can be a help with constructing conforming iterators (also notoriously difficult).
Point taken.
Of course, the BGL's use of descriptors instead of heavy data may help get you around this problem.
If you have the time, please elaborate!
You might consider whether the problem gets any easier if you don't try to use adjacency_list, but instead model the Graph concept yourself. I've done that several times; it's not as hard as it might seem.
This certainly seems like a better solution than my original idea. Thanks! /Erik
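For reference, here is a minimal sketch of what modeling the Graph concept directly can look like. The graph below is implicit - vertex i has a single out-edge to (i+1) % n - and stands in for a graph whose adjacency would be computed from data fetched off disk on demand. All names (ring_graph etc.) are illustrative, not part of the BGL.

#include <boost/graph/graph_traits.hpp>
#include <boost/iterator/counting_iterator.hpp>
#include <boost/iterator/transform_iterator.hpp>
#include <cstddef>
#include <utility>

struct ring_graph { std::size_t n; };  // stand-in for an on-disk graph

typedef std::size_t ring_vertex;
typedef std::pair<ring_vertex, ring_vertex> ring_edge;

// Function object turning an out-edge index into an edge descriptor.
struct ring_make_edge {
    typedef ring_edge result_type;
    ring_vertex u; std::size_t n;
    explicit ring_make_edge(ring_vertex u = 0, std::size_t n = 1) : u(u), n(n) {}
    ring_edge operator()(std::size_t) const { return ring_edge(u, (u + 1) % n); }
};

namespace boost {
  template <> struct graph_traits<ring_graph> {
      typedef ring_vertex vertex_descriptor;
      typedef ring_edge edge_descriptor;
      typedef directed_tag directed_category;
      typedef disallow_parallel_edge_tag edge_parallel_category;
      struct traversal_category
          : virtual incidence_graph_tag, virtual vertex_list_graph_tag {};
      typedef counting_iterator<ring_vertex> vertex_iterator;
      typedef transform_iterator<ring_make_edge,
                                 counting_iterator<std::size_t> > out_edge_iterator;
      typedef std::size_t vertices_size_type;
      typedef std::size_t degree_size_type;
      static vertex_descriptor null_vertex() { return vertex_descriptor(-1); }
  };
}

typedef boost::graph_traits<ring_graph> Traits;

// Free functions required by the VertexListGraph and IncidenceGraph concepts.
std::pair<Traits::vertex_iterator, Traits::vertex_iterator>
vertices(const ring_graph& g)
{
    return std::make_pair(Traits::vertex_iterator(0),
                          Traits::vertex_iterator(g.n));
}

Traits::vertices_size_type num_vertices(const ring_graph& g) { return g.n; }

ring_vertex source(const ring_edge& e, const ring_graph&) { return e.first; }
ring_vertex target(const ring_edge& e, const ring_graph&) { return e.second; }

Traits::degree_size_type out_degree(ring_vertex, const ring_graph&) { return 1; }

std::pair<Traits::out_edge_iterator, Traits::out_edge_iterator>
out_edges(ring_vertex u, const ring_graph& g)
{
    ring_make_edge f(u, g.n);
    return std::make_pair(
        Traits::out_edge_iterator(boost::counting_iterator<std::size_t>(0), f),
        Traits::out_edge_iterator(boost::counting_iterator<std::size_t>(1), f));
}

With an exterior color/property map keyed by the vertex number, generic BGL algorithms can then traverse such a graph without the graph ever being materialized in memory.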
Erik, your problem is interesting.
Erik wrote:
...
When swapping back and forth, the OS doesn't really know what's going on, whereas I might have the knowledge of the order in which the different parts of the graph are processed etc. Then again, maybe this argument is bogus and the OS actually is better at deciding what parts should be kept in memory for the moment.
---
You hit on a key point - you do have some knowledge of your graph's semantics and the algorithms you're running, so it's likely that you _can_ make some assertions about what the memory access patterns will be. I'll let Dave step in and elaborate further on exactly what he was talking about, but generally speaking, if your runtime heap requirements exceed physical memory, then ultimately you need to worry about paging virtual memory to/from physical memory. This brings up an interesting question: how best to minimize the paging?

Several general truisms: when you're executing your graph algorithm (regardless of whether you're using the BGL or your own hand-coded graph abstraction), you want to work hard to get the memory footprint of the algorithm aligned with a page boundary and hopefully small enough to remain resident in the CPU's cache memory. Similarly, as your algorithm processes data addressed in the process's virtual memory space, you would like to minimize how often it crosses page boundaries. How well you can do this depends on your graph topology and algorithm - but I gather you have some data about both.

I think Dave was alluding to the fact that you might be able to control the memory layout of the graph data more deterministically if you hand-code your graph abstraction to explicitly address this memory layout issue. I can't speak for other operating systems, but at least on Windows it's possible for a user-mode application to manually allocate contiguous virtual memory regions with calls through the Windows API to the kernel. You can then use this huge chunk of contiguous virtual memory to store your data, using the knowledge you have about the order in which things will be accessed, in order to minimize the number of page crossings during execution of your algorithm.

Tools like Intel's VTune are invaluable for evaluating your runtime performance, as they give you cache and paging metrics. VTune is available for Windows and now Linux as well (as are Intel's C++ compiler (what I'm using) and high-performance SIMD instruction support libraries).

And when you're satisfied you can do no better, and if it's still not fast enough (and it never is), you can decide whether shelling out the $$$ for things like striped Fibre Channel drive arrays controlled by 64-bit PCI controllers is worth it. They certainly make a difference - we used these monsters a lot back when I was working on video editing stuff.

Hope this helps and good luck! What are you crunching, out of curiosity (if you're at liberty to say)?
- Regards
cdr
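As a rough illustration of the Windows technique described above - assuming nothing about the actual graph layout - the following sketch reserves a large contiguous range of virtual address space with VirtualAlloc and commits pages only as they are needed; the sizes are arbitrary.

#include <windows.h>
#include <cstdio>

int main()
{
    const SIZE_T region_size = SIZE_T(1) << 30;  // reserve 1 GB of address space
    const SIZE_T chunk_size  = SIZE_T(1) << 20;  // commit 1 MB at a time

    // Reserve address space only; no physical storage is consumed yet.
    char* base = static_cast<char*>(
        VirtualAlloc(NULL, region_size, MEM_RESERVE, PAGE_NOACCESS));
    if (!base) { std::fprintf(stderr, "reserve failed\n"); return 1; }

    // Commit the first chunk; the OS now backs it with pagefile/RAM.
    if (!VirtualAlloc(base, chunk_size, MEM_COMMIT, PAGE_READWRITE))
    { std::fprintf(stderr, "commit failed\n"); return 1; }

    base[0] = 42;  // committed pages behave like ordinary writable memory

    VirtualFree(base, 0, MEM_RELEASE);  // release the entire region
    return 0;
}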
"Erik Arner"
David Abrahams wrote:
Erik Arner
writes: Modelling Sequence or RandomAccessContainer correctly for data that is not always in-memory is notoriously difficult. Typically it requires iterators which store their value_type internally, so dereferencing returns an internal reference to the iterator itself.
OK, as my knowledge regarding this is fairly limited, do you have any pointers to books or articles (on paper or on the web) that discuss this technique?
The Boost Iterator Adaptors library can be a help with constructing conforming iterators (also notoriously difficult).
Point taken.
Of course, the BGL's use of descriptors instead of heavy data may help get you around this problem.
If you have the time, please elaborate!
You might consider whether the problem gets any easier if you don't try to use adjacency_list, but instead model the Graph concept yourself. I've done that several times; it's not as hard as it might seem.
This certainly seems like a better solution than my original idea. Thanks!
/Erik
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Chris Russell wrote:
[lots of valuable tips about caching, paging etc]
Thanks for all the input - I guess the problem is much, much harder than I expected (as always... :-) My initial hope was to develop the app in BGL (since I'm somewhat familiar with it) and try it out on smaller examples just to get everything up and running. Then I could optimize the graph data structure for huge datasets without touching the actual code that uses the graph. Now I'm starting to think that this is a stupid idea and that I'm better off coding the thing from scratch with the huge dataset in mind.
And when you're satisfied you can do no better, and if it's still not fast enough (and it never is), you can decide whether shelling out the $$$ for things like striped Fibre Channel drive arrays controlled by 64-bit PCI controllers is worth it. They certainly make a difference - we used these monsters a lot back when I was working on video editing stuff.
Unfortunately it's not up to me - I'd have to convince my boss which could be... ahem... somewhat tricky :-)
Hope this helps and good luck! What are you crunching, out of curiosity (if you're at liberty to say)?
It's a tool for shotgun sequence assembly, a technique for puzzling together large quantities of short DNA sequences into longer contiguous sequences. Google on "shotgun sequencing" for more info. Again, thanks for valuable comments! /Erik
It's a tool for shotgun sequence assembly, a technique for puzzling together large quantities of short DNA sequences into longer contiguous sequences. Google on "shotgun sequencing" for more info.
Man - that's a cool project. My googling on "shotgun sequencing" could easily consume a week or two of full-time surfing. I'm using the BGL to build a generic framework for building generative programming applications. There's a tie-in somewhere between GP and genetics, but it's elusive to say the least. Still can't help musing about it though. Anyway, good luck with your sequencing! - cdr
Erik, how huge is huge? Heap requirements in excess of 2^64 bytes huge? If so, that's one _massive_ graph and some supercomputer programmer will have to help. If not, then why not leverage the full virtual memory space allocated to your process and let the OS worry about paging the data in/out of physical memory? It seems like if you can fit into 2^32 (Xeon workstation) or 2^64 bytes of address space, then an Itanium or Alpha (4-processor Alpha servers running DEC UNIX are real cheap on eBay these days) with a Fibre Channel drive array (striped config for maximum throughput) might do the trick. If this is even close to a reasonable suggestion, you might check out the non-linear video editing guys for ideas on drive arrays (e.g. http://www.avid.com, http://www.media100.com).

- Regards
cdr
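For comparison, on a POSIX system the "let the OS worry about paging" approach amounts to memory-mapping the on-disk data and touching it like ordinary memory. A sketch (the file name graph.dat is hypothetical):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("graph.dat", O_RDONLY);        // hypothetical graph file
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { std::perror("fstat"); return 1; }

    // Map the whole file: pages are faulted in from disk on first touch
    // and evicted by the kernel under memory pressure.
    void* base = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Tell the kernel we expect sequential access so it can read ahead.
    posix_madvise(base, st.st_size, POSIX_MADV_SEQUENTIAL);

    const unsigned char* data = static_cast<const unsigned char*>(base);
    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        sum += data[i];
    std::printf("checksum %lu\n", sum);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}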
"Erik Arner"
Hi,
has anybody tried using BGL for really huge graphs? I mean graphs that are so big that there's no way to store the whole graph in memory. Any ideas on how it should be implemented?
My first thought is that it should be fairly straightforward: simply construct classes that model Sequence or RandomAccessContainer for the VertexList parameter etc. The classes store part of the data in memory and read/write data to disc when necessary.
Any pitfalls, gotchas or other stuff I should be aware of before embarking on this?
Thanks, /Erik
Info: http://www.boost.org Wiki: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl Unsubscribe: mailto:boost-users-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Chris Russell wrote:
Erik, how huge is huge? Heap requirements in excess of 2^64 bytes huge? If so, that's one _massive_ graph and some supercomputer programmer will have to help.
Well, 2^64 bytes is several orders of magnitude larger than I had in mind. I guess my graph would be something like 1000 GB tops. It's still a pretty massive graph though...
If not, then why not leverage the full virtual memory space allocated to your process and let the OS worry about paging the data in/out of physical memory?
This is definitely an alternative, but my worry is that this solution might be much slower than it needs to be. When swapping back and forth, the OS doesn't really know what's going on, whereas I might have the knowledge of the order in which the different parts of the graph are processed etc. Then again, maybe this argument is bogus and the OS actually is better at deciding what parts should be kept in memory for the moment.
It seems like if you can fit into 2^32 (Xeon workstation) or 2^64 bytes of address space, then an Itanium or Alpha (4-processor Alpha servers running DEC UNIX are real cheap on eBay these days) with a Fibre Channel drive array (striped config for maximum throughput) might do the trick. If this is even close to a reasonable suggestion, you might check out the non-linear video editing guys for ideas on drive arrays (e.g. http://www.avid.com, http://www.media100.com).
Thanks for the tip, it'll be worth looking into. /Erik
Chris Russell wrote:
Erik, how huge is huge? Heap requirements in excess of 2^64 bytes huge? If so, that's one _massive_ graph and some supercomputer programmer will have to help.
Well, 2^64 bytes is several orders of magnitude larger than I had in mind. I guess my graph would be something like 1000 GB tops. It's still a pretty massive graph though...
I think the term you're looking for is "External-Memory Graph Algorithms" - googling turns up some stuff which may be interesting. Note that this is an active area of research, so it may not be easy to make these algorithms work with the BGL...
If not, then why not leverage the full virtual memory space allocated to your process and let the OS worry about paging the data in/out of physical memory?
This is definitely an alternative, but my worry is that this solution might be much slower than it needs to be. When swapping back and forth, the OS doesn't really know what's going on, whereas I might have the knowledge of the order in which the different parts of the graph are processed etc.
Letting the OS worry about paging is usually not a good idea if you have some advance knowledge of how you'll use your data. Running times can be decreased by several orders of magnitude by paying attention to how memory is used. Check out papers on cache-aware, cache-oblivious and external-memory algorithms for further details...

rgds
Jeppe
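A toy demonstration of this point, independent of graphs and with arbitrary sizes: the same summation performed in two traversal orders over a matrix larger than the cache can differ in running time by an order of magnitude, purely because of the memory access pattern.

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    const std::size_t n = 4096;                  // 4096 x 4096 ints, ~64 MB
    std::vector<int> a(n * n, 1);
    long long sum = 0;

    typedef std::chrono::steady_clock clock_type;

    clock_type::time_point t0 = clock_type::now();
    for (std::size_t i = 0; i < n; ++i)          // row-major: sequential, cache friendly
        for (std::size_t j = 0; j < n; ++j)
            sum += a[i * n + j];

    clock_type::time_point t1 = clock_type::now();
    for (std::size_t j = 0; j < n; ++j)          // column-major: a 16 KB stride per step
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i * n + j];

    clock_type::time_point t2 = clock_type::now();
    std::printf("sum=%lld row=%lldms col=%lldms\n", sum,
        (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count());
    return 0;
}

The same reasoning applied to a graph means laying out vertices and edges on disk in the order the algorithm will visit them - which is what the external-memory literature is largely about.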
Jeppe N. Madsen wrote:
I think the term you're looking for is "External-Memory Graph Algorithms" - googling turns up some stuff which may be interesting.
Thanks - lots of interesting stuff indeed!
Note that this is an active area of research, so it may not be easy to make these algorithms work with the BGL...
Which is a pity since the BGL provides so much functionality out of the box. It would really be a pain to have to implement everything from scratch...
Check out papers on cache-aware, cache-oblivious and external-memory algorithms for further details...
Will do! Thanks, /Erik
participants (4)
- Chris Russell
- David Abrahams
- Erik Arner
- Jeppe N. Madsen