On Jan 15, 2014, at 3:10 AM, Knut Krause wrote:
Hi,
I'm planning on using the BGL on huge graphs that represent a grid, where each vertex is a cell connected to its 8 neighbours. I think something around 2,500,000,000 vertices would be a good number to approach, but it would be great if it could handle (a lot) more later.
Once I have this graph, I have to find the shortest path between two vertices.
Does anyone here have experience with graphs that large? Is this even worth a try on "normal" hardware?
The edge weights would be stored in the vertices: to go from (a) to (b), you pay the cost stored in (b). The graph is undirected.
I read about the BGL's CSR graph http://www.boost.org/doc/libs/1_55_0/libs/graph/doc/compressed_sparse_row.ht... but since I honestly have no clue how to use the BGL yet, I thought I'd first hear your opinions.
I know you asked about the sequential BGL, but as a data point, I've run the Parallel BGL on graphs of this scale. Roughly 100 (quite old) cluster nodes with 4 GB each, or 16-32 nodes with 16 GB each, connected via a low-latency interconnect (i.e., not Ethernet/TCP), were sufficient, depending on the algorithm. Strong-scaling improvements are certainly possible up to a point if you have more machines available.

The low surface-to-volume ratio of grid-graph partitions should minimize contention, though in the case of shortest paths the distribution of your edge weights matters quite a bit as well. CSR is essential for memory efficiency at this scale, and it is straightforward to substitute for an adjacency list.

I'm not sure what qualifies as "normal" hardware, but if a modest cluster with Myrinet/InfiniBand is reasonable, then graphs of the scale you mention are certainly reasonable. I suppose it's possible to run the PBGL in one of the public cloud environments, but the networking performance tends to be so abysmal with most of them that I would expect significantly increased runtimes.

Hope that data point is helpful.

Regards,
Nick