Hi Knut,
For the shortest-paths algorithm itself you’re probably looking at minutes, most likely a small number of them. The diameter of your graph and the parameters that control how much work the algorithm does in each phase determine the number of supersteps required, which plays a large role in the overall runtime.
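The knob I have in mind is the lookahead/delta parameter on the distributed shortest-paths routines. Below is a rough, untested sketch of what a run looks like, adapted from memory of the PBGL examples; treat the header names, the "lookahead" named parameter, and the 0.1f value as assumptions to check against the docs:

// Untested sketch of a distributed shortest-paths run, from memory
// of the PBGL examples; header names, the lookahead parameter, and
// all constants are assumptions to verify against the docs.
#include <boost/graph/use_mpi.hpp>
#include <boost/mpi.hpp>
#include <boost/graph/distributed/mpi_process_group.hpp>
#include <boost/graph/distributed/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <utility>

using namespace boost;
using boost::graph::distributed::mpi_process_group;

// Undirected, distributed graph with per-edge weights; the distance
// result lives in an interior vertex property.
typedef adjacency_list<vecS,
                       distributedS<mpi_process_group, vecS>,
                       undirectedS,
                       property<vertex_distance_t, float>,
                       property<edge_weight_t, float> > Graph;

int main(int argc, char* argv[])
{
  mpi::environment env(argc, argv);

  // Tiny placeholder edge list; in practice each process would add
  // its own slice of the grid.
  typedef std::pair<int, int> E;
  E edge_array[] = { E(0, 1), E(1, 2), E(2, 3), E(3, 0) };
  float weights[] = { 1.0f, 2.0f, 1.0f, 2.0f };
  Graph g(edge_array, edge_array + 4, weights, /*num vertices=*/4);

  graph_traits<Graph>::vertex_descriptor start = vertex(0, g);

  // lookahead bounds the speculative work per superstep (the eager
  // Dijkstra variant); the value here is a made-up placeholder.
  dijkstra_shortest_paths(g, start,
      distance_map(get(vertex_distance, g)).lookahead(0.1f));

  return 0;
}

If I remember the docs correctly, omitting the lookahead gives you the default (Crauser et al.) variant; larger lookahead values mean fewer supersteps at the price of more speculative, possibly wasted, work.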
In my experience it takes longer to import the data and build the graph than to actually run a single instance of an algorithm. The performance of your distributed filesystem plays a big role here. Also, the CSR graph has special “in-place” constructors that let you build the graph without duplicating the edge list, which roughly doubles the amount of graph data you can store on each node; those are probably worth looking into.
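Concretely, the in-place constructors take your source and target arrays by reference and build the CSR structure inside that storage instead of copying it. An untested sketch (tag name and signature from my reading of the 1.55 CSR docs, so double-check):

// Untested sketch of in-place CSR construction; the constructor
// reuses (and consumes) the storage of sources/targets instead of
// copying the edge list, roughly halving peak memory.
#include <boost/graph/compressed_sparse_row_graph.hpp>
#include <cstddef>
#include <vector>

typedef boost::compressed_sparse_row_graph<boost::directedS> Graph;
typedef boost::graph_traits<Graph>::vertex_descriptor Vertex;

int main()
{
  std::size_t const n = 4;  // toy size, not 2.5 billion

  // One entry per directed edge; an undirected grid stores both
  // (u, v) and (v, u).
  std::vector<Vertex> sources;
  std::vector<Vertex> targets;
  sources.push_back(0); targets.push_back(1);
  sources.push_back(1); targets.push_back(2);
  sources.push_back(2); targets.push_back(3);

  // After this call the two vectors' contents are gone; only the
  // CSR arrays inside g remain.
  Graph g(boost::construct_inplace_from_sources_and_targets,
          sources, targets, n);

  return 0;
}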
-Nick
On Jan 21, 2014, at 12:12 AM, Knut Krause wrote:
Hi Nick,
thanks for your reply. Could you give a guess as to how long path finding would take on your suggested setup? It doesn't have to be precise, just something like seconds, minutes, hours, or days.
Knut
On Tuesday, January 21, 2014, at 00:06:23, Nicholas Edmonds wrote:
On Jan 15, 2014, at 3:10 AM, Knut Krause wrote:
Hi,
I'm planning on using the BGL on huge graphs that represent a grid, where each vertex is a cell connected to its 8 neighbours. I think something around 2,500,000,000 vertices would be a good number to start with, but it would be great if it could handle (a lot) more later.
Once I have this graph, I need to find the shortest path between two vertices.
Does anyone here have experience with graphs this large? Is this even worth a try on "normal" hardware?
The edge weights would be stored in the vertices, so if you want to go from (a) to (b) you have to pay the cost stored in (b). The graph is undirected.
I read about the BGL's CSR graph (http://www.boost.org/doc/libs/1_55_0/libs/graph/doc/compressed_sparse_row.html), but since I honestly have no clue how to use the BGL yet, I thought I'd first hear your opinions.
I know you asked about the sequential BGL, but as a data point, I’ve run the Parallel BGL on graphs of this scale. Around 100 (quite old) cluster nodes with 4 GB each, or 16-32 nodes with 16 GB each, connected via a low-latency interconnect (i.e., not Ethernet/TCP), were sufficient, depending on the algorithm. Strong-scaling improvements are certainly possible up to a point if you have more machines available. The small surface-to-volume ratio of grid-graph partitions should minimize contention, though for shortest paths the distribution of your edge weights matters quite a bit as well. CSR is essential for memory efficiency, and it is straightforward to substitute for an adjacency list.
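To put rough numbers on the memory side: 2.5 billion vertices with 8 neighbours each is about 2 x 10^10 directed edges, and since 2.5 x 10^9 doesn't fit in 32 bits you need 64-bit vertex indices, so the CSR column array alone is on the order of 160 GB (plus roughly 20 GB of row offsets). That fits in the aggregate memory of the clusters above, but not on a single box. Also, with your cost model (pay the cost stored at the target vertex) you don't need to materialize per-edge weights at all; a read-only property map can compute them on the fly. Here's an untested sequential sketch, with toy data, to show the idea:

// Untested sketch: Dijkstra on a sequential CSR graph where the
// weight of edge (a -> b) is the cost stored at vertex b, so no
// per-edge weight array is needed. Toy data throughout.
#include <boost/graph/compressed_sparse_row_graph.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <boost/property_map/function_property_map.hpp>
#include <cstddef>
#include <utility>
#include <vector>

typedef boost::compressed_sparse_row_graph<boost::directedS> Graph;
typedef boost::graph_traits<Graph>::vertex_descriptor Vertex;
typedef boost::graph_traits<Graph>::edge_descriptor Edge;

// weight(a -> b) = cost[b]; the two directions of an undirected
// edge naturally get different weights, matching your model.
struct target_cost {
  typedef float result_type;
  const Graph* g;
  const std::vector<float>* cost;
  float operator()(Edge e) const { return (*cost)[target(e, *g)]; }
};

int main()
{
  // Toy 1-D "grid" 0 - 1 - 2 - 3, with both directions stored
  // because CSR is a directed representation.
  std::size_t const n = 4;
  typedef std::pair<std::size_t, std::size_t> E;
  E edges[] = { E(0,1), E(1,0), E(1,2), E(2,1), E(2,3), E(3,2) };
  Graph g(boost::edges_are_unsorted_multi_pass, edges, edges + 6, n);

  std::vector<float> cost(n);
  cost[0] = 0.0f; cost[1] = 2.0f; cost[2] = 1.0f; cost[3] = 5.0f;

  target_cost w = { &g, &cost };
  std::vector<float> dist(n);
  std::vector<Vertex> pred(n);

  Vertex start = 0;
  boost::dijkstra_shortest_paths(g, start,
      boost::weight_map(boost::make_function_property_map<Edge>(w))
        .distance_map(&dist[0])
        .predecessor_map(&pred[0]));

  return 0;
}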
I’m not sure what qualifies as “normal” hardware, but if a modest cluster with Myrinet/InfiniBand is reasonable, then graphs of the scale you mention are certainly feasible. I suppose it’s possible to run the PBGL in one of the public cloud environments, but the networking performance of most of them tends to be so abysmal that I would expect significantly increased runtimes.
Hope that data point is helpful.
Regards,
Nick