I recently ran into a situation where dijkstra_shortest_paths in the Boost
Graph Library was running very slowly. When I replaced the call to
dijkstra_shortest_paths with the code below, things were a couple of orders
of magnitude faster. Any idea why?
- Bhaskara
Version 1: with dijkstra_shortest_paths
ReachableCostVector GridGraph::singleSourceCosts (const Cell2D& source,
                                                  const vector<Cell2D>& dests)
{
  GridDistances distances;
  GridGraphVertex start = cellVertex(source);
  resetIndices();
  typedef map
Bhaskara Marthi wrote:
> I recently ran into a situation where dijkstra_shortest_paths in the Boost
> Graph Library was running very slowly. When I replaced the call to
> dijkstra_shortest_paths with the code below, things were a couple of orders
> of magnitude faster. Any idea why?
Perhaps the code below doesn't compute the shortest paths? If the "queue" in
the code below refers to std::queue, it is not a priority queue but a
first-in, first-out queue; the priority queue would be std::priority_queue.
The important question is not whether the code below has this specific bug,
but whether it gives the same answer as dijkstra_shortest_paths for the input
data where you observed dijkstra_shortest_paths running very slowly (while
still being a couple of orders of magnitude faster). In that case, it would
really be interesting to find out why dijkstra_shortest_paths would be slower
than the code below.

Regards,
Thomas
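[Editorial illustration of Thomas's distinction, not part of the original
post: a plain std::queue pops in insertion (FIFO) order, while
std::priority_queue pops the largest element under its comparison, so a
Dijkstra-style min-first queue needs the comparison reversed, e.g. with
std::greater.]

#include <cassert>
#include <functional>
#include <queue>
#include <vector>

int main()
{
  // The default std::priority_queue is a max-heap; std::greater turns it
  // into a min-first queue, which is what Dijkstra's algorithm needs.
  std::priority_queue<double, std::vector<double>, std::greater<double> > pq;
  pq.push(3.0);
  pq.push(1.0);
  pq.push(2.0);
  assert(pq.top() == 1.0);      // smallest "distance" comes out first

  std::queue<double> fifo;      // by contrast, std::queue is purely FIFO
  fifo.push(3.0);
  fifo.push(1.0);
  assert(fifo.front() == 3.0);  // insertion order, not priority order
  return 0;
}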
On Thu, Apr 30, 2009 at 1:34 PM, Bhaskara Marthi wrote:
> When I replaced the call to dijkstra_shortest_paths with the code below,
> things were a couple of orders of magnitude faster. Any idea why?

Try building in release mode? The debugging instrumentation in the BGL can
kill performance.

Andrew Sutton
andrew.n.sutton@gmail.com
Andrew Sutton wrote:
> Try building in release mode? The debugging instrumentation in the BGL can
> kill performance.

I'm compiling my source files with -O3 using gcc on Ubuntu. That should be
sufficient, right?

- Bhaskara
Bhaskara Marthi wrote:
>> Try building in release mode? The debugging instrumentation in the BGL
>> can kill performance.
> I'm compiling my source files with -O3 using gcc on Ubuntu. That should be
> sufficient, right?

For release mode, NDEBUG must be defined. However, a superficial glance at
the sources doesn't reveal any debugging instrumentation depending on NDEBUG.
Andrew, is there any?

Another question for Andrew: Why was the relaxed heap replaced with a 4-ary
heap? In my experience, a normal array-based binary heap is a nightmare for
the cache, and I wonder whether the 4-ary heap will be much better in this
respect. However, I don't know whether the cache behavior of a relaxed heap
is much better.

Regards,
Thomas
On Fri, 1 May 2009, Thomas Klimpel wrote:
> Another question for Andrew: Why was the relaxed heap replaced with a 4-ary
> heap? In my experience, a normal array-based binary heap is a nightmare for
> the cache, and I wonder whether the 4-ary heap will be much better in this
> respect. However, I don't know whether the cache behavior of a relaxed heap
> is much better.

I was the one who changed the default heap type. The d-ary heap (with d = 4)
was approximately 23% faster on the Dijkstra benchmark (in libs/graph/test)
on x86. The d-ary heap does have largely random cache behavior (although it
only has half the number of levels of a binary heap), but it does do some
small scans that are contiguous. Other studies (I don't have them off-hand)
have also found d-ary heaps to be good for Dijkstra's algorithm. Relaxed
heaps are likely to also be bad for the cache, as you suggested, because they
involve many pointer dereferences; they do have asymptotically better
behavior than traditional heaps, but this benefit seems to be limited by
large constants, even on large data sets.

-- Jeremiah Willcock
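[Editorial note, not part of the thread: the cache argument for a d-ary heap
can be seen from the index arithmetic alone. The sketch below shows only the
array layout; it is an illustration, not the heap implementation the BGL
actually uses.]

#include <cstddef>

// Array-backed d-ary heap layout with d = 4 (illustration only). The d
// children of a node occupy a contiguous block, so a sift-down inspects 4
// adjacent elements -- often a single cache line -- and the tree has
// log_4(n) levels instead of the binary heap's log_2(n).
const std::size_t D = 4;

inline std::size_t parent(std::size_t i)      { return (i - 1) / D; }
inline std::size_t first_child(std::size_t i) { return D * i + 1; }
// Children of node i: first_child(i), first_child(i) + 1, ...,
// first_child(i) + D - 1.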
On Fri, May 1, 2009 at 11:34 AM, Thomas Klimpel wrote:
> For release mode, NDEBUG must be defined.

I verified that I'm compiling with NDEBUG. Compiler flags are shown below:
/usr/bin/c++ -Dtopological_graph_EXPORTS -O3 -DNDEBUG -fPIC
-I/u/bhaskara/ros/ros-pkg/world_models/topological_map/include
-I/u/bhaskara/ros/ros-pkg/deprecated/deprecated_msgs/msg/cpp
-I/u/bhaskara/ros/ros/core/rosconsole/include -I/opt/ros/include/boost-1_37
-I/u/bhaskara/ros/ros-pkg/common/tinyxml/include
-I/u/bhaskara/ros/ros-pkg/motion_planning/sbpl/src
-I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/include
-I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/srv/cpp
-I/u/bhaskara/ros/ros-pkg/motion_planning/navfn/msg/cpp
-I/u/bhaskara/ros/ros-pkg/world_models/costmap_2d/include
-I/u/bhaskara/ros/ros-pkg/common/laser_scan/include
-I/u/bhaskara/ros/ros-pkg/common/laser_scan/msg/cpp
-I/u/bhaskara/ros/ros-pkg/common/filters/include
-I/u/bhaskara/ros/ros-pkg/common/loki/build/loki-0.1.7/include
-I/u/bhaskara/ros/ros-pkg/common/angles/include
-I/u/bhaskara/ros/ros-pkg/common/robot_srvs/srv/cpp
-I/u/bhaskara/ros/ros-pkg/common/tf/include
-I/u/bhaskara/ros/ros-pkg/common/tf/msg/cpp
-I/u/bhaskara/ros/ros-pkg/common/tf/srv/cpp
-I/u/bhaskara/ros/ros/core/roscpp/include
-I/u/bhaskara/ros/ros/3rdparty/xmlrpc++/src
-I/u/bhaskara/ros/ros-pkg/common/bullet/build/include
-I/u/bhaskara/ros/ros-pkg/common/bullet/swig
-I/u/bhaskara/ros/ros-pkg/visualization_core/visualization_msgs/msg/cpp
-I/u/bhaskara/ros/ros-pkg/common/robot_msgs/msg/cpp
-I/u/bhaskara/ros/ros/std_msgs/msg/cpp
-I/u/bhaskara/ros/ros/core/roslib/msg/cpp
-I/u/bhaskara/ros/ros/core/roslib/include
-I/u/bhaskara/ros/ros-pkg/3rdparty/eigen/build/eigen-2.0.0
-I/opt/ros/include -I/u/bhaskara/ros/ros/3rdparty/gtest/gtest/include
-I/u/bhaskara/ros/ros-pkg/world_models/topological_map/srv/cpp
-DROS_PACKAGE_NAME=\"topological_map\" -O3 -DBT_USE_DOUBLE_PRECISION
-DBT_EULER_DEFAULT_ZYX -DBT_USE_DOUBLE_PRECISION -DBT_EULER_DEFAULT_ZYX -W
-Wall -Wno-unused-parameter -fno-strict-aliasing -o
CMakeFiles/topological_graph.dir/src/grid_graph.o -c
/u/bhaskara/ros/ros-pkg/world_models/topological_map/src/grid_graph.cpp
I also tried using my own priority-queue-based version, shown below. On a
graph with about 10^4 nodes and degree 4, my version takes about 0.05
seconds, while dijkstra_shortest_paths takes ten times as long.

- Bhaskara
// Min-priority-queue entry: operator< is reversed so that
// std::priority_queue (a max-heap) pops the smallest distance first.
struct QueueItem
{
  QueueItem(const GridGraphVertex& v, const double d) : v(v), d(d) {}
  GridGraphVertex v;
  double d;
};

inline
bool operator< (const QueueItem& q1, const QueueItem& q2)
{
  return q1.d > q2.d;
}

ReachableCostVector GridGraph::singleSourceCosts (const Cell2D& source,
                                                  const vector<Cell2D>& dests)
{
  GridDistances distances;
  GridGraphVertex start = cellVertex(source);
  priority_queue<QueueItem> q;
  q.push(QueueItem(start, 0.0));

  // Lazy-deletion Dijkstra: a vertex may be pushed several times, but only
  // the first (smallest-distance) pop settles it; later copies are skipped.
  // Only vertices reachable from the source are ever touched.
  while (!q.empty()) {
    double d = q.top().d;
    GridGraphVertex v = q.top().v;
    q.pop();
    if (distances.find(v) == distances.end()) {
      distances[v] = d;
      GridGraphAdjacencyIterator iter, end;
      for (tie(iter, end) = adjacent_vertices(v, graph_); iter != end; ++iter)
        if (distances.find(*iter) == distances.end())
          q.push(QueueItem(*iter, d + graph_[edge(v, *iter, graph_).first].cost));
    }
  }

  // Report the cost for each requested destination; unreachable cells get a
  // sentinel value.
  ReachableCostVector costs;
  for (vector<Cell2D>::const_iterator iter = dests.begin();
       iter != dests.end(); ++iter) {
    GridDistances::iterator pos = distances.find(cellVertex(*iter));
    if (pos == distances.end())
      costs.push_back(ReachableCost(false, -42.0));
    else
      costs.push_back(ReachableCost(true, pos->second));
  }
  return costs;
}
On Fri, May 1, 2009 at 12:34 PM, Bhaskara Marthi wrote:
> I also tried using my own priority-queue-based version, shown below. On a
> graph with about 10^4 nodes and degree 4, my version takes about 0.05
> seconds, while dijkstra_shortest_paths takes ten times as long.
I came up with a potential explanation. My graph is actually about 10^6 nodes
in size, but the initial node is in a connected component of size 10^4. My
implementation only scales with the size of this connected component: nodes
are added to the distance map only as and when they are visited, so all the
nodes outside the connected component don't matter. I suspect that the BGL's
dijkstra_shortest_paths is using data structures that scale with the graph
size rather than the connected component. If heap performance scales
logarithmically, the 100-fold increase in size would make the algorithm
several times slower.

BGL experts: does this seem like the right explanation? If so, what should I
do: are there algorithms that take advantage of the smaller connected
component, or am I stuck rolling my own?

Also, I just checked what the time is being spent on in Dijkstra. Apparently,
about half the time is spent on the initialization phase of marking every
node's distance as infinite; my algorithm avoids this. So that explains part
of the performance difference --- not including the initialization, my
implementation is just four times faster than dijkstra_shortest_paths. Once
the initialization is complete, though, are there other parts of the
algorithm that scale with the overall graph size?

- Bhaskara
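[Editorial sketch, not part of the thread: one documented BGL facility that
addresses exactly the initialization cost is dijkstra_shortest_paths_no_init,
which skips the library's pass that sets every vertex's distance to infinity
and leaves initialization to the caller, so only the source needs to be set
up by hand. The example below is a minimal, untested sketch on a toy graph
rather than the poster's GridGraph types, and it assumes this overload is
available in the Boost version in use; note that the distance, predecessor,
and internal color storage are still allocated per vertex, so this removes
the initialization loop rather than every O(V) cost.]

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <iostream>
#include <limits>
#include <vector>

typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
    boost::no_property,
    boost::property<boost::edge_weight_t, double> > Graph;
typedef boost::graph_traits<Graph>::vertex_descriptor Vertex;

int main()
{
  // Small graph standing in for the grid graph; vertices 3 and 4 are in a
  // different connected component from the source.
  Graph g(5);
  boost::add_edge(0, 1, 1.0, g);
  boost::add_edge(1, 2, 2.0, g);
  boost::add_edge(3, 4, 1.0, g);

  const double inf = std::numeric_limits<double>::max();
  std::vector<double> dist(boost::num_vertices(g), inf);
  std::vector<Vertex> pred(boost::num_vertices(g));

  Vertex s = boost::vertex(0, g);
  dist[s] = 0.0;  // caller initializes only the source
  pred[s] = s;

  // No internal "set every distance to infinity" pass is performed here.
  boost::dijkstra_shortest_paths_no_init(
      g, s,
      &pred[0],                              // predecessor map
      &dist[0],                              // distance map
      boost::get(boost::edge_weight, g),     // weight map
      boost::get(boost::vertex_index, g),    // index map
      std::less<double>(),                   // compare
      boost::closed_plus<double>(),          // combine
      0.0,                                   // distance zero
      boost::default_dijkstra_visitor());

  for (std::size_t i = 0; i < dist.size(); ++i) {
    if (dist[i] == inf)
      std::cout << i << ": unreachable\n";
    else
      std::cout << i << ": " << dist[i] << "\n";
  }
  return 0;
}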
participants (4)
- Andrew Sutton
- Bhaskara Marthi
- Jeremiah Willcock
- Thomas Klimpel