
Oliver,
Do you have some performance data for your fiber implementation? What is the (amortized) overhead introduced for one fiber (i.e. the average time required to create, schedule, execute, and delete one fiber which runs an empty function, when executing a large number of those, perhaps 500,000 fibers)? It would be interesting to see this number when giving 1..N cores to the scheduler.
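For concreteness, here is a minimal sketch of the kind of test I mean. It assumes the boost::fibers::fiber create/join interface and runs the work in batches so that not all 500,000 stacks are live at once; all names, the batch size and the build line are illustrative only.

    // fiber_overhead.cpp - amortized create/schedule/run/destroy cost of a fiber
    // running an empty function. Sketch only: names, batch size and the build
    // line are illustrative, and it assumes the boost::fibers::fiber/join API.
    // Build (roughly): g++ -O2 fiber_overhead.cpp -lboost_fiber -lboost_context
    #include <boost/fiber/all.hpp>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        constexpr std::size_t total = 500000;   // number of fibers to measure
        constexpr std::size_t batch = 10000;    // bound the number of live stacks

        auto const start = std::chrono::steady_clock::now();
        for (std::size_t done = 0; done < total; done += batch) {
            std::vector<boost::fibers::fiber> fibers;
            fibers.reserve(batch);
            for (std::size_t i = 0; i < batch; ++i)
                fibers.emplace_back([] { /* empty body */ });
            for (auto& f : fibers)
                f.join();                       // schedule, run, and reclaim each fiber
        }
        auto const stop = std::chrono::steady_clock::now();

        auto const ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                            stop - start).count();
        std::printf("%.1f ns per fiber (amortized over %zu fibers)\n",
                    static_cast<double>(ns) / total, total);
    }

Scaling this to 1..N cores would additionally require telling the scheduler to use N worker threads, which is a property of the library's scheduling machinery rather than of the test itself.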
Unfortunately I've no performance tests yet - maybe I'll write one after some optimizations (like replacing the STL containers with a singly-linked list of intrusive_ptrs).
I'd write the test before starting to do optimizations.
I'm not sure what a fiber should execute within such a test. Should the fiber-function have an empty body (i.e. execute nothing)? Or should it at least yield one time?
Well, those are two separate performance tests already :-P However, having it yield just adds two more context switches and a scheduling cycle, so I wouldn't expect too much additional insight from this. While you're at it, I'd suggest also writing a test measuring the overhead of using futures. For an idea of what such tests could look like, you might want to glance here: https://github.com/STEllAR-GROUP/hpx/tree/master/tests/performance.
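Such a futures test can be as trivial as the sketch below, written in the spirit of HPX's future_overhead test. It assumes Boost.Fiber exposes boost::fibers::async()/future<> analogous to std::async; if the actual API differs, the structure of the measurement stays the same.

    // future_overhead.cpp - rough cost of one async()/get() round trip.
    // Sketch only: assumes boost::fibers::async()/future<> exist as in the
    // current Boost.Fiber; names and counts are illustrative.
    #include <boost/fiber/all.hpp>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>

    static int trivial_task() { return 42; }   // the "work" is intentionally trivial

    int main() {
        constexpr std::size_t count = 100000;

        auto const start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < count; ++i) {
            boost::fibers::future<int> f = boost::fibers::async(trivial_task);
            f.get();   // launch a fiber, run the task, transport the result back
        }
        auto const stop = std::chrono::steady_clock::now();

        auto const ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                            stop - start).count();
        std::printf("%.1f ns per future\n", static_cast<double>(ns) / count);
    }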
If the code executed by the fiber does nothing, then the execution time will be determined by the memory allocation algorithm of the C library, the context switches for resuming and suspending the fiber, and the time required to insert and remove the fiber from the ready-queue inside the fiber-scheduler.
Those are assumptions which are by no means conclusive. From our experience with HPX (https://github.com/STEllAR-GROUP/hpx), the overheads for a fiber (which is an hpx::thread in our case) are determined by many more factors than just the memory allocator. Things like contention caused by work stealing, or NUMA effects such as when you start stealing across NUMA domains, usually overshadow the memory allocation costs. Additionally, the quality of the scheduler implementation greatly affects things.
This queue is currently an STL container and will be replaced by a singly-linked list of intrusive_ptrs.
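As an illustration of the idea only (not the exact representation I have in mind, which is the singly-linked list of intrusive_ptrs mentioned above), an allocation-free ready-queue could also be built on Boost.Intrusive's slist, where the link lives inside the fiber's own control block:

    // Illustrative ready-queue with no heap allocation per enqueue/dequeue.
    // "fiber_context" is a made-up name standing in for the fiber's control block.
    #include <boost/intrusive/slist.hpp>

    namespace bi = boost::intrusive;

    struct fiber_context : public bi::slist_base_hook<> {
        // ... the fiber's state (stack pointer, status, ...) would live here ...
    };

    // cache_last<true> gives O(1) push_back, which a FIFO scheduler needs.
    using ready_queue = bi::slist<fiber_context,
                                  bi::cache_last<true>,
                                  bi::constant_time_size<false>>;

    int main() {
        fiber_context a, b;
        ready_queue rq;

        rq.push_back(a);                    // enqueue without allocating
        rq.push_back(b);

        fiber_context& next = rq.front();   // pick the next fiber to run ...
        rq.pop_front();                     // ... and remove it, again without allocating
        (void)next;
    }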
If you had a performance test you'd immediately see whether this improves your performance. Doing optimizations based on gut feeling is most of the time not very effective; you need measurements to support your work.
A context switch (suspending/resuming a coroutine) takes ca. 80 CPU cycles on an Intel Core2 Q6700 (64-bit Linux).
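A number in that ballpark can be reproduced with a loop that does nothing but bounce control between two contexts. The sketch below uses the current boost::context::fiber class and wall-clock time (cycles then follow from the clock rate); the actual measurement may have been done against a different Boost.Context interface, and all names are illustrative.

    // switch_cost.cpp - rough measurement of a single suspend/resume.
    // Build (roughly): g++ -O2 switch_cost.cpp -lboost_context
    #include <boost/context/fiber.hpp>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <utility>

    namespace ctx = boost::context;

    int main() {
        constexpr std::size_t switches = 1000000;

        // The fiber does nothing but bounce control straight back to main().
        ctx::fiber f{ [](ctx::fiber&& main_ctx) {
            for (;;)
                main_ctx = std::move(main_ctx).resume();
            return std::move(main_ctx);   // never reached
        } };

        auto const start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < switches; ++i)
            f = std::move(f).resume();    // one round trip = two context switches
        auto const stop = std::chrono::steady_clock::now();

        auto const ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                            stop - start).count();
        std::printf("%.1f ns per switch (multiply by the clock rate for cycles)\n",
                    static_cast<double>(ns) / (2.0 * switches));
    }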
Sure, but this does not tell you how much time is consumed by executing those. The actual execution time will be determined by many factors, such as caching effects, TLB misses, memory bandwidth limitations and other contention effects. IMHO, for this library to be accepted, it has to prove to be of high quality, which implies best possible performance. You might want to compare the performance of your library with other existing solutions (for instance TBB, qthreads, OpenMP, HPX). The link I provided above will give you a set of trivial tests for those. Moreover, we'd be happy to add an equivalent test for your library to our repository.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu