Hangs with Boost 1.56 on clang armhf
I'm seeing consistent hangs of AFIO on Boost 1.56 on Ubuntu 14.04 LTS for clang armhf. GCC armhf works fine, so this is a clang armhf related issue.

https://ci.nedprod.com/view/Boost.AFIO/job/Boost.AFIO%20Test%20POSIX_ARM_clang%203.4/10/console

I talked to Ubuntu upstream about this, and I had the idea of trying the LLVM armhf binaries provided at http://llvm.org/releases/3.4.2/clang+llvm-3.4.2-armv7a-linux-gnueabihf.tar.xz as surely LLVM provide good binaries. Same problem - tests hang. I tried running a test manually and I get this:

terminate called after throwing an instance of 'boost::thread_interrupted'
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively

Remember, this is identical code to any other Linux target, plus this works a-ok for GCC armhf.

A quick backtrace with gdb reveals this:

terminate called after throwing an instance of 'boost::thread_interrupted'
[Thread 0xb5b76450 (LWP 12610) exited]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb6376450 (LWP 12609)]
__libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
44 ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
(gdb) bt
#0 __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
#1 0xb639f0fe in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2 0xb63a1956 in __GI_abort () at abort.c:89
#3 0xb65377a8 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#4 0xb65361c8 in ?? () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

What looks to be happening here is that thread cancellation, which works by having an exception thrown, appears to not be caught by the thread being cancelled for some odd reason.

I don't have time to investigate this until maybe Saturday. I guess we'll just have to ship 1.56 and look into this for 1.57.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
On 30 Jul 2014 at 16:47, Niall Douglas wrote:
I'm seeing consistent hangs of AFIO on Boost 1.56 on Ubuntu 14.04 LTS for clang armhf. GCC armhf works fine, so this is a clang armhf related issue. [snip] Same problem - tests hang. I tried running a test manually and I get this:
terminate called after throwing an instance of 'boost::thread_interrupted'
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
Remember, this is identical code to any other Linux target, plus this works a-ok for GCC armhf.
A quick backtrace with gdb reveals this:
terminate called after throwing an instance of 'boost::thread_interrupted'
[Thread 0xb5b76450 (LWP 12610) exited]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb6376450 (LWP 12609)]
__libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
44 ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
(gdb) bt
#0 __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
#1 0xb639f0fe in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2 0xb63a1956 in __GI_abort () at abort.c:89
#3 0xb65377a8 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#4 0xb65361c8 in ?? () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
What looks to be happening here is that thread cancellation, which works by having an exception thrown, appears to not be caught by the thread being cancelled for some odd reason.
I don't have time to investigate this until maybe Saturday. I guess we'll just have to ship 1.56 and look into this for 1.57.
Good news on this, though it took a day of head scratching to figure it out.

It turns out that ARM clang 3.3 and earlier implements C++ exceptions using the inefficient SJLJ exception ABI - you know, the one that calls setjmp at every place an unwind might happen. Anyway, they finally got round to implementing the ARM EHABI exception ABI, which has zero runtime cost when nothing is thrown, and they went ahead and turned it on by default in 3.4, or at least it is turned on by default in the Debian/Ubuntu binaries as well as the LLVM binaries.

Unfortunately, it was very broken indeed. It produces ARM binaries which simply cannot catch non-trivial C++ exceptions, though they otherwise work fine. This caused Boost.Thread, when interrupting a thread wait, to enter the infinite loop described above; indeed, if you EVER catch a type with RTTI it loops forever.

Fortunately, Chromium realised this shortly after the 3.4 release, and they've been hard at work making an EHABI implementation which actually works for 3.5. I just finished compiling 3.5 from trunk and I can confirm that all AFIO unit tests pass swimmingly with it for armhf. You can read more about the 3.5 ARM exception handling improvements at http://llvm.org/docs/ReleaseNotes.html#changes-to-the-arm-backend.

I'll go notify Debian and Ubuntu upstream that clang 3.4 on ARM is useless for C++. Something in the Boost 1.56 release notes might be helpful too: basically, don't use ARM EHABI before clang 3.5.

Niall