On 9/23/22 15:37, Towner, Daniel via Boost wrote:
I would like to propose a SIMD/data-parallel header-only library for inclusion in boost. The library is an example implementation already developed by Intel of a C++ TS proposal called std::experimental::parallelism_v2::simd [1]. The library is styled as a data-parallel library where the programmer specifies the data parallelism available in the source program, and the compiler will generate the appropriate simd instruction sequences. The library is agnostic of simd element type and length, and within reason, should be able to target any processor with simd capabilities. Intel has added further generic extensions to the proposal, such as permutation operations and support for complex numbers [2].
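For readers who have not used the TS interface, here is a minimal sketch of the width-agnostic style described above. It assumes a standard library that ships `<experimental/simd>` (for example libstdc++ from GCC 11 onwards) and uses only names from the Parallelism TS 2. It is not taken from Intel's implementation, and the `saxpy` function is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Illustrative only: y[i] = a * x[i] + y[i], written once for whatever
// simd width the compilation target provides.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());

    using simd_f = stdx::native_simd<float>;   // width chosen by the target
    const std::size_t n = x.size();
    const std::size_t width = simd_f::size();

    std::size_t i = 0;
    // Main loop: process one full simd register per iteration.
    for (; i + width <= n; i += width) {
        simd_f xv(&x[i], stdx::element_aligned);  // load from memory
        simd_f yv(&y[i], stdx::element_aligned);
        yv = a * xv + yv;                         // scalar is broadcast
        yv.copy_to(&y[i], stdx::element_aligned); // store back
    }
    // Scalar tail for the elements that do not fill a whole register.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source compiles to different instruction sequences depending on the target flags, since `native_simd<float>::size()` is a compile-time property of the target rather than something the programmer fixes.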
Motivation: The concept of simd needs little introduction as the technique is widely used in high performance software and there are many libraries providing varying levels of abstraction and portability. In [1], Matthias Kretz made a proposal to provide a standard C++ library which implements many data-parallel concepts in a portable, vendor-neutral style, and with tight integration with C++ (e.g., operator overloading, overloads for all appropriate standard library functions). That proposal and the feedback on it have been discussed in WG21 and Intel supports the proposal. There was an attempt to add a boost.simd library in the past but that was abandoned after its authors decided to commercialise it. Despite the availability of other simd or data-parallel libraries in the wider community there is still a gap for a data-parallel, width-agnostic, vendor-neutral library with close integration into C++. The std::simd library proposal and its example implementation from Intel seem to fill this gap. The existence of an example implementation in boost will also make it easier to get users to adopt the TS and provide feedback to improve its API and feature set for real-world code.
History: There are two strands of history that come together in this proposal: Intel's own vector libraries, and the C++ std::simd proposal. Intel has a long history of developing simd-capable processors and has its own libraries for accessing simd features [3]. A recent Intel project to refresh those libraries resulted in a new library called `xvec' which incorporates many modern C++ features and is deliberately vendor and target neutral, with the target being selected at compile-time. The std::simd proposal was started in 2013, and a C++ TS was then published in 2018 [1] and discussed in WG21. Due to the similarities of the two libraries and their common goals, Matthias Kretz and I are now working together on merging the simd specification from the Parallelism TS 2 into the IS for C++26.
Status and Implementation: The `xvec' example library implemented by Intel supports most of the features of the std::simd proposal [1], along with a number of extensions to add some capabilities (e.g., generic permutation, complex number support) [2]. The library works on Linux and Windows, and can be compiled with GCC 12, LLVM 14 and Intel's oneAPI 2022. It is currently tested using googletest. Most of the library is implemented on top of the LLVM and GCC vector extensions [4], which makes it compilable on any target supported by those extensions. In a limited number of places intrinsics are used to access specific Intel instructions, but efficient generic fall-backs are also provided for simd targets which don't have those instructions (and this can include unsupported Intel processors too).
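As a rough illustration of the vector extensions mentioned above (and not of how `xvec' itself is written), the sketch below uses the `vector_size` attribute accepted by both GCC and Clang. The `float4` typedef and the `dot` function are hypothetical names chosen for this example, and the code assumes the length is a multiple of four to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Four floats packed into one 16-byte vector.
// This is a GCC/Clang extension, not standard C++.
typedef float float4 __attribute__((vector_size(16)));

// Illustrative dot product built on the vector extension. The compiler
// selects suitable instructions for whichever target it is building for.
float dot(const float* x, const float* y, std::size_t n) {
    assert(n % 4 == 0);  // sketch only: no remainder loop

    float4 acc = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < n; i += 4) {
        float4 xv, yv;
        std::memcpy(&xv, x + i, sizeof xv);  // unaligned-safe load
        std::memcpy(&yv, y + i, sizeof yv);
        acc += xv * yv;                      // element-wise multiply and add
    }
    // Reduce the four lanes to a single scalar.
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Because the arithmetic is expressed on a generic vector type rather than through target-specific intrinsics, the same function builds on any architecture those compilers support. MSVC does not accept this attribute.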
Request: I would welcome any comments, questions or ideas on how to proceed with this proposal.
I'd be interested in a SIMD library. Some thoughts below, in no particular order.

Is there public source code and documentation available online? Note that www.intel.com and other Intel sites are not accessible in some regions, so I don't consider those public.

I'm interested whether the library integrates well with SIMD intrinsics. I imagine that the generic interface does not support some of the more exotic instructions, like horizontal arithmetics, data shuffling, cryptography, etc., so code that relies on these operations would need to use compiler intrinsics directly.

The fact that the library is based on gcc vector extensions will be a problem for MSVC users. Are there any plans to tackle this? Not that supporting any particular compiler is a must for a Boost library, but MSVC still has a significant user base, and Boost is known, among other things, for its portability. Supporting MSVC would be desirable.

On the topic of portability, what is the minimum C++ version requirement?

Are there any performance numbers and comparisons with other solutions? In particular, with code that directly uses compiler intrinsics.

Regarding inclusion in Boost, note that the library has to be licensed under the Boost Software License 1.0.

Using googletest for tests might be problematic as we currently don't have it in pre-requisites, and I don't think there's infrastructure for installing it from an external source. Unless this is resolved somehow, porting tests to one of the Boost solutions will probably be desired.