I would like to propose a SIMD/data-parallel header-only library for inclusion in Boost. The library is an example implementation, already developed by Intel, of a C++ TS proposal called std::experimental::parallelism_v2::simd [1]. The library is styled as a data-parallel library where the programmer specifies the data parallelism available in the source program, and the compiler generates the appropriate simd instruction sequences. The library is agnostic of simd element type and length and, within reason, should be able to target any processor with simd capabilities. Intel has added further generic extensions to the proposal, such as permutation operations and support for complex numbers [2].
Motivation: The concept of simd needs little introduction, as the technique is widely used in high-performance software and there are many libraries providing varying levels of abstraction and portability. In [1], Matthias Kretz made a proposal to provide a standard C++ library which implements many data-parallel concepts in a portable, vendor-neutral style, and with tight integration with C++ (e.g., operator overloading, overloads for all appropriate standard library functions). That proposal and the feedback on it have been discussed in WG21, and Intel supports the proposal. There was an attempt to add a boost.simd library in the past, but that was abandoned after its authors decided to commercialise it. Despite the availability of other simd or data-parallel libraries in the wider community, there is still a gap for a data-parallel, width-agnostic, vendor-neutral library with close integration into C++. The std::simd library proposal and its example implementation from Intel seem to fill this gap. The existence of an example implementation in Boost will also make it easier to get users to adopt the TS and provide feedback to improve its API and feature set for real-world code.
History: There are two strands of history that come together in this proposal: Intel's own vector libraries, and the C++ std::simd proposal. Intel has a long history of developing simd-capable processors and has its own libraries for accessing simd features [3]. A recent Intel project to refresh those libraries resulted in a new library called `xvec' which incorporates many modern C++ features and is deliberately vendor and target neutral, with the target being selected at compile-time. Work on the std::simd proposal started in 2013, and it was published as part of a C++ TS in 2018 [1] and discussed in WG21. Due to the similarities of the two libraries and their common goals, Matthias Kretz and I are now working together on merging the simd specification from the Parallelism TS 2 into the IS for C++26.
Status and Implementation: The `xvec' example library implemented by Intel supports most of the features of the std::simd proposal [1], along with a number of extensions to add some capabilities (e.g., generic permutation, complex number support) [2]. The library works on Linux and Windows, and can be compiled with GCC 12, LLVM 14 and Intel's oneAPI 2022. It is currently tested using googletest. Most of the library is implemented on top of the LLVM and GCC vector extensions [4], which makes it compilable on any target supported by those extensions. In a limited number of places intrinsics are used to access specific Intel instructions, but efficient generic fall-backs are also provided for simd targets which don't have those instructions (and this can include unsupported Intel processors too).
Request: I would welcome any comments, questions or ideas on how to proceed with this proposal.
[1] ISO/IEC TS 19570:2018, Programming Languages - Technical Specification for C++ Extensions for Parallelism, by Matthias Kretz. http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/n4808.pdf
[2] P2638R0: Intel's response to P1915R0 for std::simd parallelism in TS 2. https://wg21.link/p2638r0
[3] https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-d...
[4] https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vec...
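As a rough illustration of the GCC/LLVM vector extensions [4] mentioned under Status and Implementation, the sketch below writes element-wise arithmetic once and leaves instruction selection to the compiler. It is not taken from xvec or from std::simd; the names `float4`, `mul_add`, `sum` and `example_total` are made up for this example, and it assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>

// GCC/Clang vector extension: four floats handled as a single value.
// `float4` is a name invented for this sketch, not a type from xvec or std::simd.
typedef float float4 __attribute__((vector_size(16)));

// Element-wise multiply-add. The same source compiles for any target the
// extensions support; the compiler picks the instructions for that target.
inline float4 mul_add(float4 a, float4 b, float4 c) {
    return a * b + c;
}

// Horizontal sum written with plain element indexing, which the extensions allow.
inline float sum(float4 v) {
    return v[0] + v[1] + v[2] + v[3];
}

// Small usage example: computes sum(a * b + c) over four lanes.
inline float example_total() {
    float4 a = {1.0f, 2.0f, 3.0f, 4.0f};
    float4 b = {2.0f, 2.0f, 2.0f, 2.0f};
    float4 c = {0.5f, 0.5f, 0.5f, 0.5f};
    float4 r = mul_add(a, b, c);
    assert(r[0] == 2.5f);  // first lane: 1 * 2 + 0.5
    return sum(r);
}
```

A library such as the one proposed wraps this kind of code in a class with a width chosen per target, so user code does not have to hard-code four lanes as this sketch does.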
On 9/23/22 15:37, Towner, Daniel via Boost wrote:
I would like to propose a SIMD/data-parallel header-only library for inclusion in boost. The library is an example implementation already developed by Intel of a C++ TS proposal called std::experimental::parallelism_v2::simd [1]. The library is styled as a data-parallel library where the programmer specifies the data parallelism available in the source program, and the compiler will generate the appropriate simd instruction sequences. The library is agnostic of simd element type and length, and within reason, should be able to target any processor with simd capabilities. Intel has added further generic extensions to the proposal, such as permutation operations and support for complex numbers [2].
Motivation: The concept of simd needs little introduction as the technique is widely used in high performance software and there are many libraries providing varying levels of abstraction and portability. In [1], Matthias Kretz made a proposal to provide a standard C++ library which implements many data-parallel concepts in a portable, vendor-neutral style, and with tight integration with C++ (e.g., operator overloading, overloads for all appropriate standard library functions). That proposal and the feedback on it have been discussed in WG21 and Intel supports the proposal. There was an attempt to add a boost.simd library in the past but that was abandoned after its authors decided to commercialise it. Despite the availability of other simd or data-parallel libraries in the wider community there is still a gap for a data-parallel, width-agnostic, vendor neutral library with close integration into C++. The std::simd library proposal and its example implementation from Intel seem to fill this gap. The existence of an example implementation in boost will also make it easier to get users to adopt the TS and provide feedback to improve its API and feature set for real-world code.
History: There are two strands of history that come together in this proposal: Intel's own vector libraries, and the C++ std::simd proposal. Intel has a long history of developing simd-capable processors and has its own libraries for accessing simd features [3]. A recent Intel project to refresh those libraries resulted in a new library called `xvec' which incorporates many modern C++ features and is deliberately vendor and target neutral, with the target being selected at compile-time. In 2013 the proposal was started, and then a C++ TS was published in 2018 [1] and discussed in WG21. Due to the similarities of the two libraries and their common goals, myself and Matthias Kretz are now working together on merging the simd specification from the Parallelism TS 2 into the IS for C++26.
Status and Implementation: The `xvec' example library implemented by Intel supports most of the features of the std::simd proposal [1], along with a number of extensions to add some capabilities (e.g., generic permutation, complex number support) [2]. The library works on Linux and Windows, and can be compiled with GCC 12, LLVM 14 and Intel's oneAPI 2022. It is currently tested using googletest. Most of the library is implemented on top of the LLVM and GCC vector extensions [4], which makes it compilable on any target supported by those extensions. In a limited number of places intrinsics are used to access specific Intel instructions, but efficient generic fall-backs are also provided for simd targets which don't have those instructions (and this can include unsupported Intel processors too).
Request: I would welcome any comments, questions or ideas on how to proceed with this proposal.
I'd be interested in a SIMD library. Some thoughts below, in no particular order.
Is there public source code and documentation available online? Note that www.intel.com and other Intel sites are not accessible in some regions, so I don't consider those public.
I'm interested whether the library integrates well with SIMD intrinsics. I imagine that the generic interface does not support some of the more exotic instructions, like horizontal arithmetics, data shuffling, cryptography, etc., so code that relies on these operations would need to use compiler intrinsics directly.
The fact that the library is based on gcc vector extensions will be a problem for MSVC users. Are there any plans to tackle this? Not that supporting any particular compiler is a must for a Boost library, but MSVC still has a significant user base, and Boost is known, among other things, for its portability. Supporting MSVC would be desirable.
On the topic of portability, what is the minimum C++ version requirement?
Are there any performance numbers and comparisons with other solutions? In particular, with code that directly uses compiler intrinsics.
Regarding inclusion in Boost, note that the library has to be licensed under the Boost Software License 1.0.
Using googletest for tests might be problematic as we currently don't have it in pre-requisites, and I don't think there's infrastructure for installing it from an external source. Unless this is resolved somehow, porting tests to one of the Boost solutions will probably be desired.
I'd be interested in a SIMD library. Some thoughts below, in no particular order.
Is there public source code and documentation available online? Note that www.intel.com and other Intel sites are not accessible in some regions, so I don't consider those public.
We would put it up at github.com/intel alongside our other open-source contributions. I can't test for myself whether that is accessible in all regions, but I assume it might be more accessible than our main website. It is currently undergoing internal review before being published.
I'm interested whether the library integrates well with SIMD intrinsics. I imagine that the generic interface does not support some of the more exotic instructions, like horizontal arithmetics, data shuffling, cryptography, etc., so code that relies on these operations would need to use compiler intrinsics directly.
We have provided mechanisms to make it relatively easy to use intrinsics since there will always be a need to write at least some platform dependent code to get at the exotic instructions for special cases. The simplest mechanism (already proposed in std::simd) allows a simd<> value to be converted to and from builtin types so that an intrinsic can be called directly. It is more tricky to deal with a simd<> value which is bigger than the native type (e.g., a fixed_size_simd
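A minimal sketch of the pattern described above, under stated assumptions: it does not use the std::simd conversion to builtin types, but stands in for it with a GCC/Clang vector-extension type and `std::memcpy`. The names `float4` and `sign_mask` are invented for this example. On x86 with SSE the value is handed to a real intrinsic (`_mm_movemask_ps`); on other targets a generic fall-back computes the same result.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_movemask_ps
#endif

// Hypothetical four-lane float type built on the GCC/Clang vector extensions.
typedef float float4 __attribute__((vector_size(16)));

// Returns a 4-bit mask in which bit i is the sign bit of lane i.
inline int sign_mask(float4 v) {
#if defined(__SSE__)
    // Platform-specific path: copy the value into the native register type
    // and call the intrinsic directly.
    __m128 native;
    std::memcpy(&native, &v, sizeof native);
    return _mm_movemask_ps(native);
#else
    // Generic fall-back for targets without that instruction.
    int mask = 0;
    for (int i = 0; i < 4; ++i) {
        float lane = v[i];
        std::uint32_t bits;
        std::memcpy(&bits, &lane, sizeof bits);
        mask |= static_cast<int>(bits >> 31) << i;
    }
    return mask;
#endif
}
```

For the input `{-1.0f, 2.0f, -3.0f, 4.0f}` both paths set bits 0 and 2 of the result. Keeping the intrinsic behind a single function with a portable alternative means callers stay target-neutral.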
The fact that the library is based on gcc vector extensions will be a problem for MSVC users. Are there any plans to tackle this? Not that supporting any particular compiler is a must for a Boost library, but MSVC still has a significant user base, and Boost is known, among other things, for its portability. Supporting MSVC would be desirable.
We have tried to lean on the gcc/llvm compilers as much as possible to handle the SIMD for us since that allows us to reach a wide range of SIMD-capable targets without having to write reams of intrinsic-based code to deal with the details. It is a pity that MSVC doesn't implement those too. There are probably ways to handle MSVC too in the future, but for now we plan to devote our time to supporting gcc and llvm instead.
On the topic of portability, what is the minimum C++ version requirement?
C++20 at the moment, because we use things like concepts to make it easier to write, and we provide overloads for functions from <bit>. With some more verbosity in the code we could support older standards.
Are there any performance numbers and comparisons with other solutions? In particular, with code that directly uses compiler intrinsics.
Right from the start we have been very conscious of performance; there was no point in creating something that was slower than intrinsic-based alternatives, even if it did simplify the code. Our goal has always been that we should have comparable performance to intrinsic based solutions. I don't have numbers that I can share publicly yet, but performance isn't something that worries me.
Regarding inclusion in Boost, note that the library has to be licensed under the Boost Software License 1.0.
I don't think that will be a problem.
Using googletest for tests might be problematic as we currently don't have it in pre-requisites, and I don't think there's infrastructure for installing it from an external source. Unless this is resolved somehow, porting tests to one of the Boost solutions will probably be desired.
I suspected that would be the case. At the moment we use type parameterised tests: we list all the valid simd element types and all the valid sizes, and then gtest checks every combination against a suite of appropriate tests. So all simds would have basic operations checked (constructors, indexing, permuting, etc), then simd<_Float16|float|double> would add in a floating-point test suite, simd<unsignedX> would add in shifts and rotates, and so on.
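The "every element type times every size" scheme can be sketched without any test framework, which may help when porting. The example below is only an illustration in C++17: `toy_simd` is a made-up array-backed stand-in for a simd type, and the suite and counter names are hypothetical rather than taken from the xvec tests.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Toy stand-in for a simd type (hypothetical, array-backed).
template <class T, std::size_t N>
struct toy_simd {
    std::array<T, N> lane{};
    friend toy_simd operator+(toy_simd a, const toy_simd& b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] += b.lane[i];
        return a;
    }
};

static int basic_checked = 0;     // combinations that ran the basic suite
static int floating_checked = 0;  // combinations that ran the floating-point suite

// Suite applied to every (type, size) combination.
template <class T, std::size_t N>
void basic_suite() {
    toy_simd<T, N> a, b;
    for (std::size_t i = 0; i < N; ++i) { a.lane[i] = T(i); b.lane[i] = T(1); }
    toy_simd<T, N> c = a + b;
    for (std::size_t i = 0; i < N; ++i) assert(c.lane[i] == T(i + 1));
    ++basic_checked;
}

// Extra suite that only floating-point element types take part in.
template <class T, std::size_t N>
void floating_suite() {
    if constexpr (std::is_floating_point<T>::value) {
        toy_simd<T, N> h;
        for (std::size_t i = 0; i < N; ++i) h.lane[i] = T(0.5);
        toy_simd<T, N> s = h + h;
        for (std::size_t i = 0; i < N; ++i) assert(s.lane[i] == T(1));
        ++floating_checked;
    }
}

// Expand one element type over the list of sizes.
template <class T, std::size_t... Ns>
void for_each_size() {
    ((basic_suite<T, Ns>(), floating_suite<T, Ns>()), ...);
}

// Expand the list of element types; each gets the same set of sizes.
template <class... Ts>
void for_each_type() {
    (for_each_size<Ts, 1, 2, 4, 8>(), ...);
}

// Runs all combinations and returns how many ran the basic suite.
inline int run_all() {
    basic_checked = 0;
    floating_checked = 0;
    for_each_type<signed char, int, unsigned, float, double>();
    return basic_checked;
}
```

With five element types and four sizes, `run_all()` exercises 20 combinations, of which 8 also run the floating-point suite. A framework's typed-test facility does the same expansion but reports each combination as a separate named test.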
On 9/23/22 20:18, Towner, Daniel wrote:
I'd be interested in a SIMD library. Some thoughts below, in no particular order.
Is there public source code and documentation available online? Note that www.intel.com and other Intel sites are not accessible in some regions, so I don't consider those public.
We would put it up at github.com/intel alongside our other open-source contributions. I can't test for myself whether that is accessible in all regions, but I assume it might be more accessible than our main website. It is currently undergoing internal review before being published.
GitHub should be fine, thank you. Looking forward to seeing the code and docs.
Using googletest for tests might be problematic as we currently don't have it in pre-requisites, and I don't think there's infrastructure for installing it from an external source. Unless this is resolved somehow, porting tests to one of the Boost solutions will probably be desired.
I suspected that would be the case. At the moment we use type parameterised tests: we list all the valid simd element types and all the valid sizes, and then gtest checks every combination against a suite of appropriate tests. So all simds would have basic operations checked (constructors, indexing, permuting, etc), then simd<_Float16|float|double> would add in a floating-point test suite, simd<unsignedX> would add in shifts and rotates, and so on.
Boost.Test has a similar feature: https://www.boost.org/doc/libs/1_80_0/libs/test/doc/html/boost_test/tests_or...
One other question. Does the library offer any utilities for implementing runtime dispatch between code branches that use different instruction sets?
On Fri, Sep 23, 2022 at 8:13 AM Towner, Daniel via Boost <boost@lists.boost.org> wrote:
I would like to propose a SIMD/data-parallel header-only library for inclusion in boost. The library is an example implementation already developed by Intel of a C++ TS proposal called std::experimental::parallelism_v2::simd [1].
Looking forward to seeing the documentation and the source code.
On Fri, Sep 23, 2022 at 8:13 AM Towner, Daniel via Boost <boost@lists.boost.org> wrote:
I would like to propose a SIMD/data-parallel header-only library for inclusion in boost.
Hi Daniel -- This is a bit of a late response, but I for one would really like to see this proposed for Boost. In my work we have used or experimented with several simd libraries. Unfortunately, the TS reference implementation wasn't really available. Given the large variety of APIs and approaches in the existing libraries, if we're even considering standardizing one, having the reference implementation in Boost gives application developers the largest possible exposure to use and abuse it, which is essential to having a good final proposal. I'm an experienced Boost developer (date-time) and believe I can get some cycles to help with the 'boostification' aspects once the source becomes available.
Jeff Garland
Hi Daniel,
I too would love to see this work in Boost.
Noel Belcourt
On 10/14/22, 4:26 PM, "Boost on behalf of Jeff Garland via Boost"
Hi Noel and Jeff,
Thank you for your support. I think there is sufficient interest from the Boost community to try to get this work integrated into Boost. Now that I've got that demonstration of interest, I can work through the necessary internal processes to get the code released to github.com/intel. Once it is publicly available I will come back to this mailing list to announce that and decide on next steps.
Thanks,
dan.
Participants (5): Andrey Semashev; Belcourt, Kenneth; Emil Dotchevski; Jeff Garland; Towner, Daniel