Change to guidelines for characters in C++ source files
Since the very early days of Boost, the guideline for acceptable characters in C++ source files has been the 96 characters of the C++ standard's basic source character set, encoded in 7-bit ASCII. The inspect program also allowed several additional 7-bit ASCII characters that sometimes appear in comments. The rationale was to ensure that Boost code was portable to all compilers available at that time. We had gotten complaints that even a character as innocuous as a copyright sign (U+00A9) was causing compiles to fail on some compiler releases targeting Asian languages. UTF-8 support was far from universal.

Times have changed:

* Source files encoded in UTF-8 with a leading byte order mark (BOM) of the byte sequence 0xEF,0xBB,0xBF are supported by all C++ compilers that we are aware of, and this has been true for many years now.

* As of C++11, the C++ language includes types and literals directly supporting UTF-8, UTF-16, and UTF-32, and creating code points above 7-bit ASCII in such literals is much easier if UTF-8 source encoding is used. Even editors as dumb as Windows Notepad have supported UTF-8 with BOM for some time now.

* As Boost libraries start to incorporate C++11 Unicode-related features, it becomes difficult to write test programs if limited to 7-bit ASCII. For example, incorporating the Filesystem TS into Boost.Filesystem requires test cases with UTF-8, UTF-16, and UTF-32, and that's painful under the current 7-bit ASCII guidelines.

So... it looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.

Comments?

--Beman
On 6/25/2015 7:12 AM, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
BOM is evil.

Regards,
Paul Mensonides
On 6/25/2015 7:12 AM, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
On 26.06.2015 00:15, Paul Mensonides wrote:
BOM is evil.

The Microsoft compiler will treat files without a BOM as encoded in its local codepage, with no way to override. If you want MSVC to read the source as UTF-8, you need a BOM.

They have no plans to change this either; see https://connect.microsoft.com/VisualStudio/Feedback/Details/888437 . The bug is closed as wontfix: "Unfortunately, we currently have no plans to implement the support of UTF-8 files without byte order marks."

Thus, we need a BOM in our source files if they contain UTF-8. That's just a sad fact. And yes, it makes me angry.

Sebastian
On 26/06/2015 10:24, Sebastian Redl wrote:
On 6/25/2015 7:12 AM, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
On 26.06.2015 00:15, Paul Mensonides wrote:
BOM is evil.

The Microsoft compiler will treat files without a BOM as encoded in its local codepage, with no way to override. If you want MSVC to read the source as UTF-8, you need a BOM.
They have no plans to change this either, see https://connect.microsoft.com/VisualStudio/Feedback/Details/888437 . The bug is closed as wontfix.
"Unfortunately, we currently have no plans to implement the support of UTF-8 files without byte order marks."
Thus, we need a BOM in our source files if they contain UTF-8. That's just a sad fact.
This is a real issue too - I've had bug reports in the past from non-English users who were unable to compile Boost source that had accidentally acquired something other than a 7-bit character.

John.
On 26 Jun 2015 at 11:24, Sebastian Redl wrote:
Thus, we need a BOM in our source files if they contain UTF-8. That's just a sad fact.
This suggests, I suppose, that the Boost lint script will need to error out on any source files without a BOM, right? I suppose at least this makes life simple. Everything gets the BOM.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
On 26.06.2015 13:04, Niall Douglas wrote:
On 26 Jun 2015 at 11:24, Sebastian Redl wrote:
Thus, we need a BOM in our source files if they contain UTF-8. That's just a sad fact.

This suggests, I suppose, that the Boost lint script will need to error out on any source files without a BOM, right?
Only if they contain characters outside the basic source set, I suppose.

Sebastian
On 26 Jun 2015 at 13:25, Sebastian Redl wrote:
Thus, we need a BOM in our source files if they contain UTF-8. That's just a sad fact. This suggests, I suppose, that the Boost lint script will need to error out on any source files without a BOM right?
Only if they contain characters outside the basic source set, I suppose.
It'll have to be at the per-git-repo level, so either all the source in a git repo is UTF-8 BOMed or it isn't. This is because .gitattributes selects UTF-8 encoding based on file extension, and that applies for an entire git repo.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
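For reference, the per-extension encoding hint Niall refers to looks roughly like this in a .gitattributes file. A sketch only: `encoding` is the attribute gitk and git-gui document for display purposes; which attributes other tools honour varies, so consult gitattributes(5) for your setup:

```
# Tell gitk / git-gui (and other tools that honour the attribute)
# that C++ sources in this repo are UTF-8 encoded.
*.hpp encoding=UTF-8
*.cpp encoding=UTF-8
*.ipp encoding=UTF-8
```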
On Fri, Jun 26, 2015 at 5:24 AM, Sebastian Redl < sebastian.redl@getdesigned.at> wrote:
On 26.06.2015 00:15, Paul Mensonides wrote:
On 6/25/2015 7:12 AM, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++
source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
BOM is evil.
The Microsoft compiler will treat files without a BOM as encoded in its local codepage, with no way to override. If you want MSVC to read the source as UTF-8, you need a BOM.
They have no plans to change this either, see https://connect.microsoft.com/VisualStudio/Feedback/Details/888437 . The bug is closed as wontfix.
"Unfortunately, we currently have no plans to implement the support of UTF-8 files without byte order marks."
Thus, we need a BOM in our source files if they contain UTF-8. That's just a sad fact.
It isn't just Microsoft. My first draft of N3463, Portable Program Source Files (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3463.html), specified UTF-8 without BOM, but when a draft was circulated, several non-Microsoft compiler writers from the committee's core working group explained that to them the BOM was essential.

The scenario they were concerned with was environments in Asia where the default encoding is commonly not UTF-8 and most files that go into a translation unit are encoded in that default encoding, but one file is encoded in UTF-8 without a BOM. The compiler needs to be able to identify that file as UTF-8 without all files being UTF-8 encoded, and the compiler writers believe that is not possible 100% of the time without a BOM. Some compilers or IDEs, including Visual Studio, do have an opt-in option, "Auto-detect UTF-8 encoding without signature", but Boost can't count on such an option being turned on.

I wasn't present in core when N3463 was discussed, but the unofficial feedback I got was that CWG saw no need to explicitly state as a requirement an environmental feature that users had already forced all compilers to support anyhow.

--Beman
On 25.06.2015 17:12, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
Why not just always assume UTF-8, whether there is BOM or not? I don't think UTF-8 BOM makes much sense, and I don't think editors commonly insert one. Also, while we're at it, are tabs still banned?
On 26 June 2015 at 01:15, Andrey Semashev
On 25.06.2015 17:12, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
Why not just always assume UTF-8, whether there is BOM or not? I don't think UTF-8 BOM makes much sense, and I don't think editors commonly insert one.
Also, http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf says: "Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature."

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
On 26/06/2015 11:33, Mateusz Loskot wrote:
On 26 June 2015 at 01:15, Andrey Semashev
wrote: Why not just always assume UTF-8, whether there is BOM or not? I don't think UTF-8 BOM makes much sense, and I don't think editors commonly insert one.
Also, http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf says:
"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature."
That last part is relevant here -- a UTF-8 BOM at the start of the file is used as a signature that the file contains UTF-8 content. In the absence of that signature, any reader of the file must guess at the content encoding.

The problem is that various parties are inconsistent about what a file without a BOM actually means, but most commonly it means that the file is assumed to be in some default system locale. On modern Linux, that usually means UTF-8 anyway, but that is not universal, and it is never the case on Windows (it means that the file will be interpreted in whatever the user's chosen "language for non-Unicode programs" is, which will vary depending on the user's country, preferred languages, and whether they've been playing Japanese novel games recently or not). As such it is vastly safer to include the BOM than to omit it. (One exception might be for shell scripts and other text-like files that care about their first few bytes and aren't expecting BOMs.)

In some cases the reader is expected to try to parse the file as UTF-8 and then fall back to some other encoding if an invalid UTF-8 character sequence is encountered. This is quite aggravating both for the people expected to write such software and also for the users who get their text misinterpreted by such heuristics, and whoever suggests that was a sensible choice for a default action should get thwapped upside the head. (As an explicit "try to recover unknown format document" option, sure. But not a default.)

If you're looking for authority, you might want to read http://unicode.org/faq/utf_bom.html#BOM as well. The key point being that the recommendation to not use BOMs is for situations in which the encoding is already known in advance (such as databases, or protocols that explicitly transmit an encoding in an envelope). Files are not an example of that.
On 06/25/2015 04:15 PM, Andrey Semashev wrote:
On 25.06.2015 17:12, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
Why not just always assume UTF-8, whether there is BOM or not? I don't think UTF-8 BOM makes much sense, and I don't think editors commonly insert one.
Also, while we're at it, are tabs still banned?
Tabs are still banned. -- Michael Caisse ciere consulting ciere.com
On 25 Jun 2015 at 10:12, Beman Dawes wrote:
It looks to me like it is high time to change the Boost guideline for C++ source file encoding to 7-bit ASCII without BOM or UTF-8 with BOM, and to change the inspect program accordingly.
Comments?
+10. I'd suggest UTF-8 with or without BOM. Just assume it is; if it's 7-bit clean now, it's automatically UTF-8 clean. Don't forget to adjust all the .gitattributes in all Boost git repos to say UTF-8 encoding for source files; some git tools won't render diffs right without that.

As an aside, I recently had merry hell getting AFIO sent per commit to the wandbox online compiler and wasted a full day trying. Louis' Python script finally worked, while Krzysztof's shell script did not. I had been trying to get the latter working. I figured out the problem: I suspect AFIO has a Unicode char somewhere in its source code which only appears when you try to pipe it through scripting in order to upload it to a JSON REST API. Python errored out on the bad string, whilst the shell script just magically failed silently. That alerted me to the need to tell Python the source code has a codec, and then it all worked. Bear these sorts of problems in mind if you switch on UTF-8, as debugging that stuff was not obvious.

BTW wandbox scripting instructions are now at https://svn.boost.org/trac/boost/wiki/BestPracticeHandbook#a14.USERFRIENDLINESS:Considerlettingpotentialuserstryyourlibrarywithasinglemouseclick for anyone interested.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
participants (9)
- Andrey Semashev
- Beman Dawes
- Gavin Lambert
- John Maddock
- Mateusz Loskot
- Michael Caisse
- Niall Douglas
- Paul Mensonides
- Sebastian Redl