Re: [boost] Boost.Text Unicode licence issues

24 Aug 2020


On Mon, Aug 24, 2020 at 8:30 AM Phil Endecott via Boost <boost@lists.boost.org> wrote:
...
Zach Laine wrote:
...
On Sun, Aug 23, 2020 at 11:08 AM Phil Endecott via Boost <boost@lists.boost.org> wrote:
...
Could you please explain what you've done about the copyright issues?
Sure.  I've reimplemented the code that originally came from ICU, and ...
...
As far as I can tell, you still depend on the Unicode data files that
have a Boost-incompatible licence.  You previously included this
Unicode copyright text in the documentation but that page has now been
removed, if I'm looking in the right place.
... removed the ICU copyright from these files. They are the output of
a code generation tool, and so are not copyrightable individually (like
the output of lex and yacc).
For the benefit of everyone else let me describe what Zach does:
1. There are some files at unicode.org that have a Boost-incompatible
licence.
2. Zach has some Python scripts at https://github.com/tzlaine/text/tree/master/scripts
3. The scripts download the files from unicode.org, convert them into C++
source files, and prefix the result with "(C) Zach Laine Boost License".
4. These generated files are checked in at https://github.com/tzlaine/text/tree/master/include/boost/text/data
The intention is not that end-users of Boost.Text will run the scripts,
but rather that the generated files will be included in the Boost source
distribution.
Zach thinks this is OK because "they are the output of a code
generation tool, and so are not copyrightable individually (like
the output of lex and yacc)".
I think that's completely wrong. I believe it's a well-established
principle of software copyright law that the output of a tool,
whether that is g++, bison, or rot13, is a derived work of the
input to that tool.  You cannot (without permission) take example.y
that's (C) Megacorp, run bison on it, and claim that the resulting
example.tab.c is now (C) Someone Else.
This worries me.  We really, really don't want to be shipping code
that has copyright violations!
Agreed, though I don't think this is an instance of one.  If this is a
copyright violation, we have been in violation for years and years
already.  Look in

boost/spirit/home/support/char_encoding/unicode/UnicodeData.txt
boost/spirit/home/support/char_encoding/unicode/DerivedCoreProperties.txt

and the other files in that directory.  Note that these are in the
header paths, not inside src/ or something.  DerivedCoreProperties.txt
even has the Unicode copyright still on it.  Moreover, the other files
in that directory contain data derived from the .txt files.  Even
though the .txt files appear to have been removed on November 19, we
are left with the same problem Phil raises, to the extent it is a
problem: distribution of code derived from non-code .txt files.

Zach