Boost.Text is looking for a (mini-)review manager
I've made all the recommended changes to Boost.Text that came up in the review. Glen, the previous review manager, recommended a mini review of Boost.Text once I'd made those changes. If you'd like to manage the mini-review, please let me know.

For everyone, including potential reviewers and review managers, here are the major changes:

* All the specific review feedback issues have been addressed.
* The string layer is now gone.
* The text layer got a major re-work. It is now a set of template-based adaptors (basic_text<> vs. a text typedef, etc.), which lets everyone get what they want. You can change the underlying storage type (including whatever allocator you prefer), select UTF-8 or UTF-16, and change the normalization form. Glen recommended I drop this layer, but the library is way less useful without this stuff, and these changes address all the concerns raised about the text layer in the review.

The updated docs can be found here: https://tzlaine.github.io/text/doc/html/index.html

Oh yeah, and I added concept constraints to the whole library when you build in C++20 mode.

Zach
On Sun, 23 Aug 2020 at 01:05, Zach Laine via Boost
I've made all the recommended changes to Boost.Text that came up in the review. Glen, the previous review manager, recommended a mini review of Boost.Text once I'd made those changes.
The Mini-Review is usually conducted by the same review manager [1]. Would it be possible? [1] https://www.boost.org/community/reviews.html#Maintainer Best regards, -- Mateusz Loskot, http://mateusz.loskot.net
On Sunday, August 23, 2020, Mateusz Loskot via Boost
On Sun, 23 Aug 2020 at 01:05, Zach Laine via Boost
wrote: I've made all the recommended changes to Boost.Text that came up in the review. Glen, the previous review manager, recommended a mini review of Boost.Text once I'd made those changes.
The Mini-Review is usually conducted by the same review manager [1]. Would it be possible?
Zach already asked me, but I am unavailable at the moment. :-) Glen
On Sun, 23 Aug 2020 at 14:57, Glen Fernandes
On Sunday, August 23, 2020, Mateusz Loskot via Boost
wrote: On Sun, 23 Aug 2020 at 01:05, Zach Laine via Boost
wrote: I've made all the recommended changes to Boost.Text that came up in the review. Glen, the previous review manager, recommended a mini review of Boost.Text once I'd made those changes.
The Mini-Review is usually conducted by the same review manager [1]. Would it be possible?
Zach already asked me, but I am unavailable at the moment. :-)
Okay. I added the Text mini-review to the schedule, with the call for a review manager. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net
Hi Zach, Zach Laine wrote:
I've made all the recommended changes to Boost.Text that came up in the review.
* All the specific review feedback issues have been addressed. * The string layer is now gone. * The text layer got a major re-work.
Could you please explain what you've done about the copyright issues? As far as I can tell, you still depend on the Unicode data files that have a Boost-incompatible licence. You previously included this Unicode copyright text in the documentation but that page has now been removed, if I'm looking in the right place. Is this the correct URL for the new version: https://github.com/tzlaine/text/ Regards, Phil.
On Sun, Aug 23, 2020 at 11:08 AM Phil Endecott via Boost
Hi Zach,
Zach Laine wrote:
I've made all the recommended changes to Boost.Text that came up in the review.
* All the specific review feedback issues have been addressed. * The string layer is now gone. * The text layer got a major re-work.
Could you please explain what you've done about the copyright issues?
Sure. I've reimplemented the code that originally came from ICU, and ...
As far as I can tell, you still depend on the Unicode data files that have a Boost-incompatible licence. You previously included this Unicode copyright text in the documentation but that page has now been removed, if I'm looking in the right place.
... removed the ICU copyright from these files. They are the output of a code generation tool, and so are not copyrightable individually (like the output of lex and yacc).
Is this the correct URL for the new version: https://github.com/tzlaine/text/
Yes, that's it. Zach
Zach Laine wrote:
On Sun, Aug 23, 2020 at 11:08 AM Phil Endecott via Boost
wrote: Could you please explain what you've done about the copyright issues?
Sure. I've reimplemented the code that originally came from ICU, and ...
As far as I can tell, you still depend on the Unicode data files that have a Boost-incompatible licence. You previously included this Unicode copyright text in the documentation but that page has now been removed, if I'm looking in the right place.
... removed the ICU copyright from these files. They are the output of a code generation tool, and so are not copyrightable individually (like the output of lex and yacc).
For the benefit of everyone else let me describe what Zach does:

1. There are some files at unicode.org that have a Boost-incompatible licence.
2. Zach has some Python scripts at https://github.com/tzlaine/text/tree/master/scripts
3. The scripts download the files from unicode.org, convert them into C++ source files, and prefix the result "(C) Zach Laine Boost License".
4. These generated files are checked in at https://github.com/tzlaine/text/tree/master/include/boost/text/data

The intention is not that end-users of Boost.Text will run the scripts, but rather that the generated files will be included in the Boost source distribution.

Zach thinks this is OK because "they are the output of a code generation tool, and so are not copyrightable individually (like the output of lex and yacc)".

I think that's completely wrong. I believe it's a well-established principle of software copyright law that the output of a tool - whether that is g++, bison, or rot13 - is a derived work of the input to that tool. You cannot (without permission) take example.y that's (C) Megacorp, run bison on it, and claim that the resulting example.tab.c is now (C) Someone Else.

This worries me. We really, really don't want to be shipping code that has copyright violations!

Glen, in your review result announcement you said you were confident that Zach would be able to resolve the copyright problems. What did you have in mind when you wrote that?

Regards, Phil.
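To make step 3 above concrete, here is a minimal sketch of what such a conversion could look like. It is not Zach's code (his scripts are Python and are not reproduced here). The input layout "0028; 0029; o # LEFT PARENTHESIS" is assumed to match the Unicode bracket data file, and trim() and to_cpp_row() are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: strips leading and trailing spaces and tabs.
inline std::string trim(std::string const & s)
{
    auto const first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return "";
    auto const last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical converter: turns one data line of the assumed form
// "0028; 0029; o # LEFT PARENTHESIS" into a C++ initializer row.
// Returns an empty string for blank lines, comment-only lines, and
// lines whose type field is neither "o" nor "c".
inline std::string to_cpp_row(std::string line)
{
    // Drop the trailing comment, if any.
    auto const hash = line.find('#');
    if (hash != std::string::npos)
        line.erase(hash);

    // Split the remainder into three ';'-separated fields.
    std::istringstream fields(line);
    std::string cp, paired, type;
    if (!std::getline(fields, cp, ';') ||
        !std::getline(fields, paired, ';') ||
        !std::getline(fields, type))
        return "";

    cp = trim(cp);
    paired = trim(paired);
    type = trim(type);
    if (cp.empty() || paired.empty())
        return "";

    std::string name;
    if (type == "o")
        name = "open";
    else if (type == "c")
        name = "close";
    else
        return "";

    return "{0x" + cp + ", 0x" + paired + ", bidi_bracket_type::" + name + "},";
}
```

Every field of the output row comes straight from the input line, which is the property the disagreement in this thread turns on.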
Phil Endecott wrote: ...
4. These generated files are checked in at https://github.com/tzlaine/text/tree/master/include/boost/text/data
https://github.com/tzlaine/text/tree/master/include/boost/text/detail surely?
I believe it's a well-established principle of software copyright law that the output of a tool - whether that is g++, bison, or rot13 - is a derived work of the input to that tool.
Kind of. It depends on whether the tool extracts copyrightable elements from the source. Either way, if we go with the strict interpretation and decide that

{0x0028, 0x0029, bidi_bracket_type::open},
{0x0029, 0x0028, bidi_bracket_type::close},
{0x005B, 0x005D, bidi_bracket_type::open},
{0x005D, 0x005B, bidi_bracket_type::close},
{0x007B, 0x007D, bidi_bracket_type::open},
{0x007D, 0x007B, bidi_bracket_type::close},

is a derived work of a Unicode data file, I see no way of ever having a Unicode library in Boost.
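For readers who have not looked at the generated headers, the rows above are plain table data. The sketch below shows how a table of that shape might be declared and queried. The enumerators open and close come from the rows quoted above; bracket_entry, bracket_table, bracket_type_of, paired_bracket and the none enumerator are hypothetical names, not Boost.Text's actual declarations.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical; they are not taken from Boost.Text.
enum class bidi_bracket_type { none, open, close };

struct bracket_entry
{
    std::uint32_t cp;       // the bracket code point
    std::uint32_t paired;   // its matching bracket
    bidi_bracket_type type; // opening or closing
};

// The six rows quoted in the message above.
constexpr bracket_entry bracket_table[] = {
    {0x0028, 0x0029, bidi_bracket_type::open},
    {0x0029, 0x0028, bidi_bracket_type::close},
    {0x005B, 0x005D, bidi_bracket_type::open},
    {0x005D, 0x005B, bidi_bracket_type::close},
    {0x007B, 0x007D, bidi_bracket_type::open},
    {0x007D, 0x007B, bidi_bracket_type::close},
};

// Linear search; returns none for code points that are not in the table.
constexpr bidi_bracket_type bracket_type_of(std::uint32_t cp)
{
    for (auto const & e : bracket_table) {
        if (e.cp == cp)
            return e.type;
    }
    return bidi_bracket_type::none;
}

// Returns the matching bracket, or cp itself when cp is not in the table.
constexpr std::uint32_t paired_bracket(std::uint32_t cp)
{
    for (auto const & e : bracket_table) {
        if (e.cp == cp)
            return e.paired;
    }
    return cp;
}

// Because the data is compiled in, lookups can run at compile time (C++14).
static_assert(bracket_type_of(0x0028) == bidi_bracket_type::open, "");
static_assert(paired_bracket(0x007D) == 0x007B, "");
```

A table compiled into a header like this can be used in constant expressions, which data fetched from a separate library at run time cannot.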
Peter Dimov via Boost said: (by the date of Mon, 24 Aug 2020 17:10:48 +0300)
Either way, if we go with the strict interpretation and decide that
{0x0028, 0x0029, bidi_bracket_type::open},
{0x0029, 0x0028, bidi_bracket_type::close},
{0x005B, 0x005D, bidi_bracket_type::open},
{0x005D, 0x005B, bidi_bracket_type::close},
{0x007B, 0x007D, bidi_bracket_type::open},
{0x007D, 0x007B, bidi_bracket_type::close},
is a derived work of a Unicode data file, I see no way of ever having a Unicode library in Boost.
There are debian packages available which support UTF-8 encoding, conversion etc. Debian has rather strict license requirements. These packages wouldn't be possible if the code numbers of UTF code glyphs weren't publicly accessible. Also there are many online UTF-8 databases. -- # Janek Kozicki http://janek.kozicki.pl/
Janek Kozicki wrote:
There are debian packages available which support UTF-8 encoding, conversion etc. Debian has rather strict license requirements.
Not as strict as Boost's. Boost's requirement is that the code should be licensed under the Boost Licence. Debian accepts code under many licences - for example, there are many Debian packages that are GPL-licensed, and I hope everyone understands that you cannot incorporate someone else's GPL-licensed code into a Boost library. It is also worth noting that on a Debian system you can look in /usr/share/doc/libicu*/copyright and see Unicode's required attribution statement. Similarly, if you have an iPhone you can look in Settings -> General -> Legal & Regulatory -> Legal Notices and see the Unicode attribution statement. This is the practical issue that the Boost licensing policy is trying to deal with: it has been decided that users of Boost should not have the burden of accompanying their Boost-using end products with the thousands of lines of attribution material that you see in those cases. Regards, Phil.
Peter Dimov wrote:
Phil Endecott wrote: ...
4. These generated files are checked in at https://github.com/tzlaine/text/tree/master/include/boost/text/data
https://github.com/tzlaine/text/tree/master/include/boost/text/detail surely?
Err yes, both I think. It doesn't matter much though.
if we go with the strict interpretation and decide that
{0x0028, 0x0029, bidi_bracket_type::open},
{0x0029, 0x0028, bidi_bracket_type::close},
{0x005B, 0x005D, bidi_bracket_type::open},
{0x005D, 0x005B, bidi_bracket_type::close},
{0x007B, 0x007D, bidi_bracket_type::open},
{0x007D, 0x007B, bidi_bracket_type::close},
is a derived work of a Unicode data file, I see no way of ever having a Unicode library in Boost.
I see various possible ways forward though none is particularly appealing:

1. A run-time dependency on libicu, getting libicu to do all the work, as Boost.Locale does.
2. A run-time dependency on libicu that generates these tables by interrogating libicu at start-up, and then using Zach's code.
3. Getting users to download the Unicode data files and run the scripts at Boost compile time.
4. Making an exception to the licensing policy.
5. Removing this from Boost.Text, keeping just the UTFn conversion code and anything else that doesn't depend on the problematic data files (as I advocated in my review).

Regards, Phil.
5. Removing this from Boost.Text, keeping just the UTFn conversion code and anything else that doesn't depend on the problematic data files (as I advocated in my review).
I think this would be to the ultimate detriment of the library. Anything Boost.Text can do for the user is a boon in this regard and only increases the quality of the lib.
1. A run-time dependency on libicu, getting libicu to do all the work, as Boost.Locale does.
I think this would be the best path forward. Obtaining a dynamic version of libicu is relatively straight-forward on all systems and would dodge the licensing issues. - Chris
On Wed, 26 Aug 2020 at 09:59, Christian Mazakas via Boost
5. Removing this from Boost.Text, keeping just the UTFn conversion code and anything else that doesn't depend on the problematic data files (as I advocated in my review).
I think this would be to the ultimate detriment of the library. Anything Boost.Text can do for the user is a boon in this regard and only increases the quality of the lib.
Agree
1. A run-time dependency on libicu, getting libicu to do all the work, as Boost.Locale does.
I think this would be the best path forward. Obtaining a dynamic version of libicu is relatively straight-forward on all systems and would dodge the licensing issues.
I disagree as a user. From a user perspective a libicu dependency does not "dodge" anything. The user then definitely has to comply with the libicu license. Also, for many uses of the words "all systems" (that want to process unicode text) obtaining and using a dynamic version of libicu is anything but "reasonably straight-forward". Having only a licence issue to resolve is simpler than having a library dependency that itself creates a dependency on the same license.

Note I am not referring to any "copyleft" issues that might arise (and might or might not be avoided through dynamic linking - it's hard to dynamically link a header dependency) as there is no copyleft involved in this license.

The preceding is about licence terms - but there is first an issue of what rights the copyright holder actually has over the work, and whether the work is actually copyrightable in the first place. Some would claim that the ICU data files are simply a list of facts and not creative. However, as ICU "made up" (defined) some of those facts, it perhaps can't be said not to be creative.

In the absence of explicit permission from ICU to place a different licence on the generated files, it may be best to ensure that these files do identify their ICU dependency/derivation, and reference the relevant licence. Any/all of Boost.Text that does not depend on these files should be usable in a "NO_ICU" build that does not include/compile any of the generated files. And the ICU licence should be included with and referenced from the generated files.

What a user chooses to do is up to them. Boost has definitely complied. And the user issue is JUST how (and, dare I say, if) to comply with the ICU license terms - which at worst requires distributing the ICU license file somehow. This isn't a huge hardship.
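One way to picture the "NO_ICU" build suggested above is a preprocessor guard around everything that touches the generated tables. This is only a sketch under assumptions: EXAMPLE_NO_ICU, the generated namespace, utf8_length and is_opening_bracket are hypothetical names, and nothing here reflects how Boost.Text is actually organised.

```cpp
#include <cassert>
#include <cstdint>

// EXAMPLE_NO_ICU is a hypothetical macro. Defining it removes every
// declaration that depends on tables derived from the upstream data files.
#ifndef EXAMPLE_NO_ICU
// In a real layout this would sit in its own generated header that carries
// and references the upstream licence notice.
namespace generated {
    constexpr std::uint32_t bracket_pairs[][2] = {
        {0x0028, 0x0029}, {0x005B, 0x005D}, {0x007B, 0x007D}};
}
#endif

// Data-independent, so always available: the number of UTF-8 code units
// needed for a code point. Assumes cp is a valid Unicode scalar value.
constexpr int utf8_length(std::uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

#ifndef EXAMPLE_NO_ICU
// Data-dependent, so only available when the generated table is compiled in.
constexpr bool is_opening_bracket(std::uint32_t cp)
{
    for (auto const & pair : generated::bracket_pairs) {
        if (pair[0] == cp)
            return true;
    }
    return false;
}
#endif
```

With the macro defined, a user gets the encoding utilities without compiling any generated file. Without it, the generated header and its licence reference come along together.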
On 26.08.20 01:58, Christian Mazakas via Boost wrote:
1. A run-time dependency on libicu, getting libicu to do all the work, as Boost.Locale does.
I think this would be the best path forward. Obtaining a dynamic version of libicu is relatively straight-forward on all systems and would dodge the licensing issues.
The only reason I'm interested in Boost.Text is that it will help me get rid of my dependency on ICU, with its broken build system and bloat. A version of Boost.Text that relies on ICU has no value to me, and I would not use it. -- Rainer Deyke (rainerd@eldwood.com)
Sent: Thursday, 27 August 2020 at 07:15 From: "Rainer Deyke via Boost"
On 26.08.20 01:58, Christian Mazakas via Boost wrote:
1. A run-time dependency on libicu, getting libicu to do all the work, as Boost.Locale does.
I think this would be the best path forward. Obtaining a dynamic version of libicu is relatively straight-forward on all systems and would dodge the licensing issues.
The only reason I'm interested in Boost.Text is that it will help me get rid of my dependency on ICU, with its broken build system and bloat. A version of Boost.Text that relies on ICU has no value to me, and I would not use it.
For me it wouldn't become useless, but I'd certainly say it would be the least appealing solution for me and make it significantly less likely that we will use this library in one of our projects. Also, how would such a design impact the interface and performance of this library? For example, this would make it impossible to ever use parts of this lib in a constexpr context, correct?

Considering that it seems unclear whether such measures are necessary to begin with, I'd implore the Boost organization to first seek legal advice and/or contact whoever holds the copyright for libicu to establish that this is really the only viable option before taking "the easy way out" and delivering a technically inferior library just out of uncertainty.

What I wonder: if the same data could - in theory - be scraped from a published ISO standards document, then can that data really be subject to the libicu license requirements? I'm also not positive that, if you put a copyrighted document through a language processor and the output retains none of the structure of the original document, this counts as a "derived work" in the sense of copyright law. Extreme case: if I run a novel through a tool like wc, I don't believe the output of that tool would still be under the copyright of the owner.

Of course IANAL, so my opinion doesn't mean anything, but this is why this should be reviewed by someone who is actually a lawyer.

Best
Mike
Mike via Boost said: (by the date of Fri, 28 Aug 2020 12:47:55 +0200)
Extreme case: If I run a novel through a tool like wc, I don't believe the output of that tool would still be under the copyright of the owner.
That would be an attempt to copyright a natural number. That's not possible. The actual information that UTF *has* is which natural number corresponds to which glyph. Ask about this. -- # Janek Kozicki http://janek.kozicki.pl/
Sent: Friday, 28 August 2020 at 13:23 From: "Janek Kozicki via Boost"
Mike via Boost said: (by the date of Fri, 28 Aug 2020 12:47:55 +0200)
Extreme case: If I run a novel through a tool like wc, I don't believe the output of that tool would still be under the copyright of the owner.
that would be an attempt to copyright a natural number. That's not possible.
The output isn't a natural number, but 3 natural numbers corresponding to the line, word and byte count of that document (at least on my system).
The actual information that UTF *has* is which natural number corresponds to which glyph. Ask about this.
Whom should I ask what?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Mike via Boost said: (by the date of Fri, 28 Aug 2020 15:12:50 +0200)
The actual information that UTF *has* is which natural number corresponds to which glyph. Ask about this.
Whom
My apologies, clearly it was another person who wanted to ask (or already asked) the boost lawyers and the UTF consortium lawyers.
should I ask what?
If this "number" ↔ "glyph" correspondence can be used by boost on the terms of boost license. -- # Janek Kozicki http://janek.kozicki.pl/
On 2020-08-28 19:29, Janek Kozicki via Boost wrote:
Mike via Boost said: (by the date of Fri, 28 Aug 2020 15:12:50 +0200)
The actual information that UTF *has* is which natural number corresponds to which glyph. Ask about this.
Whom
My apologies, clearly it was another person who wanted to ask (or already asked) the boost lawyers and the UTF consortium lawyers.
should I ask what?
If this "number" ↔ "glyph" correspondence can be used by boost on the terms of boost license.
The license applies not to the abstract coding of glyphs or characters but to the data files published by Unicode Inc. that describe this. It is these files Zach translates into bits of Boost.Text code, which is then claimed to be not covered by Unicode Inc. license. The important difference is that (a) Zach is using a material document that is licenseable (as opposed to abstract facts or numbers, which are not) as a source and (b) the result of translation preserves the original information in some meaningful amount (as opposed to e.g. applying a hash function or `wc`, which does not).
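Point (b) can be illustrated with a toy version of the `wc` example. The sketch below is only an illustration of the information-loss argument, not a statement about copyright law. The counts struct and the wc function are hypothetical, and the function only approximates what the real `wc` tool reports.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: the three numbers a wc-like tool reports.
struct counts
{
    std::size_t lines;
    std::size_t words;
    std::size_t bytes;
};

inline bool operator==(counts const & a, counts const & b)
{
    return a.lines == b.lines && a.words == b.words && a.bytes == b.bytes;
}

// Counts newline characters, whitespace-separated words, and bytes.
inline counts wc(std::string const & text)
{
    counts c{0, 0, text.size()};
    bool in_word = false;
    for (unsigned char ch : text) {
        if (ch == '\n')
            ++c.lines;
        if (std::isspace(ch)) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++c.words;
        }
    }
    return c;
}
```

Two unrelated inputs such as "one two\n" and "six ten\n" give identical counts, so the input cannot be recovered from the output. A generated table row, by contrast, can be mapped back to the data line it came from.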
On Mon, Aug 24, 2020 at 9:30 AM Phil Endecott via Boost
Glen, in your review result announcement you said you were confident that Zach would be able to resolve the copyright problems. What did you have in mind when you wrote that?
That the next submission of Text (and thus the content of boostorg/text if accepted) contains nothing that isn't licensed under the BSL. The mechanics of achieving this are fortunately not up to me, but the desired outcome is something that should hold up to scrutiny in the next Boost review.
This worries me. We really, really don't want to be shipping code that has copyright violations!
We do not, of course. (And we need to address any such cases in existing libraries that are shipping with the Boost distribution). Glen
On 2020-08-24 16:29, Phil Endecott via Boost wrote:
Zach Laine wrote:
On Sun, Aug 23, 2020 at 11:08 AM Phil Endecott via Boost
wrote: Could you please explain what you've done about the copyright issues?
Sure. I've reimplemented the code that originally came from ICU, and ...
As far as I can tell, you still depend on the Unicode data files that have a Boost-incompatible licence. You previously included this Unicode copyright text in the documentation but that page has now been removed, if I'm looking in the right place.
... removed the ICU copyright from these files. They are the output of a code generation tool, and so are not copyrightable individually (like the output of lex and yacc).
For the benefit of everyone else let me describe what Zach does:
1. There are some files at unicode.org that have a Boost-incompatible licence.
I believe these are the terms of use published by Unicode Inc.:

http://www.unicode.org/copyright.html
https://www.unicode.org/license.html

I'll cite the relevant paragraph from the latter document:

Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the "Data Files") or Unicode software and any associated documentation (the "Software") to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that either (a) this copyright and permission notice appear with all copies of the Data Files or Software, or (b) this copyright and permission notice appear in associated Documentation.

I'm not a lawyer, but it doesn't seem to contradict our license requirements:

https://www.boost.org/development/requirements.html#License

In particular:

- It grants permission to copy, use and distribute free of charge and commercially.
- It requires the license to appear in the docs or the source code of the data (which, I believe, is in textual form).
- It does *not* require the license to appear in the compiled binaries, which may contain the data in binary form.

Am I missing some aspect where the license is incompatible with Boost requirements?
2. Zach has some Python scripts at https://github.com/tzlaine/text/tree/master/scripts
3. The scripts download the files from unicode.org, convert them into C++ source files, and prefix the result "(C) Zach Laine Boost License".
4. These generated files are checked in at https://github.com/tzlaine/text/tree/master/include/boost/text/data The intention is not that end-users of Boost.Text will run the scripts, but rather that the generated files will be included in the Boost source distribution.
Zach thinks this is OK because "they are the output of a code generation tool, and so are not copyrightable individually (like the output of lex and yacc)".
I think that's completely wrong. I believe it's a well-established principle of software copyright law that the output of a tool - whether that is g++, bison, or rot13 - is a derived work of the input to that tool. You cannot (without permission) take example.y that's (C) Megacorp, run bison on it, and claim that the resulting example.tab.c is now (C) Someone Else.
Changing the copyright indeed does not look good. However, generally, the terms of use of the source and compiled/translated forms can be different, depending on the license.
This worries me. We really, really don't want to be shipping code that has copyright violations!
Agreed. I think Boost needs to consult a lawyer on this matter. Personally, I wouldn't like it if Boost stopped being entirely covered by the BSL. As I said earlier, this is an important property for Boost to be acceptable in many environments. However, if the Unicode Inc. terms of use don't contradict the BSL and Boost License requirements (as determined by a lawyer) and there really is no other way around it, I'd accept it as an exception for the sake of a greater good of improving Unicode support in C++. Should it end up this way, the licensing terms must be explained clearly in the Boost.Text docs, with proper copyright attribution and a note that license compatibility has been verified and confirmed by a lawyer.
Andrey Semashev wrote:
- It does *not* require the license to appear in the compiled binaries, which may contain the data in binary form.
It does. Without explicit permission (which the BSL contains for this very reason), binaries are derived works as much as script-processed files are.
On 2020-08-24 20:22, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
- It does *not* require the license to appear in the compiled binaries, which may contain the data in binary form.
It does. Without explicit permission (which the BSL contains for this very reason), binaries are derived works as much as script-processed files are.
I thought "copies" means copies of the source code, doesn't it? There is also a latch that the license can appear only in the docs, not the data itself, whether in source or binary form. I guess that's what the lawyer has to clarify. But if it indeed means binary form as well then, alas, no Unicode in Boost.
Andrey Semashev wrote:
But if it indeed means binary form as well then, alas, no Unicode in Boost.
It does, but I'm pretty sure the Unicode consortium isn't going to go after us for this specific use (using the data files to generate code that is necessary to implement the functionality specified in the standard.)
On 2020-08-24 21:24, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But if it indeed means binary form as well then, alas, no Unicode in Boost.
It does, but I'm pretty sure the Unicode consortium isn't going to go after us for this specific use (using the data files to generate code that is necessary to implement the functionality specified in the standard.)
It doesn't matter how likely it is. This isn't just about us, but also about our users. One other option would be to ask Unicode Inc. for a permission to use Unicode data in binary form without the requirement to present the license.
On Mon, Aug 24, 2020 at 2:31 PM Andrey Semashev via Boost
On 2020-08-24 21:24, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But if it indeed means binary form as well then, alas, no Unicode in Boost.
It does, but I'm pretty sure the Unicode consortium isn't going to go after us for this specific use (using the data files to generate code that is necessary to implement the functionality specified in the standard.)
It doesn't matter how likely it is. This isn't just about us, but also about our users.
One other option would be to ask Unicode Inc. for a permission to use Unicode data in binary form without the requirement to present the license.
Why is this suddenly a problem if Boost.Spirit has been doing it for years? Zach
On 2020-08-24 22:40, Zach Laine via Boost wrote:
On Mon, Aug 24, 2020 at 2:31 PM Andrey Semashev via Boost
wrote: On 2020-08-24 21:24, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But if it indeed means binary form as well then, alas, no Unicode in Boost.
It does, but I'm pretty sure the Unicode consortium isn't going to go after us for this specific use (using the data files to generate code that is necessary to implement the functionality specified in the standard.)
It doesn't matter how likely it is. This isn't just about us, but also about our users.
One other option would be to ask Unicode Inc. for a permission to use Unicode data in binary form without the requirement to present the license.
Why is this suddenly a problem if Boost.Spirit has been doing it for years?
Because apparently no one knew they were doing this. It may well become a problem of Boost.Spirit now. Personally, I'm still not sure there is a license incompatibility, until a lawyer comments.
Andrey Semashev wrote:
Personally, I'm still not sure there is a license incompatibility, until a lawyer comments.
Lawyers can't tell you whether the Unicode consortium will be OK with it; they can't even tell you with any certainty whether a court will rule that this is a violation. It's a sufficiently gray area that it's potentially problematic, but not clearly problematic. My opinion is that this use is fine in practice. (Using headers bearing the Unicode consortium copyright directly was not.)
Did anyone try to contact the Unicode Consortium about this? I'm not part of that community, but maybe they are OK with it and can officially say so? Or maybe they are not, and then we can avoid guessing what they are going to do.
On Mon, 24 Aug 2020 at 17:22, Peter Dimov via Boost (<boost@lists.boost.org>) wrote:
Andrey Semashev wrote:
Personally, I'm still not sure there is a license incompatibility, until a lawyer comments.
Lawyers can't tell you whether the Unicode consortium will be OK with it; they can't even tell you with any certainty whether a court will rule that this is a violation. It's a sufficiently gray area that it's potentially problematic, but not clearly problematic.
My opinion is that this use is fine in practice. (Using headers bearing the Unicode consortium copyright directly was not.)
On Mon, Aug 24, 2020 at 8:55 PM Damian Vicino via Boost
Did anyone try to contact the unicode consortium about this? I'm not part of that community, but maybe they are ok and they can officially say so? Or maybe they are not and we can avoid guessing what they are going to do.
I've sent them an email. Zach
-----Original Message----- From: Boost
On Behalf Of Peter Dimov via Boost Sent: 24 August 2020 22:22 To: boost@lists.boost.org Cc: Peter Dimov Subject: Re: [boost] Boost.Text Unicode licence issues Andrey Semashev wrote:
Personally, I'm still not sure there is a license incompatibility, until a lawyer comments.
Lawyers can't tell you whether the Unicode consortium will be OK with it; they can't even tell you with any certainty whether a court will rule that this is a violation. It's a sufficiently gray area that it's potentially problematic, but not clearly problematic.
My opinion is that this use is fine in practice. (Using headers bearing the Unicode consortium copyright directly was not.)
I agree with this. It is clearly a grey area, but Boost has a defensible position (and the likelihood of being called to defend it is sub-normal near zero). Confirmation from Unicode would be perfect, but I doubt if we will get it. My 2 p. Paul
On 8/25/20 3:46 AM, Andrey Semashev via Boost wrote:
On 2020-08-24 22:40, Zach Laine via Boost wrote:
On Mon, Aug 24, 2020 at 2:31 PM Andrey Semashev via Boost
wrote: On 2020-08-24 21:24, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But if it indeed means binary form as well then, alas, no Unicode in Boost.
It does, but I'm pretty sure the Unicode consortium isn't going to go after us for this specific use (using the data files to generate code that is necessary to implement the functionality specified in the standard.)
It doesn't matter how likely it is. This isn't just about us, but also about our users.
One other option would be to ask Unicode Inc. for a permission to use Unicode data in binary form without the requirement to present the license.
Why is this suddenly a problem if Boost.Spirit has been doing it for years?
Because apparently no one knew they were doing this. It may well become a problem of Boost.Spirit now.
Personally, I'm still not sure there is a license incompatibility, until a lawyer comments.
Oh my! Michael Caisse brought this to my attention. I too was not aware that this will cause some license problems. The development of the unicode parts of Spirit was not a secret. Development proceeded in full public view and no one pointed out this issue, until now. It never occurred to me that this is a violation. I will defer to consensus in the Boost community. I too would like to hear what a lawyer says about this. Regards, -- Joel de Guzman
The data files that the Unicode Consortium provides are based on the Unicode specifications, yes? Could we not simply reproduce the data tables from the specification? I realize this is potentially a bunch of work. -- Bryce Adelstein Lelbach aka wash US Programming Language Standards (PL22) Chair ISO C++ Library Evolution Chair CppCon and C++Now Program Chair CUDA Core C++ Libraries (Thrust, CUB, libcu++) Lead @ NVIDIA --
On Mon, 24 Aug 2020 at 22:40, Zach Laine via Boost
It doesn't matter how likely it is. This isn't just about us, but also about our users.
One other option would be to ask Unicode Inc. for a permission to use Unicode data in binary form without the requirement to present the license.
Why is this suddenly a problem if Boost.Spirit has been doing it for years?
IANAL and all that, but I don't think that argument works very well. In the case of a hypothetical copyright violation, I have never heard anything suggesting that the length of a period of violation somehow renders the copyright moot, nor is there any suggestion that not coming after a violation does so either.
On Mon, Aug 24, 2020 at 2:47 PM Ville Voutilainen
On Mon, 24 Aug 2020 at 22:40, Zach Laine via Boost
wrote: It doesn't matter how likely it is. This isn't just about us, but also about our users.
One other option would be to ask Unicode Inc. for a permission to use Unicode data in binary form without the requirement to present the license.
Why is this suddenly a problem if Boost.Spirit has been doing it for years?
IANAL and all that, but I don't think that argument works very well. In the case of a hypothetical copyright violation, I have never heard anything suggesting that the length of a period of violation somehow renders the copyright moot, nor is there any suggestion that not coming after a violation does so either.
Sure, but that was not the point of my comment. My point is that if we are concerned that Spirit's use of the Unicode Character Database would cause difficulty for end-user license approval, or potential lawsuits from Unicode (which I think is approximately 0% likely), it probably would have come up already in the last 10 years or so. Zach
On Mon, 24 Aug 2020 at 23:02, Zach Laine
On Mon, Aug 24, 2020 at 2:47 PM Ville Voutilainen
wrote: On Mon, 24 Aug 2020 at 22:40, Zach Laine via Boost
wrote: It doesn't matter how likely it is. This isn't just about us, but also about our users.
One other option would be to ask Unicode Inc. for a permission to use Unicode data in binary form without the requirement to present the license.
Why is this suddenly a problem if Boost.Spirit has been doing it for years?
IANAL and all that, but I don't think that argument works very well. In the case of a hypothetical copyright violation, I have never heard anything suggesting that the length of a period of violation somehow renders the copyright moot, nor is there any suggestion that not coming after a violation does so either.
Sure, but that was not the point of my comment. My point is that if we are concerned that Spirit's use of the Unicode Character Database would cause difficulty for end-user license approval, or potential lawsuits from Unicode (which I think is approximately 0% likely), it probably would have come up already in the last 10 years or so.
Right, that's the "nor is there any suggestion that not coming after a violation does so either" part. That argument doesn't work all that well either. Where does boost go for legal advice?
On 2020-08-24 22:17, Ville Voutilainen via Boost wrote:
Where does boost go for legal advice?
The Software Freedom Conservancy: https://lists.boost.org/Archives/boost/2015/12/226951.php Although we would need confirmation from the Boost Steering Committee that we are still a member.
On Mon, Aug 24, 2020 at 5:01 PM Bjorn Reese wrote:
The Software Freedom Conservancy:
Although we would need confirmation from the Boost Steering Committee that we are still a member.
https://sites.google.com/a/boost.org/steering/boost-foundation "After being a project of Software Freedom Conservancy for over a decade, Boost has spun-off into it owns 501(c)(3) non-profit corporation named Boost Foundation. We are forever grateful to Software Freedom Conservancy for their support and guidance to achieve this milestone. Boost Foundation Board of Directors will now assume the role that was filled by the Boost Steering Committee." Glen
Software Conservancy used to have a general counsel on staff, but
unfortunately he left a few years ago and I don't believe they have
general counsel anymore (he's still on the board of directors,
though). I'm not sure if we have legal representation for the Boost
Foundation, and more specifically the right type of legal
representation (e.g. someone with ample experience in software
intellectual property law).
Software Conservancy or the LLVM Foundation may be able to point us in
the direction of law firms / lawyers specializing in this area. If we
really believe we need a lawyer for this, it will probably cost money.
On Mon, Aug 24, 2020 at 2:04 PM Glen Fernandes via Boost
On Mon, Aug 24, 2020 at 5:01 PM Bjorn Reese wrote:
The Software Freedom Conservancy:
Although we would need confirmation from the Boost Steering Committee that we are still a member.
https://sites.google.com/a/boost.org/steering/boost-foundation
"After being a project of Software Freedom Conservancy for over a decade, Boost has spun-off into it owns 501(c)(3) non-profit corporation named Boost Foundation. We are forever grateful to Software Freedom Conservancy for their support and guidance to achieve this milestone.
Boost Foundation Board of Directors will now assume the role that was filled by the Boost Steering Committee."
-- Bryce Adelstein Lelbach aka wash US Programming Language Standards (PL22) Chair ISO C++ Library Evolution Chair CppCon and C++Now Program Chair CUDA Core C++ Libraries (Thrust, CUB, libcu++) Lead @ NVIDIA --
-----Original Message----- From: Boost On Behalf Of Bryce Adelstein Lelbach aka wash via Boost Sent: 1 September 2020 03:12 To: boost@lists.boost.org Cc: Bryce Adelstein Lelbach aka wash Subject: Re: [boost] Boost.Text Unicode licence issues
Software Conservancy used to have a general counsel on staff, but unfortunately he left a few years ago and I don't believe they have general counsel anymore (he's still on the board of directors, though). I'm not sure if we have legal representation for the Boost Foundation, and more specifically the right type of legal representation (e.g. someone with ample experience in software intellectual property law).
Software Conservancy or the LLVM Foundation may be able to point us in the direction of law firms / lawyers specializing in this area. If we really believe we need a lawyer for this, it will probably cost money.
I sense it is unlikely that we would get a definitive legal opinion, even after paying money, for a license detail that is clearly ill-defined. Are there any reasons why Unicode would wish to complain to (or worse sue) Boost or its users? I can't conceive of any, especially when Unicode have not done so already to Boost when it has been in use for a decade. Have the most stringent reading of the requirements been imposed on any other user of Unicode? Not to my knowledge. Zach's work provides a defensible position and allows us to plausibly claim compliance. I feel we are worrying about a non-issue. Paul
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 2020-09-01 14:50, Paul A Bristow via Boost wrote:
I sense it is unlikely that we would get a definitive legal opinion, even after paying money, for a license detail that is clearly ill-defined.
Are there any reasons why Unicode would wish to complain to (or worse sue) Boost or its users? I can't conceive of any, especially when Unicode have not done so already to Boost when it has been in use for a decade.
Have the most stringent reading of the requirements been imposed on any other user of Unicode? Not to my knowledge.
Zach's work provides a defensible position and allows us to plausibly claim compliance.
I feel we are worrying about a non-issue.
Licensing clarity is important for our downstream users. It doesn't matter if Unicode will sue anyone or not. If they are actually fine with the license not being distributed with binaries, then let them confirm this in written form. If they don't confirm then that would be seen as a violation, which we don't want to have in Boost, regardless of the consequences.
Paul A Bristow wrote:
I sense it is unlikely that we would get a definitive legal opinion, even after paying money, for a license detail that is clearly ill-defined.
I agree. The proper role of our legal representative - if we had one - in this case would not be to provide us with legal advice, but to contact the legal representative of the Unicode Consortium, explain the situation (Boost does not allow libraries that impose an attribution requirement for binaries, which on its face precludes us ever having a Unicode library), ask them to maybe consider dropping that requirement from their license, failing that, ask them for an explicit permission for Boost libraries to use their data files without such a license requirement, failing that, ask them for a clear and an official statement that they do stand by this license requirement. (In the last case all we can do is write a few angry blog posts, tweet them and link them on Reddit.)
-----Original Message----- From: Boost
On Behalf Of Peter Dimov via Boost Sent: 1 September 2020 14:17 To: boost@lists.boost.org Cc: Peter Dimov Subject: Re: [boost] Boost.Text Unicode licence issues Paul A Bristow wrote:
I sense it is unlikely that we would get a definitive legal opinion, even after paying money, for a license detail that is clearly ill-defined.
I agree. The proper role of our legal representative - if we had one - in this case would not be to provide us with legal advice, but to contact the legal representative of the Unicode Consortium, explain the situation (Boost does not allow libraries that impose an attribution requirement for binaries, which on its face precludes us ever having a Unicode library), ask them to maybe consider dropping that requirement from their license, failing that, ask them for an explicit permission for Boost libraries to use their data files without such a license requirement, failing that, ask them for a clear and an official statement that they do stand by this license requirement.
Ok with this - but, since we do not have a legal representative, can someone else write 'officially' from Boost? One of the steering group? Paul
On Tue, Sep 1, 2020 at 8:50 AM Paul A Bristow via Boost
-----Original Message----- From: Boost
On Behalf Of Peter Dimov via Boost Sent: 1 September 2020 14:17 To: boost@lists.boost.org Cc: Peter Dimov Subject: Re: [boost] Boost.Text Unicode licence issues Paul A Bristow wrote:
I sense it is unlikely that we would get a definitive legal opinion, even after paying money, for a license detail that is clearly ill-defined.
I agree. The proper role of our legal representative - if we had one - in this case would not be to provide us with legal advice, but to contact the legal representative of the Unicode Consortium, explain the situation (Boost does not allow libraries that impose an attribution requirement for binaries, which on its face precludes us ever having a Unicode library), ask them to maybe consider dropping that requirement from their license, failing that, ask them for an explicit permission for Boost libraries to use their data files without such a license requirement, failing that, ask them for a clear and an official statement that they do stand by this license requirement.
Ok with this - but, since we do not have a legal representative, can someone else write 'officially' from Boost?
One of the steering group?
Paul
Yes. I'm in the process of doing this now. An email is out; I'm awaiting a response. I'm on the Steering Committee (now the Boost Foundation Board). Zach
On Tue, 1 Sep 2020, 12:13 pm Bryce Adelstein Lelbach aka wash via Boost, < boost@lists.boost.org> wrote:
If we really believe we need a lawyer for this, it will probably cost money.
Boost only needs legal advice if Boost wants to avoid passing the license buck onto users of Boost.Text. The user feedback here is that such avoidance is not essential, and that passing the license through is much preferable to having to depend on the ICU library. The only barrier to passing through the license (and it only applies to the generated files, nothing else) is Boost policy. Boost doesn't need legal advice to grant an exception to that policy. Such an exception would cover only code generated from other people's copyrighted "data only" sources, only where explicitly granted by Boost, and presumably only after Boost has attempted to get the upstream copyright holder(s) to allow at least this specific use under the BSL...
On Mon, Aug 24, 2020 at 8:30 AM Phil Endecott via Boost
Zach Laine wrote:
On Sun, Aug 23, 2020 at 11:08 AM Phil Endecott via Boost
wrote: Could you please explain what you've done about the copyright issues?
Sure. I've reimplemented the code that originally came from ICU, and ...
As far as I can tell, you still depend on the Unicode data files that have a Boost-incompatible licence. You previously included this Unicode copyright text in the documentation but that page has now been removed, if I'm looking in the right place.
... removed the ICU copyright from these files. They are the output of a code generation tool, and so are not copyrightable individually (like the output of lex and yacc).
For the benefit of everyone else let me describe what Zach does:
1. There are some files at unicode.org that have a Boost-incompatible licence.
2. Zach has some Python scripts at https://github.com/tzlaine/text/tree/master/scripts
3. The scripts download the files from unicode.org, convert them into C++ source files, and prefix the result "(C) Zach Laine Boost License".
4. These generated files are checked in at https://github.com/tzlaine/text/tree/master/include/boost/text/data The intention is not that end-users of Boost.Text will run the scripts, but rather that the generated files will be included in the Boost source distribution.
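To make steps 2 to 4 concrete, here is a minimal sketch of that kind of generator. It is not one of the actual Boost.Text scripts: the function names, the table layout, and the header text are all hypothetical, and it parses a few inline lines in the UnicodeData.txt format (semicolon-separated fields) instead of downloading anything from unicode.org.

```python
# Hypothetical sketch only; NOT the real scripts from the Boost.Text repository.
# Input format assumed: UnicodeData.txt-style lines, semicolon-separated, where
# field 0 is the code point in hex and field 3 is the canonical combining class.

SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
0327;COMBINING CEDILLA;Mn;202;NSM;;;;;N;NON-SPACING CEDILLA;;;;
"""


def parse_ccc(text):
    """Return (code_point, combining_class) pairs whose class is nonzero."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(";")
        code_point = int(fields[0], 16)
        combining_class = int(fields[3])
        if combining_class != 0:
            pairs.append((code_point, combining_class))
    return pairs


def emit_cpp(pairs, header_comment):
    """Render the pairs as C++ source text, prefixed by header_comment."""
    lines = [f"// {c}" for c in header_comment.splitlines()]
    lines.append("// Generated file; do not edit by hand.")
    lines.append("struct ccc_entry { unsigned int cp; unsigned char ccc; };")
    lines.append("constexpr ccc_entry ccc_table[] = {")
    for code_point, combining_class in pairs:
        lines.append(f"    {{0x{code_point:04X}, {combining_class}}},")
    lines.append("};")
    return "\n".join(lines) + "\n"


# The header text is a placeholder; which notice belongs here is the open question.
generated = emit_cpp(parse_ccc(SAMPLE), "Derived from UnicodeData.txt.")
print(generated)
```

The point of the sketch is that every value in the generated table comes straight from the input file; the script only changes its form. Which notice should go in `header_comment` is exactly what is in dispute, and the sketch takes no position on it.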
Zach thinks this is OK because "they are the output of a code generation tool, and so are not copyrightable individually (like the output of lex and yacc)".
I think that's completely wrong. I believe it's a well-established principle of software copyright law that the output of a tool - whether that is g++, bison, or rot13 - is a derived work of the input to that tool. You cannot (without permission) take example.y that's (C) Megacorp, run bison on it, and claim that the resulting example.tab.c is now (C) Someone Else.
This worries me. We really, really don't want to be shipping code that has copyright violations!
Agreed, though I don't think this is an instance of one. If this is a copyright violation, we have been in violation for years and years already. Look in
boost/spirit/home/support/char_encoding/unicode/UnicodeData.txt
boost/spirit/home/support/char_encoding/unicode/DerivedCoreProperties.txt
and the other files in that directory. Note that these are in the header paths, not inside src/ or something. DerivedCoreProperties.txt even has the Unicode copyright still on it. Moreover, the data in the files in that directory is derived from the .txt files. Even though the .txt files appear to have been removed on November 19, we end up with the same problem, to the extent it is a problem, that Phil raises -- distribution of code derived from non-code .txt files. Zach
For the record, Zach requested to mark the Text submission as withdrawn [1].
So, we are currently not looking for a review manager.
[1] https://github.com/boostorg/website/commit/f1da3a92969d0e778b43c0a77ac16dffe...
-- Mateusz Loskot, http://mateusz.loskot.net
participants (17)
- Andrey Semashev
- Bjorn Reese
- Bryce Adelstein Lelbach aka wash
- Christian Mazakas
- Damian Vicino
- Darryl Green
- Glen Fernandes
- Janek Kozicki
- Joel de Guzman
- Mateusz Loskot
- Mike
- pbristow@hetp.u-net.com
- Peter Dimov
- Phil Endecott
- Rainer Deyke
- Ville Voutilainen
- Zach Laine