Making copyright holders easier to parse

Giovanni Mascellani

27 Jul 2019

1:36 a.m.

Dear Boost developers,

I am one of the Debian developers maintaining the Boost package in Debian. As part of packaging policy, we want the copyright status of each file in Debian packages to be documented, which includes at least the names of the copyright holders and the license of that particular file. As you can imagine, this is rather complicated for Boost, since there are a lot of files with many copyright holders.

I started to use the tool bcp, which among other things is able to collect such copyright information. My clone is available here[1]. However, bcp is often confused by the inconsistent style in which copyright holders are listed in different files, so it required a lot of manual fixing. For example, sometimes copyright years are written before the name, sometimes after, sometimes they're absent; names are sometimes separated by commas, sometimes by newlines, maybe even sometimes by nothing. Sometimes there are spelling mistakes or casing inconsistencies between names, or inconsistent institutions' names. Bcp has some complicated regular expressions to overcome all these differences, but the result is brittle at best.

[1] https://salsa.debian.org/gio/boost-copyright

Since I am doing this manual work anyway, I might as well update the Boost files so that their copyright headers are more consistent and easier to parse, and then I could submit patches to have them fixed in the official Boost repositories.

My question is: are you interested in this kind of patch? Of course I would still go through the ordinary patch submission procedure. I am just asking whether this kind of patch would be well received or not.

To better illustrate what I mean, let me consider a few examples (the chosen files are completely random); file libs/log/include/boost/log/exceptions.hpp currently has:

/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

File libs/atomic/include/boost/atomic/fences.hpp has:

/*
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 * Copyright (c) 2011 Helge Bahmann
 * Copyright (c) 2013 Tim Blechmann
 * Copyright (c) 2014 Andrey Semashev
 */

I might change this to:

/*
 * Copyright: 2011 Helge Bahmann
 * Copyright: 2013 Tim Blechmann
 * Copyright: 2014 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I hope these examples illustrate my intention. I think that having more easily parsable copyright holders could be useful for Debian, for Boost and for Boost adopters who should properly care about the licensing of the libraries they use.

Thank you, Giovanni.
--
Giovanni Mascellani <g.mascellani@gmail.com>
Postdoc researcher - Université Libre de Bruxelles


jrmarsha

27 Jul

3:11 a.m.

This sounds entirely reasonable.

Antony Polukhin

4:46 a.m.

On Sat, Jul 27, 2019, 04:44 Giovanni Mascellani via Boost < boost@lists.boost.org> wrote: <...>

...

My question is: are you interested in this kind of patch? Of course I would still go through the ordinary patch submission procedure. I am just asking whether this kind of patch would be well received or not.

I'd be glad to merge such patches. Many thanks in advance! BTW, we'll have to update one of our static analysis tools to make sure that all the files have the right copyright notice format.

pbristow＠hetp.u-net.com

9:14 a.m.

...

-----Original Message----- From: Boost <boost-bounces@lists.boost.org> On Behalf Of Antony Polukhin via Boost Sent: 27 July 2019 05:46 To: boost@lists.boost.org List <boost@lists.boost.org> Cc: Antony Polukhin <antoshkka@gmail.com> Subject: Re: [boost] Making copyright holders easier to parse

On Sat, Jul 27, 2019, 04:44 Giovanni Mascellani via Boost < boost@lists.boost.org> wrote: <...>

...
My question is: are you interested in this kind of patch? Of course I would still go through the ordinary patch submission procedure. I am just asking whether this kind of patch would be well received or not.

I'd be glad to merge such patches. Many thanks in advance!

BTW, we'll have to update one of our static analysis tools to make sure that all the files have the right copyright notice format.

We already have our inspect tool https://www.boost.org/doc/libs/release/tools/inspect/ (although we don't make as much use of it as we should, and we ought to be checking it more carefully and repairing any copyright omissions).

This should (and I think does) ensure that there is a copyright claim and Boost license text/link for every file. And we could change it to enforce uniform content (at the significant price of a big churn of file changes and big rebuilds - Boost is BIG).

Is it really necessary to collect all the individual copyright owners' names? Can't you just record them as a 'Member of Boost'?

Just checking 😉

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal, Cumbria
LA8 8AB UK

pbristow＠hetp.u-net.com

29 Jul

8:50 a.m.

...

-----Original Message----- From: Boost <boost-bounces@lists.boost.org> On Behalf Of Paul A Bristow via Boost Sent: 27 July 2019 10:14 To: boost@lists.boost.org Cc: pbristow@hetp.u-net.com Subject: Re: [boost] Making copyright holders easier to parse

...
-----Original Message----- From: Boost <boost-bounces@lists.boost.org> On Behalf Of Antony Polukhin via Boost Sent: 27 July 2019 05:46 To: boost@lists.boost.org List <boost@lists.boost.org> Cc: Antony Polukhin <antoshkka@gmail.com> Subject: Re: [boost] Making copyright holders easier to parse

On Sat, Jul 27, 2019, 04:44 Giovanni Mascellani via Boost < boost@lists.boost.org> wrote: <...>

...
My question is: are you interested in this kind of patch? Of course I would still go through the ordinary patch submission procedure. I am just asking whether this kind of patch would be well received or not.

I have another suggestion.

We already have the inspect program, written in C++, which 'parses' and emits an HTML report on missing copyright (and many other transgressions of Boost guidelines). See https://www.boost.org/doc/libs/release/tools/inspect/inspect.cpp and copyright_check.cpp and .hpp

I suspect that this could easily be altered to add an output to a file of copyright author(s) and date(s) in whatever format and file type is easiest for Debian to deal with. For example a test file containing

Library_name Author(s)_name Copyright_Date(s)
...

The build tools are in I:\boost\tools\inspect

If this works, I feel we could use this updated version in Boost itself. Other packagers and those needing to jump through copyright and GDPR hoops might find it helpful.

This would avoid a paroxysm in our extensive CI system 😊

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal, Cumbria
LA8 8AB UK
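To make the suggested output file concrete, here is a minimal Python sketch of writing one row per library, author and date range. It does not reflect anything the inspect tool currently does; the function name `write_report`, the tab-separated layout and the sample records are assumptions made purely for illustration.

```python
import csv
import io


def write_report(records, out):
    """Write (library, author, years) tuples as tab-separated rows.

    The column names follow the example in the message above; the
    tab-separated layout itself is an assumption of this sketch.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Library_name", "Author(s)_name", "Copyright_Date(s)"])
    for library, author, years in sorted(records):
        writer.writerow([library, author, years])


# Sample records taken from the headers quoted earlier in the thread.
records = [
    ("log", "Andrey Semashev", "2007-2015"),
    ("atomic", "Helge Bahmann", "2011"),
]
buffer = io.StringIO()
write_report(records, buffer)
print(buffer.getvalue(), end="")
```

A plain delimited file like this would be easy to consume from a packaging script, but any other format would work just as well.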

Andrey Semashev

27 Jul

9:02 a.m.

On 7/27/19 4:36 AM, Giovanni Mascellani via Boost wrote:

...

/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

File libs/atomic/include/boost/atomic/fences.hpp has:

/*
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 * Copyright (c) 2011 Helge Bahmann
 * Copyright (c) 2013 Tim Blechmann
 * Copyright (c) 2014 Andrey Semashev
 */

I might change this to:

/*
 * Copyright: 2011 Helge Bahmann
 * Copyright: 2013 Tim Blechmann
 * Copyright: 2014 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I hope these examples illustrate my intention. I think that having more easily parsable copyright holders could be useful for Debian, for Boost and for Boost adopters who should properly care about the licensing of the libraries they use.

I'd prefer if the license headers were also easily readable by human users, since it is humans these headers are intended for in the first place. In particular, keep the copyright holders visually separate from the license, e.g. by an empty line. I'm not very keen on using a colon to introduce a-la-HTTP headers, but that might be OK if everyone agrees.

Are we sure that the "Distributed under" part has no legal significance?

Also, we have an "inspect" tool that checks for the license header presence. Make sure that the modified headers satisfy that tool.

Also, it might be a good time to update license URLs to https.

Rene Rivera

11:04 a.m.

Not to discourage your effort but...

On Fri, Jul 26, 2019 at 7:44 PM Giovanni Mascellani via Boost <boost@lists.boost.org> wrote:

...

/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult, as the current template, as described in <https://www.boost.org/users/license.html#FAQ>, was a product of the original creation of the Boost Software License.

Although I'm all for making the attributions and licensing consistent :-)

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net

Bo Persson

11:51 a.m.

On 2019-07-27 at 13:04, Rene Rivera via Boost wrote:

...

Not to discourage your effort but...

On Fri, Jul 26, 2019 at 7:44 PM Giovanni Mascellani via Boost < boost@lists.boost.org> wrote:

...

/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult, as the current template, as described in <https://www.boost.org/users/license.html#FAQ>, was a product of the original creation of the Boost Software License.

Sounds like a good idea.

Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but say nothing about "Copyright:".

Wouldn't want to stumble on such a technicality, would we? :-)

...

Although I'm all for making the attributions and licensing consistent :-)

Right. Bo Persson

Hans Dembinski

5 Aug

7:56 a.m.

...

On 27. Jul 2019, at 06:51, Bo Persson via Boost <boost@lists.boost.org> wrote:

Sounds like a good idea.

Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but say nothing about "Copyright:".

Wouldn't want to stumble on such a technicality, would we? :-)

...
Although I'm all for making the attributions and licensing consistent :-)

Right.

The best proposal I have seen in this thread is to consistently apply the template from https://www.boost.org/users/license.html everywhere. If it is consistently applied, it is easy to parse automatically, even if the format is not particularly parser-friendly.

As was said before, we need a new check in the Boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html to enforce consistency, since the inspection reports http://boost.cowic.de/rc/docs-inspect-develop.html are currently not enforced.

License and copyright errors are by far the most common problems found by the inspection tool, so by adding a check to the test matrix we could clean that up a lot.

Best regards,
Hans

pbristow＠hetp.u-net.com

9:37 a.m.

...

-----Original Message----- From: Boost <boost-bounces@lists.boost.org> On Behalf Of Hans Dembinski via Boost Sent: 5 August 2019 08:57 To: Boost Devs <boost@lists.boost.org> Cc: Hans Dembinski <hans.dembinski@gmail.com>; Bo Persson <bo@bo-persson.se> Subject: Re: [boost] Making copyright holders easier to parse

...
On 27. Jul 2019, at 06:51, Bo Persson via Boost <boost@lists.boost.org> wrote:

Sounds like a good idea.

Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but say nothing about "Copyright:".

Wouldn't want to stumble on such a technicality, would we? :-)

...
Although I'm all for making the attributions and licensing consistent :-)

Right.

The best proposal I have seen in this thread is to consistently apply the template from https://www.boost.org/users/license.html everywhere. If it is consistently applied, it is easy to parse automatically, even if the format is not particularly parser-friendly.

As was said before, we need a new check in the Boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html to enforce consistency, since the inspection reports http://boost.cowic.de/rc/docs-inspect-develop.html are currently not enforced.

License and copyright errors are by far the most common problems found by the inspection tool, so by adding a check to the test matrix we could clean that up a lot.

+1

The inspect tool is neglected. It can (and should) be run *locally* by each library maintainer.

Cd to boost/libs/somelibrary

>inspect > inspect.html

And inspect the file written, called inspect.html - or whatever you called it (and then delete it after reading, and correcting 'transgressions', to avoid inspect.html being flagged as a dodgy file 😉).

Paul

Rene Rivera

11:33 a.m.

On Mon, Aug 5, 2019 at 2:57 AM Hans Dembinski via Boost < boost@lists.boost.org> wrote:

...

...
On 27. Jul 2019, at 06:51, Bo Persson via Boost <boost@lists.boost.org> wrote:

Sounds like a good idea.

Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but say nothing about "Copyright:".

Wouldn't want to stumble on such a technicality, would we? :-)

...
Although I'm all for making the attributions and licensing consistent :-)

Right.

The best proposal I have seen in this thread is to consistently apply the template from https://www.boost.org/users/license.html everywhere. If it is consistently applied, it is easy to parse automatically, even if the format is not particularly parser-friendly.

...

As was said before, we need a new check in the Boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html to enforce consistency, since the inspection reports http://boost.cowic.de/rc/docs-inspect-develop.html are currently not enforced.

License and copyright errors are by far the most common problems found by the inspection tool, so by adding a check to the test matrix we could clean that up a lot.

Maybe.. It should be easy though. It just takes someone to add such a check to <https://github.com/boostorg/boost/blob/develop/status/boost_check_library.py>.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net

Vinnie Falco

4:15 p.m.

Q: "How many Boost engineers does it take to change a copyright notice?" A: "All of them."

Joseph Van Riper

6 Aug

10:39 a.m.

On Mon, Aug 5, 2019 at 12:15 PM Vinnie Falco via Boost < boost@lists.boost.org> wrote:

...

Q: "How many Boost engineers does it take to change a copyright notice?"

A: "All of them."

Ah, but it's peer reviewed to be the best copyright notice available. - Trey

jrmarsha

7 Aug

3:50 a.m.

Can someone point me to where the license checking is used in CI? boost-ci doesn't have an example.

Hans Dembinski

8:06 a.m.

...

On 7. Aug 2019, at 05:50, jrmarsha via Boost <boost@lists.boost.org> wrote:

Can someone point me to where the license checking is used in CI? boost-ci doesn't have an example.

Maybe you should read this whole thread again carefully, the answers are here.

There is currently no license checking done in CI. There is the inspect tool https://github.com/boostorg/inspect which has a license check, but it is not run in CI and it was not written to be run as a unit test.

boost-ci is a new project, it is not used by all Boost projects. You probably want to add the check to __boost_check_library__, a special check that is run together with the library-local tests in the Boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html

You can find the code for boost_check_library here: https://github.com/boostorg/boost/blob/master/status/boost_check_library.py

Best regards,
Hans
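As a rough illustration of the kind of check being discussed, the Python sketch below tests whether the top of a source file carries a copyright line and the Boost license sentence. It is not taken from boost_check_library.py or from the inspect tool, and it does not use their interfaces; the function name `header_problems`, the 30-line window and the exact patterns are assumptions made for this example.

```python
import re

# Sentence used by the license template quoted in this thread.
LICENSE_LINE = "Distributed under the Boost Software License, Version 1.0."

# Assumption: a usable copyright line contains the word and a 4-digit year.
COPYRIGHT_RE = re.compile(r"\bCopyright\b.*\d{4}")


def header_problems(source_text, max_lines=30):
    """Return a list of problems found in the first `max_lines` lines.

    An empty list means the header looks acceptable to this sketch.
    The 30-line window is an arbitrary choice, not a Boost rule.
    """
    head = source_text.splitlines()[:max_lines]
    problems = []
    if not any(COPYRIGHT_RE.search(line) for line in head):
        problems.append("no copyright line with a year")
    if not any(LICENSE_LINE in line for line in head):
        problems.append("no Boost Software License reference")
    return problems


good = """/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 */
"""
print(header_problems(good))
print(header_problems("int main() {}\n"))
```

A check of this shape returns data rather than printing a report, which would make it easier to turn into a pass/fail test than an HTML summary.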

Glen Fernandes

27 Jul

1:22 p.m.

On Sat, Jul 27, 2019 at 7:05 AM Rene Rivera wrote:

...

Not to discourage your effort but...

Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult, as the current template, as described in <https://www.boost.org/users/license.html#FAQ>, was a product of the original creation of the Boost Software License.

Although I'm all for making the attributions and licensing consistent :-)

+1.

Instead of inventing a new format now, if anything, why not make them consistent with what https://www.boost.org/users/license.html prescribes? After all, many of our libraries are already consistent with it, and as Rene (and the page) conveys, some effort went into deciding things like that.

The page has the format "Copyright Joe Coder 2004 - 2006" and most libraries have that, or "Copyright (C) Joe Coder 2004 - 2006". Both are easy to parse without needing to introduce a colon after "Copyright".

In any case, some discussion needs to happen around what format we want, and the License page should be updated first, all before any pull requests start being made.

Glen

Giovanni Mascellani

6:38 p.m.

Hi,

On 27/07/19 10:22, Glen Fernandes wrote:

...

+1.

Thank you for everybody's feedback, which seems to be mostly positive!

...

Instead of inventing a new format now, if anything, why not make them consistent with what https://www.boost.org/users/license.html prescribes? After all, many of our libraries are already consistent with it, and as Rene (and the page) conveys, some effort went into deciding things like that.

The page has the format "Copyright Joe Coder 2004 - 2006" and most libraries have that, or "Copyright (C) Joe Coder 2004 - 2006". Both are easy to parse without needing to introduce a colon after "Copyright".

I wasn't actually pushing for any specific header format, sorry for not making this clear. To me, anything that can be easily automatically parsed is fine, including of course the one mentioned in the FAQs (which I had not previously noticed).

However, that template does not cover the case of more than one copyright holder. Would something like this be acceptable?

// Copyright Joe Coder 2004 - 2006.
// Copyright Bob Hacker 2010 - 2015.
// Copyright Department of Writing Very Long Names, Newline
// Company Inc. 2017 - 2019.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// https://www.boost.org/LICENSE_1_0.txt)

If not, what else? (The thing with a very long name is not contrived; there is already an "Institute of Transport, Railway Construction and Operation, University of Hanover" in Boost.)

If this proposal seems appropriate to you, I can start to patch a few files and submit them here, so that they can be evaluated more carefully (this might not be immediate, as I still have to write the code to do so).

Thanks again, Giovanni.
--
Giovanni Mascellani <g.mascellani@gmail.com>
Postdoc researcher - Université Libre de Bruxelles
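To show that a header in the proposed multi-holder layout could be read mechanically, including a holder name wrapped over two lines, here is a minimal Python sketch. It is neither bcp's logic nor any existing Boost tool; the function name `copyright_holders` and the rule that an entry ends at a year (or year range) followed by a period are assumptions of this example.

```python
import re

# Assumed shape of one complete entry: "Copyright <name> <years>."
ENTRY_RE = re.compile(
    r"^Copyright\s+(?P<name>.+?)\s+(?P<years>\d{4}(?:\s*-\s*\d{4})?)\.$"
)

EXAMPLE = """\
// Copyright Joe Coder 2004 - 2006.
// Copyright Bob Hacker 2010 - 2015.
// Copyright Department of Writing Very Long Names, Newline
//           Company Inc. 2017 - 2019.
// Distributed under the Boost Software License, Version 1.0.
"""


def copyright_holders(header):
    """Return a list of (name, years) tuples found in a comment header."""
    holders = []
    pending = None  # text of an entry that has started but not yet ended
    for raw in header.splitlines():
        text = raw.lstrip("/*# \t").rstrip()  # drop comment leaders
        if text.startswith("Copyright"):
            pending = text
        elif pending is not None:
            pending += " " + text  # continuation of a wrapped name
        else:
            continue  # line outside any copyright entry
        match = ENTRY_RE.match(pending)
        if match:
            holders.append((match["name"], match["years"]))
            pending = None
    return holders


for name, years in copyright_holders(EXAMPLE):
    print(years, name)
```

The sketch relies on every entry being terminated by its years and a period; an entry that never terminates would swallow the lines after it, so a real tool would need a stricter stop condition.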

Vinnie Falco

11:42 p.m.

On Sat, Jul 27, 2019 at 6:23 AM Glen Fernandes via Boost <boost@lists.boost.org> wrote:

...

...
Although I'm all for making the attributions and licensing consistent :-)

I am very much in favor of this as long as it does not require any of my source files to change. Regards

pbristow＠hetp.u-net.com

28 Jul

8:27 a.m.

...

-----Original Message----- From: Boost <boost-bounces@lists.boost.org> On Behalf Of Vinnie Falco via Boost Sent: 28 July 2019 00:43 To: boost@lists.boost.org List <boost@lists.boost.org> Cc: Vinnie Falco <vinnie.falco@gmail.com> Subject: Re: [boost] Making copyright holders easier to parse

On Sat, Jul 27, 2019 at 6:23 AM Glen Fernandes via Boost <boost@lists.boost.org> wrote:

...
...
Although I'm all for making the attributions and licensing consistent :-)

I am very much in favor of this as long as it does not require any of my source files to change.

+1

Because we have a massive Continuous Integration system with many compilers and platforms that picks up changes to source files, *any* change to source, test or documentation files is likely to trigger a massive recompilation and rebuild and retest and redocumentation.

That will use a lot of machine time (already insufficient).

I think that we are pretty much following the guidelines https://www.boost.org/users/license.html (though they do not mention the common case of multiple authors).

We (and anyone) can check the results of the inspect program to confirm that all files have a Boost copyright claim.

All copyright lines have either a name or a date after the word copyright. Nobody has a digit in their name?

Surely this isn't too difficult to parse?

Any that cause trouble can be fixed individually if you tell us?

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal, Cumbria
LA8 8AB UK

Marc Glisse

9:06 a.m.

On Sun, 28 Jul 2019, Paul A Bristow via Boost wrote:

...

Because we have a massive Continuous Integration system with many compilers and platforms that picks up changes to source files, *any* change to source, test or documentation files is likely to trigger a massive recompilation and rebuild and retest and redocumentation.

That will cause a load of machine time (already insufficient)..

It appears that continuous integration in Boost is a failure. CI is supposed to make it easier to make changes (it checks that your modifications don't break stuff). However, it seems that it is actually preventing people from touching anything.

Not quite trolling, this seems like an argument to disable CI (or at least change its configuration significantly), not to avoid making the source changes.

(for the Debian Boost packages, I was hoping that the election of a new DPL would help relax the requirements a bit...)

--
Marc Glisse

Andrey Semashev

11:54 a.m.

On 7/28/19 11:27 AM, Paul A Bristow via Boost wrote:

...

Because we have a massive Continuous Integration system with many compilers and platforms that picks up changes to source files, *any* change to source, test or documentation files is likely to trigger a massive recompilation and rebuild and retest and redocumentation.

I don't think we build docs during our CI jobs, do we?

...

That will cause a load of machine time (already insufficient)..

If you want a commit to not trigger a CI job, you can add "[ci skip]" to the commit title line. Unfortunately, in the case of PRs, that is on the submitter's conscience.

Michael Caisse

29 Jul

5:10 p.m.

On 7/28/19 01:27, Paul A Bristow via Boost wrote:

...

Because we have a massive Continuous Integration system with many compilers and platforms that picks up changes to source files, *any* change to source, test or documentation files is likely to trigger a massive recompilation and rebuild and retest and redocumentation.

That will cause a load of machine time (already insufficient)..

We can stage this so there is a single update at the super project.

--
Michael Caisse
Ciere Consulting
ciere.com

jrmarsha

27 Jul

1:44 p.m.

...

Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult.

I don't think that is the case. All the same information is conveyed with very similar context, manner, and detail. What I'd be interested in is adding this automated check to the general CI infrastructure.

James E. King III

28 Jul

1:10 p.m.

On Fri, Jul 26, 2019 at 9:44 PM Giovanni Mascellani via Boost <boost@lists.boost.org> wrote:

...

I hope these examples illustrate my intention. I think that having more easily parsable copyright holders could be useful for Debian, for Boost and for Boost adopters who should properly care about the licensing of the libraries they use.

Not a huge fan of the HTTP header syntax for copyright statements. Everyone has their own style as you can see. I always use:

Copyright (C) YYYY - YYYY James E. King III

In the examples shown, the ability to parse the original exists:

A) Indication of a copyright,
B) A year or a year range (YYYY, YYYY-YYYY, YYYY - YYYY, "YYYY, YYYY - YYYY, YYYY, ...")
C) An optional copyright symbol
D) One or more names

All four sections are easily defined by allowed character content and/or keyword. Does "bcp" identify the files that it finds a "?opyright" statement in but cannot parse? Why not just fix those?

Given we already have one (or more) regular expressions to find this information, how about adding Mergeable as a GitHub app to our repositories and adding a condition for a successful PR so things do not degrade?

- Jim
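The four parts A) to D) listed above can be sketched as a single regular expression that accepts the years either before or after the name. This is a hypothetical Python pattern written for this illustration, not the expressions bcp actually uses; the names `YEARS`, `COPYRIGHT_RE` and `parse_copyright` are made up here, and the pattern assumes one holder per line with no digits in the name.

```python
import re

# B) One year or year range, optionally repeated in a comma-separated list,
# e.g. "2014", "2007-2015", "2004 - 2006", "2001, 2003 - 2005".
YEARS = r"\d{4}(?:\s*-\s*\d{4})?(?:\s*,\s*\d{4}(?:\s*-\s*\d{4})?)*"

COPYRIGHT_RE = re.compile(
    rf"""
    [Cc]opyright\s*:?\s*                 # A) keyword; a trailing colon is tolerated
    (?:\([Cc]\)\s*|©\s*)?                # C) optional copyright symbol
    (?:
        (?P<years_first>{YEARS})\s+(?P<name_after>\D+?)    # B) then D)
      |
        (?P<name_before>\D+?)\s+(?P<years_last>{YEARS})    # D) then B)
    )
    \s*\.?\s*$                           # optional final period
    """,
    re.VERBOSE,
)


def parse_copyright(line):
    """Return (name, years) for a copyright line, or None if it does not parse."""
    match = COPYRIGHT_RE.search(line)
    if match is None:
        return None
    name = match["name_after"] or match["name_before"]
    years = match["years_first"] or match["years_last"]
    return name.strip(), years


for sample in (
    " * Copyright Andrey Semashev 2007 - 2015.",
    " * Copyright (c) 2011 Helge Bahmann",
    " * Copyright: 2007-2015 Andrey Semashev",
):
    print(parse_copyright(sample))
```

Lines that return `None` are exactly the ones a maintainer would want listed for manual fixing, which matches the "why not just fix those?" suggestion.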

participants (14)

Andrey Semashev
Antony Polukhin
Bo Persson
Giovanni Mascellani
Glen Fernandes
Hans Dembinski
James E. King III
Joseph Van Riper
jrmarsha
Marc Glisse
Michael Caisse
pbristow＠hetp.u-net.com
Rene Rivera
Vinnie Falco