Making copyright holders easier to parse
Dear Boost developers,

I am one of the Debian developers maintaining the Boost package in Debian. As part of packaging policy, we want the copyright status of each file in Debian packages to be documented, which includes at least the names of the copyright holders and the license of that particular file. As you can imagine, this is rather complicated for Boost, since there are a lot of files with many copyright holders.

I started to use the tool bcp, which among other things is able to collect such copyright information. My clone is available here[1]. However, bcp is often confused by the inconsistent style in which copyright holders are listed in different files, so it required a lot of manual fixing. For example, sometimes copyright years are written before the name, sometimes after, sometimes they're absent; names are sometimes separated by commas, sometimes by newlines, maybe even sometimes by nothing. Sometimes there are spelling mistakes or casing inconsistencies between names, or inconsistent institution names. Bcp has some complicated regular expressions to overcome all these differences, but the result is brittle at best.

[1] https://salsa.debian.org/gio/boost-copyright

Since I am doing this manual work anyway, I might as well update the Boost files so that their copyright headers are more consistent and easier to parse, and then I could submit patches to you to have them fixed in the official Boost repositories.

My question is: are you interested in this kind of patch? Of course I would still go through the ordinary patch submission procedure. I am just asking whether patches of this kind would be well received or not.

To better illustrate what I mean, let me consider a few examples (the chosen files are completely random); file libs/log/include/boost/log/exceptions.hpp currently has:
/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

File libs/atomic/include/boost/atomic/fences.hpp has:

/*
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 * Copyright (c) 2011 Helge Bahmann
 * Copyright (c) 2013 Tim Blechmann
 * Copyright (c) 2014 Andrey Semashev
 */

I might change this to:

/*
 * Copyright: 2011 Helge Bahmann
 * Copyright: 2013 Tim Blechmann
 * Copyright: 2014 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */
I hope these examples illustrate my intention. I think that having more
easily parsable copyright holders could be useful for Debian, for Boost
and for Boost adopters which should properly care about the licensing of
the libraries they use.
Thank you, Giovanni.
--
Giovanni Mascellani
On Sat, Jul 27, 2019, 04:44 Giovanni Mascellani via Boost < boost@lists.boost.org> wrote: <...>
My question is: are you interested in this kind of patches? Of course I would still go through the ordinary patch submission procedure. I am just asking if this kind of patches would be well received or not.
I'd be glad to merge such patches. Many thanks in advance! BTW, we'll have to update one of our static analysis tools to make sure that all the files have the right copyright notice format.
-----Original Message-----
From: Boost On Behalf Of Antony Polukhin via Boost
Sent: 27 July 2019 05:46
To: boost@lists.boost.org List
Cc: Antony Polukhin
Subject: Re: [boost] Making copyright holders easier to parse

On Sat, Jul 27, 2019, 04:44 Giovanni Mascellani via Boost <boost@lists.boost.org> wrote: <...>
My question is: are you interested in this kind of patches? Of course I would still go through the ordinary patch submission procedure. I am just asking if this kind of patches would be well received or not.
I'd be glad to merge such patches. Many thanks in advance!
BTW, we'll have to update one of our static analysis tools to make sure that all the files have the right copyright notice format.
We already have our inspect tool https://www.boost.org/doc/libs/release/tools/inspect/ (although we don't make as much use of it as we should, and we ought to be checking it more carefully and repairing any copyright omissions).

This should (and I think does) ensure that there is a copyright claim and Boost license text/link for every file.

And we could change it to enforce uniform content (at the significant price of a big churn of file changes and big rebuilds - Boost is BIG).

Is it really necessary to collect all the individual copyright owners' names? Can't you just record them as a 'Member of Boost'?

Just checking 😉

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal, Cumbria
LA8 8AB UK
-----Original Message-----
From: Boost On Behalf Of Paul A Bristow via Boost
Sent: 27 July 2019 10:14
To: boost@lists.boost.org
Cc: pbristow@hetp.u-net.com
Subject: Re: [boost] Making copyright holders easier to parse

-----Original Message-----
From: Boost On Behalf Of Antony Polukhin via Boost
Sent: 27 July 2019 05:46
To: boost@lists.boost.org List
Cc: Antony Polukhin
Subject: Re: [boost] Making copyright holders easier to parse

On Sat, Jul 27, 2019, 04:44 Giovanni Mascellani via Boost <boost@lists.boost.org> wrote: <...>
My question is: are you interested in this kind of patches? Of course I would still go through the ordinary patch submission procedure. I am just asking if this kind of patches would be well received or not.
I have another suggestion.

We already have the inspect program written in C++ which 'parses' and emits an HTML report on missing copyright (and many other transgressions of Boost guidelines). See https://www.boost.org/doc/libs/release/tools/inspect/inspect.cpp and copyright_check.cpp and .hpp

I suspect that this could easily be altered to add an output to a file of copyright author(s) and date(s) in whatever format and file type is easiest for Debian to deal with. For example, a text file containing

Library_name Author(s)_name Copyright_Date(s) ...

The build tools are in I:\boost\tools\inspect

If this works, I feel we could use this updated version in Boost itself. Other packagers and those needing to jump through copyright and GDPR hoops might find it helpful.

This would avoid a paroxysm in our extensive CI system 😊

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal, Cumbria
LA8 8AB UK
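The kind of report suggested above could look roughly like the following. This is only a sketch in Python rather than a change to the C++ inspect tool; the function name report_lines and the tab-separated layout are assumptions made for illustration, and the regular expressions cover only the two header styles quoted earlier in this thread.

```python
import re

# A year or year range such as "2007", "2007-2015" or "2007 - 2015".
YEARS = re.compile(r"\d{4}(?:\s*-\s*\d{4})?")


def report_lines(library, header_text):
    """Return one 'library<TAB>author<TAB>years' row per copyright line.

    Hypothetical helper: handles both 'Copyright NAME YEARS.' and
    'Copyright (c) YEARS NAME' as seen in the examples of this thread.
    """
    rows = []
    for line in header_text.splitlines():
        match = re.search(r"Copyright\s*(?:\([cC]\))?\s*(.*)", line)
        if not match:
            continue
        rest = match.group(1)
        # Normalise "2007 - 2015" to "2007-2015" and join multiple ranges.
        years = ", ".join(re.sub(r"\s*-\s*", "-", y) for y in YEARS.findall(rest))
        # Whatever is left after removing the years is treated as the name.
        author = YEARS.sub("", rest).strip(" .,*/")
        rows.append(f"{library}\t{author}\t{years}")
    return rows
```

Each row could then be written to a plain text file, one per copyright line; files whose lines fail to match would still need manual review.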
On 7/27/19 4:36 AM, Giovanni Mascellani via Boost wrote:
/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

File libs/atomic/include/boost/atomic/fences.hpp has:

/*
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 * Copyright (c) 2011 Helge Bahmann
 * Copyright (c) 2013 Tim Blechmann
 * Copyright (c) 2014 Andrey Semashev
 */

I might change this to:

/*
 * Copyright: 2011 Helge Bahmann
 * Copyright: 2013 Tim Blechmann
 * Copyright: 2014 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */
I hope these examples illustrate my intention. I think that having more easily parsable copyright holders could be useful for Debian, for Boost and for Boost adopters which should properly care about the licensing of the libraries they use.
I'd prefer if the license headers were also easily readable by human users, since it is humans these headers are intended for in the first place. In particular, keep the copyright holders visually separate from the license, e.g. by an empty line.

I'm not very keen on using a colon to introduce a-la-HTTP headers, but that might be ok if everyone agrees.

Are we sure that the "Distributed under" part has no legal significance?

Also, we have an "inspect" tool that checks for the license header presence. Make sure that the modified headers satisfy that tool.

Also, it might be a good time to update license URLs to https.
Not to discourage your effort but... On Fri, Jul 26, 2019 at 7:44 PM Giovanni Mascellani via Boost < boost@lists.boost.org> wrote:
/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */
Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult, as the current template, as described in <https://www.boost.org/users/license.html#FAQ>, was a product of the original creation of the Boost Software License.

Although I'm all for making the attributions and licensing consistent :-)

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
On 2019-07-27 at 13:04, Rene Rivera via Boost wrote:
Not to discourage your effort but...
On Fri, Jul 26, 2019 at 7:44 PM Giovanni Mascellani via Boost < boost@lists.boost.org> wrote:
/*
 * Copyright Andrey Semashev 2007 - 2015.
 * Distributed under the Boost Software License, Version 1.0.
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */

I might change this to:

/*
 * Copyright: 2007-2015 Andrey Semashev
 * License: Boost Software License, Version 1.0
 * (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 */
Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult. As the current template, as describe in <https://www.boost.org/users/license.html https://www.boost.org/users/license.html#FAQ> was a product of the original creation of the Boost Software License.
Sounds like a good idea. Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but say nothing about "Copyright:". Wouldn't want to stumble on such a technicality, would we? :-)
Although I'm all for making the attributions and licensing consistent :-)
Right. Bo Persson
On 27. Jul 2019, at 06:51, Bo Persson via Boost wrote:
Sounds like a good idea.
Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but says nothing about "Copyright:".
Wouldn't want to stuble on such a technicality, would we? :-)
Although I'm all for making the attributions and licensing consistent :-)
Right.
The best proposal I have seen in this thread is to consistently apply the template from https://www.boost.org/users/license.html everywhere. If it is consistently applied, it is easy to parse automatically, even if the format is not particularly parser-friendly. As was said before, we need a new check in the Boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html to enforce consistency, since the inspection reports http://boost.cowic.de/rc/docs-inspect-develop.html are currently not enforced. License and copyright errors are by far the most common problems found by the inspection tool, so by adding a check to the test matrix we could clean that up a lot. Best regards, Hans
-----Original Message-----
From: Boost On Behalf Of Hans Dembinski via Boost
Sent: 5 August 2019 08:57
To: Boost Devs
Cc: Hans Dembinski; Bo Persson
Subject: Re: [boost] Making copyright holders easier to parse

On 27. Jul 2019, at 06:51, Bo Persson via Boost wrote:
Sounds like a good idea.
Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but says nothing about "Copyright:".
Wouldn't want to stuble on such a technicality, would we? :-)
Although I'm all for making the attributions and licensing consistent :-)
Right.
The best proposal I have seen in this thread is to consistently apply the template from https://www.boost.org/users/license.html everywhere. If it is consistently applied, it is easy to parse automatically, even if the format is not particularly parser-friendly.
As was said before, we need a new check in the Boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html to enforce consistency, since the inspection reports http://boost.cowic.de/rc/docs-inspect-develop.html are currently not enforced.
License and copyright errors are by far the most common problems found by the inspection tool, so by adding a check to the test matrix we could clean that up a lot.
+1

The inspect tool is neglected. It can (and should) be run *locally* by each library maintainer.

cd to boost/libs/somelibrary and run

>inspect > inspect.html

Then inspect the file written, called inspect.html (or whatever you called it). Delete it after reading (and correcting any 'transgressions') to avoid inspect.html itself being flagged as a dodgy file 😉

Paul
On Mon, Aug 5, 2019 at 2:57 AM Hans Dembinski via Boost < boost@lists.boost.org> wrote:
On 27. Jul 2019, at 06:51, Bo Persson via Boost wrote:
Sounds like a good idea.
Old US copyright laws specifically mention "Copyright" and "Copr." as proper forms, but says nothing about "Copyright:".
Wouldn't want to stuble on such a technicality, would we? :-)
Although I'm all for making the attributions and licensing consistent :-)
Right.
The best proposal I have seen in this thread is to consistently apply the template from https://www.boost.org/users/license.html everywhere. If it is consistently applied, it is easy to parse automatically, even if the format is not particularly parser-friendly.
As was said before, we need a new check in the Boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html to enforce consistency, since the inspection reports http://boost.cowic.de/rc/docs-inspect-develop.html are currently not enforced.
License and copyright errors are by far the most common problems found by the inspection tool, so by adding a check to the test matrix we could clean that up a lot.
Maybe. It should be easy though. It just takes someone to add such a check to <https://github.com/boostorg/boost/blob/develop/status/boost_check_library.py>.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
Q: "How many Boost engineers does it take to change a copyright notice?" A: "All of them."
Can someone point me to where the license checking is used in CI? boost-ci doesn't have an example.
On 7. Aug 2019, at 05:50, jrmarsha via Boost wrote:
Can someone point me to where the license checking is used in CI? boost-ci doesn't have an example.
Maybe you should read this whole thread again carefully, the answers are here. There is currently no license checking done in CI. There is the inspect tool https://github.com/boostorg/inspect which has a license check, but it is not run in CI and it was not written to be run as a unit test. boost-ci is a new project, it is not used by all Boost projects. You probably want to add the check to __boost_check_library__, a special check that is run together with the library-local tests in the boost test matrix https://www.boost.org/development/tests/develop/developer/summary.html You can find the code for boost_check_library here: https://github.com/boostorg/boost/blob/master/status/boost_check_library.py Best regards, Hans
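A check of the kind discussed here might start from something like the sketch below. It is a standalone Python function, not code taken from boost_check_library.py or from the inspect tool; the name header_problems, the 30-line window, and the exact wording it searches for are all assumptions for illustration.

```python
import re

# Wording taken from the license template quoted in this thread.
LICENSE_RE = re.compile(r"Distributed under the Boost Software License, Version 1\.0")
COPYRIGHT_RE = re.compile(r"Copyright\b")


def header_problems(text, head_lines=30):
    """Return a list of problems found in the first lines of a source file.

    Hypothetical helper: only the first `head_lines` lines are examined,
    on the assumption that the notice sits at the top of the file.
    """
    head = "\n".join(text.splitlines()[:head_lines])
    problems = []
    if not COPYRIGHT_RE.search(head):
        problems.append("missing copyright notice")
    if not LICENSE_RE.search(head):
        problems.append("missing Boost Software License reference")
    return problems
```

An empty list means the file passes. A license sentence that is wrapped across two comment lines would be reported as missing, so a real check would need to normalise whitespace first.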
On Sat, Jul 27, 2019 at 7:05 AM Rene Rivera wrote:
Not to discourage your effort but...
Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult. As the current template, as describe in <https://www.boost.org/users/license.html https://www.boost.org/users/license.html#FAQ> was a product of the original creation of the Boost Software License.
Although I'm all for making the attributions and licensing consistent :-)
+1.

Instead of inventing a new format now, if anything, why not make them consistent with what https://www.boost.org/users/license.html prescribes? After all, many of our libraries are already consistent with it, and as Rene (and the page) conveys, some effort went into deciding things like that.

The page has the format "Copyright Joe Coder 2004 - 2006" and most libraries have that, or "Copyright (C) Joe Coder 2004 - 2006". Both are easy to parse without needing to introduce a colon after "Copyright".

In any case, some discussion needs to happen around what format we want, and the License page should be updated first, all before any pull requests start being made.

Glen
Hi, On 27/07/19 10:22, Glen Fernandes wrote:
+1.
Thank you for everybody's feedback, which seems to be mostly positive!
Instead of inventing a new format now, if anything, why not make them consistent with https://www.boost.org/users/license.html prescribes? After all, many of our libraries are already consistent with it, and as Rene (and the page) conveys, some effort when into deciding things like that.
The page has the format "Copyright Joe Coder 2004 - 2006" and most libraries have that, or "Copyright (C) Joe Coder 2004 - 2006". Both are easy to parse without needing to introduce a colon after "Copyright:".
I wasn't actually pushing for any specific header format, sorry for not
making this clear. To me, anything that can be easily automatically
parsed is fine, including of course the one mentioned in the FAQs (which
I had not previously noticed). However, that template does not cover the
case of more than one copyright holder. Would something like this be
acceptable?
// Copyright Joe Coder 2004 - 2006.
// Copyright Bob Hacker 2010 - 2015.
// Copyright Department of Writing Very Long Names, Newline
// Company Inc. 2017 - 2019.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// https://www.boost.org/LICENSE_1_0.txt)
If not, what else?
(the thing with a very long name is not pretentious; there is already a
"Institute of Transport, Railway Construction and Operation, University
of Hanover" in Boost).
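A header in the form proposed above, including a holder name that wraps onto a second line, could be parsed along these lines. This is a rough Python sketch under the assumption that every entry starts with "Copyright " and ends with a year or year range followed by a period; parse_holders and ENTRY_END are names invented for the example.

```python
import re

# An entry is complete once it ends in "YYYY." or "YYYY - YYYY.".
ENTRY_END = re.compile(r"(\d{4})(?:\s*-\s*(\d{4}))?\.$")


def parse_holders(header):
    """Return a list of (name, first_year, last_year) tuples."""
    holders = []
    pending = None  # text of an entry that has not reached its years yet
    for raw in header.splitlines():
        line = raw.strip().lstrip("/").strip()  # drop the "//" prefix
        if line.startswith("Copyright "):
            pending = line[len("Copyright "):]
        elif pending is not None:
            pending += " " + line  # continuation of a long holder name
        else:
            continue
        end = ENTRY_END.search(pending)
        if end:
            name = pending[:end.start()].strip()
            first = int(end.group(1))
            last = int(end.group(2) or first)
            holders.append((name, first, last))
            pending = None
    return holders
```

The terminating period is what makes continuation lines unambiguous in this sketch; an entry that never reaches its years would swallow the following lines, so a real tool would need to detect that case.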
If this proposal seems appropriate to you, I can start to patch a few
files and submit them here, so that they can be evaluated more carefully
(this might not be immediate, as I still have to write the code to do so).
Thanks again, Giovanni.
--
Giovanni Mascellani
-----Original Message-----
From: Boost On Behalf Of Vinnie Falco via Boost
Sent: 28 July 2019 00:43
To: boost@lists.boost.org List
Cc: Vinnie Falco
Subject: Re: [boost] Making copyright holders easier to parse

On Sat, Jul 27, 2019 at 6:23 AM Glen Fernandes via Boost wrote:
Although I'm all for making the attributions and licensing consistent :-)
I am very much in favor of this as long as it does not require any of my source files to change.
+1

Because we have a massive Continuous Integration system with many compilers and platforms that picks up changes to source files, *any* change to source, test or documentation files is likely to trigger a massive recompilation, rebuild, retest and redocumentation. That will consume a lot of machine time (already insufficient).

I think that we are pretty much following the guidelines https://www.boost.org/users/license.html (though they do not mention the common case of multiple authors).

We (and anyone) can check the results of the inspect program to confirm that all files have a Boost copyright claim.

All copyright lines have either a name or a date after the word copyright. Nobody has a digit in their name?

Surely this isn't too difficult to parse? Any that cause trouble can be fixed individually if you tell us?

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal, Cumbria
LA8 8AB UK
On Sun, 28 Jul 2019, Paul A Bristow via Boost wrote:
Because we have a massive Continuous Integration system with many compilers and platform that picks up changes to source files, *any* change to source, test or documentation files is like to trigger a massive recompilation and rebuild and retest and redocumentation.
That will cause a load of machine time (already insufficient)..
It appears that continuous integration in boost is a failure. CI is supposed to make it easier to make changes (it checks that your modifications don't break stuff). However, it seems that it is actually preventing people from touching anything. Not quite trolling, this seems like an argument to disable CI (or at least change its configuration significantly), not to avoid making the source changes. (for the debian boost packages, I was hoping that the election of a new DPL would help relax the requirements a bit...) -- Marc Glisse
On 7/28/19 11:27 AM, Paul A Bristow via Boost wrote:
Because we have a massive Continuous Integration system with many compilers and platform that picks up changes to source files, *any* change to source, test or documentation files is like to trigger a massive recompilation and rebuild and retest and redocumentation.
I don't think we build docs during our CI jobs, do we?
That will cause a load of machine time (already insufficient)..
If you want a commit to not trigger a CI job, you can add "[ci skip]" to the commit title line. Unfortunately, in the case of PRs, that is on the submitter's conscience.
On 7/28/19 01:27, Paul A Bristow via Boost wrote:
Because we have a massive Continuous Integration system with many compilers and platform that picks up changes to source files, *any* change to source, test or documentation files is like to trigger a massive recompilation and rebuild and retest and redocumentation.
That will cause a load of machine time (already insufficient)..
We can stage this so there is a single update at the super project. -- Michael Caisse Ciere Consulting ciere.com
Both of those changes, to the attribution and licensing, would almost certainly require Boost to get legal consult.
I don't think that is the case. All the same information is conveyed with very similar context, manner, and detail. What I'd be interested in is adding this automated check to the general CI infrastructure.
On Fri, Jul 26, 2019 at 9:44 PM Giovanni Mascellani via Boost
I hope these examples illustrate my intention. I think that having more easily parsable copyright holders could be useful for Debian, for Boost and for Boost adopters which should properly care about the licensing of the libraries they use.
Not a huge fan of the HTTP header syntax for copyright statements. Everyone has their own style as you can see. I always use:

Copyright (C) YYYY - YYYY James E. King III

In the examples shown, the ability to parse the original exists:

A) Indication of a copyright,
B) A year or a year range (YYYY, YYYY-YYYY, YYYY - YYYY, "YYYY, YYYY - YYYY, YYYY, ...")
C) An optional copyright symbol
D) One or more names

All four sections are easily defined by allowed character content and/or keyword. Does "bcp" identify the files that it finds a "?opyright" statement in but cannot parse? Why not just fix those?

Given we already have one (or more) regular expressions to find this information, how about adding Mergeable as a GitHub app to our repositories and adding a condition for a successful PR so things do not degrade?

- Jim
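The four parts A) to D) can be expressed as one regular expression, roughly as in the Python sketch below. It assumes the years-before-names order of the "Copyright (C) YYYY - YYYY Name" style and one notice per line; NOTICE and parse_notice are names made up for the example, and this is not the expression bcp uses.

```python
import re

YEAR_RANGE = r"\d{4}(?:\s*-\s*\d{4})?"
NOTICE = re.compile(
    r"[Cc]opyright\s+"                                   # A) copyright keyword
    r"(?:\([Cc]\)\s*)?"                                  # C) optional symbol
    rf"(?P<years>{YEAR_RANGE}(?:\s*,\s*{YEAR_RANGE})*)"  # B) year(s) or range(s)
    r"\s+(?P<names>\S.*?)\s*$"                           # D) one or more names
)


def parse_notice(line):
    """Return (list_of_year_ranges, names), or None if the line cannot be parsed."""
    m = NOTICE.search(line)
    if m is None:
        return None
    years = [re.sub(r"\s*-\s*", "-", y)
             for y in re.findall(YEAR_RANGE, m.group("years"))]
    return years, m.group("names")
```

Lines containing a copyright keyword for which parse_notice returns None are the ones that would need to be reported and fixed by hand.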
participants (14)
- Andrey Semashev
- Antony Polukhin
- Bo Persson
- Giovanni Mascellani
- Glen Fernandes
- Hans Dembinski
- James E. King III
- Joseph Van Riper
- jrmarsha
- Marc Glisse
- Michael Caisse
- pbristow@hetp.u-net.com
- Rene Rivera
- Vinnie Falco