Thanks, Steve, for your EXCELLENT exposition in point 6 of the issues involved.
Are folks familiar with http://www.blackducksoftware.com/protex ? (I have no interest in Black Duck and have not myself ever used their products or services.)
The “offending” code unfortunately does not even have to come from the Internet:
Jones is working on a software project. He engages his buddy Smith to write portions of the code on a handshake sub-contractor basis. Jones subsequently contributes some of the code to Boost or another open source project, with all of the proper paperwork. Smith probably has a copyright claim on any code that uses the open source project.
Smith is a nice guy and told Jones “he would never sue anybody” but when he sees the name Disney the cash register in his mind goes ca-ching! He rationalizes suing on the basis that Disney (or fill in your favorite corporation) is part of the evil empire.
Charles
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of steve@parisgroup.net
Sent: Wednesday, September 05, 2012 8:05 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Why is there so much co-dependency in Boost? Is there anything to be done about it?
Hi again,
I'm the original poster that started this thread. WOW! Thanks for all of the great responses. I apologize for posting this message and then getting called away on a business trip. It is only just now that I'm getting back to see what kind of response I got, and I'm thrilled. I'm happy to see that a number of folks involved with Boost see this issue as a significant problem, if only to certain types of companies.
APOLOGY: I must apologize for a small mistake in my numbers, that might be somewhat important to someone. I managed to reverse the counts for the "Smart Ptr" and "String Algo" libraries. I remember thinking it kinda strange that one referenced more modules but the other referenced more lines. So it's really true that "String Algo" causes 382 files to be read, while "Smart Ptr" causes only 180 to be read. Sorry about that.
I've taken a first pass through all the responses, and rather than respond to each of them individually, I'll offer some more information here and attempt to address some of the questions that have been pointed back to me.
1) How did I get these numbers? Give some examples.
Here's one of the places I'd love to be shown to be wrong. If my numbers are inflated, my sales job to my boss will be that much easier. So by all means, someone correct me if my approach is unsound.
What I did was very simple. All I did was compile a very simple program and have g++ give me a list of all of the headers it read during the compilation, excluding system headers. This is done using the following line from my test Makefiles:
$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< 2> /dev/null -MM > headers.lst
Here's the test for SmartPtr:
#include <iostream>
#include <boost/smart_ptr.hpp>
using namespace std;
int main(int argc, char* argv[])
{
    return 0;
}
This simple test produces a file named headers.lst with 180 unique header paths in it, all starting with "boost/".
Discovering the modules used by each module took a few hours of fairly tedious labor, where I sorted and then grouped each list of headers, where each group consisted of headers coming from the same module.
2) Here are the specific module dependencies:
Any: "base", Config, Exception, MPL, Preprocessor, Static Assert, Type Traits, Utility
Filesystem: "base", Config, Exception, Functional, Integer, Iterators, MPL, Preprocessor, Smart Ptr, Static Assert, Type Traits, Utility
SmartPtr: "base", Config, Exception, MPL, Preprocessor, Static Assert, Type Traits, Utility
StringAlgo: "base", Bind, Compatibility, Concept Check, Config, Exception, Function, Integer, Iterators, MPL, Preprocessor, Range, Static Assert, String Algo, Type Traits, Utility
3) In response to the suggestion to not use the convenience headers, like say "smart_ptr.hpp", as opposed to a header for an individual type.
It's bad enough to tell my programmers they can only use certain Boost modules. To tell them that they can only use certain parts of certain Boost modules just gets to be too much. Plus, I can see eventually using most, if not all of the functionality of the SmartPtr module. The same can be said for the other modules I'm interested in. If I have to run to my boss every time I want to use one new particular feature from a Boost module, it's not worth the effort. Nor would it be worth the overhead of figuring out how to police such a level of code use.
So for better or worse, my consideration of the use of Boost has to be on a Module by Module basis.
4) In response to "the license says that it's free to use, and the copyright holders have agreed to that license, so everything is fine".
That's not true in the legal world. Neither the license nor any statements made by the person stating a copyright mean anything if that person somehow, whether intentionally or unintentionally, included some bit of someone else's code in what they are calling their own. If the original writer of the code can prove original authorship of the code, nothing done without THAT PERSON'S GRANT OF LICENSE means anything. That original author owns all rights to the use of that code, and can dictate how it can and cannot be used. It is this issue that concerns companies like mine.
5) In response to "Who cares how much code there is. How does one "vet" a piece of code, regardless of how much of it there is".
It is not hard to look at 100 or 1000 lines of code in a few files and say "there's nothing novel here". If the code is all written to do one basic thing or set of things in a direct way, it's pretty easy to believe that a single or a few individuals wrote the code. And, if the claimed authorship is invalid, real damages would be easy to justify as being minimal, given the very limited scope of what the code is capable of doing.
It's also much easier to feel comfortable in the fact that many other developers are using these 1000 lines of code in their commercial products and haven't yet been sued over the use of some portion of it. And if/when one wants to upgrade to the next version of a module consisting of 1000 lines of code, it's pretty easy to see what was added/removed.
But in the case of Boost, with hundreds and possibly thousands (with fuller adoption) of individual files involved, consisting of tens to hundreds of thousands of lines of code, you can't have any idea what you've got. In fact, you can feel fairly confident that all of those lines of code are NOT NECESSARY in the basic sense to the benefit you wish to gain from the module in question. So you have to ask yourself "what more does all this code do?", and you certainly can't read and understand the purpose of every line of such a quantity of code to answer that question. And the fact that there's so much of it leads one to wonder "what novel things might be going on in that code to require so much of it?" I mean, 180 header files being read for a Smart Ptr library is pretty darn "novel" in and of itself.
Finally, there's mere statistics involved. If 1000 lines of code opens a company to a certain amount of negative exposure, 100,000 lines of code, one might argue, opens the company to 100 times as much exposure.
6) And...I'm not sure this question was asked specifically, but I'll ask it myself..."what are you so worried about".
Here's an example of what we're worried about....
Say we develop a tool for Disney to use on one of its feature length films. A month before the premiere date of the film, someone takes Disney to court and claims that one of their production tools, the one we wrote, contains code that was stolen from them. Disney asks us to come to court to defend our use of that code.
In the case of 1000 lines, we can say exactly what we did to vet the use of the code, and state exactly what that code does not just for us but for anyone who might use it, pointing out that each of those users has a very clear idea of what the code does, what it's worth to them, and why they considered the copyright given by the supposed author to be valid.
For 100,000 lines of code we say, well the 1% of the code we use kinda/sorta works by doing this, but it does that by going off and using bits and pieces of all these other files, and frankly, we couldn't take the time to understand what all that code is for, and therefore cannot possibly have understood that the code contained something novel that might have been misrepresented as to its authorship for reason of personal gain on the part of the offending copyright grantor.
In the first case, maybe the judge puts some value on the 1000 lines of code, and because it's Disney, that number gets multiplied by 10X. It's still a small amount of money for Disney, so they pay the money and just decide never to do business with us again.
In the second case, the judge says "wow, there's a lot of code here. This is going to take a lot of time to work out the ramifications of, and to put a dollar amount on" and issues an injunction against Disney releasing their film. This costs Disney many millions of dollars on everything they've set in motion in order to release the film, all of which will now be wasted money. Disney sues us for all of that money. We, as a very small software company, talk to our lawyer, who tells us our best bet is to fold the company and go find jobs working for Google.
7) Use a more modern C++
Some of our customers are in the Operating System Stone Age. For example, I often develop on Fedora, but my code has to be able to compile and run on Red Hat 5. AND, we are often told exactly what compiler to use, and that compiler is sometimes not open source, and in a few cases no longer supported. So solving these issues with newer compilers is not an option.
8) Conclusion
So, it DOES MATTER, IN A BIG WAY how much code I have to bring into my project's codebase to get SmartPtr capabilities. And even if it turned out that it didn't, my boss doesn't consider it worth the risk to make that call. He'd rather hire another programmer just to write a SmartPtr library, so that our project can stay on schedule and he can sleep at night, knowing his company isn't going to some day go "poof" due to a relaxed approach to using Open Source.
Some of our customers don't allow their engineering departments any access to code on the internet for this very reason. There are firewalls designed solely to look for and disallow anything that looks like significant code or other data from coming into the company walls. We have to justify our use of each specific piece of Open Source to EACH OF THESE COMPANIES before we can begin to supply them with anything. So another big issue for us is that as soon as we say "We use Boost", we are dismissed from consideration for a project. I bet this happens all the time.
Thanks All for all the interesting and valuable discourse! Take care!
Steve
PS) My company DOES already use the Boost Smart Ptr library. However, it uses a much earlier version of the library, one that depends on just a dozen or so headers. So I guess at one point individual Boost modules were more separable. Or maybe it was just generally smaller back when that module was adopted. So we DO already have and use Boost Smart Ptrs...we just don't have all the nifty new features in the latest and greatest, the most important of which is the ability to not require that the pointed-to class be defined wherever a Smart Ptr is instantiated. I'm dying for that feature.
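For anyone unsure what that feature looks like in practice, here is a rough sketch. It uses std::shared_ptr from C++11 as a stand-in, on the assumption that boost::shared_ptr behaves the same way in this respect. Widget, Holder, make_widget and widget_value are hypothetical names made up for the illustration.

```cpp
#include <memory>

class Widget; // forward declaration only: Widget is an incomplete type here

// Declared while Widget is still incomplete; defined further down.
std::shared_ptr<Widget> make_widget(int value);
int widget_value(const Widget& w);

// Holder stores and copies a shared_ptr<Widget> without ever seeing
// Widget's definition. The deleter is captured where the pointer is
// created, so code that only holds the pointer does not need the full class.
struct Holder {
    std::shared_ptr<Widget> item;
    explicit Holder(int value) : item(make_widget(value)) {}
    int value() const { return widget_value(*item); }
};

// In a real project everything below would live in a separate widget.cpp,
// the only place that needs the full class definition.
class Widget {
public:
    explicit Widget(int v) : value_(v) {}
    int value() const { return value_; }
private:
    int value_;
};

std::shared_ptr<Widget> make_widget(int value)
{
    return std::shared_ptr<Widget>(new Widget(value));
}

int widget_value(const Widget& w) { return w.value(); }
```

A header that declares Holder then needs only the forward declaration of Widget, so client code never has to include the Widget header.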