On 09/22/2014 11:37 AM, Rob Stewart wrote:
On September 21, 2014 11:12:51 AM EDT, "Bjørn Roald" wrote:
On 09/21/2014 10:36 AM, Vicente J. Botet Escriba wrote:
After the long threads concerning the modularization it seems clear to me that we are at an impasse.
Maybe most of the friction is a case of unclear communication rather than real disagreement. It could be that the goals would be agreed upon if they were clear to everyone. Some participants in the threads seem to have clear goals in mind for what needs to be done first and just feel the need to proceed, while others are confused about what is going on and why. The latter may need to understand the "why", as in how we get to an end result we want and what that result looks like. The former group may be more concerned with what they "know" has to be done before we get anywhere. They need to convince the skeptics why that is the case. Neither side's statements and arguments are hard to understand if you are willing to try to shift mindset for the sake of understanding. Nevertheless, there needs to be some level of consensus before this can proceed.
You are very likely correct.
So how can consensus be achieved? I think starting by giving more concrete meaning to the terminology used in discussions, proposals and guidelines would be very helpful. Guessing what people mean by module, sub-module, library, sub-library, repo, sub-repo, package, dependency, etc. is not helpful to understanding each other.
+1
Library: A library is a collection of code in Boost that is reviewed and accepted/rejected by Boost as a community. A library is maintained by individuals who are the library maintainers. The code is managed in a separate git repository that is included as a git submodule in the libs folder of the boost master repository. A library contains the library's main module in the subdirectories include, src, test, build, and doc. In addition, a library may contain a number of additional directories containing optional modules that depend on the main module; these are called sub-libraries.
You've defined "library" in terms of "module" and "sub-library" which have not yet been defined.
Right, module should most likely be defined first, as its definition depends less, if at all, on the library definition.
What is a "main module"?
For library A, the main module lives in libs/A/include, libs/A/src, etc. Each sub-library contains a module as well; sub-library A/x lives in libs/A/x/include, libs/A/x/src, etc. All these modules are modules of library A, but the main module is a sort of focal point. It holds the Boost library's primary features. Sub-libraries are there to provide optional utilities that depend on or create a bridge to other modules, whether Boost or external modules. Sub-libraries could be used for purposes other than modularization, e.g. logical partitioning of a library's facilities. But even if that is useful, it is off-topic, so I will leave it at that.
I need to understand that to understand what's included in a library. More on module's definition below.
Sub-library: A library may contain related code in sub-libraries that should be treated as separate modules to limit the dependencies that would be incurred if they were part of the library's main module. The sub-library has its own module structure containing its own include, src, test, build, and doc directories. A sub-library is part of the library and is maintained by the library's maintainers.
I need to understand "module" to understand "sublibrary" (which needn't be hyphenated, BTW).
OK, actually I am struggling with the temptation to use submodule rather than sublibrary as the term here, as it really is more logical to me. Then you get the "main module" vs. the "submodule(s)" inside a library. But I try to avoid using submodule due to the danger of a mix-up with the git thing of the same name. One option would be to simply call both the main module and the sublibrary "modules", with no main vs. sub relationship implied. If there is more than one module in the library, we require that they live in separate subdirectories or levels in the directory tree.
Package: Unit of deployment of boost source code and/or pre-build libraries,
I assume you meant "pre-built" rather than "pre-build" here.
yes
documentation etc. Typically there may be a one-to-one relationship between packages and modules, but it is possible to deploy more than one module in a package or break one module into more than one package.
The current packaging model puts all modules into one package, so it's more than possible, it's the norm.
agreed.
Repository: A version controlled directory structure containing checked out or modified files in a working directory and a database of the repository history and relationships to other repositories. In a git working directory, the database is in the .git subdirectory or is pointed to by a .git file.
The usual meaning of "repository", at least in my experience, is the managed history in a certain control tool, not the files in a workspace.
Well, yes and no... in git what you are referring to is a "bare repository". But it is not important to me. We could call a repository with a working directory "dressed up" -- just kidding. I just think most developers will think of the working directory when they clone or update their repository, so that is why I put it the way I did. If we include this in a normative definition we should try to be precise. The simplest way is to leave these details out if they do not add anything to the subject at hand.
Sub-Repository: I suggest we do not use this term to mean sub-library. Use the term sub-library or git submodule instead.
If the VCS ever changes again, the tool-specific name of this entity will probably change. It would be better to provide an abstraction. That is, formalize "subrepository" and note that a git submodule is a subrepository.
Good point. But, my take here was that we do not need the term sub-repository, hence I don't really see the need for an abstraction either. If the discussion is about VCS, we have git repository and git submodule. If the discussion is about source code structure and organization we have libraries and modules. As stated above, maybe sublibrary is not needed, we can simply use module.
Module: An organized set of boost library code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory that is used to build one or more corresponding library files.
How is a module distinct from a library?
A library can have more than one module. If it has only one, the two are more or less the same.
Both are defined in terms of the directories they contain. Each is defined in terms of the other.
Module take 2: An organized set of boost code that can be handled in a uniform manner by boost tools. A module shall contain the include, test, build, and doc directories. Modules that are not header-only shall also contain the src directory, which contains the sources used to build the static and dynamic library files that the user will link with.
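To make the "handled in a uniform manner by boost tools" part concrete, here is a minimal sketch of how a tool might check a directory against the definition above. The function names and the header_only flag are assumptions made for illustration; nothing like this exists in the Boost tooling as far as this thread states.

```python
from pathlib import Path

# Directory names taken from the "Module take 2" definition above.
REQUIRED_DIRS = ("include", "test", "build", "doc")


def missing_module_dirs(module_root, header_only=True):
    """Return the required directories that are absent from module_root.

    header_only is a hypothetical flag: per the definition, modules that
    are not header-only must also contain a src directory.
    """
    required = list(REQUIRED_DIRS)
    if not header_only:
        required.append("src")
    root = Path(module_root)
    return [name for name in required if not (root / name).is_dir()]


def is_module(module_root, header_only=True):
    """True if module_root has every directory the definition requires."""
    return not missing_module_dirs(module_root, header_only)
```

With such a check, the main module of library A would be tested with is_module("libs/A") and a sub-library with is_module("libs/A/x"), so both are treated the same way.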
Sub-module: I suggest we do not use this term to mean sub-library, use sub-library instead. If it is not clearly given by context, use git submodule if we have a git repository tracked using a git submodule in mind (http://git-scm.com/docs/git-submodule).
Until I better understand the difference between "library" and "module", I can't say whether I agree with your conclusion on submodule.
Hopefully some of this is clearer now.
Dependencies: Handling of dependencies is where I struggle the most with seeing a clear path forward, in particular with what determines the nodes and edges in the dependency graphs we care about, and what we are going to use the dependency graph for.
Right
Test, Example, and Doc Dependencies: First of all, if test, example and doc code is part of the module and incurs additional requirements, we certainly do not always want to track those dependencies as the module's dependencies. A separate dependency graph node for test code seems to be a solution if there is a real need to track it at all. Documentation can also clearly be treated separately if need be. However, given this, the module as defined above is no longer the node in the dependency graph. But that is probably just the beginning.
Test and doc dependencies should certainly be tracked separately, if at all.
Lib Dependencies: Modules that are not header-only have source files in the src directory that are compiled into one or more library files (ignoring variants directly supported by Boost.Build). Separate dependency graph nodes may be appropriate here to distinguish dependencies at link and compile time. But there are many possible facets of this, so I think the real use-cases for the dependency graph should drive requirements for what the nodes and edges shall model. In addition, dependencies may vary with the configuration of the target environment. It is not clear if or how such external dependencies should be tracked; however, starting with the Jamfile lib dependencies is certainly a good start. It may be that most package management systems have what is needed for the rest, so it is a matter of bridging these worlds.
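One way to picture separate nodes per facet is to key the graph on (module, facet) pairs and let each use-case pick the facets it follows. The sketch below assumes four facet names (include, lib, test, doc), and the edge data is made up for illustration; it does not reflect the real dependencies of any Boost library.

```python
from collections import deque

# Hypothetical example data: (module, facet) -> modules that facet depends on.
EDGES = {
    ("date_time", "include"): {"config"},
    ("date_time", "lib"): {"config"},
    ("date_time", "test"): {"serialization"},
    ("serialization", "include"): {"config"},
    ("serialization", "lib"): {"config"},
    ("config", "include"): set(),
}


def required_modules(module, edges, facets):
    """Collect the modules reachable from module, following only the
    edges that belong to one of the given facets."""
    seen = set()
    queue = deque([module])
    while queue:
        current = queue.popleft()
        for facet in facets:
            for dep in edges.get((current, facet), ()):
                if dep != module and dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
    return seen
```

A packaging tool would then call required_modules("date_time", EDGES, ("include", "lib")), while a test runner would add "test" to the facets and pull in the extra modules only in that case.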
I should think dependencies would be computed at the logical grouping represented by library or module, depending on what those terms actually mean.
Yes, I do agree with that; I was just trying to point out some additional potential aspects. I was not saying we need to care about them if they are not needed. Module has that role, as in modularization.
I presume one will choose to build components by such logical entities.
Maybe, but we need to define "component" and what that means if we are going to use it. Actually, to me, with regard to boost, component is more or less synonymous with module. Maybe components are more about how they are deployed and re-used, and module is more about the separation of the component's sources from the sources of other components or modules in the boost source tree. But there are clearly alternative definitions of component. Nevertheless, I am not sure we need both component and module in the boost terminology dictionary, so I opted for module as it has been used more than component in discussions and it sort of fits with modularization.
Include Dependencies: Dependencies in the include directory may cause compile and link time dependencies for the module user. These dependencies are not incurred until a header that requires the specific dependency is included, directly or indirectly. This could, as some have pointed out, be leveraged to get a very flexible and fine-grained "real" dependency graph in boost. However, as the actual dependencies are not known before the application developer changes source code, compiles and links, and then understands the cause of the resulting diagnostics, this is not very helpful for packaging of minimum required sub-sets of boost. I am also afraid the diagnostics for missing headers or object file symbols will not be a very user-friendly solution. However, if that could be fixed somehow to point directly at the missing package, or even better, if a package manager could be more or less automatically invoked to fix it, then this may be a path forward. Such fine-grained dependency tracking could greatly reduce the need for sub-libraries.
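As a rough illustration of header-level tracking, the sketch below follows #include lines transitively from the headers a user actually includes and reports the modules touched. It rests on two loud assumptions: that a header named boost/<module>/... or boost/<module>.hpp belongs to <module>, which is not always true in the real tree, and that scanning text with a regular expression is good enough, which ignores conditional compilation. The header contents are invented.

```python
import re

# Matches #include <boost/...> and #include "boost/..." lines.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"](boost/[^>"]+)[>"]', re.MULTILINE)

# Hypothetical header contents: header path -> source text.
HEADERS = {
    "boost/a/core.hpp": "#include <boost/config.hpp>\n",
    "boost/a/io.hpp": "#include <boost/a/core.hpp>\n#include <boost/b/stream.hpp>\n",
    "boost/b/stream.hpp": "",
    "boost/config.hpp": "",
}


def module_of(header):
    """Assumption: the path component after boost/ names the owning module."""
    part = header.split("/")[1]
    return part[:-4] if part.endswith(".hpp") else part


def modules_needed(user_includes, headers):
    """Return the modules reached from the headers the user includes."""
    seen = set()
    stack = list(user_includes)
    while stack:
        header = stack.pop()
        if header in seen:
            continue
        seen.add(header)
        stack.extend(INCLUDE_RE.findall(headers.get(header, "")))
    return {module_of(h) for h in seen}
```

Here a user who includes only boost/a/core.hpp never picks up module b, while including boost/a/io.hpp does, which is exactly the kind of dependency that is unknown until the application source changes.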
I agree that such fine-grained tracking can be a cause of confusion and hassles. I normally prefer to think in terms of libraries, not optional features. That does lead to problems managing dependencies like Date Time's optional dependency on Serialization, however.
Separating larger chunks of code in a sub-library may seem reasonable for several reasons, but to separate single headers into their own sub-library only to get a "pretty" graph may clearly be way off the reasonableness scale. Especially, if it can be reasoned that we don't push internal boost structure problems on the helpless application developer to figure out. It seems reasonable to look for facilitation for something much simpler in these cases.
+1
For lack of a better term for what some are suggesting, I just invented bridging-header as a term for a mechanism that may help in these situations.
Bridging Headers: A bridging header is a C++ header file that bridges facilities in one module with facilities in another module to provide a new convenience facility to users. The bridging header is part of the include structure in one of the two modules and only depends on a minimal required set of features from the two modules to provide the new convenience facility. A bridging header is marked in a to-be-determined way that allows dependency tracking tools to track the set of bridging headers between any two modules as a separate node (a bridge) in the dependency graph. When a user includes a bridging header, it adds both of the bridged modules as dependencies; however, it may not be practical to have every bridge tracked by a package manager as a separate package.
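To show how a tracking tool could keep bridges out of a module's own dependencies, here is a small sketch. The marker comment is purely hypothetical, since the marking scheme is explicitly to be determined, and the header names and their dependency sets are invented for the example.

```python
# Hypothetical marker; the real marking scheme is still to be determined.
BRIDGE_MARKER = "// bridging-header"

# Hypothetical headers of one module: path -> (source text, modules it uses).
DATE_TIME_HEADERS = {
    "boost/date_time/date.hpp": ("", {"date_time", "config"}),
    "boost/date_time/serialize.hpp": (
        BRIDGE_MARKER + "\n",
        {"date_time", "serialization"},
    ),
}


def split_dependencies(module, headers):
    """Split a module's header dependencies into core dependencies and
    bridges. Returns (core, bridges), where bridges maps the other module
    to the sorted list of bridging headers that reach it."""
    core = set()
    bridges = {}
    for path, (text, deps) in headers.items():
        others = deps - {module}
        if BRIDGE_MARKER in text:
            for other in others:
                bridges.setdefault(other, []).append(path)
        else:
            core |= others
    return core, {other: sorted(paths) for other, paths in bridges.items()}
```

In this made-up data, split_dependencies("date_time", DATE_TIME_HEADERS) reports config as the only core dependency and lists the serialization bridge separately, so a packager could decide whether the bridge deserves its own package or simply ships with one of the two modules.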
That seems like a decent approach.
The main challenge may be that it does not fit well with the dependency tracking model used by many package managers. However, as has been pointed out in the discussions, any reasonable use of a bridging header would be in an environment where the other package would be installed, even if not by enforcement of package-manager dependency rules. E.g.: DateTime and Serialization packages would naturally be installed by a user before any attempt to serialize DateTime data types. So it may not be a big deal if there is no additional DateTimeSerialization package; it seems almost like the extra package would just create friction here, as users would have to discover it to be aware of the bridging facilities. The bridge may just as well be part of the Serialization package or the DateTime package without enforcing installation of the other.
-- Bjørn