questions regarding dependency report(s)
I've been looking at the question of "dependencies" and I would like to know a little more about how they are generated. I've looked at:

a) http://www.pdimov.com/tmp/report-6d1f271/serialization.html
b) http://www.steveire.com/boost/2014sept16_serialization.png
c) the bcp documentation

and it doesn't seem that they all produce the same list of dependencies.

a) doesn't include any *.cpp files, so I assume it reports fewer than the true number of dependencies. That is, if a *.cpp file includes a header as part of its implementation rather than its interface, that header wouldn't be counted.

b) produces a very intimidating graph. I'm not sure what the rules are for its generation. I'm guessing that it tracks dependencies at the library level rather than the *.cpp level, which might make things look worse than they are.

c) bcp seems pretty exhaustive in that it scans all the code in all the *.cpp files, tests, examples, etc. For lots of applications - e.g. if I wanted to ship a Boost library with my application but not all the tests - it would likely generate a much larger subset than I need. But maybe not: if my application links against the Boost library DLL, then I guess I have to include it all.

This raises an interesting question for me. Suppose I want to distribute my app with source, and I include a CMake script which doesn't build the Boost libraries at all but just compiles the relevant *.cpp files into the application build itself. This would make my application depend upon the smallest subset of Boost possible. What would it take to make a version of bcp which, given an arbitrary group of source files, returns a list of headers and *.cpp files which could be used to build the app?

We know on an intuitive basis that we want to offer the option of delivering a smaller Boost, but we've always thought that means delivering a library subset. Is this really what we want? What would users do with the smaller subset? (Besides complain that it's not small enough?)
How about if we tweaked bcp to deliver just the subset of files relevant to the target? Would that make them happier? Would it get us off the hook for trying to "solve" what the dependencies are? Just thinking out loud here.

Robert Ramey
On Friday 26 September 2014 15:35:54 Robert Ramey wrote:
What would it take to make a version of BCP which, given an arbitrary group of source files, returns a list of headers and *.cpp files which could be used to build the app?
The main problem with bcp, and header-based dependency tracking in general, is dealing with preprocessor tricks which affect the inclusion graph. Cases where a macro unfolds into the header name are not uncommon. Even more complicated are cases where #include directives are conditioned on some test, like the compiler version or platform, or on macros defined in other headers.

You could collect the dependencies that correspond to a particular environment (e.g. the one you're currently running), but that wouldn't be a portable distribution. I suppose the more correct approach would be to build a superposition of all possible condition results and preprocess each header in every possible way, but that's beyond what a normal C++ preprocessor does and I doubt it would be practical. I think the most feasible way is to preprocess headers multiple times according to a number of pre-defined environment presets, each corresponding to a platform Boost supports. It would still be rather slow, and defining these presets would be a tough job, but at least this looks doable.
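To make the blind spot concrete, here is a minimal sketch (not bcp's actual algorithm) of a naive include scanner. The regexes and the sample source are illustrative; the point is that a literal-text scanner can list ordinary #include targets but can only flag, not resolve, an include whose file name is produced by a macro - exactly the preprocessor trick described above.

```python
import re

# Matches #include <...> or #include "..." with a literal file name.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
# Matches #include MACRO_NAME, whose target cannot be known without
# running the preprocessor in a concrete environment.
MACRO_INCLUDE_RE = re.compile(r'^\s*#\s*include\s+([A-Za-z_]\w*)', re.MULTILINE)

def scan_includes(source: str):
    """Return (resolved, unresolved) include targets found in a source string."""
    resolved = INCLUDE_RE.findall(source)
    unresolved = MACRO_INCLUDE_RE.findall(source)
    return resolved, unresolved

src = '''
#include <boost/config.hpp>
#include "my_header.hpp"
#include BOOST_PP_ITERATE()
'''
print(scan_includes(src))
# (['boost/config.hpp', 'my_header.hpp'], ['BOOST_PP_ITERATE'])
```

Preprocessing under several environment presets, as suggested above, amounts to running a real preprocessor once per preset and taking the union of the resolved lists.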
On 9/26/2014 7:10 PM, Andrey Semashev wrote:
On Friday 26 September 2014 15:35:54 Robert Ramey wrote:
What would it take to make a version of BCP which, given an arbitrary group of source files, returns a list of headers and *.cpp files which could be used to build the app?
The main problem with bcp and header-based dependency tracking in general is dealing with preprocessor tricks which affect the inclusion graph. The cases when a macro unfolds in the header name are not uncommon. Even more complicated are cases when #include directives are conditioned on some tests like compiler version or platform or macros defined in other headers.
It gets much more complicated than that:

* You could have a header which, if it is not used, means that some dependency is unnecessary.
* You could have a conditional which, if it is not defined, means that some dependency is unnecessary.
* You could have a "feature" which, if it is not used, means that some dependency is unnecessary. A "feature" can be anything along the lines of what people are suggesting as a sublibrary, i.e. support for something outside the mainstream use of a library.
* Nobody has talked about a most common case: a library will depend on some particular version(s) of another library but will not work with other versions of it. Given that libraries usually strive not to remove existing interfaces, this usually means that a library depends on some minimal version, and up, of another library. Currently Boost has no way to specify/track version information in Boost libraries - a serious flaw, IMO, for any idea of isolating a particular library and its dependencies for use outside of the entire Boost tree.

While processing headers is worthwhile, I do not believe that any dependency system relying on that technique alone will ever be able to determine all the dependency information necessary to isolate a library and its dependencies for every potential use case. It may work for a very simple situation but will not scale as a library gets more complicated and offers a number of choices in how it can be used. Ideally, Boost will need something better, based on some sort of meta-information which a library author will need to supply.

Of course, if there is no real impetus to provide Boost library isolation, and Boost continues to be distributed in its current monolithic way, then tracking dependencies via header file analysis may be as adequate as we want for a decent indication of what depends on what.
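The version point in the last bullet can be sketched in a few lines. Everything here is hypothetical - Boost has no such metadata today, which is exactly the flaw being described - but it shows the minimum-version-and-up check a resolver would perform:

```python
def satisfies(available: tuple, minimum: tuple) -> bool:
    """True if an available (major, minor) version meets a minimum requirement."""
    return available >= minimum

# Hypothetical metadata: each library lists its direct dependencies with
# the minimum version it is known to work with ("this version and up").
requirements = {
    "serialization": {"config": (1, 56), "mpl": (1, 55)},
}
# Hypothetical versions present in the user's tree.
installed = {"config": (1, 56), "mpl": (1, 54)}

unmet = {dep: need
         for dep, need in requirements["serialization"].items()
         if not satisfies(installed.get(dep, (0, 0)), need)}
print(unmet)  # {'mpl': (1, 55)} - the mpl requirement is not met
```

Header scanning alone can never recover this information; it has to be declared by the library author.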
Edward Diener-3 wrote
While processing headers is worthwhile, I do not believe that any dependency system relying just on that technique will ever be able to always determine all the dependency information necessary to isolate a library and its dependencies for every potential use case. It may work for a very simple situation but will not scale as a library gets more complicated and offers a number of choices how it can be used. Boost ideally will need something better based on some sort of meta-information which a library author will need to supply.
What would this meta information include?
Of course if there is no real impetus to provide Boost library isolation and Boost will continue to be distributed in its current monolithic way, then the tracking of dependencies via header file analysis may be as adequate as we want to get to a decent indication of what depends on what.
I would like to see Boost be able to grow to 500 libraries in the next 10 years. Requiring a user to download/install 500 libraries to use the one he wants doesn't seem convincing to me.

You've all convinced me that no completely automated approach will do the job. So I'm sort of stumped. Maybe we can make this work with a couple of simple things:

a) Enhance bcp so that the top-level dependent doesn't have to be a library but could be an application. This would mean that a user not interested in tests or examples could get just the list of dependencies for his application, or perhaps just for the library build itself.
b) If we assign libraries to one of the following levels, we might be able to keep things under control (of course there'd be a couple more than 5):

level 0: no libraries
level 1: stl libraries
level 2: config, exception, ...          // used indirectly by almost everything
level 3: mpl, type_traits, ...           // used by programs which use generic programming
level 4: shared_ptr, ...                 // core utilities used by other libraries
level 5: asio, serialization, date_time  // used by applications rather than by other libraries

There are no level 0 libraries. Each level depends only on libraries at a lower level. This is mostly a convention, so that when we add a library we consciously decide where it sits in the dependency hierarchy rather than just letting the chips fall where they may, as we do now.
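The convention is mechanically checkable. Here is a minimal sketch; the level assignments and dependency edges are illustrative stand-ins, not Boost's actual dependency data:

```python
# Hypothetical level assignments, following the scheme above.
levels = {
    "config": 2, "exception": 2,
    "mpl": 3, "type_traits": 3,
    "shared_ptr": 4,
    "serialization": 5, "date_time": 5,
}
# Hypothetical direct-dependency edges.
deps = {
    "mpl": ["config"],
    "shared_ptr": ["config", "type_traits"],
    "serialization": ["shared_ptr", "mpl"],
    "date_time": ["serialization"],   # deliberately violates the convention
}

def violations(levels, deps):
    """Yield (lib, dep) pairs where dep is not at a strictly lower level."""
    for lib, ds in deps.items():
        for d in ds:
            if levels[d] >= levels[lib]:
                yield lib, d

print(list(violations(levels, deps)))  # [('date_time', 'serialization')]
```

A check like this could run in CI, so that a new library's position in the hierarchy is a conscious decision rather than an accident.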
c) Consider refining a couple of major offenders - e.g. the xml_archives - so that they become separate dlls/libraries within the same module they are in now. There is precedent for this: the serialization library module actually creates two libraries, serialization.dll and wserialization.dll. The bjam build scripts could be tweaked to produce, in addition to these, xmlserialization.dll and w_xmlserialization.dll.

Note that bcp would have to be tweaked to parse either the bjam scripts or some other spec file in order to follow just the dependencies relevant to the root. We would then have a mechanism such that, given an application like

my_app.cpp

#include
On 9/27/2014 1:20 AM, Robert Ramey wrote:
Edward Diener-3 wrote
While processing headers is worthwhile, I do not believe that any dependency system relying just on that technique will ever be able to always determine all the dependency information necessary to isolate a library and its dependencies for every potential use case. It may work for a very simple situation but will not scale as a library gets more complicated and offers a number of choices how it can be used. Boost ideally will need something better based on some sort of meta-information which a library author will need to supply.
What would this meta information include?
Each configuration of a library could have a separate name, with the main configuration being called 'main', let's say. For each configuration there would be a list of the library's directories and/or files which are needed to use that library, along with a list of all the library's direct dependencies. Each direct dependency would carry information such as the dependency's name, the versions of it which are acceptable, and the name of an acceptable configuration. Then you go to each direct dependency, with the correct version(s) and configuration, and recursively find out its dependencies, etc.

Each library would need some sort of version number as part of its meta-information - most probably both a general Boost version number and an individual library version number.

The idea is for every library to declare its direct dependencies and files for each of its configurations, and for this to be applied recursively to each dependent library until a working set of directories/files is created which can be downloaded and used for any particular library. The end-user says "I want to use library X with configuration Y" and he gets exactly the files he needs to use that library.

Naturally, each configuration for a library needs to be well documented as to its meaning. This would also include instructions on how to use a particular configuration - 'include these header files', 'define this macro', etc. The idea is to be as friendly to the end-user as possible when the end-user just has a particular usage of a Boost library in mind. For many libraries there may be just one 'main' configuration, but I see nothing wrong with the library programmer creating as many separate configurations as he likes.
For instance, if library X uses library Y only in its tests, it might provide a 'testless' configuration so that library Y does not need to be part of library X's working set when the end-user does not need to run X's tests just to use X. We would of course need some sort of tool, whether command-line like bcp or GUI, with which an end-user could pick the individual Boost libraries he wants, with the configuration he wants for each library, and then download the resulting working set.

I realize that this is all very general and there is an enormous number of details to work out, but I think that if Boost does proceed along the path of individual Boost library distribution, as opposed to monolithic distribution, sometime in the future, then something along the lines I have suggested would be a good idea.
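The recursive working-set computation sketched above is small enough to write down. The library names, configurations, and file lists here are hypothetical illustrations of the proposed meta-information, not real Boost metadata:

```python
# Hypothetical per-(library, configuration) metadata: the files needed to
# use that configuration, plus its direct dependencies as
# (library, configuration) pairs.
metadata = {
    ("X", "main"):     {"files": ["x.hpp"], "deps": [("Y", "main")]},
    ("X", "testless"): {"files": ["x.hpp"], "deps": []},
    ("Y", "main"):     {"files": ["y.hpp"], "deps": []},
}

def working_set(lib, config, seen=None):
    """Collect, recursively, the files needed to use `lib` in `config`."""
    if seen is None:
        seen = set()
    if (lib, config) in seen:        # guard against dependency cycles
        return set()
    seen.add((lib, config))
    entry = metadata[(lib, config)]
    files = set(entry["files"])
    for dep_lib, dep_cfg in entry["deps"]:
        files |= working_set(dep_lib, dep_cfg, seen)
    return files

print(sorted(working_set("X", "main")))      # ['x.hpp', 'y.hpp']
print(sorted(working_set("X", "testless")))  # ['x.hpp']
```

The 'testless' configuration drops Y from the working set, which is exactly the point of per-configuration metadata.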
On 09/27/2014 07:20 AM, Robert Ramey wrote:
Edward Diener-3 wrote
While processing headers is worthwhile, I do not believe that any dependency system relying just on that technique will ever be able to always determine all the dependency information necessary to isolate a library and its dependencies for every potential use case. It may work for a very simple situation but will not scale as a library gets more complicated and offers a number of choices how it can be used. Boost ideally will need something better based on some sort of meta-information which a library author will need to supply.
What would this meta information include?
Of course if there is no real impetus to provide Boost library isolation and Boost will continue to be distributed in its current monolithic way, then the tracking of dependencies via header file analysis may be as adequate as we want to get to a decent indication of what depends on what.
I would like to see Boost be able to grow to 500 libraries in the next 10 years.
Requiring a user to download/install 500 libraries to use the one he wants doesn't seem convincing to me.
You've all convinced me that no completely automated approach will do the job.
So I'm sort of stumped. Maybe we can make this work by a couple of simple things
a) enhance bcp so that the top-level dependent doesn't have to be a library but could be an application. This would mean that a user not interested in tests or examples could get just the list of dependencies for his application, or perhaps just for the library build itself.
It is an attractive idea. However, here is a concern that should be considered. With any sort of packaging or other partial distribution, I worry about how discovery of not-yet-used parts is facilitated for the user. Such discovery may degrade further with a potentially very fine-grained bcp "just copy what you use" deployment strategy. A very simple example: you can use IDE editors to find the correct name of an include file, but if the file is not there yet, this simply will not work. Certainly a coarse distribution of this information may be better, but hopefully there are better ways than all of Boost itself in one monolithic package. One approach would be to build some sort of meta-data package information for all of Boost that tools could use to let users discover, explore and include unused elements of Boost with ease.

-- Bjørn
What would it take to make a version of BCP which, given an arbitrary group of source files, returns a list of headers and *.cpp files which could be used to build the app?
bcp will do that now with the --scan option, for example:

bcp --list --scan myfile.cpp

will list the dependent headers and source files of myfile.cpp. However, note that:

* It finds too much - broken compiler workarounds pull in additional includes you may not need on *your* compiler. This is the most common complaint - but of course without this the subset isn't actually portable. I suspect (with no data to back it up!) that this issue may diminish as we increasingly support only newer compilers.

* It finds too much (part 2) - if it finds a header from library X, and library X has some source files associated with it, then those files and their dependencies get included. Of course there are some libraries that have "optional" source files - only required if you use some specific header subset.

* It finds too little - it can't follow obfuscated includes (via preprocessor defs) - though neither can other dependency scanning tools, as far as I know. There is a mechanism inside bcp to add specific manual dependencies when required, but as you can imagine, it's always out of date.

One point worth noting: bcp is not an offline installer - it can only scan what's on your hard drive. I guess, though, that it could be used to produce an install list for each library with each release. That would get us into an endless debate about what such a list should include:

1) Dependencies required to use the library.
2) What about optional dependencies / bridging/glue headers?
3) What about dependencies required by examples/tests?

I guess that (2) and (3) could be made separate packages?

John.
Robert Ramey wrote:
I've been looking at the question of "dependencies" and I would like to know a little more about how they are generated.
b) http://www.steveire.com/boost/2014sept16_serialization.png
b) produces a very intimidating graph. I'm not sure what the rules are for its generation. I'm guessing that it tracks dependencies at the library level rather than the *.cpp level, which might make things look worse than they are.
The dependencies shown are 'interface' dependencies between git repos:

for each git repo 'R':
  for each file 'F' in 'R/include/boost':
    if 'F' #includes a file in git repo 'D':
      draw an edge from 'R' to 'D'

I did add some special cases for mplcore, which no longer exists, etc. Checking the files in 'R/src/' in addition does not change much; in particular for serialization, it does not add any edges. I think the rules for Peter's tool are similar, so my output should simply be a graph version of the same information he's producing.

Thanks, Steve.
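The pseudocode above can be sketched as runnable code. This operates on an in-memory stand-in for the repo layout; the header paths, ownership map, and file contents are made up for illustration, not taken from the real tool:

```python
import re

INCLUDE_RE = re.compile(r'#\s*include\s*[<"]([^>"]+)[>"]')

# Hypothetical: which git repo owns each header (relative to boost/),
# and each repo's header contents.
owner = {"config.hpp": "config", "mpl/bool.hpp": "mpl",
         "serialization/access.hpp": "serialization"}
headers = {
    "serialization": {"serialization/access.hpp":
        '#include <boost/config.hpp>\n#include <boost/mpl/bool.hpp>'},
    "config": {"config.hpp": ""},
    "mpl": {"mpl/bool.hpp": '#include <boost/config.hpp>'},
}

def interface_edges(headers, owner):
    """Return the set of (repo, dependency-repo) edges per the rule above."""
    edges = set()
    for repo, files in headers.items():
        for source in files.values():
            for inc in INCLUDE_RE.findall(source):
                # Strip the leading boost/ to look up the owning repo.
                key = inc[len("boost/"):] if inc.startswith("boost/") else inc
                target = owner.get(key)
                if target and target != repo:
                    edges.add((repo, target))
    return edges

print(sorted(interface_edges(headers, owner)))
# [('mpl', 'config'), ('serialization', 'config'), ('serialization', 'mpl')]
```

Drawing the edge set as a graph (e.g. via Graphviz) would then reproduce the kind of picture linked in (b).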
Robert Ramey wrote:
a) doesn't include any *.cpp files, so I assume it reports fewer than the true number of dependencies. That is, if a *.cpp file includes a header as part of its implementation rather than its interface, that header wouldn't be counted.
There isn't one true number of dependencies. There are at least three, and (a) produces one of them.

1. The .hpp dependencies. These are required if one wants to use the library.
2. The .cpp dependencies. These are required if one wants to build the library.
3. The test dependencies. These are required if one wants to run the library's tests.

For header-only libraries, (2) obviously doesn't apply. (3) is a superset of (2), and (2) is generally a superset of (1).

I've chosen to focus on (1) in my report both because the libraries with which I'm most concerned are header-only, and because its transitive closure gives a better approximation of the indirect dependencies of a library. Libraries sometimes include another library's headers without actually needing the compiled library for ordinary use. The dependencies on Serialization are generally of this sort - if you don't serialize, say, a variant, you won't need to link to Serialization, and consequently don't need to have it built, but the headers must still be there.

But of course, if you want to know what you need to build a library, you need (2), not (1). And similarly, if you want to know what you need for a library's tests, you need (3). None of these is truer than the others; they are fit for different purposes. A transitive report based on (3), for instance, would be nearly useless - many libraries use Boost.Test, and Boost.Test depends on the world.
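The "transitive closure" mentioned above is a simple graph reachability computation over whichever direct-dependency relation you pick ((1), (2) or (3)). A minimal sketch, with illustrative edges rather than real report data:

```python
def transitive_closure(direct, lib):
    """All libraries reachable from `lib` via direct-dependency edges."""
    seen, stack = set(), [lib]
    while stack:
        for dep in direct.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Hypothetical header-level (kind 1) direct dependencies.
hpp_deps = {"variant": ["mpl", "type_traits"],
            "mpl": ["config"],
            "type_traits": ["config"]}

print(sorted(transitive_closure(hpp_deps, "variant")))
# ['config', 'mpl', 'type_traits']
```

Running the same function over the (3) relation would blow up exactly as described: once Boost.Test appears anywhere, its near-universal dependencies dominate the closure.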
participants (7)
- Andrey Semashev
- Bjørn Roald
- Edward Diener
- John Maddock
- Peter Dimov
- Robert Ramey
- Stephen Kelly