Andrey Semashev wrote:
Thinking about it more, the requirement to build the dependency graph before the download may require the cache to be available for any given git commit. This is probably a good reason to make the cache version controlled.
Perhaps it could be stored in a git note [1].
I'm not very familiar with git and don't know anything about git notes. Maybe they are a good fit for this purpose. But my request would be that maintainers are not required to add these notes themselves.
[...]
A slightly more advanced solution would be to have the handler download only the dependency file associated with the release tag using git archive [2], before cloning the entire module (that might not work with git notes, though). For releases, the dependency information could also simply be aggregated in the superproject archive.
Ok. As I said, I specifically did not require any particular means for delivering metadata into the tool. If this is possible with git, provided that usability is satisfactory, I'm all for it.
[...]
I agree that the cache is a good idea, as long as it's just a cache. I'm just saying that its role is auxiliary and it should not be managed by developers. [...]
So how about this: we work with two files. For now, let's call them conditional_deps.txt and deps_cache.txt. Both are optional and versioned if present. The conditional_deps.txt contains only toolset/platform annotations and is maintained by humans. The deps_cache.txt contains only the "bare" header-level dependency information and is never maintained or even supposed to be read by a human (perhaps it could be hidden). A commit hook is provided that module maintainers can opt to add to their module configuration to have it generated automatically (this won't affect history or be slow; see below). Libraries that don't have the cache can still be handled "blindly", as you suggested. In release archives the cache is (automatically) bundled with the superproject. Would you find that agreeable?
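To make the proposal a bit more concrete, here is a rough sketch of what the generator behind such a commit hook might look like. Everything specific in it is an assumption for illustration: the function names, the "header: dep dep ..." line format of deps_cache.txt, and the choice to record only `boost/...` includes. Nothing in this thread fixes any of that.

```python
import re
from pathlib import Path

# Matches '#include <boost/...>' and '#include "boost/..."' (assumed scope:
# only Boost headers are interesting for module-level dependencies).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"](boost/[^>"]+)[>"]', re.MULTILINE)


def scan_header_deps(module_root):
    """Return {relative source path: sorted list of included boost headers}."""
    root = Path(module_root)
    deps = {}
    for path in sorted(root.rglob("*")):
        # Same file selection as grep --include="*pp": .hpp, .ipp, .cpp, ...
        if path.is_file() and path.suffix.endswith("pp"):
            text = path.read_text(encoding="utf-8", errors="replace")
            found = sorted(set(INCLUDE_RE.findall(text)))
            if found:
                deps[path.relative_to(root).as_posix()] = found
    return deps


def format_cache(deps):
    """Serialize to the hypothetical 'header: dep dep ...' line format."""
    lines = [f"{header}: {' '.join(deps[header])}" for header in sorted(deps)]
    return "\n".join(lines) + ("\n" if lines else "")


def write_cache(module_root, cache_name="deps_cache.txt"):
    """Regenerate the whole cache file in the module root."""
    root = Path(module_root)
    (root / cache_name).write_text(format_cache(scan_header_deps(root)),
                                   encoding="utf-8")
```

A pre-commit hook could call `write_cache(".")` and then stage the resulting file, so that the cache ends up in the same commit as the header changes. The output is sorted so that regenerating an unchanged module produces an identical file.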
Do git notes affect history? If yes, it would be undesirable if the libraries' history were spammed with automated commits adding notes with dependency info.
They don't. You may consider a git note a piece of custom metadata associated with a commit, although it works a bit differently under the hood. The same applies to a deps_cache.txt file: it is created as part of the commit procedure and included with the same commit object. No additional commits appear in history. The maintainer does not need to do anything to make this happen except for installing the hook, once.
It is friendly to tell end users in advance what dependencies will be installed, but that can be solved by other means. A very simple solution would be to list the dependencies on the Boost website.
That doesn't really work for obvious reasons: (a) the advertised dependencies will get out of sync with reality sooner or later
Of course they would be generated automatically (and that would only be necessary for global releases).
and (b) you can't realistically request users to consult the website when they are about to install a Boost library. The tool should provide that information.
Good point.
It is possible that the tool is not able to do that, if the cache is not available for the commit to be checked out. The tool should notify the user about this problem but still allow the user to download the necessary components "blindly", by parsing headers for dependencies.
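As a minimal sketch of that fallback, the tool could prefer the cache when it exists and otherwise warn and scan the headers of the module once it has been checked out. The names `resolve_dependencies`, `parse_cache` and `scan_headers`, the "header: dep dep ..." cache format, and the wording of the warning are all hypothetical, not something agreed on in this thread.

```python
import re
from pathlib import Path

# Assumed scope: only '#include <boost/...>' / '#include "boost/..."' lines.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"](boost/[^>"]+)[>"]', re.MULTILINE)


def parse_cache(text):
    """Collect all dependencies from hypothetical 'header: dep dep ...' lines."""
    deps = set()
    for line in text.splitlines():
        _, sep, rest = line.partition(":")
        if sep:  # ignore lines without a 'header:' prefix
            deps.update(rest.split())
    return deps


def scan_headers(module_root):
    """'Blind' mode: parse every *pp file under the module for boost includes."""
    deps = set()
    for path in Path(module_root).rglob("*"):
        if path.is_file() and path.suffix.endswith("pp"):
            text = path.read_text(encoding="utf-8", errors="replace")
            deps.update(INCLUDE_RE.findall(text))
    return deps


def resolve_dependencies(module_root, cache_name="deps_cache.txt"):
    """Return (set of included boost headers, True if the cache was used)."""
    cache = Path(module_root) / cache_name
    if cache.is_file():
        return parse_cache(cache.read_text(encoding="utf-8")), True
    # No cache for this commit: tell the user, then fall back to scanning.
    print(f"warning: no {cache_name} in {module_root}; scanning headers instead")
    return scan_headers(module_root), False
```

The returned flag lets the caller tell the user whether the dependency list was known up front or only discovered after the download.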
I believe this shouldn't really be necessary because a commit hook should be transparent to the maintainer and sufficient to ensure that the cache always exists. But I agree that this would be a reasonable fallback option.
[...]
Another alternative is to create a new git submodule to store the cache in.
I think that would be a bad idea. The cache should be directly coupled to the commit. We must avoid rolling our own data structures just to match the right cache to the right commit.
The advantage of just storing a plain file in the module directory is that it certainly works, even if you download an archive without git history, and without a need to set up a new FTP server or other web service. I would prefer to start there and investigate prettier solutions later.
We're discussing a mechanism that will require mass changes to the libraries and possibly the workflow.
No, I think it shouldn't. My intention is to provide a new layer of convenience without shaking things up too much. It should make it easier to introduce other, more transformative changes; not the other way round.
[...]
[...] But it might be more difficult to build the cache in time for heads of branches; there will be some latency between the commit and its metadata.
If the cache is updated by a commit hook, this will not be true. The cache will always be 100% up-to-date. Committing by itself will not take notably longer than usual either, because in most cases only a small number of headers will be affected and this information is available to the commit hook. Even if the deps_cache.txt needs to be re-generated entirely and the module is very large, it should take less than a second. (*)

Cheers,
Julian

___________
(*) I just tried:

    $ cd PATH_TO/include/boost/math/
    $ time grep -r --include="*pp" "#include" . > ~/test.txt

and it took 87 ms. Disk access is an order of magnitude slower than in-memory file processing, so I expect this to be fairly representative of single-module dependency detection even on older computers.