Re: [boost] date_time -> serialization (Was: spirit -> serialization)

16 Jun 2014

Andrey Semashev wrote:
...
Thinking about it more, the requirement to build the dependency graph
before the download may require the cache to be available for any
given git commit. This is probably a good reason to make the cache
version controlled.
...
Perhaps it could be stored in a git
note [1].
I'm not very familiar with git and don't know anything about git
notes. Maybe they are suitable for this purpose. But I would ask that
maintainers not be required to add these notes.
[...]
...
A slightly more advanced solution would be to have the handler download
only the dependency file associated with the release tag using git
archive [2], before cloning the entire module (that might not work
with git notes, though). For releases, the dependency information
could also simply be aggregated in the superproject archive.
Ok. As I said, I specifically did not require any particular means of
delivering metadata to the tool. If this is possible with git,
provided that usability is satisfactory, I'm all for it.
[...]
I agree that the cache is a good idea, as long as it's just a cache.
I'm just saying that its role is auxiliary and it should not be
managed by developers. [...]
So how about this: we work with two files. For now, let's call them 
conditional_deps.txt and deps_cache.txt. Both are optional and 
versioned if present. The conditional_deps.txt contains only 
toolset/platform annotations and is maintained by humans.

The deps_cache.txt contains only the "bare" header-level dependency 
information and is never maintained or even supposed to be read by a 
human (perhaps it could be hidden). A commit hook is provided that 
module maintainers can opt to add to their module configuration to 
have it generated automatically (this won't affect history or be slow; 
see below). Libraries that don't have the cache can still be handled 
"blindly", as you suggested. In release archives the cache is 
(automatically) bundled with the superproject.

Would you find that agreeable?
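To make the idea of "bare" header-level information concrete, here is a minimal Python sketch of what a generator for deps_cache.txt might do. It assumes a hypothetical file format of one module name per line, and a heuristic that takes the first path component after boost/ as the module name. Top-level headers such as boost/shared_ptr.hpp actually belong to other modules, so a real tool would need a lookup table for those. All function names are illustrative.

```python
import re
from pathlib import Path

# Matches '#include <boost/...>' and '#include "boost/..."' at the start of a line,
# capturing the path after 'boost/'. Commented-out includes ('// #include') are skipped.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]boost/([^>"]+)[>"]', re.MULTILINE)
HEADER_SUFFIXES = {".hpp", ".h", ".ipp"}


def module_of(include_path: str) -> str:
    """Heuristic mapping: 'math/tools/config.hpp' -> 'math', 'config.hpp' -> 'config'."""
    head, sep, _ = include_path.partition("/")
    return head if sep else head.rsplit(".", 1)[0]


def modules_included_by(text: str) -> set[str]:
    """Modules referenced by the Boost #include directives in one header's text."""
    return {module_of(path) for path in INCLUDE_RE.findall(text)}


def scan_module(include_dir: Path, own_module: str) -> set[str]:
    """Union of the modules included by every header under include_dir, minus the module itself."""
    deps: set[str] = set()
    for path in include_dir.rglob("*"):
        if path.is_file() and path.suffix in HEADER_SUFFIXES:
            deps |= modules_included_by(path.read_text(encoding="utf-8", errors="replace"))
    deps.discard(own_module)
    return deps


def write_cache(deps: set[str], cache_file: Path) -> None:
    """Write one module name per line, sorted so an unchanged set yields identical bytes."""
    cache_file.write_text("".join(f"{dep}\n" for dep in sorted(deps)), encoding="utf-8")
```

A hook or release script would then call something like write_cache(scan_module(Path("include"), "math"), Path("deps_cache.txt")). Sorting the output means an unchanged dependency set produces a byte-identical file, so regenerating it never shows up as a spurious change.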
...
Do git notes affect history? If so, it would be undesirable for the
libraries' history to be spammed with automated commits adding notes
with dependency info.
They don't. You may consider a git note a piece of custom metadata 
associated with a commit, although it works a bit differently under 
the hood.

The same applies to a deps_cache.txt file: it is created as part of the 
commit procedure and included with the same commit object. No 
additional commits appear in history. The maintainer does not need to 
do anything to make this happen except for installing the hook, once.
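A minimal sketch of such a hook, written in Python for illustration (git runs any executable installed as .git/hooks/pre-commit, with the working directory set to the top of the work tree). It assumes headers live under include/ and the cache sits at the module root as deps_cache.txt; the generator command tools/gen_deps_cache.py is a hypothetical placeholder. Because the hook stages the regenerated file with git add before the commit object is written, the cache ends up in the same commit rather than in a follow-up one.

```python
import subprocess
import sys

CACHE_FILE = "deps_cache.txt"
HEADER_SUFFIXES = (".hpp", ".h", ".ipp")
# Hypothetical generator script; substitute whatever actually writes the cache.
REGENERATE_CMD = [sys.executable, "tools/gen_deps_cache.py", CACHE_FILE]


def staged_paths() -> list[str]:
    """Paths staged for the commit that is about to be created."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def needs_regeneration(paths: list[str]) -> bool:
    """Only added, modified or deleted headers can change header-level dependencies."""
    return any(p.startswith("include/") and p.endswith(HEADER_SUFFIXES) for p in paths)


def main() -> int:
    if needs_regeneration(staged_paths()):
        subprocess.run(REGENERATE_CMD, check=True)
        # Staging the file here puts it into the commit being created.
        subprocess.run(["git", "add", CACHE_FILE], check=True)
    return 0
```

Installed as the actual hook, the script would end with sys.exit(main()); that line is left out above so the functions can be imported and tested on their own. A plain git commit picks up the staged cache; partial commits such as git commit <paths> use a temporary index and are worth testing separately.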
...
...
It is friendly to tell end users in advance what dependencies will be
installed, but that can be solved by other means. A very simple
solution would be to list the dependencies on the Boost website.
That doesn't really work for obvious reasons: (a) the advertised
dependencies will get out of sync with reality sooner or later
Of course they would be generated automatically (and that would only be 
necessary for global releases).
...
and (b)
you can't realistically expect users to consult the website when they
are about to install a Boost library. The tool should provide that
information.
Good point.
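With per-module caches available, the tool could compute the full set of modules to install before downloading anything, and the same computation could generate the website listing for releases. A minimal sketch, assuming each module's direct dependencies have already been read into a dictionary; the module names and edges in the usage lines are illustrative, not real Boost data. The seen set also makes the walk terminate when modules depend on each other cyclically.

```python
from collections import deque


def transitive_deps(root: str, direct: dict[str, set[str]]) -> list[str]:
    """All modules reachable from root via direct dependencies, excluding root itself.

    Breadth-first traversal; the 'seen' set guarantees termination even when
    modules depend on each other cyclically.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for dep in direct.get(module, set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(root)
    return sorted(seen)


# Illustrative data only, including a date_time <-> serialization cycle.
direct = {
    "date_time": {"serialization", "smart_ptr"},
    "serialization": {"config", "date_time"},
    "smart_ptr": {"config"},
}
print(transitive_deps("date_time", direct))  # ['config', 'serialization', 'smart_ptr']
```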
...
It is possible that the tool is not able to do that, if the cache is
not available for the given commit to be checked out. The tool should
notify the user about this problem but still allow downloading the
necessary components "blindly", by parsing headers for dependencies.
I believe this shouldn't really be necessary because a commit hook 
should be transparent to the maintainer and sufficient to ensure that 
the cache always exists. But I agree that this would be a reasonable 
fallback option.
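Such a fallback could prefer the cache and only scan headers when the cache is missing. A minimal sketch, assuming the same hypothetical one-module-per-line cache format and the simplified module-name heuristic (first path component after boost/); reading the files from disk or from a downloaded archive is left to the caller, and the function name is illustrative.

```python
import re
import sys
from typing import Iterable, Optional

INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]boost/([^>"]+)[>"]', re.MULTILINE)


def direct_deps(
    module: str,
    cache_text: Optional[str],
    header_texts: Iterable[str],
) -> tuple[set[str], bool]:
    """Return (dependencies, came_from_cache).

    cache_text is the content of deps_cache.txt, or None if the commit has no cache.
    header_texts is only consumed when the cache is missing.
    """
    if cache_text is not None:
        return {line.strip() for line in cache_text.splitlines() if line.strip()}, True

    # No cache for this commit: tell the user, then fall back to "blind" header parsing.
    print(f"warning: no dependency cache for '{module}'; scanning headers instead",
          file=sys.stderr)
    deps: set[str] = set()
    for text in header_texts:
        for path in INCLUDE_RE.findall(text):
            head, sep, _ = path.partition("/")
            deps.add(head if sep else head.rsplit(".", 1)[0])
    deps.discard(module)
    return deps, False
```

Taking header_texts as a lazy iterable (for example a generator that reads files) means nothing has to be read from the headers when the cache is present.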
...
[...]
Another alternative is to create a new git submodule to store the cache in.
I think that would be a bad idea. The cache should be directly coupled 
to the commit. We must avoid rolling our own data structures just to 
match the right cache to the right commit.
...
...
The advantage of just storing a plain file in the module directory is
that it certainly works, even if you download an archive without git
history, and without a need to set up a new FTP server or other web
service. I would prefer to start there and investigate prettier
solutions later.
We're discussing a mechanism that will require mass changes to the
libraries and possibly the workflow.
No, I think it shouldn't. My intention is to provide a new layer of 
convenience without shaking things up too much. It should make it 
easier to introduce other, more transformative changes; not the other 
way round.
...
[...]
[...] But it might be more difficult to build the
cache in time for heads of branches; there will be some latency
between the commit and its metadata.
If the cache is updated by a commit hook, this will not be true. The 
cache will always be 100% up-to-date. Committing by itself will not 
take notably longer than usual either, because in most cases only a 
small number of headers will be affected and this information is 
available to the commit hook. Even if the deps_cache.txt needs to be 
re-generated entirely and the module is very large, it should take 
less than a second. (*)
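The incremental update could look roughly like this, assuming (hypothetically) that the cache keeps one entry per header, so only the headers touched by a commit are re-parsed and the module-level list is the union of the per-header sets. The in-memory shape of the cache and the function names are illustrative.

```python
import re
from typing import Mapping, Optional

INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*[<"]boost/([^>"]+)[>"]', re.MULTILINE)


def modules_in(text: str) -> frozenset[str]:
    """Modules referenced by one header (heuristic: first path component after 'boost/')."""
    found = set()
    for path in INCLUDE_RE.findall(text):
        head, sep, _ = path.partition("/")
        found.add(head if sep else head.rsplit(".", 1)[0])
    return frozenset(found)


def update_cache(
    per_header: Mapping[str, frozenset[str]],
    changed: Mapping[str, Optional[str]],
) -> dict[str, frozenset[str]]:
    """Re-parse only the changed headers; a value of None means the header was deleted."""
    updated = dict(per_header)
    for header, text in changed.items():
        if text is None:
            updated.pop(header, None)
        else:
            updated[header] = modules_in(text)
    return updated


def module_deps(per_header: Mapping[str, frozenset[str]], own_module: str) -> list[str]:
    """Module-level dependencies: the union over all headers, minus the module itself."""
    return sorted(set().union(*per_header.values()) - {own_module})
```

The work per commit is proportional to the number of changed headers, which matches the expectation above that committing should not take notably longer than usual.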

Cheers,
Julian

___________

(*) I just tried:

    $ cd PATH_TO/include/boost/math/
    $ time grep -r --include="*pp" "#include" . > ~/test.txt

and it took 87 ms. Disk access is an order of magnitude slower than 
in-memory file processing, so I expect this to be fairly 
representative of single-module dependency detection even on older 
computers.

Julian Gonggrijp