> Niall Douglas writes:
>> Me personally, I'd just chuck away any unmappable historical information. Chalk it up to a cost of transition. If they really need the history, they can go fetch it from the read-only SVN repo.
>
> I see you've not been keeping up with the list lately ;) Daniel et al. suggested doing just that a few months ago and were met with such a chorus of criticism that they didn't really have a choice but to fix it.
Never actually was on this list, not ever, until recently, as it's a lot of extra reading. Been involved with Boost for over a decade though.
> Personally, I agree with the chorus. After all, the point of a VCS is to have a history of the code's evolution up to a point. The VCS, be it SVN, Git, whatever, is just a means to get that history. Jettisoning the ends for the means seems misguided.

No, it's a cost of doing an upgrade. Those of you who have ever done a large CVS to SVN migration know what I mean: stuff gets lost, and actually it isn't important enough to preserve that it requires a quadrupling of transition effort, when a read-only copy of the old-technology repo is good enough. Distributed SCM is much more dimorphic again, and you *have* to accept some data loss.

Let me put this another way: those who want no history loss ought to be the ones volunteering the additional time to preserve history. What actually happens, sadly, is that the argument becomes "we must stick with whatever the old technology is", because people will choose a small reduction of productivity every day over fixing tooling once and for all ("programmers program, they don't do tooling"). I *still* find CVS in use in places because "history must never ever be lost", and given the anti-productivity nature of CVS, that costs far more in present developer time than accepting a 5% or 10% history loss, most of which will never significantly matter anyway because it occurs on the edges where SCMs dimorph.

> What happens in a few years' time when Git is replaced with the next big thing? Do we lose the history again? And then again when that gets replaced too?
That's exactly what happens. Bitrot is always inevitable in the long run; here, call it "non-fatal bitrot" :) My only red line is corruption of past and present source code releases: it must *always* be possible to check out tag X and build release X. Other than that, I'm flexible, including on loss of branch integrity, because in the end, if a branch is really important, its owner will manually fix up the damage.
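In concrete terms, the invariant I'd want preserved is roughly this (a sketch only: the release tag is illustrative, and I'm assuming the converted super-repo-plus-submodules layout):

    # Check out tag X and build release X, forever.
    git clone --recursive https://github.com/boostorg/boost.git
    cd boost
    git checkout boost-1.55.0        # hypothetical release tag X
    git submodule update --init      # pin every submodule to its recorded SHA
    ./bootstrap.sh && ./b2           # release X must still build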
>> I fear that if modularization is taken to its logical extreme, you could see submodules get out of step with other submodules. You may of course have already anticipated this and have implemented something I haven't realized. As I said, I am confused.
>>
>> [1]: By findable I mean that when Boost library users #include a Boost header they get the main Boost repo version, not the submodule version. I absolutely would expect an automated tool to pull headers from submodules, check them for ABI breakage and push them into the main repo. My point is that some sanity check ought to be there.
>
> Can you explain a bit more about what you mean by out-of-step? The whole point of modularising the code is to *help* modules to get out of step and therefore be easier to develop and test independently of what other Boost libraries are doing. But perhaps you mean something else?
Well, the Git way of helping stuff get out of step is that everyone gets their own full copy of the whole repo, and their copy is just as important as anyone else's copy. You clone, you develop and test, and optionally push changes elsewhere, which could be to your friend, your team, your employer, or of course some central authoritative copy. So I'm afraid I just don't get the present design for a library as small and as tightly integrated as Boost. Something huge and cleanly separated like KDE, sure, but for Boost I suspect it's overkill. Unless Boost plans to grow 10x in the next three years, that is.
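That everyday flow, in command form (the peer remote name and URL are purely illustrative):

    git clone https://github.com/boostorg/boost.git   # your own full-history copy
    cd boost
    git checkout -b my-feature
    # ...develop and test locally, committing as you go...
    git commit -a -m "Try out an idea"
    git remote add friend https://example.org/friend/boost.git  # hypothetical peer
    git push friend my-feature    # share with a friend, a team, or upstream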
> I'm not following why you would want to do this. Perhaps you can explain what problem you are anticipating and how this would solve it?
Most of Boost is implemented in headers, very much unlike KDE or most other C++ libraries. Moreover, those headers are quite brittle, unlike KDE or most other C++ libraries. If Boost is broken into submodules, I can see an apparently innocent change in submodule X appearing to compile and be okay in developer X's set of submodule clones, yet silently being a breaking change when combined with a simultaneous change in submodule Y by developer Y.

Why this matters: in a single Git repo, when you go to push, Git will force you to merge before the push, and at that point you "see" the breakage as a conflict appearing (hopefully); if not, your immediate next compile will fail. With the submodule approach that doesn't happen, so you *don't* see the breakage till much later, when the regression tests suddenly start failing. I'm a great believer in refusing to let programmers commit code which breaks other code, rather than nagging them later to fix an earlier commit. The point of failure notification ought to be as close to the cause as possible.
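To spell out the single-repo case (a sketch; the branch name is illustrative):

    # Shared monorepo: developer X cannot push over developer Y's change.
    git push origin develop
    # -> rejected (non-fast-forward): Y pushed first
    git pull --rebase origin develop   # forced to merge/rebase right now...
    ./b2                               # ...so a header clash fails here, immediately

    # With submodules, X's superproject pins an older commit of Y's module,
    # X's local build stays green, X's push succeeds, and the clash only
    # surfaces days later when the regression testers update both pins.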
> I also don't get what 'findable' means. What would a non-findable header be?
Any internal header only findable by the internal implementation. What I'm basically suggesting is an approach where the master repo keeps a gold candidate set of headers, automatically extracted at regular intervals from the submodules. Then, on push, a hook can do appropriate black magic to force the pusher to merge headers before the push. That way, stuff which is broken appears broken as soon as possible, rather than suddenly emerging many days later. Does this make sense?
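As a very rough sketch of that hook's black magic (everything here is hypothetical: the branch name is mine, and a real version would also want the ABI checks from [1]):

    #!/bin/sh
    # pre-receive hook on the master repo (sketch). A bot regularly
    # refreshes the gold candidate headers onto refs/heads/gold-headers
    # from the submodules; a push is accepted only if the pusher has
    # already merged that gold set, so header breakage surfaces at push
    # time rather than days later in regression testing.
    gold=$(git rev-parse refs/heads/gold-headers)
    while read old new ref; do
        if ! git merge-base --is-ancestor "$gold" "$new"; then
            echo "push rejected: merge the current gold header set first" >&2
            exit 1
        fi
    done

Niall

---
Opinions expressed here are my own and do not necessarily represent those of BlackBerry Inc.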