> Niall Douglas writes:
>> Me personally, I'd just chuck away any unmappable historical information. Chalk it up to a cost of transition. If they really need the history, they can go fetch it from the read-only SVN repo.
>
> I see you've not been keeping up with the list lately ;) Daniel et al. suggested doing just that a few months ago and were met with such a chorus of criticism that they didn't really have a choice but to fix it.
Never actually was on this list, not ever, until recently, as it's a lot of extra reading. Been involved with Boost for over a decade though.
> Personally, I agree with the chorus. After all, the point of a VCS is to have a history of the code's evolution up to a point. The VCS, be it SVN, Git, whatever, is just a means to get that history. Jettisoning the ends for the means seems misguided.

No, it's a cost of doing an upgrade. Those of you who have ever done a large CVS to SVN migration know what I mean: stuff gets lost, and actually it isn't important enough to preserve that it requires a quadrupling of transition effort, when a read-only copy of the old-technology repo is good enough. Distributed SCM is much more dimorphic again, and you *have* to accept some data loss.

Let me put this another way: those who want no history loss ought to be the ones volunteering the additional time to preserve history. What actually happens, sadly, is that the argument becomes "we must stick with whatever the old technology is", because people will choose a small reduction of productivity every day over fixing tooling once and for all ("programmers program, they don't do tooling"). I *still* find CVS in use in places because "history must never ever be lost", and given the anti-productivity nature of CVS, that costs far more in present developer time than accepting a 5% or 10% history loss, most of which will never significantly matter anyway because it occurs on the edges where SCMs dimorph.

> What happens in a few years' time when Git is replaced with the next big thing? Do we lose the history again? And then again when that gets replaced too?
That's exactly what happens. Bitrot is always inevitable in the long run; here, call it "non-fatal bitrot" :) My only red line is corruption of past and present source code releases: it must *always* be possible to check out tag X and build release X. Other than that, I'm flexible, including on loss of branch integrity, because in the end, if a branch is really important, its owner will manually fix up the damage.
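In concrete terms, the invariant I'd want preserved is roughly this (a sketch only: the release tag is illustrative, and I'm assuming the converted super-repo-plus-submodules layout):

    # Check out tag X and build release X, forever.
    git clone --recursive https://github.com/boostorg/boost.git
    cd boost
    git checkout boost-1.55.0        # hypothetical release tag X
    git submodule update --init      # pin every submodule to its recorded SHA
    ./bootstrap.sh && ./b2           # release X must still build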
>> I fear that if modularization is taken to its logical extreme, you could see submodules get out of step with other submodules. You may of course have already anticipated this and have implemented something I haven't realized. As I said, I am confused.
>>
>> [1]: By findable I mean that when Boost library users #include a Boost header they get the main Boost repo version, not the submodule version. I absolutely would expect an automated tool to pull headers from submodules, check them for ABI breakage and push them into the main repo. My point is that some sanity check ought to be there.
>
> Can you explain a bit more about what you mean by out-of-step? The whole point of modularising the code is to *help* modules to get out of step and therefore be easier to develop and test independently of what other Boost libraries are doing. But perhaps you mean something else?
Well, the Git way of helping stuff get out of step is that everyone gets their own full copy of the whole repo, and their copy is just as important as anyone else's copy. You clone, you develop and test, and optionally push changes elsewhere, which could be to your friend, your team, your employer, or of course some central authoritative copy. So I'm afraid I just don't get the present design for a library as small and as tightly integrated as Boost. Something huge and cleanly separated like KDE, sure, but for Boost I suspect it's overkill. Unless Boost plans to grow 10x in the next three years, that is.
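That everyday flow, in command form (the peer remote name and URL are purely illustrative):

    git clone https://github.com/boostorg/boost.git   # your own full-history copy
    cd boost
    git checkout -b my-feature
    # ...develop and test locally, committing as you go...
    git commit -a -m "Try out an idea"
    git remote add friend https://example.org/friend/boost.git  # hypothetical peer
    git push friend my-feature    # share with a friend, a team, or upstream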
> I'm not following why you would want to do this. Perhaps you can explain what problem you are anticipating and how this would solve it?
Most of Boost is implemented in headers, very much unlike KDE or most other C++ libraries. Moreover, those headers are quite brittle, unlike KDE or most other C++ libraries. If Boost is broken into submodules, I can see an apparently innocent change in submodule X appearing to compile and be okay in developer X's set of submodule clones, yet silently being a breaking change when combined with a simultaneous change in submodule Y by developer Y.

Why this matters: in a single Git repo, when you go to push, Git will force you to merge before the push, and at that point you "see" the breakage as a conflict appearing (hopefully); if not, your immediate next compile will fail. With the submodule approach that doesn't happen, so you *don't* see the breakage till much later, when the regression tests suddenly start failing. I'm a great believer in refusing to let programmers commit code which breaks other code, rather than nagging them later to fix an earlier commit. The point of failure notification ought to be as close to the cause as possible.
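To spell out the single-repo case (a sketch; the branch name is illustrative):

    # Shared monorepo: developer X cannot push over developer Y's change.
    git push origin develop
    # -> rejected (non-fast-forward): Y pushed first
    git pull --rebase origin develop   # forced to merge/rebase right now...
    ./b2                               # ...so a header clash fails here, immediately

    # With submodules, X's superproject pins an older commit of Y's module,
    # X's local build stays green, X's push succeeds, and the clash only
    # surfaces days later when the regression testers update both pins.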
> I also don't get what 'findable' means. What would a non-findable header be?
Any internal header only findable by the internal implementation. What I'm basically suggesting is an approach where the master repo keeps a gold candidate set of headers, automatically extracted at regular intervals from the submodules. Then, on push, a hook can do appropriate black magic to force the pusher to merge headers before the push. That way, stuff which is broken appears broken as soon as possible, rather than suddenly emerging many days later. Does this make sense?
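As a very rough sketch of that hook's black magic (everything here is hypothetical: the branch name is mine, and a real version would also want the ABI checks from [1]):

    #!/bin/sh
    # pre-receive hook on the master repo (sketch). A bot regularly
    # refreshes the gold candidate headers onto refs/heads/gold-headers
    # from the submodules; a push is accepted only if the pusher has
    # already merged that gold set, so header breakage surfaces at push
    # time rather than days later in regression testing.
    gold=$(git rev-parse refs/heads/gold-headers)
    while read old new ref; do
        if ! git merge-base --is-ancestor "$gold" "$new"; then
            echo "push rejected: merge the current gold header set first" >&2
            exit 1
        fi
    done

Niall

---
Opinions expressed here are my own and do not necessarily represent those of BlackBerry Inc.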