Fwd: Binary Region Differentials
Suppose I have a rather huge binary region. Transferring it by any means is an expensive operation. Changes to it will, at worst, be some fraction of the size of the binary region itself, but in practice they will typically be localized byte blobs, mostly no more than a single kilobyte long. Is there any data structure that already handles this, where I could have every instance start off from the original binary region and calculate what any other instance of that data structure holds by "applying the deltas"?
On 21 May 2015, at 21:20, Kenneth Adam Miller wrote:
Suppose I have a rather huge binary region. Transferring it by any means is an expensive operation. Changes to it will, at worst, be some fraction of the size of the binary region itself, but in practice they will typically be localized byte blobs, mostly no more than a single kilobyte long. Is there any data structure that already handles this, where I could have every instance start off from the original binary region and calculate what any other instance of that data structure holds by "applying the deltas"?
Do you keep track of changes, or do you need to calculate what the difference between two given huge binaries is? You might be able to compare binaries efficiently without transferring them by computing hashes of chunks of each binary and transferring only those hashes for comparison. It's not an exact comparison, because there is a risk of hash collisions, but that risk is typically extremely small. Once you have a list of regions that differ, you can simply send those regions and patch the binary at the other end. What you have sounds a bit like a version control system?
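The chunk-hash comparison suggested above could look roughly like the following C++ sketch. It assumes fixed-size chunks and a 64-bit FNV-1a hash; the names fnv1a, chunk_hashes and differing_chunks are made up for illustration and do not come from any library mentioned in this thread. A real system would probably want a stronger hash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 64-bit FNV-1a hash of one chunk.
inline std::uint64_t fnv1a(const std::uint8_t* data, std::size_t len) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;                // FNV-1a prime
    }
    return h;
}

// Hash every fixed-size chunk of a buffer; the last chunk may be shorter.
inline std::vector<std::uint64_t> chunk_hashes(const std::vector<std::uint8_t>& buf,
                                               std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::uint64_t> hashes;
    for (std::size_t off = 0; off < buf.size(); off += chunk_size) {
        const std::size_t len = std::min(chunk_size, buf.size() - off);
        hashes.push_back(fnv1a(buf.data() + off, len));
    }
    return hashes;
}

// Indices of chunks whose hashes differ, including chunks that exist on
// only one side. Only these chunks would need to be sent and patched.
inline std::vector<std::size_t> differing_chunks(const std::vector<std::uint64_t>& a,
                                                 const std::vector<std::uint64_t>& b) {
    std::vector<std::size_t> out;
    const std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= a.size() || i >= b.size() || a[i] != b[i]) out.push_back(i);
    }
    return out;
}
```

Each side would compute chunk_hashes locally, exchange only the hash lists, and then transfer the chunks named by differing_chunks. Equal hashes do not strictly prove equal chunks, which is the collision risk mentioned above.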
I don't care about intermediate changes, so I'm not keeping track. I don't need to know the difference between two huge binaries; I just want to start with a large one, keep track of changes that are small, and then apply those changes to derive a resulting binary region. It does indeed sound much like version control. Here's my use case: I have an original binary region. I make a small change to it, but I want to save only that change. I could make many of these changes, and if I apply them in a forward direction, I should have confidence that I can produce a binary region identical to the one from which these differentials were derived. I could branch from a common parent and have many different differentials. As long as the path of applied differentials from parent to child is sane, it produces the corresponding region.
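A minimal in-memory sketch of this use case in C++ follows. It assumes every change is an overwrite at a known offset, so the region never grows or shrinks; the names Patch, Version, apply_patches and materialize are hypothetical and not taken from an existing library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// One small change: overwrite data.size() bytes starting at offset.
struct Patch {
    std::size_t offset;
    Bytes data;
};

// A node in the version tree. It stores only the patches relative to its
// parent; a null parent means the patches apply directly to the base region.
struct Version {
    std::shared_ptr<const Version> parent;
    std::vector<Patch> patches;
};

// Apply patches in order; later patches win where they overlap.
inline void apply_patches(Bytes& region, const std::vector<Patch>& patches) {
    for (const Patch& p : patches) {
        // Assumption: patches never extend past the end of the region.
        assert(p.offset <= region.size() && p.data.size() <= region.size() - p.offset);
        std::copy(p.data.begin(), p.data.end(),
                  region.begin() + static_cast<std::ptrdiff_t>(p.offset));
    }
}

// Rebuild the region for a version by walking up to the root and then
// applying every differential in the forward (oldest first) direction.
inline Bytes materialize(const Bytes& base, const std::shared_ptr<const Version>& v) {
    std::vector<const Version*> chain;
    for (const Version* cur = v.get(); cur != nullptr; cur = cur->parent.get()) {
        chain.push_back(cur);
    }
    Bytes out = base;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        apply_patches(out, (*it)->patches);
    }
    return out;
}
```

Branching falls out of the parent pointer: two Version nodes can share one parent, and each materializes to its own region while the base is stored only once. Rebuilding costs one copy of the base plus the patch bytes along the path.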
On 2015-05-21 15:20, Kenneth Adam Miller wrote:
Suppose I have a rather huge binary region. Transferring it by any means is an expensive operation. Changes to it will, at worst, be some fraction of the size of the binary region itself, but in practice they will typically be localized byte blobs, mostly no more than a single kilobyte long. Is there any data structure that already handles this, where I could have every instance start off from the original binary region and calculate what any other instance of that data structure holds by "applying the deltas"?
Suggest you read https://github.com/bup/bup/blob/master/DESIGN which discusses the algorithm used by rsync and bup to do this sort of thing efficiently. It's not quite what you're asking for, but it should be able to achieve much the same results.
If your problem is actually to transfer such files, updating the one at the destination, then perhaps you can simply use rsync and not worry about it.
John Bytheway
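For readers who have not met the rsync approach, its core ingredient is a rolling checksum: a weak hash of a fixed-size window that can be updated in constant time as the window slides one byte. The C++ sketch below is a simplified illustration of that idea only; the RollingSum type is hypothetical, and this is neither rsync's nor bup's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified rsync-style weak checksum over a sliding window.
// a = sum of the bytes in the window (mod 2^16)
// b = sum of (window - i) * byte[i] for i in [0, window) (mod 2^16)
struct RollingSum {
    std::uint32_t a = 0;
    std::uint32_t b = 0;
    std::uint32_t window = 0;

    // Compute the checksum of data[0 .. len) from scratch.
    void init(const std::uint8_t* data, std::size_t len) {
        a = 0;
        b = 0;
        window = static_cast<std::uint32_t>(len);
        for (std::size_t i = 0; i < len; ++i) {
            a += data[i];
            b += static_cast<std::uint32_t>(len - i) * data[i];
        }
        a &= 0xFFFFu;
        b &= 0xFFFFu;
    }

    // Slide the window by one byte: drop `out` at the front, add `in` at
    // the back. Unsigned wraparound keeps the result correct modulo 2^16.
    void roll(std::uint8_t out, std::uint8_t in) {
        a = (a - out + in) & 0xFFFFu;
        b = (b - window * out + a) & 0xFFFFu;
    }

    std::uint32_t digest() const { return a | (b << 16); }
};
```

Because roll is O(1), the checksum can be evaluated at every byte offset of a large region cheaply. Matches on this weak checksum are normally confirmed with a stronger hash before a block is treated as identical.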
No, I don't want to transfer files. If at all possible, I want to use
native C/C++, link against a library that provides this, and do
all of this in memory with regular C/C++ types. :)
On Tue, May 26, 2015 at 11:04 PM, John Bytheway wrote:
If your problem is actually to transfer such files, updating the one at the destination, then perhaps you can simply use rsync and not worry about it.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (3)
- John Bytheway
- Kenneth Adam Miller
- Thijs (M.A.) van den Berg