[boost] [serialization] proposed improvements: forward-compatibility of serialization

14 Apr 2014

Hello everybody,

(It seems that my previous post contained only the title, so I repeat
the full posting below.)

We started (again) using boost::serialization in our multi-platform
distributed application about one year ago. However, we have hit a
roadblock with forward compatibility of serialization, which is of
major importance to us since we cannot force all users to upgrade to
the same or a newer version of the application at the same time.

Firstly, the question is:
 would you, or someone you know of, be interested in such improvement
work?
We are a small company racing against some die-or-prosper deadlines and
simply do not have enough resources to do it on our own in the time
available, but we could sponsor such work.

Secondly, the problem is:

To avoid ambiguity, I have put together the following description and
our own suggestion for a solution:

The project requirement is centred on improving multi-version class
compatibility in binary archives, i.e. more flexibility in a situation
where we are reading or transferring a class that has been modified in
the C++ code.
 We need to go beyond what the current object versioning in boost
offers:
 http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/tutorial.html#ve...

 Specifically, we need to add the ability to skip unknown or
unexpected fields in the non-XML archives we are parsing.

 Specific example:

 The newer version of the application, which saved the archive, has the
following code:

 #include <ctime>
 #include <string>
 #include <boost/serialization/string.hpp>  // serialization of std::string

 struct extensionClass {
     std::string moreInfo;
     time_t date;

     template<class Archive>
     void serialize(Archive & ar, const unsigned int version)
     {
         ar & moreInfo;
         ar & date;
     }
 };

 class gps_position
 {
 public:
     template<class Archive>
     void serialize(Archive & ar, const unsigned int version)
     {
         ar & degrees;
         ar & minutes;
         ar & seconds;
         ar & new_field;  // present only in the newer version
     }
     int degrees;
     int minutes;
     float seconds;
     extensionClass new_field;

     gps_position() {}
 };

 The software that needs to read the archive is an older version of the
application, which implements the gps_position class in the following
way:

 class gps_position
 {
 public:
     template<class Archive>
     void serialize(Archive & ar, const unsigned int version)
     {
         ar & degrees;
         ar & minutes;
         ar & seconds;
     }
     int degrees;
     int minutes;
     float seconds;

     gps_position() {}
 };

 The purpose of the project is to allow serialization in the older
version of the application to gracefully continue reading the archive
by simply omitting the information related to extensionClass new_field,
i.e. to properly advance the read position to the beginning of the next
class in the archive.

It is desirable to expose in the programming interface the fact that an
incompatibility was detected, so that it can be escalated either to the
user interface or to higher-level code that might decide, for instance,
to change the communication protocol.
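
To make the desired behaviour concrete, here is a minimal sketch in
plain C++ (no boost). Everything in it is an assumption made for
illustration: the wire format (a one-byte field count, then one-byte
type tags), the tag values, and the names put_field, get_field,
read_gps_v1 and ReadResult are hypothetical and are not part of
boost::serialization. The extra field is a fixed-size value for
brevity; a std::string would need a stored length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical one-byte type tags; each tag implies a fixed payload size.
enum Tag : std::uint8_t { TAG_I32 = 1, TAG_F32 = 2, TAG_I64 = 3 };

inline std::size_t payload_size(std::uint8_t tag) {
    switch (tag) {
        case TAG_I32: return 4;
        case TAG_F32: return 4;
        case TAG_I64: return 8;
        default: throw std::runtime_error("unknown type tag");
    }
}

using Buffer = std::vector<std::uint8_t>;

// Writer side: append one tagged field (host byte order, for brevity).
template <class T>
void put_field(Buffer& out, Tag tag, const T& value) {
    assert(sizeof(T) == payload_size(tag));
    out.push_back(tag);
    const auto* p = reinterpret_cast<const std::uint8_t*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Reader side: read one field and check that its tag is the expected one.
template <class T>
void get_field(const Buffer& in, std::size_t& pos, Tag expected, T& value) {
    if (pos + 1 + sizeof(T) > in.size() || in[pos] != expected)
        throw std::runtime_error("malformed field");
    std::memcpy(&value, &in[pos + 1], sizeof(T));
    pos += 1 + sizeof(T);
}

// Reported to the caller so that it can escalate the incompatibility.
struct ReadResult {
    std::size_t skipped_fields = 0;  // > 0: archive came from a newer writer
};

// The "old" class, which knows only three fields.
struct gps_position_v1 {
    std::int32_t degrees = 0;
    std::int32_t minutes = 0;
    float seconds = 0.0f;
};

// Reads the three known fields, then skips whatever the writer appended,
// leaving pos at the first byte after the object.
ReadResult read_gps_v1(const Buffer& in, std::size_t& pos, gps_position_v1& g) {
    if (pos >= in.size()) throw std::runtime_error("truncated archive");
    const std::size_t count = in[pos++];  // leading field count
    if (count < 3) throw std::runtime_error("too few fields");
    get_field(in, pos, TAG_I32, g.degrees);
    get_field(in, pos, TAG_I32, g.minutes);
    get_field(in, pos, TAG_F32, g.seconds);

    ReadResult result;
    for (std::size_t i = 3; i < count; ++i) {
        if (pos >= in.size()) throw std::runtime_error("truncated archive");
        const std::size_t n = payload_size(in[pos]);
        if (pos + 1 + n > in.size()) throw std::runtime_error("truncated archive");
        pos += 1 + n;  // skip the tag and its payload
        ++result.skipped_fields;
    }
    return result;
}
```

A caller could check result.skipped_fields after each read and, if it
is non-zero, notify the user or negotiate a different protocol version
with the peer.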

 It is critically important for us to keep the archive size small while
adding this capability. We prefer to use eos::portable_iarchive, which
provides variable-length integer support for multi-OS compatibility
(size of int, little vs big endian). We would therefore prefer a
solution that works with any archive; however, we would be OK with a
solution for the standard boost binary archive only.

Thirdly, the proposal is:

The most promising solution that we envisage is to add recursive size
information associated with the number identifying the class (already
placed in front of every data object).
By recursive size information I mean size not in terms of bytes of raw
data, but in terms of the number of fields/objects in the class.
The size information would then be expanded until it can be expressed
in terms of the sizes of the predefined POD types of standard C++11.

For instance for simple class example:

class SimplePODs {
    uint32_t firstField;
    uint64_t secondField;
    float thirdField;
};

The associated (leading) size information for the SimplePODs class is 3.
Then, within the class itself, we already have the identifiers of each
field's type.
Since they are PODs, we can maintain a global (and per-archive)
dictionary of sizes for each identifier, in this case 4, then 8, and 4.

And then it follows:

class AggregateClass {
    SimplePODs   first_field;
    char         second_field;
};

In this case the size information (in the global dictionary) for the
identifier associated with AggregateClass is 2.
Then, within the class itself, the size for first_field would be 3 and
for second_field it is of course 1.
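
The expansion described above can be sketched as follows. This is only
an illustration under stated assumptions: the type identifiers, the
TypeInfo and TypeDictionary types, and the functions
make_example_dictionary and expanded_bytes are hypothetical names, POD
payloads are assumed to be stored packed (no padding, per-field tags
not counted), and there is no guard against cyclic types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

using TypeId = std::uint16_t;

// Hypothetical identifiers for the types used in the examples.
constexpr TypeId U32 = 1, U64 = 2, F32 = 3, CHAR8 = 4;
constexpr TypeId SIMPLE_PODS = 100, AGGREGATE = 101;

// One dictionary entry per type that occurs in the archive.
struct TypeInfo {
    std::size_t pod_bytes;       // byte size for POD types, 0 for classes
    std::vector<TypeId> fields;  // field types for classes, empty for PODs
};

using TypeDictionary = std::map<TypeId, TypeInfo>;

TypeDictionary make_example_dictionary() {
    TypeDictionary d;
    d[U32]   = TypeInfo{4, {}};
    d[U64]   = TypeInfo{8, {}};
    d[F32]   = TypeInfo{4, {}};
    d[CHAR8] = TypeInfo{1, {}};
    d[SIMPLE_PODS] = TypeInfo{0, {U32, U64, F32}};        // size information: 3
    d[AGGREGATE]   = TypeInfo{0, {SIMPLE_PODS, CHAR8}};   // size information: 2
    return d;
}

// Expands a type recursively until only POD sizes remain and returns the
// number of payload bytes a reader would have to skip for one object.
std::size_t expanded_bytes(const TypeDictionary& dict, TypeId id) {
    const auto it = dict.find(id);
    if (it == dict.end()) throw std::runtime_error("type id not in dictionary");
    const TypeInfo& info = it->second;
    if (info.fields.empty()) {
        assert(info.pod_bytes > 0);  // a POD entry must have a known size
        return info.pod_bytes;
    }
    std::size_t total = 0;
    for (TypeId field : info.fields) total += expanded_bytes(dict, field);
    return total;
}
```

With this dictionary, SimplePODs expands to 4 + 8 + 4 = 16 bytes and
AggregateClass to 16 + 1 = 17 bytes, so an old reader that meets an
unknown trailing field of either type would know how far to advance.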

I believe this can be done as part of the version and object tracking
process, so the performance would still be high and, most importantly,
the incremental size overhead both in memory and in the archive would
be relatively small (only one piece of size information per type of
object contained in the archive, and no size overhead for POD fields).

By the way, this approach would also increase multi-OS compatibility.
For instance, the class on the source computer might be a 32-bit
compilation of:
class CrossPlatform {
    int idont_care_about_size;
};

During saving, the field idont_care_about_size would actually be
saved and identified as uint32_t.

The same CrossPlatform class on the destination machine might be
compiled as a 64-bit application, but since the target int would be
larger than the source one, we can do a silent promotion.
Of course, the other way around, we can throw an exception if the
actual value contained within idont_care_about_size exceeds the range
of the POD type in the target's compilation.
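
The promote-or-throw rule could look roughly like this. It is a sketch
only: load_integer is a hypothetical helper, it handles signed targets
only, and it assumes the archive layer has already decoded the stored
integer (whatever its recorded width) into a 64-bit value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Converts a decoded archive integer to the width used by the reading
// application. Widening always succeeds; narrowing succeeds only when the
// actual value fits, otherwise an exception is thrown.
template <class Target>
Target load_integer(std::int64_t stored) {
    static_assert(std::is_integral<Target>::value &&
                      std::is_signed<Target>::value,
                  "this sketch handles signed integer targets only");
    if (stored < std::numeric_limits<Target>::min() ||
        stored > std::numeric_limits<Target>::max()) {
        throw std::range_error(
            "stored value does not fit the target integer type");
    }
    const Target result = static_cast<Target>(stored);
    assert(static_cast<std::int64_t>(result) == stored);  // lossless
    return result;
}
```

A reader whose field is 64 bits wide would call
load_integer<std::int64_t>(value) and never fail, while a reader with a
32-bit field would call load_integer<std::int32_t>(value) and get a
std::range_error only for values that are actually out of range.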

OK, I hope that my explanation is clear, and I would appreciate any
feedback that you might have.

Finally, let me state that we greatly appreciate (and use) a lot of
boost work and, in particular, we consider the boost.serialization
approach to be probably the best possible under the current constraints
of the language.

Best regards,

Andrew Horoszczak
