Paul Long wrote:
ibitstream performs bit-by-bit processing.
it will never be as efficient as
hand-coded logic.
You should be more ambitious! Let's start with this:
(Please consider all of this code pseudo-code!)
class ibitstream {
uint64_t buffer;
unsigned int bits_in_buffer;
uint32_t get_next_32() { .... }
template <unsigned int bits>
void require() {
if (bits_in_buffer < bits) {
buffer |= get_next_32() << bits_in_buffer;
}
}
public:
template // bits <= 32
void get(T& data) {
require<bits>();
T = buffer & ((1<> bits;
bits_in_buffer -= bits;
}
};
ibitstream s(......);
int a,b,c;
get<3>(a);
get<4>(b);
get<10>(c);
This avoids the bit-by-bit processing. A problem is that require() is
being called for every get(), and that's not going to be optimised away
because, for example, get_next_32() might throw and the compiler is going
to deliver that precisely. We probably only care if all-or-none of the
get()s succeed, which we could achieve with
template // bits <= 32
void unchecked_get(T& data) {
T = buffer & ((1<> bits;
bits_in_buffer -= bits;
}
template
// bits1 + bits2 + bits3 <= 32
void get(T1& t1, T2& t2, T3& T3) {
require();
unchecked_get<bits1>(t1);
unchecked_get<bits2>(t2);
unchecked_get<bits3>(t3);
}
int a,b,c;
get<3,4,10>(a,b,c);
No doubt one could make a version of that that
(a) uses varargs templates so you can get as many fields as you want at once, and
(b) removes the requirement that the total size is <= 32, with specialisations
for the <=32 and the >32 cases.
You could then use expression templates to wrap that up in the operator>>
syntax:
int a,b,c;
s >> bits<3>(a) >> bits<4>(b) >> bits<10>(c);
Or maybe:
bits<3> a; bits<4> b; bits<10> c;
s >> a >> b >> c;
I think there's a good chance that you could end up with something with
performance very close or even better than hand-written code.
Regards, Phil.