JSON and no-alloc
Greetings,
I'm targeting the boost JSON library for an embedded target. There appears to be one place where there could be either a beneficial new feature, or the feature exists and I'm not seeing it.
The section about Avoiding Dynamic Allocation is close to what we'd like to see, except that we don't want to depend on a stack buffer for value data. All of the text of the JSON message is present in the string view. It would be desirable to reference the values from there.
Let me be a little more explicit. We receive a message in JSON format. The whole of this message must be received before we can pass it to the JSON parser. It's /possible/ to let the JSON parser manage the message as it arrives and manage both the parse and storage of the content. Doing so would require shifting some responsibilities within the firmware that are difficult to change.
We parse out the fields we care about and then want to hold onto these values for some time without retaining the memory necessary for the parse itself. Since all of the values and keys are present in the original source buffer passed to parse() as a string_view, we could simply save the string_views for the values and discard the JSON parser. Except, this parser copies the value to another buffer s.t. discarding the parser invalidates the value string_views.
Alternatively, if the values themselves were returned as string_views into the source buffer, these string_views would (could) survive destruction of the parser.
If we are missing the method by which this can be implemented with the current parser, we'd be delighted to hear it.
Cheers
-- /Marc Oscar Singer/ *Woollysoft* +1.206.328.1718
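To make the request concrete, here is a minimal sketch of the behavior being asked for: a lookup that returns a std::string_view pointing into the caller's message buffer, so nothing else has to stay alive. This is not Boost.JSON code and not a JSON parser. find_string_field, closing_quote, and skip_ws are hypothetical helpers written for this illustration, and the sketch assumes a flat object whose wanted values are strings. It also does no unescaping, so the view holds the raw text between the quotes, which is an inherent limit of pointing into an unmodified source buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

constexpr std::size_t npos = std::string_view::npos;

// Index of the quote that closes the string opening at `open`, or npos if
// the string is unterminated. A backslash skips the character after it.
inline std::size_t closing_quote(std::string_view s, std::size_t open) {
    std::size_t i = open + 1;
    while (i < s.size() && s[i] != '"')
        i += (s[i] == '\\') ? 2 : 1;
    return i < s.size() ? i : npos;
}

// Index of the first non-whitespace character at or after `i`.
inline std::size_t skip_ws(std::string_view s, std::size_t i) {
    while (i < s.size() &&
           (s[i] == ' ' || s[i] == '\t' || s[i] == '\n' || s[i] == '\r'))
        ++i;
    return i;
}

// Hypothetical helper (assumption: flat object, string values only).
// Returns a view of the value's characters inside `src`; no copy is made.
inline std::optional<std::string_view>
find_string_field(std::string_view src, std::string_view key) {
    std::size_t pos = 0;
    while ((pos = src.find('"', pos)) != npos) {
        const std::size_t end = closing_quote(src, pos);
        if (end == npos) return std::nullopt;
        const std::string_view token = src.substr(pos + 1, end - pos - 1);
        const std::size_t next = skip_ws(src, end + 1);
        // A string token followed by ':' is a key.
        if (next < src.size() && src[next] == ':' && token == key) {
            const std::size_t v = skip_ws(src, next + 1);
            if (v >= src.size() || src[v] != '"') return std::nullopt;
            const std::size_t vend = closing_quote(src, v);
            if (vend == npos) return std::nullopt;
            return src.substr(v + 1, vend - v - 1);
        }
        pos = end + 1;
    }
    return std::nullopt;
}

// Usage: the view aliases `message` and stays valid as long as that buffer
// does; there is no parser object or side buffer to keep around.
inline void usage_example() {
    static const char message[] = R"({"id": "abc", "mode": "fast"})";
    const auto mode = find_string_field(message, "mode");
    assert(mode && *mode == "fast");
}
```

The trade-off is visible in the sketch: the result is read-only and still escaped, which is the price of referencing the source text directly instead of copying values into storage owned by the parser.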
On Mon, Aug 14, 2023 at 11:46 AM Marc Oscar Singer via Boost-users <boost-users@lists.boost.org> wrote:
I'm targeting the boost JSON library for an embedded target. There appears to be one place where there could be either a beneficial new feature, or the feature exists and I'm not seeing it.
The section about Avoiding Dynamic Allocation is close to what we'd like to see, except that we don't want to depend on a stack buffer for value data. All of the text of the JSON message is present in the string view. It would be desirable to reference the values from there.
When it comes to building a JSON library, there are many tradeoffs that must be made. Tailoring the library for one use-case necessarily inhibits other use cases or makes them less efficient. The design of Boost.JSON is based on offering an efficient and flexible "value type." That is, objects of type boost::json::value, which implement a form of variant across the seven JSON native types: null, bool, integer, floating point, string, array, and object.
What you are describing is a completely different JSON library. For example your model would be read-only, while instances of boost::json::value are also writeable. Implementing the feature you describe using Boost.JSON's existing types would be, to put it simply, a huge hack and incur enormous technical debt. It would be much cleaner to simply offer that as its own library. Or perhaps in another namespace under Boost.JSON, which reuses very little of the existing Boost.JSON code.
However, this effort is unnecessary because there is already a great library that does what you describe, and that is simdjson. Check it out:
https://github.com/simdjson/simdjson
Would this suit your needs?
Regards
On 8/14/23 12:05, Vinnie Falco via Boost-users wrote:
However, this effort is unnecessary because there is already a great library that does what you describe, and that is simdjson. Check it out:
https://github.com/simdjson/simdjson
Would this suit your needs?
Unlikely. Remember, this is for an embedded target. We are fortunate enough to have a 32 bit core. simdjson requires a 64 bit CPU.
And, yes, of course ... tradeoffs. I was hopeful when I read that "Boost.JSON works great on embedded devices." For the kinds of embedded work we do, this is probably not the case. We don't tend to parse and reedit JSON. We tend to use JSON as a message serialization mechanism.
This discussion is academic for the time being. The current release doesn't compile with ARM gcc and it doesn't appear to accept -fno-exceptions, even on MacOS, though I'm sure these details will be worked out.
Cheers
-- /Marc Oscar Singer/ *Woollysoft* +1.206.328.1718
On 14/08/2023 20:25, Marc Oscar Singer via Boost-users wrote:
Unlikely. Remember, this is for an embedded target. We are fortunate enough to have a 32 bit core. simdjson requires a 64 bit CPU.
I always preferred https://github.com/chadaustin/sajson over simdjson, because sajson is completely in-place, you feed it a buffer and no further allocations are done.
And, yes, of course ... tradeoffs. I was hopeful when I read that "Boost.JSON works great on embedded devices." For the kinds of embedded work we do, this is probably not the case. We don't tend to parse and reedit JSON. We tend to use JSON as a message serialization mechanism.
"Embedded" means a lot of things. I'd personally call it "whatever Arduino can do", but even that is pretty full featured nowadays, there is malloc and free on Arduino, std::string works as you'd expect, as does std::vector. Plenty of smaller embedded can't do any of that.
I have no idea if Boost.JSON can work on Arduino, but I suspect unless it's tested by CI to at least build under Arduino, it probably does not.
Last time I checked Arduino does support a particularly ancient port of Boost, somebody hand forked and patched it. So it is theoretically doable for somebody who likes masochism.
Of course, just because it compiles doesn't mean you'll fit your Boost using library into a few hundred Kb of Flash or RAM, so I suggest better to stick with libraries specifically designed to work well within dozens of Kb.
Niall
On 8/14/23 12:54, Niall Douglas via Boost-users wrote:
Unlikely. Remember, this is for an embedded target. We are fortunate enough to have a 32 bit core. simdjson requires a 64 bit CPU.
I always preferred https://github.com/chadaustin/sajson over simdjson, because sajson is completely in-place, you feed it a buffer and no further allocations are done.
Thanks for the pointer. My searches hadn't turned up this particular implementation.
And, yes, of course ... tradeoffs. I was hopeful when I read that "Boost.JSON works great on embedded devices." For the kinds of embedded work we do, this is probably not the case. We don't tend to parse and reedit JSON. We tend to use JSON as a message serialization mechanism.
"Embedded" means a lot of things. I'd personally call it "whatever Arduino can do", but even that is pretty full featured nowadays, there is malloc and free on Arduino, std::string works as you'd expect, as does std::vector. Plenty of smaller embedded can't do any of that.
I have no idea if Boost.JSON can work on Arduino, but I suspect unless it's tested by CI to at least build under Arduino, it probably does not.
Last time I checked Arduino does support a particularly ancient port of Boost, somebody hand forked and patched it. So it is theoretically doable for somebody who likes masochism.
Of course, just because it compiles doesn't mean you'll fit your Boost using library into a few hundred Kb of Flash or RAM, so I suggest better to stick with libraries specifically designed to work well within dozens of Kb.
The micros have started to include more code storage s.t. it is possible to run a library that takes 100KiB. We only get a couple of those before we have to rethink our strategy. Thankfully, I'm not constrained to the PIC, a particularly hostile development environment--IMHO.
Mostly, the tricky part is in mixing cooperative threading with, well, almost anything not cut from whole cloth. E.g. Parse JSON. Submit to driver for asynchronous processing. Return to scheduler (releasing the stack). Resume when async process complete.
We /can/ malloc. Or, we can find a library that doesn't require it. sajson seems to fit the bill.
Cheers
-- /Marc Oscar Singer/ *Woollysoft* +1.206.328.1718
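The parse, submit, yield, resume flow described above can be sketched as a small state machine. Everything here is hypothetical: FakeDriver and CommandTask are names invented for this illustration and stand in for a real driver and scheduler, and the "parse" step is a deliberate placeholder that takes the text between the last pair of quotes. The point is only that each call to step() returns to the scheduler, so anything kept across the wait must live in the task object or in the message buffer, never in a stack buffer used during the parse.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for an asynchronous driver.
struct FakeDriver {
    bool busy = false;
    std::string_view pending;  // last payload submitted
    void submit(std::string_view payload) { pending = payload; busy = true; }
    void complete() { busy = false; }  // e.g. called from an interrupt or poll loop
};

// Hypothetical cooperative task: the scheduler calls step() repeatedly.
class CommandTask {
public:
    enum class State { Parse, Waiting, Done };

    CommandTask(std::string_view message, FakeDriver& driver)
        : message_(message), driver_(driver) {}

    // Runs to the next yield point and returns. Locals do not survive between
    // calls, so the retained field is a member that views the message buffer.
    State step() {
        switch (state_) {
        case State::Parse: {
            // Placeholder for a real parse: text between the last two quotes.
            const std::size_t end = message_.rfind('"');
            if (end == std::string_view::npos || end == 0) {
                state_ = State::Done;
                break;
            }
            const std::size_t begin = message_.rfind('"', end - 1);
            if (begin == std::string_view::npos) {
                state_ = State::Done;
                break;
            }
            payload_ = message_.substr(begin + 1, end - begin - 1);
            driver_.submit(payload_);
            state_ = State::Waiting;  // yield to the scheduler
            break;
        }
        case State::Waiting:
            if (!driver_.busy) state_ = State::Done;  // resume after completion
            break;
        case State::Done:
            break;
        }
        return state_;
    }

    std::string_view payload() const { return payload_; }

private:
    std::string_view message_;  // the received message; must outlive the task
    FakeDriver& driver_;
    std::string_view payload_;  // view into message_, no copy
    State state_ = State::Parse;
};
```

With an in-place parser such as sajson the placeholder parse would be replaced by a real one, but the ownership picture stays the same: the message buffer is the only storage that has to persist while the task waits.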
participants (3)
- Marc Oscar Singer
- Niall Douglas
- Vinnie Falco