JSON Parser GSoC 2013

Stephan Bourgeois

10 Apr 2013

9:46 p.m.

Hi everybody, I am studying for an MSc in Computer Science at Oxford Brookes University. I have taken a compiler construction course this year and I am therefore interested in the JSON Parser idea for the Boost library. Open Source JSON parsers have already been implemented in C++ and in Java. Examples of Java libraries are: Gson, quick-json. Even if other libraries already exist, developers who are using Boost for their project will appreciate having a JSON parser within Boost.

...

From a compiler construction background, writing a JSON parser is not difficult. The JSON grammar is simple and the specification is easy to find. Potential difficulties can exist with robustness of error handling and recovery.


Philip Bennefall

10 Apr

10:23 p.m.

----- Original Message -----
From: "Stephan Bourgeois"
To:
Sent: Wednesday, April 10, 2013 11:46 PM
Subject: [boost] JSON Parser GSoC 2013

Hi everybody, I am studying for an MSc in Computer Science at Oxford Brookes University. I have taken a compiler construction course this year and I am therefore interested in the JSON Parser idea for the Boost library. Open Source JSON parsers have already been implemented in C++ and in Java. Examples of Java libraries are: Gson, quick-json. Even if other libraries already exist, developers who are using Boost for their project will appreciate having a JSON parser within Boost.

...

From a compiler construction background, writing a JSON parser is not difficult. The JSON grammar is simple and the specification is easy to find. Potential difficulties can exist with robustness of error handling and recovery.

The question is what data structure we should use to represent JSON objects, and how the user can access key/value pairs in those objects. (examples: Boost.PropertyTree, pre-existing C++ object, ...)

Ideally we should offer validating and non-validating implementations. We should also offer JSON generation as well as parsing.

Let me know what you think. Kind regards, Stephan.

Hi Stephan,

Have you looked at boost.property_tree? That includes a JSON parser among other things, as well as an appropriate data structure to hold the tree. Are you envisioning something different?

Kind regards,
Philip Bennefall

Klaim - Joël Lamotte

11 Apr

10:50 a.m.

On Thu, Apr 11, 2013 at 12:23 AM, Philip Bennefall wrote:

...

Have you looked at boost.property_tree? That includes a JSON parser among other things, as well as an appropriate data structure to hold the tree. Are you envisioning something different?

It has been pointed out several times that boost::property_tree isn't appropriate if you want a JSON library: it only provides a JSON-like serialization and doesn't support all valid JSON syntax/values. The same goes for XML. See http://boost.2283326.n4.nabble.com/Using-property-tree-as-json-reader-writer...

Sergey Cheban

12:19 a.m.

On 11.04.2013 1:46, Stephan Bourgeois wrote:

...

I am studying for an MSc in Computer Science at Oxford Brookes University. I have taken a compiler construction course this year and I am therefore interested in the JSON Parser idea for the Boost library.

There already is a json parser example in http://svn.boost.org/svn/boost/trunk/libs/spirit/example/qi/json/

I think it would be good to move it to the boost::spirit repository and to the release branch. It may also be a good idea to use this parser in boost::property_tree instead of the current one.

...

The question is what data structure we should use to represent JSON objects, and how the user can access key/value pairs in those objects. (examples: Boost.PropertyTree, pre-existing C++ object, ...)

The structure used in the spirit/example/qi/json/ looks reasonable: the json object is represented as a map and the value is represented as a boost::variant (with some wrapper around it).

I don't like the boost::property_tree approach because it loses all the type information for the values.

--
Best regards, Sergey Cheban
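For readers who want to picture the structure described above (object as a map, value as a variant), here is a minimal sketch. It is not the Spirit example's actual code: it assumes C++17 and swaps in std::variant for boost::variant, and the names json_sketch, value, array, object, type_name and make_example are all hypothetical. It also relies on std::map accepting a forward-declared mapped type, which the common standard library implementations allow.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

namespace json_sketch {  // hypothetical namespace, for illustration only

struct value;                                    // forward declaration for recursion
using array  = std::vector<value>;
using object = std::map<std::string, value>;     // keys are always strings

// One alternative per JSON type; std::monostate stands in for null.
struct value {
    std::variant<std::monostate, bool, double, std::string, array, object> data;
};

// The variant remembers which alternative it holds, so the JSON type
// of each value survives parsing.
inline std::string type_name(const value& v) {
    if (std::holds_alternative<std::monostate>(v.data)) return "null";
    if (std::holds_alternative<bool>(v.data))           return "boolean";
    if (std::holds_alternative<double>(v.data))         return "number";
    if (std::holds_alternative<std::string>(v.data))    return "string";
    if (std::holds_alternative<array>(v.data))          return "array";
    return "object";
}

// Builds the equivalent of {"a": "hello", "c": 3, "ok": true, "list": [null]}.
inline value make_example() {
    object o;
    o["a"]    = value{std::string("hello")};
    o["c"]    = value{3.0};
    o["ok"]   = value{true};
    o["list"] = value{array{value{}}};
    return value{o};
}

}  // namespace json_sketch
```

With a layout like this, a caller can still tell the string "3" from the number 3 after parsing, which is the type information that is lost when every value is stored as a string.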

Michael Caisse

12 Apr

12:37 a.m.

On 04/10/2013 05:19 PM, Sergey Cheban wrote:

...

There already is a json parser example in http://svn.boost.org/svn/boost/trunk/libs/spirit/example/qi/json/

FWIW, the json example parser in Spirit is based on our json library that we finally pushed to github this week. Links and docs (soon) can be found here: http://cierelabs.org

The goal was to create a json library that allows usage similar to javascript or Python. I believe the parser is fully compliant and we would love to get some feedback from users.

Hopefully it is something useful to the community.

michael

--
Michael Caisse
ciere consulting
ciere.com

Sergey Cheban

2:20 a.m.

On 12.04.2013 4:37, Michael Caisse wrote:

...

FWIW, the json example parser in Spirit is based on our json library that we finally pushed to github this week. Links and docs (soon) can be found here: http://cierelabs.org

The goal was to create a json library that allows usage similar to javascript or Python. I believe the parser is fully compliant and we would love to get some feedback from users.

Hopefully it is something useful to the community.

I'm glad to hear it, but I have to say that in software development it is important to minimize the number of external libraries.

Every external dependency (i.e. library) leads to additional costs:
- the license compatibility must be checked
- the version compatibility must be checked
- the library must be built and installed
- if the library is abandoned, it's a problem
- etc.

One of the benefits of Boost is that it is licensed, tested and distributed as one thing. And I think that it would be good for your json parser if it were included in Boost.

PS. For now, I'm using my own implementation of a json parser. I would be happy to switch to an external one if it were included in one of the libraries I'm already using.

--
Best regards, Sergey Cheban

Bjorn Reese

11 Apr

10:44 a.m.

On 04/10/2013 11:46 PM, Stephan Bourgeois wrote:

...

Open Source JSON parsers have already been implemented in C++ and in Java. Examples of Java libraries are: Gson, quick-json. Even if other libraries already exist, developers who are using Boost for their project will appreciate having a JSON parser within Boost.

Agreed.

...

The question is what data structure we should use to represent JSON objects, and how the user can access key/value pairs in those objects. (examples: Boost.PropertyTree, pre-existing C++ object, ...)

I would like to have several different interfaces:

1. Tokenizer API which reads the next token from an input string. This is important for streaming data.

2. Iterator API which iterates to the next token in the input string. Remembers its parent scopes (unlike tokenizer.) This is similar to the XmlTextReader API.

3. Tree API which parses the entire input string into a tree structure. This is a bit like the DOM API, and this is what the Spirit example and Boost.PropertyTree provides.

4. Serialization API which provides a Boost.Serialization input archive without going through an intermediate tree representation.

For each of the above there should be corresponding generation interfaces.

I have already created the tokenizer and serialization APIs for JSON (and several other encoding formats) at: http://protoc.sourceforge.net/

I have not had the opportunity to look into the iterator and tree APIs yet, so this may be a good candidate for a GSoC project. As there is no mentor for the JSON parsing library, I am willing to mentor it if it is based on the protoc code. However, I am only a Boost hang-around, so I do not know the proper procedures for this.

Unfortunately, the code is currently undocumented, so the best place to start is the code itself:

http://sourceforge.net/p/protoc/code/ci/master/tree/include/protoc/json/
http://sourceforge.net/p/protoc/code/ci/master/tree/src/json/

decoder.hpp contains the tokenizer API.
encoder.hpp contains the token generator API.
iarchive.hpp contains the serialization input archive.
oarchive.hpp contains the serialization output archive.
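To make the first interface in the list above concrete, here is a minimal sketch of a pull tokenizer: each call to next() reads one token and nothing is remembered about enclosing scopes. This is not the protoc decoder.hpp API; the names tok_sketch, token_kind, token and tokenizer are hypothetical, C++17 is assumed, and strings and numbers are only delimited, not validated.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

namespace tok_sketch {  // hypothetical names, for illustration only

enum class token_kind {
    begin_object, end_object, begin_array, end_array, colon, comma,
    string, number, true_, false_, null_, end, error
};

struct token {
    token_kind kind;
    std::string_view text;  // slice of the input covered by the token
};

class tokenizer {
public:
    explicit tokenizer(std::string_view input) : in_(input) {}

    // Reads the next token; keeps no record of parent scopes.
    token next() {
        while (pos_ < in_.size() && is_space(in_[pos_])) ++pos_;
        if (pos_ >= in_.size()) return {token_kind::end, std::string_view{}};
        switch (in_[pos_]) {
            case '{': return single(token_kind::begin_object);
            case '}': return single(token_kind::end_object);
            case '[': return single(token_kind::begin_array);
            case ']': return single(token_kind::end_array);
            case ':': return single(token_kind::colon);
            case ',': return single(token_kind::comma);
            case '"': return read_string();
            default:  break;
        }
        if (in_[pos_] == '-' || is_digit(in_[pos_])) return read_number();
        return read_word();
    }

private:
    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }

    token single(token_kind k) {
        token t{k, in_.substr(pos_, 1)};
        ++pos_;
        return t;
    }

    // Returns the text between the quotes; escapes are skipped, not decoded.
    token read_string() {
        const std::size_t start = pos_++;             // skip opening quote
        while (pos_ < in_.size() && in_[pos_] != '"') {
            if (in_[pos_] == '\\') ++pos_;            // skip the escaped character
            ++pos_;
        }
        if (pos_ >= in_.size()) {                     // unterminated string
            pos_ = in_.size();
            return {token_kind::error, in_.substr(start)};
        }
        ++pos_;                                       // skip closing quote
        return {token_kind::string, in_.substr(start + 1, pos_ - start - 2)};
    }

    // Collects number-like characters without checking the number grammar.
    token read_number() {
        const std::size_t start = pos_;
        while (pos_ < in_.size() &&
               (is_digit(in_[pos_]) ||
                std::string_view("+-.eE").find(in_[pos_]) != std::string_view::npos)) {
            ++pos_;
        }
        return {token_kind::number, in_.substr(start, pos_ - start)};
    }

    token read_word() {
        const std::size_t start = pos_;
        while (pos_ < in_.size() && is_alpha(in_[pos_])) ++pos_;
        const std::string_view word = in_.substr(start, pos_ - start);
        if (word == "true")  return {token_kind::true_, word};
        if (word == "false") return {token_kind::false_, word};
        if (word == "null")  return {token_kind::null_, word};
        if (pos_ == start) ++pos_;                    // always make progress
        return {token_kind::error, in_.substr(start, pos_ - start)};
    }

    std::string_view in_;
    std::size_t pos_ = 0;
};

}  // namespace tok_sketch
```

The iterator interface in item 2 could be layered on top of something like this by keeping a stack of open begin_object/begin_array tokens.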

...

Ideally we should offer validating and non-validating implementations. We should also offer JSON generation as well as parsing.

Agreed. It is mainly the string validation that is going to be a (minor) challenge.
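As one possible reading of "string validation", the sketch below checks a quoted JSON string against the escape rules of RFC 4627: no raw control characters, no unescaped quote, and only the escapes \" \\ \/ \b \f \n \r \t and \u followed by four hex digits. The function name is_valid_json_string is hypothetical, C++17 is assumed, and UTF-8 well-formedness and surrogate pairing are deliberately not checked.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Validates a JSON string literal including its surrounding quotes.
// Hypothetical helper; covers escapes and control characters only.
inline bool is_valid_json_string(std::string_view s) {
    if (s.size() < 2 || s.front() != '"' || s.back() != '"') return false;
    for (std::size_t i = 1; i + 1 < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x20 || c == '"') return false;   // raw control char or unescaped quote
        if (c != '\\') continue;
        if (++i + 1 >= s.size()) return false;    // backslash escapes the closing quote
        const char e = s[i];
        if (e == 'u') {
            if (i + 5 > s.size() - 1) return false;  // fewer than four chars before the closing quote
            for (std::size_t k = 1; k <= 4; ++k) {
                if (!std::isxdigit(static_cast<unsigned char>(s[i + k]))) return false;
            }
            i += 4;                               // consume the four hex digits
        } else if (std::string_view("\"\\/bfnrt").find(e) == std::string_view::npos) {
            return false;                         // unknown escape such as \x
        }
    }
    return true;
}
```

A non-validating implementation could skip this pass entirely and only look for the closing quote.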

Michael Marcin

12 Apr

3:42 a.m.

Bjorn Reese wrote:

...

On 04/10/2013 11:46 PM, Stephan Bourgeois wrote:

...
Open Source JSON parsers have already been implemented in C++ and in Java. Examples of Java libraries are: Gson, quick-json. Even if other libraries already exist, developers who are using Boost for their project will appreciate having a JSON parser within Boost.

Agreed.

...
The question is what data structure we should use to represent JSON objects, and how the user can access key/value pairs in those objects. (examples: Boost.PropertyTree, pre-existing C++ object, ...)

I would like to have several different interfaces:

1. Tokenizer API which reads the next token from an input string. This is important for streaming data.

2. Iterator API which iterates to the next token in the input string. Remembers its parent scopes (unlike tokenizer.) This is similar to the XmlTextReader API.

3. Tree API which parses the entire input string into a tree structure. This is a bit like the DOM API, and this is what the Spirit example and Boost.PropertyTree provides.

4. Serialization API which provides a Boost.Serialization input archive without going through an intermediate tree representation.

I would perhaps also add a Fusion API, which I think should allow usage similar to that of JsonFX in C# for any Fusion-adapted type. http://codepad.org/mu0VD9LG

Bjorn Reese

9:02 a.m.

On 04/12/2013 05:42 AM, Michael Marcin wrote:

...

I would think to add perhaps a fusion API which I think should allow you to have a usage similar to that of JsonFX in C# for any fusion adapted type.

This sounds like a great idea.

Arindam Mukherjee

9:13 a.m.

In JSON we typically deal with maps and arrays. The arrays themselves could have arbitrary types (string, object, array, numeric, boolean, null) as elements. The key types in the maps are always strings and the value types in the maps could be anything that can appear in an array, including another map or array.

Due to this, I'd imagine being able to use Boost.Variant or Boost.Any in a list and as a value_type in a map would help.

Regards, Arindam.

On Thu, Apr 11, 2013 at 3:16 AM, Stephan Bourgeois wrote:

...

Hi everybody, I am studying for an MSc in Computer Science at Oxford Brookes University. I have taken a compiler construction course this year and I am therefore interested in the JSON Parser idea for the Boost library.

Open Source JSON parsers have already been implemented in C++ and in Java. Examples of Java libraries are: Gson, quick-json. Even if other libraries already exist, developers who are using Boost for their project will appreciate having a JSON parser within Boost.

From a compiler construction background, writing a JSON parser is not difficult. The JSON grammar is simple and the specification is easy to find. Potential difficulties can exist with robustness of error handling and recovery.

The question is what data structure we should use to represent JSON objects, and how the user can access key/value pairs in those objects. (examples: Boost.PropertyTree, pre-existing C++ object, ...)

Ideally we should offer validating and non-validating implementations. We should also offer JSON generation as well as parsing.

Let me know what you think. Kind regards, Stephan.


Michael Marcin

3:53 p.m.

Arindam Mukherjee wrote:

...

In JSON we typically deal with maps and arrays. The arrays themselves could have arbitrary types (string, object, array, numeric, boolean, null) as elements. The key types in the maps are always strings and the value types in the maps could be anything that can appear in an array, including another map or array.

Due to this, I'd imagine being able to use Boost.Variant or Boost.Any in a list and as a value_type in a map would help.

Probably.

I find the most useful interface is to just provide a datatype you're expecting and let the json parser try its best to do the right thing.

For example:

string raw_json = R"({ data:{ a:"hello", b:"world", c:3, widget:3.5 } })";

struct my_type1 { map data; };

struct my_type2 { map> data; };

struct my_type3 { map data; };

struct my_type4 { unordered_map data; };

struct my_type5 { struct my_data { string a; string b; int c; float widget; } data; };

// these should all just work, with maybe a little bit of
// metaprogramming to describe the types to the json library
json::deserialize( json );
json::deserialize( json );
json::deserialize( json );
json::deserialize( json );
json::deserialize( json );

struct my_type6 { map data; };

// runtime error can't meaningfully convert
// "hello" to an int
json::deserialize( json );
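The "give the parser the type you expect" idea can be sketched without a full parser by starting from already-parsed scalars and converting them on request. Everything here is an assumption made for illustration: the names convert_sketch, scalar, parsed_object, convert and deserialize_map are hypothetical, C++17 is assumed, and the conversion rules (numbers print as strings, only fully numeric strings become ints, doubles truncate to int) are one possible choice, not the behavior of any existing library.

```cpp
#include <cstddef>
#include <exception>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <variant>

namespace convert_sketch {  // hypothetical names, for illustration only

using scalar        = std::variant<std::string, double>;
using parsed_object = std::map<std::string, scalar>;

template <class T> T convert(const scalar& v);   // specialized per target type

// Anything can be rendered as a string.
template <> inline std::string convert<std::string>(const scalar& v) {
    if (const std::string* s = std::get_if<std::string>(&v)) return *s;
    std::ostringstream os;
    os << std::get<double>(v);
    return os.str();
}

// Numbers are truncated to int; strings must be entirely numeric.
template <> inline int convert<int>(const scalar& v) {
    if (const double* d = std::get_if<double>(&v)) return static_cast<int>(*d);
    const std::string& s = std::get<std::string>(v);
    std::size_t used = 0;
    int n = 0;
    try { n = std::stoi(s, &used); } catch (const std::exception&) { used = 0; }
    if (used == 0 || used != s.size())
        throw std::runtime_error("cannot convert \"" + s + "\" to int");
    return n;
}

// Converts every member to the value type the caller asked for.
template <class T>
std::map<std::string, T> deserialize_map(const parsed_object& obj) {
    std::map<std::string, T> out;
    for (const auto& [key, val] : obj) out.emplace(key, convert<T>(val));
    return out;
}

}  // namespace convert_sketch
```

With data equivalent to the example above, deserialize_map<std::string> succeeds for every member, while deserialize_map<int> throws when it reaches "hello".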

TONGARI

4:21 p.m.

2013/4/12 Michael Marcin

...

Arindam Mukherjee wrote:

...
In JSON we typically deal with maps and arrays. The arrays themselves could have arbitrary types (string, object, array, numeric, boolean, null) as elements. The key types in the maps are always strings and the value types in the maps could be anything that can appear in an array, including another map or array.

Due to this, I'd imagine being able to use Boost.Variant or Boost.Any in a list and as a value_type in a map would help.

Probably.

I find the most useful interface is to just provide a datatype you're expecting and let the json parser try its best to do the right thing.

Very much like what I experienced with Spirit before; yeah, it just works. I let the user specify the desired types through traits specialization. The old code is here: https://github.com/jamboree/jsume/blob/master/example/pretty_printer/json_co...

...

For example:

string raw_json = R"({ data:{ a:"hello", b:"world", c:3, widget:3.5 } })";

This doesn't seem like a valid json, the key must be a quoted string.

...

struct my_type1 { map data; };

struct my_type2 { map> data; };

struct my_type3 { map data; };

struct my_type4 { unordered_map data; };

struct my_type5 { struct my_data { string a; string b; int c; float widget; } data; };

It's impossible for my_type5 without more advanced Fusion adaption.

Michael Marcin

5:18 p.m.

On 4/12/13 11:21 AM, TONGARI wrote:

...

2013/4/12 Michael Marcin

...
For example:

string raw_json = R"({ data:{ a:"hello", b:"world", c:3, widget:3.5 } })";

This doesn't seem like a valid json, the key must be a quoted string.

In practice you often find json that does not quote keys and I would prefer a useful library to a pedantic one for this task. You could have a strict mode I suppose. I pretty much just go by what http://jsonviewer.stack.hu/ accepts.

...

It's impossible for my_type5 without more advanced Fusion adaption.

I would hope that if you adapt my_type5 and my_type5::my_data it could work. I guess you would need a little more than basic fusion adaption to get strings for the member variable names in order to parse json pairs associatively.

Bjorn Reese

13 Apr

8:41 a.m.

On 04/12/2013 07:18 PM, Michael Marcin wrote:

...

In practice you often find json that does not quote keys and I would prefer a useful library to a pedantic one for this task. You could have a strict mode I suppose.

I pretty much just go by what http://jsonviewer.stack.hu/ accepts.

If we are to accept an extended syntax, then it should be absolutely clear what those extensions are. I have no idea what the URL above accepts. Apart from unquoted keys, two other extensions I have seen are C-style comments, and support for the floating-point values of infinity and NaN. Are there other potential extensions?
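One way to keep the accepted extensions explicit is an options structure where strict JSON is the default and each extension has to be switched on by name. The sketch below is an assumption-laden illustration: parse_options, its three flags and parse_key are hypothetical names, C++17 is assumed, only the unquoted-key flag is actually exercised, and escapes inside quoted keys are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical option set; every flag defaults to strict JSON.
struct parse_options {
    bool allow_unquoted_keys    = false;
    bool allow_comments         = false;  // listed only to name the extension
    bool allow_nan_and_infinity = false;  // listed only to name the extension
};

// Reads an object key starting at pos. On success returns the key and
// advances pos past it; on failure returns nullopt and leaves pos alone.
inline std::optional<std::string> parse_key(std::string_view in, std::size_t& pos,
                                            const parse_options& opt) {
    if (pos >= in.size()) return std::nullopt;
    if (in[pos] == '"') {                              // standard quoted key
        const std::size_t close = in.find('"', pos + 1);
        if (close == std::string_view::npos) return std::nullopt;
        std::string key(in.substr(pos + 1, close - pos - 1));
        pos = close + 1;
        return key;
    }
    if (!opt.allow_unquoted_keys) return std::nullopt; // strict mode rejects bare keys
    std::size_t end = pos;
    while (end < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[end])) || in[end] == '_')) {
        ++end;
    }
    if (end == pos) return std::nullopt;               // no identifier characters found
    std::string key(in.substr(pos, end - pos));
    pos = end;
    return key;
}
```

Under this scheme the same input, for example a:"hello", is rejected with the default options and accepted once allow_unquoted_keys is set, so the difference between strict and lenient behavior is visible at the call site.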

Ilya Bobyr

10:59 a.m.

On 4/12/2013 10:18 AM, Michael Marcin wrote:

...

On 4/12/13 11:21 AM, TONGARI wrote:

...
2013/4/12 Michael Marcin

...
For example:

string raw_json = R"({ data:{ a:"hello", b:"world", c:3, widget:3.5 } })";

This doesn't seem like a valid json, the key must be a quoted string.

In practice you often find json that does not quote keys and I would prefer a useful library to a pedantic one for this task. You could have a strict mode I suppose.

I pretty much just go by what http://jsonviewer.stack.hu/ accepts.

According to both the RFC and json.org, keys should be string values and thus should have quotes.

RFC: http://www.ietf.org/rfc/rfc4627.txt
JSON.org: http://json.org/

Thank you,
Ilya Bobyr
