GSoC 2014 Implementation of Algorithms for Boost.

Hi Guys, I am Tejas Nikumbh, a Senior UG at IIT Bombay. I am very much interested in contributing to Boost this year. I'm primarily interested in implementing Algorithms and Data Structures or both. I have considerable experience template based data structures as well as algorithms. I'll shortly post a link to the relevant code as this discussion goes on further. Besides the traditional DS and Algos I also have in mind implementation of certain awesome data structures (like say, KDTree) to extend the Boost Libraries capabilities. I found the following algorithms to be of importance to Boost [as from the ideas page on SVN] - Radix sort - Approximate string matching - Full text search - Near Duplicate Detection (shingling) - Parallel algorithms (sort, for_each) - Algorithms for gpgpu - Kinetic scrolling I know some of the algorithms in this list and would research and provide a in depth proposal for implementation of these into boost. As of now, I'd like to know about the potential mentors for this kind of project and whether it is something that Boost is looking forward to. I am pretty enthusiastic about this project so I would like to know how high the project is on Boost's priority list. Also, I wish to start early and get a head start by implementing one simple algorithm for Boost before GSoC so that I increase my chances of being selected. Please let me know what you guys think. -- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.

Not to discourage you entirely from tackling it, but there is a
considerable amount of code for radix sort written already with a view to
inclusion in Boost. If you search through the archive of this mailing list
for "radix sort" you should find the discussion easily.
For starters there is this: https://github.com/jeremy-murphy/integer-sort
I have been inactive for a while but the idea of someone beating me to it
may be the motivation I need to finish it off. :)
Cheers.
Jeremy
On 19 February 2014 19:44, Tejas Nikumbh
Hi Guys,
I am Tejas Nikumbh, a Senior UG at IIT Bombay. I am very much interested in contributing to Boost this year. I'm primarily interested in implementing Algorithms and Data Structures or both. I have considerable experience template based data structures as well as algorithms. I'll shortly post a link to the relevant code as this discussion goes on further. Besides the traditional DS and Algos I also have in mind implementation of certain awesome data structures (like say, KDTree) to extend the Boost Libraries capabilities.
I found the following algorithms to be of importance to Boost [as from the ideas page on SVN]
- Radix sort - Approximate string matching - Full text search - Near Duplicate Detection (shingling) - Parallel algorithms (sort, for_each) - Algorithms for gpgpu - Kinetic scrolling
I know some of the algorithms in this list and would research and provide a in depth proposal for implementation of these into boost. As of now, I'd like to know about the potential mentors for this kind of project and whether it is something that Boost is looking forward to. I am pretty enthusiastic about this project so I would like to know how high the project is on Boost's priority list.
Also, I wish to start early and get a head start by implementing one simple algorithm for Boost before GSoC so that I increase my chances of being selected.
Please let me know what you guys think.
-- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

But its not just radix sort right? The idea is about algorithms in general. So I kind of wanted to apply for a project that aims at applying for algorithms and or data structures in general..? Thanks, On Thu, Feb 20, 2014 at 7:48 PM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
Not to discourage you entirely from tackling it, but there is a considerable amount of code for radix sort written already with a view to inclusion in Boost. If you search through the archive of this mailing list for "radix sort" you should find the discussion easily.
For starters there is this: https://github.com/jeremy-murphy/integer-sort I have been inactive for a while but the idea of someone beating me to it may be the motivation I need to finish it off. :)
Cheers.
Jeremy
On 19 February 2014 19:44, Tejas Nikumbh
wrote: Hi Guys,
I am Tejas Nikumbh, a Senior UG at IIT Bombay. I am very much interested in contributing to Boost this year. I'm primarily interested in implementing Algorithms and Data Structures or both. I have considerable experience template based data structures as well as algorithms. I'll shortly post a link to the relevant code as this discussion goes on further. Besides the traditional DS and Algos I also have in mind implementation of certain awesome data structures (like say, KDTree) to extend the Boost Libraries capabilities.
I found the following algorithms to be of importance to Boost [as from the ideas page on SVN]
- Radix sort - Approximate string matching - Full text search - Near Duplicate Detection (shingling) - Parallel algorithms (sort, for_each) - Algorithms for gpgpu - Kinetic scrolling
I know some of the algorithms in this list and would research and provide a in depth proposal for implementation of these into boost. As of now, I'd like to know about the potential mentors for this kind of project and whether it is something that Boost is looking forward to. I am pretty enthusiastic about this project so I would like to know how high the project is on Boost's priority list.
Also, I wish to start early and get a head start by implementing one simple algorithm for Boost before GSoC so that I increase my chances of being selected.
Please let me know what you guys think.
-- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.

Right, I think the scope of the project is really up to you (and your
mentor). By "not tackling it" I just meant radix sort.
On 21 February 2014 01:22, Tejas Nikumbh
But its not just radix sort right? The idea is about algorithms in general. So I kind of wanted to apply for a project that aims at applying for algorithms and or data structures in general..?
Thanks,
On Thu, Feb 20, 2014 at 7:48 PM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
Not to discourage you entirely from tackling it, but there is a considerable amount of code for radix sort written already with a view to inclusion in Boost. If you search through the archive of this mailing list for "radix sort" you should find the discussion easily.
For starters there is this: https://github.com/jeremy-murphy/integer-sort I have been inactive for a while but the idea of someone beating me to it may be the motivation I need to finish it off. :)
Cheers.
Jeremy
On 19 February 2014 19:44, Tejas Nikumbh
wrote: Hi Guys,
I am Tejas Nikumbh, a Senior UG at IIT Bombay. I am very much interested in contributing to Boost this year. I'm primarily interested in implementing Algorithms and Data Structures or both. I have considerable experience template based data structures as well as algorithms. I'll shortly post a link to the relevant code as this discussion goes on further. Besides the traditional DS and Algos I also have in mind implementation of certain awesome data structures (like say, KDTree) to extend the Boost Libraries capabilities.
I found the following algorithms to be of importance to Boost [as from the ideas page on SVN]
- Radix sort - Approximate string matching - Full text search - Near Duplicate Detection (shingling) - Parallel algorithms (sort, for_each) - Algorithms for gpgpu - Kinetic scrolling
I know some of the algorithms in this list and would research and provide a in depth proposal for implementation of these into boost. As of now, I'd like to know about the potential mentors for this kind of project and whether it is something that Boost is looking forward to. I am pretty enthusiastic about this project so I would like to know how high the project is on Boost's priority list.
Also, I wish to start early and get a head start by implementing one simple algorithm for Boost before GSoC so that I increase my chances of being selected.
Please let me know what you guys think.
-- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Well, you got me interested in it again. Just spent a while adding support
for user-defined types.
That's really the last feature I needed to add.
Now I need to write some documentation and tidy up any silliness.
Would appreciate some eyes on it and would love to hear from anyone
interested in sort performance. For simple data types the time to sort is
shrunk by 2-20 x, with the best performance on the smallest type (char).
Cheers.
On 21 February 2014 18:30, Jeremy Murphy
Right, I think the scope of the project is really up to you (and your mentor). By "not tackling it" I just meant radix sort.
On 21 February 2014 01:22, Tejas Nikumbh
wrote: But its not just radix sort right? The idea is about algorithms in general. So I kind of wanted to apply for a project that aims at applying for algorithms and or data structures in general..?
Thanks,
On Thu, Feb 20, 2014 at 7:48 PM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
Not to discourage you entirely from tackling it, but there is a considerable amount of code for radix sort written already with a view to inclusion in Boost. If you search through the archive of this mailing list for "radix sort" you should find the discussion easily.
For starters there is this: https://github.com/jeremy-murphy/integer-sort I have been inactive for a while but the idea of someone beating me to it may be the motivation I need to finish it off. :)
Cheers.
Jeremy
On 19 February 2014 19:44, Tejas Nikumbh
wrote: Hi Guys,
I am Tejas Nikumbh, a Senior UG at IIT Bombay. I am very much interested in contributing to Boost this year. I'm primarily interested in implementing Algorithms and Data Structures or both. I have considerable experience template based data structures as well as algorithms. I'll shortly post a link to the relevant code as this discussion goes on further. Besides the traditional DS and Algos I also have in mind implementation of certain awesome data structures (like say, KDTree) to extend the Boost Libraries capabilities.
I found the following algorithms to be of importance to Boost [as from the ideas page on SVN]
- Radix sort - Approximate string matching - Full text search - Near Duplicate Detection (shingling) - Parallel algorithms (sort, for_each) - Algorithms for gpgpu - Kinetic scrolling
I know some of the algorithms in this list and would research and provide a in depth proposal for implementation of these into boost. As of now, I'd like to know about the potential mentors for this kind of project and whether it is something that Boost is looking forward to. I am pretty enthusiastic about this project so I would like to know how high the project is on Boost's priority list.
Also, I wish to start early and get a head start by implementing one simple algorithm for Boost before GSoC so that I increase my chances of being selected.
Please let me know what you guys think.
-- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Sat, Feb 22, 2014 at 10:33 PM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
Would appreciate some eyes on it and would love to hear from anyone interested in sort performance
Yep I would appreciate some mentor attention on this thread as well. -- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.

On 22 Feb 2014 at 22:38, Tejas Nikumbh wrote:
Yep I would appreciate some mentor attention on this thread as well.
Tejas, The GSoC ideas page lists many precanned GSoC projects written for students by potential mentors. If none of those interest you, then it is YOU who must write a suitable GSoC project proposal and ideally it is YOU who must try to find a potential mentor here willing to mentor you. The explanation at the top of the GSoC ideas page is very clear about this. You should be aware that all student proposals go into a peer review stage where Boost members vote on proposals, ranking them to merit. Google then reviews our suggested ranking and can and do alter it to their own wishes. We are then allocated a certain number of GSoC slots by Google, and the top X ranked proposals get GSoC funding. If your proposal is not of a similar quality to the precanned proposals written by potential mentors already on the GSoC ideas page, the chances are that your proposal will not be highly ranked. This is why it is especially important to start as early as possible if you wish to propose your own GSoC project. No one will write a proposal for you - if they did, that proposal would already be on the GSoC idea page. I hope this helps. Niall --- Boost C++ Libraries Google Summer of Code 2014 admin https://svn.boost.org/trac/boost/wiki/SoC2014

On Sun, Feb 23, 2014 at 7:22 AM, Niall Douglas
The GSoC ideas page lists many precanned GSoC projects written for students by potential mentors. If none of those interest you, then it is YOU who must write a suitable GSoC project proposal and ideally it is YOU who must try to find a potential mentor here willing to mentor you.
I started this thread to discuss a possible project based on the algorithms idea written at the bottom of the page. I assumed that the best way to know which mentors would be interested in guiding me would be starting a thread. Please let me know if I was wrong. No one will write a proposal for you - if they did, that proposal would
already be on the GSoC idea page.
Yes, this thread was just an introductory thread so that I could get things started. Is this not a good way to introduce? I would definitely go on to write a proposal after some discussion, but if I was supposed to write a proposal in the introductory thread itself, I'm sorry, I'll do that soon. I am new to this thread and the list in general and usually the norm is to introduce yourself first, get some reflection from the organization and go about working on your idea. I'm sorry if I did anything wrong here. Please let me know if I did. I'll also start looking into the ideas on the ideas page and re-evaluate what projects I can do, and post another thread about one of the ideas from the ideas list if they interest me. Thanks, -- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.

On 23 Feb 2014 at 12:46, Tejas Nikumbh wrote:
The GSoC ideas page lists many precanned GSoC projects written for students by potential mentors. If none of those interest you, then it is YOU who must write a suitable GSoC project proposal and ideally it is YOU who must try to find a potential mentor here willing to mentor you.
I started this thread to discuss a possible project based on the algorithms idea written at the bottom of the page. I assumed that the best way to know which mentors would be interested in guiding me would be starting a thread. Please let me know if I was wrong.
No one will write a proposal for you - if they did, that
We see lots, and lots of people turn up here promising vague things and then disappear (they aren't usually students either). If an unknown person comes here with a well written proposal supported by facts and references to: 1. Research of prior art i.e. proof that they have researched alternative implementations in C++ and other libraries 2. Have a well designed, concrete design based on synthesis of the prior art research. It's great to state known unknowns. 3. Preferably have proof they have successfully written a high quality STL algorithm implementation before i.e. can supply a link to an open source copy. 4. They have a demonstrated good work ethic, and are humble and willing to learn. Then a student proposal is highly likely to be taken seriously, and you'll get plenty of feedback on what needs fixing, plus mentors will volunteer themselves. Miss any of the above factors and no one - student or otherwise - will be taken seriously here. proposal
would already be on the GSoC idea page.
Yes, this thread was just an introductory thread so that I could get things started. Is this not a good way to introduce? I would definitely go on to write a proposal after some discussion, but if I was supposed to write a proposal in the introductory thread itself, I'm sorry, I'll do that soon.
I do wish more students would read these same emails posted on this same list last year, the year before and so on. But every year we repeat.
I am new to this thread and the list in general and usually the norm is to introduce yourself first, get some reflection from the organization and go about working on your idea. I'm sorry if I did anything wrong here. Please let me know if I did.
No, you're doing fine. You have coherent English and are polite and you came here well before the final GSoC deadline, and that's a great first start. Niall --- Boost C++ Libraries Google Summer of Code 2014 admin https://svn.boost.org/trac/boost/wiki/SoC2014

On Sun, Feb 23, 2014 at 7:27 PM, Niall Douglas
quality STL algorithm implementation before i.e. can supply a link to an open source copy.
Firstly, thanks for the detailed response Niall. I really appreciate you providing feedback. I'm on to drafting a good first draft and spawning a new thread for that relevant idea, with the proposal attached. I just wanted to show you some Data Structure implementations that I've done, and know whether they are upto the standards that Boost expects. If not, I'm ready to do a simple DS or Code Demonstration(relevant to the project) before GSoC as per the guidelines provided. I'll include this point in the new thread as well. Please check this and let me know your views. [ I've linked to only two DS as I guess you would be busy to go over everything, but there are other DS in the repository as well.] https://github.com/tejasnikumbh/Datastructures/tree/master/Vector https://github.com/tejasnikumbh/Datastructures/tree/master/Stack Thanks, -- Tejas Nikumbh, Fourth Year Undergraduate, Electrical Engineering Department, IIT Bombay.

On 25 Feb 2014 at 9:01, Tejas Nikumbh wrote:
wanted to show you some Data Structure implementations that I've done, and know whether they are upto the standards that Boost expects. If not, I'm ready to do a simple DS or Code Demonstration(relevant to the project) before GSoC as per the guidelines provided. I'll include this point in the new thread as well.
Please check this and let me know your views. [ I've linked to only two DS as I guess you would be busy to go over everything, but there are other DS in the repository as well.]
https://github.com/tejasnikumbh/Datastructures/tree/master/Vector https://github.com/tejasnikumbh/Datastructures/tree/master/Stack
Well it definitely won't be me mentoring here, so anything I am about to say is personal opinion only and any mentor you may have may have very different views. So do bear that in mind. From a very cursory inspection: * Lack of Boost coding conventions. * Lack of STL idiom usage e.g. iterators, allocators. * Lack comprehensive documentation. * Lack of full unit test and functional testing. * Lack of exception safety guarantees or indeed exception safety. * Lack of worst case/average case complexity guarantees per API. * Lack of C++11 anything. * Implementation algorithms are distinctly suboptimal and show a lack of understanding of the C++ type system. Now don't take such criticism too hard - your code is actually above average for a typical compsci graduate, so you've actually done well relative to your peers. But you need to consider this: for me, a reasonably experienced programmer, to implement a Boost quality std::vector<> I would estimate at least 120 hours of my time. And I'm extremely productive when compared to a university student, and that 120 hours is probably me even still being overconfident in myself. Quality C++ really is *that* hard. More than half your effort goes on writing testing and documentation. Don't let this put you off a GSoC. Rather, let it convince you how valuable a GSoC would be to you. In one summer my student Paul last year went from a slightly above average graduate programmer ability to someone capable of putting professional software programmers to shame. In fact, he'll even be presenting at this year's C++ Now conference after he successfully passed conference peer review, and I think he'll do splendidly. Niall --- Boost C++ Libraries Google Summer of Code 2014 admin https://svn.boost.org/trac/boost/wiki/SoC2014
participants (3)
-
Jeremy Murphy
-
Niall Douglas
-
Tejas Nikumbh