Google Summer of Code Proposal Feedback
Hello Boost community, I am a student interested in the Google Summer of Code (GSOC), and have a few proposals in mind. If I have posted this in the wrong place, my apologies, but applicants are encouraged to post to the mailing list, and after some research this seemed like the best list to post to. I have an original proposal for Boost, and strong interest in the Boost.ASIO idea. I've decided to seek some feedback from the community on their feelings about my idea, in the hopes that I can better define them, and increase my chances of success in being selected for the GSOC. I realize it is a bit late to begin discussing proposals, but I wasn't really aware of GSOC until recently. I had strongly considered writing a proposal for Boos.ASIO, but I fear my lack of experience with Linux, POSIX, and concurrency in general might prove to be too great of an obstacle. If the subject matter can be learned relatively quickly, I would love to discuss what such a project would require with the mentor, Niall Douglas, or a community member who has is familiar with the project. *About Me* First, let me begin with a bit about myself. My name is Paul Kirth, and I am currently pursuing a masters in Computer Science. This statement may be a bit misleading, however, as I received my BS in Civil Engineering from UCLA in 2005, and have only recently decided to pursue a course of study in Computer Science. I have for the last two years taken CS classes to prepare for Masters level courses, but am still a bit of a novice in CS. I would say that I am at about a 3rd year level in CS, proficient in C++ and python, and competent in Java, with a limited amount of assembly, and very little work in concurrency. I have not spent much time doing engineering work in the last few years, as my focus has been on my education, but I was fairly good at solving linear systems, and had a good amount of experience with design and collaboration. As an undergraduate I made the mistake of not pursuing summer internships or jobs, and as a result I found that many of my peers had a much easier time with our courses as a result of their work experience. After all, 20-30 hours a week solving engineering problems will generally make you more proficient at solving them than someone who doesn't. I do not wish to make the same mistake with my new career path, and desperately want to gain experience in professional software design, and in working on a collaborative project. I am keenly aware of my lack of experience, and really wish to use this opportunity to push myself, and gain as much expertise as possible during the few months of the summer. *Original Proposal* My original proposal probably isn't unique, and much to my chagrin as I read the hints for successful proposals it seems a popular topic. I am of course suggesting a project dealing with Trees. Rather than suggest building a generic tree, I am suggesting the start of a tree library, or the addition of several types of trees to existing libraries, as I am not sure which is more appropriate. Because trees are such a fundamental data type in computer science, I have found it difficult to understand why there are no explicit tree classes in the c++ standard. I have read the arguments that there is no standard use for a tree, and as such it cannot be made to satisfy enough users. I have read more arguments stating that std::sets and std::map use tree structures underneath their interface. Furthermore I have read in several places that it is common to use Boost.graph to make trees. Trees have myriad uses, and a library implementation should consider this heavily when designing the data structure, but trees are common enough to warrant a standard implementation in the same way as lists, queues, and stacks. I will concede that trees can be considered, and in fact are, graphs, and that many graph algorithms are very powerful tools. But trees don't always need those algorithms, and graphs don't always maintain the tree in the way we wish. Using an std::set or std::map is not the same thing as using a tree, regardless of how they are implemented. Conservatively, a linked list can be fully implemented and tested in an afternoon, while implementing a Red Black Tree would take considerably longer, yet the C++ standard has linked lists and not Red Black Trees. This seems a bit counter intuitive. We have a relativity simple data structure in the standard, and a more complex one is not. I'm sure that linked lists are more common than red black trees, but by how much? If self balancing binary trees are considered a fundamental data structure by every computer science curriculum, why aren't comprehensive, reliable implementations of them part of the standard, or widely available in 3rd party libraries like Boost? The library I am proposing should eventually be able to satisfy almost anyone requiring the use of a tree. It should have many different types of trees, Binary trees, AVL Trees, Red Black Trees, Treeps, B-Trees, Skip-Lists, etc. It should support common methods of traversal, inorder, preorder, postorder, and user defined traversal methods. Eventually, the library should support an n-ary trees and their operations. Trees in the library should also be performant, fail gracefully, be well documented, unit and integration tested, as exception safe as possible, and, someday, concurrency safe/compatible. I'm unsure of the Boost stance on inheritance is, but my initial feeling is that users should be able to use tree classes as base classes if they desire. Proposing a large library as a summer project is ridiculous, hence I am proposing a start to this library, containing a few tree classes that can be implemented, tested, and documented within a few months, and a set of standards for new entries in the library. I don't expect the whole library to immediately be accepted into Boost, but the project will aim to meet those standards, with a long term goal of eventually being included into Boost. My research has found several projects that implement single tree classes, or groups of tree classes, but no tree library. A good c++ example is the rbrees project at https://code.google.com/p/rbtrees/ , or python has a few implementations of trees, such as bintrees 0.3.0, https://pypi.python.org/pypi/bintrees/0.3.0, which is a binary tree library, containing BinaryTree, AVLTree, and RBTree classes. I feel that bintrees 0.3.0 is a good project to mimic in the early stages, as binary trees are fairly easy to implement, test, and maintain with AVL and RB Trees. I am comfortable with binary, AVL and Red Black Trees, and feel very confident in my ability to implement them. I would like to do more, and push myself this summer, but I am not sure how much work unit and integration testing are, how long they take, and how much extra time is required for classes/libraries to be refined to Boost standards. I am sure that this time is non trivial, but as a student I'm very unsure about how much time things take, or how much time they should take. In general I would like to work on a project that would expose me to more new ideas, but I really think the world deserves access to a C++ tree library that can have some guarantees for performance, and is part of a larger, well documented, highly respected library. I understand that this project is virtually guaranteed to take more that one summer's worth of effort, and I am comfortable with staying involved in the project beyond the length of GSOC, or having the work done absorbed into another library/project. Whether or not this project is accepted into GSOC, if there is support for this effort in the community I would love to help as I am able. Obviously if I am working this summer my commitment will not be able to be as substantial, but I do feel strongly about the need for standard, performant, reliable tree classes in c++. Thank you for reading my lengthy first post to the mailing list. I hope to have more discussions with this community in the future, and I am eager to hear your thoughts about how this proposal could be improved. -- Paul Kirth (310) 709-2516
participants (1)
-
Paul Kirth