[Boost][GSOC] proposal for Trie

21 Apr 2013

      ...
From 2012 till now, I am studying Computer Science in Peking University
Hi, this is my proposal for GSOC 2013, please read it and give me some
useful information on it, thanks

== Personal Details ==
 * Name:  Zhe(Hardy) Huang
 * College/University: Peking University
 * Course/Major: Computer Science
 * Degree Program: M.Sc.
 * Email: hzithlony@gmail.com
 * Homepage: not yet
 * Availability:
   * How much time do you plan to spend on your GSoC?
I will spend the most part time of the semester(till late June), and then
the whole summer on my GSoC project.
   * What are your intended start and end dates?
I basically will obey the rules of GSoC, and I will also listen to
 mentor’s advice on it.
   * What other factors affect your availability (exams, courses, moving,
work, etc.)?
I am having my courses now, and I will have my final exams in June. Since I
am not having many courses, I have enough time to do something

for the project before exams. And in summer I could almost work full time
on it.

== Background Information ==
Please summarize your educational background (degrees earned, courses
taken, etc.).
I majored in Computer Science in Zhongshan University from 2008 to 2012, by
which I earned my bachelor’s degree. During the four years, I learned

many basic courses on Computer Science, like algebra, logic, data
structure, algorithms, operating system, Compilers, databases, networks.
pursuing my master’s degree. My research field is information retrieval.

Please summarize your programming background (OSS projects, internships,
jobs, etc.).
GSoC is the first time that I get involved with a real and large OSS
project.
Besides, I’ve developed many things in C and C++.
I developed a simplified B-tree data structure in C during my intership for
a small company.
Later a system in C to extract and evaluate the content of webpages for my
research, and I am still on it.
An implementation of a search system in C++.
A webpage template system in C for my lab to generating webpages in order
to do some research on them.

Please tell us a little about your programming interests.
I like to write codes for the basic and general use. I am pursuing
well-framework of design and code, and high reusability when I think of
develop

the project.

Please tell us why you are interested in contributing to the Boost C++
Libraries.
Boost is widely used by programmers and famous for its powerful and
efficient. It is an honor to contribute to such a lib, and have my codes
used

and reviewed by so many programmers. I like the reusability of
well-designed lib, so it also will be a good chance for me to improve my
skill of

design and programming. And a good chance to communicate with good guys as
well. I want to be a good programmer with many good programmers as my

friends.

What is your interest in the project you are proposing?
Because I have participated in many competitions of algorithm, I have a
good knowledge of algorithms and data structures. I have also implement

many, but for the sake of winning, the codes usually can not be
well-reused. Meanwhile the STL in C++, which I’ve used many times during
the

competitions, such as set, map, priority_queue, sort are good examples of
well-reused algorithms and data structures. So I’ve been expecting to

develop some general libraries of algorithms and data structures. Boost is
a well-formed organization, and Boost lib is a widely-used lib, so I

also want to learn some skills to develop a good lib from developing the
project with mentors in Boost.

Have you done any previous work in this area before or on similar projects?
I am good at data structure and algorithm, and I wrote codes of various
algorithms and data structures for many times. I am very familiar with the

Trie data structure, and as I list above, I've develop a simplified general
B-tree data structure in C, which is a good experience when I develop

Trie for Boost.

What are your plans beyond this Summer of Code time frame for your proposed
work?.
I will dedicate my summer to my work. And taking the experience as a good
beginning, I will be feeling happy to contribute more to Boost, and other

OSS projects.

Please rate, from 0 to 5 (0 being no experience, 5 being expert), your
knowledge of the following languages, technologies, or tools:
  - C++ 3
  - C++ Standard Library 3
  - Boost C++ Libraries 2
  - Subversion 3

What software development environments are you most familiar with (Visual
Studio, Eclipse, KDevelop, etc.)?
I use Visual Studio in my lab, and vim + gcc for myself.

What software documentation tool are you most familiar with (Doxygen,
!DocBook, Quickbook, etc.)?
I’ve used Doxygen before, it is simple and efficient.

== Project Proposal ==
Please provide a description of your proposed work.
Trie(Radix Tree) is an efficient in memory data structure to deal with
string storing and searching. It can be used in many applications, and in

comparison to hashtables, it has better expandability and stability. It
will be an data structure of great importance and wide usage.

== Proposed Milestones and Schedule ==
Please provide estimated milestones and schedule for completing the
proposed work.
Before May 27:
To familiarize with the Boost lib, with reading the codes and discussing
about it in mailing list.
 To have a better knowledge of lib, with reading some books of STL and
Boost, and comparing Boost with STL.
To setup us a proper environment for development.
 Study deep into Trie, reading some others’ good implementations.
May 27 – June 17:
Keep in touch with the Boost organization and have good communications with
my mentor.
 Get involved with the development of boost.
Try some toys about Boost, and discuss them with mentor, in order to know
better about mentor and how the development goes on.
June 17 – July 15(Coding period begins):
Discuss the detail of the design with mento, develop an alpha version of
Trie, and have it tested in small scale.

July 15 – July 29(To mid-term evaluation):
Refine my alpha codes, analyze the performance, and try to improve it.
Write an initial report of the project and document of Trie.

July 29 – September 23:
Refine my codes, and have some thorough tests on it, make it robust.
Reorganize codes and improve documents.

Hardy Huang

Antony Polukhin

Hardy

Michael Marcin

Antony Polukhin

Michael Marcin

Antony Polukhin

Michael Marcin

Antony Polukhin

Michael Marcin

Antony Polukhin

Michael Marcin

Hardy

Antony Polukhin

Hardy

Michael Marcin

Hardy

Antony Polukhin

Hardy

Antony Polukhin

Hardy

Antony Polukhin

Hardy

Antony Polukhin

Hardy

Hardy

Arash Partow

Hardy

tags

participants (5)