Adding Search Functionality to boost.org
Hey everyone,
One of the issues with the current Boost website is the absence of search functionality. This forces users to rely on search engines and use queries like `site:boost.org asio cancellation_slot`. However, this approach has several problems. Firstly, it often displays outdated versions in the search results. Secondly, it lacks a proper hierarchical presentation to effectively guide users. Moreover, there is no way to filter the results for a specific library.
To address this problem quickly, we have decided to create a search index for the existing HTML documents on the Boost website. However, we encountered a primary challenge during this process. We needed to extract contents with the correct hierarchy for each library to ensure that the search results were displayed in a hierarchical manner. For instance, the results will be shown as `Asio › Reference › Deferred_values › Requirements`. This hierarchical presentation can significantly enhance the user experience.
The Boost libraries utilize various documentation formats such as QuickBook, AsciiDoc, Doxygen, and even multiple handwritten formats. This diversity posed a challenge in creating a generic crawler script. To overcome this obstacle, we initiated the Boost.Gecko project. This project involved the development of 16 custom crawlers, tailored to extract search records from the 151 Boost libraries.
We are currently utilizing the Algolia search platform to index search records, which offers a free plan for open-source projects. Algolia also provides a rich JavaScript library for building a search interface on the frontend. We have leveraged this library to design a customized user interface for the search box, enhancing the navigation of search results.
I kindly request you try out the new search functionality by visiting the demo page at: https://cppalliance.org/boost-gecko/ and share your thoughts, suggestions, or any issues you may encounter. Your input will greatly assist us in refining the search feature and ensuring it meets the needs of our users.
Please note that the drop-down for selecting a library is included to simulate the experience of being on a specific library page. Our plan is to incorporate a search button into the header of every library page, defaulted to search within that specific library.
Demo page: https://cppalliance.org/boost-gecko/
Repository: https://github.com/cppalliance/boost-gecko/
Respectfully Yours,
Mohammad Nejati
----
C++ Alliance Staff Engineer
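The hierarchy extraction described in the announcement can be illustrated with a small sketch. This is not the Boost.Gecko code: it assumes, purely for illustration, that a page's structure is carried by plain `h1` to `h6` headings, and the names `HeadingCrawler`, `crawl`, and `breadcrumb` are hypothetical. Real documentation formats would each need their own rules, which is presumably why several custom crawlers were written.

```python
from html.parser import HTMLParser


class HeadingCrawler(HTMLParser):
    """Collect headings and emit one record per heading with its full path.

    Assumption: hierarchy is expressed only through <h1>..<h6> tags.
    """

    def __init__(self, library):
        super().__init__()
        self.library = library
        self.stack = []       # (level, title) pairs forming the current path
        self.records = []     # one dict per heading found
        self._level = None    # level of the heading being read, if any
        self._text = []       # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is None or tag != f"h{self._level}":
            return
        title = "".join(self._text).strip()
        # Drop headings at the same or a deeper level, then push this one.
        while self.stack and self.stack[-1][0] >= self._level:
            self.stack.pop()
        self.stack.append((self._level, title))
        path = [self.library] + [t for _, t in self.stack]
        self.records.append({"library": self.library, "hierarchy": path})
        self._level = None


def crawl(library, html):
    """Return the hierarchical records found in one HTML document."""
    crawler = HeadingCrawler(library)
    crawler.feed(html)
    return crawler.records


def breadcrumb(record):
    """Render a record's path in the 'A › B › C' style used in the search UI."""
    return " › ".join(record["hierarchy"])
```

Each record keeps the whole path rather than only the heading text, so a frontend can render the breadcrumb and filter by the first path element (the library) without re-parsing the page.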
Hey this is pretty awesome, thanks! I tried it with several libraries using different formats, it worked pretty much flawlessly, and the interface is quick and slick.
I have a question, what are the semantics of surrounding a search query in quotes? E.g. if I search for "default constructor" I get pretty much what I expect, but searching for "mutable" also produces things like "MutableBufferSequence". It seems to me quotes are ignored if the query has only one word? Anyway, it may be better if using quotes finds exact matches only. Obviously this is a very minor thing.
Another question, how does it handle different Boost versions? E.g. what if a function is removed, it would be nice if it doesn't dig it up from some old version of the library.
Overall it looks great!
Hey this is pretty awesome, thanks! I tried it with several libraries using different formats, it worked pretty much flawlessly, and the interface is quick and slick.
Thank you for your kind words and for taking the time to try it out.
but searching for "mutable" also produces things like "MutableBufferSequence". It seems to me quotes are ignored if the query has only one word?
I believe it does look for an exact match, but "MutableBufferSequence" still appears in the results because it is prefixed with "mutable". Algolia has a configuration parameter related to this, but I have tried all possible options and haven't noticed any difference in the results: https://www.algolia.com/doc/api-reference/api-parameters/exactOnSingleWordQu....
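The difference between the two behaviours can be shown with a minimal sketch. This is not Algolia's matching logic; the function names and the tokenizer are assumptions made for illustration. A prefix search reports a hit when any word starts with the query, while a whole-word search requires a word to equal the query. Algolia also documents a `queryType` setting that controls prefix matching; whether it changes the result for this index was not checked here.

```python
import re


def tokens(text):
    """Lowercased alphanumeric words of a text (simplified tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def prefix_match(query, text):
    """True if any word in the text starts with the query."""
    q = query.lower()
    return any(word.startswith(q) for word in tokens(text))


def exact_match(query, text):
    """True only if some word in the text equals the query."""
    return query.lower() in tokens(text)
```

With these definitions, the query `mutable` is a prefix match for a text containing `MutableBufferSequence` but not an exact match, which is the behaviour a quoted single-word query would be expected to suppress.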
Another question, how does it handle different Boost versions?
It searches only in the exact version of the library being browsed.
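One way such version scoping could work is to read the version out of the documentation URL and restrict the records to it. The sketch below is an assumption about the mechanism, not a description of the deployed code: the `/doc/libs/<version>/` pattern is taken from the boost.org URL layout, while the function names and the `version` and `library` record fields are hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches paths such as /doc/libs/1_82_0/doc/html/boost_asio.html
_VERSION_RE = re.compile(r"^/doc/libs/(\d+_\d+_\d+)/")


def version_from_url(url):
    """Return the Boost version embedded in a documentation URL, or None."""
    match = _VERSION_RE.match(urlparse(url).path)
    return match.group(1) if match else None


def scope_records(records, version, library=None):
    """Keep records of one version, optionally narrowed to one library."""
    return [
        r for r in records
        if r["version"] == version
        and (library is None or r["library"] == library)
    ]
```

Passing `library=None` corresponds to searching the other libraries at the same version.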
For example, if you are at:
https://www.boost.org/doc/libs/1_82_0/doc/html/boost_asio.html, it
will search in Asio at version 1_82_0, and the same applies to the
"Other Libraries" tab as well.
On Fri, Jun 23, 2023 at 7:11 AM Emil Dotchevski via Boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Fri, 23 June 2023, 4:30 am Mohammad Nejati [ashtum] via Boost, < boost@lists.boost.org> wrote:
Hey everyone,
One of the issues with the current Boost website is the absence of search functionality... However, this approach has several problems. ... versions ... Secondly, it lacks a proper hierarchical presentation to effectively guide users. Moreover, there is no way to filter the results for a specific library.
These are addressed which is great.
This hierarchical presentation can significantly enhance the user experience.
It does. This is very nice.
The Boost libraries utilize various documentation formats such as QuickBook, AsciiDoc, Doxygen, and even multiple handwritten formats. This diversity posed a challenge in creating a generic crawler script. To overcome this obstacle, we initiated the Boost.Gecko project. This project involved the development of 16 custom crawlers, tailored to extract search records from the 151 Boost libraries.
We have leveraged this library to design a customized user interface for the search box, enhancing the navigation of search results.
I was interested in seeing how this would work for searching for features across libraries. This works (choose an arbitrary library and you get the "other libraries" results). However, these are neither ordered nor grouped by library; presumably there is some other ranking at work, which makes sense. It would be a nice enhancement to list "hits per library" and allow selection of a library from the result set to view just that library's hits.
Our plan is to incorporate a search button into the header of every library page, defaulted to search within that specific library.
This would still benefit from a better (condensed) "other libraries" handling - simply being able to navigate to another library and see the result of the same query in that library's context would be nice. A search for "asynchronous" produces 1.8k hits across multiple (hard to tell how many in current interface) libraries.
That isn't a criticism. It's a "this would be a nice feature" note.
Thanks
Darryl Green
I was interested in seeing how this would work for searching for features across libraries. This works (choose an arbitrary library and you get the "other libraries" results). However, these are neither ordered nor grouped by library; presumably there is some other ranking at work, which makes sense. It would be a nice enhancement to list "hits per library" and allow selection of a library from the result set to view just that library's hits.
Thank you for your valuable feedback. Indeed, the search results are ranked based on proximity and whether they match the hierarchy. However, grouping the results poses a challenge due to the presence of a 'Show More' button that retrieves additional results. These new results can belong to previous libraries, making it difficult to track them if we simply assign them to their respective groups. I have created an issue in the repository, as I believe there might be a better alternative to the current design.
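The interaction between grouping and "Show More" can be sketched as follows. This is only an illustration of the problem, with a hypothetical `GroupedResults` class and a hypothetical hit shape (`library`, `title`): each new page is merged into groups keyed by library, so later hits may land in a group that is already displayed further up.

```python
class GroupedResults:
    """Accumulate paged search hits into per-library groups."""

    def __init__(self):
        # library -> hits; dicts preserve the order of first appearance
        self.groups = {}

    def add_page(self, hits):
        """Merge one page of hits; return the libraries whose group grew."""
        touched = []
        for hit in hits:
            self.groups.setdefault(hit["library"], []).append(hit)
            if hit["library"] not in touched:
                touched.append(hit["library"])
        return touched

    def hits_per_library(self):
        """Counts for the pages loaded so far, not for the whole index."""
        return {lib: len(hits) for lib, hits in self.groups.items()}
```

The return value of `add_page` makes the difficulty visible: after a second page it can name a library near the top of the list, so a user who has scrolled down would not see where the new results went.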
On 6/22/23 11:29 AM, Mohammad Nejati [ashtum] via Boost wrote:
Hey everyone,
I kindly request you try out the new search functionality by visiting the demo page at: https://cppalliance.org/boost-gecko/ and share your thoughts, suggestions, or any issues you may encounter. Your input will greatly assist us in refining the search feature and ensuring it meets the needs of our users. Please note that the drop-down for selecting a library is included to simulate the experience of being on a specific library page. Our plan is to incorporate a search button into the header of every library page, defaulted to search within that specific library.
Respectfully Yours, Mohammad Nejati
Hmmmm - seems to me that you've made this a much bigger job than it would otherwise be.
Most of the Boost documentation is run through a tool chain which includes Boost Book. Boost Book is a DocBook XML derivative. At that point, the XML file contains all the documentation along with semantic tags indicating what the contents of the fields are. Generating an enhanced global table of contents can be done with an XSLT script which the current document tool chain already requires.
Some of the Boost libraries do not use Boost Book, so the above paragraph would not apply to those libraries. In some cases the library predates the implementation of Boost Book (e.g. Serialization, Iterators), and in other cases the library authors have elected not to use Boost Book (e.g. MP11).
To my mind, efforts would be better spent just converting all libraries to Boost Book. It's a more general approach and automatically includes functionality such as rendering documentation as PDF.
Robert Ramey
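The idea of deriving a table of contents from the XML can be sketched briefly. Note that this uses Python's `xml.etree.ElementTree` rather than the XSLT script suggested above, and it assumes plain DocBook-style nested `section` elements with a `title` child; real Boost Book files use additional elements that this does not handle. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def toc_entries(element, path=()):
    """Yield the title path of every nested <section> under an element."""
    for section in element.findall("section"):
        title = section.findtext("title", default="").strip()
        here = path + (title,)
        yield here
        # Recurse so subsections inherit the titles of their ancestors.
        yield from toc_entries(section, here)


def build_toc(xml_text):
    """Parse an XML document and return its section paths in document order."""
    root = ET.fromstring(xml_text)
    return list(toc_entries(root))
```

Each path returned is already hierarchical, so it could feed the same kind of breadcrumb display that the HTML crawlers produce.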
On Tue, Jun 27, 2023 at 10:34 AM Robert Ramey via Boost
To my mind, efforts would be better spent just converting all libraries to Boost Book.
Yep, I'll do that right after Peter does it. Note that I addressed this in my email to the list during C++Now. I quote:
I think these problems can be solved. But not by demanding that “everyone who maintains a Boost library must do X.” In Boost culture when you want something done you need to do it yourself, then convince the mailing list of the merits of your proposal.
https://cppalliance.org/boost/2023/05/08/Future-of-Boost.html
You are suggesting "all libraries should be converted to Boost Book." I say, thanks for volunteering to do that Robert :)
Thanks
On 6/27/23 10:39 AM, Vinnie Falco via Boost wrote:
On Tue, Jun 27, 2023 at 10:34 AM Robert Ramey via Boost wrote:
To my mind, efforts would be better spent just converting all libraries to Boost Book.
Yep, I'll do that right after Peter does it.
LOL
Note that I addressed this in my email to the list during C++Now. I quote:
I think these problems can be solved. But not by demanding that “everyone who maintains a Boost library must do X.” In Boost culture when you want something done you need to do it yourself, then convince the mailing list of the merits of your proposal.
Boost does have these sorts of requirements. E.g., a Boost library must have a test suite.
I think the effort required to build this searchable index is greater than that required to convert docs to Boost Book. Of course I don't really know that, so let's not start a debate about that.
What happens when the maintainer of the searchable index is not around any more? Then we're dependent on something that doesn't work like it did initially.
I'm aware that Boost Book XML is inconvenient to work with and not popular for this reason. Hence the current situation.
Perhaps I might suggest to the developer on this project that he might want to use the boost book xml if available. This would be a one-time effort which would address all these libraries at once. The other libraries would be handled on a one-by-one basis by the ad hoc approach being proposed.
Of course, he who is doing the work gets to decide - that's the Boost way.
https://cppalliance.org/boost/2023/05/08/Future-of-Boost.html
You are suggesting "all libraries should be converted to Boost Book." I say, thanks for volunteering to do that Robert :)
Hmmm - but now we've got someone who is tasked with doing "something" to make all the documentation searchable.
Thanks
On Tue, Jun 27, 2023 at 10:55 AM Robert Ramey via Boost
Hmmm - but now we've got someone who is tasked with doing "something" to make all the documentation searchable.
Right, but that's different from what I said. Mohammad is not creating any work for Boost authors and maintainers, nor is he requiring any changes to their repositories. This is a pure extension which delivers a feature without placing any burden on the community.
Perhaps I might suggest to the developer on this project that he might want to use the boost book xml if available.
Because Boost authors and maintainers are a fickle bunch (myself included) and because it is the owner of the library who knows best what and how their content should be indexed, our long term goal is a decentralized system. That is, we provide default behavior such as what is in the current implementation, and we also provide a way for the owner of the library repository to control how their own library's index is generated before it gets uploaded to the cloud. Mohammad's search experience is designed to give quick relief to an area that has been completely ignored; it is only the beginning of Boost's journey with search, not the end. Regards
participants (5)
- Darryl Green
- Emil Dotchevski
- Mohammad Nejati [ashtum]
- Robert Ramey
- Vinnie Falco