Boost.Python - implementing slicing on extension container class
I'm currently writing a set of container classes using Boost.Python to wrap
an existing C++ library. Progress has been very good - not even two days
and I've wrapped all the existing C++ methods (at least 100). But of course
a library designed for C++ is not a library designed for Python. Job 2 is
to make it more Pythonic.
The current problem is implementing slicing in the __getitem__, __setitem__
and __delitem__ methods. These are already working with single keys. To
handle slicing, I need to identify whether the key object is a slice object.
This is the sticking point right now.
I can access the type of the object using p.attr ("__class__") - but how do
I get access to the slice type object from __builtins__ in order to compare?
I could probably use a workaround such as extracting and checking a string
representation from p.attr ("__class__").str (), but it seems to me that
the ability to import symbols from another module (whether __builtins__ or
some other Python or extension module) would be more generally useful.
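(Purely for illustration - not something from these mails - a check along these lines might look roughly as follows. 'my_container' and 'getitem' are made-up names; note that the module behind __builtins__ is spelled __builtin__ in Python 2.x, and that types.SliceType from the types module is another way to get hold of the slice type object.)

    #include <boost/python.hpp>
    using namespace boost::python;

    struct my_container { /* stand-in for the wrapped container class */ };

    object getitem(my_container& c, object key)
    {
        // Quickest test: the C API can already recognise a slice object.
        bool is_slice = PySlice_Check(key.ptr()) != 0;

        // Closer to the question above: import a module, pull a symbol out
        // of it, and compare type objects by identity.
        object types_module(( handle<>(PyImport_ImportModule("types")) ));
        object slice_type = types_module.attr("SliceType");
        is_slice = is_slice || key.attr("__class__").ptr() == slice_type.ptr();

        if (is_slice)
        {
            object start = key.attr("start");   // each of these may be None
            object stop  = key.attr("stop");
            object step  = key.attr("step");
            // ... build and return a new container holding the selected range ...
            return object();                    // placeholder
        }

        int index = extract<int>(key);          // otherwise expect a plain index
        // ... look up and return the item stored at 'index' ...
        return object();                        // placeholder
    }

    // registered with something like:  .def("__getitem__", &getitem)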
This could reduce to the ability to convert a PyObject* to an object
instance - even more useful for using any C API functions that don't have a
Boost.Python wrapper - but there seems to be no obvious way to do this.
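(Again only as an illustration: Boost.Python's handle<> is the usual bridge from a raw PyObject* to an object instance, with borrowed() marking pointers whose reference the caller does not own.)

    #include <boost/python.hpp>
    using namespace boost::python;

    // For a C API call that returned a *new* reference:
    object from_new_reference(PyObject* p)
    {
        return object(handle<>(p));             // throws error_already_set if p is null
    }

    // For a C API call that returned a *borrowed* reference:
    object from_borrowed_reference(PyObject* p)
    {
        return object(handle<>(borrowed(p)));   // borrowed() tells handle<> to incref
    }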
In addition, my containers are currently declared as follows...

class_<..., boost::noncopyable> ("classname") ... ;

The boost::noncopyable is there because the containers don't have copy
constructors, due to certain issues using the library in C++. Therefore,
I'd like to better understand the implications of this.

For instance, in Python the copy constructor is often used to copy the
entire container (as opposed to just rebinding names). Presumably
Boost.Python supports this using the C++ class's copy constructor. This
raises the questions...

1. How can I create an object to return to Python, containing an instance
of the C++ container, without removing boost::noncopyable? (I need to do
this to implement slicing too)

This, I think, is roughly what is done in the Inheritance section of the
tutorial, so I probably need something like the 'factory' function. So, to
reduce it to a simple copy-function example, the following should be about
right...

Base* container::make_copy ()
{
    container *temp = new container ();
    temp->do_copy (*this);
    return temp;
}
...
.def("make_copy", &container::make_copy, return_value_policy<manage_new_object> ())

I think this should be right, but confidence is low.

2. How can I support the Python copy constructor without having a C++ copy
constructor?

I'm using boost-1.30.2 with Python 2.2.3 and Visual Studio 7.

Any advice would be appreciated. Thanks.
Steve,
First, Boost.Python questions should go to the C++-sig
(http://www.boost.org/more/mailing_lists.htm#cplussig).
Second, the current CVS has some excellent work on presenting C++
containers as Python containers by Joel de Guzman and Raoul Gough (see
vector_indexing_suite). I really suggest you try this stuff; it could
save you lots of code and will almost surely work better for you than
what you've done by hand because it handles some very subtle issues of
Python object lifetime and validity correctly.
Steve Horne writes:
I'm currently writing a set of container classes using Boost.Python to wrap an existing C++ library. Progress has been very good - not even two days and I've wrapped all the existing C++ methods (at least 100). But of course a library designed for C++ is not a library designed for Python. Job 2 is to make it more Pythonic.
The current problem is implementing slicing in the __getitem__, __setitem__ and __delitem__ methods. These are already working with single keys. To handle slicing, I need to identify whether the key object is a slice object. This is the sticking point right now.
I can access the type of the object using p.attr ("__class__") - but how do I get access to the slice type object from __builtins__ in order to compare?
I could probably use a workaround such as extracting and checking a string representation from p.attr ("__class__").str (), but it seems to me that the ability to import symbols from another module (whether __builtins__ or some other Python or extension module) would be more generally useful.
This could reduce to the ability to convert a PyObject* to an object instance - even more useful for using any C API functions that don't have a Boost.Python wrapper - but there seems to be no obvious way to do this.
In addition, my containers are currently declared as follows...
class_<..., boost::noncopyable> ("classname") ... ;
The boost::noncopyable is there because the containers don't have copy constructors, due to certain issues using the library in C++. Therefore, I'd like to better understand the implications of this.
For instance, in Python the copy constructor is often used to copy the entire container (as opposed to just rebinding names). Presumably Boost.Python supports this using the C++ class's copy constructor. This raises the questions...
1. How can I create an object to return to Python, containing an instance of the C++ container, without removing boost::noncopyable? (I need to do this to implement slicing too)
This, I think, is roughly what is done in the Inheritance section of the tutorial, so I probably need something like the 'factory' function. So, to reduce it to a simple copy-function example, the following should be about right...

Base* container::make_copy ()
{
    container *temp = new container ();
    temp->do_copy (*this);
    return temp;
}
...
.def("make_copy", &container::make_copy, return_value_policy<manage_new_object> ())

I think this should be right, but confidence is low.
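(A side note, offered only as a sketch rather than anything from the thread: the same manage_new_object policy is one way to let __getitem__ hand back a freshly built container when the key turns out to be a slice - the "I need to do this to implement slicing too" part of question 1. All names here are made up.)

    #include <boost/python.hpp>
    using namespace boost::python;

    struct container { /* stand-in for the real, noncopyable container */ };

    // A free function whose first argument is the wrapped class can be
    // registered as a method.  Returning a raw pointer under
    // manage_new_object hands ownership of the freshly built container
    // to Python, so nothing ever has to be copied by value.
    container* getslice(container& c, object key)
    {
        container* part = new container();
        // ... copy the items selected by 'key' (a slice) from c into *part ...
        return part;
    }

    // later, inside the class_<...> declaration:
    //     .def("__getitem__", &getslice, return_value_policy<manage_new_object>())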
2. How can I support the Python copy constructor without having a C++ copy constructor?
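(One possible answer to question 2, again purely as a sketch: Python's copy.copy() will call a __copy__ method if the object provides one, so the factory-style helper can simply be exposed under that name - no C++ copy constructor needed. The class below is a minimal stand-in, not the real container.)

    #include <boost/noncopyable.hpp>
    #include <boost/python.hpp>
    using namespace boost::python;

    struct container
    {
        container() {}

        // same idea as make_copy above: build the new container by hand
        container* make_copy() const
        {
            container* temp = new container();
            // temp->do_copy(*this);        // member-wise copying would go here
            return temp;
        }

    private:
        container(container const&);        // deliberately not copyable, as in the post
    };

    BOOST_PYTHON_MODULE(example)
    {
        class_<container, boost::noncopyable>("container")
            // copy.copy(c) looks for a __copy__ method first, so exposing the
            // factory under that name gives Python-side copying without ever
            // needing a C++ copy constructor.
            .def("__copy__", &container::make_copy,
                 return_value_policy<manage_new_object>())
            ;
    }

With that in place, copy.copy(c) on the Python side should hand back an independent container; copy.deepcopy() would want a __deepcopy__ method defined in the same style (it is passed an extra memo dictionary).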
I'm using boost-1.30.2 with Python 2.2.3 and Visual Studio 7.
Any advice would be appreciated. Thanks.
-- Dave Abrahams Boost Consulting www.boost-consulting.com
At 11:20 16/09/2003 -0400, you wrote:
Steve,
First, Boost.Python questions should go to the C++-sig (http://www.boost.org/more/mailing_lists.htm#cplussig).
OK - sorry for the confusion. I understand that Boost.Python users must be a rather specific subgroup of Boost users, but I had not realised that this was the wrong place to ask Boost.Python questions. I guess I don't need to ask why my emails keep losing their subject line as I'm in the wrong place and may as well unsubscribe.
Second, the current CVS has some excellent work on presenting C++ containers as Python containers by Joel de Guzman and Raoul Gough (see vector_indexing_suite). I really suggest you try this stuff; it could save you lots of code and will almost surely work better for you than what you've done by hand because it handles some very subtle issues of Python object lifetime and validity correctly.
Actually, while I could probably have picked up some useful tips and saved myself some time, I now have everything figured out that I need.

Also, as the containers I'm wrapping are *not* standard library containers but actually a replacement library I wrote, many of the things that would presumably get the biggest emphasis (e.g. the validity and lifetime issues) would simply be non-issues with my containers.

The main motivations in writing the C++ library were basically that the standard library containers are too fragile, lack useful functionality that cannot reasonably be retrofitted, and are probably (due to the red-black tree's assumption of a constant-time memory-access model, which is decades out of date and ignores the everyday facts of caching and virtual memory) unnecessarily slow.

While I wanted the C++ containers to be fast and scalable, the main reason for writing them was that I wanted more flexibility, better safety, and more convenience. The fact that the 'iterators' are not invalidated when the container is updated, and keep track of the item they were left pointing to, is probably the single most important feature. The goals of flexibility, safety and convenience seem well suited to a scripting language such as Python, so it seemed sensible to give them a try in that context.

Using my containers, object lifetime and validity are already managed even in C++, so I really don't need to worry about it in Python.
Steve Horne
At 11:20 16/09/2003 -0400, you wrote:
Steve,
First, Boost.Python questions should go to the C++-sig (http://www.boost.org/more/mailing_lists.htm#cplussig).
OK - sorry for the confusion. I understand that Boost.Python users must be a rather specific subgroup of Boost users, but I had not realised that this was the wrong place to ask Boost.Python questions.
I guess I don't need to ask why my emails keep losing their subject line as I'm in the wrong place and may as well unsubscribe.
Second, the current CVS has some excellent work on presenting C++ containers as Python containers by Joel de Guzman and Raoul Gough (see vector_indexing_suite). I really suggest you try this stuff; it could save you lots of code and will almost surely work better for you than what you've done by hand because it handles some very subtle issues of Python object lifetime and validity correctly.
Actually, while I could probably have picked up some useful tips and saved myself some time, I now have everything figured out that I need.
Also, as the containers I'm wrapping are *not* standard library containers but actually a replacement library I wrote, many of the things that would presumably get the biggest emphasis (e.g. the validity and lifetime issues) would simply be non-issues with my containers.
The main motivations in writing the C++ library were basically that the standard library containers are too fragile, lack useful functionality that cannot reasonably be retrofitted, and are probably (due to the red-black tree's assumption of a constant-time memory-access model, which is decades out of date and ignores the everyday facts of caching and virtual memory) unnecessarily slow.
Fascinating. Submitting these to Boost, perchance?
While I wanted the C++ containers to be fast and scalable, the main reason for writing them was that I wanted more flexibility, better safety, and more convenience. The fact that the 'iterators' are not invalidated when the container is updated, and keep track of the item they were left pointing to, is probably the single most important feature. The goals of flexibility, safety and convenience seem well suited to a scripting language such as Python, so it seemed sensible to give them a try in that context.
Using my containers, object lifetime and validity are already managed even in C++, so I really don't need to worry about it in Python.
I guess I'll have to trust you on that one. It's hard to see how you can accomplish your goals of optimizing speed and safety at the same time.
-- Dave Abrahams Boost Consulting www.boost-consulting.com
At 16:11 16/09/2003 -0400, you wrote:
Steve Horne writes:
The main motivations in writing the C++ library were basically that the standard library containers are too fragile, lack useful functionality that cannot reasonably be retrofitted, and are probably (due to the red-black tree's assumption of a constant-time memory-access model, which is decades out of date and ignores the everyday facts of caching and virtual memory) unnecessarily slow.
Fascinating. Submitting these to Boost, perchance?
That sounds like an uphill struggle to me. Choosing to opt out of the STL containers myself is one thing, but submitting an incompatible alternative to the standard library containers for widespread use is something very different.
Using my containers, object lifetime and validity are already managed even in C++, so I really don't need to worry about it in Python.
I guess I'll have to trust you on that one. It's hard to see how you can accomplish your goals of optimizing speed and safety at the same time.
Of course there are trade-offs. I gain speed in some circumstances by my choice of data structure. I lose speed due to the management of, and the need to maintain, auxiliary data within that structure, which is used to provide the extra features. And of course in some circumstances the standard containers will be faster even if that management and auxiliary data are disregarded.

There's also nothing revolutionary about the data structure I'm using - it is essentially a very old and heavily tried-and-tested data structure. It's just a variation of the age-old multiway trees, but stored in main memory instead of on disk. In an age where caching and virtual memory are an everyday fact of life on desktop PCs, I figure that main memory accesses have a lot in common with disk accesses - though of course this would not be true with most embedded systems.

By using a multiway tree rather than a binary tree, you lose the ability to restructure the tree with only pointer rotations (you need to physically move items on insertions and deletions) but you gain in cache and virtual memory friendliness. Reads and simple writes are fast, and on the assumption that most data gets read several times after being inserted once, that can be a good trade-off. And unlike an std::vector, the number of items that need to be shifted is strictly limited by the size of the node.

I've gone with essentially what I believe are B+ trees (I've found it difficult to get good definitions of the difference between B trees and B+ trees). That is, data is only held in 'leaf' nodes at the bottom layer of the tree. It's also convenient to have prev and next links in the leaf nodes for easy iteration, and for each leaf node to have a pointer to a singly-linked chain of iterators (meaning that maintenance operations only have to deal with the normally small number of iterators that refer into one or two leaf nodes). Add a count of the number of items in the subtree to each branch node and subscripted access (while certainly not std::vector-fast) is also reasonable. And because these nodes are quite large, the overheads are quite small.

The convenience and safety benefits are always there, but the speed benefits are much more dubious. It's very easy to find cases where the standard library containers will be much faster - if the items being held are large and complex so that shifting the items within a node is slow, for instance.

Anyway, my code has several problems. Not least, (1) it belongs to my employer, and (2) it wasn't written to fit in with existing C++ libraries. I'd love to think that the idea behind them might be adopted more widely, but I suspect it's more of a niche thing - and maybe that niche is just me.
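(For concreteness only - this is not the code being described, which belongs to Steve's employer - the layout sketched above might look roughly like the following, with arbitrary node sizes and int standing in for the item type.)

    #include <cstddef>

    struct iterator_record;                    // one per live iterator, chained from its leaf

    struct node_base { bool is_leaf; };

    struct leaf_node : node_base
    {
        enum { node_size = 16 };               // items per leaf, sized for cache friendliness
        int              items[node_size];     // data is held only in leaf nodes (B+ tree style)
        std::size_t      count;                // items currently stored in this leaf
        leaf_node       *prev, *next;          // linked leaves give cheap in-order iteration
        iterator_record *iterators;            // singly-linked chain of iterators into this
                                               // leaf, fixed up when items move or nodes split
    };

    struct branch_node : node_base
    {
        enum { fanout = 16 };
        node_base   *child[fanout];            // children are branches or, at the bottom, leaves
        std::size_t  subtree_items[fanout];    // per-subtree item counts make subscripted
                                               // access an O(log n) walk down the tree
    };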