Re: [Boost-Users] Re: Association Library?

6 May 2002

      ...
Dear Craig,
--- In Boost-Users@y... #757, Craig Hicks <hicks@k...> wrote:
...
A graph can represent associations between two nodes by means of the
edge (or edges) which connect those two nodes.  The nodes are,
needless to say, distinct  objects.  It's up to the programmer to
design a node data type which can capture the necessary information
to distinguish between nodes.  Obviously what I'm saying is obvious,
so I think it's my turn to say "I'm missing your point", and ask for
clarification.
...
I don't know who's missing what, but I could have been more clear
before and will attempt to be so now. I'll also provide a somewhat
realistic example, which is what you originally asked for (in message
#594).
Consider an application used by a school for keeping track of students,
courses, teachers, rooms, books, meals, buses and other objects that
are part of the school.
The application uses a Student class:
class Student {
        int id;
        string name;
        string address;
        Date birthdate;
        ...
    };
The Student class has a lot of data members, and there are a lot of
(member and free) functions for handling students. Indeed, there is
an extensive graphical user interface that allows the user to add,
modify, delete and find students maintained by the application.
The application also uses a Course class:
class Course {
        int number;
        string name;
        string description;
        string instructor;
        string room;
        string meeting_times;
        ...
    };
Like the Student class, the Course class has a lot of data members,
and there are a lot of (member and free) functions for handling
course. And there is an extensive graphical user interface that
allows the user to add, modify, delete and find courses maintained
by the application.
What I want you to imagine is that there are two independent and
well-developed classes that do not yet have an association between
them. One might even imagine -- as is actually realistic -- that
the two classes are parts of different applications written by
different programmers.
Of course, requirements change and software evolves. At some point,
there is a requirement to associate students and courses: to keep
track of which students are enrolled in which classes. Obviously,
this is a many-to-many association.
As I understand it, in order to use the BGL -- indeed to use any graph
library with which I am familiar -- one would have to derive both the
Student and Course classes from some common base class:
class SchoolObject { ... };
class Student : public SchoolObject { ... };
class Course : public SchoolObject { ... };
Then one could create a graph for which each node is a SchoolObject.
I was thinking more along the following lines:

// reusable code for RTTI based multiclass association

class NodeBase { /* common node interface */ };

template <class T>
class NodeDerived : public NodeBase;
{
    T *pT;
};

// application code for multiclass association
typedef NodeDerived<Student> NodeStudent;
typedef NodeDerived<Course> NodeCourse;

As you can see, I am thinking of a design where all the code
that can gets factored into a seperate module for
class-generic RTTI-based multiclass association.

First and foremost, because this is RTTI, there is no compile time type
checking of the derived types.  But there is run-time checking,
which can also pushed into the reusable class-generic RTTI-based code.
...
While this approach might work, it is unnatural. A Student is not a
Course, and there really is no such concept as a "SchoolObject" in the
application domain. Indeed, such an approach would allow links
(graph edges) between students and students, or between courses and
courses -- neither of which makes sense. Thus there would need to be
source code to explicitly forbid such illogical links.
There would need to be such source code, but the hard work could be pushed
into the reusable class-generic RTTI-based code.
...
There are two main problems with using the BGL (or any general-purpose
graph library with which I am familiar).
First, a mathematically idealized graph generally does not distinguish
between nodes of different types: Any node can be linked to any other
node via an edge. The same can be said of computer implementations of
graphs. In brief, a graph is typically an association between a type
(class) and itself.
Yes, mathematically, one can consider bipartite graphs in which nodes
can be classified into different classes. But the graph libraries I
have seen do not implement this idea as a graph involving two
different (C++) classes of nodes.
Second, the graph libraries I am familiar with emphasize abstract nodes
linked by edges. The nodes and edges may each have data associated with
them (the BGL does this using property maps), but the data associated
with nodes and edges is considered secondary. In effect, the idea of a
graph makes the association primary and the participating class(es) of
nodes (objects) secondary.
But in typical applications -- as I try to indicate in my example --
the emphasis is the object. The classes of objects (students and
courses) are primary and are predetermined. One does not have the
flexibility to recode these classes to conform to, say, the BGL, the
Graph Template Library (GTL) or some other graph library. The
association between the classes is a relatively minor detail, and
associations can be added, modified or removed from an application
perhaps more frequently than the classes of objects themselves.
I am looking for a library that allows me to take the preexisting
Student and Course classes and to add an Enrollment association between
the two with but minimal modification of existing source code.
...
Are you saying it would be useful to have a bipartite graph (or
N-partite graph) with each part holding separate types?  Is that how
Tilevich's association library worked?  I can't recall.
Yes, as I note above, I am saying it would be useful to have a
bipartite graph, i.e., a graph with nodes in two different classes.
Yes, this is how Tilevich does it in the Sept. 2001 C/C++ Users Journal
article. He gives an example involving employees and projects, which is
conceptually the same as my example involving students and courses:
ManyMany<Employee*, Project*> Projects;
No question, Tilevich's code is superior in the case of two fixed classes,
for those applications whose requirements are completely satisfied
by said code.
...
...
I dare say the BGL (or any C++ based graph) could handle nodes with
different data types via inheritance (all nodes contain a pointer to
base data class).
Yes, I think it could. But as I note above, it is unnatural to derive
fundamentally different classes from some artificial base class.
Moreover, in so doing, you allow for unnatural edges linking, say, a
student to a student rather than to a course.
...
Regardless, the issue of classes is one of syntax.  My statement was
meant semantically, without regard to syntax.
1. Do you agree with the statement "Any implementation of an
many-many  assoication container must have an underlying graph
lurking within (semantically speaking)"?
Yes, I agree, if one allows that the underlying graph may be a
bipartite graph that has nodes in two different classes.
...
2. Do you agree with the statement "The BGL is capable of being the
underlying mechanism for implementing a many-many class (e.g., one
with the I/F defined by  Tilevich's interface) ?
I'm fairly certain about 1, and think 2 is correct, but am not
certain  especially since I don't have Tilevich's class at hand
right now.
Yes, it is capable of being the underlying mechanism.
But it is also true that it is possible to write an object-oriented
program with inheritance, polymorphism and virtual functions in
plain C. It is possible, but it is not very natural or convenient.
The point of the STL -- and I believe of Boost extensions of the STL
-- is to make common but involved tasks *easy* by providing templates
that are efficient and easy to use.
I will second that "*easy*" statement !!!
Providing a "configuration", an interface that specializes
a container to make it usable for a particular purpose counts as making 
it easy.
That is definitely a requirement if BGL were to be used to
to implement an association class.
...
...
The 3rd question is, if it can be implemented with BGL, is it worth
it?  You'd be able to reuse the exchangeability of data structures.
You'd have a host of algorithms available.  Anything new you
developed might be reused for graphs in general. You'd have more
time to concentrate on the I/F.
This question gets at my point about graphs emphasizing the
association, i.e., the edges, whereas most C++ application classes
that participate in associations are emphasized more than than the
associations between them.
To put it a different way, much of the functionality that a graph
library provides -- beyond being a mechanism for implementing the
association -- is not desired.
If the user-programmer can't see it (hidden by interface), does it exist?
...
Would one do a depth-first or breadth-first search of the graph
students and courses? What would that mean and what use would it be?
Would one want to find connected components, i.e., clusters of students
and courses that form a kind of school-within-a-school? One could do
that, but it would not be high on the list of requirements for a school
management application.
What is desired is being able to access "neighbors": all the courses a
particular student is enrolled in or all the students enrolled in a
particular course. But oned oes *not* need an entire graph library to
achieve this low-level functionality. One just needs an association
library.
And that is what I seek: An efficient, flexible, easy-to-use
association library that allows me to add, modify and delete different
kinds of associations (1-to-1, 1-to-many, many-to-many, unidirectional,
bidirectional, etc.) between classes A, B, C, etc., in which some
associations are between a class and itself (hence a general graph)
but other associations are between distinct classes.
Mostly I want to agree with you as far as the following is true:
You have an application all of whose requirements, (present and envisioned
future), are satisfied by Telivich's library.

Otherwise, you might want to think about the cost/benefits of an
RTTI version (with templates) versus a template only version.
...
Craig, thank you for your good questions and for giving me the
opportunity to think out loud.
Thanks,
Rob Zako
P.S.  So far we have been discussing only *binary* associations. If we
were to discuss ternary and higher-order associations, which link 3 or
more objects together, then the idea of an association as a graph would
break down.
Good point.

Craig Hicks