Hierarchical data and translation trees
Our understanding of reality is often built from frames nested inside one another. A cat is a feline, which is a mammal, which is an animal, which is a living entity, and so on. When we say "Fritz the cat", we imply a great deal of information in that animal's name.
When cataloguing content we need many such frames, and we must make sure we correctly map what is included in what. But we also need frames to track the process by which a multimedia_text came to be assigned to a given profile.
It is important for us to know whether someone created a piece of content as an original or as a translation. If it is a translation, we must know what it was translated from. This serves two purposes:
So our content gets assigned to a profile by means of a so-called translation tree. This tree records how multimedia_text objects were produced and lets us see immediately that translation 1-4 is quite likely to contain significant semantic drift (remember that we already met this concept with multimedia_text).
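To make the drift argument concrete, here is a minimal sketch assuming each translation records only the text it was translated from. The ids and the dictionary are hypothetical, not ambaradan's schema, and plain parent links are used here for clarity only. Walking the links back to the original gives the number of translation steps, a rough proxy for how much drift to expect.

```python
# Hypothetical translation tree: each key is a multimedia_text id and the value
# is the id of the text it was translated from (None marks the original).
# The ids are invented for illustration; they are not ambaradan's.
translated_from = {
    "original": None,
    "t1": "original",
    "t2": "t1",
    "t3": "t2",
    "t4": "t3",
    "t5": "original",
}


def translation_chain(text_id):
    """Return the path from the original down to text_id."""
    chain = []
    current = text_id
    while current is not None:
        chain.append(current)
        current = translated_from[current]
    return list(reversed(chain))


def hops_from_original(text_id):
    """Number of translation steps separating text_id from the original."""
    return len(translation_chain(text_id)) - 1


for text_id in ("t5", "t4"):
    print(text_id, translation_chain(text_id), hops_from_original(text_id))
```

Here "t5" is one step from the original, while "t4" went through three intermediate translations, so it is the one to review for drift.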
This tree is not an object but simply a structure that proper objects can contain. So although nothing of it appears in the database definition of the profile table, ambaradan can track the whole process that added content to a profile and, most importantly, the translation processes that happen within the system.
Yet, wait! Aren't we making dictionary-like entries? So what is this content we are talking about: the lemma itself or its definition? The answer lies in the kind of tree we use. There are actually many kinds, and ambaradan uses them to map all possible kinds of taxonomic relations with a single dedicated set of routines that manages them all. There are trees to order
They all work in the same way, but before explaining their general structure it is probably better to give an example using translation trees. A tree node knows:
Tree nodes do not need to refer directly to a multimedia_text object, for the simple reason that they can refer to the object they include instead. This allows trees to order literally anything in the system, not just language-dependent content. So you can easily adapt the following example to understand how the other tree types work.
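The exact node fields are not listed here, so the following is only a sketch under assumptions: a node is taken to store its tree type, a reference to whatever object it orders, and the left and right frame values described further on. All field names are hypothetical. The point is that one containment routine serves every tree type, whatever the referenced object is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeNode:
    # Hypothetical fields, for illustration only; not ambaradan's definition.
    node_id: int
    tree_type: str  # e.g. "translation"; the set of tree types is assumed
    object_id: int  # id of the referenced object, of whatever kind
    lft: int        # left frame value
    rgt: int        # right frame value


def contains(outer: TreeNode, inner: TreeNode) -> bool:
    """True if inner lies strictly inside outer's frame, in the same tree type."""
    return (
        outer.tree_type == inner.tree_type
        and outer.lft < inner.lft
        and inner.rgt < outer.rgt
    )


a = TreeNode(1, "translation", 100, 0, 7)
b = TreeNode(2, "translation", 101, 1, 6)
c = TreeNode(3, "category", 200, 2, 3)  # hypothetical second tree type
print(contains(a, b), contains(b, a), contains(a, c))
```

Frames from different tree types never contain each other, so many trees can share one table and one set of routines.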
You will probably have noticed that the first three elements do not tell us anything about the hierarchical position of a node in the tree. They do not say that translation 1-4 came from translation 1-3. Let's see why.
One of the most difficult challenges for the coder is to find a way to efficiently represent hierarchical data in a relational database. It may seem weird, but there is no immediate way to retrieve a taxonomic tree from relational tables by a single efficient query. So we all resort to tricks.
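To see the problem, consider the most common alternative, the adjacency list, where each row stores only its parent's id. The sketch below uses Python's built-in sqlite3 module with a hypothetical table, loaded with the cat example from above. Rebuilding the tree this way issues one query per node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency-list table: each row knows only its parent.
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO node (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "living entity", None), (2, "animal", 1), (3, "mammal", 2),
     (4, "feline", 3), (5, "cat", 4)],
)

query_count = 0


def children(parent_id):
    """Fetch the direct children of one node: one round trip per call."""
    global query_count
    query_count += 1
    return conn.execute(
        "SELECT id, name FROM node WHERE parent_id = ? ORDER BY id", (parent_id,)
    ).fetchall()


def walk(node_id, name, depth, out):
    """Depth-first rebuild of the tree, recording (name, depth) pairs."""
    out.append((name, depth))
    for child_id, child_name in children(node_id):
        walk(child_id, child_name, depth + 1, out)


result = []
walk(1, "living entity", 0, result)
print(result)
print("queries issued:", query_count)
```

With five nodes this costs five queries; the cost grows with the size of the tree.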
All tree elements have a left and a right value. They work as frames, so we immediately see that element 0-7 includes all the others, and element 1-6 is included in 0-7 and includes both 2-3 and 4-5. This makes it trivial to write queries that retrieve a full taxonomic mapping in a single shot and compute an element's depth on the fly. It also makes it trivial to move parts of a tree around. This is the basic structure that allows merging and splitting without much fuss and without any risk of losing bits and pieces in the process.
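This left/right scheme is known as the nested set model. The frames from the example (0-7, 1-6, 2-3, 4-5) can be loaded into a table and queried as below. This is a minimal sketch using Python's sqlite3; the node names A to D and the table layout are hypothetical. The depth query counts how many frames enclose each node, so the whole tree and every depth come back from one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# lft/rgt rather than left/right, since LEFT and RIGHT are SQL keywords.
conn.execute("CREATE TABLE tree (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
# Frames from the example: 0-7 contains 1-6, which contains 2-3 and 4-5.
conn.executemany(
    "INSERT INTO tree (name, lft, rgt) VALUES (?, ?, ?)",
    [("A", 0, 7), ("B", 1, 6), ("C", 2, 3), ("D", 4, 5)],
)

# Whole tree with depth in one query: a node's depth is the number of
# frames that enclose it, minus its own frame.
FULL_TREE = """
SELECT node.name, COUNT(parent.name) - 1 AS depth
FROM tree AS node, tree AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name, node.lft
ORDER BY node.lft
"""

# Every node inside a given frame, the frame's own node included.
SUBTREE = """
SELECT node.name
FROM tree AS node, tree AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
  AND parent.name = ?
ORDER BY node.lft
"""

full = conn.execute(FULL_TREE).fetchall()
sub = [row[0] for row in conn.execute(SUBTREE, ("B",))]
print(full)  # [('A', 0), ('B', 1), ('C', 2), ('D', 2)]
print(sub)   # ['B', 'C', 'D']
```

Ordering by lft returns nodes in depth-first order, so the result can be rendered as an indented tree without further processing.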
Such an ingenious solution is, of course, no invention of the ambaradan team: credit for a very clear explanation of this method, along with basic code snippets that helped us build what we needed, goes to Mike Hillyer.
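The claim that rearranging a tree is straightforward can be illustrated with the simplest case, adding a node. The sketch below again uses sqlite3 and the hypothetical A to D frames, and inserts a new last child under a given parent: every frame ending at or after the insertion point is widened or shifted by 2 to make room. The helper name add_last_child is hypothetical. Moving or deleting a subtree follows roughly the same pattern of shifting lft/rgt ranges, with more bookkeeping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO tree (name, lft, rgt) VALUES (?, ?, ?)",
    [("A", 0, 7), ("B", 1, 6), ("C", 2, 3), ("D", 4, 5)],
)


def add_last_child(conn, parent, child):
    """Hypothetical helper: insert child as the last child of parent."""
    with conn:  # commits the shifts and the insert together, or rolls back on error
        (prgt,) = conn.execute(
            "SELECT rgt FROM tree WHERE name = ?", (parent,)
        ).fetchone()
        # Every frame that ends at or after the parent's right edge grows by 2...
        conn.execute("UPDATE tree SET rgt = rgt + 2 WHERE rgt >= ?", (prgt,))
        # ...and every frame that starts after that edge shifts right by 2.
        conn.execute("UPDATE tree SET lft = lft + 2 WHERE lft > ?", (prgt,))
        # The new node fills the gap just inside the parent's old right edge.
        conn.execute(
            "INSERT INTO tree (name, lft, rgt) VALUES (?, ?, ?)",
            (child, prgt, prgt + 1),
        )


add_last_child(conn, "B", "E")
rows = conn.execute("SELECT name, lft, rgt FROM tree ORDER BY lft").fetchall()
print(rows)  # [('A', 0, 9), ('B', 1, 8), ('C', 2, 3), ('D', 4, 5), ('E', 6, 7)]
```

No existing frame is broken: B now spans 1-8 and contains C, D and the new E, and A still contains everything.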
Now that we have explained the basic technique we use to map and move hierarchical data in a relational database, let's move on to how "dictionary entries" are built.