Special classes and extended vs normal classification

Some classes are heavily used by the system for its own internal jobs. There was no point in developing two separate classification systems, one for humans and one for the engine, because they do essentially the same thing; only the meaning of the assignment differs.

When the system assigns a given piece of linguistic content to the "English" class object, it means that the content is expressed in English. Yet a native Hindi speaker may classify (in Hindi) the profile "Saxon genitive" with the very same "English" class object, because it is related to the English language, and may do so while remaining totally immersed in the Hindi linguistic phase.

Both uses are correct, and it is important that the system can tell the two apart. In this case we could rely on the fact that only a multimedia_text object is classified as linguistically dependent, while whatever is said of a profile is a "normal human classification". There are more subtle logical traps ahead, however.

All objects in the OWm2 storage engine bear license information. Most of them simply say they are not subject to copyright, yet potentially all could be. So what happens when you assign a given profile to the class object "CC-BY"? Are you saying that this profile is related to this particular license, or are you stating the profile's own licensing information?

You cannot tell unless you state clearly which classification was made for which purpose. So any time the system records the assignment of an object to a given class object, it also states whether it is doing so for its own internal purposes or as a result of human interaction. In the API, classification for internal purposes is called "extended" and human classification is called "normal".
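
The "CC-BY" ambiguity above can be made concrete with a minimal sketch. This is not the OWm2 API: the names Kind, Assignment and classes_of, as well as the identifiers "profile:42" and "no-copyright", are hypothetical and only illustrate the idea of storing the purpose alongside each assignment.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NORMAL = "normal"      # assignment made by a human user
    EXTENDED = "extended"  # assignment made by the system for internal purposes


@dataclass(frozen=True)
class Assignment:
    """One recorded assignment of an object to a class object (hypothetical shape)."""
    object_id: str
    class_id: str
    kind: Kind


def classes_of(assignments, object_id, kind):
    """Return the class ids assigned to object_id for the given purpose."""
    return [a.class_id for a in assignments
            if a.object_id == object_id and a.kind is kind]


assignments = [
    # A human states that the profile is related to the CC-BY license.
    Assignment("profile:42", "CC-BY", Kind.NORMAL),
    # The system states the licensing information of the profile itself.
    Assignment("profile:42", "no-copyright", Kind.EXTENDED),
]
```

With the purpose recorded, asking for the extended classes of "profile:42" yields only the profile's own license class, while asking for the normal ones yields what users said the profile is about.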

For all practical purposes we maintain two parallel classificatory layers in the system. One is service-oriented and classifies objects by language, script, licensing information, file format and source. The other is the free associative machine by means of which a human user may state that "this is related to that". Once again, both layers share exactly the same software.

Special class objects are created for the system's sake only, as they include the data the system needs to perform special internal operations. They are made by wrapping a normal class object in a larger container that is used essentially only by the system. Users still see them and use them as ordinary class objects.
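
As a rough illustration of the wrapping idea, and not of the actual OWm2 data model, a special class could be sketched as a container that holds an ordinary class object plus system-only data. The names ClassObject, SpecialClass and system_data are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ClassObject:
    """An ordinary class object as a user sees it (hypothetical shape)."""
    label: str


@dataclass
class SpecialClass:
    """A container wrapping a normal class object with system-only data."""
    inner: ClassObject
    system_data: dict = field(default_factory=dict)  # read only by the system

    @property
    def label(self):
        # Users see the same label as on the wrapped ordinary class object.
        return self.inner.label


english = SpecialClass(ClassObject("English"), {"allowed_scripts": ["Latin"]})
```

Here english.label behaves exactly like the label of a plain class object, while system_data carries the extra information that only internal operations consult.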

In particular, two special classes map the "legal input" for all linguistic content in the system. They do so by building a table that says which language/script couplings are allowed. There is indeed no point in storing English content in Cyrillic transliteration, but Serbian, for example, must allow both Serbian/Latin and Serbian/Cyrillic. Japanese has up to four possible variants, and they must be kept well ordered and distinguishable from each other.
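
A table of allowed couplings can be sketched as a simple mapping. The structure and the names ALLOWED and is_legal are assumptions for illustration, not the engine's real representation; only the English and Serbian rows come from the text above, and the Japanese variants are left out because they are not listed here.

```python
# Hypothetical "legal input" table: language -> ordered list of allowed scripts.
ALLOWED = {
    "English": ["Latin"],
    "Serbian": ["Latin", "Cyrillic"],
}


def is_legal(language, script):
    """Return True if content in this language may be stored in this script."""
    return script in ALLOWED.get(language, [])
```

With such a table, is_legal("Serbian", "Cyrillic") is accepted while is_legal("English", "Cyrillic") is rejected, and any language missing from the table is rejected as well.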

When reading the API documentation you'll never see expressions like "Language" or "Script", though. This is because the system is ready to accept non-human languages, too. We are not speaking about Martian invaders, obviously, but rather about software.

There are a number of protocols that allow representing mathematical expressions, molecular 3D rendering etc. Calling these "languages" or "scripts" would have been improper at best, so we decided to use more generic labels. This is the reason why in the API you will meet

  1. Communicative systems (which include human languages and rendering software)
  2. Mediums (which are scripts and protocol versions for software)
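
To show why the generic labels are convenient, here is a small sketch in which one pair of types covers both a human language and a piece of rendering software. The class names mirror the two labels above, but their fields, the "hypothetical-math-renderer" entry and its version names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Medium:
    """A script for a human language, or a protocol version for software."""
    name: str


@dataclass(frozen=True)
class CommunicativeSystem:
    """A human language or a rendering software (hypothetical shape)."""
    name: str
    mediums: tuple


def medium_names(system):
    """List the names of the mediums allowed for a communicative system."""
    return [m.name for m in system.mediums]


serbian = CommunicativeSystem("Serbian", (Medium("Latin"), Medium("Cyrillic")))
# Invented software entry: its mediums are protocol versions, not scripts.
renderer = CommunicativeSystem("hypothetical-math-renderer", (Medium("v1"), Medium("v2")))
```

The same function works on both entries, which is the point of not hard-coding the words "language" and "script" into the model.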

So now the circle is closed, and we have finally come to see how a multimedia_text object knows in which language it is expressed.