We humans are, in general, very sloppy in our understanding of the universe. We can hold contradictory information in our heads with little trouble, and we are perfectly fine with ambiguity and missing data. Computers, on the other hand, usually find the human way of storing data completely useless. For example, a resource like Wikipedia is excellent for humans researching a topic, but if you wanted a computer to do anything meaningful with it, you’d need to use its semantic cousin, Wikidata. Warehouse-filling artificial intelligences are only just starting to comprehend documents that the human mind grasps instantly.
One popular system for storing “semantic” information is RDF (the Resource Description Framework), which is essentially a way of telling a computer how two nouns are related. The data is stored as a “triple”: two nouns and the relationship between them (in RDF terms, a subject, a predicate, and an object). A typical triple looks something like (“Bob”, “has wife”, “Kelly”). It’s a very useful and powerful way to express concepts, and it has strong backing from the W3C as a way to share data across the World Wide Web.
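For a concrete picture of what a triple is, here is a toy in-memory triple store in Python. It only illustrates the shape of the data: real RDF identifies subjects and predicates with IRIs, and libraries such as rdflib provide actual stores and query support. The `TripleStore` class and its `add`/`match` methods are made up for this example.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy store: every fact is a (subject, predicate, object) tuple.
Triple = Tuple[str, str, str]


class TripleStore:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t
            for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("Bob", "has wife", "Kelly")
print(store.match(subject="Bob"))  # [('Bob', 'has wife', 'Kelly')]
```

Note that the tuple itself carries nothing but the two nouns and the relationship.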
The problem I see with this format is that it is only meaningful in the present tense, yet many relationships change. For example, right now the President of the United States is Barack Obama, but in a couple of years that statement will be wrong. As far as I can tell, there is no good way to express in RDF that a relationship held only for a certain period.
RDF also has no way of annotating a relationship. For example, if I want to say that Bob is married to Kelly, it might be helpful to include a link to the newspaper article announcing the wedding. Anyone who wanted to contest information about that relationship could then look at the sources, and maybe add some of their own.
I’ve looked for good alternatives to RDF that take time and sources into account, but so far I haven’t found anyone working on the same problem. Here are the essentials of what I think I would want:
- There are four main entities in the system: things, events, attributes, and relationships.
- Things are essentially just a unique ID that has multiple attributes.
- Events can create things and/or set, modify, or delete attributes on them.
- Attributes have a type and a value. Each type defines which values are valid. Values may be hierarchical. Attributes only store data that cannot be conclusively determined from other attributes.
- Relationships are mutual attributes. They have a type and link two or more things, but don’t belong to any of them.
- Every event must have a time, though that time does not need to be precise, and it can be relative to another event.
- Any event can have annotations.
- An event can be caused by a thing, but that is not required.
- Each database is uniquely namespaced.
- All data is normalized. The system will refuse to store contradictory data.
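The list above is already close to a data model. Here is a minimal Python sketch of one way it could look, under simplifying assumptions that are not part of the proposal: times are plain integers (no imprecise or relative times), the namespace is just a string prefix, a value of `None` means “delete this attribute”, and “contradictory” is narrowed to three cases: invalid values, references to things that don’t exist, and two events at the same time disagreeing about the same attribute. All class and method names (`Database`, `record`, `qualify`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class AttributeType:
    name: str
    is_valid: Callable[[object], bool]  # each type decides which values are valid


@dataclass(frozen=True)
class Relationship:
    rel_type: str
    members: FrozenSet[str]  # two or more thing IDs; owned by none of them


@dataclass
class Event:
    time: int  # simplified: a precise integer time
    creates: List[str] = field(default_factory=list)  # IDs of new things
    # (thing ID, attribute type) -> value; None means delete the attribute
    sets: Dict[Tuple[str, str], object] = field(default_factory=dict)
    relates: List[Relationship] = field(default_factory=list)
    cause: Optional[str] = None  # the thing that caused the event, if any
    annotations: List[str] = field(default_factory=list)  # e.g. source links


class Database:
    def __init__(self, namespace: str, attr_types: Dict[str, AttributeType]) -> None:
        self.namespace = namespace
        self.attr_types = attr_types
        self.things: Set[str] = set()
        self.events: List[Event] = []

    def qualify(self, thing_id: str) -> str:
        """Globally unique name for a thing in this database."""
        return f"{self.namespace}#{thing_id}"

    def record(self, event: Event) -> None:
        """Store an event, or raise ValueError if it contradicts stored data."""
        for tid in event.creates:
            if tid in self.things:
                raise ValueError(f"{tid} already exists")
        known = self.things | set(event.creates)
        if event.cause is not None and event.cause not in known:
            raise ValueError(f"unknown cause {event.cause}")
        for (tid, atype), value in event.sets.items():
            if tid not in known:
                raise ValueError(f"unknown thing {tid}")
            if atype not in self.attr_types or (
                value is not None and not self.attr_types[atype].is_valid(value)
            ):
                raise ValueError(f"invalid {atype} value {value!r}")
            for other in self.events:
                # Two events at the same instant may not disagree about a value.
                if other.time == event.time and other.sets.get((tid, atype), value) != value:
                    raise ValueError(f"contradicts an event at time {event.time}")
        for rel in event.relates:
            if len(rel.members) < 2 or not rel.members <= known:
                raise ValueError(f"bad relationship {rel}")
        self.things = known
        self.events.append(event)


types = {"name": AttributeType("name", lambda v: isinstance(v, str))}
db = Database("example.org", types)
db.record(Event(time=1, creates=["bob", "kelly"],
                sets={("bob", "name"): "Bob", ("kelly", "name"): "Kelly"}))
db.record(Event(time=2,
                relates=[Relationship("married", frozenset({"bob", "kelly"}))],
                annotations=["link to the wedding announcement"]))
```

Recording a second event at time 1 that sets Bob’s name to “Robert” would raise `ValueError`, which is the “refuse to store contradictory data” rule in its narrowest form.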
What you should end up with is historical data that a computer can understand and answer questions about. It would probably not be in the least bit performant, but that is a secondary concern.
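One way a computer could answer questions about such history is to replay the events up to the moment being asked about. The sketch below is a hypothetical and heavily simplified Python version: times are whole years, each event changes a single attribute, and an imprecise time is represented as an earliest/latest range (one possible reading of “that time does not need to be specific”). When a range straddles the query time, `value_at` answers `UNKNOWN` instead of guessing. The function and field names, and the Bob and Kelly wedding dates, are illustrative assumptions.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    earliest: int  # the event happened no earlier than this year...
    latest: int    # ...and no later than this one (equal bounds = precise)
    thing: str
    attr: str
    value: Optional[str]  # None means the attribute was deleted


UNKNOWN = object()  # returned when imprecise event times leave the answer open


def value_at(events: List[Event], thing: str, attr: str, t: int):
    """Replay the events for one attribute and return its value at time t.

    Assumes events touching the same attribute have non-overlapping time
    ranges, so sorting by `latest` gives their true order.
    """
    relevant = sorted(
        (e for e in events if e.thing == thing and e.attr == attr),
        key=lambda e: e.latest,
    )
    value = None  # no recorded value yet
    for e in relevant:
        if e.latest <= t:
            value = e.value  # this change had definitely happened by t
        elif e.earliest <= t:
            return UNKNOWN  # this change may or may not have happened by t
    return value


history = [
    Event(2001, 2001, "united_states", "president", "George W. Bush"),
    Event(2009, 2009, "united_states", "president", "Barack Obama"),
    # Hypothetical: we only know Bob married Kelly sometime in 2010-2012.
    # (Modeled as an attribute for brevity; the list above would make it a relationship.)
    Event(2010, 2012, "bob", "spouse", "Kelly"),
]

print(value_at(history, "united_states", "president", 2005))  # George W. Bush
print(value_at(history, "bob", "spouse", 2011) is UNKNOWN)     # True
```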
My one big puzzle with this reality-mapping system is deciding what to do about ambiguous history. There needs to be a way to tell the system that an event is unconfirmed (meaning it may or may not have happened) or contested (meaning we are not sure which of several competing events happened). This seems like something that would have a fundamental impact on the structure of the system, so I doubt it’s safe to assume it can be added later.
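One hypothetical way to handle both kinds of uncertainty is a possible-worlds approach, a common way to reason about incomplete information; it is not something the post specifies. Each unconfirmed event is either in or out of a world, and each contested group contributes exactly one of its alternatives. A claim is then “certain” if it holds in every world, “possible” if it holds in some, and “ruled out” otherwise. Events are plain strings here, and all event names and function names are made up for illustration.

```python
from itertools import product
from typing import Callable, FrozenSet, Iterator, List, Set


def possible_worlds(
    confirmed: Set[str],
    unconfirmed: List[str],
    contests: List[List[str]],
) -> Iterator[FrozenSet[str]]:
    """Yield every set of events that could be the true history.

    confirmed:   events known to have happened
    unconfirmed: events that may or may not have happened
    contests:    groups of alternatives, exactly one of which happened
    """
    for keep in product([True, False], repeat=len(unconfirmed)):
        maybe = {e for e, k in zip(unconfirmed, keep) if k}
        for choice in product(*contests):
            yield frozenset(confirmed | maybe | set(choice))


def assess(
    claim: Callable[[FrozenSet[str]], bool],
    confirmed: Set[str],
    unconfirmed: List[str],
    contests: List[List[str]],
) -> str:
    """Classify a claim as certain, possible, or ruled out across all worlds."""
    results = [claim(w) for w in possible_worlds(confirmed, unconfirmed, contests)]
    if all(results):
        return "certain"
    if any(results):
        return "possible"
    return "ruled out"


# Hypothetical history: Bob's birth is confirmed, a meeting in Paris is
# unconfirmed, and sources disagree about whether the wedding was in 2010 or 2011.
confirmed = {"bob_born"}
unconfirmed = ["bob_met_kelly_in_paris"]
contests = [["wedding_2010", "wedding_2011"]]

married = lambda w: "wedding_2010" in w or "wedding_2011" in w
print(assess(married, confirmed, unconfirmed, contests))                        # certain
print(assess(lambda w: "wedding_2010" in w, confirmed, unconfirmed, contests))  # possible
```

The number of worlds doubles with every unconfirmed event, which only works because performance is a secondary concern. It also shows the structural impact: queries stop returning plain answers and start returning degrees of certainty.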
Is there anything else you would want in your ultimate semantic storage system?