Bobland

Chesterton's Fence

Posted on Fri, Apr 2 2021 in Bob's Journal • Tagged with software

There is a guiding principle of second-order thinking explained by G. K. Chesterton in his book The Thing.

There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

Chesterton was thinking of social structures, but the principle applies in many areas, including computer software. Many of us, early in our programming careers (because of course, we've all learned our lesson and never fall for this anymore), come across a line of clearly useless code. Removing it would improve readability and have no negative impact, so we delete it. Only later do we discover that the line was instrumental in preventing an error we hadn't even realized was possible.

After a few such encounters, a programmer tends to become superstitious about such sections of code. If you don't have time to understand them deeply (and in the modern business world, who does?), you just leave them alone and hope for the best. To return to the fence analogy, we could remove the fence, but that would mean taking responsibility for understanding it. Maybe it really could be removed. Maybe a speed bump or warning sign would be more appropriate, but it's hard to say. Instead, the enterprising programmer leaves the fence in place but jams a stick in the hinges to hold the gate open.

The drivers are delighted that they can now barrel down the street without having to stop and open the gate, until that fateful day when the stick breaks or wiggles loose and the gate slams shut in front of a car traveling at fifty miles per hour.

The next programmer comes along to investigate the wreck, and he isn't looking at the fence anymore. Now he wants to understand the stick. After considering it for a while, he decides he needs a stronger stick, or one that's shaped differently. Or perhaps he just needs some glue to hold the stick in place better. When that fails, more modifications are made to the stick, and eventually people start modifying the modifications.

Finally, someone gets sick of the whole mess of twenty sticks held together with twine, mud, and glue, and reroutes traffic along a mud path beside the road. Of course, this path is more treacherous, and we need a way to make traffic stop and carefully consider the road ahead. So a fence is added.

Time-Aware Alternative to RDF

Posted on Wed, Sep 24 2014 in Bob's Journal • Tagged with software

We humans, in general, are very sloppy with our understanding of the universe. We can hold contradictory information in our heads with little trouble, and we are just fine with ambiguity and missing data. Computers, on the other hand, usually find the human way of storing data completely useless. For example, a resource like Wikipedia is excellent for humans researching a topic, but if you wanted a computer to do anything meaningful with it, you'd need to use its semantic cousin, Wikidata. Warehouse-filling artificial intelligences are only just starting to comprehend documents that the human mind instantly grasps.

One popular system for storing "semantic" information is RDF (Resource Description Framework), which is essentially a way of telling a computer the relationship between two nouns. The data is stored as a "triple": two nouns and a relationship (in RDF terms, a subject, a predicate, and an object). A typical example would be something like ("Bob", "has wife", "Kelly"). It's a very useful and powerful way to express concepts, and it is a W3C standard for sharing data across the World Wide Web.
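
To make the triple idea concrete, here is a minimal sketch in plain Python. It does not use any RDF library or RDF syntax; the tuple representation and the `match` helper are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical representation: a fact is a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]

triples: set[Triple] = {
    ("Bob", "has wife", "Kelly"),
    ("Kelly", "has husband", "Bob"),
}


def match(store: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern, sorted; None matches anything."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(triples, s="Bob", p="has wife"))  # [('Bob', 'has wife', 'Kelly')]
print(match(triples, o="Bob"))                # [('Kelly', 'has husband', 'Bob')]
```

Wildcard matching like this is roughly what a query against a real RDF store does, although real stores identify subjects and predicates with IRIs rather than bare strings.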

The problem I see with this format is that it is only meaningful in the present tense. Many relationships change. For example, right now the President of the United States is Barack Obama, but in a couple of years that will be wrong. As far as I can tell, there is no good way to express this information using RDF.
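
Recording history in a bare triple store shows the problem. The sketch below is a hypothetical plain-Python store (no RDF library involved) holding two facts that were each true at some point. Because nothing in a triple says when it holds, a query returns both answers with no way to tell which one is current. The `objects` helper is invented for illustration.

```python
# Hypothetical bare triple store: (subject, predicate, object) with no time information.
store = {
    ("United States", "has president", "George W. Bush"),
    ("United States", "has president", "Barack Obama"),
}


def objects(store, subject, predicate):
    """Return every object recorded for (subject, predicate), sorted."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)


print(objects(store, "United States", "has president"))
# ['Barack Obama', 'George W. Bush']  (both recorded, with nothing to say when)
```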

Additionally, RDF has no way of annotating a relationship. For example, if I want to say that Bob is married to Kelly, it might be helpful to include a link to the newspaper article announcing the wedding. If someone wanted to contest any information about that relationship, they could look at the sources, and maybe add some of their own.

I've looked for good alternatives to RDF that take this into account, but so far I haven't found anyone working on the same problem. Here are the essentials of what I think I would want:

There are four main entities in the system: things, events, attributes, and relationships.
A thing is essentially just a unique ID with multiple attributes.
Events can create things and/or set, modify, or delete attributes on them.
Attributes have a type and a value. Each type defines which values are valid. Values may be hierarchical. Attributes store only data that cannot be conclusively determined from other attributes.
Relationships are mutual attributes. They have a type and link two or more things, but don't belong to any of them.
Every event happens at a time, though that time need not be precise and can be relative to another event.
Any event can have annotations.
An event can be caused by a thing, but that is not required.
Each database is uniquely namespaced.
All data is normalized. The system will refuse to store contradictory data.
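
Here is a rough sketch of how a few of these rules might fit together, assuming Python and an event-sourcing approach: attribute values are never stored directly, only replayed from dated events. It covers things (plain string IDs), events, attributes, annotations (`sources`), an optional cause, namespacing, and contradiction refusal, where "contradiction" is narrowly assumed to mean two different values for the same attribute of the same thing on the same date. Relationships, hierarchical values, vague or relative times, and ambiguous history are left out. The names `Event`, `History`, `record`, and `value_at`, the namespace, and the source URL are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One dated change: set (or, with value=None, delete) an attribute on a thing."""
    when: date
    thing: str                       # unique ID of the thing being changed
    attribute: str                   # attribute type, e.g. "president"
    value: Optional[str]             # new value; None deletes the attribute
    caused_by: Optional[str] = None  # optional ID of the thing that caused the event
    sources: tuple[str, ...] = ()    # annotations, e.g. links to supporting articles


class ContradictionError(ValueError):
    """Raised when an event conflicts with one already recorded."""


@dataclass
class History:
    namespace: str                   # each database gets its own namespace
    events: list[Event] = field(default_factory=list)

    def record(self, event: Event) -> None:
        """Store an event, refusing one that contradicts existing data."""
        for e in self.events:
            same_slot = (e.when, e.thing, e.attribute) == (event.when, event.thing, event.attribute)
            if same_slot and e.value != event.value:
                raise ContradictionError(
                    f"{event.thing}.{event.attribute} on {event.when}: "
                    f"{e.value!r} already recorded, refusing {event.value!r}"
                )
        self.events.append(event)

    def value_at(self, thing: str, attribute: str, when: date) -> Optional[str]:
        """Replay events in time order to find an attribute's value on a given date."""
        value = None
        for e in sorted(self.events, key=lambda e: e.when):
            if e.when > when:
                break
            if e.thing == thing and e.attribute == attribute:
                value = e.value
        return value


# Usage, with hypothetical IDs and a placeholder source URL.
h = History(namespace="example.org/bobland")
h.record(Event(date(2001, 1, 20), "United States", "president", "George W. Bush"))
h.record(Event(date(2009, 1, 20), "United States", "president", "Barack Obama",
               sources=("https://example.org/inauguration-article",)))

print(h.value_at("United States", "president", date(2005, 6, 1)))   # George W. Bush
print(h.value_at("United States", "president", date(2014, 9, 24)))  # Barack Obama

try:
    h.record(Event(date(2009, 1, 20), "United States", "president", "Someone Else"))
except ContradictionError as err:
    print("refused:", err)
```

Replaying every event on each query is slow, but in keeping with the goals above, correctness comes first and performance is a secondary concern.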

What you should end up with is historical data that a computer can understand and answer questions about. It would probably not be performant in the least, but that is a secondary concern.

My one big puzzle with this reality-mapping system is what to do about ambiguous history. There needs to be a way to tell the system that an event is unconfirmed (meaning it may or may not have happened) or contested (meaning we are not sure which of several competing events happened). This seems like something that would fundamentally affect the structure of the system, so I doubt it's safe to assume it can be added later.

Is there anything else you would want in your ultimate semantic storage system?