Though we have all experienced the inadequacy of communication, at some level we all carry an internal bias that says everything we say is precise. After all, the thoughts in our mind are clear and the context is relatively obvious, so it should be no big deal.
Ah, but let us consider the ways in which language has built-in fuzziness!
Point 1 – all naturally occurring human languages provide a means by which a linear structure (e.g., a sequence of symbols or words) can be translated into some kind of hierarchical structure. This translation process is generally done via a grammar, usually called a Phrase Structure Grammar. A grammar describes how the linear sequence becomes the hierarchy.
Point 2 – you know almost all the rules of your native language – otherwise you would not be able to understand anything! Unfortunately, your knowledge is subconscious (and quite probably hard-wired into the brain), so asking anyone to list out the rules is futile. One focus of linguistics is working out the rules by investigating which pieces of language are allowed and which are not. For example, consider the simple sentence “The president will resign.” and the sentence adverb “probably”. Which of the following sentences are legal?
- Probably the president will resign.
- The probably president will resign.
- The president probably will resign.
- The president will probably resign.
- The president will resign probably.
Any native speaker of English would know that the second sentence is not proper and the other four are.
Why? The phrase “the president” is already a phrase in its own right by the time it joins the sentence, so it cannot be broken up. The sentence, as you and I both understand it, is actually:
- ((the president) will resign)
The parentheses show the phrase structure… how the linear sequence of four words turns into a hierarchical tree. Adding the word “probably” at the sentence level does not allow us to penetrate into phrases already constructed and alter them. Hence option 2 is not allowed in English.
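The rule just described can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the nested tuple is a hand-written stand-in for the bracketing ((the president) will resign), and the function names `flatten` and `sentence_level_insertions` are made up for illustration. Real adverb placement in English involves far more than this.

```python
def flatten(tree):
    """Return the words of a nested-tuple phrase structure in linear order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree:
        words.extend(flatten(child))
    return words


def sentence_level_insertions(tree, adverb):
    """Insert `adverb` into every gap between the top-level constituents.

    Phrases that are already built (nested tuples) are treated as sealed
    units, so the adverb never lands inside them.
    """
    results = []
    for gap in range(len(tree) + 1):
        children = tree[:gap] + (adverb,) + tree[gap:]
        results.append(" ".join(flatten(children)))
    return results


# Hand-written bracketing (an assumption): ((the president) will resign)
sentence = (("the", "president"), "will", "resign")

for candidate in sentence_level_insertions(sentence, "probably"):
    print(candidate)
```

The sketch produces the four acceptable orderings and never produces “the probably president will resign”, because the only gaps it considers are those between top-level constituents.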
Point 3 – even the general syntax of language is imprecise. Consider a simple statement – “the man took the dog on the boat”. The sentence is syntactically ambiguous – meaning that it has more than one phrase structure possible from its sequence of words. Actually it has three.
- ((the man) took (the dog) (on (the boat))) – the man moved the dog from some unstated location to a location on the boat
- (((the man) took (the dog)) (on (the boat))) – the man took the dog, and he did this taking while he was on the boat (instead of, say, on the land next to the picnic table)
- ((the man) took (the dog (on (the boat)))) – the man took a particular dog, the dog that was on the boat (as opposed to the dog that was tied up near the picnic table)
Very few people, hearing the sentence of 8 words, quickly perceive the syntactic ambiguity. Usually, in the real world, we are using all of our senses (vision, hearing, touch, taste, smell, balance, hunger, thirst….) and we can augment the sentence with this added information to choose the proper syntactic interpretation.
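The three bracketings listed above can also be enumerated mechanically. What follows is a minimal sketch, assuming a tiny hand-made grammar and lexicon (`GRAMMAR`, `LEXICON`, and the function `parses` are all hypothetical names chosen for this illustration, and the rules cover only this one sentence). It simply tries every way of splitting the word sequence that the rules allow.

```python
from functools import lru_cache

# Toy phrase structure rules: an assumption for illustration only,
# not a grammar of English.
GRAMMAR = {
    "S": [("NP", "V", "NP"), ("NP", "V", "NP", "PP"), ("S", "PP")],
    "NP": [("Det", "N"), ("Det", "N", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {
    "the": "Det", "man": "N", "dog": "N", "boat": "N",
    "took": "V", "on": "P",
}


def parses(words):
    """Return every bracketing of `words` as a sentence under the toy grammar."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def parse(symbol, start, end):
        """All bracketings of words[start:end] as `symbol`."""
        results = []
        # A single word matches if the lexicon assigns it this category.
        if end - start == 1 and LEXICON.get(words[start]) == symbol:
            results.append(words[start])
        for rhs in GRAMMAR.get(symbol, []):
            for children in expand(rhs, start, end):
                results.append("(" + " ".join(children) + ")")
        return tuple(results)

    def expand(rhs, start, end):
        """Yield child bracketings for each way `rhs` can cover words[start:end]."""
        first, rest = rhs[0], rhs[1:]
        if not rest:
            for tree in parse(first, start, end):
                yield [tree]
            return
        # Leave at least one word for each remaining symbol.
        for mid in range(start + 1, end - len(rest) + 1):
            for left in parse(first, start, mid):
                for right in expand(rest, mid, end):
                    yield [left] + right

    return list(parse("S", 0, len(words)))


for tree in parses("the man took the dog on the boat".split()):
    print(tree)
```

Under these assumed rules the sketch finds exactly three structures for the eight-word sequence, one for each reading: the boat as destination, the boat as the place where the taking happened, and the boat as a way of identifying which dog. Nothing in the word sequence itself picks among them.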
Point 4 – the meanings of the words in a sentence are imprecise. For example, sometimes we might describe a man as “a real dog”, and the meaning of the sentence changes drastically depending on which sense of “dog” is intended.
Point 5 – even the context and pragmatics of a sentence are imprecise. Suppose I have raised Chihuahuas my entire life and live and die by my tiny furry friends, while you have the same affection for giant Saint Bernards. Then when you talk about taking the dog on the boat, you have an entirely different expectation and understanding than I do. That is not because of the meaning of the word “dog” but because all of our understanding of the word is derived from our personal experience (it is said that our understanding is “epiphenomenal”).
So there is much more to consider, review, understand, implement, test, and improve. AI is coming, and linguistic understanding is the start!