Building Semantics Through Small Social Messages

March 15 2009 by Kerry Kobashi

There's a lot of buzz on the Internet that the current form of messaging is going to be replaced by a lightweight form of email. Social networking sites like Twitter, Facebook and FriendFeed have seen their users communicate more by posting small micro-blogging status updates to inform their circle of friends.

This "chatter", as I call it, is only going to grow. Human dialogue in shorter sentences, constrained by the size of the message, would induce more precise messages and meaning. Packaging communication into bite-sized "envelopes" would cut down the size of the content, allowing computer heuristics to determine its semantic meaning.

As we know, there are many different ways to express something. For example, there are many ways to say "hello" in English:

"hey you!"
"how are you?"
"hi"
"whats up?"

By logging human interaction in "chatter" databases and watching the responses, we can teach computers how to learn. Massive "sentence structure" databases could exist, with each small microblog comment serving as a "leaf" in a huge language database. Leaves that fall along the same branch would be considered "synonyms": upon lookup, their meanings would be the same.
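To make the leaf-and-branch idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `BRANCHES` table, the "greeting" label and the function names are hypothetical, and a real "chatter" database would be learned from logged messages rather than typed in by hand.

```python
import re
from typing import Optional

# Hypothetical hand-built "chatter" database: each short message is a leaf,
# and the branch it hangs from is the meaning it shares with its siblings.
BRANCHES = {
    "greeting": ["hey you!", "how are you?", "hi", "whats up?"],
}


def normalize(message: str) -> str:
    """Lowercase and drop punctuation so "Hi!" and "hi" land on the same leaf."""
    return re.sub(r"[^a-z0-9 ]", "", message.lower()).strip()


# Invert the branches into a leaf -> branch lookup table.
LEAF_TO_BRANCH = {
    normalize(leaf): branch
    for branch, leaves in BRANCHES.items()
    for leaf in leaves
}


def meaning_of(message: str) -> Optional[str]:
    """Return the branch (meaning) a message falls on, or None if unseen."""
    return LEAF_TO_BRANCH.get(normalize(message))


def are_synonyms(a: str, b: str) -> bool:
    """Two messages are "synonyms" when both are known and share a branch."""
    branch = meaning_of(a)
    return branch is not None and branch == meaning_of(b)
```

With this table, `are_synonyms("hey you!", "Whats up?")` is true because both leaves sit on the same branch, while a message that was never logged simply has no meaning yet.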

By constraining the size of the message, we can build databases that help computer algorithms understand human dialogue. But we have to constrain our dialogue to a smaller subset, and that starts with changing the way we communicate. We have to shape our language structures so that they are easier for machines to understand. One could say we have to move toward "caveman dialogue", where sentences made of fewer words would mean more precise things.

This would change the pattern of human communication. One would have to pause and really think about how to package a message effectively in a smaller space. Think of this as a crafty way of gently introducing the "less is more" and "divide and conquer" approaches to humans over time - eventually they will be widely accepted and prevalent. It may very well be a social and cultural change that we are undergoing right now.

This could be a good thing, although strange at first. Search engines that monitor these messages can store them and break the sentences up into semantic structures for gathering human knowledge. By observing human dialogue, a context can be built. Instead of keyword phrases, longer-tail sentence structures could be looked up for meaning. By doing so, we can build deep "sentence trees" that can be aggregated and "looked up" for meaning. By observing human interaction and gathering common forms of dialogue, "knowledge engines" can move closer to a world where one could not tell the difference between human and computer dialogue.
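One possible reading of a "sentence tree" is a word-by-word tree (a trie) in which every observed sentence is a path from the root and the meaning is stored at the end of the path. The sketch below assumes that reading; the class names, the `observe` and `look_up` methods and the meaning labels are all hypothetical, and this is not a description of how any search engine actually works.

```python
from typing import Dict, Optional


class SentenceNode:
    """One word position in the tree; children are the possible next words."""

    def __init__(self) -> None:
        self.children: Dict[str, "SentenceNode"] = {}
        self.meaning: Optional[str] = None  # set only where a sentence ends
        self.count = 0                      # how often this sentence was seen


class SentenceTree:
    def __init__(self) -> None:
        self.root = SentenceNode()

    def observe(self, sentence: str, meaning: str) -> None:
        """Log one short message and the meaning attached to it."""
        node = self.root
        for word in sentence.lower().split():
            node = node.children.setdefault(word, SentenceNode())
        node.meaning = meaning
        node.count += 1

    def look_up(self, sentence: str) -> Optional[str]:
        """Walk the tree word by word; None means the sentence is unknown."""
        node = self.root
        for word in sentence.lower().split():
            next_node = node.children.get(word)
            if next_node is None:
                return None
            node = next_node
        return node.meaning


tree = SentenceTree()
tree.observe("where can i eat", "find_restaurant")
tree.observe("where can i sleep", "find_hotel")
```

The two sentences share the path "where can i" and only split at the last word, which is why short, constrained messages keep such a tree shallow and its lookups cheap.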

Natural language processing could very well be attainable if people change the way they communicate. Who knows, maybe someday Google will be able to mimic Joseph Weizenbaum's Eliza in a much more accurate way. Text-to-speech and speech-to-text algorithms could be used to take people's questions and answer them, with the processing done by a computer that understands the language.

An exciting time is coming for those involved in artificial intelligence.

About Kerry Kobashi

Kerry is the founder of KerryOnWorld. He lives in Silicon Valley and has worked as an engineer and project manager. He owns Kobashi Computing, a consulting company.

FifthSense's picture

this is brilliant

I just read through this and understand what you are saying, Kerry. This is a brilliant interpretation of what could be going on right now.

"Constraining the message" and "enveloping it" in a "divide and conquer" manner makes perfect sense. Your observation that the data gatherers must observe and interpret the interaction of the messages is key indeed.