Sunday, August 3, 2014
word signatures including gestures
Words by themselves are very difficult for speech recognizers to work with. Consider the phrase "eats, shoots, and leaves". There is a lot of ambiguity about what kind of "leaves" this refers to: does it mean to abandon, or a part of a plant? Humans, even those who have "practiced" the art of being social for 10,000 hours, or once a person turns 10, cannot be expected to know every idiom and vernacular metaphor coined since the beginning of time. Therefore something else is happening. The book "Mirroring People" poses the question of why a person gestures when talking on a phone, even though it is impossible for the other person to see them. It is because we use words as a vehicle to transport the other person to a thought or feeling we wish to convey.

Many NLP practitioners use stemming as a way to isolate meaning, but this will not work because words without context are empty. We establish context with pitch, tempo, and liveliness. The book "Social Physics" asks us to imagine a device that isolates the spoken word from the manner in which it is spoken, such as pitch, tempo, and liveliness. This will also fail to grasp the full meaning. To capture it we need to think about the human brain and how it condenses meaning so a person does not have to think too hard. The book "We Are Our Brains" discusses one estimate that a brain runs on the equivalent of about $1,500 worth of energy over an entire lifetime, roughly 15 watts.

In order to come up with a computational model we need to ask the right questions. Some large speech recognition platforms train their systems on the voices of 30,000 speakers. This is the wrong direction, because we need to think of words and concepts as being unique. By over-complicating the model we slow the system down until it is not usable in real time and requires vast amounts of computing power beyond what is in our phones. Some systems treat this as a crowdsourcing problem and push the computing resources into the cloud. Computer security specialists know the problem with this approach: while some conversations cluster together, some certainly do not. This is not a matter of a big enough sample; more data will not fix it. The book "Uncharted" explains how, in almost a decade's worth of Google searches, some queries still stand out.

What my clear audio project proposes is to build a blackboard-like system based on how we understand the human mind to work. To do this we need to extend an artificial neural network. An artificial neural network is a single-parameter engine that arrives at a single-parameter result through recursion and by removing the strands that are not used very much. While we do need a single-parameter output, we do not yet have a nicely formatted single-parameter vector of input. We need a blackboard system with several parameters from multiple agents.

We can use what we learned from visual saliency to find feature sets. For example, consider the book Where's Waldo. The first feature we may associate with Waldo is that he wears red. Then we look for stripes. Then we look for glasses. Then we know for certain where Waldo is. This process is not unlike what we do with clear audio: we take a vector of certainties and study the sentence trajectory. Using Zipf's law we know how a topic is formed and diverged from. Every conversation has a topic sentence. It is unusual for a person to call another without a particular question in mind, even if it is just to see how their day went; that conversation has the day as its subject.
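To make the Where's Waldo-style narrowing concrete, here is a minimal Python sketch of a feature cascade. The candidates, feature names, and pass/fail tests are invented for illustration only; nothing here is taken from an existing clear audio implementation.

```python
# Minimal sketch of a Where's Waldo-style feature cascade:
# each feature check narrows the set of candidates until only
# the most likely one remains.  All data here is invented.

candidates = [
    {"id": "A", "wears_red": True,  "striped": True,  "glasses": True},
    {"id": "B", "wears_red": True,  "striped": False, "glasses": False},
    {"id": "C", "wears_red": False, "striped": True,  "glasses": True},
]

feature_checks = ["wears_red", "striped", "glasses"]

def cascade(items, checks):
    """Apply one feature at a time, keeping only candidates that pass."""
    remaining = list(items)
    for check in checks:
        remaining = [c for c in remaining if c[check]]
        print(f"after '{check}': {[c['id'] for c in remaining]}")
    return remaining

print(cascade(candidates, feature_checks))  # only candidate "A" survives
```

Each pass throws away candidates that do not match, which is the same narrowing-down the audio features would do for competing word meanings.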
We need to ask the contextual questions of the Zachman framework: what, how, where, who, when, and why. By studying a person's speech, including pitch, tempo, and liveliness, as well as the words they actually say, future words can be predicted. By living inside the experience we can allow our NLP engines to detect sarcasm.
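As a rough, hedged sketch of how combining the words with the way they are spoken could help detect sarcasm, the snippet below pairs a crude word-level sentiment check with prosodic features: if the words look positive but the delivery is flat, the utterance is flagged as possibly sarcastic. The word list, feature names, and thresholds are assumptions made up for this example, not part of any existing clear audio code.

```python
# Hypothetical sketch: combine the literal words with prosodic features
# (pitch variation, liveliness) to guess at sarcasm.  The lexicon,
# feature names, and thresholds are invented for illustration.

POSITIVE_WORDS = {"great", "wonderful", "love", "fantastic"}

def word_sentiment(words):
    """Crude lexical sentiment: fraction of words that look positive."""
    words = [w.lower().strip(".,!") for w in words]
    return sum(w in POSITIVE_WORDS for w in words) / max(len(words), 1)

def maybe_sarcastic(words, pitch_variation, liveliness):
    """Flag utterances whose words sound positive but whose delivery is flat."""
    positive = word_sentiment(words) > 0.2
    flat_delivery = pitch_variation < 0.3 and liveliness < 0.3
    return positive and flat_delivery

# "Oh great, another meeting" said in a flat, lifeless voice.
print(maybe_sarcastic("Oh great another meeting".split(),
                      pitch_variation=0.1, liveliness=0.2))   # True
print(maybe_sarcastic("Oh great another meeting".split(),
                      pitch_variation=0.8, liveliness=0.9))   # False
```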
Wednesday, July 9, 2014
tone, tempo, vibrance
What sets my research apart from other speech recognition software is that those systems attempt to match phonemes to phonetic words and stop there. My approach builds on this but uses a blackboard approach. A blackboard has multiple agents that each evaluate one aspect of the signal and throw it into a shared processing system. In this approach the words are a feature, and so are tone (pitch), tempo, and vibrance. This is important because, as my paper "Jurassic Park Extrapolation Renders Speech to Speech Engine Greater Accuracy" mentions, the brain is constantly bombarded with signals and information. The brain has trained itself to ignore some sounds, such as the 60 hertz drone of a light fixture. The brain needs some difference and variance in order to stay focused. When a robotic voice speaks, it is often draining to listen to. I hope to find a feature set to annotate that makes computers easier to listen to. I am currently working on integrating the Neuromorphic Vision C++ Toolkit with GBBopen and PyBrain. I hope to understand the algorithms well enough to port them all to Python so the system can run on an Android device. GBBopen is written in Lisp, and the Neuromorphic Vision C++ Toolkit is written in C++.
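Since GBBopen is in Lisp and the vision toolkit is in C++, one way to prototype the idea before any porting is a toy pure-Python blackboard. The sketch below is my own simplification: the agents, feature names, and the trivial control step are invented for illustration and do not reflect the GBBopen or iLab APIs.

```python
# Toy blackboard: independent agents each look at the raw utterance,
# post their feature estimates to a shared space, and a simple
# controller combines them.  Agent names and features are invented.

class Blackboard:
    def __init__(self):
        self.entries = {}          # feature name -> value

    def post(self, agent, feature, value):
        self.entries[feature] = value
        print(f"{agent} posted {feature} = {value}")

def word_agent(board, utterance):
    board.post("word_agent", "words", utterance["text"].split())

def pitch_agent(board, utterance):
    board.post("pitch_agent", "pitch", utterance["pitch_hz"])

def tempo_agent(board, utterance):
    board.post("tempo_agent", "tempo", utterance["words_per_sec"])

def controller(board):
    """Trivial control step: decide whether the speaker sounds animated."""
    animated = board.entries["pitch"] > 180 and board.entries["tempo"] > 3
    return {"animated": animated, **board.entries}

utterance = {"text": "eats shoots and leaves", "pitch_hz": 210, "words_per_sec": 3.5}
board = Blackboard()
for agent in (word_agent, pitch_agent, tempo_agent):
    agent(board, utterance)
print(controller(board))
```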
Saturday, March 1, 2014
Determining the outcome of the jokester's joke
I've recently read The Creative Mind by Margaret Boden. Although it was written a few years back, it has many relevant points to consider. It asks whether a computer can be creative, and argues that a computer cannot be, because it uses heuristics and specific instructions about what to look for. The book Social argues that the mind is always in the mode of social communication and does not turn off. Social also argues that although a person is not born with social mechanisms in place, they quickly develop social capacity as the mind adapts to its environment. It takes 10,000 hours of practice to become an expert in a specialty; the brain achieves this social learning by age 10. The book The Second Machine Age introduces a paper called "The Division of Labor" that looks at what computers are capable of. The Second Machine Age argues that while computers are not capable of doing anything but what we tell them to do, with a good enough feature set, such as Google's Chauffeur project with driverless cars, they can be better than humans at determining how things should be. The Second Machine Age discusses what judgment is, and the book Social talks about how many hours it takes before a human is socially competent. Recently I listened to Sanjeev Arora (http://youtu.be/0WX0h5fu0zs), who gave the idea of looking for association words that serve as a trap door for other words. For example, a paragraph that mentions "snow" connotes that the paragraph is about snow. This is very similar to the Zachman framework. If our feature set is good enough then, like the Google Chauffeur project that had only two accidents as of early 2014, including one where the car was rear-ended at a stop light, the clear audio natural sentence processing project can, as Margaret Boden says, predict the surprising punch line of Grandpa's unpredictable jokes.

Research to be done: I am currently looking at the ways words are based on how they sound. I am looking heavily at music-based word formation after reading Gödel, Escher, Bach. It has long been known that in order to determine whether a grammar is correct or not, it should be pronounced aloud.
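Here is a small, hedged sketch of the association-word idea: count how many words from hand-made anchor lists appear in a paragraph and call the best-scoring list the topic. The anchor lists and the example sentence are invented for this illustration; they are not taken from Arora's talk.

```python
# Guess a paragraph's topic by counting "trap door" association words.
# The anchor lists below are made up for this example.

ANCHORS = {
    "snow":    {"snow", "ski", "ice", "winter", "frost"},
    "cooking": {"recipe", "oven", "flour", "simmer", "taste"},
}

def guess_topic(paragraph):
    words = {w.lower().strip(".,") for w in paragraph.split()}
    scores = {topic: len(words & anchors) for topic, anchors in ANCHORS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_topic("The snow fell all night and the ski lifts opened early."))  # snow
```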