clear audio: October 2013

Thursday, October 24, 2013

clear audio for the disabled

As people get older, some sounds become harder for them to produce because of reduced muscular dexterity, sometimes so severely that their speech is unintelligible. This is especially true for deaf people. The program uses a machine learning algorithm to predict what a person is about to say based on context, and it learns the natural sound of their voice. It takes in input about the conversation, much as the Android keyboard suggests words that fit what you are typing. The prediction is statistical, so it becomes more accurate as the conversation progresses. The program also validates the input voice: if a sound does not appear to be clear enough to understand, it shapes the sound into something intelligible.

Carnegie Mellon University had something similar, which I plan to improve on, and Google might host part of it in the cloud using Google App Engine. My contribution is the voice shaping and speech trajectory work. That way people could use it on Android smartphones. I asked people on ResearchGate what they thought, and I'm going to see if it gets traction. I might even be able to make an iPhone (iOS) version of this app.
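To make the word prediction idea a bit more concrete, here is a minimal sketch of a bigram model that learns from the conversation as it goes and suggests likely next words. This is not the actual Clear Audio code. The class name `BigramPredictor`, its methods, and the sample sentences are all made up for illustration, and a real system would work on sounds as well as words.

```python
from collections import Counter, defaultdict


class BigramPredictor:
    """Hypothetical helper: counts which word follows which."""

    def __init__(self):
        # counts[prev][nxt] = how often `nxt` followed `prev` so far
        self.counts = defaultdict(Counter)

    def observe(self, utterance):
        """Update the counts with one more sentence from the conversation."""
        words = utterance.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def suggest(self, prev_word, k=3):
        """Return up to k (word, relative frequency) pairs for the next word."""
        following = self.counts.get(prev_word.lower())
        if not following:
            return []  # nothing learned yet for this context
        total = sum(following.values())
        return [(word, count / total) for word, count in following.most_common(k)]


predictor = BigramPredictor()
predictor.observe("I would like a cup of tea")
predictor.observe("I would like a glass of water")
predictor.observe("I would like a cup of coffee")

# After three sentences, "cup" is the most frequent word seen after "a".
suggestions = predictor.suggest("a")
print(suggestions)
```

The more sentences the predictor observes, the better its counts reflect how this particular speaker talks, which is the sense in which the suggestions improve as the conversation progresses.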

What is Clear Audio?

Clear Audio is an experiment in shaping sound for VoIP using machine learning and natural language processing. Even today, with limited technical resources and growing demand on cell towers, it's hard to carry high-bit-rate voice conversations without a mechanism for compression. A gigabyte is not always a gigabyte. If one were to take a large file, say a Linux virtual machine saved as a VMDK, it might take about a gigabyte. If one were to compress the VMDK into a tar.gz, it would become smaller. What happened? The process is called lossless file compression: it finds patterns in the data and eliminates redundancy, and the original can be restored exactly. The same idea applies to sound, where voice codecs go a step further and use lossy compression, discarding detail the listener is unlikely to miss. In the domain of voice we also have posterior probability, a conditional probability: how likely a given sound or word is, given what has already been observed.
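The compression point can be tried out in a few lines. The sketch below uses Python's standard `zlib` module, which implements DEFLATE, the same algorithm gzip uses inside a tar.gz. The sample data is invented for illustration: one buffer that is highly repetitive and one that is random.

```python
import os
import zlib

# 12,000 bytes of highly repetitive data versus 12,000 random bytes.
redundant = b"hello world " * 1000
random_bytes = os.urandom(12000)

packed_redundant = zlib.compress(redundant)
packed_random = zlib.compress(random_bytes)

# The repetitive data shrinks dramatically; the random data barely changes.
print(len(redundant), "->", len(packed_redundant))
print(len(random_bytes), "->", len(packed_random))

# Lossless means the original comes back byte for byte.
restored = zlib.decompress(packed_redundant)
print(restored == redundant)
```

The repetitive buffer compresses well because the compressor finds the repeated pattern and stores it once. The random buffer has no pattern to find, so there is almost nothing to save.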

The book I Am a Strange Loop talks about how people do the same things over and over, which makes them predictable. The book Uncharted: Big Data as a Lens on Human Culture talks about big data, such as what people entered into the Google search engine. Using the Google Ngram Viewer, people can associate words with certain patterns. By drawing on other conversations, we can project which sounds are likely to be uttered next and assign each a probability.
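Here is a rough sketch of how such a probability could be computed with Bayes' rule, where the posterior P(word | sound) is proportional to P(sound | word) times P(word). The function name `posterior`, the candidate words, and all the numbers are assumptions made up for this example. In a real system the prior would come from n-gram statistics and the likelihood from an acoustic model.

```python
def posterior(prior, likelihood):
    """Combine a context prior with an acoustic likelihood using Bayes' rule.

    prior:      P(word) from the conversation so far
    likelihood: P(sound | word), how well the unclear sound matches each word
    Returns P(word | sound), normalized to sum to 1.
    """
    unnormalized = {w: prior[w] * likelihood.get(w, 0.0) for w in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no candidate word is compatible with the sound")
    return {w: p / total for w, p in unnormalized.items()}


# Hypothetical numbers: the speaker just said "a cup of", and the next
# sound was unclear. Taken alone, the sound slightly favors "key".
prior = {"tea": 0.6, "key": 0.1, "pea": 0.3}
likelihood = {"tea": 0.3, "key": 0.4, "pea": 0.3}

post = posterior(prior, likelihood)
best = max(post, key=post.get)
print(best, post)
```

Even though the unclear sound matches "key" slightly better in isolation, the context makes "tea" the most probable word once both sources of evidence are combined.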

What this project entails is a voice-to-text translator and a sound anticipator, a component that predicts the sounds a speaker is about to make.

Books I've been reading for this project:
Uncharted: Big Data as a Lens on Human Culture
Surfaces and Essences by Hofstadter
I Am a Strange Loop
Multirate Signal Processing for Communication Systems
Software Engineering for Embedded Systems by Oshana and Kraeling