Thursday, October 22, 2015

Reading 19: Recognizing Text Through Sound Alone

Citation

Li, Wenzhe, and Tracy Anne Hammond. "Recognizing text through sound alone." Twenty-Fifth AAAI Conference on Artificial Intelligence. 2011.


Summary

The paper presents a novel way to recognize English characters drawn on a rough surface with a pen, a key, or a fingernail, using only the sound of the stroke. The key idea is that, just as a pen slows down at corners when sketching, the sound of a stroke changes with the speed of the stroke.

The algorithm first applies endpoint detection to strip the noise before and after the stroke, and the authors discuss two methods for this. The first method computes the energy of the first 150 ms of the recording and treats it as the environmental noise level. It then compares the energy of each 45 ms frame with that noise level; the first frame for which E(frame) * T > E(noise) marks the start of the signal. The same test, applied backward from the end of the recording, finds the end point.

The second method uses the first 20 ms of the signal to estimate the mean and standard deviation of a Gaussian noise model. Then, for each 10 ms segment, it computes the normalized distance |x - mean| / std of the samples. Values above the threshold of 3 fall outside the range expected of background noise, so those segments are kept as valid signal and the rest are discarded as noise.
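The two endpoint-detection methods above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the signal is a plain list of samples, "energy" is taken as mean squared amplitude per sample so that the 150 ms noise window and the 45 ms frames are comparable, and the default T = 0.25 is a hypothetical value (the condition E(frame) * T > E(noise) is equivalent to E(frame) > E(noise) / T). The backward scan reuses the noise level measured at the start, and the majority vote that turns per-sample decisions into a per-segment label in the second method is also an assumption. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical helper names; a sketch, not the paper's implementation.


def mean_power(samples: Sequence[float]) -> float:
    """Energy per sample, so windows of different lengths compare fairly."""
    return sum(x * x for x in samples) / len(samples) if samples else 0.0


def energy_endpoints(signal: Sequence[float], sample_rate: int,
                     noise_ms: int = 150, frame_ms: int = 45,
                     T: float = 0.25) -> Optional[Tuple[int, int]]:
    """Method 1: return (start, end) sample indices of the stroke, or None."""
    noise_len = sample_rate * noise_ms // 1000
    frame_len = sample_rate * frame_ms // 1000
    e_noise = mean_power(signal[:noise_len])  # environmental noise level

    def is_active(frame: Sequence[float]) -> bool:
        # Condition as stated in the summary: E(frame) * T > E(noise).
        return mean_power(frame) * T > e_noise

    n = len(signal)
    start = None
    # Forward scan: the first frame that rises above the noise level.
    for s in range(0, n - frame_len + 1, frame_len):
        if is_active(signal[s:s + frame_len]):
            start = s
            break
    if start is None:
        return None
    # Backward scan: the same test, with frames stepping back from the end.
    end = start + frame_len  # fallback if the backward grid finds nothing
    for e in range(n, frame_len - 1, -frame_len):
        if is_active(signal[e - frame_len:e]):
            end = e
            break
    return start, end


def gaussian_valid_segments(signal: Sequence[float], sample_rate: int,
                            noise_ms: int = 20, seg_ms: int = 10,
                            k: float = 3.0) -> List[bool]:
    """Method 2: label each seg_ms segment True (stroke) or False (noise)."""
    noise_len = max(2, sample_rate * noise_ms // 1000)
    noise = signal[:noise_len]
    mu = sum(noise) / len(noise)
    # Population std of the noise window; guard against a perfectly flat start.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in noise) / len(noise)) or 1e-12
    seg_len = max(1, sample_rate * seg_ms // 1000)
    labels = []
    for s in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[s:s + seg_len]
        # A sample looks like stroke sound if it is more than k std devs from the noise mean.
        active = sum(1 for x in seg if abs(x - mu) / sigma > k)
        # Assumption: a segment is valid when most of its samples are active.
        labels.append(active > len(seg) / 2)
    return labels
```

For example, with sample_rate=1000 and a quiet recording containing a loud burst in samples 300 to 499, energy_endpoints returns a window that brackets the burst, and gaussian_valid_segments marks only segments 30 to 49 as valid.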

Discussion

The authors use two main features to distinguish the shapes. The first is the mean amplitude of the signal. The second is the set of MFCCs (Mel-Frequency Cepstral Coefficients), which are complex to compute and whose derivation is outside the scope of the paper, but which serve as a very effective feature. Each shape the user enters is then compared against a template of every shape, and the best match is selected. The formula used for template matching is as follows:
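As an illustration only, the feature and matching steps can be sketched as a nearest-template classifier. This sketch does not reproduce the paper's own matching formula: it substitutes dynamic time warping (DTW) over per-frame mean absolute amplitude, an assumed choice that lets recordings of different lengths be compared. MFCCs are left out to keep it dependency-free, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helper names; DTW is an assumed stand-in for the paper's formula.


def mean_amplitude_frames(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean absolute amplitude of each non-overlapping frame."""
    return [sum(abs(x) for x in signal[s:s + frame_len]) / frame_len
            for s in range(0, len(signal) - frame_len + 1, frame_len)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def classify(features: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

For instance, a query sequence [0.1, 0.1, 0.1, 0.9, 0.9] matches a template [0.1, 0.1, 0.9, 0.9] with a DTW distance of 0, because the warping path can repeat the first value.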
