Saturday, December 12, 2015

Project Reading 8: Automatic License Plate Recognition

Citation

Chang, Shyang-Lih, et al. "Automatic license plate recognition." Intelligent Transportation Systems, IEEE Transactions on 5.1 (2004): 42-53.

PDF

Summary

License plate number recognition is an important application of optical character recognition. It is used at automated toll booths, gate entries and many other points where vehicles are authenticated by their number plates. The key difference from sketch recognition techniques is that the input is never available as a sketch or stroke points; it is always an image capture of the vehicle. The task of number recognition is therefore divided into two phases:
1) Locating the number plate of a vehicle in a given capture of the car.
2) Predicting the number present on the number plate.

Overall the process flowchart is as follows:

Once the input license plate image is obtained, it is passed through preprocessing steps: first the image is binarized, i.e. only white and black pixels are kept, and then character segmentation is performed. After character segmentation we obtain images of individual characters, which can be used to extract useful features.

Discussion

The first step in feature extraction is constructing the contour of the character. Once the contour is formed, a set of equidistantly spaced points is sampled along the contour lines. These points are then used to match the features against a set of templates, and a Kohonen self-organizing (SO) neural network performs the classification step. An example of feature extraction and template matching is shown below:


Project Reading 7: A study on the use of 8-directional features for online handwritten Chinese character recognition

Citation

Bai, Zhen-Long, and Qiang Huo. "A study on the use of 8-directional features for online handwritten Chinese character recognition." Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on. IEEE, 2005.

PDF

Summary

This paper presents a new approach to Chinese character recognition using directional features only. The input sketch is passed through several preprocessing steps: linear size normalization, adding imaginary strokes, nonlinear shape normalization, equidistant resampling, and smoothing. After these preprocessing steps a 64x64 normalized character sample is obtained. Eight directional features are computed from each online trajectory point, and the resulting directional feature maps are blurred and sampled at 8x8 uniformly spaced locations using a filter similar to the Gaussian envelope of a Gabor filter. This gives a feature vector of 512 values (8x8x8), which is then used for classification. The system is tested extensively on the 3755 level-1 Chinese characters in the GB2312-80 standard.
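
The per-point decomposition into eight directions can be sketched as follows; the `directional_features` helper and its linear weighting scheme are illustrative assumptions, not the paper's exact formulation:

```python
import math

# Eight reference directions at 45-degree intervals (an illustrative
# sketch of 8-directional decomposition; not the paper's exact scheme).
STEP = math.pi / 4

def directional_features(p0, p1):
    """Decompose the direction of the segment p0 -> p1 into weights over
    eight reference directions; the two adjacent reference directions
    share the weight linearly."""
    angle = math.atan2(p1[1] - p0[1], p1[0] - p0[0]) % (2 * math.pi)
    lo = int(angle // STEP) % 8        # nearest reference at or below
    hi = (lo + 1) % 8                  # next reference direction
    frac = (angle - lo * STEP) / STEP  # position between the two
    weights = [0.0] * 8
    weights[lo] = 1.0 - frac
    weights[hi] += frac
    return weights

# A horizontal segment loads reference direction 0 entirely.
w = directional_features((0, 0), (1, 0))
```

Summing such weight vectors over the trajectory points falling in each of the 8x8 sampling regions would yield the 512-dimensional feature vector described above.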

Discussion

The overall flowchart of the process described above is shown below:

The authors use two simple character classifiers at the classification step. The first one is a maximum discriminant function based classifier with a single prototype. The prototype is the mean of the training feature vectors, and the discriminant function is the negative Euclidean distance between a testing feature vector and the prototype feature vector.
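
A minimal sketch of this single-prototype classifier with a negative-Euclidean-distance discriminant (the class labels and vectors below are made up for illustration):

```python
import math

def train_prototypes(samples):
    """One prototype per class: the mean of that class's training
    feature vectors."""
    protos = {}
    for label, vecs in samples.items():
        n = len(vecs)
        protos[label] = [sum(v[i] for v in vecs) / n
                         for i in range(len(vecs[0]))]
    return protos

def classify(protos, x):
    """Discriminant = negative Euclidean distance to the prototype;
    pick the class whose discriminant is maximal."""
    def disc(p):
        return -math.sqrt(sum((a - b) ** 2 for a, b in zip(p, x)))
    return max(protos, key=lambda lbl: disc(protos[lbl]))

protos = train_prototypes({"a": [[0.0, 0.0], [0.0, 2.0]],
                           "b": [[5.0, 5.0], [7.0, 5.0]]})
```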

Sunday, December 6, 2015

Project Reading 6: Improving Offline Handwritten Text Recognition with Hybrid HMM/ANN Models

Citation

Espana-Boquera, Salvador, et al. "Improving offline handwritten text recognition with hybrid HMM/ANN models." Pattern Analysis and Machine Intelligence, IEEE Transactions on 33.4 (2011): 767-779.

PDF


Summary

The paper presents a hybrid approach to optical character recognition. The idea is to use Hidden Markov Models (HMMs) to model the structural part of the optical input and a multilayer perceptron to estimate the different classification probabilities. The paper also presents new methods to preprocess the input images in terms of slope correction, size normalization and slant correction. The system was tested on the IAM database.

Discussion

The figure in the paper shows the hybrid Hidden Markov Model and Artificial Neural Network technique. First the images are preprocessed, and the resulting feature vector, together with contextual vectors from the left and right of the character, is processed by a Multilayer Perceptron (MLP). The MLP outputs are then used as emission probabilities in the HMMs.
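
The usual way MLP outputs stand in for HMM emissions is to divide the network's state posteriors by the state priors, giving scaled likelihoods; a sketch with made-up numbers (not taken from the paper):

```python
# The MLP estimates posteriors P(state | frame); dividing by the state
# priors gives P(frame | state) up to a constant, which is the quantity
# an HMM's emission model needs.

def scaled_likelihoods(posteriors, priors):
    """Convert per-frame state posteriors into scaled likelihoods."""
    return [{s: post[s] / priors[s] for s in post} for post in posteriors]

# Illustrative two-state example.
mlp_posteriors = [{"s1": 0.8, "s2": 0.2},
                  {"s1": 0.3, "s2": 0.7}]
state_priors = {"s1": 0.5, "s2": 0.5}
emissions = scaled_likelihoods(mlp_posteriors, state_priors)
```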

Thursday, December 3, 2015

Project Reading 5: Structural Offline Handwriting Character Recognition Using Levenshtein Distance

Citation

Putra, Made Edwin Wira, and Iping Supriana. "Structural Offline Handwriting Character Recognition Using Levenshtein Distance."

PDF


Summary

This is one of the most recent research papers; it discusses offline handwriting recognition using a new metric. Earlier preprocessing methods are very expensive and use a lot of computing resources; this paper significantly improves recognition accuracy without relying on normalization techniques. The similarity metric used is the Levenshtein distance. The method was tested on digit and character images taken from the ETL-1 and AIST databases, giving accuracies of 84.69% on digits and 67.01% on alphabets.

Discussion

In the preprocessing step the image is passed through a thresholding stage, then thinning, and then slant correction. Features are extracted based on curve extraction, string feature representation and string graph representation. A string edit distance algorithm based on the Levenshtein distance is then applied. Levenshtein distance is the minimum number of edit operations (insertion, substitution and deletion) required to change one string into another; the algorithm computes it with dynamic programming over a 2-D array, thereby speeding up the computation.
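
The standard 2-D dynamic program for the Levenshtein distance looks like this:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions needed
    to turn string a into string b, via the standard 2-D DP table."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]
```

The paper applies this distance to string representations of extracted curve features rather than to raw text.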

Tuesday, December 1, 2015

Project Reading 4: Historical review of OCR research and development

Citation

Mori, Shunji, Ching Y. Suen, and Kazuhiko Yamamoto. "Historical review of OCR research and development." Proceedings of the IEEE 80.7 (1992): 1029-1058.

PDF


Summary

This paper is a review targeted specifically at OCR (Optical Character Recognition). OCR is a branch of character recognition where the input character, which might originally exist as stroke points, is first converted into an image before recognition is performed. The paper discusses research and development in the OCR field and the commercial uses of such techniques.

Discussion

First generation Optical Character Recognition (OCR) systems can be dated back to the 1930s, though OCR remained only a dream until the advent of computers in the 1950s. The main research started in the 1960s with the template matching approach; the second generation applied a structural analysis approach. The third generation of OCR brought many commercial applications of the academic research, particularly in zip code recognition for postal services. As time went on, OCR R&D moved toward word recognition using contextual knowledge such as addresses and names, and postal address reading machines came into use. This history tells us that OCR technology has been researched over a long time; lack of computing resources was a major hindrance to testing many techniques, but the situation improved with the advent of fast processors and computers.

Friday, November 27, 2015

Project Reading 3: Online and offline handwritten recognition: a comprehensive survey

Citation

Plamondon, Réjean, and Sargur N. Srihari. "Online and off-line handwriting recognition: a comprehensive survey." Pattern Analysis and Machine Intelligence, IEEE Transactions on 22.1 (2000): 63-84.

PDF


Summary

This is a survey paper providing a broad view of different online and offline handwriting recognition techniques, discussing the state of the art and many techniques specific to certain domains. For offline handwriting recognition the paper discusses a variety of preprocessing steps, including thresholding, noise removal, line segmentation (an important problem already discussed thoroughly in previous readings), and word and character segmentation. Then comes the character recognition step, which can be divided into two main areas: OCR (Optical Character Recognition), which treats the input character as an image rather than a set of sketched points and performs recognition on the image, and sketch-based recognition, which takes the input character as a set of points with x, y coordinates and possibly timestamps.

Discussion

In the online character recognition space the problem is even more complex, as the input character must be processed and a recognition result returned within a certain timeframe to support real-time applications. Many structural and rule-based methods have been explored in this area, some discussed in previous readings such as the papers by Rubine and Long. Statistical methods have also been applied, treating sketching as an incremental process in which each new stroke provides more information for the prediction; Markov models have been used to model this process.

Thursday, November 26, 2015

Project Reading 2: Template-based online character recognition

Citation

Connell, Scott D., and Anil K. Jain. "Template-based online character recognition." Pattern Recognition 34.1 (2001): 1-14.

PDF


Summary

The paper presents and demonstrates a template-based system for online character recognition where the number of representative templates is determined automatically. These templates can be viewed as representing different styles of writing a particular character, and they are used to classify characters, particularly with decision trees. Overall the system produced an accuracy of 86.9% on a dataset of around 18,000 characters covering the 26 lowercase letters and the 10 numerical digits.

Discussion

The input data was resampled to produce points equidistant in space rather than in time, and Gaussian filtering was applied to both the x and y coordinates. The paper discusses two classification methods, nearest neighbor classification and decision-tree-based classification, and uses a custom distance metric to compute the distance between an input sketch and a set of template sketches.
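
Equidistant-in-space resampling can be sketched as follows, using linear interpolation along the stroke's arc length (the helper name and parameters are illustrative):

```python
import math

def resample_equidistant(points, n):
    """Resample a stroke into n points equidistant in space rather than
    in time (assumes n >= 2 and at least two input points)."""
    # Cumulative arc length along the stroke.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    out, j = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        while j < len(points) - 2 and dists[j + 1] < target:
            j += 1
        span = dists[j + 1] - dists[j]
        t = (target - dists[j]) / span if span else 0.0
        out.append(((1 - t) * points[j][0] + t * points[j + 1][0],
                    (1 - t) * points[j][1] + t * points[j + 1][1]))
    return out

pts = resample_equidistant([(0, 0), (10, 0)], 5)
```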

Wednesday, November 25, 2015

Project Reading 1: Online handwriting recognition: the NPen++ recognizer

Citation

Jaeger, Stefan, et al. "Online handwriting recognition: the NPen++ recognizer."International Journal on Document Analysis and Recognition 3.3 (2001): 169-180.

PDF


Summary

The paper presents an online handwriting recognition system called NPen++. This recognition engine is based on multi-state time delay neural networks. The recognition accuracy was found to be around 96 percent for a dictionary of 5,000 words and around 93 percent for a dictionary of around 20,000 words. The preprocessing stage has various steps: normalizing size, normalizing rotation, interpolating missing points, smoothing, normalizing inclination, resampling and removing delayed strokes.

Discussion

The features computed for recognition are writing direction, curvature, pen-up/pen-down times, the hat feature, aspect, curliness, linearity, slope, ascenders/descenders, and context bitmaps. Classification is performed by Multi-State Time Delay Neural Networks (MS-TDNN). The system was evaluated on many datasets from UKA, CMU and MIT, which included both printed and cursive writing.


Sunday, November 15, 2015

Reading 27: A Visual Approach to Sketched Symbol Recognition

Citation

Ouyang, Tom Y., and Randall Davis. "A visual approach to sketched symbol recognition." (2009).



Summary

The paper presents an image-based method for sketch recognition. Earlier methods have used geometric or gesture-based features; this paper applies ideas from computer vision to sketch recognition. It presents 5 novel image-based features and an efficient distance metric for recognition, which improves accuracy significantly compared to state-of-the-art systems.
The intuition is that humans perceive sketches visually, as images, rather than as stroke points with geometric properties. The authors present 5 new features:
1) 4 features based on the stroke direction with respect to 4 reference angles (0, 45, 90 and 135 degrees). The feature values are calculated from the difference between the stroke angle and the reference angle, varying linearly between 1.0 (if the two are equal) and 0.0 (if they differ by more than 45 degrees).
2) 1 feature based on the endpoints of the stroke. It is equal to 1.0 if the point is at the beginning or end of a stroke and 0.0 otherwise.
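
The four orientation features can be sketched per point as follows, using the local segment direction (the modulo-180 handling is an assumption about how opposite directions are treated):

```python
import math

REF_ANGLES = [0.0, 45.0, 90.0, 135.0]   # reference directions in degrees

def direction_features(p0, p1):
    """Four per-point features comparing the local stroke direction with
    the reference angles: 1.0 when equal, falling linearly to 0.0 at a
    45-degree difference."""
    angle = math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0])) % 180.0
    feats = []
    for ref in REF_ANGLES:
        diff = abs(angle - ref)
        diff = min(diff, 180.0 - diff)   # directions are orientation-free
        feats.append(max(0.0, 1.0 - diff / 45.0))
    return feats
```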

The overall process is as follows:


Discussion

The distance metric the authors use to find the distance between an input sketch and a template sketch is as follows:


dx represents the shift of a point within a 3x3 neighborhood. The minimum distance among these shifts is taken to be the distance for that x, y location, and this is done for all x, y locations. The image below shows this:
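
A simplified sketch of this shift-tolerant matching on binary feature grids (the binary simplification and unit mismatch cost are assumptions; the paper matches real-valued feature images):

```python
def deformation_distance(img_a, img_b):
    """Distance between two equally sized binary grids where each 'on'
    cell of img_a may match img_b anywhere within a 3x3 neighbourhood;
    the minimum mismatch over those shifts is accumulated."""
    h, w = len(img_a), len(img_a[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if not img_a[y][x]:
                continue
            best = 1  # mismatch cost when no shifted match exists
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img_b[ny][nx]:
                        best = 0
                        break
                if best == 0:
                    break
            total += best
    return total

# Two vertical bars one column apart: every point matches under a shift.
a = [[0, 1, 0],
     [0, 1, 0],
     [0, 1, 0]]
b = [[1, 0, 0],
     [1, 0, 0],
     [1, 0, 0]]
```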

The paper applies two pruning methods:
1) Coarse candidate pruning, based on taking the first k principal component features and the Euclidean (L2) distance metric.
2) Hierarchical Clustering.

Saturday, November 14, 2015

Reading 26: Envisioning sketch recognition: a local feature based approach to recognizing informal sketches

Citation

Oltmans, Michael. Envisioning sketch recognition: a local feature based approach to recognizing informal sketches. Diss. Massachusetts Institute of Technology, 2007.

PDF


Summary

This thesis presents a novel way to recognize sketches not by their geometric features but by a new type of feature the author names the bullseye feature. The idea is to overlay the input sketch on a bullseye of concentric circles and count the number of ink points falling in each part of the shape; shapes are then matched by comparing how close these counts are. The bullseye features look like these:
The bullseye feature is essentially the number of ink points lying in each part of the bullseye. The radii of the concentric circles increase on a log scale, so the inner parts are smaller and the outer parts larger; the intuition is that the central parts of a sketch are more important than the outer parts.
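
Counting ink points into log-spaced ring and wedge bins can be sketched like this (ring and wedge counts and the base radius are illustrative choices, not the thesis's exact parameters):

```python
import math

def bullseye_histogram(points, center, n_rings=4, n_wedges=8, base_r=1.0):
    """Count ink points in each bin of a log-polar 'bullseye' centred at
    `center`: ring radii grow on a log scale so inner bins are smaller."""
    hist = [[0] * n_wedges for _ in range(n_rings)]
    for (x, y) in points:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        if r == 0:
            ring = 0
        else:
            ring = min(n_rings - 1, max(0, int(math.log2(r / base_r)) + 1))
        wedge = int((math.atan2(dy, dx) % (2 * math.pi))
                    / (2 * math.pi / n_wedges))
        hist[ring][min(wedge, n_wedges - 1)] += 1
    return hist

h = bullseye_histogram([(0.5, 0), (3, 0), (0, 2)], (0, 0))
```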

Discussion

By calculating the bullseye's histogram relative to the stroke direction, instead of relative to the x-axis, the bullseyes are rotationally invariant and do not change when the shape is drawn at a different orientation.
Each part is also divided into 4 sub-bins containing the number of points at a particular stroke angle, which encodes some sense of direction into the bullseye features. The bins look like this:
The distance metric used to measure the closeness of two bullseye features is weighted such that the inner circles count more than the outer circles, compensating for the larger size of the outer bins. The distance metric is a modified version of the common chi-square distance.
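
A weighted chi-square-style comparison of two bullseye histograms might look like this (the per-ring weights are illustrative; the thesis's exact modification is not reproduced here):

```python
def weighted_chi_square(h1, h2, weights):
    """Chi-square-style distance between two ring-by-wedge histograms,
    with per-ring weights favouring the inner rings."""
    total = 0.0
    for ring, w in enumerate(weights):
        for a, b in zip(h1[ring], h2[ring]):
            if a + b:
                total += w * (a - b) ** 2 / (a + b)
    return total
```
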
The shapes are represented in the form of match vectors in a codebook.

Reading 25: Who dotted that 'i'?: context free user differentiation through pressure and tilt pen data

Citation

Eoff, Brian David, and Tracy Hammond. "Who dotted that'i'?: context free user differentiation through pressure and tilt pen data." Proceedings of Graphics Interface 2009. Canadian Information Processing Society, 2009.

PDF


Summary

The paper presents a novel way to distinguish who is using the pen on a sketch input surface, using features based on pen tilt, pressure and speed. The paper presents two experiments:
1) The first experiment shows that tilt, pressure and speed features are consistent for a user within a given sketch and also remain consistent over spans of hours or days.
2) The second experiment tries to match a given set of tilt, pressure and speed features to a unique user. Using a k-nearest-neighbor classifier, the paper achieves an accuracy of 97.5 percent when distinguishing two users and 83.5 percent when distinguishing among ten users simultaneously.

Discussion

The paper presents 24 features based on pen tilt, pressure and speed: 14 based on tilt, 7 on pressure and 3 on speed. The features used in the paper are as follows:

The authors found that as the number of users to distinguish increases, the accuracy decreases.

Sunday, November 8, 2015

Reading 24: SketchREAD: a multi-domain sketch recognition engine

Citation

Alvarado, Christine, and Randall Davis. "SketchREAD: a multi-domain sketch recognition engine." Proceedings of the 17th annual ACM symposium on User interface software and technology. ACM, 2004.

PDF

Summary

SketchREAD (sketch recognition engine for many domains) is a multi-domain sketch recognition system capable of understanding freely drawn, messy, two-dimensional diagrammatic sketches. The system provides a method for users to describe shapes, which is used to set up appropriate recognizers for them. The system builds a Bayesian network and assigns a probability to each interpretation it produces.

Discussion

The system models each shape by a set of rules, and these rules can be used to recognize objects later. This is much more intuitive than earlier methods, as writing a domain-specific recognizer for each domain can be a very tedious task; instead, the system lets the user describe the shapes. An example showing the pattern used to describe a new shape:

Reading 23: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

Citation

Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989): 257-286.

PDF


Summary

The paper presents an in-depth overview of Hidden Markov Models (HMMs) and then gives a real-world example of their use in a speech recognition application. In Hidden Markov Models the system being modeled is assumed to be a Markov process with unobserved, or hidden, states. An HMM can be seen as the simplest dynamic Bayesian network. For example, an HMM with 4 states and 3 observed events is shown below (probabilities are not marked along the edges):


Discussion

Forward chaining is a method in which we compute the probabilities of events moving forward in time: at each step we calculate the probabilities by multiplying the probabilities of the current events with the probabilities of the past events, and at each stage we can prune out states that cannot be maximal in any case. In backward chaining we do the same thing in the reverse direction, from the current event back to events in the past.
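
The forward computation can be sketched as follows; the two-state model and its probabilities below are made up for illustration:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: at each step, the probability of each state is
    the sum over previous states of (previous probability * transition),
    times the emission probability of the current observation."""
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: emit_p[s][o] * sum(prev[r] * trans_p[r][s]
                                            for r in states)
                      for s in states})
    # Total probability of the observation sequence under the HMM.
    return sum(alpha[-1].values())

# Illustrative two-state model with one observable symbol per step.
states = ["rain", "dry"]
start_p = {"rain": 0.5, "dry": 0.5}
trans_p = {"rain": {"rain": 0.7, "dry": 0.3},
           "dry":  {"rain": 0.3, "dry": 0.7}}
emit_p = {"rain": {"umbrella": 0.9, "none": 0.1},
          "dry":  {"umbrella": 0.2, "none": 0.8}}
p = forward(["umbrella", "umbrella"], states, start_p, trans_p, emit_p)
```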

Saturday, October 31, 2015

Reading 22: HMM-based efficient sketch recognition

Citation

Sezgin, Tevfik Metin, and Randall Davis. "HMM-based efficient sketch recognition." Proceedings of the 10th international conference on Intelligent user interfaces. ACM, 2005.



Summary

This paper presents a novel way of doing sketch recognition by considering it as an interactive process where strokes are entered one after another. The system exploits the fact that the stroke ordering used to draw a complex shape tells a lot about which shape is being drawn. Earlier sketch recognition systems used only geometric or gesture-based methods to decide which shape is being drawn. The authors conducted a user study and found that most people follow a very strict stroke ordering when drawing a complex shape.

Sketch recognition involves grouping strokes that constitute the same object (segmentation) and determining the object class for each group (recognition). The framework the authors introduce in this paper provides a mechanism for capturing an individual's preferred stroke ordering during sketching, and uses it for efficient segmentation and recognition of hand-sketched objects in polynomial time.

Discussion

The system uses Hidden Markov Models to model individual shapes. When a stroke is input the system checks among the trained HMM models to give probabilities of the predicted shapes. The shape with the maximum probability is chosen. The system is incremental, as the user adds more strokes to the sketch, the new sketch is again run through the trained HMM models to get the new classification probabilities for the complex shape.

The system also differs from numerous sketch recognition systems by its ability to do recognition with only polynomial time and space complexity, and by its utilization of drawing order for capturing and modeling user sketching styles.

Reading 21: LADDER, a sketching language for user interface developers

Citation

Hammond, Tracy, and Randall Davis. "LADDER, a sketching language for user interface developers." Computers & Graphics 29.4 (2005): 518-532.

PDF


Summary

This paper presents LADDER, a language to describe how sketches are drawn, displayed and edited in a domain. These descriptions are then transformed into actual recognizers which can recognize sketches and exhibit the described display and editing behaviors. Such a system lets any scientist create a sketch recognition system tuned to a particular domain by specifying all the shapes that need to be recognized in that domain.

The five main things that define a shape in LADDER are:
1) Components: The components (mainly primitives) that make up the new shape.
2) Constraints: Constraints on how the components are organized in the sketch.
3) Aliases: Names given to components that can be reused when describing certain properties.
4) Editing: A set of triggers and actions describing editing behavior on the sketch, such as double-clicking a shape causing it to be selected.
5) Display: How the shape should be displayed: whether the components are shown as drawn or beautified, and whether some color is applied to the stroke.

Discussion

LADDER is also hierarchical, so new shapes can be described using existing defined shapes. It also supports abstract shapes, so that shapes belonging to a particular category inherit certain properties from a common parent class.
An example description for an arrow shape is shown below:


Reading 20: Recognizing sketched multistroke primitives

Citation

Hammond, Tracy, and Brandon Paulson. "Recognizing sketched multistroke primitives." ACM Transactions on Interactive Intelligent Systems (TiiS) 1.1 (2011): 4.

PDF

Summary

This paper presents a novel way to detect primitives drawn with multiple strokes. The system achieves this by joining strokes which seem logically connected but are actually separate; once the strokes are combined, it checks whether the new shape matches any primitive shape. The system uses low-level recognizers such as PaleoSketch to detect single-stroke primitives. The flowchart for the algorithm is as follows:


Discussion

The first step is graph building. A graph is built with each node representing an endpoint of a stroke, and edges connecting a node either to the other end of its stroke or to any nearby endpoint of another stroke. Two criteria are used to measure the nearness of two endpoints:
1) Euclidean distance between the endpoints divided by the total stroke length must be within a threshold.
2) Euclidean distance between the endpoints divided by the max of length or width of the bounding box must be within a threshold.
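
The two criteria can be combined into a simple predicate; the threshold values here are illustrative guesses, not the paper's tuned constants:

```python
def should_merge(gap_dist, stroke_len, bbox_w, bbox_h,
                 len_thresh=0.1, bbox_thresh=0.2):
    """Decide whether two endpoints are near enough to connect:
    both the gap-to-stroke-length ratio and the gap-to-bounding-box
    ratio must fall under their thresholds (values are illustrative)."""
    by_length = gap_dist / stroke_len <= len_thresh
    by_bbox = gap_dist / max(bbox_w, bbox_h) <= bbox_thresh
    return by_length and by_bbox
```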

After the graph is built, a graph searching algorithm is applied to extract strongly connected components. Tarjan's algorithm, which is based on depth-first search, finds the strongly connected components in linear time.

Each strongly connected component is assumed to form a single stroke, and the third step, stroke merging, is applied: all the stroke endpoints that share an edge in the strongly connected component are connected to each other using straight lines.

Once we get a new combined shape low level recognizers are used to detect primitives. Some additional steps to remove false positives are also applied.

Thursday, October 22, 2015

Reading 19: Recognizing Text Through Sound Alone

Citation

Li, Wenzhe, and Tracy Anne Hammond. "Recognizing text through sound alone." Twenty-Fifth AAAI Conference on Artificial Intelligence. 2011.

PDF

Summary

The paper presents a novel way to detect English characters drawn using a pen, key or fingernail on a rough surface, using sound alone. The key idea is that, just as our pen slows down at corners while sketching, the sound of the stroke changes with the speed of the stroke. The authors first apply endpoint noise removal, for which two algorithms are discussed. The first calculates the energy of the signal over the first 150 ms and calls this the environmental noise level; then, for each 45 ms frame, it compares the frame's energy to the environmental noise level, and if E(frame)*T > E(noise) that frame is considered the start of the signal. The same is done from the other end to trim the tail of the signal. The second method uses the first 20 ms of the signal to calculate a Gaussian mean and standard deviation; then, for each 10 ms segment, (x - mean)/st.deviation is calculated, and if this value is below a threshold of 3 the segment is considered valid.
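
The first endpoint-trimming method can be sketched on per-frame energies as follows (the frame count and threshold T below are illustrative stand-ins for the paper's 150 ms / 45 ms settings):

```python
def find_signal_start(energies, noise_frames=3, t=0.5):
    """Return the index of the first frame considered signal: the
    environmental noise level is the mean energy of the initial frames,
    and a frame starts the signal when E(frame) * T > E(noise).
    Returns None if no frame qualifies."""
    noise = sum(energies[:noise_frames]) / noise_frames
    for i, e in enumerate(energies):
        if e * t > noise:
            return i
    return None

# Quiet frames followed by a loud stroke sound.
start = find_signal_start([1.0, 1.1, 0.9, 1.0, 9.0, 8.5])
```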

Discussion

The authors use two main features to distinguish different shapes. The first is the mean amplitude of the signal. The second is MFCC (Mel-Frequency Cepstral Coefficients), which are difficult to calculate and outside the scope of the paper, but serve as very effective features. Each shape entered by the user is compared against templates of all the shapes and the best match is found. The formula used for template matching is as follows:

Saturday, October 17, 2015

Reading 18: Geometry Review

Citation

Handout distributed in class.

Summary

The handout distributed in class covers some basics of geometry. It starts with the Pythagorean theorem, which states that in a right-angle triangle the hypotenuse squared equals the base squared plus the height squared. If the angle between the base and the hypotenuse is theta, then the length of the base is h*cos(theta) and the height is h*sin(theta), where h is the length of the hypotenuse.
Also, the relationship between the sides of a triangle is discussed:

The relation between the sides of a parallelogram and its area is also discussed. The area of the parallelogram with one vertex at the origin and adjacent vertices (ax, ay) and (bx, by) is the determinant of the matrix [ax, bx; ay, by].

Discussion

To prove that the area of the parallelogram equals the determinant of the matrix made from the coordinates of its vertices, first divide the parallelogram along its diagonal into two equal triangles, then calculate the area of each triangle using the formulas discussed previously and add the two areas. The result comes out to ax*by - ay*bx, which is the determinant of the matrix [ax, bx; ay, by].
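
As a quick check, the determinant formula can be computed directly (a small illustrative helper):

```python
def parallelogram_area(a, b):
    """Area of the parallelogram spanned by vectors a and b from the
    origin: the absolute value of the 2x2 determinant ax*by - ay*bx."""
    return abs(a[0] * b[1] - a[1] * b[0])
```

For axis-aligned vectors (3, 0) and (0, 4), this gives the familiar base-times-height area of 12, and collinear vectors give area 0.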

Reading 17: PaleoSketch: accurate primitive sketch recognition and beautification

Citation

Paulson, Brandon, and Tracy Hammond. "PaleoSketch: accurate primitive sketch recognition and beautification." Proceedings of the 13th international conference on Intelligent user interfaces. ACM, 2008.

PDF


Summary

PaleoSketch provides a new low-level primitive recognizer with a ranking-based system for providing different interpretations of the same sketch. The paper introduces two new features which very effectively distinguish between most primitive shapes. The primitive shapes considered are line, arc, circle, ellipse, polyline, helix, spiral and curve; a sketch can be recognized as a combination of one or more primitives. The first new feature is NDDE (Normalized Distance between Direction Extremes), which is the distance between the direction extreme points divided by the total stroke length. The other is DCR (Direction Change Ratio), which is the maximum change in direction divided by the average change in direction.
The paper also introduces a new ranking algorithm which ranks different interpretations of the same sketch by adding the rankings of the low-level primitives: a line has rank 1, an arc rank 3, and helix, spiral and ellipse rank 5.

Discussion

NDDE is usually close to 1 for highly curved shapes whose direction changes in only one rotational sense; circles, helixes and spirals will have high NDDE. For shapes like lines and polylines, where the distance between direction extreme points is not the same as the length of the stroke, the NDDE value is low. DCR is high for shapes with a sudden change in direction, since the maximum change in direction is then high compared to the average change; so DCR is high for polylines and shapes with sharp edges, and low for highly curved shapes, where the direction always changes slightly rather than sharply.
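
Both features can be computed directly from stroke points; a sketch assuming an equidistantly resampled stroke (the helper names are mine, not PaleoSketch's):

```python
import math

def _directions(points):
    """Direction of travel for each segment along the stroke."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def ndde(points):
    """Normalized Distance between Direction Extremes: arc length
    between the segments of minimum and maximum direction, divided by
    the total stroke length."""
    dirs = _directions(points)
    seg = [math.hypot(x1 - x0, y1 - y0)
           for (x0, y0), (x1, y1) in zip(points, points[1:])]
    cum = [0.0]
    for s in seg:
        cum.append(cum[-1] + s)
    i_min = min(range(len(dirs)), key=dirs.__getitem__)
    i_max = max(range(len(dirs)), key=dirs.__getitem__)
    return abs(cum[i_max] - cum[i_min]) / cum[-1]

def dcr(points):
    """Direction Change Ratio: maximum direction change divided by the
    average direction change along the stroke."""
    dirs = _directions(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    avg = sum(changes) / len(changes)
    return max(changes) / avg if avg else 0.0

# A gentle arc has NDDE near 1; a right-angle polyline has high DCR.
arc = [(math.cos(-math.pi / 4 + i * math.pi / 16),
        math.sin(-math.pi / 4 + i * math.pi / 16)) for i in range(9)]
elbow = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```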

Sunday, October 11, 2015

Reading 16: Combining corners from multiple segmenters

Citation

Wolin, Aaron, Martin Field, and Tracy Hammond. "Combining corners from multiple segmenters." Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling. ACM, 2011.



Summary

This paper introduces a novel way to detect corners in sketches. Previous corner detection algorithms work well in one case but may fail in another. This paper introduces a hybrid approach that combines the results of successful corner-finding algorithms and tries to find the best subset of all the points detected by those algorithms, using subset selection. The algorithm is as follows:
1) Take the corners detected by five well-known corner-finding algorithms: PaleoSketch, Douglas-Peucker, Sezgin, Kim and Kim, and ShortStraw.
2) Take all these corners as the initial set and apply subset selection using mean squared error (MSE) as the error metric.
3) Output the best subset.

Discussion

The subset selection algorithm used in this paper also allows re-adding any point that was removed earlier. MSE itself was not used as the threshold for finding the best subset, because the difference in MSE can be high for large-scale sketches and small for small-scale sketches, so a ratio of MSEs was taken instead. To find the threshold for the best subset in the delta-MSE graph, a data-driven approach was used: a system was trained on existing sketches, and the MSE ratio from n+1 to n corners, named R(below), and from n to n-1 corners, named R(above), was calculated, where n is the correct number of corners in that sketch. It was observed that R(below) values lie close to 1, since removing irrelevant corners does not increase MSE much, whereas R(above) values have a higher range. R(below) and R(above) were represented as Gaussian curves, and their intersection was taken as the threshold, which came out to be 1.99. To fit R(below) and R(above) to Gaussian curves, the median and MAD (Median Absolute Deviation) of the collection were used in place of the mean and standard deviation. The accuracy of this method was observed to be higher than any previous method; more specifically, all-or-nothing accuracy was considered the most important metric for measuring the accuracy of this corner-finding algorithm.

Reading 15: Revisiting ShortStraw: improving corner finding in sketch-based interfaces

Citation

Xiong, Yiyan, and Joseph J. LaViola Jr. "Revisiting ShortStraw: improving corner finding in sketch-based interfaces." Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling. ACM, 2009.

PDF


Summary

This paper introduces a new corner-finding method called IStraw, which tries to improve the existing method ShortStraw. The two major shortcomings of ShortStraw that the authors discuss are that it does not use the direction information already present to enhance the corner-finding algorithm, and that it cannot handle curves, working only for polylines. The authors also point out that there are no straw lengths for the first w and last w points, where w is the window used to compute the straw lengths.

Discussion

The authors introduce formulas for computing straw lengths for the first w and last w points; the idea is that missing straw lengths at these points can lead to missed corners. Instead of ShortStraw's fixed empirical threshold, IStraw uses a dynamic threshold. The paper also introduces a novel way to differentiate between a polyline intersection and a curve: take equidistant points on either side of the detected corner. At a polyline intersection, the angle stays the same as these points move closer to the corner, whereas on a curve the angle increases as they approach. This test is used to reject corners that were detected earlier but actually lie on curves. Overall, IStraw improves all-or-nothing accuracy over ShortStraw and earlier methods.
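A rough C++ sketch of the curve-versus-corner test, assuming we've already sampled point pairs at two distances from the detected corner. This is my own illustration, and the tolerance value is my assumption, not the paper's:

```cpp
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

// Angle (in radians) at corner c formed by points a and b.
double angleAt(Pt a, Pt c, Pt b) {
    double v1x = a.x - c.x, v1y = a.y - c.y;
    double v2x = b.x - c.x, v2y = b.y - c.y;
    double dot = v1x * v2x + v1y * v2y;
    double n1 = std::hypot(v1x, v1y), n2 = std::hypot(v2x, v2y);
    return std::acos(dot / (n1 * n2));
}

// Sample a point pair far from the detected corner and a pair close to it.
// At a polyline corner the angle stays roughly flat; on a smooth curve it
// opens up as the pair moves in.
bool looksLikeCurve(Pt farA, Pt farB, Pt nearA, Pt nearB, Pt corner,
                    double tolerance = 0.2) {
    return angleAt(nearA, corner, nearB) - angleAt(farA, corner, farB) > tolerance;
}
```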


Reading 14: ShortStraw: A Simple and Effective Corner Finder for Polylines

Citation

Wolin, Aaron, Brian Eoff, and Tracy Hammond. "ShortStraw: A Simple and Effective Corner Finder for Polylines." SBM. 2008.

PDF


Summary

This paper introduces a novel way to find corners in a sketch without using any speed or direction information. The basic idea is to first resample the sketch into equidistant points. Then, for each point n, take a window of points (w = 3) on either side and compute the length of the line segment, or "straw," joining points n-w and n+w. The intuition is that this length will be smaller at corners. All straw lengths are computed, and 0.95 times the median straw length is taken as the corner threshold.
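The bottom-up pass can be sketched roughly like this. This is my own simplified version: the real algorithm picks the local minimum within each run of below-threshold straws, which I omit here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Point { double x, y; };

double dist(Point a, Point b) { return std::hypot(a.x - b.x, a.y - b.y); }

// The "straw" at point i is the chord length between points i-w and i+w;
// points whose straw falls below 0.95 times the median straw are corners.
std::vector<int> strawCorners(const std::vector<Point>& pts, int w = 3) {
    std::vector<double> straws(pts.size(), 0.0);
    std::vector<double> sorted;
    for (size_t i = w; i + w < pts.size(); ++i) {
        straws[i] = dist(pts[i - w], pts[i + w]);
        sorted.push_back(straws[i]);
    }
    std::sort(sorted.begin(), sorted.end());
    double threshold = 0.95 * sorted[sorted.size() / 2];   // 0.95 * median
    std::vector<int> corners;
    for (size_t i = w; i + w < pts.size(); ++i)
        if (straws[i] < threshold)
            corners.push_back((int)i);   // local-minimum selection omitted
    return corners;
}
```

On an L-shaped polyline resampled into unit steps, the only point whose straw dips below the threshold is the actual corner.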

Discussion

The algorithm is a two-step process. In the bottom-up step, corners are found using the straw lengths. In the top-down step, further checks find any missed corners and handle special cases. Each found corner is checked to see whether the stroke around it forms a line; corners that fail the line test may be removed. Each triplet of consecutive corners is then passed through a collinearity test: if the three corners are collinear and each segment passed the line test, the middle corner is redundant. ShortStraw performed significantly better than the previous algorithms of Sezgin et al. and Kim and Kim.
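The collinearity test can be approximated with a triangle-area check; this is my own sketch, and the epsilon is an arbitrary choice rather than the paper's threshold:

```cpp
#include <cassert>
#include <cmath>

// If three consecutive corners are (nearly) collinear, the middle one is
// redundant and can be dropped.
bool middleCornerRedundant(double ax, double ay, double bx, double by,
                           double cx, double cy, double eps = 1e-6) {
    // Twice the triangle area; zero means the three points are collinear.
    double area2 = std::fabs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay));
    return area2 < eps;
}
```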

Reading 13: A domain-independent system for sketch recognition

Citation

Yu, Bo, and Shijie Cai. "A domain-independent system for sketch recognition." Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia. ACM, 2003.


Summary

This paper introduces a domain-independent system for sketch recognition. Many sketch recognition systems use domain knowledge to identify shapes; this paper introduces a more general recognizer that can be used as a first step before applying any domain knowledge. The paper introduces the concept of feature area to match a given stroke with primitive shapes: a stroke is compared against all of the primitive shapes using feature area, and if a match is found, the stroke is classified as that primitive. If no match is found, the stroke is recursively broken down into smaller strokes, which are then matched against the primitives.

Discussion

Feature area is an important metric that can be used in many other ways. It offers an effective way to recognize corners: divide the stroke into line segments using feature area and take the intersections of those lines as corners. The paper also uses a direction graph to distinguish between lines, circles, and helices. The direction graph of a line segment is flat, since the direction is constant; the direction graph of a circle is a straight line with slope 2π/n, where n is the number of points in the circle. If the total direction change exceeds 2π, the shape is more likely a spiral or helix.
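Here's a rough C++ sketch of computing the total direction change, which underlies the direction-graph test; thresholding the result into line/circle/helix is left to the caller, and the code is my own illustration:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct P { double x, y; };

// Accumulate the turning between consecutive segments. Near-zero total
// means a line, about 2*pi means a closed circle, and well past 2*pi
// suggests a spiral or helix.
double totalDirectionChange(const std::vector<P>& pts) {
    double total = 0.0;
    for (size_t i = 2; i < pts.size(); ++i) {
        double a1 = std::atan2(pts[i-1].y - pts[i-2].y, pts[i-1].x - pts[i-2].x);
        double a2 = std::atan2(pts[i].y - pts[i-1].y, pts[i].x - pts[i-1].x);
        double d = a2 - a1;
        while (d > M_PI)   d -= 2 * M_PI;   // unwrap to (-pi, pi]
        while (d <= -M_PI) d += 2 * M_PI;
        total += d;
    }
    return total;
}
```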

Saturday, October 3, 2015

Reading 12: Sketch based interfaces: early processing for sketch understanding

Citation

Sezgin, Tevfik Metin, Thomas Stahovich, and Randall Davis. "Sketch based interfaces: early processing for sketch understanding." ACM SIGGRAPH 2006 Courses. ACM, 2006.

PDF

Summary

The important contribution of this paper is its vertex (corner) detection algorithm. The paper builds on the earlier established observation that pen speed drops when a corner is reached while drawing a freehand sketch, and combines this with additional heuristics to find candidate corner points. Direction, curvature, and speed graphs are the main tools used. The algorithm is as follows:
All points that exceed a certain threshold in the curvature graph or fall below a threshold in the speed graph are collected in sorted order. Then, at each step, a point taken from either the curvature-sorted or the speed-sorted candidates is added to the corner set, chosen according to an error metric.

Discussion

The error metric used in the corner-finding algorithm is the sum of squared orthogonal distances between the actual sketch and the proposed hybrid fit, referred to in the paper as OSDQ. Curves are handled separately from lines and polylines. Once the basic primitive shapes are recognized, the sketch is passed through a beautification stage where the slopes of line segments are adjusted about their midpoints to better match the drawn stroke.
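A minimal sketch of the OSDQ idea for a single line fit; this is my own illustration, and the paper computes the metric over the whole hybrid fit rather than one segment:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt2 { double x, y; };

// Sum of squared orthogonal distances from stroke points to the fitted
// line through (x1,y1)-(x2,y2).
double osdq(const std::vector<Pt2>& stroke,
            double x1, double y1, double x2, double y2) {
    double dx = x2 - x1, dy = y2 - y1;
    double len = std::hypot(dx, dy);
    double err = 0.0;
    for (const Pt2& p : stroke) {
        // Perpendicular distance from the infinite line, via the cross product.
        double d = ((p.x - x1) * dy - (p.y - y1) * dx) / len;
        err += d * d;
    }
    return err;
}
```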

Friday, October 2, 2015

Reading 11: What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Citation

Paulson, Brandon, et al. "What!?! no Rubine features?: using geometric-based features to produce normalized confidence values for sketch recognition." HCC Workshop: Sketch Tools for Diagramming. 2008.

PDF

Summary

This paper claims that geometric features are much more effective for recognizing hand-drawn sketches than gesture-based features such as Rubine's. The authors combine gesture-based features (Rubine's) with geometric features from other work (e.g., PaleoSketch) into a feature set of 44 features in total. A feature-subset selection is then performed to find the most effective features, and it is determined that geometric features contribute far more to accurate decisions than gesture-based features.

Discussion

Geometric-feature-based sketch recognition is much more intuitive, as it is closer to how our eyes perceive drawn objects: when we see the final sketch, we do not consider how it was drawn, though such information can further aid classification. For example, with Rubine's features, a circle drawn clockwise produces different feature values than a circle drawn counter-clockwise, even though the two shapes look identical to our eyes. This paper validates the important observation that geometric features are much better at distinguishing sketches than gesture-based features.

Thursday, October 1, 2015

Reading 10: Visual Similarity of Pen Gestures

Citation

Long Jr, A. Chris, et al. "Visual similarity of pen gestures." Proceedings of the SIGCHI conference on Human Factors in Computing Systems. ACM, 2000.

PDF

Summary

Long used important features from Rubine and introduced 11 more that enhanced recognition of single-stroke freehand gestures. The last two features from Rubine's paper, maximum speed and total time, were dropped because they are mostly independent of the shape of the gesture and depend more on the user drawing it. The 11 new features introduced by Long are:
1) Aspect
2) Curviness
3) Total angle traversed / Total length
4) Density Metric 1
5) Density Metric 2
6) Non-subjective Openness
7) Area of bounding box
8) Log(Area)
9) Total Angle / Total Absolute Angle
10) Log(Total Length)
11) Log(Aspect)
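As a small illustration, two of these bounding-box features (#7 and #8) could be computed like this; this is my own sketch, not Long's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct PtL { double x, y; };

// Feature #7: area of the bounding box. Feature #8: its log, which
// compresses the dramatic scale differences between small and large gestures.
double bboxArea(const std::vector<PtL>& pts) {
    double minx = pts[0].x, maxx = pts[0].x, miny = pts[0].y, maxy = pts[0].y;
    for (const PtL& p : pts) {
        minx = std::min(minx, p.x); maxx = std::max(maxx, p.x);
        miny = std::min(miny, p.y); maxy = std::max(maxy, p.y);
    }
    return (maxx - minx) * (maxy - miny);
}

double logArea(const std::vector<PtL>& pts) { return std::log(bboxArea(pts)); }
```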

Discussion

Many of the features in Long's paper are logs of other quantities. For example, the log of the area is used because raw areas differ dramatically between small and large objects; the log maps this dramatic scale to a much less extreme one, which is more effective for recognition.

Saturday, September 26, 2015

Reading 9: Gesture recognition features overview

Citation

Class handout (not for distribution).

Summary

The chapter covers the basic features that can be used to classify gestures or sketches. These include the 13 features introduced by Rubine in the paper discussed in the previous reading, plus the 11 additional features introduced by Long. Long's features are largely derived from or similar to Rubine's, such as the density metrics or the area of the bounding box. These extra features help in classifying more complex gestures but do not improve accuracy much over Rubine's feature set.

Discussion

Certain features such as the area of the bounding box can have dramatic effects on classification, since area is a multiplicative quantity and small differences get multiplied. So the log of the area is taken to map this dramatically varying feature to a tamer one. Most of the features are derived from Rubine's features; for example, one of the density metrics is the total stroke length divided by the length of the bounding box diagonal.
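That density metric is easy to sketch; this is my own code, assuming the stroke is a simple point list:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct PD { double x, y; };

// Density metric: total stroke length divided by the bounding-box diagonal.
// A dense, scribbly stroke scores high; a straight stroke scores close to 1.
double density(const std::vector<PD>& pts) {
    double len = 0.0;
    double minx = pts[0].x, maxx = pts[0].x, miny = pts[0].y, maxy = pts[0].y;
    for (size_t i = 0; i < pts.size(); ++i) {
        if (i) len += std::hypot(pts[i].x - pts[i-1].x, pts[i].y - pts[i-1].y);
        minx = std::min(minx, pts[i].x); maxx = std::max(maxx, pts[i].x);
        miny = std::min(miny, pts[i].y); maxy = std::max(maxy, pts[i].y);
    }
    return len / std::hypot(maxx - minx, maxy - miny);
}
```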

Friday, September 25, 2015

Reading 8: Specifying Gestures by Example

Citation

Rubine, Dean. Specifying gestures by example. Vol. 25. No. 4. ACM, 1991.

PDF

Summary

Rubine defines 13 features that can be used to classify sketches. The 13 features are as follows:
1) Cosine of starting angle
2) Sine of starting angle
3) Length of bounding box diagonal
4) Angle of bounding box diagonal
5) Length of start to end point
6) Cosine of angle from start to end point
7) Sine of angle from start to end point
8) Stroke length
9) Total relative rotation
10) Total absolute rotation
11) Total squared rotation (sharpness)
12) Maximum speed
13) Total time
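A few of the simpler geometric features (#3, #5, and #8) can be sketched as follows; this is my own illustration, not Rubine's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct PR { double x, y; };

// Feature #3: length of the bounding-box diagonal.
double bboxDiagonal(const std::vector<PR>& s) {
    double minx = s[0].x, maxx = s[0].x, miny = s[0].y, maxy = s[0].y;
    for (const PR& p : s) {
        minx = std::min(minx, p.x); maxx = std::max(maxx, p.x);
        miny = std::min(miny, p.y); maxy = std::max(maxy, p.y);
    }
    return std::hypot(maxx - minx, maxy - miny);
}

// Feature #5: distance from the start point to the end point.
double startToEnd(const std::vector<PR>& s) {
    return std::hypot(s.back().x - s.front().x, s.back().y - s.front().y);
}

// Feature #8: total stroke length.
double strokeLength(const std::vector<PR>& s) {
    double len = 0.0;
    for (size_t i = 1; i < s.size(); ++i)
        len += std::hypot(s[i].x - s[i-1].x, s[i].y - s[i-1].y);
    return len;
}
```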

Discussion

A linear classifier is also specified, in which each new class is trained by fitting a new linear function. During classification, the function value is computed for every class, and the class with the maximum value is chosen. This provides a very easy and efficient way to classify sketches, but it can be heavily influenced by a bad initial training example, which would degrade classification accuracy later.

Thursday, September 24, 2015

Reading 7: Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes

Citation

Wobbrock, Jacob O., Andrew D. Wilson, and Yang Li. "Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes." Proceedings of the 20th annual ACM symposium on User interface software and technology. ACM, 2007.

PDF

Summary

The paper introduces a relatively simple algorithm to classify gestures, intended for people with little or no background in pattern recognition. The algorithm has three preprocessing steps and one step for training or classification; training can be done from a single example. The three preprocessing steps are:
1) Resampling the stroke to a set of 64 points.
2) Achieving rotational invariance, by rotating the gesture so that the angle from the centroid to the first point becomes zero.
3) Uniformly scaling the gesture and aligning its centroid with a reference point.

The last step differs between training and classification. In training, the positions of the points are stored for later retrieval. During classification, the average distance between corresponding points is computed, and the best-matching class is chosen based on this distance.
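Step 2, the rotation to a zero indicative angle, could be sketched like this; it's my own version of the idea, not the paper's code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct PD1 { double x, y; };

// Compute the "indicative angle" from the centroid to the first point,
// then rotate the whole gesture about the centroid so that angle is zero.
std::vector<PD1> rotateToZero(std::vector<PD1> pts) {
    double cx = 0, cy = 0;
    for (const PD1& p : pts) { cx += p.x; cy += p.y; }
    cx /= pts.size(); cy /= pts.size();
    double theta = std::atan2(pts[0].y - cy, pts[0].x - cx);
    double c = std::cos(-theta), s = std::sin(-theta);
    for (PD1& p : pts) {
        double x = p.x - cx, y = p.y - cy;
        p.x = cx + x * c - y * s;
        p.y = cy + x * s + y * c;
    }
    return pts;
}
```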

Discussion

The algorithm is called the $1 recognizer because it is very cheap and easy: it can be implemented in under 100 lines of code. The idea behind eliminating rotational variance is also interesting and works in most easy cases. Golden section search, an efficient technique for finding the extremum of a unimodal function, is used to fine-tune the indicative rotation angle.

Wednesday, September 16, 2015

Reading 6: Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams

Citation

Bhat, Akshay, and Tracy Hammond. "Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams." IJCAI. Vol. 9. 2009.

PDF

Summary

Sketches in mechanical design or UML diagrams consist of both shapes and text. Existing systems are good at recognizing each separately, but not together. The paper presents a single feature, zero-order entropy, which can be used to distinguish between shapes and text with an accuracy of 92.06%. The main underlying idea is that the entropy (randomness) of text strokes is much higher than that of shape strokes.

Discussion

The strokes are converted into a string of characters over the alphabet (A, B, C, D, E, F, X), where A through F represent the angle a point in the stroke makes with its two neighboring points, quantized over the range 0 to π, and X marks stroke start points. This string is pre-processed and its entropy value is calculated. The entropy of text is visibly higher than that of shapes. One question that arises is whether very complex shapes could also have high entropy.
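The entropy computation itself is standard; here's a minimal sketch of my own over an already-built angle string (the character pre-processing is omitted):

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Zero-order entropy: H = -sum p(c) * log2 p(c) over symbol frequencies.
// Text strokes yield noticeably higher values than shape strokes.
double zeroOrderEntropy(const std::string& s) {
    std::map<char, int> freq;
    for (char c : s) freq[c]++;
    double h = 0.0;
    for (const auto& kv : freq) {
        double p = (double)kv.second / s.size();
        h -= p * std::log2(p);
    }
    return h;
}
```

A constant string like "AAAA" has entropy 0, while a maximally mixed "AB" has entropy 1 bit.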

Sunday, September 13, 2015

Reading 5: An image-based, trainable symbol recognizer for hand-drawn sketches

Citation

Kara, Levent Burak, and Thomas F. Stahovich. "An image-based, trainable symbol recognizer for hand-drawn sketches." Computers & Graphics 29.4 (2005): 501-517.

PDF

Summary

The paper presents a new, image-based way to recognize hand-drawn figures in a sketch. It achieves rotational invariance (using polar coordinates), which is difficult in template-matching systems; existing systems achieve rotational invariance only at a high cost unsuited to interactive use. The system requires a single prototype example to learn a new symbol.

Discussion

First, the definitions that are markedly dissimilar under the polar-coordinate comparison are pruned. The remaining candidates are passed through four classifiers, whose results are pooled to make the final decision. The four classifiers use the Hausdorff distance, the modified Hausdorff distance, the Tanimoto similarity coefficient, and the Yule coefficient. Since the four classifiers output values in different units, the outputs are converted to a common measure, normalized, and then combined.

Thursday, September 10, 2015

Reading 4: K-sketch: a 'kinetic' sketch pad for novice animators

Citation

Davis, Richard C., Brien Colwell, and James A. Landay. "K-sketch: a 'kinetic' sketch pad for novice animators." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2008.

PDF

Summary

The paper presents a 2-D animation sketching system called K-sketch that provides a simple and fast interface. In a comparison against PowerPoint, test participants completed their work three times faster and with less cognitive load.

Discussion

The interface design was treated as an optimization problem. Many operations are supported, such as moving, rotating, and transforming the sketch, and the final output is a playable video of the animation. I think such tools are not rich enough to replace professional animation tools, where accuracy and a precise way to define the animation matter more than being able to describe it easily and loosely.

Wednesday, September 9, 2015

Reading 3: iCanDraw? – Using Sketch Recognition and Corrective Feedback to Assist a User in Drawing Human Faces

Citation

Dixon, Daniel, Manoj Prasad, and Tracy Hammond. "iCanDraw: using sketch recognition and corrective feedback to assist a user in drawing human faces." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2010.

PDF

Summary

The system is designed to help people draw human faces. The human face in a reference image is first pre-processed using facial recognition algorithms to extract the major feature points of the face. These facial points are then used to provide suggestions and similarity scores to users. There is a heavy emphasis on teaching style and on how to guide the user to draw better; the approach adopted is step-by-step instruction coupled with timely corrective feedback.

Discussion

The reference image is pre-processed to extract 40 facial feature points, which form a template used to match and evaluate the user's drawn sketch. A similarity measure is established by considering a sliding window of 4 points in the user's sketch and computing the distance to the closest 4 points in the template. The user is given feedback based on the similarity score.

Tuesday, September 8, 2015

Reading 2: Sketch Recognition Algorithms for Comparing Complex and Unpredictable Shapes

Citation

Field, Martin, et al. "Sketch recognition algorithms for comparing complex and unpredictable shapes." IJCAI. 2011.

PDF

Summary

This paper provides further insight into the algorithms behind the Mechanix system discussed in Reading 1, focusing on the algorithms Mechanix uses to compare two sketches. The three metrics used for comparison are the Hausdorff distance, the modified Hausdorff distance, and a variant of the Tanimoto coefficient.

Discussion

The Hausdorff distance gives the maximum, over all points in one sketch, of the minimum distance to any point in the other sketch. The modified Hausdorff distance takes the mean of the directed distances from A to B and from B to A, and from it a match probability is calculated. The third metric is a modified version of the Tanimoto coefficient that accounts for the matching from sketch A to B as well as from B to A. An average of the three metrics is computed, and if it exceeds a threshold (0.65) the sketches are considered a match.
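The (symmetric) Hausdorff distance can be sketched as follows; this is a naive O(|A||B|) version of my own over point sets, not Mechanix's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct PH { double x, y; };

double d(PH a, PH b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Directed Hausdorff distance: for every point in A, find its nearest
// neighbour in B, and return the largest of those minima.
double directedHausdorff(const std::vector<PH>& A, const std::vector<PH>& B) {
    double worst = 0.0;
    for (const PH& a : A) {
        double best = 1e300;
        for (const PH& b : B) best = std::min(best, d(a, b));
        worst = std::max(worst, best);
    }
    return worst;
}

// Symmetric Hausdorff distance, the max of the two directed distances.
double hausdorff(const std::vector<PH>& A, const std::vector<PH>& B) {
    return std::max(directedHausdorff(A, B), directedHausdorff(B, A));
}
```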

Reading 1: Mechanix: A Sketch-Based Tutoring and Grading System for Free-Body Diagrams

Citation

Valentine, Stephanie, et al. "Mechanix: a sketch-based tutoring and grading system for free-body diagrams." AI Magazine 34.1 (2012): 55.


Summary

The paper introduces a tool named Mechanix for teaching courses that involve sketch-based assignments. When the number of students is very large, it becomes difficult for the instructor to review and grade individual assignments; moreover, the final answer alone doesn't show whether the student went through the right steps to reach it. Mechanix lets instructors create questions along with the expected sketched solution. The primary focus is on drawing trusses and applying forces and reaction forces at different points. From the student's sketch, the truss is first detected and forces are matched at its nodes. The student's response is then compared with the instructor's expected solution using the Hausdorff distance, the modified Hausdorff distance, and the Tanimoto coefficient. Students are also given feedback when their response is wrong at a given step, and must go through all required steps to complete the question.

Discussion

First the sketch is segmented into line segments. Two nodes sharing an edge with a second path between them indicate a truss: breadth-first search is used to find the shortest path between the two nodes, and if any other path also exists between them, a truss (a closed loop) is detected.
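The two-paths test boils down to cycle detection with BFS; here's my own rough sketch, not Mechanix's actual code:

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Run BFS and, if we ever reach an already-visited node through a new
// edge, there are two distinct paths between some pair of nodes, i.e. a
// closed loop as in a truss. Assumes a simple undirected graph.
bool hasTwoPaths(int n, const std::vector<std::pair<int,int>>& edges) {
    std::vector<std::vector<int>> adj(n);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    std::vector<int> parent(n, -2);          // -2 = unvisited
    for (int start = 0; start < n; ++start) {
        if (parent[start] != -2) continue;
        parent[start] = -1;
        std::queue<int> q; q.push(start);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adj[u]) {
                if (parent[v] == -2) { parent[v] = u; q.push(v); }
                else if (v != parent[u]) return true;   // second path found
            }
        }
    }
    return false;
}
```

A triangle of members (a closed loop) is flagged, while a simple chain is not.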

Wednesday, January 28, 2015

LeetCode: Container with most water

I had been trying to think about this problem for two days but really couldn't reach the optimal solution. I loved the problem once I studied the solution. Here's the problem:

https://oj.leetcode.com/problems/container-with-most-water/

Given n non-negative integers a1, a2, ..., an, where each represents a point at coordinate (i, ai). n vertical lines are drawn such that the two endpoints of line i is at (i, ai) and (i, 0). Find two lines, which together with x-axis forms a container, such that the container contains the most water.

Note: You may not slant the container.

The brute-force solution is easy to come up with: just check the area for each pair of lines and keep the pair with the maximum area. Running time is O(n^2).

However, this problem has a very neat O(n) solution. Start with two pointers, left = 0 and right = n-1. Calculate the area they enclose and check whether it is the maximum so far. Now, if the height at the left pointer is less than the height at the right pointer, increment left; otherwise decrement right. This is the part that is difficult to understand. Here's the intuition: if the left height is less than the right height, moving the right pointer inward cannot do any better, because even if the new right line is taller, the width shrinks and the left height remains the limiting minimum; and if the new right line is shorter, it certainly cannot help. So why increment left? Because if we reach a taller left line, we may get a new maximum area. Keep doing this while left < right.

Here's the code which got accepted in LeetCode Online Judge.

class Solution {
public:
    int maxArea(vector<int> &height) {
        int left = 0, right = height.size()-1;
        int max_area = 0, cur_area;
        while(left < right){
            // Area is the width times the shorter of the two lines.
            cur_area = (right-left)*min(height[left], height[right]);
            max_area = max(cur_area, max_area);
            // Move the pointer at the shorter line inward: moving the
            // taller one can never increase the area.
            if(height[left] < height[right])
                left++;
            else
                right--;
        }
        return max_area;
    }
};