Saturday, October 31, 2015

Reading 22: HMM-based efficient sketch recognition

Citation

Sezgin, Tevfik Metin, and Randall Davis. "HMM-based efficient sketch recognition." Proceedings of the 10th international conference on Intelligent user interfaces. ACM, 2005.



Summary

This paper presents a novel way of doing sketch recognition by considering it as an interactive process where strokes are entered one after another. The system captures that stroke ordering to draw a complex shape tell a lot about what shape is being drawn. Earlier sketch recognition systems only used geometric or gesture based methods to conclude what shape is being drawn. The authors conducted a user study and found out that most people follow very strict stroke ordering when drawing a complex system.

Sketch recognition involves grouping strokes that constitute the same object (segmentation) and determining the object class for each group (recognition). The framework the authors introduce in this paper provides a mechanism for capturing an individual's preferred stroke ordering during sketching, and uses it for efficient segmentation and recognition of hand-sketched objects in polynomial time.

Discussion

The system uses Hidden Markov Models to model individual shapes. When a stroke is input the system checks among the trained HMM models to give probabilities of the predicted shapes. The shape with the maximum probability is chosen. The system is incremental, as the user adds more strokes to the sketch, the new sketch is again run through the trained HMM models to get the new classification probabilities for the complex shape.

The system also differs from numerous sketch recognition systems by its ability to do recognition with only polynomial time and space complexity, and by its utilization of drawing order for capturing and modeling user sketching styles.

Reading 21: LADDER, a sketching language for user interface developers

Citation

Hammond, Tracy, and Randall Davis. "LADDER, a sketching language for user interface developers." Computers & Graphics 29.4 (2005): 518-532.

PDF


Summary

This paper presents LADDER which is a language to describe how sketches are drawn, displayed or edited in a domain. These descriptions are then transformed into actual recognizers which can recognize and show the described behaviors of displaying and editing the recognized sketch. Such system will let any scientist to create a sketch recognition system tuned for a particular domain by specifying all the shapes that need to be recognized for that particular domain.

The five main things that define a shape in LADDER are:
1) Components: The components (mainly primitives) consisting of the new shape.
2) Constraints: Constraints on how the components are organized in the sketch.
3) Aliases: Giving names to components that can be reused describing certain properties.
4) Editing: Set of triggers and actions which describe any editing behavior on the sketch. Such as double clicking the shape should case it to get selected.
5) Display: How the shape should be displayed, should the components be used as original or beautified or should some color be applied to the stroke. 

Discussion

LADDER is also hierarchical so new shapes can be described by using existing defined shapes. It also supports abstract shapes so that shapes belonging to a particular category inherit certain properties from a common parent class. 
An example description for an arrow shape is shown below:


Reading 20: Recognizing sketched multistroke primitives

Citation

Hammond, Tracy, and Brandon Paulson. "Recognizing sketched multistroke primitives." ACM Transactions on Interactive Intelligent Systems (TiiS) 1.1 (2011): 4.

PDF

Summary

This paper presents a novel way to detect multiple stroke primitives. The system achieves this by joining the strokes which may seem logically connected but are actually two different strokes. Once the strokes are combined then it is checked if the new shape match any primitive shape. The system uses low level recognizers such as Paleosketch to detect single stroke primitives against a database of known sketches. The flowchart for the algorithm is as follows:


Discussion

The first step is the graph building step. In this step a graph is built with each node representing an endpoint of a stroke and an edge representing a connection between the other end of the stroke or any other endpoint which is nearby but from another stroke. A special criterion is used to measure nearness of two nodes in the graph:
1) Euclidean distance between the endpoints divided by the total stroke length must be within a threshold.
2) Euclidean distance between the endpoints divided by the max of length or width of the bounding box must be within a threshold.

After the graph is built a graph searching algorithm is applied to get a strongly connected component from the graph. Tarjan's algorithm is used to find the strongly connected component in linear time. Tarjan's algorithm uses Depth First Search.

The strongly connected component is assumed to be a single stroke and the third step viz. stroke merging is applied. All the stroke endpoint that have an edge in the strongly connected component are connected to each other using a straight line.

Once we get a new combined shape low level recognizers are used to detect primitives. Some additional steps to remove false positives are also applied.

Thursday, October 22, 2015

Reading 19: Recognizing Text Through Sound Alone

Citation

Li, Wenzhe, and Tracy Anne Hammond. "Recognizing text through sound alone." Twenty-Fifth AAAI Conference on Artificial Intelligence. 2011.

PDF

Summary

The paper presents a novel way to detect English characters drawn using a pen, key or fingernail on a rough surface using only sound. The key idea is that as in sketching our pen slows down during corners, the sound of the stroke also changes with the speed of the stroke. In this algorithm first the authors apply endpoint noise removal. Two algorithms have been discussed for the same. The first algorithm calculates the energy of the signal for the first 150 ms and this level is called the environmental noise level. Then for each frame of 45 ms compares the energy of that frame to the environmental noise calculated earlier. If E(frame)*T > E(noise) then that frame is considered the start of the signal. The same is done to remove the endpoint from the behind. The second method uses first 20 ms of the signal to calculate the Gaussian mean and standard deviation. Then for each 10 ms segment we calculate (x - mean)/st.deviation . If this value is below the threshold of 3 then the segment is considered a valid segment.

Discussion

The authors use two main features to detect different shapes. The first is the mean amplitude of the signal. The second is the MFCC (Mel-Frequency Capstral Coefficients). They are difficult to calculate and outside the scope of the paper but serve as very effective feature. Then each shape entered by user is compared against a template of all the shapes and best match is found. The formula used for template matching is as follows:

Saturday, October 17, 2015

Reading 18: Geometry Review

Citation

Handout distributed in class.

Summary

The handout distributed in class contains some basics of geometry. It starts with the introduction of Pythagoras theorem which states in a  right angle triangle the hypotenuse squared is equal to base squared plus height squared. If the angle between the base and hypotenuse is theta then the length of the base is hCos(theta) and length of the height is equal to hSin(theta) where h is the length of the hypotenuse.
Also, relationship between the sides of a triangle is discussed:









Also relation relation between sides of a parallelogram and its area are discussed. The area of a parallelogram with diagonally opposite endpoints as (ax, ay) and (bx, by) is determinant of the matrix [ax, bx; ay, by]

Discussion

To prove that area of a parallelogram is equal to the determinant of the matrix made by the coordinates of the diagonally opposite points, you first divide the parallelogram into two equal triangles divided by the diagonal. Then calculate the area of each triangle using formulas discussed previously. Then add those two areas to get the final area which comes out to be ax*by - ay*bx which is the determinant of the matrix [ax, bx; ay, by]

Reading 17: PaleoSketch: accurate primitive sketch recognition and beautification

Citation

Paulson, Brandon, and Tracy Hammond. "PaleoSketch: accurate primitive sketch recognition and beautification." Proceedings of the 13th international conference on Intelligent user interfaces. ACM, 2008.

PDF


Summary

Paleosketch provides a a new low level primitive recognizer with ranking based system for providing different interpretations of the same sketch. The paper introduces two new features which very effectively distinguish between most primitive shapes. The primitive shapes considered in this paper are line, arc, circle, ellipse, polyline, helix, spiral, curve. A sketch could be recognized as combination or one or more primitives. The two new features are NDDE (Normalized distance between direction extremes) which is equal to the distance between the direction extreme points divided by the total stroke length. The other feature is DCR (Direction Change Ratio) which is maximum change in direction divided by average change in direction.
The paper also introduces a new ranking algorithm which provides ranking to different interpretations of the same sketch by adding the rankings of low level primitives. Line as rank 1, Arc 3, Helix, Spiral, Ellipse have rank 5.

Discussion

NDDE is usually close to 1 for highly curved shapes with direction changing in only one direction. For example circle, helix, spirals will have high NDDE. Whereas shapes like line, polyline where distance between direction extreme points is not the same as the length of the stroke the NDDE value is low. DCR is high for shapes where there is a sudden change in direction. So the maximum change in direction is high comparing it to average change in direction. So DCR will be high for polylines or shapes with sharp edges. DCR will be low for highly curved shapes as there is always slight change in direction and not a sharp one.

Sunday, October 11, 2015

Reading 16: Combining corners from multiple segmenters

Citation

Wolin, Aaron, Martin Field, and Tracy Hammond. "Combining corners from multiple segmenters." Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling. ACM, 2011.



Summary

This paper introduces a novel way to detect corners in sketches. Previous algorithms in corner detection work well for one case but might fail for another. This paper introduces a hybrid approach by combing the results of successful corner finding algorithms and trying to find the best subset of all the points detected by those algorithms. The paper uses subset selection to find the best subset. The algorithm is as follows:
1) Take corners detected by five famous corner finding algorithms viz PaleoSketch, Douglas Peucker, Sezgin, Kim and Kim, ShortStraw.
2) Take all these corners as initial set and apply subset selection by using Mean Square Error as the error metric.
3) Output the best subset.

Discussion

Subset selection algorithm used in this paper also allows to add any point that was removed earlier. Mean Square Error (MSE) itself was not used to set the threshold for finding the best subset because difference in MSE might be high for larger scale sketches and smaller for smaller scale sketches. So a ratio of the MSE's was taken. To find the threshold for finding the best subset in the delta MSE graph a dynamic approach was used. A system was trained on existing sketches and a delta MSE was calculated for n+1 to n named R(below) and n to n-1 named R(above) where n is the correct number of corners in that sketch. It was observed R(below) values lie close to 1 as removing irrelevant corners didn't increase MSE much. Whereas R(above) values had a higher range. R(below) and R(above) were represented as Gaussian curves and their intersection was taken as the threshold which came out to be 1.99. To fit R(below) and R(above) to Gaussian curves instead of using Mean and Standard Deviation of the collection, Median and MAD (Median Absolute Deviation) were used as Mean and Standard Deviation of the Gaussian Curve. The accuracy of this method was observed to be higher than any previous method. More specifically the all or nothing accuracy was considered the most important metric to measure the accuracy of this corner finding algorithm.

Reading 15: Revisiting ShortStraw: improving corner finding in sketch-based interfaces

Citation

Xiong, Yiyan, and Joseph J. LaViola Jr. "Revisiting ShortStraw: improving corner finding in sketch-based interfaces." Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling. ACM, 2009.

PDF


Summary

This paper introduces a new method for corner finding called IStraw. This method tries to improve the existing method called ShortStraw. The two major shortcomings that the authors discussed in ShortStraw are not using direction information which is already present to enhance the corner finding algorithm and ShortStraw cannot handle curves and works only for polylines. The authors also argue that there is no straw length for the first w and last w points where w is the window used to find the straw lengths.

Discussion

The authors introduces formulas for finding the straw lengths for the first w and last w points. The idea is that missing straw lengths for these points could lead to some missed corners. Instead of the fixed empirical threshold in ShortStraw, IStraw uses a dynamic threshold. The paper also introduces a novel way to differentiate between a polyline intersection and a curve. The idea is to take equidistant points on either side of the detected corner. In a polyline intersection as these points come closer to the corner the angle doesn't change. Whereas in curves as these points come closer to the detected corner the angle increase. This finding is used to classify an earlier detected corner to be false. Overall IStraw improves all or nothing accuracy over ShortStraw and previous methods.


Reading 14: ShortStraw: A Simple and Effective Corner Finder for Polylines

Citation

Wolin, Aaron, Brian Eoff, and Tracy Hammond. "ShortStraw: A Simple and Effective Corner Finder for Polylines." SBM. 2008.

PDF


Summary

This paper introduces a novel way to find corners in a sketch. This paper doesn't use any speed or direction information to find corners. The basic idea is to first re sample the sketch into points which are equidistant. Then take a window of points (w = 3) on left and right of a point. We calculate the length of the line segment joining the points n-w and n+w. The intuition is that this length will be smaller at the corners. All the lengths are calculated and 0.95 times the median length is taken as the threshold for the corners.

Discussion

This algorithm is a two step process, in  the bottom up step corners are found using the straw lengths. In the top-down step further calculations are done to find any missing corner or any special case. Each found corner is checked to be forming a line. If the corners don't pass the line test, then they may be removed. Then each three pair of corners are passed through co-linear test. If three corners are co-linear and each passed the line test then the middle corner is irrelevant. ShortStraw performed significantly better than previous algorithms from Sezgin and Kim and Kim.

Reading 13: A domain-independent system for sketch recognition

Citation

Yu, Bo, and Shijie Cai. "A domain-independent system for sketch recognition."Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia. ACM, 2003.


Summary

This paper introduces a domain independent system for sketch recognition. There are many sketch recognition systems that use domain knowledge to identify shapes. This paper introduces a more general recognizer which can be used as a first step before applying any domain knowledge. The paper introduces the concept of feature area to match given stroke with primitive shapes. A given stroke is matched with all of the primitive shapes with feature area. If a match is found the stroke is classified as that primitive. If no match is found the stroke is recursively broken down into smaller strokes and the smaller strokes are then used to match to primitives

Discussion

Feature are is an important metric which can be used in many other ways. This is an effective way to recognize corners as we can divide the stroke into line segments using feature area and take the intersection of those lines as corners. This paper also applies direction graph to distinguish between lines, circles and helix. The direction graph for a line segment will be a line segment. The direction graph for a circle will be a straight line with slope = 2*pie/n, where n is the number of the points in the circle. If the total direction change is more than 2*pie the shape is more likely to be a spiral or helix.

Saturday, October 3, 2015

Reading 12: Sketch based interfaces: early processing for sketch understanding

Citation

Sezgin, Tevfik Metin, Thomas Stahovich, and Randall Davis. "Sketch based interfaces: early processing for sketch understanding." ACM SIGGRAPH 2006 Courses. ACM, 2006.

PDF

Summary

The important contribution by this paper is the Vertex detection algorithm. The paper re-establishes an earlier proved concept that speed of the pen reduces when a corner is reached while drawing a freehand sketch. This paper uses this approach and some more heuristics to find possible points which are corners. Mainly direction, curvature and speed graphs are used to determine the corners. The algorithm is as follows:
All the points which exceed certain threshold in the curvature and speed graphs are taken in sorted order. Then at each point a point is added to the corner set taken from either the curvature sorted points or speed sorted points based on an error metric.

Discussion

The error metric used in the corner finding algorithm is the sum of orthogonal distance squared between the actual sketch and the proposed hybrid fit. This is referred to as OSDQ in the paper. Curves are handled separately from lines and poly lines in the paper. Once the basic primitive shapes are recognized the sketch is passed through a beautification stage where slopes of line segments are altered from mid-point to match the actual drawn curve.

Friday, October 2, 2015

Reading 11: What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Citation

Paulson, Brandon, et al. "What!?! no Rubine features?: using geometric-based features to produce normalized confidence values for sketch recognition." HCC Workshop: Sketch Tools for Diagramming. 2008.

PDF

Summary

This paper claims that geometric based features are much more effective in recognizing hand drawn sketches than gesture based features as those of Rubine's. This paper takes gestures based features i.e. the Rubine's features and geometric based features from other papers (PalioSketch) and creates a total feature set containing total 44 features. Then a subset selection of features is done to find the most effective features and it is determined that geometric based features are much more effective in making accurate decisions rather than gesture based features.

Discussion

Geometric features based sketch recognition is much more intuitive as it is more comparable to how our eyes perceive the drawn objects. When we see the final sketch we do not consider how the sketch was drawn. Though such information can further aid in classifying the sketch correctly. For example in Rubine's features a circle drawn clockwise will produce different feature values than a circle drawn counter-clockwise though the two shapes are visually similar to our eyes. This is a very important observation validated by this paper that geometric based features are much better at distinguishing sketches than gestures based features.

Thursday, October 1, 2015

Reading 10: Visual Similarity of Pen Gestures

Citation

Long Jr, A. Chris, et al. "Visual similarity of pen gestures." Proceedings of the SIGCHI conference on Human Factors in Computing Systems. ACM, 2000.

PDF

Summary

Long introduced used important features from Rubine and introduced 11 more features that enhanced recognition of single stroke freehand gestures. The last two features from Rubine's paper Maximum Speed and Total time were neglected because they were mostly independent from the shape of the gestures and were more dependent on the user who is drawing the sketch. The 11 new features introduced by Long are:
1) Aspect
2) Curviness
3) Total angle traversed / Total length
4) Density Metric 1
5) Density Metric 2
6) Non-subjective Openness
7) Area of bounding box
8) Log(Area)
9) Total Angle / Total Absolute Angle
10) Log(Total Length)
11) Log(Aspect)

Discussion

Many of the features used in Long's paper are log of quantities. For example Log of area is taken because it reduces the dramatic nature in difference in areas when the objects are small. The Log converts these dramatic scale into a much less extreme scale and is much more effective for the purpose of recognition.