Wednesday, October 20, 2010

Reading #5 $1 Recognizer

Comments on: 
Wenzhe Li
Summary:
The $1 recognizer consists of four steps:
1. Resample the points of the stroke to a fixed number N. This is achieved by adding a new point every L/N along the path, where L is the length of the path.
2. Rotate the path so that the angle formed between the centroid and the first point becomes 0.
3. Scale and translate: expand or contract the path to fit a square of width 'size', then translate it so its centroid lies at (0, 0).
4. Compute the sum of distances between corresponding points (the path distance) of the candidate and each template, then convert it into a matching score for each template.
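The four steps above can be sketched in Python roughly as follows; N = 64, SIZE = 250, and all helper names are my own assumptions for illustration, not code from the paper:

```python
import math

N = 64          # resample count (assumed; a common choice for $1)
SIZE = 250.0    # side length of the reference square (assumed value)

def path_length(pts):
    """Total arc length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def resample(points, n=N):
    """Step 1: walk the path and emit a point every L/(n-1)."""
    pts = list(points)
    interval = path_length(pts) / (n - 1)
    new_pts = [pts[0]]
    d_accum = 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d_accum + d >= interval:
            t = (interval - d_accum) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_pts.append(q)
            pts.insert(i, q)      # q becomes the start of the next segment
            d_accum = 0.0
        else:
            d_accum += d
        i += 1
    if len(new_pts) < n:          # rounding can drop the final point
        new_pts.append(pts[-1])
    return new_pts

def rotate_to_zero(pts):
    """Step 2: rotate so the centroid-to-first-point angle is 0."""
    cx, cy = centroid(pts)
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def scale_and_translate(pts, size=SIZE):
    """Step 3: scale to a size x size square, move centroid to (0, 0)."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Step 4: mean distance between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def score(dist, size=SIZE):
    """Convert a path distance into a 0..1 matching score."""
    return 1.0 - dist / (0.5 * math.sqrt(2 * size ** 2))
```

A candidate stroke would be pushed through steps 1-3 once, and step 4 run against every preprocessed template, keeping the template with the highest score.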
To find the optimal rotation angle, i.e. the one that minimizes the path distance, the author also analyzes the efficiency of hill-climbing and golden section search (GSS) on both similar and dissimilar gesture pairs. The results show that GSS is more efficient than hill-climbing.
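A minimal golden-section search over the rotation angle might look like the following; the ±45° search window and 2° tolerance are assumptions, and `f` stands in for "path distance of the candidate rotated by this angle":

```python
import math

PHI = 0.5 * (math.sqrt(5) - 1)   # ~0.618, the inverse golden ratio

def golden_section_search(f, lo, hi, tol=math.radians(2)):
    """Minimize f over [lo, hi] by golden-section search.
    The interval shrinks by a fixed factor per iteration, so the
    cost is bounded regardless of f -- unlike hill-climbing, which
    can spend many iterations on dissimilar pairs."""
    x1 = hi - PHI * (hi - lo)
    x2 = lo + PHI * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while abs(hi - lo) > tol:
        if f1 < f2:                       # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - PHI * (hi - lo)
            f1 = f(x1)
        else:                             # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + PHI * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)

# Usage sketch: best = golden_section_search(f, -math.pi / 4, math.pi / 4)
```

Because each iteration discards a fixed fraction of the interval, the number of evaluations is the same for similar and dissimilar pairs.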
The same four steps, however, prevent the $1 recognizer from recognizing gestures whose identity depends on orientation, aspect ratio, or position.

Discussion:
The advantage of hill-climbing in finding the optimal rotation angle is that the number of iterations is small for similar pairs; its drawback is the large number of iterations needed for dissimilar pairs. Since suboptimality only decreases the chance of a mistaken match, why not just cap the number of iterations with a threshold (e.g., 10)? Hill-climbing would then be as efficient as GSS while possibly increasing accuracy.
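The capped variant suggested above could be sketched as follows; the 1° step size and the default cap of 10 are illustrative choices, not values from the paper:

```python
import math

def capped_hill_climb(f, theta=0.0, step=math.radians(1), max_iters=10):
    """Hill-climb toward lower f(theta), giving up after max_iters
    iterations. On dissimilar pairs the early cutoff leaves a
    suboptimal angle, which only makes a wrong template's score
    worse, not better."""
    best = f(theta)
    for _ in range(max_iters):
        left, right = f(theta - step), f(theta + step)
        if left < best:
            theta, best = theta - step, left
        elif right < best:
            theta, best = theta + step, right
        else:
            break        # local minimum reached before the cap
    return theta, best
```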

Tuesday, October 19, 2010

Reading #14 Using Entropy to Represent Curvature

Comments on:

Summary:
This paper proposes an important feature that distinguishes shapes from text: the entropy of strokes. The paper argues that text strokes are more randomly structured than common shapes, so an entropy measure can expose the difference.
The paper defines an entropy-model 'alphabet'. Based on the angle a point forms with its adjoining points, each point is mapped to one of seven symbols in the alphabet (six of them represent ranges of angle, and the last one represents an end point). The result is an alphabet representation of the ink strokes whose entropy can be computed.
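A sketch of the alphabet mapping, assuming six 30-degree bins over [0°, 180°] and an end-point symbol 'X'; the paper's actual bin boundaries and symbol names may differ:

```python
import math

# Assumed 7-symbol alphabet: six 30-degree bins for the angle an
# interior point forms with its neighbors, plus 'X' for endpoints.
SYMBOLS = 'ABCDEF'

def to_symbol(prev, pt, nxt):
    """Map one resampled point to an alphabet symbol."""
    if prev is None or nxt is None:
        return 'X'                        # end-point symbol
    a1 = math.atan2(prev[1] - pt[1], prev[0] - pt[0])
    a2 = math.atan2(nxt[1] - pt[1], nxt[0] - pt[0])
    angle = abs(math.degrees(a1 - a2)) % 360
    if angle > 180:
        angle = 360 - angle               # fold into [0, 180]
    return SYMBOLS[min(int(angle // 30), 5)]

def stroke_to_string(pts):
    """Alphabet representation of a whole stroke."""
    padded = [None] + list(pts) + [None]
    return ''.join(to_symbol(padded[i - 1], padded[i], padded[i + 1])
                   for i in range(1, len(padded) - 1))
```

Straight segments land in the near-180° bin and sharp corners in lower bins, so smooth shapes yield repetitive strings while wiggly text yields varied ones.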
The implementation first uses spatial and temporal thresholds to group the strokes for later classification. It then resamples the strokes, converts them into the alphabet representation, and computes their entropy using Shannon's formula. After these steps, the strokes are classified against a threshold obtained from the training dataset. Besides text symbols and shapes, the method marks a stroke as "unclassified" when its entropy falls into neither the shape range nor the text range.
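The entropy computation and threshold classification might be sketched as below; the threshold values TEXT_MIN and SHAPE_MAX are hypothetical placeholders, not the trained values from the paper:

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon's formula: H = -sum(p_i * log2(p_i)) over symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical thresholds learned from training data (illustrative only).
TEXT_MIN, SHAPE_MAX = 1.5, 1.0

def classify(symbols):
    h = shannon_entropy(symbols)
    if h >= TEXT_MIN:
        return 'text'
    if h <= SHAPE_MAX:
        return 'shape'
    return 'unclassified'   # entropy falls between the two ranges
```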
The paper also computes a confidence for every classification from its distance to the threshold.
The results show that accuracy changes with the percentage of 'unclassified' strokes allowed. The threshold also proves reliable when used on different domains.
Discussion:
I think the insight behind the entropy method is that curvature is an important feature distinguishing text from shapes: text symbols tend to have larger curvature changes per unit length. Then, what is the advantage of using entropy instead of using curvature directly?
Also, I doubt that the entropy rate can be treated as a domain-independent feature for distinguishing shapes from text. For example, some languages may have text that looks more like shapes.