Sketch Recognition: November 2010

Thursday, November 11, 2010

Reading #6 Protractor: An enhanced $1 using angular distance

Reading #6 Protractor: An enhanced $1 using optimal angular distance

Comments on:
Jianjie Zhang

Summary:

Li designed Protractor, which uses an angular distance to measure the similarity between a candidate gesture and each template. It is essentially similar to $1: both are template-based recognizers. Compared with a parametric recognizer (like Rubine’s method), a template-based recognizer (1) does not need much training data and (2) does not require choosing features to represent a parametric model, which is not easy; it can therefore be customized easily as long as the user provides relevant samples. However, template-based recognition is time- and space-consuming, because the candidate must be compared against many stored templates.

In spite of the similarity, Protractor has many differences from $1:

(1) Protractor can be orientation-sensitive: in that mode it rotates the stroke to whichever of the 8 base orientations requires the least rotation.

(2) It resamples each stroke to N=16 points instead of N=64, which improves computing speed and reduces the storage needed.

(3) It uses the optimal angular distance rather than the Euclidean distance to compute the similarity to each template.

(4) Because the size of the stroke is irrelevant when computing angular distance, Protractor does not resize the stroke.

(5) To find the optimal rotation, Protractor uses a closed-form solution rather than the time-consuming iterative search used in the $1 recognizer.
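To make points (3) to (5) concrete, here is a minimal Python sketch of a closed-form optimal angular distance. It assumes both strokes have already been resampled to the same number of points; the function names `vectorize` and `optimal_angular_distance` are my own, and the use of `atan2` is a choice of this sketch rather than something taken from the paper.

```python
import math

def vectorize(points):
    """Move the centroid to the origin, flatten to [x1, y1, x2, y2, ...],
    and scale the vector to unit length (so stroke size drops out).
    Assumes the points are not all identical."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    flat = []
    for x, y in points:
        flat.extend((x - cx, y - cy))
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat]

def optimal_angular_distance(template, gesture):
    """Smallest angle between two unit vectors over all rotations of one of them.
    The best rotation is found in closed form, with no iterative search."""
    a = 0.0  # sum of dot products of corresponding points
    b = 0.0  # sum of cross products of corresponding points
    for i in range(0, len(template), 2):
        a += template[i] * gesture[i] + template[i + 1] * gesture[i + 1]
        b += template[i] * gesture[i + 1] - template[i + 1] * gesture[i]
    theta = math.atan2(b, a)  # rotation that maximizes the cosine similarity
    cosine = a * math.cos(theta) + b * math.sin(theta)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))
```

A stroke compared against a rotated, scaled, and shifted copy of itself gives a distance close to zero, which is why no resizing step and no rotation search are needed.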

Discussion:

Protractor modifies $1 in many details; the biggest change is that it uses angular distance rather than Euclidean distance. Li also shows that when the core distance measure changes, the supporting preprocessing of the strokes may change as well. Conversely, I think insight into how strokes are preprocessed can also lead to new ways of computing the distance.

As the author indicates, a parametric recognizer is based on a parametric model, and it is not easy to find a good model that includes all the intended gestures and excludes unrelated ones. Does this mean that this kind of method becomes less promising as computer speed and storage improve?

Wednesday, November 10, 2010

Reading #2 Rubine’s Linear Recognizer


Comments on:
Amir

Summary:

In this paper (1991), Rubine introduces GRANDMA, a toolkit that eases the programming of gestural interfaces. He uses GDP as an example of a gesture-based application: it defines a set of gestures and their associated operations, and uses GRANDMA to recognize the input and then perform the specified manipulation on the drawing. GRANDMA can only recognize single-stroke gestures.

Then Rubine explains his gesture recognizer, which works in two steps: first, it computes a set of features of the stroke; then, using a linear machine, it assigns the stroke to the class whose evaluation returns the maximum value.

The author wants these features to:

(1) be computable in O(1) time per input point;

(2) be meaningful. Of the 13 features Rubine chose subjectively (or empirically), the first 11 describe the geometry of a stroke and the last two are about speed;

(3) of course, be able to differentiate strokes.
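The O(1)-per-point requirement can be illustrated with a small Python sketch that updates a few stroke features incrementally as each point arrives. It covers only four quantities of the kind Rubine uses (bounding-box diagonal, path length, maximum squared speed, duration), not the full set of 13, and the class and key names are hypothetical.

```python
import math

class IncrementalFeatures:
    """Keeps running values so that each new point costs constant time."""

    def __init__(self):
        self.first = None  # first (x, y, t) sample
        self.last = None   # most recent (x, y, t) sample
        self.min_x = self.min_y = math.inf
        self.max_x = self.max_y = -math.inf
        self.path_length = 0.0
        self.max_speed_sq = 0.0

    def add_point(self, x, y, t):
        if self.last is None:
            self.first = (x, y, t)
        else:
            dx, dy, dt = x - self.last[0], y - self.last[1], t - self.last[2]
            self.path_length += math.hypot(dx, dy)
            if dt > 0:  # skip samples with identical timestamps
                self.max_speed_sq = max(self.max_speed_sq,
                                        (dx * dx + dy * dy) / (dt * dt))
        self.last = (x, y, t)
        self.min_x, self.max_x = min(self.min_x, x), max(self.max_x, x)
        self.min_y, self.max_y = min(self.min_y, y), max(self.max_y, y)

    def snapshot(self):
        """Current feature values; requires at least one point."""
        return {
            "bbox_diagonal": math.hypot(self.max_x - self.min_x,
                                        self.max_y - self.min_y),
            "path_length": self.path_length,
            "max_speed_sq": self.max_speed_sq,
            "duration": self.last[2] - self.first[2],
        }
```

Because nothing is recomputed from the full point list, the features are ready as soon as the stroke ends.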

A linear evaluation is computed to classify gestures. It simply adds up all the features, each multiplied by a class-specific weight, to give a value that represents how well the stroke fits a certain class.
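As a minimal sketch of this linear evaluation in Python: each class has a weight vector, the score of a class is its constant term plus the weighted sum of the features, and the stroke goes to the class with the highest score. The function name `classify` and the layout of `weights` (constant term first) are assumptions made for illustration.

```python
def classify(features, weights):
    """Return (best_class, scores) for a feature vector.

    weights maps each class name to [w0, w1, ..., wF], where w0 is the
    constant term and w1..wF multiply the F features.
    """
    scores = {}
    for cls, w in weights.items():
        scores[cls] = w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
    best = max(scores, key=scores.get)
    return best, scores
```

For example, with weights `{"line": [0.0, 1.0, -1.0], "circle": [0.5, -1.0, 1.0]}` and features `[2.0, 0.5]`, the scores are 1.5 for "line" and -1.0 for "circle", so the stroke is classified as "line".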

The assignment of the weights is the critical part of the classification. They are obtained from training data.
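For reference, the training rule can be written as below. This is a sketch in my own notation, which may differ from the paper's in minor details: it assumes F features, C classes, E_c training examples for class c, f_{cei} for the i-th feature of example e of class c, and \bar{f}_{ci} for the mean of that feature over the examples of class c.

```latex
\begin{align}
\Sigma_{cij} &= \sum_{e=0}^{E_c-1} \left(f_{cei} - \bar{f}_{ci}\right)\left(f_{cej} - \bar{f}_{cj}\right)
  && \text{(per-class scatter)} \\
\Sigma_{ij} &= \frac{\sum_{c=0}^{C-1} \Sigma_{cij}}{-C + \sum_{c=0}^{C-1} E_c}
  && \text{(common covariance estimate)} \\
w_{cj} &= \sum_{i=1}^{F} \left(\Sigma^{-1}\right)_{ij} \bar{f}_{ci}, \quad 1 \le j \le F
  && \text{(feature weights)} \\
w_{c0} &= -\frac{1}{2} \sum_{i=1}^{F} w_{ci}\, \bar{f}_{ci}
  && \text{(constant term)}
\end{align}
```

The weights therefore come directly from the per-class feature means and one covariance matrix shared by all classes, with no iterative optimization.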

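Rubine also introduces two ways to reject a classification result: an estimate of the probability that the stroke was classified unambiguously, and the Mahalanobis distance between the stroke’s features and the mean of the chosen class.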
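Both rejection tests can be sketched in a few lines of Python. The sketch assumes the per-class scores from the linear evaluation, the mean feature vector of the winning class, and the inverse of the common covariance matrix are already available; the function names are hypothetical, and the default cutoffs (0.95 for the probability, F²/2 for the squared distance) are the values I recall Rubine suggesting, so treat them as tunable.

```python
import math

def estimated_correct_probability(scores, best):
    """Estimate of the probability that `best` is the right class:
    1 / sum_j exp(v_j - v_best), which is close to 1 when `best` wins clearly."""
    v_best = scores[best]
    return 1.0 / sum(math.exp(v - v_best) for v in scores.values())

def mahalanobis_squared(features, class_mean, inv_cov):
    """Squared Mahalanobis distance from the features to a class mean."""
    d = [f - m for f, m in zip(features, class_mean)]
    n = len(d)
    return sum(inv_cov[j][k] * d[j] * d[k] for j in range(n) for k in range(n))

def should_reject(scores, best, features, class_mean, inv_cov, min_prob=0.95):
    """Reject if the result is ambiguous or the stroke is an outlier."""
    ambiguous = estimated_correct_probability(scores, best) < min_prob
    num_features = len(features)
    outlier = (mahalanobis_squared(features, class_mean, inv_cov)
               > 0.5 * num_features ** 2)
    return ambiguous or outlier
```

The first test catches strokes that sit between two classes; the second catches strokes that are far from every class but still receive a clear winner from the linear evaluation.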

Rubine also discusses the tradeoff between rejection and undo in gesture applications.

Discussion:

Compared with the $1 recognizer, Rubine’s recognizer is more efficient once trained. Its limitations are that it needs enough training data (by Rubine’s analysis, 15 samples per class are needed for good accuracy) and that it can only deal with single-stroke gestures.

Interestingly, Rubine mentions multi-finger input at the end of his paper. Does it have any relation to Jeff Han’s multi-touch screen technologies?