Tuesday, October 19, 2010

Reading #14 Using Entropy to Represent Curvature

Comments on:

Summary:
This paper proposes an important feature that distinguishes shapes from text: the entropy of strokes. The authors observe that text strokes are structured more randomly than common shapes, so an entropy measure can capture the difference between them.
This paper defines an entropy model "alphabet". Based on the angle each point forms with its adjoining points, each point is matched to one of seven symbols in the alphabet (six of them represent ranges of angles, and the last one represents an endpoint). This yields an alphabet representation of the ink strokes over which entropy can be computed.
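The angle-to-symbol mapping could be sketched as follows. This is an illustrative assumption, not the paper's exact alphabet: the symbol names, the 30° bin width, and the `angle_at` helper are all hypothetical.

```python
import math

SYMBOLS = "ABCDEF"   # six angle-range symbols (names assumed)
ENDPOINT = "X"       # seventh symbol marking stroke endpoints

def angle_at(prev, pt, nxt):
    """Interior angle in degrees at pt, formed by prev-pt-nxt."""
    a1 = math.atan2(prev[1] - pt[1], prev[0] - pt[0])
    a2 = math.atan2(nxt[1] - pt[1], nxt[0] - pt[0])
    ang = abs(math.degrees(a1 - a2)) % 360
    return min(ang, 360 - ang)   # fold into [0, 180]

def to_alphabet(points):
    """Map a resampled stroke (list of (x, y)) to its symbol string."""
    out = [ENDPOINT]
    for prev, pt, nxt in zip(points, points[1:], points[2:]):
        # Quantize the interior angle into six 30-degree bins (bin width assumed).
        bin_idx = min(int(angle_at(prev, pt, nxt) // 30), 5)
        out.append(SYMBOLS[bin_idx])
    out.append(ENDPOINT)
    return "".join(out)
```

A straight stroke maps to a highly repetitive string (low entropy), while a wiggly text stroke visits many symbols.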
The implementation first groups strokes using spatial and temporal thresholds so they can be classified later. The strokes are then resampled, converted into the alphabet representation, and their entropy is computed using Shannon's formula. Finally, the strokes are classified against a threshold obtained from a training dataset. Besides the text and shape labels, the method marks strokes as "unclassified" when their entropies fall into neither the shape range nor the text range.
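The entropy and classification steps might look like the sketch below. Shannon's formula itself is standard; the threshold values and the three-way decision rule shown here are placeholders, not the trained values from the paper.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy (bits) of a symbol string."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def classify(symbols, text_thresh=2.0, shape_thresh=1.0):
    """Label a stroke group as text, shape, or unclassified (thresholds assumed)."""
    h = shannon_entropy(symbols)
    if h >= text_thresh:
        return "text"
    if h <= shape_thresh:
        return "shape"
    return "unclassified"
```

A uniform string like `"FFFF"` has entropy 0 and lands in the shape range, while a string spread across all six angle symbols approaches log2(6) ≈ 2.58 bits and lands in the text range.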
The paper also assigns each classification a confidence based on its distance from the decision threshold.
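One plausible reading of this confidence is a score that grows with the entropy value's distance from the threshold. The saturating mapping below is an assumption for illustration, not the paper's formula.

```python
import math

def confidence(entropy_value, threshold, scale=1.0):
    """Map distance from the threshold into [0, 1) via a saturating function
    (the exponential form and scale parameter are assumed)."""
    return 1.0 - math.exp(-abs(entropy_value - threshold) / scale)
```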
The results show that accuracy varies with the percentage of "unclassified" strokes allowed. Also, the threshold remains reliable when applied to different domains.
Discussion:
I think the insight behind the entropy method is that curvature is an important feature distinguishing text from shapes: text symbols tend to have larger curvature changes per unit length. Then, what is the advantage of using entropy instead of using curvature directly?
Also, I doubt that the entropy rate can be treated as a domain-independent feature for distinguishing shapes from text. For example, some languages may have text that looks more like shapes.

