I started attending Mandarin language classes recently at the Meridian Chinese School in London. Studying involves a two-hour lesson once a week and a few hours spent at home revising what I’ve learnt. One of the best ways to study is to practise writing the characters (fun too!) and translating sentences, so I decided to build a web app which would allow me to practise whilst on the go. My aim was to enable character recognition using the HTML5 canvas and get it working on mobiles.
Here are some notes on the technical aspects:
Stroke input recognition
After much testing I decided to disable canvas stroke input for now and instead provide Pinyin input as well as the ability to input characters directly (in case you have a Chinese keyboard input method available for your device, which I do :).
Note: Thinking about the stroke input recognition, a smarter algorithm would compare the changes in angle and length between consecutive strokes rather than the individual stroke measurements themselves. A harder problem is solving for stroke mismatches. If the user inputs a curved line which gets interpreted as a single straight line, and yet the dictionary defines that part of the character as two closely connected straight lines, how do you match them up in a consistent, repeatable manner? I might get around to these problems later on. For now you can see the code I’ve got by looking at the canvas_stroke_input branch in the Git repository.
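To make the "changes between consecutive strokes" idea concrete, here is a minimal sketch. It is not the code in the canvas_stroke_input branch: the stroke shape ({ x1, y1, x2, y2 }) and every function name below are assumptions made for illustration, and each stroke is simplified to a straight segment of non-zero length.

```javascript
// Hypothetical stroke shape: { x1, y1, x2, y2 } (start and end points).
// Strokes are assumed to have non-zero length.
function strokeAngle(s) {
  return Math.atan2(s.y2 - s.y1, s.x2 - s.x1);
}

function strokeLength(s) {
  return Math.hypot(s.x2 - s.x1, s.y2 - s.y1);
}

// Wrap an angle difference into the range (-PI, PI].
function wrapAngle(a) {
  while (a > Math.PI) a -= 2 * Math.PI;
  while (a <= -Math.PI) a += 2 * Math.PI;
  return a;
}

// Describe a character by how each stroke differs from the one before it.
function relativeFeatures(strokes) {
  const features = [];
  for (let i = 1; i < strokes.length; i++) {
    const prev = strokes[i - 1];
    const cur = strokes[i];
    features.push({
      angleChange: wrapAngle(strokeAngle(cur) - strokeAngle(prev)),
      lengthRatio: strokeLength(cur) / strokeLength(prev),
    });
  }
  return features;
}

// Distance between two feature lists; smaller means more similar.
// Lists of different length are treated as not comparable.
function featureDistance(a, b) {
  if (a.length !== b.length) return Infinity;
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    total += Math.abs(wrapAngle(a[i].angleChange - b[i].angleChange));
    total += Math.abs(Math.log(a[i].lengthRatio / b[i].lengthRatio));
  }
  return total;
}
```

Because only differences and ratios are stored, the same character drawn larger or elsewhere on the canvas produces the same features. The sketch does nothing about the stroke mismatch problem: inputs with different stroke counts simply come back as Infinity.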
For now I’ve hard-coded a whole bunch of sentences and their English translations in the data module, categorizing them by study unit. In future it would be good to implement true sentence builders, i.e. algorithms which pick a subject, object, action, etc. and construct an appropriate sentence. Such randomization will be a better test for the user.
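A sentence builder along those lines might look something like the sketch below. The vocabulary, the object layout and the buildSentence name are all made up for illustration; none of it comes from the data module. Each action carries its own list of objects so that the builder does not produce odd pairings, and the random source is passed in so that the output can be made repeatable.

```javascript
// Hypothetical vocabulary; a real version would draw on the study units.
const subjects = [
  { zh: '我', en: 'I', thirdPerson: false },
  { zh: '你', en: 'You', thirdPerson: false },
  { zh: '他', en: 'He', thirdPerson: true },
];

const actions = [
  { zh: '喝', en: 'drink', objects: [{ zh: '茶', en: 'tea' }, { zh: '水', en: 'water' }] },
  { zh: '吃', en: 'eat', objects: [{ zh: '米饭', en: 'rice' }, { zh: '苹果', en: 'apples' }] },
];

// Pick one item using a random function that returns a number in [0, 1).
function pick(list, random) {
  return list[Math.floor(random() * list.length)];
}

// Build a subject + action + object sentence with its English translation.
function buildSentence(random = Math.random) {
  const subject = pick(subjects, random);
  const action = pick(actions, random);
  const object = pick(action.objects, random);
  // English needs the third-person "s"; the Chinese verb does not change.
  const verb = subject.thirdPerson ? action.en + 's' : action.en;
  return {
    zh: subject.zh + action.zh + object.zh + '。',
    en: `${subject.en} ${verb} ${object.en}.`,
  };
}
```

Calling buildSentence() with no arguments uses Math.random, so each call gives the user a fresh sentence to translate.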
The dictionary module
A core part of the system is the dict module. This contains a list of characters along with their matching pinyin representations (one character may have multiple pinyin representations) and also contains methods for looking up characters by pinyin.
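As a rough illustration of that kind of structure (not the dict module’s actual layout or API), the sketch below stores each character with a list of tone-numbered pinyin readings and builds a reverse index for lookups. The entries array and the function names lookupByPinyin and pinyinFor are assumptions.

```javascript
// Hypothetical entries: each character lists every pinyin reading it has.
const entries = [
  { char: '你', pinyin: ['ni3'] },
  { char: '好', pinyin: ['hao3', 'hao4'] },
  { char: '号', pinyin: ['hao4', 'hao2'] },
];

// Reverse index: pinyin reading -> list of characters with that reading.
const byPinyin = new Map();
for (const entry of entries) {
  for (const reading of entry.pinyin) {
    if (!byPinyin.has(reading)) byPinyin.set(reading, []);
    byPinyin.get(reading).push(entry.char);
  }
}

// All characters that can be read with the given pinyin (empty if none).
function lookupByPinyin(reading) {
  return byPinyin.get(reading.toLowerCase()) || [];
}

// All pinyin readings of a character (empty if the character is unknown).
function pinyinFor(char) {
  const entry = entries.find((e) => e.char === char);
  return entry ? entry.pinyin : [];
}
```

Building the reverse index once up front keeps pinyin lookups cheap, which matters when the user is typing pinyin and candidate characters need to appear immediately.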
There is also a Sentence object. This takes as input a string of characters and then allows you to see whether they match another string of characters too. The matching algorithm is careful to skip punctuation marks (because different users may input them differently) and also returns a list of mismatched characters. To understand exactly how it works you can look at the nodeunit test for this module.
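The general shape of such a matcher can be sketched as follows. This is an illustration under assumptions, not the real Sentence object: the matches method, its return value and the position-by-position comparison are all invented here, and punctuation is detected with the Unicode property escape \p{P}.

```javascript
// Hypothetical sketch of a punctuation-insensitive sentence matcher.
class Sentence {
  constructor(text) {
    this.chars = Sentence.strip(text);
  }

  // Split into characters, dropping punctuation and whitespace.
  static strip(text) {
    return Array.from(text).filter((ch) => !/[\p{P}\s]/u.test(ch));
  }

  // Compare position by position and collect every mismatch.
  matches(other) {
    const theirs = Sentence.strip(other);
    const mismatches = [];
    const length = Math.max(this.chars.length, theirs.length);
    for (let i = 0; i < length; i++) {
      if (this.chars[i] !== theirs[i]) {
        // expected or actual is undefined when one input is shorter.
        mismatches.push({ index: i, expected: this.chars[i], actual: theirs[i] });
      }
    }
    return { ok: mismatches.length === 0, mismatches };
  }
}
```

With this sketch, new Sentence('你好，我是学生。').matches('你好,我是学生') reports a match even though one side uses full-width punctuation and the other does not. A purely positional comparison is the simplest choice, but a single missing character makes every later position count as a mismatch.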
I’ll happily accept any contributions that make this site better, and in particular any help with the stroke input recognition system. Please feel free to fork the GitHub repo.