Chinese Text Analyzer (in Python)

Now that I’ve finished writing mnemonics for all HSK characters, I have started reading 活着 by 余华 and needed to figure out which characters I should learn (i.e. write mnemonics for) to enjoy reading this novel. So I wrote a simple Chinese Text Analyzer that counts the occurrences of each character and sorts them by HSK level. Here it is:

Continue reading Chinese Text Analyzer (in Python)

Practicing Chinese character stroke order in Anki: Natural tracing with JavaScript

A couple of weeks ago I found an amazing resource: makemeahanzi on GitHub by the excellent Shaunak Kishore. It is a repository of Chinese character stroke order information which he created by applying machine learning to the fonts Arphic PL KaitiM GB and Arphic PL UKai. There is also an app he is working on that you should definitely check out: Inkstone. In the makemeahanzi repository, the individual strokes are saved in SVG format, and it is not very complicated to render them via JavaScript. The big advantage is that Anki supports JavaScript natively, so no plug-ins are necessary (edit: see below) and the whole thing also works on AnkiDroid. Have a look at this short video to see what it looks like:

Continue reading Practicing Chinese character stroke order in Anki: Natural tracing with JavaScript

Picking a random (Chinese) character in Ruby

Here’s a simple way to choose a random Chinese character in Ruby:

[*"\u4E00".."\u9FFF"].sample(1)

The asterisk (splat) expands the range into an array, and “sample(1)” returns a one-element array holding a randomly chosen entry.
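To make the difference between “sample” with and without an argument concrete, here is a minimal sketch. The constant name HANZI is made up for this example.

```ruby
# All characters in the range, as one-character strings.
# HANZI is a name invented for this sketch.
HANZI = [*"\u4E00".."\u9FFF"].freeze

one_in_array = HANZI.sample(1) # Array holding a single random character
one_char     = HANZI.sample    # the character itself, as a String
three_chars  = HANZI.sample(3) # Array of three distinct entries

puts one_in_array.inspect
puts one_char
puts three_chars.join
```

If you only ever need a single character, plain “sample” saves you the join afterwards.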

The Unicode block 4E00 to 9FFF contains more than 20,000 Chinese characters. If that’s not enough for you, you can find more blocks here: https://en.wikipedia.org/wiki/CJK_Unified_Ideographs
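If you do want to draw from more than one block, one option is to work with code points instead of strings. The following is only a sketch: CJK_RANGES and random_hanzi are hypothetical names, and the second range (Extension A, 3400 to 4DBF) is taken from the Wikipedia page linked above.

```ruby
# Code point ranges to draw from.
CJK_RANGES = [
  0x4E00..0x9FFF, # CJK Unified Ideographs (the block used above)
  0x3400..0x4DBF  # CJK Unified Ideographs Extension A
].freeze

# Hypothetical helper: pick a random code point from the combined ranges
# and convert it into a one-character UTF-8 string.
def random_hanzi(ranges = CJK_RANGES)
  codepoints = ranges.flat_map(&:to_a)
  codepoints.sample.chr(Encoding::UTF_8)
end

puts 0x9FFF - 0x4E00 + 1 # number of code points in the main block
puts random_hanzi
```

Depending on the Unicode version, a few code points at the end of a block may be unassigned, so your font might not render every result.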

Shoutout to http://yankist.com/blog/2012/11/28/generate-random-string-in-ruby/ for its many ways of creating a random string.

I need this random Chinese character for my Factory Girl. She’ll create a new Hanzi every time I call “let”:

let(:hanzi) { FactoryGirl.create(:hanzi) }

This is my Hanzi factory:

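factory :hanzi do
  # One random character from the 4E00 to 9FFF block
  sequence(:character) { |n| [*"\u4E00".."\u9FFF"].sample(1).join("") }
  # Three random characters joined into a single string
  sequence(:components) { |n| [*"\u4E00".."\u9FFF"].sample(3).join("") }
end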

Using a sequence with a block means the sample is drawn anew each time a Hanzi is built, rather than once when the factory is defined.
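The reason a block matters here can be shown in plain Ruby, without Factory Girl. This is a sketch only, and HANZI_POOL, FIXED_CHARACTER and FRESH_CHARACTER are names made up for it: a value is computed once when its line runs, while a block is run again on every call.

```ruby
HANZI_POOL = [*"\u4E00".."\u9FFF"].freeze

# Evaluated once, when this line runs: every later read sees the same character.
FIXED_CHARACTER = HANZI_POOL.sample

# Evaluated on every call: each call draws a new sample.
FRESH_CHARACTER = -> { HANZI_POOL.sample }

puts Array.new(5) { FIXED_CHARACTER }.join      # the same character five times
puts Array.new(5) { FRESH_CHARACTER.call }.join # five independent draws
```

The factory blocks behave like the lambda: they are run when an object is built, not when the factory file is loaded.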