Generating Cangjie and Quick code tables from the official CNS11643 full character library

For a long time I have had one complaint about Cangjie and Quick ...... many versions of the code table circulate on the Internet, but every one of them is simply called "Cangjie". Wubi, by contrast, at least distinguishes the 86 and 98 versions, and then there is the New Century version and so on. Cangjie has nothing like that: whatever the version, it is just called "Cangjie".

This is very awkward. Everyone uses a different Cangjie code table, but nobody knows exactly where the tables differ.

In short, the full character library (CNS11643) provides an official Chinese character encoding. It is really meant to make up for the limited capacity of encodings such as UTF-8 for Chinese, much like GB18030. Interestingly, the data package also includes Cangjie codes, pronunciations, radicals, stroke counts, stroke order data and so on. A Cangjie code table generated from it should therefore count as the "official" code table.


The downloaded data package comes with a description. Find the Open_Data/Napidagtbles/Unicode directory; inside it there are three encoding mapping files. We first need to turn these mappings into a dictionary, because the raw full character library codes are not human readable.

For example, this:

The encoding is very simple: each full character library code corresponds to a Unicode code. We only need to convert that Unicode code into human-readable text. The key statement is as follows (Python 3):

Note the fifth line: we prepend the \u prefix to the Unicode code so that it becomes a Unicode escape. Every code thus becomes a string of the form "\uffff", which is then encoded into a byte string and decoded as a Unicode escape to produce the human-readable Chinese character.
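A minimal sketch of the conversion described above, not the author's original snippet. The function names are hypothetical, and the input is assumed to be a hexadecimal code point string such as "4E2D". The second function shows the more direct chr() route for comparison.

```python
def hex_to_char_escape(code_hex: str) -> str:
    """Escape-based route: prepend \\u, encode to bytes, decode as an escape."""
    escaped = "\\u" + code_hex              # e.g. the six characters \u4E2D
    return escaped.encode("ascii").decode("unicode_escape")


def hex_to_char(code_hex: str) -> str:
    """Direct route: parse the hex digits and ask Python for that code point."""
    return chr(int(code_hex, 16))


print(hex_to_char_escape("4E2D"))           # 中
print(hex_to_char("4E2D"))                  # 中

# A \u escape only consumes four hex digits, so a five-digit code such as
# 20000 decodes to two characters (U+2000 followed by a literal "0").
print(len(hex_to_char_escape("20000")))     # 2
print(len(hex_to_char("20000")))            # 1
```

With the escape route, codes beyond four hex digits come out as more than one character, while chr() returns the single supplementary-plane character.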

After generation, the output looks like this:

Of course, it also contains entries like this:

These appear because the full character library covers a larger range of characters than UTF-8 can express. Ordinary devices cannot show these characters, so they are displayed as garbled text. In short, we do not need them, since most devices cannot display them anyway. Simply put, check the number of characters in the converted result and skip anything longer than one.
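The filtering step can be sketched as follows. The line format (a full character library code and a hexadecimal Unicode code separated by whitespace) and the sample rows are assumptions made for illustration, and load_mapping is a hypothetical helper; check the description shipped with the data package for the real layout.

```python
import io


def load_mapping(lines):
    """Build {cns_code: character}, skipping results that are not one character."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue                        # blank or malformed line
        cns_code, unicode_hex = fields
        text = ("\\u" + unicode_hex).encode("ascii").decode("unicode_escape")
        if len(text) != 1:
            continue                        # more than one character: skip it
        table[cns_code] = text
    return table


# Made-up sample rows: one four-digit code and one five-digit code.
sample = io.StringIO("1-4421\t4E00\n3-2121\t20000\n")
table = load_mapping(sample)
print(table)                                # {'1-4421': '一'}
```

Only the entry that converts to exactly one character is kept; the five-digit code is dropped by the length check.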


In this way we get a table that maps full character library codes to Chinese characters, and we can now generate the Cangjie code table. The generated table has more than 90,000 characters, but only about 20,000 of them can actually be displayed. By the same token, once the characters that cannot be displayed are removed, what remains is the "official Cangjie".
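A rough sketch of this join, assuming both data sets have already been loaded into dictionaries keyed by the full character library code. The dictionary names, the function name and the sample entries are hypothetical.

```python
def build_code_table(cns_to_char, cns_to_cangjie):
    """Pair each displayable character with its Cangjie code."""
    rows = []
    for cns_code, cangjie in cns_to_cangjie.items():
        char = cns_to_char.get(cns_code)
        if char is None:
            continue                        # character was filtered out earlier
        rows.append((char, cangjie))
    return rows


# Made-up sample data: the second code has no displayable character.
cns_to_char = {"1-4421": "一"}
cns_to_cangjie = {"1-4421": "M", "3-2121": "ABC"}
rows = build_code_table(cns_to_char, cns_to_cangjie)
print(rows)                                 # [('一', 'M')]
```

Entries whose character was skipped during conversion drop out of the join automatically, because their code is missing from the character dictionary.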

As for Quick: Quick is a simplified form of Cangjie. The basic rule is that any Cangjie code longer than two letters keeps only its first and last letters ......

I.e., a → a, ab → ab, abc → ac, abcd → ad ......
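The rule above fits in a few lines of Python. The function name is hypothetical, and this sketch covers only the basic first-and-last rule stated here.

```python
def cangjie_to_quick(code: str) -> str:
    """Keep codes of up to two letters; otherwise take the first and last letter."""
    if len(code) <= 2:
        return code
    return code[0] + code[-1]


for code in ["a", "ab", "abc", "abcd"]:
    print(code, "→", cangjie_to_quick(code))
# a → a
# ab → ab
# abc → ac
# abcd → ad
```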

At this point we have both the Cangjie code table and the Quick code table generated from the full character library.


The full character library is openly licensed; the source must be credited, see the Authorization Statement. Anyone can download it from the Open Platform.


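I saw this post a long time ago and recently downloaded the data to compare. Their data set really is comprehensive: judging by character count alone, it has about 20,000 more entries than rime-cangjie. However, it carries a fairly serious legacy of version 2 and version 3 codes, which is unfriendly to people used to version 5 codes. After removing the redundant BMP CJK compatibility forms, I think it could be used to fill potential missing codes in rime-cangjie.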


