Generating Cangjie and Quick code tables from 全字庫, the CNS11643 official Chinese standard character database

For a long time, something about Cangjie and Quick has bothered me. Many versions of the code table circulate online, but every one of them is simply called "Cangjie". Wubi, by contrast, at least distinguishes the 86 and 98 versions, and then there is New Century and so on. Cangjie has nothing like that: whatever the version, it is called "Cangjie".

This is awkward. Everyone uses a different Cangjie code table, but nobody knows exactly where the tables differ.

In short, 全字庫 (the CNS11643 character database) provides a common official encoding for Chinese. It actually exists to make up for the limited ability of encodings such as UTF-8 to represent Chinese, much like GB18030. Interestingly, the data package also includes Cangjie codes, pronunciations, radicals, stroke counts, stroke-order data, and more. A Cangjie code table generated from it should therefore count as the "official" code table.

Transcoding

The downloaded data package comes with a description. Find the Open_Data/Napidagtbles/Unicode directory; it contains three encoding mapping files. We need to process these into a mapping dictionary, because the character database's own codes are not human readable.
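As a rough illustration of this step, the sketch below reads mapping lines into a dictionary. The line layout (a CNS code such as `1-4421`, a tab, then a hexadecimal Unicode code point), the sample values, and the function name `load_mapping` are assumptions made for illustration; check them against the description shipped with the download.

```python
import io


def load_mapping(lines):
    """Build {cns_code: unicode_hex} from an iterable of text lines.

    Assumed layout per line: "<cns code><TAB><unicode hex>".
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not fit the assumed layout
        cns_code, unicode_hex = parts
        mapping[cns_code] = unicode_hex
    return mapping


# Made-up sample in the assumed layout, standing in for a real mapping file.
sample = io.StringIO("1-4421\t4E00\n\n1-4423\t4E01\n")
print(load_mapping(sample))
```

With a real file, the same function could be fed an open file object, since it only iterates over lines.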

For example, this:

The mapping is simple: each Unicode code point corresponds to one code in the character database. We only need to turn the Unicode code point into human-readable text. The key statement is as follows (Python 3):

Note the fifth line here: we prepend a `\u` prefix to the Unicode code so that it becomes a Unicode escape. Every code is thus turned into a string of the form "\uffff", which is encoded to a byte string and then decoded as a Unicode escape to produce human-readable Chinese characters.
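A minimal sketch of the escape trick just described, not the original listing. The function name `hex_to_text` is hypothetical, and the input is assumed to be a bare hexadecimal code point string.

```python
def hex_to_text(unicode_hex):
    """Decode a hex code point by way of a backslash-u escape sequence."""
    escaped = "\\u" + unicode_hex  # e.g. the six characters \u4E00
    # Encode to bytes, then let the unicode_escape codec interpret the escape.
    return escaped.encode("ascii").decode("unicode_escape")


print(hex_to_text("4E00"))        # a four-digit code point becomes one character
print(len(hex_to_text("2000B")))  # five digits decode as \u2000 followed by "B"
```

A `\u` escape consumes exactly four hex digits, so longer code points come back as more than one character. `chr(int(unicode_hex, 16))` is a more direct alternative that handles code points of any length.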

After generation, the result looks like this:

Of course, it also contains entries like this:

This happens because the character database covers a wider range of characters than UTF-8 does, so ordinary devices cannot render them and they show up as garbled text. In short, we do not need these characters, since most devices cannot display them anyway. The simple fix is to check the number of characters in the decoded result and skip any entry that has more than one.
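The skip rule might look like the sketch below. It assumes the decoding went through a four-digit `\u` escape, so that a code point with more than four hex digits comes back as more than one character; `keep_single_chars` and the sample entries are hypothetical.

```python
def keep_single_chars(decoded):
    """Keep only entries whose decoded text is exactly one character."""
    return {code: text for code, text in decoded.items() if len(text) == 1}


# Hypothetical decoded entries: the second one came out as two characters.
decoded = {"1-4421": "\u4e00", "3-2144": "\u2000B"}
print(keep_single_chars(decoded))
```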

Quick

With that, we have a table mapping the character database's codes to Chinese characters, and can now generate the Cangjie code table. The generated table has more than 90,000 characters, but only about 20,000 of them can actually be displayed. As before, remove the characters that cannot be displayed, and what remains is the "official Cangjie" table.
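One way to sketch this join, under stated assumptions: the Cangjie data is taken to be available as (CNS code, Cangjie code) pairs, and the character table as a dictionary keyed by CNS code. The function name `build_cangjie_table` and the sample values are illustrative only.

```python
def build_cangjie_table(cns_to_char, cangjie_pairs):
    """Join {cns: char} with (cns, cangjie) pairs into {char: [codes]}."""
    table = {}
    for cns_code, cangjie in cangjie_pairs:
        char = cns_to_char.get(cns_code)
        if char is None or len(char) != 1:
            continue  # no mapping, or not displayable as a single character
        table.setdefault(char, []).append(cangjie)
    return table


# Illustrative inputs; real data would come from the downloaded package.
cns_to_char = {"1-4421": "\u4e00", "1-4423": "\u4e01"}
cangjie_pairs = [("1-4421", "m"), ("1-4423", "mn"), ("9-9999", "xx")]
print(build_cangjie_table(cns_to_char, cangjie_pairs))
```

Collecting codes in a list leaves room for characters that carry more than one code.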

As for Quick: it is a simplified form of Cangjie. The basic rule is that when a Cangjie code has more than two letters, only the first and last letters are kept.

That is, a → a, ab → ab, abc → ac, abcd → ad, and so on.
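The first-and-last rule can be sketched in a few lines. The function name `cangjie_to_quick` is hypothetical.

```python
def cangjie_to_quick(code):
    """Keep the first and last letters when the code has more than two."""
    if len(code) > 2:
        return code[0] + code[-1]
    return code


for code in ["a", "ab", "abc", "abcd"]:
    print(code, "->", cangjie_to_quick(code))
# prints:
# a -> a
# ab -> ab
# abc -> ac
# abcd -> ad
```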

We now have both a Cangjie code table and a Quick code table derived from the character database.

License

The character database is openly licensed, and the source must be credited; see the Authorization Statement. Anyone can download it from the Open Platform.

References

  • 全字庫 (CNS11643) official website https://www.cns11643.gov.tw

Original article from R0uter's Blog » Generating Cangjie and Quick code tables from 全字庫, the CNS11643 official Chinese standard character database

When reposting, please keep the source and this link: https://www.logcg.com/archives/3211.html

About the Author

R0uter

Unless stated otherwise, the articles I write are original. When reposting, please include a link to this page and my name.

Comments

  1. I saw this a long time ago and recently compared it against other tables. They have produced a complete data set. Judging from the data, it covers about 20,000 more single characters than rime-cangjie, but it carries a fair amount of legacy second- and third-generation Cangjie codes, which is unfriendly to people used to the fifth generation. After removing the redundant BMP CJK Compatibility Forms, it looks usable for filling in codes that rime-cangjie may be missing.

    Cangjie users in mainland China have mostly moved to the Rime camp by now. It is arguably the only relatively active, still-maintained open-source Cangjie input solution.
