I have long been bothered by one thing about Cangjie (CJ) and Express. Many versions of the code table circulate on the Internet, yet all of them are simply called "Cangjie". Compare Wubi, which at least distinguishes the 86 and 98 versions, followed by the New Century version and so on. Cangjie has nothing like that: whatever the version, it is just called "Cangjie".
This is awkward. Everyone uses a different Cangjie code table, but nobody can say exactly where the tables differ.
In short, the Full font (CNS11643) provides an official, common Chinese character encoding. Its real purpose is to make up for the limited Chinese coverage of encodings such as UTF-8, much as GB18030 does. Interestingly, the data package also includes Cangjie codes, pronunciations, radicals, stroke counts, stroke order data, and more. A Cangjie code table generated from this data can therefore be considered the "official" code table.
Transcoding
The downloaded data package comes with a description. Find the Open_Data/Napidagtbles/Unicode directory. It contains three code mapping files, which we need to process into a mapping dictionary first. Otherwise we would be working directly with Full font codes, which are not human-readable.
For example, this:
```
15-604E	455C
15-607A	4787
15-6133	93B9
15-6143	93BF
15-6172	9BCF
15-6177	9D64
15-617C	9EBF
15-6253	3C07
15-6278	3F05
15-627B	3F06
15-636E	45F9
15-6377	89B8
15-6434	4953
```
The format is simple: each line holds a Full font code followed by the corresponding Unicode code point. All we need to do is turn the Unicode code point into the human-readable character. The key statements are as follows (Python 3):
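```python
# `file` is the opened mapping file, `target` the opened output file.
for line in file:
    line = line.strip()
    first, last = line.split('\t')
    # Build a "\uXXXX" escape from the hex code and decode it to a character.
    last = ('\\u' + last).encode().decode('unicode-escape')
    target.write('{}\t{}\n'.format(first, last))
```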
Note the line that reassigns `last`. We prepend `\u` to the Unicode code so that it becomes a Unicode escape sequence of the form `\uffff`. We then encode that string to bytes and decode the bytes with `unicode-escape`, which produces the human-readable Chinese character.
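To see this step in isolation, here is a minimal sketch. The helper name `decode_escape` is made up for illustration, `455C` comes from the sample mapping above, and `FBEAC` is an assumed five-digit code used only to show what happens when a code is longer than four hex digits. `chr(int(code, 16))` is shown as a common alternative for comparison, not as what the original script does.

```python
def decode_escape(hex_code):
    """Turn a hex code point such as '455C' into text via a Unicode escape."""
    return ('\\u' + hex_code).encode().decode('unicode-escape')

# Four hex digits: the escape trick and chr(int(..., 16)) agree.
four_digit = decode_escape('455C')
print(four_digit == chr(int('455C', 16)), len(four_digit))

# \u consumes exactly four hex digits, so a five-digit code leaves its
# last digit behind as a literal character (two characters in total).
five_digit = decode_escape('FBEAC')
print(len(five_digit), five_digit[1])
```

With the escape approach, a code longer than four digits does not raise an error. It silently yields more than one character.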
The generated output looks like this:
```
14-546F	恸
14-5470	恹
14-5471	恺
14-5472	恻
14-5473	恽
14-5474	㦳
14-5475	㧛
14-5476	㧝
14-5477	挘
14-5478	挜
14-5479	挝
```
Of course, the output also contains entries like these:
```
14-237A	ﯪC
14-2422	ﯪ8
14-242B	ﯪ5
14-242E	ﯪ4
14-2430	ﯪ2
14-2432	ﯪ1
14-243A	ﯪ0
14-243B	ﯩF
14-243C	ﯩE
14-243D	ﯩD
14-243E	ﯩC
```
These entries appear because the Full font covers a larger range of characters than ordinary devices can display. Their codes are longer than four hex digits, while `\u` consumes exactly four, so the decoded result is an unrelated character followed by a leftover digit, which shows up as garbage. In short, we do not need these characters, since most devices cannot display them anyway. The simple solution is to check the number of characters in the decoded text and skip any line that has more than one.
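The skip rule can be sketched as a small filter. This is only an illustration under assumptions: `convert_line` is a hypothetical helper name, and the second sample line uses a five-digit code inferred from the garbled output above rather than copied from the data package.

```python
def convert_line(line):
    """Return (full_font_code, character), or None if the entry is skipped."""
    first, last = line.strip().split('\t')
    text = ('\\u' + last).encode().decode('unicode-escape')
    # A usable entry decodes to exactly one character; anything longer
    # came from a code with more than four hex digits.
    if len(text) != 1:
        return None
    return first, text

sample = ['15-604E\t455C\n', '14-237A\tFBEAC\n']
kept = [pair for pair in map(convert_line, sample) if pair is not None]
print(len(kept), kept[0][0])
```

Only the first sample line survives; the second is dropped because it decodes to two characters.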
Express
We now have a table that maps Full font codes to Chinese characters, so the Cangjie code table can be generated. The result contains more than 90,000 characters, but only about 20,000 of them can actually be displayed. As before, remove the characters that cannot be displayed, and what remains is the "official Cangjie" table.
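One way to picture this step is as a join on the Full font code. The sketch below is hedged heavily: the post does not show the layout of the Cangjie data, so the tab-separated `code<TAB>letters` format, the placeholder letters, and the name `build_table` are all assumptions made for illustration.

```python
def build_table(char_map, cangjie_lines):
    """Yield (character, cangjie_code) for codes with a displayable character."""
    for line in cangjie_lines:
        cns_code, letters = line.strip().split('\t')
        char = char_map.get(cns_code)
        if char is None:
            continue  # character was dropped earlier as undisplayable
        yield char, letters

# Character taken from the generated sample above.
char_map = {'14-546F': '恸'}
cangjie_lines = [
    '14-546F\txyz\n',  # placeholder letters, not a real Cangjie code
    '14-237A\tabc\n',  # no displayable character, so it is skipped
]
table = list(build_table(char_map, cangjie_lines))
print(table)
```

Keeping the character map as a dictionary makes the "remove what cannot be displayed" rule a simple lookup miss.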
As for Express: it is a simplified form of Cangjie. The basic rule is that any Cangjie code longer than two letters keeps only its first and last letters.
That is, a → a, ab → ab, abc → ac, abcd → ad, and so on.
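The rule is short enough to write down directly. A minimal sketch, where `to_express` is a hypothetical function name and the inputs are the same abstract examples used above:

```python
def to_express(cangjie_code):
    """Keep the first and last letters of codes longer than two letters."""
    if len(cangjie_code) <= 2:
        return cangjie_code
    return cangjie_code[0] + cangjie_code[-1]

for code in ['a', 'ab', 'abc', 'abcd']:
    print(code, '->', to_express(code))
```

Applying this to every entry of the Cangjie table yields the Express table. Several characters will naturally end up sharing the same Express code, which is expected for a shortened scheme.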
At this point we have both the Full font Cangjie code table and the Full font Express code table.
License
The Full font is released under an open license that requires attribution; see the Authorization Statement. Anyone can download the font from the Open Platform.
References
- Full font (CNS11643) official website https://www.cns11643.gov.tw
Original article written by LogStudio:R0uter's Blog » Generating Cangjie and Express code tables from the official Chinese standard CNS11643 Full font
When reproducing, please keep the source and this link: https://www.logcg.com/archives/3211.html
I saw this a long time ago and recently got around to comparing the data. They have produced a complete data set. Judging by the data, it covers about 20,000 more single characters than rime-cangjie, but it also carries a fair amount of legacy second- and third-generation coding, which is unfriendly to people used to fifth-generation codes. After removing the redundant BMP CJK Compatibility Forms, it looks usable for filling in codes that may be missing from rime-cangjie.
These days most Cangjie users in mainland China have moved to the Rime camp, which is regarded as the only open-source Cangjie input method solution that is still fairly actively maintained.