GKJ-00944 |
Date | Description |
---|---|
IRG #57 2021-09-17 (Fri) 8:54 am +0800 Recorded by CHEN Zhuang | Postponed for further investigation |
Version | Description |
---|---|
2.0 | For 00014, change Status to Postponed |
2.0 | For 00014, add Discussion Record "Postponed for further discussion of encoding model, IRG 57." |
Source Reference | Glyph |
---|---|
GKJ-00944 | 1.0 |
group | China (GKJ - Science and Technology Characters) |
a) Source reference | GKJ-00944 |
b) PUA Code of TTF | E374 |
c) KangXi Radical Code(Primary) | 1.0 |
d) Stroke Count(Primary) | 11 |
e) First Stroke(Primary) | 5 |
g) Total Stroke Count | 12 |
i) IDS (Ideographic Description Sequence) | ⿰⿳一巛⿸厂二少 |
j) Similar/ Variants | N/A |
k) Ref. to Evidence doc | 碳原子、取代基及官能团数目的中文命名演变:1908—1932 |
Review Comments
I doubt that these characters should be encoded as Han ideographs. Encoding each of them as a single character is open-ended.
They are among the examples in 有機化學命名芻議[1] (A Proposal for Organic Chemical Nomenclature, see attachments below) by 梁國常. Here I summarize the naming rules related to the WS2021 characters.
* Alkane CnH2n+2 is named ⿰⿳一巛⿸厂X充, where X is one of 一二三四五六七八九十 for n = 1, 2, ... , 9, 10. So WS2021-00016 ⿰⿳一巛⿸厂一充 is methane (CH4, n = 1) and WS2021-00017 ⿰⿳一巛⿸厂二充 is ethane (C2H6, n = 2). For n = 11, ..., 19, X is the top-down combination 十一, 十二, ... , 十九. For example, he gave ⿰⿳一巛⿸厂⿱十五充 for C15H32. For n = 21, ... , 29, X is the combination 廿一, 廿二, ... 廿九. For example, he gave ⿰⿳一巛⿸厂⿱廿一充 for C21H44. He also gave ⿰⿳一巛⿸厂⿱六十充 for C60H122, but gave no example for n > 60.
* Alkene CnH2n (n >= 2) is named ⿰⿳一巛⿸厂X欠, where X is derived from n in the same way as he proposed for alkanes. So WS2021-01900 is ethylene (C2H4, n = 2).
* Alkyne CnH2n-2 (n >= 2) is named ⿰⿳一巛⿸厂X少, where X is derived from n in the same way as he proposed for alkanes. So WS2021-00014 is ethyne (C2H2, n = 2).
* Aromatic ring compounds such as benzene and furan are named ⿴囗⿳一巛⿸厂X, where X is the number of nuclei (atoms) in the ring. So WS2021-00777 ⿴囗⿳一巛⿸厂六 is benzene, and ⿴囗⿳一巛⿸厂五 (also shown in the WS2021-00777 evidence but not submitted) is furan (C4H4O). He did not say how to name other ring compounds with the same number of nuclei, such as pyrrole (C4H5N), let alone ring compounds in general.
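The rules summarized above can be sketched as a small IDS generator. This is only an illustration of my reading of the rules, not something taken from the evidence: the function names (`numeral_component`, `hydrocarbon_ids`, `ring_ids`) are hypothetical, and only the ranges of n that the source actually exemplifies are implemented; any other n raises an error rather than guessing a form. Applying the same numeral rule to the ring pattern is an assumption.

```python
# Sketch of the naming rules summarized above. All names are hypothetical.
DIGITS = "一二三四五六七八九十"          # X for n = 1..10
PREFIX = "⿰⿳一巛⿸厂"                   # shared left-hand component, before X
SUFFIX = {"alkane": "充", "alkene": "欠", "alkyne": "少"}


def numeral_component(n: int) -> str:
    """Return the IDS fragment for X, covering only the documented cases."""
    if 1 <= n <= 10:
        return DIGITS[n - 1]
    if 11 <= n <= 19:
        return "⿱十" + DIGITS[n - 11]    # 十 above 一..九
    if 21 <= n <= 29:
        return "⿱廿" + DIGITS[n - 21]    # 廿 above 一..九
    if n == 60:
        return "⿱六十"                   # the single example given for 60
    raise ValueError(f"no form documented in the evidence for n = {n}")


def hydrocarbon_ids(family: str, n: int) -> str:
    """Build the IDS for an alkane, alkene or alkyne with n carbon atoms."""
    if family != "alkane" and n < 2:
        raise ValueError("alkenes and alkynes need n >= 2")
    return PREFIX + numeral_component(n) + SUFFIX[family]


def ring_ids(n: int) -> str:
    """Build the IDS for a ring with n nuclei.

    Assumption: X follows the same numeral rule as for chains.
    """
    return "⿴囗⿳一巛⿸厂" + numeral_component(n)


if __name__ == "__main__":
    for family, n in [("alkane", 1), ("alkane", 15), ("alkene", 2), ("alkyne", 2)]:
        print(family, n, hydrocarbon_ids(family, n))
    print("ring", 6, ring_ids(6))
```

With these definitions, `hydrocarbon_ids("alkyne", 2)` yields ⿰⿳一巛⿸厂二少, the IDS recorded in field i) for this entry. The `ValueError` branch marks where the rules stop being specified.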
Encoding ⿰⿳一巛⿸厂X充 as a single character is open-ended because alkanes, alkenes and alkynes can have an arbitrarily large number of carbon atoms. If we accept ⿰⿳一巛⿸厂一充, should we also accept ⿰⿳一巛⿸厂⿱六十充 (already in the given evidence)? Should we accept ⿰⿳一巛⿸厂⿳一百一充?
Encoding ⿴囗⿳一巛⿸厂X as a single character is also open-ended because cycloalkanes (hydrocarbon ring compounds) can likewise have an arbitrarily large number of carbon atoms, let alone ring compounds in general. If we accept ⿴囗⿳一巛⿸厂六, should we accept ⿴囗⿳一巛⿸厂七? Should we accept ⿴囗⿳一巛𫨅?
Alternative encoding solution
To encode these characters, I suggest placing them in a new Unicode block that uses an encoding model different from that of Han ideographs. Take alkanes as an example: we can encode ⿰⿳一巛⿸厂◌充 as a base character that starts a ZWJ sequence representing the different alkanes:
⿰⿳一巛⿸厂二充 --> ⿰⿳一巛⿸厂◌充 + ZWJ + 二
⿰⿳一巛⿸厂⿱六十充 --> ⿰⿳一巛⿸厂◌充 + ZWJ + 六 + ZWJ + 十
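The two mappings above could be produced mechanically, as in the following sketch. The base character ⿰⿳一巛⿸厂◌充 is not encoded, so the code uses a private-use code point (U+F0000) purely as a hypothetical placeholder. The function names are also hypothetical. ZWJ is U+200D ZERO WIDTH JOINER.

```python
# Sketch of the suggested ZWJ-sequence model. ALKANE_BASE is a hypothetical
# placeholder: the base character is not encoded anywhere today.
ZWJ = "\u200d"               # U+200D ZERO WIDTH JOINER
ALKANE_BASE = "\U000F0000"   # private-use stand-in for ⿰⿳一巛⿸厂◌充


def alkane_sequence(numerals: str) -> str:
    """Join the base and each numeral character with ZWJ, e.g. '六十'."""
    return ZWJ.join([ALKANE_BASE, *numerals])


def split_sequence(sequence: str) -> tuple[str, str]:
    """Recover (base, numerals) from a sequence built by alkane_sequence."""
    base, *numerals = sequence.split(ZWJ)
    return base, "".join(numerals)


if __name__ == "__main__":
    for numerals in ("二", "六十"):
        seq = alkane_sequence(numerals)
        # Show the code points so the invisible ZWJ is visible in the output.
        print(numerals, " ".join(f"U+{ord(ch):04X}" for ch in seq))
```

Under this model a single base character covers every alkane, and the numeral part can grow without any new code points being assigned. How a font would render such sequences is outside the scope of this sketch.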
Reference:
[1] 梁國常. 有機化學命名芻議. 北京大學月刊. 1920(7). pp. 71-89. Available online: http://read.nlc.cn/allSearch/searchDetail?searchType=all&showType=1&indexName=data_404&fid=01J000340
Attached PDF file