IRG Working Set 2021v1.0

Source: Henry CHAN
Date: Generated on 2024-10-08

The Image/Source column is displayed as it was in WS2021 v1.0. The character may have a different status in the latest working set.

Unification

Sn | Image/Source | Comment Type | Description
01797
01797
木 75.8.3
GDM-00241
TS 12 · IDS
Unification
U+23591
Suggest unifying with 𣖑 (U+23591) and updating the glyph and source reference of U+23591 to GDM-00241.
04577
04577
魚 195.5.2
GKJ-00245
TS 16 · IDS 𫠠
UCV
Suggest the addition of a new UCV rule:


Based on the provided evidence, the glyph shape should be recognized as having the component shown in the middle rather than the current glyph shape in the provided font. It would then be unifiable with 鮆 (U+9B86).
00263
00263
八 12.23.1
GKJ-00273
TS 25 · IDS
Unification
U+29F25
Possibly unifiable to 𩼥 (U+29F25).

This character seems to be a corrupted form of 𩼥 (U+29F25) in which the top grass radical has moved to the left and joined with 大 to form a 关 structure, while the bottom part has gained an additional dot by analogy, yielding another 关.

The main sources quoted for these characters are known to contain vulgar forms of characters. It may be more appropriate to encode them as unified variants of existing characters instead of as new characters.
04697
04697
鳥 196.5.2
GKJ-00333
TS 16 · IDS
UCV
Here is an example of some variant forms of 號:



(See: http://coe21.zinbun.kyoto-u.ac.jp/djvuchar?query=號)

There should be a new UCV for ⿱口了 and 号 on the left.

Based on the same source, ⿱口丁 can also be added:
04789
04789
鳥 196.15.4
GKJ-00360
TS 26 · IDS 𦎧
Unification
U+2A1BE
Consider unifying with 𪆾 (U+2A1BE) and updating the form at U+2A1BE.

GKJ-00360 seems to be a more normalized transcription than the current form at U+2A1BE.
04702
04702
鳥 196.6.1
GKJ-00382
TS 17 · IDS
Unification
U+9D1B
Suggest unifying with 鴛 (U+9D1B) and encoding this form via an IVS.

Seems to be a commonly miswritten form of 鴛.
Unification
夗 at the top is often assimilated (類化) to ⿱一夗 or 死, thereby losing its phonetic indication. Some examples of such variations:

怨 ~ ⿱死心:
Source from MOE Dictionary:


Source from 京都大学《拓本文字データベース》:


苑 ~ ⿱艹⿱一夗
Source from MOE Dictionary:


Source from 京都大学《拓本文字データベース》:
04910
04910
鼠 208.5.3
GKJ-00747
TS 18 · IDS
Unification
U+2A54B
Unify to 𪕋 U+2A54B.
Add new UCV 夘 and 卯
02573
02573
生 100.11.3
GKJ-00951
TS 16 · IDS
Unification
U+2DEB7
The glyph shape provided in the evidence matches 𭺷 (U+2DEB7); once the glyph is corrected, it should be unified with 𭺷 (U+2DEB7).
01496
01496
手 64.10.4
GZ-2132401
TS 12 · IDS
Unification
The shape given for KC-06293 on the www.koreanhistory.or.kr website shows:


If ROK can confirm the shape given on www.koreanhistory.or.kr is correct, then the shape in ISO10646 should be updated.
00828
00828
土 32.8.1
SAT-04378
TS 11 · IDS 𡗜
UCV
Suggest adding two new UCVs: (1) 奎 & SAT-04378, (2) 夸 & 𡘆
01476
01476
手 64.9.1
SAT-05633
TS 12 · IDS
UCV
Add new UCV 怱 = ⿱匇心
00416
00416
又 29.6.2
SAT-05802
TS 8 · IDS
UCV
Suggest adding a UCV for ⿱回又 and 𠬸, and unifying with 𠬸.

Based on context, the quoted character is semantically identical to 𠬸, as it refers to an ancient form of the right-hand part of 歿.
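The IDS notation used in comments such as ⿱回又 is prefix notation, so it can be decomposed mechanically. The following is a minimal sketch, assuming only the twelve IDCs in U+2FF0..U+2FFB; the function names `idc_arity`, `parse_ids` and `leaves` are hypothetical helpers, not part of any IRG tool.

```python
def idc_arity(ch):
    """Number of operands an Ideographic Description Character takes."""
    cp = ord(ch)
    if 0x2FF0 <= cp <= 0x2FFB:
        # U+2FF2 and U+2FF3 take three operands; the others take two.
        return 3 if cp in (0x2FF2, 0x2FF3) else 2
    return 0  # not an IDC: treat as a leaf component


def parse_ids(ids):
    """Parse a prefix-notation IDS into a nested tuple (operator, *operands)."""
    def parse(pos):
        ch = ids[pos]
        arity = idc_arity(ch)
        if arity == 0:
            return ch, pos + 1
        operands = []
        pos += 1
        for _ in range(arity):
            node, pos = parse(pos)
            operands.append(node)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def leaves(tree):
    """List the leaf components of a parsed IDS, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


print(parse_ids("⿱回又"))
print(leaves(parse_ids("⿱艹⿱一夗")))
```

Listing the leaves makes it easier to see which single component differs between two candidate shapes before proposing a UCV rule for that component.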
01283
01283
彐 58.5.5
SAT-07269
TS 8 · IDS
Unification
U+38C7
Suggest unifying with 㣇 (U+38C7).
01207
01207
巛 47.10.1
SAT-08726
TS 13 · IDS 𠙻
Unification
U+24C91
Unify to 𤲑 (U+24C91).
Suggest adding a new UCV rule for ⿱巛𠙻 and 甾.
(To copy some examples from MOE Dictionary)
UCV
A similar form can be found in the MOE Dictionary:


Suggest to unify these shapes:
03783
03783
言 149.11.5
SAT-08887
TS 18 · IDS
UCV
Add a new UCV 㕘~參 and new UCV 㕘~喿
01293
01293
彳 60.6.3
SAT-08911
TS 9 · IDS 𠂛
Unification
U+5EF6
Unify to 延 (U+5EF6) and add a new UCV for the other shape to 廴.
03524
03524
虫 142.6.1
SAT-09003
TS 12 · IDS
Unification
U+45A4
Suggest unifying with 䖤 (U+45A4) and encoding this form via an IVS.

夗 at the top is often assimilated (類化) to ⿱一夗 or 死, thereby losing its phonetic indication. Some examples of such variations:

怨 ~ ⿱死心:
Source from MOE Dictionary:


Source from 京都大学《拓本文字データベース》:


苑 ~ ⿱艹⿱一夗
Source from MOE Dictionary:


Source from 京都大学《拓本文字データベース》:


See also:
04702
鳥 196.6.1
GKJ-00382
TS 17 · IDS
00183
00183
人 9.10.1
SAT-09305
TS 12 · IDS
UCV
We may reference this UCV added in WS2017:
04266
04266
阜 170.9.3
UK-20475
TS 12 · IDS 𠚏
UCV
Add new UCV for the different wrapping structural forms of 𠚏.
00358
00358
勹 20.15.3
UK-20529
TS 17 · IDS
Unification
00030
00030
丿 4.5.2
UK-20575
TS 6 · IDS 𠂝
Unification
02110
02110
水 85.15.1
UK-20882
TS 18 · IDS 𠏉
UCV
I suggest a new UCV of 榦 & 𠏉 only, with 幹 excluded. In that case it won't be unified with 澣 (U+6FA3), but it can be unified with ⿰氵榦 if we see it in the future.
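The effect of restricting the rule to 榦 and 𠏉 can be sketched as a small equivalence check. This is only an illustration under stated assumptions: the IDS ⿰氵𠏉 for UK-20882 and ⿰氵幹 for 澣 (U+6FA3) are my own readings, and `UCV_GROUPS`, `canonical` and `unifiable` are hypothetical names.

```python
# Hypothetical UCV table: each set lists components treated as interchangeable.
# Only the 榦 / 𠏉 rule proposed above is modelled; 幹 is deliberately left out.
UCV_GROUPS = [{"榦", "𠏉"}]


def canonical(component):
    """Map a component to a fixed representative of its UCV group, if any."""
    for group in UCV_GROUPS:
        if component in group:
            return min(group)  # any deterministic choice of representative works
    return component


def unifiable(ids_a, ids_b):
    """Two IDS strings unify here if they match after UCV substitution."""
    return [canonical(c) for c in ids_a] == [canonical(c) for c in ids_b]


print(unifiable("⿰氵𠏉", "⿰氵榦"))  # True
print(unifiable("⿰氵𠏉", "⿰氵幹"))  # False
```

Adding 幹 to the group would make the second comparison succeed as well, which is the unification with 澣 that the narrower rule avoids.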
03409
03409
艸 140.12.3
UK-20947
TS 16 · IDS 𣸧
UCV
Based on the discussion, it is suggested to add a new UCV as follows:
00571
00571
口 30.10.5
UK-20997
TS 13 · IDS
UCV
Add a new UCV rule 沓~㳫.
㳫 is a corrupted form of 沓 according to the Kangxi Dictionary.
01363
01363
心 61.11.1
UTC-03181
TS 15 · IDS 𡗗丿
UCV
Based on the discussion, it is suggested to add new UCV:
02380
02380
犬 94.8.3
UTC-03186
TS 11 · IDS 𫤘
UCV
Suggest adding a new UCV for 兒/𠒇/𫤘.
00155
00155
人 9.8.1
UTC-03188
TS 10 · IDS
UCV
I think we should remove NUCV #317. That rule was put there at a time when we did not have the Ideographic Variation Database. The form 靣 is sufficiently rare nowadays but was a common variant form of 面, which makes it a good candidate to be encoded as an ideographic variation instead of as a separate character.
00450
00450
口 30.5.5
UTC-03197
TS 8 · IDS
Oppose Unification
Do not unify to 呣 (U+5463).
They are non-cognate. 呣 is used as a modal particle while UTC-03197 is used for negation.
01259
01259
广 53.9.3
UTC-03230
TS 12 · IDS 𠂊广
Unification
U+2B756
Unify to 𫝖

JH-JTC064 is a variant of 麁.

Also refer to the following Google search result, which shows the place name in the evidence to be 麁利町:
Unification
The pronunciation そ is also consistent with the on-reading of 麁.
00033
00033
丿 4.6.1
VN-F000F
TS 7 · IDS
Unification
U+6240
U+3ABD
U+20A44
U+2B742
U+2B826
U+2CED0
Existing coded variant forms of 所 include 㪽 (U+3ABD), 𠩄 (U+20A44), 𫝂 (U+2B742), 𫠦 (U+2B826), 𬻐 (U+2CED0).
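When cross-checking a list like this, it can help to convert the U+ notation to characters programmatically instead of by eye. A minimal sketch, in which `from_uplus` and `to_uplus` are hypothetical helper names:

```python
import unicodedata


def from_uplus(notation):
    """Convert 'U+6240' style notation to the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not a U+ notation: {notation!r}")
    return chr(int(notation[2:], 16))


def to_uplus(ch):
    """Format a single character as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


# Code points quoted in the comment above as coded variants of 所 (U+6240).
variants = ["U+3ABD", "U+20A44", "U+2B742", "U+2B826", "U+2CED0"]
for v in variants:
    ch = from_uplus(v)
    # The fallback covers Python builds whose Unicode data predates the block.
    print(v, ch, unicodedata.name(ch, "<unnamed in this Python build>"))
```

This only confirms that the code points are assigned and round-trip correctly; whether each one is in fact a variant of 所 still has to be checked against the sources.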


Attributes

Sn | Image/Source | Comment Type | Description
02575
02575
生 100.17.1
GKJ-00953
TS 22 · IDS
Residual Stroke Count
SC=18, FS=2, TC=23
01273
01273
廴 54.11.1
VN-F1BA8
TS 13 · IDS
Total Stroke Count
TC=14
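The stroke attributes above can be sanity-checked with simple arithmetic. The sketch below assumes that a key such as 100.17.1 reads as radical number, residual stroke count (SC) and first stroke (FS), and that the total is the radical's stroke count plus SC. The table `RADICAL_STROKES` uses traditional Kangxi counts and covers only the radicals needed here; all names are hypothetical.

```python
# Stroke counts for the few radicals used here (traditional Kangxi counts).
# This tiny table is illustrative, not a complete radical list.
RADICAL_STROKES = {54: 3, 75: 4, 100: 5}


def parse_key(key):
    """Split an 'R.SC.FS' key such as '100.17.1' into three integers."""
    radical, sc, fs = (int(part) for part in key.split("."))
    return radical, sc, fs


def total_strokes(radical_number, residual_strokes):
    """Total stroke count = radical strokes + residual stroke count (SC)."""
    return RADICAL_STROKES[radical_number] + residual_strokes


radical, sc, fs = parse_key("100.17.1")
print(total_strokes(radical, sc))  # 22, the listed TS for GKJ-00953
print(total_strokes(radical, 18))  # 23, the proposed TC with SC=18
print(total_strokes(54, 11))       # 14, the proposed TC for VN-F1BA8
```

For VN-F1BA8 the result assumes SC stays at 11 and 廴 is counted as three strokes. Keys with a primed radical, such as 195′, are not handled by this parser.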


Evidence

Sn | Image/Source | Comment Type | Description
04700
04700
鳥 196.6.1
GKJ-00341
TS 17 · IDS
Evidence
Note: 𫛥 (⿰夹鸟) is already encoded at U+2B6E5. Does China want to encode this half-simplified form?
04938
04938
鼠 208.18.4
GKJ-00650
TS 31 · IDS
Unclear evidence
Can we have additional evidence to show that this character is in actual use, and not an over-zealously derived traditional character from 𪖂?
02574
02574
生 100.11.4
GKJ-00805
TS 16 · IDS
Evidence
This seems to be an erroneous form of U+8564 蕤. Suggest providing more evidence that is not computer-typeset to confirm whether this is a stable error, or withdrawing the character.
03960
03960
辵 162.5.1
TC-4261
TS 9 · IDS
Evidence
The new evidence provided by Eiso shows that the sound is 穴, which indicates that the inner component should be 戉 instead of 戊. Is there any other evidence to show that 戊 is the expected form?
00270
00270
冖 14.3.3
UK-20504
TS 5 · IDS 丿
Evidence
Are there more examples to prove which form is more canonical as a stroke-reduced form of 官?
00198
00198
人 9.12.2
UTC-00385
TS 14 · IDS
Unclear evidence
Likely a misprinted form of 𠎹. If so, suggest withdrawing.
04238
04238
門 169.11.3
UTC-03180
TS 19 · IDS
Evidence
I believe the shape for 7064 in this version is a misprint. Based on the number of strokes in the right hand side index, 7064 should be 臬 (10 strokes) instead of ⿱自夲 (11 strokes). The surrounding characters 7063 and 7067 are also 真 (10 strokes) and 欮 (10 strokes) respectively.

(The characters are ordered by stroke count in ascending order. The only exception is the possibility of an additional character added in the originally empty space at the end of the list. Cf: 7078 闆 (9 strokes) for 門 and 7021 錳 (8 strokes) for 金.)
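The ordering argument can be expressed as a one-line check. This sketch uses only the stroke counts quoted in the comment above; `fits_ordering` is a hypothetical helper name.

```python
def fits_ordering(prev_strokes, candidate_strokes, next_strokes):
    """True if the candidate keeps stroke counts in non-decreasing order."""
    return prev_strokes <= candidate_strokes <= next_strokes


# Stroke counts quoted in the comment above for the neighbouring entries.
neighbours = {"7063 真": 10, "7067 欮": 10}
candidates = {"臬": 10, "⿱自夲": 11}

for shape, strokes in candidates.items():
    ok = fits_ordering(neighbours["7063 真"], strokes, neighbours["7067 欮"])
    print(shape, strokes, "fits" if ok else "breaks the ordering")
```

The check does not cover the exception noted above, where a character may have been appended out of order in the empty space at the end of a radical's list.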
03505
03505
虫 142.3.3
UTC-03182
TS 9 · IDS
Evidence
This seems to be a misprint of 虷. In other versions of the telegraph codes, 9168 is mapped to 虷.
03804
03804
豸 153.3.3
UTC-03183
TS 10 · IDS
Evidence
This seems to be a misprint of 豻. In other versions of the telegraph codes, 9251 is mapped to 豻.


Glyph Design & Normalization

Sn | Image/Source | Comment Type | Description
01138
01138
山 46.8.1
UK-20874
TS 11 · IDS
Glyph design
The 大 in the middle probably needs redesigning to fit the bottom component better.
Glyph design


Other

Sn | Image/Source | Comment Type | Description
04686
04686
鱼 195′.17.5
GKJ-00285
TS 25 · IDS 𬶨
Comment
The text of the corresponding Wikipedia entry https://zh.wikipedia.org/wiki/白鱀豚 looks like this:


If the text is accurate, either ⿰鱼𬶨 or ⿰鱼暨 seems acceptable.
04783
04783
鳥 196.14.1
GKJ-00349
TS 25 · IDS
Other
⿰截鳥 is not an error form, but an alternate form closer to modern conventions. In modern conventions 截 is used instead of 𢧵.

Theoretically we could add this as a UCV as this is a systematic transliteration variant, but most variants are already coded. Or we should consider adding this as an NUCV and encoding the remaining variants.
03670
03670
虫 142.23.2
GKJ-00355
TS 29 · IDS
Comment
As mentioned in the meeting by Wang Yifan, the same character was submitted in WS2017 as 03913 and was postponed because the text suggests it is a miswritten ligature.
02206
02206
火 86.10.1
GXM-00267
TS 14 · IDS
Comment
The evidence provided in IRG WS2017 is as follows:
01418
01418
戈 62.4.2
SAT-03722
TS 8 · IDS
Comment
In IRG #57, there was discussion on whether this should be unified with U+22994, as WS2021-01417 (⿱木戈) and WS2021-01418 (⿱水戈) constitute a continuum of variants to U+22994.

U+22994:


WS2021-01417 (A):


WS2021-01417 (B):


WS2021-01418:
03113
03113
糸 120.13.5
SAT-06345
TS 19 · IDS &S4-02;
Comment
Are there any examples of other characters where the bottom part of 雋 has been stretched out similarly?
00676
00676
口 30.15.1
SAT-06793
TS 18 · IDS 𢀩
Comment
Here is the information of U+2D2FF from http://www.koreanhistory.or.kr/newchar/list_view.jsp?code=76716:
00395
00395
卩 26.8.3
SAT-06950
TS 10 · IDS
Comment
To recap the discussion: whether to unify via a new UCV rule with specific constraints or to encode the character separately would be easier to judge if more data were provided on how systematic the "stable misprint" or "stable variation" between the 卩 and 阝 components is. For example, if it is sufficiently rare, IRG may choose to encode this "error" as a separate character. But if such variation is quite common in SAT's corpus, it would be better to unify all such cases systematically.
01283
01283
彐 58.5.5
SAT-07269
TS 8 · IDS
Comment
To recap the discussion, another variant (WS2017-01228) also exists. It will be easier for IRG to decide whether to unify, or which characters to encode, once the other variants of this character can be seen.
02578
02578
田 102.2.5
SAT-08976
TS 7 · IDS
Comment
This looks like a corrupted form of 臽? A corrupted form of the shape on the right of
01695
01695
日 72.11.3
TB-4B46
TS 15 · IDS
Comment
The evidence provided by Tao Yang does not support the pronunciation provided by TCA. Are there any other sources besides 汉字海? 汉字海 seems to be unreliable at times.
03343
03343
艸 140.7.2
TC-5F3D
TS 11 · IDS
Comment
The evidence provided does not have the same pronunciation as that provided by TCA.
03245
03245
肉 130.10.2
TD-4C77
TS 14 · IDS
Comment
The evidence seems to suggest a different pronunciation from that supplied by TCA.
04317
04317
雨 173.8.4
TD-7228
TS 16 · IDS
Comment
The evidence suggests it is a Taoist character; its handling is to be decided by IRG.
04543
04543
髟 190.16.2
TE-4A60
TS 26 · IDS
Comment
The evidence provided by Tao Yang seems to suggest it is a misprint of 鬚.
03988
03988
辵 162.12.2
TE-6E7C
TS 16 · IDS
Comment
The evidence provided by Tao Yang does not match the pronunciation provided by TCA.
04338
04338
雨 173.11.3
TE-7835
TS 19 · IDS
Comment
Is this a variant of 虧? The pronunciation and the evidence provided by Tao Yang seem to indicate so.
01030
01030
子 39.17.2
TE-7B2E
TS 20 · IDS 𮑠
Comment
The pronunciation and evidence provided by Tao Yang indicate that it is a variant of 孽 (U+5B7D).
03881
03881
足 157.7.1
UK-20276
TS 14 · IDS 𧾷
Comment
The traditional character 𬧄 has been encoded in Extension C and seems to be also used in Min Nan. So I think this source can be accepted as sufficient proof of the existence of the simplified form.
02689
02689
目 109.5.1
UK-20290
TS 10 · IDS
Comment
This character is definitely used for this meaning in Cantonese and is already encoded in the URO:


I think the existence of this character is sufficiently proved.
04469
04469
饣 184′.7.4
UK-20291
TS 10 · IDS
Comment
This character is used in Cantonese with the meaning of gnawing a bone, and the traditional form is coded in Extension E:


I think there is sufficient evidence to prove that this character exists.
00270
00270
冖 14.3.3
UK-20504
TS 5 · IDS 丿
Comment
Cf: MOE Dictionary entry #A01015-009


Data for Unihan

Sn | Image/Source | Comment Type | Description
00039
00039
乙 5.1.3
UK-20253
TS 2 · IDS 丿
Semantic variant
Semantic variant of 身; may be a spoofing variant of 𠠲 (U+20832).
00873
00873
土 32.11.2
UK-20657
TS 14 · IDS
Semantic variant
Possible variant of 堽?